Programming Python OReilly 4th Ed. Ed

Table of Contents
Preface
Part I. The Beginning
- Chapter 1. A Sneak Preview
Part II. System Programming
Part III. GUI Programming
Part IV. Internet Programming
Part V. Tools and Techniques
Part VI. The End
- Chapter 21. Conclusion: Python and the Development Cycle
Index

Programming Python

FOURTH EDITION

Mark Lutz

Beijing • Cambridge • Farnham • Köln • Sebastopol • Tokyo

Programming Python, Fourth Edition

by Mark Lutz

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions

are also available for most titles (http://my.safaribooksonline.com). For more information, contact our

corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.

Editor: Julie Steele

Production Editor: Teresa Elsey

Proofreader: Teresa Elsey

Indexer: Lucie Haskins

Cover Designer: Karen Montgomery

Interior Designer: David Futato

Illustrator: Robert Romano

Printing History:

October 1996: First Edition.

March 2001: Second Edition.

August 2006: Third Edition.

December 2010: Fourth Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of

O’Reilly Media, Inc. Programming Python, the image of an African rock python, and related trade dress

are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as

trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a

trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-0-596-15810-1

[QG]

1292258056

Table of Contents

Preface

Part I. The Beginning

1. A Sneak Preview
    “Programming Python: The Short Story”
    The Task
    Step 1: Representing Records
    Using Lists
    Using Dictionaries
    Step 2: Storing Records Persistently
    Using Formatted Files
    Using Pickle Files
    Using Per-Record Pickle Files
    Using Shelves
    Step 3: Stepping Up to OOP
    Using Classes
    Adding Behavior
    Adding Inheritance
    Refactoring Code
    Adding Persistence
    Other Database Options
    Step 4: Adding Console Interaction
    A Console Shelve Interface
    Step 5: Adding a GUI
    GUI Basics
    Using OOP for GUIs
    Getting Input from a User
    A GUI Shelve Interface
    Step 6: Adding a Web Interface
    CGI Basics
    Running a Web Server
    Using Query Strings and urllib
    Formatting Reply Text
    A Web-Based Shelve Interface
    The End of the Demo

Part II. System Programming

2. System Tools
    “The os.path to Knowledge”
    Why Python Here?
    The Next Five Chapters
    System Scripting Overview
    Python System Modules
    Module Documentation Sources
    Paging Documentation Strings
    A Custom Paging Script
    String Method Basics
    Other String Concepts in Python 3.X: Unicode and bytes
    File Operation Basics
    Using Programs in Two Ways
    Python Library Manuals
    Commercially Published References
    Introducing the sys Module
    Platforms and Versions
    The Module Search Path
    The Loaded Modules Table
    Exception Details
    Other sys Module Exports
    Introducing the os Module
    Tools in the os Module
    Administrative Tools
    Portability Constants
    Common os.path Tools
    Running Shell Commands from Scripts
    Other os Module Exports

3. Script Execution Context
    “I’d Like to Have an Argument, Please”
    Current Working Directory
    CWD, Files, and Import Paths
    CWD and Command Lines
    Command-Line Arguments
    Parsing Command-Line Arguments
    Shell Environment Variables
    Fetching Shell Variables
    Changing Shell Variables
    Shell Variable Fine Points: Parents, putenv, and getenv
    Standard Streams
    Redirecting Streams to Files and Programs
    Redirected Streams and User Interaction
    Redirecting Streams to Python Objects
    The io.StringIO and io.BytesIO Utility Classes
    Capturing the stderr Stream
    Redirection Syntax in Print Calls
    Other Redirection Options: os.popen and subprocess Revisited

4. File and Directory Tools
    “Erase Your Hard Drive in Five Easy Steps!”
    File Tools
    The File Object Model in Python 3.X
    Using Built-in File Objects
    Binary and Text Files
    Lower-Level File Tools in the os Module
    File Scanners
    Directory Tools
    Walking One Directory
    Walking Directory Trees
    Handling Unicode Filenames in 3.X: listdir, walk, glob

5. Parallel System Tools
    “Telling the Monkeys What to Do”
    Forking Processes
    The fork/exec Combination
    Threads
    The _thread Module
    The threading Module
    The queue Module
    Preview: GUIs and Threads

More on the Global Interpreter Lock

Although it’s a lower-level topic than you generally need to do useful thread work in

Python, the implementation of Python’s threads can have impacts on both performance

and coding. This section summarizes implementation details and some of their

ramifications.

Threads implementation in the upcoming Python 3.2: This section describes the current implementation of threads up to and including Python 3.1. At this writing, Python 3.2 is still in development, but one of its likely enhancements is a new version of the GIL that provides better performance, especially on some multicore CPUs. The new GIL implementation will still synchronize access to the PVM (Python language code is still multiplexed as before), but it will use a context switching scheme that is more efficient than the current N-bytecode-instruction approach.

Among other things, the current sys.setcheckinterval call will likely be replaced with a timer duration call in the new scheme. Specifically, the concept of a check interval for thread switches will be abandoned and replaced by an absolute time duration expressed in seconds. It’s anticipated that this duration will default to 5 milliseconds, but it will be tunable through sys.setswitchinterval.

Moreover, there have been a variety of plans made to remove the GIL altogether (including goals of the Unladen Swallow project being conducted by Google employees), though none have managed to produce any fruit thus far. Since I cannot predict the future, please see Python release documents to follow this (well…) thread.

Strictly speaking, Python currently uses the global interpreter lock (GIL) mechanism introduced at the start of this section, which guarantees that one thread, at most, is running code within the Python interpreter at any given point in time. In addition, to make sure that each thread gets a chance to run, the interpreter automatically switches its attention between threads at regular intervals (in Python 3.1, by releasing and acquiring the lock after a number of bytecode instructions) as well as at the start of long-running operations (e.g., on some file input/output requests).

This scheme avoids problems that could arise if multiple threads were to update Python system data at the same time. For instance, if two threads were allowed to simultaneously change an object’s reference count, the result might be unpredictable. This scheme can also have subtle consequences. In this chapter’s threading examples, for instance, the stdout stream can be corrupted unless each thread’s call to write text is synchronized with thread locks.

Moreover, even though the GIL prevents more than one Python thread from running at the same time, it is not enough to ensure thread safety in general, and it does not address higher-level synchronization issues at all. For example, as we saw, when more than one thread might attempt to update the same variable at the same time, the threads should generally be given exclusive access to the object with locks. Otherwise, it’s not impossible that thread switches will occur in the middle of an update statement’s bytecode.

Locks are not strictly required for all shared object access, especially if a single thread

updates an object inspected by other threads. As a rule of thumb, though, you should

generally use locks to synchronize threads whenever update rendezvous are possible

instead of relying on artifacts of the current thread implementation.

The thread switch interval

Some concurrent updates might work without locks if the thread-switch interval is set

high enough to allow each thread to finish without being swapped out. The

sys.setcheckinterval(N) call sets the frequency with which the interpreter checks for

things like thread switches and signal handlers.

This interval defines the number of bytecode instructions before a switch. It does not

need to be reset for most programs, but it can be used to tune thread performance.

Setting higher values means switches happen less often: threads incur less overhead but

they are less responsive to events. Setting lower values makes threads more responsive

to events but increases thread switch overhead.
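In Python 3.2 and later, the bytecode-count check interval described here gave way to the time-based switch interval anticipated in the earlier note. The following is a minimal sketch, assuming a 3.2 or later interpreter, of inspecting and tuning that setting; the 0.05 value is an arbitrary illustration, not a recommendation.

```python
import sys

# Read the current thread switch interval, in seconds (Python 3.2+).
original = sys.getswitchinterval()

# Ask for longer time slices: fewer switches and less overhead,
# but threads respond to events less promptly.
sys.setswitchinterval(0.05)
tuned = sys.getswitchinterval()

# Restore the previous setting so the rest of the program is unaffected.
sys.setswitchinterval(original)
restored = sys.getswitchinterval()

print(original, tuned, restored)
```

As with the older call, most programs never need to change this value.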

Atomic operations

Because of the way Python uses the GIL to synchronize threads’ access to the virtual machine, whole statements are not generally thread-safe, but each bytecode instruction is. Because of this bytecode indivisibility, some Python language operations are thread-safe (also called atomic, because they run without interruption) and do not require the use of locks or queues to avoid concurrent update issues. For instance, as of this writing, list.append, fetches and some assignments for variables, list items, dictionary keys, and object attributes, and other operations were still atomic in standard C Python; others, such as x = x+1 (and any operation in general that reads data, modifies it, and writes it back) were not.

As mentioned earlier, though, relying on these rules is a bit of a gamble, because they require a deep understanding of Python internals and may vary per release. Indeed, the set of atomic operations may be radically changed if a new free-threaded implementation ever appears. As a rule of thumb, it may be easier to use locks for all access to global and shared objects than to try to remember which types of access may or may not be safe across multiple threads.
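To make the rule of thumb concrete, here is a small sketch that guards a read-modify-write update with a lock. The names counter, counter_lock, and add_many are made up for this illustration, and the thread and iteration counts are arbitrary.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_many(times):
    global counter
    for _ in range(times):
        # counter = counter + 1 reads, adds, and stores in separate
        # bytecode steps, so hold the lock for the whole update.
        with counter_lock:
            counter = counter + 1

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

With the lock in place the final count does not depend on when thread switches happen; without it, the result would rest on implementation details of the kind described above.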

C API thread considerations

Finally, if you plan to mix Python with C, also see the thread interfaces described in the Python/C API standard manual. In threaded programs, C extensions must release and reacquire the GIL around long-running operations to let the Python language portions of other Python threads run during the wait. Specifically, the long-running C extension function should release the lock on entry and reacquire it on exit when resuming Python code.

Also note that even though the Python code in Python threads cannot truly overlap in time due to the GIL synchronization, the C-coded portions of threads can. Any number may be running in parallel, as long as they do work outside the scope of the Python virtual machine. In fact, C threads may overlap both with other C threads and with Python language threads run in the virtual machine. Because of this, splitting code off to C libraries is one way that Python applications can still take advantage of multi-CPU machines.

Still, it may often be easier to leverage such machines by simply writing Python programs that fork processes instead of starting threads. The complexity of process and thread code is similar. For more on C extensions and their threading requirements, see Chapter 20. In short, Python includes C language tools (including a pair of GIL management macros) that can be used to wrap long-running operations in C-coded extensions and that allow other Python language threads to run in parallel.

A process-based alternative: multiprocessing (ahead)

By now, you should have a basic grasp of parallel processes and threads, and Python’s tools that support them. Later in this chapter, we’ll revisit both ideas to study the multiprocessing module, a standard library tool that seeks to combine the simplicity and portability of threads with the benefits of processes, by implementing a threading-like API that runs processes instead of threads. It seeks to address the portability issue of processes, as well as the multiple-CPU limitations imposed in threads by the GIL, but it cannot be used as a replacement for forking in some contexts, and it imposes some constraints that threads do not, which stem from its process-based model (for instance, mutable object state is not directly shared because objects are copied across process boundaries, and unpickleable objects such as bound methods cannot be as freely used).

Because the multiprocessing module also implements tools to simplify tasks such as

inter-process communication and exit status, though, let’s first get a handle on Python’s

support in those domains as well, and explore some more process and thread examples

along the way.

Program Exits

As we’ve seen, unlike C, there is no “main” function in Python. When we run a program, we simply execute all of the code in the top-level file, from top to bottom (i.e., in the filename we listed in the command line, clicked in a file explorer, and so on). Scripts normally exit when Python falls off the end of the file, but we may also call for program exit explicitly with tools in the sys and os modules.

sys Module Exits

For example, the built-in sys.exit function ends a program when called, and earlier

than normal:

>>> sys.exit(N) # exit with status N, else exits on end of script

Interestingly, this call really just raises the built-in SystemExit exception. Because of

this, we can catch it as usual to intercept early exits and perform cleanup activities; if

uncaught, the interpreter exits as usual. For instance:

C:\...\PP4E\System> python
>>> import sys
>>> try:
...     sys.exit()              # see also: os._exit, Tk().quit()
... except SystemExit:
...     print('ignoring exit')
...
ignoring exit
>>>

Programming tools such as debuggers can make use of this hook to avoid shutting down. In fact, explicitly raising the built-in SystemExit exception with a Python raise statement is equivalent to calling sys.exit. More realistically, a try block would catch the exit exception raised elsewhere in a program; the script in Example 5-15, for instance, exits from within a processing function.

Example 5-15. PP4E\System\Exits\testexit_sys.py

def later():
    import sys
    print('Bye sys world')
    sys.exit(42)
    print('Never reached')

if __name__ == '__main__': later()

Running this program as a script causes it to exit before the interpreter falls off the end

of the file. But because sys.exit raises a Python exception, importers of its function

can trap and override its exit exception or specify a finally cleanup block to be run

during program exit processing:

C:\...\PP4E\System\Exits> python testexit_sys.py
Bye sys world

C:\...\PP4E\System\Exits> python
>>> from testexit_sys import later
>>> try:
...     later()
... except SystemExit:
...     print('Ignored...')
...
Bye sys world
Ignored...
>>> try:
...     later()
... finally:
...     print('Cleanup')
...
Bye sys world
Cleanup

C:\...\PP4E\System\Exits>          # interactive session process exits
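A handler that traps the exit can also recover the requested status, because the SystemExit exception object stores the argument passed to sys.exit in its code attribute. The sketch below inlines a function shaped like later from Example 5-15 so that it runs on its own; the status variable name is ours.

```python
import sys

def later():
    print('Bye sys world')
    sys.exit(42)

try:
    later()
except SystemExit as exc:
    status = exc.code            # the argument given to sys.exit
    print('Trapped exit with status', status)
```

A wrapper could log this value, run its cleanup code, and then reraise the exception to let the exit proceed.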

os Module Exits

It’s possible to exit Python in other ways, too. For instance, within a forked child process on Unix, we typically call the os._exit function rather than sys.exit; threads may exit with a _thread.exit call; and tkinter GUI applications often end by calling something named Tk().quit(). We’ll meet the tkinter module later in this book; let’s take a look at os exits here.

With os._exit, the calling process exits immediately instead of raising an exception that could be trapped and ignored. In fact, the process also exits without flushing output stream buffers or running cleanup handlers (defined by the atexit standard library module), so this generally should be used only by child processes after a fork, where overall program shutdown actions aren’t desired. Example 5-16 illustrates the basics.

Example 5-16. PP4E\System\Exits\testexit_os.py

def outahere():
    import os
    print('Bye os world')
    os._exit(99)
    print('Never reached')

if __name__ == '__main__': outahere()

Unlike sys.exit, os._exit is immune to both try/except and try/finally interception:

C:\...\PP4E\System\Exits> python testexit_os.py
Bye os world

C:\...\PP4E\System\Exits> python
>>> from testexit_os import outahere
>>> try:
...     outahere()
... except:
...     print('Ignored')
...
Bye os world                       # exits interactive process

C:\...\PP4E\System\Exits> python
>>> from testexit_os import outahere
>>> try:
...     outahere()
... finally:
...     print('Cleanup')
...
Bye os world                       # ditto
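The same contrast shows up with atexit cleanup handlers. The following sketch, which assumes Python 3.7 or later for subprocess.run with text=True, runs a tiny child program twice: once ending with sys.exit and once with os._exit. The child source and the run_child helper are invented for this illustration.

```python
import subprocess
import sys

# Child program text; {exit_call} is filled in for each run.
child_template = (
    "import atexit, os, sys\n"
    "atexit.register(lambda: print('cleanup ran'))\n"
    "print('working')\n"
    "sys.stdout.flush()\n"
    "{exit_call}\n"
)

def run_child(exit_call):
    """Run the child and return its output lines and exit status."""
    code = child_template.format(exit_call=exit_call)
    done = subprocess.run([sys.executable, '-c', code],
                          stdout=subprocess.PIPE, text=True)
    return done.stdout.splitlines(), done.returncode

normal = run_child('sys.exit(7)')    # atexit handler runs during shutdown
forced = run_child('os._exit(7)')    # handler is skipped entirely

print(normal)
print(forced)
```

Both runs report the same exit status to the parent; only the sys.exit run gets as far as its registered handler.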

Shell Command Exit Status Codes

Both the sys and os exit calls we just met accept an argument that denotes the exit

status code of the process (it’s optional in the sys call but required by os). After exit,

this code may be interrogated in shells and by programs that ran the script as a child

process. On Linux, for example, we ask for the status shell variable’s value in order to

fetch the last program’s exit status; by convention, a nonzero status generally indicates

that some sort of problem occurred:

[mark@linux]$ python testexit_sys.py
Bye sys world
[mark@linux]$ echo $status
42
[mark@linux]$ python testexit_os.py
Bye os world
[mark@linux]$ echo $status
99

In a chain of command-line programs, exit statuses could be checked along the way as

a simple form of cross-program communication.

We can also grab hold of the exit status of a program run by another script. For instance,

as introduced in Chapters 2 and 3, when launching shell commands, exit status is

provided as:

• The return value of an os.system call

• The return value of the close method of an os.popen object (for historical reasons,

None is returned if the exit status was 0, which means no error occurred)

• A variety of interfaces in the subprocess module (e.g., the call function’s return value, a Popen object’s returncode attribute and wait method result)

In addition, when running programs by forking processes, the exit status is available

through the os.wait and os.waitpid calls in a parent process.

Exit status with os.system and os.popen

Let’s look at the case of the shell commands first. The following, run on Linux, spawns Example 5-15 and Example 5-16, reads their output streams through pipes, and fetches their exit status codes:

[mark@linux]$ python
>>> import os
>>> pipe = os.popen('python testexit_sys.py')
>>> pipe.read()
'Bye sys world\n'
>>> stat = pipe.close()              # returns exit code
>>> stat
10752
>>> hex(stat)
'0x2a00'
>>> stat >> 8                        # extract status from bitmask on Unix-likes
42

>>> pipe = os.popen('python testexit_os.py')
>>> stat = pipe.close()
>>> stat, stat >> 8
(25344, 99)

This code works the same under Cygwin Python on Windows. When using os.popen

on such Unix-like platforms, for reasons we won’t go into here, the exit status is actually

packed into specific bit positions of the return value; it’s really there, but we need to

shift the result right by eight bits to see it. Commands run with os.system send their

statuses back directly through the Python library call:

>>> stat = os.system('python testexit_sys.py')

Bye sys world

>>> stat, stat >> 8

(10752, 42)

>>> stat = os.system('python testexit_os.py')

Bye os world

>>> stat, stat >> 8

(25344, 99)

All of this code works under the standard version of Python for Windows, too, though

exit status is not encoded in a bit mask (test sys.platform if your code must handle

both formats):

C:\...\PP4E\System\Exits> python
>>> os.system('python testexit_sys.py')
Bye sys world
42
>>> os.system('python testexit_os.py')
Bye os world
99

>>> pipe = os.popen('python testexit_sys.py')
>>> pipe.read()
'Bye sys world\n'
>>> pipe.close()
42
>>>
>>> os.popen('python testexit_os.py').close()
99
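If a script must cope with both encodings, the platform test can be wrapped in a small helper. This is only a sketch: exit_code is a hypothetical name, the platform parameter exists so that both branches can be tried on one machine, and the helper ignores the low-order byte that Unix-like systems use to record termination by a signal.

```python
import sys

def exit_code(stat, platform=sys.platform):
    """Decode the result of os.system or of an os.popen object's close."""
    if stat is None:                 # os.popen close returns None for status 0
        return 0
    if platform.startswith('win'):   # standard Windows Python: not encoded
        return stat
    return stat >> 8                 # Unix-likes and Cygwin: high-order byte

print(exit_code(10752, 'linux'), exit_code(42, 'win32'), exit_code(None))
```

On Python 3.9 and later, os.waitstatus_to_exitcode offers a library-supported decoder for wait-style statuses as well.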

Output stream buffering: A first look

Notice that the last test in the preceding code didn’t attempt to read the command’s

output pipe. If we do, we may have to run the target script in unbuffered mode with the

-u Python command-line flag or change the script to flush its output manually with

sys.stdout.flush. Otherwise, the text printed to the standard output stream might not

be flushed from its buffer when os._exit is called in this case for immediate shutdown.

By default, standard output is fully buffered when connected to a pipe like this; it’s only

line-buffered when connected to a terminal:

>>> pipe = os.popen('python testexit_os.py')
>>> pipe.read()                                  # streams not flushed on exit
''

>>> pipe = os.popen('python -u testexit_os.py')  # force unbuffered streams
>>> pipe.read()
'Bye os world\n'

Confusingly, you can pass mode and buffering arguments to specify line buffering in

both os.popen and subprocess.Popen, but this won’t help here—arguments passed to

these tools pertain to the calling process’s input end of the pipe, not to the spawned

program’s output stream:

>>> pipe = os.popen('python testexit_os.py', 'r', 1)   # line buffered only
>>> pipe.read()                                        # but my pipe, not program's!
''

>>> from subprocess import Popen, PIPE
>>> pipe = Popen('python testexit_os.py', bufsize=1, stdout=PIPE)  # for my pipe
>>> pipe.stdout.read()                                 # doesn't help
b''

Really, buffering mode arguments in these tools pertain to output the caller writes to

a command’s standard input stream, not to output read from that command.

If required, the spawned script itself can also manually flush its output buffers periodically or before forced exits. More on buffering when we discuss the potential for deadlocks later in this chapter, and again in Chapters 10 and 12 where we’ll see how it applies to sockets. Since we brought up subprocess, though, let’s turn to its exit tools next.
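As a sketch of the manual flush option, the child program below is a variation on Example 5-16 that calls sys.stdout.flush before its forced exit. It is passed inline with -c so the example is self-contained, and it assumes Python 3.7 or later for subprocess.run with text=True.

```python
import subprocess
import sys

child = (
    "import os, sys\n"
    "print('Bye os world')\n"
    "sys.stdout.flush()\n"       # push buffered text into the pipe first
    "os._exit(99)\n"             # then exit without normal shutdown steps
)

done = subprocess.run([sys.executable, '-c', child],
                      stdout=subprocess.PIPE, text=True)
print(repr(done.stdout), done.returncode)
```

Because the text has already left the child's buffer when os._exit runs, the parent can read it without resorting to the -u flag.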

Exit status with subprocess

The alternative subprocess module offers exit status in a variety of ways, as we saw in

Chapters 2 and 3 (a None value in returncode indicates that the spawned program has

not yet terminated):

C:\...\PP4E\System\Exits> python

>>> from subprocess import Popen, PIPE, call
>>> pipe = Popen('python testexit_sys.py', stdout=PIPE)
>>> pipe.stdout.read()
b'Bye sys world\r\n'
>>> pipe.wait()
42

>>> call('python testexit_sys.py')
Bye sys world
42

>>> pipe = Popen('python testexit_sys.py', stdout=PIPE)
>>> pipe.communicate()
(b'Bye sys world\r\n', None)
>>> pipe.returncode
42

The subprocess module works the same on Unix-like platforms like Cygwin, but unlike

os.popen, the exit status is not encoded, and so it matches the Windows result (note

that shell=True is needed to run this as is on Cygwin and Unix-like platforms, as we

learned in Chapter 2; on Windows this argument is required only to run commands

built into the shell, like dir):

[C:\...\PP4E\System\Exits]$ python
>>> from subprocess import Popen, PIPE, call
>>> pipe = Popen('python testexit_sys.py', stdout=PIPE, shell=True)
>>> pipe.stdout.read()
b'Bye sys world\n'
>>> pipe.wait()
42

>>> call('python testexit_sys.py', shell=True)
Bye sys world
42
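Later Python releases add subprocess.run, which is not covered in this edition; the sketch below assumes Python 3.7 or later. It passes the command as an argument list, which avoids the need for shell=True on Unix-like platforms, and it shows how check=True turns a nonzero exit status into an exception. The inline child program stands in for testexit_sys.py.

```python
import subprocess
import sys

cmd = [sys.executable, '-c', "print('Bye sys world'); raise SystemExit(42)"]

# Status is available as an attribute of the returned object.
done = subprocess.run(cmd, stdout=subprocess.PIPE, text=True)
print(repr(done.stdout), done.returncode)

# With check=True, a nonzero status raises CalledProcessError instead.
try:
    subprocess.run(cmd, stdout=subprocess.PIPE, text=True, check=True)
except subprocess.CalledProcessError as exc:
    failed_with = exc.returncode
    print('Child failed with status', failed_with)
```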

Process Exit Status and Shared State

Now, to learn how to obtain the exit status from forked processes, let’s write a simple

forking program: the script in Example 5-17 forks child processes and prints child

process exit statuses returned by os.wait calls in the parent until a “q” is typed at the

console.

Example 5-17. PP4E\System\Exits\testexit_fork.py

"""
fork child processes to watch exit status with os.wait; fork works on Unix
and Cygwin but not standard Windows Python 3.1; note: spawned threads share
globals, but each forked process has its own copy of them (forks share file
descriptors)--exitstat is always the same here but will vary if for threads;
"""

import os
exitstat = 0

def child():                                 # could os.exit a script here
    global exitstat                          # change this process's global
    exitstat += 1                            # exit status to parent's wait
    print('Hello from child', os.getpid(), exitstat)
    os._exit(exitstat)
    print('never reached')

def parent():
    while True:
        newpid = os.fork()                   # start a new copy of process
        if newpid == 0:                      # if in copy, run child logic
            child()                          # loop until 'q' console input
        else:
            pid, status = os.wait()
            print('Parent got', pid, status, (status >> 8))
            if input() == 'q': break

if __name__ == '__main__': parent()

Running this program on Linux, Unix, or Cygwin (remember, fork still doesn’t work

on standard Windows Python as I write the fourth edition of this book) produces the

following sort of results:

[C:\...\PP4E\System\Exits]$ python testexit_fork.py

Hello from child 5828 1

Parent got 5828 256 1

Hello from child 9540 1

Parent got 9540 256 1

Hello from child 3152 1

Parent got 3152 256 1

If you study this output closely, you’ll notice that the exit status (the last number printed) is always the same: the number 1. Because forked processes begin life as copies of the process that created them, they also have copies of global memory. Because of that, each forked child gets and changes its own exitstat global variable without changing any other process’s copy of this variable. At the same time, forked processes copy and thus share file descriptors, which is why prints go to the same place.
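Instead of shifting the status by hand, a parent on a Unix-like platform can use the os.WIFEXITED and os.WEXITSTATUS calls to decode it. This sketch uses a hypothetical run_child helper and a few arbitrary exit codes, and it yields an empty list where os.fork is unavailable.

```python
import os
import sys

def run_child(code):
    """Fork a child that exits with code; return the decoded status."""
    sys.stdout.flush()                   # don't duplicate buffered output
    pid = os.fork()
    if pid == 0:
        os._exit(code)                   # child: exit at once
    _, status = os.waitpid(pid, 0)       # parent: wait for this child
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)    # same value as status >> 8 here
    return None                          # child was ended by a signal

results = [run_child(n) for n in (0, 1, 99)] if hasattr(os, 'fork') else []
print(results)
```

Each child exits with the value it was handed, and nothing a child does to its own copy of memory is visible to the parent afterward.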

Thread Exits and Shared State

In contrast, threads run in parallel within the same process and share global memory.

Each thread in Example 5-18 changes the single shared global variable, exitstat.

Example 5-18. PP4E\System\Exits\testexit_thread.py

"""
spawn threads to watch shared global memory change; threads normally exit
when the function they run returns, but _thread.exit() can be called to
exit calling thread; _thread.exit is the same as sys.exit and raising
SystemExit; threads communicate with possibly locked global vars; caveat:
may need to make print/input calls atomic on some platforms--shared stdout;
"""

import _thread as thread
exitstat = 0

def child():
    global exitstat                          # process global names
    exitstat += 1                            # shared by all threads
    threadid = thread.get_ident()
    print('Hello from child', threadid, exitstat)
    thread.exit()
    print('never reached')

def parent():
    while True:
        thread.start_new_thread(child, ())
        if input() == 'q': break

if __name__ == '__main__': parent()

The following shows this script in action on Windows; unlike forks, threads run in the

standard version of Python on Windows, too. Thread identifiers created by Python

differ each time—they are arbitrary but unique among all currently active threads and

so may be used as dictionary keys to keep per-thread information (a thread’s id may be

reused after it exits on some platforms):

C:\...\PP4E\System\Exits> python testexit_thread.py

Hello from child 4908 1

Hello from child 4860 2

Hello from child 2752 3

Hello from child 8964 4

Notice how the value of this script’s global exitstat is changed by each thread, because threads share global memory within the process. In fact, this is often how threads communicate in general. Rather than exit status codes, threads assign module-level globals or change shared mutable objects in-place to signal conditions, and they use thread module locks and queues to synchronize access to shared items if needed. This script might need to synchronize, too, if it ever does something more realistic. That applies to global counter changes, and even print and input may have to be synchronized if they overlap stream access badly on some platforms. For this simple demo, we forgo locks by assuming threads won’t mix their operations oddly.

As we’ve learned, a thread normally exits silently when the function it runs returns, and the function return value is ignored. Optionally, the _thread.exit function can be called to terminate the calling thread explicitly and silently. This call works almost exactly like sys.exit (but takes no return status argument), and it works by raising a SystemExit exception in the calling thread. Because of that, a thread can also prematurely end by calling sys.exit or by directly raising SystemExit. Be sure not to call os._exit within a thread function, though; doing so can have odd results (the last time I tried, it hung the entire process on my Linux system and killed every thread in the process on Windows!).

The alternative threading module for threads has no method equivalent to

_thread.exit(), but since all that the latter does is raise a system-exit exception, doing

the same in threading has the same effect—the thread exits immediately and silently,

as in the following sort of code (see testexit-threading.py in the example tree for this

code):

import threading, sys, time

def action():
    sys.exit()                 # or raise SystemExit()
    print('not reached')

threading.Thread(target=action).start()
time.sleep(2)
print('Main exit')

On a related note, keep in mind that threads and processes have default lifespan models, which we explored earlier. By way of review, when child threads are still running, the two thread modules’ behavior differs: programs on most platforms exit when the parent thread does under _thread, but not normally under threading unless children are made daemons. When using processes, children normally outlive their parent. This different process behavior makes sense if you remember that threads are in-process function calls, but processes are more independent and autonomous.

When used well, exit status can be used to implement error detection and simple communication protocols in systems composed of command-line scripts. But having said that, I should underscore that most scripts do simply fall off the end of the source to exit, and most thread functions simply return; explicit exit calls are generally employed for exceptional conditions and in limited contexts only. More typically, programs communicate with richer tools than integer exit codes; the next section shows how.

Interprocess Communication

As we saw earlier, when scripts spawn threads—tasks that run in parallel within the

program—they can naturally communicate by changing and inspecting names and

objects in shared global memory. This includes both accessible variables and attributes,

as well as referenced mutable objects. As we also saw, some care must be taken to use

locks to synchronize access to shared items that can be updated concurrently. Still,

threads offer a fairly straightforward communication model, and the queue module can

make this nearly automatic for many programs.
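As a brief reminder of the queue-based model, the sketch below has several producer threads hand results to the main thread through a shared queue.Queue object, which manages its own locking. The worker function and the thread and item counts are arbitrary choices for this illustration.

```python
import queue
import threading

results = queue.Queue()

def worker(ident, count):
    for i in range(count):
        results.put((ident, i))      # Queue synchronizes access internally

threads = [threading.Thread(target=worker, args=(n, 3)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All producers have finished, so draining with empty() is safe here.
items = []
while not results.empty():
    items.append(results.get())

print(len(items), sorted(items)[:3])
```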

Things aren’t quite as simple when scripts start child processes and independent pro-

grams that do not share memory in general. If we limit the kinds of communications

that can happen between programs, many options are available, most of which we’ve

already seen in this and the prior chapters. For example, the following simple mecha-

nisms can all be interpreted as cross-program communication devices:

• Simple files

• Command-line arguments

• Program exit status codes

• Shell environment variables

• Standard stream redirections

• Stream pipes managed by os.popen and subprocess

For instance, sending command-line options and writing to input streams lets us pass

in program execution parameters; reading program output streams and exit codes gives

us a way to grab a result. Because shell environment variable settings are inherited by

spawned programs, they provide another way to pass context in. And pipes made by

os.popen or subprocess allow even more dynamic communication. Data can be sent

between programs at arbitrary times, not only at program start and exit.
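To make these simple mechanisms concrete, the following sketch passes context into a child through a command-line argument and an environment variable, and collects the result from the child's output stream and exit code. The GREETING variable, the inline child script, and the ask_child helper are assumptions made up for this illustration:

```python
import os, subprocess, sys

# hypothetical child: reads one command-line argument and one environment variable
CHILD = 'import os, sys; print(sys.argv[1].upper(), os.environ["GREETING"])'

def ask_child(arg, greeting):
    env = dict(os.environ, GREETING=greeting)         # settings inherited by the child
    proc = subprocess.run([sys.executable, '-c', CHILD, arg],
                          env=env, stdout=subprocess.PIPE,
                          universal_newlines=True)    # decode output as text
    return proc.stdout.strip(), proc.returncode       # output stream and exit code
```

Calling ask_child('spam', 'hello') should hand back the child's printed line along with a zero exit status; nothing here requires shared memory between the two programs.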

Beyond this set, there are other tools in the Python library for performing Inter-Process

Communication (IPC). This includes sockets, shared memory, signals, anonymous and

named pipes, and more. Some vary in portability, and all vary in complexity and utility.

For instance:

• Signals allow programs to send simple notification events to other programs.

• Anonymous pipes allow threads and related processes that share file descriptors to pass data, but generally rely on the Unix-like forking model for processes, which is not universally portable.

• Named pipes are mapped to the system’s filesystem; they allow completely unrelated programs to converse, but are not available in Python on all platforms.

• Sockets map to system-wide port numbers; they similarly let us transfer data between arbitrary programs running on the same computer, but also between programs located on remote networked machines, and offer a more portable option.

While some of these can be used as communication devices by threads, too, their full

power becomes more evident when leveraged by separate processes which do not share

memory at large.

In this section, we explore directly managed pipes (both anonymous and named), as

well as signals. We also take a first look at sockets here, but largely as a preview; sockets

can be used for IPC on a single machine, but because the larger socket story also involves

their role in networking, we’ll save most of their details until the Internet part of this

book.

Other IPC tools are available to Python programmers (e.g., shared memory as provided

by the mmap module) but are not covered here for lack of space; search the Python

manuals and website for more details on other IPC schemes if you’re looking for some-

thing more specific.

After this section, we’ll also study the multiprocessing module, which offers additional

and portable IPC options as part of its general process-launching API, including shared

memory, and pipes and queues of arbitrary pickled Python objects. For now, let’s study

traditional approaches first.

Anonymous Pipes

Pipes, a cross-program communication device, are implemented by your operating

system and made available in the Python standard library. Pipes are unidirectional

channels that work something like a shared memory buffer, but with an interface re-

sembling a simple file on each of two ends. In typical use, one program writes data on

one end of the pipe, and another reads that data on the other end. Each program sees

only its end of the pipe and processes it using normal Python file calls.

Pipes are much more within the operating system, though. For instance, calls to read

a pipe will normally block the caller until data becomes available (i.e., is sent by the

program on the other end) instead of returning an end-of-file indicator. Moreover, read

calls on a pipe always return the oldest data written to the pipe, resulting in a first-in-

first-out model—the first data written is the first to be read. Because of such properties,

pipes are also a way to synchronize the execution of independent programs.
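The first-in-first-out behavior is easy to observe even within a single process. The sketch below is an illustration only (the fifo_demo name and the fixed four-byte messages are assumptions chosen so that fixed-size reads line up with the writes); it also closes the write end before reading, because a read on an empty pipe reports end-of-file only when no writer remains, and otherwise blocks:

```python
import os

def fifo_demo():
    readfd, writefd = os.pipe()              # (input end, output end)
    for word in (b'msg1', b'msg2', b'msg3'):
        os.write(writefd, word)              # queued inside the operating system
    os.close(writefd)                        # no writers left: reader can reach end-of-file
    received = []
    while True:
        chunk = os.read(readfd, 4)           # at most 4 bytes, oldest data first
        if not chunk:                        # b'' means end-of-file, not "no data yet"
            break
        received.append(chunk)
    os.close(readfd)
    return received                          # [b'msg1', b'msg2', b'msg3']
```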

Pipes come in two flavors—anonymous and named. Named pipes (often called fifos)

are represented by a file on your computer. Because named pipes are really external

files, the communicating processes need not be related at all; in fact, they can be inde-

pendently started programs.

By contrast, anonymous pipes exist only within processes and are typically used in

conjunction with process forks as a way to link parent and spawned child processes

within an application. Parent and child converse over shared pipe file descriptors, which

are inherited by spawned processes. Because threads run in the same process and share

all global memory in general, anonymous pipes apply to them as well.

Anonymous pipe basics

Since they are more traditional, let’s start with a look at anonymous pipes. To illustrate,

the script in Example 5-19 uses the os.fork call to make a copy of the calling process

as usual (we met forks earlier in this chapter). After forking, the original parent process

and its child copy speak through the two ends of a pipe created with os.pipe prior to

the fork. The os.pipe call returns a tuple of two file descriptors—the low-level file iden-

tifiers we met in Chapter 4—representing the input and output sides of the pipe. Be-

cause forked child processes get copies of their parents’ file descriptors, writing to the

pipe’s output descriptor in the child sends data back to the parent on the pipe created

before the child was spawned.

Example 5-19. PP4E\System\Processes\pipe1.py

import os, time

def child(pipeout):
    zzz = 0
    while True:
        time.sleep(zzz)                          # make parent wait
        msg = ('Spam %03d' % zzz).encode()       # pipes are binary bytes
        os.write(pipeout, msg)                   # send to parent
        zzz = (zzz+1) % 5                        # goto 0 after 4

def parent():
    pipein, pipeout = os.pipe()                  # make 2-ended pipe
    if os.fork() == 0:                           # copy this process
        child(pipeout)                           # in copy, run child
    else:                                        # in parent, listen to pipe
        while True:
            line = os.read(pipein, 32)           # blocks until data sent
            print('Parent %d got [%s] at %s' % (os.getpid(), line, time.time()))

parent()

If you run this program on Linux, Cygwin, or another Unix-like platform (pipe is avail-

able on standard Windows Python, but fork is not), the parent process waits for the

child to send data on the pipe each time it calls os.read. It’s almost as if the child and

parent act as client and server here—the parent starts the child and waits for it to initiate

communication.# To simulate differing task durations, the child keeps the parent wait-

ing one second longer between messages with time.sleep calls, until the delay has

reached four seconds. When the zzz delay counter hits 005, it rolls back down to 000

and starts again:

[C:\...\PP4E\System\Processes]$ python pipe1.py
Parent 6716 got [b'Spam 000'] at 1267996104.53
Parent 6716 got [b'Spam 001'] at 1267996105.54
Parent 6716 got [b'Spam 002'] at 1267996107.55
Parent 6716 got [b'Spam 003'] at 1267996110.56
Parent 6716 got [b'Spam 004'] at 1267996114.57
Parent 6716 got [b'Spam 000'] at 1267996114.57
Parent 6716 got [b'Spam 001'] at 1267996115.59
Parent 6716 got [b'Spam 002'] at 1267996117.6
Parent 6716 got [b'Spam 003'] at 1267996120.61
Parent 6716 got [b'Spam 004'] at 1267996124.62
Parent 6716 got [b'Spam 000'] at 1267996124.62
Parent 6716 got [b'Spam 001'] at 1267996125.63
...etc.: Ctrl-C to exit...

#We will clarify the notions of “client” and “server” in the Internet programming part of this book. There, we’ll communicate with sockets (which we’ll see later in this chapter are roughly like bidirectional pipes for programs running both across networks and on the same machine), but the overall conversation model is similar. Named pipes (fifos), described ahead, are also a better match to the client/server model because they can be accessed by arbitrary, unrelated processes (no forks are required). But as we’ll see, the socket port model is generally used by most Internet scripting protocols; email, for instance, is mostly just formatted strings shipped over sockets between programs on standard port numbers reserved for the email protocol.

Notice how the parent received a bytes string through the pipe. Raw pipes normally

deal in binary byte strings when their descriptors are used directly this way with the

descriptor-based file tools we met in Chapter 4 (as we saw there, descriptor read and

write tools in os always return and expect byte strings). That’s why we also have to

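manually encode to bytes when writing in the child: the string formatting operation was not available on bytes in the Python 3.X release used for this book (% formatting for bytes was added later, in Python 3.5). As the next section shows, it’s also possible to wrap a pipe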

descriptor in a text-mode file object, much as we did in the file examples in Chap-

ter 4, but that object simply performs encoding and decoding automatically on trans-

fers; it’s still bytes in the pipe.

Wrapping pipe descriptors in file objects

If you look closely at the preceding output, you’ll see that when the child’s delay counter

hits 004, the parent ends up reading two messages from the pipe at the same time; the

child wrote two distinct messages, but on some platforms or configurations (other than

that used here) they might be interleaved or processed close enough in time to be fetched

as a single unit by the parent. Really, the parent blindly asks to read, at most, 32 bytes

each time, but it gets back whatever text is available in the pipe, when it becomes

available.
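This coalescing can be reproduced deterministically in a single process, where both messages are certain to be queued before the read is issued. The following is only a sketch (coalesced_read is a made-up name), but it shows why a byte-count read cannot by itself recover message boundaries:

```python
import os

def coalesced_read():
    readfd, writefd = os.pipe()
    os.write(writefd, b'Spam 004')           # two separate writes...
    os.write(writefd, b'Spam 000')
    data = os.read(readfd, 32)               # ...one read: up to 32 bytes of whatever is queued
    os.close(readfd)
    os.close(writefd)
    return data                              # b'Spam 004Spam 000'
```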

To distinguish messages better, we can mandate a separator character in the pipe. An

end-of-line makes this easy, because we can wrap the pipe descriptor in a file object

with os.fdopen and rely on the file object’s readline method to scan up through the

next \n separator in the pipe. This also lets us leverage the more powerful tools of the

text-mode file object we met in Chapter 4. Example 5-20 implements this scheme for

the parent’s end of the pipe.

Example 5-20. PP4E\System\Processes\pipe2.py

# same as pipe1.py, but wrap pipe input in stdio file object
# to read by line, and close unused pipe fds in both processes

import os, time

def child(pipeout):
    zzz = 0
    while True:
        time.sleep(zzz)                          # make parent wait
        msg = ('Spam %03d\n' % zzz).encode()     # pipes are binary in 3.X
        os.write(pipeout, msg)                   # send to parent
        zzz = (zzz+1) % 5                        # roll to 0 at 5

def parent():
    pipein, pipeout = os.pipe()                  # make 2-ended pipe
    if os.fork() == 0:                           # in child, write to pipe
        os.close(pipein)                         # close input side here
        child(pipeout)
    else:                                        # in parent, listen to pipe
        os.close(pipeout)                        # close output side here
        pipein = os.fdopen(pipein)               # make text mode input file object
        while True:
            line = pipein.readline()[:-1]        # blocks until data sent
            print('Parent %d got [%s] at %s' % (os.getpid(), line, time.time()))

parent()

This version has also been augmented to close the unused end of the pipe in each process

(e.g., after the fork, the parent process closes its copy of the output side of the pipe

written by the child); programs should close unused pipe ends in general. Running with

this new version reliably returns a single child message to the parent each time it reads

from the pipe, because they are separated with markers when written:

[C:\...\PP4E\System\Processes]$ python pipe2.py

Parent 8204 got [Spam 000] at 1267997789.33

Parent 8204 got [Spam 001] at 1267997790.03

Parent 8204 got [Spam 002] at 1267997792.05

Parent 8204 got [Spam 003] at 1267997795.06

Parent 8204 got [Spam 004] at 1267997799.07

Parent 8204 got [Spam 000] at 1267997799.07

Parent 8204 got [Spam 001] at 1267997800.08

Parent 8204 got [Spam 002] at 1267997802.09

Parent 8204 got [Spam 003] at 1267997805.1

Parent 8204 got [Spam 004] at 1267997809.11

Parent 8204 got [Spam 000] at 1267997809.11

Parent 8204 got [Spam 001] at 1267997810.13

...etc.: Ctrl-C to exit...

Notice that this version’s reads also return a text data str object now, per the default

r text mode for os.fdopen. As mentioned, pipes normally deal in binary byte strings

when their descriptors are used directly with os file tools, but wrapping in text-mode

files allows us to use str strings to represent text data instead of bytes. In this example,

bytes are decoded to str when read by the parent; using os.fdopen and text mode in

the child would allow us to avoid its manual encoding call, but the file object would

encode the str data anyhow (though the encoding is trivial for ASCII bytes like those

used here). As for simple files, the best mode for processing pipe data in is determined

by its nature.
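As a rough sketch of the alternative just mentioned, both ends of a pipe can be wrapped in text-mode file objects so that neither side encodes or decodes by hand. This is a single-process illustration with a made-up function name, not a replacement for the forking examples above:

```python
import os

def text_pipe_demo():
    readfd, writefd = os.pipe()
    writer = os.fdopen(writefd, 'w')         # text-mode wrapper: encodes str on write
    reader = os.fdopen(readfd, 'r')          # text-mode wrapper: decodes bytes on read
    for i in range(3):
        writer.write('Spam %03d\n' % i)      # no manual .encode() call needed
    writer.close()                           # flushes buffered text, closes the write end
    lines = [line.rstrip('\n') for line in reader]    # read by line until end-of-file
    reader.close()
    return lines                             # ['Spam 000', 'Spam 001', 'Spam 002']
```

Note that the writer's text is buffered until it is flushed or closed; the pipe itself still carries only bytes.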

Anonymous pipes and threads

Although the os.fork call required by the prior section’s examples isn’t available on

standard Windows Python, os.pipe is. Because threads all run in the same process and

share file descriptors (and global memory in general), this makes anonymous pipes

usable as a communication and synchronization device for threads, too. This is an

arguably lower-level mechanism than queues or shared names and objects, but it pro-

vides an additional IPC option for threads. Example 5-21, for instance, demonstrates

the same type of pipe-based communication occurring between threads instead of

processes.

Example 5-21. PP4E\System\Processes\pipe-thread.py

# anonymous pipes and threads, not processes; this version works on Windows

import os, time, threading

def child(pipeout):
    zzz = 0
    while True:
        time.sleep(zzz)                          # make parent wait
        msg = ('Spam %03d' % zzz).encode()       # pipes are binary bytes
        os.write(pipeout, msg)                   # send to parent
        zzz = (zzz+1) % 5                        # goto 0 after 4

def parent(pipein):
    while True:
        line = os.read(pipein, 32)               # blocks until data sent
        print('Parent %d got [%s] at %s' % (os.getpid(), line, time.time()))

pipein, pipeout = os.pipe()
threading.Thread(target=child, args=(pipeout,)).start()
parent(pipein)

Since threads work on standard Windows Python, this script does too. The output is

similar here, but the speakers are in-process threads, not processes (note that because

of its simple-minded infinite loops, at least one of its threads may not die on a Ctrl-C—

on Windows you may need to use Task Manager to kill the python.exe process running

this script or close its window to exit):

C:\...\PP4E\System\Processes> pipe-thread.py

Parent 8876 got [b'Spam 000'] at 1268579215.71

Parent 8876 got [b'Spam 001'] at 1268579216.73

Parent 8876 got [b'Spam 002'] at 1268579218.74

Parent 8876 got [b'Spam 003'] at 1268579221.75

Parent 8876 got [b'Spam 004'] at 1268579225.76

Parent 8876 got [b'Spam 000'] at 1268579225.76

Parent 8876 got [b'Spam 001'] at 1268579226.77

Parent 8876 got [b'Spam 002'] at 1268579228.79

...etc.: Ctrl-C or Task Manager to exit...
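One way to sidestep the shutdown problem noted above is to give the writer a finite amount of work and let end-of-file on the pipe end the reader's loop. The variant below is a sketch under those assumptions (the count parameter, the run function, and the use of a daemon thread are not part of Example 5-21):

```python
import os, threading

def child(pipeout, count):
    for i in range(count):
        os.write(pipeout, ('Spam %03d\n' % i).encode())
    os.close(pipeout)                        # closing the write end signals end-of-file

def run(count=3):
    pipein, pipeout = os.pipe()
    worker = threading.Thread(target=child, args=(pipeout, count), daemon=True)
    worker.start()
    with os.fdopen(pipein) as reader:        # text-mode, line-oriented reads
        messages = [line.rstrip('\n') for line in reader]
    worker.join()
    return messages
```

Because the writer thread is marked as a daemon, it would not keep the program alive if the main thread exited early.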

Bidirectional IPC with anonymous pipes

Pipes normally let data flow in only one direction—one side is input, one is output.

What if you need your programs to talk back and forth, though? For example, one

program might send another a request for information and then wait for that informa-

tion to be sent back. A single pipe can’t generally handle such bidirectional conversa-

tions, but two pipes can. One pipe can be used to pass requests to a program and

another can be used to ship replies back to the requestor.
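The two-pipe arrangement can be sketched portably with a thread standing in for the second program. Everything here (the server and converse names, and the upper-casing "service") is hypothetical and chosen only to show one pipe carrying requests and another carrying replies:

```python
import os, threading

def server(requests_in, replies_out):
    """Hypothetical responder: upper-cases each request line until end-of-file."""
    with os.fdopen(requests_in) as requests, os.fdopen(replies_out, 'w') as replies:
        for line in requests:
            replies.write(line.upper())
            replies.flush()                  # push each reply through immediately

def converse(questions):
    req_in, req_out = os.pipe()              # pipe 1: requests, client to server
    rep_in, rep_out = os.pipe()              # pipe 2: replies, server to client
    threading.Thread(target=server, args=(req_in, rep_out), daemon=True).start()
    answers = []
    with os.fdopen(req_out, 'w') as requests, os.fdopen(rep_in) as replies:
        for question in questions:
            requests.write(question + '\n')
            requests.flush()                 # otherwise the server never sees the request
            answers.append(replies.readline().rstrip('\n'))
    return answers
```

Each side writes on one pipe and reads on the other, and each write is flushed before the matching read is attempted.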

This really does have real-world applications. For instance, I once added a GUI interface

to a command-line debugger for a C-like programming language by connecting two

processes with pipes this way. The GUI ran as a separate process that constructed and

sent commands to the non-GUI debugger’s input stream pipe and parsed the results

that showed up in the debugger’s output stream pipe. In effect, the GUI acted like a

programmer typing commands at a keyboard and a client to the debugger server. More

generally, by spawning command-line programs with streams attached by pipes, sys-

tems can add new interfaces to legacy programs. In fact, we’ll see a simple example of

this sort of GUI program structure in Chapter 10.

The module in Example 5-22 demonstrates one way to apply this idea to link the input

and output streams of two programs. Its spawn function forks a new child program and

connects the input and output streams of the parent to the output and input streams

of the child. That is:

• When the parent reads from its standard input, it is reading text sent to the child’s

standard output.

• When the parent writes to its standard output, it is sending data to the child’s

standard input.

The net effect is that the two independent programs communicate by speaking over

their standard streams.

Example 5-22. PP4E\System\Processes\pipes.py

"""

spawn a child process/program, connect my stdin/stdout to child process's
stdout/stdin--my reads and writes map to output and input streams of the
spawned program; much like tying together streams with subprocess module;
"""

import os, sys

def spawn(prog, *args):                       # pass progname, cmdline args
    stdinFd  = sys.stdin.fileno()             # get descriptors for streams
    stdoutFd = sys.stdout.fileno()            # normally stdin=0, stdout=1

    parentStdin, childStdout  = os.pipe()     # make two IPC pipe channels
    childStdin,  parentStdout = os.pipe()     # pipe returns (inputfd, outputfd)
    pid = os.fork()                           # make a copy of this process
    if pid:
        os.close(childStdout)                 # in parent process after fork:
        os.close(childStdin)                  # close child ends in parent
        os.dup2(parentStdin,  stdinFd)        # my sys.stdin copy  = pipe1[0]
        os.dup2(parentStdout, stdoutFd)       # my sys.stdout copy = pipe2[1]
    else:
        os.close(parentStdin)                 # in child process after fork:
        os.close(parentStdout)                # close parent ends in child
        os.dup2(childStdin,  stdinFd)         # my sys.stdin copy  = pipe2[0]
        os.dup2(childStdout, stdoutFd)        # my sys.stdout copy = pipe1[1]
        args = (prog,) + args
        os.execvp(prog, args)                 # new program in this process
        assert False, 'execvp failed!'        # os.exec call never returns here

if __name__ == '__main__':
    mypid = os.getpid()
    spawn('python', 'pipes-testchild.py', 'spam')     # fork child program

    print('Hello 1 from parent', mypid)               # to child's stdin
    sys.stdout.flush()                                # subvert stdio buffering
    reply = input()                                   # from child's stdout
    sys.stderr.write('Parent got: "%s"\n' % reply)    # stderr not tied to pipe!

    print('Hello 2 from parent', mypid)
    sys.stdout.flush()
    reply = sys.stdin.readline()
    sys.stderr.write('Parent got: "%s"\n' % reply[:-1])

The spawn function in this module does not work on standard Windows Python (re-

member that fork isn’t yet available there today). In fact, most of the calls in this module

map straight to Unix system calls (and may be arbitrarily terrifying at first glance to

non-Unix developers!). We’ve already met some of these (e.g., os.fork), but much of

this code depends on Unix concepts we don’t have time to address well in this text.

But in simple terms, here is a brief summary of the system calls demonstrated in this

code:

os.fork

Copies the calling process as usual and returns the child’s process ID in the parent

process only.

os.execvp

Overlays a new program in the calling process; it’s just like the os.execlp used

earlier but takes a tuple or list of command-line argument strings (collected with

the *args form in the function header).

os.pipe

Returns a tuple of file descriptors representing the input and output ends of a pipe,

as in earlier examples.

os.close(fd)

Closes the descriptor-based file fd.

os.dup2(fd1,fd2)

Copies all system information associated with the file named by the file descriptor

fd1 to the file named by fd2.
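Of these calls, os.dup2 is probably the least familiar, so here is a small single-process sketch of its effect. It temporarily points descriptor 1 (normally stdout) at a pipe and then restores it; the capture_fd_output name is an assumption for this illustration, and os.dup is used to save the original descriptor:

```python
import os

def capture_fd_output(text):
    """Temporarily point descriptor 1 (stdout) at a pipe, then read what was written."""
    readfd, writefd = os.pipe()
    saved = os.dup(1)                        # keep a copy of the real stdout descriptor
    os.dup2(writefd, 1)                      # descriptor 1 now refers to the pipe's output end
    try:
        os.write(1, text.encode())           # low-level write: no sys.stdout buffering involved
    finally:
        os.dup2(saved, 1)                    # restore the original stdout
        os.close(saved)
    os.close(writefd)
    data = os.read(readfd, 1024)             # the text went into the pipe, not the screen
    os.close(readfd)
    return data.decode()
```

The write goes to whatever descriptor 1 currently names, which is the same mechanism the spawn function relies on to reroute standard streams.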

In terms of connecting standard streams, os.dup2 is the real nitty-gritty here. For ex-

ample, the call os.dup2(parentStdin,stdinFd) essentially assigns the parent process’s

stdin file to the input end of one of the two pipes created; all stdin reads will henceforth

come from the pipe. By connecting the other end of this pipe to the child process’s copy

of the stdout stream file with os.dup2(childStdout,stdoutFd), text written by the child

to its stdout winds up being routed through the pipe to the parent’s stdin stream. The

effect is reminiscent of the way we tied together streams with the subprocess module

in Chapter 3, but this script is more low-level and less portable.

To test this utility, the self-test code at the end of the file spawns the program shown

in Example 5-23 in a child process and reads and writes standard streams to converse

with it over two pipes.

Example 5-23. PP4E\System\Processes\pipes-testchild.py

import os, time, sys
mypid     = os.getpid()
parentpid = os.getppid()
sys.stderr.write('Child %d of %d got arg: "%s"\n' %
                                (mypid, parentpid, sys.argv[1]))
for i in range(2):
    time.sleep(3)              # make parent process wait by sleeping here
    recv = input()             # stdin tied to pipe: comes from parent's stdout
    time.sleep(3)
    send = 'Child %d got: [%s]' % (mypid, recv)
    print(send)                # stdout tied to pipe: goes to parent's stdin
    sys.stdout.flush()         # make sure it's sent now or else process blocks

The following is our test in action on Cygwin (it’s similar on other Unix-like platforms like

Linux); its output is not incredibly impressive to read, but it represents two programs

running independently and shipping data back and forth through a pipe device man-

aged by the operating system. This is even more like a client/server model (if you imag-

ine the child as the server, responding to requests sent from the parent). The text in

square brackets in this output went from the parent process to the child and back to

the parent again, all through pipes connected to standard streams:

[C:\...\PP4E\System\Processes]$ python pipes.py

Child 9228 of 9096 got arg: "spam"

Parent got: "Child 9228 got: [Hello 1 from parent 9096]"

Parent got: "Child 9228 got: [Hello 2 from parent 9096]"
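For comparison, a similar request/reply dialog can be sketched with the more portable subprocess module, which creates the pipes and spawns the child without explicit fork, dup2, or exec calls. The inline child program and the dialog helper below are assumptions made for this illustration, not the book's pipes-testchild.py:

```python
import subprocess, sys

# hypothetical child program: echoes each input line back in brackets
CHILD = r'''
import sys
for line in sys.stdin:
    print('Child got: [%s]' % line.rstrip('\n'))
    sys.stdout.flush()
'''

def dialog(messages):
    proc = subprocess.Popen([sys.executable, '-c', CHILD],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            universal_newlines=True)     # text-mode pipe wrappers
    replies = []
    for msg in messages:
        proc.stdin.write(msg + '\n')         # goes to the child's stdin
        proc.stdin.flush()
        replies.append(proc.stdout.readline().rstrip('\n'))   # from the child's stdout
    proc.stdin.close()                       # end-of-file ends the child's loop
    proc.wait()
    proc.stdout.close()
    return replies
```

Unlike the spawn function, this version leaves the parent's own standard streams untouched and talks to the child through the Popen object's stdin and stdout attributes.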

Output stream buffering revisited: Deadlocks and flushes

The two processes of the prior section’s example engage in a simple dialog, but it’s

already enough to illustrate some of the dangers lurking in cross-program communi-

cations. First of all, notice that both programs need to write to stderr to display a

message; their stdout streams are tied to the other program’s input stream. Because

processes share file descriptors, stderr is the same in both parent and child, so status

messages show up in the same place.

More subtly, note that both parent and child call sys.stdout.flush after they print text

to the output stream. Input requests on pipes normally block the caller if no data is

available, but it seems that this shouldn’t be a problem in our example because there

are as many writes as there are reads on the other side of the pipe. By default, though,

sys.stdout is buffered in this context, so the printed text may not actually be transmitted

until some time in the future (when the output buffers fill up). In fact, if the flush calls

are not made, both processes may get stuck on some platforms waiting for input from

the other—input that is sitting in a buffer and is never flushed out over the pipe. They

wind up in a deadlock state, both blocked on input calls waiting for events that never

occur.

Technically, by default stdout is just line-buffered when connected to a terminal, but

it is fully buffered when connected to other devices such as files, sockets, and the pipes

used here. This is why you see a script’s printed text in a shell window immediately as

it is produced, but not until the process exits or its buffer fills when its output stream

is connected to something else.

This output buffering is really a function of the system libraries used to access pipes,

not of the pipes themselves (pipes do queue up output data, but they never hide it from

readers!). In fact, it appears to occur in this example only because we copy the pipe’s

information over to sys.stdout, a built-in file object that uses stream buffering by de-

fault. However, such anomalies can also occur when using other cross-process tools.

In general terms, if your programs engage in a two-way dialog like this, there are a

variety of ways to avoid buffering-related deadlock problems:

• Flushes: As demonstrated in Examples 5-22 and 5-23, manually flushing output pipe streams by calling the file object flush method is an easy way to force buffers to be cleared. Use sys.stdout.flush for the output stream used by print.

• Arguments: As introduced earlier in this chapter, the -u Python command-line flag turns off full buffering for the sys.stdout stream in Python programs. Setting your PYTHONUNBUFFERED environment variable to a nonempty value is equivalent to passing this flag but applies to every program run.

• Open modes: It’s possible to use pipes themselves in unbuffered mode. Either use low-level os module calls to read and write pipe descriptors directly, or pass a buffer size argument of 0 (for unbuffered) or 1 (for line-buffered) to os.fdopen to disable buffering in the file object used to wrap the descriptor. You can use open arguments the same way to control buffering for output to fifo files (described in the next section). Note that in Python 3.X, fully unbuffered mode is allowed only for binary mode files, not text.

• Command pipes: As mentioned earlier in this chapter, you can similarly specify buffering mode arguments for command-line pipes when they are created by os.popen and subprocess.Popen, but this pertains to the caller’s end of the pipe, not those of the spawned program. Hence it cannot prevent delayed outputs from the latter, but can be used for text sent to another program’s input pipe.

• Sockets: As we’ll see later, the socket.makefile call accepts a similar buffering mode argument for sockets (described later in this chapter and book), but in Python 3.X this call requires buffering for text-mode access and appears to not support line-buffered mode (more on this in Chapter 12).

• Tools: For more complex tasks, we can also use higher-level tools that essentially fool a program into believing it is connected to a terminal. These address programs not written in Python, for which neither manual flush calls nor -u are an option. See “More on Stream Buffering: pty and Pexpect” ahead.
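The open-modes option in this list can be sketched as follows. The three function names are invented for this illustration; the calls themselves are the standard os.fdopen and open, with the buffering argument passed positionally:

```python
import os

def line_buffered_demo():
    readfd, writefd = os.pipe()
    writer = os.fdopen(writefd, 'w', 1)      # buffering=1: text is flushed at each newline
    writer.write('Spam 000\n')               # no explicit flush() call here
    data = os.read(readfd, 32)               # already in the pipe, so this does not hang
    writer.close()
    os.close(readfd)
    return data

def unbuffered_demo():
    readfd, writefd = os.pipe()
    writer = os.fdopen(writefd, 'wb', 0)     # buffering=0: allowed for binary mode only
    writer.write(b'Spam 001')                # goes straight to the descriptor
    data = os.read(readfd, 32)
    writer.close()
    os.close(readfd)
    return data

def unbuffered_text_allowed():
    try:
        open(os.devnull, 'w', 0).close()     # text mode plus buffering=0 is rejected
        return True
    except ValueError:
        return False
```

In both demos the data is readable from the other end immediately after the write, with no flush call; the third function simply confirms that Python 3.X refuses unbuffered text mode.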

Threads can avoid blocking a main GUI, too, but really just delegate the problem (the

spawned thread will still be deadlocked). Of the options listed, the first two—manual

flushes and command-line arguments—are often the simplest solutions. In fact, be-

cause it is so useful, the second technique listed above merits a few more words. Try

this: comment-out all the sys.stdout.flush calls in Examples 5-22 and 5-23 (the files

pipes.py and pipes-testchild.py) and change the parent’s spawn call in pipes.py to this

(i.e., add a -u command-line argument):

spawn('python', '-u', 'pipes-testchild.py', 'spam')

Then start the program with a command line like this: python -u pipes.py. It will work

as it did with the manual stdout flush calls, because stdout will be operating in unbuf-

fered mode in both parent and child.
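The same idea can be tried in miniature with subprocess. In this sketch the child (an inline script assumed here for illustration) prints a line and then waits for input without ever calling flush; because it is started with -u, its output reaches the parent right away. Without -u, this particular exchange would be expected to stall, since the child's first line would sit in its output buffer while both sides wait:

```python
import subprocess, sys

# hypothetical child: prints a line, then waits for a reply; never calls flush
CHILD = r'''
import sys
print('ready')
reply = sys.stdin.readline()
print('child got ' + reply.rstrip('\n'))
'''

def unbuffered_dialog():
    proc = subprocess.Popen([sys.executable, '-u', '-c', CHILD],    # note the -u flag
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            universal_newlines=True)
    first = proc.stdout.readline().rstrip('\n')     # arrives before the child exits
    proc.stdin.write('spam\n')
    proc.stdin.flush()                              # the parent's own end still needs a flush
    second = proc.stdout.readline().rstrip('\n')
    proc.stdin.close()
    proc.wait()
    proc.stdout.close()
    return first, second
```

Note that -u affects only the spawned Python program; the parent's write to the child's input pipe is a separate buffer and is flushed explicitly here.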

We’ll revisit the effects of unbuffered output streams in Chapter 10, where we’ll code

a simple GUI that displays the output of a non-GUI program by reading it over both a

nonblocking socket and a pipe in a thread. We’ll explore the topic again in more depth

in Chapter 12, where we will redirect standard streams to sockets in more general ways.

Deadlock in general, though, is a bigger problem than we have space to address fully

here. On the other hand, if you know enough that you want to do IPC in Python, you’re

probably already a veteran of the deadlock wars.

Anonymous pipes allow related tasks to communicate but are not directly suited for

independently launched programs. To allow the latter group to converse, we need to

move on to the next section and explore devices that have broader visibility.

More on Stream Buffering: pty and Pexpect

On Unix-like platforms, you may also be able to use the Python pty standard library

module to force another program’s standard output to be unbuffered, especially if it’s

not a Python program and you cannot change its code.

Technically, default buffering for stdout in other programs is determined outside Py-

thon by whether the underlying file descriptor refers to a terminal. This occurs in the

stdio file system library and cannot be controlled by the spawning program. In general,

output to terminals is line buffered, and output to nonterminals (including files, pipes,

and sockets) is fully buffered. This policy is used for efficiency. Files and streams created

within a Python script follow the same defaults, but you can specify buffering policies

in Python’s file creation tools.

The pty module essentially fools the spawned program into thinking it is connected to

a terminal so that only one line is buffered for stdout. The net effect is that each newline

flushes the prior line—typical of interactive programs, and what you need if you wish

to grab each piece of the printed output as it is produced.

Note, however, that the pty module is not required for this role when spawning Python

scripts with pipes: simply use the -u Python command-line flag, pass line-buffered mode

arguments to file creation tools, or manually call sys.stdout.flush() in the spawned

program. The pty module is also not available on all Python platforms today (most

notably, it runs on Cygwin but not the standard Windows Python).

The Pexpect package, a pure-Python equivalent of the Unix expect program, uses pty

to provide additional functionality and to handle interactions that bypass standard

streams (e.g., password inputs). See the Python library manual for more on pty, and

search the Web for Pexpect.
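As a rough, Unix-only sketch of the pty idea, the code below attaches a child's stdout to the slave side of a pseudo-terminal created with pty.openpty and reads the result from the master side. The child here is a one-line Python script used only to report whether it believes its output is a terminal; the function name is an assumption, and real uses would target non-Python programs:

```python
import os, pty, subprocess, sys

def child_sees_terminal():
    """Unix-only sketch: run a child with its stdout attached to a pseudo-terminal."""
    master, slave = pty.openpty()                    # (master fd, slave fd) pair
    proc = subprocess.Popen(
        [sys.executable, '-c', 'import sys; print(sys.stdout.isatty())'],
        stdout=slave)                                # child writes to the terminal side
    output = os.read(master, 1024)                   # parent reads from the master side
    proc.wait()
    os.close(slave)
    os.close(master)
    return output.decode().strip()
```

Because the child detects a terminal, its stdio layer would use line buffering rather than full buffering, which is the effect described above.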

Named Pipes (Fifos)

On some platforms, it is also possible to create a long-lived pipe that exists as a real

named file in the filesystem. Such files are called named pipes (or, sometimes, fifos)

because they behave just like the pipes created by the previous section’s programs.

Because fifos are associated with a real file on your computer, though, they are external

to any particular program—they do not rely on memory shared between tasks, and so

they can be used as an IPC mechanism for threads, processes, and independently

launched programs.

Once a named pipe file is created, clients open it by name and read and write data using

normal file operations. Fifos are unidirectional streams. In typical operation, a server

program reads data from the fifo, and one or more client programs write data to it. In

addition, a set of two fifos can be used to implement bidirectional communication just

as we did for anonymous pipes in the prior section.

Because fifos reside in the filesystem, they are longer-lived than in-process anonymous

pipes and can be accessed by programs started independently. The unnamed, in-

process pipe examples thus far depend on the fact that file descriptors (including pipes)

are copied to child processes’ memory. That makes it difficult to use anonymous pipes

to connect programs started independently. With fifos, pipes are accessed instead by

a filename visible to all programs running on the computer, regardless of any parent/

child process relationships. In fact, like normal files, fifos typically outlive the programs

that access them. Unlike normal files, though, the operating system synchronizes fifo

access, making them ideal for IPC.

Because of their distinctions, fifo pipes are better suited as general IPC mechanisms for

independent client and server programs. For instance, a perpetually running server

program may create and listen for requests on a fifo that can be accessed later by arbi-

trary clients not forked by the server. In a sense, fifos are an alternative to the socket

port interface we’ll meet in the next section. Unlike sockets, though, fifos do not directly

support remote network connections, are not available in standard Windows Python

today, and are accessed using the standard file interface instead of the socket-specific

port numbers and calls we’ll study later.

Named pipe basics

In Python, named pipe files are created with the os.mkfifo call, which is available today

on Unix-like platforms, including Cygwin’s Python on Windows, but is not currently

available in standard Windows Python. This call creates only the external file, though;

to send and receive data through a fifo, it must be opened and processed as if it were a

standard file.

To illustrate, Example 5-24 is a derivation of the pipe2.py script listed in Exam-

ple 5-20, but rewritten here to use fifos rather than anonymous pipes. Much like

pipe2.py, this script opens the fifo using os.open in the child for low-level byte string

access, but with the open built-in in the parent to treat the pipe as text; in general, either

end may use either technique to treat the pipe’s data as bytes or text.

Example 5-24. PP4E\System\Processes\pipefifo.py

"""
named pipes; os.mkfifo is not available on Windows (without Cygwin);
there is no reason to fork here, since fifo file pipes are external
to processes--shared fds in parent/child processes are irrelevant;
"""

import os, time, sys
fifoname = '/tmp/pipefifo'                       # must open same name

def child():
    pipeout = os.open(fifoname, os.O_WRONLY)     # open fifo pipe file as fd
    zzz = 0
    while True:
        time.sleep(zzz)
        msg = ('Spam %03d\n' % zzz).encode()     # binary as opened here
        os.write(pipeout, msg)
        zzz = (zzz+1) % 5

def parent():
    pipein = open(fifoname, 'r')                 # open fifo as text file object
    while True:
        line = pipein.readline()[:-1]            # blocks until data sent
        print('Parent %d got "%s" at %s' % (os.getpid(), line, time.time()))

if __name__ == '__main__':
    if not os.path.exists(fifoname):
        os.mkfifo(fifoname)                      # create a named pipe file
    if len(sys.argv) == 1:
        parent()                                 # run as parent if no args
    else:                                        # else run as child process
        child()

Because the fifo exists independently of both parent and child, there’s no reason to fork

here. The child may be started independently of the parent as long as it opens a fifo file

by the same name. Here, for instance, on Cygwin the parent is started in one shell

window and then the child is started in another. Messages start appearing in the parent

window only after the child is started and begins writing messages onto the fifo file:

[C:\...\PP4E\System\Processes] $ python pipefifo.py # parent window

Parent 8324 got "Spam 000" at 1268003696.07

Parent 8324 got "Spam 001" at 1268003697.06

Parent 8324 got "Spam 002" at 1268003699.07

Parent 8324 got "Spam 003" at 1268003702.08

Parent 8324 got "Spam 004" at 1268003706.09

Parent 8324 got "Spam 000" at 1268003706.09

Parent 8324 got "Spam 001" at 1268003707.11

Parent 8324 got "Spam 002" at 1268003709.12

Parent 8324 got "Spam 003" at 1268003712.13

Parent 8324 got "Spam 004" at 1268003716.14

Parent 8324 got "Spam 000" at 1268003716.14

Parent 8324 got "Spam 001" at 1268003717.15

...etc: Ctrl-C to exit...

[C:\...\PP4E\System\Processes]$ file /tmp/pipefifo # child window

/tmp/pipefifo: fifo (named pipe)

[C:\...\PP4E\System\Processes]$ python pipefifo.py -child

...Ctrl-C to exit...

Named pipe use cases

By mapping communication points to a file system entity accessible to all programs run

on a machine, fifos can address a broad range of IPC goals on platforms where they are

supported. For instance, although this section’s example runs independent programs,

named pipes can also be used as an IPC device by both in-process threads and directly

forked related processes, much as we saw for anonymous pipes earlier.
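The in-process thread case mentioned above can be sketched as follows. This is a minimal illustration rather than one of the book's examples: the function name fifo_thread_demo, the temporary directory, and the message count are assumptions, and the sketch simply returns None on platforms that lack os.mkfifo.

```python
import os, tempfile, threading

def fifo_thread_demo(count=3):
    """Pass count text lines from a writer thread to the caller via a fifo."""
    if not hasattr(os, 'mkfifo'):                  # e.g., standard Windows Python
        return None
    tmpdir = tempfile.mkdtemp()                    # private directory for the fifo
    fifoname = os.path.join(tmpdir, 'demo_fifo')   # hypothetical fifo name
    os.mkfifo(fifoname)                            # create the named pipe file

    def writer():
        with open(fifoname, 'w') as pipeout:       # blocks until a reader opens
            for i in range(count):
                pipeout.write('Spam %03d\n' % i)   # closing the file signals end-of-file

    thread = threading.Thread(target=writer)
    thread.start()
    try:
        with open(fifoname, 'r') as pipein:        # blocks until the writer opens
            lines = [line.rstrip('\n') for line in pipein]   # read until writer closes
    finally:
        thread.join()
        os.remove(fifoname)                        # fifo files persist until removed
        os.rmdir(tmpdir)
    return lines

if __name__ == '__main__':
    print(fifo_thread_demo())
```

Because the fifo is addressed by filename, the same writer function could run unchanged in a separately launched program; only the thread startup code would differ.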

By also supporting unrelated programs, though, fifo files are more widely applicable to

general client/server models. For example, named pipes can make the GUI and

command-line debugger integration I described earlier for anonymous pipes even more

flexible—by using fifo files to connect the GUI to the non-GUI debugger’s streams, the

GUI could be started independently when needed.

Sockets provide similar functionality but also buy us both inherent network awareness

and broader portability to Windows—as the next section explains.

Sockets: A First Look

Sockets, implemented by the Python socket module, are a more general IPC device

than the pipes we’ve seen so far. Sockets let us transfer data between programs running

on the same computer, as well as programs located on remote networked machines.

When used as an IPC mechanism on the same machine, programs connect to sockets

by a machine-global port number and transfer data. When used as a networking con-

nection, programs provide both a machine name and port number to transfer data to

a remotely-running program.

Socket basics

Although sockets are one of the most commonly used IPC tools, it’s impossible to fully

grasp their API without also seeing its role in networking. Because of that, we’ll defer

most of our socket coverage until we can explore their use in network scripting in

Chapter 12. This section provides a brief introduction and preview, so you can compare

with the prior section’s named pipes (a.k.a. fifos). In short:

• Like fifos, sockets are global across a machine; they do not require shared memory

among threads or processes, and are thus applicable to independent programs.

• Unlike fifos, sockets are identified by port number, not filesystem path name; they

employ a very different nonfile API, though they can be wrapped in a file-like object;

and they are more portable: they work on nearly every Python platform, including

standard Windows Python.

In addition, sockets support networking roles that go beyond both IPC and this chap-

ter’s scope. To illustrate the basics, though, Example 5-25 launches a server and 5

clients in threads running in parallel on the same machine, to communicate over a

socket—because all threads connect to the same port, the server consumes the data

added by each of the clients.

Example 5-25. PP4E\System\Processes\socket_preview.py

"""
sockets for cross-task communication: start threads to communicate over sockets;
independent programs can too, because sockets are system-wide, much like fifos;
see the GUI and Internet parts of the book for more realistic socket use cases;
some socket servers may also need to talk to clients in threads or processes;
sockets pass byte strings, but can be pickled objects or encoded Unicode text;
caveat: prints in threads may need to be synchronized if their output overlaps;
"""

from socket import socket, AF_INET, SOCK_STREAM     # portable socket api

port = 50008                 # port number identifies socket on machine
host = 'localhost'           # server and client run on same local machine here

def server():
    sock = socket(AF_INET, SOCK_STREAM)         # ip addresses tcp connection
    sock.bind(('', port))                       # bind to port on this machine
    sock.listen(5)                              # allow up to 5 pending clients
    while True:
        conn, addr = sock.accept()              # wait for client to connect
        data = conn.recv(1024)                  # read bytes data from this client
        reply = 'server got: [%s]' % data       # conn is a new connected socket
        conn.send(reply.encode())               # send bytes reply back to client

def client(name):
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((host, port))                  # connect to a socket port
    sock.send(name.encode())                    # send bytes data to listener
    reply = sock.recv(1024)                     # receive bytes data from listener
    sock.close()                                # up to 1024 bytes in message
    print('client got: [%s]' % reply)

if __name__ == '__main__':
    from threading import Thread
    sthread = Thread(target=server)
    sthread.daemon = True                       # don't wait for server thread
    sthread.start()                             # do wait for children to exit
    for i in range(5):
        Thread(target=client, args=('client%s' % i,)).start()

Study this script’s code and comments to see how the socket objects’ methods are used

to transfer data. In a nutshell, with this type of socket the server accepts a client con-

nection, which by default blocks until a client requests service, and returns a new socket

connected to the client. Once connected, the client and server transfer byte strings by

using send and receive calls instead of writes and reads, though as we’ll see later in the

book, sockets can be wrapped in file objects much as we did earlier for pipe descriptors.

Also like pipe descriptors, unwrapped sockets deal in binary bytes strings, not text

str; that’s why string formatting results are manually encoded again here.

Here is this script’s output on Windows:

C:\...\PP4E\System\Processes> socket_preview.py

client got: [b"server got: [b'client1']"]

client got: [b"server got: [b'client3']"]

client got: [b"server got: [b'client4']"]

client got: [b"server got: [b'client2']"]

client got: [b"server got: [b'client0']"]

This output isn’t much to look at, but each line reflects data sent from client to server,

and then back again: the server receives a bytes string from a connected client and

echoes it back in a larger reply string. Because all threads run in parallel, the order in

which the clients are served varies from run to run.
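The nested b'...' quoting in this output appears because bytes objects are formatted with %s without first being decoded. The following variation is a sketch under assumptions, not the book's code: the helper names serve_once, ask, and demo are hypothetical, and the listener binds to port 0 so that the operating system picks a free port. It decodes received bytes to text before formatting them.

```python
from socket import socket, AF_INET, SOCK_STREAM
from threading import Thread

def serve_once(listener):
    conn, addr = listener.accept()               # wait for a single client
    with conn:
        text = conn.recv(1024).decode()          # bytes -> str before formatting
        conn.send(('server got: [%s]' % text).encode())

def ask(port, name):
    with socket(AF_INET, SOCK_STREAM) as sock:
        sock.connect(('127.0.0.1', port))
        sock.send(name.encode())                 # str -> bytes for transmission
        return sock.recv(1024).decode()          # bytes -> str for display

def demo():
    with socket(AF_INET, SOCK_STREAM) as listener:
        listener.bind(('127.0.0.1', 0))          # port 0: let the OS choose a free port
        listener.listen(1)
        port = listener.getsockname()[1]         # find out which port was assigned
        sthread = Thread(target=serve_once, args=(listener,))
        sthread.start()
        reply = ask(port, 'client0')
        sthread.join()
    return reply

if __name__ == '__main__':
    print('client got: [%s]' % demo())
```

With both ends decoding, the printed reply contains plain text instead of nested bytes literals. The default UTF-8 encoding is assumed on both sides here.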

Sockets and independent programs

Although sockets work for threads, the shared memory model of threads often allows

them to employ simpler communication devices such as shared names and objects and

queues. Sockets tend to shine brighter when used for IPC by separate processes and

independently launched programs. Example 5-26, for instance, reuses the server

and client functions of the prior example, but runs them in both processes and threads

of independently launched programs.

Example 5-26. PP4E\System\Processes\socket-preview-progs.py

"""
same socket, but talk between independent programs too, not just threads;
server here runs in a process and serves both process and thread clients;
sockets are machine-global, much like fifos: don't require shared memory
"""

from socket_preview import server, client        # both use same port number
import sys, os
from threading import Thread

mode = int(sys.argv[1])
if mode == 1:                                    # run server in this process
    server()
elif mode == 2:                                  # run client in this process
    client('client:process=%s' % os.getpid())
else:                                            # run 5 client threads in process
    for i in range(5):
        Thread(target=client, args=('client:thread=%s' % i,)).start()

Let’s run this script on Windows, too (again, this portability is a major advantage of

sockets). First, start the server in a process as an independently launched program in

its own window; this process runs perpetually waiting for clients to request connections

(and as for our prior pipe example you may need to use Task Manager or a window

close to kill the server process eventually):

C:\...\PP4E\System\Processes> socket-preview-progs.py 1

Now, in another window, run a few clients in both processes and threads, by launching

them as independent programs. Using 2 as the command-line argument runs a single

client process, but 3 spawns five threads to converse with the server in parallel:

C:\...\PP4E\System\Processes> socket-preview-progs.py 2

client got: [b"server got: [b'client:process=7384']"]

C:\...\PP4E\System\Processes> socket-preview-progs.py 2

client got: [b"server got: [b'client:process=7604']"]

C:\...\PP4E\System\Processes> socket-preview-progs.py 3

client got: [b"server got: [b'client:thread=1']"]

client got: [b"server got: [b'client:thread=2']"]

client got: [b"server got: [b'client:thread=0']"]

client got: [b"server got: [b'client:thread=3']"]

client got: [b"server got: [b'client:thread=4']"]

C:\..\PP4E\System\Processes> socket-preview-progs.py 3

client got: [b"server got: [b'client:thread=3']"]

client got: [b"server got: [b'client:thread=1']"]

client got: [b"server got: [b'client:thread=2']"]

client got: [b"server got: [b'client:thread=4']"]

client got: [b"server got: [b'client:thread=0']"]

C:\...\PP4E\System\Processes> socket-preview-progs.py 2

client got: [b"server got: [b'client:process=6428']"]

Socket use cases

This section’s examples illustrate the basic IPC role of sockets, but this only hints at

their full utility. Despite their seemingly limited byte string nature, higher-order use

cases for sockets are not difficult to imagine. With a little extra work, for instance:

• Arbitrary Python objects like lists and dictionaries (or at least copies of them) can

be transferred over sockets, too, by shipping the serialized byte strings produced

by Python’s pickle module introduced in Chapter 1 and covered in full in

Chapter 17.

• As we’ll see in Chapter 10, the printed output of a simple script can be redirected

to a GUI window, by connecting the script’s output stream to a socket on which

a GUI is listening in nonblocking mode.

• Programs that fetch arbitrary text off the Web might read it as byte strings over

sockets, but manually decode it using encoding names embedded in content-type

headers or tags in the data itself.

• In fact, the entire Internet can be seen as a socket use case—as we’ll see in Chap-

ter 12, at the bottom, email, FTP, and web pages are largely just formatted byte

string messages shipped over sockets.

Plus any other context in which programs exchange data—sockets are a general, port-

able, and flexible tool. For instance, they would provide the same utility as fifos for the

GUI/debugger example used earlier, but would also work in Python on Windows and

would even allow the GUI to connect to a debugger running on a different computer

altogether. As such, they are seen by many as a more powerful IPC tool.
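As a rough sketch of the first use case in the list above, shipping a pickled object over a socket, the following uses socket.socketpair to get two connected sockets within one process. The helper names send_object, recv_object, and demo and the sample dictionary are assumptions made for illustration. Note also that unpickling data from an untrusted peer is unsafe, so this pattern suits only programs that trust each other.

```python
import pickle, socket

def send_object(sock, obj):
    sock.sendall(pickle.dumps(obj))          # serialize to a byte string and ship it

def recv_object(sock):
    chunks = []
    while True:
        data = sock.recv(1024)               # read until the sender shuts down
        if not data:
            break
        chunks.append(data)
    return pickle.loads(b''.join(chunks))    # rebuild a copy of the original object

def demo():
    left, right = socket.socketpair()        # two connected sockets in one process
    with left, right:
        send_object(left, {'name': 'Bob', 'jobs': ['dev', 'mgr']})
        left.shutdown(socket.SHUT_WR)        # tell the reader no more data is coming
        return recv_object(right)

if __name__ == '__main__':
    print(demo())
```

The receiver gets an equal but distinct copy of the dictionary. Here the end of the message is marked by shutting down the sending side; a longer-lived connection would need its own message framing, such as a length prefix.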

Again, you should consider this section just a preview; because the grander socket story

also entails networking concepts, we’ll defer a more in-depth look at the socket API

until Chapter 12. We’ll also see sockets again briefly in Chapter 10 in the GUI stream

redirection use case listed above, and we’ll explore a variety of additional socket use

cases in the Internet part of this book. In Part IV, for instance, we’ll use sockets to

transfer entire files and write more robust socket servers that spawn threads or processes

to converse with clients to avoid denying connections. For the purposes of this chapter,

let’s move on to one last traditional IPC tool—the signal.

Signals

For lack of a better analogy, signals are a way to poke a stick at a process. Programs

generate signals to trigger a handler for that signal in another process. The operating

system pokes, too—some signals are generated on unusual system events and may kill

the program if not handled. If this sounds a little like raising exceptions in Python, it

should; signals are software-generated events and the cross-process analog of excep-

tions. Unlike exceptions, though, signals are identified by number, are not stacked, and

are really an asynchronous event mechanism outside the scope of the Python interpreter

controlled by the operating system.

In order to make signals available to scripts, Python provides a signal module that

allows Python programs to register Python functions as handlers for signal events. This

module is available on both Unix-like platforms and Windows (though the Windows

version may define fewer kinds of signals to be caught). To illustrate the basic signal

interface, the script in Example 5-27 installs a Python handler function for the signal

number passed in as a command-line argument.

Example 5-27. PP4E\System\Processes\signal1.py

"""
catch signals in Python; pass signal number N as a command-line arg,
use a "kill -N pid" shell command to send this process a signal; most
signal handlers restored by Python after caught (see network scripting
chapter for SIGCHLD details); on Windows, signal module is available,
but it defines only a few signal types there, and os.kill is missing;
"""

import sys, signal, time
def now(): return time.ctime(time.time())        # current time string

def onSignal(signum, stackframe):                # python signal handler
    print('Got signal', signum, 'at', now())     # most handlers stay in effect

signum = int(sys.argv[1])
signal.signal(signum, onSignal)                  # install signal handler
while True: signal.pause()                       # wait for signals (or: pass)

There are only two signal module calls at work here:

signal.signal

Takes a signal number and function object and installs that function to handle that

signal number when it is raised. Python automatically restores most signal handlers

when signals occur, so there is no need to call this function again within the signal

handler itself to reregister the handler. That is, except for SIGCHLD, a signal handler

remains installed until explicitly reset (e.g., by setting the handler to SIG_DFL to

restore default behavior or to SIG_IGN to ignore the signal). SIGCHLD behavior is

platform specific.

signal.pause

Makes the process sleep until the next signal is caught. A time.sleep call is similar

but doesn’t work with signals on my Linux box; it generates an interrupted system

call error. A busy while True: pass loop here would pause the script, too, but may

squander CPU resources.

Here is what this script looks like running on Cygwin on Windows (it works the same

on other Unix-like platforms like Linux): a signal number to watch for (12) is passed

in on the command line, and the program is made to run in the background with an

& shell operator (available in most Unix-like shells):

[C:\...\PP4E\System\Processes]$ python signal1.py 12 &
[1] 8224
$ ps
      PID    PPID    PGID     WINPID  TTY  UID    STIME COMMAND
I    8944       1    8944       8944  con 1004 18:09:54 /usr/bin/bash
     8224    7336    8224      10020  con 1004 18:26:47 /usr/local/bin/python
     8380    7336    8380        428  con 1004 18:26:50 /usr/bin/ps
$ kill -12 8224
Got signal 12 at Sun Mar 7 18:27:28 2010
$ kill -12 8224
Got signal 12 at Sun Mar 7 18:27:30 2010
$ kill -9 8224
[1]+ Killed python signal1.py 12

Inputs and outputs can be a bit jumbled here because the process prints to the same

screen used to type new shell commands. To send the program a signal, the kill shell

command takes a signal number and a process ID to be signaled (8224); every time a

new kill command sends a signal, the process replies with a message generated by a

Python signal handler function. Signal 9 always kills the process altogether.

The signal module also exports a signal.alarm function for scheduling a SIGALRM signal

to occur at some number of seconds in the future. To trigger and catch timeouts, set

the alarm and install a SIGALRM handler as shown in Example 5-28.

Example 5-28. PP4E\System\Processes\signal2.py

"""
set and catch alarm timeout signals in Python; time.sleep doesn't play
well with alarm (or signal in general in my Linux PC), so we call
signal.pause here to do nothing until a signal is received;
"""

import sys, signal, time
def now(): return time.asctime()

def onSignal(signum, stackframe):                # python signal handler
    print('Got alarm', signum, 'at', now())      # most handlers stay in effect

while True:
    print('Setting at', now())
    signal.signal(signal.SIGALRM, onSignal)      # install signal handler
    signal.alarm(5)                              # do signal in 5 seconds
    signal.pause()                               # wait for signals

Running this script on Cygwin on Windows causes its onSignal handler function to be

invoked every five seconds:

[C:\...\PP4E\System\Processes]$ python signal2.py

Setting at Sun Mar 7 18:37:10 2010

Got alarm 14 at Sun Mar 7 18:37:15 2010

Setting at Sun Mar 7 18:37:15 2010

Got alarm 14 at Sun Mar 7 18:37:20 2010

Setting at Sun Mar 7 18:37:20 2010

Got alarm 14 at Sun Mar 7 18:37:25 2010

Setting at Sun Mar 7 18:37:25 2010

Got alarm 14 at Sun Mar 7 18:37:30 2010

Setting at Sun Mar 7 18:37:30 2010

...Ctrl-C to exit...
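The same alarm mechanism can bound the running time of a call. The following is a minimal sketch, not part of the book's examples: the Timeout class and the run_with_timeout helper are hypothetical names, the code assumes it runs in the main thread (where Python allows signal handlers to be installed), and on platforms without SIGALRM it simply calls the function with no time limit.

```python
import signal, time

class Timeout(Exception):
    pass

def on_alarm(signum, stackframe):
    raise Timeout()                              # surfaces in the interrupted main thread

def run_with_timeout(func, seconds):
    """Call func(); return None if it runs longer than seconds (Unix-like only)."""
    if not hasattr(signal, 'SIGALRM'):           # no alarms in standard Windows Python
        return func()
    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.alarm(seconds)                        # schedule SIGALRM
    try:
        return func()
    except Timeout:
        return None
    finally:
        signal.alarm(0)                          # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the prior handler

if __name__ == '__main__':
    print(run_with_timeout(lambda: 'fast', 2))
    print(run_with_timeout(lambda: time.sleep(10), 1))
```

Raising an exception from the handler is one design choice among several; a handler could instead set a flag that the main code polls, which avoids interrupting the call at an arbitrary point.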

Generally speaking, signals must be used with cautions not made obvious by the ex-

amples we’ve just seen. For instance, some system calls don’t react well to being inter-

rupted by signals, and only the main thread can install signal handlers and respond to

signals in a multithreaded program.

When used well, though, signals provide an event-based communication mechanism.

They are less powerful than data streams such as pipes, but are sufficient in situations

in which you just need to tell a program that something important has occurred and

don’t need to pass along any details about the event itself. Signals are sometimes also

combined with other IPC tools. For example, an initial signal may inform a program

that a client wishes to communicate over a named pipe—the equivalent of tapping

someone’s shoulder to get their attention before speaking. Most platforms reserve one

or more SIGUSR signal numbers for user-defined events of this sort. Such an integration

structure is sometimes an alternative to running a blocking input call in a spawned

thread.

See also the os.kill(pid, sig) call for sending signals to known processes from within

a Python script on Unix-like platforms, much like the kill shell command used earlier;

the required process ID can be obtained from the os.fork call’s child process ID return

value or from other interfaces. Like os.fork, this call is also available in Cygwin Python,

but not in standard Windows Python. Also watch for the discussion about using signal

handlers to clean up “zombie” processes in Chapter 12.
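To illustrate os.kill without a second program, the sketch below has a process send a user-defined signal to itself. The names received, on_signal, and signal_self are assumptions made for this illustration; the code must run in the main thread, and it returns None where SIGUSR1 is not defined.

```python
import os, signal, time

received = []                                    # signal numbers seen by the handler

def on_signal(signum, stackframe):
    received.append(signum)

def signal_self():
    """Send SIGUSR1 to this process and report whether the handler ran."""
    if not hasattr(signal, 'SIGUSR1'):           # not defined in standard Windows Python
        return None
    previous = signal.signal(signal.SIGUSR1, on_signal)
    try:
        os.kill(os.getpid(), signal.SIGUSR1)     # like the kill shell command
        for _ in range(100):                     # delivery is asynchronous: wait briefly
            if received:
                break
            time.sleep(0.01)
    finally:
        signal.signal(signal.SIGUSR1, previous)  # restore the prior handler
    return signal.SIGUSR1 in received

if __name__ == '__main__':
    print(signal_self())
```

In a real client/server arrangement, the process ID passed to os.kill would belong to another process, obtained from os.fork or some other agreed-upon channel.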

The multiprocessing Module

Now that you know about IPC alternatives and have had a chance to explore processes,

threads, and both process nonportability and thread GIL limitations, it turns out that

there is another alternative, which aims to provide just the best of both worlds. As

mentioned earlier, Python’s standard library multiprocessing module package allows

scripts to spawn processes using an API very similar to the threading module.

This relatively new package works on both Unix and Windows, unlike low-level process

forks. It supports a process spawning model which is largely platform-neutral, and

provides tools for related goals, such as IPC, including locks, pipes, and queues. In

addition, because it uses processes instead of threads to run code in parallel, it effec-

tively works around the limitations of the thread GIL. Hence, multiprocessing allows

the programmer to leverage the capacity of multiple processors for parallel tasks, while

retaining much of the simplicity and portability of the threading model.

Why multiprocessing?

So why learn yet another parallel processing paradigm and toolkit, when we already

have the threads, processes, and IPC tools like sockets, pipes, and thread queues that

we’ve studied? Before we get into the details, I want to begin with a few words

about why you may (or may not) care about this package. In more specific terms,

although this module’s performance may not compete with that of pure threads or

process forks for some applications, this module offers a compelling solution for many:

• Compared to raw process forks, you gain cross-platform portability and powerful

IPC tools.

• Compared to threads, you essentially trade some potential and platform-

dependent extra task start-up time for the ability to run tasks in truly parallel fash-

ion on multi-core or multi-CPU machines.

On the other hand, this module imposes some constraints and tradeoffs that threads

do not:

• Since objects are copied across process boundaries, shared mutable state does not

work as it does for threads—changes in one process are not generally noticed in

the other. Really, freely shared state may be the most compelling reason to use

threads; its absence in this module may prove limiting in some threading contexts.

• Because this module requires pickleability both for its processes on Windows and

for some of its IPC tools in general, some coding paradigms are difficult or

nonportable—especially if they use bound methods or pass unpickleable objects

such as sockets to spawned processes.

For instance, common coding patterns with lambda that work for the threading module

cannot be used as process target callables in this module on Windows, because they

cannot be pickled. Similarly, because bound object methods are also not pickleable, a

threaded program may require a more indirect design if it either runs bound methods

in its threads or implements thread exit actions by posting arbitrary callables (possibly

including bound methods) on shared queues. The in-process model of threads supports

such direct lambda and bound method use, but the separate processes of

multiprocessing do not.

In fact we’ll write a thread manager for GUIs in Chapter 10 that relies on queueing

in-process callables this way to implement thread exit actions—the callables are queued

by worker threads, and fetched and dispatched by the main thread. Because the

threaded PyMailGUI program we’ll code in Chapter 14 both uses this manager to queue

bound methods for thread exit actions and runs bound methods as the main action of

a thread itself, it could not be directly translated to the separate process model implied

by multiprocessing.

Without getting into too many details here, to use multiprocessing, PyMailGUI’s ac-

tions might have to be coded as simple functions or complete process subclasses for

pickleability. Worse, they may have to be implemented as simpler action identifiers

dispatched in the main process, if they update either the GUI itself or object state in

general —pickling results in an object copy in the receiving process, not a reference to

the original, and forks on Unix essentially copy an entire process. Updating the state

of a mutable message cache copied by pickling it to pass to a new process, for example,

has no effect on the original.
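The copy semantics described here can be seen without spawning any processes, by running an object through pickle directly. This is only a sketch of the effect: the cache dictionary is a hypothetical stand-in for a message cache, and the round trip stands in for what a receiving process would rebuild.

```python
import pickle

cache = {'inbox': ['msg1', 'msg2']}              # hypothetical mutable state

# What a receiving process would rebuild from the pickled bytes
received = pickle.loads(pickle.dumps(cache))

received['inbox'].append('msg3')                 # change the copy only

print(received is cache)                         # False: a distinct object
print(cache['inbox'])                            # original list is unchanged
print(received['inbox'])                         # copy has the extra item
```

Changes made to the copy would have to be sent back explicitly, for example as a message on a queue, for the original process to see them.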

The pickleability constraints for process arguments on Windows can limit

multiprocessing’s scope in other contexts as well. For instance, in Chapter 12, we’ll

find that this module doesn’t directly solve the lack of portability for the os.fork call

for traditionally coded socket servers on Windows, because connected sockets are not

pickled correctly when passed into a new process created by this module to converse

with a client. In this context, threads provide a more portable and likely more efficient

solution.

Applications that pass simpler types of messages, of course, may fare better. Message

constraints are easier to accommodate when they are part of an initial process-based

design. Moreover, other tools in this module, such as its managers and shared memory

API, while narrowly focused and not as general as shared thread state, offer additional

mutable state options for some programs.

Fundamentally, though, because multiprocessing is based on separate processes, it

may be best geared for tasks which are relatively independent, do not share mutable

object state freely, and can make do with the message passing and shared memory tools

provided by this module. This includes many applications, but this module is not nec-

essarily a direct replacement for every threaded program, and it is not an alternative to

process forks in all contexts.

To truly understand both this package’s benefits and its tradeoffs, let’s

turn to a first example and explore this package’s implementation along the way.

The Basics: Processes and Locks

We don’t have space to do full justice to this sophisticated module in this book; see its

coverage in the Python library manual for the full story. But as a brief introduction, by

design most of this module’s interfaces mirror the threading and queue modules we’ve

already met, so they should already seem familiar. For example, the multiprocessing

module’s Process class is intended to mimic the threading module’s Thread class we

met earlier—it allows us to launch a function call in parallel with the calling script;

with this module, though, the function runs in a process instead of a thread. Exam-

ple 5-29 illustrates these basics in action:

Example 5-29. PP4E\System\Processes\multi1.py

"""
multiprocess basics: Process works like threading.Thread, but
runs function call in parallel in a process instead of a thread;
locks can be used to synchronize, e.g. prints on some platforms;
starts new interpreter on windows, forks a new process on unix;
"""

import os
from multiprocessing import Process, Lock

def whoami(label, lock):
    msg = '%s: name:%s, pid:%s'
    with lock:
        print(msg % (label, __name__, os.getpid()))

if __name__ == '__main__':
    lock = Lock()
    whoami('function call', lock)

    p = Process(target=whoami, args=('spawned child', lock))
    p.start()
    p.join()

    for i in range(5):
        Process(target=whoami, args=(('run process %s' % i), lock)).start()

    with lock:
        print('Main process exit.')

When run, this script first calls a function directly and in-process; then launches a call

to that function in a new process and waits for it to exit; and finally spawns five function

call processes in parallel in a loop—all using an API identical to that of the

threading.Thread model we studied earlier in this chapter. Here’s this script’s output

on Windows; notice how the five child processes spawned at the end of this script

outlive their parent, as is the usual case for processes:

C:\...\PP4E\System\Processes> multi1.py

function call: name:__main__, pid:8752

spawned child: name:__main__, pid:9268

Main process exit.

run process 3: name:__main__, pid:9296

run process 1: name:__main__, pid:8792

run process 4: name:__main__, pid:2224

run process 2: name:__main__, pid:8716

run process 0: name:__main__, pid:6936

Just like the threading.Thread class we met earlier, the multiprocessing.Process object

can either be passed a target with arguments (as done here) or subclassed to redefine

its run action method. Its start method invokes its run method in a new process, and

the default run simply calls the passed-in target. Also like threading, a join method

waits for child process exit, and a Lock object is provided as one of a handful of process

synchronization tools; it’s used here to ensure that prints don’t overlap among pro-

cesses on platforms where this might matter (it may not on Windows).

Implementation and usage rules

Technically, to achieve its portability, this module currently works by selecting from

platform-specific alternatives:

• On Unix, it forks a new child process and invokes the Process object’s run method

in the new child.

• On Windows, it spawns a new interpreter by using Windows-specific process cre-

ation tools, passing the pickled Process object in to the new process over a pipe,

and starting a “python -c” command line in the new process, which runs a special

Python-coded function in this package that reads and unpickles the Process and

invokes its run method.

We met pickling briefly in Chapter 1, and we will study it further later in this book.

The implementation is a bit more complex than this, and is prone to change over time,

of course, but it’s really quite an amazing trick. While the portable API generally hides

these details from your code, its basic structure can still have subtle impacts on the way

you’re allowed to use it. For instance:

• On Windows, the main process’s logic should generally be nested under a __name__

== '__main__' test as done here when using this module, so it can be imported freely

by a new interpreter without side effects. As we’ll learn in more detail in Chap-

ter 17, unpickling classes and functions requires an import of their enclosing mod-

ule, and this is the root of this requirement.

• Moreover, when globals are accessed in child processes on Windows, their values may not be the same as those in the parent at start time, because their module will be imported into a new process.

• Also on Windows, all arguments to Process must be pickleable. Because this in-

cludes target, targets should be simple functions so they can be pickled; they can-

not be bound or unbound object methods and cannot be functions created with a

lambda. See pickle in Python’s library manual for more on pickleability rules;

nearly every object type works, but callables like functions and classes must be

importable—they are pickled by name only, and later imported to recreate byte-

code. On Windows, objects with system state, such as connected sockets, won’t

generally work as arguments to a process target either, because they are not

pickleable.

• Similarly, instances of custom Process subclasses must be pickleable on Windows

as well. This includes all their attribute values. Objects available in this package

(e.g., Lock in Example 5-29) are pickleable, and so may be used as both Process

constructor arguments and subclass attributes.

• IPC objects in this package that appear in later examples like Pipe and Queue accept

only pickleable objects, because of their implementation (more on this in the next

section).

• On Unix, although a child process can make use of a shared global item created in

the parent, it’s better to pass the object as an argument to the child process’s con-

structor, both for portability to Windows and to avoid potential problems if such

objects were garbage collected in the parent.
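The pickleability rules in this list can be probed directly with the pickle module, without starting any processes. The following is a minimal sketch rather than part of the book's examples package: the is_pickleable helper is a hypothetical name introduced here, and the set of exceptions it catches is an assumption about how pickle reports failures for these object types.

```python
import os
import pickle
import socket

def is_pickleable(obj):
    """Hypothetical helper: True if pickle.dumps accepts obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True

# A function that lives in an importable module is pickled by name
importable_ok = is_pickleable(os.path.join)

# A lambda has no importable name, so it cannot be pickled
lambda_ok = is_pickleable(lambda: None)

# An object with system state, such as a socket, is rejected too
sock = socket.socket()
try:
    socket_ok = is_pickleable(sock)
finally:
    sock.close()

print(importable_ok, lambda_ok, socket_ok)
```

A check like this can be handy before passing an object to a Process constructor in code that must also run on Windows.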

There are additional rules documented in the library manual. In general, though, if you

stick to passing in shared objects to processes and using the synchronization and

communication tools provided by this package, your code will usually be portable and

correct. Let’s look next at a few of those tools in action.

IPC Tools: Pipes, Shared Memory, and Queues

While the processes created by this package can always communicate using general

system-wide tools like the sockets and fifo files we met earlier, the multiprocessing

module also provides portable message passing tools specifically geared to this purpose

for the processes it spawns:

• Its Pipe object provides an anonymous pipe, which serves as a connection between

two processes. When called, Pipe returns two Connection objects that represent the

ends of the pipe. Pipes are bidirectional by default, and allow arbitrary pickleable

Python objects to be sent and received. On Unix they are implemented internally

today with either a connected socket pair or the os.pipe call we met earlier, and

on Windows with named pipes specific to that platform. Much like the Process

object described earlier, though, the Pipe object’s portable API spares callers from

such things.

• Its Value and Array objects implement shared process/thread-safe memory for

communication between processes. These calls return scalar and array objects

based in the ctypes module and created in shared memory, with access synchron-

ized by default.

• Its Queue object serves as a FIFO list of Python objects, which allows multiple pro-

ducers and consumers. A queue is essentially a pipe with extra locking mechanisms

to coordinate more arbitrary accesses, and inherits the pickleability constraints of

Pipe.

Because these devices are safe to use across multiple processes, they can often serve to

synchronize points of communication and obviate lower-level tools like locks, much

the same as the thread queues we met earlier. As usual, a pipe (or a pair of them) may

be used to implement a request/reply model. Queues support more flexible models; in

fact, a GUI that wishes to avoid the limitations of the GIL might use the

multiprocessing module’s Process and Queue to spawn long-running tasks that post

results, rather than threads. As mentioned, although this may incur extra start-up

overhead on some platforms, unlike threads today, tasks coded this way can be as truly

parallel as the underlying platform allows.

One constraint worth noting here: this package’s pipes (and by proxy, queues) pickle

the objects passed through them, so that they can be reconstructed in the receiving

process (as we’ve seen, on Windows the receiver process may be a fully independent

Python interpreter). Because of that, they do not support unpickleable objects; as sug-

gested earlier, this includes some callables like bound methods and lambda functions

(see file multi-badq.py in the book examples package for a demonstration of code that

violates this constraint). Objects with system state, such as sockets, may fail as well.

Most other Python object types, including classes and simple functions, work fine on

pipes and queues.

Also keep in mind that because they are pickled, objects transferred this way are effec-

tively copied in the receiving process; direct in-place changes to mutable objects’ state

won’t be noticed in the sender. This makes sense if you remember that this package

runs independent processes with their own memory spaces; state cannot be as freely

shared as in threading, regardless of which IPC tools you use.
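The copy semantics described here can be observed even inside a single process, because a Pipe pickles whatever is sent through it. The sketch below is an illustration only: the roundtrip function is a hypothetical name, and sending and receiving in the same process is done purely to make the copying visible, not as a usage pattern.

```python
from multiprocessing import Pipe

def roundtrip(obj):
    """Hypothetical helper: send obj through a pipe and return what arrives."""
    end1, end2 = Pipe()
    end1.send(obj)              # obj is pickled here
    received = end2.recv()      # and rebuilt as a new object here
    end1.close()
    end2.close()
    return received

original = ['spam', [1, 2]]
received = roundtrip(original)
received[1].append(3)           # in-place change to the received copy

print(original)                 # the sender's object is unchanged
print(received)
```

Because the receiver works on a reconstructed copy, any result it computes must be sent back explicitly if the sender needs it.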

multiprocessing pipes

To demonstrate the IPC tools listed above, the next three examples implement three

flavors of communication between parent and child processes. Example 5-30 uses a

simple shared pipe object to send and receive data between parent and child processes.

Example 5-30. PP4E\System\Processes\multi2.py

"""
Use multiprocess anonymous pipes to communicate. Returns 2 connection
objects representing ends of the pipe: objects are sent on one end and
received on the other, though pipes are bidirectional by default
"""

import os
from multiprocessing import Process, Pipe

def sender(pipe):
    """
    send object to parent on anonymous pipe
    """
    pipe.send(['spam'] + [42, 'eggs'])
    pipe.close()

def talker(pipe):
    """
    send and receive objects on a pipe
    """
    pipe.send(dict(name='Bob', spam=42))
    reply = pipe.recv()
    print('talker got:', reply)

if __name__ == '__main__':
    (parentEnd, childEnd) = Pipe()
    Process(target=sender, args=(childEnd,)).start()    # spawn child with pipe
    print('parent got:', parentEnd.recv())              # receive from child
    parentEnd.close()                                   # or auto-closed on gc

    (parentEnd, childEnd) = Pipe()
    child = Process(target=talker, args=(childEnd,))
    child.start()
    print('parent got:', parentEnd.recv())              # receive from child
    parentEnd.send({x * 2 for x in 'spam'})             # send to child
    child.join()                                        # wait for child exit
    print('parent exit')

When run on Windows, here’s this script’s output—one child passes an object to the

parent, and the other both sends and receives on the same pipe:

C:\...\PP4E\System\Processes> multi2.py

parent got: ['spam', 42, 'eggs']

parent got: {'name': 'Bob', 'spam': 42}

talker got: {'ss', 'aa', 'pp', 'mm'}

parent exit

This module’s pipe objects make communication between two processes portable (and

nearly trivial).
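Pipes are bidirectional by default, but the Pipe call also accepts a duplex=False argument that yields a one-way channel: the first returned connection can only receive, and the second can only send. The following sketch shows this within a single process for brevity; the variable names are illustrative, and a real program would hand one end to a child process as in Example 5-30.

```python
from multiprocessing import Pipe

# With duplex=False, the first end is receive-only and the second send-only
receiver, sender = Pipe(duplex=False)

sender.send({'name': 'Bob', 'spam': 42})
ready = receiver.poll(1)            # wait up to 1 second for data to arrive
message = receiver.recv()

try:
    receiver.send('reply')          # the receive-only end refuses to send
    reply_allowed = True
except OSError:
    reply_allowed = False

sender.close()
receiver.close()
print(ready, message, reply_allowed)
```

A one-way pipe can make the direction of data flow explicit when a child only reports results and never expects a reply.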

Shared memory and globals

Example 5-31 uses shared memory to serve as both inputs and outputs of spawned

processes. To make this work portably, we must create objects defined by the package

and pass them to Process constructors. The last test in this demo (“loop4”) probably

represents the most common use case for shared memory—that of distributing com-

putation work to multiple parallel processes.

Example 5-31. PP4E\System\Processes\multi3.py

"""
Use multiprocess shared memory objects to communicate.
Passed objects are shared, but globals are not on Windows.
Last test here reflects common use case: distributing work.
"""

import os
from multiprocessing import Process, Value, Array

procs = 3
count = 0    # per-process globals, not shared

def showdata(label, val, arr):
    """
    print data values in this process
    """
    msg = '%-12s: pid:%4s, global:%s, value:%s, array:%s'
    print(msg % (label, os.getpid(), count, val.value, list(arr)))

def updater(val, arr):
    """
    communicate via shared memory
    """
    global count
    count += 1                         # global count not shared
    val.value += 1                     # passed in objects are
    for i in range(3): arr[i] += 1

if __name__ == '__main__':
    scalar = Value('i', 0)             # shared memory: process/thread safe
    vector = Array('d', procs)         # type codes from ctypes: int, double

    # show start value in parent process
    showdata('parent start', scalar, vector)

    # spawn child, pass in shared memory
    p = Process(target=showdata, args=('child ', scalar, vector))
    p.start(); p.join()

    # pass in shared memory updated in parent, wait for each to finish
    # each child sees updates in parent so far for args (but not global)

    print('\nloop1 (updates in parent, serial children)...')
    for i in range(procs):
        count += 1
        scalar.value += 1
        vector[i] += 1
        p = Process(target=showdata, args=(('process %s' % i), scalar, vector))
        p.start(); p.join()

    # same as prior, but allow children to run in parallel
    # all see the last iteration's result because all share objects

    print('\nloop2 (updates in parent, parallel children)...')
    ps = []
    for i in range(procs):
        count += 1
        scalar.value += 1
        vector[i] += 1
        p = Process(target=showdata, args=(('process %s' % i), scalar, vector))
        p.start()
        ps.append(p)
    for p in ps: p.join()

    # shared memory updated in spawned children, wait for each

    print('\nloop3 (updates in serial children)...')
    for i in range(procs):
        p = Process(target=updater, args=(scalar, vector))
        p.start()
        p.join()
    showdata('parent temp', scalar, vector)

    # same, but allow children to update in parallel

    ps = []
    print('\nloop4 (updates in parallel children)...')
    for i in range(procs):
        p = Process(target=updater, args=(scalar, vector))
        p.start()
        ps.append(p)
    for p in ps: p.join()
                                                # global count=6 in parent only
    # show final results here                   # scalar=12:  +6 parent, +6 in 6 children
    showdata('parent end', scalar, vector)      # array[i]=8: +2 parent, +6 in 6 children

The following is this script’s output on Windows. Trace through this and the code to

see how it runs; notice how the changed value of the global variable is not shared by

the spawned processes on Windows, but passed-in Value and Array objects are. The

final output line reflects changes made to shared memory in both the parent and

spawned children—the array’s final values are all 8.0, because they were incremented

twice in the parent, and once in each of six spawned children; the scalar value similarly

reflects changes made by both parent and child; but unlike for threads, the global is

per-process data on Windows:

C:\...\PP4E\System\Processes> multi3.py

parent start: pid:6204, global:0, value:0, array:[0.0, 0.0, 0.0]

child : pid:9660, global:0, value:0, array:[0.0, 0.0, 0.0]

loop1 (updates in parent, serial children)...

process 0 : pid:3900, global:0, value:1, array:[1.0, 0.0, 0.0]

process 1 : pid:5072, global:0, value:2, array:[1.0, 1.0, 0.0]

process 2 : pid:9472, global:0, value:3, array:[1.0, 1.0, 1.0]

loop2 (updates in parent, parallel children)...

process 1 : pid:9468, global:0, value:6, array:[2.0, 2.0, 2.0]

process 2 : pid:9036, global:0, value:6, array:[2.0, 2.0, 2.0]

process 0 : pid:9548, global:0, value:6, array:[2.0, 2.0, 2.0]

loop3 (updates in serial children)...

parent temp : pid:6204, global:6, value:9, array:[5.0, 5.0, 5.0]

loop4 (updates in parallel children)...

parent end : pid:6204, global:6, value:12, array:[8.0, 8.0, 8.0]

If you imagine the last test here run with a much larger array and many more parallel

children, you might begin to sense some of the power of this package for distributing

work.
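One caveat when many children update the same shared object in parallel: although access to Value and Array objects is synchronized by default, the library manual notes that an operation like += is a separate read and write, and so is not atomic as a whole. The sketch below shows one way to guard such an update with the lock that a Value carries; the safe_increment function is a hypothetical name, and it is run here in a single process only to keep the example short.

```python
from multiprocessing import Value

def safe_increment(val, times):
    """Hypothetical updater: hold the Value's own lock across read and write."""
    for _ in range(times):
        with val.get_lock():        # lock created along with the Value
            val.value += 1

counter = Value('i', 0)             # shared integer, synchronized by default
safe_increment(counter, 1000)
print(counter.value)
```

A function like this could be passed as a Process target in place of the updater in Example 5-31 when updates from parallel children must not be lost.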

Queues and subclassing

Finally, besides basic spawning and IPC tools, the multiprocessing module also:

• Allows its Process class to be subclassed to provide structure and state retention

(much like threading.Thread, but for processes).

• Implements a process-safe Queue object which may be shared by any number of

processes for more general communication needs (much like queue.Queue, but for

processes).

Queues support a more flexible multiple client/server model. Example 5-32, for instance, spawns three producer processes to post to a shared queue and repeatedly polls for results to appear, in much the same fashion that a GUI might collect results in parallel with the display itself, though here the concurrency is achieved with processes instead of threads.

Example 5-32. PP4E\System\Processes\multi4.py

"""
Process class can also be subclassed just like threading.Thread;
Queue works like queue.Queue but for cross-process, not cross-thread
"""

import os, time, queue
from multiprocessing import Process, Queue           # process-safe shared queue
                                                     # queue is a pipe + locks/semas
class Counter(Process):
    label = ' @'
    def __init__(self, start, queue):                # retain state for use in run
        self.state = start
        self.post = queue
        Process.__init__(self)

    def run(self):                                   # run in new process on start()
        for i in range(3):
            time.sleep(1)
            self.state += 1
            print(self.label, self.pid, self.state)  # self.pid is this child's pid
            self.post.put([self.pid, self.state])    # stdout file is shared by all
        print(self.label, self.pid, '-')

if __name__ == '__main__':
    print('start', os.getpid())
    expected = 9

    post = Queue()
    p = Counter(0, post)                        # start 3 processes sharing queue
    q = Counter(100, post)                      # children are producers
    r = Counter(1000, post)
    p.start(); q.start(); r.start()

    while expected:                             # parent consumes data on queue
        time.sleep(0.5)                         # this is essentially like a GUI,
        try:                                    # though GUIs often use threads
            data = post.get(block=False)
        except queue.Empty:
            print('no data...')
        else:
            print('posted:', data)
            expected -= 1

    p.join(); q.join(); r.join()                # must get before join putter
    print('finish', os.getpid(), r.exitcode)    # exitcode is child exit status

Notice in this code how:

• The time.sleep calls in this code’s producer simulate long-running tasks.

• All four processes share the same output stream; print calls go to the same place and don’t overlap badly on Windows (as we saw earlier, the multiprocessing module also has a shareable Lock object to synchronize access if required).

• The exit status of each child process is available in its exitcode attribute after it finishes.

When run, the output of the main consumer process traces its queue fetches, and the

(indented) output of spawned child producer processes gives process IDs and state.

C:\...\PP4E\System\Processes> multi4.py

start 6296

no data...

@ 8008 101

posted: [8008, 101]

@ 6068 1

@ 3760 1001

posted: [6068, 1]

@ 8008 102

posted: [3760, 1001]

@ 6068 2

@ 3760 1002

posted: [8008, 102]

@ 8008 103

@ 8008 -

posted: [6068, 2]

@ 6068 3

@ 6068 -

@ 3760 1003

@ 3760 -

posted: [3760, 1002]

posted: [8008, 103]

posted: [6068, 3]

posted: [3760, 1003]

finish 6296 0

If you imagine the “@” lines here as results of long-running operations and the others

as a main GUI thread, the wide relevance of this package may become more apparent.
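The consumer loop in Example 5-32 sleeps and then makes a nonblocking get call. An alternative, sketched below, is to let the get call itself wait with a timeout, which returns as soon as data arrives. This is an illustration only: the collect function and its max_misses parameter are hypothetical names, and the items are posted from the same process here simply to keep the sketch self-contained.

```python
import queue
from multiprocessing import Queue

def collect(post, expected, timeout=0.5, max_misses=20):
    """Hypothetical consumer: fetch `expected` items, giving up after
    max_misses timeouts so that a dead producer cannot hang the caller."""
    results, misses = [], 0
    while len(results) < expected and misses < max_misses:
        try:
            results.append(post.get(timeout=timeout))   # wait up to timeout
        except queue.Empty:
            misses += 1                                 # nothing arrived in time
    return results

post = Queue()
for item in ('spam', 'eggs', 'ham'):    # stand-in for producer processes
    post.put(item)

items = collect(post, expected=3)
print(items)
```

In a GUI, a blocking wait like this would normally be replaced by a timer-driven nonblocking check, as in the book's example, so that the display stays responsive.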

Starting Independent Programs

As we learned earlier, independent programs generally communicate with system-global tools such as sockets and fifo files. Although processes spawned by multiprocessing can leverage these tools, too, their closer relationship affords them the host of additional IPC devices provided by this module.

Like threads, multiprocessing is designed to run function calls in parallel, not to start

entirely separate programs directly. Spawned functions might use tools like os.system,

os.popen, and subprocess to start a program if such an operation might block the caller,

but there’s otherwise often no point in starting a process that just starts a program

(you might as well start the program and skip a step). In fact, on Windows,

multiprocessing today uses the same process creation call as subprocess, so there’s little

point in starting two processes to run one.

It is, however, possible to start new programs in the child processes spawned, using

tools like the os.exec* calls we met earlier—by spawning a process portably with

multiprocessing and overlaying it with a new program this way, we start a new inde-

pendent program, and effectively work around the lack of the os.fork call in standard

Windows Python.

This generally assumes that the new program doesn’t require any resources passed in

by the Process API, of course (once a new program starts, it erases that which was

running), but it offers a portable equivalent to the fork/exec combination on Unix.

Furthermore, programs started this way can still make use of more traditional IPC tools,

such as sockets and fifos, we met earlier in this chapter. Example 5-33 illustrates the

technique.

Example 5-33. PP4E\System\Processes\multi5.py

"Use multiprocessing to start independent programs, os.fork or not"

import os
from multiprocessing import Process

def runprogram(arg):
    os.execlp('python', 'python', 'child.py', str(arg))

if __name__ == '__main__':
    for i in range(5):
        Process(target=runprogram, args=(i,)).start()
    print('parent exit')

This script starts 5 instances of the child.py script we wrote in Example 5-4 as inde-

pendent processes, without waiting for them to finish. Here’s this script at work on

Windows, after deleting a superfluous system prompt that shows up arbitrarily in the

middle of its output (it runs the same on Cygwin, but the output is not interleaved

there):

C:\...\PP4E\System\Processes> type child.py

import os, sys

print('Hello from child', os.getpid(), sys.argv[1])

C:\...\PP4E\System\Processes> multi5.py

parent exit

Hello from child 9844 2

Hello from child 8696 4

Hello from child 1840 0

Hello from child 6724 1

Hello from child 9368 3

This technique isn’t possible with threads, because all threads run in the same process;

overlaying it with a new program would kill all its threads. Though this is unlikely to

be as fast as a fork/exec combination on Unix, it at least provides similar and portable

functionality on Windows when required.

And Much More

Finally, multiprocessing provides many more tools than these examples deploy, including condition, event, and semaphore synchronization tools, and local and remote managers that implement servers for shared objects. For instance, Example 5-34 demonstrates its support for pools: spawned children that work in concert on a given task.

Example 5-34. PP4E\System\Processes\multi6.py

"Plus much more: process pools, managers, locks, condition,..."

import os
from multiprocessing import Pool

def powers(x):
    #print(os.getpid())              # enable to watch children
    return 2 ** x

if __name__ == '__main__':
    workers = Pool(processes=5)

    results = workers.map(powers, [2]*100)
    print(results[:16])
    print(results[-2:])

    results = workers.map(powers, range(100))
    print(results[:16])
    print(results[-2:])

When run, Python arranges to delegate portions of the task to workers run in parallel:

C:\...\PP4E\System\Processes> multi6.py

[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]

[4, 4]

[1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768]

[316912650057057350374175801344, 633825300114114700748351602688]

And a little less…

To be fair, besides such additional features and tools, multiprocessing also comes with

additional constraints beyond those we’ve already covered (pickleability, mutable state,

and so on). For example, consider the following sort of code:

from multiprocessing import Process

def action(arg1, arg2):
    print(arg1, arg2)

if __name__ == '__main__':
    Process(target=action, args=('spam', 'eggs')).start()   # shell waits for child

This works as expected, but if we change the last line to the following it fails on Win-

dows because lambdas are not pickleable (really, not importable):

Process(target=(lambda: action('spam', 'eggs'))).start() # fails!-not pickleable

This precludes a common coding pattern that uses lambda to add data to calls, which

we’ll use often for callbacks in the GUI part of this book. Moreover, this differs from

the threading module that is the model for this package—calls like the following which

work for threads must be translated to a callable and arguments:

threading.Thread(target=(lambda: action(2, 4))).start() # but lambdas work here
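Where a lambda is used only to attach data to a call, the standard library's functools.partial can often stand in for it, because a partial object pickles as long as the function it wraps and its arguments do. The sketch below checks this with pickle directly instead of starting a process; it wraps os.path.join only as a convenient stand-in for an importable function, and is offered as one possible workaround, not as the package's prescribed pattern.

```python
import functools
import os.path
import pickle

# Attach arguments to an importable function without using a lambda
deferred = functools.partial(os.path.join, 'spam', 'eggs')

payload = pickle.dumps(deferred)        # succeeds: function is pickled by name
restored = pickle.loads(payload)
result = restored()                     # same as os.path.join('spam', 'eggs')

try:
    pickle.dumps(lambda: os.path.join('spam', 'eggs'))
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False              # lambdas have no importable name

print(result, lambda_pickled)
```

An object like deferred could be passed as a Process target, though passing the function and an args tuple separately, as shown earlier, is usually just as simple.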

Conversely, some behavior of the threading module is mimicked by multiprocessing,

whether you wish it did or not. Because programs using this package wait for child

processes to end by default, we must mark processes as daemon if we don’t want to block

the shell where the following sort of code is run (technically, parents attempt to ter-

minate daemonic children on exit, which means that the program can exit when only

daemonic children remain, much like threading):

import time
from multiprocessing import Process

def action(arg1, arg2):
    print(arg1, arg2)
    time.sleep(5)                # normally prevents the parent from exiting

if __name__ == '__main__':
    p = Process(target=action, args=('spam', 'eggs'))
    p.daemon = True              # don't wait for it
    p.start()

There’s more on some of these issues in the Python library manual; they are not show-

stoppers by any stretch, but special cases and potential pitfalls to some. We’ll revisit

the lambda and daemon issues in a more realistic context in Chapter 8, where we’ll use

multiprocessing to launch GUI demos independently.

Why multiprocessing? The Conclusion

As this section’s examples suggest, multiprocessing provides a powerful alternative

which aims to combine the portability and much of the utility of threads with the fully

parallel potential of processes and offers additional solutions to IPC, exit status, and

other parallel processing goals.

Hopefully, this section has also given you a better understanding of this module’s tradeoffs discussed at its beginning. In particular, its separate process model precludes the freely shared mutable state of threads, and bound methods and lambdas are prohibited both by the pickleability requirements of its IPC pipes and queues and by its process action implementation on Windows. Moreover, its requirement of pickleability for process arguments on Windows also precludes it as an option for conversing with clients in socket servers portably.

While not a replacement for threading in all applications, though, multiprocessing

offers compelling solutions for many. Especially for parallel-programming tasks which

can be designed to avoid its limitations, this module can offer both performance and

portability that Python’s more direct multitasking tools cannot.

Unfortunately, beyond this brief introduction, we don’t have space for a more complete

treatment of this module in this book. For more details, refer to the Python library

manual. Here, we turn next to a handful of additional program launching tools and a

wrap up of this chapter.

Other Ways to Start Programs

We’ve seen a variety of ways to launch programs in this book so far—from the os.fork/

exec combination on Unix, to portable shell command-line launchers like os.system,

os.popen, and subprocess, to the portable multiprocessing module options of the last

section. There are still other ways to start programs in the Python standard library,

some of which are more platform neutral or obscure than others. This section wraps

up this chapter with a quick tour through this set.

The os.spawn Calls

The os.spawnv and os.spawnve calls were originally introduced to launch programs on

Windows, much like a fork/exec call combination on Unix-like platforms. Today, these

calls work on both Windows and Unix-like systems, and additional variants have been

added to parrot os.exec.

In recent versions of Python, the portable subprocess module has started to supersede

these calls. In fact, Python’s library manual includes a note stating that this module has

more powerful and equivalent tools and should be preferred to os.spawn calls. More-

over, the newer multiprocessing module can achieve similarly portable results today

when combined with os.exec calls, as we saw earlier. Still, the os.spawn calls continue

to work as advertised and may appear in Python code you encounter.
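For comparison, here is a rough sketch of how the subprocess module covers the same ground as the calls discussed in this section. To stay self-contained it runs a one-line program passed with the interpreter's -c switch instead of a script file; that program string is a stand-in invented for this sketch, and sys.executable is assumed to name a usable interpreter.

```python
import subprocess
import sys

# Stand-in for a script file: a short program passed with -c (an assumption)
program = "print('spam'); print('eggs'); print('ham')"
cmd = [sys.executable, '-c', program]

# Run and wait, returning the exit status (roughly like os.system)
status = subprocess.call(cmd)

# Run, wait, and capture the output (roughly like os.popen(...).read())
finished = subprocess.run(cmd, capture_output=True, text=True)
words = finished.stdout.split()

# Start without waiting (roughly like os.spawnv with os.P_NOWAIT)
child = subprocess.Popen(cmd, stdout=subprocess.DEVNULL)
child_pid = child.pid
exit_status = child.wait()              # collect the exit status later

print(status, words, exit_status)
```

Passing the command as a list avoids shell quoting issues, which is one reason this form is often preferred over a single command-line string.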

The os.spawn family of calls execute a program named by a command line in a new

process, on both Windows and Unix-like systems. In basic operation, they are similar

to the fork/exec call combination on Unix and can be used as alternatives to the

system and popen calls we’ve already learned. In the following interaction, for instance,

we start a Python program with a command line in two traditional ways (the second

also reads its output):

C:\...\PP4E\System\Processes> python

>>> print(open('makewords.py').read())

print('spam')

print('eggs')

print('ham')

>>> import os

>>> os.system('python makewords.py')

spam

eggs

ham

>>> result = os.popen('python makewords.py').read()

>>> print(result)

spam

eggs

ham

The equivalent os.spawn calls achieve the same effect, with a slightly more complex call

signature that provides more control over the way the program is launched:

>>> os.spawnv(os.P_WAIT, r'C:\Python31\python', ('python', 'makewords.py'))

spam

eggs

ham

>>> os.spawnl(os.P_NOWAIT, r'C:\Python31\python', 'python', 'makewords.py')

1820

>>> spam

eggs

ham

The spawn calls are also much like forking programs in Unix. They don’t actually copy

the calling process (so shared descriptor operations won’t work), but they can be used

to start a program running completely independent of the calling program, even on

Windows. The script in Example 5-35 makes the similarity to Unix programming pat-

terns more obvious. It launches a program with a fork/exec combination on Unix-like

platforms (including Cygwin), or an os.spawnv call on Windows.

Example 5-35. PP4E\System\Processes\spawnv.py

"""
start up 10 copies of child.py running in parallel;
use spawnv to launch a program on Windows (like fork+exec);
P_OVERLAY replaces, P_DETACH makes child stdout go nowhere;
or use portable subprocess or multiprocessing options today!
"""

import os, sys

for i in range(10):
    if sys.platform[:3] == 'win':
        pypath = sys.executable
        os.spawnv(os.P_NOWAIT, pypath, ('python', 'child.py', str(i)))
    else:
        pid = os.fork()
        if pid != 0:
            print('Process %d spawned' % pid)
        else:
            os.execlp('python', 'python', 'child.py', str(i))
print('Main process exiting.')

To make sense of these examples, you have to understand the arguments being passed

to the spawn calls. In this script, we call os.spawnv with a process mode flag, the full

directory path to the Python interpreter, and a tuple of strings representing the shell

command line with which to start a new program. The path to the Python interpreter

executable program running a script is available as sys.executable. In general, the

process mode flag is taken from these predefined values:

os.P_NOWAIT and os.P_NOWAITO

The spawn functions will return as soon as the new process has been created, with

the process ID as the return value. Available on Unix and Windows.

os.P_WAIT

The spawn functions will not return until the new process has run to completion

and will return the exit code of the process if the run is successful or “-signal” if a

signal kills the process. Available on Unix and Windows.

os.P_DETACH and os.P_OVERLAY

P_DETACH is similar to P_NOWAIT, but the new process is detached from the console

of the calling process. If P_OVERLAY is used, the current program will be replaced

(much like os.exec). Available on Windows.
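The difference between the wait and no-wait modes shows up in what the call returns. The sketch below is a small illustration under stated assumptions: it runs the current interpreter with a one-expression program that exits with status 3, and it assumes that neither the interpreter path nor the arguments contain spaces, which may otherwise need extra quoting on Windows.

```python
import os
import sys

# Child program with no embedded spaces: exits with status 3
args = [sys.executable, '-c', "__import__('sys').exit(3)"]

# P_WAIT: the call returns only after the child ends, giving its exit code
status = os.spawnv(os.P_WAIT, sys.executable, args)

# P_NOWAIT: the call returns at once with an ID for the new process
pid = os.spawnv(os.P_NOWAIT, sys.executable, args)
os.waitpid(pid, 0)                      # wait for the child so it is cleaned up

print(status, pid)
```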

In fact, there are eight different calls in the spawn family, which all start a program but

vary slightly in their call signatures. In their names, an “l” means you list arguments

individually, “p” means the executable file is looked up on the system path, and “e”

means a dictionary is passed in to provide the shelled environment of the spawned

program: the os.spawnve call, for example, works the same way as os.spawnv but accepts

an extra fourth dictionary argument to specify a different shell environment for the

spawned program (which, by default, inherits all of the parent’s settings):

os.spawnl(mode, path, ...)

os.spawnle(mode, path, ..., env)

os.spawnlp(mode, file, ...) # Unix only

os.spawnlpe(mode, file, ..., env) # Unix only

os.spawnv(mode, path, args)

os.spawnve(mode, path, args, env)

os.spawnvp(mode, file, args) # Unix only

os.spawnvpe(mode, file, args, env) # Unix only

Because these calls mimic the names and call signatures of the os.exec variants, see

earlier in this chapter for more details on the differences between these call forms.

Unlike the os.exec calls, only half of the os.spawn forms—those without system path

checking (and hence without a “p” in their names)—are currently implemented on

Windows. All the process mode flags are supported on Windows, but detach and

overlay modes are not available on Unix. Because this sort of detail may be prone to

change, to verify which are present, be sure to see the library manual or run a dir built-

in function call on the os module after an import.
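A quick way to run this check is to test for each name with hasattr instead of reading through the full dir listing. The following sketch reports which of the eight spawn variants and five mode flags named in this section exist on the platform where it runs; the results naturally differ between Windows and Unix, so no particular output is implied.

```python
import os

spawn_names = ['spawnl', 'spawnle', 'spawnlp', 'spawnlpe',
               'spawnv', 'spawnve', 'spawnvp', 'spawnvpe']
flag_names = ['P_NOWAIT', 'P_NOWAITO', 'P_WAIT', 'P_DETACH', 'P_OVERLAY']

# Map each name to True or False, depending on this platform's os module
variants = {name: hasattr(os, name) for name in spawn_names}
flags = {name: hasattr(os, name) for name in flag_names}

for table in (variants, flags):
    for name, present in table.items():
        print('%-10s %s' % (name, 'available' if present else 'missing'))
```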

Here is the script in Example 5-35 at work on Windows, spawning 10 independent

copies of the child.py Python program we met earlier in this chapter:

C:\...\PP4E\System\Processes> type child.py

import os, sys

print('Hello from child', os.getpid(), sys.argv[1])

C:\...\PP4E\System\Processes> python spawnv.py

Hello from child -583587 0
Hello from child -558199 2
Hello from child -586755 1
Hello from child -562171 3
Main process exiting.
Hello from child -581867 6
Hello from child -588651 5
Hello from child -568247 4
Hello from child -563527 7
Hello from child -543163 9
Hello from child -587083 8

Notice that the copies print their output in random order, and the parent program exits

before all children do; all of these programs are really running in parallel on Windows.

Also observe that the child program’s output shows up in the console box where

spawnv.py was run; when using P_NOWAIT, standard output comes to the parent’s con-

sole, but it seems to go nowhere when using P_DETACH (which is most likely a feature

when spawning GUI programs).

But having shown you this call, I need to again point out that both the subprocess and

multiprocessing modules offer more portable alternatives for spawning programs with

command lines today. In fact, unless os.spawn calls provide unique behavior you can’t

live without (e.g., control of shell window pop ups on Windows), the platform-specific

alternatives code of Example 5-35 can be replaced altogether with the portable multiprocessing code in Example 5-33.

The os.startfile call on Windows

Although os.spawn calls may be largely superfluous today, there are other tools that

can still make a strong case for themselves. For instance, the os.system call can be used

on Windows to launch a DOS start command, which opens (i.e., runs) a file inde-

pendently based on its Windows filename associations, as though it were clicked.

os.startfile makes this even simpler in recent Python releases, and it can avoid block-

ing its caller, unlike some other tools.

Using the DOS start command

To understand why, first you need to know how the DOS start command works in

general. Roughly, a DOS command line of the form start command works as if command

were typed in the Windows Run dialog box available in the Start button menu. If

command is a filename, it is opened exactly as if its name was double-clicked in the

Windows Explorer file selector GUI.

For instance, the following three DOS commands automatically start Internet Explorer,

my registered image viewer program, and my sound media player program on the files

named in the commands. Windows simply opens the file with whatever program is

associated to handle filenames of that form. Moreover, all three of these programs run

independently of the DOS console box where the command is typed:

C:\...\PP4E\System\Media> start lp4e-preface-preview.html

C:\...\PP4E\System\Media> start ora-lp4e.jpg

C:\...\PP4E\System\Media> start sousa.au

Because the start command can run any file and command line, there is no reason it

cannot also be used to start an independently running Python program:

C:\...\PP4E\System\Processes> start child.py 1

This works because Python is registered to open names ending in .py when it is installed.

The script child.py is launched independently of the DOS console window even though

we didn’t provide the name or path of the Python interpreter program. Because

child.py simply prints a message and exits, though, the result isn’t exactly satisfying: a

new DOS window pops up to serve as the script’s standard output, and it immediately

goes away when the child exits. To do better, add an input call at the bottom of the

program file to wait for a key press before exiting:

C:\...\PP4E\System\Processes> type child-wait.py

import os, sys

print('Hello from child', os.getpid(), sys.argv[1])

input("Press <Enter>") # don't flash on Windows

C:\...\PP4E\System\Processes> start child-wait.py 2

Now the child’s DOS window pops up and stays up after the start command has

returned. Pressing the Enter key in the pop-up DOS window makes it go away.

Using start in Python scripts

Since we know that Python’s os.system and os.popen can be called by a script to run any command line that can be typed at a DOS shell prompt, we can also start independently running programs from a Python script by simply running a DOS start command line. For instance:

C:\...\PP4E\System\Media> python

>>> import os

>>> cmd = 'start lp4e-preface-preview.html' # start IE browser

>>> os.system(cmd) # runs independent

The os.system call here starts whatever web browser is registered on your machine to open .html files (unless it is already running). The launched program runs completely independently of the Python session: when running a DOS start command, os.system does not wait for the spawned program to exit.

The os.startfile call

In fact, start is so useful that recent Python releases also include an os.startfile call,

which is essentially the same as spawning a DOS start command with os.system and

works as though the named file were double-clicked. The following calls, for instance,

have a similar effect:

>>> os.startfile('lp-code-readme.txt')

>>> os.system('start lp-code-readme.txt')

Both pop up the text file in Notepad on my Windows computer. Unlike the second of

these calls, though, os.startfile provides no option to wait for the application to close

(the DOS start command’s /WAIT option does) and no way to retrieve the application’s

exit status (returned from os.system).

On recent versions of Windows, the following has a similar effect, too, because the

registry is used at the command line (though this form pauses until the file’s viewer is

closed—like using start /WAIT):

>>> os.system('lp-code-readme.txt') # 'start' is optional today

This is a convenient way to open arbitrary document and media files, but keep in mind that the os.startfile call works only on Windows, because it uses the Windows registry to know how to open a file. In fact, there are even more obscure and nonportable ways to launch programs, including Windows-specific options in the PyWin32 package, which we’ll finesse here. If you want to be more platform neutral, consider using one of the many other program launcher tools we’ve seen, such as os.popen or os.spawnv. Or better yet, write a module to hide the details, as the next and final section demonstrates.
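To make the portability concern concrete, here is one possible sketch of a file opener that falls back on other tools when os.startfile is absent. The open and xdg-open commands are assumptions about the desktop environment (macOS and many Linux desktops, respectively) and are not covered in this book; the function names are hypothetical too.

```python
import os, subprocess, sys

def opener_command(filename, platform=sys.platform):
    """return the external command assumed to open filename, or None on Windows"""
    if platform.startswith('win'):
        return None                          # os.startfile is used instead
    elif platform == 'darwin':
        return ['open', filename]            # macOS launcher command
    else:
        return ['xdg-open', filename]        # common on Linux desktops; may be absent

def open_file(filename):
    """open filename with its associated application, without waiting"""
    command = opener_command(filename)
    if command is None:
        os.startfile(filename)               # Windows only
    else:
        subprocess.Popen(command)            # does not block the caller
```

Keeping the platform test in a separate function makes the choice easy to check without actually launching anything.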

A Portable Program-Launch Framework

With all of these different ways to start programs on different platforms, it can be difficult to remember what tools to use in a given situation. Moreover, some of these tools are called in ways that are complicated and thus easy to forget. Although modules like subprocess and multiprocessing offer fully portable options today, other tools sometimes provide more specific behavior that’s better on a given platform; shell window pop ups on Windows, for example, are often better suppressed.

I write scripts that need to launch Python programs often enough that I eventually wrote a module to try to hide most of the underlying details. By encapsulating the details in this module, I’m free to change them to use new tools in the future without breaking code that relies on them. While I was at it, I made this module smart enough to automatically pick a “best” launch scheme based on the underlying platform. Laziness is the mother of many a useful module.

Example 5-36 collects in a single module many of the techniques we’ve met in this chapter. It implements an abstract superclass, LaunchMode, which defines what it means to start a Python program named by a shell command line, but it doesn’t define how. Instead, its subclasses provide a run method that actually starts a Python program according to a given scheme and (optionally) define an announce method to display a program’s name at startup time.

Example 5-36. PP4E\launchmodes.py

"""
###################################################################################
launch Python programs with command lines and reusable launcher scheme classes;
auto inserts "python" and/or path to Python executable at front of command line;
some of this module may assume 'python' is on your system path (see Launcher.py);

subprocess module would work too, but os.popen() uses it internally, and the goal
is to start a program running independently here, not to connect to its streams;
multiprocessing module also is an option, but this is command-lines, not functions:
doesn't make sense to start a process which would just do one of the options here;

new in this edition: runs script filename path through normpath() to change any
/ to \ for Windows tools where required; fix is inherited by PyEdit and others;
on Windows, / is generally allowed for file opens, but not by all launcher tools;
###################################################################################
"""

import sys, os
pyfile = (sys.platform[:3] == 'win' and 'python.exe') or 'python'
pypath = sys.executable     # use sys in newer pys

def fixWindowsPath(cmdline):
    """
    change all / to \ in script filename path at front of cmdline;
    used only by classes which run tools that require this on Windows;
    on other platforms, this does not hurt (e.g., os.system on Unix);
    """
    splitline = cmdline.lstrip().split(' ')           # split on spaces
    fixedpath = os.path.normpath(splitline[0])        # fix forward slashes
    return ' '.join([fixedpath] + splitline[1:])      # put it back together

class LaunchMode:
    """
    on call to instance, announce label and run command;
    subclasses format command lines as required in run();
    command should begin with name of the Python script
    file to run, and not with "python" or its full path;
    """
    def __init__(self, label, command):
        self.what  = label
        self.where = command
    def __call__(self):                     # on call, ex: button press callback
        self.announce(self.what)
        self.run(self.where)                # subclasses must define run()
    def announce(self, text):               # subclasses may redefine announce()
        print(text)                         # methods instead of if/elif logic
    def run(self, cmdline):
        assert False, 'run must be defined'

class System(LaunchMode):
    """
    run Python script named in shell command line
    caveat: may block caller, unless & added on Unix
    """
    def run(self, cmdline):
        cmdline = fixWindowsPath(cmdline)
        os.system('%s %s' % (pypath, cmdline))

class Popen(LaunchMode):
    """
    run shell command line in a new process
    caveat: may block caller, since pipe closed too soon
    """
    def run(self, cmdline):
        cmdline = fixWindowsPath(cmdline)
        os.popen(pypath + ' ' + cmdline)           # assume nothing to be read

class Fork(LaunchMode):
    """
    run command in explicitly created new process
    for Unix-like systems only, including cygwin
    """
    def run(self, cmdline):
        assert hasattr(os, 'fork')
        cmdline = cmdline.split()                  # convert string to list
        if os.fork() == 0:                         # start new child process
            os.execvp(pypath, [pyfile] + cmdline)  # run new program in child

class Start(LaunchMode):
    """
    run command independent of caller
    for Windows only: uses filename associations
    """
    def run(self, cmdline):
        assert sys.platform[:3] == 'win'
        cmdline = fixWindowsPath(cmdline)
        os.startfile(cmdline)

class StartArgs(LaunchMode):
    """
    for Windows only: args may require real start
    forward slashes are okay here
    """
    def run(self, cmdline):
        assert sys.platform[:3] == 'win'
        os.system('start ' + cmdline)              # may create pop-up window

class Spawn(LaunchMode):
    """
    run python in new process independent of caller
    for Windows or Unix; use P_NOWAIT for dos box;
    forward slashes are okay here
    """
    def run(self, cmdline):
        os.spawnv(os.P_DETACH, pypath, (pyfile, cmdline))

class Top_level(LaunchMode):
    """
    run in new window, same process
    tbd: requires GUI class info too
    """
    def run(self, cmdline):
        assert False, 'Sorry - mode not yet implemented'

# pick a "best" launcher for this platform
# may need to specialize the choice elsewhere

if sys.platform[:3] == 'win':
    PortableLauncher = Spawn
else:
    PortableLauncher = Fork

class QuietPortableLauncher(PortableLauncher):
    def announce(self, text):
        pass

def selftest():
    file = 'echo.py'
    input('default mode...')
    launcher = PortableLauncher(file, file)
    launcher()                                     # no block

    input('system mode...')
    System(file, file)()                           # blocks

    if sys.platform[:3] == 'win':
        input('DOS start mode...')                 # no block
        StartArgs(file, file)()

if __name__ == '__main__': selftest()

Near the end of the file, the module picks a default class based on the sys.platform attribute: PortableLauncher is set to a class that uses spawnv on Windows and one that uses the fork/exec combination elsewhere; in recent Pythons, we could probably just use the spawnv scheme on most platforms, but the alternatives in this module are used in additional contexts. If you import this module and always use its PortableLauncher attribute, you can forget many of the platform-specific details enumerated in this chapter.

To run a Python program, simply import the PortableLauncher class, make an instance by passing a label and command line (without a leading “python” word), and then call the instance object as though it were a function. The program is started by a call operation (by its __call__ operator-overloading method, instead of a normally named method) so that the classes in this module can also be used to generate callback handlers in tkinter-based GUIs. As we’ll see in the upcoming chapters, button presses in tkinter invoke a callable object with no arguments; by registering a PortableLauncher instance to handle the press event, we can automatically start a new program from another program’s GUI. A GUI might associate a launcher with a button press with code like this:

Button(root, text=name, command=PortableLauncher(name, commandLine))
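The callable-instance idea can be tried without a GUI. In the following sketch, the Announcer class and the press function are hypothetical stand-ins: press plays the role of a toolkit invoking a registered handler with no arguments, and Announcer only records what it would have run instead of starting a program.

```python
class Announcer:
    """remember a label and command line; act when the instance is called"""
    def __init__(self, label, command):
        self.label = label
        self.command = command
        self.calls = []                      # record of runs, for illustration only

    def __call__(self):                      # invoked with no arguments, like a button callback
        self.calls.append(self.command)
        return self.label

def press(callback):
    """stand-in for a GUI toolkit invoking a registered handler"""
    return callback()

handler = Announcer('demo', 'echo.py')       # state is retained in the instance
```

Calling press(handler) runs the __call__ method; the label and command line travel with the instance, so the toolkit needs to pass nothing.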

When run standalone, this module’s selftest function is invoked as usual. As coded,

System blocks the caller until the program exits, but PortableLauncher (really, Spawn or

Fork) and Start do not:

C:\...\PP4E> type echo.py

print('Spam')

input('press Enter')

C:\...\PP4E> python launchmodes.py

default mode...

echo.py

system mode...

echo.py

Spam

press Enter

DOS start mode...

echo.py

As more practical applications, this file is also used in Chapter 8 to launch GUI dialog

demos independently, and again in a number of Chapter 10’s examples, including

PyDemos and PyGadgets—launcher scripts designed to run major examples in this

book in a portable fashion, which live at the top of this book’s examples distribution

directory. Because these launcher scripts simply import PortableLauncher and register

instances to respond to GUI events, they run on both Windows and Unix unchanged

(tkinter’s portability helps, too, of course). The PyGadgets script even customizes

PortableLauncher to update a GUI label at start time:

class Launcher(launchmodes.PortableLauncher):    # use wrapped launcher class
    def announce(self, text):                    # customize to set GUI label
        Info.config(text=text)

We’ll explore these two client scripts, and others, such as Chapter 11’s PyEdit, after we start coding GUIs in Part III. Partly because of its role in PyEdit, this edition extends this module to automatically replace forward slashes with backward slashes in the script’s file path name. PyEdit uses forward slashes in some filenames because they are allowed in file opens on Windows, but some Windows launcher tools require the backslash form instead. Specifically, system, popen, and startfile in os require backslashes, but spawnv does not. PyEdit and others inherit the new pathname fix of fixWindowsPath here simply by importing and using this module’s classes; PyEdit eventually changed so as to make this fix irrelevant for its own use case (see Chapter 11), but other clients still acquire the fix for free.
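The effect of the path fix can be checked on any platform by calling the Windows flavor of os.path directly. This sketch mirrors the logic of fixWindowsPath but uses the standard ntpath module, which is an assumption made here for demonstration only; the module itself calls os.path.normpath, which is a no-op for forward slashes on Unix. The name fix_windows_path and the sample path are hypothetical.

```python
import ntpath    # the Windows flavor of os.path, importable on any platform

def fix_windows_path(cmdline):
    """normalize only the script path at the front of a command line"""
    parts = cmdline.lstrip().split(' ')
    parts[0] = ntpath.normpath(parts[0])     # forward slashes become backslashes
    return ' '.join(parts)                   # arguments are left untouched

fixed = fix_windows_path('C:/temp/script.py -a /b')
```

Like the original, this splits on spaces, so a script path that itself contains spaces would not be handled correctly.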

Also notice how some of the classes in this example use the sys.executable path string

to obtain the Python executable’s full path name. This is partly due to their role in user-

friendly demo launchers. In prior versions that predated sys.executable, these classes

instead called two functions exported by a module named Launcher.py to find a suitable

Python executable, regardless of whether the user had added its directory to the system

PATH variable’s setting.

This search is no longer required. Since I’ll describe this module’s other roles in the next chapter, and since this search has been largely precluded by Python’s perpetual pandering to programmers’ professional proclivities, I’ll postpone any pointless pedagogical presentation here. (Period.)

Other System Tools Coverage

That concludes our tour of Python system tools. In this and the prior three chapters,

we’ve met most of the commonly used system tools in the Python library. Along the

way, we’ve also learned how to use them to do useful things such as start programs,

process directories, and so on. The next chapter wraps up this domain by using the

tools we’ve just met to implement scripts that do useful and more realistic system-level

work.

Still other system-related tools in Python appear later in this text. For instance:

• Sockets, used to communicate with other programs and networks and introduced

briefly here, show up again in Chapter 10 in a common GUI use case and are

covered in full in Chapter 12.

• Select calls, used to multiplex among tasks, are also introduced in Chapter 12 as a

way to implement servers.

• File locking with os.open, introduced in Chapter 4, is discussed again in conjunction with later examples.

• Regular expressions, string pattern matching used by many text processing tools

in the system administration domain, don’t appear until Chapter 19.

Moreover, things like forks and threads are used extensively in the Internet scripting chapters: see the discussion of threaded GUIs in Chapters 9 and 10; the server implementations in Chapter 12; the FTP client GUI in Chapter 13; and the PyMailGUI program in Chapter 14. Along the way, we’ll also meet higher-level Python modules, such as socketserver, which implement fork and thread-based socket server code for us. In fact, many of the last four chapters’ tools will pop up constantly in later examples in this book, about what one would expect of general-purpose portable libraries.

Last, but not necessarily least, I’d like to point out one more time that many additional

tools in the Python library don’t appear in this book at all. With hundreds of library

modules, more appearing all the time, and even more in the third-party domain, Python

book authors have to pick and choose their topics frugally! As always, be sure to browse

the Python library manuals and Web early and often in your Python career.

CHAPTER 6

Complete System Programs

“The Greps of Wrath”

This chapter wraps up our look at the system interfaces domain in Python by presenting a collection of larger Python scripts that do real systems work: comparing and copying directory trees, splitting files, searching files and directories, testing other programs, configuring launched programs’ shell environments, and so on. The examples here are Python system utility programs that illustrate typical tasks and techniques in this domain and focus on applying built-in tools, such as file and directory tree processing.

Although the main point of this case-study chapter is to give you a feel for realistic scripts in action, the size of these examples also gives us an opportunity to see Python’s support for development paradigms like object-oriented programming (OOP) and reuse at work. It’s really only in the context of nontrivial programs such as the ones we’ll meet here that such tools begin to bear tangible fruit. This chapter also emphasizes the “why” of system tools, not just the “how”; along the way, I’ll point out real-world needs met by the examples we’ll study, to help you put the details in context.

One note up front: this chapter moves quickly, and a few of its examples are largely listed just for independent study. Because all the scripts here are heavily documented and use Python system tools described in the preceding chapters, I won’t go through all the code in exhaustive detail. You should read the source code listings and experiment with these programs on your own computer to get a better feel for how to combine system interfaces to accomplish realistic tasks. All are available in source code form in the book’s examples distribution and most work on all major platforms.

I should also mention that most of these are programs I have really used, not examples

written just for this book. They were coded over a period of years and perform widely

differing tasks, so there is no obvious common thread to connect the dots here other

than need. On the other hand, they help explain why system tools are useful in the first

place, demonstrate larger development concepts that simpler examples cannot, and

bear collective witness to the simplicity and portability of automating system tasks with

Python. Once you’ve mastered the basics, you’ll wish you had done so sooner.

A Quick Game of “Find the Biggest Python File”

Quick: what’s the biggest Python source file on your computer? This was the query

innocently posed by a student in one of my Python classes. Because I didn’t know either,

it became an official exercise in subsequent classes, and it provides a good example of

ways to apply Python system tools for a realistic purpose in this book. Really, the query

is a bit vague, because its scope is unclear. Do we mean the largest Python file in a

directory, in a full directory tree, in the standard library, on the module import search

path, or on your entire hard drive? Different scopes imply different solutions.

Scanning the Standard Library Directory

For instance, Example 6-1 is a first-cut solution that looks for the biggest Python file

in one directory—a limited scope, but enough to get started.

Example 6-1. PP4E\System\Filetools\bigpy-dir.py

"""
Find the largest Python source file in a single directory.
Search Windows Python source lib, unless dir command-line arg.
"""

import os, glob, sys
dirname = r'C:\Python31\Lib' if len(sys.argv) == 1 else sys.argv[1]

allsizes = []
allpy = glob.glob(dirname + os.sep + '*.py')
for filename in allpy:
    filesize = os.path.getsize(filename)
    allsizes.append((filesize, filename))

allsizes.sort()
print(allsizes[:2])
print(allsizes[-2:])

This script uses the glob module to run through a directory’s files and detects the largest by storing sizes and names on a list that is sorted at the end; because size appears first in the list’s tuples, it will dominate the ascending value sort, and the largest percolates to the end of the list. We could instead keep track of the currently largest as we go, but the list scheme is more flexible. When run, this script scans the Python standard library’s source directory on Windows, unless you pass a different directory on the command line, and it prints both the two smallest and largest files it finds:

C:\...\PP4E\System\Filetools> bigpy-dir.py
[(0, 'C:\\Python31\\Lib\\build_class.py'), (56, 'C:\\Python31\\Lib\\struct.py')]
[(147086, 'C:\\Python31\\Lib\\turtle.py'), (211238, 'C:\\Python31\\Lib\\decimal.py')]

C:\...\PP4E\System\Filetools> bigpy-dir.py .
[(21, '.\\__init__.py'), (461, '.\\bigpy-dir.py')]
[(1940, '.\\bigext-tree.py'), (2547, '.\\split.py')]

C:\...\PP4E\System\Filetools> bigpy-dir.py ..
[(21, '..\\__init__.py'), (29, '..\\testargv.py')]
[(541, '..\\testargv2.py'), (549, '..\\more.py')]
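The alternative of tracking only the largest file, instead of sorting a full list, might look like the following sketch. The function names biggest and demo are hypothetical, and the demo builds a scratch directory so that the result does not depend on any particular Python install.

```python
import glob, os, tempfile

def biggest(dirname, pattern='*.py'):
    """return (size, filename) for the largest match in dirname, or None"""
    sizes = ((os.path.getsize(name), name)
             for name in glob.glob(os.path.join(dirname, pattern)))
    return max(sizes, default=None)          # tuples compare by size first

def demo():
    """build a scratch directory with two files and report the larger"""
    with tempfile.TemporaryDirectory() as tmp:
        for name, text in [('small.py', 'x'), ('large.py', 'x' * 100)]:
            with open(os.path.join(tmp, name), 'w') as f:
                f.write(text)
        size, path = biggest(tmp)
        return size, os.path.basename(path)
```

This keeps memory use constant, but it gives up the ability to report the smallest files or the top few, which is the flexibility the sorted list provides.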

Scanning the Standard Library Tree

The prior section’s solution works, but it’s obviously a partial answer: Python files are usually located in more than one directory. Even within the standard library, there are many subdirectories for module packages, and they may be arbitrarily nested. We really need to traverse an entire directory tree. Moreover, the first output above is difficult to read; Python’s pprint (for “pretty print”) module can help here. Example 6-2 puts these extensions into code.

Example 6-2. PP4E\System\Filetools\bigpy-tree.py

"""
Find the largest Python source file in an entire directory tree.
Search the Python source lib, use pprint to display results nicely.
"""

import sys, os, pprint
trace = False
if sys.platform.startswith('win'):
    dirname = r'C:\Python31\Lib'                # Windows
else:
    dirname = '/usr/lib/python'                 # Unix, Linux, Cygwin

allsizes = []
for (thisDir, subsHere, filesHere) in os.walk(dirname):
    if trace: print(thisDir)
    for filename in filesHere:
        if filename.endswith('.py'):
            if trace: print('...', filename)
            fullname = os.path.join(thisDir, filename)
            fullsize = os.path.getsize(fullname)
            allsizes.append((fullsize, fullname))

allsizes.sort()
pprint.pprint(allsizes[:2])
pprint.pprint(allsizes[-2:])

When run, this new version uses os.walk to search an entire tree of directories for the

largest Python source file. Change this script’s trace variable if you want to track its

progress through the tree. As coded, it searches the Python standard library’s source

tree, tailored for Windows and Unix-like locations:

C:\...\PP4E\System\Filetools> bigpy-tree.py

[(0, 'C:\\Python31\\Lib\\build_class.py'),

(0, 'C:\\Python31\\Lib\\email\\mime\\__init__.py')]

[(211238, 'C:\\Python31\\Lib\\decimal.py'),

(380582, 'C:\\Python31\\Lib\\pydoc_data\\topics.py')]
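If only the top few files are of interest, the same walk can feed heapq.nlargest instead of building and sorting a complete list. This is a sketch of a variation, not the book's code; python_files, largest, and demo are hypothetical names, and the demo tree is created in a temporary directory.

```python
import heapq, os, tempfile

def python_files(root):
    """yield (size, path) for each .py file in the tree at root"""
    for (this_dir, subs_here, files_here) in os.walk(root):
        for filename in files_here:
            if filename.endswith('.py'):
                path = os.path.join(this_dir, filename)
                yield os.path.getsize(path), path

def largest(root, count=2):
    """return the count largest files, biggest first"""
    return heapq.nlargest(count, python_files(root))

def demo():
    """make a small nested tree and list its two largest .py files"""
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, 'pkg', 'sub'))
        files = [('a.py', 5), ('pkg/b.py', 50), ('pkg/sub/c.py', 500), ('pkg/d.txt', 5000)]
        for rel, size in files:
            with open(os.path.join(tmp, *rel.split('/')), 'w') as f:
                f.write('x' * size)
        return [(size, os.path.basename(path)) for size, path in largest(tmp)]
```

Note that the results come back largest first, the reverse of the ascending order printed by Example 6-2.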

Scanning the Module Search Path

Sure enough, the prior section’s script found the smallest and largest files in subdirectories. While searching Python’s entire standard library tree this way is more inclusive, it’s still incomplete: there may be additional modules installed elsewhere on your computer, which are accessible from the module import search path but outside Python’s source tree. To be more exhaustive, we could instead essentially perform the same tree search, but for every directory on the module import search path. Example 6-3 adds this extension to include every importable Python-coded module on your computer, located both on the path directly and nested in package directory trees.

Example 6-3. PP4E\System\Filetools\bigpy-path.py

"""
Find the largest Python source file on the module import search path.
Skip already-visited directories, normalize path and case so they will
match properly, and include line counts in pprinted result. It's not
enough to use os.environ['PYTHONPATH']: this is a subset of sys.path.
"""

import sys, os, pprint
trace = 0  # 1=dirs, 2=+files

visited  = {}
allsizes = []
for srcdir in sys.path:
    for (thisDir, subsHere, filesHere) in os.walk(srcdir):
        if trace > 0: print(thisDir)
        thisDir = os.path.normpath(thisDir)
        fixcase = os.path.normcase(thisDir)
        if fixcase in visited:
            continue
        else:
            visited[fixcase] = True
        for filename in filesHere:
            if filename.endswith('.py'):
                if trace > 1: print('...', filename)
                pypath = os.path.join(thisDir, filename)
                try:
                    pysize = os.path.getsize(pypath)
                except os.error:
                    print('skipping', pypath, sys.exc_info()[0])
                else:
                    pylines = len(open(pypath, 'rb').readlines())
                    allsizes.append((pysize, pylines, pypath))

print('By size...')
allsizes.sort()
pprint.pprint(allsizes[:3])
pprint.pprint(allsizes[-3:])

print('By lines...')
allsizes.sort(key=lambda x: x[1])
pprint.pprint(allsizes[:3])
pprint.pprint(allsizes[-3:])

When run, this script marches down the module import path and, for each valid directory it contains, attempts to search the entire tree rooted there. In fact, it nests loops three deep: for items on the path, directories in the item’s tree, and files in the directory. Because the module path may contain directories named in arbitrary ways, along the way this script must take care to:

• Normalize directory paths, fixing up slashes and dots to map directories to a common form.

• Normalize directory name case, converting to lowercase on case-insensitive Windows, so that same names match by string equality, but leaving case unchanged on Unix, where it matters.

• Detect repeats to avoid visiting the same directory twice (the same directory might be reached from more than one entry on sys.path).

• Skip any file-like item in the tree for which os.path.getsize fails (by default os.walk itself silently ignores things it cannot treat as directories, both at the top of and within the tree).

• Avoid potential Unicode decoding errors in file content by opening files in binary mode in order to count their lines. Text mode requires decodable content, and some files in Python 3.1’s library tree cannot be decoded properly on Windows. Catching Unicode exceptions with a try statement would avoid program exits, too, but might skip candidate files.
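The normalization and repeat-detection steps in the list above can be isolated in a small generator. The sketch below is a variation, not the code of Example 6-3: the name walk_once is hypothetical, and it also prunes the walk by emptying the subdirectory list in place, something the example does not do, so that os.walk does not descend into a tree already seen.

```python
import os, tempfile

def walk_once(roots):
    """yield each directory under roots a single time"""
    visited = set()
    for root in roots:
        for (this_dir, subs_here, files_here) in os.walk(root):
            # common form: slashes and dots fixed, case folded where it is ignored
            key = os.path.normcase(os.path.normpath(this_dir))
            if key in visited:
                subs_here[:] = []            # prune: don't descend into a repeat
                continue
            visited.add(key)
            yield this_dir

def demo():
    """count directories visited when the same tree is named three ways"""
    with tempfile.TemporaryDirectory() as tmp:
        sub = os.path.join(tmp, 'pkg')
        os.makedirs(os.path.join(sub, 'inner'))
        # pkg appears inside tmp, on its own, and with a redundant "." component
        roots = [tmp, sub, os.path.join(tmp, '.', 'pkg')]
        return len(list(walk_once(roots)))
```

The in-place slice assignment matters: os.walk consults the same list object when deciding where to go next, so rebinding the name instead would have no effect.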

This version also adds line counts; this might add significant run time to this script too, but it’s a useful metric to report. In fact, this version uses this value as a sort key to report the three largest and smallest files by line counts too; this may differ from results based upon raw file size. Here’s the script in action in Python 3.1 on my Windows 7 machine; since these results depend on platform, installed extensions, and path settings, your sys.path and largest and smallest files may vary:

C:\...\PP4E\System\Filetools> bigpy-path.py

By size...

[(0, 0, 'C:\\Python31\\lib\\build_class.py'),

(0, 0, 'C:\\Python31\\lib\\email\\mime\\__init__.py'),

(0, 0, 'C:\\Python31\\lib\\email\\test\\__init__.py')]

[(161613, 3754, 'C:\\Python31\\lib\\tkinter\\__init__.py'),

(211238, 5768, 'C:\\Python31\\lib\\decimal.py'),

(380582, 78, 'C:\\Python31\\lib\\pydoc_data\\topics.py')]

By lines...

[(0, 0, 'C:\\Python31\\lib\\build_class.py'),

(0, 0, 'C:\\Python31\\lib\\email\\mime\\__init__.py'),

(0, 0, 'C:\\Python31\\lib\\email\\test\\__init__.py')]

[(147086, 4132, 'C:\\Python31\\lib\\turtle.py'),

(150069, 4268, 'C:\\Python31\\lib\\test\\test_descr.py'),

(211238, 5768, 'C:\\Python31\\lib\\decimal.py')]

Again, change this script’s trace variable if you want to track its progress through the

tree. As you can see, the results for largest files differ when viewed by size and lines—

a disparity which we’ll probably have to hash out in our next requirements meeting.
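To see why the two orderings disagree, it helps to compare a few of the records from the output listing above under each key. In this sketch the paths are abbreviated for readability, and bytes_per_line is a hypothetical extra metric, computed with integer division.

```python
# (bytes, lines, name) records taken from the output listing, paths abbreviated
records = [
    (161613, 3754, 'tkinter\\__init__.py'),
    (211238, 5768, 'decimal.py'),
    (380582, 78, 'pydoc_data\\topics.py'),
    (150069, 4268, 'test\\test_descr.py'),
]

by_bytes = max(records, key=lambda rec: rec[0])      # largest by file size
by_lines = max(records, key=lambda rec: rec[1])      # largest by line count

# average line length explains the gap: a few very long lines inflate byte size
bytes_per_line = {name: size // lines for (size, lines, name) in records}
```

Here topics.py wins by bytes with only 78 lines, which suggests very long lines, while decimal.py wins by line count.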

Scanning the Entire Machine

Finally, although searching trees rooted in the module import path normally includes

every Python source file you can import on your computer, it’s still not complete.

Technically, this approach checks only modules; Python source files which are top-

level scripts run directly do not need to be included in the module path. Moreover, the

module search path may be manually changed by some scripts dynamically at runtime

(for example, by direct sys.path updates in scripts that run on web servers) to include

additional directories that Example 6-3 won’t catch.

Ultimately, finding the largest source file on your computer requires searching your

entire drive—a feat which our tree searcher in Example 6-2 almost supports, if we

generalize it to accept the root directory name as an argument and add some of the bells

and whistles of the path searcher version (we really want to avoid visiting the same

directory twice if we’re scanning an entire machine, and we might as well skip errors

and check line-based sizes if we’re investing the time). Example 6-4 implements such

general tree scans, outfitted for the heavier lifting required for scanning drives.

Example 6-4. PP4E\System\Filetools\bigext-tree.py

"""
Find the largest file of a given type in an arbitrary directory tree.
Avoid repeat paths, catch errors, add tracing and line count size.
Also uses sets, file iterators and generator to avoid loading entire
file, and attempts to work around undecodable dir/file name prints.
"""

import os, pprint
from sys import argv, exc_info

trace = 1                                   # 0=off, 1=dirs, 2=+files
dirname, extname = os.curdir, '.py'         # default is .py files in cwd
if len(argv) > 1: dirname = argv[1]         # ex: C:\, C:\Python31\Lib
if len(argv) > 2: extname = argv[2]         # ex: .pyw, .txt
if len(argv) > 3: trace   = int(argv[3])    # ex: ". .py 2"

def tryprint(arg):
    try:
        print(arg)                          # unprintable filename?
    except UnicodeEncodeError:
        print(arg.encode())                 # try raw byte string

visited  = set()
allsizes = []
for (thisDir, subsHere, filesHere) in os.walk(dirname):
    if trace: tryprint(thisDir)
    thisDir = os.path.normpath(thisDir)
    fixname = os.path.normcase(thisDir)
    if fixname in visited:
        if trace: tryprint('skipping ' + thisDir)
    else:
        visited.add(fixname)
        for filename in filesHere:
            if filename.endswith(extname):
                if trace > 1: tryprint('+++' + filename)
                fullname = os.path.join(thisDir, filename)
                try:
                    bytesize = os.path.getsize(fullname)
                    linesize = sum(+1 for line in open(fullname, 'rb'))
                except Exception:
                    print('error', exc_info()[0])
                else:
                    allsizes.append((bytesize, linesize, fullname))

for (title, key) in [('bytes', 0), ('lines', 1)]:
    print('\nBy %s...' % title)
    allsizes.sort(key=lambda x: x[key])
    pprint.pprint(allsizes[:3])
    pprint.pprint(allsizes[-3:])

Unlike the prior tree version, this one allows us to search in specific directories, and

for specific extensions. The default is to simply search the current working directory

for Python files:

C:\...\PP4E\System\Filetools> bigext-tree.py

By bytes...

[(21, 1, '.\\__init__.py'),

(461, 17, '.\\bigpy-dir.py'),

(818, 25, '.\\bigpy-tree.py')]

[(1696, 48, '.\\join.py'),

(1940, 49, '.\\bigext-tree.py'),

(2547, 57, '.\\split.py')]

By lines...

[(21, 1, '.\\__init__.py'),

(461, 17, '.\\bigpy-dir.py'),

(818, 25, '.\\bigpy-tree.py')]

[(1696, 48, '.\\join.py'),

(1940, 49, '.\\bigext-tree.py'),

(2547, 57, '.\\split.py')]

For more custom work, we can pass in a directory name, extension type, and trace level
on the command line now (trace level 0 disables tracing, and 1, the default, shows
directories visited along the way):

C:\...\PP4E\System\Filetools> bigext-tree.py .. .py 0

By bytes...

[(21, 1, '..\\__init__.py'),

(21, 1, '..\\Filetools\\__init__.py'),

(28, 1, '..\\Streams\\hello-out.py')]

[(2278, 67, '..\\Processes\\multi2.py'),

(2547, 57, '..\\Filetools\\split.py'),

(4361, 105, '..\\Tester\\tester.py')]

By lines...

[(21, 1, '..\\__init__.py'),

(21, 1, '..\\Filetools\\__init__.py'),

(28, 1, '..\\Streams\\hello-out.py')]

[(2547, 57, '..\\Filetools\\split.py'),

(2278, 67, '..\\Processes\\multi2.py'),

(4361, 105, '..\\Tester\\tester.py')]

This script also lets us scan for different file types; here it is picking out the smallest

and largest text file from one level up (at the time I ran this script, at least):

C:\...\PP4E\System\Filetools> bigext-tree.py .. .txt 1

..\Environment

..\Filetools

..\Processes

..\Streams

..\Tester

..\Tester\Args

..\Tester\Errors

..\Tester\Inputs

..\Tester\Outputs

..\Tester\Scripts

..\Tester\xxold

..\Threads

By bytes...

[(4, 2, '..\\Streams\\input.txt'),

(13, 1, '..\\Streams\\hello-in.txt'),

(20, 4, '..\\Streams\\data.txt')]

[(104, 4, '..\\Streams\\output.txt'),

(172, 3, '..\\Tester\\xxold\\README.txt.txt'),

(435, 4, '..\\Filetools\\temp.txt')]

By lines...

[(13, 1, '..\\Streams\\hello-in.txt'),

(22, 1, '..\\spam.txt'),

(4, 2, '..\\Streams\\input.txt')]

[(20, 4, '..\\Streams\\data.txt'),

(104, 4, '..\\Streams\\output.txt'),

(435, 4, '..\\Filetools\\temp.txt')]

And now, to search your entire system, simply pass in your machine’s root directory

name (use / instead of C:\ on Unix-like machines), along with an optional file extension

type (.py is just the default now). The winner is…(please, no wagering):

C:\...\PP4E\dev\Examples\PP4E\System\Filetools> bigext-tree.py C:\

C:\

C:\$Recycle.Bin

C:\$Recycle.Bin\S-1-5-21-3951091421-2436271001-910485044-1004

C:\cygwin

C:\cygwin\bin

C:\cygwin\cygdrive

C:\cygwin\dev

C:\cygwin\dev\mqueue

C:\cygwin\dev\shm

C:\cygwin\etc

...MANY more lines omitted...

By bytes...

[(0, 0, 'C:\\cygwin\\...\\python31\\Python-3.1.1\\Lib\\build_class.py'),

(0, 0, 'C:\\cygwin\\...\\python31\\Python-3.1.1\\Lib\\email\\mime\\__init__.py'),

(0, 0, 'C:\\cygwin\\...\\python31\\Python-3.1.1\\Lib\\email\\test\\__init__.py')]

[(380582, 78, 'C:\\Python31\\Lib\\pydoc_data\\topics.py'),

(398157, 83, 'C:\\...\\Install\\Source\\Python-2.6\\Lib\\pydoc_topics.py'),

(412434, 83, 'C:\\Python26\\Lib\\pydoc_topics.py')]

By lines...

[(0, 0, 'C:\\cygwin\\...\\python31\\Python-3.1.1\\Lib\\build_class.py'),

(0, 0, 'C:\\cygwin\\...\\python31\\Python-3.1.1\\Lib\\email\\mime\\__init__.py'),

(0, 0, 'C:\\cygwin\\...\\python31\\Python-3.1.1\\Lib\\email\\test\\__init__.py')]

[(204107, 5589, 'C:\\...\\Install\\Source\\Python-3.0\\Lib\\decimal.py'),

(205470, 5768, 'C:\\cygwin\\...\\python31\\Python-3.1.1\\Lib\\decimal.py'),

(211238, 5768, 'C:\\Python31\\Lib\\decimal.py')]

The script’s trace logic is preset to allow you to monitor its directory progress. I’ve

shortened some directory names to protect the innocent here (and to fit on this page).

This command may take a long time to finish on your computer—on my sadly under-

powered Windows 7 netbook, it took 11 minutes to scan a solid state drive with some

59G of data, 200K files, and 25K directories when the system was lightly loaded (8

minutes when not tracing directory names, but half an hour when many other appli-

cations were running). Nevertheless, it provides the most exhaustive solution to the

original query of all our attempts.

This is also as complete a solution as we have space for in this book. For more fun,
consider that you may need to scan more than one drive, and some Python source files
may also appear in zip archives, whether on the module path or not (os.walk silently
ignores the contents of zip files in Example 6-3). They might also be named in other
ways: with .pyw extensions to suppress shell pop-ups on Windows, and with arbitrary
extensions for some top-level scripts. In fact, top-level scripts might have no filename
extension at all, even though they are Python source files. And while they’re generally
not Python files, some importable modules may also appear in frozen binaries or be
statically linked into the Python executable. In the interest of space, we’ll leave such
higher-resolution (and potentially intractable!) search extensions as suggested exercises.
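As one possible starting point for the zip-archive exercise, the following sketch lists the byte sizes of Python source files stored inside a single zip archive, using the standard library's zipfile module. The function name pyfiles_in_zip and the throwaway demo archive are illustrative assumptions, not part of the book's examples package; line counts would require reading each member and are omitted here.

```python
import os, tempfile, zipfile

def pyfiles_in_zip(zippath, extname='.py'):
    """
    hypothetical helper: return sorted (bytesize, membername) tuples
    for members of one zip archive whose names end with extname
    """
    sizes = []
    with zipfile.ZipFile(zippath) as archive:
        for info in archive.infolist():              # one ZipInfo per member
            if info.filename.endswith(extname):
                sizes.append((info.file_size, info.filename))   # uncompressed size
    return sorted(sizes)

if __name__ == '__main__':
    # build a small archive so the demo is self-contained
    tempdir = tempfile.mkdtemp()
    zippath = os.path.join(tempdir, 'demo.zip')
    with zipfile.ZipFile(zippath, 'w') as archive:
        archive.writestr('pkg/__init__.py', '')
        archive.writestr('pkg/big.py', 'x = 1\n' * 100)
        archive.writestr('pkg/notes.txt', 'not python')
    for item in pyfiles_in_zip(zippath):
        print(item)
```

A fuller solution would call something like this for every archive that os.walk reports in filesHere, and merge the results into allsizes.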

Printing Unicode Filenames

One fine point before we move on: notice the seemingly superfluous exception handling

in Example 6-4’s tryprint function. When I first tried to scan an entire drive as shown

in the preceding section, this script died on a Unicode encoding error while trying to

print a directory name of a saved web page. Adding the exception handler skips the

error entirely.

This demonstrates a subtle but pragmatically important issue: Python 3.X’s Unicode

orientation extends to filenames, even if they are just printed. As we learned in Chap-

ter 4, because filenames may contain arbitrary text, os.listdir returns filenames in two

different ways—we get back decoded Unicode strings when we pass in a normal str

argument, and still-encoded byte strings when we send a bytes:

>>> import os

>>> os.listdir('.')[:4]

['bigext-tree.py', 'bigpy-dir.py', 'bigpy-path.py', 'bigpy-tree.py']

>>> os.listdir(b'.')[:4]

[b'bigext-tree.py', b'bigpy-dir.py', b'bigpy-path.py', b'bigpy-tree.py']

Both os.walk (used in the Example 6-4 script) and glob.glob inherit this behavior for

the directory and file names they return, because they work by calling os.listdir in-

ternally at each directory level. For all these calls, passing in a byte string argument

suppresses Unicode decoding of file and directory names. Passing a normal string as-

sumes that filenames are decodable per the file system’s Unicode scheme.
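To see the same str-versus-bytes rule carried through os.walk, the sketch below walks a small temporary tree twice. The directory and file names are made up for the demo, and the bytes form of the root is produced with os.fsencode, which applies the file system encoding; treat it as an illustration rather than part of the book's script.

```python
import os, tempfile

root = tempfile.mkdtemp()                        # throwaway tree for the demo
os.mkdir(os.path.join(root, 'sub'))
open(os.path.join(root, 'sub', 'spam.txt'), 'w').close()

# str root: names come back as decoded str objects
strnames = [files for (dir, subs, files) in os.walk(root)]

# bytes root: names come back as still-encoded bytes objects
bytenames = [files for (dir, subs, files) in os.walk(os.fsencode(root))]

print(strnames)      # [[], ['spam.txt']]
print(bytenames)     # [[], [b'spam.txt']]
```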

The reason this potentially mattered to this section’s example is that running the tree

search version over an entire hard drive eventually reached an undecodable filename

(an old saved web page with an odd name), which generated an exception when the

print function tried to display it. Here’s a simplified recreation of the error, run in a

shell window (Command Prompt) on Windows:

>>> root = r'C:\py3000'

>>> for (dir, subs, files) in os.walk(root): print(dir)

...

C:\py3000

C:\py3000\FutureProofPython - PythonInfo Wiki_files

C:\py3000\Oakwinter_com Code » Porting setuptools to py3k_files

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

File "C:\Python31\lib\encodings\cp437.py", line 19, in encode

return codecs.charmap_encode(input,self.errors,encoding_map)[0]

UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position

45: character maps to <undefined>

One way out of this dilemma is to use a bytes string for the directory root name. This
suppresses filename decoding in the os.listdir calls run by os.walk, and effectively
limits the scope of later printing to raw bytes. Since a bytes object prints as ASCII
characters and escapes, no encoding is required and printing works without error.
Manually encoding to bytes prior to printing works too, but the results are slightly
different: the first form shows bytes as encoded by the file system, while the encode
method with no arguments uses UTF-8:

>>> root.encode()

b'C:\\py3000'

>>> for (dir, subs, files) in os.walk(root.encode()): print(dir)

...

b'C:\\py3000'

b'C:\\py3000\\FutureProofPython - PythonInfo Wiki_files'

b'C:\\py3000\\Oakwinter_com Code \xbb Porting setuptools to py3k_files'

b'C:\\py3000\\What\x92s New in Python 3_0 \x97 Python Documentation'

>>> for (dir, subs, files) in os.walk(root): print(dir.encode())

...

b'C:\\py3000'

b'C:\\py3000\\FutureProofPython - PythonInfo Wiki_files'

b'C:\\py3000\\Oakwinter_com Code \xc2\xbb Porting setuptools to py3k_files'

b'C:\\py3000\\What\xe2\x80\x99s New in Python 3_0 \xe2\x80\x94 Python Documentation'

Unfortunately, either approach means that all the directory names printed during the

walk display as cryptic byte strings. To maintain the better readability of normal strings,

I instead opted for the exception handler approach used in the script’s code. This avoids

the issues entirely:

>>> for (dir, subs, files) in os.walk(root):
...     try:
...         print(dir)
...     except UnicodeEncodeError:
...         print(dir.encode())         # or simply punt if encode may fail too

...

C:\py3000

C:\py3000\FutureProofPython - PythonInfo Wiki_files

C:\py3000\Oakwinter_com Code » Porting setuptools to py3k_files

b'C:\\py3000\\What\xe2\x80\x99s New in Python 3_0 \xe2\x80\x94 Python Documentation'

Oddly, though, the error seems more related to printing than to Unicode encodings of

filenames—because the filename did not fail until printed, it must have been decodable

when its string was created initially. That’s why wrapping up the print in a try suffices;

otherwise, the error would occur earlier.
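An alternative to catching the exception, which the book's script does not use, is to encode with the output stream's own encoding and an error handler that escapes whatever cannot be encoded; this keeps the printable part of the name readable. The sketch below makes several assumptions: the name safeprint is hypothetical, and a cp437 text stream wrapped around an in-memory buffer stands in for the Windows console.

```python
import io, sys

def safeprint(text, file=None):
    """
    hypothetical variant of tryprint: escape unencodable characters
    as \\uXXXX sequences instead of falling back to a bytes display
    """
    if file is None:
        file = sys.stdout
    encoding = getattr(file, 'encoding', None) or 'utf-8'
    raw = text.encode(encoding, errors='backslashreplace')   # never raises here
    print(raw.decode(encoding), file=file)

# simulate a console limited to the cp437 code page
rawbytes = io.BytesIO()
console = io.TextIOWrapper(rawbytes, encoding='cp437', newline='')
safeprint('What\u2019s New in Python 3_0', file=console)
console.flush()
print(rawbytes.getvalue())      # b'What\\u2019s New in Python 3_0\n'
```

The trade-off is that escaped names are no longer exact: they are display text only, and cannot be passed back to file tools as paths.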

Moreover, this error does not occur if the script’s output is redirected to a file, either

at the shell level (bigext-tree.py c:\ > out), or by the print call itself (print(dir,

file=F)). In the latter case the output file must later be read back in binary mode, as

text mode triggers the same error when printing the file’s content to the shell window

(but again, not until printed). In fact, the exact same code that fails when run in a system

shell Command Prompt on Windows works without error when run in the IDLE GUI

on the same platform—the tkinter GUI used by IDLE handles display of characters that

printing to standard output connected to a shell terminal window does not:

>>> import os # run in IDLE (a tkinter GUI), not system shell

>>> root = r'C:\py3000'

>>> for (dir, subs, files) in os.walk(root): print(dir)

C:\py3000

C:\py3000\FutureProofPython - PythonInfo Wiki_files

C:\py3000\Oakwinter_com Code » Porting setuptools to py3k_files

C:\py3000\What's New in Python 3_0 — Python Documentation_files

In other words, the exception occurs only when printing to a shell window, and long

after the file name string is created. This reflects an artifact of extra translations

performed by the Python printer, not of Unicode file names in general. Because we have

no room for further exploration here, though, we’ll have to be satisfied with the fact

that our exception handler sidesteps the printing problem altogether. You should still

be aware of the implications of Unicode filename decoding, though; on some platforms

you may need to pass byte strings to os.walk in this script to prevent decoding errors

as filenames are created.*

Since Unicode is still relatively new in 3.1, be sure to test for such errors on your
computer and your Python. See also Python’s manuals for more on the treatment of
Unicode filenames, and the text Learning Python for more on Unicode in general. As

noted earlier, our scripts also had to open text files in binary mode because some might

contain undecodable content too. It might seem surprising that Unicode issues can crop

up in basic printing like this too, but such is life in the brave new Unicode world. Many

real-world scripts don’t need to care much about Unicode, of course—including those

we’ll explore in the next section.

Splitting and Joining Files

Like most kids, mine spent a lot of time on the Internet when they were growing up.

As far as I could tell, it was the thing to do. Among their generation, computer geeks

and gurus seem to have been held in the same sort of esteem that my generation once

held rock stars. When kids disappeared into their rooms, chances were good that they

were hacking on computers, not mastering guitar riffs (well, real ones, at least). It may

or may not be healthier than some of the diversions of my own misspent youth, but

that’s a topic for another kind of book.

Despite the rhetoric of techno-pundits about the Web’s potential to empower an up-

coming generation in ways unimaginable by their predecessors, my kids seemed to

spend most of their time playing games. To fetch new ones in my house at the time,

they had to download to a shared computer which had Internet access and transfer

those games to their own computers to install. (Their own machines did not have

Internet access until later, for reasons that most parents in the crowd could probably

expand upon.)

The problem with this scheme is that game files are not small. They were usually much
too big to fit on a floppy or memory stick of the time, and burning a CD or DVD took
away valuable game-playing time. If all the machines in my house ran Linux, this would
have been a nonissue. There are standard command-line programs on Unix for chopping
a file into pieces small enough to fit on a transfer device (split), and others for
putting the pieces back together to re-create the original file (cat). Because we had all
sorts of different machines in the house, though, we needed a more portable solution.†

* For a related print issue, see Chapter 14’s workaround for program aborts when printing stack tracebacks
to standard output from spawned programs. Unlike the problem described here, that issue does not appear
to be related to Unicode characters that may be unprintable in shell windows but reflects another regression
for standard output prints in general in Python 3.1, which may or may not be repaired by the time you read
this text. See also the Python environment variable PYTHONIOENCODING, which can override the default
encoding used for standard streams.

Splitting Files Portably

Since all the computers in my house ran Python, a simple portable Python script came

to the rescue. The Python program in Example 6-5 distributes a single file’s contents

among a set of part files and stores those part files in a directory.

Example 6-5. PP4E\System\Filetools\split.py

#!/usr/bin/python

"""

################################################################################

split a file into a set of parts; join.py puts them back together;

this is a customizable version of the standard Unix split command-line

utility; because it is written in Python, it also works on Windows and

can be easily modified; because it exports a function, its logic can

also be imported and reused in other applications;

################################################################################

"""

import sys, os
kilobytes = 1024
megabytes = kilobytes * 1000
chunksize = int(1.4 * megabytes)                   # default: roughly a floppy

def split(fromfile, todir, chunksize=chunksize):
    if not os.path.exists(todir):                  # caller handles errors
        os.mkdir(todir)                            # make dir, read/write parts
    else:
        for fname in os.listdir(todir):            # delete any existing files
            os.remove(os.path.join(todir, fname))
    partnum = 0
    input = open(fromfile, 'rb')                   # binary: no decode, endline
    while True:                                    # eof=empty string from read
        chunk = input.read(chunksize)              # get next part <= chunksize
        if not chunk: break
        partnum += 1
        filename = os.path.join(todir, ('part%04d' % partnum))
        fileobj = open(filename, 'wb')
        fileobj.write(chunk)
        fileobj.close()                            # or simply open().write()
    input.close()
    assert partnum <= 9999                         # join sort fails if 5 digits
    return partnum

if __name__ == '__main__':
    if len(sys.argv) == 2 and sys.argv[1] == '-help':
        print('Use: split.py [file-to-split target-dir [chunksize]]')
    else:
        if len(sys.argv) < 3:
            interactive = True
            fromfile = input('File to be split? ')         # input if clicked
            todir = input('Directory to store part files? ')
        else:
            interactive = False
            fromfile, todir = sys.argv[1:3]                # args in cmdline
            if len(sys.argv) == 4: chunksize = int(sys.argv[3])
        absfrom, absto = map(os.path.abspath, [fromfile, todir])
        print('Splitting', absfrom, 'to', absto, 'by', chunksize)

        try:
            parts = split(fromfile, todir, chunksize)
        except:
            print('Error during split:')
            print(sys.exc_info()[0], sys.exc_info()[1])
        else:
            print('Split finished:', parts, 'parts are in', absto)
        if interactive: input('Press Enter key')           # pause if clicked

† I should note that this background story stems from the second edition of this book, written in 2000. Some
ten years later, floppies have largely gone the way of the parallel port and the dinosaur. Moreover, burning
a CD or DVD is no longer as painful as it once was; there are new options today such as large flash memory
cards, wireless home networks, and simple email; and naturally, my home computer configuration isn’t
what it once was. For that matter, some of my kids are no longer kids (though they’ve retained some backward
compatibility with their former selves).

By default, this script splits the input file into chunks that are roughly the size of a

floppy disk—perfect for moving big files between the electronically isolated machines

of the time. Most importantly, because this is all portable Python code, this script will

run on just about any machine, even ones without their own file splitter. All it requires

is an installed Python. Here it is at work splitting a Python 3.1 self-installer executable

located in the current working directory on Windows (I’ve omitted a few dir output

lines to save space here; use ls -l on Unix):

C:\temp> cd C:\temp

C:\temp> dir python-3.1.msi

...more...

06/27/2009 04:53 PM 13,814,272 python-3.1.msi

1 File(s) 13,814,272 bytes

0 Dir(s) 188,826,189,824 bytes free

C:\temp> python C:\...\PP4E\System\Filetools\split.py -help

Use: split.py [file-to-split target-dir [chunksize]]

C:\temp> python C:\...\PP4E\System\Filetools\split.py python-3.1.msi pysplit

Splitting C:\temp\python-3.1.msi to C:\temp\pysplit by 1433600

Split finished: 10 parts are in C:\temp\pysplit

C:\temp> dir pysplit

...more...

02/21/2010 11:13 AM <DIR> .

02/21/2010 11:13 AM <DIR> ..

02/21/2010 11:13 AM 1,433,600 part0001

02/21/2010 11:13 AM 1,433,600 part0002

02/21/2010 11:13 AM 1,433,600 part0003

02/21/2010 11:13 AM 1,433,600 part0004

02/21/2010 11:13 AM 1,433,600 part0005

02/21/2010 11:13 AM 1,433,600 part0006

02/21/2010 11:13 AM 1,433,600 part0007

02/21/2010 11:13 AM 1,433,600 part0008

02/21/2010 11:13 AM 1,433,600 part0009

02/21/2010 11:13 AM 911,872 part0010

10 File(s) 13,814,272 bytes

2 Dir(s) 188,812,328,960 bytes free

Each of these generated part files represents one binary chunk of the file

python-3.1.msi—a chunk small enough to fit comfortably on a floppy disk of the time.

In fact, if you add the sizes of the generated part files given by the dir command, you’ll

come up with exactly the same number of bytes as the original file’s size. Before we see

how to put these files back together again, here are a few points to ponder as you study

this script’s code:

Operation modes

This script is designed to input its parameters in either interactive or command-

line mode; it checks the number of command-line arguments to find out the mode

in which it is being used. In command-line mode, you list the file to be split and

the output directory on the command line, and you can optionally override the

default part file size with a third command-line argument.

In interactive mode, the script asks for a filename and output directory at the con-

sole window with input and pauses for a key press at the end before exiting. This

mode is nice when the program file is started by clicking on its icon; on Windows,

parameters are typed into a pop-up DOS box that doesn’t automatically disappear.

The script also shows the absolute paths of its parameters (by running them

through os.path.abspath) because they may not be obvious in interactive mode.

Binary file mode

This code is careful to open both input and output files in binary mode (rb, wb),

because it needs to portably handle things like executables and audio files, not just

text. In Chapter 4, we learned that on Windows, text-mode files automatically map

\r\n end-of-line sequences to \n on input and map \n to \r\n on output. For true

binary data, we really don’t want any \r characters in the data to go away when

read, and we don’t want any superfluous \r characters to be added on output.

Binary-mode files suppress this \r mapping when the script is run on Windows

and so avoid data corruption.

In Python 3.X, binary mode also means that file data is bytes objects in our script,
not decoded str text, though we don’t need to do anything special: this script’s

file processing code runs the same on Python 3.X as it did on 2.X. In fact, binary

mode is required in 3.X for this program, because the target file’s data may not be

encoded text at all; text mode requires that file content must be decodable in 3.X,

and that might fail both for truly binary data and text files obtained from other

platforms. On output, binary mode accepts bytes and suppresses Unicode encod-

ing and line-end translations.

Manually closing files

This script also goes out of its way to manually close its files. As we also saw

in Chapter 4, we can often get by with a single line: open(partname,

'wb').write(chunk). This shorter form relies on the fact that the current Python

implementation automatically closes files for you when file objects are reclaimed

(i.e., when they are garbage collected, because there are no more references to the

file object). In this one-liner, the file object would be reclaimed immediately, be-

cause the open result is temporary in an expression and is never referenced by a

longer-lived name. Similarly, the input file is reclaimed when the split function

exits.

However, it’s not impossible that this automatic-close behavior may go away in

the future. Moreover, the Jython Java-based Python implementation does not re-

claim unreferenced objects as immediately as the standard Python. You should

close manually if you care about the Java port, your script may potentially create

many files in a short amount of time, and it may run on a machine that has a limit

on the number of open files per program. Because the split function in this module

is intended to be a general-purpose tool, it accommodates such worst-case

scenarios. Also see Chapter 4’s mention of the file context manager and the with

statement; this provides an alternative way to guarantee file closes.
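The with statement mentioned in the last item can be sketched as follows. This is a reworked variant of the split function, not the book's listing: the name split_with is hypothetical, the step that deletes existing part files is omitted for brevity, os.makedirs with exist_ok assumes Python 3.2 or later, and the demo input is a made-up 2,500-byte file.

```python
import os, tempfile

def split_with(fromfile, todir, chunksize=1024):
    """
    variant of split that uses context managers: each file is closed
    when its with block exits, even if an exception is raised
    """
    os.makedirs(todir, exist_ok=True)
    partnum = 0
    with open(fromfile, 'rb') as source:              # binary: no decode, endline
        while True:
            chunk = source.read(chunksize)            # get next part <= chunksize
            if not chunk: break
            partnum += 1
            partname = os.path.join(todir, 'part%04d' % partnum)
            with open(partname, 'wb') as part:        # closed at block exit
                part.write(chunk)
    return partnum

if __name__ == '__main__':
    tempdir = tempfile.mkdtemp()
    fromfile = os.path.join(tempdir, 'data.bin')
    with open(fromfile, 'wb') as f:
        f.write(bytes(2500))                          # 2,500 zero bytes
    print(split_with(fromfile, os.path.join(tempdir, 'parts')))    # 3
```

Because the closes no longer depend on garbage collection, this form behaves the same on implementations such as Jython that reclaim objects later than standard Python does.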

Joining Files Portably

Back to moving big files around the house: after downloading a big game program file,

you can run the previous splitter script by clicking on its name in Windows Explorer

and typing filenames. After a split, simply copy each part file onto its own floppy (or

other more modern medium), walk the files to the destination machine, and re-create

the split output directory on the target computer by copying the part files. Finally, the

script in Example 6-6 is clicked or otherwise run to put the parts back together.

Example 6-6. PP4E\System\Filetools\join.py

#!/usr/bin/python

"""

################################################################################

join all part files in a dir created by split.py, to re-create file.

This is roughly like a 'cat fromdir/* > tofile' command on unix, but is

more portable and configurable, and exports the join operation as a

reusable function. Relies on sort order of filenames: must be same

length. Could extend split/join to pop up Tkinter file selectors.

################################################################################

"""

import os, sys
readsize = 1024

def join(fromdir, tofile):
    output = open(tofile, 'wb')
    parts = os.listdir(fromdir)
    parts.sort()
    for filename in parts:
        filepath = os.path.join(fromdir, filename)
        fileobj = open(filepath, 'rb')
        while True:
            filebytes = fileobj.read(readsize)
            if not filebytes: break
            output.write(filebytes)
        fileobj.close()
    output.close()

if __name__ == '__main__':
    if len(sys.argv) == 2 and sys.argv[1] == '-help':
        print('Use: join.py [from-dir-name to-file-name]')
    else:
        if len(sys.argv) != 3:
            interactive = True
            fromdir = input('Directory containing part files? ')
            tofile = input('Name of file to be recreated? ')
        else:
            interactive = False
            fromdir, tofile = sys.argv[1:]
        absfrom, absto = map(os.path.abspath, [fromdir, tofile])
        print('Joining', absfrom, 'to make', absto)

        try:
            join(fromdir, tofile)
        except:
            print('Error joining files:')
            print(sys.exc_info()[0], sys.exc_info()[1])
        else:
            print('Join complete: see', absto)
        if interactive: input('Press Enter key')     # pause if clicked

Here is a join in progress on Windows, combining the split files we made a moment

ago; after running the join script, you still may need to run something like zip, gzip,

or tar to unpack an archive file unless it’s shipped as an executable, but at least the

original downloaded file is set to go‡:

C:\temp> python C:\...\PP4E\System\Filetools\join.py -help
Use: join.py [from-dir-name to-file-name]

C:\temp> python C:\...\PP4E\System\Filetools\join.py pysplit mypy31.msi
Joining C:\temp\pysplit to make C:\temp\mypy31.msi
Join complete: see C:\temp\mypy31.msi

C:\temp> dir *.msi
...more...
02/21/2010 11:21 AM 13,814,272 mypy31.msi
06/27/2009 04:53 PM 13,814,272 python-3.1.msi
2 File(s) 27,628,544 bytes
0 Dir(s) 188,798,611,456 bytes free

C:\temp> fc /b mypy31.msi python-3.1.msi
Comparing files mypy31.msi and PYTHON-3.1.MSI
FC: no differences encountered

‡ It turns out that the zip, gzip, and tar commands can all be replaced with pure Python code today, too. The
gzip module in the Python standard library provides tools for reading and writing compressed gzip files,
usually named with a .gz filename extension. It can serve as an all-Python equivalent of the standard gzip
and gunzip command-line utility programs. This built-in module uses another module called zlib that
implements gzip-compatible data compressions. In recent Python releases, the zipfile module can be
imported to make and use ZIP format archives (zip is an archive and compression format, gzip is a
compression scheme), and the tarfile module allows scripts to read and write tar archives. See the Python
library manual for details.

The join script simply uses os.listdir to collect all the part files in a directory created
by split, and sorts the filename list to put the parts back together in the correct order.
We get back an exact byte-for-byte copy of the original file (proved by the DOS fc
command in the preceding session; use cmp on Unix).
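The same byte-for-byte check can also be made portably from Python itself. The sketch below uses the standard library's filecmp module; passing shallow=False asks for a comparison of file contents rather than just os.stat signatures. The helper name samebytes and the three small demo files are placeholders, not files from the session.

```python
import filecmp, os, tempfile

def samebytes(path1, path2):
    """return True if the two files have identical content"""
    return filecmp.cmp(path1, path2, shallow=False)

# build three small files: two identical, one different but the same size
tempdir = tempfile.mkdtemp()
names = [os.path.join(tempdir, name)
         for name in ('orig.bin', 'copy.bin', 'other.bin')]
for (name, data) in zip(names, [b'spam' * 100, b'spam' * 100, b'eggs' * 100]):
    with open(name, 'wb') as f:
        f.write(data)

print(samebytes(names[0], names[1]))    # True: identical content
print(samebytes(names[0], names[2]))    # False: same size, different bytes
```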

Some of this process is still manual, of course (I never did figure out how to script the

“walk the floppies to your bedroom” step), but the split and join scripts make it both

quick and simple to move big files around. Because this script is also portable Python

code, it runs on any platform to which we cared to move split files. For instance, my

home computers ran both Windows and Linux at the time; since this script runs on

either platform, the gamers were covered. Before we move on, here are a couple of

implementation details worth underscoring in the join script’s code:

Reading by blocks or files

First of all, notice that this script deals with files in binary mode but also reads each

part file in blocks of 1 KB each. In fact, the readsize setting here (the size of each

block read from an input part file) has no relation to chunksize in split.py (the total

size of each output part file). As we learned in Chapter 4, this script could instead

read each part file all at once: output.write(open(filepath, 'rb').read()). The

downside to this scheme is that it really does load all of a file into memory at once.

For example, reading a 1.4 MB part file into memory all at once with the file object

read method generates a 1.4 MB string in memory to hold the file’s bytes. Since

split allows users to specify even larger chunk sizes, the join script plans for the

worst and reads in terms of limited-size blocks. To be completely robust, the

split script could read its input data in smaller chunks too, but this hasn’t become

a concern in practice (recall that as your program runs, Python automatically re-

claims strings that are no longer referenced, so this isn’t as wasteful as it might

seem).

Sorting filenames

If you study this script’s code closely, you may also notice that the join scheme it

uses relies completely on the sort order of filenames in the parts directory. Because

it simply calls the list sort method on the filenames list returned by os.listdir, it

implicitly requires that filenames have the same length and format when created

by split. To satisfy this requirement, the splitter uses zero-padding notation in a

string formatting expression ('part%04d') to make sure that filenames all have the

same number of digits at the end (four). When sorted, the leading zero characters

in small numbers guarantee that part files are ordered for joining correctly.

Alternatively, we could strip off digits in filenames, convert them with int, and sort
numerically, by using the list sort method’s key argument, but that would still
imply that all filenames must start with the same type of substring, and so doesn’t
quite remove the file-naming dependency between the split and join scripts. Because
these scripts are designed to be two steps of the same process, though, some
dependencies between them seem reasonable.
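The numeric alternative just described might look like the following sketch. The helper name partnumber is hypothetical, and it assumes every part filename ends in a run of digits, which is exactly the remaining naming dependency noted above.

```python
import re

def partnumber(filename):
    """hypothetical sort key: the integer value of the trailing digits"""
    return int(re.search(r'(\d+)$', filename).group(1))

# names as they might look if split did not zero-pad its part numbers
parts = ['part10', 'part2', 'part1', 'part9']

print(sorted(parts))                    # ['part1', 'part10', 'part2', 'part9']
print(sorted(parts, key=partnumber))    # ['part1', 'part2', 'part9', 'part10']
```

The plain string sort puts part10 before part2, which would scramble the joined file; the key function restores numeric order without requiring fixed-width names.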

Usage Variations

Finally, let’s run a few more experiments with these Python system utilities to demon-

strate other usage modes. When run without full command-line arguments, both

split and join are smart enough to input their parameters interactively. Here they are

chopping and gluing the Python self-installer file on Windows again, with parameters

typed in the DOS console window:

C:\temp> python C:\...\PP4E\System\Filetools\split.py

File to be split? python-3.1.msi

Directory to store part files? splitout

Splitting C:\temp\python-3.1.msi to C:\temp\splitout by 1433600

Split finished: 10 parts are in C:\temp\splitout

Press Enter key

C:\temp> python C:\...\PP4E\System\Filetools\join.py

Directory containing part files? splitout

Name of file to be recreated? newpy31.msi

Joining C:\temp\splitout to make C:\temp\newpy31.msi

Join complete: see C:\temp\newpy31.msi

Press Enter key

C:\temp> fc /B python-3.1.msi newpy31.msi

Comparing files python-3.1.msi and NEWPY31.MSI

FC: no differences encountered

When these program files are double-clicked in a Windows file explorer GUI, they work

the same way (there are usually no command-line arguments when they are launched

this way). In this mode, absolute path displays help clarify where files really are. Re-

member, the current working directory is the script’s home directory when clicked like

this, so a simple name actually maps to a source code directory; type a full path to make

the split files show up somewhere else:

[in a pop-up DOS console box when split.py is clicked]

File to be split? c:\temp\python-3.1.msi

Directory to store part files? c:\temp\parts

Splitting c:\temp\python-3.1.msi to c:\temp\parts by 1433600

Split finished: 10 parts are in c:\temp\parts

Press Enter key

[in a pop-up DOS console box when join.py is clicked]

Directory containing part files? c:\temp\parts

Name of file to be recreated? c:\temp\morepy31.msi

Joining c:\temp\parts to make c:\temp\morepy31.msi

Join complete: see c:\temp\morepy31.msi

Press Enter key

Because these scripts package their core logic in functions, though, it’s just as easy to

reuse their code by importing and calling from another Python component (make sure

your module import search path includes the directory containing the PP4E root first;

the first abbreviated line here is one way to do so):

C:\temp> set PYTHONPATH=C:\...\dev\Examples

C:\temp> python

>>> from PP4E.System.Filetools.split import split

>>> from PP4E.System.Filetools.join import join

>>>

>>> numparts = split('python-3.1.msi', 'calldir')

>>> numparts

10

>>> join('calldir', 'callpy31.msi')

>>>

>>> import os

>>> os.system('fc /B python-3.1.msi callpy31.msi')

Comparing files python-3.1.msi and CALLPY31.msi

FC: no differences encountered
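
The fc command used for verification here is specific to Windows. As a hedged, portable alternative (not part of the book's scripts), the standard library's filecmp.cmp call with shallow=False compares two files by content. The helper name same_bytes and the temporary files below are made up for illustration:

```python
import filecmp
import os
import tempfile

def same_bytes(path1, path2):
    """Return True if the two files have identical contents."""
    # shallow=False forces a content comparison rather than
    # trusting os.stat() signatures (type, size, modification time)
    return filecmp.cmp(path1, path2, shallow=False)

# Self-contained demo on throwaway files
tmpdir = tempfile.mkdtemp()
a = os.path.join(tmpdir, 'a.bin')
b = os.path.join(tmpdir, 'b.bin')
c = os.path.join(tmpdir, 'c.bin')
for path, data in [(a, b'spam' * 1000), (b, b'spam' * 1000), (c, b'eggs' * 1000)]:
    with open(path, 'wb') as f:
        f.write(data)

print(same_bytes(a, b))   # identical contents
print(same_bytes(a, c))   # same size, different contents
```

The same call could be pointed at the original installer file and a rejoined copy in place of the demo files.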

A word about performance: all the split and join tests shown so far process a 13 MB

file, but they take less than one second of real wall-clock time to finish on my Windows

7 2GHz Atom processor laptop computer—plenty fast for just about any use I could

imagine. Both scripts run just as fast for other reasonable part file sizes, too; here is the

splitter chopping up the file into 4MB and 500KB parts:

C:\temp> C:\...\PP4E\System\Filetools\split.py python-3.1.msi tempsplit 4000000

Splitting C:\temp\python-3.1.msi to C:\temp\tempsplit by 4000000

Split finished: 4 parts are in C:\temp\tempsplit

C:\temp> dir tempsplit

...more...

Directory of C:\temp\tempsplit

02/21/2010 01:27 PM <DIR> .

02/21/2010 01:27 PM <DIR> ..

02/21/2010 01:27 PM 4,000,000 part0001

02/21/2010 01:27 PM 4,000,000 part0002

02/21/2010 01:27 PM 4,000,000 part0003

02/21/2010 01:27 PM 1,814,272 part0004

4 File(s) 13,814,272 bytes

2 Dir(s) 188,671,983,616 bytes free

290 | Chapter 6: Complete System Programs

C:\temp> C:\...\PP4E\System\Filetools\split.py python-3.1.msi tempsplit 500000

Splitting C:\temp\python-3.1.msi to C:\temp\tempsplit by 500000

Split finished: 28 parts are in C:\temp\tempsplit

C:\temp> dir tempsplit

...more...

Directory of C:\temp\tempsplit

02/21/2010 01:27 PM <DIR> .

02/21/2010 01:27 PM <DIR> ..

02/21/2010 01:27 PM 500,000 part0001

02/21/2010 01:27 PM 500,000 part0002

02/21/2010 01:27 PM 500,000 part0003

02/21/2010 01:27 PM 500,000 part0004

02/21/2010 01:27 PM 500,000 part0005

...more lines omitted...

02/21/2010 01:27 PM 500,000 part0024

02/21/2010 01:27 PM 500,000 part0025

02/21/2010 01:27 PM 500,000 part0026

02/21/2010 01:27 PM 500,000 part0027

02/21/2010 01:27 PM 314,272 part0028

28 File(s) 13,814,272 bytes

2 Dir(s) 188,671,946,752 bytes free

The split can take noticeably longer to finish, but only if the part file’s size is set small

enough to generate thousands of part files—splitting into 1,382 parts works but runs

slower (though some machines today are quick enough that you might not notice):

C:\temp> C:\...\PP4E\System\Filetools\split.py python-3.1.msi tempsplit 10000

Splitting C:\temp\python-3.1.msi to C:\temp\tempsplit by 10000

Split finished: 1382 parts are in C:\temp\tempsplit

C:\temp> C:\...\PP4E\System\Filetools\join.py tempsplit manypy31.msi

Joining C:\temp\tempsplit to make C:\temp\manypy31.msi

Join complete: see C:\temp\manypy31.msi

C:\temp> fc /B python-3.1.msi manypy31.msi

Comparing files python-3.1.msi and MANYPY31.MSI

FC: no differences encountered

C:\temp> dir tempsplit

...more...

Directory of C:\temp\tempsplit

02/21/2010 01:40 PM <DIR> .

02/21/2010 01:40 PM <DIR> ..

02/21/2010 01:39 PM 10,000 part0001

02/21/2010 01:39 PM 10,000 part0002

02/21/2010 01:39 PM 10,000 part0003

02/21/2010 01:39 PM 10,000 part0004

02/21/2010 01:39 PM 10,000 part0005

...over 1,000 lines deleted...

02/21/2010 01:40 PM 10,000 part1378

02/21/2010 01:40 PM 10,000 part1379

02/21/2010 01:40 PM 10,000 part1380

02/21/2010 01:40 PM 10,000 part1381

02/21/2010 01:40 PM 4,272 part1382

1382 File(s) 13,814,272 bytes

2 Dir(s) 188,651,008,000 bytes free
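
The part counts in these listings follow directly from the file and chunk sizes. As a quick worked check (a sketch of the arithmetic only, not code from split.py; the function name part_layout is made up here), the number of parts is the file size divided by the chunk size, rounded up, and the last part holds whatever is left over:

```python
FILE_SIZE = 13814272          # bytes, taken from the dir listings above

def part_layout(filesize, chunksize):
    """Return (number of parts, size of the last part) for a split."""
    if filesize == 0:
        return 0, 0
    numparts = (filesize + chunksize - 1) // chunksize    # round up
    lastsize = filesize - (numparts - 1) * chunksize      # the remainder
    return numparts, lastsize

for chunksize in (1433600, 4000000, 500000, 10000):
    print(chunksize, part_layout(FILE_SIZE, chunksize))
```

For the 10,000-byte run, for example, this gives 1,382 parts with a final part of 4,272 bytes, which matches the directory listing.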

Finally, the splitter is also smart enough to create the output directory if it doesn’t yet

exist and to clear out any old files there if it does exist—the following, for example,

leaves only new files in the output directory. Because the joiner combines whatever files

exist in the output directory, this is a nice ergonomic touch. If the output directory was

not cleared before each split, it would be too easy to forget that a prior run’s files are

still there. Given the target audience for these scripts, they needed to be as forgiving

as possible; your user base may vary (though you often shouldn’t assume so).

C:\temp> C:\...\PP4E\System\Filetools\split.py python-3.1.msi tempsplit 5000000

Splitting C:\temp\python-3.1.msi to C:\temp\tempsplit by 5000000

Split finished: 3 parts are in C:\temp\tempsplit

C:\temp> dir tempsplit

...more...

Directory of C:\temp\tempsplit

02/21/2010 01:47 PM <DIR> .

02/21/2010 01:47 PM <DIR> ..

02/21/2010 01:47 PM 5,000,000 part0001

02/21/2010 01:47 PM 5,000,000 part0002

02/21/2010 01:47 PM 3,814,272 part0003

3 File(s) 13,814,272 bytes

2 Dir(s) 188,654,452,736 bytes free
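
The create-or-clear behavior described above can be sketched as follows. This is a minimal illustration under my own naming (prepare_output_dir is hypothetical), and it may differ in detail from the logic in split.py:

```python
import os
import tempfile

def prepare_output_dir(todir):
    """Create todir if missing; otherwise delete the plain files in it."""
    if not os.path.exists(todir):
        os.mkdir(todir)
    else:
        for name in os.listdir(todir):
            path = os.path.join(todir, name)
            if os.path.isfile(path):          # leave any subdirectories alone
                os.remove(path)

# Demo in a throwaway location
base = tempfile.mkdtemp()
outdir = os.path.join(base, 'tempsplit')
prepare_output_dir(outdir)                    # first call creates it
open(os.path.join(outdir, 'part0001'), 'wb').close()
prepare_output_dir(outdir)                    # second call clears it
print(os.listdir(outdir))
```

Clearing only plain files is a cautious choice for this sketch; a directory that a user typed by mistake loses less that way.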

Of course, the dilemma that these scripts address might today be more easily addressed

by simply buying a bigger memory stick or giving kids their own Internet access. Still,

once you catch the scripting bug, you’ll find the ease and flexibility of Python to be

powerful and enabling tools, especially for writing custom automation scripts like

these. When used well, Python may well become your Swiss Army knife of

computing.

Generating Redirection Web Pages

Moving is rarely painless, even in cyberspace. Changing your website’s Internet address

can lead to all sorts of confusion. You need to ask known contacts to use the new

address and hope that others will eventually stumble onto it themselves. But if you rely

on the Internet, moves are bound to generate at least as much confusion as an address

change in the real world.

Unfortunately, such site relocations are often unavoidable. Both Internet Service

Providers (ISPs) and server machines can come and go over the years. Moreover, some ISPs

let their service fall to intolerably low levels; if you are unlucky enough to have signed

up with such an ISP, there is not much recourse but to change providers, and that often

implies a change of web addresses.§

Imagine, though, that you are an O’Reilly author and have published your website’s

address in multiple books sold widely all over the world. What do you do when your

ISP’s service level requires a site change? Notifying each of the hundreds of thousands

of readers out there isn’t exactly a practical solution.

Probably the best you can do is to leave forwarding instructions at the old site for some

reasonably long period of time—the virtual equivalent of a “We’ve Moved” sign in a

storefront window. On the Web, such a sign can also send visitors to the new site

automatically: simply leave a page at the old site containing a hyperlink to the page’s

address at the new site, along with timed auto-relocation specifications. With such

forward-link files in place, visitors to the old addresses will be only one click or a few

seconds away from reaching the new ones.

That sounds simple enough. But because visitors might try to directly access the address

of any file at your old site, you generally need to leave one forward-link file for every

old file—HTML pages, images, and so on. Unless your prior server supports auto-

redirection (and mine did not), this represents a dilemma. If you happen to enjoy doing

lots of mindless typing, you could create each forward-link file by hand. But given that

my home site contained over 100 HTML files at the time I wrote this paragraph, the

prospect of running one editor session per file was more than enough motivation for

an automated solution.

Page Template File

Here’s what I came up with. First of all, I create a general page template text file, shown

in Example 6-7, to describe how all the forward-link files should look, with parts to be

filled in later.

Example 6-7. PP4E\System\Filetools\template.html

<HTML>
<head>
<META HTTP-EQUIV="Refresh" CONTENT="10; URL=http://$server$/$home$/$file$">
<title>Site Redirection Page: $file$</title>
</head>
<BODY>
<H1>This page has moved</H1>

<P>This page now lives at this address:

<P><A HREF="http://$server$/$home$/$file$">
http://$server$/$home$/$file$</A>

<P>Please click on the new address to jump to this page, and
update any links accordingly. You will be redirectly shortly.
</P>
<HR>
</BODY></HTML>

§ It happens. In fact, most people who spend any substantial amount of time in cyberspace could probably tell

a horror story or two. Mine goes like this: a number of years ago, I had an account with an ISP that went

completely offline for a few weeks in response to a security breach by an ex-employee. Worse, not only was

personal email disabled, but queued up messages were permanently lost. If your livelihood depends on email

and the Web as much as mine does, you’ll appreciate the havoc such an outage can wreak.

To fully understand this template, you have to know something about HTML, a web

page description language that we’ll explore in Part IV. But for the purposes of this

example, you can ignore most of this file and focus on just the parts surrounded by

dollar signs: the strings $server$, $home$, and $file$ are targets to be replaced with real

values by global text substitutions. They represent items that vary per site relocation

and file.
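
To make the substitution idea concrete, here is a small sketch of the technique using the string replace method. The shortened template text and the function name fill are stand-ins invented for this illustration; the real template file and generator script appear in this section's examples:

```python
# A cut-down stand-in for template.html, for illustration only
template = (
    '<title>Site Redirection Page: $file$</title>\n'
    '<A HREF="http://$server$/$home$/$file$">'
    'http://$server$/$home$/$file$</A>\n'
)

def fill(template, server, home, file):
    """Replace every $server$, $home$, and $file$ target in the text."""
    text = template.replace('$server$', server)
    text = text.replace('$home$', home)
    return text.replace('$file$', file)

page = fill(template, 'learning-python.com', 'books', 'about-lp.html')
print(page)
```

Because replace substitutes every occurrence, a target such as $file$ can appear any number of times in the template.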

Page Generator Script

Now, given a page template file, the Python script in Example 6-8 generates all the

required forward-link files automatically.

Example 6-8. PP4E\System\Filetools\site-forward.py

"""

################################################################################

Create forward-link pages for relocating a web site.

Generates one page for every existing site html file; upload the generated

files to your old web site. See ftplib later in the book for ways to run

uploads in scripts either after or during page file creation.

################################################################################

"""

import os
servername   = 'learning-python.com'     # where site is relocating to
homedir      = 'books'                   # where site will be rooted
sitefilesdir = r'C:\temp\public_html'    # where site files live locally
uploaddir    = r'C:\temp\isp-forward'    # where to store forward files
templatename = 'template.html'           # template for generated pages

try:
    os.mkdir(uploaddir)                  # make upload dir if needed
except OSError: pass

template  = open(templatename).read()    # load or import template text
sitefiles = os.listdir(sitefilesdir)     # filenames, no directory prefix

count = 0
for filename in sitefiles:
    if filename.endswith('.html') or filename.endswith('.htm'):
        fwdname = os.path.join(uploaddir, filename)
        print('creating', filename, 'as', fwdname)

        filetext = template.replace('$server$', servername)   # insert text
        filetext = filetext.replace('$home$', homedir)        # and write
        filetext = filetext.replace('$file$', filename)       # file varies
        open(fwdname, 'w').write(filetext)
        count += 1

print('Last file =>\n', filetext, sep='')
print('Done:', count, 'forward files created.')

Notice that the template’s text is loaded by reading a file; it would work just as well to

code it as an imported Python string variable (e.g., a triple-quoted string in a module

file). Also observe that all configuration options are assignments at the top of the

script, not command-line arguments; since they change so seldom, it’s convenient to

type them just once in the script itself.

But the main thing worth noticing here is that this script doesn’t care what the template

file looks like at all; it simply performs global substitutions blindly in its text, with a

different filename value for each generated file. In fact, we can change the template file

any way we like without having to touch the script. Though a fairly simple technique,

such a division of labor can be used in all sorts of contexts—generating “makefiles,”

form letters, HTML replies from CGI scripts on web servers, and so on. In terms of

library tools, the generator script:

• Uses os.listdir to step through all the filenames in the site’s directory

(glob.glob would work too, but may require stripping directory prefixes from file

names)

• Uses the string object’s replace method to perform global search-and-replace

operations that fill in the $-delimited targets in the template file’s text, and endswith

to skip non-HTML files (e.g., images—most browsers won’t know what to do with

HTML text in a “.jpg” file)

• Uses os.path.join and built-in file objects to write the resulting text out to a

forward-link file of the same name in an output directory
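
The first point in this list can be sketched side by side. The following is an illustration on a throwaway directory, not part of the generator script; it shows that os.listdir yields bare names while glob.glob yields paths that need os.path.basename to recover the names:

```python
import glob
import os
import tempfile

# Throwaway directory standing in for the local site files
sitefilesdir = tempfile.mkdtemp()
for name in ('index.html', 'old.htm', 'logo.jpg'):
    open(os.path.join(sitefilesdir, name), 'w').close()

# os.listdir returns bare names; filter them with endswith
# (endswith also accepts a tuple of suffixes)
from_listdir = sorted(name for name in os.listdir(sitefilesdir)
                      if name.endswith(('.html', '.htm')))

# glob.glob returns paths that include the directory prefix,
# so strip it with os.path.basename
from_glob = sorted(os.path.basename(path)
                   for pattern in ('*.html', '*.htm')
                   for path in glob.glob(os.path.join(sitefilesdir, pattern)))

print(from_listdir)
print(from_glob)
```

Both approaches select the same two HTML files here and skip the image.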

The end result is a mirror image of the original website directory, containing only

forward-link files generated from the page template. As an added bonus, the generator

script can be run on just about any Python platform—I can run it on my Windows

laptop (where I’m writing this book), as well as on a Linux server (where my

http://learning-python.com domain is hosted). Here it is in action on Windows:

C:\...\PP4E\System\Filetools> python site-forward.py

creating about-lp.html as C:\temp\isp-forward\about-lp.html

creating about-lp1e.html as C:\temp\isp-forward\about-lp1e.html

creating about-lp2e.html as C:\temp\isp-forward\about-lp2e.html

creating about-lp3e.html as C:\temp\isp-forward\about-lp3e.html

creating about-lp4e.html as C:\temp\isp-forward\about-lp4e.html

...many more lines deleted...

creating training.html as C:\temp\isp-forward\training.html

creating whatsnew.html as C:\temp\isp-forward\whatsnew.html

creating whatsold.html as C:\temp\isp-forward\whatsold.html

creating xlate-lp.html as C:\temp\isp-forward\xlate-lp.html

creating zopeoutline.htm as C:\temp\isp-forward\zopeoutline.htm

Last file =>

<HTML>

<head>

<META HTTP-EQUIV="Refresh" CONTENT="10; URL=http://learning-python.com/books/zop

eoutline.htm">

<title>Site Redirection Page: zopeoutline.htm</title>

</head>

<BODY>

<H1>This page has moved</H1>

<P>This page now lives at this address:

<P><A HREF="http://learning-python.com/books/zopeoutline.htm">

http://learning-python.com/books/zopeoutline.htm</A>

<P>Please click on the new address to jump to this page, and

update any links accordingly. You will be redirectly shortly.

</P>

<HR>

</BODY></HTML>

Done: 124 forward files created.

To verify this script’s output, double-click on any of the output files to see what they

look like in a web browser (or run a start command in a DOS console on Windows—

e.g., start isp-forward\about-lp4e.html). Figure 6-1 shows what one generated page

looks like on my machine.

Figure 6-1. Site-forward output file page

To complete the process, you still need to install the forward links: upload all the

generated files in the output directory to your old site’s web directory. If that’s too

much to do by hand, too, be sure to see the FTP site upload scripts in Chapter 13 for

an automatic way to do that step with Python as well (PP4E\Internet\Ftp\upload-

flat.py will do the job). Once you’ve started scripting in earnest, you’ll be amazed at

how much manual labor Python can automate. The next section provides another

prime example.
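
As a preview of the upload step, here is a rough sketch built on the standard library's ftplib module. It is not the book's upload script: the function name upload_forward_files is invented, and the host, account, and directory names in main are placeholders you would need to replace:

```python
import ftplib
import os

def upload_forward_files(ftp, localdir):
    """Send every plain file in localdir through an open FTP connection.

    ftp is expected to behave like a logged-in ftplib.FTP object that has
    already changed to the target directory on the server.
    """
    count = 0
    for name in sorted(os.listdir(localdir)):
        path = os.path.join(localdir, name)
        if os.path.isfile(path):
            with open(path, 'rb') as localfile:
                ftp.storbinary('STOR ' + name, localfile)   # binary transfer
            count += 1
    return count

def main():
    # Hypothetical connection details: replace with your old site's values
    with ftplib.FTP('ftp.example.com') as ftp:
        ftp.login('username', 'password')
        ftp.cwd('public_html')
        count = upload_forward_files(ftp, r'C:\temp\isp-forward')
        print(count, 'files uploaded')

# main() is deliberately not called here, since it needs a real server
```

Passing the connection in as an argument keeps the file loop separate from the login details, which also makes the loop easy to try out with a stand-in object.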

A Regression Test Script

Mistakes happen. As we’ve seen, Python provides interfaces to a variety of system

services, along with tools for adding others. Example 6-9 shows some of the more

commonly used system tools in action. It implements a simple regression test system for

Python scripts: it runs each script in a directory of Python scripts with provided input and

command-line arguments, and compares the output of each run to the prior run’s

results. As such, this script can be used as an automated testing system to catch errors

introduced by changes in program source files; in a big system, you might not know

when a fix is really a bug in disguise.

Example 6-9. PP4E\System\Tester\tester.py

"""

################################################################################

Test a directory of Python scripts, passing command-line arguments,

piping in stdin, and capturing stdout, stderr, and exit status to

detect failures and regressions from prior run outputs. The subprocess

module spawns and controls streams (much like os.popen3 in Python 2.X),

and is cross-platform. Streams are always binary bytes in subprocess.

Test inputs, args, outputs, and errors map to files in subdirectories.

This is a command-line script, using command-line arguments for

optional test directory name, and force-generation flag. While we

could package it as a callable function, the fact that its results

are messages and output files makes a call/return model less useful.

Suggested enhancement: could be extended to allow multiple sets

of command-line arguments and/or inputs per test script, to run a

script multiple times (glob for multiple ".in*" files in Inputs?).

Might also seem simpler to store all test files in same directory

with different extensions, but this could grow large over time.

Could also save both stderr and stdout to Errors on failures, but

I prefer to have expected/actual output in Outputs on regressions.

################################################################################

"""

import os, sys, glob, time
from subprocess import Popen, PIPE

# configuration args
testdir  = sys.argv[1] if len(sys.argv) > 1 else os.curdir
forcegen = len(sys.argv) > 2
print('Start tester:', time.asctime())
print('in', os.path.abspath(testdir))

def verbose(*args):
    print('-'*80)
    for arg in args: print(arg)
def quiet(*args): pass
trace = quiet

# glob scripts to be tested
testpatt  = os.path.join(testdir, 'Scripts', '*.py')
testfiles = glob.glob(testpatt)
testfiles.sort()
trace(os.getcwd(), *testfiles)

numfail = 0
for testpath in testfiles:                       # run all tests in dir
    testname = os.path.basename(testpath)        # strip directory path

    # get input and args
    infile = testname.replace('.py', '.in')
    inpath = os.path.join(testdir, 'Inputs', infile)
    indata = open(inpath, 'rb').read() if os.path.exists(inpath) else b''

    argfile = testname.replace('.py', '.args')
    argpath = os.path.join(testdir, 'Args', argfile)
    argdata = open(argpath).read() if os.path.exists(argpath) else ''

    # locate output and error, scrub prior results
    outfile = testname.replace('.py', '.out')
    outpath = os.path.join(testdir, 'Outputs', outfile)
    outpathbad = outpath + '.bad'
    if os.path.exists(outpathbad): os.remove(outpathbad)

    errfile = testname.replace('.py', '.err')
    errpath = os.path.join(testdir, 'Errors', errfile)
    if os.path.exists(errpath): os.remove(errpath)

    # run test with redirected streams
    pypath  = sys.executable
    command = '%s %s %s' % (pypath, testpath, argdata)
    trace(command, indata)

    process = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    process.stdin.write(indata)
    process.stdin.close()
    outdata = process.stdout.read()
    errdata = process.stderr.read()                       # data are bytes
    exitstatus = process.wait()                           # requires binary files
    trace(outdata, errdata, exitstatus)

    # analyze results
    if exitstatus != 0:
        print('ERROR status:', testname, exitstatus)      # status and/or stderr
    if errdata:
        print('ERROR stream:', testname, errpath)         # save error text
        open(errpath, 'wb').write(errdata)

    if exitstatus or errdata:                             # consider both failure
        numfail += 1                                      # can get status+stderr
        open(outpathbad, 'wb').write(outdata)             # save output to view

    elif not os.path.exists(outpath) or forcegen:
        print('generating:', outpath)                     # create first output
        open(outpath, 'wb').write(outdata)

    else:
        priorout = open(outpath, 'rb').read()             # or compare to prior
        if priorout == outdata:
            print('passed:', testname)
        else:
            numfail += 1
            print('FAILED output:', testname, outpathbad)
            open(outpathbad, 'wb').write(outdata)

print('Finished:', time.asctime())
print('%s tests were run, %s tests failed.' % (len(testfiles), numfail))

We’ve seen the tools used by this script earlier in this part of the book—subprocess,

os.path, glob, files, and the like. This example largely just pulls these tools together to

serve a useful purpose. Its core operation is comparing new outputs to old, in order to

spot changes (“regressions”). Along the way, it also manages command-line arguments,

error messages, status codes, and files.
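
The core run-and-compare step can also be sketched on its own. The version below is a simplified variation rather than the tester's code: it uses subprocess.run, which feeds stdin and collects both output streams together, so a script that writes a great deal to stderr cannot stall the parent while it is still reading stdout. The function names run_script and check_regression are made up for this sketch:

```python
import os
import subprocess
import sys
import tempfile

def run_script(scriptpath, indata=b'', args=()):
    """Run a Python script; return (stdout bytes, stderr bytes, exit status)."""
    # An argument list avoids shell quoting issues; subprocess.run feeds
    # stdin and drains both output pipes without blocking on a full pipe
    result = subprocess.run([sys.executable, scriptpath, *args],
                            input=indata,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    return result.stdout, result.stderr, result.returncode

def check_regression(scriptpath, outpath, indata=b''):
    """Return 'generated', 'passed', or 'failed' for one test script."""
    outdata, errdata, status = run_script(scriptpath, indata)
    if status != 0 or errdata:
        return 'failed'
    if not os.path.exists(outpath):
        with open(outpath, 'wb') as f:       # first run: save expected output
            f.write(outdata)
        return 'generated'
    with open(outpath, 'rb') as f:
        return 'passed' if f.read() == outdata else 'failed'

# Demo with a throwaway one-line script
tmpdir = tempfile.mkdtemp()
script = os.path.join(tmpdir, 'test-demo.py')
expected = os.path.join(tmpdir, 'test-demo.out')
with open(script, 'w') as f:
    f.write("print('spam' * 3)\n")

print(check_regression(script, expected))    # first run
print(check_regression(script, expected))    # second run, same output
```

As in the full tester, outputs are compared as bytes, so platform line-end differences count as changes.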

This script is also larger than most we’ve seen so far, but it’s a realistic and representative

system administration tool (in fact, it’s derived from a similar tool I actually used in the

past to detect changes in a compiler). Probably the best way to understand how it works

is to demonstrate what it does. The next section steps through a testing session to be

read in conjunction with studying the test script’s code.

Running the Test Driver

Much of the magic behind the test driver script in Example 6-9 has to do with its

directory structure. When you run it for the first time in a test directory (or force it to

start from scratch there by passing a second command-line argument), it:

• Collects scripts to be run in the Scripts subdirectory

• Fetches any associated script input and command-line arguments from the

Inputs and Args subdirectories

• Generates initial stdout output files for tests that exit normally in the Outputs

subdirectory

• Reports tests that fail either by exit status code or by error messages appearing in

stderr
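
The naming convention behind these steps can be summarized in a few lines. This sketch is not taken from the tester; related_paths is a hypothetical helper, and it uses os.path.splitext where the tester uses the string replace method, which is one way to avoid touching a '.py' that happens to occur earlier in a filename:

```python
import os

def related_paths(testdir, testpath):
    """Map one test script path to the files the tester associates with it."""
    # splitext removes only the final extension, whereas
    # str.replace('.py', ...) would alter every '.py' in the name
    stem = os.path.splitext(os.path.basename(testpath))[0]
    outpath = os.path.join(testdir, 'Outputs', stem + '.out')
    return {
        'input':  os.path.join(testdir, 'Inputs', stem + '.in'),
        'args':   os.path.join(testdir, 'Args', stem + '.args'),
        'output': outpath,
        'bad':    outpath + '.bad',
        'error':  os.path.join(testdir, 'Errors', stem + '.err'),
    }

paths = related_paths('.', os.path.join('.', 'Scripts', 'test-basic-args.py'))
for kind, path in paths.items():
    print(kind, '=>', path)
```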

On all failures, the script also saves any stderr error message text, as well as any

stdout data generated up to the point of failure; standard error text is saved to a file in

the Errors subdirectory, and standard output of failed tests is saved with a special

“.bad” filename extension in Outputs (saving this normally in the Outputs subdirectory

would trigger a failure when the test is later fixed!). Here’s a first run:

C:\...\PP4E\System\Tester> python tester.py . 1

Start tester: Mon Feb 22 22:13:38 2010

in C:\Users\mark\Stuff\Books\4E\PP4E\dev\Examples\PP4E\System\Tester

generating: .\Outputs\test-basic-args.out

generating: .\Outputs\test-basic-stdout.out

generating: .\Outputs\test-basic-streams.out

generating: .\Outputs\test-basic-this.out

ERROR status: test-errors-runtime.py 1

ERROR stream: test-errors-runtime.py .\Errors\test-errors-runtime.err

ERROR status: test-errors-syntax.py 1

ERROR stream: test-errors-syntax.py .\Errors\test-errors-syntax.err

ERROR status: test-status-bad.py 42

generating: .\Outputs\test-status-good.out

Finished: Mon Feb 22 22:13:41 2010

8 tests were run, 3 tests failed.

To run each script, the tester configures any preset command-line arguments provided,

pipes in fetched canned input (if any), and captures the script’s standard output and

error streams, along with its exit status code. When I ran this example, there were 8

test scripts, along with a variety of inputs and outputs. Since the directory and file

naming structures are the key to this example, here is a listing of the test directory I

used—the Scripts directory is primary, because that’s where tests to be run are

collected:

C:\...\PP4E\System\Tester> dir /B

Args

Errors

Inputs

Outputs

Scripts

tester.py

xxold

C:\...\PP4E\System\Tester> dir /B Scripts

test-basic-args.py

test-basic-stdout.py

test-basic-streams.py

test-basic-this.py

test-errors-runtime.py

test-errors-syntax.py

test-status-bad.py

test-status-good.py

The other subdirectories contain any required inputs and any generated outputs

associated with scripts to be tested:

C:\...\PP4E\System\Tester> dir /B Args

test-basic-args.args

test-status-good.args

C:\...\PP4E\System\Tester> dir /B Inputs

test-basic-args.in

test-basic-streams.in

C:\...\PP4E\System\Tester> dir /B Outputs

test-basic-args.out

test-basic-stdout.out

test-basic-streams.out

test-basic-this.out

test-errors-runtime.out.bad

test-errors-syntax.out.bad

test-status-bad.out.bad

test-status-good.out

C:\...\PP4E\System\Tester> dir /B Errors

test-errors-runtime.err

test-errors-syntax.err

I won’t list all these files here (as you can see, there are many, and all are available in

the book examples distribution package), but to give you the general flavor, here are

the files associated with the test script test-basic-args.py:

C:\...\PP4E\System\Tester> type Scripts\test-basic-args.py

# test args, streams

import sys, os

print(os.getcwd()) # to Outputs

print(sys.path[0])

print('[argv]')

for arg in sys.argv: # from Args

    print(arg) # to Outputs

print('[interaction]') # to Outputs

text = input('Enter text:') # from Inputs

rept = sys.stdin.readline() # from Inputs

sys.stdout.write(text * int(rept)) # to Outputs

C:\...\PP4E\System\Tester> type Args\test-basic-args.args

-command -line --stuff

C:\...\PP4E\System\Tester> type Inputs\test-basic-args.in

Eggs

10

C:\...\PP4E\System\Tester> type Outputs\test-basic-args.out

C:\Users\mark\Stuff\Books\4E\PP4E\dev\Examples\PP4E\System\Tester

C:\Users\mark\Stuff\Books\4E\PP4E\dev\Examples\PP4E\System\Tester\Scripts

[argv]

.\Scripts\test-basic-args.py

-command

-line

--stuff

[interaction]

Enter text:EggsEggsEggsEggsEggsEggsEggsEggsEggsEggs

And here are two files related to one of the detected errors—the first is its captured

stderr, and the second is its stdout generated up to the point where the error occurred;

these are for human (or other tools) inspection, and are automatically removed the next

time the tester script runs:

C:\...\PP4E\System\Tester> type Errors\test-errors-runtime.err

Traceback (most recent call last):

File ".\Scripts\test-errors-runtime.py", line 3, in <module>

print(1 / 0)

ZeroDivisionError: int division or modulo by zero

C:\...\PP4E\System\Tester> type Outputs\test-errors-runtime.out.bad

starting

Now, when run again without making any changes to the tests, the test driver script

compares saved prior outputs to new ones and detects no regressions; failures

designated by exit status and stderr messages are still reported as before, but there are no

deviations from other tests’ saved expected output:

C:\...\PP4E\System\Tester> python tester.py

Start tester: Mon Feb 22 22:26:41 2010

in C:\Users\mark\Stuff\Books\4E\PP4E\dev\Examples\PP4E\System\Tester

passed: test-basic-args.py

passed: test-basic-stdout.py

passed: test-basic-streams.py

passed: test-basic-this.py

ERROR status: test-errors-runtime.py 1

ERROR stream: test-errors-runtime.py .\Errors\test-errors-runtime.err

ERROR status: test-errors-syntax.py 1

ERROR stream: test-errors-syntax.py .\Errors\test-errors-syntax.err

ERROR status: test-status-bad.py 42

passed: test-status-good.py

Finished: Mon Feb 22 22:26:43 2010

8 tests were run, 3 tests failed.

But when I make a change in one of the test scripts that will produce different output

(I changed a loop counter to print fewer lines), the regression is caught and reported;

the new and different output of the script is reported as a failure, and saved in

Outputs as a “.bad” for later viewing:

C:\...\PP4E\System\Tester> python tester.py

Start tester: Mon Feb 22 22:28:35 2010

in C:\Users\mark\Stuff\Books\4E\PP4E\dev\Examples\PP4E\System\Tester

passed: test-basic-args.py

FAILED output: test-basic-stdout.py .\Outputs\test-basic-stdout.out.bad

passed: test-basic-streams.py

passed: test-basic-this.py

ERROR status: test-errors-runtime.py 1

ERROR stream: test-errors-runtime.py .\Errors\test-errors-runtime.err

ERROR status: test-errors-syntax.py 1

ERROR stream: test-errors-syntax.py .\Errors\test-errors-syntax.err

ERROR status: test-status-bad.py 42

passed: test-status-good.py

Finished: Mon Feb 22 22:28:38 2010

8 tests were run, 4 tests failed.

C:\...\PP4E\System\Tester> type Outputs\test-basic-stdout.out.bad

begin

Spam!

Spam!Spam!

Spam!Spam!Spam!

Spam!Spam!Spam!Spam!

end

One last usage note: if you change the trace variable in this script to be verbose, you’ll

get much more output designed to help you trace the program’s operation (but probably

too much for real testing runs):

C:\...\PP4E\System\Tester> tester.py

Start tester: Mon Feb 22 22:34:51 2010

in C:\Users\mark\Stuff\Books\4E\PP4E\dev\Examples\PP4E\System\Tester

--------------------------------------------------------------------------------

C:\Users\mark\Stuff\Books\4E\PP4E\dev\Examples\PP4E\System\Tester

.\Scripts\test-basic-args.py

.\Scripts\test-basic-stdout.py

.\Scripts\test-basic-streams.py

.\Scripts\test-basic-this.py

.\Scripts\test-errors-runtime.py

.\Scripts\test-errors-syntax.py

.\Scripts\test-status-bad.py

.\Scripts\test-status-good.py

--------------------------------------------------------------------------------

C:\Python31\python.exe .\Scripts\test-basic-args.py -command -line --stuff

b'Eggs\r\n10\r\n'

--------------------------------------------------------------------------------

b'C:\\Users\\mark\\Stuff\\Books\\4E\\PP4E\\dev\\Examples\\PP4E\\System\\Tester\r

\nC:\\Users\\mark\\Stuff\\Books\\4E\\PP4E\\dev\\Examples\\PP4E\\System\\Tester\\

Scripts\r\n[argv]\r\n.\\Scripts\\test-basic-args.py\r\n-command\r\n-line\r\n--st

uff\r\n[interaction]\r\nEnter text:EggsEggsEggsEggsEggsEggsEggsEggsEggsEggs'

b''

passed: test-basic-args.py

...more lines deleted...
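
The trace switch works by binding the name trace to one of two functions. A possible variation, not in the book's script, is to choose the function from an environment variable so that the source file need not be edited; the variable name TESTER_VERBOSE is invented for this sketch:

```python
import os

def verbose(*args):
    print('-' * 80)
    for arg in args:
        print(arg)

def quiet(*args):
    pass

# TESTER_VERBOSE is a made-up variable name for this sketch
trace = verbose if os.environ.get('TESTER_VERBOSE') else quiet

trace('command line here', b'input bytes here')   # prints only when verbose
```

Either way, the rest of the script simply calls trace and never needs to know which mode is in effect.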

Study the test driver’s code for more details. Naturally, there is much more to the

general testing story than we have space for here. For example, in-process tests don’t

need to spawn programs and can generally make do with importing modules and testing

them in try exception handler statements. There is also ample room for expansion and

customization in our testing script (see its docstring for starters). Moreover, Python

comes with two testing frameworks, doctest and unittest (a.k.a. PyUnit), which

provide techniques and structures for coding regression and unit tests:

unittest

An object-oriented framework that specifies test cases, expected results, and test

suites. Subclasses provide test methods and use inherited assertion calls to specify

expected results.

doctest

Parses out and reruns tests from an interactive session log that is pasted into a

module’s docstrings. The logs give test calls and expected results; doctest

essentially reruns the interactive session.
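
To give a feel for both frameworks, here is a tiny combined sketch. The repeat function and the RepeatTests class are invented for illustration; the doctest and unittest calls themselves are standard library tools:

```python
import doctest
import unittest

def repeat(text, count):
    """Return text repeated count times.

    >>> repeat('Eggs', 3)
    'EggsEggsEggs'
    >>> repeat('Spam!', 0)
    ''
    """
    return text * count

class RepeatTests(unittest.TestCase):
    def test_repeat(self):
        self.assertEqual(repeat('Spam!', 2), 'Spam!Spam!')

    def test_bad_count(self):
        with self.assertRaises(TypeError):
            repeat('Spam!', 'two')

# doctest: rerun the session pasted into the docstring
# (prints nothing when every example matches)
doctest.run_docstring_examples(repeat, {'repeat': repeat})

# unittest: collect and run the test methods of one class
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RepeatTests)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
print('unittest ok:', outcome.wasSuccessful())
```

In a real module, the usual practice is to let doctest.testmod or unittest.main find the tests rather than naming them one by one as this sketch does.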

See the Python library manual, the PyPI website, and your favorite Web search engine

for additional testing toolkits in both Python itself and the third-party domain.

For automated testing of Python command-line scripts that run as independent

programs and tap into standard script execution context, though, our tester does the job.

Because the test driver is fully independent of the scripts it tests, we can drop in new

test cases without having to update the driver’s code. And because it is written in

Python, it’s quick and easy to change as our testing needs evolve. As we’ll see again in the

next section, this “scriptability” that Python provides can be a decided advantage for

real tasks.

Testing Gone Bad?

Once we learn about sending email from Python scripts in Chapter 13, you might also

want to augment this script to automatically send out email when regularly run tests

fail (e.g., when run from a cron job on Unix). That way, you don’t even need to

remember to check results. Of course, you could go further still.

One company I worked for added sound effects to compiler test scripts; you got an

audible round of applause if no regressions were found and an entirely different noise

otherwise. (See playfile.py at the end of this chapter for hints.)

Another company in my development past ran a nightly test script that automatically

isolated the source code file check-in that triggered a test regression and sent a nasty

email to the guilty party (and his or her supervisor). Nobody expects the Spanish

Inquisition!

Copying Directory Trees

My CD writer sometimes does weird things. In fact, copies of files with odd names can

be totally botched on the CD, even though other files show up in one piece. That’s not

necessarily a showstopper; if just a few files are trashed in a big CD backup copy, I can

always copy the offending files elsewhere one at a time. Unfortunately, drag-and-drop

copies on some versions of Windows don’t play nicely with such a CD: the copy op-

eration stops and exits the moment the first bad file is encountered. You get only as

many files as were copied up to the error, but no more.

In fact, this is not limited to CD copies. I’ve run into similar problems when trying to

back up my laptop’s hard drive to another drive—the drag-and-drop copy stops with

an error as soon as it reaches a file with a name that is too long or odd to copy (common

in saved web pages). The last 30 minutes spent copying is wasted time; frustrating, to

say the least!

There may be some magical Windows setting to work around this feature, but I gave

up hunting for one as soon as I realized that it would be easier to code a copier in Python.

The cpall.py script in Example 6-10 is one way to do it. With this script, I control what

happens when bad files are found—I can skip over them with Python exception han-

dlers, for instance. Moreover, this tool works with the same interface and effect on

other platforms. It seems to me, at least, that a few minutes spent writing a portable

and reusable Python script to meet a need is a better investment than looking for sol-

utions that work on only one platform (if at all).

Example 6-10. PP4E\System\Filetools\cpall.py

"""

################################################################################
Usage: "python cpall.py dirFrom dirTo".
Recursive copy of a directory tree. Works like a "cp -r dirFrom/* dirTo"
Unix command, and assumes that dirFrom and dirTo are both directories.
Was written to get around fatal error messages under Windows drag-and-drop
copies (the first bad file ends the entire copy operation immediately),
but also allows for coding more customized copy operations in Python.
################################################################################
"""

import os, sys
maxfileload = 1000000
blksize = 1024 * 500

def copyfile(pathFrom, pathTo, maxfileload=maxfileload):
    """
    Copy one file pathFrom to pathTo, byte for byte;
    uses binary file modes to suppress Unicode decode and endline transform
    """
    if os.path.getsize(pathFrom) <= maxfileload:
        bytesFrom = open(pathFrom, 'rb').read()   # read small file all at once
        open(pathTo, 'wb').write(bytesFrom)
    else:
        fileFrom = open(pathFrom, 'rb')           # read big files in chunks
        fileTo   = open(pathTo,   'wb')           # need b mode for both
        while True:
            bytesFrom = fileFrom.read(blksize)    # get one block, less at end
            if not bytesFrom: break               # empty after last chunk
            fileTo.write(bytesFrom)

def copytree(dirFrom, dirTo, verbose=0):
    """
    Copy contents of dirFrom and below to dirTo, return (files, dirs) counts;
    may need to use bytes for dirnames if undecodable on other platforms;
    may need to do more file type checking on Unix: skip links, fifos, etc.
    """
    fcount = dcount = 0
    for filename in os.listdir(dirFrom):                  # for files/dirs here
        pathFrom = os.path.join(dirFrom, filename)
        pathTo   = os.path.join(dirTo,   filename)        # extend both paths
        if not os.path.isdir(pathFrom):                   # copy simple files
            try:
                if verbose > 1: print('copying', pathFrom, 'to', pathTo)
                copyfile(pathFrom, pathTo)
                fcount += 1
            except:
                print('Error copying', pathFrom, 'to', pathTo, '--skipped')
                print(sys.exc_info()[0], sys.exc_info()[1])
        else:
            if verbose: print('copying dir', pathFrom, 'to', pathTo)
            try:
                os.mkdir(pathTo)                              # make new subdir
                below = copytree(pathFrom, pathTo, verbose)   # recur into subdirs
                fcount += below[0]                            # add subdir counts
                dcount += below[1]
                dcount += 1
            except:
                print('Error creating', pathTo, '--skipped')
                print(sys.exc_info()[0], sys.exc_info()[1])
    return (fcount, dcount)

def getargs():
    """
    Get and verify directory name arguments, returns default None on errors
    """
    try:
        dirFrom, dirTo = sys.argv[1:]
    except:
        print('Usage error: cpall.py dirFrom dirTo')
    else:
        if not os.path.isdir(dirFrom):
            print('Error: dirFrom is not a directory')
        elif not os.path.exists(dirTo):
            os.mkdir(dirTo)
            print('Note: dirTo was created')
            return (dirFrom, dirTo)
        else:
            print('Warning: dirTo already exists')
            if hasattr(os.path, 'samefile'):
                same = os.path.samefile(dirFrom, dirTo)
            else:
                same = os.path.abspath(dirFrom) == os.path.abspath(dirTo)
            if same:
                print('Error: dirFrom same as dirTo')
            else:
                return (dirFrom, dirTo)

if __name__ == '__main__':
    import time
    dirstuple = getargs()
    if dirstuple:
        print('Copying...')
        start = time.perf_counter()       # was time.clock, removed in Python 3.8
        fcount, dcount = copytree(*dirstuple)
        print('Copied', fcount, 'files,', dcount, 'directories', end=' ')
        print('in', time.perf_counter() - start, 'seconds')

This script implements its own recursive tree traversal logic and keeps track of both
the “from” and “to” directory paths as it goes. At every level, it copies over simple files,
creates directories in the “to” path, and recurs into subdirectories with “from” and “to”
paths extended by one level. There are other ways to code this task (e.g., we might
change the working directory along the way with os.chdir calls, or use os.walk and
swap the “from” path prefix for the “to” prefix as it walks), but extending
paths on recursive descent works well in this script.
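
For comparison, here is one way the os.walk alternative mentioned above might look. This is a sketch rather than the book's script: the name copytree_walk is made up, it leans on the standard library's shutil.copyfile instead of a hand-coded copyfile, and it collects skipped paths in a list instead of printing them.

```python
import os
import shutil

def copytree_walk(dirFrom, dirTo):
    """
    Copy the tree at dirFrom into dirTo (created if missing) using os.walk;
    returns (files, dirs, errors), where errors lists the paths skipped.
    """
    fcount = dcount = 0
    errors = []
    os.makedirs(dirTo, exist_ok=True)
    for (dirHere, subsHere, filesHere) in os.walk(dirFrom):
        # swap the "from" prefix for the "to" prefix at this level
        relative = os.path.relpath(dirHere, dirFrom)
        dirThere = os.path.normpath(os.path.join(dirTo, relative))
        for sub in subsHere[:]:                    # iterate over a copy
            try:
                os.mkdir(os.path.join(dirThere, sub))
                dcount += 1
            except OSError:
                errors.append(os.path.join(dirHere, sub))
                subsHere.remove(sub)               # prune: don't walk below it
        for name in filesHere:
            pathFrom = os.path.join(dirHere, name)
            try:
                shutil.copyfile(pathFrom, os.path.join(dirThere, name))
                fcount += 1
            except OSError:                        # skip bad files, keep going
                errors.append(pathFrom)
    return fcount, dcount, errors
```

Removing a name from the subdirectory list in place is how a top-down os.walk is told not to descend into that directory, which matches the skip-on-error behavior of the recursive version.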

Notice this script’s reusable copyfile function—just in case there are multigigabyte

files in the tree to be copied, it uses a file’s size to decide whether it should be read all

at once or in chunks (remember, the file read method without arguments actually loads

the entire file into an in-memory string). We choose fairly large file and block sizes,

because the more we read at once in Python, the faster our scripts will typically run.

This is more efficient than it may sound; strings left behind by prior reads will be

garbage collected and reused as we go. We’re using binary file modes here again, too,

to suppress the Unicode encodings and end-of-line translations of text files—trees may

contain arbitrary kinds of files.
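
The chunked branch of copyfile can also be written with with statements, so that both files are closed as soon as the copy ends instead of whenever they are garbage collected. A minimal sketch, assuming a hypothetical copyfile_chunked name and the same 500 KB default block size:

```python
def copyfile_chunked(pathFrom, pathTo, blksize=1024 * 500):
    """
    Copy pathFrom to pathTo in binary mode, reading at most blksize
    bytes at a time; returns the number of bytes copied.
    """
    copied = 0
    with open(pathFrom, 'rb') as fileFrom, open(pathTo, 'wb') as fileTo:
        while True:
            chunk = fileFrom.read(blksize)    # b'' once the file is exhausted
            if not chunk:
                break
            fileTo.write(chunk)
            copied += len(chunk)
    return copied
```

Because only one block is held at a time, memory use stays near blksize no matter how large the file is.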

Also notice that this script creates the “to” directory if needed, but it assumes that the

directory is empty when a copy starts up; for accuracy, be sure to remove the target

directory before copying a new tree to its name, or old files may linger in the target tree

(we could automatically remove the target first, but this may not always be desired).

This script also tries to determine if the source and target are the same; on Unix-like

platforms with oddities such as links, os.path.samefile does a more accurate job than

comparing absolute file names (different file names may be the same file).
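
The same-directory test can be pulled out into a small helper. The sketch below assumes a made-up name, same_directory, and mirrors the fallback logic in getargs:

```python
import os

def same_directory(dir1, dir2):
    """True if both names refer to one directory on disk."""
    if hasattr(os.path, 'samefile'):
        # asks the operating system whether both paths are the same object,
        # so links and alternate spellings of a path are recognized
        return os.path.samefile(dir1, dir2)
    # fallback: compare normalized absolute pathnames only
    return os.path.abspath(dir1) == os.path.abspath(dir2)
```

Both paths must exist for os.path.samefile to work, which holds in getargs because the check runs only after dirFrom and dirTo have been verified.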

Here is the script in action on Windows, copying a big book examples tree (I use the
tree from the prior edition throughout this chapter). Pass in the names of the “from” and
“to” directories to kick off the process; redirect the output to a file if there are too many
error messages to read all at once (e.g., > output.txt); and run an rm -r or rmdir /S
shell command (or similar platform-specific tool) to delete the target directory first if
needed:

C:\...\PP4E\System\Filetools> rmdir /S copytemp

copytemp, Are you sure (Y/N)? y

C:\...\PP4E\System\Filetools> cpall.py C:\temp\PP3E\Examples copytemp

Note: dirTo was created

Copying...

Copied 1430 files, 185 directories in 10.4470980971 seconds

C:\...\PP4E\System\Filetools> fc /B copytemp\PP3E\Launcher.py

C:\temp\PP3E\Examples\PP3E\Launcher.py

Comparing files COPYTEMP\PP3E\Launcher.py and C:\TEMP\PP3E\EXAMPLES\PP3E\LAUNCHER.PY

FC: no differences encountered

You can use the copytree function’s verbose argument to trace the process if you wish. At
the time I wrote this edition in 2010, this test run copied a tree of 1,430 files and 185
directories in about 10 seconds on my woefully underpowered netbook machine; it may run
arbitrarily faster or slower for you. (The script times the copy with a timer call from the
time module; time.clock, the call used when this was written, was removed in Python 3.8
in favor of time.perf_counter.) Still, this is at least as fast as the best drag-and-drop
I’ve timed on this machine.
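
If you want to time other calls the same way, a small wrapper does the job. This sketch uses time.perf_counter, the modern stand-in for time.clock; the name timed is hypothetical:

```python
import time

def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs); return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start

# Possible use, assuming cpall.py is importable from the current directory:
#     from cpall import copytree
#     (fcount, dcount), secs = timed(copytree, dirFrom, dirTo)
```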

So how does this script work around bad files on a CD backup? The secret is that it

catches and ignores file exceptions, and it keeps walking. To copy all the files that are

good on a CD, I simply run a command line such as this one:

C:\...\PP4E\System\Filetools> python cpall.py G:\Examples C:\PP3E\Examples

Because the CD is addressed as “G:” on my Windows machine, this is the command-

line equivalent of drag-and-drop copying from an item in the CD’s top-level folder,

except that the Python script will recover from errors on the CD and get the rest. On

copy errors, it prints a message to standard output and continues; for big copies, you’ll

probably want to redirect the script’s output to a file for later inspection.

In general, cpall can be passed any absolute directory path on your machine, even those
that indicate devices such as CDs. To make this go on Linux, try the directory where the
CD is mounted, such as /mnt/cdrom or something similar (the device file /dev/cdrom
itself is not a directory, so the script’s directory check rejects it). Once you’ve copied a
tree this way, you still might want to verify; to see how, let’s move on to the next
example.

Comparing Directory Trees

Engineers can be a paranoid sort (but you didn’t hear that from me). At least I am. It

comes from decades of seeing things go terribly wrong, I suppose. When I create a CD

backup of my hard drive, for instance, there’s still something a bit too magical about

the process to trust the CD writer program to do the right thing. Maybe I should, but

it’s tough to have a lot of faith in tools that occasionally trash files and seem to crash

my Windows machine every third Tuesday of the month. When push comes to shove,

it’s nice to be able to verify that data copied to a backup CD is the same as the original—

or at least to spot deviations from the original—as soon as possible. If a backup is ever

needed, it will be really needed.

Because data CDs are accessible as simple directory trees in the file system, we are once

again in the realm of tree walkers—to verify a backup CD, we simply need to walk its

top-level directory. If our script is general enough, we will also be able to use it to verify

other copy operations as well—e.g., downloaded tar files, hard-drive backups, and so

on. In fact, the combination of the cpall script of the prior section and a general tree

comparison would provide a portable and scriptable way to copy and verify data sets.

We’ve already studied generic directory tree walkers, but they won’t help us here di-

rectly: we need to walk two directories in parallel and inspect common files along the

way. Moreover, walking either one of the two directories won’t allow us to spot files

and directories that exist only in the other. Something more custom and recursive seems

in order here.

Finding Directory Differences

Before we start coding, the first thing we need to clarify is what it means to compare

two directory trees. If both trees have exactly the same branch structure and depth, this

problem reduces to comparing corresponding files in each tree. In general, though, the

trees can have arbitrarily different shapes, depths, and so on.

More generally, the contents of a directory in one tree may have more or fewer entries

than the corresponding directory in the other tree. If those differing contents are file-

names, there is no corresponding file to compare with; if they are directory names, there

is no corresponding branch to descend through. In fact, the only way to detect files and

directories that appear in one tree but not the other is to detect differences in each level’s

directory.

In other words, a tree comparison algorithm will also have to perform directory com-

parisons along the way. Because this is a nested and simpler operation, let’s start by

coding and debugging a single-directory comparison of filenames in Example 6-11.

Example 6-11. PP4E\System\Filetools\dirdiff.py

"""

################################################################################
Usage: python dirdiff.py dir1-path dir2-path
Compare two directories to find files that exist in one but not the other.
This version uses the os.listdir function and list difference. Note that
this script checks only filenames, not file contents--see diffall.py for an
extension that does the latter by comparing .read() results.
################################################################################
"""

import os, sys

def reportdiffs(unique1, unique2, dir1, dir2):
    """
    Generate diffs report for one dir: part of comparedirs output
    """
    if not (unique1 or unique2):
        print('Directory lists are identical')
    else:
        if unique1:
            print('Files unique to', dir1)
            for file in unique1:
                print('...', file)
        if unique2:
            print('Files unique to', dir2)
            for file in unique2:
                print('...', file)

def difference(seq1, seq2):
    """
    Return all items in seq1 only;
    a set(seq1) - set(seq2) would work too, but sets are randomly
    ordered, so any platform-dependent directory order would be lost
    """
    return [item for item in seq1 if item not in seq2]

def comparedirs(dir1, dir2, files1=None, files2=None):
    """
    Compare directory contents, but not actual files;
    may need bytes listdir arg for undecodable filenames on some platforms
    """
    print('Comparing', dir1, 'to', dir2)
    files1  = os.listdir(dir1) if files1 is None else files1
    files2  = os.listdir(dir2) if files2 is None else files2
    unique1 = difference(files1, files2)
    unique2 = difference(files2, files1)
    reportdiffs(unique1, unique2, dir1, dir2)
    return not (unique1 or unique2)               # true if no diffs

def getargs():
    "Args for command-line mode"
    try:
        dir1, dir2 = sys.argv[1:]                 # 2 command-line args
    except:
        print('Usage: dirdiff.py dir1 dir2')
        sys.exit(1)
    else:
        return (dir1, dir2)

if __name__ == '__main__':
    dir1, dir2 = getargs()
    comparedirs(dir1, dir2)

Given listings of names in two directories, this script simply picks out unique names

in the first and unique names in the second, and reports any unique names found as

differences (that is, files in one directory but not the other). Its comparedirs function

returns a true result if no differences were found, which is useful for detecting differ-

ences in callers.

Let’s run this script on a few directories; differences are detected and reported as names

unique in either passed-in directory pathname. Notice that this is only a structural

comparison that just checks names in listings, not file contents (we’ll add the latter in

a moment):

C:\...\PP4E\System\Filetools> dirdiff.py C:\temp\PP3E\Examples copytemp

Comparing C:\temp\PP3E\Examples to copytemp

Directory lists are identical

C:\...\PP4E\System\Filetools> dirdiff.py C:\temp\PP3E\Examples\PP3E\System ..

Comparing C:\temp\PP3E\Examples\PP3E\System to ..

Files unique to C:\temp\PP3E\Examples\PP3E\System

... App

... Exits

... Media

... moreplus.py

Files unique to ..

... more.pyc

... spam.txt

... Tester

... __init__.pyc

The difference function is the heart of this script: it performs a simple list difference

operation. When applied to directories, unique items represent tree differences, and

common items are names of files or subdirectories that merit further comparisons or

traversals. In fact, in Python 2.4 and later, we could also use the built-in set object type

if we don’t care about the order in the results—because sets are not sequences, they

would not maintain any original and possibly platform-specific left-to-right order of

the directory listings provided by os.listdir. For that reason (and to avoid requiring

users to upgrade), we’ll keep using our own comprehension-based function instead

of sets.
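
If lookup speed ever matters for very large directories, one possible middle ground is to keep the list comprehension for ordering but test membership against a set. The sketch below is a variation under that assumption, not the code used in dirdiff.py; difference_fast and the sample names are invented for illustration:

```python
def difference(seq1, seq2):
    """List difference as in dirdiff.py: keeps seq1's order."""
    return [item for item in seq1 if item not in seq2]

def difference_fast(seq1, seq2):
    """Same result and order, but membership tests use a set of seq2."""
    lookup = set(seq2)
    return [item for item in seq1 if item not in lookup]

names1 = ['spam.txt', 'App', 'more.py', 'Media']
names2 = ['more.py', 'spam.txt']

print(difference(names1, names2))         # ['App', 'Media']
print(difference_fast(names1, names2))    # ['App', 'Media']
print(set(names1) - set(names2))          # same two names, in arbitrary order
```

The set-based variant requires hashable items, which is always true of the filename strings returned by os.listdir.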

Finding Tree Differences

We’ve just coded a directory comparison tool that picks out unique files and directories.

Now all we need is a tree walker that applies dirdiff at each level to report unique

items, explicitly compares the contents of files in common, and descends through di-

rectories in common. Example 6-12 fits the bill.

Example 6-12. PP4E\System\Filetools\diffall.py

"""

################################################################################
Usage: "python diffall.py dir1 dir2".
Recursive directory tree comparison: report unique files that exist in only
dir1 or dir2, report files of the same name in dir1 and dir2 with differing
contents, report instances of same name but different type in dir1 and dir2,
and do the same for all subdirectories of the same names in and below dir1
and dir2. A summary of diffs appears at end of output, but search redirected
output for "DIFF" and "unique" strings for further details. New: (3E) limit
reads to 1M for large files, (3E) catch same name=file/dir, (4E) avoid extra
os.listdir() calls in dirdiff.comparedirs() by passing results here along.
################################################################################
"""

import os, dirdiff
blocksize = 1024 * 1024              # up to 1M per read

def intersect(seq1, seq2):
    """
    Return all items in both seq1 and seq2;
    a set(seq1) & set(seq2) would work too, but sets are randomly
    ordered, so any platform-dependent directory order would be lost
    """
    return [item for item in seq1 if item in seq2]

def comparetrees(dir1, dir2, diffs, verbose=False):
    """
    Compare all subdirectories and files in two directory trees;
    uses binary files to prevent Unicode decoding and endline transforms,
    as trees might contain arbitrary binary files as well as arbitrary text;
    may need bytes listdir arg for undecodable filenames on some platforms
    """
    # compare file name lists
    print('-' * 20)
    names1 = os.listdir(dir1)
    names2 = os.listdir(dir2)
    if not dirdiff.comparedirs(dir1, dir2, names1, names2):
        diffs.append('unique files at %s - %s' % (dir1, dir2))

    print('Comparing contents')
    common = intersect(names1, names2)
    missed = common[:]

    # compare contents of files in common
    for name in common:
        path1 = os.path.join(dir1, name)
        path2 = os.path.join(dir2, name)
        if os.path.isfile(path1) and os.path.isfile(path2):
            missed.remove(name)
            file1 = open(path1, 'rb')
            file2 = open(path2, 'rb')
            while True:
                bytes1 = file1.read(blocksize)
                bytes2 = file2.read(blocksize)
                if (not bytes1) and (not bytes2):
                    if verbose: print(name, 'matches')
                    break
                if bytes1 != bytes2:
                    diffs.append('files differ at %s - %s' % (path1, path2))
                    print(name, 'DIFFERS')
                    break

    # recur to compare directories in common
    for name in common:
        path1 = os.path.join(dir1, name)
        path2 = os.path.join(dir2, name)
        if os.path.isdir(path1) and os.path.isdir(path2):
            missed.remove(name)
            comparetrees(path1, path2, diffs, verbose)

    # same name but not both files or dirs?
    for name in missed:
        diffs.append('files missed at %s - %s: %s' % (dir1, dir2, name))
        print(name, 'DIFFERS')

if __name__ == '__main__':
    dir1, dir2 = dirdiff.getargs()
    diffs = []
    comparetrees(dir1, dir2, diffs, True)          # changes diffs in-place
    print('=' * 40)                                # walk, report diffs list
    if not diffs:
        print('No diffs found.')
    else:
        print('Diffs found:', len(diffs))
        for diff in diffs: print('-', diff)

At each directory in the tree, this script simply runs the dirdiff tool to detect unique

names, and then compares names in common by intersecting directory lists. It uses

recursive function calls to traverse the tree and visits subdirectories only after compar-

ing all the files at each level so that the output is more coherent to read (the trace output

for subdirectories appears after that for files; it is not intermixed).
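
Python's standard library also ships a filecmp module whose dircmp class performs a similar per-directory comparison. The following sketch is an alternative to diffall.py, not a description of it: report_tree is a hypothetical name, and by default dircmp compares files shallowly (by os.stat signature: type, size, and modification time), reading contents only when signatures differ, so it is less strict than the byte-for-byte loop above.

```python
import filecmp
import os

def report_tree(dir1, dir2, diffs=None):
    """
    Recursively compare dir1 and dir2 with filecmp.dircmp;
    returns a list of difference descriptions (empty if none found).
    """
    diffs = [] if diffs is None else diffs
    result = filecmp.dircmp(dir1, dir2)
    for name in result.left_only:                  # names unique to dir1
        diffs.append('only in %s: %s' % (dir1, name))
    for name in result.right_only:                 # names unique to dir2
        diffs.append('only in %s: %s' % (dir2, name))
    for name in result.diff_files:                 # common files that differ
        diffs.append('files differ: %s' % os.path.join(dir1, name))
    for name in result.common_funny:               # same name, different type
        diffs.append('type mismatch: %s' % os.path.join(dir1, name))
    for name in result.common_dirs:                # descend into shared dirs
        report_tree(os.path.join(dir1, name), os.path.join(dir2, name), diffs)
    return diffs
```

Note that dircmp skips a default list of names (such as __pycache__ and version-control directories) unless told otherwise, which is another way its results can differ from a plain os.listdir comparison.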

Notice the missed list, added in the third edition of this book; it’s very unlikely, but not

impossible, that the same name might be a file in one directory and a subdirectory in

the other. Also notice the blocksize variable; much like the tree copy script we saw

earlier, instead of blindly reading entire files into memory all at once, we limit each read

to grab up to 1 MB at a time, just in case any files in the directories are too big to be

loaded into available memory. Without this limit, I ran into MemoryError exceptions on

some machines with a prior version of this script that read both files all at once, like this:

bytes1 = open(path1, 'rb').read()

bytes2 = open(path2, 'rb').read()

if bytes1 == bytes2: ...

This code was simpler, but is less practical for very large files that can’t fit into your

available memory space (consider CD and DVD image files, for example). In the new

version’s loop, the file reads return what is left when there is less than 1 MB present or

remaining and return empty strings at end-of-file. Files match if all blocks read are the

same, and they reach end-of-file at the same time.
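
Pulled out of comparetrees, the block-by-block comparison might be packaged as a standalone function. This is a sketch under the assumption of a hypothetical samecontent name; it also closes both files with a with statement, which the script leaves to garbage collection:

```python
def samecontent(path1, path2, blocksize=1024 * 1024):
    """
    True if the two files hold identical bytes; reads at most
    blocksize bytes from each file per step.
    """
    with open(path1, 'rb') as file1, open(path2, 'rb') as file2:
        while True:
            bytes1 = file1.read(blocksize)
            bytes2 = file2.read(blocksize)
            if bytes1 != bytes2:
                return False      # a block differs, or one file ended early
            if not bytes1:
                return True       # both reads empty: end-of-file together
```

Checking for a mismatch first means a single empty-read test is enough: if the blocks are equal and one is empty, so is the other.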

We’re also dealing in binary files and byte strings again to suppress Unicode decoding

and end-line translations for file content, because trees may contain arbitrary binary

and text files. The usual note about changing this to pass byte strings to os.listdir on

platforms where filenames may generate Unicode decoding errors applies here as well

(e.g. pass dir1.encode()). On some platforms, you may also want to detect and skip

certain kinds of special files in order to be fully general, but these were not in my trees,

so they are not in my script.
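
As a sketch of the bytes-based listing idea, the hypothetical helpers below pass an encoded directory name to os.listdir, which then returns bytes filenames without decoding them. They use os.fsencode (available in Python 3.2 and later) in place of a bare dir1.encode(), so that the platform's filesystem encoding is applied:

```python
import os

def listdir_bytes(dirpath):
    """Return the names in dirpath as undecoded bytes objects."""
    return os.listdir(os.fsencode(dirpath))

def joined_paths(dirpath):
    """Full bytes paths for each entry; os.path.join needs bytes throughout."""
    bdir = os.fsencode(dirpath)
    return [os.path.join(bdir, name) for name in os.listdir(bdir)]
```

The resulting bytes paths can be passed straight to open, os.path.isdir, and similar calls; os.fsdecode converts them back to str when a name is needed for display.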

One minor change for the fourth edition of this book: os.listdir results are now gath-

ered just once per subdirectory and passed along, to avoid extra calls in dirdiff—not

a huge win, but every cycle counts on the pitifully underpowered netbook I used when

writing this edition.

Running the Script

Since we’ve already studied the tree-walking tools this script employs, let’s jump right

into a few example runs. When run on identical trees, status messages scroll during the

traversal, and a No diffs found. message appears at the end:

C:\...\PP4E\System\Filetools> diffall.py C:\temp\PP3E\Examples copytemp > diffs.txt

C:\...\PP4E\System\Filetools> type diffs.txt | more

--------------------

Comparing C:\temp\PP3E\Examples to copytemp

Directory lists are identical

Comparing contents

README-root.txt matches

--------------------

Comparing C:\temp\PP3E\Examples\PP3E to copytemp\PP3E

Directory lists are identical

Comparing contents

echoEnvironment.pyw matches

LaunchBrowser.pyw matches

Launcher.py matches

Launcher.pyc matches

...over 2,000 more lines omitted...

--------------------

Comparing C:\temp\PP3E\Examples\PP3E\TempParts to copytemp\PP3E\TempParts

Directory lists are identical

Comparing contents

109_0237.JPG matches

lawnlake1-jan-03.jpg matches

part-001.txt matches

part-002.html matches

========================================

No diffs found.

I usually run this with the verbose flag passed in as True, and redirect output to a file

(for big trees, it produces too much output to scroll through comfortably); use False

to watch fewer status messages fly by. To show how differences are reported, we need

to generate a few; for simplicity, I’ll manually change a few files scattered about one of

the trees, but you could also run a global search-and-replace script like the one we’ll

write later in this chapter. While we’re at it, let’s remove a few common files so that

directory uniqueness differences show up on the scope, too; the last two removal com-

mands in the following will generate one difference in the same directory in different

trees:

C:\...\PP4E\System\Filetools> notepad copytemp\PP3E\README-PP3E.txt

C:\...\PP4E\System\Filetools> notepad copytemp\PP3E\System\Filetools\commands.py

C:\...\PP4E\System\Filetools> notepad C:\temp\PP3E\Examples\PP3E\__init__.py

C:\...\PP4E\System\Filetools> del copytemp\PP3E\System\Filetools\cpall_visitor.py

C:\...\PP4E\System\Filetools> del copytemp\PP3E\Launcher.py

C:\...\PP4E\System\Filetools> del C:\temp\PP3E\Examples\PP3E\PyGadgets.py

Now, rerun the comparison walker to pick out differences and redirect its output report

to a file for easy inspection. The following lists just the parts of the output report that

identify differences. In typical use, I inspect the summary at the bottom of the report

first, and then search for the strings "DIFF" and "unique" in the report’s text if I need

more information about the differences summarized; this interface could be much more

user-friendly, of course, but it does the job for me:

C:\...\PP4E\System\Filetools> diffall.py C:\temp\PP3E\Examples copytemp > diff2.txt

C:\...\PP4E\System\Filetools> notepad diff2.txt

--------------------

Comparing C:\temp\PP3E\Examples to copytemp

Directory lists are identical

Comparing contents

README-root.txt matches

--------------------

Comparing C:\temp\PP3E\Examples\PP3E to copytemp\PP3E

Files unique to C:\temp\PP3E\Examples\PP3E

... Launcher.py

Files unique to copytemp\PP3E

... PyGadgets.py

Comparing contents

echoEnvironment.pyw matches

LaunchBrowser.pyw matches

Launcher.pyc matches

...more omitted...

PyGadgets_bar.pyw matches

README-PP3E.txt DIFFERS

todos.py matches

tounix.py matches

__init__.py DIFFERS

__init__.pyc matches

--------------------

Comparing C:\temp\PP3E\Examples\PP3E\System\Filetools to copytemp\PP3E\System\Fil...

Files unique to C:\temp\PP3E\Examples\PP3E\System\Filetools

... cpall_visitor.py

Comparing contents

commands.py DIFFERS

cpall.py matches

...more omitted...

--------------------

Comparing C:\temp\PP3E\Examples\PP3E\TempParts to copytemp\PP3E\TempParts

Directory lists are identical

Comparing contents

109_0237.JPG matches

lawnlake1-jan-03.jpg matches

part-001.txt matches

part-002.html matches

========================================

Diffs found: 5

- unique files at C:\temp\PP3E\Examples\PP3E - copytemp\PP3E
- files differ at C:\temp\PP3E\Examples\PP3E\README-PP3E.txt -
        copytemp\PP3E\README-PP3E.txt
- files differ at C:\temp\PP3E\Examples\PP3E\__init__.py -
        copytemp\PP3E\__init__.py
- unique files at C:\temp\PP3E\Examples\PP3E\System\Filetools -
        copytemp\PP3E\System\Filetools
- files differ at C:\temp\PP3E\Examples\PP3E\System\Filetools\commands.py -
        copytemp\PP3E\System\Filetools\commands.py

I added line breaks and tabs in a few of these output lines to make them fit on this page,

but the report is simple to understand. In a tree with 1,430 files and 185 directories,

we found five differences—the three files we changed by edits, and the two directories

we threw out of sync with the three removal commands.

Verifying Backups

So how does this script placate CD backup paranoia? To double-check my CD writer’s
work, I run a command such as the following. I can also use a command like this to
find out what has been changed since the last backup. Again, since the CD is “G:” on
my machine when plugged in, I provide a path rooted there; on Linux, use the directory
where the CD is mounted, such as /mnt/cdrom (the /dev/cdrom device file itself is not a
directory):

C:\...\PP4E\System\Filetools> python diffall.py Examples g:\PP3E\Examples > diff0226

C:\...\PP4E\System\Filetools> more diff0226

...output omitted...

The CD spins, the script compares, and a summary of differences appears at the end

of the report. For an example of a full difference report, see the diff*.txt files in the

book’s examples distribution package. And to be really sure, I run the following global

comparison command to verify the entire book development tree backed up to a mem-

ory stick (which works just like a CD in terms of the filesystem):

C:\...\PP4E\System\Filetools> diffall.py F:\writing-backups\feb-26-10\dev

C:\Users\mark\Stuff\Books\4E\PP4E\dev > diff3.txt

C:\...\PP4E\System\Filetools> more diff3.txt

--------------------

Comparing F:\writing-backups\feb-26-10\dev to C:\Users\mark\Stuff\Books\4E\PP4E\dev

Directory lists are identical

Comparing contents

ch00.doc DIFFERS

ch01.doc matches

ch02.doc DIFFERS

ch03.doc matches

ch04.doc DIFFERS

ch05.doc matches

ch06.doc DIFFERS

...more output omitted...

--------------------

Comparing F:\writing-backups\feb-26-10\dev\Examples\PP4E\System\Filetools to C:\…

Files unique to C:\Users\mark\Stuff\Books\4E\PP4E\dev\Examples\PP4E\System\Filetools

... copytemp

... cpall.py

... diff2.txt

... diff3.txt

... diffall.py

... diffs.txt

... dirdiff.py

... dirdiff.pyc

Comparing contents

bigext-tree.py matches

bigpy-dir.py matches

...more output omitted...

========================================

Diffs found: 7

- files differ at F:\writing-backups\feb-26-10\dev\ch00.doc -
        C:\Users\mark\Stuff\Books\4E\PP4E\dev\ch00.doc
- files differ at F:\writing-backups\feb-26-10\dev\ch02.doc -
        C:\Users\mark\Stuff\Books\4E\PP4E\dev\ch02.doc
- files differ at F:\writing-backups\feb-26-10\dev\ch04.doc -
        C:\Users\mark\Stuff\Books\4E\PP4E\dev\ch04.doc
- files differ at F:\writing-backups\feb-26-10\dev\ch06.doc -
        C:\Users\mark\Stuff\Books\4E\PP4E\dev\ch06.doc
- files differ at F:\writing-backups\feb-26-10\dev\TOC.txt -
        C:\Users\mark\Stuff\Books\4E\PP4E\dev\TOC.txt
- unique files at F:\writing-backups\feb-26-10\dev\Examples\PP4E\System\Filetools -
        C:\Users\mark\Stuff\Books\4E\PP4E\dev\Examples\PP4E\System\Filetools
- files differ at F:\writing-backups\feb-26-10\dev\Examples\PP4E\Tools\visitor.py -
        C:\Users\mark\Stuff\Books\4E\PP4E\dev\Examples\PP4E\Tools\visitor.py

This particular run indicates that I’ve added a few examples and changed some chapter

files since the last backup; if run immediately after a backup, nothing should show up

on diffall radar except for any files that cannot be copied in general. This global

comparison can take a few minutes. It performs byte-for-byte comparisons of all chap-

ter files and screenshots, the examples tree, and more, but it’s an accurate and complete

verification. Given that this book development tree contained many files, a more man-

ual verification procedure without Python’s help would be utterly impossible.

After writing this script, I also started using it to verify full automated backups of my

laptops onto an external hard-drive device. To do so, I run the cpall copy script we

wrote earlier in the preceding section of this chapter, and then the comparison script

developed here to check results and get a list of files that didn’t copy correctly. The last

time I did this, this procedure copied and compared 225,000 files and 15,000 directories

in 20 GB of space—not the sort of task that lends itself to manual labor!

Here are the magic incantations on my Windows laptop. f:\ is a partition on my ex-

ternal hard drive, and you shouldn’t be surprised if each of these commands runs for

half an hour or more on currently common hardware. A drag-and-drop copy takes at

least as long (assuming it works at all!):

C:\...\System\Filetools> cpall.py c:\ f:\ > f:\copy-log.txt

C:\...\System\Filetools> diffall.py f:\ c:\ > f:\diff-log.txt

Reporting Differences and Other Ideas

Finally, it’s worth noting that this script still only detects differences in the tree but does

not give any further details about individual file differences. In fact, it simply loads and

compares the binary contents of corresponding files with string comparisons. It’s a

simple yes/no result.

If and when I need more details about how two reported files actually differ, I either

edit the files or run the file-comparison command on the host platform (e.g., fc on

Windows/DOS, diff or cmp on Unix and Linux). That’s not a portable solution for this

last step; but for my purposes, just finding the differences in a 1,400-file tree was much

more critical than reporting which lines differ in files flagged in the report.

Of course, since we can always run shell commands in Python, this last step could be

automated by spawning a diff or fc command with os.popen as differences are en-

countered (or after the traversal, by scanning the report summary). The output of these

spawned commands could be displayed verbatim, or parsed for relevant parts.

We also might try to do a bit better here by opening true text files in text mode to ignore

line-terminator differences caused by transferring across platforms, but it’s not clear

that such differences should be ignored (what if the caller wants to know whether line-

end markers have been changed?). For example, after downloading a website with an

FTP script we’ll meet in Chapter 13, the diffall script detected a discrepancy between

the local copy of a file and the one at the remote server. To probe further, I simply ran

some interactive Python code:

>>> a = open('lp2e-updates.html', 'rb').read()

>>> b = open(r'C:\Mark\WEBSITE\public_html\lp2e-updates.html', 'rb').read()

>>> a == b

False

This verifies that there really is a binary difference in the downloaded and local versions

of the file; to see whether it’s because a Unix or DOS line end snuck into the file, try

again in text mode so that line ends are all mapped to the standard \n character:

>>> a = open('lp2e-updates.html', 'r').read()

>>> b = open(r'C:\Mark\WEBSITE\public_html\lp2e-updates.html', 'r').read()

>>> a == b

True

Sure enough; now, to find where the difference is, the following code checks byte

by byte until the first mismatch is found (in binary mode, so we retain the

difference):

>>> a = open('lp2e-updates.html', 'rb').read()
>>> b = open(r'C:\Mark\WEBSITE\public_html\lp2e-updates.html', 'rb').read()
>>> for (i, (ac, bc)) in enumerate(zip(a, b)):
...     if ac != bc:
...         print(i, repr(ac), repr(bc))
...         break
...
37966 13 10

Iterating over a bytes object in Python 3.X yields integer byte values, so this

reports that at byte offset 37,966 there is a 13 (\r) in the downloaded file, but a

10 (\n) in the local copy. This line has a DOS line end in one and a Unix line end in

the other. To see more, print text around the mismatch:

>>> for (i, (ac, bc)) in enumerate(zip(a, b)):
...     if ac != bc:
...         print(i, repr(ac), repr(bc))
...         print(repr(a[i-20:i+20]))
...         print(repr(b[i-20:i+20]))
...         break
...
37966 13 10
b're>\r\ndef min(*args):\r\n    tmp = list(arg'
b're>\r\ndef min(*args):\n    tmp = list(args'

Apparently, I wound up with a Unix line end at one point in the local copy and a DOS

line end in the version I downloaded—the combined effect of the text mode used by

the download script itself (which translated \n to \r\n) and years of edits on both Linux

and Windows PDAs and laptops (I probably coded this change on Linux and copied

it to my local Windows copy in binary mode). Code such as this could be integrated

into the diffall script to make it more intelligent about text files and difference

reporting.
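As a starting point for that sort of integration, the interactive probing above can be packaged as a function. This is only a sketch: the name first_mismatch and its context argument are hypothetical, not part of diffall. It returns slices rather than single items, so the results are bytes objects even though indexing bytes yields integers in 3.X:

```python
def first_mismatch(a, b, context=20):
    """
    Return (offset, around_a, around_b) for the first differing byte of
    two bytes objects, or None if they are identical; hypothetical helper
    """
    for i, (ac, bc) in enumerate(zip(a, b)):
        if ac != bc:
            break                          # i is the first differing offset
    else:
        if len(a) == len(b):
            return None                    # same length, same content
        i = min(len(a), len(b))            # one is a prefix of the other
    start = max(i - context, 0)            # avoid negative slice offsets
    return i, a[start:i + context], b[start:i + context]
```

A caller might load both files in binary mode, as in the session above, and print the returned slices with repr to make line-end bytes visible.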

Because Python excels at processing files and strings, it’s even possible to go one step

further and code a Python equivalent of the fc and diff commands. In fact, much of

the work has already been done; the standard library module difflib could make this

task simple. See the Python library manual for details and usage examples.
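To give a feel for how little code this takes, here is a minimal sketch built on difflib.unified_diff, which is a real standard library function. The wrapper name text_diff and the default labels 'old' and 'new' are assumptions made up for this illustration:

```python
import difflib

def text_diff(fromlines, tolines, fromname='old', toname='new'):
    """
    Return unified-diff report lines for two lists of text lines;
    the function and argument names here are illustrative only
    """
    return list(difflib.unified_diff(fromlines, tolines,
                                     fromfile=fromname, tofile=toname,
                                     lineterm=''))

if __name__ == '__main__':
    old = ['def min(*args):', '    tmp = list(args)', '    tmp.sort()']
    new = ['def min(*args):', '    tmp = sorted(args)']
    for line in text_diff(old, new):
        print(line)
```

For files, the two lists could come from reading each file and calling splitlines on the text; an empty result means no line-level differences were found.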

We could also be smarter by avoiding the load and compare steps for files that differ

in size, and we might use a smaller block size to reduce the script’s memory require-

ments. For most trees, such optimizations are unnecessary; reading multimegabyte files

into strings is very fast in Python, and garbage collection reclaims the space as you go.
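For the curious, both ideas might be sketched as follows. The function name same_content and its 1 MB default block size are assumptions, not code from diffall; the standard library's filecmp module offers related tools as well:

```python
import os

def same_content(path1, path2, blocksize=1024 * 1024):
    """
    True if two files have identical bytes; compares sizes first, then
    reads both files in blocks of blocksize bytes (an assumed default)
    """
    if os.path.getsize(path1) != os.path.getsize(path2):
        return False                          # sizes differ: skip the reads
    with open(path1, 'rb') as file1, open(path2, 'rb') as file2:
        while True:
            block1 = file1.read(blocksize)
            block2 = file2.read(blocksize)
            if block1 != block2:
                return False                  # stop at first differing block
            if not block1:                    # both files at end-of-file
                return True
```

Because the loop stops at the first differing block, memory use is bounded by the block size rather than by file size.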

Since such extensions are beyond both this script’s scope and this chapter’s size limits,

though, they will have to await the attention of a curious reader (this book doesn’t have

formal exercises, but that almost sounds like one, doesn’t it?). For now, let’s move on

to explore ways to code one more common directory task: search.

Searching Directory Trees

Engineers love to change things. As I was writing this book, I found it almost irresisti-

ble to move and rename directories, variables, and shared modules in the book exam-

ples tree whenever I thought I’d stumbled onto a more coherent structure. That was

fine early on, but as the tree became more intertwined, this became a maintenance

nightmare. Things such as program directory paths and module names were hardcoded

all over the place—in package import statements, program startup calls, text notes,

configuration files, and more.

One way to repair these references, of course, is to edit every file in the directory by

hand, searching each for information that has changed. That’s so tedious as to be utterly

impossible in this book’s examples tree, though; the examples of the prior edition con-

tained 186 directories and 1,429 files! Clearly, I needed a way to automate updates after

changes. There are a variety of solutions to such goals—from shell commands, to find

operations, to custom tree walkers, to general-purpose frameworks. In this and the next

section, we’ll explore each option in turn, just as I did while refining solutions to this

real-world dilemma.

Greps and Globs and Finds

If you work on Unix-like systems, you probably already know that there is a standard

way to search files for strings on such platforms—the command-line program grep and

its relatives list all lines in one or more files containing a string or string pattern.‖ Given

that shells expand (i.e., “glob”) filename patterns automatically, a command such as

the following will search a single directory’s Python files for a string named on the

command line (this uses the grep command installed with the Cygwin Unix-like system

for Windows that I described in the prior chapter):

C:\...\PP4E\System\Filetools> c:\cygwin\bin\grep.exe walk *.py

bigext-tree.py:for (thisDir, subsHere, filesHere) in os.walk(dirname):

bigpy-path.py: for (thisDir, subsHere, filesHere) in os.walk(srcdir):

bigpy-tree.py:for (thisDir, subsHere, filesHere) in os.walk(dirname):

As we’ve seen, we can often accomplish the same within a Python script by running

such a shell command with os.system or os.popen. And if we search its results manually,

we can also achieve similar results with the Python glob module we met in Chapter 4;

it expands a filename pattern into a list of matching filename strings much like a shell:

C:\...\PP4E\System\Filetools> python
>>> import os
>>> for line in os.popen(r'c:\cygwin\bin\grep.exe walk *.py'):
...     print(line, end='')
...
bigext-tree.py:for (thisDir, subsHere, filesHere) in os.walk(dirname):
bigpy-path.py: for (thisDir, subsHere, filesHere) in os.walk(srcdir):
bigpy-tree.py:for (thisDir, subsHere, filesHere) in os.walk(dirname):

>>> from glob import glob
>>> for filename in glob('*.py'):
...     if 'walk' in open(filename).read():
...         print(filename)
...
bigext-tree.py
bigpy-path.py
bigpy-tree.py
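The glob loop above reports only filenames, while grep also shows the matching lines. A rough Python equivalent might look like the following sketch; grep_glob is a hypothetical name, and the match is a plain substring test rather than a grep-style pattern:

```python
import os
from glob import glob

def grep_glob(searchkey, filepattern):
    """
    Yield (filename, line) for each line containing searchkey in files
    matching filepattern; a simple substring test, not a regex match
    """
    for filename in sorted(glob(filepattern)):
        if not os.path.isfile(filename):                  # skip matching dirs
            continue
        with open(filename, errors='replace') as file:    # tolerate odd bytes
            for line in file:
                if searchkey in line:
                    yield filename, line.rstrip('\n')
```

Looping over grep_glob('walk', '*.py') and printing each pair as '%s:%s' would produce output in the same general form as the grep command line.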

‖ In fact, the act of searching files often goes by the colloquial name “grepping” among developers who have

spent any substantial time in the Unix ghetto.

Unfortunately, these tools are generally limited to a single directory. glob can visit

multiple directories given the right sort of pattern string, but it’s not a general directory

walker of the sort I need to maintain a large examples tree. On Unix-like systems, a

find shell command can go the extra mile to traverse an entire directory tree. For

instance, the following Unix command line would pinpoint lines and files at and below

the current directory that mention the string popen:

find . -name "*.py" -print -exec fgrep popen {} \;

If you happen to have a Unix-like find command on every machine you will ever use,

this is one way to process directories.

Rolling Your Own find Module

But if you don’t happen to have a Unix find on all your computers, not to worry—it’s

easy to code a portable one in Python. Python itself used to have a find module in its

standard library, which I used frequently in the past. Although that module was re-

moved between the second and third editions of this book, the newer os.walk makes

writing your own simple. Rather than lamenting the demise of a module, I decided to

spend 10 minutes coding a custom equivalent.

Example 6-13 implements a find utility in Python, which collects all matching filenames

in a directory tree. Unlike glob.glob, its find.find automatically matches through an

entire tree. And unlike the tree walk structure of os.walk, we can treat find.find results

as a simple linear group.

Example 6-13. PP4E\Tools\find.py

#!/usr/bin/python
"""
################################################################################
Return all files matching a filename pattern at and below a root directory;

custom version of the now deprecated find module in the standard library:
import as "PP4E.Tools.find"; like original, but uses os.walk loop, has no
support for pruning subdirs, and is runnable as a top-level script;

find() is a generator that uses the os.walk() generator to yield just
matching filenames: use findlist() to force results list generation;
################################################################################
"""

import fnmatch, os

def find(pattern, startdir=os.curdir):
    for (thisDir, subsHere, filesHere) in os.walk(startdir):
        for name in subsHere + filesHere:
            if fnmatch.fnmatch(name, pattern):
                fullpath = os.path.join(thisDir, name)
                yield fullpath

def findlist(pattern, startdir=os.curdir, dosort=False):
    matches = list(find(pattern, startdir))
    if dosort: matches.sort()
    return matches

if __name__ == '__main__':
    import sys
    namepattern, startdir = sys.argv[1], sys.argv[2]
    for name in find(namepattern, startdir): print(name)

There’s not much to this file (it’s largely just a minor extension to os.walk), but calling

its find function provides the same utility as both the removed find standard library

module and the Unix utility of the same name. It’s also much more portable, and

noticeably easier than repeating all of this file’s code every time you need to perform a

find-type search. Because this file is coded to be both a script and a library, it

can be either run as a command-line tool or called from other programs.

For instance, to process every Python file in the directory tree rooted one level up from

the current working directory, I simply run the following command line from a system

console window. Run this yourself to watch its progress; the script’s standard output

is piped into the more command to page it here, but it can be piped into any processing

program that reads its input from the standard input stream:

C:\...\PP4E\Tools> python find.py *.py .. | more

..\LaunchBrowser.py

..\Launcher.py

..\__init__.py

..\Preview\attachgui.py

..\Preview\customizegui.py

...more lines omitted...

For more control, run the following sort of Python code from a script or interactive

prompt. In this mode, you can apply any operation to the found files that the Python

language provides:

C:\...\PP4E\System\Filetools> python
>>> from PP4E.Tools import find # or just import find if in cwd
>>> for filename in find.find('*.py', '..'):
...     if 'walk' in open(filename).read():
...         print(filename)
...
..\Launcher.py
..\System\Filetools\bigext-tree.py
..\System\Filetools\bigpy-path.py
..\System\Filetools\bigpy-tree.py
..\Tools\cleanpyc.py
..\Tools\find.py
..\Tools\visitor.py

Notice how this avoids having to recode the nested loop structure required for

os.walk every time you want a list of matching file names; for many use cases, this seems

conceptually simpler. Also note that because this finder is a generator function, your

script doesn’t have to wait until all matching files have been found and collected;

os.walk yields results as it goes, and find.find yields matching files among that set.
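The generator structure also leaves room for the subdirectory pruning that Example 6-13 omits. In its default top-down mode, os.walk lets a caller edit its subdirectory list in place to skip branches. The following variation is only a sketch; the name find_pruned and the skipdirs default are assumptions chosen for illustration:

```python
import fnmatch, os

def find_pruned(pattern, startdir=os.curdir, skipdirs=('.git', '__pycache__')):
    """
    Like the chapter's find, but prunes subdirectories named in skipdirs;
    skipdirs and its default names are assumptions for illustration
    """
    for (thisDir, subsHere, filesHere) in os.walk(startdir):
        # in-place change: os.walk will not descend into removed names
        subsHere[:] = [sub for sub in subsHere if sub not in skipdirs]
        for name in subsHere + filesHere:
            if fnmatch.fnmatch(name, pattern):
                yield os.path.join(thisDir, name)
```

Because results are produced on demand, a call such as next(find_pruned('*.py', '..'), None) fetches just the first match and walks no more of the tree than it needs to.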

Here’s a more complex example of our find module at work: the following system

command line lists all Python files in directory C:\temp\PP3E whose names begin with

the letter q or x. Note how find returns full directory paths that begin with the start

directory specification:

C:\...\PP4E\Tools> find.py [qx]*.py C:\temp\PP3E

C:\temp\PP3E\Examples\PP3E\Database\SQLscripts\querydb.py

C:\temp\PP3E\Examples\PP3E\Gui\Tools\queuetest-gui-class.py

C:\temp\PP3E\Examples\PP3E\Gui\Tools\queuetest-gui.py

C:\temp\PP3E\Examples\PP3E\Gui\Tour\quitter.py

C:\temp\PP3E\Examples\PP3E\Internet\Other\Grail\Question.py

C:\temp\PP3E\Examples\PP3E\Internet\Other\XML\xmlrpc.py

C:\temp\PP3E\Examples\PP3E\System\Threads\queuetest.py

And here’s some Python code that does the same find but also extracts base names and

file sizes for each file found:

C:\...\PP4E\Tools> python
>>> import os
>>> from find import find
>>> for name in find('[qx]*.py', r'C:\temp\PP3E'):
...     print(os.path.basename(name), os.path.getsize(name))
...

querydb.py 635

queuetest-gui-class.py 1152

queuetest-gui.py 963

quitter.py 801

Question.py 817

xmlrpc.py 705

queuetest.py 1273

The fnmatch module

To achieve such code economy, the find module calls os.walk to walk the tree and

simply yields matching filenames along the way. New here, though, is the fnmatch

module—yet another Python standard library module that performs Unix-like pattern

matching against filenames. This module supports common operators in name pattern

strings: * to match any number of characters, ? to match any single character, and

[...] and [!...] to match any character inside the bracket pairs or not; other characters

match themselves. Unlike the re module, fnmatch supports only common Unix shell

matching operators, not full-blown regular expression patterns; we’ll see why this dis-

tinction matters in Chapter 19.
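A few quick tests may make these operators concrete. The filenames below are invented for the demo, and the matching helper is a hypothetical convenience. It uses fnmatch.fnmatchcase, which always compares case-sensitively, whereas fnmatch.fnmatch first normalizes case per the host platform's filename rules:

```python
import fnmatch

names = ['querydb.py', 'xmlrpc.py', 'quitter.pyw', 'test1.py', 'README.txt']

def matching(names, pattern):
    """Return the names that match a shell-style pattern, case-sensitively"""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]

print(matching(names, '[qx]*.py'))     # ['querydb.py', 'xmlrpc.py']
print(matching(names, 'test?.py'))     # ['test1.py']
print(matching(names, '[!qx]*'))       # ['test1.py', 'README.txt']
print(matching(names, '*.py'))         # ['querydb.py', 'xmlrpc.py', 'test1.py']
```

Note that a pattern must match the entire name: quitter.pyw is not selected by '[qx]*.py' because of its trailing w.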

Interestingly, Python’s glob.glob function also uses the fnmatch module to match

names: it combines os.listdir and fnmatch to match in directories in much the same

way our find.find combines os.walk and fnmatch to match in trees (though os.walk

ultimately uses os.listdir as well). One ramification of all this is that you can pass

byte strings for both pattern and start-directory to find.find if you need to suppress

Unicode filename decoding, just as you can for os.walk and glob.glob; you’ll receive

byte strings for filenames in the result. See Chapter 4 for more details on Unicode

filenames.

By comparison, find.find with just “*” for its name pattern is also roughly equivalent

to platform-specific directory tree listing shell commands such as dir /B /S on DOS

and Windows. Since all files match “*”, this just exhaustively generates all the file names

in a tree with a single traversal. Because we can usually run such shell commands in a

Python script with os.popen, the following do the same work, but the first is inherently

nonportable and must start up a separate program along the way:

>>> import os

>>> for line in os.popen('dir /B /S'): print(line, end='')

>>> from PP4E.Tools.find import find

>>> for name in find(pattern='*', startdir='.'): print(name)

Watch for this utility to show up in action later in this chapter and book, including an

arguably strong showing in the next section and a cameo appearance in the Grep dialog

of Chapter 11’s PyEdit text editor GUI, where it will serve a central role in a threaded

external files search tool. The standard library’s find module may be gone, but it need

not be forgotten.

In fact, you must pass a bytes pattern string for a bytes filename to

fnmatch (or pass both as str), because the re pattern matching module

it uses does not allow the string types of subject and pattern to be mixed.

This rule is inherited by our find.find for directory and pattern. See

Chapter 19 for more on re.
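One way a caller might guard against mixed types is to convert the str side before matching, as in the sketch below. The name safe_match and its utf-8 default are assumptions; as the notes here suggest, the right encoding depends on the filenames at hand:

```python
import fnmatch

def safe_match(name, pattern, encoding='utf-8'):
    """
    Match name against pattern even if one is bytes and the other str, by
    encoding the str side; the utf-8 default is an assumption, not a rule
    """
    if isinstance(name, bytes) and isinstance(pattern, str):
        pattern = pattern.encode(encoding)
    elif isinstance(name, str) and isinstance(pattern, bytes):
        name = name.encode(encoding)
    return fnmatch.fnmatchcase(name, pattern)

if __name__ == '__main__':
    try:
        fnmatch.fnmatchcase(b'spam.py', '*.py')      # mixed types
    except TypeError as exc:
        print('mixed types fail:', exc)
    print(safe_match(b'spam.py', '*.py'))            # both sides now bytes
```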

Curiously, the fnmatch module in Python 3.1 also converts a bytes pat-

tern string to and from Unicode str in order to perform internal text

processing, using the Latin-1 encoding. This suffices for many contexts,

but may not be entirely sound for some encodings which do not map to

Latin-1 cleanly. sys.getfilesystemencoding might be a better encoding

choice in such contexts, as this reflects the underlying file system’s con-

straints (as we learned in Chapter 4, sys.getdefaultencoding reflects file

content, not names).

In the absence of bytes, os.walk assumes filenames follow the platform’s

convention and does not ignore decoding errors triggered by os.listdir.

In the “grep” utility of Chapter 11’s PyEdit, this picture is further

clouded by the fact that a str pattern string from a GUI would have to

be encoded to bytes using a potentially inappropriate encoding for some

files present. See fnmatch.py and os.py in Python’s library and the Py-

thon library manual for more details. Unicode can be a very subtle affair.

Cleaning Up Bytecode Files

The find module of the prior section isn’t quite the general string searcher we’re after,

but it’s an important first step—it collects files that we can then search in an automated

script. In fact, the act of collecting matching files in a tree is enough by itself to support

a wide variety of day-to-day system tasks.

For example, one of the other common tasks I perform on a regular basis is removing

all the bytecode files in a tree. Because these are not always portable across major Python

releases, it’s usually a good idea to ship programs without them and let Python create

new ones on first imports. Now that we’re expert os.walk users, we could cut out the

middleman and use it directly. Example 6-14 codes a portable and general command-

line tool, with support for arguments, exception processing, tracing, and list-only

mode.

Example 6-14. PP4E\Tools\cleanpyc.py

"""
delete all .pyc bytecode files in a directory tree: use the
command line arg as root if given, else current working dir
"""

import os, sys
findonly = False
rootdir = os.getcwd() if len(sys.argv) == 1 else sys.argv[1]

found = removed = 0
for (thisDirLevel, subsHere, filesHere) in os.walk(rootdir):
    for filename in filesHere:
        if filename.endswith('.pyc'):
            fullname = os.path.join(thisDirLevel, filename)
            print('=>', fullname)
            if not findonly:
                try:
                    os.remove(fullname)
                    removed += 1
                except:
                    type, inst = sys.exc_info()[:2]
                    print('*'*4, 'Failed:', filename, type, inst)
            found += 1

print('Found', found, 'files, removed', removed)

When run, this script walks a directory tree (the CWD by default, or else one passed

in on the command line), deleting any and all bytecode files along the way:

C:\...\Examples\PP4E> Tools\cleanpyc.py

=> C:\Users\mark\Stuff\Books\4E\PP4E\dev\Examples\PP4E\__init__.pyc

=> C:\Users\mark\Stuff\Books\4E\PP4E\dev\Examples\PP4E\Preview\initdata.pyc

=> C:\Users\mark\Stuff\Books\4E\PP4E\dev\Examples\PP4E\Preview\make_db_file.pyc

=> C:\Users\mark\Stuff\Books\4E\PP4E\dev\Examples\PP4E\Preview\manager.pyc

=> C:\Users\mark\Stuff\Books\4E\PP4E\dev\Examples\PP4E\Preview\person.pyc

...more lines here...

Found 24 files, removed 24

C:\...\PP4E\Tools> cleanpyc.py .

=> .\find.pyc

=> .\visitor.pyc

=> .\__init__.pyc

Found 3 files, removed 3

This script works, but it’s a bit more manual and code-y than it needs to be. In fact,

now that we also know about find operations, writing scripts based upon them is almost

trivial when we just need to match filenames. Example 6-15, for instance, falls back on

spawning shell find commands if you have them.

Example 6-15. PP4E\Tools\cleanpyc-find-shell.py

"""
find and delete all "*.pyc" bytecode files at and below the directory
named on the command-line; assumes a nonportable Unix-like find command
"""

import os, sys

rundir = sys.argv[1]
if sys.platform[:3] == 'win':
    findcmd = r'c:\cygwin\bin\find %s -name "*.pyc" -print' % rundir
else:
    findcmd = 'find %s -name "*.pyc" -print' % rundir
print(findcmd)

count = 0
for fileline in os.popen(findcmd):          # for all result lines
    count += 1                              # have \n at the end
    print(fileline, end='')
    os.remove(fileline.rstrip())

print('Removed %d .pyc files' % count)

When run, files returned by the shell command are removed:

C:\...\PP4E\Tools> cleanpyc-find-shell.py .

c:\cygwin\bin\find . -name "*.pyc" -print

./find.pyc

./visitor.pyc

./__init__.pyc

Removed 3 .pyc files

This script uses os.popen to collect the output of a Cygwin find program installed on

one of my Windows computers, or else the standard find tool on the Linux side. It’s

also completely nonportable to Windows machines that don’t have the Unix-like find

program installed, and that includes other computers of my own (not to mention those

throughout most of the world at large). As we’ve seen, spawning shell commands also

incurs performance penalties for starting a new program.

We can do much better on the portability and performance fronts and still retain code

simplicity, by applying the find tool we wrote in Python in the prior section. The new

script is shown in Example 6-16.

Example 6-16. PP4E\Tools\cleanpyc-find-py.py

"""
find and delete all "*.pyc" bytecode files at and below the directory
named on the command-line; this uses a Python-coded find utility, and
so is portable; run this to delete .pyc's from an old Python release;
"""

import os, sys, find                        # here, gets Tools.find

count = 0
for filename in find.find('*.pyc', sys.argv[1]):
    count += 1
    print(filename)
    os.remove(filename)

print('Removed %d .pyc files' % count)

When run, all bytecode files in the tree rooted at the passed-in directory name are

removed as before; this time, though, our script works just about everywhere Python

does:

C:\...\PP4E\Tools> cleanpyc-find-py.py .

.\find.pyc

.\visitor.pyc

.\__init__.pyc

Removed 3 .pyc files

This works portably, and it avoids external program startup costs. But find is really

just half the story—it collects files matching a name pattern but doesn’t search their

content. Although extra code can add such searching to a find’s result, a more manual

approach can allow us to tap into the search process more directly. The next section

shows how.

A Python Tree Searcher

After experimenting with greps and globs and finds, in the end, to help ease the task

of performing global searches on all platforms I might ever use, I wound up coding a

task-specific Python script to do most of the work for me. Example 6-17 employs the

following standard Python tools that we met in the preceding chapters: os.walk to visit

files in a directory, os.path.splitext to skip over files with binary-type extensions, and

os.path.join to portably combine a directory path and filename.

Because it’s pure Python code, it can be run the same way on both Linux and Windows.

In fact, it should work on any computer where Python has been installed. Moreover,

because it uses direct system calls, it will likely be faster than approaches that rely on

underlying shell commands.

Example 6-17. PP4E\Tools\search_all.py

"""
################################################################################
Use: "python ...\Tools\search_all.py dir string".
Search all files at and below a named directory for a string; uses the
os.walk interface, rather than doing a find.find to collect names first;
similar to calling visitfile for each find.find result for "*" pattern;
################################################################################
"""

import os, sys
listonly = False
textexts = ['.py', '.pyw', '.txt', '.c', '.h']             # ignore binary files

def searcher(startdir, searchkey):
    global fcount, vcount
    fcount = vcount = 0
    for (thisDir, dirsHere, filesHere) in os.walk(startdir):
        for fname in filesHere:                            # do non-dir files here
            fpath = os.path.join(thisDir, fname)           # fnames have no dirpath
            visitfile(fpath, searchkey)

def visitfile(fpath, searchkey):                           # for each non-dir file
    global fcount, vcount                                  # search for string
    print(vcount+1, '=>', fpath)                           # skip protected files
    try:
        if not listonly:
            if os.path.splitext(fpath)[1] not in textexts:
                print('Skipping', fpath)
            elif searchkey in open(fpath).read():
                input('%s has %s' % (fpath, searchkey))
                fcount += 1
    except:
        print('Failed:', fpath, sys.exc_info()[0])
    vcount += 1

if __name__ == '__main__':
    searcher(sys.argv[1], sys.argv[2])
    print('Found in %d files, visited %d' % (fcount, vcount))

Operationally, this script works roughly the same as calling its visitfile function for

every result generated by our find.find tool with a pattern of “*”; but because this

version is specific to searching content, it can be better tailored for its goal. Really, this

equivalence holds only because a “*” pattern invokes an exhaustive traversal in

find.find, and that’s all that this new script’s searcher function does. The finder is

good at selecting specific file types, but this script benefits from a more custom single

traversal.

When run standalone, the search key is passed on the command line; when imported,

clients call this module’s searcher function directly. For example, to search (that is,

grep) for all appearances of a string in the book examples tree, I run a command line

like this in a DOS or Unix shell:

C:\...\PP4E> Tools\search_all.py . mimetypes

1 => .\LaunchBrowser.py

2 => .\Launcher.py

3 => .\Launch_PyDemos.pyw

4 => .\Launch_PyGadgets_bar.pyw

5 => .\__init__.py

6 => .\__init__.pyc

Skipping .\__init__.pyc

7 => .\Preview\attachgui.py

8 => .\Preview\bob.pkl

Skipping .\Preview\bob.pkl

...more lines omitted: pauses for Enter key press at matches...

Found in 2 files, visited 184

The script lists each file it checks as it goes, tells you which files it is skipping (names

that end in extensions not listed in the variable textexts that imply binary data), and

pauses for an Enter key press each time it announces a file containing the search string.

The search_all script works the same way when it is imported rather than run, but

there is no final statistics output line (fcount and vcount live in the module and so would

have to be imported to be inspected here):

C:\...\PP4E\dev\Examples\PP4E> python

>>> from Tools import search_all

>>> search_all.searcher(r'C:\temp\PP3E\Examples', 'mimetypes')

...more lines omitted: 8 pauses for Enter key press along the way...

>>> search_all.fcount, search_all.vcount # matches, files

(8, 1429)

However launched, this script tracks down all references to a string in an entire directory

tree: a name of a changed book examples file, object, or directory, for instance. It’s

exactly what I was looking for—or at least I thought so, until further deliberation drove

me to seek more complete and better structured solutions, the topic of the next section.

Be sure to also see the coverage of regular expressions in Chapter 19.

The search_all script here searches for a simple string in each file with

the in string membership expression, but it would be trivial to extend

it to search for a regular expression pattern match instead (roughly, just

replace in with a call to a regular expression object’s search method).

Of course, such a mutation will be much more trivial after we’ve learned

how.
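As a preview of that mutation, the content test might be recoded along the following lines. This is a sketch only: file_has_pattern is a hypothetical stand-in for the searchkey in open(fpath).read() expression of Example 6-17, not code from the examples tree:

```python
import re

def file_has_pattern(fpath, pattern):
    """
    True if the regular expression pattern matches anywhere in the text of
    the file at fpath; a stand-in for the 'searchkey in text' test
    """
    regex = re.compile(pattern)
    with open(fpath) as file:
        return regex.search(file.read()) is not None
```

A search that runs over many files with one pattern would do better to compile the pattern once, outside this function, and pass the compiled object in.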

Also notice the textexts list in Example 6-17, which attempts to list all

text file types to be searched, so that files with any other extension are

skipped as probable binary data: it would be more general and robust to use

the mimetypes logic we will meet near the end of this chapter in order to

guess file content type from its name, but the textexts list provides more

control and sufficed for the trees I used this script against.
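For reference, a mimetypes-based test might be sketched as follows. mimetypes.guess_type is a real standard library call, but the looks_like_text name and the choice to treat unknown types as non-text are assumptions made here:

```python
import mimetypes

def looks_like_text(fpath):
    """
    Guess from the filename alone whether a file holds text; unknown
    types are treated as non-text here, which is a conservative assumption
    """
    contype, encoding = mimetypes.guess_type(fpath)
    if contype is None or encoding is not None:      # unknown or compressed
        return False
    return contype.split('/')[0] == 'text'           # 'text/plain', 'text/html'...
```

Since the guess is based purely on the filename and on tables that can vary by platform, it trades the explicit control of a hand-coded list for broader coverage.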

Finally note that for simplicity many of the directory searches in this

chapter assume that text is encoded per the underlying platform’s Uni-

code default. They could open text in binary mode to avoid decoding

errors, but searches might then be inaccurate because of encoding

scheme differences in the raw encoded bytes. To see how to do better,

watch for the “grep” utility in Chapter 11’s PyEdit GUI, which will apply

an encoding name to all the files in a searched tree and ignore those text

or binary files that fail to decode.
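In the meantime, the general idea can be sketched in a few lines. This is not PyEdit's code: the text_contains name, its utf-8 default, and its three-way result are all assumptions made for illustration:

```python
def text_contains(fpath, searchkey, encoding='utf-8'):
    """
    Return True/False if the file decodes per encoding and does/doesn't
    contain searchkey, or None if it cannot be decoded or read
    """
    try:
        with open(fpath, encoding=encoding) as file:
            return searchkey in file.read()
    except (UnicodeError, OSError):
        return None                        # skip: wrong encoding or unreadable
```

Returning None for undecodable files lets a caller tell "searched and not found" apart from "could not be searched."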

Visitor: Walking Directories “++”

Laziness is the mother of many a framework. Armed with the portable search_all script

from Example 6-17, I was able to better pinpoint files to be edited every time I changed

the book examples tree content or structure. At least initially, in one window I ran

search_all to pick out suspicious files and edited each along the way by hand in another

window.

Pretty soon, though, this became tedious, too. Manually typing filenames into editor

commands is no fun, especially when the number of files to edit is large. Since I occa-

sionally have better things to do than manually start dozens of text editor sessions, I

started looking for a way to automatically run an editor on each suspicious file.

Unfortunately, search_all simply prints results to the screen. Although that text could

be intercepted with os.popen and parsed by another program, a more direct approach

that spawns edit sessions during the search may be simpler. That would require major

changes to the tree search script as currently coded, though, and make it useful for just

one specific purpose. At this point, three thoughts came to mind:

Redundancy

After writing a few directory walking utilities, it became clear that I was rewriting

the same sort of code over and over again. Traversals could be even further sim-

plified by wrapping common details for reuse. Although the os.walk tool avoids

having to write recursive functions, its model tends to foster redundant operations

and code (e.g., directory name joins, tracing prints).

Extensibility

Past experience informed me that it would be better in the long run to add features

to a general directory searcher as external components, rather than changing the

original script itself. Because editing files was just one possible extension (what

about automating text replacements, too?), a more general, customizable, and re-

usable approach seemed the way to go. Although os.walk is straightforward to use,

its nested loop-based structure doesn’t quite lend itself to customization the way

a class can.

Encapsulation

Based on past experience, I also knew that it’s a generally good idea to insulate

programs from implementation details as much as possible. While os.walk hides

the details of recursive traversal, it still imposes a very specific interface on its cli-

ents, which is prone to change over time. Indeed it has—as I’ll explain further at

the end of this section, one of Python’s tree walkers was removed altogether in 3.X,

instantly breaking code that relied upon it. It would be better to hide such de-

pendencies behind a more neutral interface, so that clients won’t break as our needs

change.

Of course, if you’ve studied Python in any depth, you know that all these goals point

to using an object-oriented framework for traversals and searching. Example 6-18 is a

concrete realization of these goals. It exports a general FileVisitor class that mostly

just wraps os.walk for easier use and extension, as well as a generic SearchVisitor class

that generalizes the notion of directory searches.

By itself, SearchVisitor simply does what search_all did, but it also opens up the search process to customization: bits of its behavior can be modified by overriding its methods in subclasses. Moreover, its core search logic can be reused everywhere we need to search; simply define a subclass that adds extensions for a specific task. The same goes for FileVisitor: by redefining its methods and using its attributes, we can tap into tree search using OOP coding techniques. As is usual in programming, once you repeat tactical tasks often enough, they tend to inspire this kind of strategic thinking.

Example 6-18. PP4E\Tools\visitor.py

"""
####################################################################################
Test: "python ...\Tools\visitor.py testmask dir [string]". Uses classes and
subclasses to wrap some of the details of os.walk call usage to walk and search;
testmask is an integer bitmask with 1 bit per available self-test; see also:
visitor_*.py subclasses use cases; frameworks should generally use __X pseudo
private names, but all names here are exported for use in subclasses and clients;
redefine reset to support multiple independent walks that require subclass updates;
####################################################################################
"""

import os, sys

class FileVisitor:
    """
    Visits all nondirectory files below startDir (default '.');
    override visit* methods to provide custom file/dir handlers;
    context arg/attribute is optional subclass-specific state;
    trace switch: 0 is silent, 1 is directories, 2 adds files
    """
    def __init__(self, context=None, trace=2):
        self.fcount  = 0
        self.dcount  = 0
        self.context = context
        self.trace   = trace

    def run(self, startDir=os.curdir, reset=True):
        if reset: self.reset()
        for (thisDir, dirsHere, filesHere) in os.walk(startDir):
            self.visitdir(thisDir)
            for fname in filesHere:                          # for non-dir files
                fpath = os.path.join(thisDir, fname)         # fnames have no path
                self.visitfile(fpath)

    def reset(self):                                         # to reuse walker
        self.fcount = self.dcount = 0                        # for independent walks

    def visitdir(self, dirpath):                             # called for each dir

Visitor: Walking Directories “++” | 331

        self.dcount += 1                                     # override or extend me
        if self.trace > 0: print(dirpath, '...')

    def visitfile(self, filepath):                           # called for each file
        self.fcount += 1                                     # override or extend me
        if self.trace > 1: print(self.fcount, '=>', filepath)

class SearchVisitor(FileVisitor):
    """
    Search files at and below startDir for a string;
    subclass: redefine visitmatch, extension lists, candidate as needed;
    subclasses can use testexts to specify file types to search (but can
    also redefine candidate to use mimetypes for text content: see ahead)
    """
    skipexts = []
    testexts = ['.txt', '.py', '.pyw', '.html', '.c', '.h']   # search these exts
   #skipexts = ['.gif', '.jpg', '.pyc', '.o', '.a', '.exe']   # or skip these exts

    def __init__(self, searchkey, trace=2):
        FileVisitor.__init__(self, searchkey, trace)
        self.scount = 0

    def reset(self):                                         # on independent walks
        FileVisitor.reset(self)                              # clear file/dir counts too
        self.scount = 0

    def candidate(self, fname):                              # redef for mimetypes
        ext = os.path.splitext(fname)[1]
        if self.testexts:
            return ext in self.testexts                      # in test list
        else:                                                # or not in skip list
            return ext not in self.skipexts

    def visitfile(self, fname):                              # test for a match
        FileVisitor.visitfile(self, fname)
        if not self.candidate(fname):
            if self.trace > 0: print('Skipping', fname)
        else:
            text = open(fname).read()                        # 'rb' if undecodable
            if self.context in text:                         # or text.find() != -1
                self.visitmatch(fname, text)
                self.scount += 1

    def visitmatch(self, fname, text):                       # process a match
        print('%s has %s' % (fname, self.context))           # override me lower

if __name__ == '__main__':
    # self-test logic
    dolist   = 1
    dosearch = 2    # 3=do list and search
    donext   = 4    # when next test added

    def selftest(testmask):
        if testmask & dolist:
            visitor = FileVisitor(trace=2)
            visitor.run(sys.argv[2])
            print('Visited %d files and %d dirs' % (visitor.fcount, visitor.dcount))

        if testmask & dosearch:
            visitor = SearchVisitor(sys.argv[3], trace=0)
            visitor.run(sys.argv[2])
            print('Found in %d files, visited %d' % (visitor.scount, visitor.fcount))

    selftest(int(sys.argv[1]))    # e.g., 3 = dolist | dosearch

This module primarily serves to export classes for external use, but it does something

useful when run standalone, too. If you invoke it as a script with a test mask of 1 and

a root directory name, it makes and runs a FileVisitor object and prints an exhaustive

listing of every file and directory at and below the root:

C:\...\PP4E\Tools> visitor.py 1 C:\temp\PP3E\Examples

C:\temp\PP3E\Examples ...

1 => C:\temp\PP3E\Examples\README-root.txt

C:\temp\PP3E\Examples\PP3E ...

2 => C:\temp\PP3E\Examples\PP3E\echoEnvironment.pyw

3 => C:\temp\PP3E\Examples\PP3E\LaunchBrowser.pyw

4 => C:\temp\PP3E\Examples\PP3E\Launcher.py

5 => C:\temp\PP3E\Examples\PP3E\Launcher.pyc

...more output omitted (pipe into more or a file)...

1424 => C:\temp\PP3E\Examples\PP3E\System\Threads\thread-count.py

1425 => C:\temp\PP3E\Examples\PP3E\System\Threads\thread1.py

C:\temp\PP3E\Examples\PP3E\TempParts ...

1426 => C:\temp\PP3E\Examples\PP3E\TempParts\109_0237.JPG

1427 => C:\temp\PP3E\Examples\PP3E\TempParts\lawnlake1-jan-03.jpg

1428 => C:\temp\PP3E\Examples\PP3E\TempParts\part-001.txt

1429 => C:\temp\PP3E\Examples\PP3E\TempParts\part-002.html

Visited 1429 files and 186 dirs

If you instead invoke this script with a 2 as its first command-line argument, it makes

and runs a SearchVisitor object using the third argument as the search key. This form

is similar to running the search_all.py script we met earlier, but it simply reports each

matching file without pausing:

C:\...\PP4E\Tools> visitor.py 2 C:\temp\PP3E\Examples mimetypes

C:\temp\PP3E\Examples\PP3E\extras\LosAlamosAdvancedClass\day1-system\data.txt ha

s mimetypes

C:\temp\PP3E\Examples\PP3E\Internet\Email\mailtools\mailParser.py has mimetypes

C:\temp\PP3E\Examples\PP3E\Internet\Email\mailtools\mailSender.py has mimetypes

C:\temp\PP3E\Examples\PP3E\Internet\Ftp\mirror\downloadflat.py has mimetypes

C:\temp\PP3E\Examples\PP3E\Internet\Ftp\mirror\downloadflat_modular.py has mimet

ypes

C:\temp\PP3E\Examples\PP3E\Internet\Ftp\mirror\ftptools.py has mimetypes

C:\temp\PP3E\Examples\PP3E\Internet\Ftp\mirror\uploadflat.py has mimetypes

C:\temp\PP3E\Examples\PP3E\System\Media\playfile.py has mimetypes

Found in 8 files, visited 1429

Technically, passing this script a first argument of 3 runs both a FileVisitor and a

SearchVisitor (two separate traversals are performed). The first argument is really used

as a bit mask to select one or more supported self-tests; if a test’s bit is on in the binary

value of the argument, the test will be run. Because 3 is 011 in binary, it selects both a

search (010) and a listing (001). In a more user-friendly system, we might want to be

more symbolic about that (e.g., check for -search and -list arguments), but bit masks

work just as well for this script’s scope.
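To make the bit-mask test concrete, here is a minimal sketch that decodes a mask into the names of the self-tests it selects. The selected_tests helper and its name table are illustrative assumptions and are not part of visitor.py; only the flag values 1, 2, and 4 are taken from the listing.

```python
# Illustrative only: the flag values mirror dolist, dosearch, and donext
DOLIST, DOSEARCH, DONEXT = 1, 2, 4

def selected_tests(testmask):
    """Return the names of the self-tests whose bits are set in testmask."""
    names = {DOLIST: 'list', DOSEARCH: 'search', DONEXT: 'next'}
    return [name for (bit, name) in names.items() if testmask & bit]

if __name__ == '__main__':
    for mask in (1, 2, 3):
        # show the mask in decimal and binary, then the tests it turns on
        print(mask, format(mask, '03b'), selected_tests(mask))
```

Each test is checked with a bitwise and against its own flag, so any combination of tests can be requested by adding their flag values together.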

As usual, this module can also be used interactively. The following is one way to determine how many files and directories you have in specific directories; the last command walks over your entire drive (after a generally noticeable delay!). See also the “biggest file” example at the start of this chapter for issues such as potential repeat visits not handled by this walker:

C:\...\PP4E\Tools> python

>>> from visitor import FileVisitor

>>> V = FileVisitor(trace=0)

>>> V.run(r'C:\temp\PP3E\Examples')

>>> V.dcount, V.fcount

(186, 1429)

>>> V.run('..') # independent walk (reset counts)

>>> V.dcount, V.fcount

(19, 181)

>>> V.run('..', reset=False) # accumulative walk (keep counts)

>>> V.dcount, V.fcount

(38, 362)

>>> V = FileVisitor(trace=0) # new independent walker (own counts)

>>> V.run(r'C:\\') # entire drive: try '/' on Unix-en

>>> V.dcount, V.fcount

(24992, 198585)

Although the visitor module is useful by itself for listing and searching trees, it was

really designed to be extended. In the rest of this section, let’s quickly step through a

handful of visitor clients which add more specific tree operations, using normal OO

customization techniques.
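Before moving on to the book's own clients, here is a rough sketch of what such a customization can look like. To keep it self-contained, it uses a condensed stand-in for FileVisitor rather than importing the visitor module, and the ExtCounter subclass is a hypothetical client invented for illustration; it is not one of the book's examples.

```python
import os

class FileVisitor:
    """Condensed stand-in for visitor.FileVisitor (counters and hooks only)."""
    def __init__(self, context=None, trace=0):
        self.fcount = self.dcount = 0
        self.context, self.trace = context, trace

    def run(self, startDir=os.curdir):
        for (thisDir, dirsHere, filesHere) in os.walk(startDir):
            self.visitdir(thisDir)
            for fname in filesHere:
                self.visitfile(os.path.join(thisDir, fname))

    def visitdir(self, dirpath):
        self.dcount += 1

    def visitfile(self, filepath):
        self.fcount += 1

class ExtCounter(FileVisitor):
    """Hypothetical client: tally files by extension in self.context."""
    def __init__(self, trace=0):
        FileVisitor.__init__(self, context={}, trace=trace)

    def visitfile(self, filepath):
        FileVisitor.visitfile(self, filepath)            # keep inherited count
        ext = os.path.splitext(filepath)[1].lower()      # '.PY' and '.py' together
        self.context[ext] = self.context.get(ext, 0) + 1
```

After something like walker = ExtCounter(); walker.run('.'), the tallies would be available in walker.context. The subclass supplies only the per-file action; the traversal itself is inherited.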

Editing Files in Directory Trees (Visitor)

After genericizing tree traversals and searches, it’s easy to add automatic file editing in

a brand-new, separate component. Example 6-19 defines a new EditVisitor class that

simply customizes the visitmatch method of the SearchVisitor class to open a text

editor on the matched file. Yes, this is the complete program—it needs to do something

special only when visiting matched files, and so it needs to provide only that behavior.

The rest of the traversal and search logic is unchanged and inherited.

Example 6-19. PP4E\Tools\visitor_edit.py

"""
Use: "python ...\Tools\visitor_edit.py string rootdir?".
Add auto-editor startup to SearchVisitor in an external subclass component;
Automatically pops up an editor on each file containing string as it traverses;
can also use editor='edit' or 'notepad' on Windows; to use texteditor from
later in the book, try r'python Gui\TextEditor\textEditor.py'; could also
send a search command to go to the first match on start in some editors;
"""

import os, sys
from visitor import SearchVisitor

class EditVisitor(SearchVisitor):
    """
    edit files at and below startDir having string
    """
    editor = r'C:\cygwin\bin\vim-nox.exe'    # ymmv!

    def visitmatch(self, fpathname, text):
        os.system('%s %s' % (self.editor, fpathname))

if __name__ == '__main__':
    visitor = EditVisitor(sys.argv[1])
    visitor.run('.' if len(sys.argv) < 3 else sys.argv[2])
    print('Edited %d files, visited %d' % (visitor.scount, visitor.fcount))

When we make and run an EditVisitor, a text editor is started with the os.system

command-line spawn call, which usually blocks its caller until the spawned program

finishes. As coded, when run on my machines, each time this script finds a matched

file during the traversal, it starts up the vi text editor within the console window where

the script was started; exiting the editor resumes the tree walk.
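One caveat worth noting: because os.system hands a single string to the shell, an editor path or filename containing spaces would be split into several arguments. The sketch below shows one possible alternative using the standard subprocess module, which accepts the command as a list and also waits for the editor to exit. The edit_file function name is an assumption made for illustration; it does not appear in the book's code.

```python
import subprocess

def edit_file(editor, fpathname):
    """
    Run editor on fpathname and wait for it to finish.
    Returns the editor's exit status. Passing a list (rather than one
    string) means no shell quoting is needed for names with spaces.
    """
    return subprocess.call([editor, fpathname])
```

An EditVisitor.visitmatch built this way might simply call edit_file(self.editor, fpathname) in place of the os.system line.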

Let’s find and edit some files. When run as a script, we pass this program the search

string as a command argument (here, the string mimetypes is the search key). The root

directory passed to the run method is either the second argument or “.” (the current

run directory) by default. Traversal status messages show up in the console, but each

matched file now automatically pops up in a text editor along the way. In the following,

the editor is started eight times—try this with an editor and tree of your own to get a

better feel for how it works:

C:\...\PP4E\Tools> visitor_edit.py mimetypes C:\temp\PP3E\Examples

C:\temp\PP3E\Examples ...

1 => C:\temp\PP3E\Examples\README-root.txt

C:\temp\PP3E\Examples\PP3E ...

2 => C:\temp\PP3E\Examples\PP3E\echoEnvironment.pyw

3 => C:\temp\PP3E\Examples\PP3E\LaunchBrowser.pyw

4 => C:\temp\PP3E\Examples\PP3E\Launcher.py

5 => C:\temp\PP3E\Examples\PP3E\Launcher.pyc

Skipping C:\temp\PP3E\Examples\PP3E\Launcher.pyc

...more output omitted...

1427 => C:\temp\PP3E\Examples\PP3E\TempParts\lawnlake1-jan-03.jpg

Skipping C:\temp\PP3E\Examples\PP3E\TempParts\lawnlake1-jan-03.jpg

1428 => C:\temp\PP3E\Examples\PP3E\TempParts\part-001.txt

1429 => C:\temp\PP3E\Examples\PP3E\TempParts\part-002.html

Edited 8 files, visited 1429

This, finally, is the exact tool I was looking for to simplify global book examples tree maintenance. After major changes to things such as shared modules and file and directory names, I run this script on the examples root directory with an appropriate search string and edit any files it pops up as needed. I still need to change files by hand in the editor, but that’s often safer than blind global replacements.

Global Replacements in Directory Trees (Visitor)

But since I brought it up: given a general tree traversal class, it’s easy to code a global search-and-replace subclass, too. The ReplaceVisitor class in Example 6-20 is a SearchVisitor subclass that customizes the visitmatch method to globally replace any appearances of one string with another, in all text files at and below a root directory. It also collects the names of all files that were changed in a list, just in case you wish to go through and verify the automatic edits applied (a text editor could be automatically popped up on each changed file, for instance).

Example 6-20. PP4E\Tools\visitor_replace.py

"""
Use: "python ...\Tools\visitor_replace.py rootdir fromStr toStr".
Does global search-and-replace in all files in a directory tree: replaces
fromStr with toStr in all text files; this is powerful but dangerous!!
visitor_edit.py runs an editor for you to verify and make changes, and so
is safer; use visitor_collect.py to simply collect matched files list;
listonly mode here is similar to both SearchVisitor and CollectVisitor;
"""

import sys
from visitor import SearchVisitor

class ReplaceVisitor(SearchVisitor):
    """
    Change fromStr to toStr in files at and below startDir;
    files changed available in obj.changed list after a run
    """
    def __init__(self, fromStr, toStr, listOnly=False, trace=0):
        self.changed  = []
        self.toStr    = toStr
        self.listOnly = listOnly
        SearchVisitor.__init__(self, fromStr, trace)

    def visitmatch(self, fname, text):
        self.changed.append(fname)
        if not self.listOnly:
            fromStr, toStr = self.context, self.toStr
            text = text.replace(fromStr, toStr)
            open(fname, 'w').write(text)

if __name__ == '__main__':
    listonly = input('List only?') == 'y'
    visitor  = ReplaceVisitor(sys.argv[2], sys.argv[3], listonly)
    if listonly or input('Proceed with changes?') == 'y':
        visitor.run(startDir=sys.argv[1])
        action = 'Changed' if not listonly else 'Found'
        print('Visited %d files' % visitor.fcount)
        print(action, '%d files:' % len(visitor.changed))
        for fname in visitor.changed: print(fname)

To run this script over a directory tree, run the following sort of command line with appropriate “from” and “to” strings. On my shockingly underpowered netbook machine, doing this on a 1429-file tree and changing 101 files along the way takes roughly three seconds of real clock time when the system isn’t particularly busy.

C:\...\PP4E\Tools> visitor_replace.py C:\temp\PP3E\Examples PP3E PP4E

List only?y

Visited 1429 files

Found 101 files:

C:\temp\PP3E\Examples\README-root.txt

C:\temp\PP3E\Examples\PP3E\echoEnvironment.pyw

C:\temp\PP3E\Examples\PP3E\Launcher.py

...more matching filenames omitted...

C:\...\PP4E\Tools> visitor_replace.py C:\temp\PP3E\Examples PP3E PP4E

List only?n

Proceed with changes?y

Visited 1429 files

Changed 101 files:

C:\temp\PP3E\Examples\README-root.txt

C:\temp\PP3E\Examples\PP3E\echoEnvironment.pyw

C:\temp\PP3E\Examples\PP3E\Launcher.py

...more changed filenames omitted...

C:\...\PP4E\Tools> visitor_replace.py C:\temp\PP3E\Examples PP3E PP4E

List only?n

Proceed with changes?y

Visited 1429 files

Changed 0 files:

Naturally, we can also check our work by running the visitor script (and

SearchVisitor superclass):

C:\...\PP4E\Tools> visitor.py 2 C:\temp\PP3E\Examples PP3E

Found in 0 files, visited 1429

C:\...\PP4E\Tools> visitor.py 2 C:\temp\PP3E\Examples PP4E

C:\temp\PP3E\Examples\README-root.txt has PP4E

C:\temp\PP3E\Examples\PP3E\echoEnvironment.pyw has PP4E

C:\temp\PP3E\Examples\PP3E\Launcher.py has PP4E

...more matching filenames omitted...

Found in 101 files, visited 1429

This is both wildly powerful and dangerous. If the string to be replaced can show up

in places you didn’t anticipate, you might just ruin an entire tree of files by running the

ReplaceVisitor object defined here. On the other hand, if the string is something very

specific, this object can obviate the need to manually edit suspicious files. For instance,

website addresses in HTML files are likely too specific to show up in other places by

chance.
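One way to reduce the risk of unintended matches is to replace the search string only where it stands alone, not where it is embedded in a longer word. The following is a minimal sketch of that idea using the standard re module; the replace_whole_words function is a hypothetical helper, not part of visitor_replace.py, and whole-word matching is just one possible policy.

```python
import re

def replace_whole_words(text, fromStr, toStr):
    """
    Replace fromStr with toStr only where fromStr is not part of a
    longer run of letters, digits, or underscores.
    Returns a (newtext, count) tuple.
    """
    pattern = r'(?<!\w)' + re.escape(fromStr) + r'(?!\w)'
    # a function is used for the replacement so that any backslashes
    # in toStr are inserted literally rather than treated as escapes
    return re.subn(pattern, lambda match: toStr, text)
```

With this helper, 'PP3E' would be changed in a path such as C:\temp\PP3E\Examples but left alone inside a longer name like PP3Extra. A ReplaceVisitor variant could call it from visitmatch in place of text.replace, and use the returned count to report how many substitutions were made per file.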

Counting Source Code Lines (Visitor)

The two preceding visitor module clients were both search-oriented, but it’s just as

easy to extend the basic walker class for more specific goals. Example 6-21, for instance,

extends FileVisitor to count the number of lines in program source code files of various

types throughout an entire tree. The effect is much like calling the visitfile method

of this class for each filename returned by the find tool we wrote earlier in this chapter,

but the OO structure here is arguably more flexible and extensible.

Example 6-21. PP4E\Tools\visitor_sloc.py

"""
Count lines among all program source files in a tree named on the command
line, and report totals grouped by file types (extension). A simple SLOC
(source lines of code) metric: skip blank and comment lines if desired.
"""

import sys, pprint, os
from visitor import FileVisitor

class LinesByType(FileVisitor):
    srcExts = []    # define in subclass

    def __init__(self, trace=1):
        FileVisitor.__init__(self, trace=trace)
        self.srcLines = self.srcFiles = 0
        self.extSums = {ext: dict(files=0, lines=0) for ext in self.srcExts}

    def visitsource(self, fpath, ext):
        if self.trace > 0: print(os.path.basename(fpath))
        lines = len(open(fpath, 'rb').readlines())
        self.srcFiles += 1
        self.srcLines += lines
        self.extSums[ext]['files'] += 1
        self.extSums[ext]['lines'] += lines

    def visitfile(self, filepath):
        FileVisitor.visitfile(self, filepath)
        for ext in self.srcExts:
            if filepath.endswith(ext):
                self.visitsource(filepath, ext)
                break

class PyLines(LinesByType):
    srcExts = ['.py', '.pyw']    # just python files

class SourceLines(LinesByType):
    srcExts = ['.py', '.pyw', '.cgi', '.html', '.c', '.cxx', '.h', '.i']

if __name__ == '__main__':
    walker = SourceLines()
    walker.run(sys.argv[1])
    print('Visited %d files and %d dirs' % (walker.fcount, walker.dcount))
    print('-'*80)
    print('Source files=>%d, lines=>%d' % (walker.srcFiles, walker.srcLines))
    print('By Types:')
    pprint.pprint(walker.extSums)

    print('\nCheck sums:', end=' ')
    print(sum(x['lines'] for x in walker.extSums.values()), end=' ')
    print(sum(x['files'] for x in walker.extSums.values()))

    print('\nPython only walk:')
    walker = PyLines(trace=0)
    walker.run(sys.argv[1])
    pprint.pprint(walker.extSums)

When run as a script, we get trace messages during the walk (omitted here to save

space), and a report with line counts grouped by file type. Run this on trees of your

own to watch its progress; my tree has 907 source files and 48K source lines, including

783 files and 34K lines of “.py” Python code:

C:\...\PP4E\Tools> visitor_sloc.py C:\temp\PP3E\Examples

Visited 1429 files and 186 dirs

--------------------------------------------------------------------------------

Source files=>907, lines=>48047

By Types:

{'.c': {'files': 45, 'lines': 7370},

'.cgi': {'files': 5, 'lines': 122},

'.cxx': {'files': 4, 'lines': 2278},

'.h': {'files': 7, 'lines': 297},

'.html': {'files': 48, 'lines': 2830},

'.i': {'files': 4, 'lines': 49},

'.py': {'files': 783, 'lines': 34601},

'.pyw': {'files': 11, 'lines': 500}}

Check sums: 48047 907

Python only walk:

{'.py': {'files': 783, 'lines': 34601}, '.pyw': {'files': 11, 'lines': 500}}
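The docstring of Example 6-21 mentions skipping blank and comment lines, but the listing as shown counts every line. As a rough sketch of how that refinement might look, the helper below counts only lines that have content and do not start with a comment marker. The count_sloc name and its comment parameter are assumptions for illustration, and the '#' default suits Python and shell-style files only; C or HTML sources would need their own rules.

```python
def count_sloc(lines, comment=b'#'):
    """
    Count lines that are neither blank nor whole-line comments.
    Lines are bytes objects, matching the 'rb' mode used by visitsource.
    """
    count = 0
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith(comment):
            count += 1                 # code, possibly with a trailing comment
    return count
```

In visitsource, something like lines = count_sloc(open(fpath, 'rb')) could then stand in for the len(...readlines()) expression.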

Recoding Copies with Classes (Visitor)

Let’s peek at one more visitor use case. When I first wrote the cpall.py script earlier in

this chapter, I couldn’t see a way that the visitor class hierarchy we met earlier would

help. Two directories needed to be traversed in parallel (the original and the copy), and

visitor is based on walking just one tree with os.walk. There seemed no easy way to

keep track of where the script was in the copy directory.

The trick I eventually stumbled onto is not to keep track at all. Instead, the script in

Example 6-22 simply replaces the “from” directory path string with the “to” directory

path string, at the front of all directory names and pathnames passed in from os.walk.

The results of the string replacements are the paths to which the original files and

directories are to be copied.
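The path mapping itself can be sketched in isolation. In the following, map_path mirrors the slicing approach just described, and map_path_rel is an alternative based on os.path.relpath that I'm adding only for comparison; both function names are hypothetical. Note that the slicing version assumes the “from” directory is given without a trailing separator, since it skips exactly one character after the prefix.

```python
import os

def map_path(fromDir, toDir, path):
    """Map a path at or below fromDir to its counterpart below toDir."""
    tail = path[len(fromDir) + 1:]         # drop fromDir plus one separator
    return os.path.join(toDir, tail)

def map_path_rel(fromDir, toDir, path):
    """Same mapping via os.path.relpath instead of slicing (an alternative)."""
    rel = os.path.relpath(path, fromDir)   # '.' for fromDir itself
    return os.path.normpath(os.path.join(toDir, rel))
```

For the root directory itself, the slicing version yields the “to” directory with a trailing separator, which matches the first line of the trace output shown later in this section.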

Example 6-22. PP4E\Tools\visitor_cpall.py

"""
Use: "python ...\Tools\visitor_cpall.py fromDir toDir trace?"
Like System\Filetools\cpall.py, but with the visitor classes and os.walk;
does string replacement of fromDir with toDir at the front of all the names
that the walker passes in; assumes that the toDir does not exist initially;
"""

import os
from visitor import FileVisitor                       # visitor is in '.'
from PP4E.System.Filetools.cpall import copyfile      # PP4E is in a dir on path

class CpallVisitor(FileVisitor):
    def __init__(self, fromDir, toDir, trace=True):
        self.fromDirLen = len(fromDir) + 1
        self.toDir      = toDir
        FileVisitor.__init__(self, trace=trace)

    def visitdir(self, dirpath):
        toPath = os.path.join(self.toDir, dirpath[self.fromDirLen:])
        if self.trace: print('d', dirpath, '=>', toPath)
        os.mkdir(toPath)
        self.dcount += 1

    def visitfile(self, filepath):
        toPath = os.path.join(self.toDir, filepath[self.fromDirLen:])
        if self.trace: print('f', filepath, '=>', toPath)
        copyfile(filepath, toPath)
        self.fcount += 1

if __name__ == '__main__':
    import sys, time
    fromDir, toDir = sys.argv[1:3]
    trace = len(sys.argv) > 3
    print('Copying...')
    start = time.perf_counter()            # time.clock was removed in Python 3.8
    walker = CpallVisitor(fromDir, toDir, trace)
    walker.run(startDir=fromDir)
    print('Copied', walker.fcount, 'files,', walker.dcount, 'directories', end=' ')
    print('in', time.perf_counter() - start, 'seconds')

This version accomplishes roughly the same goal as the original, but it has made a few

assumptions to keep the code simple. The “to” directory is assumed not to exist initially,

and exceptions are not ignored along the way. Here it is copying the book examples

tree from the prior edition again on Windows:

C:\...\PP4E\Tools> set PYTHONPATH

PYTHONPATH=C:\Users\Mark\Stuff\Books\4E\PP4E\dev\Examples

C:\...\PP4E\Tools> rmdir /S copytemp

copytemp, Are you sure (Y/N)? y

C:\...\PP4E\Tools> visitor_cpall.py C:\temp\PP3E\Examples copytemp

Copying...

Copied 1429 files, 186 directories in 11.1722033777 seconds

C:\...\PP4E\Tools> fc /B copytemp\PP3E\Launcher.py

C:\temp\PP3E\Examples\PP3E\Launcher.py

Comparing files COPYTEMP\PP3E\Launcher.py and C:\TEMP\PP3E\EXAMPLES\PP3E\LAUNCHER.PY

FC: no differences encountered

Despite the extra string slicing going on, this version seems to run just as fast as the

original (the actual difference can be chalked up to system load variations). For tracing

purposes, this version also prints all the “from” and “to” copy paths during the traversal

if you pass in a third argument on the command line:

C:\...\PP4E\Tools> rmdir /S copytemp

copytemp, Are you sure (Y/N)? y

C:\...\PP4E\Tools> visitor_cpall.py C:\temp\PP3E\Examples copytemp 1

Copying...

d C:\temp\PP3E\Examples => copytemp\

f C:\temp\PP3E\Examples\README-root.txt => copytemp\README-root.txt

d C:\temp\PP3E\Examples\PP3E => copytemp\PP3E

...more lines omitted: try this on your own for the full output...

Other Visitor Examples (External)

Although the visitor is widely applicable, we don’t have space to explore additional subclasses in this book. For more example clients and use cases, see the following examples in the book’s examples distribution package described in the Preface:

• Tools\visitor_collect.py collects and/or prints files containing a search string

• Tools\visitor_poundbang.py replaces directory paths in “#!” lines at the top of Unix scripts

• Tools\visitor_cleanpyc.py is a visitor-based recoding of our earlier bytecode cleanup scripts

• Tools\visitor_bigpy.py is a visitor-based version of the “biggest file” example at the start of this chapter

Most of these are almost as trivial as the visitor_edit.py code in Example 6-19, because the visitor framework handles walking details automatically. The collector, for instance, simply appends to a list as a search visitor detects matched files and allows the default list of text filename extensions in the search visitor to be overridden per instance; it’s roughly like a combination of find and grep on Unix:

>>> from visitor_collect import CollectVisitor

>>> V = CollectVisitor('mimetypes', testexts=['.py', '.pyw'], trace=0)

>>> V.run(r'C:\temp\PP3E\Examples')

>>> for name in V.matches: print(name) # .py and .pyw files with 'mimetypes'

...

C:\temp\PP3E\Examples\PP3E\Internet\Email\mailtools\mailParser.py

C:\temp\PP3E\Examples\PP3E\Internet\Email\mailtools\mailSender.py

C:\temp\PP3E\Examples\PP3E\Internet\Ftp\mirror\downloadflat.py

C:\temp\PP3E\Examples\PP3E\Internet\Ftp\mirror\downloadflat_modular.py

C:\temp\PP3E\Examples\PP3E\Internet\Ftp\mirror\ftptools.py

C:\temp\PP3E\Examples\PP3E\Internet\Ftp\mirror\uploadflat.py

C:\temp\PP3E\Examples\PP3E\System\Media\playfile.py

C:\...\PP4E\Tools> visitor_collect.py mimetypes C:\temp\PP3E\Examples # as script

The core logic of the biggest-file visitor is similarly straightforward, and harks back to the start of the chapter:

class BigPy(FileVisitor):
    def __init__(self, trace=0):
        FileVisitor.__init__(self, context=[], trace=trace)

    def visitfile(self, filepath):
        FileVisitor.visitfile(self, filepath)
        if filepath.endswith('.py'):
            self.context.append((os.path.getsize(filepath), filepath))

And the bytecode-removal visitor brings us back full circle, showing an additional alternative to those we met earlier in this chapter. It’s essentially the same code, but it runs os.remove on “.pyc” file visits.

In the end, while the visitor classes are really just simple wrappers for os.walk, they further automate walking chores and provide a general framework and alternative class-based structure which may seem more natural to some than simple unstructured loops. They’re also representative of how Python’s OOP support maps well to real-world structures like file systems. Although os.walk works well for one-off scripts, the better extensibility, reduced redundancy, and greater encapsulation possible with OOP can be a major asset in real work as our needs change and evolve over time.

In fact, those needs have changed over time. Between the third and fourth editions of this book, the original os.path.walk call was removed in Python 3.X, and os.walk became the only automated way to perform tree walks in the standard library. Examples from the prior edition that used os.path.walk were effectively broken. By contrast, although the visitor classes used this call, too, their clients did not. Because updating the visitor classes to use os.walk internally did not alter those classes’ interfaces, visitor-based tools continued to work unchanged.

This seems a prime example of the benefits of OOP’s support for encapsulation. Although the future is never completely predictable, in practice, user-defined tools like visitor tend to give you more control over changes than standard library tools like os.walk. Trust me on that; as someone who has had to update three Python books over the last 15 years, I can say with some certainty that Python change is a constant!

Playing Media Files

We have space for just one last, quick example in this chapter, so we’ll close with a bit of fun. Did you notice how the file extensions for text and binary file types were hardcoded in the directory search scripts of the prior two sections? That approach works for the trees they were applied to, but it’s not necessarily complete or portable. It would be better if we could deduce file type from file name automatically. That’s exactly what Python’s mimetypes module can do for us. In this section, we’ll use it to build a script that attempts to launch a file based upon its media type, and in the process develop general tools for opening media portably with specific or generic players.
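As a quick preview of what mimetypes provides, the sketch below wraps its guess_type call in a small helper that returns just the main type of a filename. The main_type function is a hypothetical name used here for illustration; the exact type strings reported can vary somewhat with the type maps installed on a given system.

```python
import mimetypes

def main_type(filename):
    """
    Return the main MIME type guessed from a filename ('text', 'image',
    and so on), or None if the type is unknown or the file is encoded
    (for example, compressed).
    """
    contenttype, encoding = mimetypes.guess_type(filename)   # checks name only
    if contenttype is None or encoding is not None:
        return None
    return contenttype.split('/', 1)[0]                      # 'image/jpeg' => 'image'

if __name__ == '__main__':
    for name in ('spam.txt', 'spam.html', 'spam.jpg', 'spam.txt.gz'):
        print(name, mimetypes.guess_type(name), main_type(name))
```

The guess is based purely on the filename; the file's content is never opened, so the name need not even exist on disk.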

As we’ve seen, on Windows this task is trivial: the os.startfile call opens files per the Windows registry, a system-wide mapping of file extension types to handler programs. On other platforms, we can either run specific media handlers per media type, or fall back on a resident web browser to open the file generically using Python’s webbrowser module. Example 6-23 puts these ideas into code.

Example 6-23. PP4E\System\Media\playfile.py

#!/usr/local/bin/python

"""

##################################################################################

Try to play an arbitrary media file. Allows for specific players instead of

always using general web browser scheme. May not work on your system as is;

audio files use filters and command lines on Unix, and filename associations

on Windows via the start command (i.e., whatever you have on your machine to

run .au files--an audio player, or perhaps a web browser). Configure and

extend as needed. playknownfile assumes you know what sort of media you wish

to open, and playfile tries to determine media type automatically using Python

mimetypes module; both try to launch a web browser with Python webbrowser module

as a last resort when mimetype or platform unknown.

##################################################################################

"""

Playing Media Files | 343

import os, sys, subprocess, mimetypes, webbrowser

helpmsg = """
Sorry: can't find a media player for '%s' on your system!
Add an entry for your system to the media player dictionary
for this type of file in playfile.py, or play the file manually.
"""

def trace(*args): print(*args)    # with spaces between

##################################################################################
# player techniques: generic and otherwise: extend me
##################################################################################

class MediaTool:
    def __init__(self, runtext=''):
        self.runtext = runtext

    def run(self, mediafile, **options):              # most ignore options
        fullpath = os.path.abspath(mediafile)         # cwd may be anything
        self.open(fullpath, **options)

class Filter(MediaTool):
    def open(self, mediafile, **ignored):
        media  = open(mediafile, 'rb')
        player = subprocess.Popen(self.runtext, shell=True,    # spawn shell tool
                                  stdin=subprocess.PIPE)       # os.popen pipes are text-only in 3.X
        player.stdin.write(media.read())              # send bytes to its stdin
        player.stdin.close()
        media.close()

class Cmdline(MediaTool):
    def open(self, mediafile, **ignored):
        cmdline = self.runtext % mediafile            # run any cmd line
        os.system(cmdline)                            # use %s for filename

class Winstart(MediaTool):                            # use Windows registry
    def open(self, mediafile, wait=False, **other):   # or os.system('start file')
        if not wait:                                  # allow wait for curr media
            os.startfile(mediafile)
        else:
            os.system('start /WAIT ' + mediafile)

class Webbrowser(MediaTool):
    # file:// requires abs path
    def open(self, mediafile, **options):
        webbrowser.open_new('file://%s' % mediafile, **options)

##################################################################################
# media- and platform-specific policies: change me, or pass one in
##################################################################################

# map platform to player: change me!
# note: sys.platform is 'linux' rather than 'linux2' as of Python 3.3

audiotools = {
    'sunos5':  Filter('/usr/bin/audioplay'),          # pipe file to its stdin
    'linux2':  Cmdline('cat %s > /dev/audio'),        # on zaurus, at least
    'sunos4':  Filter('/usr/demo/SOUND/play'),        # yes, this is that old!
    'win32':   Winstart()                             # startfile or system
   #'win32':   Cmdline('start %s')
    }

videotools = {
    'linux2':  Cmdline('tkcVideo_c700 %s'),           # zaurus pda
    'win32':   Winstart(),                            # avoid DOS pop up
    }

imagetools = {
    'linux2':  Cmdline('zimager %s'),                 # zaurus pda
    'win32':   Winstart(),
    }

texttools = {
    'linux2':  Cmdline('vi %s'),                      # zaurus pda
    'win32':   Cmdline('notepad %s')                  # or try PyEdit?
    }

apptools = {
    'win32':   Winstart()    # doc, xls, etc: use at your own risk!
    }

# map mimetype of filenames to player tables

mimetable = {'audio':       audiotools,
             'video':       videotools,
             'image':       imagetools,
             'text':        texttools,                # not html text: browser
             'application': apptools}

##################################################################################

# top-level interfaces

##################################################################################

def trywebbrowser(filename, helpmsg=helpmsg, **options):
    """
    try to open a file in a web browser
    last resort if unknown mimetype or platform, and for text/html
    """
    trace('trying browser', filename)
    try:
        player = Webbrowser()                       # open in local browser
        player.run(filename, **options)
    except:
        print(helpmsg % filename)                   # else nothing worked

def playknownfile(filename, playertable={}, **options):
    """
    play media file of known type: uses platform-specific
    player objects, or spawns a web browser if nothing for
    this platform; accepts a media-specific player table
    """
    if sys.platform in playertable:
        playertable[sys.platform].run(filename, **options)       # specific tool
    else:
        trywebbrowser(filename, **options)                       # general scheme

def playfile(filename, mimetable=mimetable, **options):
    """
    play media file of any type: uses mimetypes to guess media
    type and map to platform-specific player tables; spawn web
    browser if text/html, media type unknown, or has no table
    """
    contenttype, encoding = mimetypes.guess_type(filename)          # check name
    if contenttype is None or encoding is not None:                 # can't guess
        contenttype = '?/?'                                         # poss .txt.gz
    maintype, subtype = contenttype.split('/', 1)                   # 'image/jpeg'
    if maintype == 'text' and subtype == 'html':
        trywebbrowser(filename, **options)                          # special case
    elif maintype in mimetable:
        playknownfile(filename, mimetable[maintype], **options)     # try table
    else:
        trywebbrowser(filename, **options)                          # other types

###############################################################################

# self-test code

###############################################################################

if __name__ == '__main__':
    # media type known
    playknownfile('sousa.au', audiotools, wait=True)
    playknownfile('ora-pp3e.gif', imagetools, wait=True)
    playknownfile('ora-lp4e.jpg', imagetools)

    # media type guessed
    input('Stop players and press Enter')
    playfile('ora-lp4e.jpg')                        # image/jpeg
    playfile('ora-pp3e.gif')                        # image/gif
    playfile('priorcalendar.html')                  # text/html
    playfile('lp4e-preface-preview.html')           # text/html
    playfile('lp-code-readme.txt')                  # text/plain
    playfile('spam.doc')                            # app
    playfile('spreadsheet.xls')                     # app
    playfile('sousa.au', wait=True)                 # audio/basic
    input('Done')                                   # stay open if clicked

Although it’s generally possible to open most media files by passing their names to a

web browser these days, this module provides a simple framework for launching media

files with more specific tools, tailored by both media type and platform. A web browser

is used only as a fallback option, if more specific tools are not available. The net result

is an extendable media file player, which is as specific and portable as the customiza-

tions you provide for its tables.
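As a side note on the listing's Filter class: in Python 3.X, os.popen opens its pipe in text mode, so writing the bytes read from a binary media file may fail there. The following is a minimal sketch of the same idea (feed a file's bytes to a tool's standard input) using the subprocess module instead. The function name pipe_file_to_command is made up for illustration and is not part of the book's playfile module.

```python
import subprocess

def pipe_file_to_command(mediafile, command):
    """
    Run command (a list of argument strings) with the named file
    connected to its standard input; return the command's exit status.
    Hypothetical helper, not part of playfile.py.
    """
    with open(mediafile, 'rb') as media:            # binary mode: raw bytes
        completed = subprocess.run(command, stdin=media)
    return completed.returncode
```

A call such as pipe_file_to_command('sousa.au', ['/usr/bin/audioplay']) would mirror the sunos5 table entry, assuming that player reads audio data from its standard input as the original table implies.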

We’ve seen the program launch tools employed by this script in prior chapters. The

script’s main new concepts have to do with the modules it uses: the webbrowser module

to open some files in a local web browser, as well as the Python mimetypes module to

determine media type from file name. Since these are the heart of this code’s matter,

let’s explore these briefly before we run the script.

The Python webbrowser Module

The standard library webbrowser module used by this example provides a portable in-

terface for launching web browsers from Python scripts. It attempts to locate a suitable

web browser on your local machine to open a given URL (file or web address) for

display. Its interface is straightforward:

>>> import webbrowser

>>> webbrowser.open_new('file://' + fullfilename) # use os.path.abspath()

This code will open the named file in a new web browser window using whatever

browser is found on the underlying computer; if it cannot, the call returns False (or, in some cases, raises an exception). You

can tailor the browsers used on your platform, and the order in which they are attemp-

ted, by using the BROWSER environment variable and register function. By default,

webbrowser attempts to be automatically portable across platforms.

Use an argument string of the form “file://...” or “http://...” to open a file on the local

computer or web server, respectively. In fact, you can pass in any URL that the browser

understands. The following pops up Python’s home page in a new locally-running

browser window, for example:

>>> webbrowser.open_new('http://www.python.org')

Among other things, this is an easy way to display HTML documents as well as media

files, as demonstrated by this section’s example. For broader applicability, this module

can be used as both a command-line script (Python’s -m module search path flag helps

here) and an importable tool:

C:\Users\mark\Stuff\Websites\public_html> python -m webbrowser about-pp.html

C:\Users\mark\Stuff\Websites\public_html> python -m webbrowser -n about-pp.html

C:\Users\mark\Stuff\Websites\public_html> python -m webbrowser -t about-pp.html

C:\Users\mark\Stuff\Websites\public_html> python

>>> import webbrowser

>>> webbrowser.open('about-pp.html') # reuse, new window, new tab

True

>>> webbrowser.open_new('about-pp.html') # file:// optional on Windows

True

>>> webbrowser.open_new_tab('about-pp.html')

True

In both modes, the difference between the three usage forms is that the first tries to

reuse an already-open browser window if possible, the second tries to open a new

window, and the third tries to open a new tab. In practice, though, their behavior is

totally dependent on what the browser selected on your platform supports, and even

on the platform in general. All three forms may behave the same.

On Windows, for example, all three simply run os.startfile by default and thus create

a new tab in an existing window under Internet Explorer 8. This is also why I didn’t

need the “file://” full URL prefix in the preceding listing. Technically, Internet Explorer

is only run if this is what is registered on your computer for the file type being opened;

if not, that file type’s handler is opened instead. Some images, for example, may open

in a photo viewer instead. On other platforms, such as Unix and Mac OS X, browser

behavior differs, and non-URL file names might not be opened; use “file://” for

portability.
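To make the "file://" advice concrete, here is a small sketch that builds an absolute file URL before handing it to webbrowser. The helper names local_file_url and show_local_file are invented for this illustration; only os.path.abspath, pathlib.Path.as_uri, and the webbrowser calls are standard library tools.

```python
import os
import webbrowser
from pathlib import Path

def local_file_url(filename):
    """Return an absolute file:// URL for a local file name."""
    return Path(os.path.abspath(filename)).as_uri()    # also escapes spaces

def show_local_file(filename, new_tab=True):
    """
    Try to display a local file in a browser; return True if a
    browser appears to have been launched, False otherwise.
    """
    url = local_file_url(filename)
    opener = webbrowser.open_new_tab if new_tab else webbrowser.open_new
    try:
        return opener(url)
    except webbrowser.Error:                            # no usable browser
        return False
```

Building the URL in one place means the rest of a script can pass around plain file names, and the same call should work whether or not the platform's default handler accepts bare paths.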

We’ll use this module again later in this book. For example, the PyMailGUI program

in Chapter 14 will employ it as a way to display HTML-formatted email messages and

attachments, as well as program help. See the Python library manual for more details.

In Chapters 13 and 15, we’ll also meet a related call, urllib.request.urlopen, which

fetches a web page’s text given a URL, but does not open it in a browser; it may be

parsed, saved, or otherwise used.

The Python mimetypes Module

To make this media player module even more useful, we also use the Python

mimetypes standard library module to automatically determine the media type from the

filename. We get back a type/subtype MIME content-type string if the type can be

determined or None if the guess failed:

>>> import mimetypes

>>> mimetypes.guess_type('spam.jpg')

('image/jpeg', None)

>>> mimetypes.guess_type('TheBrightSideOfLife.mp3')

('audio/mpeg', None)

>>> mimetypes.guess_type('lifeofbrian.mpg')

('video/mpeg', None)

>>> mimetypes.guess_type('lifeofbrian.xyz') # unknown type

(None, None)

Stripping off the first part of the content-type string gives the file’s general media type,

which we can use to select a generic player; the second part (subtype) can tell us if text

is plain or HTML:

>>> contype, encoding = mimetypes.guess_type('spam.jpg')

>>> contype.split('/')[0]

'image'

>>> mimetypes.guess_type('spam.txt') # subtype is 'plain'

('text/plain', None)

>>> mimetypes.guess_type('spam.html')

('text/html', None)

>>> mimetypes.guess_type('spam.html')[0].split('/')[1]

'html'

A subtle thing: the second item in the tuple returned from the mimetypes guess is an

encoding type we won’t use here for opening purposes. We still have to pay attention

to it, though—if it is not None, it means the file is compressed (gzip or compress), even

if we receive a media content type. For example, if the filename is something like

spam.gif.gz, it’s a compressed image that we don’t want to try to open directly:

>>> mimetypes.guess_type('spam.gz') # content unknown

(None, 'gzip')

>>> mimetypes.guess_type('spam.gif.gz') # don't play me!

('image/gif', 'gzip')

>>> mimetypes.guess_type('spam.zip') # archives

('application/zip', None)

>>> mimetypes.guess_type('spam.doc') # office app files

('application/msword', None)

If the filename you pass in contains a directory path, the path portion is ignored (only

the extension is used). This module is even smart enough to give us a filename extension

for a type—useful if we need to go the other way, and create a file name from a content

type:

>>> mimetypes.guess_type(r'C:\songs\sousa.au')

('audio/basic', None)

>>> mimetypes.guess_extension('audio/basic')

'.au'

Try more calls on your own for more details. We’ll use the mimetypes module again in

FTP examples in Chapter 13 to determine transfer type (text or binary), and in our

email examples in Chapters 13, 14, and 16 to send, save, and open mail attachments.

In Example 6-23, we use mimetypes to select a table of platform-specific player com-

mands for the media type of the file to be played. That is, we pick a player table for the

file’s media type, and then pick a command from the player table for the platform. At

both steps, we give up and run a web browser if there is nothing more specific to be

done.
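The two-step selection just described can be sketched in isolation as follows. This is not the code of Example 6-23 itself: the function choose_player and the string-valued demo tables are assumptions made for illustration, standing in for the example's player objects.

```python
import mimetypes

def choose_player(filename, mimetable, platform, fallback='browser'):
    """
    Pick a player for filename: first a table by media type,
    then an entry in that table by platform; use fallback if
    either step finds nothing more specific.
    """
    contenttype, encoding = mimetypes.guess_type(filename)
    if contenttype is None or encoding is not None:     # unknown or compressed
        return fallback
    maintype, subtype = contenttype.split('/', 1)
    if maintype == 'text' and subtype == 'html':        # html goes to browser
        return fallback
    playertable = mimetable.get(maintype, {})           # step 1: media type
    return playertable.get(platform, fallback)          # step 2: platform

# hypothetical tables: strings stand in for player objects
demotable = {'audio': {'win32': 'startfile'},
             'text':  {'win32': 'notepad %s'}}
```

With these tables, choose_player('sousa.au', demotable, 'win32') selects the audio entry, while the same file on a platform missing from the table falls back to the browser.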

Using mimetypes guesses for SearchVisitor

To use this module for directing our text file search scripts we wrote earlier in this

chapter, simply extract the first item in the content-type returned for a file’s name. For

instance, all in the following list are considered text (except “.pyw”, which we may

have to special-case if we must care):

>>> for ext in ['.txt', '.py', '.pyw', '.html', '.c', '.h', '.xml']:
...     print(ext, mimetypes.guess_type('spam' + ext))
...
.txt ('text/plain', None)
.py ('text/x-python', None)
.pyw (None, None)
.html ('text/html', None)
.c ('text/plain', None)
.h ('text/plain', None)
.xml ('text/xml', None)

We can add this technique to our earlier SearchVisitor class by redefining its candidate

selection method, in order to replace its default extension lists with mimetypes guesses—

yet more evidence of the power of OOP customization at work:

C:\...\PP4E\Tools> python

>>> import mimetypes

>>> from visitor import SearchVisitor # or PP4E.Tools.visitor if not .

>>>

>>> class SearchMimeVisitor(SearchVisitor):
...     def candidate(self, fname):
...         contype, encoding = mimetypes.guess_type(fname)
...         return (contype and
...                 contype.split('/')[0] == 'text' and
...                 encoding == None)
...

>>> V = SearchMimeVisitor('mimetypes', trace=0) # search key

>>> V.run(r'C:\temp\PP3E\Examples') # root dir

C:\temp\PP3E\Examples\PP3E\extras\LosAlamosAdvancedClass\day1-system\data.txt ha

s mimetypes

C:\temp\PP3E\Examples\PP3E\Internet\Email\mailtools\mailParser.py has mimetypes

C:\temp\PP3E\Examples\PP3E\Internet\Email\mailtools\mailSender.py has mimetypes

C:\temp\PP3E\Examples\PP3E\Internet\Ftp\mirror\downloadflat.py has mimetypes

C:\temp\PP3E\Examples\PP3E\Internet\Ftp\mirror\downloadflat_modular.py has mimet

ypes

C:\temp\PP3E\Examples\PP3E\Internet\Ftp\mirror\ftptools.py has mimetypes

C:\temp\PP3E\Examples\PP3E\Internet\Ftp\mirror\uploadflat.py has mimetypes

C:\temp\PP3E\Examples\PP3E\System\Media\playfile.py has mimetypes

>>> V.scount, V.fcount, V.dcount

(8, 1429, 186)

Because this is not completely accurate, though (you may need to add logic to include

extensions like “.pyw” missed by the guess), and because it’s not even appropriate for

all search clients (some may want to search specific kinds of text only), this scheme was

not used for the original class. Using and tailoring it for your own searches is left as

an optional exercise.
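As one possible starting point for that exercise, the following sketch wraps the mimetypes guess in a standalone test that also accepts extensions the guess misses. The function name is_text_candidate and the EXTRA_TEXT_EXTS set are hypothetical; a SearchVisitor subclass could call such a function from its candidate method.

```python
import mimetypes
import os

EXTRA_TEXT_EXTS = {'.pyw'}          # assumption: extensions to treat as text

def is_text_candidate(fname, extra_text_exts=EXTRA_TEXT_EXTS):
    """
    Return True if fname looks like an uncompressed text file,
    either by mimetypes guess or by an explicitly listed extension.
    """
    ext = os.path.splitext(fname)[1].lower()
    if ext in extra_text_exts:                          # special cases first
        return True
    contype, encoding = mimetypes.guess_type(fname)
    return (contype is not None and
            contype.split('/')[0] == 'text' and
            encoding is None)                           # skip .gz and kin
```

Keeping the special cases in a set makes them easy to tailor per search client, which addresses the second objection above as well: a client interested only in some kinds of text can pass its own set or add further tests.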

Running the Script

Now, when Example 6-23 is run from the command line, if all goes well its canned self-

test code at the end opens a number of audio, image, text, and other file types located

in the script’s directory, using either platform-specific players or a general web browser.

On my Windows 7 laptop, GIF and HTML files open in new IE browser tabs; JPEG

files in Windows Photo Viewer; plain text files in Notepad; DOC and XLS files in

Microsoft Word and Excel; and audio files in Windows Media Player.

Because the programs used and their behavior may vary widely from machine to ma-

chine, though, you’re best off studying this script’s code and running it on your own

computer and with your own test files to see what happens. As usual, you can also test

it interactively (use the package path like this one to import from a different directory,

assuming your module search path includes the PP4E root):

>>> from PP4E.System.Media.playfile import playfile

>>> playfile(r'C:\movies\mov10428.mpg') # video/mpeg

We’ll use the playfile module again as an imported library like this in Chapter 13 to

open media files downloaded by FTP. Again, you may want to tweak this script’s tables

for your players. This script also assumes the media file is located on the local machine

(even though the webbrowser module supports remote files with “http://” names), and

it does not currently allow different players for most different MIME subtypes (it spe-

cial-cases text to handle “plain” and “html” differently, but no others). In fact, this

script is really just something of a simple framework that was designed to be extended.

As always, hack on; this is Python, after all.
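One way such an extension might look, for the subtype limitation just mentioned, is a lookup that tries the full content type before the general media type. This is only a sketch under assumed names (lookup_player and a table keyed by strings), not a change made to the book's playfile module.

```python
def lookup_player(contenttype, players, default=None):
    """
    Look up a player by full MIME type ('image/gif') first,
    then by main type alone ('image'); return default if neither
    key is present in the players table.
    """
    maintype = contenttype.split('/', 1)[0]
    if contenttype in players:                  # most specific entry wins
        return players[contenttype]
    return players.get(maintype, default)       # else general media type

# hypothetical table mixing subtype-specific and general entries
players = {'image': 'viewer', 'image/gif': 'gifplayer'}
```

Here 'image/gif' maps to its own entry, other images share the general one, and types absent from the table yield the default.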

Automated Program Launchers (External)

Finally, some optional reading—in the examples distribution package for this book

(available at sites listed in the Preface) you can find additional system-related scripts

we do not have space to cover here:

•PP4E\Launcher.py—contains tools used by some GUI programs later in the book

to start Python programs without any environment configuration. Roughly, it sets

up both the system path and module import search paths as needed to run book

examples, which are inherited by spawned programs. By using this module to

search for files and configure environments automatically, users can avoid (or at

least postpone) having to learn the intricacies of manual environment configura-

tion before running programs. Though there is not much new in this example from

a system interfaces perspective, we’ll refer back to it later, when we explore GUI

programs that use its tools, as well as those of its launchmodes cousin, which we

wrote in Chapter 5.

•PP4E\Launch_PyDemos.pyw and PP4E\Launch_PyGadgets_bar.pyw—use

Launcher.py to start major GUI book examples without any environment config-

uration. Because all spawned processes inherit configurations performed by the

launcher, they all run with proper search path settings. When run directly, the

underlying PyDemos2.pyw and PyGadgets_bar.pyw scripts (which we’ll explore

briefly at the end of Chapter 10) instead rely on the configuration settings on the

underlying machine. In other words, Launcher effectively hides configuration de-

tails from the GUI interfaces by enclosing them in a configuration program layer.

•PP4E\LaunchBrowser.pyw—portably locates and starts an Internet web browser

program on the host machine in order to view a local file or remote web page. In

prior versions, it used tools in Launcher.py to search for a reasonable browser to

run. The original version of this example has now been largely superseded by the

standard library’s webbrowser module, which arose after this example had been

developed (reptilian minds think alike!). In this edition, LaunchBrowser simply par-

ses command-line arguments for backward compatibility and invokes the open

function in webbrowser. See this module’s help text, or PyGadgets and PyDemos in

Chapter 10, for example command-line usage.
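The inheritance idea behind these launchers (a parent configures search paths, and spawned programs pick them up) can be illustrated with a short sketch. This is not the code of Launcher.py; the function run_with_extra_path is an assumed name, and it passes a modified copy of the environment to the child rather than changing the parent's own settings.

```python
import os
import subprocess
import sys

def run_with_extra_path(python_args, extra_dir):
    """
    Run a Python child process with extra_dir added to the front of
    its module search path by way of the PYTHONPATH setting.
    """
    env = os.environ.copy()                     # child gets a modified copy
    old = env.get('PYTHONPATH')
    env['PYTHONPATH'] = extra_dir if not old else extra_dir + os.pathsep + old
    return subprocess.run([sys.executable] + list(python_args),
                          env=env, capture_output=True, text=True)
```

A child started this way can import modules located in extra_dir without any manual configuration by the user, which is roughly the service the launcher scripts are described as providing.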

That’s the end of our system tools exploration. In the next part of this book we leave

the realm of the system shell and move on to explore ways to add graphical user interfaces

to our programs. Later, we’ll do the same using web-based approaches. As we

continue, keep in mind that the system tools we’ve studied in this part of the book see

action in a wide variety of programs. For instance, we’ll put threads to work to spawn

long-running tasks in the GUI part, use both threads and processes when exploring

server implementations in the Internet part, and use files and file-related system calls

throughout the remainder of the book.

Whether your interfaces are command lines, multiwindow GUIs, or distributed client/

server websites, Python’s system interfaces toolbox is sure to play an important part in

your Python programming future.

PART III

GUI Programming

This part of the book shows you how to apply Python to build portable graphical user

interfaces, primarily with Python’s standard tkinter library. The following chapters

cover this topic in depth:

Chapter 7

This chapter outlines GUI options available to Python developers, and then

presents a brief tutorial that illustrates core tkinter coding concepts.

Chapter 8

This chapter begins a two-part tour of the tkinter library—its widget set and related

tools. This first tour chapter covers simpler library tools and widgets: pop-up win-

dows, various types of buttons, images, and so on.

Chapter 9

This chapter continues the library tour begun in the prior chapter. It presents the

rest of the tkinter widget library, including menus, text, canvases, scroll bars, grids,

and time-based events and animation.

Chapter 10

This chapter takes a look at GUI programming techniques: we’ll learn how to build

menus automatically from object templates, spawn GUIs as separate programs,

run long-running tasks in parallel with threads and queues, and more.

Chapter 11

This chapter pulls the earlier chapters’ ideas together to implement a collection of

user interfaces. It presents a number of larger GUIs—clocks, text editors, drawing

programs, image viewers, and so on—which also demonstrate general Python

programming-in-the-large concepts along the way.

As in the first part of this book, the material presented here is applicable to a wide

variety of domains and will be utilized again to build domain-specific user interfaces

in later chapters of this book. For instance, the PyMailGUI and PyCalc examples of

later chapters will assume that you’ve covered the basics here.

CHAPTER 7

Graphical User Interfaces

“Here’s Looking at You, Kid”

For most software systems, a graphical user interface (GUI) has become an expected

part of the package. Even if the GUI acronym is new to you, chances are that you are

already familiar with such interfaces—the windows, buttons, and menus that we use

to interact with software programs. In fact, most of what we do on computers today is

done with some sort of point-and-click graphical interface. From web browsers to sys-

tem tools, programs are routinely dressed up with a GUI component to make them

more flexible and easier to use.

In this part of the book, we will learn how to make Python scripts sprout such graphical

interfaces, too, by studying examples of programming with the tkinter module, a portable

GUI library that is a standard part of the Python system and the toolkit most widely

used by Python programmers. As we’ll see, it’s easy to program user interfaces in Python

scripts thanks to both the simplicity of the language and the power of its GUI libraries.

As an added bonus, GUIs programmed in Python with tkinter are automatically port-

able to all major computer systems.

GUI Programming Topics

Because GUIs are a major area, I want to say a few more words about this part of the

book before we get started. To make them easier to absorb, GUI programming topics

are split over the next five chapters:

• This chapter begins with a quick tkinter tutorial to teach coding basics. Interfaces

are kept simple here on purpose, so you can master the fundamentals before mov-

ing on to the following chapter’s interfaces. On the other hand, this chapter covers

all the basics: event processing, the pack geometry manager, using inheritance and

composition in GUIs, and more. As we’ll see, object-oriented programming (OOP)

isn’t required for tkinter, but it makes GUIs structured and reusable.

• Chapters 8 and 9 take you on a tour of the tkinter widget set.* Roughly, Chap-

ter 8 presents simple widgets and Chapter 9 covers more advanced widgets and

related tools. Most of the interface devices you’re accustomed to seeing—sliders,

menus, dialogs, images, and their kin—show up here. These two chapters are not

a fully complete tkinter reference (which could easily fill a large book by itself), but

they should be enough to help you get started coding substantial Python GUIs.

The examples in these chapters are focused on widgets and tkinter tools, but Py-

thon’s support for code reuse is also explored along the way.

• Chapter 10 covers more advanced GUI programming techniques. It includes an

exploration of techniques for automating common GUI tasks with Python. Al-

though tkinter is a full-featured library, a small amount of reusable Python code

can make its interfaces even more powerful and easier to use.

• Chapter 11 wraps up by presenting a handful of complete GUI programs that make

use of coding and widget techniques presented in the four preceding chapters.

We’ll learn how to implement text editors, image viewers, clocks, and more.

Because GUIs are actually cross-domain tools, other GUI examples will also show up

throughout the remainder of this book. For example, we’ll later see complete email

GUIs and calculators, as well as a basic FTP client GUI; additional examples such as

tree viewers and table browsers are available externally in the book examples package.

Chapter 11 gives a list of forward pointers to other tkinter examples in this text.

After we explore GUIs, in Part IV we’ll also learn how to build basic user interfaces

within a web browser using HTML and Python scripts that run on a server—a very

different model with advantages and tradeoffs all its own that are important to under-

stand. Newer technologies such as the RIAs described later in this chapter build on the

web browser model to offer even more interface choices.

For now, though, our focus here is on more traditional GUIs—known as “desktop”

applications to some, and as “standalone” GUIs to others. As we’ll see when we meet

FTP and email client GUIs in the Internet part of this book, though, such programs

often connect to a network to do their work as well.

* The term “widget set” refers to the objects used to build familiar point-and-click user interface devices—

push buttons, sliders, input fields, and so on. tkinter comes with Python classes that correspond to all the

widgets you’re accustomed to seeing in graphical displays. Besides widgets, tkinter also comes with tools for

other activities, such as scheduling events to occur, waiting for socket data to arrive, and so on.

Running the Examples

One other point I’d like to make right away: most GUIs are dynamic and interactive

interfaces, and the best I can do here is show static screenshots representing selected

states in the interactions such programs implement. This really won’t do justice to most

examples. If you are not working along with the examples already, I encourage you to

run the GUI examples in this and later chapters on your own.

On Windows, the standard Python install comes with tkinter support built in, so all

these examples should work immediately. Mac OS X comes bundled with a tkinter-

aware Python as well. For other systems, Pythons with tkinter support are either pro-

vided with the system itself or are readily available (see the top-level

README-PP4E.txt file in the book examples distribution for more details). Getting

tkinter to work on your computer is worth whatever extra install details you may need

to absorb, though; experimenting with these programs is a great way to learn about

both GUI programming and Python itself.
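If you are unsure whether your Python has tkinter support, a quick check along the following lines may help. The function name tkinter_available is an assumption for this sketch; it only looks for the tkinter package and its underlying _tkinter extension module, and does not open any windows.

```python
import importlib.util

def tkinter_available():
    """
    Return True if both the tkinter package and its _tkinter
    extension module can be located, without importing either.
    """
    return (importlib.util.find_spec('tkinter') is not None and
            importlib.util.find_spec('_tkinter') is not None)

if __name__ == '__main__':
    print('tkinter support found' if tkinter_available()
          else 'tkinter support not found: see your platform install notes')
```

For a fuller test on Python 3.X, running python -m tkinter from a command line should pop up a small demo window when everything is installed correctly.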

Also see the description of book example portability in general in this book’s Preface.

Although Python and tkinter are both largely platform neutral, you may run into some

minor platform-specific issues if you try to run this book’s examples on platforms other

than that used to develop this book. Mac OS X, for example, might pose subtle differ-

ences in some of the examples’ operation. Be sure to watch this book’s website for

pointers and possible future patches for using the examples on other platforms.

Has Anyone Noticed That G-U-I Are the First Three Letters of “GUIDO”?

Python creator Guido van Rossum didn’t originally set out to build a GUI development

tool, but Python’s ease of use and rapid turnaround have made this one of its primary

roles. From an implementation perspective, GUIs in Python are really just instances of

C extensions, and extensibility was one of the main ideas behind Python. When a script

builds push buttons and menus, it ultimately talks to a C library; and when a script

responds to a user event, a C library ultimately talks back to Python. It’s really just an

example of what is possible when Python is used to script external libraries.

But from a practical point of view, GUIs are a critical part of modern systems and an

ideal domain for a tool like Python. As we’ll see, Python’s simple syntax and object-

oriented flavor blend well with the GUI model—it’s natural to represent each device

drawn on a screen as a Python class. Moreover, Python’s quick turnaround lets pro-

grammers experiment with alternative layouts and behavior rapidly, in ways not pos-

sible with traditional development techniques. In fact, you can usually make a change

to a Python-based GUI and observe its effects in a matter of seconds. Don’t try this with

C++!

Python GUI Development Options

Before we start wading into the tkinter pond, let’s begin with some perspective on

Python GUI options in general. Because Python has proven to be such a good match

for GUI work, this domain has seen much activity over the years. In fact, although

tkinter is by most accounts still the most widely used GUI toolkit in the Python world,

there are a variety of ways to program user interfaces in Python today. Some are specific

to Windows or X Windows,† some are cross-platform solutions, and all have followings

and strong points of their own. To be fair to all the alternatives, here is a brief inventory

of GUI toolkits available to Python programmers as I write these words:

tkinter

An open source GUI library and the continuing de facto standard for portable GUI

development in Python. Python scripts that use tkinter to build GUIs run portably

on Windows, X Windows (Unix and Linux), and Macintosh OS X, and they display

a native look-and-feel on each of these platforms today. tkinter makes it easy to

build simple and portable GUIs quickly. Moreover, it can be easily augmented with

Python code, as well as with larger extension packages such as Pmw (a third-party

widget library); Tix (another widget library, and now a standard part of Python);

PIL (an image-processing extension); and ttk (Tk themed widgets, also now a

standard part of Python as of version 3.1). More on extensions like these later

in this introduction.

The underlying Tk library used by tkinter is a standard in the open source world

at large and is also used by the Perl, Ruby, PHP, Common Lisp, and Tcl scripting

languages, giving it a user base that likely numbers in the millions. The Python

binding to Tk is enhanced by Python’s simple object model—Tk widgets become

customizable and embeddable objects, instead of string commands. tkinter takes

the form of a module package in Python 3.X, with nested modules that group some

of its tools by functionality (it was formerly known as module Tkinter in Python

2.X, but was renamed to follow naming conventions, and restructured to provide

a more hierarchical organization).

tkinter is mature, robust, widely used, and well documented. It includes roughly

25 basic widget types, plus various dialogs and other tools. Moreover, there is a

dedicated book on the subject, plus a large library of published tkinter and Tk

documentation. Perhaps most importantly, because it is based on a library

developed for scripting languages, tkinter is also a relatively lightweight toolkit,

and as such it meshes well with a scripting language like Python.

† In this book, “Windows” refers to the Microsoft Windows interface common on PCs, and “X Windows”

refers to the X11 interface most commonly found on Unix and Linux platforms. These two interfaces are

generally tied to the Microsoft and Unix (and Unix-like) platforms, respectively. It’s possible to run X

Windows on top of a Microsoft operating system and Windows emulators on Unix and Linux, but it’s not

common. As if to muddy the waters further, Mac OS X supports Python’s tkinter on both X Windows and

the native Aqua GUI system directly, in addition to platform-specific cocoa options (though it’s usually not

too misleading to lump OS X in with the “Unix-like” crowd).

Because of such attributes, Python’s tkinter module ships with Python as a standard

library module and is the basis of Python’s standard IDLE integrated development

environment GUI. In fact, tkinter is the only GUI toolkit that is part of Python; all

others on this list are third-party extensions. The underlying Tk library is also

shipped with Python on some platforms (including Windows, Mac OS X, and most

Linux and Unix-like systems). You can be reasonably sure that tkinter will be

present when your script runs, and you can guarantee this if needed by freezing

your GUI into a self-contained binary executable with tools like PyInstaller and

py2exe (see the Web for details).

Although tkinter is easy to use, its text and canvas widgets are powerful enough to

implement web pages, three-dimensional visualization, and animation. In addi-

tion, a variety of systems aim to provide GUI builders for Python/tkinter today,

including GUI Builder (formerly part of the Komodo IDE and relative of SpecTCL),

Rapyd-Tk, xRope, and others (though this set has historically tended to change

much over time; see http://wiki.python.org/moin/GuiProgramming or search the

Web for updates). As we will see, though, tkinter is usually so easy to code that

GUI builders are not widely used. This is especially true once we leave the realm

of the static layouts that builders typically support.

wxPython

A Python interface for the open source wxWidgets (formerly called wxWindows)

library, which is a portable GUI class framework originally written to be used from

the C++ programming language. The wxPython system is an extension module

that wraps wxWidgets classes. This library is generally considered to excel at

building sophisticated interfaces and is probably the second most popular Python

GUI toolkit today, behind tkinter. GUIs coded in Python with wxPython are port-

able to Windows, Unix-like platforms, and Mac OS X.

Because wxPython is based on a C++ class library, most observers consider it to

be more complex than tkinter: it provides hundreds of classes, generally requires

an object-oriented coding style, and has a design that some find reminiscent of the

MFC class library on Windows. wxPython often expects programmers to write

more code, partly because it is a more functional and thus complex system, and

partly because it inherits this mindset from its underlying C++ library.

Moreover, some of wxPython’s documentation is oriented toward C++, though

this story has been improved recently with the publication of a book dedicated to

wxPython. By contrast, tkinter is covered by one book dedicated to it, large sections

of other Python books, and an even larger library of existing literature on the un-

derlying Tk toolkit. Since the world of Python books has been remarkably dynamic

over the years, though, you should investigate the accuracy of these observations

at the time that you read these words; some books fade, while new Python books

appear on a regular basis.

On the other hand, in exchange for its added complexity, wxPython provides a

powerful toolkit. wxPython comes with a richer set of widgets out of the box than

tkinter, including trees and HTML viewers—things that may require extensions

such as Pmw, Tix, or ttk in tkinter. In addition, some prefer the appearance of the

interfaces it renders. BoaConstructor and wxDesigner, among other options,

provide GUI builders that generate wxPython code. Some wxWidgets tools

support non-GUI Python work as well. For a quick look at wxPython widgets and

code, run the demo that comes with the system (see http://wxpython.org/, or search

the Web for links).

PyQt

A Python interface to the Qt toolkit (now from Nokia, formerly by Trolltech), and

perhaps the third most widely used GUI toolkit for Python today. PyQt is a full-

featured GUI library and runs portably today on Windows, Mac OS X, and Unix

and Linux. Like wxPython, Qt is generally more complex, yet more feature rich,

than tkinter; it contains hundreds of classes and thousands of functions and meth-

ods. Qt grew up on Linux but became portable to other systems over time; reflect-

ing this heritage, the PyQt and PyKDE extension packages provide access to KDE

development libraries (PyKDE requires PyQt). The BlackAdder and Qt Designer

systems provide GUI builders for PyQt.

Perhaps Qt’s most widely cited drawback in the past has been that it was not

completely open source for full commercial use. Today, Qt provides both GPL and

LGPL open source licensing, as well as commercial license options. The LGPL and

GPL versions are open source, but conform to GPL licensing constraints (GPL may

also impose requirements beyond those of the Python BSD-style license; you must,

for example, make your source code freely available to end users).

PyGTK

A Python interface to GTK, a portable GUI library originally used as the core of

the Gnome window system on Linux. The gnome-python and PyGTK extension

packages export Gnome and GTK toolkit calls. At this writing, PyGTK runs port-

ably on Windows and POSIX systems such as Linux and Mac OS X (according to

its documentation, it currently requires that an X server for Mac OS X has been

installed, though a native Mac version is in the works).

Jython

Jython (the system formerly known as JPython) is a Python implementation for

Java, which compiles Python source code to Java bytecode, and gives Python scripts

seamless access to Java class libraries on the local machine. Because of that, Java

GUI libraries such as swing and awt become another way to construct GUIs in

Python code run by the Jython system. Such solutions are obviously Java specific

and limited in portability to that of Java and its libraries. Furthermore, swing may

be one of the largest and most complex GUI options for Python work. A new

package named jTkinter also provides a tkinter port to Jython using Java’s JNI; if

installed, Python scripts may also use tkinter to build GUIs under Jython. Jython

also has Internet roles we’ll meet briefly in Chapter 12.

IronPython

In a very similar vein, the IronPython system—an implementation of the Python

language for the .NET environment and runtime engine, which, among other

things, compiles Python programs to .NET bytecode—also offers Python scripts

GUI construction options in the .NET framework. You write Python code, but use

C#/.NET components to construct interfaces, and applications at large.

IronPython code can be run on .NET on Windows, but also on Linux under the

Mono implementation of .NET, and in the Silverlight client-side RIA framework

for web browsers (discussed ahead).

PythonCard

An open source GUI builder and library built on top of the wxPython toolkit and

considered by some to be one of Python’s closest equivalents to the kind of GUI

builders familiar to Visual Basic developers. PythonCard describes itself as a GUI

construction kit for building cross-platform desktop applications on Windows,

Mac OS X, and Linux, using the Python language.

Dabo

An open source GUI builder also built on wxPython, and a bit more. Dabo is a

portable, three-tier, cross-platform desktop application development framework,

inspired by Visual FoxPro and written in Python. Its tiers support database access,

business logic, and user interface. Its open design is intended to eventually support

a variety of databases and multiple user interfaces (wxPython, tkinter, and even

HTML over HTTP).

Rich Internet Applications (RIAs)

Although web pages rendered with HTML are also a kind of user interface, they

have historically been too limited to include in the general GUI category. However,

some observers would extend this category today to include systems which allow

browser-based interfaces to be much more dynamic than traditional web pages

have allowed. Because such systems provide widget toolkits rendered by web

browsers, they can offer some of the same portability benefits as web pages in

general.

The going buzzword for this brave new breed of toolkits is rich Internet applica-

tions (RIAs). It includes AJAX and JavaScript-oriented frameworks for use on the

client, such as:

Flex

An open source framework from Adobe and part of the Flash platform

Silverlight

A Microsoft framework which is also usable on Linux with Mono’s Moonlight,

and can be accessed by Python code with the IronPython system described

above

JavaFX

A Java platform for building RIAs which can run across a variety of connected

devices

pyjamas

An AJAX-based port of the Google Web Toolkit to Python, which comes with

a set of interface widgets and compiles Python code that uses those widgets

into JavaScript, to be run in a browser on a client

The HTML5 standard under development proposes to address this domain as well.

Web browsers ultimately are “desktop” GUI applications, too, but are more per-

vasive than GUI libraries, and can be generalized with RIA tools to render other

GUIs. While it’s possible to build a widget-based GUI with such frameworks, they

can also add overheads associated with networking in general and often imply a

substantially heavier software stack than traditional GUI toolkits. Indeed, in order

to morph browsers into general GUI platforms, RIAs may imply extra software

layers and dependencies, and even multiple programming languages. Because of

that, and because not everyone codes for the Web today (despite what you may

have heard), we won’t include them in our look at traditional standalone/desktop

GUIs in this part of the book.

See the Internet part for more on RIAs and user interfaces based on browsers, and

be sure to watch for news and trends on this front over time. The interactivity these

tools provide is also a key part of what some refer to as “Web 2.0” when viewed

more from the perspective of the Web than GUIs. Since we’re concerned with the

latter here (and since user interaction is user interaction regardless of what jargon

we use for it), we’ll postpone further enumeration of this topic until the next part

of the book.

Platform-specific options

Besides the portable toolkits like tkinter, wxPython, and PyQt, and platform-

agnostic approaches such as RIAs, most major platforms have nonportable options

for Python-coded GUIs as well. For instance, on Macintosh OS X, PyObjC provides

a Python binding to Apple’s Objective-C/Cocoa framework, which is the basis for

much Mac development. On Windows, the PyWin32 extensions package for Py-

thon includes wrappers for the C++ Microsoft Foundation Classes (MFC) frame-

work (a library that includes interface components), as well as Pythonwin, an MFC

sample program that implements a Python development GUI. Although .NET

technically runs on Linux, too, the IronPython system mentioned earlier offers

additional Windows-focused options.

See the websites of these toolkits for more details. There are other lesser-known GUI

toolkits for Python, and new ones are likely to emerge by the time you read this book

(in fact, IronPython was new in the third edition, and RIAs are new in the fourth).

Moreover, packages like those in this list are prone to mutate over time. For an up-to-

date list of available tools, search the Web or browse http://www.python.org and the

PyPI third-party packages index maintained there.

tkinter Overview

Of all the prior section’s GUI options, though, tkinter is by far the de facto standard

way to implement portable user interfaces in Python today, and the focus of this part

of the book. The rationale for this approach was explained in Chapter 1; in short, we

elected to present one toolkit in satisfying depth instead of many toolkits in less-than-

useful fashion. Moreover, most of the tkinter programming concepts you learn here

will translate directly to any other GUI toolkit you choose to utilize.

tkinter Pragmatics

Perhaps more to the point, though, there are pragmatic reasons that the Python world

still gravitates to tkinter as its de facto standard portable GUI toolkit. Among them,

tkinter’s accessibility, portability, availability, documentation, and extensions have

made it the most widely used Python GUI solution for many years running:

Accessibility

tkinter is generally regarded as a lightweight toolkit and one of the simplest GUI

solutions for Python available today. Unlike larger frameworks, it is easy to get

started in tkinter right away, without first having to grasp a much larger class in-

teraction model. As we’ll see, programmers can create simple tkinter GUIs in a few

lines of Python code and scale up to writing industrial-strength GUIs gradually.

Although the tkinter API is basic, additional widgets can be coded in Python or

obtained in extension packages such as Pmw, Tix, and ttk.

Portability

A Python script that builds a GUI with tkinter will run without source code changes

on all major windowing platforms today: Microsoft Windows, X Windows (on

Unix and Linux), and Mac OS X (it also ran on classic Mac OS). Further,

that same script will provide a native look-and-feel to its users on each of these

platforms. In fact, this feature became more apparent as Tk matured. A Python/

tkinter script today looks like a Windows program on Windows; on Unix and

Linux, it provides the same interaction but sports an appearance familiar to X

Windows users; and on the Mac, it looks like a Mac program should.

Availability

tkinter is a standard module in the Python library, shipped with the interpreter. If

you have Python, you have tkinter. Moreover, most Python installation packages

(including the standard Python self-installer for Windows, that provided on Mac

OS X, and many Linux distributions) come with tkinter support bundled. Because

of that, scripts written to use the tkinter module work immediately on most Python

interpreters, without any extra installation steps. tkinter is also generally better

supported than its alternatives today. Because the underlying Tk library is also used

by the Tcl and Perl programming languages (and others), it tends to receive more

development resources than other toolkits available.

Naturally, other factors such as documentation and extensions are important when

using a GUI toolkit, too; let’s take a quick look at the story tkinter has to tell on these

fronts as well.

tkinter Documentation

This book explores tkinter fundamentals and most widget tools, and it should be

enough to get started with substantial GUI development in Python. On the other hand,

it is not an exhaustive reference to the tkinter library or extensions to it. Happily, at

least one book dedicated to using tkinter in Python is now commercially available as I

write this paragraph, and others are on the way (search the Web for details). Besides

books, you can also find tkinter documentation online; a complete set of tkinter man-

uals is currently maintained on the Web at http://www.pythonware.com/library.

In addition, because the underlying Tk toolkit used by tkinter is also a de facto standard

in the open source scripting community at large, other documentation sources apply.

For instance, because Tk has also been adopted by the Tcl and Perl programming lan-

guages, Tk-oriented books and documentation written for both of these are directly

applicable to Python/tkinter as well (albeit with some syntactic mapping).

Frankly, I learned tkinter by studying Tcl/Tk texts and references—just replace Tcl

strings with Python objects and you have additional reference libraries at your disposal

(see Table 7-2, the Tk-to-tkinter conversion guide, at the end of this chapter for help

reading Tk documentation). For instance, the book Tcl/Tk Pocket Reference (O’Reilly)

can serve as a nice supplement to the tkinter tutorial material in this part of the book.

Moreover, since Tk concepts are familiar to a large body of programmers, Tk support

is also readily available on the Net.

After you’ve learned the basics, examples can help, too. You can find tkinter demo

programs, besides those you’ll study in this book, at various locations around the Web.

Python itself includes a set of demo programs in the Demo\tkinter subdirectory of its

source distribution package. The IDLE development GUI mentioned in the next section

makes for an interesting code read as well.

tkinter Extensions

Because tkinter is so widely used, programmers also have access to precoded Python

extensions designed to work with or augment it. Some of these may not yet be available

for Python 3.X as I write this but are expected to be soon. For instance:

Pmw

Python Mega Widgets is an extension toolkit for building high-level compound

widgets in Python using the tkinter module. It extends the tkinter API with a col-

lection of more sophisticated widgets for advanced GUI development and a frame-

work for implementing some of your own. Among the precoded and extensible

megawidgets shipped with the package are notebooks, combo boxes, selection

widgets, paned widgets, scrolled widgets, dialog windows, button boxes, balloon

help, and an interface to the Blt graph widget.

The interface to Pmw megawidgets is similar to that of basic tkinter widgets, so

Python scripts can freely mix Pmw megawidgets with standard tkinter widgets.

Moreover, Pmw is pure Python code, and so requires no C compiler or tools to

install. To view its widgets and the corresponding code you use to construct them,

run the demos\All.py script in the Pmw distribution package. You can find Pmw at

http://pmw.sourceforge.net.

Tix

Tix is a collection of more than 40 advanced widgets, originally written for

Tcl/Tk but now available for use in Python/tkinter programs. This package is now

a Python standard library module, called tkinter.tix. Like Tk, the underlying Tix

library is also shipped today with Python on Windows. In other words, on Win-

dows, if you install Python, you also have Tix as a preinstalled library of additional

widgets.

Tix includes many of the same devices as Pmw, including spin boxes, trees, tabbed

notebooks, balloon help pop ups, paned windows, and much more. See the Python

library manual’s entry for the Tix module for more details. For a quick look at its

widgets, as well as the Python source code used to program them, run the

tixwidgets.py demonstration program in the Demo\tix directory of the Python

source distribution (this directory is not installed by default on Windows and is

prone to change—you can generally find it after fetching and unpacking Python’s

source code from Python.org).

ttk

Tk themed widgets, ttk, is a relatively new widget set which attempts to separate

the code implementing a widget’s behavior from that implementing its appearance.

Widget classes handle state and callback invocation, whereas widget appearance

is managed separately by themes. Much like Tix, this extension began life sepa-

rately, but was very recently incorporated into Python’s standard library in Python

3.1, as module tkinter.ttk.

Also like Tix, this extension comes with advanced widget types, some of which are

not present in standard tkinter itself. Specifically, ttk comes with 17 widgets, 11 of

which are already present in tkinter and are meant as replacements for some of

tkinter’s standard widgets, and 6 of which are new—Combobox, Notebook, Pro-

gressbar, Separator, Sizegrip and Treeview. In a nutshell, scripts import from the

ttk module after tkinter in order to use its replacement widgets and configure style

objects possibly shared by multiple widgets, instead of configuring widgets

themselves.

As we’ll see in this chapter, it’s possible to provide a common look-and-feel for a

set of widgets with standard tkinter, by subclassing its widget classes using normal

OOP techniques (see “Customizing Widgets with Classes” on page 400). How-

ever, ttk offers additional style options and advanced widget types. For more details

on ttk widgets, see the entry in the Python library manual or search the Web; this

book focuses on tkinter fundamentals, and Tix and ttk are both too large to cover

in a useful fashion here.
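
As a rough illustration of the ttk import order described above (a sketch only, not one of this book's example files), the following imports the classic widget names first and the ttk names second, so that Button and Label end up referring to the themed classes. The function name build_themed and the style name Demo.TButton are made up for this sketch.

```python
from tkinter import Tk, Button, Label          # classic widgets first
from tkinter.ttk import Button, Label, Style   # imported second: these names now mean ttk's

def build_themed(root):
    """Make one themed button whose look comes from a shared style object."""
    style = Style(root)                          # ttk style database for this GUI
    style.configure('Demo.TButton', padding=8)   # 'Demo.TButton' is a made-up style name
    button = Button(root, text='Themed button', style='Demo.TButton')
    button.pack()
    return button

# To display it:  root = Tk(); build_themed(root); root.mainloop()
```

Because the look is configured on the style rather than on the widget, any other button created with style='Demo.TButton' would pick up the same settings.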

PIL

The Python Imaging Library (PIL) is an open source extension package that adds

image-processing tools to Python. Among other things, it provides tools for image

thumbnails, transforms, and conversions, and it extends the basic tkinter image

object to add support for displaying many image file types. PIL, for instance, allows

tkinter GUIs to display JPEG, TIFF, and PNG images not supported by the base

tkinter toolkit itself (without extension, tkinter supports GIFs and a handful of

bitmap formats). See the end of Chapter 8 for more details and examples; we’ll use

PIL in this book in a number of image-related example scripts. PIL can be found

at http://www.pythonware.com or via a web search.

IDLE

The IDLE integrated Python development environment is both written in Python

with tkinter and shipped and installed with the Python package (if you have a recent

Python interpreter, you should have IDLE too; on Windows, click the Start button,

select the Programs menu, and click the Python entry to find it). IDLE provides

syntax-coloring text editors for Python code, point-and-click debugging, and more,

and is an example of tkinter’s utility.

Others

Many of the extensions that provide visualization tools for Python are based on

the tkinter library and its canvas widget. See the PyPI website and your favorite

web search engine for more tkinter extension examples.

If you plan to do any commercial-grade GUI development with tkinter, you’ll probably

want to explore extensions such as Pmw, PIL, Tix, and ttk after learning tkinter basics

in this text. They can save development time and add pizzazz to your GUIs. See the

Python-related websites mentioned earlier for up-to-date details and links.

tkinter Structure

From a more nuts-and-bolts perspective, tkinter is an integration system that implies

a somewhat unique program structure. We’ll see what this means in terms of code in

a moment, but here is a brief introduction to some of the terms and concepts at the

core of Python GUI programming.

Implementation structure

Strictly speaking, tkinter is simply the name of Python’s interface to Tk—a GUI library

originally written for use with the Tcl programming language and developed by Tcl’s

creator, John Ousterhout. Python’s tkinter module talks to Tk, and the Tk library in

turn interfaces with the underlying window system: Microsoft Windows, X Windows

on Unix, or whatever GUI system your Python uses on your Macintosh. The portability

of tkinter actually stems from the underlying Tk library it wraps.

Python’s tkinter adds a software layer on top of Tk that allows Python scripts to call

out to Tk to build and configure interfaces and routes control back to Python scripts

that handle user-generated events (e.g., mouse clicks). That is, GUI calls are internally

routed from Python script, to tkinter, to Tk; GUI events are routed from Tk, to tkinter,

and back to a Python script. In Chapter 20, we’ll know these transfers by their C inte-

gration terms, extending and embedding.

Technically, tkinter is today structured as a combination of the Python-coded

tkinter module package’s files and an extension module called _tkinter that is written

in C. _tkinter interfaces with the Tk library using extending tools and dispatches call-

backs back to Python objects using embedding tools; tkinter simply adds a class-based

interface on top of _tkinter. You should almost always import tkinter in your scripts,

though, not _tkinter; the latter is an implementation module meant for internal use

only (and was oddly named for that reason).

Programming structure

Luckily, Python programmers don’t normally need to care about all this integration

and call routing going on internally; they simply make widgets and register Python

functions to handle widget events. Because of the overall structure, though, event han-

dlers are usually known as callback handlers, because the GUI library “calls back” to

Python code when events occur.

In fact, we’ll find that Python/tkinter programs are entirely event driven: they build

displays and register handlers for events, and then do nothing but wait for events to

occur. During the wait, the Tk GUI library runs an event loop that watches for mouse

clicks, keyboard presses, and so on. All application program processing happens in the

registered callback handlers in response to events. Further, any information needed

across events must be stored in long-lived references such as global variables and class

instance attributes. The notion of a traditional linear program control flow doesn’t

really apply in the GUI domain; you need to think in terms of smaller chunks.
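
The point about long-lived state can be sketched as follows. This is a minimal illustration rather than one of the book's examples: the names ClickCounter and build_counter are made up, and the Button widget and its command option used here are covered properly later in this chapter.

```python
from tkinter import Tk, Button, Label

class ClickCounter:
    """State that must outlive any single event lives in instance attributes."""
    def __init__(self):
        self.count = 0        # survives between callbacks
        self.label = None     # filled in once the GUI is built

    def on_press(self):
        # Callback handler: Tk calls this each time the button is pressed
        self.count += 1
        if self.label is not None:
            self.label.config(text='Presses: %d' % self.count)

def build_counter(root, counter):
    """Build the display and register the handler; nothing runs until an event."""
    counter.label = Label(root, text='Presses: 0')
    counter.label.pack()
    Button(root, text='Press me', command=counter.on_press).pack()

# To display it:  root = Tk(); build_counter(root, ClickCounter()); root.mainloop()
```

Notice that no code in this sketch loops or waits: the script only builds widgets and registers on_press, and the count changes solely in response to button presses.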

In Python, Tk also becomes object oriented simply because Python is object oriented:

the tkinter layer exports Tk’s API as Python classes. With tkinter, we can either use a

simple function-call approach to create widgets and interfaces, or apply object-oriented

techniques such as inheritance and composition to customize and extend the base set

of tkinter classes. Larger tkinter GUIs are generally constructed as trees of linked tkinter

widget objects and are often implemented as Python classes to provide structure and

retain state information between events. As we’ll see in this part of the book, a tkinter

GUI coded with classes almost by default becomes a reusable software component.
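
As a preview of the class-based style, the sketch below shows both techniques just mentioned: inheritance to customize a widget, and composition to build a larger component out of smaller ones. The class names BannerLabel and BannerPanel are invented for this illustration, and the Frame container is introduced later in this chapter.

```python
from tkinter import Tk, Frame, Label, TOP

class BannerLabel(Label):
    """Inheritance: a Label with preset options that callers can still override."""
    def __init__(self, parent=None, **options):
        settings = dict(text='Hello GUI world!', padx=10, pady=5)
        settings.update(options)                 # caller's options win
        Label.__init__(self, parent, **settings)

class BannerPanel(Frame):
    """Composition: a Frame that contains and arranges other widgets."""
    def __init__(self, parent=None, lines=('First', 'Second')):
        Frame.__init__(self, parent)
        # Keep references to the embedded widgets as instance state
        self.labels = [BannerLabel(self, text=line) for line in lines]
        for label in self.labels:
            label.pack(side=TOP)

# To display it:  root = Tk(); BannerPanel(root).pack(); root.mainloop()
```

Because BannerPanel is itself a widget, it can be attached to any parent and packed like a built-in one, which is what makes such classes reusable.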

Climbing the GUI Learning Curve

On to the code; let’s start out by quickly stepping through a few small examples that

illustrate basic concepts and show the windows they create on the computer display.

The examples will become progressively more sophisticated as we move along, but let’s

get a handle on the fundamentals first.

“Hello World” in Four Lines (or Less)

The usual first example for GUI systems is to show how to display a “Hello World”

message in a window. As coded in Example 7-1, it’s just four lines in Python.

Example 7-1. PP4E\Gui\Intro\gui1.py

from tkinter import Label                        # get a widget object
widget = Label(None, text='Hello GUI world!')    # make one
widget.pack()                                    # arrange it
widget.mainloop()                                # start event loop

This is a complete Python tkinter GUI program. When this script is run, we get a simple

window with a label in the middle; it looks like Figure 7-1 on my Windows 7 laptop (I

stretched some windows in this book horizontally to reveal their window titles; your

platform’s window system may vary).

Figure 7-1. “Hello World” (gui1) on Windows

This isn’t much to write home about yet, but notice that this is a completely functional,

independent window on the computer’s display. It can be maximized to take up the

entire screen, minimized to hide it in the system bar, and resized. Click on the window’s

“X” box in the top right to kill the window and exit the program.

The script that builds this window is also fully portable. Run this script on your machine

to see how it renders. When this same file is run on Linux it produces a similar window,

but it behaves according to the underlying Linux window manager. Even on the same

operating system, the same Python code might yield a different look-and-feel for

different window systems (for instance, under KDE and Gnome on Linux). The same

script file would look different still when run on Macintosh and other Unix-like window

managers. On all platforms, though, its basic functional behavior will be the same.

tkinter Coding Basics

The gui1 script is a trivial example, but it illustrates steps common to most tkinter

programs. This Python code does the following:

1. Loads a widget class from the tkinter module

2. Makes an instance of the imported Label class

3. Packs (arranges) the new Label in its parent widget

4. Calls mainloop to bring up the window and start the tkinter event loop

The mainloop method called last puts the label on the screen and enters a tkinter wait

state, which watches for user-generated GUI events. Within the mainloop function,

tkinter internally monitors things such as the keyboard and mouse to detect user-

generated events. In fact, the tkinter mainloop function is similar in spirit to the fol-

lowing pseudo-Python code:

def mainloop():
    while the main window has not been closed:
        if an event has occurred:
            run the associated event handler function

Because of this model, the mainloop call in Example 7-1 never returns to our script while

the GUI is displayed on-screen.‡ When we write larger scripts, the only way we can get

anything done after calling mainloop is to register callback handlers to respond to events.

This is called event-driven programming, and it is perhaps one of the most unusual

aspects of GUIs. GUI programs take the form of a set of event handlers that share saved

information rather than of a single main control flow. We’ll see how this looks in terms

of real code in later examples.
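
To see the event-driven model in miniature, here is a toy, tkinter-free stand-in for an event loop. It is only a sketch of the idea, not how Tk is implemented: the class ToyEventLoop and its bind, post, and quit methods are invented for this illustration, and events are simple strings placed on a queue by hand.

```python
from collections import deque

class ToyEventLoop:
    """A stand-in for the GUI library's event loop; no windows involved."""
    def __init__(self):
        self.handlers = {}         # event name -> registered callback
        self.pending = deque()     # events that have "occurred" but not yet run
        self.running = False

    def bind(self, event, handler):
        self.handlers[event] = handler

    def post(self, event):
        self.pending.append(event)

    def quit(self):
        self.running = False

    def mainloop(self):
        # A real toolkit blocks here waiting for the next event; this toy
        # version simply stops when its queue is empty or quit() is called.
        self.running = True
        while self.running and self.pending:
            event = self.pending.popleft()
            handler = self.handlers.get(event)
            if handler is not None:
                handler(event)     # "call back" into application code

log = []                           # saved information shared by the handlers
loop = ToyEventLoop()
loop.bind('click', lambda event: log.append('clicked'))
loop.bind('close', lambda event: loop.quit())
for name in ('click', 'key', 'click', 'close', 'click'):
    loop.post(name)
loop.mainloop()
print(log)                         # ['clicked', 'clicked']
```

The 'key' event is ignored because no handler was registered for it, and the final 'click' is never dispatched because the 'close' handler ends the loop first. All of the program's work happens inside the two registered handlers.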

Note that for code in a script file, you really need to do steps 3 and 4 in the preceding

list to open this script’s GUI. To display a GUI’s window at all, you need to call

mainloop; to display widgets within the window, they must be packed (or otherwise

arranged) so that the tkinter geometry manager knows about them. In fact, if you call

either mainloop or pack without calling the other, your window won’t show up as ex-

pected: a mainloop without a pack shows an empty window, and a pack without a

mainloop in a script shows nothing since the script never enters an event wait state (try

it). The mainloop call is sometimes optional when you’re coding interactively, but you

shouldn’t rely on this in general.

Since the concepts illustrated by this simple script are at the core of most tkinter pro-

grams, let’s take a deeper look at some of them before moving on.

‡ Technically, the mainloop call returns to your script only after the tkinter event loop exits. This normally

happens when the GUI’s main window is closed, but it may also occur in response to explicit quit method

calls that terminate nested event loops but leave open the GUI at large. You’ll see why this matters in

Chapter 8.

Making Widgets

When widgets are constructed in tkinter, we can specify how they should be configured.

The gui1 script passes two arguments to the Label class constructor:

• The first is a parent-widget object, which we want the new label to be attached to.

Here, None means “attach the new Label to the default top-level window of this

program.” Later, we’ll pass real widgets in this position to attach our labels to other

container objects.

• The second is a configuration option for the Label, passed as a keyword argument:

the text option specifies a text string to appear as the label’s message. Most widget

constructors accept multiple keyword arguments for specifying a variety of options

(color, size, callback handlers, and so on). Most widget configuration options have

reasonable defaults per platform, though, and this accounts for much of tkinter’s

simplicity. You need to set most options only if you wish to do something custom.

As we’ll see, the parent-widget argument is the hook we use to build up complex GUIs

as widget trees. tkinter works on a “what-you-build-is-what-you-get” principle—we

construct widget object trees as models of what we want to see on the screen, and then

ask the tree to display itself by calling mainloop.
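
The two kinds of constructor arguments described above can be sketched like this. The function name make_banner and the particular colors and padding values are arbitrary choices for illustration; the explicit Tk main window and the config method are both covered later.

```python
from tkinter import Tk, Label

def make_banner(parent):
    """Attach a label to an explicit parent and set a few options by keyword."""
    banner = Label(parent,                     # parent widget (a window or frame)
                   text='Hello GUI world!',    # same message as gui1
                   fg='white', bg='navy',      # colors: illustrative choices only
                   padx=20, pady=10)           # extra space around the text
    banner.config(text='Hello again')          # options can be changed after creation
    banner.pack()
    return banner

# To display it:  root = Tk(); make_banner(root); root.mainloop()
```

Any option left out of the call, such as the font, simply takes its platform default.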

Geometry Managers

The pack widget method called by the gui1 script invokes the packer geometry man-

ager, one of three ways to control how widgets are arranged in a window. tkinter ge-

ometry managers simply arrange one or more widgets within a container (sometimes

called a parent or master). Both top-level windows and frames (a special kind of widget

we’ll meet later) can serve as containers, and containers may be nested inside other

containers to build hierarchical displays.

The packer geometry manager uses constraint option settings to automatically position

widgets in a window. Scripts supply higher-level instructions (e.g., “attach this widget

to the top of its container, and stretch it to fill its space vertically”), not absolute pixel

coordinates. Because such constraints are so abstract, the packer provides a powerful

and easy-to-use layout system. In fact, you don’t even have to specify constraints. If

you don’t pass any arguments to pack, you get default packing, which attaches the

widget to the top side of its container.
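
For a first taste of packer constraints, consider the following sketch; the function name build_layout and the label texts and colors are made up for illustration. Each pack call states where the widget goes relative to the space still free in its container, not where it sits in pixels.

```python
from tkinter import Tk, Label, TOP, LEFT, X, BOTH, YES

def build_layout(root):
    """Pack four labels using constraints rather than pixel coordinates."""
    title = Label(root, text='Title')
    title.pack()                                  # no arguments: defaults to side=TOP

    bar = Label(root, text='Bar', bg='gray80')
    bar.pack(side=TOP, fill=X)                    # below the title, stretched horizontally

    left = Label(root, text='Left')
    left.pack(side=LEFT)                          # left edge of the space that remains

    body = Label(root, text='Body', bg='white')
    body.pack(side=LEFT, expand=YES, fill=BOTH)   # claim and fill any leftover space
    return title, bar, left, body

# To display it:  root = Tk(); build_layout(root); root.mainloop()
```

Try resizing the window this makes: the body label grows in both directions, the bar grows only horizontally, and the other two labels keep their natural size.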

We’ll visit the packer repeatedly in this chapter and use it in many of the examples in

this book. In Chapter 9, we will also meet an alternative grid geometry manager—a

layout system that arranges widgets within a container in tabular form (i.e., by rows

and columns) and works well for input forms. A third alternative, called the placer

geometry manager system, is described in Tk documentation but not in this book; it’s

less popular than the pack and grid managers and can be difficult to use for larger GUIs

coded by hand.
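
Although grid is not covered in depth until Chapter 9, a short sketch may help show what "tabular form" means in practice. The function name build_form and the field names are hypothetical, and the Entry input widget is introduced later in this part of the book.

```python
from tkinter import Tk, Label, Entry, W, EW

def build_form(root, fields=('Name', 'Job', 'Pay')):
    """Lay out one label/entry pair per row with the grid manager."""
    entries = {}
    for row, field in enumerate(fields):
        Label(root, text=field).grid(row=row, column=0, sticky=W)
        entry = Entry(root)
        entry.grid(row=row, column=1, sticky=EW)   # stretch across its cell
        entries[field] = entry
    root.columnconfigure(1, weight=1)              # column 1 absorbs extra width
    return entries

# To display it:  root = Tk(); build_form(root); root.mainloop()
```

One caution applies: within a single container, use either pack or grid for the widgets it directly holds, not both.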

Running GUI Programs

Like all Python code, the module in Example 7-1 can be started in a number of ways—

by running it as a top-level program file:

C:\...\PP4E\Gui\Intro> python gui1.py

by importing it from a Python session or another module file:

>>> import gui1

by running it as a Unix executable if we add the special #! line at the top:

% gui1.py &

and in any other way Python programs can be launched on your platform. For instance,

the script can also be run by clicking on the file’s name in a Windows file explorer, and

its code can be typed interactively at the >>> prompt.§ It can even be run from a C

program by calling the appropriate embedding API function (see Chapter 20 for details

on C integration).

In other words, there are really no special rules to follow when launching GUI Python

code. The tkinter interface (and Tk itself) is linked into the Python interpreter. When

a Python program calls GUI functions, they’re simply passed to the embedded GUI

system behind the scenes. That makes it easy to write command-line tools that pop up

windows; they are run the same way as the purely text-based scripts we studied in the

prior part of this book.

Avoiding DOS consoles on Windows

In Chapters 3 and 6 we noted that if a program’s name ends in a .pyw extension rather

than a .py extension, the Windows Python port does not pop up a DOS console box

to serve as its standard streams when the file is launched by clicking its filename icon.

Now that we’ve finally started making windows of our own, that filename trick will

start to become even more useful.

If you just want to see the windows that your script makes no matter how it is launched,

be sure to name your GUI scripts with a .pyw if they might be run on Windows. For

instance, clicking on the file in Example 7-2 in a Windows explorer creates just the

window in Figure 7-1.

Example 7-2. PP4E\Gui\Intro\gui1.pyw

...same as gui1.py...

§ Tip: As suggested earlier, when typing tkinter GUI code interactively, you may or may not need to call mainloop to display widgets. This is required in the current IDLE interface, but not from a simple interactive session running in a system console window. In either case, control will return to the interactive prompt when you kill the window you created. Note that if you create an explicit main-window widget by calling Tk() and attach widgets to it (described later), you must call Tk() again after killing the window; otherwise, the application window will not exist.


You can also avoid the DOS pop up on Windows by running the program with the

pythonw.exe executable, not python.exe (in fact, .pyw files are simply registered to be

opened by pythonw). On Linux, the .pyw doesn’t hurt, but it isn’t necessary; there is no

notion of a streams pop up on Unix-like machines. On the other hand, if your GUI

scripts might run on Windows in the future, adding an extra “w” at the end of their

names now might save porting effort later. In this book, .py filenames are still sometimes

used to pop up console windows for viewing printed messages on Windows.

tkinter Coding Alternatives

As you might expect, there are a variety of ways to code the gui1 example. For instance,

if you want to make all your tkinter imports more explicit in your script, grab the whole

module and prefix all of its names with the module’s name, as in Example 7-3.

Example 7-3. PP4E\Gui\Intro\gui1b.py—import versus from

import tkinter

widget = tkinter.Label(None, text='Hello GUI world!')

widget.pack()

widget.mainloop()

That will probably get tedious in realistic examples, though—tkinter exports dozens

of widget classes and constants that show up all over Python GUI scripts. In fact, it is

usually easier to use a * to import everything from the tkinter module by name in one

shot. This is demonstrated in Example 7-4.

Example 7-4. PP4E\Gui\Intro\gui1c.py—roots, sides, pack in place

from tkinter import *

root = Tk()

Label(root, text='Hello GUI world!').pack(side=TOP)

root.mainloop()

The tkinter module goes out of its way to export only what we really need, so it’s one

of the few for which the * import form is relatively safe to apply.‖ The TOP constant in

the pack call here, for instance, is one of those many names exported by the tkinter

module. It’s simply a variable name (TOP="top") preassigned in constants, a module

automatically loaded by tkinter.
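The relative safety of the * form rests on a general Python rule: when a module has no __all__ list, from module import * copies only the names that do not begin with an underscore. The following is a minimal sketch of that rule, using a throwaway module built in memory. The names demo_constants, TOP, and _helper are made up for illustration and are not tkinter's actual contents.

```python
import sys
import types

# Build a small module in memory (hypothetical names, for illustration only).
demo = types.ModuleType('demo_constants')
demo.TOP = 'top'             # public name: copied by "from ... import *"
demo._helper = 'internal'    # leading underscore: skipped by "from ... import *"
sys.modules['demo_constants'] = demo

# Run the star import in a fresh namespace so we can inspect what arrived.
namespace = {}
exec('from demo_constants import *', namespace)

print('TOP' in namespace)        # True
print('_helper' in namespace)    # False
```

The underscore name is still reachable as demo._helper; it is only the * form that leaves it behind.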

When widgets are packed, we can specify which side of their parent they should be attached to: TOP, BOTTOM, LEFT, or RIGHT. If no side option is sent to pack (as in prior examples), a widget is attached to its parent’s TOP by default. In general, larger tkinter GUIs can be constructed as sets of rectangles, attached to the appropriate sides of other, enclosing rectangles. As we’ll see later, tkinter arranges widgets in a rectangle according to both their packing order and their side attachment options. When widgets are gridded, they are assigned row and column numbers instead. None of this will become very meaningful, though, until we have more than one widget in a window, so let’s move on.

‖ If you study the main tkinter file in the Python source library (currently, Lib\tkinter\__init__.py), you’ll notice that top-level module names not meant for export start with a single underscore. Python never copies over such names when a module is accessed with the * form of the from statement. The constants module is today constants.py in the same module package directory, though this can change (and has) over time.

Notice that this version calls the pack method right after creating the label, without assigning the label to a variable. If we don’t need to save a widget, we can pack it in place like this to eliminate a statement. We’ll use this form when a widget is attached to a larger structure and never again referenced. This can be tricky if you assign the pack result, though; I’ll postpone an explanation of why until we’ve covered a few more basics.

We also use a Tk widget class instance, instead of None, as the parent here. Tk represents

the main (“root”) window of the program—the one that starts when the program does.

An automatically created Tk instance is also used as the default parent widget, both

when we don’t pass any parent to other widget calls and when we pass the parent as

None. In other words, widgets are simply attached to the main program window by

default. This script just makes this default behavior explicit by making and passing the

Tk object itself. In Chapter 8, we’ll see that Toplevel widgets are typically used to generate new pop-up windows that operate independently of the program’s main window.

In tkinter, some widget methods are exported as functions, and this lets us shave

Example 7-5 to just three lines of code.

Example 7-5. PP4E\Gui\Intro\gui1d.py—a minimal version

from tkinter import *

Label(text='Hello GUI world!').pack()

mainloop()

The tkinter mainloop can be called with or without a widget (i.e., as a function or

method). We didn’t pass Label a parent argument in this version, either: it simply

defaults to None when omitted (which in turn defaults to the automatically created Tk

object). But relying on that default is less useful once we start building larger displays.

Things such as labels are more typically attached to other widget containers.

Widget Resizing Basics

Top-level windows, such as the one built by all of the coding variants we have seen

thus far, can normally be resized by the user; simply drag out the window with your

mouse. Figure 7-2 shows how our window looks when it is expanded.

This isn’t very good—the label stays attached to the top of the parent window instead

of staying in the middle on expansion—but it’s easy to improve on this with a pair of

pack options, demonstrated in Example 7-6.


Example 7-6. PP4E\Gui\Intro\gui1e.py—expansion

from tkinter import *

Label(text='Hello GUI world!').pack(expand=YES, fill=BOTH)

mainloop()

When widgets are packed, we can specify whether a widget should expand to take up

all available space, and if so, how it should stretch to fill that space. By default, widgets

are not expanded when their parent is. But in this script, the names YES and BOTH (imported from the tkinter module) specify that the label should grow along with its parent,

the main window. It does so in Figure 7-3.

Figure 7-2. Expanding gui1

Figure 7-3. gui1e with widget resizing

Technically, the packer geometry manager assigns a size to each widget in a display based on what it contains (text string lengths, etc.). By default, a widget can occupy only its allocated space and is no bigger than its assigned size. The expand and fill options let us be more specific about such things:

expand=YES option

Asks the packer to expand the allocated space for the widget in general into any

unclaimed space in the widget’s parent.

fill option

Can be used to stretch the widget to occupy all of its allocated space.

Combinations of these two options produce different layout and resizing effects, some

of which become meaningful only when there are multiple widgets in a window. For

example, using expand without fill centers the widget in the expanded space, and the

fill option can specify vertical stretching only (fill=Y), horizontal stretching only

(fill=X), or both (fill=BOTH). By providing these constraints and attachment sides for

all widgets in a GUI, along with packing order, we can control the layout in fairly precise

terms. In later chapters, we’ll find that the grid geometry manager uses a different

resizing protocol entirely, but it provides similar control when needed.

All of this can be confusing the first time you hear it, and we’ll return to this later. But

if you’re not sure what an expand and fill combination will do, simply try it out—this

is Python, after all. For now, remember that the combination of expand=YES and

fill=BOTH is perhaps the most common setting; it means “expand my space allocation

to occupy all available space on my side, and stretch me to fill the expanded space in

both directions.” For our “Hello World” example, the net result is that the label grows

as the window is expanded, and so is always centered.
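If the difference between expand and fill is still fuzzy, it may help to see the arithmetic for the simplest case. The function below is a rough model I am assuming for illustration, not the packer's actual code: it handles exactly one widget packed on the TOP side, and it ignores padding, borders, anchors, and every other widget. The function name and the sample sizes are made up.

```python
def single_widget_geometry(window, requested, expand=False, fill=None):
    """Rough model of packing ONE widget with side=TOP.

    Returns (x, y, width, height) of the widget inside the window.
    """
    win_w, win_h = window
    req_w, req_h = requested

    # Allocated space: the full window width, and only the requested
    # height unless expand asks for all unclaimed space.
    alloc_w = win_w
    alloc_h = win_h if expand else min(req_h, win_h)

    # fill stretches the widget inside its allocated space.
    width = alloc_w if fill in ('x', 'both') else min(req_w, alloc_w)
    height = alloc_h if fill in ('y', 'both') else min(req_h, alloc_h)

    # With no other options, the widget is centered in its allocation.
    x = (alloc_w - width) // 2
    y = (alloc_h - height) // 2
    return (x, y, width, height)


window, label = (300, 200), (100, 20)
print(single_widget_geometry(window, label))                            # (100, 0, 100, 20)
print(single_widget_geometry(window, label, expand=True))               # (100, 90, 100, 20)
print(single_widget_geometry(window, label, expand=True, fill='x'))     # (0, 90, 300, 20)
print(single_widget_geometry(window, label, expand=True, fill='both'))  # (0, 0, 300, 200)
```

The first result corresponds to the label stuck at the top of an enlarged window, the second to expand without fill (centered but not stretched), and the last to the expand=YES, fill=BOTH combination.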

Configuring Widget Options and Window Titles

So far, we’ve been telling tkinter what to display on our label by passing its text as a

keyword argument in label constructor calls. It turns out that there are two other ways

to specify widget configuration options. In Example 7-7, the text option of the label is

set after it is constructed, by assigning to the widget’s text key. Widget objects overload

(intercept) index operations such that options are also available as mapping keys, much

like a dictionary.

Example 7-7. PP4E\Gui\Intro\gui1f.py—option keys

from tkinter import *

widget = Label()

widget['text'] = 'Hello GUI world!'

widget.pack(side=TOP)

mainloop()

More commonly, widget options can be set after construction by calling the widget

config method, as in Example 7-8.


Example 7-8. PP4E\Gui\Intro\gui1g.py—config and titles

from tkinter import *

root = Tk()

widget = Label(root)

widget.config(text='Hello GUI world!')

widget.pack(side=TOP, expand=YES, fill=BOTH)

root.title('gui1g.py')

root.mainloop()

The config method (which can also be called by its synonym, configure) can be called

at any time after construction to change the appearance of a widget on the fly. For

instance, we could call this label’s config method again later in the script to change the

text that it displays; watch for such dynamic reconfigurations in later examples in this

part of the book.
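To see how key assignment and config can be two faces of the same operation, here is a minimal sketch built on a stand-in class. FakeWidget is hypothetical and stores its options in a plain dictionary; a real tkinter widget forwards them to the Tk library instead, so treat this only as a picture of the interface.

```python
class FakeWidget:
    """Hypothetical stand-in for a widget; not tkinter's real implementation."""

    def __init__(self, **options):
        self._options = {}
        self.config(**options)

    def config(self, **options):
        # Change any number of options in a single call.
        self._options.update(options)

    configure = config                   # synonym for config

    def __setitem__(self, key, value):
        # widget['text'] = value is routed to config(text=value)
        self.config(**{key: value})

    def __getitem__(self, key):
        return self._options[key]


widget = FakeWidget(text='Hello GUI world!')
widget['text'] = 'Changed by key'
print(widget['text'])
widget.config(text='Changed by config')
print(widget['text'])
```

Because the index operation simply delegates to config, the two styles can be mixed freely; config has the advantage of setting several options in one call.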

Notice that this version also calls a root.title method; this call sets the label that

appears at the top of the window, as pictured in Figure 7-4. In general terms, top-level

windows like the Tk root here export window-manager interfaces—i.e., things that

have to do with the border around the window, not its contents.

Figure 7-4. gui1g with expansion and a window title

Just for fun, this version also centers the label upon resizes by setting the expand and

fill pack options. In fact, this version makes just about everything explicit and is more

representative of how labels are often coded in full-blown interfaces; their parents,

expansion policies, and attachments are usually spelled out rather than defaulted.

One More for Old Times’ Sake

Finally, if you are a minimalist and you’re nostalgic for old Python coding styles, you

can also program this “Hello World” example as in Example 7-9.

Example 7-9. PP4E\Gui\Intro\gui1-old.py—dictionary calls

from tkinter import *

Label(None, {'text': 'Hello GUI world!', Pack: {'side': 'top'}}).mainloop()


This makes the window in just two lines, albeit arguably gruesome ones! This scheme

relies on an old coding style that was widely used until Python 1.3, which passed

configuration options in a dictionary instead of keyword arguments.# In this scheme,

packer options can be sent as values of the key Pack (a class in the tkinter module).

The dictionary call scheme still works and you may see it in old Python code, but it’s

probably best to not do this in code you type. Use keywords to pass options, and use

explicit pack method calls in your tkinter scripts instead. In fact, the only reason I didn’t

cut this example completely is that dictionaries can still be useful if you want to compute

and pass a set of options dynamically.

On the other hand, the func(*pargs, **kargs) call syntax now also allows you to pass an explicit dictionary of keyword arguments by unpacking it with **:

options = {'text': 'Hello GUI world!'}
layout = {'side': 'top'}
Label(None, **options).pack(**layout)      # keyword names (dictionary keys) must be strings

Even in dynamic scenarios where widget options are determined at run time, there’s

no compelling reason to ever use the pre-1.3 tkinter dictionary call form.
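As a sketch of the dynamic case, the helper below computes pack options at run time and a second function applies them with ** unpacking. Both function names are hypothetical. The string 'both' and the value True are used on the assumption that they match the BOTH and YES constants, and tkinter is imported inside make_label only so that the option-building part can be tried without a display.

```python
def layout_options(side='top', grow=False):
    """Build pack() keyword arguments at run time (hypothetical helper)."""
    options = {'side': side}
    if grow:
        options.update(expand=True, fill='both')   # assumed equal to YES and BOTH
    return options


def make_label(parent, text, **layout):
    """Create a label and pack it with whatever layout options were computed."""
    from tkinter import Label      # local import: only needed when a GUI is built
    widget = Label(parent, text=text)
    widget.pack(**layout)          # keyword unpacking, not the old dictionary call
    return widget


print(layout_options())
print(layout_options('left', grow=True))
```

In a script with a display, a call such as make_label(root, 'Hello', **layout_options(grow=True)) would pass the computed dictionary straight through to pack.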

Packing Widgets Without Saving Them

In gui1c.py (shown in Example 7-4), I started packing labels without assigning them to

names. This works, and it is an entirely valid coding style, but because it tends to

confuse beginners at first glance, I need to explain why it works in more detail here.

In tkinter, Python class objects correspond to real objects displayed on a screen; we

make the Python object to make a screen object, and we call the Python object’s methods to configure that screen object. Because of this correspondence, the lifetime of the

Python object must generally correspond to the lifetime of the corresponding object on

the screen.

Luckily, Python scripts don’t usually have to care about managing object lifetimes. In

fact, they do not normally need to maintain a reference to widget objects created along

the way at all unless they plan to reconfigure those objects later. For instance, it’s

common in tkinter programming to pack a widget immediately after creating it if no

further reference to the widget is required:

Label(text='hi').pack() # OK

This expression is evaluated left to right, as usual. It creates a new label and then immediately calls the new object’s pack method to arrange it in the display. Notice, though, that the Python Label object is temporary in this expression; because it is not assigned to a name, it would normally be garbage collected (destroyed and reclaimed) by Python immediately after running its pack method.

# In fact, Python’s pass-by-name keyword arguments were first introduced to help clean up tkinter calls such as this one. Internally, keyword arguments really are passed as a dictionary (which can be collected with the **name argument form in a def header), so the two schemes are similar in implementation. But they vary widely in the number of characters you need to type and debug.

However, because tkinter emits Tk calls when objects are constructed, the label will be

drawn on the display as expected, even though we haven’t held onto the corresponding

Python object in our script. In fact, tkinter internally cross-links widget objects into a

long-lived tree used to represent the display, so the Label object made during this

statement actually is retained, even if not by our code.*

In other words, your scripts don’t generally have to care about widget object lifetimes,

and it’s OK to make widgets and pack them immediately in the same statement without

maintaining a reference to them explicitly in your code.
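The idea of a widget being kept alive by the tree it joins can be pictured with a few lines of plain Python. Node is a hypothetical stand-in that mimics only the parent and child links; it is a sketch of the principle, not a copy of tkinter's internals.

```python
class Node:
    """Hypothetical stand-in for a widget; it mimics only the parent/child links."""

    def __init__(self, parent=None, name='node'):
        self.parent = parent
        self.children = {}
        if parent is not None:
            parent.children[name] = self    # the parent now holds a reference

    def pack(self):
        pass                                # returns None, as the real pack does


root = Node(name='root')
Node(root, name='label').pack()             # no variable in our code holds this object

survivor = root.children['label']           # still reachable through the tree
print(survivor.parent is root)              # True
```

Our script never named the new object, yet it is not reclaimed, because the container it was attached to still refers to it.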

But that does not mean that it’s OK to say something like this:

widget = Label(text='hi').pack() # wrong!

...use widget...

This statement almost seems like it should assign a newly packed label to widget, but

it does not do this. In fact, it’s really a notorious tkinter beginner’s mistake. The widget

pack method packs the widget but does not return the widget thus packed. Really,

pack returns the Python object None; after such a statement, widget will be a reference

to None, and any further widget operations through that name will fail. For instance,

the following fails, too, for the same reason:

Label(text='hi').pack().mainloop() # wrong!

Since pack returns None, asking for its mainloop attribute generates an exception (as it

should). If you really want to both pack a widget and retain a reference to it, say this

instead:

widget = Label(text='hi') # OK too

widget.pack()

...use widget...

This form is a bit more verbose but is less tricky than packing a widget in the same statement that creates it, and it allows you to hold onto the widget for later processing. It’s probably more common in realistic scripts that perform more complex widget configuration and layouts.

On the other hand, scripts that compose layouts often add some widgets once and for

all when they are created and never need to reconfigure them later; assigning to long-

lived names in such programs is pointless and unnecessary.
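To recap the pack pitfall in a form you can run without a display, here is a small sketch. FakeLabel is a hypothetical class that copies just one trait of the real thing: its pack method has no return statement, so the call evaluates to None.

```python
class FakeLabel:
    """Hypothetical stand-in; it mimics only the fact that pack() returns None."""

    def __init__(self, text=''):
        self.text = text
        self.packed = False

    def pack(self, **options):
        self.packed = True
        # no return statement here, so the call's result is None


wrong = FakeLabel(text='hi').pack()     # assigns pack's result, not the label
print(wrong)                            # None

right = FakeLabel(text='hi')            # keep the object first...
right.pack()                            # ...then pack it in a separate statement
print(right.text)                       # hi

try:
    wrong.text                          # any use of the None result fails
except AttributeError as exc:
    print('failed as expected:', exc)
```

The rule of thumb is simple: chain pack onto the constructor only when you never need the widget again.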

* Ex-Tcl programmers in the audience may be interested to know that, at least at the time I was writing this

footnote, Python not only builds the widget tree internally, but uses it to automatically generate widget

pathname strings coded manually in Tcl/Tk (e.g., .panel.row.cmd). Python uses the addresses of widget class

objects to fill in the path components and records pathnames in the widget tree. A label attached to a

container, for instance, might have an assigned name such as .8220096.8219408 inside tkinter. You don’t have

to care, though. Simply make and link widget objects by passing parents, and let Python manage pathname

details based on the object tree. See the end of this chapter for more on Tk/tkinter mappings.


In Chapter 8, we’ll meet two exceptions to this rule. Scripts must manually retain a reference to image objects because the underlying image data is discarded if the Python image object is garbage collected. tkinter variable class objects also temporarily unset an associated Tk variable if reclaimed, but this is uncommon and less harmful.

Adding Buttons and Callbacks

So far, we’ve learned how to display messages in labels, and we’ve met tkinter core

concepts along the way. Labels are nice for teaching the basics, but user interfaces

usually need to do a bit more…like actually responding to users. To show how, the

program in Example 7-10 creates the window in Figure 7-5.

Example 7-10. PP4E\Gui\Intro\gui2.py

import sys

from tkinter import *

widget = Button(None, text='Hello widget world', command=sys.exit)

widget.pack()

widget.mainloop()

Figure 7-5. A button on the top

Here, instead of making a label, we create an instance of the tkinter Button class. It’s

attached to the default top-level window, as before, on the default TOP packing side. But

the main thing to notice here is the button’s configuration arguments: we set an option

called command to the sys.exit function.

For buttons, the command option is the place where we specify a callback handler function to be run when the button is later pressed. In effect, we use command to register an

action for tkinter to call when a widget’s event occurs. The callback handler used here

isn’t very interesting: as we learned in Chapter 5, the built-in sys.exit function simply

shuts down the calling program. Here, that means that pressing this button makes the

window go away.
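Note that command is given the function object itself (sys.exit, with no parentheses), to be called later. The sketch below imitates that later call without any GUI. The press function is a made-up stand-in for a button press. It also relies on the fact that sys.exit does its work by raising the SystemExit exception, which is why the exit can be observed here instead of ending the script.

```python
import sys


def press(command):
    """Hypothetical stand-in for a button press: call the handler with no arguments."""
    return command()


caught = None
try:
    press(sys.exit)                 # the registered handler runs only now
except SystemExit as exc:
    caught = exc                    # sys.exit raised SystemExit

print('SystemExit raised; exit code:', caught.code)
```

In the real script nothing catches the exception, so the program simply ends when the button is pressed.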

Just as for labels, there are other ways to code buttons. Example 7-11 is a version that packs the button in place without assigning it to a name, attaches it to the LEFT side of its parent window explicitly, and specifies root.quit as the callback handler. This is a standard Tk object method that shuts down the GUI and so ends the program. Technically, quit ends the current mainloop event loop call, and thus the entire program here; when we start using multiple top-level windows in Chapter 8, we’ll find that quit usually closes all windows, but its relative destroy erases just one window.

Example 7-11. PP4E\Gui\Intro\gui2b.py

from tkinter import *

root = Tk()

Button(root, text='press', command=root.quit).pack(side=LEFT)

root.mainloop()

This version produces the window in Figure 7-6. Because we didn’t tell the button to

expand into all available space, it does not do so.

Figure 7-6. A button on the left

In both of the last two examples, pressing the button makes the GUI program exit. In

older tkinter code, you may sometimes see the string exit assigned to the command option

to make the GUI go away when pressed. This exploits a tool in the underlying Tk library

and is less Pythonic than sys.exit or root.quit.

Widget Resizing Revisited: Expansion

Even with a GUI this simple, there are many ways to lay out its appearance with tkinter’s

constraint-based pack geometry manager. For example, to center the button in its window, add an expand=YES option to the button’s pack method call in Example 7-11. The

line of changed code looks like this:

Button(root, text='press', command=root.quit).pack(side=LEFT, expand=YES)

This makes the packer allocate all available space to the button but does not stretch

the button to fill that space. The result is the window captured in Figure 7-7.

Figure 7-7. pack(side=LEFT, expand=YES)

If you want the button to be given all available space and to stretch to fill all of its

assigned space horizontally, add expand=YES and fill=X keyword arguments to the

pack call. This will create the scene in Figure 7-8.

Figure 7-8. pack(side=LEFT, expand=YES, fill=X)

This makes the button fill the whole window initially (its allocation is expanded, and it is stretched to fill that allocation). It also makes the button grow as the parent window is resized. As shown in Figure 7-9, the button in this window does expand when its parent expands, but only along the X horizontal axis.

Figure 7-9. Resizing with expand=YES, fill=X

To make the button grow in both directions, specify both expand=YES and fill=BOTH in the pack call; now resizing the window makes the button grow in general, as shown in Figure 7-10. In fact, for more fun, maximize this window to fill the entire screen; you’ll get one very big tkinter button indeed.

Figure 7-10. Resizing with expand=YES, fill=BOTH

In more complex displays, such a button will expand only if all of the widgets it is

contained by are set to expand too. Here, the button’s only parent is the Tk root window

of the program, so parent expandability isn’t yet an issue; in later examples, we’ll need

to make enclosing Frame widgets expandable too. We will revisit the packer geometry

manager when we meet multiple-widget displays that use such devices later in this

tutorial, and again when we study the alternative grid call in Chapter 9.

Adding User-Defined Callback Handlers

In the simple button examples in the preceding section, the callback handler was simply

an existing function that killed the GUI program. It’s not much more work to register

callback handlers that do something a bit more useful. Example 7-12 defines a callback

handler of its own in Python.

Example 7-12. PP4E\Gui\Intro\gui3.py

import sys

from tkinter import *

def quit():                                  # a custom callback handler
    print('Hello, I must be going...')       # kill windows and process
    sys.exit()

widget = Button(None, text='Hello event world', command=quit)

widget.pack()

widget.mainloop()

The window created by this script is shown in Figure 7-11. This script and its GUI are

almost identical to the last example. But here, the command option specifies a function

we’ve defined locally. When the button is pressed, tkinter calls the quit function in this

file to handle the event, passing it zero arguments. Inside quit, the print call writes a message to the program’s stdout stream, and the GUI process exits as before.

Figure 7-11. A button that runs a Python function

As usual, stdout is normally the window that the program was started from unless it’s

been redirected to a file. It’s a pop-up DOS console if you run this program by clicking

it on Windows; add an input call before sys.exit if you have trouble seeing the message

before the pop up disappears. Here’s what the printed output looks like back in

standard stream world when the button is pressed; it is generated by a Python function

called automatically by tkinter:


C:\...\PP4E\Gui\Intro> python gui3.py

Hello, I must be going...

C:\...\PP4E\Gui\Intro>

Normally, such messages would be displayed in the GUI, but we haven’t gotten far

enough to know how just yet. Callback functions usually do more, of course (and may

even pop up new independent windows altogether), but this example illustrates the

basics.

In general, callback handlers can be any callable object: functions, anonymous functions generated with lambda expressions, bound methods of class or type instances, or class instances that inherit a __call__ operator overload method. For Button press callbacks, callback handlers always receive no arguments (other than an automatic self,

for bound methods); any state information required by the callback handler must be

provided in other ways—as global variables, class instance attributes, extra arguments

provided by an indirection layer, and so on.
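The four kinds of callable just listed can be tried side by side without a GUI. In this sketch, press is a made-up stand-in for what happens on a button press (a call with no arguments), and Greeter is a hypothetical class whose instances carry their own state.

```python
def press(command):
    """Hypothetical stand-in for a button press: call the handler with no arguments."""
    return command()


def plain():
    return 'function'


class Greeter:
    def __init__(self, name):
        self.name = name                    # state kept on the instance

    def hello(self):                        # registered as a bound method
        return 'bound method for ' + self.name

    def __call__(self):                     # makes the instance itself callable
        return 'callable instance for ' + self.name


bob = Greeter('bob')
handlers = [plain, lambda: 'lambda', bob.hello, bob]
results = [press(handler) for handler in handlers]
for result in results:
    print(result)
```

Each handler is invoked the same way, with no arguments; the bound method and the callable instance receive their self automatically and find their state there.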

To make this a bit more concrete, let’s take a quick look at some other ways to code

the callback handler in this example.

Lambda Callback Handlers

Recall that the Python lambda expression generates a new, unnamed function object

when run. If we need extra data passed in to the handler function, we can register

lambda expressions to defer the call to the real handler function, and specify the extra

data it needs.

Later in this part of the book, we’ll see how this can be more useful, but to illustrate

the basic idea, Example 7-13 shows what Example 7-12 looks like when recoded to use

a lambda instead of a def.

Example 7-13. PP4E\Gui\Intro\gui3b.py

import sys
from tkinter import *                        # lambda generates a function

widget = Button(None,                        # but contains just an expression
                text='Hello event world',
                command=(lambda: print('Hello lambda world') or sys.exit()))
widget.pack()
widget.mainloop()

This code is a bit tricky because lambdas can contain only an expression; to emulate

the original script, this version uses an or operator to force two expressions to be run

(print works as the first, because it’s a function call in Python 3.X, so we don’t need to

resort to using sys.stdout directly).
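The or trick works only because the first call returns a false value: print returns None, so Python goes on to evaluate the right side. Here is a minimal sketch of that behavior, with two made-up functions that record the order in which they run.

```python
log = []


def first():
    log.append('first')          # returns None implicitly, just as print does


def second():
    log.append('second')


both = lambda: first() or second()    # None is false, so "or" runs second() too
both()
print(log)                            # ['first', 'second']

skipped = lambda: 'done' or second()  # a true left side short-circuits the "or"
skipped()
print(log)                            # still ['first', 'second']
```

Because the second expression silently disappears if the first ever returns a true value, a def with two statements is usually the safer choice outside of quick demos.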


Deferring Calls with Lambdas and Object References

More typically, lambdas are used to provide an indirection layer that passes along extra

data to a callback handler (I omit pack and mainloop calls in the following snippets for

simplicity):

def handler(A, B):                 # would normally be called with no args
    ...                            # use A and B here

X = 42
Button(text='ni', command=(lambda: handler(X, 'spam')))   # lambda adds arguments

Although tkinter invokes command callbacks with no arguments, such a lambda can be

used to provide an indirect anonymous function that wraps the real handler call and

passes along information that existed when the GUI was first constructed. The call to

the real handler is, in effect, deferred, so we can add the extra arguments it requires.

Here, the value of global variable X and string 'spam' will be passed to arguments A and

B, even though tkinter itself runs callbacks with no arguments. The net effect is that the

lambda serves to map a no-argument function call to one with arguments supplied by

the lambda.

If lambda syntax confuses you, remember that a lambda expression such as the one in

the preceding code can usually be coded as a simple def statement instead, nested or

otherwise. In the following code, the second function does exactly the same work as

the prior lambda—by referencing it in the button creation call, it effectively defers

invocation of the actual callback handler so that extra arguments can be passed:

def handler(A, B):                 # would normally be called with no args
    ...                            # use A and B here

X = 42
def func():                        # indirection layer to add arguments
    handler(X, 'spam')

Button(text='ni', command=func)

To make the need for deferrals more obvious, notice what happens if you code a handler

call in the button creation call itself without a lambda or other intermediate function—

the callback runs immediately when the button is created, not when it is later clicked.

That’s why we need to wrap the call in an intermediate function to defer its invocation:

def handler(name):
    print(name)

Button(command=handler('spam'))    # BAD: runs the callback now!

Using either a lambda or a callable reference serves to defer callback invocation until

the event later occurs. For example, using a lambda to pass extra data with an inline

function definition that defers the call:

def handler(name):
    print(name)

Button(command=(lambda: handler('spam')))   # OK: wrap in a lambda to defer

is always equivalent to the longer, and to some observers less convenient, double-

function form:

def handler(name):
    print(name)

def temp():
    handler('spam')

Button(command=temp)               # OK: reference but do not call

We need only the zero-argument lambda or the zero-argument callable reference,

though, not both—it makes no sense to code a lambda which simply calls a function if

no extra data must be passed in and only adds an extra pointless call:

def handler(name):
    print(name)

def temp():
    handler('spam')

Button(command=(lambda: temp()))   # BAD: this adds a pointless call!

As we’ll see later, this includes references to other callables like bound methods and

callable instances which retain state in themselves—if they take zero arguments when

called, we can simply name them at widget construction time, and we don’t need to

wrap them in a superfluous lambda.
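The difference between calling a handler and registering it is easy to check without a window. In the sketch below, FakeButton is a hypothetical stand-in that stores its command and runs it only when its press method is called; the calls list records when the handler really runs.

```python
calls = []


def handler(name):
    calls.append(name)


class FakeButton:
    """Hypothetical stand-in that stores its command and runs it on press()."""

    def __init__(self, command=None):
        self.command = command

    def press(self):
        if self.command is not None:
            self.command()


bad = FakeButton(command=handler('now'))              # handler runs during construction
print(calls)                                          # ['now']

good = FakeButton(command=lambda: handler('later'))   # nothing runs yet
print(calls)                                          # ['now']

good.press()                                          # the deferred call happens here
print(calls)                                          # ['now', 'later']

bad.press()                                           # handler returned None: nothing registered
print(calls)                                          # ['now', 'later']
```

The bad form does double damage: the handler fires too early, and what gets registered is its return value (None here), so the press does nothing.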

Callback Scope Issues

Although the prior section’s lambda and intermediate function techniques defer calls

and allow extra data to be passed in, they also raise some scoping issues that may seem

subtle at first glance. This is core language territory, but it comes up often in practice in GUI code.

Arguments versus globals

For instance, notice that the handler function in the prior section’s initial code could

also refer to X directly, because it is a global variable (and would exist by the time the

code inside the handler is run). Because of that, we might make the handler a one-

argument function and pass in just the string 'spam' in the lambda:

def handler(A):                    # X is in my global scope, implicitly
    ...                            # use global X and argument A here

X = 42
Button(text='ni', command=(lambda: handler('spam')))


For that matter, A could be moved out to the global scope too, to remove the need for

lambda here entirely; we could register the handler itself and cut out the middleman.

Although simple in this trivial example, arguments are generally preferred to globals,

because they make external dependencies more explicit, and so make code easier to

understand and change. In fact, the same handler might be usable in other contexts, if

we don’t couple it to global variables’ values. While you’ll have to take it on faith until

we step up to larger examples with more complex state retention needs, avoiding glob-

als in callbacks and GUIs in general both makes them more reusable, and supports the

notion of multiple instances in the same program. It’s good programming practice, GUI

or not.
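
The trade-off can be sketched without any GUI code. In this minimal illustration, the names greet_global, greet_arg, and PREFIX are hypothetical and are not part of the book's examples; plain function calls stand in for button presses:

```python
# Hypothetical names throughout: PREFIX, greet_global, and greet_arg are
# illustrations only, not code from the examples package.
PREFIX = 'Hello'                        # global state shared by every caller

def greet_global():
    return PREFIX + ' spam'             # silently depends on the global PREFIX

def greet_arg(prefix, name):
    return prefix + ' ' + name          # everything it needs arrives as arguments

# The argument-based handler can serve two "buttons" with different data.
callbacks = [lambda: greet_arg('Hello', 'spam'),
             lambda: greet_arg('Goodbye', 'eggs')]
results = [cb() for cb in callbacks]
print(results)

# The global-based handler changes behavior for every user at once.
before = greet_global()
PREFIX = 'Goodbye'
after = greet_global()
print(before, '/', after)
```

The argument-based version makes each callback's data visible at the point of registration, which is what lets the same handler be reused in more than one context.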

Passing in enclosing scope values with default arguments

More subtly, notice that if the button in this example were constructed inside a function rather than at the top level of the file, name X would no longer be global but would be in the enclosing function’s local scope; it seems as if it would disappear after the function exits and before the callback event occurs and runs the lambda’s code:

def handler(A, B):
    ...                        # use A and B

def makegui():
    X = 42
    Button(text='ni', command=(lambda: handler(X, 'spam')))   # remembers X

makegui()
mainloop()                     # makegui's scope is gone by this point

Luckily, Python’s enclosing scope reference model means that the value of X in the local

scope enclosing the lambda function is automatically retained, for use later when the

button press occurs. This usually works as we want today, and automatically handles

variable references in this role.
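
To see the retention at work outside a GUI, the following sketch inspects the function object that a lambda creates. The names make_callback and callback are made up for this illustration, and a direct call stands in for the button press:

```python
def handler(a, b):
    return (a, b)

def make_callback():
    x = 42                                   # local to make_callback
    return lambda: handler(x, 'spam')        # lambda refers to x in the enclosing scope

callback = make_callback()                   # make_callback has already returned here
cell = callback.__closure__[0]               # closure cell that keeps x reachable

print(callback.__code__.co_freevars)         # names taken from the enclosing scope
print(cell.cell_contents)                    # the object x currently refers to
print(callback())                            # simulated button press
```

The lambda holds on to the variable itself rather than a copy of its value, a distinction that matters once the variable is reassigned.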

To make such enclosing scope usage explicit, though, default argument values can also

be used to remember the values of variables in the enclosing local scope, even after the

enclosing function returns. In the following code, for instance, the default argument

name X (on the left side of the X=X default) will remember object 42, because the variable

name X (on the right side of the X=X) is evaluated in the enclosing scope, and the gen-

erated function is later called without any arguments:

def handler(A, B):             # older Pythons: defaults save state
    ...                        # use A and B

def makegui():
    X = 42
    Button(text='ni', command=(lambda X=X: handler(X, 'spam')))

Since default arguments are evaluated and saved when the lambda runs (not when the

function it creates is later called), they are a way to explicitly remember objects that

must be accessed again later, during event processing. Because tkinter calls the lambda

function later with no arguments, all its defaults are used.
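
A small non-GUI sketch can confirm when the default is captured. Here make_callback is a hypothetical stand-in for makegui, and calling the result directly simulates what tkinter does on a button press:

```python
def handler(a, b):
    return (a, b)

def make_callback():
    x = 42
    callback = lambda x=x: handler(x, 'spam')   # right-hand x is evaluated here, once
    x = 99                                      # later changes do not touch the saved default
    return callback

callback = make_callback()
print(callback.__defaults__)     # the saved default values
print(callback())                # called with no arguments, as a command callback is
print(callback(7))               # an explicit argument overrides the default
```

The last call is worth noting: a default is only used when no argument is passed. A callback that is invoked with an argument, such as one registered with bind, would have its first default replaced by that argument.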

This was not an issue in the original version of this example because name X lived in the global scope, and the code of the lambda will find it there when it is run. When nested within a function, though, X would have disappeared after the enclosing function exits in older Pythons that lacked enclosing scope lookup (it became standard in Python 2.2).

Passing in enclosing scope values with automatic references

While they can make some external dependencies more explicit, defaults are not usually

required (since Python 2.2, at least) and are not used for this role in best practice code

today. Rather, lambdas simply defer the call to the actual handler and provide extra

handler arguments. Variables from the enclosing scope used by the lambda are auto-

matically retained, even after the enclosing function exits.

The prior code listing, for example, can today normally be coded as we did earlier: name X in the lambda will be automatically mapped to X in the enclosing scope, and so effectively remember what X was when the button was made:

def makegui():
    X = 42                     # X is retained auto
    Button(text='ni', command=(lambda: handler(X, 'spam')))   # no need for defaults

We’ll see this technique put to more concrete use later. When using classes to build

your GUI, for instance, the self argument is a local variable in methods, and is thus

automatically available in the bodies of lambda functions. There is no need to pass it

in explicitly with defaults:

class Gui:
    def handler(self, A, B):
        ...                    # use self, A and B

    def makegui(self):
        X = 42
        Button(text='ni', command=(lambda: self.handler(X, 'spam')))

Gui().makegui()
mainloop()

When using classes, though, instance attributes can provide extra state for use in call-

back handlers, and so provide an alternative to extra call arguments. We’ll see how in

a moment. First, though, we need to take a quick non-GUI diversion into a dark corner

of Python’s scope rules to understand why default arguments are still sometimes nec-

essary to pass values into nested lambda functions, especially in GUIs.

But you must still sometimes use defaults instead of enclosing scopes

Although you may still see defaults used to pass in enclosing scope references in some

older Python code, automatic enclosing scope references are generally preferred today.

In fact, it seems as though the newer nested scope lookup rules in Python automate and replace the previously manual task of passing in enclosing scope values with defaults altogether.

Well, almost. There is a catch. It turns out that within a lambda (or def), references to

names in the enclosing scope are actually resolved when the generated function is

called, not when it is created. Because of this, when the function is later called, such

name references will reflect the latest or final assignments made to the names anywhere

in the enclosing scope, which are not necessarily the values they held when the function

was made. This holds true even when the callback function is nested only in a module’s

global scope, not in an enclosing function; in either case, all enclosing scope references

are resolved at function call time, not at function creation time.

This is subtly different from default argument values, which are evaluated once when

the function is created, not when it is later called. Because of that, defaults can still be

useful for remembering the values of enclosing scope variables as they were when you

made the function. Unlike enclosing scope name references, defaults will not have a

different value if the variable later changes in the enclosing scope, between function

creation and call. (In fact, this is why mutable defaults like lists retain their state between

calls—they are created only once, when the function is made, and attached to the

function itself.)
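
The parenthetical point about mutable defaults can be checked with a few lines. The function name remember is made up for this sketch:

```python
def remember(item, seen=[]):      # the list is created once, when the def runs
    seen.append(item)
    return seen

first = list(remember('a'))       # copy, so later calls do not alter this snapshot
second = list(remember('b'))      # the same list object is reused on this call

print(first, second)
print(remember.__defaults__)      # the default lives on the function object
```

The same once-only evaluation that makes this list accumulate items is what lets a default freeze the value of an enclosing scope variable at function creation time.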

This is normally a nonissue, because most enclosing scope references name a variable

that is assigned just once in the enclosing scope (the self argument in class methods,

for example). But this can lead to coding mistakes if not understood, especially if you

create functions within a loop; if those functions reference the loop variable, it will

evaluate to the value it was given on the last loop iteration in all the functions generated.

By contrast, if you use defaults instead, each function will remember the current value

of the loop variable, not the last.

Because of this difference, nested scope references are not always sufficient to remember

enclosing scope values, and defaults are sometimes still required today. Let’s see what

this means in terms of code. Consider the following nested function (this section’s code

snippets are saved in file defaults.py in the examples package, if you want to experiment

with them).

def simple():
    spam = 'ni'
    def action():
        print(spam)            # name maps to enclosing function
    return action

act = simple()                 # make and return nested function
act()                          # then call it: prints 'ni'

This is the simple case for enclosing scope references, and it works the same way

whether the nested function is generated with a def or a lambda. But notice that this

still works if we assign the enclosing scope’s spam variable after the nested function is

created:

def normal():
    def action():
        return spam            # really, looked up when used
    spam = 'ni'
    return action

act = normal()
print(act())                   # also prints 'ni'

As this implies, the enclosing scope name isn’t resolved when the nested function is

made—in fact, the name hasn’t even been assigned yet in this example. The name is

resolved when the nested function is called. The same holds true for lambdas:

def weird():
    spam = 42
    return (lambda: spam * 2)  # remembers spam in enclosing scope

act = weird()
print(act())                   # prints 84

So far, so good. The spam inside this nested lambda function remembers the value that

this variable had in the enclosing scope, even after the enclosing scope exits. This pat-

tern corresponds to a registered GUI callback handler run later on events. But once

again, the nested scope reference really isn’t being resolved when the lambda is run to

create the function; it’s being resolved when the generated function is later called. To

make that more apparent, look at this code:

def weird():
    tmp = (lambda: spam * 2)   # remembers spam
    spam = 42                  # even though not set till here
    return tmp

act = weird()
print(act())                   # prints 84

Here again, the nested function refers to a variable that hasn’t even been assigned yet

when that function is made. Really, enclosing scope references yield the latest setting

made in the enclosing scope, whenever the function is called. Watch what happens in

the following code:

def weird():
    spam = 42
    handler = (lambda: spam * 2)     # func doesn't save 42 now
    spam = 50
    print(handler())                 # prints 100: spam looked up now
    spam = 60
    print(handler())                 # prints 120: spam looked up again now

weird()

Now, the reference to spam inside the lambda is different each time the generated func-

tion is called! In fact, it refers to what the variable was set to last in the enclosing scope

at the time the nested function is called, because it is resolved at function call time, not

at function creation time.

In terms of GUIs, this becomes significant most often when you generate callback han-

dlers within loops and try to use enclosing scope references to remember extra data

created within the loops. If you’re going to make functions within a loop, you have to

apply the last example’s behavior to the loop variable:

def odd():
    funcs = []
    for c in 'abcdefg':
        funcs.append((lambda: c))    # c will be looked up later
    return funcs                     # does not remember current c

for func in odd():
    print(func(), end=' ')           # OOPS: prints 7 g's, not a,b,c,... !

Here, the funcs list simulates registered GUI callback handlers associated with widgets.

This doesn’t work the way most people expect it to. The variable c within the nested

function will always be g here, the value that the variable was set to on the final iteration

of the loop in the enclosing scope. The net effect is that all seven generated lambda

functions wind up with the same extra state information when they are later called.

Analogous GUI code that adds information to lambda callback handlers will have sim-

ilar problems—all buttons created in a loop, for instance, may wind up doing the same

thing when clicked! To make this work, we still have to pass values into the nested

function with defaults in order to save the current value of the loop variable (not its

future value):

def odd():
    funcs = []
    for c in 'abcdefg':
        funcs.append((lambda c=c: c))    # force to remember c now
    return funcs                         # defaults eval now

for func in odd():
    print(func(), end=' ')               # OK: now prints a,b,c,...

This works now only because the default, unlike an external scope reference, is evalu-

ated at function creation time, not at function call time. It remembers the value that a

name in the enclosing scope had when the function was made, not the last assignment

made to that name anywhere in the enclosing scope. The same is true even if the func-

tion’s enclosing scope is a module, not another function; if we don’t use the default

argument in the following code, the loop variable will resolve to the same value in all

seven functions:

funcs = []                               # enclosing scope is module
for c in 'abcdefg':                      # force to remember c now
    funcs.append((lambda c=c: c))        # else prints 7 g's again

for func in funcs:
    print(func(), end=' ')               # OK: prints a,b,c,...

The moral of this story is that enclosing scope name references are a replacement for

passing values in with defaults, but only as long as the name in the enclosing scope will

not change to a value you don’t expect after the nested function is created. You cannot

generally reference enclosing scope loop variables within a nested function, for exam-

ple, because they will change as the loop progresses. In most other cases, though, en-

closing scope variables will take on only one value in their scope and so can be used

freely.
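
As a rough sketch of how this carries over to buttons made in a loop, the code below builds one callback per label and freezes each label with a default. The names make_commands and main are hypothetical; main opens a real window and is defined but not called here, so the callbacks can be exercised without a display:

```python
def make_commands(labels, report):
    """Return one zero-argument callback per label; the default freezes each label."""
    return {label: (lambda label=label: report(label)) for label in labels}

def main():
    """Builds real buttons; call this manually on a machine with a display."""
    from tkinter import Tk, Button
    root = Tk()
    for label, command in make_commands('abc', print).items():
        Button(root, text=label, command=command).pack(side='left')
    root.mainloop()

# Simulate pressing each button once, in order, without a GUI.
pressed = []
commands = make_commands('abc', pressed.append)
for label in 'abc':
    commands[label]()
print(pressed)
```

Without the label=label default, every callback would report the last label in the sequence, for the reasons described above.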

We’ll see this phenomenon at work in later examples that construct larger GUIs. For

now, remember that enclosing scopes are not a complete replacement for defaults;

defaults are still required in some contexts to pass values into callback functions. Also

keep in mind that classes are often a better and simpler way to retain extra state for use

in callback handlers than are nested functions. Because state is explicit in classes, these

scope issues do not apply. The next two sections cover this in detail.

Bound Method Callback Handlers

Let’s get back to coding GUIs. Although functions and lambdas suffice in many cases,

bound methods of class instances work particularly well as callback handlers in GUIs—

they record both an instance to send the event to and an associated method to call. For

instance, Example 7-14 shows Examples 7-12 and 7-13 rewritten to register a bound

class method rather than a function or lambda result.

Example 7-14. PP4E\Gui\Intro\gui3c.py

import sys
from tkinter import *

class HelloClass:
    def __init__(self):
        widget = Button(None, text='Hello event world', command=self.quit)
        widget.pack()

    def quit(self):
        print('Hello class method world')    # self.quit is a bound method
        sys.exit()                           # retains the self+quit pair

HelloClass()
mainloop()

On a button press, tkinter calls this class’s quit method with no arguments, as usual.

But really, it does receive one argument—the original self object—even though tkinter

doesn’t pass it explicitly. Because the self.quit bound method retains both self and

quit, it’s compatible with a simple function call; Python automatically passes the

self argument along to the method function. Conversely, registering an unbound in-

stance method that expects an argument, such as HelloClass.quit, won’t work, be-

cause there is no self object to pass along when the event later occurs.
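
The self-plus-function pairing can be inspected directly. This sketch uses a stripped-down stand-in class named Hello (not the book's HelloClass) so that it runs without a GUI:

```python
class Hello:
    def quit(self):
        return 'quit called on ' + type(self).__name__

obj = Hello()
bound = obj.quit                      # bound method: pairs obj with the quit function

print(bound.__self__ is obj)          # the instance is stored in the bound method
print(bound.__func__ is Hello.quit)   # and so is the underlying function
result = bound()                      # no arguments needed, as in a command callback
print(result)

try:
    Hello.quit()                      # fetched from the class: there is no self to pass
except TypeError as exc:
    error = str(exc)
    print('TypeError:', error)
```

Because the bound method already carries its instance, it can be called with no arguments, which is what tkinter does for command callbacks.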

Later, we’ll see that class callback handler coding schemes provide a natural place to

remember information for use on events—simply assign the information to self

instance attributes:

class someGuiClass:
    def __init__(self):
        self.X = 42
        self.Y = 'spam'
        Button(text='Hi', command=self.handler)

    def handler(self):
        ...                              # use self.X, self.Y

Because the event will be dispatched to this class’s method with a reference to the

original instance object, self gives access to attributes that retain original data. In effect,

the instance’s attributes retain state information to be used when events occur. Espe-

cially in larger GUIs, this is a much more flexible technique than global variables or

extra arguments added by lambdas.
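
A minimal sketch of per-instance state follows; the ClickCounter class is invented for illustration, and direct calls to the bound methods stand in for button presses:

```python
class ClickCounter:
    def __init__(self, label):
        self.label = label            # per-instance state, set at construction
        self.count = 0

    def handler(self):                # zero-argument once bound: suitable for command=
        self.count += 1
        return '%s pressed %d time(s)' % (self.label, self.count)

spam, eggs = ClickCounter('spam'), ClickCounter('eggs')
callbacks = [spam.handler, eggs.handler, spam.handler]   # three simulated presses
messages = [cb() for cb in callbacks]
print(messages)
```

Each instance keeps its own count, so several copies of the same GUI component can coexist in one program without sharing globals.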

Callable Class Object Callback Handlers

Because Python class instance objects can also be called if they inherit a __call__

method to intercept the operation, we can pass one of these to serve as a callback

handler too. Example 7-15 shows a class that provides the required function-like

interface.

Example 7-15. PP4E\Gui\Intro\gui3d.py

import sys
from tkinter import *

class HelloCallable:
    def __init__(self):                      # __init__ run on object creation
        self.msg = 'Hello __call__ world'

    def __call__(self):
        print(self.msg)                      # __call__ run later when called
        sys.exit()                           # class object looks like a function

widget = Button(None, text='Hello event world', command=HelloCallable())
widget.pack()
widget.mainloop()

Here, the HelloCallable instance registered with command can be called like a normal

function; Python invokes its __call__ method to handle the call operation made in

tkinter on the button press. In effect, the general __call__ method replaces a specific

bound method in this case. Notice how self.msg is used to retain information for use

on events here; self is the original instance when the special __call__ method is au-

tomatically invoked.
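
The same idea can be generalized so that a callable instance remembers a handler together with its arguments. The Deferred class below is a hypothetical helper written for this sketch, not part of tkinter or the book's examples:

```python
class Deferred:
    """Callable object that remembers a function and its arguments."""
    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def __call__(self):               # run when the instance is called like a function
        return self.func(*self.args)

def handler(a, b):
    return (a, b)

command = Deferred(handler, 42, 'spam')     # could be passed as command=command
print(callable(command))
print(command())

# Arguments are stored at construction time, so loop variables are captured as-is.
commands = [Deferred(handler, c, 'spam') for c in 'abc']
loop_results = [cmd() for cmd in commands]
print(loop_results)
```

Since the state is saved explicitly in instance attributes when each object is made, the loop-variable issue described earlier for enclosing scope references does not arise here.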

All four gui3 variants create the same sort of GUI window (Figure 7-11), but print

different messages to stdout when their button is pressed:

C:\...\PP4E\Gui\Intro> python gui3.py

Hello, I must be going...

C:\...\PP4E\Gui\Intro> python gui3b.py

Hello lambda world

C:\...\PP4E\Gui\Intro> python gui3c.py

Hello class method world

C:\...\PP4E\Gui\Intro> python gui3d.py

Hello __call__ world

There are good reasons for each callback coding scheme (function, lambda, class

method, callable class), but we need to move on to larger examples in order to uncover

them in less theoretical terms.

Other tkinter Callback Protocols

For future reference, also keep in mind that using command options to intercept user-

generated button press events is just one way to register callbacks in tkinter. In fact,

there are a variety of ways for tkinter scripts to catch events:

Button command options

As we’ve just seen, button press events are intercepted by providing a callable object

in widget command options. This is true of other kinds of button-like widgets we’ll

meet in Chapter 8 (e.g., radio and check buttons and scales).

Menu command options

In the upcoming tkinter tour chapters, we’ll also find that a command option is used

to specify callback handlers for menu selections.

Scroll bar protocols

Scroll bar widgets register handlers with command options, too, but they have a

unique event protocol that allows them to be cross-linked with the widget they are

meant to scroll (e.g., listboxes, text displays, and canvases): moving the scroll bar

automatically moves the widget, and vice versa.

General widget bind methods

A more general tkinter event bind method mechanism can be used to register call-

back handlers for lower-level interface events—key presses, mouse movement and

clicks, and so on. Unlike command callbacks, bind callbacks receive an event object

argument (an instance of the tkinter Event class) that gives context about the

event—subject widget, screen coordinates, and so on.

Window manager protocols

In addition, scripts can also intercept window manager events (e.g., window close

requests) by tapping into the window manager protocol method mechanism avail-

able on top-level window objects. Setting a handler for WM_DELETE_WINDOW, for in-

stance, takes over window close buttons.

Scheduled event callbacks

Finally, tkinter scripts can also register callback handlers to be run in special con-

texts, such as timer expirations, input data arrival, and event-loop idle states.

Scripts can also pause for state-change events related to windows and special var-

iables. We’ll meet these event interfaces in more detail near the end of Chapter 9.
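
As a preview of three of these protocols, here is a minimal sketch under stated assumptions: describe and main are made-up names, main opens a real window and is defined but not called, and a SimpleNamespace object stands in for a tkinter event so the handler can be tried without a display:

```python
from types import SimpleNamespace

def describe(event):
    """bind-style handler: receives one event object with context attributes."""
    return 'click on %s at (%d, %d)' % (event.widget, event.x, event.y)

def main():
    """Opens a real window; call this manually on a machine with a display."""
    from tkinter import Tk, Label
    root = Tk()
    label = Label(root, text='Click me')
    label.pack()
    label.bind('<Button-1>', lambda event: print(describe(event)))   # low-level event
    root.protocol('WM_DELETE_WINDOW', root.quit)                     # window close request
    root.after(5000, lambda: print('five seconds elapsed'))          # timer callback
    root.mainloop()

# Exercise the handler without a GUI by faking the event object.
fake = SimpleNamespace(widget='label', x=10, y=20)
message = describe(fake)
print(message)
```

The calls used in main (bind, protocol, and after) are standard tkinter widget methods; the rest of the book covers them in more depth.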

Binding Events

Of all the options listed in the prior section, bind is the most general, but also perhaps

the most complex. We’ll study it in more detail later, but to let you sample its flavor

now, Example 7-16 rewrites the prior section’s GUI again to use bind, not the

command keyword, to catch button presses.

Example 7-16. PP4E\Gui\Intro\gui3e.py

import sys
from tkinter import *

def hello(event):
    print('Press twice to exit')             # on single-left click

def quit(event):                             # on double-left click
    print('Hello, I must be going...')       # event gives widget, x/y, etc.
    sys.exit()

widget = Button(None, text='Hello event world')
widget.pack()
widget.bind('<Button-1>', hello)             # bind left mouse clicks
widget.bind('<Double-1>', quit)              # bind double-left clicks
widget.mainloop()

In fact, this version doesn’t specify a command option for the button at all. Instead, it

binds lower-level callback handlers for both left mouse clicks (<Button-1>) and double-

left mouse clicks (<Double-1>) within the button’s display area. The bind method ac-

cepts a large set of such event identifiers in a variety of formats, which we’ll meet in

Chapter 8.

When run, this script makes the same window as before (see Figure 7-11). Clicking on

the button once prints a message but doesn’t exit; you need to double-click on the

button now to exit as before. Here is the output after clicking twice and double-clicking

once (a double-click fires the single-click callback first):

C:\...\PP4E\Gui\Intro> python gui3e.py
Press twice to exit
Press twice to exit
Press twice to exit
Hello, I must be going...

Although this script intercepts button clicks manually, the end result is roughly the

same; widget-specific protocols such as button command options are really just higher-

level interfaces to events you can also catch with bind.
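
One practical consequence is that the two registration styles call handlers differently: command callbacks get no arguments, while bind callbacks get one event object. The sketch below, with made-up handler names and a fake event object, shows two common ways to cope with that difference:

```python
from types import SimpleNamespace

def quit_handler(event=None):
    """Usable with command= (no arguments) or with bind (one event argument)."""
    source = 'command option' if event is None else 'bind event'
    return 'quit requested by ' + source

def on_press(event, tag):
    """Handler that needs extra data in addition to the event."""
    return '%s pressed at (%d, %d)' % (tag, event.x, event.y)

# A lambda registered with bind must accept the event and pass it along.
forward = lambda event: on_press(event, 'spam')

fake_event = SimpleNamespace(x=3, y=4)      # stand-in for a tkinter event object
print(quit_handler())                       # how a command callback is invoked
print(quit_handler(fake_event))             # how a bind callback is invoked
print(forward(fake_event))
```

Giving the event parameter a default of None lets one function serve both protocols, at the cost of a slightly less explicit signature.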

We’ll meet bind and all of the other tkinter event callback handler hooks again in more

detail later in this book. First, though, let’s focus on building GUIs that are larger than

a single button and explore a few other ways to use classes in GUI work.

Adding Multiple Widgets

It’s time to start building user interfaces with more than one widget. Example 7-17

makes the window shown in Figure 7-12.

Example 7-17. PP4E\Gui\Intro\gui4.py

from tkinter import *

def greeting():
    print('Hello stdout world!...')

win = Frame()
win.pack()
Label(win, text='Hello container world').pack(side=TOP)
Button(win, text='Hello', command=greeting).pack(side=LEFT)
Button(win, text='Quit', command=win.quit).pack(side=RIGHT)

win.mainloop()

Figure 7-12. A multiple-widget window

This example makes a Frame widget (another tkinter class) and attaches three other

widget objects to it, a Label and two Buttons, by passing the Frame as their first argu-

ment. In tkinter terms, we say that the Frame becomes a parent to the other three

widgets. Both buttons on this display trigger callbacks:

• Pressing the Hello button triggers the greeting function defined within this file,

which prints to stdout again.

• Pressing the Quit button calls the standard tkinter quit method, inherited by win

from the Frame class (Frame.quit has the same effect as the Tk.quit we used earlier).

Here is the stdout text that shows up on Hello button presses, wherever this script’s

standard streams may be:

C:\...\PP4E\Gui\Intro> python gui4.py

Hello stdout world!...

Hello stdout world!...

The notion of attaching widgets to containers turns out to be at the core of layouts in

tkinter. Before we go into more detail on that topic, though, let’s get small.

Widget Resizing Revisited: Clipping

Earlier, we saw how to make widgets expand along with their parent window, by pass-

ing expand and fill options to the pack geometry manager. Now that we have a window

with more than one widget, I can let you in on one of the more useful secrets in the

packer. As a rule, widgets packed first are clipped last when a window is shrunk. That

is, the order in which you pack items determines which items will be cut out of the

display if it is made too small. Widgets packed later are cut out first. For example,

Figure 7-13 shows what happens when the gui4 window is shrunk interactively.

Figure 7-13. gui4 gets small

Try reordering the label and button lines in the script and see what happens when the

window shrinks; the first one packed is always the last to go away. For instance, if the

label is packed last, Figure 7-14 shows that it is clipped first, even though it is attached

to the top: side attachments and packing order both impact the overall layout, but only

packing order matters when windows shrink. Here are the changed lines:

Button(win, text='Hello', command=greeting).pack(side=LEFT)

Button(win, text='Quit', command=win.quit).pack(side=RIGHT)

Label(win, text='Hello container world').pack(side=TOP)

Figure 7-14. Label packed last, clipped first

tkinter keeps track of the packing order internally to make this work. Scripts can plan

ahead for shrinkage by calling pack methods of more important widgets first. For in-

stance, on the upcoming tkinter tour, we’ll meet code that builds menus and toolbars

at the top and bottom of the window; to make sure these are lost last as a window is

shrunk, they are packed first, before the application components in the middle.

Similarly, displays that include scroll bars normally pack them before the items they

scroll (e.g., text, lists) so that the scroll bars remain as the window shrinks.

Attaching Widgets to Frames

In larger terms, the critical innovation in this example is its use of frames: Frame widgets

are just containers for other widgets, and so give rise to the notion of GUIs as widget

hierarchies, or trees. Here, win serves as an enclosing window for the other three

widgets. In general, though, by attaching widgets to frames, and frames to other frames,

we can build up arbitrary GUI layouts. Simply divide the user interface into a set of

increasingly smaller rectangles, implement each as a tkinter Frame, and attach basic

widgets to the frame in the desired screen position.

In this script, when you specify win in the first argument to the Label and Button con-

structors, tkinter attaches them to the Frame (they become children of the win parent).

win itself is attached to the default top-level window, since we didn’t pass a parent to

the Frame constructor. When we ask win to run itself (by calling mainloop), tkinter draws

all the widgets in the tree we’ve built.

The three child widgets also provide pack options now: the side arguments tell which

part of the containing frame (i.e., win) to attach the new widget to. The label hooks

onto the top, and the buttons attach to the sides. TOP, LEFT, and RIGHT are all preassigned

string variables imported from tkinter. Arranging widgets is a bit subtler than simply giving a side, though; we need to take a quick detour into packer geometry management details to see why.

Layout: Packing Order and Side Attachments

When a widget tree is displayed, child widgets appear inside their parents and are

arranged according to their order of packing and their packing options. Because of this,

the order in which widgets are packed not only gives their clipping order, but also

determines how their side settings play out in the generated display.

Here’s how the packer’s layout system works:

1. The packer starts out with an available space cavity that includes the entire parent

container (e.g., the whole Frame or top-level window).

2. As each widget is packed on a side, that widget is given the entire requested side

in the remaining space cavity, and the space cavity is shrunk.

3. Later pack requests are given an entire side of what is left, after earlier pack requests

have shrunk the cavity.

4. After widgets are given cavity space, expand divides any space left, and fill and

anchor stretch and position widgets within their assigned space.

For instance, if you recode the gui4 child widget creation logic like this:

Button(win, text='Hello', command=greeting).pack(side=LEFT)

Label(win, text='Hello container world').pack(side=TOP)

Button(win, text='Quit', command=win.quit).pack(side=RIGHT)

you will wind up with the very different display shown in Figure 7-15, even though

you’ve moved the label code only one line down in the source file (contrast with

Figure 7-12).

Figure 7-15. Packing the label second

Despite its side setting, the label does not get the entire top of the window now, and

you have to think in terms of shrinking cavities to understand why. Because the Hello

button is packed first, it is given the entire LEFT side of the Frame. Next, the label is given

the entire TOP side of what is left. Finally, the Quit button gets the RIGHT side of the

remainder—a rectangle to the right of the Hello button and under the label. When this

window shrinks, widgets are clipped in reverse order of their packing: the Quit button

disappears first, followed by the label.†

In the original version of this example (Figure 7-12), the label spans the entire top side

just because it is the first one packed, not because of its side option. In fact, if you look

at Figure 7-14 closely, you’ll see that it illustrates the same point—the label appeared

between the buttons, because they had already carved off the entire left and right sides.
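
The shrinking-cavity reasoning above can be approximated in a few lines of plain Python. This is a simplified model written for illustration, not tkinter's actual implementation: the function pack_parcels is hypothetical, the pixel sizes are invented, and expand, fill, anchor, and padding are ignored:

```python
def pack_parcels(width, height, requests):
    """Carve side parcels out of a shrinking cavity, in packing order.

    Each request is (name, side, req_w, req_h); each result is an
    (x, y, width, height) parcel. Simplified model only.
    """
    x, y, w, h = 0, 0, width, height            # cavity starts as the whole container
    parcels = {}
    for name, side, req_w, req_h in requests:
        if side in ('top', 'bottom'):
            ph = min(req_h, h)                  # as tall as requested, if room remains
            py = y if side == 'top' else y + h - ph
            parcels[name] = (x, py, w, ph)      # the entire side of what is left
            if side == 'top':
                y += ph
            h -= ph                             # cavity shrinks for later widgets
        else:
            pw = min(req_w, w)
            px = x if side == 'left' else x + w - pw
            parcels[name] = (px, y, pw, h)
            if side == 'left':
                x += pw
            w -= pw
    return parcels

label = ('label', 'top', 150, 20)               # made-up requested sizes
hello = ('hello', 'left', 50, 30)
quit_ = ('quit', 'right', 50, 30)

original = pack_parcels(150, 50, [label, hello, quit_])    # label packed first
reordered = pack_parcels(200, 50, [hello, label, quit_])   # label packed second
shrunk = pack_parcels(40, 50, [label, hello, quit_])       # window made too narrow

print(original)
print(reordered)
print(shrunk)
```

In the reordered run the label's parcel starts to the right of the Hello button rather than spanning the full width, and in the shrunk run the widget packed last is left with a zero-width parcel, which mirrors the clipping rule described earlier.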

The Packer’s Expand and Fill Revisited

Beyond the effects of packing order, the fill option we met earlier can be used to stretch

the widget to occupy all the space in the cavity side it has been given, and any cavity

space left after all packing is evenly allocated among widgets with the expand=YES we

saw before. For example, coding this way creates the window in Figure 7-16 (compare

this to Figure 7-15):

Button(win, text='Hello', command=greeting).pack(side=LEFT, fill=Y)

Label(win, text='Hello container world').pack(side=TOP)

Button(win, text='Quit', command=win.quit).pack(side=RIGHT, expand=YES, fill=X)

† Technically, the packing steps are just rerun again after a window resize. But since this means that there won’t

be enough space left for widgets packed last when the window shrinks, it is as if widgets packed first are

clipped last.

To make all of these grow along with their window, though, we also need to make the

container frame expandable; widgets expand beyond their initial packer arrangement

only if all of their parents expand, too. Here are the changes in gui4.py:

win = Frame()

win.pack(side=TOP, expand=YES, fill=BOTH)

Button(win, text='Hello', command=greeting).pack(side=LEFT, fill=Y)

Label(win, text='Hello container world').pack(side=TOP)

Button(win, text='Quit', command=win.quit).pack(side=RIGHT, expand=YES, fill=X)

When this code runs, the Frame is assigned the entire top side of its parent as before

(that is, the top parcel of the root window); but because it is now marked to expand

into unused space in its parent and to fill that space both ways, it and all of its attached

children expand along with the window. Figure 7-17 shows how.

Figure 7-17. gui4 gets big with an expandable frame

Using Anchor to Position Instead of Stretch

And as if that isn’t flexible enough, the packer also allows widgets to be positioned

within their allocated space with an anchor option, instead of filling that space with a

fill. The anchor option accepts tkinter constants identifying all eight points of the

compass (N, NE, NW, S, etc.) and CENTER as its value (e.g., anchor=NW). It instructs the packer

to position the widget at the desired position within its allocated space, if the space

allocated for the widget is larger than the space needed to display the widget.

Figure 7-16. Packing with expand and fill options

The default anchor is CENTER, so widgets show up in the middle of their space (the cavity

side they were given) unless they are positioned with anchor or stretched with fill. To

demonstrate, change gui4 to use this sort of code:

Button(win, text='Hello', command=greeting).pack(side=LEFT, anchor=N)

Label(win, text='Hello container world').pack(side=TOP)

Button(win, text='Quit', command=win.quit).pack(side=RIGHT)

The only thing new here is that the Hello button is anchored to the north side of its

space allocation. Because this button was packed first, it got the entire left side of the

parent frame. This is more space than is needed to show the button, so it shows up in

the middle of that side by default, as in Figure 7-15 (i.e., anchored to the center). Setting

the anchor to N moves it to the top of its side, as shown in Figure 7-18.

Figure 7-18. Anchoring a button to the north
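The anchoring behavior just described can be tried in isolation. The following is a minimal sketch, not the book's gui4 file: the function names build_anchor_demo and main are hypothetical, and the greeting callback is omitted because only layout matters here.

```python
from tkinter import Tk, Frame, Button, Label, LEFT, RIGHT, TOP, N, YES, BOTH

def build_anchor_demo(root, anchor=N):
    """Pack gui4-style widgets, anchoring the first button within its parcel."""
    win = Frame(root)
    win.pack(expand=YES, fill=BOTH)
    hello = Button(win, text='Hello')
    hello.pack(side=LEFT, anchor=anchor)     # position inside the left-side parcel
    Label(win, text='Hello container world').pack(side=TOP)
    Button(win, text='Quit', command=win.quit).pack(side=RIGHT)
    return hello

def main():
    root = Tk()
    build_anchor_demo(root, anchor=N)        # try S or CENTER instead
    root.mainloop()
```

Calling main() opens the window. Because the button's parcel is only as wide as the button itself, the vertical part of the anchor (N, S, or CENTER) is what makes a visible difference in this layout.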

Keep in mind that fill and anchor are applied after a widget has been allocated cavity

side space by its side, packing order, and expand extra space request. By playing with

packing orders, sides, fills, and anchors, you can generate lots of layout and clipping

effects, and you should take a few moments to experiment with alternatives if you

haven’t already. In the original version of this example, for instance, the label spans the

entire top side just because it is the first packed.

As we’ll see later, frames can be nested in other frames, too, in order to make more

complex layouts. In fact, because each parent container is a distinct space cavity, this

provides a sort of escape mechanism for the packer cavity algorithm: to better control

where a set of widgets show up, simply pack them within a nested subframe and attach

the frame as a package to a larger container. A row of push buttons, for example, might

be easier to lay out in a frame of its own than if mixed with other widgets in the display

directly.
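As a rough illustration of the nested-frame idea, the sketch below places a row of buttons in its own subframe and attaches that frame to the bottom of its parent. The name build_toolbar_window and the button labels are hypothetical; no callbacks are registered.

```python
from tkinter import Tk, Frame, Label, Button, TOP, BOTTOM, LEFT, X, YES, BOTH

def build_toolbar_window(root):
    """Pack a label plus a nested frame that holds a row of buttons."""
    Label(root, text='Main display area').pack(side=TOP, expand=YES, fill=BOTH)
    toolbar = Frame(root)                    # nested subframe: a cavity of its own
    toolbar.pack(side=BOTTOM, fill=X)        # attach the whole row as one unit
    for name in ('Open', 'Save', 'Quit'):
        Button(toolbar, text=name).pack(side=LEFT)   # laid out within toolbar only
    return toolbar

def main():
    root = Tk()
    build_toolbar_window(root)
    root.mainloop()
```

The buttons' side=LEFT requests apply only within the toolbar frame, so they line up in a row no matter how the rest of the window is packed.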

Finally, also keep in mind that the widget tree created by these examples is really an

implicit one; tkinter internally records the relationships implied by passed parent

widget arguments. In OOP terms, this is a composition relationship—the Frame contains

a Label and Buttons. Let’s look at inheritance relationships next.

Customizing Widgets with Classes

You don’t have to use OOP in tkinter scripts, but it can definitely help. As we just saw,

tkinter GUIs are built up as class-instance object trees. Here’s another way Python’s

OOP features can be applied to GUI models: specializing widgets by inheritance.

Example 7-18 builds the window in Figure 7-19.

Example 7-18. PP4E\Gui\Intro\gui5.py

from tkinter import *

class HelloButton(Button):
    def __init__(self, parent=None, **config):         # add callback method
        Button.__init__(self, parent, **config)        # and pack myself
        self.pack()                                    # could config style too
        self.config(command=self.callback)

    def callback(self):                                # default press action
        print('Goodbye world...')                      # replace in subclasses
        self.quit()

if __name__ == '__main__':
    HelloButton(text='Hello subclass world').mainloop()

Figure 7-19. A button subclass in action

This example isn’t anything special to look at: it just displays a single button that, when

pressed, prints a message and exits. But this time, it is a button widget we created on

our own. The HelloButton class inherits everything from the tkinter Button class, but

adds a callback method and constructor logic to set the command option to

self.callback, a bound method of the instance. When the button is pressed this time,

the new widget class’s callback method, not a simple function, is invoked.

The **config argument here is assigned unmatched keyword arguments in a dictionary,

so they can be passed along to the Button constructor. The **config in the Button

constructor call unpacks the dictionary back into keyword arguments (it’s actually

optional here, because of the old-style dictionary widget call form we met earlier, but

doesn’t hurt). We met the config widget method called in HelloButton’s constructor

earlier; it is just an alternative way to pass configuration options after the fact (instead

of passing constructor arguments).
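The keyword-argument mechanics above can be tried without a GUI. This is a minimal sketch that uses a stand-in base class; FakeButton and LoggingButton are hypothetical names for illustration and are not part of tkinter.

```python
class FakeButton:
    """Stand-in for a widget class; simply records the options it receives."""
    def __init__(self, parent=None, **options):
        self.parent = parent
        self.options = options               # unmatched keywords arrive as a dict

class LoggingButton(FakeButton):
    def __init__(self, parent=None, **config):
        # **config in the def header collects extra keywords into a dict ...
        self.seen = sorted(config)
        # ... and **config in the call unpacks that dict back into keywords
        FakeButton.__init__(self, parent, **config)

button = LoggingButton(text='Hello', fg='red')
print(button.seen)        # ['fg', 'text']
print(button.options)     # {'text': 'Hello', 'fg': 'red'}
```

The subclass never needs to know which options exist; whatever the caller passes is forwarded unchanged to the superclass constructor.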

Standardizing Behavior and Appearance

So what’s the point of subclassing widgets like this? In short, it allows sets of widgets

made from the customized classes to look and act the same. When coded well, we get

both “for free” from Python’s OOP model. This can be a powerful technique in larger

programs.

Common behavior

Example 7-18 standardizes behavior—it allows widgets to be configured by subclassing

instead of by passing in options. In fact, its HelloButton is a true button; we can pass

in configuration options such as its text as usual when one is made. But we can also

specify callback handlers by overriding the callback method in subclasses, as shown

in Example 7-19.

Example 7-19. PP4E\Gui\Intro\gui5b.py

from gui5 import HelloButton

class MyButton(HelloButton):                  # subclass HelloButton
    def callback(self):                       # redefine press-handler method
        print("Ignoring press!...")

if __name__ == '__main__':
    MyButton(None, text='Hello subclass world').mainloop()

This script makes the same window; but instead of exiting, this MyButton button, when

pressed, prints to stdout and stays up. Here is its standard output after being pressed

a few times:

C:\...\PP4E\Gui\Intro> python gui5b.py

Ignoring press!...
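Why does the subclass's handler run even though the constructor is inherited unchanged? The sketch below models the lookup with plain classes and no widgets; BaseButton and QuietButton are hypothetical stand-ins for HelloButton and MyButton.

```python
class BaseButton:
    """Stand-in for HelloButton: registers self.callback as its press handler."""
    def __init__(self):
        self.command = self.callback         # bound method, looked up from self

    def callback(self):
        return 'Goodbye world...'

class QuietButton(BaseButton):
    def callback(self):                      # override only the handler
        return 'Ignoring press!...'

print(BaseButton().command())     # Goodbye world...
print(QuietButton().command())    # Ignoring press!...
```

Because self.callback is fetched from the instance when the inherited constructor runs, inheritance search finds the subclass's version first, and that is the bound method that gets registered.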

Whether it’s simpler to customize widgets by subclassing or passing in options is probably

a matter of taste in this simple example. But the larger point to notice is that Tk

becomes truly object oriented in Python, just because Python is object oriented—we

can specialize widget classes using normal class-based and object-oriented techniques.

In fact, this applies to both widget behavior and appearance.

Common appearance

For example, although we won’t study widget configuration options until the next

chapter, a similar customized button class could provide a standard look-and-feel

different from tkinter’s defaults for every instance created from it, and approach the

notions of “styles” or “themes” in some GUI toolkits:

class ThemedButton(Button):                             # config my style too
    def __init__(self, parent=None, **configs):         # used for each instance
        Button.__init__(self, parent, **configs)        # see chapter 8 for options
        self.pack()
        self.config(fg='red', bg='black', font=('courier', 12), relief=RAISED, bd=5)

B1 = ThemedButton(text='spam', command=onSpam)          # normal button widget objects
B2 = ThemedButton(text='eggs')                          # but same appearance by inheritance
B2.pack(expand=YES, fill=BOTH)

This code is something of a preview; see file gui5b-themed.py in the examples package

for a complete version, and watch for more on its widget configuration options in

Chapter 8. But it illustrates the application of common appearance by subclassing

widgets directly—every button created from its class looks the same, and will pick up

any future changes in its configurations automatically.

Widget subclasses are a programmer’s tool, of course, but we can also make such configurations

accessible to a GUI’s users. In larger programs later in the book (e.g., PyEdit,

PyClock, and PyMailGUI), we’ll sometimes achieve a similar effect by importing configurations

from modules and applying them to widgets as they are built. If such external

settings are used by a customized widget subclass like our ThemedButton above,

they will again apply to all its instances and subclasses (for reference, the full version

of the following code is in file gui5b-themed-user.py):

from user_preferences import bcolor, bfont, bsize      # get user settings

class ThemedButton(Button):
    def __init__(self, parent=None, **configs):
        Button.__init__(self, parent, **configs)
        self.pack()
        self.config(bg=bcolor, font=(bfont, bsize))

ThemedButton(text='spam', command=onSpam)               # normal button widget objects
ThemedButton(text='eggs', command=onEggs)               # all inherit user preferences

class MyButton(ThemedButton):                           # subclasses inherit prefs too
    def __init__(self, parent=None, **configs):
        ThemedButton.__init__(self, parent, **configs)
        self.config(text='subclass')

MyButton(command=onSpam)

Again, more on widget configuration in the next chapter; the big picture to take away

here is that customizing widget classes with subclasses allows us to tailor both their

behavior and their appearance for an entire set of widgets. The next example provides

yet another way to arrange for specialization—as customizable and attachable widget

packages, usually known as components.

Reusable GUI Components with Classes

Larger GUI interfaces are often built up as subclasses of Frame, with callback handlers

implemented as methods. This structure gives us a natural place to store information

between events: instance attributes record state. It also allows us to both specialize

GUIs by overriding their methods in new subclasses and attach them to larger GUI

structures to reuse them as general components. For instance, a GUI text editor implemented

as a Frame subclass can be attached to and configured by any number of

other GUIs; if done well, we can plug such a text editor into any user interface that

needs text editing tools.

We’ll meet such a text editor component in Chapter 11. For now, Example 7-20 illustrates

the concept in a simple way. The script gui6.py produces the window in

Figure 7-20.

Example 7-20. PP4E\Gui\Intro\gui6.py

from tkinter import *

class Hello(Frame):                             # an extended Frame
    def __init__(self, parent=None):
        Frame.__init__(self, parent)            # do superclass init
        self.pack()
        self.data = 42
        self.make_widgets()                     # attach widgets to self

    def make_widgets(self):
        widget = Button(self, text='Hello frame world!', command=self.message)
        widget.pack(side=LEFT)

    def message(self):
        self.data += 1
        print('Hello frame world %s!' % self.data)

if __name__ == '__main__': Hello().mainloop()

Figure 7-20. A custom Frame in action

This example pops up a single-button window. When pressed, the button triggers the

self.message bound method to print to stdout again. Here is the output after pressing

this button four times; notice how self.data (a simple counter here) retains its state

between presses:

C:\...\PP4E\Gui\Intro> python gui6.py

Hello frame world 43!

Hello frame world 44!

Hello frame world 45!

Hello frame world 46!

This may seem like a roundabout way to show a Button (we did it in fewer lines in

Examples 7-10, 7-11, and 7-12). But the Hello class provides an enclosing organizational

structure for building GUIs. In the examples prior to the last section, we made

GUIs using a function-like approach: we called widget constructors as though they

were functions and hooked widgets together manually by passing in parents to widget

construction calls. There was no notion of an enclosing context, apart from the global

scope of the module file containing the widget calls. This works for simple GUIs but

can make for brittle code when building up larger GUI structures.

But by subclassing Frame as we’ve done here, the class becomes an enclosing context

for the GUI:

• Widgets are added by attaching objects to self, an instance of a Frame container

subclass (e.g., Button).

• Callback handlers are registered as bound methods of self, and so are routed back

to code in the class (e.g., self.message).

• State information is retained between events by assigning to attributes of self,

visible to all callback methods in the class (e.g., self.data).

• It’s easy to make multiple copies of such a GUI component, even within the same

process, because each class instance is a distinct namespace.

• Classes naturally support customization by inheritance and by composition

attachment.
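The points in this list about state retention and distinct namespaces can be seen without creating any widgets. The following sketch is an assumption-laden stand-in, not the book's Hello class: Counter is a hypothetical plain class, and calling its bound methods directly simulates button presses.

```python
class Counter:
    """Stand-in for a Frame subclass such as Hello; no widgets are created."""
    def __init__(self, start=42):
        self.data = start                    # per-instance state

    def message(self):                       # would be registered as a command
        self.data += 1
        return 'Hello frame world %s!' % self.data

first, second = Counter(), Counter(start=0)
press_first = first.message                  # bound methods remember their instance
press_second = second.message

print(press_first())      # Hello frame world 43!
print(press_first())      # Hello frame world 44!
print(press_second())     # Hello frame world 1!
```

Each bound method carries its own self, so two copies of the same component in one process never share counters unless the class is written to share them.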

In a sense, entire GUIs become specialized Frame objects with extensions for an application.

Classes can also provide protocols for building widgets (e.g., the

make_widgets method here), handle standard configuration chores (like setting window

manager options), and so on. In short, Frame subclasses provide a simple way to organize

collections of other widget-class objects.

Attaching Class Components

Perhaps more importantly, subclasses of Frame are true widgets: they can be further

extended and customized by subclassing and can be attached to enclosing widgets. For

instance, to attach the entire package of widgets that a class builds to something else,

simply create an instance of the class with a real parent widget passed in. To illustrate,

running the script in Example 7-21 creates the window shown in Figure 7-21.

Example 7-21. PP4E\Gui\Intro\gui6b.py

from sys import exit
from tkinter import *                    # get Tk widget classes
from gui6 import Hello                   # get the subframe class

parent = Frame(None)                     # make a container widget
parent.pack()
Hello(parent).pack(side=RIGHT)           # attach Hello instead of running it

Button(parent, text='Attach', command=exit).pack(side=LEFT)
parent.mainloop()

Figure 7-21. An attached class component on the right

This script just adds Hello’s button to the right side of parent—a container Frame. In

fact, the button on the right in this window represents an embedded component: its

button really represents an attached Python class object. Pressing the embedded class’s

button on the right prints a message as before; pressing the new button exits the GUI

by a sys.exit call:

C:\...\PP4E\Gui\Intro> python gui6b.py

Hello frame world 43!

Hello frame world 44!

Hello frame world 45!

Hello frame world 46!

In more complex GUIs, we might instead attach large Frame subclasses to other container

components and develop each independently. For instance, Example 7-22 is yet

another specialized Frame itself, but it attaches an instance of the original Hello class in

a more object-oriented fashion. When run as a top-level program, it creates a window

identical to the one shown in Figure 7-21.

Example 7-22. PP4E\Gui\Intro\gui6c.py

from tkinter import *                    # get Tk widget classes
from gui6 import Hello                   # get the subframe class

class HelloContainer(Frame):
    def __init__(self, parent=None):
        Frame.__init__(self, parent)
        self.pack()
        self.makeWidgets()

    def makeWidgets(self):
        Hello(self).pack(side=RIGHT)     # attach a Hello to me
        Button(self, text='Attach', command=self.quit).pack(side=LEFT)

if __name__ == '__main__': HelloContainer().mainloop()

This looks and works exactly like gui6b but registers the added button’s callback handler

as self.quit, which is just the standard quit widget method this class inherits from

Frame. The window this time represents two Python classes at work—the embedded

component’s widgets on the right (the original Hello button) and the container’s

widgets on the left.

Naturally, this is a simple example (we attached only a single button here, after all).

But in more practical user interfaces, the set of widget class objects attached in this way

can be much larger. If you imagine replacing the Hello call in this script with a call to

attach an already coded and fully debugged calculator object, you’ll begin to better

understand the power of this paradigm. If we code all of our GUI components as classes,

they automatically become a library of reusable widgets, which we can combine in other

applications as often as we like.

Extending Class Components

When GUIs are built with classes, there are a variety of ways to reuse their code in other

displays. To extend Hello instead of attaching it, we just override some of its methods

in a new subclass (which itself becomes a specialized Frame widget). This technique is

shown in Example 7-23.

Example 7-23. PP4E\Gui\Intro\gui6d.py

from tkinter import *
from gui6 import Hello

class HelloExtender(Hello):
    def make_widgets(self):                       # extend method here
        Hello.make_widgets(self)
        Button(self, text='Extend', command=self.quit).pack(side=RIGHT)

    def message(self):
        print('hello', self.data)                 # redefine method here

if __name__ == '__main__': HelloExtender().mainloop()

This subclass’s make_widgets method here first builds the superclass’s widgets and then

adds a second Extend button on the right, as shown in Figure 7-22.

Figure 7-22. A customized class’s widgets, on the left

Because it redefines the message method, pressing the original superclass’s button on

the left now prints a different string to stdout (when searching up from self, the

message attribute is found first in this subclass, not in the superclass):

C:\...\PP4E\Gui\Intro> python gui6d.py

hello 42

But pressing the new Extend button on the right, which is added by this subclass, exits

immediately, since the quit method (inherited from Hello, which inherits it from

Frame) is the added button’s callback handler. The net effect is that this class customizes

the original to add a new button and change message’s behavior.

Although this example is simple, it demonstrates a technique that can be powerful in

practice: to change a GUI’s behavior, we can write a new class that customizes its parts

rather than changing the existing GUI code in place. The main code need be debugged

only once and can be customized with subclasses as unique needs arise.

The moral of this story is that tkinter GUIs can be coded without ever writing a single

new class, but using classes to structure your GUI code makes it much more reusable

in the long run. If done well, you can both attach already debugged components to new

interfaces and specialize their behavior in new external subclasses as needed for custom

requirements. Either way, the initial upfront investment to use classes is bound to save

coding time in the end.

Standalone Container Classes

Before we move on, I want to point out that it’s possible to reap most of the class-based

component benefits previously mentioned by creating standalone classes not derived

from tkinter Frames or other widgets. For instance, the class in Example 7-24 generates

the window shown in Figure 7-23.

Example 7-24. PP4E\Gui\Intro\gui7.py

from tkinter import *

class HelloPackage:                               # not a widget subclass
    def __init__(self, parent=None):
        self.top = Frame(parent)                  # embed a Frame
        self.top.pack()
        self.data = 0
        self.make_widgets()                       # attach widgets to self.top

    def make_widgets(self):
        Button(self.top, text='Bye', command=self.top.quit).pack(side=LEFT)
        Button(self.top, text='Hye', command=self.message).pack(side=RIGHT)

    def message(self):
        self.data += 1
        print('Hello number', self.data)

if __name__ == '__main__': HelloPackage().top.mainloop()

Figure 7-23. A standalone class package in action

When run, the Hye button here prints to stdout and the Bye button closes and exits

the GUI, much as before:

C:\...\PP4E\Gui\Intro> python gui7.py

Hello number 1

Hello number 2

Hello number 3

Hello number 4

Also as before, self.data retains state between events, and callbacks are routed to the

self.message method within this class. Unlike before, the HelloPackage class is not itself

a kind of Frame widget. In fact, it’s not a kind of anything—it serves only as a generator

of namespaces for storing away real widget objects and state. Because of that, widgets

are attached to a self.top (an embedded Frame), not to self. Moreover, all references

to the object as a widget must descend to the embedded frame, as in the top.mainloop

call to start the GUI at the end of the script.

This makes for a bit more coding within the class, but it avoids potential name clashes

with both attributes added to self by the tkinter framework and existing tkinter widget

methods. For instance, if you define a config method in your class, it will hide the

config call exported by tkinter. With the standalone class package in this example, you

get only the methods and instance attributes that your class defines.
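The name-clash trade-off can be modeled with plain classes. In the sketch below, Widget is a hypothetical stand-in for a tkinter widget class with a config method, and Clashing and Standalone are illustrative names showing the subclass and standalone styles side by side.

```python
class Widget:
    """Stand-in for a tkinter widget class with a config method."""
    def __init__(self):
        self.options = {}

    def config(self, **options):
        self.options.update(options)

class Clashing(Widget):
    def config(self):                        # accidentally reuses an inherited name
        return 'application settings'

class Standalone:
    def __init__(self):
        self.top = Widget()                  # embedded, not inherited

    def config(self):                        # no clash: a separate namespace
        return 'application settings'

clash = Clashing()
try:
    clash.config(bg='red')                   # widget-style call now hits the override
except TypeError as exc:
    print('clash:', type(exc).__name__)      # clash: TypeError

alone = Standalone()
alone.top.config(bg='red')                   # widget call still reachable through top
print(alone.config(), alone.top.options)     # application settings {'bg': 'red'}
```

The subclass silently replaces the inherited method for every caller, while the standalone class keeps its own names apart from those of the embedded widget.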

In practice, tkinter doesn’t use very many names, so this is not generally a big concern.‡

It can happen, of course; but frankly, I’ve never seen a real tkinter name clash in

widget subclasses in some 18 years of Python coding. Moreover, using standalone

classes is not without other downsides. Although they can generally be attached and

subclassed as before, they are not quite plug-and-play compatible with real widget

objects. For instance, the configuration calls made in Example 7-21 for the Frame subclass

fail in Example 7-25.

‡ If you study the tkinter module’s source code (today, mostly in file __init__.py in Lib\tkinter), you’ll notice

that many of the attribute names it creates start with a single underscore to make them unique from yours;

others do not because they are potentially useful outside of the tkinter implementation (e.g., self.master,

self.children). Curiously, at this writing most of tkinter still does not use the Python “pseudoprivate

attributes” trick of prefixing attribute names with two leading underscores to automatically add the enclosing

class’s name and thus localize them to the creating class. If tkinter is ever rewritten to employ this feature,

name clashes will be much less likely in widget subclasses. Most of the attributes of widget classes, though,

are methods intended for use in client scripts; the single underscore names are accessible too, but are less

likely to clash with most names of your own.

Example 7-25. PP4E\Gui\Intro\gui7b.py

from tkinter import *
from gui7 import HelloPackage        # or get from gui7c--__getattr__ added

frm = Frame()
frm.pack()
Label(frm, text='hello').pack()

part = HelloPackage(frm)
part.pack(side=RIGHT)                # FAILS!--need part.top.pack(side=RIGHT)
frm.mainloop()

This won’t quite work, because part isn’t really a widget. To treat it as such, you must

descend to part.top before making GUI configurations and hope that the name top is

never changed by the class’s developer. In other words, it exposes some of the class’s

internals. The class could make this better by defining a method that always routes

unknown attribute fetches to the embedded Frame, as in Example 7-26.

Example 7-26. PP4E\Gui\Intro\gui7c.py

import gui7
from tkinter import *

class HelloPackage(gui7.HelloPackage):
    def __getattr__(self, name):
        return getattr(self.top, name)            # pass off to a real widget

if __name__ == '__main__': HelloPackage().mainloop()    # invokes __getattr__!

As is, this script simply creates Figure 7-23 again; changing Example 7-25 to import

this extended HelloPackage from gui7c, though, produces the correctly working window

in Figure 7-24.

Figure 7-24. A standalone class package in action

Routing attribute fetches to nested widgets works this way, but that then requires even

more extra coding in standalone package classes. As usual, though, the significance of

all these trade-offs varies per application.
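To see when __getattr__ is and is not invoked, here is a small sketch that needs no display. FakeFrame and Package are hypothetical stand-ins for a tkinter Frame and for the HelloPackage class; the real classes do more.

```python
class FakeFrame:
    """Stand-in for a tkinter Frame; records pack options instead of drawing."""
    def __init__(self):
        self.packed = None

    def pack(self, **options):
        self.packed = options

class Package:
    def __init__(self):
        self.top = FakeFrame()               # embedded "widget"
        self.data = 0

    def __getattr__(self, name):
        # Called only when normal lookup on the instance and its classes fails
        return getattr(self.top, name)

part = Package()
part.pack(side='right')       # not defined in Package: routed to part.top.pack
print(part.top.packed)        # {'side': 'right'}
print(part.data)              # 0
```

Attributes the class defines itself, such as data, are found normally and never reach __getattr__. One caveat of this pattern is that top must be assigned before any undefined attribute is fetched; otherwise the lookup of self.top inside __getattr__ would trigger __getattr__ again.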

The End of the Tutorial

In this chapter, we learned the core concepts of Python/tkinter programming and met

a handful of simple widget objects along the way—e.g., labels, buttons, frames, and

the packer geometry manager. We’ve seen enough to construct simple interfaces, but

we have really only scratched the surface of the tkinter widget set.

In the next two chapters, we will apply what we’ve learned here to study the rest of the

tkinter library, and we’ll learn how to use it to generate the kinds of interfaces you

expect to see in realistic GUI programs. As a preview and roadmap, Table 7-1 lists the

kinds of widgets we’ll meet there in roughly their order of appearance. Note that this

table lists only widget classes; along the way, we will also meet a few additional widget-

related topics that don’t appear in this table.

Table 7-1. tkinter widget classes

Widget class    Description
Label           A simple message area
Button          A simple labeled push-button widget
Frame           A container for attaching and arranging other widget objects
Toplevel, Tk    A new window managed by the window manager
Message         A multiline label
Entry           A simple single-line text-entry field
Checkbutton     A two-state button widget, typically used for multiple-choice selections
Radiobutton     A two-state button widget, typically used for single-choice selections
Scale           A slider widget with scalable positions
PhotoImage      An image object used for displaying full-color images on other widgets
BitmapImage     An image object used for displaying bitmap images on other widgets
Menu            A set of options associated with a Menubutton or top-level window
Menubutton      A button that opens a Menu of selectable options and submenus
Scrollbar       A control for scrolling other widgets (e.g., listbox, canvas, text)
Listbox         A list of selection names
Text            A multiline text browse/edit widget, with support for fonts, and so on
Canvas          A graphic drawing area, which supports lines, circles, photos, text, and so on

We’ve already met Label, Button, and Frame in this chapter’s tutorial. To make the

remaining topics easier to absorb, they are split over the next two chapters: Chapter 8

covers the first widgets in this table up to but not including Menu, and Chapter 9

presents widgets that are lower in this table.

Besides the widget classes in this table, there are additional classes and tools in the

tkinter library, many of which we’ll explore in the following two chapters as well:

Geometry management

pack, grid, place

tkinter linked variables

StringVar, IntVar, DoubleVar, BooleanVar

Advanced Tk widgets

Spinbox, LabelFrame, PanedWindow

Composite widgets

Dialog, ScrolledText, OptionMenu

Scheduled callbacks

Widget after, wait, and update methods

Other tools

Standard dialogs, clipboard, bind and Event, widget configuration options, custom

and modal dialogs, animation techniques

Most tkinter widgets are familiar user interface devices. Some are remarkably rich in

functionality. For instance, the Text class implements a sophisticated multiline text

widget that supports fonts, colors, and special effects and is powerful enough to implement

a web browser’s page display. The similarly feature-rich Canvas class provides

extensive drawing tools powerful enough for visualization and other image-processing

applications. Beyond this, tkinter extensions such as the Pmw, Tix, and ttk packages

described at the start of this chapter add even richer widgets to a GUI programmer’s

toolbox.

Python/tkinter for Tcl/Tk Converts

At the start of this chapter, I mentioned that tkinter is Python’s interface to the Tk GUI

library, originally written for the Tcl language. To help readers migrating from Tcl to

Python and to summarize some of the main topics we met in this chapter, this section

contrasts Python’s Tk interface with Tcl’s. This mapping also helps make Tk references

written for other languages more useful to Python developers.

In general terms, Tcl’s command-string view of the world differs widely from Python’s

object-based approach to programming. In terms of Tk programming, though, the

syntactic differences are fairly small. Here are some of the main distinctions in Python’s

tkinter interface:

Creation

Widgets are created as class instance objects by calling a widget class.

Masters (parents)

Parents are previously created objects that are passed to widget-class constructors.

Widget options

Options are constructor or config keyword arguments or indexed keys.

Operations

Widget operations (actions) become tkinter widget class object methods.

Callbacks

Callback handlers are any callable objects: function, method, lambda, and so on.

Extension

Widgets are extended using Python class inheritance mechanisms.

Composition

Interfaces are constructed by attaching objects, not by concatenating names.

Linked variables (next chapter)

Variables associated with widgets are tkinter class objects with methods.

In Python, widget creation commands (e.g., button) are Python class names that start

with an uppercase letter (e.g., Button), two-word widget operations (e.g., add command)

become a single method name with an underscore (e.g., add_command), and the “configure”

method can be abbreviated as “config,” as in Tcl. In Chapter 8, we will also see

that tkinter “variables” associated with widgets take the form of class instance objects

(e.g., StringVar, IntVar) with get and set methods, not simple Python or Tcl variable

names. Table 7-2 shows some of the primary language mappings in more concrete

terms.

Table 7-2. Tk-to-tkinter mappings

Operation   Tcl/Tk                           Python/tkinter
Creation    frame .panel                     panel = Frame()
Masters     button .panel.quit               quit = Button(panel)
Options     button .panel.go -fg black       go = Button(panel, fg='black')
Configure   .panel.go config -bg red         go.config(bg='red') or go['bg'] = 'red'
Actions     .popup invoke                    popup.invoke()
Packing     pack .panel -side left -fill x   panel.pack(side=LEFT, fill=X)

Some of these differences are more than just syntactic, of course. For instance, Python

builds an internal widget object tree based on parent arguments passed to widget constructors,

without ever requiring concatenated widget pathname strings. Once you’ve

made a widget object, you can use it directly by object reference. Tcl coders can hide

some dotted pathnames by manually storing them in variables, but that’s not quite the

same as Python’s purely object-based model.
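The mappings in Table 7-2 can be lined up against their Tcl counterparts in one short function. This is only a sketch: build_panel is a hypothetical name, the Tcl commands in the comments are the table's own entries, and the function must be given a real Tk root to run.

```python
from tkinter import Tk, Frame, Button, LEFT, X

def build_panel(root):
    """Create and configure widgets by object reference, not by pathname."""
    panel = Frame(root)                           # Tcl: frame .panel
    go = Button(panel, fg='black', text='Go')     # Tcl: button .panel.go -fg black
    go.config(bg='red')                           # Tcl: .panel.go config -bg red
    go['bg'] = 'red'                              # same option, indexed-key form
    go.pack()
    panel.pack(side=LEFT, fill=X)                 # Tcl: pack .panel -side left -fill x
    return panel, go

def main():
    root = Tk()
    panel, go = build_panel(root)
    print(str(panel), str(go))                    # pathnames tkinter generated
    root.mainloop()
```

Calling main() prints the Tk pathnames that tkinter builds behind the scenes from the parent arguments; their exact form depends on the Python version, and scripts normally never need them.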

Once you’ve written a few Python/tkinter scripts, though, the coding distinctions in

the Python object world will probably seem trivial. At the same time, Python’s support

for object-oriented techniques adds an entirely new component to Tk development;

you get the same widgets, plus Python’s support for code structure and reuse.

CHAPTER 8

A tkinter Tour, Part 1

“Widgets and Gadgets and GUIs, Oh My!”

This chapter is a continuation of our look at GUI programming in Python. The previous

chapter used simple widgets (buttons, labels, and the like) to demonstrate the fundamentals

of Python/tkinter coding. That was simple by design: it’s easier to grasp the

big GUI picture if widget interface details don’t get in the way. But now that we’ve seen

the basics, this chapter and the next move on to present a tour of more advanced widget

objects and tools available in the tkinter library.

As we’ll find, this is where GUI scripting starts getting both practical and fun. In these

two chapters, we’ll meet classes that build the interface devices you expect to see in

real programs—e.g., sliders, check buttons, menus, scrolled lists, dialogs, graphics, and

so on. After these chapters, the last GUI chapter moves on to present larger GUIs that

utilize the coding techniques and the interfaces shown in all prior GUI chapters. In

these two chapters, though, examples are small and self-contained so that we can focus

on widget details.

This Chapter’s Topics

Technically, we’ve already used a handful of simple widgets in Chapter 7. So far we’ve

met Label, Button, Frame, and Tk, and studied pack geometry management concepts

along the way. Although all of these are basic, they represent tkinter interfaces in general

and can be workhorses in typical GUIs. Frame containers, for instance, are the basis of

hierarchical display layout.

In this and the following chapter, we’ll explore additional options for widgets we’ve

already seen and move beyond the basics to cover the rest of the tkinter widget set.

Here are some of the widgets and topics we’ll explore in this chapter:

• Toplevel and Tk widgets

• Message and Entry widgets

• Checkbutton, Radiobutton, and Scale widgets

• Images: PhotoImage and BitmapImage objects

• Widget and window configuration options

• Dialogs, both standard and custom

• Low-level event binding

• tkinter linked variable objects

• Using the Python Imaging Library (PIL) extension for other image types and

operations

After this chapter, Chapter 9 concludes the two-part tour by presenting the remainder

of the tkinter library’s tool set: menus, text, canvases, animation, and more.

To make this tour interesting, I’ll also introduce a few notions of component reuse

along the way. For instance, some later examples will be built using components written

for prior examples. Although these two tour chapters introduce widget interfaces, this

book is also about Python programming in general; as we’ll see, tkinter programming

in Python can be much more than simply drawing circles and arrows.

Configuring Widget Appearance

So far, all the buttons and labels in examples have been rendered with a default look-

and-feel that is standard for the underlying platform. With my machine’s color scheme,

that usually means that they’re gray on Windows. tkinter widgets can be made to look

arbitrarily different, though, using a handful of widget and packer options.

Because I generally can’t resist the temptation to customize widgets in examples, I want

to cover this topic early on the tour. Example 8-1 introduces some of the configuration

options available in tkinter.

Example 8-1. PP4E\Gui\Tour\config-label.py

from tkinter import *

root = Tk()

labelfont = ('times', 20, 'bold') # family, size, style

widget = Label(root, text='Hello config world')

widget.config(bg='black', fg='yellow') # yellow text on black label

widget.config(font=labelfont) # use a larger font

widget.config(height=3, width=20) # initial size: lines,chars

widget.pack(expand=YES, fill=BOTH)

root.mainloop()

Remember, we can call a widget’s config method to reset its options at any time, instead

of passing all of them to the object’s constructor. Here, we use it to set options that

produce the window in Figure 8-1.

This may not be completely obvious unless you run this script on a real computer (alas,

I can’t show it in color here), but the label’s text shows up in yellow on a black

background, and with a font that’s very different from what we’ve seen so far. In fact,

this script customizes the label in a number of ways:

Color

By setting the bg option of the label widget here, its background is displayed in

black; the fg option similarly changes the foreground (text) color of the widget to

yellow. These color options work on most tkinter widgets and accept either a sim-

ple color name (e.g., 'blue') or a hexadecimal string. Most of the color names you

are familiar with are supported (unless you happen to work for Crayola). You can

also pass a hexadecimal color identifier string to these options to be more specific;

they start with a # and name a color by its red, green, and blue saturations, with

an equal number of bits in the string for each. For instance, '#ff0000' specifies

eight bits per color and defines pure red; “f” means four “1” bits in hexadecimal.

We’ll come back to this hex form when we meet the color selection dialog later in

this chapter.

Size

The label is given a preset size in lines high and characters wide by setting its

height and width attributes. You can use this setting to make the widget larger than

the tkinter geometry manager would by default.

Font

This script specifies a custom font for the label’s text by setting the label’s font

attribute to a three-item tuple giving the font family, size, and style (here: Times,

20-point, and bold). Font style can be normal, bold, roman, italic, underline,

overstrike, or combinations of these (e.g., “bold italic”). tkinter guarantees that Times,

Courier, and Helvetica font family names exist on all platforms, but others may

work, too (e.g., system gives the system font on Windows). Font settings like this

work on all widgets with text, such as labels, buttons, entry fields, listboxes, and

Text (the latter of which can even display more than one font at once with “tags”).

The font option still accepts older X-Windows-style font indicators—long strings

with dashes and stars—but the newer tuple font indicator form is more platform

independent.

Figure 8-1. A custom label appearance

Layout and expansion

Finally, the label is made generally expandable and stretched by setting the pack

expand and fill options we met in the last chapter; the label grows as the window

does. If you maximize this window, its black background fills the whole screen and

the yellow message is centered in the middle; try it.
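To make the hexadecimal color form described under Color a bit more concrete, here is a minimal sketch that formats three integer components as a '#rrggbb' string. The names rgb_to_hex and demo are hypothetical helpers of my own, not part of tkinter or of this book's examples package; the tkinter import is deferred into demo so that the formatting helper can be tried on a machine without a display.

```python
def rgb_to_hex(red, green, blue):
    """Format three 0..255 color components as a '#rrggbb' string."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError('color component out of range: %r' % (value,))
    return '#%02x%02x%02x' % (red, green, blue)      # two hex digits = eight bits each

def demo():
    # Hypothetical demo: the label of Example 8-1, colored with hex strings.
    from tkinter import Tk, Label, YES, BOTH
    root = Tk()
    widget = Label(root, text='Hello hex world')
    widget.config(bg=rgb_to_hex(0, 0, 0), fg=rgb_to_hex(255, 255, 0))
    widget.pack(expand=YES, fill=BOTH)
    root.mainloop()
```

Here rgb_to_hex(255, 0, 0) produces '#ff0000', the pure red mentioned earlier; calling demo() should draw yellow text on a black label, much like Example 8-1.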

In this script, the net effect of all these settings is that this label looks radically different

from the ones we’ve been making so far. It no longer follows the Windows standard

look-and-feel, but such conformance isn’t always important. For reference, tkinter

provides additional ways to customize appearance that are not used by this script, but

which may appear in others:

Border and relief

A bd= N widget option can be used to set border width, and a relief= S option can

specify a border style; S can be FLAT, SUNKEN, RAISED, GROOVE, SOLID, or RIDGE—all

constants exported by the tkinter module.

Cursor

A cursor option can be given to change the appearance of the mouse pointer when

it moves over the widget. For instance, cursor='gumby' changes the pointer to a

Gumby figure (the green kind). Other common cursor names used in this book

include watch, pencil, cross, and hand2.

State

Some widgets also support the notion of a state, which impacts their appearance.

For example, a state=DISABLED option will generally stipple (gray out) a widget on

screen and make it unresponsive; NORMAL does not. Some widgets support a

READONLY state as well, which displays normally but is unresponsive to changes.

Padding

Extra space can be added around many widgets (e.g., buttons, labels, and text)

with the padx= N and pady= N options. Interestingly, you can set these options both

in pack calls (where it adds empty space around the widget in general) and in a

widget object itself (where it makes the widget larger).
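Of these extra options, state is the one most often changed while a program runs. The following is a rough sketch, under the assumption that the widget supports the state option (buttons and entry fields do); toggle_state and demo are hypothetical names, and the strings 'normal' and 'disabled' are the values of tkinter's NORMAL and DISABLED constants.

```python
def toggle_state(widget):
    """Flip a widget between normal and disabled; return the new state."""
    current = str(widget.cget('state'))           # 'normal', 'disabled', ...
    new = 'normal' if current == 'disabled' else 'disabled'
    widget.config(state=new)                      # same effect as state=NORMAL/DISABLED
    return new

def demo():
    # Hypothetical demo: one button grays out and restores another.
    from tkinter import Tk, Button
    root = Tk()
    target = Button(root, text='Spam', bd=4, relief='ridge', cursor='hand2')
    target.pack(padx=20, pady=10)
    Button(root, text='Toggle', command=lambda: toggle_state(target)).pack(pady=10)
    root.mainloop()
```

Because config can be called at any time, the same pattern works for relief, cursor, and most other appearance options.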

To illustrate some of these extra settings, Example 8-2 configures the custom button

captured in Figure 8-2 and changes the mouse pointer when it is positioned above it.

Example 8-2. PP4E\Gui\Tour\config-button.py

from tkinter import *

widget = Button(text='Spam', padx=10, pady=10)

widget.pack(padx=20, pady=20)

widget.config(cursor='gumby')

widget.config(bd=8, relief=RAISED)

widget.config(bg='dark green', fg='white')

widget.config(font=('helvetica', 20, 'underline italic'))

mainloop()

To see the effects generated by these two scripts’ settings, try out a few changes on your

computer. Most widgets can be given a custom appearance in the same way, and we’ll

see such options used repeatedly in this text. We’ll also meet operational configura-

tions, such as focus (for focusing input) and others. In fact, widgets can have dozens

of options; most have reasonable defaults that produce a native look-and-feel on each

windowing platform, and this is one reason for tkinter’s simplicity. But tkinter lets you

build more custom displays when you want to.

For more on ways to apply configuration options to provide common

look-and-feel for your widgets, refer back to “Customizing Widgets

with Classes” on page 400, especially its ThemedButton examples. Now

that you know more about configuration, its examples’ source code

should more readily show how configurations applied in widget sub-

classes are automatically inherited by all instances and subclasses. The

new ttk extension described in Chapter 7 also provides additional ways

to configure widgets with its notion of themes; see the preceding chapter

for more details and resources on ttk.

Top-Level Windows

tkinter GUIs always have an application root window, whether you get it by default or

create it explicitly by calling the Tk object constructor. This main root window is the

one that opens when your program runs, and it is where you generally pack your most

important and long-lived widgets. In addition, tkinter scripts can create any number

of independent windows, generated and popped up on demand, by creating Toplevel

widget objects.

Each Toplevel object created produces a new window on the display and automatically

adds it to the program’s GUI event-loop processing stream (you don’t need to call the

mainloop method of new windows to activate them). Example 8-3 builds a root and two

pop-up windows.

Figure 8-2. Config button at work

Example 8-3. PP4E\Gui\Tour\toplevel0.py

import sys

from tkinter import Toplevel, Button, Label

win1 = Toplevel() # two independent windows

win2 = Toplevel() # but part of same process

Button(win1, text='Spam', command=sys.exit).pack()

Button(win2, text='SPAM', command=sys.exit).pack()

Label(text='Popups').pack() # on default Tk() root window

win1.mainloop()

The toplevel0 script gets a root window by default (that’s what the Label is attached to,

since it doesn’t specify a real parent), but it also creates two standalone Toplevel win-

dows that appear and function independently of the root window, as seen in

Figure 8-3.

Figure 8-3. Two Toplevel windows and a root window

The two Toplevel windows on the right are full-fledged windows; they can be inde-

pendently iconified, maximized, and so on. Toplevels are typically used to implement

multiple-window displays and pop-up modal and nonmodal dialogs (more on dialogs

in the next section). They stay up until they are explicitly destroyed or until the appli-

cation that created them exits.

In fact, as coded here, pressing the X in the upper right corner of either of the

Toplevel windows kills that window only. On the other hand, the entire program and

all its remaining windows are closed if you press either of the created buttons or the

main window’s X (more on shutdown protocols in a moment).

It’s important to know that although Toplevels are independently active windows, they

are not separate processes; if your program exits, all of its windows are erased, including

all Toplevel windows it may have created. We’ll learn how to work around this rule

later by launching independent GUI programs.

Toplevel and Tk Widgets

A Toplevel is roughly like a Frame that is split off into its own window and has additional

methods that allow you to deal with top-level window properties. The Tk widget is

roughly like a Toplevel, but it is used to represent the application root window.

Toplevel windows have parents, but Tk windows do not—they are the true roots of the

widget hierarchies we build when making tkinter GUIs.

We got a Tk root for free in Example 8-3 because the Label had a default parent, des-

ignated by not having a widget in the first argument of its constructor call:

Label(text='Popups').pack() # on default Tk() root window

Passing None to a widget constructor’s first argument (or to its master keyword argu-

ment) has the same default-parent effect. In other scripts, we’ve made the Tk root more

explicit by creating it directly, like this:

root = Tk()

Label(root, text='Popups').pack() # on explicit Tk() root window

root.mainloop()

In fact, because tkinter GUIs are a hierarchy, by default you always get at least one Tk

root window, whether it is named explicitly, as here, or not. Though not typical, there

may be more than one Tk root if you make them manually, and a program ends if all

its Tk windows are closed. The first Tk top-level window created—whether explicitly

by your code, or automatically by Python when needed—is used as the default parent

window of widgets and other windows if no parent is provided.

You should generally use the Tk root window to display top-level information of some

sort. If you don’t attach widgets to the root, it may show up as an odd empty window

when you run your script (often because you used the default parent unintentionally

in your code by omitting a widget’s parent and didn’t pack widgets attached to it).

Technically, you can suppress the default root creation logic and make multiple root

windows with the Tk widget, as in Example 8-4.

Example 8-4. PP4E\Gui\Tour\toplevel1.py

import tkinter

from tkinter import Tk, Button

tkinter.NoDefaultRoot()

win1 = Tk() # two independent root windows

win2 = Tk()

Button(win1, text='Spam', command=win1.destroy).pack()

Button(win2, text='SPAM', command=win2.destroy).pack()

win1.mainloop()

When run, this script displays the two pop-up windows of the screenshot in Fig-

ure 8-3 only (there is no third root window). But it’s more common to use the Tk root

as a main window and create Toplevel widgets for an application’s pop-up windows.

Notice how this GUI’s windows use a window’s destroy method to close just one

window, instead of sys.exit to shut down the entire program; to see how this method

really does its work, let’s move on to window protocols.

Top-Level Window Protocols

Both Tk and Toplevel widgets export extra methods and features tailored for their top-

level role, as illustrated in Example 8-5.

Example 8-5. PP4E\Gui\Tour\toplevel2.py

"""

pop up three new windows, with style

destroy() kills one window, quit() kills all windows and app (ends mainloop);

top-level windows have title, icon, iconify/deiconify and protocol for wm events;

there always is an application root window, whether by default or created as an

explicit Tk() object; all top-level windows are containers, but they are never

packed/gridded; Toplevel is like Frame, but a new window, and can have a menu;

"""

from tkinter import *

root = Tk() # explicit root

trees = [('The Larch!', 'light blue'),

('The Pine!', 'light green'),

('The Giant Redwood!', 'red')]

for (tree, color) in trees:
    win = Toplevel(root)                                  # new window
    win.title('Sing...')                                  # set border
    win.protocol('WM_DELETE_WINDOW', lambda:None)         # ignore close
    win.iconbitmap('py-blue-trans-out.ico')               # not red Tk

    msg = Button(win, text=tree, command=win.destroy)     # kills one win
    msg.pack(expand=YES, fill=BOTH)
    msg.config(padx=10, pady=10, bd=10, relief=RAISED)
    msg.config(bg='black', fg=color, font=('times', 30, 'bold italic'))

root.title('Lumberjack demo')

Label(root, text='Main window', width=30).pack()

Button(root, text='Quit All', command=root.quit).pack() # kills all app

root.mainloop()

This program adds widgets to the Tk root window, immediately pops up three

Toplevel windows with attached buttons, and uses special top-level protocols. When

run, it generates the scene captured in living black-and-white in Figure 8-4 (the buttons’

text shows up blue, green, and red on a color display).

Figure 8-4. Three Toplevel windows with configurations

There are a few operational details worth noticing here, all of which are more obvious

if you run this script on your machine:

Intercepting closes: protocol

Because the window manager close event has been intercepted by this script using

the top-level widget protocol method, pressing the X in the top-right corner doesn’t

do anything in the three Toplevel pop ups. The name string WM_DELETE_WINDOW

identifies the close operation. You can use this interface to disallow closes apart

from the widgets your script creates. The function created by this script’s

lambda:None does nothing but return None.

Killing one window (and its children): destroy

Pressing the big black buttons in any one of the three pop ups kills that pop up

only, because the pop up runs the widget destroy method. The other windows live

on, much as you would expect of a pop-up dialog window. Technically, this call

destroys the subject widget and any other widgets for which it is a parent. For

windows, this includes all their content. For simpler widgets, the widget is erased.

Because Toplevel windows have parents, too, their relationships might matter on

a destroy—destroying a window, even the automatic or first-made Tk root which

is used as the default parent, also destroys all its child windows. Since Tk root

windows have no parents, they are unaffected by destroys of other windows.

Moreover, destroying the last Tk root window remaining (or the only Tk root cre-

ated) effectively ends the program. Toplevel windows, however, are always de-

stroyed with their parents, and their destruction doesn’t impact other windows to

which they are not ancestors. This makes them ideal for pop-up dialogs. Techni-

cally, a Toplevel can be a child of any type of widget and will be destroyed with it,

though they are usually children of an automatic or explicit Tk.

Killing all windows: quit

To kill all the windows at once and end the GUI application (really, its active

mainloop call), the root window’s button runs the quit method instead. That is,

pressing the root window’s button ends the program. In general, the quit method

immediately ends the entire application and closes all its windows. It can be called

through any tkinter widget, not just through the top-level window; it’s also avail-

able on frames, buttons, and so on. See the discussion of the bind method and its

<Destroy> events later in this chapter for more on quit and destroy.

Window titles: title

As introduced in Chapter 7, top-level window widgets (Tk and Toplevel) have a

title method that lets you change the text displayed on the top border. Here, the

window title text is set to the string 'Sing...' in the pop-ups to override the default

'tk'.

Window icons: iconbitmap

The iconbitmap method changes a top-level window’s icon. It accepts an icon or

bitmap file and uses it for the window’s icon graphic when it is both minimized

and open. On Windows, pass in the name of a .ico file (this example uses one in

the current directory); it will replace the default red “Tk” icon that normally ap-

pears in the upper-lefthand corner of the window as well as in the Windows task-

bar. On other platforms, you may need to use other icon file conventions if the

icon calls in this book won’t work for you (or simply comment-out the calls alto-

gether if they cause scripts to fail); icons tend to be a platform-specific feature that

is dependent upon the underlying window manager.

Geometry management

Top-level windows are containers for other widgets, much like a standalone

Frame. Unlike frames, though, top-level window widgets are never themselves

packed (or gridded, or placed). To embed widgets, this script passes its windows

as parent arguments to label and button constructors.

It is also possible to fetch the maximum window size (the physical screen display

size, as a [width, height] tuple) with the maxsize() method, as well as set the initial

size of a window with the top-level geometry(" width x height + x + y ") method. It

is generally easier and more user-friendly to let tkinter (or your users) work out

window size for you, but display size may be used for tasks such as scaling images

(see the discussion on PyPhoto in Chapter 11 for an example).
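As an illustration of the geometry string format, the sketch below builds a "widthxheight+x+y" specification and uses the screen size reported by maxsize to center a window. The helper names geometry_string, centered_geometry, and demo are hypothetical; treat this as one possible approach rather than a recommended one, since letting tkinter size windows is usually simpler.

```python
def geometry_string(width, height, x=None, y=None):
    """Build a Tk geometry spec: 'WxH', or 'WxH+X+Y' if a position is given."""
    spec = '%dx%d' % (width, height)
    if x is not None and y is not None:
        spec += '%+d%+d' % (x, y)                 # explicit sign, e.g. '+100+50'
    return spec

def centered_geometry(width, height, screen_width, screen_height):
    """Position a width-by-height window in the middle of the screen."""
    x = max(0, (screen_width - width) // 2)       # never start off-screen
    y = max(0, (screen_height - height) // 2)
    return geometry_string(width, height, x, y)

def demo():
    # Hypothetical demo: center a 400x300 root window on the display.
    from tkinter import Tk, Label
    root = Tk()
    screen_width, screen_height = root.maxsize()  # display size, per the text above
    spec = centered_geometry(400, 300, screen_width, screen_height)
    root.geometry(spec)
    Label(root, text=spec).pack(expand=True)
    root.mainloop()
```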

In addition, top-level window widgets support other kinds of protocols that we will

utilize later on in this tour:

State

The iconify and withdraw top-level window object methods allow scripts to hide

and erase a window on the fly; deiconify redraws a hidden or erased window. The

state method queries or changes a window’s state; valid states passed in or re-

turned include iconic, withdrawn, zoomed (full screen on Windows: use geometry

elsewhere), and normal (large enough for window content). The methods lift and

lower raise and lower a window with respect to its siblings (lift is the Tk raise

command, but avoids a Python reserved word). See the alarm scripts near the end

of Chapter 9 for usage.

Menus

Each top-level window can have its own window menus too; both the Tk and the

Toplevel widgets have a menu option used to associate a horizontal menu bar of

pull-down option lists. This menu bar looks as it should on each platform on which

your scripts are run. We’ll explore menus early in Chapter 9.
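The state-related methods described under State can be combined into a simple show/hide switch. This is only a sketch: toggle_visibility and demo are hypothetical names, and the code assumes the window's state method returns the string 'withdrawn' for an erased window, as listed above.

```python
def toggle_visibility(window):
    """Hide a visible top-level window, or redraw a withdrawn one."""
    if window.state() == 'withdrawn':
        window.deiconify()                        # redraw the erased window
        return 'shown'
    window.withdraw()                             # erase it from the screen
    return 'hidden'

def demo():
    # Hypothetical demo: a root-window button shows and hides a pop up.
    from tkinter import Tk, Toplevel, Button, Label
    root = Tk()
    popup = Toplevel(root)
    Label(popup, text='Now you see me').pack(padx=20, pady=20)
    Button(root, text='Show/hide popup',
           command=lambda: toggle_visibility(popup)).pack(padx=20, pady=20)
    root.mainloop()
```

Replacing withdraw with iconify would minimize the window instead of erasing it; deiconify restores it in either case.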

Most top-level window-manager-related methods can also be named with a “wm_” at

the front; for instance, state and protocol can also be called wm_state and wm_protocol.
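Rather than ignoring the close event outright as Example 8-5 does, a script can register a handler that asks first. The following sketch assumes a confirm callable that returns a true or false value; make_close_handler and demo are hypothetical names, and the demo uses the wm_protocol spelling just to show that it is interchangeable with protocol.

```python
def make_close_handler(window, confirm):
    """Return a WM_DELETE_WINDOW callback that closes window only if confirmed."""
    def on_close():
        if confirm():                             # e.g., a standard yes/no dialog
            window.destroy()                      # kill this window only
    return on_close

def demo():
    # Hypothetical demo: the pop up's X button asks before closing.
    from tkinter import Tk, Toplevel, Label
    from tkinter.messagebox import askokcancel
    root = Tk()
    Label(root, text='Main window', width=30).pack()
    win = Toplevel(root)
    Label(win, text='Try my X button').pack(padx=20, pady=20)
    ask = lambda: askokcancel('Verify', 'Close this window?')
    win.wm_protocol('WM_DELETE_WINDOW', make_close_handler(win, ask))
    root.mainloop()
```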

Notice that the script in Example 8-5 passes its Toplevel constructor calls an explicit

parent widget—the Tk root window (that is, Toplevel(root)). Toplevels can be asso-

ciated with a parent just as other widgets can, even though they are not visually em-

bedded in their parents. I coded the script this way to avoid what seems like an odd

feature; if coded instead like this:

win = Toplevel() # new window

and if no Tk root yet exists, this call actually generates a default Tk root window to serve

as the Toplevel’s parent, just like any other widget call without a parent argument. The

problem is that this makes the position of the following line crucial:

root = Tk() # explicit root

If this line shows up above the Toplevel calls, it creates the single root window as

expected. But if you move this line below the Toplevel calls, tkinter creates a default

Tk root window that is different from the one created by the script’s explicit Tk call. You

wind up with two Tk roots just as in Example 8-4. Move the Tk call below the

Toplevel calls and rerun it to see what I mean. You’ll get a fourth window that is com-

pletely empty! As a rule of thumb, to avoid such oddities, make your Tk root windows

early on and make them explicit.

All of the top-level protocol interfaces are available only on top-level window widgets,

but you can often access them by going through other widgets’ master attributes—links

to the widget parents. For example, to set the title of a window in which a frame is

contained, say something like this:

theframe.master.title('Spam demo') # master is the container window

Naturally, you should do so only if you’re sure that the frame will be used in only one

kind of window. General-purpose attachable components coded as classes, for in-

stance, should leave window property settings to their client applications.

Top-level widgets have additional tools, some of which we may not meet in this book.

For instance, under Unix window managers, you can also set the name used on the

window’s icon (iconname). Because some icon options may be useful when scripts run

on Unix only, see other Tk and tkinter resources for more details on this topic. For

now, the next scheduled stop on this tour explores one of the more common uses of

top-level windows.

Dialogs

Dialogs are windows popped up by a script to provide or request additional informa-

tion. They come in two flavors, modal and nonmodal:

Modal

These dialogs block the rest of the interface until the dialog window is dismissed;

users must reply to the dialog before the program continues.

Nonmodal

These dialogs can remain on-screen indefinitely without interfering with other

windows in the interface; they can usually accept inputs at any time.

Regardless of their modality, dialogs are generally implemented with the Toplevel win-

dow object we met in the prior section, whether you make the Toplevel or not. There

are essentially three ways to present pop-up dialogs to users with tkinter—by using

common dialog calls, by using the now-dated Dialog object, and by creating custom

dialog windows with Toplevels and other kinds of widgets. Let’s explore the basics of

all three schemes.

Standard (Common) Dialogs

Because standard dialog calls are simpler, let’s start here. tkinter comes with a

collection of precoded dialog windows that implement many of the most common pop

ups programs generate—file selection dialogs, error and warning pop ups, and question

and answer prompts. They are called standard dialogs (and sometimes common dia-

logs) because they are part of the tkinter library, and they use platform-specific library

calls to look like they should on each platform. A tkinter file open dialog, for instance,

looks like any other on Windows.

All standard dialog calls are modal (they don’t return until the dialog box is dismissed

by the user), and they block the program’s main window while they are displayed.

Scripts can customize these dialogs’ windows by passing message text, titles, and the

like. Since they are so simple to use, let’s jump right into Example 8-6 (coded as

a .pyw file here to avoid a shell pop up when clicked in Windows).

Example 8-6. PP4E\Gui\Tour\dlg1.pyw

from tkinter import *

from tkinter.messagebox import *

def callback():
    if askyesno('Verify', 'Do you really want to quit?'):
        showwarning('Yes', 'Quit not yet implemented')
    else:
        showinfo('No', 'Quit has been cancelled')

errmsg = 'Sorry, no Spam allowed!'

Button(text='Quit', command=callback).pack(fill=X)

Button(text='Spam', command=(lambda: showerror('Spam', errmsg))).pack(fill=X)

mainloop()

A lambda anonymous function is used here to wrap the call to showerror so that it is

passed two hardcoded arguments (remember, button-press callbacks get no arguments

from tkinter itself). When run, this script creates the main window in Figure 8-5.
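The lambda used for the Spam button is one of two common ways to defer a call with arguments; functools.partial from the standard library is the other. In this sketch, report is a hypothetical stand-in for a dialog call such as showerror, so that the code runs without a display.

```python
from functools import partial

def report(title, message):
    """Stand-in for a dialog call such as showerror; returns text instead."""
    return '%s: %s' % (title, message)

errmsg = 'Sorry, no Spam allowed!'

with_lambda = lambda: report('Spam', errmsg)      # errmsg is looked up when called
with_partial = partial(report, 'Spam', errmsg)    # errmsg is saved right now

# Either object could be passed as a button's command option, because both
# are callables that accept no arguments, which is all tkinter requires.
print(with_lambda())
print(with_partial())
```

The two differ only in when names are evaluated: the lambda fetches errmsg each time the button is pressed, while partial remembers the value it was given when the handler was created.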

Figure 8-5. dlg1 main window: buttons to trigger pop ups

When you press this window’s Quit button, the dialog in Figure 8-6 is popped up by

calling the standard askyesno function in the tkinter package’s messagebox module. This

looks different on Unix and Macintosh systems, but it looks like you’d expect when

run on Windows (and in fact varies its appearance even across different versions and

configurations of Windows; using my default Windows 7 setup, it looks slightly

different than it did on Windows XP in the prior edition).

The dialog in Figure 8-6 blocks the program until the user clicks one of its buttons; if

the dialog’s Yes button is clicked (or the Enter key is pressed), the dialog call returns

with a true value and the script pops up the standard dialog in Figure 8-7 by calling

showwarning.

Figure 8-6. dlg1 askyesno dialog (Windows 7)

There is nothing the user can do with Figure 8-7’s dialog but press OK. If No is clicked

in Figure 8-6’s quit verification dialog, a showinfo call creates the pop up in Fig-

ure 8-8 instead. Finally, if the Spam button is clicked in the main window, the standard

dialog captured in Figure 8-9 is generated with the standard showerror call.

Figure 8-7. dlg1 showwarning dialog

Figure 8-8. dlg1 showinfo dialog

Figure 8-9. dlg1 showerror dialog

All of this makes for a lot of window pop ups, of course, and you need to be careful not

to rely on these dialogs too much (it’s generally better to use input fields in long-lived

windows than to distract the user with pop ups). But where appropriate, such pop ups

save coding time and provide a nice native look-and-feel.

A “smart” and reusable Quit button

Let’s put some of these canned dialogs to better use. Example 8-7 implements an at-

tachable Quit button that uses standard dialogs to verify the quit request. Because it’s

a class, it can be attached and reused in any application that needs a verifying Quit

button. Because it uses standard dialogs, it looks as it should on each GUI platform.

Example 8-7. PP4E\Gui\Tour\quitter.py

"""

a Quit button that verifies exit requests;

to reuse, attach an instance to other GUIs, and re-pack as desired

"""

from tkinter import * # get widget classes

from tkinter.messagebox import askokcancel # get canned std dialog

class Quitter(Frame):                          # subclass our GUI
    def __init__(self, parent=None):           # constructor method
        Frame.__init__(self, parent)
        self.pack()
        widget = Button(self, text='Quit', command=self.quit)
        widget.pack(side=LEFT, expand=YES, fill=BOTH)

    def quit(self):
        ans = askokcancel('Verify exit', "Really quit?")
        if ans: Frame.quit(self)

if __name__ == '__main__':  Quitter().mainloop()

This module is mostly meant to be used elsewhere, but it puts up the button it imple-

ments when run standalone. Figure 8-10 shows the Quit button itself in the upper left,

and the askokcancel verification dialog that pops up when Quit is pressed.

Figure 8-10. Quitter, with askokcancel dialog

If you press OK here, Quitter runs the Frame quit method to end the GUI to which this

button is attached (really, the mainloop call). But to really understand how such a spring-

loaded button can be useful, we need to move on and study a client GUI in the next

section.

A dialog demo launcher bar

So far, we’ve seen a handful of standard dialogs, but there are quite a few more. Instead

of just throwing these up in dull screenshots, though, let’s write a Python demo script

to generate them on demand. Here’s one way to do it. First of all, in Example 8-8 we

write a module to define a table that maps a demo name to a standard dialog call (and

we use lambda to wrap the call if we need to pass extra arguments to the dialog

function).

Example 8-8. PP4E\Gui\Tour\dialogTable.py

# define a name:callback demos table

from tkinter.filedialog import askopenfilename # get standard dialogs

from tkinter.colorchooser import askcolor # they live in Lib\tkinter

from tkinter.messagebox import askquestion, showerror

from tkinter.simpledialog import askfloat

demos = {

'Open': askopenfilename,

'Color': askcolor,

'Query': lambda: askquestion('Warning', 'You typed "rm *"\nConfirm?'),

'Error': lambda: showerror('Error!', "He's dead, Jim"),

'Input': lambda: askfloat('Entry', 'Enter credit card number')

}

I put this table in a module so that it might be reused as the basis of other demo scripts

later (dialogs are more fun than printing to stdout). Next, we’ll write a Python script,

shown in Example 8-9, which simply generates buttons for all of this table’s entries—

use its keys as button labels and its values as button callback handlers.

Example 8-9. PP4E\Gui\Tour\demoDlg.py

"create a bar of simple buttons that launch dialog demos"

from tkinter import * # get base widget set

from dialogTable import demos # button callback handlers

from quitter import Quitter # attach a quit object to me

class Demo(Frame):
    def __init__(self, parent=None, **options):
        Frame.__init__(self, parent, **options)
        self.pack()
        Label(self, text="Basic demos").pack()
        for (key, value) in demos.items():
            Button(self, text=key, command=value).pack(side=TOP, fill=BOTH)
        Quitter(self).pack(side=TOP, fill=BOTH)

if __name__ == '__main__': Demo().mainloop()

This script creates the window shown in Figure 8-11 when run as a standalone program;

it’s a bar of demo buttons that simply route control back to the values of the table in

the module dialogTable when pressed.

Figure 8-11. demoDlg main window

Notice that because this script is driven by the contents of the dialogTable module’s

dictionary, we can change the set of demo buttons displayed by changing just

dialogTable (we don’t need to change any executable code in demoDlg). Also note that

the Quit button here is an attached instance of the Quitter class of the prior section

whose frame is repacked to stretch like the other buttons as needed here—it’s at least

one bit of code that you never have to write again.

This script’s class also takes care to pass any **options constructor configuration key-

word arguments on to its Frame superclass. Though not used here, this allows callers

to pass in configuration options at creation time (Demo(o=v)), instead of configuring

after the fact (d.config(o=v)). This isn’t strictly required, but it makes the demo class

work just like a normal tkinter frame widget (which is what subclassing makes it, after

all). We’ll see how this can be used to good effect later.

We’ve already seen some of the dialogs triggered by this demo bar window’s other

buttons, so I’ll just step through the new ones here. Pressing the main window’s Query

button, for example, generates the standard pop up in Figure 8-12.

This askquestion dialog looks like the askyesno we saw earlier, but actually it returns

either string "yes" or "no" (askyesno and askokcancel return True or False instead—

trivial but true). Pressing the demo bar’s Input button generates the standard

askfloat dialog box shown in Figure 8-13.

This dialog automatically checks the input for valid floating-point syntax before it returns,

and it is representative of a collection of single-value input dialogs (askinteger

and askstring prompt for integer and string inputs, too). It returns the input as a

floating-point number object (not as a string) when the OK button or Enter key is

pressed, or the Python None object if the user clicks Cancel. Its two relatives return the

input as integer and string objects instead.
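
As a minimal sketch of how a script might act on these results (this is not one of the book's examples), the function below treats None as a Cancel. The name ask_radius and its ask parameter are hypothetical, made up for illustration; by default it calls the real askfloat in tkinter.simpledialog, but any callable that accepts a title and a prompt can be passed in instead.

```python
import math

def ask_radius(ask=None):
    """Prompt for a radius; return the circle's area, or None on Cancel."""
    if ask is None:
        # default to the real dialog; imported here so Tk is needed only when used
        from tkinter.simpledialog import askfloat as ask
    radius = ask('Input', 'Enter radius')    # a float object, or None on Cancel
    if radius is None:                       # test for None: 0.0 is a valid entry
        return None
    return math.pi * radius ** 2
```

Testing the result with "is None" rather than plain truth matters here, because a legitimate input of 0.0 is also false in Python.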

When the demo bar’s Open button is pressed, we get the standard file open dialog

made by calling askopenfilename and captured in Figure 8-14. This is Windows 7’s

look-and-feel; it can look radically different on Macs, Linux, and older versions of

Windows, but appropriately so.

A similar dialog for selecting a save-as filename is produced by calling asksaveasfilename

(see the Text widget section in Chapter 9 for a first example). Both file dialogs let

the user navigate through the filesystem to select a subject filename, which is returned

with its full directory pathname when Open is pressed; an empty string comes back if

Cancel is pressed instead. Both also have additional protocols not demonstrated by this

example:

• They can be passed a filetypes keyword argument—a set of name patterns used

to select files, which appear in the pull-down list near the bottom of the dialog.

Figure 8-12. demoDlg query, askquestion dialog

Figure 8-13. demoDlg input, askfloat dialog

• They can be passed an initialdir (start directory), initialfile (for “File name”),

title (for the dialog window), defaultextension (appended if the selection has

none), and parent (to appear as an embedded child instead of a pop-up dialog).

• They can be made to remember the last directory selected by using exported objects

instead of these function calls—a hook we’ll make use of in later longer-lived

examples.
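
The keyword arguments in this list might be combined as in the following sketch. The function names open_options and pick_script, and all of the option values, are assumptions chosen for illustration; only the keyword names and askopenfilename itself come from tkinter.

```python
def open_options(start_dir='.'):
    """Keyword arguments for a file dialog call (illustrative values only)."""
    return dict(
        title='Select a script',                  # dialog window title
        initialdir=start_dir,                     # directory shown first
        defaultextension='.py',                   # appended if selection has none
        filetypes=[('Python files', '*.py'),      # (label, pattern) pairs for
                   ('All files', '*')],           # the pull-down list
    )

def pick_script(start_dir='.'):
    """Pop up the open dialog; return the chosen path, or None on Cancel."""
    from tkinter.filedialog import askopenfilename   # Tk needed only when called
    name = askopenfilename(**open_options(start_dir))
    return name or None                           # empty result means Cancel
```

Keeping the options in a dictionary makes it easy to share the same settings between the open and save-as dialogs.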

Another common dialog call in the tkinter filedialog module, askdirectory, can be

used to pop up a dialog that allows users to choose a directory rather than a file. It

presents a tree view that users can navigate to pick the desired directory, and it accepts

keyword arguments including initialdir and title. The corresponding Directory object

remembers the last directory selected and starts there the next time the dialog is

shown.

We’ll use most of these interfaces later in the book, especially for the file dialogs in the

PyEdit example in Chapter 11, but feel free to flip ahead for more details now. The

directory selection dialog will show up in the PyPhoto example in Chapter 11 and the

PyMailGUI example in Chapter 14; again, skip ahead for code and screenshots.

Finally, the demo bar’s Color button triggers a standard askcolor call, which generates

the standard color selection dialog shown in Figure 8-15.

Figure 8-14. demoDlg open, askopenfilename dialog

Figure 8-15. demoDlg color, askcolor dialog

If you press its OK button, it returns a data structure that identifies the selected color,

which can be used in all color contexts in tkinter. It includes RGB values and a hexadecimal

color string (e.g., ((160, 160, 160), '#a0a0a0')). More on how this tuple can

be useful in a moment. If you press Cancel, the script gets back a tuple containing two

nones (Nones of the Python variety, that is).
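
To show how the two parts of this result relate, here is a small sketch that needs no GUI at all. Both helper names are hypothetical, and truncating each RGB component with int is my assumption about the conversion, not something the dialog documents.

```python
def rgb_to_hex(triple):
    """Format an (R, G, B) triple as '#rrggbb'; components may be floats."""
    return '#%02x%02x%02x' % tuple(int(value) for value in triple)

def chosen_color(result, fallback='#ffffff'):
    """Return the hex string of an askcolor result, or fallback on Cancel."""
    triple, hexstr = result               # Cancel yields (None, None)
    return hexstr if hexstr else fallback

print(rgb_to_hex((160, 160, 160)))        # '#a0a0a0'
print(chosen_color((None, None)))         # '#ffffff'
```

In practice the hex string in the result is usually all a script needs; the conversion function is here only to show what that string encodes.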

Printing dialog results and passing callback data with lambdas

The dialog demo launcher bar displays standard dialogs and can be made to display

others by simply changing the dialogTable module it imports. As coded, though, it

really shows only dialogs; it would also be nice to see their return values so that we

know how to use them in scripts. Example 8-10 adds printing of standard dialog results

to the stdout standard output stream.

Example 8-10. PP4E\Gui\Tour\demoDlg-print.py

"""

similar, but show return values of dialog calls; the lambda saves data from
the local scope to be passed to the handler (button press handlers normally
get no arguments, and enclosing scope references don't work for loop variables)
and works just like a nested def statement: def func(key=key): self.printit(key)
"""

from tkinter import *              # get base widget set
from dialogTable import demos      # button callback handlers
from quitter import Quitter        # attach a quit object to me

class Demo(Frame):
    def __init__(self, parent=None):
        Frame.__init__(self, parent)
        self.pack()
        Label(self, text="Basic demos").pack()
        for key in demos:
            func = (lambda key=key: self.printit(key))
            Button(self, text=key, command=func).pack(side=TOP, fill=BOTH)
        Quitter(self).pack(side=TOP, fill=BOTH)

    def printit(self, name):
        print(name, 'returns =>', demos[name]())     # fetch, call, print

if __name__ == '__main__': Demo().mainloop()

This script builds the same main button-bar window, but notice that the callback handler

is an anonymous function made with a lambda now, not a direct reference to dialog

calls in the imported dialogTable dictionary:

# use enclosing scope lookup

func = (lambda key=key: self.printit(key))

We talked about this in the prior chapter’s tutorial, but this is the first time we’ve

actually used lambda like this, so let’s get the facts straight. Because button-press callbacks

are run with no arguments, if we need to pass extra data to the handler, it must

be wrapped in an object that remembers that extra data and passes it along, by deferring

the call to the actual handler. Here, a button press runs the function generated by the

lambda, an indirect call layer that retains information from the enclosing scope. The

net effect is that the real handler, printit, receives an extra required name argument

giving the demo associated with the button pressed, even though this argument wasn’t

passed back from tkinter itself. In effect, the lambda remembers and passes on state

information.

Notice, though, that this lambda function’s body references both self and key in the

enclosing method’s local scope. In all recent Pythons, the reference to self just works

because of the enclosing function scope lookup rules, but we need to pass key in explicitly

with a default argument or else it will be the same in all the generated lambda

functions—the value it has after the last loop iteration. As we learned in Chapter 7,

enclosing scope references are resolved when the nested function is called, but defaults

are resolved when the nested function is created. Because self won’t change after the

function is made, we can rely on the scope lookup rules for that name, but not for loop

variables like key.
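
The difference is easy to see without any GUI. In this sketch, make_handlers is a hypothetical function that builds two lists of lambdas inside a loop: one relies on enclosing scope lookup for key, and the other saves key with a default argument.

```python
def make_handlers():
    by_scope, by_default = [], []
    for key in ['Open', 'Color', 'Query']:
        by_scope.append(lambda: key)             # key is looked up when called
        by_default.append(lambda key=key: key)   # key is saved when lambda is made
    return by_scope, by_default

by_scope, by_default = make_handlers()
print([f() for f in by_scope])       # ['Query', 'Query', 'Query']
print([f() for f in by_default])     # ['Open', 'Color', 'Query']
```

Every function in the first list sees the value key had when the loop ended, which is why the demo bar passes key in with a default.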

In earlier Pythons, default arguments were required to pass all values in from enclosing

scopes explicitly, using either of these two techniques:

# use simple defaults

func = (lambda self=self, name=key: self.printit(name))

# use a bound method default

func = (lambda handler=self.printit, name=key: handler(name))

Today, we can get away with the simpler enclosing-scope reference technique for

self, though we still need a default for the key loop variable (and you may still see the

default forms in older Python code).

Note that the parentheses around the lambdas are not required here; I add them as a

personal style preference just to set the lambda off from its surrounding code (your

mileage may vary). Also notice that the lambda does the same work as a nested def

statement here; in practice, though, the lambda could appear within the call to

Button itself because it is an expression and it need not be assigned to a name. The

following two forms are equivalent:

for (key, value) in demos.items():
    func = (lambda key=key: self.printit(key))    # can be nested in Button()

for (key, value) in demos.items():
    def func(key=key): self.printit(key)          # but def statement cannot

You can also use a callable class object here that retains state as instance attributes (see

the tutorial’s __call__ example in Chapter 7 for hints). But as a rule of thumb, if you

want a lambda’s result to use any names from the enclosing scope when later called,

either simply name them and let Python save their values for future use, or pass them

in with defaults to save the values they have at lambda function creation time. The latter

scheme is required only if the variable used may change before the callback occurs.
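
For comparison, a callable class version might look like the following sketch. The Callback class and this standalone printit function are hypothetical stand-ins written for illustration, not code from this chapter's examples.

```python
class Callback:
    """Remember a handler and its extra data; run the handler when called."""
    def __init__(self, handler, *args):
        self.handler = handler
        self.args = args                 # state saved at creation time

    def __call__(self):                  # called with no arguments, like a press
        return self.handler(*self.args)

def printit(name):
    return name + ' pressed'

callbacks = [Callback(printit, key) for key in ['Open', 'Color']]
print([cb() for cb in callbacks])        # ['Open pressed', 'Color pressed']
```

In the demo bar, an instance such as Callback(self.printit, key) could be passed to the Button's command option in place of the lambda. Because each instance stores its own arguments when it is created, the loop variable issue does not arise.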

When run, this script creates the same window (Figure 8-11) but also prints dialog

return values to standard output; here is the output after clicking all the demo buttons

in the main window and picking both Cancel/No and then OK/Yes buttons in each

dialog:

C:\...\PP4E\Gui\Tour> python demoDlg-print.py

Color returns => (None, None)

Color returns => ((128.5, 128.5, 255.99609375), '#8080ff')

Query returns => no

Query returns => yes

Input returns => None

Input returns => 3.14159

Open returns =>

Open returns => C:/Users/mark/Stuff/Books/4E/PP4E/dev/Examples/PP4E/Launcher.py

Error returns => ok

Now that I’ve shown you these dialog results, I want to next show you how one of them

can actually be useful.

Letting users select colors on the fly

The standard color selection dialog isn’t just another pretty face—scripts can pass the

hexadecimal color string it returns to the bg and fg widget color configuration options

we met earlier. That is, bg and fg accept both a color name (e.g., blue) and an askcolor

hex RGB result string that starts with a # (e.g., the #8080ff in the Color output line

of the prior section).

This adds another dimension of customization to tkinter GUIs: instead of hardcoding

colors in your GUI products, you can provide a button that pops up color selectors that

let users choose color preferences on the fly. Simply pass the color string to widget

config methods in callback handlers, as in Example 8-11.

Example 8-11. PP4E\Gui\Tour\setcolor.py

from tkinter import *
from tkinter.colorchooser import askcolor

def setBgColor():
    (triple, hexstr) = askcolor()
    if hexstr:
        print(hexstr)
        push.config(bg=hexstr)

root = Tk()
push = Button(root, text='Set Background Color', command=setBgColor)
push.config(height=3, font=('times', 20, 'bold'))
push.pack(expand=YES, fill=BOTH)
root.mainloop()

This script creates the window in Figure 8-16 when launched (its button’s background

is a sort of green, but you’ll have to trust me on this). Pressing the button pops up the

color selection dialog shown earlier; the color you pick in that dialog becomes the

background color of this button after you press OK.

Figure 8-16. setcolor main window

Color strings are also printed to the stdout stream (the console window); run this on

your computer to experiment with available color settings:

C:\...\PP4E\Gui\Tour> python setcolor.py

#0080c0

#408080

#77d5df

Other standard dialog calls

We’ve seen most of the standard dialogs and we’ll use these pop ups in examples

throughout the rest of this book. But for more details on other calls and options available,

either consult other tkinter documentation or browse the source code of the

modules used at the top of the dialogTable module in Example 8-8; all are simple

Python files installed in the tkinter subdirectory of the Python source library on your

machine (e.g., in C:\Python31\Lib on Windows). And keep this demo bar example filed

away for future reference; we’ll reuse it later in the tour for callback actions when we

meet other button-like widgets.

The Old-Style Dialog Module

In older Python code, you may see dialogs occasionally coded with the standard tkinter

dialog module. This is a bit dated now, and it uses an X Windows look-and-feel; but

just in case you run across such code in your Python maintenance excursions, Example 8-12

gives you a feel for the interface.

Example 8-12. PP4E\Gui\Tour\dlg-old.py

from tkinter import *
from tkinter.dialog import Dialog

class OldDialogDemo(Frame):
    def __init__(self, master=None):
        Frame.__init__(self, master)
        Pack.config(self)              # same as self.pack()
        Button(self, text='Pop1', command=self.dialog1).pack()
        Button(self, text='Pop2', command=self.dialog2).pack()

    def dialog1(self):
        ans = Dialog(self,
                     title   = 'Popup Fun!',
                     text    = 'An example of a popup-dialog '
                               'box, using older "Dialog.py".',
                     bitmap  = 'questhead',
                     default = 0, strings = ('Yes', 'No', 'Cancel'))
        if ans.num == 0: self.dialog2()

    def dialog2(self):
        Dialog(self, title   = 'HAL-9000',
                     text    = "I'm afraid I can't let you do that, Dave...",
                     bitmap  = 'hourglass',
                     default = 0, strings = ('spam', 'SPAM'))

if __name__ == '__main__': OldDialogDemo().mainloop()

If you supply Dialog a tuple of button labels and a message, you get back the index of

the button pressed (the leftmost is index zero). Dialog windows are modal: the rest of

the application’s windows are disabled until the Dialog receives a response from the

user. When you press the Pop2 button in the main window created by this script, the

second dialog pops up, as shown in Figure 8-17.

Figure 8-17. Old-style dialog

This is running on Windows, and as you can see, it is nothing like what you would

expect on that platform for a question dialog. In fact, this dialog generates an X Windows

look-and-feel, regardless of the underlying platform. Because of both Dialog’s

appearance and the extra complexity required to program it, you are probably better

off using the standard dialog calls of the prior section instead.

Custom Dialogs

The dialogs we’ve seen so far have a standard appearance and interaction. They are fine

for many purposes, but often we need something a bit more custom. For example,

forms that request multiple field inputs (e.g., name, age, shoe size) aren’t directly addressed

by the common dialog library. We could pop up one single-input dialog in turn

for each requested field, but that isn’t exactly user friendly.

Custom dialogs support arbitrary interfaces, but they are also the most complicated to

program. Even so, there’s not much to it—simply create a pop-up window as a

Toplevel with attached widgets, and arrange a callback handler to fetch user inputs

entered in the dialog (if any) and to destroy the window. To make such a custom dialog

modal, we also need to wait for a reply by giving the window input focus, making other

windows inactive, and waiting for an event. Example 8-13 illustrates the basics.

Example 8-13. PP4E\Gui\Tour\dlg-custom.py

import sys
from tkinter import *
makemodal = (len(sys.argv) > 1)

def dialog():
    win = Toplevel()                                     # make a new window
    Label(win,  text='Hard drive reformatted!').pack()   # add a few widgets
    Button(win, text='OK', command=win.destroy).pack()   # set destroy callback
    if makemodal:
        win.focus_set()          # take over input focus,
        win.grab_set()           # disable other windows while I'm open,
        win.wait_window()        # and wait here until win destroyed
    print('dialog exit')         # else returns right away

root = Tk()
Button(root, text='popup', command=dialog).pack()
root.mainloop()

This script is set up to create a pop-up dialog window in either modal or nonmodal

mode, depending on its makemodal global variable. If it is run with no command-line

arguments, it picks nonmodal style, captured in Figure 8-18.

Figure 8-18. Nonmodal custom dialogs at work

The window in the upper right is the root window here; pressing its “popup” button

creates a new pop-up dialog window. Because dialogs are nonmodal in this mode, the

root window remains active after a dialog is popped up. In fact, nonmodal dialogs never

block other windows, so you can keep pressing the root’s button to generate as many

copies of the pop-up window as will fit on your screen. Any or all of the pop ups can

be killed by pressing their OK buttons, without killing other windows in this display.

Making custom dialogs modal

Now, when the script is run with a command-line argument (e.g.,

python dlg-custom.py 1), it makes its pop ups modal instead. Because modal dialogs

grab all of the interface’s attention, the main window becomes inactive in this mode

until the pop up is killed; you can’t even click on it to reactivate it while the dialog is

open. Because of that, you can never make more than one copy of the pop up on-screen

at once, as shown in Figure 8-19.

Figure 8-19. A modal custom dialog at work

In fact, the call to the dialog function in this script doesn’t return until the dialog

window on the left is dismissed by pressing its OK button. The net effect is that modal

dialogs impose a function call–like model on an otherwise event-driven programming

model; user inputs can be processed right away, not in a callback handler triggered at

some arbitrary point in the future.

Forcing such a linear control flow on a GUI takes a bit of extra work, though. The secret

to locking other windows and waiting for a reply boils down to three lines of code,

which are a general pattern repeated in most custom modal dialogs.

win.focus_set()

Makes the window take over the application’s input focus, as if it had been clicked

with the mouse to make it the active window. This method is also known by the

synonym focus, and it’s also common to set the focus on an input widget within

the dialog (e.g., an Entry) rather than on the entire window.

win.grab_set()

Disables all other windows in the application until this one is destroyed. The user

cannot interact with other windows in the program while a grab is set.

win.wait_window()

Pauses the caller until the win widget is destroyed, but keeps the main event-processing

loop (mainloop) active during the pause. That means that the GUI at

large remains active during the wait; its windows redraw themselves if covered and

uncovered, for example. When the window is destroyed with the destroy method,

it is erased from the screen, the application grab is automatically released, and this

method call finally returns.

Because the script waits for a window destroy event, it must also arrange for a callback

handler to destroy the window in response to interaction with widgets in the dialog

window (the only window active). This example’s dialog is simply informational, so

its OK button calls the window’s destroy method. In user-input dialogs, we might

instead install an Enter key-press callback handler that fetches data typed into an

Entry widget and then calls destroy (see later in this chapter).
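
As a rough sketch of such a user-input dialog (an illustration under assumptions, not the example that appears later in this chapter), the code below applies the same three-call pattern to a window with an Entry. The names make_reply_handler and ask_text are hypothetical. The handler is built separately so that the fetch-then-destroy order can be tried with simple stand-in objects, without a display.

```python
from types import SimpleNamespace

def make_reply_handler(entry, window, reply):
    """Build a handler that saves the entry's text, then destroys the window."""
    def on_enter(event=None):
        reply.append(entry.get())        # fetch first: widget is gone after destroy
        window.destroy()                 # this ends the wait_window call below
    return on_enter

def ask_text(prompt):
    """Modal prompt: return the text typed, or None if the window was closed."""
    from tkinter import Toplevel, Label, Entry    # Tk needed only when called
    reply = []
    win = Toplevel()
    Label(win, text=prompt).pack()
    ent = Entry(win)
    ent.pack()
    ent.bind('<Return>', make_reply_handler(ent, win, reply))
    ent.focus_set()                      # focus on the input field itself
    win.grab_set()                       # disable other windows while open
    win.wait_window()                    # wait here until win is destroyed
    return reply[0] if reply else None

# exercise the handler with stand-ins for the Entry and the window
calls, reply = [], []
fake_entry = SimpleNamespace(get=lambda: 'spam')
fake_window = SimpleNamespace(destroy=lambda: calls.append('destroy'))
make_reply_handler(fake_entry, fake_window, reply)()
print(reply, calls)                      # ['spam'] ['destroy']
```

Within a running GUI, a button callback could simply call ask_text('Name?') and use its return value right away, which is the function call–like model that modal dialogs provide.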

Other ways to be modal

Modal dialogs are typically implemented by waiting for a newly created pop-up window’s

destroy event, as in this example. But other schemes are viable too. For example,

it’s possible to create dialog windows ahead of time, and show and hide them as needed

with the top-level window’s deiconify and withdraw methods (see the alarm scripts

near the end of Chapter 9 for details). Given that window creation is generally

fast enough to appear instantaneous today, this is much less common than making

and destroying a window from scratch on each interaction.

It’s also possible to implement a modal state by waiting for a tkinter variable to change

its value, instead of waiting for a window to be destroyed. See this chapter’s later discussion

of tkinter variables (which are class objects, not normal Python variables) and

the wait_variable method discussed near the end of Chapter 9 for more details. This

scheme allows a long-lived dialog box’s callback handler to signal a state change to a

waiting main program, without having to destroy the dialog box.

Finally, if you call the mainloop method recursively, the call won’t return until the widget

quit method has been invoked. The quit method terminates a mainloop call, and so

normally ends a GUI program. But it will simply exit a recursive mainloop level if one

is active. Because of this, modal dialogs can also be written without wait method calls

if you are careful. For instance, Example 8-14 works the same way as the modal mode

of dlg-custom.

Example 8-14. PP4E\Gui\Tour\dlg-recursive.py

from tkinter import *

def dialog():
    win = Toplevel()                                    # make a new window
    Label(win,  text='Hard drive reformatted!').pack()  # add a few widgets
    Button(win, text='OK', command=win.quit).pack()     # set quit callback
    win.protocol('WM_DELETE_WINDOW', win.quit)          # quit on wm close too!

    win.focus_set()          # take over input focus,
    win.grab_set()           # disable other windows while I'm open,
    win.mainloop()           # and start a nested event loop to wait
    win.destroy()
    print('dialog exit')

root = Tk()
Button(root, text='popup', command=dialog).pack()
root.mainloop()

If you go this route, be sure to call quit rather than destroy in dialog callback handlers

(destroy doesn’t terminate the mainloop level), and be sure to use protocol to make the

window border close button call quit too (or else it won’t end the recursive mainloop

level call and may generate odd error messages when your program finally exits). Because

of this extra complexity, you’re probably better off using wait_window or

wait_variable, not recursive mainloop calls.

We’ll see how to build form-like dialogs with labels and input fields later in this chapter

when we meet Entry, and again when we study the grid manager in Chapter 9. For

more custom dialog examples, see ShellGui (Chapter 10), PyMailGUI (Chapter 14),

PyCalc (Chapter 19), and the nonmodal form.py (Chapter 12). Here, we’re moving on

to learn more about events that will prove to be useful currency at later tour

destinations.

Binding Events

We met the bind widget method in the prior chapter, when we used it to catch button

presses in the tutorial. Because bind is commonly used in conjunction with other widgets

(e.g., to catch return key presses for input boxes), we’re going to make a stop early

in the tour here as well. Example 8-15 illustrates more bind event protocols.

Example 8-15. PP4E\Gui\Tour\bind.py

from tkinter import *

def showPosEvent(event):
    print('Widget=%s X=%s Y=%s' % (event.widget, event.x, event.y))

def showAllEvent(event):
    print(event)
    for attr in dir(event):
        if not attr.startswith('__'):
            print(attr, '=>', getattr(event, attr))

def onKeyPress(event):
    print('Got key press:', event.char)

def onArrowKey(event):
    print('Got up arrow key press')

def onReturnKey(event):
    print('Got return key press')

def onLeftClick(event):
    print('Got left mouse button click:', end=' ')
    showPosEvent(event)

def onRightClick(event):
    print('Got right mouse button click:', end=' ')
    showPosEvent(event)

def onMiddleClick(event):
    print('Got middle mouse button click:', end=' ')
    showPosEvent(event)
    showAllEvent(event)

def onLeftDrag(event):
    print('Got left mouse button drag:', end=' ')
    showPosEvent(event)

def onDoubleLeftClick(event):
    print('Got double left mouse click', end=' ')
    showPosEvent(event)
    tkroot.quit()

tkroot = Tk()
labelfont = ('courier', 20, 'bold')               # family, size, style
widget = Label(tkroot, text='Hello bind world')
widget.config(bg='red', font=labelfont)           # red background, large font
widget.config(height=5, width=20)                 # initial size: lines,chars
widget.pack(expand=YES, fill=BOTH)

widget.bind('<Button-1>',  onLeftClick)           # mouse button clicks
widget.bind('<Button-3>',  onRightClick)
widget.bind('<Button-2>',  onMiddleClick)         # middle=both on some mice
widget.bind('<Double-1>',  onDoubleLeftClick)     # click left twice
widget.bind('<B1-Motion>', onLeftDrag)            # click left and move

widget.bind('<KeyPress>',  onKeyPress)            # all keyboard presses
widget.bind('<Up>',        onArrowKey)            # arrow button pressed
widget.bind('<Return>',    onReturnKey)           # return/enter key pressed
widget.focus()                                    # or bind keypress to tkroot
tkroot.title('Click Me')
tkroot.mainloop()

Most of this file consists of callback handler functions triggered when bound events

occur. As we learned in Chapter 7, this type of callback receives an event object argument

that gives details about the event that fired. Technically, this argument is an

instance of the tkinter Event class, and its details are attributes; most of the callbacks

simply trace events by displaying relevant event attributes.

When run, this script makes the window shown in Figure 8-20; it’s mostly intended

just as a surface for clicking and pressing event triggers.

Figure 8-20. A bind window for the clicking

The black-and-white medium of the book you’re holding won’t really do justice to this

script. When run live, it uses the configuration options shown earlier to make the

window show up as black on red, with a large Courier font. You’ll have to take my

word for it (or run this on your own).

But the main point of this example is to demonstrate other kinds of event binding

protocols at work. We saw a script that intercepted left and double-left mouse clicks

with the widget bind method in Chapter 7, using event names <Button-1> and

<Double-1>; the script here demonstrates other kinds of events that are commonly

caught with bind:

<KeyPress>

To catch the press of a single key on the keyboard, register a handler for the

<KeyPress> event identifier; this is a lower-level way to input data in GUI programs

than the Entry widget covered in the next section. The key pressed is returned in

ASCII string form in the event object passed to the callback handler (event.char).

Other attributes in the event structure identify the key pressed in lower-level detail.

Key presses can be intercepted by the top-level root window widget or by a widget

that has been assigned keyboard focus with the focus method used by this script.

<B1-Motion>

This script also catches mouse motion while a button is held down: the registered

<B1-Motion> event handler is called every time the mouse is moved while the left

button is pressed and receives the current X/Y coordinates of the mouse pointer in

its event argument (event.x, event.y). Such information can be used to implement

object moves, drag-and-drop, pixel-level painting, and so on (e.g., see the PyDraw

examples in Chapter 11).

<Button-3>, <Button-2>

This script also catches right and middle mouse button clicks (known as buttons

3 and 2). To make the middle button 2 click work on a two-button mouse, try

clicking both buttons at the same time; if that doesn’t work, check your mouse

setting in your properties interface (the Control Panel on Windows).

<Return>, <Up>

To catch more specific kinds of key presses, this script registers for the Return/

Enter and up-arrow key press events; these events would otherwise be routed to

the general <KeyPress> handler and require event analysis.
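
To suggest how the X/Y coordinates of drag events might be put to work, here is a minimal sketch of a handler object that accumulates pointer movement. The DragTracker class is hypothetical, and the events fed to it at the end are stand-in objects with made-up coordinates, so the logic can be run without a display.

```python
from types import SimpleNamespace

class DragTracker:
    """Accumulate how far the pointer moves during a left-button drag."""
    def __init__(self):
        self.last = None                 # (x, y) of the previous event
        self.moved = (0, 0)              # total displacement so far

    def on_press(self, event):           # bind to '<Button-1>'
        self.last = (event.x, event.y)
        self.moved = (0, 0)

    def on_drag(self, event):            # bind to '<B1-Motion>'
        if self.last is None:            # drag seen without a press: start here
            self.last = (event.x, event.y)
        dx = event.x - self.last[0]
        dy = event.y - self.last[1]
        self.moved = (self.moved[0] + dx, self.moved[1] + dy)
        self.last = (event.x, event.y)
        return dx, dy

tracker = DragTracker()
tracker.on_press(SimpleNamespace(x=144, y=43))
for x, y in [(144, 45), (144, 47), (145, 50), (146, 51), (149, 53)]:
    tracker.on_drag(SimpleNamespace(x=x, y=y))
print(tracker.moved)                     # (5, 10)
```

In a real script, the two methods would be registered with widget.bind, and the per-event deltas could be used to move a drawn object along with the pointer.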

Here is what shows up in the stdout output stream after a left click, right click, left click

and drag, a few key presses, a Return and up-arrow press, and a final double-left click

to exit. When you press the left mouse button and drag it around on the display, you’ll

get lots of drag event messages; one is printed for every move during the drag (and one

Python callback is run for each):

C:\...\PP4E\Gui\Tour> python bind.py

Got left mouse button click: Widget=.25763696 X=376 Y=53

Got right mouse button click: Widget=.25763696 X=36 Y=60

Got left mouse button click: Widget=.25763696 X=144 Y=43

Got left mouse button drag: Widget=.25763696 X=144 Y=45

Got left mouse button drag: Widget=.25763696 X=144 Y=47

Got left mouse button drag: Widget=.25763696 X=145 Y=50

Got left mouse button drag: Widget=.25763696 X=146 Y=51

Got left mouse button drag: Widget=.25763696 X=149 Y=53

Got key press: s

Got key press: p

Got key press: a

Got key press: m

Got key press: 1

Got key press: -

Got key press: 2

Got key press: .

Got return key press

Got up arrow key press

Got left mouse button click: Widget=.25763696 X=300 Y=68

Got double left mouse click Widget=.25763696 X=300 Y=68

For mouse-related events, callbacks print the X and Y coordinates of the mouse pointer,

in the event object passed in. Coordinates are usually measured in pixels from the

upper-left corner (0,0), but are relative to the widget being clicked. Here’s what is

printed for a left, middle, and double-left click. Notice that the middle-click callback

dumps the entire argument—all of the Event object’s attributes (less internal names

that begin with “__” which includes the __doc__ string, and default operator overloading

methods inherited from the implied object superclass in Python 3.X). Different

event types set different event attributes; most key presses put something in char, for

instance:

C:\...\PP4E\Gui\Tour> python bind.py

Got left mouse button click: Widget=.25632624 X=6 Y=6

Got middle mouse button click: Widget=.25632624 X=212 Y=95

<tkinter.Event object at 0x018CA210>

char => ??

delta => 0

height => ??

keycode => ??

keysym => ??

keysym_num => ??

num => 2

send_event => False

serial => 17

state => 0

time => 549707945

type => 4

widget => .25632624

width => ??

x => 212

x_root => 311

y => 95

y_root => 221

Got left mouse button click: Widget=.25632624 X=400 Y=183

Got double left mouse click Widget=.25632624 X=400 Y=183

Other bind Events

Besides those illustrated in this example, a tkinter script can register to catch additional

kinds of bindable events. For example:

• <ButtonRelease> fires when a button is released (<ButtonPress> is run when the

button first goes down).

• <Motion> is triggered when a mouse pointer is moved.

• <Enter> and <Leave> handlers intercept mouse entry and exit in a window’s display

area (useful for automatically highlighting a widget).

• <Configure> is invoked when the window is resized, repositioned, and so on (e.g.,

the event object’s width and height give the new window size). We’ll make use of

this to resize the display on window resizes in the PyClock example of Chapter 11.

• <Destroy> is invoked when the window widget is destroyed (and differs from the

protocol mechanism for window manager close button presses). Since this interacts

with widget quit and destroy methods, I’ll say more about the event later in

this section.

• <FocusIn> and <FocusOut> are run as the widget gains and loses focus.

• <Map> and <Unmap> are run when a window is opened and iconified.

• <Escape>, <BackSpace>, and <Tab> catch other special key presses.

• <Down>, <Left>, and <Right> catch other arrow key presses.

This is not a complete list, and event names can be written with a somewhat sophisticated

syntax of their own. For instance:

• Modifiers can be added to event identifiers to make them even more specific; for

instance, <B1-Motion> means moving the mouse with the left button pressed, and

<KeyPress-a> refers to pressing the “a” key only.

• Synonyms can be used for some common event names; for instance, <ButtonPress-1>,

<Button-1>, and <1> mean a left mouse button press, and <KeyPress-a>

and <Key-a> mean the “a” key. All forms are case sensitive: use <Key-Escape>, not

<KEY-ESCAPE>.

• Virtual event identifiers can be defined within double bracket pairs (e.g., <<PasteText>>)

to refer to a selection of one or more event sequences.
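
One way to experiment with these event name forms is to keep them in a table and register them in a loop, as in the sketch below. The bind_table function, the bindings table, and the Recorder class are all hypothetical; Recorder is only a stand-in for a widget so that the code runs without a display, and any real widget's bind method accepts the same two arguments.

```python
def bind_table(widget, table):
    """Register each sequence/handler pair in table on widget."""
    for sequence, handler in table.items():
        widget.bind(sequence, handler)

def on_event(event):
    print('Got event:', event)

bindings = {
    '<ButtonPress-1>': on_event,     # same event as <Button-1> and <1>
    '<KeyPress-a>':    on_event,     # the "a" key only; same as <Key-a>
    '<Key-Escape>':    on_event,     # case matters: not <KEY-ESCAPE>
    '<B1-Motion>':     on_event,     # move with the left button held down
}

class Recorder:
    """Stand-in for a widget: just records what bind was asked to do."""
    def __init__(self):
        self.bound = {}
    def bind(self, sequence, handler):
        self.bound[sequence] = handler

fake = Recorder()
bind_table(fake, bindings)
print(len(fake.bound), 'sequences registered')
```

Changing the keys of the table and rerunning with a real Label or other widget is a quick way to see which sequences Tk accepts.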

In the interest of space, though, we’ll defer to other Tk and tkinter reference sources

for an exhaustive list of details on this front. Alternatively, changing some of the settings

in the example script and rerunning can help clarify some event behavior, too; this is

Python, after all.

More on <Destroy> events and the quit and destroy methods

Before we move on, one event merits a few extra words: the <Destroy> event (whose

name is case significant) is run when a widget is being destroyed, as a result of both

script method calls and window closures in general, including those at program exit.

If you bind this on a window, it will be triggered once for each widget in the window;

the callback’s event argument widget attribute gives the widget being destroyed, and

you can check this to detect a particular widget’s destruction. If you bind this on a

specific widget instead, it will be triggered once for that widget’s destruction only.
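As a rough sketch of the widget check just described, a handler bound on a window can compare the event's widget attribute with the one widget it cares about and ignore the rest. The on_destroy_of helper and its arguments are hypothetical names used only for illustration; they are not part of tkinter.

```python
# <Destroy> filter sketch: run an action for one widget's destruction only.
# on_destroy_of is a hypothetical helper name, not a tkinter API.

def on_destroy_of(target, action):
    "return a <Destroy> handler that calls action() only when target is destroyed"
    def handler(event):
        if event.widget is target:      # event.widget is the widget being destroyed
            action()                    # keep this non-GUI: the widget is "half dead"
    return handler

def main():
    import tkinter                      # local import: the helper needs no display
    root = tkinter.Tk()
    tkinter.Label(root, text='A child widget').pack()
    tkinter.Button(root, text='Close', command=root.destroy).pack()
    # bound on the window: triggered for the label, the button, and the window
    root.bind('<Destroy>', on_destroy_of(root, lambda: print('window destroyed')))
    root.mainloop()
```

Run main() and press Close: without the filter the message would print once per widget; with it, the message should print just once, for the window itself.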

It’s important to know that a widget is in a “half dead” state (Tk’s terminology) when

this event is triggered—it still exists, but most operations on it fail. Because of that, the

<Destroy> event is not intended for GUI activity in general; for instance, checking a text

widget’s changed state or fetching its content in a <Destroy> handler can both fail with

exceptions. In addition, this event’s handler cannot cancel the destruction in general

and resume the GUI; if you wish to intercept and verify or suppress window closes

when a user clicks on a window’s X button, use WM_DELETE_WINDOW in top-level windows’

protocol methods as described earlier in this chapter.

You should also know that running a tkinter widget’s quit method does not trigger any

<Destroy> events on exit, and even leads to a fatal Python error on program exit in 3.X

if any <Destroy> event handlers are registered. Because of this, programs that bind this

event for non-GUI window exit actions should usually call destroy instead of quit to

close, and rely on the fact that a program exits when the last remaining or only Tk root

window (default or explicit) is destroyed as described earlier. This precludes using

quit for immediate shutdowns, though you can still run sys.exit for brute-force exits.

A script can also perform program exit actions in code run after the mainloop call

returns, but the GUI is gone completely at this point, and this code is not associated with

any particular widget. Watch for more on this event when we study the PyEdit example

program in Chapter 11; at the risk of spoiling the end of this story, we’ll find it unusable

for verifying changed text saves.

Message and Entry

The Message and Entry widgets allow for display and input of simple text. Both are

essentially functional subsets of the Text widget we’ll meet later; Text can do everything

Message and Entry can, but not vice versa.

Message

The Message widget is simply a place to display text. Although the standard showinfo

dialog we met earlier is perhaps a better way to display pop-up messages, Message splits

up long strings automatically and flexibly and can be embedded inside container

widgets any time you need to add some read-only text to a display. Moreover, this widget

sports more than a dozen configuration options that let you customize its appearance.

Example 8-16 and Figure 8-21 illustrate Message basics, and demonstrate how

Message reacts to horizontal stretching with fill and expand; see Chapter 7 for more

on resizing and Tk or tkinter references for other options Message supports.

448 | Chapter 8: A tkinter Tour, Part 1

Example 8-16. PP4E\Gui\tour\message.py

from tkinter import *

msg = Message(text="Oh by the way, which one's Pink?")

msg.config(bg='pink', font=('times', 16, 'italic'))

msg.pack(fill=X, expand=YES)

mainloop()

Figure 8-21. A Message widget at work

Entry

The Entry widget is a simple, single-line text input field. It is typically used for input

fields in form-like dialogs and anywhere else you need the user to type a value into a

field of a larger display. Entry also supports advanced concepts such as scrolling, key

bindings for editing, and text selections, but it’s simple to use in practice.

Example 8-17 builds the input window shown in Figure 8-22.

Example 8-17. PP4E\Gui\tour\entry1.py

from tkinter import *
from quitter import Quitter

def fetch():
    print('Input => "%s"' % ent.get())              # get text

root = Tk()
ent = Entry(root)
ent.insert(0, 'Type words here')                    # set text
ent.pack(side=TOP, fill=X)                          # grow horiz

ent.focus()                                         # save a click
ent.bind('<Return>', (lambda event: fetch()))       # on enter key
btn = Button(root, text='Fetch', command=fetch)     # and on button
btn.pack(side=LEFT)
Quitter(root).pack(side=RIGHT)
root.mainloop()


Figure 8-22. entry1 caught in the act

On startup, the entry1 script fills the input field in this GUI with the text “Type words

here” by calling the widget’s insert method. Because both the Fetch button and the

Enter key are set to trigger the script’s fetch callback function, either user event gets

and displays the current text in the input field, using the widget’s get method:

C:\...\PP4E\Gui\Tour> python entry1.py

Input => "Type words here"

Input => "Have a cigar"

We met the <Return> event earlier when we studied bind; unlike button presses, these

lower-level callbacks get an event argument, so the script uses a lambda wrapper to

ignore it. This script also packs the entry field with fill=X to make it expand

horizontally with the window (try it out), and it calls the widget focus method to give the entry

field input focus when the window first appears. Manually setting the focus like this

saves the user from having to click the input field before typing. Our smart Quit button

we wrote earlier is attached here again as well (it verifies exit).

Programming Entry widgets

Generally speaking, the values typed into and displayed by Entry widgets are set and

fetched with either tied “variable” objects (described later in this chapter) or Entry

widget method calls such as this one:

ent.insert(0, 'some text') # set value

value = ent.get() # fetch value (a string)

The first parameter to the insert method gives the position where the text is to be

inserted. Here, “0” means the front because offsets start at zero, and integer 0 and string

'0' mean the same thing (tkinter method arguments are always converted to strings if

needed). If the Entry widget might already contain text, you also generally need to delete

its contents before setting it to a new value, or else new text will simply be added to

the text already present:

ent.delete(0, END) # first, delete from start to end

ent.insert(0, 'some text') # then set value

The name END here is a preassigned tkinter constant denoting the end of the widget;

we’ll revisit it in Chapter 9 when we meet the full-blown and multiple-line Text widget

(Entry’s more powerful cousin). Since the widget is empty after the deletion, this

statement sequence is equivalent to the prior one:


ent.delete('0', END) # delete from start to end

ent.insert(END, 'some text') # add at end of empty text

Either way, if you don’t delete the text first, new text that is inserted is simply added.

If you want to see how, try changing the fetch function in Example 8-17 to look like

this—an “x” is added at the beginning and end of the input field on each button or key

press:

def fetch():
    print('Input => "%s"' % ent.get())      # get text
    ent.insert(END, 'x')                    # to clear: ent.delete('0', END)
    ent.insert(0, 'x')                      # new text simply added

In later examples, we’ll also see the Entry widget’s state='disabled' option, which

makes it read only, as well as its show='*' option, which makes it display each character

as a * (useful for password-type inputs). Try this out on your own by changing and

running this script for a quick look. Entry supports other options we’ll skip here, too;

see later examples and other resources for additional details.

Laying Out Input Forms

As mentioned, Entry widgets are often used to get field values in form-like displays.

We’re going to create such displays often in this book, but to show you how this works

in simpler terms, Example 8-18 combines labels, entries, and frames to achieve the

multiple-input display captured in Figure 8-23.

Example 8-18. PP4E\Gui\Tour\entry2.py

"""
use Entry widgets directly
lay out by rows with fixed-width labels: this and grid are best for forms
"""

from tkinter import *
from quitter import Quitter
fields = 'Name', 'Job', 'Pay'

def fetch(entries):
    for entry in entries:
        print('Input => "%s"' % entry.get())        # get text

def makeform(root, fields):
    entries = []
    for field in fields:
        row = Frame(root)                           # make a new row
        lab = Label(row, width=5, text=field)       # add label, entry
        ent = Entry(row)
        row.pack(side=TOP, fill=X)                  # pack row on top
        lab.pack(side=LEFT)
        ent.pack(side=RIGHT, expand=YES, fill=X)    # grow horizontal
        entries.append(ent)
    return entries

if __name__ == '__main__':
    root = Tk()
    ents = makeform(root, fields)
    root.bind('<Return>', (lambda event: fetch(ents)))
    Button(root, text='Fetch',
                 command=(lambda: fetch(ents))).pack(side=LEFT)
    Quitter(root).pack(side=RIGHT)
    root.mainloop()

Figure 8-23. entry2 (and entry3) form displays

The input fields here are just simple Entry widgets. The script builds an explicit list of

these widgets to be used to fetch their values later. Every time you press this window’s

Fetch button, it grabs the current values in all the input fields and prints them to the

standard output stream:

C:\...\PP4E\Gui\Tour> python entry2.py

Input => "Bob"

Input => "Technical Writer"

Input => "Jack"

You get the same field dump if you press the Enter key anytime this window has the

focus on your screen; this event has been bound to the whole root window this time,

not to a single input field.

Most of the art in form layout has to do with arranging widgets in a hierarchy. This

script builds each label/entry row as a new Frame attached to the window’s current

TOP; fixed-width labels are attached to the LEFT of their row, and entries to the RIGHT.

Because each row is a distinct Frame, its contents are insulated from other packing going

on in this window. The script also arranges for just the entry fields to grow horizontally

on a resize, as in Figure 8-24.

Going modal again

Later on this tour, we’ll see how to make similar form layouts with the grid geometry

manager, where we arrange by row and column numbers instead of frames. But now

that we have a handle on form layout, let’s see how to apply the modal dialog techniques

we met earlier to a more complex input display.


Example 8-19 uses the prior example’s makeform and fetch functions to generate a form

and prints its contents, much as before. Here, though, the input fields are attached to

a new Toplevel pop-up window created on demand, and an OK button is added to the

new window to trigger a window destroy event that erases the pop up. As we learned

earlier, the wait_window call pauses until the destroy happens.

Example 8-19. PP4E\Gui\Tour\entry2-modal.py

# make form dialog modal; must fetch before destroy with entries

from tkinter import *
from entry2 import makeform, fetch, fields

def show(entries, popup):
    fetch(entries)                   # must fetch before window destroyed!
    popup.destroy()                  # fails with msgs if stmt order is reversed

def ask():
    popup = Toplevel()               # show form in modal dialog window
    ents = makeform(popup, fields)
    Button(popup, text='OK', command=(lambda: show(ents, popup))).pack()
    popup.grab_set()
    popup.focus_set()
    popup.wait_window()              # wait for destroy here

root = Tk()
Button(root, text='Dialog', command=ask).pack()
root.mainloop()

When you run this code, pressing the button in this program’s main window creates

the blocking form input dialog in Figure 8-25, as expected.

But a subtle danger is lurking in this modal dialog code: because it fetches user inputs

from Entry widgets embedded in the popped-up display, it must fetch those inputs

before destroying the pop-up window in the OK press callback handler. It turns out

that a destroy call really does destroy all the child widgets of the window destroyed;

trying to fetch values from a destroyed Entry not only doesn’t work, but also generates

a traceback with error messages in the console window. Try reversing the statement

order in the show function to see for yourself.

Figure 8-24. entry2 (and entry3) expansion at work


To avoid this problem, we can either be careful to fetch before destroying, or use tkinter

variables, the subject of the next section.
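The first of these options can be sketched by copying the typed text into ordinary Python data before the pop up is erased. This is only one possible arrangement, and the snapshot and close_dialog names are hypothetical helpers rather than anything in tkinter or in this book's examples.

```python
# Fetch-before-destroy sketch: copy Entry text into plain Python data first.
# snapshot and close_dialog are hypothetical helper names.

def snapshot(fields, entries):
    "return {field: text}, read from the Entry widgets while they still exist"
    return {field: entry.get() for (field, entry) in zip(fields, entries)}

def close_dialog(fields, entries, popup, results):
    "OK-button handler: save the inputs, then erase the pop up"
    results.update(snapshot(fields, entries))   # must run before the destroy
    popup.destroy()                             # entries are unusable after this
```

With the makeform function of Example 8-18, an OK button could be given command=(lambda: close_dialog(fields, ents, popup, results)), where results is a dictionary made by the caller. After wait_window returns, results holds plain strings that no longer depend on any widget.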

tkinter “Variables” and Form Layout Alternatives

Entry widgets (among others) support the notion of an associated variable—changing

the associated variable changes the text displayed in the Entry, and changing the text

in the Entry changes the value of the variable. These aren’t normal Python variable

names, though. Variables tied to widgets are instances of variable classes in the tkinter

module library. These classes are named StringVar, IntVar, DoubleVar, and BooleanVar;

you pick one based on the context in which it is to be used. For example, a StringVar

class instance can be associated with an Entry field, as demonstrated in

Example 8-20.

Example 8-20. PP4E\Gui\Tour\entry3.py

"""
use StringVar variables
lay out by columns: this might not align horizontally everywhere (see entry2)
"""

from tkinter import *
from quitter import Quitter
fields = 'Name', 'Job', 'Pay'

def fetch(variables):
    for variable in variables:
        print('Input => "%s"' % variable.get())     # get from var

def makeform(root, fields):
    form = Frame(root)                              # make outer frame
    left = Frame(form)                              # make two columns
    rite = Frame(form)
    form.pack(fill=X)
    left.pack(side=LEFT)
    rite.pack(side=RIGHT, expand=YES, fill=X)       # grow horizontal

    variables = []
    for field in fields:
        lab = Label(left, width=5, text=field)      # add to columns
        ent = Entry(rite)
        lab.pack(side=TOP)
        ent.pack(side=TOP, fill=X)                  # grow horizontal
        var = StringVar()
        ent.config(textvariable=var)                # link field to var
        var.set('enter here')
        variables.append(var)
    return variables

if __name__ == '__main__':
    root = Tk()
    vars = makeform(root, fields)
    Button(root, text='Fetch', command=(lambda: fetch(vars))).pack(side=LEFT)
    Quitter(root).pack(side=RIGHT)
    root.bind('<Return>', (lambda event: fetch(vars)))
    root.mainloop()

Figure 8-25. entry2-modal (and entry3-modal) displays

Except for the fact that this script initializes input fields with the string 'enter here',

it makes a window virtually identical in appearance and function to that created by the

script entry2 (see Figures 8-23 and 8-24). For illustration purposes, the window is laid

out differently—as a Frame containing two nested subframes used to build the left and

right columns of the form area—but the end result is the same when it is displayed on

screen (for some GUIs on some platforms, at least: see the note at the end of this section

for a discussion of why layout by rows instead of columns is generally preferred).

The main thing to notice here, though, is the use of StringVar variables. Instead of using

a list of Entry widgets to fetch input values, this version keeps a list of StringVar objects

that have been associated with the Entry widgets, like this:

ent = Entry(rite)

var = StringVar()

ent.config(textvariable=var) # link field to var

Once you’ve tied variables in this way, changing and fetching the variable’s value:

var.set('text here')

value = var.get()


will really change and fetch the corresponding display’s input field value.* The variable

object get method returns a string for StringVar, an integer for IntVar, and a

floating-point number for DoubleVar.
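To see the result types for yourself without building a window, the variable classes can be exercised on their own. The sketch below is an illustration under one assumption: it uses tkinter.Tcl() to make an interpreter with no display and passes it as each variable's master, which is a convenience for this demo and not how the book's examples create variables.

```python
# Variable-class sketch: what get() returns for each class.
# tkinter.Tcl() makes an interpreter without a window; it is used here only so
# the variables can be inspected without a GUI (an assumption of this sketch).
import tkinter

interp = tkinter.Tcl()                      # no display or mainloop needed

name  = tkinter.StringVar(master=interp)
count = tkinter.IntVar(master=interp)
ratio = tkinter.DoubleVar(master=interp)
flag  = tkinter.BooleanVar(master=interp)

name.set('Bob')
count.set(42)
ratio.set(1.5)
flag.set(True)

for var in (name, count, ratio, flag):
    value = var.get()                       # converted by the variable's class
    print(type(var).__name__, '=>', repr(value), type(value).__name__)
```

In a real GUI you would normally omit the master argument and create the variables after making a Tk root, as Example 8-20 does.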

Of course, we’ve already seen that it’s easy to set and fetch text in Entry fields directly,

without adding extra code to use variables. So, why the bother about variable objects?

For one thing, it clears up that nasty fetch-after-destroy peril we met in the prior section.

Because StringVars live on after the Entry widgets they are tied to have been destroyed,

it’s OK to fetch input values from them long after a modal dialog has been dismissed,

as shown in Example 8-21.

Example 8-21. PP4E\Gui\Tour\entry3-modal.py

# can fetch values after destroy with stringvars

from tkinter import *
from entry3 import makeform, fetch, fields

def show(variables, popup):
    popup.destroy()                  # order doesn't matter here
    fetch(variables)                 # variables live on after window destroyed

def ask():
    popup = Toplevel()               # show form in modal dialog window
    vars = makeform(popup, fields)
    Button(popup, text='OK', command=(lambda: show(vars, popup))).pack()
    popup.grab_set()
    popup.focus_set()
    popup.wait_window()              # wait for destroy here

root = Tk()
Button(root, text='Dialog', command=ask).pack()
root.mainloop()

This version is the same as the original (shown in Example 8-19 and Figure 8-25), but

show now destroys the pop up before inputs are fetched through StringVars in the list

created by makeform. In other words, variables are a bit more robust in some contexts

because they are not part of a real display tree. For example, they are also commonly

associated with check buttons, radio boxes, and scales in order to provide access to

current settings and link multiple widgets together. Almost coincidentally, that’s the

topic of the next section.

* Historic anecdote: In a now-defunct tkinter release shipped with Python 1.3, you could also set and fetch

variable values by calling them like functions, with and without an argument (e.g., var(value) and var()).

Today, you call variable set and get methods instead. For unknown reasons, the function call form stopped

working years ago, but you may still see it in older Python code (and in first editions of at least one O’Reilly

Python book). If a fix made in the name of aesthetics breaks working code, is it really a fix?


We laid out input forms two ways in this section: by row frames with

fixed-width labels (entry2), and by column frames (entry3). In

Chapter 9 we’ll see a third form technique: layouts using the grid geometry

manager. Of these, gridding, and the rows with fixed-width labels of

entry2 tend to work best across all platforms.

Laying out by column frames as in entry3 works only on platforms

where the height of each label exactly matches the height of each entry

field. Because the two are not associated directly, they might not line up

properly on some platforms. When I took some forms that looked fine on Windows XP

and ran them on a Linux machine, labels and their corresponding entries did not

line up horizontally.

Even the simple window produced by entry3 looks slightly askew on

closer inspection. It only appears the same as entry2 on some platforms

because of the small number of inputs and size defaults. On my

Windows 7 netbook, the labels and entries start to become horizontally

mismatched if you add 3 or 4 additional inputs to entry3’s fields tuple.

If you care about portability, lay out your forms either with the packed

row frames and fixed/maximum-width labels of entry2, or by gridding

widgets by row and column numbers instead of packing them. We’ll see

more on such forms in the next chapter. And in Chapter 12, we’ll write

a form-construction tool that hides the layout details from its clients

altogether (including its use case client in Chapter 13).

Checkbutton, Radiobutton, and Scale

This section introduces three widget types: the Checkbutton (a multiple-choice input

widget), the Radiobutton (a single-choice device), and the Scale (sometimes known as

a “slider”). All are variations on a theme and are somewhat related to simple buttons,

so we’ll explore them as a group here. To make these widgets more fun to play with,

we’ll reuse the dialogTable module shown in Example 8-8 to provide callbacks for

widget selections (callbacks pop up dialog boxes). Along the way, we’ll also use the

tkinter variables we just met to communicate with these widgets’ state settings.

Checkbuttons

The Checkbutton and Radiobutton widgets are designed to be associated with tkinter

variables: clicking the button changes the value of the variable, and setting the variable

changes the state of the button to which it is linked. In fact, tkinter variables are central

to the operation of these widgets:

• A collection of Checkbuttons implements a multiple-choice interface by assigning

each button a variable of its own.

• A collection of Radiobuttons imposes a mutually exclusive single-choice model by

giving each button a unique value and the same tkinter variable.


Both kinds of buttons provide both command and variable options. The command option

lets you register a callback to be run immediately on button-press events, much like

normal Button widgets. But by associating a tkinter variable with the variable option,

you can also fetch or change widget state at any time by fetching or changing the value

of the widget’s associated variable.

Since it’s a bit simpler, let’s start with the tkinter Checkbutton. Example 8-22 creates

the set of five captured in Figure 8-26. To make this more useful, it also adds a button

that dumps the current state of all Checkbuttons and attaches an instance of the verifying

Quitter button we built earlier in the tour.

Figure 8-26. demoCheck in action

Example 8-22. PP4E\Gui\Tour\demoCheck.py

"create a bar of check buttons that run dialog demos"

from tkinter import *                # get base widget set
from dialogTable import demos        # get canned dialogs
from quitter import Quitter          # attach a quitter object to "me"

class Demo(Frame):
    def __init__(self, parent=None, **options):
        Frame.__init__(self, parent, **options)
        self.pack()
        self.tools()
        Label(self, text="Check demos").pack()
        self.vars = []
        for key in demos:
            var = IntVar()
            Checkbutton(self,
                        text=key,
                        variable=var,
                        command=demos[key]).pack(side=LEFT)
            self.vars.append(var)

    def report(self):
        for var in self.vars:
            print(var.get(), end=' ')   # current toggle settings: 1 or 0
        print()

    def tools(self):
        frm = Frame(self)
        frm.pack(side=RIGHT)
        Button(frm, text='State', command=self.report).pack(fill=X)
        Quitter(frm).pack(fill=X)

if __name__ == '__main__': Demo().mainloop()

In terms of program code, check buttons resemble normal buttons; they are even

packed within a container widget. Operationally, though, they are a bit different. As

you can probably tell from this figure (and can better tell by running this live), a check

button works as a toggle—pressing one changes its state from off to on (from deselected

to selected); or from on to off again. When a check button is selected, it has a checked

display, and its associated IntVar variable has a value of 1; when deselected, its display

is empty and its IntVar has a value of 0.

To simulate an enclosing application, the State button in this display triggers the script’s

report method to display the current values of all five toggles on the stdout stream.

Here is the output after a few clicks:

C:\...\PP4E\Gui\Tour> python demoCheck.py

0 0 0 0 0

1 0 0 0 0

1 0 1 0 0

1 0 1 1 0

1 0 0 1 0

1 0 0 1 1

Really, these are the values of the five tkinter variables associated with the

Checkbuttons with variable options, but they give the buttons’ values when queried.

This script associates IntVar variables with each Checkbutton in this display, since they

are 0 or 1 binary indicators. StringVars will work here, too, although their get methods

would return strings '0' or '1' (not integers) and their initial state would be an empty

string (not the integer 0).
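The difference between the two variable classes is easy to inspect outside a GUI. The following is a small sketch under stated assumptions: the values are set by hand to imitate what a check button with default settings stores on each press, and tkinter.Tcl() is used only to get an interpreter without opening a window.

```python
# Check button variable sketch: IntVar versus StringVar states.
# The interpreter-only setup (tkinter.Tcl) is an assumption of this sketch,
# used so the values can be inspected without opening a window.
import tkinter

interp = tkinter.Tcl()
as_int = tkinter.IntVar(master=interp)
as_str = tkinter.StringVar(master=interp)

initial = (as_int.get(), as_str.get())          # before any press

as_int.set(1)                                   # imitate a press that selects
as_str.set('1')
selected = (as_int.get(), as_str.get())

as_int.set(0)                                   # imitate a press that deselects
as_str.set('0')
deselected = (as_int.get(), as_str.get())

print(initial, selected, deselected)
print(bool(as_int.get()), bool(as_str.get()))   # '0' is a nonempty string: still true
```

The last line hints at a practical reason to prefer IntVar here: a plain truth test works on its integer result, whereas a StringVar result is better compared with '1' explicitly.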

This widget’s command option lets you register a callback to be run each time the button

is pressed. To illustrate, this script registers a standard dialog demo call as a handler

for each of the Checkbuttons—pressing a button changes the toggle’s state but also pops

up one of the dialog windows we visited earlier in this tour (regardless of its new state).

Interestingly, you can sometimes run the report method interactively, too—when

working as follows in a shell window, widgets pop up as lines are typed and are fully

active, even without calling mainloop (though this may not work in some interfaces like

IDLE if you must call mainloop to display your GUI):

C:\...\PP4E\Gui\Tour> python

>>> from demoCheck import Demo

>>> d = Demo()

>>> d.report()

0 0 0 0 0

>>> d.report()

1 0 0 0 0

>>> d.report()

1 0 0 1 1


Check buttons and variables

When I first studied check buttons, my initial reaction was: why do we need tkinter

variables here at all when we can register button-press callbacks? Linked variables may

seem superfluous at first glance, but they simplify some GUI chores. Instead of asking

you to accept this blindly, though, let me explain why.

Keep in mind that a Checkbutton’s command callback will be run on every press, whether

the press toggles the check button to a selected or a deselected state. Because of that,

if you want to run an action immediately when a check button is pressed, you will

generally want to check the button’s current value in the callback handler. Because

there is no check button “get” method for fetching values, you usually need to

interrogate an associated variable to see if the button is on or off.
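A callback of this kind might look like the following sketch. The make_toggle_handler name and the strings it logs are hypothetical, chosen just for illustration; the only tkinter facts relied on are that the linked variable is updated before the command callback runs and that an IntVar reads 1 or 0.

```python
# Check button callback sketch: ask the linked variable for the new state.
# make_toggle_handler and the logged strings are hypothetical names.
import tkinter

def make_toggle_handler(var, log):
    "return a command callback that records what the press just did"
    def on_press():
        if var.get():                       # IntVar: 1 if now selected, 0 if not
            log.append('selected')
        else:
            log.append('deselected')
    return on_press

def main():
    root = tkinter.Tk()
    var, log = tkinter.IntVar(), []
    handler = make_toggle_handler(var, log)
    tkinter.Checkbutton(root, text='Option', variable=var, command=handler).pack()
    root.mainloop()
    print(log)                              # history of presses, shown on exit
```

Run main() and click the check button a few times; the list printed on exit should alternate between the two states.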

Moreover, some GUIs simply let users set check buttons without running command

callbacks at all and fetch button settings at some later point in the program. In such a

scenario, variables serve to automatically keep track of button settings. The

demoCheck script’s report method represents this latter approach.

Of course, you could manually keep track of each button’s state in press callback

handlers, too. Example 8-23 keeps its own list of state toggles and updates it manually

on command press callbacks.

Example 8-23. PP4E\Gui\Tour\demo-check-manual.py

# check buttons, the hard way (without variables)

from tkinter import *
states = []                                    # change object not name
def onPress(i):                                # keep track of states
    states[i] = not states[i]                  # changes False->True, True->False

root = Tk()
for i in range(10):
    chk = Checkbutton(root, text=str(i), command=(lambda i=i: onPress(i)) )
    chk.pack(side=LEFT)
    states.append(False)
root.mainloop()
print(states)                                  # show all states on exit

The lambda here passes along the pressed button’s index in the states list. Otherwise,

we would need a separate callback function for each button. Here again, we need to

use a default argument to pass the loop variable into the lambda, or the loop variable

will be its value on the last loop iteration for all 10 of the generated functions (each

press would update the tenth item in the list; see Chapter 7 for background details on

this). When run, this script makes the 10–check button display in Figure 8-27.
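The loop-variable issue has nothing to do with GUIs as such, so it can be demonstrated in a few lines of plain Python. This is a minimal sketch with illustrative names only (late and saved are not from the example):

```python
# Loop-variable sketch: why each lambda needs a default argument.
# No GUI is required to see the effect; the names here are illustrative only.

late, saved = [], []
for i in range(3):
    late.append(lambda: i)          # looks up i when called, after the loop ends
    saved.append(lambda i=i: i)     # default remembers i's value at creation time

print([f() for f in late])          # [2, 2, 2]: every function sees the final i
print([f() for f in saved])         # [0, 1, 2]: each function kept its own i
```

The check button callbacks in Example 8-23 follow the saved pattern, which is why each press updates its own slot in the states list.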


Figure 8-27. Manual check button state window

Manually maintained state toggles are updated on every button press and are printed

when the GUI exits (technically, when the mainloop call returns); it’s a list of Boolean

state values, which could also be integers 1 or 0 if we cared to exactly imitate the

original:

C:\...\PP4E\Gui\Tour> python demo-check-manual.py

[False, False, True, False, True, False, False, False, True, False]

This works, and it isn’t too horribly difficult to manage manually. But linked tkinter

variables make this task noticeably easier, especially if you don’t need to process check

button states until some time in the future. This is illustrated in Example 8-24.

Example 8-24. PP4E\Gui\Tour\demo-check-auto.py

# check buttons, the easy way

from tkinter import *
root = Tk()
states = []
for i in range(10):
    var = IntVar()
    chk = Checkbutton(root, text=str(i), variable=var)
    chk.pack(side=LEFT)
    states.append(var)
root.mainloop()                                # let tkinter keep track
print([var.get() for var in states])           # show all states on exit (or map/lambda)

This looks and works the same way, but there is no command button-press callback

handler at all, because toggle state is tracked by tkinter automatically:

C:\...\PP4E\Gui\Tour> python demo-check-auto.py

[0, 0, 1, 1, 0, 0, 1, 0, 0, 1]

The point here is that you don’t necessarily have to link variables with check buttons,

but your GUI life will be simpler if you do. The list comprehension at the very end of

this script, by the way, is equivalent to the following unbound method and lambda/

bound-method map call forms:

print(list(map(IntVar.get, states)))

print(list(map(lambda var: var.get(), states)))

Though comprehensions are common in Python today, the form that seems clearest to

you may very well depend upon your shoe size…


Radio Buttons

Radio buttons are toggles too, but they are generally used in groups: just like the

mechanical station selector pushbuttons on radios of times gone by, pressing one

Radiobutton widget in a group automatically deselects the one pressed last. In other words,

at most, only one can be selected at one time. In tkinter, associating all radio buttons

in a group with unique values and the same variable guarantees that, at most, only one

can ever be selected at a given time.

Like check buttons and normal buttons, radio buttons support a command option for

registering a callback to handle presses immediately. Like check buttons, radio buttons

also have a variable attribute for associating single-selection buttons in a group and

fetching the current selection at arbitrary times.

In addition, radio buttons have a value attribute that lets you tell tkinter what value

the button’s associated variable should have when the button is selected. Because more

than one radio button is associated with the same variable, you need to be explicit about

each button’s value (it’s not just a 1 or 0 toggle scenario). Example 8-25 demonstrates

radio button basics.

Example 8-25. PP4E\Gui\Tour\demoRadio.py

"create a group of radio buttons that launch dialog demos"

from tkinter import *                # get base widget set
from dialogTable import demos        # button callback handlers
from quitter import Quitter          # attach a quit object to "me"

class Demo(Frame):
    def __init__(self, parent=None, **options):
        Frame.__init__(self, parent, **options)
        self.pack()
        Label(self, text="Radio demos").pack(side=TOP)
        self.var = StringVar()
        for key in demos:
            Radiobutton(self, text=key,
                              command=self.onPress,
                              variable=self.var,
                              value=key).pack(anchor=NW)
        self.var.set(key)            # select last to start
        Button(self, text='State', command=self.report).pack(fill=X)
        Quitter(self).pack(fill=X)

    def onPress(self):
        pick = self.var.get()
        print('you pressed', pick)
        print('result:', demos[pick]())

    def report(self):
        print(self.var.get())

if __name__ == '__main__': Demo().mainloop()


Figure 8-28 shows what this script generates when run. Pressing any of this window’s

radio buttons triggers its command handler, pops up one of the standard dialog boxes we

met earlier, and automatically deselects the button previously pressed. Like check

buttons, radio buttons are packed; this script packs them to the top to arrange them

vertically, and then anchors each on the northwest corner of its allocated space so that

they align well.

Figure 8-28. demoRadio in action

Like the check button demo script, this one also puts up a State button to run the class’s

report method and to show the current radio state (the button selected). Unlike the

check button demo, this script also prints the return values of dialog demo calls that

are run as its buttons are pressed. Here is what the stdout stream looks like after a few

presses and state dumps; states are the lines that give a demo name alone:

C:\...\PP4E\Gui\Tour> python demoRadio.py

you pressed Input

result: 3.14

Input

you pressed Open

result: C:/PP4thEd/Examples/PP4E/Gui/Tour/demoRadio.py

Open

you pressed Query

result: yes

Query

Radio buttons and variables

So, why variables here? For one thing, radio buttons also have no “get” widget method

to fetch the selection in the future. More importantly, in radio button groups, the

value and variable settings turn out to be the whole basis of single-choice behavior. In


fact, to make radio buttons work normally at all, it’s crucial that they are all associated

with the same tkinter variable and have distinct value settings. To truly understand

why, though, you need to know a bit more about how radio buttons and variables do

their stuff.

We’ve already seen that changing a widget changes its associated tkinter variable, and

vice versa. But it’s also true that changing a variable in any way automatically changes

every widget it is associated with. In the world of radio buttons, pressing a button sets

a shared variable, which in turn impacts other buttons associated with that variable.

Assuming that all radio buttons have distinct values, this works as you expect it to

work. When a button press changes the shared variable to the pressed button’s value,

all other buttons are deselected, simply because the variable has been changed to a

value not their own.

This is true both when the user selects a button, changing the shared variable’s value
implicitly, and when the variable’s value is set manually by a script. For instance,
when Example 8-25 sets the shared variable to the last of the demo’s names initially
(with self.var.set), it selects that demo’s button and deselects all the others in the
process; this way, only one is selected at first. If the variable were instead set to a string
that is not any demo’s name (e.g., ' '), all buttons would be deselected at startup.

This ripple effect is a bit subtle, but it might help to know that within a group of radio
buttons sharing the same variable, if you assign a set of buttons the same value, the
entire set will be selected if any one of them is pressed. Consider Example 8-26, which
creates Figure 8-29, for instance. All buttons start out deselected this time (by
initializing the shared variable to none of their values), but because radio buttons 0, 3, 6,
and 9 have value 0 (the remainder of division by 3), all are selected if any are selected.

Figure 8-29. Radio buttons gone bad?

Example 8-26. PP4E\Gui\Tour\demo-radio-multi.py

# see what happens when some buttons have same value

from tkinter import *
root = Tk()
var = StringVar()
for i in range(10):
    rad = Radiobutton(root, text=str(i), variable=var, value=str(i % 3))
    rad.pack(side=LEFT)
var.set(' ')  # deselect all initially
root.mainloop()

If you press 1, 4, or 7 now, all three of these are selected, and any existing selections
are cleared (they don’t have the value “1”). That’s not normally what you want: radio
buttons are usually a single-choice group (check buttons handle multiple-choice
inputs). If you want them to work as expected, be sure to give each radio button the same
variable but a unique value across the entire group. In the demoRadio script, for instance,
the name of the demo provides a naturally unique value for each button.
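
The selection rule just described can be modeled without a GUI at all. The following is
a minimal sketch in plain Python, not tkinter's own implementation: it assumes only that
a radio button is drawn as selected exactly when the shared variable's current value
equals the button's own value setting. The selected_buttons function is a hypothetical
helper introduced here for illustration.

```python
def selected_buttons(values, current):
    """Return the indexes of buttons whose value matches the shared variable."""
    return [i for i, value in enumerate(values) if value == current]

shared = [str(i % 3) for i in range(10)]     # the value settings used in Example 8-26
unique = [str(i) for i in range(10)]         # one distinct value per button

print(selected_buttons(shared, '1'))         # pressing 1, 4, or 7 sets the variable to '1'
print(selected_buttons(shared, ' '))         # the initial ' ' setting matches no button
print(selected_buttons(unique, '1'))         # with unique values, at most one match
```

With repeated values the first call lists several buttons at once; with unique values
only one index can ever come back, which is the single-choice behavior radio buttons
are normally expected to provide.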

Radio buttons without variables

Strictly speaking, we could get by without tkinter variables here, too. Example 8-27,
for instance, implements a single-selection model without variables, by manually
selecting and deselecting widgets in the group, in a callback handler of its own. On each
press event, it issues deselect calls for every widget object in the group and select for
the one pressed.

Example 8-27. PP4E\Gui\Tour\demo-radio-manual.py

"""

radio buttons, the hard way (without variables)
note that deselect for radio buttons simply sets the button's
associated value to a null string, so we either need to still
give buttons unique values, or use checkbuttons here instead;
"""

from tkinter import *
state = ''
buttons = []

def onPress(i):
    global state
    state = i
    for btn in buttons:
        btn.deselect()
    buttons[i].select()

root = Tk()
for i in range(10):
    rad = Radiobutton(root, text=str(i),
                            value=str(i), command=(lambda i=i: onPress(i)) )
    rad.pack(side=LEFT)
    buttons.append(rad)
onPress(0)  # select first initially
root.mainloop()
print(state)  # show state on exit

This works. It creates a 10-radio button window that looks just like the one in
Figure 8-29 but implements a single-choice radio-style interface, with current state
available in a global Python variable printed on script exit. By associating tkinter variables
and unique values, though, you can let tkinter do all this work for you, as shown in
Example 8-28.

Example 8-28. PP4E\Gui\Tour\demo-radio-auto.py

# radio buttons, the easy way

from tkinter import *
root = Tk()                 # IntVars work too
var = IntVar(value=0)       # select 0 to start (first positional argument is the master)
for i in range(10):
    rad = Radiobutton(root, text=str(i), value=i, variable=var)
    rad.pack(side=LEFT)
root.mainloop()
print(var.get())            # show state on exit

This works the same way, but it is a lot less to type and debug. Notice that this script
associates the buttons with an IntVar, the integer type sibling of StringVar, and
initializes it to zero (which is also its default); as long as button values are unique, integers
work fine for radio buttons too.

Hold onto your variables!

One minor word of caution: you should generally hold onto the tkinter variable object

used to link radio buttons for as long as the radio buttons are displayed. Assign it to a

module global variable, store it in a long-lived data structure, or save it as an attribute

of a long-lived class instance object as done by demoRadio. Just make sure you retain a

reference to it somehow. You normally will in order to fetch its state anyhow, so it’s

unlikely that you’ll ever care about what I’m about to tell you.

But in the current tkinter, variable classes have a __del__ destructor that automatically
unsets a generated Tk variable when the Python object is reclaimed (i.e., garbage
collected). The upshot is that all of your radio buttons may be deselected if the variable
object is collected, at least until the next press resets the Tk variable to a new value.
Example 8-29 shows one way to trigger this.

Example 8-29. PP4E\Gui\Tour\demo-radio-clear.py

# hold on to your radio variables (an obscure thing, indeed)

from tkinter import *
root = Tk()

def radio1():  # local vars are temporary
    #global tmp  # making it global fixes the problem
    tmp = IntVar()
    for i in range(10):
        rad = Radiobutton(root, text=str(i), value=i, variable=tmp)
        rad.pack(side=LEFT)
    tmp.set(5)  # select 6th button

radio1()
root.mainloop()

This should come up with button “5” selected initially, but it doesn’t. The variable
referenced by local tmp is reclaimed on function exit, the Tk variable is unset, and the 5
setting is lost (all buttons come up unselected). These radio buttons work fine, though,
once you start pressing them, because that resets the internal Tk variable.
Uncommenting the global statement here makes 5 start out set, as expected.

This phenomenon seems to have grown even worse in Python 3.X: not only is “5” not
selected initially, but moving the mouse cursor over the unselected buttons seems to
select many at random until one is pressed. (In 3.X we also need to initialize a
StringVar shared by radio buttons as we did in this section’s earlier examples, or else its
empty string default selects all of them!)

Of course, this is an atypical example: as coded, there is no way to know which button
is pressed, because the variable isn’t saved (and command isn’t set). It makes little sense
to use a group of radio buttons at all if you cannot query its value later. In fact, this is
so obscure that I’ll just refer you to demo-radio-clear2.py in the book’s examples
distribution for an example that works hard to trigger this oddity in other ways. You
probably won’t care, but you can’t say that I didn’t warn you if you ever do.
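
To see why a dropped reference matters, the following plain-Python analogy mimics the
behavior described in this section. It is only a sketch under stated assumptions: FakeVar
and the tk_vars dictionary are hypothetical stand-ins for a tkinter variable object and
Tk's internal variable store, and the immediate cleanup it shows relies on CPython's
reference counting.

```python
tk_vars = {}                          # hypothetical stand-in for Tk's variable store

class FakeVar:
    """Mimics a variable object that unsets its Tk-side value when reclaimed."""
    def __init__(self, name):
        self.name = name
        tk_vars[name] = 0
    def set(self, value):
        tk_vars[self.name] = value
    def __del__(self):
        tk_vars.pop(self.name, None)  # the setting vanishes along with the object

def make_local():
    tmp = FakeVar('local')            # the only reference is a local name
    tmp.set(5)                        # lost when the function returns

def make_kept():
    var = FakeVar('kept')
    var.set(5)
    return var                        # the caller holds on to a reference

make_local()
kept = make_kept()
print(tk_vars)                        # only the 'kept' entry survives
```

The same reasoning explains why saving the variable as a global or an instance
attribute keeps a radio button group's initial setting intact.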

Scales (Sliders)

Scales (sometimes called “sliders”) are used to select among a range of numeric values.

Moving the scale’s position with mouse drags or clicks moves the widget’s value among

a range of integers and triggers Python callbacks if registered.

Like check buttons and radio buttons, scales have both a command option for registering

an event-driven callback handler to be run right away when the scale is moved, and a

variable option for associating a tkinter variable that allows the scale’s position to be

fetched and set at arbitrary times. You can process scale settings when they are made,

or let the user pick a setting for later use.

In addition, scales have a third processing option—get and set methods that scripts

may call to access scale values directly without associating variables. Because scale

command movement callbacks also get the current scale setting value as an argument, it’s

often enough just to provide a callback for this widget, without resorting to either linked

variables or get/set method calls.
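
As a minimal sketch of the callback-only approach: tkinter passes the new setting to a
scale's command handler as a string, so a handler that needs a number must convert it.
The to_number, on_move, and build names below are hypothetical and used only for
illustration; tkinter is imported inside build so that the conversion helper can be
exercised without a display.

```python
def to_number(text):
    """Convert the string passed to a scale callback into an int or float."""
    value = float(text)
    return int(value) if value.is_integer() else value

def on_move(text):
    print('scale moved to', to_number(text))

def build(parent=None):
    from tkinter import Scale         # deferred: a display is needed only when called
    scale = Scale(parent, from_=0, to=10, command=on_move)
    scale.pack()
    return scale
```

Calling build().mainloop() from a script displays the scale and prints each new
position as it is moved; no linked variable or get call is involved.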

To illustrate the basics, Example 8-30 makes two scales (one horizontal and one
vertical) and links them with an associated variable to keep them in sync.

Example 8-30. PP4E\Gui\Tour\demoScale.py

"create two linked scales used to launch dialog demos"

from tkinter import *                # get base widget set
from dialogTable import demos        # button callback handlers
from quitter import Quitter          # attach a quit frame to me

class Demo(Frame):
    def __init__(self, parent=None, **options):
        Frame.__init__(self, parent, **options)
        self.pack()
        Label(self, text="Scale demos").pack()
        self.var = IntVar()
        Scale(self, label='Pick demo number',
                    command=self.onMove,                  # catch moves
                    variable=self.var,                    # reflects position
                    from_=0, to=len(demos)-1).pack()
        Scale(self, label='Pick demo number',
                    command=self.onMove,                  # catch moves
                    variable=self.var,                    # reflects position
                    from_=0, to=len(demos)-1,
                    length=200, tickinterval=1,
                    showvalue=YES, orient='horizontal').pack()
        Quitter(self).pack(side=RIGHT)
        Button(self, text="Run demo", command=self.onRun).pack(side=LEFT)
        Button(self, text="State", command=self.report).pack(side=RIGHT)

    def onMove(self, value):
        print('in onMove', value)

    def onRun(self):
        pos = self.var.get()
        print('You picked', pos)
        demo = list(demos.values())[pos]      # map from position to value (3.X view)
        print(demo())                         # or demos[ list(demos.keys())[pos] ]()

    def report(self):
        print(self.var.get())

if __name__ == '__main__':
    print(list(demos.keys()))
    Demo().mainloop()

Besides value access and callback registration, scales have options tailored to the notion

of a range of selectable values, most of which are demonstrated in this example’s code:

• The label option provides text that appears along with the scale, length specifies

an initial size in pixels, and orient specifies an axis.

• The from_ and to options set the scale range’s minimum and maximum values (note

that from is a Python reserved word, but from_ is not).

• The tickinterval option sets the number of units between marks drawn at regular

intervals next to the scale (the default means no marks are drawn).

• The resolution option provides the number of units that the scale’s value jumps

on each drag or left mouse click event (defaults to 1).

• The showvalue option can be used to show or hide the scale’s current value next to

its slider bar (the default showvalue=YES means it is drawn).

Note that scales are also packed in their container, just like other tkinter widgets. Let’s

see how these ideas translate in practice; Figure 8-30 shows the window you get if you

run this script live on Windows 7 (you get a similar one on Unix and Mac machines).

Figure 8-30. demoScale in action

For illustration purposes, this window’s State button shows the scales’ current values,

and “Run demo” runs a standard dialog call as before, using the integer value of the

scales to index the demos table. The script also registers a command handler that fires

every time either of the scales is moved and prints their new positions. Here is a set of

messages sent to stdout after a few moves, demo runs (italic), and state requests (bold):

C:\...\PP4E\Gui\Tour> python demoScale.py

['Color', 'Query', 'Input', 'Open', 'Error']

in onMove 0

in onMove 1

in onMove 2

You picked 2

123.0

in onMove 3

You picked 3

C:/Users/mark/Stuff/Books/4E/PP4E/dev/Examples/PP4E/Launcher.py

Scales and variables

As you can probably tell, scales offer a variety of ways to process their selections:
immediately in move callbacks, or later by fetching current positions with variables or
scale method calls. In fact, tkinter variables aren’t needed to program scales at all:
simply register movement callbacks or call the scale get method to fetch scale values
on demand, as in the simpler scale example in Example 8-31.

Example 8-31. PP4E\Gui\Tour\demo-scale-simple.py

from tkinter import *
root = Tk()
scl = Scale(root, from_=-100, to=100, tickinterval=50, resolution=10)
scl.pack(expand=YES, fill=Y)

def report():
    print(scl.get())

Button(root, text='state', command=report).pack(side=RIGHT)
root.mainloop()

Figure 8-31 shows two instances of this program running on Windows, one stretched
and one not (the scales are packed to grow vertically on resizes). Its scale displays a
range from −100 to 100, uses the resolution option to adjust the current position up
or down by 10 on every move, and sets the tickinterval option to show values next to
the scale in increments of 50. When you press the State button in this script’s window,
it calls the scale’s get method to display the current setting, without variables or
callbacks of any kind:

C:\...\PP4E\Gui\Tour> python demo-scale-simple.py

-70

Figure 8-31. A simple scale without variables

Frankly, the only reason tkinter variables are used in the demoScale script at all is to
synchronize scales. To make the demo interesting, this script associates the same tkinter
variable object with both scales. As we learned in the last section, changing a widget
changes its variable, but changing a variable also changes all the widgets it is associated
with. In the world of sliders, moving the slider updates that variable, which in turn might
update other widgets associated with the same variable. Because this script links one
variable with two scales, it keeps them automatically in sync: moving one scale moves
the other, too, because the shared variable is changed in the process and so updates the
other scale as a side effect.

Linking scales like this may or may not be typical of your applications (and borders on

deep magic), but it’s a powerful tool once you get your mind around it. By linking

multiple widgets on a display with tkinter variables, you can keep them automatically

in sync, without making manual adjustments in callback handlers. On the other hand,

the synchronization could be implemented without a shared variable at all by calling

one scale’s set method from a move callback handler of the other. I’ll leave such a

manual mutation as a suggested exercise, though. One person’s deep magic might be

another’s useful hack.
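
One way to approach that exercise is sketched below. It is an illustration rather than
this book's own solution: link_scales and build are hypothetical helpers, and the sketch
assumes that each scale's command handler may be assigned after construction with
config. Each move handler copies the new setting to every other scale with set, and
skips scales that already show it, in case a set call itself fires the other scale's
callback.

```python
def link_scales(scales):
    """Keep scales in step by calling set from each one's move callback."""
    def on_move(text, source):
        value = float(text)                   # callbacks receive the setting as a string
        for other in scales:
            if other is not source and float(other.get()) != value:
                other.set(value)
    for scale in scales:
        scale.config(command=lambda text, source=scale: on_move(text, source))

def build(parent=None):
    from tkinter import Scale, HORIZONTAL     # deferred: a display is needed only when called
    first = Scale(parent, from_=0, to=10)
    second = Scale(parent, from_=0, to=10, orient=HORIZONTAL)
    first.pack()
    second.pack()
    link_scales([first, second])
    return first, second
```

Compared with a shared variable, this takes more code and must guard against
callbacks echoing back and forth, which is much of the argument for letting tkinter
variables do the work.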

Running GUI Code Three Ways

Now that we’ve built a handful of similar demo launcher programs, let’s write a few
top-level scripts to combine them. Because the demos were coded as both reusable
classes and scripts, they can be deployed as attached frame components, run in their
own top-level windows, and launched as standalone programs. All three options
illustrate code reuse in action.

Attaching Frames

To illustrate hierarchical GUI composition on a grander scale than we’ve seen so far,

Example 8-32 arranges to show all four of the dialog launcher bar scripts of this chapter

in a single container. It reuses Examples 8-9, 8-22, 8-25, and 8-30.

Example 8-32. PP4E\Gui\Tour\demoAll-frm.py

"""

4 demo class components (subframes) on one window;
there are 5 Quitter buttons on this one window too, and each kills entire gui;
GUIs can be reused as frames in container, independent windows, or processes;
"""

from tkinter import *
from quitter import Quitter
demoModules = ['demoDlg', 'demoCheck', 'demoRadio', 'demoScale']
parts = []

def addComponents(root):
    for demo in demoModules:
        module = __import__(demo)                     # import by name string
        part = module.Demo(root)                      # attach an instance
        part.config(bd=2, relief=GROOVE)              # or pass configs to Demo()
        part.pack(side=LEFT, expand=YES, fill=BOTH)   # grow, stretch with window
        parts.append(part)                            # change list in-place

def dumpState():
    for part in parts:                                # run demo report if any
        print(part.__module__ + ':', end=' ')
        if hasattr(part, 'report'):
            part.report()
        else:
            print('none')

root = Tk()                                           # make explicit root first
root.title('Frames')
Label(root, text='Multiple Frame demo', bg='white').pack()
Button(root, text='States', command=dumpState).pack(fill=X)
Quitter(root).pack(fill=X)
addComponents(root)
root.mainloop()

Because all four demo launcher bars are coded as frames which attach themselves to

parent container widgets, this is easier than you might think: simply pass the same

parent widget (here, the root window) to all four demo constructor calls, and repack

and configure the demo objects as desired. Figure 8-32 shows this script’s graphical

result—a single window embedding instances of all four of the dialog demo launcher

demos we saw earlier. As coded, all four embedded demos grow and stretch with the

window when resized (try taking out the expand=YES to keep their sizes more constant).

Figure 8-32. demoAll_frm: nested subframes

Naturally, this example is artificial, but it illustrates the power of composition when
applied to building larger GUI displays. If you pretend that each of the four attached
demo objects was something more useful, like a text editor, calculator, or clock, you’ll
better appreciate the point of this example.

Besides demo object frames, this composite window also contains no fewer than five

instances of the Quitter button we wrote earlier (all of which verify the request and any

one of which can end the GUI) and a States button to dump the current values of all

the embedded demo objects at once (it calls each object’s report method, if it has one).

Here is a sample of the sort of output that shows up in the stdout stream after interacting

with widgets on this display; States output is in bold:

C:\...\PP4E\Gui\Tour> python demoAll-frm.py

in onMove 0

demoDlg: none

demoCheck: 0 0 0 0 0

demoRadio: Error

demoScale: 0

you pressed Input

result: 1.234

in onMove 1

demoDlg: none

demoCheck: 1 0 1 1 0

demoRadio: Input

demoScale: 1

you pressed Query

result: yes

in onMove 2

You picked 2

None

in onMove 3

You picked 3

C:/Users/mark/Stuff/Books/4E/PP4E/dev/Examples/PP4E/Launcher.py

Query

1 1 1 1 0

demoDlg: none

demoCheck: 1 1 1 1 0

demoRadio: Query

demoScale: 3

Importing by name string

The only substantially tricky part of this script is its use of Python’s built-in

__import__ function to import a module by a name string. Look at the following two

lines from the script’s addComponents function:

module = __import__(demo) # import module by name string

part = module.Demo(root) # attach an instance of its Demo

This is equivalent to saying something like this:

import 'demoDlg'

part = 'demoDlg'.Demo(root)

However, the preceding code is not legal Python syntax—the module name in import

statements and dot expressions must be a Python variable, not a string; moreover, in

an import the name is taken literally (not evaluated), and in dot syntax must evaluate

to the object (not its string name). To be generic, addComponents steps through a list of

name strings and relies on __import__ to import and return the module identified by

each string. In fact, the for loop containing these statements works as though all of

these statements were run:

import demoDlg, demoCheck, demoRadio, demoScale
part = demoDlg.Demo(root)
part = demoCheck.Demo(root)
part = demoRadio.Demo(root)
part = demoScale.Demo(root)

But because the script uses a list of name strings, it’s easier to change the set of demos
embedded: simply change the list, not the lines of executable code. Further, such
data-driven code tends to be more compact, less redundant, and easier to debug and
maintain. Incidentally, modules can also be imported from name strings by dynamically
constructing and running import statements, like this:

for demo in demoModules:
    exec('from %s import Demo' % demo)    # make and run a from
    part = eval('Demo')(root)             # fetch known import name by string

The exec built-in (a statement in Python 2.X, but a function in 3.X) compiles and runs a
Python statement string (here, a from to load a module’s Demo class); it works here as if
the statement string were pasted into the source code where the exec call appears. The
following achieves the same effect by running an import statement instead:

for demo in demoModules:
    exec('import %s' % demo)              # make and run an import
    part = eval(demo).Demo(root)          # fetch module variable by name too

Because they support any sort of Python statement, these exec/eval techniques are more
general than the __import__ call, but they can also be slower, since they must parse code
strings before running them.† However, that slowness may not matter in a GUI; users
tend to be significantly slower than parsers.
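
As a further alternative that this example does not use, the standard library's
importlib.import_module function also imports a module from a name string, and the
getattr built-in fetches an attribute by name. The sketch below uses standard library
modules as stand-ins, since the demo modules need a display to run; load_attr is a
hypothetical helper introduced for illustration.

```python
import importlib

def load_attr(modname, attrname):
    """Import a module by name string and fetch one of its attributes by name."""
    module = importlib.import_module(modname)
    return getattr(module, attrname)

for modname, attrname in [('math', 'sqrt'), ('os.path', 'basename')]:
    func = load_attr(modname, attrname)
    print(modname, attrname, func)

# for the demos, the equivalent would be: load_attr('demoDlg', 'Demo')(root)
```

Unlike __import__, import_module returns the named submodule itself for dotted paths
such as 'os.path', which tends to make it the less surprising of the two calls.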

Configuring at construction time

One other alternative worth mentioning: notice how Example 8-32 configures and

repacks each attached demo frame for its role in this GUI:

def addComponents(root):
    for demo in demoModules:
        module = __import__(demo)                     # import by name string
        part = module.Demo(root)                      # attach an instance
        part.config(bd=2, relief=GROOVE)              # or pass configs to Demo()
        part.pack(side=LEFT, expand=YES, fill=BOTH)   # grow, stretch with window

† As we’ll see later in this book, exec can also be dangerous if it is running code strings fetched from users or
network connections. That’s not an issue for the hardcoded strings used internally in this example.

Because the demo classes use their **options arguments to support constructor
arguments, though, we could configure at creation time, too. For example, if we change
this code as follows, it produces the slightly different composite window captured in
Figure 8-33 (stretched a bit horizontally for illustration, too; you can run this as
demoAll-frm-ridge.py in the examples package):

def addComponents(root):
    for demo in demoModules:
        module = __import__(demo)                     # import by name string
        part = module.Demo(root, bd=6, relief=RIDGE)  # attach, config instance
        part.pack(side=LEFT, expand=YES, fill=BOTH)   # grow, stretch with window

Because the demo classes both subclass Frame and support the usual construction
argument protocols, they become true widgets: specialized tkinter frames that
implement an attachable package of widgets and support flexible configuration techniques.

Figure 8-33. demoAll_frm: configure when constructed

As we saw in Chapter 7, attaching nested frames like this is really just one way to reuse
GUI code structured as classes. It’s just as easy to customize such interfaces by
subclassing rather than embedding. Here, though, we’re more interested in deploying an
existing widget package than changing it, so attachment is the pattern we want. The
next two sections show two other ways to present such precoded widget packages to
users: in pop-up windows and as autonomous programs.

Independent Windows

Once you have a set of component classes coded as frames, any parent will work—

both other frames and brand-new, top-level windows. Example 8-33 attaches instances

of all four demo bar objects to their own independent Toplevel windows, instead of

the same container.

Example 8-33. PP4E\Gui\Tour\demoAll-win.py

"""

4 demo classes in independent top-level windows;
not processes: when one is quit all others go away, because all windows run in
the same process here; make Tk() first here, else we get blank default window
"""

from tkinter import *
demoModules = ['demoDlg', 'demoRadio', 'demoCheck', 'demoScale']

def makePopups(modnames):
    demoObjects = []
    for modname in modnames:
        module = __import__(modname)      # import by name string
        window = Toplevel()               # make a new window
        demo = module.Demo(window)        # parent is the new window
        window.title(module.__name__)
        demoObjects.append(demo)
    return demoObjects

def allstates(demoObjects):
    for obj in demoObjects:
        if hasattr(obj, 'report'):
            print(obj.__module__, end=' ')
            obj.report()

root = Tk()                               # make explicit root first
root.title('Popups')
demos = makePopups(demoModules)
Label(root, text='Multiple Toplevel window demo', bg='white').pack()
Button(root, text='States', command=lambda: allstates(demos)).pack(fill=X)
root.mainloop()

We met the Toplevel class earlier; every instance generates a new window on your

screen. The net result is captured in Figure 8-34. Each demo runs in an independent

window of its own instead of being packed together in a single display.

Figure 8-34. demoAll_win: new Toplevel windows

The main root window of this program appears in the lower left of this screenshot; it

provides a States button that runs the report method of each demo object, producing

this sort of stdout text:

C:\...\PP4E\Gui\Tour> python demoAll-win.py

in onMove 0

in onMove 1

you pressed Open

result: C:/Users/mark/Stuff/Books/4E/PP4E/dev/Examples/PP4E/Launcher.py

demoRadio Open

demoCheck 1 1 0 0 0

demoScale 1

As we learned earlier in this chapter, Toplevel windows function independently, but

they are not really independent programs. Destroying just one of the demo windows

in Figure 8-34 by clicking the X button in its upper right corner closes just that window.

But quitting any of the windows shown in Figure 8-34—by a demo window’s Quit

buttons or the main window’s X—quits them all and ends the application, because all

run in the same program process. That’s OK in some applications, but not all. To go

truly rogue we need to spawn processes, as the next section shows.

Running Programs

To be more independent, Example 8-34 spawns each of the four demo launchers as

independent programs (processes), using the launchmodes module we wrote at the end

of Chapter 5. This works only because the demos were written as both importable

classes and runnable scripts. Launching them here makes all their names __main__ when

run, because they are separate, stand-alone programs; this in turn kicks off the main

loop call at the bottom of each of their files.

Example 8-34. PP4E\Gui\Tour\demoAll-prg.py

"""

4 demo classes run as independent program processes: command lines;
if one window is quit now, the others will live on; there is no simple way to
run all report calls here (though sockets and pipes could be used for IPC), and
some launch schemes may drop child program stdout and disconnect parent/child;
"""

from tkinter import *
from PP4E.launchmodes import PortableLauncher
demoModules = ['demoDlg', 'demoRadio', 'demoCheck', 'demoScale']

for demo in demoModules:                        # see Parallel System Tools
    PortableLauncher(demo, demo + '.py')()      # start as top-level programs

root = Tk()
root.title('Processes')
Label(root, text='Multiple program demo: command lines', bg='white').pack()
root.mainloop()

Make sure the PP4E directory’s container is on your module search path (e.g.,

PYTHONPATH) to run this; it imports an example module from a different directory.

As Figure 8-35 shows, the display generated by this script is similar to the prior one;

all four demos come up in windows of their own.

This time, though, these are truly independent programs: if any one of the five windows
here is quit, the others live on. The demos even outlive their parent, if the main window
is closed. On Windows, in fact, the shell window where this script is started becomes
active again when the main window is closed, even though the spawned demos
continue running. We’re reusing the demo code as a program, not a module.

Figure 8-35. demoAll_prg: independent programs

Launching GUIs as programs other ways: multiprocessing

If you backtrack to Chapter 5 to study the portable launcher module used by
Example 8-34 to start programs, you’ll see that it works by using os.spawnv on Windows and
os.fork/exec on others. The net effect is that the GUI processes are effectively started
by launching command lines. These techniques work well, but as we learned in
Chapter 5, they are members of a larger set of program launching tools that also includes
os.popen, os.system, os.startfile, and the subprocess and multiprocessing modules;
these tools can vary subtly in how they handle shell window connections, parent
process exits, and more.

For example, the multiprocessing module we studied in Chapter 5 provides a similarly
portable way to run our GUIs as independent processes, as demonstrated in
Example 8-35. When run, it produces the exact same windows shown in Figure 8-35, except
that the label in the main window is different.

Example 8-35. PP4E\Gui\Tour\demoAll-prg-multi.py

"""

4 demo classes run as independent program processes: multiprocessing;
multiprocessing allows us to launch named functions with arguments,
but not lambdas, because they are not pickleable on Windows (Chapter 5);
multiprocessing also has its own IPC tools like pipes for communication;
"""

from tkinter import *
from multiprocessing import Process
demoModules = ['demoDlg', 'demoRadio', 'demoCheck', 'demoScale']

def runDemo(modname):                           # run in a new process
    module = __import__(modname)                # build gui from scratch
    module.Demo().mainloop()

if __name__ == '__main__':
    for modname in demoModules:                 # in __main__ only!
        Process(target=runDemo, args=(modname,)).start()

    root = Tk()                                 # parent process GUI
    root.title('Processes')
    Label(root, text='Multiple program demo: multiprocessing', bg='white').pack()
    root.mainloop()

Operationally, this version differs on Windows only in that:

• The child processes’ standard output shows up in the window where the script was

launched, including the outputs of both dialog demos themselves and all demo

windows’ State buttons.

• The script doesn’t truly exit if any children are still running: the shell where it is

launched is blocked if the main process’s window is closed while children are still

running, unless we set the child processes’ daemon flag to True before they start as

we saw in Chapter 5—in which case all child programs are automatically shut down

when their parent is (but parents may still outlive their children).
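
The daemon flag mentioned in the second point can be set when the Process object is
made. The following sketch uses a hypothetical make_child helper; it only constructs the
object, and start would still need to be called under the __main__ test, as in
Example 8-35.

```python
from multiprocessing import Process

def make_child(target, args=(), daemon=True):
    """Build an unstarted child process; daemon children are ended with their parent."""
    return Process(target=target, args=args, daemon=daemon)

child = make_child(print, ('hello from child',))
print(child.daemon)        # the flag must be in place before start() is called
```

In the launcher script, this would mean replacing the Process(...) call in the loop
with make_child(runDemo, (modname,)).start(), at the cost of closing every demo window
whenever the main window's process exits.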

Also observe how we start a simple named function in the new Process. As we learned
in Chapter 5, the target must be pickleable on Windows (which essentially means
importable), so we cannot use lambdas to pass extra data in the way we typically could
in tkinter callbacks. The following coding alternatives both fail with errors on
Windows:

Process(target=(lambda: runDemo(modname))).start() # these both fail!

Process(target=(lambda: __import__(modname).Demo().mainloop())).start()
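
The underlying limitation can be seen with the pickle module directly, on the
assumption that pickling is what multiprocessing relies on when it must pass the target
to a freshly started interpreter. In this sketch can_pickle is a hypothetical helper, and
a functools.partial object wrapping a built-in stands in for a named, importable function
with its arguments attached.

```python
import functools
import pickle

def can_pickle(obj):
    """Report whether obj survives an attempt to pickle it."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

modname = 'demoDlg'
print(can_pickle(lambda: print(modname)))               # a lambda has no importable name
print(can_pickle(functools.partial(print, modname)))    # named callable plus bound argument
```

The first call reports False and the second True. Passing extra data through the
args tuple of Process, as Example 8-35 does, sidesteps the problem in the same way.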

We won’t recode our GUI program launcher script with any of the other techniques
available, but feel free to experiment on your own using Chapter 5 as a resource.
Although such tools are not universally applicable, the whole point of classes like
PortableLauncher is to hide such details so we can largely forget them.
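
As one starting point for such experiments, here is a minimal sketch based on the
subprocess module. It is not how launchmodes works; launch is a hypothetical helper that
starts a new Python process with the same interpreter that is running the parent, and
returns without waiting for the child to finish.

```python
import subprocess
import sys

def launch(*args):
    """Start a new Python process running args, without waiting for it."""
    return subprocess.Popen([sys.executable] + list(args))

# to start each demo as its own program, a launcher might run:
#     for name in ['demoDlg', 'demoRadio', 'demoCheck', 'demoScale']:
#         launch(name + '.py')

child = launch('-c', 'print("spawned child running")')
child.wait()                      # waiting here only keeps this example tidy
print('child exit status:', child.returncode)
```

Because sys.executable names the running interpreter, the children use the same
Python as the launcher regardless of what a shell's search path would pick.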

Cross-program communication

Spawning GUIs as programs is the ultimate in code independence, but it makes the
lines of communication between components more complex. For instance, because the
demos run as programs here, there is no easy way to run all their report methods from
the launching script’s window pictured in the upper right of Figure 8-35. In fact, the
States button is gone this time, and we only get PortableLauncher messages in stdout
as the demos start up in Example 8-34:

C:\...\PP4E\Gui\Tour> python demoAll_prg.py

demoDlg

demoRadio

demoCheck

demoScale

On some platforms, messages printed by the demo programs (including their own State buttons) may show up in the original console window where this script is launched; on Windows, the os.spawnv call used to start programs by launchmodes in Example 8-34 completely disconnects the child program’s stdout stream from its parent, but the multiprocessing scheme of Example 8-35 does not. Regardless, there is no direct way to call all demos’ report methods at once: they are spawned programs in distinct address spaces, not imported modules.

Of course, we could trigger report methods in the spawned programs with some of the

Inter-Process Communication (IPC) mechanisms we met in Chapter 5. For instance:

• The demos could be instrumented to catch a user signal, and could run their

report in response.

• The demos could also watch for request strings sent by the launching program to

show up in pipes or fifos; the demoAll launching program would essentially act as

a client, and the demo GUIs as servers that respond to client requests.

• Independent programs can also converse this same way over sockets, the general

IPC tool introduced in Chapter 5, which we’ll study in depth in Part IV. The main

window might send a report request and receive its result on the same socket (and

might even contact demos running remotely).

• If used, the multiprocessing module has IPC tools all its own, such as the object

pipes and queues we studied in Chapter 5, that could also be leveraged: demos

might listen on this type of pipe, too.

Given their event-driven nature, GUI-based programs like our demos also need to avoid becoming stuck in wait states: they cannot be blocked while waiting for requests on IPC devices like those above, or they won’t be responsive to users (and might not even redraw themselves). Because of that, they may also have to be augmented with threads, timer-event callbacks, nonblocking input calls, or some combination of such techniques to periodically check for incoming messages on pipes, fifos, or sockets. As we’ll see, the tkinter after method call described near the end of the next chapter is ideal for this: it allows us to register a callback to run periodically to check for incoming requests on such IPC tools.
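As a rough sketch of the timer-based polling idea, the code below checks a queue without blocking and then reschedules itself. The names drain and poll are hypothetical, a queue stands in for whichever IPC device is used, and the widget argument is assumed to be any tkinter widget (only its after method is called).

```python
import queue

def drain(requests):
    """Return every message now waiting on the queue, without blocking."""
    pending = []
    while True:
        try:
            pending.append(requests.get_nowait())
        except queue.Empty:
            return pending

def poll(widget, requests, handler, interval_ms=250):
    """Handle queued requests, then reschedule this check with after()."""
    for message in drain(requests):
        handler(message)
    widget.after(interval_ms, poll, widget, requests, handler, interval_ms)

requests = queue.Queue()
requests.put('report')
print(drain(requests))             # ['report']
print(drain(requests))             # []
```

A GUI would call poll once after building its widgets, for example poll(root, requests, print); each call returns quickly, so the event loop stays free to redraw windows and respond to users between checks.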

We’ll explore some of these options near the end of Chapter 10, after we’ve looked at

GUI threading topics. But since this is well beyond the scope of the current chapter’s

simple demo programs, I’ll leave such cross-program extensions up to more parallel-

minded readers for now.

Running GUI Code Three Ways | 481

Coding for reusability

A postscript: I coded the demo launcher bars deployed by the last four examples to demonstrate all the different ways that their widgets can be used. They were not developed with general-purpose reusability in mind; in fact, they’re not really useful outside the context of introducing widgets in this book.

That was by design; most tkinter widgets are easy to use once you learn their interfaces, and tkinter already provides lots of configuration flexibility by itself. But if I had it in mind to code checkbutton and radiobutton classes to be reused as general library components, they would have to be structured differently:

Extra widgets

They would not display anything but radio buttons and check buttons. As is, the

demos each embed State and Quit buttons for illustration, but there really should

be just one Quit per top-level window.

Geometry management

They would allow for different button arrangements and would not pack (or grid)

themselves at all. In a true general-purpose reuse scenario, it’s often better to leave

a component’s geometry management up to its caller.

Usage mode limitations

They would either have to export complex interfaces to support all possible tkinter

configuration options and modes, or make some limiting decisions that support

one common use only. For instance, these buttons can either run callbacks at press

time or provide their state later in the application.

Example 8-36 shows one way to code check button and radio button bars as library

components. It encapsulates the notion of associating tkinter variables and imposes a

common usage mode on callers—state fetches rather than press callbacks—to keep the

interface simple.

Example 8-36. PP4E\Gui\Tour\buttonbars.py

"""
check and radio button bar classes for apps that fetch state later;
pass a list of options, call state(), variable details automated
"""

from tkinter import *

class Checkbar(Frame):
    def __init__(self, parent=None, picks=[], side=LEFT, anchor=W):
        Frame.__init__(self, parent)
        self.vars = []
        for pick in picks:
            var = IntVar()
            chk = Checkbutton(self, text=pick, variable=var)
            chk.pack(side=side, anchor=anchor, expand=YES)
            self.vars.append(var)

    def state(self):
        return [var.get() for var in self.vars]

class Radiobar(Frame):
    def __init__(self, parent=None, picks=[], side=LEFT, anchor=W):
        Frame.__init__(self, parent)
        self.var = StringVar()
        self.var.set(picks[0])
        for pick in picks:
            rad = Radiobutton(self, text=pick, value=pick, variable=self.var)
            rad.pack(side=side, anchor=anchor, expand=YES)

    def state(self):
        return self.var.get()

if __name__ == '__main__':
    root = Tk()
    lng = Checkbar(root, ['Python', 'C#', 'Java', 'C++'])
    gui = Radiobar(root, ['win', 'x11', 'mac'], side=TOP, anchor=NW)
    tgl = Checkbar(root, ['All'])

    gui.pack(side=LEFT, fill=Y)
    lng.pack(side=TOP, fill=X)
    tgl.pack(side=LEFT)
    lng.config(relief=GROOVE, bd=2)
    gui.config(relief=RIDGE, bd=2)

    def allstates():
        print(gui.state(), lng.state(), tgl.state())

    from quitter import Quitter
    Quitter(root).pack(side=RIGHT)
    Button(root, text='Peek', command=allstates).pack(side=RIGHT)
    root.mainloop()

To reuse these classes in your scripts, import and call them with a list of the options

that you want to appear in a bar of check buttons or radio buttons. This module’s self-

test code at the bottom of the file gives further usage details. It generates Figure 8-36—

a top-level window that embeds two Checkbars, one Radiobar, a Quitter button to exit,

and a Peek button to show bar states—when this file is run as a program instead of

being imported.

Figure 8-36. buttonbars self-test window

Here’s the stdout text you get after pressing Peek—the results of these classes’ state

methods:

x11 [1, 0, 1, 1] [0]

win [1, 0, 0, 1] [1]

The two classes in this module demonstrate how easy it is to wrap tkinter interfaces to

make them easier to use; they completely abstract away many of the tricky parts of

radio button and check button bars. For instance, you can forget about linked variable

details completely if you use such higher-level classes instead—simply make objects

with option lists and call their state methods later. If you follow this path to its logical

conclusion, you might just wind up with a higher-level widget library on the order of

the Pmw package mentioned in Chapter 7.

On the other hand, these classes are still not universally applicable; if you need to run

actions when these buttons are pressed, for instance, you’ll need to use other high-level

interfaces. Luckily, Python/tkinter already provides plenty. Later in this book, we’ll

again use the widget combination and reuse techniques introduced in this section to

construct larger GUIs like text editors, email clients and calculators. For now, this first

chapter in the widget tour is about to make one last stop—the photo shop.

Images

In tkinter, graphical images are displayed by creating independent PhotoImage or

BitmapImage objects, and then attaching those image objects to other widgets via

image attribute settings. Buttons, labels, canvases, text, and menus can display images

by associating prebuilt image objects in this way. To illustrate, Example 8-37 throws a

picture up on a button.

Example 8-37. PP4E\Gui\Tour\imgButton.py

gifdir = "../gifs/"

from tkinter import *

win = Tk()

igm = PhotoImage(file=gifdir + "ora-pp.gif")

Button(win, image=igm).pack()

win.mainloop()

I could try to come up with a simpler example, but it would be tough—all this script

does is make a tkinter PhotoImage object for a GIF file stored in another directory, and

associate it with a Button widget’s image option. The result is captured in Figure 8-37.

Figure 8-37. imgButton in action

PhotoImage and its cousin, BitmapImage, essentially load graphics files and allow those

graphics to be attached to other kinds of widgets. To open a picture file, pass its name

to the file attribute of these image objects. Though simple, attaching images to buttons

this way has many uses; in Chapter 9, for instance, we’ll use this basic idea to implement

toolbar buttons at the bottom of a window.

Canvas widgets (general drawing surfaces covered in more detail in the next chapter) can display pictures too. Though this is a bit of a preview for the upcoming chapter, basic canvas usage is straightforward enough to demonstrate here; Example 8-38 renders Figure 8-38 (shrunk here for display):

Example 8-38. PP4E\Gui\Tour\imgCanvas.py

gifdir = "../gifs/"

from tkinter import *

win = Tk()

img = PhotoImage(file=gifdir + "ora-lp4e.gif")

can = Canvas(win)

can.pack(fill=BOTH)

can.create_image(2, 2, image=img, anchor=NW) # x, y coordinates

win.mainloop()

Buttons are automatically sized to fit an associated photo, but canvases are not (because

you can add objects to a canvas later, as we’ll see in Chapter 9). To make a canvas fit

the picture, size it according to the width and height methods of image objects, as in

Example 8-39. This version will make the canvas smaller or larger than its default size

as needed, lets you pass in a photo file’s name on the command line, and can be used

as a simple image viewer utility. The visual effect of this script is captured in Figure 8-39.

Images | 485

Example 8-39. PP4E\Gui\Tour\imgCanvas2.py

gifdir = "../gifs/"

from sys import argv

from tkinter import *

filename = argv[1] if len(argv) > 1 else 'ora-lp4e.gif' # name on cmdline?

win = Tk()

img = PhotoImage(file=gifdir + filename)

can = Canvas(win)

can.pack(fill=BOTH)

can.config(width=img.width(), height=img.height()) # size to img size

can.create_image(2, 2, image=img, anchor=NW)

win.mainloop()

Figure 8-38. An image on canvas

Figure 8-39. Sizing the canvas to match the photo

Run this script with other filenames to view other images (try this on your own):

C:\...\PP4E\Gui\Tour> imgCanvas2.py ora-ppr-german.gif

And that’s all there is to it. In Chapter 9, we’ll see images show up again in the items

of a Menu, in the buttons of a window’s toolbar, in other Canvas examples, and in the

image-friendly Text widget. In later chapters, we’ll find them in an image slideshow

(PyView), in a paint program (PyDraw), on clocks (PyClock), in a generalized photo

viewer (PyPhoto), and so on. It’s easy to add graphics to GUIs in Python/tkinter.

Once you start using photos in earnest, though, you’re likely to run into two tricky bits

that I want to warn you about here:

Supported file types

At present, the standard tkinter PhotoImage widget supports only GIF, PPM, and

PGM graphic file formats, and BitmapImage supports X Windows-style .xbm bitmap

files. This may be expanded in future releases, and you can convert photos in other

formats to these supported formats ahead of time, of course. But as we’ll see later

in this chapter, it’s easy to support additional image types with the PIL open source

extension toolkit and its PhotoImage replacement.

Hold on to your images!

Unlike all other tkinter widgets, an image is utterly lost if the corresponding Python image object is garbage collected. That means you must retain an explicit reference to image objects for as long as your program needs them (e.g., assign them to a long-lived variable name, object attribute, or data structure component). Python does not automatically keep a reference to the image, even if it is linked to other GUI components for display. Moreover, image destructor methods erase the image from memory. We saw earlier that tkinter variables can behave oddly when reclaimed, too (they may be unset), but the effect is much worse and more likely to happen with images. This may change in future Python releases, though there are good reasons for not retaining big image files in memory indefinitely; for now, though, images are a “use it or lose it” widget.
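The reference-lifetime issue can be illustrated without a GUI. The sketch below is a pure-Python analogy, not tkinter code: FakeImage is a hypothetical stand-in for an image object, and weakref is used only to observe whether the object is still alive once the function that created it has returned.

```python
import gc
import weakref

class FakeImage:
    """Hypothetical stand-in for an image object; only its name is passed on."""
    def __init__(self, name):
        self.name = name

def show_and_forget():
    img = FakeImage('pyimage1')
    return img.name, weakref.ref(img)      # img itself goes out of scope here

def show_and_keep(saved):
    img = FakeImage('pyimage2')
    saved.append(img)                      # long-lived reference keeps it alive
    return img.name, weakref.ref(img)

saved = []
_, lost = show_and_forget()
_, kept = show_and_keep(saved)
gc.collect()                               # not required in CPython, but harmless
print(lost() is None)                      # True in CPython: object was reclaimed
print(kept() is None)                      # False: the list still refers to it
```

In real tkinter code, the equivalent of the saved list is any long-lived variable, object attribute, or data structure that holds the image object for as long as it must stay on screen.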

Fun with Buttons and Pictures

I tried to come up with an image demo for this section that was both fun and useful. I settled for the fun part. Example 8-40 displays a button that changes its image at random each time it is pressed.

Example 8-40. PP4E\Gui\Tour\buttonpics-func.py

from tkinter import *                # get base widget set
from glob import glob                # filename expansion list
import demoCheck                     # attach checkbutton demo to me
import random                        # pick a picture at random
gifdir = '../gifs/'                  # where to look for GIF files

def draw():
    name, photo = random.choice(images)
    lbl.config(text=name)
    pix.config(image=photo)

root=Tk()
lbl = Label(root, text="none", bg='blue', fg='red')
pix = Button(root, text="Press me", command=draw, bg='white')
lbl.pack(fill=BOTH)
pix.pack(pady=10)
demoCheck.Demo(root, relief=SUNKEN, bd=2).pack(fill=BOTH)

files = glob(gifdir + "*.gif")                      # GIFs for now
images = [(x, PhotoImage(file=x)) for x in files]   # load and hold
print(files)
root.mainloop()

This code uses a handful of built-in tools from the Python library:

• The Python glob module we first met in Chapter 4 gives a list of all files ending

in .gif in a directory; in other words, all GIF files stored there.

• The Python random module is used to select a random GIF from files in the directory:

random.choice picks and returns an item from a list at random.

• To change the image displayed (and the GIF file’s name in a label at the top of the window), the script simply calls the widget config method with new option settings; changing on the fly like this changes the widget’s display dynamically.

Just for fun, this script also attaches an instance of the demoCheck check button demo bar from Example 8-22, which in turn attaches an instance of the Quitter button we wrote earlier in Example 8-7. This is an artificial example, of course, but it again demonstrates the power of component class attachment at work.

Notice how this script builds and holds on to all images in its images list. The list comprehension here applies a PhotoImage constructor call to every .gif file in the photo directory, producing a list of (filename, imageobject) tuples that is saved in a global variable (a map call using a one-argument lambda function could do the same). Remember, this guarantees that image objects won’t be garbage collected as long as the program is running. Figure 8-40 shows this script in action on Windows.
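The map alternative mentioned above can be compared with the list comprehension in a few lines. In this sketch, load is a hypothetical stand-in for the PhotoImage(file=x) call and the filenames are made up, so that the comparison runs without a GUI or image files.

```python
def load(path):
    """Hypothetical stand-in for PhotoImage(file=path)."""
    return 'image:' + path

files = ['a.gif', 'b.gif']                 # made-up names for illustration

by_comprehension = [(x, load(x)) for x in files]
by_map = list(map(lambda x: (x, load(x)), files))   # list() runs the loads now

print(by_comprehension == by_map)          # True
```

The list call matters in Python 3.X: map returns an iterator there, so without it nothing is loaded or held until the result is stepped through, and random.choice requires a real sequence.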

Although it may not be obvious in this grayscale book, the name of the GIF file being displayed is shown in red text in the blue label at the top of this window. This program’s window grows and shrinks automatically when larger and smaller GIF files are displayed; Figure 8-41 shows it randomly picking a taller photo globbed from the image directory.

Figure 8-40. buttonpics in action

Figure 8-41. buttonpics showing a taller photo

And finally, Figure 8-42 captures this script’s GUI displaying one of the wider GIFs,

selected completely at random from the photo file directory.‡

Figure 8-42. buttonpics gets political

While we’re playing, let’s recode this script as a class in case we ever want to attach or

customize it later (it could happen, especially in more realistic programs). It’s mostly

a matter of indenting and adding self before global variable names, as shown in

Example 8-41.

‡ This particular image is not my creation; it appeared as a banner ad on developer-related websites such as Slashdot when the book Learning Python was first published in 1999. It generated enough of a backlash from Perl zealots that O’Reilly eventually pulled the ad altogether. Which may be why, of course, it later appeared in this book.

Example 8-41. PP4E\Gui\Tour\buttonpics.py

from tkinter import *                # get base widget set
from glob import glob                # filename expansion list
import demoCheck                     # attach check button example to me
import random                        # pick a picture at random
gifdir = '../gifs/'                  # default dir to load GIF files

class ButtonPicsDemo(Frame):
    def __init__(self, gifdir=gifdir, parent=None):
        Frame.__init__(self, parent)
        self.pack()
        self.lbl = Label(self, text="none", bg='blue', fg='red')
        self.pix = Button(self, text="Press me", command=self.draw, bg='white')
        self.lbl.pack(fill=BOTH)
        self.pix.pack(pady=10)
        demoCheck.Demo(self, relief=SUNKEN, bd=2).pack(fill=BOTH)
        files = glob(gifdir + "*.gif")
        self.images = [(x, PhotoImage(file=x)) for x in files]
        print(files)

    def draw(self):
        name, photo = random.choice(self.images)
        self.lbl.config(text=name)
        self.pix.config(image=photo)

if __name__ == '__main__': ButtonPicsDemo().mainloop()

This version works the same way as the original, but it can now be attached to any

other GUI where you would like to include such an unreasonably silly button.

Viewing and Processing Images with PIL

As mentioned earlier, Python tkinter scripts show images by associating independently

created image objects with real widget objects. At this writing, tkinter GUIs can display

photo image files in GIF, PPM, and PGM formats by creating a PhotoImage object, as

well as X11-style bitmap files (usually suffixed with an .xbm extension) by creating a

BitmapImage object.

This set of supported file formats is limited by the underlying Tk library, not by tkinter

itself, and may expand in the future (it has not in many years). But if you want to display

files in other formats today (e.g., the popular JPEG format), you can either convert your

files to one of the supported formats with an image-processing program or install the

PIL Python extension package mentioned at the start of Chapter 7.
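One simple way to decide which files need conversion or PIL is to check their extensions against the formats listed above. The sketch below assumes exactly that list; needs_pil and icon.xbm are hypothetical names, an extension is only a hint about a file’s real content, and the set of formats supported natively depends on the Tk library installed.

```python
import os

# Formats the text above lists for the standard tkinter image classes
TK_PHOTO_TYPES = {'.gif', '.ppm', '.pgm'}       # PhotoImage
TK_BITMAP_TYPES = {'.xbm'}                      # BitmapImage
TK_TYPES = TK_PHOTO_TYPES | TK_BITMAP_TYPES

def needs_pil(filename):
    """True if the file extension is outside the standard tkinter formats."""
    ext = os.path.splitext(filename)[1].lower()
    return ext not in TK_TYPES

for name in ('ora-pp.gif', 'spam.jpg', 'icon.xbm'):
    print(name, '=> PIL or conversion' if needs_pil(name) else '=> standard tkinter')
```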

PIL, the Python Imaging Library, is an open source system that supports nearly 30 graphics file formats (including GIF, JPEG, TIFF, PNG, and BMP). In addition to allowing your scripts to display a much wider variety of image types than standard tkinter, PIL also provides tools for image processing, including geometric transforms, thumbnail creation, format conversions, and much more.

PIL Basics

To use its tools, you must first fetch and install the PIL package: see http://www.pythonware.com (or search for “PIL” on the web). Then, simply use special PhotoImage and BitmapImage objects imported from the PIL ImageTk module to open files in other graphic formats. These are compatible replacements for the standard tkinter classes of the same name, and they may be used anywhere tkinter expects a PhotoImage or BitmapImage object (i.e., in label, button, canvas, text, and menu object configurations).

That is, replace standard tkinter code such as this:

from tkinter import *

imgobj = PhotoImage(file=imgdir + "spam.gif")

Button(image=imgobj).pack()

Viewing and Processing Images with PIL | 491

with code of this form:

from tkinter import *

from PIL import ImageTk

photoimg = ImageTk.PhotoImage(file=imgdir + "spam.jpg")

Button(image=photoimg).pack()

or with the more verbose equivalent, which comes in handy if you will perform image

processing in addition to image display:

from tkinter import *

from PIL import Image, ImageTk

imageobj = Image.open(imgdir + "spam.jpeg")

photoimg = ImageTk.PhotoImage(imageobj)

Button(image=photoimg).pack()

In fact, to use PIL for image display, all you really need to do is install it and add a single

from statement to your code to get its replacement PhotoImage object after loading the

original from tkinter. The rest of your code remains unchanged but will be able to

display JPEG, PNG, and other image types:

from tkinter import *

from PIL.ImageTk import PhotoImage # <== add this line

imgobj = PhotoImage(file=imgdir + "spam.png")

Button(image=imgobj).pack()

PIL installation details vary per platform; on Windows, it is just a matter of downloading and running a self-installer. PIL code winds up in the Python install directory’s Lib\site-packages; because this is automatically added to the module import search path, no path configuration is required to use PIL. Simply run the installer and import the PIL package’s modules. On other platforms, you might untar or unZIP a fetched source code archive and add PIL directories to the front of your PYTHONPATH setting; see the PIL system’s website for more details. (In fact, I am using a pre-release version of PIL for Python 3.1 in this edition; it should be officially released by the time you read these words.)

There is much more to PIL than we have space to cover here. For instance, it also

provides image conversion, resizing, and transformation tools, some of which can be

run as command-line programs that have nothing to do with GUIs directly. Especially

for tkinter-based programs that display or process images, PIL will likely become a

standard component in your software tool set.

See http://www.pythonware.com for more information, as well as online PIL and tkinter

documentation sets. To help get you started, though, we’ll close out this chapter with

a handful of real scripts that use PIL for image display and processing.

Displaying Other Image Types with PIL

In our earlier image examples, we attached images to buttons and canvases, but the standard tkinter toolkit allows images to be added to a variety of widget types, including simple labels, text, and menu entries. Example 8-42, for instance, uses unadorned tkinter to display a single image by attaching it to a label in the main application window. The example assumes that images are stored in an images subdirectory, and it allows the image filename to be passed in as a command-line argument (it defaults to london-2010.gif if no argument is passed). It also joins file and directory names more portably with os.path.join, and it prints the image’s width and height in pixels to the standard output stream, just to give extra information.

Example 8-42. PP4E\Gui\PIL\viewer-tk.py

"""
show one image with standard tkinter photo object;
as is this handles GIF files, but not JPEG images; image filename listed in
command line, or default; use a Canvas instead of Label for scrolling, etc.
"""

import os, sys
from tkinter import *                    # use standard tkinter photo object
                                         # GIF works, but JPEG requires PIL
imgdir = 'images'
imgfile = 'london-2010.gif'
if len(sys.argv) > 1:                    # cmdline argument given?
    imgfile = sys.argv[1]
imgpath = os.path.join(imgdir, imgfile)

win = Tk()
win.title(imgfile)
imgobj = PhotoImage(file=imgpath)        # display photo on a Label
Label(win, image=imgobj).pack()
print(imgobj.width(), imgobj.height())   # show size in pixels before destroyed
win.mainloop()

Figure 8-43 captures this script’s display on Windows 7, showing the default GIF image file. Run this from the system console with a filename as a command-line argument to view other files in the images subdirectory (e.g., python viewer-tk.py filename.gif).
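The filename selection logic in Example 8-42 can be isolated and tried without a GUI. In this sketch the function name image_path is hypothetical; it takes an argument list in the form of sys.argv and uses a conditional expression in place of the if statement, with the same default directory and file as the example.

```python
import os

def image_path(argv, imgdir='images', default='london-2010.gif'):
    """Use the filename on the command line if given, else the default."""
    imgfile = argv[1] if len(argv) > 1 else default
    return os.path.join(imgdir, imgfile)        # portable directory separator

print(image_path(['viewer-tk.py']))                  # default file in images
print(image_path(['viewer-tk.py', 'spam.gif']))      # name from the command line
```

Passing sys.argv itself gives the script’s actual behavior; passing a literal list, as here, makes the rule easy to test.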

Figure 8-43. tkinter GIF display

Example 8-42 works, but only for image types supported by the base tkinter toolkit. To display other image formats, such as JPEG, we need to install PIL and use its replacement PhotoImage object. In terms of code, it’s simply a matter of adding one import statement, as illustrated in Example 8-43.

Example 8-43. PP4E\Gui\PIL\viewer-pil.py

"""
show one image with PIL photo replacement object
handles many more image types; install PIL first: placed in Lib\site-packages
"""

import os, sys
from tkinter import *
from PIL.ImageTk import PhotoImage       # <== use PIL replacement class
                                         # rest of code unchanged
imgdir = 'images'
imgfile = 'florida-2009-1.jpg'           # does gif, jpg, png, tiff, etc.
if len(sys.argv) > 1:
    imgfile = sys.argv[1]
imgpath = os.path.join(imgdir, imgfile)

win = Tk()
win.title(imgfile)
imgobj = PhotoImage(file=imgpath)        # now JPEGs work!
Label(win, image=imgobj).pack()
win.mainloop()
print(imgobj.width(), imgobj.height())   # show size in pixels on exit

With PIL, our script is now able to display many image types, including the default

JPEG image defined in the script and captured in Figure 8-44. Again, run with a

command-line argument to view other photos.

Figure 8-44. tkinter+PIL JPEG display

Displaying all images in a directory

While we’re at it, it’s not much extra work to allow viewing all images in a directory, using some of the directory path tools we met in the first part of this book. Example 8-44, for instance, simply opens a new Toplevel pop-up window for each image in a directory (given as a command-line argument or a default), taking care to skip nonimage files by catching exceptions; error messages are both printed and displayed in the bad file’s pop-up window.

Example 8-44. PP4E\Gui\PIL\viewer-dir.py

"""
display all images in a directory in pop-up windows
GIFs work in basic tkinter, but JPEGs will be skipped without PIL
"""

import os, sys
from tkinter import *
from PIL.ImageTk import PhotoImage       # <== required for JPEGs and others

imgdir = 'images'
if len(sys.argv) > 1: imgdir = sys.argv[1]
imgfiles = os.listdir(imgdir)            # does not include directory prefix

main = Tk()
main.title('Viewer')
quit = Button(main, text='Quit all', command=main.quit, font=('courier', 25))
quit.pack()
savephotos = []

for imgfile in imgfiles:
    imgpath = os.path.join(imgdir, imgfile)
    win = Toplevel()
    win.title(imgfile)
    try:
        imgobj = PhotoImage(file=imgpath)
        Label(win, image=imgobj).pack()
        print(imgpath, imgobj.width(), imgobj.height())     # size in pixels
        savephotos.append(imgobj)                           # keep a reference
    except:
        errmsg = 'skipping %s\n%s' % (imgfile, sys.exc_info()[1])
        Label(win, text=errmsg).pack()

main.mainloop()

Run this code on your own to see the windows it generates. If you do, you’ll get one main window with a Quit button to kill all the windows at once, plus as many pop-up image view windows as there are images in the directory. This is convenient for a quick look, but not exactly the epitome of user friendliness for large directories! The sample images directory used for testing, for instance, has 59 images, yielding 60 pop-up windows; those created by your digital camera may have many more. To do better, let’s move on to the next section.

Creating Image Thumbnails with PIL

As mentioned, PIL does more than display images in a GUI; it also comes with tools for resizing, converting, and more. One of the many useful tools it provides is the ability to generate small, “thumbnail” images from originals. Such thumbnails may be displayed in a web page or selection GUI to allow the user to open full-size images on demand.

Example 8-45 is a concrete implementation of this idea: it generates thumbnail images using PIL and displays them on buttons which open the corresponding original image when clicked. The net effect is much like the file explorer GUIs that are now standard on modern operating systems, but by coding this in Python, we’re able to control its behavior and to reuse and customize its code in our own applications. In fact, we’ll reuse the makeThumbs function here repeatedly in other examples. As usual, these are some of the primary benefits inherent in open source software in general.

Example 8-45. PP4E\Gui\PIL\viewer_thumbs.py

"""
display all images in a directory as thumbnail image buttons that display
the full image when clicked; requires PIL for JPEGs and thumbnail image
creation; to do: add scrolling if too many thumbs for window!
"""

import os, sys, math
from tkinter import *
from PIL import Image                    # <== required for thumbs
from PIL.ImageTk import PhotoImage       # <== required for JPEG display

def makeThumbs(imgdir, size=(100, 100), subdir='thumbs'):
    """
    get thumbnail images for all images in a directory; for each image, create
    and save a new thumb, or load and return an existing thumb; makes thumb
    dir if needed; returns a list of (image filename, thumb image object);
    caller can also run listdir on thumb dir to load; on bad file types may
    raise IOError, or other; caveat: could also check file timestamps;
    """
    thumbdir = os.path.join(imgdir, subdir)
    if not os.path.exists(thumbdir):
        os.mkdir(thumbdir)

    thumbs = []
    for imgfile in os.listdir(imgdir):
        thumbpath = os.path.join(thumbdir, imgfile)
        if os.path.exists(thumbpath):
            thumbobj = Image.open(thumbpath)              # use already created
            thumbs.append((imgfile, thumbobj))
        else:
            print('making', thumbpath)
            imgpath = os.path.join(imgdir, imgfile)
            try:
                imgobj = Image.open(imgpath)              # make new thumb
                imgobj.thumbnail(size, Image.ANTIALIAS)   # best downsize filter
                imgobj.save(thumbpath)                    # type via ext or passed
                thumbs.append((imgfile, imgobj))
            except:                                       # not always IOError
                print("Skipping: ", imgpath)
    return thumbs

class ViewOne(Toplevel):
    """
    open a single image in a pop-up window when created; photoimage
    object must be saved: images are erased if object is reclaimed;
    """
    def __init__(self, imgdir, imgfile):
        Toplevel.__init__(self)
        self.title(imgfile)
        imgpath = os.path.join(imgdir, imgfile)
        imgobj = PhotoImage(file=imgpath)
        Label(self, image=imgobj).pack()
        print(imgpath, imgobj.width(), imgobj.height())   # size in pixels
        self.savephoto = imgobj                           # keep reference on me

def viewer(imgdir, kind=Toplevel, cols=None):
    """
    make thumb links window for an image directory: one thumb button per image;
    use kind=Tk to show in main app window, or Frame container (pack); imgfile
    differs per loop: must save with a default; photoimage objs must be saved:
    erased if reclaimed; packed row frames (versus grids, fixed-sizes, canvas);
    """
    win = kind()
    win.title('Viewer: ' + imgdir)
    quit = Button(win, text='Quit', command=win.quit, bg='beige')   # pack first
    quit.pack(fill=X, side=BOTTOM)                                  # so clip last
    thumbs = makeThumbs(imgdir)
    if not cols:
        cols = int(math.ceil(math.sqrt(len(thumbs))))     # fixed or N x N

    savephotos = []
    while thumbs:
        thumbsrow, thumbs = thumbs[:cols], thumbs[cols:]
        row = Frame(win)
        row.pack(fill=BOTH)
        for (imgfile, imgobj) in thumbsrow:
            photo = PhotoImage(imgobj)
            link = Button(row, image=photo)
            handler = lambda savefile=imgfile: ViewOne(imgdir, savefile)
            link.config(command=handler)
            link.pack(side=LEFT, expand=YES)
            savephotos.append(photo)
    return win, savephotos

if __name__ == '__main__':
    imgdir = (len(sys.argv) > 1 and sys.argv[1]) or 'images'
    main, save = viewer(imgdir, kind=Tk)
    main.mainloop()

Notice how this code’s viewer must pass imgfile to the generated callback handler
with a default argument. Because imgfile is a loop variable that the lambda would
otherwise look up only when it is called, every callback would see the value from the
final loop iteration if the current value were not saved this way (all buttons would
open the same image!). Also notice that we keep a list of references to the photo image
objects; photos are erased when their object is garbage collected, even if they are
currently being displayed. To avoid this, we hold references in a long-lived list that
the caller retains.
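The loop-variable pitfall is independent of tkinter, so it can be seen in a few lines of plain Python. The following is a minimal sketch; the file names and the late and early lists are made up for illustration and are not part of the viewer script:

```python
# Hypothetical file names, used only to illustrate the scoping rule.
imgfiles = ['spam.gif', 'eggs.gif', 'ham.gif']

# Without a default, each lambda looks up imgfile when it is *called*,
# by which time the loop has finished and imgfile is the last name.
late = []
for imgfile in imgfiles:
    late.append(lambda: imgfile)

# With a default, the current value is evaluated and saved when the
# lambda is *created*, so each callback remembers its own file.
early = []
for imgfile in imgfiles:
    early.append(lambda savefile=imgfile: savefile)

print([f() for f in late])     # ['ham.gif', 'ham.gif', 'ham.gif']
print([f() for f in early])    # ['spam.gif', 'eggs.gif', 'ham.gif']
```

The second form is the one the viewer uses for its button handlers.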

Figure 8-45 shows the main thumbnail selection window generated by Example 8-45

when viewing the default images subdirectory in the examples source tree (resized here

for display). As in the previous examples, you can pass in an optional directory name

to run the viewer on a directory of your own (for instance, one copied from your digital

camera). Clicking a thumbnail button in the main window opens a corresponding im-

age in a pop-up window; Figure 8-46 captures one.


Figure 8-45. Simple thumbnail selection GUI, simple row frames

Figure 8-46. Thumbnail viewer pop-up image window


Much of Example 8-45’s code should be straightforward by now. It lays out thumbnail
buttons in row frames, much like prior examples (see the input forms layout alternatives
earlier in this chapter). Most of the PIL-specific code in this example is in the
makeThumbs function. It opens, creates, and saves the thumbnail image, unless one has
already been saved (i.e., cached) to a local file. As coded, thumbnail images are saved in
the same image format as the original full-size photo.

We also use the PIL ANTIALIAS filter—the best quality for down-sampling (shrinking);

this does a better job on low-resolution GIFs. Thumbnail generation is essentially just

an in-place resize that preserves the original aspect ratio. Because there is more to this

story than we can cover here, though, I’ll defer to PIL and its documentation for more

details on that package’s API.
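To make the aspect-ratio point concrete, here is a rough sketch of the size arithmetic behind an in-place thumbnail resize. The function name thumb_size is hypothetical, and PIL’s own rounding may differ by a pixel; the sketch only shows the idea of scaling both dimensions by the same factor and never enlarging:

```python
def thumb_size(width, height, box=(100, 100)):
    """
    Return the size of an image shrunk to fit inside box while keeping
    its width/height ratio. Images already inside the box are unchanged.
    """
    maxw, maxh = box
    scale = min(maxw / width, maxh / height, 1.0)   # never scale up
    return max(1, round(width * scale)), max(1, round(height * scale))

print(thumb_size(1600, 1200))   # (100, 75)
print(thumb_size(600, 800))     # (75, 100)
print(thumb_size(80, 40))       # (80, 40)
```

Under this arithmetic, the longest side of a thumbnail equals the box size only when the original image is at least that large.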

We’ll briefly revisit thumbnail creation in the next chapter to create toolbar buttons.
Before we move on, though, three variations on the thumbnail viewer are worth quick
consideration: the first underscores performance concepts, and the others improve on
the arguably odd layout of Figure 8-45.

Performance: Saving thumbnail files

As is, the viewer saves the generated thumbnail image in a file, so it can be loaded

quickly the next time the script is run. This isn’t strictly required—Example 8-46, for

instance, customizes the thumbnail generation function to generate the thumbnail im-

ages in memory, but never save them.

There is no noticeable speed difference for very small image collections. If you run these

alternatives on larger image collections, though, you’ll notice that the original version

in Example 8-45 gains a big performance advantage by saving and loading the thumb-

nails to files. On one test with many large image files on my machine (some 320 images

from a digital camera memory stick and an admittedly underpowered laptop), the

original version opens the GUI in roughly just 5 seconds after its initial run to cache

thumbnails, compared to as much as 1 minute and 20 seconds for Example 8-46: a

factor of 16 slower. For thumbnails, loading from files is much quicker than

recalculation.
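The caching idea described here is a general pattern: check for a saved result before recomputing it. The sketch below shows that control flow with plain files standing in for thumbnails; cached_thumb and its make parameter are hypothetical names, not part of the book’s examples or of PIL:

```python
import os

def cached_thumb(imgpath, thumbdir, make):
    """
    Return (thumbpath, was_cached). make(imgpath, thumbpath) is called to
    create and save the thumbnail only if no saved copy exists yet.
    """
    os.makedirs(thumbdir, exist_ok=True)
    thumbpath = os.path.join(thumbdir, os.path.basename(imgpath))
    if os.path.exists(thumbpath):
        return thumbpath, True          # fast path: reuse the saved file
    make(imgpath, thumbpath)            # slow path: compute and save once
    return thumbpath, False
```

In a real viewer, make would presumably open the image, shrink it, and save it to thumbpath; only the first run over a directory pays that cost.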

Example 8-46. PP4E\Gui\PIL\viewer-thumbs-nosave.py

"""
same, but make thumb images in memory without saving to or loading from files:
seems just as fast for small directories, but saving to files makes startup much
quicker for large image collections; saving may be needed in some apps (web pages)
"""

import os, sys
from PIL import Image
from tkinter import Tk
import viewer_thumbs

def makeThumbs(imgdir, size=(100, 100), subdir='thumbs'):
    """
    create thumbs in memory but don't cache to files
    """
    thumbs = []
    for imgfile in os.listdir(imgdir):
        imgpath = os.path.join(imgdir, imgfile)
        try:
            imgobj = Image.open(imgpath)            # make new thumb
            imgobj.thumbnail(size)
            thumbs.append((imgfile, imgobj))
        except Exception:                           # not an image, or unreadable
            print("Skipping: ", imgpath)
    return thumbs

if __name__ == '__main__':
    imgdir = (len(sys.argv) > 1 and sys.argv[1]) or 'images'
    viewer_thumbs.makeThumbs = makeThumbs
    main, save = viewer_thumbs.viewer(imgdir, kind=Tk)
    main.mainloop()

Layout options: Gridding

The next variations on our viewer are purely cosmetic, but they illustrate tkinter layout

concepts. If you look at Figure 8-45 long enough, you’ll notice that its layout of thumb-

nails is not as uniform as it could be. Individual rows are fairly coherent because the

GUI is laid out by row frames, but columns can be misaligned badly due to differences

in image shape. Different packing options don’t seem to help (and can make matters

even more askew—try it), and arranging by column frames would just shift the problem

to another dimension. For larger collections, it could become difficult to locate and

open specific images.

With just a little extra work, we can achieve a more uniform layout by either laying out

the thumbnails in a grid, or using uniform fixed-size buttons. Example 8-47 positions

buttons in a row/column grid by using the tkinter grid geometry manager—a topic we

will explore in more detail in the next chapter, so like the canvas, you should consider

some of this code to be a preview and segue, too. In short, grid arranges its contents

by row and column; we’ll learn all about the stickiness of the Quit button here in

Chapter 9.

Example 8-47. PP4E\Gui\PIL\viewer-thumbs-grid.py

"""
same as viewer_thumbs, but uses the grid geometry manager to try to achieve
a more uniform layout; can generally achieve the same with frames and pack
if buttons are all fixed and uniform in size;
"""

import sys, math
from tkinter import *
from PIL.ImageTk import PhotoImage
from viewer_thumbs import makeThumbs, ViewOne

def viewer(imgdir, kind=Toplevel, cols=None):
    """
    custom version that uses gridding
    """
    win = kind()
    win.title('Viewer: ' + imgdir)
    thumbs = makeThumbs(imgdir)
    if not cols:
        cols = int(math.ceil(math.sqrt(len(thumbs))))      # fixed or N x N

    rownum = 0
    savephotos = []
    while thumbs:
        thumbsrow, thumbs = thumbs[:cols], thumbs[cols:]
        colnum = 0
        for (imgfile, imgobj) in thumbsrow:
            photo = PhotoImage(imgobj)
            link = Button(win, image=photo)
            handler = lambda savefile=imgfile: ViewOne(imgdir, savefile)
            link.config(command=handler)
            link.grid(row=rownum, column=colnum)
            savephotos.append(photo)
            colnum += 1
        rownum += 1
    Button(win, text='Quit', command=win.quit).grid(columnspan=cols, sticky=EW)
    return win, savephotos

if __name__ == '__main__':
    imgdir = (len(sys.argv) > 1 and sys.argv[1]) or 'images'
    main, save = viewer(imgdir, kind=Tk)
    main.mainloop()

Figure 8-47 displays the effect of gridding—our buttons line up in rows and columns

in a more uniform fashion than in Figure 8-45, because they are positioned by both row

and column, not just by rows. As we’ll see in the next chapter, gridding can help any

time our displays are two-dimensional by nature.
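The row and column bookkeeping that gridding requires can also be computed directly from each item’s index. The helper below is a hypothetical sketch, not part of the book’s code; it uses the same ceil(sqrt(N)) default for the column count as the viewer scripts:

```python
import math

def grid_positions(count, cols=None):
    """
    Return a (row, column) pair for each of count items, filling rows
    left to right; cols defaults to ceil(sqrt(count)) as in the viewers.
    """
    if not cols:
        cols = int(math.ceil(math.sqrt(count))) or 1
    return [divmod(index, cols) for index in range(count)]

print(grid_positions(5))
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)]
```

Each pair could then be passed to a widget’s grid call as its row and column options.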

Layout options: Fixed-size buttons

Gridding helps—rows and columns align regularly now—but image shape still makes

this less than ideal. We can achieve a layout that is perhaps even more uniform than

gridding by giving each thumbnail button a fixed size. Buttons are sized to their images

(or text) by default, but we can always override this if needed. Example 8-48 does the

trick. It sets the height and width of each button to match the maximum dimension of

the thumbnail icon, so it is neither too thin nor too high. Assuming all thumbnails have

the same maximum dimension (something our thumb-maker ensures), this will achieve

the desired layout.

Figure 8-47. Gridded thumbnail selection GUI

Example 8-48. PP4E\Gui\PIL\viewer-thumbs-fixed.py

"""
use fixed size for thumbnails, so align regularly; size taken from image
object, assume all same max; this is essentially what file selection GUIs do;
"""

import sys, math
from tkinter import *
from PIL.ImageTk import PhotoImage
from viewer_thumbs import makeThumbs, ViewOne

def viewer(imgdir, kind=Toplevel, cols=None):
    """
    custom version that lays out with fixed-size buttons
    """
    win = kind()
    win.title('Viewer: ' + imgdir)
    thumbs = makeThumbs(imgdir)
    if not cols:
        cols = int(math.ceil(math.sqrt(len(thumbs))))      # fixed or N x N

    savephotos = []
    while thumbs:
        thumbsrow, thumbs = thumbs[:cols], thumbs[cols:]
        row = Frame(win)
        row.pack(fill=BOTH)
        for (imgfile, imgobj) in thumbsrow:
            size = max(imgobj.size)                        # width, height
            photo = PhotoImage(imgobj)
            link = Button(row, image=photo)
            handler = lambda savefile=imgfile: ViewOne(imgdir, savefile)
            link.config(command=handler, width=size, height=size)
            link.pack(side=LEFT, expand=YES)
            savephotos.append(photo)

    Button(win, text='Quit', command=win.quit, bg='beige').pack(fill=X)
    return win, savephotos

if __name__ == '__main__':
    imgdir = (len(sys.argv) > 1 and sys.argv[1]) or 'images'
    main, save = viewer(imgdir, kind=Tk)
    main.mainloop()

Figure 8-48 shows the results of applying a fixed size to our buttons; all are the same

size now, using a size taken from the images themselves. The effect is to display all

thumbnails as same-size tiles regardless of their shape, so they are easier to view. Nat-

urally, other layout schemes are possible as well; experiment with some of the config-

uration options in this code on your own to see their effect on the display.

Figure 8-48. Fixed-size thumbnail selection GUI, row frames

Scrolling and canvases (ahead)

The thumbnail viewer scripts presented in this section work well for reasonably sized

image directories, and you can use smaller thumbnail size settings for larger image

collections. Perhaps the biggest limitation of these programs, though, is that the

thumbnail windows they create will become too large to handle (or display at all) if the

image directory contains very many files.


Even with the sample images directory used for this book, we lost the Quit button at

the bottom of the display in the last two figures because there are too many thumbnail

images to show. To illustrate the difference, the original Example 8-45 packs the Quit

button first for this very reason—so it is clipped last, after all thumbnails, and thus

remains visible when there are many photos. We could do a similar thing for the other

versions, but we’d still lose thumbnails if there were too many. A directory from your

camera with many images might similarly produce a window too large to fit on your

computer’s screen.

To do better, we could arrange the thumbnails on a widget that supports scrolling. The

open source Pmw package includes a handy scrolled frame that may help. Moreover,

the standard tkinter Canvas widget gives us more control over image displays (including

placement by absolute pixel coordinates) and supports horizontal and vertical scrolling

of its content.

In fact, in the next chapter, we’ll code one final extension to our script which does just

that—it displays thumbnails in a scrolled canvas, and so it handles large collections

much better. Its thumbnail buttons are fixed-size as in our last example here, but are

positioned at computed coordinates. I’ll defer further details here, though, because

we’ll study that extension in conjunction with canvases in the next chapter. And in

Chapter 11, we’ll apply this technique to an even more full-featured image program

called PyPhoto.
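As a preview of what “computed coordinates” might mean, the following sketch works out pixel positions for fixed-size tiles and the overall area a scrolled canvas would need to cover. The function tile_layout is hypothetical, and the next chapter’s actual script may compute its layout differently:

```python
def tile_layout(count, cols, size):
    """
    Return (coords, scrollregion) for count square tiles of the given
    pixel size, arranged cols per row from the top-left corner.
    """
    coords = [((i % cols) * size, (i // cols) * size) for i in range(count)]
    rows = -(-count // cols)                      # ceiling division
    scrollregion = (0, 0, cols * size, rows * size)
    return coords, scrollregion

coords, region = tile_layout(count=7, cols=3, size=100)
print(coords[3], coords[6])    # (0, 100) (0, 200)
print(region)                  # (0, 0, 300, 300)
```

The four-item tuple has the same shape as the Canvas widget’s scrollregion option, which tells the canvas how large its scrollable area is.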

To learn how these programs do their jobs, though, we need to move on to the next

chapter, and the second half of our widget tour.


CHAPTER 9

A tkinter Tour, Part 2

“On Today’s Menu: Spam, Spam, and Spam”

This chapter is the second in a two-part tour of the tkinter library. It picks up where

Chapter 8 left off and covers some of the more advanced widgets and tools in the tkinter

arsenal. Among the topics presented in this chapter:

• Menu, Menubutton, and OptionMenu widgets

• The Scrollbar widget: for scrolling text, lists, and canvases

• The Listbox widget: a list of multiple selections

• The Text widget: a general text display and editing tool

• The Canvas widget: a general graphical drawing tool

• The grid table-based geometry manager

• Time-based tools: after, update, wait, and threads

• Basic tkinter animation techniques

• Clipboards, erasing widgets and windows, and so on

By the time you’ve finished this chapter, you will have seen the bulk of the tkinter

library, and you will have all the information you need to compose larger, portable user

interfaces of your own. You’ll also be ready to tackle the larger GUI techniques and

more complete examples presented in Chapters 10 and 11. For now, let’s resume the

widget show.

Menus

Menus are the pull-down lists you’re accustomed to seeing at the top of a window (or

the entire display, if you’re accustomed to seeing them that way on a Macintosh). Move

the mouse cursor to the menu bar at the top and click on a name (e.g., File), and a list

of selectable options pops up under the name you clicked (e.g., Open, Save). The op-

tions within a menu might trigger actions, much like clicking on a button; they may


also open other “cascading” submenus that list more options, pop up dialog windows,

and so on. In tkinter, there are two kinds of menus you can add to your scripts: top-

level window menus and frame-based menus. The former option is better suited to

whole windows, but the latter also works as a nested component.

Top-Level Window Menus

In all recent Python releases (using Tk 8.0 and later), you can associate a horizontal

menu bar with a top-level window object (e.g., a Tk or Toplevel). On Windows and

Unix (X Windows), this menu bar is displayed along the top of the window; on some

Macintosh machines, this menu replaces the one shown at the top of the screen when

the window is selected. In other words, window menus look like you would expect on

whatever underlying platform your script runs upon.

This scheme is based on building trees of Menu widget objects. Simply associate one
top-level Menu with the window, add other pull-down Menu objects as cascades of the
top-level Menu, and add entries to each of the pull-down objects. Menus are cross-linked
with the next higher level by using parent widget arguments and the Menu widget’s
add_cascade method. It works like this:

1. Create a topmost Menu as the child of the window widget and configure the
window’s menu attribute to be the new Menu.

2. For each pull-down object, make a new Menu as the child of the topmost Menu and
add the child as a cascade of the topmost Menu using add_cascade.

3. Add menu selections to each pull-down Menu from step 2, using the command options
of add_command to register selection callback handlers.

4. Add a cascading submenu by making a new Menu as the child of the Menu the cascade
extends and using add_cascade to link the parent to the child.

The end result is a tree of Menu widgets with associated command callback handlers. This
is probably simpler in code than in words, though. Example 9-1 makes a main menu
with two pull downs, File and Edit; the Edit pull down in turn has a nested submenu
of its own.

Example 9-1. PP4E\Gui\Tour\menu_win.py

# Tk8.0 style top-level window menus

from tkinter import *                              # get widget classes
from tkinter.messagebox import *                   # get standard dialogs

def notdone():
    showerror('Not implemented', 'Not yet available')

def makemenu(win):
    top = Menu(win)                                # win=top-level window
    win.config(menu=top)                           # set its menu option

    file = Menu(top)
    file.add_command(label='New...', command=notdone, underline=0)
    file.add_command(label='Open...', command=notdone, underline=0)
    file.add_command(label='Quit', command=win.quit, underline=0)
    top.add_cascade(label='File', menu=file, underline=0)

    edit = Menu(top, tearoff=False)
    edit.add_command(label='Cut', command=notdone, underline=0)
    edit.add_command(label='Paste', command=notdone, underline=0)
    edit.add_separator()
    top.add_cascade(label='Edit', menu=edit, underline=0)

    submenu = Menu(edit, tearoff=True)
    submenu.add_command(label='Spam', command=win.quit, underline=0)
    submenu.add_command(label='Eggs', command=notdone, underline=0)
    edit.add_cascade(label='Stuff', menu=submenu, underline=0)

if __name__ == '__main__':
    root = Tk()                                    # or Toplevel()
    root.title('menu_win')                         # set window-mgr info
    makemenu(root)                                 # associate a menu bar
    msg = Label(root, text='Window menu basics')   # add something below
    msg.pack(expand=YES, fill=BOTH)
    msg.config(relief=SUNKEN, width=40, height=7, bg='beige')
    root.mainloop()

A lot of code in this file is devoted to setting callbacks and such, so it might help to

isolate the bits involved with the menu tree-building process. For the File menu, it’s

done like this:

top = Menu(win) # attach Menu to window

win.config(menu=top) # cross-link window to menu

file = Menu(top) # attach a Menu to top Menu

top.add_cascade(label='File', menu=file) # cross-link parent to child
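Because the linkage follows the same pattern at every level, the tree can also be built from a nested table. The function below is a hypothetical sketch, not code from the book; it relies only on the add_command, add_cascade, and add_separator methods used in this script, and it takes the menu-making call as a parameter (new_menu) so that the caller decides on options such as tearoff:

```python
def fill_menu(menu, entries, new_menu):
    """
    Add entries to menu. Each entry is (label, action): a None label adds
    a separator, a list action adds a nested cascade, and anything else
    is registered as the command callback for a simple selection.
    """
    for label, action in entries:
        if label is None:
            menu.add_separator()
        elif isinstance(action, list):
            child = new_menu(menu)                     # child of this menu
            fill_menu(child, action, new_menu)         # recur for nesting
            menu.add_cascade(label=label, menu=child)  # link parent to child
        else:
            menu.add_command(label=label, command=action)
```

With tkinter, this might be called as fill_menu(top, spec, lambda parent: Menu(parent, tearoff=False)) after creating top = Menu(win) and running win.config(menu=top), where spec is a list such as [('File', [('Quit', win.quit)])].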

Apart from building up the menu object tree, this script also demonstrates some of the

most common menu configuration options:

Separator lines

The script makes a separator in the Edit menu with add_separator; it’s just a line

used to set off groups of related entries.

Tear-offs

The script also disables menu tear-offs in the Edit pull down by passing a

tearoff=False widget option to Menu. Tear-offs are dashed lines that appear by

default at the top of tkinter menus and create a new window containing the menu’s

contents when clicked. They can be a convenient shortcut device (you can click

items in the tear-off window right away, without having to navigate through menu

trees), but they are not widely used on all platforms.

Keyboard shortcuts

The script uses the underline option to make a unique letter in a menu entry a

keyboard shortcut. It gives the offset of the shortcut letter in the entry’s label string.


On Windows, for example, the Quit option in this script’s File menu can be se-

lected with the mouse but also by pressing Alt, then “f,” and then “q.” You don’t

strictly have to use underline—on Windows, the first letter of a pull-down name

is a shortcut automatically, and arrow and Enter keys can be used to select pull-

down items. But explicit keys can enhance usability in large menus; for instance,

the key sequence Alt-E-S-S runs the quit action in this script’s nested submenu.
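Since underline takes a character offset rather than a letter, a small helper can translate one into the other. This is a hypothetical convenience function, not part of tkinter or of the book’s examples:

```python
def underline_offset(label, letter):
    """
    Return the offset to pass as the underline option so that letter
    (matched case-insensitively) becomes the label's shortcut key.
    """
    offset = label.lower().find(letter.lower())
    if offset == -1:
        raise ValueError('%r does not occur in %r' % (letter, label))
    return offset

print(underline_offset('Quit', 'q'))    # 0
print(underline_offset('Exit', 'x'))    # 1
```

The result would be passed along with the label, as in add_command(label='Exit', underline=1, ...).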

Let’s see what this translates to in the realm of the pixel. Figure 9-1 shows the window

that first appears when this script is run on Windows 7 with my system settings; it looks

different, but similar, on Unix, Macintosh, and other Windows configurations.

Figure 9-1. menu_win: a top-level window menu bar

Figure 9-2 shows the scene when the File pull down is selected. Notice that Menu widgets

are linked, not packed (or gridded)—the geometry manager doesn’t really come into

play here. If you run this script, you’ll also notice that all of its menu entries either quit

the program immediately or pop up a “Not Implemented” standard error dialog. This

example is about menus, after all, but menu selection callback handlers generally do

more useful work in practice.

Figure 9-2. The File menu pull down


And finally, Figure 9-3 shows what happens after clicking the File menu’s tear-off line

and selecting the cascading submenu in the Edit pull down. Cascades can be nested as

deep as you like (though your users probably won’t be happy if this gets silly).

In tkinter, every top-level window can have a menu bar, including pop ups you create

with the Toplevel widget. Example 9-2 makes three pop-up windows with the same

menu bar as the one we just met; when run, it constructs the scene in Figure 9-4.

Figure 9-3. A File tear-off and Edit cascade

Figure 9-4. Multiple Toplevels with menus

Example 9-2. PP4E\Gui\Tour\menu_win-multi.py

from menu_win import makemenu       # reuse menu maker function
from tkinter import *

root = Tk()
for i in range(3):                  # three pop-up windows with menus
    win = Toplevel(root)
    makemenu(win)
    Label(win, bg='black', height=5, width=25).pack(expand=YES, fill=BOTH)
Button(root, text="Bye", command=root.quit).pack()
root.mainloop()

Frame- and Menubutton-Based Menus

Although these are less commonly used for top-level windows, it’s also possible to

create a menu bar as a horizontal Frame. Before I show you how, though, let me explain

why you should care. Because this frame-based scheme doesn’t depend on top-level

window protocols, it can also be used to add menus as nested components of larger

displays. In other words, it’s not just for top-level windows. For example, Chap-

ter 11’s PyEdit text editor can be used both as a program and as an attachable compo-

nent. We’ll use window menus to implement PyEdit selections when PyEdit is run as

a standalone program, but we’ll use frame-based menus when PyEdit is embedded in

the PyMailGUI and PyView displays. Both schemes are worth knowing.

Frame-based menus require a few more lines of code, but they aren’t much more com-

plex than window menus. To make one, simply pack Menubutton widgets within a

Frame container, associate Menu widgets with the Menubuttons, and associate the Frame

with the top of a container window. Example 9-3 creates the same menu as Exam-

ple 9-2, but using the frame-based approach.

Example 9-3. PP4E\Gui\Tour\menu_frm.py

# Frame-based menus: for top-levels and components

from tkinter import *                              # get widget classes
from tkinter.messagebox import *                   # get standard dialogs

def notdone():
    showerror('Not implemented', 'Not yet available')

def makemenu(parent):
    menubar = Frame(parent)                        # relief=RAISED, bd=2...
    menubar.pack(side=TOP, fill=X)

    fbutton = Menubutton(menubar, text='File', underline=0)
    fbutton.pack(side=LEFT)
    file = Menu(fbutton)
    file.add_command(label='New...', command=notdone, underline=0)
    file.add_command(label='Open...', command=notdone, underline=0)
    file.add_command(label='Quit', command=parent.quit, underline=0)
    fbutton.config(menu=file)

    ebutton = Menubutton(menubar, text='Edit', underline=0)
    ebutton.pack(side=LEFT)
    edit = Menu(ebutton, tearoff=False)
    edit.add_command(label='Cut', command=notdone, underline=0)
    edit.add_command(label='Paste', command=notdone, underline=0)
    edit.add_separator()
    ebutton.config(menu=edit)

    submenu = Menu(edit, tearoff=True)
    submenu.add_command(label='Spam', command=parent.quit, underline=0)
    submenu.add_command(label='Eggs', command=notdone, underline=0)
    edit.add_cascade(label='Stuff', menu=submenu, underline=0)
    return menubar

if __name__ == '__main__':
    root = Tk()                                    # or Toplevel or Frame
    root.title('menu_frm')                         # set window-mgr info
    makemenu(root)                                 # associate a menu bar
    msg = Label(root, text='Frame menu basics')    # add something below
    msg.pack(expand=YES, fill=BOTH)
    msg.config(relief=SUNKEN, width=40, height=7, bg='beige')
    root.mainloop()

Again, let’s isolate the linkage logic here to avoid getting distracted by other details.

For the File menu case, here is what this boils down to:

menubar = Frame(parent) # make a Frame for the menubar

fbutton = Menubutton(menubar, text='File') # attach a Menubutton to Frame

file = Menu(fbutton) # attach a Menu to Menubutton

fbutton.config(menu=file) # crosslink button to menu

There is an extra Menubutton widget in this scheme, but it’s not much more complex

than making top-level window menus. Figures 9-5 and 9-6 show this script in action

on Windows.

Figure 9-5. menu_frm: Frame and Menubutton menu bar

The menu widgets in this script provide a default set of event bindings that automati-

cally pop up menus when selected with a mouse. This doesn’t look or behave exactly

like the top-level window menu scheme shown earlier, but it is close, can be configured

in any way that frames can (i.e., with colors and borders), and will look similar on every

platform (though this may or may not be a feature in all contexts).

The biggest advantage of frame-based menu bars, though, is that they can also be at-

tached as nested components in larger displays. Example 9-4 and its resulting interface

(Figure 9-7) show how—both menu bars are completely functional in the same single

window.


Example 9-4. PP4E\Gui\Tour\menu_frm-multi.py

from menu_frm import makemenu         # can't use menu_win here--one window
from tkinter import *                 # but can attach frame menus to windows

root = Tk()
for i in range(2):                    # 2 menus nested in one window
    mnu = makemenu(root)
    mnu.config(bd=2, relief=RAISED)
    Label(root, bg='black', height=5, width=25).pack(expand=YES, fill=BOTH)
Button(root, text="Bye", command=root.quit).pack()
root.mainloop()

Figure 9-6. With the Edit menu selected

Figure 9-7. Multiple Frame menus on one window

Because they are not tied to the enclosing window, frame-based menus can also be used
as part of another attachable component’s widget package. For example, the
menu-embedding behavior in Example 9-5 works even if the menu’s parent is another
Frame container and not the top-level window; this script is similar to the prior one,
but it creates three fully functional menu bars attached to frames nested in a window.

Example 9-5. PP4E\Gui\Tour\menu_frm-multi2.py

from menu_frm import makemenu         # can't use menu_win here--root=Frame
from tkinter import *

root = Tk()
for i in range(3):                    # three menus nested in the containers
    frm = Frame()
    mnu = makemenu(frm)
    mnu.config(bd=2, relief=RAISED)
    frm.pack(expand=YES, fill=BOTH)
    Label(frm, bg='black', height=5, width=25).pack(expand=YES, fill=BOTH)
Button(root, text="Bye", command=root.quit).pack()
root.mainloop()

Using Menubuttons and Optionmenus

In fact, menus based on Menubutton are even more general than Example 9-3 implies—

they can actually show up anywhere on a display that normal buttons can, not just

within a menu bar Frame. Example 9-6 makes a Menubutton pull-down list that simply

shows up by itself, attached to the root window; Figure 9-8 shows the GUI it produces.

Example 9-6. PP4E\Gui\Tour\mbutton.py

from tkinter import *

root = Tk()

mbutton = Menubutton(root, text='Food') # the pull-down stands alone

picks = Menu(mbutton)

mbutton.config(menu=picks)

picks.add_command(label='spam', command=root.quit)

picks.add_command(label='eggs', command=root.quit)

picks.add_command(label='bacon', command=root.quit)

mbutton.pack()

mbutton.config(bg='white', bd=4, relief=RAISED)

root.mainloop()

The related tkinter OptionMenu widget displays an item selected from a pull-down menu.
It’s roughly like a Menubutton plus a display label, and it displays a menu of choices
when clicked. Unlike a Menubutton, though, you normally link a tkinter variable
(described in Chapter 8) to fetch the choice after the fact instead of registering
callbacks, and you pass the menu entries as arguments in the widget constructor call,
after the variable.

Example 9-7 illustrates typical OptionMenu usage and builds the interface captured in
Figure 9-9. Clicking on either of the first two buttons opens a pull-down menu of
options; clicking on the third “state” button fetches and prints the current values
displayed in the first two.

Example 9-7. PP4E\Gui\Tour\optionmenu.py

from tkinter import *

root = Tk()

var1 = StringVar()

var2 = StringVar()

opt1 = OptionMenu(root, var1, 'spam', 'eggs', 'toast') # like Menubutton

opt2 = OptionMenu(root, var2, 'ham', 'bacon', 'sausage') # but shows choice

opt1.pack(fill=X)

opt2.pack(fill=X)

var1.set('spam')

var2.set('ham')

def state(): print(var1.get(), var2.get()) # linked variables

Button(root, command=state, text='state').pack()

root.mainloop()

Figure 9-8. A Menubutton all by itself

Figure 9-9. An Optionmenu at work

There are other menu-related topics that we’ll skip here in the interest of space. For
instance, scripts can add entries to system menus and can generate pop-up menus
(posted in response to events, without an associated button). Refer to Tk and tkinter
resources for more details on this front.

In addition to simple selections and cascades, menus can also contain disabled entries,

check button and radio button selections, and bitmap and photo images. The next

section demonstrates how some of these special menu entries are programmed.

Windows with Both Menus and Toolbars

Besides showing a menu at the top, it is common for windows to display a row of

buttons at the bottom. This bottom button row is usually called a toolbar, and it often

contains shortcuts to items also available in the menus at the top. It’s easy to add a

toolbar to windows in tkinter—simply pack buttons (and other kinds of widgets) into

a frame, pack the frame on the bottom of the window, and set it to expand horizontally

only. This is really just hierarchical GUI layout at work again, but make sure to pack

toolbars (and frame-based menu bars) early so that other widgets in the middle of the

display are clipped first when the window shrinks; you usually want your tool and menu

bars to outlive other widgets.

Example 9-8 shows one way to go about adding a toolbar to a window. It also
demonstrates how to add photo images in menu entries (set the image attribute to a
PhotoImage object) and how to disable entries and give them a grayed-out appearance
(call the menu entryconfig method with the index of the item to disable; because the
tear-off line that menus have by default occupies index 0, the first regular entry is at
index 1). Notice that PhotoImage objects are saved as a list; remember, unlike other
widgets, these go away if you don’t hold on to them (see Chapter 8 if you need a
refresher).
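Numeric menu indexes are easy to get wrong because a menu’s tear-off line, when enabled, takes up index 0 and separators count as entries too. The helper below is a hypothetical sketch of that arithmetic, using the Edit pull down of the upcoming example as its data:

```python
def entry_index(position, tearoff=True):
    """
    Map a zero-based position among a menu's regular entries (separators
    count as entries) to the numeric index used by entryconfig.
    """
    return position + 1 if tearoff else position

# Entries in the order they are added to the Edit pull down.
edit_entries = ['Paste', 'Spam', '<separator>', 'Delete']

print(entry_index(edit_entries.index('Delete')))                 # 4
print(entry_index(edit_entries.index('Delete'), tearoff=False))  # 3
```

This is why the script passes 4 to entryconfig to gray out Delete: its Edit menu is created without tearoff=False, so the tear-off line is present.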

Example 9-8. PP4E\Gui\Tour\menuDemo.py

#!/usr/local/bin/python
"""
Tk8.0 style main window menus
menu/tool bars packed before middle, fill=X (pack first=clip last);
adds photo menu entries; see also: add_checkbutton, add_radiobutton
"""

from tkinter import *                               # get widget classes
from tkinter.messagebox import *                    # get standard dialogs

class NewMenuDemo(Frame):                           # an extended frame
    def __init__(self, parent=None):                # attach to top-level?
        Frame.__init__(self, parent)                # do superclass init
        self.pack(expand=YES, fill=BOTH)
        self.createWidgets()                        # attach frames/widgets
        self.master.title("Toolbars and Menus")     # set window-manager info
        self.master.iconname("tkpython")            # label when iconified

    def createWidgets(self):
        self.makeMenuBar()
        self.makeToolBar()
        L = Label(self, text='Menu and Toolbar Demo')
        L.config(relief=SUNKEN, width=40, height=10, bg='white')
        L.pack(expand=YES, fill=BOTH)

    def makeToolBar(self):
        toolbar = Frame(self, cursor='hand2', relief=SUNKEN, bd=2)
        toolbar.pack(side=BOTTOM, fill=X)
        Button(toolbar, text='Quit', command=self.quit).pack(side=RIGHT)
        Button(toolbar, text='Hello', command=self.greeting).pack(side=LEFT)

    def makeMenuBar(self):
        self.menubar = Menu(self.master)
        self.master.config(menu=self.menubar)       # master=top-level window
        self.fileMenu()
        self.editMenu()
        self.imageMenu()

    def fileMenu(self):
        pulldown = Menu(self.menubar)
        pulldown.add_command(label='Open...', command=self.notdone)
        pulldown.add_command(label='Quit', command=self.quit)
        self.menubar.add_cascade(label='File', underline=0, menu=pulldown)

    def editMenu(self):
        pulldown = Menu(self.menubar)
        pulldown.add_command(label='Paste', command=self.notdone)
        pulldown.add_command(label='Spam', command=self.greeting)
        pulldown.add_separator()
        pulldown.add_command(label='Delete', command=self.greeting)
        pulldown.entryconfig(4, state=DISABLED)
        self.menubar.add_cascade(label='Edit', underline=0, menu=pulldown)

    def imageMenu(self):
        photoFiles = ('ora-lp4e.gif', 'pythonPowered.gif', 'python_conf_ora.gif')
        pulldown = Menu(self.menubar)
        self.photoObjs = []
        for file in photoFiles:
            img = PhotoImage(file='../gifs/' + file)
            pulldown.add_command(image=img, command=self.notdone)
            self.photoObjs.append(img)              # keep a reference
        self.menubar.add_cascade(label='Image', underline=0, menu=pulldown)

    def greeting(self):
        showinfo('greeting', 'Greetings')

    def notdone(self):
        showerror('Not implemented', 'Not yet available')

    def quit(self):
        if askyesno('Verify quit', 'Are you sure you want to quit?'):
            Frame.quit(self)

if __name__ == '__main__': NewMenuDemo().mainloop()  # if I'm run as a script

When run, this script generates the scene in Figure 9-10 at first. Figure 9-11 shows this

window after being stretched a bit, with its Image menu torn off and its Edit menu

selected. The toolbar at the bottom grows horizontally with the window but not

vertically. For emphasis, this script also sets the cursor to change to a hand when moved

over the toolbar at the bottom. Run this on your own to get a better feel for its behavior.


Figure 9-10. menuDemo: menus and toolbars

Figure 9-11. Images and tear-offs on the job


Using images in toolbars, too

As shown in Figure 9-11, it’s easy to use images for menu items. Although not used in

Example 9-8, toolbar items can be pictures too, just like the Image menu’s items—

simply associate small images with toolbar frame buttons, just as we did in the image

button examples we wrote in the last part of Chapter 8. If you create toolbar images

manually ahead of time, it’s simple to associate them with buttons as we’ve learned. In

fact, it’s not much more work to build them dynamically—the PIL-based thumbnail

image construction skills we developed in the prior chapter might come in handy in

this context as well.

To illustrate, make sure you’ve installed the PIL extension, and replace the toolbar

construction method of Example 9-8 with the following (I’ve done this in file

menuDemo2.py in the examples distribution so you can run and experiment on your own):

    # resize toolbar images on the fly with PIL
    def makeToolBar(self, size=(40, 40)):
        from PIL.ImageTk import PhotoImage, Image       # if jpegs or make new thumbs
        imgdir = r'../PIL/images/'
        toolbar = Frame(self, cursor='hand2', relief=SUNKEN, bd=2)
        toolbar.pack(side=BOTTOM, fill=X)
        photos = 'ora-lp4e-big.jpg', 'PythonPoweredAnim.gif', 'python_conf_ora.gif'
        self.toolPhotoObjs = []
        for file in photos:
            imgobj = Image.open(imgdir + file)          # make new thumb
            imgobj.thumbnail(size, Image.ANTIALIAS)     # best downsize filter
                                                        # (Pillow 10+ dropped ANTIALIAS:
                                                        # use Image.LANCZOS there)
            img = PhotoImage(imgobj)
            btn = Button(toolbar, image=img, command=self.greeting)
            btn.config(relief=RAISED, bd=2)
            btn.config(width=size[0], height=size[1])
            btn.pack(side=LEFT)
            self.toolPhotoObjs.append((img, imgobj))    # keep a reference
        Button(toolbar, text='Quit', command=self.quit).pack(side=RIGHT, fill=Y)

When run, this alternative creates the window captured in Figure 9-12: the three

image options available in the Image menu at the top of the window are now also buttons

in the toolbar at the bottom, along with a simple text button for quitting on the right.

As before, the cursor becomes a hand over the toolbar.

You don’t need PIL at all if you’re willing to use GIF or other supported bitmap images

that you create by hand: simply load them by filename using the standard tkinter

photo object, as shown by the following alternative coding for the toolbar construction

method (this is file menuDemo3.py in the examples distribution, if you’re keeping

score):

    # use unresized gifs with standard tkinter
    def makeToolBar(self, size=(30, 30)):
        imgdir = r'../gifs/'
        toolbar = Frame(self, cursor='hand2', relief=SUNKEN, bd=2)
        toolbar.pack(side=BOTTOM, fill=X)
        photos = 'ora-lp4e.gif', 'pythonPowered.gif', 'python_conf_ora.gif'
        self.toolPhotoObjs = []
        for file in photos:
            img = PhotoImage(file=imgdir + file)
            btn = Button(toolbar, image=img, command=self.greeting)
            btn.config(bd=5, relief=RIDGE)
            btn.config(width=size[0], height=size[1])
            btn.pack(side=LEFT)
            self.toolPhotoObjs.append(img)              # keep a reference
        Button(toolbar, text='Quit', command=self.quit).pack(side=RIGHT, fill=Y)

Figure 9-12. menuDemo2: images in the toolbar with PIL

When run, this alternative uses GIF images and renders the window captured in

Figure 9-13. Depending on your users’ preferences, you might want to resize the GIF

images used here with other tools; as is, we get only part of each unresized photo in

the fixed-width buttons, which may or may not be enough.
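If PIL is not an option, one possibility is the standard tkinter photo object's own subsample method, which shrinks an image by keeping only every Nth pixel. The following is a minimal sketch rather than part of menuDemo3.py: the shrink_factor helper and the commented usage are illustrative assumptions, and subsample only shrinks by whole-number factors, so results are cruder than PIL thumbnails.

```python
import math

def shrink_factor(width, height, size=(30, 30)):
    """Smallest whole-number N for which width/N and height/N both fit in size."""
    across = math.ceil(width / size[0])     # factor needed horizontally
    down = math.ceil(height / size[1])      # factor needed vertically
    return max(1, across, down)             # never enlarge: factor is at least 1

# Hypothetical use inside makeToolBar, after loading each GIF
# (shown as comments because it needs a display and the image files):
#   img = PhotoImage(file=imgdir + file)
#   n = shrink_factor(img.width(), img.height(), size)
#   img = img.subsample(n, n)               # keep every nth pixel in x and y
```

Because the factor is rounded up, the shrunken image never exceeds the button size, though it may come out somewhat smaller than the space available.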

As is, this is something of a first cut solution to toolbar image buttons. There are many

ways to configure such image buttons. Since we’re going to see PIL in action again later

in this chapter when we explore canvases, though, we’ll leave further extensions in the

suggested exercise column.

Automating menu construction

Menus are a powerful tkinter interface device. If you’re like me, though, the examples

in this section probably seem like a lot of work. Menu construction can be both code

intensive and error prone if done by calling tkinter methods directly. A better approach

might automatically build and link up menus from a higher-level description of their

contents. In fact, in Chapter 10, we’ll meet a tool called GuiMixin that automates the

menu construction process, given a data structure that contains all menus desired. As


an added bonus, it supports both window and frame-style menus, so it can be used by

both standalone programs and nested components. Although it’s important to know

the underlying calls used to make menus, you don’t necessarily have to remember them

for long.
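To make the idea of building menus from a description concrete, here is a rough sketch of what such automation might look like. It is not the Chapter 10 tool, just an illustration under simple assumptions: build_menus, its spec layout, and the make_menu argument are all names invented for this example, and only plain commands and separators are handled.

```python
def build_menus(menubar, spec, make_menu):
    """Fill menubar from spec, a list of (menu label, items) pairs.

    Each item is either None (a separator) or a (label, handler) pair.
    make_menu(menubar) must return a new pulldown; with tkinter this
    would simply be the Menu class itself.
    """
    for menu_label, items in spec:
        pulldown = make_menu(menubar)
        for item in items:
            if item is None:
                pulldown.add_separator()
            else:
                label, handler = item
                pulldown.add_command(label=label, command=handler)
        menubar.add_cascade(label=menu_label, underline=0, menu=pulldown)
    return menubar

# Hypothetical use in Example 9-8's makeMenuBar (needs a display to run):
#   spec = [('File', [('Open...', self.notdone), ('Quit', self.quit)]),
#           ('Edit', [('Paste', self.notdone), None, ('Delete', self.greeting)])]
#   build_menus(self.menubar, spec, Menu)
```

Passing the menu class in as an argument is a design choice made here so that the logic can be exercised without a window; a real tool would more likely call Menu directly and support cascades, images, and disabled entries as well.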

Listboxes and Scrollbars

Let’s rejoin our widget tour. Listbox widgets allow you to display a list of items for

selection, and Scrollbars are designed for navigating through the contents of other

widgets. Because it is common to use these widgets together, we’ll study them both at

once. Example 9-9 builds both a Listbox and a Scrollbar, as a packaged set.

Figure 9-13. menuDemo3: unresized GIF images in the toolbar

Example 9-9. PP4E\Gui\Tour\scrolledlist.py

"a simple customizable scrolled listbox component"

from tkinter import *

class ScrolledList(Frame):
    def __init__(self, options, parent=None):
        Frame.__init__(self, parent)
        self.pack(expand=YES, fill=BOTH)                   # make me expandable
        self.makeWidgets(options)

    def handleList(self, event):
        index = self.listbox.curselection()                # on list double-click
        label = self.listbox.get(index)                    # fetch selection text
        self.runCommand(label)                             # and call action here
                                                           # or get(ACTIVE)
    def makeWidgets(self, options):
        sbar = Scrollbar(self)
        list = Listbox(self, relief=SUNKEN)
        sbar.config(command=list.yview)                    # xlink sbar and list
        list.config(yscrollcommand=sbar.set)               # move one moves other
        sbar.pack(side=RIGHT, fill=Y)                      # pack first=clip last
        list.pack(side=LEFT, expand=YES, fill=BOTH)        # list clipped first
        pos = 0
        for label in options:                              # add to listbox
            list.insert(pos, label)                        # or insert(END,label)
            pos += 1                                       # or enumerate(options)
        #list.config(selectmode=SINGLE, setgrid=1)         # select,resize modes
        list.bind('<Double-1>', self.handleList)           # set event handler
        self.listbox = list

    def runCommand(self, selection):                       # redefine me lower
        print('You selected:', selection)

if __name__ == '__main__':
    options = (('Lumberjack-%s' % x) for x in range(20))   # or map/lambda, [...]
    ScrolledList(options).mainloop()

This module can be run standalone to experiment with these widgets, but it is also

designed to be useful as a library object. By passing in different selection lists to the

options argument and redefining the runCommand method in a subclass, the ScrolledList

component class defined here can be reused anytime you need to display a scrollable

list. In fact, we’ll be reusing it this way in Chapter 11’s PyEdit program. With just

a little forethought, it’s easy to extend the tkinter library with Python classes this way.

When run standalone, this script generates the window in Figure 9-14, shown here

with Windows 7 look-and-feel. It’s a Frame, with a Listbox on its left containing 20

generated entries (the fifth has been clicked), along with an associated Scrollbar on its

right for moving through the list. If you move the scroll, the list moves, and vice versa.

Figure 9-14. scrolledlist at the top


Programming Listboxes

Listboxes are straightforward to use, but they are populated and processed in somewhat

unique ways compared to the widgets we’ve seen so far. Many listbox calls accept a

passed-in index to refer to an entry in the list. Indexes start at integer 0 and grow higher,

but tkinter also accepts special name strings in place of integer offsets: end to refer to

the end of the list, active to denote the line selected, and more. This generally yields

more than one way to code listbox calls.

For instance, this script adds items to the listbox in this window by calling its insert

method, with successive offsets (starting at zero—something the enumerate built-in

could automate for us):

list.insert(pos, label)

pos += 1

But you can also fill a list by simply adding items at the end without keeping a position

counter at all, with either of these statements:

list.insert('end', label) # add at end: no need to count positions

list.insert(END, label) # END is preset to 'end' inside tkinter
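For comparison, the two filling styles can be written as small helper functions. This is only a sketch: fill_by_position and fill_at_end are names made up here, and each expects any object with the listbox's insert method.

```python
def fill_by_position(listbox, labels):
    """Insert each label at an explicit offset; enumerate does the counting."""
    for pos, label in enumerate(labels):        # pos runs 0, 1, 2, ...
        listbox.insert(pos, label)

def fill_at_end(listbox, labels):
    """Append each label at the end; no position counter is needed."""
    for label in labels:
        listbox.insert('end', label)            # 'end' is the value of END
```

Both leave the list in the same order; the second form is usually the simpler choice when items are only ever appended.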

The listbox widget doesn’t have anything like the command option we use to register

callback handlers for button presses, so you either need to fetch listbox selections while

processing other widgets’ events (e.g., a button press elsewhere in the GUI) or tap into

other event protocols to process user selections. To fetch a selected value, this script

binds the <Double-1> left mouse button double-click event to a callback handler method

with bind (seen earlier on this tour).

In the double-click handler, this script grabs the selected item out of the listbox with

this pair of listbox method calls:

index = self.listbox.curselection() # get selection index

label = self.listbox.get(index) # fetch text by its index

Here, too, you can code this differently. Either of the following lines has the same effect;

they get the contents of the line at index 'active'—the one selected:

label = self.listbox.get('active') # fetch from active index

label = self.listbox.get(ACTIVE) # ACTIVE='active' in tkinter

For illustration purposes, the class’s default runCommand method prints the value

selected each time you double-click an entry in the list; as fetched by this script, it comes

back as a string reflecting the text in the selected entry:

C:\...\PP4E\Gui\Tour> python scrolledlist.py

You selected: Lumberjack-2

You selected: Lumberjack-19

You selected: Lumberjack-4

You selected: Lumberjack-12

Listboxes can also be useful input devices even without attached scroll bars; they accept

color, font, and relief configuration options. They also support both single and multiple


selection modes. The default mode allows only a single item to be selected, but the

selectmode argument supports four settings: SINGLE, BROWSE, MULTIPLE, and EXTENDED

(the default is BROWSE). Of these, the first two are single selection modes, and the last

two allow multiple items to be selected.

These modes vary in subtle ways. For instance, BROWSE is like SINGLE, but it also allows

the selection to be dragged. Clicking an item in MULTIPLE mode toggles its state without

affecting other selected items. And the EXTENDED mode allows for multiple selections

and works like the Windows file explorer GUI—you select one item with a simple click,

multiple items with a Ctrl-click combination, and ranges of items with Shift-clicks.

Multiple selections can be programmed with code of this sort:

listbox = Listbox(window, bg='white', font=('courier', fontsz))

listbox.config(selectmode=EXTENDED)

listbox.bind('<Double-1>', (lambda event: onDoubleClick()))

# onDoubleClick: get messages selected in listbox

selections = listbox.curselection() # tuple of digit strs, 0..N-1

selections = [int(x)+1 for x in selections] # convert to ints, make 1..N

When multiple selections are enabled, the curselection method returns a tuple of digit

strings giving the relative numbers of the items selected, or an empty tuple if none is

selected (recent Python releases return integers rather than digit strings, which the int

call above handles as well). Really, this method always returns a tuple, even in

single selection mode (we don’t care in Example 9-9, because the get method does the

right thing for a one-item tuple when fetching a value out of the listbox).

You can experiment with the selection alternatives on your own by uncommenting the

selectmode setting in Example 9-9 and changing its value. You may get an error on

double-clicks in multiple selection modes, though, because the get method will be

passed a tuple of more than one selection index (print it out to see for yourself). We’ll

see multiple selections in action in the PyMailGUI example later in this book

(Chapter 14), so I’ll pass on further examples here.
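One way to sidestep the error in multiple selection modes is to fetch each selected entry separately instead of handing the whole tuple to get. The helper below is a sketch along those lines; selected_labels is a name invented here, and it assumes only the curselection and get methods described above.

```python
def selected_labels(listbox):
    """Return the text of every selected entry, in list order."""
    # curselection() yields a tuple of indexes (digit strings or integers,
    # depending on the Python release); get() accepts either, one at a time
    return [listbox.get(index) for index in listbox.curselection()]

# Hypothetical rewrite of Example 9-9's double-click handler:
#   def handleList(self, event):
#       for label in selected_labels(self.listbox):
#           self.runCommand(label)
```

With this version an empty selection simply produces an empty list, so the handler does nothing instead of raising an exception.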

Programming Scroll Bars

Perhaps the deepest magic in the Example 9-9 script, though, boils down to two lines

of code:

sbar.config(command=list.yview) # call list.yview when I move

list.config(yscrollcommand=sbar.set) # call sbar.set when I move

The scroll bar and listbox are effectively cross-linked to each other through these

configuration options; their values simply refer to bound widget methods of the other. By

linking like this, tkinter automatically keeps the two widgets in sync with each other

as they move. Here’s how this works:

• Moving a scroll bar invokes the callback handler registered with its command option.

Here, list.yview refers to a built-in listbox method that adjusts the listbox display

proportionally, based on arguments passed to the handler.

• Moving a listbox vertically invokes the callback handler registered with its

yscrollcommand option. In this script, the sbar.set built-in method adjusts a scroll bar

proportionally.

In other words, moving one automatically moves the other. It turns out that the

scrollable widgets in tkinter (Listbox, Text, and Canvas, plus Entry for horizontal

scrolling only) have built-in yview and xview methods to process incoming vertical and

horizontal scroll callbacks, as well as yscrollcommand and xscrollcommand options for

specifying an associated scroll bar’s callback handler to invoke. All scroll bars have a

command option, to name an associated

widget’s handler to be called on moves. Internally, tkinter passes information to all of

these methods, and that information specifies their new position (e.g., “go 10 percent

down from the top”), but your scripts usually need never deal with that level of detail.
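If you are curious about what travels between the two widgets, the exchange can be mimicked in plain Python. The class below is a simplified model written for this discussion, not tkinter's implementation: ListView is an invented name, it assumes a nonempty list, and it treats a page as exactly the number of visible rows. The request formats it handles, 'moveto' with a fraction and 'scroll' with a count and unit, are the ones a scroll bar passes to yview; the pair of fractions it reports is what a scroll bar's set method receives.

```python
class ListView:
    """Pure-Python stand-in for a widget showing `visible` of `total` rows."""

    def __init__(self, total, visible, yscrollcommand=None):
        self.total, self.visible = total, visible
        self.top = 0                                # index of first row shown
        self.yscrollcommand = yscrollcommand        # e.g. a scroll bar's set

    def yview(self, *args):
        """Handle ('moveto', fraction) or ('scroll', count, 'units'|'pages')."""
        if args[0] == 'moveto':
            top = int(float(args[1]) * self.total)  # fraction of whole list
        elif args[0] == 'scroll':
            step = self.visible if args[2] == 'pages' else 1
            top = self.top + int(args[1]) * step
        else:
            raise ValueError('unknown scroll request: %r' % (args,))
        self.top = max(0, min(top, self.total - self.visible))
        self.report()

    def report(self):
        """Tell the scroll bar which fraction of the content is in view."""
        if self.yscrollcommand:
            first = self.top / self.total
            last = (self.top + self.visible) / self.total
            self.yscrollcommand(first, last)
```

Dragging the scroll bar 10 percent down corresponds to a call like view.yview('moveto', '0.1'); the model moves its top row and then reports the new visible range back, which is how both sides stay in step.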

Because the scroll bar and listbox have been cross-linked in their option settings,

moving the scroll bar automatically moves the list, and moving the list automatically moves

the scroll bar. To move the scroll bar, either drag the solid part or click on its arrows

or empty areas. To move the list, click on the list and either use your arrow keys or

move the mouse pointer above or below the listbox without releasing the mouse button.

In all cases, the list and scroll bar move in unison. Figure 9-15 shows the scene after

expanding the window and moving down a few entries in the list, one way or another.

Figure 9-15. scrolledlist in the middle

Packing Scroll Bars

Finally, remember that widgets packed last are always clipped first when a window is

shrunk. Because of that, it’s important to pack scroll bars in a display as soon as possible

so that they are the last to go when the window becomes too small for everything. You

can generally make do with less than complete listbox text, but the scroll bar is crucial


for navigating through the list. As Figure 9-16 shows, shrinking this script’s window

cuts out part of the list but retains the scroll bar.

Figure 9-16. scrolledlist gets small

At the same time, you don’t generally want a scroll bar to expand with a window, so

be sure to pack it with just a fill=Y (or fill=X for a horizontal scroll) and not an

expand=YES. Expanding this example’s window in Figure 9-15, for instance, made the

listbox grow along with the window, but it kept the scroll bar attached to the right and

kept it the same size.

We’ll see both scroll bars and listboxes repeatedly in later examples in this and later

chapters (flip ahead to examples for PyEdit, PyMailGUI, PyForm, PyTree, and ShellGui

for more). And although the example script in this section captures the fundamentals,

I should point out that there is more to both scroll bars and listboxes than meets the

eye here.

For example, it’s just as easy to add horizontal scroll bars to scrollable widgets. They

are programmed almost exactly like the vertical one implemented here, but callback

handler names start with “x,” not “y” (e.g., xscrollcommand), and an

orient='horizontal' configuration option is set for the scroll bar object. To add both

vertical and horizontal scrolls and to cross-link their motions, you would use the

following sort of code:

window = Frame(self)

vscroll = Scrollbar(window)

hscroll = Scrollbar(window, orient='horizontal')

listbox = Listbox(window)

# move listbox when scroll moved

vscroll.config(command=listbox.yview, relief=SUNKEN)

hscroll.config(command=listbox.xview, relief=SUNKEN)

# move scroll when listbox moved

listbox.config(yscrollcommand=vscroll.set, relief=SUNKEN)

listbox.config(xscrollcommand=hscroll.set)
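The preceding fragment links the widgets but leaves out their layout. As one possible arrangement, the sketch below wraps the same cross-linking in a function and places the three widgets with grid so the two bars meet neatly at the corner. The function name make_scrolled_listbox and the choice of grid over pack are assumptions of this sketch, not the coding used in the examples cited next.

```python
def make_scrolled_listbox(parent, items=()):
    """Build a Frame holding a Listbox cross-linked to two scroll bars."""
    # imported here so that merely loading this sketch does not require Tk
    from tkinter import Frame, Listbox, Scrollbar

    window = Frame(parent)
    vscroll = Scrollbar(window, orient='vertical')
    hscroll = Scrollbar(window, orient='horizontal')
    listbox = Listbox(window)

    vscroll.config(command=listbox.yview)           # scroll bars drive the list
    hscroll.config(command=listbox.xview)
    listbox.config(yscrollcommand=vscroll.set,      # list reports back to bars
                   xscrollcommand=hscroll.set)

    listbox.grid(row=0, column=0, sticky='nsew')    # list takes the spare space
    vscroll.grid(row=0, column=1, sticky='ns')      # bars stretch on one axis only
    hscroll.grid(row=1, column=0, sticky='ew')
    window.rowconfigure(0, weight=1)
    window.columnconfigure(0, weight=1)

    for item in items:
        listbox.insert('end', item)
    return window, listbox
```

The caller is expected to place the returned frame itself, for example with window.pack(expand=YES, fill=BOTH).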

See the image viewer canvas later in this chapter, as well as the PyEdit, PyTree, and

PyMailGUI programs later in this book, for examples of horizontal scroll bars at work.

Scroll bars see more kinds of GUI action too—they can be associated with other kinds

of widgets in the tkinter library. For instance, it is common to attach one to the Text


widget. Not entirely by coincidence, this brings us to the next point of interest on our

widget tour.

Text

It’s been said that tkinter’s strongest points may be its Text and Canvas widgets. Both

provide a remarkable amount of functionality. For instance, the tkinter Text widget

was powerful enough to implement the web pages of Grail, an experimental web

browser coded in Python; Text supports complex font-style settings, embedded images,

unlimited undo and redo, and much more. The tkinter Canvas widget, a general-purpose

drawing device, allows for efficient free-form graphics and has been the basis of

sophisticated image-processing and visualization applications.

In Chapter 11, we’ll put these two widgets to use to implement text editors (PyEdit),

paint programs (PyDraw), clock GUIs (PyClock), and image programs (PyPhoto and

PyView). For the purposes of this tour chapter, though, let’s start out using these

widgets in simpler ways. Example 9-10 implements a simple scrolled-text display,

which knows how to fill its display with a text string or file.

Example 9-10. PP4E\Gui\Tour\scrolledtext.py

"a simple text or file viewer component"

print('PP4E scrolledtext')
import sys                                   # for sys.argv below: don't rely on
from tkinter import *                        # tkinter's * import to supply it

class ScrolledText(Frame):
    def __init__(self, parent=None, text='', file=None):
        Frame.__init__(self, parent)
        self.pack(expand=YES, fill=BOTH)                 # make me expandable
        self.makewidgets()
        self.settext(text, file)

    def makewidgets(self):
        sbar = Scrollbar(self)
        text = Text(self, relief=SUNKEN)
        sbar.config(command=text.yview)                  # xlink sbar and text
        text.config(yscrollcommand=sbar.set)             # move one moves other
        sbar.pack(side=RIGHT, fill=Y)                    # pack first=clip last
        text.pack(side=LEFT, expand=YES, fill=BOTH)      # text clipped first
        self.text = text

    def settext(self, text='', file=None):
        if file:
            text = open(file, 'r').read()
        self.text.delete('1.0', END)                     # delete current text
        self.text.insert('1.0', text)                    # add at line 1, col 0
        self.text.mark_set(INSERT, '1.0')                # set insert cursor
        self.text.focus()                                # save user a click

    def gettext(self):                                   # returns a string
        return self.text.get('1.0', END+'-1c')           # first through last

if __name__ == '__main__':
    root = Tk()
    if len(sys.argv) > 1:
        st = ScrolledText(file=sys.argv[1])              # filename on cmdline
    else:
        st = ScrolledText(text='Words\ngo here')         # or not: two lines
    def show(event):
        print(repr(st.gettext()))                        # show as raw string
    root.bind('<Key-Escape>', show)                      # esc = dump text
    root.mainloop()

Like the ScrolledList in Example 9-9, the ScrolledText object in this file is designed

to be a reusable component which we’ll also put to work in later examples, but it can

also be run standalone to display text file contents. Also like the last section, this script

is careful to pack the scroll bar first so that it is cut out of the display last as the window

shrinks and arranges for the embedded Text object to expand in both directions as the

window grows. When run with a filename argument, this script makes the window

shown in Figure 9-17; it embeds a Text widget on the left and a cross-linked

Scrollbar on the right.

Figure 9-17. scrolledtext in action

Just for fun, I populated the text file displayed in the window with the following code

and command lines (and not just because I used to live near an infamous hotel in

Colorado):

C:\...\PP4E\Gui\Tour> type makefile.py
f = open('jack.txt', 'w')
for i in range(250):
    f.write('%03d) All work and no play makes Jack a dull boy.\n' % i)
f.close()

C:\...\PP4E\Gui\Tour> python makefile.py

C:\...\PP4E\Gui\Tour> python scrolledtext.py jack.txt
PP4E scrolledtext

To view a file, pass its name on the command line—its text is automatically displayed

in the new window. By default, it is shown in a font that may vary per platform (and

might not be fixed-width on some), but we’ll pass a font option to the Text widget in

the next example to change that. Pressing the Escape key fetches and displays the full

text content of the widget as a single string (more on this in a moment).

Notice the PP4E scrolledtext message printed when this script runs. Because there is

also a scrolledtext.py file in the standard Python distribution (in module

tkinter.scrolledtext) with a very different implementation and interface, the one here

identifies itself when run or imported, so you can tell which one you’ve got. If the

standard library’s alternative ever goes away, import the class listed to get a simple text

browser, and adjust any text widget configuration calls to include a .text qualifier level

(e.g., x.text.config instead of x.config; the library version subclasses Text directly,

not Frame).

Programming the Text Widget

To understand how this script works at all, though, we have to detour into a few

Text widget details here. Earlier we met the Entry and Message widgets, which address

a subset of the Text widget’s uses. The Text widget is much richer in both features and

interfaces: it supports both input and display of multiple lines of text, editing

operations for both programs and interactive users, multiple fonts and colors, and much

more. Text objects are created, configured, and packed just like any other widget, but

they have properties all their own.

Text is a Python string

Although the Text widget is a powerful tool, its interface seems to boil down to two

core concepts. First, the content of a Text widget is represented as a string in Python

scripts, and multiple lines are separated with the normal \n line terminator. The string

'Words\ngo here', for instance, represents two lines when stored in or fetched from a

Text widget; it would normally have a trailing \n also, but it doesn’t have to.

To help illustrate this point, this script binds the Escape key press to fetch and print

the entire contents of the Text widget it embeds:

C:\...\PP4E\Gui\Tour> python scrolledtext.py

PP4E scrolledtext

'Words\ngo here'

'Always look\non the bright\nside of life\n'


When run with arguments, the script stores a file’s contents in the Text widget. When

run without arguments, the script stuffs a simple literal string into the widget, displayed

by the first Escape press output here (recall that \n is the escape sequence for the

line-end character). The second output here happens after editing the window’s text, when

pressing Escape in the shrunken window captured in Figure 9-18. By default, Text

widget text is fully editable using the usual edit operations for your platform.

Figure 9-18. scrolledtext gets a positive outlook

String positions

The second key to understanding Text code has to do with the ways you specify a

position in the text string. Like the listbox, Text widgets allow you to specify such a

position in a variety of ways. In Text, methods that expect a position to be passed in

will accept an index, a mark, or a tag reference. Moreover, some special operations are

invoked with predefined marks and tags—the insert cursor is mark INSERT, and the

current selection is tag SEL. Since they are fundamental to Text and the source of much

of its expressive power, let’s take a closer look at these settings.

Text indexes. Because it is a multiple-line widget, Text indexes identify both a line

and a column. For instance, consider the interfaces of the basic insert, delete, and

fetch text operations used by this script:

self.text.insert('1.0', text) # insert text at the start

self.text.delete('1.0', END) # delete all current text

return self.text.get('1.0', END+'-1c') # fetch first through last

In all of these, the first argument is an absolute index that refers to the start of the text

string: string '1.0' means row 1, column 0 (rows are numbered from 1 and columns

from 0, though '0.0' is accepted as a reference to the start of the text, too). An index

'2.1' refers to the second character in the second row.

As with the listbox, text indexes can also be symbolic names: the END in the preceding

delete call refers to the position just past the last character in the text string (it’s a

tkinter variable preset to string 'end'). Similarly, the symbolic index INSERT (really,

string 'insert') refers to the position immediately after the insert cursor, the place

where characters would appear if typed at the keyboard. Symbolic names such as

INSERT can also be called marks, described in a moment.

For added precision, you can add simple arithmetic extensions to index strings. The

index expression END+'-1c' in the get call in the previous example, for instance, is really

the string 'end-1c' and refers to one character back from END. Because END points to

just beyond the last character in the text string, this expression refers to the last

character itself. The -1c extension effectively strips the trailing \n that this widget adds to

its contents (and which may add a blank line if saved in a file).

Similar index string extensions let you name characters ahead (+1c), name lines ahead

and behind (+2l, -2l), and specify things such as line ends and word starts around an

index (lineend, wordstart). Indexes show up in most Text widget calls.
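Since the widget's content is just a Python string, it can help to see how a line.column index lines up with an ordinary string offset. The two functions below are a plain-Python illustration, not tkinter calls: their names are invented here, they handle only the basic 'line.column' form (no symbolic names or arithmetic extensions), and unlike the real widget they do no range checking.

```python
def index_to_offset(text, index):
    """Map a 'line.column' Text index to an offset in the Python string."""
    line, column = (int(part) for part in index.split('.'))
    lines = text.split('\n')
    # each earlier line contributes its characters plus one for its \n
    return sum(len(row) + 1 for row in lines[:line - 1]) + column

def offset_to_index(text, offset):
    """Map a string offset back to a 'line.column' Text index."""
    before = text[:offset]
    line = before.count('\n') + 1               # lines are numbered from 1
    column = offset - (before.rfind('\n') + 1)  # columns are numbered from 0
    return '%d.%d' % (line, column)
```

For the string 'Words\ngo here', index '2.1' maps to offset 7, the letter o in go, which matches the description of '2.1' as the second character in the second row.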

Text marks. Besides row/column identifier strings, you can also pass positions as names

of marks: symbolic names for a position between two characters. Unlike absolute

row/column positions, marks are virtual locations that move as new text is inserted or

deleted (by your script or your user). A mark always refers to its original location, even if

that location shifts to a different row and column over time.

To create a mark, call the Text object’s mark_set method with a string name and an

index to give its logical location. For instance, this script sets the insert cursor at the

start of the text initially, with a call like the first one here:

self.text.mark_set(INSERT, '1.0') # set insert cursor to start

self.text.mark_set('linetwo', '2.0') # mark current line 2

The name INSERT is a predefined special mark that identifies the insert cursor position;

setting it changes the insert cursor’s location. To make a mark of your own, simply

provide a unique name as in the second call here and use it anywhere you need to specify

a text position. The mark_unset call deletes marks by name.

Text tags. In addition to absolute indexes and symbolic mark names, the Text widget

supports the notion of tags: symbolic names associated with one or more substrings

within the Text widget’s string. Tags can be used for many things, but they also serve

to represent a position anywhere you need one: tagged items are named by their

beginning and ending indexes, which can be later passed to position-based calls.

For example, tkinter provides a built-in tag name, SEL (a tkinter name preassigned to

string 'sel'), which automatically refers to currently selected text. To fetch the text

selected (highlighted) with a mouse, run either of these calls:

text = self.text.get(SEL_FIRST, SEL_LAST) # use tags for from/to indexes

text = self.text.get('sel.first', 'sel.last') # strings and constants work

The names SEL_FIRST and SEL_LAST are just preassigned variables in the tkinter module

that refer to the strings used in the second line here. The text get method expects two

indexes; to fetch text named by a tag, add .first and .last to the tag’s name to get its

start and end indexes.

To tag a substring, call the Text widget’s tag_add method with a tag name string and

start and stop positions (text can also be tagged as added in insert calls). To remove

a tag from all characters in a range of text, call tag_remove:

self.text.tag_add('alltext', '1.0', END) # tag all text in the widget

self.text.tag_add(SEL, index1, index2) # select from index1 up to index2

self.text.tag_remove(SEL, '1.0', END) # remove selection from all text

The first line here creates a new tag that names all text in the widget—from start through

end positions. The second line adds a range of characters to the built-in SEL selection

tag—they are automatically highlighted, because this tag is predefined to configure its

members that way. The third line removes all characters in the text string from the

SEL tag (all selections are unselected). Note that the tag_remove call just untags text

within the named range; to really delete a tag completely, call tag_delete instead. Also

keep in mind that these calls apply to tags themselves; to delete actual text use the

delete method shown earlier.

You can map indexes to tags dynamically, too. For example, the text search method

returns the row.column index of the first occurrence of a string between start and stop

positions. To automatically select the text thus found, simply add its index to the

built-in SEL tag:

where = self.text.search(target, INSERT, END) # search from insert cursor

pastit = where + ('+%dc' % len(target)) # index beyond string found

self.text.tag_add(SEL, where, pastit) # tag and select found string

self.text.focus() # select text widget itself

If you want only one string to be selected, be sure to first run the tag_remove call listed

earlier—this code adds a selection in addition to any selections that already exist (it

may generate multiple selections in the display). In general, you can add any number

of substrings to a tag to process them as a group.
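Putting these pieces together, a find operation that selects exactly one match might be sketched as follows. This is an illustration rather than code from the examples: select_next is a name invented here, it uses the string forms of SEL, INSERT, and END described earlier, and it relies on search returning an empty string when nothing is found.

```python
def select_next(text, target):
    """Select the next occurrence of target after the insert cursor.

    text is any object with the Text widget's tag_remove, search, tag_add,
    mark_set, and see methods. Returns the index found, or '' if none.
    """
    text.tag_remove('sel', '1.0', 'end')            # drop earlier selections
    where = text.search(target, 'insert', 'end')    # '' when nothing is found
    if where:
        pastit = '%s+%dc' % (where, len(target))    # index just past the match
        text.tag_add('sel', where, pastit)          # highlight the match
        text.mark_set('insert', pastit)             # next search starts after it
        text.see(where)                             # scroll it into view
    return where
```

Moving the insert cursor past the match is a choice made in this sketch so that repeated calls step through successive occurrences; checking the result before tagging avoids passing an empty index to tag_add when the search fails.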

To summarize: indexes, marks, and tag locations can be used anytime you need a text

position. For instance, the text see method scrolls the display to make a position visible;

it accepts all three kinds of position specifiers:

self.text.see('1.0') # scroll display to top

self.text.see(INSERT) # scroll display to insert cursor mark

self.text.see(SEL_FIRST) # scroll display to selection tag

Text tags can also be used in broader ways for formatting and event bindings, but I’ll

defer those details until the end of this section.

Adding Text-Editing Operations

Example 9-11 puts some of these concepts to work. It extends Example 9-10 to add

support for four common text-editing operations—file save, text cut and paste, and

string find searching—by subclassing ScrolledText to provide additional buttons and

methods. The Text widget comes with a set of default keyboard bindings that perform

some common editing operations, too, but they might not be what is expected on every

platform; it’s more common and user friendly to provide GUI interfaces to editing

operations in a GUI text editor.

Example 9-11. PP4E\Gui\Tour\simpleedit.py

"""
add common edit tools to ScrolledText by inheritance;
composition (embedding) would work just as well here;
this is not robust!--see PyEdit for a feature superset;
"""

import sys                                             # for sys.argv in the self-test
from tkinter import *
from tkinter.simpledialog import askstring
from tkinter.filedialog import asksaveasfilename
from quitter import Quitter
from scrolledtext import ScrolledText                  # here, not Python's

class SimpleEditor(ScrolledText):                      # see PyEdit for more
    def __init__(self, parent=None, file=None):
        frm = Frame(parent)
        frm.pack(fill=X)
        Button(frm, text='Save', command=self.onSave).pack(side=LEFT)
        Button(frm, text='Cut', command=self.onCut).pack(side=LEFT)
        Button(frm, text='Paste', command=self.onPaste).pack(side=LEFT)
        Button(frm, text='Find', command=self.onFind).pack(side=LEFT)
        Quitter(frm).pack(side=LEFT)
        ScrolledText.__init__(self, parent, file=file)
        self.text.config(font=('courier', 9, 'normal'))

    def onSave(self):
        filename = asksaveasfilename()
        if filename:
            alltext = self.gettext()                       # first through last
            open(filename, 'w').write(alltext)             # store text in file

    def onCut(self):
        text = self.text.get(SEL_FIRST, SEL_LAST)          # error if no select
        self.text.delete(SEL_FIRST, SEL_LAST)              # should wrap in try
        self.clipboard_clear()
        self.clipboard_append(text)

    def onPaste(self):                                     # add clipboard text
        try:
            text = self.selection_get(selection='CLIPBOARD')
            self.text.insert(INSERT, text)
        except TclError:
            pass                                           # not to be pasted

    def onFind(self):
        target = askstring('SimpleEditor', 'Search String?')
        if target:
            where = self.text.search(target, INSERT, END)  # from insert cursor
            if where:                                      # returns an index
                print(where)
                pastit = where + ('+%dc' % len(target))    # index past target
               #self.text.tag_remove(SEL, '1.0', END)      # remove selection
                self.text.tag_add(SEL, where, pastit)      # select found target
                self.text.mark_set(INSERT, pastit)         # set insert mark
                self.text.see(INSERT)                      # scroll display
                self.text.focus()                          # select text widget

if __name__ == '__main__':
    if len(sys.argv) > 1:
        SimpleEditor(file=sys.argv[1]).mainloop()          # filename on command line
    else:
        SimpleEditor().mainloop()                          # or not: start empty

This, too, was written with one eye toward reuse—the SimpleEditor class it defines

could be attached or subclassed by other GUI code. As I’ll explain at the end of this

section, though, it’s not yet as robust as a general-purpose library tool should be. Still,

it implements a functional text editor in a small amount of portable code. When run

standalone, it brings up the window in Figure 9-19 (shown editing itself and running

on Windows); index positions are printed on stdout after each successful find

operation—here, for two “def” finds, with prior selection removal logic commented-

out in the script (uncomment this line in the script to get single-selection behavior for

finds):

C:\...\PP4E\Gui\Tour> python simpleedit.py simpleedit.py

PP4E scrolledtext

14.4

25.4

Figure 9-19. simpleedit in action

The save operation pops up the common save dialog that is available in tkinter and is

tailored to look native on each platform. Figure 9-20 shows this dialog in action on

Windows 7. Find operations also pop up a standard dialog box to input a search string

(Figure 9-21); in a full-blown editor, you might want to save this string away to repeat

the find again (we will, in Chapter 11’s more full-featured PyEdit example). Quit

operations reuse the verifying Quit button component we coded in Chapter 8 yet again;

code reuse means never having to say you’re quitting without warning…

Figure 9-20. Save pop-up dialog on Windows

Figure 9-21. Find pop-up dialog

Using the clipboard

Besides Text widget operations, Example 9-11 applies the tkinter clipboard interfaces

in its cut-and-paste functions. Together, these operations allow you to move text within

a file (cut in one place, paste in another). The clipboard they use is just a place to store

data temporarily—deleted text is placed on the clipboard on a cut, and text is inserted

from the clipboard on a paste. If we restrict our focus to this program alone, there really

is no reason that the text string cut couldn’t simply be stored in a Python instance

variable. But the clipboard is actually a much larger concept.

The clipboard used by this script is an interface to a system-wide storage space, shared

by all programs on your computer. Because of that, it can be used to transfer data

between applications, even ones that know nothing of tkinter. For instance, text cut or

copied in a Microsoft Word session can be pasted in a SimpleEditor window, and text

cut in SimpleEditor can be pasted in a Microsoft Notepad window (try it). By using the

clipboard for cut and paste, SimpleEditor automatically integrates with the window

system at large. Moreover, the clipboard is not just for the Text widget—it can also be

used to cut and paste graphical objects in the Canvas widget (discussed next).

As used in the script of Example 9-11, the basic tkinter clipboard interface looks like

this:

self.clipboard_clear() # clear the clipboard

self.clipboard_append(text) # store a text string on it

text = self.selection_get(selection='CLIPBOARD') # fetch contents, if any

All of these calls are available as methods inherited by all tkinter widget objects because

they are global in nature. The CLIPBOARD selection used by this script is available on all

platforms (a PRIMARY selection is also available, but it is only generally useful on X

Windows, so we’ll ignore it here). Notice that the clipboard selection_get call throws

a TclError exception if it fails; this script simply ignores it and abandons a paste request,

but we’ll do better later.

Composition versus inheritance

As coded, SimpleEditor uses inheritance to extend ScrolledText with extra buttons and

callback methods. As we’ve seen, it’s also reasonable to attach (embed) GUI objects

coded as components, such as ScrolledText. The attachment model is usually called

composition; some people find it simpler to understand and less prone to name clashes

than extension by inheritance.

To give you an idea of the differences between these two approaches, the following

sketches the sort of code you would write to attach ScrolledText to SimpleEditor with

changed lines in bold font (see the file simpleedit2.py in the book’s examples

distribution for a complete composition implementation). It’s mostly a matter of passing in the

right parents and adding an extra st attribute name anytime you need to get to the

Text widget’s methods:

class SimpleEditor(Frame):
    def __init__(self, parent=None, file=None):
        Frame.__init__(self, parent)
        self.pack()
        frm = Frame(self)
        frm.pack(fill=X)
        Button(frm, text='Save', command=self.onSave).pack(side=LEFT)
        ...more...
        Quitter(frm).pack(side=LEFT)
        self.st = ScrolledText(self, file=file)            # attach, not subclass
        self.st.text.config(font=('courier', 9, 'normal'))

    def onSave(self):
        filename = asksaveasfilename()
        if filename:
            alltext = self.st.gettext()                    # go through attribute
            open(filename, 'w').write(alltext)

    def onCut(self):
        text = self.st.text.get(SEL_FIRST, SEL_LAST)
        self.st.text.delete(SEL_FIRST, SEL_LAST)
        ...more...

This code doesn’t need to subclass Frame necessarily (it could add widgets to the passed-

in parent directly), but being a frame allows the full package here to be embedded and

configured as well. The window looks identical when such code is run. I’ll let you be

the judge of whether composition or inheritance is better here. If you code your Python

GUI classes right, they will work under either regime.

It’s called “Simple” for a reason: PyEdit (ahead)

Finally, before you change your system registry to make SimpleEditor your default text

file viewer, I should mention that although it shows the basics, it’s something of a

stripped-down version (really, a prototype) of the PyEdit example we’ll meet in

Chapter 11. In fact, you may wish to study that example now if you’re looking for more

complete tkinter text-processing code in general. There, we’ll also use more advanced

text operations, such as the undo/redo interface, case-insensitive searches, external files

search, and more. Because the Text widget is so powerful, it’s difficult to demonstrate

more of its features without the volume of code that is already listed in the PyEdit

program.

I should also point out that SimpleEditor not only is limited in function, but also is just

plain careless—many boundary cases go unchecked and trigger uncaught exceptions

that don’t kill the GUI, but are not handled or reported well. Even errors that are caught

are not reported to the user (e.g., a paste with nothing to be pasted). Be sure to see the

PyEdit example for a more robust and complete implementation of the operations

introduced in SimpleEditor.

Unicode and the Text Widget

I told you earlier that text content in the Text widget is always a string. Technically,

though, there are two string types in Python 3.X: str for Unicode text, and bytes for

byte strings. Moreover, text can be represented in a variety of Unicode encodings when

stored on files. It turns out that both these factors can impact programs that wish to

use Text well in Python 3.X.

In short, tkinter’s Text and other text-related widgets such as Entry support display of

international character sets for both str and bytes, but we must pass decoded Unicode

str to support the broadest range of character types. In this section, we decompose the

text story in tkinter in general to show why.

String types in the Text widget

You may or may not have noticed, but all our examples so far have been representing

content as str strings—either hardcoded in scripts, or fetched and saved using simple

text-mode files which assume the platform default encoding. Technically, though, the

Text widget allows us to insert both str and bytes:

>>> from tkinter import Text

>>> T = Text()

>>> T.insert('1.0', 'spam') # insert a str

>>> T.insert('end', b'eggs') # insert a bytes

>>> T.pack() # "spameggs" appears in text widget now

>>> T.get('1.0', 'end') # fetch content

'spameggs\n'

Inserting text as bytes might be useful for viewing arbitrary kinds of Unicode text,

especially if the encoding name is unknown. For example, text fetched over the Internet

(e.g., attached to an email or fetched by FTP) could be in any Unicode encoding; storing

it in binary-mode files and displaying it as bytes in a Text widget may at least seem to

side-step the encoding in our scripts.

Unfortunately, though, the Text widget returns its content as str strings, regardless of

whether it was inserted as str or bytes—we get back already-decoded Unicode text

strings either way:

>>> T = Text()

>>> T.insert('1.0', 'Textfileline1\n')

>>> T.insert('end', 'Textfileline2\n') # content is str for str

>>> T.get('1.0', 'end') # pack() is irrelevant to get()

'Textfileline1\nTextfileline2\n\n'

>>> T = Text()

>>> T.insert('1.0', b'Bytesfileline1\r\n') # content is str for bytes too!

>>> T.insert('end', b'Bytesfileline2\r\n') # and \r displays as a space

>>> T.get('1.0', 'end')

'Bytesfileline1\r\nBytesfileline2\r\n\n'

In fact, we get back str for content even if we insert both str and bytes, with a single

\n added at the end for good measure, as the first example in this section shows; here’s

a more comprehensive illustration:

>>> T = Text()

>>> T.insert('1.0', 'Textfileline1\n')

>>> T.insert('end', 'Textfileline2\n') # content is str for both

>>> T.insert('1.0', b'Bytesfileline1\r\n') # one \n added for either type

>>> T.insert('end', b'Bytesfileline2\r\n') # pack() displays as 4 lines

>>> T.get('1.0', 'end')

'Bytesfileline1\r\nTextfileline1\nTextfileline2\nBytesfileline2\r\n\n'

>>>

>>> print(T.get('1.0', 'end'))

Bytesfileline1

Textfileline1

Textfileline2

Bytesfileline2

This makes it easy to perform text processing on content after it is fetched: we may

conduct it in terms of str, regardless of which type of string was inserted. However,

this also makes it difficult to treat text data generically from a Unicode perspective: we

cannot save the returned str content to a binary mode file as is, because binary mode

files expect bytes. We must either encode to bytes manually first or open the file in text

mode and rely on it to encode the str. In either case we must know the Unicode

encoding name to apply, assume the platform default suffices, fall back on guesses and

hope one works, or ask the user.
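The two save routes just described can be compared side by side. The following is a minimal sketch, assuming the encoding name is already known; the helper names save_text_mode and save_binary_mode are hypothetical, and the content string stands in for text fetched from a Text widget. Passing newline='\n' turns off end-of-line translation in text mode so that both routes produce the same bytes on any platform.

```python
import os
import tempfile

def save_text_mode(path, content, encoding):
    """Let the text-mode file object encode the str as it is written."""
    with open(path, 'w', encoding=encoding, newline='\n') as out:
        out.write(content)

def save_binary_mode(path, content, encoding):
    """Encode the str to bytes manually, then write the bytes as is."""
    with open(path, 'wb') as out:
        out.write(content.encode(encoding))

# Demo: both routes should leave identical bytes on disk
content = 'A\xc4B\xe4C\n'                        # stand-in for widget content
with tempfile.TemporaryDirectory() as folder:
    text_path = os.path.join(folder, 'text.out')
    bin_path = os.path.join(folder, 'bin.out')
    save_text_mode(text_path, content, 'latin-1')
    save_binary_mode(bin_path, content, 'latin-1')
    with open(text_path, 'rb') as f1, open(bin_path, 'rb') as f2:
        text_bytes, bin_bytes = f1.read(), f2.read()
print(text_bytes == bin_bytes)
```

Either way, the encoding argument is the part that cannot be avoided: both helpers need a name before any bytes reach the file.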

In other words, although tkinter allows us to insert and view some text of unknown

encoding as bytes, the fact that it’s returned as str strings means we generally need to

know how to encode it anyhow on saves, to satisfy Python 3.X file interfaces. Moreover,

because bytes inserted into Text widgets must also be decodable according to the

limited Unicode policies of the underlying Tk library, we’re generally better off decoding

text to str ourselves if we wish to support Unicode broadly. To truly understand why

that’s true, we need to take a brief excursion through the Land of Unicode.

Unicode text in strings

The reason for all this extra complexity, of course, is that in a world with Unicode, we

cannot really think of “text” anymore without also asking “which kind.” Text in general

can be encoded in a wide variety of Unicode encoding schemes. In Python, this is always

a factor for str and pertains to bytes when it contains encoded text. Python’s str

Unicode strings are simply strings once they are created, but you have to take encodings

into consideration when transferring them to and from files and when passing them to

libraries that impose constraints on text encodings.

We won’t cover Unicode encodings in depth here (see Learning Python for

background details, as well as the brief look at implications for files in Chapter 4), but a

quick review is in order to illustrate how this relates to Text widgets. First of all, keep

in mind that ASCII text data normally just works in most contexts, because it is a subset

of most Unicode encoding schemes. Data outside the ASCII 7-bit range, though, may

be represented differently as bytes in different encoding schemes.

For instance, the following must decode a Latin-1 bytes string using the Latin-1

encoding; using the default (UTF-8) or an explicitly named encoding that doesn’t match

the bytes will fail:

>>> b = b'A\xc4B\xe4C' # these bytes are latin-1 format text

>>> b

b'A\xc4B\xe4C'

>>> s = b.decode('utf8')

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: invalid dat...

>>> s = b.decode()

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: invalid dat...

>>> s = b.decode('latin1')

>>> s

'AÄBäC'

Once you’ve decoded to a Unicode string, you can “convert” it to a variety of different

encoding schemes. Really, this simply translates to alternative binary encoding formats,

from which we can decode again later; a Unicode string has no encoding type per se,

only encoded binary data does:

>>> s.encode('latin-1')

b'A\xc4B\xe4C'

>>> s.encode('utf-8')

b'A\xc3\x84B\xc3\xa4C'

>>> s.encode('utf-16')

b'\xff\xfeA\x00\xc4\x00B\x00\xe4\x00C\x00'

>>> s.encode('ascii')

UnicodeEncodeError: 'ascii' codec can't encode character '\xc4' in position 1: o...

Notice the last test here: the string you encode to must be compatible with the scheme

you choose, or you’ll get an exception; here, ASCII is too narrow to represent characters

decoded from Latin-1 bytes. Even though you can convert to different (compatible)

representations’ bytes, you must generally know what the encoded format is in order

to decode back to a string:

>>> s.encode('utf-16').decode('utf-16')

'AÄBäC'

>>> s.encode('latin-1').decode('latin-1')

'AÄBäC'

>>> s.encode('latin-1').decode('utf-8')

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: invalid dat...

>>> s.encode('utf-8').decode('latin-1')

UnicodeEncodeError: 'charmap' codec can't encode character '\xc3' in position 2:...

Note the last test here again. Technically, encoding Unicode code points (characters)

to UTF-8 bytes and then decoding back again per the Latin-1 format does not raise an

error, but trying to print the result does: it’s scrambled garbage. To maintain fidelity,

you must generally know what format encoded bytes are in:

>>> s

'AÄBäC'

>>> x = s.encode('utf-8').decode('utf-8') # OK if encoding matches data

>>> x

'AÄBäC'

>>> x = s.encode('latin-1').decode('latin-1') # any compatible encoding works

>>> x

'AÄBäC'

>>> x = s.encode('utf-8').decode('latin-1') # decoding works, result is garbage

>>> x

UnicodeEncodeError: 'charmap' codec can't encode character '\xc3' in position 2:...

>>> len(s), len(x) # no longer the same string

(5, 7)

>>> s.encode('utf-8') # no longer same code points

b'A\xc3\x84B\xc3\xa4C'

>>> x.encode('utf-8')

b'A\xc3\x83\xc2\x84B\xc3\x83\xc2\xa4C'

>>> s.encode('latin-1')

b'A\xc4B\xe4C'

>>> x.encode('latin-1')

b'A\xc3\x84B\xc3\xa4C'

Curiously, the original string may still be there after a mismatch like this: if we encode

the scrambled string back to Latin-1 bytes again (as 8-bit characters) and then decode

properly, we might restore the original (in some contexts this can constitute a sort of second

chance if data is decoded wrong initially):

>>> s

'AÄBäC'

>>> s.encode('utf-8').decode('latin-1')

UnicodeEncodeError: 'charmap' codec can't encode character '\xc3' in position 2:...

>>> s.encode('utf-8').decode('latin-1').encode('latin-1')

b'A\xc3\x84B\xc3\xa4C'

>>> s.encode('utf-8').decode('latin-1').encode('latin-1').decode('utf-8')

'AÄBäC'

>>> s.encode('utf-8').decode('latin-1').encode('latin-1').decode('utf-8') == s

True
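The second-chance recovery just shown can be wrapped in a function. This is a minimal sketch, assuming the damage came specifically from decoding UTF-8 bytes as Latin-1; the name repair_mojibake is hypothetical. It is a heuristic only: a string that was genuinely Latin-1 but whose bytes happen to form valid UTF-8 would be changed too.

```python
def repair_mojibake(text):
    """
    Try to undo "UTF-8 bytes decoded as Latin-1" scrambling;
    return the text unchanged when that round trip is not possible.
    """
    try:
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

original = 'A\xc4B\xe4C'
scrambled = original.encode('utf-8').decode('latin-1')   # wrong decoding
restored = repair_mojibake(scrambled)
print(len(original), len(scrambled), restored == original)
```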

On the other hand, we can use a different encoding name to decode, as long as it’s

compatible with the format of the data; ASCII, UTF-8, and Latin-1, for instance, all

format ASCII text the same way:

>>> 'spam'.encode('utf8').decode('latin1')

'spam'

>>> 'spam'.encode('latin1').decode('ascii')

'spam'

It’s important to remember that a string’s decoded value doesn’t depend on the

encoding it came from: once decoded, a string has no notion of encoding and is simply

a sequence of Unicode characters (“code points”). Hence, we really only need to care

about encodings at the point of transfer to and from files:

>>> s

'AÄBäC'

>>> s.encode('utf-16').decode('utf-16') == s.encode('latin-1').decode('latin-1')

True

Unicode text in files

Now, the same rules apply to text files, because Unicode strings are stored in files as

encoded bytes. When writing, we can encode in any format that accommodates the

string’s characters. When reading, though, we generally must know what that encoding

is or provide one that formats characters the same way:

>>> open('ldata', 'w', encoding='latin-1').write(s) # store in latin-1 format

>>> open('udata', 'w', encoding='utf-8').write(s) # store in utf-8 format

>>> open('ldata', 'r', encoding='latin-1').read() # OK if correct name given

'AÄBäC'

>>> open('udata', 'r', encoding='utf-8').read()

'AÄBäC'

>>> open('ldata', 'r').read() # else, may not work

'AÄBäC'

>>> open('udata', 'r').read()

UnicodeEncodeError: 'charmap' codec can't encode characters in position 2-3: cha...

>>> open('ldata', 'r', encoding='utf-8').read()

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: invalid dat...

>>> open('udata', 'r', encoding='latin-1').read()

UnicodeEncodeError: 'charmap' codec can't encode character '\xc3' in position 2:...

By contrast, binary mode files don’t attempt to decode into a Unicode string; they

happily read whatever is present, whether the data was written to the file in text mode

with automatically encoded str strings (as in the preceding interaction) or in binary

mode with manually encoded bytes strings:

>>> open('ldata', 'rb').read()

b'A\xc4B\xe4C'

>>> open('udata', 'rb').read()

b'A\xc3\x84B\xc3\xa4C'

>>> open('sdata', 'wb').write( s.encode('utf-16') ) # return value: 12

>>> open('sdata', 'rb').read()

b'\xff\xfeA\x00\xc4\x00B\x00\xe4\x00C\x00'
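To see the text-mode and binary-mode rules in one place, the sketch below reads a file's raw bytes once and then tries several decodings by hand. It is only an illustration: the function name survey_encodings is hypothetical, and the file is created in a temporary folder rather than reusing the ldata file from the session above.

```python
import os
import tempfile

def survey_encodings(path, encodings):
    """
    Read raw bytes in binary mode (no decoding is attempted), then
    report what each encoding name makes of them: the decoded str,
    or None if the bytes are not valid in that encoding.
    """
    with open(path, 'rb') as f:
        raw = f.read()
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return raw, results

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'ldata')
    with open(path, 'w', encoding='latin-1') as out:      # store as Latin-1
        out.write('A\xc4B\xe4C')
    raw, results = survey_encodings(path, ['ascii', 'utf-8', 'latin-1'])

print(raw)
print(results)
```

The binary read itself never fails on content; mismatches show up only when a decoding is applied.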

Unicode and the Text widget

The application of all this to tkinter Text displays is straightforward: if we open in binary

mode to read bytes, we don’t need to be concerned about encodings in our own code—

tkinter interprets the data as expected, at least for these two encodings:

>>> from tkinter import Text

>>> t = Text()

>>> t.insert('1.0', open('ldata', 'rb').read())

>>> t.pack() # string appears in GUI OK

>>> t.get('1.0', 'end')

'AÄBäC\n'

>>>

>>> t = Text()

>>> t.insert('1.0', open('udata', 'rb').read())

>>> t.pack() # string appears in GUI OK

>>> t.get('1.0', 'end')

'AÄBäC\n'

It works the same if we pass a str fetched in text mode, but we then need to know the

encoding type on the Python side of the fence—reads will fail if the encoding type

doesn’t match the stored data:

>>> t = Text()

>>> t.insert('1.0', open('ldata', 'r', encoding='latin-1').read())

>>> t.pack()

>>> t.get('1.0', 'end')

'AÄBäC\n'

>>>

>>> t = Text()

>>> t.insert('1.0', open('udata', 'r', encoding='utf-8').read())

>>> t.pack()

>>> t.get('1.0', 'end')

'AÄBäC\n'

Either way, though, the fetched content is always a Unicode str, so binary mode really

only addresses loads: we still need to know an encoding to store, whether we write in

text mode directly or write in binary mode after manual encoding:

>>> c = t.get('1.0', 'end')

>>> c # content is str

'AÄBäC\n'

>>> open('cdata', 'wb').write(c) # binary mode needs bytes

TypeError: must be bytes or buffer, not str

>>> open('cdata', 'w', encoding='latin-1').write(c) # each write returns 6

>>> open('cdata', 'rb').read()

b'A\xc4B\xe4C\r\n'

>>> open('cdata', 'w', encoding='utf-8').write(c) # different bytes on files

>>> open('cdata', 'rb').read()

b'A\xc3\x84B\xc3\xa4C\r\n'

>>> open('cdata', 'w', encoding='utf-16').write(c)

>>> open('cdata', 'rb').read()

b'\xff\xfeA\x00\xc4\x00B\x00\xe4\x00C\x00\r\x00\n\x00'

>>> open('cdata', 'wb').write( c.encode('latin-1') ) # manual encoding first

>>> open('cdata', 'rb').read() # same but no \r on Win

b'A\xc4B\xe4C\n'

>>> open('cdata', 'w', encoding='ascii').write(c) # still must be compatible

UnicodeEncodeError: 'ascii' codec can't encode character '\xc4' in position 1: o

Notice the last test here: like manual encoding, file writes can still fail if the data cannot

be encoded in the target scheme. Because of that, programs may need to recover from

exceptions or try alternative schemes; this is especially true on platforms where ASCII

may be the default platform encoding.
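One way to recover from such failures is to try a sequence of encodings on save and keep the first that can represent the text. The following is a minimal sketch of that idea, not code from the book's examples; the name encode_for_save and its default encoding list are assumptions chosen for illustration.

```python
def encode_for_save(text, encodings=('ascii', 'latin-1', 'utf-8')):
    """
    Return (bytes, encoding_name) for the first encoding that can
    represent every character in text; raise ValueError if none can.
    """
    for name in encodings:
        try:
            return text.encode(name), name
        except UnicodeEncodeError:
            continue                      # too narrow: try the next one
    raise ValueError('no listed encoding can represent this text')

data, used = encode_for_save('A\xc4B\xe4C')
print(used, data)
```

The resulting bytes can then be written to a file opened in binary mode, and the encoding name can be remembered for later saves.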

The problem with treating text as bytes

The prior sections’ rules may seem complex, but they boil down to the following:

• Unless strings always use the platform default, we need to know encoding types

to read or write in text mode and to manually decode or encode for binary mode.

• We can use almost any encoding to write new files as long as it can handle the

string’s characters, but must provide one that is compatible with the existing data’s

binary format on reads.

• We don’t need to know the encoding mode to read text as bytes in binary mode

for display, but the str content returned by the Text widget still requires us to

encode to write on saves.

So why not always load text files in binary mode to display them in a tkinter Text widget?

While binary mode input files seem to side-step encoding issues for display, passing

text to tkinter as bytes instead of str really just delegates the encoding issue to the Tk

library, which imposes constraints of its own.

More specifically, opening input files in binary mode to read bytes may seem to support

viewing arbitrary types of text, but it has two potential downsides:

• It shifts the burden of deciding encoding type from our script to the Tk GUI library.

The library must still determine how to render those bytes and may not support

all encodings possible.

• It allows opening and viewing data that is not text in nature, thereby defeating

some of the purpose of the validity checks performed by text decoding.

The first point is probably the most crucial here. In experiments I’ve run on Windows,

Tk seems to correctly handle raw bytes strings encoded in ASCII, UTF-8 and Latin-1

format, but not UTF-16 or others such as CP500. By contrast, these all render correctly

if decoded in Python to str before being passed on to Tk. In programs intended for the

world at large, this wider support is crucial today. If you’re able to know or ask for

encodings, you’re better off using str both for display and saves.
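In code, this advice amounts to decoding in Python before anything is handed to the widget. The sketch below assumes the encoding name is known; the helper name to_display_str is hypothetical, and the loop simply confirms that Python itself can decode all of the encodings mentioned above, including the two that raw bytes reportedly did not display correctly.

```python
def to_display_str(data, encoding):
    """
    Return a str suitable for Text.insert: bytes are decoded in
    Python first, and str is passed through unchanged.
    """
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data

text = 'A\xc4B\xe4C'
for name in ('latin-1', 'utf-8', 'utf-16', 'cp500'):
    raw = text.encode(name)                     # different bytes per encoding
    print(name, to_display_str(raw, name) == text)
```

A call such as T.insert('1.0', to_display_str(raw, name)) would then pass Tk a decoded str regardless of how the bytes were stored.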

To some degree, regardless of whether you pass in str or bytes, tkinter GUIs are subject

to the constraints imposed by the underlying Tk library and the Tcl language it uses

internally, as well as any imposed by the techniques Python’s tkinter uses to interface

with Tk. For example:

• Tcl, the internal implementation language of the Tk library, stores strings internally

in UTF-8 format, and decrees that strings passed in to and returned from its C API

be in this format.

• Tcl attempts to convert byte strings to its internal UTF-8 format, and generally

supports translation using the platform and locale encodings in the local operating

system with Latin-1 as a fallback.

• Python’s tkinter passes bytes strings to Tcl directly, but copies Python str Unicode

strings to and from Tcl Unicode string objects.

• Tk inherits all of Tcl’s Unicode policies, but adds additional font selection policies

for display.

In other words, GUIs that display text in tkinter are somewhat at the mercy of multiple

layers of software, above and beyond the Python language itself. In general, though,

Unicode is broadly supported by Tk’s Text widget for Python str, but not for Python

bytes. As you can probably tell, though, this story quickly becomes very low-level and

detailed, so we won’t explore it further in this book; see the Web and other resources

for more on tkinter, Tk, and Tcl, and the interfaces between them.

Other binary mode considerations

Even in contexts where it’s sufficient, using binary mode files to finesse encodings for

display is more complicated than you might think. We always need to be careful to

write output in binary mode, too, so what we read is what we later write—if we read

in binary mode, content end-lines will be \r\n on Windows, and we don’t want text-

mode files to expand this to \r\r\n. Moreover, there’s another difference in tkinter for

str and bytes. A str read from a text-mode file appears in the GUI as you expect, and

end-lines are mapped on Windows as usual:

C:\...\PP4E\Gui\Tour> python

>>> from tkinter import *

>>> T = Text() # str from text-mode file

>>> T.insert('1.0', open('jack.txt').read()) # platform default encoding

>>> T.pack() # appears in GUI normally

>>> T.get('1.0', 'end')[:75]

'000) All work and no play makes Jack a dull boy.\n001) All work and no pla'

If you pass in a bytes obtained from a binary-mode file, however, it’s odd in the GUI

on Windows—there’s an extra space at the end of each line, which reflects the \r that

is not stripped by binary mode files:

C:\...\PP4E\Gui\Tour> python

>>> from tkinter import *

>>> T = Text() # bytes from binary-mode

>>> T.insert('1.0', open('jack.txt', 'rb').read()) # no decoding occurs

>>> T.pack() # lines have space at end!

>>> T.get('1.0', 'end')[:75]

'000) All work and no play makes Jack a dull boy.\r\n001) All work and no pl'

To use bytes to allow for arbitrary text but make the text appear as expected by users,

we also have to strip the \r characters at line end manually. This assumes that a \r\n

combination doesn’t mean something special in the text’s encoding scheme, though

data in which this sequence does not mean end-of-line will likely have other issues when

displayed. The following avoids the extra end-of-line spaces—we open for input in

binary mode for undecoded bytes, but drop \r:

C:\...\PP4E\Gui\Tour> python

>>> from tkinter import * # use bytes, strip \r if any

>>> T = Text()

>>> data = open('jack.txt', 'rb').read()

>>> data = data.replace(b'\r\n', b'\n')

>>> T.insert('1.0', data)

>>> T.pack()

>>> T.get('1.0', 'end')[:75]

'000) All work and no play makes Jack a dull boy.\n001) All work and no pla'

To save content later, we can either add the \r characters back on Windows only,

manually encode to bytes, and save in binary mode; or we can open in text mode to

make the file object restore the \r if needed and encode for us, and write the str content

string directly. The second of these is probably simpler, as we don’t need to care about

platform differences.

Either way, though, we still face an encoding step—we can either rely on the platform

default encoding or obtain an encoding name from user interfaces. In the following,

for example, the text-mode file converts end-lines and encodes to bytes internally using

the platform default. If we care about supporting arbitrary Unicode types or run on a

platform whose default does not accommodate characters displayed, we would need

to pass in an explicit encoding argument (the Python slice operation here has the same

effect as fetching through Tk’s “end-1c” position specification):

...continuing prior listing...

>>> content = T.get('1.0', 'end')[:-1] # drop added \n at end

>>> open('copyjack.txt', 'w').write(content) # use platform default

12500 # text mode adds \r on Win

>>> ^Z

C:\...\PP4E\Gui\Tour> fc jack.txt copyjack.txt

Comparing files jack.txt and COPYJACK.TXT

FC: no differences encountered
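The load and save steps of this binary-mode approach can be paired up as two helpers. This is a minimal sketch under stated assumptions: the names bytes_for_display and bytes_for_save are hypothetical, the end-of-line sequence defaults to os.linesep, and the content passed to bytes_for_save is assumed to come from a get('1.0', 'end') call and so carries one extra trailing \n.

```python
import os

def bytes_for_display(data):
    """Drop the \\r of each \\r\\n pair so lines show no trailing space."""
    return data.replace(b'\r\n', b'\n')

def bytes_for_save(content, encoding, eol=os.linesep):
    """
    Turn str content fetched from a Text widget back into bytes:
    drop the \\n the widget adds, restore platform line ends, encode.
    """
    if content.endswith('\n'):
        content = content[:-1]                  # same effect as 'end-1c'
    return content.replace('\n', eol).encode(encoding)

raw = b'000) All work\r\n001) All play\r\n'     # as read in binary mode on Windows
shown = bytes_for_display(raw)                  # what would be inserted
fetched = shown.decode('ascii') + '\n'          # what get('1.0', 'end') would return
saved = bytes_for_save(fetched, 'ascii', eol='\r\n')
print(saved == raw)
```

As noted above, the replace in bytes_for_display assumes that \r\n really means end-of-line in the data's encoding; it leaves UTF-16 bytes untouched, for example.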

Supporting Unicode in PyEdit (ahead)

We’ll see a use case of accommodating the Text widget’s Unicode behavior in the larger

PyEdit example of Chapter 11. Really, supporting Unicode just means supporting

arbitrary Unicode encodings in text files on opens and saves; once in memory, text

processing can always be performed in terms of str, since that’s how tkinter returns

content. To support Unicode, PyEdit will open both input and output files in text mode

with explicit encodings whenever possible, and fall back on opening input files in binary

mode only as a last resort. This avoids relying on the limited Unicode support Tk

provides for display of raw byte strings.

To make this policy work, PyEdit will accept encoding names from a wide variety of

sources and allow the user to configure which to attempt. Encodings may be obtained

from user dialog inputs, configuration file settings, the platform default, the prior

open’s encoding on saves, and even internal program values (parsed from email

headers, for instance). These sources are attempted until the first that succeeds, though it

may also be desirable to limit encoding attempts to just one such source in some

contexts.
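The "attempt each source until one succeeds" policy can be sketched in a few lines. This is not PyEdit's actual code, which appears later in the book; the function name decode_first and the way candidate names are supplied are assumptions made for illustration. Returning a pair of None values leaves the caller free to fall back on binary mode as a last resort.

```python
def decode_first(data, encodings):
    """
    Try each encoding name in order; return (text, name) for the
    first that decodes data, or (None, None) if every attempt fails.
    Empty names and unknown codec names are skipped.
    """
    for name in encodings:
        if not name:
            continue                             # source supplied nothing
        try:
            return data.decode(name), name
        except (UnicodeError, LookupError):      # bad data or bad name
            continue
    return None, None

candidates = ['ascii', 'utf-8', 'latin-1']       # e.g. from user, config, defaults
text, used = decode_first(b'A\xc4B\xe4C', candidates)
print(used, text)
```

Because Latin-1 accepts any byte sequence, placing it last makes it a catch-all; limiting the list to a single name gives the stricter one-source behavior mentioned above.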

Watch for this code in Chapter 11. Frankly, PyEdit in this edition originally read and

wrote files in text mode with platform default encodings. I didn’t consider the

implications of Unicode on PyEdit until the PyMailGUI example’s Internet world raised the

specter of arbitrary text encodings. If it seems that strings are a lot more complicated

than they used to be, it’s probably only because your scope has been too narrow.

Advanced Text and Tag Operations

But enough about the idiosyncrasies of Unicode text—let’s get back to coding GUIs.

Besides the position specification roles we’ve seen so far, the Text widget’s text tags can

also be used to apply formatting and behavior to all characters in a substring and all

substrings added to a tag. In fact, this is where much of the power of the Text widget lies:

• Tags have formatting attributes for setting color, font, tabs, and line spacing and

justification; to apply these to many parts of the text at once, associate them with

a tag and apply formatting to the tag with the tag_config method, much like the
general widget config method we’ve been using.

• Tags can also have associated event bindings, which let you implement things such

as hyperlinks in a Text widget: clicking the text triggers its tag’s event handler. Tag

bindings are set with a tag_bind method, much like the general widget bind method

we’ve already met.

With tags, it’s possible to display multiple configurations within the same Text widget;

for instance, you can apply one font to the Text widget at large and other fonts to tagged

text. In addition, the Text widget allows you to embed other widgets at an index (they

are treated like a single character), as well as images.

Example 9-12 illustrates the basics of all these advanced tools at once and draws the

interface captured in Figure 9-22. This script applies formatting and event bindings to

three tagged substrings, displays text in two different font and color schemes, and embeds
an image and a button. Double-clicking any of the tagged substrings (or clicking the
embedded button) with a mouse triggers an event that prints a “Got tag event” message
to stdout.

Example 9-12. PP4E\Gui\Tour\texttags.py

"demo advanced tag and text interfaces"

from tkinter import *

root = Tk()

def hello(event): print('Got tag event')

# make and config a Text

text = Text()

text.config(font=('courier', 15, 'normal')) # set font for all

text.config(width=20, height=12)

text.pack(expand=YES, fill=BOTH)

text.insert(END, 'This is\n\nthe meaning\n\nof life.\n\n') # insert six lines

# embed windows and photos

btn = Button(text, text='Spam', command=lambda: hello(0)) # embed a button

btn.pack()

text.window_create(END, window=btn) # embed the button in the text

text.insert(END, '\n\n')

img = PhotoImage(file='../gifs/PythonPowered.gif')

text.image_create(END, image=img) # embed a photo

# apply tags to substrings

text.tag_add('demo', '1.5', '1.7') # tag 'is'

text.tag_add('demo', '3.0', '3.3') # tag 'the'

text.tag_add('demo', '5.3', '5.7') # tag 'life'

text.tag_config('demo', background='purple') # change colors in tag

text.tag_config('demo', foreground='white') # not called bg/fg here

text.tag_config('demo', font=('times', 16, 'underline')) # change font in tag

text.tag_bind('demo', '<Double-1>', hello) # bind events in tag

root.mainloop()

Figure 9-22. Text tags in action

Such embedding and tag tools could ultimately be used to render a web page. In fact,

Python’s standard html.parser HTML parser module can help automate web page GUI

construction. As you can probably tell, though, the Text widget offers more GUI pro-

gramming options than we have space to list here. For more details on tag and text

options, consult other Tk and tkinter references. Right now, art class is about to begin.
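As a rough illustration of the html.parser idea mentioned above, the sketch below splits a small piece of HTML into (text, tag name) pairs that could then be inserted into a Text widget. The class and function names and the 'link' tag name are assumptions made for this example; only anchors are recognized, and real pages would need far more handling.

```python
from html.parser import HTMLParser

class SegmentCollector(HTMLParser):
    """Collect (text, tagname) pairs; tagname is 'link' inside <a>, else None."""
    def __init__(self):
        super().__init__()
        self.segments = []
        self.in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == 'a':
            self.in_link = False

    def handle_data(self, data):
        if data:
            self.segments.append((data, 'link' if self.in_link else None))

def html_to_segments(html):
    parser = SegmentCollector()
    parser.feed(html)
    parser.close()                       # flush any buffered text
    return parser.segments

def show_segments(text_widget, segments):
    """Insert each chunk; Text.insert accepts tag names after the string."""
    for chunk, tagname in segments:
        if tagname:
            text_widget.insert('end', chunk, tagname)
        else:
            text_widget.insert('end', chunk)
```

After show_segments runs, the 'link' tag could be styled with tag_config and given a click handler with tag_bind, in the same way the 'demo' tag is handled in Example 9-12.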

Canvas

When it comes to graphics, the tkinter Canvas widget is the most free-form device in

the library. It’s a place to draw shapes, move objects dynamically, and place other kinds

of widgets. The canvas is based on a structured graphic object model: everything drawn

on a canvas can be processed as an object. You can get down to the pixel-by-pixel level

in a canvas, but you can also deal in terms of larger objects such as shapes, photos, and

embedded widgets. The net result makes the canvas powerful enough to support

everything from simple paint programs to full-scale visualization and animation.

Basic Canvas Operations

Canvases are ubiquitous in much nontrivial GUI work, and we’ll see larger canvas

examples show up later in this book under the names PyDraw, PyPhoto, PyView,

PyClock, and PyTree. For now, let’s jump right into an example that illustrates the

basics. Example 9-13 runs most of the major canvas drawing methods.

Example 9-13. PP4E\Gui\Tour\canvas1.py

"demo all basic canvas interfaces"

from tkinter import *

canvas = Canvas(width=525, height=300, bg='white') # 0,0 is top left corner

canvas.pack(expand=YES, fill=BOTH) # increases down, right

canvas.create_line(100, 100, 200, 200) # fromX, fromY, toX, toY

canvas.create_line(100, 200, 200, 300) # draw shapes

for i in range(1, 20, 2):

    canvas.create_line(0, i, 50, i)

canvas.create_oval(10, 10, 200, 200, width=2, fill='blue')

canvas.create_arc(200, 200, 300, 100)

canvas.create_rectangle(200, 200, 300, 300, width=5, fill='red')

canvas.create_line(0, 300, 150, 150, width=10, fill='green')

photo=PhotoImage(file='../gifs/ora-lp4e.gif')

canvas.create_image(325, 25, image=photo, anchor=NW) # embed a photo

widget = Label(canvas, text='Spam', fg='white', bg='black')

widget.pack()

canvas.create_window(100, 100, window=widget) # embed a widget

canvas.create_text(100, 280, text='Ham') # draw some text

mainloop()

When run, this script draws the window captured in Figure 9-23. We saw how to place

a photo on a canvas and size a canvas for a photo earlier on this tour (see “Images”
on page 484). This script also draws shapes, text, and even an embedded Label

widget. Its window gets by on looks alone; in a moment, we’ll learn how to add event

callbacks that let users interact with drawn items.

Figure 9-23. canvas1 hardcoded object sketches

Programming the Canvas Widget

Canvases are easy to use, but they rely on a coordinate system, define unique drawing

methods, and name objects by identifier or tag. This section introduces these core

canvas concepts.

Coordinates

All items drawn on a canvas are distinct objects, but they are not really widgets. If you

study the canvas1 script closely, you’ll notice that canvases are created and packed (or

gridded or placed) within their parent container just like any other widget in tkinter.

But the items drawn on a canvas are not. Shapes, images, and so on, are positioned and

moved on the canvas by coordinates, identifiers, and tags. Of these, coordinates are

the most fundamental part of the canvas model.

Canvases define an (X,Y) coordinate system for their drawing area; x means the hori-

zontal scale, y means vertical. By default, coordinates are measured in screen pixels

(dots), the upper-left corner of the canvas has coordinates (0,0), and x and y coordinates

increase to the right and down, respectively. To draw and embed objects within a can-

vas, you supply one or more (X,Y) coordinate pairs to give absolute canvas locations.

This is different from the constraints we’ve used to pack widgets thus far, but it allows

very fine-grained control over graphical layouts, and it supports more free-form inter-

face techniques such as animation.*

Object construction

The canvas allows you to draw and display common shapes such as lines, ovals, rec-

tangles, arcs, and polygons. In addition, you can embed text, images, and other kinds

of tkinter widgets such as labels and buttons. The canvas1 script demonstrates all the

basic graphic object constructor calls; to each, you pass one or more sets of (X,Y) co-

ordinates to give the new object’s location, start point and endpoint, or diagonally

opposite corners of a bounding box that encloses the shape:

id = canvas.create_line(fromX, fromY, toX, toY) # line start, stop

id = canvas.create_oval(fromX, fromY, toX, toY) # two opposite box corners

id = canvas.create_arc( fromX, fromY, toX, toY) # two opposite oval corners

id = canvas.create_rectangle(fromX, fromY, toX, toY) # two opposite corners

Other drawing calls specify just one (X,Y) pair, to give the location of the object’s
anchor point (its center by default, or its upper-left corner when anchor=NW is passed):

id = canvas.create_image(250, 0, image=photo, anchor=NW) # embed a photo

id = canvas.create_window(100, 100, window=widget) # embed a widget

id = canvas.create_text(100, 280, text='Ham') # draw some text

The canvas also provides a create_polygon method that accepts an arbitrary set of

coordinate arguments defining the endpoints of connected lines; it’s useful for drawing

more arbitrary kinds of shapes composed of straight lines.
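To make the create_polygon description concrete, here is a small sketch that computes the vertex list for a regular polygon and hands it to the canvas. The helper names and the rounding to whole pixels are assumptions for illustration; create_polygon itself takes the flat x1, y1, x2, y2, ... sequence shown.

```python
import math

def regular_polygon_points(cx, cy, radius, sides):
    """Return a flat [x1, y1, x2, y2, ...] list of vertices around (cx, cy)."""
    points = []
    for i in range(sides):
        angle = 2 * math.pi * i / sides
        points.append(round(cx + radius * math.cos(angle)))
        points.append(round(cy + radius * math.sin(angle)))   # y grows downward
    return points

def draw_regular_polygon(canvas, cx, cy, radius, sides, **options):
    """Draw the polygon; the canvas joins the last point back to the first."""
    coords = regular_polygon_points(cx, cy, radius, sides)
    return canvas.create_polygon(*coords, **options)
```

For example, draw_regular_polygon(canvas, 100, 100, 50, 6, fill='yellow', outline='black') would draw a hexagon and return its object identifier.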

In addition to coordinates, most of these drawing calls let you specify common con-

figuration options, such as outline width, fill color, outline color, and so on. Indi-

vidual object types have unique configuration options all their own, too; for instance,

lines may specify the shape of an optional arrow, and text, widgets, and images may

be anchored to a point of the compass (this looks like the packer’s anchor, but really it

gives a point on the object that is positioned at the [X,Y] coordinates given in the

create call; NW puts the upper-left corner at [X,Y]).

Perhaps the most important thing to notice here, though, is that tkinter does most of

the “grunt” work for you—when drawing graphics, you provide coordinates, and

shapes are automatically plotted and rendered in the pixel world. If you’ve ever done

any lower-level graphics work, you’ll appreciate the difference.

* Animation techniques are covered at the end of this tour. As a use case example, because you can embed

other widgets in a canvas’s drawing area, their coordinate system also makes them ideal for implementing

GUIs that let users design other GUIs by dragging embedded widgets around on the canvas—a useful canvas

application we would explore in this book if I had a few hundred pages to spare.

Object identifiers and operations

Although not used by the canvas1 script, every object you put on a canvas has an iden-

tifier, returned by the create_ method that draws or embeds the object (what was coded

as id in the last section’s examples). This identifier can later be passed to other methods

that move the object to new coordinates, set its configuration options, delete it from

the canvas, raise or lower it among other overlapping objects, and so on.

For instance, the canvas move method accepts both an object identifier and X and Y

offsets (not coordinates), and it moves the named object by the offsets given:

canvas.move(objectIdOrTag, offsetX, offsetY) # move object(s) by offset

If this happens to move the object off-screen, it is simply clipped (not shown). Other

common canvas operations process objects, too:

canvas.delete(objectIdOrTag) # delete object(s) from canvas

canvas.tkraise(objectIdOrTag) # raise object(s) to front

canvas.lower(objectIdOrTag) # lower object(s) below others

canvas.itemconfig(objectIdOrTag, fill='red') # fill object(s) with red color

Notice the tkraise name—raise by itself is a reserved word in Python. Also note that

the itemconfig method is used to configure objects drawn on a canvas after they have

been created; use config to set configuration options for the canvas itself. Probably the

best thing to notice here, though, is that because tkinter is based on structured objects,

you can process a graphic object all at once; there is no need to erase and redraw each

pixel manually to implement a move or a raise.

Canvas object tags

But canvases offer even more power than suggested so far. In addition to object iden-

tifiers, you can also perform canvas operations on entire sets of objects at once, by

associating them all with a tag, a name that you make up and apply to objects on the

display. Tagging objects in a Canvas is at least similar in spirit to tagging substrings in

the Text widget we studied in the prior section. In general terms, canvas operation

methods accept either a single object’s identifier or a tag name.

For example, you can move an entire set of drawn objects by associating all with the

same tag and passing the tag name to the canvas move method. In fact, this is why

move takes offsets, not coordinates—when given a tag, each object associated with the

tag is moved by the same (X,Y) offsets; absolute coordinates would make all the tagged

objects appear on top of each other instead.
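Because move works with offsets, placing an object at an absolute position takes one subtraction first. The sketch below shows one way to do it, assuming the object's bounding box is an acceptable stand-in for its position (Tk's bbox result can overestimate by a pixel or so); offsets_to and move_to are hypothetical helper names, not canvas methods.

```python
def offsets_to(current, target):
    """Offsets that carry a point at current (x, y) to target (x, y)."""
    return target[0] - current[0], target[1] - current[1]

def move_to(canvas, object_id, target_x, target_y):
    """Move one object so its bounding box's upper-left corner lands on the target."""
    left, top, right, bottom = canvas.bbox(object_id)
    dx, dy = offsets_to((left, top), (target_x, target_y))
    canvas.move(object_id, dx, dy)       # the same call also accepts a tag name
    return dx, dy
```

Passing a tag name to bbox and move instead of an identifier would shift the whole tagged group by one shared offset, which keeps the objects' relative arrangement intact.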

To associate an object with a tag, either specify the tag name in the object drawing call’s

tag option or call the addtag_withtag(tag, objectIdOrTag) canvas method (or its rel-

atives). For instance:

canvas.create_oval(x1, y1, x2, y2, fill='red', tag='bubbles')

canvas.create_oval(x3, y3, x4, y4, fill='red', tag='bubbles')

objectId = canvas.create_oval(x5, y5, x6, y6, fill='red')

canvas.addtag_withtag('bubbles', objectId)

canvas.move('bubbles', diffx, diffy)

This makes three ovals and moves them at the same time by associating them all with

the same tag name. Many objects can have the same tag, many tags can refer to the

same object, and each tag can be individually configured and processed.

As in Text, Canvas widgets have predefined tag names too: the tag all refers to all objects

on the canvas, and current refers to whatever object is under the mouse cursor. Besides

asking for an object under the mouse, you can also search for objects with the find_

canvas methods: canvas.find_closest(X,Y), for instance, returns a tuple whose first

item is the identifier of the closest object to the supplied coordinates—handy after

you’ve received coordinates in a general mouse-click event callback.

We’ll revisit the notion of canvas tags by example later in this chapter (see the animation

scripts near the end if you need more details right away). As usual, canvases support

additional operations and options that we don’t have space to cover in a finite text like

this (e.g., the canvas postscript method lets you save the canvas in a PostScript file).

See later examples in this book, such as PyDraw, for more details, and consult other

Tk or tkinter references for an exhaustive list of canvas object options.

Scrolling Canvases

One canvas-related operation is so common, though, that it does merit a look here. As

demonstrated in Example 9-14, scroll bars can be cross-linked with a canvas using the

same protocols we used to add them to listboxes and text earlier, but with a few unique

requirements.

Example 9-14. PP4E\Gui\Tour\scrolledcanvas.py

"a simple vertically-scrollable canvas component and demo"

from tkinter import *

class ScrolledCanvas(Frame):
    def __init__(self, parent=None, color='brown'):
        Frame.__init__(self, parent)
        self.pack(expand=YES, fill=BOTH)                  # make me expandable
        canv = Canvas(self, bg=color, relief=SUNKEN)
        canv.config(width=300, height=200)                # display area size
        canv.config(scrollregion=(0, 0, 300, 1000))       # canvas size corners
        canv.config(highlightthickness=0)                 # no pixels to border

        sbar = Scrollbar(self)
        sbar.config(command=canv.yview)                   # xlink sbar and canv
        canv.config(yscrollcommand=sbar.set)              # move one moves other
        sbar.pack(side=RIGHT, fill=Y)                     # pack first=clip last
        canv.pack(side=LEFT, expand=YES, fill=BOTH)       # canv clipped first

        self.fillContent(canv)
        canv.bind('<Double-1>', self.onDoubleClick)       # set event handler
        self.canvas = canv

    def fillContent(self, canv):                          # override me below
        for i in range(10):
            canv.create_text(150, 50+(i*100), text='spam'+str(i), fill='beige')

    def onDoubleClick(self, event):                       # override me below
        print(event.x, event.y)
        print(self.canvas.canvasx(event.x), self.canvas.canvasy(event.y))

if __name__ == '__main__': ScrolledCanvas().mainloop()

This script makes the window in Figure 9-24. It is similar to prior scroll examples, but

scrolled canvases introduce two new kinks in the scrolling model:

Scrollable versus viewable sizes

You can specify the size of the displayed view window, but you must specify the

size of the scrollable canvas at large. The size of the view window is what is dis-

played, and it can be changed by the user by resizing. The size of the scrollable

canvas will generally be larger—it includes the entire content, of which only part

is displayed in the view window. Scrolling moves the view window over the full
scrollable canvas.

Viewable to absolute coordinate mapping

In addition, you may need to map between event view area coordinates and overall

canvas coordinates if the canvas is larger than its view area. In a scrolling scenario,

the canvas will almost always be larger than the part displayed, so mapping is often

needed when canvases are scrolled. In some applications, this mapping is not re-

quired, because widgets embedded in the canvas respond to users directly (e.g.,

buttons in the PyPhoto example in Chapter 11). If the user interacts with the canvas

directly, though (e.g., in a drawing program), mapping from view coordinates to

scrollable size coordinates may be necessary.

Figure 9-24. scrolledcanvas live

Sizes are given as configuration options. To specify a view area size, use canvas width

and height options. To specify an overall canvas size, give the (X,Y) coordinates of the

upper-left and lower-right corners of the canvas in a four-item tuple passed to the

scrollregion option. If no view area size is given, a default size is used. If no

scrollregion is given, it defaults to the view area size; this makes the scroll bar useless,

since the view is assumed to hold the entire canvas.

Mapping coordinates is a bit subtler. If the scrollable view area associated with a canvas

is smaller than the canvas at large, the (X,Y) coordinates returned in event objects are

view area coordinates, not overall canvas coordinates. You’ll generally want to scale

the event coordinates to canvas coordinates, by passing them to the canvasx and

canvasy canvas methods before using them to process objects.

For example, if you run the scrolled canvas script and watch the messages printed on

mouse double-clicks, you’ll notice that the event coordinates are always relative to the

displayed view window, not to the overall canvas:

C:\...\PP4E\Gui\Tour> python scrolledcanvas.py
2 0               event x,y when scrolled to top of canvas
2.0 0.0           canvas x,y -same, as long as no border pixels
150 106
150.0 106.0
299 197
299.0 197.0
3 2               event x,y when scrolled to bottom of canvas
3.0 802.0         canvas x,y -y differs radically
296 192
296.0 992.0
152 97            when scrolled to a midpoint in the canvas
152.0 599.0
16 187
16.0 689.0

Here, the mapped canvas X is always the same as the event X because the display area

and canvas are both set at 300 pixels wide (it would be off by 2 pixels due to automatic

borders if not for the script’s highlightthickness setting). But notice that the mapped

Y is wildly different from the event Y if you click after a vertical scroll. Without scaling,

the event’s Y incorrectly points to a spot much higher in the canvas.
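The arithmetic behind this mapping can be sketched without a GUI. The function below is only an approximation of what canvasx and canvasy do, assuming no border or highlight pixels; its name and parameters are hypothetical, and xview_start and yview_start stand for the first items of the tuples returned by the canvas xview() and yview() methods.

```python
def view_to_canvas(event_x, event_y, xview_start, yview_start, scrollregion):
    """Map view-relative event x,y to overall canvas x,y.

    xview_start and yview_start are the fractions (0.0 to 1.0) of the
    scroll region that lie left of and above the visible view.
    """
    left, top, right, bottom = scrollregion
    canvas_x = left + xview_start * (right - left) + event_x
    canvas_y = top + yview_start * (bottom - top) + event_y
    return float(canvas_x), float(canvas_y)
```

With the script's 200-pixel-high view scrolled to the bottom of its 1000-pixel region, the top 800 pixels are hidden, so the vertical fraction is 0.8 and an event at (3, 2) maps to a canvas Y of about 802, in line with the session above. In real handlers, calling canvasx and canvasy directly is the simpler choice.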

Many of this book’s canvas examples need no such scaling, because (0,0) always maps to
the upper-left corner of the canvas display in which a mouse click occurs, but that holds
only because their canvases are not scrolled. See the next section for a canvas with both horizontal and

vertical scrolls; the PyTree program later in this book is similar, but it also uses dy-

namically changed scrollable region sizes when new trees are viewed.

As a rule of thumb, if your canvases scroll, be sure to scale event coordinates to true
canvas coordinates in callback handlers that care about positions. Some handlers need
not care about positions at all, because their events are bound to individual drawn objects
or embedded widgets instead of the canvas at large, but we need to move on to the next two
sections to see how.

Scrollable Canvases and Image Thumbnails

At the end of Chapter 8, we looked at a collection of scripts that display thumbnail

image links for all photos in a directory. There, we noted that scrolling is a major

requirement for large photo collections. Now that we know about canvases and scrollbars,
we can finally put them to work to implement this much-needed extension, and

conclude the image viewer story we began in Chapter 8 (well, almost).

Example 9-15 is a mutation of the last chapter’s code, which displays thumbnails in a

scrollable canvas. See the prior chapter for more details on its operation, including

the ImageTk module imported from the required Python Imaging Library (PIL) third-

party extension (needed for thumbnails and JPEG images).

In fact, to fully understand Example 9-15, you must also refer to Example 8-45, since

we’re reusing that module’s thumbnail creator and photo viewer tools. Here, we are

just adding a canvas, positioning the fixed-size thumbnail buttons at absolute coordi-

nates in the canvas, and computing the scrollable size using concepts outlined in the

prior section. Both horizontal and vertical scrollbars allow us to move through the

canvas of image buttons freely, regardless of how many there may be.
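The grid arithmetic described above can be tried in isolation, without PIL or a display. This sketch assumes square cells of one fixed size, as the example does; grid_layout is a hypothetical helper written for illustration, not a function from the book's code.

```python
import math

def grid_layout(numthumbs, linksize, numcols=None):
    """Return (scrollregion, positions) for numthumbs square cells.

    scrollregion is (0, 0, fullwidth, fullheight); positions holds the
    upper-left (x, y) of each cell, filled row by row.
    """
    if numthumbs < 1:
        return (0, 0, 0, 0), []
    if not numcols:
        numcols = int(math.ceil(math.sqrt(numthumbs)))     # roughly N x N
    numrows = int(math.ceil(numthumbs / numcols))          # true division
    scrollregion = (0, 0, linksize * numcols, linksize * numrows)
    positions = [((i % numcols) * linksize, (i // numcols) * linksize)
                 for i in range(numthumbs)]
    return scrollregion, positions
```

Each position would be passed to create_window with anchor=NW, and the scrollregion tuple to the canvas scrollregion option, so the scroll bars span exactly the area occupied by the thumbnails.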

Example 9-15. PP4E\Gui\PIL\viewer_thumbs_scrolled.py

"""

image viewer extension: uses fixed-size thumbnail buttons for uniform layout, and

adds scrolling for large image sets by displaying thumbs in a canvas widget with

scroll bars; requires PIL to view image formats such as JPEG, and reuses thumbs

maker and single photo viewer in viewer_thumbs.py; caveat/to do: this could also

scroll popped-up images that are too large for the screen, and are cropped on

Windows as is; see PyPhoto later in Chapter 11 for a much more complete version;

"""

import sys, math
from tkinter import *
from PIL.ImageTk import PhotoImage
from viewer_thumbs import makeThumbs, ViewOne

def viewer(imgdir, kind=Toplevel, numcols=None, height=300, width=300):
    """
    use fixed-size buttons, scrollable canvas;
    sets scrollable (full) size, and places thumbs at absolute x,y
    coordinates in canvas; caveat: assumes all thumbs are same size
    """
    win = kind()
    win.title('Simple viewer: ' + imgdir)
    quit = Button(win, text='Quit', command=win.quit, bg='beige')
    quit.pack(side=BOTTOM, fill=X)

    canvas = Canvas(win, borderwidth=0)
    vbar = Scrollbar(win)
    hbar = Scrollbar(win, orient='horizontal')

    vbar.pack(side=RIGHT, fill=Y)                      # pack canvas after bars
    hbar.pack(side=BOTTOM, fill=X)                     # so clipped first
    canvas.pack(side=TOP, fill=BOTH, expand=YES)

    vbar.config(command=canvas.yview)                  # call on scroll move
    hbar.config(command=canvas.xview)
    canvas.config(yscrollcommand=vbar.set)             # call on canvas move
    canvas.config(xscrollcommand=hbar.set)
    canvas.config(height=height, width=width)          # init viewable area size
                                                       # changes if user resizes
    thumbs = makeThumbs(imgdir)                        # [(imgfile, imgobj)]
    numthumbs = len(thumbs)
    if not numcols:
        numcols = int(math.ceil(math.sqrt(numthumbs))) # fixed or N x N
    numrows = int(math.ceil(numthumbs / numcols))      # 3.x true div

    linksize = max(thumbs[0][1].size)                  # (width, height)
    fullsize = (0, 0,                                  # upper left X,Y
        (linksize * numcols), (linksize * numrows) )   # lower right X,Y
    canvas.config(scrollregion=fullsize)               # scrollable area size

    rowpos = 0
    savephotos = []
    while thumbs:
        thumbsrow, thumbs = thumbs[:numcols], thumbs[numcols:]
        colpos = 0
        for (imgfile, imgobj) in thumbsrow:
            photo = PhotoImage(imgobj)
            link = Button(canvas, image=photo)
            handler = lambda savefile=imgfile: ViewOne(imgdir, savefile)
            link.config(command=handler, width=linksize, height=linksize)
            link.pack(side=LEFT, expand=YES)
            canvas.create_window(colpos, rowpos, anchor=NW,
                    window=link, width=linksize, height=linksize)
            colpos += linksize
            savephotos.append(photo)
        rowpos += linksize
    return win, savephotos

if __name__ == '__main__':
    imgdir = 'images' if len(sys.argv) < 2 else sys.argv[1]
    main, save = viewer(imgdir, kind=Tk)
    main.mainloop()

To see this program in action, make sure you’ve installed the PIL extension described

near the end of Chapter 8 and launch the script from a command line, passing the name

of the image directory to be viewed as a command-line argument:

...\PP4E\Gui\PIL> viewer_thumbs_scrolled.py C:\Users\mark\temp\101MSDCF

As before, clicking on a thumbnail image opens the corresponding image at its full size

in a new pop-up window. Figure 9-25 shows the viewer at work on a large directory

copied from my digital camera; the initial run must create and cache thumbnails, but

later runs start quickly.

Figure 9-25. Scrolled thumbnail image viewer

Or simply run the script as is from a command line, by clicking its file icon, or within

IDLE—without command-line arguments, it displays the contents of the default sam-

ple images subdirectory in the book’s source code tree, as captured in Figure 9-26.

Figure 9-26. Displaying the default images directory

Scrolling images too: PyPhoto (ahead)

Despite its evolutionary twists, the scrollable thumbnail viewer in Example 9-15 still

has one major limitation remaining: images that are larger than the physical screen are

simply truncated on Windows when popped up. This becomes glaringly obvious when

opening large photos copied from a digital camera like those in Figure 9-25. Moreover,

there is no way to resize images once opened, to open other directories, and so on. It’s

a fairly simplistic demonstration of canvas programming.

In Chapter 11, we’ll learn how to do better when we meet the PyPhoto example pro-

gram. PyPhoto will scroll the full size of images as well. In addition, it has tools for a

variety of resizing effects, and it supports saving images to files and opening other image

directories on the fly. At its core, though, PyPhoto will reuse the techniques of our

simple browser here, as well as the thumbnail generation code we wrote in the prior

chapter; much like our simple text editor earlier in the chapter, the code here is essen-

tially a prototype for the more complete PyPhoto program we’ll put together later in

Chapter 11. Stay tuned for the thrilling conclusion of the PyPhoto story (or flip ahead

now if the suspense is too much to bear).

For the purposes of this chapter, notice how in Example 9-15 the thumbnail viewer’s

actions are associated with embedded button widgets, not with the canvas itself. In

fact, the canvas isn’t much but a display device. To see how to enrich it with events of

its own, let’s move on to the next section.

Using Canvas Events

As with Text and Listbox, there is no notion of a single command callback for Canvas. Instead,

canvas programs generally use other widgets (as we did with Example 9-15’s thumbnail

buttons) or the lower-level bind call to set up handlers for mouse clicks, key presses,

and the like (as we did for Example 9-14’s scrolling canvas). Example 9-16 takes the

latter approach further, showing how to bind additional events for the canvas itself, in

order to implement a few of the more common canvas drawing operations.

Example 9-16. PP4E\Gui\Tour\canvasDraw.py

"""

draw elastic shapes on a canvas on drag, move on right click;

see canvasDraw_tags*.py for extensions with tags and animation

"""

from tkinter import *
trace = False

class CanvasEventsDemo:
    def __init__(self, parent=None):
        canvas = Canvas(width=300, height=300, bg='beige')
        canvas.pack()
        canvas.bind('<ButtonPress-1>', self.onStart)       # click
        canvas.bind('<B1-Motion>', self.onGrow)            # and drag
        canvas.bind('<Double-1>', self.onClear)            # delete all
        canvas.bind('<ButtonPress-3>', self.onMove)        # move latest
        self.canvas = canvas
        self.drawn = None
        self.kinds = [canvas.create_oval, canvas.create_rectangle]

    def onStart(self, event):
        self.shape = self.kinds[0]
        self.kinds = self.kinds[1:] + self.kinds[:1]       # start dragout
        self.start = event
        self.drawn = None

    def onGrow(self, event):                               # delete and redraw
        canvas = event.widget
        if self.drawn: canvas.delete(self.drawn)
        objectId = self.shape(self.start.x, self.start.y, event.x, event.y)
        if trace: print(objectId)
        self.drawn = objectId

    def onClear(self, event):
        event.widget.delete('all')                         # use tag all

    def onMove(self, event):
        if self.drawn:                                     # move to click spot
            if trace: print(self.drawn)
            canvas = event.widget
            diffX, diffY = (event.x - self.start.x), (event.y - self.start.y)
            canvas.move(self.drawn, diffX, diffY)
            self.start = event

if __name__ == '__main__':
    CanvasEventsDemo()
    mainloop()

This script intercepts and processes three mouse-controlled actions:

Clearing the canvas

To erase everything on the canvas, the script binds the double left-click event to

run the canvas’s delete method with the all tag—again, a built-in tag that asso-

ciates every object on the screen. Notice that the Canvas widget clicked is available

in the event object passed in to the callback handler (it’s also available as

self.canvas).

Dragging out object shapes

Pressing the left mouse button and dragging (moving it while the button is still

pressed) creates a rectangle or oval shape as you drag. This is often called dragging

out an object—the shape grows and shrinks in an elastic rubber-band fashion as

you drag the mouse and winds up with a final size and location given by the point

where you release the mouse button.

To make this work in tkinter, all you need to do is delete the old shape and draw

another as each drag event fires; both delete and draw operations are fast enough

to achieve the elastic drag-out effect. Of course, to draw a shape to the current

mouse location, you need a starting point; to delete before a redraw, you also must

remember the last drawn object’s identifier. Two events come into play: the initial

button press event saves the start coordinates (really, the initial press event object,

which contains the start coordinates), and mouse movement events erase and re-

draw from the start coordinates to the new mouse coordinates and save the new

object ID for the next event’s erase.

Object moves

When you click the right mouse button (button 3), the script moves the most

recently drawn object to the spot you clicked in a single step. The event argument

gives the (X,Y) coordinates of the spot clicked, and we subtract the saved starting

coordinates of the last drawn object to get the (X,Y) offsets to pass to the canvas

move method (again, move does not take positions). Remember to scale event coor-

dinates first if your canvas is scrolled.
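The last point above, about scrolled canvases, can be sketched as follows. This is a hypothetical variation on the script's onStart and onMove handlers, not code from the book: it stores the start point in canvas coordinates at press time, so a scroll between the press and the later click does not skew the offsets.

```python
class ScrollAwareMover:
    """Track a start point in canvas coordinates and move an item to later clicks."""
    def __init__(self, canvas, object_id, start_x=0, start_y=0):
        self.canvas = canvas
        self.object_id = object_id
        self.start = (start_x, start_y)          # canvas (not view) coordinates

    def to_canvas(self, event):
        # canvasx/canvasy translate view-relative event x,y to canvas x,y
        return self.canvas.canvasx(event.x), self.canvas.canvasy(event.y)

    def on_start(self, event):
        self.start = self.to_canvas(event)

    def on_move(self, event):
        x, y = self.to_canvas(event)
        dx, dy = x - self.start[0], y - self.start[1]
        self.canvas.move(self.object_id, dx, dy)  # move takes offsets
        self.start = (x, y)
        return dx, dy
```

The two methods would be registered with canvas.bind for the press and right-click events, just as onStart and onMove are in the script.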

The net result creates a window like that shown in Figure 9-27 after user interaction.

As you drag out objects, the script alternates between ovals and rectangles; set the

script’s trace global to watch object identifiers scroll on stdout as new objects are drawn

during a drag. This screenshot was taken after a few object drag-outs and moves, but

you’d never tell from looking at it; run this example on your own computer to get a

better feel for the operations it supports.

Figure 9-27. canvasDraw after a few drags and moves

Binding events on specific items

Much like we did for the Text widget, it is also possible to bind events for one or more

specific objects drawn on a Canvas with its tag_bind method. This call accepts either a

tag name string or an object ID in its first argument. For instance, you can register a

different callback handler for mouse clicks on every drawn item or on any in a group

of drawn and tagged items, rather than for the entire canvas at large. Example 9-17

binds a double-click handler on both the canvas itself and on two specific text items

within it, to illustrate the interfaces. It generates Figure 9-28 when run.

Example 9-17. PP4E\Gui\Tour\canvas-bind.py

# bind events on both canvas and its items

from tkinter import *

def onCanvasClick(event):
    print('Got canvas click', event.x, event.y, event.widget)

def onObjectClick(event):
    print('Got object click', event.x, event.y, event.widget, end=' ')
    print(event.widget.find_closest(event.x, event.y))   # find text object's ID

root = Tk()
canv = Canvas(root, width=100, height=100)
obj1 = canv.create_text(50, 30, text='Click me one')
obj2 = canv.create_text(50, 70, text='Click me two')

canv.bind('<Double-1>', onCanvasClick)                  # bind to whole canvas
canv.tag_bind(obj1, '<Double-1>', onObjectClick)        # bind to drawn item
canv.tag_bind(obj2, '<Double-1>', onObjectClick)        # a tag works here too
canv.pack()
root.mainloop()

Figure 9-28. Canvas-bind window

Object IDs are passed to tag_bind here, but a tag name string would work too, and

would allow you to associate multiple canvas objects as a group for event purposes.

When you click outside the text items in this script’s window, the canvas event handler

fires; when either text item is clicked, both the canvas and the text object handlers fire.

Here is the stdout result after clicking on the canvas twice and on each text item once;


the script uses the canvas find_closest method to fetch the object ID of the particular

text item clicked (the one closest to the click spot):

C:\...\PP4E\Gui\Tour> python canvas-bind.py
Got canvas click 3 6 .8217952              <-- canvas clicks
Got canvas click 46 52 .8217952
Got object click 51 33 .8217952 (1,)       <-- first text click
Got canvas click 51 33 .8217952
Got object click 55 69 .8217952 (2,)       <-- second text click
Got canvas click 55 69 .8217952
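Since a tag name can stand in for the object IDs, here is a sketch of that variation on Example 9-17. It is not code from the book's examples package: the function name and the tag name 'clickable' are made up for illustration. It tags both text items when they are created and registers the handler once for the whole group.

```python
def make_clickable_texts(canv, handler, tag='clickable'):
    """Draw two text items that share a tag and bind one handler to the group."""
    ids = [canv.create_text(50, 30, text='Click me one', tags=tag),
           canv.create_text(50, 70, text='Click me two', tags=tag)]
    canv.tag_bind(tag, '<Double-1>', handler)   # one bind covers every tagged item
    return ids
```

With a real canvas, a call such as make_clickable_texts(canv, onObjectClick) would take the place of the two create_text calls and the two tag_bind calls in the example.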

We’ll revisit the notion of events bound to canvases in the PyDraw example in Chapter 11,
where we’ll use them to implement a feature-rich paint and motion program.
We’ll also return to the canvasDraw script later in this chapter, to add tag-based moves
and simple animation with time-based tools, so keep this page bookmarked for reference.
First, though, let’s follow a promising side road to explore another way to lay out
widgets within windows: the gridding layout model.

Grids

So far, we’ve mostly been arranging widgets in displays by calling their pack methods—

an interface to the packer geometry manager in tkinter. We’ve also used absolute coordinates

in canvases, which are a kind of layout scheme, too, but not a high-level

managed one like the packer. This section introduces grid, the most commonly used

alternative to the packer. We previewed this alternative in Chapter 8 when discussing

input forms and arranging image thumbnails. Here, we’ll study gridding in its full form.

As we learned earlier, tkinter geometry managers work by arranging child widgets

within a parent container widget (parents are typically Frames or top-level windows).

When we ask a widget to pack or grid itself, we’re really asking its parent to place it

among its siblings. With pack, we provide constraints or sides and let the geometry

manager lay out widgets appropriately. With grid, we arrange widgets in rows and

columns in their parent, as though the parent container widget were a table.

Gridding is an entirely distinct geometry management system in tkinter. In fact, at this

writing, pack and grid are mutually exclusive for widgets that have the same parent—

within a given parent container, we can either pack widgets or grid them, but we cannot

do both. That makes sense, if you realize that geometry managers do their jobs as

parents, and a widget can be arranged by only one geometry manager.

Why Grids?

At least within one container, though, that means you must pick either grid or pack

and stick with it. So why grid, then? In general, grid is handy for displays in which

otherwise unrelated widgets must line up horizontally. This includes both tabular displays

and form-like displays; arranging input fields in row/column grid fashion can be

at least as easy as laying out the display with nested frames.


As mentioned in the preceding chapter, input forms are generally best arranged either

as grids or as row frames with fixed-width labels, so that labels and entry fields line up

horizontally as expected on all platforms (as we learned, column frames don’t work

reliably, because they may misalign rows). Although grids and row frames are roughly

the same amount of work, grids are useful if calculating maximum label width is inconvenient.

Moreover, grids also apply to tables more complex than forms.

As we’ll see, though, for input forms, grid doesn’t offer substantial code or complexity

savings compared to equivalent packer solutions, especially when things like resizability

are added to the GUI picture. In other words, the choice between the two layout

schemes is often largely one of style, not technology.

Grid Basics: Input Forms Revisited

Let’s start off with the basics; Example 9-18 lays out a table of Labels and Entry fields—

widgets we’ve already met. Here, though, they are arrayed on a grid.

Example 9-18. PP4E\Gui\Tour\Grid\grid1.py

from tkinter import *
colors = ['red', 'green', 'orange', 'white', 'yellow', 'blue']

r = 0
for c in colors:
    Label(text=c, relief=RIDGE, width=25).grid(row=r, column=0)
    Entry(bg=c, relief=SUNKEN, width=50).grid(row=r, column=1)
    r += 1

mainloop()

Gridding assigns widgets to row and column numbers, which both begin at number 0;

tkinter uses these coordinates, along with widget size in general, to lay out the container’s

display automatically. This is similar to the packer, except that rows and columns

replace the packer’s notion of sides and packing order.

When run, this script creates the window shown in Figure 9-29, pictured with data

typed into a few of the input fields. Once again, this book won’t do justice to the colors

displayed on the right, so you’ll have to stretch your imagination a little (or run this

script on a computer of your own).

Despite its colors, this is really just a classic input form layout again, of the same kind

we met in the prior chapter. Labels on the left describe data to type into entry fields on

the right. Here, though, we achieve the layout with gridding instead of packed frames.

Just for fun, this script displays color names on the left and the entry field of the corresponding

color on the right. It achieves its table-like layout with these lines:

Label(...).grid(row=r, column=0)

Entry(...).grid(row=r, column=1)


From the perspective of the container window, the label is gridded to column 0 in the

current row number (a counter that starts at 0) and the entry is placed in column 1.

The upshot is that the grid system lays out all the labels and entries in a two-dimensional

table automatically, with both evenly sized rows and evenly sized columns large enough

to hold the largest item in each column.

That is, because widgets are arranged by both row and column when gridded, they align

properly in both dimensions. Although packed row frames can achieve the same effect

if labels are fixed width (as we learned in Chapter 8), grids directly reflect the structure

of tabular displays; this includes input forms, as well as larger tables in general. The

next section illustrates this difference in code.

Comparing grid and pack

Time for some compare-and-contrast: Example 9-19 implements the same sort of

colorized input form with both grid and pack, to make it easy to see the differences

between the two approaches.

Example 9-19. PP4E\Gui\Tour\Grid\grid2.py

"""
add equivalent pack window using row frames and fixed-width labels;
Labels and Entrys in packed column frames may not line up horizontally;
same length code, though enumerate built-in could trim 2 lines off grid;
"""

from tkinter import *
colors = ['red', 'green', 'orange', 'white', 'yellow', 'blue']

def gridbox(parent):
    "grid by row/column numbers"
    row = 0
    for color in colors:
        lab = Label(parent, text=color, relief=RIDGE, width=25)
        ent = Entry(parent, bg=color, relief=SUNKEN, width=50)
        lab.grid(row=row, column=0)
        ent.grid(row=row, column=1)
        ent.insert(0, 'grid')
        row += 1

def packbox(parent):
    "row frames with fixed-width labels"
    for color in colors:
        row = Frame(parent)
        lab = Label(row, text=color, relief=RIDGE, width=25)
        ent = Entry(row, bg=color, relief=SUNKEN, width=50)
        row.pack(side=TOP)
        lab.pack(side=LEFT)
        ent.pack(side=RIGHT)
        ent.insert(0, 'pack')

if __name__ == '__main__':
    root = Tk()
    gridbox(Toplevel())
    packbox(Toplevel())
    Button(root, text='Quit', command=root.quit).pack()
    mainloop()

Figure 9-29. The grid geometry manager in pseudoliving color

The pack version here uses row frames with fixed-width labels (again, column frames

can skew rows). The basic label and entry widgets are created the same way by these

two functions, but they are arranged in very different ways:

• With pack, we use side options to attach labels and rows on the left and right, and

create a Frame for each row (itself attached to the parent’s current top).

• With grid, we instead assign each widget a row and column position in the implied

tabular grid of the parent, using options of the same name.

As we’ve learned, with pack, the packing order can matter, too: a widget gets an entire

side of the remaining space (mostly irrelevant here), and items packed first are clipped

last (labels and topmost rows disappear last here). The grid alternative achieves the

same clipping effect by virtue of grid behavior. Running the script makes the windows

in Figure 9-30—one window for each scheme.

If you study this example closely, you’ll find that the difference in the amount of code

required for each layout scheme is roughly a wash, at least in this simple form. The

pack scheme must create a Frame per row, but the grid scheme must keep track of the

current row number.

In fact, both schemes require the same number of code lines as shown, though to be

fair we could shave one line from each by packing or gridding the label immediately,

and could shave two more lines from the grid layout by using the built-in enumerate

function to avoid manual counting. Here’s a minimalist’s version of the grid box function

for reference:

def gridbox(parent):
    for (row, color) in enumerate(colors):
        Label(parent, text=color, relief=RIDGE, width=25).grid(row=row, column=0)
        ent = Entry(parent, bg=color, relief=SUNKEN, width=50)
        ent.grid(row=row, column=1)
        ent.insert(0, 'grid')

We’ll leave further code compaction to the more serious sports fans in the audience

(this code isn’t too horrific, but making your code concise in general is not always in

your coworkers’ best interest!). Irrespective of coding tricks, the complexity of packing

and gridding here seems similar. As we’ll see later, though, gridding can require more

code when widget resizing is factored into the mix.

Combining grid and pack

Notice that the prior section’s Example 9-19 passes a brand-new Toplevel to each form

constructor function so that the grid and pack versions wind up in distinct top-level

windows. Because the two geometry managers are mutually exclusive within a given

parent container, we have to be careful not to mix them improperly. For instance,

Example 9-20 is able to put both the packed and the gridded widgets on the same

window, but only by isolating each in its own Frame container widget.

Example 9-20. PP4E\Gui\Tour\Grid\grid2-same.py

"""
build pack and grid forms on different frames in same window;
can't grid and pack in same parent container (e.g., root window)
but can mix in same window if done in different parent frames;
"""

from tkinter import *
from grid2 import gridbox, packbox

root = Tk()

Label(root, text='Grid:').pack()
frm = Frame(root, bd=5, relief=RAISED)
frm.pack(padx=5, pady=5)
gridbox(frm)

Label(root, text='Pack:').pack()
frm = Frame(root, bd=5, relief=RAISED)
frm.pack(padx=5, pady=5)
packbox(frm)

Button(root, text='Quit', command=root.quit).pack()
mainloop()

Figure 9-30. Equivalent grid and pack windows

When this runs we get a composite window with two forms that look identical (Figure 9-31),

but the two nested frames are actually controlled by completely different

geometry managers.

Figure 9-31. grid and pack in the same window

On the other hand, the sort of code in Example 9-21 fails badly, because it attempts to

use pack and grid within the same parent—only one geometry manager can be used

on any one parent.

Example 9-21. PP4E\Gui\Tour\Grid\grid2-fails.py

"""
FAILS-- can't grid and pack in same parent container (here, root window)
"""

from tkinter import *
from grid2 import gridbox, packbox

root = Tk()
gridbox(root)
packbox(root)
Button(root, text='Quit', command=root.quit).pack()
mainloop()

This script passes the same parent (the top-level window) to each function in an effort
to make both forms appear in one window. It also utterly hangs the Python process on
my machine, without ever showing any windows at all (on some versions of Windows,
I’ve had to resort to Ctrl-Alt-Delete to kill it; on others, the Command Prompt shell
window must sometimes be restarted altogether). Newer Tk releases (8.6 and later)
detect this conflict and raise a TclError instead of hanging.

Geometry manager combinations can be subtle until you get the hang of this. To make

this example work, for instance, we simply need to isolate the grid box in a parent

container all its own to keep it away from the packing going on in the root window—

as in the following alternative code:

root = Tk()
frm = Frame(root)
frm.pack()              # this works
gridbox(frm)            # gridbox must have its own parent in which to grid
packbox(root)
Button(root, text='Quit', command=root.quit).pack()
mainloop()

Again, today you must either pack or grid within one parent, but not both. It’s possible
that this restriction may be lifted in the future, but it’s been a long-lived constraint, and
it seems unlikely to be removed, given the disparity in the two geometry manager
schemes; try your Python to be sure.

Making Gridded Widgets Expandable

And now, some practical bits: the grids we’ve seen so far are fixed in size; they do not

grow when the enclosing window is resized by a user. Example 9-22 implements an

unreasonably patriotic input form with both grid and pack again, but adds the configuration

steps needed to make all widgets in both windows expand along with their

window on a resize.

Example 9-22. PP4E\Gui\Tour\Grid\grid3.py

"add a label on the top and form resizing"

from tkinter import *
colors = ['red', 'white', 'blue']

def gridbox(root):
    Label(root, text='Grid').grid(columnspan=2)
    row = 1
    for color in colors:
        lab = Label(root, text=color, relief=RIDGE, width=25)
        ent = Entry(root, bg=color, relief=SUNKEN, width=50)
        lab.grid(row=row, column=0, sticky=NSEW)
        ent.grid(row=row, column=1, sticky=NSEW)
        root.rowconfigure(row, weight=1)
        row += 1
    root.columnconfigure(0, weight=1)
    root.columnconfigure(1, weight=1)

def packbox(root):
    Label(root, text='Pack').pack()
    for color in colors:
        row = Frame(root)
        lab = Label(row, text=color, relief=RIDGE, width=25)
        ent = Entry(row, bg=color, relief=SUNKEN, width=50)
        row.pack(side=TOP, expand=YES, fill=BOTH)
        lab.pack(side=LEFT, expand=YES, fill=BOTH)
        ent.pack(side=RIGHT, expand=YES, fill=BOTH)

root = Tk()
gridbox(Toplevel(root))
packbox(Toplevel(root))
Button(root, text='Quit', command=root.quit).pack()
mainloop()

When run, this script makes the scene in Figure 9-32. It builds distinct pack and grid

windows again, with entry fields on the right colored red, white, and blue (or for readers

not working along on a computer, gray, white, and a marginally darker gray).

Figure 9-32. grid and pack windows before resizing

This time, though, resizing both windows with mouse drags makes all their embedded

labels and entry fields expand along with the parent window, as we see in Figure 9-33

(with text typed into the form).


Figure 9-33. grid and pack windows resized

As coded, shrinking the pack window clips items packed last; shrinking the grid window

shrinks all labels and entries together, unlike grid2’s default behavior (try this on

your own).

Resizing in grids

Now that I’ve shown you what these windows do, I need to explain how they do it.

We learned in Chapter 7 how to make widgets expand with pack: we use expand and

fill options to increase space allocations and stretch into them, respectively. To make

expansion work for widgets arranged by grid, we need to use different protocols. Rows

and columns must be marked with a weight to make them expandable, and widgets

must also be made sticky so that they are stretched within their allocated grid cell:

Heavy rows and columns

With pack, we make each row expandable by making the corresponding Frame

expandable, with expand=YES and fill=BOTH. Gridders must be a bit more specific:

to get full expandability, call the grid container’s rowconfigure method for each

row and its columnconfigure for each column. To both methods, pass a weight

option with a value greater than zero to enable rows and columns to expand.

Weight defaults to zero (which means no expansion), and the grid container in this

script is just the top-level window. Using different weights for different rows and

columns makes them grow at proportionally different rates.

Sticky widgets

With pack, we use fill options to stretch widgets to fill their allocated space horizontally
or vertically, and anchor options to position widgets within their allocated
space. With grid, the sticky option serves the roles of both fill and anchor in the
packer. Gridded widgets can optionally be made sticky on one side of their allocated
cell space (like anchor) or on opposite sides to make them stretch
(like fill). Widgets can be made sticky in four directions (N, S, E, and W), and
concatenations of these letters specify multiple-side stickiness. For instance, a
sticky setting of W left justifies the widget in its allocated space (like a packer
anchor=W), and NS stretches the widget vertically within its allocated space (like
a packer fill=Y).

Widget stickiness hasn’t been useful in examples thus far because the layouts were
regularly sized (widgets were no smaller than their allocated grid cell space), and
resizes weren’t supported at all. Here, though, Example 9-22 specifies NSEW stickiness
to make widgets stretch in all directions with their allocated cells.

Different combinations of row and column weights and sticky settings generate different
resize effects. For instance, deleting the columnconfigure lines in the grid3 script makes
the display expand vertically but not horizontally. Try changing some of these settings
yourself to see the sorts of effects they produce.
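To make the weight rule concrete, here is a small worked sketch. The share_extra function is a hypothetical, simplified model of proportional growth; it is not Tk's actual layout algorithm, which also accounts for minimum sizes and padding. The weighted_grid function is an illustrative layout, not part of grid3, that gives the entry column three times the growth of the label column. tkinter is imported inside that function only so that the arithmetic can be run without a display.

```python
def share_extra(extra_pixels, weights):
    """Split extra space among rows or columns in proportion to their weights."""
    total = sum(weights)
    if total == 0:                     # all weights zero: nothing expands
        return [0] * len(weights)
    return [extra_pixels * weight // total for weight in weights]

def weighted_grid(parent):
    """Grid one label/entry row whose columns grow at different rates."""
    from tkinter import Label, Entry, NSEW, RIDGE, SUNKEN
    Label(parent, text='name', relief=RIDGE).grid(row=0, column=0, sticky=NSEW)
    Entry(parent, relief=SUNKEN).grid(row=0, column=1, sticky=NSEW)
    parent.rowconfigure(0, weight=1)
    parent.columnconfigure(0, weight=1)    # label column gets 1 share
    parent.columnconfigure(1, weight=3)    # entry column gets 3 shares

print(share_extra(120, [1, 3]))    # [30, 90]
print(share_extra(120, [0, 0]))    # [0, 0]
```

Calling weighted_grid on a new Tk or Toplevel window, starting mainloop, and dragging the window wider should show the entry absorbing most of the added width.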

Spanning columns and rows

There is one other big difference in how the grid3 script configures its windows. Both

the grid and the pack windows display a label on the top that spans the entire window.

For the packer scheme, we simply make a label attached to the top of the window at

large (remember, side defaults to TOP):

Label(root, text='Pack').pack()

Because this label is attached to the window’s top before any row frames are, it appears

across the entire window top as expected. But laying out such a label takes a bit more

work in the rigid world of grids; the first line of the grid implementation function does

it like this:

Label(root, text='Grid').grid(columnspan=2)

To make a widget span across multiple columns, we pass grid a columnspan option with

a spanned-column count. Here, it just specifies that the label at the top of the window

should stretch over the entire window—across both the label and the entry columns.

To make a widget span across multiple rows, pass a rowspan option instead. The regular

layouts of grids can be either an asset or a liability, depending on how regular your user

interface will be; these two span settings let you specify exceptions to the rule when

needed.

So which geometry manager comes out on top here? When resizing is factored in, as

in the script in Example 9-22, gridding actually becomes slightly more complex; in fact,

gridding requires three extra lines of code. On the other hand, enumerate could again

make the race close, grid is still convenient for simple forms, and your grids and packs

may vary.


For more on input form layout, stay tuned for the form builder utilities

we’ll code near the end of Chapter 12 and use again in Chapter 13, when

developing a file transfer and FTP client user interface. As we’ll see,

doing forms well once allows us to skip the details later. We’ll also use

more custom form layout code in the PyEdit program’s change dialog

in Chapter 11, and the PyMailGUI example’s email header fields in

Chapter 14.

Laying Out Larger Tables with grid

So far, we’ve been building two-column arrays of labels and input fields. That’s typical

of input forms, but the tkinter grid manager is capable of configuring much grander

matrixes. For instance, Example 9-23 builds a five-row by four-column array of labels,

where each label simply displays its row and column number (row.col). When run, the

window in Figure 9-34 appears on-screen.

Example 9-23. PP4E\Gui\Tour\Grid\grid4.py

# simple 2D table, in default Tk root window

from tkinter import *

for i in range(5):
    for j in range(4):
        lab = Label(text='%d.%d' % (i, j), relief=RIDGE)
        lab.grid(row=i, column=j, sticky=NSEW)

mainloop()

Figure 9-34. A 5 × 4 array of coordinate labels

If you think this is starting to look like it might be a way to program spreadsheets, you

may be on to something. Example 9-24 takes this idea a bit further and adds a button

that prints the table’s current input field values to the stdout stream (usually, to the

console window).


Example 9-24. PP4E\Gui\Tour\Grid\grid5.py

# 2D table of input fields, default Tk root window

from tkinter import *

rows = []
for i in range(5):
    cols = []
    for j in range(4):
        ent = Entry(relief=RIDGE)
        ent.grid(row=i, column=j, sticky=NSEW)
        ent.insert(END, '%d.%d' % (i, j))
        cols.append(ent)
    rows.append(cols)

def onPress():
    for row in rows:
        for col in row:
            print(col.get(), end=' ')
        print()

Button(text='Fetch', command=onPress).grid()
mainloop()

When run, this script creates the window in Figure 9-35 and saves away all the grid’s

entry field widgets in a two-dimensional list of lists. When its Fetch button is pressed,

the script steps through the saved list of lists of entry widgets, to fetch and display all

the current values in the grid. Here is the output of two Fetch presses—one before I

made input field changes, and one after:

C:\...\PP4E\Gui\Tour\Grid> python grid5.py
0.0 0.1 0.2 0.3
1.0 1.1 1.2 1.3
2.0 2.1 2.2 2.3
3.0 3.1 3.2 3.3
4.0 4.1 4.2 4.3

0.0 0.1 0.2 42
1.0 1.1 1.2 43
2.0 2.1 2.2 44
3.0 3.1 3.2 45
4.0 4.1 4.2 46

Now that we know how to build and step through arrays of input fields, let’s add a few

more useful buttons. Example 9-25 adds another row to display column sums and adds

buttons to clear all fields to zero and calculate column sums.

Example 9-25. PP4E\Gui\Tour\Grid\grid5b.py

# add column sums, clearing

from tkinter import *
numrow, numcol = 5, 4

rows = []
for i in range(numrow):
    cols = []
    for j in range(numcol):
        ent = Entry(relief=RIDGE)
        ent.grid(row=i, column=j, sticky=NSEW)
        ent.insert(END, '%d.%d' % (i, j))
        cols.append(ent)
    rows.append(cols)

sums = []
for i in range(numcol):
    lab = Label(text='?', relief=SUNKEN)
    lab.grid(row=numrow, column=i, sticky=NSEW)
    sums.append(lab)

def onPrint():
    for row in rows:
        for col in row:
            print(col.get(), end=' ')
        print()

def onSum():
    tots = [0] * numcol
    for i in range(numcol):
        for j in range(numrow):
            tots[i] += eval(rows[j][i].get())        # sum column
    for i in range(numcol):
        sums[i].config(text=str(tots[i]))            # display in GUI

def onClear():
    for row in rows:
        for col in row:
            col.delete('0', END)
            col.insert(END, '0.0')
    for sum in sums:
        sum.config(text='?')

import sys
Button(text='Sum', command=onSum).grid(row=numrow+1, column=0)
Button(text='Print', command=onPrint).grid(row=numrow+1, column=1)
Button(text='Clear', command=onClear).grid(row=numrow+1, column=2)
Button(text='Quit', command=sys.exit).grid(row=numrow+1, column=3)
mainloop()

Figure 9-35. A larger grid of input fields

Figure 9-36 shows this script at work summing up four columns of numbers; to get a

different-size table, change the numrow and numcol variables at the top of the script.
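The column-summing logic in onSum can also be separated from the GUI for illustration. The sketch below is an alternative, not the script's code: column_sums is a hypothetical helper that takes the cell strings as a list of row lists and converts each with float rather than eval, so it accepts only plain numbers.

```python
def column_sums(rows, convert=float):
    """Sum each column of a row-major table of cell strings."""
    # zip(*rows) transposes the table, so each tuple it yields is one column
    return [sum(convert(cell) for cell in column) for column in zip(*rows)]

table = [['0', '0.5'],
         ['1', '1.5'],
         ['2', '2.5']]
print(column_sums(table))          # [3.0, 4.5]
```

In the GUI, the input table could be built from the saved entry widgets with a nested comprehension that calls each entry's get method.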

Figure 9-36. Adding column sums

And finally, Example 9-26 is one last extension that is coded as a class for reusability,

and it adds a button to load the table’s data from a file. Data files are assumed to be

coded as one line per row, with whitespace (spaces or tabs) between each column within

a row line. Loading a file of data automatically resizes the table GUI to accommodate

the number of columns in the table based upon the file’s content.
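The file format just described can be parsed with a few lines of Python. This is a sketch only: parse_table is a hypothetical helper, not part of the example that follows, and unlike a loader that trusts its input it checks that every row has as many columns as the first, which is an added assumption about what well-formed data looks like.

```python
def parse_table(text):
    """Split datafile text into a list of rows, each a list of field strings."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    width = len(rows[0]) if rows else 0
    for number, fields in enumerate(rows, start=1):
        if len(fields) != width:
            raise ValueError('row %d has %d columns, expected %d'
                             % (number, len(fields), width))
    return rows

sample = "1 2 3\n4\t5\t6\n"
print(parse_table(sample))         # [['1', '2', '3'], ['4', '5', '6']]
```

Because fields are split on whitespace, a single field cannot itself contain spaces or tabs.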

Example 9-26. PP4E\Gui\Tour\Grid\grid5c.py

# recode as an embeddable class

from tkinter import *
from tkinter.filedialog import askopenfilename
from PP4E.Gui.Tour.quitter import Quitter            # reuse, pack, and grid

class SumGrid(Frame):
    def __init__(self, parent=None, numrow=5, numcol=5):
        Frame.__init__(self, parent)
        self.numrow = numrow                          # I am a frame container
        self.numcol = numcol                          # caller packs or grids me
        self.makeWidgets(numrow, numcol)              # else only usable one way

    def makeWidgets(self, numrow, numcol):
        self.rows = []
        for i in range(numrow):
            cols = []
            for j in range(numcol):
                ent = Entry(self, relief=RIDGE)
                ent.grid(row=i+1, column=j, sticky=NSEW)
                ent.insert(END, '%d.%d' % (i, j))
                cols.append(ent)
            self.rows.append(cols)

        self.sums = []
        for i in range(numcol):
            lab = Label(self, text='?', relief=SUNKEN)
            lab.grid(row=numrow+1, column=i, sticky=NSEW)
            self.sums.append(lab)

        Button(self, text='Sum', command=self.onSum).grid(row=0, column=0)
        Button(self, text='Print', command=self.onPrint).grid(row=0, column=1)
        Button(self, text='Clear', command=self.onClear).grid(row=0, column=2)
        Button(self, text='Load', command=self.onLoad).grid(row=0, column=3)
        Quitter(self).grid(row=0, column=4)           # fails: Quitter(self).pack()

    def onPrint(self):
        for row in self.rows:
            for col in row:
                print(col.get(), end=' ')
            print()

    def onSum(self):
        tots = [0] * self.numcol
        for i in range(self.numcol):
            for j in range(self.numrow):
                tots[i] += eval(self.rows[j][i].get())    # sum current data
        for i in range(self.numcol):
            self.sums[i].config(text=str(tots[i]))

    def onClear(self):
        for row in self.rows:
            for col in row:
                col.delete('0', END)                  # delete content
                col.insert(END, '0.0')                # preserve display
        for sum in self.sums:
            sum.config(text='?')

    def onLoad(self):
        file = askopenfilename()
        if file:
            for row in self.rows:
                for col in row: col.grid_forget()             # erase current gui
            for sum in self.sums:
                sum.grid_forget()

            filelines = open(file, 'r').readlines()           # load file data
            self.numrow = len(filelines)                      # resize to data
            self.numcol = len(filelines[0].split())
            self.makeWidgets(self.numrow, self.numcol)

            for (row, line) in enumerate(filelines):          # load into gui
                fields = line.split()
                for col in range(self.numcol):
                    self.rows[row][col].delete('0', END)
                    self.rows[row][col].insert(END, fields[col])

if __name__ == '__main__':
    import sys
    root = Tk()
    root.title('Summer Grid')
    if len(sys.argv) != 3:
        SumGrid(root).pack()        # .grid() works here too
    else:
        rows, cols = eval(sys.argv[1]), eval(sys.argv[2])
        SumGrid(root, rows, cols).pack()
    mainloop()

Notice that this module’s SumGrid class is careful not to either grid or pack itself. In

order to be attachable to containers where other widgets are being gridded or packed,

it leaves its own geometry management ambiguous and requires callers to pack or grid

its instances. It’s OK for containers to pick either scheme for their own children because

they effectively seal off the pack-or-grid choice. But attachable component classes that

aim to be reused under both geometry managers cannot manage themselves because

they cannot predict their parent’s policy.

This is a fairly long example that doesn’t say much else about gridding or widgets in

general, so I’ll leave most of it as suggested reading and just show what it does. Figure 9-37

shows the initial window created by this script after changing the last column

and requesting a sum; make sure the directory containing the PP4E examples root is

on your module search path (e.g., PYTHONPATH) for the package import.

By default, the class makes the 5 × 5 grid here, but we can pass in other dimensions to

both the class constructor and the script’s command line. When you press the Load

button, you get the standard file selection dialog we met earlier on this tour

(Figure 9-38).

The datafile grid5-data1.txt contains seven rows and six columns of data:

C:\...\PP4E\Gui\Tour\Grid> type grid5-data1.txt

1 2 3 4 5 6

Loading this file’s data into our GUI makes the dimensions of the grid change accordingly:
the class simply reruns its widget construction logic after erasing all the old entry
widgets with the grid_forget method. The grid_forget method unmaps gridded widgets
and so effectively erases them from the display. Also watch for the pack_forget
widget and window withdraw methods used in the after event “alarm” examples of the
next section for other ways to erase and redraw GUI components.
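As a side note on erasing, tkinter widgets offer two ways to unmap a gridded widget: grid_forget, used by the example, and grid_remove, which hides the widget but remembers its grid options. The helper below is a hypothetical sketch of the difference, not part of the SumGrid class.

```python
def hide_widgets(widgets, remember=False):
    """Unmap gridded widgets; remember=True keeps their grid options."""
    for widget in widgets:
        if remember:
            widget.grid_remove()    # a later widget.grid() restores the old cell
        else:
            widget.grid_forget()    # grid options are discarded
```

Neither call destroys the widget object itself, so a program that rebuilds its display many times may prefer each widget's destroy method for widgets it no longer needs.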

Once the GUI is erased and redrawn for the new data, Figure 9-39 captures the scene

after the file Load and a new Sum have been requested by the user in the GUI.

Figure 9-37. Adding datafile loads

Figure 9-38. Opening a data file for SumGrid

Figure 9-39. Data file loaded, displayed, and summed

The grid5-data2.txt datafile has the same dimensions but contains expressions in two

of its columns, not just simple numbers. Because this script converts input field values

with the Python eval built-in function, any Python syntax will work in this table’s fields,

as long as it can be parsed and evaluated within the scope of the onSum method:

C:\...\PP4E\Gui\Tour\Grid> type grid5-data2.txt
1 2 3 2*2 5 6
1 3-1 3 2<<1 5 6
1 5%3 3 pow(2,2) 5 6
1 2 3 2**2 5 6
1 2 3 [4,3][0] 5 6
1 {'a':2}['a'] 3 len('abcd') 5 6
1 abs(-2) 3 eval('2+2') 5 6

Summing these fields runs the Python code they contain, as seen in Figure 9-40. This

can be a powerful feature; imagine a full-blown spreadsheet grid, for instance—field

values could be Python code “snippets” that compute values on the fly, call functions

in modules, and even download current stock quotes over the Internet with tools we’ll

meet in the next part of this book.

Figure 9-40. Python expressions in the data and table

It’s also a potentially dangerous tool—a field might just contain an expression that

erases your hard drive!† If you’re not sure what expressions may do, either don’t use

eval (convert with more limited built-in functions like int and float instead) or make

sure your Python is running in a process with restricted access permissions for system

components you don’t want to expose to the code you run.
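As one way to follow the first suggestion, the sketch below converts cell text with int and float instead of eval. The name to_number is hypothetical; the trade-off is that expression cells such as 2*2 are rejected rather than computed.

```python
def to_number(text):
    """Convert cell text to an int or float without running it as code."""
    try:
        return int(text)
    except ValueError:
        return float(text)          # raises ValueError if not numeric either

for cell in ['42', '3.5', "__import__('os').getcwd()"]:
    try:
        print(repr(cell), '=>', to_number(cell))
    except ValueError:
        print(repr(cell), '=> rejected')
```

When cells may hold Python literals such as lists or strings, the standard library's ast.literal_eval is another option, since it evaluates literal syntax only and does not run function calls.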

Of course, this still is nowhere near a true spreadsheet program. There are fixed column
sums and file loads, for instance, but individual cells cannot contain formulas based
upon other cells. In the interest of space, further mutations toward that goal are left as
exercises.

† I debated showing this, but understanding a danger is a big part of avoiding it: if the Python process
had permission to delete files, passing the code string __import__('os').system('rm -rf *') to eval on Unix
would delete all files at and below the current directory by running a shell command (and 'rmdir /S /Q .'
would have a similar effect on Windows). Don’t do this! To see how this works in a less devious and potentially
useful way, type __import__('math').pi into one of the GUI table’s cells; on Sum, the cell evaluates to pi
(3.14159). Passing "__import__('os').system('dir')" to eval interactively proves the point safely as well.
All of this also applies to the exec built-in: eval runs expression strings and exec runs statements, and expressions
are statements (though not vice versa). A typical user of most GUIs is unlikely to type this kind of code
accidentally, of course, especially if that user is always you, but be careful out there!

I should also point out that there is more to gridding than we have time to present fully

here. For instance, by creating subframes that have grids of their own, we can build up

more sophisticated layouts as component hierarchies in much the same way as nested

frames arranged with the packer. For now, let’s move on to one last widget survey topic.

Time Tools, Threads, and Animation

The last stop on our widget tour is perhaps the most unusual. tkinter also comes with

a handful of tools that have to do with the event-driven programming model, not

graphics displayed on a computer screen.

Some GUI applications need to perform background activities periodically. For exam-

ple, to “blink” a widget’s appearance, we’d like to register a callback handler to be

invoked at regular time intervals. Similarly, it’s not a good idea to let a long-running

file operation block other activity in a GUI; if the event loop could be forced to update

periodically, the GUI could remain responsive. tkinter comes with tools for both sched-

uling such delayed actions and forcing screen updates:

widget.after(milliseconds, function, *args)

This tool schedules the function to be called once by the GUI’s event processing

system after a number of milliseconds. This form of the call does not pause the

program—the callback function is scheduled to be run later from the normal

tkinter event loop, but the calling program continues normally, and the GUI re-

mains active while the function call is pending. As also discussed in Chapter 5,

unlike the threading module’s Timer object, widget.after events are dispatched in

the main GUI thread and so can freely update the GUI.

The function event handler argument can be any callable Python object: a function,

bound method, lambda and so on. The milliseconds timer duration argument is

an integer which can be used to specify both fractions and multiples of a second;

its value divided by 1,000 gives equivalent seconds. Any args arguments are passed

by position to function when it is later called.

In practice, a lambda can be used in place of individually-listed arguments to make

the association of arguments to function explicit, but that is not required. When

the function is a method, object state information (attributes) might also provide

its data instead of listed arguments. The after method returns an ID that can be

passed to after_cancel to cancel the callback. Since this method is so commonly

used, I’ll say more about it by example in a moment.

widget.after(milliseconds)

This tool pauses the calling program for a number of milliseconds—for example,

an argument of 5,000 pauses the caller for 5 seconds. This is essentially equivalent

to Python’s library function time.sleep(seconds), and both calls can be used to

add a delay in time-sensitive displays (e.g., animation programs such as PyDraw

and the simpler examples ahead).

widget.after_idle(function, *args)

This tool schedules the function to be called once when there are no more pending

events to process. That is, function becomes an idle handler, which is invoked

when the GUI isn’t busy doing anything else.

widget.after_cancel(id)

This tool cancels a pending after callback event before it occurs; id is the return

value of an after event scheduling call.

widget.update()

This tool forces tkinter to process all pending events in the event queue, including

geometry resizing and widget updates and redraws. You can call this periodically

from a long-running callback handler to refresh the screen and perform any updates

to it that your handler has already requested. If you don’t, your updates may not

appear on-screen until your callback handler exits. In fact, your display may hang

completely during long-running handlers if not manually updated (and handlers

are not run in threads, as described in the next section); the window won’t even

redraw itself until the handler returns if covered and uncovered by another.

For instance, programs that animate by repeatedly moving an object and pausing

must call for an update before the end of the animation or only the final object

position will appear on-screen; worse, the GUI will be completely inactive until

the animation callback returns (see the simple animation examples later in this

chapter, as well as PyDraw in Chapter 11).

widget.update_idletasks()

This tool processes any pending idle events. This may sometimes be safer than

after, which has the potential to set up race (looping) conditions in some scenarios.

Tk widgets use idle events to display themselves.

_tkinter.createfilehandler(file, mask, function)

This tool schedules the function to be called when a file’s status changes. The

function may be invoked when the file has data for reading, is available for writing,

or triggers an exception. The file argument is a Python file or socket object (tech-

nically, anything with a fileno() method) or an integer file descriptor; mask is

tkinter.READABLE or tkinter.WRITABLE to specify the mode; and the callback

function takes two arguments—the file ready to converse and a mask. File handlers

are often used to process pipes or sockets, since normal input/output requests can

block the caller.

Because this call is not available on Windows, it won’t be used in this book. Since

it’s currently a Unix-only alternative, portable GUIs may be better off using

after timer loops to poll for data and spawning threads to read data and place it

on queues if needed—see Chapter 10 for more details. Threads are a much more

general solution to nonblocking data transfers.

Time Tools, Threads, and Animation | 583

widget.wait_variable(var)

widget.wait_window(win)

widget.wait_visibility(win)

These tools pause the caller until a tkinter variable changes its value, a window is

destroyed, or a window becomes visible. All of these enter a local event loop, such

that the application’s mainloop continues to handle events. Note that var is a tkinter

variable object (discussed earlier), not a simple Python variable. To use these for modal

dialogs, first call widget.focus() (to set input focus) and widget.grab_set() (to make

a window be the only one active).
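The modal dialog recipe in the last entry can be sketched as follows. The names run_modal and demo are hypothetical; the sketch uses focus_set and grab_set, the tkinter method names for setting focus and grabbing events. The demo function needs a display and is not called automatically.

```python
def run_modal(parent, dialog):
    """Block the caller until dialog is destroyed, keeping events flowing."""
    dialog.focus_set()              # route keyboard input to the dialog
    dialog.grab_set()               # disable other windows in this app
    parent.wait_window(dialog)      # local event loop until dialog closes

def demo():
    """Pop up a modal Toplevel from a button (needs a display to run)."""
    from tkinter import Tk, Toplevel, Button, Label
    root = Tk()

    def ask():
        win = Toplevel(root)
        Label(win, text='Close me to continue').pack()
        Button(win, text='OK', command=win.destroy).pack()
        run_modal(root, win)
        print('dialog closed')      # runs only after the dialog is destroyed

    Button(root, text='Open dialog', command=ask).pack()
    root.mainloop()
```

Because wait_window enters a local event loop, the print in ask is delayed until the dialog goes away, but the rest of the GUI keeps redrawing in the meantime.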

Although we’ll put some of these to work in examples, we won’t go into exhaustive

details on all of these tools here; see other Tk and tkinter documentation for more

information.
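To make the update entry in this list more concrete, here is a small sketch of a long-running handler that refreshes the display as it goes. The names process_all, every, and demo are assumptions for illustration only. The sketch also assumes it runs in the main GUI thread, since calling update from spawned threads is not safe.

```python
def process_all(widget, items, work, every=10):
    """Run work(item) on each item, refreshing the GUI every few items."""
    done = 0
    for item in items:
        work(item)                      # the slow part of the handler
        done += 1
        if done % every == 0:
            widget.update()             # let tkinter redraw and handle events
    return done

def demo():
    """Step through 200 items slowly while a label stays current."""
    import time
    from tkinter import Tk, Label, Button
    root = Tk()
    label = Label(root, text='idle')
    label.pack()

    def slow_step(i):
        time.sleep(0.01)                # stand-in for real work
        label.config(text='item %d' % i)

    def on_press():
        button.config(state='disabled')     # block reentry while running
        process_all(root, range(200), slow_step)
        button.config(state='normal')

    button = Button(root, text='Run', command=on_press)
    button.pack()
    root.mainloop()
```

Because update dispatches any pending events, a second button press could start the handler again while the first is still running; the demo disables its button for the duration to sidestep that.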

Using Threads with tkinter GUIs

Keep in mind that for many programs, Python’s thread support that we discussed in

Chapter 5 can serve some of the same roles as the tkinter tools listed in the preceding

section and can even make use of them. For instance, to avoid blocking a GUI (and its

users) during a long-running file or socket transfer, the transfer can simply be run in a

spawned thread, while the rest of the program continues to run normally. Similarly,

GUIs that must watch for inputs on pipes or sockets can do so in spawned threads or

after callbacks, or some combination thereof, without blocking the GUI itself.

If you do use threads in tkinter programs, however, you need to remember that only

the main thread (the one that built the GUI and started the mainloop) should generally

make GUI calls. At the least, multiple threads should not attempt to update the GUI

at the same time. For example, the update method described in the preceding section

has historically caused problems in threaded GUIs—if a spawned thread calls this

method (or calls a method that does), it can sometimes trigger very strange and even

spectacular program crashes.

In fact, for a simple and more vivid example of the lack of thread safety in tkinter GUIs,

see and run the following files in the book examples distribution package:

...\PP4E\Gui\Tour\threads-demoAll-frm.py

...\PP4E\Gui\Tour\threads-demoAll-win.py

These scripts are takeoffs of the prior chapter’s Examples 8-32 and 8-33, which run the

construction of four GUI demo components in parallel threads. They also both crash

horrifically on Windows and require forced shutdown of the program. While some

GUI operations appear to be safe to perform in parallel threads (e.g., see the canvas

moves in Example 9-32), thread safety is not guaranteed by tkinter in general. (For

further proof of tkinter’s lack of thread safety, see the discussion of threaded update

loops in the next chapter, just after Example 10-28; a thread there that attempts to pop

up a new window also makes the GUI fail resoundingly.)

This GUI thread story is prone to change over time, but it imposes a few structural

constraints on programs. For example, because spawned threads cannot usually per-

form GUI processing, they must generally communicate with the main thread using

global variables or shared mutable objects such as queues, as required by the applica-

tion. A spawned thread which watches a socket for data, for instance, might simply set

global variables or append to shared queues, which in turn triggers GUI changes in the

main thread’s periodic after event callbacks. The main thread’s timer events process

the spawned thread’s results.
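The structure just described can be sketched roughly as follows: a non-GUI worker thread places results on a shared queue, and an after timer loop in the main thread consumes them. The function names worker, poll, and demo and the 100-millisecond interval are assumptions made for this sketch. It is not the code of the book's later examples.

```python
import queue
import threading

def worker(results, count):
    """Non-GUI thread: produce data and place it on the shared queue."""
    for i in range(count):
        results.put('item %d' % i)

def poll(widget, results, handle, msecs=100):
    """Main-thread timer callback: consume queued results, then reschedule."""
    try:
        while True:
            handle(results.get_nowait())     # never blocks the GUI
    except queue.Empty:
        pass                                 # nothing more for now
    widget.after(msecs, poll, widget, results, handle, msecs)

def demo():
    """Show worker results in a listbox (needs a display to run)."""
    from tkinter import Tk, Listbox, END
    root = Tk()
    box = Listbox(root)
    box.pack()
    results = queue.Queue()
    threading.Thread(target=worker, args=(results, 5), daemon=True).start()
    poll(root, results, lambda text: box.insert(END, text))
    root.mainloop()
```

Only poll touches widgets, and it always runs from the event loop in the main thread; the worker sees nothing but the queue.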

Although some GUI operations or toolkits may support multiple threads better than

others, GUI programs are generally best structured as a main GUI thread and non-GUI

“worker” threads this way, both to avoid potential collisions and to finesse the thread

safety issue altogether. The PyMailGUI example later in the book, for instance, will

collect and dispatch callbacks produced by threads and stored on a queue.

Also remember that irrespective of thread safety of the GUI itself, threaded GUI pro-

grams must follow the same principles of threaded programs in general—as we learned

in Chapter 5, such programs must still synchronize access to mutable state shared

between threads, if it may be changed by threads running in parallel. Although a

producer/consumer thread model based upon queues can alleviate many thread issues

for the GUI itself, a program that spawns non-GUI threads to update shared informa-

tion used by the GUI thread may still need to use thread locks to avoid concurrent

update issues.
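As a small illustration of that last point, the sketch below guards a shared dictionary with a lock so that worker threads and a GUI thread never update or read it at the same moment. The SharedCache class and run_workers function are hypothetical names, not taken from the book's examples.

```python
import threading

class SharedCache:
    """Mutable state shared by worker threads and the GUI thread."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def add(self, key, amount=1):
        with self._lock:                     # one updater at a time
            self._data[key] = self._data.get(key, 0) + amount

    def snapshot(self):
        with self._lock:                     # consistent copy for the GUI
            return dict(self._data)

def run_workers(cache, nthreads=4, reps=1000):
    """Start several threads that all update the same cache entry."""
    def work():
        for _ in range(reps):
            cache.add('hits')
    threads = [threading.Thread(target=work) for _ in range(nthreads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

A GUI thread would call snapshot from an after callback and display the copy, leaving the lock held only for the brief time the copy takes.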

We’ll explore GUI threading in more detail in Chapter 10, and we’ll meet more realistic

threaded GUI programs in Part IV, especially in Chapter 14’s PyMailGUI. The latter,

for instance, runs long-running tasks in threads to avoid blocking the GUI, but both

restricts GUI updates to the main thread and uses locks to prevent overlap of operations

that may change shared caches.

Using the after Method

Of all the event tools in the preceding list, the after method may be the most interesting.

It allows scripts to schedule a callback handler to be run at some time in the future.

Though a simple device, we’ll use this often in later examples in this book. For instance,

in Chapter 11, we’ll meet a clock program that uses after to wake up 10 times per

second and check for a new time, and we’ll see an image slideshow program that uses

after to schedule the next photo display (see PyClock and PyView). To illustrate the

basics of scheduled callbacks, Example 9-27 does something a bit different.

Example 9-27. PP4E\Gui\Tour\alarm.py

# flash and beep every second using after() callback loop

from tkinter import *

class Alarm(Frame):
    def __init__(self, msecs=1000):              # default = 1 second
        Frame.__init__(self)
        self.msecs = msecs
        self.pack()
        stopper = Button(self, text='Stop the beeps!', command=self.quit)
        stopper.pack()
        stopper.config(bg='navy', fg='white', bd=8)
        self.stopper = stopper
        self.repeater()

    def repeater(self):                          # on every N millisecs
        self.bell()                              # beep now
        self.stopper.flash()                     # flash button now
        self.after(self.msecs, self.repeater)    # reschedule handler

if __name__ == '__main__': Alarm(msecs=1000).mainloop()

This script builds the window in Figure 9-41 and periodically calls both the button

widget’s flash method to make the button flash momentarily (it alternates colors

quickly) and the tkinter bell method to call your system’s sound interface. The

repeater method beeps and flashes once and schedules a callback to be invoked after

a specific amount of time with the after method.

Figure 9-41. Stop the beeps!

But after doesn’t pause the caller: callbacks are scheduled to occur in the background,

while the program performs other processing—technically, as soon as the Tk event

loop is able to notice the time rollover. To make this work, repeater calls after each

time through, to reschedule the callback. Delayed events are one-shot callbacks; to

repeat the event as a loop, we need to reschedule it anew.
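The reschedule-on-each-call pattern can also be packaged with a way to stop it, using the ID that after returns and after_cancel. The Repeater class below is a hypothetical sketch, not part of the alarm example; it works with any widget that provides after and after_cancel.

```python
class Repeater:
    """Call action(*args) every msecs via widget.after until stopped."""
    def __init__(self, widget, msecs, action, *args):
        self.widget, self.msecs = widget, msecs
        self.action, self.args = action, args
        self.running = False
        self.pending = None                  # ID of the scheduled callback

    def start(self):
        self.running = True
        self._schedule()

    def _schedule(self):
        self.pending = self.widget.after(self.msecs, self._fire)

    def _fire(self):
        self.pending = None                  # this one-shot event has fired
        self.action(*self.args)
        if self.running:                     # action may have called stop()
            self._schedule()                 # reschedule anew to loop

    def stop(self):
        self.running = False
        if self.pending is not None:
            self.widget.after_cancel(self.pending)
            self.pending = None
```

In a script like the alarm, something along the lines of Repeater(frame, 1000, frame.bell).start() would replace the hand-coded repeater method, and a button could call stop instead of quitting the program.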

The net effect is that when this script runs, it starts beeping and flashing once its one-

button window pops up. And it keeps beeping and flashing. And beeping. And flashing.

Other activities and GUI operations don’t affect it. Even if the window is iconified, the

beeping continues, because tkinter timer events fire in the background. You need to

kill the window or press the button to stop the alarm. By changing the msecs delay, you

can make this beep as fast or as slow as your system allows (some platforms can’t beep

as fast as others…). This may or may not be the best demo to launch in a crowded office,

but at least you’ve been warned.

Hiding and redrawing widgets and windows

The button flash method flashes the widget, but it’s easy to dynamically change other

appearance options of widgets, such as buttons, labels, and text, with the widget

config method. For instance, you can also achieve a flash-like effect by manually re-

versing foreground and background colors with the widget config method in scheduled

after callbacks. Largely for fun, Example 9-28 specializes the alarm to go a step further.

Example 9-28. PP4E\Gui\Tour\alarm-hide.py

# customize to erase or show button on after() timer callbacks

from tkinter import *
import alarm

class Alarm(alarm.Alarm):                        # change alarm callback
    def __init__(self, msecs=1000):              # default = 1 second
        self.shown = False
        alarm.Alarm.__init__(self, msecs)

    def repeater(self):                          # on every N millisecs
        self.bell()                              # beep now
        if self.shown:
            self.stopper.pack_forget()           # hide or erase button now
        else:                                    # or reverse colors, flash...
            self.stopper.pack()
        self.shown = not self.shown              # toggle state for next time
        self.after(self.msecs, self.repeater)    # reschedule handler

if __name__ == '__main__': Alarm(msecs=500).mainloop()

When this script is run, the same window appears, but the button is erased or redrawn

on alternating timer events. The widget pack_forget method erases (unmaps) a drawn

widget and pack makes it show up again; grid_forget and grid similarly hide and show

widgets in a grid. The pack_forget method is useful for dynamically drawing and

changing a running GUI. For instance, you can be selective about which components

are displayed, and you can build widgets ahead of time and show them only as needed.

Here, it just means that users must press the button while it’s displayed, or else the

noise keeps going.
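The color-reversal alternative mentioned at the start of this section can be sketched along these lines, using cget to fetch the current colors and config to swap them in an after timer loop. The function names swap_colors and blink are hypothetical, and the sketch assumes a widget such as a button or label that supports the fg and bg options.

```python
def swap_colors(widget):
    """Exchange a widget's foreground and background colors."""
    fg, bg = widget.cget('fg'), widget.cget('bg')
    widget.config(fg=bg, bg=fg)

def blink(widget, msecs=500):
    """Swap colors now, and again every msecs from the event loop."""
    swap_colors(widget)
    widget.after(msecs, blink, widget, msecs)    # one-shot: reschedule
```

Calling blink(stopper) once after building the button would start the effect; like the alarm's repeater, it keeps going until the widget or program goes away.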

Example 9-29 goes even further. There are a handful of methods for hiding and un-

hiding entire top-level windows:

• To hide and unhide the entire window instead of just one widget within it, use

the top-level window widget withdraw and deiconify methods. The withdraw

method, demonstrated in Example 9-29, completely erases the window and its icon

(use iconify if you want the window’s icon to appear during a hide).

• The lift method raises a window above all its siblings or relative to another you

pass in. This method is also known as tkraise, but not raise—its name in Tk—

because raise is a reserved word in Python.

• The state method returns or changes the window’s current state—it accepts

normal, iconic, zoomed (full screen), or withdrawn.

Experiment with these methods on your own to see how they differ. They are also useful

to pop up prebuilt dialog windows dynamically, but are perhaps less practical here.

Example 9-29. PP4E\Gui\Tour\alarm-withdraw.py

# same, but hide or show entire window on after() timer callbacks

from tkinter import *
import alarm

class Alarm(alarm.Alarm):
    def repeater(self):                          # on every N millisecs
        self.bell()                              # beep now
        if self.master.state() == 'normal':      # is window displayed?
            self.master.withdraw()               # hide entire window, no icon
        else:                                    # iconify shrinks to an icon
            self.master.deiconify()              # else redraw entire window
            self.master.lift()                   # and raise above others
        self.after(self.msecs, self.repeater)    # reschedule handler

if __name__ == '__main__': Alarm().mainloop()    # master = default Tk root

This works the same, but the entire window appears or disappears on beeps, so you have

to press the button while the window is shown. You could add lots of other effects to the alarm, and its

timer-based callback technique is widely applicable. Whether your buttons and windows

should flash and disappear, though, probably depends less on tkinter technology

than on your users’ patience.

Simple Animation Techniques

Apart from the direct shape moves of the canvasDraw example we met earlier in this

chapter, all of the GUIs presented so far in this part of the book have been fairly static.

This last section shows you how to change that, by adding simple shape movement

animations to the canvas drawing example listed in Example 9-16.

It also demonstrates and expands on the notion of canvas tags—the move operations

performed here move all canvas objects associated with a tag at once. All oval shapes

move if you press “o,” and all rectangles move if you press “r”; as mentioned earlier,

canvas operation methods accept both object IDs and tag names.

But the main goal here is to illustrate simple animation techniques using the time-based

tools described earlier in this section. There are three basic ways to move objects around

a canvas:

• By loops that use time.sleep to pause for fractions of a second between multiple

move operations, along with manual update calls. The script moves, sleeps, moves

a bit more, and so on. A time.sleep call pauses the caller and so fails to return

control to the GUI event loop—any new requests that arrive during a move are

deferred. Because of that, canvas.update must be called to redraw the screen after

each move, or else updates don’t appear until the entire movement loop callback

finishes and returns. This is a classic long-running callback scenario; without man-

ual update calls, no new GUI events are handled until the callback returns in this

scheme (including both new user requests and basic window redraws).

• By using the widget.after method to schedule multiple move operations to occur

every few milliseconds. Because this approach is based upon scheduled events dis-

patched by tkinter to your handlers, it allows multiple moves to occur in parallel

and doesn’t require canvas.update calls. You rely on the event loop to run moves,

so there’s no reason for sleep pauses, and the GUI is not blocked while moves are

in progress.

• By using threads to run multiple copies of the time.sleep pausing loops of the first

approach. Because threads run in parallel, a sleep in any thread blocks neither the

GUI nor other motion threads. As described earlier, GUIs should not be updated

from spawned threads in general, but some canvas calls such as move seem to be

thread-safe today in the current tkinter implementation.

Of these three schemes, the first yields the smoothest animations but makes other

operations sluggish during movement, the second seems to yield slower motion than

the others but is safer than using threads in general, and the second and third allow

multiple objects to be in motion at the same time.

Using time.sleep loops

The next three sections demonstrate the code structure of all three approaches in turn,

with new subclasses of the canvasDraw example we met in Example 9-16 earlier in this

chapter. Refer back to that example for its other event bindings and basic draw, move,

and clear operations; here, we customize its object creators for tags and add new event

bindings and actions. Example 9-30 illustrates the first approach.

Example 9-30. PP4E\Gui\Tour\canvasDraw_tags.py

"""

add tagged moves with time.sleep (not widget.after or threads);
time.sleep pauses the caller and blocks the GUI event loop while it sleeps, so the
screen is not redrawn until the callback returns or widget.update is called; the
currently running onMove callback has exclusive attention until it returns: others
pause if you press 'r' or 'o' during a move;
"""

from tkinter import *
import canvasDraw, time

class CanvasEventsDemo(canvasDraw.CanvasEventsDemo):
    def __init__(self, parent=None):
        canvasDraw.CanvasEventsDemo.__init__(self, parent)
        self.canvas.create_text(100, 10, text='Press o and r to move shapes')
        self.canvas.master.bind('<KeyPress-o>', self.onMoveOvals)
        self.canvas.master.bind('<KeyPress-r>', self.onMoveRectangles)
        self.kinds = self.create_oval_tagged, self.create_rectangle_tagged

    def create_oval_tagged(self, x1, y1, x2, y2):
        objectId = self.canvas.create_oval(x1, y1, x2, y2)
        self.canvas.itemconfig(objectId, tag='ovals', fill='blue')
        return objectId

    def create_rectangle_tagged(self, x1, y1, x2, y2):
        objectId = self.canvas.create_rectangle(x1, y1, x2, y2)
        self.canvas.itemconfig(objectId, tag='rectangles', fill='red')
        return objectId

    def onMoveOvals(self, event):
        print('moving ovals')
        self.moveInSquares(tag='ovals')               # move all tagged ovals

    def onMoveRectangles(self, event):
        print('moving rectangles')
        self.moveInSquares(tag='rectangles')

    def moveInSquares(self, tag):                     # 5 reps of 4 times per sec
        for i in range(5):
            for (diffx, diffy) in [(+20, 0), (0, +20), (-20, 0), (0, -20)]:
                self.canvas.move(tag, diffx, diffy)
                self.canvas.update()                  # force screen redraw/update
                time.sleep(0.25)                      # pause; events wait for next update

if __name__ == '__main__':
    CanvasEventsDemo()
    mainloop()

All three of the scripts in this section create a window of blue ovals and red rectangles

as you drag new shapes out with the left mouse button. The drag-out implementation

itself is inherited from the superclass. A right-mouse-button click still moves a single

shape immediately, and a double-left click still clears the canvas, too—other operations

inherited from the original superclass. In fact, all this new script really does is change

the object creation calls to add tags and colors to drawn objects here, add a text field

at the top of the canvas, and add bindings and callbacks for motion requests.

Figure 9-42 shows what this subclass’s window looks like after dragging out a few shapes

to be animated.

The “o” and “r” keys are set up to start animation of all the ovals and rectangles you’ve

drawn, respectively. Pressing “o,” for example, makes all the blue ovals start moving

synchronously. Objects are animated to mark out five squares around their location

and to move four times per second. New objects drawn while others are in motion start

to move, too, because they are tagged. You need to run these live to get a feel for the

simple animations they implement, of course. (You could try moving this book back

and forth and up and down, but it’s not quite the same, and might look silly in public

places.)

Figure 9-42. Drag-out objects ready to be animated

Using widget.after events

The main drawback of this first approach is that only one animation can be going at

once: if you press “r” or “o” while a move is in progress, the new request puts the prior

movement on hold until it finishes because each move callback handler assumes the

only thread of control while it runs. That is, only one time.sleep loop callback can be

running at a time, and a new one started by an update call is effectively a recursive call

which pauses another loop already in progress.

Screen updates are a bit sluggish while moves are in progress, too, because they happen

only as often as manual update calls are made (try a drag-out or a cover/uncover of the

window during a move to see for yourself). In fact, commenting out the canvas update

call in Example 9-30 makes the GUI completely unresponsive during the move—it

won’t redraw itself if covered, doesn’t respond to new user requests, and doesn’t show

any of its progress (you only get to see the final state). This effectively simulates the

impact of long-running operations on GUIs in general.

Example 9-31 specializes just the moveInSquares method of the prior example to remove

all such limitations—by using after timer callback loops, it schedules moves without

potential pauses. It also reflects the most common (and likely best) way that tkinter

GUIs handle time-based events at large. By breaking tasks into parts this way instead

of running them all at once, they are naturally both distributed over time and

overlapped.

Example 9-31. PP4E\Gui\Tour\canvasDraw_tags_after.py

"""

similar, but with widget.after() scheduled events, not time.sleep loops;
because these are scheduled events, this allows both ovals and rectangles
to be moving at the _same_ time and does not require update calls to refresh
the GUI; the motion gets wild if you press 'o' or 'r' while move in progress:
multiple move updates start firing around the same time;
"""

from tkinter import *
import canvasDraw_tags

class CanvasEventsDemo(canvasDraw_tags.CanvasEventsDemo):
    def moveEm(self, tag, moremoves):
        (diffx, diffy), moremoves = moremoves[0], moremoves[1:]
        self.canvas.move(tag, diffx, diffy)
        if moremoves:
            self.canvas.after(250, self.moveEm, tag, moremoves)

    def moveInSquares(self, tag):
        allmoves = [(+20, 0), (0, +20), (-20, 0), (0, -20)] * 5
        self.moveEm(tag, allmoves)

if __name__ == '__main__':
    CanvasEventsDemo()
    mainloop()

This version inherits the drawing customizations of the prior, but lets you make both

ovals and rectangles move at the same time—drag out a few ovals and rectangles, and

then press “o” and then “r” right away to make this go. In fact, try pressing both keys

a few times; the more you press, the more the objects move, because multiple scheduled

events are firing and moving objects from wherever they happen to be positioned. If

you drag out a new shape during a move, it starts moving immediately as before.
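If the pile-up of repeated key presses is not wanted, one option is to ignore a request for a tag that is already in motion. The TagMover class below is a hypothetical sketch of that idea, written as a standalone helper rather than as a subclass of the book's example; it needs only a canvas-like object with move and after methods.

```python
class TagMover:
    """Animate canvas tags with after(), one animation per tag at a time."""
    SQUARE = [(+20, 0), (0, +20), (-20, 0), (0, -20)]

    def __init__(self, canvas, msecs=250):
        self.canvas, self.msecs = canvas, msecs
        self.active = set()                  # tags currently in motion

    def move_in_squares(self, tag, reps=5):
        if tag in self.active:
            return False                     # already moving: ignore request
        self.active.add(tag)
        self._step(tag, self.SQUARE * reps)
        return True

    def _step(self, tag, moves):
        (dx, dy), rest = moves[0], moves[1:]
        self.canvas.move(tag, dx, dy)
        if rest:
            self.canvas.after(self.msecs, self._step, tag, rest)
        else:
            self.active.discard(tag)         # done: allow a new request
```

Ovals and rectangles can still move at the same time, because each tag is tracked separately; only a second request for the same tag is dropped while the first is running.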

Using multiple time.sleep loop threads

Running animations in threads can sometimes achieve the same effect. As discussed

earlier, it can be dangerous to update the screen from a spawned thread in general, but

it works in this example (on the test platform used, at least). Example 9-32 runs each

animation task as an independent and parallel thread. That is, each time you press the

“o” or “r” key to start an animation, a new thread is spawned to do the work.

Example 9-32. PP4E\Gui\Tour\canvasDraw_tags_thread.py

"""

similar, but run time.sleep loops in parallel with threads, not after() events
or single active time.sleep loop; because threads run in parallel, this also
allows ovals and rectangles to be moving at the _same_ time and does not require
update calls to refresh the GUI: in fact, calling .update() once made this crash
badly, though some canvas calls must be thread safe or this wouldn't work at all;
"""

from tkinter import *
import canvasDraw_tags
import _thread, time

class CanvasEventsDemo(canvasDraw_tags.CanvasEventsDemo):
    def moveEm(self, tag):
        for i in range(5):
            for (diffx, diffy) in [(+20, 0), (0, +20), (-20, 0), (0, -20)]:
                self.canvas.move(tag, diffx, diffy)
                time.sleep(0.25)                      # pause this thread only

    def moveInSquares(self, tag):
        _thread.start_new_thread(self.moveEm, (tag,))

if __name__ == '__main__':
    CanvasEventsDemo()
    mainloop()

This version lets you move shapes at the same time, just like Example 9-31, but this

time it’s a reflection of threads running in parallel. In fact, this uses the same scheme

as the first time.sleep version. Here, though, there is more than one active thread of

control, so move handlers can overlap in time—time.sleep blocks only the calling

thread, not the program at large.

This example works on Windows today, but it failed on Linux at one point in this

book’s lifetime—the screen was not updated as threads changed it, so you couldn’t see

any changes until later GUI events. The usual rule of thumb about avoiding GUI up-

dates in spawned threads laid out earlier still holds true. It is usually safer to have your

threads do number crunching only and let the main thread (the one that built the GUI)

handle any screen updates. Even under this model, though, the main thread can still

use after event loops like that of Example 9-31 to watch for results from worker threads

to appear without being blocked while waiting (more on this in the next section and

chapter).

Parts of this story are implementation details prone to change over time, and it’s not

impossible that GUI updates in threads may be better supported by tkinter in the future,

so be sure to explore the state of threading in future releases for more details.

More on Protocol Standards

If you want the full story on protocols and ports, at this writing you can find a com-

prehensive list of all ports reserved for protocols or registered as used by various com-

mon systems by searching the web pages maintained by the Internet Engineering Task

Force (IETF) and the Internet Assigned Numbers Authority (IANA). The IETF is the

organization responsible for maintaining web protocols and standards. The IANA is

the central coordinator for the assignment of unique parameter values for Internet pro-

tocols. Another standards body, the W3 (for WWW), also maintains relevant docu-

ments. See these web pages for more details:

786 | Chapter 12: Network Scripting

http://www.ietf.org

http://www.iana.org/numbers.html

http://www.iana.org/assignments/port-numbers

http://www.w3.org

It’s not impossible that more recent repositories for standard protocol specifications

will arise during this book’s shelf life, but the IETF website will likely be the main

authority for some time to come. If you do look, though, be warned that the details are,

well, detailed. Because Python’s protocol modules hide most of the socket and mes-

saging complexity documented in the protocol standards, you usually don’t need to

memorize these documents to get web work done with Python.

Socket Programming

Now that we’ve seen how sockets figure into the Internet picture, let’s move on to

explore the tools that Python provides for programming sockets with Python scripts.

This section shows you how to use the Python socket interface to perform low-level

network communications. In later chapters, we will instead use one of the higher-level

protocol modules that hide underlying sockets. Python’s socket interfaces can be used

directly, though, to implement custom network dialogs and to access standard proto-

cols manually.

As previewed in Chapter 5, the basic socket interface in Python is the standard library’s

socket module. Like the os POSIX module, Python’s socket module is just a thin wrap-

per (interface layer) over the underlying C library’s socket calls. Like Python files, it’s

also object-based—methods of a socket object implemented by this module call out to

the corresponding C library’s operations after data conversions. For instance, the C

library’s send and recv function calls become methods of socket objects in Python.

Python’s socket module supports socket programming on any machine that supports
BSD-style sockets (Windows, Macs, Linux, Unix, and so on) and so provides a portable
socket interface. In addition, this module supports all commonly used socket types
(TCP/IP streams, UDP datagrams, and Unix domain sockets) and can be used as both a
network interface API and a general IPC mechanism between processes running on the
same machine.
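
To make the socket-type distinction concrete, here is a minimal sketch of a
connectionless UDP exchange on a single machine. It is an illustration only, not one of
this book's examples: the loopback address and the OS-assigned port (requested by
binding to port 0) are assumptions made so that the snippet can run anywhere without
clashing with other programs.

```python
import socket

# Receiver: a datagram (UDP) socket bound to the loopback interface.
# Port 0 asks the OS for any free port (an assumption made for this demo).
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(('127.0.0.1', 0))
receiver.settimeout(5)                      # do not wait forever if nothing arrives
address = receiver.getsockname()            # (host, port) actually assigned

# Sender: no connect or accept step, each sendto ships one datagram.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b'spam', address)

data, source = receiver.recvfrom(1024)      # one datagram plus the sender's address
print('Received', data, 'from', source)

sender.close()
receiver.close()
```

Unlike the TCP sockets used in the rest of this section, there is no listen, accept, or
connect step here, and delivery of datagrams is not guaranteed in general.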

From a functional perspective, sockets are a programmer’s device for transferring bytes

between programs, possibly running on different computers. Although sockets them-

selves transfer only byte strings, we can also transfer Python objects through them by

using Python’s pickle module. Because this module converts Python objects such as

lists, dictionaries, and class instances to and from byte strings, it provides the extra step

needed to ship higher-level objects through sockets when required.

Python’s struct module can also be used to format Python objects as packed binary
data byte strings for transmission, but is generally limited in scope to objects that map
to types in the C programming language. The pickle module supports transmission of
larger objects, such as dictionaries and class instances. For other tasks, including most
standard Internet protocols, simpler formatted byte strings suffice. We’ll learn more
about pickle later in this chapter and book.

Beyond basic data communication tasks, the socket module also includes a variety of

more advanced tools. For instance, it has calls for the following and more:

• Converting bytes to a standard network ordering (ntohl, htonl)

• Querying machine name and address (gethostname, gethostbyname)

• Wrapping socket objects in a file object interface (sockobj.makefile)

• Making socket calls nonblocking (sockobj.setblocking)

• Setting socket timeouts (sockobj.settimeout)
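
The following sketch exercises a few of the calls in this list at the interactive level.
It is only an illustration: the sample integer and timeout value are arbitrary, and the
name lookup is guarded because its result depends on how the local machine is
configured.

```python
import socket

# Byte ordering: round-trip a 32-bit integer through network (big-endian) order.
value = 0x12345678
wire = socket.htonl(value)                  # host order -> network order
restored = socket.ntohl(wire)               # network order -> host order

# Machine name and address queries; results vary from machine to machine.
print('Host name:', socket.gethostname())
try:
    local_ip = socket.gethostbyname('localhost')
except OSError:                             # lookup may fail on unusual setups
    local_ip = None
print('localhost resolves to:', local_ip)

# Timeouts and blocking mode are properties of an individual socket object.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(2.5)                        # blocking calls give up after 2.5 seconds
timeout = sock.gettimeout()
sock.setblocking(False)                     # same as settimeout(0.0)
nonblocking_timeout = sock.gettimeout()
sock.close()
```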

Provided your Python was compiled with Secure Sockets Layer (SSL) support, the
ssl standard library module also supports encrypted transfers with its
ssl.wrap_socket call (in Python 3.12 and later this module-level function is gone, and
the wrap_socket method of an ssl.SSLContext object takes its place). This call wraps a
socket object in SSL logic, which is used in turn by other standard library modules to
support the HTTPS secure website protocol (http.client and urllib.request), secure
email transfers (poplib and smtplib), and more. We’ll meet some of these other modules
later in this part of the book, but we won’t study all of the socket module’s advanced
features in this text; see the Python library manual for usage details omitted here.

Socket Basics

Although we won’t get into advanced socket use in this chapter, basic socket transfers

are remarkably easy to code in Python. To create a connection between machines,

Python programs import the socket module, create a socket object, and call the object’s

methods to establish connections and send and receive data.

Sockets are inherently bidirectional in nature, and socket object methods map directly

to socket calls in the C library. For example, the script in Example 12-1 implements a

program that simply listens for a connection on a socket and echoes back over a socket

whatever it receives through that socket, adding Echo=> string prefixes.

Example 12-1. PP4E\Internet\Sockets\echo-server.py

"""
Server side: open a TCP/IP socket on a port, listen for a message from
a client, and send an echo reply; this is a simple one-shot listen/reply
conversation per client, but it goes into an infinite loop to listen for
more clients as long as this server script runs; the client may run on
a remote machine, or on same computer if it uses 'localhost' for server
"""

from socket import *                         # get socket constructor and constants
myHost = ''                                  # '' = all available interfaces on host
myPort = 50007                               # listen on a non-reserved port number
sockobj = socket(AF_INET, SOCK_STREAM)       # make a TCP socket object
sockobj.bind((myHost, myPort))               # bind it to server port number
sockobj.listen(5)                            # listen, allow 5 pending connects

while True:                                  # listen until process killed
    connection, address = sockobj.accept()   # wait for next client connect
    print('Server connected by', address)    # connection is a new socket
    while True:
        data = connection.recv(1024)         # read next line on client socket
        if not data: break                   # send a reply line to the client
        connection.send(b'Echo=>' + data)    # until eof when socket closed
    connection.close()

As mentioned earlier, we usually call programs like this that listen for incoming con-

nections servers because they provide a service that can be accessed at a given machine

and port on the Internet. Programs that connect to such a server to access its service

are generally called clients. Example 12-2 shows a simple client implemented in Python.

Example 12-2. PP4E\Internet\Sockets\echo-client.py

"""
Client side: use sockets to send data to the server, and print server's
reply to each message line; 'localhost' means that the server is running
on the same machine as the client, which lets us test client and server
on one machine; to test over the Internet, run a server on a remote
machine, and set serverHost or argv[1] to machine's domain name or IP addr;
Python sockets are a portable BSD socket interface, with object methods
for the standard socket calls available in the system's C library;
"""

import sys
from socket import *                        # portable socket interface plus constants
serverHost = 'localhost'                    # server name, or: 'starship.python.net'
serverPort = 50007                          # non-reserved port used by the server

message = [b'Hello network world']          # default text to send to server
                                            # requires bytes: b'' or str.encode()
if len(sys.argv) > 1:
    serverHost = sys.argv[1]                # server from cmd line arg 1
    if len(sys.argv) > 2:                   # text from cmd line args 2..n
        message = (x.encode() for x in sys.argv[2:])

sockobj = socket(AF_INET, SOCK_STREAM)      # make a TCP/IP socket object
sockobj.connect((serverHost, serverPort))   # connect to server machine + port

for line in message:
    sockobj.send(line)                      # send line to server over socket
    data = sockobj.recv(1024)               # receive line from server: up to 1k
    print('Client received:', data)         # bytes are quoted, was `x`, repr(x)

sockobj.close()                             # close socket to send eof to server

Server socket calls

Before we see these programs in action, let’s take a minute to explain how this client

and server do their stuff. Both are fairly simple examples of socket scripts, but they

illustrate the common call patterns of most socket-based programs. In fact, this is boil-

erplate code: most connected socket programs generally make the same socket calls

that our two scripts do, so let’s step through the important points of these scripts line

by line.

Programs such as Example 12-1 that provide services for other programs with sockets

generally start out by following this sequence of calls:

sockobj = socket(AF_INET, SOCK_STREAM)

Uses the Python socket module to create a TCP socket object. The names

AF_INET and SOCK_STREAM are preassigned variables defined by and imported from

the socket module; using them in combination means “create a TCP/IP socket,”

the standard communication device for the Internet. More specifically, AF_INET

means the IP address protocol, and SOCK_STREAM means the TCP transfer protocol.

The AF_INET/SOCK_STREAM combination is the default because it is so common, but

it’s typical to make this explicit.

If you use other names in this call, you can instead create things like UDP connec-

tionless sockets (use SOCK_DGRAM second) and Unix domain sockets on the local

machine (use AF_UNIX first), but we won’t do so in this book. See the Python library

manual for details on these and other socket module options. Using other socket

types is mostly a matter of using different forms of boilerplate code.

sockobj.bind((myHost, myPort))

Associates the socket object with an address; for IP addresses, we pass a server
machine name and port number on that machine. This is where the server identifies
the machine and port associated with the socket. In server programs, the hostname
is typically an empty string (''), which means the machine that the script runs on
(formally, all available local and remote interfaces on the machine), and the port
is a number outside the range 0 to 1023 (which is reserved for standard protocols,
described earlier).

Note that each unique socket dialog you support must have its own port number;
if you try to open a socket on a port already in use, Python will raise an exception.
Also notice the nested parentheses in this call: for the AF_INET address protocol
socket here, we pass the host/port socket address to bind as a two-item tuple object
(pass a string for AF_UNIX). Technically, bind takes a single address argument whose
form depends on the type of socket created.

sockobj.listen(5)

Starts listening for incoming client connections and allows for a backlog of up to

five pending requests. The value passed sets the number of incoming client requests

queued by the operating system before new requests are denied (which happens

only if a server isn’t fast enough to process requests before the queues fill up). A

value of 5 is usually enough for most socket-based programs; the value must be at

least 1.

At this point, the server is ready to accept connection requests from client programs

running on remote machines (or the same machine) and falls into an infinite loop—

while True (or the equivalent while 1 for older Pythons and ex-C programmers)—

waiting for them to arrive:

connection, address = sockobj.accept()

Waits for the next client connection request to occur; when it does, the accept call

returns a brand-new socket object over which data can be transferred from and to

the connected client. Connections are accepted on sockobj, but communication

with a client happens on connection, the new socket. This call actually returns a

two-item tuple—address is the connecting client’s Internet address. We can call

accept more than one time, to service multiple client connections; that’s why each

call returns a new, distinct socket for talking to a particular client.

Once we have a client connection, we fall into another loop to receive data from the

client in blocks of up to 1,024 bytes at a time, and echo each block back to the client:

data = connection.recv(1024)

Reads at most 1,024 more bytes of the next message sent from a client (i.e., coming

across the network or IPC connection), and returns it to the script as a byte string.

We get back an empty byte string when the client has finished—end-of-file is trig-

gered when the client closes its end of the socket.

connection.send(b'Echo=>' + data)

Sends the latest byte string data block back to the client program, prepending the

string 'Echo=>' to it first. The client program can then recv what we send here—

the next reply line. Technically this call sends as much data as possible, and returns

the number of bytes actually sent. To be fully robust, some programs may need to

resend unsent portions or use connection.sendall to force all bytes to be sent.

connection.close()

Shuts down the connection with this particular client.
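
The note about resending unsent portions can be sketched as a loop around send. This
is a minimal illustration rather than code from the book's examples: the send_fully
helper is a hypothetical name, and socket.socketpair is used here only to obtain two
connected sockets in one process for the demonstration. In practice, sendall does the
same job in a single call.

```python
import socket

def send_fully(sock, data):
    """Call send repeatedly until every byte of data has been accepted."""
    view = memoryview(data)                 # slice without copying the bytes
    total = 0
    while total < len(view):
        sent = sock.send(view[total:])      # may accept only part of the data
        total += sent
    return total

# Two already-connected sockets, standing in for a server and its client.
left, right = socket.socketpair()
payload = b'Echo=>' + b'spam' * 100
count = send_fully(left, payload)
left.close()                                # signals end-of-file to the reader

chunks = []
while True:
    block = right.recv(1024)                # empty bytes means the peer closed
    if not block:
        break
    chunks.append(block)
right.close()
received = b''.join(chunks)
print(count, 'bytes sent,', len(received), 'bytes received')
```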

Transferring byte strings and objects

So far we’ve seen calls used to transfer data in a server, but what is it that is actually

shipped through a socket? As we learned in Chapter 5, sockets by themselves always

deal in binary byte strings, not text. To your scripts, this means you must send and will

receive bytes strings, not str, though you can convert to and from text as needed with

bytes.decode and str.encode methods. In our scripts, we use b'...' bytes literals to

satisfy socket data requirements. In other contexts, tools such as the struct and

pickle modules return the byte strings we need automatically, so no extra steps are

needed.
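
As a quick illustration of the conversions just mentioned, the following sketch encodes
text to bytes for sending and decodes bytes back to text after receiving. The UTF-8
encoding and the sample strings are assumptions for the demo; both ends of a real
conversation must agree on the encoding used.

```python
text = 'Hello network world'
data = text.encode('utf-8')                 # str -> bytes, ready for send
print(data)

echoed = b'Echo=>' + data                   # what a server like ours might reply
reply = echoed.decode('utf-8')              # bytes -> str, after recv
print(reply)

# Non-ASCII characters may occupy more than one byte once encoded.
city = 'Köln'
encoded_city = city.encode('utf-8')
print(len(city), 'characters,', len(encoded_city), 'bytes')
```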

For example, although the socket model is limited to transferring byte strings, you can

send and receive nearly arbitrary Python objects with the standard library pickle object

serialization module. Its dumps and loads calls convert Python objects to and from byte

strings, ready for direct socket transfer:

>>> import pickle

>>> x = pickle.dumps([99, 100]) # on sending end... convert to byte strings

>>> x # string passed to send, returned by recv

b'\x80\x03]q\x00(KcKde.'

>>> pickle.loads(x) # on receiving end... convert back to object

[99, 100]

For simpler types that correspond to those in the C language, the struct module pro-

vides the byte-string conversion we need as well:

>>> import struct

>>> x = struct.pack('>ii', 99, 100) # convert simpler types for transmission

>>> x

b'\x00\x00\x00c\x00\x00\x00d'

>>> struct.unpack('>ii', x)

(99, 100)

When converted this way, Python native objects become candidates for socket-based

transfers. See Chapter 4 for more on struct. We previewed pickle and object seriali-

zation in Chapter 1, but we’ll learn more about it and its few pickleability constraints

when we explore data persistence in Chapter 17.
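
Putting the two modules together, one common way to ship an object through a socket is
to send a fixed-size length header built with struct, followed by the pickled bytes.
The sketch below is one possible arrangement, not a protocol defined by this book:
the helper names, the 4-byte big-endian header, and the use of socket.socketpair to get
two connected sockets in one process are all assumptions made for illustration.

```python
import pickle
import socket
import struct

def send_object(sock, obj):
    """Send a 4-byte length header, then the pickled object."""
    body = pickle.dumps(obj)
    header = struct.pack('>I', len(body))   # unsigned int, network byte order
    sock.sendall(header + body)

def recv_exactly(sock, count):
    """Read exactly count bytes, since recv may return fewer than requested."""
    chunks = []
    while count > 0:
        block = sock.recv(count)
        if not block:
            raise EOFError('socket closed before message was complete')
        chunks.append(block)
        count -= len(block)
    return b''.join(chunks)

def recv_object(sock):
    """Read the length header, then unpickle that many bytes."""
    (size,) = struct.unpack('>I', recv_exactly(sock, 4))
    return pickle.loads(recv_exactly(sock, size))   # only unpickle trusted data

sender, receiver = socket.socketpair()
send_object(sender, {'spam': [99, 100]})
result = recv_object(receiver)
print(result)
sender.close()
receiver.close()
```

The length header tells the receiver where one message ends, which a raw byte stream
does not do by itself. Because unpickling can run arbitrary code, this approach is
suitable only when the sender is trusted.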

In fact there are a variety of ways to extend the basic socket transfer model. For instance,

much like os.fdopen and open for the file descriptors we studied in Chapter 4, the

socket.makefile method allows you to wrap sockets in text-mode file objects that han-

dle text encodings for you automatically. This call also allows you to specify nondefault

Unicode encodings and end-line behaviors in text mode with extra arguments in 3.X

just like the open built-in function. Because its result mimics file interfaces, the

socket.makefile call additionally allows the pickle module’s file-based calls to transfer

objects over sockets implicitly. We’ll see more on socket file wrappers later in this

chapter.
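
As a small preview, the sketch below wraps two connected sockets in text-mode file
objects and passes a line of text between them. It is a minimal illustration under
stated assumptions: socket.socketpair supplies the connected sockets, and UTF-8 with
'\n' line ends is chosen explicitly rather than relying on platform defaults.

```python
import socket

writer_sock, reader_sock = socket.socketpair()

# Text-mode wrappers: str in and out, with encoding handled by the file objects.
wfile = writer_sock.makefile('w', encoding='utf-8', newline='\n')
rfile = reader_sock.makefile('r', encoding='utf-8', newline='\n')

wfile.write('Hello network world\n')
wfile.flush()                               # wrappers buffer: push data to the socket
line = rfile.readline()                     # reads through the next end-of-line
print(repr(line))

wfile.close()
rfile.close()
writer_sock.close()
reader_sock.close()
```

The flush call matters here: output written to the wrapper is buffered, so the reader
sees nothing until the buffer is flushed or the file is closed.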

For our simpler scripts here, hardcoded byte strings and direct socket calls do the job.
After talking with a given connected client, the server in Example 12-1 goes back to its
infinite loop and waits for the next client connection request. Let’s move on to see what
happens on the other side of the fence.

Client socket calls

The actual socket-related calls in client programs like the one shown in Exam-

ple 12-2 are even simpler; in fact, half of that script is preparation logic. The main thing

to keep in mind is that the client and server must specify the same port number when

opening their sockets and the client must identify the machine on which the server is

running; in our scripts, server and client agree to use port number 50007 for their

conversation, outside the standard protocol range. Here are the client’s socket calls:

sockobj = socket(AF_INET, SOCK_STREAM)

Creates a Python socket object in the client program, just like the server.

sockobj.connect((serverHost, serverPort))

Opens a connection to the machine and port on which the server program is lis-

tening for client connections. This is where the client specifies the string name of

the service to be contacted. In the client, we can either specify the name of the

remote machine as a domain name (e.g., starship.python.net) or numeric IP ad-

dress. We can also give the server name as localhost (or the equivalent IP address

127.0.0.1) to specify that the server program is running on the same machine as

the client; that comes in handy for debugging servers without having to connect

to the Net. And again, the client’s port number must match the server’s exactly.

Note the nested parentheses again—just as in server bind calls, we really pass the

server’s host/port address to connect in a tuple object.

Once the client establishes a connection to the server, it falls into a loop, sending a

message one line at a time and printing whatever the server sends back after each line

is sent:

sockobj.send(line)

Transfers the next byte-string message line to the server over the socket. Notice

that the default list of lines contains bytes strings (b'...'). Just as on the server,

data passed through the socket must be a byte string, though it can be the result

of a manual str.encode encoding call or an object conversion with pickle or

struct if desired. When lines to be sent are given as command-line arguments

instead, they must be converted from str to bytes; the client arranges this by en-

coding in a generator expression (a call map(str.encode, sys.argv[2:]) would have

the same effect).

data = sockobj.recv(1024)

Reads the next reply line sent by the server program. Technically, this reads up to

1,024 bytes of the next reply message and returns it as a byte string.

sockobj.close()

Closes the connection with the server, sending it the end-of-file signal.

And that’s it. The server exchanges one or more lines of text with each client that

connects. The operating system takes care of locating remote machines, routing bytes

sent between programs and possibly across the Internet, and (with TCP) making sure

that our messages arrive intact. That involves a lot of processing, too—our strings may

ultimately travel around the world, crossing phone wires, satellite links, and more along

the way. But we can be happily ignorant of what goes on beneath the socket call layer

when programming in Python.
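
To see the server and client call sequences side by side, the following sketch runs
both ends in a single process, with the server in a thread. This is a condensed
illustration and not a replacement for Examples 12-1 and 12-2: the echo_once function
name, the loopback address, and the OS-assigned port (port 0) are assumptions made so
that the demo is self-contained, and the server here handles just one client.

```python
import socket
import threading

def echo_once(listener):
    """Server side: accept one client and echo its data until end-of-file."""
    connection, address = listener.accept()
    with connection:                        # closes the connection on exit
        while True:
            data = connection.recv(1024)
            if not data:
                break
            connection.sendall(b'Echo=>' + data)

# Server calls: socket, bind, listen (accept happens in the thread).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))             # port 0: let the OS pick a free port
listener.listen(5)
port = listener.getsockname()[1]
server = threading.Thread(target=echo_once, args=(listener,))
server.start()

# Client calls: socket, connect, send, recv, close.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(('127.0.0.1', port))
client.sendall(b'Hello network world')
reply = client.recv(1024)
print('Client received:', reply)
client.close()                              # end-of-file lets the server loop exit

server.join()
listener.close()
```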

Running Socket Programs Locally

Let’s put this client and server to work. There are two ways to run these scripts—on

either the same machine or two different machines. To run the client and the server on

the same machine, bring up two command-line consoles on your computer, start the

server program in one, and run the client repeatedly in the other. The server keeps

running and responds to requests made each time you run the client script in the other

window.

For instance, here is the text that shows up in the MS-DOS console window where I’ve

started the server script:

C:\...\PP4E\Internet\Sockets> python echo-server.py

Server connected by ('127.0.0.1', 57666)

Server connected by ('127.0.0.1', 57667)

Server connected by ('127.0.0.1', 57668)

The output here gives the address (IP address and port number) of each connecting
client. Like most servers, this one runs perpetually, listening for client connection
requests. This server receives three, but I have to show you the client window’s
text for you to understand what this means:

C:\...\PP4E\Internet\Sockets> python echo-client.py

Client received: b'Echo=>Hello network world'

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost spam Spam SPAM

Client received: b'Echo=>spam'

Client received: b'Echo=>Spam'

Client received: b'Echo=>SPAM'

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost Shrubbery

Client received: b'Echo=>Shrubbery'

Here, I ran the client script three times, while the server script kept running in the other

window. Each client connected to the server, sent it a message of one or more lines of

text, and read back the server’s reply—an echo of each line of text sent from the client.

And each time a client is run, a new connection message shows up in the server’s
window (that’s why we got three). Because the server is coded as an infinite loop, you
may need to kill it with Task Manager on Windows when you’re done testing; a Ctrl-C
in the server’s console window is ignored there, though other platforms may fare better.

It’s important to notice that client and server are running on the same machine here (a

Windows PC). The server and client agree on the port number, but they use the machine

names "" and localhost, respectively, to refer to the computer on which they are run-

ning. In fact, there is no Internet connection to speak of. This is just IPC, of the sort

we saw in Chapter 5: sockets also work well as cross-program communications tools

on a single machine.

Running Socket Programs Remotely

To make these scripts talk over the Internet rather than on a single machine and sample

the broader scope of sockets, we have to do some extra work to run the server on a

different computer. First, upload the server’s source file to a remote machine where

you have an account and a Python. Here’s how I do it with FTP to a site that hosts a

domain name of my own, learning-python.com; most informational lines in the fol-

lowing have been removed, your server name and upload interface details will vary,

and there are other ways to copy files to a computer (e.g., FTP client GUIs, email, web

page post forms, and so on—see “Tips on Using Remote Servers” on page 798 for

hints on accessing remote servers):

C:\...\PP4E\Internet\Sockets> ftp learning-python.com

Connected to learning-python.com.

User (learning-python.com:(none)): xxxxxxxx

Password: yyyyyyyy

ftp> mkdir scripts

ftp> cd scripts

ftp> put echo-server.py

ftp> quit

Once you have the server program loaded on the other computer, you need to run it

there. Connect to that computer and start the server program. I usually Telnet or SSH

into my server machine and start the server program as a perpetually running process

from the command line. The & syntax in Unix/Linux shells can be used to run the server

script in the background; we could also make the server directly executable with a #!

line and a chmod command (see Chapter 3 for details).

Here is the text that shows up in a window on my PC that is running an SSH session
with the free PuTTY client, connected to the Linux server where my account is hosted
(again, less a few deleted informational lines):

XXXXXXXX@learning-python.com's password: yyyyyyyy

Last login: Fri Apr 23 07:46:33 2010 from 72.236.109.185

[...]$ cd scripts

[...]$ python echo-server.py &

[1] 23016

Now that the server is listening for connections on the Net, run the client on your local
computer multiple times again. This time, the client runs on a different machine than
the server, so we pass in the server’s domain name or IP address as a client command-line
argument. The server still uses a machine name of "" because it always listens on
whatever machine it runs on. Here is what shows up in the remote learning-python.com
server’s SSH window on my PC:

[...]$ Server connected by ('72.236.109.185', 57697)

Server connected by ('72.236.109.185', 57698)

Server connected by ('72.236.109.185', 57699)

Server connected by ('72.236.109.185', 57700)

And here is what appears in the Windows console window where I run the client. A

“connected by” message appears in the server SSH window each time the client script

is run in the client window:

C:\...\PP4E\Internet\Sockets> python echo-client.py learning-python.com

Client received: b'Echo=>Hello network world'

C:\...\PP4E\Internet\Sockets> python echo-client.py learning-python.com ni Ni NI

Client received: b'Echo=>ni'

Client received: b'Echo=>Ni'

Client received: b'Echo=>NI'

C:\...\PP4E\Internet\Sockets> python echo-client.py learning-python.com Shrubbery

Client received: b'Echo=>Shrubbery'

The ping command can be used to get an IP address for a machine’s domain name;

either machine name form can be used to connect in the client:

C:\...\PP4E\Internet\Sockets> ping learning-python.com

Pinging learning-python.com [97.74.215.115] with 32 bytes of data:

Reply from 97.74.215.115: bytes=32 time=94ms TTL=47

Ctrl-C

C:\...\PP4E\Internet\Sockets> python echo-client.py 97.74.215.115 Brave Sir Robin

Client received: b'Echo=>Brave'

Client received: b'Echo=>Sir'

Client received: b'Echo=>Robin'

This output is perhaps a bit understated—a lot is happening under the hood. The client,

running on my Windows laptop, connects with and talks to the server program running

on a Linux machine perhaps thousands of miles away. It all happens about as fast as

when client and server both run on the laptop, and it uses the same library calls; only

the server name passed to clients differs.

Though simple, this illustrates one of the major advantages of using sockets for cross-

program communication: they naturally support running the conversing programs on

different machines, with little or no change to the scripts themselves. In the process,

sockets make it easy to decouple and distribute parts of a system over a network when

needed.

Socket pragmatics

Before we move on, there are three practical usage details you should know. First, you

can run the client and server like this on any two Internet-aware machines where Python

is installed. Of course, to run the client and server on different computers, you need

both a live Internet connection and access to another machine on which to run the

server.

This need not be an expensive proposition, though; when sockets are opened, Python

is happy to initiate and use whatever connectivity you have, be it a dedicated T1 line,

wireless router, cable modem, or dial-up account. Moreover, if you don’t have a server

account of your own like the one I’m using on learning-python.com, simply run client

and server examples on the same machine, localhost, as shown earlier; all you need

then is a computer that allows sockets, and most do.

Second, the socket module generally raises exceptions if you ask for something invalid.

For instance, trying to connect to a nonexistent server (or unreachable servers, if you

have no Internet link) fails:

C:\...\PP4E\Internet\Sockets> python echo-client.py www.nonesuch.com hello
Traceback (most recent call last):
  File "echo-client.py", line 24, in <module>
    sockobj.connect((serverHost, serverPort)) # connect to server machine...
socket.error: [Errno 10060] A connection attempt failed because the connected
party did not properly respond after a period of time, or established connection
failed because connected host has failed to respond
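
A client that should survive such failures can catch the exception instead of letting
it end the script. The sketch below is one hedged way to do so: try_connect is a
hypothetical helper, the 3-second timeout is arbitrary, and the demo provokes a failure
by connecting to a local port that was just released, which assumes no other program
grabs that port in the meantime. In Python 3.3 and later, socket.error is an alias for
the built-in OSError.

```python
import socket

def try_connect(host, port, timeout=3.0):
    """Return a connected socket, or None if the connection attempt fails."""
    try:
        return socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:                  # refused, unreachable, timed out, ...
        print('Connect failed:', exc)
        return None

# Find a local port that nothing is listening on: bind to a free port, then
# release it right away (assumption: it stays unused for the next moment).
probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
probe.bind(('127.0.0.1', 0))
unused_port = probe.getsockname()[1]
probe.close()

result = try_connect('127.0.0.1', unused_port)
print('Connected' if result else 'No server available')
```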

Finally, also be sure to kill the server process before restarting it, or else the port
number will still be in use, and you’ll get another exception; on my remote server
machine:

[...]$ ps -x
  PID TTY      STAT   TIME COMMAND
 5378 pts/0    S      0:00 python echo-server.py
22017 pts/0    Ss     0:00 -bash
26805 pts/0    R+     0:00 ps -x
[...]$ python echo-server.py
Traceback (most recent call last):
  File "echo-server.py", line 14, in <module>
    sockobj.bind((myHost, myPort)) # bind it to server port number
socket.error: [Errno 10048] Only one usage of each socket address (protocol/
network address/port) is normally permitted
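
The same port-in-use failure can be reproduced, and handled, inside a single script.
This is a minimal sketch under stated assumptions: both sockets live in one process,
and the first one binds to an OS-assigned port (port 0) on the loopback interface so
that the demo does not depend on port 50007 being free.

```python
import socket

# The first socket claims a port and starts listening on it.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(('127.0.0.1', 0))
first.listen(5)
port = first.getsockname()[1]

# A second bind to the same address and port fails while the first is active.
second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(('127.0.0.1', port))
    in_use = False
except OSError as exc:                      # error text and number vary by platform
    in_use = True
    print('Bind failed:', exc)
finally:
    second.close()

first.close()
```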

A series of Ctrl-Cs will kill the server on Linux (be sure to type fg to bring it to the

foreground first if started with an &):

[...]$ fg
python echo-server.py
Traceback (most recent call last):
  File "echo-server.py", line 18, in <module>
    connection, address = sockobj.accept() # wait for next client connect
KeyboardInterrupt

As mentioned earlier, a Ctrl-C kill key combination won’t kill the server on my
Windows 7 machine, however. To kill the perpetually running server process running
locally on Windows, you may need to start Task Manager (e.g., using a Ctrl-Alt-Delete
key combination), and then end the Python task by selecting it in the process listbox
that appears. Closing the window in which the server is running will also suffice on
Windows, but you’ll lose that window’s command history. You can also usually kill a
server on Linux with a kill -9 pid shell command if it is running in another window
or in the background, but Ctrl-C requires less typing.

Tips on Using Remote Servers

Some of this chapter’s examples run server code on a remote computer. Though you
can also run the examples locally on localhost, remote execution better captures the
flexibility and power of sockets. To run remotely, you’ll need access to an
Internet-accessible computer with Python, where you can upload and run scripts. You’ll
also need to be able to access the remote server from your PC. To help with this last
step, here are a few hints for readers new to using remote servers.

To transfer scripts to a remote machine, the FTP command is standard on Windows

machines and most others. On Windows, simply type it in a console window to connect

to an FTP server or start your favorite FTP client GUI program; on Linux, type the FTP

command in an xterm window. You’ll need to supply your account name and password

to connect to a nonanonymous FTP site. For anonymous FTP, use “anonymous” for

the username and your email address for the password.

To run scripts remotely from a command line, Telnet is a standard command on some
Unix-like machines, too. On Windows, it’s often run as a client GUI. For some server
machines, you’ll need to use SSH secure shell rather than Telnet to access a shell
prompt. There are a variety of SSH utilities available on the Web, including PuTTY,
used for this book. Python itself comes with a telnetlib telnet module (removed from
the standard library as of Python 3.13), and a web search will reveal current SSH
options for Python scripts, including ssh.py, paramiko, Twisted, Pexpect, and even
subprocess.Popen.

Spawning Clients in Parallel

So far, we’ve run a server locally and remotely, and run individual clients manually,

one after another. Realistic servers are generally intended to handle many clients, of

course, and possibly at the same time. To see how our echo server handles the load,

let’s fire up eight copies of the client script in parallel using the script in Exam-

ple 12-3; see the end of Chapter 5 for details on the launchmodes module used here to

spawn clients and alternatives such as the multiprocessing and subprocess modules.

Example 12-3. PP4E\Internet\Sockets\testecho.py

import sys
from PP4E.launchmodes import QuietPortableLauncher

numclients = 8
def start(cmdline):
    QuietPortableLauncher(cmdline, cmdline)()

# start('echo-server.py')              # spawn server locally if not yet started

args = ' '.join(sys.argv[1:])          # pass server name if running remotely
for i in range(numclients):
    start('echo-client.py %s' % args)  # spawn 8? clients to test the server

To run this script, pass no arguments to talk to a server listening on port 50007 on the

local machine; pass a real machine name to talk to a server running remotely. Three

console windows come into play in this scheme—the client, a local server, and a remote

server. On Windows, the clients’ output is discarded when spawned from this script,

but it would be similar to what we’ve already seen. Here’s the client window

interaction—8 clients are spawned locally to talk to both a local and a remote server:

C:\...\PP4E\Internet\Sockets> set PYTHONPATH=C:\...\dev\Examples

C:\...\PP4E\Internet\Sockets> python testecho.py

C:\...\PP4E\Internet\Sockets> python testecho.py learning-python.com

If the spawned clients connect to a server run locally (the first run of the script on the

client), connection messages show up in the server’s window on the local machine:

C:\...\PP4E\Internet\Sockets> python echo-server.py

Server connected by ('127.0.0.1', 57721)

Server connected by ('127.0.0.1', 57722)

Server connected by ('127.0.0.1', 57723)

Server connected by ('127.0.0.1', 57724)

Server connected by ('127.0.0.1', 57725)

Server connected by ('127.0.0.1', 57726)

Server connected by ('127.0.0.1', 57727)

Server connected by ('127.0.0.1', 57728)

If the server is running remotely, the client connection messages instead appear in the

window displaying the SSH (or other) connection to the remote computer, here,

learning-python.com:

[...]$ python echo-server.py

Server connected by ('72.236.109.185', 57729)

Server connected by ('72.236.109.185', 57730)

Server connected by ('72.236.109.185', 57731)

Server connected by ('72.236.109.185', 57732)

Server connected by ('72.236.109.185', 57733)

Server connected by ('72.236.109.185', 57734)

Server connected by ('72.236.109.185', 57735)

Server connected by ('72.236.109.185', 57736)
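The standard subprocess module mentioned above is one alternative to launchmodes for spawning the test clients. The following is only a minimal sketch under that assumption, not the testecho script itself; the spawn_clients function and its parameters are hypothetical names, and the commented usage line assumes echo-client.py sits in the current directory.

```python
import subprocess
import sys

def spawn_clients(argv, count=8):
    """Start `count` copies of a command in parallel; return their exit codes."""
    procs = [subprocess.Popen(argv, stdout=subprocess.DEVNULL)   # discard each client's output
             for _ in range(count)]
    return [proc.wait() for proc in procs]                       # wait for every client to finish

# Hypothetical usage, assuming echo-client.py is in the current directory:
# spawn_clients([sys.executable, 'echo-client.py'] + sys.argv[1:])
```

Because every Popen call returns immediately, all clients are running before the first wait begins, which is what lets them reach the server at roughly the same time.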

Preview: Denied client connections

The net effect is that our echo server converses with multiple clients, whether running

locally or remotely. Keep in mind, however, that this works for our simple scripts only

because the server doesn’t take a long time to respond to each client’s requests—it can

get back to the top of the server script’s outer while loop in time to process the next

incoming client. If it could not, we would probably need to change the server to handle

each client in parallel, or some might be denied a connection.

Technically, client connections would fail after 5 clients are already waiting for the

server’s attention, as specified in the server’s listen call. To prove this to yourself, add

a time.sleep call somewhere inside the echo server’s main loop in Example 12-1 after

a connection is accepted, to simulate a long-running task (this is from file echo-server-

sleep.py in the examples package if you wish to experiment):

while True:                                   # listen until process killed
    connection, address = sockobj.accept()    # wait for next client connect
    while True:
        data = connection.recv(1024)          # read next line on client socket
        time.sleep(3)                         # take time to process request
        ...

If you then run this server and the testecho clients script, you’ll notice that not all 8

clients wind up receiving a connection, because the server is too busy to empty its

pending-connections queue in time. Only 6 clients are served when I run this on Win-

dows—one accepted initially, and 5 in the pending-requests listen queue. The other

two clients are denied connections and fail.

The following shows the server and client messages produced when the server is stalled

this way, including the error messages that the two denied clients receive. To see the

clients’ messages on Windows, you can change testecho to use the StartArgs launcher

with a /B switch at the front of the command line to route messages to the persistent

console window (see file testecho-messages.py in the examples package):

C:\...\PP4E\dev\Examples\PP4E\Internet\Sockets> echo-server-sleep.py

Server connected by ('127.0.0.1', 59625)

Server connected by ('127.0.0.1', 59626)

Server connected by ('127.0.0.1', 59627)

Server connected by ('127.0.0.1', 59628)

Server connected by ('127.0.0.1', 59629)

Server connected by ('127.0.0.1', 59630)

C:\...\PP4E\dev\Examples\PP4E\Internet\Sockets> testecho-messages.py

/B echo-client.py

Client received: b'Echo=>Hello network world'

Traceback (most recent call last):

File "C:\...\PP4E\Internet\Sockets\echo-client.py", line 24, in <module>

sockobj.connect((serverHost, serverPort)) # connect to server machine...

socket.error: [Errno 10061] No connection could be made because the target

machine actively refused it

Traceback (most recent call last):

File "C:\...\PP4E\Internet\Sockets\echo-client.py", line 24, in <module>

sockobj.connect((serverHost, serverPort)) # connect to server machine...

socket.error: [Errno 10061] No connection could be made because the target

machine actively refused it

Client received: b'Echo=>Hello network world'

As you can see, with such a sleepy server, 8 clients are spawned, but only 6 receive

service, and 2 fail with exceptions. Unless clients require very little of the server’s at-

tention, to handle multiple requests overlapping in time we need to somehow service

clients in parallel. We’ll see how servers can handle multiple clients more robustly in

a moment; first, though, let’s experiment with some special ports.
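The denied clients above end with an uncaught exception. As a hedged sketch only, a client could instead catch the refusal and report it; the try_connect name is hypothetical, and the code assumes Python 3.3 or later, where a refused connection is raised as ConnectionRefusedError (a subclass of OSError).

```python
import socket

def try_connect(host, port, timeout=2.0):
    """Return a connected socket, or None if the server refuses or times out."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
    except (ConnectionRefusedError, socket.timeout) as exc:
        print('Connection failed:', exc)       # report instead of dying with a traceback
        sock.close()
        return None
    return sock
```

A caller can then decide whether to retry later or give up, rather than leaving that decision to an unhandled traceback.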

Talking to Reserved Ports

It’s also important to know that this client and server engage in a proprietary sort of

discussion, and so use the port number 50007 outside the range reserved for standard

protocols (0 to 1023). There’s nothing preventing a client from opening a socket on

one of these special ports, however. For instance, the following client-side code con-

nects to programs listening on the standard email, FTP, and HTTP web server ports

on three different server machines:

C:\...\PP4E\Internet\Sockets> python

>>> from socket import *

>>> sock = socket(AF_INET, SOCK_STREAM)

>>> sock.connect(('pop.secureserver.net', 110)) # talk to POP email server

>>> print(sock.recv(70))

b'+OK <14654.1272040794@p3pop01-09.prod.phx3.gdg>\r\n'

>>> sock.close()

>>> sock = socket(AF_INET, SOCK_STREAM)

>>> sock.connect(('learning-python.com', 21)) # talk to FTP server

>>> print(sock.recv(70))

b'220---------- Welcome to Pure-FTPd [privsep] [TLS] ----------\r\n220-You'

>>> sock.close()

>>> sock = socket(AF_INET, SOCK_STREAM)

>>> sock.connect(('www.python.net', 80)) # talk to Python's HTTP server

>>> sock.send(b'GET /\r\n') # fetch root page reply

>>> sock.recv(70)

b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\r\n "http://'

>>> sock.recv(70)

b'www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\r\n<html xmlns="http://www.'

If we know how to interpret the output returned by these ports’ servers, we could use

raw sockets like this to fetch email, transfer files, and grab web pages and invoke server-

side scripts. Fortunately, though, we don’t have to worry about all the underlying de-

tails—Python’s poplib, ftplib, and http.client and urllib.request modules provide

higher-level interfaces for talking to servers on these ports. Other Python protocol

modules do the same for other standard ports (e.g., NNTP, Telnet, and so on). We’ll

meet some of these client-side protocol modules in the next chapter.§
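To make the contrast between raw sockets and protocol modules concrete without depending on a remote site, the sketch below talks to a tiny local web server twice: once by hand over a socket, and once through http.client. It is an illustration under stated assumptions, not code from this book's examples package; HelloHandler, fetch_raw, and fetch_with_module are hypothetical names, and the server is the standard library's http.server running on an OS-chosen port.

```python
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Tiny stand-in web server so the demo needs no network access."""
    def do_GET(self):
        body = b'Hello from port test'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):              # keep the console quiet
        pass

def fetch_raw(host, port):
    """Speak HTTP by hand over a plain socket; return all reply bytes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(b'GET / HTTP/1.0\r\n\r\n')
        chunks = []
        while True:
            data = sock.recv(1024)
            if not data:
                break                          # server closed: end of reply
            chunks.append(data)
    return b''.join(chunks)

def fetch_with_module(host, port):
    """Let http.client handle the protocol details; return (status, body)."""
    conn = http.client.HTTPConnection(host, port)
    conn.request('GET', '/')
    reply = conn.getresponse()
    body = reply.read()
    conn.close()
    return reply.status, body

server = HTTPServer(('127.0.0.1', 0), HelloHandler)    # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address
raw_reply = fetch_raw(host, port)              # status line, headers, and body, all as raw bytes
status, body = fetch_with_module(host, port)   # already parsed into status code and body
server.shutdown()
server.server_close()
```

The raw version must build the request line itself and would still have to parse the status line and headers out of the reply; the module version returns them already separated.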

Binding reserved port servers

Speaking of reserved ports, it’s all right to open client-side connections on reserved

ports as in the prior section, but you can’t install your own server-side scripts for these

ports unless you have special permission. On the server I use to host learning-

python.com, for instance, the web server port 80 is off limits (presumably, unless I shell

out for a virtual or dedicated hosting account):

[...]$ python

>>> from socket import *

>>> sock = socket(AF_INET, SOCK_STREAM) # try to bind web port on general server

>>> sock.bind(('', 80)) # learning-python.com is a shared machine

Traceback (most recent call last):

File "<stdin>", line 1, in

File "<string>", line 1, in bind

socket.error: (13, 'Permission denied')

Even if run by a user with the required permission, you’ll get the different exception

we saw earlier if the port is already being used by a real web server. On computers being

used as general servers, these ports really are reserved. This is one reason we’ll run a

web server of our own locally for testing when we start writing server-side scripts later

in this book—the above code works on a Windows PC, which allows us to experiment

with websites locally, on a self-contained machine:

C:\...\PP4E\Internet\Sockets> python

>>> from socket import *

>>> sock = socket(AF_INET, SOCK_STREAM) # can bind port 80 on Windows

>>> sock.bind(('', 80)) # allows running server on localhost

>>>

We’ll learn more about installing web servers later in Chapter 15. For the purposes of

this chapter, we need to get realistic about how our socket servers handle their clients.
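Since binding a reserved port may succeed or fail depending on the platform and the user's permissions, a script can guard the bind call. The following is a minimal sketch, assuming Python 3.3 or later (where permission and address-in-use failures are both OSError subclasses); bind_server_socket is a hypothetical helper name, and falling back to port 0 is just one possible policy.

```python
import socket

def bind_server_socket(preferred_port=80):
    """Bind the preferred port if allowed, else fall back to an OS-chosen free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(('', preferred_port))
    except OSError as exc:                     # permission denied and address in use both land here
        print('Cannot bind port', preferred_port, '-', exc)
        sock.bind(('', 0))                     # port 0 asks the OS for any free port
    return sock

sock = bind_server_socket(80)
print('Bound to port', sock.getsockname()[1])
sock.close()
```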

§ You might be interested to know that the last part of this example, talking to port 80, is exactly what your web browser does as you surf the Web: followed links direct it to download web pages over this port. In fact, this lowly port is the primary basis of the Web. In Chapter 15, we will meet an entire application environment based upon sending formatted data over port 80: CGI server-side scripting. At the bottom, though, the Web is just bytes over sockets, with a user interface. The wizard behind the curtain is not as impressive as he may seem!

Handling Multiple Clients

The echo client and server programs shown previously serve to illustrate socket fundamentals. But the server model used suffers from a fairly major flaw. As described earlier, if multiple clients try to connect to the server, and it takes a long time to process a given client’s request, the server will fail. More accurately, if the cost of handling a given request prevents the server from returning to the code that checks for new clients in a timely manner, it won’t be able to keep up with all the requests, and some clients will eventually be denied connections.

In real-world client/server programs, it’s far more typical to code a server so as to avoid

blocking new requests while handling a current client’s request. Perhaps the easiest

way to do so is to service each client’s request in parallel—in a new process, in a new

thread, or by manually switching (multiplexing) between clients in an event loop. This

isn’t a socket issue per se, and we already learned how to start processes and threads

in Chapter 5. But since these schemes are so typical of socket server programming, let’s

explore all three ways to handle client requests in parallel here.

Forking Servers

The script in Example 12-4 works like the original echo server, but instead forks a new

process to handle each new client connection. Because the handleClient function runs

in a new process, the dispatcher function can immediately resume its main loop in

order to detect and service a new incoming request.

Example 12-4. PP4E\Internet\Sockets\fork-server.py

"""
Server side: open a socket on a port, listen for a message from a client,
and send an echo reply; forks a process to handle each client connection;
child processes share parent's socket descriptors; fork is less portable
than threads--not yet on Windows, unless Cygwin or similar installed;
"""

import os, time, sys
from socket import *                             # get socket constructor and constants
myHost = ''                                      # server machine, '' means local host
myPort = 50007                                   # listen on a non-reserved port number

sockobj = socket(AF_INET, SOCK_STREAM)           # make a TCP socket object
sockobj.bind((myHost, myPort))                   # bind it to server port number
sockobj.listen(5)                                # allow 5 pending connects

def now():                                       # current time on server
    return time.ctime(time.time())

activeChildren = []
def reapChildren():                              # reap any dead child processes
    while activeChildren:                        # else may fill up system table
        pid, stat = os.waitpid(0, os.WNOHANG)    # don't hang if no child exited
        if not pid: break
        activeChildren.remove(pid)

def handleClient(connection):                    # child process: reply, exit
    time.sleep(5)                                # simulate a blocking activity
    while True:                                  # read, write a client socket
        data = connection.recv(1024)             # till eof when socket closed
        if not data: break
        reply = 'Echo=>%s at %s' % (data, now())
        connection.send(reply.encode())
    connection.close()
    os._exit(0)

def dispatcher():                                # listen until process killed
    while True:                                  # wait for next connection,
        connection, address = sockobj.accept()   # pass to process for service
        print('Server connected by', address, end=' ')
        print('at', now())
        reapChildren()                           # clean up exited children now
        childPid = os.fork()                     # copy this process
        if childPid == 0:                        # if in child process: handle
            handleClient(connection)
        else:                                    # else: go accept next connect
            activeChildren.append(childPid)      # add to active child pid list

dispatcher()

Running the forking server

Parts of this script are a bit tricky, and most of its library calls work only on Unix-like

platforms. Crucially, it runs on Cygwin Python on Windows, but not standard Win-

dows Python. Before we get into too many forking details, though, let’s focus on how

this server arranges to handle multiple client requests.

First, notice that to simulate a long-running operation (e.g., database updates, other

network traffic), this server adds a five-second time.sleep delay in its client handler

function, handleClient. After the delay, the original echo reply action is performed.

That means that when we run a server and clients this time, clients won’t receive the

echo reply until five seconds after they’ve sent their requests to the server.

To help keep track of requests and replies, the server prints its system time each time

a client connect request is received, and adds its system time to the reply. Clients print

the reply time sent back from the server, not their own—clocks on the server and client

may differ radically, so to compare apples to apples, all times are server times. Because

of the simulated delays, we also must usually start each client in its own console window

on Windows (clients will hang in a blocked state while waiting for their reply).

But the grander story here is that this script runs one main parent process on the server

machine, which does nothing but watch for connections (in dispatcher), plus one child

process per active client connection, running in parallel with both the main parent

process and the other client processes (in handleClient). In principle, the server can

handle any number of clients without bogging down.

To test, let’s first start the server remotely in an SSH or Telnet window, and start three

clients locally in three distinct console windows. As we’ll see in a moment, this server

can also be run under Cygwin locally if you have Cygwin but don’t have a remote server

account like the one on learning-python.com used here:

[server window (SSH or Telnet)]

[...]$ uname -p -o

i686 GNU/Linux

[...]$ python fork-server.py

Server connected by ('72.236.109.185', 58395) at Sat Apr 24 06:46:45 2010

Server connected by ('72.236.109.185', 58396) at Sat Apr 24 06:46:49 2010

Server connected by ('72.236.109.185', 58397) at Sat Apr 24 06:46:51 2010

[client window 1]

C:\...\PP4E\Internet\Sockets> python echo-client.py learning-python.com

Client received: b"Echo=>b'Hello network world' at Sat Apr 24 06:46:50 2010"

[client window 2]

C:\...\PP4E\Internet\Sockets> python echo-client.py learning-python.com Bruce

Client received: b"Echo=>b'Bruce' at Sat Apr 24 06:46:54 2010"

[client window 3]

C:\...\Sockets> python echo-client.py learning-python.com The Meaning of Life

Client received: b"Echo=>b'The' at Sat Apr 24 06:46:56 2010"

Client received: b"Echo=>b'Meaning' at Sat Apr 24 06:46:56 2010"

Client received: b"Echo=>b'of' at Sat Apr 24 06:46:56 2010"

Client received: b"Echo=>b'Life' at Sat Apr 24 06:46:57 2010"

Again, all times here are on the server machine. This may be a little confusing because

four windows are involved. In plain English, the test proceeds as follows:

1. The server starts running remotely.

2. All three clients are started and connect to the server a few seconds apart.

3. On the server, the client requests trigger three forked child processes, which all

immediately go to sleep for five seconds (to simulate being busy doing something

useful).

4. Each client waits until the server replies, which happens five seconds after their

initial requests.

In other words, clients are serviced at the same time by forked processes, while the main

parent process continues listening for new client requests. If clients were not handled

in parallel like this, no client could connect until the currently connected client’s five-

second delay expired.

In a more realistic application, that delay could be fatal if many clients were trying to

connect at once—the server would be stuck in the action we’re simulating with

time.sleep, and not get back to the main loop to accept new client requests. With

process forks per request, clients can be serviced in parallel.

Notice that we’re using the same client script here (echo-client.py, from Example 12-2), just a different server; clients simply send and receive data to a machine and port and don’t care how their requests are handled on the server. The result displayed shows a byte string within a byte string, because the client sends one to the server and the server sends one back; because the server uses string formatting and manual encoding instead of byte string concatenation, the client’s message is shown explicitly as a byte string here.
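The nested byte string is easy to reproduce in isolation. This short sketch, which is an illustration rather than part of the server, compares string formatting followed by encoding against two alternatives; the variable names are hypothetical.

```python
data = b'Hello network world'                     # what recv returns: a bytes object

nested = ('Echo=>%s' % data).encode()             # %s formats the bytes object's printed form
plain = b'Echo=>' + data                          # bytes concatenation keeps the raw content
decoded = ('Echo=>%s' % data.decode()).encode()   # or decode first, then format and encode

print(nested)     # b"Echo=>b'Hello network world'"
print(plain)      # b'Echo=>Hello network world'
print(decoded)    # b'Echo=>Hello network world'
```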

Other run modes: Local servers with Cygwin and remote clients

Also note that the server is running remotely on a Linux machine in the preceding

section. As we learned in Chapter 5, the fork call is not supported on Windows in

standard Python at the time this book was written. It does run on Cygwin Python,

though, which allows us to start this server locally on localhost, on the same machine

as its clients:

[Cygwin shell window]

[C:\...\PP4E\Internet\Sockets]$ python fork-server.py

Server connected by ('127.0.0.1', 58258) at Sat Apr 24 07:50:15 2010

Server connected by ('127.0.0.1', 58259) at Sat Apr 24 07:50:17 2010

[Windows console, same machine]

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost bright side of life

Client received: b"Echo=>b'bright' at Sat Apr 24 07:50:20 2010"

Client received: b"Echo=>b'side' at Sat Apr 24 07:50:20 2010"

Client received: b"Echo=>b'of' at Sat Apr 24 07:50:20 2010"

Client received: b"Echo=>b'life' at Sat Apr 24 07:50:20 2010"

[Windows console, same machine]

C:\...\PP4E\Internet\Sockets> python echo-client.py

Client received: b"Echo=>b'Hello network world' at Sat Apr 24 07:50:22 2010"

We can also run this test on the remote Linux server entirely, with two SSH or Telnet

windows. It works about the same as when clients are started locally, in a DOS console

window, but here “local” actually means a remote machine you’re using locally. Just

for fun, let’s also contact the remote server from a locally running client to show how

the server is also available to the Internet at large—when servers are coded with sockets

and forks this way, clients can connect from arbitrary machines, and can overlap arbi-

trarily in time:

[one SSH (or Telnet) window]

[...]$ python fork-server.py

Server connected by ('127.0.0.1', 55743) at Sat Apr 24 07:15:14 2010

Server connected by ('127.0.0.1', 55854) at Sat Apr 24 07:15:26 2010

Server connected by ('127.0.0.1', 55950) at Sat Apr 24 07:15:36 2010

Server connected by ('72.236.109.185', 58414) at Sat Apr 24 07:19:50 2010

[another SSH window, same machine]

[...]$ python echo-client.py

Client received: b"Echo=>b'Hello network world' at Sat Apr 24 07:15:19 2010"

[...]$ python echo-client.py localhost niNiNI!

Client received: b"Echo=>b'niNiNI!' at Sat Apr 24 07:15:31 2010"

[...]$ python echo-client.py localhost Say no more!

Client received: b"Echo=>b'Say' at Sat Apr 24 07:15:41 2010"

Client received: b"Echo=>b'no' at Sat Apr 24 07:15:41 2010"

Client received: b"Echo=>b'more!' at Sat Apr 24 07:15:41 2010"

[Windows console, local machine]

C:\...\Internet\Sockets> python echo-client.py learning-python.com Blue, no yellow!

Client received: b"Echo=>b'Blue,' at Sat Apr 24 07:19:55 2010"

Client received: b"Echo=>b'no' at Sat Apr 24 07:19:55 2010"

Client received: b"Echo=>b'yellow!' at Sat Apr 24 07:19:55 2010"

Now that we have a handle on the basic model, let’s move on to the tricky bits. This

server script is fairly straightforward as forking code goes, but a few words about the

library tools it employs are in order.

Forked processes and sockets

We met os.fork in Chapter 5, but recall that forked processes are essentially a copy of

the process that forks them, and so they inherit file and socket descriptors from their

parent process. As a result, the new child process that runs the handleClient function

has access to the connection socket created in the parent process. Really, this is why

the child process works at all—when conversing on the connected socket, it’s using

the same socket that parent’s accept call returns. Programs know they are in a forked

child process if the fork call returns 0; otherwise, the original parent process gets back

the new child’s ID.
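Descriptor inheritance can be seen in a few lines on a Unix-like platform. In this hedged sketch, a connected socket pair stands in for the connection returned by accept, so no server or port is needed; echo_via_forked_child is a hypothetical name, and the code assumes os.fork is available.

```python
import os
import socket

def echo_via_forked_child(message):
    """Parent creates the sockets; the forked child replies over its inherited copy."""
    parent_end, child_end = socket.socketpair()   # connected pair stands in for an accepted connection
    pid = os.fork()
    if pid == 0:                                  # fork returned 0: this is the child
        parent_end.close()
        data = child_end.recv(1024)               # same socket object the parent created
        child_end.sendall(b'Echo=>' + data)
        child_end.close()
        os._exit(0)                               # leave without returning into the parent's code
    child_end.close()                             # parent: pid is the new child's ID
    parent_end.sendall(message)
    reply = parent_end.recv(1024)
    parent_end.close()
    os.waitpid(pid, 0)                            # collect the child's exit status
    return reply

print(echo_via_forked_child(b'spam'))
```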

Exiting from children

In earlier fork examples, child processes usually call one of the exec variants to start a

new program in the child process. Here, instead, the child process simply calls a func-

tion in the same program and exits with os._exit. It’s imperative to call os._exit here—

if we did not, each child would live on after handleClient returns, and compete for

accepting new client requests.

In fact, without the exit call, we’d wind up with as many perpetual server processes as

requests served—remove the exit call and do a ps shell command after running a few

clients, and you’ll see what I mean. With the call, only the single parent process listens

for new requests. os._exit is like sys.exit, but it exits the calling process immediately

without cleanup actions. It’s normally used only in child processes, and sys.exit is

used everywhere else.
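The difference between the two exit calls can be observed directly. The sketch below is an illustration under stated assumptions (a Unix-like platform with os.fork; run_child is a hypothetical name): each child calls one of the exit functions inside a try statement and reports through a pipe whether its finally clause ran.

```python
import os
import sys

def run_child(exit_call):
    """Fork a child that calls exit_call inside try/finally; return what it wrote to a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:                                    # child process
        os.close(read_fd)
        try:
            try:
                exit_call(0)
            finally:
                os.write(write_fd, b'cleanup ran')  # reached only if the exit call raised an exception
        finally:
            os._exit(0)                             # make sure the child never returns to the caller
    os.close(write_fd)                              # parent process
    output = os.read(read_fd, 100)                  # empty bytes at end-of-file if nothing was written
    os.close(read_fd)
    os.waitpid(pid, 0)
    return output

print(run_child(sys.exit))    # sys.exit raises SystemExit, so the finally clause runs
print(run_child(os._exit))    # os._exit ends the process at once; nothing is written
```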

Killing the zombies: Don’t fear the reaper!

Note, however, that it’s not quite enough to make sure that child processes exit and

die. On systems like Linux, though not on Cygwin, parents must also be sure to issue

a wait system call to remove the entries for dead child processes from the system’s

process table. If we don’t do this, the child processes will no longer run, but they will

consume an entry in the system process table. For long-running servers, these bogus

entries may become problematic.

It’s common to call such dead-but-listed child processes zombies: they continue to use

system resources even though they’ve already passed over to the great operating system

beyond. To clean up after child processes are gone, this server keeps a list,

activeChildren, of the process IDs of all child processes it spawns. Whenever a new

incoming client request is received, the server runs its reapChildren to issue a wait for

any dead children by issuing the standard Python os.waitpid(0,os.WNOHANG) call.

The os.waitpid call attempts to wait for a child process to exit and returns its process

ID and exit status. With a 0 for its first argument, it waits for any child process. With

the WNOHANG parameter for its second, it does nothing if no child process has exited (i.e.,

it does not block or pause the caller). The net effect is that this call simply asks the

operating system for the process ID of any child that has exited. If any have, the process

ID returned is removed both from the system process table and from this script’s

activeChildren list.
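The nonblocking wait can be tried on its own, apart from the server. This is a minimal sketch assuming a Unix-like platform; reap_exited and demo are hypothetical names, and the polling loop exists only so the demonstration finishes once the child has been collected.

```python
import os
import time

def reap_exited(active):
    """Remove already-exited children from the `active` pid list without blocking."""
    while active:
        pid, status = os.waitpid(0, os.WNOHANG)   # 0: any child in this process group; WNOHANG: don't block
        if not pid:
            break                                 # children remain, but none has exited yet
        if pid in active:
            active.remove(pid)

def demo():
    active = []
    pid = os.fork()
    if pid == 0:
        os._exit(0)                               # child exits at once, leaving a zombie entry
    active.append(pid)
    for _ in range(50):                           # poll for up to about five seconds
        reap_exited(active)
        if not active:
            break
        time.sleep(0.1)
    return active

print('children left:', demo())
```

A pid of 0 returned by os.waitpid means that children still exist but none has exited, which is why the loop simply stops and tries again later.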

To see why all this complexity is needed, comment out the reapChildren call in this

script, run it on a platform where this is an issue, and then run a few clients. On my

Linux server, a ps -f full process listing command shows that all the dead child

processes stay in the system process table (shown as <defunct>):

[...]$ ps -f

UID PID PPID C STIME TTY TIME CMD

5693094 9990 30778 0 04:34 pts/0 00:00:00 python fork-server.py

5693094 10844 9990 0 04:35 pts/0 00:00:00 [python] <defunct>

5693094 10869 9990 0 04:35 pts/0 00:00:00 [python] <defunct>

5693094 11130 9990 0 04:36 pts/0 00:00:00 [python] <defunct>

5693094 11151 9990 0 04:36 pts/0 00:00:00 [python] <defunct>

5693094 11482 30778 0 04:36 pts/0 00:00:00 ps -f

5693094 30778 30772 0 04:23 pts/0 00:00:00 -bash

When the reapChildren command is reactivated, dead child zombie entries are cleaned

up each time the server gets a new client connection request, by calling the Python

os.waitpid function. A few zombies may accumulate if the server is heavily loaded, but

they will remain only until the next client connection is received (you get only as many

zombies as processes served in parallel since the last accept):

[...]$ python fork-server.py &

[1] 20515

[...]$ ps -f

UID PID PPID C STIME TTY TIME CMD

5693094 20515 30778 0 04:43 pts/0 00:00:00 python fork-server.py

5693094 20777 30778 0 04:43 pts/0 00:00:00 ps -f

5693094 30778 30772 0 04:23 pts/0 00:00:00 -bash

[...]$

Server connected by ('72.236.109.185', 58672) at Sun Apr 25 04:43:51 2010

Server connected by ('72.236.109.185', 58673) at Sun Apr 25 04:43:54 2010

[...]$ ps -f

UID PID PPID C STIME TTY TIME CMD

5693094 20515 30778 0 04:43 pts/0 00:00:00 python fork-server.py

5693094 21339 20515 0 04:43 pts/0 00:00:00 [python] <defunct>

5693094 21398 20515 0 04:43 pts/0 00:00:00 [python] <defunct>

5693094 21573 30778 0 04:44 pts/0 00:00:00 ps -f

5693094 30778 30772 0 04:23 pts/0 00:00:00 -bash

[...]$

Server connected by ('72.236.109.185', 58674) at Sun Apr 25 04:44:07 2010

[...]$ ps -f

UID PID PPID C STIME TTY TIME CMD

5693094 20515 30778 0 04:43 pts/0 00:00:00 python fork-server.py

5693094 21646 20515 0 04:44 pts/0 00:00:00 [python] <defunct>

5693094 21813 30778 0 04:44 pts/0 00:00:00 ps -f

5693094 30778 30772 0 04:23 pts/0 00:00:00 -bash

In fact, if you type fast enough, you can actually see a child process morph from a real

running program into a zombie. Here, for example, a child spawned to handle a new

request changes to <defunct> on exit. Its connection cleans up lingering zombies, and

its own process entry will be removed completely when the next request is received:

[...]$

Server connected by ('72.236.109.185', 58676) at Sun Apr 25 04:48:22 2010

[...]$ ps -f

UID PID PPID C STIME TTY TIME CMD

5693094 20515 30778 0 04:43 pts/0 00:00:00 python fork-server.py

5693094 27120 20515 0 04:48 pts/0 00:00:00 python fork-server.py

5693094 27174 30778 0 04:48 pts/0 00:00:00 ps -f

5693094 30778 30772 0 04:23 pts/0 00:00:00 -bash

[...]$ ps -f

UID PID PPID C STIME TTY TIME CMD

5693094 20515 30778 0 04:43 pts/0 00:00:00 python fork-server.py

5693094 27120 20515 0 04:48 pts/0 00:00:00 [python] <defunct>

5693094 27234 30778 0 04:48 pts/0 00:00:00 ps -f

5693094 30778 30772 0 04:23 pts/0 00:00:00 -bash

Preventing zombies with signal handlers on Linux

On some systems, it’s also possible to clean up zombie child processes by resetting the

signal handler for the SIGCHLD signal delivered to a parent process by the operating

system when a child process stops or exits. If a Python script assigns the SIG_IGN (ignore)

action as the SIGCHLD signal handler, zombies will be removed automatically and im-

mediately by the operating system as child processes exit; the parent need not issue

wait calls to clean up after them. Because of that, this scheme is a simpler alternative

to manually reaping zombies on platforms where it is supported.
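One way to check whether a platform supports this scheme is to ignore SIGCHLD, fork a short-lived child, and then try to wait for it. The sketch below is an assumption-laden illustration rather than a portable test: child_is_auto_reaped is a hypothetical name, and the behavior shown is what Linux documents for an ignored SIGCHLD, where the wait fails because no exit status was kept.

```python
import os
import signal

def child_is_auto_reaped():
    """Return True if an exited child leaves no zombie when SIGCHLD is ignored."""
    previous = signal.signal(signal.SIGCHLD, signal.SIG_IGN)   # ask the OS to discard child exit status
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)                        # child exits immediately
        try:
            os.waitpid(pid, 0)                 # blocks until the child is gone
        except ChildProcessError:
            return True                        # nothing left to wait for: no zombie was kept
        return False                           # a status was returned: the platform kept the zombie
    finally:
        if previous is not None:
            signal.signal(signal.SIGCHLD, previous)   # restore the earlier handler

print('auto-reaped:', child_is_auto_reaped())
```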

If you’ve already read Chapter 5, you know that Python’s standard signal module lets

scripts install handlers for signals—software-generated events. By way of review, here

is a brief bit of background to show how this pans out for zombies. The program in

Example 12-5 installs a Python-coded signal handler function to respond to whatever

signal number you type on the command line.

Example 12-5. PP4E\Internet\Sockets\signal-demo.py

"""
Demo Python's signal module; pass signal number as a command-line arg, and use
a "kill -N pid" shell command to send this process a signal; on my Linux machine,
SIGUSR1=10, SIGUSR2=12, SIGCHLD=17, and SIGCHLD handler stays in effect even if
not restored: all other handlers are restored by Python after caught, but SIGCHLD
behavior is left to the platform's implementation; signal works on Windows too,
but defines only a few signal types; signals are not very portable in general;
"""

import sys, signal, time

def now():
    return time.asctime()

def onSignal(signum, stackframe):                # Python signal handler
    print('Got signal', signum, 'at', now())     # most handlers stay in effect
    if signum == signal.SIGCHLD:                 # but sigchld handler is not
        print('sigchld caught')
        #signal.signal(signal.SIGCHLD, onSignal)

signum = int(sys.argv[1])
signal.signal(signum, onSignal)                  # install signal handler
while True: signal.pause()                       # sleep waiting for signals

To run this script, simply put it in the background and send it signals by typing the kill -signal-number process-id shell command line; this is the shell’s equivalent of Python’s os.kill function available on Unix-like platforms only. Process IDs are listed in the PID column of ps command results. Here is this script in action catching signal number 10 (reserved for general use) and then being terminated by signal number 9 (the terminate signal, which a process cannot catch):

[...]$ python signal-demo.py 10 &

[1] 10141

[...]$ ps -f

UID PID PPID C STIME TTY TIME CMD

5693094 10141 30778 0 05:00 pts/0 00:00:00 python signal-demo.py 10

5693094 10228 30778 0 05:00 pts/0 00:00:00 ps -f

5693094 30778 30772 0 04:23 pts/0 00:00:00 -bash

[...]$ kill -10 10141

Got signal 10 at Sun Apr 25 05:00:31 2010

[...]$ kill -10 10141

Got signal 10 at Sun Apr 25 05:00:34 2010

[...]$ kill -9 10141

[1]+ Killed python signal-demo.py 10

In the following, the script catches signal 17, which happens to be SIGCHLD on my Linux server. Signal numbers vary from machine to machine, so you should normally use their names, not their numbers. SIGCHLD behavior may vary per platform as well. On my Cygwin install, for example, signal 10 can have a different meaning, and signal 20 is SIGCHLD; there, the script works as shown on Linux here for signal 10, but generates an exception if it tries to install a handler for signal 17 (and Cygwin doesn’t require reaping in any event). See the signal module’s library manual entry for more details:

[...]$ python signal-demo.py 17 &

[1] 11592

[...]$ ps -f

UID PID PPID C STIME TTY TIME CMD

5693094 11592 30778 0 05:00 pts/0 00:00:00 python signal-demo.py 17

5693094 11728 30778 0 05:01 pts/0 00:00:00 ps -f

5693094 30778 30772 0 04:23 pts/0 00:00:00 -bash

[...]$ kill -17 11592

Got signal 17 at Sun Apr 25 05:01:28 2010

sigchld caught

[...]$ kill -17 11592

Got signal 17 at Sun Apr 25 05:01:35 2010

sigchld caught

[...]$ kill -9 11592

[1]+ Killed python signal-demo.py 17
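Because signal numbers differ across machines, code is more portable when it refers to signals by name. The following sketch is an illustration only, assuming a Unix-like platform and Python 3.5 or later (for the signal.Signals enumeration); on_signal and received are hypothetical names, and the process sends the signal to itself so no kill command is needed.

```python
import os
import signal
import time

received = []

def on_signal(signum, stackframe):
    received.append(signal.Signals(signum).name)    # map the number back to its name

signal.signal(signal.SIGUSR1, on_signal)            # use the name, not a hardcoded number
os.kill(os.getpid(), signal.SIGUSR1)                # send the signal to this same process
time.sleep(0.1)                                     # give the handler a moment to run
signal.signal(signal.SIGUSR1, signal.SIG_DFL)       # put the default action back
print(received)
```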

Now, to apply all of this signal knowledge to killing zombies, simply set the SIGCHLD

signal handler to the SIG_IGN ignore handler action; on systems where this assignment

is supported, child processes will be cleaned up when they exit. The forking server

variant shown in Example 12-6 uses this trick to manage its children.

Example 12-6. PP4E\Internet\Sockets\fork-server-signal.py

"""
Same as fork-server.py, but use the Python signal module to avoid keeping
child zombie processes after they terminate, instead of an explicit reaper
loop before each new connection; SIG_IGN means ignore, and may not work with
SIGCHLD child exit signal on all platforms; see Linux documentation for more
about the restartability of a socket.accept call interrupted with a signal;
"""

import os, time, sys, signal
from socket import *                             # get socket constructor and constants
myHost = ''                                      # server machine, '' means local host
myPort = 50007                                   # listen on a non-reserved port number

sockobj = socket(AF_INET, SOCK_STREAM)           # make a TCP socket object
sockobj.bind((myHost, myPort))                   # bind it to server port number
sockobj.listen(5)                                # up to 5 pending connects
signal.signal(signal.SIGCHLD, signal.SIG_IGN)    # avoid child zombie processes

def now():                                       # time on server machine
    return time.ctime(time.time())

def handleClient(connection):                    # child process replies, exits
    time.sleep(5)                                # simulate a blocking activity
    while True:                                  # read, write a client socket
        data = connection.recv(1024)
        if not data: break
        reply = 'Echo=>%s at %s' % (data, now())
        connection.send(reply.encode())
    connection.close()
    os._exit(0)

def dispatcher():                                # listen until process killed
    while True:                                  # wait for next connection,
        connection, address = sockobj.accept()   # pass to process for service
        print('Server connected by', address, end=' ')
        print('at', now())
        childPid = os.fork()                     # copy this process
        if childPid == 0:                        # if in child process: handle
            handleClient(connection)             # else: go accept next connect

dispatcher()

Where applicable, this technique is:

• Much simpler; we don’t need to manually track or reap child processes.

• More accurate; it leaves no zombies temporarily between client requests.

In fact, only one line is dedicated to handling zombies here: the signal.signal call near

the top, to set the handler. Unfortunately, this version is also even less portable than

using os.fork in the first place, because signals may work slightly differently from

platform to platform, even among Unix variants. For instance, some Unix platforms may

not allow SIG_IGN to be used as the SIGCHLD action at all. On Linux systems, though,

this simpler forking server variant works like a charm:

[...]$ python fork-server-signal.py &

[1] 3837

Server connected by ('72.236.109.185', 58817) at Sun Apr 25 08:11:12 2010

[...]$ ps -f

UID PID PPID C STIME TTY TIME CMD

5693094 3837 30778 0 08:10 pts/0 00:00:00 python fork-server-signal.py

5693094 4378 3837 0 08:11 pts/0 00:00:00 python fork-server-signal.py

5693094 4413 30778 0 08:11 pts/0 00:00:00 ps -f

5693094 30778 30772 0 04:23 pts/0 00:00:00 -bash

[...]$ ps -f

UID PID PPID C STIME TTY TIME CMD

5693094 3837 30778 0 08:10 pts/0 00:00:00 python fork-server-signal.py

5693094 4584 30778 0 08:11 pts/0 00:00:00 ps -f

5693094 30778 30772 0 04:23 pts/0 00:00:00 -bash

Notice how in this version the child process’s entry goes away as soon as it exits, even

before a new client request is received; no “defunct” zombie ever appears. More

dramatically, if we now start up the script we wrote earlier that spawns eight clients in

parallel (testecho.py) to talk to this server remotely, all appear on the server while

running, but are removed immediately as they exit:

[client window]

C:\...\PP4E\Internet\Sockets> testecho.py learning-python.com

[server window]

[...]$

Server connected by ('72.236.109.185', 58829) at Sun Apr 25 08:16:34 2010

Server connected by ('72.236.109.185', 58830) at Sun Apr 25 08:16:34 2010

Server connected by ('72.236.109.185', 58831) at Sun Apr 25 08:16:34 2010

Server connected by ('72.236.109.185', 58832) at Sun Apr 25 08:16:34 2010

Server connected by ('72.236.109.185', 58833) at Sun Apr 25 08:16:34 2010

Server connected by ('72.236.109.185', 58834) at Sun Apr 25 08:16:34 2010

Server connected by ('72.236.109.185', 58835) at Sun Apr 25 08:16:34 2010

Server connected by ('72.236.109.185', 58836) at Sun Apr 25 08:16:34 2010

[...]$ ps -f

UID PID PPID C STIME TTY TIME CMD

5693094 3837 30778 0 08:10 pts/0 00:00:00 python fork-server-signal.py

5693094 9666 3837 0 08:16 pts/0 00:00:00 python fork-server-signal.py

5693094 9667 3837 0 08:16 pts/0 00:00:00 python fork-server-signal.py

5693094 9668 3837 0 08:16 pts/0 00:00:00 python fork-server-signal.py

5693094 9670 3837 0 08:16 pts/0 00:00:00 python fork-server-signal.py

5693094 9674 3837 0 08:16 pts/0 00:00:00 python fork-server-signal.py

5693094 9678 3837 0 08:16 pts/0 00:00:00 python fork-server-signal.py

5693094 9681 3837 0 08:16 pts/0 00:00:00 python fork-server-signal.py

5693094 9682 3837 0 08:16 pts/0 00:00:00 python fork-server-signal.py

5693094 9722 30778 0 08:16 pts/0 00:00:00 ps -f

5693094 30778 30772 0 04:23 pts/0 00:00:00 -bash

[...]$ ps -f

UID PID PPID C STIME TTY TIME CMD

5693094 3837 30778 0 08:10 pts/0 00:00:00 python fork-server-signal.py

5693094 10045 30778 0 08:16 pts/0 00:00:00 ps -f

5693094 30778 30772 0 04:23 pts/0 00:00:00 -bash

And now that I’ve shown you how to use signal handling to reap children automatically

on Linux, I should underscore that this technique is not universally supported across

all flavors of Unix. If you care about portability, manually reaping children as we did

in Example 12-4 may still be desirable.
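Where SIG_IGN cannot be relied upon, one commonly used middle ground is to install a real SIGCHLD handler that reaps exited children itself with a nonblocking os.waitpid call. The following is only a minimal sketch of that idea, not code from the examples package: the names reaped, reap_children, and install_reaper are invented for illustration, and it assumes a Unix-like system that delivers SIGCHLD to the parent.

```python
import os, signal

reaped = []                                   # (pid, status) pairs, kept only for illustration

def reap_children(signum, frame):
    """SIGCHLD handler: collect every child that has already exited."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)   # WNOHANG: never block here
        except ChildProcessError:                      # no child processes exist at all
            break
        if pid == 0:                                   # children exist, but none has exited
            break
        reaped.append((pid, status))                   # this child's zombie entry is now gone

def install_reaper():
    """Hypothetical helper: call once in the parent, before forking children."""
    signal.signal(signal.SIGCHLD, reap_children)
```

A forking server could call install_reaper() once in place of the SIG_IGN assignment in Example 12-6. Whether an accept call interrupted by the signal is restarted automatically depends on the Python version and platform, so treat this as a starting point rather than a drop-in replacement.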

Why multiprocessing doesn’t help with socket server portability

In Chapter 5, we learned about Python’s new multiprocessing module. As we saw, it

provides a way to start function calls in new processes that is more portable than the

os.fork call used in this section’s server code, and it runs processes instead of threads

to work around the thread GIL in some scenarios. In particular, multiprocessing works

on standard Windows Python too, unlike direct os.fork calls.

I experimented with a server variant based upon this module to see if its portability

might help for socket servers. Its full source code is in the examples package in file

multi-server.py, but here are its important bits that differ:

...rest unchanged from fork-server.py...
from multiprocessing import Process

def handleClient(connection):
    print('Child:', os.getpid())                 # child process: reply, exit
    time.sleep(5)                                # simulate a blocking activity
    while True:                                  # read, write a client socket
        data = connection.recv(1024)             # till eof when socket closed
        ...rest unchanged...

def dispatcher():                                # listen until process killed
    while True:                                  # wait for next connection,
        connection, address = sockobj.accept()   # pass to process for service
        print('Server connected by', address, end=' ')
        print('at', now())
        Process(target=handleClient, args=(connection,)).start()

if __name__ == '__main__':
    print('Parent:', os.getpid())
    sockobj = socket(AF_INET, SOCK_STREAM)       # make a TCP socket object
    sockobj.bind((myHost, myPort))               # bind it to server port number
    sockobj.listen(5)                            # allow 5 pending connects
    dispatcher()

This server variant is noticeably simpler too. Like the forking server it’s derived from,

this server works fine under Cygwin Python on Windows running as localhost, and

would probably work on other Unix-like platforms as well, because multiprocessing

forks a process on such systems, and file and socket descriptors are inherited by child

processes as usual. Hence, the child process uses the same connected socket as the

parent. Here’s the scene in a Cygwin server window and two Windows client windows:

[server window]

[C:\...\PP4E\Internet\Sockets]$ python multi-server.py

Parent: 8388

Server connected by ('127.0.0.1', 58271) at Sat Apr 24 08:13:27 2010

Child: 8144

Server connected by ('127.0.0.1', 58272) at Sat Apr 24 08:13:29 2010

Child: 8036

[two client windows]

C:\...\PP4E\Internet\Sockets> python echo-client.py

Client received: b"Echo=>b'Hello network world' at Sat Apr 24 08:13:33 2010"

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost Brave Sir Robin

Client received: b"Echo=>b'Brave' at Sat Apr 24 08:13:35 2010"

Client received: b"Echo=>b'Sir' at Sat Apr 24 08:13:35 2010"

Client received: b"Echo=>b'Robin' at Sat Apr 24 08:13:35 2010"
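The descriptor inheritance at work in this run can also be seen in isolation, without a network client. The sketch below is not part of the examples package: it assumes a Unix-like platform where multiprocessing can use its fork start method, the helper names echo_once and demo are hypothetical, and socket.socketpair merely stands in for an accepted client connection.

```python
import multiprocessing, socket

def echo_once(conn):
    """Runs in the child process: read one message and echo it back."""
    data = conn.recv(1024)
    conn.send(b'Echo=>' + data)
    conn.close()

def demo():
    ctx = multiprocessing.get_context('fork')     # assumption: fork start method (Unix only)
    parent_end, child_end = socket.socketpair()   # two already-connected sockets
    proc = ctx.Process(target=echo_once, args=(child_end,))
    proc.start()                                  # forked child inherits the open descriptor
    child_end.close()                             # parent no longer needs its copy
    parent_end.send(b'spam')
    reply = parent_end.recv(1024)
    proc.join()
    parent_end.close()
    return reply
```

Calling demo() should return the echoed bytes, because the forked child works with the very same open descriptor as its parent. That is the property the Windows run that follows lacks.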

However, this server does not work on standard Windows Python—the whole point

of trying to use multiprocessing in this context—because open sockets are not correctly

pickled when passed as arguments into the new process. Here’s what occurs in the

server window on Windows 7 with Python 3.1:

C:\...\PP4E\Internet\Sockets> python multi-server.py

Parent: 9140

Server connected by ('127.0.0.1', 58276) at Sat Apr 24 08:17:41 2010

Child: 9628

Process Process-1:

Traceback (most recent call last):

File "C:\Python31\lib\multiprocessing\process.py", line 233, in _bootstrap

self.run()

File "C:\Python31\lib\multiprocessing\process.py", line 88, in run

self._target(*self._args, **self._kwargs)

File "C:\...\PP4E\Internet\Sockets\multi-server.py", line 38, in handleClient

data = connection.recv(1024) # till eof when socket closed

socket.error: [Errno 10038] An operation was attempted on something that is not

a socket

Recall from Chapter 5 that on Windows multiprocessing passes context to a new

Python interpreter process by pickling it, and that Process arguments must all be

pickleable for Windows. Sockets in Python 3.1 don’t trigger errors when pickled thanks

to the class they are an instance of, but they are not really pickled correctly:

>>> from pickle import *

>>> from socket import *

>>> s = socket()

>>> x = dumps(s)

>>> s

<socket.socket object, fd=180, family=2, type=1, proto=0>

>>> loads(x)

<socket.socket object, fd=-1, family=0, type=0, proto=0>

>>> x

b'\x80\x03csocket\nsocket\nq\x00)\x81q\x01N}q\x02(X\x08\x00\x00\x00_io_refsq\x03

K\x00X\x07\x00\x00\x00_closedq\x04\x89u\x86q\x05b.'

As we saw in Chapter 5, multiprocessing has other IPC tools such as its own pipes and

queues that might be used instead of sockets to work around this issue, but clients

would then have to use them, too; the resulting server would not be as broadly

accessible as one based upon general Internet sockets.

Even if multiprocessing did work on Windows, though, its need to start a new Python

interpreter would likely make it much slower than the more traditional technique of

spawning threads to talk to clients. Coincidentally, that brings us to our next topic.

Threading Servers

The forking model just described works well on Unix-like platforms in general, but it

suffers from some potentially significant limitations:

Performance

On some machines, starting a new process can be fairly expensive in terms of time

and space resources.

Portability

Forking processes is a Unix technique; as we’ve learned, the os.fork call currently

doesn’t work on non-Unix platforms such as Windows under standard Python. As

we’ve also learned, forks can be used in the Cygwin version of Python on Windows,

but they may be inefficient and not exactly the same as Unix forks. And as we just

discovered, multiprocessing won’t help on Windows, because connected sockets

are not pickleable across process boundaries.

Complexity

If you think that forking servers can be complicated, you’re not alone. As we just

saw, forking also brings with it all the shenanigans of managing and reaping

zombies: cleaning up after child processes that live shorter lives than their parents.

If you read Chapter 5, you know that one solution to all of these dilemmas is to use

threads rather than processes. Threads run in parallel and share global (i.e., module

and interpreter) memory.

Because threads all run in the same process and memory space, they automatically share

sockets passed between them, similar in spirit to the way that child processes inherit

socket descriptors. Unlike processes, though, threads are usually less expensive to start,

and work on both Unix-like machines and Windows under standard Python today.

Furthermore, many (though not all) see threads as simpler to program—child threads

die silently on exit, without leaving behind zombies to haunt the server.

To illustrate, Example 12-7 is another mutation of the echo server that handles client

requests in parallel by running them in threads rather than in processes.

Example 12-7. PP4E\Internet\Sockets\thread-server.py

"""
Server side: open a socket on a port, listen for a message from a client,
and send an echo reply; echoes lines until eof when client closes socket;
spawns a thread to handle each client connection; threads share global
memory space with main thread; this is more portable than fork: threads
work on standard Windows systems, but process forks do not;
"""

import time, _thread as thread           # or use threading.Thread().start()
from socket import *                     # get socket constructor and constants
myHost = ''                              # server machine, '' means local host
myPort = 50007                           # listen on a non-reserved port number

sockobj = socket(AF_INET, SOCK_STREAM)           # make a TCP socket object
sockobj.bind((myHost, myPort))                   # bind it to server port number
sockobj.listen(5)                                # allow up to 5 pending connects

def now():
    return time.ctime(time.time())               # current time on the server

def handleClient(connection):                    # in spawned thread: reply
    time.sleep(5)                                # simulate a blocking activity
    while True:                                  # read, write a client socket
        data = connection.recv(1024)
        if not data: break
        reply = 'Echo=>%s at %s' % (data, now())
        connection.send(reply.encode())
    connection.close()

def dispatcher():                                # listen until process killed
    while True:                                  # wait for next connection,
        connection, address = sockobj.accept()   # pass to thread for service
        print('Server connected by', address, end=' ')
        print('at', now())
        thread.start_new_thread(handleClient, (connection,))

dispatcher()

This dispatcher delegates each incoming client connection request to a newly spawned

thread running the handleClient function. As a result, this server can process multiple

clients at once, and the main dispatcher loop can get quickly back to the top to check

for newly arrived requests. The net effect is that new clients won’t be denied service

due to a busy server.
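As the import comment in Example 12-7 hints, the same dispatcher can be coded with the higher-level threading module. Here is a rough sketch of that variant, under a few assumptions of my own: the names handle_client, dispatcher, and start_server are illustrative, the five-second sleep is omitted, the threads are marked as daemons so they do not keep the process alive, and port 0 is bound so that the system picks a free port.

```python
import socket, threading, time

def handle_client(connection):
    """Runs in a spawned thread: echo until the client closes its socket."""
    with connection:                               # closes the socket on the way out
        while True:
            data = connection.recv(1024)
            if not data:
                break
            reply = 'Echo=>%s at %s' % (data, time.ctime())
            connection.sendall(reply.encode())

def dispatcher(sockobj):
    """Accept loop: hand each new connection to its own thread."""
    while True:
        connection, address = sockobj.accept()
        threading.Thread(target=handle_client, args=(connection,),
                         daemon=True).start()

def start_server(host='127.0.0.1', port=0):
    """Hypothetical helper: run the accept loop in the background, return the port."""
    sockobj = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sockobj.bind((host, port))                     # port 0: let the system choose
    sockobj.listen(5)
    threading.Thread(target=dispatcher, args=(sockobj,), daemon=True).start()
    return sockobj.getsockname()[1]
```

Running the accept loop in a background thread is a convenience for experimenting at the interactive prompt; a standalone server script would more likely call dispatcher directly in the main thread, as Example 12-7 does.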

Functionally, this version is similar to the fork solution (clients are handled in parallel),

but it will work on any machine that supports threads, including Windows and Linux.

Let’s test it on both. First, start the server on a Linux machine and run clients on both

Linux and Windows:

[window 1: thread-based server process, server keeps accepting

client connections while threads are servicing prior requests]

[...]$ python thread-server.py

Server connected by ('127.0.0.1', 37335) at Sun Apr 25 08:59:05 2010

Server connected by ('72.236.109.185', 58866) at Sun Apr 25 08:59:54 2010

Server connected by ('72.236.109.185', 58867) at Sun Apr 25 08:59:56 2010

Server connected by ('72.236.109.185', 58868) at Sun Apr 25 08:59:58 2010

[window 2: client, but on same remote server machine]

[...]$ python echo-client.py

Client received: b"Echo=>b'Hello network world' at Sun Apr 25 08:59:10 2010"

[windows 3-5: local clients, PC]

C:\...\PP4E\Internet\Sockets> python echo-client.py learning-python.com

Client received: b"Echo=>b'Hello network world' at Sun Apr 25 08:59:59 2010"

C:\...\PP4E\Internet\Sockets> python echo-client.py learning-python.com Bruce

Client received: b"Echo=>b'Bruce' at Sun Apr 25 09:00:01 2010"

C:\...\Sockets> python echo-client.py learning-python.com The Meaning of life

Client received: b"Echo=>b'The' at Sun Apr 25 09:00:03 2010"

Client received: b"Echo=>b'Meaning' at Sun Apr 25 09:00:03 2010"

Client received: b"Echo=>b'of' at Sun Apr 25 09:00:03 2010"

Client received: b"Echo=>b'life' at Sun Apr 25 09:00:03 2010"

Because this server uses threads rather than forked processes, we can run it portably

on both Linux and a Windows PC. Here it is at work again, running on the same local

Windows PC as its clients; again, the main point to notice is that new clients are

accepted while prior clients are being processed in parallel with other clients and the main

thread (in the five-second sleep delay):

[window 1: server, on local PC]

C:\...\PP4E\Internet\Sockets> python thread-server.py

Server connected by ('127.0.0.1', 58987) at Sun Apr 25 12:41:46 2010

Server connected by ('127.0.0.1', 58988) at Sun Apr 25 12:41:47 2010

Server connected by ('127.0.0.1', 58989) at Sun Apr 25 12:41:49 2010

[windows 2-4: clients, on local PC]

C:\...\PP4E\Internet\Sockets> python echo-client.py

Client received: b"Echo=>b'Hello network world' at Sun Apr 25 12:41:51 2010"

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost Brian

Client received: b"Echo=>b'Brian' at Sun Apr 25 12:41:52 2010"

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost Bright side of life

Client received: b"Echo=>b'Bright' at Sun Apr 25 12:41:54 2010"

Client received: b"Echo=>b'side' at Sun Apr 25 12:41:54 2010"

Client received: b"Echo=>b'of' at Sun Apr 25 12:41:54 2010"

Client received: b"Echo=>b'life' at Sun Apr 25 12:41:54 2010"

Remember that a thread silently exits when the function it is running returns; unlike

the process fork version, we don’t call anything like os._exit in the client handler

function (and we shouldn’t: os._exit ends the whole process, threads and all, including the main loop

watching for new connections!). Because of this, the thread version is not only more

portable, but also simpler.

Standard Library Server Classes

Now that I’ve shown you how to write forking and threading servers to process clients

without blocking incoming requests, I should also tell you that there are standard tools

in the Python standard library to make this process even easier. In particular, the

socketserver module defines classes that implement all flavors of forking and threading

servers that you are likely to be interested in.

Like the manually-coded servers we’ve just studied, this module’s primary classes

implement servers which process clients in parallel (a.k.a. asynchronously) to avoid

denying service to new requests during long-running transactions. Their net effect is to

automate the top levels of common server code. To use this module, simply create the

desired kind of imported server object, passing in a handler object with a callback

method of your own, as demonstrated in the threaded TCP server of Example 12-8.

Example 12-8. PP4E\Internet\Sockets\class-server.py

"""
Server side: open a socket on a port, listen for a message from a client, and
send an echo reply; this version uses the standard library module socketserver to
do its work; socketserver provides TCPServer, ThreadingTCPServer, ForkingTCPServer,
UDP variants of these, and more, and routes each client connect request to a new
instance of a passed-in request handler object's handle method; socketserver also
supports Unix domain sockets, but only on Unixen; see the Python library manual.
"""

import socketserver, time               # get socket server, handler objects
myHost = ''                             # server machine, '' means local host
myPort = 50007                          # listen on a non-reserved port number
def now():
    return time.ctime(time.time())

class MyClientHandler(socketserver.BaseRequestHandler):
    def handle(self):                           # on each client connect
        print(self.client_address, now())       # show this client's address
        time.sleep(5)                           # simulate a blocking activity
        while True:                             # self.request is client socket
            data = self.request.recv(1024)      # read, write a client socket
            if not data: break
            reply = 'Echo=>%s at %s' % (data, now())
            self.request.send(reply.encode())
        self.request.close()

# make a threaded server, listen/handle clients forever
myaddr = (myHost, myPort)
server = socketserver.ThreadingTCPServer(myaddr, MyClientHandler)
server.serve_forever()

This server works the same as the threading server we wrote by hand in the previous

section, but instead focuses on service implementation (the customized handle

method), not on threading details. It is run the same way, too—here it is processing

three clients started by hand, plus eight spawned by the testecho script we wrote

in Example 12-3:

[window 1: server, serverHost='localhost' in echo-client.py]

C:\...\PP4E\Internet\Sockets> python class-server.py

('127.0.0.1', 59036) Sun Apr 25 13:50:23 2010

('127.0.0.1', 59037) Sun Apr 25 13:50:25 2010

('127.0.0.1', 59038) Sun Apr 25 13:50:26 2010

('127.0.0.1', 59039) Sun Apr 25 13:51:05 2010

('127.0.0.1', 59040) Sun Apr 25 13:51:05 2010

('127.0.0.1', 59041) Sun Apr 25 13:51:06 2010

('127.0.0.1', 59042) Sun Apr 25 13:51:06 2010

('127.0.0.1', 59043) Sun Apr 25 13:51:06 2010

('127.0.0.1', 59044) Sun Apr 25 13:51:06 2010

('127.0.0.1', 59045) Sun Apr 25 13:51:06 2010

('127.0.0.1', 59046) Sun Apr 25 13:51:06 2010

[windows 2-4: client, same machine]

C:\...\PP4E\Internet\Sockets> python echo-client.py

Client received: b"Echo=>b'Hello network world' at Sun Apr 25 13:50:28 2010"

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost Arthur

Client received: b"Echo=>b'Arthur' at Sun Apr 25 13:50:30 2010"

C:\...\PP4E\Internet\Sockets> python echo-client.py localhost Brave Sir Robin

Client received: b"Echo=>b'Brave' at Sun Apr 25 13:50:31 2010"

Client received: b"Echo=>b'Sir' at Sun Apr 25 13:50:31 2010"

Client received: b"Echo=>b'Robin' at Sun Apr 25 13:50:31 2010"

C:\...\PP4E\Internet\Sockets> python testecho.py

To build a forking server instead, just use the class name ForkingTCPServer when

creating the server object. The socketserver module has more power than shown by this

example; it also supports nonparallel (a.k.a. serial or synchronous) servers, UDP and

Unix domain sockets, and Ctrl-C server interrupts on Windows. See Python’s library

manual for more details.
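To make the class swap concrete, here is a small sketch that picks the server flavor by name. It is an illustration rather than code from the examples package: EchoHandler and make_server are hypothetical names, the sleep is left out, and the default address binds port 0 on the local host so that the system chooses a free port.

```python
import socketserver, time

class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self):                               # run once per client connection
        while True:
            data = self.request.recv(1024)          # self.request is the client socket
            if not data: break
            reply = 'Echo=>%s at %s' % (data, time.ctime())
            self.request.sendall(reply.encode())

def make_server(kind='thread', address=('127.0.0.1', 0)):
    """Create a serial, threading, or forking TCP server around one handler class."""
    classes = {'serial': socketserver.TCPServer,            # one client at a time
               'thread': socketserver.ThreadingTCPServer}   # one thread per client
    if hasattr(socketserver, 'ForkingTCPServer'):           # defined only where os.fork exists
        classes['fork'] = socketserver.ForkingTCPServer     # one process per client
    return classes[kind](address, EchoHandler)
```

After server = make_server('fork'), for instance, server.server_address gives the port actually bound and server.serve_forever() starts the dispatch loop; the handler class itself is the same for all three flavors.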

For more advanced server needs, Python also comes with standard library tools that

use those shown here, and allow you to implement in just a few lines of Python code a

simple but fully-functional HTTP (web) server that knows how to run server-side CGI

scripts. We’ll explore those larger server tools in Chapter 15.

Multiplexing Servers with select

So far we’ve seen how to handle multiple clients at once with both forked processes

and spawned threads, and we’ve looked at a library class that encapsulates both

schemes. Under both approaches, all client handlers seem to run in parallel with one

another and with the main dispatch loop that continues watching for new incoming

requests. Because all of these tasks run in parallel (i.e., at the same time), the server

doesn’t get blocked when accepting new requests or when processing a long-running

client handler.

Technically, though, threads and processes don’t really run in parallel, unless you’re

lucky enough to have a machine with many CPUs. Instead, your operating system

performs a juggling act—it divides the computer’s processing power among all active

tasks. It runs part of one, then part of another, and so on. All the tasks appear to run

in parallel, but only because the operating system switches focus between tasks so fast

that you don’t usually notice. This process of switching between tasks is sometimes

called time-slicing when done by an operating system; it is more generally known as

multiplexing.

When we spawn threads and processes, we rely on the operating system to juggle the

active tasks so that none are starved of computing resources, especially the main server

dispatcher loop. However, there’s no reason that a Python script can’t do so as well.

For instance, a script might divide tasks into multiple steps—run a step of one task,

then one of another, and so on, until all are completed. The script need only know how

to divide its attention among the multiple active tasks to multiplex on its own.
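As a toy illustration of a script doing its own juggling, the sketch below models each task as a generator that yields after every step, and runs the tasks round-robin from a queue. The names countdown and multiplex are invented for this example, and generators are just one convenient way to split a task into steps.

```python
from collections import deque

def countdown(name, steps):
    """A task divided into steps: each yield hands control back to the scheduler."""
    for remaining in range(steps, 0, -1):
        yield '%s:%d' % (name, remaining)

def multiplex(tasks):
    """Run one step of each task in turn until all tasks are finished."""
    queue = deque(tasks)
    trace = []
    while queue:
        task = queue.popleft()             # next task in line
        try:
            trace.append(next(task))       # run a single step of it
        except StopIteration:
            continue                       # task finished: drop it from the rotation
        queue.append(task)                 # not finished: back to the end of the line
    return trace
```

Calling multiplex([countdown('A', 2), countdown('B', 3)]) returns ['A:2', 'B:3', 'A:1', 'B:2', 'B:1']: the two tasks take turns until the shorter one runs out of steps.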

Servers can apply this technique to yield yet another way to handle multiple clients at

once, a way that requires neither threads nor forks. By multiplexing client connections

and the main dispatcher with the select system call, a single event loop can process

multiple clients and accept new ones in parallel (or at least close enough to avoid

stalling). Such servers are sometimes called asynchronous, because they service clients in

spurts, as each becomes ready to communicate. In asynchronous servers, a single main

loop run in a single process and thread decides which clients should get a bit of attention

each time through. Client requests and the main dispatcher loop are each given a small

slice of the server’s attention if they are ready to converse.

Most of the magic behind this server structure is the operating system select call,

available in Python’s standard select module on all major platforms. Roughly,

select is asked to monitor a list of input sources, output sources, and exceptional

condition sources and tells us which sources are ready for processing. It can be made

to simply poll all the sources to see which are ready; wait for a maximum time period

for sources to become ready; or wait indefinitely until one or more sources are ready

for processing.
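The three waiting modes just described correspond to the optional fourth argument of select. The following sketch shows each mode on a single socket; the function names are made up for illustration, and a socket.socketpair is assumed as a convenient stand-in for a real client connection when trying them out.

```python
import socket
from select import select

def readable_now(sock):
    """Poll: a timeout of 0 makes select return immediately."""
    readables, writeables, exceptions = select([sock], [], [], 0)
    return sock in readables

def readable_within(sock, seconds):
    """Timed wait: give up after at most the given number of seconds."""
    readables, writeables, exceptions = select([sock], [], [], seconds)
    return sock in readables

def wait_until_readable(sock):
    """No fourth argument: block until the socket has input."""
    readables, writeables, exceptions = select([sock], [], [])
    return sock in readables
```

Given a, b = socket.socketpair(), both readable_now(a) and readable_within(a, 0.1) report False until something is sent on b; after b.send(b'spam'), all three functions report a as ready and a.recv will not block.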

However used, select lets us direct attention to sockets ready to communicate, so as

to avoid blocking on calls to ones that are not. That is, when the sources passed to

select are sockets, we can be sure that socket calls like accept, recv, and send will not

block (pause) the server when applied to objects returned by select. Because of that,

a single-loop server that uses select need not get stuck communicating with one client

or waiting for new ones while other clients are starved for the server’s attention.

Because this type of server does not need to start threads or processes, it can be efficient

when transactions with clients are relatively short-lived. However, it also requires that

these transactions be quick; if they are not, it still runs the risk of becoming bogged

down waiting for a dialog with a particular client to end, unless augmented with threads

or forks for long-running transactions.‖

A select-based echo server

Let’s see how all of this translates into code. The script in Example 12-9 implements

another echo server, one that can handle multiple clients without ever starting new

processes or threads.

Example 12-9. PP4E\Internet\Sockets\select-server.py

"""
Server: handle multiple clients in parallel with select. use the select
module to manually multiplex among a set of sockets: main sockets which
accept new client connections, and input sockets connected to accepted
clients; select can take an optional 4th arg--0 to poll, n.m to wait n.m
seconds, or omitted to wait till any socket is ready for processing.
"""

import sys, time
from select import select
from socket import socket, AF_INET, SOCK_STREAM
def now(): return time.ctime(time.time())

myHost = ''                             # server machine, '' means local host
myPort = 50007                          # listen on a non-reserved port number
if len(sys.argv) == 3:                  # allow host/port as cmdline args too
    myHost, myPort = sys.argv[1], int(sys.argv[2])   # port must be an int: it is incremented below
numPortSocks = 2                        # number of ports for client connects

# make main sockets for accepting new client requests
mainsocks, readsocks, writesocks = [], [], []
for i in range(numPortSocks):
    portsock = socket(AF_INET, SOCK_STREAM)   # make a TCP/IP socket object
    portsock.bind((myHost, myPort))           # bind it to server port number
    portsock.listen(5)                        # listen, allow 5 pending connects
    mainsocks.append(portsock)                # add to main list to identify
    readsocks.append(portsock)                # add to select inputs list
    myPort += 1                               # bind on consecutive ports

# event loop: listen and multiplex until server process killed
print('select-server loop starting')
while True:
    #print(readsocks)
    readables, writeables, exceptions = select(readsocks, writesocks, [])
    for sockobj in readables:
        if sockobj in mainsocks:                     # for ready input sockets
            # port socket: accept new client
            newsock, address = sockobj.accept()      # accept should not block
            print('Connect:', address, id(newsock))  # newsock is a new socket
            readsocks.append(newsock)                # add to select list, wait
        else:
            # client socket: read next line
            data = sockobj.recv(1024)                # recv should not block
            print('\tgot', data, 'on', id(sockobj))
            if not data:                             # if closed by the clients
                sockobj.close()                      # close here and remv from
                readsocks.remove(sockobj)            # del list else reselected
            else:
                # this may block: should really select for writes too
                reply = 'Echo=>%s at %s' % (data, now())
                sockobj.send(reply.encode())

‖ Confusingly, select-based servers are often called asynchronous, to describe their multiplexing of short-lived
transactions. Really, though, the classic forking and threading servers we met earlier are asynchronous, too,
as they do not wait for completion of a given client’s request. There is a clearer distinction between serial and
parallel servers (the former process one transaction at a time and the latter do not), and “synchronous” and
“asynchronous” are essentially synonyms for “serial” and “parallel.” By this definition, forking, threading,
and select loops are three alternative ways to implement parallel, asynchronous servers.

The bulk of this script is its while event loop at the end that calls select to find out

which sockets are ready for processing; these include both main port sockets on which

clients can connect and open client connections. It then loops over all such ready

sockets, accepting connections on main port sockets and reading and echoing input on any

client sockets ready for input. Both the accept and recv calls in this code are guaranteed

to not block the server process after select returns; as a result, this server can quickly

get back to the top of the loop to process newly arrived client requests and already

connected clients’ inputs. The net effect is that all new requests and clients are serviced

in pseudoparallel fashion.

To make this process work, the server appends the connected socket for each client to

the readsocks list passed to select, and simply waits for the socket to show up in the

readables list that select returns. For illustration purposes, this server also listens for

new clients on more than one port (ports 50007 and 50008, in our examples). Because these main

port sockets are also interrogated with select, connection requests on either port can

be accepted without blocking either already connected clients or new connection

requests appearing on the other port. The select call returns whatever sockets in

readsocks are ready for processing: both main port sockets and sockets connected to

clients currently being processed.
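The comment in Example 12-9 concedes that its send call may block and that a more careful server would select for writes as well. One way this might look is sketched below; it is an assumption-laden illustration, not the book's code. The serve function, its should_stop callback, the pending dictionary of unsent bytes, and the short select timeout (used only so the loop can notice a stop request) are all hypothetical.

```python
import socket, time
from select import select

def serve(portsock, should_stop, poll_seconds=0.1):
    """Echo loop that queues replies and sends them only to sockets ready for output."""
    readsocks = [portsock]
    pending = {}                                   # client socket -> reply bytes not yet sent
    while not should_stop():
        writesocks = [s for s in pending if pending[s]]      # only sockets with queued output
        readables, writeables, exceptions = select(
            readsocks, writesocks, [], poll_seconds)
        for sockobj in readables:
            if sockobj is portsock:                # port socket: accept new client
                newsock, address = sockobj.accept()
                readsocks.append(newsock)
                pending[newsock] = b''
            else:                                  # client socket: read next input
                data = sockobj.recv(1024)
                if not data:                       # closed by client: forget it
                    readsocks.remove(sockobj)
                    del pending[sockobj]
                    sockobj.close()
                else:                              # queue the reply instead of sending now
                    reply = 'Echo=>%s at %s' % (data, time.ctime())
                    pending[sockobj] += reply.encode()
        for sockobj in writeables:
            if sockobj in pending and pending[sockobj]:      # skip sockets closed just above
                sent = sockobj.send(pending[sockobj])        # may accept only part of the data
                pending[sockobj] = pending[sockobj][sent:]   # keep the remainder for later
    for sockobj in readsocks:                      # stop requested: close everything
        sockobj.close()
```

Because a reply is sent only once select reports the client socket as writable, a slow reader delays just its own output; the price is the extra bookkeeping for partially sent data.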

Running the select server

Let’s run this script locally to see how it does its stuff (the client and server can also be

run on different machines, as in prior socket examples). First, we’ll assume we’ve

already started this server script on the local machine in one window, and run a few

clients to talk to it. The following listing gives the interaction in two such client console

windows running on Windows. The first client simply runs the echo-client script twice

to contact the server, and the second also kicks off the testecho script to spawn eight

echo-client programs running in parallel.

As before, the server simply echoes back whatever text each client sends, though without

a sleep pause here (more on this in a moment). Notice how the second client window

really runs a script called echo-client-50008 so as to connect to the second port socket

in the server; it’s the same as echo-client, with a different hardcoded port number;

alas, the original script wasn’t designed to input a port number:

[client window 1]

C:\...\PP4E\Internet\Sockets> python echo-client.py

Client received: b"Echo=>b'Hello network world' at Sun Apr 25 14:51:21 2010"

C:\...\PP4E\Internet\Sockets> python echo-client.py

Client received: b"Echo=>b'Hello network world' at Sun Apr 25 14:51:27 2010"

[client window 2]

C:\...\PP4E\Internet\Sockets> python echo-client-50008.py localhost Sir Galahad

Client received: b"Echo=>b'Sir' at Sun Apr 25 14:51:22 2010"

Client received: b"Echo=>b'Galahad' at Sun Apr 25 14:51:22 2010"

C:\...\PP4E\Internet\Sockets> python testecho.py

The next listing is the sort of output that shows up in the window where the server has

been started. The first three connections come from echo-client runs; the rest is the

result of the eight programs spawned by testecho in the second client window. We can

run this server on Windows, too, because select is available on this platform. Correlate

this output with the server’s code to see how it runs.

Notice that for testecho, new client connections and client inputs are multiplexed to-

gether. If you study the output closely, you’ll see that they overlap in time, because all

activity is dispatched by the single event loop in the server. In fact, the trace output on

the server will probably look a bit different nearly every time it runs. Clients and new

connections are interleaved almost at random due to timing differences on the host

machines. This happens in the earlier forking and threading servers, too, but the

operating system automatically switches between the execution paths of the dispatcher loop

and client transactions.

Also note that the server gets an empty string when the client has closed its socket. We

take care to close and delete these sockets at the server right away, or else they would

be needlessly reselected again and again, each time through the main loop:

[server window]

C:\...\PP4E\Internet\Sockets> python select-server.py

C:\Users\mark\Stuff\Books\4E\PP4E\dev\Examples\PP4E\Internet\Sockets>python select-server.py

select-server loop starting

Connect: ('127.0.0.1', 59080) 21339352

got b'Hello network world' on 21339352

got b'' on 21339352

Connect: ('127.0.0.1', 59081) 21338128

got b'Sir' on 21338128

got b'Galahad' on 21338128

got b'' on 21338128

Connect: ('127.0.0.1', 59082) 21339352

got b'Hello network world' on 21339352

got b'' on 21339352

[testecho results]

Connect: ('127.0.0.1', 59083) 21338128

got b'Hello network world' on 21338128

got b'' on 21338128

Connect: ('127.0.0.1', 59084) 21339352

got b'Hello network world' on 21339352

got b'' on 21339352

Connect: ('127.0.0.1', 59085) 21338128

got b'Hello network world' on 21338128

got b'' on 21338128

Connect: ('127.0.0.1', 59086) 21339352

got b'Hello network world' on 21339352

got b'' on 21339352

Connect: ('127.0.0.1', 59087) 21338128

got b'Hello network world' on 21338128

got b'' on 21338128

Connect: ('127.0.0.1', 59088) 21339352

Connect: ('127.0.0.1', 59089) 21338128

got b'Hello network world' on 21339352

got b'Hello network world' on 21338128

Connect: ('127.0.0.1', 59090) 21338056

got b'' on 21339352

got b'' on 21338128

got b'Hello network world' on 21338056

got b'' on 21338056

Besides this more verbose output, there’s another subtle but crucial difference to

notice—a time.sleep call to simulate a long-running task doesn’t make sense in the

server here. Because all clients are handled by the same single loop, sleeping would

pause everything, and defeat the whole point of a multiplexing server. Again, manual

multiplexing servers like this one work well when transactions are short, but also gen-

erally require them to either be so, or be handled specially.

Before we move on, here are a few additional notes and options:

select call details

Formally, select is called with three lists of selectable objects (input sources, out-

put sources, and exceptional condition sources), plus an optional timeout. The

timeout argument may be a real wait expiration value in seconds (use floating-point

numbers to express fractions of a second), a zero value to mean simply poll and

return immediately, or omitted to mean wait until at least one object is ready (as

done in our server script). The call returns a triple of ready objects—subsets of the

first three arguments—any or all of which may be empty if the timeout expired

before sources became ready.

select portability

Like threading, but unlike forking, this server works in standard Windows Python,

too. Technically, the select call works only for sockets on Windows, but also

works for things like files and pipes on Unix and Macintosh. For servers running

over the Internet, of course, the primary devices we are interested in are sockets.

Nonblocking sockets

select lets us be sure that socket calls like accept and recv won’t block (pause) the

caller, but it’s also possible to make Python sockets nonblocking in general. Call

the setblocking method of socket objects to set the socket to blocking or non-

blocking mode. For example, given a call like sock.setblocking(flag), the socket

sock is set to nonblocking mode if the flag is zero and to blocking mode otherwise.

All sockets start out in blocking mode, so by default socket calls may make

the caller wait.

However, when in nonblocking mode, a socket.error exception is raised if a

recv socket call doesn’t find any data, or if a send call can’t immediately transfer

data. A script can catch this exception to determine whether the socket is ready for

processing. In blocking mode, these calls always block until they can proceed. Of

course, there may be much more to processing client requests than data transfers

(requests may also require long-running computations), so nonblocking sockets

don’t guarantee that servers won’t stall in general. They are simply another way to

code multiplexing servers. Like select, they are better suited when client requests

can be serviced quickly.

The asyncore module framework

If you’re interested in using select, you will probably also be interested in checking

out the asyncore.py module in the standard Python library. It implements a class-

based callback model, where input and output callbacks are dispatched to class

methods by a precoded select event loop. As such, it allows servers to be constructed

without threads or forks, and it is a select-based alternative to the socketserver

module’s threading and forking classes we met in the prior sections. As

for this type of server in general, asyncore is best when transactions are short—

what it describes as “I/O bound” instead of “CPU bound” programs, the latter of

which still require threads or forks. See the Python library manual for details and

a usage example.

Twisted

For other server options, see also the open source Twisted system

(http://twistedmatrix.com). Twisted is an asynchronous networking framework written in Python

that supports TCP, UDP, multicast, SSL/TLS, serial communication, and more. It

supports both clients and servers and includes implementations of a number of

commonly used network services such as a web server, an IRC chat server, a mail

server, a relational database interface, and an object broker.

Although Twisted supports processes and threads for longer-running actions, it

also uses an asynchronous, event-driven model to handle clients, which is similar

to the event loop of GUI libraries like tkinter. It abstracts an event loop, which

multiplexes among open socket connections, automates many of the details in-

herent in an asynchronous server, and provides an event-driven framework for

scripts to use to accomplish application tasks. Twisted’s internal event engine is

similar in spirit to our select-based server and the asyncore module, but it is re-

garded as much more advanced. Twisted is a third-party system, not a standard

library tool; see its website and documentation for more details.
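To make the select timeout and nonblocking socket notes above a bit more concrete, here is a minimal sketch. The helper names poll_readable and try_recv are hypothetical and are not part of this book's examples. The sketch also assumes a recent Python 3.X, where socket.socketpair is available on all major platforms and where a nonblocking recv with no data raises BlockingIOError, a subclass of the socket.error exception described earlier.

```python
import select
import socket

def poll_readable(sock, timeout=0):
    """Return True if sock has data (or end-of-file) ready within timeout seconds."""
    # a zero timeout means poll and return immediately
    readable, _, _ = select.select([sock], [], [], timeout)
    return sock in readable

def try_recv(sock, size=1024):
    """Nonblocking read: return bytes, or None if no data is available yet."""
    sock.setblocking(False)              # a false flag selects nonblocking mode
    try:
        return sock.recv(size)
    except BlockingIOError:              # raised instead of waiting for data
        return None
    finally:
        sock.setblocking(True)           # restore the default blocking mode

if __name__ == '__main__':
    left, right = socket.socketpair()    # two connected sockets in one process
    print(poll_readable(right))          # nothing has been sent yet
    print(try_recv(right))
    left.send(b'spam')
    print(poll_readable(right, 1.0))     # wait up to one second for the data
    print(try_recv(right))
    left.close()
    right.close()
```

Neither helper makes a server immune to stalls. Both only report whether a transfer could proceed right now, which is why they suit short transactions best.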

Summary: Choosing a Server Scheme

So when should you use select to build a server, instead of threads or forks? Needs

vary per application, of course, but as mentioned, servers based on the select call

generally perform very well when client transactions are relatively short and are not

CPU-bound. If they are not short, threads or forks may be a better way to split pro-

cessing among multiple clients. Threads and forks are especially useful if clients require

long-running processing above and beyond the socket calls used to pass data. However,

combinations are possible too—nothing is stopping a select-based polling loop from

using threads, too.

It’s important to remember that schemes based on select (and nonblocking sockets)

are not completely immune to blocking. In Example 12-9, for instance, the send call

that echoes text back to a client might block, too, and hence stall the entire server. We

could work around that blocking potential by using select to make sure that the output

operation is ready before we attempt it (e.g., use the writesocks list and add another

loop to send replies to ready output sockets), albeit at a noticeable cost in program

clarity.
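As a rough illustration of the write-select idea just described, the following sketch echoes data on a single connected socket, but sends replies only when select reports the socket as writable. It is not the code of Example 12-9: the function name and the pending buffer are assumptions made for this sketch, and a real server would keep one such buffer per client.

```python
import select
import socket

def echo_with_write_select(conn):
    """Echo everything received on conn, sending only when select allows it."""
    conn.setblocking(False)              # so send never waits, even for big replies
    pending = b''                        # reply bytes not yet sent
    at_eof = False                       # True once the peer stops sending
    while not (at_eof and not pending):
        readers = [] if at_eof else [conn]
        writers = [conn] if pending else []
        readable, writable, _ = select.select(readers, writers, [])
        if writable:
            sent = conn.send(pending)    # may accept only part of the data
            pending = pending[sent:]
        if readable:
            data = conn.recv(1024)
            if data:
                pending += b'Echo=>' + data
            else:
                at_eof = True            # empty bytes: peer closed its side
    conn.close()
```

Even in this tiny case, the extra bookkeeping for unsent data shows the cost in clarity mentioned above.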

In general, though, if we cannot split up the processing of a client’s request in such a

way that it can be multiplexed with other requests and not block the server’s main loop,

select may not be the best way to construct a server by itself. While some network

servers can satisfy this constraint, many cannot.

Moreover, select also seems more complex than spawning either processes or threads,

because we need to manually transfer control among all tasks (for instance, compare

the threaded and select versions of our echo server, even without write selects). As

usual, though, the degree of that complexity varies per application. The asyncore

standard library module mentioned earlier simplifies some of the tasks of implementing

a select-based event-loop socket server, and Twisted offers additional hybrid

solutions.

Making Sockets Look Like Files and Streams

So far in this chapter, we’ve focused on the role of sockets in the classic client/server

networking model. That’s one of their primary roles, but they have other common use

cases as well.

In Chapter 5, for instance, we saw sockets as a basic IPC device between processes and

threads on a single machine. And in Chapter 10’s exploration of linking non-GUI scripts

to GUIs, we wrote a utility module (Example 10-23) which connected a caller’s standard

output stream to a socket, on which a GUI could listen for output to be displayed.

There, I promised that we’d flesh out that module with additional transfer modes once

we had a chance to explore sockets in more depth. Now that we have, this section takes

a brief detour from the world of remote network servers to tell the rest of this story.

Although some programs can be written or rewritten to converse over sockets explicitly,

this isn’t always an option; it may be too expensive an effort for existing scripts, and

might preclude desirable nonsocket usage modes for others. In some cases, it’s better

to allow a script to use standard stream tools such as the print and input built-in

functions and sys module file calls (e.g., sys.stdout.write), and connect them to sock-

ets only when needed.

Because such stream tools are designed to operate on text-mode files, though, probably

the biggest trick here is fooling them into operating on the inherently binary mode and

very different method interface of sockets. Luckily, sockets come with a method that

achieves all the forgery we need.

The socket object makefile method comes in handy anytime you wish to process a

socket with normal file object methods or need to pass a socket to an existing interface

or program that expects a file. The socket wrapper object returned allows your scripts

to transfer data over the underlying socket with read and write calls, rather than recv

and send. Since the input and print built-in functions use the former method set, they will

happily interact with sockets wrapped by this call, too.

The makefile method also allows us to treat normally binary socket data as text instead

of byte strings, and has additional arguments such as encoding that let us specify non-

default Unicode encodings for the transferred text—much like the built-in open and

os.fdopen calls we met in Chapter 4 do for file descriptors. Although text can always

be encoded and decoded with manual calls after binary mode socket transfers, makefile

shifts the burden of text encodings from your code to the file wrapper object.
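As a small sketch of this text-mode role, the following functions send and receive one line of Unicode text through makefile wrappers. The names send_line and recv_line are hypothetical, UTF-8 is just one possible encoding choice, and the socketpair call in the usage note is only a convenient way to get two connected sockets for a test.

```python
import socket

def send_line(sock, text, encoding='utf-8'):
    """Write one line of text to sock through a text-mode file wrapper."""
    with sock.makefile('w', encoding=encoding) as wrapper:
        print(text, file=wrapper)        # the wrapper encodes str to bytes
        wrapper.flush()                  # push buffered text to the socket now

def recv_line(sock, encoding='utf-8'):
    """Read one line of text from sock through a text-mode file wrapper."""
    with sock.makefile('r', encoding=encoding) as wrapper:
        # note: the wrapper may buffer, and then drop, data past this line
        return wrapper.readline().rstrip('\n')
```

For instance, after `a, b = socket.socketpair()`, the call `send_line(a, 'Größe')` followed by `recv_line(b)` should give back the same string. Closing a wrapper does not close the underlying socket.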

This equivalence to files comes in handy any time we want to use software that supports

file interfaces. For example, the Python pickle module’s load and dump methods expect

an object with a file-like interface (e.g., read and write methods), but they don’t require

a physical file. Passing a TCP/IP socket wrapped with the makefile call to the pickler

allows us to ship serialized Python objects over the Internet, without having to pickle

to byte strings ourselves and call raw socket methods manually. This is an alternative

to using the pickle module’s string-based calls (dumps, loads) with socket send and

recv calls, and might offer more flexibility for software that must support a variety of

transport mechanisms. See Chapter 17 for more details on object serialization

interfaces.

More generally, any component that expects a file-like method protocol will gladly

accept a socket wrapped with a socket object makefile call. Such interfaces will also

accept strings wrapped with the built-in io.StringIO class, and any other sort of object

that supports the same kinds of method calls as built-in file objects. As always in Python,

we code to protocols—object interfaces—not to specific datatypes.
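The pickle case described above can be sketched in a few lines. The function names here are hypothetical, and the sketch assumes exactly one object is shipped per wrapper, since a buffered reader may consume bytes beyond the first pickle. As with any use of pickle, it is appropriate only between programs that trust each other.

```python
import pickle
import socket

def send_object(sock, obj):
    """Pickle obj straight onto the socket through a binary file wrapper."""
    with sock.makefile('wb') as wrapper:     # closing the wrapper flushes it
        pickle.dump(obj, wrapper)            # dump only needs a write method

def recv_object(sock):
    """Read one pickled object back from the socket's file wrapper."""
    with sock.makefile('rb') as wrapper:
        return pickle.load(wrapper)          # load only needs read methods

if __name__ == '__main__':
    left, right = socket.socketpair()        # stand-in for a real connection
    send_object(left, {'name': 'Bob', 'jobs': ['dev', 'mgr']})
    print(recv_object(right))
    left.close()
    right.close()
```

Neither function mentions send or recv: the pickler sees only a file-like object, which is the point of coding to protocols.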

A Stream Redirection Utility

To illustrate the makefile method’s operation, Example 12-10 implements a variety of

redirection schemes, which redirect the caller’s streams to a socket that can be used by

another process for communication. The first of its functions connects output, and is

what we used in Chapter 10; the others connect input, and both input and output in

three different modes.

Naturally, the wrapper object returned by socket.makefile can also be used with direct

file interface read and write method calls and independently of standard streams. This

example uses those methods, too, albeit in most cases indirectly and implicitly through

the print and input stream access built-ins, and reflects a common use case for the tool.

Example 12-10. PP4E\Internet\Sockets\socket_stream_redirect.py

"""
###############################################################################
Tools for connecting standard streams of non-GUI programs to sockets that
a GUI (or other) program can use to interact with the non-GUI program.
###############################################################################
"""

import sys
from socket import *
port = 50008               # pass in different port if multiple dialogs on machine
host = 'localhost'         # pass in different host to connect to remote listeners

def initListenerSocket(port=port):
    """
    initialize connected socket for callers that listen in server mode
    """
    sock = socket(AF_INET, SOCK_STREAM)
    sock.bind(('', port))                      # listen on this port number
    sock.listen(5)                             # set pending queue length
    conn, addr = sock.accept()                 # wait for client to connect
    return conn                                # return connected socket

def redirectOut(port=port, host=host):
    """
    connect caller's standard output stream to a socket for GUI to listen
    start caller after listener started, else connect fails before accept
    """
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((host, port))                 # caller operates in client mode
    file = sock.makefile('w')                  # file interface: text, buffered
    sys.stdout = file                          # make prints go to sock.send
    return sock                                # if caller needs to access it raw

def redirectIn(port=port, host=host):
    """
    connect caller's standard input stream to a socket for GUI to provide
    """
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((host, port))
    file = sock.makefile('r')                  # file interface wrapper
    sys.stdin = file                           # make input come from sock.recv
    return sock                                # return value can be ignored

def redirectBothAsClient(port=port, host=host):
    """
    connect caller's standard input and output stream to same socket
    in this mode, caller is client to a server: sends msg, receives reply
    """
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((host, port))                 # or open in 'rw' mode
    ofile = sock.makefile('w')                 # file interface: text, buffered
    ifile = sock.makefile('r')                 # two file objects wrap same socket
    sys.stdout = ofile                         # make prints go to sock.send
    sys.stdin = ifile                          # make input come from sock.recv
    return sock

def redirectBothAsServer(port=port, host=host):
    """
    connect caller's standard input and output stream to same socket
    in this mode, caller is server to a client: receives msg, send reply
    """
    sock = socket(AF_INET, SOCK_STREAM)
    sock.bind((host, port))                    # caller is listener here
    sock.listen(5)
    conn, addr = sock.accept()
    ofile = conn.makefile('w')                 # file interface wrapper
    ifile = conn.makefile('r')                 # two file objects wrap same socket
    sys.stdout = ofile                         # make prints go to sock.send
    sys.stdin = ifile                          # make input come from sock.recv
    return conn

To test, the script in Example 12-11 defines five sets of client/server functions. It runs

the client’s code in process, but deploys the Python multiprocessing module we met

in Chapter 5 to portably spawn the server function’s side of the dialog in a separate

process. In the end, the client and server test functions run in different processes, but

converse over a socket that is connected to standard streams within the test script’s

process.

Example 12-11. PP4E\Internet\Sockets\test-socket_stream_redirect.py

"""
###############################################################################
test the socket_stream_redirect.py modes
###############################################################################
"""

import sys, os, multiprocessing
from socket_stream_redirect import *

###############################################################################
# redirected client output
###############################################################################

def server1():
    mypid = os.getpid()
    conn = initListenerSocket()                       # block till client connect
    file = conn.makefile('r')
    for i in range(3):                                # read/recv client's prints
        data = file.readline().rstrip()               # block till data ready
        print('server %s got [%s]' % (mypid, data))   # print normally to terminal

def client1():
    mypid = os.getpid()
    redirectOut()
    for i in range(3):
        print('client %s: %s' % (mypid, i))           # print to socket
        sys.stdout.flush()                            # else buffered till exits!

###############################################################################
# redirected client input
###############################################################################

def server2():
    mypid = os.getpid()                               # raw socket not buffered
    conn = initListenerSocket()                       # send to client's input
    for i in range(3):
        conn.send(('server %s: %s\n' % (mypid, i)).encode())

def client2():
    mypid = os.getpid()
    redirectIn()
    for i in range(3):
        data = input()                                # input from socket
        print('client %s got [%s]' % (mypid, data))   # print normally to terminal

###############################################################################
# redirect client input + output, client is socket client
###############################################################################

def server3():
    mypid = os.getpid()
    conn = initListenerSocket()                       # wait for client connect
    file = conn.makefile('r')                         # recv print(), send input()
    for i in range(3):                                # readline blocks till data
        data = file.readline().rstrip()
        conn.send(('server %s got [%s]\n' % (mypid, data)).encode())

def client3():
    mypid = os.getpid()
    redirectBothAsClient()
    for i in range(3):
        print('client %s: %s' % (mypid, i))           # print to socket
        data = input()                                # input from socket: flushes!
        sys.stderr.write('client %s got [%s]\n' % (mypid, data))   # not redirected

###############################################################################
# redirect client input + output, client is socket server
###############################################################################

def server4():
    mypid = os.getpid()
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((host, port))
    file = sock.makefile('r')
    for i in range(3):
        sock.send(('server %s: %s\n' % (mypid, i)).encode())   # send to input()
        data = file.readline().rstrip()                        # recv from print()
        print('server %s got [%s]' % (mypid, data))            # result to terminal

def client4():
    mypid = os.getpid()
    redirectBothAsServer()               # I'm actually the socket server in this mode
    for i in range(3):
        data = input()                                # input from socket: flushes!
        print('client %s got [%s]' % (mypid, data))   # print to socket
        sys.stdout.flush()                            # else last buffered till exit!

###############################################################################
# redirect client input + output, client is socket client, server xfers first
###############################################################################

def server5():
    mypid = os.getpid()                               # test 4, but server accepts
    conn = initListenerSocket()                       # wait for client connect
    file = conn.makefile('r')                         # send input(), recv print()
    for i in range(3):
        conn.send(('server %s: %s\n' % (mypid, i)).encode())
        data = file.readline().rstrip()
        print('server %s got [%s]' % (mypid, data))

def client5():
    mypid = os.getpid()
    s = redirectBothAsClient()           # I'm the socket client in this mode
    for i in range(3):
        data = input()                                # input from socket: flushes!
        print('client %s got [%s]' % (mypid, data))   # print to socket
        sys.stdout.flush()                            # else last buffered till exit!

###############################################################################
# test by number on command-line
###############################################################################

if __name__ == '__main__':
    server = eval('server' + sys.argv[1])
    client = eval('client' + sys.argv[1])             # client in this process
    multiprocessing.Process(target=server).start()    # server in new process
    client()                                          # reset streams in client
    #import time; time.sleep(5)                       # test effect of exit flush

Run the test script with a client and server number on the command line to test the

module’s tools; messages display process ID numbers, and those within square brackets

reflect a transfer across streams connected to sockets (twice, when nested):

C:\...\PP4E\Internet\Sockets> test-socket_stream_redirect.py 1

server 3844 got [client 1112: 0]

server 3844 got [client 1112: 1]

server 3844 got [client 1112: 2]

C:\...\PP4E\Internet\Sockets> test-socket_stream_redirect.py 2

client 5188 got [server 2020: 0]

client 5188 got [server 2020: 1]

client 5188 got [server 2020: 2]

C:\...\PP4E\Internet\Sockets> test-socket_stream_redirect.py 3

client 7796 got [server 2780 got [client 7796: 0]]

client 7796 got [server 2780 got [client 7796: 1]]

client 7796 got [server 2780 got [client 7796: 2]]

C:\...\PP4E\Internet\Sockets> test-socket_stream_redirect.py 4

server 4288 got [client 3852 got [server 4288: 0]]

server 4288 got [client 3852 got [server 4288: 1]]

server 4288 got [client 3852 got [server 4288: 2]]

C:\...\PP4E\Internet\Sockets> test-socket_stream_redirect.py 5

server 6040 got [client 7728 got [server 6040: 0]]

server 6040 got [client 7728 got [server 6040: 1]]

server 6040 got [client 7728 got [server 6040: 2]]

If you correlate this script’s output with its code to see how messages are passed be-

tween client and server, you’ll find that print and input calls in client functions are

ultimately routed over sockets to another process. To the client functions, the socket

linkage is largely invisible.
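The redirection functions of Example 12-10 leave the streams reset for the rest of the process. Where a temporary redirection is wanted instead, one possible variation is a context manager that restores sys.stdout on exit. The stdout_to_socket name below is hypothetical and is not part of the book's module; it is only a sketch of the same makefile technique.

```python
import contextlib
import socket
import sys

@contextlib.contextmanager
def stdout_to_socket(sock):
    """Route print output to sock inside a with block, then restore sys.stdout."""
    wrapper = sock.makefile('w')         # text mode, buffered file wrapper
    saved = sys.stdout
    sys.stdout = wrapper                 # prints now go to the socket
    try:
        yield wrapper
    finally:
        sys.stdout = saved               # undo the redirection
        wrapper.close()                  # flushes buffered text; socket stays open

if __name__ == '__main__':
    left, right = socket.socketpair()    # stand-in for a real connection
    with stdout_to_socket(left):
        print('client message')          # sent over the socket, not the console
    print(right.recv(1024))              # shown on the console again
    left.close()
    right.close()
```

Because the wrapper is closed when the block exits, its buffered text is flushed at that point, which sidesteps the delayed-output issue for short exchanges like this one.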

Text-mode files and buffered output streams

Before we move on, there are two remarkably subtle aspects of the example’s code

worth highlighting:

Binary to text translations

Raw sockets transfer binary byte strings, but by opening the wrapper files in text

mode, their content is automatically translated to text strings on input and output.

Text-mode file wrappers are required if accessed through standard stream tools

such as the print built-in that writes text strings (as we’ve learned, binary mode

files require byte strings instead). When dealing with the raw socket directly,

though, text must still be manually encoded to byte strings, as shown in most of

Example 12-11’s tests.

Buffered streams, program output, and deadlock

As we learned in Chapters 5 and 10, standard streams are normally buffered, and

printed text may need to be flushed so that it appears on a socket connected to a

process’s output stream. Indeed, some of this example’s tests require explicit or

implicit flush calls to work properly at all; otherwise their output is either incom-

plete or absent altogether until program exit. In pathological cases, this can lead

to deadlock, with a process waiting for output from another that never appears. In

other configurations, we may also get socket errors in a reader if a writer exits too

soon, especially in two-way dialogs.

For example, if client1 and client4 did not flush periodically as they do, they would

work at all only because output streams are automatically flushed

when their process exits. Without manual flushes, client1 transfers no data until

process exit (at which point all its output is sent at once in a single message), and

client4’s data is incomplete till exit (its last printed message is delayed).

Even more subtly, both client3 and client4 rely on the fact that the input built-

in first automatically flushes sys.stdout internally for its prompt option, thereby

sending data from preceding print calls. Without this implicit flush (or the addition

of manual flushes), client3 would experience deadlock immediately, as would

client4 if its manual flush call was removed (even with input’s flush, removing

client4’s manual flush causes its final print message to not be transferred until

process exit). client5 has the same behavior as client4, because it simply swaps

which process binds and accepts and which connects.

In the general case, if we wish to read a program’s output as it is produced, instead

of all at once when it exits or as its buffers fill, the program must either call

sys.stdout.flush periodically, or be run with unbuffered streams by using

Python’s -u command-line argument of Chapter 5 if applicable.

Although we can open socket wrapper files in unbuffered mode with a second

makefile argument of zero (like normal open), this does not allow the wrapper to

run in the text mode required for print and desired for input. In fact, attempting

to make a socket wrapper file both text mode and unbuffered this way fails with

an exception, because Python 3.X no longer supports unbuffered mode for text

files (it is allowed for binary mode only today). In other words, because print

requires text mode, buffered mode is also implied for output stream files. More-

over, attempting to open a socket file wrapper in line-buffered mode appears to not

be supported in Python 3.X (more on this ahead).

While some buffering behavior may be library and platform dependent, manual

flush calls or direct socket access might sometimes still be required. Note that

sockets can also be made nonblocking with the setblocking(0) method, but this

only avoids wait states for transfer calls and does not address the data producer’s

failure to send buffered output.
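The buffering effect described above is easy to observe in isolation. The following sketch writes a short string to a text-mode socket wrapper and checks whether anything has reached the other end, before and after a flush. The helper names are hypothetical, the timeouts are arbitrary, and socketpair is used only to obtain two connected sockets in a single process.

```python
import select
import socket

def has_data(sock, timeout):
    """Return True if sock becomes readable within timeout seconds."""
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable)

def flush_demo():
    """Return (data visible before flush?, after flush?, bytes received)."""
    writer_sock, reader_sock = socket.socketpair()
    out = writer_sock.makefile('w')          # text mode, default full buffering
    out.write('spam')                        # held in the wrapper's buffer
    before = has_data(reader_sock, 0.2)      # nothing has been sent yet
    out.flush()                              # now the text goes to the socket
    after = has_data(reader_sock, 2.0)
    data = reader_sock.recv(1024)
    out.close()
    writer_sock.close()
    reader_sock.close()
    return before, after, data

if __name__ == '__main__':
    print(flush_demo())
```

A reader that blocks on this socket before the flush would wait indefinitely, which is the deadlock scenario just described.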

Stream requirements

To make some of this more concrete, Example 12-12 illustrates how some of these

complexities apply to redirected standard streams, by attempting to connect them to

both text and binary mode files produced by open and accessing them with print and

input built-ins much as a redirected script might.

Example 12-12. PP4E\Internet\Sockets\test-stream-modes.py

"""
test effect of connecting standard streams to text and binary mode files
same holds true for socket.makefile: print requires text mode, but text
mode precludes unbuffered mode -- use -u or sys.stdout.flush() calls
"""

import sys

def reader(F):
    tmp, sys.stdin = sys.stdin, F
    line = input()
    print(line)
    sys.stdin = tmp

reader( open('test-stream-modes.py') )         # works: input() returns text
reader( open('test-stream-modes.py', 'rb') )   # works: but input() returns bytes

def writer(F):
    tmp, sys.stdout = sys.stdout, F
    print(99, 'spam')
    sys.stdout = tmp

writer( open('temp', 'w') )                    # works: print() passes text str to .write()
print(open('temp').read())

writer( open('temp', 'wb') )                   # FAILS on print: binary mode requires bytes
writer( open('temp', 'w', 0) )                 # FAILS on open: text must be buffered

When run, the last two lines in this script both fail—the second to last fails because

print passes text strings to a binary-mode file (never allowed for files in general), and

the last fails because we cannot open text-mode files in unbuffered mode in Python 3.X

(text mode implies Unicode encodings). Here are the errors we get when this script is

run: the first run uses the script as shown, and the second shows what happens if the

second to last line is commented out (I edited the exception text slightly for

presentation):

C:\...\PP4E\Internet\Sockets> test-stream-modes.py

"""

b'"""\r'

99 spam

Traceback (most recent call last):

File "C:\...\PP4E\Internet\Sockets\test-stream-modes.py", line 26, in <module>

writer( open('temp', 'wb') ) # FAILS on print: binary mode...

File "C:\...\PP4E\Internet\Sockets\test-stream-modes.py", line 20, in writer

print(99, 'spam')

TypeError: must be bytes or buffer, not str

C:\...\PP4E\Internet\Sockets> test-streams-binary.py

"""

b'"""\r'

99 spam

Traceback (most recent call last):

File "C:\...\PP4E\Internet\Sockets\test-stream-modes.py", line 27, in <module>

writer( open('temp', 'w', 0) ) # FAILS on open: text must be...

ValueError: can't have unbuffered text I/O

The same rules apply to socket wrapper file objects created with a socket’s makefile

method—they must be opened in text mode for print and should be opened in text

mode for input if we wish to receive text strings, but text mode prevents us from using

fully unbuffered file mode altogether:

>>> from socket import *

>>> s = socket() # defaults to tcp/ip (AF_INET, SOCK_STREAM)

>>> s.makefile('w', 0) # this used to work in Python 2.X

Traceback (most recent call last):

File "C:\Python31\lib\socket.py", line 151, in makefile

ValueError: unbuffered streams must be binary

Line buffering

Text-mode socket wrappers also accept a buffering-mode argument of 1 to specify line-

buffering instead of the default full buffering:

>>> from socket import *

>>> s = socket()

>>> f = s.makefile('w', 1) # same as buffering=1, but acts as fully buffered!

This appears to be no different than full buffering, and still requires the resulting file

to be flushed manually to transfer lines as they are produced. Consider the simple socket

server and client scripts in Examples 12-13 and 12-14. The server simply reads three

messages using the raw socket interface.

Example 12-13. PP4E\Internet\Sockets\socket-unbuff-server.py

from socket import *           # read three messages over a raw socket
sock = socket()
sock.bind(('', 60000))
sock.listen(5)
print('accepting...')
conn, id = sock.accept()       # blocks till client connect

for i in range(3):
    print('receiving...')
    msg = conn.recv(1024)      # blocks till data received
    print(msg)                 # gets all print lines at once unless flushed

The client in Example 12-14 sends three messages; the first two over a socket wrapper

file, and the last using the raw socket; the manual flush calls in this are commented out

but retained so you can experiment with turning them on, and sleep calls make the

server wait for data.

Example 12-14. PP4E\Internet\Sockets\socket-unbuff-client.py

import time # send three msgs over wrapped and raw socket

from socket import *

sock = socket() # default=AF_INET, SOCK_STREAM (tcp/ip)

sock.connect(('localhost', 60000))

file = sock.makefile('w', buffering=1) # default=full buff, 0=error, 1 not linebuff!

print('sending data1')

file.write('spam\n')

time.sleep(5) # must follow with flush() to truly send now

#file.flush() # uncomment flush lines to see the difference

print('sending data2')

print('eggs', file=file) # adding more file prints does not flush buffer either

time.sleep(5)

#file.flush() # output appears at server recv only upon flush or exit

print('sending data3')

sock.send(b'ham\n') # low-level byte string interface sends immediately

time.sleep(5) # received first if don't flush other two!

Run the server in one window first and the client in another (or run the server first in

the background in Unix-like platforms). The output in the server window follows—

the messages sent with the socket wrapper are deferred until program exit, but the raw

socket call transfers data immediately:

C:\...\PP4E\Internet\Sockets> socket-unbuff-server.py

accepting...

receiving...

b'ham\n'

receiving...

b'spam\r\neggs\r\n'

receiving...

b''

The client window simply displays “sending” lines 5 seconds apart; its third message

appears at the server in 10 seconds, but the first and second messages it sends using

the wrapper file are deferred until exit (for 15 seconds) because the socket wrapper is

still fully buffered. If the manual flush calls in the client are uncommented, the

three sent messages are delivered in order: the first two arrive 5 seconds apart, and the

third appears immediately after the second:

C:\...\PP4E\Internet\Sockets> socket-unbuff-server.py

accepting...

receiving...

b'spam\r\n'

receiving...

b'eggs\r\n'

receiving...

b'ham\n'

In other words, even when line buffering is requested, socket wrapper file writes (and

by association, prints) are buffered until the program exits, manual flushes are

requested, or the buffer becomes full.
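The same effect can be reproduced in a single process, without separate windows. The following is a minimal sketch rather than one of the book's examples: it assumes that the standard library's socket.socketpair call is available on your platform (it is on Unix-like systems, and on Windows in recent Python 3.X releases), and the helper name pending is made up for illustration.

```python
import socket

def pending(sock):
    """return bytes waiting on sock right now, or None if nothing has arrived"""
    sock.setblocking(False)
    try:
        return sock.recv(1024)
    except BlockingIOError:
        return None
    finally:
        sock.setblocking(True)

sender, receiver = socket.socketpair()         # two connected sockets, one process
wrapper = sender.makefile('w', buffering=1)    # line buffering requested, as in Example 12-14

wrapper.write('spam\n')
before = pending(receiver)                     # checked before any flush
wrapper.flush()
after = receiver.recv(1024)                    # blocks until the flushed line arrives

print('before flush:', before)
print('after flush: ', after)

wrapper.close(); sender.close(); receiver.close()
```

If the wrapper behaves as described above, the first check finds nothing waiting, and the line shows up only after the flush call. The end-of-line bytes received may differ per platform, as in the server output shown earlier.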

Solutions

The short story here is this: to avoid delayed outputs or deadlock, scripts that might

send data to waiting programs by printing to wrapped sockets (or for that matter, by

using print or sys.stdout.write in general) should do one of the following:

• Call sys.stdout.flush periodically to flush their printed output so it becomes

available as produced, as shown in Example 12-11.

• Be run with the -u Python command-line flag, if possible, to force the output stream

to be unbuffered. This works for unmodified programs spawned by pipe tools such

as os.popen. It will not help with the use case here, though, because we manually

reset the stream files to buffered text socket wrappers after a process starts. To

prove this, uncomment Example 12-11’s manual flush calls and the sleep call at its

end, and run with -u: the first test’s output is still delayed for 5 seconds.

• Use threads to read from sockets to avoid blocking, especially if the receiving

program is a GUI and it cannot depend upon the client to flush. See Chapter 10 for

pointers. This doesn’t really fix the problem—the spawned reader thread may be

blocked or deadlocked, too—but at least the GUI remains active during waits.

• Implement their own custom socket wrapper objects which intercept text write

calls, encode to binary, and route to a socket with send calls; socket.makefile is

really just a convenience tool, and we can always code a wrapper of our own for

more specific roles. For hints, see Chapter 10’s GuiOutput class, the stream

redirection class in Chapter 3, and the classes of the io standard library module (upon

which Python’s input/output tools are based, and which you can mix in custom

ways).

• Skip print altogether and communicate directly with the native interfaces of IPC

devices, such as socket objects’ raw send and recv methods—these transfer data

immediately and do not buffer data as file methods can. We can either transfer

simple byte strings this way or use the pickle module’s dumps and loads tools to

convert Python objects to and from byte strings for such direct socket transfer

(more on pickle in Chapter 17).

The latter option may be more direct (and the redirection utility module also returns

the raw socket in support of such usage), but it isn’t viable in all scenarios, especially

for existing or multimode scripts. In many cases, it may be most straightforward to use

manual flush calls in shell-oriented programs whose streams might be linked to other

programs through sockets.

Buffering in other contexts: Command pipes revisited

Also keep in mind that buffered streams and deadlock are general issues that go beyond

socket wrapper files. We explored this topic in Chapter 5; as a quick review, the

nonsocket Example 12-15 does not fully buffer its output when it is connected to a terminal

(output is only line buffered when run from a shell command prompt), but does if

connected to something else (including a socket or pipe).

Example 12-15. PP4E\Internet\Sockets\pipe-unbuff-writer.py

# output line buffered (unbuffered) if stdout is a terminal, buffered by default for
# other devices: use -u or sys.stdout.flush() to avoid delayed output on pipe/socket

import time, sys
for i in range(5):
    print(time.asctime())                 # print transfers per stream buffering
    sys.stdout.write('spam\n')            # ditto for direct stream file access
    time.sleep(2)                         # unless sys.stdout reset to other file

Although text-mode files are required for Python 3.X’s print in general, the -u flag still

works in 3.X to suppress full output stream buffering. In Example 12-16, using this flag

makes the spawned script’s printed output appear every 2 seconds, as it is produced.

Not using this flag defers all output for 10 seconds, until the spawned script exits, unless

the spawned script calls sys.stdout.flush on each iteration.

Example 12-16. PP4E\Internet\Sockets\pipe-unbuff-reader.py

# no output for 10 seconds unless Python -u flag used or sys.stdout.flush()
# but writer's output appears here every 2 seconds when either option is used

import os
for line in os.popen('python -u pipe-unbuff-writer.py'):    # iterator reads lines
    print(line, end='')                                     # blocks without -u!

Following is the reader script’s output; unlike the socket examples, it spawns the writer

automatically, so we don’t need separate windows to test. Recall from Chapter 5 that

os.popen also accepts a buffering argument much like socket.makefile, but it does not

apply to the spawned program’s stream, and so would not prevent output buffering in

this case.

C:\...\PP4E\Internet\Sockets> pipe-unbuff-reader.py

Wed Apr 07 09:32:28 2010

spam

Wed Apr 07 09:32:30 2010

spam

Wed Apr 07 09:32:32 2010

spam

Wed Apr 07 09:32:34 2010

spam

Wed Apr 07 09:32:36 2010

spam

The net effect is that -u still works around the stream buffering issue for connected

programs in 3.X, as long as you don’t reset the streams to other objects in the spawned

program as we did for socket redirection in Example 12-11. For socket redirections,

manual flush calls or replacement socket wrappers may be required.
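For comparison, here is a hedged sketch of the same reader idea written with the subprocess module instead of os.popen. It is not part of the book's examples package. The writer's code is passed inline with -c so that the sketch is self-contained, sys.executable stands in for the python command, and the sleeps are shortened. It also assumes a recent Python 3.X: print's flush keyword argument arrived in Python 3.3, and Popen's text argument in 3.7.

```python
import subprocess, sys, time

writer = r"""
import time
for i in range(3):
    print('spam', i, flush=True)     # flush each line, even when stdout is a pipe
    time.sleep(0.5)
"""

start = time.time()
arrivals = []
proc = subprocess.Popen([sys.executable, '-c', writer],
                        stdout=subprocess.PIPE, text=True)
for line in proc.stdout:                           # iterator reads lines as they arrive
    arrivals.append((round(time.time() - start, 1), line.rstrip()))
proc.wait()

for elapsed, line in arrivals:
    print(elapsed, line)
```

With the flush argument in place, the elapsed times printed should be spread out rather than bunched at the end. Removing flush=True (and not passing -u) should show the deferred behavior described above.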

Sockets versus command pipes

So why use sockets in this redirection role at all? In short, for server independence and

networked use cases. Notice how for command pipes it’s not clear who should be called

“server” and “client,” since neither script runs perpetually. In fact, this is one of the

major downsides of using command pipes like this instead of sockets—because the

programs require a direct spawning relationship, command pipes do not support

longer-lived or remotely running servers the way that sockets do.

With sockets, we can start client and server independently, and the server may continue

running perpetually to serve multiple clients (albeit with some changes to our utility

module’s listener initialization code). Moreover, passing in remote machine names to

our socket redirection tools would allow a client to connect to a server running on a

completely different machine. As we learned in Chapter 5, named pipes (fifos) accessed

with the open call support stronger independence of client and server, too, but unlike

sockets, they are usually limited to the local machine, and are not supported on all

platforms.

Experiment with this code on your own for more insight. Also try changing

Example 12-11 to run the client function in a spawned process instead of or in addition to

the server, with and without flush calls and time.sleep calls to defer exits; the spawning

structure might have some impact on the soundness of a given socket dialog structure

as well, which we’ll finesse here in the interest of space.

Despite the care that must be taken with text encodings and stream buffering, the utility

provided by Example 12-10 is still arguably impressive—prints and input calls are

routed over network or local-machine socket connections in a largely automatic

fashion, and with minimal changes to the nonsocket code that uses the module. In many

cases, the technique can extend a script’s applicability.

In the next section, we’ll use the makefile method again to wrap the socket in a

file-like object, so that it can be read by lines using normal text-file method calls and

techniques. This isn’t strictly required in the example; we could read lines as byte strings

with the socket recv call, too. In general, though, the makefile method comes in handy

any time you wish to treat sockets as though they were simple files. To see this at work,

let’s move on.
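As a quick preview of that idea, the sketch below reads one text line from a socket through a makefile wrapper. It is an illustration only: it assumes socket.socketpair is available on your platform, and the filename is an arbitrary stand-in for what a client might send.

```python
import socket

client, server = socket.socketpair()
client.sendall(b'testdir/textfile.txt\n')     # what a client might send: name + end-line

sockfile = server.makefile('r')               # text-mode wrapper: decodes bytes to str
line = sockfile.readline()                    # normal file method, reads through end-line
filename = line.rstrip('\n')
print(repr(filename))

sockfile.close(); server.close(); client.close()
```

One design point to keep in mind: the wrapper reads ahead in chunks, so any bytes that follow the first line may already sit in its buffer. Mixing wrapper reads with raw recv calls on the same socket is safest when, as here, only a single line travels in that direction.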

A Simple Python File Server

It’s time for something realistic. Let’s conclude this chapter by putting some of the

socket ideas we’ve studied to work doing something a bit more useful than echoing

text back and forth. Example 12-17 implements both the server-side and the client-side

logic needed to ship a requested file from server to client machines over a raw socket.

In effect, this script implements a simple file download system. One instance of the

script is run on the machine where downloadable files live (the server), and another on

the machines you wish to copy files to (the clients). Command-line arguments tell the

script which flavor to run and optionally name the server machine and port number

over which conversations are to occur. A server instance can respond to any number

of client file requests at the port on which it listens, because it serves each in a thread.

Example 12-17. PP4E\Internet\Sockets\getfile.py

"""

#############################################################################
implement client and server-side logic to transfer an arbitrary file from
server to client over a socket; uses a simple control-info protocol rather
than separate sockets for control and data (as in ftp), dispatches each
client request to a handler thread, and loops to transfer the entire file
by blocks; see ftplib examples for a higher-level transport scheme;
#############################################################################
"""

import sys, os, time, _thread as thread
from socket import *

blksz = 1024
defaultHost = 'localhost'
defaultPort = 50001

helptext = """
Usage...
server=> getfile.py -mode server [-port nnn] [-host hhh|localhost]
client=> getfile.py [-mode client] -file fff [-port nnn] [-host hhh|localhost]
"""

def now():
    return time.asctime()

def parsecommandline():
    dict = {}                        # put in dictionary for easy lookup
    args = sys.argv[1:]              # skip program name at front of args
    while len(args) >= 2:            # example: dict['-mode'] = 'server'
        dict[args[0]] = args[1]
        args = args[2:]
    return dict

def client(host, port, filename):
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((host, port))
    sock.send((filename + '\n').encode())      # send remote name with dir: bytes
    dropdir = os.path.split(filename)[1]       # filename at end of dir path
    file = open(dropdir, 'wb')                 # create local file in cwd
    while True:
        data = sock.recv(blksz)                # get up to 1K at a time
        if not data: break                     # till closed on server side
        file.write(data)                       # store data in local file
    sock.close()
    file.close()
    print('Client got', filename, 'at', now())

def serverthread(clientsock):
    sockfile = clientsock.makefile('r')        # wrap socket in dup file obj
    filename = sockfile.readline()[:-1]        # get filename up to end-line
    try:
        file = open(filename, 'rb')
        while True:
            bytes = file.read(blksz)           # read/send 1K at a time
            if not bytes: break                # until file totally sent
            sent = clientsock.send(bytes)
            assert sent == len(bytes)
    except:
        print('Error downloading file on server:', filename)
    clientsock.close()

def server(host, port):
    serversock = socket(AF_INET, SOCK_STREAM)  # listen on TCP/IP socket
    serversock.bind((host, port))              # serve clients in threads
    serversock.listen(5)
    while True:
        clientsock, clientaddr = serversock.accept()
        print('Server connected by', clientaddr, 'at', now())
        thread.start_new_thread(serverthread, (clientsock,))

def main(args):
    host = args.get('-host', defaultHost)      # use args or defaults
    port = int(args.get('-port', defaultPort)) # is a string in argv
    if args.get('-mode') == 'server':          # None if no -mode: client
        if host == 'localhost': host = ''      # else fails remotely
        server(host, port)
    elif args.get('-file'):                    # client mode needs -file
        client(host, port, args['-file'])
    else:
        print(helptext)

if __name__ == '__main__':
    args = parsecommandline()
    main(args)

This script isn’t much different from the examples we saw earlier. Depending on the

command-line arguments passed, it invokes one of two functions:

• The server function farms out each incoming client request to a thread that

transfers the requested file’s bytes.

• The client function sends the server a file’s name and stores all the bytes it gets

back in a local file of the same name.

The most novel feature here is the protocol between client and server: the client starts

the conversation by shipping a filename string up to the server, terminated with an

end-of-line character, and including the file’s directory path on the server. At the server, a

spawned thread extracts the requested file’s name by reading the client socket, and

opens and transfers the requested file back to the client, one chunk of bytes at a time.
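One detail of the transfer loop deserves a note: a socket's send method may transmit fewer bytes than it was given, which is why Example 12-17 asserts on its return value. A hedged alternative, not the book's code, is to call sendall, which keeps sending until the whole block has gone out or an error occurs. The function name send_file below is invented for this sketch, and the self-test assumes socket.socketpair is available.

```python
import os, socket, tempfile, threading

def send_file(sock, path, blksz=1024):
    """send a file's bytes over sock in blocks; return the number of bytes sent"""
    total = 0
    with open(path, 'rb') as file:
        while True:
            block = file.read(blksz)          # read/send one block at a time
            if not block: break               # until file totally sent
            sock.sendall(block)               # repeats internally on partial sends
            total += len(block)
    return total

# self-test: ship a temporary file across a connected socket pair
payload = bytes(range(256)) * 20
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)

sender, receiver = socket.socketpair()

def serve():
    send_file(sender, tmp.name)
    sender.close()                            # closing signals end-of-file to the reader

worker = threading.Thread(target=serve)
worker.start()

received = b''
while True:
    data = receiver.recv(1024)
    if not data: break                        # till closed on sending side
    received += data

worker.join()
receiver.close()
os.remove(tmp.name)
print(len(received), received == payload)
```

As in the getfile protocol, the receiver learns that the file is complete only because the sender closes its end of the connection. A protocol that keeps the connection open would need a length prefix or some other end marker instead.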

Running the File Server and Clients

Since the server uses threads to process clients, we can test both client and server on

the same Windows machine. First, let’s start a server instance and execute two client

instances on the same machine while the server runs:

[server window, localhost]

C:\...\Internet\Sockets> python getfile.py -mode server

Server connected by ('127.0.0.1', 59134) at Sun Apr 25 16:26:50 2010

Server connected by ('127.0.0.1', 59135) at Sun Apr 25 16:27:21 2010

[client window, localhost]

C:\...\Internet\Sockets> dir /B *.gif *.txt

File Not Found

C:\...\Internet\Sockets> python getfile.py -file testdir\ora-lp4e.gif

Client got testdir\ora-lp4e.gif at Sun Apr 25 16:26:50 2010

C:\...\Internet\Sockets> python getfile.py -file testdir\textfile.txt -port 50001

Client got testdir\textfile.txt at Sun Apr 25 16:27:21 2010

Clients run in the directory where you want the downloaded file to appear—the client

instance code strips the server directory path when making the local file’s name. Here

the “download” simply copies the requested files up to the local parent directory (the

DOS fc command compares file contents):

C:\...\Internet\Sockets> dir /B *.gif *.txt

ora-lp4e.gif

textfile.txt

C:\...\Internet\Sockets> fc /B ora-lp4e.gif testdir/ora-lp4e.gif

FC: no differences encountered

C:\...\Internet\Sockets> fc textfile.txt testdir\textfile.txt

FC: no differences encountered

As usual, we can run server and clients on different machines as well. For instance, here

are the sort of commands we would use to launch the server remotely and fetch files

from it locally; run this on your own to see the client and server outputs:

[remote server window]

[...]$ python getfile.py -mode server

[client window: requested file downloaded in a thread on server]

C:\...\Internet\Sockets> python getfile.py -mode client -host learning-python.com -port 50001 -file python.exe

C:\...\Internet\Sockets> python getfile.py -host learning-python.com -file index.html

One subtle security point here: the server instance code is happy to send any

server-side file whose pathname is sent from a client, as long as the server is run with a

username that has read access to the requested file. If you care about keeping some of your

server-side files private, you should add logic to suppress downloads of restricted files.

I’ll leave this as a suggested exercise here, but we will implement such filename checks

in a different getfile download tool later in this book.#
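One possible shape for such a check, offered only as a sketch and not as the version implemented later in the book, is to resolve the requested name against a single download directory and refuse anything that lands outside it. The function name is_allowed and the notion of a dedicated root directory are assumptions made here for illustration. A temporary directory stands in for the real download folder.

```python
import os, tempfile

def is_allowed(filename, rootdir):
    """True if filename, taken relative to rootdir, resolves inside rootdir"""
    root = os.path.realpath(rootdir)
    target = os.path.realpath(os.path.join(root, filename))
    try:
        return os.path.commonpath([root, target]) == root
    except ValueError:                    # e.g., paths on different Windows drives
        return False

root = tempfile.mkdtemp()                 # stand-in for a server download directory
for name in ('testdir/textfile.txt', '../secret.txt', os.path.abspath(os.sep)):
    print(name, '=>', is_allowed(name, root))
os.rmdir(root)
```

A server thread could call a function like this on the client's filename before opening the file, and report an error instead of sending data when the test fails. Real deployments may need stricter rules, such as an explicit list of downloadable files.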

Adding a User-Interface Frontend

After all the GUI commotion in the prior part of this book, you might have noticed that

we have been living in the realm of the command line for this entire chapter—our socket

clients and servers have been started from simple DOS or Linux shells. Nothing is

stopping us from adding a nice point-and-click user interface to some of these scripts,

though; GUI and network scripting are not mutually exclusive techniques. In fact, they

can be arguably “sexy” when used together well.

For instance, it would be easy to implement a simple tkinter GUI frontend to the

client-side portion of the getfile script we just met. Such a tool, run on the client machine,

may simply pop up a window with Entry widgets for typing the desired filename, server,

and so on. Once download parameters have been input, the user interface could either

import and call the getfile.client function with appropriate option arguments, or

build and run the implied getfile.py command line using tools such as os.system,

os.popen, subprocess, and so on.
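If the command-line route is taken, one design point is how the command is built. A formatted string passed to os.system is split up by the shell, so a filename containing spaces or shell metacharacters can break it. As a hedged alternative, the sketch below builds an argument list for the standard library's subprocess.run, which starts the program without a shell. The function names getfile_command and run_getfile are invented for this illustration, and sys.executable is used in place of a python command on the search path.

```python
import subprocess, sys

def getfile_command(server, port, filename, script='getfile.py'):
    """build an argument list for a getfile client run; no shell quoting needed"""
    return [sys.executable, script,
            '-mode', 'client',
            '-file', filename,
            '-port', str(port),
            '-host', server]

def run_getfile(server, port, filename):
    """run the client and return its exit status (needs a running getfile server)"""
    return subprocess.run(getfile_command(server, port, filename)).returncode

cmd = getfile_command('localhost', 50001, 'testdir/my file.txt')
print(cmd[1:])                            # the filename stays a single argument
```

Because each option and value is its own list item, the -option value pairs reach getfile's command-line parser intact, even when a value contains blanks.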

Using row frames and command lines

To help make all of this more concrete, let’s very quickly explore a few simple scripts

that add a tkinter frontend to the getfile client-side program. All of these examples

assume that you are running a server instance of getfile; they merely add a GUI for

the client side of the conversation, to fetch a file from the server. The first, in

Example 12-18, uses form construction techniques we met in Chapters 8 and 9 to create a

dialog for inputting server, port, and filename information, and simply constructs the

corresponding getfile command line and runs it with the os.system call we studied in

Part II.

#We’ll see three more getfile programs before we leave Internet scripting. The next chapter’s getfile.py fetches

a file with the higher-level FTP interface instead of using raw socket calls, and its http-getfile scripts fetch files

over the HTTP protocol. Later, Chapter 15 presents a server-side getfile.py CGI script that transfers file

contents over the HTTP port in response to a request made in a web browser client (files are sent as the output

of a CGI script). All four of the download schemes presented in this text ultimately use sockets, but only the

version here makes that use explicit.

Example 12-18. PP4E\Internet\Sockets\getfilegui-1.py

"""

launch getfile script client from simple tkinter GUI;
could also use os.fork+exec, os.spawnv (see Launcher);
windows: replace 'python' with 'start' if not on path;
"""

import sys, os
from tkinter import *
from tkinter.messagebox import showinfo

def onReturnKey():
    cmdline = ('python getfile.py -mode client -file %s -port %s -host %s' %
                      (content['File'].get(),
                       content['Port'].get(),
                       content['Server'].get()))
    os.system(cmdline)
    showinfo('getfilegui-1', 'Download complete')

box = Tk()
labels = ['Server', 'Port', 'File']
content = {}
for label in labels:
    row = Frame(box)
    row.pack(fill=X)
    Label(row, text=label, width=6).pack(side=LEFT)
    entry = Entry(row)
    entry.pack(side=RIGHT, expand=YES, fill=X)
    content[label] = entry

box.title('getfilegui-1')
box.bind('<Return>', (lambda event: onReturnKey()))
mainloop()

When run, this script creates the input form shown in Figure 12-1. Pressing the Enter

key (<Return>) runs a client-side instance of the getfile program; when the generated

getfile command line is finished, we get the verification pop up displayed in

Figure 12-2.

Figure 12-1. getfilegui-1 in action

Figure 12-2. getfilegui-1 verification pop up

Using grids and function calls

The first user-interface script (Example 12-18) uses the pack geometry manager and

row Frames with fixed-width labels to lay out the input form and runs the getfile client

as a standalone program. As we learned in Chapter 9, it’s arguably just as easy to use

the grid manager for layout and to import and call the client-side logic function instead

of running a program. The script in Example 12-19 shows how.

Example 12-19. PP4E\Internet\Sockets\getfilegui-2.py

"""

same, but with grids and import+call, not packs and cmdline;
direct function calls are usually faster than running files;
"""

import getfile
from tkinter import *
from tkinter.messagebox import showinfo

def onSubmit():
    getfile.client(content['Server'].get(),
                   int(content['Port'].get()),
                   content['File'].get())
    showinfo('getfilegui-2', 'Download complete')

box = Tk()
labels = ['Server', 'Port', 'File']
rownum = 0
content = {}
for label in labels:
    Label(box, text=label).grid(column=0, row=rownum)
    entry = Entry(box)
    entry.grid(column=1, row=rownum, sticky=E+W)
    content[label] = entry
    rownum += 1

box.columnconfigure(0, weight=0)    # make expandable
box.columnconfigure(1, weight=1)
Button(text='Submit', command=onSubmit).grid(row=rownum, column=0, columnspan=2)

box.title('getfilegui-2')
box.bind('<Return>', (lambda event: onSubmit()))
mainloop()

This version makes a similar window (Figure 12-3), but adds a button at the bottom

that does the same thing as an Enter key press—it runs the getfile client procedure.

Generally speaking, importing and calling functions (as done here) is faster than

running command lines, especially if done more than once. The getfile script is set up to

work either way—as program or function library.

Figure 12-3. getfilegui-2 in action

Using a reusable form-layout class

If you’re like me, though, writing all the GUI form layout code in those two scripts can

seem a bit tedious, whether you use packing or grids. In fact, it became so tedious to

me that I decided to write a general-purpose form-layout class, shown in

Example 12-20, which handles most of the GUI layout grunt work.

Example 12-20. PP4E\Internet\Sockets\form.py

"""

##################################################################
a reusable form class, used by getfilegui (and others)
##################################################################
"""

from tkinter import *
entrysize = 40

class Form:                                           # add non-modal form box
    def __init__(self, labels, parent=None):          # pass field labels list
        labelsize = max(len(x) for x in labels) + 2
        box = Frame(parent)                           # box has rows, buttons
        box.pack(expand=YES, fill=X)                  # rows has row frames
        rows = Frame(box, bd=2, relief=GROOVE)        # go=button or return key
        rows.pack(side=TOP, expand=YES, fill=X)       # runs onSubmit method
        self.content = {}
        for label in labels:
            row = Frame(rows)
            row.pack(fill=X)
            Label(row, text=label, width=labelsize).pack(side=LEFT)
            entry = Entry(row, width=entrysize)
            entry.pack(side=RIGHT, expand=YES, fill=X)
            self.content[label] = entry
        Button(box, text='Cancel', command=self.onCancel).pack(side=RIGHT)
        Button(box, text='Submit', command=self.onSubmit).pack(side=RIGHT)
        box.master.bind('<Return>', (lambda event: self.onSubmit()))

    def onSubmit(self):                                      # override this
        for key in self.content:                             # user inputs in
            print(key, '\t=>\t', self.content[key].get())    # self.content[k]

    def onCancel(self):                                      # override if need
        Tk().quit()                                          # default is exit

class DynamicForm(Form):
    def __init__(self, labels=None):
        labels = input('Enter field names: ').split()
        Form.__init__(self, labels)

    def onSubmit(self):
        print('Field values...')
        Form.onSubmit(self)
        self.onCancel()

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 1:
        Form(['Name', 'Age', 'Job'])     # precoded fields, stay after submit
    else:
        DynamicForm()                    # input fields, go away after submit
    mainloop()

Compare the approach of this module with that of the form row builder function we

wrote in Chapter 10’s Example 10-9. While that example much reduced the amount

of code required, the module here is a noticeably more complete and automatic

scheme—it builds the entire form given a set of label names, and provides a dictionary

with every field’s entry widget ready to be fetched.

Running this module standalone triggers its self-test code at the bottom. Without

arguments (and when double-clicked in a Windows file explorer), the self-test generates

a form with canned input fields captured in Figure 12-4, and displays the fields’ values

on Enter key presses or Submit button clicks:

C:\...\PP4E\Internet\Sockets> python form.py

Age => 40

Name => Bob

Job => Educator, Entertainer

With a command-line argument, the form class module’s self-test code prompts for an

arbitrary set of field names for the form; fields can be constructed as dynamically as we

like. Figure 12-5 shows the input form constructed in response to the following console

interaction. Field names could be accepted on the command line, too, but the input

built-in function works just as well for simple tests like this. In this mode, the GUI goes

away after the first submit, because DynamicForm.onSubmit says so:

C:\...\PP4E\Internet\Sockets> python form.py -

Enter field names: Name Email Web Locale

Field values...

Locale => Florida

Web => http://learning-python.com

Name => Book

Email => pp4e@learning-python.com

Figure 12-4. Form test, canned fields

Figure 12-5. Form test, dynamic fields

And last but not least, Example 12-21 shows the getfile user interface again, this time

constructed with the reusable form layout class. We need to fill in only the form labels

list and provide an onSubmit callback method of our own. All of the work needed to

construct the form comes “for free,” from the imported and widely reusable Form

superclass.

Example 12-21. PP4E\Internet\Sockets\getfilegui.py

"""
launch getfile client with a reusable GUI form class;
os.chdir to target local dir if input (getfile stores in cwd);
to do: use threads, show download status and getfile prints;
"""

from form import Form
from tkinter import Tk, mainloop
from tkinter.messagebox import showinfo
import getfile, os

class GetfileForm(Form):
    def __init__(self, oneshot=False):
        root = Tk()
        root.title('getfilegui')
        labels = ['Server Name', 'Port Number', 'File Name', 'Local Dir?']
        Form.__init__(self, labels, root)
        self.oneshot = oneshot

    def onSubmit(self):
        Form.onSubmit(self)
        localdir = self.content['Local Dir?'].get()
        portnumber = self.content['Port Number'].get()
        servername = self.content['Server Name'].get()
        filename = self.content['File Name'].get()
        if localdir:
            os.chdir(localdir)
        portnumber = int(portnumber)
        getfile.client(servername, portnumber, filename)
        showinfo('getfilegui', 'Download complete')
        if self.oneshot: Tk().quit()     # else stay in last localdir

if __name__ == '__main__':
    GetfileForm()
    mainloop()

The form layout class imported here can be used by any program that needs to input

form-like data; when used in this script, we get a user interface like that shown in

Figure 12-6 under Windows 7 (and similar on other versions and platforms).

Figure 12-6. getfilegui in action

Pressing this form’s Submit button or the Enter key makes the getfilegui script call

the imported getfile.client client-side function as before. This time, though, we also

first change to the local directory typed into the form so that the fetched file is stored

there (getfile stores in the current working directory, whatever that may be when it is

called). Here are the messages printed in the client’s console, along with a check on the

file transfer; the server is still running above testdir, but the client stores the file

elsewhere after it’s fetched on the socket:

C:\...\Internet\Sockets> getfilegui.py

Local Dir? => C:\users\Mark\temp

File Name => testdir\ora-lp4e.gif

Server Name => localhost

Port Number => 50001

Client got testdir\ora-lp4e.gif at Sun Apr 25 17:22:39 2010

C:\...\Internet\Sockets> fc /B C:\Users\mark\temp\ora-lp4e.gif testdir\ora-lp4e.gif

FC: no differences encountered

As usual, we can use this interface to connect to servers running locally on the same

machine (as done here), or remotely on a different computer. Use a different server

name and file paths if you’re running the server on a remote machine; the magic of

sockets makes this all “just work” in either local or remote modes.

One caveat worth pointing out here: the GUI is essentially dead while the download is

in progress (even screen redraws aren’t handled—try covering and uncovering the

window and you’ll see what I mean). We could make this better by running the

download in a thread, but since we’ll see how to do that in the next chapter when we explore

the FTP protocol, you should consider this problem a preview.
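As a small taste of the idea, and not necessarily the approach the next chapter takes, the sketch below runs a long task in a thread and lets the GUI thread poll a queue for the outcome. The function names run_in_thread and poll_results are made up for this illustration. The widget argument is assumed to be any tkinter widget, since all of them provide the after timer method, and the built-in sum stands in for a slow download call.

```python
import queue, threading

def run_in_thread(work, args, results):
    """run work(*args) in a new thread; post ('ok', value) or ('error', exc)"""
    def runner():
        try:
            results.put(('ok', work(*args)))
        except Exception as exc:
            results.put(('error', exc))
    thread = threading.Thread(target=runner, daemon=True)
    thread.start()
    return thread

def poll_results(widget, results, on_done, delay_ms=100):
    """call from the GUI thread: deliver a result, or check again later"""
    try:
        status, value = results.get_nowait()
    except queue.Empty:
        widget.after(delay_ms, poll_results, widget, results, on_done, delay_ms)
    else:
        on_done(status, value)

# non-GUI check of the thread half: sum stands in for a slow download
results = queue.Queue()
run_in_thread(sum, ([1, 2, 3],), results).join()
print(results.get())
```

In a form like getfilegui's, an onSubmit handler might pass getfile.client and its arguments to run_in_thread, and then call poll_results with the root window and a callback that pops up the completion dialog. Only the polling function touches the GUI, which keeps widget calls in the main thread.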

In closing, a few final notes: first, I should point out that the scripts in this chapter use

tkinter techniques we’ve seen before and won’t go into here in the interest of space; be

sure to see the GUI chapters in this book for implementation hints.

Keep in mind, too, that these interfaces just add a GUI on top of the existing script to

reuse its code; any command-line tool can be easily GUI-ified in this way to make it

more appealing and user friendly. In Chapter 14, for example, we’ll meet a more useful

client-side tkinter user interface for reading and sending email over sockets

(PyMailGUI), which largely just adds a GUI to mail-processing tools. Generally speaking, GUIs

can often be added as almost an afterthought to a program. Although the degree of

user-interface and core logic separation varies per program, keeping the two distinct

makes it easier to focus on each.

And finally, now that I’ve shown you how to build user interfaces on top of this

chapter’s getfile, I should also say that they aren’t really as useful as they might seem. In

particular, getfile clients can talk only to machines that are running a getfile server.

In the next chapter, we’ll discover another way to download files—FTP—which also

runs on sockets but provides a higher-level interface and is available as a standard

service on many machines on the Net. We don’t generally need to start up a custom

server to transfer files over FTP, the way we do with getfile. In fact, the user-interface

scripts in this chapter could be easily changed to fetch the desired file with Python’s

FTP tools, instead of the getfile module. But instead of spilling all the beans here, I’ll

just say, “Read on.”

Using Serial Ports

Sockets, the main subject of this chapter, are the programmer’s interface to network

connections in Python scripts. As we’ve seen, they let us write scripts that converse

with computers arbitrarily located on a network, and they form the backbone of the

Internet and the Web.

If you’re looking for a lower-level way to communicate with devices in general, though,

you may also be interested in the topic of Python’s serial port interfaces. This isn’t quite

related to Internet scripting, but it’s similar enough in spirit and is discussed often

enough on the Net to merit a few words here.

In brief, scripts can use serial port interfaces to engage in low-level communication with

things like mice, modems, and a wide variety of serial devices and hardware. Serial port

interfaces are also used to communicate with devices connected over infrared ports

(e.g., hand-held computers and remote modems). Such interfaces let scripts tap into

raw data streams and implement device protocols of their own. Other Python tools

such as the ctypes and struct modules may provide additional tools for creating and

extracting the packed binary data these ports transfer.
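To give a flavor of the packed binary data mentioned above, the sketch below uses the struct module to build and parse a small device message. The frame layout (a one-byte command, a two-byte little-endian length, and then the payload) is entirely hypothetical. It is not tied to any real device or serial port package, and the function names are invented for this illustration.

```python
import struct

HEADER = '<BH'                          # hypothetical: command byte + uint16 length

def pack_frame(command, payload):
    """prefix payload bytes with a fixed-size binary header"""
    return struct.pack(HEADER, command, len(payload)) + payload

def unpack_frame(frame):
    """split a frame back into its command code and payload bytes"""
    size = struct.calcsize(HEADER)      # 3 bytes: '<' means no alignment padding
    command, length = struct.unpack(HEADER, frame[:size])
    return command, frame[size:size + length]

frame = pack_frame(0x10, b'spam')
print(frame)
print(unpack_frame(frame))
```

The same pack and unpack calls apply whether the bytes travel over a serial port, a socket, or a file; only the transport differs.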

At this writing, there are a variety of ways to send and receive data over serial ports in

Python scripts. Notable among these options is an open source extension package

known as pySerial, which allows Python scripts to control serial ports on both Windows

and Linux, as well as BSD Unix, Jython (for Java), and IronPython (for .NET and Mono).

Unfortunately, there is not enough space to cover this or any other serial port option

in any sort of detail in this text. As always, see your favorite web search engine for up-

to-date details on this front.


CHAPTER 13

Client-Side Scripting

“Socket to Me!”

The preceding chapter introduced Internet fundamentals and explored sockets—the

underlying communications mechanism over which bytes flow on the Net. In this

chapter, we climb the encapsulation hierarchy one level and shift our focus to Python

tools that support the client-side interfaces of common Internet protocols.

We talked about the Internet’s higher-level protocols in the abstract at the start of the

preceding chapter, and you should probably review that material if you skipped over

it the first time around. In short, protocols define the structure of the conversations

that take place to accomplish most of the Internet tasks we’re all familiar with—reading

email, transferring files by FTP, fetching web pages, and so on.

At the most basic level, all of these protocol dialogs happen over sockets using fixed

and standard message structures and ports, so in some sense this chapter builds upon

the last. But as we’ll see, Python’s protocol modules hide most of the underlying

details—scripts generally need to deal only with simple objects and methods, and Py-

thon automates the socket and messaging logic required by the protocol.

In this chapter, we’ll concentrate on the FTP and email protocol modules in Python,

and we’ll peek at a few others along the way (NNTP news, HTTP web pages, and so

on). Because it is so prevalent, we will especially focus on email in much of this chapter,

as well as in the two to follow—we’ll use tools and techniques introduced here in the

larger PyMailGUI and PyMailCGI client and server-side programs of Chapters

14 and 16.

All of the tools employed in examples here are in the standard Python library and come

with the Python system. All of the examples here are also designed to run on the client

side of a network connection—these scripts connect to an already running server to

request interaction and can be run from a basic PC or other client device (they require

only a server to converse with). And as usual, all the code here is also designed to teach

us something about Python programming in general—we’ll refactor FTP examples and

package email code to show object-oriented programming (OOP) in action.


In the next chapter, we’ll look at a complete client-side program example before moving

on to explore scripts designed to be run on the server side instead. Python programs

can also produce pages on a web server, and there is support in the Python world for

implementing the server side of things like HTTP, email, and FTP. For now, let’s focus

on the client.*

FTP: Transferring Files over the Net

As we saw in the preceding chapter, sockets see plenty of action on the Net. For in-

stance, the last chapter’s getfile example allowed us to transfer entire files between

machines. In practice, though, higher-level protocols are behind much of what happens

on the Net. Protocols run on top of sockets, but they hide much of the complexity of

the network scripting examples of the prior chapter.

FTP—the File Transfer Protocol—is one of the more commonly used Internet proto-

cols. It defines a higher-level conversation model that is based on exchanging command

strings and file contents over sockets. By using FTP, we can accomplish the same task

as the prior chapter’s getfile script, but the interface is simpler, standard and more

general—FTP lets us ask for files from any server machine that supports FTP, without

requiring that it run our custom getfile script. FTP also supports more advanced op-

erations such as uploading files to the server, getting remote directory listings, and

more.

Really, FTP runs on top of two sockets: one for passing control commands between

client and server (port 21), and another for transferring bytes. By using a two-socket

model, FTP avoids the possibility of deadlocks (i.e., transfers on the data socket do not

block dialogs on the control socket). Ultimately, though, Python’s ftplib support

module allows us to upload and download files at a remote server machine by FTP,

without dealing in raw socket calls or FTP protocol details.
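To make the two-socket model a bit more concrete: in passive mode, the server uses the control socket to tell the client where to open the data socket, with a reply of the form 227 ... (h1,h2,h3,h4,p1,p2). The sketch below decodes such a reply by hand. The function name and the sample reply text are made up for illustration; ftplib takes care of this step internally, so scripts never need to do it themselves.

```python
import re

def decode_pasv_reply(reply):
    """Return (host, port) from a '227' reply line (hypothetical helper)."""
    if not reply.startswith('227'):
        raise ValueError('not a PASV reply: %r' % reply)
    match = re.search(r'(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)', reply)
    if match is None:
        raise ValueError('no address in reply: %r' % reply)
    numbers = [int(n) for n in match.groups()]
    host = '.'.join(str(n) for n in numbers[:4])     # four address bytes
    port = numbers[4] * 256 + numbers[5]             # high byte, low byte
    return host, port

sample = '227 Entering Passive Mode (192,168,0,10,195,80)'   # invented reply
print(decode_pasv_reply(sample))
```

The client would then connect a second socket to the decoded host and port for the file's bytes, while commands and replies keep flowing over the first.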

* There is also support in the Python world for other technologies that some might classify as “client-side
scripting,” too, such as Jython/Java applets; XML-RPC and SOAP web services; and Rich Internet Application
tools like Flex, Silverlight, pyjamas, and AJAX. These were all introduced early in Chapter 12. Such tools are
generally bound up with the notion of web-based interactions: they either extend the functionality of a web
browser running on a client machine, or simplify web server access in clients. We’ll study browser-based
techniques in Chapters 15 and 16; here, client-side scripting means the client side of common Internet
protocols such as FTP and email, independent of the Web or web browsers. At the bottom, web browsers
are really just desktop GUI applications that make use of client-side protocols, including those we’ll study
here, such as HTTP and FTP. See Chapter 12 as well as the end of this chapter for more on other client-side
techniques.

Transferring Files with ftplib

Because the Python FTP interface is so easy to use, let’s jump right into a realistic
example. The script in Example 13-1 automatically fetches (a.k.a. “downloads”) and
opens a remote file with Python. More specifically, this Python script does the
following:

1. Downloads an image file (by default) from a remote FTP site

2. Opens the downloaded file with a utility we wrote in Example 6-23, in Chapter 6

The download portion will run on any machine with Python and an Internet connec-

tion, though you’ll probably want to change the script’s settings so it accesses a server

and file of your own. The opening part works if your playfile.py supports your platform;

see Chapter 6 for details, and change as needed.

Example 13-1. PP4E\Internet\Ftp\getone.py

#!/usr/local/bin/python
"""
A Python script to download and play a media file by FTP. Uses ftplib, the ftp
protocol handler which uses sockets. Ftp runs on 2 sockets (one for data, one
for control--on ports 20 and 21) and imposes message text formats, but Python's
ftplib module hides most of this protocol's details. Change for your site/file.
"""

import os, sys
from getpass import getpass                   # hidden password input
from ftplib import FTP                        # socket-based FTP tools

nonpassive = False                            # force active mode FTP for server?
filename   = 'monkeys.jpg'                    # file to be downloaded
dirname    = '.'                              # remote directory to fetch from
sitename   = 'ftp.rmi.net'                    # FTP site to contact
userinfo   = ('lutz', getpass('Pswd?'))       # use () for anonymous
if len(sys.argv) > 1: filename = sys.argv[1]  # filename on command line?

print('Connecting...')
connection = FTP(sitename)                    # connect to FTP site
connection.login(*userinfo)                   # default is anonymous login
connection.cwd(dirname)                       # change to the remote directory
if nonpassive:                                # force active FTP if server requires
    connection.set_pasv(False)

print('Downloading...')
localfile = open(filename, 'wb')              # local file to store download
connection.retrbinary('RETR ' + filename, localfile.write, 1024)   # xfer 1k at a time
connection.quit()
localfile.close()

if input('Open file?') in ['Y', 'y']:
    from PP4E.System.Media.playfile import playfile
    playfile(filename)

Most of the FTP protocol details are encapsulated by the Python ftplib module im-

ported here. This script uses some of the simplest interfaces in ftplib (we’ll see others

later in this chapter), but they are representative of the module in general.


To open a connection to a remote (or local) FTP server, create an instance of the

ftplib.FTP object, passing in the string name (domain or IP style) of the machine you

wish to connect to:

connection = FTP(sitename) # connect to ftp site

Assuming this call doesn’t throw an exception, the resulting FTP object exports meth-

ods that correspond to the usual FTP operations. In fact, Python scripts act much like

typical FTP client programs—just replace commands you would normally type or select

with method calls:

connection.login(*userinfo) # default is anonymous login

connection.cwd(dirname) # change to the remote directory

Once connected, we log in and change to the remote directory from which we want to

fetch a file. The login method allows us to pass in a username and password as addi-

tional optional arguments to specify an account login; by default, it performs anony-

mous FTP. Notice the use of the nonpassive flag in this script:

if nonpassive: # force active FTP if server requires
    connection.set_pasv(False)

If this flag is set to True, the script will transfer the file in active FTP mode rather than

the default passive mode. We’ll finesse the details of the difference here (it has to do

with which end of the dialog chooses port numbers for the transfer), but if you have

trouble doing transfers with any of the FTP scripts in this chapter, try using active mode

as a first step. In Python 2.1 and later, passive FTP mode is on by default. Now, open

a local file to receive the file’s content, and fetch the file:

localfile = open(filename, 'wb')

connection.retrbinary('RETR ' + filename, localfile.write, 1024)

Once we’re in the target remote directory, we simply call the retrbinary method to

download the target server file in binary mode. The retrbinary call will take a while to

complete, since it must download a big file. It gets three arguments:

• An FTP command string; here, the string RETR filename, which is the standard

format for FTP retrievals.

• A function or method to which Python passes each chunk of the downloaded file’s

bytes; here, the write method of a newly created and opened local file.

• A size for those chunks of bytes; here, 1,024 bytes are downloaded at a time, but

the default is reasonable if this argument is omitted.

Because this script opens a local file (localfile) with the same name as the remote

file being fetched, and passes its write method to the FTP retrieval method, the remote

file’s contents will automatically appear in a local, client-side file after the download is

finished.
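The second argument does not have to be a file's write method; any callable that accepts a block of bytes will do. The sketch below shows the idea without a network connection. Here, feed_in_blocks is a made-up stand-in that only imitates the way a retrieval hands over one block per call, and ProgressCounter is a hypothetical callback that saves blocks and counts bytes.

```python
def feed_in_blocks(data, callback, blocksize=1024):
    """Stand-in for a binary retrieval: call callback once per block."""
    for start in range(0, len(data), blocksize):
        callback(data[start:start + blocksize])

class ProgressCounter:
    """A callable that saves each block and tracks how many bytes arrived."""
    def __init__(self):
        self.blocks = []
        self.total = 0

    def __call__(self, block):
        self.blocks.append(block)
        self.total += len(block)

payload = bytes(range(256)) * 10          # 2,560 bytes of pretend file content
counter = ProgressCounter()
feed_in_blocks(payload, counter, 1024)
print(len(counter.blocks), 'blocks,', counter.total, 'bytes')
```

With a live connection, the same counter object could be passed to retrbinary in place of localfile.write, for instance to drive a progress display.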

Observe how this file is opened in wb binary output mode. If this script is run on Windows,
we want to avoid automatically expanding any \n bytes into \r\n byte sequences;
as we saw in Chapter 4, this happens automatically on Windows when writing files
opened in w text mode. We also want to avoid Unicode issues in Python 3.X: as we
also saw in Chapter 4, strings are encoded when written in text mode and this isn’t
appropriate for binary data such as images. A text-mode file would also not allow for
the bytes strings passed to write by the FTP library’s retrbinary in any event, so wb is
effectively required here (more on output file modes later).
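Both effects are easy to see in a quick experiment. The following sketch is only a local demonstration: the byte string stands in for a downloaded block, the file name is arbitrary, and the newline='\r\n' argument is used to imitate Windows text-mode output on any platform.

```python
import os, tempfile

chunk = b'\x89PNG\n\x00\xff'                  # pretend block from a binary download
path = os.path.join(tempfile.mkdtemp(), 'demo.bin')

# 1) bytes cannot be written to a text-mode file at all
textfile = open(path, 'w')
try:
    textfile.write(chunk)
    error = None
except TypeError as exc:
    error = exc
finally:
    textfile.close()
print('text-mode write rejected:', error is not None)

# 2) text mode with Windows-style newline mapping changes the data
with open(path, 'w', encoding='latin-1', newline='\r\n') as f:
    f.write(chunk.decode('latin-1'))
with open(path, 'rb') as f:
    mangled = f.read()

# 3) binary mode stores the bytes verbatim
with open(path, 'wb') as f:
    f.write(chunk)
with open(path, 'rb') as f:
    intact = f.read()

print(mangled)        # an extra \r appears before the \n
print(intact)         # identical to the original block
```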

Finally, we call the FTP quit method to break the connection with the server and man-

ually close the local file to force it to be complete before it is further processed (it’s not

impossible that parts of the file are still held in buffers before the close call):

connection.quit()

localfile.close()

And that’s all there is to it—all the FTP, socket, and networking details are hidden

behind the ftplib interface module. Here is this script in action on a Windows 7 ma-

chine; after the download, the image file pops up in a Windows picture viewer on my

laptop, as captured in Figure 13-1. Change the server and file assignments in this script

to test on your own, and be sure your PYTHONPATH environment variable includes the

PP4E root’s container, as we’re importing across directories on the examples tree here:

C:\...\PP4E\Internet\Ftp> python getone.py

Pswd?

Connecting...

Downloading...

Open file?y

Notice how the standard Python getpass.getpass is used to ask for an FTP password.

Like the input built-in function, this call prompts for and reads a line of text from the

console user; unlike input, getpass does not echo typed characters on the screen at all

(see the moreplus stream redirection example of Chapter 3 for related tools). This is

handy for protecting things like passwords from potentially prying eyes. Be careful,

though—after issuing a warning, the IDLE GUI echoes the password anyhow!

The main thing to notice is that this otherwise typical Python script fetches information

from an arbitrarily remote FTP site and machine. Given an Internet link, any informa-

tion published by an FTP server on the Net can be fetched by and incorporated into

Python scripts using interfaces such as these.

Using urllib to Download Files

In fact, FTP is just one way to transfer information across the Net, and there are more

general tools in the Python library to accomplish the prior script’s download. Perhaps

the most straightforward is the Python urllib.request module: given an Internet address

string (a URL, or Uniform Resource Locator), this module opens a connection

to the specified server and returns a file-like object ready to be read with normal file

object method calls (e.g., read, readline).


We can use such a higher-level interface to download anything with an address on the

Web—files published by FTP sites (using URLs that start with ftp://); web pages and

output of scripts that live on remote servers (using http:// URLs); and even local files

(using file:// URLs). For instance, the script in Example 13-2 does the same as the one

in Example 13-1, but it uses the general urllib.request module to fetch the image

file, instead of the protocol-specific ftplib.

Figure 13-1. Image file downloaded by FTP and opened locally

Example 13-2. PP4E\Internet\Ftp\getone-urllib.py

#!/usr/local/bin/python
"""
A Python script to download a file by FTP by its URL string; use higher-level
urllib instead of ftplib to fetch file; urllib supports FTP, HTTP, client-side
HTTPS, and local files, and handles proxies, redirects, cookies, and more;
urllib also allows downloads of html pages, images, text, etc.; see also
Python html/xml parsers for web pages fetched by urllib in Chapter 19;
"""

import os, getpass
from urllib.request import urlopen       # socket-based web tools
filename = 'monkeys.jpg'                 # remote/local filename
password = getpass.getpass('Pswd?')

remoteaddr = 'ftp://lutz:%s@ftp.rmi.net/%s;type=i' % (password, filename)
print('Downloading', remoteaddr)

# this works too:
# urllib.request.urlretrieve(remoteaddr, filename)

remotefile = urlopen(remoteaddr)         # returns input file-like object
localfile  = open(filename, 'wb')        # where to store data locally
localfile.write(remotefile.read())
localfile.close()
remotefile.close()

Note how we use a binary mode output file again; urllib fetches return byte strings,

even for HTTP web pages. Don’t sweat the details of the URL string used here; it is

fairly complex, and we’ll explain its structure and that of URLs in general in Chap-

ter 15. We’ll also use urllib again in this and later chapters to fetch web pages, format

generated URL strings, and get the output of remote scripts on the Web.

Technically speaking, urllib.request supports a variety of Internet protocols (HTTP,

FTP, and local files). Unlike ftplib, urllib.request is generally used for reading remote

objects, not for writing or uploading them (though the HTTP and FTP protocols sup-

port file uploads too). As with ftplib, retrievals must generally be run in threads if

blocking is a concern. But the basic interface shown in this script is straightforward.

The call:

remotefile = urllib.request.urlopen(remoteaddr) # returns input file-like object

contacts the server named in the remoteaddr URL string and returns a file-like object

connected to its download stream (here, an FTP-based socket). Calling this file’s

read method pulls down the file’s contents, which are written to a local client-side file.

An even simpler interface:

urllib.request.urlretrieve(remoteaddr, filename)

also does the work of opening a local file and writing the downloaded bytes into it—

things we do manually in the script as coded. This comes in handy if we want to down-

load a file, but it is less useful if we want to process its data immediately.
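If you don't have an FTP account handy, both calls can be tried out with a file:// URL, since urllib treats local files like any other address. This is just a sketch: the temporary directory and the file names below are invented for the demo, and a local file plays the part of the remote one.

```python
import os, tempfile
from pathlib import Path
from urllib.request import urlopen, urlretrieve

folder = tempfile.mkdtemp()
source = Path(folder) / 'source.bin'         # plays the role of the remote file
source.write_bytes(b'spam' * 1000)
url = source.as_uri()                        # a file:// URL for the local file

# urlopen: we read and store the bytes ourselves
copy1 = os.path.join(folder, 'copy1.bin')
remotefile = urlopen(url)
with open(copy1, 'wb') as localfile:
    localfile.write(remotefile.read())
remotefile.close()

# urlretrieve: the library opens and writes the local file for us
copy2 = os.path.join(folder, 'copy2.bin')
urlretrieve(url, copy2)

print(os.path.getsize(copy1), os.path.getsize(copy2))
```

Swapping in an ftp:// or http:// address changes nothing else in the code, which is the main appeal of the URL-based interface.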

Either way, the end result is the same: the desired server file shows up on the client

machine. The output is similar to the original version, but we don’t try to automatically

open this time (I’ve changed the password in the URL here to protect the innocent):

C:\...\PP4E\Internet\Ftp> getone-urllib.py

Pswd?

Downloading ftp://lutz:xxxxxx@ftp.rmi.net/monkeys.jpg;type=i

C:\...\PP4E\Internet\Ftp> fc monkeys.jpg test\monkeys.jpg

FC: no differences encountered

C:\...\PP4E\Internet\Ftp> start monkeys.jpg


For more urllib download examples, see the section on HTTP later in this chapter,

and the server-side examples in Chapter 15. As we’ll see in Chapter 15, in bigger terms,

tools like the urllib.request urlopen function allow scripts to both download remote

files and invoke programs that are located on a remote server machine, and so serve

as useful tools for testing and using web sites in Python scripts. In Chapter 15, we’ll

also see that urllib.parse includes tools for formatting (escaping) URL strings for safe

transmission.
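As a brief preview of those escaping tools, here is a sketch using three urllib.parse functions. The password is invented, and the URL follows the pattern used in Example 13-2; escaping matters there because characters such as @, :, and / in a password would otherwise be read as part of the URL's own structure.

```python
from urllib.parse import quote, quote_plus, urlencode

print(quote('my file.txt'))                  # my%20file.txt
print(quote_plus('spam & eggs'))             # spam+%26+eggs
print(urlencode({'name': 'Bob Smith', 'lang': 'C++'}))

password = 'p@ss:word/1'                     # hypothetical password
safe_password = quote(password, safe='')     # escape every reserved character
remoteaddr = 'ftp://lutz:%s@ftp.rmi.net/%s;type=i' % (safe_password, 'monkeys.jpg')
print(remoteaddr)
```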

FTP get and put Utilities

When I present the ftplib interfaces in Python classes, students often ask why pro-

grammers need to supply the RETR string in the retrieval method. It’s a good

question—the RETR string is the name of the download command in the FTP protocol,

but ftplib is supposed to encapsulate that protocol. As we’ll see in a moment, we have

to supply an arguably odd STOR string for uploads as well. It’s boilerplate code that

you accept on faith once you see it, but that begs the question. You could propose a

patch to ftplib, but that’s not really a good answer for beginning Python students, and

it may break existing code (the interface is as it is for a reason).

Perhaps a better answer is that Python makes it easy to extend the standard library

modules with higher-level interfaces of our own—with just a few lines of reusable code,

we can make the FTP interface look any way we want in Python. For instance, we could,

once and for all, write utility modules that wrap the ftplib interfaces to hide the RETR

string. If we place these utility modules in a directory on PYTHONPATH, they become just

as accessible as ftplib itself, automatically reusable in any Python script we write in

the future. Besides removing the RETR string requirement, a wrapper module could

also make assumptions that simplify FTP operations into single function calls.

For instance, given a module that encapsulates and simplifies ftplib, our Python fetch-

and-play script could be further reduced to the script shown in Example 13-3—essen-

tially just two function calls plus a password prompt, but with a net effect exactly like

Example 13-1 when run.

Example 13-3. PP4E\Internet\Ftp\getone-modular.py

#!/usr/local/bin/python
"""
A Python script to download and play a media file by FTP.
Uses getfile.py, a utility module which encapsulates FTP step.
"""

import getfile
from getpass import getpass
filename = 'monkeys.jpg'

# fetch with utility
getfile.getfile(file=filename,
                site='ftp.rmi.net',
                dir ='.',
                user=('lutz', getpass('Pswd?')),
                refetch=True)

# rest is the same
if input('Open file?') in ['Y', 'y']:
    from PP4E.System.Media.playfile import playfile
    playfile(filename)

Besides having a much smaller line count, the meat of this script has been split off into

a file for reuse elsewhere. If you ever need to download a file again, simply import an

existing function instead of copying code with cut-and-paste editing. Changes in down-

load operations would need to be made in only one file, not everywhere we’ve copied

boilerplate code; getfile.getfile could even be changed to use urllib rather than

ftplib without affecting any of its clients. It’s good engineering.

Download utility

So just how would we go about writing such an FTP interface wrapper (he asks, rhet-

orically)? Given the ftplib library module, wrapping downloads of a particular file in

a particular directory is straightforward. Connected FTP objects support two download

methods:

retrbinary

This method downloads the requested file in binary mode, sending its bytes in

chunks to a supplied function, without line-feed mapping. Typically, the supplied

function is a write method of an open local file object, such that the bytes are placed

in the local file on the client.

retrlines

This method downloads the requested file in ASCII text mode, sending each line

of text to a supplied function with all end-of-line characters stripped. Typically,

the supplied function adds a \n newline (mapped appropriately for the client ma-

chine), and writes the line to a local file.

We will meet the retrlines method in a later example; the getfile utility module in

Example 13-4 always transfers in binary mode with retrbinary. That is, files are down-

loaded exactly as they were on the server, byte for byte, with the server’s line-feed

conventions in text files (you may need to convert line feeds after downloads if they

look odd in your text editor—see your editor or system shell commands for pointers,

or write a Python script that opens and writes the text as needed).
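Such a conversion script can be quite short. The sketch below is one possible approach, not part of the book's examples tree: the function names are made up, and it works on bytes so that Python itself never remaps anything along the way.

```python
def convert_eol(data, eol=b'\n'):
    """Return a copy of bytes data with each CRLF, CR, or LF replaced by eol."""
    lines = data.replace(b'\r\n', b'\n').replace(b'\r', b'\n').split(b'\n')
    return eol.join(lines)

def convert_file(path, eol=b'\n'):
    """Rewrite a file in place, using binary modes for both read and write."""
    with open(path, 'rb') as f:
        data = f.read()
    with open(path, 'wb') as f:
        f.write(convert_eol(data, eol))

print(convert_eol(b'one\r\ntwo\rthree\n'))           # all Unix-style
print(convert_eol(b'one\ntwo\n', eol=b'\r\n'))       # all Windows-style
```

Run this only on files known to be text; applied to an image or audio file, it would corrupt any bytes that merely look like line ends.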

Example 13-4. PP4E\Internet\Ftp\getfile.py

#!/usr/local/bin/python
"""
Fetch an arbitrary file by FTP. Anonymous FTP unless you pass a
user=(name, pswd) tuple. Self-test FTPs a test file and site.
"""

from ftplib import FTP          # socket-based FTP tools
from os.path import exists      # file existence test

def getfile(file, site, dir, user=(), *, verbose=True, refetch=False):
    """
    fetch a file by ftp from a site/directory
    anonymous or real login, binary transfer
    """
    if exists(file) and not refetch:
        if verbose: print(file, 'already fetched')
    else:
        if verbose: print('Downloading', file)
        local = open(file, 'wb')                # local file of same name
        try:
            remote = FTP(site)                  # connect to FTP site
            remote.login(*user)                 # anonymous=() or (name, pswd)
            remote.cwd(dir)
            remote.retrbinary('RETR ' + file, local.write, 1024)
            remote.quit()
        finally:
            local.close()                       # close file no matter what
        if verbose: print('Download done.')     # caller handles exceptions

if __name__ == '__main__':
    from getpass import getpass
    file = 'monkeys.jpg'
    dir  = '.'
    site = 'ftp.rmi.net'
    user = ('lutz', getpass('Pswd?'))
    getfile(file, site, dir, user)

This module is mostly just a repackaging of the FTP code we used to fetch the image

file earlier, to make it simpler and reusable. Because it is a callable function, the exported

getfile.getfile here tries to be as robust and generally useful as possible, but even a

function this small implies some design decisions. Here are a few usage notes:

FTP mode

The getfile function in this script runs in anonymous FTP mode by default, but

a two-item tuple containing a username and password string may be passed to the

user argument in order to log in to the remote server in nonanonymous mode. To

use anonymous FTP, either don’t pass the user argument or pass it an empty tuple,

(). The FTP object login method allows two optional arguments to denote a user-

name and password, and the function(*args) call syntax in Example 13-4 sends

it whatever argument tuple you pass to user as individual arguments.

Processing modes

If passed, the last two arguments (verbose, refetch) allow us to turn off status

messages printed to the stdout stream (perhaps undesirable in a GUI context) and

to force downloads to happen even if the file already exists locally (the download

overwrites the existing local file).


These two arguments are coded as Python 3.X default keyword-only arguments, so

if used they must be passed by name, not position. The user argument instead can

be passed either way, if it is passed at all. Keyword-only arguments here prevent

passed verbose or refetch values from being incorrectly matched against the user

argument if the user value is actually omitted in a call.

Exception protocol

The caller is expected to handle exceptions; this function wraps downloads in a

try/finally statement to guarantee that the local output file is closed, but it lets

exceptions propagate. If used in a GUI or run from a thread, for instance, excep-

tions may require special handling unknown in this file.

Self-test

If run standalone, this file downloads an image file again from my website as a self-

test (configure for your server and file as desired), but the function will normally

be passed FTP filenames, site names, and directory names as well.

File mode

As in earlier examples, this script is careful to open the local output file in wb binary

mode to suppress end-line mapping and conform to Python 3.X’s Unicode string

model. As we learned in Chapter 4, it’s not impossible that true binary datafiles

may have bytes whose value is equal to a \n line-feed character; opening in w text

mode instead would make these bytes automatically expand to a \r\n two-byte

sequence when written locally on Windows. This is only an issue when run on

Windows; mode w doesn’t change end-lines elsewhere.

As we also learned in Chapter 4, though, binary mode is required to suppress the

automatic Unicode translations performed for text in Python 3.X. Without binary

mode, Python would attempt to encode fetched data when written per a default

or passed Unicode encoding scheme, which might fail for some types of fetched

text and would normally fail for truly binary data such as images and audio.

Because retrbinary writes bytes strings in 3.X, we really cannot open the output

file in text mode anyhow, or write will raise exceptions. Recall that in 3.X text-

mode files require str strings, and binary mode files expect bytes. Since

retrbinary writes bytes and retrlines writes str, they implicitly require binary

and text-mode output files, respectively. This constraint is irrespective of end-line

or Unicode issues, but it effectively accomplishes those goals as well.

As we’ll see in later examples, text-mode retrievals have additional encoding re-

quirements; in fact, ftplib will turn out to be a good example of the impacts of

Python 3.X’s Unicode string model on real-world code. By always using binary

mode in the script here, we sidestep the issue altogether.

Directory model

This function currently uses the same filename to identify both the remote file and

the local file where the download should be stored. As such, it should be run in

the directory where you want the file to show up; use os.chdir to move to direc-

tories if needed. (We could instead assume filename is the local file’s name, and


strip the local directory with os.path.split to get the remote name, or accept two

distinct filename arguments—local and remote.)
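To see the argument-passing rules from the usage notes in action without contacting a server, consider the following sketch. Both functions are stand-ins invented for this demo: fake_login only mimics the idea of optional username and password arguments, and fetch copies getfile's header but merely reports how its arguments were matched.

```python
def fake_login(user='anonymous', passwd=''):
    """Simplified stand-in for a login call with two optional arguments."""
    return (user, passwd)

def fetch(file, site, dir, user=(), *, verbose=True, refetch=False):
    """Same header as getfile; returns how the arguments were matched."""
    return dict(login=fake_login(*user), verbose=verbose, refetch=refetch)

print(fetch('monkeys.jpg', 'ftp.rmi.net', '.'))                       # anonymous
print(fetch('monkeys.jpg', 'ftp.rmi.net', '.', ('lutz', 'secret')))   # user by position
print(fetch('monkeys.jpg', 'ftp.rmi.net', '.', refetch=True))         # option by name

try:
    fetch('monkeys.jpg', 'ftp.rmi.net', '.', (), False, True)         # options by position
except TypeError as exc:
    print('rejected:', exc)
```

An empty tuple unpacks to no arguments at all, so the login defaults apply; and because of the bare * in the header, the last call fails instead of silently binding False and True to the wrong names.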

Also notice that, despite its name, this module is very different from the getfile.py script

we studied at the end of the sockets material in the preceding chapter. The socket-based

getfile implemented custom client and server-side logic to download a server file to a

client machine over raw sockets.

The new getfile here is a client-side tool only. Instead of raw sockets, it uses the

standard FTP protocol to request a file from a server; all socket-level details are hidden

in the simpler ftplib module’s implementation of the FTP client protocol. Further-

more, the server here is a perpetually running program on the server machine, which

listens for and responds to FTP requests on a socket, on the dedicated FTP port (number

21). The net functional effect is that this script requires an FTP server to be running on

the machine where the desired file lives, but such a server is much more likely to be

available.

Upload utility

While we’re at it, let’s write a script to upload a single file by FTP to a remote machine.

The upload interfaces in the FTP module are symmetric with the download interfaces.

Given a connected FTP object, its:

• storbinary method can be used to upload bytes from an open local file object

• storlines method can be used to upload text in ASCII mode from an open local

file object

Unlike the download interfaces, both of these methods are passed a file object as a

whole, not a file object method (or other function). We will meet the storlines method

in a later example. The utility module in Example 13-5 uses storbinary such that the

file whose name is passed in is always uploaded verbatim—in binary mode, without

Unicode encodings or line-feed translations for the target machine’s conventions. If

this script uploads a text file, it will arrive exactly as stored on the machine it came

from, with client line-feed markers and existing Unicode encoding.

Example 13-5. PP4E\Internet\Ftp\putfile.py

#!/usr/local/bin/python
"""
Store an arbitrary file by FTP in binary mode. Uses anonymous
ftp unless you pass in a user=(name, pswd) tuple of arguments.
"""

import ftplib                    # socket-based FTP tools

def putfile(file, site, dir, user=(), *, verbose=True):
    """
    store a file by ftp to a site/directory
    anonymous or real login, binary transfer
    """
    if verbose: print('Uploading', file)
    local  = open(file, 'rb')               # local file of same name
    remote = ftplib.FTP(site)               # connect to FTP site
    remote.login(*user)                     # anonymous or real login
    remote.cwd(dir)
    remote.storbinary('STOR ' + file, local, 1024)
    remote.quit()
    local.close()
    if verbose: print('Upload done.')

if __name__ == '__main__':
    site = 'ftp.rmi.net'
    dir  = '.'
    import sys, getpass
    pswd = getpass.getpass(site + ' pswd?')                # filename on cmdline
    putfile(sys.argv[1], site, dir, user=('lutz', pswd))   # nonanonymous login

Notice that for portability, the local file is opened in rb binary input mode this time to

suppress automatic line-feed character conversions. If this is binary information, we

don’t want any bytes that happen to have the value of the \r carriage-return character

to mysteriously go away during the transfer when run on a Windows client. We also

want to suppress Unicode encodings for nontext files, and we want reads to produce

the bytes strings expected by the storbinary upload operation (more on input file

modes later).
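The input side of this story can be checked locally, too. In the sketch below (file name and content are arbitrary), the same file is read in both modes; in Python 3.X, text-mode reads map \r\n pairs to \n by default and return str, while binary-mode reads return the stored bytes untouched.

```python
import os, tempfile

data = b'AB\r\nCD\r\n'                 # bytes that happen to contain \r\n pairs
path = os.path.join(tempfile.mkdtemp(), 'demo.bin')
with open(path, 'wb') as f:
    f.write(data)

with open(path, 'rb') as f:                        # binary mode: bytes, unchanged
    as_bytes = f.read()
with open(path, 'r', encoding='latin-1') as f:     # text mode: str, newlines mapped
    as_text = f.read()

print(as_bytes)
print(repr(as_text))                   # the \r characters are gone
```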

This script uploads a file you name on the command line as a self-test, but you will

normally pass in real remote filename, site name, and directory name strings. Also like

the download utility, you may pass a (username, password) tuple to the user argument

to trigger nonanonymous FTP mode (anonymous FTP is the default).
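One thing the upload utility does not do is close its file if a step fails along the way. As a sketch of an alternative, and assuming Python 3.2 or later (where FTP objects can be used in with statements), the variant below lets context managers handle the cleanup. The name putfile_safe and the ftpclass parameter are inventions for this example; the latter is only a hook for substituting a stand-in class when testing without a server.

```python
import ftplib

def putfile_safe(file, site, dir, user=(), *, verbose=True, ftpclass=ftplib.FTP):
    """
    Variant of putfile: the with statements close the local file and end
    the FTP session even if a step fails; exceptions still propagate.
    ftpclass is a hypothetical hook for testing with a stand-in class.
    """
    if verbose: print('Uploading', file)
    with open(file, 'rb') as local:             # closed on exit, error or not
        with ftpclass(site) as remote:          # session ended on exit
            remote.login(*user)                 # anonymous or real login
            remote.cwd(dir)
            remote.storbinary('STOR ' + file, local, 1024)
    if verbose: print('Upload done.')
```

As with the download utility, the caller is still expected to catch any exceptions; the with statements only guarantee that the cleanup steps run first.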

Playing the Monty Python theme song

It’s time for a bit of fun. To test, let’s use these scripts to transfer a copy of the Monty

Python theme song audio file I have at my website. First, let’s write a module that

downloads and plays the sample file, as shown in Example 13-6.

Example 13-6. PP4E\Internet\Ftp\sousa.py

#!/usr/local/bin/python

"""

Usage: sousa.py. Fetch and play the Monty Python theme song.

This will not work on your system as is: it requires a machine with Internet access

and an FTP server account you can access, and uses audio filters on Unix and your

.au player on Windows. Configure this and playfile.py as needed for your platform.

"""

from getpass import getpass

from PP4E.Internet.Ftp.getfile import getfile

from PP4E.System.Media.playfile import playfile

file = 'sousa.au' # default file coordinates

site = 'ftp.rmi.net' # Monty Python theme song

dir = '.'

user = ('lutz', getpass('Pswd?'))

getfile(file, site, dir, user) # fetch audio file by FTP

playfile(file) # send it to audio player

# import os

# os.system('getone.py sousa.au') # equivalent command line

There’s not much to this script, because it really just combines two tools we’ve already

coded. We’re reusing Example 13-4’s getfile to download, and Chapter 6’s playfile

module (Example 6-23) to play the audio sample after it is downloaded (turn back

to that example for more details on the player part of the task). Also notice the last two

lines in this file—we can achieve the same effect by passing in the audio filename as a

command-line argument to our original script, but it’s less direct.

As is, this script assumes my FTP server account; configure as desired (alas, this file

used to be at the ftp.python.org anonymous FTP site, but that site went dark for security

reasons between editions of this book). Once configured, this script will run on any

machine with Python, an Internet link, and a recognizable audio player; it works on

my Windows laptop with a broadband Internet connection, and it plays the music clip

in Windows Media Player (and if I could insert an audio file hyperlink here to show

what it sounds like, I would…):

C:\...\PP4E\Internet\Ftp> sousa.py

Pswd?

Downloading sousa.au

Download done.

C:\...\PP4E\Internet\Ftp> sousa.py

Pswd?

sousa.au already fetched

The getfile and putfile modules themselves can be used to move the sample file

around too. Both can either be imported by clients that wish to use their functions, or

run as top-level programs to trigger self-tests and command-line usage. For variety, let’s

run these scripts from a command line and the interactive prompt to see how they work.

When run standalone, the filename is passed in the command line to putfile and both

use password input and default site settings:

C:\...\PP4E\Internet\Ftp> putfile.py sousa.py

ftp.rmi.net pswd?

Uploading sousa.py

Upload done.

When imported, parameters are passed explicitly to functions:

C:\...\PP4E\Internet\Ftp> python

>>> from getfile import getfile

>>> getfile(file='sousa.au', site='ftp.rmi.net', dir='.', user=('lutz', 'XXX'))

sousa.au already fetched

C:\...\PP4E\Internet\Ftp> del sousa.au

C:\...\PP4E\Internet\Ftp> python

>>> from getfile import getfile

>>> getfile(file='sousa.au', site='ftp.rmi.net', dir='.', user=('lutz', 'XXX'))

Downloading sousa.au

Download done.

>>> from PP4E.System.Media.playfile import playfile

>>> playfile('sousa.au')

Although Python’s ftplib already automates the underlying socket and message

formatting chores of FTP, tools of our own like these can make the process even simpler.

Adding a User Interface

If you read the preceding chapter, you’ll recall that it concluded with a quick look at

scripts that added a user interface to a socket-based getfile script—one that transferred

files over a proprietary socket dialog, instead of over FTP. At the end of that

presentation, I mentioned that FTP is a much more generally useful way to move files around

because FTP servers are so widely available on the Net. For illustration purposes,

Example 13-7 shows a simple mutation of the prior chapter’s user interface,

implemented as a new subclass of the preceding chapter’s general form builder, form.py of

Example 12-20.

Example 13-7. PP4E\Internet\Ftp\getfilegui.py

"""
#################################################################################
launch FTP getfile function with a reusable form GUI class; uses os.chdir to
goto target local dir (getfile currently assumes that filename has no local
directory path prefix); runs getfile.getfile in thread to allow more than one
to be running at once and avoid blocking GUI during downloads; this differs
from socket-based getfilegui, but reuses Form GUI builder tool; supports both
user and anonymous FTP as currently coded;

caveats: the password field is not displayed as stars here, errors are printed
to the console instead of shown in the GUI (threads can't generally update the
GUI on Windows), this isn't 100% thread safe (there is a slight delay between
os.chdir here and opening the local output file in getfile) and we could
display both a save-as popup for picking the local dir, and a remote directory
listing for picking the file to get; suggested exercises: improve me;
#################################################################################
"""

from tkinter import Tk, mainloop
from tkinter.messagebox import showinfo
import getfile, os, sys, _thread                # FTP getfile here, not socket
from PP4E.Internet.Sockets.form import Form     # reuse form tool in socket dir

class FtpForm(Form):
    def __init__(self):
        root = Tk()
        root.title(self.title)
        labels = ['Server Name', 'Remote Dir', 'File Name',
                  'Local Dir',   'User Name?', 'Password?']
        Form.__init__(self, labels, root)
        self.mutex = _thread.allocate_lock()
        self.threads = 0

    def transfer(self, filename, servername, remotedir, userinfo):
        try:
            self.do_transfer(filename, servername, remotedir, userinfo)
            print('%s of "%s" successful' % (self.mode, filename))
        except:
            print('%s of "%s" has failed:' % (self.mode, filename), end=' ')
            print(sys.exc_info()[0], sys.exc_info()[1])
        self.mutex.acquire()
        self.threads -= 1
        self.mutex.release()

    def onSubmit(self):
        Form.onSubmit(self)
        localdir   = self.content['Local Dir'].get()
        remotedir  = self.content['Remote Dir'].get()
        servername = self.content['Server Name'].get()
        filename   = self.content['File Name'].get()
        username   = self.content['User Name?'].get()
        password   = self.content['Password?'].get()
        userinfo   = ()
        if username and password:
            userinfo = (username, password)
        if localdir:
            os.chdir(localdir)
        self.mutex.acquire()
        self.threads += 1
        self.mutex.release()
        ftpargs = (filename, servername, remotedir, userinfo)
        _thread.start_new_thread(self.transfer, ftpargs)
        showinfo(self.title, '%s of "%s" started' % (self.mode, filename))

    def onCancel(self):
        if self.threads == 0:
            Tk().quit()
        else:
            showinfo(self.title,
                     'Cannot exit: %d threads running' % self.threads)

class FtpGetfileForm(FtpForm):
    title = 'FtpGetfileGui'
    mode  = 'Download'
    def do_transfer(self, filename, servername, remotedir, userinfo):
        getfile.getfile(
            filename, servername, remotedir, userinfo, verbose=False, refetch=True)

if __name__ == '__main__':
    FtpGetfileForm()
    mainloop()

If you flip back to the end of the preceding chapter, you’ll find that this version is similar

in structure to its counterpart there; in fact, it has the same name (and is distinct only

because it lives in a different directory). The class here, though, knows how to use the

FTP-based getfile module from earlier in this chapter instead of the socket-based

getfile module we met a chapter ago. When run, this version also implements more

input fields, as in Figure 13-2, shown on Windows 7.

Figure 13-2. FTP getfile input form

Notice that a full absolute file path can be entered for the local directory here. If not,

the script assumes the current working directory, which changes after each download

and can vary depending on where the GUI is launched (e.g., the current directory differs

when this script is run by the PyDemos program at the top of the examples tree). When

we click this GUI’s Submit button (or press the Enter key), the script simply passes the

form’s input field values as arguments to the getfile.getfile FTP utility function of

Example 13-4 earlier in this section. It also posts a pop up to tell us the download has

begun (Figure 13-3).

Figure 13-3. FTP getfile info pop up

As currently coded, further download status messages, including any FTP error

messages, show up in the console window; here are the messages for successful downloads

as well as one that fails (with added blank lines for readability):

C:\...\PP4E\Internet\Ftp> getfilegui.py

Server Name => ftp.rmi.net

User Name? => lutz

Local Dir => test

File Name => about-pp.html

Password? => xxxxxxxx

Remote Dir => .

Download of "about-pp.html" successful

Server Name => ftp.rmi.net

User Name? => lutz

Local Dir => C:\temp

File Name => ora-lp4e-big.jpg

Password? => xxxxxxxx

Remote Dir => .

Download of "ora-lp4e-big.jpg" successful

Server Name => ftp.rmi.net

User Name? => lutz

Local Dir => C:\temp

File Name => ora-lp4e.jpg

Password? => xxxxxxxx

Remote Dir => .

Download of "ora-lp4e.jpg" has failed: <class 'ftplib.error_perm'>

550 ora-lp4e.jpg: No such file or directory

Given a username and password, the downloader logs into the specified account. To

do anonymous FTP instead, leave the username and password fields blank.

Now, to illustrate the threading capabilities of this GUI, start a download of a large file,

then start another download while this one is in progress. The GUI stays active while

downloads are underway, so we simply change the input fields and press Submit again.

This second download starts and runs in parallel with the first, because each download

is run in a thread, and more than one Internet connection can be active at once. In fact,

the GUI itself stays active during downloads only because downloads are run in threads;

if they were not, even screen redraws wouldn’t happen until a download finished.

We discussed threads in Chapter 5, and their application to GUIs in Chapters 9 and

10, but this script illustrates some practical thread concerns:

• This program takes care to not do anything GUI-related in a download thread. As

we’ve learned, only the thread that makes GUIs can generally process them.

• To avoid killing spawned download threads on some platforms, the GUI must also

be careful not to exit while any downloads are in progress. It keeps track of the

number of in-progress threads, and just displays a pop up if we try to kill the GUI

by pressing the Cancel button while both of these downloads are in progress.

We learned about ways to work around the no-GUI rule for threads in Chapter 10, and

we will apply such techniques when we explore the PyMailGUI example in the next

chapter. To be portable, though, we can’t really close the GUI until the active-thread

count falls to zero; the exit model of the threading module of Chapter 5 can be used

to achieve the same effect. Here is the sort of output that appears in the console window

when two downloads overlap in time:

C:\...\PP4E\Internet\Ftp> python getfilegui.py

Server Name => ftp.rmi.net

User Name? => lutz

Local Dir => C:\temp

File Name => spain08.JPG

Password? => xxxxxxxx

Remote Dir => .

Server Name => ftp.rmi.net

User Name? => lutz

Local Dir => C:\temp

File Name => index.html

Password? => xxxxxxxx

Remote Dir => .

Download of "index.html" successful

Download of "spain08.JPG" successful
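
The thread-counting scheme used by this GUI can be tried without tkinter or an FTP server. The following is only a sketch under stated assumptions: fake_download and its sleep call are hypothetical stand-ins for a real transfer, and it uses the threading module's Lock rather than _thread. The bookkeeping mirrors FtpForm, though: bump a shared counter under a lock before starting a thread, decrement it when the thread's work ends (success or failure), and allow exit only when the count is back to zero.

```python
import threading
import time

mutex = threading.Lock()        # guards the shared counter
active = 0                      # number of transfers in progress

def fake_download(name):
    """Hypothetical stand-in for a real FTP transfer."""
    global active
    try:
        time.sleep(0.1)         # pretend to move bytes
    finally:
        with mutex:             # always decrement, even on failure
            active -= 1

def start_transfer(name):
    global active
    with mutex:
        active += 1             # count the transfer before its thread starts
    thread = threading.Thread(target=fake_download, args=(name,))
    thread.start()
    return thread

def can_exit():
    with mutex:
        return active == 0      # safe to close only when nothing is running

threads = [start_transfer(name) for name in ('spain08.JPG', 'index.html')]
print('can exit while running?', can_exit())
for thread in threads:
    thread.join()               # wait for both transfers to finish
print('can exit after join?', can_exit())
```

Joining the threads, as done at the end here, is the threading module's own way to wait for completion; the explicit counter is what lets a GUI callback refuse to exit without blocking.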

This example isn’t much more useful than a command line-based tool, of course, but

it can be easily modified by changing its Python code, and it provides enough of a GUI

to qualify as a simple, first-cut FTP user interface. Moreover, because this GUI runs

downloads in Python threads, more than one can be run at the same time from this

GUI without having to start or restart a different FTP client tool.

While we’re in a GUI mood, let’s add a simple interface to the putfile utility, too. The

script in Example 13-8 creates a dialog that starts uploads in threads, using core FTP

logic imported from Example 13-5. It’s almost the same as the getfile GUI we just

wrote, so there’s not much new to say. In fact, because get and put operations are so

similar from an interface perspective, most of the get form’s logic was deliberately

factored out into a single generic class (FtpForm), so changes need be made in only a

single place. That is, the put GUI here is mostly just a reuse of the get GUI, with distinct

output labels and transfer methods. It’s in a file by itself, though, to make it easy to

launch as a standalone program.

Example 13-8. PP4E\Internet\Ftp\putfilegui.py

"""
###############################################################
launch FTP putfile function with a reusable form GUI class;
see getfilegui for notes: most of the same caveats apply;
the get and put forms have been factored into a single
class such that changes need be made in only one place;
###############################################################
"""

from tkinter import mainloop
import putfile, getfilegui

class FtpPutfileForm(getfilegui.FtpForm):
    title = 'FtpPutfileGui'
    mode  = 'Upload'
    def do_transfer(self, filename, servername, remotedir, userinfo):
        putfile.putfile(filename, servername, remotedir, userinfo, verbose=False)

if __name__ == '__main__':
    FtpPutfileForm()
    mainloop()

Running this script looks much like running the download GUI, because it’s almost

entirely the same code at work. Let’s upload some files from the client machine to the

server; Figure 13-4 shows the state of the GUI while starting one.

Figure 13-4. FTP putfile input form

And here is the console window output we get when uploading two files in serial

fashion; here again, uploads run in parallel threads, so if we start a new upload before one

in progress is finished, they overlap in time:

C:\...\PP4E\Internet\Ftp\test> ..\putfilegui.py

Server Name => ftp.rmi.net

User Name? => lutz

Local Dir => .

File Name => sousa.au

Password? => xxxxxxxx

Remote Dir => .

Upload of "sousa.au" successful

Server Name => ftp.rmi.net

User Name? => lutz

Local Dir => .

File Name => about-pp.html

Password? => xxxxxxxx

Remote Dir => .

Upload of "about-pp.html" successful

Finally, we can bundle up both GUIs in a single launcher script that knows how to start

the get and put interfaces, regardless of which directory we are in when the script is

started, and independent of the platform on which it runs. Example 13-9 shows this

process.

Example 13-9. PP4E\Internet\Ftp\PyFtpGui.pyw

"""
spawn FTP get and put GUIs no matter what directory I'm run from; os.getcwd is not
necessarily the place this script lives; could also hardcode path from $PP4EHOME,
or guessLocation; could also do: [from PP4E.launchmodes import PortableLauncher,
PortableLauncher('getfilegui', '%s/getfilegui.py' % mydir)()], but need the DOS
console pop up on Windows to view status messages which describe transfers made;
"""

import os, sys
print('Running in: ', os.getcwd())

# PP3E
# from PP4E.Launcher import findFirst
# mydir = os.path.split(findFirst(os.curdir, 'PyFtpGui.pyw'))[0]

# PP4E
from PP4E.Tools.find import findlist
mydir = os.path.dirname(findlist('PyFtpGui.pyw', startdir=os.curdir)[0])

if sys.platform[:3] == 'win':
    os.system(r'start %s\getfilegui.py' % mydir)    # raw strings keep \g and \p literal
    os.system(r'start %s\putfilegui.py' % mydir)
else:
    os.system('python %s/getfilegui.py &' % mydir)
    os.system('python %s/putfilegui.py &' % mydir)

Notice that we’re reusing the find utility from Chapter 6’s Example 6-13 again here—

this time to locate the home directory of the script in order to build command lines.

When run by launchers in the examples root directory or command lines elsewhere in

general, the current working directory may not always be this script’s container. In the

prior edition, this script used a tool in the Launcher module instead to search for its

own directory (see the examples distribution for that equivalent).
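
As an alternative to searching the examples tree, a script can usually work out its own directory from the path it was started with. This is a hedged sketch, not the book's launcher: script_dir and gui_paths are hypothetical helper names, and the code only prints the command lines it would run rather than spawning anything.

```python
import os
import sys

def script_dir(script_path):
    """Absolute directory that contains the given script file."""
    return os.path.dirname(os.path.abspath(script_path))

def gui_paths(mydir, names=('getfilegui.py', 'putfilegui.py')):
    """Full paths of the GUI scripts assumed to live in mydir."""
    return [os.path.join(mydir, name) for name in names]   # portable joins

if __name__ == '__main__':
    mydir = script_dir(sys.argv[0])          # where this script was started from
    for path in gui_paths(mydir):
        # sys.executable is the running Python, so no 'python' on PATH is assumed
        print('would launch:', sys.executable, path)
```

This assumes the two GUI scripts sit in the same directory as the launcher, which is how they are laid out in this chapter's examples.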

When this script is started, both the get and put GUIs appear as distinct, independently

run programs; alternatively, we might attach both forms to a single interface. We could

get much fancier than these two interfaces, of course. For instance, we could pop up

local file selection dialogs, and we could display widgets that give the status of

downloads and uploads in progress. We could even list files available at the remote site in a

selectable listbox by requesting remote directory listings over the FTP connection. To

learn how to add features like that, though, we need to move on to the next section.

Transferring Directories with ftplib

Once upon a time, I used Telnet to manage my website at my Internet Service Provider

(ISP). I logged in to the web server in a shell window, and performed all my edits directly

on the remote machine. There was only one copy of a site’s files—on the machine that

hosted it. Moreover, content updates could be performed from any machine that ran

a Telnet client—ideal for people with travel-based careers.†

Of course, times have changed. Like most personal websites, today mine are maintained

on my laptop and I transfer their files to and from my ISP as needed. Often, this is a

simple matter of one or two files, and it can be accomplished with a command-line FTP

client. Sometimes, though, I need an easy way to transfer the entire site. Maybe I need

to download to detect files that have become out of sync. Occasionally, the changes

are so involved that it’s easier to upload the entire site in a single step.

Although there are a variety of ways to approach this task (including options in site-

builder tools), Python can help here, too: writing Python scripts to automate the upload

and download tasks associated with maintaining my website on my laptop provides a

portable and mobile solution. Because Python FTP scripts will work on any machine

with sockets, they can be run on my laptop and on nearly any other computer where

Python is installed. Furthermore, the same scripts used to transfer page files to and

from my PC can be used to copy my site to another web server as a backup copy, should

my ISP experience an outage. The effect is sometimes called a mirror—a copy of a

remote site.

Downloading Site Directories

The following two scripts address these needs. The first, downloadflat.py, automatically

downloads (i.e., copies) by FTP all the files in a directory at a remote site to a directory

on the local machine. I keep the main copy of my website files on my PC these days,

but I use this script in two ways:

• To download my website to client machines where I want to make edits, I fetch

the contents of the web directory in my account on my ISP’s machine.

• To mirror my site to my account on another server, I run this script periodically

on the target machine if it supports Telnet or SSH secure shell; if it does not, I

simply download to one machine and upload from there to the target server.

† No, really. The second edition of this book included a tale of woe here about how my ISP forced its users to

wean themselves off Telnet access. This seems like a small issue today. Common practice on the Internet has

come far in a short time. One of my sites has even grown too complex for manual edits (except, of course,

to work around bugs in the site-builder tool). Come to think of it, so has Python’s presence on the Web.

When I first found Python in 1992, it was a set of encoded email messages, which users decoded and

concatenated and hoped the result worked. Yes, yes, I know—gee, Grandpa, tell us more…

More generally, this script (shown in Example 13-10) will download a directory full of

files to any machine with Python and sockets, from any machine running an FTP server.

Example 13-10. PP4E\Internet\Ftp\Mirror\downloadflat.py

#!/bin/env python
"""
###############################################################################
use FTP to copy (download) all files from a single directory at a remote
site to a directory on the local machine; run me periodically to mirror
a flat FTP site directory to your ISP account; set user to 'anonymous'
to do anonymous FTP; we could use try to skip file failures, but the FTP
connection is likely closed if any files fail; we could also try to
reconnect with a new FTP instance before each transfer: connects once now;
if failures, try setting nonpassive for active FTP, or disable firewalls;
this also depends on a working FTP server, and possibly its load policies.
###############################################################################
"""

import os, sys, ftplib
from getpass import getpass
from mimetypes import guess_type

nonpassive = False                         # passive FTP on by default in 2.1+
remotesite = 'home.rmi.net'                # download from this site
remotedir  = '.'                           # and this dir (e.g., public_html)
remoteuser = 'lutz'
remotepass = getpass('Password for %s on %s: ' % (remoteuser, remotesite))
localdir   = (len(sys.argv) > 1 and sys.argv[1]) or '.'
cleanall   = input('Clean local directory first? ')[:1] in ['y', 'Y']

print('connecting...')
connection = ftplib.FTP(remotesite)                 # connect to FTP site
connection.login(remoteuser, remotepass)            # login as user/password
connection.cwd(remotedir)                           # cd to directory to copy
if nonpassive:                                      # force active mode FTP
    connection.set_pasv(False)                      # most servers do passive

if cleanall:
    for localname in os.listdir(localdir):          # try to delete all locals
        try:                                        # first, to remove old files
            print('deleting local', localname)      # os.listdir omits . and ..
            os.remove(os.path.join(localdir, localname))
        except:
            print('cannot delete local', localname)

count = 0                                           # download all remote files
remotefiles = connection.nlst()                     # nlst() gives files list
                                                    # dir()  gives full details
for remotename in remotefiles:
    if remotename in ('.', '..'): continue          # some servers include . and ..
    mimetype, encoding = guess_type(remotename)     # e.g., ('text/plain', 'gzip')
    mimetype  = mimetype or '?/?'                   # may be (None, None)
    maintype  = mimetype.split('/')[0]              # .jpg ('image/jpeg', None)

    localpath = os.path.join(localdir, remotename)
    print('downloading', remotename, 'to', localpath, end=' ')
    print('as', maintype, encoding or '')
    if maintype == 'text' and encoding is None:
        # use ascii mode xfer and text file
        # use encoding compatible with ftplib's
        localfile = open(localpath, 'w', encoding=connection.encoding)
        callback  = lambda line: localfile.write(line + '\n')
        connection.retrlines('RETR ' + remotename, callback)
    else:
        # use binary mode xfer and bytes file
        localfile = open(localpath, 'wb')
        connection.retrbinary('RETR ' + remotename, localfile.write)

    localfile.close()
    count += 1

connection.quit()
print('Done:', count, 'files downloaded.')

There’s not much that is new to speak of in this script, compared to other FTP examples

we’ve seen thus far. We open a connection with the remote FTP server, log in with a

username and password for the desired account (this script never uses anonymous

FTP), and go to the desired remote directory. New here, though, are loops to iterate

over all the files in local and remote directories, text-based retrievals, and file deletions:

Deleting all local files

This script has a cleanall option, enabled by an interactive prompt. If selected,

the script first deletes all the files in the local directory before downloading, to make

sure there are no extra files that aren’t also on the server (there may be junk here

from a prior download). To delete local files, the script calls os.listdir to get a list

of filenames in the directory, and os.remove to delete each; see Chapter 4 (or the

Python library manual) for more details if you’ve forgotten what these calls do.

Notice the use of os.path.join to concatenate a directory path and filename

according to the host platform’s conventions; os.listdir returns filenames without

their directory paths, and this script is not necessarily run in the local directory

where downloads will be placed. The local directory defaults to the current

directory (“.”), but can be set differently with a command-line argument to the script.
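
The cleanup step can be tried on its own against a scratch directory. In this sketch, clean_directory is a hypothetical helper name and the two junk files are made up. It follows the same os.listdir, os.path.join, and os.remove sequence as the script, but catches only OSError instead of using a bare except.

```python
import os
import tempfile

def clean_directory(localdir):
    """Try to delete every file in localdir; return the names removed."""
    removed = []
    for name in os.listdir(localdir):             # bare names, without . or ..
        path = os.path.join(localdir, name)       # add the directory prefix
        try:
            os.remove(path)                       # raises OSError for subdirectories
            removed.append(name)
        except OSError:
            print('cannot delete local', name)
    return removed

with tempfile.TemporaryDirectory() as tmpdir:
    for name in ('a.html', 'b.gif'):              # leftovers from a "prior download"
        with open(os.path.join(tmpdir, name), 'w') as f:
            f.write('junk')
    removed = clean_directory(tmpdir)
    remaining = os.listdir(tmpdir)

print('removed:', sorted(removed), 'remaining:', remaining)
```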

Fetching all remote files

To grab all the files in a remote directory, we first need a list of their names. The

FTP object’s nlst method is the remote equivalent of os.listdir: nlst returns a

list of the string names of all files in the current remote directory. Once we have

this list, we simply step through it in a loop, running FTP retrieval commands for

each filename in turn (more on this in a minute).

The nlst method is, more or less, like requesting a directory listing with an ls

command in typical interactive FTP programs, but Python automatically splits up

the listing’s text into a list of filenames. We can pass it a remote directory to be

listed; by default it lists the current server directory. A related FTP method, dir,

produces the line strings of an FTP LIST command (printing them by default, or passing each to a callback function we supply); its result is like

typing a dir command in an FTP session, and its lines contain complete file

information, unlike nlst. If you need to know more about all the remote files, parse the

lines collected from a dir method call (we’ll see how in a later example).

Notice how we skip “.” and “..” current and parent directory indicators if present

in remote directory listings; unlike os.listdir, some (but not all) servers include

these, so we need to either skip these or catch the exceptions they may trigger (more

on this later when we start using dir, too).
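
Here is a small sketch of both listing calls. It runs offline against FakeFTP, a hypothetical stand-in whose filenames and LIST line are invented for illustration. The two helper functions use only the nlst and dir methods, so they should work the same way when passed a logged-in ftplib.FTP object.

```python
def remote_names(connection):
    """File names in the current remote directory, minus '.' and '..'."""
    return [name for name in connection.nlst() if name not in ('.', '..')]

def remote_details(connection):
    """Raw LIST lines, collected through dir's callback argument."""
    lines = []
    connection.dir(lines.append)      # dir calls this once per listing line
    return lines

class FakeFTP:
    """Hypothetical stand-in for ftplib.FTP, for offline experiments only."""
    def nlst(self):
        return ['.', '..', 'index.html', 'sousa.au']     # made-up names
    def dir(self, callback):
        # made-up line in a typical Unix-style LIST layout
        callback('-rw-r--r-- 1 lutz users 1024 Jan 01 00:00 index.html')

names = remote_names(FakeFTP())
details = remote_details(FakeFTP())
print(names)
print(details)
```

The layout of LIST lines is decided by the server, so any parsing of them is best treated as server-specific.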

Selecting transfer modes with mimetypes

We discussed output file modes for FTP earlier, but now that we’ve started

transferring text, too, I can fill in the rest of this story. To handle Unicode encodings

and to keep line-ends in sync with the machines that my web files live on, this script

distinguishes between binary and text file transfers. It uses the Python mimetypes

module to choose between text and binary transfer modes for each file.

We met mimetypes in Chapter 6 near Example 6-23, where we used it to play media

files (see the examples and description there for an introduction). Here, mimetypes

is used to decide whether a file is text or binary by guessing from its filename

extension. For instance, HTML web pages and simple text files are transferred as

text with automatic line-end mappings, and images and tar archives are transferred

in raw binary mode.
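
The mode choice can be isolated in a few lines. In this sketch, transfer_mode is a hypothetical helper name and the sample filenames are invented; the logic is the same test the script applies to the result of guess_type. Files with no recognizable extension, or with a compression encoding, fall through to binary.

```python
from mimetypes import guess_type

def transfer_mode(filename):
    """Return 'text' or 'binary' based on the filename's extension."""
    mimetype, encoding = guess_type(filename)        # may be (None, None)
    maintype = (mimetype or '?/?').split('/')[0]     # 'text', 'image', ...
    if maintype == 'text' and encoding is None:      # plain, uncompressed text
        return 'text'
    return 'binary'                                  # everything else

for name in ('index.html', 'notes.txt', 'photo.jpg', 'site.tar.gz', 'README'):
    print(name, '=>', transfer_mode(name))
```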

Downloading: text versus binary

For binary files data is pulled down with the retrbinary method we met earlier,

and stored in a local file with binary open mode of wb. This file open mode is

required to allow for the bytes strings passed to the write method by retrbinary,

but it also suppresses line-end byte mapping and Unicode encodings in the process.

Again, text mode requires encodable text in Python 3.X, and this fails for binary

data like images. This script may also be run on Windows or Unix-like platforms,

and we don’t want a \n byte embedded in an image to get expanded to \r\n on

Windows. We don’t use a chunk-size third argument for binary transfers here,

though—it defaults to a reasonable size if omitted.

For text files, the script instead uses the retrlines method, passing in a function

to be called for each line in the text file downloaded. The text line handler function

receives lines in str string form, and mostly just writes the line to a local text file.

But notice that the handler function created by the lambda here also adds a \n line-

end character to the end of the line it is passed. Python’s retrlines method strips

the trailing line-end characters (\r\n or \n) from each line to sidestep platform differences. By adding a

\n, the script ensures the proper line-end marker character sequence for the local

platform on which this script runs when written to the file (\n or \r\n).

For this auto-mapping of the \n in the script to work, of course, we must also open

text output files in w text mode, not in wb—the mapping from \n to \r\n on

Windows happens when data is written to the file. As discussed earlier, text mode

also means that the file’s write method will allow for the str string passed in by

retrlines, and that text will be encoded per Unicode when written.

Subtly, though, we also explicitly use the FTP connection object’s Unicode

encoding scheme for our text output file in open, instead of the default. Without this

encoding option, the script aborted with a UnicodeEncodeError exception for some

files in my site. In retrlines, the FTP object itself reads the remote file data over a

socket with a text-mode file wrapper and an explicit encoding scheme for decoding;

since the FTP object can do no better than this encoding anyhow, we use its

encoding for our output file as well.

By default, FTP objects in Python 3.1 use the latin1 scheme for decoding text fetched (as well

as for encoding text sent; Python 3.9 and later default to UTF-8), but this can be specialized by assigning to their

encoding attribute. Our script’s local text output file will inherit whatever encoding

ftplib uses and so be compatible with the encoded text data that it produces and

passes.

We could try to also catch Unicode exceptions for files outside the Unicode

encoding used by the FTP object, but exceptions leave the FTP object in an

unrecoverable state in tests I’ve run in Python 3.1. Alternatively, we could use wb binary

mode for the local text output file and manually encode line strings with

line.encode, or simply use retrbinary and binary mode files in all cases, but both

of these would fail to map end-lines portably—the whole point of making text

distinct in this context.
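
The line-end and encoding mechanics of the text branch can be checked without a server. This is a sketch under stated assumptions: save_lines is a hypothetical helper, the fetched lines are invented to stand in for what retrlines passes to its callback, and newline='\r\n' is given to open only to make the Windows-style mapping visible on any platform (the script itself relies on the default, which uses the local platform's line-end).

```python
import os
import tempfile

def save_lines(lines, path, encoding='latin-1', newline=None):
    """Write line strings that arrive without line-ends to a text file."""
    with open(path, 'w', encoding=encoding, newline=newline) as localfile:
        for line in lines:
            localfile.write(line + '\n')      # text mode maps \n on output

# Made-up lines, as retrlines would deliver them: str, no line-ends
fetched = ['<html>', 'caf\xe9', '</html>']

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'page.html')
    save_lines(fetched, path, newline='\r\n')     # force \r\n for the demo
    with open(path, 'rb') as f:                   # inspect the bytes on disk
        raw = f.read()

print(raw)
```

The non-ASCII character is stored as a single byte because the file uses the same latin-1 encoding assumed for the fetched text.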

All of this is simpler in action than in words. Here is the command I use to download

my entire book support website from my ISP server account to my Windows laptop

PC, in a single step:

C:\...\PP4E\Internet\Ftp\Mirror> downloadflat.py test

Password for lutz on home.rmi.net:

Clean local directory first? y

connecting...

deleting local 2004-longmont-classes.html

deleting local 2005-longmont-classes.html

deleting local 2006-longmont-classes.html

deleting local about-hopl.html

deleting local about-lp.html

deleting local about-lp2e.html

deleting local about-pp-japan.html

...lines omitted...

downloading 2004-longmont-classes.html to test\2004-longmont-classes.html as text

downloading 2005-longmont-classes.html to test\2005-longmont-classes.html as text

downloading 2006-longmont-classes.html to test\2006-longmont-classes.html as text

downloading about-hopl.html to test\about-hopl.html as text

downloading about-lp.html to test\about-lp.html as text

downloading about-lp2e.html to test\about-lp2e.html as text

downloading about-pp-japan.html to test\about-pp-japan.html as text

...lines omitted...

downloading ora-pyref4e.gif to test\ora-pyref4e.gif as image

downloading ora-lp4e-big.jpg to test\ora-lp4e-big.jpg as image

downloading ora-lp4e.gif to test\ora-lp4e.gif as image

downloading pyref4e-updates.html to test\pyref4e-updates.html as text

downloading lp4e-updates.html to test\lp4e-updates.html as text

downloading lp4e-examples.html to test\lp4e-examples.html as text

downloading LP4E-examples.zip to test\LP4E-examples.zip as application

Done: 297 files downloaded.

This may take a few moments to complete, depending on your site’s size and your

connection speed (it’s bound by network speed constraints, and it usually takes roughly

two to three minutes for my site on my current laptop and wireless broadband

connection). It is much more accurate and easier than downloading files by hand, though.

The script simply iterates over all the remote files returned by the nlst method, and

downloads each with the FTP protocol (i.e., over sockets) in turn. It uses text transfer

mode for names that imply text data, and binary mode for others.
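The name-based choice of transfer mode can be sketched with the standard mimetypes module. The helper name pick_mode below is made up for illustration, and the script's own logic may differ in detail:

```python
from mimetypes import guess_type

def pick_mode(filename):
    """Return 'text' or 'binary' for a filename (hypothetical helper)."""
    mimetype, encoding = guess_type(filename)      # e.g. ('text/html', None)
    maintype = (mimetype or '?/?').split('/')[0]   # unknown types fall through to binary
    # compressed text such as f.txt.gz has an encoding, so it is sent as binary
    return 'text' if maintype == 'text' and encoding is None else 'binary'

for name in ('index.html', 'photo.jpg', 'notes.txt.gz'):
    print(name, '=>', pick_mode(name))
```

Only the filename is inspected here; the file's content is never read, so a misnamed file would be transferred in the wrong mode.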

With the script running this way, I make sure the initial assignments in it reflect the

machines involved, and then run the script from the local directory where I want

the site copy to be stored. Because the target download directory is often not where the

script lives, I may need to give Python the full path to the script file. When run on a

server in a Telnet or SSH session window, for instance, the execution and script direc-

tory paths are different, but the script works the same way.

If you elect to delete local files in the download directory, you may also see a batch of

“deleting local…” messages scroll by on the screen before any “downloading…” lines

appear: this automatically cleans out any garbage lingering from a prior download. And

if you botch the input of the remote site password, a Python exception is raised; I

sometimes need to run it again (and type more slowly):

C:\...\PP4E\Internet\Ftp\Mirror> downloadflat.py test

Password for lutz on home.rmi.net:

Clean local directory first?

connecting...

Traceback (most recent call last):
  File "C:\...\PP4E\Internet\Ftp\Mirror\downloadflat.py", line 29, in <module>
    connection.login(remoteuser, remotepass) # login as user/password
  File "C:\Python31\lib\ftplib.py", line 375, in login
    if resp[0] == '3': resp = self.sendcmd('PASS ' + passwd)
  File "C:\Python31\lib\ftplib.py", line 245, in sendcmd
    return self.getresp()
  File "C:\Python31\lib\ftplib.py", line 220, in getresp
    raise error_perm(resp)
ftplib.error_perm: 530 Login incorrect.
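Rather than rerunning the script after a mistyped password, the login step could catch ftplib.error_perm and ask again. The following is only a sketch: login_with_retry and its parameters are hypothetical names that are not part of the book's scripts, and some servers may drop the connection after a failed login, in which case reconnecting would be needed as well.

```python
import ftplib

def login_with_retry(connection, user, ask_password, attempts=3):
    """Try to log in up to `attempts` times (hypothetical helper).

    connection   -- an ftplib.FTP-like object with a login(user, passwd) method
    ask_password -- a no-argument callable, e.g. lambda: getpass('Password: ')
    """
    for attempt in range(1, attempts + 1):
        try:
            connection.login(user, ask_password())
            return attempt                       # number of tries it took
        except ftplib.error_perm as exc:         # e.g. "530 Login incorrect."
            print('login failed:', exc)
    raise SystemExit('giving up after %d attempts' % attempts)
```

In the download script, a call such as login_with_retry(connection, remoteuser, lambda: getpass('Password: ')) would stand in for the single connection.login call.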

It’s worth noting that this script is at least partially configured by assignments near the

top of the file. In addition, the password and deletion options are given by interactive

inputs, and one command-line argument is allowed—the local directory name to store

the downloaded files (it defaults to “.”, the directory where the script is run).

Command-line arguments could be employed to universally configure all the other

download parameters and options, too, but because of Python’s simplicity and lack of

compile/link steps, changing settings in the text of Python scripts is usually just as easy

as typing words on a command line.
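For readers who do prefer command-line configuration, a minimal sketch with the standard argparse module might look like the following. The option names (--site, --rdir, --user, --nonpassive) are invented for this illustration; the defaults mirror the values used in this section's examples.

```python
import argparse

def parse_args(argv=None):
    """Parse download settings from a command line (option names are made up)."""
    parser = argparse.ArgumentParser(description='flat FTP download')
    parser.add_argument('localdir', nargs='?', default='.',
                        help='directory to store downloaded files')
    parser.add_argument('--site', default='home.rmi.net')
    parser.add_argument('--rdir', default='.')
    parser.add_argument('--user', default='lutz')
    parser.add_argument('--nonpassive', action='store_true',
                        help='force active-mode FTP')
    return parser.parse_args(argv)          # argv=None means use sys.argv[1:]

args = parse_args(['test', '--user', 'guest'])
print(args.localdir, args.site, args.user, args.nonpassive)
```

The password and the clean-first question would still be best asked interactively, so that the password never appears in shell history.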

To check for version skew after a batch of downloads and uploads, you can run the diffall script we wrote in Chapter 6, Example 6-12. For instance, I find files that have diverged over time due to updates on multiple platforms by comparing the download to a local copy of my website, using a shell command line such as:

C:\...\PP4E\Internet\Ftp> ..\..\System\Filetools\diffall.py Mirror\test C:\...\Websites\public_html

See Chapter 6 for more details on this tool, and the file diffall.out.txt in the diffs subdirectory of the examples distribution for a sample run; its text file differences stem from either final-line newline characters or newline differences reflecting binary transfers, which Windows fc commands and FTP servers do not notice.
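If the diffall script is not at hand, a rough check of two flat directories can be sketched with the standard filecmp module. This is a stand-in and not the Chapter 6 tool: compare_flat is a hypothetical name, and it does not descend into subdirectories.

```python
import filecmp, os

def compare_flat(dir1, dir2):
    """Compare two flat directories by name and content (stand-in, not diffall)."""
    names1, names2 = set(os.listdir(dir1)), set(os.listdir(dir2))
    common = sorted(names1 & names2)
    # shallow=False forces a byte-by-byte comparison instead of trusting os.stat
    same, differ, errors = filecmp.cmpfiles(dir1, dir2, common, shallow=False)
    return {'only_in_1': sorted(names1 - names2),
            'only_in_2': sorted(names2 - names1),
            'differ': sorted(differ),
            'errors': sorted(errors)}        # names that could not be compared
```

Because the comparison is byte-for-byte, a text file that differs only in its end-of-line characters is reported as different, which is the kind of skew described above.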

Uploading Site Directories

Uploading a full directory is symmetric to downloading: it’s mostly a matter of swapping

the local and remote machines and operations in the program we just met. The

script in Example 13-11 uses FTP to copy all files in a directory on the local machine

on which it runs up to a directory on a remote machine.

I really use this script, too, most often to upload all of the files maintained on my laptop

PC to my ISP account in one fell swoop. I also sometimes use it to copy my site from

my PC to a mirror machine or from the mirror machine back to my ISP. Because this

script runs on any computer with Python and sockets, it happily transfers a directory

from any machine on the Net to any machine running an FTP server. Simply change

the initial settings in this module as appropriate for the transfer you have in mind.

Example 13-11. PP4E\Internet\Ftp\Mirror\uploadflat.py

#!/bin/env python
"""
##############################################################################
use FTP to upload all files from one local dir to a remote site/directory;
e.g., run me to copy a web/FTP site's files from your PC to your ISP;
assumes a flat directory upload: uploadall.py does nested directories.
see downloadflat.py comments for more notes: this script is symmetric.
##############################################################################
"""

import os, sys, ftplib
from getpass import getpass
from mimetypes import guess_type

nonpassive = False                        # passive FTP by default
remotesite = 'learning-python.com'        # upload to this site
remotedir  = 'books'                      # from machine running on
remoteuser = 'lutz'
remotepass = getpass('Password for %s on %s: ' % (remoteuser, remotesite))
localdir   = (len(sys.argv) > 1 and sys.argv[1]) or '.'
cleanall   = input('Clean remote directory first? ')[:1] in ['y', 'Y']

print('connecting...')
connection = ftplib.FTP(remotesite)                 # connect to FTP site
connection.login(remoteuser, remotepass)            # log in as user/password
connection.cwd(remotedir)                           # cd to directory to copy
if nonpassive:                                      # force active mode FTP
    connection.set_pasv(False)                      # most servers do passive

if cleanall:
    for remotename in connection.nlst():            # try to delete all remotes
        try:                                        # first, to remove old files
            print('deleting remote', remotename)
            connection.delete(remotename)           # skips . and .. if attempted
        except:
            print('cannot delete remote', remotename)

count = 0                                           # upload all local files
localfiles = os.listdir(localdir)                   # listdir() strips dir path
                                                    # any failure ends script
for localname in localfiles:
    mimetype, encoding = guess_type(localname)      # e.g., ('text/plain', 'gzip')
    mimetype  = mimetype or '?/?'                   # may be (None, None)
    maintype  = mimetype.split('/')[0]              # .jpg ('image/jpeg', None)
    localpath = os.path.join(localdir, localname)
    print('uploading', localpath, 'to', localname, end=' ')
    print('as', maintype, encoding or '')

    if maintype == 'text' and encoding == None:
        # use ascii mode xfer and bytes file
        # need rb mode for ftplib's crlf logic
        localfile = open(localpath, 'rb')
        connection.storlines('STOR ' + localname, localfile)
    else:
        # use binary mode xfer and bytes file
        localfile = open(localpath, 'rb')
        connection.storbinary('STOR ' + localname, localfile)

    localfile.close()
    count += 1

connection.quit()
print('Done:', count, 'files uploaded.')

Similar to the mirror download script, this program illustrates a handful of new FTP

interfaces and a set of FTP scripting techniques:

Deleting all remote files

Just like the mirror script, the upload begins by asking whether we want to delete

all the files in the remote target directory before copying any files there. This

cleanall option is useful if we’ve deleted files in the local copy of the directory in

the client—the deleted files would remain on the server-side copy unless we delete

all files there first.

To implement the remote cleanup, this script simply gets a listing of all the files in

the remote directory with the FTP nlst method, and deletes each in turn with

the FTP delete method. Assuming we have delete permission, the directory will

be emptied (file permissions depend on the account we logged into when connecting

to the server). We’ve already moved to the target remote directory when

deletions occur, so no directory paths need to be prepended to filenames here. Note

that nlst may raise an exception for some servers if the remote directory is empty;

we don’t catch the exception here, but you can simply not select a cleaning if one

fails for you. We do catch deletion exceptions, because directory names like “.”

and “..” may be returned in the listing by some servers.

Storing all local files

To apply the upload operation to each file in the local directory, we get a list of

local filenames with the standard os.listdir call, and take care to prepend the

local source directory path to each filename with the os.path.join call. Recall that

os.listdir returns filenames without directory paths, and the source directory may

not be the same as the script’s execution directory if passed on the command line.

Uploading: Text versus binary

This script may also be run on both Windows and Unix-like clients, so we need to

handle text files specially. Like the mirror download, this script picks text or binary

transfer modes by using Python’s mimetypes module to guess a file’s type from its

filename extension; HTML and text files are moved in FTP text mode, for instance.

We already met the storbinary FTP object method used to upload files in binary

mode—an exact, byte-for-byte copy appears at the remote site.

Text-mode transfers work almost identically: the storlines method accepts an FTP

command string and a local file (or file-like) object, and simply copies each line

read from the local file to a same-named file on the remote machine.

Notice, though, that the local text input file must be opened in rb binary mode in

Python 3.X. Text input files are normally opened in r text mode to perform Unicode

decoding and to convert any \r\n end-of-line sequences on Windows to the \n

platform-neutral character as lines are read. However, ftplib in Python 3.1 requires

that the text file be opened in rb binary mode, because it converts all end-lines to

the \r\n sequence for transmission; to do so, it must read lines as raw bytes with

readlines and perform bytes string processing, which implies binary mode files.

This ftplib string processing worked with text-mode files in Python 2.X, but only

because there was no separate bytes type; \n was expanded to \r\n. Opening the

local file in binary mode for ftplib to read also means no Unicode decoding will

occur: the text is sent over sockets as a byte string in already encoded form. All of

which is, of course, a prime lesson on the impacts of Unicode encodings; consult

the module ftplib.py in the Python source library directory for more details.

For binary mode transfers, things are simpler—we open the local file in rb binary

mode to suppress Unicode decoding and automatic mapping everywhere, and return

the bytes strings expected by ftplib on read. Binary data is not Unicode text,

and we don’t want bytes in an audio file that happen to have the same value as

\r to magically disappear when read on Windows.
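The end-of-line handling behind text-mode uploads can be approximated in a few lines. This sketch is not ftplib's own code: to_network_lines is a hypothetical helper that only mimics the idea of reading raw bytes lines and making each one end in \r\n before it is sent.

```python
import io

def to_network_lines(fileobj):
    """Yield each line of a binary file object, ending in CRLF."""
    for line in fileobj:                 # binary mode: lines are bytes, no decoding
        if not line.endswith(b'\r\n'):
            line = line.rstrip(b'\r\n') + b'\r\n'
        yield line

unix_text = io.BytesIO(b'spam\neggs\n')        # stands in for open(path, 'rb')
dos_text = io.BytesIO(b'spam\r\neggs\r\n')
print(list(to_network_lines(unix_text)))
print(list(to_network_lines(dos_text)))
```

Both inputs produce the same two lines, each ending in \r\n. A file opened in r text mode would yield str lines instead, and the bytes operations above would fail, which is why the upload script opens text files in rb mode.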

As with the mirror download script, this program simply iterates over all files to be

transferred (files in the local directory listing this time), and transfers each in turn—in

either text or binary mode, depending on the files’ names. Here is the command I use

to upload my entire website from my laptop Windows PC to a remote Linux server at

my ISP, in a single step:

C:\...\PP4E\Internet\Ftp\Mirror> uploadflat.py test

Password for lutz on learning-python.com:

Clean remote directory first? y

connecting...

deleting remote .

cannot delete remote .

deleting remote ..

cannot delete remote ..

deleting remote 2004-longmont-classes.html

deleting remote 2005-longmont-classes.html

deleting remote 2006-longmont-classes.html

deleting remote about-lp1e.html

deleting remote about-lp2e.html

deleting remote about-lp3e.html

deleting remote about-lp4e.html

...lines omitted...

uploading test\2004-longmont-classes.html to 2004-longmont-classes.html as text

uploading test\2005-longmont-classes.html to 2005-longmont-classes.html as text

uploading test\2006-longmont-classes.html to 2006-longmont-classes.html as text

uploading test\about-lp1e.html to about-lp1e.html as text

uploading test\about-lp2e.html to about-lp2e.html as text

uploading test\about-lp3e.html to about-lp3e.html as text

uploading test\about-lp4e.html to about-lp4e.html as text

uploading test\about-pp-japan.html to about-pp-japan.html as text

...lines omitted...

uploading test\whatsnew.html to whatsnew.html as text

uploading test\whatsold.html to whatsold.html as text

uploading test\wxPython.doc.tgz to wxPython.doc.tgz as application gzip

uploading test\xlate-lp.html to xlate-lp.html as text

uploading test\zaurus0.jpg to zaurus0.jpg as image

uploading test\zaurus1.jpg to zaurus1.jpg as image

uploading test\zaurus2.jpg to zaurus2.jpg as image

uploading test\zoo-jan-03.jpg to zoo-jan-03.jpg as image

uploading test\zopeoutline.htm to zopeoutline.htm as text

Done: 297 files uploaded.

For my site and on my current laptop and wireless broadband connection, this process

typically takes six minutes, depending on server load. As with the download script, I

often run this command from the local directory where my web files are kept, and I

pass Python the full path to the script. When I run this on a Linux server, it works in

the same way, but the paths to the script and my web files directory differ.‡

Refactoring Uploads and Downloads for Reuse

The directory upload and download scripts of the prior two sections work as advertised

and, apart from the mimetypes logic, were the only FTP examples that were included in

the second edition of this book. If you look at these two scripts long enough, though,

their similarities will pop out at you eventually. In fact, they are largely the same—they

use identical code to configure transfer parameters, connect to the FTP server, and

determine file type. The exact details have been lost to time, but some of this code was

certainly copied from one file to the other.

Although such redundancy isn’t a cause for alarm if we never plan on changing these

scripts, it can be a killer in software projects in general. When you have two copies of

identical bits of code, not only is there a danger of them becoming out of sync over time

(you’ll lose uniformity in user interface and behavior), but you also effectively double

your effort when it comes time to change code that appears in both places. Unless you’re

a big fan of extra work, it pays to avoid redundancy wherever possible.

This redundancy is especially glaring when we look at the complex code that uses

mimetypes to determine file types. Repeating magic like this in more than one place is

almost always a bad idea—not only do we have to remember how it works every time

we need the same utility, but it is a recipe for errors.

Refactoring with functions

As originally coded, our download and upload scripts comprise top-level script code

that relies on global variables. Such a structure is difficult to reuse: code runs immediately

on imports, and it’s difficult to generalize for varying contexts. Worse, it’s difficult

to maintain: when you program by cut-and-paste of existing code, you increase

the cost of future changes every time you click the Paste button.

‡ Usage note: These scripts are highly dependent on the FTP server functioning properly. For a while, the

upload script occasionally had timeout errors when running over my current broadband connection. These

errors went away later, when my ISP fixed or reconfigured their server. If you have failures, try running against

a different server; connecting and disconnecting around each transfer may or may not help (some servers

limit their number of connections).

To demonstrate how we might do better, Example 13-12 shows one way to refactor

(reorganize) the download script. Once its parts are wrapped in functions, they become

reusable in other modules, including our upload program.

Example 13-12. PP4E\Internet\Ftp\Mirror\downloadflat_modular.py

#!/bin/env python
"""
##############################################################################
use FTP to copy (download) all files from a remote site and directory
to a directory on the local machine; this version works the same, but has
been refactored to wrap up its code in functions that can be reused by the
uploader, and possibly other programs in the future - else code redundancy,
which may make the two diverge over time, and can double maintenance costs.
##############################################################################
"""

import os, sys, ftplib
from getpass import getpass
from mimetypes import guess_type, add_type

defaultSite = 'home.rmi.net'
defaultRdir = '.'
defaultUser = 'lutz'

def configTransfer(site=defaultSite, rdir=defaultRdir, user=defaultUser):
    """
    get upload or download parameters
    uses a class due to the large number
    """
    class cf: pass
    cf.nonpassive = False                 # passive FTP on by default in 2.1+
    cf.remotesite = site                  # transfer to/from this site
    cf.remotedir  = rdir                  # and this dir ('.' means acct root)
    cf.remoteuser = user
    cf.localdir   = (len(sys.argv) > 1 and sys.argv[1]) or '.'
    cf.cleanall   = input('Clean target directory first? ')[:1] in ['y','Y']
    cf.remotepass = getpass(
        'Password for %s on %s:' % (cf.remoteuser, cf.remotesite))
    return cf

def isTextKind(remotename, trace=True):
    """
    use mimetype to guess if filename means text or binary
    for 'f.html' guess is ('text/html', None): text
    for 'f.jpeg' guess is ('image/jpeg', None): binary
    for 'f.txt.gz' guess is ('text/plain', 'gzip'): binary
    for unknowns, guess may be (None, None): binary
    mimetype can also guess name from type: see PyMailGUI
    """
    add_type('text/x-python-win', '.pyw')                       # not in tables
    mimetype, encoding = guess_type(remotename, strict=False)   # allow extras
    mimetype = mimetype or '?/?'                                # type unknown?
    maintype = mimetype.split('/')[0]                           # get first part
    if trace: print(maintype, encoding or '')
    return maintype == 'text' and encoding == None              # not compressed

def connectFtp(cf):
    print('connecting...')
    connection = ftplib.FTP(cf.remotesite)              # connect to FTP site
    connection.login(cf.remoteuser, cf.remotepass)      # log in as user/password
    connection.cwd(cf.remotedir)                        # cd to directory to xfer
    if cf.nonpassive:                                   # force active mode FTP
        connection.set_pasv(False)                      # most servers do passive
    return connection

def cleanLocals(cf):
    """
    try to delete all local files first to remove garbage
    """
    if cf.cleanall:
        for localname in os.listdir(cf.localdir):       # local dirlisting
            try:                                        # local file delete
                print('deleting local', localname)
                os.remove(os.path.join(cf.localdir, localname))
            except:
                print('cannot delete local', localname)

def downloadAll(cf, connection):
    """
    download all files from remote site/dir per cf config
    ftp nlst() gives files list, dir() gives full details
    """
    remotefiles = connection.nlst()                     # nlst is remote listing
    for remotename in remotefiles:
        if remotename in ('.', '..'): continue
        localpath = os.path.join(cf.localdir, remotename)
        print('downloading', remotename, 'to', localpath, 'as', end=' ')
        if isTextKind(remotename):
            # use text mode xfer
            localfile = open(localpath, 'w', encoding=connection.encoding)
            def callback(line): localfile.write(line + '\n')
            connection.retrlines('RETR ' + remotename, callback)
        else:
            # use binary mode xfer
            localfile = open(localpath, 'wb')
            connection.retrbinary('RETR ' + remotename, localfile.write)
        localfile.close()
    connection.quit()
    print('Done:', len(remotefiles), 'files downloaded.')

if __name__ == '__main__':
    cf = configTransfer()
    conn = connectFtp(cf)
    cleanLocals(cf)              # don't delete if can't connect
    downloadAll(cf, conn)

Compare this version with the original. This script, and every other in this section, runs

the same as the original flat download and upload programs. Although we haven’t

changed its behavior, though, we’ve modified the script’s software structure radically—

its code is now a set of tools that can be imported and reused in other programs.

The refactored upload program in Example 13-13, for instance, is now noticeably simpler,

and the code it shares with the download script only needs to be changed in one

place if it ever requires improvement.

Example 13-13. PP4E\Internet\Ftp\Mirror\uploadflat_modular.py

#!/bin/env python
"""
##############################################################################
use FTP to upload all files from a local dir to a remote site/directory;
this version reuses downloader's functions, to avoid code redundancy;
##############################################################################
"""

import os
from downloadflat_modular import configTransfer, connectFtp, isTextKind

def cleanRemotes(cf, connection):
    """
    try to delete all remote files first to remove garbage
    """
    if cf.cleanall:
        for remotename in connection.nlst():                # remote dir listing
            try:                                            # remote file delete
                print('deleting remote', remotename)        # skips . and .. exc
                connection.delete(remotename)
            except:
                print('cannot delete remote', remotename)

def uploadAll(cf, connection):
    """
    upload all files to remote site/dir per cf config
    listdir() strips dir path, any failure ends script
    """
    localfiles = os.listdir(cf.localdir)            # listdir is local listing
    for localname in localfiles:
        localpath = os.path.join(cf.localdir, localname)
        print('uploading', localpath, 'to', localname, 'as', end=' ')
        if isTextKind(localname):
            # use text mode xfer
            localfile = open(localpath, 'rb')
            connection.storlines('STOR ' + localname, localfile)
        else:
            # use binary mode xfer
            localfile = open(localpath, 'rb')
            connection.storbinary('STOR ' + localname, localfile)
        localfile.close()
    connection.quit()
    print('Done:', len(localfiles), 'files uploaded.')

if __name__ == '__main__':
    cf = configTransfer(site='learning-python.com', rdir='books', user='lutz')
    conn = connectFtp(cf)
    cleanRemotes(cf, conn)
    uploadAll(cf, conn)

Not only is the upload script simpler now because it reuses common code, but it will

also inherit any changes made in the download module. For instance, the isTextKind

function was later augmented with code that adds the .pyw extension to mimetypes

tables (this file type is not recognized by default); because it is a shared function, the

change is automatically picked up in the upload program, too.
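The effect of registering the .pyw extension can be seen in isolation. This short sketch uses only the standard mimetypes calls that isTextKind itself relies on; the filename spam.pyw is just an example:

```python
from mimetypes import add_type, guess_type

add_type('text/x-python-win', '.pyw')          # register the extension
mimetype, encoding = guess_type('spam.pyw')    # lookup is by filename only
print(mimetype, encoding)
```

Once the type is registered, the main type is text and the encoding is None, so a shared helper like isTextKind would choose a text-mode transfer for such files in both the download and upload scripts.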

This script and the one it imports achieve the same goals as the originals, but changing

them for easier code maintenance is a big deal in the real world of software development.

The following, for example, downloads the site from one server and uploads to another:

C:\...\PP4E\Internet\Ftp\Mirror> python downloadflat_modular.py test

Clean target directory first?

Password for lutz on home.rmi.net:

connecting...

downloading 2004-longmont-classes.html to test\2004-longmont-classes.html as text

...lines omitted...

downloading relo-feb010-index.html to test\relo-feb010-index.html as text

Done: 297 files downloaded.

C:\...\PP4E\Internet\Ftp\Mirror> python uploadflat_modular.py test

Clean target directory first?

Password for lutz on learning-python.com:

connecting...

uploading test\2004-longmont-classes.html to 2004-longmont-classes.html as text

...lines omitted...

uploading test\zopeoutline.htm to zopeoutline.htm as text

Done: 297 files uploaded.

Refactoring with classes

The function-based approach of the last two examples addresses the redundancy issue,

but it is perhaps clumsier than it needs to be. For instance, the cf configuration

options object provides a namespace that replaces global variables and breaks cross-file

dependencies. Once we start making objects to model namespaces, though, Python’s

OOP support tends to be a more natural structure for our code. As one last twist,

Example 13-14 refactors the FTP code one more time in order to leverage Python’s class

feature.

Example 13-14. PP4E\Internet\Ftp\Mirror\ftptools.py

#!/bin/env python
"""
##############################################################################
use FTP to download or upload all files in a single directory from/to a
remote site and directory; this version has been refactored to use classes
and OOP for namespace and a natural structure; we could also structure this
as a download superclass, and an upload subclass which redefines the clean
and transfer methods, but then there is no easy way for another client to
invoke both an upload and download; for the uploadall variant and possibly
others, also make single file upload/download code in orig loops methods;
##############################################################################
"""

import os, sys, ftplib
from getpass import getpass
from mimetypes import guess_type, add_type

# defaults for all clients
dfltSite = 'home.rmi.net'
dfltRdir = '.'
dfltUser = 'lutz'

class FtpTools:

    # allow these 3 to be redefined
    def getlocaldir(self):
        return (len(sys.argv) > 1 and sys.argv[1]) or '.'

    def getcleanall(self):
        return input('Clean target dir first?')[:1] in ['y','Y']

    def getpassword(self):
        return getpass(
            'Password for %s on %s:' % (self.remoteuser, self.remotesite))

    def configTransfer(self, site=dfltSite, rdir=dfltRdir, user=dfltUser):
        """
        get upload or download parameters
        from module defaults, args, inputs, cmdline
        anonymous ftp: user='anonymous' pass=emailaddr
        """
        self.nonpassive = False             # passive FTP on by default in 2.1+
        self.remotesite = site              # transfer to/from this site
        self.remotedir  = rdir              # and this dir ('.' means acct root)
        self.remoteuser = user
        self.localdir   = self.getlocaldir()
        self.cleanall   = self.getcleanall()
        self.remotepass = self.getpassword()

    def isTextKind(self, remotename, trace=True):
        """
        use mimetypes to guess if filename means text or binary
        for 'f.html' guess is ('text/html', None): text
        for 'f.jpeg' guess is ('image/jpeg', None): binary
        for 'f.txt.gz' guess is ('text/plain', 'gzip'): binary
        for unknowns, guess may be (None, None): binary
        mimetypes can also guess name from type: see PyMailGUI
        """
        add_type('text/x-python-win', '.pyw')                       # not in tables
        mimetype, encoding = guess_type(remotename, strict=False)   # allow extras
        mimetype = mimetype or '?/?'                                # type unknown?
        maintype = mimetype.split('/')[0]                           # get 1st part
        if trace: print(maintype, encoding or '')
        return maintype == 'text' and encoding == None              # not compressed

    def connectFtp(self):
        print('connecting...')
        connection = ftplib.FTP(self.remotesite)            # connect to FTP site
        connection.login(self.remoteuser, self.remotepass)  # log in as user/pswd
        connection.cwd(self.remotedir)                      # cd to dir to xfer
        if self.nonpassive:                                 # force active mode
            connection.set_pasv(False)                      # most do passive
        self.connection = connection

    def cleanLocals(self):
        """
        try to delete all local files first to remove garbage
        """
        if self.cleanall:
            for localname in os.listdir(self.localdir):     # local dirlisting
                try:                                        # local file delete
                    print('deleting local', localname)
                    os.remove(os.path.join(self.localdir, localname))
                except:
                    print('cannot delete local', localname)

    def cleanRemotes(self):
        """
        try to delete all remote files first to remove garbage
        """
        if self.cleanall:
            for remotename in self.connection.nlst():       # remote dir listing
                try:                                        # remote file delete
                    print('deleting remote', remotename)
                    self.connection.delete(remotename)
                except:
                    print('cannot delete remote', remotename)

    def downloadOne(self, remotename, localpath):
        """
        download one file by FTP in text or binary mode
        local name need not be same as remote name
        """
        if self.isTextKind(remotename):
            localfile = open(localpath, 'w', encoding=self.connection.encoding)
            def callback(line): localfile.write(line + '\n')
            self.connection.retrlines('RETR ' + remotename, callback)
        else:
            localfile = open(localpath, 'wb')
            self.connection.retrbinary('RETR ' + remotename, localfile.write)
        localfile.close()

    def uploadOne(self, localname, localpath, remotename):
        """
        upload one file by FTP in text or binary mode
        remote name need not be same as local name
        """
        if self.isTextKind(localname):
            localfile = open(localpath, 'rb')
            self.connection.storlines('STOR ' + remotename, localfile)
        else:
            localfile = open(localpath, 'rb')
            self.connection.storbinary('STOR ' + remotename, localfile)
        localfile.close()

    def downloadDir(self):
        """
        download all files from remote site/dir per config
        ftp nlst() gives files list, dir() gives full details
        """
        remotefiles = self.connection.nlst()        # nlst is remote listing
        for remotename in remotefiles:
            if remotename in ('.', '..'): continue
            localpath = os.path.join(self.localdir, remotename)
            print('downloading', remotename, 'to', localpath, 'as', end=' ')
            self.downloadOne(remotename, localpath)
        print('Done:', len(remotefiles), 'files downloaded.')

    def uploadDir(self):
        """
        upload all files to remote site/dir per config
        listdir() strips dir path, any failure ends script
        """
        localfiles = os.listdir(self.localdir)      # listdir is local listing
        for localname in localfiles:
            localpath = os.path.join(self.localdir, localname)
            print('uploading', localpath, 'to', localname, 'as', end=' ')
            self.uploadOne(localname, localpath, localname)
        print('Done:', len(localfiles), 'files uploaded.')

    def run(self, cleanTarget=lambda:None, transferAct=lambda:None):
        """
        run a complete FTP session
        default clean and transfer are no-ops
        don't delete if can't connect to server
        """
        self.connectFtp()
        cleanTarget()
        transferAct()
        self.connection.quit()

if __name__ == '__main__':
    ftp = FtpTools()
    xfermode = 'download'
    if len(sys.argv) > 1:
        xfermode = sys.argv.pop(1)      # get+del 2nd arg
    if xfermode == 'download':
        ftp.configTransfer()
        ftp.run(cleanTarget=ftp.cleanLocals, transferAct=ftp.downloadDir)
    elif xfermode == 'upload':
        ftp.configTransfer(site='learning-python.com', rdir='books', user='lutz')
        ftp.run(cleanTarget=ftp.cleanRemotes, transferAct=ftp.uploadDir)
    else:
        print('Usage: ftptools.py ["download" | "upload"] [localdir]')

In fact, this last mutation combines uploads and downloads into a single file, because

they are so closely related. As before, common code is factored into methods to avoid

redundancy. New here, the instance object itself becomes a natural namespace for

storing configuration options (they become self attributes). Study this example’s code

for more details of the restructuring applied.
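The way run accepts its clean and transfer steps as callables can be shown without any network traffic. The Session class below is a stand-in written for this illustration only (it is not FtpTools, and its method names are made up); it records what would happen instead of talking to a server:

```python
class Session:
    """Stand-in for the pattern only: no real FTP traffic happens here."""
    def __init__(self, localdir):
        self.localdir = localdir          # configuration lives on the instance
        self.log = []

    def connect(self):
        self.log.append('connect')

    def clean(self):
        self.log.append('clean ' + self.localdir)

    def transfer(self):
        self.log.append('transfer ' + self.localdir)

    def run(self, cleanTarget=lambda: None, transferAct=lambda: None):
        self.connect()                    # connect first: don't clean if this fails
        cleanTarget()                     # bound methods already carry self
        transferAct()
        self.log.append('quit')

session = Session('test')
session.run(cleanTarget=session.clean, transferAct=session.transfer)
print(session.log)
```

Because session.clean and session.transfer are bound methods, run can call them with no arguments and they still see the instance's attributes; the lambda defaults make a bare run() a connect-and-quit no-op.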

Again, this revision runs the same as our original site download and upload scripts; see

its self-test code at the end for usage details, and pass in a command-line argument to

specify “download” or “upload.” We haven’t changed what it does; we’ve refactored

it for maintainability and reuse:

C:\...\PP4E\Internet\Ftp\Mirror> ftptools.py download test

Clean target dir first?

Password for lutz on home.rmi.net:

connecting...

downloading 2004-longmont-classes.html to test\2004-longmont-classes.html as text

...lines omitted...

downloading relo-feb010-index.html to test\relo-feb010-index.html as text

Done: 297 files downloaded.

C:\...\PP4E\Internet\Ftp\Mirror> ftptools.py upload test

Clean target dir first?

Password for lutz on learning-python.com:

connecting...

uploading test\2004-longmont-classes.html to 2004-longmont-classes.html as text

...lines omitted...

uploading test\zopeoutline.htm to zopeoutline.htm as text

Done: 297 files uploaded.

Although this file can still be run as a command-line script like this, its class is really

now a package of FTP tools that can be mixed into other programs and reused. By

wrapping its code in a class, it can be easily customized by redefining its methods—its

configuration calls, such as getlocaldir, for example, may be redefined in subclasses

for custom scenarios.
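As a sketch of such customization, the following redefines the three configuration hooks so that no prompts appear, as might suit an unattended run. To keep the snippet self-contained, ConfigHooks is a stand-in that copies only the hook names from the class above, and BatchConfig is a hypothetical subclass; real code would subclass FtpTools instead.

```python
class ConfigHooks:
    """Stand-in with the same three hook names as FtpTools; not the real class."""
    def getlocaldir(self):
        return '.'
    def getcleanall(self):
        return input('Clean target dir first?')[:1] in ['y', 'Y']
    def getpassword(self):
        raise NotImplementedError('prompt omitted in this sketch')
    def configTransfer(self):
        self.localdir = self.getlocaldir()
        self.cleanall = self.getcleanall()
        self.remotepass = self.getpassword()

class BatchConfig(ConfigHooks):
    """Answers every question without prompting, e.g. for a scheduled job."""
    def __init__(self, localdir, password):
        self._localdir, self._password = localdir, password
    def getlocaldir(self):
        return self._localdir
    def getcleanall(self):
        return False                      # never delete in unattended runs
    def getpassword(self):
        return self._password

cfg = BatchConfig('mirror', 'secret')
cfg.configTransfer()                      # calls the redefined hooks via self
print(cfg.localdir, cfg.cleanall)
```

The inherited configTransfer is unchanged; it simply picks up the subclass's versions of the hooks when it calls them through self.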

Perhaps most importantly, using classes optimizes code reusability. Clients of this file

can both upload and download directories by simply subclassing or embedding an

instance of this class and calling its methods. To see one example of how, let’s move

on to the next section.

Transferring Directory Trees with ftplib

Perhaps the biggest limitation of the website download and upload scripts we just met

is that they assume the site directory is flat (hence their names). That is, the preceding

scripts transfer simple files only, and none of them handle nested subdirectories within

the web directory to be transferred.

892 | Chapter 13: Client-Side Scripting

For my purposes, that’s often a reasonable constraint. I avoid nested subdirectories to

keep things simple, and I store my book support home website as a simple directory of

files. For other sites, though, including one I keep at another machine, site transfer

scripts are easier to use if they also automatically transfer subdirectories along the way.

Uploading Local Trees

It turns out that supporting directories on uploads is fairly simple—we need to add

only a bit of recursion and remote directory creation calls. The upload script in

Example 13-15 extends the class-based version we just saw in Example 13-14, to handle

uploading all subdirectories nested within the transferred directory. Furthermore, it

recursively transfers subdirectories within subdirectories—the entire directory tree

contained within the top-level transfer directory is uploaded to the target directory at

the remote server.

In terms of its code structure, Example 13-15 is just a customization of the FtpTools

class of the prior section—really, we’re just adding a method for recursive uploads, by

subclassing. As one consequence, we get tools such as parameter configuration, content

type testing, and connection and upload code for free here; with OOP, some of the

work is done before we start.

Example 13-15. PP4E\Internet\Ftp\Mirror\uploadall.py

#!/bin/env python
"""
############################################################################
extend the FtpTools class to upload all files and subdirectories from a
local dir tree to a remote site/dir; supports nested dirs too, but not
the cleanall option (that requires parsing FTP listings to detect remote
dirs: see cleanall.py); to upload subdirectories, uses os.path.isdir(path)
to see if a local file is really a directory, FTP().mkd(path) to make dirs
on the remote machine (wrapped in a try in case it already exists there),
and recursion to upload all files/dirs inside the nested subdirectory.
############################################################################
"""

import os, ftptools

class UploadAll(ftptools.FtpTools):
    """
    upload an entire tree of subdirectories
    assumes top remote directory exists
    """
    def __init__(self):
        self.fcount = self.dcount = 0

    def getcleanall(self):
        return False  # don't even ask

    def uploadDir(self, localdir):
        """
        for each directory in an entire tree
        upload simple files, recur into subdirectories
        """
        localfiles = os.listdir(localdir)
        for localname in localfiles:
            localpath = os.path.join(localdir, localname)
            print('uploading', localpath, 'to', localname, end=' ')
            if not os.path.isdir(localpath):
                self.uploadOne(localname, localpath, localname)
                self.fcount += 1
            else:
                try:
                    self.connection.mkd(localname)
                    print('directory created')
                except:
                    print('directory not created')
                self.connection.cwd(localname)        # change remote dir
                self.uploadDir(localpath)             # upload local subdir
                self.connection.cwd('..')             # change back up
                self.dcount += 1
                print('directory exited')

if __name__ == '__main__':
    ftp = UploadAll()
    ftp.configTransfer(site='learning-python.com', rdir='training', user='lutz')
    ftp.run(transferAct = lambda: ftp.uploadDir(ftp.localdir))
    print('Done:', ftp.fcount, 'files and', ftp.dcount, 'directories uploaded.')

Like the flat upload script, this one can be run on any machine with Python and sockets

and upload to any machine running an FTP server; I run it both on my laptop PC and

on other servers by Telnet or SSH to upload sites to my ISP.

The crux of the matter in this script is the os.path.isdir test near the top; if this test

detects a directory in the current local directory, we create an identically named

directory on the remote machine with connection.mkd and descend into it with

connection.cwd, and recur into the subdirectory on the local machine (we have to use

recursive calls here, because the shape and depth of the tree are arbitrary). Like all FTP

object methods, mkd and cwd methods issue FTP commands to the remote server. When

we exit a local subdirectory, we run a remote cwd('..') to climb to the remote parent

directory and continue; the recursive call level’s return restores the prior directory on

the local machine. The rest of the script is roughly the same as the original.

In the interest of space, I’ll leave studying this variant in more depth as a suggested

exercise. For more context, try changing this script so as not to assume that the top-

level remote directory already exists. As usual in software, there are a variety of

implementation and operation options here.
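As one possible starting point for the exercise just suggested, the following sketch tries to change into a remote directory and creates it only if the change fails. The helper name ensure_remote_dir is hypothetical, connection is assumed to be a logged-in ftplib.FTP object, and treating any permanent error reply as "directory missing" is a simplifying assumption that may not hold on every server.

```python
from ftplib import error_perm

def ensure_remote_dir(connection, dirname):
    """
    cd into dirname on the server, creating it first if needed;
    connection is assumed to be a logged-in ftplib.FTP object
    """
    try:
        connection.cwd(dirname)        # already there?
    except error_perm:                 # 5xx reply: assume it is missing
        connection.mkd(dirname)        # make it, then descend into it
        connection.cwd(dirname)
```

A script could call this once on the top-level remote directory before starting its recursive upload.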

Here is the sort of output displayed on the console when the upload-all script is run,

uploading a site with multiple subdirectory levels which I maintain with site builder

tools. It’s similar to the flat upload (which you might expect, given that it is reusing


much of the same code by inheritance), but notice that it traverses and uploads nested

subdirectories along the way:

C:\...\PP4E\Internet\Ftp\Mirror> uploadall.py Website-Training

Password for lutz on learning-python.com:

connecting...

uploading Website-Training\2009-public-classes.htm to 2009-public-classes.htm text

uploading Website-Training\2010-public-classes.html to 2010-public-classes.html text

uploading Website-Training\about.html to about.html text

uploading Website-Training\books to books directory created

uploading Website-Training\books\index.htm to index.htm text

uploading Website-Training\books\index.html to index.html text

uploading Website-Training\books\_vti_cnf to _vti_cnf directory created

uploading Website-Training\books\_vti_cnf\index.htm to index.htm text

uploading Website-Training\books\_vti_cnf\index.html to index.html text

directory exited

uploading Website-Training\calendar.html to calendar.html text

uploading Website-Training\contacts.html to contacts.html text

uploading Website-Training\estes-nov06.htm to estes-nov06.htm text

uploading Website-Training\formalbio.html to formalbio.html text

uploading Website-Training\fulloutline.html to fulloutline.html text

...lines omitted...

uploading Website-Training\_vti_pvt\writeto.cnf to writeto.cnf ?

uploading Website-Training\_vti_pvt\_vti_cnf to _vti_cnf directory created

uploading Website-Training\_vti_pvt\_vti_cnf\_x_todo.htm to _x_todo.htm text

uploading Website-Training\_vti_pvt\_vti_cnf\_x_todoh.htm to _x_todoh.htm text

directory exited

uploading Website-Training\_vti_pvt\_x_todo.htm to _x_todo.htm text

uploading Website-Training\_vti_pvt\_x_todoh.htm to _x_todoh.htm text

directory exited

Done: 366 files and 18 directories uploaded.

As is, the script of Example 13-15 handles only directory tree uploads; recursive uploads

are generally more useful than recursive downloads if you maintain your websites on

your local PC and upload to a server periodically, as I do. To also download (mirror) a

website that has subdirectories, a script must parse the output of a remote listing

command to detect remote directories. For the same reason, the recursive upload script was

not coded to support the remote directory tree cleanup option of the original—such a

feature would require parsing remote listings as well. The next section shows how.

Deleting Remote Trees

One last example of code reuse at work: when I initially tested the prior section’s

upload-all script, it contained a bug that caused it to fall into an infinite recursion loop,

and keep copying the full site into new subdirectories, over and over, until the FTP

server kicked me off (not an intended feature of the program!). In fact, the upload got

13 levels deep before being killed by the server; it effectively locked my site until the

mess could be repaired.


To get rid of all the files accidentally uploaded, I quickly wrote the script in

Example 13-16 in emergency (really, panic) mode; it deletes all files and nested subdirectories

in an entire remote tree. Luckily, this was very easy to do given all the reuse that

Example 13-16 inherits from the FtpTools superclass. Here, we just have to define the

extension for recursive remote deletions. Even in tactical mode like this, OOP can be

a decided advantage.

Example 13-16. PP4E\Internet\Ftp\Mirror\cleanall.py

#!/bin/env python
"""
##############################################################################
extend the FtpTools class to delete files and subdirectories from a remote
directory tree; supports nested directories too; depends on the dir()
command output format, which may vary on some servers! - see Python's
Tools\Scripts\ftpmirror.py for hints; extend me for remote tree downloads;
##############################################################################
"""

from ftptools import FtpTools

class CleanAll(FtpTools):
    """
    delete an entire remote tree of subdirectories
    """
    def __init__(self):
        self.fcount = self.dcount = 0

    def getlocaldir(self):
        return None  # irrelevant here

    def getcleanall(self):
        return True  # implied here

    def cleanDir(self):
        """
        for each item in current remote directory,
        del simple files, recur into and then del subdirectories
        the dir() ftp call passes each line to a func or method
        """
        lines = []                                   # each level has own lines
        self.connection.dir(lines.append)            # list current remote dir
        for line in lines:
            parsed = line.split()                    # split on whitespace
            permiss = parsed[0]                      # assume 'drw... ... filename'
            fname = parsed[-1]
            if fname in ('.', '..'):                 # some include cwd and parent
                continue
            elif permiss[0] != 'd':                  # simple file: delete
                print('file', fname)
                self.connection.delete(fname)
                self.fcount += 1
            else:                                    # directory: recur, del
                print('directory', fname)
                self.connection.cwd(fname)           # chdir into remote dir
                self.cleanDir()                      # clean subdirectory
                self.connection.cwd('..')            # chdir remote back up
                self.connection.rmd(fname)           # delete empty remote dir
                self.dcount += 1
                print('directory exited')

if __name__ == '__main__':
    ftp = CleanAll()
    ftp.configTransfer(site='learning-python.com', rdir='training', user='lutz')
    ftp.run(cleanTarget=ftp.cleanDir)
    print('Done:', ftp.fcount, 'files and', ftp.dcount, 'directories cleaned.')

Besides again being recursive in order to handle arbitrarily shaped trees, the main trick

employed here is to parse the output of a remote directory listing. The FTP nlst call

used earlier gives us a simple list of filenames; here, we use dir to also get file detail

lines like these:

C:\...\PP4E\Internet\Ftp> ftp learning-python.com

ftp> cd training

ftp> dir

drwxr-xr-x 11 5693094 450 4096 May 4 11:06 .

drwx---r-x 19 5693094 450 8192 May 4 10:59 ..

-rw----r-- 1 5693094 450 15825 May 4 11:02 2009-public-classes.htm

-rw----r-- 1 5693094 450 18084 May 4 11:02 2010-public-classes.html

drwx---r-x 3 5693094 450 4096 May 4 11:02 books

-rw----r-- 1 5693094 450 3783 May 4 11:02 calendar-save-aug09.html

-rw----r-- 1 5693094 450 3923 May 4 11:02 calendar.html

drwx---r-x 2 5693094 450 4096 May 4 11:02 images

-rw----r-- 1 5693094 450 6143 May 4 11:02 index.html

...lines omitted...

This output format is potentially server-specific, so check this on your own server before

relying on this script. For this Unix ISP, if the first character of the first item on the line

is character “d”, the filename at the end of the line names a remote directory. To parse,

the script simply splits on whitespace to extract parts of a line.
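The parsing step just described can be isolated into a small function, sketched here under the assumption that the server emits nine-field Unix-style lines like those shown above. The name parse_listing_line is hypothetical. Unlike the script, which takes the last whitespace-separated word as the name, this variant limits the number of splits so that a filename containing spaces stays in one piece; it makes no attempt to handle symbolic links or other listing formats.

```python
def parse_listing_line(line):
    """
    split one Unix-style listing line into (isdir, filename);
    assumes nine whitespace-separated fields, with the name last
    """
    fields = line.strip().split(None, 8)   # at most 9 parts: name keeps spaces
    if len(fields) < 9:
        raise ValueError('unrecognized listing line: %r' % line)
    permiss, fname = fields[0], fields[8]
    return permiss.startswith('d'), fname

if __name__ == '__main__':
    for sample in ('drwx---r-x 3 5693094 450 4096 May 4 11:02 books',
                   '-rw----r-- 1 5693094 450 6143 May 4 11:02 index.html'):
        print(parse_listing_line(sample))
```

Raising an exception on short lines is a design choice for this sketch; a real tool might prefer to skip lines it does not recognize.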

Notice how this script, like others before it, must skip the symbolic “.” and “..” current

and parent directory names in listings to work properly for this server. Oddly, this can

vary per server as well; one of the servers I used for this book’s examples, for instance,

does not include these special names in listings. We can verify by running ftplib at the

interactive prompt, as though it were a portable FTP client interface:

C:\...\PP4E\Internet\Ftp> python

>>> from ftplib import FTP

>>> f = FTP('ftp.rmi.net')

>>> f.login('lutz', 'xxxxxxxx') # output lines omitted

>>> for x in f.nlst()[:3]: print(x) # no . or .. in listings

...

2004-longmont-classes.html

2005-longmont-classes.html

2006-longmont-classes.html

>>> L = []


>>> f.dir(L.append) # ditto for detailed list

>>> for x in L[:3]: print(x)

...

-rw-r--r-- 1 ftp ftp 8173 Mar 19 2006 2004-longmont-classes.html

-rw-r--r-- 1 ftp ftp 9739 Mar 19 2006 2005-longmont-classes.html

-rw-r--r-- 1 ftp ftp 805 Jul 8 2006 2006-longmont-classes.html

On the other hand, the server I’m using in this section does include the special dot

names; to be robust, our scripts must skip over these names in remote directory listings

just in case they’re run against a server that includes them (here, the test is required to

avoid falling into an infinite recursive loop!). We don’t need to care about local directory

listings because Python’s os.listdir never includes “.” or “..” in its result, but things

are not quite so consistent in the “Wild West” that is the Internet today:

>>> f = FTP('learning-python.com')

>>> f.login('lutz', 'xxxxxxxx') # output lines omitted

>>> for x in f.nlst()[:5]: print(x) # includes . and .. here

...

.

..

.hcc.thumbs

2009-public-classes.htm

2010-public-classes.html

>>> L = []

>>> f.dir(L.append) # ditto for detailed list

>>> for x in L[:5]: print(x)

...

drwx---r-x 19 5693094 450 8192 May 4 10:59 .

drwx---r-x 19 5693094 450 8192 May 4 10:59 ..

drwx------ 2 5693094 450 4096 Feb 18 05:38 .hcc.thumbs

-rw----r-- 1 5693094 450 15824 May 1 14:39 2009-public-classes.htm

-rw----r-- 1 5693094 450 18083 May 4 09:05 2010-public-classes.html

The output of our clean-all script in action follows; it shows up in the system console

window where the script is run. You might be able to achieve the same effect with a

“rm -rf” Unix shell command in an SSH or Telnet window on some servers, but the

Python script runs on the client and requires no remote access other than basic FTP:

C:\PP4E\Internet\Ftp\Mirror> cleanall.py

Password for lutz on learning-python.com:

connecting...

file 2009-public-classes.htm

file 2010-public-classes.html

file Learning-Python-interview.doc

file Python-registration-form-010.pdf

file PythonPoweredSmall.gif

directory _derived

file 2009-public-classes.htm_cmp_DeepBlue100_vbtn.gif

file 2009-public-classes.htm_cmp_DeepBlue100_vbtn_p.gif

file 2010-public-classes.html_cmp_DeepBlue100_vbtn_p.gif

file 2010-public-classes.html_cmp_deepblue100_vbtn.gif

directory _vti_cnf


file 2009-public-classes.htm_cmp_DeepBlue100_vbtn.gif

file 2009-public-classes.htm_cmp_DeepBlue100_vbtn_p.gif

file 2010-public-classes.html_cmp_DeepBlue100_vbtn_p.gif

file 2010-public-classes.html_cmp_deepblue100_vbtn.gif

directory exited

...lines omitted...

file priorclients.html

file public_classes.htm

file python_conf_ora.gif

file topics.html

Done: 366 files and 18 directories cleaned.

Downloading Remote Trees

It is possible to extend this remote tree-cleaner to also download a remote tree with

subdirectories: rather than deleting, as you walk the remote tree simply create a local

directory to match a remote one, and download nondirectory files. We’ll leave this final

step as a suggested exercise, though, partly because its dependence on the format

produced by server directory listings makes it hard to code robustly, and partly because this

use case is less common for me—in practice, I am more likely to maintain a site on my

PC and upload to the server than to download a tree.
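For readers who want a head start on that exercise, here is a rough sketch of the idea, not a robust mirror tool. It assumes a logged-in ftplib.FTP object, Unix-style listings with the filename as the last field, and no spaces in names; it also fetches every file in binary mode, which sidesteps the text versus binary distinction made by the earlier scripts. The function name download_tree is hypothetical.

```python
import os

def download_tree(connection, localdir):
    """
    mirror the current remote directory into localdir, recursively;
    connection is assumed to be a logged-in ftplib.FTP object
    """
    os.makedirs(localdir, exist_ok=True)         # create matching local dir
    lines = []                                   # each level has its own list
    connection.dir(lines.append)                 # list current remote dir
    for line in lines:
        parsed = line.split()
        if len(parsed) < 9:                      # e.g., a 'total N' header
            continue
        permiss, fname = parsed[0], parsed[-1]
        if fname in ('.', '..'):                 # avoid infinite recursion
            continue
        localpath = os.path.join(localdir, fname)
        if permiss[0] == 'd':                    # directory: recur into it
            connection.cwd(fname)
            download_tree(connection, localpath)
            connection.cwd('..')
        else:                                    # simple file: fetch bytes
            with open(localpath, 'wb') as localfile:
                connection.retrbinary('RETR ' + fname, localfile.write)
```

The structure mirrors cleanDir in Example 13-16: the listing is parsed the same way, but each branch creates or fetches instead of deleting.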

If you do wish to experiment with a recursive download, though, be sure to consult the

script Tools\Scripts\ftpmirror.py in Python’s install or source tree for hints. That script

attempts to download a remote directory tree by FTP, and allows for various directory

listing formats which we’ll skip here in the interest of space. For our purposes, it’s time

to move on to the next protocol on our tour—Internet email.

Processing Internet Email

Some of the other most common, higher-level Internet protocols have to do with

reading and sending email messages: POP and IMAP for fetching email from servers, SMTP

for sending new messages, and other formalisms such as RFC822 for specifying email

message content and format. You don’t normally need to know about such acronyms

when using common email tools, but internally, programs like Microsoft Outlook and

webmail systems generally talk to POP and SMTP servers to do your bidding.

Like FTP, email ultimately consists of formatted commands and byte streams shipped

over sockets and ports (port 110 for POP; 25 for SMTP). Regardless of the nature of its

content and attachments, an email message is little more than a string of bytes sent and

received through sockets. But also like FTP, Python has standard library modules to

simplify all aspects of email processing:


• poplib and imaplib for fetching email

• smtplib for sending email

• The email module package for parsing and constructing email

These modules are related: for nontrivial messages, we typically use email to parse mail

text which has been fetched with poplib and use email to compose mail text to be sent

with smtplib. The email package also handles tasks such as address parsing, date and

time formatting, attachment formatting and extraction, and encoding and decoding of

email content (e.g., uuencode, Base64). Additional modules handle more specific tasks

(e.g., mimetypes to map filenames to and from content types).
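A few of these utility tasks can be tried without any server at all. The brief sketch below uses calls from the standard library's email.utils and mimetypes modules; the address and filename are merely sample values.

```python
import mimetypes
from email.utils import parseaddr, formataddr, formatdate

# address parsing: split a display name from the address proper
name, addr = parseaddr('Mark Lutz <PP4E@learning-python.com>')
print(name, '|', addr)

# and the reverse: build a header-ready address string
print(formataddr((name, addr)))

# date formatting per email standards, using local time
print(formatdate(localtime=True))

# map a filename to a content type (and encoding, if any)
print(mimetypes.guess_type('index.html'))
```

None of these calls touches the network, which makes them convenient to experiment with at the interactive prompt.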

In the next few sections, we explore the POP and SMTP interfaces for fetching and

sending email from and to servers, and the email package interfaces for parsing and

composing email message text. Other email interfaces in Python are analogous and are

documented in the Python library reference manual.§

Unicode in Python 3.X and Email Tools

In the prior sections of this chapter, we studied how Unicode encodings can impact

scripts using Python’s ftplib FTP tools in some depth, because it illustrates the

implications of Python 3.X’s Unicode string model for real-world programming. In short:

• All binary mode transfers should open local output and input files in binary mode

(modes wb and rb).

• Text-mode downloads should open local output files in text mode with explicit

encoding names (mode w, with an encoding argument that defaults to latin1 within

ftplib itself).

• Text-mode uploads should open local input files in binary mode (mode rb).

The prior sections describe why these rules are in force. The last two points here differ

for scripts written originally for Python 2.X. As you might expect, given that the

underlying sockets transfer byte strings today, the email story is somewhat convoluted for

Unicode in Python 3.X as well. As a brief preview:

Fetching

The poplib module returns fetched email text in bytes string form. Command text

sent to the server is encoded per UTF8 internally, but replies are returned as raw

binary bytes and not decoded into str text.

§ IMAP, or Internet Message Access Protocol, was designed as an alternative to POP, but it is still not as widely

available today, and so it is not presented in this text. For instance, major commercial providers used for this

book’s examples provide only POP (or web-based) access to email. See the Python library manual for IMAP

server interface details. Python used to have a RFC822 module as well, but it’s been subsumed by the

email package in 3.X.


Sending

The smtplib module accepts email content to send as str strings. Internally,

message text passed in str form is encoded to binary bytes for transmission using the

ascii encoding scheme. Passing an already encoded bytes string to the send call

may allow more explicit control.

Composing

The email package produces Unicode str strings containing plain text when

generating full email text for sending with smtplib and accepts optional encoding

specifications for messages and their parts, which it applies according to email

standard rules. Message headers may also be encoded per email, MIME, and

Unicode conventions.

Parsing

The email package in 3.1 currently requires raw email byte strings of the type

fetched with poplib to be decoded into Unicode str strings as appropriate before

they can be passed in to be parsed into a message object. This pre-parse decoding

might be based on a default, user preference, mail headers inspection, or intelligent

guess. Because this requirement raises difficult issues for package clients, it may be

dropped in a future version of email and Python.

Navigating

The email package returns most message components as str strings, though parts

content decoded by Base64 and other email encoding schemes may be returned as

bytes strings, parts fetched without such decoding may be str or bytes, and some

str string parts are internally encoded to bytes with scheme raw-unicode-escape

before processing. Message headers may be decoded by the package on request as

well.

If you’re migrating email scripts (or your mindset) from 2.X, you’ll need to treat email

text fetched from a server as byte strings, and decode it before passing it along for

parsing; scripts that send or compose email are generally unaffected (and this may be

the majority of Python email-aware scripts), though content may have to be treated

specially if it may be returned as byte strings.

This is the story in Python 3.1, which is of course prone to change over time. We’ll see

how these email constraints translate into code as we move along in this section. Suffice

it to say, the text on the Internet is not as simple as it used to be, though it probably

shouldn’t have been anyhow.
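The fetch-then-parse constraint described above can be sketched in a few lines. The example assumes message lines arrive as a list of bytes objects, as in the second item returned by poplib's retr call, and that a single encoding name (here utf8, mirroring the fetchEncoding idea used later) is good enough for the whole message; both the helper name parse_fetched and the sample message are made up for illustration. Python 3.2 and later also provide email.message_from_bytes, which accepts bytes directly.

```python
import email

def parse_fetched(lines, encoding='utf8'):
    """
    lines: list of bytes, one per message line, with line ends stripped;
    decode to str first, then hand the text to the email parser
    """
    text = b'\n'.join(lines).decode(encoding)
    return email.message_from_string(text)

fetched = [b'From: PP4E@learning-python.com',
           b'Subject: testing',
           b'',
           b'Hello from a byte stream']

msg = parse_fetched(fetched)
print(msg['Subject'], '=>', msg.get_payload())
```

A single fixed encoding is a limited approach: a message whose bytes are not valid in that encoding will raise an exception on decode, so real clients need a fallback policy.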

POP: Fetching Email

I confess: up until just before 2000, I took a lowest-common-denominator approach

to email. I preferred to check my messages by Telnetting to my ISP and using a simple

command-line email interface. Of course, that’s not ideal for mail with attachments,

pictures, and the like, but its portability was staggering—because Telnet runs on almost


any machine with a network link, I was able to check my mail quickly and easily from

anywhere on the planet. Given that I make my living traveling around the world

teaching Python classes, this wild accessibility was a big win.

As with website maintenance, times have changed on this front. Somewhere along the

way, most ISPs began offering web-based email access with similar portability and

dropped Telnet altogether. When my ISP took away Telnet access, however, they also

took away one of my main email access methods. Luckily, Python came to the rescue

again—by writing email access scripts in Python, I could still read and send email from

any machine in the world that has Python and an Internet connection. Python can be

as portable a solution as Telnet, but much more powerful.

Moreover, I can still use these scripts as an alternative to tools suggested by the ISP.

Besides my not being fond of delegating control to commercial products of large

companies, closed email tools impose choices on users that are not always ideal and may

sometimes fail altogether. In many ways, the motivation for coding Python email scripts

is the same as it was for the larger GUIs in Chapter 11: the scriptability of Python

programs can be a decided advantage.

For example, Microsoft Outlook historically and by default has preferred to download

mail to your PC and delete it from the mail server as soon as you access it. This keeps

your email box small (and your ISP happy), but it isn’t exactly friendly to people who

travel and use multiple machines along the way: once a message has been accessed, you

cannot get to it from any machine except the one to which it was initially downloaded.

Worse, the web-based email interfaces offered by my ISPs have at times gone offline

completely, leaving me cut off from email (and usually at the worst possible time).

The next two scripts represent one first-cut solution to such portability and reliability

constraints (we’ll see others in this and later chapters). The first, popmail.py, is a simple

mail reader tool, which downloads and prints the contents of each email in an email

account. This script is admittedly primitive, but it lets you read your email on any

machine with Python and sockets; moreover, it leaves your email intact on the server,

and isn’t susceptible to webmail outages. The second, smtpmail.py, is a one-shot script

for writing and sending a new email message that is as portable as Python itself.

Later in this chapter, we’ll implement an interactive console-based email client

(pymail), and later in this book we’ll code a full-blown GUI email tool (PyMailGUI) and

a web-based email program of our own (PyMailCGI). For now, we’ll start with the

basics.

Mail Configuration Module

Before we get to the scripts, let’s first take a look at a common module they import and

use. The module in Example 13-17 is used to configure email parameters appropriately

for a particular user. It’s simply a collection of assignments to variables used by mail

programs that appear in this book; each major mail client has its own version, to allow


content to vary. Isolating these configuration settings in this single module makes it

easy to configure the book’s email programs for a particular user, without having to

edit actual program logic code.

If you want to use any of this book’s email programs to do mail processing of your own,

be sure to change its assignments to reflect your servers, account usernames, and so on

(as shown, they refer to email accounts used for developing this book). Not all scripts

use all of these settings; we’ll revisit this module in later examples to explain more of

them.

Note that some ISPs may require that you be connected directly to their systems in

order to use their SMTP servers to send mail. For example, when connected directly

by dial-up in the past, I could use my ISP’s server directly, but when connected via

broadband, I had to route requests through a cable Internet provider. You may need

to adjust these settings to match your configuration; see your ISP to obtain the required

POP and SMTP servers. Also, some SMTP servers check domain name validity in

addresses, and may require an authenticating login step; see the SMTP section later in

this chapter for interface details.

Example 13-17. PP4E\Internet\Email\mailconfig.py

"""

user configuration settings for various email programs (pymail/mailtools version);

email scripts get their server names and other email config options from this

module: change me to reflect your server names and mail preferences;

"""

#------------------------------------------------------------------------------

# (required for load, delete: all) POP3 email server machine, user

#------------------------------------------------------------------------------

popservername = 'pop.secureserver.net'

popusername = 'PP4E@learning-python.com'

#------------------------------------------------------------------------------

# (required for send: all) SMTP email server machine name

# see Python smtpd module for a SMTP server class to run locally;

#------------------------------------------------------------------------------

smtpservername = 'smtpout.secureserver.net'

#------------------------------------------------------------------------------

# (optional: all) personal information used by clients to fill in mail if set;

# signature -- can be a triple-quoted block, ignored if empty string;

# address -- used for initial value of "From" field if not empty,

# no longer tries to guess From for replies: this had varying success;

#------------------------------------------------------------------------------

myaddress = 'PP4E@learning-python.com'

mysignature = ('Thanks,\n'

'--Mark Lutz (http://learning-python.com, http://rmi.net/~lutz)')


#------------------------------------------------------------------------------

# (optional: mailtools) may be required for send; SMTP user/password if

# authenticated; set user to None or '' if no login/authentication is

# required; set pswd to name of a file holding your SMTP password, or

# an empty string to force programs to ask (in a console, or GUI);

#------------------------------------------------------------------------------

smtpuser = None # per your ISP

smtppasswdfile = '' # set to '' to be asked

#------------------------------------------------------------------------------

# (optional: mailtools) name of local one-line text file with your pop

# password; if empty or file cannot be read, pswd is requested when first

# connecting; pswd not encrypted: leave this empty on shared machines;

#------------------------------------------------------------------------------

poppasswdfile = r'c:\temp\pymailgui.txt' # set to '' to be asked

#------------------------------------------------------------------------------

# (required: mailtools) local file where sent messages are saved by some clients;

#------------------------------------------------------------------------------

sentmailfile = r'.\sentmail.txt' # . means in current working dir

#------------------------------------------------------------------------------

# (required: pymail, pymail2) local file where pymail saves pop mail on request;

#------------------------------------------------------------------------------

savemailfile = r'c:\temp\savemail.txt' # not used in PyMailGUI: dialog

#------------------------------------------------------------------------------

# (required: pymail, mailtools) fetchEncoding is the Unicode encoding used to

# decode fetched full message bytes, and to encode and decode message text if

# stored in text-mode save files; see Chapter 13 for details: this is a limited

# and temporary approach to Unicode encodings until a new bytes-friendly email

# package is developed; headersEncodeTo is for sent headers: see chapter13;

#------------------------------------------------------------------------------

fetchEncoding = 'utf8' # 4E: how to decode and store message text (or latin1?)

headersEncodeTo = None # 4E: how to encode non-ASCII headers sent (None=utf8)

#------------------------------------------------------------------------------

# (optional: mailtools) the maximum number of mail headers or messages to

# download on each load request; given this setting N, mailtools fetches at

# most N of the most recently arrived mails; older mails outside this set are

# not fetched from the server, but are returned as empty/dummy emails; if this

# is assigned to None (or 0), loads will have no such limit; use this if you

# have very many mails in your inbox, and your Internet or mail server speed

# makes full loads too slow to be practical; some clients also load only

# newly-arrived emails, but this setting is independent of that feature;

#------------------------------------------------------------------------------

fetchlimit = 25 # 4E: maximum number headers/emails to fetch on loads
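As a rough illustration of how a client might honor a password-file setting such as poppasswdfile, the sketch below reads the first line of the named file and falls back on a console prompt when the name is empty or the file cannot be read. The helper name get_password is hypothetical and is not the code used by the book's mail clients.

```python
import getpass

def get_password(passwdfile, prompt='Password? '):
    """
    return the first line of passwdfile if it can be read,
    else fall back on asking the user at the console
    """
    if passwdfile:
        try:
            with open(passwdfile) as file:
                return file.readline().rstrip('\n')
        except OSError:
            pass                       # unreadable or missing: ask instead
    return getpass.getpass(prompt)
```

A script could call this as get_password(mailconfig.poppasswdfile); as the module's comments warn, a password stored this way is not encrypted.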


POP Mail Reader Script

On to reading email in Python: the script in Example 13-18 employs Python’s standard

poplib module, an implementation of the client-side interface to POP—the Post Office

Protocol. POP is a well-defined and widely available way to fetch email from servers

over sockets. This script connects to a POP server to implement a simple yet portable

email download and display tool.

Example 13-18. PP4E\Internet\Email\popmail.py

#!/usr/local/bin/python

"""

##############################################################################

use the Python POP3 mail interface module to view your POP email account

messages; this is just a simple listing--see pymail.py for a client with

more user interaction features, and smtpmail.py for a script which sends

mail; POP is used to retrieve mail, and runs on a socket using port number

110 on the server machine, but Python's poplib hides all protocol details;

to send mail, use the smtplib module (or os.popen('mail...')). see also:

imaplib module for IMAP alternative, PyMailGUI/PyMailCGI for more features;

##############################################################################

"""

import poplib, getpass, sys, mailconfig

mailserver = mailconfig.popservername # ex: 'pop.rmi.net'

mailuser = mailconfig.popusername # ex: 'lutz'

mailpasswd = getpass.getpass('Password for %s?' % mailserver)

print('Connecting...')

server = poplib.POP3(mailserver)

server.user(mailuser) # connect, log in to mail server

server.pass_(mailpasswd) # pass is a reserved word

try:
    print(server.getwelcome())                     # print returned greeting message
    msgCount, msgBytes = server.stat()
    print('There are', msgCount, 'mail messages in', msgBytes, 'bytes')
    print(server.list())
    print('-' * 80)
    input('[Press Enter key]')

    for i in range(msgCount):
        hdr, message, octets = server.retr(i+1)    # octets is byte count
        for line in message: print(line.decode())  # retrieve, print all mail
        print('-' * 80)                            # mail text is bytes in 3.x
        if i < msgCount - 1:
            input('[Press Enter key]')             # mail box locked till quit
finally:                                           # make sure we unlock mbox
    server.quit()                                  # else locked till timeout
print('Bye.')

Though primitive, this script illustrates the basics of reading email in Python. To establish a connection to an email server, we start by making an instance of the poplib.POP3 object, passing in the email server machine’s name as a string:

server = poplib.POP3(mailserver)

If this call doesn’t raise an exception, we’re connected (by socket) to the POP server

listening on POP port number 110 at the machine where our email account lives.

The next thing we need to do before fetching messages is tell the server our username

and password; notice that the password method is called pass_. Without the trailing

underscore, pass would name a reserved word and trigger a syntax error:

server.user(mailuser) # connect, log in to mail server

server.pass_(mailpasswd) # pass is a reserved word

To keep things simple and relatively secure, this script always asks for the account

password interactively; the getpass module we met in the FTP section of this chapter

is used to input but not display a password string typed by the user.

Once we’ve told the server our username and password, we’re free to fetch mailbox information with the stat method (number of messages, total bytes among all messages) and fetch the full text of a particular message with the retr method (pass the message number; they start at 1). The full text includes all headers, followed by a blank line, followed by the mail’s text and any attached parts. The retr call sends back a tuple that includes a list of line strings (bytes objects in 3.X) representing the content of the mail:

msgCount, msgBytes = server.stat()

hdr, message, octets = server.retr(i+1) # octets is byte count
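As a minimal sketch of the tuple shape just described, the following code unpacks a retr-style result and splits the message at its first blank line. The sample tuple is invented data standing in for a real server reply, and split_message is a hypothetical helper name, not part of poplib:

```python
# Sample data shaped like a poplib retr() result: (response, lines, octets).
# The values are invented for illustration; no server is contacted.
sample = (b'+OK 62 octets',
          [b'From: lutz@rmi.net',
           b'Subject: testing',
           b'',
           b'Testing Python mail tools.'],
          62)

def split_message(lines):
    """Split a list of bytes lines into (header_lines, body_lines) as str."""
    text = [line.decode() for line in lines]    # bytes -> str in 3.X
    if '' in text:
        blank = text.index('')                  # first blank line ends headers
        return text[:blank], text[blank + 1:]
    return text, []                             # no body present

resp, lines, octets = sample
headers, body = split_message(lines)
print(headers)
print(body)
```

Splitting by hand like this is only adequate for simple text mails; the email package shown later in the chapter is the more general tool.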

We close the email server connection by calling the POP object’s quit method:

server.quit() # else locked till timeout

Notice that this call appears inside the finally clause of a try statement that wraps the

bulk of the script. To minimize complications associated with changes, POP servers

lock your email inbox between the time you first connect and the time you close your

connection (or until an arbitrary, system-defined timeout expires). Because the POP

quit method also unlocks the mailbox, it’s crucial that we do this before exiting,

whether an exception is raised during email processing or not. By wrapping the action

in a try/finally statement, we guarantee that the script calls quit on exit to unlock the

mailbox to make it accessible to other processes (e.g., delivery of incoming email).

Fetching Messages

Here is the popmail script of Example 13-18 in action, displaying two messages in my

account’s mailbox on machine pop.secureserver.net—the domain name of the mail

server machine used by the ISP hosting my learning-python.com domain name, as

configured in the module mailconfig. To keep this output reasonably sized, I’ve omitted

or truncated a few irrelevant message header lines here, including most of the

Received: headers that chronicle an email’s journey; run this on your own to see all the

gory details of raw email text:

C:\...\PP4E\Internet\Email> popmail.py

Password for pop.secureserver.net?

Connecting...

b'+OK <1314.1273085900@p3pop01-02.prod.phx3.gdg>'

There are 2 mail messages in 3268 bytes

(b'+OK ', [b'1 1860', b'2 1408'], 16)

--------------------------------------------------------------------------------

[Press Enter key]

Received: (qmail 7690 invoked from network); 5 May 2010 15:29:43 -0000

X-IronPort-Anti-Spam-Result: AskCAG4r4UvRVllAlGdsb2JhbACDF44FjCkVAQEBAQkLCAkRAx+

Received: from 72.236.109.185 by webmail.earthlink.net with HTTP; Wed, 5 May 201

Message-ID: <27293081.1273073376592.JavaMail.root@mswamui-thinleaf.atl.sa.earthl

Date: Wed, 5 May 2010 11:29:36 -0400 (EDT)

From: lutz@rmi.net

Reply-To: lutz@rmi.net

To: pp4e@learning-python.com

Subject: I'm a Lumberjack, and I'm Okay

Mime-Version: 1.0

Content-Type: text/plain; charset=UTF-8

Content-Transfer-Encoding: 7bit

X-Mailer: EarthLink Zoo Mail 1.0

X-ELNK-Trace: 309f369105a89a174e761f5d55cab8bca866e5da7af650083cf64d888edc8b5a35

X-Originating-IP: 209.86.224.51

X-Nonspam: None

I cut down trees, I skip and jump,

I like to press wild flowers...

--------------------------------------------------------------------------------

[Press Enter key]

Received: (qmail 17482 invoked from network); 5 May 2010 15:33:47 -0000

X-IronPort-Anti-Spam-Result: AlIBAIss4UthSoc7mWdsb2JhbACDF44FjD4BAQEBAQYNCgcRIq1

Received: (qmail 4009 invoked by uid 99); 5 May 2010 15:33:47 -0000

Content-Transfer-Encoding: quoted-printable

Content-Type: text/plain; charset="utf-8"

X-Originating-IP: 72.236.109.185

User-Agent: Web-Based Email 5.2.13

Message-Id: <20100505083347.deec9532fd532622acfef00cad639f45.0371a89d29.wbe@emai

From: lutz@learning-python.com

To: PP4E@learning-python.com

Cc: lutz@learning-python.com

Subject: testing

Date: Wed, 05 May 2010 08:33:47 -0700

Mime-Version: 1.0

X-Nonspam: None

Testing Python mail tools.

--------------------------------------------------------------------------------

Bye.

This user interface is about as simple as it could be—after connecting to the server, it

prints the complete and raw full text of one message at a time, pausing between each

until you press the Enter key. The input built-in is called to wait for the key press

between message displays. The pause keeps messages from scrolling off the screen too

fast; to make them visually distinct, emails are also separated by lines of dashes.

We could make the display fancier (e.g., we can use the email package to parse headers, bodies, and attachments; watch for examples in this and later chapters), but here we simply display the whole message that was sent. This works well for simple mails like these two, but it can be inconvenient for larger messages with attachments; we’ll improve on this in later clients.

This book won’t cover the full set of headers that may appear in emails, but we’ll make use of some along the way. For example, the X-Mailer header line, if present, typically identifies the sending program; we’ll use it later to identify Python-coded email senders we write. The more common headers such as From and Subject are more crucial to a message, but a variety of extra header lines can be sent in a message’s text as well. The Received headers, for example, trace the machines that a message passed through on its way to the target mailbox.

Because popmail prints the entire raw text of a message, you see all headers here, but

you usually see only a few by default in end-user-oriented mail GUIs such as Outlook

and webmail pages. The raw text here also makes apparent the email structure we noted

earlier: an email in general consists of a set of headers like those here, followed by a

blank line, which is followed by the mail’s main text, though as we’ll see later, they can

be more complex if there are alternative parts or attachments.

The script in Example 13-18 never deletes mail from the server. Mail is simply retrieved

and printed and will be shown again the next time you run the script (barring deletion

in another tool, of course). To really remove mail permanently, we need to call other

methods (e.g., server.dele(msgnum)), but such a capability is best deferred until we

develop more interactive mail tools.

Notice how the reader script decodes each mail content line with line.decode into a

str string for display; as mentioned earlier, poplib returns content as bytes strings in

3.X. In fact, if we change the script to not decode, this becomes more obvious in its

output:

[Press Enter key]

...assorted lines omitted...

b'Date: Wed, 5 May 2010 11:29:36 -0400 (EDT)'

b'From: lutz@rmi.net'

b'Reply-To: lutz@rmi.net'

b'To: pp4e@learning-python.com'

b"Subject: I'm a Lumberjack, and I'm Okay"

b'Mime-Version: 1.0'

b'Content-Type: text/plain; charset=UTF-8'

b'Content-Transfer-Encoding: 7bit'

b'X-Mailer: EarthLink Zoo Mail 1.0'

b''

b'I cut down trees, I skip and jump,'

b'I like to press wild flowers...'

b''

As we’ll see later, we’ll need to decode similarly in order to parse this text with email

tools. The next section exposes the bytes-based interface as well.
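As a preview of that decode-then-parse step, here is a minimal sketch that joins fetched bytes lines, decodes them, and hands the result to the standard email package. The sample lines are invented to resemble a retr result, and the utf8 encoding is an assumption that happens to suit this sample; real mail may need a different choice:

```python
import email

# Invented sample lines, shaped like the list poplib's retr() returns.
lines = [b'From: lutz@rmi.net',
         b'To: pp4e@learning-python.com',
         b"Subject: I'm a Lumberjack, and I'm Okay",
         b'',
         b'I cut down trees, I skip and jump,',
         b'I like to press wild flowers...']

fulltext = b'\n'.join(lines).decode('utf8')   # assumed encoding for this sample
msg = email.message_from_string(fulltext)     # parse headers and body

print(msg['From'])
print(msg['Subject'])
print(msg.get_payload())
```

The parsed object is indexed by header name, so a client can show just the fields it cares about instead of the raw text.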

Fetching Email at the Interactive Prompt

If you don’t mind typing code and reading POP server messages, it’s possible to use the

Python interactive prompt as a simple email client, too. The following session uses two

additional interfaces we’ll apply in later examples:

conn.list()

Returns a tuple that includes a list of “message-number message-size” bytes strings.

conn.top(N, 0)

Retrieves just the header text portion of message number N.

The top call also returns a tuple that includes the list of line strings sent back; its second

argument tells the server how many additional lines after the headers to send, if any. If

all you need are header details, top can be much quicker than the full text fetch of

retr, provided your mail server implements the TOP command (most do):

C:\...\PP4E\Internet\Email> python

>>> from poplib import POP3

>>> conn = POP3('pop.secureserver.net') # connect to server

>>> conn.user('PP4E@learning-python.com') # log in to account

b'+OK '

>>> conn.pass_('xxxxxxxx')

b'+OK '

>>> conn.stat() # num mails, num bytes

(2, 3268)

>>> conn.list()

(b'+OK ', [b'1 1860', b'2 1408'], 16)

>>> conn.top(1, 0)

(b'+OK 1860 octets ', [b'Received: (qmail 7690 invoked from network); 5 May 2010

...lines omitted...

b'X-Originating-IP: 209.86.224.51', b'X-Nonspam: None', b'', b''], 1827)

>>> conn.retr(1)

(b'+OK 1860 octets ', [b'Received: (qmail 7690 invoked from network); 5 May 2010

...lines omitted...

b'X-Originating-IP: 209.86.224.51', b'X-Nonspam: None', b'',

b'I cut down trees, I skip and jump,', b'I like to press wild flowers...',

b'', b''], 1898)

>>> conn.quit()

b'+OK '
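The list reply in this session is easy to pick apart in code. The following sketch, which assumes only the tuple shape shown above, converts the reply into a dictionary of message sizes; message_sizes is a hypothetical helper name, not a poplib call:

```python
# The reply tuple copied from the session above.
reply = (b'+OK ', [b'1 1860', b'2 1408'], 16)

def message_sizes(list_reply):
    """Map message number -> size in bytes from a list()-style reply tuple."""
    resp, entries, octets = list_reply
    sizes = {}
    for entry in entries:
        num, size = entry.decode().split()    # b'1 1860' -> '1', '1860'
        sizes[int(num)] = int(size)
    return sizes

sizes = message_sizes(reply)
print(sizes)
print(sum(sizes.values()))    # agrees with the byte total that stat() reported
```

A client could use such a table to skip very large messages, or to fetch only headers with top for anything over a size threshold.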

Printing the full text of a message at the interactive prompt is easy once it’s fetched: simply decode each line to a normal string as it is printed, like our popmail script did, or concatenate the line strings returned by retr or top, adding a newline between them; any of the following will suffice for an open POP server object:

>>> info, msg, oct = connection.retr(1) # fetch first email in mailbox

>>> for x in msg: print(x.decode()) # four ways to display message lines

>>> print(b'\n'.join(msg).decode())

>>> x = [print(x.decode()) for x in msg]

>>> x = list(map(print, map(bytes.decode, msg)))

Parsing email text to extract headers and components is more complex, especially for

mails with attached and possibly encoded parts, such as images. As we’ll see later in

this chapter, the standard library’s email package can parse the mail’s full or headers

text after it has been fetched with poplib (or imaplib).

See the Python library manual for details on other POP module tools. As of Python 2.4,

there is also a POP3_SSL class in the poplib module that connects to the server over an

SSL-encrypted socket on port 995 by default (the standard port for POP over SSL). It

provides an identical interface, but it uses secure sockets for the conversation where

supported by servers.
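Because the two classes share an interface, a script can pick between them with a simple switch. This is a rough sketch only: pop_class_and_port and connect are hypothetical helper names, POP3_SSL is available only when Python was built with SSL support, and connect needs a reachable server, so it is defined here but not called:

```python
import poplib

def pop_class_and_port(use_ssl):
    """Return the poplib class and its default port for the chosen mode."""
    if use_ssl:
        return poplib.POP3_SSL, poplib.POP3_SSL_PORT
    return poplib.POP3, poplib.POP3_PORT

def connect(host, use_ssl=True):
    """Open a connection; requires a reachable server, so not called here."""
    cls, port = pop_class_and_port(use_ssl)
    return cls(host, port)

print(pop_class_and_port(False)[1], pop_class_and_port(True)[1])
```

The rest of a client, including the user, pass_, stat, and retr calls, would be the same for either class.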

SMTP: Sending Email

There is a proverb in hackerdom that states that every useful computer program eventually grows complex enough to send email. Whether such wisdom rings true or not in practice, the ability to automatically initiate email from within a program is a powerful tool.

For instance, test systems can automatically email failure reports, user interface programs can ship purchase orders to suppliers by email, and so on. Moreover, a portable Python mail script could be used to send messages from any computer in the world with Python and an Internet connection that supports standard email protocols. Freedom from dependence on mail programs like Outlook is an attractive feature if you happen to make your living traveling around teaching Python on all sorts of computers.

Luckily, sending email from within a Python script is just as easy as reading it. In fact,

there are at least four ways to do so:

Calling os.popen to launch a command-line mail program

On some systems, you can send email from a script with a call of the form:

os.popen('mail -s "xxx" a@b.c', 'w').write(text)

As we saw earlier in the book, the popen tool runs the command-line string passed

to its first argument, and returns a file-like object connected to it. If we use an open

mode of w, we are connected to the command’s standard input stream—here, we

write the text of the new mail message to the standard Unix mail command-line

program. The net effect is as if we had run mail interactively, but it happens inside

a running Python script.

Running the sendmail program

The open source sendmail program offers another way to initiate mail from a program. Assuming it is installed and configured on your system, you can launch it using Python tools like the os.popen call of the previous paragraph.

Using the standard smtplib Python module

Python’s standard library comes with support for the client-side interface to

SMTP—the Simple Mail Transfer Protocol—a higher-level Internet standard for

sending mail over sockets. Like the poplib module we met in the previous section,

smtplib hides all the socket and protocol details and can be used to send mail on

any machine with Python and a suitable socket-based Internet link.

Fetching and using third-party packages and tools

Other tools in the open source library provide higher-level mail handling packages

for Python; most build upon one of the prior three techniques.

Of these four options, smtplib is by far the most portable and direct. Using os.popen to spawn a mail program usually works on Unix-like platforms only, not on Windows (it assumes a command-line mail program), and requires spawning one or more processes along the way. And although the sendmail program is powerful, it is also somewhat Unix-biased, complex, and may not be installed even on all Unix-like machines.

By contrast, the smtplib module works on any machine that has Python and an Internet link that supports SMTP access, including Unix, Linux, Mac, and Windows. It sends mail over sockets in-process, instead of starting other programs to do the work. Moreover, SMTP affords us much control over the formatting and routing of email.

SMTP Mail Sender Script

Since SMTP is arguably the best option for sending mail from a Python script, let’s

explore a simple mailing program that illustrates its interfaces. The Python script shown

in Example 13-19 is intended to be used from an interactive command line; it reads a

new mail message from the user and sends the new mail by SMTP using Python’s

smtplib module.

Example 13-19. PP4E\Internet\Email\smtpmail.py

#!/usr/local/bin/python

"""

###########################################################################

use the Python SMTP mail interface module to send email messages; this

is just a simple one-shot send script--see pymail, PyMailGUI, and

PyMailCGI for clients with more user interaction features; also see

popmail.py for a script that retrieves mail, and the mailtools pkg

for attachments and formatting with the standard library email package;

###########################################################################

"""

import smtplib, sys, email.utils, mailconfig

mailserver = mailconfig.smtpservername # ex: smtp.rmi.net

From = input('From? ').strip() # or import from mailconfig

To = input('To? ').strip() # ex: python-list@python.org

Tos = To.split(';') # allow a list of recipients

Subj = input('Subj? ').strip()

Date = email.utils.formatdate() # curr datetime, rfc2822

# standard headers, followed by blank line, followed by text

text = ('From: %s\nTo: %s\nDate: %s\nSubject: %s\n\n' % (From, To, Date, Subj))

print('Type message text, end with line=[Ctrl+d (Unix), Ctrl+z (Windows)]')

while True:
    line = sys.stdin.readline()
    if not line:
        break                                # exit on ctrl-d/z
    #if line[:4] == 'From':
    #    line = '>' + line                   # servers may escape
    text += line

print('Connecting...')
server = smtplib.SMTP(mailserver)            # connect, no log-in step
failed = server.sendmail(From, Tos, text)
server.quit()
if failed:                                   # smtplib may raise exceptions
    print('Failed recipients:', failed)      # too, but let them pass here
else:
    print('No errors.')

print('Bye.')

Most of this script is user interface: it inputs the sender’s address (From), one or more recipient addresses (To, separated by “;” if more than one), and a subject line. The sending date is picked up from the formatdate function in Python’s standard email.utils module, standard header lines are formatted, and the while loop reads message lines until the user types the end-of-file character (Ctrl-Z on Windows, Ctrl-D on Unix).

To be robust, be sure to add a blank line between the header lines and the body in the message’s text; it’s required by the Internet message format standard (RFC 2822), and some SMTP servers enforce this. Our script conforms by inserting an empty line with \n\n at the end of the string format expression: one \n to terminate the current line and another for a blank line. smtplib expands \n to Internet-style \r\n internally prior to transmission, so the short form is fine here. Later in this chapter, we’ll format our messages with the Python email package, which handles such details for us automatically.
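As a small preview of that approach, the sketch below builds the same kind of message with the email package and confirms that the generated text has the blank line between headers and body. The addresses and body are taken from this chapter’s examples, and the fixed timestamp of 0 is an assumption used only to keep the output repeatable:

```python
import email.utils
from email.message import Message

msg = Message()
msg['From'] = 'PP4E@learning-python.com'
msg['To'] = 'PP4E@learning-python.com'
msg['Date'] = email.utils.formatdate(0)    # fixed timestamp for a repeatable demo
msg['Subject'] = 'testing smtpmail'
msg.set_payload('Lovely Spam! Wonderful Spam!\n')

text = msg.as_string()                     # headers, blank line, body
head, body = text.split('\n\n', 1)         # split at the separator line
print(head)
print(body)
```

The resulting text string could be passed to sendmail in place of the manually formatted one; calling formatdate with no argument yields the current time, as in the script.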

The rest of the script is where all the SMTP magic occurs: to send a mail by SMTP,

simply run these two sorts of calls:

server = smtplib.SMTP(mailserver)

Make an instance of the SMTP object, passing in the name of the SMTP server that

will dispatch the message first. If this doesn’t throw an exception, you’re connected

to the SMTP server via a socket when the call returns. Technically, the connect

method establishes connection to a server, but the SMTP object calls this method

automatically if the mail server name is passed in this way.

failed = server.sendmail(From, Tos, text)

Call the SMTP object’s sendmail method, passing in the sender address, one or

more recipient addresses, and the raw text of the message itself with as many

standard mail header lines as you care to provide.

When you’re done, be sure to call the object’s quit method to disconnect from the server and finalize the transaction. Notice that, on failure, the sendmail method may either raise an exception or return a dictionary describing the recipient addresses that were refused (empty if all were accepted); the script handles the latter case itself but lets exceptions kill the script with a Python error message.

Subtly, calling the server object’s quit method after sendmail raises an exception may

or may not work as expected—quit can actually hang until a server timeout if the send

fails internally and leaves the interface in an unexpected state. For instance, this can

occur on Unicode encoding errors when translating the outgoing mail to bytes per the

ASCII scheme (the rset reset request hangs in this case, too). An alternative close

method simply closes the client’s sockets without attempting to send a quit command

to the server; quit calls close internally as a last step (assuming the quit command can

be sent!).
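One way to act on this, sketched here under the assumption that dropping the socket is acceptable when a send fails, is to call close rather than quit on exceptions. The function name send_and_disconnect is hypothetical, and the server argument is any already connected smtplib.SMTP object:

```python
def send_and_disconnect(server, From, Tos, text):
    """
    Send one message on an already connected smtplib.SMTP-like object.
    On an exception, drop the socket with close() rather than risking
    a hang in quit(); otherwise end the session politely with quit().
    """
    try:
        failed = server.sendmail(From, Tos, text)
    except Exception:
        server.close()          # no QUIT command is sent to the server
        raise                   # let the caller see the original error
    server.quit()               # normal case: send QUIT, then close
    return failed               # refused recipients, empty if none
```

In the smtpmail script, this function could stand in for the separate sendmail and quit calls, with the failed result tested as before.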

For advanced usage, SMTP objects provide additional calls not used in this example:

• server.login(user, password) provides an interface to SMTP servers that require and support authentication; watch for this call to appear as an option in the mailtools package example later in this chapter.

• server.starttls([keyfile[, certfile]]) puts the SMTP connection in Transport Layer Security (TLS) mode; all commands that follow are encrypted using the Python ssl module’s socket wrapper support. This call assumes the server supports this mode.

See the Python library manual for more on these and other calls not covered here.
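For servers that need them, these two calls are commonly combined as in the following sketch. It assumes a server that supports both STARTTLS and authentication; secure_login is a hypothetical helper name, and the server argument is an already connected smtplib.SMTP object:

```python
def secure_login(server, user, password):
    """
    Upgrade an smtplib.SMTP-like connection to TLS, then authenticate.
    Assumes the server supports both STARTTLS and login.
    """
    server.ehlo()                   # identify ourselves to the server
    server.starttls()               # switch the socket to encrypted mode
    server.ehlo()                   # re-identify over the encrypted link
    server.login(user, password)    # raises an smtplib exception on failure
```

A script would call this right after creating the SMTP object and before sendmail, so that the password is not sent until the link is encrypted.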

Sending Messages

Let’s ship a few messages across the world. The smtpmail script is a one-shot tool: each

run allows you to send a single new mail message. Like most of the client-side tools in

this chapter, it can be run from any computer with Python and an Internet link that

supports SMTP (most do, though some public access machines may restrict users to

HTTP [Web] access only or require special server SMTP configuration). Here it is running on Windows:

C:\...\PP4E\Internet\Email> smtpmail.py

From? Eric.the.Half.a.Bee@yahoo.com

To? PP4E@learning-python.com

Subj? A B C D E F G

Type message text, end with line=[Ctrl+d (Unix), Ctrl+z (Windows)]

Fiddle de dum, Fiddle de dee,

Eric the half a bee.

Connecting...

No errors.

Bye.

This mail is sent to the book’s email account address (PP4E@learning-python.com), so

it ultimately shows up in the inbox at my ISP, but only after being routed through an

arbitrary number of machines on the Net, and across arbitrarily distant network links.

It’s complex at the bottom, but usually, the Internet “just works.”

Notice the From address, though—it’s completely fictitious (as far as I know, at least).

It turns out that we can usually provide any From address we like because SMTP doesn’t

check its validity (only its general format is checked). Furthermore, unlike POP, there

is usually no notion of a username or password in SMTP, so the sender is more difficult

to determine. We need only pass email to any machine with a server listening on the

SMTP port, and we don’t need an account or login on that machine. Here,

the name Eric.the.Half.a.Bee@yahoo.com works just fine as the sender;

Marketing.Geek.From.Hell@spam.com might work just as well.

In fact, I didn’t import a From email address from the mailconfig.py module on purpose,

because I wanted to be able to demonstrate this behavior; it’s the basis of some of those

annoying junk emails that show up in your mailbox without a real sender’s address.‖

Marketers infected with e-millionaire mania will email advertising to all addresses on

a list without providing a real From address, to cover their tracks.

Normally, of course, you should use the same To address in the message and the SMTP

call and provide your real email address as the From value (that’s the only way people

will be able to reply to your message). Moreover, apart from teasing your significant

other, sending phony addresses is often just plain bad Internet citizenship. Let’s run

the script again to ship off another mail with more politically correct coordinates:

C:\...\PP4E\Internet\Email> smtpmail.py

From? PP4E@learning-python.com

To? PP4E@learning-python.com

Subj? testing smtpmail

Type message text, end with line=[Ctrl+d (Unix), Ctrl+z (Windows)]

Lovely Spam! Wonderful Spam!

Connecting...

No errors.

Bye.

‖We all know by now that such junk mail is usually referred to as spam, but not everyone knows that this

name is a reference to a Monty Python skit in which a restaurant’s customers find it difficult to hear the

reading of menu options over a group of Vikings singing an increasingly loud chorus of “spam, spam, spam…”.

Hence the tie-in to junk email. Spam is used in Python program examples as a sort of generic variable name,

though it also pays homage to the skit.

Verifying receipt

At this point, we could run whatever email tool we normally use to access our mailbox

to verify the results of these two send operations; the two new emails should show up

in our mailbox regardless of which mail client is used to view them. Since we’ve already

written a Python script for reading mail, though, let’s put it to use as a verification

tool—running the popmail script from the last section reveals our two new messages at

the end of the mail list (again parts of the output have been trimmed to conserve space

and protect the innocent here):

C:\...\PP4E\Internet\Email> popmail.py

Password for pop.secureserver.net?

Connecting...

b'+OK <29464.1273155506@pop08.mesa1.secureserver.net>'

There are 4 mail messages in 5326 bytes

(b'+OK ', [b'1 1860', b'2 1408', b'3 1049', b'4 1009'], 32)

--------------------------------------------------------------------------------

[Press Enter key]

...first two mails omitted...

Received: (qmail 25683 invoked from network); 6 May 2010 14:12:07 -0000

Received: from unknown (HELO p3pismtp01-018.prod.phx3.secureserver.net) ([10.6.1

(envelope-sender <Eric.the.Half.a.Bee@yahoo.com>)

by p3plsmtp06-04.prod.phx3.secureserver.net (qmail-1.03) with SMTP

for <PP4E@learning-python.com>; 6 May 2010 14:12:07 -0000

...more deleted...

Received: from [66.194.109.3] by smtp.mailmt.com (ArGoSoft Mail Server .NET v.1.

for <PP4E@learning-python.com>; Thu, 06 May 2010 10:12:12 -0400

From: Eric.the.Half.a.Bee@yahoo.com

To: PP4E@learning-python.com

Date: Thu, 06 May 2010 14:11:07 -0000

Subject: A B C D E F G

Message-ID: <jdlohzf0j8dp8z4x06052010101212@SMTP>

X-FromIP: 66.194.109.3

X-Nonspam: None

Fiddle de dum, Fiddle de dee,

Eric the half a bee.

--------------------------------------------------------------------------------

[Press Enter key]

Received: (qmail 4634 invoked from network); 6 May 2010 14:16:57 -0000

Received: from unknown (HELO p3pismtp01-025.prod.phx3.secureserver.net) ([10.6.1

(envelope-sender <PP4E@learning-python.com>)

by p3plsmtp06-05.prod.phx3.secureserver.net (qmail-1.03) with SMTP

for <PP4E@learning-python.com>; 6 May 2010 14:16:57 -0000

...more deleted...

Received: from [66.194.109.3] by smtp.mailmt.com (ArGoSoft Mail Server .NET v.1.

for <PP4E@learning-python.com>; Thu, 06 May 2010 10:17:03 -0400

From: PP4E@learning-python.com

To: PP4E@learning-python.com

Date: Thu, 06 May 2010 14:16:31 -0000

Subject: testing smtpmail

Message-ID: <8fad1n462667fik006052010101703@SMTP>

X-FromIP: 66.194.109.3

X-Nonspam: None

Lovely Spam! Wonderful Spam!

--------------------------------------------------------------------------------

Bye.

Notice how the fields we input to our script show up as headers and text in the email’s

raw text delivered to the recipient. Technically, some ISPs test to make sure that at least

the domain of the email sender’s address (the part after “@”) is a real, valid domain

name, and disallow delivery if not. As mentioned earlier, some servers also require that

SMTP senders have a direct connection to their network and may require an authentication call with username and password (described near the end of the preceding

section). In the second edition of the book, I used an ISP that let me get away with more

nonsense, but this may vary per server; the rules have tightened since then to limit spam.

Manipulating both From and To

The first mail listed at the end of the preceding section was the one we sent with a fictitious sender address; the second was the more legitimate message. Like sender addresses, header lines are a bit arbitrary under SMTP. Our smtpmail script automatically adds From and To header lines in the message’s text with the same addresses that are passed to the SMTP interface, but only as a polite convention. Sometimes, though, you can’t tell who a mail was sent to, either: to obscure the target audience or to support legitimate email lists, senders may manipulate the contents of both these headers in the message’s text.

For example, if we change smtpmail to not automatically generate a “To:” header line

with the same address(es) sent to the SMTP interface call:

text = ('From: %s\nDate: %s\nSubject: %s\n' % (From, Date, Subj))

we can then manually type a “To:” header that differs from the address we’re really sending to. The “To” address list passed into the smtplib send call gives the true recipients, but the “To:” header line in the text of the message is what most mail clients will display (see smtpmail-noTo.py in the examples package for the code needed to support such anonymous behavior, and be sure to type a blank line after “To:”):

C:\...\PP4E\Internet\Email> smtpmail-noTo.py

From? Eric.the.Half.a.Bee@aol.com

To? PP4E@learning-python.com

Subj? a b c d e f g

Type message text, end with line=(ctrl + D or Z)

To: nobody.in.particular@marketing.com

Spam; Spam and eggs; Spam, spam, and spam.

Connecting...

No errors.

Bye.

In some ways, the From and To addresses in send method calls and message header

lines are similar to addresses on envelopes and letters in envelopes, respectively. The

former is used for routing, but the latter is what the reader sees. Here, From is fictitious

in both places. Moreover, I gave the real To address for the account on the server, but

then gave a fictitious name in the manually typed “To:” header line—the first address

is where it really goes and the second appears in mail clients. If your mail tool picks out

the “To:” line, such mails will look odd when viewed.
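The envelope and letter distinction can be made concrete with a short sketch. The helper below is hypothetical (build_send_args is not part of smtplib); it only assembles the three arguments a sendmail call would receive, keeping the routing addresses separate from the header text, and it sends nothing:

```python
def build_send_args(env_from, env_tos, hdr_from, hdr_to, subj, body):
    """
    Return the (From, Tos, text) triple for a sendmail-style call.
    env_* values route the message; hdr_* values are only text that
    mail clients display.
    """
    text = 'From: %s\nTo: %s\nSubject: %s\n\n%s' % (hdr_from, hdr_to, subj, body)
    return env_from, env_tos, text

From, Tos, text = build_send_args(
    'Eric.the.Half.a.Bee@aol.com', ['PP4E@learning-python.com'],
    'Eric.the.Half.a.Bee@aol.com', 'nobody.in.particular@marketing.com',
    'a b c d e f g', 'Spam; Spam and eggs; Spam, spam, and spam.\n')

print(Tos)                      # where the mail really goes
print(text.splitlines()[1])     # what a mail client shows
```

Nothing ties the second argument to the “To:” line inside the text, which is exactly the freedom the session above exploits.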

For instance, when the mail we just sent shows up in my mailbox at learning-python.com, it’s difficult to tell much about its origin or destination in the webmail interface my ISP provides, as captured in Figure 13-5.

Figure 13-5. Anonymous mail in a web-mail client (see also ahead: PyMailGUI)

Furthermore, this email’s raw text won’t help unless we look closely at the “Received:”

headers added by the machines it has been routed through:

C:\...\PP4E\Internet\Email> popmail.py

Password for pop.secureserver.net?

Connecting...

b'+OK <4802.1273156821@p3plpop03-03.prod.phx3.secureserver.net>'

There are 5 mail messages in 6364 bytes

(b'+OK ', [b'1 1860', b'2 1408', b'3 1049', b'4 1009', b'5 1038'], 40)

--------------------------------------------------------------------------------

[Press Enter key]

...first three mails omitted...

Received: (qmail 30325 invoked from network); 6 May 2010 14:33:45 -0000

Received: from unknown (HELO p3pismtp01-004.prod.phx3.secureserver.net) ([10.6.1

(envelope-sender <Eric.the.Half.a.Bee@aol.com>)

by p3plsmtp06-03.prod.phx3.secureserver.net (qmail-1.03) with SMTP

for <PP4E@learning-python.com>; 6 May 2010 14:33:45 -0000

...more deleted...

Received: from [66.194.109.3] by smtp.mailmt.com (ArGoSoft Mail Server .NET v.1.

for <PP4E@learning-python.com>; Thu, 06 May 2010 10:33:16 -0400

From: Eric.the.Half.a.Bee@aol.com

Date: Thu, 06 May 2010 14:32:32 -0000

Subject: a b c d e f g

To: nobody.in.particular@marketing.com

Message-ID: <66koqg66e0q1c8hl06052010103316@SMTP>

X-FromIP: 66.194.109.3

X-Nonspam: None

Spam; Spam and eggs; Spam, spam, and spam.

--------------------------------------------------------------------------------

Bye.

Once again, though, don’t do this unless you have good cause. This demonstration is

intended only to help you understand how mail headers factor into email processing.

To write an automatic spam filter that deletes incoming junk mail, for instance, you

need to know some of the telltale signs to look for in a message’s text. Spamming

techniques have grown much more sophisticated than simply forging sender and re-

cipient names, of course (you’ll find much more on the subject on the Web at large and

in the SpamBayes mail filter written in Python), but it’s one common trick.

On the other hand, such To address juggling may also be useful in the context of le-

gitimate mailing lists—the name of the list appears in the “To:” header when the mes-

sage is viewed, not the potentially many individual recipients named in the send-mail

call. As the next section’s example demonstrates, a mail client can simply send a mail

to all on the list but insert the general list name in the “To:” header.

But in other contexts, sending email with bogus “From:” and “To:” lines is equivalent

to making anonymous phone calls. Most mailers won’t even let you change the From

line, and they don’t distinguish between the To address and header line. When you

program mail scripts of your own, though, SMTP is wide open in this regard. So be

good out there, OK?

Does Anybody Really Know What Time It Is?

In the prior version of the smtpmail script of Example 13-19, a simple date format was

used for the Date email header that didn’t quite follow the standard date format for

email headers (defined in RFC 5322, formerly RFC 822 and RFC 2822):

>>> import time

>>> time.asctime()

'Wed May 05 17:52:05 2010'

Most servers don’t care and will let any sort of date text appear in date header lines, or

even add one if needed. Clients are often similarly forgiving, but not always; one of my

ISP webmail programs shows dates correctly anyhow, but another leaves such ill-

formed dates blank in mail displays. If you want to be more in line with the standard,

you could format the date header with code like this (the result can be parsed with

standard tools such as the time.strptime call):

import time

gmt = time.gmtime(time.time())

fmt = '%a, %d %b %Y %H:%M:%S GMT'

text = time.strftime(fmt, gmt)        # named text to avoid shadowing the built-in str

hdr = 'Date: ' + text

print(hdr)

The hdr variable’s value looks like this when this code is run:

Date: Wed, 05 May 2010 21:49:32 GMT

The time.strftime call allows arbitrary date and time formatting; time.asctime is just

one standard format. Better yet, do what smtpmail does now—in the newer email pack-

age (described in this chapter), an email.utils call can be used to properly format date

and time automatically. The smtpmail script uses the first of the following format

alternatives:

>>> import email.utils

>>> email.utils.formatdate()

'Wed, 05 May 2010 21:54:28 -0000'

>>> email.utils.formatdate(localtime=True)

'Wed, 05 May 2010 17:54:52 -0400'

>>> email.utils.formatdate(usegmt=True)

'Wed, 05 May 2010 21:55:22 GMT'

See the pymail and mailtools examples in this chapter for additional usage examples;

the latter is reused by the larger PyMailGUI and PyMailCGI email clients later in this

book.
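
As a compact illustration of the formatting options just listed, the following sketch formats one fixed timestamp in two of the three ways and parses the result back. The timestamp value and variable names are chosen here for illustration only, and email.utils.parsedate_to_datetime is assumed to be available, which requires a Python newer than the 3.1 release used in this book (3.3 or later):

```python
import email.utils

# Fixed timestamp (seconds since the epoch) so that results are repeatable;
# omit the argument to format the current time instead.
stamp = 1273096468

utc_numeric = email.utils.formatdate(stamp)              # numeric -0000 zone
utc_named = email.utils.formatdate(stamp, usegmt=True)   # literal GMT zone
date_header = 'Date: ' + utc_numeric

# Round trip: parse the GMT form back into a timezone-aware datetime
parsed = email.utils.parsedate_to_datetime(utc_named)

print(date_header)          # Date: Wed, 05 May 2010 21:54:28 -0000
print(utc_named)            # Wed, 05 May 2010 21:54:28 GMT
print(parsed.isoformat())   # 2010-05-05T21:54:28+00:00
```

The localtime=True variant is left out because its output depends on the time zone of the machine running the code.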

Sending Email at the Interactive Prompt

So where are we in the Internet abstraction model now? With all this email fetching

and sending going on, it’s easy to lose the forest for the trees. Keep in mind that because

mail is transferred over sockets (remember sockets?), they are at the root of all this

activity. All email read and written ultimately consists of formatted bytes shipped over

sockets between computers on the Net. As we’ve seen, though, the POP and SMTP

interfaces in Python hide all the details. Moreover, the scripts we’ve begun writing even

hide the Python interfaces and provide higher-level interactive tools.

Both the popmail and smtpmail scripts provide portable email tools but aren’t quite what

we’d expect in terms of usability these days. Later in this chapter, we’ll use what we’ve

seen thus far to implement a more interactive, console-based mail tool. In the next

chapter, we’ll also code a tkinter email GUI, and then we’ll go on to build a web-based

interface in a later chapter. All of these tools, though, vary primarily in terms of user

interface only; each ultimately employs the Python mail transfer modules we’ve met

here to transfer mail message text over the Internet with sockets.

Before we move on, one more SMTP note: just as for reading mail, we can use the

Python interactive prompt as our email sending client, too, if we type calls manually.

The following, for example, sends a message through my ISP’s SMTP server to two

recipient addresses assumed to be part of a mail list:

C:\...\PP4E\Internet\Email> python

>>> from smtplib import SMTP

>>> conn = SMTP('smtpout.secureserver.net')

>>> conn.sendmail(

... 'PP4E@learning-python.com', # true sender

... ['lutz@rmi.net', 'PP4E@learning-python.com'], # true recipients

... """From: PP4E@learning-python.com

... To: maillist

... Subject: test interactive smtplib

...

... testing 1 2 3...

... """)

{}

>>> conn.quit() # quit() required, Date added

(221, b'Closing connection. Good bye.')

We’ll verify receipt of this message in a later email client program; the “To” recipient

shows up as “maillist” in email clients—a completely valid use case for header manip-

ulation. In fact, you can achieve the same effect with the smtpmail-noTo script by sep-

arating recipient addresses at the “To?” prompt with a semicolon (e.g. lutz@rmi.net;

PP4E@learning-python.com) and typing the email list’s name in the “To:” header line.

Mail clients that support mailing lists automate such steps.

Sending mail interactively this way is a bit tricky to get right, though—header lines are

governed by standards: the blank line after the subject line is required and significant,

for instance, and Date is omitted altogether (one is added for us). Furthermore, mail

formatting gets much more complex as we start writing messages with attachments. In

practice, the email package in the standard library is generally used to construct emails,

before shipping them off with smtplib. The package lets us build mails by assigning

headers and attaching and possibly encoding parts, and creates a correctly formatted

mail text. To learn how, let’s move on to the next section.
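
As a preview, here is a minimal sketch of the same mailing-list message built with the email package rather than typed by hand. The build_list_mail function and its parameter names are hypothetical, invented for this illustration; the send step is left commented out because it needs a reachable SMTP server:

```python
from email.message import Message
from email.utils import formatdate

def build_list_mail(sender, list_name, subject, body):
    """Return full mail text whose To header names the list, not its members."""
    msg = Message()
    msg['From'] = sender
    msg['To'] = list_name            # what mail clients display
    msg['Subject'] = subject
    msg['Date'] = formatdate()       # standard date format, current time
    msg.set_payload(body)
    return msg.as_string()           # headers, blank line, then body

# Envelope recipients: where the mail is really delivered
recipients = ['lutz@rmi.net', 'PP4E@learning-python.com']

text = build_list_mail('PP4E@learning-python.com', 'maillist',
                       'test interactive smtplib', 'testing 1 2 3...\n')
headers, blank, body = text.partition('\n\n')
print(text)

# To send (not run here):
# from smtplib import SMTP
# conn = SMTP('smtpout.secureserver.net')
# conn.sendmail('PP4E@learning-python.com', recipients, text)
# conn.quit()
```

The recipient addresses appear only in the sendmail call, never in the generated text, and the package supplies the required blank line between headers and body.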

email: Parsing and Composing Mail Content

The second edition of this book used a handful of standard library modules (rfc822,

StringIO, and more) to parse the contents of messages, and simple text processing to

compose them. Additionally, that edition included a section on extracting and decoding

attached parts of a message using modules such as mhlib, mimetools, and base64.

In the third edition, those tools were still available, but were, frankly, a bit clumsy and

error-prone. Parsing attachments from messages, for example, was tricky, and com-

posing even basic messages was tedious (in fact, an early printing of the prior edition

contained a potential bug, because it omitted one \n character in a string formatting

operation). Adding attachments to sent messages wasn’t even attempted, due to the

complexity of the formatting involved. Most of these tools are gone completely in Py-

thon 3.X as I write this fourth edition, partly because of their complexity, and partly

because they’ve been made obsolete.

Luckily, things are much simpler today. After the second edition, Python sprouted a

new email package—a powerful collection of tools that automate most of the work

behind parsing and composing email messages. This module gives us an object-based

message interface and handles all the textual message structure details, both analyzing

and creating it. Not only does this eliminate a whole class of potential bugs, it also

promotes more advanced mail processing.

Things like attachments, for instance, become accessible to mere mortals (and authors

with limited book real estate). In fact, an entire original section on manual attachment

parsing and decoding was deleted in the third edition—it’s essentially automatic with

email. The new package parses and constructs headers and attachments; generates

correct email text; decodes and encodes Base64, quoted-printable, and uuencoded

data; and much more.

We won’t cover the email package in its entirety in this book; it is well documented in

Python’s library manual. Our goal here is to explore some example usage code, which

you can study in conjunction with the manuals. But to help get you started, let’s begin

with a quick overview. In a nutshell, the email package is based around the Message

object it provides:

Parsing mail

A mail’s full text, fetched from poplib or imaplib, is parsed into a new Message

object, with an API for accessing its components. In the object, mail headers be-

come dictionary-like keys, and components become a “payload” that can be

walked with a generator interface (more on payloads in a moment).

Creating mail

New mails are composed by creating a new Message object, using an API to attach

headers and parts, and asking the object for its print representation—a correctly

formatted mail message text, ready to be passed to the smtplib module for delivery.

Headers are added by key assignment and attachments by method calls.

In other words, the Message object is used both for accessing existing messages and for

creating new ones from scratch. In both cases, email can automatically handle details

like content encodings (e.g., attached binary images can be treated as text with Base64

encoding and decoding), content types, and more.

Message Objects

Since the email module’s Message object is at the heart of its API, you need a cursory

understanding of its form to get started. In short, it is designed to reflect the structure

of a formatted email message. Each Message consists of three main pieces of

information:

Type

A content type (plain text, HTML text, JPEG image, and so on), encoded as a

MIME main type and a subtype. For instance, “text/html” means the main type is

text and the subtype is HTML (a web page); “image/jpeg” means a JPEG photo.

A “multipart/mixed” type means there are nested parts within the message.

Headers

A dictionary-like mapping interface, with one key per mail header (From, To, and

so on). This interface supports almost all of the usual dictionary operations, and

headers may be fetched or set by normal key indexing.

Content

A “payload,” which represents the mail’s content. This can be either a string

(bytes or str) for simple messages, or a list of additional Message objects for

multipart container messages with attached or alternative parts. For some oddball

types, the payload may be a Python None object.
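
The three pieces can be inspected directly. The following is a small sketch, with made-up message content, that builds one simple message and one multipart container and reports the type, header keys, and payload kind of each:

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# A simple message: the payload is a string
simple = Message()
simple['From'] = 'Jane Doe <jane@doe.com>'
simple.set_payload('The owls are not what they seem...')

# A container message: the payload is a list of nested Message objects
container = MIMEMultipart()
container.attach(MIMEText('main text\n'))

for msg in (simple, container):
    print(msg.get_content_type(),             # type: main type/subtype
          msg.keys(),                         # headers: dictionary-like keys
          type(msg.get_payload()).__name__)   # content: str or list
```

A message with no explicit Content-Type header reports text/plain, and header lookups such as simple['from'] ignore case.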

The MIME type of a Message is key to understanding its content. For example, mails

with attached images may have a main top-level Message (type multipart/mixed), with

three more Message objects in its payload—one for its main text (type text/plain),

followed by two of type image for the photos (type image/jpeg). The photo parts may

be encoded for transmission as text with Base64 or another scheme; the encoding type,

as well as the original image filename, are specified in the part’s headers.
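
The following sketch mirrors that structure in code, under some loud assumptions: the image bytes are placeholders rather than real photos, the filenames are invented, and the behavior shown is that of current Python 3.X releases rather than the 3.1 release discussed later in this section:

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.image import MIMEImage

# Placeholder bytes standing in for real JPEG data (an assumption)
fake_jpeg = b'\xff\xd8\xff\xe0' + bytes(range(16))

root = MIMEMultipart()                        # multipart/mixed by default
root.attach(MIMEText('Photos attached.\n'))   # main text part

for name in ('photo1.jpg', 'photo2.jpg'):     # hypothetical filenames
    # Explicit subtype, so the class need not guess it from the data
    photo = MIMEImage(fake_jpeg, _subtype='jpeg')
    photo.add_header('Content-Disposition', 'attachment', filename=name)
    root.attach(photo)

for part in root.get_payload():               # the three nested parts
    print(part.get_content_type(),
          part['Content-Transfer-Encoding'],
          part.get_filename())

last = root.get_payload()[-1]
restored = last.get_payload(decode=True)      # Base64 undone: bytes again
```

The image parts carry both the Base64 encoding name and the original filename in their own headers, and MIME decoding returns the original bytes.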

Similarly, mails that include both simple text and an HTML alternative will have two

nested Message objects in their payload, of type plain text (text/plain) and HTML text

(text/html), along with a main root Message of type multipart/alternative. Your mail

client decides which part to display, often based on your preferences.

Simpler messages may have just a root Message of type text/plain or text/html, repre-

senting the entire message body. The payload for such mails is a simple string. They

may also have no explicitly given type at all, which generally defaults to text/plain.

Some single-part messages are text/html, with no text/plain alternative—they require

a web browser or other HTML viewer (or a very keen-eyed user).

Other combinations are possible, including some types that are not commonly seen in

practice, such as message/delivery-status. Most messages have a main text part, though

it is not required, and may be nested in a multipart or other construct.

In all cases, an email message is a simple, linear string, but these message structures are

automatically detected when mail text is parsed and are created by your method calls

when new messages are composed. For instance, when creating messages, the message

attach method adds parts for multipart mails, and set_payload sets the entire payload

to a string for simple mails.

Message objects also have assorted properties (e.g., the filename of an attachment), and

they provide a convenient walk generator method, which returns the next Message in

the payload each time through in a for loop or other iteration context. Because the

walker yields the root Message object first (i.e., self), single-part messages don’t have

to be handled as a special case; a nonmultipart message is effectively a Message with a

single item in its payload—itself.
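
To make the walker's behavior concrete, here is a brief sketch (the part_types helper is a hypothetical name used only for this illustration) that runs the same loop over a single-part and a multipart message:

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def part_types(msg):
    """List the content type of every Message that walk visits."""
    return [part.get_content_type() for part in msg.walk()]

single = Message()
single.set_payload('just one part')           # simple mail: string payload

multi = MIMEMultipart()
multi.attach(MIMEText('first\n'))             # multipart mail: attach parts
multi.attach(MIMEText('<p>second</p>\n', 'html'))

print(part_types(single))   # ['text/plain']
print(part_types(multi))    # ['multipart/mixed', 'text/plain', 'text/html']
print(next(single.walk()) is single)          # True: the root comes first
```

Because the same function handles both messages, code that processes parts generically needs no special case for mails without attachments.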

Ultimately, the Message object structure closely mirrors the way mails are formatted as

text. Special header lines in the mail’s text give its type (e.g., plain text or multipart),

as well as the separator used between the content of nested parts. Since the underlying

textual details are automated by the email package—both when parsing and when

composing—we won’t go into further formatting details here.

If you are interested in seeing how this translates to real emails, a great way to learn

mail structure is by inspecting the full raw text of messages displayed by email clients

you already use, as we’ll see with some we meet in this book. In fact, we’ve already seen

a few—see the raw text printed by our earlier POP email scripts for simple mail text

examples. For more on the Message object, and email in general, consult the email

package’s entry in Python’s library manual. We’re skipping details such as its available

encoders and MIME object classes here in the interest of space.

Beyond the email package, the Python library includes other tools for mail-related pro-

cessing. For instance, mimetypes maps a filename to and from a MIME type:

mimetypes.guess_type(filename)

Maps a filename to a MIME type. Name spam.txt maps to text/plain.

mimetypes.guess_extension(contype)

Maps a MIME type to a filename extension. Type text/html maps to .html.

We also used the mimetypes module earlier in this chapter to guess FTP transfer modes

from filenames (see Example 13-10), as well as in Chapter 6, where we used it to guess

a media player for a filename (see the examples there, including playfile.py, Exam-

ple 6-23). For email, these can come in handy when attaching files to a new message

(guess_type) and saving parsed attachments that do not provide a filename

(guess_extension). In fact, this module’s source code is a fairly complete reference to

MIME types. See the library manual for more on these tools.
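
As a rough sketch of the attachment use case, the function below (split_mime_type is a hypothetical name, and falling back to application/octet-stream is an assumption of this example, not something the module does for you) turns a filename into the main type and subtype a mail composer would need:

```python
import mimetypes

def split_mime_type(filename, default='application/octet-stream'):
    """Guess (maintype, subtype) for a file to be attached to a mail."""
    contype, encoding = mimetypes.guess_type(filename)
    if contype is None or encoding is not None:
        contype = default        # unknown or compressed: treat as binary
    maintype, subtype = contype.split('/', 1)
    return maintype, subtype

print(split_mime_type('spam.txt'))                # ('text', 'plain')
print(split_mime_type('photo.jpg'))               # ('image', 'jpeg')
print(split_mime_type('mystery.zzz-no-such-type'))
print(mimetypes.guess_extension('text/html'))     # an HTML extension
```

The guess_type call returns a (type, encoding) pair; the encoding is set for compressed files such as .gz, which this sketch simply treats as opaque binary data.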

Basic email Package Interfaces in Action

Although we can’t provide an exhaustive reference here, let’s step through a simple

interactive session to illustrate the fundamentals of email processing. To compose the

full text of a message—to be delivered with smtplib, for instance—make a Message,

assign headers to its keys, and set its payload to the message body. Converting to a

string yields the mail text. This process is substantially simpler and less error-prone

than the manual text operations we used earlier in Example 13-19 to build mail as

strings:

>>> from email.message import Message

>>> m = Message()

>>> m['from'] = 'Jane Doe <jane@doe.com>'

>>> m['to'] = 'PP4E@learning-python.com'

>>> m.set_payload('The owls are not what they seem...')

>>>

>>> s = str(m)

>>> print(s)

from: Jane Doe <jane@doe.com>

to: PP4E@learning-python.com

The owls are not what they seem...

Parsing a message’s text—like the kind you obtain with poplib—is similarly simple,

and essentially the inverse: we get back a Message object from the text, with keys for

headers and a payload for the body:

>>> s # same as in prior interaction

'from: Jane Doe <jane@doe.com>\nto: PP4E@learning-python.com\n\nThe owls are not...'

>>> from email.parser import Parser

>>> x = Parser().parsestr(s)

>>> x

<email.message.Message object at 0x015EA9F0>

>>>

>>> x['From']

'Jane Doe <jane@doe.com>'

>>> x.get_payload()

'The owls are not what they seem...'

>>> x.items()

[('from', 'Jane Doe <jane@doe.com>'), ('to', 'PP4E@learning-python.com')]

So far this isn’t much different from the older and now-defunct rfc822 module, but as

we’ll see in a moment, things get more interesting when there is more than one part.

For simple messages like this one, the message walk generator treats it as a single-part

mail, of type plain text:

>>> for part in x.walk():

...     print(part.get_content_type())

...     print(part.get_payload())

...

text/plain

The owls are not what they seem...

Handling multipart messages

Making a mail with attachments is a little more work, but not much: we just make a

root Message and attach nested Message objects created from the MIME type object that

corresponds to the type of data we’re attaching. The MIMEText class, for instance, is a

subclass of Message, which is tailored for text parts, and knows how to generate the

right types of header information when printed. MIMEImage and MIMEAudio similarly cus-

tomize Message for images and audio, and also know how to apply Base64 and other

MIME encodings to binary data. The root message is where we store the main headers

of the mail, and we attach parts here, instead of setting the entire payload—the payload

is a list now, not a string. MIMEMultipart is a Message that provides the extra header

protocol we need for the root:

>>> from email.mime.multipart import MIMEMultipart # Message subclasses

>>> from email.mime.text import MIMEText # with extra headers+logic

>>>

>>> top = MIMEMultipart() # root Message object

>>> top['from'] = 'Art <arthur@camelot.org>' # subtype default=mixed

>>> top['to'] = 'PP4E@learning-python.com'

>>>

>>> sub1 = MIMEText('nice red uniforms...\n') # part Message attachments

>>> sub2 = MIMEText(open('data.txt').read())

>>> sub2.add_header('Content-Disposition', 'attachment', filename='data.txt')

>>> top.attach(sub1)

>>> top.attach(sub2)

When we ask for the text, a correctly formatted full mail text is returned, separators

and all, ready to be sent with smtplib—quite a trick, if you’ve ever tried this by hand:

>>> text = top.as_string() # or do: str(top) or print(top)

>>> print(text)

Content-Type: multipart/mixed; boundary="===============1574823535=="

MIME-Version: 1.0

from: Art <arthur@camelot.org>

to: PP4E@learning-python.com

--===============1574823535==

Content-Type: text/plain; charset="us-ascii"

MIME-Version: 1.0

Content-Transfer-Encoding: 7bit

nice red uniforms...

--===============1574823535==

Content-Type: text/plain; charset="us-ascii"

MIME-Version: 1.0

Content-Transfer-Encoding: 7bit

Content-Disposition: attachment; filename="data.txt"

line1

line2

line3

--===============1574823535==--

If we are sent this message and retrieve it via poplib, parsing its full text yields a

Message object just like the one we built to send. The message walk generator allows us

to step through each part, fetching their types and payloads:

>>> text # same as in prior interaction

'Content-Type: multipart/mixed; boundary="===============1574823535=="\nMIME-Ver...'

>>> from email.parser import Parser

>>> msg = Parser().parsestr(text)

>>> msg['from']

'Art <arthur@camelot.org>'

>>> for part in msg.walk():

...     print(part.get_content_type())

...     print(part.get_payload())

...     print()

...

multipart/mixed

[<email.message.Message object at 0x015EC610>,

<email.message.Message object at 0x015EC630>]

text/plain

nice red uniforms...

text/plain

line1

line2

line3

Multipart alternative messages (with text and HTML renditions of the same message)

can be composed and parsed in similar fashion. Because email clients are able to parse

and compose messages with a simple object-based API, they are freed to focus on user-

interface instead of text processing.
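
For instance, a multipart/alternative mail might be sketched as follows. The pick_part helper is a hypothetical stand-in for the choice a mail client makes between renditions; it is not part of the email package:

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.parser import Parser

# Compose: one root of subtype alternative, two renditions of the same text
root = MIMEMultipart('alternative')
root['From'] = 'Art <arthur@camelot.org>'
root['To'] = 'PP4E@learning-python.com'
root.attach(MIMEText('nice red uniforms...\n', 'plain'))
root.attach(MIMEText('<html><body><p>nice <b>red</b> uniforms...</p>'
                     '</body></html>\n', 'html'))

# Parse: the same structure comes back from the generated text
parsed = Parser().parsestr(root.as_string())
types = [part.get_content_type() for part in parsed.walk()]
print(types)   # ['multipart/alternative', 'text/plain', 'text/html']

def pick_part(msg, preferred='text/plain'):
    """Return the preferred rendition, else the first non-container part."""
    candidates = [p for p in msg.walk() if not p.is_multipart()]
    for part in candidates:
        if part.get_content_type() == preferred:
            return part
    return candidates[0] if candidates else None

print(pick_part(parsed).get_payload())
print(pick_part(parsed, 'text/html').get_payload())
```

A real client would base the preferred type on user settings, as described earlier for mails with an HTML alternative.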

Unicode, Internationalization, and the Python 3.1 email Package

Now that I’ve shown you how “cool” the email package is, I unfortunately need to let

you know that it’s not completely operational in Python 3.1. The email package works

as shown for simple messages, but is severely impacted by Python 3.X’s Unicode/bytes

string dichotomy in a number of ways.

In short, the email package in Python 3.1 is still somewhat coded to operate in the realm

of 2.X str text strings. Because these have become Unicode in 3.X, and because some

tools that email uses are now oriented toward bytes strings, which do not mix freely

with str, a variety of conflicts crop up and cause issues for programs that depend upon

this module.

At this writing, a new version of email is being developed which will handle bytes and

Unicode encodings better, but the going consensus is that it won’t be folded back into

Python until release 3.3 or later, long after this book’s release. Although a few patches

might make their way into 3.2, the current sense is that fully addressing the package’s

problems appears to require a full redesign.

To be fair, it’s a substantial problem. Email has historically been oriented toward single-

byte ASCII text, and generalizing it for Unicode is difficult to do well. In fact, the same

holds true for most of the Internet today—as discussed elsewhere in this chapter, FTP,

POP, SMTP, and even webpage bytes fetched over HTTP pose the same sorts of issues.

Interpreting the bytes shipped over networks as text is easy if the mapping is one-to-

one, but allowing for arbitrary Unicode encoding in that text opens a Pandora’s box of

dilemmas. The extra complexity is necessary today, but, as email attests, can be a

daunting task.

Frankly, I considered not releasing this edition of this book until this package’s issues

could be resolved, but I decided to go forward because a new email package may be

years away (two Python releases, by all accounts). Moreover, the issues serve as a case

study of the types of problems you’ll run into in the real world of large-scale software

development. Things change over time, and program code is no exception.

Instead, this book’s examples provide new Unicode and Internationalization support

but adopt policies to work around issues where possible. Programs in books are meant

to be educational, after all, not commercially viable. Given the state of the email package

that the examples depend on, though, the solutions used here might not be completely

universal, and there may be additional Unicode issues lurking. To address the future,

watch this book’s website (described in the Preface) for updated notes and code ex-

amples if/when the anticipated new email package appears. Here, we’ll work with what

we have.

The good news is that we’ll be able to make use of email in its current form to build

fairly sophisticated and full-featured email clients in this book anyhow. It still offers an

amazing number of tools, including MIME encoding and decoding, message formatting

and parsing, Internationalized headers extraction and construction, and more. The bad

news is that this will require a handful of obscure workarounds and may need to be

changed in the future, though few software projects are exempt from such realities.

Because email’s limitations have implications for later email code in this book, I’m

going to quickly run through them in this section. Some of this can be safely saved for

later reference, but parts of later examples may be difficult to understand if you don’t

have this background. The upside is that exploring the package’s limitations here also

serves as a vehicle for digging a bit deeper into the email package’s interfaces in general.

Parser decoding requirement

The first Unicode issue in Python 3.1’s email package is nearly a showstopper in some

contexts: the bytes strings of the sort produced by poplib for mail fetches must be

decoded to str prior to parsing with email. Unfortunately, because there may not be

enough information to know how to decode the message bytes per Unicode, some

clients of this package may need to be generalized to detect whole-message encodings

prior to parsing; in worst cases other than email that may mandate mixed data types,

the current package cannot be used at all. Here’s the issue live:

>>> text # from prior example in this section

'Content-Type: multipart/mixed; boundary="===============1574823535=="\nMIME-Ver...'

>>> btext = text.encode()

>>> btext

b'Content-Type: multipart/mixed; boundary="===============1574823535=="\nMIME-Ve...'

>>> msg = Parser().parsestr(text) # email parser expects Unicode str

>>> msg = Parser().parsestr(btext) # but poplib fetches email as bytes!

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

File "C:\Python31\lib\email\parser.py", line 82, in parsestr

return self.parse(StringIO(text), headersonly=headersonly)

TypeError: initial_value must be str or None, not bytes

>>> msg = Parser().parsestr(btext.decode()) # okay per default

>>> msg = Parser().parsestr(btext.decode('utf8')) # ascii encoded (default)

>>> msg = Parser().parsestr(btext.decode('latin1')) # ascii is same in all 3

>>> msg = Parser().parsestr(btext.decode('ascii'))

This is less than ideal, as a bytes-based email would be able to handle message encod-

ings more directly. As mentioned, though, the email package is not really fully func-

tional in Python 3.1, because of its legacy str focus, and the sharp distinction that

Python 3.X makes between Unicode text and byte strings. In this case, its parser should

accept bytes and not expect clients to know how to decode.

Because of that, this book’s email clients take simplistic approaches to decoding fetched

message bytes to be parsed by email. Specifically, full-text decoding will try a user-

configurable encoding name, then fall back on trying common types as a heuristic, and

finally attempt to decode just message headers.

This will suffice for the examples shown but may need to be enhanced for broader

applicability. In some cases, encoding may have to be determined by other schemes

such as inspecting email headers (if present at all), guessing from bytes structure

analysis, or dynamic user feedback. Adding such enhancements in a robust fashion is

likely too complex to attempt in a book’s example code, and it is better performed in

common standard library tools in any event.
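
The general shape of such a decoding policy can be sketched as follows. This is a simplified illustration only, not the code used by this book's clients: the decode_full_text name, its parameters, and the particular list of fallback encodings are all assumptions made here. The last two lines additionally assume Python 3.2 or later, where the email package gained a bytes-accepting parser interface:

```python
import email
from email.parser import Parser

def decode_full_text(raw_bytes, preferred='utf-8',
                     fallbacks=('ascii', 'utf-8', 'latin-1')):
    """Decode fetched mail bytes: preferred encoding first, then guesses."""
    for name in (preferred,) + tuple(fallbacks):
        try:
            return raw_bytes.decode(name)
        except (UnicodeDecodeError, LookupError):
            continue             # wrong guess or unknown name: try the next
    # Reached only if the fallbacks omit a catch-all such as latin-1
    return raw_bytes.decode('ascii', errors='replace')

# Bytes of the kind poplib returns; \xe9 is not valid UTF-8 here
raw = b'From: a@b.c\nSubject: caf\xe9\n\nbody\n'

text = decode_full_text(raw)                  # falls through to latin-1
msg = Parser().parsestr(text)
print(msg.get_payload())

# In Python 3.2 and later, bytes can be parsed without decoding first
msg2 = email.message_from_bytes(raw)
print(msg2.get_payload())
```

Because Latin-1 maps every byte to some character, it never fails, which makes it a convenient last guess but also means a wrong guess passes silently.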

Really, robust decoding of mail text may not be possible today at all, if it requires

headers inspections—we can’t inspect a message’s encoding information headers un-

less we parse the message, but we can’t parse a message with 3.1’s email package unless

we already know the encoding. That is, scripts may need to parse in order to decode,

but they need to decode in order to parse! The byte strings of poplib and Unicode strings

of email in 3.1 are fundamentally at odds. Even within its own libraries, Python 3.X’s

changes have created a chicken-and-egg dependency problem that still exists nearly

two years after 3.0’s release.

Short of writing our own email parser, or pursuing other similarly complex approaches,

the best bet today for fetched messages seems to be decoding per user preferences and

defaults, and that’s how we’ll proceed in this edition. The PyMailGUI client of Chap-

ter 14, for instance, will allow Unicode encodings for full mail text to be set on a per-

session basis.

The real issue, of course, is that email in general is inherently complicated by the pres-

ence of arbitrary text encodings. Besides full mail text, we also must consider Unicode

encoding issues for the text components of a message once it’s parsed—both its text

parts and its message headers. To see why, let’s move on.

Related Issue for CGI scripts: I should also note that the full text decoding

issue may not be as large a factor for email as it is for some other

email package clients. Because the original email standards call for

ASCII text and require binary data to be MIME encoded, most emails

are likely to decode properly according to a 7- or 8-bit encoding such as

Latin-1.

As we’ll see in Chapter 15, though, a more insurmountable and related

issue looms for server-side scripts that support CGI file uploads on the

Web—because Python’s CGI module also uses the email package to

parse multipart form data; because this package requires data to be de-

coded to str for parsing; and because such data might have mixed text

and binary data (including raw binary data that is not MIME-encoded,

text of any encoding, and even arbitrary combinations of these), these

uploads fail in Python 3.1 if any binary or incompatible text files are

included. The cgi module triggers Unicode decoding or type errors in-

ternally, before the Python script has a chance to intervene.

CGI uploads worked in Python 2.X, because the str type represented

both possibly encoded text and binary data. Saving this type’s content

to a binary mode file as a string of bytes in 2.X sufficed for both arbitrary

text and binary data such as images. Email parsing worked in 2.X for

the same reason. For better or worse, the 3.X str/bytes dichotomy

makes this generality impossible.

In other words, although we can generally work around the email

parser’s str requirement for fetched emails by decoding per an 8-bit

encoding, it’s much more malignant for web scripting today. Watch for

more details on this in Chapter 15, and stay tuned for a future fix, which

may have materialized by the time you read these words.

Text payload encodings: Handling mixed type results

Our next email Unicode issue seems to fly in the face of Python’s generic programming

model: the data types of message payload objects may differ, depending on how they

are fetched. Especially for programs that walk and process payloads of mail parts

generically, this complicates code.

Specifically, the Message object’s get_payload method we used earlier accepts an op-

tional decode argument to control automatic email-style MIME decoding (e.g., Base64,

uuencode, quoted-printable). If this argument is passed in as 1 (or equivalently, True),

the payload’s data is MIME-decoded when fetched, if required. Because this argument

is so useful for complex messages with arbitrary parts, it will normally be passed as true

in all cases. Binary parts are normally MIME-encoded, but even text parts might also

be present in Base64 or another MIME form if their bytes fall outside email standards.

Some types of Unicode text, for example, require MIME encoding.

The upshot is that get_payload normally returns str strings for str text parts, but re-

turns bytes strings if its decode argument is true—even if the message part is known to

be text by nature. If this argument is not used, the payload’s type depends upon how

it was set: str or bytes. Because Python 3.X does not allow str and bytes to be mixed

freely, clients that need to use the result in text processing or store it in files need to

accommodate the difference. Let’s run some code to illustrate:

>>> from email.message import Message

>>> m = Message()

>>> m['From'] = 'Lancelot'

>>> m.set_payload('Line?...')

>>> m['From']

'Lancelot'

>>> m.get_payload() # str, if payload is str

'Line?...'

>>> m.get_payload(decode=1) # bytes, if MIME decode (same as decode=True)

b'Line?...'

The combination of these different return types and Python 3.X’s strict str/bytes
dichotomy can cause problems in code that processes the result, unless that code decodes
carefully:

>>> m.get_payload(decode=True) + 'spam' # can't mix in 3.X!

TypeError: can't concat bytes to str

>>> m.get_payload(decode=True).decode() + 'spam' # convert if required

'Line?...spam'

To make sense of these examples, it may help to remember that there are two different

concepts of “encoding” for email text:

• Email-style MIME encodings such as Base64, uuencode, and quoted-printable,
which are applied to binary and otherwise unusual content to make it acceptable
for transmission in email text
• Unicode text encodings for strings in general, which apply to message text as well
as its parts, and may be required after MIME encoding for text message parts

The email package handles email-style MIME encodings automatically when we pass

decode=1 to fetch parsed payloads, or generate text for messages that have nonprintable

parts, but scripts still need to take Unicode encodings into consideration because of

Python 3.X’s sharp string types differentiation. For example, the first decode in the

following refers to MIME, and the second to Unicode:

m.get_payload(decode=True).decode() # to bytes via MIME, then to str via Unicode
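The two decoding steps can also be run by hand with the standard base64 module, which may make the distinction easier to see. This is only an illustrative sketch: the Base64 text and the latin-1 charset are arbitrary sample values, and the variable names are ours.

```python
import base64

mime_text = 'QeRC'                    # Base64 text as it might travel in a mail
raw = base64.b64decode(mime_text)     # step 1: MIME decoding, str -> bytes
text = raw.decode('latin-1')          # step 2: Unicode decoding, bytes -> str

print(raw)                            # b'A\xe4B'
print(text)                           # AäB
```

The first step knows nothing about characters, and the second knows nothing about email; get_payload(decode=True) performs only the first.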

Even without the MIME decode argument, the payload type may also differ if it is stored

in different forms:

>>> m = Message(); m.set_payload('spam'); m.get_payload() # fetched as stored

'spam'

>>> m = Message(); m.set_payload(b'spam'); m.get_payload()

b'spam'

Moreover, the same holds true for the text-specific MIME subclass (though as we’ll see

later in this section, we cannot pass a bytes to its constructor to force a binary payload):

>>> from email.mime.text import MIMEText

>>> m = MIMEText('Line...?')

>>> m['From'] = 'Lancelot'

>>> m['From']

'Lancelot'

>>> m.get_payload()

'Line...?'

>>> m.get_payload(decode=1)

b'Line...?'

Unfortunately, the fact that payloads might be either str or bytes today not only flies

in the face of Python’s type-neutral mindset, it can complicate your code—scripts may

need to convert in contexts that require one or the other type. For instance, GUI libraries

might allow both, but file saves and web page content generation may be less flexible.

In our example programs, we’ll process payloads as bytes whenever possible, but de-

code to str text in cases where required using the encoding information available in

the header API described in the next section.
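One way to contain this type difference is to funnel every fetched payload through a pair of small conversion helpers. The following is a minimal sketch, not code from this book’s examples; the function names as_bytes and as_text and the UTF-8 default are assumptions made for illustration.

```python
from email.message import Message

def as_bytes(payload, encoding='utf-8'):
    """Return payload as bytes, encoding str payloads if needed."""
    if isinstance(payload, bytes):
        return payload
    return payload.encode(encoding)

def as_text(payload, encoding='utf-8', errors='replace'):
    """Return payload as str, decoding bytes payloads if needed."""
    if isinstance(payload, str):
        return payload
    return payload.decode(encoding, errors)

m = Message()
m.set_payload('Line?...')
print(as_text(m.get_payload()) + 'spam')              # str passes through
print(as_text(m.get_payload(decode=True)) + 'spam')   # bytes are decoded first
```

A fixed default encoding is only a stopgap; the charset recorded in the part’s headers is normally a better choice when it is present.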

Text payload encodings: Using header information to decode

More profoundly, text in email can be even richer than implied so far—in principle,

text payloads of a single message may be encoded in a variety of different Unicode

schemes (e.g., three HTML webpage file attachments, all in different Unicode encod-

ings, and possibly different than the full message text’s encoding). Although treating

such text as binary byte strings can sometimes finesse encoding issues, saving such parts

in text-mode files for opening must respect the original encoding types. Further, any

text processing performed on such parts will be similarly type-specific.

Luckily, the email package both adds character-set headers when generating message

text and retains character-set information for parts if it is present when parsing message

text. For instance, adding non-ASCII text attachments simply requires passing in an

encoding name—the appropriate message headers are added automatically on text

generation, and the character set is available directly via the get_content_charset

method:

>>> s = b'A\xe4B'

>>> s.decode('latin1')

'AäB'

>>> from email.message import Message

>>> m = Message()

>>> m.set_payload(b'A\xe4B', charset='latin1') # or 'latin-1': see ahead

>>> t = m.as_string()

>>> print(t)

MIME-Version: 1.0

Content-Type: text/plain; charset="latin1"

Content-Transfer-Encoding: base64

QeRC

>>> m.get_content_charset()

'latin1'

Notice how email automatically applies Base64 MIME encoding to non-ASCII text

parts on generation, to conform to email standards. The same is true for the more

specific MIME text subclass of Message:

>>> from email.mime.text import MIMEText

>>> m = MIMEText(b'A\xe4B', _charset='latin1')

>>> t = m.as_string()

>>> print(t)

Content-Type: text/plain; charset="latin1"

MIME-Version: 1.0

Content-Transfer-Encoding: base64

QeRC

>>> m.get_content_charset()

'latin1'

Now, if we parse this message’s text string with email, we get back a new Message whose

text payload is the Base64 MIME-encoded text used to represent the non-ASCII Uni-

code string. Requesting MIME decoding for the payload with decode=1 returns the byte

string we originally attached:

>>> from email.parser import Parser

>>> q = Parser().parsestr(t)

>>> q

<email.message.Message object at 0x019ECA50>

>>> q.get_content_type()

'text/plain'

>>> q._payload

'QeRC\n'

>>> q.get_payload()

'QeRC\n'

>>> q.get_payload(decode=1)

b'A\xe4B'

However, running Unicode decoding on this byte string to convert to text fails if we
rely on the decode method’s default encoding (UTF-8). To be more accurate, and

support a wide variety of text types, we need to use the character-set information saved

by the parser and attached to the Message object. This is especially important if we need

to save the data to a file—we either have to store as bytes in binary mode files, or specify

the correct (or at least a compatible) Unicode encoding in order to use such strings for

text-mode files. Decoding manually works the same way:

>>> q.get_payload(decode=1).decode()

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: unexpected

>>> q.get_content_charset()

'latin1'

>>> q.get_payload(decode=1).decode('latin1') # known type

'AäB'

>>> q.get_payload(decode=1).decode(q.get_content_charset()) # allow any type

'AäB'

In fact, all the header details are available on Message objects, if we know where to look.

The character set can also be absent entirely, in which case it’s returned as None; clients

need to define policies for such ambiguous text (they might try common types, guess,

or treat the data as a raw byte string):

>>> q['content-type'] # mapping interface

'text/plain; charset="latin1"'

>>> q.items()

[('Content-Type', 'text/plain; charset="latin1"'), ('MIME-Version', '1.0'),

('Content-Transfer-Encoding', 'base64')]

>>> q.get_params(header='Content-Type') # param interface

[('text/plain', ''), ('charset', 'latin1')]

>>> q.get_param('charset', header='Content-Type')

'latin1'

>>> charset = q.get_content_charset() # might be missing

>>> if charset:
...     print(q.get_payload(decode=1).decode(charset))
...

AäB

This handles encodings for message text parts in parsed emails. For composing new

emails, we still must apply session-wide user settings or allow the user to specify an

encoding for each part interactively. In some of this book’s email clients, payload con-

versions are performed as needed—using encoding information in message headers

after parsing and provided by users during mail composition.
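The policy choices mentioned above for parsed parts (use the declared character set, else try common types, else fall back) can be folded into one helper. This is a sketch under stated assumptions rather than the logic of this book’s mailtools package: the name payload_text and the fallback list are hypothetical.

```python
from email.parser import Parser

def payload_text(part, fallbacks=('utf-8', 'latin-1')):
    """Return a part's text as str, or None for multipart containers."""
    raw = part.get_payload(decode=True)         # MIME-decode to bytes
    if raw is None:                             # multipart: no direct payload
        return None
    declared = part.get_content_charset()       # None if no charset was given
    candidates = ([declared] if declared else []) + list(fallbacks)
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):   # bad guess or unknown name
            continue
    return raw.decode('latin-1', errors='replace')  # last resort: never fails

mailtext = ('Content-Type: text/plain; charset="latin1"\n'
            'Content-Transfer-Encoding: base64\n'
            '\n'
            'QeRC\n')
print(payload_text(Parser().parsestr(mailtext)))    # AäB
```

Ending the fallback list with latin-1 means some str is always produced, at the cost of possibly showing the wrong characters; a client that prefers to keep raw bytes for unknown text would return raw instead.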

Message header encodings: email package support

On a related note, the email package also provides support for encoding and decoding

message headers themselves (e.g., From, Subject) per email standards when they are

not simple text. Such headers are often called Internationalized (or i18n) headers, be-

cause they support inclusion of non-ASCII character set text in emails. This term is also

sometimes used to refer to encoded text of message payloads; unlike message headers,

though, message payload encoding is used for both international Unicode text and truly

binary data such as images (as we’ll see in the next section).

Like mail payload parts, i18n headers are encoded specially for email, and may also be

encoded per Unicode. For instance, here’s how to decode an encoded subject line from

an arguably spammish email that just showed up in my inbox; its =?UTF-8?Q? preamble

declares that the data following it is UTF-8 encoded Unicode text, which is also MIME-

encoded per quoted-printable for transmission in email (in short, unlike the prior sec-

tion’s part payloads, which declare their encodings in separate header lines, headers

themselves may declare their Unicode and MIME encodings by embedding them in

their own content this way):

>>> rawheader = '=?UTF-8?Q?Introducing=20Top=20Values=3A=20A=20Special=20Selection=20of=20Great=20Money=20Savers?='

>>> from email.header import decode_header # decode per email+MIME

>>> decode_header(rawheader)

[(b'Introducing Top Values: A Special Selection of Great Money Savers', 'utf-8')]

>>> bin, enc = decode_header(rawheader)[0] # and decode per Unicode

>>> bin, enc

(b'Introducing Top Values: A Special Selection of Great Money Savers', 'utf-8')

>>> bin.decode(enc)

'Introducing Top Values: A Special Selection of Great Money Savers'

Subtly, the email package can return multiple parts if there are encoded substrings in

the header, and each must be decoded individually and joined to produce decoded

header text. Even more subtly, in 3.1, this package returns all bytes when any substring

(or the entire header) is encoded but returns str for a fully unencoded header, and

unencoded substrings returned as bytes are encoded per “raw-unicode-escape” in the

package—an encoding scheme useful to convert str to bytes when no encoding type

applies:

>>> from email.header import decode_header

>>> S1 = 'Man where did you get that assistant?'

>>> S2 = '=?utf-8?q?Man_where_did_you_get_that_assistant=3F?='

>>> S3 = 'Man where did you get that =?UTF-8?Q?assistant=3F?='

# str: don't decode()

>>> decode_header(S1)

[('Man where did you get that assistant?', None)]

# bytes: do decode()

>>> decode_header(S2)

[(b'Man where did you get that assistant?', 'utf-8')]

# bytes: do decode() using raw-unicode-escape applied in package

>>> decode_header(S3)

[(b'Man where did you get that', None), (b'assistant?', 'utf-8')]

# join decoded parts if more than one

>>> parts = decode_header(S3)

>>> ' '.join(abytes.decode('raw-unicode-escape' if enc is None else enc)

... for (abytes, enc) in parts)

'Man where did you get that assistant?'

We’ll use logic similar to the last step here in the mailtools package ahead, but also

retain str substrings intact without attempting to decode.
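The three cases just shown (str for a fully unencoded header, bytes with an encoding name, and bytes with None) can be handled in one function. The following is a hedged sketch, not the mailtools code itself: the name decode_header_text is hypothetical, and because whitespace around the returned parts may or may not be preserved depending on the Python release, the sketch assumes it is acceptable to strip each part and rejoin with single spaces.

```python
from email.header import decode_header

def decode_header_text(rawheader):
    """Decode a possibly i18n-encoded header value to a single str."""
    pieces = []
    for data, enc in decode_header(rawheader):
        if isinstance(data, bytes):                  # str parts are kept as is
            try:
                data = data.decode(enc or 'raw-unicode-escape')
            except (UnicodeDecodeError, LookupError):
                data = data.decode('latin-1')        # fallback: never fails
        pieces.append(data.strip())
    return ' '.join(piece for piece in pieces if piece)

S1 = 'Man where did you get that assistant?'
S2 = '=?utf-8?q?Man_where_did_you_get_that_assistant=3F?='
S3 = 'Man where did you get that =?UTF-8?Q?assistant=3F?='
for header in (S1, S2, S3):
    print(decode_header_text(header))
```

All three inputs should come out as the same plain text line.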

Late-breaking news: As I write this in mid-2010, it seems possible that

this mixed type, nonpolymorphic, and frankly, non-Pythonic API be-

havior may be addressed in a future Python release. In response to a rant

posted on the Python developers list by a book author whose work you

might be familiar with, there is presently a vigorous discussion of the

topic there. Among other ideas is a proposal for a bytes-like type which

carries with it an explicit Unicode encoding; this may make it possible

to treat some text cases in a more generic fashion. While it’s impossible

to foresee the outcome of such proposals, it’s good to see that the issues

are being actively explored. Stay tuned to this book’s website for further

developments in the Python 3.X library API and Unicode stories.

Message address header encodings and parsing, and header creation

One wrinkle pertaining to the prior section: for message headers that contain email

addresses (e.g., From), the name component of the name/address pair might be encoded

this way as well. Because the email package’s header parser expects encoded substrings

to be followed by whitespace or the end of string, we cannot ask it to decode a complete

address-related header—quotes around name components will fail.

To support such Internationalized address headers, we must also parse out the first

part of the email address and then decode. First of all, we need to extract the name and

address parts of an email address using email package tools:

>>> from email.utils import parseaddr, formataddr

>>> p = parseaddr('"Smith, Bob" <bob@bob.com>') # split into name/addr pair

>>> p # unencoded addr

('Smith, Bob', 'bob@bob.com')

>>> formataddr(p)

'"Smith, Bob" <bob@bob.com>'

>>> parseaddr('Bob Smith <bob@bob.com>') # unquoted name part

('Bob Smith', 'bob@bob.com')

>>> formataddr(parseaddr('Bob Smith <bob@bob.com>'))

'Bob Smith <bob@bob.com>'

>>> parseaddr('bob@bob.com') # simple, no name

('', 'bob@bob.com')

>>> formataddr(parseaddr('bob@bob.com'))

'bob@bob.com'

Fields with multiple addresses (e.g., To) separate individual addresses by commas.

Since email names might embed commas, too, blindly splitting on commas to run each

through parsing won’t always work. Instead, another utility can be used to parse each
address individually: getaddresses ignores commas in names when splitting apart separate
addresses, and parseaddr does, too, because it simply returns the first pair in the

getaddresses result (some line breaks were added to the following for legibility):

>>> from email.utils import getaddresses

>>> multi = '"Smith, Bob" <bob@bob.com>, Bob Smith <bob@bob.com>, bob@bob.com,

"Bob" <bob@bob.com>'

>>> getaddresses([multi])

[('Smith, Bob', 'bob@bob.com'), ('Bob Smith', 'bob@bob.com'), ('', 'bob@bob.com'),

('Bob', 'bob@bob.com')]

>>> [formataddr(pair) for pair in getaddresses([multi])]

['"Smith, Bob" <bob@bob.com>', 'Bob Smith <bob@bob.com>', 'bob@bob.com',

'Bob <bob@bob.com>']

>>> ', '.join([formataddr(pair) for pair in getaddresses([multi])])

'"Smith, Bob" <bob@bob.com>, Bob Smith <bob@bob.com>, bob@bob.com,

Bob <bob@bob.com>'

>>> getaddresses(['bob@bob.com']) # handles single address cases too

[('', 'bob@bob.com')]
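To see concretely why a blind split on commas goes wrong, it may help to compare it with getaddresses on the same header. This is a small illustrative sketch; the header value is a shortened form of the one used above.

```python
from email.utils import getaddresses

header = '"Smith, Bob" <bob@bob.com>, Bob Smith <bob@bob.com>'

# naive approach: the comma inside the quoted name is treated as a separator
naive = [piece.strip() for piece in header.split(',')]
print(len(naive))     # 3
print(naive[0])       # "Smith

# getaddresses honors the quotes and returns one pair per real address
pairs = getaddresses([header])
print(pairs)          # [('Smith, Bob', 'bob@bob.com'), ('Bob Smith', 'bob@bob.com')]
```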

Now, decoding email addresses is really just an extra step before and after the normal

header decoding logic we saw earlier:

>>> rawfromheader = '"=?UTF-8?Q?Walmart?=" <newsletters@walmart.com>'

>>> from email.utils import parseaddr, formataddr

>>> from email.header import decode_header

>>> name, addr = parseaddr(rawfromheader) # split into name/addr parts

>>> name, addr

('=?UTF-8?Q?Walmart?=', 'newsletters@walmart.com')

>>> abytes, aenc = decode_header(name)[0] # do email+MIME decoding

>>> abytes, aenc

(b'Walmart', 'utf-8')

>>> name = abytes.decode(aenc) # do Unicode decoding

>>> name

'Walmart'

>>> formataddr((name, addr)) # put parts back together

'Walmart <newsletters@walmart.com>'

Although From headers will typically have just one address, to be fully robust we need

to apply this to every address in headers, such as To, Cc, and Bcc. Again, the multiad-

dress getaddresses utility avoids comma clashes between names and address separa-

tors; since it also handles the single address case, it suffices for From headers as well:

>>> rawfromheader = '"=?UTF-8?Q?Walmart?=" <newsletters@walmart.com>'

>>> rawtoheader = rawfromheader + ', ' + rawfromheader

>>> rawtoheader

'"=?UTF-8?Q?Walmart?=" <newsletters@walmart.com>, "=?UTF-8?Q?Walmart?=" <newsletters@walmart.com>'

>>> pairs = getaddresses([rawtoheader])

>>> pairs

[('=?UTF-8?Q?Walmart?=', 'newsletters@walmart.com'), ('=?UTF-8?Q?Walmart?=', 'newsletters@walmart.com')]

>>> addrs = []

>>> for name, addr in pairs:
...     abytes, aenc = decode_header(name)[0] # email+MIME
...     name = abytes.decode(aenc) # Unicode
...     addrs.append(formataddr((name, addr))) # one or more addrs
...

>>> ', '.join(addrs)

'Walmart <newsletters@walmart.com>, Walmart <newsletters@walmart.com>'

These tools are generally forgiving of unencoded content and return it intact. To

be robust, though, the last portion of code here should also allow for multiple parts

returned by decode_header (for encoded substrings), None encoding values for parts (for

unencoded substrings), and str substring values instead of bytes (for fully unencoded

names).
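A more defensive version of the loop above, allowing for the cases just listed, might look like the following. It is a sketch only: decode_name and decode_addresses are hypothetical names, parts are stripped and rejoined with single spaces on the assumption that whitespace handling varies by release, and the result is left as (name, address) pairs so that callers can format them as they see fit.

```python
from email.header import decode_header
from email.utils import getaddresses

def decode_name(name):
    """Decode one name component; unencoded str parts pass through."""
    if not name:
        return ''
    pieces = []
    for data, enc in decode_header(name):        # may return several parts
        if isinstance(data, bytes):              # str means fully unencoded
            try:
                data = data.decode(enc or 'raw-unicode-escape')
            except (UnicodeDecodeError, LookupError):
                data = data.decode('latin-1')    # fallback: never fails
        pieces.append(data.strip())
    return ' '.join(piece for piece in pieces if piece)

def decode_addresses(rawheader):
    """Return [(decoded name, address), ...] for a From/To/Cc style header."""
    return [(decode_name(name), addr)
            for (name, addr) in getaddresses([rawheader])]

raw = '"=?UTF-8?Q?Walmart?=" <newsletters@walmart.com>, bob@bob.com'
print(decode_addresses(raw))
```

Each resulting pair can be passed to formataddr to rebuild a header string, as in the session above.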

Decoding this way applies both MIME and Unicode decoding steps to fetched mails.

Creating properly encoded headers for inclusion in new mails composed and sent is

similarly straightforward:

>>> from email.header import make_header

>>> hdr = make_header([(b'A\xc4B\xe4C', 'latin-1')])

>>> print(hdr)

AÄBäC

>>> print(hdr.encode())

=?iso-8859-1?q?A=C4B=E4C?=

>>> decode_header(hdr.encode())

[(b'A\xc4B\xe4C', 'iso-8859-1')]

This can be applied to entire headers such as Subject, as well as the name component

of each email address in an address-related header line such as From and To (use

getaddresses to split into individual addresses first if needed). The header object pro-

vides an alternative interface; both techniques handle additional details, such as line

lengths, for which we’ll defer to Python manuals:

>>> from email.header import Header

>>> h = Header(b'A\xe4B\xc4X', charset='latin-1')

>>> h.encode()

'=?iso-8859-1?q?A=E4B=C4X?='

>>>

>>> h = Header('spam', charset='ascii') # same as Header('spam')

>>> h.encode()

'spam'
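Applying this to the name component of an address might be sketched as follows. The display name and address are sample values, and the exact encoded form that Header chooses (Base64 or quoted-printable) is left unstated here because it depends on the character set’s settings in the package.

```python
from email.header import Header, decode_header
from email.utils import formataddr

name = 'B\xf8b Smith'                                   # sample non-ASCII name
encoded_name = Header(name, charset='utf-8').encode()   # encode the name only
header_value = formataddr((encoded_name, 'bob@bob.com'))
print(header_value)

# round trip: decoding the name component recovers the original text
data, enc = decode_header(encoded_name)[0]
print(data.decode(enc))
```

Only the name is encoded; the address itself is left as plain ASCII text.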

The mailtools package ahead and its PyMailGUI client of Chapter 14 will use these

interfaces to automatically decode message headers in fetched mails per their content

for display, and to encode headers sent that are not in ASCII format. The latter also

applies to the name component of email addresses, and assumes that SMTP servers will

allow these to pass. This may encroach on some SMTP server issues which we don’t

have space to address in this book. See the Web for more on SMTP headers handling.

For more on headers decoding, see also file _test-i18n-headers.py in the examples pack-

age; it decodes additional subject and address-related headers using mailtools

methods, and displays them in a tkinter Text widget—a foretaste of how these will be

displayed in PyMailGUI.

Workaround: Message text generation for binary attachment payloads is broken

Our last two email Unicode issues are outright bugs which we must work around today,

though they will almost certainly be fixed in a future Python release. The first breaks

message text generation for all but trivial messages—the email package today no longer

supports generation of full mail text for messages that contain any binary parts, such

as images or audio files. Without coding workarounds, only simple emails that consist

entirely of text parts can be composed and generated in Python 3.1’s email package;

any MIME-encoded binary part causes mail text generation to fail.

This is a bit tricky to understand without poring over email’s source code (which,

thankfully, we can in the land of open source), but to demonstrate the issue, first notice

how simple text payloads are rendered as full message text when printed as we’ve

already seen:

C:\...\PP4E\Internet\Email> python

>>> from email.message import Message # generic message object

>>> m = Message()

>>> m['From'] = 'bob@bob.com'

>>> m.set_payload(open('text.txt').read()) # payload is str text

>>> print(m) # print uses as_string()

From: bob@bob.com

spam

Spam

SPAM!

As we’ve also seen, for convenience, the email package also provides subclasses of the

Message object, tailored to add message headers that provide the extra descriptive details

used by email clients to know how to process the data:

>>> from email.mime.text import MIMEText # Message subclass with headers

>>> text = open('text.txt').read()

>>> m = MIMEText(text) # payload is str text

>>> m['From'] = 'bob@bob.com'

>>> print(m)

Content-Type: text/plain; charset="us-ascii"

MIME-Version: 1.0

Content-Transfer-Encoding: 7bit

From: bob@bob.com

spam

Spam

SPAM!

This works for text, but watch what happens when we try to render a message part

with truly binary data, such as an image that could not be decoded as Unicode text:

>>> from email.message import Message # generic Message object

>>> m = Message()

>>> m['From'] = 'bob@bob.com'

>>> bytes = open('monkeys.jpg', 'rb').read() # read binary bytes (not Unicode)

>>> m.set_payload(bytes) # we set the payload to bytes

>>> print(m)

Traceback (most recent call last):

...lines omitted...

File "C:\Python31\lib\email\generator.py", line 155, in _handle_text

raise TypeError('string payload expected: %s' % type(payload))

TypeError: string payload expected: <class 'bytes'>

>>> m.get_payload()[:20]

b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x01\x00x\x00x\x00\x00'

The problem here is that the email package’s text generator assumes that the message’s

payload data is a Base64 (or similar) encoded str text string by generation time, not

bytes. Really, the error is probably our fault in this case, because we set the payload to

raw bytes manually. We should use the MIMEImage MIME subclass tailored for images;

if we do, the email package internally performs Base64 MIME email encoding on the

data when the message object is created. Unfortunately, it still leaves it as bytes, not

str, despite the fact the whole point of Base64 is to change binary data to text (though

the exact Unicode flavor this text should take may be unclear). This leads to additional

failures in Python 3.1:

>>> from email.mime.image import MIMEImage # Message subclass with hdrs+base64

>>> bytes = open('monkeys.jpg', 'rb').read() # read binary bytes again

>>> m = MIMEImage(bytes) # MIME class does Base64 on data

>>> print(m)

Traceback (most recent call last):

...lines omitted...

File "C:\Python31\lib\email\generator.py", line 155, in _handle_text

raise TypeError('string payload expected: %s' % type(payload))

TypeError: string payload expected: <class 'bytes'>

>>> m.get_payload()[:40] # this is already Base64 text

b'/9j/4AAQSkZJRgABAQEAeAB4AAD/2wBDAAIBAQIB'

>>> m.get_payload()[:40].decode('ascii') # but it's still bytes internally!

'/9j/4AAQSkZJRgABAQEAeAB4AAD/2wBDAAIBAQIB'

In other words, not only does the Python 3.1 email package not fully support the Python

3.X Unicode/bytes dichotomy, it was actually broken by it. Luckily, there’s a work-

around for this case.

To address this specific issue, I opted to create a custom encoding function for binary

MIME attachments, and pass it in to the email package’s MIME message object

subclasses for all binary data types. This custom function is coded in the upcoming

mailtools package of this chapter (Example 13-23). Because it is used by email to en-

code from bytes to text at initialization time, it is able to decode to ASCII text per

Unicode as an extra step, after running the original call to perform Base64 encoding

and arrange content-encoding headers. The fact that email does not do this extra Uni-

code decoding step itself is a genuine bug in that package (albeit, one introduced by

changes elsewhere in Python standard libraries), but the workaround does its job:

# in mailtools.mailSender module ahead in this chapter...

def fix_encode_base64(msgobj):
    from email.encoders import encode_base64
    encode_base64(msgobj) # what email does normally: leaves bytes
    bytes = msgobj.get_payload() # bytes fails in email pkg on text gen
    text = bytes.decode('ascii') # decode to unicode str so text gen works
    ...line splitting logic omitted...
    msgobj.set_payload('\n'.join(lines))

>>> from email.mime.image import MIMEImage

>>> from mailtools.mailSender import fix_encode_base64 # use custom workaround

>>> bytes = open('monkeys.jpg', 'rb').read()

>>> m = MIMEImage(bytes, _encoder=fix_encode_base64) # convert to ascii str

>>> print(m.as_string()[:500])

Content-Type: image/jpeg

MIME-Version: 1.0

Content-Transfer-Encoding: base64

/9j/4AAQSkZJRgABAQEAeAB4AAD/2wBDAAIBAQIBAQICAgICAgICAwUDAwMDAwYEBAMFBwYHBwcG

BwcICQsJCAgKCAcHCg0KCgsMDAwMBwkODw0MDgsMDAz/2wBDAQICAgMDAwYDAwYMCAcIDAwMDAwM

DAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAz/wAARCAHoAvQDASIA

AhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQA

AAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3

ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc

>>> print(m) # to print the entire message: very long
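For readers who want something runnable before reaching Example 13-23, here is one way such an encoder could be filled in. It is a sketch, not the book’s mailtools code: the function name encode_base64_as_text is hypothetical, the 76-character line width is an assumption based on the usual MIME limit for encoded lines, and the bytes check is there so the function also behaves on releases where the payload is already str.

```python
import base64
from email.encoders import encode_base64
from email.mime.image import MIMEImage

def encode_base64_as_text(msgobj, width=76):
    """Base64-encode the payload and make sure it ends up as str lines."""
    encode_base64(msgobj)                      # standard Base64 step + header
    payload = msgobj.get_payload()
    if isinstance(payload, bytes):             # bytes on 3.1, per the text
        payload = payload.decode('ascii')      # Base64 output is pure ASCII
    flat = ''.join(payload.split())            # drop any existing line breaks
    lines = [flat[i:i + width] for i in range(0, len(flat), width)]
    msgobj.set_payload('\n'.join(lines) + '\n')

data = b'\xff\xd8\xff\xe0' + bytes(200)        # stand-in bytes, not a real image
m = MIMEImage(data, 'jpeg', _encoder=encode_base64_as_text)
text = m.as_string()                           # full message text generation
print(text[:120])
```

Passing the subtype explicitly avoids relying on image type detection for the stand-in data.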

Another possible workaround involves defining a custom MIMEImage class that is like

the original but does not attempt to perform Base64 encoding on creation; that way, we

could encode and translate to str before message object creation, but still make use of

the original class’s header-generation logic. If you take this route, though, you’ll find

that it requires repeating (really, cutting and pasting) far too much of the original logic

to be reasonable—this repeated code would have to mirror any future email changes:

>>> from email.mime.nonmultipart import MIMENonMultipart

>>> class MyImage(MIMENonMultipart):
...     def __init__(self, imagedata, subtype):
...         MIMENonMultipart.__init__(self, 'image', subtype)
...         self.set_payload(imagedata)
...         ...repeat all the base64 logic here, with an extra ASCII Unicode decode...
>>> m = MyImage(text_from_bytes, 'jpeg')

Interestingly, this regression in email actually reflects an unrelated change in Python’s

base64 module made in 2007, which was completely benign until the Python 3.X bytes/

str differentiation came online. Prior to that, the email encoder worked in Python 2.X,

because bytes was really str. In 3.X, though, because base64 returns bytes, the normal

mail encoder in email also leaves the payload as bytes, even though it’s been encoded

to Base64 text form. This in turn breaks email text generation, because it assumes the

payload is text in this case, and requires it to be str. As is common in large-scale

software systems, the effects of some 3.X changes may have been difficult to anticipate

or accommodate in full.
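The root of the issue is easy to observe in isolation: in 3.X, the base64 module’s encoder returns bytes, even though its output consists only of ASCII characters. A minimal illustration, using the leading bytes of the image shown earlier as sample input:

```python
import base64

encoded = base64.b64encode(b'\xff\xd8\xff\xe0')   # sample binary input
print(type(encoded).__name__, encoded)            # bytes b'/9j/4A=='

text = encoded.decode('ascii')                    # the extra step text needs
print(type(text).__name__, text)                  # str /9j/4A==
```

Any code that expects text from this call, as the mail generator does, must perform that final decode itself.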

By contrast, parsing binary attachments (as opposed to generating text for them) works

fine in 3.X, because the parsed message payload is saved in message objects as a Base64-

encoded str string, not bytes, and is converted to bytes only when fetched. This bug

seems likely to also go away in a future Python and email package (perhaps even as a

simple patch in Python 3.2), but it’s more serious than the other Unicode decoding

issues described here, because it prevents mail composition for all but trivial mails.

The flexibility afforded by the package and the Python language allows such a work-

around to be developed external to the package, rather than hacking the package’s code

directly. With open source and forgiving APIs, you rarely are truly stuck.

Late-breaking news: This section’s bug is scheduled to be fixed in Python

3.2, making our workaround here unnecessary in this and later Python

releases. This is per communications with members of Python’s email

special interest group (on the “email-sig” mailing list).

Regrettably, this fix didn’t appear until after this chapter and its exam-

ples had been written. I’d like to remove the workaround and its de-

scription entirely, but this book is based on Python 3.1, both before and

after the fix was incorporated.

So that it works under Python 3.2 alpha, too, though, the workaround

code ahead was specialized just before publication to check for bytes

prior to decoding. Moreover, the workaround still must manually split

lines in Base64 data, because 3.2 still does not.

Workaround: Message composition for non-ASCII text parts is broken

Our final email Unicode issue is as severe as the prior one: changes like that of the prior

section introduced yet another regression for mail composition. In short, it’s impossible

to make text message parts today without specializing for different Unicode encodings.

Some types of text are automatically MIME-encoded for transmission. Unfortunately,

because of the str/bytes split, the MIME text message class in email now requires

different string object types for different Unicode encodings. The net effect is that you

now have to know how the email package will process your text data when making a

text message object, or repeat most of its logic redundantly.

For example, to properly generate Unicode encoding headers and apply required MIME

encodings, here’s how we must proceed today for common Unicode text types:

>>> m = MIMEText('abc', _charset='ascii') # pass text for ascii

>>> print(m)

MIME-Version: 1.0

Content-Type: text/plain; charset="us-ascii"

Content-Transfer-Encoding: 7bit

abc

>>> m = MIMEText('abc', _charset='latin-1') # pass text for latin-1

>>> print(m) # but not for 'latin1': ahead

MIME-Version: 1.0

Content-Type: text/plain; charset="iso-8859-1"

Content-Transfer-Encoding: quoted-printable

abc

>>> m = MIMEText(b'abc', _charset='utf-8') # pass bytes for utf8

>>> print(m)

Content-Type: text/plain; charset="utf-8"

MIME-Version: 1.0

Content-Transfer-Encoding: base64

YWJj

This works, but if you look closely, you’ll notice that we must pass str to the first two,

but bytes to the third. That requires that we special-case code for Unicode types based

upon the package’s internal operation. Types other than those expected for a Unicode

encoding don’t work at all, because of newly invalid str/bytes combinations that occur

inside the email package in 3.1:

>>> m = MIMEText('abc', _charset='ascii')

>>> m = MIMEText(b'abc', _charset='ascii') # bug: assumes 2.X str

Traceback (most recent call last):

...lines omitted...

File "C:\Python31\lib\email\encoders.py", line 60, in encode_7or8bit

orig.encode('ascii')

AttributeError: 'bytes' object has no attribute 'encode'

>>> m = MIMEText('abc', _charset='latin-1')

>>> m = MIMEText(b'abc', _charset='latin-1') # bug: qp uses str

Traceback (most recent call last):

...lines omitted...

File "C:\Python31\lib\email\quoprimime.py", line 176, in body_encode

if line.endswith(CRLF):

TypeError: expected an object with the buffer interface

>>> m = MIMEText(b'abc', _charset='utf-8')

>>> m = MIMEText('abc', _charset='utf-8') # bug: base64 uses bytes

Traceback (most recent call last):

...lines omitted...

File "C:\Python31\lib\email\base64mime.py", line 94, in body_encode

enc = b2a_base64(s[i:i + max_unencoded]).decode("ascii")

TypeError: must be bytes or buffer, not str
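Rather than hardcoding which encodings need which string type, a program can ask the email package how it intends to MIME-encode a given character set. The sketch below uses the package’s Charset class for this; the helper names are hypothetical, and the str-versus-bytes rule in wants_str_in_31 simply restates the Python 3.1 behavior described in this section, so it should not be assumed to hold in other releases.

```python
from email.charset import Charset, BASE64, QP

def body_encoding(charset_name):
    """Name the MIME body encoding email associates with a character set."""
    enc = Charset(charset_name).body_encoding
    names = {None: 'none', QP: 'quoted-printable', BASE64: 'base64'}
    return names.get(enc, str(enc))

def wants_str_in_31(charset_name):
    """Per this section: str for no encoding or quoted-printable, else bytes."""
    return Charset(charset_name).body_encoding in (None, QP)

for name in ('ascii', 'latin-1', 'utf-8'):
    print(name, body_encoding(name), wants_str_in_31(name))
```

The three lines printed should agree with the three working calls shown above: str for the first two character sets, bytes for the third.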

Moreover, the email package is pickier about encoding name synonyms than Python

and most other tools are: “latin-1” is detected as a quoted-printable MIME type, but

“latin1” is unknown and so defaults to Base64 MIME. In fact, this is why Base64 was

used for the “latin1” Unicode type earlier in this section—an encoding choice that is

irrelevant to any recipient that understands the “latin1” synonym, including Python

itself. Unfortunately, that means that we also need to pass in a different string type if

we use a synonym the package doesn’t understand today:

>>> m = MIMEText('abc', _charset='latin-1') # str for 'latin-1'

>>> print(m)

MIME-Version: 1.0

Content-Type: text/plain; charset="iso-8859-1"

Content-Transfer-Encoding: quoted-printable

abc

>>> m = MIMEText('abc', _charset='latin1')

Traceback (most recent call last):

...lines omitted...

File "C:\Python31\lib\email\base64mime.py", line 94, in body_encode

enc = b2a_base64(s[i:i + max_unencoded]).decode("ascii")

TypeError: must be bytes or buffer, not str

>>> m = MIMEText(b'abc', _charset='latin1') # bytes for 'latin1'!

>>> print(m)

Content-Type: text/plain; charset="latin1"

MIME-Version: 1.0

Content-Transfer-Encoding: base64

YWJj

There are ways to add aliases and new encoding types in the email package, but they’re

not supported out of the box. Programs that care about being robust would have to

cross-check the user’s spelling, which may be valid for Python itself, against that ex-

pected by email. This also holds true if your data is not ASCII in general—you’ll have

to first decode to text in order to use the expected “latin-1” name because its quoted-

printable MIME encoding expects str, even though bytes are required if “latin1”

triggers the default Base64 MIME:

>>> m = MIMEText(b'A\xe4B', _charset='latin1')

>>> print(m)

Content-Type: text/plain; charset="latin1"

MIME-Version: 1.0

Content-Transfer-Encoding: base64

QeRC

>>> m = MIMEText(b'A\xe4B', _charset='latin-1')

Traceback (most recent call last):

...lines omitted...

File "C:\Python31\lib\email\quoprimime.py", line 176, in body_encode

if line.endswith(CRLF):

TypeError: expected an object with the buffer interface

>>> m = MIMEText(b'A\xe4B'.decode('latin1'), _charset='latin-1')

>>> print(m)

MIME-Version: 1.0

Content-Type: text/plain; charset="iso-8859-1"

Content-Transfer-Encoding: quoted-printable

A=E4B
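The cross-check described above can be sketched with Python's own codec registry. This is only an illustration under stated assumptions: the EMAIL_NAMES tuple and the email_spelling function are hypothetical names invented here, and the three spellings listed are simply the ones that the sessions in this section show email reporting in its charset headers, not the package's full table.

```python
import codecs

# Spellings assumed to be understood by email (illustrative, not exhaustive)
EMAIL_NAMES = ('us-ascii', 'iso-8859-1', 'utf-8')

def email_spelling(name):
    """
    Return the EMAIL_NAMES entry that names the same Python codec as name,
    or name unchanged if none matches; codecs.lookup raises LookupError
    for names Python itself does not know.
    """
    wanted = codecs.lookup(name).name            # Python's canonical codec name
    for candidate in EMAIL_NAMES:
        if codecs.lookup(candidate).name == wanted:
            return candidate
    return name

for spelling in ('latin1', 'latin-1', 'utf8', 'ascii'):
    print(spelling, '=>', email_spelling(spelling))
```

A program could run user-supplied names through a mapping like this before building a text message object, so that synonyms Python accepts are translated to one spelling before email sees them.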

In fact, the text message object doesn’t check to see that the data you’re MIME-

encoding is valid per Unicode in general—we can send invalid UTF text but the receiver

may have trouble decoding it:

>>> m = MIMEText(b'A\xe4B', _charset='utf-8')

>>> print(m)

Content-Type: text/plain; charset="utf-8"

MIME-Version: 1.0

Content-Transfer-Encoding: base64

QeRC

>>> b'A\xe4B'.decode('utf8')

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: unexpected...

>>> import base64

>>> base64.b64decode(b'QeRC')

b'A\xe4B'

>>> base64.b64decode(b'QeRC').decode('utf')

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: unexpected...
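Because the message object does not validate the data, a sender that cares can check it first. The following is a minimal sketch of such a check; valid_for is a hypothetical helper name, not part of email.

```python
def valid_for(data, encoding):
    """Return True if the bytes in data decode cleanly under encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True

print(valid_for(b'A\xe4B', 'latin-1'))                  # same bytes as above
print(valid_for(b'A\xe4B', 'utf-8'))
print(valid_for('A\xe4B'.encode('utf-8'), 'utf-8'))     # properly encoded text
```

Running a test like this before MIME-encoding moves the failure from the receiver back to the sender, where it is easier to report.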

So what can we do when we need to attach message text to composed messages, but the
text’s datatype requirement is indirectly dictated by its Unicode encoding name? The
generic Message superclass doesn’t help here directly if we specify an encoding, as it
exhibits the same encoding-specific behavior:

>>> m = Message()

>>> m.set_payload('spam', charset='us-ascii')

>>> print(m)

MIME-Version: 1.0

Content-Type: text/plain; charset="us-ascii"

Content-Transfer-Encoding: 7bit

spam

>>> m = Message()

>>> m.set_payload(b'spam', charset='us-ascii')

AttributeError: 'bytes' object has no attribute 'encode'

>>> m.set_payload('spam', charset='utf-8')

TypeError: must be bytes or buffer, not str

Although we could try to work around these issues by repeating much of the code that

email runs, the redundancy would make us hopelessly tied to its current implementa-

tion and dependent upon its future changes. The following, for example, parrots the

steps that email runs internally to create a text message object for ASCII encoding text;

unlike the MIMEText class, this approach allows all data to be read from files as binary

byte strings, even if it’s simple ASCII:

>>> m = Message()

>>> m.add_header('Content-Type', 'text/plain')

>>> m['MIME-Version'] = '1.0'

>>> m.set_param('charset', 'us-ascii')

>>> m.add_header('Content-Transfer-Encoding', '7bit')

>>> data = b'spam'

>>> m.set_payload(data.decode('ascii')) # data read as bytes here

>>> print(m)

MIME-Version: 1.0

Content-Type: text/plain; charset="us-ascii"

Content-Transfer-Encoding: 7bit

spam

>>> print(MIMEText('spam', _charset='ascii')) # same, but type-specific

MIME-Version: 1.0

Content-Type: text/plain; charset="us-ascii"

Content-Transfer-Encoding: 7bit

spam

To do the same for other kinds of text that require MIME encoding, just insert an extra

encoding step; although we’re concerned with text parts here, a similar imitative ap-

proach could address the binary parts text generation bug we met earlier:

>>> m = Message()

>>> m.add_header('Content-Type', 'text/plain')

>>> m['MIME-Version'] = '1.0'

>>> m.set_param('charset', 'utf-8')

>>> m.add_header('Content-Transfer-Encoding', 'base64')

>>> data = b'spam'

>>> from binascii import b2a_base64 # add MIME encode if needed

>>> data = b2a_base64(data) # data read as bytes here too

>>> m.set_payload(data.decode('ascii'))

>>> print(m)

MIME-Version: 1.0

Content-Type: text/plain; charset="utf-8"

Content-Transfer-Encoding: base64

c3BhbQ==

>>> print(MIMEText(b'spam', _charset='utf-8')) # same, but type-specific

Content-Type: text/plain; charset="utf-8"

MIME-Version: 1.0

Content-Transfer-Encoding: base64

c3BhbQ==

This works, but besides the redundancy and dependency it creates, to use this approach

broadly we’d also have to generalize to account for all the various kinds of Unicode

encodings and MIME encodings possible, like the email package already does inter-

nally. We might also have to support encoding name synonyms to be flexible, adding

further redundancy. In other words, this requires additional work, and in the end, we’d

still have to specialize our code for different Unicode types.

Any way we go, some dependence on the current implementation seems unavoidable

today. It seems the best we can do here, apart from hoping for an improved email

package in a few years’ time, is to specialize text message construction calls by Unicode

type, and assume both that encoding names match those expected by the package and

that message data is valid for the Unicode type selected. Here is the sort of arguably

magic code that the upcoming mailtools package (again in Example 13-23) will apply

to choose text types:

>>> from email.charset import Charset, BASE64, QP
>>> for e in ('us-ascii', 'latin-1', 'utf8', 'latin1', 'ascii'):
...     cset = Charset(e)
...     benc = cset.body_encoding
...     if benc in (None, QP):
...         print(e, benc, 'text')        # read/fetch data as str
...     else:
...         print(e, benc, 'binary')      # read/fetch data as bytes
...
us-ascii None text
latin-1 1 text
utf8 2 binary
latin1 2 binary
ascii None text

We’ll proceed this way in this book, with the major caveat that this is almost certain
to require changes in the future because of its strong coupling with the current
email implementation.
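The same test can be wrapped in small functions so that file-reading code can ask which mode to use. This is a sketch only: payload_kind and open_mode_for are hypothetical names chosen for illustration, and the str-versus-bytes rule they encode is the Python 3.1 behavior described in this section, which later releases may relax.

```python
from email.charset import Charset, QP

def payload_kind(encoding_name):
    """
    Return 'text' if email applies no MIME body encoding or quoted-printable
    for this charset name, else 'binary' (Base64 in the cases shown above).
    """
    benc = Charset(encoding_name).body_encoding
    return 'text' if benc in (None, QP) else 'binary'

def open_mode_for(encoding_name):
    """Pick a file mode that yields the matching data type (str or bytes)."""
    return 'r' if payload_kind(encoding_name) == 'text' else 'rb'

for name in ('us-ascii', 'latin-1', 'utf-8'):
    print(name, payload_kind(name), open_mode_for(name))
```

A caller would then open an attachment file with the returned mode, passing an encoding argument only in the text case.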

Late-breaking news: As with the issue in the prior section, it now appears that this
section’s bug will also be fixed in Python 3.2, making the workaround here unnecessary
in that and later Python releases. The nature of the fix is unknown, though, and we
still need the workaround for the version of Python current when this chapter was
written. As of just before publication, the alpha release of 3.2 is still somewhat
type specific on this issue, but now accepts either str or bytes for text that
triggers Base64 encodings, instead of just bytes.

Summary: Solutions and workarounds

The email package in Python 3.1 provides powerful tools for parsing and composing
mails, and can be used as the basis for full-featured mail clients like those in this
book with just a few workarounds. As you can see, though, it is less than fully
functional today. Because of that, code specialized to its current API is best viewed
as a temporary solution. Short of writing our own email parser and composer (not a
practical option in a finitely sized book!), some compromises are in order here.
Moreover, the inherent complexity of Unicode support in email places some limits on
how much we can pursue this thread in this book.

In this edition, we will support Unicode encodings of text parts and headers in messages

composed, and respect the Unicode encodings in text parts and mail headers of mes-

sages fetched. To make this work with the partially crippled email package in

Python 3.1, though, we’ll apply the following Unicode policies in various email clients

in this book:

• Use user preferences and defaults for the preparse decoding of full mail text fetched

and encoding of text payloads sent.

• Use header information, if available, to decode the bytes payloads returned by

get_payload when text parts must be treated as str text, but use binary mode files

to finesse the issue in other contexts.

• Use formats prescribed by email standards to decode and encode message headers

such as From and Subject if they are not simple text.

• Apply the fix described to work around the message text generation issue for binary

parts.

• Special-case construction of text message objects according to Unicode types and

email behavior.
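To make the second and third policies concrete, here is a minimal sketch of decoding a text payload with the charset named in its headers, and decoding an encoded Subject header. It assumes a current Python 3.X, where MIMEText accepts str for this charset (the 3.1 type restrictions described earlier may differ), and the message content is made up for the example.

```python
from email.header import Header, decode_header, make_header
from email.mime.text import MIMEText
from email.parser import Parser

# Compose a small message with non-ASCII body text and Subject
msg = MIMEText('A\xe4B', _charset='latin-1')
msg['Subject'] = Header('caf\xe9 menu', 'utf-8').encode()

# Parse it back from its full text, as a client would after a fetch
parsed = Parser().parsestr(msg.as_string())

raw = parsed.get_payload(decode=True)              # bytes, MIME encoding undone
charset = parsed.get_content_charset() or 'ascii'  # from the Content-Type header
body = raw.decode(charset)                         # bytes to str per the header

# Headers use their own encoded-word format, decoded separately
subject = str(make_header(decode_header(parsed['Subject'])))

print(repr(body), repr(subject))
```

The fallback to 'ascii' when no charset is present is an arbitrary choice for this sketch; a real client would substitute a user preference or platform default instead.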

These are not necessarily complete solutions. For example, some of this edition’s email
clients allow for Unicode encodings for both text attachments and mail headers, but
they do nothing about encoding the full text of sent messages beyond the policies
inherited from smtplib, and they implement policies that might be inconvenient in some
use cases. But as we’ll see, despite their limitations, our email clients will still be
able to handle complex email tasks and a very large set of emails.

Again, since this story is in flux in Python today, watch this book’s website for
updates that may improve, or be required by, code that uses email in the future. A
future email package may handle Unicode encodings more accurately. As with Python 3.X
itself, though, backward compatibility may be sacrificed in the process, requiring
updates to this book’s code. For more on this issue, see the Web as well as up-to-date
Python release notes.

Although this quick tour captures the basic flavor of the interface, we need to step up

to larger examples to see more of the email package’s power. The next section takes us

on the first of those steps.

A Console-Based Email Client

Let’s put together what we’ve learned about fetching, sending, parsing, and composing

email in a simple but functional command-line console email tool. The script in Ex-

ample 13-20 implements an interactive email session—users may type commands to

read, send, and delete email messages. It uses poplib and smtplib to fetch and send,

and uses the email package directly to parse and compose.

Example 13-20. PP4E\Internet\Email\pymail.py

#!/usr/local/bin/python
"""
##########################################################################
pymail - a simple console email interface client in Python; uses Python
poplib module to view POP email messages, smtplib to send new mails, and
the email package to extract mail headers and payload and compose mails;
##########################################################################
"""

import poplib, smtplib, email.utils, mailconfig
from email.parser import Parser
from email.message import Message
fetchEncoding = mailconfig.fetchEncoding

def decodeToUnicode(messageBytes, fetchEncoding=fetchEncoding):
    """
    4E, Py3.1: decode fetched bytes to str Unicode string for display or parsing;
    use global setting (or by platform default, hdrs inspection, intelligent guess);
    in Python 3.2/3.3, this step may not be required: if so, return message intact;
    """
    return [line.decode(fetchEncoding) for line in messageBytes]

def splitaddrs(field):
    """
    4E: split address list on commas, allowing for commas in name parts
    """
    pairs = email.utils.getaddresses([field])                # [(name,addr)]
    return [email.utils.formataddr(pair) for pair in pairs]  # [name <addr>]

def inputmessage():
    import sys
    From = input('From? ').strip()
    To = input('To? ').strip()           # datetime hdr may be set auto
    To = splitaddrs(To)                  # possible many, name+<addr> okay
    Subj = input('Subj? ').strip()       # don't split blindly on ',' or ';'
    print('Type message text, end with line="."')
    text = ''
    while True:
        line = sys.stdin.readline()
        if line == '.\n': break
        text += line
    return From, To, Subj, text

def sendmessage():
    From, To, Subj, text = inputmessage()
    msg = Message()
    msg['From'] = From
    msg['To'] = ', '.join(To)                        # join for hdr, not send
    msg['Subject'] = Subj
    msg['Date'] = email.utils.formatdate()           # curr datetime, rfc2822
    msg.set_payload(text)
    server = smtplib.SMTP(mailconfig.smtpservername)
    try:
        failed = server.sendmail(From, To, str(msg)) # may also raise exc
    except:
        print('Error - send failed')
    else:
        if failed: print('Failed:', failed)

def connect(servername, user, passwd):
    print('Connecting...')
    server = poplib.POP3(servername)
    server.user(user)                     # connect, log in to mail server
    server.pass_(passwd)                  # pass is a reserved word
    print(server.getwelcome())            # print returned greeting message
    return server

def loadmessages(servername, user, passwd, loadfrom=1):
    server = connect(servername, user, passwd)
    try:
        print(server.list())
        (msgCount, msgBytes) = server.stat()
        print('There are', msgCount, 'mail messages in', msgBytes, 'bytes')
        print('Retrieving...')
        msgList = []                                     # fetch mail now
        for i in range(loadfrom, msgCount+1):            # empty if low >= high
            (hdr, message, octets) = server.retr(i)      # save text on list
            message = decodeToUnicode(message)           # 4E, Py3.1: bytes to str
            msgList.append('\n'.join(message))           # leave mail on server
    finally:
        server.quit()                                    # unlock the mail box
    assert len(msgList) == (msgCount - loadfrom) + 1     # msg nums start at 1
    return msgList

def deletemessages(servername, user, passwd, toDelete, verify=True):
    print('To be deleted:', toDelete)
    if verify and input('Delete?')[:1] not in ['y', 'Y']:
        print('Delete cancelled.')
    else:
        server = connect(servername, user, passwd)
        try:
            print('Deleting messages from server...')
            for msgnum in toDelete:                 # reconnect to delete mail
                server.dele(msgnum)                 # mbox locked until quit()
        finally:
            server.quit()

def showindex(msgList):
    count = 0                                 # show some mail headers
    for msgtext in msgList:
        msghdrs = Parser().parsestr(msgtext, headersonly=True)  # expects str in 3.1
        count += 1
        print('%d:\t%d bytes' % (count, len(msgtext)))
        for hdr in ('From', 'To', 'Date', 'Subject'):
            try:
                print('\t%-8s=>%s' % (hdr, msghdrs[hdr]))
            except KeyError:
                print('\t%-8s=>(unknown)' % hdr)
        if count % 5 == 0:
            input('[Press Enter key]')        # pause after each 5

def showmessage(i, msgList):
    if 1 <= i <= len(msgList):
        #print(msgList[i-1])            # old: prints entire mail--hdrs+text
        print('-' * 79)
        msg = Parser().parsestr(msgList[i-1])      # expects str in 3.1
        content = msg.get_payload()     # prints payload: string, or [Messages]
        if isinstance(content, str):    # keep just one end-line at end
            content = content.rstrip() + '\n'
        print(content)
        print('-' * 79)                 # to get text only, see email.parsers
    else:
        print('Bad message number')

def savemessage(i, mailfile, msgList):
    if 1 <= i <= len(msgList):
        savefile = open(mailfile, 'a', encoding=mailconfig.fetchEncoding)  # 4E
        savefile.write('\n' + msgList[i-1] + '-'*80 + '\n')
    else:
        print('Bad message number')

def msgnum(command):
    try:
        return int(command.split()[1])
    except:
        return -1                       # assume this is bad

helptext = """
Available commands:
i     - index display
l n?  - list all messages (or just message n)
d n?  - mark all messages for deletion (or just message n)
s n?  - save all messages to a file (or just message n)
m     - compose and send a new mail message
q     - quit pymail
?     - display this help text
"""

def interact(msgList, mailfile):
    showindex(msgList)
    toDelete = []
    while True:
        try:
            command = input('[Pymail] Action? (i, l, d, s, m, q, ?) ')
        except EOFError:
            command = 'q'
        if not command: command = '*'

        # quit
        if command == 'q':
            break

        # index
        elif command[0] == 'i':
            showindex(msgList)

        # list
        elif command[0] == 'l':
            if len(command) == 1:
                for i in range(1, len(msgList)+1):
                    showmessage(i, msgList)
            else:
                showmessage(msgnum(command), msgList)

        # save
        elif command[0] == 's':
            if len(command) == 1:
                for i in range(1, len(msgList)+1):
                    savemessage(i, mailfile, msgList)
            else:
                savemessage(msgnum(command), mailfile, msgList)

        # delete
        elif command[0] == 'd':
            if len(command) == 1:                          # delete all later
                toDelete = list(range(1, len(msgList)+1))  # 3.x requires list
            else:
                delnum = msgnum(command)
                if (1 <= delnum <= len(msgList)) and (delnum not in toDelete):
                    toDelete.append(delnum)
                else:
                    print('Bad message number')

        # mail
        elif command[0] == 'm':                # send a new mail via SMTP
            sendmessage()
            #execfile('smtpmail.py', {})       # alt: run file in own namespace

        elif command[0] == '?':
            print(helptext)
        else:
            print('What? -- type "?" for commands help')
    return toDelete

if __name__ == '__main__':
    import getpass, mailconfig
    mailserver = mailconfig.popservername        # ex: 'pop.rmi.net'
    mailuser = mailconfig.popusername            # ex: 'lutz'
    mailfile = mailconfig.savemailfile           # ex: r'c:\stuff\savemail'
    mailpswd = getpass.getpass('Password for %s?' % mailserver)
    print('[Pymail email client]')
    msgList = loadmessages(mailserver, mailuser, mailpswd)     # load all
    toDelete = interact(msgList, mailfile)
    if toDelete: deletemessages(mailserver, mailuser, mailpswd, toDelete)
    print('Bye.')

There isn’t much new here—just a combination of user-interface logic and tools we’ve

already met, plus a handful of new techniques:

Loads

This client loads all email from the server into an in-memory Python list only once,

on startup; you must exit and restart to reload newly arrived email.

Saves

On demand, pymail saves the raw text of a selected message into a local file, whose

name you place in the mailconfig module of Example 13-17.

Deletions

We finally support on-request deletion of mail from the server here: in pymail, mails

are selected for deletion by number, but are still only physically removed from your

server on exit, and then only if you verify the operation. By deleting only on exit,

we avoid changing mail message numbers during a session—under POP, deleting

a mail not at the end of the list decrements the number assigned to all mails fol-

lowing the one deleted. Since mail is cached in memory by pymail, future operations

on the numbered messages in memory can be applied to the wrong mail if deletions

were done immediately.#

Parsing and composing messages

pymail now displays just the payload of a message on listing commands, not the

entire raw text, and the mail index listing only displays selected headers parsed out

of each message. Python’s email package is used to extract headers and content

from a message, as shown in the prior section. Similarly, we use email to compose

a message and ask for its string to ship as a mail.
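The compose and parse steps can be tried offline, without a mail server. The following sketch mirrors what sendmessage, showindex, and showmessage do with the email package; the addresses and text are placeholders invented for the example, and nothing is actually sent.

```python
import email.utils
from email.message import Message
from email.parser import Parser

# Compose, as sendmessage does
msg = Message()
msg['From'] = 'sender@example.com'
msg['To'] = ', '.join(['first@example.com', 'second@example.com'])
msg['Subject'] = 'testing'
msg['Date'] = email.utils.formatdate()
msg.set_payload('Line one\nLine two\n\n\n')
wire_text = str(msg)               # the string that would be handed to smtplib

# Parse headers only, as showindex does
headers = Parser().parsestr(wire_text, headersonly=True)
for name in ('From', 'To', 'Date', 'Subject'):
    print('%-8s=>%s' % (name, headers[name]))

# Parse the full text and trim the payload, as showmessage does
body = Parser().parsestr(wire_text).get_payload()
if isinstance(body, str):
    body = body.rstrip() + '\n'    # keep just one end-line at the end
print(body)

missing = headers['X-No-Such-Header']
```

One detail to keep in mind when reading the listing: indexing a Message by a header name that is absent returns None rather than raising KeyError, so missing here is None, and a script that wants a placeholder for absent headers may need to test for None explicitly.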

By now, I expect that you know enough to read this script for a deeper look, so instead

of saying more about its design here, let’s jump into an interactive pymail session to see

how it works.

Running the pymail Console Client

Let’s start up pymail to read and delete email at our mail server and send new messages.

pymail runs on any machine with Python and sockets, fetches mail from any email server

with a POP interface on which you have an account, and sends mail via the SMTP server

you’ve named in the mailconfig module we wrote earlier (Example 13-17).

Here it is in action running on my Windows laptop machine; its operation is identical

on other machines thanks to the portability of both Python and its standard library.

#There will be more on POP message numbers when we study mailtools later in this chapter. Interestingly,

the list of message numbers to be deleted need not be sorted; they remain valid for the duration of the delete

connection, so deletions earlier in the list don’t change numbers of messages later in the list while you are

still connected to the POP server. We’ll also see that some subtle issues may arise if mails in the server inbox

are deleted without pymail’s knowledge (e.g., by your ISP or another email client); although very rare, suffice

it to say for now that deletions in this script are not guaranteed to be accurate.

First, we start the script, supply a POP password (remember, SMTP servers usually

require no password), and wait for the pymail email list index to appear; as is, this

version loads the full text of all mails in the inbox on startup:

C:\...\PP4E\Internet\Email> pymail.py
Password for pop.secureserver.net?
[Pymail email client]
Connecting...
b'+OK <8927.1273263898@p3pop01-10.prod.phx3.gdg>'
(b'+OK ', [b'1 1860', b'2 1408', b'3 1049', b'4 1009', b'5 1038', b'6 957'], 47)
There are 6 mail messages in 7321 bytes
Retrieving...
1:      1861 bytes
        From    =>lutz@rmi.net
        To      =>pp4e@learning-python.com
        Date    =>Wed, 5 May 2010 11:29:36 -0400 (EDT)
        Subject =>I'm a Lumberjack, and I'm Okay
2:      1409 bytes
        From    =>lutz@learning-python.com
        To      =>PP4E@learning-python.com
        Date    =>Wed, 05 May 2010 08:33:47 -0700
        Subject =>testing
3:      1050 bytes
        From    =>Eric.the.Half.a.Bee@yahoo.com
        To      =>PP4E@learning-python.com
        Date    =>Thu, 06 May 2010 14:11:07 -0000
        Subject =>A B C D E F G
4:      1010 bytes
        From    =>PP4E@learning-python.com
        To      =>PP4E@learning-python.com
        Date    =>Thu, 06 May 2010 14:16:31 -0000
        Subject =>testing smtpmail
5:      1039 bytes
        From    =>Eric.the.Half.a.Bee@aol.com
        To      =>nobody.in.particular@marketing.com
        Date    =>Thu, 06 May 2010 14:32:32 -0000
        Subject =>a b c d e f g
[Press Enter key]
6:      958 bytes
        From    =>PP4E@learning-python.com
        To      =>maillist
        Date    =>Thu, 06 May 2010 10:58:40 -0400
        Subject =>test interactive smtplib
[Pymail] Action? (i, l, d, s, m, q, ?) l 6
-------------------------------------------------------------------------------
testing 1 2 3...

-------------------------------------------------------------------------------
[Pymail] Action? (i, l, d, s, m, q, ?) l 3
-------------------------------------------------------------------------------
Fiddle de dum, Fiddle de dee,
Eric the half a bee.

-------------------------------------------------------------------------------
[Pymail] Action? (i, l, d, s, m, q, ?)

Once pymail downloads your email to a Python list on the local client machine, you

type command letters to process it. The l command lists (prints) the contents of a given

mail number; here, we just used it to list two emails we sent in the preceding section,

with the smtpmail script, and interactively.

pymail also lets us get command help, delete messages (deletions actually occur at the

server on exit from the program), and save messages away in a local text file whose

name is listed in the mailconfig module we saw earlier:

[Pymail] Action? (i, l, d, s, m, q, ?) ?

Available commands:
i     - index display
l n?  - list all messages (or just message n)
d n?  - mark all messages for deletion (or just message n)
s n?  - save all messages to a file (or just message n)
m     - compose and send a new mail message
q     - quit pymail
?     - display this help text

[Pymail] Action? (i, l, d, s, m, q, ?) s 4
[Pymail] Action? (i, l, d, s, m, q, ?) d 4

Now, let’s pick the m mail compose option—pymail inputs the mail parts, builds mail

text with email, and ships it off with smtplib. You can separate recipients with a comma,

and use either simple “addr” or full “name <addr>” address pairs if desired. Because

the mail is sent by SMTP, you can use arbitrary From addresses here; but again, you

generally shouldn’t do that (unless, of course, you’re trying to come up with interesting

examples for a book):

[Pymail] Action? (i, l, d, s, m, q, ?) m

From? Cardinal@hotmail.com

To? PP4E@learning-python.com

Subj? Among our weapons are these

Type message text, end with line="."

Nobody Expects the Spanish Inquisition!

.

[Pymail] Action? (i, l, d, s, m, q, ?) q

To be deleted: [4]

Delete?y

Connecting...

b'+OK <16872.1273264370@p3pop01-17.prod.phx3.secureserver.net>'

Deleting messages from server...

Bye.

As mentioned, deletions really happen only on exit. When we quit pymail with the q

command, it tells us which messages are queued for deletion, and verifies the request.

Once verified, pymail finally contacts the mail server again and issues POP calls to delete

the selected mail messages. Because deletions change message numbers in the server’s

inbox, postponing deletion until exit simplifies the handling of already loaded email

(we’ll improve on this in the PyMailGUI client of the next chapter).
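The numbering hazard can be illustrated with a plain list standing in for the server's inbox. This is a simplified model rather than the POP protocol itself: renumbered is a hypothetical helper, and the message names are placeholders.

```python
def renumbered(inbox, msgnum):
    """Model the inbox after deleting 1-based message msgnum and reconnecting."""
    return inbox[:msgnum - 1] + inbox[msgnum:]

server = ['lumberjack', 'testing', 'A B C', 'smtpmail test', 'a b c']
cached = list(server)                 # what the client loaded at startup

# Immediate deletion: message 2 is removed and the server renumbers
server_after = renumbered(server, 2)
wanted = cached[4 - 1]                # what the user sees as message 4
actual = server_after[4 - 1]          # what number 4 now means on the server
print(wanted, '!=', actual)

# Deferred deletion: collect numbers, then apply them all to one numbering
to_delete = [2, 4]
remaining = [m for n, m in enumerate(server, start=1) if n not in to_delete]
print(remaining)
```

With deferred deletion, every number in the to-delete list refers to the same numbering the user saw in the index, which is the property pymail relies on.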

Because pymail downloads mail from your server into a local Python list only once at

startup, though, we need to start pymail again to refetch mail from the server if we want

to see the result of the mail we sent and the deletion we made. Here, our new mail

shows up at the end as new number 6, and the original mail assigned number 4 in the

prior session is gone:

C:\...\PP4E\Internet\Email> pymail.py
Password for pop.secureserver.net?
[Pymail email client]
Connecting...
b'+OK <11563.1273264637@p3pop01-26.prod.phx3.secureserver.net>'
(b'+OK ', [b'1 1860', b'2 1408', b'3 1049', b'4 1038', b'5 957', b'6 1037'], 47)
There are 6 mail messages in 7349 bytes
Retrieving...
1:      1861 bytes
        From    =>lutz@rmi.net
        To      =>pp4e@learning-python.com
        Date    =>Wed, 5 May 2010 11:29:36 -0400 (EDT)
        Subject =>I'm a Lumberjack, and I'm Okay
2:      1409 bytes
        From    =>lutz@learning-python.com
        To      =>PP4E@learning-python.com
        Date    =>Wed, 05 May 2010 08:33:47 -0700
        Subject =>testing
3:      1050 bytes
        From    =>Eric.the.Half.a.Bee@yahoo.com
        To      =>PP4E@learning-python.com
        Date    =>Thu, 06 May 2010 14:11:07 -0000
        Subject =>A B C D E F G
4:      1039 bytes
        From    =>Eric.the.Half.a.Bee@aol.com
        To      =>nobody.in.particular@marketing.com
        Date    =>Thu, 06 May 2010 14:32:32 -0000
        Subject =>a b c d e f g
5:      958 bytes
        From    =>PP4E@learning-python.com
        To      =>maillist
        Date    =>Thu, 06 May 2010 10:58:40 -0400
        Subject =>test interactive smtplib
[Press Enter key]
6:      1038 bytes
        From    =>Cardinal@hotmail.com
        To      =>PP4E@learning-python.com
        Date    =>Fri, 07 May 2010 20:32:38 -0000
        Subject =>Among our weapons are these
[Pymail] Action? (i, l, d, s, m, q, ?) l 6
-------------------------------------------------------------------------------
Nobody Expects the Spanish Inquisition!

-------------------------------------------------------------------------------
[Pymail] Action? (i, l, d, s, m, q, ?) q
Bye.

Though not shown in this session, you can also send to multiple recipients, and include

full name and address pairs in your email addresses. This works just because the script

employs email utilities described earlier to split up addresses and fully parse to allow

commas as both separators and name characters. The following, for example, would

send to two and three recipients, respectively, using mostly full address formats:

[Pymail] Action? (i, l, d, s, m, q, ?) m

From? "moi 1" <pp4e@learning-python.com>

To? "pp 4e" <pp4e@learning-python.com>, "lu,tz" <lutz@learning-python.com>

[Pymail] Action? (i, l, d, s, m, q, ?) m

From? The Book <pp4e@learning-python.com>

To? "pp 4e" <pp4e@learning-python.com>, "lu,tz" <lutz@learning-python.com>,

lutz@rmi.net
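To see why the script parses addresses instead of splitting on commas, the following sketch runs the same email.utils calls that splitaddrs uses on the first To line above, and compares the result with a naive split.

```python
import email.utils

field = '"pp 4e" <pp4e@learning-python.com>, "lu,tz" <lutz@learning-python.com>'

naive = field.split(',')                       # breaks inside the quoted name
pairs = email.utils.getaddresses([field])      # [(name, addr), ...]
formatted = [email.utils.formataddr(pair) for pair in pairs]

print(len(naive), 'pieces from split,', len(pairs), 'addresses from getaddresses')
for addr in formatted:
    print(addr)
```

The formatted strings are what pymail passes to smtplib as the recipient list, and joins with commas for the To header.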

Finally, if you are running this live, you will also find the mail save file on your machine,

containing the one message we asked to be saved in the prior session; it’s simply the

raw text of saved emails, with separator lines. This is both human and machine-

readable—in principle, another script could load saved mail from this file into a Python

list by calling the string object’s split method on the file’s text with the separator line

as a delimiter. As shown in this book, it shows up in file C:\temp\savemail.txt, but you

can configure this as you like in the mailconfig module.
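As a sketch of the reload idea just described, the following writes two made-up messages in the same layout that savemessage uses, and then splits the file's text on the separator line. The names save and load_saved, the temporary file, and the UTF-8 encoding are all assumptions of this example; pymail itself uses the encoding setting in mailconfig.

```python
import os, tempfile

SEPARATOR = '-' * 80 + '\n'             # the line savemessage appends

def save(path, text, encoding='utf-8'):
    """Append one message in pymail's save-file layout."""
    with open(path, 'a', encoding=encoding) as savefile:
        savefile.write('\n' + text + SEPARATOR)

def load_saved(path, encoding='utf-8'):
    """Return a list of the message texts stored in a save file."""
    with open(path, encoding=encoding) as savefile:
        parts = savefile.read().split(SEPARATOR)
    return [part.lstrip('\n') for part in parts if part.strip()]

path = os.path.join(tempfile.mkdtemp(), 'savemail.txt')
save(path, 'Subject: one\n\nfirst body\n')
save(path, 'Subject: two\n\nsecond body\n')
mails = load_saved(path)
print(len(mails), 'messages loaded')
```

A message whose own text happens to contain a line of 80 dashes would be split in the wrong place, so a scheme like this is only as reliable as the separator is unusual.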

The mailtools Utility Package

The email package used by the pymail example of the prior section is a collection of

powerful tools—in fact, perhaps too powerful to remember completely. At the mini-

mum, some reusable boilerplate code for common use cases can help insulate you from

some of its details; by isolating module usage, such code can also ease the migration to

possible future email changes. To simplify email interfacing for more complex mail

clients, and to further demonstrate the use of standard library email tools, I developed

the custom utility modules listed in this section—a package called mailtools.

mailtools is a Python modules package: a directory of code, with one module per tool

class, and an initialization module run when the directory is first imported. This pack-

age’s modules are essentially just a wrapper layer above the standard library’s email

package, as well as its poplib and smtplib modules. They make some assumptions about

the way email is to be used, but they are reasonable and allow us to forget some of the

underlying complexity of the standard library tools employed.

In a nutshell, the mailtools package provides three classes—to fetch, send, and parse

email messages. These classes can be used as superclasses in order to mix in their meth-

ods to an application-specific class, or as standalone or embedded objects that export

their methods for direct calls. We’ll see these classes deployed both ways in this text.
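The two usage modes can be sketched with a toy class. DemoParser and both client classes below are hypothetical stand-ins written for this illustration; they are not the mailtools classes, whose code appears in the listings that follow.

```python
class DemoParser:
    """Hypothetical stand-in for a mailtools-style tool class."""
    def subject_of(self, text):
        for line in text.splitlines():
            if line.lower().startswith('subject:'):
                return line.split(':', 1)[1].strip()
        return '(none)'

# 1) Mix-in: the client inherits the tool's methods
class ConsoleClient(DemoParser):
    def show(self, text):
        return 'Subject => ' + self.subject_of(text)

# 2) Embedded: the client holds a tool object and delegates to it
class GuiClient:
    def __init__(self):
        self.parser = DemoParser()
    def show(self, text):
        return 'Subject => ' + self.parser.subject_of(text)

mail = 'From: a@example.com\nSubject: testing\n\nbody text\n'
print(ConsoleClient().show(mail))
print(GuiClient().show(mail))
print(DemoParser().subject_of(mail))      # 3) standalone: call methods directly
```

Inheritance gives the shortest call syntax, while embedding keeps the tool's names out of the client's namespace; which is preferable depends on the application.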

As a simple example of this package’s tools in action, its selftest.py module serves as

a self-test script. When run, it sends a message from you, to you, which includes the

selftest.py file as an attachment. It also fetches and displays some mail headers and

parsed and unparsed content. These interfaces, along with some user-interface magic,

will lead us to full-blown email clients and websites in later chapters.

Two design notes worth mentioning up front: First, none of the code in this package

knows anything about the user interface it will be used in (console, GUI, web, or other)

or does anything about things like threads; it is just a toolkit. As we’ll see, its clients

are responsible for deciding how it will be deployed. By focusing on just email pro-

cessing here, we simplify the code, as well as the programs that will use it.

Second, each of the main modules in this package illustrates Unicode issues that confront

Python 3.X code, especially when using the 3.1 Python email package:

• The sender must address encodings for the main message text, attachment input

files, saved-mail output files, and message headers.

• The fetcher must resolve full mail text encodings when new mails are fetched.

• The parser must deal with encodings in text part payloads of parsed messages, as

well as those in message headers.

In addition, the sender must provide workarounds for the binary parts generation and
text part creation issues in email described earlier in this chapter. Since these
highlight Unicode factors in general, and cannot be solved as broadly as we might like
because of limitations of the current Python email package, I’ll elaborate on each of
these choices along the way.

The next few sections list mailtools source code. Together, its files consist of roughly

1,050 lines of code, including whitespace and comments. We won’t cover all of this

package’s code in depth—study its listings for more details, and see its self-test module

for a usage example. Also, for more context and examples, watch for the three clients

that will use this package—the modified pymail2.py following this listing, the

PyMailGUI client in Chapter 14, and the PyMailCGI server in Chapter 16. By sharing

and reusing this module, all three systems inherit all its utility, as well as any future

enhancements.

Initialization File

The module in Example 13-21 implements the initialization logic of the mailtools

package; as usual, its code is run automatically the first time a script imports through

the package’s directory. Notice how this file collects the contents of all the nested

modules into the directory’s namespace with from * statements—because mailtools

began life as a single .py file, this provides backward compatibility for existing clients.

We also must use package-relative import syntax here (from .module), because Python

3.X no longer includes the package’s own directory on the module import search path

(only the package’s container is on the path). Since this is the root module, global

comments appear here as well.
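To see the package-relative form in isolation, the following sketch builds a tiny throwaway package in a temporary directory and imports through it. The names demopkg, helper, and greet are invented for this illustration and are not part of mailtools; only the from .module import * pattern mirrors the initialization file.

```python
import importlib
import os
import sys
import tempfile

# build a throwaway package: <tempdir>/demopkg/{__init__.py, helper.py}
root = tempfile.mkdtemp()
pkgdir = os.path.join(root, 'demopkg')          # hypothetical package name
os.mkdir(pkgdir)

with open(os.path.join(pkgdir, 'helper.py'), 'w') as f:
    f.write("def greet():\n    return 'hello'\n")

with open(os.path.join(pkgdir, '__init__.py'), 'w') as f:
    f.write("from .helper import *\n")          # package-relative, as in mailtools

# only the package's container directory goes on the search path
sys.path.insert(0, root)
importlib.invalidate_caches()

import demopkg
result = demopkg.greet()                        # name collected into the package
```

Because the initialization file copies the nested module's names up a level, a client can call demopkg.greet directly, much as clients of the original single-file mailtools module did.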

Example 13-21. PP4E\Internet\Email\mailtools\__init__.py

"""

##################################################################################

mailtools package: interface to mail server transfers, used by pymail2, PyMailGUI,

and PyMailCGI; does loads, sends, parsing, composing, and deleting, with part

attachments, encodings (of both the email and Unicode kind), etc.; the parser,

fetcher, and sender classes here are designed to be mixed-in to subclasses which

use their methods, or used as embedded or standalone objects;

this package also includes convenience subclasses for silent mode, and more;

loads all mail text if pop server doesn't do top; doesn't handle threads or UI

here, and allows askPassword to differ per subclass; progress callback funcs get

status; all calls raise exceptions on error--client must handle in GUI/other;

this changed from file to package: nested modules imported here for bw compat;

4E: need to use package-relative import syntax throughout, because in Py 3.X

package dir is no longer on module import search path if package is imported

elsewhere (from another directory which uses this package); also performs

Unicode decoding on mail text when fetched (see mailFetcher), as well as for

some text part payloads which might have been email-encoded (see mailParser);

TBD: in saveparts, should file be opened in text mode for text/ contypes?

TBD: in walkNamedParts, should we skip oddballs like message/delivery-status?

TBD: Unicode support has not been tested exhaustively: see Chapter 13 for more

on the Py3.1 email package and its limitations, and the policies used here;

##################################################################################

"""

# collect contents of all modules here, when package dir imported directly

from .mailFetcher import *

from .mailSender import * # 4E: package-relative

from .mailParser import *

# export nested modules here, when from mailtools import *

__all__ = 'mailFetcher', 'mailSender', 'mailParser'

# self-test code is in selftest.py to allow mailconfig's path

# to be set before running the nested module imports above

MailTool Class

Example 13-22 contains common superclasses for the other classes in the package. This

is in part meant for future expansion. At present, these are used only to enable or disable

trace message output (some clients, such as web-based programs, may not want text

to be printed to the output stream). Subclasses mix in the silent variant to turn off

output.

Example 13-22. PP4E\Internet\Email\mailtools\mailTool.py

"""

###############################################################################
common superclasses: used to turn trace messages on/off
###############################################################################
"""

class MailTool:                    # superclass for all mail tools
    def trace(self, message):      # redef me to disable or log to file
        print(message)

class SilentMailTool:              # to mixin instead of subclassing
    def trace(self, message):
        pass
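
The mix-in ordering is what makes the silent variant work: listing it first places its trace ahead of MailTool's in the method resolution order. The following is a minimal sketch of that effect; the Sender and SilentSender classes are hypothetical stand-ins, not classes in this package.

```python
class MailTool:
    def trace(self, message):
        print(message)

class SilentMailTool:
    def trace(self, message):
        pass

class Sender(MailTool):                      # hypothetical tool that traces
    def send(self):
        self.trace('sending')
        return 'sent'

class SilentSender(SilentMailTool, Sender):  # mix-in listed first wins
    pass

if __name__ == '__main__':
    Sender().send()          # prints the trace message
    SilentSender().send()    # prints nothing
```

Reversing the superclass order in SilentSender would restore the printed trace, since Python searches superclasses left to right.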

MailSender Class

The class used to compose and send messages is coded in Example 13-23. This module

provides a convenient interface that combines standard library tools we’ve already met

in this chapter: the email package to compose messages with attachments and

encodings, and the smtplib module to send the resulting email text. Attachments are

passed in as a list of filenames; MIME types and any required encodings are

determined automatically with the module mimetypes. Moreover, date and time strings are

automated with an email.utils call, and non-ASCII headers are encoded per email,

MIME, and Unicode standards. Study this file’s code and comments for more on its

operation.
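
As a quick preview of two of the standard library calls involved, here is a minimal sketch of how a filename might be mapped to a MIME type and how a date header string can be generated. The guess_content_type helper is a name made up for this example; it follows the same fallback rule the listing uses for unknown or compressed files.

```python
import mimetypes
import email.utils

def guess_content_type(filename):
    """Return [maintype, subtype] guessed from the filename's extension."""
    contype, encoding = mimetypes.guess_type(filename)
    if contype is None or encoding is not None:     # no guess, or compressed
        contype = 'application/octet-stream'        # generic binary default
    return contype.split('/', 1)

maintype, subtype = guess_content_type('notes.txt')
datehdr = email.utils.formatdate()                  # current time, RFC 2822 form

if __name__ == '__main__':
    print(maintype, subtype)
    print(datehdr)
```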

Unicode issues for attachments, save files, and headers

This is also where we open and add attachment files, generate message text, and save

sent messages to a local file. Most attachment files are opened in binary mode, but as

we’ve seen, some text attachments must be opened in text mode because the current

email package requires them to be str strings when message objects are created. As we

also saw earlier, the email package requires attachments to be str text when mail text

is later generated, possibly as the result of MIME encoding.
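
The text-versus-binary choice depends on how email plans to MIME-encode a given Unicode encoding, which its Charset objects record. The sketch below simply inspects that attribute for three common encodings; needs_str is a hypothetical name, and the str-or-bytes rule it encodes reflects the Python 3.1 behavior described in this chapter, which later email releases may not share.

```python
from email.charset import Charset, BASE64, QP

def needs_str(encodingname):
    """True if this encoding's text is left as is or sent quoted-printable."""
    bodyenc = Charset(encodingname).body_encoding
    return bodyenc in (None, QP)

if __name__ == '__main__':
    for name in ('us-ascii', 'latin-1', 'utf-8'):
        print(name, Charset(name).body_encoding, needs_str(name))
```

Here, us-ascii has no body encoding and latin-1 uses quoted-printable, so both fall in the str group, while utf-8 is Base64 encoded and falls in the bytes group.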

To satisfy these constraints with the Python 3.1 email package, we must apply the two

fixes described earlier: part file open calls select between text or binary mode (and

thus read str or bytes) based upon the way email will process the data, and MIME

encoding calls for binary data are augmented to decode the result to ASCII text. The

latter of these also splits the Base64 text into lines here for binary parts (unlike email),

because it is otherwise sent as one long line, which may work in some contexts, but

causes problems in some text editors if the raw text is viewed.
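
The line-splitting half of this fix is easy to try on its own. The following sketch uses the base64 module directly rather than email's encoders, so it is an illustration of the idea, not the package's code; base64_lines is a name invented here.

```python
import base64

def base64_lines(data, linelen=76):
    """Base64-encode bytes, returning str text split into short lines."""
    text = base64.b64encode(data).decode('ascii')    # bytes -> ASCII str
    chunks = [text[i:i + linelen] for i in range(0, len(text), linelen)]
    return '\n'.join(chunks)

encoded = base64_lines(bytes(range(256)))

if __name__ == '__main__':
    print(encoded)
```

The line breaks do not change the data: decoding the result with base64.b64decode yields the original bytes again.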

Beyond these fixes, clients may optionally provide the names of the Unicode encoding

schemes associated with the main text part and each text attachment part. In

Chapter 14’s PyMailGUI, this is controlled in the mailconfig user settings module, with

UTF-8 used as a fallback default whenever user settings fail to encode a text part. We

could in principle also catch part file decoding errors and return an error indicator string

(as we do for received mails in the mail fetcher ahead), but sending an invalid

attachment is much more grievous than displaying one. Instead, the send request fails entirely

on errors.
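
The UTF-8 fallback mentioned above is a client-side policy, so it does not appear in this package's code. A client might implement it roughly as follows; this is a sketch under that assumption, and encode_with_fallback is a hypothetical helper rather than PyMailGUI's actual function.

```python
def encode_with_fallback(text, userencoding, fallback='utf-8'):
    """Encode text per the user setting, else per the fallback encoding.

    Returns (bytes, name of the encoding that was actually used).
    """
    try:
        return text.encode(userencoding), userencoding
    except (UnicodeError, LookupError):       # can't encode, or bad name
        return text.encode(fallback), fallback

if __name__ == '__main__':
    print(encode_with_fallback('spam', 'ascii'))
    print(encode_with_fallback('caf\u00e9', 'ascii'))
```

Returning the encoding name along with the bytes lets the caller pass the name that really applied on to the sender, so the part's declared character set matches its content.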

Finally, there is also new support for encoding non-ASCII headers (both full headers

and names of email addresses) per a client-selectable encoding that defaults to UTF-8,

and the sent message save file is opened in the same mailconfig Unicode encoding mode

used to decode messages when they are fetched.
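
To make the header encoding step concrete, here is a small sketch that encodes a subject and the name part of an address only when they contain non-ASCII characters. It is a simplified illustration using the same email.header and email.utils tools; encode_if_needed and the sample address are made up for this example.

```python
from email.header import make_header, decode_header
from email.utils import formataddr

def encode_if_needed(text, charset='utf-8'):
    """Return text unchanged if pure ASCII, else as a MIME encoded word."""
    try:
        text.encode('ascii')
        return text
    except UnicodeError:
        return make_header([(text, charset)]).encode()

subject = encode_if_needed('Caf\u00e9 menu')             # full header value
sender = formataddr((encode_if_needed('Jos\u00e9'),      # name part only
                     'jose@example.com'))

if __name__ == '__main__':
    print(subject)
    print(sender)
    print(str(make_header(decode_header(subject))))      # round trip to text
```

Only the name is encoded in the address; the address part itself is left alone, for the reason given in the listing's comments.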

The latter policy for sent mail saves is used because the sent file may be opened to fetch

full mail text in this encoding later by clients which apply this encoding scheme. This

is intended to mirror the way that clients such as PyMailGUI save full message text in

local files to be opened and parsed later. It might fail if the mail fetcher resorted to

guessing a different and incompatible encoding, and it assumes that no message gives

rise to incompatibly encoded data in the file across multiple sessions. We could instead

keep one save file per encoding, but encodings for full message text probably will not

vary; ASCII was the original standard for full mail text, so 7- or 8-bit text is likely.

Example 13-23. PP4E\Internet\Email\mailtools\mailSender.py

"""

###############################################################################

send messages, add attachments (see __init__ for docs, test)

###############################################################################

"""

import mailconfig # client's mailconfig

import smtplib, os, mimetypes # mime: name to type

import email.utils, email.encoders, email.header # date string, base64, hdr encoding

from .mailTool import MailTool, SilentMailTool # 4E: package-relative

from email.message import Message # general message, obj->text

from email.mime.multipart import MIMEMultipart # type-specific messages

from email.mime.audio import MIMEAudio # format/encode attachments

from email.mime.image import MIMEImage

from email.mime.text import MIMEText

from email.mime.base import MIMEBase

from email.mime.application import MIMEApplication # 4E: use new app class

def fix_encode_base64(msgobj):
    """
    4E: workaround for a genuine bug in Python 3.1 email package that prevents
    mail text generation for binary parts encoded with base64 or other email
    encodings; the normal email.encoder run by the constructor leaves payload
    as bytes, even though it's encoded to base64 text form; this breaks email
    text generation which assumes this is text and requires it to be str; net
    effect is that only simple text part emails can be composed in Py 3.1 email
    package as is - any MIME-encoded binary part causes mail text generation to
    fail; this bug seems likely to go away in a future Python and email package,
    in which case this should become a no-op; see Chapter 13 for more details;
    """
    linelen = 76                         # per MIME standards
    from email.encoders import encode_base64

    encode_base64(msgobj)                # what email does normally: leaves bytes
    text = msgobj.get_payload()          # bytes fails in email pkg on text gen
    if isinstance(text, bytes):          # payload is bytes in 3.1, str in 3.2 alpha
        text = text.decode('ascii')      # decode to unicode str so text gen works

    lines = []                           # split into lines, else 1 massive line
    text = text.replace('\n', '')        # no \n present in 3.1, but futureproof me!
    while text:
        line, text = text[:linelen], text[linelen:]
        lines.append(line)
    msgobj.set_payload('\n'.join(lines))

def fix_text_required(encodingname):
    """
    4E: workaround for str/bytes combination errors in email package; MIMEText
    requires different types for different Unicode encodings in Python 3.1, due
    to the different ways it MIME-encodes some types of text; see Chapter 13;
    the only other alternative is using generic Message and repeating much code;
    """
    from email.charset import Charset, BASE64, QP

    charset = Charset(encodingname)      # how email knows what to do for encoding
    bodyenc = charset.body_encoding      # utf8, others require bytes input data
    return bodyenc in (None, QP)         # ascii, latin1, others require str

class MailSender(MailTool):
    """
    send mail: format a message, interface with an SMTP server;
    works on any machine with Python+Inet, doesn't use cmdline mail;
    a nonauthenticating client: see MailSenderAuth if login required;
    4E: tracesize is num chars of msg text traced: 0=none, big=all;
    4E: supports Unicode encodings for main text and text parts;
    4E: supports header encoding, both full headers and email names;
    """
    def __init__(self, smtpserver=None, tracesize=256):
        self.smtpServerName = smtpserver or mailconfig.smtpservername
        self.tracesize = tracesize

    def sendMessage(self, From, To, Subj, extrahdrs, bodytext, attaches,
                          saveMailSeparator=(('=' * 80) + 'PY\n'),
                          bodytextEncoding='us-ascii',
                          attachesEncodings=None):
        """
        format and send mail: blocks caller, thread me in a GUI;
        bodytext is main text part, attaches is list of filenames,
        extrahdrs is list of (name, value) tuples to be added;
        raises uncaught exception if send fails for any reason;
        saves sent message text in a local file if successful;

        assumes that To, Cc, Bcc hdr values are lists of 1 or more already
        decoded addresses (possibly in full name+<addr> format); client
        must parse to split these on delimiters, or use multiline input;
        note that SMTP allows full name+<addr> format in recipients;
        4E: Bcc addrs now used for send/envelope, but header is dropped;
        4E: duplicate recipients removed, else will get >1 copies of mail;
        caveat: no support for multipart/alternative mails, just /mixed;
        """

        # 4E: assume main body text is already in desired encoding;
        # clients can decode to user pick, default, or utf8 fallback;
        # either way, email needs either str xor bytes specifically;

        if fix_text_required(bodytextEncoding):
            if not isinstance(bodytext, str):
                bodytext = bodytext.decode(bodytextEncoding)
        else:
            if not isinstance(bodytext, bytes):
                bodytext = bodytext.encode(bodytextEncoding)

        # make message root
        if not attaches:
            msg = Message()
            msg.set_payload(bodytext, charset=bodytextEncoding)
        else:
            msg = MIMEMultipart()
            self.addAttachments(msg, bodytext, attaches,
                                bodytextEncoding, attachesEncodings)

        # 4E: non-ASCII hdrs encoded on sends; encode just name in address,
        # else smtp may drop the message completely; encodes all envelope
        # To names (but not addr) also, and assumes servers will allow;
        # msg.as_string retains any line breaks added by encoding headers;

        hdrenc = mailconfig.headersEncodeTo or 'utf-8'           # default=utf8
        Subj = self.encodeHeader(Subj, hdrenc)                   # full header
        From = self.encodeAddrHeader(From, hdrenc)               # email names
        To = [self.encodeAddrHeader(T, hdrenc) for T in To]      # each recip
        Tos = ', '.join(To)                                      # hdr+envelope

        # add headers to root
        msg['From'] = From
        msg['To'] = Tos                              # poss many: addr list
        msg['Subject'] = Subj                        # servers reject ';' sept
        msg['Date'] = email.utils.formatdate()       # curr datetime, rfc2822 utc
        recip = To
        for name, value in extrahdrs:                # Cc, Bcc, X-Mailer, etc.
            if value:
                if name.lower() not in ['cc', 'bcc']:
                    value = self.encodeHeader(value, hdrenc)
                    msg[name] = value
                else:
                    value = [self.encodeAddrHeader(V, hdrenc) for V in value]
                    recip += value                       # some servers reject ['']
                    if name.lower() != 'bcc':            # 4E: bcc gets mail, no hdr
                        msg[name] = ', '.join(value)     # add commas between cc

        recip = list(set(recip))                         # 4E: remove duplicates
        fullText = msg.as_string()                       # generate formatted msg

        # sendmail call raises except if all Tos failed,
        # or returns failed Tos dict for any that failed

        self.trace('Sending to...' + str(recip))
        self.trace(fullText[:self.tracesize])                # SMTP calls connect
        server = smtplib.SMTP(self.smtpServerName)           # this may fail too
        self.getPassword()                                   # if srvr requires
        self.authenticateServer(server)                      # login in subclass
        try:
            failed = server.sendmail(From, recip, fullText)  # except or dict
        except:
            server.close()                                   # 4E: quit may hang!
            raise                                            # reraise except
        else:
            server.quit()                                    # connect + send OK
        self.saveSentMessage(fullText, saveMailSeparator)    # 4E: do this first
        if failed:
            class SomeAddrsFailed(Exception): pass
            raise SomeAddrsFailed('Failed addrs:%s\n' % failed)
        self.trace('Send exit')

    def addAttachments(self, mainmsg, bodytext, attaches,
                             bodytextEncoding, attachesEncodings):
        """
        format a multipart message with attachments;
        use Unicode encodings for text parts if passed;
        """
        # add main text/plain part
        msg = MIMEText(bodytext, _charset=bodytextEncoding)
        mainmsg.attach(msg)

        # add attachment parts
        encodings = attachesEncodings or (['us-ascii'] * len(attaches))
        for (filename, fileencode) in zip(attaches, encodings):

            # filename may be absolute or relative
            if not os.path.isfile(filename):             # skip dirs, etc.
                continue

            # guess content type from file extension, ignore encoding
            contype, encoding = mimetypes.guess_type(filename)
            if contype is None or encoding is not None:  # no guess, compressed?
                contype = 'application/octet-stream'     # use generic default
            self.trace('Adding ' + contype)

            # build sub-Message of appropriate kind
            maintype, subtype = contype.split('/', 1)
            if maintype == 'text':                       # 4E: text needs encoding
                if fix_text_required(fileencode):        # requires str or bytes
                    data = open(filename, 'r', encoding=fileencode)
                else:
                    data = open(filename, 'rb')
                msg = MIMEText(data.read(), _subtype=subtype, _charset=fileencode)
                data.close()

            elif maintype == 'image':
                data = open(filename, 'rb')              # 4E: use fix for binaries
                msg = MIMEImage(
                    data.read(), _subtype=subtype, _encoder=fix_encode_base64)
                data.close()

            elif maintype == 'audio':
                data = open(filename, 'rb')
                msg = MIMEAudio(
                    data.read(), _subtype=subtype, _encoder=fix_encode_base64)
                data.close()

            elif maintype == 'application':              # new in 4E
                data = open(filename, 'rb')
                msg = MIMEApplication(
                    data.read(), _subtype=subtype, _encoder=fix_encode_base64)
                data.close()

            else:
                data = open(filename, 'rb')              # application/* could
                msg = MIMEBase(maintype, subtype)        # use this code too
                msg.set_payload(data.read())
                data.close()                             # make generic type
                fix_encode_base64(msg)                   # was broken here too!
                #email.encoders.encode_base64(msg)       # encode using base64

            # set filename and attach to container
            basename = os.path.basename(filename)
            msg.add_header('Content-Disposition',
                           'attachment', filename=basename)
            mainmsg.attach(msg)

        # text outside mime structure, seen by non-MIME mail readers
        mainmsg.preamble = 'A multi-part MIME format message.\n'
        mainmsg.epilogue = ''  # make sure message ends with a newline

    def saveSentMessage(self, fullText, saveMailSeparator):
        """
        append sent message to local file if send worked for any;
        client: pass separator used for your application, splits;
        caveat: user may change the file at same time (unlikely);
        """
        try:
            sentfile = open(mailconfig.sentmailfile, 'a',
                            encoding=mailconfig.fetchEncoding)   # 4E
            if fullText[-1] != '\n': fullText += '\n'
            sentfile.write(saveMailSeparator)
            sentfile.write(fullText)
            sentfile.close()
        except:
            self.trace('Could not save sent message')    # not a show-stopper

    def encodeHeader(self, headertext, unicodeencoding='utf-8'):
        """
        4E: encode composed non-ascii message headers content per both email
        and Unicode standards, according to an optional user setting or UTF-8;
        header.encode adds line breaks in header string automatically if needed;
        """
        try:
            headertext.encode('ascii')
        except:
            try:
                hdrobj = email.header.make_header([(headertext, unicodeencoding)])
                headertext = hdrobj.encode()
            except:
                pass         # auto splits into multiple cont lines if needed
        return headertext    # smtplib may fail if it won't encode to ascii

    def encodeAddrHeader(self, headertext, unicodeencoding='utf-8'):
        """
        4E: try to encode non-ASCII names in email addresses per email, MIME,
        and Unicode standards; if this fails drop name and use just addr part;
        if cannot even get addresses, try to decode as a whole, else smtplib
        may run into errors when it tries to encode the entire mail as ASCII;
        utf-8 default should work for most, as it formats code points broadly;

        inserts newlines if too long or hdr.encode split names to multiple lines,
        but this may not catch some lines longer than the cutoff (improve me);
        as used, Message.as_string formatter won't try to break lines further;
        see also decodeAddrHeader in mailParser module for the inverse of this;
        """
        try:
            pairs = email.utils.getaddresses([headertext])   # split addrs + parts
            encoded = []
            for name, addr in pairs:
                try:
                    name.encode('ascii')         # use as is if okay as ascii
                except UnicodeError:             # else try to encode name part
                    try:
                        uni = name.encode(unicodeencoding)
                        hdr = email.header.make_header([(uni, unicodeencoding)])
                        name = hdr.encode()
                    except:
                        name = None              # drop name, use address part only
                joined = email.utils.formataddr((name, addr))  # quote name if need
                encoded.append(joined)

            fullhdr = ', '.join(encoded)
            if len(fullhdr) > 72 or '\n' in fullhdr:     # not one short line?
                fullhdr = ',\n '.join(encoded)           # try multiple lines
            return fullhdr
        except:
            return self.encodeHeader(headertext)

    def authenticateServer(self, server):
        pass  # no login required for this server/class

    def getPassword(self):
        pass  # no login required for this server/class

################################################################################
# specialized subclasses
################################################################################

class MailSenderAuth(MailSender):
    """
    use for servers that require login authorization;
    client: choose MailSender or MailSenderAuth super
    class based on mailconfig.smtpuser setting (None?)
    """
    smtpPassword = None    # 4E: on class, not self, shared by poss N instances

    def __init__(self, smtpserver=None, smtpuser=None):
        MailSender.__init__(self, smtpserver)
        self.smtpUser = smtpuser or mailconfig.smtpuser
        #self.smtpPassword = None   # 4E: makes PyMailGUI ask for each send!

    def authenticateServer(self, server):
        server.login(self.smtpUser, self.smtpPassword)

    def getPassword(self):
        """
        get SMTP auth password if not yet known;
        may be called by superclass auto, or client manual:
        not needed until send, but don't run in GUI thread;
        get from client-side file or subclass method
        """
        if not self.smtpPassword:
            try:
                localfile = open(mailconfig.smtppasswdfile)
                MailSenderAuth.smtpPassword = localfile.readline()[:-1]    # 4E
                self.trace('local file password' + repr(self.smtpPassword))
            except:
                MailSenderAuth.smtpPassword = self.askSmtpPassword()       # 4E

    def askSmtpPassword(self):
        assert False, 'Subclass must define method'

class MailSenderAuthConsole(MailSenderAuth):
    def askSmtpPassword(self):
        import getpass
        prompt = 'Password for %s on %s?' % (self.smtpUser, self.smtpServerName)
        return getpass.getpass(prompt)

class SilentMailSender(SilentMailTool, MailSender):
    pass   # replaces trace

MailFetcher Class

The class defined in Example 13-24 does the work of interfacing with a POP email

server—loading, deleting, and synchronizing. This class merits a few additional words

of explanation.

General usage

This module deals strictly in email text; parsing email after it has been fetched is

delegated to a different module in the package. Moreover, this module doesn’t cache already

loaded information; clients must add their own mail-retention tools if desired. Clients

must also provide password input methods or pass one in, if they cannot use the console

input subclass here (e.g., GUIs and web-based programs).

The loading and deleting tasks use the standard library poplib module in ways we saw

earlier in this chapter, but notice that there are interfaces for fetching just message

header text with the TOP action in POP if the mail server supports it. This can save

substantial time if clients need to fetch only basic details for an email index. In addition,

the header and full-text fetchers are equipped to load just mails newer than a particular

number (useful once an initial load is run), and to restrict fetches to a fixed-sized set of

the most recently arrived emails (useful for large inboxes with slow Internet access

or servers).

This module also supports the notion of progress indicators—for methods that perform

multiple downloads or deletions, callers may pass in a function that will be called as

each mail is processed. This function will receive the current and total step numbers.

It’s left up to the caller to render this in a GUI, console, or other user interface.
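
The progress protocol is just a function call per step, as the following self-contained sketch shows. The download_all and show_progress names, and the fetch argument standing in for a server transfer, are assumptions made for this illustration; the package's own methods pass message numbers and counts in a similar way.

```python
def download_all(msgnums, fetch, progress=None):
    """Fetch each message, reporting (current, total) before each step."""
    results = []
    total = len(msgnums)
    for count, msgnum in enumerate(msgnums, start=1):
        if progress:
            progress(count, total)           # caller decides how to display
        results.append(fetch(msgnum))
    return results

def show_progress(current, total):           # a console-style callback
    print('%d of %d' % (current, total))

if __name__ == '__main__':
    fake_fetch = lambda msgnum: 'text of message %d' % msgnum
    download_all([1, 2, 3], fake_fetch, progress=show_progress)
```

A GUI client could pass a callback that updates a progress dialog instead, without any change to the download code.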

Unicode decoding for full mail text on fetches

Additionally, this module is where we apply the session-wide message bytes Unicode

decoding policy required for parsing, as discussed earlier in this chapter. This decoding

uses an encoding name user setting in the mailconfig module, followed by heuristics.

Because this decoding is performed immediately when a mail is fetched, all clients of

this package can assume message text is str Unicode strings, including for any later

parsing, display, or save operations. In addition to the mailconfig setting, we also apply a

few guesses with common encoding types, though this may lead to problems if mails

decoded by guessing cannot be written to mail save files using the mailconfig setting.

As described, this session-wide approach to encodings is not ideal, but it can be adjusted

per client session and reflects the current limitations of email in Python 3.1—its parser

requires already decoded Unicode strings, but fetches return bytes. If this decoding

fails, as a last resort we attempt to decode headers only, as either ASCII (or other

common format) text or the platform default, and insert an error message in the email

body—a heuristic that attempts to avoid killing clients with exceptions if possible (see

file _test-decoding.py in the examples package for a test of this logic). In practice, an 8-

bit Unicode encoding such as Latin-1 will probably suffice in most cases, because ASCII

was the original requirement of email standards.
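
The try-each-encoding-in-turn idea can be exercised without a mail server. The following is a simplified sketch of it, not the package's method: decode_lines is a hypothetical function that also reports which encoding succeeded, and the sample lines are invented.

```python
import sys

def decode_lines(lines, preferred='latin1'):
    """Decode a list of bytes lines, trying the preferred encoding first.

    Returns (list of str, encoding used), or (None, None) if all fail.
    """
    kinds = [preferred, 'ascii', 'latin1', 'utf8', sys.getdefaultencoding()]
    for kind in kinds:
        try:
            return [line.decode(kind) for line in lines], kind
        except (UnicodeError, LookupError):   # LookupError: unknown name
            pass
    return None, None

if __name__ == '__main__':
    raw = [b'Subject: lunch', b'', 'caf\u00e9'.encode('utf8')]
    print(decode_lines(raw, preferred='utf8'))
    print(decode_lines(raw, preferred='ascii'))
```

Because Latin-1 assigns a character to every byte value, a list that includes it cannot fail past that point; the price is that text in another encoding may decode without error but display incorrectly, as the second call shows.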

In principle, we could try to search for encoding information in message headers if it’s

present, by parsing mails partially ourselves. We might then take a per-message instead

of per-session approach to decoding full text, and associate an encoding type with each

mail for later processing such as saves, though this raises further complications, as a

save file can have just one (compatible) encoding, not one per message. Moreover,

character sets in email headers may refer to individual components, not the entire

email’s text. Since most mails will conform to 7- or 8-bit standards, and since a future

email release will likely address this issue, extra complexity is probably not warranted

for this case in this book.

Also keep in mind that the Unicode decoding performed here is for the entire mail text

fetched from a server. Really, this is just one part of the email encoding story in the

Unicode-aware world of today. In addition:

• Payloads of parsed message parts may still be returned as bytes and require special

handling or further Unicode decoding (see the parser module ahead).

• Text parts and attachments in composed mails impose encoding choices as well

(see the sender module earlier).

• Message headers have their own encoding conventions, and may be both MIME

and Unicode encoded if internationalized (see both the parser and sender

modules).
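
As a brief illustration of the last point, the sketch below decodes a header that is both MIME and Unicode encoded. It is a minimal example built on email.header.decode_header, not the parser module's code; decode_header_text is a hypothetical name, and the sample header is the Base64 form of a short UTF-8 string.

```python
from email.header import decode_header

def decode_header_text(rawheader):
    """Decode MIME encoded words in a header value to a plain str."""
    parts = []
    for text, charset in decode_header(rawheader):
        if isinstance(text, bytes):                    # encoded or raw bytes
            text = text.decode(charset or 'ascii', errors='replace')
        parts.append(text)
    return ''.join(parts)

if __name__ == '__main__':
    print(decode_header_text('=?utf-8?b?Q2Fmw6k=?='))
    print(decode_header_text('plain subject'))
```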

Inbox synchronization tools

When you start studying this example, you’ll also notice that Example 13-24 devotes

substantial code to detecting synchronization errors between an email list held by a

client and the current state of the inbox at the POP email server. Normally, POP assigns

relative message numbers to email in the inbox, and only adds newly arrived emails to

the end of the inbox. As a result, relative message numbers from an earlier fetch may

usually be used to delete and fetch in the future.

However, although rare, it is not impossible for the server’s inbox to change in ways

that invalidate previously fetched message numbers. For instance, emails may be

deleted in another client, and the server itself may move mails from the inbox to an

undeliverable state on download errors (this may vary per ISP). In both cases, email may

be removed from the middle of the inbox, throwing some prior relative message

numbers out of sync with the server.

This situation can result in fetching the wrong message in an email client—users receive

a different message than the one they thought they had selected. Worse, this can make

deletions inaccurate—if a mail client uses a relative message number in a delete request,

the wrong mail may be deleted if the inbox has changed since the index was fetched.

To assist clients, Example 13-24 includes tools that match message headers on

deletions to ensure accuracy and perform general inbox synchronization tests on demand.

These tools are useful only to clients that retain the fetched email list as state informa-

tion. We’ll use these in the PyMailGUI client in Chapter 14. There, deletions use the

safe interface, and loads run the on-demand synchronization test; on detection of

synchronization errors, the inbox index is automatically reloaded. For now, see the

Example 13-24 source code and comments for more details.

Note that the synchronization tests try a variety of matching techniques, but require

the complete headers text and, in the worst case, must parse headers and match many

header fields. In many cases, the single previously fetched message-id header field

would be sufficient for matching against messages in the server’s inbox. However,

because this field is optional and can be forged to have any value, it might not always be

a reliable way to identify messages. In other words, a same-valued message-id may not

suffice to guarantee a match, although it can be used to identify a mismatch; in

Example 13-24, the message-id is used to rule out a match if either message has one, and they

differ in value. This test is performed before falling back on slower parsing and multiple

header matches.
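
The matching rule just described can be sketched in a few lines. This is a simplified illustration, not the listing's implementation: headers_may_match is a hypothetical function, and the set of fallback fields compared here is an assumption chosen for the example.

```python
from email.parser import Parser

def headers_may_match(hdrtext1, hdrtext2):
    """Compare two header texts: message-id can veto, other fields confirm."""
    msg1 = Parser().parsestr(hdrtext1, headersonly=True)
    msg2 = Parser().parsestr(hdrtext2, headersonly=True)

    # a differing message-id rules out a match if either mail has one
    id1, id2 = msg1.get('Message-Id'), msg2.get('Message-Id')
    if (id1 or id2) and id1 != id2:
        return False

    # a same-valued (or absent) id is not proof: compare other headers too
    for field in ('From', 'To', 'Subject', 'Date'):     # assumed field set
        if msg1.get(field) != msg2.get(field):
            return False
    return True

if __name__ == '__main__':
    saved = 'Message-Id: <1@example.com>\nFrom: a@example.com\nSubject: hi\n\n'
    moved = 'Message-Id: <2@example.com>\nFrom: a@example.com\nSubject: hi\n\n'
    print(headers_may_match(saved, saved), headers_may_match(saved, moved))
```

A client would run a test like this against the headers of the message currently at a given number on the server before trusting that number for a delete.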

Example 13-24. PP4E\Internet\Email\mailtools\mailFetcher.py

"""

###############################################################################

retrieve, delete, match mail from a POP server (see __init__ for docs, test)

###############################################################################

"""

import poplib, mailconfig, sys # client's mailconfig on sys.path

print('user:', mailconfig.popusername) # script dir, pythonpath, changes

from .mailParser import MailParser # for headers matching (4E: .)

from .mailTool import MailTool, SilentMailTool # trace control supers (4E: .)

# index/server msgnum out of synch tests

class DeleteSynchError(Exception): pass # msg out of synch in del

class TopNotSupported(Exception): pass # can't run synch test

class MessageSynchError(Exception): pass # index list out of sync

class MailFetcher(MailTool):
    """
    fetch mail: connect, fetch headers+mails, delete mails
    works on any machine with Python+Inet; subclass me to cache
    implemented with the POP protocol; IMAP requires new class;
    4E: handles decoding of full mail text on fetch for parser;
    """
    def __init__(self, popserver=None, popuser=None, poppswd=None, hastop=True):
        self.popServer = popserver or mailconfig.popservername
        self.popUser = popuser or mailconfig.popusername
        self.srvrHasTop = hastop
        self.popPassword = poppswd           # ask later if None

    def connect(self):
        self.trace('Connecting...')
        self.getPassword()                   # file, GUI, or console
        server = poplib.POP3(self.popServer)
        server.user(self.popUser)            # connect,login POP server
        server.pass_(self.popPassword)       # pass is a reserved word
        self.trace(server.getwelcome())      # print returned greeting
        return server

    # use setting in client's mailconfig on import search path;
    # to tailor, this can be changed in class or per instance;
    fetchEncoding = mailconfig.fetchEncoding

    def decodeFullText(self, messageBytes):
        """
        4E, Py3.1: decode full fetched mail text bytes to str Unicode string;
        done at fetch, for later display or parsing (full mail text is always
        Unicode thereafter); decode with per-class or per-instance setting, or
        common types; could also try headers inspection, or intelligent guess
        from structure; in Python 3.2/3.3, this step may not be required: if so,
        change to return message line list intact; for more details see Chapter 13;

        an 8-bit encoding such as latin-1 will likely suffice for most emails, as
        ASCII is the original standard; this method applies to entire/full message
        text, which is really just one part of the email encoding story: Message
        payloads and Message headers may also be encoded per email, MIME, and
        Unicode standards; see Chapter 13 and mailParser and mailSender for more;
        """
        text = None
        kinds = [self.fetchEncoding]              # try user setting first
        kinds += ['ascii', 'latin1', 'utf8']      # then try common types
        kinds += [sys.getdefaultencoding()]       # and platform dflt (may differ)
        for kind in kinds:                        # may cause mail saves to fail
            try:
                text = [line.decode(kind) for line in messageBytes]
                break
            except (UnicodeError, LookupError):   # LookupError: bad name
                pass

        if text is None:
            # try returning headers + error msg, else except may kill client;
            # still try to decode headers per ascii, other, platform default;

            blankline = messageBytes.index(b'')
            hdrsonly = messageBytes[:blankline]
            commons = ['ascii', 'latin1', 'utf8']
            for common in commons:
                try:
                    text = [line.decode(common) for line in hdrsonly]
                    break
                except UnicodeError:
                    pass
            else:                                                 # none worked
                try:
                    text = [line.decode() for line in hdrsonly]   # platform dflt?
                except UnicodeError:
                    text = ['From: (sender of unknown Unicode format headers)']
            text += ['', '--Sorry: mailtools cannot decode this mail content!--']
        return text

    def downloadMessage(self, msgnum):
        """
        load full raw text of one mail msg, given its
        POP relative msgnum; caller must parse content
        """
        self.trace('load ' + str(msgnum))
        server = self.connect()
        try:
            resp, msglines, respsz = server.retr(msgnum)
        finally:
            server.quit()
        msglines = self.decodeFullText(msglines)    # raw bytes to Unicode str
        return '\n'.join(msglines)                  # concat lines for parsing

    def downloadAllHeaders(self, progress=None, loadfrom=1):
        """
        get sizes, raw header text only, for all or new msgs
        begins loading headers from message number loadfrom
        use loadfrom to load newly arrived mails only
        use downloadMessage to get a full msg text later
        progress is a function called with (count, total);
        returns: [headers text], [mail sizes], loadedfull?
        4E: add mailconfig.fetchlimit to support large email
        inboxes: if not None, only fetches that many headers,
        and returns others as dummy/empty mail; else inboxes
        like one of mine (4K emails) are not practical to use;
        4E: pass loadfrom along to downloadAllMessages (a buglet);
        """
        if not self.srvrHasTop:                    # not all servers support TOP
            # naively load full msg text
            return self.downloadAllMessages(progress, loadfrom)
        else:
            self.trace('loading headers')
            fetchlimit = mailconfig.fetchlimit
            server = self.connect()                # mbox now locked until quit
            try:
                resp, msginfos, respsz = server.list()   # 'num size' lines list
                msgCount = len(msginfos)                 # alt to srvr.stat[0]
                msginfos = msginfos[loadfrom-1:]         # drop already loaded
                allsizes = [int(x.split()[1]) for x in msginfos]
                allhdrs = []
                for msgnum in range(loadfrom, msgCount+1):        # poss empty
                    if progress: progress(msgnum, msgCount)       # run callback
                    if fetchlimit and (msgnum <= msgCount - fetchlimit):
                        # skip, add dummy hdrs
                        hdrtext = 'Subject: --mail skipped--\n\n'
                        allhdrs.append(hdrtext)
                    else:
                        # fetch, retr hdrs only
                        resp, hdrlines, respsz = server.top(msgnum, 0)
                        hdrlines = self.decodeFullText(hdrlines)
                        allhdrs.append('\n'.join(hdrlines))
            finally:
                server.quit()                      # make sure unlock mbox
            assert len(allhdrs) == len(allsizes)
            self.trace('load headers exit')
            return allhdrs, allsizes, False

    def downloadAllMessages(self, progress=None, loadfrom=1):
        """
        load full message text for all msgs from loadfrom..N,
        despite any caching that may be being done in the caller;
        much slower than downloadAllHeaders, if just need hdrs;
        4E: support mailconfig.fetchlimit: see downloadAllHeaders;
        could use server.list() to get sizes of skipped emails here
        too, but clients probably don't care about these anyhow;
        """
        self.trace('loading full messages')
        fetchlimit = mailconfig.fetchlimit
        server = self.connect()
        try:
            (msgCount, msgBytes) = server.stat()           # inbox on server
            allmsgs = []
            allsizes = []
            for i in range(loadfrom, msgCount+1):          # empty if low >= high
                if progress: progress(i, msgCount)
                if fetchlimit and (i <= msgCount - fetchlimit):
                    # skip, add dummy mail
                    mailtext = 'Subject: --mail skipped--\n\nMail skipped.\n'
                    allmsgs.append(mailtext)
                    allsizes.append(len(mailtext))
                else:
                    # fetch, retr full mail
                    (resp, message, respsz) = server.retr(i)  # save text on list
                    message = self.decodeFullText(message)
                    allmsgs.append('\n'.join(message))        # leave mail on server
                    allsizes.append(respsz)                   # diff from len(msg)
        finally:
            server.quit()                                     # unlock the mail box
        assert len(allmsgs) == (msgCount - loadfrom) + 1      # msg nums start at 1
        #assert sum(allsizes) == msgBytes                     # not if loadfrom > 1
        return allmsgs, allsizes, True                        # not if fetchlimit

    def deleteMessages(self, msgnums, progress=None):
        """
        delete multiple msgs off server; assumes email inbox
        unchanged since msgnums were last determined/loaded;
        use if msg headers not available as state information;
        fast, but poss dangerous: see deleteMessagesSafely
        """
        self.trace('deleting mails')
        server = self.connect()
        try:
            for (ix, msgnum) in enumerate(msgnums):    # don't reconnect for each
                if progress: progress(ix+1, len(msgnums))
                server.dele(msgnum)
        finally:                                       # changes msgnums: reload
            server.quit()

    def deleteMessagesSafely(self, msgnums, synchHeaders, progress=None):
        """
        delete multiple msgs off server, but use TOP fetches to
        check for a match on each msg's header part before deleting;
        assumes the email server supports the TOP interface of POP,
        else raises TopNotSupported - client may call deleteMessages;
        use if the mail server might change the inbox since the email
        index was last fetched, thereby changing POP relative message
        numbers; this can happen if email is deleted in a different
        client; some ISPs may also move a mail from inbox to the
        undeliverable box in response to a failed download;
        synchHeaders must be a list of already loaded mail hdrs text,
        corresponding to selected msgnums (requires state); raises
        exception if any out of synch with the email server; inbox is
        locked until quit, so it should not change between TOP check
        and actual delete: synch check must occur here, not in caller;
        may be enough to call checkSynchError+deleteMessages, but check
        each msg here in case deletes and inserts in middle of inbox;
        """
        if not self.srvrHasTop:
            raise TopNotSupported('Safe delete cancelled')
        self.trace('deleting mails safely')
        errmsg = 'Message %s out of synch with server.\n'
        errmsg += 'Delete terminated at this message.\n'
        errmsg += 'Mail client may require restart or reload.'
        server = self.connect()                       # locks inbox till quit
        try:                                          # don't reconnect for each
            (msgCount, msgBytes) = server.stat()      # inbox size on server
            for (ix, msgnum) in enumerate(msgnums):
                if progress: progress(ix+1, len(msgnums))
                if msgnum > msgCount:                            # msgs deleted
                    raise DeleteSynchError(errmsg % msgnum)
                resp, hdrlines, respsz = server.top(msgnum, 0)   # hdrs only
                hdrlines = self.decodeFullText(hdrlines)
                msghdrs = '\n'.join(hdrlines)
                if not self.headersMatch(msghdrs, synchHeaders[msgnum-1]):
                    raise DeleteSynchError(errmsg % msgnum)
                else:
                    server.dele(msgnum)               # safe to delete this msg
        finally:                                      # changes msgnums: reload
            server.quit()                             # unlock inbox on way out

    def checkSynchError(self, synchHeaders):
        """
        check to see if already loaded hdrs text in synchHeaders
        list matches what is on the server, using the TOP command in
        POP to fetch headers text; use if inbox can change due to
        deletes in other client, or automatic action by email server;
        raises except if out of synch, or error while talking to server;
        for speed, only checks the last message in the list: this catches
        inbox deletes, but assumes server won't insert before last (true
        for incoming mails); check inbox size first: smaller if just
        deletes; else top will differ if deletes and newly arrived
        messages added at end; result valid only when run: inbox may
        change after return;
        """
        self.trace('synch check')
        errormsg = 'Message index out of synch with mail server.\n'
        errormsg += 'Mail client may require restart or reload.'
        server = self.connect()
        try:
            lastmsgnum = len(synchHeaders)                      # 1..N
            (msgCount, msgBytes) = server.stat()                # inbox size
            if lastmsgnum > msgCount:                           # fewer now?
                raise MessageSynchError(errormsg)               # none to cmp
            if self.srvrHasTop:
                resp, hdrlines, respsz = server.top(lastmsgnum, 0)  # hdrs only
                hdrlines = self.decodeFullText(hdrlines)
                lastmsghdrs = '\n'.join(hdrlines)
                if not self.headersMatch(lastmsghdrs, synchHeaders[-1]):
                    raise MessageSynchError(errormsg)
        finally:
            server.quit()

    def headersMatch(self, hdrtext1, hdrtext2):
        """
        may not be as simple as a string compare: some servers add
        a "Status:" header that changes over time; on one ISP, it
        begins as "Status: U" (unread), and changes to "Status: RO"
        (read, old) after fetched once - throws off synch tests if
        new when index fetched, but have been fetched once before
        delete or last-message check; "Message-id:" line is unique
        per message in theory, but optional, and can be anything if
        forged; match more common: try first; parsing costly: try last
        """
        # try match by simple string compare
        if hdrtext1 == hdrtext2:
            self.trace('Same headers text')
            return True
        # try match without status lines
        split1 = hdrtext1.splitlines()       # s.split('\n'), but no final ''
        split2 = hdrtext2.splitlines()
        strip1 = [line for line in split1 if not line.startswith('Status:')]
        strip2 = [line for line in split2 if not line.startswith('Status:')]
        if strip1 == strip2:
            self.trace('Same without Status')
            return True
        # try mismatch by message-id headers if either has one
        msgid1 = [line for line in split1 if line[:11].lower() == 'message-id:']
        msgid2 = [line for line in split2 if line[:11].lower() == 'message-id:']
        if (msgid1 or msgid2) and (msgid1 != msgid2):
            self.trace('Different Message-Id')
            return False
        # try full hdr parse and common headers if msgid missing or trash
        tryheaders = ('From', 'To', 'Subject', 'Date')
        tryheaders += ('Cc', 'Return-Path', 'Received')
        msg1 = MailParser().parseHeaders(hdrtext1)
        msg2 = MailParser().parseHeaders(hdrtext2)
        for hdr in tryheaders:                          # poss multiple Received
            if msg1.get_all(hdr) != msg2.get_all(hdr):  # case insens, dflt None
                self.trace('Diff common headers')
                return False
        # all common hdrs match and don't have a diff message-id
        self.trace('Same common headers')
        return True

    def getPassword(self):
        """
        get POP password if not yet known
        not required until go to server
        from client-side file or subclass method
        """
        if not self.popPassword:
            try:
                localfile = open(mailconfig.poppasswdfile)
                self.popPassword = localfile.readline()[:-1]
                self.trace('local file password' + repr(self.popPassword))
            except:
                self.popPassword = self.askPopPassword()

    def askPopPassword(self):
        assert False, 'Subclass must define method'

################################################################################
# specialized subclasses
################################################################################

class MailFetcherConsole(MailFetcher):
    def askPopPassword(self):
        import getpass
        prompt = 'Password for %s on %s?' % (self.popUser, self.popServer)
        return getpass.getpass(prompt)

class SilentMailFetcher(SilentMailTool, MailFetcher):
    pass   # replaces trace
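The encoding fallback that decodeFullText applies to fetched lines can be tried outside the class. The following is a minimal sketch under stated assumptions: decode_lines and its preferred argument are hypothetical names used for illustration only, not part of mailtools.

```python
import sys

def decode_lines(lines, preferred=None):
    """Decode a list of bytes lines, trying several encodings in turn.

    Hypothetical helper: returns (decoded lines, encoding used),
    or (None, None) if every candidate encoding fails.
    """
    kinds = [preferred] if preferred else []     # user setting first, if any
    kinds += ['ascii', 'latin1', 'utf8']         # then common types
    kinds += [sys.getdefaultencoding()]          # then the platform default
    for kind in kinds:
        try:
            return [line.decode(kind) for line in lines], kind
        except (UnicodeError, LookupError):      # LookupError: bad encoding name
            continue
    return None, None

raw = [b'Subject: caf\xc3\xa9', b'', b'body text']
text, used = decode_lines(raw, preferred='utf8')
print(used, text[0])
```

Because latin1 accepts any byte sequence, the order of candidates matters: without a correct preferred encoding, UTF-8 text that is not pure ASCII is decoded as latin1 and displays incorrectly rather than raising an error.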

MailParser Class

Example 13-25 implements the last major class in the mailtools package—given the

(already decoded) text of an email message, its tools parse the mail’s content into a

message object, with headers and decoded parts. This module is largely just a wrapper

around the standard library’s email package, but it adds convenience tools—finding

the main text part of a message, filename generation for message parts, saving attached

parts to files, decoding headers, splitting address lists, and so on. See the code for more

information. Also notice the parts walker here: by coding its search logic in one place

as a generator function, we guarantee that all three of its clients here, as well as any others

elsewhere, implement the same traversal.
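To illustrate why a shared generator keeps traversals consistent, here is a minimal sketch that uses only the standard library's email package. The names walk_leaf_parts, list_types, and list_filenames are hypothetical, and the sample message is made up for the demonstration; the real walker in this package also generates filenames.

```python
import email

# Made-up two-part message used only for this demonstration
RAW = '''\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XX"

--XX
Content-Type: text/plain

hello
--XX
Content-Type: image/png
Content-Disposition: attachment; filename="pic.png"
Content-Transfer-Encoding: base64

aGk=
--XX--
'''

def walk_leaf_parts(message):
    """Yield (index, content type, part) for each non-container part."""
    for ix, part in enumerate(message.walk()):      # walk includes the root
        if part.get_content_maintype() == 'multipart':
            continue                                # container only: skip it
        yield ix, part.get_content_type(), part

# Two clients share one traversal, so they always agree on what a part is
def list_types(message):
    return [ctype for (ix, ctype, part) in walk_leaf_parts(message)]

def list_filenames(message):
    return [part.get_filename() for (ix, ctype, part) in walk_leaf_parts(message)]

msg = email.message_from_string(RAW)
print(list_types(msg))
print(list_filenames(msg))
```

If the skip rule ever changes (for example, to ignore more container types), only the generator needs editing and every client picks up the new behavior.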

Unicode decoding for text part payloads and message headers

This module also provides support for decoding message headers per email standards

(both full headers and names in address headers), and handles decoding per text part

encodings. Headers are decoded according to their content, using tools in the email

package; the headers themselves give their MIME and Unicode encodings, so no user

intervention is required. For client convenience, we also perform Unicode decoding for

main text parts to convert them from bytes to str here if needed.
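As a rough illustration of content-driven header decoding, the sketch below calls email.header.decode_header directly. The helper name decode_hdr is hypothetical, and joining the pieces with an empty string assumes a Python release whose decode_header keeps the whitespace around unencoded text; the listing in this chapter targets Python 3.1 and joins with a space instead.

```python
from email.header import decode_header

def decode_hdr(raw):
    """Decode a possibly MIME-encoded header value to str (hypothetical helper)."""
    pieces = []
    for text, enc in decode_header(raw):
        if isinstance(text, bytes):
            # encoded substrings carry their own charset name; unencoded
            # bytes substrings are produced with raw-unicode-escape
            text = text.decode(enc or 'raw-unicode-escape')
        pieces.append(text)              # already str if nothing was encoded
    return ''.join(pieces)

print(decode_hdr('=?UTF-8?Q?Introducing=20Top=20Values?='))
print(decode_hdr('plain subject'))
```

The encoding name travels inside the header itself, which is why no user setting is needed for this step.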

The latter main-text decoding merits elaboration. As discussed earlier in this chapter,

Message objects (main or attached) may return their payloads as bytes if we fetch with

a decode=1 argument, or if they are bytes to begin with; in other cases, payloads may

be returned as str. We generally need to decode bytes in order to treat payloads as text.

In mailtools itself, str text part payloads are automatically encoded to bytes by

decode=1 and then saved to binary-mode files to finesse encoding issues, but main-text

payloads are decoded to str if they are bytes. This main-text decoding is performed

per the encoding name in the part’s message header (if present and correct), the

platform default, or a guess. As we learned in Chapter 9, while GUIs may allow bytes for

display, str text generally provides broader Unicode support; furthermore, str is

sometimes needed for later processing such as line wrapping and webpage generation.
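The two decoding steps described here, undoing the MIME transfer encoding and then decoding the resulting bytes to str, can be seen in a small standalone sketch. The sample message is made up, and the latin1 fallback is an assumption chosen for the demonstration rather than the exact policy of this package.

```python
import email

# Made-up message: UTF-8 text, Base64 transfer encoding
RAW = '''\
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64

Y2Fmw6k=
'''

msg = email.message_from_string(RAW)

payload = msg.get_payload(decode=True)       # step 1: undo Base64, giving bytes
charset = msg.get_content_charset()          # encoding name from the header
text = payload.decode(charset or 'latin1')   # step 2: bytes to str

print(type(payload).__name__, charset, text)
```

Saving payload as is to a binary-mode file avoids the second step entirely, which is the approach taken for parts other than the main text.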

Since this package can’t predict the role of other part payloads besides the main text,

clients are responsible for decoding and encoding as necessary. For instance, other text

parts which are saved in binary mode here may require that message headers be

consulted later to extract Unicode encoding names for better display. For example,

Chapter 14’s PyMailGUI will proceed this way to open text parts on demand, passing

message header encoding information on to PyEdit for decoding as text is loaded.

Some of the to-text conversions performed here are potentially partial solutions (some

parts may lack the required headers and fail per the platform defaults) and may need

to be improved; since this seems likely to be addressed in a future release of Python’s

email package, we’ll settle for our assumptions here.

Example 13-25. PP4E\Internet\Email\mailtools\mailParser.py

"""

###############################################################################
parsing and attachment extract, analyse, save (see __init__ for docs, test)
###############################################################################
"""

import os, mimetypes, sys                       # mime: map type to name
import email.parser                             # parse text to Message object
import email.header                             # 4E: headers decode/encode
import email.utils                              # 4E: addr header parse/decode
from email.message import Message               # Message may be traversed
from .mailTool import MailTool                  # 4E: package-relative

class MailParser(MailTool):
    """
    methods for parsing message text, attachments
    subtle thing: Message object payloads are either a simple
    string for non-multipart messages, or a list of Message
    objects if multipart (possibly nested); we don't need to
    distinguish between the two cases here, because the Message
    walk generator always returns self first, and so works fine
    on non-multipart messages too (a single object is walked);
    for simple messages, the message body is always considered
    here to be the sole part of the mail; for multipart messages,
    the parts list includes the main message text, as well as all
    attachments; this allows simple messages not of type text to
    be handled like attachments in a UI (e.g., saved, opened);
    Message payload may also be None for some oddball part types;
    4E note: in Py 3.1, text part payloads are returned as bytes
    for decode=1, and might be str otherwise; in mailtools, text
    is stored as bytes for file saves, but main-text bytes payloads
    are decoded to Unicode str per mail header info or platform
    default+guess; clients may need to convert other payloads:
    PyMailGUI uses headers to decode parts saved to binary files;
    4E supports fetched message header auto-decoding per its own
    content, both for general headers such as Subject, as well as
    for names in address header such as From and To; client must
    request this after parse, before display: parser doesn't decode;
    """

    def walkNamedParts(self, message):
        """
        generator to avoid repeating part naming logic;
        skips multipart headers, makes part filenames;
        message is already parsed email.message.Message object;
        doesn't skip oddball types: payload may be None, must
        handle in part saves; some others may warrant skips too;
        """
        for (ix, part) in enumerate(message.walk()):    # walk includes message
            fulltype = part.get_content_type()          # ix includes parts skipped
            maintype = part.get_content_maintype()
            if maintype == 'multipart':                 # multipart/*: container
                continue
            elif fulltype == 'message/rfc822':          # 4E: skip message/rfc822
                continue                                # skip all message/* too?
            else:
                filename, contype = self.partName(part, ix)
                yield (filename, contype, part)

    def partName(self, part, ix):
        """
        extract filename and content type from message part;
        filename: tries Content-Disposition, then Content-Type
        name param, or generates one based on mimetype guess;
        """
        filename = part.get_filename()                # filename in msg hdrs?
        contype = part.get_content_type()             # lowercase maintype/subtype
        if not filename:
            filename = part.get_param('name')         # try content-type name
        if not filename:
            if contype == 'text/plain':               # hardcode plain text ext
                ext = '.txt'                          # else guesses .ksh!
            else:
                ext = mimetypes.guess_extension(contype)
                if not ext: ext = '.bin'              # use a generic default
            filename = 'part-%03d%s' % (ix, ext)
        return (filename, contype)

    def saveParts(self, savedir, message):
        """
        store all parts of a message as files in a local directory;
        returns [('maintype/subtype', 'filename')] list for use by
        callers, but does not open any parts or attachments here;
        get_payload decodes base64, quoted-printable, uuencoded data;
        mail parser may give us a None payload for oddball types we
        probably should skip over: convert to str here to be safe;
        """
        if not os.path.exists(savedir):
            os.mkdir(savedir)
        partfiles = []
        for (filename, contype, part) in self.walkNamedParts(message):
            fullname = os.path.join(savedir, filename)
            fileobj = open(fullname, 'wb')             # use binary mode
            content = part.get_payload(decode=1)       # decode base64,qp,uu
            if not isinstance(content, bytes):         # 4E: need bytes for wb
                content = b'(no content)'              # decode=1 returns bytes,
            fileobj.write(content)                     # but some payloads None
            fileobj.close()                            # 4E: not str(content)
            partfiles.append((contype, fullname))      # for caller to open
        return partfiles

    def saveOnePart(self, savedir, partname, message):
        """
        ditto, but find and save just one part by name
        """
        if not os.path.exists(savedir):
            os.mkdir(savedir)
        fullname = os.path.join(savedir, partname)
        (contype, content) = self.findOnePart(partname, message)
        if not isinstance(content, bytes):          # 4E: need bytes for wb
            content = b'(no content)'               # decode=1 returns bytes,
        open(fullname, 'wb').write(content)         # but some payloads None
        return (contype, fullname)                  # 4E: not str(content)

    def partsList(self, message):
        """
        return a list of filenames for all parts of an
        already parsed message, using same filename logic
        as saveParts, but do not store the part files here
        """
        validParts = self.walkNamedParts(message)
        return [filename for (filename, contype, part) in validParts]

    def findOnePart(self, partname, message):
        """
        find and return part's content, given its name;
        intended to be used in conjunction with partsList;
        we could also mimetypes.guess_type(partname) here;
        we could also avoid this search by saving in dict;
        4E: content may be str or bytes--convert as needed;
        """
        for (filename, contype, part) in self.walkNamedParts(message):
            if filename == partname:
                content = part.get_payload(decode=1)     # does base64,qp,uu
                return (contype, content)                # may be bytes text

    def decodedPayload(self, part, asStr=True):
        """
        4E: decode text part bytes to Unicode str for display, line wrap,
        etc.; part is a Message; (decode=1) undoes MIME email encodings
        (base64, uuencode, qp), bytes.decode() performs additional Unicode
        text string decodings; tries charset encoding name in message
        headers first (if present, and accurate), then tries platform
        defaults and a few guesses before giving up with error string;
        """
        payload = part.get_payload(decode=1)           # payload may be bytes
        if asStr and isinstance(payload, bytes):       # decode=1 returns bytes
            tries = []
            enchdr = part.get_content_charset()        # try msg headers first!
            if enchdr:
                tries += [enchdr]                      # try headers first
            tries += [sys.getdefaultencoding()]        # same as bytes.decode()
            tries += ['latin1', 'utf8']                # try 8-bit, incl ascii
            for trie in tries:                         # try utf8 (windows dflt)
                try:
                    payload = payload.decode(trie)     # give it a shot, eh?
                    break
                except (UnicodeError, LookupError):    # lookuperr: bad name
                    pass
            else:
                payload = '--Sorry: cannot decode Unicode text--'
        return payload

    def findMainText(self, message, asStr=True):
        """
        for text-oriented clients, return first text part's str;
        for the payload of a simple message, or all parts of
        a multipart message, looks for text/plain, then text/html,
        then text/*, before deducing that there is no text to
        display; this is a heuristic, but covers most simple,
        multipart/alternative, and multipart/mixed messages;
        content-type defaults to text/plain if not in simple msg;
        handles message nesting at top level by walking instead
        of list scans; if non-multipart but type is text/html,
        returns the HTML as the text with an HTML type: caller
        may open in web browser, extract plain text, etc; if
        nonmultipart and not text, there is no text to display:
        save/open message content in UI; caveat: does not try
        to concatenate multiple inline text/plain parts if any;
        4E: text payloads may be bytes--decodes to str here;
        4E: asStr=False to get raw bytes for HTML file saves;
        """
        # try to find a plain text
        for part in message.walk():                        # walk visits message
            type = part.get_content_type()                 # if nonmultipart
            if type == 'text/plain':                       # may be base64,qp,uu
                return type, self.decodedPayload(part, asStr)  # bytes to str too?
        # try to find an HTML part
        for part in message.walk():
            type = part.get_content_type()                 # caller renders html
            if type == 'text/html':
                return type, self.decodedPayload(part, asStr)
        # try any other text type, including XML
        for part in message.walk():
            if part.get_content_maintype() == 'text':
                return part.get_content_type(), self.decodedPayload(part, asStr)
        # punt: could use first part, but it's not marked as text
        failtext = '[No text to display]' if asStr else b'[No text to display]'
        return 'text/plain', failtext

    def decodeHeader(self, rawheader):
        """
        4E: decode existing i18n message header text per both email and Unicode
        standards, according to its content; return as is if unencoded or fails;
        client must call this to display: parsed Message object does not decode;
        i18n header example: '=?UTF-8?Q?Introducing=20Top=20Values=20..Savers?=';
        i18n header example: 'Man where did you get that =?UTF-8?Q?assistant=3F?=';
        decode_header handles any line breaks in header string automatically, may
        return multiple parts if any substrings of hdr are encoded, and returns all
        bytes in parts list if any encodings found (with unencoded parts encoded as
        raw-unicode-escape and enc=None) but returns a single part with enc=None
        that is str instead of bytes in Py3.1 if the entire header is unencoded
        (must handle mixed types here); see Chapter 13 for more details/examples;
        the following first attempt code was okay unless any encoded substrings, or
        enc was returned as None (raised except which returned rawheader unchanged):
            hdr, enc = email.header.decode_header(rawheader)[0]
            return hdr.decode(enc) # fails if enc=None: no encoding or encoded substrs
        """
        try:
            parts = email.header.decode_header(rawheader)
            decoded = []
            for (part, enc) in parts:                      # for all substrings
                if enc == None:                            # part unencoded?
                    if not isinstance(part, bytes):        # str: full hdr unencoded
                        decoded += [part]                  # else do unicode decode
                    else:
                        decoded += [part.decode('raw-unicode-escape')]
                else:
                    decoded += [part.decode(enc)]
            return ' '.join(decoded)
        except:
            return rawheader                               # punt!

    def decodeAddrHeader(self, rawheader):
        """
        4E: decode existing i18n address header text per email and Unicode,
        according to its content; must parse out first part of email address
        to get i18n part: '"=?UTF-8?Q?Walmart?=" <newsletters@walmart.com>';
        From will probably have just 1 addr, but To, Cc, Bcc may have many;
        decodeHeader handles nested encoded substrings within an entire hdr,
        but we can't simply call it for entire hdr here because it fails if
        encoded name substring ends in " quote instead of whitespace or endstr;
        see also encodeAddrHeader in mailSender module for the inverse of this;
        the following first attempt code failed to handle encoded substrings in
        name, and raised exc for unencoded bytes parts if any encoded substrings;
            namebytes, nameenc = email.header.decode_header(name)[0] (do email+MIME)
            if nameenc: name = namebytes.decode(nameenc) (do Unicode?)
        """
        try:
            pairs = email.utils.getaddresses([rawheader])  # split addrs and parts
            decoded = []                                   # handles name commas
            for (name, addr) in pairs:
                try:
                    name = self.decodeHeader(name)         # email+MIME+Uni
                except:
                    name = None   # but uses encoded name if exc in decodeHeader
                joined = email.utils.formataddr((name, addr))  # join parts
                decoded.append(joined)
            return ', '.join(decoded)                      # >= 1 addrs
        except:
            return self.decodeHeader(rawheader)    # try decoding entire string

    def splitAddresses(self, field):
        """
        4E: use comma separator for multiple addrs in the UI, and
        getaddresses to split correctly and allow for comma in the
        name parts of addresses; used by PyMailGUI to split To, Cc,
        Bcc as needed for user inputs and copied headers; returns
        empty list if field is empty, or any exception occurs;
        """
        try:
            pairs = email.utils.getaddresses([field])                # [(name,addr)]
            return [email.utils.formataddr(pair) for pair in pairs]  # [name <addr>]
        except:
            return []   # syntax error in user-entered field?, etc.

    # returned when parses fail
    errorMessage = Message()
    errorMessage.set_payload('[Unable to parse message - format error]')

    def parseHeaders(self, mailtext):
        """
        parse headers only, return root email.message.Message object
        stops after headers parsed, even if nothing else follows (top)
        email.message.Message object is a mapping for mail header fields
        payload of message object is None, not raw body text
        """
        try:
            return email.parser.Parser().parsestr(mailtext, headersonly=True)
        except:
            return self.errorMessage

    def parseMessage(self, fulltext):
        """
        parse entire message, return root email.message.Message object
        payload of message object is a string if not is_multipart()
        payload of message object is more Messages if multiple parts
        the call here same as calling email.message_from_string()
        """
        try:
            return email.parser.Parser().parsestr(fulltext)   # may fail!
        except:
            return self.errorMessage   # or let call handle? can check return

    def parseMessageRaw(self, fulltext):
        """
        parse headers only, return root email.message.Message object
        stops after headers parsed, for efficiency (not yet used here)
        payload of message object is raw text of mail after headers
        """
        try:
            return email.parser.HeaderParser().parsestr(fulltext)
        except:
            return self.errorMessage
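The comma handling that splitAddresses relies on in the listing above can be checked on its own with the standard library's email.utils tools. This is a minimal sketch; the addresses are made up for illustration.

```python
from email.utils import getaddresses, formataddr

# Made-up field: the first display name contains a comma
field = '"Smith, Bob" <bob@example.com>, sue@example.com'

# A naive split on commas would break the quoted name into two pieces
naive = field.split(',')

# getaddresses honors the quoting and returns (name, addr) pairs
pairs = getaddresses([field])
addrs = [formataddr(pair) for pair in pairs]

print(len(naive), len(pairs))
for addr in addrs:
    print(addr)
```

formataddr restores the quotes around names that contain special characters, so the rebuilt strings remain valid if they are joined with commas again later.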

Self-Test Script

The last file in the mailtools package, Example 13-26, lists the self-test code for the

package. This code is a separate script file, in order to allow for import search path

manipulation—it emulates a real client, which is assumed to have a mailconfig.py

module in its own source directory (this module can vary per client).
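The import search path manipulation mentioned here can be demonstrated in a standalone way. The sketch below is an illustration only: it writes a throwaway module named demo_mailconfig (a hypothetical stand-in for a client's mailconfig.py) into a temporary directory, extends sys.path, and imports it by name.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as clientdir:
    # stand-in for a client's own configuration module (hypothetical name)
    with open(os.path.join(clientdir, 'demo_mailconfig.py'), 'w') as f:
        f.write("popservername = 'pop.example.com'\n")

    sys.path.append(clientdir)               # extend the import search path
    try:
        config = importlib.import_module('demo_mailconfig')
    finally:
        sys.path.remove(clientdir)           # restore the path afterward

print(config.popservername)
```

Each client directory can supply its own version of the module this way, and the importing code does not change.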

Example 13-26. PP4E\Internet\Email\mailtools\selftest.py

"""

###############################################################################
self-test when this file is run as a program
###############################################################################
"""

# mailconfig normally comes from the client's source directory or
# sys.path; for testing, get it from Email directory one level up
import sys
sys.path.append('..')
import mailconfig
print('config:', mailconfig.__file__)

# get these from __init__
from mailtools import (MailFetcherConsole,
                       MailSender, MailSenderAuthConsole,
                       MailParser)

if not mailconfig.smtpuser:
    sender = MailSender(tracesize=5000)
else:
    sender = MailSenderAuthConsole(tracesize=5000)

sender.sendMessage(From = mailconfig.myaddress,
                   To = [mailconfig.myaddress],
                   Subj = 'testing mailtools package',
                   extrahdrs = [('X-Mailer', 'mailtools')],
                   bodytext = 'Here is my source code\n',
                   attaches = ['selftest.py'],
                  )
                  # bodytextEncoding='utf-8',          # other tests to try
                  # attachesEncodings=['latin-1'],     # inspect text headers
                  # attaches=['monkeys.jpg'])          # verify Base64 encoded
                  # to='i18n addr list...',            # test mime/unicode headers

# change mailconfig to test fetchlimit
fetcher = MailFetcherConsole()
def status(*args): print(args)

hdrs, sizes, loadedall = fetcher.downloadAllHeaders(status)
for num, hdr in enumerate(hdrs[:5]):
    print(hdr)
    if input('load mail?') in ['y', 'Y']:
        print(fetcher.downloadMessage(num+1).rstrip(), '\n', '-'*70)

last5 = len(hdrs)-4
msgs, sizes, loadedall = fetcher.downloadAllMessages(status, loadfrom=last5)
for msg in msgs:
    print(msg[:200], '\n', '-'*70)

parser = MailParser()
for i in [0]:                                  # try [0 , len(msgs)]
    fulltext = msgs[i]
    message = parser.parseMessage(fulltext)
    ctype, maintext = parser.findMainText(message)
    print('Parsed:', message['Subject'])
    print(maintext)
input('Press Enter to exit')                   # pause if clicked on Windows

Running the self-test

Here’s a run of the self-test script; it generates a lot of output, most of which has been

deleted here for presentation in this book—as usual, run this on your own for further

details:

C:\...\PP4E\Internet\Email\mailtools> selftest.py

config: ..\mailconfig.py

user: PP4E@learning-python.com

Adding text/x-python

Sending to...['PP4E@learning-python.com']

Content-Type: multipart/mixed; boundary="===============0085314748=="

MIME-Version: 1.0

From: PP4E@learning-python.com

To: PP4E@learning-python.com

Subject: testing mailtools package

Date: Sat, 08 May 2010 19:26:22 -0000

X-Mailer: mailtools

A multi-part MIME format message.

--===============0085314748==

Content-Type: text/plain; charset="us-ascii"

MIME-Version: 1.0

Content-Transfer-Encoding: 7bit

Here is my source code

--===============0085314748==

Content-Type: text/x-python; charset="us-ascii"

MIME-Version: 1.0

Content-Transfer-Encoding: 7bit

Content-Disposition: attachment; filename="selftest.py"

"""

###############################################################################

self-test when this file is run as a program

###############################################################################

"""

...more lines omitted...

print(maintext)

input('Press Enter to exit') # pause if clicked on Windows

--===============0085314748==--

Send exit

loading headers

Connecting...

Password for PP4E@learning-python.com on pop.secureserver.net?

b'+OK <28121.1273346862@p3pop01-07.prod.phx3.gdg>'

(1, 7)

(2, 7)

(3, 7)

(4, 7)

(5, 7)

(6, 7)

(7, 7)

load headers exit

Received: (qmail 7690 invoked from network); 5 May 2010 15:29:43 -0000

Received: from unknown (HELO p3pismtp01-026.prod.phx3.secureserver.net) ([10.6.1

...more lines omitted...

load mail?y

load 1

Connecting...

b'+OK <29205.1273346957@p3pop01-10.prod.phx3.gdg>'

Received: (qmail 7690 invoked from network); 5 May 2010 15:29:43 -0000

Received: from unknown (HELO p3pismtp01-026.prod.phx3.secureserver.net) ([10.6.1

...more lines omitted...

load mail?

loading full messages

Connecting...

b'+OK <31655.1273347055@p3pop01-25.prod.phx3.secureserver.net>'

(3, 7)

(4, 7)

(5, 7)

(6, 7)

(7, 7)

Received: (qmail 25683 invoked from network); 6 May 2010 14:12:07 -0000

Received: from unknown (HELO p3pismtp01-018.prod.phx3.secureserver.net) ([10.6.1

...more lines omitted...

Parsed: A B C D E F G

Fiddle de dum, Fiddle de dee,

Eric the half a bee.

Press Enter to exit

Updating the pymail Console Client

As a final email example in this chapter, and to give a better use case for the mailtools
module package of the preceding sections, Example 13-27 provides an updated version of the
pymail program we met earlier (Example 13-20). It uses our mailtools package to access email,
instead of interfacing with Python’s email package directly. Compare its code to the original
pymail in this chapter to see how mailtools is employed here. You’ll find that its mail
download and send logic is substantially simpler.

Example 13-27. PP4E\Internet\Email\pymail2.py

#!/usr/local/bin/python
"""
################################################################################
pymail2 - simple console email interface client in Python; this version uses
the mailtools package, which in turn uses poplib, smtplib, and the email package
for parsing and composing emails; displays first text part of mails, not the
entire full text; fetches just mail headers initially, using the TOP command;
fetches full text of just email selected to be displayed; caches already
fetched mails; caveat: no way to refresh index; uses standalone mailtools
objects - they can also be used as superclasses;
################################################################################
"""

import mailconfig, mailtools
from pymail import inputmessage
mailcache = {}

def fetchmessage(i):
    try:
        fulltext = mailcache[i]
    except KeyError:
        fulltext = fetcher.downloadMessage(i)
        mailcache[i] = fulltext
    return fulltext

def sendmessage():
    From, To, Subj, text = inputmessage()
    sender.sendMessage(From, To, Subj, [], text, attaches=None)

def deletemessages(toDelete, verify=True):
    print('To be deleted:', toDelete)
    if verify and input('Delete?')[:1] not in ['y', 'Y']:
        print('Delete cancelled.')
    else:
        print('Deleting messages from server...')
        fetcher.deleteMessages(toDelete)

def showindex(msgList, msgSizes, chunk=5):
    count = 0
    for (msg, size) in zip(msgList, msgSizes):     # email.message.Message, int
        count += 1                                 # 3.x iter ok here
        print('%d:\t%d bytes' % (count, size))
        for hdr in ('From', 'To', 'Date', 'Subject'):
            print('\t%-8s=>%s' % (hdr, msg.get(hdr, '(unknown)')))
        if count % chunk == 0:
            input('[Press Enter key]')             # pause after each chunk

def showmessage(i, msgList):
    if 1 <= i <= len(msgList):
        fulltext = fetchmessage(i)
        message = parser.parseMessage(fulltext)
        ctype, maintext = parser.findMainText(message)
        print('-' * 79)
        print(maintext.rstrip() + '\n')            # main text part, not entire mail
        print('-' * 79)                            # and not any attachments after
    else:
        print('Bad message number')

def savemessage(i, mailfile, msgList):
    if 1 <= i <= len(msgList):
        fulltext = fetchmessage(i)
        # 4E: open with an explicit encoding; the with statement closes the file
        with open(mailfile, 'a', encoding=mailconfig.fetchEncoding) as savefile:
            savefile.write('\n' + fulltext + '-'*80 + '\n')
    else:
        print('Bad message number')

def msgnum(command):
    try:
        return int(command.split()[1])
    except (IndexError, ValueError):               # no number given, or not an int
        return -1                                  # assume this is bad

helptext = """
Available commands:
i     - index display
l n?  - list all messages (or just message n)
d n?  - mark all messages for deletion (or just message n)
s n?  - save all messages to a file (or just message n)
m     - compose and send a new mail message
q     - quit pymail
?     - display this help text
"""

def interact(msgList, msgSizes, mailfile):
    showindex(msgList, msgSizes)
    toDelete = []
    while True:
        try:
            command = input('[Pymail] Action? (i, l, d, s, m, q, ?) ')
        except EOFError:
            command = 'q'
        if not command: command = '*'

        if command == 'q':                         # quit
            break

        elif command[0] == 'i':                    # index
            showindex(msgList, msgSizes)

        elif command[0] == 'l':                    # list
            if len(command) == 1:
                for i in range(1, len(msgList)+1):
                    showmessage(i, msgList)
            else:
                showmessage(msgnum(command), msgList)

        elif command[0] == 's':                    # save
            if len(command) == 1:
                for i in range(1, len(msgList)+1):
                    savemessage(i, mailfile, msgList)
            else:
                savemessage(msgnum(command), mailfile, msgList)

        elif command[0] == 'd':                    # mark for deletion later
            if len(command) == 1:                  # 3.x needs list(): iter
                toDelete = list(range(1, len(msgList)+1))
            else:
                delnum = msgnum(command)
                if (1 <= delnum <= len(msgList)) and (delnum not in toDelete):
                    toDelete.append(delnum)
                else:
                    print('Bad message number')

        elif command[0] == 'm':                    # send a new mail via SMTP
            try:
                sendmessage()
            except:
                print('Error - mail not sent')

        elif command[0] == '?':
            print(helptext)
        else:
            print('What? -- type "?" for commands help')
    return toDelete

def main():
    global parser, sender, fetcher
    mailserver = mailconfig.popservername
    mailuser = mailconfig.popusername
    mailfile = mailconfig.savemailfile

    parser = mailtools.MailParser()
    sender = mailtools.MailSender()
    fetcher = mailtools.MailFetcherConsole(mailserver, mailuser)

    def progress(i, max):
        print(i, 'of', max)

    hdrsList, msgSizes, ignore = fetcher.downloadAllHeaders(progress)
    msgList = [parser.parseHeaders(hdrtext) for hdrtext in hdrsList]

    print('[Pymail email client]')
    toDelete = interact(msgList, msgSizes, mailfile)
    if toDelete: deletemessages(toDelete)

if __name__ == '__main__': main()

Running the pymail2 console client

This program is used interactively, the same as the original. In fact, the output is nearly

identical, so we won’t go into further details. Here’s a quick look at this script in action;

run this on your own machine to see it firsthand:

C:\...\PP4E\Internet\Email> pymail2.py

user: PP4E@learning-python.com

loading headers

Connecting...

Password for PP4E@learning-python.com on pop.secureserver.net?

b'+OK <24460.1273347818@pop15.prod.mesa1.secureserver.net>'

1 of 7

2 of 7

3 of 7

4 of 7

5 of 7

6 of 7

7 of 7

load headers exit

[Pymail email client]

1:      1860 bytes
        From    =>lutz@rmi.net
        To      =>pp4e@learning-python.com
        Date    =>Wed, 5 May 2010 11:29:36 -0400 (EDT)
        Subject =>I'm a Lumberjack, and I'm Okay
2:      1408 bytes
        From    =>lutz@learning-python.com
        To      =>PP4E@learning-python.com
        Date    =>Wed, 05 May 2010 08:33:47 -0700
        Subject =>testing
3:      1049 bytes
        From    =>Eric.the.Half.a.Bee@yahoo.com
        To      =>PP4E@learning-python.com
        Date    =>Thu, 06 May 2010 14:11:07 -0000
        Subject =>A B C D E F G
4:      1038 bytes
        From    =>Eric.the.Half.a.Bee@aol.com
        To      =>nobody.in.particular@marketing.com
        Date    =>Thu, 06 May 2010 14:32:32 -0000
        Subject =>a b c d e f g
5:      957 bytes
        From    =>PP4E@learning-python.com
        To      =>maillist
        Date    =>Thu, 06 May 2010 10:58:40 -0400
        Subject =>test interactive smtplib
[Press Enter key]
6:      1037 bytes
        From    =>Cardinal@hotmail.com
        To      =>PP4E@learning-python.com
        Date    =>Fri, 07 May 2010 20:32:38 -0000
        Subject =>Among our weapons are these
7:      3248 bytes
        From    =>PP4E@learning-python.com
        To      =>PP4E@learning-python.com
        Date    =>Sat, 08 May 2010 19:26:22 -0000
        Subject =>testing mailtools package

[Pymail] Action? (i, l, d, s, m, q, ?) l 7

load 7

Connecting...

b'+OK <20110.1273347827@pop07.prod.mesa1.secureserver.net>'

-------------------------------------------------------------------------------

Here is my source code

-------------------------------------------------------------------------------

[Pymail] Action? (i, l, d, s, m, q, ?) d 7

[Pymail] Action? (i, l, d, s, m, q, ?) m

From? lutz@rmi.net

To? PP4E@learning-python.com

Subj? test pymail2 send

Type message text, end with line="."

Run away! Run away!

Sending to...['PP4E@learning-python.com']

From: lutz@rmi.net

To: PP4E@learning-python.com

Subject: test pymail2 send

Date: Sat, 08 May 2010 19:44:25 -0000

Run away! Run away!

Send exit

[Pymail] Action? (i, l, d, s, m, q, ?) q

To be deleted: [7]

Delete?y

Deleting messages from server...

deleting mails

Connecting...

b'+OK <11553.1273347873@pop17.prod.mesa1.secureserver.net>'

The messages in our mailbox have quite a few origins now—ISP webmail clients, basic

SMTP scripts, the Python interactive command line, mailtools self-test code, and two

console-based email clients; in later chapters, we’ll add even more. All their mails look

the same to our script; here’s a verification of the email we just sent (the second fetch

finds it already in-cache):

C:\...\PP4E\Internet\Email> pymail2.py

user: PP4E@learning-python.com

loading headers

Connecting...

...more lines omitted...

[Press Enter key]

6:      1037 bytes
        From    =>Cardinal@hotmail.com
        To      =>PP4E@learning-python.com
        Date    =>Fri, 07 May 2010 20:32:38 -0000
        Subject =>Among our weapons are these
7:      984 bytes
        From    =>lutz@rmi.net
        To      =>PP4E@learning-python.com
        Date    =>Sat, 08 May 2010 19:44:25 -0000
        Subject =>test pymail2 send

[Pymail] Action? (i, l, d, s, m, q, ?) l 7

load 7

Connecting...

b'+OK <31456.1273348189@p3pop01-03.prod.phx3.gdg>'

-------------------------------------------------------------------------------

Run away! Run away!

-------------------------------------------------------------------------------

[Pymail] Action? (i, l, d, s, m, q, ?) l 7

-------------------------------------------------------------------------------

Run away! Run away!

-------------------------------------------------------------------------------

[Pymail] Action? (i, l, d, s, m, q, ?) q
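
The second "l 7" request in this session prints no "Connecting..." line because the message text is already in the script's cache. The pattern can be sketched in isolation as follows. This is an illustration only: fake_download is a hypothetical stand-in for fetcher.downloadMessage, and the calls list exists just to count simulated server fetches.

```python
# Minimal sketch of pymail2's fetch-once caching pattern.
# fake_download is a hypothetical stand-in for fetcher.downloadMessage.
calls = []                                  # records each simulated server fetch

def fake_download(i):
    calls.append(i)
    return 'full text of message %d' % i

mailcache = {}

def fetchmessage(i):
    try:
        fulltext = mailcache[i]             # already fetched: no server access
    except KeyError:
        fulltext = fake_download(i)         # first request: download and remember
        mailcache[i] = fulltext
    return fulltext

first = fetchmessage(7)
second = fetchmessage(7)                    # served from the cache
print(first == second, calls)
```

Because the cache is keyed by message number and is never invalidated, it is only valid while the server's inbox numbering stays unchanged, which matches the "no way to refresh index" caveat in the script's docstring.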

Study pymail2’s code for more insights. As you’ll see, this version eliminates some
complexities, such as the manual formatting of composed mail message text. It also does a
better job of displaying a mail’s text: instead of blindly listing the full mail text
(attachments and all), it uses mailtools to fetch the first text part of the message. The
messages we’re using are too simple to show the difference, but for a mail with attachments,
this new version will be more selective about what it displays.
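
To make the "first text part" idea concrete, here is a rough sketch that uses only the standard email package. It is not the mailtools implementation, whose selection rules may differ: first_text_part is a hypothetical helper, and the two-part test message is built locally rather than fetched from a server.

```python
# Sketch: select the first non-attachment text part of a parsed message.
# first_text_part is a hypothetical helper, not part of mailtools.
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def first_text_part(message):
    """Return (content type, decoded text) of the first text/* part shown inline."""
    for part in message.walk():                      # container first, then its parts
        if part.get_content_maintype() != 'text':
            continue                                 # skip multipart containers, images, ...
        if part.get_content_disposition() == 'attachment':
            continue                                 # skip text sent as an attachment
        payload = part.get_payload(decode=True)      # bytes, transfer encoding undone
        charset = part.get_content_charset() or 'ascii'
        return part.get_content_type(), payload.decode(charset, errors='replace')
    return None, ''

# build a message shaped like the self-test mail: main text plus a source file
msg = MIMEMultipart()
msg['Subject'] = 'testing mailtools package'
msg.attach(MIMEText('Here is my source code\n'))
attach = MIMEText("print('spam')\n", _subtype='x-python')
attach.add_header('Content-Disposition', 'attachment', filename='selftest.py')
msg.attach(attach)

parsed = email.message_from_string(msg.as_string())  # parse it as a client would
ctype, maintext = first_text_part(parsed)
print(ctype, repr(maintext))
```

Only the main text is selected here; the attached source file is skipped, which is the behavior the l 7 command displayed in the session above.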

Moreover, because the interface to mail is encapsulated in the mailtools package’s modules,
if it ever must change, it will need to be changed only in that package, regardless of how
many mail clients use its tools. And because the code in mailtools is shared, if we know it
works for one client, we can expect it to work in another without debugging the shared code
again.

On the other hand, pymail2 doesn’t really leverage much of the power of either mailtools or
the underlying email package it uses. For example, things like attachments, internationalized
headers, and inbox synchronization are not handled at all, and the decoded main text it prints
may contain characters that the console terminal cannot display. To see the full scope of the
email package, we need to explore a larger email system, such as PyMailGUI or PyMailCGI. The
first of these is the topic of the next chapter, and the second appears in Chapter 16. First,
though, let’s quickly survey a handful of additional client-side protocol tools.

NNTP: Accessing Newsgroups

So far in this chapter, we have focused on Python’s FTP and email processing tools and

have met a handful of client-side scripting modules along the way: ftplib, poplib,

smtplib, email, mimetypes, urllib, and so on. This set is representative of Python’s

client-side library tools for transferring and processing information over the Internet,

but it’s not at all complete.

A more or less comprehensive list of Python’s Internet-related modules appears at the start
of the previous chapter. Among other things, Python also includes client-side support
libraries for Internet news, Telnet, HTTP, XML-RPC, and other standard protocols. Most of
these are analogous to modules we’ve already met: they provide an object-based interface that
automates the underlying sockets and message structures.

For instance, Python’s nntplib module supports the client-side interface to NNTP—

the Network News Transfer Protocol—which is used for reading and posting articles

to Usenet newsgroups on the Internet. Like other protocols, NNTP runs on top of

sockets and merely defines a standard message protocol; like other modules, nntplib

hides most of the protocol details and presents an object-based interface to Python

scripts.

We won’t get into full protocol details here, but in brief, NNTP servers store a range of
articles on the server machine, usually in a flat-file database. If you have the domain name
or IP address of a server machine that runs an NNTP server program listening on the NNTP port,
you can write scripts that fetch or post articles from any machine that has Python and an
Internet connection. For instance, the script in Example 13-28 by default fetches and displays
the last 10 articles from Python’s Internet newsgroup, comp.lang.python, from the news.rmi.net
NNTP server at one of my ISPs.

Example 13-28. PP4E\Internet\Other\readnews.py

"""

fetch and print usenet newsgroup postings from comp.lang.python via the
nntplib module, which really runs on top of sockets; nntplib also supports
posting new messages, etc.; note: posts not deleted after they are read;
"""

listonly = False
showhdrs = ['From', 'Subject', 'Date', 'Newsgroups', 'Lines']
try:
    import sys
    servername, groupname, showcount = sys.argv[1:]
    showcount = int(showcount)
except:
    import nntpconfig                       # user-supplied settings module (not shown)
    servername = nntpconfig.servername      # assign this to your server
    groupname = 'comp.lang.python'          # cmd line args or defaults
    showcount = 10                          # show last showcount posts

# connect to nntp server
print('Connecting to', servername, 'for', groupname)
from nntplib import NNTP
connection = NNTP(servername)
(reply, count, first, last, name) = connection.group(groupname)
print('%s has %s articles: %s-%s' % (name, count, first, last))

# get request headers only
fetchfrom = str(int(last) - (showcount-1))
(reply, subjects) = connection.xhdr('subject', (fetchfrom + '-' + last))

# show headers, get message hdr+body
for (id, subj) in subjects:                 # [-showcount:] if fetch all hdrs
    print('Article %s [%s]' % (id, subj))
    if not listonly and input('=> Display?') in ['y', 'Y']:
        reply, num, tid, list = connection.head(id)
        for line in list:
            for prefix in showhdrs:
                if line[:len(prefix)] == prefix:
                    print(line[:80])
                    break
        if input('=> Show body?') in ['y', 'Y']:
            reply, num, tid, list = connection.body(id)
            for line in list:
                print(line[:80])
    print()
print(connection.quit())

As for FTP and email tools, the script creates an NNTP object and calls its methods to

fetch newsgroup information and articles’ header and body text. The xhdr method, for

example, loads selected headers from a range of messages.
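
The range passed to xhdr is just a "low-high" string of article numbers. The arithmetic behind it can be sketched without contacting a news server. The helper name last_range and its clamping to the first available article are assumptions added for illustration; the script itself computes only the subtraction.

```python
# Sketch of the article-range arithmetic used before the xhdr call.
# last_range and its clamp to `first` are illustrative additions.
def last_range(last, showcount, first=1):
    """Build the 'low-high' range string covering the last showcount articles."""
    low = max(int(first), int(last) - (showcount - 1))   # never go below first article
    return '%s-%s' % (low, last)

print(last_range('45000', 10))        # ten articles ending at 45000
print(last_range(5, 10))              # group smaller than showcount
```

The clamp matters only for small groups, where subtracting showcount from the last article number would otherwise produce a zero or negative starting number.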

For NNTP servers that require authentication, you may also have to pass a username,

a password, and possibly a reader-mode flag to the NNTP call. See the Python Library

manual for more on other NNTP parameters and object methods.

In the interest of space and time, I’ll omit this script’s outputs here. When run, it

connects to the server and displays each article’s subject line, pausing to ask whether

it should fetch and show the article’s header information lines (headers listed in the

variable showhdrs only) and body text. We can also pass this script an explicit server

name, newsgroup, and display count on the command line to apply it in different ways.

With a little more work, we could turn this script into a full-blown news interface. For

instance, new articles could be posted from within a Python script with code of this

form (assuming the local file already contains proper NNTP header lines):

# to post, say this (but only if you really want to post!)

connection = NNTP(servername)

localfile = open('filename') # file has proper headers

connection.post(localfile) # send text to newsgroup

connection.quit()

We might also add a tkinter-based GUI frontend to this script to make it more usable, but
we’ll leave such an extension on the suggested exercise heap (see also the PyMailGUI
interface’s suggested extensions at the end of the next chapter; email and news messages have
a similar structure).

HTTP: Accessing Websites

Python’s standard library (the modules that are installed with the interpreter) also includes
client-side support for HTTP, the Hypertext Transfer Protocol: a message structure and port
standard used to transfer information on the World Wide Web. In short, this is the protocol
that your web browser (e.g., Internet Explorer, Firefox, Chrome, or Safari) uses to fetch web
pages and run applications on remote servers as you surf the Web. Essentially, it’s just bytes
sent over port 80.

To really understand HTTP-style transfers, you need to know some of the server-side

scripting topics covered in Chapter 15 (e.g., script invocations and Internet address

schemes), so this section may be less useful to readers with no such background.

Luckily, though, the basic HTTP interfaces in Python are simple enough for a cursory

understanding even at this point in the book, so let’s take a brief look here.

Python’s standard http.client module automates much of the protocol defined by

HTTP and allows scripts to fetch web pages as clients much like web browsers; as we’ll

see in Chapter 15, http.server also allows us to implement web servers to handle the

other side of the dialog. For instance, the script in Example 13-29 can be used to grab

any file from any server machine running an HTTP web server program. As usual, the

file (and descriptive header lines) is ultimately transferred as formatted messages over

a standard socket port, but most of the complexity is hidden by the http.client module

(see our raw socket dialog with a port 80 HTTP server in Chapter 12 for a comparison).

Example 13-29. PP4E\Internet\Other\http-getfile.py

"""

fetch a file from an HTTP (web) server over sockets via http.client; the filename
parameter may have a full directory path, and may name a CGI script with ? query
parameters on the end to invoke a remote program; fetched file data or remote
program output could be saved to a local file to mimic FTP, or parsed with str.find
or html.parser module; also: http.client request(method, url, body=None, headers={});
"""

import sys, http.client
showlines = 6
try:
    servername, filename = sys.argv[1:]           # cmdline args?
except:
    servername, filename = 'learning-python.com', '/index.html'

print(servername, filename)
server = http.client.HTTPConnection(servername)   # connect to http site/server
server.putrequest('GET', filename)                # send request and headers
server.putheader('Accept', 'text/html')           # POST requests work here too
server.endheaders()                               # as do CGI script filenames

reply = server.getresponse()                      # read reply headers + data
if reply.status != 200:                           # 200 means success
    print('Error sending request', reply.status, reply.reason)
else:
    data = reply.readlines()                      # file obj for data received
    reply.close()                                 # show lines with eoln at end
    for line in data[:showlines]:                 # to save, write data to file
        print(line)                               # line already has \n, but bytes

Desired server names and filenames can be passed on the command line to override hardcoded
defaults in the script. You need to know something of the HTTP protocol to make the most sense
of this code, but it’s fairly straightforward to decipher. When run on the client, this script
makes an HTTPConnection object to connect to the server, sends it a GET request along with
acceptable reply types, and then reads the server’s reply. Much like raw email message text,
the HTTP server’s reply usually begins with a set of descriptive header lines, followed by the
contents of the requested file. The response object returned by getresponse is file-like, so
we can read the downloaded data from it directly.
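
The same request and reply steps can be tried without an Internet connection by pointing http.client at a throwaway server in the same process. The sketch below is only a stand-in for a real site: DemoHandler and the page it serves are hypothetical, and the server listens on whatever free local port the system assigns.

```python
# Sketch: one http.client GET against a local, in-process HTTP server.
# DemoHandler and its page content are hypothetical stand-ins for a real site.
import http.client, http.server, threading

class DemoHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):                               # same fixed page for any path
        body = b'<HTML>\n<HEAD>\n<TITLE>Demo</TITLE>\n</HEAD>\n'
        self.send_response(200)
        self.send_header('Content-Type', 'text/html; charset=us-ascii')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):                   # keep the demo quiet
        pass

httpd = http.server.HTTPServer(('127.0.0.1', 0), DemoHandler)   # port 0: any free port
threading.Thread(target=httpd.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection('127.0.0.1', httpd.server_port)
conn.request('GET', '/index.html', headers={'Accept': 'text/html'})
reply = conn.getresponse()                          # status line and headers parsed here
status, reason = reply.status, reply.reason
ctype = reply.getheader('Content-Type')             # one of the descriptive header lines
lines = reply.readlines()                           # body lines, as bytes
conn.close()
httpd.shutdown()
httpd.server_close()

print(status, reason, ctype)
for line in lines:
    print(line)
```

Here the single request call replaces the putrequest, putheader, and endheaders sequence of Example 13-29; both forms send the same kind of GET request.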

Let’s fetch a few files with this script. Like all Python client-side scripts, this one works

on any machine with Python and an Internet connection (here it runs on a Windows

client). Assuming that all goes well, the first few lines of the downloaded file are printed;

in a more realistic application, the text we fetch would probably be saved to a local file,

parsed with Python’s html.parser module (introduced in Chapter 19), and so on.

Without arguments, the script simply fetches the HTML index page at http://learning-python.com,
a domain name I host at a commercial service provider:

C:\...\PP4E\Internet\Other> http-getfile.py

learning-python.com /index.html

b'<HTML>\n'

b' \n'

b'<HEAD>\n'

b"<TITLE>Mark Lutz's Python Training Services</TITLE>\n"

b'<link rel="stylesheet" type="text/css" href="_themes/blends/blen...'

b'</HEAD>\n'

Notice that in Python 3.X the fetched data comes back as bytes strings again, not str; since
the html.parser HTML parser we’ll meet in Chapter 19 expects str text strings instead of
bytes, you’ll likely need to resolve a Unicode encoding choice here in order to parse, much
the same as we did for email message text earlier in this chapter. As there, we might decode
from bytes to str per a default, user preferences or selections, headers inspection, or byte
structure analysis. Because sockets send raw bytes, we confront this choice point whenever
data shipped over them is text in nature; unless that text’s type is known or always simple in
form, Unicode implies extra steps.
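
As one possible way to combine "headers inspection" with a default, the following sketch decodes fetched bytes using the charset named in a Content-Type header value, falling back to a default encoding when none is given or decoding fails. The function decode_body and its UTF-8 default are assumptions for illustration, not code from this book's examples.

```python
# Sketch: choose an encoding from a Content-Type header value, else a default.
# decode_body and the 'utf-8' fallback are illustrative assumptions.
from email.message import Message

def decode_body(data, content_type=None, default='utf-8'):
    """Decode bytes to str per the header's charset parameter, if any."""
    charset = None
    if content_type:
        header = Message()
        header['Content-Type'] = content_type
        charset = header.get_content_charset()      # None if no charset parameter
    try:
        return data.decode(charset or default)
    except (LookupError, UnicodeDecodeError):       # unknown codec or undecodable bytes
        return data.decode(default, errors='replace')

print(decode_body(b'caf\xe9\n', 'text/html; charset=ISO-8859-1'))
print(decode_body(b'<HTML>\n'))
```

With an http.client reply object, the header value would come from reply.getheader('Content-Type'); the reply's headers attribute also offers get_content_charset directly.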

We can also list a server and file to be fetched on the command line, if we want to be

more specific. In the following code, we use the script to fetch files from two different

websites by listing their names on the command lines (I’ve truncated some of these

lines so they fit in this book). Notice that the filename argument can include an arbitrary

remote directory path to the desired file, as in the last fetch here:

C:\...\PP4E\Internet\Other> http-getfile.py www.python.org /index.html

www.python.org /index.html

b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3....'

b'\n'

b'<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n'

b'\n'

b'<head>\n'

C:\...\PP4E\Internet\Other> http-getfile.py www.python.org index.html

www.python.org index.html

Error sending request 400 Bad Request

C:\...\PP4E\Internet\Other> http-getfile.py www.rmi.net /~lutz

www.rmi.net /~lutz

Error sending request 301 Moved Permanently

C:\...\PP4E\Internet\Other> http-getfile.py www.rmi.net /~lutz/index.html

www.rmi.net /~lutz/index.html

b'<HTML>\n'

b'\n'

b'<HEAD>\n'

b"<TITLE>Mark Lutz's Book Support Site</TITLE>\n"

b'</HEAD>\n'

b'<BODY BGCOLOR="#f1f1ff">\n'

Notice the second and third attempts in this code: if the request fails, the script receives

and displays an HTTP error code from the server (we forgot the leading slash on the

second, and the “index.html” on the third—required for this server and interface). With

the raw HTTP interfaces, we need to be precise about what we want.

Technically, the string we call filename in the script can refer to either a simple static web
page file or a server-side program that generates HTML as its output. Those server-side
programs are usually called CGI scripts, the topic of Chapters 15 and 16. For now, keep in
mind that when filename refers to a script, this program can be used to invoke another program
that resides on a remote server machine. In that case, we can also specify parameters (called
a query string) to be passed to the remote program after a ?.
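
Query strings need not be typed by hand. As a small sketch, the standard urllib.parse module can build and escape them; the helper name script_path is hypothetical, and the script path is the one used in this section's examples.

```python
# Sketch: build "path?name=value" strings with proper URL escaping.
# script_path is a hypothetical helper; urlencode and quote are standard library.
from urllib.parse import urlencode, quote

def script_path(script, **params):
    """Append URL-escaped query parameters to a script path after a '?'."""
    path = quote(script)                    # escape unsafe characters in the path
    if params:
        path += '?' + urlencode(params)     # name=value pairs joined by '&'
    return path

print(script_path('/cgi-bin/languages.py', language='Python'))
print(script_path('/cgi-bin/languages.py', language='C++'))
```

The second call shows why escaping matters: a literal + in a query string would otherwise be read by the server as a space.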

Here, for instance, we pass a language=Python parameter to a CGI script we will meet

in Chapter 15 (to make this work, we also need to first spawn a locally running HTTP

web server coded in Python using a script we first met in Chapter 1 and will revisit in

Chapter 15):

In a different window

C:\...\PP4E\Internet\Web> webserver.py

webdir ".", port 80

C:\...\PP4E\Internet\Other> http-getfile.py localhost /cgi-bin/languages.py?language=Python

localhost /cgi-bin/languages.py?language=Python

b'<TITLE>Languages</TITLE>\n'

b'<H1>Syntax</H1><HR>\n'

b'<H3>Python</H3><P><PRE>\n'

b" print('Hello World') \n"

b'</PRE></P><BR>\n'

b'<HR>\n'

This book has much more to say later about HTML, CGI scripts, and the meaning of the HTTP GET
request used in Example 13-29 (along with POST, one of two ways to format information sent to
an HTTP server), so we’ll skip additional details here.

Suffice it to say, though, that we could use the HTTP interfaces to write our own web browsers
and build scripts that use websites as though they were subroutines. By sending parameters to
remote programs and parsing their results, websites can take on the role of simple in-process
functions (albeit much more slowly and indirectly).

The urllib Package Revisited

The http.client module we just met provides low-level control for HTTP clients. When dealing
with items available on the Web, though, it’s often easier to code downloads with Python’s
standard urllib.request module, introduced in the FTP section earlier in this chapter. Since
this module is another way to talk HTTP, let’s expand on its interfaces here.

Recall that given a URL, urllib.request either downloads the requested object over

the Net to a local file or gives us a file-like object from which we can read the requested

object’s contents. As a result, the script in Example 13-30 does the same work as the

http.client script we just wrote but requires noticeably less code.

Example 13-30. PP4E\Internet\Other\http-getfile-urllib1.py

"""

fetch a file from an HTTP (web) server over sockets via urllib; urllib supports
HTTP, FTP, files, and HTTPS via URL address strings; for HTTP, the URL can name
a file or trigger a remote CGI script; see also the urllib example in the FTP
section, and the CGI script invocation in a later chapter; files can be fetched
over the net with Python in many ways that vary in code and server requirements:
over sockets, FTP, HTTP, urllib, and CGI outputs; caveat: should run filename
through urllib.parse.quote to escape properly unless hardcoded--see later chapters;
"""

import sys
from urllib.request import urlopen
showlines = 6
try:
    servername, filename = sys.argv[1:]              # cmdline args?
except:
    servername, filename = 'learning-python.com', '/index.html'

remoteaddr = 'http://%s%s' % (servername, filename)  # can name a CGI script too
print(remoteaddr)
remotefile = urlopen(remoteaddr)                     # returns input file object
remotedata = remotefile.readlines()                  # read data directly here
remotefile.close()
for line in remotedata[:showlines]: print(line)      # bytes with embedded \n

Almost all HTTP transfer details are hidden behind the urllib.request interface here.

This version works in almost the same way as the http.client version we wrote first,

but it builds and submits an Internet URL address to get its work done (the constructed

URL is printed as the script’s first output line). As we saw in the FTP section of this

chapter, the urllib.request function urlopen returns a file-like object from which we

can read the remote data. But because the constructed URLs begin with “http://” here,

the urllib.request module automatically employs the lower-level HTTP interfaces to

download the requested file instead of FTP:

C:\...\PP4E\Internet\Other> http-getfile-urllib1.py

http://learning-python.com/index.html

b'<HTML>\n'

b' \n'

b'<HEAD>\n'

b"<TITLE>Mark Lutz's Python Training Services</TITLE>\n"

b'<link rel="stylesheet" type="text/css" href="_themes/blends/blen...'

b'</HEAD>\n'

C:\...\PP4E\Internet\Other> http-getfile-urllib1.py www.python.org /index

http://www.python.org/index

b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3....'

b'\n'

b'<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n'

b'\n'

b'<head>\n'

C:\...\PP4E\Internet\Other> http-getfile-urllib1.py www.rmi.net /~lutz

http://www.rmi.net/~lutz

b'<HTML>\n'

b'\n'

b'<HEAD>\n'

b"<TITLE>Mark Lutz's Book Support Site</TITLE>\n"

b'</HEAD>\n'

b'<BODY BGCOLOR="#f1f1ff">\n'

C:\...\PP4E\Internet\Other> http-getfile-urllib1.py localhost /cgi-bin/languages.py?language=Java

http://localhost/cgi-bin/languages.py?language=Java

b'<TITLE>Languages</TITLE>\n'

b'<H1>Syntax</H1><HR>\n'

b'<H3>Java</H3><P><PRE>\n'

b' System.out.println("Hello World"); \n'

b'</PRE></P><BR>\n'

b'<HR>\n'

As before, the filename argument can name a simple file or a program invocation with

optional parameters at the end, as in the last run here. If you read this output carefully,

you’ll notice that this script still works if you leave the “index.html” off the end of a

site’s root filename (in the third command line); unlike the raw HTTP version of the

preceding section, the URL-based interface is smart enough to do the right thing.
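
A likely reason for the difference is that urllib.request follows HTTP redirects automatically, while http.client simply reports the 301 reply. The sketch below reproduces both behaviors against a throwaway local server; RedirectHandler and its paths are hypothetical, and the opener is built with an empty ProxyHandler only so that a configured proxy cannot interfere with the local demo.

```python
# Sketch: http.client reports a redirect; a urllib.request opener follows it.
# RedirectHandler is a hypothetical local server that mimics a directory redirect.
import http.client, http.server, threading
from urllib.request import build_opener, ProxyHandler

class RedirectHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/~lutz':                   # directory URL: redirect to its page
            self.send_response(301)
            self.send_header('Location', '/~lutz/index.html')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'<HTML>\n'
            self.send_response(200)
            self.send_header('Content-Type', 'text/html')
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):                   # keep the demo quiet
        pass

httpd = http.server.HTTPServer(('127.0.0.1', 0), RedirectHandler)
threading.Thread(target=httpd.serve_forever, daemon=True).start()
port = httpd.server_port

conn = http.client.HTTPConnection('127.0.0.1', port)
conn.request('GET', '/~lutz')
reply = conn.getresponse()                          # low-level: we see the 301 itself
raw_status, location = reply.status, reply.getheader('Location')
reply.read()
conn.close()

opener = build_opener(ProxyHandler({}))             # like urlopen, but never via a proxy
remotefile = opener.open('http://127.0.0.1:%d/~lutz' % port)
final_status, final_url = remotefile.status, remotefile.geturl()
firstline = remotefile.readline()
remotefile.close()
httpd.shutdown()
httpd.server_close()

print(raw_status, location)
print(final_status, final_url, firstline)
```

A script built on http.client could get the same effect by reading the Location header of a 301 or 302 reply and issuing a second request for that address.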

Other urllib Interfaces

One last mutation: the following urllib.request downloader script uses the slightly

higher-level urlretrieve interface in that module to automatically save the downloaded

file or script output to a local file on the client machine. This interface is handy if we

really mean to store the fetched data (e.g., to mimic the FTP protocol). If we plan on

processing the downloaded data immediately, though, this form may be less convenient

than the version we just met: we need to open and read the saved file. Moreover, we

need to provide an extra protocol for specifying or extracting a local filename, as in

Example 13-31.

Example 13-31. PP4E\Internet\Other\http-getfile-urllib2.py

"""

fetch a file from an HTTP (web) server over sockets via urllib; this version
uses an interface that saves the fetched data to a local binary-mode file; the
local filename is either passed in as a cmdline arg or stripped from the URL with
urllib.parse: the filename argument may have a directory path at the front and query
parameters at end, so os.path.split is not enough (only splits off directory path);
caveat: should urllib.parse.quote filename unless known ok--see later chapters;
"""

import sys, os, urllib.request, urllib.parse
showlines = 6
try:
    servername, filename = sys.argv[1:3]               # first 2 cmdline args?
except:
    servername, filename = 'learning-python.com', '/index.html'

remoteaddr = 'http://%s%s' % (servername, filename)    # any address on the Net
if len(sys.argv) == 4:                                 # get result filename
    localname = sys.argv[3]
else:
    (scheme, server, path, parms, query, frag) = urllib.parse.urlparse(remoteaddr)
    localname = os.path.split(path)[1]

print(remoteaddr, localname)
urllib.request.urlretrieve(remoteaddr, localname)      # can be file or script
remotedata = open(localname, 'rb').readlines()         # saved to local file
for line in remotedata[:showlines]: print(line)        # file is bytes/binary

Let’s run this last variant from a command line. Its basic operation is the same as the

last two versions: like the prior one, it builds a URL, and like both of the last two, we

can list an explicit target server and file path on the command line:

C:\...\PP4E\Internet\Other> http-getfile-urllib2.py

http://learning-python.com/index.html index.html

b'<HTML>\n'

b' \n'

b'<HEAD>\n'

b"<TITLE>Mark Lutz's Python Training Services</TITLE>\n"

b'<link rel="stylesheet" type="text/css" href="_themes/blends/blen...'

b'</HEAD>\n'

C:\...\PP4E\Internet\Other> http-getfile-urllib2.py www.python.org /index.html

http://www.python.org/index.html index.html

b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3....'

b'\n'

b'<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n'

b'\n'

b'<head>\n'

Because this version uses a urllib.request interface that automatically saves the downloaded
data in a local file, it’s similar to FTP downloads in spirit. But this script must also
somehow come up with a local filename for storing the data. You can either let the script
strip and use the base filename from the constructed URL, or explicitly pass a local filename
as a last command-line argument. In the prior run, for instance, the downloaded web page is
stored in the local file index.html in the current working directory, the base filename
stripped from the URL (the script prints the URL and local filename as its first output
line). In the next run, the local filename is passed explicitly as py-index.html:

C:\...\PP4E\Internet\Other> http-getfile-urllib2.py

www.python.org /index.html py-index.html

http://www.python.org/index.html py-index.html

b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3....'

b'\n'

b'<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n'

b'\n'

b'<head>\n'

C:\...\PP4E\Internet\Other> http-getfile-urllib2.py www.rmi.net /~lutz books.html

http://www.rmi.net/~lutz books.html

b'<HTML>\n'

b'\n'

b'<HEAD>\n'

b"<TITLE>Mark Lutz's Book Support Site</TITLE>\n"

b'</HEAD>\n'

b'<BODY BGCOLOR="#f1f1ff">\n'

C:\...\PP4E\Internet\Other> http-getfile-urllib2.py www.rmi.net /~lutz/about-pp.html

http://www.rmi.net/~lutz/about-pp.html about-pp.html

b'<HTML>\n'

b'\n'

b'<HEAD>\n'

b'<TITLE>About "Programming Python"</TITLE>\n'

b'</HEAD>\n'

b'\n'
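The filename fallback used in these runs can be tried in isolation. The following is a minimal sketch rather than part of the script itself: the helper name local_name_for is hypothetical, and it simply repeats the script's urlparse and os.path.split steps on a few URLs from this section.

```python
import os
from urllib.parse import urlparse

def local_name_for(url):
    """Hypothetical helper: base filename of the URL's path component."""
    path = urlparse(url).path          # drops scheme, server, and any ?query part
    return os.path.split(path)[1]      # drops the remote directory path

for url in ('http://www.python.org/index.html',
            'http://www.rmi.net/~lutz/about-pp.html',
            'http://localhost/cgi-bin/languages.py?language=Scheme'):
    print(url, '=>', local_name_for(url))
```

A URL whose path ends in a slash yields an empty name under this scheme, which is one more reason the script accepts an explicit local filename.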

Invoking programs and escaping text

The next listing shows this script being used to trigger a remote program. As before, if

you don’t give the local filename explicitly, the script strips the base filename out of

the filename argument. That’s not always easy or appropriate for program

invocations—the filename can contain both a remote directory path at the front and

query parameters at the end for a remote program invocation.

Given a script invocation URL and no explicit output filename, the script extracts the

base filename in the middle by using first the standard urllib.parse module to pull out

the file path, and then os.path.split to strip off the directory path. However, the re-

sulting filename is a remote script’s name, and it may or may not be an appropriate

place to store the data locally. In the first run that follows, for example, the script’s

output goes in a local file called languages.py, the script name in the middle of the URL;

in the second, we instead name the output CxxSyntax.html explicitly to suppress file-

name extraction:

C:\...\PP4E\Internet\Other> python http-getfile-urllib2.py localhost

/cgi-bin/languages.py?language=Scheme

http://localhost/cgi-bin/languages.py?language=Scheme languages.py

b'<TITLE>Languages</TITLE>\n'

b'<H1>Syntax</H1><HR>\n'

b'<H3>Scheme</H3><P><PRE>\n'

b' (display "Hello World") (newline) \n'

b'</PRE></P><BR>\n'

b'<HR>\n'

C:\...\PP4E\Internet\Other> python http-getfile-urllib2.py localhost

/cgi-bin/languages.py?language=C++ CxxSyntax.html

http://localhost/cgi-bin/languages.py?language=C++ CxxSyntax.html

b'<TITLE>Languages</TITLE>\n'

b'<H1>Syntax</H1><HR>\n'

b'<H3>C </H3><P><PRE>\n'

b"Sorry--I don't know that language\n"

b'</PRE></P><BR>\n'

b'<HR>\n'

The remote script returns a not-found message when passed “C++” in the last com-

mand here. It turns out that “+” is a special character in URL strings (meaning a space),

and to be robust, both of the urllib scripts we’ve just written should really run the

filename string through something called urllib.parse.quote, a tool that escapes spe-

cial characters for transmission. We will talk about this in depth in Chapter 15, so

consider this a preview for now. But to make this invocation work, we need to use

special sequences in the constructed URL. Here’s how to do it by hand:

C:\...\PP4E\Internet\Other> python http-getfile-urllib2.py localhost

/cgi-bin/languages.py?language=C%2b%2b CxxSyntax.html

http://localhost/cgi-bin/languages.py?language=C%2b%2b CxxSyntax.html

b'<TITLE>Languages</TITLE>\n'

b'<H1>Syntax</H1><HR>\n'

b'<H3>C++</H3><P><PRE>\n'

b' cout &lt;&lt; "Hello World" &lt;&lt; endl; \n'

b'</PRE></P><BR>\n'

b'<HR>\n'

The odd %2b strings in this command line are not entirely magical: the escaping required

for URLs can be seen by running standard Python tools manually—this is what these

scripts should do automatically to be able to handle all possible cases well;

urllib.parse.unquote can undo these escapes if needed:

C:\...\PP4E\Internet\Other> python

>>> import urllib.parse

>>> urllib.parse.quote('C++')

'C%2B%2B'
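As a rough sketch of what automatic escaping might look like, the standard urllib.parse tools can quote the path and encode the query parameters separately. The build_url helper and its arguments are hypothetical and are not code from the scripts in this chapter:

```python
from urllib.parse import quote, unquote, unquote_plus, urlencode

def build_url(server, path, params):
    """Hypothetical helper: escape path and query parameters separately."""
    query = urlencode(params)                      # escapes values; '+' becomes %2B
    return 'http://%s%s?%s' % (server, quote(path), query)

url = build_url('localhost', '/cgi-bin/languages.py', {'language': 'C++'})
print(url)                        # http://localhost/cgi-bin/languages.py?language=C%2B%2B
print(unquote('C%2B%2B'))         # C++  (undoes the %XX escapes)
print(repr(unquote_plus('C++')))  # 'C  ' (how an unescaped '+' is decoded: as a space)
```

The last line suggests why the unescaped request earlier came back labeled "C" followed by blanks: each "+" was decoded as a space.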

Again, don’t work too hard at understanding these last few commands; we will revisit

URLs and URL escapes in Chapter 15, while exploring server-side scripting in Python.

I will also explain there why the C++ result came back with other oddities like

&lt;&lt;, which are HTML escapes for <<. They are generated by the tool cgi.escape in the script on the

server that produces the reply, and are usually undone by HTML parsers, including

Python’s html.parser module, which we’ll meet in Chapter 19:

>>> import cgi

>>> cgi.escape('<<')

'&lt;&lt;'
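Note that cgi.escape was deprecated after this book's Python release and was removed from the standard library in Python 3.8; on newer Pythons, html.escape and html.unescape cover the same ground. A minimal sketch, assuming a Python recent enough to have both (3.4 or later):

```python
import html

text = 'cout << "Hello World" << endl;'
escaped = html.escape(text, quote=False)   # quote=False leaves " characters alone
print(escaped)                             # cout &lt;&lt; "Hello World" &lt;&lt; endl;
print(html.unescape(escaped) == text)      # True: the escapes round-trip
```

Passing quote=False mirrors the default behavior of the older cgi.escape call, which did not escape quote characters unless asked to.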

Also in Chapter 15, we’ll meet urllib support for proxies, and its support for client-

side cookies. We’ll discuss the related HTTPS concept in Chapter 16—HTTP trans-

missions over secure sockets, supported by urllib.request on the client side if SSL

support is compiled into your Python. For now, it’s time to wrap up our look at the

Web, and the Internet at large, from the client side of the fence.

Other Client-Side Scripting Options

In this chapter, we focused on client-side interfaces to standard protocols that run over

sockets, but as suggested in an earlier footnote, client-side programming can take other

forms, too. We outlined many of these at the start of Chapter 12—web service protocols

(including SOAP and XML-RPC); Rich Internet Application toolkits (including Flex,

Silverlight, and pyjamas); cross-language framework integration (including Java

and .NET); and more.

As mentioned, most of these serve to extend the functionality of web browsers, and so

ultimately run on top of the HTTP protocol we explored in this chapter. For instance:

• The Jython system, a compiler that supports Python-coded Java applets—general-

purpose programs downloaded from a server and run locally on the client when

accessed or referenced by a URL, which extend the functionality of web browsers

and interactions.

• Similarly, RIAs provide AJAX communication and widget toolkits that allow Java-

Script to implement user interaction within web browsers, which is more dynamic

and rich than HTML and web browsers otherwise support.

• In Chapter 19, we’ll also study Python’s support for XML—structured text that is

used as the data transfer medium of client/server dialogs in web service protocols

such as XML-RPC, which transfer XML-encoded objects over HTTP, and are

supported by Python’s xmlrpc standard library package. Such protocols can sim-

plify the interface to web servers in their clients.
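To give a feel for the XML-RPC item in this list, the xmlrpc.client module can encode and decode a call without any network traffic. This is only a sketch: the method name lookup and its arguments are made up for illustration, and a real client would normally go through xmlrpc.client.ServerProxy and a live server instead.

```python
import xmlrpc.client

# Encode a call to a hypothetical remote method named 'lookup'
request = xmlrpc.client.dumps(('Scheme', [1, 2, 3], {'verbose': True}),
                              methodname='lookup')
print(request)                                 # XML text, as carried in an HTTP request body

# Decode it again, as the receiving side would
params, method = xmlrpc.client.loads(request)
print(method, params)
```

The Python objects survive the round trip through XML text, which is the sense in which such protocols transfer XML-encoded objects over HTTP.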

In deference to time and space, though, we won’t go into further details on these and

other client-side tools here. If you are interested in using Python to script clients, you

should take a few minutes to become familiar with the list of Internet tools documented

in the Python library reference manual. All work on similar principles but have slightly

distinct interfaces.

In Chapter 15, we’ll hop the fence to the other side of the Internet world and explore

scripts that run on server machines. Such programs give rise to the grander notion of

applications that live entirely on the Web and are launched by web browsers. As we

take this leap in structure, keep in mind that the tools we met in this and the preceding

chapter are often sufficient to implement all the distributed processing that many ap-

plications require, and they can work in harmony with scripts that run on a server. To

completely understand the Web worldview, though, we need to explore the server

realm, too.

Before we get there, though, the next chapter puts concepts we’ve learned here to work

by presenting a complete client-side program—a full-blown mail client GUI, which ties

together many of the tools we’ve learned and coded. In fact, much of the email work

we’ve done in this chapter was designed to lay the groundwork we’ll need to tackle the

realistically scaled PyMailGUI example of the next chapter. Really, much of this book

so far has served to build up skills required to equip us for this task: as we’ll see,

PyMailGUI combines system tools, GUIs, and client-side Internet protocols to produce

a useful system that does real work. As an added bonus, this example will help us

understand the trade-offs between the client solutions we’ve met here and the server-

side solutions we’ll study later in this part of the book.

CHAPTER 14

The PyMailGUI Client

“Use the Source, Luke”

The preceding chapter introduced Python’s client-side Internet protocols tool set—the

standard library modules available for email, FTP, network news, HTTP, and more,

from within a Python script. This chapter picks up where the last one left off and

presents a complete client-side example—PyMailGUI, a Python program that sends,

receives, composes, and parses Internet email messages.

Although the end result is a working program that you can actually use for your email,

this chapter also has a few additional agendas worth noting before we get started:

Client-side scripting

PyMailGUI implements a full-featured desktop GUI that runs on your machine

and communicates with your mail servers when necessary. As such, it is a network

client program that further illustrates some of the preceding chapter’s topics, and

it will help us contrast server-side solutions introduced in the next chapter.

Code reuse

Additionally, PyMailGUI ties together a number of the utility modules we’ve been

writing in the book so far, and it demonstrates the power of code reuse in the

process—it uses a thread module to allow mail transfers to overlap in time, a set

of mail modules to process message content and route it across networks, a window

protocol module to handle icons, a text editor component, and so on. Moreover,

it inherits the power of tools in the Python standard library, such as the email

package; message construction and parsing, for example, is nearly trivial here.

Programming in the large

And finally, this chapter serves to illustrate realistic and large-scale software de-

velopment in action. Because PyMailGUI is a relatively large and complete pro-

gram, it shows by example some of the code structuring techniques that prove

useful once we leave the realm of the small and artificial. For instance, object-oriented

programming and modular design work well here to divide the system into

smaller, self-contained units.

Ultimately, though, PyMailGUI serves to illustrate just how far the combination of

GUIs, networking, and Python can take us. Like all Python programs, this system is

scriptable—once you’ve learned its general structure, you can easily change it to work

as you like, by modifying its source code. And like all Python programs, this one is

portable—you can run it on any system with Python and a network connection, without

having to change its code. Such advantages become automatic when your software is

coded in an open source, portable, and readable language like Python.

Source Code Modules and Size

This chapter is something of a self-study exercise. Because PyMailGUI is fairly large

and mostly applies concepts we’ve already learned, we won’t go into much detail about

its actual code. Instead, it is listed for you to read on your own. I encourage you to

study the source and comments and to run this program live to get a feel for its oper-

ation; example save-mail files are included so you can even experiment offline.

As you study and run this program, you’ll also want to refer back to the modules we

introduced earlier in the book and are reusing here, to gain a full understanding of the

system. For reference, here are the major examples that will see new action in this

chapter:

Example 13-21: PP4E.Internet.Email.mailtools (package)

Server sends and receives, parsing, construction (Client-side scripting chapter)

Example 10-20: PP4E.Gui.Tools.threadtools.py

Thread queue management for GUI callbacks (GUI tools chapter)

Example 10-16: PP4E.Gui.Tools.windows.py

Border configuration for top-level window (GUI tools chapter)

Example 11-4: PP4E.Gui.TextEditor.textEditor.py

Text widget used in mail view windows, and in some pop ups (GUI examples

chapter)

Some of these modules in turn use additional examples we coded earlier but that are

not imported by PyMailGUI itself (textEditor, for instance, uses guimaker to create its

windows and toolbar). Naturally, we’ll also be coding new modules here. The following

new modules are intended to be potentially useful in other programs:

popuputil.py

Various pop-up windows, written for general use

messagecache.py

A cache manager that keeps track of mail already loaded

wraplines.py

A utility for wrapping long lines of messages

mailconfig.py

User configuration parameters—server names, fonts, and so on (augmented here)

html2text.py

A rudimentary parser for extracting plain text from HTML-based emails

Finally, the following are the new major modules coded in this chapter which are spe-

cific to the PyMailGUI program. In total, PyMailGUI itself consists of the ten modules

in this and the preceding lists, along with a handful of less prominent source files we’ll

see in this chapter:

SharedNames.py

Program-wide globals used by multiple files

ViewWindows.py

The implementation of View, Write, Reply, and Forward message view windows

ListWindows.py

The implementation of mail-server and local-file message list windows

PyMailGuiHelp.py

User-oriented help text, opened by the main window’s bar button

PyMailGui.py

The main, top-level file of the program, run to launch the main window

Code size

As a realistically scaled system, PyMailGUI’s size is also instructive. All told, PyMailGUI

is composed of 18 new files: the 10 new Python modules in the two preceding lists,

plus an HTML help file, a small configuration file for PyEdit pop ups, a currently unused

package initialization file, and 5 short Python files in a subdirectory used for alternate

account configuration.

Together, it contains some 2,400 new lines of program source code in 16 Python files

(including comments and whitespace), plus roughly 1,700 lines of help text in one

Python and one HTML file (in two flavors). This 4,100 new line total doesn’t include

the four other book examples listed in the previous section that are reused in PyMail-

GUI. The reused examples themselves constitute 2,600 additional lines of Python pro-

gram code—roughly 1,000 lines each for PyEdit and mailtools alone. That brings the

grand total to 6,700 lines: 4,100 new + 2,600 reused. Of this total, 5,000 lines are in

program code files (2,400 of which are new here) and 1,700 lines are help text.*
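The totals in this paragraph are easy to verify. The following tally uses only the rounded figures quoted above; the variable names are chosen here for illustration:

```python
new_code  = 2400    # new program source lines in 16 Python files
help_text = 1700    # help text in one Python and one HTML file
reused    = 2600    # reused book examples (PyEdit, mailtools, and others)

new_total   = new_code + help_text      # everything new in this chapter
grand_total = new_total + reused        # new plus reused
code_total  = new_code + reused         # program code only, no help text
print(new_total, grand_total, code_total)   # 4100 6700 5000
```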

I obtained these line counts with PyEdit’s Info pop up, and opened the files with the
code button in the PyDemos entry for this program (the Source button in PyMailGUI’s
own text-based help window does similar work). For the breakdown by individual
files, see the Excel spreadsheet file linecounts.xls in the media subdirectory of PyMailGUI;
this file is also used to test attachment sends and receives, and so appears near
the end of the emails in file SavedEmail\version30-4E if opened in the GUI (we’ll see
how to open mail save files in a moment).

* And remember: you would have to multiply these line counts by a factor of four or more to get the equivalent
in a language like C or C++. If you’ve done much programming, you probably recognize that the fact that
we can implement a fairly full-featured mail processing program in roughly 5,000 total lines of program code
speaks volumes about the power of the Python language and its libraries. For comparison, the original 1.0
version of this program from the second edition of this book was just 745 total lines in 3 new modules, but
it also was very limited: it did not support PyMailGUI 2.X’s attachments, thread overlap, local mail files,
and so on, and did not have the Internationalization support or other features of this edition’s PyMailGUI 3.X.

Watch for the changes section ahead for size comparisons to prior versions. Also see

the SLOC counter script in Chapter 6 for an alternative way to count source lines that

is less manual, but can’t include all related files in a single run and doesn’t discriminate

between program code and help text.

Code Structure

As these statistics probably suggest, this is the largest example we’ll see in this book,

but you shouldn’t be deterred by its size. Because it uses modular and OOP techniques,

the code is simpler than you may think:

• Python’s modules allow us to divide the system into files that have a cohesive

purpose, with minimal coupling between them—code is easier to locate and un-

derstand if your modules have a logical, self-contained structure.

• Python’s OOP support allows us to factor code for reuse and avoid redundancy—

as you’ll see, code is customized, not repeated, and the classes we will code reflect

the actual components of the GUI to make them easy to follow.

For instance, the implementation of mail list windows is easy to read and change, be-

cause it has been factored into a common shared superclass, which is customized by

subclasses for mail-server and save-file lists; since these are mostly just variations on a

theme, most of the code appears in just one place. Similarly, the code that implements

the message view window is a superclass shared by write, reply, and forward compo-

sition windows; subclasses simply tailor it for writing rather than viewing.

Although we’ll deploy these techniques in the context of a mail processing program

here, such techniques will apply to any nontrivial program you’ll write in Python.
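The superclass-plus-subclasses structure described above can be illustrated in miniature. This sketch is not PyMailGUI's code: the class and method names are hypothetical, and lists of strings stand in for the GUI widgets and mail sources.

```python
class ListWindowBase:
    """Hypothetical shared superclass: logic common to all mail list windows."""
    def __init__(self, title):
        self.title = title
        self.lines = []

    def load(self):
        # Common steps are coded once here; subclasses supply only fetch_headers
        self.lines = ['%03d: %s' % (num, subject)
                      for num, subject in enumerate(self.fetch_headers(), start=1)]
        return self.lines

    def fetch_headers(self):
        raise NotImplementedError('subclass must define fetch_headers')

class ServerListWindow(ListWindowBase):
    def fetch_headers(self):
        return ['Hello from the server']     # stand-in for a mail server fetch

class FileListWindow(ListWindowBase):
    def __init__(self, title, saved):
        super().__init__(title)
        self.saved = saved

    def fetch_headers(self):
        return list(self.saved)              # stand-in for reading a save-mail file

for window in (ServerListWindow('inbox'), FileListWindow('saved', ['Note 1', 'Note 2'])):
    print(window.title, window.load())
```

Because the formatting logic lives only in the superclass, a change to how list lines are displayed would be made in one place and picked up by both kinds of window.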

To help get you started, the PyMailGuiHelp.py module listed in part near the end of this

chapter includes a help text string that describes how this program is used, as well as

its major features. You can also view this help live in both text and HTML form when

the program is run. Experimenting with the system, while referring to its code, is prob-

ably the best and quickest way to uncover its secrets.

Why PyMailGUI?

Before we start digging into the code of this relatively large system, some context is in

order. PyMailGUI is a Python program that implements a client-side email processing

user interface with the standard tkinter GUI toolkit. It is presented both as an instance

of Python Internet scripting and as a realistically scaled example that ties together other

tools we’ve already seen, such as threads and tkinter GUIs.

Like the pymail console-based program we wrote in Chapter 13, PyMailGUI runs en-

tirely on your local computer. Your email is fetched from and sent to remote mail servers

over sockets, but the program and its user interface run locally. As a result, PyMailGUI

is called an email client: like pymail, it employs Python’s client-side tools to talk to mail

servers from the local machine. Unlike pymail, though, PyMailGUI is a full-featured

user interface: email operations are performed with point-and-click operations and

advanced mail processing such as attachments, save files, and Internationalization is

supported.

Like many examples presented in this text, PyMailGUI is a practical, useful program.

In fact, I run it on all kinds of machines to check my email while traveling around the

world teaching Python classes. Although PyMailGUI won’t put Microsoft Outlook out

of business anytime soon, it has two key pragmatic features alluded to earlier that have

nothing to do with email itself—portability and scriptability, which are attractive fea-

tures in their own right and merit a few additional words here:

It’s portable

PyMailGUI runs on any machine with sockets and a Python with tkinter installed.

Because email is transferred with the Python libraries, any Internet connection that

supports Post Office Protocol (POP) and Simple Mail Transfer Protocol (SMTP)

access will do. Moreover, because the user interface is coded with tkinter, PyMail-

GUI should work, unchanged, on Windows, the X Window System (Unix, Linux),

and the Macintosh (classic and OS X), as long as Python 3.X runs there too.

Microsoft Outlook may be a more feature-rich package, but it has to be run on

Windows, and more specifically, on a single Windows machine. Because it gener-

ally deletes email from a server as it is downloaded by default and stores it on the

client, you cannot run Outlook on multiple machines without spreading your email

across all those machines. By contrast, PyMailGUI saves and deletes email only on

request, and so it is a bit friendlier to people who check their email in an ad hoc

fashion on arbitrary computers (like me).

It’s scriptable

PyMailGUI can become anything you want it to be because it is fully programma-

ble. In fact, this is the real killer feature of PyMailGUI and of open source software

like Python in general—because you have full access to PyMailGUI’s source code,

you are in complete control of where it evolves from here. You have nowhere near

as much control over commercial, closed products like Outlook; you generally get

whatever a large company decided you need, along with whatever bugs that com-

pany might have introduced.

As a Python script, PyMailGUI is a much more flexible tool. For instance, we can

change its layout, disable features, and add completely new functionality quickly

by changing its Python source code. Don’t like the mail-list display? Change a few

lines of code to customize it. Want to save and delete your mail automatically as

it is loaded? Add some more code and buttons. Tired of seeing junk mail? Add a

few lines of text processing code to the load function to filter spam. These are just

a few examples. The point is that because PyMailGUI is written in a high-level,

easy-to-maintain scripting language, such customizations are relatively simple, and

might even be fun.
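As one illustration of the spam-filtering idea, the following sketch screens fetched header text by subject line. It is an assumption-laden toy, not part of PyMailGUI: the function names and the blocked-phrase list are hypothetical, and a real filter would need far better rules.

```python
from email.parser import Parser

BLOCKED_WORDS = ('lottery', 'free money')        # illustrative only

def looks_like_spam(header_text):
    """Hypothetical filter: flag mail whose Subject contains a blocked phrase."""
    headers = Parser().parsestr(header_text, headersonly=True)
    subject = (headers.get('Subject') or '').lower()
    return any(word in subject for word in BLOCKED_WORDS)

def filter_headers(all_headers):
    """Keep only the header texts that do not look like spam."""
    return [text for text in all_headers if not looks_like_spam(text)]

fetched = ['From: a@example.com\nSubject: Meeting notes\n\n',
           'From: b@example.com\nSubject: You WON the lottery\n\n']
print(len(filter_headers(fetched)), 'of', len(fetched), 'messages kept')
```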

At the end of the day, because of such features, this is a realistic Python program that

I actually use—both as a primary email tool and as a fallback option when my ISP’s

webmail system goes down (which, as I mentioned in the prior chapter, has a way of

happening at the worst possible times).† Python scripting is an enabling skill to have.

Running PyMailGUI

Of course, to script PyMailGUI on your own, you’ll need to be able to run it. PyMailGUI

requires only a computer with some sort of Internet connectivity (a PC with a broad-

band or dial-up account will do) and an installed Python with the tkinter extension

enabled. The Windows port of Python has this capability, so Windows PC users should

be able to run this program immediately by clicking its icon.

Two notes on running the system: first, you’ll want to change the file mailconfig.py in

the program’s source directory to reflect your account’s parameters, if you wish to send

or receive mail from a live server; more on this as we interact with the system ahead.

Second, you can still experiment with the system without a live Internet connection—

for a quick look at message view windows, use the main window’s Open buttons to

open saved-mail files included in the program’s SavedMail subdirectory. The PyDemos

launcher script at the top of the book’s examples directory, for example, forces Py-

MailGUI to open saved-mail files by passing filenames on the command line. Although

you’ll probably want to connect to your email servers eventually, viewing saved mails

offline is enough to sample the system’s flavor and does not require any configuration

file changes.

Presentation Strategy

PyMailGUI is easily the largest program in this book, but it doesn’t introduce many

library interfaces that we haven’t already seen in this book. For instance:

• The PyMailGUI interface is built with Python’s tkinter, using the familiar listboxes,

buttons, and text widgets we met earlier.

• Python’s email package is applied to pull out headers, text, and attachments of

messages, and to compose the same.

• Python’s POP and SMTP library modules are used to fetch, send, and delete mail

over sockets.

• Python threads, if installed in your Python interpreter, are put to work to avoid

blocking during potentially overlapping, long-running mail operations.

† In fact, my ISP’s webmail send system went down the very day I had to submit the third edition of this book

to my publisher! No worries: I fired up PyMailGUI and used it to send the book as attachment files through

a different server. In a sense, this book submitted itself.

We’re also going to reuse the PyEdit TextEditor object we wrote in Chapter 11 to view

and compose messages and to pop up raw text, attachments, and source; the mailtools

package’s tools we wrote in Chapter 13 to load, send, and delete mail with a

server; and the mailconfig module strategy introduced in Chapter 13 to support end-

user settings. PyMailGUI is largely an exercise in combining existing tools.
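As a reminder of how little code the email package requires for its role here, the following sketch composes a simple message and parses it back. The addresses and text are placeholders, and PyMailGUI's own handling (through the mailtools package) deals with far more cases, such as attachments and encodings.

```python
from email.message import Message
from email import message_from_string

# Compose: headers are set by key, the body by set_payload
msg = Message()
msg['From'] = 'sender@example.com'
msg['To'] = 'receiver@example.com'
msg['Subject'] = 'A test'
msg.set_payload('Hello from a sketch.\n')
text = msg.as_string()                 # full message text, ready to send

# Parse: the same text becomes a Message object again
parsed = message_from_string(text)
print(parsed['Subject'])
for part in parsed.walk():             # a simple mail has just one part
    print(part.get_content_type())
print(parsed.get_payload())
```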

On the other hand, because this program is so long, we won’t exhaustively document

all of its code. Instead, we’ll begin with a quick look at how PyMailGUI has evolved,

and then move on to describing how it works today from an end user’s perspective—

a brief demo of its windows in action. After that, we’ll list the system’s new source code

modules without many additional comments, for further study.

Like most of the longer case studies in this book, this section assumes that you already

know enough Python to make sense of the code on your own. If you’ve been reading

this book linearly, you should also know enough about tkinter, threads, and mail in-

terfaces to understand the library tools applied here. If you get stuck, you may wish to

brush up on the presentation of these topics earlier in the book.

Major PyMailGUI Changes

Like the PyEdit text editor of Chapter 11, PyMailGUI serves as a good example of

software evolution in action. Because its revisions help document this system’s func-

tionality, and because this example is as much about software engineering as about

Python itself, let’s take a quick look at its recent changes.

New in Version 2.1 and 2.0 (Third Edition)

The 2.1 version of PyMailGUI presented in the third edition of the book in early 2006

is still largely present and current in this fourth edition in 2010. Version 2.1 added a

handful of enhancements to version 2.0, and version 2.0 was a complete rewrite of the

1.0 version of the second edition with a radically expanded feature set.

In fact, the second edition’s version 1.0 of this program written in early 2000 was only

some 685 total program lines long (515 lines for the GUI main script and 170 lines in

an email utilities module), not counting related examples reused, and just 60 lines in

its help text module. Version 1.0 was really something of a prototype (if not toy), written

mostly to serve as a short book example.

Although it did not yet support Internationalized mail content or other 3.0 extensions,

in the third edition, PyMailGUI 2.1 became a much more realistic and feature-rich

program that could be used for day-to-day email processing. It grew by nearly a factor

of three to be 1,800 new program source lines (plus 1,700 program lines in related

modules reused, and 500 additional lines of help text). By comparison, version 3.0 by

itself grew only by some 30% to be 2,400 new program source lines as described earlier

(plus 2,500 lines in related modules, and 1,700 lines of help text). Statistically minded

readers: consult file linecounts-prior-version.xls in PyMailGUI’s media subdirectory for

a line counts breakdown for version 2.1 by file.

In version 2.1, among PyMailGUI’s new weapons were (and still are) these:

• MIME multipart mails with attachments may be both viewed and composed.

• Mail transfers are no longer blocking, and may overlap in time.

• Mail may be saved and processed offline from a local file.

• Message parts may now be opened automatically within the GUI.

• Multiple messages may be selected for processing in list windows.

• Initial downloads fetch mail headers only; full mails are fetched on request.

• View window headers and list window columns are configurable.

• Deletions are performed immediately, not delayed until program exit.

• Most server transfers report their progress in the GUI.

• Long lines are intelligently wrapped in viewed and quoted text.

• Fonts and colors in list and view windows may be configured by the user.

• Authenticating SMTP mail-send servers that require login are supported.

• Sent messages are saved in a local file, which may be opened in the GUI.

• View windows intelligently pick a main text part to be displayed.

• Already fetched mail headers and full mails are cached for speed.

• Date strings and addresses in composed mails are formatted properly.

• View windows now have quick-access buttons for attachments/parts (2.1).

• Inbox out-of-sync errors are detected on deletes, and on index and mail loads (2.1).

• Save-mail file loads and deletes are threaded, to avoid pauses for large files (2.1).

The last three items on this list were added in version 2.1; the rest were part of the 2.0

rewrite. Some of these changes were made simple by growth in standard library tools

(e.g., support for attachments is straightforward with the new email package), but most

represented changes in PyMailGUI itself. There were also a few genuine fixes: addresses

were parsed more accurately, and date and time formats in sent mails became standards

conforming, because these tasks used new tools in the email package.
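The nonblocking, overlapping transfers on this list rest on a familiar pattern: worker threads do the slow work and post results on a queue, which the main thread polls. The sketch below shows that pattern only, with hypothetical names and a sleep call standing in for a server transfer; it is not the threadtools module's actual interface.

```python
import queue, threading, time

results = queue.Queue()

def fetch(msgnum):
    """Stand-in for a slow server transfer, run in a worker thread."""
    time.sleep(0.1)
    results.put((msgnum, 'text of message %d' % msgnum))

workers = [threading.Thread(target=fetch, args=(num,)) for num in (1, 2, 3)]
for worker in workers:
    worker.start()                     # all three transfers overlap in time

fetched = {}                           # doubles as a cache of loaded mail
while len(fetched) < len(workers):
    try:
        msgnum, text = results.get(block=False)
    except queue.Empty:
        time.sleep(0.01)               # a GUI would return to its event loop here
    else:
        fetched[msgnum] = text         # only the main thread updates shared state

for worker in workers:
    worker.join()
print(sorted(fetched))                 # [1, 2, 3]
```

In a GUI, the polling loop would be replaced by a timer callback, so the interface stays active while transfers are in progress.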

New in Version 3.0 (Fourth Edition)

PyMailGUI version 3.0, presented in this fourth edition of this book, inherits all of 2.1’s

upgrades described in the prior section and adds many of its own. Changes are perhaps

less dramatic in version 3.0, though some address important usability issues, and they

seem collectively sufficient to justify assigning this version a new major release number.

Here’s a summary of what’s new this time around:

Python 3.X port

The code was updated to run under Python 3.X only; Python 2.X is no longer

supported without code changes. Although some of the task of porting to Python

3.X requires only minor coding changes, other idiomatic implications are more far

reaching. Python 3.X’s new Unicode focus, for example, motivated much of the

Internationalization support in this version of PyMailGUI (discussed ahead).

Layout improvements

View window forms are laid out with gridding instead of packed column frames,

for better appearance and platform neutrality of email headers (see Chapter 9 for

more details on form layout). In addition, list window toolbars are now arranged

with expanding separators for clarity; this effectively groups buttons by their roles

and scope. List windows are also larger when initially opened to show more.

Text editor fix for Tk change

Both the embedded text editor and some text editor instances popped up on de-

mand are now forcibly updated before new text is inserted, for accurate initial

positioning at line 1. See PyEdit in Chapter 11 for more on this requirement; it

stems from a recent change (bug?) in either Tk or tkinter.

Text editor upgrades inherited

Because the PyEdit program is reused in multiple roles here, this version of Py-

MailGUI also acquires all its latest fixes by proxy. Most prominently, these include

a new Grep external files search dialog and support for displaying, opening, and

saving Unicode text. See Chapter 11 for details.

Workaround for Python 3.1 bug on traceback prints

In the obscure-but-all-too-typical category: the common function in

SharedNames.py that prints traceback details had to be changed to work correctly

under Python 3.X. The traceback module’s print_tb function can no longer print

a stack trace to sys.stdout if the calling program is spawned from another on

Windows; it still can as before if the caller was run normally from a shell prompt.

Since this function is called from the main thread on worker thread exceptions, if

allowed to fail any printed error kills the GUI entirely when it is spawned from the

gadget or demo launchers.

To work around this, the function now catches exceptions when print_tb is called

and in response runs it again with a real file instead of sys.stdout. This appears to

be a Python 3.X regression, as the same code worked correctly in both contexts in

Python 2.5 and 2.6. Unlike some similar issues, it has nothing to do with printing

Unicode, as stack traces are all ASCII text. Even more baffling, directly printing to

stdout in the same function works fine. Hey, if it were easy, they wouldn’t call it

“work.”
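
As a rough illustration of this workaround, the following sketch retries print_tb with a real file when printing to sys.stdout fails. The function name, the log filename, and the broad except clause are assumptions made for this example; the actual code lives in SharedNames.py and may differ in its details.

```python
import sys
import traceback

def print_traceback(exc_tb, fallback_path='traceback.txt'):
    """Print a stack trace to sys.stdout; on failure, retry with a real file."""
    try:
        traceback.print_tb(exc_tb, file=sys.stdout)
        return 'stdout'
    except Exception:
        # assumption: any failure here means stdout is unusable in this
        # launch context, so append the trace to a real file instead
        with open(fallback_path, 'a') as log:
            traceback.print_tb(exc_tb, file=log)
        return fallback_path
```

A caller would typically pass sys.exc_info()[2] from within an except block; the return value simply reports where the trace ended up.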

Bcc addresses added to envelope but header omitted

Minor change: addresses entered in the user-selectable Bcc header line of edit windows

are included in the recipients list (the “envelope”), but the Bcc header line

itself is no longer included in the message text sent. Otherwise, Bcc recipients might

be seen by some email readers and clients (including PyMailGUI), which defeats

most of this header’s purpose.
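
The envelope-versus-header distinction can be sketched as follows. This is a simplified stand-in for the mailtools send code, not a copy of it: the function and parameter names are hypothetical, and a real send would hand the results to smtplib.

```python
from email.message import Message

def compose(sender, to, cc, bcc, subject, body):
    """Return (envelope recipients, full message text) for one send."""
    msg = Message()
    msg['From'] = sender
    msg['To'] = ', '.join(to)
    if cc:
        msg['Cc'] = ', '.join(cc)
    msg['Subject'] = subject
    msg.set_payload(body)
    # Bcc addresses join the envelope, but no Bcc header line is added
    recipients = list(to) + list(cc) + list(bcc)
    return recipients, msg.as_string()

recipients, text = compose('me@example.com', ['a@example.com'], [],
                           ['hidden@example.com'], 'Hi', 'Hello there\n')
# a real client would now run something like:
#     smtplib.SMTP(servername).sendmail('me@example.com', recipients, text)
```

The hidden address is delivered to because it is in the recipients list, yet it never appears in the text that other recipients receive.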

Avoiding parallel fetches of the same mail

PyMailGUI loads only mail headers initially, and fetches a mail’s full text later when

needed for viewing and other operations, allowing multiple fetches to overlap in

time (they are run in parallel threads). Though unlikely, it was not impossible for

a user to trigger a new fetch for a mail that was currently being fetched, by selecting

the mail again during its download (clicking its list entry twice quickly sufficed to

kick this off). Although the message cache updates performed in the parallel fetch

threads appeared to be thread safe, this behavior seemed odd and wasted time.

To do better, this version now keeps track of all fetches in progress in the main

thread, to avoid this overlap potential entirely: a message fetch in progress disables

all new fetch requests that it is a part of, until its fetch completes. Multiple

overlapping fetches are still allowed, as long as their targets do not intersect. A set

is used to detect nondisjoint fetch requests. Mails already fetched and cached are

not subject to this check and can always be selected irrespective of any fetches in

progress.
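
The set-based overlap check might look something like the following sketch. The class and method names are invented for illustration; the point is only that a new request is refused when its message numbers intersect the set of fetches already under way, and allowed otherwise.

```python
class FetchTracker:
    """Track message numbers whose full text is being fetched by some thread."""

    def __init__(self):
        self.in_progress = set()

    def try_start(self, msgnums):
        """Register a fetch request; refuse it if it overlaps one in progress."""
        wanted = set(msgnums)
        if not self.in_progress.isdisjoint(wanted):
            return False                 # nondisjoint: some mail is mid-download
        self.in_progress |= wanted
        return True

    def finish(self, msgnums):
        """Call from the main thread when a fetch thread exits."""
        self.in_progress -= set(msgnums)

tracker = FetchTracker()
first  = tracker.try_start([3, 4, 5])    # allowed
second = tracker.try_start([5, 6])       # refused: mail 5 is already loading
third  = tracker.try_start([7, 8])       # allowed: targets do not intersect
tracker.finish([3, 4, 5])
fourth = tracker.try_start([5, 6])       # allowed now
```

Because the set is consulted and updated only in the main GUI thread in this design, no lock is needed around it.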

Multiple recipients separated in GUI by commas, not semicolons

In the prior edition, “;” was used as the recipient separator character, and addresses were

naively split on “;” on a send. This attempted to avoid conflicts with “,” commonly

used in email names. Replies dropped the name part if it contained a “;” when

extracting a To address, but it was not impossible that clashes could still arise if a

“;” appeared both as the separator and in a manually typed address’s name.

To improve, this edition uses “,” as the recipient separator, and fully parses email

address lists with the email package’s getaddresses and parseaddr tools, instead of

splitting naively. Because these tools fully parse the list’s content, “,” characters

embedded in email address name parts are not mistakenly taken as address separators,

and so do not clash. Servers and clients generally expect “,” separators, too,

so this works naturally.

With this fix, commas can appear both as address separators and embedded

in address name components. For replies, this is handled automatically: the To

field is prefilled with the From in the original message. For sends, the split happens

automatically in email tools for To, Cc, and Bcc header fields (the latter two are

ignored if they contain just the initial “?” when sent).
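
To see why full parsing matters, compare a naive split with the email package's tools on an address whose name part contains a comma. The sample addresses here are made up for the demonstration.

```python
from email.utils import getaddresses, parseaddr, formataddr

field = '"Smith, Bob" <bob@example.com>, sue@example.com'

naive = field.split(',')                 # 3 pieces: splits inside the name
pairs = getaddresses([field])            # (name, address) tuples, fully parsed
addresses = [addr for (name, addr) in pairs]
formatted = [formataddr(pair) for pair in pairs]

name, addr = parseaddr('"Smith, Bob" <bob@example.com>')   # a single address
```

The naive split yields three fragments, none of them a valid address on its own, while getaddresses recognizes exactly two recipients.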

HTML help display

Help can now be displayed in text form in a GUI window, in HTML form in a

locally running web browser, or both. User settings in the mailconfig module select

which form or forms to display. The HTML version is new; it uses a simple

translation of the help text with added links to sections and external sites, and

Python’s webbrowser module, discussed earlier in this book, to open a browser. The

text help display is now redundant, but it is retained because the HTML display

currently lacks its ability to open source file viewers.

Thread callback queue speedup

The global thread queue dispatches GUI update callbacks much faster now—up

to 100 times per second, instead of the prior 10. This is due both to checking more

frequently (20 timer events per second versus 10) and to dispatching more callbacks

per timer event (5 versus the prior 1). Depending on the interleaving of queue puts

and gets, this speeds up initial loads for larger mailboxes by as much as an order

of magnitude (factor of 10), at some potential minor cost in CPU utilization. On

my Windows 7 laptop, though, PyMailGUI still shows 0% CPU utilization in Task

Manager when idle.

I bumped up the queue’s speed to support an email account having 4,800 inbox

messages (actually, even more by the time I got around to taking screenshots for

this chapter). Without the speedup, initial header loads for this account took 8

minutes to work through the 4,800 progress callbacks (4800 ÷ 10 ÷ 60), even

though most reflected messages skipped immediately by the new mail fetch limits

(see the next item). With the speedup, the initial load takes just 48 seconds—

perhaps not ideal still, but this initial headers load is normally performed only once

per session, and this policy strikes a balance between CPU resources and responsiveness.

This email account is an arguably pathological case, of course, but most

initial loads benefit from the faster speed.

See Chapter 10’s threadtools for most of this change’s code, as well as additional

background details. We could alternatively loop through all queued events on each

timer event, but this may block the GUI indefinitely if updates are queued quickly.
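
The arithmetic and the bounded-batch idea described here can be sketched in a few lines. This is not the threadtools code of Chapter 10; the names below are hypothetical, and the queue is assumed to hold (function, args) pairs.

```python
import queue

TIMERS_PER_SEC = 20      # timer events per second (50 msec apart)
CALLS_PER_TIMER = 5      # callbacks dispatched per timer event

def dispatch_batch(callback_queue, limit=CALLS_PER_TIMER):
    """Run up to limit queued (function, args) callbacks; return count run."""
    ran = 0
    while ran < limit:
        try:
            function, args = callback_queue.get(block=False)
        except queue.Empty:
            break                      # nothing left: wait for the next timer
        function(*args)
        ran += 1
    return ran

# in a tkinter GUI, a timer handler would call dispatch_batch and then
# reschedule itself, e.g. with widget.after(1000 // TIMERS_PER_SEC, handler)

old_load_secs = 4800 / 10                                   # 480 secs = 8 mins
new_load_secs = 4800 / (TIMERS_PER_SEC * CALLS_PER_TIMER)   # 48 secs
```

Capping each batch is what keeps the GUI responsive: the handler always returns to the event loop after at most five callbacks, no matter how quickly producer threads fill the queue.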

Mail fetch limits

Since 2.1, PyMailGUI loads only mail headers initially, not full mail text, and only

loads newly arrived headers thereafter. Depending on your Internet and server

speeds, though, this may still be impractical for very large inboxes (as mentioned,

one of mine currently has some 4,800 emails). To support such cases, a new mail

config setting can be used to limit the number of headers (or full mails if TOP is

unsupported) fetched on loads.

Given this setting N, PyMailGUI fetches at most N of the most recently arrived

mails. Older mails outside this set are not fetched from the server, but are displayed

as empty/dummy emails which are mostly inoperative (though they can generally

still be fetched on demand).
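
In code, the limit amounts to choosing which message numbers to fetch and which to represent with placeholders. The following sketch uses invented names and assumes that higher POP message numbers denote more recently arrived mail; the real logic is in the mailtools package of Chapter 13.

```python
def plan_header_load(msg_count, fetch_limit=None):
    """
    Split POP message numbers 1..msg_count into (skipped, fetched) lists.
    Assumes higher message numbers are more recently arrived mails.
    """
    if fetch_limit is None or msg_count <= fetch_limit:
        return [], list(range(1, msg_count + 1))
    first_fetched = msg_count - fetch_limit + 1
    skipped = list(range(1, first_fetched))              # shown as dummy entries
    fetched = list(range(first_fetched, msg_count + 1))  # the newest N mails
    return skipped, fetched

skipped, fetched = plan_header_load(4800, fetch_limit=100)
```

With a limit of 100 on a 4,800-message inbox, only messages 4701 through 4800 are requested from the server; the other 4,700 appear as dummy list entries.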

This feature is inherited from mailtools code in Chapter 13; see the mailconfig

module ahead for the user setting associated with it. Note that even with this fix,

because the threadtools queue system used here dispatches GUI events such as

progress updates only up to 100 times per second, a 4,800 mail inbox still takes

48 seconds to complete an initial header load. The queue should either run faster

still, or I should delete an email once in a while!

HTML main text extraction (prototype)

PyMailGUI is still somewhat plain-text biased, despite the emergence of HTML

emails in recent years. When the main (or only) text part of a mail is HTML, it is

displayed in a popped-up web browser. In the prior version, though, its HTML

text was still displayed in a PyEdit text editor component and was still quoted for

the main text of replies and forwards.

Because most people are not HTML parsers, this edition’s version attempts to do

better by extracting plain text from the part’s HTML with a simple HTML parsing

step. The extracted plain text is then displayed in the mail view window and used

as original text in replies and forwards.

This HTML parser is at best a prototype and is largely included to provide a first

step that you can tailor for your tastes and needs, but any result it produces is better

than showing raw HTML. If this fails to render the plain text well, users can still

fall back on viewing in the web browser and cutting and pasting from there into

replies and forwards. See also the note about open source alternatives by this

parser’s source code later in this chapter; this is an already explored problem

domain.
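
For a sense of what such an extraction step involves, here is a minimal sketch built on the standard library's html.parser module. It is not the parser listed later in this chapter; the class name, the set of tags treated as line breaks, and the decision to skip script and style content are all assumptions of this example.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, dropping its tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pieces = []
        self.skipping = 0                       # inside <script> or <style>?

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self.skipping += 1
        elif tag in ('br', 'p', 'div', 'li', 'tr'):
            self.pieces.append('\n')            # crude line-break heuristic

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self.skipping:
            self.skipping -= 1

    def handle_data(self, data):
        if not self.skipping:
            self.pieces.append(data)

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return ''.join(parser.pieces).strip()

text = html_to_text('<html><body><p>Hello <b>world</b> &amp; all</p>'
                    '<style>p {color: red}</style><br>Bye</body></html>')
```

Even a parser this small beats quoting raw tags in a reply, though it makes no attempt at tables, links, or layout.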

Reply copies all original recipients by default

In this version, replies are really reply-to-all by default—they automatically prefill

the Cc header in the reply’s composition window with all the original recipients of

the message. To do so, replies extract all addresses among both the original To and

Cc headers, and remove duplicates as well as the new sender’s address by using set

operations. The net effect is to copy all other recipients on the reply. This is in

addition to replying to the sender by initializing To with the original sender’s

address.
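
The set operations involved can be sketched as follows. The function name is hypothetical, and lowercasing addresses and dropping their name parts are simplifications made here to keep the example short.

```python
from email.utils import getaddresses

def reply_cc_prefill(orig_to, orig_cc, my_address):
    """Return a Cc header string naming all other original recipients."""
    fields = [field for field in (orig_to, orig_cc) if field]
    pairs = getaddresses(fields)
    others = {addr.lower() for (name, addr) in pairs if addr}   # set drops duplicates
    others -= {my_address.lower()}                               # remove the replier
    return ', '.join(sorted(others))

cc = reply_cc_prefill('ann@example.com, Me <me@example.com>',
                      'bob@example.com, Ann <ann@example.com>',
                      'me@example.com')
```

Here ann@example.com appears in both original headers but is copied only once, and the replier's own address is left out.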

This feature is intended to reflect common usage: email circulated among groups.

Since it might not always be desirable, though, it can be disabled in mailconfig so

that replies initialize just To headers to reply to the original sender only. If enabled,

users may need to delete the Cc prefill if not wanted; if disabled, users may need

to insert Cc addresses manually instead. Both cases seem equally likely. Moreover,

it’s not impossible that the original recipients include mail list names, aliases, or

spurious addresses that will be either incorrect or irrelevant when the reply is sent.

Like the Bcc prefill described in the next item, the reply’s Cc initialization can be

edited prior to sends if needed, and disabled entirely if preferred. Also see the

suggested enhancements for this feature at the end of this chapter—allowing this

to be enabled or disabled in the GUI per message might be a better approach.

Other upgrades: Bcc prefills, “Re” and “Fwd” case, list size, duplicate recipients

In addition, there have been smaller enhancements throughout. Among them: Bcc

headers in edit windows are now prefilled with the sender’s address as a convenience

(a common role for this header); Reply and Forward now ignore case when

determining if adding a “Re:” or “Fwd:” to the subject would be redundant; mail

list window width and height may now be configured in mailconfig; duplicates are

removed from the recipient address list in mailtools on sends to avoid sending

anyone multiple copies of the same mail (e.g., if an address appears in both To and

Cc); and other minor improvements which I won’t cover here. Look for “3.0” and

“4E” in program comments here and in the underlying mailtools package of

Chapter 13 to see other specific code changes.

Unicode (Internationalization) support

I’ve saved the most significant PyMailGUI 3.0 upgrade for last: it now supports

Unicode encoding of fetched, saved, and sent mails, to the extent allowed by the

Python 3.1 email package. Both text parts of messages and message headers are

decoded when displayed and encoded when sent. Since this is too large a change

to explain in this format, the next section elaborates.

Version 3.0 Unicode support policies

The last item on the preceding list is probably the most substantial. Per Chapter 13, a

user-configurable setting in the mailconfig module is used on a session-wide basis to

decode full message bytes into Unicode strings when fetched, and to encode and decode

mail messages stored in text-mode save files.

More visibly, when composing, the main text and attached text parts of composed mails

may be given explicit Unicode encodings in mailconfig or via user input; when viewing,

message header information of parsed emails is used to determine the Unicode types

of both the main mail text as well as text parts opened on demand. In addition,

Internationalized mail headers (e.g., Subject, To, and From) are decoded per email, MIME,

and Unicode standards when displayed according to their own content, and are

automatically encoded if non-ASCII when sent.

Other Unicode policies (and fixes) of Chapter 13’s mailtools package are inherited

here, too; see the prior chapter for more details. In summation, here is how all these

policies play out in terms of user interfaces:

Fetched emails

When fetching mails, a session-wide user setting is used to decode full message

bytes to Unicode strings, as required by Python’s current email parser; if this fails,

a handful of guesses are applied. Most mail text will likely be 7 or 8 bit in nature,

since original email standards required ASCII.

Composed text parts

When sending new mails, user settings are used to determine Unicode type for the

main text part and any text attachment parts. If these are not set in mailconfig, the

user will instead be asked for encoding names in the GUI for each text part. These

are ultimately used to add character set headers, and to invoke MIME encoding.

In all cases, the program falls back on UTF-8 if the user’s encoding setting or input

does not work for the text being sent—for instance, if the user has chosen ASCII

for the main text of a reply to or forward of a non-ASCII message or for non-ASCII

attachments.

Composed headers

When sending new mails, if header lines or the name component of an email address

in address-related lines do not encode properly as ASCII text, we first encode the

header per email Internationalization standards. This is done per UTF-8 by default,

but a mailconfig setting can request a different encoding. In email address pairs,

names which cannot be encoded are dropped, and only the email address is used.

It is assumed that servers will respect the encoded names in email addresses.
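
A sketch of this header-encoding policy, using the email package's Header class, follows. The helper names are invented for illustration, and the real mailtools code may structure this differently; only the overall policy (ASCII passes through, non-ASCII is MIME-encoded, unencodable names are dropped) comes from the description above.

```python
from email.header import Header
from email.utils import parseaddr, formataddr

def encode_header(text, charset='utf-8'):
    """Return text unchanged if ASCII, else as a MIME encoded-word string."""
    try:
        text.encode('ascii')
        return text
    except UnicodeEncodeError:
        return Header(text, charset).encode()

def encode_address(pair_text, charset='utf-8'):
    """Encode the name part of 'Name <addr>' if needed; keep the address."""
    name, addr = parseaddr(pair_text)
    try:
        name = encode_header(name, charset)
    except (UnicodeError, LookupError):
        name = ''                      # drop names that cannot be encoded
    return formataddr((name, addr))

subject = encode_header('Prüfung')
sender = encode_address('Günter <g@example.com>')
```

The charset default mirrors the UTF-8 default mentioned above; a configuration setting could supply a different name.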

Displayed text parts

When viewing fetched mail, Unicode encoding names in message headers are used

to decode whenever possible. The main-text part is decoded into str Unicode text

per header information prior to inserting it into a PyEdit component. The content

of all other text parts, as well as all binary parts, is saved in bytes form in

binary-mode files, from where the part may be opened later in the GUI on demand. When

such on-demand text parts are opened, they are displayed in PyEdit pop-up windows

by passing to PyEdit the name of the part’s binary-mode file, as well as the

part’s encoding name obtained from part message headers.

If the encoding name in a text part’s header is absent or fails to decode, encoding

guesses are tried for main-text parts, and PyEdit’s separate Unicode policies are

applied to text parts opened on demand (see Chapter 11—it may prompt for an

encoding if not known). In addition to these rules, HTML text parts are saved in

binary mode and opened in a web browser, relying on the browser’s own character

set support; this may in turn use tags in the HTML itself, guesses, or user encoding

selections.

Displayed headers

When viewing email, message headers are automatically decoded per email standards.

This includes both full headers such as Subject, as well as the name components

of all email address fields in address-related headers such as From, To, and

Cc, and allows these components to be completely encoded or contain encoded

substrings. Because their content gives their MIME and Unicode encodings, no

user interaction is required to decode headers.
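
Header decoding for display can be approximated with the email package's decode_header function, as in this sketch. The function name and the fallback choices are assumptions of the example; the tools PyMailGUI actually uses are implemented in mailtools.

```python
from email.header import decode_header, Header

def decode_for_display(raw_header):
    """Decode a possibly MIME-encoded header value into a str for display."""
    parts = []
    for value, charset in decode_header(raw_header):
        if isinstance(value, bytes):
            try:
                value = value.decode(charset or 'ascii', errors='replace')
            except LookupError:                # unknown charset name
                value = value.decode('latin-1')
        parts.append(value)
    return ''.join(parts)

encoded = Header('Prüfung bestanden', 'utf-8').encode()   # what a sender emits
decoded = decode_for_display(encoded)
plain = decode_for_display('Lunch on Friday')
```

For address headers, the same function could be applied to the name component of each pair returned by getaddresses, leaving the address part alone.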

In other words, PyMailGUI now supports Internationalized message display and composition

for both payloads and headers. For broadest utility, this support is distributed

across multiple packages and examples. For example, Unicode decoding of full message

text on fetches actually occurs deep in the imported mailtools package classes. Because

of this, full (unparsed) message text is always Unicode str here. Similarly, headers are

decoded for display here using tools implemented in mailtools, but headers encoding

is both initiated and performed within mailtools itself on sends.

Full text decoding illustrates the types of choices required. It is done according to the

fetchEncoding variable in the mailconfig module. This user setting is used across an

entire PyMailGUI session to decode fetched message bytes to the required str text prior

to parsing, and to save and load full message text to save files. Users may set this variable

to a Unicode encoding name string which works for their mails’ encodings; “latin-1”,

“utf-8”, and “ascii” are reasonable guesses for most emails, as email standards originally

called for ASCII (though “latin-1” was required to decode some old mail save files

generated by the prior version). If decoding with this encoding name fails, other common

encodings are attempted, and as a last resort the message is still displayed if its

headers can be decoded, but its body is changed to an error message; to view such

unlikely mails, try running PyMailGUI again with a different encoding.
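
The try-the-setting-then-guess policy just described might be sketched like this. The function name and the particular list of fallback encodings are assumptions; only fetchEncoding itself is a name taken from mailconfig.

```python
def decode_full_text(raw_bytes, fetch_encoding, fallbacks=('ascii', 'utf-8', 'latin-1')):
    """
    Decode fetched message bytes to str: try the user's fetchEncoding
    setting first, then some common guesses. Returns (text, encoding used),
    or (None, None) if every attempt fails.
    """
    for encoding in (fetch_encoding,) + tuple(fallbacks):
        try:
            return raw_bytes.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            continue                   # bad bytes or unknown name: try next
    return None, None

text1, used1 = decode_full_text('Prüfung'.encode('utf-8'), 'ascii')
text2, used2 = decode_full_text(b'Pr\xfcfung', 'utf-8')
```

Note that “latin-1” can decode any byte sequence, so listing it last makes the guesses a catch-all; whether the resulting text is meaningful is another matter.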

In the negatives column, nothing is done about the Unicode format for the full text of

sent mails, apart from that inherited from Python’s libraries (as we learned in Chapter 13,

smtplib attempts to encode per ASCII when messages are sent, which is one

reason that header encoding is required). And while mail content character sets are

fully supported, the GUI itself still uses English for its labels and buttons.

As explained in Chapter 13, this program’s Unicode policies are a broad but partial

solution, because the email package in Python 3.1, upon which PyMailGUI utterly relies

for correct operation, is in a state of flux for some use cases. An updated version which

handles the Python 3.X str/bytes distinctions more accurately and completely is likely

to appear in the future; watch this book’s updates page (see the Preface) for future

changes and improvements to this program’s Unicode policies. Hopefully, the current

email package underlying PyMailGUI 3.0 will be available for some time to come.

Although there is still room for improvement (see the list at the end of this chapter),

the PyMailGUI program is able to provide a full-featured email interface, represents

the most substantial example in this book, and serves to demonstrate a realistic application

of the Python language and software engineering at large. As its users often attest,

Python may be fun to work with, but it’s also useful for writing practical and nontrivial

software. This example, more than any other in this book, testifies to the same. The next

section shows how.

A PyMailGUI Demo

PyMailGUI is a multiwindow interface. It consists of the following:

• A main mail-server list window opened initially, for online mail processing

• One or more mail save-file list windows for offline mail processing

• One or more mail-view windows for viewing and editing messages

• PyEdit windows for displaying raw mail text, extracted text parts, and the system’s

source code

• Nonblocking busy state pop-up dialogs

• Assorted pop-up dialogs for opened message parts, help, and more

Operationally, PyMailGUI runs as a set of parallel threads, which may overlap in time:

one for each active server transfer, and one for each active offline save file load or

deletion. PyMailGUI supports mail save files, automatic saves of sent messages,

configurable fonts and colors, viewing and adding attachments, main message text

extraction, plain text conversion for HTML, and much more.

To make this case study easier to understand, let’s begin by seeing what PyMailGUI

actually does—its user interaction and email processing functionality—before jumping

into the Python code that implements that behavior. As you read this part, feel free to

jump ahead to the code listings that appear after the screenshots, but be sure to read

this section, too; this, along with the prior discussion of version changes, is where some

subtleties of PyMailGUI’s design are explained. After this section, you are invited to

study the system’s Python source code listings on your own for a better and more

complete explanation than can be crafted in English.

Getting Started

OK, it’s time to take the system out for a test drive. I’m going to run the following demo

on my Windows 7 laptop. It may look slightly different on different platforms (including

other versions of Windows) thanks to the GUI toolkit’s native-look-and-feel support,

but the basic functionality will be similar.

PyMailGUI is a Python/tkinter program, run by executing its top-level script file,

PyMailGui.py. Like other Python programs, PyMailGUI can be started from the system

command line, by clicking on its filename icon in a file explorer interface, or by pressing

its button in the PyDemos or PyGadgets launcher bar. However it is started, the first

window PyMailGUI presents is captured in Figure 14-1, shown after running a Load

to fetch mail headers from my ISP’s email server. Notice the “PY” window icon: this is

the handiwork of window protocol tools we wrote earlier in this book. Also notice the

non-ASCII subject lines here; I’ll talk about Internationalization features later.

This is the PyMailGUI main window—every operation starts here. It consists of:

• A help button (the bar at the top)

• A clickable email list area for fetched emails (the middle section)

• A button bar at the bottom for processing messages selected in the list area

In normal operation, users load their email, select an email from the list area by clicking

on it, and press a button at the bottom to process it. No mail messages are shown

initially; we need to first load them with the Load button—a simple password input

dialog is displayed, a busy dialog appears that counts down message headers being

downloaded to give a status indication, and the index is filled with messages ready to

be selected.

PyMailGUI’s list windows, such as the one in Figure 14-1, display mail header details

in fixed-width columns, up to a maximum size. Mails with attachments are prefixed

with a “*” in mail index list windows, and fonts and colors in PyMailGUI windows like

this one can be customized by the user in the mailconfig configuration file. You can’t

tell in this black-and-white book, but most of the mail index lists we’ll see are configured

to be Indian red, view windows are light blue, pop-up PyEdit windows are beige instead

of PyEdit’s normal light cyan, and help is steel blue. You can change most of these as

you like, and PyEdit pop-up window appearance can be altered in the GUI itself (see

Example 8-11 for help with color definition strings, and watch for alternative configuration

examples ahead).

List windows allow multiple messages to be selected at once—the action selected at

the bottom of the window is applied to all selected mails. For instance, to view many

mails, select them all and press View; each will be fetched (if needed) and displayed in

its own view window. Use the All check button in the bottom right corner to select or

deselect every mail in the list, and Ctrl-Click and Shift-Click combinations to select

more than one (the standard Windows multiple selection operations apply—try it).

Before we go any further, though, let’s press the help bar at the top of the list window

in Figure 14-1 to see what sort of help is available; Figure 14-2 shows the text-based

help window pop up that appears—one of two help flavors available.

Figure 14-1. PyMailGUI main server list window

The main part of this window is simply a block of text in a scrolled-text widget, along

with two buttons at the bottom. The entire help text is coded as a single triple-quoted

string in the Python program. As we’ll see in a moment, a fancier option which opens

an HTML rendition of this text in a spawned web browser is also available, but simple

text is sufficient for many people’s tastes.‡ The Cancel button makes this nonmodal

(i.e., nonblocking) window go away. More interestingly, the Source button pops up

PyEdit text editor viewer windows for all the source files of PyMailGUI’s implementation;

Figure 14-3 captures one of these (there are many; this is intended as a demonstration,

not as a development environment). Not every program shows you its source

code, but PyMailGUI follows Python’s open source motif.

New in this edition, help is also displayed in HTML form in a web browser, in addition

to or instead of the scrolled text display just shown. Choosing help in text, HTML, or

both is controlled by a setting in the mailconfig module. The HTML flavor uses the

Python webbrowser module to pop up the HTML file in a browser on the local machine,

and currently lacks the source-file opening button of the text display version (one reason

you may wish to display the text viewer, too). HTML help is captured in Figure 14-4.

Figure 14-2. PyMailGUI text help pop up

‡ Actually, the help display started life even less fancy: it originally displayed help text in a standard information

pop up common dialog, generated by the tkinter showinfo call used earlier in the book. This worked fine on

Windows (at least with a small amount of help text), but it failed on Linux because of a default line-length

limit in information pop-up boxes; lines were broken so badly as to be illegible. Over the years, common

dialogs were replaced by scrolled text, which has now been largely replaced by HTML; I suppose the next

edition will require a holographic help interface…

When a message is selected for viewing in the mail list window by a mouse click and

View press, PyMailGUI downloads its full text (if it has not yet been downloaded in

this session), and a formatted email viewer window appears, as captured in

Figure 14-5 for an existing message in my account’s inbox.

View windows are built in response to actions in list windows and take the following

form:

• The top portion consists of action buttons (Part to list all message parts, Split to

save and open parts using a selected directory, and Cancel to close this nonmodal

window), along with a section for displaying email header lines (From, To, and

so on).

• In the middle, a row of quick-access buttons for opening message parts, including

attachments, appears. When clicked, PyMailGUI opens known and generally safe

parts according to their type. Media types may open in a web browser or image

viewer, text parts in PyEdit, HTML in a web browser, Windows document types

per the Windows Registry, and so on.

Figure 14-3. PyMailGUI text help source code viewer window

• The bulk of this window (its entire lower portion) is just another reuse of the

TextEditor class object of the PyEdit program we wrote in Chapter 11—PyMailGUI

simply attaches an instance of TextEditor to every view and compose window in

order to get a full-featured text editor component for free. In fact, much of the

window shown in Figure 14-5 is implemented by TextEditor, not by PyMailGUI.

Reusing PyEdit’s class this way means that all of its tools are at our disposal for email

text—cut and paste, find and goto, saving a copy of the text to a file, and so on. For

instance, the PyEdit Save button at the bottom left of Figure 14-5 can be used to save

just the main text of the mail (as we’ll see later, clicking the leftmost part button in the

middle of the screen affords similar utility, and you can also save the entire message

from a list window). To make this reuse even more concrete, if we pick the Tools menu

of the text portion of this window and select its Info entry, we get the standard PyEdit

TextEditor object’s text statistics box shown in Figure 14-6—the same pop up we’d

get in the standalone PyEdit text editor and in the PyView image view programs we

wrote in Chapter 11.

In fact, this is the third reuse of TextEditor in this book: PyEdit, PyView, and now

PyMailGUI all present the same text-editing interface to users, simply because they all

use the same TextEditor object and code. PyMailGUI uses it in multiple roles—it both

attaches instances of this class for mail viewing and composition, and pops up instances

in independent windows for some text mail parts, raw message text display, and Python

source-code viewing (we saw the latter in Figure 14-3). For mail view components,

PyMailGUI customizes PyEdit text fonts and colors per its own configuration module;

for pop ups, user preferences in a local textConfig module are applied.

Figure 14-4. PyMailGUI HTML help display (new in 3.0)

To display email, PyMailGUI inserts its text into an attached TextEditor object; to

compose email, PyMailGUI presents a TextEditor and later fetches all its text to ship

over the Net. Besides the obvious simplification here, this code reuse makes it easy to

pick up improvements and fixes—any changes in the TextEditor object are automati-

cally inherited by PyMailGUI, PyView, and PyEdit.

In the third edition’s version, for instance, PyMailGUI supports edit undo and redo,

just because PyEdit had gained that feature. And in this fourth edition, all PyEdit importers

also inherit its new Grep file search, as well as its new support for viewing and

editing text of arbitrary Unicode encodings—especially useful for text parts in emails

of arbitrary origin like those displayed here (see Chapter 11 for more about PyEdit’s

evolution).

Loading Mail

Next, let’s go back to the PyMailGUI main server list window, and click the Load button

to retrieve incoming email over the POP protocol. PyMailGUI’s load function gets

account parameters from the mailconfig module listed later in this chapter, so be sure

to change this file to reflect your email account parameters (i.e., server names and

usernames) if you wish to use PyMailGUI to read your own email. Unless you can guess

the book’s email account password, the presets in this file won’t work for you.

Figure 14-5. PyMailGUI view window

The account password parameter merits a few extra words. In PyMailGUI, it may come

from one of two places:

Local file

If you put the name of a local file containing the password in the mailconfig module,

PyMailGUI loads the password from that file as needed.

Pop up dialog

If you don’t put a password filename in mailconfig (or if PyMailGUI can’t load it

from the file for whatever reason), PyMailGUI will instead ask you for your password

anytime it is needed.
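
The two password sources can be combined in a small helper along these lines. Both the function name and the idea of passing the prompt in as a callable are choices made for this sketch, not details of PyMailGUI's own code; in the GUI, the callable would pop up the input dialog.

```python
def get_password(password_file, ask_user):
    """
    Return the account password from a local file if one is named and
    readable, else from the ask_user callable (a pop-up dialog in a GUI).
    """
    if password_file:
        try:
            with open(password_file) as pswd:
                return pswd.readline().rstrip('\n')
        except OSError:
            pass                       # missing or unreadable: ask instead
    return ask_user()
```

Passing an empty filename, as when the setting is left blank in the configuration module, always falls through to the prompt.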

Figure 14-7 shows the password input prompt you get if you haven’t stored your password

in a local file. Note that the password you type is not shown: a show='*' option

for the Entry field used in this pop up tells tkinter to echo typed characters as stars (this

option is similar in spirit to both the getpass console input module we met in

the prior chapter and an HTML type=password option we’ll meet in a later chapter).

Once entered, the password lives only in memory on your machine; PyMailGUI itself

doesn’t store it anywhere in a permanent way.

Figure 14-6. PyMailGUI attached PyEdit info box

Also notice that the local file password option requires you to store your password

unencrypted in a file on the local client computer. This is convenient (you don’t need

to retype a password every time you check email), but it is not generally a good idea on

a machine you share with others, of course; leave this setting blank in mailconfig if you

prefer to always enter your password in a pop up.
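The two password sources just described can be sketched as a small fallback function. This is only an illustration under assumptions: the names get_password, password_file, and ask_user are hypothetical and are not taken from PyMailGUI's code.

```python
def get_password(password_file, ask_user):
    """Return the account password from a local file, else by asking the user.

    password_file: path named in the config module, or '' / None if unset.
    ask_user:      zero-argument callable that prompts (e.g., a GUI dialog).
    Both names are illustrative assumptions, not PyMailGUI's own API.
    """
    if password_file:
        try:
            with open(password_file) as f:
                # The file holds the password unencrypted on its first line
                return f.readline().rstrip('\n')
        except OSError:
            pass  # missing or unreadable file: fall back to the prompt
    return ask_user()
```

In a console program, getpass.getpass could play the role of ask_user; in a tkinter GUI, it would be a dialog built around an Entry created with show='*'.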

Once PyMailGUI fetches your mail parameters and somehow obtains your password,

it will next attempt to pull down just the header text of all your incoming email from

your inbox on your POP email server. On subsequent loads, only newly arrived mails

are loaded, if any. To support obscenely large inboxes (like one of mine), the program

is also now clever enough to skip fetching headers for all but the last batch of messages,

whose size you can configure in mailconfig—they show up early in the mail list with

subject line “--mail skipped--”; see the 3.0 changes overview earlier for more details.

To save time, PyMailGUI fetches message header text only to populate the list window.

The full text of messages is fetched later only when a message is selected for viewing or

processing, and then only if the full text has not yet been fetched during this session.

PyMailGUI reuses the load-mail tools in the mailtools module of Chapter 13 to fetch

message header text, which in turn uses Python’s standard poplib module to retrieve

your email.
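To make the header-only fetch concrete, here is a rough sketch built directly on poplib. It is not the mailtools code from Chapter 13: the function name, the fetch_limit parameter, the placeholder text, and the UTF-8 decoding are all assumptions made for illustration.

```python
import poplib  # standard library POP3 client

def fetch_headers(conn, fetch_limit=None):
    """Fetch header text only for the messages in an inbox.

    conn is a connected, logged-in poplib.POP3 (or POP3_SSL) object.
    fetch_limit, if set, restricts the fetch to the newest N messages;
    older ones get a placeholder entry instead of a server request.
    Returns a list of header strings; index 0 holds message number 1.
    """
    msg_count, _mbox_size = conn.stat()
    first = 1
    if fetch_limit is not None and msg_count > fetch_limit:
        first = msg_count - fetch_limit + 1
    headers = ['Subject: --mail skipped--\n\n'] * (first - 1)
    for msgnum in range(first, msg_count + 1):
        # TOP n 0 returns the headers plus zero lines of the body
        _resp, lines, _octets = conn.top(msgnum, 0)
        # Decoding as UTF-8 is an assumption made for this sketch
        headers.append(b'\n'.join(lines).decode('utf-8', errors='replace'))
    return headers

# Typical use (server name and credentials are placeholders):
# conn = poplib.POP3_SSL('pop.example.com')
# conn.user('me'); conn.pass_('secret')
# headers = fetch_headers(conn, fetch_limit=25)
# conn.quit()
```

Because POP message numbers start at 1, the list index of a header is always its message number minus 1, skipped entries included.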

Threading Model

Now that we’re downloading mails, I need to explain the juggling act that PyMailGUI

performs to avoid becoming blocked and support operations that overlap in time. Ultimately,

mail fetches run over sockets on relatively slow networks. While the download

is in progress, the rest of the GUI remains active—you may compose and send other

mails at the same time, for instance. To show its progress, the nonblocking dialog of

Figure 14-8 is displayed when the mail index is being fetched.

In general, all server transfers display such dialogs. Figure 14-9 shows the busy dialog

displayed while a full text download of five selected and uncached (not yet fetched)

mails is in progress, in response to a View action. After this download finishes, all five

pop up in individual view windows.

Such server transfers, and other long-running operations, are run in threads to avoid

blocking the GUI. They do not disable other actions from running in parallel, as long

as those actions would not conflict with a currently running thread. Multiple mail sends

and disjoint fetches can overlap in time, for instance, and can run in parallel with the

GUI itself; the GUI responds to moves, redraws, and resizes during the transfers.

Other transfers such as mail deletes must run all by themselves and disable other transfers

until they are finished; deletes update the inbox and internal caches too radically

to support other parallel operations.

Figure 14-7. PyMailGUI password input dialog

On systems without threads, PyMailGUI instead goes into a blocked state during such

long-running operations (it essentially stubs out the thread-spawn operation to perform

a simple function call). Because the GUI is essentially dead without threads, covering

and uncovering the GUI during a mail load on such platforms will erase or otherwise

distort its contents. Threads are enabled by default on most platforms that run Python

(including Windows), so you probably won’t see such oddness on your machine.

Figure 14-8. Nonblocking progress indicator: Load

Figure 14-9. Nonblocking progress indicator: View

Threading model implementation

On nearly every platform, though, long-running tasks like mail fetches and sends are

spawned off as parallel threads, so that the GUI remains active during the transfer—it

continues updating itself and responding to new user requests, while transfers occur

in the background. While that’s true of threading in most GUIs, here are two notes

regarding PyMailGUI’s specific implementation and threading model:

GUI updates: exit callback queue

As we learned earlier in this book, only the main thread that creates windows

should generally update them. See Chapter 9 for more on this; tkinter doesn’t

support parallel GUI changes. As a result, PyMailGUI takes care to not do anything

related to the user interface within threads that load, send, or delete email. Instead,

the main GUI thread continues responding to user interface events and updates,

and uses a timer-based event to watch a queue for exit callbacks to be added by

worker threads, using the thread tools we implemented earlier in Chapter 10

(Example 10-20). Upon receipt, the main GUI thread pulls the callback off the

queue and invokes it to modify the GUI in the main thread.

Such queued exit callbacks can display a fetched email message, update the mail

index list, change a progress indicator, report an error, or close an email composition

window; all are scheduled by worker threads on the queue but performed

in the main GUI thread. This scheme makes the callback update actions automatically

thread safe: since they are run by one thread only, such GUI updates cannot

overlap in time.

To make this easy, PyMailGUI stores bound method objects on the thread queue,

which combine both the function to be called and the GUI object itself. Since

threads all run in the same process and memory space, the GUI object queued gives

access to all GUI state needed for exit updates, including displayed widget objects.

PyMailGUI also runs bound methods as thread actions to allow threads to update

state in general, too, subject to the next paragraph’s rules.

Other state updates: operation overlap locks

Although the queued GUI update callback scheme just described effectively re-

stricts GUI updates to the single main thread, it’s not enough to guarantee thread

safety in general. Because some spawned threads update shared object state used

by other threads (e.g., mail caches), PyMailGUI also uses thread locks to prevent

operations from overlapping in time if they could lead to state collisions. This

includes both operations that update shared objects in memory (e.g., loading mail

headers and content into caches), as well as operations that may update POP mes-

sage numbers of loaded email (e.g., deletions).

Where thread overlap might be an issue, the GUI tests the state of thread locks,

and pops up a message when an operation is not currently allowed. See the source

code and this program’s help text for specific cases where this rule is applied.

Operations such as individual sends and views that are largely independent can

overlap broadly, but deletions and mail header fetches cannot.

In addition, some potentially long-running save-mail operations are threaded to

avoid blocking the GUI, and this edition uses a set object to prevent fetch threads

for requests that include a message whose fetch is in progress in order to avoid

redundant work (see the 3.0 changes review earlier).
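The two rules above, queued exit callbacks and operation overlap locks, can be sketched in a few lines of standard library code. This is a simplified stand-in, not the Chapter 10 thread tools: every name here (callback_queue, delete_lock, run_in_thread, process_callbacks, try_exclusive) is hypothetical, and the exclusive action runs in the calling thread to keep the example short.

```python
import queue
import threading

callback_queue = queue.Queue()      # worker threads put callables here
delete_lock = threading.Lock()      # held while an exclusive operation runs

def run_in_thread(action, on_exit):
    """Run action() in a worker thread; queue on_exit(result) for later."""
    def worker():
        result = action()                        # long-running, no GUI calls
        callback_queue.put(lambda: on_exit(result))
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread

def process_callbacks(max_per_call=10):
    """Called periodically in the main thread to run queued exit callbacks."""
    for _ in range(max_per_call):
        try:
            callback = callback_queue.get_nowait()
        except queue.Empty:
            break
        callback()                               # GUI updates happen here only

def try_exclusive(action):
    """Run action() only if no other exclusive operation is in progress."""
    if not delete_lock.acquire(blocking=False):  # test the lock, don't wait
        return False                             # caller can pop up a message
    try:
        action()
    finally:
        delete_lock.release()
    return True
```

In a tkinter program, the main thread would typically reschedule process_callbacks with a widget's after method (for instance, every 100 milliseconds), so that the GUI stays responsive between checks of the queue.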

For more on why such things matter in general, be sure to see the discussion of threads

in GUIs in Chapters 5, 9, and 10. PyMailGUI is really just a concrete realization of

concepts we’ve explored earlier.

Load Server Interface

Let’s return to loading our email: because the load operation is really a socket operation,

PyMailGUI automatically connects to your email server using whatever connectivity

exists on the machine on which it is run. For instance, if you connect to the Net over

a modem and you’re not already connected, Windows automatically pops up the

standard connection dialog. On the broadband connections that most of us use today,

the interface to your email server is normally automatic.

After PyMailGUI finishes loading your email, it populates the main window’s scrolled

listbox with all of the messages on your email server and automatically scrolls to the

most recently received message. Figure 14-10 shows what the main window looks like

after selecting a message with a click and resizing—the text area in the middle grows

and shrinks with the window, revealing more header columns as it grows.

Figure 14-10. PyMailGUI main window resized

Technically, the Load button fetches all your mail’s header text the first time it is

pressed, but it fetches only newly arrived email headers on later presses. PyMailGUI

keeps track of the last email loaded, and requests only higher email numbers on later

loads. Already loaded mail is kept in memory, in a Python list, to avoid the cost of

downloading it again. PyMailGUI does not delete email from your server when it is

loaded; if you really want to not see an email on a later load, you must explicitly

delete it.
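The incremental load policy amounts to remembering how many messages are already in the list and asking only for higher numbers. A minimal sketch follows; the class and its fetch_one parameter are assumptions for illustration (fetch_one stands in for a server request), not PyMailGUI's actual cache code.

```python
class HeaderCache:
    """Keep loaded headers in a list and fetch only newer message numbers."""

    def __init__(self):
        self.headers = []                 # index i holds message number i + 1

    def load_new(self, msg_count, fetch_one):
        """Fetch headers numbered above the last one loaded.

        msg_count: current number of messages on the server.
        fetch_one: callable taking a message number and returning its
                   header text (a stand-in for a POP TOP request).
        Returns the count of newly loaded messages.
        """
        last_loaded = len(self.headers)
        for msgnum in range(last_loaded + 1, msg_count + 1):
            self.headers.append(fetch_one(msgnum))
        return max(0, msg_count - last_loaded)
```

Note that this scheme assumes message numbers stay stable between loads, which is one reason deletions must not overlap with other server operations.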

Entries in the main list show just enough to give the user an idea of what the message

contains—each entry gives the concatenation of portions of the message’s Subject,

From, Date, To, and other header lines, separated by | characters and prefixed with

the message’s POP number (e.g., there are 13 emails in this list). Columns are aligned

by determining the maximum size needed for any entry, up to a fixed maximum, and

the set of headers displayed can be configured in the mailconfig module. Use the hor-

izontal scroll or expand the window to see additional header details such as message

size and mailer.
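The column alignment described here can be approximated as follows. This is a sketch only: the function name, the max_width parameter, and the three-digit number prefix are assumptions, and the real program's layout details may differ.

```python
def format_index(header_rows, max_width=25):
    """Build aligned list lines from per-message header field values.

    header_rows: list of lists, one list of field strings per message.
    max_width:   cap on any column's width (an assumed setting name).
    """
    num_cols = len(header_rows[0]) if header_rows else 0
    # Column width: the longest value in that column, up to the fixed maximum
    widths = [min(max_width, max(len(row[col]) for row in header_rows))
              for col in range(num_cols)]
    lines = []
    for msgnum, row in enumerate(header_rows, start=1):
        fields = [value[:width].ljust(width)
                  for value, width in zip(row, widths)]
        lines.append('%03d | %s' % (msgnum, ' | '.join(fields)))
    return lines
```

Truncating each value to its column width keeps one very long Subject line from pushing the remaining headers out of view.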

As we’ve seen, a lot of magic happens when downloading email: the client (the machine

on which PyMailGUI runs) must connect to the server (your email account machine)

over a socket and transfer bytes over arbitrary Internet links. If things go wrong,

PyMailGUI pops up standard error dialog boxes to let you know what happened. For

example, if you type an incorrect username or password for your account (in the

mailconfig module or in the password pop up), you’ll receive the message in

Figure 14-11. The details displayed here are just the Python exception type and exception

data. Additional details, including a stack trace, show up in standard output (the

console window) on errors.
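Splitting error details between a short dialog message and a full console trace takes only a couple of standard library calls. The helper below is a hypothetical sketch (the name error_summary is mine), meant to be called inside an except block.

```python
import sys
import traceback

def error_summary():
    """Return 'type' and 'data' text for the exception being handled.

    Intended to be called inside an except block. The short text suits a
    dialog box; the full stack trace goes to standard output.
    """
    exc_type, exc_value = sys.exc_info()[:2]
    traceback.print_exc(file=sys.stdout)
    return '%s\n%s' % (exc_type.__name__, exc_value)
```

In a tkinter GUI, the returned text could be passed to tkinter.messagebox.showerror along with a window title.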

Figure 14-11. PyMailGUI invalid password error box

Offline Processing with Save and Open

We’ve seen how to fetch and view emails from a server, but PyMailGUI can also be

used in completely offline mode. To save mails in a local file for offline processing,

select the desired messages in any mail list window and press the Save action button;

as usual, any number of messages may be selected for saving together as a set. A

standard file-selection dialog appears, like that in Figure 14-12, and the mails are saved

to the end of the chosen text file.

Figure 14-12. Save mail selection dialog

To view saved emails later, select the Open action at the bottom of any list window

and pick your save file in the selection dialog. A new mail index list window appears

for the save file and is eventually filled with your saved messages; there may be a

slight delay for large save files, because of the work involved. PyMailGUI runs file loads

and deletions in threads to avoid blocking the rest of the GUI; these threads can overlap

with operations on other open save-mail files, server transfer threads, and the GUI at

large.

While a mail save file is being loaded in a parallel thread, its window title is set to

“Loading…” as a status indication; the rest of the GUI remains active during the load

(you can fetch and delete server messages, view mails in other files, write new messages,

and so on). The window title changes to the loaded file’s name after the load is finished.

Once filled, a message index appears in the save file’s window, like the one captured

in Figure 14-13 (this window also has three mails selected for processing).

In general, there can be one server mail list window and any number of save-mail file

list windows open at any time. Save-mail file list windows like that in Figure 14-13 can

be opened at any time, even before fetching any mail from the server. They are identical

to the server’s inbox list window, but there is no help bar, the Load action button is

omitted since this is not a server view, and all other action buttons are mapped to the

save file, not to the server.

For example, View opens the selected message in a normal mail view window identical

to that in Figure 14-5, but the mail originates from the local file. Similarly, Delete re-

moves the message from the save file, instead of from the server’s inbox. Deletions from

save-mail files are also run in a thread, to avoid blocking the rest of the GUI—the

window title changes to “Deleting…” during the delete as a status indicator. Status

indicators for loads and deletions in the server inbox window use pop ups instead,

because the wait is longer and there is progress to display (see Figure 14-8).

Technically, saves always append raw message text to the chosen file; the file is opened

in 'a' mode to append text, which creates the file if it’s new and writes at its end. The

Save and Open operations are also smart enough to remember the last directory you

selected; their file dialogs begin navigation there the next time you press Save or Open.

You can also save mails from a saved file’s window; use Save and Delete to move mails

from file to file. In addition, saving to a file whose window is open for viewing automatically

updates that file’s list window in the GUI. This is also true for the automatically

written sent-mail save file, described in the next section.
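Appending raw message text in 'a' mode is simple to sketch. The separator line used below is a hypothetical choice made for this example, as are the function names; the real save-file format is defined by the book's own code.

```python
SEPARATOR = '=' * 60 + '\n'   # hypothetical delimiter; any unique line works

def save_messages(filename, raw_messages, encoding='utf-8'):
    """Append raw message texts to a save file, creating it if needed."""
    with open(filename, 'a', encoding=encoding) as f:   # 'a': write at end
        for text in raw_messages:
            f.write(SEPARATOR)
            f.write(text)
            if not text.endswith('\n'):
                f.write('\n')

def load_messages(filename, encoding='utf-8'):
    """Split a save file back into its individual raw message texts."""
    with open(filename, encoding=encoding) as f:
        parts = f.read().split(SEPARATOR)
    return [part for part in parts if part]
```

One caveat of any delimiter scheme like this: a message whose own text happens to contain the separator line would be split in two when the file is reloaded.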

Sending Email and Attachments

Once we’ve loaded email from the server or opened a local save file, we can process

our messages with the action buttons at the bottom of list windows. We can also send

new emails at any time, even before a load or open. Pressing the Write button in any

list window (server or file) generates a mail composition window; one has been cap-

tured in Figure 14-14.

Figure 14-13. List window for mail save file, multiple selections

Figure 14-14. PyMailGUI write-mail compose window

This window is much like the message view window we saw in Figure 14-5, except

there are no quick-access part buttons in the middle (this window is a new mail). It has

fields for entering header line detail, action buttons for sending the email and managing

attachment files added to it when sent, and an attached TextEditor object component

for writing and editing the main text of the new email.

The PyEdit text editor component at the bottom has no File menu in this role, but it

does have a Save button—useful for saving a draft of your mail’s text in a file. You can

cut and paste this temporary copy into a composition window later if needed to begin

composing again from scratch. PyEdit’s separate Unicode policies apply to mail text

drafts saved this way (it may ask for an encoding—see Chapter 11).

For write operations, PyMailGUI automatically fills the From line and inserts a signa-

ture text line (the last two lines shown), from your mailconfig module settings. You

can change these to any text you like in the GUI, but the defaults are filled in auto-

matically from your mailconfig. When the mail is sent, an email.utils call handles

date and time formatting in the mailtools module in Chapter 13.

There is also a new set of action buttons in the upper left here: Cancel closes the window

(if verified), and Send delivers the mail—when you press the Send button, the text you

typed into the body of this window is mailed to all the addresses you typed into the To,

Cc, and Bcc lines, after removing duplicates, and using Python’s smtplib module. Py-

MailGUI adds the header fields you type as mail header lines in the sent message

(exception: Bcc recipients receive the mail, but no header line is generated).

To send to more than one address, separate them with a comma character in header

fields, and feel free to use full “name” <address> pairs for recipients. In this mail, I fill

in the To header with my own email address in order to send the message to myself for

illustration purposes. New in this version, PyMailGUI also prefills the Bcc header with

the sender’s own address if this header is enabled in mailconfig; this prefill sends a copy

to the sender (in addition to that written to the sent-mail file), but it can be deleted if

unwanted.
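To illustrate the header rules just described, here is a rough sketch of message construction. It uses the standard library's EmailMessage class rather than the mailtools code of Chapter 13, and the function name and parameters are assumptions. Bcc addresses are added to the recipients list, but no Bcc header is generated.

```python
import email.utils
from email.message import EmailMessage

def build_message(from_addr, to, subject, body, cc='', bcc=''):
    """Return (message, recipients) for a simple text mail.

    to, cc, and bcc are comma-separated address strings as typed in the GUI.
    recipients holds each bare address once, in To, Cc, Bcc order.
    """
    msg = EmailMessage()
    msg['From'] = from_addr
    msg['To'] = to
    if cc:
        msg['Cc'] = cc
    msg['Subject'] = subject
    msg['Date'] = email.utils.formatdate(localtime=True)
    msg.set_content(body)

    # Skip empty fields: they would only confuse the address parser
    fields = [field for field in (to, cc, bcc) if field.strip()]
    recipients = []
    for _name, addr in email.utils.getaddresses(fields):
        if addr and addr not in recipients:      # remove duplicates
            recipients.append(addr)
    return msg, recipients
```

The result could then be handed to smtplib, for example with smtplib.SMTP(servername).sendmail(from_addr, recipients, msg.as_string()), where servername is a placeholder for your SMTP server.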

Also in compose windows, the Attach button issues a file selection dialog for attaching

a file to your message, as in Figure 14-15. The Parts button pops up a dialog displaying

files already attached, like that in Figure 14-16. When your message is sent, the text in

the edit portion of the window is sent as the main message text, and any attached part

files are sent as attachments properly encoded according to their type.

Figure 14-15. Attachment file dialog for Attach

As we’ve seen, smtplib ultimately sends bytes to a server over a socket. Since this can

be a long-running operation, PyMailGUI delegates this operation to a spawned thread,

too. While the send thread runs, a nonblocking wait window appears and the entire

GUI stays alive; redraw and move events are handled in the main program thread while

the send thread talks to the SMTP server, and the user may perform other tasks in

parallel, including other views and sends.

You’ll get an error pop up if Python cannot send a message to any of the target recipients

for any reason, and the mail composition window will pop up so that you can try again

or save its text for later use. If you don’t get an error pop up, everything worked cor-

rectly, and your mail will show up in the recipients’ mailboxes on their email servers.

Since I sent the earlier message to myself, it shows up in mine the next time I press the

main window’s Load button, as we see in Figure 14-17.

Figure 14-17. PyMailGUI main window after loading sent mail

Figure 14-16. Attached parts list dialog for Parts

If you look back to the last main window shot, you’ll notice that there is only one new

email now—PyMailGUI is smart enough to download only the one new message’s

header text and tack it onto the end of the loaded email list. Mail send operations

automatically save sent mails in a save file that you name in your configuration module;

use Open to view sent messages in offline mode and Delete to clean up the sent mail

file if it grows too large (you can also save from the sent-mail file to another file to copy

mails into other save files per category).

Viewing Email and Attachments

Now let’s view the mail message that was sent and received. PyMailGUI lets us view

email in formatted or raw mode. First, highlight (single-click) the mail you want to see

in the main window, and press the View button. After the full message text is down-

loaded (unless it is already cached), a formatted mail viewer window like that shown

in Figure 14-18 appears. If multiple messages are selected, the View button will down-

load all that are not already cached (i.e., that have not already been fetched) and will

pop up a view window for each selected. Like all long-running operations, full message

downloads are run in parallel threads to avoid blocking.

Figure 14-18. PyMailGUI view incoming mail window

Python’s email module is used to parse out header lines from the raw text of the email

message; their text is placed in the fields in the top right of the window. The message’s

main text is fetched from its body and stuffed into a new TextEditor object for display

at the window bottom. PyMailGUI uses heuristics to extract the main text of the mes-

sage to display, if there is one; it does not blindly show the entire raw text of the mail.

HTML-only mail is handled specially, but I’ll defer details on this until later in this

demo.
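As a rough picture of what header parsing and main-text extraction involve, consider the sketch below. The heuristic shown (first text/plain part, else first text/html part) is an assumption for illustration and is simpler than what PyMailGUI really does; the function name is hypothetical.

```python
import email

def main_text(raw_text):
    """Parse raw mail text; return (headers dict, main text, content type).

    Heuristic (an assumption for illustration): prefer the first text/plain
    part, else the first text/html part, else report that no text was found.
    """
    msg = email.message_from_string(raw_text)
    headers = {name: msg.get(name, '?')
               for name in ('From', 'To', 'Subject', 'Date')}
    fallback = None
    for part in msg.walk():
        if part.get_content_maintype() != 'text':
            continue                                   # containers, images...
        payload = part.get_payload(decode=True)        # undo Base64 / QP
        charset = part.get_content_charset() or 'utf-8'
        text = payload.decode(charset, errors='replace')
        if part.get_content_type() == 'text/plain':
            return headers, text, 'text/plain'
        if fallback is None and part.get_content_type() == 'text/html':
            fallback = (headers, text, 'text/html')
    return fallback or (headers, '[No text to display]', None)
```

A production version would also need to cope with unknown charset names and with text parts that are really attachments, which this sketch ignores.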

Any other parts of the message attached are displayed and opened with quick-access

buttons in the middle. They are also listed by the Parts pop up dialog, and they can be

saved and opened all at once with Split. Figure 14-19 shows this window’s Parts list

pop up, and Figure 14-20 displays this window’s Split dialog in action.

Figure 14-19. Parts dialog listing all message parts

Figure 14-20. Split dialog selection

When the Split dialog in Figure 14-20 is submitted, all message parts are saved to the

directory you select, and known parts are automatically opened. Individual parts are

also automatically opened by the row of quick-access buttons labeled with the part’s

filename in the middle of the view window, after being saved to a temporary directory;

this is usually more convenient, especially when there are many attachments.

For instance, Figure 14-21 shows the two image parts attached to the mail we sent open

on my Windows laptop, in a standard image viewer on that platform; other platforms

may open these in a web browser instead. Click the image filenames’ quick-access buttons

just below the message headers in Figure 14-18 to view them immediately, or run

Split to open all parts at once.

Figure 14-21. PyMailGUI opening image parts in a viewer or browser

By this point, the photo attachments displayed in Figure 14-21 have really gotten

around: they have been MIME encoded, attached, and sent, and then fetched, parsed,

and MIME decoded. Along the way, they have moved through multiple machines—

from the client, to the SMTP server, to the POP server, and back to the client, crossing

arbitrary distances along the way.

In terms of user interaction, we attached the images to the email in Figure 14-14 using

the dialog in Figure 14-15 before we sent the email. To access them later, we selected

the email for viewing in Figure 14-17 and clicked on their quick-access buttons in

Figure 14-18. PyMailGUI encoded the photos in Base64 form, inserted them in the email’s

text, and later extracted and decoded them to get the original photos. With Python email

tools, and our own code that rides above them, this all just works as expected.
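The round trip the photos take can be imitated without any servers at all. The sketch below uses the standard library's newer EmailMessage interface, not the book's mailtools package, and its two function names are hypothetical. It attaches bytes under a guessed MIME type, and later parses the raw mail and decodes the attachments.

```python
import email
import email.policy
import mimetypes
from email.message import EmailMessage

def attach_file_bytes(msg, filename, data):
    """Attach data to msg under a MIME type guessed from filename."""
    ctype, encoding = mimetypes.guess_type(filename)
    if ctype is None or encoding is not None:
        ctype = 'application/octet-stream'       # unknown or compressed
    maintype, subtype = ctype.split('/', 1)
    # Binary content is Base64 encoded by the email package
    msg.add_attachment(data, maintype=maintype, subtype=subtype,
                       filename=filename)

def extract_attachments(raw_bytes):
    """Parse raw mail bytes; return {filename: decoded bytes}."""
    msg = email.message_from_bytes(raw_bytes, policy=email.policy.default)
    return {part.get_filename(): part.get_payload(decode=True)
            for part in msg.iter_attachments()}
```

For example, after calling msg.set_content with the main text and attach_file_bytes with a photo's bytes, passing msg.as_bytes() to extract_attachments yields the original bytes again, much as PyMailGUI's view window does after a fetch.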

Notice how in Figures 14-18 and 14-19 the main message text counts as a mail part,

too—when selected, it opens in a PyEdit window, like that captured in Figure 14-22,

from which it can be processed and saved (you can also save the main mail text with

the Save button in the View window itself). The main part is included, because not all

mails have a text part. For messages that have only HTML for their main text part,

PyMailGUI displays plain text extracted from its HTML text in its own window, and

opens a web browser to view the mail with its HTML formatting. Again, I’ll say more

on HTML-only mails later.

Figure 14-22. Main text part opened in PyEdit

Besides images and plain text, PyMailGUI also opens HTML and XML attachments in

a web browser and uses the Windows Registry to open well-known Windows docu-

ment types. For example, .doc and .docx, .xls and .xlsx, and .pdf files usually open,

respectively, in Word, Excel, and Adobe Reader. Figure 14-23 captures the response

to the lp4e-pref.html quick-access part button in Figure 14-18 on my Windows laptop.

If you inspect this screenshot closely, or run live for a better look, you’ll notice that the

HTML attachment is displayed in both a web browser and a PyEdit window; the latter

can be disabled in mailconfig, but is on by default to give an indication of the HTML’s

encoding.

The quick-access buttons in the middle of the Figure 14-18 view window are a more

direct way to open parts than Split—you don’t need to select a save directory, and you

can open just the part you want to view. The Split button, though, allows all parts to

be opened in a single step, allows you to choose where to save parts, and supports an

arbitrary number of parts. Files that cannot be opened automatically because of their

type can be inspected in the local save directory, after both Split and quick-access but-

ton selections (pop up dialogs name the directory to use for this).

After a fixed maximum number of parts, the quick-access row ends with a button

labeled “...”, which simply runs Split to save and open additional parts when selected.

Figure 14-24 captures one such message in the GUI; this message is available in

SavedMail file version30-4E if you want to view it offline—a relatively complex mail,

with 11 total parts of mixed types.

Figure 14-24. View window for a mail with many parts

Like much of PyMailGUI’s behavior, the maximum number of part buttons to display

in view windows can be configured in the mailconfig.py user settings module. That

setting specified eight buttons in Figure 14-24. Figure 14-25 shows what the same mail

looks like when the part buttons setting has been changed to a maximum of five. The

setting can be higher than eight, but at some point the buttons may become unreadable

(use Split instead).
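The button-row policy is easy to picture in code. In this small sketch, the function name and the max_buttons parameter are hypothetical, and I am assuming that the trailing "..." button is added on top of the configured maximum rather than counted within it.

```python
def part_button_labels(part_names, max_buttons=8):
    """Return button labels: one per part up to the maximum, then '...'."""
    if len(part_names) <= max_buttons:
        return list(part_names)
    return list(part_names[:max_buttons]) + ['...']
```

Under that assumption, a mail with 11 parts gets eight named buttons plus "..." at a setting of eight, and five named buttons plus "..." at a setting of five.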

Figure 14-23. Attached HTML part opened in a web browser

Figure 14-25. View window with part buttons setting decreased

As a sample of other attachments’ behavior, Figures 14-26 and 14-27 show what hap-

pens when the sousa.au and chapter25.pdf buttons in Figures 14-24 and 14-18 are

pressed on my Windows laptop. The results vary per machine; the audio file opens in

Windows Media Player, MP3 files open in iTunes instead, and some platforms may

open such files directly in a web browser.

Figure 14-26. An audio part opened by PyMailGUI

Besides the nicely formatted view window, PyMailGUI also lets us see the raw text of

a mail message. Double-click on a message’s entry in the main window’s list to bring

up a simple unformatted display of the mail’s raw text (its full text is downloaded in a

thread if it hasn’t yet been fetched and cached). Part of the raw version of the mail I

sent to myself in Figure 14-18 is shown in Figure 14-28; in this edition, raw text is

displayed in a PyEdit pop-up window (its prior scrolled-text display is still present as

an option, but PyEdit adds tools such as searching, saves, and so on).

This raw text display can be useful to see special mail headers not shown in the for-

matted view. For instance, the optional X-Mailer header in the raw text display iden-

tifies the program that transmitted a message; PyMailGUI adds it automatically, along

with standard headers like From and To. Other headers are added as the mail is trans-

mitted: the Received headers name machines that the message was routed through on

its way to our email server, and Content-Type is added and parsed by Python’s email

package in response to calls from PyMailGUI.

And really, the raw text form is all there is to an email message—it’s what is transferred

from machine to machine when mail is sent. The nicely formatted display of the GUI’s

view windows simply parses out and decodes components from the mail’s raw text

with standard Python tools, and places them in the associated fields of the display.

Notice the Base64 encoding text of the image file at the end of Figure 14-28, for ex-

ample; it’s created when sent, transferred over the Internet, and decoded when fetched

to recreate the image’s original bytes. Quite a feat, but largely automatic with the code

and libraries invoked.

Figure 14-27. A PDF part opened in PyMailGUI

Email Replies and Forwards and Recipient Options

In addition to reading and writing email, PyMailGUI also lets users forward and reply

to incoming email sent from others. These are both just composition operations, but

they quote the original text and prefill header lines as appropriate. To reply to an email,

select its entry in the main window’s list and click the Reply button. If I reply to the

mail I just sent to myself (arguably narcissistic, but demonstrative), the mail composition

window shown in Figure 14-29 appears.

This window is identical in format to the one we saw for the Write operation, except

that PyMailGUI fills in some parts automatically. In fact, the only thing I’ve added in

this window is the first line in the text editor part; the rest is filled in by PyMailGUI:

• The From line is set to your email address in your mailconfig module.

• The To line is initialized to the original message’s From address (we’re replying to

the original sender, after all).

• The Subject line is set to the original message’s subject line, prepended with a “Re:”,

the standard follow-up subject line form (unless it already has one, in uppercase

or lowercase).

Figure 14-28. PyMailGUI raw mail text view window (PyEdit)

• The optional Bcc line, if enabled in the mailconfig module, is prefilled with the

sender’s address, too, since it’s often used this way to retain a copy (new in this

version).

• The body of the reply is initialized with the signature line in mailconfig, along with

the original message’s text. The original message text is quoted with > characters

and is prepended with a few header lines extracted from the original message to

give some context.

• Not shown in this example and new in this version, too, the Cc header in replies

is also prefilled with all the original recipients of the message, by extracting ad-

dresses among the original To and Cc headers, removing duplicates, and removing

your address from the result. In other words, Reply really is Reply-to-All by

default—it replies to the sender and copies all other recipients as a group. Since

the latter isn’t always desirable, it can be disabled in mailconfig so that replies only

initialize To with the original sender. You can also simply delete the Cc prefill if

not wanted, but you may have to add addresses to Cc manually if this feature is

disabled. We’ll see reply Cc prefills at work later.

Figure 14-29. PyMailGUI reply compose window

Luckily, all of this is much easier than it may sound. Python’s standard email module

extracts all of the original message’s header lines, and a single string replace method

call does the work of adding the > quotes to the original message body. I simply type

what I wish to say in reply (the initial paragraph in the mail’s text area) and press the

Send button to route the reply message to the mailbox on my mail server again. Phys-

ically sending the reply works the same as sending a brand-new message—the mail is

routed to your SMTP server in a spawned send-mail thread, and the send-mail wait

pop up appears while the thread runs.
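The reply prefills listed earlier boil down to a few string operations. The helpers below are a sketch under assumptions: their names are hypothetical, and details such as the exact quote prefix and case handling may differ from PyMailGUI's code.

```python
import email.utils

def reply_subject(subject):
    """Prefix 'Re: ' unless the subject already starts with it in any case."""
    if subject[:3].lower() == 're:':
        return subject
    return 'Re: ' + subject

def quote_body(text):
    """Quote original text with '> ' using a single string replace call."""
    return '> ' + text.replace('\n', '\n> ')

def reply_cc(orig_to, orig_cc, my_address):
    """Original To and Cc addresses, minus duplicates and my own address."""
    fields = [field for field in (orig_to, orig_cc) if field.strip()]
    mine = email.utils.parseaddr(my_address)[1].lower()
    seen = {mine}
    result = []
    for name, addr in email.utils.getaddresses(fields):
        if addr and addr.lower() not in seen:
            seen.add(addr.lower())
            result.append(email.utils.formataddr((name, addr)))
    return ', '.join(result)
```

A forward would follow the same pattern with a "Fwd: " subject prefix, but without the Cc prefill.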

Forwarding a message is similar to replying: select the message in the main window,

press the Fwd button, and fill in the fields and text area of the popped-up composition

window. Figure 14-30 shows the window created to forward the mail we originally

wrote and received after a bit of editing.

Figure 14-30. PyMailGUI forward compose window

Much like replies, forwards fill From with the sender’s address in mailconfig; the orig-

inal text is automatically quoted in the message body again; Bcc is preset initially the

same as From; and the subject line is preset to the original message’s subject prepended

with the string “Fwd:”. All these lines can be changed manually before sending if you

wish to tailor. I always have to fill in the To line manually, though, because a forward

is not a direct reply—it doesn’t necessarily go back to the original sender. Further, the

Cc prefill of original recipients done by Reply isn’t performed for forwards, because

they are not a continuation of group discussions.

Notice that I’m forwarding this message to three different addresses (two in the To,

and one manually entered in the Bcc). I’m also using full “name <address>” formats

for email addresses. Multiple recipient addresses are separated with a comma (,) in the

To, Cc, and Bcc header fields, and PyMailGUI is happy to use the full address form

anywhere you type an address, including your own in mailconfig. As demonstrated by

the first To recipient in Figure 14-30, commas in address names don’t clash with those

that separate recipients, because address lines are parsed fully in this version. When

we’re ready, the Send button in this window fires the forwarded message off to all

addresses listed in these headers, after removing any duplicates to avoid sending the

same recipient the same mail more than once.
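As a rough illustration of why full parsing matters here, the following sketch uses the standard library's email.utils.getaddresses to split address lines and then drops duplicates. The split_recipients name, the sample addresses, and the choice to compare addresses case-insensitively are all assumptions for this example; PyMailGUI's own logic lives in mailtools and may differ.

```python
from email.utils import getaddresses, formataddr

def split_recipients(*header_values):
    """Parse address header strings into unique (name, address) pairs."""
    seen = set()
    unique = []
    for name, addr in getaddresses(list(header_values)):
        key = addr.lower()               # assumption: ignore case when comparing
        if addr and key not in seen:
            seen.add(key)
            unique.append((name, addr))
    return unique

# The comma inside the quoted name must not be taken as a separator.
to_line = '"Smith, Bob" <bob@example.com>, Sue Jones <sue@example.com>'
bcc_line = 'bob@example.com'             # same address again, without a name

recipients = split_recipients(to_line, bcc_line)
for pair in recipients:
    print(formataddr(pair))
```

A naive split on commas would cut the first recipient's name in two; the parser keeps the quoted name whole, and the repeated Bcc address is sent to only once.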

I’ve now written a new message, replied to it, and forwarded it. The reply and forward

were sent to my email address, too; if we press the main window’s Load button again,

the reply and forward messages should show up in the main window’s list. In Fig-

ure 14-31, they appear as messages 15 and 16 (the order they appear in may depend

on timing issues at your server, and I’ve stretched this horizontally in the GUI to try to

reveal the To header of the last of these).

Figure 14-31. PyMailGUI mail list after sends and load

Keep in mind that PyMailGUI runs on the local computer, but the messages you see in

the main window’s list actually live in a mailbox on your email server machine. Every

time we press Load, PyMailGUI downloads but does not delete newly arrived emails’

headers from the server to your computer. The three messages we just wrote (14

through 16) will also appear in any other email program you use on your account (e.g.,

in Outlook or in a webmail interface). PyMailGUI does not automatically delete mes-

sages as they are downloaded, but simply stores them in your computer’s memory for

processing. If we now select message 16 and press View, we see the forward message

we sent, as in Figure 14-32.

This message went from my machine to a remote email server and was downloaded

from there into a Python list from which it is displayed. In fact, it went to three different

email accounts I have (the other two appear later in this demo—see Figure 14-45). The

third recipient doesn’t appear in Figure 14-32 here because it was a Bcc blind-copy—

it receives the message, but no header line is added to the mail itself.

Figure 14-32. PyMailGUI view forwarded mail

Figure 14-33 shows what the forward message’s raw text looks like; again, double-click

on a main window’s entry to display this form. The formatted display in Fig-

ure 14-32 simply extracts bits and pieces out of the text shown in the raw display form.

One last pointer on replies and forwards: as mentioned, replies in this version reply to

all original recipients, assuming that more than one means that this is a continuation

of a group discussion. To illustrate, Figure 14-34 shows an original message on top, a

forward of it on the lower left, and a reply to it on the lower right. The Cc header in

the reply has been automatically prefilled with all the original recipients, less any du-

plicates and the new sender’s address; the Bcc (enabled here) has also been prefilled

with the sender in both. These are just initial settings which can be edited and removed

prior to sends. Moreover, the Cc prefill for replies can be disabled entirely in the con-

figuration file. Without it, though, you may have to manually cut-and-paste to insert

addresses in group mail scenarios. Open this version’s mail save file to view this mail’s

behavior live, and see the suggested enhancements later for more ideas.

Deleting Email

So far, we’ve covered every action button on list windows except for Delete and the All

checkbox. The All checkbox simply toggles between selecting all messages at once and

deselecting all (View, Delete, Reply, Fwd, and Save action buttons apply to all currently

selected messages). PyMailGUI also lets us delete messages from the server perma-

nently, so that we won’t see them the next time we access our inbox.

Delete operations are kicked off the same way as Views and Saves; just press the Delete

button instead. In typical operation, I eventually delete email I’m not interested in, and

save and delete emails that are important. We met Save earlier in this demo.

Like View, Save, and other operations, Delete can be applied to one or more messages.

Deletes happen immediately, and like all server transfers, they are run in a nonblocking

thread but are performed only if you verify the operation in a pop up, such as the one

shown in Figure 14-35. During the delete, a progress dialog like those in Figures 14-8

and 14-9 provides status.

Figure 14-33. PyMailGUI view forwarded mail, raw

Figure 14-35. PyMailGUI delete verification on quit

By design, no mail is ever removed automatically: you will see the same messages the

next time PyMailGUI runs. It deletes mail from your server only when you ask it to,

and then only if verified in the last pop up shown (this is your last chance to prevent

permanent mail removal). After the deletions are performed, the mail index is updated,

and the GUI session continues.

Deletions disable mail loads and other deletes while running and cannot be run in

parallel with loads or other deletes already in progress because they may change POP

message numbers and thus modify the mail index list (they may also modify the email

cache). Messages may still be composed during a deletion, however, and offline save

files may be processed.

Figure 14-34. Reply-to-all Cc prefills

POP Message Numbers and Synchronization

By now, we’ve seen all the basic functionality of PyMailGUI—enough to get you started

sending and receiving simple but typical text-based messages. In the rest of this demo,

we’re going to turn our attention to some of the deeper concepts in this system, in-

cluding inbox synchronization, HTML mails, Internationalization, and multiple ac-

count configuration. Since the first of these is related to the preceding section’s tour of

mail deletions, let’s begin here.

Though they might seem simple from an end-user perspective, it turns out that deletions

are complicated by POP’s message-numbering scheme. We learned about the potential

for synchronization errors between the server’s inbox and the fetched email list in

Chapter 13, when studying the mailtools package PyMailGUI uses (near Exam-

ple 13-24). In brief, POP assigns each message a relative sequential number, starting

from one, and these numbers are passed to the server to fetch and delete messages. The

server’s inbox is normally locked while a connection is held so that a series of deletions

can be run as an atomic operation; no other inbox changes occur until the connection

is closed.

However, message number changes also have some implications for the GUI itself. It’s

never an issue if new mail arrives while we’re displaying the result of a prior download—

the new mail is assigned higher numbers, beyond what is displayed on the client. But

if we delete a message in the middle of a mailbox after the index has been loaded from

the mail server, the numbers of all messages after the one deleted change (they are

decremented by one). As a result, some message numbers might no longer be valid if

deletions are made while viewing previously loaded email.
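The numbering problem is easy to see in a toy model. The sketch below treats the server inbox as a plain Python list whose 1-based positions play the role of POP message numbers; this is a simplification for illustration only (a real POP server applies deletions when the session ends), and the names used are hypothetical.

```python
def stale_entries(cached, inbox):
    """Return cached {number: message} entries that no longer match inbox."""
    return {num: msg for num, msg in cached.items()
            if num > len(inbox) or inbox[num - 1] != msg}

inbox = ['msg-A', 'msg-B', 'msg-C', 'msg-D']

# The client's view at load time: POP numbers start at one.
cached = {num: msg for num, msg in enumerate(inbox, start=1)}

# New mail gets a higher number, so the cached numbers stay valid.
inbox.append('msg-E')
stale_after_arrival = stale_entries(cached, inbox)

# Deleting message 2 shifts every later message down by one.
del inbox[1]
stale_after_delete = stale_entries(cached, inbox)

print(stale_after_arrival)
print(stale_after_delete)
```

After the arrival nothing is stale; after the deletion, numbers 2 through 4 in the client's cache all point at the wrong message or at none.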

To work around this, PyMailGUI adjusts all the displayed numbers after a Delete by

simply removing the entries for deleted mails from its index list and mail cache. How-

ever, this adjustment is not enough to keep the GUI in sync with the server’s inbox if

the inbox is modified at a position other than after the end, by deletions in another

email client (even in another PyMailGUI session), or by deletions performed by the

mail server itself (e.g., messages determined to be undeliverable and automatically re-

moved from the inbox). Such modifications outside PyMailGUI’s scope are uncom-

mon, but not impossible.

To handle these cases, PyMailGUI uses the safe deletion and synchronization tests in

mailtools. That module uses mail header matching to detect mail list and server inbox

synchronization errors. For instance, if another email client has deleted a message prior

to the one to be deleted by PyMailGUI, mailtools catches the problem and cancels the

deletion, and an error pop up like the one in Figure 14-36 is displayed.
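The header-matching idea can be sketched as follows. This is an illustration under stated assumptions, not the mailtools code: FakePop, DeleteSynchError, and check_before_delete are hypothetical names, and the fake server models only the top(which, howmuch) call of poplib.POP3, which returns a (response, lines, octets) tuple.

```python
import poplib

class DeleteSynchError(Exception):
    """Raised when cached headers no longer match the server inbox."""

class FakePop:
    """Stand-in for poplib.POP3 so the sketch runs offline; top() only."""
    def __init__(self, messages):
        self.messages = messages               # one list of header lines each
    def top(self, which, howmuch):
        return b'+OK', self.messages[which - 1], 0

def check_before_delete(server, msgnum, cached_lines):
    """Fetch headers for msgnum and compare them with the cached copy."""
    try:
        _resp, lines, _octets = server.top(msgnum, 0)
    except (IndexError, poplib.error_proto):   # fake and real failure modes
        raise DeleteSynchError('message %d no longer exists' % msgnum) from None
    if lines != cached_lines:
        raise DeleteSynchError('message %d differs from cached copy' % msgnum)

headers = [[b'Subject: one'], [b'Subject: two'], [b'Subject: three']]
cache = {num: lines for num, lines in enumerate(headers, start=1)}

# Another client has deleted message 2 since the index was loaded.
server = FakePop(headers[:1] + headers[2:])

check_before_delete(server, 1, cache[1])       # still matches: safe to delete
try:
    check_before_delete(server, 2, cache[2])
    outcome = 'deleted'
except DeleteSynchError as exc:
    outcome = 'cancelled: %s' % exc
print(outcome)
```

Comparing headers costs one extra request per delete, but it turns a silent wrong-message deletion into an error the GUI can report.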

Figure 14-36. Safe deletion test detection of inbox difference

Similarly, both index list loads and individual message fetches run a synchronization

test in mailtools. Figure 14-37 captures the error generated on a fetch if a

message has been deleted in another client since we last loaded the server index win-

dow. The same error is issued when this occurs during a load operation, but the first

line reads “Load failed.”

Figure 14-37. Synchronization error after delete in another client

In both synchronization error cases, the mail list is automatically reloaded with the new

inbox content by PyMailGUI immediately after the error pop up is dismissed. This

scheme ensures that PyMailGUI won’t delete or display the wrong message, in the rare

case that the server’s inbox is changed without its knowledge. See mailtools in

Chapter 13 for more on synchronization tests; these errors are detected and raised in

mailtools, but triggered by calls made in the mail cache manager here.

Handling HTML Content in Email

Up to this point, we’ve seen PyMailGUI’s basic operation in the context of plain-text

emails. We’ve also seen it handling HTML part attachments, but not the main text of

HTML messages. Today, of course, HTML is common for mail content too. Because

the PyEdit mail display deployed by PyMailGUI uses a tkinter Text widget oriented

toward plain text, HTML content is handled specially:

• For text/HTML alternative mails, PyMailGUI displays the plain text part in its view

window and includes a button for opening the HTML rendition in a web browser

on demand.

• For HTML-only mails, the main text area shows plain text extracted from the

HTML by a simple parser (not the raw HTML), and the HTML is also displayed

in a web browser automatically.

In all cases, the web browser’s display of International character set content in the

HTML depends upon encoding information in tags in the HTML, guesses, or user

feedback. Well-formed HTML parts already have “<meta>” tags in their “<head>”

sections which give the HTML’s encoding, but they may be absent or incorrect. We’ll

learn more about Internationalization support in the next section.

Figure 14-38 gives the scene when a text/HTML alternative mail is viewed, and Fig-

ure 14-39 shows what happens when an HTML-only email is viewed. The web browser

in Figure 14-38 was opened by clicking the HTML part’s button; this is no different

than the HTML attachment example we saw earlier.

For HTML-only messages, though, behavior is new here: the view window on the left

in Figure 14-39 reflects the results of extracting plain text from the HTML shown in

the popped-up web browser behind it. The HTML parser used for this is something of

a first-cut prototype, but any result it can give is an improvement on displaying raw

HTML in the view window for HTML-only mails. For simpler HTML mails of the sort

sent by individuals instead of those sent by mass-mailing companies (like those shown

here), the results are generally good in tests run to date, though time will tell how this

prototype parser fares in today’s unforgiving HTML jungle of nonstandard and non-

conforming code—improve as desired.
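For a feel of what such a parser involves, here is a minimal sketch built on the standard library's html.parser.HTMLParser. It is not the parser shipped with PyMailGUI; the TextExtractor class, the tag sets, and the whitespace handling are assumptions chosen to keep the example short.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text content, skipping scripts/styles and breaking on blocks."""
    SKIP = {'script', 'style'}
    BREAK = {'p', 'br', 'div', 'li', 'tr', 'h1', 'h2', 'h3'}

    def __init__(self):
        super().__init__(convert_charrefs=True)   # turn &amp; etc. into text
        self.parts = []
        self.skipping = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skipping += 1
        elif tag in self.BREAK:
            self.parts.append('\n')

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skipping:
            self.skipping -= 1

    def handle_data(self, data):
        if not self.skipping:
            self.parts.append(data)

def html_to_text(html):
    """Return the plain text found in html, one nonblank line per block."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (' '.join(line.split()) for line in ''.join(parser.parts).splitlines())
    return '\n'.join(line for line in lines if line)

sample = ('<html><head><style>p {color: red}</style></head><body>'
          '<p>Hello <b>world</b> &amp; all</p><p>Bye</p></body></html>')
print(html_to_text(sample))
```

Even this small version shows the trade-off described above: it copes with simple, hand-written mail, but tables, nested layouts, and malformed markup would need considerably more care.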

Figure 14-39. Viewing HTML-only mails

Figure 14-38. Viewing text/HTML alternative mails

One caveat here: PyMailGUI can today display HTML in a web browser and extract

plain text from it, but it cannot display HTML directly in its own window and has no

support for editing it specially. These are enhancements that will have to await further

attention by other programmers who may find them useful.

Mail Content Internationalization Support

Our next advanced feature is something of an inevitable consequence of the Internet’s

success. As described earlier when summarizing version 3.0 changes, PyMailGUI fully

supports International character sets in mail content—both text part payloads and

email headers are decoded for display and encoded when sent, according to email,

MIME, and Unicode standards. This may be the most significant change in this version

of the program. Regrettably, capturing this in screenshots is a bit of a challenge;

you can get a better feel for how this pans out by running an Open on the following

included mail save file, viewing its messages in formatted and raw modes, starting

replies and forwards for them, and so on:

C:\...\PP4E\Internet\Email\PyMailGui\SavedMail\i18n-4E

To sample the flavor of this support here, Figure 14-40 shows the scene when this file

is opened, shown for variety here with one of the alternate account configurations

described in the next section. This figure’s index list window and mail view windows

capture Russian and Chinese language messages sent to my email account (these were

unsolicited email of no particular significance, but suffice as reasonable test cases).

Notice how both message headers and text payload parts are decoded for display in

both the mail list window and the mail view windows.

Figure 14-41 shows portions of the raw text of the two fetched messages, obtained by

double-clicking their list entries (you can open these mails from the save file listed earlier

if you have trouble seeing their details as shown in this book). Notice how the body

text is encoded per both MIME and Unicode conventions—the headers at the top and

text at the bottom of these windows show the actual Base64 and quoted-printable

strings that must be decoded to achieve the nicely displayed output in Figure 14-40.

For the text parts, the information in the part’s header describes its content’s encoding

schemes. For instance, charset="gb2312" in the content type header identifies a Chinese

Unicode character set, and the transfer encoding header gives the part’s MIME

encoding type (e.g., base64).

The headers are encoded per i18n standards here as well—their content self-describes

their MIME and Unicode encodings. For example, the header prefix =?koi8-r?B means

Russian text, Base64 encoded. PyMailGUI is clever enough to decode both full headers

and the name fields of addresses for display, whether they are completely encoded (as

shown here) or contain just encoded substrings (as shown by other saved mails in the

version30-4E file in this example’s SavedMail directory).
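The header decoding step can be sketched with the standard email.header module. The decode_whole_header helper and the sample text are assumptions for illustration; the example builds its own koi8-r header rather than quoting one from the saved mails, and PyMailGUI's real decoding is done in mailtools.

```python
from email.header import Header, decode_header

def decode_whole_header(raw):
    """Decode an encoded header value into a str, part by part."""
    pieces = []
    for data, charset in decode_header(raw):
        if isinstance(data, bytes):              # encoded or mixed content
            data = data.decode(charset or 'ascii', errors='replace')
        pieces.append(data)
    return ''.join(pieces)

# "Privet" (hello) in Cyrillic, written with escapes to keep the source ASCII.
original = '\u041f\u0440\u0438\u0432\u0435\u0442'

# Encode it the way a sender would: the header names its own charset and
# its MIME encoding (B for Base64) between the =? and ?= markers.
encoded = Header(original, charset='koi8-r').encode()
decoded = decode_whole_header(encoded)

print(encoded)                    # pure ASCII, safe for any mail transport
print(decoded == original)
```

Because the charset name travels inside the header itself, the receiving side needs no outside information to recover the original characters.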

Figure 14-41. Raw text of fetched Internationalized mails, headers and body encoded

As additional context, Figure 14-42 shows how these messages’ main parts appear when

opened via their part buttons. Their content is saved as raw post-MIME bytes in binary

mode, but the PyEdit pop ups decode according to passed-in encoding names obtained

from the raw message headers. As we learned in Chapters 9 and 11, the underlying

tkinter toolkit generally renders decoded str better than raw bytes.

Figure 14-40. Internationalization support, headers and body decoded for display

So far, we’ve displayed Internationalized emails, but PyMailGUI allows us to send them

as well, and handles any encoding tasks implied by the text content. Figure 14-43 shows

the result of running replies and forwards to the Russian language email, with the To

address changed to protect the innocent. Headers in the view window were decoded

for display, encoded when sent, and decoded back again for display; text parts in the

mail body were similarly decoded, encoded, and re-decoded along the way; and headers

are also decoded within the “>” quoted original text inserted at the end of the message.

And finally, Figure 14-44 shows a portion of the raw text of the Russian language reply

message that appears in the lower right of the formatted view of Figure 14-43. Again,

double-click to see these details live. Notice how both headers and body text have been

encoded per email and MIME standards.

As configured, the body text is always MIME encoded to UTF-8 when sent if it fails to

encode as ASCII, the default setting in the mailconfig module. Other defaults can be

used if desired and will be encoded appropriately for sends; in fact, text that won’t work

in the full text of email is MIME encoded the same way as binary parts such as images.

This is also true for non-Internationalized character sets—the text part of a message

written in English with any non-ASCII quotes, for example, will be UTF-8 and Base64

encoded in the same way as the message in Figure 14-44, on the assumption that the

recipient’s email reader will decode it (any reasonable modern email client will). This

allows non-ASCII text to be embedded in the full email text sent.
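The ASCII-first, UTF-8-fallback policy just described might look like the sketch below. The make_text_part name and its fallback parameter are hypothetical; MIMEText and its _charset argument are the standard email package's own, and the behavior shown is that of current Python 3 releases.

```python
from email.mime.text import MIMEText

def make_text_part(text, fallback='utf-8'):
    """Build a text part as plain ASCII if possible, else in fallback."""
    try:
        text.encode('ascii')
        charset = 'us-ascii'
    except UnicodeEncodeError:
        charset = fallback                # assumption: UTF-8 unless told otherwise
    return MIMEText(text, _subtype='plain', _charset=charset)

plain = make_text_part('Already got one...')
fancy = make_text_part('\u201cAlready got one\u201d')    # curly quotes

for part in (plain, fancy):
    print(part.get_content_charset(), part['Content-Transfer-Encoding'])
```

The second part is English text, yet its two curly quotes are enough to push the whole body into UTF-8 with Base64 transfer encoding.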

Figure 14-42. Main text parts of Internationalized mails, decoded in PyEdit pop-ups

Message headers are similarly encoded per UTF-8 if they are non-ASCII when sent, so

they will work in the full email text. In fact, if you study this closely you’ll find that the

Subject here was originally encoded per a Russian Unicode scheme but is UTF-8 now—

its new representation yields the same characters (code points) when decoded for

display.

In short, although the GUI itself is still in English (its labels and the like), the content

of emails displayed and sent supports arbitrary character sets. Decoding for display is

done per content where possible, using message headers for text payloads and content

for headers. Encoding for sends follows user settings or inputs where given, and falls

back to a UTF-8 default otherwise. Required MIME and email header

encodings are implemented in a largely automatic fashion by the underlying email

package.

Not shown here are the pop-up dialogs that may be issued to prompt for text part

encoding preferences on sends if so configured in mailconfig, and PyEdit’s similar

prompts under certain user configurations. Some of these user configurations are meant

for illustration and generality; the presets seem to work well for most scenarios I’ve run

into, but your International mileage may vary. For more details, experiment with the

file’s messages on your own and see the system’s source code.

Figure 14-43. Result of reply and forward with International character sets, re-decoded

Alternative Configurations and Accounts

So far, we’ve mostly seen PyMailGUI being run on an email account I created for this

book’s examples, but it’s easy to tailor its mailconfig module for different accounts, as

well as different visual effects. For example, Figure 14-45 captures the scene with Py-

MailGUI being run on three different email accounts I use for books and training. All

three instances are run in independent processes here. Each main list window is

displaying a different email account’s messages, and each customizes appearance or

behavior in some fashion. The message view window at the top, opened from the server

list window in the lower left, also applies custom color and displayed-header schemes.

You can always change mailconfigs in-place for a specific account if you use just one,

but we’ll later see how the altconfigs subdirectory applies one possible solution to

allow configuring for multiple accounts such as these, completely external to the orig-

inal source code. The altconfigs option renders the windows in Figure 14-45, and

suffices as my launching interface; see its code ahead.

Figure 14-44. Raw text of sent Russian reply, headers and body re-encoded

Multiple Windows and Status Messages

Finally, PyMailGUI is really meant to be a multiple-window interface—a detail that

most of the earlier screenshots haven’t really done justice to. For example, Fig-

ure 14-46 shows PyMailGUI with the main server list window, two save-file list win-

dows, two message view windows, and help. All these windows are nonmodal; that is,

they are all active and independent, and do not block other windows from being se-

lected, even though they are all running a single PyMailGUI process.

In general, you can have any number of mail view or compose windows up at once,

and cut and paste between them. This matters, because PyMailGUI must take care to

make sure that each window has a distinct text-editor object. If the text-editor object

were a global, or used globals internally, you’d likely see the same text in each window

(and the Send operations might wind up sending text from another window). To avoid

this, PyMailGUI creates and attaches a new TextEditor instance to each view and com-

pose window it creates, and associates the new editor with the Send button’s callback

handler to make sure we get the right text. This is just the usual OOP state retention,

but it acquires a tangible benefit here.
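The state-retention point can be shown without any GUI at all. In this sketch, TextEditor is a bare stand-in for the real widget class and ComposeWindow is a hypothetical name; what matters is that each window owns its editor and that the Send callback is a bound method that remembers which window it belongs to.

```python
class TextEditor:
    """Stand-in for the real text widget: it only holds a string."""
    def __init__(self):
        self.text = ''
    def set_text(self, text):
        self.text = text
    def get_text(self):
        return self.text

class ComposeWindow:
    """Each window gets its own editor and its own Send callback."""
    def __init__(self, outbox):
        self.editor = TextEditor()            # new instance per window
        self.outbox = outbox
        self.send_callback = self.on_send     # bound method: carries self along
    def on_send(self):
        self.outbox.append(self.editor.get_text())

outbox = []
reply_window = ComposeWindow(outbox)
forward_window = ComposeWindow(outbox)
reply_window.editor.set_text('reply text')
forward_window.editor.set_text('forward text')

# Pressing Send in either window, in any order, sends that window's own text.
forward_window.send_callback()
reply_window.send_callback()
print(outbox)
```

Had both windows shared one module-level editor, the second set_text call would have overwritten the first, and both sends would have carried the same text.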

Figure 14-45. Alternative accounts and configurations

Though not GUI-related, PyMailGUI also prints a variety of status messages as it runs,

but you see them only if you launch the program from the system command-line console

window (e.g., a DOS box on Windows or an xterm on Linux) or by double-clicking on

its filename icon (its main script is a .py, not a .pyw). On Windows, you won’t see these

messages when PyMailGUI is started from another program, such as the PyDemos or

PyGadgets launcher bar GUIs. These status messages print server information; show

mail loading status; and trace the load, store, and delete threads that are spawned along

the way. If you want PyMailGUI to be more verbose, launch it from a command line

and watch:

C:\...\PP4E\Internet\Email\PyMailGui> PyMailGui.py

user: PP4E@learning-python.com

loading headers

Connecting...

b'+OK <24715.1275632750@pop05.mesa1.secureserver.net>'

load headers exit

synch check

Connecting...

b'+OK <26056.1275632770@pop19.prod.mesa1.secureserver.net>'

Same headers text

loading headers

Connecting...

b'+OK <18798.1275632771@pop04.mesa1.secureserver.net>'

load headers exit

synch check

Connecting...

b'+OK <28403.1275632790@pop19.prod.mesa1.secureserver.net>'

Same headers text

load 16

Connecting...

b'+OK <28472.1275632791@pop19.prod.mesa1.secureserver.net>'

Sending to...['lutz@rmi.net', 'PP4E@learning-python.com']

MIME-Version: 1.0

Content-Type: text/plain; charset="us-ascii"

Content-Transfer-Encoding: 7bit

From: PP4E@learning-python.com

To: lutz@rmi.net

Subject: Already got one...

Date: Fri, 04 Jun 2010 06:30:26 -0000

X-Mailer: PyMailGUI 3.0 (Python)

> -----Origin

Send exit

Figure 14-46. PyMailGUI multiple windows and text editors

You can also double-click on the PyMailGui.py filename in your file explorer GUI and

monitor the popped-up DOS console box on Windows. Console messages are mostly

intended for debugging, but they can be used to help understand the system’s operation

as well.

For more details on using PyMailGUI, see its help display (press the help bar at the top

of its main server list windows), or read the help string in the module

PyMailGuiHelp.py, described in the next section.

PyMailGUI Implementation

Last but not least, we get to the code. PyMailGUI consists of the new modules listed

near the start of this chapter, along with a handful of peripheral files described there.

The source code for these modules is listed in this section. Before we get started, here

are two quick reminders to help you study:

Code reuse

Besides the code here, PyMailGUI also gets a lot of mileage out of reusing modules

we wrote earlier and won’t repeat here: mailtools for mail loads, composition,

parsing, and delete operations; threadtools for managing server and local file ac-

cess threads; the GUI section’s TextEditor for displaying and editing mail message

text; and so on. See the example numbers list earlier in this chapter.

In addition, standard Python modules and packages such as poplib, smtplib, and

email hide most of the details of pushing bytes around the Net and extracting and

building message components. As usual, the tkinter standard library module also

implements GUI components in a portable fashion.

Code structure

As mentioned earlier, PyMailGUI applies code factoring and OOP to leverage code

reuse. For instance, list view windows are implemented as a common superclass

that codes most actions, along with one subclass for the server inbox list window

and one for local save-file list windows. The subclasses customize the common

superclass for their specific mail media.

This design reflects the operation of the GUI itself—server list windows load mail

over POP, and save-file list windows load from local files. The basic operation of

list window layout and actions, though, is similar for both and is shared in the

common superclass to avoid redundancy and simplify the code. Message view

windows are similarly factored: a common view window superclass is reused and

customized for write, reply, and forward view windows.

To make the code easier to follow, it is divided into two main modules that reflect

the structure of the GUI—one for the implementation of list window actions and

one for view window actions. If you are looking for the implementation of a button

that appears in a mail view or edit window, for instance, see the view window

module and search for a method whose name begins with the word on—the con-

vention used for callback handler methods. A specific button’s text can also be

located in name/callback tables used to build the windows. Actions initiated on

list windows are coded in the list window module instead.

In addition, the message cache is split off into an object and module of its own,

and potentially reusable tools are coded in importable modules (e.g., line wrapping

and utility pop ups). PyMailGUI also includes a main module that defines startup

window classes, a simple HTML to plain-text parser, a module that contains the

help text as a string, the mailconfig user settings module (a version specific to

PyMailGUI is used here), and a small handful of related files.

The following subsections present each of PyMailGUI’s source code files for you to

study. As you read, refer back to the demo earlier in this chapter and run the program

live to map its behavior back to its code.

One accounting note up-front: the only one of PyMailGUI’s 18 new source files not

listed in this section is its __init__.py package initialization file. This file is mostly

empty except for a comment string and is unused in the system today. It exists only for

future expansion, in case PyMailGUI is ever used as a package in the future—some of

its modules may be useful in other programs. As is, though, same-directory internal

imports here are not package-relative, so they assume this system is either run as a top-

level program (to import from “.”) or is listed on sys.path directly (to use absolute

imports). In Python 3.X, a package’s directory is not included on sys.path automati-

cally, so future use as a package would require changes to internal imports (e.g., moving

the main script up one level and using from . import module throughout). See resources

such as the book Learning Python for more on packages and package imports.

PyMailGUI: The Main Module

Example 14-1 defines the file run to start PyMailGUI. It implements top-level list win-

dows in the system—combinations of PyMailGUI’s application logic and the window

protocol superclasses we wrote earlier in the text. The latter of these define window

titles, icons, and close behavior.

The main internal, nonuser documentation is also in this module, as well as command-

line logic—the program accepts the names of one or more save-mail files on the

command line, and automatically opens them when the GUI starts up. This is used by

the PyDemos launcher of Chapter 10, for example.

Example 14-1. PP4E\Internet\Email\PyMailGui\PyMailGui.py

"""
##################################################################################
PyMailGui 3.0 - A Python/tkinter email client.
A client-side tkinter-based GUI interface for sending and receiving email.

See the help string in PyMailGuiHelp.py for usage details, and a list of
enhancements in this version.

Version 2.0 was a major, complete rewrite. The changes from 2.0 (July '05)
to 2.1 (Jan '06) were quick-access part buttons on View windows, threaded
loads and deletes of local save-mail files, and checks for and recovery from
message numbers out-of-synch with mail server inbox on deletes, index loads,
and message loads.

Version 3.0 (4E) is a port to Python 3.X; uses grids instead of packed column
frames for better form layout of headers in view windows; runs update() after
inserting into a new text editor for accurate line positioning (see PyEdit
loadFirst changes in Chapter 11); provides an HTML-based version of its help
text; extracts plain-text from HTML main/only parts for display and quoting;
supports separators in toolbars; addresses both message content and header
Unicode/I18N encodings for fetched, sent, and saved mails (see Ch13 and Ch14);
and much more (see Ch14 for the full rundown on 3.0 upgrades); fetched message
decoding happens deep in the mailtools package, on mail cache load operations
here; mailtools also fixes a few email package bugs (see Ch13);

This file implements the top-level windows and interface. PyMailGui uses a
number of modules that know nothing about this GUI, but perform related tasks,
some of which are developed in other sections of the book. The mailconfig
module is expanded for this program.

==Modules defined elsewhere and reused here:==

mailtools (package)
    client-side scripting chapter
    server sends and receives, parsing, construction (Example 13-21+)
threadtools.py
    GUI tools chapter
    thread queue management for GUI callbacks (Example 10-20)
windows.py
    GUI tools chapter
    border configuration for top-level windows (Example 10-16)
textEditor.py
    GUI programs chapter
    text widget used in mail view windows, some pop ups (Example 11-4)

==Generally useful modules defined here:==

popuputil.py
    help and busy windows, for general use
messagecache.py
    a cache that keeps track of mail already loaded
wraplines.py
    utility for wrapping long lines of messages
html2text.py
    rudimentary HTML parser for extracting plain text
mailconfig.py
    user configuration parameters: server names, fonts, etc.

==Program-specific modules defined here:==

SharedNames.py
    objects shared between window classes and main file
ViewWindows.py
    implementation of view, write, reply, forward windows
ListWindows.py
    implementation of mail-server and local-file list windows
PyMailGuiHelp.py (see also PyMailGuiHelp.html)
    user-visible help text, opened by main window bar
PyMailGui.py
    main, top-level file (run this), with main window types
##################################################################################
"""

import mailconfig, sys

from SharedNames import appname, windows

from ListWindows import PyMailServer, PyMailFile

###############################################################################

# top-level window classes

# View, Write, Reply, Forward, Help, BusyBox all inherit from PopupWindow

# directly: only usage; askpassword calls PopupWindow and attaches; the

# order matters here!--PyMail classes redef some method defaults in the

# Window classes, like destroy and okayToExit: must be leftmost; to use

# PyMailFileWindow standalone, imitate logic in PyMailCommon.onOpenMailFile;

###############################################################################

# uses icon file in cwd or default in tools dir

srvrname = mailconfig.popservername or 'Server'

class PyMailServerWindow(PyMailServer, windows.MainWindow):
    "a Tk, with extra protocol and mixed-in methods"
    def __init__(self):
        windows.MainWindow.__init__(self, appname, srvrname)
        PyMailServer.__init__(self)

class PyMailServerPopup(PyMailServer, windows.PopupWindow):
    "a Toplevel, with extra protocol and mixed-in methods"
    def __init__(self):
        windows.PopupWindow.__init__(self, appname, srvrname)
        PyMailServer.__init__(self)

class PyMailServerComponent(PyMailServer, windows.ComponentWindow):
    "a Frame, with extra protocol and mixed-in methods"
    def __init__(self):
        windows.ComponentWindow.__init__(self)
        PyMailServer.__init__(self)

class PyMailFileWindow(PyMailFile, windows.PopupWindow):
    "a Toplevel, with extra protocol and mixed-in methods"
    def __init__(self, filename):
        windows.PopupWindow.__init__(self, appname, filename)
        PyMailFile.__init__(self, filename)

###############################################################################

# when run as a top-level program: create main mail-server list window

###############################################################################

if __name__ == '__main__':
    rootwin = PyMailServerWindow()              # open server window
    if len(sys.argv) > 1:                       # 3.0: fix to add len()
        for savename in sys.argv[1:]:
            rootwin.onOpenMailFile(savename)    # open save file windows (demo)
        rootwin.lift()                          # save files loaded in threads
    rootwin.mainloop()
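
The comment block in Example 14-1 notes that class order matters: the PyMail classes must be listed leftmost so that their redefinitions of methods such as destroy and okayToExit are found before the window-class defaults. The following is a minimal sketch of that effect using stand-in classes; the names WindowBase, MailMixin, MixinFirst, and MixinLast are hypothetical and are not part of the book's windows module.

```python
class WindowBase:
    """Stand-in for a window class that supplies default behavior."""
    def okayToExit(self):
        return True                          # default: always allow exit

class MailMixin:
    """Stand-in for a PyMail class that redefines a default."""
    busy = True
    def okayToExit(self):
        return not self.busy                 # refuse exit while busy

class MixinFirst(MailMixin, WindowBase):     # mixin leftmost: its override wins
    pass

class MixinLast(WindowBase, MailMixin):      # window leftmost: default wins
    pass

# Attribute lookup follows the method resolution order, left to right.
mro_names = [cls.__name__ for cls in MixinFirst.__mro__]
first_result = MixinFirst().okayToExit()
last_result = MixinLast().okayToExit()
print(mro_names, first_result, last_result)
```

With the mixin listed second, its okayToExit would be silently shadowed by the window default, which is the failure the listing's comment warns about.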

SharedNames: Program-Wide Globals

The module in Example 14-2 implements a shared, system-wide namespace that collects resources used in most modules in the system and defines global objects that span files. This allows other files to avoid repeating common imports and encapsulates the locations of package imports; it is the only file that must be updated if paths change in the future. Using globals can make programs more difficult to understand in general (the source of some names is not as clear), but it is reasonable if all such names are collected in a single expected module such as this one, because there is only one place to search for unknown names.
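
One consequence of collecting globals this way is worth seeing in isolation. The sketch below builds a stand-in shared module in memory (the name sharednames_demo and the two simulated client namespaces are assumptions made only to keep the example self-contained) and shows that names copied by a from * import refer to the same objects, so in-place changes to a mutable object are seen everywhere, while rebinding a copied name is local to the client that rebinds it.

```python
import sys
import types

# Build a stand-in for a shared-names module in memory (hypothetical name).
shared = types.ModuleType('sharednames_demo')
shared.appname = 'Demo 1.0'
shared.openSaveFiles = {}              # mutable object: shared by all importers
sys.modules['sharednames_demo'] = shared

# Two simulated client modules, each with its own global namespace.
client_a, client_b = {}, {}
exec('from sharednames_demo import *', client_a)
exec('from sharednames_demo import *', client_b)

# In-place changes to a shared mutable object are visible everywhere...
client_a['openSaveFiles']['inbox.txt'] = 'window-1'
seen_by_b = client_b['openSaveFiles']

# ...but rebinding a copied name affects only the client that rebinds it.
client_a['appname'] = 'Changed'
name_in_b = client_b['appname']
print(seen_by_b, name_in_b)
```

This is why a dictionary such as openSaveFiles can serve as program-wide state when it is only ever changed in place.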

Example 14-2. PP4E\Internet\Email\PyMailGui\SharedNames.py

"""

##############################################################################

objects shared by all window classes and main file: program-wide globals

##############################################################################

"""

# used in all window, icon titles

appname = 'PyMailGUI 3.0'

# used for list save, open, delete; also for sent messages file

saveMailSeparator = 'PyMailGUI' + ('-'*60) + 'PyMailGUI\n'

# currently viewed mail save files; also for sent-mail file

openSaveFiles = {} # 1 window per file,{name:win}

# standard library services

import sys, os, email.utils, email.message, webbrowser, mimetypes

from tkinter import *

from tkinter.simpledialog import askstring

from tkinter.filedialog import SaveAs, Open, Directory

from tkinter.messagebox import showinfo, showerror, askyesno

# reuse book examples

from PP4E.Gui.Tools import windows # window border, exit protocols

from PP4E.Gui.Tools import threadtools # thread callback queue checker

from PP4E.Internet.Email import mailtools # load,send,parse,build utilities

from PP4E.Gui.TextEditor import textEditor # component and pop up

# modules defined here

import mailconfig # user params: servers, fonts, etc.

import popuputil # help, busy, passwd pop-up windows

import wraplines # wrap long message lines

import messagecache # remember already loaded mail

import html2text # simplistic html->plaintext extract

import PyMailGuiHelp # user documentation

def printStack(exc_info):
    """
    debugging: show exception and stack traceback on stdout;
    3.0: change to print stack trace to a real log file if print
    to sys.stdout fails: it does when launched from another program
    on Windows; without this workaround, PyMailGUI aborts and exits
    altogether, as this is called from the main thread on spawned
    thread failures; likely a Python 3.1 bug: it doesn't occur in
    2.5 or 2.6, and the traceback object works fine if print to file;
    oddly, the print() calls here work (but go nowhere) if spawned;
    """
    print(exc_info[0])
    print(exc_info[1])
    import traceback
    try:
        traceback.print_tb(exc_info[2], file=sys.stdout)     # ok unless spawned!
    except:
        log = open('_pymailerrlog.txt', 'a')                 # use a real file
        log.write('-'*80 + '\n')                             # else gui may exit
        traceback.print_tb(exc_info[2], file=log)            # in 3.X, not 2.5/6
        log.close()                                          # flush trace to disk

# thread busy counters for threads run by this GUI

# sendingBusy shared by all send windows, used by main window quit

loadingHdrsBusy = threadtools.ThreadCounter() # only 1

deletingBusy = threadtools.ThreadCounter() # only 1

loadingMsgsBusy = threadtools.ThreadCounter() # poss many

sendingBusy = threadtools.ThreadCounter() # poss many
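
The busy counters above are later tested in Boolean contexts (for example, if self.openFileBusy: in the list window code). ThreadCounter itself is defined in the threadtools module and is not shown in this section, so the following is only a hypothetical stand-in, named BusyCounter here, that assumes a lock-protected integer whose __len__ method drives truth tests.

```python
import threading

class BusyCounter:
    """Hypothetical stand-in for a thread-safe busy counter."""
    def __init__(self):
        self.count = 0
        self.lock = threading.Lock()
    def incr(self):
        with self.lock:                  # updates may come from any thread
            self.count += 1
    def decr(self):
        with self.lock:
            self.count -= 1
    def __len__(self):                   # makes "if counter:" mean "count > 0"
        return self.count

busy = BusyCounter()
before = bool(busy)                      # nothing running yet

workers = [threading.Thread(target=busy.incr) for _ in range(10)]
for t in workers:
    t.start()
for t in workers:
    t.join()
during = len(busy)                       # ten operations marked busy

for _ in range(10):
    busy.decr()
after = bool(busy)                       # all finished again
print(before, during, after)
```

A counter rather than a simple flag lets several overlapping operations share one busy state: the object is false again only after every increment has been matched by a decrement.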

ListWindows: Message List Windows

The code in Example 14-3 implements mail index list windows: one for the server inbox and one or more for local save-mail files. These two types of windows look and behave largely the same, and in fact share most of their code in a common superclass. The window subclasses mostly just customize the superclass to map mail Load and Delete calls to the server or a local file.
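
The division of labor described above can be sketched without any GUI code. In the following, the class names ListCommon, FileList, and ServerList and their in-memory storage are assumptions for illustration only; the real classes appear in Example 14-3.

```python
class ListCommon:
    """Shared logic; storage-specific steps are left to subclasses."""
    def show(self, msgnum):
        return 'viewing: ' + self.getMessage(msgnum)     # same for every list kind
    def delete(self, msgnums):
        self.doDelete(msgnums)
        return self.count()
    # subclass requirements
    def getMessage(self, msgnum): raise NotImplementedError
    def doDelete(self, msgnums):  raise NotImplementedError
    def count(self):              raise NotImplementedError

class FileList(ListCommon):
    """Maps actions to an in-memory list standing in for a save file."""
    def __init__(self, msgs):
        self.msgs = list(msgs)
    def getMessage(self, msgnum):
        return self.msgs[msgnum - 1]                     # message numbers are 1..N
    def doDelete(self, msgnums):
        self.msgs = [m for (i, m) in enumerate(self.msgs, start=1)
                     if i not in msgnums]
    def count(self):
        return len(self.msgs)

class ServerList(ListCommon):
    """Maps actions to a dict standing in for a remote inbox."""
    def __init__(self, inbox):
        self.inbox = dict(inbox)
    def getMessage(self, msgnum):
        return self.inbox[msgnum]
    def doDelete(self, msgnums):
        for n in msgnums:
            del self.inbox[n]
    def count(self):
        return len(self.inbox)

files = FileList(['spam', 'eggs', 'ham'])
server = ServerList({1: 'toast', 2: 'jam'})
shown = [files.show(2), server.show(2)]
left = [files.delete([1, 3]), server.delete([1])]
print(shown, left)
```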

List windows are created on program startup (the initial server window, and possible save-file windows for command-line options), as well as in response to Open button actions in existing list windows (for opening new save-file list windows). See the Open button’s callback in this example for initiation code.
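
Because save-file windows can be opened both at startup and on demand, the program needs a way to avoid opening the same file twice. A minimal sketch of a one-window-per-file registry follows; FakeWindow, open_windows, and open_mail_file are hypothetical names used only for this illustration, with a plain object standing in for a real list window.

```python
import os

open_windows = {}          # hypothetical registry: {absolute filename: window}

class FakeWindow:
    """Stand-in for a list window; records how often it was raised."""
    def __init__(self, filename):
        self.filename = filename
        self.lifts = 0
    def lift(self):
        self.lifts += 1

def open_mail_file(filename):
    filename = os.path.abspath(filename)     # match on full, normalized name
    if filename in open_windows:             # only one window per file
        open_windows[filename].lift()        # raise the existing window
    else:
        open_windows[filename] = FakeWindow(filename)
    return open_windows[filename]

first = open_mail_file('saved.txt')
second = open_mail_file(os.path.join('.', 'saved.txt'))   # same file, other spelling
print(first is second, first.lifts, len(open_windows))
```

Normalizing the name before the lookup matters: without it, two spellings of one path would produce two windows over the same file.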

Notice that the basic mail processing operations in the mailtools package from Chapter 13 are mixed into PyMailGUI in a variety of ways. The list window classes in Example 14-3 inherit from the mailtools mail parser class, but the server list window class embeds an instance of the message cache object, which in turn inherits from the mailtools mail fetcher. The mailtools mail sender class is inherited by message write windows, not list windows; view windows also inherit from the mail parser.
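
The difference between inheriting a tool class and embedding an instance of one can be shown with small stand-ins. Everything in this sketch (Parser, Fetcher, Cache, ServerList, and their methods) is hypothetical and far simpler than the mailtools classes; it only mirrors the shape of the relationships described above.

```python
class Parser:
    """Stand-in for a mail parser class (hypothetical)."""
    def parse(self, text):
        headers, _, body = text.partition('\n\n')
        return dict(line.split(': ', 1) for line in headers.splitlines()), body

class Fetcher:
    """Stand-in for a mail fetcher class (hypothetical)."""
    def __init__(self, store):
        self.store = store
    def download(self, msgnum):
        return self.store[msgnum]

class Cache(Fetcher):                        # the cache *is a* fetcher: inheritance
    def __init__(self, store):
        Fetcher.__init__(self, store)
        self.loaded = {}
        self.downloads = 0
    def get(self, msgnum):
        if msgnum not in self.loaded:        # fetch each message at most once
            self.loaded[msgnum] = self.download(msgnum)
            self.downloads += 1
        return self.loaded[msgnum]

class ServerList(Parser):                    # the list window *is a* parser...
    def __init__(self, store):
        self.cache = Cache(store)            # ...and *has a* cache: embedding
    def subject(self, msgnum):
        headers, body = self.parse(self.cache.get(msgnum))
        return headers['Subject']

win = ServerList({1: 'From: a@b.c\nSubject: hello\n\nbody text\n'})
subjects = [win.subject(1), win.subject(1)]
print(subjects, win.cache.downloads)
```

Embedding keeps the cache's state and methods behind one attribute, while inheritance makes the parser's methods directly available on the window object.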

This is a fairly large file; in principle it could be split into three files, one for each class, but these classes are so closely related that it is handy to have their code in a single file for edits. Really, this is one class, with two minor extensions.

Example 14-3. PP4E\Internet\Email\PyMailGui\ListWindows.py

"""

###############################################################################

Implementation of mail-server and save-file message list main windows:

one class per kind. Code is factored here for reuse: server and file

list windows are customized versions of the PyMailCommon list window class;

the server window maps actions to mail transferred from a server, and the

file window applies actions to a local file.

List windows create View, Write, Reply, and Forward windows on user actions.

The server list window is the main window opened on program startup by the

top-level file; file list windows are opened on demand via server and file

list window "Open". Msgnums may be temporarily out of sync with server if

POP inbox changes (triggers full reload here).

Changes here in 2.1:

-now checks on deletes and loads to see if msg nums in sync with server

-added up to N attachment direct-access buttons on view windows

-threaded save-mail file loads, to avoid N-second pause for big files

-also threads save-mail file deletes so file write doesn't pause GUI

TBD:

-save-mail file saves still not threaded: may pause GUI briefly, but

uncommon - unlike load and delete, save/send only appends the local file.

-implementation of local save-mail files as text files with separators

is mostly a prototype: it loads all full mails into memory, and so limits

the practical size of these files; better alternative: use 2 DBM keyed

access files for hdrs and fulltext, plus a list to map keys to position;

in this scheme save-mail files become directories, no longer readable.

###############################################################################

"""

from SharedNames import * # program-wide global objects

from ViewWindows import ViewWindow, WriteWindow, ReplyWindow, ForwardWindow

###############################################################################

# main frame - general structure for both file and server message lists

###############################################################################

class PyMailCommon(mailtools.MailParser):

"""

an abstract widget package, with main mail listbox;

mixed in with a Tk, Toplevel, or Frame by top-level window classes;

must be customized in mode-specific subclass with actions() and other;

creates view and write windows on demand: they serve as MailSenders;

"""

# class attrs shared by all list windows

threadLoopStarted = False # started by first window

queueChecksPerSecond = 20 # tweak if CPU use too high

queueDelay = 1000 // queueChecksPerSecond # min msecs between timer events

queueBatch = 5 # max callbacks per timer event

# all windows use same dialogs: remember last dirs

openDialog = Open(title=appname + ': Open Mail File')

saveDialog = SaveAs(title=appname + ': Append Mail File')

# 3.0: avoid downloading (fetching) same message in parallel

beingFetched = set()

def __init__(self):

self.makeWidgets() # draw my contents: list,tools

if not PyMailCommon.threadLoopStarted:

# start thread exit check loop

# a timer event loop that dispatches queued GUI callbacks;

# just one loop for all windows: server,file,views can all thread;

# self is a Tk, Toplevel,or Frame: any widget type will suffice;

# 3.0/4E: added queue delay/batch for progress speedup: ~100x/sec;

PyMailCommon.threadLoopStarted = True

threadtools.threadChecker(self, self.queueDelay, self.queueBatch)

def makeWidgets(self):

# add all/none checkbtn at bottom

tools = Frame(self, relief=SUNKEN, bd=2, cursor='hand2') # 3.0: configs

tools.pack(side=BOTTOM, fill=X)

self.allModeVar = IntVar()

chk = Checkbutton(tools, text="All")

chk.config(variable=self.allModeVar, command=self.onCheckAll)

chk.pack(side=RIGHT)

# add main buttons at bottom toolbar

for (title, callback) in self.actions():

if not callback:

sep = Label(tools, text=title) # 3.0: separator

sep.pack(side=LEFT, expand=YES, fill=BOTH) # expands with window

else:

Button(tools, text=title, command=callback).pack(side=LEFT)

# add multiselect listbox with scrollbars

listwide = mailconfig.listWidth or 74 # 3.0: config start size

listhigh = mailconfig.listHeight or 15 # wide=chars, high=lines

mails = Frame(self)

vscroll = Scrollbar(mails)

hscroll = Scrollbar(mails, orient='horizontal')

fontsz = (sys.platform[:3] == 'win' and 8) or 10 # defaults

listbg = mailconfig.listbg or 'white'

listfg = mailconfig.listfg or 'black'

listfont = mailconfig.listfont or ('courier', fontsz, 'normal')

listbox = Listbox(mails, bg=listbg, fg=listfg, font=listfont)

listbox.config(selectmode=EXTENDED)

listbox.config(width=listwide, height=listhigh) # 3.0: init wider

listbox.bind('<Double-1>', (lambda event: self.onViewRawMail()))

# crosslink listbox and scrollbars

vscroll.config(command=listbox.yview, relief=SUNKEN)

hscroll.config(command=listbox.xview, relief=SUNKEN)

listbox.config(yscrollcommand=vscroll.set, relief=SUNKEN)

listbox.config(xscrollcommand=hscroll.set)

# pack last = clip first

mails.pack(side=TOP, expand=YES, fill=BOTH)

vscroll.pack(side=RIGHT, fill=BOTH)

hscroll.pack(side=BOTTOM, fill=BOTH)

listbox.pack(side=LEFT, expand=YES, fill=BOTH)

self.listBox = listbox

#################

# event handlers

#################

def onCheckAll(self):

# all or none click

if self.allModeVar.get():

self.listBox.select_set(0, END)

else:

self.listBox.select_clear(0, END)

def onViewRawMail(self):

# possibly threaded: view selected messages - raw text headers, body

msgnums = self.verifySelectedMsgs()

if msgnums:

self.getMessages(msgnums, after=lambda: self.contViewRaw(msgnums))

def contViewRaw(self, msgnums, pyedit=True): # do we need full TextEditor?

for msgnum in msgnums: # could be a nested def

fulltext = self.getMessage(msgnum) # fulltext is Unicode decoded

if not pyedit:

# display in a scrolledtext

from tkinter.scrolledtext import ScrolledText

window = windows.QuietPopupWindow(appname, 'raw message viewer')

browser = ScrolledText(window)

browser.insert('0.0', fulltext)

browser.pack(expand=YES, fill=BOTH)

else:

# 3.0/4E: more useful PyEdit text editor

wintitle = ' - raw message text'

browser = textEditor.TextEditorMainPopup(self, winTitle=wintitle)

browser.update()

browser.setAllText(fulltext)

browser.clearModified()

def onViewFormatMail(self):

"""

possibly threaded: view selected messages - pop up formatted display

not threaded if in savefile list, or messages are already loaded

the after action runs only if getMessages prefetch allowed and worked

"""

msgnums = self.verifySelectedMsgs()

if msgnums:

self.getMessages(msgnums, after=lambda: self.contViewFmt(msgnums))

def contViewFmt(self, msgnums):

"""

finish View: extract main text, popup view window(s) to display;

extracts plain text from html text if required, wraps text lines;

html mails: show extracted text, then save in temp file and open

in web browser; part can also be opened manually from view window

Split or part button; if non-multipart, other: part must be opened

manually with Split or part button; verify html open per mailconfig;

3.0: for html-only mails, main text is str here, but save its raw

bytes in binary mode to finesse encodings; worth the effort because

many mails are just html today; this first tried N encoding guesses

(utf-8, latin-1, platform dflt), but now gets and saves raw bytes to

minimize any fidelity loss; if a part is later opened on demand, it

is saved in a binary file as raw bytes in the same way;

caveat: the spawned web browser won't have any original email headers:

it may still have to guess or be told the encoding, unless the html

already has its own encoding headers (these take the form of <meta>

html tags within <head> sections if present; none are inserted in the

html here, as some well-formed html parts have them); IE seems to

handle most html part files anyhow; always encoding html parts to

utf-8 may suffice too: this encoding can handle most types of text;

"""

for msgnum in msgnums:

fulltext = self.getMessage(msgnum) # 3.0: str for parser

message = self.parseMessage(fulltext)

type, content = self.findMainText(message) # 3.0: Unicode decoded

if type in ['text/html', 'text/xml']: # 3.0: get plain text

content = html2text.html2text(content)

content = wraplines.wrapText1(content, mailconfig.wrapsz)

ViewWindow(headermap = message,

showtext = content,

origmessage = message) # 3.0: decodes headers

# non-multipart, content-type text/HTML (rude but true!)

if type == 'text/html':

if ((not mailconfig.verifyHTMLTextOpen) or

askyesno(appname, 'Open message text in browser?')):

# 3.0: get post mime decode, pre unicode decode bytes

type, asbytes = self.findMainText(message, asStr=False)

try:

from tempfile import gettempdir # or a Tk HTML viewer?

tempname = os.path.join(gettempdir(), 'pymailgui.html')

tmp = open(tempname, 'wb') # already encoded

tmp.write(asbytes)

tmp.close() # flush to disk before the browser reads it

webbrowser.open_new('file://' + tempname)

except:

showerror(appname, 'Cannot open in browser')

def onWriteMail(self):

"""

compose a new email from scratch, without fetching others;

nothing to quote here, but adds sig, and prefills Bcc with the

sender's address if this optional header enabled in mailconfig;

From may be i18N encoded in mailconfig: view window will decode;

"""

starttext = '\n' # use auto signature text

if mailconfig.mysignature:

starttext += '%s\n' % mailconfig.mysignature

From = mailconfig.myaddress

WriteWindow(starttext = starttext,

headermap = dict(From=From, Bcc=From)) # 3.0: prefill bcc

def onReplyMail(self):

# possibly threaded: reply to selected emails

msgnums = self.verifySelectedMsgs()

if msgnums:

self.getMessages(msgnums, after=lambda: self.contReply(msgnums))

def contReply(self, msgnums):

"""

finish Reply: drop attachments, quote with '>', add signature;

presets initial to/from values from mail or config module;

don't use original To for From: may be many or a listname;

To keeps name+<addr> format even if ',' separator in name;

Uses original From for To, ignores reply-to header if any;

3.0: replies also copy to all original recipients by default;

3.0: now uses getaddresses/parseaddr full parsing to separate

addrs on commas, and handle any commas that appear nested in

email name parts; multiple addresses are separated by comma

in GUI, we copy comma separators when displaying headers, and

we use getaddresses to split addrs as needed; ',' is required

by servers for separator; no longer uses parseaddr to get 1st

name/addr pair of getaddresses result: use full From for To;

3.0: we decode the Subject header here because we need its text,

but the view window superclass of edit windows performs decoding

on all displayed headers (the extra Subject decode is a no-op);

on sends, all non-ASCII hdrs and hdr email names are in decoded

form in the GUI, but are encoded within the mailtools package;

quoteOrigText also decodes the initial headers it inserts into

the quoted text block, and index lists decode for display;

"""

for msgnum in msgnums:

fulltext = self.getMessage(msgnum)

message = self.parseMessage(fulltext) # may fail: error obj

maintext = self.formatQuotedMainText(message) # same as forward

# from and to are decoded by view window

From = mailconfig.myaddress # not original To

To = message.get('From', '') # 3.0: ',' sept

Cc = self.replyCopyTo(message) # 3.0: cc all recipients?

Subj = message.get('Subject', '(no subject)')

Subj = self.decodeHeader(Subj) # decode for str

if Subj[:4].lower() != 're: ': # 3.0: unify case

Subj = 'Re: ' + Subj

ReplyWindow(starttext = maintext,

headermap =

dict(From=From, To=To, Cc=Cc, Subject=Subj, Bcc=From))

def onFwdMail(self):

# possibly threaded: forward selected emails

msgnums = self.verifySelectedMsgs()

if msgnums:

self.getMessages(msgnums, after=lambda: self.contFwd(msgnums))

def contFwd(self, msgnums):

"""

finish Forward: drop attachments, quote with '>', add signature;

see notes about headers decoding in the Reply action methods;

view window superclass will decode the From header we pass here;

"""

for msgnum in msgnums:

fulltext = self.getMessage(msgnum)

message = self.parseMessage(fulltext)

maintext = self.formatQuotedMainText(message) # same as reply

# initial From value from config, not mail

From = mailconfig.myaddress # encoded or not

Subj = message.get('Subject', '(no subject)')

Subj = self.decodeHeader(Subj) # 3.0: send encodes

if Subj[:5].lower() != 'fwd: ': # 3.0: unify case

Subj = 'Fwd: ' + Subj

ForwardWindow(starttext = maintext,

headermap = dict(From=From, Subject=Subj, Bcc=From))

def onSaveMailFile(self):

"""

save selected emails to file for offline viewing;

disabled if target file load/delete is in progress;

disabled by getMessages if self is a busy file too;

contSave not threaded: disables all other actions;

"""

msgnums = self.selectedMsgs()

if not msgnums:

showerror(appname, 'No message selected')

else:

# caveat: dialog warns about replacing file

filename = self.saveDialog.show() # shared class attr

if filename: # don't verify num msgs

filename = os.path.abspath(filename) # normalize / to \

self.getMessages(msgnums,

after=lambda: self.contSave(msgnums, filename))

def contSave(self, msgnums, filename):

# test busy now, after poss srvr msgs load

if (filename in openSaveFiles.keys() and # viewing this file?

openSaveFiles[filename].openFileBusy): # load/del occurring?

showerror(appname, 'Target file busy - cannot save')

else:

try: # caveat:not threaded

fulltextlist = [] # 3.0: use encoding

mailfile = open(filename, 'a', encoding=mailconfig.fetchEncoding)

for msgnum in msgnums: # < 1sec for N megs

fulltext = self.getMessage(msgnum) # but poss many msgs

if fulltext[-1] != '\n': fulltext += '\n'

mailfile.write(saveMailSeparator)

mailfile.write(fulltext)

fulltextlist.append(fulltext)

mailfile.close()

except:

showerror(appname, 'Error during save')

printStack(sys.exc_info())

else: # why .keys(): EIBTI

if filename in openSaveFiles.keys(): # viewing this file?

window = openSaveFiles[filename] # update list, raise

window.addSavedMails(fulltextlist) # avoid file reload

#window.loadMailFileThread() # this was very slow

def onOpenMailFile(self, filename=None):

# process saved mail offline

filename = filename or self.openDialog.show() # shared class attr

if filename:

filename = os.path.abspath(filename) # match on full name

if filename in openSaveFiles.keys(): # only 1 win per file

openSaveFiles[filename].lift() # raise file's window

showinfo(appname, 'File already open') # else deletes odd

else:

from PyMailGui import PyMailFileWindow # avoid duplicate win

popup = PyMailFileWindow(filename) # new list window

openSaveFiles[filename] = popup # removed in quit

popup.loadMailFileThread() # try load in thread

def onDeleteMail(self):

# delete selected mails from server or file

msgnums = self.selectedMsgs() # subclass: fillIndex

if not msgnums: # always verify here

showerror(appname, 'No message selected')

else:

if askyesno(appname, 'Verify delete %d mails?' % len(msgnums)):

self.doDelete(msgnums)

##################

# utility methods

##################

def selectedMsgs(self):

# get messages selected in main listbox

selections = self.listBox.curselection() # tuple of digit strs, 0..N-1

return [int(x)+1 for x in selections] # convert to ints, make 1..N

warningLimit = 15

def verifySelectedMsgs(self):

msgnums = self.selectedMsgs()

if not msgnums:

showerror(appname, 'No message selected')

else:

numselects = len(msgnums)

if numselects > self.warningLimit:

if not askyesno(appname, 'Open %d selections?' % numselects):

msgnums = []

return msgnums

def fillIndex(self, maxhdrsize=25):

"""

fill all of main listbox from message header mappings;

3.0: decode headers per email/mime/unicode here if encoded;

3.0: caveat: large Chinese characters can break '|' alignment;

"""

hdrmaps = self.headersMaps() # may be empty

showhdrs = ('Subject', 'From', 'Date', 'To') # default hdrs to show

if hasattr(mailconfig, 'listheaders'): # mailconfig customizes

showhdrs = mailconfig.listheaders or showhdrs

addrhdrs = ('From', 'To', 'Cc', 'Bcc') # 3.0: decode i18n specially

# compute max field sizes <= hdrsize

maxsize = {}

for key in showhdrs:

allLens = [] # too big for a list comp!

for msg in hdrmaps:

keyval = msg.get(key, ' ')

if key not in addrhdrs:

allLens.append(len(self.decodeHeader(keyval)))

else:

allLens.append(len(self.decodeAddrHeader(keyval)))

if not allLens: allLens = [1]

maxsize[key] = min(maxhdrsize, max(allLens))

# populate listbox with fixed-width left-justified fields

self.listBox.delete(0, END) # show multiparts with *

for (ix, msg) in enumerate(hdrmaps): # via content-type hdr

msgtype = msg.get_content_maintype() # no is_multipart yet

msgline = (msgtype == 'multipart' and '*') or ' '

msgline += '%03d' % (ix+1)

for key in showhdrs:

mysize = maxsize[key]

if key not in addrhdrs:

keytext = self.decodeHeader(msg.get(key, ' '))

else:

keytext = self.decodeAddrHeader(msg.get(key, ' '))

msgline += ' | %-*s' % (mysize, keytext[:mysize])

msgline += '| %.1fK' % (self.mailSize(ix+1) / 1024) # 3.0: .0 optional

self.listBox.insert(END, msgline)

self.listBox.see(END) # show most recent mail=last line

def replyCopyTo(self, message):

"""

3.0: replies copy all original recipients, by prefilling

Cc header with all addresses in original To and Cc after

removing duplicates and new sender; could decode i18n addrs

here, but the view window will decode to display (and send

will reencode) and the unique set filtering here will work

either way, though a sender's i18n address is assumed to be

in encoded form in mailconfig (else it is not removed here);

empty To or Cc headers are okay: split returns empty lists;

"""

if not mailconfig.repliesCopyToAll:

# reply to sender only

Cc = ''

else:

# copy all original recipients (3.0)

allRecipients = (self.splitAddresses(message.get('To', '')) +

self.splitAddresses(message.get('Cc', '')))

uniqueOthers = set(allRecipients) - set([mailconfig.myaddress])

Cc = ', '.join(uniqueOthers)

return Cc or '?'

def formatQuotedMainText(self, message):

"""

3.0: factor out common code shared by Reply and Forward:

fetch decoded text, extract text if html, line wrap, add > quote

"""

type, maintext = self.findMainText(message) # 3.0: decoded str

if type in ['text/html', 'text/xml']: # 3.0: get plain text

maintext = html2text.html2text(maintext)

maintext = wraplines.wrapText1(maintext, mailconfig.wrapsz-2) # 2 = '> '

maintext = self.quoteOrigText(maintext, message) # add hdrs, >

if mailconfig.mysignature:

maintext = ('\n%s\n' % mailconfig.mysignature) + maintext

return maintext

def quoteOrigText(self, maintext, message):

"""

3.0: we need to decode any i18n (internationalized) headers here too,

or they show up in email+MIME encoded form in the quoted text block;

decodeAddrHeader works on one addr or all in a comma-separated list;

this may trigger full text encoding on sends, but the main text is

also already in fully decoded form: could be in any Unicode scheme;

"""

quoted = '\n-----Original Message-----\n'

for hdr in ('From', 'To', 'Subject', 'Date'):

rawhdr = message.get(hdr, '?')

if hdr not in ('From', 'To'):

dechdr = self.decodeHeader(rawhdr) # full value

else:

dechdr = self.decodeAddrHeader(rawhdr) # name parts only

quoted += '%s: %s\n' % (hdr, dechdr)

quoted += '\n' + maintext

quoted = '\n' + quoted.replace('\n', '\n> ')

return quoted

########################

# subclass requirements

########################

def getMessages(self, msgnums, after): # used by view,save,reply,fwd

after() # redef if cache, thread test

# plus okayToQuit?, any unique actions

def getMessage(self, msgnum): assert False # used by many: full mail text

def headersMaps(self): assert False # fillIndex: hdr mappings list

def mailSize(self, msgnum): assert False # fillIndex: size of msgnum

def doDelete(self, msgnums): assert False # onDeleteMail: delete button

###############################################################################

# main window - when viewing messages in local save file (or sent-mail file)

###############################################################################

class PyMailFile(PyMailCommon):

"""

customize PyMailCommon for viewing saved-mail file offline;

mixed with a Tk, Toplevel, or Frame, adds main mail listbox;

maps load, fetch, delete actions to local text file storage;

file opens and deletes here run in threads for large files;

save and send not threaded, because only append to file; save

is disabled if source or target file busy with load/delete;

save disables load, delete, save just because it is not run

in a thread (blocks GUI);

TBD: may need thread and O/S file locks if saves ever do run in

threads: saves could disable other threads with openFileBusy, but

file may not be open in GUI; file locks not sufficient, because

GUI updated too; TBD: appends to sent-mail file may require O/S

locks: as is, user gets error pop up if sent during load/del;

3.0: mail save files are now Unicode text, encoded per an encoding

name setting in the mailconfig module; this may not support worst

case scenarios of unusual or mixed encodings, but most full mail

text is ascii, and the Python 3.1 email package is partly broken;

"""

def actions(self):

return [ ('Open', self.onOpenMailFile),

('Write', self.onWriteMail),

(' ', None), # 3.0: separators

('View', self.onViewFormatMail),

('Reply', self.onReplyMail),

('Fwd', self.onFwdMail),

('Save', self.onSaveMailFile),

('Delete', self.onDeleteMail),

(' ', None),

('Quit', self.quit) ]

def __init__(self, filename):

# caller: do loadMailFileThread next

PyMailCommon.__init__(self)

self.filename = filename

self.openFileBusy = threadtools.ThreadCounter() # one per window

def loadMailFileThread(self):

"""

load or reload file and update window index list;

called on Open, startup, and possibly on Send if

sent-mail file appended is currently open; there

is always a bogus first item after the text split;

alt: [self.parseHeaders(m) for m in self.msglist];

could pop up a busy dialog, but quick for small files;

2.1: this is now threaded--else runs < 1sec for N meg

files, but can pause GUI N seconds if very large file;

Save now uses addSavedMails to append msg lists for

speed, not this reload; still called from Send just

because msg text unavailable - requires refactoring;

delete threaded too: prevent open and delete overlap;

"""

if self.openFileBusy:

# don't allow parallel open/delete changes

errmsg = 'Cannot load, file is busy:\n"%s"' % self.filename

showerror(appname, errmsg)

else:

#self.listBox.insert(END, 'loading...') # error if user clicks

savetitle = self.title() # set by window class

self.title(appname + ' - ' + 'Loading...')

self.openFileBusy.incr()

threadtools.startThread(

action = self.loadMailFile,

args = (),

context = (savetitle,),

onExit = self.onLoadMailFileExit,

onFail = self.onLoadMailFileFail)

def loadMailFile(self):

# run in a thread while GUI is active

# open, read, parser may all raise excs: caught in thread utility

file = open(self.filename, 'r', encoding=mailconfig.fetchEncoding) # 3.0

allmsgs = file.read()

self.msglist = allmsgs.split(saveMailSeparator)[1:] # full text

self.hdrlist = list(map(self.parseHeaders, self.msglist)) # msg objects

def onLoadMailFileExit(self, savetitle):

# on thread success

self.title(savetitle) # reset window title to filename

self.fillIndex() # updates GUI: do in main thread

self.lift() # raise my window

self.openFileBusy.decr()

def onLoadMailFileFail(self, exc_info, savetitle):

# on thread exception

showerror(appname, 'Error opening "%s"\n%s\n%s' %

((self.filename,) + exc_info[:2]))

printStack(exc_info)

self.destroy() # always close my window?

self.openFileBusy.decr() # not needed if destroy

def addSavedMails(self, fulltextlist):

"""

optimization: extend loaded file lists for mails

newly saved to this window's file; in past called

loadMailFileThread to reload entire file on save - slow;

must be called in main GUI thread only: updates GUI;

sends still reloads sent file if open: no msg text;

"""

self.msglist.extend(fulltextlist)

self.hdrlist.extend(map(self.parseHeaders, fulltextlist)) # 3.x iter ok

self.fillIndex()

self.lift()

def doDelete(self, msgnums):

"""

simple-minded, but sufficient: rewrite all

nondeleted mails to file; can't just delete

from self.msglist in-place: changes item indexes;

Py2.3 enumerate(L) same as zip(range(len(L)), L)

2.1: now threaded, else N sec pause for large files

"""

if self.openFileBusy:

# don't allow parallel open/delete changes

errmsg = 'Cannot delete, file is busy:\n"%s"' % self.filename

showerror(appname, errmsg)

else:

savetitle = self.title()

self.title(appname + ' - ' + 'Deleting...')

self.openFileBusy.incr()

threadtools.startThread(

action = self.deleteMailFile,

args = (msgnums,),

context = (savetitle,),

onExit = self.onDeleteMailFileExit,

onFail = self.onDeleteMailFileFail)

def deleteMailFile(self, msgnums):

# run in a thread while GUI active

indexed = enumerate(self.msglist)

keepers = [msg for (ix, msg) in indexed if ix+1 not in msgnums]

allmsgs = saveMailSeparator.join([''] + keepers)

file = open(self.filename, 'w', encoding=mailconfig.fetchEncoding) # 3.0

file.write(allmsgs)

self.msglist = keepers

self.hdrlist = list(map(self.parseHeaders, self.msglist))

def onDeleteMailFileExit(self, savetitle):

self.title(savetitle)

self.fillIndex() # updates GUI: do in main thread

self.lift() # reset my title, raise my window

self.openFileBusy.decr()

def onDeleteMailFileFail(self, exc_info, savetitle):

showerror(appname, 'Error deleting "%s"\n%s\n%s' %

((self.filename,) + exc_info[:2]))

printStack(exc_info)

self.destroy() # always close my window?

self.openFileBusy.decr() # not needed if destroy

def getMessages(self, msgnums, after):

"""

used by view,save,reply,fwd: file load and delete

threads may change the msg and hdr lists, so disable

all other operations that depend on them to be safe;

this test is for self: saves also test target file;

"""

if self.openFileBusy:

errmsg = 'Cannot fetch, file is busy:\n"%s"' % self.filename

showerror(appname, errmsg)

else:

after() # mail already loaded

def getMessage(self, msgnum):

return self.msglist[msgnum-1] # full text of 1 mail

def headersMaps(self):

return self.hdrlist # email.message.Message objects

def mailSize(self, msgnum):

return len(self.msglist[msgnum-1])

def quit(self):

# don't destroy during update: fillIndex next

if self.openFileBusy:

showerror(appname, 'Cannot quit during load or delete')

else:

if askyesno(appname, 'Verify Quit Window?'):

# delete file from open list

del openSaveFiles[self.filename]

Toplevel.destroy(self)

###############################################################################

# main window - when viewing messages on the mail server

###############################################################################

class PyMailServer(PyMailCommon):

"""

customize PyMailCommon for viewing mail still on server;

mixed with a Tk, Toplevel, or Frame, adds main mail listbox;

maps load, fetch, delete actions to email server inbox;

embeds a MessageCache, which is a mailtools MailFetcher;

"""

def actions(self):

return [ ('Load', self.onLoadServer),

('Open', self.onOpenMailFile),

('Write', self.onWriteMail),

(' ', None), # 3.0: separators

('View', self.onViewFormatMail),

('Reply', self.onReplyMail),

('Fwd', self.onFwdMail),

('Save', self.onSaveMailFile),

('Delete', self.onDeleteMail),

(' ', None),

('Quit', self.quit) ]

def __init__(self):

PyMailCommon.__init__(self)

self.cache = messagecache.GuiMessageCache() # embedded, not inherited

#self.listBox.insert(END, 'Press Load to fetch mail')

def makeWidgets(self): # help bar: main win only

self.addHelpBar()

PyMailCommon.makeWidgets(self)

def addHelpBar(self):

msg = 'PyMailGUI - a Python/tkinter email client (help)'

title = Button(self, text=msg)

title.config(bg='steelblue', fg='white', relief=RIDGE)

title.config(command=self.onShowHelp)

title.pack(fill=X)

def onShowHelp(self):

"""

load,show text block string

3.0: now uses HTML and webbrowser module here too

user setting in mailconfig selects text, HTML, or both

always displays one or the other: html if both false

"""

if mailconfig.showHelpAsText:

from PyMailGuiHelp import helptext

popuputil.HelpPopup(appname, helptext, showsource=self.onShowMySource)

if mailconfig.showHelpAsHTML or (not mailconfig.showHelpAsText):

from PyMailGuiHelp import showHtmlHelp

showHtmlHelp() # 3.0: HTML version without source file links

def onShowMySource(self, showAsMail=False):

"""

display my sourcecode file, plus imported modules here & elsewhere

"""

import PyMailGui, ListWindows, ViewWindows, SharedNames, textConfig

from PP4E.Internet.Email.mailtools import ( # mailtools now a pkg

mailSender, mailFetcher, mailParser) # can't use * in def

mymods = (

PyMailGui, ListWindows, ViewWindows, SharedNames,

PyMailGuiHelp, popuputil, messagecache, wraplines, html2text,

mailtools, mailFetcher, mailSender, mailParser,

mailconfig, textConfig, threadtools, windows, textEditor)

for mod in mymods:

source = mod.__file__

if source.endswith('.pyc'):

source = source[:-4] + '.py' # assume a .py in same dir

if showAsMail:

# this is a bit cheesy...

code = open(source).read() # 3.0: platform encoding

user = mailconfig.myaddress

hdrmap = {'From': appname, 'To': user, 'Subject': mod.__name__}

ViewWindow(showtext=code,

headermap=hdrmap,

origmessage=email.message.Message())

else:

# more useful PyEdit text editor

# 4E: assume in UTF8 Unicode encoding (else PyEdit may ask!)

wintitle = ' - ' + mod.__name__

textEditor.TextEditorMainPopup(self, source, wintitle, 'utf-8')

def onLoadServer(self, forceReload=False):

"""

threaded: load or reload mail headers list on request;

Exit,Fail,Progress run by threadChecker after callback via queue;

load may overlap with sends, but disables all but send;

could overlap with loadingMsgs, but may change msg cache list;

forceReload on delete/synch fail, else loads recent arrivals only;

2.1: cache.loadHeaders may do quick check to see if msgnums

in synch with server, if we are loading just newly arrived hdrs;

"""

if loadingHdrsBusy or deletingBusy or loadingMsgsBusy:

showerror(appname, 'Cannot load headers during load or delete')

else:

loadingHdrsBusy.incr()

self.cache.setPopPassword(appname) # don't update GUI in the thread!

popup = popuputil.BusyBoxNowait(appname, 'Loading message headers')

threadtools.startThread(

action = self.cache.loadHeaders,

args = (forceReload,),

context = (popup,),

onExit = self.onLoadHdrsExit,

onFail = self.onLoadHdrsFail,

onProgress = self.onLoadHdrsProgress)

def onLoadHdrsExit(self, popup):

self.fillIndex()

popup.quit()

self.lift()

loadingHdrsBusy.decr() # allow other actions to run

def onLoadHdrsFail(self, exc_info, popup):

popup.quit()

showerror(appname, 'Load failed: \n%s\n%s' % exc_info[:2])

printStack(exc_info) # send stack trace to stdout

loadingHdrsBusy.decr()

if exc_info[0] == mailtools.MessageSynchError: # synch inbox/index

self.onLoadServer(forceReload=True) # new thread: reload

else:

self.cache.popPassword = None # force re-input next time

def onLoadHdrsProgress(self, i, n, popup):

popup.changeText('%d of %d' % (i, n))

def doDelete(self, msgnumlist):

"""

threaded: delete from server now - changes msg nums;

may overlap with sends only, disables all except sends;

2.1: cache.deleteMessages now checks TOP result to see

if headers match selected mails, in case msgnums out of

synch with mail server: poss if mail deleted by other client,

or server deletes inbox mail automatically - some ISPs may

move a mail from inbox to undeliverable on load failure;

"""

if loadingHdrsBusy or deletingBusy or loadingMsgsBusy:

showerror(appname, 'Cannot delete during load or delete')

else:

deletingBusy.incr()

popup = popuputil.BusyBoxNowait(appname, 'Deleting selected mails')

threadtools.startThread(

action = self.cache.deleteMessages,

args = (msgnumlist,),

context = (popup,),

onExit = self.onDeleteExit,

onFail = self.onDeleteFail,

onProgress = self.onDeleteProgress)

def onDeleteExit(self, popup):

self.fillIndex() # no need to reload from server

popup.quit() # refill index with updated cache

self.lift() # raise index window, release lock

deletingBusy.decr()

def onDeleteFail(self, exc_info, popup):

popup.quit()

showerror(appname, 'Delete failed: \n%s\n%s' % exc_info[:2])

printStack(exc_info)

deletingBusy.decr() # delete or synch check failure

self.onLoadServer(forceReload=True) # new thread: some msgnums changed

def onDeleteProgress(self, i, n, popup):

popup.changeText('%d of %d' % (i, n))

def getMessages(self, msgnums, after):

"""

threaded: prefetch all selected messages into cache now;

used by save, view, reply, and forward to prefill cache;

may overlap with other loadmsgs and sends, disables delete,load;

only runs "after" action if the fetch allowed and successful;

2.1: cache.getMessages tests if index in synch with server,

but we only test if we have to go to server, not if cached;

3.0: see messagecache note: now avoids potential fetch of mail

currently being fetched, if user clicks again while in progress;

any message being fetched by any other request in progress must

disable entire toLoad batch: else, need to wait for N other loads;

fetches are still allowed to overlap in time, as long as disjoint;

"""

if loadingHdrsBusy or deletingBusy:

showerror(appname, 'Cannot fetch message during load or delete')

else:

toLoad = [num for num in msgnums if not self.cache.isLoaded(num)]

if not toLoad:

after() # all already loaded

return # process now, no wait pop up

else:

if set(toLoad) & self.beingFetched: # 3.0: any in progress?

showerror(appname, 'Cannot fetch any message being fetched')

else:

self.beingFetched |= set(toLoad)

loadingMsgsBusy.incr()

from popuputil import BusyBoxNowait

popup = BusyBoxNowait(appname, 'Fetching message contents')

threadtools.startThread(

action = self.cache.getMessages,

args = (toLoad,),

context = (after, popup, toLoad),

onExit = self.onLoadMsgsExit,

onFail = self.onLoadMsgsFail,

onProgress = self.onLoadMsgsProgress)

def onLoadMsgsExit(self, after, popup, toLoad):

self.beingFetched -= set(toLoad)

popup.quit()

after()

loadingMsgsBusy.decr() # allow other actions after onExit done

def onLoadMsgsFail(self, exc_info, after, popup, toLoad):

self.beingFetched -= set(toLoad)

popup.quit()

showerror(appname, 'Fetch failed: \n%s\n%s' % exc_info[:2])

printStack(exc_info)

loadingMsgsBusy.decr()

if exc_info[0] == mailtools.MessageSynchError: # synch inbox/index

self.onLoadServer(forceReload=True) # new thread: reload

def onLoadMsgsProgress(self, i, n, after, popup, toLoad):

popup.changeText('%d of %d' % (i, n))

def getMessage(self, msgnum):

return self.cache.getMessage(msgnum) # full mail text

def headersMaps(self):

# list of email.message.Message objects, 3.x requires list() if map()

# return [self.parseHeaders(h) for h in self.cache.allHdrs()]

return list(map(self.parseHeaders, self.cache.allHdrs()))

def mailSize(self, msgnum):

return self.cache.getSize(msgnum)

def okayToQuit(self):

# any threads still running?

filesbusy = [win for win in openSaveFiles.values() if win.openFileBusy]

busy = loadingHdrsBusy or deletingBusy or sendingBusy or loadingMsgsBusy

busy = busy or filesbusy

return not busy
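
The threaded handlers in Example 14-3 share one shape: mark the operation busy, run the slow action in a thread, and let the main GUI thread run the exit or fail callback later. The following is a minimal sketch of that idea using only the standard library. It is not the book's threadtools module; the names start_thread, run_pending_callbacks, and callback_queue are hypothetical, and unlike the real code the fail handler here receives the exception object rather than the sys.exc_info() tuple.

```python
import queue
import threading

# Hypothetical shared queue: worker threads put, the main thread gets.
callback_queue = queue.Queue()

def start_thread(action, args, context, on_exit, on_fail):
    """Run action(*args) in a thread; queue on_exit or on_fail for later."""
    def worker():
        try:
            action(*args)
        except Exception as exc:
            # Do not touch the GUI here: hand the failure to the main thread.
            callback_queue.put((on_fail, (exc,) + context))
        else:
            callback_queue.put((on_exit, context))
    thread = threading.Thread(target=worker)
    thread.start()
    return thread

def run_pending_callbacks():
    """Call from the main thread only: run every callback queued so far."""
    while True:
        try:
            callback, args = callback_queue.get_nowait()
        except queue.Empty:
            break
        callback(*args)
```

In a tkinter program, run_pending_callbacks would presumably be rescheduled with a widget's after method, so that callbacks which update the GUI run only in the main thread.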

ViewWindows: Message View Windows

Example 14-4 lists the implementation of mail view and edit windows. These windows

are created in response to actions in list windows—View, Write, Reply, and Forward

buttons. See the callbacks for these actions in the list window module of

Example 14-3 for view window initiation calls.

As in the prior module (Example 14-3), this file is really one common class and a handful

of customizations. The mail view window is nearly identical to the mail edit window,

used for Write, Reply, and Forward requests. Consequently, this example defines the

common appearance and behavior in the view window superclass, and extends it by

subclassing for edit windows.

Replies and forwards are hardly different from the write window here, because their

details (e.g., From and To addresses and quoted message text) are worked out in the

list window implementation before an edit window is created.
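
The factoring described above can be sketched without any GUI code. This is an illustration only: the class names, the title method, the action_labels method, and the 'MailApp' title prefix are hypothetical stand-ins, not the classes in the listing that follows. The point is that each subclass overrides just the label and the action list it needs, and inherits everything else.

```python
class ViewSketch:
    """Common appearance and behavior, shared by all window kinds."""
    modelabel = 'View'                        # used in window titles

    def __init__(self, headermap, showtext):
        self.headermap = headermap            # header name -> value
        self.showtext = showtext              # main text of the message

    def title(self):
        return 'MailApp - ' + self.modelabel  # 'MailApp' is a placeholder

    def action_labels(self):
        return ['Cancel', 'Parts', 'Split']


class WriteSketch(ViewSketch):
    """An edit window is a view window with different actions."""
    modelabel = 'Write'

    def action_labels(self):
        return ['Cancel', 'Parts', 'Attach', 'Send']


class ReplySketch(WriteSketch):
    modelabel = 'Reply'                       # headers, text set by creator


class ForwardSketch(WriteSketch):
    modelabel = 'Forward'
```

Because the reply and forward details are computed before the window is created, the last two subclasses need nothing beyond a new label.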

Example 14-4. PP4E\Internet\Email\PyMailGui\ViewWindows.py

"""

###############################################################################

Implementation of View, Write, Reply, Forward windows: one class per kind.

Code is factored here for reuse: a Write window is a customized View window,

and Reply and Forward are custom Write windows. Windows defined in this

file are created by the list windows, in response to user actions.

Caveat: 'split' pop ups for opening parts/attachments feel nonintuitive.

2.1: this caveat was addressed, by adding quick-access attachment buttons.

New in 3.0: platform-neutral grid() for mail headers, not packed col frames.

New in 3.0: supports Unicode encodings for main text + text attachments sent.

New in 3.0: PyEdit supports arbitrary Unicode for message parts viewed.

New in 3.0: supports Unicode/mail encodings for headers in mails sent.

TBD: could avoid verifying quits unless text area modified (like PyEdit2.0),

but these windows are larger, and would not catch headers already changed.

TBD: should Open dialog in write windows be program-wide? (per-window now).

###############################################################################

"""

from SharedNames import * # program-wide global objects

###############################################################################

# message view window - also a superclass of write, reply, forward

###############################################################################

class ViewWindow(windows.PopupWindow, mailtools.MailParser):

"""

a Toplevel, with extra protocol and embedded TextEditor;

inherits saveParts,partsList from mailtools.MailParser;

mixes in custom subclass logic by direct inheritance here;

"""

# class attributes

modelabel = 'View' # used in window titles

from mailconfig import okayToOpenParts # open any attachments at all?

from mailconfig import verifyPartOpens # ask before open each part?

from mailconfig import maxPartButtons # show up to this many + '...'

from mailconfig import skipTextOnHtmlPart # 3.0: just browser, not PyEdit?

tempPartDir = 'TempParts' # where 1 selected part saved

# all view windows use same dialog: remembers last dir

partsDialog = Directory(title=appname + ': Select parts save directory')

def __init__(self, headermap, showtext, origmessage=None):

"""

header map is origmessage, or custom hdr dict for writing;

showtext is main text part of the message: parsed or custom;

origmessage is parsed email.message.Message for view mail windows

"""

windows.PopupWindow.__init__(self, appname, self.modelabel)

self.origMessage = origmessage

self.makeWidgets(headermap, showtext)

def makeWidgets(self, headermap, showtext):

"""

add headers, actions, attachments, text editor

3.0: showtext is assumed to be decoded Unicode str here;

it will be encoded on sends and saves as directed/needed;

"""

actionsframe = self.makeHeaders(headermap)

if self.origMessage and self.okayToOpenParts:

self.makePartButtons()

self.editor = textEditor.TextEditorComponentMinimal(self)

myactions = self.actionButtons()

for (label, callback) in myactions:

b = Button(actionsframe, text=label, command=callback)

b.config(bg='beige', relief=RIDGE, bd=2)

b.pack(side=TOP, expand=YES, fill=BOTH)

# body text, pack last=clip first

self.editor.pack(side=BOTTOM) # may be multiple editors

self.update() # 3.0: else may be @ line2

self.editor.setAllText(showtext) # each has own content

lines = len(showtext.splitlines())

lines = min(lines + 3, mailconfig.viewheight or 20)

self.editor.setHeight(lines) # else height=24, width=80

self.editor.setWidth(80) # or from PyEdit textConfig

if mailconfig.viewbg:

self.editor.setBg(mailconfig.viewbg) # colors, font in mailconfig

if mailconfig.viewfg:

self.editor.setFg(mailconfig.viewfg)

if mailconfig.viewfont: # also via editor Tools menu

self.editor.setFont(mailconfig.viewfont)

def makeHeaders(self, headermap):

"""

add header entry fields, return action buttons frame;

3.0: uses grid for platform-neutral layout of label/entry rows;

packed row frames with fixed-width labels would work well too;

3.0: decoding of i18n headers (and email names in address headers)

is performed here if still required as they are added to the GUI;

some may have been decoded already for reply/forward windows that

need to use decoded text, but the extra decode here is harmless for

these, and is required for other headers and cases such as fetched

mail views; always, headers are in decoded form when displayed in

the GUI, and will be encoded within mailtools on Sends if they are

non-ASCII (see Write); i18n header decoding also occurs in list

window mail indexes, and for headers added to quoted mail text;

text payloads in the mail body are also decoded for display and

encoded for sends elsewhere in the system (list windows, Write);

3.0: creators of edit windows prefill Bcc header with sender email

address to be picked up here, as a convenience for common usages if

this header is enabled in mailconfig; Reply also now prefills the

Cc header with all unique original recipients less From, if enabled;

"""

top = Frame(self); top.pack (side=TOP, fill=X)

left = Frame(top); left.pack (side=LEFT, expand=NO, fill=BOTH)

middle = Frame(top); middle.pack(side=LEFT, expand=YES, fill=X)

# headers set may be extended in mailconfig (Bcc, others?)

self.userHdrs = ()

showhdrs = ('From', 'To', 'Cc', 'Subject')

if hasattr(mailconfig, 'viewheaders') and mailconfig.viewheaders:

self.userHdrs = mailconfig.viewheaders

showhdrs += self.userHdrs

addrhdrs = ('From', 'To', 'Cc', 'Bcc') # 3.0: decode i18n specially

self.hdrFields = []

for (i, header) in enumerate(showhdrs):

lab = Label(middle, text=header+':', justify=LEFT)

ent = Entry(middle)

lab.grid(row=i, column=0, sticky=EW)

ent.grid(row=i, column=1, sticky=EW)

middle.rowconfigure(i, weight=1)

hdrvalue = headermap.get(header, '?') # might be empty

# 3.0: if encoded, decode per email+mime+unicode

if header not in addrhdrs:

hdrvalue = self.decodeHeader(hdrvalue)

else:

hdrvalue = self.decodeAddrHeader(hdrvalue)

ent.insert('0', hdrvalue)

self.hdrFields.append(ent) # order matters in onSend

middle.columnconfigure(1, weight=1)

return left

def actionButtons(self): # must be method for self

return [('Cancel', self.destroy), # close view window silently

('Parts', self.onParts), # multiparts list or the body

('Split', self.onSplit)]

def makePartButtons(self):

"""

add up to N buttons that open attachments/parts

when clicked; alternative to Parts/Split (2.1);

okay that temp dir is shared by all open messages:

part file not saved till later selected and opened;

partname=partname is required in lambda in Py2.4;

caveat: we could try to skip the main text part;

"""

def makeButton(parent, text, callback):

link = Button(parent, text=text, command=callback, relief=SUNKEN)

if mailconfig.partfg: link.config(fg=mailconfig.partfg)

if mailconfig.partbg: link.config(bg=mailconfig.partbg)

link.pack(side=LEFT, fill=X, expand=YES)

parts = Frame(self)

parts.pack(side=TOP, expand=NO, fill=X)

for (count, partname) in enumerate(self.partsList(self.origMessage)):

if count == self.maxPartButtons:

makeButton(parts, '...', self.onSplit)

break

openpart = (lambda partname=partname: self.onOnePart(partname))

makeButton(parts, partname, openpart)

def onOnePart(self, partname):

"""

locate selected part for button and save and open;

okay if multiple mails open: resaves each time selected;

we could probably just use web browser directly here;

caveat: tempPartDir is relative to cwd - poss anywhere;

caveat: tempPartDir is never cleaned up: might be large,

could use tempfile module (just like the HTML main text

part display code in onView of the list window class);

"""

try:

savedir = self.tempPartDir

message = self.origMessage

(contype, savepath) = self.saveOnePart(savedir, partname, message)

except:

showerror(appname, 'Error while writing part file')

printStack(sys.exc_info())

else:

self.openParts([(contype, os.path.abspath(savepath))]) # reuse

def onParts(self):

"""

show message part/attachments in pop-up window;

uses same file naming scheme as save on Split;

if non-multipart, single part = full body text

"""

partnames = self.partsList(self.origMessage)

msg = '\n'.join(['Message parts:\n'] + partnames)

showinfo(appname, msg)

def onSplit(self):

"""

pop up save dir dialog and save all parts/attachments there;

if desired, pop up HTML and multimedia parts in web browser,

text in TextEditor, and well-known doc types on windows;

could show parts in View windows where embedded text editor

would provide a save button, but most are not readable text;

"""

savedir = self.partsDialog.show() # class attr: at prior dir

if savedir: # tk dir chooser, not file

try:

partfiles = self.saveParts(savedir, self.origMessage)

except:

showerror(appname, 'Error while writing part files')

printStack(sys.exc_info())

else:

if self.okayToOpenParts: self.openParts(partfiles)

def askOpen(self, appname, prompt):

if not self.verifyPartOpens:

return True

else:

return askyesno(appname, prompt) # pop-up dialog

def openParts(self, partfiles):

"""

auto-open well known and safe file types, but only if verified

by the user in a pop up; other types must be opened manually

from save dir; at this point, the named parts have been already

MIME-decoded and saved as raw bytes in binary-mode files, but text

parts may be in any Unicode encoding; PyEdit needs to know the

encoding to decode, webbrowsers may have to guess or be told;

caveat: punts for type application/octet-stream even if it has

safe filename extension such as .html; caveat: image/audio/video

could be opened with the book's playfile.py; could also do that

if text viewer fails: would start notepad on Windows via startfile;

webbrowser may handle most cases here too, but specific is better;

"""

def textPartEncoding(fullfilename):

"""

3.0: map a text part filename back to charset param in content-type

header of part's Message, so we can pass this on to the PyEdit

constructor for proper text display; we could return the charset

along with content-type from mailtools for text parts, but fewer

changes are needed if this is handled as a special case here;

part content is saved in binary mode files by mailtools to avoid

encoding issues, but here the original part Message is not directly

available; we need this mapping step to extract a Unicode encoding

name if present; 4E's PyEdit now allows an explicit encoding name for

file opens, and resolves encoding on saves; see Chapter 11 for PyEdit

policies: it may ask user for an encoding if charset absent or fails;

caveat: move to mailtools.mailParser to reuse for <meta> in PyMailCGI?

"""

partname = os.path.basename(fullfilename)

for (filename, contype, part) in self.walkNamedParts(self.origMessage):

if filename == partname:

return part.get_content_charset() # None if not in header

assert False, 'Text part not found' # should never happen

for (contype, fullfilename) in partfiles:

maintype = contype.split('/')[0] # left side

extension = os.path.splitext(fullfilename)[1] # not [-4:]

basename = os.path.basename(fullfilename) # strip dir

# HTML and XML text, web pages, some media

if contype in ['text/html', 'text/xml']:

browserOpened = False

if self.askOpen(appname, 'Open "%s" in browser?' % basename):

try:

webbrowser.open_new('file://' + fullfilename)

browserOpened = True

except:

showerror(appname, 'Browser failed: trying editor')

if not browserOpened or not self.skipTextOnHtmlPart:

try:

# try PyEdit to see encoding name and effect

encoding = textPartEncoding(fullfilename)

textEditor.TextEditorMainPopup(parent=self,

winTitle=' - %s email part' % (encoding or '?'),

loadFirst=fullfilename, loadEncode=encoding)

except:

showerror(appname, 'Error opening text viewer')

# text/plain, text/x-python, etc.; 4E: encoding, may fail

elif maintype == 'text':

if self.askOpen(appname, 'Open text part "%s"?' % basename):

try:

encoding = textPartEncoding(fullfilename)

textEditor.TextEditorMainPopup(parent=self,

winTitle=' - %s email part' % (encoding or '?'),

loadFirst=fullfilename, loadEncode=encoding)

except:

showerror(appname, 'Error opening text viewer')

# multimedia types: Windows opens mediaplayer, imageviewer, etc.

elif maintype in ['image', 'audio', 'video']:

if self.askOpen(appname, 'Open media part "%s"?' % basename):

try:

webbrowser.open_new('file://' + fullfilename)

except:

showerror(appname, 'Error opening browser')

# common Windows documents: Word, Excel, Adobe, archives, etc.

elif (sys.platform[:3] == 'win' and

maintype == 'application' and # 3.0: +x types

extension in ['.doc', '.docx', '.xls', '.xlsx', # generalize me

'.pdf', '.zip', '.tar', '.wmv']):

if self.askOpen(appname, 'Open part "%s"?' % basename):

os.startfile(fullfilename)

else: # punt!

msg = 'Cannot open part: "%s"\nOpen manually in: "%s"'

msg = msg % (basename, os.path.dirname(fullfilename))

showinfo(appname, msg)

###############################################################################

# message edit windows - write, reply, forward

###############################################################################

if mailconfig.smtpuser: # user set in mailconfig?

MailSenderClass = mailtools.MailSenderAuth # login/password required

else:

MailSenderClass = mailtools.MailSender

class WriteWindow(ViewWindow, MailSenderClass):

"""

customize view display for composing new mail

inherits sendMessage from mailtools.MailSender

"""

modelabel = 'Write'

def __init__(self, headermap, starttext):

ViewWindow.__init__(self, headermap, starttext)

MailSenderClass.__init__(self)

self.attaches = [] # each win has own open dialog

self.openDialog = None # dialog remembers last dir

def actionButtons(self):

return [('Cancel', self.quit), # need method to use self

('Parts', self.onParts), # PopupWindow verifies cancel

('Attach', self.onAttach),

('Send', self.onSend)] # 4E: don't pad: centered

def onParts(self):

# caveat: deletes not currently supported

if not self.attaches:

showinfo(appname, 'Nothing attached')

else:

msg = '\n'.join(['Already attached:\n'] + self.attaches)

showinfo(appname, msg)

def onAttach(self):

"""

attach a file to the mail: name added here will be

added as a part on Send, inside the mailtools pkg;

4E: could ask Unicode type here instead of on send

"""

if not self.openDialog:

self.openDialog = Open(title=appname + ': Select Attachment File')

filename = self.openDialog.show() # remember prior dir

if filename:

self.attaches.append(filename) # to be opened in send method

def resolveUnicodeEncodings(self):

"""

3.0/4E: to prepare for send, resolve Unicode encoding for text parts:

both main text part, and any text part attachments; the main text part

may have had a known encoding if this is a reply or forward, but not for

a write, and it may require a different encoding after editing anyhow;

smtplib in 3.1 requires that full message text be encodable per ASCII

when sent (if it's a str), so it's crucial to get this right here; else

fails if reply/fwd to UTF8 text when config=ascii if any non-ascii chars;

try user setting and reply but fall back on general UTF8 as a last resort;

"""

def isTextKind(filename):

contype, encoding = mimetypes.guess_type(filename)

if contype is None or encoding is not None: # 4E utility

return False # no guess, compressed?

maintype, subtype = contype.split('/', 1) # check for text/?

return maintype == 'text'

# resolve main body text encoding

bodytextEncoding = mailconfig.mainTextEncoding

if bodytextEncoding is None:

asknow = askstring('PyMailGUI', 'Enter main text Unicode encoding name')

bodytextEncoding = asknow or 'latin-1' # or sys.getdefaultencoding()?

# last chance: use utf-8 if can't encode per prior selections

if bodytextEncoding != 'utf-8':

try:

bodytext = self.editor.getAllText()

bodytext.encode(bodytextEncoding)

except (UnicodeError, LookupError): # lookup: bad encoding name

bodytextEncoding = 'utf-8' # general code point scheme

# resolve any text part attachment encodings

attachesEncodings = []

config = mailconfig.attachmentTextEncoding

for filename in self.attaches:

if not isTextKind(filename):

attachesEncodings.append(None) # skip non-text: don't ask

elif config is not None:

attachesEncodings.append(config) # for all text parts if set

else:

prompt = 'Enter Unicode encoding name for %s' % filename

asknow = askstring('PyMailGUI', prompt)

attachesEncodings.append(asknow or 'latin-1')

# last chance: use utf-8 if can't decode per prior selections

choice = attachesEncodings[-1]

if choice is not None and choice != 'utf-8':

try:

attachbytes = open(filename, 'rb').read()

attachbytes.decode(choice)

except (UnicodeError, LookupError, IOError):

attachesEncodings[-1] = 'utf-8'

return bodytextEncoding, attachesEncodings

def onSend(self):

"""

threaded: mail edit window Send button press;

may overlap with any other thread, disables none but quit;

Exit,Fail run by threadChecker via queue in after callback;

caveat: no progress here, because send mail call is atomic;

assumes multiple recipient addrs are separated with ',';

mailtools module handles encodings, attachments, Date, etc;

mailtools module also saves sent message text in a local file

3.0: now fully parses To,Cc,Bcc (in mailtools) instead of

splitting on the separator naively; could also use multiline

input widgets instead of simple entry; Bcc added to envelope,

not headers;

3.0: Unicode encodings of text parts is resolved here, because

it may require GUI prompts; mailtools performs the actual

encoding for parts as needed and requested;

3.0: i18n headers are already decoded in the GUI fields here;

encoding of any non-ASCII i18n headers is performed in mailtools,

not here, because no GUI interaction is required;

"""

# resolve Unicode encoding for text parts;

bodytextEncoding, attachesEncodings = self.resolveUnicodeEncodings()

# get components from GUI; 3.0: i18n headers are decoded

fieldvalues = [entry.get() for entry in self.hdrFields]

From, To, Cc, Subj = fieldvalues[:4]

extraHdrs = [('Cc', Cc), ('X-Mailer', appname + ' (Python)')]

extraHdrs += list(zip(self.userHdrs, fieldvalues[4:]))

bodytext = self.editor.getAllText()

# split multiple recipient lists on ',', fix empty fields

Tos = self.splitAddresses(To)

for (ix, (name, value)) in enumerate(extraHdrs):

if value: # ignored if ''

if value == '?': # ? not replaced

extraHdrs[ix] = (name, '')

elif name.lower() in ['cc', 'bcc']: # split on ','

extraHdrs[ix] = (name, self.splitAddresses(value))

# withdraw to disallow send during send

# caveat: might not be foolproof - user may deiconify if icon visible

self.withdraw()

self.getPassword() # if needed; don't run pop up in send thread!

popup = popuputil.BusyBoxNowait(appname, 'Sending message')

sendingBusy.incr()

threadtools.startThread(

action = self.sendMessage,

args = (From, Tos, Subj, extraHdrs, bodytext, self.attaches,

saveMailSeparator,

bodytextEncoding,

attachesEncodings),

context = (popup,),

onExit = self.onSendExit,

onFail = self.onSendFail)

def onSendExit(self, popup):

"""

erase wait window, erase view window, decr send count;

sendMessage call auto saves sent message in local file;

can't use window.addSavedMails: mail text unavailable;

"""

popup.quit()

self.destroy()

sendingBusy.decr()

# poss \ when opened, / in mailconfig

sentname = os.path.abspath(mailconfig.sentmailfile) # also expands '.'

if sentname in openSaveFiles.keys(): # sent file open?

window = openSaveFiles[sentname] # update list,raise

window.loadMailFileThread()

    def onSendFail(self, exc_info, popup):
        # pop-up error, keep msg window to save or retry, redraw actions frame
        popup.quit()
        self.deiconify()
        self.lift()
        showerror(appname, 'Send failed: \n%s\n%s' % exc_info[:2])
        printStack(exc_info)
        MailSenderClass.smtpPassword = None     # try again; 3.0/4E: not on self
        sendingBusy.decr()

    def askSmtpPassword(self):
        """
        get password if needed from GUI here, in main thread;
        caveat: may try this again in thread if no input first
        time, so goes into a loop until input is provided; see
        pop passwd input logic for a nonlooping alternative
        """
        password = ''
        while not password:
            prompt = ('Password for %s on %s?' %
                     (self.smtpUser, self.smtpServerName))
            password = popuputil.askPasswordWindow(appname, prompt)
        return password

class ReplyWindow(WriteWindow):
    """
    customize write display for replying
    text and headers set up by list window
    """
    modelabel = 'Reply'

class ForwardWindow(WriteWindow):
    """
    customize reply display for forwarding
    text and headers set up by list window
    """
    modelabel = 'Forward'
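The send logic in this listing hands the slow mail transfer to a worker thread and leaves all GUI work to the main thread. The sketch below illustrates that general pattern using only the standard library. It is not the book's threadtools module: the names start_thread and run_pending_callbacks, and the simplified on_fail signature, are hypothetical stand-ins chosen for illustration.

```python
import queue
import threading

callback_queue = queue.Queue()            # worker threads put, main thread gets

def start_thread(action, args, context, on_exit, on_fail):
    """Run action(*args) in a thread; queue on_exit or on_fail for the main thread."""
    def worker():
        try:
            action(*args)
        except Exception as exc:
            callback_queue.put((on_fail, (exc,) + context))
        else:
            callback_queue.put((on_exit, context))
    thread = threading.Thread(target=worker)
    thread.start()
    return thread

def run_pending_callbacks():
    """Call queued callbacks; a GUI would run this from a periodic timer event."""
    count = 0
    while True:
        try:
            callback, cb_args = callback_queue.get_nowait()
        except queue.Empty:
            return count
        callback(*cb_args)
        count += 1

def send(msg):                            # stands in for a slow SMTP send
    return len(msg)

results = []
worker = start_thread(send, ('hello',), ('popup',),
                      on_exit=lambda popup: results.append(('sent', popup)),
                      on_fail=lambda exc, popup: results.append(('failed', popup)))
worker.join()                             # a GUI would poll instead of joining
run_pending_callbacks()
print(results)                            # [('sent', 'popup')]
```

Because the callbacks run only when the main thread drains the queue, they can safely update widgets, which is the reason the listing's exit and fail handlers may close pop ups and redraw windows.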

messagecache: Message Cache Manager

The class in Example 14-5 implements a cache for already loaded messages. Its logic is

split off into this file in order to avoid further complicating list window implementa-

tions. The server list window creates and embeds an instance of this class to interface

with the mail server and to keep track of already loaded mail headers and full text. In

this version, the server list window also keeps track of mail fetches in progress, to avoid

attempting to load the same mail more than once in parallel. This task isn’t performed

here, because it may require GUI operations.
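As a rough illustration of the caching idea described here, the following sketch keeps header text and lazily fetched full text for each message, and uses a separate set to model the fetches-in-progress check that the list window is said to perform. HeaderCache, fetch_full, startable, and the in_progress set are hypothetical names made up for this sketch; the real class in Example 14-5 inherits its server transfer methods from mailtools.MailFetcher instead.

```python
class HeaderCache:
    """Cache header text per message; fetch full text only on first request."""

    def __init__(self, fetch_full):
        self.fetch_full = fetch_full      # callable: msgnum -> full message text
        self.entries = []                 # list of [hdrtext, fulltext-or-None]

    def add_headers(self, hdrtexts):
        self.entries.extend([hdr, None] for hdr in hdrtexts)

    def get_message(self, msgnum):        # message numbers are 1-based
        entry = self.entries[msgnum - 1]
        if entry[1] is None:
            entry[1] = self.fetch_full(msgnum)
        return entry[1]

    def is_loaded(self, msgnum):
        return self.entries[msgnum - 1][1] is not None


def startable(requested, in_progress):
    """Allow a fetch request only if none of its targets is already being fetched."""
    return not (set(requested) & in_progress)


calls = []
def fake_fetch(msgnum):                   # stands in for a real server download
    calls.append(msgnum)
    return 'full text of message %d' % msgnum

cache = HeaderCache(fake_fetch)
cache.add_headers(['Subject: one', 'Subject: two'])
cache.get_message(2)
cache.get_message(2)                      # second call is served from the cache
print(calls)                              # [2]
print(startable([1, 3], {2}), startable([2, 3], {2}))   # True False
```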

Example 14-5. PP4E\Internet\Email\PyMailGui\messagecache.py

"""

##############################################################################

manage message and header loads and context, but not GUI;

a MailFetcher, with a list of already loaded headers and messages;

the caller must handle any required threading or GUI interfaces;

3.0 change: use full message text Unicode encoding name in local

mailconfig module; decoding happens deep in mailtools, when a message

is fetched - mail text is always Unicode str from that point on;

this may change in a future Python/email: see Chapter 13 for details;

3.0 change: inherits the new mailconfig.fetchlimit feature of mailtools,

which can be used to limit the maximum number of most recent headers or

full mails (if no TOP) fetched on each load request; note that this

feature is independent of the loadfrom used here to limit loads to

newly-arrived mails only, though it is applied at the same time: at

most fetchlimit newly-arrived mails are loaded;

3.0 change: though unlikely, it's not impossible that a user may trigger a

new fetch of a message that is currently being fetched in a thread, simply

by clicking the same message again (msg fetches, but not full index loads,

may overlap with other fetches and sends); this seems to be thread safe here,

but can lead to redundant and possibly parallel downloads of the same mail

which are pointless and seem odd (selecting all mails and pressing View

twice downloads most messages twice!); fixed by keeping track of fetches in

progress in the main GUI thread so that this overlap is no longer possible:

a message being fetched disables any fetch request which it is part of, and

parallel fetches are still allowed as long as their targets do not intersect;

##############################################################################

"""

from PP4E.Internet.Email import mailtools

from popuputil import askPasswordWindow

class MessageInfo:
    """
    an item in the mail cache list
    """
    def __init__(self, hdrtext, size):
        self.hdrtext  = hdrtext            # fulltext is cached msg
        self.fullsize = size               # hdrtext is just the hdrs
        self.fulltext = None               # fulltext=hdrtext if no TOP

class MessageCache(mailtools.MailFetcher):
    """
    keep track of already loaded headers and messages;
    inherits server transfer methods from MailFetcher;
    useful in other apps: no GUI or thread assumptions;

    3.0: raw mail text bytes are decoded to str to be
    parsed with Py3.1's email pkg and saved to files;
    uses the local mailconfig module's encoding setting;
    decoding happens automatically in mailtools on fetch;
    """
    def __init__(self):
        mailtools.MailFetcher.__init__(self)   # 3.0: inherits fetchEncoding
        self.msglist = []                      # 3.0: inherits fetchlimit

    def loadHeaders(self, forceReloads, progress=None):
        """
        three cases to handle here: the initial full load,
        load newly arrived, and forced reload after delete;
        don't refetch viewed msgs if hdrs list same or extended;
        retains cached msgs after a delete unless delete fails;
        2.1: does quick check to see if msgnums still in sync
        3.0: this is now subject to mailconfig.fetchlimit max;
        """
        if forceReloads:
            loadfrom = 1
            self.msglist = []                        # msg nums have changed
        else:
            loadfrom = len(self.msglist)+1           # continue from last load

        # only if loading newly arrived
        if loadfrom != 1:
            self.checkSynchError(self.allHdrs())     # raises except if bad

        # get all or newly arrived msgs
        reply = self.downloadAllHeaders(progress, loadfrom)
        headersList, msgSizes, loadedFull = reply

        for (hdrs, size) in zip(headersList, msgSizes):
            newmsg = MessageInfo(hdrs, size)
            if loadedFull:                           # zip result may be empty
                newmsg.fulltext = hdrs               # got full msg if no 'top'
            self.msglist.append(newmsg)

    def getMessage(self, msgnum):                    # get raw msg text
        cacheobj = self.msglist[msgnum-1]            # add to cache if fetched
        if not cacheobj.fulltext:                    # harmless if threaded
            fulltext = self.downloadMessage(msgnum)  # 3.0: simpler coding
            cacheobj.fulltext = fulltext
        return cacheobj.fulltext

    def getMessages(self, msgnums, progress=None):
        """
        prefetch full raw text of multiple messages, in thread;
        2.1: does quick check to see if msgnums still in sync;
        we can't get here unless the index list already loaded;
        """
        self.checkSynchError(self.allHdrs())         # raises except if bad
        nummsgs = len(msgnums)                       # adds messages to cache
        for (ix, msgnum) in enumerate(msgnums):      # some poss already there
            if progress: progress(ix+1, nummsgs)     # only connects if needed
            self.getMessage(msgnum)                  # but may connect > once

    def getSize(self, msgnum):                       # encapsulate cache struct
        return self.msglist[msgnum-1].fullsize       # it changed once already!

    def isLoaded(self, msgnum):
        return self.msglist[msgnum-1].fulltext

    def allHdrs(self):
        return [msg.hdrtext for msg in self.msglist]

    def deleteMessages(self, msgnums, progress=None):
        """
        if delete of all msgnums works, remove deleted entries
        from mail cache, but don't reload either the headers list
        or already viewed mails text: cache list will reflect the
        changed msg nums on server; if delete fails for any reason,
        caller should forcibly reload all hdrs next, because _some_
        server msg nums may have changed, in unpredictable ways;
        2.1: this now checks msg hdrs to detect out of synch msg
        numbers, if TOP supported by mail server; runs in thread
        """
        try:
            self.deleteMessagesSafely(msgnums, self.allHdrs(), progress)
        except mailtools.TopNotSupported:
            mailtools.MailFetcher.deleteMessages(self, msgnums, progress)

        # no errors: update index list
        indexed = enumerate(self.msglist)
        self.msglist = [msg for (ix, msg) in indexed if ix+1 not in msgnums]

class GuiMessageCache(MessageCache):
    """
    add any GUI-specific calls here so cache usable in non-GUI apps
    """

    def setPopPassword(self, appname):
        """
        get password from GUI here, in main thread
        forcibly called from GUI to avoid pop ups in threads
        """
        if not self.popPassword:
            prompt = 'Password for %s on %s?' % (self.popUser, self.popServer)
            self.popPassword = askPasswordWindow(appname, prompt)

    def askPopPassword(self):
        """
        but don't use GUI pop up here: I am run in a thread!
        when tried pop up in thread, caused GUI to hang;
        may be called by MailFetcher superclass, but only
        if passwd is still empty string due to dialog close
        """
        return self.popPassword
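The cache update at the end of deleteMessages relies on POP message numbers being 1-based positions in the cache list. A small worked example, using made-up header strings rather than real mail, shows how the same filtering expression drops deleted entries and implicitly renumbers the ones that remain:

```python
msglist = ['hdr-A', 'hdr-B', 'hdr-C', 'hdr-D', 'hdr-E']    # messages 1..5
msgnums = [2, 4]                                            # numbers to delete

# same filter as in the listing: ix is 0-based, message numbers are 1-based
remaining = [msg for (ix, msg) in enumerate(msglist) if ix + 1 not in msgnums]

print(remaining)                        # ['hdr-A', 'hdr-C', 'hdr-E']
print(remaining.index('hdr-C') + 1)     # 2: old message 3 is now message 2
```

This mirrors why no header reload is needed after a successful delete: the surviving entries simply shift down to match the new numbering.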

popuputil: General-Purpose GUI Pop Ups

Example 14-6 implements a handful of utility pop-up windows in a module, in case

they ever prove useful in other programs. Note that the same windows utility module

is imported here, to give a common look-and-feel to the pop ups (icons, titles, and

so on).

Example 14-6. PP4E\Internet\Email\PyMailGui\popuputil.py

"""

#############################################################################

utility windows - may be useful in other programs

#############################################################################

"""

from tkinter import *

from PP4E.Gui.Tools.windows import PopupWindow

class HelpPopup(PopupWindow):
    """
    custom Toplevel that shows help text as scrolled text
    source button runs a passed-in callback handler
    3.0 alternative: use HTML file and webbrowser module
    """
    myfont  = 'system'  # customizable
    mywidth = 78        # 3.0: start width

    def __init__(self, appname, helptext, iconfile=None, showsource=lambda:0):
        PopupWindow.__init__(self, appname, 'Help', iconfile)
        from tkinter.scrolledtext import ScrolledText    # a nonmodal dialog
        bar  = Frame(self)                               # pack first=clip last
        bar.pack(side=BOTTOM, fill=X)
        code = Button(bar, bg='beige', text="Source", command=showsource)
        quit = Button(bar, bg='beige', text="Cancel", command=self.destroy)
        code.pack(pady=1, side=LEFT)
        quit.pack(pady=1, side=LEFT)
        text = ScrolledText(self)                   # add Text + scrollbar
        text.config(font=self.myfont)
        text.config(width=self.mywidth)             # too big for showinfo
        text.config(bg='steelblue', fg='white')     # erase on btn or return
        text.insert('0.0', helptext)
        text.pack(expand=YES, fill=BOTH)
        self.bind("<Return>", (lambda event: self.destroy()))

def askPasswordWindow(appname, prompt):
    """
    modal dialog to input password string
    getpass.getpass uses stdin, not GUI
    tkSimpleDialog.askstring echos input
    """
    win = PopupWindow(appname, 'Prompt')               # a configured Toplevel
    Label(win, text=prompt).pack(side=LEFT)
    entvar = StringVar(win)
    ent = Entry(win, textvariable=entvar, show='*')    # display * for input
    ent.pack(side=RIGHT, expand=YES, fill=X)
    ent.bind('<Return>', lambda event: win.destroy())
    ent.focus_set(); win.grab_set(); win.wait_window()
    win.update()                                       # update forces redraw
    return entvar.get()                                # ent widget is now gone

class BusyBoxWait(PopupWindow):
    """
    pop up blocking wait message box: thread waits
    main GUI event thread stays alive during wait
    but GUI is inoperable during this wait state;
    uses quit redef here because lower, not leftmost;
    """
    def __init__(self, appname, message):
        PopupWindow.__init__(self, appname, 'Busy')
        self.protocol('WM_DELETE_WINDOW', lambda:0)        # ignore deletes
        label = Label(self, text=message + '...')          # win.quit() to erase
        label.config(height=10, width=40, cursor='watch')  # busy cursor
        label.pack()
        self.makeModal()
        self.message, self.label = message, label

    def makeModal(self):
        self.focus_set()                                   # grab application
        self.grab_set()                                    # wait for threadexit

    def changeText(self, newtext):
        self.label.config(text=self.message + ': ' + newtext)

    def quit(self):
        self.destroy()                                     # don't verify quit

class BusyBoxNowait(BusyBoxWait):
    """
    pop up nonblocking wait window
    call changeText to show progress, quit to close
    """
    def makeModal(self):
        pass

if __name__ == '__main__':
    HelpPopup('spam', 'See figure 1...\n')
    print(askPasswordWindow('spam', 'enter password'))
    input('Enter to exit')  # pause if clicked

wraplines: Line Split Tools

The module in Example 14-7 implements general tools for wrapping long lines, at either

a fixed column or the first delimiter at or before a fixed column. PyMailGUI uses this

file’s wrapText1 function for text in view, reply, and forward windows, but this code is

potentially useful in other programs. Run the file as a script to watch its self-test code

at work, and study its functions to see its text-processing logic.
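To make the delimiter-based strategy concrete, here is a minimal standalone sketch of wrapping a single line at the last delimiter found before a column limit. It follows the description above rather than reproducing Example 14-7; the function name wrap_at_delimiter and its default arguments are assumptions made for illustration.

```python
def wrap_at_delimiter(line, size=20, delimiters=' ,.:\t'):
    """Split line into pieces of at most size chars, preferring delimiter breaks."""
    pieces = []
    while len(line) > size:
        # scan leftward from the column limit, but not past the halfway point
        for pos in range(size - 1, size // 2, -1):
            if line[pos] in delimiters:
                cut = pos + 1             # keep the delimiter on the front piece
                break
        else:
            cut = size                    # no delimiter found: hard split
        pieces.append(line[:cut])
        line = line[cut:]
    pieces.append(line)                   # remainder fits within size
    return pieces

text = 'spam ham eggs ' * 4
for piece in wrap_at_delimiter(text, size=20):
    print(repr(piece))
```

Keeping each delimiter at the end of the front piece means the pieces can be joined back into the original text without loss, which is a convenient property for self-tests.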

Example 14-7. PP4E\Internet\Email\PyMailGui\wraplines.py

"""

###############################################################################

split lines on fixed columns or at delimiters before a column;

see also: related but different textwrap standard library module (2.3+);

4E caveat: this assumes str; supporting bytes might help avoid some decodes;

###############################################################################

"""

defaultsize = 80

def wrapLinesSimple(lineslist, size=defaultsize):
    "split at fixed position size"
    wraplines = []
    for line in lineslist:
        while True:
            wraplines.append(line[:size])         # OK if len < size
            line = line[size:]                    # split without analysis
            if not line: break
    return wraplines

def wrapLinesSmart(lineslist, size=defaultsize, delimiters='.,:\t '):
    "wrap at first delimiter left of size"
    wraplines = []
    for line in lineslist:
        while True:
            if len(line) <= size:
                wraplines += [line]
                break
            else:
                for look in range(size-1, size // 2, -1):    # 3.0: need // not /
                    if line[look] in delimiters:
                        front, line = line[:look+1], line[look+1:]
                        break
                else:
                    front, line = line[:size], line[size:]
                wraplines += [front]
    return wraplines

###############################################################################

# common use case utilities

###############################################################################

def wrapText1(text, size=defaultsize):         # better for line-based txt: mail
    "when text read all at once"               # keeps original line brks struct
    lines = text.split('\n')                   # split on newlines
    lines = wrapLinesSmart(lines, size)        # wrap lines on delimiters
    return '\n'.join(lines)                    # put back together

def wrapText2(text, size=defaultsize):         # more uniform across lines
    "same, but treat as one long line"         # but loses original line struct
    text  = text.replace('\n', ' ')            # drop newlines if any
    lines = wrapLinesSmart([text], size)       # wrap single line on delimiters
    return lines                               # caller puts back together

def wrapText3(text, size=defaultsize):
    "same, but put back together"
    lines = wrapText2(text, size)              # wrap as single long line
    return '\n'.join(lines) + '\n'             # make one string with newlines

def wrapLines1(lines, size=defaultsize):
    "when newline included at end"
    lines = [line[:-1] for line in lines]      # strip off newlines (or .rstrip)
    lines = wrapLinesSmart(lines, size)        # wrap on delimiters
    return [(line + '\n') for line in lines]   # put them back

def wrapLines2(lines, size=defaultsize):       # more uniform across lines
    "same, but concat as one long line"        # but loses original structure
    text  = ''.join(lines)                     # put together as 1 line
    lines = wrapText2(text, size)              # wrap on delimiters, honoring size
    return [(line + '\n') for line in lines]   # put newlines on ends

###############################################################################

# self-test

###############################################################################

if __name__ == '__main__':
    lines = ['spam ham ' * 20 + 'spam,ni' * 20,
             'spam ham ' * 20,
             'spam,ni' * 20,
             'spam ham.ni' * 20,
             '',
             'spam'*80,
             ' ',
             'spam ham eggs']
    sep = '-' * 30
    print('all', sep)
    for line in lines: print(repr(line))
    print('simple', sep)
    for line in wrapLinesSimple(lines): print(repr(line))
    print('smart', sep)
    for line in wrapLinesSmart(lines): print(repr(line))

    print('single1', sep)
    for line in wrapLinesSimple([lines[0]], 60): print(repr(line))
    print('single2', sep)
    for line in wrapLinesSmart([lines[0]], 60): print(repr(line))
    print('combined text', sep)
    for line in wrapLines2(lines): print(repr(line))
    print('combined lines', sep)
    print(wrapText1('\n'.join(lines)))

    assert ''.join(lines) == ''.join(wrapLinesSimple(lines, 60))
    assert ''.join(lines) == ''.join(wrapLinesSmart(lines, 60))
    print(len(''.join(lines)), end=' ')
    print(len(''.join(wrapLinesSimple(lines))), end=' ')
    print(len(''.join(wrapLinesSmart(lines))), end=' ')
    print(len(''.join(wrapLinesSmart(lines, 60))), end=' ')
    input('Press enter')     # pause if clicked
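The module's docstring points to the standard library's textwrap module as a related but different tool. As a brief comparison (a sketch, not part of the book's example), textwrap.wrap breaks only at whitespace and drops the whitespace at each break, so unlike the functions in this listing, joining its output does not reproduce the original text:

```python
import textwrap

text = 'spam ham eggs ' * 3                 # 42 characters, with a trailing space
pieces = textwrap.wrap(text, width=20)

for piece in pieces:
    print(repr(piece))

print(all(len(piece) <= 20 for piece in pieces))   # True
print(''.join(pieces) == text)                      # False: break whitespace dropped
```

That difference matters for mail text: the listing's own self-test asserts that joined output equals the joined input, a guarantee textwrap does not aim to provide.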

html2text: Extracting Text from HTML (Prototype, Preview)

Example 14-8 lists the code of the simple-minded HTML parser that PyMailGUI uses

to extract plain text from mails whose main (or only) text part is in HTML form. This

extracted text is used both for display and for the initial text in replies and forwards.

Its original HTML form is also displayed in its full glory in a popped-up web browser

as before.

This is a prototype. Because PyMailGUI is oriented toward plain text today, this parser

is intended as a temporary workaround until an HTML viewer/editor widget solution is

found. Because of that, this is at best a first cut which has not been polished to any

significant extent. Robustly parsing HTML in its entirety is a task well beyond the scope

of this chapter and book. When this parser fails to render good plain text (and it will!),

users can still view and cut-and-paste the properly formatted text from the web browser.

This is also a preview. HTML parsing is not covered until Chapter 19 of this book, so

you’ll have to take this on faith until we refer back to it in that later chapter. Unfortu-

nately, this feature was added to PyMailGUI late in the book project, and avoiding this

forward reference didn’t seem to justify omitting the improvement altogether. For more

details on HTML parsing, stay tuned for (or flip ahead to) Chapter 19.

In short, the class here provides handler methods that receive callbacks from an HTML

parser, as tags and their content are recognized; we use this model here to save text we’re

interested in along the way. Besides the parser class, we could also use Python’s

html.entities module to map more entity types than are hardcoded here—another

tool we will meet in Chapter 19.

Despite its limitations, this example serves as a rough guide to help get you started, and

any result it produces is certainly an improvement upon the prior edition’s display and

quoting of raw HTML.
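As a hedged illustration of the html.entities suggestion above, the sketch below replaces a hardcoded entity table with the standard library's name2codepoint mapping. The class name EntityAwareParser is hypothetical and this is not the book's parser. It passes convert_charrefs=False so that current Python 3.X releases, which otherwise convert references automatically, still route them to handle_entityref and handle_charref.

```python
from html.entities import name2codepoint
from html.parser import HTMLParser

class EntityAwareParser(HTMLParser):
    """Collect text content, translating named entities via html.entities."""

    def __init__(self):
        super().__init__(convert_charrefs=False)   # keep entity callbacks active
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def handle_entityref(self, name):
        codepoint = name2codepoint.get(name)
        self.chunks.append(chr(codepoint) if codepoint else '?')

    def handle_charref(self, name):
        # numeric references: decimal (&#233;) or hexadecimal (&#xe9;)
        base = 16 if name.lower().startswith('x') else 10
        self.chunks.append(chr(int(name.lstrip('xX'), base)))

parser = EntityAwareParser()
parser.feed('<p>caf&eacute; &lt;b&gt; &amp; more</p>')
parser.close()
print(''.join(parser.chunks))     # café <b> & more
```

Unknown entity names still fall back to '?', matching the spirit of the hardcoded table, while the mapping covers the full set of named HTML entities known to the library.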

Example 14-8. PP4E\Internet\Email\PyMailGui\html2text.py

"""

################################################################

*VERY* preliminary html-to-text extractor, for text to be

quoted in replies and forwards, and displayed in the main

text display component. Only used when the main text part

is HTML (i.e., no alternative or other text parts to show).

We also need to know if this is HTML or not, but findMainText

already returns the main text's content type.

This is mostly provided as a first cut, to help get you started

on a more complete solution. It hasn't been polished, because

any result is better than displaying raw HTML, and it's probably

a better idea to migrate to an HTML viewer/editor widget in the

future anyhow. As is, PyMailGUI is still plain-text biased.

If (really, when) this parser fails to render well, users can

instead view and cut-and-paste from the web browser popped up

to display the HTML. See Chapter 19 for more on HTML parsing.

################################################################

"""

from html.parser import HTMLParser     # Python std lib parser (sax-like model)

class Parser(HTMLParser):              # subclass parser, define callback methods
    def __init__(self):                # text assumed to be str, any encoding ok
        HTMLParser.__init__(self)
        self.text = '[Extracted HTML text]'
        self.save = 0
        self.last = ''

    def addtext(self, new):
        if self.save > 0:
            self.text += new
            self.last = new

    def addeoln(self, force=False):
        if force or self.last != '\n':
            self.addtext('\n')

    def handle_starttag(self, tag, attrs):      # + others imply content start?
        if tag in ('p', 'div', 'table', 'h1', 'h2', 'li'):
            self.save += 1
            self.addeoln()
        elif tag == 'td':
            self.addeoln()
        elif tag == 'style':                    # + others imply end of prior?
            self.save -= 1
        elif tag == 'br':
            self.addeoln(True)
        elif tag == 'a':
            alts = [pair for pair in attrs if pair[0] == 'alt']
            if alts:
                name, value = alts[0]
                self.addtext('[' + value.replace('\n', '') + ']')

    def handle_endtag(self, tag):
        if tag in ('p', 'div', 'table', 'h1', 'h2', 'li'):
            self.save -= 1
            self.addeoln()
        elif tag == 'style':
            self.save += 1

    def handle_data(self, data):
        data = data.replace('\n', '')           # what about <PRE>?
        data = data.replace('\t', ' ')
        if data != ' ' * len(data):
            self.addtext(data)

    def handle_entityref(self, name):
        xlate = dict(lt='<', gt='>', amp='&', nbsp='').get(name, '?')
        if xlate:
            self.addtext(xlate)     # plus many others: show ? as is

def html2text(text):
    try:
        hp = Parser()
        hp.feed(text)
        return(hp.text)
    except:
        return text

if __name__ == '__main__':
    # to test me: html2text.py media\html2text-test\htmlmail1.html
    # parse file name in commandline, display result in tkinter Text
    # file assumed to be in Unicode platform default, but text need not be
    import sys, tkinter
    text = open(sys.argv[1], 'r').read()
    text = html2text(text)
    t = tkinter.Text()
    t.insert('1.0', text)
    t.pack()
    t.mainloop()

After this example and chapter had been written and finalized, I did a

search for HTML-to-text translators on the Web to try to find better

options, and I discovered a Python-coded solution which is much more

complete and robust than the simple prototype script here. Regrettably,

I also discovered that this system is named the same as the script

listed here!

This was unintentional and unforeseen (alas, developers are predis-

posed to think alike). For details on this more widely tested and much

better alternative, search the Web for html2text. It’s open source, but

follows the GPL license, and is available only for Python 2.X at this

writing (e.g., it uses the 2.X sgmllib which has been removed in favor

of the new html.parser in 3.X). Unfortunately, its GPL license may raise

concerns if it is shipped with PyMailGUI, in the book’s examples

package or otherwise; worse, its 2.X status means it cannot be used at

all with this book’s 3.X examples today.

There are additional plain-text extractor options on the Web worth

checking out, including BeautifulSoup and yet another named

html2text.py (no, really!). They also appear to be available for just 2.X

today, though naturally, this story may change by the time you read this

note. There’s no reason to reinvent the wheel, unless existing wheels

don’t fit your cart!

mailconfig: User Configurations

In Example 14-9, PyMailGUI’s mailconfig user settings module is listed. This program

has its own version of this module because many of its settings are unique for PyMail-

GUI. To use the program for reading your own email, set its initial variables to reflect

your POP and SMTP server names and login parameters. The variables in this module

also allow the user to tailor the appearance and operation of the program without

finding and editing actual program logic.

As is, this is a single-account configuration. We could generalize this module’s code to

allow for multiple email accounts, selected by input at the console when first imported;

in an upcoming section we’ll see a different approach that allows this module to be

extended externally.
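One way the multiple-account generalization mentioned above might look is sketched here. The account names, the second entry's placeholder servers, and the select_account helper are all hypothetical; only the first entry mirrors settings shown in Example 14-9. Taking the choice as a parameter keeps the selection logic testable, with console input as just one possible caller.

```python
accounts = {
    'pp4e': dict(popservername='pop.secureserver.net',
                 popusername='PP4E@learning-python.com',
                 smtpservername='smtpout.secureserver.net'),
    'other': dict(popservername='pop.example.com',        # placeholder values
                  popusername='someone@example.com',
                  smtpservername='smtp.example.com'),
}

def select_account(choice, default='pp4e'):
    """Return the settings dict for choice, falling back to the default account."""
    return accounts.get(choice.strip().lower(), accounts[default])

# at import time, a config module could ask once and publish the chosen settings:
#     choice = input('Account name? ')
#     globals().update(select_account(choice))
settings = select_account(' Other ')
print(settings['popservername'])          # pop.example.com
```

Publishing the selected dictionary as module-level names would let client code keep using attribute access such as mailconfig.popservername unchanged.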

Example 14-9. PP4E\Internet\Email\PyMailGui\mailconfig.py

"""

################################################################################

PyMailGUI user configuration settings.

Email scripts get their server names and other email config options from

this module: change me to reflect your machine names, sig, and preferences.

This module also specifies some widget style preferences applied to the GUI,

as well as message Unicode encoding policy and more in version 3.0. See

also: local textConfig.py, for customizing PyEdit pop-ups made by PyMailGUI.

Warning: PyMailGUI won't run without most variables here: make a backup copy!

Caveat: somewhere along the way this started using mixed case inconsistently...;

TBD: we could get some user settings from the command line too, and a configure

dialog GUI would be better, but this common module file suffices for now.

################################################################################

"""

#-------------------------------------------------------------------------------

# (required for load, delete) POP3 email server machine, user;

#-------------------------------------------------------------------------------

#popservername = '?Please set your mailconfig.py attributes?'

popservername = 'pop.secureserver.net' # see altconfigs/ for others

popusername = 'PP4E@learning-python.com'

#-------------------------------------------------------------------------------

# (required for send) SMTP email server machine name;

# see Python smtpd module for a SMTP server class to run locally ('localhost');

# note: your ISP may require that you be directly connected to their system:

# I once could email through Earthlink on dial up, but not via Comcast cable;

#-------------------------------------------------------------------------------

smtpservername = 'smtpout.secureserver.net'

#-------------------------------------------------------------------------------

# (optional) personal information used by PyMailGUI to fill in edit forms;

# if not set, does not fill in initial form values;

# signature -- can be a triple-quoted block, ignored if empty string;

# address -- used for initial value of "From" field if not empty,

# no longer tries to guess From for replies--varying success;

#-------------------------------------------------------------------------------

myaddress = 'PP4E@learning-python.com'

mysignature = ('Thanks,\n'

'--Mark Lutz (http://learning-python.com, http://rmi.net/~lutz)')

#-------------------------------------------------------------------------------

# (may be required for send) SMTP user/password if authenticated;

# set user to None or '' if no login/authentication is required, and set

# pswd to name of a file holding your SMTP password, or an empty string to

# force programs to ask (in a console, or GUI)

#-------------------------------------------------------------------------------

smtpuser = None # per your ISP

smtppasswdfile = '' # set to '' to be asked

#smtpuser = popusername

#-------------------------------------------------------------------------------

# (optional) PyMailGUI: name of local one-line text file with your POP

# password; if empty or file cannot be read, pswd is requested when first

# connecting; pswd not encrypted: leave this empty on shared machines;

# PyMailCGI always asks for pswd (runs on a possibly remote server);

#-------------------------------------------------------------------------------

poppasswdfile = r'c:\temp\pymailgui.txt' # set to '' to be asked

#-------------------------------------------------------------------------------

# (required) local file where sent messages are always saved;

# PyMailGUI 'Open' button allows this file to be opened and viewed;

# don't use '.' form if may be run from another dir: e.g., pp4e demos

#-------------------------------------------------------------------------------

#sentmailfile = r'.\sentmail.txt' # . means in current working dir

#sourcedir = r'C:\...\PP4E\Internet\Email\PyMailGui\'

#sentmailfile = sourcedir + 'sentmail.txt'

# determine automatically from one of my source files

import wraplines, os

mysourcedir = os.path.dirname(os.path.abspath(wraplines.__file__))

sentmailfile = os.path.join(mysourcedir, 'sentmail.txt')

#-------------------------------------------------------------------------------

# (defunct) local file where pymail saves POP mail (full text);

# PyMailGUI instead asks for a name in GUI with a pop-up dialog;

# Also asks for Split directory, and part buttons save in ./TempParts;

#-------------------------------------------------------------------------------

#savemailfile = r'c:\temp\savemail.txt' # not used in PyMailGUI: dialog

#-------------------------------------------------------------------------------

# (optional) customize headers displayed in PyMailGUI list and view windows;

# listheaders replaces default, viewheaders extends it; both must be tuple of

# strings, or None to use default hdrs;

#-------------------------------------------------------------------------------

listheaders = ('Subject', 'From', 'Date', 'To', 'X-Mailer')

viewheaders = ('Bcc',)

#-------------------------------------------------------------------------------

# (optional) PyMailGUI fonts and colors for text server/file message list

# windows, message content view windows, and view window attachment buttons;

# use ('family', size, 'style') for font; 'colorname' or hexstr '#RRGGBB' for

# color (background, foreground); None means use defaults; font/color of

# view windows can also be set interactively with texteditor's Tools menu;

# see also the setcolor.py example in the GUI part (ch8) for custom colors;

#-------------------------------------------------------------------------------

listbg = 'indianred' # None, 'white', '#RRGGBB'

listfg = 'black'

listfont = ('courier', 9, 'bold') # None, ('courier', 12, 'bold italic')

# use fixed-width font for list columns

viewbg = 'light blue' # was '#dbbedc'

viewfg = 'black'

viewfont = ('courier', 10, 'bold')

viewheight = 18 # max lines for height when opened (20)

partfg = None

partbg = None

# see Tk color names: aquamarine paleturquoise powderblue goldenrod burgundy ....

#listbg = listfg = listfont = None

#viewbg = viewfg = viewfont = viewheight = None # to use defaults

#partbg = partfg = None

#-------------------------------------------------------------------------------

# (optional) column at which mail's original text should be wrapped for view,

# reply, and forward; wraps at first delimiter to left of this position;

# composed text is not auto-wrapped: user or recipient's mail tool must wrap

# new text if desired; to disable wrapping, set this to a high value (1024?);

#-------------------------------------------------------------------------------

wrapsz = 90

#-------------------------------------------------------------------------------

# (optional) control how PyMailGUI opens mail parts in the GUI;

# for view window Split actions and attachment quick-access buttons;

# if not okayToOpenParts, quick-access part buttons will not appear in

# the GUI, and Split saves parts in a directory but does not open them;

# verifyPartOpens used by both Split action and quick-access buttons:

# all known-type parts open automatically on Split if this set to False;

# verifyHTMLTextOpen used by web browser open of HTML main text part:

#-------------------------------------------------------------------------------

okayToOpenParts = True # open any parts/attachments at all?

verifyPartOpens = False # ask permission before opening each part?

verifyHTMLTextOpen = False # if main text part is HTML, ask before open?

#-------------------------------------------------------------------------------

# (optional) the maximum number of quick-access mail part buttons to show

# in the middle of view windows; after this many, a "..." button will be

# displayed, which runs the "Split" action to extract additional parts;

#-------------------------------------------------------------------------------

maxPartButtons = 8 # how many part buttons in view windows

# *** 3.0 additions follow ***

#-------------------------------------------------------------------------------

# (required, for fetch) the Unicode encoding used to decode fetched full message

# bytes, and to encode and decode message text stored in text-mode save files; see

# the book's Chapter 13 for details: this is a limited and temporary approach to

# Unicode encodings until a new bytes-friendly email package parser is provided

# which can handle Unicode encodings more accurately on a message-level basis;

# note: 'latin1' (an 8-bit encoding which is a superset of 7-bit ascii) was

# required to decode message in some old email save files I had, not 'utf8';

#-------------------------------------------------------------------------------

fetchEncoding = 'latin-1' # how to decode and store full message text (ascii?)

#-------------------------------------------------------------------------------

# (optional, for send) Unicode encodings for composed mail's main text plus all

# text attachments; set these to None to be prompted for encodings on mail send,

# else uses values here across entire session; default='latin-1' if GUI Cancel;

# in all cases, falls back on UTF-8 if your encoding setting or input does not

# work for the text being sent (e.g., ascii chosen for reply to non-ascii text,

# or non-ascii attachments); the email package is pickier than Python about

# names: latin-1 is known (uses qp MIME), but latin1 isn't (uses base64 MIME);

# set these to sys.getdefaultencoding() result to choose the platform default;

# encodings of text parts of fetched email are automatic via message headers;

#-------------------------------------------------------------------------------

mainTextEncoding = 'ascii' # main mail body text part sent (None=ask)

attachmentTextEncoding = 'ascii' # all text part attachments sent (utf-8, latin-1)

#-------------------------------------------------------------------------------

# (optional, for send) set this to a Unicode encoding name to be applied to

# non-ASCII headers, as well as non-ASCII names in email addresses in headers,

# in composed messages when they are sent; None means use the UTF-8 default,

# which should work for most use cases; email names that fail to decode are

# dropped (the address part is used); note that header decoding is performed

# automatically for display, according to header content, not user setting;

#-------------------------------------------------------------------------------

headersEncodeTo = None # how to encode non-ASCII headers sent (None=UTF-8)

#-------------------------------------------------------------------------------

# (optional) select text, HTML, or both versions of the help document;

# always shows one or the other: displays HTML if both of these are turned off

#-------------------------------------------------------------------------------

showHelpAsText = True # scrolled text, with button for opening source files

showHelpAsHTML = True # HTML in a web browser, without source file links

#-------------------------------------------------------------------------------

# (optional) if True, don't show a selected HTML text message part in a PyEdit

# popup too if it is being displayed in a web browser; if False show both, to

# see Unicode encoding name and effect in a text widget (browser may not know);

#-------------------------------------------------------------------------------

skipTextOnHtmlPart = False # don't show html part in PyEdit popup too

#-------------------------------------------------------------------------------

# (optional) the maximum number of mail headers or messages that will be

# downloaded on each load request; given this setting N, PyMailGUI fetches at

# most N of the most recently arrived mails; older mails outside this set are

# not fetched from the server, but are displayed as empty/dummy emails; if this

# is assigned to None (or 0), loads will have no such limit; use this if you

# have very many mails in your inbox, and your Internet or mail server speed

# makes full loads too slow to be practical; PyMailGUI also loads only

# newly-arrived headers, but this setting is independent of that feature;

#-------------------------------------------------------------------------------

fetchlimit = 50 # maximum number headers/emails to fetch on loads

#-------------------------------------------------------------------------------

# (optional) initial width, height of mail index lists (chars x lines); just

# a convenience, since the window can be resized/expanded freely once opened;

#-------------------------------------------------------------------------------

listWidth = None # None = use default 74

listHeight = None # None = use default 15

#-------------------------------------------------------------------------------

# (optional, for reply) if True, the Reply operation prefills the reply's Cc

# with all original mail recipients, after removing duplicates and the new sender;

# if False, no CC prefill occurs, and the reply is configured to reply to the

# original sender only; the Cc line may always be edited later, in either case.

#-------------------------------------------------------------------------------

repliesCopyToAll = True # True=reply to sender + all recipients, else sender

#end
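The send-encoding comments in this file describe a fallback to UTF-8 when the configured encoding cannot represent the text being sent. A minimal sketch of that rule follows; the function name encode_for_send is hypothetical and is not taken from PyMailGUI's source, which may handle more cases:

```python
def encode_for_send(text, encoding='ascii'):
    """Encode text per a mailconfig-style setting, falling back on UTF-8.

    Returns the encoded bytes plus the encoding name actually used.
    """
    try:
        return text.encode(encoding), encoding
    except (UnicodeError, LookupError):      # text doesn't fit, or unknown name
        return text.encode('utf-8'), 'utf-8'
```

With the 'ascii' setting shown above, plain ASCII text is sent as is, while text containing non-ASCII characters would be sent as UTF-8 instead of failing.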

textConfig: Customizing Pop-Up PyEdit Windows

The prior section’s mailconfig module provides user settings for tailoring the PyEdit

component used to view and edit main mail text, but PyMailGUI also uses PyEdit to

display other kinds of pop-up text, including raw mail text, some text attachments, and

source code in its help system. To customize display for these pop ups, PyMailGUI

relies on PyEdit’s own utility, which attempts to load a module like that in

Example 14-10 from the client application’s own directory. By contrast, PyEdit’s Unicode

settings are loaded from the single textConfig module in its own package’s directory

since they are not expected to vary across a platform (see Chapter 11 for more details).

Example 14-10. PP4E\Internet\Email\PyMailGui\textConfig.py

"""

customize PyEdit pop-up windows other than the main mail text component;

this module (not its package) is assumed to be on the path for these settings;

PyEdit Unicode settings come from its own package's textConfig.py, not this;

"""

bg = 'beige' # absent=white; colorname or RGB hexstr

fg = 'black' # absent=black; e.g., 'beige', '#690f96'

# etc -- see PP4E\Gui\TextEditor\textConfig.py

# font = ('courier', 9, 'normal')

# height = 20 # Tk default: 24 lines

# width = 80 # Tk default: 80 characters
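To illustrate how an optional settings module like this one might be consumed, here is a minimal sketch that imports a module by name and falls back on defaults for anything missing. The function load_text_settings and its default values are assumptions for illustration; PyEdit's actual loading utility is covered in Chapter 11 and may differ:

```python
import importlib

def load_text_settings(modname='textConfig', defaults=None):
    """Return display settings, overriding defaults with module attributes."""
    settings = dict(defaults or {'bg': 'white', 'fg': 'black'})
    try:
        module = importlib.import_module(modname)   # must be on sys.path
    except ImportError:
        return settings                             # no module: all defaults
    for name in ('bg', 'fg', 'font', 'height', 'width'):
        if hasattr(module, name):                   # absent names keep defaults
            settings[name] = getattr(module, name)
    return settings
```

Because absent names simply keep their defaults, a module like the one above can leave font, height, and width commented out.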

PyMailGUIHelp: User Help Text and Display

Finally, Example 14-11 lists the module that defines the text displayed in PyMailGUI’s

help pop up as one triple-quoted string, as well as a function for displaying the HTML

rendition of this text. The HTML version of help itself is in a separate file not listed in

full here but included in the book’s examples package.

In fact, I’ve omitted most of the help text string, too, to conserve space here (it spanned

11 pages in the prior edition, and would be longer in this one!). For the full story, see

this module in the examples package, or run PyMailGUI live and click the help bar at

the top of its main server list window to learn more about how PyMailGUI’s interface

operates. In fact, you probably should; the help display may explain some properties of

PyMailGUI not introduced by the demo and other material earlier in this chapter.

The HTML rendition of help includes section links, and is popped up in a web browser.

Because the text version also is able to pop up source files and minimizes external

dependencies (HTML fails if no browser can be located), both the text and HTML

versions are provided and selected by users in the mailconfig module. Other schemes

are possible (e.g., converting HTML to text by parsing as a fallback option), but they

are left as suggested improvements.
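The selection rule can be sketched in a few lines. This assumes the behavior stated in mailconfig's comments for showHelpAsText and showHelpAsHTML (HTML is shown if both are turned off); the function name help_modes is hypothetical:

```python
def help_modes(show_text, show_html):
    """Return the help displays to open, given the two mailconfig flags."""
    modes = []
    if show_text:
        modes.append('text')
    if show_html or not show_text:     # always show one: HTML if both are off
        modes.append('html')
    return modes
```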

Example 14-11. PP4E\Internet\Email\PyMailGui\PyMailGuiHelp.py (partial)

"""

##########################################################################

PyMailGUI help text string and HTML display function;

History: this display began as an info box pop up which had to be

narrow for Linux; it later grew to use scrolledtext with buttons

instead; it now also displays an HTML rendition in a web browser;

2.1/3E: the help string is stored in this separate module to avoid

distracting from executable code. As coded, we throw up this text

in a simple scrollable text box; in the future, we might instead

use an HTML file opened with a browser (use webbrowser module, or

run a "browser help.html" or DOS "start help.html" with os.system);

3.0/4E: the help text is now also popped up in a web browser in HTML

form, with lists, section links, and separators; see the HTML file

PyMailGuiHelp.html in the examples package for the simple HTML

translation of the help text string here, popped up in a browser;

both the scrolled text widget and HTML browser forms are currently

supported: change mailconfig.py to use the flavor(s) you prefer;

##########################################################################

"""

# new HTML help for 3.0/4E

helpfile = 'PyMailGuiHelp.html' # see book examples package

def showHtmlHelp(helpfile=helpfile):

    """

    3.0: popup HTML version of help file in a local web browser via webbrowser;

    this module is importable, but html file might not be in current working dir

    """

    import os, webbrowser

    mydir = os.path.dirname(__file__) # dir of this module's filename

    mydir = os.path.abspath(mydir) # make absolute: may be .., etc

    webbrowser.open_new('file://' + os.path.join(mydir, helpfile))

##########################################################################

# string for older text display: client responsible for GUI construction

##########################################################################

helptext = """PyMailGUI, version 3.0

May, 2010 (2.1 January, 2006)

Programming Python, 4th Edition

Mark Lutz, for O'Reilly Media, Inc.

PyMailGUI is a multiwindow interface for processing email, both online and

offline. Its main interfaces include one list window for the mail server,

zero or more list windows for mail save files, and multiple view windows for

composing or viewing emails selected in a list window. On startup, the main

(server) list window appears first, but no mail server connection is attempted

until a Load or message send request. All PyMailGUI windows may be resized,

which is especially useful in list windows to see additional columns.

Note: To use PyMailGUI to read and write email of your own, you must change

the POP and SMTP server names and login details in the file mailconfig.py,

located in PyMailGUI's source-code directory. See section 11 for details.

Contents:

0) VERSION ENHANCEMENTS

1) LIST WINDOW ACTIONS

2) VIEW WINDOW ACTIONS

3) OFFLINE PROCESSING

4) VIEWING TEXT AND ATTACHMENTS

5) SENDING TEXT AND ATTACHMENTS

6) MAIL TRANSFER OVERLAP

7) MAIL DELETION

8) INBOX MESSAGE NUMBER SYNCHRONIZATION

9) LOADING EMAIL

10) UNICODE AND INTERNATIONALIZATION SUPPORT

11) THE mailconfig CONFIGURATION MODULE

12) DEPENDENCIES

13) MISCELLANEOUS HINTS ("Cheat Sheet")

...rest of file omitted...

13) MISCELLANEOUS HINTS ("Cheat Sheet")

- Use ',' between multiple addresses in To, Cc, and Bcc headers.

- Addresses may be given in the full '"name" <addr>' form.

- Payloads and headers are decoded on fetches and encoded on sends.

- HTML mails show extracted plain text plus HTML in a web browser.

- To, Cc, and Bcc receive composed mail, but no Bcc header is sent.

- If enabled in mailconfig, Bcc is prefilled with sender address.

- Reply and Fwd automatically quote the original mail text.

- If enabled, replies prefill Cc with all original recipients.

- Attachments may be added for sends and are encoded as required.

- Attachments may be opened after View via Split or part buttons.

- Double-click a mail in the list index to view its raw text.

- Select multiple mails to process as a set: Ctrl|Shift + click, or All.

- Sent mails are saved to a file named in mailconfig: use Open to view.

- Save pops up a dialog for selecting a file to hold saved mails.

- Save always appends to the chosen save file, rather than erasing it.

- Split asks for a save directory; part buttons save in ./TempParts.

- Open and save dialogs always remember the prior directory.

- Use text editor's Save to save a draft of email text being composed.

- Passwords are requested if/when needed, and not stored by PyMailGUI.

- You may list your password in a file named in mailconfig.py.

- To print emails, "Save" to a text file and print with other tools.

- See the altconfigs directory for using with multiple email accounts.

- Emails are never deleted from the mail server automatically.

- Delete does not reload message headers, unless it fails.

- Delete checks your inbox to make sure it deletes the correct mail.

- Fetches detect inbox changes and may automatically reload the index.

- Any number of sends and disjoint fetches may overlap in time.

- Click this window's Source button to view PyMailGUI source-code files.

- Watch http://www.rmi.net/~lutz for updates and patches

- This is an Open Source system: change its code as you like.

"""

if __name__ == '__main__':

    print(helptext) # to stdout if run alone

    input('Press Enter key') # pause in DOS console pop ups
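The module's showHtmlHelp function builds its URL by concatenating 'file://' with a filesystem path. As a possible alternative, the standard library's pathlib can generate a properly escaped file URL. This is a sketch only, not the book's code, and help_file_url is a hypothetical name:

```python
from pathlib import Path

def help_file_url(helpfile, moduledir):
    """Build a file: URL for helpfile located in moduledir."""
    fullpath = Path(moduledir).resolve() / helpfile   # as_uri needs an absolute path
    return fullpath.as_uri()                          # escapes spaces and the like
```

The result could be passed to webbrowser.open_new just as the concatenated string is in the listing.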

See the examples package for the HTML help file, the first few lines of which are shown

in Example 14-12; it’s a simple translation of the module’s help text string (adding a

bit more pizzazz to this page is left in the suggested exercise column).

Example 14-12. PP4E\Internet\Email\PyMailGui\PyMailGuiHelp.html (partial)

<HTML>

<TITLE>PyMailGUI 3.0 Help</TITLE>

<!-- TO DO: add pictures, screen shots, and such -->

<BODY>

<H1 align=center>PyMailGUI, Version 3.0</H1>

<B><I>May, 2010 (2.1 January, 2006)</I></B><BR>

<B><I>Programming Python, 4th Edition</I></B><BR>

<B><I>Mark Lutz, for O'Reilly Media, Inc.</I></B>

<P>

<I>PyMailGUI</I> is a multiwindow interface for processing email, both online and

...rest of file omitted...

altconfigs: Configuring for Multiple Accounts

Though not an “official” part of the system, I use a few additional short files to launch

and test it. If you have multiple email accounts, it can be inconvenient to change a

configuration file every time you want to open one in particular. Moreover, if you open

multiple PyMailGUI sessions for your accounts at the same time, it would be better if

they could use custom appearance and behavior schemes to make them distinct.

To address this, the altconfigs directory in the examples source directory provides a

simple way to select an account and configurations for it at start-up time. It defines a

new top-level script which tailors the module import search path, along with a

mailconfig that prompts for and loads a custom configuration module whose suffix is

named by console input. A launcher script is also provided to run without module

search path configurations (from PyGadgets or a desktop shortcut, for example),

without requiring PYTHONPATH settings for the PP4E root. Examples 14-13 through

14-17 list the files involved.

Example 14-13. PP4E\Internet\Email\PyMailGui\altconfigs\PyMailGui.py

import sys # ..\PyMailGui.py or 'book' for book configs

sys.path.insert(1, '..') # add visibility for real dir

exec(open('../PyMailGui.py').read()) # do this, but get mailconfig here

Example 14-14. PP4E\Internet\Email\PyMailGui\altconfigs\mailconfig.py

above = open('../mailconfig.py').read() # copy version above here (hack?)

open('mailconfig_book.py', 'w').write(above) # used for 'book' and as others' base

acct = input('Account name?') # book, rmi, train

exec('from mailconfig_%s import *' % acct) # . is first on sys.path

Example 14-15. PP4E\Internet\Email\PyMailGui\altconfigs\mailconfig_rmi.py

from mailconfig_book import * # get base in . (copied from ..)

popservername = 'pop.rmi.net' # this is a big inbox: 4800 emails!

popusername = 'lutz'

myaddress = 'lutz@rmi.net'

listbg = 'navy'

listfg = 'white'

listHeight = 20 # higher initially

viewbg = '#dbbedc'

viewfg = 'black'

wrapsz = 80 # wrap at 80 cols

fetchlimit = 300 # load more headers

Example 14-16. PP4E\Internet\Email\PyMailGui\altconfigs\mailconfig_train.py

from mailconfig_book import * # get base in . (copied from ..)

popusername = 'lutz@learning-python.com'

myaddress = 'lutz@learning-python.com'

listbg = 'wheat' # goldenrod, dark green, beige

listfg = 'navy' # chocolate, brown,...

viewbg = 'aquamarine'

viewfg = 'black'

wrapsz = 80

viewheaders = None # no Bcc

fetchlimit = 100 # load more headers

Example 14-17. PP4E\Internet\Email\PyMailGui\altconfigs\launch_PyMailGui.py

# to run without PYTHONPATH setup (e.g., desktop)

import os # Launcher.py is overkill

os.environ['PYTHONPATH'] = r'..\..\..\..\..' # hmm; generalize me

os.system('PyMailGui.py') # inherits path env var

Account files like those in Examples 14-15 and 14-16 can import the base “book”

module (to extend it) or not (to replace it entirely). To use these alternative account

configurations, run a command line like the following or run the self-configuring

launcher script in Example 14-17 from any location. Either way, you can open these

accounts’ windows to view the included saved mails, but be sure to change

configurations for your own email accounts and preferences first if you wish to fetch or send mail

from these clients:

C:\...\PP4E\Internet\Email\PyMailGui\altconfigs> PyMailGui.py

Account name?rmi

Add a “start” to the beginning of this command to keep your console alive on Windows

so you can open multiple accounts (try a “&” at the end on Unix). Figure 14-45 earlier

shows the scene with all three of my accounts open in PyMailGUI. I keep them open

perpetually on my desktop, since a Load fetches just newly arrived headers no matter

how long the GUI may have sat dormant, and a Send requires nothing to be loaded at

all. While they’re open, the alternative color schemes make the accounts’ windows

distinct. A desktop shortcut to the launcher script makes opening my accounts even

easier.

As is, account names are only requested when this special PyMailGui.py file is run

directly, and not when the original file is run directly or by program launchers (in which

case there may be no stdin to read). Extending a module like mailconfig which might

be imported in multiple places this way turns out to be an interesting task (which is

largely why I don’t consider its quick solution here to be an official end-user feature).

For instance, there are other ways to allow for multiple accounts, including:

• Changing the single mailconfig module in-place

• Importing alternative modules and storing them as key “mailconfig” in sys.modules

• Copying alternative module variables to mailconfig attributes using __dict__ and

setattr

• Using a class for configuration to better support customization in subclasses

• Issuing a pop-up in the GUI to prompt for an account name after or before the

main window appears
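Two of the alternatives listed above, the sys.modules approach and the attribute-copying approach, can be sketched briefly. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the mailconfig_NAME module naming simply follows the altconfigs examples:

```python
import importlib
import sys

def install_account_config(acct, alias='mailconfig'):
    """Load mailconfig_<acct> and register it under the alias in sys.modules."""
    module = importlib.import_module('mailconfig_%s' % acct)
    sys.modules[alias] = module        # later "import <alias>" finds this entry
    return module

def copy_settings(source, target):
    """Copy all non-dunder names from one module object to another."""
    for name, value in vars(source).items():
        if not name.startswith('__'):
            setattr(target, name, value)
```

Either approach has to run before other modules fetch names from mailconfig; code that has already run a "from mailconfig import name" holds its own copy of the original value.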

And so on. The separate subdirectory scheme used here was chosen to minimize

impacts on existing code in general; to avoid changes to the existing mailconfig module

specifically (which works fine for the single account case); to avoid requiring extra user

input of any kind in single account cases; and to allow for the fact that an “import

module1 as module2” statement doesn’t prevent “module2” from being imported

directly later. This last point is more fraught with peril than you might expect; importing

a customized version of a module is not merely a matter of using the “as” renaming

extension:

import m1 as m2 # custom import: load m1 as m2 alternative

print(m2.attr) # prints attr in m1.py

import m2 # later imports: loads m2.py anyhow!

print(m2.attr) # prints attr in m2.py

In other words, this is a quick-and-dirty solution that I originally wrote for testing

purposes, and it seems a prime candidate for improvement—along with the other ideas

in the next section’s chapter wrap up.
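The import pitfall shown in the snippet above can be verified with a self-contained script. The following sketch writes two throwaway modules named m1.py and m2.py to a temporary directory (the file contents are made up for the demonstration) and then repeats the two imports:

```python
import importlib
import os
import sys
import tempfile

def demo_import_alias():
    """Show that 'import m1 as m2' does not stop a later 'import m2'."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ('m1', 'm2'):
            with open(os.path.join(tmp, name + '.py'), 'w') as f:
                f.write('attr = %r\n' % ('from ' + name + '.py'))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            import m1 as m2            # binds the name m2 to module m1
            first = m2.attr
            import m2                  # no 'm2' in sys.modules: loads m2.py
            second = m2.attr
        finally:
            sys.path.remove(tmp)       # undo the demo's path and module changes
            sys.modules.pop('m1', None)
            sys.modules.pop('m2', None)
    return first, second
```

The "as" clause only renames the variable in the importing scope; the sys.modules table still records the module under its real name, so the second import finds no m2 entry and loads the file.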

Ideas for Improvement

Although I use the 3.0 version of PyMailGUI as is on a regular basis for both personal

and business communications, there is always room for improvement to software, and

this system is no exception. If you wish to experiment with its code, here are a few

suggested projects to close out this chapter:

Column sorts and list layout

Mail list windows could be sorted by columns on demand. This may require a more

sophisticated list window structure which presents columns more distinctly. The

current display of mail lists seems like the most obvious candidate for cosmetic

upgrade in general, and any column sorting solution would likely address this as

well. tkinter extensions such as the Tix HList widget may show promise here, and

the third-party TkinterTreectrl supports multicolumn sortable listboxes, too, but

is available only for Python 2.X today; consult the Web and other resources for

pointers and details.
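Independent of the widget chosen, the sorting step itself is simple. A minimal sketch, assuming each list row is a tuple of header strings (the row layout and the name sort_rows are illustrative, not PyMailGUI's data structures):

```python
def sort_rows(rows, column, reverse=False):
    """Return rows ordered by one column, ignoring letter case."""
    return sorted(rows, key=lambda row: row[column].lower(), reverse=reverse)
```

A real implementation would also need to keep each row tied to its message number, since list position is otherwise used to select mails.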

Mail save file (and sent file) size

The implementation of save-mail files limits their size by loading them into memory

all at once; a DBM keyed-access implementation may work around this constraint.

See the list windows module comments for ideas. This also applies to sent-mail

save files, though the user can limit their sizes with periodic deletions; users might

also benefit from a prompt for deletions if they grow too large.
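As a rough sketch of the keyed-access idea, Python's shelve module (built on DBM files) can store each message's text under its own key, so that only requested mails are loaded into memory. The function names and the choice of string keys are assumptions for illustration:

```python
import shelve

def save_mail(path, key, fulltext):
    """Store one message's full text under a string key."""
    with shelve.open(path) as db:
        db[key] = fulltext

def load_mail(path, key):
    """Fetch a single message by key without loading the others."""
    with shelve.open(path) as db:
        return db[key]

def list_keys(path):
    """Return the stored message keys in sorted order."""
    with shelve.open(path) as db:
        return sorted(db.keys())
```

The trade-off is that such files are no longer plain text that can be viewed or printed with other tools.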

Embedded links

Hyperlink URLs within messages could be highlighted visually and made to spawn

a web browser automatically when clicked by using the launcher tools we met in

the GUI and system parts of this book (tkinter’s text widget supports links directly).
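Locating the links is the first step. The sketch below uses a deliberately simple regular expression (an assumption; real-world URL matching is more involved) to return each URL with its character offsets:

```python
import re

# simplistic pattern: http or https, up to whitespace, quote, or angle bracket
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')

def find_urls(text):
    """Return (start, end, url) for each URL-like substring in text."""
    return [(m.start(), m.end(), m.group()) for m in URL_PATTERN.finditer(text)]
```

In a tkinter Text widget, the offsets could then be marked with tag_add, styled with tag_config, and made clickable with tag_bind, with the handler passing the URL to webbrowser.open.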

Help text redundancy

In this version, the help text had grown so large that it is also implemented as

HTML and displayed in a web browser using Python’s webbrowser module (instead

of or in addition to text, per mailconfig settings). That means there are currently

two copies of the basic help text: simple text and HTML. This is less than ideal

from a maintenance perspective going forward.

We may want to either drop the simple text version altogether, or attempt to extract

the simple text from the HTML with Python’s html.parser module to avoid

redundant copies; see Chapter 19 for more on HTML parsing in general, and see

PyMailGUI’s new html2text module for a plain-text extraction tool prototype. The

HTML help version also does not include links to display source files; these could

be inserted into the HTML automatically with string formatting, though it’s not

clear what all browsers will do with Python source code (some may try to run it).
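A bare-bones version of the extraction idea might look like the following. It is a sketch only, not the html2text module mentioned above: it keeps all text between tags, including any script or style content, and applies no layout rules:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the nonblank text found between HTML tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()                     # flush any text still buffered
    return '\n'.join(parser.chunks)
```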

More threading contexts

Message Save and Split file writes could also be threaded for worst-case scenarios.

For pointers on making Saves parallel, see the comments in the file class of

ListWindows.py; there may be some subtle issues that require both thread locks and

general file locking for potentially concurrent updates. List window index fills

might also be threaded for pathologically large mailboxes and woefully slow

machines (optimizing to avoid reparsing headers may help here, too).
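The thread-lock half of this idea can be sketched as follows. The names are hypothetical, and the lock only serializes writers within one process; concurrent updates from separate PyMailGUI processes would still need file locking:

```python
import threading

save_lock = threading.Lock()           # one lock shared by all save threads

def append_mail(path, fulltext):
    """Append text to a save file, one writer at a time."""
    with save_lock:
        with open(path, 'a', encoding='utf-8') as f:
            f.write(fulltext)

def threaded_save(path, fulltext):
    """Run the append in a thread so the GUI is not blocked."""
    thread = threading.Thread(target=append_mail, args=(path, fulltext))
    thread.start()
    return thread
```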

Attachment list deletes

There is currently no way to delete an attachment once it has been added in

compose windows. This might be supported by adding quick-access part buttons to

compose windows, too, which could verify and delete the part when clicked.

Spam filtering

We could add an automatic spam filter for mails fetched, in addition to any

provided at the email server or ISP. The Python-based SpamBayes might help. This is

often better implemented by servers than clients, but not all ISPs filter spam.

Improve multiple account usage

Per the prior section, the current system selects one of multiple email accounts and

uses its corresponding mail configuration module by running special code in the

altconfigs subdirectory. This works for a book example, but it would be fairly

straightforward to improve for broader audiences.

Increased visibility for sent file

We may want to add an explicit button for opening the sent-mails file. PyMailGUI

already does save sent messages to a text file automatically, which may be opened

currently with the list window’s Open button. Frankly, though, this feature may

be a too-well-kept secret—I forgot about it myself when I revisited the program

for this edition! It might also be useful to allow sent-mail saves to be disabled in

mailconfig for users who might never delete from this file (it can grow large fairly

quickly; see the earlier prompt-for-deletion suggestion as well).

Thread queue speed tuning

As mentioned when describing version 3.0 changes, the thread queue has been

sped up by as much as a factor of 10 in this version to quicken initial header

downloads. This is achieved both by running more than one callback per timer event

and scheduling timer events to occur twice as often as before. Checking the queue

too often, however, might increase CPU utilization beyond acceptable levels on

some machines. On my Windows laptop, this overhead is negligible (the program’s

CPU utilization is 0% when idle), but you may want to tune this if it’s significant

on your platform.

See the list windows code for speed settings, and threadtools.py in Chapter 10 for

the base code. In general, increasing the number of callbacks per event and

decreasing timer frequency will decrease CPU drain without sacrificing

responsiveness. (And if I had a nickel for every time I said that…)
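The two tuning knobs can be seen in a stripped-down queue consumer. This is a sketch of the general technique rather than the threadtools.py code; the (callback, args) item format and the name run_callbacks are assumptions:

```python
import queue

def run_callbacks(callback_queue, per_event=5):
    """Run up to per_event queued callbacks; return how many were run."""
    done = 0
    while done < per_event:
        try:
            callback, args = callback_queue.get_nowait()
        except queue.Empty:
            break                      # nothing left: wait for the next timer
        callback(*args)
        done += 1
    return done
```

In a tkinter program, a timer handler would call this function and then reschedule itself with the widget after method; per_event and the after delay are the two values to adjust.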

Mailing lists

We could add support for mailing lists, allowing users to associate multiple email

addresses with a saved list name. On sends to a list name, the mail would be sent

to all on the list (the To addresses passed to smtplib), but the email list could be

used for the email’s To header line. See Chapter 13’s SMTP coverage for mailing

list–related examples.
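A minimal sketch of the expansion step, assuming list names are stored in a dictionary that maps each name to its member addresses (the storage format and function name are hypothetical):

```python
def expand_recipients(to_names, mailing_lists):
    """Replace list names with member addresses, dropping duplicates."""
    addresses = []
    for name in to_names:
        for addr in mailing_lists.get(name, [name]):   # non-lists pass through
            if addr not in addresses:
                addresses.append(addr)
    return addresses
```

The expanded addresses would go to smtplib as the actual recipients, while the original names remain available for the To header line.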

HTML main text views and edits

PyMailGUI is still oriented toward supporting only plain text for the main text of

a message, despite the fact that some mailers today are more HTML-biased in this

regard. This partly stems from the fact that PyMailGUI uses a simple tkinter Text

widget for main text composition. PyMailGUI can display such messages’ HTML

in a popped-up web browser, and it attempts to extract text from the HTML for

display per the next note, but it doesn’t come with its own HTML editor. Fully

supporting HTML for main message text will likely require a tkinter extension (or,

regrettably, a port to another GUI toolkit with working support for this feature).

HTML parser honing

On a related note, as described earlier, this version includes a simple-minded

HTML parser, applied to extract text from HTML main (or only) text parts when

they are displayed or quoted in replies and forwards. As also mentioned earlier,

this parser is nowhere near complete or robust; for production-level quality, this

would have to be improved by testing over a large set of HTML emails. Better yet,

watch for a Python 3.X–compatible version of more robust and complete open

source alternatives, such as the third-party html2text.py utility of the same name

described in this chapter’s earlier note. The open source BeautifulSoup system offers

another lenient and forgiving HTML parser, but is based on SGMLParser tools

available in 2.X only (removed in 3.X).

Text/HTML alternative mails

Also in the HTML department, there is presently no support for sending both text

and HTML versions of a mail as a MIME multipart/alternative message—a popular

scheme which supports both text- and HTML-based clients and allows users to

choose which to use. Such messages can be viewed (both parts are offered in the

GUI), but cannot be composed. Again, since there is no support for HTML editing

anyhow, this is a moot point; if such an editor is ever added, we’d need to support

this sort of mail structure in mailtools message object construction code and

refactor parts of its current send logic so that it can be shared.
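For reference, the message structure itself is easy to build with the standard email package. The sketch below is independent of mailtools, and make_alternative is a hypothetical name; it attaches the plain text first and the HTML last, since clients generally prefer the last alternative they can display:

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def make_alternative(text, html, charset='utf-8'):
    """Build a multipart/alternative message from text and HTML versions."""
    msg = MIMEMultipart('alternative')
    msg.attach(MIMEText(text, 'plain', charset))   # fallback version first
    msg.attach(MIMEText(html, 'html', charset))    # preferred version last
    return msg
```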

Internationalized headers throw list columns off

As is so often true in software, one feature added in this version broke another

already present: the fonts used for display of some non-ASCII Unicode header fields

are large enough to throw off the fixed-width columns in mail index list windows.

They rely on the assumption that N characters is always the same width among all

mails, and this is no longer true for some Chinese and other character set encodings.

This isn’t a showstopper—it only occurs when some i18n headers are displayed,

and simply means that “|” column separators are askew for such mails only, but

could still be addressed. The fix here is probably to move to a more sophisticated

list display, and might be resolved as a side effect of allowing for the column sorts

described earlier.

Address books

PyMailGUI has no notion of automatically filling in an email address from an

address book, as many modern email clients do. Adding this would be an interesting

extension; low-level keyboard event binding may allow matching as addresses are

typed, and Python’s pickle and shelve modules of Chapters 1 and 17 might come

in handy for data storage.

Spelling checker

There is currently no spelling checker of the sort most email programs have today.

This could be added in PyMailGUI, but it would probably be more appropriate to

add it in the PyEdit text edit component/program that it uses, so the spell-checking

would be inherited by all PyEdit clients. A quick web search reveals a variety of

options, including the interesting PyEnchant third-party package, none of which

we have space to explore here.

Mail searches

Similarly, there is no support for searching emails’ content (headers or bodies) for

a given string. It’s not clear how this should be provided given that the system

fetches and caches just message headers until a mail is requested, but searching

large inboxes can be convenient. As is, this can be performed manually by running

a Save to store fetched mails in a text file and searching in that file externally.
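For mails whose full text has already been fetched, a search could be sketched with the email package's parser. The function name mail_matches is hypothetical, and the code assumes the message text is available as a str:

```python
import email

def mail_matches(fulltext, target):
    """Return True if target appears in any header or text part of a mail."""
    msg = email.message_from_string(fulltext)
    target = target.lower()
    for name, value in msg.items():
        if target in str(value).lower():
            return True
    for part in msg.walk():
        if part.get_content_maintype() == 'text':
            payload = part.get_payload(decode=True) or b''
            charset = part.get_content_charset() or 'latin-1'
            try:
                text = payload.decode(charset, errors='replace')
            except LookupError:                    # unknown charset name
                text = payload.decode('latin-1')
            if target in text.lower():
                return True
    return False
```

This does not decode internationalized headers, and mails that have not been downloaded would first have to be fetched, which is the open design question noted above.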

Frozen binary distribution

As a desktop program, PyMailGUI seems an ideal candidate for packaging as a

self-contained frozen binary executable, using tools such as PyInstaller, Py2Exe, and

others. When distributed this way, users need not install Python, since the Python

runtime is embedded in the executable.

Selecting Reply versus Reply-All in the GUI

As described in the 3.0 changes overview earlier, in this version, Reply by default

now copies all the original mail’s recipients by prefilling the Cc line, in addition to

replying to the original sender. This Cc feature can be turned off in mailconfig

because it may not be desirable in all cases. Ideally, though, this should be

selectable in the GUI on a mail-by-mail basis, not per session. Adding another button to

list windows for ReplyAll would suffice; since this feature was added too late in

this project for GUI changes, though, this will have to be relegated to the domain

of suggested exercise.
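The Cc prefill that a ReplyAll button would trigger can be sketched with the standard email.utils tools. This follows the rule described in mailconfig (all original recipients, minus duplicates and the new sender), but the function itself is a hypothetical illustration rather than PyMailGUI's code:

```python
from email.utils import getaddresses, formataddr

def reply_all_cc(to_header, cc_header, sender_addr):
    """Build a Cc line from original To and Cc, minus sender and duplicates."""
    seen = {sender_addr.lower()}
    result = []
    for name, addr in getaddresses([to_header, cc_header]):
        if addr and addr.lower() not in seen:
            seen.add(addr.lower())
            result.append(formataddr((name, addr)))
    return ', '.join(result)
```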

Propagating attachments?

When replying to or forwarding an email, PyMailGUI discards any attachments

on the original message. This is by design, partly because there is currently no way

to delete attached parts in the GUI prior to sending (you couldn’t remove selectively

and couldn’t remove all), and partly because this system’s current sole user prefers

to work this way.

Users can work around this by running a Split to save all parts in a directory, and

then adding any desired attachments to the mail from there. Still, it might be better

to allow the user to choose that this happen automatically for replies and forwards.

Similarly, forwarding HTML mails well currently requires saving and attaching the

HTML part to avoid quoting the text; this might be similarly addressed by parts

propagation in general.

Disable editing for viewed mails?

Mail text is editable in message view windows, even though a new mail is not being

composed. This is deliberate—users can annotate the message’s text and save it in

a text file with the Save button at the bottom of the window, or simply cut-and-

paste portions of it into other windows. This might be confusing, though, and is

redundant (we can also edit and save by clicking on the main text’s quick-access

part button). Removing edit tools would require extending PyEdit. Using PyEdit

for display in general is a useful design—users also have access to all of PyEdit’s

tools for the mail text, including save, find, goto, grep, replace, undo/redo, and so on,

though edits might be superfluous in this context.

Automatic periodic new mail check?

It would be straightforward to add the ability to automatically check for and fetch

new incoming email periodically, by registering long-duration timer events with

either the after widget method or the threading module’s Timer object. I haven’t

done so because I have a personal bias against being surprised by software, but

your mileage may vary.
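As a rough illustration of the timer idea, here is a minimal sketch built on the standard library's threading.Timer. The PeriodicChecker class and its check_func argument are hypothetical names invented for this example, not part of PyMailGUI. Because the callback runs in a non-GUI thread, in a tkinter program it should only queue work for the main thread rather than touch widgets directly.

```python
import threading

class PeriodicChecker:
    """Hypothetical helper: call check_func every `interval` seconds."""

    def __init__(self, interval, check_func):
        self.interval = interval
        self.check_func = check_func
        self._lock = threading.Lock()
        self._timer = None
        self._stopped = False

    def _schedule(self):
        # threading.Timer fires only once, so re-arm it after each check
        with self._lock:
            if self._stopped:
                return
            self._timer = threading.Timer(self.interval, self._run)
            self._timer.daemon = True      # don't keep the program alive
            self._timer.start()

    def _run(self):
        self.check_func()                  # e.g., ask the server for new mail
        self._schedule()

    def start(self):
        with self._lock:
            self._stopped = False
        self._schedule()

    def stop(self):
        with self._lock:
            self._stopped = True
            if self._timer is not None:
                self._timer.cancel()       # drop any pending check
```

A caller might create PeriodicChecker(600, check_for_new_mail), where check_for_new_mail is whatever function starts a fetch, then call start() once and stop() when the program exits.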

Reply and Forward buttons on view windows, too?

Minor potential ergonomic improvement: we could include Reply and Forward

buttons on the message view windows, too, instead of requiring these operations

to be selected in mail list windows only. As this system’s sole user, I prefer the

uncluttered appearance and conceptual simplicity of the current approach;

GUIs have a way of getting out of hand when persistent pop-up windows start

nesting too deeply. It would be trivial to have Reply/Forward on view windows,

too, though; they could probably fetch mail components straight from the GUI

instead of reparsing a message.

Omit Bcc header in view windows?

Minor nit: mail view windows may be better off omitting the Bcc header even if

it’s enabled in the configuration file. Since it shouldn’t be present once a mail is

sent, it really needs to be included in composition windows only. It’s displayed as

is anyhow, to verify that Bcc is omitted on sends (the prior edition did not omit it), to

maintain a uniform look for all mail windows, to avoid special-casing this in the

code, and to avoid making such ergonomic decisions in the absence of actual user

feedback.

Check for empty Subject lines?

Minor usability issue: it would be straightforward to add a check for an empty

Subject field on sends and to pop up a verification dialog to give the user a second

chance to fill the field in. A blank subject is probably unintended. We could do the

same for the To field as well, though there may be valid use cases for omitting this

from mail headers (the mail is still sent to Cc and Bcc recipients).

Removing duplicate recipients more accurately?

As is, the send operation attempts to remove duplicate recipients using set opera-

tions. This works, but it may be inaccurate if the same email address appears twice

with a different name component (e.g., “name1 <eml>, name2 <eml>”). To do

better, we could fully parse the recipient addresses to extract and compare just the

address portion of the full email address. Arguably, though, it’s not clear what

should be done if the same recipient address appears with different names. Could

multiple people be using the same email account? If not, which name should we

choose to use?

For now, end user or mail server intervention may be required in the rare cases

where this might crop up. In most cases, other email clients will likely handle names

in consistent ways that make this a moot point. On related notes, Reply removes

duplicates in Cc prefills in the same simplistic way, and both sends and replies

could use case-insensitive string comparisons when filtering for duplicates.
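The address-only comparison suggested here can be sketched with the standard library's email.utils tools. This is only one possible policy, under stated assumptions: the unique_recipients function is a hypothetical name, it keeps the first name seen for each address, and it compares addresses case-insensitively, which may not suit every mail server.

```python
from email.utils import getaddresses, formataddr

def unique_recipients(*header_values):
    """Hypothetical helper: drop repeated addresses, ignoring names and case."""
    seen = set()
    result = []
    # getaddresses splits each header into (name, address) pairs
    for name, addr in getaddresses(list(header_values)):
        key = addr.lower()                 # assumption: case is not significant
        if addr and key not in seen:
            seen.add(key)
            result.append(formataddr((name, addr)))   # first name seen wins
    return result

print(unique_recipients('Bob <bob@example.com>, Robert <BOB@example.com>',
                        'sue@example.com'))
# ['Bob <bob@example.com>', 'sue@example.com']
```

Whether the first name should win is exactly the open question raised above; the sketch simply picks one answer so that the comparison itself is easy to see.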

Handling newsgroup messages, too?

Because Internet newsgroup posts are similar in structure to emails (header lines

plus body text; see the nntplib example in Chapter 13), this script could in principle

be extended to display both email messages and news articles. Classifying such a

mutation as clever generalization or diabolical hack is left as an exercise in itself.

SMTP sends may not work in some network configurations?

On my home/office network, SMTP works fine and as shown for sending emails,

but I have occasionally seen sends have issues on public networks of the sort avail-

able in hotels and airports. In some cases, mail sends can fail with exceptions and

error messages in the GUI; in worst cases, such sends might fail with no exception

at all and without reporting an error in the GUI. The mail simply goes nowhere,

which is obviously less than ideal if its content matters.

It’s not clear if these issues are related to limitations of the networks used, of Py-

thon’s smtplib, or of the ISP-provided SMTP server I use. Unfortunately, I ran out

of time to recreate the problem and investigate further (again, a system with a single

user also has just a single tester).

Resolving any such issues is left as an exercise for the reader, but as a caution: if

you wish to use the system to send important emails, you should first test sends in

a new network environment to ensure that they will be routed correctly. Sending

an email to yourself and verifying receipt should suffice.

Performance tuning?

Almost all of the work done on this system to date has been related to its func-

tionality. The system does allow some operation threads to run in parallel, and

optimizes mail downloads by fetching just headers initially and caching already-

fetched full mail text to avoid refetching. Apart from this, though, its performance

in terms of CPU utilization and memory requirements has not been explored in

any meaningful way at all. That’s for the best—in general we code for utility and

clarity first in Python, and deal with performance later if and only if needed. Having

said that, a broader audience for this program might mandate some performance

analysis and improvement.

For example, although the full text of fetched mails is kept just once in a cache,

each open view of a message retains a copy of the parsed mail in memory. For large

mails, this may impact memory growth. Caching parsed mails as well might help

decrease memory footprints, though these will still not be small for large mails,

and the cache might hold onto memory longer than required if not intelligently

designed. Storing messages or their parts in files (perhaps as pickled objects) in-

stead of in memory might alleviate some growth, too, though that may also require

a mechanism for reaping temporary files. As is, Python’s garbage collector should

reclaim all such message space eventually as windows are closed, but this can de-

pend upon how and where we retain object references. See also the gc standard

library module for possible pointers on finer-grained garbage collection control.
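One way to cache parsed mails without holding onto them longer than needed is to keep only weak references, so an entry disappears once no open window still uses the parsed message. The following is a sketch under that assumption, not PyMailGUI's actual design; the ParsedMailCache class and its method names are hypothetical.

```python
import gc
import weakref
from email.parser import Parser

class ParsedMailCache:
    """Hypothetical cache: message number -> parsed Message, held weakly."""

    def __init__(self):
        # entries vanish automatically when the last strong reference goes away
        self._parsed = weakref.WeakValueDictionary()

    def get(self, msgnum, fulltext):
        msg = self._parsed.get(msgnum)
        if msg is None:
            msg = Parser().parsestr(fulltext)   # parse only on a cache miss
            self._parsed[msgnum] = msg
        return msg

    def live_count(self):
        gc.collect()                  # sweep any unreachable cycles first
        return len(self._parsed)
```

With this scheme, two view windows opened on the same mail share a single parsed object, and closing both lets Python reclaim it without any explicit cache eviction code.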

Unicode model tuning?

As discussed in brief at the start of this chapter and in full in Chapter 13, PyMail-

GUI’s support for Unicode encoding of message text and header components is

broad, but not necessarily as general or universally applicable as it might be. Some

Unicode limitations here stem from the limitations of the email package in

Python 3.1 upon which PyMailGUI heavily depends. It may be difficult for Python-

coded email clients to support some features better until Python’s libraries do, too.

Moreover, the Unicode support that is present in this program has been tested

neither widely nor rigorously. Just like Chapter 11’s PyEdit, this is currently still a

single-user system designed to work as a book example, not an open source project.

Because of that, some of the current Unicode policies are partially heuristic in

nature and may have to be honed with time and practice.

For example, it may prove better in the end to use UTF-8 encoding (or none at all)

for sends in general, instead of supporting some of the many user options which

are included in this book for illustration purposes. Since UTF-8 can represent most

Unicode code points, it’s broadly applicable.
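To show what a UTF-8-only send policy might look like, here is a small sketch that uses the standard email package. The make_utf8_message function is a hypothetical name for illustration only; PyMailGUI's own send code is more involved and supports the configurable options described in this chapter.

```python
from email.header import Header
from email.mime.text import MIMEText

def make_utf8_message(subject, body):
    """Hypothetical helper: build a text/plain mail encoded as UTF-8."""
    msg = MIMEText(body, 'plain', 'utf-8')       # body text, charset=utf-8
    msg['Subject'] = Header(subject, 'utf-8')    # header encoded per MIME rules
    return msg

msg = make_utf8_message('Grüße', 'Hello, 世界')
print(msg.get_content_charset())                 # utf-8
```

The appeal of such a policy is that there is nothing to configure; the cost is that recipients whose tools expect a narrower encoding get no special treatment.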

More subtly, we might also consider propagating the main text part’s Unicode

encoding to the embedded PyEdit component in view and edit windows, so it can

be used as a known encoding by the PyEdit Save button. As is, users can pop up

the main text’s part in view windows to save with a known encoding automatically,

but saves of drafts for mails being edited fall back on PyEdit’s own Unicode policies

and GUI prompts. The ambiguous encoding for saved drafts may be unavoidable,

though—users might enter characters from any character set, both while writing

new mails from scratch and while editing the text of replies and forwards (just like

headers in replies and forwards, the initial known encoding of the original main

text part may no longer apply after arbitrary edits).

In addition, there is no support for non-ASCII encodings of full mail text, it’s not

impossible that i18n encoded text might appear in other contexts in rare emails

(e.g., in attachment filenames, whose undecoded form may or may not be valid on

the receiving platform’s filesystem, and may require renaming if allowed at all),

and although Internationalization is supported for mail content, the GUI itself still

uses English for its buttons, labels, and titles—something that a truly location-

neutral program may wish to address.

In other words, if this program were to ever take the leap to commercial-grade or

broadly used software, its Unicode story would probably have to be revisited. Also

discussed in Chapter 13, a future release of the email package may solve some

Unicode issues automatically, though PyMailGUI may also require updates for the

solutions, as well as for incompatibilities introduced by them. For now, this will

have to stand as a useful object lesson in itself: for both better and worse, such

changes will always be a fact of life in the constantly evolving world of software

development.

And so on—because this software is open source, it is also necessarily open-ended.

Ultimately, writing a complete email client is a substantial undertaking, and we’ve

taken this example as far as we can in this book. To move PyMailGUI further along,

we’d probably have to consider the suitability of both the underlying Python 3.1

email package and the tkinter GUI toolkit. Both are fully sufficient for the utility

we’ve implemented here, but they might limit further progress.

For example, the current lack of an HTML viewer widget in the base tkinter toolkit

precludes HTML mail viewing and composition in the GUI itself. Moreover, although

PyMailGUI broadly supports Internationalization today, it must rely on workarounds

to get email to work at all. To be fair, some of the email package’s issues described in

this book will likely be fixed by the time you read about them, and email in general is

probably close to a worst case for Internationalization issues brought into the spotlight

by Unicode prominence in Python 3.X. Still, such tool constraints might impede further

system evolution.

On the other hand, despite any limitations in the tools it deploys, PyMailGUI does

achieve all its goals—it’s an arguably full-featured and remarkably quick desktop email

client, which works surprisingly well for my emails and preferences and performs ad-

mirably on the cases I’ve tested to date. It may not satisfy your tastes or constraints,

but it is open to customization and imitation. Suggested exercises and further tweaking

are therefore officially delegated to your imagination; this is Python, after all.

This concludes our tour of Python client-side protocols programming. In the next

chapter, we’ll hop the fence to the other side of the Internet world and explore scripts

that run on server machines. Such programs give rise to the grander notion of applica-

tions that live entirely on the Web and are launched by web browsers. As we take this

leap in structure, keep in mind that the tools we met in this and the previous chapter

are often sufficient to implement all the distributed processing that many applications

require, and they can work in harmony with scripts that run on a server. To completely

understand the Web world view, though, we need to explore the server realm, too.

CHAPTER 15

Server-Side Scripting

“Oh, What a Tangled Web We Weave”

This chapter is the fourth part of our look at Python Internet programming. In the last

three chapters, we explored sockets and basic client-side programming interfaces such

as FTP and email. In this chapter, our main focus will be on writing server-side scripts

in Python—a type of program usually referred to as CGI scripts. Though something of

a lowest common denominator for web development today, such scripts still provide

a simple way to get started with implementing interactive websites in Python.

Server-side scripting and its derivatives are at the heart of much of the interaction that

happens on the Web. This is true both when scripting manually with CGI and when

using the higher-level frameworks that automate some of the work. Because of that,

the fundamental web model we’ll explore here in the context of CGI scripting is pre-

requisite knowledge for programming the Web well, regardless of the tools you choose

to deploy.

As we’ll see, Python is an ideal language for writing scripts to implement and customize

websites, because of both its ease of use and its library support. In the following chapter,

we will use the basics we learn in this chapter to implement a full-blown website. Here,

our goal is to understand the fundamentals of server-side scripting, before exploring

systems that deploy or build upon that basic model.

A House upon the Sand

As you read the next two chapters of this book, please keep in mind that they focus on

the fundamentals of server-side scripting and are intended only as an introduction to

programming in this domain with Python. The web domain is large and complex,

changes rapidly and constantly, and often prescribes many ways to accomplish a given

goal—some of which can vary from browser to browser and server to server.

For instance, the password encryption scheme of the next chapter may be unnecessary

under certain scenarios (with a suitable server, we could use secure HTTP instead).

Moreover, some of the HTML we’ll use here may not leverage all of that language’s

power, and may not even conform to current HTML standards. In fact, much of the

material added in later editions of this book reflects recent technology shifts in this

domain.

Given such a large and dynamic field, this part of the book does not even pretend to

be a complete look at the server-side scripting domain. That is, you should not take

this text to be a final word on the subject. To become truly proficient in this area, you

should also expect to spend some time studying other texts for additional

webmaster-y details and techniques—for example, Chuck Musciano and Bill

Kennedy’s HTML & XHTML: The Definitive Guide (O’Reilly).

The good news is that here you will explore the core ideas behind server-side program-

ming, meet Python’s CGI tool set, and learn enough to start writing substantial websites

of your own in Python. This knowledge should apply to wherever the Web or you head

next.

What’s a Server-Side CGI Script?

Simply put, CGI scripts implement much of the interaction you typically experience

on the Web. They are a standard and widely used mechanism for programming web-

based systems and website interaction, and they underlie most of the larger web

development models.

There are other ways to add interactive behavior to websites with Python, both on the

client and the server. We briefly met some such alternatives near the start of Chap-

ter 12. For instance, client-side solutions include Jython applets, RIAs such as Silver-

light and pyjamas, Active Scripting on Windows, and the emerging HTML 5 standard.

On the server side, there are a variety of additional technologies that build on the basic

CGI model, such as Python Server Pages, and web frameworks such as Django, App

Engine, CherryPy, and Zope, many of which utilize the MVC programming model.

By and large, though, CGI server-side scripts are used to program much of the activity

on the Web, whether it’s programmed directly or partly automated by frameworks and

tools. CGI scripting is perhaps the most primitive approach to implementing websites,

and it does not by itself offer the tools that are often built into larger frameworks such

as state retention, database interfaces, and reply templating. CGI scripts, however, are

in many ways the simplest technique for server-side scripting. As a result, they are an

ideal way to get started with programming on the server side of the Web. Especially for

simpler sites that do not require enterprise-level tools, CGI is sufficient, and it can be

augmented with additional libraries as needed.

The Script Behind the Curtain

Formally speaking, CGI scripts are programs that run on a server machine and adhere

to the Common Gateway Interface—a model for browser/server communications,

from which CGI scripts take their name. CGI is an application protocol that web servers

use to transfer input data and results between web browsers and other clients and

server-side scripts. Perhaps a more useful way to understand CGI, though, is in terms

of the interaction it implies.

Most people take this interaction for granted when browsing the Web and pressing

buttons in web pages, but a lot is going on behind the scenes of every transaction on

the Web. From the perspective of a user, it’s a fairly familiar and simple process:

Submission

When you visit a website to search, purchase a product, or submit information

online, you generally fill in a form in your web browser, press a button to submit

your information, and begin waiting for a reply.

Response

Assuming all is well with both your Internet connection and the computer you are

contacting, you eventually get a reply in the form of a new web page. It may be a

simple acknowledgment (e.g., “Thanks for your order”) or a new form that must

be filled out and submitted again.

And, believe it or not, that simple model is what makes most of the Web hum. But

internally, it’s a bit more complex. In fact, a subtle client/server socket-based architec-

ture is at work—your web browser running on your computer is the client, and the

computer you contact over the Web is the server. Let’s examine the interaction scenario

again, with all the gory details that users usually never see:

Submission

When you fill out a form page in a web browser and press a submission button,

behind the scenes your web browser sends your information across the Internet to

the server machine specified as its receiver. The server machine is usually a remote

computer that lives somewhere else in both cyberspace and reality. It is named in

the URL accessed—the Internet address string that appears at the top of your

browser. The target server and file can be named in a URL you type explicitly, but

more typically they are specified in the HTML that defines the submission page

itself, either in a hyperlink or in the “action” attribute of the input form’s HTML.

However the server is specified, the browser running on your computer ultimately

sends your information to the server as bytes over a socket, using techniques we

saw in the last three chapters. On the server machine, a program called an HTTP

server runs perpetually, listening on a socket for incoming connection requests and

data from browsers and other clients, usually on port number 80.

Processing

When your information shows up at the server machine, the HTTP server program

notices it first and decides how to handle the request. If the requested URL names

a simple web page (e.g., a URL ending in .html), the HTTP server opens the named

HTML file on the server machine and sends its text back to the browser over a

socket. On the client, the browser reads the HTML and uses it to construct the

next page you see.

But if the URL requested by the browser names an executable program instead (e.g.,

a URL ending in .cgi or .py), the HTTP server starts the named program on the

server machine to process the request and redirects the incoming browser data to

the spawned program’s stdin input stream, environment variables, and command-

line arguments. That program started by the server is usually a CGI script—a pro-

gram run on the remote server machine somewhere in cyberspace, usually not on

your computer. The CGI script is responsible for handling the request from this

point on; it may store your information in a database, perform a search, charge

your credit card, and so on.

Response

Ultimately, the CGI script prints HTML, along with a few header lines, to generate

a new response page in your browser. When a CGI script is started, the HTTP

server takes care to connect the script’s stdout standard output stream to a socket

that the browser is listening to. As a result, HTML code printed by the CGI script

is sent over the Internet, back to your browser, to produce a new page. The HTML

printed back by the CGI script works just as if it had been stored and read from an

HTML file; it can define a simple response page or a brand-new form coded to

collect additional information. Because it is generated by a script, it may include

information dynamically determined per request.

In other words, CGI scripts are something like callback handlers for requests generated

by web browsers that require a program to be run dynamically. They are automatically

run on the server machine in response to actions in a browser. Although CGI scripts

ultimately receive and send standard structured messages over sockets, CGI is more

like a higher-level procedural convention for sending and receiving information be-

tween a browser and a server.

Writing CGI Scripts in Python

If all of this sounds complicated, relax—Python, as well as the resident HTTP server,

automates most of the tricky bits. CGI scripts are written as fairly autonomous pro-

grams, and they assume that startup tasks have already been accomplished. The HTTP

web server program, not the CGI script, implements the server side of the HTTP pro-

tocol itself. Moreover, Python’s library modules automatically dissect information sent

up from the browser and give it to the CGI script in an easily digested form. The upshot

is that CGI scripts may focus on application details like processing input data and

producing a result page.

As mentioned earlier, in the context of CGI scripts, the stdin and stdout streams are

automatically tied to sockets connected to the browser. In addition, the HTTP server

passes some browser information to the CGI script in the form of shell environment

variables, and possibly command-line arguments. To CGI programmers, that means:

• Input data sent from the browser to the server shows up as a stream of bytes in the

stdin input stream, along with shell environment variables.

• Output is sent back from the server to the client by simply printing properly

formatted HTML to the stdout output stream.
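The two points above can be sketched as a tiny script. This is a simplified illustration under stated assumptions, not one of this chapter's examples: the respond function and the user input name are hypothetical, and the input is parsed with urllib.parse.parse_qs from the QUERY_STRING environment variable as a stand-in for the cgi module interfaces covered later.

```python
import html
import os
import sys
from urllib.parse import parse_qs

def respond(environ, out):
    """Read inputs from the environment, print a reply page to `out`."""
    # inputs appended to the URL arrive in the QUERY_STRING variable
    query = parse_qs(environ.get('QUERY_STRING', ''))
    user = query.get('user', ['world'])[0]       # 'user' is a made-up input name

    # the reply is a header line, a blank line, and then HTML text
    out.write('Content-type: text/html\n\n')
    out.write('<h1>Hello, %s</h1>\n' % html.escape(user))

if __name__ == '__main__':
    respond(os.environ, sys.stdout)              # the server wires up both ends
```

Passing the environment and output stream in as arguments is purely a convenience for testing outside a web server; a real CGI script would normally read os.environ and print to sys.stdout directly.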

The most complex parts of this scheme include parsing all the input information sent

up from the browser and formatting information in the reply sent back. Happily, Py-

thon’s standard library largely automates both tasks:

Input

With the Python cgi module, input typed into a web browser form or appended

to a URL string shows up as values in a dictionary-like object in Python CGI scripts.

Python parses the data itself and gives us an object with one key : value pair per

input sent by the browser that is fully independent of transmission style (roughly,

by fill-in form or by direct URL).

Output

The cgi module also has tools for automatically escaping strings so that they are

legal to use in HTML (e.g., replacing embedded <, >, and & characters with HTML

escape codes). Module urllib.parse provides additional tools for formatting text

inserted into generated URL strings (e.g., adding %XX and + escapes).
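As a quick preview of both kinds of escaping, the following sketch uses html.escape and urllib.parse. One assumption to flag: html.escape is used here in place of the cgi module's escaping tool, since it is the standard library's current home for that job in recent Python releases; the sample strings are made up.

```python
import html
from urllib.parse import quote_plus, urlencode

# Text headed for an HTML page: escape the characters HTML treats specially
safe_html = html.escape('a < b & "c"')

# Text headed for a URL query string: spaces become +, others become %XX
safe_value = quote_plus('a & b=c')

# urlencode applies the same escaping to every name and value in a dict
query = urlencode({'name': 'Bob Smith', 'job': 'dev&ops'})

print(safe_html)     # a &lt; b &amp; &quot;c&quot;
print(safe_value)    # a+%26+b%3Dc
print(query)         # name=Bob+Smith&job=dev%26ops
```

The two schemes are not interchangeable: HTML escapes protect text displayed in a page, while URL escapes protect text embedded in an address.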

We’ll study both of these interfaces in detail later in this chapter. For now, keep in mind

that although any language can be used to write CGI scripts, Python’s standard modules

and language attributes make it a snap.

Perhaps less happily, CGI scripts are also intimately tied to the syntax of HTML, since

they must generate it to create a reply page. In fact, it can be said that Python CGI

scripts embed HTML, which is an entirely distinct language in its own right.* As we’ll

also see, the fact that CGI scripts create a user interface by printing HTML syntax means

that we have to take special care with the text we insert into a web page’s code (e.g.,

escaping HTML operators). Worse, CGI scripts require at least a cursory knowledge

of HTML forms, since that is where the inputs and target script’s address are typically

specified.

This book won’t teach HTML in depth; if you find yourself puzzled by some of the

arcane syntax of the HTML generated by scripts here, you should glance at an HTML

introduction, such as HTML & XHTML: The Definitive Guide. Also keep in mind that

higher-level tools and frameworks can sometimes hide the details of HTML generation

from Python programmers, albeit at the cost of any new complexity inherent in the

framework itself. With HTMLgen and similar packages, for instance, it’s possible to

deal in Python objects, not HTML syntax, though you must learn this system’s API as

well.

* Interestingly, in Chapter 12 we briefly introduced other systems that take the opposite route: embedding

Python code or calls in HTML. The server-side templating languages in Zope, PSP, and other web frameworks

use this model, running the embedded Python code to produce part of a reply page. Because Python is

embedded, these systems must run special servers to evaluate the embedded tags. Because Python CGI scripts

embed HTML in Python instead, they can be run as standalone programs directly, though they must be

launched by a CGI-capable web server.

Running Server-Side Examples

Like GUIs, web-based systems are highly interactive, and the best way to get a feel for

some of these examples is to test-drive them live. Before we get into some code, let’s

get set up to run the examples we’re going to see.

Running CGI-based programs requires three pieces of software:

• The client, to submit requests: a browser or script

• The web server that receives the request

• The CGI script, which is run by the server to process the request

We’ll be writing CGI scripts as we move along, and any web browser can be used as a

client (e.g., Firefox, Safari, Chrome, or Internet Explorer). As we’ll see later, Python’s

urllib.request module can also serve as a web client in scripts we write. The only

missing piece here is the intermediate web server.
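To make the client and server roles concrete, here is a self-contained sketch in which a script, rather than a browser, plays the client. It rests on several assumptions: the fetch_from_local_server function is a hypothetical name, the server is the standard library's plain file-serving SimpleHTTPRequestHandler rather than a CGI-capable one, and port 0 is used so the operating system picks any free port.

```python
import threading
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler
from urllib.request import ProxyHandler, build_opener

def fetch_from_local_server(directory, filename):
    """Serve `directory` on a free local port, fetch one file, shut down."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    server = HTTPServer(('127.0.0.1', 0), handler)      # port 0: OS picks one
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        opener = build_opener(ProxyHandler({}))          # skip any proxy settings
        url = 'http://127.0.0.1:%d/%s' % (port, filename)
        with opener.open(url) as reply:                  # the script is the client
            return reply.read()                          # raw bytes of the page
    finally:
        server.shutdown()                                # stop serve_forever
        server.server_close()
```

Calling fetch_from_local_server('.', 'somepage.html') would return the bytes of that file as a browser would receive them, assuming a file of that name exists in the current directory.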

Web Server Options

There are a variety of approaches to running web servers. For example, the open source

Apache system provides a complete, production-grade web server, and its mod_python

extension discussed later runs Python scripts quickly. Provided you are willing to install

and configure it, it is a complete solution, which you can run on a machine of your

own. Apache usage is beyond our present scope here, though.

If you have access to an account on a web server machine that runs Python 3.X, you

can also install the HTML and script files we’ll see there. For the second edition of this

book, for instance, all the web examples were uploaded to an account I had on the

“starship” Python server, and were accessed with URLs of this form:

http://starship.python.net/~lutz/PyInternetDemos.html

If you go this route, replace starship.python.net/~lutz with the names of your own

server and account directory path. The downside of using a remote server account is

that changing code is more involved—you will have to either work on the server ma-

chine itself or transfer code back and forth on changes. Moreover, you need access to

such a server in the first place, and server configuration details can vary widely. On the

starship machine, for example, Python CGI scripts were required to have a .cgi filename

extension, executable permission, and the Unix #! line at the top to point the shell to

Python.

Finding a server that supports Python 3.X used by this book’s examples might prove a

stumbling block for some time to come as well; neither of my own ISPs had it installed

when I wrote this chapter in mid-2010, though it’s possible to find commercial ISPs

today that do. Naturally, this may change over time.

Running a Local Web Server

To keep things simple, this edition is taking a different approach. All the examples will

be run using a simple web server coded in Python itself. Moreover, the web server will

be run on the same local machine as the web browser client. This way, all you have to

do to run the server-side examples is start the web server script and use “localhost” as

the server name in all the URLs you will submit or code (see Chapter 12 if you’ve

forgotten why this name means the local machine). For example, to view a web page,

use a URL of this form in the address field of your web browser:

http://localhost/tutor0.html

This also avoids some of the complexity of per-server differences, and it makes changing

the code simple—it can be edited on the local machine directly.

For this book’s examples, we’ll use the web server in Example 15-1. This is essentially

the same script introduced in Chapter 1, augmented slightly to allow the working di-

rectory and port number to be passed in as command-line arguments (we’ll also run

this in the root directory of a larger example in the next chapter). We won’t go into

details on all the modules and classes Example 15-1 uses here; see the Python library

manual. But as described in Chapter 1, this script implements an HTTP web server,

which:

• Listens for incoming socket requests from clients on the machine it is run on and

the port number specified in the script or command line (which defaults to 80, the

standard HTTP port)

• Serves up HTML pages from the webdir directory specified in the script or com-

mand line (which defaults to the directory it is launched from)

• Runs Python CGI scripts that are located in the cgi-bin (or htbin) subdirectory of

the webdir directory, with a .py filename extension

See Chapter 1 for additional background on this web server’s operation.

Example 15-1. PP4E\Internet\Web\webserver.py

"""

Implement an HTTP web server in Python which knows how to serve HTML
pages and run server-side CGI scripts coded in Python; this is not
a production-grade server (e.g., no HTTPS, slow script launch/run on
some platforms), but suffices for testing, especially on localhost;

Serves files and scripts from the current working dir and port 80 by
default, unless these options are specified in command-line arguments;
Python CGI scripts must be stored in webdir/cgi-bin or webdir/htbin;
more than one of this server may be running on the same machine to serve
from different directories, as long as they listen on different ports;
"""

import os, sys
from http.server import HTTPServer, CGIHTTPRequestHandler

webdir = '.'   # where your HTML files and cgi-bin script directory live
port   = 80    # http://servername/ if 80, else use http://servername:xxxx/

if len(sys.argv) > 1: webdir = sys.argv[1]        # command-line args
if len(sys.argv) > 2: port   = int(sys.argv[2])   # else default ., 80
print('webdir "%s", port %s' % (webdir, port))

os.chdir(webdir)                                  # run in HTML root dir
srvraddr = ('', port)                             # my hostname, portnumber
srvrobj  = HTTPServer(srvraddr, CGIHTTPRequestHandler)
srvrobj.serve_forever()                           # serve clients till exit

To start the server to run this chapter’s examples, simply run this script from the di-

rectory the script’s file is located in, with no command-line arguments. For instance,

from a DOS command line:

C:\...\PP4E\Internet\Web> webserver.py

webdir ".", port 80

On Windows, you can simply click its icon and keep the console window open, or

launch it from a DOS command prompt. On Unix it can be run from a command line

in the background, or in its own terminal window. Some platforms may also require

you to have administrator privileges to run servers on reserved ports, such as the Web’s

port 80; if this includes your machine, either run the server with the required permis-

sions, or run on an alternate port number (more on port numbers later in this chapter).

By default, while running locally this way, the script serves up HTML pages requested

on “localhost” from the directory it lives in or is launched from, and runs Python CGI

scripts from the cgi-bin subdirectory located there; change its webdir variable or pass

in a command-line argument to point it to a different directory. Because of this struc-

ture, in the examples distribution HTML files are in the same directory as the web server

script and CGI scripts are located in the cgi-bin subdirectory. In other words, to visit

web pages and run scripts, we’ll be using URLs of these forms, respectively:

http://localhost/somepage.html

http://localhost/cgi-bin/somescript.py

Both map to the directory that contains the web server script (PP4E\Internet\Web) by

default. Again, to run the examples on a different server machine of your own, simply

replace the “localhost” and “localhost/cgi-bin” parts of these addresses with your server

name and directory path details (more on URLs later in this chapter); with this address

change the examples work the same, but requests are routed across a network to the

server, instead of being routed between programs running on the same local machine.
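
The URL forms above change only in their server part when the examples run on another machine or port. The following is a minimal sketch rather than code from the examples distribution; the function name example_urls and port 8000 are assumptions chosen only for illustration.

```python
def example_urls(server='localhost', port=80,
                 page='somepage.html', script='somescript.py'):
    """Return (page URL, script URL) for a server laid out like webserver.py's."""
    # Port 80 is HTTP's default, so it can be left out of the address
    netloc = server if port == 80 else '%s:%d' % (server, port)
    return ('http://%s/%s' % (netloc, page),
            'http://%s/cgi-bin/%s' % (netloc, script))

print(example_urls())             # default localhost, port 80
print(example_urls(port=8000))    # hypothetical alternate port
```

If the server was started on a port other than 80, the port number must appear in every URL typed into the browser.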

The server in Example 15-1 is by no means a production-grade web server, but it can

be used to experiment with this book’s examples and is viable as a way to test your CGI

scripts locally with server name “localhost” before deploying them on a real remote

server. If you wish to install and run the examples under a different web server, you’ll

want to extrapolate the examples for your context. Things like server names and path-

names in URLs, as well as CGI script filename extensions and other conventions, can

vary widely; consult your server’s documentation for more details. For this chapter and

the next, we’ll assume that you have the webserver.py script running locally.

The Server-Side Examples Root Page

To confirm that you are set up to run the examples, start the web server script in

Example 15-1 and type the following URL in the address field at the top of your web

browser:

http://localhost/PyInternetDemos.html

This address loads a launcher page with links to this chapter’s example files (see the

examples distribution for this page’s HTML source code, which is not listed in this

book). The launcher page itself appears as in Figure 15-1, shown displayed in the In-

ternet Explorer web browser on Windows 7 (it looks similar on other browsers and

platforms). Each major example has a link on this page, which runs when clicked.

Figure 15-1. The PyInternetDemos launcher page

It’s possible to open some of the examples by clicking on their HTML file directly in

your system’s file explorer GUI. However, the CGI scripts ultimately invoked by some

of the example links must be run by a web server. If you click to browse such pages

directly, your browser will likely display the scripts’ source code, instead of running it.

To run scripts, too, be sure to open the HTML pages by typing their “localhost” URL

address into your browser’s address field.

Eventually, you probably will want to start using a more powerful web server, so we

will study additional CGI installation details later in this chapter. You may also wish

to review our prior exploration of custom server options in Chapter 12 (Apache and

mod_python are a popular option). Such details can be safely skipped or skimmed if

you will not be installing on another server right away. For now, we’ll run locally.

Viewing Server-Side Examples and Output

The source code of examples in this part of the book is listed in the text and included

in the book’s examples distribution package. In all cases, if you wish to view the source

code of an HTML file, or the HTML generated by a Python CGI script, you can also

simply select your browser’s View Source menu option while the corresponding web

page is displayed.

Keep in mind, though, that your browser’s View Source option lets you see the out-

put of a server-side script after it has run, but not the source code of the script itself.

There is no automatic way to view the Python source code of the CGI scripts themselves,

short of finding them in this book or in its examples distribution.

To address this issue, later in this chapter we’ll also write a CGI-based program

called getfile, which allows the source code of any file on this book’s website (HTML,

CGI script, and so on) to be downloaded and viewed. Simply type the desired file’s

name into a web page form referenced by the getfile.html link on the Internet demos

launcher page of Figure 15-1, or add it to the end of an explicitly typed URL as a

parameter like the following; replace tutor5.py at the end with the name of the script

whose code you wish to view, and omit the cgi-bin component at the end to view HTML

files instead:

http://localhost/cgi-bin/getfile.py?filename=cgi-bin\tutor5.py

In response, the server will ship back the text of the named file to your browser. This

process requires explicit interface steps, though, and much more knowledge of URLs

than we’ve gained thus far; to learn how and why this magic line works, let’s move on

to the next section.

Climbing the CGI Learning Curve

Now that we’ve looked at setup issues, it’s time to get into concrete programming

details. This section is a tutorial that introduces CGI coding one step at a time—from

simple, noninteractive scripts to larger programs that utilize all the common web page

user input devices (what we called widgets in the tkinter GUI chapters in Part III).

Along the way, we’ll also explore the core ideas behind server-side scripting. We’ll move

slowly at first, to learn all the basics; the next chapter will use the ideas presented here

to build up larger and more realistic website examples. For now, let’s work through a

simple CGI tutorial, with just enough HTML thrown in to write basic server-side

scripts.

A First Web Page

As mentioned, CGI scripts are intimately bound up with HTML, so let’s start with a

simple HTML page. The file tutor0.html, shown in Example 15-2, defines a bona fide,

fully functional web page—a text file containing HTML code, which specifies the

structure and contents of a simple web page.

Example 15-2. PP4E\Internet\Web\tutor0.html

<HTML>

<BODY>

<H1>A First HTML Page</H1>

<P>Hello, HTML World!</P>

</BODY></HTML>

If you point your favorite web browser to the Internet address of this file, you should

see a page like that shown in Figure 15-2. This figure shows the Internet Explorer

browser at work on the address http://localhost/tutor0.html (type this into your

browser’s address field), and it assumes that the local web server described in the prior

section is running; other browsers render the page similarly. Since this is a static HTML

file, you’ll get the same result if you simply click on the file’s icon on most platforms,

though its text won’t be delivered by the web server in this mode.

Figure 15-2. A simple web page from an HTML file

To truly understand how this little file does its work, you need to know something

about HTML syntax, Internet addresses, and file permission rules. Let’s take a quick

first look at each of these topics before we move on to the next example.

HTML basics

I promised that I wouldn’t teach much HTML in this book, but you need to know

enough to make sense of examples. In short, HTML is a descriptive markup language,

based on tags— items enclosed in <> pairs. Some tags stand alone (e.g., <HR> specifies

a horizontal rule). Others appear in begin/end pairs in which the end tag includes an

extra slash.

For instance, to specify the text of a level-one header line, we write HTML code of the

form <H1> text </H1>; the text between the tags shows up on the web page. Some tags

also allow us to specify options (sometimes called attributes). For example, a tag pair

like <A href="address">text</A> specifies a hyperlink: pressing the link’s text in the

page directs the browser to access the Internet address (URL) listed in the href option.

It’s important to keep in mind that HTML is used only to describe pages: your web

browser reads it and translates its description to a web page with headers, paragraphs,

links, and the like. Notably absent are both layout information—the browser is re-

sponsible for arranging components on the page—and syntax for programming logic—

there are no if statements, loops, and so on. Also, Python code is nowhere to be found

in Example 15-2; raw HTML is strictly for defining pages, not for coding programs or

specifying all user interface details.

HTML’s lack of user interface control and programmability is both a strength and a

weakness. It’s well suited to describing pages and simple user interfaces at a high level.

The browser, not you, handles physically laying out the page on your screen. On the

other hand, HTML by itself does not directly support full-blown GUIs and requires us

to introduce CGI scripts (or other technologies such as RIAs) to websites in order to

add dynamic programmability to otherwise static HTML.

Internet addresses (URLs)

Once you write an HTML file, you need to put it somewhere a web browser can ref-

erence it. If you are using the locally running Python web server described earlier, this

becomes trivial: use a URL of the form http://localhost/file.html to access web pages,

and http://localhost/cgi-bin/file.py to name CGI scripts. This is implied by the fact that

the web server script by default serves pages and scripts from the directory in which it

is run.

On other servers, URLs may be more complex. Like all HTML files, tutor0.html must

be stored in a directory on the server machine, from which the resident web server

program allows browsers to fetch pages. For example, on the server used for the second

edition of this book, the page’s file must be stored in or below the public_html directory

of my personal home directory—that is, somewhere in the directory tree rooted

at /home/lutz/public_html. The complete Unix pathname of this file on the server is:

/home/lutz/public_html/tutor0.html

This path is different from its PP4E\Internet\Web location in the book’s examples dis-

tribution, as given in the example file listing’s title. When referencing this file on the

client, though, you must specify its Internet address, sometimes called a URL, instead

of a directory path name. The following URL was used to load the remote page from

the server:

http://starship.python.net/~lutz/tutor0.html

The remote server maps this URL to the Unix pathname automatically, in much the

same way that http://localhost resolves to the examples directory containing the web

server script for our locally-running server. In general, URL strings like the one just

listed are composed as the concatenation of multiple parts:

Protocol name: http

The protocol part of this URL tells the browser to communicate with the HTTP

(i.e., web) server program on the server machine, using the HTTP message proto-

col. URLs used in browsers can also name different protocols—for example,

ftp:// to reference a file managed by the FTP protocol and server, file:// to reference

a file on the local machine, telnet:// to start a Telnet client session, and so on.

Server machine name and port: starship.python.net

A URL also names the target server machine’s domain name or Internet Protocol

(IP) address following the protocol type. Here, we list the domain name of the

server machine where the examples are installed; the machine name listed is used

to open a socket to talk to the server. As usual, a machine name of localhost (or the

equivalent IP address 127.0.0.1) here means the server is running on the same

machine as the client.

Optionally, this part of the URL may also explicitly give the socket port on which

the server is listening for connections, following a colon (e.g.,

starship.python.net:8000, or 127.0.0.1:80). For HTTP, the socket is usually connected to port number

80, so this is the default if the port is omitted. See Chapter 12 if you need a refresher

on machine names and ports.

File path: ~lutz/tutor0.html

Finally, the URL gives the path to the desired file on the remote machine. The

HTTP web server automatically translates the URL’s file path to the file’s true

pathname: on the starship server, ~lutz is automatically translated to the

public_html directory in my home directory. When using the Python-coded web

server script in Example 15-1, files are mapped to the server’s current working

directory instead. URLs typically map to such files, but they can reference other

sorts of items as well, and as we’ll see in a few moments may name an executable

CGI script to be run when accessed.

Query parameters (used in later examples)

URLs may also be followed by additional input parameters for CGI programs.

When used, they are introduced by a ? and are typically separated by & characters.

For instance, a string of the form ?name=bob&job=hacker at the end of a URL passes

parameters named name and job to the CGI script named earlier in the URL, with

values bob and hacker, respectively. As we’ll discuss later in this chapter when we

explore escaping rules, the parameters may sometimes be separated by ; characters

instead, as in ?name=bob;job=hacker, though this form is less common.

These values are sometimes called URL query string parameters and are treated the

same as form inputs by scripts. Technically speaking, query parameters may have

other structures (e.g., unnamed values separated by +), but we will ignore addi-

tional options in this text; more on both parameters and input forms later in this

tutorial.
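
To see how a query string breaks down into named parameters, here is a small sketch using the standard library's urllib.parse module. It works on the ?name=bob&job=hacker string from the text; it does not show how a CGI script receives its inputs, which we'll get to later.

```python
from urllib.parse import parse_qs, urlencode

query = 'name=bob&job=hacker'      # the part of the URL after the ?
params = parse_qs(query)           # values are lists, since names may repeat
print(params['name'][0], params['job'][0])

# Going the other way: build a query string from a dictionary
rebuilt = urlencode({'name': 'bob', 'job': 'hacker'})
print(rebuilt)
```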

To make sure we have a handle on URL syntax, let’s pick apart another example that

we will be using later in this chapter. In the following HTTP protocol URL:

http://localhost:80/cgi-bin/languages.py?language=All

the components uniquely identify a server script to be run as follows:

• The server name localhost means the web server is running on the same machine

as the client; as explained earlier, this is the configuration we’re using for our

examples.

• Port number 80 gives the socket port on which the web server is listening for con-

nections (port 80 is the default if this part is omitted, so we will usually omit it).

• The file path cgi-bin/languages.py gives the location of the file to be run on the

server machine, within the directory where the server looks for referenced files.

• The query string ?language=All provides an input parameter to the referenced

script languages.py, as an alternative to user input in form fields (described later).
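
The same breakdown can be checked in code. As a sketch only, the standard library's urllib.parse.urlsplit splits this URL into the parts just listed:

```python
from urllib.parse import urlsplit

parts = urlsplit('http://localhost:80/cgi-bin/languages.py?language=All')
print(parts.scheme)      # protocol name
print(parts.hostname)    # server machine name
print(parts.port)        # socket port, as an integer
print(parts.path)        # file path on the server
print(parts.query)       # query string, without the leading ?
```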

Although this covers most URLs you’re likely to encounter in the wild, the full format

of URLs is slightly richer:

protocol://networklocation/path;parameters?querystring#fragment

For instance, the fragment part may name a section within a page (e.g., #part1). More-

over, each part can have formats of its own, and some are not used in all protocols.

The ;parameters part is omitted for HTTP, for instance (it gives an explicit file type for

FTP), and the networklocation part may also specify optional user login parameters for

some protocol schemes (its full format is user:password@host:port for FTP and Telnet,

but just host:port for HTTP). We used a complex FTP URL in Chapter 13, for example,

which included a username and password, as well as a binary file type (the server may

guess if no type is given):

ftp://lutz:password@ftp.rmi.net/filename;type=i

We’ll ignore additional URL formatting rules here. If you’re interested in more details,

you might start by reading the urllib.parse module’s entry in Python’s library manual,

as well as its source code in the Python standard library. You may also notice that a

URL you type to access a page looks a bit different after the page is fetched (spaces

become + characters, % characters are added, and so on). This is simply because brows-

ers must also generally follow URL escaping (i.e., translation) conventions, which we’ll

explore later in this chapter.
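
As a quick preview, and only as a sketch, urllib.parse can pick apart the richer FTP URL shown earlier and apply the escaping conventions just mentioned. The sample string 'a b&c' is made up for illustration.

```python
from urllib.parse import urlparse, quote_plus, unquote_plus

# urlparse (unlike urlsplit) also separates the ;parameters part of the path
ftp = urlparse('ftp://lutz:password@ftp.rmi.net/filename;type=i')
print(ftp.username, ftp.password, ftp.hostname)
print(ftp.path, ftp.params)

# Escaping: spaces become +, and special characters become %xx codes
escaped = quote_plus('a b&c')
print(escaped)
print(unquote_plus(escaped))       # back to the original text
```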

Using minimal URLs

Because browsers remember the prior page’s Internet address, URLs embedded in

HTML files can often omit the protocol and server names, as well as the file’s directory

path. If missing, the browser simply uses these components’ values from the last page’s

address. This minimal syntax works for URLs embedded in hyperlinks and for form

actions (we’ll meet forms later in this tutorial). For example, within a page that was

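fetched from the directory dirpath on the server http://www.server.com, minimal

hyperlinks and form actions such as:

<A HREF="more.html">

<FORM ACTION="next.py" ...>

are treated exactly as if we had specified a complete URL with explicit server and path

components, like the following:

<A HREF="http://www.server.com/dirpath/more.html">

<FORM ACTION="http://www.server.com/dirpath/next.py" ...>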

The first minimal URL refers to the file more.html on the same server and in the same

directory from which the page containing this hyperlink was fetched; it is expanded to

a complete URL within the browser. URLs can also employ Unix-style relative path

syntax in the file path component. A hyperlink tag like <A HREF="../spam.gif">, for

instance, names a GIF file on the server machine and parent directory of the file that

contains this link’s URL.
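
The expansion a browser performs on minimal URLs can be imitated with urllib.parse.urljoin. This is a sketch of the idea, not a description of any browser's internals; the containing page's name, page.html, is a hypothetical stand-in.

```python
from urllib.parse import urljoin

# Address of the page that contains the links (page.html is hypothetical)
base = 'http://www.server.com/dirpath/page.html'

same_dir = urljoin(base, 'more.html')       # same server, same directory
parent_dir = urljoin(base, '../spam.gif')   # same server, parent directory
print(same_dir)
print(parent_dir)
```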

Why all the fuss about shorter URLs? Besides extending the life of your keyboard and

eyesight, the main advantage of such minimal URLs is that they don’t need to be

changed if you ever move your pages to a new directory or server—the server and path

are inferred when the page is used; they are not hardcoded into its HTML. The flipside

of this can be fairly painful: examples that do include explicit site names and pathnames

in URLs embedded within HTML code cannot be copied to other servers without

source code changes. Scripts and special HTML tags can help here, but editing source

code can be error-prone.

The downside of minimal URLs is that they don’t trigger automatic Internet connec-

tions when followed offline. This becomes apparent only when you load pages from

local files on your computer. For example, we can generally open HTML pages without

connecting to the Internet at all by pointing a web browser to a page’s file that lives on

the local machine (e.g., by clicking on its file icon). When browsing a page locally like

this, following a fully specified URL makes the browser automatically connect to the

Internet to fetch the referenced page or script. Minimal URLs, though, are opened on

the local machine again; usually, the browser simply displays the referenced page or

script’s source code.

The net effect is that minimal URLs are more portable, but they tend to work better

when running all pages live on the Internet (or served up by a locally running web

server). To make them easier to work with, the examples in this book will often omit

the server and path components in URLs they contain. In this book, to derive a page

or script’s true URL from a minimal URL, imagine that the string:

http://localhost/

appears before the filename given by the URL. Your browser will, even if you don’t.

HTML file permission constraints

One install pointer before we move on: if you want to use a different server and machine,

it may be necessary on some platforms to grant web page files and their directories

world-readable permission. That’s because they are loaded by arbitrary people over the

Web (often by someone named “nobody,” who we’ll introduce in a moment).

An appropriate chmod command can be used to change permissions on Unix-like ma-

chines. For instance, a chmod 755 filename shell command usually suffices; it makes

filename readable and executable by everyone, and writable by you only.† These di-

rectory and file permission details are typical, but they can vary from server to server.

Be sure to find out about the local server’s conventions if you upload HTML files to a

remote site.

† These are not necessarily magic numbers. On Unix machines, mode 755 is a bit mask. The first 7 simply

means that you (the file’s owner) can read, write, and execute the file (7 in binary is 111—each bit enables

an access mode). The two 5s (binary 101) say that everyone else (your group and others) can read and execute

(but not write) the file. See your system’s manpage on the chmod command for more details.
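
If the bit mask is easier to read in code, the following sketch decodes mode 755 with Python's standard stat module. It only inspects the number and does not change any file; on Unix-like machines, os.chmod(filename, 0o755) would apply the same mode from within a script.

```python
import stat

mode = 0o755                       # octal literal: three bits per digit
print(bin(mode))                   # 111 101 101, for owner, group, others

# Render the mode the way a Unix file listing would, for a regular file
permissions = stat.filemode(stat.S_IFREG | mode)
print(permissions)

owner_can_write = bool(mode & stat.S_IWUSR)
others_can_write = bool(mode & stat.S_IWOTH)
print(owner_can_write, others_can_write)
```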

A First CGI Script

The HTML file we saw in the prior section is just that—an HTML file, not a CGI script.

When referenced by a browser, the remote web server simply sends back the file’s text

to produce a new page in the browser. To illustrate the nature of CGI scripts, let’s

recode the example as a Python CGI program, as shown in Example 15-3.

Example 15-3. PP4E\Internet\Web\cgi-bin\tutor0.py

#!/usr/bin/python

"""

runs on the server, prints HTML to create a new page;

url=http://localhost/cgi-bin/tutor0.py

"""

print('Content-type: text/html\n')

print('<TITLE>CGI 101</TITLE>')

print('<H1>A First CGI Script</H1>')

print('<P>Hello, CGI World!</P>')

This file, tutor0.py, makes the same sort of page as Example 15-2 if you point your

browser at it—simply replace .html with .py in the URL, and add the cgi-bin subdir-

ectory name to the path to yield its address to enter in your browser’s address field,

http://localhost/cgi-bin/tutor0.py.

But this time it’s a very different kind of animal—it is an executable program that is run

on the server in response to your access request. It’s also a completely legal Python

program, in which the page’s HTML is printed dynamically, instead of being precoded

in a static file. In fact, little is CGI-specific about this Python program; if run from the

system command line, it simply prints HTML instead of generating a browser page:

C:\...\PP4E\Internet\Web\cgi-bin> python tutor0.py

Content-type: text/html

<TITLE>CGI 101</TITLE>

<H1>A First CGI Script</H1>

<P>Hello, CGI World!</P>

When run by the HTTP server program on a web server machine, however, the standard

output stream is tied to a socket read by the browser on the client machine. In this

context, all the output is sent across the Internet to your web browser. As such, it must

be formatted per the browser’s expectations.

In particular, when the script’s output reaches your browser, the first printed line is

interpreted as a header, describing the text that follows. There can be more than one

header line in the printed response, but there must always be a blank line between the

headers and the start of the HTML code (or other data). As we’ll see later, “cookie”

state retention directives show up in the header area as well, prior to the blank line.

In this script, the first header line tells the browser that the rest of the transmission is

HTML text (text/html), and the newline character (\n) at the end of the first print call

generates an extra line feed in addition to the one that the print call generates

itself. The net effect is to insert a blank line after the header line. The rest of this pro-

gram’s output is standard HTML and is used by the browser to generate a web page

on a client, exactly as if the HTML lived in a static HTML file on the server.‡
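
To make the reply's layout concrete, here is a minimal sketch that builds the text a CGI script prints: header lines, one blank line, then the page. The helper name cgi_reply is an assumption for illustration and is not used by the book's examples.

```python
def cgi_reply(html, headers=(('Content-type', 'text/html'),)):
    """Return header lines, a blank separator line, and then the HTML text."""
    lines = ['%s: %s' % pair for pair in headers]
    return '\n'.join(lines) + '\n\n' + html

reply = cgi_reply('<H1>A First CGI Script</H1>')
print(reply)

# The first blank line is what separates the headers from the data
head, body = reply.split('\n\n', 1)
```

Leaving out the blank line is a common mistake: without it, there is nothing to mark where the headers end and the HTML begins.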

CGI scripts are accessed just like HTML files: you either type the full URL of this script

into your browser’s address field or click on the tutor0.py link line in the examples root

page of Figure 15-1 (which follows a minimal hyperlink that resolves to the script’s full

URL). Figure 15-3 shows the result page generated if you point your browser at this

script.

Figure 15-3. A simple web page from a CGI script

Installing CGI scripts

If you are running the local web server described at the start of this chapter, no extra

installation steps are required to make this example work, and you can safely skip most

of this section. If you want to put CGI scripts on another server, though, there are a

few pragmatic details you may need to know about. This section provides a brief over-

view of common CGI configuration details for reference.

‡ Notice that the script does not generate the enclosing <HTML> and <BODY> tags included in the static HTML

file of the prior section. Strictly speaking, it should; HTML without such tags is technically invalid. But

because all commonly used browsers simply ignore the omission, we’ll take some liberties with HTML syntax

in this book. If you need to care about such things, consult HTML references for more formal details.

Like HTML files, CGI scripts are simple text files that you can either create on your

local machine and upload to the server by FTP or write with a text editor running

directly on the server machine (perhaps using a Telnet or SSH client). However, because

CGI scripts are run as programs, they have some unique installation requirements that

differ from simple HTML files. In particular, they usually must be stored and named

specially, and they must be configured as programs that are executable by arbitrary

users. Depending on your needs, CGI scripts also may require help finding imported

modules and may need to be converted to the server platform’s text file format after

being uploaded. Let’s look at each install constraint in more depth:

Directory and filename conventions

First, CGI scripts need to be placed in a directory that your web server recognizes

as a program directory, and they need to be given a name that your server recognizes

as a CGI script. In the local web server we’re using in this chapter, scripts need to

be placed in a special cgi-bin subdirectory and be named with a .py extension. On

the server used for this book’s second edition, CGI scripts instead were stored in

the user’s public_html directory just like HTML files, but they required a filename

ending in a .cgi, not a .py. Some servers may allow other suffixes and program

directories; this varies widely and can sometimes be configured per server or per

user.

Execution conventions

Because they must be executed by the web server on behalf of arbitrary users on

the Web, CGI script files may also need to be given executable file permissions to

mark them as programs and be made executable by others. Again, a shell command

chmod 0755 filename does the trick on most servers.

Under some servers, CGI scripts also need the special #! line at the top, to identify

the Python interpreter that runs the file’s code. The text after the #! in the first line

simply gives the directory path to the Python executable on your server machine.

See Chapter 3 for more details on this special first line, and be sure to check your

server’s conventions for more details on non-Unix platforms.

Some servers may expect this line, even outside Unix. Most of the CGI scripts in

this book include the #! line just in case they will ever be run on Unix-like platforms;

under our locally running web server on Windows, this first line is simply ignored

as a Python comment.

One subtlety worth noting: as we saw earlier in the book, the special first line in

executable text files can normally contain either a hardcoded path to the Python

interpreter (e.g., #!/usr/bin/python) or an invocation of the env program (e.g.,

#!/usr/bin/env python), which deduces where Python lives from environment var-

iable settings (i.e., your $PATH). The env trick is less useful in CGI scripts, though,

because their environment settings may be those of the user “nobody” (not your

own), as explained in the next paragraph.

Module search path configuration (optional)

Some HTTP servers may run CGI scripts with the username “nobody” for security

reasons (this limits the user’s access to the server machine). That’s why files you

publish on the Web must have special permission settings that make them acces-

sible to other users. It also means that some CGI scripts can’t rely on the Python

module search path to be configured in any particular way. As you’ve learned by

now, the module path is normally initialized from the user’s PYTHONPATH setting

and .pth files, plus defaults which normally include the current working directory.

But because CGI scripts are run by the user “nobody,” PYTHONPATH may be arbitrary

when a CGI script runs.

Before you puzzle over this too hard, you should know that this is often not a

concern in practice. Because Python usually searches the current directory for im-

ported modules by default, this is not an issue if all of your scripts and any modules

and packages they use are stored in your web directory, and your web server

launches CGI scripts in the directory in which they reside. But if the module lives

elsewhere, you may need to modify the sys.path list in your scripts to adjust the

search path manually before imports (for instance, with sys.path.append(dirname)

calls, index assignments, and so on).

End-of-line conventions (optional)

On some Unix (and Linux) servers, you might also have to make sure that your

script text files follow the Unix end-of-line convention (\n), not DOS (\r\n). This

isn’t an issue if you edit and debug right on the server (or on another Unix machine)

or FTP files one by one in text mode. But if you edit and upload your scripts from

a PC to a Unix server in a tar file (or in FTP binary mode), you may need to convert

end-of-lines after the upload. For instance, the server that was used for the second

edition of this text returns a default error page for scripts whose end-of-lines are

in DOS format. See Chapter 6 for techniques and a note on automated end-of-line

converter scripts.

Unbuffered output streams (optional)

Under some servers, the print call may buffer its output. If you have a

long-running CGI script, to avoid making the user wait to see results, you may wish

to manually flush your printed text (call sys.stdout.flush()) or run your Python

scripts in unbuffered mode. Recall from Chapter 5 that you can make streams

unbuffered by running with the -u command-line flag or by setting your

PYTHONUNBUFFERED environment variable to a nonempty value.

To use -u in the CGI world, try using a first line on Unix-like platforms like

#!/usr/bin/python -u. In typical usage, output buffering is not usually a factor. On

some servers and clients, though, this may be a resolution for empty reply pages,

or premature end-of-script header errors—the client may time out before the buf-

fered output stream is sent (though more commonly, these cases reflect genuine

program errors in your script).
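The manual-flush option can be sketched as follows. The function name, the step labels, and the delay parameter are illustrative assumptions; the only calls that matter are write and flush on the output stream.

```python
import sys
import time

def send_in_steps(steps, out=None, delay=0.0):
    """Write one reply line per step, flushing after each one."""
    if out is None:
        out = sys.stdout
    out.write('Content-type: text/html\n\n')   # header plus blank line
    for step in steps:
        time.sleep(delay)                      # stands in for slow work
        out.write('<P>Finished %s</P>\n' % step)
        out.flush()                            # send this line now
```

A long-running script might call send_in_steps(['load', 'compute', 'report'], delay=2.0); whether the browser shows each line as it arrives still depends on the server and client involved.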

This installation process may sound a bit complex at first glance, but much of it is

server-dependent, and it’s not bad once you’ve worked through it on your own. It’s

only a concern at install time and can usually be automated to some extent with Python

scripts run on the server. To summarize, most Python CGI scripts are text files of Python

code, which:

• Are named according to your web server’s conventions (e.g., file.py)

• Are stored in a directory recognized by your web server (e.g., cgi-bin/)

• Are given executable file permissions if required (e.g., chmod 755 file.py)

• May require the special #!pythonpath line at the top for some servers

• Configure sys.path only if needed to see modules in other directories

• Use Unix end-of-line conventions if your server rejects DOS format

• Flush output buffers if required, or to send portions of the reply periodically
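Some of the checklist items above can be verified with a short script run on the server. The sketch below is one hedged way to do so; the function names and the particular checks are assumptions chosen for illustration, and the permission test is meaningful only on Unix-like systems.

```python
import os
import stat

def check_cgi_script(path):
    """Return a dict of simple install checks for one script file."""
    with open(path, 'rb') as f:
        data = f.read()
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return {
        'has_shebang': data.startswith(b'#!'),          # the #! line
        'unix_eol': b'\r\n' not in data,                # no DOS line ends
        'owner_executable': bool(mode & stat.S_IXUSR),  # execute permission
    }

def make_executable(path):
    """Rough equivalent of chmod 755 on Unix-like systems."""
    os.chmod(path, 0o755)
```

Naming conventions and the cgi-bin directory location are left out because they depend entirely on the web server in use.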

Even if you must use a server machine configured by someone else, most of the machine’s

conventions should be easy to root out during a normal debugging cycle. As

usual, you should consult the conventions for any machine to which you plan to copy

these example files.

Finding Python on remote servers

One last install pointer: even though Python doesn’t have to be installed on any clients

in the context of a server-side web application, it does have to exist on the server

machine where your CGI scripts are expected to run. If you’re running your own server

with either the webserver.py script we met earlier or an open source server such as

Apache, this is a nonissue.

But if you are using a web server that you did not configure yourself, you must be sure

that Python lives on that machine. Moreover, you need to find where it is on that

machine so that you can specify its path in the #! line at the top of your script. If you

are not sure if or where Python lives on your server machine, here are some tips:

• Especially on Unix systems, you should first assume that Python lives in a standard

place (e.g., /usr/local/bin/python): type python (or which python) in a shell window

and see if it works. Chances are that Python already lives on such machines. If you

have Telnet or SSH access on your server, a Unix find command starting at /usr

may help.

• If your server runs Linux, you’re probably set to go. Python ships as a standard

part of Linux distributions these days, and many websites and Internet Service

Providers (ISPs) run the Linux operating system; at such sites, Python probably

already lives at /usr/bin/python.

• In other environments where you cannot control the server machine yourself, it

may be harder to obtain access to an already installed Python. If so, you can relocate

your site to a server that does have Python installed, talk your ISP into installing

Python on the machine you’re trying to use, or install Python on the server machine

yourself.
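If some Python is already running on the server, the search described in these tips can also be done from Python itself. This is a minimal sketch that assumes the interpreter is named python3 or python and is on the system search path; the function name is illustrative.

```python
import shutil
import sys

def find_python(names=('python3', 'python')):
    """Return the first interpreter path found on PATH, or None."""
    for name in names:
        path = shutil.which(name)     # similar to the shell's which command
        if path:
            return path
    return None

print('On PATH:', find_python())
print('Running under:', sys.executable)
```

The path reported is what would go on the #! line at the top of a script.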

If your ISP is unsympathetic to your need for Python and you are willing to relocate

your site to one that is, you can find lists of Python-friendly ISPs by searching the Web.

And if you choose to install Python on your server machine yourself, be sure to check

out the Python world’s support for frozen binaries—with it, you can create a single

executable program file that contains the entire Python interpreter, as well as all the

standard library modules. Assuming compatible machines, such a frozen interpreter

might be uploaded to your web account by FTP in a single step, and it won’t require a

full-blown Python installation on the server. The open source PyInstaller and py2exe

systems can produce a frozen Python binary.

Finally, to run this book’s examples, make sure the Python you find or install is Python

3.X, not Python 2.X. As mentioned earlier, many commercial ISPs support the latter

but not the former as I’m writing this fourth edition, but this is expected to change over

time. If you do locate a commercial ISP with 3.X support, you should be able to upload

your files by FTP and work by SSH or Telnet. You may also be able to run this chapter’s

webserver.py script on the remote machine, though you may need to avoid using the

standard port 80, depending on how much control your account affords.
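To confirm which Python line a server actually runs, a script can test the version itself before doing anything else. The following is a small sketch of that idea; the function name and the wording of the message are illustrative.

```python
import sys

def is_python3(version_info=sys.version_info):
    """True if the given version tuple is for Python 3.X or later."""
    return version_info[0] >= 3

if not is_python3():
    # Reply in plain text so the problem is visible in the browser
    print('Content-type: text/plain\n')
    print('This script requires Python 3.X')
    sys.exit(1)
```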

Adding Pictures and Generating Tables

Let’s get back to writing server-side code. As anyone who’s ever surfed the Web knows,

web pages usually consist of more than simple text. Example 15-4 is a Python CGI

script that prints an <IMG> HTML tag in its output to produce a graphic image in the

client browser. This example isn’t very Python-specific, but note that just as for simple

HTML files, the image file (ppsmall.gif, one level up from the script file) lives on and is

downloaded from the server machine when the browser interprets the output of this

script to render the reply page (even if the server’s machine is the same as the client’s).

Example 15-4. PP4E\Internet\Web\cgi-bin\tutor1.py

#!/usr/bin/python

text = """Content-type: text/html

<H1>A Second CGI Script</H1>
<HR>
<P>Hello, CGI World!</P>
<IMG src="../ppsmall.gif">
<HR>
"""

print(text)

Notice the use of the triple-quoted string block here; the entire HTML string is sent to

the browser in one fell swoop, with the print call at the end. Be sure that the

blank line between the Content-type header and the first HTML is truly blank in the

string (it may fail in some browsers if you have any spaces or tabs on that line). If both

client and server are functional, a page that looks like Figure 15-4 will be generated

when this script is referenced and run.
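The blank-line requirement can be checked before a script is uploaded. The sketch below splits a reply string at its first empty line and complains if a line holding only spaces or tabs appears first. The function name is an assumption, and the code assumes Unix-style \n line ends.

```python
def split_reply(reply):
    """Split CGI output into (header_lines, body) at the first blank line.

    Raises ValueError if the headers are not ended by a truly empty line.
    """
    lines = reply.split('\n')
    for i, line in enumerate(lines):
        if line == '':
            # Everything before the empty line is a header
            return lines[:i], '\n'.join(lines[i + 1:])
        if line.strip() == '':
            raise ValueError('line %d has only spaces or tabs' % (i + 1))
    raise ValueError('no blank line after headers')
```

Passing the text string of Example 15-4 to split_reply should yield a single Content-type header line and the HTML as the body.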

So far, our CGI scripts have been putting out canned HTML that could have just as

easily been stored in an HTML file. But because CGI scripts are executable programs,

they can also be used to generate HTML on the fly, dynamically—even, possibly, in

response to a particular set of user inputs sent to the script. That’s the whole purpose

of CGI scripts, after all. Let’s start using this to better advantage now, and write a Python

script that builds up response HTML programmatically, listed in Example 15-5.

Example 15-5. PP4E\Internet\Web\cgi-bin\tutor2.py

#!/usr/bin/python

print("""Content-type: text/html

<H1>A Third CGI Script</H1>
<HR>
<P>Hello, CGI World!</P>

<table border=1>
""")

for i in range(5):
    print('<tr>')
    for j in range(4):
        print('<td>%d.%d</td>' % (i, j))
    print('</tr>')

print("""
</table>
<HR>
""")

Despite all the tags, this really is Python code: the tutor2.py script uses triple-quoted

strings to embed blocks of HTML again. But this time, the script also uses nested Python

for loops to dynamically generate part of the HTML that is sent to the browser. Specifically,

it emits HTML to lay out a two-dimensional table in the middle of a page, as

shown in Figure 15-5.

Figure 15-4. A page with an image generated by tutor1.py

Figure 15-5. A page with a table generated by tutor2.py

Each cell in the table displays a “row.column” pair, as generated by the executing

Python script. If you’re curious how the generated HTML looks, select your browser’s

View Source option after you’ve accessed this page. It’s a single HTML page composed

of the HTML generated by the first print in the script, then the for loops, and finally

the last print. In other words, the concatenation of this script’s output is an HTML

document with headers.
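To make the "concatenation" point concrete, the same reply can be built as a single string and printed once. This is an alternative sketch, not the book's tutor2.py; the function names are illustrative, and the border=1 attribute on the table tag is an assumption.

```python
def table_rows(nrows, ncols):
    """Return the HTML lines for a table body of row.column cells."""
    lines = []
    for i in range(nrows):
        lines.append('<tr>')
        for j in range(ncols):
            lines.append('<td>%d.%d</td>' % (i, j))
        lines.append('</tr>')
    return lines

def build_page(nrows=5, ncols=4):
    """Return the whole reply: header line, blank line, then HTML."""
    parts = ['Content-type: text/html', '',
             '<H1>A Third CGI Script</H1>',
             '<table border=1>']              # border=1 is an assumption
    parts += table_rows(nrows, ncols)
    parts += ['</table>', '<HR>']
    return '\n'.join(parts) + '\n'

print(build_page())
```

Building the page in memory first makes it easy to inspect or test the reply text without a browser or server.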

Table tags

The script in Example 15-5 generates HTML table tags. Again, we’re not out to learn

HTML here, but we’ll take a quick look just so that you can make sense of this book’s

examples. Tables are declared by the text between <table> and </table> tags in HTML.

Typically, a table’s text in turn declares the contents of each table row between <tr>

and </tr> tags and each column within a row between <td> and </td> tags. The loops

in our script build up HTML to declare five rows of four columns each by printing the

appropriate tags, with the current row and column number as column values.

For instance, here is part of the script’s output, defining the first two rows (to see the

full output, run the script standalone from a system command line, or select your

browser’s View Source option):

<table border=1>
<tr>
<td>0.0</td>
<td>0.1</td>
<td>0.2</td>
<td>0.3</td>
</tr>
<tr>
<td>1.0</td>
<td>1.1</td>
<td>1.2</td>
<td>1.3</td>
</tr>
. . .
</table>

Other table tags and options let us specify a row title (<th>), lay out borders, and so

on. We’ll use more table syntax to lay out forms in a uniform fashion later in this

tutorial.

Adding User Interaction

CGI scripts are great at generating HTML on the fly like this, but they are also commonly

used to implement interaction with a user typing at a web browser. As described

earlier in this chapter, web interactions usually involve a two-step process and two

distinct web pages: you fill out an input form page and press Submit, and a reply page

eventually comes back. In between, a CGI script processes the form input.

Submission page

That description sounds simple enough, but the process of collecting user inputs requires

an understanding of a special HTML tag, <form>. Let’s look at the implementation

of a simple web interaction to see forms at work. First, we need to define a form

page for the user to fill out, as shown in Example 15-6.

Example 15-6. PP4E\Internet\Web\tutor3.html

<html>
<body>
<H1>A first user interaction: forms</H1>
<hr>
<P><B>Enter your name:</B>
</form>
</body></html>

tutor3.html is a simple HTML file, not a CGI script (though its contents could be printed

from a script as well). When this file is accessed, all the text between its <form> and

</form> tags generates the input fields and Submit button shown in Figure 15-6.
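As a rough illustration of printing such a page from a script instead of storing it in a file, the sketch below emits a form with one text field and a Submit button. The action URL, the field name, and the POST method are hypothetical placeholders chosen for this sketch; they are not taken from the tutor3.html listing.

```python
import html

# Hypothetical values: substitute your own handler script and field name
ACTION_URL = 'http://localhost/cgi-bin/handler.py'
FIELD_NAME = 'user'

FORM_PAGE = """Content-type: text/html

<html>
<body>
<H1>A first user interaction: forms</H1>
<hr>
<form method="POST" action="%(action)s">
<P><B>Enter your name:</B>
<P><input type="text" name="%(field)s">
<P><input type="submit">
</form>
</body></html>
"""

def form_page(action=ACTION_URL, field=FIELD_NAME):
    """Return the reply text, escaping values placed in attributes."""
    return FORM_PAGE % {'action': html.escape(action, quote=True),
                        'field': html.escape(field, quote=True)}

print(form_page())
```

Generating the form from a script is mainly useful when parts of it, such as the action URL or default field values, must vary per request.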

Figure 15-6. A simple form page generated by tutor3.html
