FIFTH EDITION
Learning Python
Mark Lutz
Beijing • Cambridge • Farnham • Köln • Sebastopol • Tokyo
Learning Python, Fifth Edition
by Mark Lutz
Copyright © 2013 Mark Lutz. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions
are also available for most titles (http://my.safaribooksonline.com). For more information, contact our
corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
Editor: Rachel Roumeliotis
Production Editor: Christopher Hearse
Copyeditor: Rachel Monaghan
Proofreader: Julie Van Keuren
Indexer: Lucie Haskins
Cover Designer: Randy Comer
Interior Designer: David Futato
Illustrator: Rebecca Demarest
June 2013: Fifth Edition.
Revision History for the Fifth Edition:
2013-06-07 First release
See http://oreilly.com/catalog/errata.csp?isbn=9781449355739 for release details.
Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of
O’Reilly Media, Inc. Learning Python, 5th Edition, the image of a wood rat, and related trade dress are
trademarks of O’Reilly Media, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as
trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a
trademark claim, the designations have been printed in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.
ISBN: 978-1-449-35573-9
To Vera.
You are my life.
Table of Contents
Preface .................................................................. xxxiii
Part I. Getting Started
1. A Python Q&A Session ................................................... 3
Why Do People Use Python? 3
Software Quality 4
Developer Productivity 5
Is Python a “Scripting Language”? 5
OK, but What’s the Downside? 7
Who Uses Python Today? 9
What Can I Do with Python? 10
Systems Programming 11
GUIs 11
Internet Scripting 11
Component Integration 12
Database Programming 12
Rapid Prototyping 13
Numeric and Scientific Programming 13
And More: Gaming, Images, Data Mining, Robots, Excel... 14
How Is Python Developed and Supported? 15
Open Source Tradeoffs 15
What Are Python’s Technical Strengths? 16
It’s Object-Oriented and Functional 16
It’s Free 17
It’s Portable 17
It’s Powerful 18
It’s Mixable 19
It’s Relatively Easy to Use 19
It’s Relatively Easy to Learn 20
It’s Named After Monty Python 20
How Does Python Stack Up to Language X? 21
Chapter Summary 22
Test Your Knowledge: Quiz 23
Test Your Knowledge: Answers 23
2. How Python Runs Programs ............................................. 27
Introducing the Python Interpreter 27
Program Execution 28
The Programmer’s View 28
Python’s View 30
Execution Model Variations 33
Python Implementation Alternatives 33
Execution Optimization Tools 37
Frozen Binaries 39
Future Possibilities? 40
Chapter Summary 40
Test Your Knowledge: Quiz 41
Test Your Knowledge: Answers 41
3. How You Run Programs ................................................. 43
The Interactive Prompt 43
Starting an Interactive Session 44
The System Path 45
New Windows Options in 3.3: PATH, Launcher 46
Where to Run: Code Directories 47
What Not to Type: Prompts and Comments 48
Running Code Interactively 49
Why the Interactive Prompt? 50
Usage Notes: The Interactive Prompt 52
System Command Lines and Files 54
A First Script 55
Running Files with Command Lines 56
Command-Line Usage Variations 57
Usage Notes: Command Lines and Files 58
Unix-Style Executable Scripts: #! 59
Unix Script Basics 59
The Unix env Lookup Trick 60
The Python 3.3 Windows Launcher: #! Comes to Windows 60
Clicking File Icons 62
Icon-Click Basics 62
Clicking Icons on Windows 63
The input Trick on Windows 63
Other Icon-Click Limitations 66
Module Imports and Reloads 66
Import and Reload Basics 66
The Grander Module Story: Attributes 68
Usage Notes: import and reload 71
Using exec to Run Module Files 72
The IDLE User Interface 73
IDLE Startup Details 74
IDLE Basic Usage 75
IDLE Usability Features 76
Advanced IDLE Tools 77
Usage Notes: IDLE 78
Other IDEs 79
Other Launch Options 81
Embedding Calls 81
Frozen Binary Executables 82
Text Editor Launch Options 82
Still Other Launch Options 82
Future Possibilities? 83
Which Option Should I Use? 83
Chapter Summary 85
Test Your Knowledge: Quiz 85
Test Your Knowledge: Answers 86
Test Your Knowledge: Part I Exercises 87
Part II. Types and Operations
4. Introducing Python Object Types ......................................... 93
The Python Conceptual Hierarchy 93
Why Use Built-in Types? 94
Python’s Core Data Types 95
Numbers 97
Strings 99
Sequence Operations 99
Immutability 101
Type-Specific Methods 102
Getting Help 104
Other Ways to Code Strings 105
Unicode Strings 106
Pattern Matching 108
Lists 109
Sequence Operations 109
Type-Specific Operations 109
Bounds Checking 110
Nesting 110
Comprehensions 111
Dictionaries 113
Mapping Operations 114
Nesting Revisited 115
Missing Keys: if Tests 116
Sorting Keys: for Loops 118
Iteration and Optimization 120
Tuples 121
Why Tuples? 122
Files 122
Binary Bytes Files 123
Unicode Text Files 124
Other File-Like Tools 126
Other Core Types 126
How to Break Your Code’s Flexibility 128
User-Defined Classes 129
And Everything Else 130
Chapter Summary 130
Test Your Knowledge: Quiz 131
Test Your Knowledge: Answers 131
5. Numeric Types ....................................................... 133
Numeric Type Basics 133
Numeric Literals 134
Built-in Numeric Tools 136
Python Expression Operators 136
Numbers in Action 141
Variables and Basic Expressions 141
Numeric Display Formats 143
Comparisons: Normal and Chained 144
Division: Classic, Floor, and True 146
Integer Precision 150
Complex Numbers 151
Hex, Octal, Binary: Literals and Conversions 151
Bitwise Operations 153
Other Built-in Numeric Tools 155
Other Numeric Types 157
Decimal Type 157
Fraction Type 160
Sets 163
Booleans 171
Numeric Extensions 172
Chapter Summary 172
Test Your Knowledge: Quiz 173
Test Your Knowledge: Answers 173
6. The Dynamic Typing Interlude .......................................... 175
The Case of the Missing Declaration Statements 175
Variables, Objects, and References 176
Types Live with Objects, Not Variables 177
Objects Are Garbage-Collected 178
Shared References 180
Shared References and In-Place Changes 181
Shared References and Equality 183
Dynamic Typing Is Everywhere 185
Chapter Summary 186
Test Your Knowledge: Quiz 186
Test Your Knowledge: Answers 186
7. String Fundamentals .................................................. 189
This Chapter’s Scope 189
Unicode: The Short Story 189
String Basics 190
String Literals 192
Single- and Double-Quoted Strings Are the Same 193
Escape Sequences Represent Special Characters 193
Raw Strings Suppress Escapes 196
Triple Quotes Code Multiline Block Strings 198
Strings in Action 200
Basic Operations 200
Indexing and Slicing 201
String Conversion Tools 205
Changing Strings I 208
String Methods 209
Method Call Syntax 209
Methods of Strings 210
String Method Examples: Changing Strings II 211
String Method Examples: Parsing Text 213
Other Common String Methods in Action 214
The Original string Module’s Functions (Gone in 3.X) 215
String Formatting Expressions 216
Formatting Expression Basics 217
Advanced Formatting Expression Syntax 218
Advanced Formatting Expression Examples 220
Dictionary-Based Formatting Expressions 221
String Formatting Method Calls 222
Formatting Method Basics 222
Adding Keys, Attributes, and Offsets 223
Advanced Formatting Method Syntax 224
Advanced Formatting Method Examples 225
Comparison to the % Formatting Expression 227
Why the Format Method? 230
General Type Categories 235
Types Share Operation Sets by Categories 235
Mutable Types Can Be Changed in Place 236
Chapter Summary 237
Test Your Knowledge: Quiz 237
Test Your Knowledge: Answers 237
8. Lists and Dictionaries .................................................. 239
Lists 239
Lists in Action 242
Basic List Operations 242
List Iteration and Comprehensions 242
Indexing, Slicing, and Matrixes 243
Changing Lists in Place 244
Dictionaries 250
Dictionaries in Action 252
Basic Dictionary Operations 253
Changing Dictionaries in Place 254
More Dictionary Methods 254
Example: Movie Database 256
Dictionary Usage Notes 258
Other Ways to Make Dictionaries 262
Dictionary Changes in Python 3.X and 2.7 264
Chapter Summary 271
Test Your Knowledge: Quiz 272
Test Your Knowledge: Answers 272
9. Tuples, Files, and Everything Else .................................. 275
Tuples 276
Tuples in Action 277
Why Lists and Tuples? 279
Records Revisited: Named Tuples 280
Files 282
Opening Files 283
Using Files 284
Files in Action 285
Text and Binary Files: The Short Story 287
Storing Python Objects in Files: Conversions 288
Storing Native Python Objects: pickle 290
Storing Python Objects in JSON Format 291
Storing Packed Binary Data: struct 293
File Context Managers 294
Other File Tools 294
Core Types Review and Summary 295
Object Flexibility 297
References Versus Copies 297
Comparisons, Equality, and Truth 300
The Meaning of True and False in Python 304
Python’s Type Hierarchies 306
Type Objects 306
Other Types in Python 308
Built-in Type Gotchas 308
Assignment Creates References, Not Copies 308
Repetition Adds One Level Deep 309
Beware of Cyclic Data Structures 310
Immutable Types Can’t Be Changed in Place 311
Chapter Summary 311
Test Your Knowledge: Quiz 311
Test Your Knowledge: Answers 312
Test Your Knowledge: Part II Exercises 313
Part III. Statements and Syntax
10. Introducing Python Statements ......................................... 319
The Python Conceptual Hierarchy Revisited 319
Python’s Statements 320
A Tale of Two ifs 322
What Python Adds 322
What Python Removes 323
Why Indentation Syntax? 324
A Few Special Cases 327
A Quick Example: Interactive Loops 329
A Simple Interactive Loop 329
Doing Math on User Inputs 331
Handling Errors by Testing Inputs 332
Handling Errors with try Statements 333
Nesting Code Three Levels Deep 335
Chapter Summary 336
Test Your Knowledge: Quiz 336
Test Your Knowledge: Answers 336
11. Assignments, Expressions, and Prints ............................... 339
Assignment Statements 339
Assignment Statement Forms 340
Sequence Assignments 341
Extended Sequence Unpacking in Python 3.X 344
Multiple-Target Assignments 348
Augmented Assignments 350
Variable Name Rules 352
Expression Statements 356
Expression Statements and In-Place Changes 357
Print Operations 358
The Python 3.X print Function 359
The Python 2.X print Statement 361
Print Stream Redirection 363
Version-Neutral Printing 366
Chapter Summary 369
Test Your Knowledge: Quiz 370
Test Your Knowledge: Answers 370
12. if Tests and Syntax Rules ............................................... 371
if Statements 371
General Format 371
Basic Examples 372
Multiway Branching 372
Python Syntax Revisited 375
Block Delimiters: Indentation Rules 376
Statement Delimiters: Lines and Continuations 378
A Few Special Cases 379
Truth Values and Boolean Tests 380
The if/else Ternary Expression 382
Chapter Summary 385
Test Your Knowledge: Quiz 385
Test Your Knowledge: Answers 386
13. while and for Loops ................................................... 387
while Loops 387
General Format 388
Examples 388
break, continue, pass, and the Loop else 389
General Loop Format 389
pass 390
continue 391
break 391
Loop else 392
for Loops 395
General Format 395
Examples 395
Loop Coding Techniques 402
Counter Loops: range 402
Sequence Scans: while and range Versus for 403
Sequence Shufflers: range and len 404
Nonexhaustive Traversals: range Versus Slices 405
Changing Lists: range Versus Comprehensions 406
Parallel Traversals: zip and map 407
Generating Both Offsets and Items: enumerate 410
Chapter Summary 413
Test Your Knowledge: Quiz 414
Test Your Knowledge: Answers 414
14. Iterations and Comprehensions ......................................... 415
Iterations: A First Look 416
The Iteration Protocol: File Iterators 416
Manual Iteration: iter and next 419
Other Built-in Type Iterables 422
List Comprehensions: A First Detailed Look 424
List Comprehension Basics 425
Using List Comprehensions on Files 426
Extended List Comprehension Syntax 427
Other Iteration Contexts 429
New Iterables in Python 3.X 434
Impacts on 2.X Code: Pros and Cons 434
The range Iterable 435
The map, zip, and filter Iterables 436
Multiple Versus Single Pass Iterators 438
Dictionary View Iterables 439
Other Iteration Topics 440
Chapter Summary 441
Test Your Knowledge: Quiz 441
Test Your Knowledge: Answers 441
15. The Documentation Interlude ........................................... 443
Python Documentation Sources 443
# Comments 444
The dir Function 444
Docstrings: __doc__ 446
PyDoc: The help Function 449
PyDoc: HTML Reports 452
Beyond docstrings: Sphinx 461
The Standard Manual Set 461
Web Resources 462
Published Books 463
Common Coding Gotchas 463
Chapter Summary 465
Test Your Knowledge: Quiz 466
Test Your Knowledge: Answers 466
Test Your Knowledge: Part III Exercises 467
Part IV. Functions and Generators
16. Function Basics ....................................................... 473
Why Use Functions? 474
Coding Functions 475
def Statements 476
def Executes at Runtime 477
A First Example: Definitions and Calls 478
Definition 478
Calls 478
Polymorphism in Python 479
A Second Example: Intersecting Sequences 480
Definition 481
Calls 481
Polymorphism Revisited 482
Local Variables 483
Chapter Summary 483
Test Your Knowledge: Quiz 483
Test Your Knowledge: Answers 484
17. Scopes .............................................................. 485
Python Scope Basics 485
Scope Details 486
Name Resolution: The LEGB Rule 488
Scope Example 490
The Built-in Scope 491
The global Statement 494
Program Design: Minimize Global Variables 495
Program Design: Minimize Cross-File Changes 497
Other Ways to Access Globals 498
Scopes and Nested Functions 499
Nested Scope Details 500
Nested Scope Examples 500
Factory Functions: Closures 501
Retaining Enclosing Scope State with Defaults 504
The nonlocal Statement in 3.X 508
nonlocal Basics 508
nonlocal in Action 509
Why nonlocal? State Retention Options 512
State with nonlocal: 3.X only 512
State with Globals: A Single Copy Only 513
State with Classes: Explicit Attributes (Preview) 513
State with Function Attributes: 3.X and 2.X 515
Chapter Summary 519
Test Your Knowledge: Quiz 519
Test Your Knowledge: Answers 520
18. Arguments .......................................................... 523
Argument-Passing Basics 523
Arguments and Shared References 524
Avoiding Mutable Argument Changes 526
Simulating Output Parameters and Multiple Results 527
Special Argument-Matching Modes 528
Argument Matching Basics 529
Argument Matching Syntax 530
The Gritty Details 531
Keyword and Default Examples 532
Arbitrary Arguments Examples 534
Python 3.X Keyword-Only Arguments 539
The min Wakeup Call! 542
Full Credit 542
Bonus Points 544
The Punch Line... 544
Generalized Set Functions 545
Emulating the Python 3.X print Function 547
Using Keyword-Only Arguments 548
Chapter Summary 550
Test Your Knowledge: Quiz 551
Test Your Knowledge: Answers 552
19. Advanced Function Topics .............................................. 553
Function Design Concepts 553
Recursive Functions 555
Summation with Recursion 555
Coding Alternatives 556
Loop Statements Versus Recursion 557
Handling Arbitrary Structures 558
Function Objects: Attributes and Annotations 562
Indirect Function Calls: “First Class” Objects 562
Function Introspection 563
Function Attributes 564
Function Annotations in 3.X 565
Anonymous Functions: lambda 567
lambda Basics 568
Why Use lambda? 569
How (Not) to Obfuscate Your Python Code 571
Scopes: lambdas Can Be Nested Too 572
Functional Programming Tools 574
Mapping Functions over Iterables: map 574
Selecting Items in Iterables: filter 576
Combining Items in Iterables: reduce 576
Chapter Summary 578
Test Your Knowledge: Quiz 578
Test Your Knowledge: Answers 578
20. Comprehensions and Generations ..................................... 581
List Comprehensions and Functional Tools 581
List Comprehensions Versus map 582
Adding Tests and Nested Loops: filter 583
Example: List Comprehensions and Matrixes 586
Don’t Abuse List Comprehensions: KISS 588
Generator Functions and Expressions 591
Generator Functions: yield Versus return 592
Generator Expressions: Iterables Meet Comprehensions 597
Generator Functions Versus Generator Expressions 602
Generators Are Single-Iteration Objects 604
Generation in Built-in Types, Tools, and Classes 606
Example: Generating Scrambled Sequences 609
Don’t Abuse Generators: EIBTI 614
Example: Emulating zip and map with Iteration Tools 617
Comprehension Syntax Summary 622
Scopes and Comprehension Variables 623
Comprehending Set and Dictionary Comprehensions 624
Extended Comprehension Syntax for Sets and Dictionaries 625
Chapter Summary 626
Test Your Knowledge: Quiz 626
Test Your Knowledge: Answers 626
21. The Benchmarking Interlude ........................................... 629
Timing Iteration Alternatives 629
Timing Module: Homegrown 630
Timing Script 634
Timing Results 635
Timing Module Alternatives 638
Other Suggestions 642
Timing Iterations and Pythons with timeit 642
Basic timeit Usage 643
Benchmark Module and Script: timeit 647
Benchmark Script Results 649
More Fun with Benchmarks 651
Other Benchmarking Topics: pystones 656
Function Gotchas 656
Local Names Are Detected Statically 657
Defaults and Mutable Objects 658
Functions Without returns 660
Miscellaneous Function Gotchas 661
Chapter Summary 661
Test Your Knowledge: Quiz 662
Test Your Knowledge: Answers 662
Test Your Knowledge: Part IV Exercises 663
Part V. Modules and Packages
22. Modules: The Big Picture ............................................... 669
Why Use Modules? 669
Python Program Architecture 670
How to Structure a Program 671
Imports and Attributes 671
Standard Library Modules 673
How Imports Work 674
1. Find It 674
2. Compile It (Maybe) 675
3. Run It 675
Byte Code Files: __pycache__ in Python 3.2+ 676
Byte Code File Models in Action 677
The Module Search Path 678
Configuring the Search Path 681
Search Path Variations 681
The sys.path List 681
Module File Selection 682
Chapter Summary 685
Test Your Knowledge: Quiz 685
Test Your Knowledge: Answers 685
23. Module Coding Basics .................................................. 687
Module Creation 687
Module Filenames 687
Other Kinds of Modules 688
Module Usage 688
The import Statement 689
The from Statement 689
The from * Statement 689
Imports Happen Only Once 690
import and from Are Assignments 691
import and from Equivalence 692
Potential Pitfalls of the from Statement 693
Module Namespaces 694
Files Generate Namespaces 695
Namespace Dictionaries: __dict__ 696
Attribute Name Qualification 697
Imports Versus Scopes 698
Namespace Nesting 699
Reloading Modules 700
reload Basics 701
reload Example 702
Chapter Summary 703
Test Your Knowledge: Quiz 704
Test Your Knowledge: Answers 704
24. Module Packages ..................................................... 707
Package Import Basics 708
Packages and Search Path Settings 708
Package __init__.py Files 709
Package Import Example 711
from Versus import with Packages 713
Why Use Package Imports? 713
A Tale of Three Systems 714
Package Relative Imports 717
Changes in Python 3.X 718
Relative Import Basics 718
Why Relative Imports? 720
The Scope of Relative Imports 722
Module Lookup Rules Summary 723
Relative Imports in Action 723
Pitfalls of Package-Relative Imports: Mixed Use 729
Python 3.3 Namespace Packages 734
Namespace Package Semantics 735
Impacts on Regular Packages: Optional __init__.py 736
Namespace Packages in Action 737
Namespace Package Nesting 738
Files Still Have Precedence over Directories 740
Chapter Summary 742
Test Your Knowledge: Quiz 742
Test Your Knowledge: Answers 742
25. Advanced Module Topics ............................................... 745
Module Design Concepts 745
Data Hiding in Modules 747
Minimizing from * Damage: _X and __all__ 747
Enabling Future Language Features: __future__ 748
Mixed Usage Modes: __name__ and __main__ 749
Unit Tests with __name__ 750
Example: Dual Mode Code 751
Currency Symbols: Unicode in Action 754
Docstrings: Module Documentation at Work 756
Changing the Module Search Path 756
The as Extension for import and from 758
Example: Modules Are Objects 759
Importing Modules by Name String 761
Running Code Strings 762
Direct Calls: Two Options 762
Example: Transitive Module Reloads 763
A Recursive Reloader 764
Alternative Codings 767
Module Gotchas 770
Module Name Clashes: Package and Package-Relative Imports 771
Statement Order Matters in Top-Level Code 771
from Copies Names but Doesn’t Link 772
from * Can Obscure the Meaning of Variables 773
reload May Not Impact from Imports 773
reload, from, and Interactive Testing 774
Recursive from Imports May Not Work 775
Chapter Summary 776
Test Your Knowledge: Quiz 777
Test Your Knowledge: Answers 777
Test Your Knowledge: Part V Exercises 778
Part VI. Classes and OOP
26. OOP: The Big Picture ................................................... 783
Why Use Classes? 784
OOP from 30,000 Feet 785
Attribute Inheritance Search 785
Classes and Instances 788
Method Calls 788
Coding Class Trees 789
Operator Overloading 791
OOP Is About Code Reuse 792
Chapter Summary 795
Test Your Knowledge: Quiz 795
Test Your Knowledge: Answers 795
27. Class Coding Basics .................................................... 797
Classes Generate Multiple Instance Objects 797
Class Objects Provide Default Behavior 798
Instance Objects Are Concrete Items 798
A First Example 799
Classes Are Customized by Inheritance 801
A Second Example 802
Classes Are Attributes in Modules 804
Classes Can Intercept Python Operators 805
A Third Example 806
Why Use Operator Overloading? 808
The World’s Simplest Python Class 809
Records Revisited: Classes Versus Dictionaries 812
Chapter Summary 814
Test Your Knowledge: Quiz 815
Test Your Knowledge: Answers 815
28. A More Realistic Example ............................................... 817
Step 1: Making Instances 818
Coding Constructors 818
Testing As You Go 819
Using Code Two Ways 820
Step 2: Adding Behavior Methods 822
Coding Methods 824
Step 3: Operator Overloading 826
Providing Print Displays 826
Step 4: Customizing Behavior by Subclassing 828
Coding Subclasses 828
Augmenting Methods: The Bad Way 829
Augmenting Methods: The Good Way 829
Polymorphism in Action 832
Inherit, Customize, and Extend 833
OOP: The Big Idea 833
Step 5: Customizing Constructors, Too 834
OOP Is Simpler Than You May Think 836
Other Ways to Combine Classes 836
Step 6: Using Introspection Tools 840
Special Class Attributes 840
A Generic Display Tool 842
Instance Versus Class Attributes 843
Name Considerations in Tool Classes 844
Our Classes’ Final Form 845
Step 7 (Final): Storing Objects in a Database 847
Pickles and Shelves 847
Storing Objects on a Shelve Database 848
Exploring Shelves Interactively 849
Updating Objects on a Shelve 851
Future Directions 853
Chapter Summary 855
Test Your Knowledge: Quiz 855
Test Your Knowledge: Answers 856
29. Class Coding Details ................................................... 859
The class Statement 859
General Form 860
Example 860
Methods 862
Method Example 863
Calling Superclass Constructors 864
Other Method Call Possibilities 864
Inheritance 865
Attribute Tree Construction 865
Specializing Inherited Methods 866
Class Interface Techniques 867
Abstract Superclasses 869
Namespaces: The Conclusion 872
Simple Names: Global Unless Assigned 872
Attribute Names: Object Namespaces 872
The “Zen” of Namespaces: Assignments Classify Names 873
Nested Classes: The LEGB Scopes Rule Revisited 875
Namespace Dictionaries: Review 878
Namespace Links: A Tree Climber 880
Documentation Strings Revisited 882
Classes Versus Modules 884
Chapter Summary 884
Test Your Knowledge: Quiz 884
Test Your Knowledge: Answers 885
30. Operator Overloading ................................................. 887
The Basics 887
Constructors and Expressions: __init__ and __sub__ 888
Common Operator Overloading Methods 888
Indexing and Slicing: __getitem__ and __setitem__ 890
Intercepting Slices 891
Slicing and Indexing in Python 2.X 893
But 3.X’s __index__ Is Not Indexing! 894
Index Iteration: __getitem__ 894
Iterable Objects: __iter__ and __next__ 895
User-Defined Iterables 896
Multiple Iterators on One Object 899
Coding Alternative: __iter__ plus yield 902
Membership: __contains__, __iter__, and __getitem__ 906
Attribute Access: __getattr__ and __setattr__ 909
Attribute Reference 909
Attribute Assignment and Deletion 910
Other Attribute Management Tools 912
Emulating Privacy for Instance Attributes: Part 1 912
String Representation: __repr__ and __str__ 913
Why Two Display Methods? 914
Display Usage Notes 916
Right-Side and In-Place Uses: __radd__ and __iadd__ 917
Right-Side Addition 917
In-Place Addition 920
Call Expressions: __call__ 921
Function Interfaces and Callback-Based Code 923
Comparisons: __lt__, __gt__, and Others 925
The __cmp__ Method in Python 2.X 926
Boolean Tests: __bool__ and __len__ 927
Boolean Methods in Python 2.X 928
Object Destruction: __del__ 929
Destructor Usage Notes 930
Chapter Summary 931
Test Your Knowledge: Quiz 931
Test Your Knowledge: Answers 931
31. Designing with Classes ................................................. 933
Python and OOP 933
Polymorphism Means Interfaces, Not Call Signatures 934
OOP and Inheritance: “Is-a” Relationships 935
OOP and Composition: “Has-a” Relationships 937
Stream Processors Revisited 938
OOP and Delegation: “Wrapper” Proxy Objects 942
Pseudoprivate Class Attributes 944
Name Mangling Overview 945
Why Use Pseudoprivate Attributes? 945
Methods Are Objects: Bound or Unbound 948
Unbound Methods Are Functions in 3.X 950
Bound Methods and Other Callable Objects 951
Classes Are Objects: Generic Object Factories 954
Why Factories? 955
Multiple Inheritance: “Mix-in” Classes 956
Coding Mix-in Display Classes 957
Other Design-Related Topics 977
Chapter Summary 977
Test Your Knowledge: Quiz 978
Test Your Knowledge: Answers 978
32. Advanced Class Topics ................................................. 979
Extending Built-in Types 980
Extending Types by Embedding 980
Extending Types by Subclassing 981
The “New Style” Class Model 983
Just How New Is New-Style? 984
New-Style Class Changes 985
Attribute Fetch for Built-ins Skips Instances 987
Type Model Changes 992
All Classes Derive from “object” 995
Diamond Inheritance Change 997
More on the MRO: Method Resolution Order 1001
Example: Mapping Attributes to Inheritance Sources 1004
New-Style Class Extensions 1010
Slots: Attribute Declarations 1010
Properties: Attribute Accessors 1020
__getattribute__ and Descriptors: Attribute Tools 1023
Other Class Changes and Extensions 1023
Static and Class Methods 1024
Why the Special Methods? 1024
Static Methods in 2.X and 3.X 1025
Static Method Alternatives 1027
Using Static and Class Methods 1028
Counting Instances with Static Methods 1030
Counting Instances with Class Methods 1031
Decorators and Metaclasses: Part 1 1034
Function Decorator Basics 1035
A First Look at User-Defined Function Decorators 1037
A First Look at Class Decorators and Metaclasses 1038
For More Details 1040
The super Built-in Function: For Better or Worse? 1041
The Great super Debate 1041
Traditional Superclass Call Form: Portable, General 1042
Basic super Usage and Its Tradeoffs 1043
The super Upsides: Tree Changes and Dispatch 1049
Runtime Class Changes and super 1049
Cooperative Multiple Inheritance Method Dispatch 1050
The super Summary 1062
Class Gotchas 1064
Changing Class Attributes Can Have Side Effects 1064
Changing Mutable Class Attributes Can Have Side Effects, Too 1065
Multiple Inheritance: Order Matters 1066
Scopes in Methods and Classes 1068
Miscellaneous Class Gotchas 1069
KISS Revisited: “Overwrapping-itis” 1070
Chapter Summary 1070
Test Your Knowledge: Quiz 1071
Test Your Knowledge: Answers 1071
Test Your Knowledge: Part VI Exercises 1072
Part VII. Exceptions and Tools
33. Exception Basics ..................................................... 1081
Why Use Exceptions? 1081
Exception Roles 1082
Exceptions: The Short Story 1083
Default Exception Handler 1083
Catching Exceptions 1084
Raising Exceptions 1085
User-Defined Exceptions 1086
Termination Actions 1087
Chapter Summary 1089
Test Your Knowledge: Quiz 1090
Test Your Knowledge: Answers 1090
34. Exception Coding Details .............................................. 1093
The try/except/else Statement 1093
How try Statements Work 1094
try Statement Clauses 1095
The try else Clause 1098
Example: Default Behavior 1098
Example: Catching Built-in Exceptions 1100
The try/finally Statement 1100
Example: Coding Termination Actions with try/finally 1101
Unified try/except/finally 1102
Unified try Statement Syntax 1104
Combining finally and except by Nesting 1104
Unified try Example 1105
The raise Statement 1106
Raising Exceptions 1107
Scopes and try except Variables 1108
Propagating Exceptions with raise 1110
Python 3.X Exception Chaining: raise from 1110
The assert Statement 1112
Example: Trapping Constraints (but Not Errors!) 1113
with/as Context Managers 1114
Basic Usage 1114
The Context Management Protocol 1116
Multiple Context Managers in 3.1, 2.7, and Later 1118
Chapter Summary 1119
Test Your Knowledge: Quiz 1120
Test Your Knowledge: Answers 1120
35. Exception Objects .................................................... 1123
Exceptions: Back to the Future 1124
String Exceptions Are Right Out! 1124
Class-Based Exceptions 1125
Coding Exceptions Classes 1126
Why Exception Hierarchies? 1128
Built-in Exception Classes 1131
Built-in Exception Categories 1132
Default Printing and State 1133
Custom Print Displays 1135
Custom Data and Behavior 1136
Providing Exception Details 1136
Providing Exception Methods 1137
Chapter Summary 1139
Test Your Knowledge: Quiz 1139
Test Your Knowledge: Answers 1139
36. Designing with Exceptions ............................................ 1141
Nesting Exception Handlers 1141
Example: Control-Flow Nesting 1143
Example: Syntactic Nesting 1143
Exception Idioms 1145
Breaking Out of Multiple Nested Loops: “go to” 1145
Exceptions Aren’t Always Errors 1146
Functions Can Signal Conditions with raise 1147
Closing Files and Server Connections 1148
Debugging with Outer try Statements 1149
Running In-Process Tests 1149
More on sys.exc_info 1150
Displaying Errors and Tracebacks 1151
Exception Design Tips and Gotchas 1152
What Should Be Wrapped 1152
Catching Too Much: Avoid Empty except and Exception 1153
Catching Too Little: Use Class-Based Categories 1155
Core Language Summary 1155
The Python Toolset 1156
Development Tools for Larger Projects 1157
Chapter Summary 1160
Test Your Knowledge: Quiz 1161
Test Your Knowledge: Answers 1161
Test Your Knowledge: Part VII Exercises 1161
Part VIII. Advanced Topics
37. Unicode and Byte Strings ............................................. 1165
String Changes in 3.X 1166
String Basics 1167
Character Encoding Schemes 1167
How Python Stores Strings in Memory 1170
Python’s String Types 1171
Text and Binary Files 1173
Coding Basic Strings 1174
Python 3.X String Literals 1175
Python 2.X String Literals 1176
String Type Conversions 1177
Coding Unicode Strings 1178
Coding ASCII Text 1178
Coding Non-ASCII Text 1179
Encoding and Decoding Non-ASCII text 1180
Other Encoding Schemes 1181
Byte String Literals: Encoded Text 1183
Converting Encodings 1184
Coding Unicode Strings in Python 2.X 1185
Source File Character Set Encoding Declarations 1187
Using 3.X bytes Objects 1189
Method Calls 1189
Sequence Operations 1190
Other Ways to Make bytes Objects 1191
Mixing String Types 1192
Using 3.X/2.6+ bytearray Objects 1192
bytearrays in Action 1193
Python 3.X String Types Summary 1195
Using Text and Binary Files 1195
Text File Basics 1196
Text and Binary Modes in 2.X and 3.X 1197
Type and Content Mismatches in 3.X 1198
Using Unicode Files 1199
Reading and Writing Unicode in 3.X 1199
Handling the BOM in 3.X 1201
Unicode Files in 2.X 1204
Unicode Filenames and Streams 1205
Other String Tool Changes in 3.X 1206
The re Pattern-Matching Module 1206
The struct Binary Data Module 1207
The pickle Object Serialization Module 1209
XML Parsing Tools 1211
Chapter Summary 1215
Test Your Knowledge: Quiz 1215
Test Your Knowledge: Answers 1216
38. Managed Attributes .................................................. 1219
Why Manage Attributes? 1219
Inserting Code to Run on Attribute Access 1220
Properties 1221
The Basics 1222
A First Example 1222
Computed Attributes 1224
Coding Properties with Decorators 1224
Descriptors 1226
The Basics 1227
A First Example 1229
Computed Attributes 1231
Using State Information in Descriptors 1232
How Properties and Descriptors Relate 1236
__getattr__ and __getattribute__ 1237
The Basics 1238
A First Example 1241
Computed Attributes 1243
__getattr__ and __getattribute__ Compared 1245
Management Techniques Compared 1246
Intercepting Built-in Operation Attributes 1249
Example: Attribute Validations 1256
Using Properties to Validate 1256
Using Descriptors to Validate 1259
Using __getattr__ to Validate 1263
Using __getattribute__ to Validate 1265
Chapter Summary 1266
Test Your Knowledge: Quiz 1266
Test Your Knowledge: Answers 1267
39. Decorators .......................................................... 1269
What’s a Decorator? 1269
Managing Calls and Instances 1270
Managing Functions and Classes 1270
Using and Defining Decorators 1271
Why Decorators? 1271
The Basics 1273
Function Decorators 1273
Class Decorators 1277
Decorator Nesting 1279
Decorator Arguments 1281
Decorators Manage Functions and Classes, Too 1282
Coding Function Decorators 1283
Tracing Calls 1283
Decorator State Retention Options 1285
Class Blunders I: Decorating Methods 1289
Timing Calls 1295
Adding Decorator Arguments 1298
Coding Class Decorators 1301
Singleton Classes 1301
Tracing Object Interfaces 1303
Class Blunders II: Retaining Multiple Instances 1308
Decorators Versus Manager Functions 1309
Why Decorators? (Revisited) 1310
Managing Functions and Classes Directly 1312
Example: “Private” and “Public” Attributes 1314
Implementing Private Attributes 1314
Implementation Details I 1317
Generalizing for Public Declarations, Too 1318
Implementation Details II 1320
Open Issues 1321
Python Isn’t About Control 1329
Example: Validating Function Arguments 1330
The Goal 1330
A Basic Range-Testing Decorator for Positional Arguments 1331
Generalizing for Keywords and Defaults, Too 1333
Implementation Details 1336
Open Issues 1338
Decorator Arguments Versus Function Annotations 1340
Other Applications: Type Testing (If You Insist!) 1342
Chapter Summary 1343
Test Your Knowledge: Quiz 1344
Test Your Knowledge: Answers 1345
40. Metaclasses ......................................................... 1355
To Metaclass or Not to Metaclass 1356
Increasing Levels of “Magic” 1357
A Language of Hooks 1358
The Downside of “Helper” Functions 1359
Metaclasses Versus Class Decorators: Round 1 1361
The Metaclass Model 1364
Classes Are Instances of type 1364
Metaclasses Are Subclasses of Type 1366
Class Statement Protocol 1367
Declaring Metaclasses 1368
Declaration in 3.X 1369
Declaration in 2.X 1369
Metaclass Dispatch in Both 3.X and 2.X 1370
Coding Metaclasses 1370
A Basic Metaclass 1371
Customizing Construction and Initialization 1372
Other Metaclass Coding Techniques 1373
Inheritance and Instance 1378
Metaclass Versus Superclass 1381
Inheritance: The Full Story 1382
Metaclass Methods 1388
Metaclass Methods Versus Class Methods 1389
Operator Overloading in Metaclass Methods 1390
Example: Adding Methods to Classes 1391
Manual Augmentation 1391
Metaclass-Based Augmentation 1393
Metaclasses Versus Class Decorators: Round 2 1394
Example: Applying Decorators to Methods 1400
Tracing with Decoration Manually 1400
Tracing with Metaclasses and Decorators 1401
Applying Any Decorator to Methods 1403
Metaclasses Versus Class Decorators: Round 3 (and Last) 1404
Chapter Summary 1407
Test Your Knowledge: Quiz 1407
Test Your Knowledge: Answers 1408
41. All Good Things ...................................................... 1409
The Python Paradox 1409
On “Optional” Language Features 1410
Against Disquieting Improvements 1411
Complexity Versus Power 1412
Simplicity Versus Elitism 1412
Closing Thoughts 1413
Where to Go From Here 1414
Encore: Print Your Own Completion Certificate! 1414
Part IX. Appendixes
A. Installation and Configuration ......................................... 1421
Installing the Python Interpreter 1421
Is Python Already Present? 1421
Where to Get Python 1422
Installation Steps 1423
Configuring Python 1427
Python Environment Variables 1427
How to Set Configuration Options 1429
Python Command-Line Arguments 1432
Python 3.3 Windows Launcher Command Lines 1435
For More Help 1436
B. The Python 3.3 Windows Launcher .................................... 1437
The Unix Legacy 1437
The Windows Legacy 1438
Introducing the New Windows Launcher 1439
A Windows Launcher Tutorial 1441
Step 1: Using Version Directives in Files 1441
Step 2: Using Command-Line Version Switches 1444
Step 3: Using and Changing Defaults 1445
Pitfalls of the New Windows Launcher 1447
Pitfall 1: Unrecognized Unix #! Lines Fail 1447
Pitfall 2: The Launcher Defaults to 2.X 1448
Pitfall 3: The New PATH Extension Option 1449
Conclusions: A Net Win for Windows 1450
C. Python Changes and This Book ......................................... 1451
Major 2.X/3.X Differences 1451
3.X Differences 1452
3.X-Only Extensions 1453
General Remarks: 3.X Changes 1454
Changes in Libraries and Tools 1454
Migrating to 3.X 1455
Fifth Edition Python Changes: 2.7, 3.2, 3.3 1456
Changes in Python 2.7 1456
Changes in Python 3.3 1457
Changes in Python 3.2 1458
Fourth Edition Python Changes: 2.6, 3.0, 3.1 1458
Changes in Python 3.1 1458
Changes in Python 3.0 and 2.6 1459
Specific Language Removals in 3.0 1460
Third Edition Python Changes: 2.3, 2.4, 2.5 1462
Earlier and Later Python Changes 1463
D. Solutions to End-of-Part Exercises .................................. 1465
Part I, Getting Started 1465
Part II, Types and Operations 1467
Part III, Statements and Syntax 1473
Part IV, Functions and Generators 1475
Part V, Modules and Packages 1485
Part VI, Classes and OOP 1489
Part VII, Exceptions and Tools 1497
Index .................................................................... 1507
Preface
If you’re standing in a bookstore looking for the short story on this book, try this:
Python is a powerful multiparadigm computer programming language, optimized
for programmer productivity, code readability, and software quality.
This book provides a comprehensive and in-depth introduction to the Python language itself. Its goal is to help you master Python fundamentals before moving on
to apply them in your work. Like all its prior editions, this book is designed to serve
as a single, all-inclusive learning resource for all Python newcomers, whether they
will be using Python 2.X, Python 3.X, or both.
This edition has been brought up to date with Python releases 3.3 and 2.7, and has
been expanded substantially to reflect current practice in the Python world.
This preface describes this book’s goals, scope, and structure in more detail. It’s optional
reading, but is designed to provide some orientation before you get started with the
book at large.
This Book’s “Ecosystem”
Python is a popular open source programming language used for both standalone programs and scripting applications in a wide variety of domains. It is free, portable, powerful, and is both relatively easy and remarkably fun to use. Programmers from every
corner of the software industry have found Python’s focus on developer productivity
and software quality to be a strategic advantage in projects both large and small.
Whether you are new to programming or are a professional developer, this book is
designed to bring you up to speed on the Python language in ways that more limited
approaches cannot. After reading this book, you should know enough about Python
to apply it in whatever application domains you choose to explore.
By design, this book is a tutorial that emphasizes the core Python language itself, rather
than specific applications of it. As such, this book is intended to serve as the first in a
two-volume set:
• Learning Python, this book, teaches Python itself, focusing on language fundamentals that span domains.
• Programming Python, among others, moves on to show what you can do with Python after you’ve learned it.
This division of labor is deliberate. While application goals can vary per reader, the
need for useful language fundamentals coverage does not. Applications-focused books
such as Programming Python pick up where this book leaves off, using realistically
scaled examples to explore Python’s role in common domains such as the Web, GUIs,
systems, databases, and text. In addition, the book Python Pocket Reference provides
reference materials not included here, and it is designed to supplement this book.
Because of this book’s focus on foundations, though, it is able to present Python language fundamentals with more depth than many programmers see when first learning the language. Its bottom-up approach and self-contained didactic examples are designed to teach readers the entire language one step at a time.
The core language skills you’ll gain in the process will apply to every Python software
system you’ll encounter—be it today’s popular tools such as Django, NumPy, and App
Engine, or others that may be a part of both Python’s future and your programming
career.
Because it’s based upon a three-day Python training class with quizzes and exercises
throughout, this book also serves as a self-paced introduction to the language. Although
its format lacks the live interaction of a class, it compensates in the extra depth and
flexibility that only a book can provide. Though there are many ways to use this book,
linear readers will find it roughly equivalent to a semester-long Python class.
About This Fifth Edition
The prior fourth edition of this book, published in 2009, covered Python versions 2.6 and 3.0.¹ It addressed the many and sometimes incompatible changes introduced in the Python 3.X line in general. It also introduced a new OOP tutorial, and new chapters on advanced topics such as Unicode text, decorators, and metaclasses, derived from both the live classes I teach and evolution in Python “best practice.”

1. And 2007’s short-lived third edition covered Python 2.5, and its simpler—and shorter—single-line Python world. See http://www.rmi.net/~lutz for more on this book’s history. Over the years, this book has grown in size and complexity in direct proportion to Python’s own growth. Per Appendix C, Python 3.0 alone introduced 27 additions and 57 changes in the language that found their way into this book, and Python 3.3 continues this trend. Today’s Python programmer faces two incompatible lines, three major paradigms, a plethora of advanced tools, and a blizzard of feature redundancy—most of which do not divide neatly between the 2.X and 3.X lines. That’s not as daunting as it may sound (many tools are variations on a theme), but all are fair game in an inclusive, comprehensive Python text.

This fifth edition, completed in 2013, is a revision of the prior, updated to cover both Python 3.3 and 2.7, the current latest releases in the 3.X and 2.X lines. It incorporates all language changes introduced in each line since the prior edition was published, and has been polished throughout to update and sharpen its presentation. Specifically:
• Python 2.X coverage here has been updated to include features such as dictionary and set comprehensions that were formerly for 3.X only, but have been back-ported for use in 2.7 (illustrated after this list).
• Python 3.X coverage has been augmented for new yield and raise syntax; the __pycache__ bytecode model; 3.3 namespace packages; PyDoc’s all-browser mode; Unicode literal and storage changes; and the new Windows launcher shipped with 3.3.
• Assorted new or expanded coverage for JSON, timeit, PyPy, os.popen, generators, recursion, weak references, __mro__, __iter__, super, __slots__, metaclasses, descriptors, random, Sphinx, and more has been added, along with a general increase in 2.X compatibility in both examples and narrative.
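For example, both of the following comprehension forms now run unchanged on Python 2.7 and 3.X. This is a minimal sketch with made-up variable names, not an excerpt from the book’s examples:

    # dictionary and set comprehensions: 3.X forms back-ported to 2.7
    squares = {x: x ** 2 for x in range(4)}    # maps each x to its square
    doubles = {c * 2 for c in 'spam'}          # a set of doubled characters
    print(squares[3])                          # prints: 9
    print(sorted(doubles))                     # prints: ['aa', 'mm', 'pp', 'ss']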
This edition also adds a new conclusion as Chapter 41 (on Python’s evolution), two new appendixes (on recent Python changes and the new Windows launcher), and one new chapter (on benchmarking: an expanded version of the former code timing example). See Appendix C for a concise summary of Python changes between the prior edition and this one, as well as links to their coverage in the book. This appendix also summarizes initial differences between 2.X and 3.X in general that were first addressed in the prior edition, though some, such as new-style classes, span versions and simply become mandated in 3.X (more on what the X’s mean in a moment).
Per the last bullet in the preceding list, this edition has also experienced some growth
because it gives fuller coverage to more advanced language features—which many of us
have tried very hard to ignore as optional for the last decade, but which have now grown
more common in Python code. As we’ll see, these tools make Python more powerful,
but also raise the bar for newcomers, and may shift Python’s scope and definition.
Because you might encounter any of these, this book covers them head-on, instead of
pretending they do not exist.
Despite the updates, this edition retains most of the structure and content of the prior
edition, and is still designed to be a comprehensive learning resource for both the 2.X
and 3.X Python lines. While it is primarily focused on users of Python 3.3 and 2.7—
the latest in the 3.X line and the likely last in the 2.X line—its historical perspective
also makes it relevant to older Pythons that still see regular use today.
Though it’s impossible to predict the future, this book stresses fundamentals that have
been valid for nearly two decades, and will likely apply to future Pythons too. As usual,
I’ll be posting Python updates that impact this book at the book’s website described
ahead. The “What’s New” documents in Python’s manuals set can also serve to fill in
the gaps as Python surely evolves after this book is published.
The Python 2.X and 3.X Lines
Because it bears heavily on this book’s content, I need to say a few more words about
the Python 2.X/3.X story up front. When the fourth edition of this book was written in
2009, Python had just become available in two flavors:
• Version 3.0 was the first in the line of an emerging and incompatible mutation of the language known generically as 3.X.
• Version 2.6 retained backward compatibility with the vast body of existing Python code, and was the latest in the line known collectively as 2.X.
While 3.X was largely the same language, it ran almost no code written for prior releases. It:
• Imposed a Unicode model with broad consequences for strings, files, and libraries
• Elevated iterators and generators to a more pervasive role, as part of a fuller functional paradigm
• Mandated new-style classes, which merge with types, but grow more powerful and complex
• Changed many fundamental tools and libraries, and replaced or removed others entirely
The mutation of print from statement to function alone, aesthetically sound as it may
be, broke nearly every Python program ever written. And strategic potential aside, 3.X’s
mandatory Unicode and class models and ubiquitous generators made for a different
programming experience.
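To make the scale of the print change concrete, here is the same display operation in both lines; a minimal sketch, with the 2.X statement form shown as a comment because it is a syntax error in 3.X:

    # Python 2.X: print is a statement
    # print "spam", 2 ** 8

    # Python 3.X: print is a built-in function
    print("spam", 2 ** 8)        # displays: spam 256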
Although many viewed Python 3.X as both an improvement and the future of Python,
Python 2.X was still very widely used and was to be supported in parallel with Python
3.X for years to come. The majority of Python code in use was 2.X, and migration to
3.X seemed to be shaping up to be a slow process.
The 2.X/3.X Story Today
As this fifth edition is being written in 2013, Python has moved on to versions 3.3 and
2.7, but this 2.X/3.X story is still largely unchanged. In fact, Python is now a dual-version
world, with many users running both 2.X and 3.X according to their software goals and
dependencies. And for many newcomers, the choice between 2.X and 3.X remains one
of existing software versus the language’s cutting edge. Although many major Python
packages have been ported to 3.X, many others are still 2.X-only today.
To some observers, Python 3.X is now seen as a sandbox for exploring new ideas, while
2.X is viewed as the tried-and-true Python, which doesn’t have all of 3.X’s features but
is still more pervasive. Others still see Python 3.X as the future, a view that seems
supported by current core developer plans: Python 2.7 will continue to be supported
but is to be the last 2.X, while 3.3 is the latest in the 3.X line’s continuing evolution.
On the other hand, initiatives such as PyPy—today a still 2.X-only implementation of
Python that offers stunning performance improvements—represent a 2.X future, if not
an outright faction.
All opinions aside, almost five years after its release, 3.X has yet to supersede 2.X, or
even match its user base. As one metric, 2.X is still downloaded more often than 3.X
for Windows at python.org today, despite the fact that this measure would be naturally
skewed to new users and the most recent release. Such statistics are prone to change,
of course, but after five years are indicative of 3.X uptake nonetheless. The existing 2.X
software base still trumps 3.X’s language extensions for many. Moreover, being last in
the 2.X line makes 2.7 a sort of de facto standard, immune to the constant pace of change
in the 3.X line—a positive to those who seek a stable base, and a negative to those who
seek growth and ongoing relevance.
Personally, I think today’s Python world is large enough to accommodate both 3.X and 2.X; they seem to satisfy different goals and appeal to different camps, and there is precedent for this in other language families (C and C++, for example, have a long-standing coexistence, though they may differ more than Python 2.X and 3.X). Moreover, because they are so similar, the skills gained by learning either Python line transfer almost entirely to the other, especially if you’re aided by dual-version resources like this book. In fact, as long as you understand how they diverge, it’s often possible to write code that runs on both.
At the same time, this split presents a substantial dilemma for both programmers and
book authors, which shows no signs of abating. While it would be easier for a book to
pretend that Python 2.X never existed and cover 3.X only, this would not address the
needs of the large Python user base that exists today. A vast amount of existing code
was written for Python 2.X, and it won’t be going away anytime soon. And while some
newcomers to the language can and should focus on Python 3.X, anyone who must use
code written in the past needs to keep one foot in the Python 2.X world today. Since it
may still be years before many third-party libraries and extensions are ported to Python
3.X, this fork might not be entirely temporary.
Coverage for Both 3.X and 2.X
To address this dichotomy and to meet the needs of all potential readers, this book has been updated to cover both Python 3.3 and Python 2.7, and should apply to later releases in both the 3.X and 2.X lines. It’s intended for programmers using Python 2.X, programmers using Python 3.X, and programmers stuck somewhere between the two.
That is, you can use this book to learn either Python line. Although 3.X is often emphasized, 2.X differences and tools are also noted along the way for programmers using
older code. While the two versions are largely similar, they diverge in some important
ways, and I’ll point these out as they crop up.
For instance, I’ll use 3.X print calls in most examples, but will also describe the 2.X
print statement so you can make sense of earlier code, and will often use portable
printing techniques that run on both lines. I’ll also freely introduce new features, such
as the nonlocal statement in 3.X and the string format method available as of 2.6 and
3.0, and will point out when such extensions are not present in older Pythons.
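As a quick preview of the last of these, the format method and the original % expression produce the same text; this sketch uses explicit argument numbers in the method form, which work as far back as 2.6:

    # string formatting: method call (2.6+/3.0+) versus % expression (all Pythons)
    '{0} and {1}'.format('spam', 'eggs')    # returns 'spam and eggs'
    '%s and %s' % ('spam', 'eggs')          # same result, older style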
By proxy, this edition addresses other Python version 2.X and 3.X releases as well,
though some older version 2.X code may not be able to run all the examples here.
Although class decorators are available as of both Python 2.6 and 3.0, for example, you
cannot use them in an older Python 2.X that did not yet have this feature. Again, see
the change tables in Appendix C for summaries of recent 2.X and 3.X changes.
Which Python Should I Use?
Version choice may be mandated by your organization, but if you’re new to Python
and learning on your own, you may be wondering which version to install. The answer
here depends on your goals. Here are a few suggestions on the choice.
When to choose 3.X: new features, evolution
If you are learning Python for the first time and don’t need to use any existing 2.X
code, I encourage you to begin with Python 3.X. It cleans up some longstanding
warts in the language and trims some dated cruft, while retaining all the original
core ideas and adding some nice new tools. For example, 3.X’s seamless Unicode
model and broader use of generators and functional techniques are seen by many
users as assets. Many popular Python libraries and tools are already available for
Python 3.X, or will be by the time you read these words, especially given the continual improvements in the 3.X line. All new language evolution occurs in 3.X only,
which adds features and keeps Python relevant, but also makes language definition
a constantly moving target—a tradeoff inherent on the leading edge.
When to choose 2.X: existing code, stability
If you’ll be using a system based on Python 2.X, the 3.X line may not be an option
for you today. However, you’ll find that this book addresses your concerns, too,
and will help if you migrate to 3.X in the future. You’ll also find that you’re in large
company. Every group I taught in 2012 was using 2.X only, and I still regularly see
useful Python software in 2.X-only form. Moreover, unlike 3.X, 2.X is no longer
being changed—which is either an asset or liability, depending on whom you ask.
There’s nothing wrong with using and writing 2.X code, but you may wish to keep
tabs on 3.X and its ongoing evolution as you do. Python’s future remains to be
written, and is largely up to its users, including you.
When to choose both: version-neutral code
Probably the best news here is that Python’s fundamentals are the same in both its
lines—2.X and 3.X differ in ways that many users will find minor, and this book
is designed to help you learn both. In fact, as long as you understand their differ-
ences, it’s often straightforward to write version-neutral code that runs on both
Pythons, as we regularly will in this book. See Appendix C for pointers on 2.X/3.X
migration and tips on writing code for both Python lines and audiences.
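To make this concrete, here is a brief sketch of one common version-neutral idiom:
testing sys.version_info to select between tools whose locations differ across the lines
(the module names are real, but the example itself is mine, not one of the book's):

    # Version-neutral imports: pick the right module per Python line
    import sys
    if sys.version_info[0] >= 3:
        from urllib.request import urlopen     # 3.X location
    else:
        from urllib2 import urlopen            # 2.X location

    # A version-neutral print: one preformatted string works on both lines
    print('Running Python %s.%s' % sys.version_info[:2])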
Regardless of which version or versions you choose to focus on first, your skills will
transfer directly to wherever your Python work leads you.
About the Xs: Throughout this book, “3.X” and “2.X” are used to refer
collectively to all releases in these two lines. For instance, 3.X includes
3.0 through 3.3, and future 3.X releases; 2.X means all from 2.0 through
2.7 (and presumably no others). More specific releases are mentioned
when a topic applies only to specific releases (e.g., 2.7’s set literals and 3.3’s launcher
and namespace packages). This notation may occasionally be too broad
—some features labeled 2.X here may not be present in early 2.X releases
rarely used today—but it accommodates a 2.X line that has already
spanned 13 years. The 3.X label is more easily and accurately applied
to this younger five-year-old line.
This Book’s Prerequisites and Effort
It’s impossible to give absolute prerequisites for this book, because its utility and value
can depend as much on reader motivation as on reader background. Both true beginners
and crusty programming veterans have used this book successfully in the past. If you
are motivated to learn Python, and willing to invest the time and focus it requires, this
text will probably work for you.
Just how much time is required to learn Python? Although this will vary per learner,
this book tends to work best when actually read rather than merely consulted. Some readers may use this book as an on-
demand reference resource, but most people seeking Python mastery should expect to
spend at least weeks and probably months going through the material here, depending
on how closely they follow along with its examples. As mentioned, it’s roughly equiv-
alent to a full-semester course on the Python language itself.
That’s the estimate for learning just Python itself and the software skills required to use
it well. Though this book may suffice for basic scripting goals, readers hoping to pursue
software development at large as a career should expect to devote additional time after
this book to large-scale project experience, and possibly to follow-up texts such as
Programming Python.2
2. The standard disclaimer: I wrote this and another book mentioned earlier, which work together as a set:
Learning Python for language fundamentals, Programming Python for applications basics, and Python
Pocket Reference as a companion to the other two. All three derive from 1995’s original and broad
Programming Python. I encourage you to explore the many Python books available today (I stopped
counting at 200 at Amazon.com just now because there was no end in sight, and this didn’t include related
subjects like Django). My own publisher has recently produced Python-focused books on
instrumentation, data mining, App Engine, numeric analysis, natural language processing, MongoDB,
AWS, and more—specific domains you may wish to explore once you’ve mastered Python language
fundamentals here. The Python story today is far too rich for any one book to address alone.
That may not be welcome news to people looking for instant proficiency, but pro-
gramming is not a trivial skill (despite what you may have heard!). Today’s Python,
and software in general, are both challenging and rewarding enough to merit the effort
implied by comprehensive books such as this. Here are a few pointers on using this
book for readers on both sides of the experience spectrum:
To experienced programmers
You have an initial advantage and can move quickly through some earlier chapters;
but you shouldn’t skip the core ideas, and may need to work at letting go of some
baggage. In general terms, exposure to any programming or scripting before this
book might be helpful because of the analogies it may provide. On the other hand,
I’ve also found that prior programming experience can be a handicap due to ex-
pectations rooted in other languages (it’s far too easy to spot the Java or C++
programmers in classes by the first Python code they write!). Using Python well
requires adopting its mindset. By focusing on key core concepts, this book is de-
signed to help you learn to code Python in Python.
To true beginners
You can learn Python here too, as well as programming itself; but you may need
to work a bit harder, and may wish to supplement this text with gentler introduc-
tions. If you don’t consider yourself a programmer already, you will probably find
this book useful too, but you’ll want to be sure to proceed slowly and work through
the examples and exercises along the way. Also keep in mind that this book will
spend more time teaching Python itself than programming basics. If you find your-
self lost here, I encourage you to explore an introduction to programming in general
before tackling this book. Python’s website has links to many helpful resources for
beginners.
Formally, this book is designed to serve as a first Python text for newcomers of all
kinds. It may not be an ideal resource for someone who has never touched a computer
before (for instance, we’re not going to spend any time exploring what a computer is),
but I haven’t made many assumptions about your programming background or edu-
cation.
On the other hand, I won’t insult readers by assuming they are “dummies,” either,
whatever that means—it’s easy to do useful things in Python, and this book will show
you how. The text occasionally contrasts Python with languages such as C, C++, Java,
and others, but you can safely ignore these comparisons if you haven’t used such lan-
guages in the past.
This Book’s Structure
To help orient you, this section provides a quick rundown of the content and goals of
the major parts of this book. If you’re anxious to get to it, you should feel free to skip
this section (or browse the table of contents instead). To some readers, though, a book
this large probably merits a brief roadmap up front.
By design, each part covers a major functional area of the language, and each part is
composed of chapters focusing on a specific topic or aspect of the part’s area. In addi-
tion, each chapter ends with quizzes and their answers, and each part ends with larger
exercises, whose solutions show up in Appendix D.
Practice matters: I strongly recommend that you work through the
quizzes and exercises in this book, and work along with its examples in
general if you can. In programming, there’s no substitute for practicing
what you’ve read. Whether you do it with this book or a project of your
own, actual coding is crucial if you want the ideas presented here to
stick.
Overall, this book’s presentation is bottom-up because Python is too. The examples
and topics grow more challenging as we move along. For instance, Python’s classes are
largely just packages of functions that process built-in types. Once you’ve mastered
built-in types and functions, classes become a relatively minor intellectual leap. Because
each part builds on those preceding it this way, most readers will find a linear reading
makes the most sense. Here’s a preview of the book’s main parts you’ll find along the
way:
Part I
We begin with a general overview of Python that answers commonly asked initial
questions—why people use the language, what it’s useful for, and so on. The first
chapter introduces the major ideas underlying the technology to give you some
background context. The rest of this part moves on to explore the ways that both
Python and programmers run programs. The main goal here is to give you just
enough information to be able to follow along with later examples and exercises.
Part II
Next, we begin our tour of the Python language, studying Python’s major built-in
object types and what you can do with them in depth: numbers, lists, dictionaries,
and so on. You can get a lot done with these tools alone, and they are at the heart
of every Python script. This is the most substantial part of the book because we lay
groundwork here for later chapters. We’ll also explore dynamic typing and its
references—keys to using Python well—in this part.
Part III
The next part moves on to introduce Python’s statements—the code you type to
create and process objects in Python. It also presents Python’s general syntax
model. Although this part focuses on syntax, it also introduces some related tools
(such as the PyDoc system), takes a first look at iteration concepts, and explores
coding alternatives.
Part IV
This part begins our look at Python’s higher-level program structure tools. Func-
tions turn out to be a simple way to package code for reuse and avoid code redun-
dancy. In this part, we will explore Python’s scoping rules, argument-passing tech-
niques, the sometimes-notorious lambda, and more. We’ll also revisit iterators
from a functional programming perspective, introduce user-defined generators,
and learn how to time Python code to measure performance here.
Part V
Python modules let you organize statements and functions into larger components,
and this part illustrates how to create, use, and reload modules. We’ll also look at
some more advanced topics here, such as module packages, module reloading,
package-relative imports, 3.3’s new namespace packages, and the __name__ vari-
able.
Part VI
Here, we explore Python’s object-oriented programming tool, the class—an op-
tional but powerful way to structure code for customization and reuse, which al-
most naturally minimizes redundancy. As you’ll see, classes mostly reuse ideas we
will have covered by this point in the book, and OOP in Python is mostly about
looking up names in linked objects with a special first argument in functions. As
you’ll also see, OOP is optional in Python, but most find Python’s OOP to be much
simpler than others, and it can shave development time substantially, especially
for long-term strategic project development.
Part VII
We conclude the language fundamentals coverage in this text with a look at
Python’s exception handling model and statements, plus a brief overview of de-
velopment tools that will become more useful when you start writing larger pro-
grams (debugging and testing tools, for instance). Although exceptions are a fairly
lightweight tool, this part appears after the discussion of classes because user-de-
fined exceptions should now all be classes. We also cover some more advanced
topics, such as context managers, here.
Part VIII
In the final part, we explore some advanced topics: Unicode and byte strings,
managed attribute tools like properties and descriptors, function and class deco-
rators, and metaclasses. These chapters are all optional reading, because not all
programmers need to understand the subjects they address. On the other hand,
readers who must process internationalized text or binary data, or are responsible
for developing APIs for other programmers to use, should find something of in-
terest in this part. The examples here are also larger than most of those in this book,
and can serve as self-study material.
Part IX
The book wraps up with a set of four appendixes that give platform-specific tips
for installing and using Python on various computers; present the new Windows
launcher that ships with Python 3.3; summarize changes in Python addressed by
recent editions and give links to their coverage here; and provide solutions to the
end-of-part exercises. Solutions to end-of-chapter quizzes appear in the chapters
themselves.
See the table of contents for a finer-grained look at this book’s components.
What This Book Is Not
Given its relatively large audience over the years, some have inevitably expected this
book to serve a role outside its scope. So now that I’ve told you what this book is, I also
want to be clear on what it isn’t:
This book is a tutorial, not a reference.
This book covers the language itself, not applications, standard libraries, or third-
party tools.
This book is a comprehensive look at a substantial topic, not a watered-down
overview.
Because these points are key to this book’s content, I want to say a few more words
about them up front.
It’s Not a Reference or a Guide to Specific Applications
This book is a language tutorial, not a reference, and not an applications book. This is
by design: today’s Python—with its built-in types, generators, closures, comprehen-
sions, Unicode, decorators, and blend of procedural, object-oriented, and functional
programming paradigms—makes the core language a substantial topic all by itself, and
a prerequisite to all your future Python work, in whatever domains you pursue. When
you are ready for other resources, though, here are a few suggestions and reminders:
Reference resources
As implied by the preceding structural description, you can use the index and table
of contents to hunt for details, but there are no reference appendixes in this book.
If you are looking for Python reference resources (and most readers probably will
be very soon in their Python careers), I suggest the previously mentioned book that
I also wrote as a companion to this one—Python Pocket Reference—as well as other
reference books you’ll find with a quick search, and the standard Python reference
manuals maintained at http://www.python.org. The latter of these are free, always
up to date, and available both on the Web and on your computer after a Windows
install.
Applications and libraries
As also discussed earlier, this book is not a guide to specific applications such as
the Web, GUIs, or systems programming. By proxy, this includes the libraries and
tools used in applications work; although some standard libraries and tools are
introduced here—including timeit, shelve, pickle, struct, json, pdb, os, urllib,
re, xml, random, PyDoc, and IDLE—they are not officially in this book’s primary
scope. If you’re looking for more coverage on such topics and are already proficient
with Python, I recommend the follow-up book Programming Python, among oth-
ers. That book assumes this one as its prerequisite, though, so be sure you have a
firm grasp of the core language first. Especially in an engineering domain like soft-
ware, one must walk before one runs.
It’s Not the Short Story for People in a Hurry
As you can tell from its size, this book also doesn’t skimp on the details: it presents the
full Python language, not a brief look at a simplified subset. Along the way it also covers
software principles that are essential to writing good Python code. As mentioned, this
is a multiple-week or -month book, designed to impart the skill level you’d acquire
from a full-term class on Python.
This is also deliberate. Many of this book’s readers don’t need to acquire full-scale
software development skills, of course, and some can absorb Python in a piecemeal
fashion. At the same time, because any part of the language may be used in code you
will encounter, no part is truly optional for most programmers. Moreover, even casual
scripters and hobbyists need to know basic principles of software development in order
to code well, and even to use precoded tools properly.
This book aims to address both of these needs—language and principles—in enough
depth to be useful. In the end, though, you’ll find that Python’s more advanced tools,
such as its object-oriented and functional programming support, are relatively easy to
learn once you’ve mastered their prerequisites—and you will, if you work through this
book one chapter at a time.
It’s as Linear as Python Allows
Speaking of reading order, this edition also tries hard to minimize forward references,
but Python 3.X’s changes make this impossible in some cases (in fact, 3.X sometimes
seems to assume you already know Python while you’re learning it!). As a handful of
representative examples:
Printing, sorts, the string format method, and some dict calls rely on function
keyword arguments.
Dictionary key lists and tests, and the list calls used around many tools, imply
iteration concepts.
Using exec to run code now assumes knowledge of file objects and interfaces.
Coding new exceptions requires classes and OOP fundamentals.
And so on—even basic inheritance broaches advanced topics such as metaclasses
and descriptors.
Python is still best learned as a progression from simple to advanced, and a linear
reading here still makes the most sense. Still, some topics may require nonlinear jumps
and random lookups. To minimize these, this book will point out forward dependencies
when they occur, and will ease their impacts as much as possible.
But if your time is tight: Though depth is crucial to mastering Python,
some readers may have limited time. If you are interested in starting out
with a quick Python tour, I suggest Chapter 1, Chapter 4, Chapter 10,
and Chapter 28 (and perhaps 26)—a short survey that will hopefully
pique your interest in the more complete story told in the rest of the
book, and which most readers will need in today’s Python software
world. In general, this book is intentionally layered this way to make its
material easier to absorb—with introductions followed by details, so
you can start with overviews, and dig deeper over time. You don’t need
to read this book all at once, but its gradual approach is designed to help
you tackle its material eventually.
This Book’s Programs
In general, this book has always strived to be agnostic about both Python versions and
platforms. It’s designed to be useful to all Python users. Nevertheless, because Python
changes over time and platforms tend to differ in pragmatic ways, I need to describe
the specific systems you’ll see in action in most examples here.
Python Versions
This fifth edition of this book, and all the program examples in it, are based on Python
versions 3.3 and 2.7. In addition, many of its examples run under prior 3.X and 2.X
releases, and notes about the history of language changes in earlier versions are mixed
in along the way for users of older Pythons.
Because this text focuses on the core language, however, you can be fairly sure that
most of what it has to say won’t change very much in future releases of Python, as noted
earlier. Most of this book applies to earlier Python versions, too, except when it does
not; naturally, if you try using extensions added after a release you’re using, all bets are
off. As a rule of thumb, the latest Python is the best Python if you are able to upgrade.
Because this book focuses on the core language, most of it also applies to both Jython
and IronPython, the Java- and .NET-based Python language implementations, as well
as other Python implementations such as Stackless and PyPy (described in Chapter 2).
Such alternatives differ mostly in usage details, not language.
Platforms
The examples in this book were run on a Windows 7 and 8 ultrabook,3 though Python’s
portability makes this mostly a moot point, especially in this fundamentals-focused
book. You’ll notice a few Windows-isms—including command-line prompts, a hand-
ful of screenshots, install pointers, and an appendix on the new Windows launcher in
3.3—but this reflects the fact that most Python newcomers will probably get started
on this platform, and these can be safely ignored by users of other operating systems.
I also give a few launching details for other platforms like Linux, such as “#!” line use,
but as we’ll see in Chapter 3 and Appendix B, the 3.3 Windows launcher makes even
this a more portable technique.
Fetching This Book’s Code
Source code for the book’s examples, as well as exercise solutions, can be fetched as a
zip file from the book’s website at the following address:
http://oreil.ly/LearningPython-5E
This site includes all the code in this book as well as package usage instructions,
so I’ll defer to it for more details. Of course, the examples work best in the context of
their appearance in this book, and you’ll need some background knowledge on running
Python programs in general to make use of them. We’ll study startup details in Chap-
ter 3, so please stay tuned for information on this front.
Using This Book’s Code
The code in my Python books is designed to teach, and I’m glad when it assists readers
in that capacity. O’Reilly itself has an official policy regarding reusing the book’s ex-
amples in general, which I’ve pasted into the rest of this section for reference:
This book is here to help you get your job done. In general, you may use the code in this
book in your programs and documentation. You do not need to contact us for permission
unless you’re reproducing a significant portion of the code. For example, writing a pro-
gram that uses several chunks of code from this book does not require permission. Selling
or distributing a CD-ROM of examples from O’Reilly books does require permission.
Answering a question by citing this book and quoting example code does not require
permission. Incorporating a significant amount of example code from this book into your
product’s documentation does require permission.
We appreciate, but do not require, attribution. An attribution usually includes the title,
author, publisher, and ISBN. For example: “Learning Python, Fifth Edition, by Mark
Lutz. Copyright 2013 Mark Lutz, 978-1-4493-5573-9.”
If you feel your use of code examples falls outside fair use or the permission given above,
feel free to contact us at permissions@oreilly.com.
3. Mostly under Windows 7, but it’s irrelevant to this book. At this writing, Python installs on Windows 8
and runs in its desktop mode, which is essentially the same as Windows 7 without a Start button as I
write this (you may need to create shortcuts for former Start button menu items). Support for WinRT/
Metro “apps” is still pending. See Appendix A for more details. Frankly, the future of Windows 8 is
unclear as I type these words, so this book will be as version-neutral as possible.
Font Conventions
This book’s mechanics will make more sense once you start reading it, of course, but
as a reference, this book uses the following typographical conventions:
Italic
Used for email addresses, URLs, filenames, pathnames, and emphasizing new
terms when they are first introduced
Constant width
Used for program code, the contents of files and the output from commands, and
to designate modules, methods, statements, and system commands
Constant width bold
Used in code sections to show commands or text that would be typed by the user,
and, occasionally, to highlight portions of code
Constant width italic
Used for replaceables and some comments in code sections
Note icon
Indicates a tip, suggestion, or general note relating to the nearby text.
Warning icon
Indicates a warning or caution relating to the nearby text.
You’ll also find occasional sidebars (delimited by boxes) and footnotes (at page end)
throughout, which are often optional reading, but provide additional context on the
topics being presented. The “Why You Will Care” sidebars (such as “Why You Will
Care: Slices” on page 204), for example, give use cases for the subjects being explored.
Book Updates and Resources
Improvements happen (and so do mis^H^H^H typos). Updates, supplements, and cor-
rections (a.k.a. errata) for this book will be maintained on the Web, and may be sug-
gested at either the publisher’s website or by email. Here are the main coordinates:
Publisher’s site: http://oreil.ly/LearningPython-5E
This site will maintain this edition’s official list of book errata, and chronicle spe-
cific patches applied to the text in reprints. It’s also the official site for the book’s
examples as described earlier.
Author’s site: http://www.rmi.net/~lutz/about-lp5e.html
This site will be used to post more general updates related to this text or Python
itself—a hedge against future changes, which should be considered a sort of virtual
appendix to this book.
My publisher also has an email address for comments and technical questions about
this book:
bookquestions@oreilly.com
For more information about my publisher’s books, conferences, Resource Centers, and
the O’Reilly Network, see its general website:
http://www.oreilly.com
For more on my books, see my own book support site:
http://rmi.net/~lutz
Also be sure to search the Web if any of the preceding links become invalid over time;
if I could become more clairvoyant, I would, but the Web changes faster than published
books.
Acknowledgments
As I write this fifth edition of this book in 2013, it’s difficult to not be somewhat ret-
rospective. I have now been using and promoting Python for 21 years, writing books
about it for 18, and teaching live classes on it for 16. Despite the passage of time, I’m
still regularly amazed at how successful Python has been—in ways that most of us could
not possibly have imagined in the early 1990s. So at the risk of sounding like a hopelessly
self-absorbed author, I hope you’ll pardon a few closing words of history and gratitude
here.
The Backstory
My own Python history predates both Python 1.0 and the Web (and goes back to a
time when an install meant fetching email messages, concatenating, decoding, and
hoping it all somehow worked). When I first discovered Python as a frustrated C++
software developer in 1992, I had no idea what an impact it would have on the next
two decades of my life. Two years after writing the first edition of Programming
Python in 1995 for Python 1.3, I began traveling around the country and world teaching
Python to beginners and experts. Since finishing the first edition of Learning Python in
1999, I’ve been an independent Python trainer and writer, thanks in part to Python’s
phenomenal growth in popularity.
Here’s the damage so far. I’ve now written 13 Python books (5 editions of this one, and
4 each of two others), which have together sold some 400,000 units by my data. I’ve also been teach-
ing Python for over a decade and a half; have taught some 260 Python training sessions
in the U.S., Europe, Canada, and Mexico; and have met roughly 4,000 students along
the way. Besides propelling me toward frequent flyer utopia, these classes helped me
refine this text and my other Python books. Teaching honed the books, and vice versa,
with the net result that my books closely parallel what happens in my classes, and can
serve as a viable alternative to them.
As for Python itself, in recent years it has grown to become one of the top 5 to 10 most
widely used programming languages in the world (depending on which source you cite
and when you cite it). Because we’ll be exploring Python’s status in the first chapter of
this book, I’ll defer the rest of this story until then.
Python Thanks
Because teaching teaches teachers to teach, this book owes much to my live classes. I’d
like to thank all the students who have participated in my courses during the last 16
years. Along with changes in Python itself, your feedback played a major role in shaping
this text; there’s nothing quite as instructive as watching 4,000 people repeat the same
beginner mistakes live and in person! This book’s recent editions owe their training-
based changes primarily to recent classes, though every class held since 1997 has in
some way helped refine this book. I’d like to thank clients who hosted classes in Dublin,
Mexico City, Barcelona, London, Edmonton, and Puerto Rico; such experiences have
been one of my career’s most lasting rewards.
Because writing teaches writers to write, this book also owes much to its audience. I
want to thank the countless readers who took time to offer suggestions over the last 18
years, both online and in person. Your feedback has also been vital to this book’s evo-
lution and a substantial factor in its success, a benefit that seems inherent in the open
source world. Reader comments have run the gamut from “You should be banned from
writing books” to “God bless you for writing this book”; if consensus is possible in
such matters it probably lies somewhere between these two, though to borrow a line
from Tolkien: the book is still too short.
I’d also like to express my gratitude to everyone who played a part in this book’s
production. To all those who have helped make this book a solid product over the years
—including its editors, formatters, marketers, technical reviewers, and more. And to
O’Reilly for giving me a chance to work on 13 book projects; it’s been net fun (and only
feels a little like the movie Groundhog Day).
Additional thanks is due to the entire Python community; like most open source sys-
tems, Python is the product of many unsung efforts. It’s been my privilege to watch
Python grow from a new kid on the scripting languages block to a widely used tool,
deployed in some fashion by almost every organization writing software. Technical
disagreements aside, that’s been an exciting endeavor to be a part of.
I also want to thank my original editor at O’Reilly, the late Frank Willison. This book
was largely Frank’s idea. He had a profound impact on both my career and the success
of Python when it was new, a legacy that I remember each time I’m tempted to misuse
the word “only.”
Personal Thanks
Finally, a few more personal notes of thanks. To the late Carl Sagan, for inspiring an
18-year-old kid from Wisconsin. To my Mother, for courage. To my siblings, for the
truths to be found in museum peanuts. To the book The Shallows, for a much-needed
wakeup call.
To my son Michael and daughters Samantha and Roxanne, for who you are. I’m not
quite sure when you grew up, but I’m proud of how you did, and look forward to seeing
where life takes you next.
And to my wife Vera, for patience, proofing, Diet Cokes, and pretzels. I’m glad I finally
found you. I don’t know what the next 50 years hold, but I do know that I hope to
spend all of them holding you.
—Mark Lutz, Amongst the Larch, Spring 2013
PART I
Getting Started
CHAPTER 1
A Python Q&A Session
If you’ve bought this book, you may already know what Python is and why it’s an
important tool to learn. If you don’t, you probably won’t be sold on Python until you’ve
learned the language by reading the rest of this book and have done a project or two.
But before we jump into details, this first chapter of this book will briefly introduce
some of the main reasons behind Python’s popularity. To begin sculpting a definition
of Python, this chapter takes the form of a question-and-answer session, which poses
some of the most common questions asked by beginners.
Why Do People Use Python?
Because there are many programming languages available today, this is the usual first
question of newcomers. Given that there are roughly 1 million Python users out there
at the moment, there really is no way to answer this question with complete accuracy;
the choice of development tools is sometimes based on unique constraints or personal
preference.
But after teaching Python to roughly 260 groups and over 4,000 students during the
last 16 years, I have seen some common themes emerge. The primary factors cited by
Python users seem to be these:
Software quality
For many, Python’s focus on readability, coherence, and software quality in general
sets it apart from other tools in the scripting world. Python code is designed to be
readable, and hence reusable and maintainable—much more so than traditional
scripting languages. The uniformity of Python code makes it easy to understand,
even if you did not write it. In addition, Python has deep support for more advanced
software reuse mechanisms, such as object-oriented (OO) and functional program-
ming.
Developer productivity
Python boosts developer productivity many times beyond compiled or statically
typed languages such as C, C++, and Java. Python code is typically one-third to
one-fifth the size of equivalent C++ or Java code. That means there is less to type,
less to debug, and less to maintain after the fact. Python programs also run imme-
diately, without the lengthy compile and link steps required by some other tools,
further boosting programmer speed.
Program portability
Most Python programs run unchanged on all major computer platforms. Porting
Python code between Linux and Windows, for example, is usually just a matter of
copying a script’s code between machines. Moreover, Python offers multiple op-
tions for coding portable graphical user interfaces, database access programs, web-
based systems, and more. Even operating system interfaces, including program
launches and directory processing, are as portable in Python as they can possibly
be.
Support libraries
Python comes with a large collection of prebuilt and portable functionality, known
as the standard library. This library supports an array of application-level pro-
gramming tasks, from text pattern matching to network scripting. In addition,
Python can be extended with both homegrown libraries and a vast collection of
third-party application support software. Python’s third-party domain offers tools
for website construction, numeric programming, serial port access, game devel-
opment, and much more (see ahead for a sampling). The NumPy extension, for
instance, has been described as a free and more powerful equivalent to the Matlab
numeric programming system.
Component integration
Python scripts can easily communicate with other parts of an application, using a
variety of integration mechanisms. Such integrations allow Python to be used as a
product customization and extension tool. Today, Python code can invoke C and
C++ libraries, can be called from C and C++ programs, can integrate with Java
and .NET components, can communicate over frameworks such as COM and Sil-
verlight, can interface with devices over serial ports, and can interact over networks
with interfaces like SOAP, XML-RPC, and CORBA. It is not a standalone tool.
Enjoyment
Because of Python’s ease of use and built-in toolset, it can make the act of pro-
gramming more pleasure than chore. Although this may be an intangible benefit,
its effect on productivity is an important asset.
Of these factors, the first two (quality and productivity) are probably the most com-
pelling benefits to most Python users, and merit a fuller description.
Software Quality
By design, Python implements a deliberately simple and readable syntax and a highly
coherent programming model. As a slogan at a past Python conference attests, the net
result is that Python seems to “fit your brain”—that is, features of the language interact
in consistent and limited ways and follow naturally from a small set of core concepts.
This makes the language easier to learn, understand, and remember. In practice, Python
programmers do not need to constantly refer to manuals when reading or writing code;
it’s a consistently designed system that many find yields surprisingly uniform code.
By philosophy, Python adopts a somewhat minimalist approach. This means that al-
though there are usually multiple ways to accomplish a coding task, there is usually
just one obvious way, a few less obvious alternatives, and a small set of coherent in-
teractions everywhere in the language. Moreover, Python doesn’t make arbitrary deci-
sions for you; when interactions are ambiguous, explicit intervention is preferred over
“magic.” In the Python way of thinking, explicit is better than implicit, and simple is
better than complex.1
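You can display these principles for yourself with the interactive-prompt “Easter egg”
described in the footnote referenced above; a quick sketch, with most of its output
omitted here:

    >>> import this
    The Zen of Python, by Tim Peters

    Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    ...more principles omitted...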
Beyond such design themes, Python includes tools such as modules and OOP that
naturally promote code reusability. And because Python is focused on quality, so too,
naturally, are Python programmers.
Developer Productivity
During the great Internet boom of the mid-to-late 1990s, it was difficult to find enough
programmers to implement software projects; developers were asked to implement
systems as fast as the Internet evolved. In later eras of layoffs and economic recession,
the picture shifted. Programming staffs were often asked to accomplish the same tasks
with even fewer people.
In both of these scenarios, Python has shined as a tool that allows programmers to get
more done with less effort. It is deliberately optimized for speed of development—its
simple syntax, dynamic typing, lack of compile steps, and built-in toolset allow pro-
grammers to develop programs in a fraction of the time needed when using some other
tools. The net effect is that Python typically boosts developer productivity many times
beyond the levels supported by traditional languages. That’s good news in both boom
and bust times, and everywhere the software industry goes in between.
Is Python a “Scripting Language”?
Python is a general-purpose programming language that is often applied in scripting
roles. It is commonly defined as an object-oriented scripting language—a definition that
blends support for OOP with an overall orientation toward scripting roles. If pressed
for a one-liner, I’d say that Python is probably better known as a general-purpose
programming language that blends procedural, functional, and object-oriented
paradigms, a statement that captures the richness and scope of today’s Python.
1. For a more complete look at the Python philosophy, type the command import this at any Python
interactive prompt (you’ll see how in Chapter 3). This invokes an “Easter egg” hidden in Python—a
collection of design principles underlying Python that permeate both the language and its user
community. Among them, the acronym EIBTI is now fashionable jargon for the “explicit is better than
implicit” rule. These principles are not religion, but are close enough to qualify as a Python motto and
creed, which we’ll be quoting from often in this book.
Still, the term “scripting” seems to have stuck to Python like glue, perhaps as a contrast
with the larger programming effort required by some other tools. For example, people often
use the word “script” instead of “program” to describe a Python code file. In keeping
with this tradition, this book uses the terms “script” and “program” interchangeably,
with a slight preference for “script” to describe a simpler top-level file and “program”
to refer to a more sophisticated multifile application.
Because the term “scripting language” has so many different meanings to different
observers, though, some would prefer that it not be applied to Python at all. In fact,
people tend to make three very different associations, some of which are more useful
than others, when they hear Python labeled as such:
Shell tools
Sometimes when people hear Python described as a scripting language, they think
it means that Python is a tool for coding operating-system-oriented scripts. Such
programs are often launched from console command lines and perform tasks such
as processing text files and launching other programs.
Python programs can and do serve such roles, but this is just one of dozens of
common Python application domains. It is not just a better shell-script language.
Control language
To others, scripting refers to a “glue” layer used to control and direct (i.e., script)
other application components. Python programs are indeed often deployed in the
context of larger applications. For instance, to test hardware devices, Python pro-
grams may call out to components that give low-level access to a device. Similarly,
programs may run bits of Python code at strategic points to support end-user
product customization without the need to ship and recompile the entire system’s
source code.
Python’s simplicity makes it a naturally flexible control tool. Technically, though,
this is also just a common Python role; many (perhaps most) Python programmers
code standalone scripts without ever using or knowing about any integrated com-
ponents. It is not just a control language.
Ease of use
Probably the best way to think of the term “scripting language” is that it refers to
a simple language used for quickly coding tasks. This is especially true when the
term is applied to Python, which allows much faster program development than
compiled languages like C++. Its rapid development cycle fosters an exploratory,
incremental mode of programming that has to be experienced to be appreciated.
Don’t be fooled, though—Python is not just for simple tasks. Rather, it makes tasks
simple by its ease of use and flexibility. Python has a simple feature set, but it allows
programs to scale up in sophistication as needed. Because of that, it is commonly
used for quick tactical tasks and longer-term strategic development.
So, is Python a scripting language or not? It depends on whom you ask. In general, the
term “scripting” is probably best used to describe the rapid and flexible mode of de-
velopment that Python supports, rather than a particular application domain.
OK, but What’s the Downside?
After using it for 21 years, writing about it for 18, and teaching it for 16, I’ve found that
the only significant universal downside to Python is that, as currently implemented, its
execution speed may not always be as fast as that of fully compiled and lower-level
languages such as C and C++. Though relatively rare today, for some tasks you may
still occasionally need to get “closer to the iron” by using lower-level languages such
as these that are more directly mapped to the underlying hardware architecture.
We’ll talk about implementation concepts in detail later in this book. In short, the
standard implementations of Python today compile (i.e., translate) source code state-
ments to an intermediate format known as byte code and then interpret the byte code.
Byte code provides portability, as it is a platform-independent format. However, be-
cause Python is not normally compiled all the way down to binary machine code (e.g.,
instructions for an Intel chip), some programs will run more slowly in Python than in
a fully compiled language like C. The PyPy system discussed in the next chapter can
achieve a 10X to 100X speedup on some code by compiling further as your program
runs, but it’s a separate, alternative implementation.
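Byte code is normally invisible, but the curious can inspect it with the standard library’s
dis module. Here is a sketch of my own; the exact instructions and offsets vary per
Python version, so treat this output as illustrative only:

    >>> import dis
    >>> def add(a, b):
    ...     return a + b
    ...
    >>> dis.dis(add)                   # disassemble the function's byte code
      2           0 LOAD_FAST                0 (a)
                  3 LOAD_FAST                1 (b)
                  6 BINARY_ADD
                  7 RETURN_VALUE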
Whether you will ever care about the execution speed difference depends on what kinds
of programs you write. Python has been optimized numerous times, and Python code
runs fast enough by itself in most application domains. Furthermore, whenever you do
something “real” in a Python script, like processing a file or constructing a graphical
user interface (GUI), your program will actually run at C speed, since such tasks are
immediately dispatched to compiled C code inside the Python interpreter. More fun-
damentally, Python’s speed-of-development gain is often far more important than any
speed-of-execution loss, especially given modern computer speeds.
Even at today’s CPU speeds, though, there still are some domains that do require op-
timal execution speeds. Numeric programming and animation, for example, often need
at least their core number-crunching components to run at C speed (or better). If you
work in such a domain, you can still use Python—simply split off the parts of the
application that require optimal speed into compiled extensions, and link those into
your system for use in Python scripts.
We won’t talk about extensions much in this text, but this is really just an instance of
the Python-as-control-language role we discussed earlier. A prime example of this dual
language strategy is the NumPy numeric programming extension for Python; by com-
bining compiled and optimized numeric extension libraries with the Python language,
NumPy turns Python into a numeric programming tool that is simultaneously efficient
and easy to use. When needed, such extensions provide a powerful optimization tool.
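For a hypothetical taste of this dual-language payoff, consider the following tiny sketch
(NumPy is a separate third-party install, and this example is mine, not NumPy’s own):
the math over a million elements runs inside NumPy’s compiled C code, not in a Python
loop:

    import numpy as np                 # third-party numeric extension
    a = np.arange(1000000)             # an array of a million integers
    b = a * 2 + 1                      # elementwise math, run at C speed
    print(b[:5])                       # prints: [1 3 5 7 9]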
Other Python Tradeoffs: The Intangible Bits
I mentioned that execution speed is the only major downside to Python. That’s indeed
the case for most Python users, and especially for newcomers. Most people find Python
to be easy to learn and fun to use, especially when compared with its contemporaries
like Java, C#, and C++. In the interest of full disclosure, though, I should also note up
front some more abstract tradeoffs I’ve observed in my two decades in the Python world
—both as an educator and developer.
As an educator, I’ve sometimes found the rate of change in Python and its libraries to
be a negative, and have on occasion lamented its growth over the years. This is partly
because trainers and book authors live on the front lines of such things—it’s been my
job to teach the language despite its constant change, a task at times akin to chronicling
the herding of cats! Still, it’s a broadly shared concern. As we’ll see in this book, Python’s
original “keep it simple” motif is today often subsumed by a trend toward more so-
phisticated solutions at the expense of the learning curve of newcomers. This book’s
size is indirect evidence of this trend.
On the other hand, by most measures Python is still much simpler than its alternatives,
and perhaps only as complex as it needs to be given the many roles it serves today. Its
overall coherence and open nature remain compelling features to most. Moreover, not
everyone needs to stay up to date with the cutting edge—as Python 2.X’s ongoing
popularity clearly shows.
As a developer, I also at times question the tradeoffs inherent in Python’s “batteries
included” approach to development. Its emphasis on prebuilt tools can add dependen-
cies (what if a battery you use is changed, broken, or deprecated?), and encourage
special-case solutions over general principles that may serve users better in the long run
(how can you evaluate or use a tool well if you don’t understand its purpose?). We’ll
see examples of both of these concerns in this book.
For typical users, and especially for hobbyists and beginners, Python’s toolset approach
is a major asset. But you shouldn’t be surprised when you outgrow precoded tools, and
can benefit from the sorts of skills this book aims to impart. Or, to paraphrase a proverb:
give people a tool, and they’ll code for a day; teach them how to build tools, and they’ll
code for a lifetime. This book’s job is more the latter than the former.
As mentioned elsewhere in this chapter, both Python and its toolbox model are also
susceptible to downsides common to open source projects in general—the potential
triumph of the personal preference of the few over common usage of the many, and the
occasional appearance of anarchy and even elitism—though these tend to be most
grievous on the leading edge of new releases.
We’ll return to some of these tradeoffs at the end of the book, after you’ve learned
Python well enough to draw your own conclusions. As an open source system, what
Python “is” is up to its users to define. In the end, Python is more popular today than
ever, and its growth shows no signs of abating. To some, that may be a more telling
metric than individual opinions, both pro and con.
Who Uses Python Today?
At this writing, the best estimate anyone can seem to make of the size of the Python
user base is that there are roughly 1 million Python users around the world today (plus
or minus a few). This estimate is based on various statistics, like download rates, web
statistics, and developer surveys. Because Python is open source, a more exact count is
difficult—there are no license registrations to tally. Moreover, Python is automatically
included with Linux distributions, Macintosh computers, and a wide range of products
and hardware, further clouding the user-base picture.
In general, though, Python enjoys a large user base and a very active developer com-
munity. It is generally considered to be in the top 5 or top 10 most widely used pro-
gramming languages in the world today (its exact ranking varies per source and date).
Because Python has been around for over two decades and has been widely used, it is
also very stable and robust.
Besides being leveraged by individual users, Python is also being applied in real revenue-
generating products by real companies. For instance, among the generally known
Python user base:
Google makes extensive use of Python in its web search systems.
The popular YouTube video sharing service is largely written in Python.
The Dropbox storage service codes both its server and desktop client software pri-
marily in Python.
The Raspberry Pi single-board computer promotes Python as its educational lan-
guage.
EVE Online, a massively multiplayer online game (MMOG) by CCP Games, uses
Python broadly.
The widespread BitTorrent peer-to-peer file sharing system began its life as a
Python program.
Industrial Light & Magic, Pixar, and others use Python in the production of ani-
mated movies.
ESRI uses Python as an end-user customization tool for its popular GIS mapping
products.
Google’s App Engine web development framework uses Python as an application
language.
The IronPort email server product uses more than 1 million lines of Python code
to do its job.
Maya, a powerful integrated 3D modeling and animation system, provides a
Python scripting API.
The NSA uses Python for cryptography and intelligence analysis.
iRobot uses Python to develop commercial and military robotic devices.
The Civilization IV game’s customizable scripted events are written entirely in
Python.
The One Laptop Per Child (OLPC) project built its user interface and activity model
in Python.
Netflix and Yelp have both documented the role of Python in their software infra-
structures.
Intel, Cisco, Hewlett-Packard, Seagate, Qualcomm, and IBM use Python for hard-
ware testing.
JPMorgan Chase, UBS, Getco, and Citadel apply Python to financial market fore-
casting.
NASA, Los Alamos, Fermilab, JPL, and others use Python for scientific program-
ming tasks.
And so on—though this list is representative, a full accounting is beyond this book’s
scope, and is almost guaranteed to change over time. For an up-to-date sampling of
additional Python users, applications, and software, try the following pages currently
at Python’s site and Wikipedia, as well as a search in your favorite web browser:
Success stories: http://www.python.org/about/success
Application domains: http://www.python.org/about/apps
User quotes: http://www.python.org/about/quotes
Wikipedia page: http://en.wikipedia.org/wiki/List_of_Python_software
Probably the only common thread among the companies using Python today is that
Python is used all over the map, in terms of application domains. Its general-purpose
nature makes it applicable to almost all fields, not just one. In fact, it’s safe to say that
virtually every substantial organization writing software is using Python, whether for
short-term tactical tasks, such as testing and administration, or for long-term strategic
product development. Python has proven to work well in both modes.
What Can I Do with Python?
In addition to being a well-designed programming language, Python is useful for ac-
complishing real-world tasks—the sorts of things developers do day in and day out.
It’s commonly used in a variety of domains, as a tool for scripting other components
and implementing standalone programs. In fact, as a general-purpose language,
Python’s roles are virtually unlimited: you can use it for everything from website de-
velopment and gaming to robotics and spacecraft control.
However, the most common Python roles currently seem to fall into a few broad cat-
egories. The next few sections describe some of Python’s most common applications
today, as well as tools used in each domain. We won’t be able to explore the tools
mentioned here in any depth—if you are interested in any of these topics, see the Python
website or other resources for more details.
Systems Programming
Python’s built-in interfaces to operating-system services make it ideal for writing
portable, maintainable system-administration tools and utilities (sometimes called shell
tools). Python programs can search files and directory trees, launch other programs, do
parallel processing with processes and threads, and so on.
Python’s standard library comes with POSIX bindings and support for all the usual OS
tools: environment variables, files, sockets, pipes, processes, multiple threads, regular
expression pattern matching, command-line arguments, standard stream interfaces,
shell-command launchers, filename expansion, zip file utilities, XML and JSON pars-
ers, CSV file handlers, and more. In addition, the bulk of Python’s system interfaces
are designed to be portable; for example, a script that copies directory trees typically
runs unchanged on all major Python platforms. The Stackless Python implementation,
described in Chapter 2 and used by EVE Online, also offers advanced solutions to
multiprocessing requirements.
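As a small taste of this domain, here is a shell-tool sketch of my own using the standard
os module; it walks a directory tree to list Python source files, and runs unchanged on
Windows, Linux, and other platforms:

    # List all Python source files at and below the current directory
    import os
    for dirpath, dirnames, filenames in os.walk('.'):
        for name in filenames:
            if name.endswith('.py'):
                print(os.path.join(dirpath, name))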
GUIs
Python’s simplicity and rapid turnaround also make it a good match for graphical user
interface programming on the desktop. Python comes with a standard object-oriented
interface to the Tk GUI API called tkinter (Tkinter in 2.X) that allows Python programs
to implement portable GUIs with a native look and feel. Python/tkinter GUIs run un-
changed on Microsoft Windows, X Windows (on Unix and Linux), and the Mac OS
(both Classic and OS X). A free extension package, PMW, adds advanced widgets to
the tkinter toolkit. In addition, the wxPython GUI API, based on a C++ library, offers
an alternative toolkit for constructing portable GUIs in Python.
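For reference, a complete tkinter program can be remarkably small. This minimal sketch
of my own pops up a window containing a single label (under 2.X, change the import to
the module’s older Tkinter name):

    # A tiny portable GUI: one window, one label widget
    import tkinter                     # named Tkinter in 2.X
    root = tkinter.Tk()                # create the main window
    tkinter.Label(root, text='Hello GUI world!').pack()
    root.mainloop()                    # display window, run event loop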
Higher-level toolkits such as Dabo are built on top of base APIs such as wxPython and
tkinter. With the proper library, you can also use GUI support in other toolkits in
Python, such as Qt with PyQt, GTK with PyGTK, MFC with PyWin32, .NET with
IronPython, and Swing with Jython (the Java version of Python, described in Chap-
ter 2) or JPype. For applications that run in web browsers or have simple interface
requirements, both Jython and Python web frameworks and server-side CGI scripts,
described in the next section, provide additional user interface options.
Internet Scripting
Python comes with standard Internet modules that allow Python programs to perform
a wide variety of networking tasks, in client and server modes. Scripts can communicate
over sockets; extract form information sent to server-side CGI scripts; transfer files by
FTP; parse and generate XML and JSON documents; send, receive, compose, and parse
email; fetch web pages by URLs; parse the HTML of fetched web pages; communicate
over XML-RPC, SOAP, and Telnet; and more. Python’s libraries make these tasks re-
markably simple.
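For instance, fetching a web page takes just a few lines of standard library code. This
sketch of my own uses the 3.X module layout; in 2.X, the equivalent tool lives in urllib2:

    # Download a page over HTTP and show the start of its content
    from urllib.request import urlopen
    reply = urlopen('http://www.python.org')    # open a connection
    html = reply.read()                         # read the reply as bytes
    print(html[:60])                            # first 60 bytes of HTML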
In addition, a large collection of third-party tools are available on the Web for doing
Internet programming in Python. For instance, the HTMLGen system generates HTML
files from Python class-based descriptions, the mod_python package runs Python effi-
ciently within the Apache web server and supports server-side templating with its
Python Server Pages, and the Jython system provides for seamless Python/Java inte-
gration and supports coding of server-side applets that run on clients.
In addition, full-blown web development framework packages for Python, such as
Django, TurboGears, web2py, Pylons, Zope, and WebWare, support quick construction
of full-featured and production-quality websites with Python. Many of these include
features such as object-relational mappers, a Model/View/Controller architecture,
server-side scripting and templating, and AJAX support, to provide complete and en-
terprise-level web development solutions.
More recently, Python has expanded into rich Internet applications (RIAs), with tools
such as Silverlight in IronPython, and pyjs (a.k.a. pyjamas) and its Python-to-JavaScript
compiler, AJAX framework, and widget set. Python also has moved into cloud com-
puting, with App Engine, and others described in the database section ahead. Where
the Web leads, Python quickly follows.
Component Integration
We discussed the component integration role earlier when describing Python as a con-
trol language. Python’s ability to be extended by and embedded in C and C++ systems
makes it useful as a flexible glue language for scripting the behavior of other systems
and components. For instance, integrating a C library into Python enables Python to
test and launch the library’s components, and embedding Python in a product enables
onsite customizations to be coded without having to recompile the entire product (or
ship its source code at all).
Tools such as the SWIG and SIP code generators can automate much of the work
needed to link compiled components into Python for use in scripts, and the Cython
system allows coders to mix Python and C-like code. Larger frameworks, such as
Python’s COM support on Windows, the Jython Java-based implementation, and the
IronPython .NET-based implementation provide alternative ways to script compo-
nents. On Windows, for example, Python scripts can use frameworks to script Word
and Excel, access Silverlight, and much more.
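Even without writing any glue code of your own, the standard library's ctypes module
illustrates the basic idea: it lets Python scripts call functions in already-compiled C
libraries directly. The following is a minimal sketch that assumes a Unix-like system
where ctypes can locate the standard C math library:
    from ctypes import CDLL, c_double
    from ctypes.util import find_library

    libm = CDLL(find_library('m'))      # load the compiled C math library
    libm.sqrt.restype = c_double        # declare the C function's result type
    libm.sqrt.argtypes = [c_double]     # and its argument types
    print(libm.sqrt(2.0))               # call C code from Python: 1.4142135...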
Database Programming
For traditional database demands, there are Python interfaces to all commonly used
relational database systems—Sybase, Oracle, Informix, ODBC, MySQL, PostgreSQL,
SQLite, and more. The Python world has also defined a portable database API for ac-
cessing SQL database systems from Python scripts, which looks the same on a variety
of underlying database systems. For instance, because the vendor interfaces implement
the portable API, a script written to work with the free MySQL system will work largely
unchanged on other systems (such as Oracle); all you generally have to do is replace
the underlying vendor interface. The in-process SQLite embedded SQL database engine
is a standard part of Python itself since 2.5, supporting both prototyping and basic
program storage needs.
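As a brief sketch of the portable API's flavor, the following uses the standard library's
sqlite3 module; scripts written to other conforming vendor interfaces look largely the
same, apart from their connection calls:
    import sqlite3

    conn = sqlite3.connect(':memory:')     # a throwaway in-memory database
    curs = conn.cursor()
    curs.execute('create table people (name, job)')
    curs.execute("insert into people values ('Bob', 'dev')")
    curs.execute('select * from people')
    print(curs.fetchall())                 # displays [('Bob', 'dev')]
    conn.close()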
In the non-SQL department, Python’s standard pickle module provides a simple object
persistence system—it allows programs to easily save and restore entire Python objects
to files and file-like objects. On the Web, you’ll also find third-party open source sys-
tems named ZODB and Durus that provide complete object-oriented database systems
for Python scripts; others, such as SQLObject and SQLAlchemy, that implement object
relational mappers (ORMs), which graft Python’s class model onto relational tables;
and PyMongo, an interface to MongoDB, a high-performance, non-SQL, open source
JSON-style document database, which stores data in structures very similar to Python’s
own lists and dictionaries, and whose text may be parsed and created with Python’s
own standard library json module.
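Pickling requires no schemas or conversion logic of any kind. As a quick sketch, the
following stores a dictionary in a file and later restores an equal object from it:
    import pickle

    record = {'name': 'Bob', 'jobs': ['dev', 'mgr'], 'age': 40.5}
    with open('record.pkl', 'wb') as file:    # pickle files are binary in 3.X
        pickle.dump(record, file)             # serialize the object to the file

    with open('record.pkl', 'rb') as file:
        copy = pickle.load(file)              # re-create an equal object
    print(copy == record)                     # prints True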
Still other systems offer more specialized ways to store data, including the datastore in
Google’s App Engine, which models data with Python classes and provides extensive
scalability, as well as additional emerging cloud storage options such as Azure, Pi-
Cloud, OpenStack, and Stackato.
Rapid Prototyping
To Python programs, components written in Python and C look the same. Because of
this, it’s possible to prototype systems in Python initially, and then move selected com-
ponents to a compiled language such as C or C++ for delivery. Unlike some prototyping
tools, Python doesn’t require a complete rewrite once the prototype has solidified. Parts
of the system that don’t require the efficiency of a language such as C++ can remain
coded in Python for ease of maintenance and use.
Numeric and Scientific Programming
Python is also heavily used in numeric programming—a domain that would not tra-
ditionally have been considered to be in the scope of scripting languages, but has grown
to become one of Python’s most compelling use cases. Prominent here, the NumPy
high-performance numeric programming extension for Python mentioned earlier in-
cludes such advanced tools as an array object, interfaces to standard mathematical
libraries, and much more. By integrating Python with numeric routines coded in a
compiled language for speed, NumPy turns Python into a sophisticated yet easy-to-use
numeric programming tool that can often replace existing code written in traditional
compiled languages such as FORTRAN or C++.
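For instance, assuming NumPy has been installed, whole-array expressions like those in
the following sketch replace explicit Python loops, and are executed by fast compiled
code:
    import numpy as np              # third party: not part of Python itself

    a = np.arange(1000000)          # an array of one million integers
    b = a * 2 + 1                   # one vectorized expression: no Python-level loop
    print(b[:4])                    # displays [1 3 5 7]
    print(b.mean())                 # aggregate operations are built in too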
Additional numeric tools for Python support animation, 3D visualization, parallel pro-
cessing, and so on. The popular SciPy and ScientificPython extensions, for example,
provide additional libraries of scientific programming tools and use NumPy as a core
component. The PyPy implementation of Python (discussed in Chapter 2) has also
gained traction in the numeric domain, in part because heavily algorithmic code of the
sort that’s common in this domain can run dramatically faster in PyPy—often 10X to
100X quicker.
And More: Gaming, Images, Data Mining, Robots, Excel...
Python is commonly applied in more domains than can be covered here. For example,
you’ll find tools that allow you to use Python to do:
• Game programming and multimedia with pygame, cgkit, pyglet, PySoy, Panda3D,
  and others
• Serial port communication on Windows, Linux, and more with the PySerial extension
• Image processing with PIL and its newer Pillow fork, PyOpenGL, Blender, Maya,
  and more
• Robot control programming with the PyRo toolkit
• Natural language analysis with the NLTK package
• Instrumentation on the Raspberry Pi and Arduino boards
• Mobile computing with ports of Python to the Google Android and Apple iOS platforms
• Excel spreadsheet function and macro programming with the PyXLL or DataNitro
  add-ins
• Media file content and metadata tag processing with PyMedia, ID3, PIL/Pillow,
  and more
• Artificial intelligence with the PyBrain neural net library and the Milk machine
  learning toolkit
• Expert system programming with PyCLIPS, Pyke, Pyrolog, and pyDatalog
• Network monitoring with zenoss, written in and customized with Python
• Python-scripted design and modeling with PythonCAD, PythonOCC, FreeCAD,
  and others
• Document processing and generation with ReportLab, Sphinx, Cheetah, PyPDF,
  and so on
• Data visualization with Mayavi, matplotlib, VTK, VPython, and more
• XML parsing with the xml library package, the xmlrpclib module, and third-party
  extensions
• JSON and CSV file processing with the json and csv modules
• Data mining with the Orange framework, the Pattern bundle, Scrapy, and custom
  code
You can even play solitaire with the PySolFC program. And of course, you can always
code custom Python scripts in less buzzword-laden domains to perform day-to-day
system administration, process your email, manage your document and media libraries,
and so on. You'll find links to support for many fields at the PyPI website, and via
web searches (search Google or http://www.python.org for links).
Though of broad practical use, many of these specific domains are largely just instances
of Python’s component integration role in action again. Adding it as a frontend to
libraries of components written in a compiled language such as C makes Python useful
for scripting in a wide variety of domains. As a general-purpose language that supports
integration, Python is widely applicable.
How Is Python Developed and Supported?
As a popular open source system, Python enjoys a large and active development com-
munity that responds to issues and develops enhancements with a speed that many
commercial software developers might find remarkable. Python developers coordinate
work online with a source-control system. Changes are developed per a formal proto-
col, which includes writing a PEP (Python Enhancement Proposal) or other document,
and extensions to Python’s regression testing system. In fact, modifying Python today
is roughly as involved as changing commercial software—a far cry from Python’s early
days, when an email to its creator would suffice, but a good thing given its large user
base today.
The PSF (Python Software Foundation), a formal nonprofit group, organizes confer-
ences and deals with intellectual property issues. Numerous Python conferences are
held around the world; O’Reilly’s OSCON and the PSF’s PyCon are the largest. The
former of these addresses multiple open source projects, and the latter is a Python-only
event that has experienced strong growth in recent years. PyCon 2012 and 2013 reached
2,500 attendees each; in fact, PyCon 2013 had to cap attendance at this level after a surprise
sell-out in 2012 (and managed to grab wide attention on both technical and nontech-
nical grounds that I won’t chronicle here). Earlier years often saw attendance double
—from 586 attendees in 2007 to over 1,000 in 2008, for example—indicative of
Python’s growth in general, and impressive to those who remember early conferences
whose attendees could largely be served around a single restaurant table.
Open Source Tradeoffs
Having said that, it’s important to note that while Python enjoys a vigorous develop-
ment community, this comes with inherent tradeoffs. Open source software can also
appear chaotic and even resemble anarchy at times, and may not always be as smoothly
implemented as the prior paragraphs might imply. Some changes may still manage to
defy official protocols, and as in all human endeavors, mistakes still happen despite the
process controls (Python 3.2.0, for instance, came with a broken console input function
on Windows).
Moreover, open source projects exchange commercial interests for the personal pref-
erences of a current set of developers, which may or may not be the same as yours—
you are not held hostage by a company, but you are at the mercy of those with spare
time to change the system. The net effect is that open source software evolution is often
driven by the few, but imposed on the many.
In practice, though, these tradeoffs impact those on the “bleeding” edge of new releases
much more than those using established versions of the system, including prior releases
in both Python 3.X and 2.X. If you kept using classic classes in Python 2.X, for example,
you were largely immune to the explosion of class functionality and change in new-style
classes that occurred in the early-to-mid 2000s. Though these become mandatory in
3.X (along with much more), many 2.X users today still happily skirt the issue.
What Are Python’s Technical Strengths?
Naturally, this is a developer’s question. If you don’t already have a programming
background, the language in the next few sections may be a bit baffling—don’t worry,
we’ll explore all of these terms in more detail as we proceed through this book. For
developers, though, here is a quick introduction to some of Python’s top technical
features.
It’s Object-Oriented and Functional
Python is an object-oriented language, from the ground up. Its class model supports
advanced notions such as polymorphism, operator overloading, and multiple inheri-
tance; yet, in the context of Python’s simple syntax and typing, OOP is remarkably easy
to apply. In fact, if you don’t understand these terms, you’ll find they are much easier
to learn with Python than with just about any other OOP language available.
Besides serving as a powerful code structuring and reuse device, Python’s OOP nature
makes it ideal as a scripting tool for other object-oriented systems languages. For ex-
ample, with the appropriate glue code, Python programs can subclass (specialize)
classes implemented in C++, Java, and C#.
Of equal significance, OOP is an option in Python; you can go far without having to
become an object guru all at once. Much like C++, Python supports both procedural
and object-oriented programming modes. Its object-oriented tools can be applied if
and when constraints allow. This is especially useful in tactical development modes,
which preclude design phases.
In addition to its original procedural (statement-based) and object-oriented (class-
based) paradigms, Python in recent years has acquired built-in support for functional
programming—a set that by most measures includes generators, comprehensions, clo-
sures, maps, decorators, anonymous function lambdas, and first-class function objects.
These can serve as both complement and alternative to its OOP tools.
It’s Free
Python is completely free to use and distribute. As with other open source software,
such as Tcl, Perl, Linux, and Apache, you can fetch the entire Python system’s source
code for free on the Internet. There are no restrictions on copying it, embedding it in
your systems, or shipping it with your products. In fact, you can even sell Python’s
source code, if you are so inclined.
But don’t get the wrong idea: “free” doesn’t mean “unsupported.” On the contrary,
the Python online community responds to user queries with a speed that most com-
mercial software help desks would do well to try to emulate. Moreover, because Python
comes with complete source code, it empowers developers, leading to the creation of
a large team of implementation experts. Although studying or changing a programming
language’s implementation isn’t everyone’s idea of fun, it’s comforting to know that
you can do so if you need to. You’re not dependent on the whims of a commercial
vendor, because the ultimate documentation—source code—is at your disposal as a
last resort.
As mentioned earlier, Python development is performed by a community that largely
coordinates its efforts over the Internet. It consists of Python’s original creator—Guido
van Rossum, the officially anointed Benevolent Dictator for Life (BDFL) of Python—
plus a supporting cast of thousands. Language changes must follow a formal enhance-
ment procedure and be scrutinized by both other developers and the BDFL. This tends
to make Python more conservative with changes than some other languages and sys-
tems. While the Python 3.X/2.X split broke with this tradition soundly and deliberately,
it still holds generally true within each Python line.
It’s Portable
The standard implementation of Python is written in portable ANSI C, and it compiles
and runs on virtually every major platform currently in use. For example, Python pro-
grams run today on everything from PDAs to supercomputers. As a partial list, Python
is available on:
• Linux and Unix systems
• Microsoft Windows (all modern flavors)
• Mac OS (both OS X and Classic)
• BeOS, OS/2, VMS, and QNX
• Real-time systems such as VxWorks
• Cray supercomputers and IBM mainframes
• PDAs running Palm OS, PocketPC, and Linux
• Cell phones running Symbian OS and Windows Mobile
• Gaming consoles and iPods
• Tablets and smartphones running Google’s Android and Apple’s iOS
• And more
Like the language interpreter itself, the standard library modules that ship with Python
are implemented to be as portable across platform boundaries as possible. Further,
Python programs are automatically compiled to portable byte code, which runs the
same on any platform with a compatible version of Python installed (more on this in
the next chapter).
What that means is that Python programs using the core language and standard libraries
run the same on Linux, Windows, and most other systems with a Python interpreter.
Most Python ports also contain platform-specific extensions (e.g., COM support on
Windows), but the core Python language and libraries work the same everywhere. As
mentioned earlier, Python also includes an interface to the Tk GUI toolkit called tkinter
(Tkinter in 2.X), which allows Python programs to implement full-featured graphical
user interfaces that run on all major GUI desktop platforms without program changes.
It’s Powerful
From a features perspective, Python is something of a hybrid. Its toolset places it be-
tween traditional scripting languages (such as Tcl, Scheme, and Perl) and systems de-
velopment languages (such as C, C++, and Java). Python provides all the simplicity
and ease of use of a scripting language, along with more advanced software-engineering
tools typically found in compiled languages. Unlike some scripting languages, this
combination makes Python useful for large-scale development projects. As a preview,
here are some of the main things you’ll find in Python’s toolbox:
Dynamic typing
Python keeps track of the kinds of objects your program uses when it runs; it
doesn’t require complicated type and size declarations in your code. In fact, as
you’ll see in Chapter 6, there is no such thing as a type or variable declaration
anywhere in Python. Because Python code does not constrain data types, it is also
usually automatically applicable to a whole range of objects.
Automatic memory management
Python automatically allocates objects and reclaims (“garbage collects”) them
when they are no longer used, and most can grow and shrink on demand. As you’ll
learn, Python keeps track of low-level memory details so you don’t have to.
Programming-in-the-large support
For building larger systems, Python includes tools such as modules, classes, and
exceptions. These tools allow you to organize systems into components, use OOP
to reuse and customize code, and handle events and errors gracefully. Python’s
functional programming tools, described earlier, provide additional ways to meet
many of the same goals.
Built-in object types
Python provides commonly used data structures such as lists, dictionaries, and
strings as intrinsic parts of the language; as you’ll see, they’re both flexible and easy
to use. For instance, built-in objects can grow and shrink on demand, can be ar-
bitrarily nested to represent complex information, and more.
Built-in tools
To process all those object types, Python comes with powerful and standard op-
erations, including concatenation (joining collections), slicing (extracting sec-
tions), sorting, mapping, and more.
Library utilities
For more specific tasks, Python also comes with a large collection of precoded
library tools that support everything from regular expression matching to net-
working. Once you learn the language itself, Python’s library tools are where much
of the application-level action occurs.
Third-party utilities
Because Python is open source, developers are encouraged to contribute precoded
tools that support tasks beyond those supported by its built-ins; on the Web, you’ll
find free support for COM, imaging, numeric programming, XML, database ac-
cess, and much more.
Despite the array of tools in Python, it retains a remarkably simple syntax and design.
The result is a powerful programming tool with all the usability of a scripting language.
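As a quick preview of some of these tools in action at Python's interactive prompt (a
mode introduced in Chapter 3), notice that no type or size declarations appear anywhere
in the following session:
    >>> x = [1, 'two', 3.0]                # a built-in list of mixed types
    >>> x.append({'four': 4})              # built-in objects grow on demand
    >>> x
    [1, 'two', 3.0, {'four': 4}]
    >>> x[1:3]                             # slicing extracts sections
    ['two', 3.0]
    >>> sorted(['spam', 'eggs', 'ham'])    # built-in tools such as sorting
    ['eggs', 'ham', 'spam']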
It’s Mixable
Python programs can easily be “glued” to components written in other languages in a
variety of ways. For example, Python’s C API lets C programs call and be called by
Python programs flexibly. That means you can add functionality to the Python system
as needed, and use Python programs within other environments or systems.
Mixing Python with libraries coded in languages such as C or C++, for instance, makes
it an easy-to-use frontend language and customization tool. As mentioned earlier, this
also makes Python good at rapid prototyping—systems may be implemented in Python
first, to leverage its speed of development, and later moved to C for delivery, one piece
at a time, according to performance demands.
It’s Relatively Easy to Use
Compared to alternatives like C++, Java, and C#, Python programming seems aston-
ishingly simple to most observers. To run a Python program, you simply type it and
run it. There are no intermediate compile and link steps, like there are for languages
such as C or C++. Python executes programs immediately, which makes for an inter-
active programming experience and rapid turnaround after program changes—in many
cases, you can witness the effect of a program change nearly as fast as you can type it.
Of course, development cycle turnaround is only one aspect of Python’s ease of use. It
also provides a deliberately simple syntax and powerful built-in tools. In fact, some
have gone so far as to call Python executable pseudocode. Because it eliminates much
of the complexity in other tools, Python programs are simpler, smaller, and more flex-
ible than equivalent programs in other popular languages.
It’s Relatively Easy to Learn
This brings us to the point of this book: especially when compared to other widely used
programming languages, the core Python language is remarkably easy to learn. In fact,
if you’re an experienced programmer, you can expect to be coding small-scale Python
programs in a matter of days, and may be able to pick up some limited portions of the
language in just hours—though you shouldn’t expect to become an expert quite that
fast (despite what you may have heard from marketing departments!).
Naturally, mastering any topic as substantial as today’s Python is not trivial, and we’ll
devote the rest of this book to this task. But the true investment required to master
Python is worthwhile—in the end, you’ll gain programming skills that apply to nearly
every computer application domain. Moreover, most find Python’s learning curve to
be much gentler than that of other programming tools.
That’s good news for professional developers seeking to learn the language to use on
the job, as well as for end users of systems that expose a Python layer for customization
or control. Today, many systems rely on the fact that end users can learn enough Python
to tailor their Python customization code onsite, with little or no support. Moreover,
Python has spawned a large group of users who program for fun instead of career, and
may never need full-scale software development skills. Although Python does have
advanced programming tools, its core language essentials will still seem relatively sim-
ple to beginners and gurus alike.
It’s Named After Monty Python
OK, this isn’t quite a technical strength, but it does seem to be a surprisingly well-kept
secret in the Python world that I wish to expose up front. Despite all the reptiles on
Python books and icons, the truth is that Python is named after the British comedy
group Monty Python—makers of the 1970s BBC comedy series Monty Python’s Flying
Circus and a handful of later full-length films, including Monty Python and the Holy
Grail, that are still widely popular today. Python’s original creator was a fan of Monty
Python, as are many software developers (indeed, there seems to be a sort of symmetry
between the two fields...).
This legacy inevitably adds a humorous quality to Python code examples. For instance,
the traditional “foo” and “bar” for generic variable names become “spam” and “eggs”
in the Python world. The occasional “Brian,” “ni,” and “shrubbery” likewise owe their
appearances to this namesake. It even impacts the Python community at large: some
events at Python conferences are regularly billed as “The Spanish Inquisition.”
All of this is, of course, very funny if you are familiar with the shows, but less so other-
wise. You don’t need to be familiar with Monty Python’s work to make sense of ex-
amples that borrow references from it, including many you will see in this book, but at
least you now know their root. (Hey—I’ve warned you.)
How Does Python Stack Up to Language X?
Finally, to place it in the context of what you may already know, people sometimes
compare Python to languages such as Perl, Tcl, and Java. This section summarizes
common consensus in this department.
I want to note up front that I’m not a fan of winning by disparaging the competition—
it doesn’t work in the long run, and that’s not the goal here. Moreover, this is not a
zero sum game—most programmers will use many languages over their careers. Nev-
ertheless, programming tools present choices and tradeoffs that merit consideration.
After all, if Python didn’t offer something over its alternatives, it would never have been
used in the first place.
We talked about performance tradeoffs earlier, so here we’ll focus on functionality.
While other languages are also useful tools to know and use, many people find that
Python:
• Is more powerful than Tcl. Python’s strong support for “programming in the large”
  makes it applicable to the development of larger systems, and its library of
  application tools is broader.
• Is more readable than Perl. Python has a clear syntax and a simple, coherent design.
  This in turn makes Python more reusable and maintainable, and helps reduce
  program bugs.
• Is simpler and easier to use than Java and C#. Python is a scripting language, but
  Java and C# both inherit much of the complexity and syntax of larger OOP systems
  languages like C++.
• Is simpler and easier to use than C++. Python code is simpler than the equivalent
  C++ and often one-third to one-fifth as large, though as a scripting language,
  Python sometimes serves different roles.
• Is simpler and higher-level than C. Python’s detachment from underlying hardware
  architecture makes code less complex, better structured, and more approachable
  than C, C++’s progenitor.
• Is more powerful, general-purpose, and cross-platform than Visual Basic. Python
  is a richer language that is used more widely, and its open source nature means it
  is not controlled by a single company.
• Is more readable and general-purpose than PHP. Python is used to construct websites
  too, but it is also applied to nearly every other computer domain, from robotics
  to movie animation and gaming.
• Is more powerful and general-purpose than JavaScript. Python has a larger toolset,
  and is not as tightly bound to web development. It’s also used for scientific
  modeling, instrumentation, and more.
• Is more readable and established than Ruby. Python syntax is less cluttered,
  especially in nontrivial code, and its OOP is fully optional for users and projects
  to which it may not apply.
• Is more mature and broadly focused than Lua. Python’s larger feature set and more
  extensive library support give it a wider scope than Lua, an embedded “glue”
  language like Tcl.
• Is less esoteric than Smalltalk, Lisp, and Prolog. Python has the dynamic flavor of
  languages like these, but also has a traditional syntax accessible to both developers
  and end users of customizable systems.
Especially for programs that do more than scan text files, and that might have to be
read in the future by others (or by you!), many people find that Python fits the bill better
than any other scripting or programming language available today. Furthermore, unless
your application requires peak performance, Python is often a viable alternative to
systems development languages such as C, C++, and Java: Python code can often ach-
ieve the same goals, but will be much less difficult to write, debug, and maintain.
Of course, your author has been a card-carrying Python evangelist since 1992, so take
these comments as you may (and other languages’ advocates’ mileage may vary arbi-
trarily). They do, however, reflect the common experience of many developers who
have taken time to explore what Python has to offer.
Chapter Summary
And that concludes the “hype” portion of this book. In this chapter, we’ve explored
some of the reasons that people pick Python for their programming tasks. We’ve also
seen how it is applied and looked at a representative sample of who is using it today.
My goal is to teach Python, though, not to sell it. The best way to judge a language is
to see it in action, so the rest of this book focuses entirely on the language details we’ve
glossed over here.
The next two chapters begin our technical introduction to the language. In them, we’ll
explore ways to run Python programs, peek at Python’s byte code execution model,
and introduce the basics of module files for saving code. The goal will be to give you
just enough information to run the examples and exercises in the rest of the book. You
won’t really start programming per se until Chapter 4, but make sure you have a handle
on the startup details before moving on.
Test Your Knowledge: Quiz
In this edition of the book, we will be closing each chapter with a quick open-book
quiz about the material presented herein to help you review the key concepts. The
answers for these quizzes appear immediately after the questions, and you are encour-
aged to read the answers once you’ve taken a crack at the questions yourself, as they
sometimes give useful context.
In addition to these end-of-chapter quizzes, you’ll find lab exercises at the end of each
part of the book, designed to help you start coding Python on your own. For now,
here’s your first quiz. Good luck, and be sure to refer back to this chapter’s material as
needed.
1. What are the six main reasons that people choose to use Python?
2. Name four notable companies or organizations using Python today.
3. Why might you not want to use Python in an application?
4. What can you do with Python?
5. What’s the significance of the Python import this statement?
6. Why does “spam” show up in so many Python examples in books and on the Web?
7. What is your favorite color?
Test Your Knowledge: Answers
How did you do? Here are the answers I came up with, though there may be multiple
solutions to some quiz questions. Again, even if you’re sure of your answer, I encourage
you to look at mine for additional context. See the chapter’s text for more details if any
of these responses don’t make sense to you.
1. Software quality, developer productivity, program portability, support libraries,
component integration, and simple enjoyment. Of these, the quality and produc-
tivity themes seem to be the main reasons that people choose to use Python.
2. Google, Industrial Light & Magic, CCP Games, Jet Propulsion Labs, Maya, ESRI,
and many more. Almost every organization doing software development uses
Python in some fashion, whether for long-term strategic product development or
for short-term tactical tasks such as testing and system administration.
3. Python’s main downside is performance: it won’t run as quickly as fully compiled
languages like C and C++. On the other hand, it’s quick enough for most appli-
cations, and typical Python code runs at close to C speed anyhow because it invokes
linked-in C code in the interpreter. If speed is critical, compiled extensions are
available for number-crunching parts of an application.
4. You can use Python for nearly anything you can do with a computer, from website
development and gaming to robotics and spacecraft control.
5. This was mentioned in a footnote: import this triggers an Easter egg inside Python
that displays some of the design philosophies underlying the language. You’ll learn
how to run this statement in the next chapter.
6. “Spam” is a reference from a famous Monty Python skit in which people trying to
order food in a cafeteria are drowned out by a chorus of Vikings singing about
spam. Oh, and it’s also a common variable name in Python scripts...
7. Blue. No, yellow! (See the prior answer.)
Python Is Engineering, Not Art
When Python first emerged on the software scene in the early 1990s, it spawned what
is now something of a classic conflict between its proponents and those of another
popular scripting language, Perl. Personally, I think the debate is tired and unwarranted
today—developers are smart enough to draw their own conclusions. Still, this is one
of the most common topics I’m asked about on the training road, and underscores one
of the main reasons people choose to use Python; it seems fitting to say a few brief
words about it here.
The short story is this: you can do everything in Python that you can in Perl, but you can
read your code after you do it. That’s it—their domains largely overlap, but Python is
more focused on producing readable code. For many, the enhanced readability of
Python translates to better code reusability and maintainability, making Python a better
choice for programs that will not be written once and thrown away. Perl code is easy
to write, but can be difficult to read. Given that most software has a lifespan much
longer than its initial creation, many see Python as the more effective tool.
The somewhat longer story reflects the backgrounds of the designers of the two lan-
guages. Python originated with a mathematician by training, who seems to have natu-
rally produced an orthogonal language with a high degree of uniformity and coherence.
Perl was spawned by a linguist, who created a programming tool closer to natural
language, with its context sensitivities and wide variability. As a well-known Perl motto
states, there’s more than one way to do it. Given this mindset, both the Perl language
and its user community have historically encouraged untethered freedom of expression
when writing code. One person’s Perl code can be radically different from another’s.
In fact, writing unique, tricky code is often a source of pride among Perl users.
But as anyone who has done any substantial code maintenance should be able to attest,
freedom of expression is great for art, but lousy for engineering. In engineering, we need
a minimal feature set and predictability. In engineering, freedom of expression can lead
to maintenance nightmares. As more than one Perl user has confided to me, the result
of too much freedom is often code that is much easier to rewrite from scratch than to
modify. This is clearly less than ideal.
Consider this: when people create a painting or a sculpture, they do so largely for
themselves; the prospect of someone else changing their work later doesn’t enter into
it. This is a critical difference between art and engineering. When people write soft-
ware, they are not writing it for themselves. In fact, they are not even writing primarily
for the computer. Rather, good programmers know that code is written for the next
human being who has to read it in order to maintain or reuse it. If that person cannot
understand the code, it’s all but useless in a realistic development scenario. In other
words, programming is not about being clever and obscure—it’s about how clearly your
program communicates its purpose.
This readability focus is where many people find that Python most clearly differentiates
itself from other scripting languages. Because Python’s syntax model almost forces the
creation of readable code, Python programs lend themselves more directly to the full
software development cycle. And because Python emphasizes ideas such as limited
interactions, code uniformity, and feature consistency, it more directly fosters code that
can be used long after it is first written.
In the long run, Python’s focus on code quality in itself boosts programmer productivity,
as well as programmer satisfaction. Python programmers can be wildly creative, too,
of course, and as we’ll see, the language does offer multiple solutions for some tasks—
sometimes even more than it should today, an issue we’ll confront head-on in this book
too. In fact, this sidebar can also be read as a cautionary tale: quality turns out to be a
fragile state, one that depends as much on people as on technology. Python has histor-
ically encouraged good engineering in ways that other scripting languages often did
not, but the rest of the quality story is up to you.
At least, that’s some of the common consensus among many people who have adopted
Python. You should judge such claims for yourself, of course, by learning what Python
has to offer. To help you get started, let’s move on to the next chapter.
CHAPTER 2
How Python Runs Programs
This chapter and the next take a quick look at program execution—how you launch
code, and how Python runs it. In this chapter, we’ll study how the Python interpreter
executes programs in general. Chapter 3 will then show you how to get your own
programs up and running.
Startup details are inherently platform-specific, and some of the material in these two
chapters may not apply to the platform you work on, so more advanced readers should
feel free to skip parts not relevant to their intended use. Likewise, readers who have
used similar tools in the past and prefer to get to the meat of the language quickly may
want to file some of these chapters away as “for future reference.” For the rest of us,
let’s take a brief look at the way that Python will run our code, before we learn how to
write it.
Introducing the Python Interpreter
So far, I’ve mostly been talking about Python as a programming language. But, as cur-
rently implemented, it’s also a software package called an interpreter. An interpreter is
a kind of program that executes other programs. When you write a Python program,
the Python interpreter reads your program and carries out the instructions it contains.
In effect, the interpreter is a layer of software logic between your code and the computer
hardware on your machine.
When the Python package is installed on your machine, it generates a number of com-
ponents—minimally, an interpreter and a support library. Depending on how you use
it, the Python interpreter may take the form of an executable program, or a set of
libraries linked into another program. Depending on which flavor of Python you run,
the interpreter itself may be implemented as a C program, a set of Java classes, or
something else. Whatever form it takes, the Python code you write must always be run
by this interpreter. And to enable that, you must install a Python interpreter on your
computer.
Python installation details vary by platform and are covered in more depth in Appen-
dix A. In short:
• Windows users fetch and run a self-installing executable file that puts Python on
  their machines. Simply double-click and say Yes or Next at all prompts.
• Linux and Mac OS X users probably already have a usable Python preinstalled on
  their computers—it’s a standard component on these platforms today.
• Some Linux and Mac OS X users (and most Unix users) compile Python from its
  full source code distribution package.
• Linux users can also find RPM files, and Mac OS X users can find various Mac-specific
  installation packages.
• Other platforms have installation techniques relevant to those platforms. For
  instance, Python is available on cell phones, tablets, game consoles, and iPods,
  but installation details vary widely.
Python itself may be fetched from the downloads page on its main website, http://www
.python.org. It may also be found through various other distribution channels. Keep in
mind that you should always check to see whether Python is already present before
installing it. If you’re working on Windows 7 and earlier, you’ll usually find Python in
the Start menu, as captured in Figure 2-1; we’ll discuss the menu options shown here
in the next chapter. On Unix and Linux, Python probably lives in your /usr directory
tree.
Because installation details are so platform-specific, we’ll postpone the rest of this story
here. For more details on the installation process, consult Appendix A. For the purposes
of this chapter and the next, I’ll assume that you’ve got Python ready to go.
Program Execution
What it means to write and run a Python script depends on whether you look at these
tasks as a programmer, or as a Python interpreter. Both views offer important perspec-
tives on Python programming.
The Programmer’s View
In its simplest form, a Python program is just a text file containing Python statements.
For example, the following file, named script0.py, is one of the simplest Python scripts
I could dream up, but it passes for a fully functional Python program:
print('hello world')
print(2 ** 100)
This file contains two Python print statements, which simply print a string (the text in
quotes) and a numeric expression result (2 to the power 100) to the output stream.
Don’t worry about the syntax of this code yet—for this chapter, we’re interested only
in getting it to run. I’ll explain the print statement, and why you can raise 2 to the
power 100 in Python without overflowing, in the next parts of this book.
You can create such a file of statements with any text editor you like. By convention,
Python program files are given names that end in .py; technically, this naming scheme
is required only for files that are “imported”—a term clarified in the next chapter—but
most Python files have .py names for consistency.
After you’ve typed these statements into a text file, you must tell Python to execute the
file—which simply means to run all the statements in the file from top to bottom, one
after another. As you’ll see in the next chapter, you can launch Python program files
by shell command lines, by clicking their icons, from within IDEs, and with other
standard techniques. If all goes well, when you execute the file, you’ll see the results of
the two print statements show up somewhere on your computer—by default, usually
in the same window you were in when you ran the program:
hello world
1267650600228229401496703205376

[Figure 2-1. When installed on Windows 7 and earlier, this is how Python shows up in your Start
button menu. This can vary across releases, but IDLE starts a development GUI, and Python starts
a simple interactive session. Also here are the standard manuals and the PyDoc documentation
engine (Module Docs). See Chapter 3 and Appendix A for pointers on Windows 8 and other platforms.]
For example, here’s what happened when I ran this script from a Command Prompt
window’s command line on a Windows laptop, to make sure it didn’t have any silly
typos:
C:\code> python script0.py
hello world
1267650600228229401496703205376
See Chapter 3 for the full story on this process, especially if you’re new to programming;
we’ll get into all the gory details of writing and launching programs there. For our
purposes here, we’ve just run a Python script that prints a string and a number. We
probably won’t win any programming awards with this code, but it’s enough to capture
the basics of program execution.
Python’s View
The brief description in the prior section is fairly standard for scripting languages, and
it’s usually all that most Python programmers need to know. You type code into text
files, and you run those files through the interpreter. Under the hood, though, a bit
more happens when you tell Python to “go.” Although knowledge of Python internals
is not strictly required for Python programming, a basic understanding of the runtime
structure of Python can help you grasp the bigger picture of program execution.
When you instruct Python to run your script, there are a few steps that Python carries
out before your code actually starts crunching away. Specifically, it’s first compiled to
something called “byte code” and then routed to something called a “virtual machine.”
Byte code compilation
Internally, and almost completely hidden from you, when you execute a program
Python first compiles your source code (the statements in your file) into a format known
as byte code. Compilation is simply a translation step, and byte code is a lower-level,
platform-independent representation of your source code. Roughly, Python translates
each of your source statements into a group of byte code instructions by decomposing
them into individual steps. This byte code translation is performed to speed execution
—byte code can be run much more quickly than the original source code statements
in your text file.
You’ll notice that the prior paragraph said that this is almost completely hidden from
you. If the Python process has write access on your machine, it will store the byte code
of your programs in files that end with a .pyc extension (“.pyc” means compiled “.py”
source). Prior to Python 3.2, you will see these files show up on your computer after
you’ve run a few programs alongside the corresponding source code files—that is, in
the same directories. For instance, you’ll notice a script.pyc after importing a script.py.
In 3.2 and later, Python instead saves its .pyc byte code files in a subdirectory named
__pycache__ located in the directory where your source files reside, and in files whose
names identify the Python version that created them (e.g., script.cpython-33.pyc). The
new __pycache__ subdirectory helps to avoid clutter, and the new naming convention
for byte code files prevents different Python versions installed on the same computer
from overwriting each other’s saved byte code. We’ll study these byte code file models
in more detail in Chapter 22, though they are automatic and irrelevant to most Python
programs, and are free to vary among the alternative Python implementations described
ahead.
In both models, Python saves byte code like this as a startup speed optimization. The
next time you run your program, Python will load the .pyc files and skip the compilation
step, as long as you haven’t changed your source code since the byte code was last
saved, and aren’t running with a different Python than the one that created the byte
code. It works like this:
• Source changes: Python automatically checks the last-modified timestamps of
  source and byte code files to know when it must recompile—if you edit and resave
  your source code, byte code is automatically re-created the next time your program
  is run.
• Python versions: Imports also check to see if the file must be recompiled because
  it was created by a different Python version, using either a “magic” version number
  in the byte code file itself in 3.2 and earlier, or the information present in byte code
  filenames in 3.2 and later.
The result is that both source code changes and differing Python version numbers will
trigger a new byte code file. If Python cannot write the byte code files to your machine,
your program still works—the byte code is generated in memory and simply discarded
on program exit. However, because .pyc files speed startup time, you’ll want to make
sure they are written for larger programs. Byte code files are also one way to ship Python
programs—Python is happy to run a program if all it can find are .pyc files, even if the
original .py source files are absent. (See “Frozen Binaries” on page 39 for another
shipping option.)
Finally, keep in mind that byte code is saved in files only for files that are imported, not
for the top-level files of a program that are only run as scripts (strictly speaking, it’s an
import optimization). We’ll explore import basics in Chapter 3, and take a deeper look
at imports in Part V. Moreover, a given file is only imported (and possibly compiled)
once per program run, and byte code is also never saved for code typed at the interactive
prompt—a programming mode we’ll learn about in Chapter 3.
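You can watch this model at work with the script0.py file shown earlier. Assuming
Python 3.3 and write access to the source directory, importing the file both runs its
code and saves its byte code as a side effect; the version tag in the filename will vary
with your Python, and your __pycache__ may list other files too:
    C:\code> python
    >>> import script0               # first import compiles and saves byte code
    hello world
    1267650600228229401496703205376
    >>> import os
    >>> os.listdir('__pycache__')    # the byte code file lives here in 3.2 and later
    ['script0.cpython-33.pyc']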
The Python Virtual Machine (PVM)
Once your program has been compiled to byte code (or the byte code has been loaded
from existing .pyc files), it is shipped off for execution to something generally known
as the Python Virtual Machine (PVM, for the more acronym-inclined among you). The
PVM sounds more impressive than it is; really, it’s not a separate program, and it need
not be installed by itself. In fact, the PVM is just a big code loop that iterates through
your byte code instructions, one by one, to carry out their operations. The PVM is the
runtime engine of Python; it’s always present as part of the Python system, and it’s the
component that truly runs your scripts. Technically, it’s just the last step of what is
called the “Python interpreter.”
Figure 2-2 illustrates the runtime structure described here. Keep in mind that all of this
complexity is deliberately hidden from Python programmers. Byte code compilation is
automatic, and the PVM is just part of the Python system that you have installed on
your machine. Again, programmers simply code and run files of statements, and Python
handles the logistics of running them.
Performance implications
Readers with a background in fully compiled languages such as C and C++ might notice
a few differences in the Python model. For one thing, there is usually no build or “make”
step in Python work: code runs immediately after it is written. For another, Python byte
code is not binary machine code (e.g., instructions for an Intel or ARM chip). Byte code
is a Python-specific representation.
This is why some Python code may not run as fast as C or C++ code, as described in
Chapter 1—the PVM loop, not the CPU chip, still must interpret the byte code, and
byte code instructions require more work than CPU instructions. On the other hand,
unlike in classic interpreters, there is still an internal compile step—Python does not
need to reanalyze and reparse each source statement’s text repeatedly. The net effect
is that pure Python code runs at speeds somewhere between those of a traditional
compiled language and a traditional interpreted language. See Chapter 1 for more on
Python performance tradeoffs.
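If you're curious about what byte code actually looks like, the standard library's dis
module displays it. The following sketch disassembles a trivial function under CPython
3.3; the exact instructions vary per Python version and implementation:
    >>> import dis
    >>> def add(a, b): return a + b
    ...
    >>> dis.dis(add)
      1           0 LOAD_FAST                0 (a)
                  3 LOAD_FAST                1 (b)
                  6 BINARY_ADD
                  7 RETURN_VALUE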
Development implications
Another ramification of Python’s execution model is that there is really no distinction
between the development and execution environments. That is, the systems that com-
pile and execute your source code are really one and the same. This similarity may have
a bit more significance to readers with a background in traditional compiled languages,
but in Python, the compiler is always present at runtime and is part of the system that
runs programs.

[Figure 2-2. Python’s traditional runtime execution model: source code you type is translated to
byte code, which is then run by the Python Virtual Machine. Your code is automatically compiled,
but then it is interpreted.]
This makes for a much more rapid development cycle. There is no need to precompile
and link before execution may begin; simply type and run the code. This also adds a
much more dynamic flavor to the language—it is possible, and often very convenient,
for Python programs to construct and execute other Python programs at runtime. The
eval and exec built-ins, for instance, accept and run strings containing Python program
code. This structure is also why Python lends itself to product customization—because
Python code can be changed on the fly, users can modify the Python parts of a system
onsite without needing to have or compile the entire system’s code.
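For example, the following sketch builds Python code as strings at runtime and runs
them on the spot (exec is a statement rather than a built-in function in Python 2.X):
    >>> stmt = "for i in range(3): print(i ** 2)"
    >>> exec(stmt)              # run a statement string built at runtime
    0
    1
    4
    >>> eval('10 + 2 ** 4')     # evaluate an expression string to its result
    26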
At a more fundamental level, keep in mind that all we really have in Python is runtime;
there is no initial compile-time phase at all, and everything happens as the program is
running. This even includes operations such as the creation of functions and classes
and the linkage of modules. Such events occur before execution in more static lan-
guages, but happen as programs execute in Python. As we’ll see, this makes for a much
more dynamic programming experience than that to which some readers may be ac-
customed.
Execution Model Variations
Now that we’ve studied the internal execution flow described in the prior section, I
should note that it reflects the standard implementation of Python today but is not
really a requirement of the Python language itself. Because of that, the execution model
is prone to changing with time. In fact, there are already a few systems that modify the
picture in Figure 2-2 somewhat. Before moving on, let’s briefly explore the most prom-
inent of these variations.
Python Implementation Alternatives
Strictly speaking, as this book edition is being written, there are at least five imple-
mentations of the Python language—CPython, Jython, IronPython, Stackless, and
PyPy. Although there is much cross-fertilization of ideas and work between these Py-
thons, each is a separately installed software system, with its own developers and user
base. Other potential candidates here include the Cython and Shed Skin systems, but
they are discussed later as optimization tools because they do not implement the stan-
dard Python language (the former is a Python/C mix, and the latter is implicitly stati-
cally typed).
In brief, CPython is the standard implementation, and the system that most readers
will wish to use (if you’re not sure, this probably includes you). This is also the version
used in this book, though the core Python language presented here is almost entirely
the same in the alternatives. All the other Python implementations have specific pur-
poses and roles, though they can often serve in most of CPython’s capacities too. All
implement the same Python language but execute programs in different ways.
For example, PyPy is a drop-in replacement for CPython, which can run most programs
much quicker. Similarly, Jython and IronPython are completely independent imple-
mentations of Python that compile Python source for different runtime architectures,
to provide direct access to Java and .NET components. It is also possible to access Java
and .NET software from standard CPython programs—JPype and Python for .NET
systems, for instance, allow standard CPython code to call out to Java and .NET com-
ponents. Jython and IronPython offer more complete solutions, by providing full im-
plementations of the Python language.
Here’s a quick rundown on the most prominent Python implementations available
today.
CPython: The standard
The original, and standard, implementation of Python is usually called CPython when
you want to contrast it with the other options (and just plain “Python” otherwise). This
name comes from the fact that it is coded in portable ANSI C language code. This is
the Python that you fetch from http://www.python.org, get with the ActivePython and
Enthought distributions, and have automatically on most Linux and Mac OS X ma-
chines. If you’ve found a preinstalled version of Python on your machine, it’s probably
CPython, unless your company or organization is using Python in more specialized
ways.
Unless you want to script Java or .NET applications with Python or find the benefits
of Stackless or PyPy compelling, you probably want to use the standard CPython sys-
tem. Because it is the reference implementation of the language, it tends to run the
fastest, be the most complete, and be more up-to-date and robust than the alternative
systems. Figure 2-2 reflects CPython’s runtime architecture.
Jython: Python for Java
The Jython system (originally known as JPython) is an alternative implementation of
the Python language, targeted for integration with the Java programming language.
Jython consists of Java classes that compile Python source code to Java byte code and
then route the resulting byte code to the Java Virtual Machine (JVM). Programmers
still code Python statements in .py text files as usual; the Jython system essentially just
replaces the rightmost two bubbles in Figure 2-2 with Java-based equivalents.
Jython’s goal is to allow Python code to script Java applications, much as CPython
allows Python to script C and C++ components. Its integration with Java is remarkably
seamless. Because Python code is translated to Java byte code, it looks and feels like a
true Java program at runtime. Jython scripts can serve as web applets and servlets, build
Java-based GUIs, and so on. Moreover, Jython includes integration support that allows
Python code to import and use Java classes as though they were coded in Python, and
Java code to run Python code as an embedded language. Because Jython is slower and
less robust than CPython, though, it is usually seen as a tool of interest primarily to
Java developers looking for a scripting language to serve as a frontend to Java code. See
Jython’s website http://jython.org for more details.
IronPython: Python for .NET
A third implementation of Python, and newer than both CPython and Jython, IronPy-
thon is designed to allow Python programs to integrate with applications coded to work
with Microsoft’s .NET Framework for Windows, as well as the Mono open source
equivalent for Linux. .NET and its C# programming language runtime system are de-
signed to be a language-neutral object communication layer, in the spirit of Microsoft’s
earlier COM model. IronPython allows Python programs to act as both client and server
components, gain accessibility both to and from other .NET languages, and lever-
age .NET technologies such as the Silverlight framework from their Python code.
By implementation, IronPython is very much like Jython (and, in fact, was developed
by the same creator)—it replaces the last two bubbles in Figure 2-2 with equivalents
for execution in the .NET environment. Also like Jython, IronPython has a special focus
—it is primarily of interest to developers integrating Python with .NET components.
Formerly developed by Microsoft and now an open source project, IronPython might
also be able to take advantage of some important optimization tools for better perfor-
mance. For more details, consult http://ironpython.net and other resources to be had
with a web search.
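By way of illustration, the following hedged sketch uses IronPython's clr module to
load a .NET assembly; it assumes a working IronPython install on .NET or Mono:

# IronPython only: use .NET libraries from Python code
import clr                                   # IronPython's bridge to the .NET runtime
clr.AddReference('System.Windows.Forms')     # load a .NET assembly by name
from System.Windows.Forms import MessageBox

MessageBox.Show('Hello from IronPython')     # call a .NET GUI component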
Stackless: Python for concurrency
Still other schemes for running Python programs have more focused goals. For example,
the Stackless Python system is an enhanced version and reimplementation of the stan-
dard CPython language oriented toward concurrency. Because it does not save state on
the C language call stack, Stackless Python can make Python easier to port to small
stack architectures, provides efficient multiprocessing options, and fosters novel pro-
gramming structures such as coroutines.
Among other things, the microthreads that Stackless adds to Python are an efficient and
lightweight alternative to Python’s standard multitasking tools such as threads and
processes, and promise better program structure, more readable code, and increased
programmer productivity. CCP Games, the creator of EVE Online, is a well-known
Stackless Python user, and a compelling Python user success story in general. Try http:
//stackless.com for more information.
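As a brief sketch of the model, Stackless microthreads are spawned as tasklets; the
following assumes a Stackless Python build, and won't run under standard CPython:

# Stackless only: run two cooperative microthreads (tasklets)
import stackless

def looper(label):
    for i in range(3):
        print('%s %s' % (label, i))
        stackless.schedule()            # yield control to other tasklets

stackless.tasklet(looper)('spam')       # create and bind two tasklets
stackless.tasklet(looper)('eggs')
stackless.run()                         # run the scheduler until all finish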
PyPy: Python for speed
The PyPy system is another reimplementation of standard CPython, focused on perfor-
mance. It provides a fast Python implementation with a JIT (just-in-time) compiler,
provides tools for a “sandbox” model that can run untrusted code in a secure environ-
ment, and by default includes support for the prior section’s Stackless Python system
and its microthreads to support massive concurrency.
PyPy is the successor to the original Psyco JIT, described ahead, and subsumes it with
a complete Python implementation built for speed. A JIT is really just an extension to
the PVM—the rightmost bubble in Figure 2-2—that translates portions of your byte
code all the way to binary machine code for faster execution. It does this as your pro-
gram is running, not in a prerun compile step, and is able to create type-specific ma-
chine code for the dynamic Python language by keeping track of the data types of the
objects your program processes. By replacing portions of your byte code this way, your
program runs faster and faster as it is executing. In addition, some Python programs
may also take up less memory under PyPy.
At this writing, PyPy supports Python 2.7 code (not yet 3.X) and runs on Intel x86
(IA-32) and x86_64 platforms (including Windows, Linux, and recent Macs), with
ARM and PPC support under development. It runs most CPython code, though C
extension modules must generally be recompiled, and PyPy has some minor but subtle
language differences, including garbage collection semantics that obviate some com-
mon coding patterns. For instance, its non-reference-count scheme means that tem-
porary files may not close and flush output buffers immediately, and may require man-
ual close calls in some cases.
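To make that concrete, code that leans on reference counting to flush and close files
should do so explicitly instead; this pattern is good practice on CPython as well:

# Under a non-refcount GC (e.g., PyPy), don't rely on finalizers to flush
f = open('data.txt', 'w')
f.write('spam')
f.close()                       # close (and flush) explicitly...

with open('data.txt', 'w') as f:
    f.write('spam')             # ...or have a with statement guarantee the close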
In return, your code may run much quicker. PyPy currently claims a 5.7X speedup over
CPython across a range of benchmark programs (per http://speed.pypy.org/). In some
cases, its ability to take advantage of dynamic optimization opportunities can make
Python code as quick as C code, and occasionally faster. This is especially true for
heavily algorithmic or numeric programs, which might otherwise be recoded in C.
For instance, in one simple benchmark we’ll see in Chapter 21, PyPy today clocks in
at 10X faster than CPython 2.7, and 100X faster than CPython 3.X. Though other
benchmarks will vary, such speedups may be a compelling advantage in many domains,
perhaps even more so than leading-edge language features. Just as important, memory
space is also optimized in PyPy—in the case of one posted benchmark, requiring 247
MB and completing in 10.3 seconds, compared to CPython’s 684 MB and 89 seconds.
PyPy’s tool chain is also general enough to support additional languages, including
Pyrolog, a Prolog interpreter written in Python using the PyPy translator. See PyPy’s
website for more: it currently lives at http://pypy.org, though the usual web
search may also prove fruitful over time. For an overview of its current performance,
also see http://www.pypy.org/performance.html.
Just after I wrote this, PyPy 2.0 was released in beta form, adding support
for the ARM processor, and still a Python 2.X-only implementation. Per
its 2.0 beta release notes:
“PyPy is a very compliant Python interpreter, almost a drop-in replace-
ment for CPython 2.7.3. It’s fast due to its integrated tracing JIT com-
piler. This release supports x86 machines running Linux 32/64, Mac OS
X 64 or Windows 32. It also supports ARM machines running Linux.”
The claims seem accurate. Using the timing tools we’ll study in Chap-
ter 21, PyPy is often an order of magnitude (factor of 10) faster than
CPython 2.X and 3.X on tests I’ve run, and sometimes even better. This
is despite the fact that PyPy is a 32-bit build on my Windows test ma-
chine, while CPython is a faster 64-bit compile.
Naturally the only benchmark that truly matters is your own code, and
there are cases where CPython wins the race; PyPy’s file iterators, for
instance, may clock in slower today. Still, given PyPy’s focus on perfor-
mance over language mutation, and especially its support for the nu-
meric domain, many today see PyPy as an important path for Python.
If you write CPU-intensive code, PyPy deserves your attention.
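If you'd like a rough comparison on your own machine, a simple CPU-bound timer
like the following sketch runs unchanged under both CPython and PyPy; absolute
results will vary widely by version and computer:

# An informal speed probe: run under both python and pypy and compare
import time

def work():
    total = 0
    for i in range(10000000):   # 10 million adds (use xrange in Python 2.X
        total += i              # to avoid building a 10M-item list)
    return total

start = time.time()
work()
print('elapsed: %.2f seconds' % (time.time() - start))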
Execution Optimization Tools
CPython and most of the alternatives of the prior section all implement the Python
language in similar ways: by compiling source code to byte code and executing the byte
code on an appropriate virtual machine. Some systems, such as the Cython hybrid, the
Shed Skin C++ translator, and the just-in-time compilers in PyPy and Psyco instead
attempt to optimize the basic execution model. These systems are not required knowl-
edge at this point in your Python career, but a quick look at their place in the execution
model might help demystify the model in general.
Cython: A Python/C hybrid
The Cython system (based on work done by the Pyrex project) is a hybrid language that
combines Python code with the ability to call C functions and use C type declarations
for variables, parameters, and class attributes. Cython code can be compiled to C code
that uses the Python/C API, which may then be compiled completely. Though not
completely compatible with standard Python, Cython can be useful both for wrapping
external C libraries and for coding efficient C extensions for Python. See http://cython
.org for current status and details.
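For a taste of the hybrid, here is a hedged sketch of a Cython function with C type
declarations; the file and names here are hypothetical, and it must be compiled with
the Cython toolchain rather than run directly:

# sum_squares.pyx: Python code plus C type declarations (Cython only)
def sum_squares(int n):
    cdef long total = 0         # C-typed variables avoid object overhead
    cdef int i
    for i in range(n):
        total += i * i
    return total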
Shed Skin: A Python-to-C++ translator
Shed Skin is an emerging system that takes a different approach to Python program
execution—it attempts to translate Python source code to C++ code, which your com-
puter’s C++ compiler then compiles to machine code. As such, it represents a platform-
neutral approach to running Python code.
Shed Skin is still being actively developed as I write these words. It currently supports
Python 2.4 to 2.6 code, and it restricts Python programs to an implicitly statically typed
subset that is typical of most programs but is technically not normal Python, so we
won’t go into further detail here. Initial results, though, show that it has the potential
to outperform both standard Python and Psyco-like extensions in terms of execution
speed. Search the Web for details on the project’s current status.
Psyco: The original just-in-time compiler
The Psyco system is not another Python implementation, but rather a component that
extends the byte code execution model to make programs run faster. Today, Psyco is
something of an ex-project: it is still available for separate download, but has fallen out
of date with Python’s evolution, and is no longer actively maintained. Instead, its ideas
have been incorporated into the more complete PyPy system described earlier. Still, the
ongoing importance of the ideas Psyco explored makes them worth a quick look.
In terms of Figure 2-2, Psyco is an enhancement to the PVM that collects and uses type
information while the program runs to translate portions of the program’s byte code
all the way down to true binary machine code for faster execution. Psyco accomplishes
this translation without requiring changes to the code or a separate compilation step
during development.
Roughly, while your program runs, Psyco collects information about the kinds of ob-
jects being passed around; that information can be used to generate highly efficient
machine code tailored for those object types. Once generated, the machine code then
replaces the corresponding part of the original byte code to speed your program’s over-
all execution. The result is that with Psyco, your program becomes quicker over time
as it runs. In ideal cases, some Python code may become as fast as compiled C code
under Psyco.
Because this translation from byte code happens at program runtime, Psyco is known
as a just-in-time compiler. Psyco is different from the JIT compilers some readers may
have seen for the Java language, though. Really, Psyco is a specializing JIT compiler:
it generates machine code tailored to the data types that your program actually uses.
For example, if a part of your program uses different data types at different times, Psyco
may generate a different version of machine code to support each different type com-
bination.
Psyco was shown to speed some Python code dramatically. According to its web page,
Psyco provides “2X to 100X speed-ups, typically 4X, with an unmodified Python in-
terpreter and unmodified source code, just a dynamically loadable C extension mod-
ule.” Of equal significance, the largest speedups are realized for algorithmic code writ-
ten in pure Python—exactly the sort of code you might normally migrate to C to op-
timize. For more on Psyco, search the Web or see its successor—the PyPy project de-
scribed previously.
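For historical context, enabling Psyco classically took just a line or two of setup; this
sketch assumes an old Python 2.X with the Psyco extension installed:

# Psyco (historical, Python 2.X only): turn on the JIT
import psyco
psyco.full()                # compile all functions as they are called

# or bind just a hot spot (my_function is a hypothetical function):
# psyco.bind(my_function)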
Frozen Binaries
Sometimes when people ask for a “real” Python compiler, what they’re really seeking
is simply a way to generate standalone binary executables from their Python programs.
This is more a packaging and shipping idea than an execution-flow concept, but it’s
somewhat related. With the help of third-party tools that you can fetch off the Web, it
is possible to turn your Python programs into true executables, known as frozen bi-
naries in the Python world. These programs can be run without requiring a Python
installation.
Frozen binaries bundle together the byte code of your program files, along with the
PVM (interpreter) and any Python support files your program needs, into a single
package. There are some variations on this theme, but the end result can be a single
binary executable program (e.g., an .exe file on Windows) that can easily be shipped
to customers. In Figure 2-2, it is as though the two rightmost bubbles—byte code and
PVM—are merged into a single component: a frozen binary file.
Today, a variety of systems are capable of generating frozen binaries, which vary in
platforms and features: py2exe for Windows only, but with broad Windows support;
PyInstaller, which is similar to py2exe but also works on Linux and Mac OS X and is
capable of generating self-installing binaries; py2app for creating Mac OS X applica-
tions; freeze, the original; and cx_freeze, which offers both Python 3.X and cross-plat-
form support. You may have to fetch these tools separately from Python itself, but they
are freely available.
These tools are also constantly evolving, so consult http://www.python.org or your fa-
vorite web search engine for more details and status. To give you an idea of the scope
of these systems, py2exe can freeze standalone programs that use the tkinter, PMW,
wxPython, and PyGTK GUI libraries; programs that use the pygame game program-
ming toolkit; win32com client programs; and more.
Frozen binaries are not the same as the output of a true compiler—they run byte code
through a virtual machine. Hence, apart from a possible startup improvement, frozen
binaries run at the same speed as the original source files. Frozen binaries are also not
generally small (they contain a PVM), but by current standards they are not unusually
large either. Because Python is embedded in the frozen binary, though, it does not have
to be installed on the receiving end to run your program. Moreover, because your code
is embedded in the frozen binary, it is more effectively hidden from recipients.
This single file-packaging scheme is especially appealing to developers of commercial
software. For instance, a Python-coded user interface program based on the tkinter
toolkit can be frozen into an executable file and shipped as a self-contained program
on a CD or on the Web. End users do not need to install (or even have to know about)
Python to run the shipped program.
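To give a rough idea of the workflow, a freeze session with one of these tools might
look like the following hedged sketch, which uses PyInstaller; the script name here is
hypothetical, and the tool's options vary by release:

C:\code> pyinstaller --onefile myscript.py
C:\code> dist\myscript.exe

The first command bundles the script's byte code and the PVM into a single executable
under a dist subdirectory; the second runs the result, with no Python install required
on the receiving machine.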
Future Possibilities?
Finally, note that the runtime execution model sketched here is really an artifact of the
current implementation of Python, not of the language itself. For instance, it’s not
impossible that a full, traditional compiler for translating Python source code to ma-
chine code may appear during the shelf life of this book (although the fact that one has
not in over two decades makes this seem unlikely!).
New byte code formats and implementation variants may also be adopted in the future.
For instance:
The ongoing Parrot project aims to provide a common byte code format, virtual
machine, and optimization techniques for a variety of programming languages,
including Python. Python’s own PVM runs Python code more efficiently than Par-
rot (as famously demonstrated by a pie challenge at a software conference—search
the Web for details), but it’s unclear how Parrot will evolve in relation to Python
specifically. See http://parrot.org or the Web at large for details.
The former Unladen Swallow project—an open source project developed by Goo-
gle engineers—sought to make standard Python faster by a factor of at least 5, and
fast enough to replace the C language in many contexts. This was an optimization
branch of CPython (specifically Python 2.6), intended to be compatible yet faster
by virtue of adding a JIT to standard Python. As I write this in 2012, this project
seems to have drawn to a close (per its withdrawn Python PEP, it was “going the
way of the Norwegian Blue”). Still, its lessons gained may be leveraged in other
forms; search the Web for breaking developments.
Although future implementation schemes may alter the runtime structure of Python
somewhat, it seems likely that the byte code compiler will still be the standard for some
time to come. The portability and runtime flexibility of byte code are important features
of many Python systems. Moreover, adding type constraint declarations to support
static compilation would likely break much of the flexibility, conciseness, simplicity,
and overall spirit of Python coding. Due to Python’s highly dynamic nature, any future
implementation will likely retain many artifacts of the current PVM.
Chapter Summary
This chapter introduced the execution model of Python—how Python runs your pro-
grams—and explored some common variations on that model: just-in-time compilers
and the like. Although you don’t really need to come to grips with Python internals to
write Python scripts, a passing acquaintance with this chapter’s topics will help you
truly understand how your programs run once you start coding them. In the next
chapter, you’ll start actually running some code of your own. First, though, here’s the
usual chapter quiz.
Test Your Knowledge: Quiz
1. What is the Python interpreter?
2. What is source code?
3. What is byte code?
4. What is the PVM?
5. Name two or more variations on Python’s standard execution model.
6. How are CPython, Jython, and IronPython different?
7. What are Stackless and PyPy?
Test Your Knowledge: Answers
1. The Python interpreter is a program that runs the Python programs you write.
2. Source code is the statements you write for your program—it consists of text in
text files that normally end with a .py extension.
3. Byte code is the lower-level form of your program after Python compiles it. Python
automatically stores byte code in files with a .pyc extension.
4. The PVM is the Python Virtual Machine—the runtime engine of Python that in-
terprets your compiled byte code.
5. Psyco, Shed Skin, and frozen binaries are all variations on the execution model. In
addition, the alternative implementations of Python named in the next two answers
modify the model in some fashion as well—by replacing byte code and VMs, or by
adding tools and JITs.
6. CPython is the standard implementation of the language. Jython and IronPython
implement Python programs for use in Java and .NET environments, respectively;
they are alternative compilers for Python.
7. Stackless is an enhanced version of Python aimed at concurrency, and PyPy is a
reimplementation of Python targeted at speed. PyPy is also the successor to Psyco,
and incorporates the JIT concepts that Psyco pioneered.
CHAPTER 3
How You Run Programs
OK, it’s time to start running some code. Now that you have a handle on the program
execution model, you’re finally ready to start some real Python programming. At this
point, I’ll assume that you have Python installed on your computer; if you don’t, see
the start of the prior chapter and Appendix A for installation and configuration hints
on various platforms. Our goal here is to learn how to run Python program code.
There are multiple ways to tell Python to execute the code you type. This chapter
discusses all the program launching techniques in common use today. Along the way,
you’ll learn both how to type code interactively and how to save it in files to be run as
often as you like in a variety of ways: with system command lines, icon clicks, module
imports, exec calls, menu options in the IDLE GUI, and more.
As with the previous chapter, if you have prior programming experience and are anxious
to start digging into Python itself, you may want to skim this chapter and move on to
Chapter 4. But don’t skip this chapter’s early coverage of preliminaries and conven-
tions, its overview of debugging techniques, or its first look at module imports—a topic
essential to understanding Python’s program architecture, which we won’t revisit until
a later part. I also encourage you to see the sections on IDLE and other IDEs, so you’ll
know what tools are available when you start developing more sophisticated Python
programs.
The Interactive Prompt
This section gets us started with interactive coding basics. Because it’s our first look at
running code, we also cover some preliminaries here, such as setting up a working
directory and the system path, so be sure to read this section first if you’re relatively
new to programming. This section also explains some conventions used throughout
the book, so most readers should probably take at least a quick look here.
Starting an Interactive Session
Perhaps the simplest way to run Python programs is to type them at Python’s interactive
command line, sometimes called the interactive prompt. There are a variety of ways to
start this command line: in an IDE, from a system console, and so on. Assuming the
interpreter is installed as an executable program on your system, the most platform-
neutral way to start an interactive interpreter session is usually just to type python at
your operating system’s prompt, without any arguments. For example:
% python
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit ...
Type "help", "copyright", "credits" or "license" for more information.
>>> ^Z
Typing the word “python” at your system shell prompt like this begins an interactive
Python session; the “%” character at the start of this listing stands for a generic system
prompt in this book—it’s not input that you type yourself. On Windows, a Ctrl-Z gets
you out of this session; on Unix, try Ctrl-D instead.
The notion of a system shell prompt is generic, but exactly how you access it varies by
platform:
On Windows, you can type python in a DOS console window—a program named
cmd.exe and usually known as Command Prompt. For more details on starting this
program, see this chapter’s sidebar “Where Is Command Prompt on Win-
dows?” on page 45.
On Mac OS X, you can start a Python interactive interpreter by double-clicking on
Applications→Utilities→Terminal, and then typing python in the window that
opens up.
On Linux (and other Unixes), you might type this command in a shell or terminal
window (for instance, in an xterm or console running a shell such as ksh or csh).
Other systems may use similar or platform-specific devices. On handheld devices,
for example, you might click the Python icon in the home or application window
to launch an interactive session.
On most platforms, you can start the interactive prompt in additional ways that don’t
require typing a command, but they vary per platform even more widely:
On Windows 7 and earlier, besides typing python in a shell window, you can also
begin similar interactive sessions by starting the IDLE GUI (discussed later), or by
selecting the “Python (command line)” menu option from the Start button menu
for Python, as shown in Figure 2-1 in Chapter 2. Both spawn a Python interactive
prompt with the same functionality obtained with a “python” command.
On Windows 8, you don’t have a Start button (at least as I write this), but there are
other ways to get to the tools described in the prior bullet, including tiles, Search,
File Explorer, and the “All apps” interface on the Start screen. See Appendix A for
more pointers on this platform.
Other platforms have similar ways to start a Python interactive session without
typing commands, but they’re too specific to get into here; see your system’s doc-
umentation for details.
Anytime you see the >>> prompt, you’re in an interactive Python interpreter session—
you can type any Python statement or expression here and run it immediately. We will
in a moment, but first we need to get a few startup details sorted out to make sure all
readers are set to go.
Where Is Command Prompt on Windows?
So how do you start the command-line interface on Windows? Some Windows readers
already know, but Unix developers and beginners may not; it’s not as prominent as
terminal or console windows on Unix systems. Here are some pointers on finding your
Command Prompt, which vary slightly per Windows version.
On Windows 7 and earlier, this is usually found in the Accessories section of the
Start→All Programs menu, or you can run it by typing cmd in the Start→Run... dialog
box or the Start menu’s search entry field. You can drag out a desktop shortcut to get
to it quicker if desired.
On Windows 8, you can access Command Prompt in the menu opened by right-clicking
on the preview in the screen’s lower-left corner; in the Windows System section of the
“All apps” display reached by right-clicking your Start screen; or by typing cmd or
command prompt in the input field of the Search charm pulled down from the screen’s
upper-right corner. There are probably additional routes, and touch screens offer sim-
ilar access. And if you want to forget all that, pin it to your desktop taskbar for easy
access next time around.
These procedures are prone to vary over time, and possibly even per computer and
user. I’m trying to avoid making this a book on Windows, though, so I’ll cut this topic
short here. When in doubt, try the system Help interface (whose usage may differ as
much as the tools it provides help for!).
A note to any Unix users reading this sidebar who may be starting to feel like a fish out
of water: you may also be interested in the Cygwin system, which brings a full Unix
command prompt to Windows. See Appendix A for more pointers.
The System Path
When we typed python in the last section to start an interactive session, we relied on
the fact that the system located the Python program for us on its program search path.
Depending on your Python version and platform, if you have not set your system’s
PATH environment variable to include Python’s install directory, you may need to replace
the word “python” with the full path to the Python executable on your machine. On
Unix, Linux, and similar, something like /usr/local/bin/python or /usr/bin/python3
will often suffice. On Windows, try typing C:\Python33\python (for version 3.3):
c:\code> c:\python33\python
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit ...
Type "help", "copyright", "credits" or "license" for more information.
>>> ^Z
Alternatively, you can run a “cd” change-directory command to go to Python’s install
directory before typing python—try the cd c:\python33 command on Windows, for
example:
c:\code> cd c:\python33
c:\Python33> python
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit ...
Type "help", "copyright", "credits" or "license" for more information.
>>> ^Z
But you’ll probably want to set your PATH eventually, so a simple “python” suffices. If
you don’t know what PATH is or how to set it, see Appendix A—it covers environment
variables like this whose usage varies per platform, as well as Python command-line
arguments we won’t be using much in this book. The short story for Windows users:
see the Advanced settings in the System entry of your Control Panel. If you’re using
Python 3.3 and later, this is now automatic on Windows, as the next section explains.
New Windows Options in 3.3: PATH, Launcher
The foregoing section and much of this chapter at large describe the generic state of
play for all 2.X and 3.X Pythons prior to version 3.3. Starting with Python 3.3, the
Windows installer has an option to automatically add Python 3.3’s directory to your
system PATH, if enabled in the installer’s windows. If you use this option, you won’t
need to type a directory path or issue a “cd” to run python commands as in the prior
section. Be sure to select this option during the install if you want it, as it’s currently
disabled by default.
More dramatically, Python 3.3 for Windows ships with and automatically installs the
new Windows launcher—a system that comes with new executable programs, py with
a console and pyw without, that are placed in directories on your system path, and so
may be run out of the box without any PATH configurations, change-directory com-
mands, or directory path prefixes:
c:\code> py
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit ...
Type "help", "copyright", "credits" or "license" for more information.
>>> ^Z
c:\code> py -2
Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] ...
Type "help", "copyright", "credits" or "license" for more information.
>>> ^Z
c:\code> py -3.1
Python 3.1.4 (default, Jun 12 2011, 14:16:16) [MSC v.1500 64 bit (AMD64)] ...
Type "help", "copyright", "credits" or "license" for more information.
>>> ^Z
As shown in the last two commands here, these executables also accept Python version
numbers on the command line (and in Unix-style #! lines at the top of scripts, as dis-
cussed later), and are associated to open Python files when clicked just like the original
python executable—which is still available and works as before, but is somewhat su-
perseded by the launcher’s new programs.
The launcher is a standard part of Python 3.3, and is available standalone for use with
other versions. We’ll see more on this new launcher in this and later chapters, including
a brief look at its #! line support here. However, because it is of interest only to Windows
users, and even for this group is present only in 3.3 or where installed separately, I’ve
collected almost all of the details about the launcher in Appendix B.
If you’ll be working on Windows under Python 3.3 or later, I suggest taking a brief
detour to that appendix now, as it provides an alternative, and in some ways better,
way to run Python command lines and scripts. At a base level, launcher users can type
py instead of python in most of the system commands shown in this book, and may
avoid some configuration steps. Especially on computers with multiple Python ver-
sions, though, the new launcher gives you more explicit control over which Python
runs your code.
Where to Run: Code Directories
Now that I’ve started showing you how to run code, I want to say a few words up front
about where to run code. To keep things simple, in this chapter and book at large I’m
going to be running code from a working directory (a.k.a. folder) I’ve created on my
Windows computer called C:\code—a subdirectory at the top of my main drive. That’s
where I’ll start most interactive sessions, and where I’ll be both saving and running
most script files. This also means the files that examples will create will mostly show
up in this directory.
If you’ll be working along, you should probably do something similar before we get
started. Here are some pointers if you need help getting set up with a working directory
on your computer:
On Windows, you can make your working code directory in File Explorer or a
Command Prompt window. In File Explorer, look for New Folder, see the File
menu, or try a right-click. In Command Prompt, type and run a mkdir command,
usually after you cd to your desired parent directory (e.g., cd c:\ and mkdir code).
Your working directory can be located wherever you like and called whatever you
wish, and doesn’t have to be C:\code (I chose this name because it’s short in
prompts). But running out of one directory will help you keep track of your work
and simplify some tasks. For more Windows hints, see this chapter’s sidebar on
Command Prompt, as well as Appendix A.
On Unix-based systems (including Mac OS X and Linux), your working directory
might be in /usr/home and be created by a mkdir command in a shell window or
file explorer GUI specific to your platform, but the same concepts apply. The Cyg-
win Unix-like system for Windows is similar too, though your directory names
may vary (/home and /cygdrive/c are candidates).
You can store your code in Python’s install directory too (e.g., C:\Python33 on Win-
dows) to simplify some command lines before setting PATH, but you probably shouldn’t
—this is for Python itself, and your files may not survive a move or uninstall.
Once you’ve made your working directory, always start there to work along with the
examples in this book. The prompts in this book that show the directory that I’m
running code in will reflect my Windows laptop’s working directory; when you see C:
\code> or %, think the location and name of your own directory.
What Not to Type: Prompts and Comments
Speaking of prompts, this book sometimes shows system prompts as a generic %, and
sometimes in full C:\code> Windows form. The former is meant to be platform agnostic
(and derives from earlier editions’ use of Linux), and the latter is used in Windows-
specific contexts. I also add a space after system prompts just for readability in this
book. When used, the % character at the start of a system command line stands for the
system’s prompt, whatever that may be on your machine. For instance, on my machine
% stands for C:\code> in Windows Command Prompt, and just $ in my Cygwin install.
To beginners: don’t type the % character (or the C:\code system prompt it sometimes
stands for) you see in this book’s interaction listings yourself—this is text the system
prints. Type just the text after these system prompts. Similarly, do not type the >>>
and ... characters shown at the start of lines in interpreter interaction listings—these
are prompts that Python displays automatically as visual guides for interactive code
entry. Type just the text after these Python prompts. For instance, the ... prompt is
used for continuation lines in some shells, but doesn’t appear in IDLE, and shows up
in some but not all of this book’s listings; don’t type it yourself if it’s absent in your
interface.
To help you remember this, user inputs are shown in bold in this book, and prompts
are not. In some systems these prompts may differ (for instance, the PyPy performance-
focused implementation described in Chapter 2 uses four-character >>>> and ....), but
the same rules apply. Also keep in mind that commands typed after these system and
Python prompts are meant to be run immediately, and are not generally to be saved in
the source files we will be creating; we’ll see why this distinction matters ahead.
In the same vein, you normally don’t need to type text that starts with a # character in
listings in this book—as you’ll learn, these are comments, not executable code. Except
when # is used to introduce a directive at the top of a script for Unix or the Python 3.3
Windows launcher, you can safely ignore the text that follows it (more on Unix and
the launcher later in this chapter and in Appendix B).
If you’re working along, interactive listings will drop most “...” contin-
uation prompts as of Chapter 17 to aid cut-and-paste of larger code such
as functions and classes from ebooks or other sources; until then, paste or type
one line at a time and omit the prompts. At least initially, it’s important
to type code manually, to get a feel for syntax details and errors. Some
examples will be listed either by themselves or in named files available
in the book’s examples package (per the preface), and we’ll switch be-
tween listing formats often; when in doubt, if you see “>>>”, it means
the code is being typed interactively.
Running Code Interactively
With those preliminaries out of the way, let’s move on to typing some actual code.
However it’s started, the Python interactive session begins by printing two lines of
informational text giving the Python version number and a few hints shown earlier
(which I’ll omit from most of this book’s examples to save space), then prompts for
input with >>> when it’s waiting for you to type a new Python statement or expression.
When working interactively, the results of your code are displayed below the >>> input
lines after you press the Enter key. For instance, here are the results of two Python
print statements (print is really a function call in Python 3.X, but not in 2.X, so the
parentheses here are required in 3.X only):
% python
>>> print('Hello world!')
Hello world!
>>> print(2 ** 8)
256
There it is—we’ve just run some Python code (were you expecting the Spanish Inqui-
sition?). Don’t worry about the details of the print statements shown here yet; we’ll
start digging into syntax in the next chapter. In short, they print a Python string and
an integer, as shown by the output lines that appear after each >>> input line (2 ** 8
means 2 raised to the power 8 in Python).
When coding interactively like this, you can type as many Python commands as you
like; each is run immediately after it’s entered. Moreover, because the interactive ses-
sion automatically prints the results of expressions you type, you don’t usually need to
say “print” explicitly at this prompt:
>>> lumberjack = 'okay'
>>> lumberjack
'okay'
>>> 2 ** 8
256
>>> ^Z # Use Ctrl-D (on Unix) or Ctrl-Z (on Windows) to exit
%
Here, the first line saves a value by assigning it to a variable (lumberjack), which is
created by the assignment; and the last two lines typed are expressions (lumberjack and
2 ** 8), whose results are displayed automatically. Again, to exit an interactive session
like this and return to your system shell prompt, type Ctrl-D on Unix-like machines,
and Ctrl-Z on Windows. In the IDLE GUI discussed later, either type Ctrl-D or simply
close the window.
Notice the italicized note about this on the right side of this listing (starting with “#”
here). I’ll use these throughout to add remarks about what is being illustrated, but you
don’t need to type this text yourself. In fact, just like system and Python prompts, you
shouldn’t type this when it’s on a system command line; the “#” part is taken as a
comment by Python but may be an error at a system prompt.
Now, we didn’t do much in this session’s code—just typed some Python print and
assignment statements, along with a few expressions, which we’ll study in detail later.
The main thing to notice is that the interpreter executes the code entered on each line
immediately, when the Enter key is pressed.
For example, when we typed the first print statement at the >>> prompt, the output (a
Python string) was echoed back right away. There was no need to create a source code
file, and no need to run the code through a compiler and linker first, as you’d normally
do when using a language such as C or C++. As you’ll see in later chapters, you can
also run multiline statements at the interactive prompt; such a statement runs imme-
diately after you’ve entered all of its lines and pressed Enter twice to add a blank line.
Why the Interactive Prompt?
The interactive prompt runs code and echoes results as you go, but it doesn’t save your
code in a file. Although this means you won’t do the bulk of your coding in interactive
sessions, the interactive prompt turns out to be a great place to both experiment with
the language and test program files on the fly.
Experimenting
Because code is executed immediately, the interactive prompt is a perfect place to ex-
periment with the language and will be used often in this book to demonstrate smaller
examples. In fact, this is the first rule of thumb to remember: if you’re ever in doubt
about how a piece of Python code works, fire up the interactive command line and try
it out to see what happens.
For instance, suppose you’re reading a Python program’s code and you come across
an expression like 'Spam!' * 8 whose meaning you don’t understand. At this point,
you can spend 10 minutes wading through manuals, books, and the Web to try to figure
out what the code does, or you can simply run it interactively:
% python
>>> 'Spam!' * 8 # Learning by trying
'Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!'
The immediate feedback you receive at the interactive prompt is often the quickest way
to deduce what a piece of code does. Here, it’s clear that it does string repetition: in
Python * means multiply for numbers, but repeat for strings—it’s like concatenating a
string to itself repeatedly (more on strings in Chapter 4).
Chances are good that you won’t break anything by experimenting this way—at least,
not yet. To do real damage, like deleting files and running shell commands, you must
really try, by importing modules explicitly (you also need to know more about Python’s
system interfaces in general before you will become that dangerous!). Straight Python
code is almost always safe to run.
For instance, watch what happens when you make a mistake at the interactive prompt:
>>> X # Making mistakes
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'X' is not defined
In Python, using a variable before it has been assigned a value is always an error—
otherwise, if names were filled in with defaults, some errors might go undetected. This
means you must initialize counters to zero before you can add to them, must initialize lists
before extending them, and so on; you don’t declare variables, but they must be as-
signed before you can fetch their values.
We’ll learn more about that later; the important point here is that you don’t crash
Python or your computer when you make a mistake this way. Instead, you get a mean-
ingful error message pointing out the mistake and the line of code that made it, and
you can continue on in your session or script. In fact, once you get comfortable with
Python, its error messages may often provide as much debugging support as you’ll need
(you’ll learn more about debugging options in the sidebar “Debugging Python
Code” on page 83).
Testing
Besides serving as a tool for experimenting while you’re learning the language, the
interactive interpreter is also an ideal place to test code you’ve written in files. You can
import your module files interactively and run tests on the tools they define by typing
calls at the interactive prompt on the fly.
For instance, the following tests a function in a precoded module that ships with Python
in its standard library (it prints the name of the directory you’re currently working in,
with a doubled-up backslash that stands for just one), but you can do the same once
you start writing module files of your own:
>>> import os
>>> os.getcwd() # Testing on the fly
'c:\\code'
More generally, the interactive prompt is a place to test program components, regard-
less of their source—you can import and test functions and classes in your Python files,
type calls to linked-in C functions, exercise Java classes under Jython, and more. Partly
because of its interactive nature, Python supports an experimental and exploratory
programming style you’ll find convenient when getting started. Although Python pro-
grammers also test with in-file code (and we’ll learn ways to make this simple later in
the book), for many, the interactive prompt is still their first line of testing defense.
Usage Notes: The Interactive Prompt
Although the interactive prompt is simple to use, there are a few tips that beginners
should keep in mind. I’m including lists of common mistakes like the following in this
chapter for reference, but they might also spare you from a few headaches if you read
them up front:
Type Python commands only. First of all, remember that you can only type
Python code at Python’s >>> prompt, not system commands. There are ways to
run system commands from within Python code (e.g., with os.system, sketched just
after this list), but they are not as direct as simply typing the commands themselves.
print statements are required only in files. Because the interactive interpreter
automatically prints the results of expressions, you do not need to type complete
print statements interactively. This is a nice feature, but it tends to confuse users
when they move on to writing code in files: within a code file, you must use
print statements to see your output because expression results are not automati-
cally echoed. Remember, you must say print in files, but it’s optional interactively.
Don’t indent at the interactive prompt (yet). When typing Python programs,
either interactively or into a text file, be sure to start all your unnested statements
in column 1 (that is, all the way to the left). If you don’t, Python may print a
“SyntaxError” message, because blank space to the left of your code is taken to be
indentation that groups nested statements. Until Chapter 10, all statements you
write will be unnested, so this includes everything for now. Remember, a leading
space generates an error message, so don’t start with a space or tab at the interactive
prompt unless it’s nested code.
Watch out for prompt changes for compound statements. We won’t meet
compound (multiline) statements until Chapter 4 and not in earnest until Chap-
ter 10, but as a preview, you should know that when typing lines 2 and beyond of
a compound statement interactively, the prompt may change. In the simple shell
window interface, the interactive prompt changes to ... instead of >>> for lines 2
and beyond; in the IDLE GUI interface, lines after the first are instead automatically
indented.
You’ll see why this matters in Chapter 10. For now, if you happen to come across
a ... prompt or a blank line when entering your code, it probably means that you’ve
somehow confused interactive Python into thinking you’re typing a multiline
statement. Try hitting the Enter key or a Ctrl-C combination to get back to the
main prompt. The >>> and ... prompt strings can also be changed (they are avail-
able in the built-in module sys), but I’ll assume they have not been in the book’s
example listings.
Terminate compound statements at the interactive prompt with a blank
line. At the interactive prompt, inserting a blank line (by hitting the Enter key at
the start of a line) is necessary to tell interactive Python that you’re done typing the
multiline statement. That is, you must press Enter twice to make a compound
statement run. By contrast, blank lines are not required in files and are simply
ignored if present. If you don’t press Enter twice at the end of a compound state-
ment when working interactively, you’ll appear to be stuck in a limbo state, because
the interactive interpreter will do nothing at all—it’s waiting for you to press Enter
again!
The interactive prompt runs one statement at a time. At the interactive prompt,
you must run one statement to completion before typing another. This is natural
for simple statements, because pressing the Enter key runs the statement entered.
For compound statements, though, remember that you must submit a blank line
to terminate the statement and make it run before you can type the next statement.
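As promised in the first bullet, here is roughly what running a system command from
within Python looks like; this sketch assumes Windows (substitute ls for dir on Unix),
and your listing will naturally differ:

>>> import os
>>> os.system('dir')        # the directory listing prints here
0

The 0 echoed at the end is the command's exit status, displayed because interactive
Python echoes expression results automatically.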
Entering multiline statements
At the risk of repeating myself, I’ve received multiple emails from readers who’d gotten
burned by the last two points, so they probably merit emphasis. I’ll introduce multiline
(a.k.a. compound) statements in the next chapter, and we’ll explore their syntax more
formally later in this book. Because their behavior differs slightly in files and at the
interactive prompt, though, two cautions are in order here.
First, be sure to terminate multiline compound statements like for loops and if tests
at the interactive prompt with a blank line. In other words, you must press the Enter
key twice, to terminate the whole multiline statement and then make it run. For example
(pun not intended):
>>> for x in 'spam':
... print(x) # Press Enter twice here to make this loop run
...
You don’t need the blank line after compound statements in a script file, though; this
is required only at the interactive prompt. In a file, blank lines are not required and are
simply ignored when present; at the interactive prompt, they terminate multiline state-
ments. Reminder: the ... continuation line prompt in the preceding is printed by
Python automatically as a visual guide; it may not appear in your interface (e.g., IDLE),
and is sometimes omitted by this book, but do not type it yourself if it’s absent.
Also bear in mind that the interactive prompt runs just one statement at a time: you
must press Enter twice to run a loop or other multiline statement before you can type
the next statement:
>>> for x in 'spam':
... print(x) # Press Enter twice before a new statement
... print('done')
File "<stdin>", line 3
print('done')
^
SyntaxError: invalid syntax
This means you can’t cut and paste multiple lines of code into the interactive prompt,
unless the code includes blank lines after each compound statement. Such code is better
run in a file—which brings us to the next section’s topic.
System Command Lines and Files
Although the interactive prompt is great for experimenting and testing, it has one big
disadvantage: programs you type there go away as soon as the Python interpreter ex-
ecutes them. Because the code you type interactively is never stored in a file, you can’t
run it again without retyping it from scratch. Cut-and-paste and command recall can
help some here, but not much, especially when you start writing larger programs. To
cut and paste code from an interactive session, you would have to edit out Python
prompts, program outputs, and so on—not exactly a modern software development
methodology!
To save programs permanently, you need to write your code in files, which are usually
known as modules. Modules are simply text files containing Python statements. Once
they are coded, you can ask the Python interpreter to execute the statements in such a
file any number of times, and in a variety of ways—by system command lines, by file
icon clicks, by options in the IDLE user interface, and more. Regardless of how it is
run, Python executes all the code in a module file from top to bottom each time you
run the file.
Terminology in this domain can vary somewhat. For instance, module files are often
referred to as programs in Python—that is, a program is considered to be a series of
precoded statements stored in a file for repeated execution. Module files that are run
directly are also sometimes called scripts—an informal term usually meaning a top-level
program file. Some reserve the term “module” for a file imported from another file, and
“script” for the main file of a program; we generally will here, too (though you’ll have
to stay tuned for more on the meaning of “top-level,” imports, and main files later in
this chapter).
Whatever you call them, the next few sections explore ways to run code typed into
module files. In this section, you’ll learn how to run files in the most basic way: by
listing their names in a python command line entered at your computer’s system
prompt. Though it might seem primitive to some—and can often be avoided altogether
by using a GUI like IDLE, discussed later—for many programmers a system shell com-
mand-line window, together with a text editor window, constitutes as much of an
integrated development environment as they will ever need, and provides more direct
control over programs.
A First Script
Let’s get started. Open your favorite text editor (e.g., vi, Notepad, or the IDLE editor),
type the following statements into a new text file named script1.py, and save it in your
working code directory that you set up earlier:
# A first Python script
import sys # Load a library module
print(sys.platform)
print(2 ** 100) # Raise 2 to a power
x = 'Spam!'
print(x * 8) # String repetition
This file is our first official Python script (not counting the two-liner in Chapter 2). You
shouldn’t worry too much about this file’s code, but as a brief description, this file:
Imports a Python module (libraries of additional tools), to fetch the name of the
platform
Runs three print function calls, to display the script’s results
Uses a variable named x, created when it’s assigned, to hold onto a string object
Applies various object operations that we’ll begin studying in the next chapter
The sys.platform here is just a string that identifies the kind of computer you’re work-
ing on; it lives in a standard Python module called sys, which you must import to load
(again, more on imports later).
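In fact, you can inspect this string interactively before ever writing the file; the result
below reflects my Windows machine, and yours will vary per platform:

>>> import sys
>>> sys.platform            # 'win32' on my machine; varies per platform
'win32'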
For color, I’ve also added some formal Python comments here—the text after the #
characters. I mentioned these earlier, but should be more formal now that they’re
showing up in scripts. Comments can show up on lines by themselves, or to the right
of code on a line. The text after a # is simply ignored as a human-readable comment
and is not considered part of the statement’s syntax. If you’re copying this code, you
can ignore the comments; they are just informative. In this book, we usually use a
different formatting style to make comments more visually distinctive, but they’ll ap-
pear as normal text in your code.
Again, don’t focus on the syntax of the code in this file for now; we’ll learn about all
of it later. The main point to notice is that you’ve typed this code into a file, rather than
at the interactive prompt. In the process, you’ve coded a fully functional Python script.
Notice that the module file is called script1.py. As for all top-level files, it could also be
called simply script, but files of code you want to import into a client have to end with
a .py suffix. We’ll study imports later in this chapter. Because you may want to import
them in the future, it’s a good idea to use .py suffixes for most Python files that you
code. Also, some text editors detect Python files by their .py suffix; if the suffix is not
present, you may not get features like syntax colorization and automatic indentation.
Running Files with Command Lines
Once you’ve saved this text file, you can ask Python to run it by listing its full filename
as the first argument to a python command like the following typed at the system shell
prompt (don’t type this at Python’s interactive prompt, and read on to the next para-
graph if this doesn’t work right away for you):
% python script1.py
win32
1267650600228229401496703205376
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!
Again, you can type such a system shell command in whatever your system provides
for command-line entry—a Windows Command Prompt window, an xterm window,
or similar. But be sure to run this in the same working directory where you’ve saved
your script file (“cd” there first if needed), and be sure to run this at the system prompt,
not Python’s “>>>” prompt. Also remember to replace the command’s word “python”
with a full directory path as we did before if your PATH setting is not configured, though
this isn’t required for the “py” Windows launcher program, and may not be required
in 3.3 and later.
Another note to beginners: do not type any of the preceding text in the script1.py source
file you created in the prior section. This text is a system command and program output,
not program code. The first line here is the shell command used to run the source file,
and the lines following it are the results produced by the source file’s print statements.
And again, remember that the % stands for the system prompt—don’t type it yourself
(not to nag, but it’s a remarkably common early mistake).
If all works as planned, this shell command makes Python run the code in this file line
by line, and you will see the output of the script’s three print statements—the name
of the underlying platform as known to Python, 2 raised to the power 100, and the result
of the same string repetition expression we saw earlier (again, more on the meaning of
the last two of these in Chapter 4).
If all didn’t work as planned, you’ll get an error message—make sure you’ve entered
the code in your file exactly as shown, and try again. The next section has additional
options and pointers on this process, and we’ll talk about debugging options in the
sidebar “Debugging Python Code” on page 83, but at this point in the book your best
bet is probably rote imitation. And if all else fails, you might also try running under the
IDLE GUI discussed ahead—a tool that sugarcoats some launching details, though
sometimes at the expense of the more explicit control you have when using command
lines.
You can also fetch the code examples off the Web if copying grows too tedious or error-
prone, though typing some code initially will help you learn to avoid syntax errors. See
the preface for details on how to obtain the book’s example files.
Command-Line Usage Variations
Because this scheme uses shell command lines to start Python programs, all the usual
shell syntax applies. For instance, you can route the printed output of a Python script
to a file to save it for later use or inspection by using special shell syntax:
% python script1.py > saveit.txt
In this case, the three output lines shown in the prior run are stored in the file sa-
veit.txt instead of being printed. This is generally known as stream redirection; it works
for input and output text and is available on Windows and Unix-like systems. This is
nice for testing, as you can write programs that watch for changes in other programs’
outputs. It also has little to do with Python, though (Python simply supports it), so we
will skip further details on shell redirection syntax here.
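To check that the redirect worked, display the saved file with your shell's usual tools
(type on Windows, cat on Unix); it holds the same three lines shown in the prior run:

% type saveit.txt
win32
1267650600228229401496703205376
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!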
If you are working on a Windows platform, this example works the same, but the system
prompt is normally different as described earlier:
C:\code> python script1.py
win32
1267650600228229401496703205376
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!
As usual, if you haven’t set your PATH environment variable to include the full directory
path to python, be sure to include this in your command, or run a change-directory
command to go to the path first:
C:\code> C:\python33\python script1.py
win32
1267650600228229401496703205376
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!
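If typing the full path grows tedious, one workaround is to extend PATH for the current Command Prompt session only; the install directory shown here is an assumption, so adjust it to match your own Python:

C:\code> set PATH=%PATH%;C:\Python33
C:\code> python script1.py
win32
1267650600228229401496703205376
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!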
Alternatively, if you’re using the Windows launcher new in Python 3.3 (described ear-
lier), a py command will have the same effect, but does not require a directory path or
PATH settings, and allows you to specify Python version numbers on the command line
too:
c:\code> py -3 script1.py
win32
1267650600228229401496703205376
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!
On all recent versions of Windows, you can also type just the name of your script, and
omit the name of Python itself. Because newer Windows systems use the Windows
Registry (a.k.a. filename associations) to find a program with which to run a file, you
don’t need to name “python” or “py” on the command line explicitly to run a .py file.
The prior command, for example, could be simplified to the following on most Win-
dows machines, and will automatically be run by python prior to 3.3, and by py in 3.3
and later—just as though you had clicked on the file’s icon in Explorer (more on this
option ahead):
C:\code> script1.py
Finally, remember to give the full path to your script file if it lives in a different directory
from the one in which you are working. For example, the following system command
line, run from D:\other, assumes Python is in your system path but runs a file located
elsewhere:
C:\code> cd D:\other
D:\other> python c:\code\script1.py
If your PATH doesn’t include Python’s directory, you’re not using the Windows
launcher’s py program, and neither Python nor your script file is in the directory you’re
working in, use full paths for both:
D:\other> C:\Python33\python c:\code\script1.py
Usage Notes: Command Lines and Files
Running program files from system command lines is a fairly straightforward launch
option, especially if you are familiar with command lines in general from prior work.
It’s also perhaps the most portable way to run Python programs since nearly every
computer has some notion of a command line and directory structure. For newcomers,
though, here are a few pointers about common beginner traps that might help you
avoid some frustration:
Beware of automatic extensions on Windows and IDLE. If you use the Note-
pad program to code program files on Windows, be careful to pick the type All
Files when it comes time to save your file, and give the file a .py suffix explicitly.
Otherwise, Notepad will save your file with a .txt extension (e.g., as
script1.py.txt), making it difficult to use in some schemes; it won’t be importable,
for example.
Worse, Windows hides file extensions by default, so unless you have changed your
view options you may not even notice that you’ve coded a text file and not a Python
file. The file’s icon may give this away—if it doesn’t have a snake of some sort on
it, you may have trouble. Uncolored code in IDLE and files that open to edit instead
of run when clicked are other symptoms of this problem.
Microsoft Word similarly adds a .doc extension by default; much worse, it adds
formatting characters that are not legal Python syntax. As a rule of thumb, always
pick All Files when saving under Windows, or use a more programmer-friendly
text editor such as IDLE. IDLE does not even add a .py suffix automatically—a
feature some programmers tend to like, but some users do not.
Use file extensions and directory paths at system prompts, but not for im-
ports. Don’t forget to type the full name of your file in system command lines—
that is, use python script1.py rather than python script1. By contrast, Python’s
import statements, which we’ll meet later in this chapter, omit both the .py file
suffix and the directory path (e.g., import script1). This may seem trivial, but
confusing these two is a common mistake.
At the system prompt, you are in a system shell, not Python, so Python’s module
file search rules do not apply. Because of that, you must include both the .py ex-
tension and, if necessary, the full directory path leading to the file you wish to run.
For instance, to run a file that resides in a different directory from the one in which
you are working, you would typically list its full path (e.g., python d:\tests
\spam.py). Within Python code, however, you can just say import spam and rely on
the Python module search path to locate your file, as described later.
Use print statements in files. Yes, we’ve already been over this, but it is such a
common mistake that it’s worth repeating at least once here. Unlike in interactive
coding, you generally must use print statements to see output from program files.
If you don’t see any output, make sure you’ve said “print” in your file. print state-
ments are not required in an interactive session, since Python automatically echoes
expression results; prints don’t hurt here, but are superfluous typing.
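To make the last of these notes concrete, here is a minimal sketch; the file name silent.py is hypothetical. The first statement computes a value but displays nothing when the file is run, while the second uses print and actually produces output:

# silent.py (hypothetical file)
2 ** 100            # computed, but displays nothing when run as a file
print(2 ** 100)     # print is required to see results from a file

% python silent.py
1267650600228229401496703205376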
Unix-Style Executable Scripts: #!
Our next launching technique is really a specialized form of the prior, which, despite
this section’s title, can apply to program files run on both Unix and Windows today.
Since it has its roots on Unix, let’s begin this story there.
Unix Script Basics
If you are going to use Python on a Unix, Linux, or Unix-like system, you can also turn
files of Python code into executable programs, much as you would for programs coded
in a shell language such as csh or ksh. Such files are usually called executable scripts. In
simple terms, Unix-style executable scripts are just normal text files containing Python
statements, but with two special properties:
Their first line is special. Scripts usually start with a line that begins with the
characters #! (often called “hash bang” or “shebang”), followed by the path to the
Python interpreter on your machine.
They usually have executable privileges. Script files are usually marked as ex-
ecutable to tell the operating system that they may be run as top-level programs.
On Unix systems, a command such as chmod +x file.py usually does the trick.
Let’s look at an example for Unix-like systems. Use your text editor again to create a
file of Python code called brian:
#!/usr/local/bin/python
print('The Bright Side ' + 'of Life...') # + means concatenate for strings
The special line at the top of the file tells the system where the Python interpreter lives.
Technically, the first line is a Python comment. As mentioned earlier, all comments in
Python programs start with a # and span to the end of the line; they are a place to insert
extra information for human readers of your code. But when a comment such as the
first line in this file appears, it’s special on Unix because the operating system shell uses
it to find an interpreter for running the program code in the rest of the file.
Also, note that this file is called simply brian, without the .py suffix used for the module
file earlier. Adding a .py to the name wouldn’t hurt (and might help you remember that
this is a Python program file), but because you don’t plan on letting other modules
import the code in this file, the name of the file is irrelevant. If you give the file executable
privileges with a chmod +x brian shell command, you can run it from the operating
system shell as though it were a binary program (for the following, either make
sure ., the current directory, is in your system PATH setting, or run this with ./brian):
% brian
The Bright Side of Life...
The Unix env Lookup Trick
On some Unix systems, you can avoid hardcoding the path to the Python interpreter
in your script file by writing the special first-line comment like this:
#!/usr/bin/env python
...script goes here...
When coded this way, the env program locates the Python interpreter according to your
system search path settings (in most Unix shells, by looking in all the directories listed
in your PATH environment variable). This scheme can be more portable, as you don’t
need to hardcode a Python install path in the first line of all your scripts. That way, if
your scripts ever move to a new machine, or your Python ever moves to a new location,
you must update just PATH, not all your scripts.
Provided you have access to env everywhere, your scripts will run no matter where
Python lives on your system. In fact, this env form is generally recommended today over
even something as generic as /usr/bin/python, because some platforms may install
Python elsewhere. Of course, this assumes that env lives in the same place everywhere
(on some machines, it may be in /sbin, /bin, or elsewhere); if not, all portability bets are
off!
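To make this concrete, here is a sketch of a portable variant of the earlier script, assuming env lives in /usr/bin on your machine; the file name brian2 is hypothetical:

#!/usr/bin/env python
print('The Bright Side ' + 'of Life...')   # same code; env finds the interpreter

% chmod +x brian2
% ./brian2
The Bright Side of Life...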
The Python 3.3 Windows Launcher: #! Comes to Windows
A note for Windows users running Python 3.2 and earlier: the method described here
is a Unix trick, and it may not work on your platform. Not to worry; just use the basic
command-line technique explored earlier. List the file’s name on an explicit python
command line:1
C:\code> python brian
The Bright Side of Life...
In this case, you don’t need the special #! comment at the top (although Python just
ignores it if it’s present), and the file doesn’t need to be given executable privileges. In
fact, if you want to run files portably between Unix and Microsoft Windows, your life
will probably be simpler if you always use the basic command-line approach, not Unix-
style scripts, to launch programs.
If you’re using Python 3.3 or later, though, or have its Windows launcher installed
separately, it turns out that Unix-style #! lines do mean something on Windows too.
Besides offering the py executable described earlier, the new Windows launcher attempts to parse #! lines to determine which Python version to launch
to run your script’s code. Moreover, it allows you to give the version number in full or
partial forms, and recognizes most common Unix patterns for this line, including
the /usr/bin/env form.
The launcher’s #! parsing mechanism is applied when you run scripts from command
lines with the py program, and when you click Python file icons (in which case py is run
implicitly by filename associations). Unlike Unix, you do not need to mark files with
executable privileges for this to work on Windows, because filename associations ach-
ieve similar results.
For example, the first of the following is run by Python 3.X and the second by 2.X
(without an explicit number, the launcher defaults to 2.X unless you set a PY_PYTHON
environment variable):
c:\code> type robin3.py
#!/usr/bin/python3
print('Run', 'away!...') # 3.X function
c:\code> py robin3.py # Run file per #! line version
Run away!...
c:\code> type robin2.py
#!python2
print 'Run', 'away more!...' # 2.X statement
c:\code> py robin2.py # Run file per #! line version
Run away more!...
This works in addition to passing versions on command lines—we saw this briefly
earlier for starting the interactive prompt, but it works the same when launching a script
file:
c:\code> py -3.1 robin3.py # Run per command-line argument
Run away!...
The net effect is that the launcher allows Python versions to be specified on both a per-file and per-command basis, by using #! lines and command-line arguments, respectively. At least that's the very short version of the launcher's story. If you're using Python 3.3 or later on Windows or may in the future, I recommend a side trip to the full launcher story in Appendix B if you haven't made one already.

1. As we discussed when exploring command lines, all recent Windows versions also let you type just the name of a .py file at the system command line—they use the Registry to determine that the file should be opened with Python (e.g., typing brian.py is equivalent to typing python brian.py). This command-line mode is similar in spirit to the Unix #!, though it is system-wide on Windows, not per-file. It also requires an explicit .py extension: filename associations won't work without it. Some programs may actually interpret and use a first #! line on Windows much like on Unix (including Python 3.3's Windows launcher), but the system shell on Windows itself simply ignores it.
Clicking File Icons
If you’re not a fan of command lines, you can generally avoid them by launching Python
scripts with file icon clicks, development GUIs, and other schemes that vary per plat-
form. Let’s take a quick look at the first of these alternatives here.
Icon-Click Basics
Icon clicks are supported on most platforms in one form or another. Here’s a rundown
of how these might be structured on your computer:
Windows icon clicks
On Windows, the Registry makes opening files with icon clicks easy. When in-
stalled, Python uses Windows filename associations to automatically register itself
to be the program that opens Python program files when they are clicked. Because
of that, it is possible to launch the Python programs you write by simply clicking
(or double-clicking) on their file icons with your mouse cursor.
Specifically, a clicked file will be run by one of two Python programs, depending
on its extension and the Python you're running. In Pythons 3.2 and earlier, .py files are run by python.exe with a console (Command Prompt) window, and .pyw files are run by pythonw.exe without a console. Byte code files are also run by these programs if clicked. Per Appendix B, in Python 3.3 and later (and where it's installed separately), the new Windows launcher's py.exe and pyw.exe programs serve the same roles, opening .py and .pyw files, respectively.
Non-Windows icon clicks
On non-Windows systems, you will probably be able to perform a similar feat, but
the icons, file explorer navigation schemes, and more may differ slightly. On Mac
OS X, for instance, you might use PythonLauncher in the MacPython (or Python N.M) folder of your Applications folder to run scripts by clicking them in Finder.
On some Linux and other Unix systems, you may need to register the .py extension
with your file explorer GUI, make your script executable using the #! line scheme
of the preceding section, or associate the file MIME type with an application or
command by editing files, installing programs, or using other tools. See your file
explorer’s documentation for more details.
In other words, icon clicks generally work as you’d expect for your platform, but be
sure to see the platform usage documentation “Python Setup and Usage” in Python’s
standard manual set for more details as needed.
Clicking Icons on Windows
To illustrate, let’s keep using the script we wrote earlier, script1.py, repeated here to
minimize page flipping:
# A first Python script
import sys # Load a library module
print(sys.platform)
print(2 ** 100) # Raise 2 to a power
x = 'Spam!'
print(x * 8) # String repetition
As we’ve seen, you can always run this file from a system command line:
C:\code> python script1.py
win32
1267650600228229401496703205376
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!
However, icon clicks allow you to run the file without any typing at all. To do so, you
have to find this file’s icon on your computer. On Windows 8, you might right-click
the screen’s lower-left corner to open a File Explorer. On earlier Windows, you can
select Computer (or My Computer in XP) in your Start button’s menu. There are ad-
ditional ways to open a file explorer; once you do, work your way down on the C drive
to your working directory.
At this point, you should have a file explorer window similar to that captured in Figure 3-1 (Windows 8 is being used here). Notice how the icons for Python files show up:
Source files have white backgrounds on Windows.
Byte code files show with black backgrounds.
Per the prior chapter, I created the byte code file in this figure by importing in Python
3.1; 3.2 and later instead store byte code files in the __pycache__ subdirectory also
shown here, which I created by importing in 3.3 too. You will normally want to click
(or otherwise run) the white source code files in order to pick up your most recent
changes, not the byte code files—Python won’t check the source code file for changes
if you launch byte code directly. To launch the file here, simply click on the icon for
script1.py.
The input Trick on Windows
Unfortunately, on Windows, the result of clicking on a file icon may not be incredibly
satisfying. In fact, as it is, this example script might generate a perplexing “flash” when
clicked—not exactly the sort of feedback that budding Python programmers usually
hope for! This is not a bug, but has to do with the way the Windows version of Python
handles printed output.
By default, Python generates a pop-up black DOS console window (Command Prompt)
to serve as a clicked file’s input and output. If a script just prints and exits, well, it just
prints and exits—the console window appears, and text is printed there, but the console
window closes and disappears on program exit. Unless you are very fast, or your ma-
chine is very slow, you won’t get to see your output at all. Although this is normal
behavior, it’s probably not what you had in mind.
Luckily, it’s easy to work around this. If you need your script’s output to stick around
when you launch it with an icon click, simply put a call to the built-in input function
at the very bottom of the script in 3.X (in 2.X use the name raw_input instead: see the
note ahead). For example:
# A first Python script
import sys # Load a library module
print(sys.platform)
print(2 ** 100) # Raise 2 to a power
x = 'Spam!'
print(x * 8) # String repetition
input() # <== ADDED
In general, input reads and returns the next line of standard input, waiting if there is
none yet available. The net effect in this context will be to pause the script, thereby
keeping the output window shown in Figure 3-2 open until you press the Enter key.
Now that I’ve shown you this trick, keep in mind that it is usually only required for
Windows, and then only if your script prints text and exits and only if you will launch
the script by clicking its file icon. You should add this call to the bottom of your top-
level files if and only if all of these three conditions apply. There is no reason to add
this call in any other contexts, such as scripts you’ll run in command lines or the IDLE
GUI (unless you’re unreasonably fond of pressing your computer’s Enter key!).2 That
may sound obvious, but it’s been another common mistake in live classes.
Figure 3-1. On Windows, Python program files show up as icons in file explorer windows and can
automatically be run with a double-click of the mouse (though you might not see printed output or
error messages this way).
Before we move ahead, note that the input call applied here is the input counterpart of
using the print function (and 2.X statement) for outputs. It is the simplest way to read
user input, and it is more general than this example implies. For instance, input:
Optionally accepts a string that will be printed as a prompt (e.g., input('Press
Enter to exit'))
Returns to your script a line of text read as a string (e.g., nextinput = input())
Supports input stream redirections at the system shell level (e.g., python spam.py
< input.txt), just as the print statement does for output
We’ll use input in more advanced ways later in this text; for instance, Chapter 10 will
apply it in an interactive loop. For now, it will help you see the output of simple scripts
that you click to launch.
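A short interactive session may help illustrate these behaviors; the text after the prompt string here is typed by the user:

>>> reply = input('Press Enter to exit: ')    # optional prompt string
Press Enter to exit: hello
>>> reply                                     # the line comes back as a string
'hello'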
Version skew note: If you are working in Python 2.X, use raw_input()
instead of input() in this code. The former was renamed to the latter in
Python 3.X. Technically, 2.X has an input function too, but it also eval-
uates strings as though they are program code typed into a script, and
so will not work in this context (an empty string is an error). Python
3.X’s input (and 2.X’s raw_input) simply returns the entered text as a
character string, unevaluated. To simulate 2.X’s input in 3.X, use
eval(input()).
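To illustrate, here is a short 3.X sketch of the difference; the second line in each exchange is typed by the user:

>>> input()           # 3.X: returns the entered text as a string, unevaluated
2 ** 8
'2 ** 8'
>>> eval(input())     # simulates 2.X input: evaluates the entered text as code
2 ** 8
256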
Be aware, though, that because this runs the entered text as though it were program code, this may have security implications that we'll largely ignore here, except to say that you should trust the source of the entered text; if you don't, stick to just plain input in 3.X and raw_input in 2.X.

Figure 3-2. When you click a program's icon on Windows, you will be able to see its printed output if you include an input call at the very end of the script. But you only need to do so in this one context!

2. Conversely, it is also possible to completely suppress the pop-up console window (a.k.a. Command Prompt) for clicked files on Windows when you don't want to see printed text. Files whose names end in a .pyw extension will display only windows constructed by your script, not the default console window. .pyw files are simply .py source files that have this special operational behavior on Windows. They are mostly used for Python-coded user interfaces that build windows of their own, often in conjunction with various techniques for saving printed output and errors to files. As implied earlier, Python achieves this when it is installed by associating a special executable (pythonw.exe in 3.2 and earlier and pyw.exe as of 3.3) to open .pyw files when clicked.
Other Icon-Click Limitations
Even with the prior section’s input trick, clicking file icons is not without its perils. You
also may not get to see Python error messages. If your script generates an error, the
error message text is written to the pop-up console window—which then immediately
disappears! Worse, adding an input call to your file will not help this time because your
script will likely abort long before it reaches this call. In other words, you won’t be able
to tell what went wrong.
When we discuss exceptions later in this book, you’ll learn that it is possible to write
code to intercept, process, and recover from errors so that they do not terminate your
programs. Watch for the discussion of the try statement later in this book for an al-
ternative way to keep the console window from closing on errors. We’ll also learn how
to redirect printed text to files for later inspection when we study print operations.
Barring such support in your code, though, errors and prints disappear for clicked
programs.
Because of these limitations, it is probably best to view icon clicks as a way to launch
programs after they have been debugged, or have been instrumented to write their
output to a file and catch and process any important errors. Especially when you’re
starting out, I recommend using other techniques—such as system command lines and
IDLE (discussed further in the section “The IDLE User Interface” on page 73)—so
that you can see generated error messages and view your normal output without re-
sorting to extra coding.
Module Imports and Reloads
So far, I’ve been talking about “importing modules” without really explaining what this
term means. We’ll study modules and larger program architecture in depth in Part V,
but because imports are also a way to launch programs, this section will introduce
enough module basics to get you started.
Import and Reload Basics
In simple terms, every file of Python source code whose name ends in a .py extension
is a module. No special code or syntax is required to make a file a module: any such
file will do. Other files can access the items a module defines by importing that module
—import operations essentially load another file and grant access to that file’s contents.
The contents of a module are made available to the outside world through its attributes
(a term I’ll define in the next section).
This module-based services model turns out to be the core idea behind program archi-
tecture in Python. Larger programs usually take the form of multiple module files, which
import tools from other module files. One of the modules is designated as the main or
top-level file, or “script”—the file launched to start the entire program, which runs line
by line as usual. Below this level, it’s all modules importing modules.
We’ll delve into such architectural issues in more detail later in this book. This chapter
is mostly interested in the fact that import operations run the code in a file that is being
loaded as a final step. Because of this, importing a file is yet another way to launch it.
For instance, if you start an interactive session (from a system command line or other-
wise), you can run the script1.py file you created earlier with a simple import (be sure
to delete the input line you added in the prior section first, or you’ll need to press Enter
for no reason):
C:\code> C:\python33\python
>>> import script1
win32
1267650600228229401496703205376
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!
This works, but only once per session (really, process—a program run) by default. After
the first import, later imports do nothing, even if you change and save the module’s
source file again in another window:
...Change script1.py in a text edit window to print 2 ** 16...
>>> import script1
>>> import script1
This is by design; imports are too expensive an operation to repeat more than once per
file, per program run. As you’ll learn in Chapter 22, imports must find files, compile
them to byte code, and run the code.
If you really want to force Python to run the file again in the same session without
stopping and restarting the session, you need to instead call the reload function avail-
able in the imp standard library module (this function is also a simple built-in in Python
2.X, but not in 3.X):
>>> from imp import reload # Must load from module in 3.X (only)
>>> reload(script1)
win32
65536
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!
<module 'script1' from '.\\script1.py'>
>>>
The from statement here simply copies a name out of a module (more on this soon).
The reload function itself loads and runs the current version of your file’s code, picking
up changes if you’ve modified and saved it in another window.
This allows you to edit and pick up new code on the fly within the current Python
interactive session. In this session, for example, the second print statement in
script1.py was changed in another window to print 2 ** 16 between the time of the
first import and the reload call—hence the different result.
The reload function expects the name of an already loaded module object, so you have
to have successfully imported a module once before you reload it (if the import reported
an error, you can’t yet reload and must import again). Notice that reload also expects
parentheses around the module object name, whereas import does not. reload is a
function that is called, and import is a statement.
That’s why you must pass the module name to reload as an argument in parentheses,
and that’s why you get back an extra output line when reloading—the last output line
is just the display representation of the reload call’s return value, a Python module
object. We’ll learn more about using functions in general in Chapter 16; for now, when
you hear “function,” remember that parentheses are required to run a call.
Version skew note: Python 3.X moved the reload built-in function to the
imp standard library module. It still reloads files as before, but you must
import it in order to use it. In 3.X, run an import imp and use
imp.reload(M), or run a from imp import reload and use reload(M), as
shown here. We’ll discuss import and from statements in the next sec-
tion, and more formally later in this book.
If you are working in Python 2.X, reload is available as a built-in func-
tion, so no import is required. In Python 2.6 and 2.7, reload is available
in both forms—built-in and module function—to aid the transition to
3.X. In other words, reloading is still available in 3.X, but an extra line
of code is required to fetch the reload call.
The move in 3.X was likely motivated in part by some well-known issues
involving reload and from statements that we’ll encounter in the next
section. In short, names loaded with a from are not directly updated by
a reload, but names accessed with an import statement are. If your
names don’t seem to change after a reload, try using import and mod
ule.attribute name references instead.
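As a minimal sketch of that issue, assume a hypothetical module mod.py containing just the assignment message = 'original', which you edit to assign 'changed' partway through the session (the file path in the echoed reload result varies per platform):

>>> import mod
>>> from mod import message        # copies the name out of the module
...edit mod.py to assign message = 'changed'...
>>> from imp import reload
>>> reload(mod)                    # reruns the file's code in the module
<module 'mod' from '.\\mod.py'>
>>> mod.message                    # attribute fetches see the new value
'changed'
>>> message                        # the from-copied name still references the old object
'original'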
The Grander Module Story: Attributes
Imports and reloads provide a natural program launch option because import opera-
tions execute files as a last step. In the broader scheme of things, though, modules serve
the role of libraries of tools, as you’ll learn in detail in Part V. The basic idea is straight-
forward, though: a module is mostly just a package of variable names, known as a
namespace, and the names within that package are called attributes. An attribute is
simply a variable name that is attached to a specific object (like a module).
In more concrete terms, importers gain access to all the names assigned at the top level
of a module’s file. These names are usually assigned to tools exported by the module
—functions, classes, variables, and so on—that are intended to be used in other files
and other programs. Externally, a module file’s names can be fetched with two Python
statements, import and from, as well as the reload call.
To illustrate, use a text editor to create a one-line Python module file called myfile.py
in your working directory, with the following contents:
title = "The Meaning of Life"
This may be one of the world’s simplest Python modules (it contains a single assignment
statement), but it’s enough to illustrate the point. When this file is imported, its code
is run to generate the module’s attribute. That is, the assignment statement creates a
variable and module attribute named title.
You can access this module’s title attribute in other components in two different ways.
First, you can load the module as a whole with an import statement, and then qualify
the module name with the attribute name to fetch it (note that we’re letting the inter-
preter print automatically here):
% python # Start Python
>>> import myfile # Run file; load module as a whole
>>> myfile.title # Use its attribute names: '.' to qualify
'The Meaning of Life'
In general, the dot expression syntax object.attribute lets you fetch any attribute
attached to any object, and is one of the most common operations in Python code.
Here, we’ve used it to access the string variable title inside the module myfile—in
other words, myfile.title.
Alternatively, you can fetch (really, copy) names out of a module with from statements:
% python # Start Python
>>> from myfile import title # Run file; copy its names
>>> title # Use name directly: no need to qualify
'The Meaning of Life'
As you’ll see in more detail later, from is just like an import, with an extra assignment
to names in the importing component. Technically, from copies a module’s attributes,
such that they become simple variables in the recipient—thus, you can simply refer to
the imported string this time as title (a variable) instead of myfile.title (an attribute
reference).3
Whether you use import or from to invoke an import operation, the statements in the
module file myfile.py are executed, and the importing component (here, the interactive
prompt) gains access to names assigned at the top level of the file. There's only one such name in this simple example—the variable title, assigned to a string—but the concept will be more useful when you start defining objects such as functions and classes in your modules: such objects become reusable software components that can be accessed by name from one or more client modules.

3. Notice that import and from both list the name of the module file as simply myfile without its .py extension suffix. As you'll learn in Part V, when Python looks for the actual file, it knows to include the suffix in its search procedure. Again, you must include the .py suffix in system shell command lines, but not in import statements.
In practice, module files usually define more than one name to be used in and outside
the files. Here’s an example that defines three:
a = 'dead' # Define three attributes
b = 'parrot' # Exported to other files
c = 'sketch'
print(a, b, c) # Also used in this file (in 2.X: print a, b, c)
This file, threenames.py, assigns three variables, and so generates three attributes for
the outside world. It also uses its own three variables in a 3.X print statement, as we
see when we run this as a top-level file (in Python 2.X print differs slightly, so omit its
outer parentheses to match the output here exactly; watch for a more complete ex-
planation of this in Chapter 11):
% python threenames.py
dead parrot sketch
All of this file’s code runs as usual the first time it is imported elsewhere, by either an
import or from. Clients of this file that use import get a module with attributes, while
clients that use from get copies of the file’s names:
% python
>>> import threenames # Grab the whole module: it runs here
dead parrot sketch
>>>
>>> threenames.b, threenames.c # Access its attributes
('parrot', 'sketch')
>>>
>>> from threenames import a, b, c # Copy multiple names out
>>> b, c
('parrot', 'sketch')
The results here are printed in parentheses because they are really tuples—a kind of
object created by the comma in the inputs (and covered in the next part of this book)
—that you can safely ignore for now.
Once you start coding modules with multiple names like this, the built-in dir function
starts to come in handy—you can use it to fetch a list of all the names available inside
a module. The following returns a Python list of strings in square brackets (we’ll start
studying lists in the next chapter):
>>> dir(threenames)
['__builtins__', '__doc__', '__file__', '__name__', '__package__', 'a', 'b', 'c']
The contents of this list have been edited here because they vary per Python version.
The point to notice here is that when the dir function is called with the name of an
imported module in parentheses like this, it returns all the attributes inside that module.
Some of the names it returns are names you get “for free”: names with leading and
trailing double underscores (__X__) are built-in names that are always predefined by
Python and have special meaning to the interpreter, but they aren’t important at this
point in this book. The variables our code defined by assignment—a, b, and c—show
up last in the dir result.
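If the built-in names clutter the result, a preview of tools covered in later chapters can filter them out; this sketch uses a list comprehension and a string method we haven't formally met yet:

>>> [name for name in dir(threenames) if not name.startswith('__')]
['a', 'b', 'c']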
Modules and namespaces
Module imports are a way to run files of code, but, as we’ll expand on later in the book,
modules are also the largest program structure in Python programs, and one of the first
key concepts in the language.
As we’ve seen, Python programs are composed of multiple module files linked together
by import statements, and each module file is a package of variables—that is, a name-
space. Just as importantly, each module is a self-contained namespace: one module file
cannot see the names defined in another file unless it explicitly imports that other file.
Because of this, modules serve to minimize name collisions in your code—because each
file is a self-contained namespace, the names in one file cannot clash with those in
another, even if they are spelled the same way.
In fact, as you’ll see, modules are one of a handful of ways that Python goes to great
lengths to package your variables into compartments to avoid name clashes. We’ll
discuss modules and other namespace constructs—including local scopes defined by
classes and functions—further later in the book. For now, modules will come in handy
as a way to run your code many times without having to retype it, and will prevent your
file’s names from accidentally replacing each other.
import versus from: I should point out that the from statement in a sense
defeats the namespace partitioning purpose of modules—because the
from copies variables from one file to another, it can cause same-named
variables in the importing file to be overwritten, and won’t warn you if
it does. This essentially collapses namespaces together, at least in terms
of the copied variables.
Because of this, some recommend always using import instead of from.
I won’t go that far, though; not only does from involve less typing (an
asset at the interactive prompt), but its purported problem is relatively
rare in practice. Besides, this is something you control by listing the
variables you want in the from; as long as you understand that they’ll be
assigned to values in the target module, this is no more dangerous than
coding assignment statements—another feature you’ll probably want
to use!
Usage Notes: import and reload
For some reason, once people find out about running files using import and reload,
many tend to focus on this alone and forget about other launch options that always
run the current version of the code (e.g., icon clicks, IDLE menu options, and system
command lines). This approach can quickly lead to confusion, though—you need to
remember when you’ve imported to know if you can reload, you need to remember to
use parentheses when you call reload (only), and you need to remember to use
reload in the first place to get the current version of your code to run. Moreover, reloads
aren’t transitive—reloading a module reloads that module only, not any modules it
may import—so you sometimes have to reload multiple files.
Because of these complications (and others we’ll explore later, including the reload/
from issue mentioned briefly in a prior note in this chapter), it’s generally a good idea
to avoid the temptation to launch by imports and reloads for now. The IDLE Run→Run Module menu option described in the next section, for example, provides a
simpler and less error-prone way to run your files, and always runs the current version
of your code. System shell command lines offer similar benefits. You don’t need to use
reload if you use any of these other techniques.
In addition, you may run into trouble if you use modules in unusual ways at this point
in the book. For instance, if you want to import a module file that is stored in a directory
other than the one you’re working in, you’ll have to skip ahead to Chapter 22 and learn
about the module search path. For now, if you must import, try to keep all your files in
the directory you are working in to avoid complications.4
That said, imports and reloads have proven to be a popular testing technique in Python
classes, and you may prefer using this approach too. As usual, though, if you find
yourself running into a wall, stop running into a wall!
Using exec to Run Module Files
Strictly speaking, there are more ways to run code stored in module files than have yet
been presented here. For instance, the exec(open('module.py').read()) built-in func-
tion call is another way to launch files from the interactive prompt without having to
import and later reload. Each such exec runs the current version of the code read from
a file, without requiring later reloads (script1.py is as we left it after a reload in the prior
section):
% python
>>> exec(open('script1.py').read())
win32
65536
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!
...Change script1.py in a text edit window to print 2 ** 32...
>>> exec(open('script1.py').read())
win32
4294967296
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!

4. If you're too curious to wait, the short story is that Python searches for imported modules in every directory listed in sys.path—a Python list of directory name strings in the sys module, which is initialized from a PYTHONPATH environment variable, plus a set of standard directories. If you want to import from a directory other than the one you are working in, that directory must generally be listed in your PYTHONPATH setting. For more details, see Chapter 22 and Appendix A.
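Incidentally, you can inspect the search path the footnote describes interactively; the entries are machine- and install-specific, so the output shown here is illustrative only:

>>> import sys
>>> sys.path          # directory name strings searched by imports; varies per machine
['', ...more directory name strings here...]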
The exec call has an effect similar to an import, but it doesn’t actually import the module
—by default, each time you call exec this way it runs the file’s code anew, as though
you had pasted it in at the place where exec is called. Because of that, exec does not
require module reloads after file changes—it skips the normal module import logic.
On the downside, because it works as if you’ve pasted code into the place where it is
called, exec, like the from statement mentioned earlier, has the potential to silently
overwrite variables you may currently be using. For example, our script1.py assigns to
a variable named x. If that name is also being used in the place where exec is called, the
name’s value is replaced:
>>> x = 999
>>> exec(open('script1.py').read()) # Code run in this namespace by default
...same output...
>>> x # Its assignments can overwrite names here
'Spam!'
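If you want exec's run-every-time convenience without the overwrites, one standard option (not used elsewhere in this chapter) is to pass an explicit dictionary for the code's assignments to land in:

>>> x = 999
>>> scope = {}                                # a fresh namespace for the file's code
>>> exec(open('script1.py').read(), scope)    # assignments go to scope, not here
win32
4294967296
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!
>>> x                                         # our variable is untouched
999
>>> scope['x']                                # the script's x lives in the dictionary
'Spam!'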
By contrast, the basic import statement runs the file only once per process, and it makes
the file a separate module namespace so that its assignments will not change variables
in your scope. The price you pay for the namespace partitioning of modules is the need
to reload after changes.
Version skew note: Python 2.X also includes an execfile('module.py')
built-in function, in addition to allowing the form exec(open('module.py')), which both automatically read the file's content. Both of
these are equivalent to the exec(open('module.py').read()) form,
which is more complex but runs in both 2.X and 3.X.
Unfortunately, neither of these two simpler 2.X forms is available in 3.X,
which means you must understand both files and their read methods to
fully understand this technique today (this seems to be a case of aes-
thetics trouncing practicality in 3.X). In fact, the exec form in 3.X in-
volves so much typing that the best advice may simply be not to do it—
it’s usually easier to launch files by typing system shell command lines
or by using the IDLE menu options described in the next section.
For more on the file interfaces used by the 3.X exec form, see Chap-
ter 9. For more on exec and its cohorts, eval and compile, see Chap-
ter 10 and Chapter 25.
The IDLE User Interface
So far, we’ve seen how to run Python code with the interactive prompt, system com-
mand lines, Unix-style scripts, icon clicks, module imports, and exec calls. If you’re
looking for something a bit more visual, IDLE provides a graphical user interface for
doing Python development, and it’s a standard and free part of the Python system. IDLE
is usually referred to as an integrated development environment (IDE), because it binds
together various development tasks into a single view.5
In short, IDLE is a desktop GUI that lets you edit, run, browse, and debug Python
programs, all from a single interface. It runs portably on most Python platforms, in-
cluding Microsoft Windows, X Windows (for Linux, Unix, and Unix-like platforms),
and the Mac OS (both Classic and OS X). For many, IDLE represents an easy-to-use
alternative to typing command lines, a less problem-prone alternative to clicking on
icons, and a great way for newcomers to get started editing and running code. You’ll
sacrifice some control in the bargain, but this typically becomes important later in your
Python career.
IDLE Startup Details
Most readers should be able to use IDLE immediately, as it is a standard component
on Mac OS X and most Linux installations today, and is installed automatically with
standard Python on Windows. Because platform specifics vary, though, I need to give
a few pointers before we open the GUI.
Technically, IDLE is a Python program that uses the standard library’s tkinter GUI
toolkit (named Tkinter in Python 2.X) to build its windows. This makes IDLE portable
—it works the same on all major desktop platforms—but it also means that you’ll need
to have tkinter support in your Python to use IDLE. This support is standard on Win-
dows, Macs, and Linux, but it comes with a few caveats on some systems, and startup
can vary per platform. Here are a few platform-specific tips:
On Windows 7 and earlier, IDLE is easy to start—it’s always present after a Python
install, and has an entry in the Start button menu for Python in Windows 7 and
earlier (see Figure 2-1, shown previously). You can also select it by right-clicking
on a Python program icon, and launch it by clicking on the icon for the files
idle.pyw or idle.py located in the idlelib subdirectory of Python’s Lib directory. In
this mode, IDLE is a clickable Python script that lives in C:\Python33\Lib\idlelib,
C:\Python27\Lib\idlelib, or similar, which you can drag out to a shortcut for one-
click access if desired.
On Windows 8, look for IDLE in your Start tiles, by a search for “idle,” by browsing
your “All apps” Start screen display, or by using File Explorer to find the idle.py
file mentioned earlier. You may want a shortcut here, as you have no Start button
menu in desktop mode (at least today; see Appendix A for more pointers).
On Mac OS X everything required for IDLE is present as standard components in
your operating system. IDLE should be available to launch in Applications under
the MacPython (or Python N.M) program folder. One note here: some OS X versions may require installing updated tkinter support due to subtle version dependencies I'll spare readers from here; see python.org's Download page for details.

On Linux IDLE is also usually present as a standard component today. It might take the form of an idle executable or script in your path; type this in a shell to check. On some machines, it may require an install (see Appendix A for pointers), and on others you may need to launch IDLE's top-level script from a command line or icon click: run the file idle.py located in the idlelib subdirectory of Python's /usr/lib directory (run a find for the exact location).

5. IDLE is officially a corruption of IDE, but it's really named in honor of Monty Python member Eric Idle. See Chapter 1 if you're not sure why.
Because IDLE is just a Python script on the module search path in the standard library,
you can also generally run it on any platform and from any directory by typing the
following in a system command shell window (e.g., in a Command Prompt on Win-
dows), though you’ll have to see Appendix A for more on Python’s –m flag, and
Part V for more on the “.” package syntax required here (blind trust will suffice at this
point in the book):
c:\code> python -m idlelib.idle # Run idle.py in a package on module path
For more on install issues and usage notes for Windows and other platforms, be sure to see both Appendix A and the notes for your platform in “Python Setup and Usage” in Python's standard manuals.
IDLE Basic Usage
Let’s jump into an example. Figure 3-3 shows the scene after you start IDLE on Win-
dows. The Python shell window that opens initially is the main window, which runs
an interactive session (notice the >>> prompt). This works like all interactive sessions
—code you type here is run immediately after you type it—and serves as a testing and
experimenting tool.
IDLE uses familiar menus with keyboard shortcuts for most of its operations. To make a new script file under IDLE, use File→New: in the main shell window, select the File pull-down menu, and pick New to open a new text edit window where you can type, save, and run your file's code. Use File→Open... instead to open a new text edit window displaying an existing file's code to edit and run.
Although it may not show up fully in this book’s graphics, IDLE uses syntax-directed
colorization for the code typed in both the main window and all text edit windows—
keywords are one color, literals are another, and so on. This helps give you a better
picture of the components in your code (and can even help you spot mistakes—run-
on strings are all one color, for example).
To run a file of code that you are editing in IDLE, use Run→Run Module in that file's
text edit window. That is, select the file’s text edit window, open that window’s Run
pull-down menu, and choose the Run Module option listed there (or use the equivalent
keyboard shortcut, given in the menu). Python will let you know that you need to save
your file first if you’ve changed it since it was opened or last saved and forgot to save
your changes—a common mistake when you’re knee-deep in coding.
When run this way, the output of your script and any error messages it may generate
show up back in the main interactive window (the Python shell window). In Figure 3-3, for example, the three lines after the “RESTART” line near the middle of the
window reflect an execution of our script1.py file opened in a separate edit window.
The “RESTART” message tells us that the user-code process was restarted to run the
edited script and serves to separate script output (it does not appear if IDLE is started
without a user-code subprocess—more on this mode in a moment).
IDLE Usability Features
Like most GUIs, the best way to learn IDLE may be to test-drive it for yourself, but
some key usage points seem to be less than obvious. For example, if you want to repeat
prior commands in IDLE’s main interactive window, you can use the Alt-P key combi-
nation to scroll backward through the command history, and Alt-N to scroll forward
(on some Macs, try Ctrl-P and Ctrl-N instead). Your prior commands will be recalled
and displayed, and may be edited and rerun.
Figure 3-3. The main Python shell window of the IDLE development GUI, shown here running on
Windows. Use the File menu to begin (New Window) or change (Open...) a source file; use the text
edit window’s Run menu to run the code in that window (Run Module).
You can also recall commands by positioning the cursor on them and clicking and
pressing Enter to insert their text at the input prompt, or using standard cut-and-paste
operations, though these techniques tend to involve more steps (and can sometimes be
triggered accidentally). Outside IDLE, you may be able to recall commands in an in-
teractive session with the arrow keys on Windows.
Besides command history and syntax colorization, IDLE has additional usability fea-
tures such as:
Auto-indent and unindent for Python code in the editor (Backspace goes back one
level)
Word auto-completion while typing, invoked by a Tab press
Balloon help pop ups for a function call when you type its opening “(”
Pop-up selection lists of object attributes when you type a “.” after an object’s name
and either pause or press Tab
Some of these may not work on every platform, and some can be configured or disabled
if you find that their defaults get in the way of your personal coding style.
Advanced IDLE Tools
Besides the basic edit and run functions and the prior section’s usability tools, IDLE
provides more advanced features, including a point-and-click program graphical de-
bugger and an object browser. The IDLE debugger is enabled via the Debug menu and
the object browser via the File menu. The browser allows you to navigate through the
module search path to files and objects in files; clicking on a file or object opens the
corresponding source in a text edit window.
You initiate IDLE debugging by selecting the Debug→Debugger menu option in the main window and then starting your script by selecting the Run→Run Module option in the text edit window; once the debugger is enabled, you can set breakpoints in your
code that stop its execution by right-clicking on lines in the text edit windows, show
variable values, and so on. You can also watch program execution when debugging—
the current line of code is noted as you step through your code.
For simpler debugging operations, you can also right-click with your mouse on the text
of an error message to quickly jump to the line of code where the error occurred—a
trick that makes it simple and fast to repair and run again. In addition, IDLE’s text
editor offers a large collection of programmer-friendly tools, including advanced text
and file search operations we won’t cover here. Because IDLE uses intuitive GUI in-
teractions, you should experiment with the system live to get a feel for its other tools.
Usage Notes: IDLE
IDLE is free, easy to use, portable, and automatically available on most platforms. I
generally recommend it to Python newcomers because it simplifies some startup details
and does not assume prior experience with system command lines. However, it is
somewhat limited compared to more advanced commercial IDEs, and may seem heav-
ier than a command line to some. To help you avoid some common pitfalls, here is a
list of issues that IDLE beginners should bear in mind:
You must add “.py” explicitly when saving your files. I mentioned this when
talking about files in general, but it’s a common IDLE stumbling block, especially
for Windows users. IDLE does not automatically add a .py extension to filenames
when files are saved. Be careful to type the .py extension yourself when saving a
file for the first time. If you don’t, while you will be able to run your file from IDLE
(and system command lines), you will not be able to import it either interactively
or from other modules.
Run scripts by selecting Run→Run Module in text edit windows, not by interactive imports and reloads. Earlier in this chapter, we saw that it's possible
to run a file by importing it interactively. However, this scheme can grow complex
because it requires you to manually reload files after changes. By contrast, using
the Run→Run Module menu option in IDLE always runs the most current version
of your file, just like running it using a system shell command line. IDLE also
prompts you to save your file first, if needed (another common mistake outside
IDLE).
You need to reload only modules being tested interactively. Like system shell
command lines, IDLE’s RunRun Module menu option always runs the current
version of both the top-level file and any modules it imports. Because of this,
RunRun Module eliminates common confusions surrounding imports. You need
to reload only modules that you are importing and testing interactively in IDLE. If
you choose to use the import and reload technique instead of RunRun Module,
remember that you can use the Alt-P/Alt-N key combinations to recall prior com-
mands.
You can customize IDLE. To change the text fonts and colors in IDLE, select the
Configure option in the Options menu of any IDLE window. You can also cus-
tomize key combination actions, indentation settings, autocompletions, and more;
see IDLE’s Help pull-down menu for more hints.
There is currently no clear-screen option in IDLE. This seems to be a frequent
request (perhaps because it’s an option available in similar IDEs), and it might be
added eventually. Today, though, there is no way to clear the interactive window’s
text. If you want the window’s text to go away, you can either press and hold the
Enter key, or type a Python loop to print a series of blank lines (nobody really uses
the latter technique, of course, but it sounds more high-tech than pressing the Enter
key!).
tkinter GUI and threaded programs may not work well with IDLE. Because
IDLE is a Python/tkinter program, it can hang if you use it to run certain types of
advanced Python/tkinter programs. This has become less of an issue in more recent
versions of IDLE that run user code in one process and the IDLE GUI itself in
another, but some programs (especially those that use multithreading) might still
hang the GUI. Even just calling the tkinter quit function in your code, the normal way to exit a GUI program, may be enough to cause your program's GUI to hang if run in IDLE (tkinter's destroy call may be a better choice in this context only). Your code may not exhibit such
problems, but as a rule of thumb, it’s always safe to use IDLE to edit GUI programs
but launch them using other options, such as icon clicks or system command lines.
When in doubt, if your code fails in IDLE, try it outside the GUI.
If connection errors arise, try starting IDLE in single-process mode. This
issue appears to have gone away in recent Pythons, but may still impact readers
using older versions. Because IDLE requires communication between its separate
user and GUI processes, it can sometimes have trouble starting up on certain plat-
forms (notably, it fails to start occasionally on some Windows machines, due to
firewall software that blocks connections). If you run into such connection errors,
it’s always possible to start IDLE with a system command line that forces it to run
in single-process mode without a user-code subprocess and therefore avoids com-
munication issues: its -n command-line flag forces this mode. On Windows, for
example, start a Command Prompt window and run the system command line
idle.py -n from within the directory C:\Python33\Lib\idlelib (cd there first if
needed). A python -m idlelib.idle -n command works from anywhere (see Ap-
pendix A for -m).
Beware of some IDLE usability features. IDLE does much to make life easier
for beginners, but some of its tricks won’t apply outside the IDLE GUI. For in-
stance, IDLE runs your scripts in its own interactive namespace, so variables in
your code show up automatically in the IDLE interactive session—you don’t al-
ways need to run import commands to access names at the top level of files you’ve
already run. This can be handy, but it can also be confusing, because outside the
IDLE environment names must always be imported from files explicitly to be used.
When you run a file of code, IDLE also automatically changes to that file’s direc-
tory and adds it to the module import search path—a handy feature that allows
you to use files and import modules there without search path settings, but also
something that won’t work the same when you run files outside IDLE. It’s OK to
use such features, but don’t forget that they are IDLE behavior, not Python be-
havior.
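For reference, here is a minimal sketch of the import-and-reload technique mentioned
in the tips above, assuming a module file named script1.py in the current directory
(the filename is illustrative only):
>>> import script1 # First import runs the file's code
>>> from imp import reload # In 3.X, reload lives in the imp module
>>> reload(script1) # Rerun the edited file without restarting Python
In Python 2.X, reload is a built-in function, so the from statement isn’t required there.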
Other IDEs
Because IDLE is free, portable, and a standard part of Python, it’s a nice first develop-
ment tool to become familiar with if you want to use an IDE at all. Again, I recommend
that you use IDLE for this book’s exercises if you’re just starting out, unless you are
already familiar with and prefer a command-line-based development mode. There are,
however, a handful of alternative IDEs for Python developers, some of which are sub-
stantially more powerful and robust than IDLE. Apart from IDLE, here are some of
Python’s most commonly used IDEs:
Eclipse and PyDev
Eclipse is an advanced open source IDE GUI. Originally developed as a Java IDE,
Eclipse also supports Python development when you install the PyDev (or a similar)
plug-in. Eclipse is a popular and powerful option for Python development, and it
goes well beyond IDLE’s feature set. It includes support for code completion, syn-
tax highlighting, syntax analysis, refactoring, debugging, and more. Its downsides
are that it is a large system to install and may require shareware extensions for some
features (this may vary over time). Still, when you are ready to graduate from IDLE,
the Eclipse/PyDev combination is worth your attention.
Komodo
A full-featured development environment GUI for Python (and other languages),
Komodo includes standard syntax coloring, text editing, debugging, and other
features. In addition, Komodo offers many advanced features that IDLE does not,
including project files, source-control integration, and regular-expression debug-
ging. At this writing, Komodo is not free, but see the Web for its current status—
it is available at http://www.activestate.com from ActiveState, which also offers the
ActivePython distribution package mentioned in Appendix A.
NetBeans IDE for Python
NetBeans is a powerful open source development environment GUI with support
for many advanced features for Python developers: code completion, automatic
indentation and code colorization, editor hints, code folding, refactoring, debug-
ging, code coverage and testing, projects, and more. It may be used to develop both
CPython and Jython code. Like Eclipse, NetBeans requires installation steps be-
yond those of the included IDLE GUI, but it is seen by many as more than worth
the effort. Search the Web for the latest information and links.
PythonWin
PythonWin is a free Windows-only IDE for Python that ships as part of Active-
State’s ActivePython distribution (and may also be fetched separately from http://
www.python.org resources). It is roughly like IDLE, with a handful of useful Win-
dows-specific extensions added; for example, PythonWin has support for COM
objects. Today, IDLE is probably more advanced than PythonWin (for instance,
IDLE’s dual-process architecture often prevents it from hanging). However, Py-
thonWin still offers tools for Windows developers that IDLE does not. See http://
www.activestate.com for more information.
Wing, Visual Studio, and others
Other IDEs are popular among Python developers too, including the mostly com-
mercial Wing IDE, Microsoft Visual Studio via a plug-in, and PyCharm, PyScrip-
ter, Pyshield, and Spyder—but I do not have space to do justice to them here, and
more will undoubtedly appear over time. In fact, almost every programmer-friendly
text editor has some sort of support for Python development these days, whether
it be preinstalled or fetched separately. Emacs and Vim, for instance, have sub-
stantial Python support.
IDE choices are often subjective, so I encourage you to browse to find tools that
fit your development style and goals. For more information, see the resources
available at http://www.python.org or search the Web for “Python IDE” or similar.
A search for “Python editors” today leads you to a wiki page that maintains infor-
mation about dozens of IDE and text-editor options for Python programming.
Other Launch Options
At this point, we’ve seen how to run code typed interactively, and how to launch code
saved in files in a variety of ways—system command lines, icon clicks, imports and
execs, GUIs like IDLE, and more. That covers most of the techniques in common use,
and enough to run the code you’ll see in this book. There are additional ways to run
Python code, though, most of which have special or narrow roles. For completeness
and reference, the next few sections take a quick look at some of these.
Embedding Calls
In some specialized domains, Python code may be run automatically by an enclosing
system. In such cases, we say that the Python programs are embedded in (i.e., run by)
another program. The Python code itself may be entered into a text file, stored in a
database, fetched from an HTML page, parsed from an XML document, and so on.
But from an operational perspective, another system—not you—may tell Python to
run the code you’ve created.
Such an embedded execution mode is commonly used to support end-user customi-
zation—a game program, for instance, might allow for play modifications by running
user-accessible embedded Python code at strategic points in time. Users can modify
this type of system by providing or changing Python code. Because Python code is
interpreted, there is no need to recompile the entire system to incorporate the change
(see Chapter 2 for more on how Python code is run).
In this mode, the enclosing system that runs your code might be written in C, C++, or
even Java when the Jython system is used. As an example, it’s possible to create and
run strings of Python code from a C program by calling functions in the Python runtime
API (a set of services exported by the libraries created when Python is compiled on your
machine):
#include <Python.h>
...
Py_Initialize(); // This is C, not Python
PyRun_SimpleString("x = 'brave ' + 'sir robin'"); // But it runs Python code
In this C code snippet, a program coded in the C language embeds the Python inter-
preter by linking in its libraries, and passes it a Python assignment statement string to
run. C programs may also gain access to Python modules and objects and process or
execute them using other Python API tools.
This book isn’t about Python/C integration, but you should be aware that, depending
on how your organization plans to use Python, you may or may not be the one who
actually starts the Python programs you create. Regardless, you can usually still use the
interactive and file-based launching techniques described here to test code in isolation
from those enclosing systems that may eventually use it.6
Frozen Binary Executables
Frozen binary executables, described in Chapter 2, are packages that combine your
program’s byte code and the Python interpreter into a single executable program. This
approach enables Python programs to be launched in the same ways that you would
launch any other executable program (icon clicks, command lines, etc.). While this
option works well for delivery of products, it is not really intended for use during pro-
gram development; you normally freeze just before shipping (after development is fin-
ished). See the prior chapter for more on this option.
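As a hedged illustration only, here is how one popular third-party freezing tool is
typically invoked from a shell; the tool name and flag are current as of this writing
and are not part of Python itself:
c:\code> pyinstaller --onefile script.py # Bundle script plus interpreter into one executable
The result is a standalone program in a dist subdirectory that can be shipped and
launched without a separate Python install on the target machine.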
Text Editor Launch Options
As mentioned previously, although they’re not full-blown IDE GUIs, most program-
mer-friendly text editors have support for editing, and possibly running, Python pro-
grams. Such support may be built in or fetchable on the Web. For instance, if you are
familiar with the Emacs text editor, you can do all your Python editing and launching
from inside that text editor. See the text editor resources page at http://www.python
.org/editors for more details, or search the Web for the phrase “Python editors.”
Still Other Launch Options
Depending on your platform, there may be additional ways that you can start Python
programs. For instance, on some Macintosh systems you may be able to drag Python
program file icons onto the Python interpreter icon to make them execute, and on some
Windows systems you can always start Python scripts with the Run... option in the
Start menu. Additionally, the Python standard library has utilities that allow Python
programs to be started by other Python programs in separate processes (e.g., os.popen,
os.system), and Python scripts might also be spawned in larger contexts like the Web
(for instance, a web page might invoke a script on a server); however, these are beyond
the scope of the present chapter.
6. See Programming Python (O’Reilly) for more details on embedding Python in C/C++. The embedding
API can call Python functions directly, load modules, and more. Also, note that the Jython system allows
Java programs to invoke Python code using a Java-based API (a Python interpreter class).
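As a quick, hedged sketch of the os.popen and os.system calls just mentioned, the
following session launches one Python program from another; it assumes a module1.py
file like the one in this part’s exercises, and that python is on your system’s search path:
>>> import os
>>> os.system('python module1.py') # Run the program in a shell; 0 is its exit status
Hello module world!
0
>>> os.popen('python module1.py').read() # Run it and read its printed output
'Hello module world!\n'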
Future Possibilities?
This chapter reflects current practice, but much of the material is both platform- and
time-specific. Indeed, many of the execution and launch details presented arose during
the shelf life of this book’s various editions. As with program execution options, it’s
not impossible that new program launch options may arise over time.
New operating systems, and new versions of existing systems, may also provide exe-
cution techniques beyond those outlined here. In general, because Python keeps pace
with such changes, you should be able to launch Python programs in whatever way
makes sense for the machines you use, both now and in the future—be that by swiping
on tablet PCs and smartphones, grabbing icons in a virtual reality, or shouting a script’s
name over your coworkers’ conversations.
Implementation changes may also impact launch schemes somewhat (e.g., a full com-
piler could produce normal executables that are launched much like frozen binaries
today). If I knew what the future truly held, though, I would probably be talking to a
stockbroker instead of writing these words!
Which Option Should I Use?
With all these options, true beginners might naturally ask: which one is best for me?
In general, you should give the IDLE interface a try if you are just getting started with
Python. It provides a user-friendly GUI environment and hides some of the underlying
configuration details. It also comes with a platform-neutral text editor for coding your
scripts, and it’s a standard and free part of the Python system.
If, on the other hand, you are an experienced programmer, you might be more com-
fortable with simply the text editor of your choice in one window, and another window
for launching the programs you edit via system command lines and icon clicks (in fact,
this is how I develop Python programs, but I have a Unix-biased distant past). Because
the choice of development environments is very subjective, I can’t offer much more in
the way of universal guidelines. In general, whatever environment you like to use will
be the best for you to use.
Debugging Python Code
Naturally, none of my readers or students ever have bugs in their code (insert smiley
here), but for less fortunate friends of yours who may, here’s a quick review of the
strategies commonly used by real-world Python programmers to debug code, for you
to refer to as you start coding in earnest:
Do nothing. By this, I don’t mean that Python programmers don’t debug their
code—but when you make a mistake in a Python program, you get a very useful
and readable error message (you’ll get to see some soon, if you haven’t already).
If you already know Python, and especially for your own code, this is often enough
—read the error message, and go fix the tagged line and file. For many, this is
debugging in Python. It may not always be ideal for larger systems you didn’t write,
though.
Insert print statements. Probably the main way that Python programmers debug
their code (and the way that I debug Python code) is to insert print statements and
run again. Because Python runs immediately after changes, this is usually the
quickest way to get more information than error messages provide. The print
statements don’t have to be sophisticated—a simple “I am here” or display of
variable values is usually enough to provide the context you need. Just remember
to delete or comment out (i.e., add a # before) the debugging prints before you
ship your code! (A short sketch of this technique follows this list.)
Use IDE GUI debuggers. For larger systems you didn’t write, and for beginners
who want to trace code in more detail, most Python development GUIs have some
sort of point-and-click debugging support. IDLE has a debugger too, but it doesn’t
appear to be used very often in practice—perhaps because it has no command line,
or perhaps because adding print statements is usually quicker than setting up a
GUI debugging session. To learn more, see IDLE’s Help, or simply try it on your
own; its basic interface is described in the section “Advanced IDLE
Tools” on page 77. Other IDEs, such as Eclipse, NetBeans, Komodo, and Wing
IDE, offer advanced point-and-click debuggers as well; see their documentation if
you use them.
Use the pdb command-line debugger. For ultimate control, Python comes with
a source code debugger named pdb, available as a module in Python’s standard
library. In pdb, you type commands to step line by line, display variables, set and
clear breakpoints, continue to a breakpoint or error, and so on. You can launch
pdb interactively by importing it, or as a top-level script. Either way, because you
can type commands to control the session, it provides a powerful debugging tool.
pdb also includes a postmortem function (pdb.pm()) that you can run after an
exception occurs, to get information from the time of the error. See the Python
library manual and Chapter 36 for more details on pdb, and Appendix A for an
example of running pdb as a script with Python's -m command argument; a pdb
sketch also appears after this list.
Use Python's -i command-line argument. Short of adding prints or running
under pdb, you can still see what went wrong on errors. If you run your script from
a command line and pass a -i argument between python and the name of your
script (e.g., python -i m.py), Python will enter into its interactive interpreter mode
(the >>> prompt) when your script exits, whether it ends successfully or runs into
an error. At this point, you can print the final values of variables to get more details
about what happened in your code because they are in the top-level namespace.
You can also then import and run the pdb debugger for even more context; its
postmortem mode will let you inspect the latest error if your script failed. Appen-
dix A also shows -i in action, and a brief sketch follows this list.
Other options. For more specific debugging requirements, you can find additional
tools in the open source domain, including support for multithreaded programs,
embedded code, and process attachment. The Winpdb system, for example, is a
standalone debugger with advanced debugging support and cross-platform GUI
and console interfaces.
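The following are minimal, hedged sketches of the print, pdb, and -i techniques just
listed; all file, function, and variable names here are illustrative only. First, a temporary
print trace dropped into a hypothetical function:
def convert(data):
    result = [x * 2 for x in data] # Code under test
    print('convert: result =', result) # Temporary trace; delete before shipping
    return result
Next, pdb in both of its launch modes, using a few of its basic commands:
c:\code> python -m pdb m.py # Run a script under pdb from the start
(Pdb) b 3 # Set a breakpoint at line 3
(Pdb) c # Continue until the breakpoint is reached
(Pdb) p x # Print a variable's value
>>> import pdb # Postmortem mode, after an exception
>>> pdb.pm() # Inspect the state at the time of the error
And finally, the -i flag, which drops you into the interactive prompt when a script
exits, where its top-level variables can still be inspected:
c:\code> python -i m.py
...error text omitted...
>>> x # The script's variables are still available
99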
These options will become more important as we start writing larger scripts. Probably
the best news on the debugging front, though, is that errors are detected and reported
in Python, rather than passing silently or crashing the system altogether. In fact, errors
themselves are a well-defined mechanism known as exceptions, which you can catch
and process (more on exceptions in Part VII). Making mistakes is never fun, of course,
but take it from someone who recalls when debugging meant getting out a hex calcu-
lator and poring over piles of memory dump printouts: Python’s debugging support
makes errors much less painful than they might otherwise be.
Chapter Summary
In this chapter, we’ve looked at common ways to launch Python programs: by running
code typed interactively, and by running code stored in files with system command
lines, file icon clicks, module imports, exec calls, and IDE GUIs such as IDLE. We’ve
covered a lot of pragmatic startup territory here. This chapter’s goal was to equip you
with enough information to enable you to start writing some code, which you’ll do in
the next part of the book. There, we will start exploring the Python language itself,
beginning with its core data types—the objects that are the subjects of your programs.
First, though, take the usual chapter quiz to exercise what you’ve learned here. Because
this is the last chapter in this part of the book, it’s followed with a set of more complete
exercises that test your mastery of this entire part’s topics. For help with the latter set
of problems, or just for a refresher, be sure to turn to Appendix D after you’ve given
the exercises a try.
Test Your Knowledge: Quiz
1. How can you start an interactive interpreter session?
2. Where do you type a system command line to launch a script file?
3. Name four or more ways to run the code saved in a script file.
4. Name two pitfalls related to clicking file icons on Windows.
5. Why might you need to reload a module?
6. How do you run a script from within IDLE?
7. Name two pitfalls related to using IDLE.
8. What is a namespace, and how does it relate to module files?
Test Your Knowledge: Answers
1. You can start an interactive session on Windows 7 and earlier by clicking your Start
button, picking the All Programs option, clicking the Python entry, and selecting
the “Python (command line)” menu option. You can also achieve the same effect
on Windows and other platforms by typing python as a system command line in
your system’s console window (a Command Prompt window on Windows). An-
other alternative is to launch IDLE, as its main Python shell window is an interactive
session. Depending on your platform and Python, if you have not set your system’s
PATH variable to find Python, you may need to cd to where Python is installed, or
type its full directory path instead of just python (e.g., C:\Python33\python on Win-
dows, unless you’re using the 3.3 launcher).
2. You type system command lines in whatever your platform provides as a system
console: a Command Prompt window on Windows; an xterm or terminal window
on Unix, Linux, and Mac OS X; and so on. You type this at the system’s prompt,
not at the Python interactive interpreter’s “>>>” prompt—be careful not to con-
fuse these prompts.
3. Code in a script (really, module) file can be run with system command lines, file
icon clicks, imports and reloads, the exec built-in function, and IDE GUI selections
such as IDLE’s Run→Run Module menu option. On Unix, they can also be run as
executables with the #! trick, and some platforms support more specialized launch-
ing techniques (e.g., drag and drop). In addition, some text editors have unique
ways to run Python code, some Python programs are provided as standalone “fro-
zen binary” executables, and some systems use Python code in embedded mode,
where it is run automatically by an enclosing program written in a language like
C, C++, or Java. The latter technique is usually done to provide a user customi-
zation layer.
4. Scripts that print and then exit cause the output window to disappear immediately,
before you can view the output (which is why the input trick comes in handy);
error messages generated by your script also appear in an output window that
closes before you can examine its contents (which is one reason that system com-
mand lines and IDEs such as IDLE are better for most development).
5. Python imports (loads) a module only once per process, by default, so if you’ve
changed its source code and want to run the new version without stopping and
restarting Python, you’ll have to reload it. You must import a module at least once
before you can reload it. Running files of code from a system shell command line,
via an icon click, or via an IDE such as IDLE generally makes this a nonissue, as
those launch schemes usually run the current version of the source code file each
time.
6. Within the text edit window of the file you wish to run, select the window’s
Run→Run Module menu option. This runs the window’s source code as a top-level
script file and displays its output back in the interactive Python shell window.
7. IDLE can still be hung by some types of programs—especially GUI programs that
perform multithreading (an advanced technique beyond this book’s scope). Also,
IDLE has some usability features that can burn you once you leave the IDLE GUI:
a script’s variables are automatically imported to the interactive scope in IDLE and
working directories are changed when you run a file, for instance, but Python itself
does not take such steps in general.
8. A namespace is just a package of variables (i.e., names). It takes the form of an
object with attributes in Python. Each module file is automatically a namespace—
that is, a package of variables reflecting the assignments made at the top level of
the file. Namespaces help avoid name collisions in Python programs: because each
module file is a self-contained namespace, files must explicitly import other files
in order to use their names.
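To make the last answer concrete, here is a small, hedged sketch; the module name
and its variable are hypothetical:
# File whales.py
count = 42
>>> import whales # The module is itself a namespace
>>> whales.count # Its top-level names become attributes
42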
Test Your Knowledge: Part I Exercises
It’s time to start doing a little coding on your own. This first exercise session is fairly
simple, but it’s designed to make sure you’re ready to work along with the rest of the
book, and a few of its questions hint at topics to come in later chapters. Be sure to check
Part I in Appendix D for the answers; the exercises and their solutions sometimes con-
tain supplemental information not discussed in the main text, so you should take a
peek at the solutions even if you manage to answer all the questions on your own.
1. Interaction. Using a system command line, IDLE, or any other method that works
on your platform, start the Python interactive command line (>>> prompt), and
type the expression "Hello World!" (including the quotes). The string should be
echoed back to you. The purpose of this exercise is to get your environment con-
figured to run Python. In some scenarios, you may need to first run a cd shell
command, type the full path to the Python executable, or add its path to your
PATH environment variable. If desired, you can set PATH in your .cshrc or .kshrc file
to make Python permanently available on Unix systems; on Windows, the envi-
ronment variable GUI is usually what you want for this. See Appendix A for help
with environment variable settings.
2. Programs. With the text editor of your choice, write a simple module file containing
the single statement print('Hello module world!') and store it as module1.py.
Now, run this file by using any launch option you like: running it in IDLE, clicking
on its file icon, passing it to the Python interpreter on the system shell’s command
line (e.g., python module1.py), built-in exec calls, imports and reloads, and so on.
In fact, experiment by running your file with as many of the launch techniques
discussed in this chapter as you can. Which technique seems easiest? (There is no
right answer to this, of course.)
3. Modules. Start the Python interactive command line (>>> prompt) and import the
module you wrote in exercise 2. Try moving the file to a different directory and
importing it again from its original directory (i.e., run Python in the original di-
rectory when you import). What happens? (Hint: is there still a module1.pyc byte
code file in the original directory, or something similar in a __pycache__ subdir-
ectory there?)
4. Scripts. If your platform supports it, add the #! line to the top of your mod-
ule1.py module file, give the file executable privileges, and run it directly as an
executable. What does the first line need to contain? #! usually only has meaning
on Unix, Linux, and Unix-like platforms such as Mac OS X; if you’re working on
Windows, instead try running your file by listing just its name in a Command
Prompt window without the word “python” before it (this works on recent versions
of Windows), via the Start→Run... dialog box, or similar. If you are using Python
3.3 or the Windows launcher that installs with it, experiment with changing your
script’s #! line to launch different Python versions you may have installed on your
computer (or equivalently, work through the tutorial in Appendix B).
5. Errors and debugging. Experiment with typing mathematical expressions and as-
signments at the Python interactive command line. Along the way, type the ex-
pressions 2 ** 500 and 1 / 0, and reference an undefined variable name as we did
early on in this chapter. What happens?
You may not know it yet, but when you make a mistake, you’re doing exception
processing: a topic we’ll explore in depth in Part VII. As you’ll learn there, you are
technically triggering what’s known as the default exception handler—logic that
prints a standard error message. If you do not catch an error, the default handler
does and prints the standard error message in response.
Exceptions are also bound up with the notion of debugging in Python. When you’re
first starting out, Python’s default error messages on exceptions will probably pro-
vide as much error-handling support as you need—they give the cause of the error,
as well as showing the lines in your code that were active when the error occurred.
For more about debugging, see the sidebar “Debugging Python Code”
on page 83.
6. Breaks and cycles. At the Python command line, type:
L = [1, 2] # Make a 2-item list
L.append(L) # Append L as a single item to itself
L # Print L: a cyclic/circular object
What happens? In all recent versions of Python, you’ll see a strange output that
we’ll describe in the solutions appendix, and which will make more sense when
we study references in the next part of the book. If you’re using a Python version
older than 1.5.1, a Ctrl-C key combination will probably help on most platforms.
Why do you think your version of Python responds the way it does for this code?
If you do have a Python older than Release 1.5.1 (a hopefully rare
scenario today!), make sure your machine can stop a program with
a Ctrl-C key combination of some sort before running this test, or
you may be waiting a long time.
7. Documentation. Spend at least 15 minutes browsing the Python library and lan-
guage manuals before moving on to get a feel for the available tools in the standard
library and the structure of the documentation set. It takes at least this long to
become familiar with the locations of major topics in the manual set; once you’ve
done this, it’s easy to find what you need. You can find this manual via the Python
Start button entry on some versions of Windows, in the Python Docs option on the Help pull-
down menu in IDLE, or online at http://www.python.org/doc. I’ll also have a few
more words to say about the manuals and other documentation sources available
(including PyDoc and the help function) in Chapter 15. If you still have time, go
explore the Python website, as well as its PyPI third-party extension repository.
Especially check out the Python.org (http://www.python.org) documentation and
search pages; they can be crucial resources.
PART II
Types and Operations
CHAPTER 4
Introducing Python Object Types
This chapter begins our tour of the Python language. In an informal sense, in Python
we do things with stuff.1 “Things” take the form of operations like addition and con-
catenation, and “stuff” refers to the objects on which we perform those operations. In
this part of the book, our focus is on that stuff, and the things our programs can do with
it.
Somewhat more formally, in Python, data takes the form of objects—either built-in
objects that Python provides, or objects we create using Python classes or external
language tools such as C extension libraries. Although we’ll firm up this definition later,
objects are essentially just pieces of memory, with values and sets of associated oper-
ations. As we’ll see, everything is an object in a Python script. Even simple numbers
qualify, with values (e.g., 99), and supported operations (addition, subtraction, and so
on).
Because objects are also the most fundamental notion in Python programming, we’ll
start this chapter with a survey of Python’s built-in object types. Later chapters provide
a second pass that fills in details we’ll gloss over in this survey. Here, our goal is a brief
tour to introduce the basics.
The Python Conceptual Hierarchy
Before we get to the code, let’s first establish a clear picture of how this chapter fits into
the overall Python picture. From a more concrete perspective, Python programs can be
decomposed into modules, statements, expressions, and objects, as follows:
1. Programs are composed of modules.
2. Modules contain statements.
3. Statements contain expressions.
4. Expressions create and process objects.
1. Pardon my formality. I’m a computer scientist.
The discussion of modules in Chapter 3 introduced the highest level of this hierarchy.
This part’s chapters begin at the bottom—exploring both built-in objects and the ex-
pressions you can code to use them.
We’ll move on to study statements in the next part of the book, though we will find
that they largely exist to manage the objects we’ll meet here. Moreover, by the time we
reach classes in the OOP part of this book, we’ll discover that they allow us to define
new object types of our own, by both using and emulating the object types we will
explore here. Because of all this, built-in objects are a mandatory point of embarkation
for all Python journeys.
Traditional introductions to programming often stress its three pillars
of sequence (“Do this, then that”), selection (“Do this if that is true”),
and repetition (“Do this many times”). Python has tools in all three cat-
egories, along with some for definition—of functions and classes. These
themes may help you organize your thinking early on, but they are a bit
artificial and simplistic. Expressions such as comprehensions, for ex-
ample, are both repetition and selection; some of these terms have other
meanings in Python; and many later concepts won’t seem to fit this mold
at all. In Python, the more strongly unifying principle is objects, and
what we can do with them. To see why, read on.
Why Use Built-in Types?
If you’ve used lower-level languages such as C or C++, you know that much of your
work centers on implementing objects—also known as data structures—to represent
the components in your application’s domain. You need to lay out memory structures,
manage memory allocation, implement search and access routines, and so on. These
chores are about as tedious (and error-prone) as they sound, and they usually distract
from your program’s real goals.
In typical Python programs, most of this grunt work goes away. Because Python pro-
vides powerful object types as an intrinsic part of the language, there’s usually no need
to code object implementations before you start solving problems. In fact, unless you
have a need for special processing that built-in types don’t provide, you’re almost al-
ways better off using a built-in object instead of implementing your own. Here are some
reasons why:
Built-in objects make programs easy to write. For simple tasks, built-in types
are often all you need to represent the structure of problem domains. Because you
get powerful tools such as collections (lists) and search tables (dictionaries) for free,
you can use them immediately. You can get a lot of work done with Python’s built-
in object types alone.
Built-in objects are components of extensions. For more complex tasks, you
may need to provide your own objects using Python classes or C language inter-
faces. But as you’ll see in later parts of this book, objects implemented manually
are often built on top of built-in types such as lists and dictionaries. For instance,
a stack data structure may be implemented as a class that manages or customizes
a built-in list.
Built-in objects are often more efficient than custom data structures.
Python’s built-in types employ already optimized data structure algorithms that
are implemented in C for speed. Although you can write similar object types on
your own, you’ll usually be hard-pressed to get the level of performance built-in
object types provide.
Built-in objects are a standard part of the language. In some ways, Python
borrows both from languages that rely on built-in tools (e.g., LISP) and languages
that rely on the programmer to provide tool implementations or frameworks of
their own (e.g., C++). Although you can implement unique object types in Python,
you don’t need to do so just to get started. Moreover, because Python’s built-ins
are standard, they’re always the same; proprietary frameworks, on the other hand,
tend to differ from site to site.
In other words, not only do built-in object types make programming easier, but they’re
also more powerful and efficient than most of what can be created from scratch. Re-
gardless of whether you implement new object types, built-in objects form the core of
every Python program.
Python’s Core Data Types
Table 4-1 previews Python’s built-in object types and some of the syntax used to code
their literals—that is, the expressions that generate these objects.2 Some of these types
will probably seem familiar if you’ve used other languages; for instance, numbers and
strings represent numeric and textual values, respectively, and file objects provide an
interface for processing real files stored on your computer.
To some readers, though, the object types in Table 4-1 may be more general and pow-
erful than what you are accustomed to. For instance, you’ll find that lists and diction-
aries alone are powerful data representation tools that obviate most of the work you
do to support collections and searching in lower-level languages. In short, lists provide
ordered collections of other objects, while dictionaries store objects by key; both lists
and dictionaries may be nested, can grow and shrink on demand, and may contain
objects of any type.
2. In this book, the term literal simply means an expression whose syntax generates an object—sometimes
also called a constant. Note that the term “constant” does not imply objects or variables that can never
be changed (i.e., this term is unrelated to C++’s const or Python’s “immutable”—a topic explored in the
section “Immutability” on page 101).
Table 4-1. Built-in objects preview
Object type                     Example literals/creation
Numbers                         1234, 3.1415, 3+4j, 0b111, Decimal(), Fraction()
Strings                         'spam', "Bob's", b'a\x01c', u'sp\xc4m'
Lists                           [1, [2, 'three'], 4.5], list(range(10))
Dictionaries                    {'food': 'spam', 'taste': 'yum'}, dict(hours=10)
Tuples                          (1, 'spam', 4, 'U'), tuple('spam'), namedtuple
Files                           open('eggs.txt'), open(r'C:\ham.bin', 'wb')
Sets                            set('abc'), {'a', 'b', 'c'}
Other core types                Booleans, types, None
Program unit types              Functions, modules, classes (Part IV, Part V, Part VI)
Implementation-related types    Compiled code, stack tracebacks (Part IV, Part VII)
Also shown in Table 4-1, program units such as functions, modules, and classes—which
we’ll meet in later parts of this book—are objects in Python too; they are created with
statements and expressions such as def, class, import, and lambda and may be passed
around scripts freely, stored within other objects, and so on. Python also provides a
set of implementation-related types such as compiled code objects, which are generally
of interest to tool builders more than application developers; we’ll explore these in later
parts too, though in less depth due to their specialized roles.
Despite its title, Table 4-1 isn’t really complete, because everything we process in Python
programs is a kind of object. For instance, when we perform text pattern matching in
Python, we create pattern objects, and when we perform network scripting, we use
socket objects. These other kinds of objects are generally created by importing and
using functions in library modules—for example, in the re and socket modules for
patterns and sockets—and have behavior all their own.
We usually call the other object types in Table 4-1 core data types, though, because
they are effectively built into the Python language—that is, there is specific expression
syntax for generating most of them. For instance, when you run the following code
with characters surrounded by quotes:
>>> 'spam'
you are, technically speaking, running a literal expression that generates and returns a
new string object. There is specific Python language syntax to make this object. Simi-
larly, an expression wrapped in square brackets makes a list, one in curly braces makes
a dictionary, and so on. Even though, as we’ll see, there are no type declarations in
Python, the syntax of the expressions you run determines the types of objects you create
and use. In fact, object-generation expressions like those in Table 4-1 are generally
where types originate in the Python language.
Just as importantly, once you create an object, you bind its operation set for all time—
you can perform only string operations on a string and list operations on a list. In formal
terms, this means that Python is dynamically typed, a model that keeps track of types
for you automatically instead of requiring declaration code, but it is also strongly ty-
ped, a constraint that means you can perform on an object only operations that are
valid for its type.
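For a quick, hedged taste of the strongly typed part of this model, mixing types without
explicit conversion fails (the error message text varies across Python versions):
>>> 'spam' + 99 # Addition is not defined between str and int
...error text omitted...
TypeError: Can't convert 'int' object to str implicitly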
We’ll study each of the object types in Table 4-1 in detail in upcoming chapters. Before
digging into the details, though, let’s begin by taking a quick look at Python’s core
objects in action. The rest of this chapter provides a preview of the operations we’ll
explore in more depth in the chapters that follow. Don’t expect to find the full story
here—the goal of this chapter is just to whet your appetite and introduce some key
ideas. Still, the best way to get started is to get started, so let’s jump right into some
real code.
Numbers
If you’ve done any programming or scripting in the past, some of the object types in
Table 4-1 will probably seem familiar. Even if you haven’t, numbers are fairly straight-
forward. Python’s core objects set includes the usual suspects: integers that have no
fractional part, floating-point numbers that do, and more exotic types—complex num-
bers with imaginary parts, decimals with fixed precision, rationals with numerator and
denominator, and full-featured sets. Built-in numbers are enough to represent most
numeric quantities—from your age to your bank balance—but more types are available
as third-party add-ons.
Although it offers some fancier options, Python’s basic number types are, well, basic.
Numbers in Python support the normal mathematical operations. For instance, the
plus sign (+) performs addition, a star (*) is used for multiplication, and two stars (**)
are used for exponentiation:
>>> 123 + 222 # Integer addition
345
>>> 1.5 * 4 # Floating-point multiplication
6.0
>>> 2 ** 100 # 2 to the power 100, again
1267650600228229401496703205376
Notice the last result here: Python 3.X’s integer type automatically provides extra pre-
cision for large numbers like this when needed (in 2.X, a separate long integer type
handles numbers too large for the normal integer type in similar ways). You can, for
instance, compute 2 to the power 1,000,000 as an integer in Python, but you probably
shouldn’t try to print the result—with more than 300,000 digits, you may be waiting
awhile!
>>> len(str(2 ** 1000000)) # How many digits in a really BIG number?
301030
This nested-call form works from inside out—first converting the ** result’s number
to a string of digits with the built-in str function, and then getting the length of the
resulting string with len. The end result is the number of digits. str and len work on
many object types; more on both as we move along.
On Pythons prior to 2.7 and 3.1, once you start experimenting with floating-point
numbers, you’re likely to stumble across something that may look a bit odd at first
glance:
>>> 3.1415 * 2 # repr: as code (Pythons < 2.7 and 3.1)
6.2830000000000004
>>> print(3.1415 * 2) # str: user-friendly
6.283
The first result isn’t a bug; it’s a display issue. It turns out that there are two ways to
print every object in Python—with full precision (as in the first result shown here), and
in a user-friendly form (as in the second). Formally, the first form is known as an object’s
as-code repr, and the second is its user-friendly str. In older Pythons, the floating-point
repr sometimes displays more precision than you might expect. The difference can also
matter when we step up to using classes. For now, if something looks odd, try showing
it with a print built-in function call statement.
Better yet, upgrade to Python 2.7 and the latest 3.X, where floating-point numbers
display themselves more intelligently, usually with fewer extraneous digits—since this
book is based on Pythons 2.7 and 3.3, this is the display form I’ll be showing throughout
this book for floating-point numbers:
>>> 3.1415 * 2 # repr: as code (Pythons >= 2.7 and 3.1)
6.283
Besides expressions, there are a handful of useful numeric modules that ship with
Python—modules are just packages of additional tools that we import to use:
>>> import math
>>> math.pi
3.141592653589793
>>> math.sqrt(85)
9.219544457292887
The math module contains more advanced numeric tools as functions, while the ran
dom module performs random-number generation and random selections (here, from a
Python list coded in square brackets—an ordered collection of other objects to be in-
troduced later in this chapter):
>>> import random
>>> random.random()
0.7082048489415967
>>> random.choice([1, 2, 3, 4])
1
Python also includes more exotic numeric objects—such as complex, fixed-precision,
and rational numbers, as well as sets and Booleans—and the third-party open source
extension domain has even more (e.g., matrixes and vectors, and extended precision
numbers). We’ll defer discussion of these types until later in this chapter and book.
So far, we’ve been using Python much like a simple calculator; to do better justice to
its built-in types, let’s move on to explore strings.
Strings
Strings are used to record both textual information (your name, for instance) as well
as arbitrary collections of bytes (such as an image file’s contents). They are our first
example of what in Python we call a sequence—a positionally ordered collection of
other objects. Sequences maintain a left-to-right order among the items they contain:
their items are stored and fetched by their relative positions. Strictly speaking, strings
are sequences of one-character strings; other, more general sequence types include
lists and tuples, covered later.
Sequence Operations
As sequences, strings support operations that assume a positional ordering among
items. For example, if we have a four-character string coded inside quotes (usually of
the single variety), we can verify its length with the built-in len function and fetch its
components with indexing expressions:
>>> S = 'Spam' # Make a 4-character string, and assign it to a name
>>> len(S) # Length
4
>>> S[0] # The first item in S, indexing by zero-based position
'S'
>>> S[1] # The second item from the left
'p'
In Python, indexes are coded as offsets from the front, and so start from 0: the first item
is at index 0, the second is at index 1, and so on.
Notice how we assign the string to a variable named S here. We’ll go into detail on how
this works later (especially in Chapter 6), but Python variables never need to be declared
ahead of time. A variable is created when you assign it a value, may be assigned any
type of object, and is replaced with its value when it shows up in an expression. It must
also have been previously assigned by the time you use its value. For the purposes of
this chapter, it’s enough to know that we need to assign an object to a variable in order
to save it for later use.
In Python, we can also index backward, from the end—positive indexes count from
the left, and negative indexes count back from the right:
>>> S[-1] # The last item from the end in S
'm'
>>> S[-2] # The second-to-last item from the end
'a'
Formally, a negative index is simply added to the string’s length, so the following two
operations are equivalent (though the first is easier to code and less easy to get wrong):
>>> S[-1] # The last item in S
'm'
>>> S[len(S)-1] # Negative indexing, the hard way
'm'
Notice that we can use an arbitrary expression in the square brackets, not just a hard-
coded number literal—anywhere that Python expects a value, we can use a literal, a
variable, or any expression we wish. Python’s syntax is completely general this way.
In addition to simple positional indexing, sequences also support a more general form
of indexing known as slicing, which is a way to extract an entire section (slice) in a single
step. For example:
>>> S # A 4-character string
'Spam'
>>> S[1:3] # Slice of S from offsets 1 through 2 (not 3)
'pa'
Probably the easiest way to think of slices is that they are a way to extract an entire
section from a string in a single step. Their general form, X[I:J], means “give me ev-
erything in X from offset I up to but not including offset J.” The result is returned in a
new object. The second of the preceding operations, for instance, gives us all the char-
acters in string S from offsets 1 through 2 (that is, 1 through 3 – 1) as a new string. The
effect is to slice or “parse out” the two characters in the middle.
In a slice, the left bound defaults to zero, and the right bound defaults to the length of
the sequence being sliced. This leads to some common usage variations:
>>> S[1:] # Everything past the first (1:len(S))
'pam'
>>> S # S itself hasn't changed
'Spam'
>>> S[0:3] # Everything but the last
'Spa'
>>> S[:3] # Same as S[0:3]
'Spa'
>>> S[:-1] # Everything but the last again, but simpler (0:-1)
'Spa'
>>> S[:] # All of S as a top-level copy (0:len(S))
'Spam'
Note in the second-to-last command how negative offsets can be used to give bounds
for slices, too, and how the last operation effectively copies the entire string. As you’ll
learn later, there is no reason to copy a string, but this form can be useful for sequences
like lists.
Finally, as sequences, strings also support concatenation with the plus sign (joining two
strings into a new string) and repetition (making a new string by repeating another):
>>> S
'Spam'
>>> S + 'xyz' # Concatenation
'Spamxyz'
>>> S # S is unchanged
'Spam'
>>> S * 8 # Repetition
'SpamSpamSpamSpamSpamSpamSpamSpam'
Notice that the plus sign (+) means different things for different objects: addition for
numbers, and concatenation for strings. This is a general property of Python that we’ll
call polymorphism later in the book—in sum, the meaning of an operation depends on
the objects being operated on. As you’ll see when we study dynamic typing, this poly-
morphism property accounts for much of the conciseness and flexibility of Python code.
Because types aren’t constrained, a Python-coded operation can normally work on
many different types of objects automatically, as long as they support a compatible
interface (like the + operation here). This turns out to be a huge idea in Python; you’ll
learn more about it later on our tour.
Immutability
Also notice in the prior examples that we were not changing the original string with
any of the operations we ran on it. Every string operation is defined to produce a new
string as its result, because strings are immutable in Python—they cannot be changed
in place after they are created. In other words, you can never overwrite the values of
immutable objects. For example, you can’t change a string by assigning to one of its
positions, but you can always build a new one and assign it to the same name. Because
Python cleans up old objects as you go (as you’ll see later), this isn’t as inefficient as it
may sound:
>>> S
'Spam'
>>> S[0] = 'z' # Immutable objects cannot be changed
...error text omitted...
TypeError: 'str' object does not support item assignment
>>> S = 'z' + S[1:] # But we can run expressions to make new objects
>>> S
'zpam'
Every object in Python is classified as either immutable (unchangeable) or not. In terms
of the core types, numbers, strings, and tuples are immutable; lists, dictionaries, and
sets are not—they can be changed in place freely, as can most new objects you’ll code
with classes. This distinction turns out to be crucial in Python work, in ways that we
can’t yet fully explore. Among other things, immutability can be used to guarantee that
an object remains constant throughout your program; mutable objects’ values can be
changed at any time and place (and whether you expect it or not).
Strictly speaking, you can change text-based data in place if you either expand it into a
list of individual characters and join it back together with nothing between, or use the
newer bytearray type available in Pythons 2.6, 3.0, and later:
>>> S = 'shrubbery'
>>> L = list(S) # Expand to a list: [...]
>>> L
['s', 'h', 'r', 'u', 'b', 'b', 'e', 'r', 'y']
>>> L[1] = 'c' # Change it in place
>>> ''.join(L) # Join with empty delimiter
'scrubbery'
>>> B = bytearray(b'spam') # A bytes/list hybrid (ahead)
>>> B.extend(b'eggs') # 'b' needed in 3.X, not 2.X
>>> B # B[i] = ord(c) works here too
bytearray(b'spameggs')
>>> B.decode() # Translate to normal string
'spameggs'
The bytearray supports in-place changes for text, but only for text whose characters
are all at most 8 bits wide (e.g., ASCII). All other strings are still immutable—bytear-
ray is a distinct hybrid of immutable bytes strings (whose b'...' syntax is required in
3.X and optional in 2.X) and mutable lists (coded and displayed in []), and we have to
learn more about both these and Unicode text to fully grasp this code.
Type-Specific Methods
Every string operation we’ve studied so far is really a sequence operation—that is, these
operations will work on other sequences in Python as well, including lists and tuples.
In addition to generic sequence operations, though, strings also have operations all
their own, available as methods—functions that are attached to and act upon a specific
object, which are triggered with a call expression.
For example, the string find method is the basic substring search operation (it returns
the offset of the passed-in substring, or −1 if it is not present), and the string replace
method performs global searches and replacements; both act on the subject that they
are attached to and called from:
>>> S = 'Spam'
>>> S.find('pa') # Find the offset of a substring in S
1
>>> S
'Spam'
>>> S.replace('pa', 'XYZ') # Replace occurrences of a string in S with another
'SXYZm'
>>> S
'Spam'
Again, despite the names of these string methods, we are not changing the original
strings here, but creating new strings as the results—because strings are immutable,
this is the only way this can work. String methods are the first line of text-processing
tools in Python. Other methods split a string into substrings on a delimiter (handy as
a simple form of parsing), perform case conversions, test the content of the string (digits,
letters, and so on), and strip whitespace characters off the ends of the string:
>>> line = 'aaa,bbb,ccccc,dd'
>>> line.split(',') # Split on a delimiter into a list of substrings
['aaa', 'bbb', 'ccccc', 'dd']
>>> S = 'spam'
>>> S.upper() # Upper- and lowercase conversions
'SPAM'
>>> S.isalpha() # Content tests: isalpha, isdigit, etc.
True
>>> line = 'aaa,bbb,ccccc,dd\n'
>>> line.rstrip() # Remove whitespace characters on the right side
'aaa,bbb,ccccc,dd'
>>> line.rstrip().split(',') # Combine two operations
['aaa', 'bbb', 'ccccc', 'dd']
Notice the last command here—it strips before it splits because Python runs from left
to right, making a temporary result along the way. Strings also support an advanced
substitution operation known as formatting, available as both an expression (the orig-
inal) and a string method call (new as of 2.6 and 3.0); the second of these allows you
to omit relative argument value numbers as of 2.7 and 3.1:
>>> '%s, eggs, and %s' % ('spam', 'SPAM!') # Formatting expression (all)
'spam, eggs, and SPAM!'
>>> '{0}, eggs, and {1}'.format('spam', 'SPAM!') # Formatting method (2.6+, 3.0+)
'spam, eggs, and SPAM!'
>>> '{}, eggs, and {}'.format('spam', 'SPAM!') # Numbers optional (2.7+, 3.1+)
'spam, eggs, and SPAM!'
Formatting is rich with features, which we’ll postpone discussing until later in this
book, and which tend to matter most when you must generate numeric reports:
>>> '{:,.2f}'.format(296999.2567) # Separators, decimal digits
'296,999.26'
>>> '%.2f | %+05d' % (3.14159, -42) # Digits, padding, signs
'3.14 | -0042'
One note here: although sequence operations are generic, methods are not—although
some types share some method names, string method operations generally work only
on strings, and nothing else. As a rule of thumb, Python’s toolset is layered: generic
operations that span multiple types show up as built-in functions or expressions (e.g.,
len(X), X[0]), but type-specific operations are method calls (e.g., aString.upper()).
Finding the tools you need among all these categories will become more natural as you
use Python more, but the next section gives a few tips you can use right now.
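To illustrate the layering rule of thumb just given, here is a brief sketch contrasting
the two categories:
>>> len('spam'), len([1, 2, 3]) # Generic: built-ins span many types
(4, 3)
>>> 'spam'.upper() # Type-specific: a string method
'SPAM'
>>> [1, 2, 3].upper() # Methods don't cross type boundaries
...error text omitted...
AttributeError: 'list' object has no attribute 'upper'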
Getting Help
The methods introduced in the prior section are a representative, but small, sample of
what is available for string objects. In general, this book is not exhaustive in its look at
object methods. For more details, you can always call the built-in dir function. This
function lists variables assigned in the caller’s scope when called with no argument;
more usefully, it returns a list of all the attributes available for any object passed to it.
Because methods are function attributes, they will show up in this list. Assuming S is
still the string, here are its attributes on Python 3.3 (Python 2.X varies slightly):
>>> dir(S)
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__',
'__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__',
'__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__',
'__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__',
'__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count',
'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index',
'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower',
'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust',
'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex',
'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith',
'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
You probably won’t care about the names with double underscores in this list until later
in the book, when we study operator overloading in classes—they represent the im-
plementation of the string object and are available to support customization. The
__add__ method of strings, for example, is what really performs concatenation; Python
maps the first of the following to the second internally, though you shouldn’t usually
use the second form yourself (it’s less intuitive, and might even run slower):
>>> S + 'NI!'
'spamNI!'
>>> S.__add__('NI!')
'spamNI!'
In general, leading and trailing double underscores are the naming pattern Python uses
for implementation details. The names without the underscores in this list are the
callable methods on string objects.
The dir function simply gives the methods’ names. To ask what they do, you can pass
them to the help function:
>>> help(S.replace)
Help on built-in function replace:
replace(...)
S.replace(old, new[, count]) -> str
Return a copy of S with all occurrences of substring
old replaced by new. If the optional argument count is
given, only the first count occurrences are replaced.
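As the help text describes, replace returns a new string rather than changing its subject in place; a quick sketch to verify, assuming S is still 'spam':
>>> S.replace('pa', 'XYZ')          # A new string; S itself is unchanged
'sXYZm'
>>> S
'spam'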
help is one of a handful of interfaces to a system of code that ships with Python known
as PyDoc—a tool for extracting documentation from objects. Later in the book, you’ll
see that PyDoc can also render its reports in HTML format for display on a web browser.
You can also ask for help on an entire string (e.g., help(S)), but you may get more or
less help than you want to see—information about every string method in older Py-
thons, and probably no help at all in newer versions because strings are treated specially.
It’s generally better to ask about a specific method.
Both dir and help also accept as arguments either a real object (like our string S), or
the name of a data type (like str, list, and dict). The latter form returns the same list
for dir but shows full type details for help, and allows you to ask about a specific method
via type name (e.g., help on str.replace).
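For example, a quick check verifies that the instance and its type name report the same attribute list:
>>> dir(S) == dir(str)              # Same attributes for instance and type name
True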
For more details, you can also consult Python’s standard library reference manual or
commercially published reference books, but dir and help are the first level of docu-
mentation in Python.
Other Ways to Code Strings
So far, we’ve looked at the string object’s sequence operations and type-specific meth-
ods. Python also provides a variety of ways for us to code strings, which we’ll explore
in greater depth later. For instance, special characters can be represented as backslash
escape sequences, which Python displays in \xNN hexadecimal escape notation, unless
they represent printable characters:
>>> S = 'A\nB\tC' # \n is end-of-line, \t is tab
>>> len(S) # Each stands for just one character
5
>>> ord('\n') # \n is a byte with the binary value 10 in ASCII
10
>>> S = 'A\0B\0C' # \0, a binary zero byte, does not terminate string
>>> len(S)
5
>>> S # Non-printables are displayed as \xNN hex escapes
'A\x00B\x00C'
Python allows strings to be enclosed in single or double quote characters—they mean
the same thing but allow the other type of quote to be embedded with an escape (most
programmers prefer single quotes). It also allows multiline string literals enclosed in
triple quotes (single or double)—when this form is used, all the lines are concatenated
together, and end-of-line characters are added where line breaks appear. This is a minor
syntactic convenience, but it’s useful for embedding things like multiline HTML, XML,
or JSON code in a Python script, and stubbing out lines of code temporarily—just add
three quotes above and below:
>>> msg = """
aaaaaaaaaaaaa
bbb'''bbbbbbbbbb""bbbbbbb'bbbb
cccccccccccccc
"""
>>> msg
'\naaaaaaaaaaaaa\nbbb\'\'\'bbbbbbbbbb""bbbbbbb\'bbbb\ncccccccccccccc\n'
Python also supports a raw string literal that turns off the backslash escape mechanism.
Such literals start with the letter r and are useful for strings like directory paths on
Windows (e.g., r'C:\text\new').
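Comparing lengths shows the difference: without the r, the \t and \n in this hypothetical path collapse into single escape characters:
>>> len('C:\text\new'), len(r'C:\text\new')   # 9 characters if escaped, 11 if raw
(9, 11)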
Unicode Strings
Python’s strings also come with full Unicode support required for processing text in
internationalized character sets. Characters in the Japanese and Russian alphabets, for
example, are outside the ASCII set. Such non-ASCII text can show up in web pages,
emails, GUIs, JSON, XML, or elsewhere. When it does, handling it well requires Uni-
code support. Python has such support built in, but the form of its Unicode support
varies per Python line, and is one of their most prominent differences.
In Python 3.X, the normal str string handles Unicode text (including ASCII, which is
just a simple kind of Unicode); a distinct bytes string type represents raw byte values
(including media and encoded text); and 2.X Unicode literals are supported in 3.3 and
later for 2.X compatibility (they are treated the same as normal 3.X str strings):
>>> 'sp\xc4m' # 3.X: normal str strings are Unicode text
'spÄm'
>>> b'a\x01c' # bytes strings are byte-based data
b'a\x01c'
>>> u'sp\u00c4m' # The 2.X Unicode literal works in 3.3+: just str
'spÄm'
In Python 2.X, the normal str string handles both 8-bit character strings (including
ASCII text) and raw byte values; a distinct unicode string type represents Unicode text;
and 3.X bytes literals are supported in 2.6 and later for 3.X compatibility (they are
treated the same as normal 2.X str strings):
>>> print u'sp\xc4m' # 2.X: Unicode strings are a distinct type
spÄm
>>> 'a\x01c' # Normal str strings contain byte-based text/data
'a\x01c'
>>> b'a\x01c' # The 3.X bytes literal works in 2.6+: just str
'a\x01c'
Formally, in both 2.X and 3.X, non-Unicode strings are sequences of 8-bit bytes that
print with ASCII characters when possible, and Unicode strings are sequences of Uni-
code code points—identifying numbers for characters, which do not necessarily map to
single bytes when encoded to files or stored in memory. In fact, the notion of bytes
doesn’t apply to Unicode: some encodings include character code points too large for
a byte, and even simple 7-bit ASCII text is not stored one byte per character under some
encodings and memory storage schemes:
>>> 'spam' # Characters may be 1, 2, or 4 bytes in memory
'spam'
>>> 'spam'.encode('utf8') # Encoded to 4 bytes in UTF-8 in files
b'spam'
>>> 'spam'.encode('utf16') # But encoded to 10 bytes in UTF-16
b'\xff\xfes\x00p\x00a\x00m\x00'
Both 3.X and 2.X also support the bytearray string type we met earlier, which is es-
sentially a bytes string (a str in 2.X) that supports most of the list object’s in-place
mutable change operations.
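For instance, here is a minimal sketch of bytearray's in-place changes, as run in Python 3.X (2.X output may vary slightly):
>>> B = bytearray(b'spam')          # A mutable sequence of small integers
>>> B.extend(b'eggs')               # Grows in place, like a list
>>> B
bytearray(b'spameggs')
>>> B.decode()                      # Translate back to a normal string
'spameggs'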
Both 3.X and 2.X also support coding non-ASCII characters with \x hexadecimal and
short \u and long \U Unicode escapes, as well as file-wide encodings declared in program
source files. Here’s our non-ASCII character coded three ways in 3.X (add a leading
“u” and say “print” to see the same in 2.X):
>>> 'sp\xc4\u00c4\U000000c4m'
'spÄÄÄm'
What these values mean and how they are used differs between text strings, which are
the normal string in 3.X and Unicode in 2.X, and byte strings, which are bytes in 3.X
and the normal string in 2.X. All these escapes can be used to embed actual Unicode
code-point ordinal-value integers in text strings. By contrast, byte strings use only \x
hexadecimal escapes to embed the encoded form of text, not its decoded code point
values—encoded bytes match code-point values only for certain encodings and
characters:
>>> '\u00A3', '\u00A3'.encode('latin1'), b'\xA3'.decode('latin1')
('£', b'\xa3', '£')
As a notable difference, Python 2.X allows its normal and Unicode strings to be mixed
in expressions as long as the normal string is all ASCII; in contrast, Python 3.X has a
tighter model that never allows its normal and byte strings to mix without explicit
conversion:
u'x' + b'y' # Works in 2.X (where b is optional and ignored)
u'x' + 'y' # Works in 2.X: u'xy'
u'x' + b'y' # Fails in 3.3 (where u is optional and ignored)
u'x' + 'y' # Works in 3.3: 'xy'
'x' + b'y'.decode() # Works in 3.X if decode bytes to str: 'xy'
'x'.encode() + b'y' # Works in 3.X if encode str to bytes: b'xy'
Apart from these string types, Unicode processing mostly reduces to transferring text
data to and from files—text is encoded to bytes when stored in a file, and decoded into
characters (a.k.a. code points) when read back into memory. Once it is loaded, we
usually process text as strings in decoded form only.
Because of this model, though, files are also content-specific in 3.X: text files implement
named encodings and accept and return str strings, but binary files instead deal in
bytes strings for raw binary data. In Python 2.X, normal files’ content is str bytes, and
a special codecs module handles Unicode and represents content with the unicode type.
We’ll meet Unicode again in the files coverage later in this chapter, but save the rest of
the Unicode story for later in this book. It crops up briefly in a Chapter 25 example in
conjunction with currency symbols, but for the most part is postponed until this book’s
advanced topics part. Unicode is crucial in some domains, but many programmers can
get by with just a passing acquaintance. If your data is all ASCII text, the string and file
stories are largely the same in 2.X and 3.X. And if you’re new to programming, you can
safely defer most Unicode details until you’ve mastered string basics.
Pattern Matching
One point worth noting before we move on is that none of the string object’s own
methods support pattern-based text processing. Text pattern matching is an advanced
tool outside this book’s scope, but readers with backgrounds in other scripting lan-
guages may be interested to know that to do pattern matching in Python, we import a
module called re. This module has analogous calls for searching, splitting, and re-
placement, but because we can use patterns to specify substrings, we can be much more
general:
>>> import re
>>> match = re.match('Hello[ \t]*(.*)world', 'Hello Python world')
>>> match.group(1)
'Python '
This example searches for a substring that begins with the word “Hello,” followed by
zero or more tabs or spaces, followed by arbitrary characters to be saved as a matched
group, terminated by the word “world.” If such a substring is found, portions of the
substring matched by parts of the pattern enclosed in parentheses are available as
groups. The following pattern, for example, picks out three groups separated by slashes,
and is similar to splitting by an alternatives pattern:
>>> match = re.match('[/:](.*)[/:](.*)[/:](.*)', '/usr/home:lumberjack')
>>> match.groups()
('usr', 'home', 'lumberjack')
>>> re.split('[/:]', '/usr/home/lumberjack')
['', 'usr', 'home', 'lumberjack']
Pattern matching is an advanced text-processing tool by itself, but there is also support
in Python for even more advanced text and language processing, including XML and
HTML parsing and natural language analysis. We’ll see additional brief examples of
patterns and XML parsing at the end of Chapter 37, but I’ve already said enough about
strings for this tutorial, so let’s move on to the next type.
Lists
The Python list object is the most general sequence provided by the language. Lists are
positionally ordered collections of arbitrarily typed objects, and they have no fixed size.
They are also mutable—unlike strings, lists can be modified in place by assignment to
offsets as well as a variety of list method calls. Accordingly, they provide a very flexible
tool for representing arbitrary collections—lists of files in a folder, employees in a
company, emails in your inbox, and so on.
Sequence Operations
Because they are sequences, lists support all the sequence operations we discussed for
strings; the only difference is that the results are usually lists instead of strings. For
instance, given a three-item list:
>>> L = [123, 'spam', 1.23] # A list of three different-type objects
>>> len(L) # Number of items in the list
3
we can index, slice, and so on, just as for strings:
>>> L[0] # Indexing by position
123
>>> L[:-1] # Slicing a list returns a new list
[123, 'spam']
>>> L + [4, 5, 6] # Concat/repeat make new lists too
[123, 'spam', 1.23, 4, 5, 6]
>>> L * 2
[123, 'spam', 1.23, 123, 'spam', 1.23]
>>> L # We're not changing the original list
[123, 'spam', 1.23]
Type-Specific Operations
Python’s lists may be reminiscent of arrays in other languages, but they tend to be more
powerful. For one thing, they have no fixed type constraint—the list we just looked at,
for example, contains three objects of completely different types (an integer, a string,
and a floating-point number). Further, lists have no fixed size. That is, they can grow
and shrink on demand, in response to list-specific operations:
>>> L.append('NI') # Growing: add object at end of list
>>> L
[123, 'spam', 1.23, 'NI']
>>> L.pop(2) # Shrinking: delete an item in the middle
1.23
>>> L # "del L[2]" deletes from a list too
[123, 'spam', 'NI']
Here, the list append method expands the list’s size and inserts an item at the end; the
pop method (or an equivalent del statement) then removes an item at a given offset,
causing the list to shrink. Other list methods insert an item at an arbitrary position
(insert), remove a given item by value (remove), add multiple items at the end
(extend), and so on. Because lists are mutable, most list methods also change the list
object in place, instead of creating a new one:
>>> M = ['bb', 'aa', 'cc']
>>> M.sort()
>>> M
['aa', 'bb', 'cc']
>>> M.reverse()
>>> M
['cc', 'bb', 'aa']
The list sort method here, for example, orders the list in ascending fashion by default,
and reverse reverses it—in both cases, the methods modify the list directly.
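The other methods named above work the same way, changing their subject list in place; a brief sketch:
>>> M = ['bb', 'aa', 'cc']
>>> M.insert(1, 'dd')               # Insert at an arbitrary position
>>> M.remove('aa')                  # Remove a given item by value
>>> M.extend(['ee', 'ff'])          # Add multiple items at the end
>>> M
['bb', 'dd', 'cc', 'ee', 'ff']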
Bounds Checking
Although lists have no fixed size, Python still doesn’t allow us to reference items that
are not present. Indexing off the end of a list is always a mistake, but so is assigning off
the end:
>>> L
[123, 'spam', 'NI']
>>> L[99]
...error text omitted...
IndexError: list index out of range
>>> L[99] = 1
...error text omitted...
IndexError: list assignment index out of range
This is intentional, as it’s usually an error to try to assign off the end of a list (and a
particularly nasty one in the C language, which doesn’t do as much error checking as
Python). Rather than silently growing the list in response, Python reports an error. To
grow a list, we call list methods such as append instead.
Nesting
One nice feature of Python’s core data types is that they support arbitrary nesting—we
can nest them in any combination, and as deeply as we like. For example, we can have
a list that contains a dictionary, which contains another list, and so on. One immediate
application of this feature is to represent matrixes, or “multidimensional arrays” in
Python. A list with nested lists will do the job for basic applications (you’ll get “...”
continuation-line prompts on lines 2 and 3 of the following in some interfaces, but not
in IDLE):
>>> M = [[1, 2, 3],                 # A 3 × 3 matrix, as nested lists
         [4, 5, 6],                 # Code can span lines if bracketed
         [7, 8, 9]]
>>> M
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Here, we’ve coded a list that contains three other lists. The effect is to represent a
3 × 3 matrix of numbers. Such a structure can be accessed in a variety of ways:
>>> M[1] # Get row 2
[4, 5, 6]
>>> M[1][2] # Get row 2, then get item 3 within the row
6
The first operation here fetches the entire second row, and the second grabs the third
item within that row (it runs left to right, like the earlier string strip and split). Stringing
together index operations takes us deeper and deeper into our nested-object structure.3

3. This matrix structure works for small-scale tasks, but for more serious number crunching you will
probably want to use one of the numeric extensions to Python, such as the open source NumPy and
SciPy systems. Such tools can store and process large matrixes much more efficiently than our nested list
structure. NumPy has been said to turn Python into the equivalent of a free and more powerful version
of the Matlab system, and organizations such as NASA, Los Alamos, JPL, and many others use this tool
for scientific and financial tasks. Search the Web for more details.
Comprehensions
In addition to sequence operations and list methods, Python includes a more advanced
operation known as a list comprehension expression, which turns out to be a powerful
way to process structures like our matrix. Suppose, for instance, that we need to extract
the second column of our sample matrix. It’s easy to grab rows by simple indexing
because the matrix is stored by rows, but it’s almost as easy to get a column with a list
comprehension:
>>> col2 = [row[1] for row in M] # Collect the items in column 2
>>> col2
[2, 5, 8]
>>> M # The matrix is unchanged
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
List comprehensions derive from set notation; they are a way to build a new list by
running an expression on each item in a sequence, one at a time, from left to right. List
comprehensions are coded in square brackets (to tip you off to the fact that they make
a list) and are composed of an expression and a looping construct that share a variable
name (row, here). The preceding list comprehension means basically what it says: “Give
me row[1] for each row in matrix M, in a new list.” The result is a new list containing
column 2 of the matrix.
List comprehensions can be more complex in practice:
>>> [row[1] + 1 for row in M] # Add 1 to each item in column 2
[3, 6, 9]
>>> [row[1] for row in M if row[1] % 2 == 0] # Filter out odd items
[2, 8]
The first operation here, for instance, adds 1 to each item as it is collected, and the
second uses an if clause to filter odd numbers out of the result using the % modulus
expression (remainder of division). List comprehensions make new lists of results, but
they can be used to iterate over any iterable object—a term we’ll flesh out later in this
preview. Here, for instance, we use list comprehensions to step over a hardcoded list
of coordinates and a string:
>>> diag = [M[i][i] for i in [0, 1, 2]] # Collect a diagonal from matrix
>>> diag
[1, 5, 9]
>>> doubles = [c * 2 for c in 'spam'] # Repeat characters in a string
>>> doubles
['ss', 'pp', 'aa', 'mm']
These expressions can also be used to collect multiple values, as long as we wrap those
values in a nested collection. The following illustrates using range—a built-in that gen-
erates successive integers, and requires a surrounding list to display all its values in
3.X only (2.X makes a physical list all at once):
>>> list(range(4)) # 0..3 (list() required in 3.X)
[0, 1, 2, 3]
>>> list(range(-6, 7, 2)) # -6 to +6 by 2 (need list() in 3.X)
[-6, -4, -2, 0, 2, 4, 6]
>>> [[x ** 2, x ** 3] for x in range(4)] # Multiple values, "if" filters
[[0, 0], [1, 1], [4, 8], [9, 27]]
>>> [[x, x / 2, x * 2] for x in range(-6, 7, 2) if x > 0]
[[2, 1.0, 4], [4, 2.0, 8], [6, 3.0, 12]]
As you can probably tell, list comprehensions, and relatives like the map and filter
built-in functions, are too involved to cover more formally in this preview chapter. The
main point of this brief introduction is to illustrate that Python includes both simple
and advanced tools in its arsenal. List comprehensions are an optional feature, but they
tend to be very useful in practice and often provide a substantial processing speed
advantage. They also work on any type that is a sequence in Python, as well as some
types that are not. You’ll hear much more about them later in this book.
As a preview, though, you’ll find that in recent Pythons, comprehension syntax has
been generalized for other roles: it’s not just for making lists today. For example, en-
closing a comprehension in parentheses can also be used to create generators that pro-
duce results on demand. To illustrate, the sum built-in sums items in a sequence—in
this example, summing all items in our matrix’s rows on request:
>>> G = (sum(row) for row in M) # Create a generator of row sums
>>> next(G) # iter(G) not required here
6
>>> next(G) # Run the iteration protocol next()
15
>>> next(G)
24
The map built-in can do similar work, by generating the results of running items through
a function, one at a time and on request. Like range, wrapping it in list forces it to
return all its values in Python 3.X; this isn’t needed in 2.X where map makes a list of
results all at once instead, and is not needed in other contexts that iterate automatically,
unless multiple scans or list-like behavior is also required:
>>> list(map(sum, M)) # Map sum over items in M
[6, 15, 24]
In Python 2.7 and 3.X, comprehension syntax can also be used to create sets and
dictionaries:
>>> {sum(row) for row in M} # Create a set of row sums
{24, 6, 15}
>>> {i : sum(M[i]) for i in range(3)} # Creates key/value table of row sums
{0: 6, 1: 15, 2: 24}
In fact, lists, sets, dictionaries, and generators can all be built with comprehensions in
3.X and 2.7:
>>> [ord(x) for x in 'spaam'] # List of character ordinals
[115, 112, 97, 97, 109]
>>> {ord(x) for x in 'spaam'} # Sets remove duplicates
{112, 97, 115, 109}
>>> {x: ord(x) for x in 'spaam'} # Dictionary keys are unique
{'p': 112, 'a': 97, 's': 115, 'm': 109}
>>> (ord(x) for x in 'spaam') # Generator of values
<generator object <genexpr> at 0x000000000254DAB0>
To understand objects like generators, sets, and dictionaries, though, we must move
ahead.
Dictionaries
Python dictionaries are something completely different (Monty Python reference in-
tended)—they are not sequences at all, but are instead known as mappings. Mappings
are also collections of other objects, but they store objects by key instead of by relative
position. In fact, mappings don’t maintain any reliable left-to-right order; they simply
map keys to associated values. Dictionaries, the only mapping type in Python’s core
objects set, are also mutable: like lists, they may be changed in place and can grow and
shrink on demand. Also like lists, they are a flexible tool for representing collections,
but their more mnemonic keys are better suited when a collection’s items are named or
labeled—fields of a database record, for example.
Mapping Operations
When written as literals, dictionaries are coded in curly braces and consist of a series
of “key: value” pairs. Dictionaries are useful anytime we need to associate a set of values
with keys—to describe the properties of something, for instance. As an example, con-
sider the following three-item dictionary (with keys “food,” “quantity,” and “color,”
perhaps the details of a hypothetical menu item?):
>>> D = {'food': 'Spam', 'quantity': 4, 'color': 'pink'}
We can index this dictionary by key to fetch and change the keys’ associated values.
The dictionary index operation uses the same syntax as that used for sequences, but
the item in the square brackets is a key, not a relative position:
>>> D['food'] # Fetch value of key 'food'
'Spam'
>>> D['quantity'] += 1 # Add 1 to 'quantity' value
>>> D
{'color': 'pink', 'food': 'Spam', 'quantity': 5}
Although the curly-braces literal form does see use, it is perhaps more common to see
dictionaries built up in different ways (it’s rare to know all your program’s data before
your program runs). The following code, for example, starts with an empty dictionary
and fills it out one key at a time. Unlike out-of-bounds assignments in lists, which are
forbidden, assignments to new dictionary keys create those keys:
>>> D = {}
>>> D['name'] = 'Bob' # Create keys by assignment
>>> D['job'] = 'dev'
>>> D['age'] = 40
>>> D
{'age': 40, 'job': 'dev', 'name': 'Bob'}
>>> print(D['name'])
Bob
Here, we’re effectively using dictionary keys as field names in a record that describes
someone. In other applications, dictionaries can also be used to replace searching op-
erations—indexing a dictionary by key is often the fastest way to code a search in
Python.
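For instance, indexing a hypothetical table by key replaces what would otherwise be a search loop (a minimal sketch; the names and dates here are illustrative only):
>>> table = {'Python': 1991, 'Java': 1995}   # Hypothetical lookup table
>>> table['Python']                          # Fetch by key: no search loop needed
1991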
As we’ll learn later, we can also make dictionaries by passing to the dict type name
either keyword arguments (a special name=value syntax in function calls), or the result
of zipping together sequences of keys and values obtained at runtime (e.g., from files).
Both the following make the same dictionary as the prior example and its equivalent
{} literal form, though the first tends to make for less typing:
>>> bob1 = dict(name='Bob', job='dev', age=40) # Keywords
>>> bob1
{'age': 40, 'name': 'Bob', 'job': 'dev'}
>>> bob2 = dict(zip(['name', 'job', 'age'], ['Bob', 'dev', 40])) # Zipping
>>> bob2
{'job': 'dev', 'name': 'Bob', 'age': 40}
Notice how the left-to-right order of dictionary keys is scrambled. Mappings are not
positionally ordered, so unless you’re lucky, they’ll come back in a different order than
you typed them. The exact order may vary per Python, but you shouldn’t depend on
it, and shouldn’t expect yours to match that in this book.
Nesting Revisited
In the prior example, we used a dictionary to describe a hypothetical person, with three
keys. Suppose, though, that the information is more complex. Perhaps we need to
record a first name and a last name, along with multiple job titles. This leads to another
application of Python’s object nesting in action. The following dictionary, coded all at
once as a literal, captures more structured information:
>>> rec = {'name': {'first': 'Bob', 'last': 'Smith'},
           'jobs': ['dev', 'mgr'],
           'age':  40.5}
Here, we again have a three-key dictionary at the top (keys “name,” “jobs,” and “age”),
but the values have become more complex: a nested dictionary for the name to support
multiple parts, and a nested list for the jobs to support multiple roles and future ex-
pansion. We can access the components of this structure much as we did for our list-
based matrix earlier, but this time most indexes are dictionary keys, not list offsets:
>>> rec['name'] # 'name' is a nested dictionary
{'last': 'Smith', 'first': 'Bob'}
>>> rec['name']['last'] # Index the nested dictionary
'Smith'
>>> rec['jobs'] # 'jobs' is a nested list
['dev', 'mgr']
>>> rec['jobs'][-1] # Index the nested list
'mgr'
>>> rec['jobs'].append('janitor') # Expand Bob's job description in place
>>> rec
{'age': 40.5, 'jobs': ['dev', 'mgr', 'janitor'], 'name': {'last': 'Smith',
'first': 'Bob'}}
Notice how the last operation here expands the nested jobs list—because the jobs list
is a separate piece of memory from the dictionary that contains it, it can grow and shrink
freely (object memory layout will be discussed further later in this book).
The real reason for showing you this example is to demonstrate the flexibility of
Python’s core data types. As you can see, nesting allows us to build up complex infor-
mation structures directly and easily. Building a similar structure in a low-level language
like C would be tedious and require much more code: we would have to lay out and
declare structures and arrays, fill out values, link everything together, and so on. In
Python, this is all automatic—running the expression creates the entire nested object
structure for us. In fact, this is one of the main benefits of scripting languages like
Python.
Just as importantly, in a lower-level language we would have to be careful to clean up
all of the object’s space when we no longer need it. In Python, when we lose the last
reference to the object—by assigning its variable to something else, for example—all
of the memory space occupied by that object’s structure is automatically cleaned up
for us:
>>> rec = 0 # Now the object's space is reclaimed
Technically speaking, Python has a feature known as garbage collection that cleans up
unused memory as your program runs and frees you from having to manage such details
in your code. In standard Python (a.k.a. CPython), the space is reclaimed immediately,
as soon as the last reference to an object is removed. We’ll study how this works later
in Chapter 6; for now, it’s enough to know that you can use objects freely, without
worrying about creating their space or cleaning up as you go.
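If you’re curious, standard CPython even lets you peek at an object’s reference count with the sys module; a minimal sketch, keeping in mind that the count includes a temporary reference made by the getrefcount call itself, so your numbers may vary:
>>> import sys
>>> L = [1, 2, 3]
>>> sys.getrefcount(L)              # References to the list, plus the call's own
2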
Also watch for a record structure similar to the one we just coded in Chapter 8, Chap-
ter 9, and Chapter 27, where we’ll use it to compare and contrast lists, dictionaries,
tuples, named tuples, and classes—an array of data structure options with tradeoffs
we’ll cover in full later.4

4. Two application notes here. First, as a preview, the rec record we just created really could be an actual
database record, when we employ Python’s object persistence system—an easy way to store native Python
objects in simple files or access-by-key databases, which translates objects to and from serial byte streams
automatically. We won’t go into details here, but watch for coverage of Python’s pickle and shelve
persistence modules in Chapter 9, Chapter 28, Chapter 31, and Chapter 37, where we’ll explore them in
the context of files, an OOP use case, classes, and 3.X changes, respectively.
Second, if you are familiar with JSON (JavaScript Object Notation)—an emerging data-interchange
format used for databases and network transfers—this example may also look curiously similar, though
Python’s support for variables, arbitrary expressions, and changes can make its data structures more
general. Python’s json library module supports creating and parsing JSON text, but the translation to
Python objects is often trivial. Watch for a JSON example that uses this record in Chapter 9 when we
study files. For a larger use case, see MongoDB, which stores data using a language-neutral binary-encoded
serialization of JSON-like documents, and its PyMongo interface.
Missing Keys: if Tests
As mappings, dictionaries support accessing items by key only, with the sorts of oper-
ations we’ve just seen. In addition, though, they also support type-specific operations
with method calls that are useful in a variety of common use cases. For example, al-
though we can assign to a new key to expand a dictionary, fetching a nonexistent key
is still a mistake:
>>> D = {'a': 1, 'b': 2, 'c': 3}
>>> D
{'a': 1, 'c': 3, 'b': 2}
>>> D['e'] = 99 # Assigning new keys grows dictionaries
>>> D
{'a': 1, 'c': 3, 'b': 2, 'e': 99}
>>> D['f'] # Referencing a nonexistent key is an error
...error text omitted...
KeyError: 'f'
This is what we want—it’s usually a programming error to fetch something that isn’t
really there. But in some generic programs, we can’t always know what keys will be
present when we write our code. How do we handle such cases and avoid errors? One
solution is to test ahead of time. The dictionary in membership expression allows us
to query the existence of a key and branch on the result with a Python if statement. In
the following, be sure to press Enter twice to run the if interactively after typing its
code (as explained in Chapter 3, an empty line means “go” at the interactive prompt),
and just as for the earlier multiline dictionaries and lists, the prompt changes to “...”
on some interfaces for lines two and beyond:
>>> 'f' in D
False
>>> if not 'f' in D:                # Python's sole selection statement
        print('missing')

missing
This book has more to say about the if statement in later chapters, but the form we’re
using here is straightforward: it consists of the word if, followed by an expression that
is interpreted as a true or false result, followed by a block of code to run if the test is
true. In its full form, the if statement can also have an else clause for a default case,
and one or more elif (“else if”) clauses for other tests. It’s the main selection statement
tool in Python; along with both its ternary if/else expression cousin (which we’ll meet
in a moment) and the if comprehension filter lookalike we saw earlier, it’s the way we
code the logic of choices and decisions in our scripts.
If you’ve used some other programming languages in the past, you might be wondering
how Python knows when the if statement ends. I’ll explain Python’s syntax rules in
depth in later chapters, but in short, if you have more than one action to run in a
statement block, you simply indent all their statements the same way—this both pro-
motes readable code and reduces the number of characters you have to type:
>>> if not 'f' in D:
        print('missing')
        print('no, really...')      # Statement blocks are indented

missing
no, really...
Besides the in test, there are a variety of ways to avoid accessing nonexistent keys in
the dictionaries we create: the get method, a conditional index with a default; the
Python 2.X has_key method, an in work-alike that is no longer available in 3.X; the
try statement, a tool we’ll first meet in Chapter 10 that catches and recovers from
exceptions altogether; and the if/else ternary (three-part) expression, which is essen-
tially an if statement squeezed onto a single line. Here are a few examples:
>>> value = D.get('x', 0) # Index but with a default
>>> value
0
>>> value = D['x'] if 'x' in D else 0 # if/else expression form
>>> value
0
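And as a preview, the try alternative looks like this (as for if, press Enter twice to run it interactively):
>>> try:
        value = D['x']              # Attempt the fetch
    except KeyError:                # Recover if the key is absent
        value = 0

>>> value
0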
We’ll save the details on such alternatives until a later chapter. For now, let’s turn to
another dictionary method’s role in a common use case.
Sorting Keys: for Loops
As mentioned earlier, because dictionaries are not sequences, they don’t maintain any
dependable left-to-right order. If we make a dictionary and print it back, its keys may
come back in a different order than that in which we typed them, and may vary per
Python version and other variables:
>>> D = {'a': 1, 'b': 2, 'c': 3}
>>> D
{'a': 1, 'c': 3, 'b': 2}
What do we do, though, if we do need to impose an ordering on a dictionary’s items?
One common solution is to grab a list of keys with the dictionary keys method, sort
that with the list sort method, and then step through the result with a Python for loop
(as for if, be sure to press the Enter key twice after coding the following for loop, and
omit the outer parenthesis in the print in Python 2.X):
>>> Ks = list(D.keys()) # Unordered keys list
>>> Ks # A list in 2.X, "view" in 3.X: use list()
['a', 'c', 'b']
>>> Ks.sort() # Sorted keys list
>>> Ks
['a', 'b', 'c']
>>> for key in Ks:                  # Iterate through sorted keys
        print(key, '=>', D[key])    # <== press Enter twice here (3.X print)

a => 1
b => 2
c => 3
This is a three-step process, although, as we’ll see in later chapters, in recent versions
of Python it can be done in one step with the newer sorted built-in function. The
sorted call returns the result and sorts a variety of object types, in this case sorting
dictionary keys automatically:
>>> D
{'a': 1, 'c': 3, 'b': 2}
>>> for key in sorted(D):
        print(key, '=>', D[key])

a => 1
b => 2
c => 3
Besides showcasing dictionaries, this use case serves to introduce the Python for loop.
The for loop is a simple and efficient way to step through all the items in a sequence
and run a block of code for each item in turn. A user-defined loop variable (key, here)
is used to reference the current item each time through. The net effect in our example
is to print the unordered dictionary’s keys and values, in sorted-key order.
The for loop, and its more general colleague the while loop, are the main ways we code
repetitive tasks as statements in our scripts. Really, though, the for loop, like its relative
the list comprehension introduced earlier, is a sequence operation. It works on any
object that is a sequence and, like the list comprehension, even on some things that are
not. Here, for example, it is stepping across the characters in a string, printing the
uppercase version of each as it goes:
>>> for c in 'spam':
        print(c.upper())

S
P
A
M
Python’s while loop is a more general sort of looping tool; it’s not limited to stepping
across sequences, but generally requires more code to do so:
>>> x = 4
>>> while x > 0:
        print('spam!' * x)
        x -= 1

spam!spam!spam!spam!
spam!spam!spam!
spam!spam!
spam!
We’ll discuss looping statements, syntax, and tools in depth later in the book. First,
though, I need to confess that this section has not been as forthcoming as it might have
been. Really, the for loop, and all its cohorts that step through objects from left to right,
are not just sequence operations, they are iterable operations—as the next section de-
scribes.
Iteration and Optimization
If the last section’s for loop looks like the list comprehension expression introduced
earlier, it should: both are really general iteration tools. In fact, both will work on any
iterable object that follows the iteration protocol—pervasive ideas in Python that un-
derlie all its iteration tools.
In a nutshell, an object is iterable if it is either a physically stored sequence in memory,
or an object that generates one item at a time in the context of an iteration operation
—a sort of “virtual” sequence. More formally, both types of objects are considered
iterable because they support the iteration protocol—they respond to the iter call with
an object that advances in response to next calls and raises an exception when finished
producing values.
The generator comprehension expression we saw earlier is such an object: its values
aren’t stored in memory all at once, but are produced as requested, usually by iteration
tools. Python file objects similarly iterate line by line when used by an iteration tool:
file content isn’t in a list, it’s fetched on demand. Both are iterable objects in Python—
a category that expands in 3.X to include core tools like range and map.
I’ll have more to say about the iteration protocol later in this book. For now, keep in
mind that every Python tool that scans an object from left to right uses the iteration
protocol. This is why the sorted call used in the prior section works on the dictionary
directly—we don’t have to call the keys method to get a sequence because dictionaries
are iterable objects, with a next that returns successive keys.
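You can run the protocol manually to watch the moving parts; a brief sketch (your dictionary’s key order may differ):
>>> D = {'a': 1, 'b': 2}
>>> I = iter(D)                     # Get an iterator from an iterable
>>> next(I)                         # Advance to successive keys
'a'
>>> next(I)
'b'
>>> next(I)                         # StopIteration means no more values
...error text omitted...
StopIteration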
It may also help you to see that any list comprehension expression, such as this one,
which computes the squares of a list of numbers:
>>> squares = [x ** 2 for x in [1, 2, 3, 4, 5]]
>>> squares
[1, 4, 9, 16, 25]
can always be coded as an equivalent for loop that builds the result list manually by
appending as it goes:
>>> squares = []
>>> for x in [1, 2, 3, 4, 5]:       # This is what a list comprehension does
        squares.append(x ** 2)      # Both run the iteration protocol internally

>>> squares
[1, 4, 9, 16, 25]
Both tools leverage the iteration protocol internally and produce the same result. The
list comprehension, though, and related functional programming tools like map and
filter, will often run faster than a for loop today on some types of code (perhaps even
twice as fast)—a property that could matter in your programs for large data sets. Having
said that, though, I should point out that performance measures are tricky business in
Python because it optimizes so much, and they may vary from release to release.
A major rule of thumb in Python is to code for simplicity and readability first and worry
about performance later, after your program is working, and after you’ve proved that
there is a genuine performance concern. More often than not, your code will be quick
enough as it is. If you do need to tweak code for performance, though, Python includes
tools to help you out, including the time and timeit modules for timing the speed of
alternatives, and the profile module for isolating bottlenecks.
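For a quick taste, the timeit module can time a statement run many times over; a minimal sketch (results vary per machine and Python):
>>> import timeit
>>> timeit.timeit('[x ** 2 for x in range(100)]', number=1000)
...a small float: total seconds for 1,000 runs, which varies per machine...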
You’ll find more on these later in this book (see especially Chapter 21’s benchmarking
case study) and in the Python manuals. For the sake of this preview, let’s move ahead
to the next core data type.
Tuples
The tuple object (pronounced “toople” or “tuhple,” depending on whom you ask) is
roughly like a list that cannot be changed—tuples are sequences, like lists, but they are
immutable, like strings. Functionally, they’re used to represent fixed collections of
items: the components of a specific calendar date, for instance. Syntactically, they are
normally coded in parentheses instead of square brackets, and they support arbitrary
types, arbitrary nesting, and the usual sequence operations:
>>> T = (1, 2, 3, 4) # A 4-item tuple
>>> len(T) # Length
4
>>> T + (5, 6) # Concatenation
(1, 2, 3, 4, 5, 6)
>>> T[0] # Indexing, slicing, and more
1
Tuples also have type-specific callable methods as of Python 2.6 and 3.0, but not nearly
as many as lists:
>>> T.index(4) # Tuple methods: 4 appears at offset 3
3
>>> T.count(4) # 4 appears once
1
The primary distinction for tuples is that they cannot be changed once created. That
is, they are immutable sequences (one-item tuples like the one here require a trailing
comma):
>>> T[0] = 2 # Tuples are immutable
...error text omitted...
TypeError: 'tuple' object does not support item assignment
>>> T = (2,) + T[1:] # Make a new tuple for a new value
>>> T
(2, 2, 3, 4)
Like lists and dictionaries, tuples support mixed types and nesting, but they don’t grow
and shrink because they are immutable (the parentheses enclosing a tuple’s items can
usually be omitted, as done here):
>>> T = 'spam', 3.0, [11, 22, 33]
>>> T[1]
3.0
>>> T[2][1]
22
>>> T.append(4)
AttributeError: 'tuple' object has no attribute 'append'
Why Tuples?
So, why have a type that is like a list, but supports fewer operations? Frankly, tuples
are not generally used as often as lists in practice, but their immutability is the whole
point. If you pass a collection of objects around your program as a list, it can be changed
anywhere; if you use a tuple, it cannot. That is, tuples provide a sort of integrity con-
straint that is convenient in programs larger than those we’ll write here. We’ll talk more
about tuples later in the book, including an extension that builds upon them called
named tuples. For now, though, let’s jump ahead to our last major core type: the file.
Files
File objects are Python code’s main interface to external files on your computer. They
can be used to read and write text memos, audio clips, Excel documents, saved email
messages, and whatever else you happen to have stored on your machine. Files are a
core type, but they’re something of an oddball—there is no specific literal syntax for
creating them. Rather, to create a file object, you call the built-in open function, passing
in an external filename and an optional processing mode as strings.
For example, to create a text output file, you would pass in its name and the 'w' pro-
cessing mode string to write data:
>>> f = open('data.txt', 'w') # Make a new file in output mode ('w' is write)
>>> f.write('Hello\n') # Write strings of characters to it
6
>>> f.write('world\n') # Return number of items written in Python 3.X
6
>>> f.close() # Close to flush output buffers to disk
This creates a file in the current directory and writes text to it (the filename can be a
full directory path if you need to access a file elsewhere on your computer). To read
back what you just wrote, reopen the file in 'r' processing mode, for reading text input
—this is the default if you omit the mode in the call. Then read the file’s content into
a string, and display it. A file’s contents are always a string in your script, regardless of
the type of data the file contains:
>>> f = open('data.txt') # 'r' (read) is the default processing mode
>>> text = f.read() # Read entire file into a string
>>> text
'Hello\nworld\n'
>>> print(text) # print interprets control characters
Hello
world
>>> text.split() # File content is always a string
['Hello', 'world']
Other file object methods support additional features we don’t have time to cover here.
For instance, file objects provide more ways of reading and writing (read accepts an
optional maximum byte/character size, readline reads one line at a time, and so on),
as well as other tools (seek moves to a new file position). As we’ll see later, though, the
best way to read a file today is to not read it at all—files provide an iterator that auto-
matically reads line by line in for loops and other contexts:
>>> for line in open('data.txt'): print(line)
We’ll meet the full set of file methods later in this book, but if you want a quick preview
now, run a dir call on any open file and a help on any of the method names that come
back:
>>> dir(f)
[ ...many names omitted...
'buffer', 'close', 'closed', 'detach', 'encoding', 'errors', 'fileno', 'flush',
'isatty', 'line_buffering', 'mode', 'name', 'newlines', 'read', 'readable',
'readline', 'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable',
'write', 'writelines']
>>> help(f.seek)
...try it and see...
Binary Bytes Files
The prior section’s examples illustrate file basics that suffice for many roles. Techni-
cally, though, they rely on either the platform’s Unicode encoding default in Python
3.X, or the 8-bit byte nature of files in Python 2.X. Text files always encode strings in
3.X, and blindly write string content in 2.X. This is irrelevant for the simple ASCII data
used previously, which maps to and from file bytes unchanged. But for richer types of
data, file interfaces can vary depending on both content and the Python line you use.
As hinted when we met strings earlier, Python 3.X draws a sharp distinction between
text and binary data in files: text files represent content as normal str strings and per-
form Unicode encoding and decoding automatically when writing and reading data,
while binary files represent content as a special bytes string and allow you to access file
content unaltered. Python 2.X supports the same dichotomy, but doesn’t impose it as
rigidly, and its tools differ.
For example, binary files are useful for processing media, accessing data created by C
programs, and so on. To illustrate, Python’s struct module can both create and unpack
packed binary data—raw bytes that record values that are not Python objects—to be
written to a file in binary mode. We’ll study this technique in detail later in the book,
but the concept is simple: the following creates a binary file in Python 3.X (binary files
work the same in 2.X, but the “b” string literal prefix isn’t required and won’t be dis-
played):
>>> import struct
>>> packed = struct.pack('>i4sh', 7, b'spam', 8) # Create packed binary data
>>> packed # 10 bytes, not objects or text
b'\x00\x00\x00\x07spam\x00\x08'
>>>
>>> file = open('data.bin', 'wb') # Open binary output file
>>> file.write(packed) # Write packed binary data
10
>>> file.close()
Reading binary data back is essentially symmetric; not all programs need to tread so
deeply into the low-level realm of bytes, but binary files make this easy in Python:
>>> data = open('data.bin', 'rb').read() # Open/read binary data file
>>> data # 10 bytes, unaltered
b'\x00\x00\x00\x07spam\x00\x08'
>>> data[4:8] # Slice bytes in the middle
b'spam'
>>> list(data) # A sequence of 8-bit bytes
[0, 0, 0, 7, 115, 112, 97, 109, 0, 8]
>>> struct.unpack('>i4sh', data) # Unpack into objects again
(7, b'spam', 8)
Unicode Text Files
Text files are used to process all sorts of text-based data, from memos to email content
to JSON and XML documents. In today’s broader interconnected world, though, we
can’t really talk about text without also asking “what kind?”—you must also know the
text’s Unicode encoding type if either it differs from your platform’s default, or you
can’t rely on that default for data portability reasons.
Luckily, this is easier than it may sound. To access files containing non-ASCII Unicode
text of the sort introduced earlier in this chapter, we simply pass in an encoding name
if the text in the file doesn’t match the default encoding for our platform. In this mode,
Python text files automatically encode on writes and decode on reads per the encoding
scheme name you provide. In Python 3.X:
>>> S = 'sp\xc4m' # Non-ASCII Unicode text
>>> S
'spÄm'
>>> S[2] # Sequence of characters
'Ä'
>>> file = open('unidata.txt', 'w', encoding='utf-8') # Write/encode UTF-8 text
>>> file.write(S) # 4 characters written
4
>>> file.close()
>>> text = open('unidata.txt', encoding='utf-8').read() # Read/decode UTF-8 text
>>> text
'spÄm'
>>> len(text) # 4 chars (code points)
4
This automatic encoding and decoding is what you normally want. Because files handle
this on transfers, you may process text in memory as a simple string of characters
without concern for its Unicode-encoded origins. If needed, though, you can also see
what’s truly stored in your file by stepping into binary mode:
>>> raw = open('unidata.txt', 'rb').read() # Read raw encoded bytes
>>> raw
b'sp\xc3\x84m'
>>> len(raw) # Really 5 bytes in UTF-8
5
You can also encode and decode manually if you get Unicode data from a source other
than a file—parsed from an email message or fetched over a network connection, for
example:
>>> text.encode('utf-8') # Manual encode to bytes
b'sp\xc3\x84m'
>>> raw.decode('utf-8') # Manual decode to str
'spÄm'
This is also useful to see how text files would automatically encode the same string
differently under different encoding names, and provides a way to translate data to
different encodings—it’s different bytes in files, but decodes to the same string in
memory if you provide the proper encoding name:
>>> text.encode('latin-1') # Bytes differ in others
b'sp\xc4m'
>>> text.encode('utf-16')
b'\xff\xfes\x00p\x00\xc4\x00m\x00'
>>> len(text.encode('latin-1')), len(text.encode('utf-16'))
(4, 10)
>>> b'\xff\xfes\x00p\x00\xc4\x00m\x00'.decode('utf-16') # But same string decoded
'spÄm'
This all works more or less the same in Python 2.X, but Unicode strings are coded and
display with a leading “u,” byte strings don’t require or show a leading “b,” and Unicode
text files must be opened with codecs.open, which accepts an encoding name just like
3.X’s open, and uses the special unicode string to represent content in memory. Binary
file mode may seem optional in 2.X since normal files are just byte-based data, but it’s
required to avoid changing line ends if present (more on this later in the book):
>>> import codecs
>>> codecs.open('unidata.txt', encoding='utf8').read() # 2.X: read/decode text
u'sp\xc4m'
>>> open('unidata.txt', 'rb').read() # 2.X: read raw bytes
'sp\xc3\x84m'
>>> open('unidata.txt').read() # 2.X: raw/undecoded too
'sp\xc3\x84m'
Although you won’t generally need to care about this distinction if you deal only with
ASCII text, Python’s strings and files are an asset if you deal with either binary data
(which includes most types of media) or text in internationalized character sets (which
includes most content on the Web and Internet at large today). Python also supports
non-ASCII file names (not just content), but it’s largely automatic; tools such as walkers
and listers offer more control when needed, though we’ll defer further details until
Chapter 37.
Other File-Like Tools
The open function is the workhorse for most file processing you will do in Python. For
more advanced tasks, though, Python comes with additional file-like tools: pipes,
FIFOs, sockets, keyed-access files, persistent object shelves, descriptor-based files, re-
lational and object-oriented database interfaces, and more. Descriptor files, for in-
stance, support file locking and other low-level tools, and sockets provide an interface
for networking and interprocess communication. We won’t cover many of these topics
in this book, but you’ll find them useful once you start programming Python in earnest.
Other Core Types
Beyond the core types we’ve seen so far, there are others that may or may not qualify
for membership in the category, depending on how broadly it is defined. Sets, for ex-
ample, are a recent addition to the language that are neither mappings nor sequences;
rather, they are unordered collections of unique and immutable objects. You create sets
by calling the built-in set function or using new set literals and expressions in 3.X and
2.7, and they support the usual mathematical set operations (the choice of new {...}
syntax for set literals makes sense, since sets are much like the keys of a valueless dic-
tionary):
>>> X = set('spam') # Make a set out of a sequence in 2.X and 3.X
>>> Y = {'h', 'a', 'm'} # Make a set with set literals in 3.X and 2.7
>>> X, Y # A tuple of two sets without parentheses
({'m', 'a', 'p', 's'}, {'m', 'a', 'h'})
>>> X & Y # Intersection
{'m', 'a'}
>>> X | Y # Union
{'m', 'h', 'a', 'p', 's'}
>>> X - Y # Difference
{'p', 's'}
>>> X > Y # Superset
False
>>> {n ** 2 for n in [1, 2, 3, 4]} # Set comprehensions in 3.X and 2.7
{16, 1, 4, 9}
Even less mathematically inclined programmers often find sets useful for common tasks
such as filtering out duplicates, isolating differences, and performing order-neutral
equality tests without sorting—in lists, strings, and all other iterable objects:
>>> list(set([1, 2, 1, 3, 1])) # Filtering out duplicates (possibly reordered)
[1, 2, 3]
>>> set('spam') - set('ham') # Finding differences in collections
{'p', 's'}
>>> set('spam') == set('asmp') # Order-neutral equality tests (== is False)
True
Sets also support in membership tests, though all other collection types in Python do
too:
>>> 'p' in set('spam'), 'p' in 'spam', 'ham' in ['eggs', 'spam', 'ham']
(True, True, True)
In addition, Python recently grew a few new numeric types: decimal numbers, which
are fixed-precision floating-point numbers, and fraction numbers, which are rational
numbers with both a numerator and a denominator. Both can be used to work around
the limitations and inherent inaccuracies of floating-point math:
>>> 1 / 3 # Floating-point (add a .0 in Python 2.X)
0.3333333333333333
>>> (2/3) + (1/2)
1.1666666666666665
>>> import decimal # Decimals: fixed precision
>>> d = decimal.Decimal('3.141')
>>> d + 1
Decimal('4.141')
>>> decimal.getcontext().prec = 2
>>> decimal.Decimal('1.00') / decimal.Decimal('3.00')
Decimal('0.33')
>>> from fractions import Fraction # Fractions: numerator+denominator
>>> f = Fraction(2, 3)
>>> f + 1
Fraction(5, 3)
>>> f + Fraction(1, 2)
Fraction(7, 6)
Python also comes with Booleans (with predefined True and False objects that are es-
sentially just the integers 1 and 0 with custom display logic), and it has long supported
a special placeholder object called None commonly used to initialize names and objects:
>>> 1 > 2, 1 < 2 # Booleans
(False, True)
>>> bool('spam') # Object's Boolean value
True
>>> X = None # None placeholder
>>> print(X)
None
>>> L = [None] * 100 # Initialize a list of 100 Nones
>>> L
[None, None, None, None, None, None, None, None, None, None, None, None,
None, None, None, None, None, None, None, None, ...a list of 100 Nones...]
How to Break Your Code’s Flexibility
I’ll have more to say about all of Python’s object types later, but one merits special
treatment here. The type object, returned by the type built-in function, is an object that
gives the type of another object; its result differs slightly in 3.X, because types have
merged with classes completely (something we’ll explore in the context of “new-style”
classes in Part VI). Assuming L is still the list of the prior section:
# In Python 2.X:
>>> type(L) # Types: type of L is list type object
<type 'list'>
>>> type(type(L)) # Even types are objects
<type 'type'>
# In Python 3.X:
>>> type(L) # 3.X: types are classes, and vice versa
<class 'list'>
>>> type(type(L)) # See Chapter 32 for more on class types
<class 'type'>
Besides allowing you to explore your objects interactively, the type object in its most
practical application allows code to check the types of the objects it processes. In fact,
there are at least three ways to do so in a Python script:
>>> if type(L) == type([]):         # Type testing, if you must...
        print('yes')

yes
>>> if type(L) == list:             # Using the type name
        print('yes')

yes
>>> if isinstance(L, list):         # Object-oriented tests
        print('yes')

yes
Now that I’ve shown you all these ways to do type testing, however, I am required by
law to tell you that doing so is almost always the wrong thing to do in a Python program
(and often a sign of an ex-C programmer first starting to use Python!). The reason why
won’t become completely clear until later in the book, when we start writing larger
code units such as functions, but it’s a (perhaps the) core Python concept. By checking
for specific types in your code, you effectively break its flexibility—you limit it to
working on just one type. Without such tests, your code may be able to work on a
whole range of types.
This is related to the idea of polymorphism mentioned earlier, and it stems from
Python’s lack of type declarations. As you’ll learn, in Python, we code to object inter-
faces (operations supported), not to types. That is, we care what an object does, not
what it is. Not caring about specific types means that code is automatically applicable
to many of them—any object with a compatible interface will work, regardless of its
specific type. Although type checking is supported—and even required in some rare
cases—you’ll see that it’s not usually the “Pythonic” way of thinking. In fact, you’ll
find that polymorphism is probably the key idea behind using Python well.
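To make this tradeoff concrete, here is a minimal sketch (the function name total_len is invented for illustration). Because the function codes to an interface (len and iteration) rather than to a type, it works on many kinds of objects without change:
>>> def total_len(items):            # No type tests: any iterable of sized items will do
        return sum(len(x) for x in items)

>>> total_len(['spam', 'ham'])       # Works on lists...
7
>>> total_len(('spam', 'ham'))       # ...tuples...
7
>>> total_len({'spam', 'ham'})       # ...and sets, with no code changes
7
Had this function insisted on lists with an isinstance test, the last two calls would have failed needlessly.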
User-Defined Classes
We’ll study object-oriented programming in Python—an optional but powerful feature
of the language that cuts development time by supporting programming by customi-
zation—in depth later in this book. In abstract terms, though, classes define new types
of objects that extend the core set, so they merit a passing glance here. Say, for example,
that you wish to have a type of object that models employees. Although there is no such
specific core type in Python, the following user-defined class might fit the bill:
>>> class Worker:
        def __init__(self, name, pay):          # Initialize when created
            self.name = name                    # self is the new object
            self.pay = pay
        def lastName(self):
            return self.name.split()[-1]        # Split string on blanks
        def giveRaise(self, percent):
            self.pay *= (1.0 + percent)         # Update pay in place
This class defines a new kind of object that will have name and pay attributes (sometimes
called state information), as well as two bits of behavior coded as functions (normally
called methods). Calling the class like a function generates instances of our new type,
and the class’s methods automatically receive the instance being processed by a given
method call (in the self argument):
>>> bob = Worker('Bob Smith', 50000) # Make two instances
>>> sue = Worker('Sue Jones', 60000) # Each has name and pay attrs
>>> bob.lastName() # Call method: bob is self
'Smith'
>>> sue.lastName() # sue is the self subject
'Jones'
>>> sue.giveRaise(.10) # Updates sue's pay
>>> sue.pay
66000.0
The implied “self” object is why we call this an object-oriented model: there is always
an implied subject in functions within a class. In a sense, though, the class-based type
simply builds on and uses core types—a user-defined Worker object here, for example,
is just a collection of a string and a number (name and pay, respectively), plus functions
for processing those two built-in objects.
The larger story of classes is that their inheritance mechanism supports software hier-
archies that lend themselves to customization by extension. We extend software by
writing new classes, not by changing what already works. You should also know that
classes are an optional feature of Python, and simpler built-in types such as lists and
dictionaries are often better tools than user-coded classes. This is all well beyond the
bounds of our introductory object-type tutorial, though, so consider this just a preview;
for full disclosure on user-defined types coded with classes, you’ll have to read on.
Because classes build upon other tools in Python, they are one of the major goals of
this book’s journey.
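Still, as a tiny preview of that extension model, here is a hypothetical Manager subclass (illustrative only, and not something later chapters build on): it redefines just the method it wants to change, and inherits the rest from Worker:
>>> class Manager(Worker):                        # Inherit Worker's methods
        def giveRaise(self, percent):             # Redefine just this one
            Worker.giveRaise(self, percent + .10) # Customize: add a bonus

>>> tom = Manager('Tom Doe', 50000)
>>> tom.lastName()                    # lastName is inherited unchanged
'Doe'
>>> tom.giveRaise(.10)                # Runs the customized version: .20 total
>>> tom.pay
60000.0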
And Everything Else
As mentioned earlier, everything you can process in a Python script is a type of object,
so our object type tour is necessarily incomplete. However, even though everything in
Python is an “object,” only those types of objects we’ve met so far are considered part
of Python’s core type set. Other types in Python either are objects related to program
execution (like functions, modules, classes, and compiled code), which we will study
later, or are implemented by imported module functions, not language syntax. The
latter of these also tend to have application-specific roles—text patterns, database in-
terfaces, network connections, and so on.
Moreover, keep in mind that the objects we’ve met here are objects, but not necessarily
object-oriented—a concept that usually requires inheritance and the Python class
statement, which we’ll meet again later in this book. Still, Python’s core objects are the
workhorses of almost every Python script you’re likely to meet, and they usually are
the basis of larger noncore types.
Chapter Summary
And that’s a wrap for our initial data type tour. This chapter has offered a brief intro-
duction to Python’s core object types and the sorts of operations we can apply to them.
We’ve studied generic operations that work on many object types (sequence operations
such as indexing and slicing, for example), as well as type-specific operations available
as method calls (for instance, string splits and list appends). We’ve also defined some
key terms, such as immutability, sequences, and polymorphism.
Along the way, we’ve seen that Python’s core object types are more flexible and pow-
erful than what is available in lower-level languages such as C. For instance, Python’s
lists and dictionaries obviate most of the work you do to support collections and
searching in lower-level languages. Lists are ordered collections of other objects, and
dictionaries are collections of other objects that are indexed by key instead of by posi-
tion. Both dictionaries and lists may be nested, can grow and shrink on demand, and
may contain objects of any type. Moreover, their space is automatically cleaned up as
you go. We’ve also seen that strings and files work hand in hand to support a rich
variety of binary and text data.
I’ve skipped most of the details here in order to provide a quick tour, so you shouldn’t
expect all of this chapter to have made sense yet. In the next few chapters we’ll start to
dig deeper, taking a second pass over Python’s core object types that will fill in details
omitted here, and give you a deeper understanding. We’ll start off the next chapter
with an in-depth look at Python numbers. First, though, here is another quiz to review.
Test Your Knowledge: Quiz
We’ll explore the concepts introduced in this chapter in more detail in upcoming
chapters, so we’ll just cover the big ideas here:
1. Name four of Python’s core data types.
2. Why are they called “core” data types?
3. What does “immutable” mean, and which three of Python’s core types are con-
sidered immutable?
4. What does “sequence” mean, and which three types fall into that category?
5. What does “mapping” mean, and which core type is a mapping?
6. What is “polymorphism,” and why should you care?
Test Your Knowledge: Answers
1. Numbers, strings, lists, dictionaries, tuples, files, and sets are generally considered
to be the core object (data) types. Types, None, and Booleans are sometimes clas-
sified this way as well. There are multiple number types (integer, floating point,
complex, fraction, and decimal) and multiple string types (simple strings and Uni-
code strings in Python 2.X, and text strings and byte strings in Python 3.X).
2. They are known as “core” types because they are part of the Python language itself
and are always available; to create other objects, you generally must call functions
in imported modules. Most of the core types have specific syntax for generating
the objects: 'spam', for example, is an expression that makes a string and deter-
mines the set of operations that can be applied to it. Because of this, core types are
hardwired into Python’s syntax. In contrast, you must call the built-in open function
to create a file object (even though this is usually considered a core type too).
3. An “immutable” object is an object that cannot be changed after it is created.
Numbers, strings, and tuples in Python fall into this category. While you cannot
change an immutable object in place, you can always make a new one by running
an expression. Bytearrays in recent Pythons offer mutability for text, but they are
not normal strings, and only apply directly to text if it’s a simple 8-bit kind (e.g.,
ASCII).
4. A “sequence” is a positionally ordered collection of objects. Strings, lists, and tuples
are all sequences in Python. They share common sequence operations, such as
indexing, concatenation, and slicing, but also have type-specific method calls. A
related term, “iterable,” means either a physical sequence, or a virtual one that
produces its items on request.
5. The term “mapping” denotes an object that maps keys to associated values.
Python’s dictionary is the only mapping type in the core type set. Mappings do not
maintain any left-to-right positional ordering; they support access to data stored
by key, plus type-specific method calls.
6. “Polymorphism” means that the meaning of an operation (like a +) depends on the
objects being operated on. This turns out to be a key idea (perhaps the key idea)
behind using Python well—not constraining code to specific types makes that code
automatically applicable to many types.
CHAPTER 5
Numeric Types
This chapter begins our in-depth tour of the Python language. In Python, data takes
the form of objects—either built-in objects that Python provides, or objects we create
using Python tools and other languages such as C. In fact, objects are the basis of every
Python program you will ever write. Because they are the most fundamental notion in
Python programming, objects are also our first focus in this book.
In the preceding chapter, we took a quick pass over Python’s core object types. Al-
though essential terms were introduced in that chapter, we avoided covering too many
specifics in the interest of space. Here, we’ll begin a more careful second look at data
type concepts, to fill in details we glossed over earlier. Let’s get started by exploring
our first data type category: Python’s numeric types and operations.
Numeric Type Basics
Most of Python’s number types are fairly typical and will probably seem familiar if
you’ve used almost any other programming language in the past. They can be used to
keep track of your bank balance, the distance to Mars, the number of visitors to your
website, and just about any other numeric quantity.
In Python, numbers are not really a single object type, but a category of similar types.
Python supports the usual numeric types (integers and floating points), as well as literals
for creating numbers and expressions for processing them. In addition, Python provides
more advanced numeric programming support and objects for more advanced work.
A complete inventory of Python’s numeric toolbox includes:
• Integer and floating-point objects
• Complex number objects
• Decimal: fixed-precision objects
• Fraction: rational number objects
• Sets: collections with numeric operations
• Booleans: true and false
• Built-in functions and modules: round, math, random, etc.
• Expressions; unlimited integer precision; bitwise operations; hex, octal, and binary formats
• Third-party extensions: vectors, libraries, visualization, plotting, etc.
Because the types in this list’s first bullet item tend to see the most action in Python
code, this chapter starts with basic numbers and fundamentals, then moves on to ex-
plore the other types on this list, which serve specialized roles. We’ll also study sets
here, which have both numeric and collection qualities, but are generally considered
more the former than the latter. Before we jump into code, though, the next few sections
get us started with a brief overview of how we write and process numbers in our scripts.
Numeric Literals
Among its basic types, Python provides integers, which are positive and negative whole
numbers, and floating-point numbers, which are numbers with a fractional part (some-
times called “floats” for verbal economy). Python also allows us to write integers using
hexadecimal, octal, and binary literals; offers a complex number type; and allows in-
tegers to have unlimited precision—they can grow to have as many digits as your mem-
ory space allows. Table 5-1 shows what Python’s numeric types look like when written
out in a program as literals or constructor function calls.
Table 5-1. Numeric literals and constructors
Literal                                      Interpretation
1234, -24, 0, 99999999999999                 Integers (unlimited size)
1.23, 1., 3.14e-10, 4E210, 4.0e+210          Floating-point numbers
0o177, 0x9ff, 0b101010                       Octal, hex, and binary literals in 3.X
0177, 0o177, 0x9ff, 0b101010                 Octal, octal, hex, and binary literals in 2.X
3+4j, 3.0+4.0j, 3J                           Complex number literals
set('spam'), {1, 2, 3, 4}                    Sets: 2.X and 3.X construction forms
Decimal('1.0'), Fraction(1, 3)               Decimal and fraction extension types
bool(X), True, False                         Boolean type and constants
In general, Python’s numeric type literals are straightforward to write, but a few coding
concepts are worth highlighting here:
Integer and floating-point literals
Integers are written as strings of decimal digits. Floating-point numbers have a
decimal point and/or an exponent introduced by an e or E, followed by an optional
sign and the exponent's digits. If you write a number with a decimal point or expo-
nent, Python makes it a floating-point object and uses floating-point (not integer)
math when the object is used in an expression. Floating-point numbers are imple-
mented as C “doubles” in standard CPython, and therefore get as much precision
as the C compiler used to build the Python interpreter gives to doubles.
Integers in Python 2.X: normal and long
In Python 2.X there are two integer types, normal (often 32 bits) and long (un-
limited precision), and an integer may end in an l or L to force it to become a long
integer. Because integers are automatically converted to long integers when their
values overflow their allocated bits, you never need to type the letter L yourself—
Python automatically converts up to long integer when extra precision is needed.
Integers in Python 3.X: a single type
In Python 3.X, the normal and long integer types have been merged—there is only
integer, which automatically supports the unlimited precision of Python 2.X’s sep-
arate long integer type. Because of this, integers can no longer be coded with a
trailing l or L, and integers never print with this character either. Apart from this,
most programs are unaffected by this change, unless they do type testing that
checks for 2.X long integers.
Hexadecimal, octal, and binary literals
Integers may be coded in decimal (base 10), hexadecimal (base 16), octal (base 8),
or binary (base 2), the last three of which are common in some programming do-
mains. Hexadecimals start with a leading 0x or 0X, followed by a string of hexa-
decimal digits (0–9 and A–F). Hex digits may be coded in lower- or uppercase. Octal
literals start with a leading 0o or 0O (zero and lower- or uppercase letter o), followed
by a string of digits (0–7). In 2.X, octal literals can also be coded with just a leading
0, but not in 3.X—this original octal form is too easily confused with decimal, and
is replaced by the new 0o format, which can also be used in 2.X as of 2.6. Binary
literals, new as of 2.6 and 3.0, begin with a leading 0b or 0B, followed by binary
digits (0–1).
Note that all of these literals produce integer objects in program code; they are just
alternative syntaxes for specifying values. The built-in calls hex(I), oct(I), and
bin(I) convert an integer to its representation string in these three bases, and
int(str, base) converts a runtime string to an integer per a given base (see the
quick preview after these notes).
Complex numbers
Python complex literals are written as realpart+imaginarypart, where the imagi
narypart is terminated with a j or J. The realpart is technically optional, so the
imaginarypart may appear on its own. Internally, complex numbers are imple-
mented as pairs of floating-point numbers, but all numeric operations perform
complex math when applied to complex numbers. Complex numbers may also be
created with the complex(real, imag) built-in call.
Coding other numeric types
As we’ll see later in this chapter, there are additional numeric types at the end of
Table 5-1 that serve more advanced or specialized roles. You create some of these
by calling functions in imported modules (e.g., decimals and fractions), and others
have literal syntax all their own (e.g., sets).
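And here is the quick preview promised earlier: the base-conversion built-ins and the complex constructor at work (run in Python 3.X here; in 2.X, oct returns the older '0377' form):
>>> hex(255), oct(255), bin(255)                     # Integers to digit strings
('0xff', '0o377', '0b11111111')
>>> int('ff', 16), int('377', 8), int('11111111', 2) # Digit strings to integers
(255, 255, 255)
>>> complex(3, 4)                                    # Complex from real and imaginary parts
(3+4j)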
Built-in Numeric Tools
Besides the built-in number literals and construction calls shown in Table 5-1, Python
provides a set of tools for processing number objects:
Expression operators
+, -, *, /, >>, **, &, etc.
Built-in mathematical functions
pow, abs, round, int, hex, bin, etc.
Utility modules
random, math, etc.
We’ll meet all of these as we go along.
Although numbers are primarily processed with expressions, built-ins, and modules,
they also have a handful of type-specific methods today, which we’ll meet in this chapter
as well. Floating-point numbers, for example, have an as_integer_ratio method that
is useful for the fraction number type, and an is_integer method to test if the number
is an integer. Integers have various attributes, including a new bit_length method in-
troduced in Python 3.1 that gives the number of bits necessary to represent the object’s
value. Moreover, as part collection and part number, sets also support both methods
and expressions.
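For instance, here is a quick sketch of these methods at work (bit_length is available only as of Python 3.1 and 2.7):
>>> (2.5).as_integer_ratio()                  # Float's numerator/denominator pair
(5, 2)
>>> (2.5).is_integer(), (2.0).is_integer()    # Any fractional part?
(False, True)
>>> (255).bit_length()                        # Bits needed to represent 11111111
8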
Since expressions are the most essential tool for most number types, though, let’s turn
to them next.
Python Expression Operators
Perhaps the most fundamental tool that processes numbers is the expression: a com-
bination of numbers (or other objects) and operators that computes a value when ex-
ecuted by Python. In Python, you write expressions using the usual mathematical no-
tation and operator symbols. For instance, to add two numbers X and Y you would say
X + Y, which tells Python to apply the + operator to the values named by X and Y. The
result of the expression is the sum of X and Y, another number object.
Table 5-2 lists all the operator expressions available in Python. Many are self-explana-
tory; for instance, the usual mathematical operators (+, -, *, /, and so on) are supported.
A few will be familiar if you’ve used other languages in the past: % computes a division
remainder, << performs a bitwise left-shift, & computes a bitwise AND result, and so
on. Others are more Python-specific, and not all are numeric in nature: for example,
the is operator tests object identity (i.e., address in memory, a strict form of equality),
and lambda creates unnamed functions.
Table 5-2. Python expression operators and precedence
Operators                                    Description
yield x                                      Generator function send protocol
lambda args: expression                      Anonymous function generation
x if y else z                                Ternary selection (x is evaluated only if y is true)
x or y                                       Logical OR (y is evaluated only if x is false)
x and y                                      Logical AND (y is evaluated only if x is true)
not x                                        Logical negation
x in y, x not in y                           Membership (iterables, sets)
x is y, x is not y                           Object identity tests
x < y, x <= y, x > y, x >= y,                Magnitude comparison, set subset and superset;
x == y, x != y                               value equality operators
x | y                                        Bitwise OR, set union
x ^ y                                        Bitwise XOR, set symmetric difference
x & y                                        Bitwise AND, set intersection
x << y, x >> y                               Shift x left or right by y bits
x + y, x - y                                 Addition, concatenation; subtraction, set difference
x * y, x % y,                                Multiplication, repetition; remainder, format;
x / y, x // y                                division: true and floor
-x, +x                                       Negation, identity
~x                                           Bitwise NOT (inversion)
x ** y                                       Power (exponentiation)
x[i]                                         Indexing (sequence, mapping, others)
x[i:j:k]                                     Slicing
x(...)                                       Call (function, method, class, other callable)
x.attr                                       Attribute reference
(...)                                        Tuple, expression, generator expression
[...]                                        List, list comprehension
{...}                                        Dictionary, set, set and dictionary comprehensions
Since this book addresses both Python 2.X and 3.X, here are some notes about version
differences and recent additions related to the operators in Table 5-2:
• In Python 2.X, value inequality can be written as either X != Y or X <> Y. In Python 3.X, the latter of these options is removed because it is redundant. In either version, best practice is to use X != Y for all value inequality tests.
• In Python 2.X, a backquotes expression `X` works the same as repr(X) and converts objects to display strings. Due to its obscurity, this expression is removed in Python 3.X; use the more readable str and repr built-in functions, described in "Numeric Display Formats."
• The X // Y floor division expression always truncates fractional remainders in both Python 2.X and 3.X. The X / Y expression performs true division in 3.X (retaining remainders) and classic division in 2.X (truncating for integers). See "Division: Classic, Floor, and True" on page 146.
• The syntax [...] is used for both list literals and list comprehension expressions. The latter of these performs an implied loop and collects expression results in a new list. See Chapter 4, Chapter 14, and Chapter 20 for examples.
• The syntax (...) is used for tuples and expression grouping, as well as generator expressions—a form of list comprehension that produces results on demand, instead of building a result list. See Chapter 4 and Chapter 20 for examples. The parentheses may sometimes be omitted in all three contexts.
• The syntax {...} is used for dictionary literals, and in Python 3.X and 2.7 for set literals and both dictionary and set comprehensions. See the set coverage in this chapter as well as Chapter 4, Chapter 8, Chapter 14, and Chapter 20 for examples.
• The yield and ternary if/else selection expressions are available in Python 2.5 and later. The former returns send(...) arguments in generators; the latter is shorthand for a multiline if statement. yield requires parentheses if not alone on the right side of an assignment statement.
• Comparison operators may be chained: X < Y < Z produces the same result as X < Y and Y < Z. See "Comparisons: Normal and Chained" on page 144 for details.
• In recent Pythons, the slice expression X[I:J:K] is equivalent to indexing with a slice object: X[slice(I, J, K)].
• In Python 2.X, magnitude comparisons of mixed types are allowed: numbers are converted to a common type, and other mixed types are ordered according to their type names. In Python 3.X, nonnumeric mixed-type magnitude comparisons are not allowed and raise exceptions; this includes sorts by proxy.
• Magnitude comparisons for dictionaries are also no longer supported in Python 3.X (though equality tests are); comparing sorted(aDict.items()) is one possible replacement.
We’ll see most of the operators in Table 5-2 in action later; first, though, we need to
take a quick look at the ways these operators may be combined in expressions.
Mixed operators follow operator precedence
As in most languages, in Python, you code more complex expressions by stringing
together the operator expressions in Table 5-2. For instance, the sum of two multipli-
cations might be written as a mix of variables and operators:
A * B + C * D
So, how does Python know which operation to perform first? The answer to this ques-
tion lies in operator precedence. When you write an expression with more than one
operator, Python groups its parts according to what are called precedence rules, and
this grouping determines the order in which the expression’s parts are computed.
Table 5-2 is ordered by operator precedence:
• Operators lower in the table have higher precedence, and so bind more tightly in mixed expressions.
• Operators in the same row in Table 5-2 generally group from left to right when combined (except for exponentiation, which groups right to left, and comparisons, which chain left to right).
For example, if you write X + Y * Z, Python evaluates the multiplication first (Y *
Z), then adds that result to X because * has higher precedence (is lower in the table)
than +. Similarly, in this section’s original example, both multiplications (A * B and C
* D) will happen before their results are added.
Parentheses group subexpressions
You can forget about precedence completely if you’re careful to group parts of expres-
sions with parentheses. When you enclose subexpressions in parentheses, you override
Python’s precedence rules; Python always evaluates expressions in parentheses first
before using their results in the enclosing expressions.
For instance, instead of coding X + Y * Z, you could write one of the following to force
Python to evaluate the expression in the desired order:
(X + Y) * Z
X + (Y * Z)
In the first case, + is applied to X and Y first, because this subexpression is wrapped in
parentheses. In the second case, the * is performed first (just as if there were no paren-
theses at all). Generally speaking, adding parentheses in large expressions is a good
idea—it not only forces the evaluation order you want, but also aids readability.
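For instance, here are both rules at work on real numbers; nothing here is new syntax, just grouping:
>>> 1 + 2 * 3        # * binds tighter: same as 1 + (2 * 3)
7
>>> (1 + 2) * 3      # Parentheses force the addition first
9
>>> 2 ** 3 ** 2      # ** groups right to left: same as 2 ** (3 ** 2)
512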
Mixed types are converted up
Besides mixing operators in expressions, you can also mix numeric types. For instance,
you can add an integer to a floating-point number:
40 + 3.14
But this leads to another question: what type is the result—integer or floating point?
The answer is simple, especially if you’ve used almost any other language before: in
mixed-type numeric expressions, Python first converts operands up to the type of the
most complicated operand, and then performs the math on same-type operands. This
behavior is similar to type conversions in the C language.
Python ranks the complexity of numeric types like so: integers are simpler than floating-
point numbers, which are simpler than complex numbers. So, when an integer is mixed
with a floating point, as in the preceding example, the integer is converted up to a
floating-point value first, and floating-point math yields the floating-point result:
>>> 40 + 3.14 # Integer to float, float math/result
43.14
Similarly, any mixed-type expression where one operand is a complex number results
in the other operand being converted up to a complex number, and the expression
yields a complex result. In Python 2.X, normal integers are also converted to long in-
tegers whenever their values are too large to fit in a normal integer; in 3.X, integers
subsume longs entirely.
You can force the issue by calling built-in functions to convert types manually:
>>> int(3.1415) # Truncates float to integer
3
>>> float(3) # Converts integer to float
3.0
However, you won’t usually need to do this: because Python automatically converts
up to the more complex type within an expression, the results are normally what you
want.
Also, keep in mind that all these mixed-type conversions apply only when mixing
numeric types (e.g., an integer and a floating point) in an expression, including those
using numeric and comparison operators. In general, Python does not convert across
any other type boundaries automatically. Adding a string to an integer, for example,
results in an error, unless you manually convert one or the other; watch for an example
when we meet strings in Chapter 7.
In Python 2.X, nonnumeric mixed types can be compared, but no con-
versions are performed—mixed types compare according to a rule that
seems deterministic but not aesthetically pleasing: it compares the string
names of the objects’ types. In 3.X, nonnumeric mixed-type magnitude
comparisons are never allowed and raise exceptions. Note that this ap-
plies to comparison operators such as > only; other operators like + do
not allow mixed nonnumeric types in either 3.X or 2.X.
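For instance, here is that last point in action (run in Python 3.3 here; the error message's exact text varies across versions):
>>> 'spam' + 3                 # No automatic string/number conversion
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Can't convert 'int' object to str implicitly
>>> 'spam' + str(3)            # Convert manually: number to string
'spam3'
>>> int('3') + 3               # Or string to number
6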
Preview: Operator overloading and polymorphism
Although we’re focusing on built-in numbers right now, all Python operators may be
overloaded (i.e., implemented) by Python classes and C extension types to work on
objects you create. For instance, you’ll see later that objects coded with classes may be
added or concatenated with x+y expressions, indexed with x[i] expressions, and so on.
Furthermore, Python itself automatically overloads some operators, such that they
perform different actions depending on the type of built-in objects being processed.
For example, the + operator performs addition when applied to numbers but performs
concatenation when applied to sequence objects such as strings and lists. In fact, + can
mean anything at all when applied to objects you define with classes.
As we saw in the prior chapter, this property is usually called polymorphism—a term
indicating that the meaning of an operation depends on the type of the objects being
operated on. We’ll revisit this concept when we explore functions in Chapter 16, be-
cause it becomes a much more obvious feature in that context.
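Here is a minimal sketch of the idea to come; the Currency class is invented for illustration, and Part VI tells the full story:
>>> class Currency:
        def __init__(self, amount):
            self.amount = amount
        def __add__(self, other):                 # Intercepts the + operator
            return Currency(self.amount + other.amount)

>>> (Currency(40) + Currency(2)).amount           # + now runs __add__
42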
Numbers in Action
On to the code! Probably the best way to understand numeric objects and expressions
is to see them in action, so with those basics in hand let’s start up the interactive com-
mand line and try some simple but illustrative operations (be sure to see Chapter 3 for
pointers if you need help starting an interactive session).
Variables and Basic Expressions
First of all, let’s exercise some basic math. In the following interaction, we first assign
two variables (a and b) to integers so we can use them later in a larger expression.
Variables are simply names—created by you or Python—that are used to keep track of
information in your program. We’ll say more about this in the next chapter, but in
Python:
• Variables are created when they are first assigned values.
• Variables are replaced with their values when used in expressions.
• Variables must be assigned before they can be used in expressions.
• Variables refer to objects and are never declared ahead of time.
In other words, these assignments cause the variables a and b to spring into existence
automatically:
% python
>>> a = 3 # Name created: not declared ahead of time
>>> b = 4
I’ve also used a comment here. Recall that in Python code, text after a # mark and
continuing to the end of the line is considered to be a comment and is ignored by
Python. Comments are a way to write human-readable documentation for your code,
and an important part of programming. I’ve added them to most of this book’s exam-
ples to help explain the code. In the next part of the book, we’ll meet a related but more
functional feature—documentation strings—that attaches the text of your comments
to objects so it’s available after your code is loaded.
Because code you type interactively is temporary, though, you won’t normally write
comments in this context. If you’re working along, this means you don’t need to type
any of the comment text from the # through to the end of the line; it’s not a required
part of the statements we’re running this way.
Now, let’s use our new integer objects in some expressions. At this point, the values of
a and b are still 3 and 4, respectively. Variables like these are replaced with their values
whenever they’re used inside an expression, and the expression results are echoed back
immediately when we’re working interactively:
>>> a + 1, a - 1 # Addition (3 + 1), subtraction (3 - 1)
(4, 2)
>>> b * 3, b / 2 # Multiplication (4 * 3), division (4 / 2)
(12, 2.0)
>>> a % 2, b ** 2 # Modulus (remainder), power (4 ** 2)
(1, 16)
>>> 2 + 4.0, 2.0 ** b # Mixed-type conversions
(6.0, 16.0)
Technically, the results being echoed back here are tuples of two values because the
lines typed at the prompt contain two expressions separated by commas; that’s why
the results are displayed in parentheses (more on tuples later). Note that the expressions
work because the variables a and b within them have been assigned values. If you use
a different variable that has not yet been assigned, Python reports an error rather than
filling in some default value:
>>> c * 2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'c' is not defined
You don’t need to predeclare variables in Python, but they must have been assigned at
least once before you can use them. In practice, this means you have to initialize coun-
ters to zero before you can add to them, initialize lists to an empty list before you can
append to them, and so on.
Here are two slightly larger expressions to illustrate operator grouping and more about
conversions, and preview a difference in the division operator in Python 3.X and 2.X:
>>> b / 2 + a # Same as ((4 / 2) + 3) [use 2.0 in 2.X]
5.0
>>> b / (2.0 + a) # Same as (4 / (2.0 + 3)) [use print before 2.7]
0.8
In the first expression, there are no parentheses, so Python automatically groups the
components according to its precedence rules—because / is lower in Table 5-2 than
+, it binds more tightly and so is evaluated first. The result is as if the expression had
been organized with parentheses as shown in the comment to the right of the code.
Also, notice that all the numbers are integers in the first expression. Because of that,
Python 2.X’s / performs integer division and addition and will give a result of 5, whereas
Python 3.X’s / performs true division, which always retains fractional remainders and
gives the result 5.0 shown. If you want 2.X’s integer division in 3.X, code this as b //
2 + a; if you want 3.X’s true division in 2.X, code this as b / 2.0 + a (more on division
in a moment).
In the second expression, parentheses are added around the + part to force Python to
evaluate it first (i.e., before the /). We also made one of the operands floating point by
adding a decimal point: 2.0. Because of the mixed types, Python converts the integer
referenced by a to a floating-point value (3.0) before performing the +. If instead all the
numbers in this expression were integers, integer division (4 / 5) would yield the trun-
cated integer 0 in Python 2.X but the floating point 0.8 shown in Python 3.X. Again,
stay tuned for formal division details.
Numeric Display Formats
If you’re using Python 2.6, Python 3.0, or earlier, the result of the last of the preceding
examples may look a bit odd the first time you see it:
>>> b / (2.0 + a) # Pythons <= 2.6: echoes give more (or fewer) digits
0.80000000000000004
>>> print(b / (2.0 + a)) # But print rounds off digits
0.8
We met this phenomenon briefly in the prior chapter, and it’s not present in Pythons
2.7, 3.1, and later. The full story behind this odd result has to do with the limitations
of floating-point hardware and its inability to exactly represent some values in a limited
number of bits. Because computer architecture is well beyond this book’s scope,
though, we’ll finesse this by saying that your computer’s floating-point hardware is
doing the best it can, and neither it nor Python is in error here.
In fact, this is really just a display issue—the interactive prompt’s automatic result echo
shows more digits than the print statement here only because it uses a different algo-
rithm. It’s the same number in memory. If you don’t want to see all the digits, use
print; as this chapter’s sidebar “str and repr Display Formats” on page 144 will explain,
you’ll get a user-friendly display. As of 2.7 and 3.1, Python’s floating-point display logic
tries to be more intelligent, usually showing fewer decimal digits, but occasionally
more.
Note, however, that not all values have so many digits to display:
>>> 1 / 2.0
0.5
and that there are more ways to display the bits of a number inside your computer than
using print and automatic echoes (the following are all run in Python 3.3, and may
vary slightly in older versions):
>>> num = 1 / 3.0
>>> num # Auto-echoes
0.3333333333333333
>>> print(num) # Print explicitly
0.3333333333333333
>>> '%e' % num # String formatting expression
'3.333333e-01'
>>> '%4.2f' % num # Alternative floating-point format
'0.33'
>>> '{0:4.2f}'.format(num) # String formatting method: Python 2.6, 3.0, and later
'0.33'
The last three of these expressions employ string formatting, a tool that allows for for-
mat flexibility, which we will explore in the upcoming chapter on strings (Chapter 7).
Its results are strings that are typically printed to displays or reports.
str and repr Display Formats
Technically, the difference between default interactive echoes and print corresponds
to the difference between the built-in repr and str functions:
>>> repr('spam') # Used by echoes: as-code form
"'spam'"
>>> str('spam') # Used by print: user-friendly form
'spam'
Both of these convert arbitrary objects to their string representations: repr (and the
default interactive echo) produces results that look as though they were code; str (and
the print operation) converts to a typically more user-friendly format if available. Some
objects have both—a str for general use, and a repr with extra details. This notion will
resurface when we study both strings and operator overloading in classes, and you’ll
find more on these built-ins in general later in the book.
Besides providing print strings for arbitrary objects, the str built-in is also the name of
the string data type, and in 3.X may be called with an encoding name to decode a
Unicode string from a byte string (e.g., str(b'xy', 'utf8')), and serves as an alternative
to the bytes.decode method we met in Chapter 4. We’ll study the latter advanced role
in Chapter 37 of this book.
Comparisons: Normal and Chained
So far, we’ve been dealing with standard numeric operations (addition and multipli-
cation), but numbers, like all Python objects, can also be compared. Normal compar-
isons work for numbers exactly as you’d expect—they compare the relative magnitudes
of their operands and return a Boolean result, which we would normally test and take
action on in a larger statement and program:
>>> 1 < 2 # Less than
True
>>> 2.0 >= 1 # Greater than or equal: mixed-type 1 converted to 1.0
True
>>> 2.0 == 2.0 # Equal value
True
>>> 2.0 != 2.0 # Not equal value
False
Notice again how mixed types are allowed in numeric expressions (only); in the second
test here, Python compares values in terms of the more complex type, float.
Interestingly, Python also allows us to chain multiple comparisons together to perform
range tests. Chained comparisons are a sort of shorthand for larger Boolean expres-
sions. In short, Python lets us string together magnitude comparison tests to code
chained comparisons such as range tests. The expression (A < B < C), for instance,
tests whether B is between A and C; it is equivalent to the Boolean test (A < B and B <
C) but is easier on the eyes (and the keyboard). For example, assume the following
assignments:
>>> X = 2
>>> Y = 4
>>> Z = 6
The following two expressions have identical effects, but the first is shorter to type, and
it may run slightly faster since Python needs to evaluate Y only once:
>>> X < Y < Z # Chained comparisons: range tests
True
>>> X < Y and Y < Z
True
The same equivalence holds for false results, and arbitrary chain lengths are allowed:
>>> X < Y > Z
False
>>> X < Y and Y > Z
False
>>> 1 < 2 < 3.0 < 4
True
>>> 1 > 2 > 3.0 > 4
False
You can use other comparisons in chained tests, but the resulting expressions can be-
come nonintuitive unless you evaluate them the way Python does. The following, for
instance, is false just because 1 is not equal to 2:
>>> 1 == 2 < 3 # Same as: 1 == 2 and 2 < 3
False # Not same as: False < 3 (which means 0 < 3, which is true!)
Python does not compare the 1 == 2 expression’s False result to 3—this would tech-
nically mean the same as 0 < 3, which would be True (as we’ll see later in this chapter,
True and False are just customized 1 and 0).
One last note here before we move on: chaining aside, numeric comparisons are based
on magnitudes, which are generally simple—though floating-point numbers may not
always work as you’d expect, and may require conversions or other massaging to be
compared meaningfully:
>>> 1.1 + 2.2 == 3.3 # Shouldn't this be True?...
False
>>> 1.1 + 2.2 # Close to 3.3, but not exactly: limited precision
3.3000000000000003
>>> int(1.1 + 2.2) == int(3.3) # OK if convert: see also round, floor, trunc ahead
True # Decimals and fractions (ahead) may help here too
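Rounding both operands is another common massaging technique, and the decimal type previewed in Chapter 4 sidesteps the representation issue altogether; a quick sketch:
>>> round(1.1 + 2.2, 1) == round(3.3, 1)               # Compare after rounding
True
>>> from decimal import Decimal
>>> Decimal('1.1') + Decimal('2.2') == Decimal('3.3')  # Exact with decimals
True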
The initial False result stems from the fact that floating-point numbers cannot represent some values
exactly due to their limited number of bits—a fundamental issue in numeric program-
ming not unique to Python, which we’ll learn more about later when we meet deci-
mals and fractions, tools that can address such limitations. First, though, let’s continue
our tour of Python’s core numeric operations, with a deeper look at division.
Division: Classic, Floor, and True
You’ve seen how division works in the previous sections, so you should know that it
behaves slightly differently in Python 3.X and 2.X. In fact, there are actually three flavors
of division, and two different division operators, one of which changes in 3.X. This
story gets a bit detailed, but it’s another major change in 3.X and can break 2.X code,
so let’s get the division operator facts straight:
X / Y
Classic and true division. In Python 2.X, this operator performs classic division,
truncating results for integers, and keeping remainders (i.e., fractional parts) for
floating-point numbers. In Python 3.X, it performs true division, always keeping
remainders in floating-point results, regardless of types.
X // Y
Floor division. Added in Python 2.2 and available in both Python 2.X and 3.X, this
operator always truncates fractional remainders down to their floor, regardless of
types. Its result type depends on the types of its operands.
True division was added to address the fact that the results of the original classic division
model are dependent on operand types, and so can be difficult to anticipate in a dy-
namically typed language like Python. Classic division was removed in 3.X because of
this constraint—the / and // operators implement true and floor division in 3.X. Python
2.X defaults to classic and floor division, but you can enable true division as an option.
In sum:
• In 3.X, the / now always performs true division, returning a float result that includes any remainder, regardless of operand types. The // performs floor division, which truncates the remainder and returns an integer for integer operands or a float if any operand is a float.
• In 2.X, the / does classic division, performing truncating integer division if both operands are integers and float division (keeping remainders) otherwise. The // does floor division and works as it does in 3.X, performing truncating division for integers and floor division for floats.
Here are the two operators at work in 3.X and 2.X—the first operation in each set is
the crucial difference between the lines that may impact code:
C:\code> C:\Python33\python
>>>
>>> 10 / 4 # Differs in 3.X: keeps remainder
2.5
>>> 10 / 4.0 # Same in 3.X: keeps remainder
2.5
>>> 10 // 4 # Same in 3.X: truncates remainder
2
>>> 10 // 4.0 # Same in 3.X: truncates to floor
2.0
C:\code> C:\Python27\python
>>>
>>> 10 / 4 # This might break on porting to 3.X!
2
>>> 10 / 4.0
2.5
>>> 10 // 4 # Use this in 2.X if truncation needed
2
>>> 10 // 4.0
2.0
Notice that the data type of the result for // is still dependent on the operand types in
3.X: if either is a float, the result is a float; otherwise, it is an integer. Although this may
seem similar to the type-dependent behavior of / in 2.X that motivated its change in
3.X, the type of the return value is much less critical than differences in the return value
itself.
Moreover, because // was provided in part as a compatibility tool for programs that
rely on truncating integer division (and this is more common than you might expect),
it must return integers for integers. Using // instead of / in 2.X when integer truncation
is required helps make code 3.X-compatible.
Supporting either Python
Although / behavior differs in 2.X and 3.X, you can still support both versions in your
code. If your programs depend on truncating integer division, use // in both 2.X and
3.X as just mentioned. If your programs require floating-point results with remainders
for integers, use float to guarantee that one operand is a float around a / when run in
2.X:
X = Y // Z # Always truncates, always an int result for ints in 2.X and 3.X
X = Y / float(Z) # Guarantees float division with remainder in either 2.X or 3.X
Alternatively, you can enable 3.X / division in 2.X with a __future__ import, rather
than forcing it with float conversions:
C:\code> C:\Python27\python
>>> from __future__ import division # Enable 3.X "/" behavior
>>> 10 / 4
2.5
>>> 10 // 4 # Integer // is the same in both
2
This special from statement applies to the rest of your session when typed interactively
like this, and must appear as the first executable line when used in a script file (and
alas, we can import from the future in Python, but not the past; insert something about
talking to “the Doc” here...).
Floor versus truncation
One subtlety: the // operator is informally called truncating division, but it’s more
accurate to refer to it as floor division—it truncates the result down to its floor, which
means the closest whole number below the true result. The net effect is to round down,
not strictly truncate, and this matters for negatives. You can see the difference for
yourself with the Python math module (modules must be imported before you can use
their contents; more on this later):
>>> import math
>>> math.floor(2.5) # Closest number below value
2
>>> math.floor(-2.5)
-3
>>> math.trunc(2.5) # Truncate fractional part (toward zero)
2
>>> math.trunc(-2.5)
-2
When running division operators, // really always floors. For positive results, flooring
and truncation are the same, so the distinction is invisible; for negative results, //
rounds down to the next lower integer rather than truncating toward zero. Here's the
case for 3.X:
C:\code> c:\python33\python
>>> 5 / 2, 5 / -2
(2.5, -2.5)
>>> 5 // 2, 5 // -2 # Truncates to floor: rounds to first lower integer
(2, -3) # 2.5 becomes 2, -2.5 becomes -3
>>> 5 / 2.0, 5 / -2.0
(2.5, -2.5)
>>> 5 // 2.0, 5 // -2.0 # Ditto for floats, though result is float too
(2.0, -3.0)
The 2.X case is similar, but / results differ again:
C:\code> c:\python27\python
>>> 5 / 2, 5 / -2 # Differs in 3.X
(2, -3)
>>> 5 // 2, 5 // -2 # This and the rest are the same in 2.X and 3.X
(2, -3)
>>> 5 / 2.0, 5 / -2.0
(2.5, -2.5)
>>> 5 // 2.0, 5 // -2.0
(2.0, -3.0)
If you really want truncation toward zero regardless of sign, you can always run a float
division result through math.trunc, regardless of Python version (also see the round
built-in for related functionality, and the int built-in, which has the same effect here
but requires no import):
C:\code> c:\python33\python
>>> import math
>>> 5 / -2 # Keep remainder
-2.5
>>> 5 // -2 # Floor below result
-3
>>> math.trunc(5 / -2) # Truncate instead of floor (same as int())
-2
C:\code> c:\python27\python
>>> import math
>>> 5 / float(-2) # Remainder in 2.X
-2.5
>>> 5 / -2, 5 // -2 # Floor in 2.X
(-3, -3)
>>> math.trunc(5 / float(-2)) # Truncate in 2.X
-2
Why does truncation matter?
As a wrap-up, if you are using 3.X, here is the short story on division operators for
reference:
>>> (5 / 2), (5 / 2.0), (5 / -2.0), (5 / -2) # 3.X true division
(2.5, 2.5, -2.5, -2.5)
>>> (5 // 2), (5 // 2.0), (5 // -2.0), (5 // -2) # 3.X floor division
(2, 2.0, -3.0, -3)
>>> (9 / 3), (9.0 / 3), (9 // 3), (9 // 3.0) # Both
(3.0, 3.0, 3, 3.0)
For 2.X readers, division works as follows (three of the integer / outputs below differ
from 3.X: the 2, -3, and 3 results):
>>> (5 / 2), (5 / 2.0), (5 / -2.0), (5 / -2) # 2.X classic division (differs)
(2, 2.5, -2.5, -3)
>>> (5 // 2), (5 // 2.0), (5 // -2.0), (5 // -2) # 2.X floor division (same)
(2, 2.0, -3.0, -3)
>>> (9 / 3), (9.0 / 3), (9 // 3), (9 // 3.0) # Both
(3, 3.0, 3, 3.0)
It’s possible that the nontruncating behavior of / in 3.X may break a significant number
of 2.X programs. Perhaps because of a C language legacy, many programmers rely on
division truncation for integers and will have to learn to use // in such contexts instead.
You should do so in all new 2.X and 3.X code you write today—in the former for 3.X
compatibility, and in the latter because / does not truncate in 3.X. Watch for a simple
prime number while loop example in Chapter 13, and a corresponding exercise at the
end of Part IV that illustrates the sort of code that may be impacted by this / change.
Also stay tuned for more on the special from command used in this section; it’s discussed
further in Chapter 25.
Integer Precision
Division may differ slightly across Python releases, but it’s still fairly standard. Here’s
something a bit more exotic. As mentioned earlier, Python 3.X integers support un-
limited size:
>>> 999999999999999999999999999999 + 1 # 3.X
1000000000000000000000000000000
Python 2.X has a separate type for long integers, but it automatically converts any
number too large to store in a normal integer to this type. Hence, you don’t need to
code any special syntax to use longs, and the only way you can tell that you’re using
2.X longs is that they print with a trailing “L”:
>>> 999999999999999999999999999999 + 1 # 2.X
1000000000000000000000000000000L
Unlimited-precision integers are a convenient built-in tool. For instance, you can use
them to count the U.S. national debt in pennies in Python directly (if you are so inclined,
and have enough memory on your computer for this year’s budget). They are also why
we were able to raise 2 to such large powers in the examples in Chapter 3. Here are the
3.X and 2.X cases:
>>> 2 ** 200
1606938044258990275541962092341162602522202993782792835301376
>>> 2 ** 200
1606938044258990275541962092341162602522202993782792835301376L
Because Python must do extra work to support their extended precision, integer math
is usually substantially slower than normal when numbers grow large. However, if you
need the precision, the fact that it’s built in for you to use will likely outweigh its
performance penalty.
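For instance, converting to a string is a simple way to ask how many digits one of these integers has:
>>> 2 ** 100                   # A 31-digit integer
1267650600228229401496703205376
>>> len(str(2 ** 100))         # How many digits?
31
>>> len(str(2 ** 1000000))     # 301,030 digits: noticeably slower
301030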
Complex Numbers
Although less commonly used than the types we’ve been exploring thus far, complex
numbers are a distinct core object type in Python. They are typically used in engineering
and science applications. If you know what they are, you know why they are useful; if
not, consider this section optional reading.
Complex numbers are represented as two floating-point numbers—the real and imag-
inary parts—and you code them by adding a j or J suffix to the imaginary part. We
can also write complex numbers with a nonzero real part by adding the two parts with
a +. For example, the complex number with a real part of 2 and an imaginary part of
−3 is written 2 + -3j. Here are some examples of complex math at work:
>>> 1j * 1J
(-1+0j)
>>> 2 + 1j * 3
(2+3j)
>>> (2 + 1j) * 3
(6+3j)
Complex numbers also allow us to extract their parts as attributes, support all the usual
mathematical expressions, and may be processed with tools in the standard cmath
module (the complex version of the standard math module). Because complex numbers
are rare in most programming domains, though, we’ll skip the rest of this story here.
Check Python’s language reference manual for additional details.
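For readers who do use them, though, here is a short sketch of those part-extraction attributes and the cmath module at work:
>>> C = 2 + -3j
>>> C.real, C.imag             # Parts are float attributes
(2.0, -3.0)
>>> abs(3 + 4j)                # Magnitude: sqrt(3**2 + 4**2)
5.0
>>> import cmath
>>> cmath.sqrt(-1)             # math.sqrt(-1) raises an error instead
1j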
Hex, Octal, Binary: Literals and Conversions
Python integers can be coded in hexadecimal, octal, and binary notation, in addition
to the normal base-10 decimal coding we’ve been using so far. The first three of these
may at first seem foreign to 10-fingered beings, but some programmers find them con-
venient alternatives for specifying values, especially when their mapping to bytes and
bits is important. The coding rules were introduced briefly at the start of this chapter;
let’s look at some live examples here.
Keep in mind that these literals are simply an alternative syntax for specifying the value
of an integer object. For example, the following literals coded in Python 3.X or 2.X
produce normal integers with the specified values in all three bases. In memory, an
integer’s value is the same, regardless of the base we use to specify it:
>>> 0o1, 0o20, 0o377 # Octal literals: base 8, digits 0-7 (3.X, 2.6+)
(1, 16, 255)
>>> 0x01, 0x10, 0xFF # Hex literals: base 16, digits 0-9/A-F (3.X, 2.X)
Numbers in Action | 151
www.it-ebooks.info
(1, 16, 255)
>>> 0b1, 0b10000, 0b11111111 # Binary literals: base 2, digits 0-1 (3.X, 2.6+)
(1, 16, 255)
Here, the octal value 0o377, the hex value 0xFF, and the binary value 0b11111111 are all
decimal 255. The F digits in the hex value, for example, each mean 15 in decimal and a
4-bit 1111 in binary, and reflect powers of 16. Thus, the hex value 0xFF and others
convert to decimal values as follows:
>>> 0xFF, (15 * (16 ** 1)) + (15 * (16 ** 0)) # How hex/binary map to decimal
(255, 255)
>>> 0x2F, (2 * (16 ** 1)) + (15 * (16 ** 0))
(47, 47)
>>> 0xF, 0b1111, (1*(2**3) + 1*(2**2) + 1*(2**1) + 1*(2**0))
(15, 15, 15)
Python prints integer values in decimal (base 10) by default but provides built-in func-
tions that allow you to convert integers to other bases’ digit strings, in Python-literal
form—useful when programs or users expect to see values in a given base:
>>> oct(64), hex(64), bin(64) # Numbers=>digit strings
('0o100', '0x40', '0b1000000')
The oct function converts decimal to octal, hex to hexadecimal, and bin to binary. To
go the other way, the built-in int function converts a string of digits to an integer, and
an optional second argument lets you specify the numeric base—useful for numbers
read from files as strings instead of coded in scripts:
>>> 64, 0o100, 0x40, 0b1000000 # Digits=>numbers in scripts and strings
(64, 64, 64, 64)
>>> int('64'), int('100', 8), int('40', 16), int('1000000', 2)
(64, 64, 64, 64)
>>> int('0x40', 16), int('0b1000000', 2) # Literal forms supported too
(64, 64)
The eval function, which you’ll meet later in this book, treats strings as though they
were Python code. Therefore, it has a similar effect, but usually runs more slowly—it
actually compiles and runs the string as a piece of a program, and it assumes the string
being run comes from a trusted source—a clever user might be able to submit a string
that deletes files on your machine, so be careful with this call:
>>> eval('64'), eval('0o100'), eval('0x40'), eval('0b1000000')
(64, 64, 64, 64)
Finally, you can also convert integers to base-specific strings with string formatting
method calls and expressions, which return just digits, not Python literal strings:
>>> '{0:o}, {1:x}, {2:b}'.format(64, 64, 64) # Numbers=>digits, 2.6+
'100, 40, 1000000'
>>> '%o, %x, %x, %X' % (64, 64, 255, 255) # Similar, in all Pythons
'100, 40, ff, FF'
String formatting is covered in more detail in Chapter 7.
Two notes before moving on. First, per the start of this chapter, Python 2.X users should
remember that you can code octals with simply a leading zero, the original octal format
in Python:
>>> 0o1, 0o20, 0o377 # New octal format in 2.6+ (same as 3.X)
(1, 16, 255)
>>> 01, 020, 0377 # Old octal literals in all 2.X (error in 3.X)
(1, 16, 255)
In 3.X, the syntax in the second of these examples generates an error. Even though it’s
not an error in 2.X, be careful not to begin a string of digits with a leading zero unless
you really mean to code an octal value. Python 2.X will treat it as base 8, which may
not work as you’d expect—010 is always decimal 8 in 2.X, not decimal 10 (despite what
you may or may not think!). This, along with symmetry with the hex and binary forms,
is why the octal format was changed in 3.X—you must use 0o010 in 3.X, and probably
should in 2.6 and 2.7 both for clarity and forward-compatibility with 3.X.
Second, note that these literals can produce arbitrarily long integers. The following,
for instance, creates an integer with hex notation and then displays it first in decimal
and then in octal and binary with converters (run in 3.X here: in 2.X the decimal and
octal displays have a trailing L to denote its separate long type, and octals display
without the letter o):
>>> X = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFF
>>> X
5192296858534827628530496329220095
>>> oct(X)
'0o17777777777777777777777777777777777777'
>>> bin(X)
'0b111111111111111111111111111111111111111111111111111111111 ...and so on... 11111'
Speaking of binary digits, the next section shows tools for processing individual bits.
Bitwise Operations
Besides the normal numeric operations (addition, subtraction, and so on), Python
supports most of the numeric expressions available in the C language. This includes
operators that treat integers as strings of binary bits, which can come in handy if your
Python code must deal with things like network packets, serial ports, or packed binary
data produced by a C program.
We can’t dwell on the fundamentals of Boolean math here—again, those who must
use it probably already know how it works, and others can often postpone the topic
altogether—but the basics are straightforward. For instance, here are some of Python’s
bitwise expression operators at work performing bitwise shift and Boolean operations
on integers:
>>> x = 1 # 1 decimal is 0001 in bits
>>> x << 2 # Shift left 2 bits: 0100
4
>>> x | 2 # Bitwise OR (either bit=1): 0011
3
>>> x & 1 # Bitwise AND (both bits=1): 0001
1
In the first expression, a binary 1 (in base 2, 0001) is shifted left two slots to create a
binary 4 (0100). The last two operations perform a binary OR to combine bits (0001|
0010 = 0011) and a binary AND to select common bits (0001&0001 = 0001). Such bit-
masking operations allow us to encode and extract multiple flags and other values
within a single integer.
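To make that more concrete, here is a minimal sketch of such flag encoding; the flag names READ, WRITE, and EXEC are hypothetical, chosen just for illustration:
>>> READ, WRITE, EXEC = 0b001, 0b010, 0b100   # One bit per hypothetical flag
>>> perms = READ | EXEC                       # Encode: set two flags
>>> bool(perms & WRITE), bool(perms & READ)   # Extract: test individual flags
(False, True)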
This is one area where the binary and hexadecimal number support in Python as of 3.0
and 2.6 become especially useful—they allow us to code and inspect numbers by bit-
strings:
>>> X = 0b0001 # Binary literals
>>> X << 2 # Shift left
4
>>> bin(X << 2) # Binary digits string
'0b100'
>>> bin(X | 0b010) # Bitwise OR: either
'0b11'
>>> bin(X & 0b1) # Bitwise AND: both
'0b1'
This is also true for values that begin life as hex literals, or undergo base conversions:
>>> X = 0xFF # Hex literals
>>> bin(X)
'0b11111111'
>>> X ^ 0b10101010 # Bitwise XOR: either but not both
85
>>> bin(X ^ 0b10101010)
'0b1010101'
>>> int('01010101', 2) # Digits=>number: string to int per base
85
>>> hex(85) # Number=>digits: Hex digit string
'0x55'
Also in this department, Python 3.1 and 2.7 introduced a new integer bit_length
method, which allows you to query the number of bits required to represent a number’s
value in binary. You can often achieve the same effect by subtracting 2 from the length
of the bin string using the len built-in function we met in Chapter 4 (to account for the
leading “0b”), though it may be less efficient:
>>> X = 99
>>> bin(X), X.bit_length(), len(bin(X)) - 2
('0b1100011', 7, 7)
>>> bin(256), (256).bit_length(), len(bin(256)) - 2
('0b100000000', 9, 9)
We won’t go into much more detail on such “bit twiddling” here. It’s supported if you
need it, but bitwise operations are often not as important in a high-level language such
as Python as they are in a low-level language such as C. As a rule of thumb, if you find
yourself wanting to flip bits in Python, you should think about which language you’re
really coding. As we’ll see in upcoming chapters, Python’s lists, dictionaries, and the
like provide richer—and usually better—ways to encode information than bit strings,
especially when your data’s audience includes readers of the human variety.
Other Built-in Numeric Tools
In addition to its core object types, Python also provides both built-in functions and
standard library modules for numeric processing. The pow and abs built-in functions,
for instance, compute powers and absolute values, respectively. Here are some exam-
ples of the built-in math module (which contains most of the tools in the C language’s
math library) and a few built-in functions at work in 3.3; as described earlier, some
floating-point displays may show more or fewer digits in Pythons before 2.7 and 3.1:
>>> import math
>>> math.pi, math.e # Common constants
(3.141592653589793, 2.718281828459045)
>>> math.sin(2 * math.pi / 180) # Sine, tangent, cosine
0.03489949670250097
>>> math.sqrt(144), math.sqrt(2) # Square root
(12.0, 1.4142135623730951)
>>> pow(2, 4), 2 ** 4, 2.0 ** 4.0 # Exponentiation (power)
(16, 16, 16.0)
>>> abs(-42.0), sum((1, 2, 3, 4)) # Absolute value, summation
(42.0, 10)
>>> min(3, 1, 2, 4), max(3, 1, 2, 4) # Minimum, maximum
(1, 4)
The sum function shown here works on a sequence of numbers, and min and max accept
either a sequence or individual arguments. There are a variety of ways to drop the
decimal digits of floating-point numbers. We met truncation and floor earlier; we can
also round, both numerically and for display purposes:
>>> math.floor(2.567), math.floor(-2.567) # Floor (next-lower integer)
(2, -3)
>>> math.trunc(2.567), math.trunc(-2.567) # Truncate (drop decimal digits)
(2, -2)
>>> int(2.567), int(-2.567) # Truncate (integer conversion)
(2, -2)
>>> round(2.567), round(2.467), round(2.567, 2) # Round (Python 3.X version)
(3, 2, 2.57)
>>> '%.1f' % 2.567, '{0:.2f}'.format(2.567) # Round for display (Chapter 7)
('2.6', '2.57')
As we saw earlier, the last of these produces strings that we would usually print and
supports a variety of formatting options. As also described earlier, the second-to-last
test here will also output (3, 2, 2.57) prior to 2.7 and 3.1 if we wrap it in a print call
to request a more user-friendly display. String formatting is still subtly different, though,
even in 3.X; round rounds and drops decimal digits but still produces a floating-point
number in memory, whereas string formatting produces a string, not a number:
>>> (1 / 3.0), round(1 / 3.0, 2), ('%.2f' % (1 / 3.0))
(0.3333333333333333, 0.33, '0.33')
Interestingly, there are three ways to compute square roots in Python: using a module
function, an expression, or a built-in function (if you’re interested in performance, we
will revisit these in an exercise and its solution at the end of Part IV, to see which runs
quicker):
>>> import math
>>> math.sqrt(144) # Module
12.0
>>> 144 ** .5 # Expression
12.0
>>> pow(144, .5) # Built-in
12.0
>>> math.sqrt(1234567890) # Larger numbers
35136.41828644462
>>> 1234567890 ** .5
35136.41828644462
>>> pow(1234567890, .5)
35136.41828644462
Notice that standard library modules such as math must be imported, but built-in func-
tions such as abs and round are always available without imports. In other words, mod-
ules are external components, but built-in functions live in an implied namespace that
Python automatically searches to find names used in your program. This namespace
simply corresponds to the standard library module called builtins in Python 3.X (and
__builtin__ in 2.X). There is much more about name resolution in the function and
module parts of this book; for now, when you hear “module,” think “import.”
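If you're curious, you can inspect this namespace yourself in 3.X (substitute __builtin__ in 2.X):
>>> import builtins
>>> 'abs' in dir(builtins), 'round' in dir(builtins)   # Built-ins live here in 3.X
(True, True)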
The standard library random module must be imported as well. This module provides
an array of tools, for tasks such as picking a random floating-point number between 0
and 1, and selecting a random integer between two numbers:
>>> import random
>>> random.random()
0.5566014960423105
>>> random.random() # Random floats, integers, choices, shuffles
0.051308506597373515
>>> random.randint(1, 10)
5
>>> random.randint(1, 10)
9
This module can also choose an item at random from a sequence, and shuffle a list of
items randomly:
>>> random.choice(['Life of Brian', 'Holy Grail', 'Meaning of Life'])
'Holy Grail'
>>> random.choice(['Life of Brian', 'Holy Grail', 'Meaning of Life'])
'Life of Brian'
>>> suits = ['hearts', 'clubs', 'diamonds', 'spades']
>>> random.shuffle(suits)
>>> suits
['spades', 'hearts', 'diamonds', 'clubs']
>>> random.shuffle(suits)
>>> suits
['clubs', 'diamonds', 'hearts', 'spades']
Though we’d need additional code to make this more tangible here, the random module
can be useful for shuffling cards in games, picking images at random in a slideshow
GUI, performing statistical simulations, and much more. We’ll deploy it again later in
this book (e.g., in Chapter 20’s permutations case study), but for more details, see
Python’s library manual.
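As one more small taste, the module's sample function draws multiple unique items at random; your results will naturally differ on each run:
>>> random.sample(['hearts', 'clubs', 'diamonds', 'spades'], 2)   # Draw 2 unique items
['spades', 'clubs']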
Other Numeric Types
So far in this chapter, we’ve been using Python’s core numeric types—integer, floating
point, and complex. These will suffice for most of the number crunching that most
programmers will ever need to do. Python comes with a handful of more exotic numeric
types, though, that merit a brief look here.
Decimal Type
Python 2.4 introduced a new core numeric type: the decimal object, formally known
as Decimal. Syntactically, you create decimals by calling a function within an imported
module, rather than running a literal expression. Functionally, decimals are like
floating-point numbers, but they have a fixed number of decimal places. Hence,
decimals are fixed-precision floating-point values.
For example, with decimals, we can have a floating-point value that always retains just
two decimal digits. Furthermore, we can specify how to round or truncate the extra
decimal digits beyond the object’s cutoff. Although it generally incurs a performance
penalty compared to the normal floating-point type, the decimal type is well suited to
representing fixed-precision quantities like sums of money and can help you achieve
better numeric accuracy.
Decimal basics
The last point merits elaboration. As previewed briefly when we explored comparisons,
floating-point math is less than exact because of the limited space used to store values.
For example, the following should yield zero, but it does not. The result is close to zero,
but there are not enough bits to be precise here:
>>> 0.1 + 0.1 + 0.1 - 0.3 # Python 3.3
5.551115123125783e-17
On Pythons prior to 3.1 and 2.7, printing the result to produce the user-friendly display
format doesn’t completely help either, because the hardware related to floating-point
math is inherently limited in terms of accuracy (a.k.a. precision). The following in 3.3
gives the same result as the previous output:
>>> print(0.1 + 0.1 + 0.1 - 0.3) # Pythons < 2.7, 3.1
5.55111512313e-17
However, with decimals, the result can be dead-on:
>>> from decimal import Decimal
>>> Decimal('0.1') + Decimal('0.1') + Decimal('0.1') - Decimal('0.3')
Decimal('0.0')
As shown here, we can make decimal objects by calling the Decimal constructor function
in the decimal module and passing in strings that have the desired number of decimal
digits for the resulting object (using the str function to convert floating-point values
to strings if needed). When decimals of different precision are mixed in expressions,
Python converts up to the largest number of decimal digits automatically:
>>> Decimal('0.1') + Decimal('0.10') + Decimal('0.10') - Decimal('0.30')
Decimal('0.00')
In Pythons 2.7, 3.1, and later, it’s also possible to create a decimal object from a floating-
point object, with a call of the form decimal.Decimal.from_float(1.25), and recent
Pythons allow floating-point numbers to be used directly. The conversion is exact but
can sometimes yield a large default number of digits, unless they are fixed per the next
section:
>>> Decimal(0.1) + Decimal(0.1) + Decimal(0.1) - Decimal(0.3)
Decimal('2.775557561565156540423631668E-17')
In Python 3.3 and later, the decimal module was also optimized to improve its perfor-
mance radically: the reported speedup for the new version is 10X to 100X, depending
on the type of program benchmarked.
Setting decimal precision globally
Other tools in the decimal module can be used to set the precision of all decimal num-
bers, arrange error handling, and more. For instance, a context object in this module
allows for specifying precision (number of decimal digits) and rounding modes (down,
ceiling, etc.). The precision is applied globally for all decimals created in the calling
thread:
>>> import decimal
>>> decimal.Decimal(1) / decimal.Decimal(7) # Default: 28 digits
Decimal('0.1428571428571428571428571429')
>>> decimal.getcontext().prec = 4 # Fixed precision
>>> decimal.Decimal(1) / decimal.Decimal(7)
Decimal('0.1429')
>>> Decimal(0.1) + Decimal(0.1) + Decimal(0.1) - Decimal(0.3) # Closer to 0
Decimal('1.110E-17')
This is especially useful for monetary applications, where cents are represented as two
decimal digits. Decimals are essentially an alternative to manual rounding and string
formatting in this context:
>>> 1999 + 1.33 # This has more digits in memory than displayed in 3.3
2000.33
>>>
>>> decimal.getcontext().prec = 2
>>> pay = decimal.Decimal(str(1999 + 1.33))
>>> pay
Decimal('2000.33')
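Another common money idiom uses the decimal object's quantize method to round to a fixed template with an explicit rounding mode; this sketch assumes a fresh session with the default 28-digit precision:
>>> from decimal import Decimal, ROUND_HALF_UP
>>> Decimal('2.665').quantize(Decimal('0.01'))   # Default rounding: half-even
Decimal('2.66')
>>> Decimal('2.665').quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
Decimal('2.67')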
Decimal context manager
In Python 2.6 and 3.0 and later, it’s also possible to reset precision temporarily by using
the with context manager statement. The precision is reset to its original value on
statement exit; in a new Python 3.3 session (per Chapter 3 the “...” here is Python’s
interactive prompt for continuation lines in some interfaces and requires manual in-
dentation; IDLE omits this prompt and indents for you):
C:\code> C:\Python33\python
>>> import decimal
>>> decimal.Decimal('1.00') / decimal.Decimal('3.00')
Decimal('0.3333333333333333333333333333')
>>>
>>> with decimal.localcontext() as ctx:
... ctx.prec = 2
... decimal.Decimal('1.00') / decimal.Decimal('3.00')
...
Decimal('0.33')
>>>
>>> decimal.Decimal('1.00') / decimal.Decimal('3.00')
Decimal('0.3333333333333333333333333333')
Though useful, this statement requires much more background knowledge than you’ve
obtained at this point; watch for coverage of the with statement in Chapter 34.
Because use of the decimal type is still relatively rare in practice, I’ll defer to Python’s
standard library manuals and interactive help for more details. And because decimals
address some of the same floating-point accuracy issues as the fraction type, let’s move
on to the next section to see how the two compare.
Fraction Type
Python 2.6 and 3.0 debuted a new numeric type, Fraction, which implements a rational
number object. It essentially keeps both a numerator and a denominator explicitly, so
as to avoid some of the inaccuracies and limitations of floating-point math. Like deci-
mals, fractions do not map as closely to computer hardware as floating-point numbers.
This means their performance may not be as good, but it also allows them to provide
extra utility in a standard tool where required or useful.
Fraction basics
Fraction is a functional cousin to the Decimal fixed-precision type described in the prior
section, as both can be used to address the floating-point type’s numerical inaccura-
cies. It’s also used in similar ways—like Decimal, Fraction resides in a module; import
its constructor and pass in a numerator and a denominator to make one (among other
schemes). The following interaction shows how:
>>> from fractions import Fraction
>>> x = Fraction(1, 3) # Numerator, denominator
>>> y = Fraction(4, 6) # Simplified to 2, 3 by gcd
>>> x
Fraction(1, 3)
>>> y
Fraction(2, 3)
>>> print(y)
2/3
Once created, Fractions can be used in mathematical expressions as usual:
>>> x + y
Fraction(1, 1)
>>> x - y # Results are exact: numerator, denominator
Fraction(-1, 3)
>>> x * y
Fraction(2, 9)
Fraction objects can also be created from floating-point number strings, much like
decimals:
>>> Fraction('.25')
Fraction(1, 4)
>>> Fraction('1.25')
Fraction(5, 4)
>>>
>>> Fraction('.25') + Fraction('1.25')
Fraction(3, 2)
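When you need a fraction's parts, its numerator and denominator attributes return them as plain integers:
>>> z = Fraction(5, 4)
>>> z.numerator, z.denominator   # Components as plain integers
(5, 4)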
Numeric accuracy in fractions and decimals
Notice that this is different from floating-point-type math, which is constrained by the
underlying limitations of floating-point hardware. To compare, here are the same op-
erations run with floating-point objects, and notes on their limited accuracy—they may
display fewer digits in recent Pythons than they used to, but they still aren’t exact values
in memory:
>>> a = 1 / 3.0 # Only as accurate as floating-point hardware
>>> b = 4 / 6.0 # Can lose precision over many calculations
>>> a
0.3333333333333333
>>> b
0.6666666666666666
>>> a + b
1.0
>>> a - b
-0.3333333333333333
>>> a * b
0.2222222222222222
This floating-point limitation is especially apparent for values that cannot be repre-
sented accurately given their limited number of bits in memory. Both Fraction and
Decimal provide ways to get exact results, albeit at the cost of some speed and code
verbosity. For instance, in the following example (repeated from the prior section),
floating-point numbers do not accurately give the zero answer expected, but both of
the other types do:
>>> 0.1 + 0.1 + 0.1 - 0.3 # This should be zero (close, but not exact)
5.551115123125783e-17
>>> from fractions import Fraction
>>> Fraction(1, 10) + Fraction(1, 10) + Fraction(1, 10) - Fraction(3, 10)
Fraction(0, 1)
>>> from decimal import Decimal
>>> Decimal('0.1') + Decimal('0.1') + Decimal('0.1') - Decimal('0.3')
Decimal('0.0')
Moreover, fractions and decimals both allow more intuitive and accurate results than
floating points sometimes can, in different ways—by using rational representation and
by limiting precision:
>>> 1 / 3 # Use a ".0" in Python 2.X for true "/"
0.3333333333333333
>>> Fraction(1, 3) # Numeric accuracy, two ways
Fraction(1, 3)
>>> import decimal
>>> decimal.getcontext().prec = 2
>>> Decimal(1) / Decimal(3)
Decimal('0.33')
In fact, fractions both retain accuracy and automatically simplify results. Continuing
the preceding interaction:
>>> (1 / 3) + (6 / 12) # Use a ".0" in Python 2.X for true "/"
0.8333333333333333
>>> Fraction(6, 12) # Automatically simplified
Fraction(1, 2)
>>> Fraction(1, 3) + Fraction(6, 12)
Fraction(5, 6)
>>> decimal.Decimal(str(1/3)) + decimal.Decimal(str(6/12))
Decimal('0.83')
>>> 1000.0 / 1234567890
8.100000073710001e-07
>>> Fraction(1000, 1234567890) # Substantially simpler!
Fraction(100, 123456789)
Fraction conversions and mixed types
To support fraction conversions, floating-point objects now have a method that yields
their numerator and denominator ratio, fractions have a from_float method, and
float accepts a Fraction as an argument. Trace through the following interaction to
see how this pans out (the * in the second test is special syntax that expands a tuple
into individual arguments; more on this when we study function argument passing in
Chapter 18):
>>> (2.5).as_integer_ratio() # float object method
(5, 2)
>>> f = 2.5
>>> z = Fraction(*f.as_integer_ratio()) # Convert float -> fraction: two args
>>> z # Same as Fraction(5, 2)
Fraction(5, 2)
>>> x # x from prior interaction
Fraction(1, 3)
>>> x + z
Fraction(17, 6) # 5/2 + 1/3 = 15/6 + 2/6
>>> float(x) # Convert fraction -> float
0.3333333333333333
>>> float(z)
2.5
>>> float(x + z)
2.8333333333333335
>>> 17 / 6
2.8333333333333335
>>> Fraction.from_float(1.75) # Convert float -> fraction: other way
Fraction(7, 4)
>>> Fraction(*(1.75).as_integer_ratio())
Fraction(7, 4)
Finally, some type mixing is allowed in expressions, though Fraction must sometimes
be manually propagated to retain accuracy. Study the following interaction to see how
this works:
>>> x
Fraction(1, 3)
>>> x + 2 # Fraction + int -> Fraction
Fraction(7, 3)
>>> x + 2.0 # Fraction + float -> float
2.3333333333333335
>>> x + (1./3) # Fraction + float -> float
0.6666666666666666
>>> x + (4./3)
1.6666666666666665
>>> x + Fraction(4, 3) # Fraction + Fraction -> Fraction
Fraction(5, 3)
Caveat: although you can convert from floating point to fraction, in some cases there
is an unavoidable precision loss when you do so, because the number is inaccurate in
its original floating-point form. When needed, you can simplify such results by limiting
the maximum denominator value:
>>> 4.0 / 3
1.3333333333333333
>>> (4.0 / 3).as_integer_ratio() # Precision loss from float
(6004799503160661, 4503599627370496)
>>> x
Fraction(1, 3)
>>> a = x + Fraction(*(4.0 / 3).as_integer_ratio())
>>> a
Fraction(22517998136852479, 13510798882111488)
>>> 22517998136852479 / 13510798882111488. # 5 / 3 (or close to it!)
1.6666666666666667
>>> a.limit_denominator(10) # Simplify to closest fraction
Fraction(5, 3)
For more details on the Fraction type, experiment further on your own and consult the
Python 2.6, 2.7, and 3.X library manuals and other documentation.
Sets
Besides decimals, Python 2.4 also introduced a new collection type, the set—an unor-
dered collection of unique and immutable objects that supports operations corre-
sponding to mathematical set theory. By definition, an item appears only once in a set,
no matter how many times it is added. Accordingly, sets have a variety of applications,
especially in numeric and database-focused work.
Because sets are collections of other objects, they share some behavior with objects
such as lists and dictionaries that are outside the scope of this chapter. For example,
sets are iterable, can grow and shrink on demand, and may contain a variety of object
types. As we’ll see, a set acts much like the keys of a valueless dictionary, but it supports
extra operations.
However, because sets are unordered and do not map keys to values, they are neither
sequence nor mapping types; they are a type category unto themselves. Moreover, be-
cause sets are fundamentally mathematical in nature (and for many readers, may seem
more academic and be used much less often than more pervasive objects like diction-
aries), we’ll explore the basic utility of Python’s set objects here.
Set basics in Python 2.6 and earlier
There are a few ways to make sets today, depending on which Python you use. Since
this book covers all, let’s begin with the case for 2.6 and earlier, which also is available
(and sometimes still required) in later Pythons; we’ll refine this for 2.7 and 3.X exten-
sions in a moment. To make a set object, pass in a sequence or other iterable object to
the built-in set function:
>>> x = set('abcde')
>>> y = set('bdxyz')
You get back a set object, which contains all the items in the object passed in (notice
that sets do not have a positional ordering, and so are not sequences—their order is
arbitrary and may vary per Python release):
>>> x
set(['a', 'c', 'b', 'e', 'd']) # Pythons <= 2.6 display format
Sets made this way support the common mathematical set operations with expres-
sion operators. Note that we can’t perform the following operations on plain sequences
like strings, lists, and tuples—we must create sets from them by passing them to set in
order to apply these tools:
>>> x - y # Difference
set(['a', 'c', 'e'])
>>> x | y # Union
set(['a', 'c', 'b', 'e', 'd', 'y', 'x', 'z'])
>>> x & y # Intersection
set(['b', 'd'])
>>> x ^ y # Symmetric difference (XOR)
set(['a', 'c', 'e', 'y', 'x', 'z'])
>>> x > y, x < y # Superset, subset
(False, False)
The notable exception to this rule is the in set membership test—this expression is also
defined to work on all other collection types, where it also performs membership (or a
search, if you prefer to think in procedural terms). Hence, we do not need to convert
things like strings and lists to sets to run this test:
>>> 'e' in x # Membership (sets)
True
>>> 'e' in 'Camelot', 22 in [11, 22, 33] # But works on other types too
(True, True)
In addition to expressions, the set object provides methods that correspond to these
operations and more, and that support set changes—the set add method inserts one
item, update is an in-place union, and remove deletes an item by value (run a dir call on
any set instance or the set type name to see all the available methods). Assuming x and
y are still as they were in the prior interaction:
>>> z = x.intersection(y) # Same as x & y
>>> z
set(['b', 'd'])
>>> z.add('SPAM') # Insert one item
>>> z
set(['b', 'd', 'SPAM'])
>>> z.update(set(['X', 'Y'])) # Merge: in-place union
>>> z
set(['Y', 'X', 'b', 'd', 'SPAM'])
>>> z.remove('b') # Delete one item
>>> z
set(['Y', 'X', 'd', 'SPAM'])
As iterable containers, sets can also be used in operations such as len, for loops, and
list comprehensions. Because they are unordered, though, they don’t support sequence
operations like indexing and slicing:
>>> for item in set('abc'): print(item * 3)
aaa
ccc
bbb
Finally, although the set expressions shown earlier generally require two sets, their
method-based counterparts can often work with any iterable type as well:
>>> S = set([1, 2, 3])
>>> S | set([3, 4]) # Expressions require both to be sets
set([1, 2, 3, 4])
>>> S | [3, 4]
TypeError: unsupported operand type(s) for |: 'set' and 'list'
>>> S.union([3, 4]) # But their methods allow any iterable
set([1, 2, 3, 4])
>>> S.intersection((1, 3, 5))
set([1, 3])
>>> S.issubset(range(-5, 5))
True
For more details on set operations, see Python’s library reference manual or a reference
book. Although set operations can be coded manually in Python with other types, like
lists and dictionaries (and often were in the past), Python’s built-in sets use efficient
algorithms and implementation techniques to provide quick and standard operation.
Set literals in Python 3.X and 2.7
If you think sets are “cool,” they eventually became noticeably cooler, with new syntax
for set literals and comprehensions initially added in the Python 3.X line only, but back-
ported to Python 2.7 by popular demand. In these Pythons we can still use the set built-
in to make set objects, but also a new set literal form, using the curly braces formerly
reserved for dictionaries. In 3.X and 2.7, the following are equivalent:
set([1, 2, 3, 4]) # Built-in call (all)
{1, 2, 3, 4} # Newer set literals (2.7, 3.X)
This syntax makes sense, given that sets are essentially like valueless dictionaries:
because a set's items are unordered, unique, and immutable, the items behave much
like a dictionary's keys. This operational similarity is even more striking given that
dictionary key lists in 3.X are view objects, which support set-like behavior such as
intersections and unions (see Chapter 8 for more on dictionary view objects).
Regardless of how a set is made, 3.X displays it using the new literal format. Python
2.7 accepts the new literal syntax, but still displays sets using the 2.6 display form of
the prior section. In all Pythons, the set built-in is still required to create empty sets
and to build sets from existing iterable objects (short of using set comprehensions,
discussed later in this chapter), but the new literal is convenient for initializing sets of
known structure.
Here’s what sets look like in 3.X; it’s the same in 2.7, except that set results display
with 2.X’s set([...]) notation, and item order may vary per version (which by defini-
tion is irrelevant in sets anyhow):
C:\code> c:\python33\python
>>> set([1, 2, 3, 4]) # Built-in: same as in 2.6
{1, 2, 3, 4}
>>> set('spam') # Add all items in an iterable
{'s', 'a', 'p', 'm'}
>>> {1, 2, 3, 4} # Set literals: new in 3.X (and 2.7)
{1, 2, 3, 4}
>>> S = {'s', 'p', 'a', 'm'}
>>> S
{'s', 'a', 'p', 'm'}
>>> S.add('alot') # Methods work as before
>>> S
{'s', 'a', 'p', 'alot', 'm'}
All the set processing operations discussed in the prior section work the same in 3.X,
but the result sets print differently:
>>> S1 = {1, 2, 3, 4}
>>> S1 & {1, 3} # Intersection
{1, 3}
>>> {1, 5, 3, 6} | S1 # Union
{1, 2, 3, 4, 5, 6}
>>> S1 - {1, 3, 4} # Difference
{2}
>>> S1 > {1, 3} # Superset
True
Note that {} is still a dictionary in all Pythons. Empty sets must be created with the
set built-in, and print the same way:
>>> S1 - {1, 2, 3, 4} # Empty sets print differently
set()
>>> type({}) # Because {} is an empty dictionary
<class 'dict'>
>>> S = set() # Initialize an empty set
>>> S.add(1.23)
>>> S
{1.23}
As in Python 2.6 and earlier, sets created with 3.X/2.7 literals support the same meth-
ods, some of which allow general iterable operands that expressions do not:
>>> {1, 2, 3} | {3, 4}
{1, 2, 3, 4}
>>> {1, 2, 3} | [3, 4]
TypeError: unsupported operand type(s) for |: 'set' and 'list'
>>> {1, 2, 3}.union([3, 4])
{1, 2, 3, 4}
>>> {1, 2, 3}.union({3, 4})
{1, 2, 3, 4}
>>> {1, 2, 3}.union(set([3, 4]))
{1, 2, 3, 4}
>>> {1, 2, 3}.intersection((1, 3, 5))
{1, 3}
>>> {1, 2, 3}.issubset(range(-5, 5))
True
Immutable constraints and frozen sets
Sets are powerful and flexible objects, but they do have one constraint in both 3.X and
2.X that you should keep in mind—largely because of their implementation, sets can
only contain immutable (a.k.a. “hashable”) object types. Hence, lists and dictionaries
cannot be embedded in sets, but tuples can if you need to store compound values.
Tuples compare by their full values when used in set operations:
>>> S
{1.23}
>>> S.add([1, 2, 3]) # Only immutable objects work in a set
TypeError: unhashable type: 'list'
>>> S.add({'a':1})
TypeError: unhashable type: 'dict'
>>> S.add((1, 2, 3))
>>> S # No list or dict, but tuple OK
{1.23, (1, 2, 3)}
>>> S | {(4, 5, 6), (1, 2, 3)} # Union: same as S.union(...)
{1.23, (4, 5, 6), (1, 2, 3)}
>>> (1, 2, 3) in S # Membership: by complete values
True
>>> (1, 4, 3) in S
False
Tuples in a set, for instance, might be used to represent dates, records, IP addresses,
and so on (more on tuples later in this part of the book). Sets may also contain modules,
type objects, and more. Sets themselves are mutable too, and so cannot be nested in
other sets directly; if you need to store a set inside another set, the frozenset built-in
call works just like set but creates an immutable set that cannot change and thus can
be embedded in other sets.
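For instance, a frozen set nests in a normal set where a plain set cannot (as always, item display order may vary):
>>> S2 = {frozenset([1, 2]), frozenset([3, 4])}   # Immutable sets nest fine
>>> frozenset([1, 2]) in S2
True
>>> S2.add({5, 6})                                # But mutable sets still fail
TypeError: unhashable type: 'set'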
Set comprehensions in Python 3.X and 2.7
In addition to literals, Python 3.X grew a set comprehension construct that was
backported to Python 2.7 as well. Like the 3.X set literal, 2.7 accepts its syntax, but
displays its results in 2.X set notation. The set comprehension expression is similar in
form to the list comprehension we previewed in Chapter 4, but is coded in curly braces
instead of square brackets and run to make a set instead of a list. Set comprehensions
run a loop and collect the result of an expression on each iteration; a loop variable gives
access to the current iteration value for use in the collection expression. The result is a
new set you create by running the code, with all the normal set behavior. Here is a set
comprehension in 3.3 (again, result display and order differs in 2.7):
>>> {x ** 2 for x in [1, 2, 3, 4]} # 3.X/2.7 set comprehension
{16, 1, 4, 9}
In this expression, the loop is coded on the right, and the collection expression is coded
on the left (x ** 2). As for list comprehensions, we get back pretty much what this
expression says: “Give me a new set containing X squared, for every X in a list.” Com-
prehensions can also iterate across other kinds of objects, such as strings (the first of
the following examples illustrates the comprehension-based way to make a set from an
existing iterable):
>>> {x for x in 'spam'} # Same as: set('spam')
{'m', 's', 'p', 'a'}
>>> {c * 4 for c in 'spam'} # Set of collected expression results
{'pppp', 'aaaa', 'ssss', 'mmmm'}
>>> {c * 4 for c in 'spamham'}
{'pppp', 'aaaa', 'hhhh', 'ssss', 'mmmm'}
>>> S = {c * 4 for c in 'spam'}
>>> S | {'mmmm', 'xxxx'}
{'pppp', 'xxxx', 'mmmm', 'aaaa', 'ssss'}
>>> S & {'mmmm', 'xxxx'}
{'mmmm'}
Because the rest of the comprehensions story relies upon underlying concepts we’re
not yet prepared to address, we’ll postpone further details until later in this book. In
Chapter 8, we’ll meet a first cousin in 3.X and 2.7, the dictionary comprehension, and
I’ll have much more to say about all comprehensions—list, set, dictionary, and gener-
ator—later on, especially in Chapter 14 and Chapter 20. As we’ll learn there, all com-
prehensions support additional syntax not shown here, including nested loops and
if tests, which can be challenging to understand until you’ve had a chance to study
larger statements.
Why sets?
Set operations have a variety of common uses, some more practical than mathematical.
For example, because items are stored only once in a set, sets can be used to filter
duplicates out of other collections, though items may be reordered in the process be-
cause sets are unordered in general. Simply convert the collection to a set, and then
convert it back again (sets work in the list call here because they are iterable, another
technical artifact that we’ll unearth later):
>>> L = [1, 2, 1, 3, 2, 4, 5]
>>> set(L)
{1, 2, 3, 4, 5}
>>> L = list(set(L)) # Remove duplicates
>>> L
[1, 2, 3, 4, 5]
>>> list(set(['yy', 'cc', 'aa', 'xx', 'dd', 'aa'])) # But order may change
['cc', 'xx', 'yy', 'dd', 'aa']
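When the original order matters, a common coding pattern pairs a result list with a set used purely for bookkeeping; this is a sketch, not a built-in tool:
>>> L = ['yy', 'cc', 'aa', 'xx', 'dd', 'aa']
>>> seen, res = set(), []
>>> for x in L:
...     if x not in seen:          # Fast membership test
...         seen.add(x)
...         res.append(x)
...
>>> res                            # Duplicates gone, order kept
['yy', 'cc', 'aa', 'xx', 'dd']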
Sets can be used to isolate differences in lists, strings, and other iterable objects too—
simply convert to sets and take the difference—though again the unordered nature of
sets means that the results may not match that of the originals. The last two of the
following compare attribute lists of string object types in 3.X (results vary in 2.7):
>>> set([1, 3, 5, 7]) - set([1, 2, 4, 5, 6]) # Find list differences
{3, 7}
>>> set('abcdefg') - set('abdghij') # Find string differences
{'c', 'e', 'f'}
>>> set('spam') - set(['h', 'a', 'm']) # Find differences, mixed
{'p', 's'}
>>> set(dir(bytes)) - set(dir(bytearray)) # In bytes but not bytearray
{'__getnewargs__'}
>>> set(dir(bytearray)) - set(dir(bytes))
{'append', 'copy', '__alloc__', '__imul__', 'remove', 'pop', 'insert', ...more...}
You can also use sets to perform order-neutral equality tests by converting to a set before
the test, because order doesn’t matter in a set. More formally, two sets are equal if and
only if every element of each set is contained in the other—that is, each is a subset of
the other, regardless of order. For instance, you might use this to compare the outputs
of programs that should work the same but may generate results in different order.
Sorting before testing has the same effect for equality, but sets don't rely on an expensive
sort; sorting, on the other hand, orders its results and so supports additional magnitude
tests that sets do not (greater, less, and so on):
>>> L1, L2 = [1, 3, 5, 2, 4], [2, 5, 3, 4, 1]
>>> L1 == L2 # Order matters in sequences
False
>>> set(L1) == set(L2) # Order-neutral equality
True
>>> sorted(L1) == sorted(L2) # Similar but results ordered
True
>>> 'spam' == 'asmp', set('spam') == set('asmp'), sorted('spam') == sorted('asmp')
(False, True, True)
Sets can also be used to keep track of where you’ve already been when traversing a
graph or other cyclic structure. For example, the transitive module reloader and inher-
itance tree lister examples we’ll study in Chapter 25 and Chapter 31, respectively, must
keep track of items visited to avoid loops, as Chapter 19 discusses in the abstract. Using
a list in this context is inefficient because searches require linear scans. Although re-
cording states visited as keys in a dictionary is efficient, sets offer an alternative that’s
essentially equivalent (and may be more or less intuitive, depending on whom you ask).
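In the abstract, such traversals follow a simple pattern; here is a minimal sketch with a hypothetical graph coded as a dictionary (note the deliberate a/b cycle):
>>> graph = {'a': ['b', 'c'], 'b': ['a'], 'c': []}   # Hypothetical cyclic graph
>>> visited = set()
>>> def walk(node):                                  # Skip nodes already seen
...     if node not in visited:
...         visited.add(node)
...         print(node)
...         for nxt in graph[node]:
...             walk(nxt)
...
>>> walk('a')
a
b
c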
Finally, sets are also convenient when you’re dealing with large data sets (database
query results, for example)—the intersection of two sets contains objects common to
both categories, and the union contains all items in either set. To illustrate, here’s a
somewhat more realistic example of set operations at work, applied to lists of people
in a hypothetical company, using 3.X/2.7 set literals and 3.X result displays (use set in
2.6 and earlier):
>>> engineers = {'bob', 'sue', 'ann', 'vic'}
>>> managers = {'tom', 'sue'}
>>> 'bob' in engineers # Is bob an engineer?
True
>>> engineers & managers # Who is both engineer and manager?
{'sue'}
>>> engineers | managers # All people in either category
{'bob', 'tom', 'sue', 'vic', 'ann'}
>>> engineers - managers # Engineers who are not managers
{'vic', 'ann', 'bob'}
>>> managers - engineers # Managers who are not engineers
{'tom'}
>>> engineers > managers # Are all managers engineers? (superset)
False
>>> {'bob', 'sue'} < engineers # Are both engineers? (subset)
True
>>> (managers | engineers) > managers # All people form a superset of managers
True
>>> managers ^ engineers # Who is in one but not both?
{'tom', 'vic', 'ann', 'bob'}
>>> (managers | engineers) - (managers ^ engineers) # Intersection!
{'sue'}
You can find more details on set operations in the Python library manual and some
mathematical and relational database theory texts. Also stay tuned for Chapter 8’s
revival of some of the set operations we’ve seen here, in the context of dictionary view
objects in Python 3.X.
Booleans
Some may argue that the Python Boolean type, bool, is numeric in nature because its
two values, True and False, are just customized versions of the integers 1 and 0 that
print themselves differently. Although that’s all most programmers need to know, let’s
explore this type in a bit more detail.
More formally, Python today has an explicit Boolean data type called bool, with the
values True and False available as preassigned built-in names. Internally, the names
True and False are instances of bool, which is in turn just a subclass (in the object-
oriented sense) of the built-in integer type int. True and False behave exactly like the
integers 1 and 0, except that they have customized printing logic—they print them-
selves as the words True and False, instead of the digits 1 and 0. bool accomplishes this
by redefining str and repr string formats for its two objects.
Because of this customization, the output of Boolean expressions typed at the interac-
tive prompt prints as the words True and False instead of the older and less obvious 1
and 0. In addition, Booleans make truth values more explicit in your code. For instance,
an infinite loop can now be coded as while True: instead of the less intuitive while
1:. Similarly, flags can be initialized more clearly with flag = False. We’ll discuss these
statements further in Part III.
Again, though, for most practical purposes, you can treat True and False as though
they are predefined variables set to integers 1 and 0. Most programmers had been pre-
assigning True and False to 1 and 0 anyway; the bool type simply makes this standard.
Its implementation can lead to curious results, though. Because True is just the integer
1 with a custom display format, True + 4 yields integer 5 in Python!
>>> type(True)
<class 'bool'>
>>> isinstance(True, int)
True
>>> True == 1 # Same value
True
>>> True is 1 # But a different object: see the next chapter
False
>>> True or False # Same as: 1 or 0
True
>>> True + 4 # (Hmmm)
5
Since you probably won’t come across an expression like the last of these in real Python
code, you can safely ignore any of its deeper metaphysical implications.
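One arguably practical upshot, though: because each True is the integer 1, summing the results of Boolean tests counts matches directly:
>>> sum([x > 2 for x in [1, 3, 5, 7]])   # Count items greater than 2
3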
We’ll revisit Booleans in Chapter 9 to define Python’s notion of truth, and again in
Chapter 12 to see how Boolean operators like and and or work.
Numeric Extensions
Finally, although Python core numeric types offer plenty of power for most applica-
tions, there is a large library of third-party open source extensions available to address
more focused needs. Because numeric programming is a popular domain for Python,
you’ll find a wealth of advanced tools.
For example, if you need to do serious number crunching, an optional extension for
Python called NumPy (Numeric Python) provides advanced numeric programming
tools, such as a matrix data type, vector processing, and sophisticated computation
libraries. Hardcore scientific programming groups at places like Los Alamos and NASA
use Python with NumPy to implement the sorts of tasks they previously coded in
C++, FORTRAN, or Matlab. The combination of Python and NumPy is often com-
pared to a free, more flexible version of Matlab—you get NumPy’s performance, plus
the Python language and its libraries.
Because it’s so advanced, we won’t talk further about NumPy in this book. You can
find additional support for advanced numeric programming in Python, including
graphics and plotting tools, extended precision floats, statistics libraries, and the pop-
ular SciPy package by searching the Web. Also note that NumPy is currently an optional
extension; it doesn’t come with Python and must be installed separately, though you’ll
probably want to do so if you care enough about this domain to look it up on the Web.
Chapter Summary
This chapter has taken a tour of Python’s numeric object types and the operations we
can apply to them. Along the way, we met the standard integer and floating-point types,
as well as some more exotic and less commonly used types such as complex numbers,
decimals, fractions, and sets. We also explored Python’s expression syntax, type con-
versions, bitwise operations, and various literal forms for coding numbers in scripts.
Later in this part of the book, we’ll continue our in-depth type tour by filling in some
details about the next object type—the string. In the next chapter, however, we’ll take
some time to explore the mechanics of variable assignment in more detail than we have
here. This turns out to be perhaps the most fundamental idea in Python, so make sure
you check out the next chapter before moving on. First, though, it’s time to take the
usual chapter quiz.
Test Your Knowledge: Quiz
1. What is the value of the expression 2 * (3 + 4) in Python?
2. What is the value of the expression 2 * 3 + 4 in Python?
3. What is the value of the expression 2 + 3 * 4 in Python?
4. What tools can you use to find a number’s square root, as well as its square?
5. What is the type of the result of the expression 1 + 2.0 + 3?
6. How can you truncate and round a floating-point number?
7. How can you convert an integer to a floating-point number?
8. How would you display an integer in octal, hexadecimal, or binary notation?
9. How might you convert an octal, hexadecimal, or binary string to a plain integer?
Test Your Knowledge: Answers
1. The value will be 14, the result of 2 * 7, because the parentheses force the addition
to happen before the multiplication.
2. The value will be 10, the result of 6 + 4. Python’s operator precedence rules are
applied in the absence of parentheses, and multiplication has higher precedence
than (i.e., happens before) addition, per Table 5-2.
3. This expression yields 14, the result of 2 + 12, for the same precedence reasons as
in the prior question.
4. Functions for obtaining the square root, as well as pi, tangents, and more, are
available in the imported math module. To find a number’s square root, import
math and call math.sqrt(N). To get a number’s square, use either the exponent
expression X ** 2 or the built-in function pow(X, 2). Either of these last two can
also compute the square root when given a power of 0.5 (e.g., X ** .5).
5. The result will be a floating-point number: the integers are converted up to floating
point, the most complex type in the expression, and floating-point math is used to
evaluate it.
6. The int(N) and math.trunc(N) functions truncate, and the round(N, digits) func-
tion rounds. We can also compute the floor with math.floor(N) and round for
display with string formatting operations.
7. The float(I) function converts an integer to a floating point; mixing an integer
with a floating point within an expression will result in a conversion as well. In
some sense, Python 3.X / division converts too—it always returns a floating-point
result that includes the remainder, even if both operands are integers.
8. The oct(I) and hex(I) built-in functions return the octal and hexadecimal string
forms for an integer. The bin(I) call also returns a number’s binary digits string in
Pythons 2.6, 3.0, and later. The % string formatting expression and format string
method also provide targets for some such conversions.
9. The int(S, base) function can be used to convert from octal and hexadecimal
strings to normal integers (pass in 8, 16, or 2 for the base). The eval(S) function
can be used for this purpose too, but it’s more expensive to run and can have
security issues. Note that integers are always stored in binary form in computer
memory; these are just display string format conversions.
CHAPTER 6
The Dynamic Typing Interlude
In the prior chapter, we began exploring Python’s core object types in depth by studying
Python numeric types and operations. We’ll resume our object type tour in the next
chapter, but before we move on, it’s important that you get a handle on what may be
the most fundamental idea in Python programming and is certainly the basis of much
of both the conciseness and flexibility of the Python language—dynamic typing, and
the polymorphism it implies.
As you’ll see here and throughout this book, in Python, we do not declare the specific
types of the objects our scripts use. In fact, most programs should not even care about
specific types; in exchange, they are naturally applicable in more contexts than we can
sometimes even plan ahead for. Because dynamic typing is the root of this flexibility,
and is also a potential stumbling block for newcomers, let’s take a brief side trip to
explore the model here.
The Case of the Missing Declaration Statements
If you have a background in compiled or statically typed languages like C, C++, or Java,
you might find yourself a bit perplexed at this point in the book. So far, we’ve been
using variables without declaring their existence or their types, and it somehow works.
When we type a = 3 in an interactive session or program file, for instance, how does
Python know that a should stand for an integer? For that matter, how does Python
know what a is at all?
Once you start asking such questions, you’ve crossed over into the domain of Python’s
dynamic typing model. In Python, types are determined automatically at runtime, not
in response to declarations in your code. This means that you never declare variables
ahead of time (a concept that is perhaps simpler to grasp if you keep in mind that it all
boils down to variables, objects, and the links between them).
Variables, Objects, and References
As you’ve seen in many of the examples used so far in this book, when you run an
assignment statement such as a = 3 in Python, it works even if you’ve never told Python
to use the name a as a variable, or that a should stand for an integer-type object. In the
Python language, this all pans out in a very natural way, as follows:
Variable creation
A variable (i.e., name), like a, is created when your code first assigns it a value.
Future assignments change the value of the already created name. Technically,
Python detects some names before your code runs, but you can think of it as though
initial assignments make variables.
Variable types
A variable never has any type information or constraints associated with it. The
notion of type lives with objects, not names. Variables are generic in nature; they
always simply refer to a particular object at a particular point in time.
Variable use
When a variable appears in an expression, it is immediately replaced with the object
that it currently refers to, whatever that may be. Further, all variables must be
explicitly assigned before they can be used; referencing unassigned variables results
in errors.
In sum, variables are created when assigned, can reference any type of object, and must
be assigned before they are referenced. This means that you never need to declare names
used by your script, but you must initialize names before you can update them; coun-
ters, for example, must be initialized to zero before you can add to them.
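For instance, referencing a name before any assignment fails, but initializing it first works as expected (the error message here is abbreviated):
>>> count += 1                  # Error: not yet assigned
NameError: name 'count' is not defined
>>> count = 0                   # Initialize, then update
>>> count += 1
>>> count
1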
This dynamic typing model is strikingly different from the typing model of traditional
languages. When you are first starting out, the model is usually easier to understand if
you keep clear the distinction between names and objects. For example, when we say
this to assign a variable a value:
>>> a = 3 # Assign a name to an object
at least conceptually, Python will perform three distinct steps to carry out the request.
These steps reflect the operation of all assignments in the Python language:
1. Create an object to represent the value 3.
2. Create the variable a, if it does not yet exist.
3. Link the variable a to the new object 3.
The net result will be a structure inside Python that resembles Figure 6-1. As sketched,
variables and objects are stored in different parts of memory and are associated by links
(the link is shown as a pointer in the figure). Variables always link to objects and never
to other variables, but larger objects may link to other objects (for instance, a list object
has links to the objects it contains).
These links from variables to objects are called references in Python—that is, a reference
is a kind of association, implemented as a pointer in memory.1 Whenever the variables
are later used (i.e., referenced), Python automatically follows the variable-to-object
links. This is all simpler than the terminology may imply. In concrete terms:
Variables are entries in a system table, with spaces for links to objects.
Objects are pieces of allocated memory, with enough space to represent the values
for which they stand.
References are automatically followed pointers from variables to objects.
At least conceptually, each time you generate a new value in your script by running an
expression, Python creates a new object (i.e., a chunk of memory) to represent that
value. As an optimization, Python internally caches and reuses certain kinds of un-
changeable objects, such as small integers and strings (each 0 is not really a new piece
of memory—more on this caching behavior later). But from a logical perspective, it
works as though each expression’s result value is a distinct object and each object is a
distinct piece of memory.
Technically speaking, objects have more structure than just enough space to represent
their values. Each object also has two standard header fields: a type designator used to
mark the type of the object, and a reference counter used to determine when it’s OK to
reclaim the object. To understand how these two header fields factor into the model,
we need to move on.
Types Live with Objects, Not Variables
To see how object types come into play, watch what happens if we assign a variable
multiple times:
Figure 6-1. Names and objects after running the assignment a = 3. Variable a becomes a reference to
the object 3. Internally, the variable is really a pointer to the object’s memory space created by running
the literal expression 3.
1. Readers with a background in C may find Python references similar to C pointers (memory addresses).
In fact, references are implemented as pointers, and they often serve the same roles, especially with objects
that can be changed in place (more on this later). However, because references are always automatically
dereferenced when used, you can never actually do anything useful with a reference itself; this is a feature
that eliminates a vast category of C bugs. But you can think of Python references as C “void*” pointers,
which are automatically followed whenever used.
>>> a = 3 # It's an integer
>>> a = 'spam' # Now it's a string
>>> a = 1.23 # Now it's a floating point
This isn’t typical Python code, but it does work—a starts out as an integer, then be-
comes a string, and finally becomes a floating-point number. This example tends to
look especially odd to ex-C programmers, as it appears as though the type of a changes
from integer to string when we say a = 'spam'.
However, that’s not really what’s happening. In Python, things work more simply.
Names have no types; as stated earlier, types live with objects, not names. In the pre-
ceding listing, we’ve simply changed a to reference different objects. Because variables
have no type, we haven’t actually changed the type of the variable a; we’ve simply made
the variable reference a different type of object. In fact, again, all we can ever say about
a variable in Python is that it references a particular object at a particular point in time.
Objects, on the other hand, know what type they are—each object contains a header
field that tags the object with its type. The integer object 3, for example, will contain
the value 3, plus a designator that tells Python that the object is an integer (strictly
speaking, a pointer to an object called int, the name of the integer type). The type
designator of the 'spam' string object points to the string type (called str) instead.
Because objects know their types, variables don’t have to.
To recap, types are associated with objects in Python, not with variables. In typical
code, a given variable usually will reference just one kind of object. Because this isn’t
a requirement, though, you’ll find that Python code tends to be much more flexible
than you may be accustomed to—if you use Python well, your code might work on
many types automatically.
I mentioned that objects have two header fields, a type designator and a reference
counter. To understand the latter of these, we need to move on and take a brief look
at what happens at the end of an object’s life.
Objects Are Garbage-Collected
In the prior section’s listings, we assigned the variable a to different types of objects in
each assignment. But when we reassign a variable, what happens to the value it was
previously referencing? For example, after the following statements, what happens to
the object 3?
>>> a = 3
>>> a = 'spam'
The answer is that in Python, whenever a name is assigned to a new object, the space
held by the prior object is reclaimed if it is not referenced by any other name or object.
This automatic reclamation of objects’ space is known as garbage collection, and makes
life much simpler for programmers of languages like Python that support it.
To illustrate, consider the following example, which sets the name x to a different object
on each assignment:
>>> x = 42
>>> x = 'shrubbery' # Reclaim 42 now (unless referenced elsewhere)
>>> x = 3.1415 # Reclaim 'shrubbery' now
>>> x = [1, 2, 3] # Reclaim 3.1415 now
First, notice that x is set to a different type of object each time. Again, though this is
not really the case, the effect is as though the type of x is changing over time. Remember,
in Python types live with objects, not names. Because names are just generic references
to objects, this sort of code works naturally.
Second, notice that references to objects are discarded along the way. Each time x is
assigned to a new object, Python reclaims the prior object’s space. For instance, when
it is assigned the string 'shrubbery', the object 42 is immediately reclaimed (assuming
it is not referenced anywhere else)—that is, the object’s space is automatically thrown
back into the free space pool, to be reused for a future object.
Internally, Python accomplishes this feat by keeping a counter in every object that keeps
track of the number of references currently pointing to that object. As soon as (and
exactly when) this counter drops to zero, the object’s memory space is automatically
reclaimed. In the preceding listing, we’re assuming that each time x is assigned to a new
object, the prior object’s reference counter drops to zero, causing it to be reclaimed.
The most immediately tangible benefit of garbage collection is that it means you can
use objects liberally without ever needing to allocate or free up space in your script.
Python will clean up unused space for you as your program runs. In practice, this
eliminates a substantial amount of bookkeeping code required in lower-level languages
such as C and C++.
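If you're curious, standard CPython exposes an object's reference count via sys.getrefcount; note that the reported count includes a temporary reference made by the call itself, so a one-name object shows 2 (details vary per implementation):
>>> import sys
>>> x = [1, 2, 3]
>>> sys.getrefcount(x)          # Includes the call's own reference
2
>>> y = x                       # A second name raises the count
>>> sys.getrefcount(x)
3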
More on Python Garbage Collection
Technically speaking, Python’s garbage collection is based mainly upon reference coun-
ters, as described here; however, it also has a component that detects and reclaims
objects with cyclic references in time. This component can be disabled if you’re sure
that your code doesn’t create cycles, but it is enabled by default.
Circular references are a classic issue in reference count garbage collectors. Because
references are implemented as pointers, it’s possible for an object to reference itself, or
reference another object that does. For example, exercise 3 at the end of Part I and its
solution in Appendix D show how to create a cycle easily by embedding a reference to
a list within itself (e.g., L.append(L)). The same phenomenon can occur for assignments
to attributes of objects created from user-defined classes. Though relatively rare, be-
cause the reference counts for such objects never drop to zero, they must be treated
specially.
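As a brief hedged sketch of the cycle detector at work (the count that collect reports can vary per session):
>>> import gc
>>> gc.isenabled()          # Cycle detection is on by default
True
>>> L = [1, 2, 3]
>>> L.append(L)             # Embed a reference to the list in itself
>>> del L                   # Its reference count never drops to zero
>>> gc.collect()            # Force a pass; reports unreachable objects found
1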
For more details on Python’s cycle detector, see the documentation for the gc module
in Python’s library manual. The best news here is that garbage-collection-based mem-
ory management is implemented for you in Python, by people highly skilled at the task.
Also note that this chapter’s description of Python’s garbage collector applies to the
standard Python (a.k.a. CPython) only; Chapter 2’s alternative implementations such
as Jython, IronPython, and PyPy may use different schemes, though the net effect in
all is similar—unused space is reclaimed for you automatically, if not always as imme-
diately.
Shared References
So far, we’ve seen what happens as a single variable is assigned references to objects.
Now let’s introduce another variable into our interaction and watch what happens to
its names and objects:
>>> a = 3
>>> b = a
Typing these two statements generates the scene captured in Figure 6-2. The second
command causes Python to create the variable b; the variable a is being used and not
assigned here, so it is replaced with the object it references (3), and b is made to reference
that object. The net effect is that the variables a and b wind up referencing the same
object (that is, pointing to the same chunk of memory).
Figure 6-2. Names and objects after next running the assignment b = a. Variable b becomes a reference
to the object 3. Internally, the variable is really a pointer to the object’s memory space created by
running the literal expression 3.
This scenario in Python—with multiple names referencing the same object—is usually
called a shared reference (and sometimes just a shared object). Note that the names a
and b are not linked to each other directly when this happens; in fact, there is no way
to ever link a variable to another variable in Python. Rather, both variables point to the
same object via their references.
Next, suppose we extend the session with one more statement:
>>> a = 3
>>> b = a
>>> a = 'spam'
As with all Python assignments, this statement simply makes a new object to represent
the string value 'spam' and sets a to reference this new object. It does not, however,
change the value of b; b still references the original object, the integer 3. The resulting
reference structure is shown in Figure 6-3.
The same sort of thing would happen if we changed b to 'spam' instead—the assignment
would change only b, not a. This behavior also occurs if there are no type differences
at all. For example, consider these three statements:
>>> a = 3
>>> b = a
>>> a = a + 2
In this sequence, the same events transpire. Python makes the variable a reference the
object 3 and makes b reference the same object as a, as in Figure 6-2; as before, the last
assignment then sets a to a completely different object (in this case, the integer 5, which
is the result of the + expression). It does not change b as a side effect. In fact, there is
no way to ever overwrite the value of the object 3—as introduced in Chapter 4, integers
are immutable and thus can never be changed in place.
One way to think of this is that, unlike in some languages, in Python variables are always
pointers to objects, not labels of changeable memory areas: setting a variable to a new
value does not alter the original object, but rather causes the variable to reference an
entirely different object. The net effect is that assignment to a variable itself can impact
only the single variable being assigned. When mutable objects and in-place changes
enter the equation, though, the picture changes somewhat; to see how, let’s move on.
Shared References and In-Place Changes
As you’ll see later in this part’s chapters, there are objects and operations that perform
in-place object changes—Python’s mutable types, including lists, dictionaries, and sets.
For instance, an assignment to an offset in a list actually changes the list object itself in
place, rather than generating a brand-new list object.
Figure 6-3. Names and objects after finally running the assignment a = ‘spam’. Variable a references
the new object (i.e., piece of memory) created by running the literal expression ‘spam’, but variable b
still refers to the original object 3. Because this assignment is not an in-place change to the object 3,
it changes only variable a, not b.
Though you must take it somewhat on faith at this point in the book, this distinction
can matter much in your programs. For objects that support such in-place changes,
you need to be more aware of shared references, since a change from one name may
impact others. Otherwise, your objects may seem to change for no apparent reason.
Given that all assignments are based on references (including function argument pass-
ing), it’s a pervasive potential.
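For example, because arguments are passed by assignment too, an in-place change made inside a function shows up in its caller. The changer function here is purely illustrative:
>>> def changer(x):
...     x[0] = 'changed'    # In-place change through the shared reference
...
>>> L = [1, 2, 3]
>>> changer(L)              # x and L reference the same list object
>>> L
['changed', 2, 3]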
To illustrate, let’s take another look at the list objects introduced in Chapter 4. Recall
that lists, which do support in-place assignments to positions, are simply collections
of other objects, coded in square brackets:
>>> L1 = [2, 3, 4]
>>> L2 = L1
L1 here is a list containing the objects 2, 3, and 4. Items inside a list are accessed by their
positions, so L1[0] refers to object 2, the first item in the list L1. Of course, lists are also
objects in their own right, just like integers and strings. After running the two prior
assignments, L1 and L2 reference the same shared object, just like a and b in the prior
example (see Figure 6-2). Now say that, as before, we extend this interaction to say the
following:
>>> L1 = 24
This assignment simply sets L1 to a different object; L2 still references the original list.
If we change this statement’s syntax slightly, however, it has a radically different effect:
>>> L1 = [2, 3, 4] # A mutable object
>>> L2 = L1 # Make a reference to the same object
>>> L1[0] = 24 # An in-place change
>>> L1 # L1 is different
[24, 3, 4]
>>> L2 # But so is L2!
[24, 3, 4]
Really, we haven’t changed L1 itself here; we’ve changed a component of the object that
L1 references. This sort of change overwrites part of the list object’s value in place.
Because the list object is shared by (referenced from) other variables, though, an in-
place change like this doesn’t affect only L1—that is, you must be aware that when you
make such changes, they can impact other parts of your program. In this example, the
effect shows up in L2 as well because it references the same object as L1. Again, we
haven’t actually changed L2, either, but its value will appear different because it refers
to an object that has been overwritten in place.
This behavior only occurs for mutable objects that support in-place changes, and is
usually what you want, but you should be aware of how it works, so that it’s expected.
It’s also just the default: if you don’t want such behavior, you can request that Python
copy objects instead of making references. There are a variety of ways to copy a list,
including using the built-in list function and the standard library copy module. Perhaps
the most common way is to slice from start to finish (see Chapter 4 and Chapter 7 for
more on slicing):
>>> L1 = [2, 3, 4]
>>> L2 = L1[:] # Make a copy of L1 (or list(L1), copy.copy(L1), etc.)
>>> L1[0] = 24
>>> L1
[24, 3, 4]
>>> L2 # L2 is not changed
[2, 3, 4]
Here, the change made through L1 is not reflected in L2 because L2 references a copy
of the object L1 references, not the original; that is, the two variables point to different
pieces of memory.
Note that this slicing technique won’t work on the other major mutable core types,
dictionaries and sets, because they are not sequences—to copy a dictionary or set,
instead use their X.copy() method call (lists have one as of Python 3.3 as well), or pass
the original object to their type names, dict and set. Also, note that the standard library
copy module has a call for copying any object type generically, as well as a call for
copying nested object structures—a dictionary with nested lists, for example:
import copy
X = copy.copy(Y) # Make top-level "shallow" copy of any object Y
X = copy.deepcopy(Y) # Make deep copy of any object Y: copy all nested parts
We’ll explore lists and dictionaries in more depth, and revisit the concept of shared
references and copies, in Chapter 8 and Chapter 9. For now, keep in mind that objects
that can be changed in place (that is, mutable objects) are always open to these kinds
of effects in any code they pass through. In Python, this includes lists, dictionaries, sets,
and some objects defined with class statements. If this is not the desired behavior, you
can simply copy your objects as needed.
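To make the shallow/deep distinction concrete, here is a minimal sketch using a dictionary with a nested list:
>>> import copy
>>> Y = {'food': ['spam', 'eggs']}
>>> X = copy.copy(Y)        # Shallow copy: the nested list is still shared
>>> X['food'] is Y['food']
True
>>> X = copy.deepcopy(Y)    # Deep copy: nested parts are copied too
>>> X['food'] is Y['food']
False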
Shared References and Equality
In the interest of full disclosure, I should point out that the garbage-collection behavior
described earlier in this chapter may be more conceptual than literal for certain types.
Consider these statements:
>>> x = 42
>>> x = 'shrubbery' # Reclaim 42 now?
Because Python caches and reuses small integers and small strings, as mentioned earlier,
the object 42 here is probably not literally reclaimed; instead, it will likely remain in a
system table to be reused the next time you generate a 42 in your code. Most kinds of
objects, though, are reclaimed immediately when they are no longer referenced; for
those that are not, the caching mechanism is irrelevant to your code.
For instance, because of Python’s reference model, there are two different ways to check
for equality in a Python program. Let’s create a shared reference to demonstrate:
>>> L = [1, 2, 3]
>>> M = L # M and L reference the same object
>>> L == M # Same values
True
>>> L is M # Same objects
True
The first technique here, the == operator, tests whether the two referenced objects have
the same values; this is the method almost always used for equality checks in Python.
The second method, the is operator, instead tests for object identity—it returns True
only if both names point to the exact same object, so it is a much stronger form of
equality testing and is rarely applied in most programs.
Really, is simply compares the pointers that implement references, and it serves as a
way to detect shared references in your code if needed. It returns False if the names
point to equivalent but different objects, as is the case when we run two different literal
expressions:
>>> L = [1, 2, 3]
>>> M = [1, 2, 3] # M and L reference different objects
>>> L == M # Same values
True
>>> L is M # Different objects
False
Now, watch what happens when we perform the same operations on small numbers:
>>> X = 42
>>> Y = 42 # Should be two different objects
>>> X == Y
True
>>> X is Y # Same object anyhow: caching at work!
True
In this interaction, X and Y should be == (same value), but not is (same object) because
we ran two different literal expressions (42). Because small integers and strings are
cached and reused, though, is tells us they reference the same single object.
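By contrast, larger integers are not normally cached; the cutoff is a CPython implementation detail, so treat this as a sketch whose result may vary by version:
>>> X = 1234567
>>> Y = 1234567             # Two literals, two objects: no caching here
>>> X == Y
True
>>> X is Y
False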
In fact, if you really want to look under the hood, you can always ask Python how many
references there are to an object: the getrefcount function in the standard sys module
returns the object’s reference count. When I ask about the integer object 1 in the IDLE
GUI, for instance, it reports 647 reuses of this same object (most of which are in IDLE’s
system code, not mine, though this returns 173 outside IDLE so Python must be
hoarding 1s as well):
>>> import sys
>>> sys.getrefcount(1) # 647 pointers to this shared piece of memory
647
This object caching and reuse is irrelevant to your code (unless you run the is check!).
Because you cannot change immutable numbers or strings in place, it doesn’t matter
how many references there are to the same object—every reference will always see the
same, unchanging value. Still, this behavior reflects one of the many ways Python op-
timizes its model for execution speed.
Dynamic Typing Is Everywhere
Of course, you don’t really need to draw name/object diagrams with circles and arrows
to use Python. When you’re starting out, though, it sometimes helps you understand
unusual cases if you can trace their reference structures as we’ve done here. If a mutable
object changes out from under you when passed around your program, for example,
chances are you are witnessing some of this chapter’s subject matter firsthand.
Moreover, even if dynamic typing seems a little abstract at this point, you probably will
care about it eventually. Because everything seems to work by assignment and refer-
ences in Python, a basic understanding of this model is useful in many different con-
texts. As you’ll see, it works the same in assignment statements, function arguments,
for loop variables, module imports, class attributes, and more. The good news is that
there is just one assignment model in Python; once you get a handle on dynamic typing,
you’ll find that it works the same everywhere in the language.
At the most practical level, dynamic typing means there is less code for you to write.
Just as importantly, though, dynamic typing is also the root of Python’s polymor-
phism, a concept we introduced in Chapter 4 and will revisit again later in this book.
Because we do not constrain types in Python code, it is both concise and highly flexible.
As you’ll see, when used well, dynamic typing—and the polymorphism it implies—
produces code that automatically adapts to new requirements as your systems evolve.
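As a small preview of that polymorphism, a function that never declares its argument's type runs on any object supporting the operations it uses (the double function is just for illustration):
>>> def double(x):
...     return x * 2        # Works on anything that supports *
...
>>> double(5), double('Ni'), double([1, 2])
(10, 'NiNi', [1, 2, 1, 2])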
“Weak” References
You may occasionally see the term “weak reference” in the Python world. This is a
somewhat advanced tool, but is related to the reference model we’ve explored here,
and like the is operator, can’t really be understood without it.
In short, a weak reference, implemented by the weakref standard library module, is a
reference to an object that does not by itself prevent the referenced object from being
garbage-collected. If the last remaining references to an object are weak references, the
object is reclaimed and the weak references to it are automatically deleted (or otherwise
notified).
This can be useful in dictionary-based caches of large objects, for example; otherwise,
the cache’s reference alone would keep the object in memory indefinitely. Still, this is
really just a special-case extension to the reference model. For more details, see Python’s
library manual.
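A minimal sketch with the weakref module (the Data class here is illustrative; instances of user-defined classes support weak references by default):
>>> import weakref
>>> class Data: pass
...
>>> obj = Data()
>>> ref = weakref.ref(obj)  # A weak reference: doesn't keep obj alive
>>> ref() is obj            # Call the wrapper to reach the object
True
>>> del obj                 # The last strong reference goes away
>>> print(ref())            # A dead weak reference returns None
None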
Chapter Summary
This chapter took a deeper look at Python’s dynamic typing model—that is, the way
that Python keeps track of object types for us automatically, rather than requiring us
to code declaration statements in our scripts. Along the way, we learned how variables
and objects are associated by references in Python; we also explored the idea of garbage
collection, learned how shared references to objects can affect multiple variables, and
saw how references impact the notion of equality in Python.
Because there is just one assignment model in Python, and because assignment pops
up everywhere in the language, it’s important that you have a handle on the model
before moving on. The following quiz should help you review some of this chapter’s
ideas. After that, we’ll resume our core object tour in the next chapter, with strings.
Test Your Knowledge: Quiz
1. Consider the following three statements. Do they change the value printed for A?
A = "spam"
B = A
B = "shrubbery"
2. Consider these three statements. Do they change the printed value of A?
A = ["spam"]
B = A
B[0] = "shrubbery"
3. How about these—is A changed now?
A = ["spam"]
B = A[:]
B[0] = "shrubbery"
Test Your Knowledge: Answers
1. No: A still prints as "spam". When B is assigned to the string "shrubbery", all that
happens is that the variable B is reset to point to the new string object. A and B
initially share (i.e., reference/point to) the same single string object "spam", but two
names are never linked together in Python. Thus, setting B to a different object has
no effect on A. The same would be true if the last statement here were B = B +
'shrubbery', by the way—the concatenation would make a new object for its result,
which would then be assigned to B only. We can never overwrite a string (or num-
ber, or tuple) in place, because strings are immutable.
2. Yes: A now prints as ["shrubbery"]. Technically, we haven’t really changed either
A or B; instead, we’ve changed part of the object they both reference (point to) by
overwriting that object in place through the variable B. Because A references the
same object as B, the update is reflected in A as well.
3. No: A still prints as ["spam"]. The in-place assignment through B has no effect this
time because the slice expression made a copy of the list object before it was as-
signed to B. After the second assignment statement, there are two different list
objects that have the same value (in Python, we say they are ==, but not is). The
third statement changes the value of the list object pointed to by B, but not that
pointed to by A.
CHAPTER 7
String Fundamentals
So far, we’ve studied numbers and explored Python’s dynamic typing model. The next
major type on our in-depth core object tour is the Python string—an ordered collection
of characters used to store and represent text- and bytes-based information. We looked
briefly at strings in Chapter 4. Here, we will revisit them in more depth, filling in some
of the details we skipped earlier.
This Chapter’s Scope
Before we get started, I also want to clarify what we won’t be covering here. Chap-
ter 4 briefly previewed Unicode strings and files—tools for dealing with non-ASCII text.
Unicode is a key tool for some programmers, especially those who work in the Internet
domain. It can pop up, for example, in web pages, email content and headers, FTP
transfers, GUI APIs, directory tools, and HTML, XML and JSON text.
At the same time, Unicode can be a heavy topic for programmers just starting out, and
many (or most) of the Python programmers I meet today still do their jobs in blissful
ignorance of the entire topic. In light of that, this book relegates most of the Unicode
story to Chapter 37 of its Advanced Topics part as optional reading, and focuses on
string basics here.
That is, this chapter tells only part of the string story in Python—the part that most
scripts use and most programmers need to know. It explores the fundamental str string
type, which handles ASCII text, and works the same regardless of which version of
Python you use. Despite this intentionally limited scope, because str also handles Uni-
code in Python 3.X, and the separate unicode type works almost identically to str in
2.X, everything we learn here will apply directly to Unicode processing too.
Unicode: The Short Story
For readers who do care about Unicode, I’d like to also provide a quick summary of its
impacts and pointers for further study. From a formal perspective, ASCII is a simple
form of Unicode text, but just one of many possible encodings and alphabets. Text
from non-English-speaking sources may use very different letters, and may be encoded
very differently when stored in files.
As we saw in Chapter 4, Python addresses this by distinguishing between text and
binary data, with distinct string object types and file interfaces for each. This support
varies per Python line:
In Python 3.X there are three string types: str is used for Unicode text (including
ASCII), bytes is used for binary data (including encoded text), and bytearray is a
mutable variant of bytes. Files work in two modes: text, which represents content
as str and implements Unicode encodings, and binary, which deals in raw bytes
and does no data translation.
In Python 2.X, unicode strings represent Unicode text, str strings handle both 8-
bit text and binary data, and bytearray is available in 2.6 and later as a back-port
from 3.X. Normal files’ content is simply bytes represented as str, but a codecs
module opens Unicode text files, handles encodings, and represents content as
unicode objects.
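For instance, here is a hedged glimpse of the 3.X round trip between text and its encoded bytes (the UTF-8 encoding is just one common choice):
>>> S = 'sp\xc4m'           # A str of Unicode code points
>>> B = S.encode('utf-8')   # Encode to raw bytes for files or transfers
>>> B
b'sp\xc3\x84m'
>>> B.decode('utf-8')       # Decode back to text
'spÄm'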
Despite such version differences, if and when you do need to care about Unicode you’ll
find that it is a relatively minor extension—once text is in memory, it’s a Python string
of characters that supports all the basics we’ll study in this chapter. In fact, the primary
distinction of Unicode often lies in the translation (a.k.a. encoding) step required to
move it to and from files. Beyond that, it’s largely just string processing.
Again, though, because most programmers don’t need to come to grips with Unicode
details up front, I’ve moved most of them to Chapter 37. When you’re ready to learn
about these more advanced string concepts, I encourage you to see both their preview
in Chapter 4 and the full Unicode and bytes disclosure in Chapter 37 after reading the
string fundamentals material here.
For this chapter, we’ll focus on the basic string type and its operations. As you’ll find,
the techniques we’ll study here also apply directly to the more advanced string types
in Python’s toolset.
String Basics
From a functional perspective, strings can be used to represent just about anything that
can be encoded as text or bytes. In the text department, this includes symbols and
words (e.g., your name), contents of text files loaded into memory, Internet addresses,
Python source code, and so on. Strings can also be used to hold the raw bytes used for
media files and network transfers, and both the encoded and decoded forms of non-
ASCII Unicode text used in internationalized programs.
You may have used strings in other languages, too. Python’s strings serve the same role
as character arrays in languages such as C, but they are a somewhat higher-level tool
than arrays. Unlike in C, in Python, strings come with a powerful set of processing
tools. Also unlike languages such as C, Python has no distinct type for individual char-
acters; instead, you just use one-character strings.
Strictly speaking, Python strings are categorized as immutable sequences, meaning that
the characters they contain have a left-to-right positional order and that they cannot
be changed in place. In fact, strings are the first representative of the larger class of
objects called sequences that we will study here. Pay special attention to the sequence
operations introduced in this chapter, because they will work the same on other se-
quence types we’ll explore later, such as lists and tuples.
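Immutability is easy to verify interactively; an attempted in-place change fails, though you can always build a new string instead (the exact error text may vary slightly across Python versions):
>>> S = 'spam'
>>> S[0] = 'x'              # Strings cannot be changed in place
TypeError: 'str' object does not support item assignment
>>> S = 'x' + S[1:]         # But you can make a new object
>>> S
'xpam'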
Table 7-1 previews common string literals and operations we will discuss in this chap-
ter. Empty strings are written as a pair of quotation marks (single or double) with
nothing in between, and there are a variety of ways to code strings. For processing,
strings support expression operations such as concatenation (combining strings), slic-
ing (extracting sections), indexing (fetching by offset), and so on. Besides expressions,
Python also provides a set of string methods that implement common string-specific
tasks, as well as modules for more advanced text-processing tasks such as pattern
matching. We’ll explore all of these later in the chapter.
Table 7-1. Common string literals and operations

Operation                              Interpretation
S = ''                                 Empty string
S = "spam's"                           Double quotes, same as single
S = 's\np\ta\x00m'                     Escape sequences
S = """...multiline..."""              Triple-quoted block strings
S = r'\temp\spam'                      Raw strings (no escapes)
B = b'sp\xc4m'                         Byte strings in 2.6, 2.7, and 3.X (Chapter 4, Chapter 37)
U = u'sp\u00c4m'                       Unicode strings in 2.X and 3.3+ (Chapter 4, Chapter 37)
S1 + S2, S * 3                         Concatenate, repeat
S[i], S[i:j], len(S)                   Index, slice, length
"a %s parrot" % kind                   String formatting expression
"a {0} parrot".format(kind)            String formatting method in 2.6, 2.7, and 3.X
S.find('pa'), S.rstrip(),              String methods (see ahead for all 43): search, remove whitespace,
  S.replace('pa', 'xx'), S.split(','),   replacement, split on delimiter,
  S.isdigit(), S.lower(),                content test, case conversion,
  S.endswith('spam'),                    end test,
  'spam'.join(strlist),                  delimiter join,
  S.encode('latin-1'),                   Unicode encoding,
  B.decode('utf8')                       Unicode decoding, etc. (see Table 7-3)
for x in S: print(x), 'spam' in S,     Iteration, membership
  [c * 2 for c in S], map(ord, S)
re.match('sp(.*)am', line)             Pattern matching: library module
Beyond the core set of string tools in Table 7-1, Python also supports more advanced
pattern-based string processing with the standard library’s re (for “regular expression”)
module, introduced in Chapter 4 and Chapter 36, and even higher-level text processing
tools such as XML parsers (discussed briefly in Chapter 37). This book’s scope, though,
is focused on the fundamentals represented by Table 7-1.
To cover the basics, this chapter begins with an overview of string literal forms and
string expressions, then moves on to look at more advanced tools such as string meth-
ods and formatting. Python comes with many string tools, and we won’t look at them
all here; the complete story is chronicled in the Python library manual and reference
books. Our goal here is to explore enough commonly used tools to give you a repre-
sentative sample; methods we won’t see in action here, for example, are largely anal-
ogous to those we will.
String Literals
By and large, strings are fairly easy to use in Python. Perhaps the most complicated
thing about them is that there are so many ways to write them in your code:
Single quotes: 'spa"m'
Double quotes: "spa'm"
Triple quotes: '''... spam ...''', """... spam ..."""
Escape sequences: "s\tp\na\0m"
Raw strings: r"C:\new\test.spm"
Bytes literals in 3.X and 2.6+ (see Chapter 4, Chapter 37): b'sp\x01am'
Unicode literals in 2.X and 3.3+ (see Chapter 4, Chapter 37): u'eggs\u0020spam'
The single- and double-quoted forms are by far the most common; the others serve
specialized roles, and we’re postponing further discussion of the last two advanced
forms until Chapter 37. Let’s take a quick look at all the other options in turn.
Single- and Double-Quoted Strings Are the Same
Around Python strings, single- and double-quote characters are interchangeable. That
is, string literals can be written enclosed in either two single or two double quotes—
the two forms work the same and return the same type of object. For example, the
following two strings are identical, once coded:
>>> 'shrubbery', "shrubbery"
('shrubbery', 'shrubbery')
The reason for supporting both is that it allows you to embed a quote character of the
other variety inside a string without escaping it with a backslash. You may embed a
single-quote character in a string enclosed in double-quote characters, and vice versa:
>>> 'knight"s', "knight's"
('knight"s', "knight's")
This book generally prefers to use single quotes around strings just because they are
marginally easier to read, except in cases where a single quote is embedded in the string.
This is a purely subjective style choice, but Python displays strings this way too and
most Python programmers do the same today, so you probably should too.
Note that the comma is important here. Without it, Python automatically concate-
nates adjacent string literals in any expression, although it is almost as simple to add a
+ operator between them to invoke concatenation explicitly (as we’ll see in Chap-
ter 12, wrapping this form in parentheses also allows it to span multiple lines):
>>> title = "Meaning " 'of' " Life" # Implicit concatenation
>>> title
'Meaning of Life'
Adding commas between these strings would result in a tuple, not a string. Also notice
in all of these outputs that Python prints strings in single quotes unless they embed
one. If needed, you can also embed quote characters by escaping them with backslashes:
>>> 'knight\'s', "knight\"s"
("knight's", 'knight"s')
To understand why, you need to know how escapes work in general.
Escape Sequences Represent Special Characters
The last example embedded a quote inside a string by preceding it with a backslash.
This is representative of a general pattern in strings: backslashes are used to introduce
special character codings known as escape sequences.
Escape sequences let us embed characters in strings that cannot easily be typed on a
keyboard. The character \, and one or more characters following it in the string literal,
are replaced with a single character in the resulting string object, which has the binary
value specified by the escape sequence. For example, here is a five-character string that
embeds a newline and a tab:
>>> s = 'a\nb\tc'
The two characters \n stand for a single character—the binary value of the newline
character in your character set (in ASCII, character code 10). Similarly, the sequence
\t is replaced with the tab character. The way this string looks when printed depends
on how you print it. The interactive echo shows the special characters as escapes, but
print interprets them instead:
>>> s
'a\nb\tc'
>>> print(s)
a
b c
To be completely sure how many actual characters are in this string, use the built-in
len function—it returns the actual number of characters in a string, regardless of how
it is coded or displayed:
>>> len(s)
5
This string is five characters long: it contains an ASCII a, a newline character, an ASCII
b, and so on.
If you’re accustomed to all-ASCII text, it’s tempting to think of this
result as meaning 5 bytes too, but you probably shouldn’t. Really,
“bytes” has no meaning in the Unicode world. For one thing, the string
object is probably larger in memory in Python.
More critically, string content and length both reflect code points (iden-
tifying numbers) in Unicode-speak, where a single character does not
necessarily map directly to a single byte, either when encoded in files or
when stored in memory. This mapping might hold true for simple 7-bit
ASCII text, but even this depends on both the external encoding type
and the internal storage scheme used. Under UTF-16, for example, AS-
CII characters are multiple bytes in files, and they may be 1, 2, or 4 bytes
in memory depending on how Python allocates their space. For other,
non-ASCII text, whose characters’ values might be too large to fit in an
8-bit byte, the character-to-byte mapping doesn’t apply at all.
In fact, 3.X defines str strings formally as sequences of Unicode code
points, not bytes, to make this clear. There’s more on how strings are
stored internally in Chapter 37 if you care to know. For now, to be safest,
think characters instead of bytes in strings. Trust me on this; as an ex-
C programmer, I had to break the habit too!
Note that the original backslash characters in the preceding result are not really stored
with the string in memory; they are used only to describe special character values to be
stored in the string. For coding such special characters, Python recognizes a full set of
escape code sequences, listed in Table 7-2.
Table 7-2. String backslash characters

Escape       Meaning
\newline     Ignored (continuation line)
\\           Backslash (stores one \)
\'           Single quote (stores ')
\"           Double quote (stores ")
\a           Bell
\b           Backspace
\f           Formfeed
\n           Newline (linefeed)
\r           Carriage return
\t           Horizontal tab
\v           Vertical tab
\xhh         Character with hex value hh (exactly 2 digits)
\ooo         Character with octal value ooo (up to 3 digits)
\0           Null: binary 0 character (doesn't end string)
\N{id}       Unicode database ID
\uhhhh       Unicode character with 16-bit hex value
\Uhhhhhhhh   Unicode character with 32-bit hex value (see note below)

Note: the \Uhhhhhhhh escape sequence takes exactly eight hexadecimal digits (h); both \u and \U are recognized only in Unicode string literals in 2.X, but can be used in normal strings (which are Unicode) in 3.X. In a 3.X bytes literal, hexadecimal and octal escapes denote the byte with the given value; in a string literal, these escapes denote a Unicode character with the given code-point value. There is more on Unicode escapes in Chapter 37.
Some escape sequences allow you to embed absolute binary values into the characters
of a string. For instance, here’s a five-character string that embeds two characters with
binary zero values (coded as octal escapes of one digit):
>>> s = 'a\0b\0c'
>>> s
'a\x00b\x00c'
>>> len(s)
5
In Python, a zero (null) character like this does not terminate a string the way a “null
byte” typically does in C. Instead, Python keeps both the string’s length and text in
memory. In fact, no character terminates a string in Python. Here’s a string that is all
absolute binary escape codes—a binary 1 and 2 (coded in octal), followed by a binary
3 (coded in hexadecimal):
>>> s = '\001\002\x03'
>>> s
'\x01\x02\x03'
>>> len(s)
3
Notice that Python displays nonprintable characters in hex, regardless of how they were
specified. You can freely combine absolute value escapes and the more symbolic escape
types in Table 7-2. The following string contains the characters “spam”, a tab and
newline, and an absolute zero value character coded in hex:
>>> S = "s\tp\na\x00m"
>>> S
's\tp\na\x00m'
>>> len(S)
7
>>> print(S)
s p
a m
This becomes more important to know when you process binary data files in Python.
Because their contents are represented as strings in your scripts, it’s OK to process
binary files that contain any sorts of binary byte values—when opened in binary modes,
files return strings of raw bytes from the external file (there’s much more on files in
Chapter 4, Chapter 9, and Chapter 37).
Finally, as the last entry in Table 7-2 implies, if Python does not recognize the character
after a \ as being a valid escape code, it simply keeps the backslash in the resulting string:
>>> x = "C:\py\code" # Keeps \ literally (and displays it as \\)
>>> x
'C:\\py\\code'
>>> len(x)
10
However, unless you’re able to commit all of Table 7-2 to memory (and there are ar-
guably better uses for your neurons!), you probably shouldn’t rely on this behavior. To
code literal backslashes explicitly such that they are retained in your strings, double
them up (\\ is an escape for one \) or use raw strings; the next section shows how.
Raw Strings Suppress Escapes
As we’ve seen, escape sequences are handy for embedding special character codes
within strings. Sometimes, though, the special treatment of backslashes for introducing
escapes can lead to trouble. It’s surprisingly common, for instance, to see Python new-
comers in classes trying to open a file with a filename argument that looks something
like this:
myfile = open('C:\new\text.dat', 'w')
thinking that they will open a file called text.dat in the directory C:\new. The problem
here is that \n is taken to stand for a newline character, and \t is replaced with a tab.
In effect, the call tries to open a file named C:(newline)ew(tab)ext.dat, with usually less-
than-stellar results.
This is just the sort of thing that raw strings are useful for. If the letter r (uppercase or
lowercase) appears just before the opening quote of a string, it turns off the escape
mechanism. The result is that Python retains your backslashes literally, exactly as you
type them. Therefore, to fix the filename problem, just remember to add the letter r on
Windows:
myfile = open(r'C:\new\text.dat', 'w')
Alternatively, because two backslashes are really an escape sequence for one backslash,
you can keep your backslashes by simply doubling them up:
myfile = open('C:\\new\\text.dat', 'w')
In fact, Python itself sometimes uses this doubling scheme when it prints strings with
embedded backslashes:
>>> path = r'C:\new\text.dat'
>>> path # Show as Python code
'C:\\new\\text.dat'
>>> print(path) # User-friendly format
C:\new\text.dat
>>> len(path) # String length
15
As with numeric representation, the default format at the interactive prompt prints
results as if they were code, and therefore escapes backslashes in the output. The
print statement provides a more user-friendly format that shows that there is actually
only one backslash in each spot. To verify this is the case, you can check the result of
the built-in len function, which returns the number of characters in the string, inde-
pendent of display formats. If you count the characters in the print(path) output, you’ll
see that there really is just 1 character per backslash, for a total of 15.
Besides directory paths on Windows, raw strings are also commonly used for regular
expressions (text pattern matching, supported with the re module introduced in Chap-
ter 4 and Chapter 37). Also note that Python scripts can usually use forward slashes in
directory paths on Windows and Unix because Python tries to interpret paths portably
(i.e., 'C:/new/text.dat' works when opening files, too). Raw strings are useful if you
code paths using native Windows backslashes, though.
Despite its role, even a raw string cannot end in a single backslash, be-
cause the backslash escapes the following quote character—you still
must escape the surrounding quote character to embed it in the string.
That is, r"...\" is not a valid string literal—a raw string cannot end in
an odd number of backslashes. If you need to end a raw string with a
single backslash, you can use two and slice off the second (r'1\nb\tc\\'[:-1]),
tack one on manually (r'1\nb\tc' + '\\'), or skip the raw string syntax and just
double up the backslashes in a normal string ('1\\nb\\tc\\'). All three of these
forms create the same eight-character string containing three backslashes.
Triple Quotes Code Multiline Block Strings
So far, you’ve seen single quotes, double quotes, escapes, and raw strings in action.
Python also has a triple-quoted string literal format, sometimes called a block string,
that is a syntactic convenience for coding multiline text data. This form begins with
three quotes (of either the single or double variety), is followed by any number of lines
of text, and is closed with the same triple-quote sequence that opened it. Single and
double quotes embedded in the string’s text may be, but do not have to be, escaped—
the string does not end until Python sees three unescaped quotes of the same kind used
to start the literal. For example (the “...” here is Python’s prompt for continuation lines
outside IDLE: don’t type it yourself):
>>> mantra = """Always look
... on the bright
... side of life."""
>>>
>>> mantra
'Always look\n on the bright\nside of life.'
This string spans three lines. As we learned in Chapter 3, in some interfaces, the inter-
active prompt changes to ... on continuation lines like this, but IDLE simply drops
down one line; this book shows listings in both forms, so extrapolate as needed. Either
way, Python collects all the triple-quoted text into a single multiline string, with em-
bedded newline characters (\n) at the places where your code has line breaks. Notice
that, as in the literal, the second line in the result has leading spaces, but the third does
not—what you type is truly what you get. To see the string with the newlines inter-
preted, print it instead of echoing:
>>> print(mantra)
Always look
on the bright
side of life.
In fact, triple-quoted strings will retain all the enclosed text, including any to the right
of your code that you might intend as comments. So don’t do this—put your comments
above or below the quoted text, or use the automatic concatenation of adjacent strings
mentioned earlier, with explicit newlines if desired, and surrounding parentheses to
allow line spans (again, more on this latter form when we study syntax rules in Chap-
ter 10 and Chapter 12):
>>> menu = """spam # comments here added to string!
... eggs # ditto
... """
>>> menu
'spam # comments here added to string!\neggs # ditto\n'
>>> menu = (
... "spam\n" # comments here ignored
... "eggs\n" # but newlines not automatic
... )
>>> menu
'spam\neggs\n'
Triple-quoted strings are useful anytime you need multiline text in your program; for
example, to embed multiline error messages or HTML, XML, or JSON code in your
Python source code files. You can embed such blocks directly in your scripts by triple-
quoting without resorting to external text files or explicit concatenation and newline
characters.
Triple-quoted strings are also commonly used for documentation strings, which are
string literals that are taken as comments when they appear at specific points in your
file (more on these later in the book). These don’t have to be triple-quoted blocks, but
they usually are to allow for multiline comments.
Finally, triple-quoted strings are also sometimes used as a “horribly hackish” way to
temporarily disable lines of code during development (OK, it’s not really too horrible,
and it’s actually a fairly common practice today, but it wasn’t the intent). If you wish
to turn off a few lines of code and run your script again, simply put three quotes above
and below them, like this:
X = 1
"""
import os # Disable this code temporarily
print(os.getcwd())
"""
Y = 2
I said this was hackish because Python really might make a string out of the lines of
code disabled this way, but this is probably not significant in terms of performance.
For large sections of code, it’s also easier than manually adding hash marks before each
line and later removing them. This is especially true if you are using a text editor that
does not have support for editing Python code specifically. In Python, practicality often
beats aesthetics.
Strings in Action
Once you’ve created a string with the literal expressions we just met, you will almost
certainly want to do things with it. This section and the next two demonstrate string
expressions, methods, and formatting—the first line of text-processing tools in the
Python language.
Basic Operations
Let’s begin by interacting with the Python interpreter to illustrate the basic string op-
erations listed earlier in Table 7-1. You can concatenate strings using the + operator
and repeat them using the * operator:
% python
>>> len('abc') # Length: number of items
3
>>> 'abc' + 'def' # Concatenation: a new string
'abcdef'
>>> 'Ni!' * 4 # Repetition: like "Ni!" + "Ni!" + ...
'Ni!Ni!Ni!Ni!'
The len built-in function here returns the length of a string (or any other object with a
length). Formally, adding two string objects with + creates a new string object, with the
contents of its operands joined, and repetition with * is like adding a string to itself a
number of times. In both cases, Python lets you create arbitrarily sized strings; there’s
no need to predeclare anything in Python, including the sizes of data structures—you
simply create string objects as needed and let Python manage the underlying memory
space automatically (see Chapter 6 for more on Python’s memory management
“garbage collector”).
Repetition may seem a bit obscure at first, but it comes in handy in a surprising number
of contexts. For example, to print a line of 80 dashes, you can count up to 80, or let
Python count for you:
>>> print('------- ...more... ---') # 80 dashes, the hard way
>>> print('-' * 80) # 80 dashes, the easy way
Notice that operator overloading is at work here already: we’re using the same + and
* operators that perform addition and multiplication when using numbers. Python does
the correct operation because it knows the types of the objects being added and mul-
tiplied. But be careful: the rules aren’t quite as liberal as you might expect. For instance,
Python doesn’t allow you to mix numbers and strings in + expressions: 'abc'+9 raises
an error instead of automatically converting 9 to a string.
As shown in the last row in Table 7-1, you can also iterate over strings in loops using
for statements, which repeat actions, and test membership for both characters and
substrings with the in expression operator, which is essentially a search. For substrings,
in is much like the str.find() method covered later in this chapter, but it returns a
Boolean result instead of the substring’s position (the following uses a 3.X print call
and may leave your cursor a bit indented; in 2.X say print c, instead):
>>> myjob = "hacker"
>>> for c in myjob: print(c, end=' ') # Step through items, print each (3.X form)
...
h a c k e r
>>> "k" in myjob # Found
True
>>> "z" in myjob # Not found
False
>>> 'spam' in 'abcspamdef' # Substring search, no position returned
True
The for loop assigns a variable to successive items in a sequence (here, a string) and
executes one or more statements for each item. In effect, the variable c becomes a cursor
stepping across the string’s characters here. We will discuss iteration tools like these
and others listed in Table 7-1 in more detail later in this book (especially in Chap-
ter 14 and Chapter 20).
Indexing and Slicing
Because strings are defined as ordered collections of characters, we can access their
components by position. In Python, characters in a string are fetched by indexing:
you provide the numeric offset of the desired component in square brackets after the
string. You get back the one-character string at the specified position.
As in the C language, Python offsets start at 0 and end at one less than the length of the
string. Unlike C, however, Python also lets you fetch items from sequences such as
strings using negative offsets. Technically, a negative offset is added to the length of a
string to derive a positive offset. You can also think of negative offsets as counting
backward from the end. The following interaction demonstrates:
>>> S = 'spam'
>>> S[0], S[-2] # Indexing from front or end
('s', 'a')
>>> S[1:3], S[1:], S[:-1] # Slicing: extract a section
('pa', 'pam', 'spa')
The first line defines a four-character string and assigns it the name S. The next line
indexes it in two ways: S[0] fetches the item at offset 0 from the left—the one-character
string 's'; S[−2] gets the item at offset 2 back from the end—or equivalently, at offset
(4 + (−2)) from the front. In more graphic terms, offsets and slices map to cells as shown
in Figure 7-1.
1. More mathematically minded readers (and students in my classes) sometimes detect a small asymmetry
here: the leftmost item is at offset 0, but the rightmost is at offset −1. Alas, there is no such thing as a
distinct −0 value in Python.
The last line in the preceding example demonstrates slicing, a generalized form of in-
dexing that returns an entire section, not a single item. Probably the best way to think
of slicing is that it is a type of parsing (analyzing structure), especially when applied to
strings—it allows us to extract an entire section (substring) in a single step. Slices can
be used to extract columns of data, chop off leading and trailing text, and more. In fact,
we’ll explore slicing in the context of text parsing later in this chapter.
The basics of slicing are straightforward. When you index a sequence object such as a
string on a pair of offsets separated by a colon, Python returns a new object containing
the contiguous section identified by the offset pair. The left offset is taken to be the
lower bound (inclusive), and the right is the upper bound (noninclusive). That is, Python
fetches all items from the lower bound up to but not including the upper bound, and
returns a new object containing the fetched items. If omitted, the left and right bounds
default to 0 and the length of the object you are slicing, respectively.
For instance, in the example we just saw, S[1:3] extracts the items at offsets 1 and 2:
it grabs the second and third items, and stops before the fourth item at offset 3. Next,
S[1:] gets all items beyond the first—the upper bound, which is not specified, defaults
to the length of the string. Finally, S[:−1] fetches all but the last item—the lower bound
defaults to 0, and −1 refers to the last item, noninclusive.
This may seem confusing at first glance, but indexing and slicing are simple and pow-
erful tools to use, once you get the knack. Remember, if you’re unsure about the effects
of a slice, try it out interactively. In the next chapter, you’ll see that it’s even possible
to change an entire section of another object in one step by assigning to a slice (though
not for immutables like strings). Here’s a summary of the details for reference:
Indexing (S[i]) fetches components at offsets:
The first item is at offset 0.
Negative indexes mean to count backward from the end or right.
S[0] fetches the first item.
S[−2] fetches the second item from the end (like S[len(S)−2]).
Figure 7-1. Offsets and slices: positive offsets start from the left end (offset 0 is the first item), and
negatives count back from the right end (offset −1 is the last item). Either kind of offset can be used
to give positions in indexing and slicing operations.
Slicing (S[i:j]) extracts contiguous sections of sequences:
The upper bound is noninclusive.
Slice boundaries default to 0 and the sequence length, if omitted.
S[1:3] fetches items at offsets 1 up to but not including 3.
S[1:] fetches items at offset 1 through the end (the sequence length).
S[:3] fetches items at offset 0 up to but not including 3.
S[:−1] fetches items at offset 0 up to but not including the last item.
S[:] fetches items at offsets 0 through the end—making a top-level copy of S.
Extended slicing (S[i:j:k]) accepts a step (or stride) k, which defaults to +1:
Allows for skipping items and reversing order—see the next section.
The second-to-last bullet item listed here turns out to be a very common technique: it
makes a full top-level copy of a sequence object—an object with the same value, but a
distinct piece of memory (you’ll find more on copies in Chapter 9). This isn’t very useful
for immutable objects like strings, but it comes in handy for objects that may be changed
in place, such as lists.
In the next chapter, you’ll see that the syntax used to index by offset (square brackets)
is used to index dictionaries by key as well; the operations look the same but have
different interpretations.
Extended slicing: The third limit and slice objects
In Python 2.3 and later, slice expressions have support for an optional third index, used
as a step (sometimes called a stride). The step is added to the index of each item ex-
tracted. The full-blown form of a slice is now X[I:J:K], which means “extract all the
items in X, from offset I through J−1, by K.” The third limit, K, defaults to +1, which is
why normally all items in a slice are extracted from left to right. If you specify an explicit
value, however, you can use the third limit to skip items or to reverse their order.
For instance, X[1:10:2] will fetch every other item in X from offsets 1–9; that is, it will
collect the items at offsets 1, 3, 5, 7, and 9. As usual, the first and second limits default
to 0 and the length of the sequence, respectively, so X[::2] gets every other item from
the beginning to the end of the sequence:
>>> S = 'abcdefghijklmnop'
>>> S[1:10:2] # Skipping items
'bdfhj'
>>> S[::2]
'acegikmo'
You can also use a negative stride to collect items in the opposite order. For example,
the slicing expression "hello"[::−1] returns the new string "olleh"—the first two
bounds default to 0 and the length of the sequence, as before, and a stride of −1 indicates
that the slice should go from right to left instead of the usual left to right. The effect,
therefore, is to reverse the sequence:
>>> S = 'hello'
>>> S[::-1] # Reversing items
'olleh'
With a negative stride, the meanings of the first two bounds are essentially reversed.
That is, the slice S[5:1:−1] fetches the items from 2 to 5, in reverse order (the result
contains items from offsets 5, 4, 3, and 2):
>>> S = 'abcedfg'
>>> S[5:1:-1] # Bounds roles differ
'fdec'
Skipping and reversing like this are the most common use cases for three-limit slices,
but see Python’s standard library manual for more details (or run a few experiments
interactively). We’ll revisit three-limit slices again later in this book, in conjunction
with the for loop statement.
Later in the book, we’ll also learn that slicing is equivalent to indexing with a slice
object, a finding of importance to class writers seeking to support both operations:
>>> 'spam'[1:3] # Slicing syntax
'pa'
>>> 'spam'[slice(1, 3)] # Slice objects with index syntax + object
'pa'
>>> 'spam'[::-1]
'maps'
>>> 'spam'[slice(None, None, -1)]
'maps'
Why You Will Care: Slices
Throughout this book, I will include common use-case sidebars (such as this one) to
give you a peek at how some of the language features being introduced are typically
used in real programs. Because you won’t be able to make much sense of realistic use
cases until you’ve seen more of the Python picture, these sidebars necessarily contain
many references to topics not introduced yet; at most, you should consider them pre-
views of ways that you may find these abstract language concepts useful for common
programming tasks.
For instance, you’ll see later that the argument words listed on a system command line
used to launch a Python program are made available in the argv attribute of the built-
in sys module:
# File echo.py
import sys
print(sys.argv)
% python echo.py -a -b -c
['echo.py', '-a', '-b', '-c']
Usually, you’re only interested in inspecting the arguments that follow the program
name. This leads to a typical application of slices: a single slice expression can be used
to return all but the first item of a list. Here, sys.argv[1:] returns the desired list,
['-a', '-b', '-c']. You can then process this list without having to accommodate the
program name at the front.
Slices are also often used to clean up lines read from input files. If you know that a line
will have an end-of-line character at the end (a \n newline marker), you can get rid of
it with a single expression such as line[:-1], which extracts all but the last character
in the line (the lower limit defaults to 0). In both cases, slices do the job of logic that
must be explicit in a lower-level language.
Having said that, calling the line.rstrip method is often preferred for stripping newline
characters because this call leaves the line intact if it has no newline character at the
end—a common case for files created with some text-editing tools. Slicing works if
you’re sure the line is properly terminated.
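A quick sketch of the tradeoff:
>>> line = 'spam eggs\n'
>>> line[:-1]               # Drops the last character unconditionally
'spam eggs'
>>> line.rstrip()           # Strips trailing whitespace only if present
'spam eggs'
>>> 'no newline'[:-1]       # Slicing eats a real character here
'no newlin'
>>> 'no newline'.rstrip()   # rstrip leaves this line intact
'no newline'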
String Conversion Tools
One of Python’s design mottos is that it refuses the temptation to guess. As a prime
example, you cannot add a number and a string together in Python, even if the string
looks like a number (i.e., is all digits):
# Python 3.X
>>> "42" + 1
TypeError: Can't convert 'int' object to str implicitly
# Python 2.X
>>> "42" + 1
TypeError: cannot concatenate 'str' and 'int' objects
This is by design: because + can mean both addition and concatenation, the choice of
conversion would be ambiguous. Instead, Python treats this as an error. In Python,
magic is generally omitted if it will make your life more complex.
What to do, then, if your script obtains a number as a text string from a file or user
interface? The trick is that you need to employ conversion tools before you can treat a
string like a number, or vice versa. For instance:
>>> int("42"), str(42) # Convert from/to string
(42, '42')
>>> repr(42) # Convert to as-code string
'42'
The int function converts a string to a number, and the str function converts a number
to its string representation (essentially, what it looks like when printed). The repr
function (and the older backquotes expression, removed in Python 3.X) also converts
an object to its string representation, but returns the object as a string of code that can
be rerun to recreate the object. For strings, the result has quotes around it if displayed
with a print statement, which differs in form between the Python 2.X and 3.X lines:
>>> print(str('spam'), repr('spam')) # 2.X: print str('spam'), repr('spam')
spam 'spam'
>>> str('spam'), repr('spam') # Raw interactive echo displays
('spam', "'spam'")
See the sidebar in Chapter 5’s “str and repr Display Formats” on page 144 for more on
these topics. Of these, int and str are the generally prescribed to-number and to-string
conversion techniques.
Now, although you can’t mix strings and number types around operators such as +,
you can manually convert operands before that operation if needed:
>>> S = "42"
>>> I = 1
>>> S + I
TypeError: Can't convert 'int' object to str implicitly
>>> int(S) + I # Force addition
43
>>> S + str(I) # Force concatenation
'421'
Similar built-in functions handle floating-point-number conversions to and from
strings:
>>> str(3.1415), float("1.5")
('3.1415', 1.5)
>>> text = "1.234E-10"
>>> float(text) # Shows more digits before 2.7 and 3.1
1.234e-10
Later, we’ll further study the built-in eval function; it runs a string containing Python
expression code and so can convert a string to any kind of object. The functions int
and float convert only to numbers, but this restriction means they are usually faster
(and more secure, because they do not accept arbitrary expression code). As we saw
briefly in Chapter 5, the string formatting expression also provides a way to convert
numbers to strings. We’ll discuss formatting further later in this chapter.
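As a quick preview of that power gap, compare the following sketch; eval accepts any expression's code at all, while int rejects anything but integer text:
>>> eval('42'), eval('2 ** 8')             # Any expression code at all
(42, 256)
>>> eval('[1, 2] + [3]')                   # Even nonnumeric objects
[1, 2, 3]
>>> int('2 ** 8')                          # Numbers only: safer and faster
ValueError: invalid literal for int() with base 10: '2 ** 8'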
Character code conversions
On the subject of conversions, it is also possible to convert a single character to its
underlying integer code (e.g., its ASCII byte value) by passing it to the built-in ord
function—this returns the actual binary value used to represent the corresponding
character in memory. The chr function performs the inverse operation, taking an integer
code and converting it to the corresponding character:
>>> ord('s')
115
>>> chr(115)
's'
Technically, both of these convert characters to and from their Unicode ordinals or
“code points,” which are just their identifying number in the underlying character set.
For ASCII text, this is the familiar 7-bit integer that fits in a single byte in memory, but
the range of code points for other kinds of Unicode text may be wider (more on char-
acter sets and Unicode in Chapter 37). You can use a loop to apply these functions to
all characters in a string if required. These tools can also be used to perform a sort of
string-based math. To advance to the next character, for example, convert and do the
math in integer:
>>> S = '5'
>>> S = chr(ord(S) + 1)
>>> S
'6'
>>> S = chr(ord(S) + 1)
>>> S
'7'
At least for single-character strings, this provides an alternative to using the built-in
int function to convert from string to integer (though this only makes sense in character
sets that order items as your code expects!):
>>> int('5')
5
>>> ord('5') - ord('0')
5
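As mentioned, you can also apply these functions to every character in a string, with a loop or the comprehension and generator syntax previewed in Chapter 4 and covered in depth later; a quick sketch (join appears later in this chapter):
>>> [ord(c) for c in 'spam']               # ord of each character
[115, 112, 97, 109]
>>> ''.join(chr(n) for n in [115, 112, 97, 109])
'spam'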
Such conversions can be used in conjunction with looping statements, introduced in
Chapter 4 and covered in depth in the next part of this book, to convert a string of
binary digits to their corresponding integer values. Each time through the loop, multiply
the current value by 2 and add the next digit’s integer value:
>>> B = '1101' # Convert binary digits to integer with ord
>>> I = 0
>>> while B != '':
...     I = I * 2 + (ord(B[0]) - ord('0'))
...     B = B[1:]
...
>>> I
13
A left-shift operation (I << 1) would have the same effect as multiplying by 2 here.
We’ll leave this change as a suggested exercise, though, both because we haven’t stud-
ied loops in detail yet and because the int and bin built-ins we met in Chapter 5 handle
binary conversion tasks for us as of Python 2.6 and 3.0:
>>> int('1101', 2) # Convert binary to integer: built-in
13
>>> bin(13) # Convert integer to binary: built-in
'0b1101'
Given enough time, Python tends to automate most common tasks!
Changing Strings I
Remember the term “immutable sequence”? As we’ve seen, the immutable part means
that you cannot change a string in place—for instance, by assigning to an index:
>>> S = 'spam'
>>> S[0] = 'x' # Raises an error!
TypeError: 'str' object does not support item assignment
How to modify text information in Python, then? To change a string, you generally
need to build and assign a new string using tools such as concatenation and slicing,
and then, if desired, assign the result back to the string’s original name:
>>> S = S + 'SPAM!' # To change a string, make a new one
>>> S
'spamSPAM!'
>>> S = S[:4] + 'Burger' + S[-1]
>>> S
'spamBurger!'
The first example adds a substring at the end of S, by concatenation. Really, it makes
a new string and assigns it back to S, but you can think of this as “changing” the original
string. The second example replaces four characters with six by slicing, indexing, and
concatenating. As you’ll see in the next section, you can achieve similar effects with
string method calls like replace:
>>> S = 'splot'
>>> S = S.replace('pl', 'pamal')
>>> S
'spamalot'
Like every operation that yields a new string value, string methods generate new string
objects. If you want to retain those objects, you can assign them to variable names.
Generating a new string object for each string change is not as inefficient as it may
sound—remember, as discussed in the preceding chapter, Python automatically
garbage-collects (reclaims the space of) old unused string objects as you go, so newer
objects reuse the space held by prior values. Python is usually more efficient than you
might expect.
Finally, it’s also possible to build up new text values with string formatting expressions.
Both of the following substitute objects into a string, in a sense converting the objects
to strings and changing the original string according to a format specification:
>>> 'That is %d %s bird!' % (1, 'dead') # Format expression: all Pythons
'That is 1 dead bird!'
>>> 'That is {0} {1} bird!'.format(1, 'dead') # Format method in 2.6, 2.7, 3.X
'That is 1 dead bird!'
Despite the substitution metaphor, though, the result of formatting is a new string
object, not a modified one. We’ll study formatting later in this chapter; as we’ll find,
formatting turns out to be more general and useful than this example implies. Because
the second of the preceding calls is provided as a method, though, let’s get a handle on
string method calls before we explore formatting further.
As previewed in Chapter 4 and to be covered in Chapter 37, Python 3.0
and 2.6 introduced a new string type known as bytearray, which is mu-
table and so may be changed in place. bytearray objects aren’t really
text strings; they’re sequences of small, 8-bit integers. However, they
support most of the same operations as normal strings and print as AS-
CII characters when displayed. Accordingly, they provide another op-
tion for large amounts of simple 8-bit text that must be changed fre-
quently (richer types of Unicode text imply different techniques). In
Chapter 37 we’ll also see that ord and chr handle Unicode characters,
too, which might not be stored in single bytes.
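As a minimal preview of that type's flavor (a sketch; the b'...' bytes literals used here are covered in Chapter 37):
>>> B = bytearray(b'spam')                 # A mutable sequence of 8-bit ints
>>> B[1] = ord('l')                        # In-place change: no new object
>>> B.extend(b'my')
>>> B
bytearray(b'slammy')
>>> B.decode()                             # Translate back to a text string
'slammy'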
String Methods
In addition to expression operators, strings provide a set of methods that implement
more sophisticated text-processing tasks. In Python, expressions and built-in functions
may work across a range of types, but methods are generally specific to object types:
string methods, for example, work only on string objects. The method sets of some
types intersect in Python 3.X (e.g., many types have count and copy methods), but they
are still more type-specific than other tools.
Method Call Syntax
As introduced in Chapter 4, methods are simply functions that are associated with and
act upon particular objects. Technically, they are attributes attached to objects that
happen to reference callable functions which always have an implied subject. In finer-
grained detail, functions are packages of code, and method calls combine two opera-
tions at once—an attribute fetch and a call:
Attribute fetches
    An expression of the form object.attribute means “fetch the value of attribute in object.”
Call expressions
    An expression of the form function(arguments) means “invoke the code of function, passing zero or more comma-separated argument objects to it, and return function’s result value.”
Putting these two together allows us to call a method of an object. The method call
expression:
object.method(arguments)
is evaluated from left to right—Python will first fetch the method of the object and then
call it, passing in both object and the arguments. Or, in plain words, the method call
expression means this:
Call method to process object with arguments.
If the method computes a result, it will also come back as the result of the entire method-
call expression. As a more tangible example:
>>> S = 'spam'
>>> result = S.find('pa') # Call the find method to look for 'pa' in string S
This mapping holds true for methods of both built-in types and the user-defined
classes we’ll study later. As you’ll see throughout this part of the book, most objects
have callable methods, and all are accessed using this same method-call syntax. To call
an object method, as you’ll see in the following sections, you have to go through an
existing object; methods cannot be run (and make little sense) without a subject.
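In fact, because the fetch and the call are two distinct steps, you can run them separately if you wish; the fetched method is an object in its own right, with its subject string already implied (a sketch for illustration only):
>>> S = 'spam'
>>> find = S.find                          # Attribute fetch alone: a bound method
>>> find('pa')                             # Call alone: S is the implied subject
1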
Methods of Strings
Table 7-3 summarizes the methods and call patterns for built-in string objects in Python
3.3; these change frequently, so be sure to check Python’s standard library manual for
the most up-to-date list, or run a dir or help call on any string (or the str type name)
interactively. Python 2.X’s string methods vary slightly; it includes a decode, for exam-
ple, because of its different handling of Unicode data (something we’ll discuss in
Chapter 37). In this table, S is a string object, and optional arguments are enclosed in
square brackets. String methods in this table implement higher-level operations such
as splitting and joining, case conversions, content tests, and substring searches and
replacements.
Table 7-3. String method calls in Python 3.3
S.capitalize()
S.casefold()
S.center(width [, fill])
S.count(sub [, start [, end]])
S.encode([encoding [,errors]])
S.endswith(suffix [, start [, end]])
S.expandtabs([tabsize])
S.find(sub [, start [, end]])
S.format(fmtstr, *args, **kwargs)
S.index(sub [, start [, end]])
S.isalnum()
S.isalpha()
S.isdecimal()
S.isdigit()
S.isidentifier()
S.islower()
S.isnumeric()
S.isprintable()
S.isspace()
S.istitle()
S.isupper()
S.join(iterable)
S.ljust(width [, fill])
S.lower()
S.lstrip([chars])
S.maketrans(x[, y[, z]])
S.partition(sep)
S.replace(old, new [, count])
S.rfind(sub [,start [,end]])
S.rindex(sub [, start [, end]])
S.rjust(width [, fill])
S.rpartition(sep)
S.rsplit([sep[, maxsplit]])
S.rstrip([chars])
S.split([sep [,maxsplit]])
S.splitlines([keepends])
S.startswith(prefix [, start [, end]])
S.strip([chars])
S.swapcase()
S.title()
S.translate(map)
S.upper()
S.zfill(width)
As you can see, there are quite a few string methods, and we don’t have space to cover
them all; see Python’s library manual or reference texts for all the fine points. To help
you get started, though, let’s work through some code that demonstrates some of the
most commonly used methods in action, and illustrates Python text-processing basics
along the way.
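You can also ask a string itself what it supports, per the dir and help calls mentioned earlier; the following sketch abbreviates results, which vary across Python versions:
>>> dir('')                                # Attributes of any string
['__add__', '__class__', ...more names omitted..., 'upper', 'zfill']
>>> help(''.rstrip)                        # Documentation for one method
Help on built-in function rstrip:
rstrip(...)
    S.rstrip([chars]) -> str
    ...more text omitted...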
String Method Examples: Changing Strings II
As we’ve seen, because strings are immutable, they cannot be changed in place directly.
The bytearray supports in-place text changes in 2.6, 3.0, and later, but only for simple
8-bit types. We explored changes to text strings earlier, but let’s take a quick second
look here in the context of string methods.
In general, to make a new text value from an existing string, you construct a new string
with operations such as slicing and concatenation. For example, to replace two char-
acters in the middle of a string, you can use code like this:
>>> S = 'spammy'
>>> S = S[:3] + 'xx' + S[5:] # Slice sections from S
>>> S
'spaxxy'
But, if you’re really just out to replace a substring, you can use the string replace method
instead:
>>> S = 'spammy'
>>> S = S.replace('mm', 'xx') # Replace all mm with xx in S
>>> S
'spaxxy'
The replace method is more general than this code implies. It takes as arguments the
original substring (of any length) and the string (of any length) to replace it with, and
performs a global search and replace:
>>> 'aa$bb$cc$dd'.replace('$', 'SPAM')
'aaSPAMbbSPAMccSPAMdd'
In such a role, replace can be used as a tool to implement template replacements (e.g.,
in form letters). Notice that this time we simply printed the result, instead of assigning
it to a name—you need to assign results to names only if you want to retain them for
later use.
If you need to replace one fixed-size string that can occur at any offset, you can do a
replacement again, or search for the substring with the string find method and then
slice:
>>> S = 'xxxxSPAMxxxxSPAMxxxx'
>>> where = S.find('SPAM') # Search for position
>>> where # Occurs at offset 4
4
>>> S = S[:where] + 'EGGS' + S[(where+4):]
>>> S
'xxxxEGGSxxxxSPAMxxxx'
The find method returns the offset where the substring appears (by default, searching
from the front), or −1 if it is not found. As we saw earlier, it’s a substring search operation
just like the in expression, but find returns the position of a located substring.
Another option is to use replace with a third argument to limit it to a single substitution:
>>> S = 'xxxxSPAMxxxxSPAMxxxx'
>>> S.replace('SPAM', 'EGGS') # Replace all
'xxxxEGGSxxxxEGGSxxxx'
>>> S.replace('SPAM', 'EGGS', 1) # Replace one
'xxxxEGGSxxxxSPAMxxxx'
Notice that replace returns a new string object each time. Because strings are immut-
able, methods never really change the subject strings in place, even if they are called
“replace”!
The fact that concatenation operations and the replace method generate new string
objects each time they are run is actually a potential downside of using them to change
strings. If you have to apply many changes to a very large string, you might be able to
improve your script’s performance by converting the string to an object that does sup-
port in-place changes:
>>> S = 'spammy'
>>> L = list(S)
>>> L
['s', 'p', 'a', 'm', 'm', 'y']
The built-in list function (an object construction call) builds a new list out of the items
in any sequence—in this case, “exploding” the characters of a string into a list. Once
the string is in this form, you can make multiple changes to it without generating a new
copy for each change:
>>> L[3] = 'x' # Works for lists, not strings
>>> L[4] = 'x'
>>> L
['s', 'p', 'a', 'x', 'x', 'y']
If, after your changes, you need to convert back to a string (e.g., to write to a file), use
the string join method to “implode” the list back into a string:
>>> S = ''.join(L)
>>> S
'spaxxy'
The join method may look a bit backward at first sight. Because it is a method of strings
(not of lists), it is called through the desired delimiter. join puts the strings in a list (or
other iterable) together, with the delimiter between list items; in this case, it uses an
empty string delimiter to convert from a list back to a string. More generally, any string
delimiter and iterable of strings will do:
>>> 'SPAM'.join(['eggs', 'sausage', 'ham', 'toast'])
'eggsSPAMsausageSPAMhamSPAMtoast'
In fact, joining substrings all at once might often run faster than concatenating them
individually. Be sure to also see the earlier note about the mutable bytearray string
available as of Python 3.0 and 2.6, described fully in Chapter 37; because it may be
changed in place, it offers an alternative to this list/join combination for some kinds
of 8-bit text that must be changed often.
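To make that pattern concrete, here's a small sketch that collects pieces in a list and joins just once at the end, instead of paying for a new string object on every concatenation:
>>> parts = []
>>> for c in 'spam':                       # Collect pieces in a list...
...     parts.append(c.upper())
...
>>> ''.join(parts)                         # ...then make one string at the end
'SPAM'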
String Method Examples: Parsing Text
Another common role for string methods is as a simple form of text parsing—that is,
analyzing structure and extracting substrings. To extract substrings at fixed offsets, we
can employ slicing techniques:
>>> line = 'aaa bbb ccc'
>>> col1 = line[0:3]
>>> col3 = line[8:]
>>> col1
'aaa'
>>> col3
'ccc'
Here, the columns of data appear at fixed offsets and so may be sliced out of the original
string. This technique passes for parsing, as long as the components of your data have
fixed positions. If instead some sort of delimiter separates the data, you can pull out its
components by splitting. This will work even if the data may show up at arbitrary
positions within the string:
>>> line = 'aaa bbb ccc'
>>> cols = line.split()
>>> cols
['aaa', 'bbb', 'ccc']
The string split method chops up a string into a list of substrings, around a delimiter
string. We didn’t pass a delimiter in the prior example, so it defaults to whitespace—
the string is split at groups of one or more spaces, tabs, and newlines, and we get back
a list of the resulting substrings. In other applications, more tangible delimiters may
separate the data. This example splits (and hence parses) the string at commas, a sep-
arator common in data returned by some database tools:
>>> line = 'bob,hacker,40'
>>> line.split(',')
['bob', 'hacker', '40']
Delimiters can be longer than a single character, too:
>>> line = "i'mSPAMaSPAMlumberjack"
>>> line.split("SPAM")
["i'm", 'a', 'lumberjack']
Although there are limits to the parsing potential of slicing and splitting, both run very
fast and can handle basic text-extraction chores. Comma-separated text data is part of
the CSV file format; for more advanced tools on this front, see also the csv module in
Python’s standard library.
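As a small preview of that module, its reader objects handle details such as quoted fields that contain the delimiter itself, which a naive split gets wrong (a sketch, with made-up data):
>>> import csv
>>> for row in csv.reader(['bob,hacker,40', 'sue,"coder, pro",36']):
...     print(row)
...
['bob', 'hacker', '40']
['sue', 'coder, pro', '36']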
Other Common String Methods in Action
Other string methods have more focused roles—for example, to strip off whitespace
at the end of a line of text, perform case conversions, test content, and test for a substring
at the end or front:
>>> line = "The knights who say Ni!\n"
>>> line.rstrip()
'The knights who say Ni!'
>>> line.upper()
'THE KNIGHTS WHO SAY NI!\n'
>>> line.isalpha()
False
>>> line.endswith('Ni!\n')
True
>>> line.startswith('The')
True
Alternative techniques can also sometimes be used to achieve the same results as string
methods—the in membership operator can be used to test for the presence of a sub-
string, for instance, and length and slicing operations can be used to mimic endswith:
>>> line
'The knights who say Ni!\n'
>>> line.find('Ni') != -1 # Search via method call or expression
True
>>> 'Ni' in line
True
>>> sub = 'Ni!\n'
>>> line.endswith(sub) # End test via method call or slice
True
>>> line[-len(sub):] == sub
True
See also the format string formatting method described later in this chapter; it provides
more advanced substitution tools that combine many operations in a single step.
Again, because there are so many methods available for strings, we won’t look at every
one here. You’ll see some additional string examples later in this book, but for more
details you can also turn to the Python library manual and other documentation sour-
ces, or simply experiment interactively on your own. You can also check the
help(S.method) results for a method of any string object S for more hints; as we saw in
Chapter 4, running help on str.method likely gives the same details.
Note that none of the string methods accepts patterns—for pattern-based text pro-
cessing, you must use the Python re standard library module, an advanced tool that
was introduced in Chapter 4 but is mostly outside the scope of this text (one further
brief example appears at the end of Chapter 37). Because of this limitation, though,
string methods may sometimes run more quickly than the re module’s tools.
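To give you a sense of the difference, here's a brief and hypothetical re sketch; a single pattern can match variable text that would require extra logic with string methods alone:
>>> import re
>>> re.split('[/:]', 'usr/home:lumberjack')        # Split on either delimiter
['usr', 'home', 'lumberjack']
>>> re.sub('[0-9]+', '#', 'spam101ham42')          # Replace by pattern
'spam#ham#'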
The Original string Module’s Functions (Gone in 3.X)
The history of Python’s string methods is somewhat convoluted. For roughly the first
decade of its existence, Python provided a standard library module called string that
contained functions that largely mirrored the current set of string object methods. By
popular demand, in Python 2.0 these functions were made available as methods of
string objects. Because so many people had written so much code that relied on the
original string module, however, it was retained for backward compatibility.
Today, you should use only string methods, not the original string module. In fact, the
original module call forms of today’s string methods have been removed completely
from Python 3.X, and you should not use them in new code in either 2.X or 3.X. How-
ever, because you may still see the module in use in older Python 2.X code, and this
text covers both Pythons 2.X and 3.X, a brief look is in order here.
The upshot of this legacy is that in Python 2.X, there technically are still two ways to
invoke advanced string operations: by calling object methods, or by calling string
module functions and passing in the objects as arguments. For instance, given a variable
X assigned to a string object, calling an object method:
X.method(arguments)
is usually equivalent to calling the same operation through the string module (provided
that you have already imported the module):
string.method(X, arguments)
Here’s an example of the method scheme in action:
>>> S = 'a+b+c+'
>>> x = S.replace('+', 'spam')
>>> x
'aspambspamcspam'
To access the same operation through the string module in Python 2.X, you need to
import the module (at least once in your process) and pass in the object:
>>> import string
>>> y = string.replace(S, '+', 'spam')
>>> y
'aspambspamcspam'
Because the module approach was the standard for so long, and because strings are
such a central component of most programs, you might see both call patterns in Python
2.X code you come across.
Again, though, today you should always use method calls instead of the older module
calls. There are good reasons for this, besides the fact that the module calls have gone
away in 3.X. For one thing, the module call scheme requires you to import the string
module (methods do not require imports). For another, the module makes calls a few
characters longer to type (when you load the module with import, that is, not using
from). And, finally, the module runs more slowly than methods (the module maps most
calls back to the methods and so incurs an extra call along the way).
The original string module itself, without its string method equivalents, is retained in
Python 3.X because it contains additional tools, including predefined string constants
(e.g., string.digits) and a Template object system—a relatively obscure formatting
tool that predates the string format method and is largely omitted here (for details, see
the brief note comparing it to other formatting tools ahead, as well as Python’s library
manual). Unless you really want to have to change your 2.X code to use 3.X, though,
you should consider any basic string operation calls in it to be just ghosts of Python past.
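For reference, here's a quick sketch of the module's surviving tools in action (the Template system is covered more fully in Python's manuals):
>>> import string
>>> string.digits                                  # Predefined constants
'0123456789'
>>> t = string.Template('That is $qty $food bird!')
>>> t.substitute(qty=1, food='dead')               # Obscure, but still present
'That is 1 dead bird!'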
String Formatting Expressions
Although you can get a lot done with the string methods and sequence operations we’ve
already met, Python also provides a more advanced way to combine string processing
tasks—string formatting allows us to perform multiple type-specific substitutions on a
string in a single step. It’s never strictly required, but it can be convenient, especially
when formatting text to be displayed to a program’s users. Due to the wealth of new
ideas in the Python world, string formatting is available in two flavors in Python today
(not counting the less-used string module Template system mentioned in the prior
section):
String formatting expressions: '...%s...' % (values)
The original technique available since Python’s inception, this form is based upon
the C language’s “printf” model, and sees widespread use in much existing code.
String formatting method calls: '...{}...'.format(values)
A newer technique added in Python 2.6 and 3.0, this form is derived in part from
a same-named tool in C#/.NET, and overlaps with string formatting expression
functionality.
Since the method call flavor is newer, there is some chance that one or the other of these
may become deprecated and removed over time. When 3.0 was released in 2008, the
expression seemed more likely to be deprecated in later Python releases. Indeed, 3.0’s
documentation threatened deprecation in 3.1 and removal thereafter. This hasn’t hap-
pened as of 2013 and 3.3, and now looks unlikely given the expression’s wide use—in
fact, it still appears even in Python’s own standard library thousands of times today!
Naturally, this story’s development depends on the future practice of Python’s users.
On the other hand, because both the expression and method are valid to use today and
either may appear in code you’ll come across, this book covers both techniques in full
here. As you’ll see, the two are largely variations on a theme, though the method has
some extra features (such as thousands separators), and the expression is often more
concise and seems second nature to most Python programmers.
This book itself uses both techniques in later examples for illustrative purposes. If its
author has a preference, he will keep it largely classified, except to quote from Python’s
import this motto:
There should be one—and preferably only one—obvious way to do it.
Unless the newer string formatting method is compellingly better than the original and
widely used expression, its doubling of Python programmers’ knowledge base require-
ments in this domain seems unwarranted—and even un-Pythonic, per the original and
longstanding meaning of that term. Programmers should not have to learn two com-
plicated tools if those tools largely overlap. You’ll have to judge for yourself whether
formatting merits the added language heft, of course, so let’s give both a fair hearing.
Formatting Expression Basics
Since string formatting expressions are the original in this department, we’ll start with
them. Python defines the % binary operator to work on strings (you may recall that this
is also the remainder of division, or modulus, operator for numbers). When applied to
strings, the % operator provides a simple way to format values as strings according to a
format definition. In short, the % operator provides a compact way to code multiple
string substitutions all at once, instead of building and concatenating parts individually.
To format strings:
1. On the left of the % operator, provide a format string containing one or more em-
bedded conversion targets, each of which starts with a % (e.g., %d).
2. On the right of the % operator, provide the object (or objects, embedded in a tuple)
that you want Python to insert into the format string on the left in place of the
conversion target (or targets).
For instance, in the formatting example we saw earlier in this chapter, the integer 1
replaces the %d in the format string on the left, and the string 'dead' replaces the %s.
The result is a new string that reflects these two substitutions, which may be printed
or saved for use in other roles:
>>> 'That is %d %s bird!' % (1, 'dead') # Format expression
'That is 1 dead bird!'
Technically speaking, string formatting expressions are usually optional—you can
generally do similar work with multiple concatenations and conversions. However,
formatting allows us to combine many steps into a single operation. It’s powerful
enough to warrant a few more examples:
>>> exclamation = 'Ni'
>>> 'The knights who say %s!' % exclamation # String substitution
'The knights who say Ni!'
>>> '%d %s %g you' % (1, 'spam', 4.0) # Type-specific substitutions
'1 spam 4 you'
>>> '%s -- %s -- %s' % (42, 3.14159, [1, 2, 3]) # All types match a %s target
'42 -- 3.14159 -- [1, 2, 3]'
The first example here plugs the string 'Ni' into the target on the left, replacing the
%s marker. In the second example, three values are inserted into the target string. Note
that when you’re inserting more than one value, you need to group the values on the
right in parentheses (i.e., put them in a tuple). The % formatting expression operator
expects either a single item or a tuple of one or more items on its right side.
The third example again inserts three values—an integer, a floating-point object, and
a list object—but notice that all of the targets on the left are %s, which stands for con-
version to string. As every type of object can be converted to a string (the one used
when printing), every object type works with the %s conversion code. Because of this,
unless you will be doing some special formatting, %s is often the only code you need to
remember for the formatting expression.
Again, keep in mind that formatting always makes a new string, rather than changing
the string on the left; because strings are immutable, it must work this way. As before,
assign the result to a variable name if you need to retain it.
Advanced Formatting Expression Syntax
For more advanced type-specific formatting, you can use any of the conversion type
codes listed in Table 7-4 in formatting expressions; they appear after the % character in
substitution targets. C programmers will recognize most of these because Python string
formatting supports all the usual C printf format codes (but returns the result, instead
of displaying it, like printf). Some of the format codes in the table provide alternative
ways to format the same type; for instance, %e, %f, and %g provide alternative ways to
format floating-point numbers.
Table 7-4. String formatting type codes
Code Meaning
s    String (or any object’s str(X) string)
r    Same as s, but uses repr, not str
c    Character (int or str)
d    Decimal (base-10 integer)
i    Integer
u    Same as d (obsolete: no longer unsigned)
o    Octal integer (base 8)
x    Hex integer (base 16)
X    Same as x, but with uppercase letters
e    Floating point with exponent, lowercase
E    Same as e, but uses uppercase letters
f    Floating-point decimal
F    Same as f, but uses uppercase letters
g    Floating-point e or f
G    Floating-point E or F
%    Literal % (coded as %%)
In fact, conversion targets in the format string on the expression’s left side support a
variety of conversion operations with a fairly sophisticated syntax all their own. The
general structure of conversion targets looks like this:
%[(keyname)][flags][width][.precision]typecode
The type code characters in the first column of Table 7-4 show up at the end of this
target string’s format. Between the % and the type code character, you can do any of the
following:
• Provide a key name for indexing the dictionary used on the right side of the expression
• List flags that specify things like left justification (-), numeric sign (+), a blank before positive numbers and a - for negatives (a space), and zero fills (0)
• Give a total minimum field width for the substituted text
• Set the number of digits (precision) to display after a decimal point for floating-point numbers
Both the width and precision parts can also be coded as a * to specify that they should
take their values from the next item in the input values on the expression’s right side
(useful when this isn’t known until runtime). And if you don’t need any of these extra
tools, a simple %s in the format string will be replaced by the corresponding value’s
default print string, regardless of its type.
Advanced Formatting Expression Examples
Formatting target syntax is documented in full in the Python standard manuals and
reference texts, but to demonstrate common usage, let’s look at a few examples. This
one formats integers by default, and then in a six-character field with left justification
and zero padding:
>>> x = 1234
>>> res = 'integers: ...%d...%-6d...%06d' % (x, x, x)
>>> res
'integers: ...1234...1234  ...001234'
The %e, %f, and %g formats display floating-point numbers in different ways, as the
following interaction demonstrates—%E is the same as %e but the exponent is uppercase,
and g chooses formats by number content (it’s formally defined to use exponential
format e if the exponent is less than −4 or not less than precision, and decimal format
f otherwise, with a default total digits precision of 6):
>>> x = 1.23456789
>>> x # Shows more digits before 2.7 and 3.1
1.23456789
>>> '%e | %f | %g' % (x, x, x)
'1.234568e+00 | 1.234568 | 1.23457'
>>> '%E' % x
'1.234568E+00'
For floating-point numbers, you can achieve a variety of additional formatting effects
by specifying left justification, zero padding, numeric signs, total field width, and digits
after the decimal point. For simpler tasks, you might get by with simply converting to
strings with a %s format expression or the str built-in function shown earlier:
>>> '%-6.2f | %05.2f | %+06.1f' % (x, x, x)
'1.23   | 01.23 | +001.2'
>>> '%s' % x, str(x)
('1.23456789', '1.23456789')
When sizes are not known until runtime, you can use a computed width and precision
by specifying them with a * in the format string to force their values to be taken from
the next item in the inputs to the right of the % operator—the 4 in the tuple here gives
precision:
>>> '%f, %.2f, %.*f' % (1/3.0, 1/3.0, 4, 1/3.0)
'0.333333, 0.33, 0.3333'
If you’re interested in this feature, experiment with some of these examples and oper-
ations on your own for more insight.
Dictionary-Based Formatting Expressions
As a more advanced extension, string formatting also allows conversion targets on the
left to refer to the keys in a dictionary coded on the right and fetch the corresponding
values. This opens the door to using formatting as a sort of template tool. We’ve only
met dictionaries briefly thus far in Chapter 4, but here’s an example that demonstrates
the basics:
>>> '%(qty)d more %(food)s' % {'qty': 1, 'food': 'spam'}
'1 more spam'
Here, the (qty) and (food) in the format string on the left refer to keys in the dictionary
literal on the right and fetch their associated values. Programs that generate text such
as HTML or XML often use this technique—you can build up a dictionary of values
and substitute them all at once with a single formatting expression that uses key-based
references (notice the first comment is above the triple quote so it’s not added to the
string, and I’m typing this in IDLE without a “...” prompt for continuation lines):
>>> # Template with substitution targets
>>> reply = """
Greetings...
Hello %(name)s!
Your age is %(age)s
"""
>>> values = {'name': 'Bob', 'age': 40} # Build up values to substitute
>>> print(reply % values) # Perform substitutions
Greetings...
Hello Bob!
Your age is 40
This trick is also used in conjunction with the vars built-in function, which returns a
dictionary containing all the variables that exist in the place it is called:
>>> food = 'spam'
>>> qty = 10
>>> vars()
{'food': 'spam', 'qty': 10, ...plus built-in names set by Python... }
When used on the right side of a format operation, this allows the format string to refer
to variables by name—as dictionary keys:
>>> '%(qty)d more %(food)s' % vars() # Variables are keys in vars()
'10 more spam'
We’ll study dictionaries in more depth in Chapter 8. See also Chapter 5 for examples
that convert to hexadecimal and octal number strings with the %x and %o formatting
expression target codes, which we won’t repeat here. Additional formatting expression
examples also appear ahead as comparisons to the formatting method—this chapter’s
next and final string topic.
String Formatting Method Calls
As mentioned earlier, Python 2.6 and 3.0 introduced a new way to format strings that
is seen by some as a bit more Python-specific. Unlike formatting expressions, formatting
method calls are not closely based upon the C language’s “printf” model, and are
sometimes more explicit in intent. On the other hand, the new technique still relies on
core “printf” concepts, such as type codes and formatting specifications. Moreover, it
largely overlaps with—and sometimes requires a bit more code than—formatting ex-
pressions, and in practice can be just as complex in many roles. Because of this, there
is no best-use recommendation between expressions and method calls today, and most
programmers would be well served by a cursory understanding of both schemes. Luck-
ily, the two are similar enough that many core concepts overlap.
Formatting Method Basics
The string object’s format method, available in Python 2.6, 2.7, and 3.X, is based on
normal function call syntax, instead of an expression. Specifically, it uses the subject
string as a template, and takes any number of arguments that represent values to be
substituted according to the template.
Its use requires knowledge of functions and calls, but is mostly straightforward. Within
the subject string, curly braces designate substitution targets and arguments to be in-
serted either by position (e.g., {1}), or keyword (e.g., {food}), or relative position in
2.7, 3.1, and later ({}). As we’ll learn when we study argument passing in depth in
Chapter 18, arguments to functions and methods may be passed by position or keyword
name, and Python’s ability to collect arbitrarily many positional and keyword argu-
ments allows for such general method call patterns. For example:
>>> template = '{0}, {1} and {2}' # By position
>>> template.format('spam', 'ham', 'eggs')
'spam, ham and eggs'
>>> template = '{motto}, {pork} and {food}' # By keyword
>>> template.format(motto='spam', pork='ham', food='eggs')
'spam, ham and eggs'
>>> template = '{motto}, {0} and {food}' # By both
>>> template.format('ham', motto='spam', food='eggs')
'spam, ham and eggs'
>>> template = '{}, {} and {}' # By relative position
>>> template.format('spam', 'ham', 'eggs') # New in 3.1 and 2.7
'spam, ham and eggs'
By comparison, the last section’s formatting expression can be a bit more concise, but
uses dictionaries instead of keyword arguments, and doesn’t allow quite as much flex-
ibility for value sources (which may be an asset or liability, depending on your per-
spective); more on how the two techniques compare ahead:
>>> template = '%s, %s and %s' # Same via expression
>>> template % ('spam', 'ham', 'eggs')
'spam, ham and eggs'
>>> template = '%(motto)s, %(pork)s and %(food)s'
>>> template % dict(motto='spam', pork='ham', food='eggs')
'spam, ham and eggs'
Note the use of dict() to make a dictionary from keyword arguments here, introduced
in Chapter 4 and covered in full in Chapter 8; it’s an often less-cluttered alternative to
the {...} literal. Naturally, the subject string in the format method call can also be a
literal that creates a temporary string, and arbitrary object types can be substituted at
targets much like the expression’s %s code:
>>> '{motto}, {0} and {food}'.format(42, motto=3.14, food=[1, 2])
'3.14, 42 and [1, 2]'
Just as with the % expression and other string methods, format creates and returns a
new string object, which can be printed immediately or saved for further work (recall
that strings are immutable, so format really must make a new object). String formatting
is not just for display:
>>> X = '{motto}, {0} and {food}'.format(42, motto=3.14, food=[1, 2])
>>> X
'3.14, 42 and [1, 2]'
>>> X.split(' and ')
['3.14, 42', '[1, 2]']
>>> Y = X.replace('and', 'but under no circumstances')
>>> Y
'3.14, 42 but under no circumstances [1, 2]'
Adding Keys, Attributes, and Offsets
Like % formatting expressions, format calls can become more complex to support more
advanced usage. For instance, format strings can name object attributes and dictionary
keys—as in normal Python syntax, square brackets name dictionary keys and dots
denote object attributes of an item referenced by position or keyword. The first of the
following examples indexes a dictionary on the key “spam” and then fetches the at-
tribute “platform” from the already imported sys module object. The second does the
same, but names the objects by keyword instead of position:
>>> import sys
>>> 'My {1[kind]} runs {0.platform}'.format(sys, {'kind': 'laptop'})
'My laptop runs win32'
>>> 'My {map[kind]} runs {sys.platform}'.format(sys=sys, map={'kind': 'laptop'})
'My laptop runs win32'
Square brackets in format strings can name list (and other sequence) offsets to perform
indexing, too, but only single positive offsets work syntactically within format strings,
so this feature is not as general as you might think. As with % expressions, to name
negative offsets or slices, or to use arbitrary expression results in general, you must run
expressions outside the format string itself (note the use of *parts here to unpack a
tuple’s items into individual function arguments, as we did in Chapter 5 when studying
fractions; more on this form in Chapter 18):
>>> somelist = list('SPAM')
>>> somelist
['S', 'P', 'A', 'M']
>>> 'first={0[0]}, third={0[2]}'.format(somelist)
'first=S, third=A'
>>> 'first={0}, last={1}'.format(somelist[0], somelist[-1]) # [-1] fails in fmt
'first=S, last=M'
>>> parts = somelist[0], somelist[-1], somelist[1:3] # [1:3] fails in fmt
>>> 'first={0}, last={1}, middle={2}'.format(*parts) # Or '{}' in 2.7/3.1+
"first=S, last=M, middle=['P', 'A']"
Advanced Formatting Method Syntax
Another similarity with % expressions is that you can achieve more specific layouts by
adding extra syntax in the format string. For the formatting method, we use a colon
after the possibly empty substitution target’s identification, followed by a format speci-
fier that can name the field size, justification, and a specific type code. Here’s the formal
structure of what can appear as a substitution target in a format string—its four parts
are all optional, and must appear without intervening spaces:
{fieldname component !conversionflag :formatspec}
In this substitution target syntax:
• fieldname is an optional number or keyword identifying an argument, which may be omitted to use relative argument numbering in 2.7, 3.1, and later.
• component is a string of zero or more “.name” or “[index]” references used to fetch attributes and indexed values of the argument, which may be omitted to use the whole argument value.
• conversionflag starts with a ! if present, which is followed by r, s, or a to call repr, str, or ascii built-in functions on the value, respectively.
• formatspec starts with a : if present, which is followed by text that specifies how the value should be presented, including details such as field width, alignment, padding, decimal precision, and so on, and ends with an optional data type code.
The formatspec component after the colon character has a rich format all its own, and
is formally described as follows (brackets denote optional components and are not
coded literally):
[[fill]align][sign][#][0][width][,][.precision][typecode]
In this, fill can be any fill character other than { or }; align may be <, >, =, or ^, for
left alignment, right alignment, padding after a sign character, or centered alignment,
respectively; sign may be +, -, or space; and the , (comma) option requests a comma
for a thousands separator as of Python 2.7 and 3.1. width and precision are much as
in the % expression, and the formatspec may also contain nested {} format strings with
field names only, to take values from the arguments list dynamically (much like the *
in formatting expressions).
The method’s typecode options almost completely overlap with those used in % ex-
pressions and listed previously in Table 7-4, but the format method also allows a b type
code used to display integers in binary format (it’s equivalent to using the bin built-in
call), allows a % type code to display percentages, and uses only d for base-10 integers
(i or u are not used here). Note that unlike the expression’s %s, the s type code here
requires a string object argument; omit the type code to accept any type generically.
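For instance, here's a quick hypothetical look at two of those method-only type codes at work:
>>> '{0:b}'.format(255)                    # b: binary, method only
'11111111'
>>> '{0:.1%}'.format(0.25)                 # %: scale and add a percent sign
'25.0%'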
See Python’s library manual for more on substitution syntax that we’ll omit here. In
addition to the string’s format method, a single object may also be formatted with the
format(object, formatspec) built-in function (which the method uses internally), and
may be customized in user-defined classes with the __format__ operator-overloading
method (see Part VI).
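To preview that hook, the following sketch (the Celsius class here is hypothetical; see Part VI for class basics) intercepts formatting requests made by both the built-in function and the string method:
>>> class Celsius:
...     def __init__(self, degrees):
...         self.degrees = degrees
...     def __format__(self, spec):        # Run by format() and str.format
...         return format(self.degrees, spec) + 'C'
...
>>> format(Celsius(21.5), '.1f')           # Built-in function
'21.5C'
>>> 'It is {0:.1f} today'.format(Celsius(21.5))    # String method
'It is 21.5C today'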
Advanced Formatting Method Examples
As you can tell, the syntax can be complex in formatting methods. Because your best
ally in such cases is often the interactive prompt here, let’s turn to some examples. In
the following, {0:10} means the first positional argument in a field 10 characters wide,
{1:<10} means the second positional argument left-justified in a 10-character-wide
field, and {0.platform:>10} means the platform attribute of the first argument right-
justified in a 10-character-wide field (note again the use of dict() to make a dictionary
from keyword arguments, covered in Chapter 4 and Chapter 8):
>>> '{0:10} = {1:10}'.format('spam', 123.4567) # In Python 3.3
'spam       =   123.4567'
>>> '{0:>10} = {1:<10}'.format('spam', 123.4567)
'      spam = 123.4567  '
>>> '{0.platform:>10} = {1[kind]:<10}'.format(sys, dict(kind='laptop'))
'     win32 = laptop    '
In all cases, you can omit the argument number as of Python 2.7 and 3.1 if you’re
selecting them from left to right with relative autonumbering—though this makes your
code less explicit, thereby negating one of the reported advantages of the formatting
method over the formatting expression (see the related note ahead):
>>> '{:10} = {:10}'.format('spam', 123.4567)
'spam       =   123.4567'
>>> '{:>10} = {:<10}'.format('spam', 123.4567)
'      spam = 123.4567  '
>>> '{.platform:>10} = {[kind]:<10}'.format(sys, dict(kind='laptop'))
'     win32 = laptop    '
Floating-point numbers support the same type codes and formatting specificity in for-
matting method calls as in % expressions. For instance, in the following {2:g} means
the third argument formatted by default according to the “g” floating-point represen-
tation, {1:.2f} designates the “f” floating-point format with just two decimal digits,
and {2:06.2f} adds a field with a width of six characters and zero padding on the left:
>>> '{0:e}, {1:.3e}, {2:g}'.format(3.14159, 3.14159, 3.14159)
'3.141590e+00, 3.142e+00, 3.14159'
>>> '{0:f}, {1:.2f}, {2:06.2f}'.format(3.14159, 3.14159, 3.14159)
'3.141590, 3.14, 003.14'
Hex, octal, and binary formats are supported by the format method as well. In fact,
string formatting is an alternative to some of the built-in functions that format integers
to a given base:
>>> '{0:X}, {1:o}, {2:b}'.format(255, 255, 255) # Hex, octal, binary
'FF, 377, 11111111'
>>> bin(255), int('11111111', 2), 0b11111111 # Other to/from binary
('0b11111111', 255, 255)
>>> hex(255), int('FF', 16), 0xFF # Other to/from hex
('0xff', 255, 255)
>>> oct(255), int('377', 8), 0o377 # Other to/from octal, in 3.X
('0o377', 255, 255) # 2.X prints and accepts 0377
Formatting parameters can either be hardcoded in format strings or taken from the
arguments list dynamically by nested format syntax, much like the * syntax in format-
ting expressions’ width and precision:
>>> '{0:.2f}'.format(1 / 3.0) # Parameters hardcoded
'0.33'
>>> '%.2f' % (1 / 3.0) # Ditto for expression
'0.33'
>>> '{0:.{1}f}'.format(1 / 3.0, 4) # Take value from arguments
'0.3333'
>>> '%.*f' % (4, 1 / 3.0) # Ditto for expression
'0.3333'
Finally, Python 2.6 and 3.0 also introduced a new built-in format function, which can
be used to format a single item. It’s a more concise alternative to the string format
method, and is roughly similar to formatting a single item with the % formatting ex-
pression:
>>> '{0:.2f}'.format(1.2345) # String method
'1.23'
>>> format(1.2345, '.2f') # Built-in function
'1.23'
>>> '%.2f' % 1.2345 # Expression
'1.23'
Technically, the format built-in runs the subject object’s __format__ method, which the
str.format method does internally for each formatted item. It’s still more verbose than
the original % expression’s equivalent here, though—which leads us to the next section.
Comparison to the % Formatting Expression
If you study the prior sections closely, you’ll probably notice that at least for positional
references and dictionary keys, the string format method looks very much like the %
formatting expression, especially in advanced use with type codes and extra formatting
syntax. In fact, in common use cases formatting expressions may be easier to code than
formatting method calls, especially when you’re using the generic %s print-string sub-
stitution target, and even with autonumbering of fields added in 2.7 and 3.1:
print('%s=%s' % ('spam', 42)) # Format expression: in all 2.X/3.X
print('{0}={1}'.format('spam', 42)) # Format method: in 3.0+ and 2.6+
print('{}={}'.format('spam', 42)) # With autonumbering: in 3.1+ and 2.7
As we’ll see in a moment, more complex formatting tends to be a draw in terms of
complexity (difficult tasks are generally difficult, regardless of approach), and some see
the formatting method as redundant given the pervasiveness of the expression.
On the other hand, the formatting method also offers a few potential advantages. For
example, the original % expression can’t handle keywords, attribute references, and
binary type codes, although dictionary key references in % format strings can often
achieve similar goals. To see how the two techniques overlap, compare the following
% expressions to the equivalent format method calls shown earlier:
>>> '%s, %s and %s' % (3.14, 42, [1, 2]) # Arbitrary types
'3.14, 42 and [1, 2]'
>>> 'My %(kind)s runs %(platform)s' % {'kind': 'laptop', 'platform': sys.platform}
'My laptop runs win32'
>>> 'My %(kind)s runs %(platform)s' % dict(kind='laptop', platform=sys.platform)
'My laptop runs win32'
>>> somelist = list('SPAM')
>>> parts = somelist[0], somelist[-1], somelist[1:3]
>>> 'first=%s, last=%s, middle=%s' % parts
"first=S, last=M, middle=['P', 'A']"
When more complex formatting is applied the two techniques approach parity in terms
of complexity, although if you compare the following with the format method call
equivalents listed earlier you’ll again find that the % expressions tend to be a bit simpler
and more concise; in Python 3.3:
# Adding specific formatting
>>> '%-10s = %10s' % ('spam', 123.4567)
'spam       =   123.4567'
>>> '%10s = %-10s' % ('spam', 123.4567)
'      spam = 123.4567  '
>>> '%(plat)10s = %(kind)-10s' % dict(plat=sys.platform, kind='laptop')
'     win32 = laptop    '
# Floating-point numbers
>>> '%e, %.3e, %g' % (3.14159, 3.14159, 3.14159)
'3.141590e+00, 3.142e+00, 3.14159'
>>> '%f, %.2f, %06.2f' % (3.14159, 3.14159, 3.14159)
'3.141590, 3.14, 003.14'
# Hex and octal, but not binary (see ahead)
>>> '%x, %o' % (255, 255)
'ff, 377'
The format method has a handful of advanced features that the % expression does not,
but even more involved formatting still seems to be essentially a draw in terms of com-
plexity. For instance, the following shows the same result generated with both techni-
ques, with field sizes and justifications and various argument reference methods:
# Hardcoded references in both
>>> import sys
>>> 'My {1[kind]:<8} runs {0.platform:>8}'.format(sys, {'kind': 'laptop'})
'My laptop   runs    win32'
>>> 'My %(kind)-8s runs %(plat)8s' % dict(kind='laptop', plat=sys.platform)
'My laptop   runs    win32'
In practice, programs are less likely to hardcode references like this than to execute
code that builds up a set of substitution data ahead of time (for instance, to collect
input form or database data to substitute into an HTML template all at once). When
we account for common practice in examples like this, the comparison between the
format method and the % expression is even more direct:
# Building data ahead of time in both
>>> data = dict(platform=sys.platform, kind='laptop')
>>> 'My {kind:<8} runs {platform:>8}'.format(**data)
'My laptop   runs    win32'
>>> 'My %(kind)-8s runs %(platform)8s' % data
'My laptop   runs    win32'
As we’ll see in Chapter 18, the **data in the method call here is special syntax that
unpacks a dictionary of keys and values into individual “name=value” keyword argu-
ments so they can be referenced by name in the format string—another unavoidably
far conceptual forward reference to function call tools, which may be another downside
of the format method in general, especially for newcomers.
As usual, though, the Python community will have to decide whether % expressions,
format method calls, or a toolset with both techniques proves better over time. Experi-
ment with these techniques on your own to get a feel for what they offer, and be sure
to see the library reference manuals for Python 2.6, 3.0, and later for more details.
String format method enhancements in Python 3.1 and 2.7: Python 3.1
and 2.7 added a thousand-separator syntax for numbers, which inserts
commas between three-digit groups. To make this work, add a comma
before the type code, and between the width and precision if present,
as follows:
>>> '{0:d}'.format(999999999999)
'999999999999'
>>> '{0:,d}'.format(999999999999)
'999,999,999,999'
These Pythons also assign relative numbers to substitution targets au-
tomatically if they are not included explicitly, though using this exten-
sion doesn’t apply in all use cases, and may negate one of the main
benefits of the formatting method—its more explicit code:
>>> '{:,d}'.format(999999999999)
'999,999,999,999'
>>> '{:,d} {:,d}'.format(9999999, 8888888)
'9,999,999 8,888,888'
>>> '{:,.2f}'.format(296999.2567)
'296,999.26'
See the 3.1 release notes for more details. See also the formats.py comma-
insertion and money-formatting function examples in Chapter 25 for a
simple manual solution that can be imported and used prior to Python
3.1 and 2.7. As typical in programming, it’s straightforward to imple-
ment new functionality in a callable, reusable, and customizable func-
tion of your own, rather than relying on a fixed set of built-in tools:
>>> from formats import commas, money
>>> '%s' % commas(999999999999)
'999,999,999,999'
>>> '%s %s' % (commas(9999999), commas(8888888))
'9,999,999 8,888,888'
>>> '%s' % money(296999.2567)
'$296,999.26'
And as usual, a simple function like this can be applied in more advanced
contexts too, such as the iteration tools we met in Chapter 4 and will
study fully in later chapters:
>>> [commas(x) for x in (9999999, 8888888)]
['9,999,999', '8,888,888']
>>> '%s %s' % tuple(commas(x) for x in (9999999, 8888888))
'9,999,999 8,888,888'
>>> ''.join(commas(x) for x in (9999999, 8888888))
'9,999,9998,888,888'
For better or worse, Python developers often seem to prefer adding spe-
cial-case built-in tools over general development techniques—a tradeoff
explored in the next section.
Why the Format Method?
Now that I’ve gone to such lengths to compare and contrast the two formatting tech-
niques, I wish to also explain why you still might want to consider using the format
method variant at times. In short, although the formatting method can sometimes re-
quire more code, it also:
• Has a handful of extra features not found in the % expression itself (though % can use alternatives)
• Has more flexible value reference syntax (though it may be overkill, and % often has equivalents)
• Can make substitution value references more explicit (though this is now optional)
• Trades an operator for a more mnemonic method name (though this is also more verbose)
• Does not allow different syntax for single and multiple values (though practice suggests this is trivial)
• As a function can be used in places an expression cannot (though a one-line function renders this moot)
Although both techniques are available today and the formatting expression is still
widely used, the format method might eventually grow in popularity and may receive
more attention from Python developers in the future. Further, with both the expression
and method in the language, either may appear in code you will encounter so it be-
hooves you to understand both. But because the choice is currently still yours to make
in new code, let’s briefly expand on the tradeoffs before closing the book on this topic.
Extra features: Special-case “batteries” versus general techniques
The method call supports a few extras that the expression does not, such as binary type
codes and (as of Python 2.7 and 3.1) thousands groupings. As we’ve seen, though, the
formatting expression can usually achieve the same effects in other ways. Here’s the
case for binary formatting:
>>> '{0:b}'.format((2 ** 16) - 1) # Method (only) binary format code
'1111111111111111'
>>> '%b' % ((2 ** 16) - 1)
ValueError: unsupported format character 'b'...
>>> bin((2 ** 16) - 1) # But other more general options work too
'0b1111111111111111'
>>> '%s' % bin((2 ** 16) - 1) # Usable with both method and % expression
'0b1111111111111111'
>>> '{}'.format(bin((2 ** 16) - 1)) # With 2.7/3.1+ relative numbering
'0b1111111111111111'
>>> '%s' % bin((2 ** 16) - 1)[2:] # Slice off 0b to get exact equivalent
'1111111111111111'
The preceding note showed that general functions could similarly stand in for the for-
mat method’s thousands groupings option, and more fully support customization. In
this case, a simple 8-line reusable function buys us the same utility without extra special-
case syntax:
>>> '{:,d}'.format(999999999999) # New str.format method feature in 3.1/2.7
'999,999,999,999'
>>> '%s' % commas(999999999999) # But % is same with simple 8-line function
'999,999,999,999'
See the prior note for more comma comparisons. This is essentially the same as the
preceding bin case for binary formatting, but the commas function here is user-defined,
not built in. As such, this technique is far more general purpose than precoded tools or
special syntax added for a single purpose.
This case also seems indicative, perhaps, of a trend in Python (and scripting languages
in general) toward relying more on special-case “batteries included” tools than on gen-
eral development techniques—a mindset that makes code dependent on those batter-
ies, and seems difficult to justify unless one views software development as an end-user
enterprise. To some, programmers might be better served learning how to code an
algorithm to insert commas than be provided a tool that does.
We’ll leave that philosophical debate aside here, but in practical terms the net effect of
the trend in this case is extra syntax for you to have to both learn and remember. Given
their alternatives, it’s not clear that these extra features of the methods by themselves
are compelling enough to be decisive.
Flexible reference syntax: Extra complexity and functional overlap
The method call also supports key and attribute references directly, which some may
see as more flexible. But as we saw in earlier examples comparing dictionary-based
formatting in the % expression to key and attribute references in the format method, the
two are usually too similar to warrant a preference on these grounds. For instance, both
can reference the same value multiple times:
>>> '{name} {job} {name}'.format(name='Bob', job='dev')
'Bob dev Bob'
>>> '%(name)s %(job)s %(name)s' % dict(name='Bob', job='dev')
'Bob dev Bob'
Especially in common practice, though, the expression seems just as simple, or simpler:
>>> D = dict(name='Bob', job='dev')
>>> '{0[name]} {0[job]} {0[name]}'.format(D) # Method, key references
'Bob dev Bob'
>>> '{name} {job} {name}'.format(**D) # Method, dict-to-args
'Bob dev Bob'
>>> '%(name)s %(job)s %(name)s' % D # Expression, key references
'Bob dev Bob'
To be fair, the method has even more specialized substitution syntax, and other com-
parisons might favor either scheme in small ways. But given the overlap and extra
complexity, one could argue that the format method’s utility seems either a wash, or
features in search of use cases. At the least, the added conceptual burden on Python
programmers who may now need to know both tools doesn’t seem clearly justified.
Explicit value references: Now optional and unlikely to be used
One use case where the format method is at least debatably clearer is when there are
many values to be substituted into the format string. The lister.py classes example we’ll
meet in Chapter 31, for example, substitutes six items into a single string, and in this
case the method’s {i} position labels seem marginally easier to read than the expres-
sion’s %s:
'\n%s<Class %s, address %s:\n%s%s%s>\n' % (...) # Expression
'\n{0}<Class {1}, address {2}:\n{3}{4}{5}>\n'.format(...) # Method
On the other hand, using dictionary keys in % expressions can mitigate much of this
difference. This is also something of a worst-case scenario for formatting complexity,
and not very common in practice; more typical use cases seem more of a tossup. Further,
as of Python 3.1 and 2.7, numbering substitution targets becomes optional when rel-
ative to position, potentially subverting this purported benefit altogether:
>>> 'The {0} side {1} {2}'.format('bright', 'of', 'life') # Python 3.X, 2.6+
'The bright side of life'
>>> 'The {} side {} {}'.format('bright', 'of', 'life') # Python 3.1+, 2.7+
'The bright side of life'
>>> 'The %s side %s %s' % ('bright', 'of', 'life') # All Pythons
'The bright side of life'
Given its conciseness, the second of these is likely to be preferred to the first, but seems
to negate part of the method’s advantage. Compare the effect on floating-point for-
matting, for example—the formatting expression is still more concise, and still seems
less cluttered:
>>> '{0:f}, {1:.2f}, {2:05.2f}'.format(3.14159, 3.14159, 3.14159)
'3.141590, 3.14, 03.14'
>>> '{:f}, {:.2f}, {:06.2f}'.format(3.14159, 3.14159, 3.14159)
'3.141590, 3.14, 003.14'
>>> '%f, %.2f, %06.2f' % (3.14159, 3.14159, 3.14159)
'3.141590, 3.14, 003.14'
Named method and context-neutral arguments: Aesthetics versus practice
The formatting method also claims an advantage in replacing the % operator with a
more mnemonic format method name, and not distinguishing between single and mul-
tiple substitution values. The former may make the method appear simpler to beginners
at first glance (“format” may be easier to parse than multiple “%” characters), though
this probably varies per reader and seems minor.
Some may see the latter difference as more significant—with the format expression, a
single value can be given by itself, but multiple values must be enclosed in a tuple:
>>> '%.2f' % 1.2345 # Single value
'1.23'
>>> '%.2f %s' % (1.2345, 99) # Multiple values tuple
'1.23 99'
Technically, the formatting expression accepts either a single substitution value, or a
tuple of one or more items. As a consequence, because a single item can be given either
by itself or within a tuple, a tuple to be formatted must be provided as a nested tuple
—a perhaps rare but plausible case:
>>> '%s' % 1.23 # Single value, by itself
'1.23'
>>> '%s' % (1.23,) # Single value, in a tuple
'1.23'
>>> '%s' % ((1.23,),) # Single value that is a tuple
'(1.23,)'
The formatting method, on the other hand, tightens this up by accepting only general
function arguments in both cases, instead of requiring a tuple for both multiple values
and a single value that is a tuple:
>>> '{0:.2f}'.format(1.2345) # Single value
'1.23'
>>> '{0:.2f} {1}'.format(1.2345, 99) # Multiple values
'1.23 99'
>>> '{0}'.format(1.23) # Single value, by itself
'1.23'
>>> '{0}'.format((1.23,)) # Single value that is a tuple
'(1.23,)'
Consequently, the method might be less confusing to beginners and cause fewer pro-
gramming mistakes. This seems a fairly minor issue, though—if you always enclose
values in a tuple and ignore the nontupled option, the expression is essentially the same
as the method call here. Moreover, the method incurs a price in inflated code size to
achieve its constrained usage mode. Given the expression’s wide use over Python’s
history, this issue may be more theoretical than practical, and may not justify porting
existing code to a new tool that is so similar to the tool it seeks to subsume.
Functions versus expressions: A minor convenience
The final rationale for the format method—it’s a function that can appear where an
expression cannot—requires more information about functions than we yet have at
this point in the book, so we won’t dwell on it here. Suffice it to say that both the
str.format method and the format built-in function can be passed to other functions,
stored in other objects, and so on. An expression like % cannot directly, but this may
be narrow-sighted—it’s trivial to wrap any expression in a one-line def or partial-line
lambda once to turn it into a function with the same properties (though finding a reason
to do so may be more challenging):
def myformat(fmt, args): return fmt % args # See Part IV
myformat('%s %s', (88, 99)) # Call your function object
str.format('{} {}', 88, 99) # Versus calling the built-in
otherfunction(myformat) # Your function is an object too
In the end, this may not be an either/or choice. While the expression still seems more
pervasive in Python code, both formatting expressions and methods are available for
use in Python today, and most programmers will benefit from being familiar with both
techniques for years to come. That may double the work of newcomers to the language
in this department, but in this bazaar of ideas we call the open source software world,
there always seems to be room for more.2

2. See also the Chapter 31 note about a str.format bug (or regression) in Pythons 3.2 and 3.3 concerning
generic empty substitution targets for object attributes that define no __format__ handler. This impacted
a working example from this book’s prior edition. While it may be a temporary regression, it does at the
least underscore that this method is still a bit of a moving target—yet another reason to question the
feature redundancy it implies.

Plus one more: Technically speaking, there are 3 (not 2) formatting tools
built into Python, if we include the obscure string module’s Template
tool mentioned earlier. Now that we’ve seen the other two, I can show
you how it compares. The expression and method can be used as tem-
plating tools too, referring to substitution values by name via dictionary
keys or keyword arguments:
>>> '%(num)i = %(title)s' % dict(num=7, title='Strings')
'7 = Strings'
>>> '{num:d} = {title:s}'.format(num=7, title='Strings')
'7 = Strings'
>>> '{num} = {title}'.format(**dict(num=7, title='Strings'))
'7 = Strings'
The module’s templating system allows values to be referenced by name
too, prefixed by a $, as either dictionary keys or keywords, but it does
not support all the utilities of the other two techniques—a limitation
that yields simplicity, the prime motivation for this tool:
>>> import string
>>> t = string.Template('$num = $title')
>>> t.substitute({'num': 7, 'title': 'Strings'})
'7 = Strings'
>>> t.substitute(num=7, title='Strings')
'7 = Strings'
>>> t.substitute(dict(num=7, title='Strings'))
'7 = Strings'
See Python’s manuals for more details. It’s possible that you may see
this alternative (as well as additional tools in the third-party domain) in
Python code too; thankfully this technique is simple, and is used rarely
enough to warrant its limited coverage here. The best bet for most new-
comers today is to learn and use %, str.format, or both.
General Type Categories
Now that we’ve explored the first of Python’s collection objects, the string, let’s close
this chapter by defining a few general type concepts that will apply to most of the types
we look at from here on. With regard to built-in types, it turns out that operations work
the same for all the types in the same category, so we’ll only need to define most of
these ideas once. We’ve only examined numbers and strings so far, but because they
are representative of two of the three major type categories in Python, you already know
more about several other types than you might think.
Types Share Operation Sets by Categories
As you’ve learned, strings are immutable sequences: they cannot be changed in place
(the immutable part), and they are positionally ordered collections that are accessed by
offset (the sequence part). It so happens that all the sequences we’ll study in this part
of the book respond to the same sequence operations shown in this chapter at work
on strings—concatenation, indexing, iteration, and so on. More formally, there are
three major type (and operation) categories in Python that have this generic nature:
Numbers (integer, floating-point, decimal, fraction, others)
Support addition, multiplication, etc.
Sequences (strings, lists, tuples)
Support indexing, slicing, concatenation, etc.
Mappings (dictionaries)
Support indexing by key, etc.
I’m including the Python 3.X byte strings and 2.X Unicode strings I mentioned at the
start of this chapter under the general “strings” label here (see Chapter 37). Sets are
something of a category unto themselves (they don’t map keys to values and are not
positionally ordered sequences), and we haven’t yet explored mappings on our in-depth
tour (we will in the next chapter). However, many of the other types we will encounter
will be similar to numbers and strings. For example, for any sequence objects X and Y:
X + Y makes a new sequence object with the contents of both operands.
X * N makes a new sequence object with N copies of the sequence operand X.
In other words, these operations work the same way on any kind of sequence, including
strings, lists, tuples, and some user-defined object types. The only difference is that the
new result object you get back is of the same type as the operands X and Y—if you
concatenate lists, you get back a new list, not a string. Indexing, slicing, and other
sequence operations work the same on all sequences, too; the type of the objects being
processed tells Python which flavor of the task to perform.
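For instance, here is a quick interactive confirmation of this symmetry; the same +
and * operations applied to a string, a list, and a tuple all behave alike (an
illustrative session):
>>> 'Ni' + '!', 'Ni' * 3                 # Strings make new strings
('Ni!', 'NiNiNi')
>>> [1, 2] + [3], [1, 2] * 3             # Lists make new lists
([1, 2, 3], [1, 2, 1, 2, 1, 2])
>>> (1, 2) + (3,), (1, 2) * 3            # Tuples make new tuples
((1, 2, 3), (1, 2, 1, 2, 1, 2))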
Mutable Types Can Be Changed in Place
The immutable classification is an important constraint to be aware of, yet it tends to
trip up new users. If an object type is immutable, you cannot change its value in place;
Python raises an error if you try. Instead, you must run code to make a new object
containing the new value. The major core types in Python break down as follows:
Immutables (numbers, strings, tuples, frozensets)
None of the object types in the immutable category support in-place changes,
though we can always run expressions to make new objects and assign their results
to variables as needed.
Mutables (lists, dictionaries, sets, bytearray)
Conversely, the mutable types can always be changed in place with operations that
do not create new objects. Although such objects can be copied, in-place changes
support direct modification.
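To make the distinction concrete before moving on, here is a short illustrative
session; the exact error text may vary across Python versions:
>>> L = [1, 2, 3]
>>> L[0] = 99                    # Mutable: changed in place
>>> L
[99, 2, 3]
>>> S = 'spam'
>>> S[0] = 'z'                   # Immutable: in-place change fails
TypeError: 'str' object does not support item assignment
>>> S = 'z' + S[1:]              # Make a new object instead
>>> S
'zpam'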
Generally, immutable types give some degree of integrity by guaranteeing that an object
won’t be changed by another part of a program. For a refresher on why this matters,
see the discussion of shared object references in Chapter 6. To see how lists, diction-
aries, and tuples participate in type categories, we need to move ahead to the next
chapter.
Chapter Summary
In this chapter, we took an in-depth tour of the string object type. We learned about
coding string literals, and we explored string operations, including sequence expres-
sions, string method calls, and string formatting with both expressions and method
calls. Along the way, we studied a variety of concepts in depth, such as slicing, method
call syntax, and triple-quoted block strings. We also defined some core ideas common
to a variety of types: sequences, for example, share an entire set of operations.
In the next chapter, we’ll continue our types tour with a look at the most general object
collections in Python—lists and dictionaries. As you’ll find, much of what you’ve
learned here will apply to those types as well. And as mentioned earlier, in the final part
of this book we’ll return to Python’s string model to flesh out the details of Unicode
text and binary data, which are of interest to some, but not all, Python programmers.
Before moving on, though, here’s another chapter quiz to review the material covered
here.
Test Your Knowledge: Quiz
1. Can the string find method be used to search a list?
2. Can a string slice expression be used on a list?
3. How would you convert a character to its ASCII integer code? How would you
convert the other way, from an integer to a character?
4. How might you go about changing a string in Python?
5. Given a string S with the value "s,pa,m", name two ways to extract the two char-
acters in the middle.
6. How many characters are there in the string "a\nb\x1f\000d"?
7. Why might you use the string module instead of string method calls?
Test Your Knowledge: Answers
1. No, because methods are always type-specific; that is, they only work on a single
data type. Expressions like X+Y and built-in functions like len(X) are generic,
though, and may work on a variety of types. In this case, for instance, the in
membership expression has a similar effect to the string find, but it can be used to search
both strings and lists. In Python 3.X, there is some attempt to group methods by
categories (for example, the mutable sequence types list and bytearray have sim-
ilar method sets), but methods are still more type-specific than other operation sets.
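For example, in this illustrative session the generic in test handles both types,
while find is string-only:
>>> 'pa' in 'spam'               # Membership works on strings...
True
>>> 'pa' in ['s', 'pa', 'm']     # ...and on lists
True
>>> 'spam'.find('pa')            # find is a string-only method
1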
2. Yes. Unlike methods, expressions are generic and apply to many types. In this case,
the slice expression is really a sequence operation—it works on any type of se-
quence object, including strings, lists, and tuples. The only difference is that when
you slice a list, you get back a new list.
3. The built-in ord(S) function converts from a one-character string to an integer
character code; chr(I) converts from the integer code back to a string. Keep in
mind, though, that these integers are ASCII codes only for text whose characters
are drawn from the ASCII character set. In the Unicode model, text strings are
really sequences of integers identifying Unicode code points, which may fall outside
the 7-bit range of numbers reserved by ASCII (more on Unicode in Chapter 4 and
Chapter 37).
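For example:
>>> ord('a')                     # Character to integer code
97
>>> chr(97)                      # Integer code back to character
'a'
>>> chr(ord('a') + 1)            # Code arithmetic
'b'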
4. Strings cannot be changed; they are immutable. However, you can achieve a similar
effect by creating a new string—by concatenating, slicing, running formatting ex-
pressions, or using a method call like replace—and then assigning the result back
to the original variable name.
5. You can slice the string using S[2:4], or split on the comma and index the string
using S.split(',')[1]. Try these interactively to see for yourself.
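For example:
>>> S = "s,pa,m"
>>> S[2:4]                       # Slice out the middle characters
'pa'
>>> S.split(',')[1]              # Split on the commas and index
'pa'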
6. Six. The string "a\nb\x1f\000d" contains the characters a, newline (\n), b, binary
31 (a hex escape \x1f), binary 0 (an octal escape \000), and d. Pass the string to the
built-in len function to verify this, and print each of its character’s ord results to
see the actual code point (identifying number) values. See Table 7-2 for more details
on escapes.
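For example:
>>> len("a\nb\x1f\000d")                   # Each escape is one character
6
>>> [ord(c) for c in "a\nb\x1f\000d"]      # Code point values
[97, 10, 98, 31, 0, 100]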
7. You should never use the string module instead of string object method calls today
—it’s deprecated, and its calls are removed completely in Python 3.X. The only
valid reason for using the string module at all today is for its other tools, such as
predefined constants. You might also see it appear in what is now very old and
dusty Python code (and books of the misty past—like the 1990s).
CHAPTER 8
Lists and Dictionaries
Now that we’ve learned about numbers and strings, this chapter moves on to give the
full story on Python’s list and dictionary object types—collections of other objects, and
the main workhorses in almost all Python scripts. As you’ll see, both types are remark-
ably flexible: they can be changed in place, can grow and shrink on demand, and may
contain and be nested in any other kind of object. By leveraging these types, you can
build up and process arbitrarily rich information structures in your scripts.
Lists
The next stop on our built-in object tour is the Python list. Lists are Python’s most
flexible ordered collection object type. Unlike strings, lists can contain any sort of ob-
ject: numbers, strings, and even other lists. Also, unlike strings, lists may be changed
in place by assignment to offsets and slices, list method calls, deletion statements, and
more—they are mutable objects.
Python lists do the work of many of the collection data structures you might have to
implement manually in lower-level languages such as C. Here is a quick look at their
main properties. Python lists are:
Ordered collections of arbitrary objects
From a functional view, lists are just places to collect other objects so you can treat
them as groups. Lists also maintain a left-to-right positional ordering among the
items they contain (i.e., they are sequences).
Accessed by offset
Just as with strings, you can fetch a component object out of a list by indexing the
list on the object’s offset. Because items in lists are ordered by their positions, you
can also do tasks such as slicing and concatenation.
Variable-length, heterogeneous, and arbitrarily nestable
Unlike strings, lists can grow and shrink in place (their lengths can vary), and they
can contain any sort of object, not just one-character strings (they’re heterogene-
ous). Because lists can contain other complex objects, they also support arbitrary
nesting; you can create lists of lists of lists, and so on.
Of the category “mutable sequence”
In terms of our type category qualifiers, lists are mutable (i.e., can be changed in
place) and can respond to all the sequence operations used with strings, such as
indexing, slicing, and concatenation. In fact, sequence operations work the same
on lists as they do on strings; the only difference is that sequence operations such
as concatenation and slicing return new lists instead of new strings when applied
to lists. Because lists are mutable, however, they also support other operations that
strings don’t, such as deletion and index assignment operations, which change the
lists in place.
Arrays of object references
Technically, Python lists contain zero or more references to other objects. Lists
might remind you of arrays of pointers (addresses) if you have a background in
some other languages. Fetching an item from a Python list is about as fast as in-
dexing a C array; in fact, lists really are arrays inside the standard Python inter-
preter, not linked structures. As we learned in Chapter 6, though, Python always
follows a reference to an object whenever the reference is used, so your program
deals only with objects. Whenever you assign an object to a data structure com-
ponent or variable name, Python always stores a reference to that same object, not
a copy of it (unless you request a copy explicitly).
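Because lists store references, an in-place change made through one name shows up
in every other reference to the same list; a copy must be requested explicitly. Here
is a brief illustrative preview of the shared-reference behavior discussed in
Chapter 6:
>>> X = [1, 2, 3]
>>> Y = X                        # Y references the same list object
>>> X[0] = 99                    # In-place change through X...
>>> Y                            # ...is visible through Y too
[99, 2, 3]
>>> Z = Y[:]                     # An explicit top-level copy is independent
>>> Y[0] = 0
>>> Z
[99, 2, 3]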
As a preview and reference, Table 8-1 summarizes common and representative list
object operations. It is fairly complete for Python 3.3, but for the full story, consult the
Python standard library manual, or run a help(list) or dir(list) call interactively for
a complete list of list methods—you can pass in a real list, or the word list, which is
the name of the list data type. The set of methods here is especially prone to change—
in fact, two are new as of Python 3.3.
Table 8-1. Common list literals and operations

Operation                              Interpretation
L = []                                 An empty list
L = [123, 'abc', 1.23, {}]             Four items: indexes 0..3
L = ['Bob', 40.0, ['dev', 'mgr']]      Nested sublists
L = list('spam')                       List of an iterable's items,
L = list(range(-4, 4))                 list of successive integers
L[i], L[i][j], L[i:j], len(L)          Index, index of index, slice, length
L1 + L2, L * 3                         Concatenate, repeat
for x in L: print(x)                   Iteration,
3 in L                                 membership
L.append(4), L.extend([5,6,7]),        Methods: growing
L.insert(i, X)
L.index(X), L.count(X)                 Methods: searching
L.sort(), L.reverse(),                 Methods: sorting, reversing,
L.copy(), L.clear()                    copying (3.3+), clearing (3.3+)
L.pop(i), L.remove(X), del L[i],       Methods, statements: shrinking
del L[i:j], L[i:j] = []
L[i] = 3, L[i:j] = [4,5,6]             Index assignment, slice assignment
L = [x**2 for x in range(5)]           List comprehensions and maps
list(map(ord, 'spam'))                 (Chapter 4, Chapter 14, Chapter 20)
When written down as a literal expression, a list is coded as a series of objects (really,
expressions that return objects) in square brackets, separated by commas. For instance,
the second row in Table 8-1 assigns the variable L to a four-item list. A nested list is
coded as a nested square-bracketed series (row 3), and the empty list is just a square-
bracket pair with nothing inside (row 1).1
Many of the operations in Table 8-1 should look familiar, as they are the same sequence
operations we put to work on strings earlier—indexing, concatenation, iteration, and
so on. Lists also respond to list-specific method calls (which provide utilities such as
sorting, reversing, adding items to the end, etc.), as well as in-place change operations
(deleting items, assignment to indexes and slices, and so forth). Lists have these tools
for change operations because they are a mutable object type.

1. In practice, you won’t see many lists written out like this in list-processing programs. It’s more common
to see code that processes lists constructed dynamically (at runtime), from user inputs, file contents, and
so on. In fact, although it’s important to master literal syntax, many data structures in Python are built
by running program code at runtime.
Lists in Action
Perhaps the best way to understand lists is to see them at work. Let’s once again turn
to some simple interpreter interactions to illustrate the operations in Table 8-1.
Basic List Operations
Because they are sequences, lists support many of the same operations as strings. For
example, lists respond to the + and * operators much like strings—they mean concat-
enation and repetition here too, except that the result is a new list, not a string:
% python
>>> len([1, 2, 3]) # Length
3
>>> [1, 2, 3] + [4, 5, 6] # Concatenation
[1, 2, 3, 4, 5, 6]
>>> ['Ni!'] * 4 # Repetition
['Ni!', 'Ni!', 'Ni!', 'Ni!']
Although the + operator works the same for lists and strings, it’s important to know
that it expects the same sort of sequence on both sides—otherwise, you get a type error
when the code runs. For instance, you cannot concatenate a list and a string unless you
first convert the list to a string (using tools such as str or % formatting) or convert the
string to a list (the list built-in function does the trick):
>>> str([1, 2]) + "34" # Same as "[1, 2]" + "34"
'[1, 2]34'
>>> [1, 2] + list("34") # Same as [1, 2] + ["3", "4"]
[1, 2, '3', '4']
List Iteration and Comprehensions
More generally, lists respond to all the sequence operations we used on strings in the
prior chapter, including iteration tools:
>>> 3 in [1, 2, 3] # Membership
True
>>> for x in [1, 2, 3]:
... print(x, end=' ') # Iteration (2.X uses: print x,)
...
1 2 3
We will talk more formally about for iteration and the range built-ins of Table 8-1 in
Chapter 13, because they are related to statement syntax. In short, for loops step
through items in any sequence from left to right, executing one or more statements for
each item; range produces successive integers.
The last items in Table 8-1, list comprehensions and map calls, are covered in more detail
in Chapter 14 and expanded on in Chapter 20. Their basic operation is straightforward,
though—as introduced in Chapter 4, list comprehensions are a way to build a new list
by applying an expression to each item in a sequence (really, in any iterable), and are
close relatives to for loops:
>>> res = [c * 4 for c in 'SPAM'] # List comprehensions
>>> res
['SSSS', 'PPPP', 'AAAA', 'MMMM']
This expression is functionally equivalent to a for loop that builds up a list of results
manually, but as we’ll learn in later chapters, list comprehensions are simpler to code
and likely faster to run today:
>>> res = []
>>> for c in 'SPAM': # List comprehension equivalent
... res.append(c * 4)
...
>>> res
['SSSS', 'PPPP', 'AAAA', 'MMMM']
As also introduced briefly in Chapter 4, the map built-in function does similar work, but
applies a function to items in a sequence and collects all the results in a new list:
>>> list(map(abs, [-1, -2, 0, 1, 2])) # Map a function across a sequence
[1, 2, 0, 1, 2]
Because we’re not quite ready for the full iteration story, we’ll postpone further details
for now, but watch for a similar comprehension expression for dictionaries later in this
chapter.
Indexing, Slicing, and Matrixes
Because lists are sequences, indexing and slicing work the same way for lists as they do
for strings. However, the result of indexing a list is whatever type of object lives at the
offset you specify, while slicing a list always returns a new list:
>>> L = ['spam', 'Spam', 'SPAM!']
>>> L[2] # Offsets start at zero
'SPAM!'
>>> L[-2] # Negative: count from the right
'Spam'
>>> L[1:] # Slicing fetches sections
['Spam', 'SPAM!']
One note here: because you can nest lists and other object types within lists, you will
sometimes need to string together index operations to go deeper into a data structure.
For example, one of the simplest ways to represent matrixes (multidimensional arrays)
in Python is as lists with nested sublists. Here’s a basic 3 × 3 two-dimensional list-based
array:
>>> matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
With one index, you get an entire row (really, a nested sublist), and with two, you get
an item within the row:
>>> matrix[1]
[4, 5, 6]
>>> matrix[1][1]
5
>>> matrix[2][0]
7
>>> matrix = [[1, 2, 3],
... [4, 5, 6],
... [7, 8, 9]]
>>> matrix[1][1]
5
Notice in the preceding interaction that lists can naturally span multiple lines if you
want them to because they are contained by a pair of brackets; the “...”s here are
Python’s continuation line prompt (see Chapter 4 for comparable code without the
“...”s, and watch for more on syntax in the next part of the book).
For more on matrixes, watch later in this chapter for a dictionary-based matrix repre-
sentation, which can be more efficient when matrixes are largely empty. We’ll also
continue this thread in Chapter 20 where we’ll write additional matrix code, especially
with list comprehensions. For high-powered numeric work, the NumPy extension
mentioned in Chapter 4 and Chapter 5 provides other ways to handle matrixes.
Changing Lists in Place
Because lists are mutable, they support operations that change a list object in place.
That is, the operations in this section all modify the list object directly—overwriting
its former value—without requiring that you make a new copy, as you had to for strings.
Because Python deals only in object references, this distinction between changing an
object in place and creating a new object matters; as discussed in Chapter 6, if you
change an object in place, you might impact more than one reference to it at the same
time.
Index and slice assignments
When using a list, you can change its contents by assigning to either a particular item
(offset) or an entire section (slice):
>>> L = ['spam', 'Spam', 'SPAM!']
>>> L[1] = 'eggs' # Index assignment
>>> L
['spam', 'eggs', 'SPAM!']
>>> L[0:2] = ['eat', 'more'] # Slice assignment: delete+insert
>>> L # Replaces items 0,1
['eat', 'more', 'SPAM!']
Both index and slice assignments are in-place changes—they modify the subject list
directly, rather than generating a new list object for the result. Index assignment in
Python works much as it does in C and most other languages: Python replaces the single
object reference at the designated offset with a new one.
Slice assignment, the last operation in the preceding example, replaces an entire section
of a list in a single step. Because it can be a bit complex, it is perhaps best thought of
as a combination of two steps:
1. Deletion. The slice you specify to the left of the = is deleted.
2. Insertion. The new items contained in the iterable object to the right of the = are
inserted into the list on the left, at the place where the old slice was deleted.2
This isn’t what really happens, but it can help clarify why the number of items inserted
doesn’t have to match the number of items deleted. For instance, given a list L of two
or more items, an assignment L[1:2]=[4,5] replaces one item with two—Python first
deletes the one-item slice at [1:2] (from offset 1, up to but not including offset 2), then
inserts both 4 and 5 where the deleted slice used to be.
This also explains why the second slice assignment in the following is really an insert
—Python replaces an empty slice at [1:1] with two items; and why the third is really
a deletion—Python deletes the slice (the item at offset 1), and then inserts nothing:
>>> L = [1, 2, 3]
>>> L[1:2] = [4, 5] # Replacement/insertion
>>> L
[1, 4, 5, 3]
>>> L[1:1] = [6, 7] # Insertion (replace nothing)
>>> L
[1, 6, 7, 4, 5, 3]
>>> L[1:2] = [] # Deletion (insert nothing)
>>> L
[1, 7, 4, 5, 3]
In effect, slice assignment replaces an entire section, or “column,” all at once—even if
the column or its replacement is empty. Because the length of the sequence being as-
signed does not have to match the length of the slice being assigned to, slice assignment
can be used to replace (by overwriting), expand (by inserting), or shrink (by deleting)
the subject list. It’s a powerful operation, but frankly, one that you may not see very
often in practice. There are often more straightforward and mnemonic ways to replace,
insert, and delete (concatenation, and the insert, pop, and remove list methods, for
example), which Python programmers tend to prefer in practice.
2. This description requires elaboration when the value and the slice being assigned overlap:
L[2:5]=L[3:6], for instance, works fine because the value to be inserted is fetched before the deletion
happens on the left.
On the other hand, this operation can be used as a sort of in-place concatenation at the
front of the list—per the next section’s method coverage, something the list’s extend
does more mnemonically at list end:
>>> L = [1]
>>> L[:0] = [2, 3, 4] # Insert all at :0, an empty slice at front
>>> L
[2, 3, 4, 1]
>>> L[len(L):] = [5, 6, 7] # Insert all at len(L):, an empty slice at end
>>> L
[2, 3, 4, 1, 5, 6, 7]
>>> L.extend([8, 9, 10]) # Insert all at end, named method
>>> L
[2, 3, 4, 1, 5, 6, 7, 8, 9, 10]
List method calls
Like strings, Python list objects also support type-specific method calls, many of which
change the subject list in place:
>>> L = ['eat', 'more', 'SPAM!']
>>> L.append('please') # Append method call: add item at end
>>> L
['eat', 'more', 'SPAM!', 'please']
>>> L.sort() # Sort list items ('S' < 'e')
>>> L
['SPAM!', 'eat', 'more', 'please']
Methods were introduced in Chapter 7. In brief, they are functions (really, object at-
tributes that reference functions) that are associated with and act upon particular ob-
jects. Methods provide type-specific tools; the list methods presented here, for instance,
are generally available only for lists.
Perhaps the most commonly used list method is append, which simply tacks a single
item (object reference) onto the end of the list. Unlike concatenation, append expects
you to pass in a single object, not a list. The effect of L.append(X) is similar to L+[X],
but while the former changes L in place, the latter makes a new list.3 The sort method
orders the list’s items here, but merits a section of its own.

3. Unlike + concatenation, append doesn’t have to generate new objects, so it’s usually faster than + too. You
can also mimic append with the clever slice assignments of the prior section: L[len(L):]=[X] is like
L.append(X), and L[:0]=[X] is like appending at the front of a list. Both delete an empty slice and insert
X, changing L in place quickly, like append. Both are arguably more complex than list methods, though.
For instance, L.insert(0, X) can also append an item to the front of a list, and seems noticeably more
mnemonic; L.insert(len(L), X) inserts one object at the end too, but unless you like typing, you might
as well use L.append(X)!

More on sorting lists
Another commonly seen method, sort, orders a list in place; it uses Python standard
comparison tests (here, string comparisons, but applicable to every object type), and
by default sorts in ascending order. You can modify sort behavior by passing in keyword
arguments—a special “name=value” syntax in function calls that specifies passing by
name and is often used for giving configuration options.
In sorts, the reverse argument allows sorts to be made in descending instead of as-
cending order, and the key argument gives a one-argument function that returns the
value to be used in sorting—the string object’s standard lower case converter in the
following (though its newer casefold may handle some types of Unicode text better):
>>> L = ['abc', 'ABD', 'aBe']
>>> L.sort() # Sort with mixed case
>>> L
['ABD', 'aBe', 'abc']
>>> L = ['abc', 'ABD', 'aBe']
>>> L.sort(key=str.lower) # Normalize to lowercase
>>> L
['abc', 'ABD', 'aBe']
>>>
>>> L = ['abc', 'ABD', 'aBe']
>>> L.sort(key=str.lower, reverse=True) # Change sort order
>>> L
['aBe', 'ABD', 'abc']
The sort key argument might also be useful when sorting lists of dictionaries, to pick
out a sort key by indexing each dictionary. We’ll study dictionaries later in this chapter,
and you’ll learn more about keyword function arguments in Part IV.
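As a quick preview, here is a hypothetical example of that technique; dictionaries
themselves are covered later in this chapter:
>>> rows = [{'name': 'Bob', 'age': 40}, {'name': 'Ann', 'age': 35}]
>>> rows.sort(key=lambda row: row['age'])    # Sort by each dictionary's 'age'
>>> [row['name'] for row in rows]
['Ann', 'Bob']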
Comparison and sorts in 3.X: In Python 2.X, relative magnitude com-
parisons of differently typed objects (e.g., a string and a list) work—the
language defines a fixed ordering among different types, which is de-
terministic, if not aesthetically pleasing. That is, the ordering is based
on the names of the types involved: all integers are less than all strings,
for example, because "int" is less than "str". Comparisons never au-
tomatically convert types, except when comparing numeric type ob-
jects.
In Python 3.X, this has changed: magnitude comparison of mixed types
raises an exception instead of falling back on the fixed cross-type or-
dering. Because sorting uses comparisons internally, this means that
[1, 2, 'spam'].sort() succeeds in Python 2.X but will raise an excep-
tion in Python 3.X. Sorting mixed types fails by proxy.
Python 3.X also no longer supports passing in an arbitrary comparison
function to sorts, to implement different orderings. The suggested work-
around is to use the key=func keyword argument to code value trans-
formations during the sort, and use the reverse=True keyword argument
to change the sort order to descending. These were the typical uses of
comparison functions in the past.
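For instance, one possible workaround for a mixed-type sort in 3.X is to
map every item to a comparable value with key; this illustrative session
assumes string ordering is acceptable, and its error text may vary by
version:
>>> L = [1, 2, 'spam']
>>> L.sort()              # Mixed-type comparison fails in 3.X
TypeError: unorderable types: str() < int()
>>> L.sort(key=str)       # Compare items' string forms instead
>>> L
[1, 2, 'spam']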
One warning here: beware that append and sort change the associated list object in
place, but don’t return the list as a result (technically, they both return a value called
None). If you say something like L=L.append(X), you won’t get the modified value of L
(in fact, you’ll lose the reference to the list altogether!). When you use attributes such
as append and sort, objects are changed as a side effect, so there’s no reason to reassign.
Partly because of such constraints, sorting is also available in recent Pythons as a built-
in function, which sorts any collection (not just lists) and returns a new list for the result
(instead of in-place changes):
>>> L = ['abc', 'ABD', 'aBe']
>>> sorted(L, key=str.lower, reverse=True) # Sorting built-in
['aBe', 'ABD', 'abc']
>>> L = ['abc', 'ABD', 'aBe']
>>> sorted([x.lower() for x in L], reverse=True) # Pretransform items: differs!
['abe', 'abd', 'abc']
Notice the last example here—we can convert to lowercase prior to the sort with a list
comprehension, but the result does not contain the original list’s values as it does with
the key argument. The latter is applied temporarily during the sort, instead of changing
the values to be sorted altogether. As we move along, we’ll see contexts in which the
sorted built-in can sometimes be more useful than the sort method.
Other common list methods
Like strings, lists have other methods that perform other specialized operations. For
instance, reverse reverses the list in place; extend inserts multiple items at the end of
the list; and pop deletes and returns an item from the end. There is also a
reversed built-in function that works much like sorted and returns a new result object,
but it must be wrapped in a list call in both 2.X and 3.X here because its result is an
iterator that produces results on demand (more on iterators later):
>>> L = [1, 2]
>>> L.extend([3, 4, 5]) # Add many items at end (like in-place +)
>>> L
[1, 2, 3, 4, 5]
>>> L.pop() # Delete and return last item (by default: -1)
5
>>> L
[1, 2, 3, 4]
>>> L.reverse() # In-place reversal method
>>> L
[4, 3, 2, 1]
>>> list(reversed(L)) # Reversal built-in with a result (iterator)
[1, 2, 3, 4]
Technically, the extend method always iterates through and adds each item in an itera-
ble object, whereas append simply adds a single item as is without iterating through it
—a distinction that will be more meaningful by Chapter 14. For now, it’s enough to
know that extend adds many items, and append adds one. In some types of programs,
the list pop method is often used in conjunction with append to implement a quick last-
in-first-out (LIFO) stack structure. The end of the list serves as the top of the stack:
>>> L = []
>>> L.append(1) # Push onto stack
>>> L.append(2)
>>> L
[1, 2]
>>> L.pop() # Pop off stack
2
>>> L
[1]
The pop method also accepts an optional offset of the item to be deleted and returned
(the default is the last item at offset −1). Other list methods remove an item by value
(remove), insert an item at an offset (insert), count the number of occurrences
(count), and search for an item’s offset (index—a search for the index of an item, not
to be confused with indexing!):
>>> L = ['spam', 'eggs', 'ham']
>>> L.index('eggs') # Index of an object (search/find)
1
>>> L.insert(1, 'toast') # Insert at position
>>> L
['spam', 'toast', 'eggs', 'ham']
>>> L.remove('eggs') # Delete by value
>>> L
['spam', 'toast', 'ham']
>>> L.pop(1) # Delete by position
'toast'
>>> L
['spam', 'ham']
>>> L.count('spam') # Number of occurrences
1
Note that unlike other list methods, count and index do not change the list itself, but
return information about its content. See other documentation sources or experiment
with these calls interactively on your own to learn more about list methods.
Other common list operations
Because lists are mutable, you can use the del statement to delete an item or section in
place:
>>> L = ['spam', 'eggs', 'ham', 'toast']
>>> del L[0] # Delete one item
>>> L
['eggs', 'ham', 'toast']
>>> del L[1:] # Delete an entire section
>>> L # Same as L[1:] = []
['eggs']
As we saw earlier, because slice assignment is a deletion plus an insertion, you can also
delete a section of a list by assigning an empty list to a slice (L[i:j]=[]); Python deletes
the slice named on the left, and then inserts nothing. Assigning an empty list to an
index, on the other hand, just stores a reference to the empty list object in the specified
slot, rather than deleting an item:
>>> L = ['Already', 'got', 'one']
>>> L[1:] = []
>>> L
['Already']
>>> L[0] = []
>>> L
[[]]
Although all the operations just discussed are typical, there may be additional list
methods and operations not illustrated here. The method set, for example, may change
over time, and in fact has in Python 3.3—its new L.copy() method makes a top-level
copy of the list, much like L[:] and list(L), but is symmetric with copy in sets and
dictionaries. For a comprehensive and up-to-date list of type tools, you should always
consult Python’s manuals, Python’s dir and help functions (which we first met in
Chapter 4), or one of the reference texts mentioned in the preface.
And because it’s such a common hurdle, I’d also like to remind you again that all the
in-place change operations discussed here work only for mutable objects: they won’t
work on strings (or tuples, discussed in Chapter 9), no matter how hard you try. Mu-
tability is an inherent property of each object type.
Dictionaries
Along with lists, dictionaries are one of the most flexible built-in data types in Python.
If you think of lists as ordered collections of objects, you can think of dictionaries as
unordered collections; the chief distinction is that in dictionaries, items are stored and
fetched by key, instead of by positional offset. While lists can serve roles similar to
arrays in other languages, dictionaries take the place of records, search tables, and any
other sort of aggregation where item names are more meaningful than item positions.
For example, dictionaries can replace many of the searching algorithms and data struc-
tures you might have to implement manually in lower-level languages—as a highly
optimized built-in type, indexing a dictionary is a very fast search operation. Diction-
aries also sometimes do the work of records, structs, and symbol tables used in other
languages; can be used to represent sparse (mostly empty) data structures; and much
more. Here’s a rundown of their main properties. Python dictionaries are:
Accessed by key, not offset position
Dictionaries are sometimes called associative arrays or hashes (especially by users
of other scripting languages). They associate a set of values with keys, so you can
fetch an item out of a dictionary using the key under which you originally stored
it. You use the same indexing operation to get components in a dictionary as you
do in a list, but the index takes the form of a key, not a relative offset.
Unordered collections of arbitrary objects
Unlike in a list, items stored in a dictionary aren’t kept in any particular order; in
fact, Python pseudo-randomizes their left-to-right order to provide quick lookup.
Keys provide the symbolic (not physical) locations of items in a dictionary.
Variable-length, heterogeneous, and arbitrarily nestable
Like lists, dictionaries can grow and shrink in place (without new copies being
made), they can contain objects of any type, and they support nesting to any depth
(they can contain lists, other dictionaries, and so on). Each key can have just one
associated value, but that value can be a collection of multiple objects if needed,
and a given value can be stored under any number of keys.
Of the category “mutable mapping”
You can change dictionaries in place by assigning to indexes (they are mutable),
but they don’t support the sequence operations that work on strings and lists.
Because dictionaries are unordered collections, operations that depend on a fixed
positional order (e.g., concatenation, slicing) don’t make sense. Instead, diction-
aries are the only built-in, core type representatives of the mapping category—
objects that map keys to values. Other mappings in Python are created by imported
modules.
Tables of object references (hash tables)
If lists are arrays of object references that support access by position, dictionaries
are unordered tables of object references that support access by key. Internally,
dictionaries are implemented as hash tables (data structures that support very fast
retrieval), which start small and grow on demand. Moreover, Python employs op-
timized hashing algorithms to find keys, so retrieval is quick. Like lists, dictionaries
store object references (not copies, unless you ask for them explicitly).
For reference and preview again, Table 8-2 summarizes some of the most common and
representative dictionary operations, and is relatively complete as of Python 3.3. As
usual, though, see the library manual or run a dir(dict) or help(dict) call for a com-
plete list—dict is the name of the type. When coded as a literal expression, a dictionary
is written as a series of key:value pairs, separated by commas, enclosed in curly
braces.4 An empty dictionary is an empty set of braces, and you can nest dictionaries
by simply coding one as a value inside another dictionary, or within a list or tuple.
4. As for lists, you might not see dictionaries coded in full using literals very often—programs rarely know
all their data before they are run, and more typically extract it dynamically from users, files, and so on.
Lists and dictionaries are grown in different ways, though. In the next section you’ll see that you often
build up dictionaries by assigning to new keys at runtime; this approach fails for lists, which are commonly
grown with append or extend instead.
Table 8-2. Common dictionary literals and operations

Operation                                   Interpretation
D = {}                                      Empty dictionary
D = {'name': 'Bob', 'age': 40}              Two-item dictionary
E = {'cto': {'name': 'Bob', 'age': 40}}     Nesting
D = dict(name='Bob', age=40)                Alternative construction techniques:
D = dict([('name', 'Bob'), ('age', 40)])    keywords, key/value pairs,
D = dict(zip(keyslist, valueslist))         zipped key/value pairs,
D = dict.fromkeys(['name', 'age'])          key lists
D['name'], E['cto']['age']                  Indexing by key
'age' in D                                  Membership: key present test
D.keys()                                    Methods: all keys,
D.values()                                  all values,
D.items()                                   all key+value tuples,
D.copy()                                    copy (top-level),
D.clear()                                   clear (remove all items),
D.update(D2)                                merge by keys,
D.get(key, default?)                        fetch by key, if absent default (or None),
D.pop(key, default?)                        remove by key, if absent default (or error),
D.setdefault(key, default?)                 fetch by key, if absent set default (or None),
D.popitem()                                 remove/return any (key, value) pair; etc.
len(D)                                      Length: number of stored entries
D[key] = 42                                 Adding/changing keys
del D[key]                                  Deleting entries by key
list(D.keys()), D1.keys() & D2.keys()       Dictionary views (Python 3.X)
D.viewkeys(), D.viewvalues()                Dictionary views (Python 2.7)
D = {x: x*2 for x in range(10)}             Dictionary comprehensions (Python 3.X, 2.7)
Dictionaries in Action
As Table 8-2 suggests, dictionaries are indexed by key, and nested dictionary entries
are referenced by a series of indexes (keys in square brackets). When Python creates a
dictionary, it stores its items in any left-to-right order it chooses; to fetch a value back,
you supply the key with which it is associated, not its relative position. Let’s go back
to the interpreter to get a feel for some of the dictionary operations in Table 8-2.
Basic Dictionary Operations
In normal operation, you create dictionaries with literals and store and access items by
key with indexing:
% python
>>> D = {'spam': 2, 'ham': 1, 'eggs': 3} # Make a dictionary
>>> D['spam'] # Fetch a value by key
2
>>> D # Order is "scrambled"
{'eggs': 3, 'spam': 2, 'ham': 1}
Here, the dictionary is assigned to the variable D; the value of the key 'spam' is the
integer 2, and so on. We use the same square bracket syntax to index dictionaries by
key as we did to index lists by offset, but here it means access by key, not by position.
Notice the end of this example—much like sets, the left-to-right order of keys in a
dictionary will almost always be different from what you originally typed. This is on
purpose: to implement fast key lookup (a.k.a. hashing), keys need to be reordered in
memory. That’s why operations that assume a fixed left-to-right order (e.g., slicing,
concatenation) do not apply to dictionaries; you can fetch values only by key, not by
position. Technically, the ordering is pseudo-random—it’s not truly random (you might
be able to decipher it given Python’s source code and a lot of time to kill), but it’s
arbitrary, and might vary per release and platform, and even per interactive session in
Python 3.3.
The built-in len function works on dictionaries, too; it returns the number of items
stored in the dictionary or, equivalently, the length of its keys list. The dictionary in
membership operator allows you to test for key existence, and the keys method returns
all the keys in the dictionary. The latter of these can be useful for processing dictionaries
sequentially, but you shouldn’t depend on the order of the keys list. Because the keys
result can be used as a normal list, however, it can always be sorted if order matters
(more on sorting and dictionaries later):
>>> len(D) # Number of entries in dictionary
3
>>> 'ham' in D # Key membership test alternative
True
>>> list(D.keys()) # Create a new list of D's keys
['eggs', 'spam', 'ham']
Observe the second expression in this listing. As mentioned earlier, the in membership
test used for strings and lists also works on dictionaries—it checks whether a key is
stored in the dictionary. Technically, this works because dictionaries define iterators
that step through their keys lists automatically. Other types provide iterators that reflect
their common uses; files, for example, have iterators that read line by line. We’ll discuss
iterators more formally in Chapter 14 and Chapter 20.
Also note the syntax of the last example in this listing. We have to enclose it in a list
call in Python 3.X for similar reasons—keys in 3.X returns an iterable object, instead
of a physical list. The list call forces it to produce all its values at once so we can print
them interactively, though this call isn’t required in some other contexts. In 2.X, keys
builds and returns an actual list, so the list call isn’t even needed to display a result;
more on this later in this chapter.
Changing Dictionaries in Place
Let’s continue with our interactive session. Dictionaries, like lists, are mutable, so you
can change, expand, and shrink them in place without making new dictionaries: simply
assign a value to a key to change or create an entry. The del statement works here, too;
it deletes the entry associated with the key specified as an index. Notice also the nesting
of a list inside a dictionary in this example (the value of the key 'ham'). All collection
data types in Python can nest inside each other arbitrarily:
>>> D
{'eggs': 3, 'spam': 2, 'ham': 1}
>>> D['ham'] = ['grill', 'bake', 'fry'] # Change entry (value=list)
>>> D
{'eggs': 3, 'spam': 2, 'ham': ['grill', 'bake', 'fry']}
>>> del D['eggs'] # Delete entry
>>> D
{'spam': 2, 'ham': ['grill', 'bake', 'fry']}
>>> D['brunch'] = 'Bacon' # Add new entry
>>> D
{'brunch': 'Bacon', 'spam': 2, 'ham': ['grill', 'bake', 'fry']}
Like lists, assigning to an existing index in a dictionary changes its associated value.
Unlike lists, however, whenever you assign a new dictionary key (one that hasn’t been
assigned before) you create a new entry in the dictionary, as was done in the previous
example for the key 'brunch'. This doesn’t work for lists because you can only assign
to existing list offsets—Python considers an offset beyond the end of a list out of bounds
and raises an error. To expand a list, you need to use tools such as the append method
or slice assignment instead.
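Here is that asymmetry in a nutshell (an illustrative session; the exact error
text may vary):
>>> D = {}
>>> D['new'] = 99                # Assigning a new key grows a dictionary
>>> D
{'new': 99}
>>> L = []
>>> L[0] = 99                    # Assigning a new offset fails for lists
IndexError: list assignment index out of range
>>> L.append(99)                 # Grow the list with append instead
>>> L
[99]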
More Dictionary Methods
Dictionary methods provide a variety of type-specific tools. For instance, the dictionary
values and items methods return all of the dictionary’s values and (key,value) pair
tuples, respectively; along with keys, these are useful in loops that need to step through
dictionary entries one by one (we’ll start coding examples of such loops in the next
section). As for keys, these two methods also return iterable objects in 3.X, so wrap
them in a list call there to collect their values all at once for display:
>>> D = {'spam': 2, 'ham': 1, 'eggs': 3}
>>> list(D.values())
[3, 2, 1]
>>> list(D.items())
[('eggs', 3), ('spam', 2), ('ham', 1)]
In realistic programs that gather data as they run, you often won’t be able to predict
what will be in a dictionary before the program is launched, much less when it’s coded.
Fetching a nonexistent key is normally an error, but the get method returns a default
value—None, or a passed-in default—if the key doesn’t exist. It’s an easy way to fill in
a default for a key that isn’t present, and avoid a missing-key error when your program
can’t anticipate contents ahead of time:
>>> D.get('spam') # A key that is there
2
>>> print(D.get('toast')) # A key that is missing
None
>>> D.get('toast', 88)
88
The update method provides something similar to concatenation for dictionaries,
though it has nothing to do with left-to-right ordering (again, there is no such thing in
dictionaries). It merges the keys and values of one dictionary into another, blindly
overwriting values of the same key if there’s a clash:
>>> D
{'eggs': 3, 'spam': 2, 'ham': 1}
>>> D2 = {'toast':4, 'muffin':5} # Lots of delicious scrambled order here
>>> D.update(D2)
>>> D
{'eggs': 3, 'muffin': 5, 'toast': 4, 'spam': 2, 'ham': 1}
Notice how mixed up the key order is in the last result; again, that’s just how diction-
aries work. Finally, the dictionary pop method deletes a key from a dictionary and re-
turns the value it had. It’s similar to the list pop method, but it takes a key instead of
an optional position:
# pop a dictionary by key
>>> D
{'eggs': 3, 'muffin': 5, 'toast': 4, 'spam': 2, 'ham': 1}
>>> D.pop('muffin')
5
>>> D.pop('toast') # Delete and return from a key
4
>>> D
{'eggs': 3, 'spam': 2, 'ham': 1}
# pop a list by position
>>> L = ['aa', 'bb', 'cc', 'dd']
>>> L.pop() # Delete and return from the end
'dd'
>>> L
['aa', 'bb', 'cc']
>>> L.pop(1) # Delete from a specific position
'bb'
>>> L
['aa', 'cc']
Dictionaries also provide a copy method; we’ll revisit this in Chapter 9, as it’s a way to
avoid the potential side effects of shared references to the same dictionary. In fact,
dictionaries come with more methods than those listed in Table 8-2; see the Python
library manual, dir and help, or other reference sources for a comprehensive list.
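As a quick, hedged preview of that Chapter 9 topic, copy makes an independent top-level
dictionary, so changes made to the copy don't show up in the original:
>>> D = {'spam': 2, 'ham': 1}
>>> C = D.copy()                 # Top-level (shallow) copy
>>> C['spam'] = 99               # Changing the copy...
>>> D['spam']                    # ...leaves the original intact
2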
Your dictionary ordering may vary: Don’t be alarmed if your dictionaries
print in a different order than shown here. As mentioned, key order is
arbitrary, and might vary per release, platform, and interactive session
in 3.3 (and quite possibly per day of the week, and phase of the moon!).
Most of the dictionary examples in this book reflect Python 3.3’s key
ordering, but it has changed both before and since 3.0. Your Python's
key order may vary, but you’re not supposed to care anyhow: diction-
aries are processed by key, not position. Programs shouldn’t rely on the
arbitrary order of keys in dictionaries, even if shown in books.
There are extension types in Python’s standard library that maintain
insertion order among their keys—see OrderedDict in the collections
module—but they are hybrids that incur extra space and speed over-
heads to achieve their extra utility, and are not true dictionaries. In short,
keys are kept redundantly in a linked list to support sequence opera-
tions.
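Here is a minimal, hedged illustration of that extension type; unlike a
normal 3.3 dictionary, it reports its keys in insertion order:
>>> from collections import OrderedDict
>>> D = OrderedDict([('b', 2), ('a', 1), ('c', 3)])
>>> list(D.keys())               # Keys retain insertion order
['b', 'a', 'c']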
As we’ll see in Chapter 9, this module also implements a namedtuple that
allows tuple items to be accessed by both attribute name and sequence
position—a sort of tuple/class/dictionary hybrid that adds processing
steps and is not a core object type in any event. Python’s library manual
has the full story on these and other extension types.
Example: Movie Database
Let’s look at a more realistic dictionary example. In honor of Python’s namesake, the
following example creates a simple in-memory Monty Python movie database, as a
table that maps movie release date years (the keys) to movie titles (the values). As coded,
you fetch movie names by indexing on release year strings:
>>> table = {'1975': 'Holy Grail', # Key: Value
... '1979': 'Life of Brian',
... '1983': 'The Meaning of Life'}
>>>
>>> year = '1983'
>>> movie = table[year] # dictionary[Key] => Value
>>> movie
'The Meaning of Life'
>>> for year in table: # Same as: for year in table.keys()
... print(year + '\t' + table[year])
...
1979 Life of Brian
1975 Holy Grail
1983 The Meaning of Life
The last command uses a for loop, which we previewed in Chapter 4 but haven’t cov-
ered in detail yet. If you aren’t familiar with for loops, this command simply iterates
through each key in the table and prints a tab-separated list of keys and their values.
We’ll learn more about for loops in Chapter 13.
Dictionaries aren’t sequences like lists and strings, but if you need to step through the
items in a dictionary, it’s easy—calling the dictionary keys method returns all stored
keys, which you can iterate through with a for. If needed, you can index from key to
value inside the for loop as you go, as was done in this code.
In fact, Python also lets you step through a dictionary’s keys list without actually calling
the keys method in most for loops. For any dictionary D, saying for key in D works
the same as the more complete for key in D.keys(). This is really just another instance
of the iterators mentioned earlier, which allow the in membership operator to work on
dictionaries as well; more on iterators later in this book.
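For instance, membership works on the movie table directly; here's a brief, hedged
continuation of the session above:
>>> '1975' in table              # in tests keys, not values
True
>>> 'Holy Grail' in table        # Values don't count as members
False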
Preview: Mapping values to keys
Notice how the prior table maps year to titles, but not vice versa. If you want to map
the other way—titles to years—you can either code the dictionary differently, or use
methods like items that give searchable sequences, though using them to best effect
requires more background information than we yet have:
>>> table = {'Holy Grail': '1975', # Key=>Value (title=>year)
... 'Life of Brian': '1979',
... 'The Meaning of Life': '1983'}
>>>
>>> table['Holy Grail']
'1975'
>>> list(table.items()) # Value=>Key (year=>title)
[('The Meaning of Life', '1983'), ('Holy Grail', '1975'), ('Life of Brian', '1979')]
>>> [title for (title, year) in table.items() if year == '1975']
['Holy Grail']
The last command here is in part a preview for the comprehension syntax introduced
in Chapter 4 and covered in full in Chapter 14. In short, it scans the dictionary’s (key,
value) tuple pairs returned by the items method, selecting keys having a specified value.
The net effect is to index backward—from value to key, instead of key to value—useful
if you want to store data just once and map backward only rarely (searching through
sequences like this is generally much slower than a direct key index).
In fact, although dictionaries by nature map keys to values unidirectionally, there are
multiple ways to map values back to keys with a bit of extra generalizable code:
>>> K = 'Holy Grail'
>>> table[K] # Key=>Value (normal usage)
'1975'
>>> V = '1975'
>>> [key for (key, value) in table.items() if value == V] # Value=>Key
['Holy Grail']
>>> [key for key in table.keys() if table[key] == V] # Ditto
['Holy Grail']
Note that both of the last two commands return a list of titles: in dictionaries, there’s
just one value per key, but there may be many keys per value. A given value may be
stored under multiple keys (yielding multiple keys per value), and a value might be a
collection itself (supporting multiple values per key). For more on this front, also watch
for a dictionary inversion function in Chapter 32's mapattrs.py example—code that
would surely stretch this preview past its breaking point if included in full here, though
a minimal sketch follows below.
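As a hedged illustration only (this is not the Chapter 32 code), one generalizable way
to invert a mapping is to collect a list of keys per value, which allows for multiple keys
that share a value; the invert name here is hypothetical:
>>> def invert(D):                                 # Illustrative helper only
...     res = {}
...     for key, value in D.items():
...         res.setdefault(value, []).append(key)  # One keys list per value
...     return res
...
>>> invert({'Holy Grail': '1975', 'Life of Brian': '1979'})
{'1975': ['Holy Grail'], '1979': ['Life of Brian']}
Your key order may vary in the result, of course; it's still a normal dictionary. With
that preview noted, let's explore more dictionary basics for this chapter's purposes.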
Dictionary Usage Notes
Dictionaries are fairly straightforward tools once you get the hang of them, but here
are a few additional pointers and reminders you should be aware of when using them:
Sequence operations don’t work. Dictionaries are mappings, not sequences; be-
cause there’s no notion of ordering among their items, things like concatenation
(an ordered joining) and slicing (extracting a contiguous section) simply don’t ap-
ply. In fact, Python raises an error when your code runs if you try to do such things.
Assigning to new indexes adds entries. Keys can be created when you write a
dictionary literal (embedded in the code of the literal itself), or when you assign
values to new keys of an existing dictionary object individually. The end result is
the same.
Keys need not always be strings. Our examples so far have used strings as keys,
but any other immutable objects work just as well. For instance, you can use inte-
gers as keys, which makes the dictionary look much like a list (when indexing, at
least). Tuples may be used as dictionary keys too, allowing compound key values
—such as dates and IP addresses—to have associated values. User-defined class
instance objects (discussed in Part VI) can also be used as keys, as long as they have
the proper protocol methods; roughly, they need to tell Python that their values
are “hashable” and thus won’t change, as otherwise they would be useless as fixed
keys. Mutable objects such as lists, sets, and other dictionaries don’t work as keys,
but are allowed as values.
Using dictionaries to simulate flexible lists: Integer keys
The last point in the prior list is important enough to demonstrate with a few examples.
When you use lists, it is illegal to assign to an offset that is off the end of the list:
>>> L = []
>>> L[99] = 'spam'
Traceback (most recent call last):
File "<stdin>", line 1, in ?
IndexError: list assignment index out of range
Although you can use repetition to preallocate as big a list as you’ll need (e.g.,
[0]*100), you can also do something that looks similar with dictionaries that does not
require such space allocations. By using integer keys, dictionaries can emulate lists that
seem to grow on offset assignment:
>>> D = {}
>>> D[99] = 'spam'
>>> D[99]
'spam'
>>> D
{99: 'spam'}
Here, it looks as if D is a 100-item list, but it’s really a dictionary with a single entry; the
value of the key 99 is the string 'spam'. You can access this structure with offsets much
like a list, catching nonexistent keys with get or in tests if required, but you don’t have
to allocate space for all the positions you might ever need to assign values to in the
future. When used like this, dictionaries are like more flexible equivalents of lists.
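For instance, continuing this session, a brief, hedged sketch of handling absent
“offsets” with the tools just mentioned:
>>> D.get(0, 'default')          # Missing offsets can yield defaults
'default'
>>> 99 in D, 98 in D             # Or be detected with in tests
(True, False)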
As another example, we might also employ integer keys in our first movie database’s
code earlier to avoid quoting the year, albeit at the expense of some expressiveness
(keys cannot contain nondigit characters):
>>> table = {1975: 'Holy Grail',
... 1979: 'Life of Brian', # Keys are integers, not strings
... 1983: 'The Meaning of Life'}
>>> table[1975]
'Holy Grail'
>>> list(table.items())
[(1979, 'Life of Brian'), (1983, 'The Meaning of Life'), (1975, 'Holy Grail')]
Using dictionaries for sparse data structures: Tuple keys
In a similar way, dictionary keys are also commonly leveraged to implement sparse data
structures—for example, multidimensional arrays where only a few positions have val-
ues stored in them:
>>> Matrix = {}
>>> Matrix[(2, 3, 4)] = 88
>>> Matrix[(7, 8, 9)] = 99
>>>
>>> X = 2; Y = 3; Z = 4 # ; separates statements: see Chapter 10
>>> Matrix[(X, Y, Z)]
88
>>> Matrix
{(2, 3, 4): 88, (7, 8, 9): 99}
Here, we’ve used a dictionary to represent a three-dimensional array that is empty
except for the two positions (2,3,4) and (7,8,9). The keys are tuples that record the
coordinates of nonempty slots. Rather than allocating a large and mostly empty three-
dimensional matrix to hold these values, we can use a simple two-item dictionary. In
this scheme, accessing an empty slot triggers a nonexistent key exception, as these slots
are not physically stored:
>>> Matrix[(2,3,6)]
Traceback (most recent call last):
File "<stdin>", line 1, in ?
KeyError: (2, 3, 6)
Avoiding missing-key errors
Errors for nonexistent key fetches are common in sparse matrixes, but you probably
won’t want them to shut down your program. There are at least three ways to fill in a
default value instead of getting such an error message—you can test for keys ahead of
time in if statements, use a try statement to catch and recover from the exception
explicitly, or simply use the dictionary get method shown earlier to provide a default
for keys that do not exist. Consider the first two of these previews for statement syntax
we’ll begin studying in Chapter 10:
>>> if (2, 3, 6) in Matrix: # Check for key before fetch
... print(Matrix[(2, 3, 6)]) # See Chapters 10 and 12 for if/else
... else:
... print(0)
...
0
>>> try:
... print(Matrix[(2, 3, 6)]) # Try to index
... except KeyError: # Catch and recover
... print(0) # See Chapters 10 and 34 for try/except
...
0
>>> Matrix.get((2, 3, 4), 0) # Exists: fetch and return
88
>>> Matrix.get((2, 3, 6), 0) # Doesn't exist: use default arg
0
Of these, the get method is the most concise in terms of coding requirements, but the
if and try statements are much more general in scope; again, more on these starting
in Chapter 10.
Nesting in dictionaries
As you can see, dictionaries can play many roles in Python. In general, they can replace
search data structures (because indexing by key is a search operation) and can represent
many types of structured information. For example, dictionaries are one of many ways
to describe the properties of an item in your program’s domain; that is, they can serve
the same role as “records” or “structs” in other languages.
The following, for example, fills out a dictionary describing a hypothetical person, by
assigning to new keys over time (if you are a Bob, my apologies for picking on your
name in this book—it’s easy to type!):
>>> rec = {}
>>> rec['name'] = 'Bob'
>>> rec['age'] = 40.5
>>> rec['job'] = 'developer/manager'
>>>
>>> print(rec['name'])
Bob
Especially when nested, Python’s built-in data types allow us to easily represent struc-
tured information. The following again uses a dictionary to capture object properties,
but it codes it all at once (rather than assigning to each key separately) and nests a list
and a dictionary to represent structured property values:
>>> rec = {'name': 'Bob',
... 'jobs': ['developer', 'manager'],
... 'web': 'www.bobs.org/~Bob',
... 'home': {'state': 'Overworked', 'zip': 12345}}
To fetch components of nested objects, simply string together indexing operations:
>>> rec['name']
'Bob'
>>> rec['jobs']
['developer', 'manager']
>>> rec['jobs'][1]
'manager'
>>> rec['home']['zip']
12345
Although we’ll learn in Part VI that classes (which group both data and logic) can be
better in this record role, dictionaries are an easy-to-use tool for simpler requirements.
For more on record representation choices, see also the upcoming sidebar “Why You
Will Care: Dictionaries Versus Lists” on page 263, as well as its extension to tuples in
Chapter 9 and classes in Chapter 27.
Also notice that while we’ve focused on a single “record” with nested data here, there’s
no reason we couldn’t nest the record itself in a larger, enclosing database collection
coded as a list or dictionary, though an external file or formal database interface often
plays the role of top-level container in realistic programs:
db = []
db.append(rec) # A list "database"
db.append(other)
db[0]['jobs']
db = {}
db['bob'] = rec # A dictionary "database"
db['sue'] = other
db['bob']['jobs']
Later in the book we’ll meet tools such as Python’s shelve, which works much the same
way, but automatically maps objects to and from files to make them permanent (watch
for more in this chapter’s sidebar “Why You Will Care: Dictionary Inter-
faces” on page 271).
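As a brief, hedged preview of that tool (covered fully in Chapter 28), its usage pattern
mirrors the dictionary “database” above; the filename here is illustrative only:
import shelve
db = shelve.open('peopledb')     # File-backed, dictionary-like object
db['bob'] = rec                  # Keys must be strings in a shelve
db.close()                       # Changes are saved to the file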
Other Ways to Make Dictionaries
Finally, note that because dictionaries are so useful, more ways to build them have
emerged over time. In Python 2.3 and later, for example, the last two calls to the dict
constructor (really, type name) shown here have the same effect as the literal and key-
assignment forms above them:
{'name': 'Bob', 'age': 40} # Traditional literal expression
D = {} # Assign by keys dynamically
D['name'] = 'Bob'
D['age'] = 40
dict(name='Bob', age=40) # dict keyword argument form
dict([('name', 'Bob'), ('age', 40)]) # dict key/value tuples form
All four of these forms create the same two-key dictionary, but they are useful in dif-
fering circumstances:
• The first is handy if you can spell out the entire dictionary ahead of time.
• The second is of use if you need to create the dictionary one field at a time on the fly.
• The third involves less typing than the first, but it requires all keys to be strings.
• The last is useful if you need to build up keys and values as sequences at runtime.
We met keyword arguments earlier when sorting; the third form illustrated in this code
listing has become especially popular in Python code today, since it has less syntax (and
hence there is less opportunity for mistakes). As suggested previously in Table 8-2, the
last form in the listing is also commonly used in conjunction with the zip function, to
combine separate lists of keys and values obtained dynamically at runtime (parsed out
of a data file’s columns, for instance):
dict(zip(keyslist, valueslist)) # Zipped key/value tuples form (ahead)
More on zipping dictionary keys in the next section. Provided that all keys are to have
the same value initially, you can also create a dictionary with this special form—simply
pass in a list of keys and an initial value for all of the values (the default is None):
>>> dict.fromkeys(['a', 'b'], 0)
{'a': 0, 'b': 0}
Although you could get by with just literals and key assignments at this point in your
Python career, you’ll probably find uses for all of these dictionary-creation forms as
you start applying them in realistic, flexible, and dynamic Python programs.
The listings in this section document the various ways to create dictionaries in both
Python 2.X and 3.X. However, there is yet another way to create dictionaries, available
only in Python 3.X and 2.7: the dictionary comprehension expression. To see how this
last form looks, we need to move on to the next and final section of this chapter.
Why You Will Care: Dictionaries Versus Lists
With all the objects in Python’s core types arsenal, some readers may be puzzled over
the choice between lists and dictionaries. In short, although both are flexible collections
of other objects, lists assign items to positions, and dictionaries assign them to more
mnemonic keys. Because of this, dictionary data often carries more meaning to human
readers. For example, the nested list structure in row 3 of Table 8-1 could be used to
represent a record too:
>>> L = ['Bob', 40.5, ['dev', 'mgr']] # List-based "record"
>>> L[0]
'Bob'
>>> L[1] # Positions/numbers for fields
40.5
>>> L[2][1]
'mgr'
For some types of data, the list’s access-by-position makes sense—a list of employees
in a company, the files in a directory, or numeric matrixes, for example. But a more
symbolic record like this may be more meaningfully coded as a dictionary along the
lines of row 2 in Table 8-2, with labeled fields replacing field positions (this is similar
to a record we coded in Chapter 4):
>>> D = {'name': 'Bob', 'age': 40.5, 'jobs': ['dev', 'mgr']}
>>> D['name']
'Bob'
>>> D['age'] # Dictionary-based "record"
40.5
>>> D['jobs'][1] # Names mean more than numbers
'mgr'
For variety, here is the same record recoded with keywords, which may seem even more
readable to some human readers:
>>> D = dict(name='Bob', age=40.5, jobs=['dev', 'mgr'])
>>> D['name']
'Bob'
>>> D['jobs'].remove('mgr')
>>> D
{'jobs': ['dev'], 'age': 40.5, 'name': 'Bob'}
In practice, dictionaries tend to be best for data with labeled components, as well as
structures that can benefit from quick, direct lookups by name, instead of slower linear
searches. As we’ve seen, they also may be better for sparse collections and collections
that grow at arbitrary positions.
Python programmers also have access to the sets we studied in Chapter 5, which are
much like the keys of a valueless dictionary; they don’t map keys to values, but can
often be used like dictionaries for fast lookups when there is no associated value, es-
pecially in search routines:
>>> D = {}
>>> D['state1'] = True # A visited-state dictionary
>>> 'state1' in D
True
>>> S = set()
>>> S.add('state1') # Same, but with sets
>>> 'state1' in S
True
Watch for a rehash of this record representation thread in the next chapter, where we’ll
see how tuples and named tuples compare to dictionaries in this role, as well as in
Chapter 27, where we’ll learn how user-defined classes factor into this picture, com-
bining both data and logic to process it.
Dictionary Changes in Python 3.X and 2.7
This chapter has so far focused on dictionary basics that span releases, but the dictio-
nary’s functionality has mutated in Python 3.X. If you are using Python 2.X code, you
may come across some dictionary tools that either behave differently or are missing
altogether in 3.X. Moreover, 3.X coders have access to additional dictionary tools not
available in 2.X, apart from two back-ports to 2.7.
Specifically, dictionaries in Python 3.X:
• Support a new dictionary comprehension expression, a close cousin to list and set
comprehensions
• Return set-like iterable views instead of lists for the methods D.keys, D.values, and
D.items
• Require new coding styles for scanning by sorted keys, because of the prior point
• No longer support relative magnitude comparisons directly—compare manually
instead
• No longer have the D.has_key method—the in membership test is used instead
As later back-ports from 3.X, dictionaries in Python 2.7 (but not earlier in 2.X):
• Support item 1 in the prior list—dictionary comprehensions—as a direct back-port
from 3.X
• Support item 2 in the prior list—set-like iterable views—but do so under the special
method names (D.viewkeys, D.viewvalues, and D.viewitems); their nonview methods
return lists as before
Because of this overlap, some of the material in this section pertains both to 3.X and
2.7, but is presented here in the context of 3.X extensions because of its origin. With
that in mind, let’s take a look at what’s new in dictionaries in 3.X and 2.7.
Dictionary comprehensions in 3.X and 2.7
As mentioned at the end of the prior section, dictionaries in 3.X and 2.7 can also be
created with dictionary comprehensions. Like the set comprehensions we met in
Chapter 5, dictionary comprehensions are available only in 3.X and 2.7 (not in 2.6 and
earlier). Like the longstanding list comprehensions we met briefly in Chapter 4 and
earlier in this chapter, they run an implied loop, collecting the key/value results of
expressions on each iteration and using them to fill out a new dictionary. A loop variable
allows the comprehension to use loop iteration values along the way.
To illustrate, a standard way to initialize a dictionary dynamically in both 2.X and 3.X
is to combine its keys and values with zip, and pass the result to the dict call. The
zip built-in function is the hook that allows us to construct a dictionary from key and
value lists this way—if you cannot predict the set of keys and values in your code, you
can always build them up as lists and zip them together. We’ll study zip in detail in
Chapter 13 and Chapter 14 after exploring statements; it’s an iterable in 3.X, so we
must wrap it in a list call to show its results there, but its basic usage is otherwise
straightforward:
>>> list(zip(['a', 'b', 'c'], [1, 2, 3])) # Zip together keys and values
[('a', 1), ('b', 2), ('c', 3)]
>>> D = dict(zip(['a', 'b', 'c'], [1, 2, 3])) # Make a dict from zip result
>>> D
{'b': 2, 'c': 3, 'a': 1}
In Python 3.X and 2.7, though, you can achieve the same effect with a dictionary com-
prehension expression. The following builds a new dictionary with a key/value pair for
every such pair in the zip result (it reads almost the same in Python, but with a bit more
formality):
>>> D = {k: v for (k, v) in zip(['a', 'b', 'c'], [1, 2, 3])}
>>> D
{'b': 2, 'c': 3, 'a': 1}
Comprehensions actually require more code in this case, but they are also more general
than this example implies—we can use them to map a single stream of values to dic-
tionaries as well, and keys can be computed with expressions just like values:
>>> D = {x: x ** 2 for x in [1, 2, 3, 4]} # Or: range(1, 5)
>>> D
{1: 1, 2: 4, 3: 9, 4: 16}
>>> D = {c: c * 4 for c in 'SPAM'} # Loop over any iterable
>>> D
{'S': 'SSSS', 'P': 'PPPP', 'A': 'AAAA', 'M': 'MMMM'}
>>> D = {c.lower(): c + '!' for c in ['SPAM', 'EGGS', 'HAM']}
>>> D
{'eggs': 'EGGS!', 'spam': 'SPAM!', 'ham': 'HAM!'}
Dictionary comprehensions are also useful for initializing dictionaries from keys lists,
in much the same way as the fromkeys method we met at the end of the preceding
section:
>>> D = dict.fromkeys(['a', 'b', 'c'], 0) # Initialize dict from keys
>>> D
{'b': 0, 'c': 0, 'a': 0}
>>> D = {k:0 for k in ['a', 'b', 'c']} # Same, but with a comprehension
>>> D
{'b': 0, 'c': 0, 'a': 0}
>>> D = dict.fromkeys('spam') # Other iterables, default value
>>> D
{'s': None, 'p': None, 'a': None, 'm': None}
>>> D = {k: None for k in 'spam'}
>>> D
{'s': None, 'p': None, 'a': None, 'm': None}
Like related tools, dictionary comprehensions support additional syntax not shown
here, including nested loops and if clauses. Unfortunately, to truly understand dictio-
nary comprehensions, we need to also know more about iteration statements and con-
cepts in Python, and we don’t yet have enough information to address that story well.
We’ll learn much more about all flavors of comprehensions (list, set, dictionary, and
generator) in Chapter 14 and Chapter 20, so we’ll defer further details until later. We’ll
also revisit the zip built-in we used in this section in more detail in Chapter 13, when
we explore for loops.
Dictionary views in 3.X (and 2.7 via new methods)
In 3.X the dictionary keys, values, and items methods all return view objects, whereas
in 2.X they return actual result lists. This functionality is also available in Python 2.7,
but in the guise of the special, distinct method names listed at the start of this section
(2.7’s normal methods still return simple lists, so as to avoid breaking existing 2.X
code); because of this, I’ll refer to this as a 3.X feature in this section.
View objects are iterables, which simply means objects that generate result items one
at a time, instead of producing the result list all at once in memory. Besides being
iterable, dictionary views also retain the original order of dictionary components, reflect
future changes to the dictionary, and may support set operations. On the other hand,
because they are not lists, they do not directly support operations like indexing or the
list sort method, and do not display their items as a normal list when printed (they do
show their components as of Python 3.1 but not as a list, and are still a divergence from
2.X).
We’ll discuss the notion of iterables more formally in Chapter 14, but for our purposes
here it’s enough to know that we have to run the results of these three methods through
the list built-in if we want to apply list operations or display their values. For example,
in Python 3.3 (other versions' outputs may differ slightly):
>>> D = dict(a=1, b=2, c=3)
>>> D
{'b': 2, 'c': 3, 'a': 1}
>>> K = D.keys() # Makes a view object in 3.X, not a list
>>> K
dict_keys(['b', 'c', 'a'])
>>> list(K) # Force a real list in 3.X if needed
['b', 'c', 'a']
>>> V = D.values() # Ditto for values and items views
>>> V
dict_values([2, 3, 1])
>>> list(V)
[2, 3, 1]
>>> D.items()
dict_items([('b', 2), ('c', 3), ('a', 1)])
>>> list(D.items())
[('b', 2), ('c', 3), ('a', 1)]
>>> K[0] # List operations fail unless converted
TypeError: 'dict_keys' object does not support indexing
>>> list(K)[0]
'b'
Apart from result displays at the interactive prompt, you will probably rarely even
notice this change, because looping constructs in Python automatically force iterable
objects to produce one result on each iteration:
>>> for k in D.keys(): print(k) # Iterators used automatically in loops
...
b
c
a
In addition, 3.X dictionaries still have iterators themselves, which return successive
keys—as in 2.X, it’s still often not necessary to call keys directly:
>>> for key in D: print(key) # Still no need to call keys() to iterate
...
b
c
a
Unlike 2.X’s list results, though, dictionary views in 3.X are not carved in stone when
created—they dynamically reflect future changes made to the dictionary after the view
object has been created:
>>> D = {'a': 1, 'b': 2, 'c': 3}
>>> D
{'b': 2, 'c': 3, 'a': 1}
>>> K = D.keys()
>>> V = D.values()
>>> list(K) # Views maintain same order as dictionary
['b', 'c', 'a']
>>> list(V)
[2, 3, 1]
>>> del D['b'] # Change the dictionary in place
>>> D
{'c': 3, 'a': 1}
>>> list(K) # Reflected in any current view objects
['c', 'a']
>>> list(V) # Not true in 2.X! - lists detached from dict
[3, 1]
Dictionary views and sets
Also unlike 2.X’s list results, 3.X’s view objects returned by the keys method are set-
like and support common set operations such as intersection and union; values views
are not set-like, but items results are if their (key, value) pairs are unique and hashable
(immutable). Given that sets behave much like valueless dictionaries (and may even be
coded in curly braces like dictionaries in 3.X and 2.7), this is a logical symmetry. Per
Chapter 5, set items are unordered, unique, and immutable, just like dictionary keys.
Here is what keys views look like when used in set operations (continuing the prior
section’s session); dictionary value views are never set-like, since their items are not
necessarily unique or immutable:
>>> K, V
(dict_keys(['c', 'a']), dict_values([3, 1]))
>>> K | {'x': 4} # Keys (and some items) views are set-like
{'c', 'x', 'a'}
>>> V & {'x': 4}
TypeError: unsupported operand type(s) for &: 'dict_values' and 'dict'
>>> V & {'x': 4}.values()
TypeError: unsupported operand type(s) for &: 'dict_values' and 'dict_values'
In set operations, views may be mixed with other views, sets, and dictionaries; dic-
tionaries are treated the same as their keys views in this context:
>>> D = {'a': 1, 'b': 2, 'c': 3}
>>> D.keys() & D.keys() # Intersect keys views
{'b', 'c', 'a'}
>>> D.keys() & {'b'} # Intersect keys and set
{'b'}
>>> D.keys() & {'b': 1} # Intersect keys and dict
{'b'}
>>> D.keys() | {'b', 'c', 'd'} # Union keys and set
{'b', 'c', 'a', 'd'}
Items views are set-like too if they are hashable—that is, if they contain only immutable
objects:
>>> D = {'a': 1}
>>> list(D.items()) # Items set-like if hashable
[('a', 1)]
>>> D.items() | D.keys() # Union view and view
{('a', 1), 'a'}
>>> D.items() | D # dict treated same as its keys
{('a', 1), 'a'}
>>> D.items() | {('c', 3), ('d', 4)} # Set of key/value pairs
{('d', 4), ('a', 1), ('c', 3)}
>>> dict(D.items() | {('c', 3), ('d', 4)}) # dict accepts iterable sets too
{'c': 3, 'a': 1, 'd': 4}
See Chapter 5’s coverage of sets if you need a refresher on these operations. Here, let’s
wrap up with three other quick coding notes for 3.X dictionaries.
Sorting dictionary keys in 3.X
First of all, because keys does not return a list in 3.X, the traditional coding pattern for
scanning a dictionary by sorted keys in 2.X won’t work in 3.X:
>>> D = {'a': 1, 'b': 2, 'c': 3}
>>> D
{'b': 2, 'c': 3, 'a': 1}
>>> Ks = D.keys() # Sorting a view object doesn't work!
>>> Ks.sort()
AttributeError: 'dict_keys' object has no attribute 'sort'
To work around this, in 3.X you must either convert to a list manually or use the
sorted call (introduced in Chapter 4 and covered in this chapter) on either a keys view
or the dictionary itself:
>>> Ks = list(Ks) # Force it to be a list and then sort
>>> Ks.sort()
>>> for k in Ks: print(k, D[k]) # 2.X: omit outer parens in prints
...
a 1
b 2
c 3
>>> D
{'b': 2, 'c': 3, 'a': 1}
>>> Ks = D.keys() # Or you can use sorted() on the keys
>>> for k in sorted(Ks): print(k, D[k]) # sorted() accepts any iterable
... # sorted() returns its result
a 1
b 2
c 3
Of these, using the dictionary’s keys iterator is probably preferable in 3.X, and works
in 2.X as well:
>>> D
{'b': 2, 'c': 3, 'a': 1} # Better yet, sort the dict directly
>>> for k in sorted(D): print(k, D[k]) # dict iterators return keys
...
a 1
b 2
c 3
Dictionary magnitude comparisons no longer work in 3.X
Secondly, while in Python 2.X dictionaries may be compared for relative magnitude
directly with <, >, and so on, in Python 3.X this no longer works. However, you can
simulate it by comparing sorted keys lists manually:
sorted(D1.items()) < sorted(D2.items()) # Like 2.X D1 < D2
Dictionary equality tests (e.g., D1 == D2) still work in 3.X, though. Since we’ll revisit
this near the end of the next chapter in the context of comparisons at large, we’ll post-
pone further details here.
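To illustrate both behaviors with a minimal, hedged sketch (the error message text
here is 3.3's, and may vary by version):
>>> D1, D2 = {'a': 1, 'b': 2}, {'a': 1, 'b': 3}
>>> D1 == D2                                   # Equality still works in 3.X
False
>>> sorted(D1.items()) < sorted(D2.items())    # Manual magnitude test
True
>>> D1 < D2                                    # Direct comparison fails in 3.X
TypeError: unorderable types: dict() < dict()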
The has_key method is dead in 3.X: Long live in!
Finally, the widely used dictionary has_key key presence test method is gone in 3.X.
Instead, use the in membership expression, or a get with a default test (of these, in is
generally preferred):
>>> D
{'b': 2, 'c': 3, 'a': 1}
>>> D.has_key('c') # 2.X only: True/False
AttributeError: 'dict' object has no attribute 'has_key'
>>> 'c' in D # Required in 3.X
True
>>> 'x' in D # Preferred in 2.X today
False
>>> if 'c' in D: print('present', D['c']) # Branch on result
...
present 3
>>> print(D.get('c')) # Fetch with default
3
>>> print(D.get('x'))
None
>>> if D.get('c') != None: print('present', D['c']) # Another option
...
present 3
To summarize, the dictionary story changes substantially in 3.X. If you work in 2.X
and care about 3.X compatibility (or suspect that you might someday), here are some
pointers. Of the 3.X changes we’ve met in this section:
• The first (dictionary comprehensions) can be coded only in 3.X and 2.7.
• The second (dictionary views) can be coded only in 3.X, and with special method
names in 2.7.
• However, the last three techniques—sorted, manual comparisons, and in—can be
coded in 2.X today to ease 3.X migration in the future.
Why You Will Care: Dictionary Interfaces
Dictionaries aren’t just a convenient way to store information by key in your programs
—some Python extensions also present interfaces that look like and work the same as
dictionaries. For instance, Python’s interface to DBM access-by-key files looks much
like a dictionary that must be opened. You store and fetch strings using key indexes:
import dbm # Named anydbm in Python 2.X
file = dbm.open("filename") # Link to file
file['key'] = 'data' # Store data by key
data = file['key'] # Fetch data by key
In Chapter 28, you’ll see that you can store entire Python objects this way, too, if you
replace dbm in the preceding code with shelve (shelves are access-by-key databases that
store persistent Python objects, not just strings). For Internet work, Python’s CGI script
support also presents a dictionary-like interface. A call to cgi.FieldStorage yields a
dictionary-like object with one entry per input field on the client’s web page:
import cgi
form = cgi.FieldStorage() # Parse form data
if 'name' in form:
    showReply('Hello, ' + form['name'].value)
Though dictionaries are the only core mapping type, all of these others are instances
of mappings, and support most of the same operations. Once you learn dictionary
interfaces, you’ll find that they apply to a variety of built-in tools in Python.
For another dictionary use case, see also Chapter 9’s upcoming overview of JSON—a
language-neutral data format used for databases and data transfer. Python dictionaries,
lists, and nested combinations of them can almost pass for records in this format as is,
and may be easily translated to and from formal JSON text strings with Python’s
json standard library module.
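A minimal, hedged sketch of the round trip (the key order in the JSON text may vary
per Python version):
>>> import json
>>> rec = {'name': 'Bob', 'jobs': ['dev', 'mgr']}
>>> json.dumps(rec)                      # Python object to JSON text string
'{"name": "Bob", "jobs": ["dev", "mgr"]}'
>>> json.loads(json.dumps(rec)) == rec   # And back again, unchanged
True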
Chapter Summary
In this chapter, we explored the list and dictionary types—probably the two most
common, flexible, and powerful collection types you will see and use in Python code.
We learned that the list type supports positionally ordered collections of arbitrary ob-
jects, and that it may be freely nested and grown and shrunk on demand. The dictionary
type is similar, but it stores items by key instead of by position and does not maintain
any reliable left-to-right order among its items. Both lists and dictionaries are mutable,
and so support a variety of in-place change operations not available for strings: for
example, lists can be grown by append calls, and dictionaries by assignment to new keys.
In the next chapter, we will wrap up our in-depth core object type tour by looking at
tuples and files. After that, we’ll move on to statements that code the logic that processes
our objects, taking us another step toward writing complete programs. Before we tackle
those topics, though, here are some chapter quiz questions to review.
Test Your Knowledge: Quiz
1. Name two ways to build a list containing five integer zeros.
2. Name two ways to build a dictionary with two keys, 'a' and 'b', each having an
associated value of 0.
3. Name four operations that change a list object in place.
4. Name four operations that change a dictionary object in place.
5. Why might you use a dictionary instead of a list?
Test Your Knowledge: Answers
1. A literal expression like [0, 0, 0, 0, 0] and a repetition expression like [0] * 5
will each create a list of five zeros. In practice, you might also build one up with a
loop that starts with an empty list and appends 0 to it in each iteration, with
L.append(0). A list comprehension ([0 for i in range(5)]) could work here, too,
but this is more work than you need to do for this answer.
2. A literal expression such as {'a': 0, 'b': 0} or a series of assignments like D =
{}, D['a'] = 0, and D['b'] = 0 would create the desired dictionary. You can also
use the newer and simpler-to-code dict(a=0, b=0) keyword form, or the more
flexible dict([('a', 0), ('b', 0)]) key/value sequences form. Or, because all the
values are the same, you can use the special form dict.fromkeys('ab', 0). In 3.X
and 2.7, you can also use a dictionary comprehension: {k:0 for k in 'ab'}, though
again, this may be overkill here.
3. The append and extend methods grow a list in place, the sort and reverse methods
order and reverse lists, the insert method inserts an item at an offset, the remove
and pop methods delete from a list by value and by position, the del statement
deletes an item or slice, and index and slice assignment statements replace an item
or entire section. Pick any four of these for the quiz.
4. Dictionaries are primarily changed by assignment to a new or existing key, which
creates or changes the key’s entry in the table. Also, the del statement deletes a
key’s entry, the dictionary update method merges one dictionary into another in
place, and D.pop(key) removes a key and returns the value it had. Dictionaries also
have other, more exotic in-place change methods not presented in this chapter,
such as setdefault; see reference sources for more details.
5. Dictionaries are generally better when the data is labeled (a record with field names,
for example); lists are best suited to collections of unlabeled items (such as all the
files in a directory). Dictionary lookup is also usually quicker than searching a list,
though this might vary per program.
CHAPTER 9
Tuples, Files, and Everything Else
This chapter rounds out our in-depth tour of the core object types in Python by ex-
ploring the tuple, a collection of other objects that cannot be changed, and the file, an
interface to external files on your computer. As you’ll see, the tuple is a relatively simple
object that largely performs operations you’ve already learned about for strings and
lists. The file object is a commonly used and full-featured tool for processing files on
your computer. Because files are so pervasive in programming, the basic overview of
files here is supplemented by larger examples in later chapters.
This chapter also concludes this part of the book by looking at properties common to
all the core object types we’ve met—the notions of equality, comparisons, object
copies, and so on. We’ll also briefly explore other object types in Python’s toolbox,
including the None placeholder and the namedtuple hybrid; as you’ll see, although we’ve
covered all the primary built-in types, the object story in Python is broader than I’ve
implied thus far. Finally, we’ll close this part of the book by taking a look at a set of
common object type pitfalls and exploring some exercises that will allow you to ex-
periment with the ideas you’ve learned.
This chapter’s scope—files: As in Chapter 7 on strings, our look at files
here will be limited in scope to file fundamentals that most Python pro-
grammers—including newcomers to programming—need to know. In
particular, Unicode text files were previewed in Chapter 4, but we’re
going to postpone full coverage of them until Chapter 37, as optional
or deferred reading in the Advanced Topics part of this book.
For this chapter’s purpose, we’ll assume any text files used will be en-
coded and decoded per your platform’s default, which may be UTF-8
on Windows, and ASCII or other elsewhere (and if you don’t know why
this matters, you probably don't need to care up front). We'll also assume
that filenames encode properly on the underlying platform, though we’ll
stick with ASCII names for portability here.
If Unicode text and files are a critical subject for you, I suggest reading
the Chapter 4 preview for a quick first look, and continuing on to
Chapter 37 after you master the file basics covered here. For all others,
the file coverage here will apply both to typical text and binary files of
the sort we’ll meet here, as well as to more advanced file-processing
modes you may choose to explore later.
Tuples
The last collection type in our survey is the Python tuple. Tuples construct simple
groups of objects. They work exactly like lists, except that tuples can’t be changed in
place (they’re immutable) and are usually written as a series of items in parentheses,
not square brackets. Although they don’t support as many methods, tuples share most
of their properties with lists. Here’s a quick look at the basics. Tuples are:
Ordered collections of arbitrary objects
Like strings and lists, tuples are positionally ordered collections of objects (i.e.,
they maintain a left-to-right order among their contents); like lists, they can embed
any kind of object.
Accessed by offset
Like strings and lists, items in a tuple are accessed by offset (not by key); they
support all the offset-based access operations, such as indexing and slicing.
Of the category “immutable sequence”
Like strings and lists, tuples are sequences; they support many of the same opera-
tions. However, like strings, tuples are immutable; they don’t support any of the
in-place change operations applied to lists.
Fixed-length, heterogeneous, and arbitrarily nestable
Because tuples are immutable, you cannot change the size of a tuple without mak-
ing a copy. On the other hand, tuples can hold any type of object, including other
compound objects (e.g., lists, dictionaries, other tuples), and so support arbitrary
nesting.
Arrays of object references
Like lists, tuples are best thought of as object reference arrays; tuples store access
points to other objects (references), and indexing a tuple is relatively quick.
Table 9-1 highlights common tuple operations. A tuple is written as a series of objects
(technically, expressions that generate objects), separated by commas and normally
enclosed in parentheses. An empty tuple is just a parentheses pair with nothing inside.
Table 9-1. Common tuple literals and operations

Operation                                Interpretation
()                                       An empty tuple
T = (0,)                                 A one-item tuple (not an expression)
T = (0, 'Ni', 1.2, 3)                    A four-item tuple
T = 0, 'Ni', 1.2, 3                      Another four-item tuple (same as prior line)
T = ('Bob', ('dev', 'mgr'))              Nested tuples
T = tuple('spam')                        Tuple of items in an iterable
T[i], T[i][j], T[i:j], len(T)            Index, index of index, slice, length
T1 + T2, T * 3                           Concatenate, repeat
for x in T: print(x)                     Iteration, membership
'spam' in T
[x ** 2 for x in T]
T.index('Ni'), T.count('Ni')             Methods in 2.6, 2.7, and 3.X: search, count
namedtuple('Emp', ['name', 'jobs'])      Named tuple extension type
Tuples in Action
As usual, let’s start an interactive session to explore tuples at work. Notice in Ta-
ble 9-1 that tuples do not have all the methods that lists have (e.g., an append call won’t
work here). They do, however, support the usual sequence operations that we saw for
both strings and lists:
>>> (1, 2) + (3, 4) # Concatenation
(1, 2, 3, 4)
>>> (1, 2) * 4 # Repetition
(1, 2, 1, 2, 1, 2, 1, 2)
>>> T = (1, 2, 3, 4) # Indexing, slicing
>>> T[0], T[1:3]
(1, (2, 3))
Tuple syntax peculiarities: Commas and parentheses
The second and fourth entries in Table 9-1 merit a bit more explanation. Because
parentheses can also enclose expressions (see Chapter 5), you need to do something
special to tell Python when a single object in parentheses is a tuple object and not a
simple expression. If you really want a single-item tuple, simply add a trailing comma
after the single item, before the closing parenthesis:
>>> x = (40) # An integer!
>>> x
40
>>> y = (40,) # A tuple containing an integer
>>> y
(40,)
As a special case, Python also allows you to omit the opening and closing parentheses
for a tuple in contexts where it isn’t syntactically ambiguous to do so. For instance, the
fourth line of Table 9-1 simply lists four items separated by commas. In the context of
an assignment statement, Python recognizes this as a tuple, even though it doesn’t have
parentheses.
Now, some people will tell you to always use parentheses in your tuples, and some will
tell you to never use parentheses in tuples (and still others have lives, and won’t tell
you what to do with your tuples!). The most common places where the parentheses
are required for tuple literals are those where:
• Parentheses matter—within a function call, or nested in a larger expression.
• Commas matter—embedded in the literal of a larger data structure like a list or
dictionary, or listed in a Python 2.X print statement.
In most other contexts, the enclosing parentheses are optional. For beginners, the best
advice is that it’s probably easier to use the parentheses than it is to remember when
they are optional or required. Many programmers (myself included) also find that
parentheses tend to aid script readability by making the tuples more explicit and ob-
vious, but your mileage may vary.
Conversions, methods, and immutability
Apart from literal syntax differences, tuple operations (the middle rows in Table 9-1)
are identical to string and list operations. The only differences worth noting are that
the +, *, and slicing operations return new tuples when applied to tuples, and that tuples
don’t provide the same methods you saw for strings, lists, and dictionaries. If you want
to sort a tuple, for example, you’ll usually have to either first convert it to a list to gain
access to a sorting method call and make it a mutable object, or use the newer sorted
built-in that accepts any sequence object (and other iterables—a term introduced in
Chapter 4 that we’ll be more formal about in the next part of this book):
>>> T = ('cc', 'aa', 'dd', 'bb')
>>> tmp = list(T) # Make a list from a tuple's items
>>> tmp.sort() # Sort the list
>>> tmp
['aa', 'bb', 'cc', 'dd']
>>> T = tuple(tmp) # Make a tuple from the list's items
>>> T
('aa', 'bb', 'cc', 'dd')
>>> sorted(T) # Or use the sorted built-in, and save two steps
['aa', 'bb', 'cc', 'dd']
Here, the list and tuple built-in functions are used to convert the object to a list and
then back to a tuple; really, both calls make new objects, but the net effect is like a
conversion.
List comprehensions can also be used to convert tuples. The following, for example,
makes a list from a tuple, adding 20 to each item along the way:
>>> T = (1, 2, 3, 4, 5)
>>> L = [x + 20 for x in T]
>>> L
[21, 22, 23, 24, 25]
List comprehensions are really sequence operations—they always build new lists, but
they may be used to iterate over any sequence objects, including tuples, strings, and
other lists. As we’ll see later in the book, they even work on some things that are not
physically stored sequences—any iterable objects will do, including files, which are
automatically read line by line. Given this, they may be better called iteration tools.
Although tuples don’t have the same methods as lists and strings, they do have two of
their own as of Python 2.6 and 3.0—index and count work as they do for lists, but they
are defined for tuple objects:
>>> T = (1, 2, 3, 2, 4, 2) # Tuple methods in 2.6, 3.0, and later
>>> T.index(2) # Offset of first appearance of 2
1
>>> T.index(2, 2) # Offset of appearance after offset 2
3
>>> T.count(2) # How many 2s are there?
3
Prior to 2.6 and 3.0, tuples have no methods at all—this was an old Python convention
for immutable types, which was violated years ago on grounds of practicality with
strings, and more recently with both numbers and tuples.
Also, note that the rule about tuple immutability applies only to the top level of the
tuple itself, not to its contents. A list inside a tuple, for instance, can be changed as usual:
>>> T = (1, [2, 3], 4)
>>> T[1] = 'spam' # This fails: can't change tuple itself
TypeError: object doesn't support item assignment
>>> T[1][0] = 'spam' # This works: can change mutables inside
>>> T
(1, ['spam', 3], 4)
For most programs, this one-level-deep immutability is sufficient for common tuple
roles. Which, coincidentally, brings us to the next section.
Why Lists and Tuples?
This seems to be the first question that always comes up when teaching beginners about
tuples: why do we need tuples if we have lists? Some of the reasoning may be historic;
Python’s creator is a mathematician by training, and he has been quoted as seeing a
tuple as a simple association of objects and a list as a data structure that changes over
time. In fact, this use of the word “tuple” derives from mathematics, as does its frequent
use for a row in a relational database table.
The best answer, however, seems to be that the immutability of tuples provides some
integrity—you can be sure a tuple won’t be changed through another reference else-
where in a program, but there’s no such guarantee for lists. Tuples and other immut-
ables, therefore, serve a similar role to “constant” declarations in other languages,
though the notion of constantness is associated with objects in Python, not variables.
Tuples can also be used in places that lists cannot—for example, as dictionary keys
(see the sparse matrix example in Chapter 8). Some built-in operations may also require
or imply tuples instead of lists (e.g., the substitution values in a string format expres-
sion), though such operations have often been generalized in recent years to be more
flexible. As a rule of thumb, lists are the tool of choice for ordered collections that might
need to change; tuples can handle the other cases of fixed associations.
Records Revisited: Named Tuples
In fact, the choice of data types is even richer than the prior section may have implied
—today’s Python programmers can choose from an assortment of both built-in core
types, and extension types built on top of them. For example, in the prior chapter’s
sidebar “Why You Will Care: Dictionaries Versus Lists” on page 263, we saw how to
represent record-like information with both a list and a dictionary, and noted that
dictionaries offer the advantage of more mnemonic keys that label data. As long as we
don’t require mutability, tuples can serve similar roles, with positions for record fields
like lists:
>>> bob = ('Bob', 40.5, ['dev', 'mgr']) # Tuple record
>>> bob
('Bob', 40.5, ['dev', 'mgr'])
>>> bob[0], bob[2] # Access by position
('Bob', ['dev', 'mgr'])
As for lists, though, field numbers in tuples generally carry less information than the
names of keys in a dictionary. Here’s the same record recoded as a dictionary with
named fields:
>>> bob = dict(name='Bob', age=40.5, jobs=['dev', 'mgr']) # Dictionary record
>>> bob
{'jobs': ['dev', 'mgr'], 'name': 'Bob', 'age': 40.5}
>>> bob['name'], bob['jobs'] # Access by key
('Bob', ['dev', 'mgr'])
In fact, we can convert parts of the dictionary to a tuple if needed:
>>> tuple(bob.values()) # Values to tuple
(['dev', 'mgr'], 'Bob', 40.5)
>>> list(bob.items()) # Items to tuple list
[('jobs', ['dev', 'mgr']), ('name', 'Bob'), ('age', 40.5)]
But with a bit of extra work, we can implement objects that offer both positional and
named access to record fields. For example, the namedtuple utility, available in the
standard library’s collections module mentioned in Chapter 8, implements an exten-
sion type that adds logic to tuples that allows components to be accessed by both
position and attribute name, and can be converted to dictionary-like form for access by
key if desired. Attribute names come from classes and are not exactly dictionary keys,
but they are similarly mnemonic:
>>> from collections import namedtuple # Import extension type
>>> Rec = namedtuple('Rec', ['name', 'age', 'jobs']) # Make a generated class
>>> bob = Rec('Bob', age=40.5, jobs=['dev', 'mgr']) # A named-tuple record
>>> bob
Rec(name='Bob', age=40.5, jobs=['dev', 'mgr'])
>>> bob[0], bob[2] # Access by position
('Bob', ['dev', 'mgr'])
>>> bob.name, bob.jobs # Access by attribute
('Bob', ['dev', 'mgr'])
Converting to a dictionary supports key-based behavior when needed:
>>> O = bob._asdict() # Dictionary-like form
>>> O['name'], O['jobs'] # Access by key too
('Bob', ['dev', 'mgr'])
>>> O
OrderedDict([('name', 'Bob'), ('age', 40.5), ('jobs', ['dev', 'mgr'])])
As you can see, named tuples are a tuple/class/dictionary hybrid. They also represent
a classic tradeoff. In exchange for their extra utility, they require extra code (the two
startup lines in the preceding examples that import the type and make the class), and
incur some performance costs to work this magic. (In short, named tuples build new
classes that extend the tuple type, inserting a property accessor method for each named
field that maps the name to its position—a technique that relies on advanced topics
we’ll explore in Part VIII, and uses formatted code strings instead of class annotation
tools like decorators and metaclasses.) Still, they are a good example of the kind of
custom data types that we can build on top of built-in types like tuples when extra
utility is desired.
Named tuples are available in Python 3.X, 2.7, 2.6 (where _asdict returns a true dic-
tionary), and perhaps earlier, though they rely on features relatively modern by Python
standards. They are also extensions, not core types—they live in the standard library
and fall into the same category as Chapter 5’s Fraction and Decimal—so we’ll delegate
to the Python library manual for more details.
As a quick preview, though, both tuples and named tuples support unpacking tuple
assignment, which we’ll study formally in Chapter 13, as well as the iteration contexts
we’ll explore in Chapter 14 and Chapter 20 (notice the positional initial values here:
named tuples accept these by name, position, or both):
>>> bob = Rec('Bob', 40.5, ['dev', 'mgr']) # For both tuples and named tuples
>>> name, age, jobs = bob # Tuple assignment (Chapter 11)
>>> name, jobs
('Bob', ['dev', 'mgr'])
>>> for x in bob: print(x) # Iteration context (Chapters 14, 20)
...prints Bob, 40.5, ['dev', 'mgr']...
Tuple-unpacking assignment doesn’t quite apply to dictionaries, short of fetching and
converting keys and values and assuming or imposing a positional ordering on them
(dictionaries are not sequences), and iteration steps through keys, not values (notice
the dictionary literal form here: an alternative to dict):
>>> bob = {'name': 'Bob', 'age': 40.5, 'jobs': ['dev', 'mgr']}
>>> job, name, age = bob.values()
>>> name, job # Dict equivalent (but order may vary)
('Bob', ['dev', 'mgr'])
>>> for x in bob: print(bob[x]) # Step through keys, index values
...prints values...
>>> for x in bob.values(): print(x) # Step through values view
...prints values...
Watch for a final rehash of this record representation thread when we see how user-
defined classes compare in Chapter 27; as we’ll find, classes label fields with names too,
but can also provide program logic to process the record’s data in the same package.
Files
You may already be familiar with the notion of files, which are named storage com-
partments on your computer that are managed by your operating system. The last major
built-in object type that we’ll examine on our object types tour provides a way to access
those files inside Python programs.
In short, the built-in open function creates a Python file object, which serves as a link
to a file residing on your machine. After calling open, you can transfer strings of data
to and from the associated external file by calling the returned file object’s methods.
Compared to the types you’ve seen so far, file objects are somewhat unusual. They are
considered a core type because they are created by a built-in function, but they’re not
numbers, sequences, or mappings, and they don’t respond to expression operators;
they export only methods for common file-processing tasks. Most file methods are
concerned with performing input from and output to the external file associated with
a file object, but other file methods allow us to seek to a new position in the file, flush
output buffers, and so on. Table 9-2 summarizes common file operations.
Table 9-2. Common file operations
Operation Interpretation
output = open(r'C:\spam', 'w') Create output file ('w' means write)
input = open('data', 'r') Create input file ('r' means read)
input = open('data') Same as prior line ('r' is the default)
aString = input.read() Read entire file into a single string
aString = input.read(N) Read up to next N characters (or bytes) into a string
aString = input.readline() Read next line (including \n newline) into a string
aList = input.readlines() Read entire file into list of line strings (with \n)
output.write(aString) Write a string of characters (or bytes) into file
output.writelines(aList) Write all line strings in a list into file
output.close() Manual close (done for you when file is collected)
output.flush() Flush output buffer to disk without closing
anyFile.seek(N) Change file position to offset N for next operation
for line in open('data'): use line File iterators read line by line
open('f.txt', encoding='latin-1') Python 3.X Unicode text files (str strings)
open('f.bin', 'rb') Python 3.X bytes files (bytes strings)
codecs.open('f.txt', encoding='utf8') Python 2.X Unicode text files (unicode strings)
open('f.bin', 'rb') Python 2.X bytes files (str strings)
Opening Files
To open a file, a program calls the built-in open function, with the external filename
first, followed by a processing mode. The call returns a file object, which in turn has
methods for data transfer:
afile = open(filename, mode)
afile.method()
The first argument to open, the external filename, may include a platform-specific and
absolute or relative directory path prefix. Without a directory path, the file is assumed
to exist in the current working directory (i.e., where the script runs). As we’ll see in
Chapter 37’s expanded file coverage, the filename may also contain non-ASCII Unicode
characters that Python automatically translates to and from the underlying platform’s
encoding, or be provided as a pre-encoded byte string.
The second argument to open, processing mode, is typically the string 'r' to open for
text input (the default), 'w' to create and open for text output, or 'a' to open for
appending text to the end (e.g., for adding to logfiles). The processing mode argument
can specify additional options:
Adding a b to the mode string allows for binary data (end-of-line translations and
3.X Unicode encodings are turned off).
Adding a + opens the file for both input and output (i.e., you can both read and
write to the same file object, often in conjunction with seek operations to reposition
in the file).
Both of the first two arguments to open must be Python strings. An optional third ar-
gument can be used to control output buffering—passing a zero means that output is
unbuffered (it is transferred to the external file immediately on a write method call),
and additional arguments may be provided for special types of files (e.g., an encoding
for Unicode text files in Python 3.X).
We’ll cover file fundamentals and explore some basic examples here, but we won’t go
into all file-processing mode options; as usual, consult the Python library manual for
additional details.
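To make the mode options more concrete, here is a brief sketch of some common open call variations; the filenames are illustrative only:
afile = open('log.txt', 'a')                     # Append text to end of file
afile = open('data.bin', 'rb')                   # Binary input: no translations
afile = open('data.txt', 'r+')                   # Read and write the same file
afile = open('notes.txt', 'w', encoding='utf8')  # 3.X: explicit Unicode encoding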
Using Files
Once you make a file object with open, you can call its methods to read from or write
to the associated external file. In all cases, file text takes the form of strings in Python
programs; reading a file returns its content in strings, and content is passed to the write
methods as strings. Reading and writing methods come in multiple flavors; Table 9-2
lists the most common. Here are a few fundamental usage notes:
File iterators are best for reading lines
Though the reading and writing methods in the table are common, keep in mind
that probably the best way to read lines from a text file today is to not read the file
at all—as we’ll see in Chapter 14, files also have an iterator that automatically reads
one line at a time in a for loop, list comprehension, or other iteration context.
Content is strings, not objects
Notice in Table 9-2 that data read from a file always comes back to your script as
a string, so you’ll have to convert it to a different type of Python object if a string
is not what you need. Similarly, unlike with the print operation, Python does not
add any formatting and does not convert objects to strings automatically when you
write data to a file—you must send an already formatted string. Because of this,
the tools we have already met to convert objects to and from strings (e.g., int,
float, str, and the string formatting expression and method) come in handy when
dealing with files.
Python also includes advanced standard library tools for handling generic object
storage (the pickle module), for dealing with packed binary data in files (the
struct module), and for processing special types of content such as JSON, XML,
and CSV text. We’ll see these at work later in this chapter and book, but Python’s
manuals document them in full.
Files are buffered and seekable
By default, output files are always buffered, which means that text you write may
not be transferred from memory to disk immediately—closing a file, or running its
flush method, forces the buffered data to disk. You can avoid buffering with extra
open arguments, but it may impede performance. Python files are also random-
access on a byte offset basis—their seek method allows your scripts to jump around
to read and write at specific locations.
close is often optional: auto-close on collection
Calling the file close method terminates your connection to the external file, re-
leases its system resources, and flushes its buffered output to disk if any is still in
memory. As discussed in Chapter 6, in Python an object’s memory space is auto-
matically reclaimed as soon as the object is no longer referenced anywhere in the
program. When file objects are reclaimed, Python also automatically closes the files
if they are still open (this also happens when a program shuts down). This means
you don’t always need to manually close your files in standard Python, especially
those in simple scripts with short runtimes, and temporary files used by a single
line or expression.
On the other hand, including manual close calls doesn’t hurt, and may be a good
habit to form, especially in long-running systems. Strictly speaking, this auto-close-
on-collection feature of files is not part of the language definition—it may change
over time, may not happen when you expect it to in interactive shells, and may not
work the same in other Python implementations whose garbage collectors may not
reclaim and close files at the same points as standard CPython. In fact, when many
files are opened within loops, Pythons other than CPython may require close calls
to free up system resources immediately, before garbage collection can get around
to freeing objects. Moreover, close calls may sometimes be required to flush buf-
fered output of file objects not yet reclaimed. For an alternative way to guarantee
automatic file closes, also see this section’s later discussion of the file object’s
context manager, used with the with/as statement in Python 2.6, 2.7, and 3.X.
Files in Action
Let’s work through a simple example that demonstrates file-processing basics. The
following code begins by opening a new text file for output, writing two lines (strings
terminated with a newline marker, \n), and closing the file. Later, the example opens
the same file again in input mode and reads the lines back one at a time with read
line. Notice that the third readline call returns an empty string; this is how Python file
methods tell you that you’ve reached the end of the file (empty lines in the file come
back as strings containing just a newline character, not as empty strings). Here’s the
complete interaction:
>>> myfile = open('myfile.txt', 'w') # Open for text output: create/empty
>>> myfile.write('hello text file\n') # Write a line of text: string
16
>>> myfile.write('goodbye text file\n')
18
>>> myfile.close() # Flush output buffers to disk
>>> myfile = open('myfile.txt') # Open for text input: 'r' is default
>>> myfile.readline() # Read the lines back
'hello text file\n'
>>> myfile.readline()
'goodbye text file\n'
>>> myfile.readline() # Empty string: end-of-file
''
Notice that file write calls return the number of characters written in Python 3.X; in
2.X they don’t, so you won’t see these numbers echoed interactively. This example
writes each line of text, including its end-of-line terminator, \n, as a string; write meth-
ods don’t add the end-of-line character for us, so we must include it to properly ter-
minate our lines (otherwise the next write will simply extend the current line in the file).
If you want to display the file’s content with end-of-line characters interpreted, read
the entire file into a string all at once with the file object’s read method and print it:
>>> open('myfile.txt').read() # Read all at once into string
'hello text file\ngoodbye text file\n'
>>> print(open('myfile.txt').read()) # User-friendly display
hello text file
goodbye text file
And if you want to scan a text file line by line, file iterators are often your best option:
>>> for line in open('myfile.txt'): # Use file iterators, not reads
...     print(line, end='')
...
hello text file
goodbye text file
When coded this way, the temporary file object created by open will automatically read
and return one line on each loop iteration. This form is usually easiest to code, good
on memory use, and may be faster than some other options (depending on many vari-
ables, of course). Since we haven’t reached statements or iterators yet, though, you’ll
have to wait until Chapter 14 for a more complete explanation of this code.
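For contrast with the iterator form, here is a sketch of the manual loop it replaces, using the empty-string end-of-file signal described earlier (statement details again await Chapter 13):
myfile = open('myfile.txt')
while True:
    line = myfile.readline()
    if not line: break                   # '' means end-of-file
    print(line, end='')                  # Line already ends in \n
myfile.close()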
Windows users: As mentioned in Chapter 7, open accepts Unix-style for-
ward slashes in place of backward slashes on Windows, so any of the
following forms work for directory paths—raw strings, forward slashes,
or doubled-up backslashes:
>>> open(r'C:\Python33\Lib\pdb.py').readline()
'#! /usr/bin/env python3\n'
>>> open('C:/Python33/Lib/pdb.py').readline()
'#! /usr/bin/env python3\n'
>>> open('C:\\Python33\\Lib\\pdb.py').readline()
'#! /usr/bin/env python3\n'
The raw string form in the second command is still useful to turn off
accidental escapes when you can’t control string content, and in other
contexts.
Text and Binary Files: The Short Story
Strictly speaking, the example in the prior section uses text files. In both Python 3.X
and 2.X, file type is determined by the second argument to open, the mode string—an
included “b” means binary. Python has always supported both text and binary files,
but in Python 3.X there is a sharper distinction between the two:
Text files represent content as normal str strings, perform Unicode encoding and
decoding automatically, and perform end-of-line translation by default.
Binary files represent content as a special bytes string type and allow programs to
access file content unaltered.
In contrast, Python 2.X text files handle both 8-bit text and binary data, and a special
string type and file interface (unicode strings and codecs.open) handles Unicode text.
The differences in Python 3.X stem from the fact that simple and Unicode text have
been merged in the normal string type—which makes sense, given that all text is Uni-
code, including ASCII and other 8-bit encodings.
Because most programmers deal only with ASCII text, they can get by with the basic
text file interface used in the prior example, and normal strings. All strings are techni-
cally Unicode in 3.X, but ASCII users will not generally notice. In fact, text files and
strings work the same in 3.X and 2.X if your script’s scope is limited to such simple
forms of text.
If you need to handle internationalized applications or byte-oriented data, though, the
distinction in 3.X impacts your code (usually for the better). In general, you must use
bytes strings for binary files, and normal str strings for text files. Moreover, because
text files implement Unicode encodings, you should not open a binary data file in text
mode—decoding its content to Unicode text will likely fail.
Let’s look at an example. When you read a binary data file you get back a bytes object
—a sequence of small integers that represent absolute byte values (which may or may
not correspond to characters), which looks and feels almost exactly like a normal string.
In Python 3.X, and assuming an existing binary file:
>>> data = open('data.bin', 'rb').read() # Open binary file: rb=read binary
>>> data # bytes string holds binary data
b'\x00\x00\x00\x07spam\x00\x08'
>>> data[4:8] # Act like strings
b'spam'
>>> data[4:8][0] # But really are small 8-bit integers
115
>>> bin(data[4:8][0]) # Python 3.X/2.6+ bin() function
'0b1110011'
In addition, binary files do not perform any end-of-line translation on data; text files by
default map all forms to and from \n when written and read and implement Unicode
encodings on transfers in 3.X. Binary files like this one work the same in Python 2.X,
but byte strings are simply normal strings and have no leading b when displayed, and
text files must use the codecs module to add Unicode processing.
Per the note at the start of this chapter, though, that’s as much as we’re going to say
about Unicode text and binary data files here, and just enough to understand upcoming
examples in this chapter. Since the distinction is of marginal interest to many Python
programmers, we’ll defer to the files preview in Chapter 4 for a quick tour and postpone
the full story until Chapter 37. For now, let’s move on to some more substantial file
examples to demonstrate a few common use cases.
Storing Python Objects in Files: Conversions
Our next example writes a variety of Python objects into a text file on multiple lines.
Notice that it must convert objects to strings using conversion tools. Again, file data is
always strings in our scripts, and write methods do not do any automatic to-string
formatting for us (for space, I’m omitting byte-count return values from write methods
from here on):
>>> X, Y, Z = 43, 44, 45 # Native Python objects
>>> S = 'Spam' # Must be strings to store in file
>>> D = {'a': 1, 'b': 2}
>>> L = [1, 2, 3]
>>>
>>> F = open('datafile.txt', 'w') # Create output text file
>>> F.write(S + '\n') # Terminate lines with \n
>>> F.write('%s,%s,%s\n' % (X, Y, Z)) # Convert numbers to strings
>>> F.write(str(L) + '$' + str(D) + '\n') # Convert and separate with $
>>> F.close()
Once we have created our file, we can inspect its contents by opening it and reading it
into a string (strung together as a single operation here). Notice that the interactive
echo gives the exact byte contents, while the print operation interprets embedded end-
of-line characters to render a more user-friendly display:
>>> chars = open('datafile.txt').read() # Raw string display
>>> chars
"Spam\n43,44,45\n[1, 2, 3]${'a': 1, 'b': 2}\n"
>>> print(chars) # User-friendly display
Spam
43,44,45
[1, 2, 3]${'a': 1, 'b': 2}
We now have to use other conversion tools to translate from the strings in the text file
to real Python objects. As Python never converts strings to numbers (or other types of
objects) automatically, this is required if we need to gain access to normal object tools
like indexing, addition, and so on:
>>> F = open('datafile.txt') # Open again
>>> line = F.readline() # Read one line
>>> line
'Spam\n'
>>> line.rstrip() # Remove end-of-line
'Spam'
For this first line, we used the string rstrip method to get rid of the trailing end-of-line
character; a line[:-1] slice would work, too, but only if we can be sure all lines end in
the \n character (the last line in a file sometimes does not).
So far, we’ve read the line containing the string. Now let’s grab the next line, which
contains numbers, and parse out (that is, extract) the objects on that line:
>>> line = F.readline() # Next line from file
>>> line # It's a string here
'43,44,45\n'
>>> parts = line.split(',') # Split (parse) on commas
>>> parts
['43', '44', '45\n']
We used the string split method here to chop up the line on its comma delimiters; the
result is a list of substrings containing the individual numbers. We still must convert
from strings to integers, though, if we wish to perform math on these:
>>> int(parts[1]) # Convert from string to int
44
>>> numbers = [int(P) for P in parts] # Convert all in list at once
>>> numbers
[43, 44, 45]
As we have learned, int translates a string of digits into an integer object, and the list
comprehension expression introduced in Chapter 4 can apply the call to each item in
our list all at once (you’ll find more on list comprehensions later in this book). Notice
that we didn’t have to run rstrip to delete the \n at the end of the last part; int and
some other converters quietly ignore whitespace around digits.
Finally, to convert the stored list and dictionary in the third line of the file, we can run
them through eval, a built-in function that treats a string as a piece of executable pro-
gram code (technically, a string containing a Python expression):
>>> line = F.readline()
>>> line
"[1, 2, 3]${'a': 1, 'b': 2}\n"
>>> parts = line.split('$') # Split (parse) on $
>>> parts
['[1, 2, 3]', "{'a': 1, 'b': 2}\n"]
>>> eval(parts[0]) # Convert to any object type
[1, 2, 3]
>>> objects = [eval(P) for P in parts] # Do same for all in list
>>> objects
[[1, 2, 3], {'a': 1, 'b': 2}]
Because the end result of all this parsing and converting is a list of normal Python objects
instead of strings, we can now apply list and dictionary operations to them in our script.
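One aside before the next section: eval, used here, runs arbitrary expression code, which has security implications covered ahead. For literal data like ours, the standard library's ast.literal_eval is a more constrained alternative that accepts only Python literals:
>>> import ast
>>> ast.literal_eval("{'a': 1, 'b': 2}")     # Literals only: safer than eval
{'a': 1, 'b': 2}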
Storing Native Python Objects: pickle
Using eval to convert from strings to objects, as demonstrated in the preceding code,
is a powerful tool. In fact, sometimes it’s too powerful. eval will happily run any Python
expression—even one that might delete all the files on your computer, given the nec-
essary permissions! If you really want to store native Python objects, but you can’t trust
the source of the data in the file, Python’s standard library pickle module is ideal.
The pickle module is a more advanced tool that allows us to store almost any Python
object in a file directly, with no to- or from-string conversion requirement on our part.
It’s like a super-general data formatting and parsing utility. To store a dictionary in a
file, for instance, we pickle it directly:
>>> D = {'a': 1, 'b': 2}
>>> F = open('datafile.pkl', 'wb')
>>> import pickle
>>> pickle.dump(D, F) # Pickle any object to file
>>> F.close()
Then, to get the dictionary back later, we simply use pickle again to re-create it:
>>> F = open('datafile.pkl', 'rb')
>>> E = pickle.load(F) # Load any object from file
>>> E
{'a': 1, 'b': 2}
We get back an equivalent dictionary object, with no manual splitting or converting
required. The pickle module performs what is known as object serialization—convert-
ing objects to and from strings of bytes—but requires very little work on our part. In
fact, pickle internally translates our dictionary to a string form, though it’s not much
to look at (and may vary if we pickle in other data protocol modes):
>>> open('datafile.pkl', 'rb').read() # Format is prone to change!
b'\x80\x03}q\x00(X\x01\x00\x00\x00bq\x01K\x02X\x01\x00\x00\x00aq\x02K\x01u.'
Because pickle can reconstruct the object from this format, we don’t have to deal with
it ourselves. For more on the pickle module, see the Python standard library manual,
or import pickle and pass it to help interactively. While you’re exploring, also take a
look at the shelve module. shelve is a tool that uses pickle to store Python objects in
an access-by-key filesystem, which is beyond our scope here (though you will get to see
an example of shelve in action in Chapter 28, and other pickle examples in Chap-
ter 31 and Chapter 37).
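As a very quick preview of that tool, a shelve acts like a dictionary of pickled objects that persists between program runs; this sketch uses an illustrative filename:
import shelve
db = shelve.open('recdb')                # Pickles objects to a keyed file
db['bob'] = {'a': 1, 'b': 2}             # Store by string key
db.close()

db = shelve.open('recdb')
print(db['bob'])                         # Fetch unpickles automatically
db.close()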
Notice that I opened the file used to store the pickled object in binary
mode; binary mode is always required in Python 3.X, because the pickler
creates and uses a bytes string object, and these objects imply binary-
mode files (text-mode files imply str strings in 3.X). In earlier Pythons
it’s OK to use text-mode files for protocol 0 (the default, which creates
ASCII text), as long as text mode is used consistently; higher protocols
require binary-mode files. Python 3.X’s default protocol is 3 (binary),
but it creates bytes even for protocol 0. See Chapter 28, Chapter 31, and
Chapter 37; Python’s library manual; or reference books for more details
on and examples of pickled data.
Python 2.X also has a cPickle module, which is an optimized version of
pickle that can be imported directly for speed. Python 3.X renames this
module _pickle and uses it automatically in pickle—scripts simply im-
port pickle and let Python optimize itself.
Storing Python Objects in JSON Format
The prior section’s pickle module translates nearly arbitrary Python objects to a pro-
prietary format developed specifically for Python, and honed for performance over
many years. JSON is a newer and emerging data interchange format, which is both
programming-language-neutral and supported by a variety of systems. MongoDB, for
instance, stores data in a JSON document database (using a binary JSON format).
JSON does not support as broad a range of Python object types as pickle, but its
portability is an advantage in some contexts, and it represents another way to serialize
a specific category of Python objects for storage and transmission. Moreover, because
JSON is so close to Python dictionaries and lists in syntax, the translation to and from
Python objects is trivial, and is automated by the json standard library module.
For example, a Python dictionary with nested structures is very similar to JSON data,
though Python’s variables and expressions support richer structuring options (any part
of the following can be an arbitrary expression in Python code):
>>> name = dict(first='Bob', last='Smith')
>>> rec = dict(name=name, job=['dev', 'mgr'], age=40.5)
>>> rec
{'job': ['dev', 'mgr'], 'name': {'last': 'Smith', 'first': 'Bob'}, 'age': 40.5}
The final dictionary format displayed here is a valid literal in Python code, and almost
passes for JSON when printed as is, but the json module makes the translation official
—here translating Python objects to and from a JSON serialized string representation
in memory:
>>> import json
>>> json.dumps(rec)
'{"job": ["dev", "mgr"], "name": {"last": "Smith", "first": "Bob"}, "age": 40.5}'
>>> S = json.dumps(rec)
>>> S
'{"job": ["dev", "mgr"], "name": {"last": "Smith", "first": "Bob"}, "age": 40.5}'
>>> O = json.loads(S)
>>> O
{'job': ['dev', 'mgr'], 'name': {'last': 'Smith', 'first': 'Bob'}, 'age': 40.5}
>>> O == rec
True
It’s similarly straightforward to translate Python objects to and from JSON data strings
in files. Prior to being stored in a file, your data is simply Python objects; the JSON
module recreates them from the JSON textual representation when it loads it from the
file:
>>> json.dump(rec, fp=open('testjson.txt', 'w'), indent=4)
>>> print(open('testjson.txt').read())
{
    "job": [
        "dev",
        "mgr"
    ],
    "name": {
        "last": "Smith",
        "first": "Bob"
    },
    "age": 40.5
}
>>> P = json.load(open('testjson.txt'))
>>> P
{'job': ['dev', 'mgr'], 'name': {'last': 'Smith', 'first': 'Bob'}, 'age': 40.5}
Once you’ve translated from JSON text, you process the data using normal Python
object operations in your script. For more details on JSON-related topics, see Python’s
library manuals and search the Web.
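For instance, dumps also accepts keywords that control the generated text, per the json module's interface; continuing the session above:
>>> json.dumps(rec, sort_keys=True)
'{"age": 40.5, "job": ["dev", "mgr"], "name": {"first": "Bob", "last": "Smith"}}'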
Note that strings are all Unicode in JSON to support text drawn from international
character sets, so you’ll see a leading u on strings after translating from JSON data in
Python 2.X (but not in 3.X); this is just the syntax of Unicode objects in 2.X, as intro-
duced in Chapter 4 and Chapter 7, and covered in full in Chapter 37. Because Unicode
text strings support all the usual string operations, the difference is negligible to your
code while text resides in memory; the distinction matters most when transferring text
to and from files, and then usually only for non-ASCII types of text where encodings
come into play.
There is also support in the Python world for translating objects to and
from XML, a text format used in Chapter 37; see the web for details. For
another semirelated tool that deals with formatted data files, see the
standard library’s csv module. It parses and creates CSV (comma-sep-
arated value) data in files and strings. This doesn’t map as directly to
Python objects, but is another common data exchange format:
>>> import csv
>>> rdr = csv.reader(open('csvdata.txt'))
>>> for row in rdr: print(row)
...
['a', 'bbb', 'cc', 'dddd']
['11', '22', '33', '44']
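The module writes this format too; the following sketch would create the file read back above (newline='' is the documented 3.X recommendation for csv output files):
import csv
fout = open('csvdata.txt', 'w', newline='')
wtr = csv.writer(fout)
wtr.writerow(['a', 'bbb', 'cc', 'dddd'])     # One list per file line
wtr.writerow(['11', '22', '33', '44'])
fout.close()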
Storing Packed Binary Data: struct
One other file-related note before we move on: some advanced applications also need
to deal with packed binary data, created perhaps by a C language program or a network
connection. Python’s standard library includes a tool to help in this domain—the
struct module knows how to both compose and parse packed binary data. In a sense,
this is another data-conversion tool that interprets strings in files as binary data.
We saw an overview of this tool in Chapter 4, but let’s take another quick look here
for more perspective. To create a packed binary data file, open it in 'wb' (write binary)
mode, and pass struct a format string and some Python objects. The format string used
here means pack as a 4-byte integer, a 4-character string (which must be a bytes string
as of Python 3.2), and a 2-byte integer, all in big-endian form (other format codes handle
padding bytes, floating-point numbers, and more):
>>> F = open('data.bin', 'wb') # Open binary output file
>>> import struct
>>> data = struct.pack('>i4sh', 7, b'spam', 8) # Make packed binary data
>>> data
b'\x00\x00\x00\x07spam\x00\x08'
>>> F.write(data) # Write byte string
>>> F.close()
Python creates a binary bytes data string, which we write out to the file normally—this
one consists mostly of nonprintable characters printed in hexadecimal escapes, and is
the same binary file we met earlier. To parse the values out to normal Python objects,
we simply read the string back and unpack it using the same format string. Python
extracts the values into normal Python objects—integers and a string:
>>> F = open('data.bin', 'rb')
>>> data = F.read() # Get packed binary data
>>> data
b'\x00\x00\x00\x07spam\x00\x08'
>>> values = struct.unpack('>i4sh', data) # Convert to Python objects
>>> values
(7, b'spam', 8)
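The same module can also report the size implied by a format string, which can help when reading fixed-size records (4 + 4 + 2 bytes here):
>>> struct.calcsize('>i4sh')                 # Bytes per packed record
10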
Binary data files are advanced and somewhat low-level tools that we won’t cover in
more detail here; for more help, see the struct coverage in Chapter 37, consult the
Python library manual, or import struct and pass it to the help function interactively.
Also note that you can use the binary file-processing modes 'wb' and 'rb' to process a
simpler binary file, such as an image or audio file, as a whole without having to unpack
its contents; in such cases your code might pass it unparsed to other files or tools.
File Context Managers
You’ll also want to watch for Chapter 34’s discussion of the file’s context manager
support, new as of Python 3.0 and 2.6. Though more a feature of exception processing
than files themselves, it allows us to wrap file-processing code in a logic layer that
ensures that the file will be closed (and if needed, have its output flushed to disk)
automatically on exit, instead of relying on the auto-close during garbage collection:
with open(r'C:\code\data.txt') as myfile:    # See Chapter 34 for details
    for line in myfile:
        ...use line here...
The try/finally statement that we’ll also study in Chapter 34 can provide similar
functionality, but at some cost in extra code—three extra lines, to be precise (though
we can often avoid both options and let Python close files for us automatically):
myfile = open(r'C:\code\data.txt')
try:
    for line in myfile:
        ...use line here...
finally:
    myfile.close()
The with context manager scheme ensures release of system resources in all Pythons,
and may be more useful for output files to guarantee buffer flushes; unlike the more
general try, though, it is also limited to objects that support its protocol. Since both
these options require more information than we have yet obtained, however, we’ll
postpone details until later in this book.
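One preview detail that may prove useful: in Python 2.7, 3.1, and later, a single with statement can manage multiple files at once; the filenames in this sketch are illustrative:
with open('data.txt') as fin, open('copy.txt', 'w') as fout:
    for line in fin:
        fout.write(line)                 # Both files closed on exit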
Other File Tools
There are additional, more specialized file methods shown in Table 9-2, and even more
that are not in the table. For instance, as mentioned earlier, seek resets your current
position in a file (the next read or write happens at that position), flush forces buffered
output to be written out to disk without closing the connection (by default, files are
always buffered), and so on.
The Python standard library manual and the reference books described in the preface
provide complete lists of file methods; for a quick look, run a dir or help call interac-
tively, passing in an open file object (in Python 2.X but not 3.X, you can pass in the
name file instead). For more file-processing examples, watch for the sidebar “Why
You Will Care: File Scanners” on page 400 in Chapter 13. It sketches common file-
scanning loop code patterns with statements we have not covered enough yet to use
here.
Also, note that although the open function and the file objects it returns are your main
interface to external files in a Python script, there are additional file-like tools in the
Python toolset. Among these:
Standard streams
Preopened file objects in the sys module, such as sys.stdout (see “Print Opera-
tions” on page 358 in Chapter 11 for details)
Descriptor files in the os module
Integer file handles that support lower-level tools such as file locking (see also the
“x” mode in Python 3.3’s open for exclusive creation)
Sockets, pipes, and FIFOs
File-like objects used to synchronize processes or communicate over networks
Access-by-key files known as “shelves”
Used to store unaltered and pickled Python objects directly, by key (used in Chap-
ter 28)
Shell command streams
Tools such as os.popen and subprocess.Popen that support spawning shell com-
mands and reading and writing to their standard streams (see Chapter 13 and
Chapter 21 for examples)
The third-party open source domain offers even more file-like tools, including support
for communicating with serial ports in the PySerial extension and interactive programs
in the pexpect system. See applications-focused Python texts and the Web at large for
additional information on file-like tools.
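As a tiny sketch of the first of these, the preopened standard streams are normal file objects; print itself ultimately writes to sys.stdout:
import sys
sys.stdout.write('spam\n')               # Roughly what print('spam') does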
Version skew note: In Python 2.X, the built-in name open is essentially a
synonym for the name file, and you may technically open files by call-
ing either open or file (though open is generally preferred for opening).
In Python 3.X, the name file is no longer available, because of its re-
dundancy with open.
Python 2.X users may also use the name file as the file object type, in
order to customize files with object-oriented programming (described
later in this book). In Python 3.X, files have changed radically. The
classes used to implement file objects live in the standard library module
io. See this module’s documentation or code for the classes it makes
available for customization, and run a type(F) call on an open file F for
hints.
Core Types Review and Summary
Now that we’ve seen all of Python’s core built-in types in action, let’s wrap up our
object types tour by reviewing some of the properties they share. Table 9-3 classifies
all the major types we’ve seen so far according to the type categories introduced earlier.
Here are some points to remember:
Objects share operations according to their category; for instance, sequence objects
—strings, lists, and tuples—all share sequence operations such as concatenation,
length, and indexing.
Only mutable objects—lists, dictionaries, and sets—may be changed in place; you
cannot change numbers, strings, or tuples in place.
Files export only methods, so mutability doesn’t really apply to them—their state
may be changed when they are processed, but this isn’t quite the same as Python
core type mutability constraints.
“Numbers” in Table 9-3 includes all number types: integer (and the distinct long
integer in 2.X), floating point, complex, decimal, and fraction.
“Strings” in Table 9-3 includes str, as well as bytes in 3.X and unicode in 2.X; the
bytearray string type in 3.X, 2.6, and 2.7 is mutable.
Sets are something like the keys of a valueless dictionary, but they don’t map to
values and are not ordered, so sets are neither a mapping nor a sequence type;
frozenset is an immutable variant of set.
In addition to type category operations, as of Python 2.6 and 3.0 all the types in
Table 9-3 have callable methods, which are generally specific to their type.
Table 9-3. Object classifications
Object type Category Mutable?
Numbers (all) Numeric No
Strings (all) Sequence No
Lists Sequence Yes
Dictionaries Mapping Yes
Tuples Sequence No
Files Extension N/A
Sets Set Yes
Frozenset Set No
bytearray Sequence Yes
Why You Will Care: Operator Overloading
In Part VI of this book, we’ll see that objects we implement with classes can pick and
choose from these categories arbitrarily. For instance, if we want to provide a new kind
of specialized sequence object that is consistent with built-in sequences, we can code
a class that overloads things like indexing and concatenation:
class MySequence:
    def __getitem__(self, index):
        ...                              # Called on self[index], others
    def __add__(self, other):
        ...                              # Called on self + other
    def __iter__(self):
        ...                              # Preferred in iterations
and so on. We can also make the new object mutable or not by selectively implementing
methods called for in-place change operations (e.g., __setitem__ is called on
self[index]=value assignments). Although it’s beyond this book’s scope, it’s also pos-
sible to implement new objects in an external language like C as C extension types. For
these, we fill in C function pointer slots to choose between number, sequence, and
mapping operation sets.
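To make the preceding skeleton concrete, here is a minimal runnable variant with the method bodies filled in; MyWrap and its behavior are illustrative assumptions, not a class used elsewhere in this book:
class MyWrap:
    def __init__(self, items):
        self.items = list(items)
    def __getitem__(self, index):        # self[i]; also drives iteration here
        return self.items[index]
    def __add__(self, other):            # self + other makes a new wrapper
        return MyWrap(self.items + list(other))

x = MyWrap([1, 2, 3])
print(x[0])                              # 1, via __getitem__
print(list(x + [4]))                     # [1, 2, 3, 4], via __add__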
Object Flexibility
This part of the book introduced a number of compound object types—collections
with components. In general:
Lists, dictionaries, and tuples can hold any kind of object.
Sets can contain any type of immutable object.
Lists, dictionaries, and tuples can be arbitrarily nested.
Lists, dictionaries, and sets can dynamically grow and shrink.
Because they support arbitrary structures, Python’s compound object types are good
at representing complex information in programs. For example, values in dictionaries
may be lists, which may contain tuples, which may contain dictionaries, and so on. The
nesting can be as deep as needed to model the data to be processed.
Let’s look at an example of nesting. The following interaction defines a tree of nested
compound sequence objects, shown in Figure 9-1. To access its components, you may
include as many index operations as required. Python evaluates the indexes from left
to right, and fetches a reference to a more deeply nested object at each step. Fig-
ure 9-1 may be a pathologically complicated data structure, but it illustrates the syntax
used to access nested objects in general:
>>> L = ['abc', [(1, 2), ([3], 4)], 5]
>>> L[1]
[(1, 2), ([3], 4)]
>>> L[1][1]
([3], 4)
>>> L[1][1][0]
[3]
>>> L[1][1][0][0]
3
References Versus Copies
Chapter 6 mentioned that assignments always store references to objects, not copies
of those objects. In practice, this is usually what you want. Because assignments can
generate multiple references to the same object, though, it’s important to be aware that
changing a mutable object in place may affect other references to the same object else-
where in your program. If you don’t want such behavior, you’ll need to tell Python to
copy the object explicitly.
We studied this phenomenon in Chapter 6, but it can become more subtle when larger
objects of the sort we’ve explored since then come into play. For instance, the following
example creates a list assigned to X, and another list assigned to L that embeds a refer-
ence back to list X. It also creates a dictionary D that contains another reference back to
list X:
>>> X = [1, 2, 3]
>>> L = ['a', X, 'b'] # Embed references to X's object
>>> D = {'x':X, 'y':2}
At this point, there are three references to the first list created: from the name X, from
inside the list assigned to L, and from inside the dictionary assigned to D. The situation
is illustrated in Figure 9-2.
Because lists are mutable, changing the shared list object from any of the three refer-
ences also changes what the other two reference:
>>> X[1] = 'surprise' # Changes all three references!
>>> L
['a', [1, 'surprise', 3], 'b']
>>> D
{'x': [1, 'surprise', 3], 'y': 2}
References are a higher-level analog of pointers in other languages that are always fol-
lowed when used. Although you can’t grab hold of the reference itself, it’s possible to
Figure 9-1. A nested object tree with the offsets of its components, created by running the literal
expression [‘abc’, [(1, 2), ([3], 4)], 5]. Syntactically nested objects are internally represented as
references (i.e., pointers) to separate pieces of memory.
store the same reference in more than one place (variables, lists, and so on). This is a
feature—you can pass a large object around a program without generating expensive
copies of it along the way. If you really do want copies, however, you can request them:
Slice expressions with empty limits (L[:]) copy sequences.
The dictionary, set, and list copy method (X.copy()) copies a dictionary, set, or list
(the list’s copy is new as of 3.3).
Some built-in functions, such as list and dict, make copies (list(L), dict(D),
set(S)).
The copy standard library module makes full copies when needed.
For example, say you have a list and a dictionary, and you don’t want their values to
be changed through other variables:
>>> L = [1,2,3]
>>> D = {'a':1, 'b':2}
To prevent this, simply assign copies to the other variables, not references to the same
objects:
>>> A = L[:] # Instead of A = L (or list(L))
>>> B = D.copy() # Instead of B = D (ditto for sets)
This way, changes made from the other variables will change the copies, not the orig-
inals:
>>> A[1] = 'Ni'
>>> B['c'] = 'spam'
>>>
>>> L, D
([1, 2, 3], {'a': 1, 'b': 2})
Figure 9-2. Shared object references: because the list referenced by variable X is also referenced from
within the objects referenced by L and D, changing the shared list from X makes it look different from
L and D, too.
>>> A, B
([1, 'Ni', 3], {'a': 1, 'c': 'spam', 'b': 2})
In terms of our original example, you can avoid the reference side effects by slicing the
original list instead of simply naming it:
>>> X = [1, 2, 3]
>>> L = ['a', X[:], 'b'] # Embed copies of X's object
>>> D = {'x':X[:], 'y':2}
This changes the picture in Figure 9-2: L and D will now point to different lists than
X. The net effect is that changes made through X will impact only X, not L and D; similarly,
changes to L or D will not impact X.
One final note on copies: empty-limit slices and the dictionary copy method only make
top-level copies; that is, they do not copy nested data structures, if any are present. If
you need a complete, fully independent copy of a deeply nested data structure (like the
various record structures we’ve coded in recent chapters), use the standard copy mod-
ule, introduced in Chapter 6:
import copy
X = copy.deepcopy(Y) # Fully copy an arbitrarily nested object Y
This call recursively traverses objects to copy all their parts. This is a much more rare
case, though, which is why you have to say more to use this scheme. References are
usually what you will want; when they are not, slices and copy methods are usually as
much copying as you’ll need to do.
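A brief demonstration of the difference, using an illustrative nested record:
import copy
Y = {'name': 'Bob', 'jobs': ['dev', 'mgr']}
shallow = Y.copy()                       # Top level only: nested list shared
deep = copy.deepcopy(Y)                  # Fully independent copy
Y['jobs'].append('mgr2')
print(shallow['jobs'])                   # ['dev', 'mgr', 'mgr2'] -- shared
print(deep['jobs'])                      # ['dev', 'mgr'] -- unaffected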
Comparisons, Equality, and Truth
All Python objects also respond to comparisons: tests for equality, relative magnitude,
and so on. Python comparisons always inspect all parts of compound objects until a
result can be determined. In fact, when nested objects are present, Python automatically
traverses data structures to apply comparisons from left to right, and as deeply as
needed. The first difference found along the way determines the comparison result.
This is sometimes called a recursive comparison—the same comparison requested on
the top-level objects is applied to each of the nested objects, and to each of their nested
objects, and so on, until a result is found. Later in this book—in Chapter 19—we’ll see
how to write recursive functions of our own that work similarly on nested structures.
For now, think about comparing all the linked pages at two websites if you want a
metaphor for such structures, and a reason for writing recursive functions to process
them.
In terms of core types, the recursion is automatic. For instance, a comparison of list
objects compares all their components automatically until a mismatch is found or the
end is reached:
>>> L1 = [1, ('a', 3)] # Same value, unique objects
>>> L2 = [1, ('a', 3)]
>>> L1 == L2, L1 is L2 # Equivalent? Same object?
(True, False)
Here, L1 and L2 are assigned lists that are equivalent but distinct objects. As a review
of what we saw in Chapter 6, because of the nature of Python references, there are two
ways to test for equality:
The == operator tests value equivalence. Python performs an equivalence test,
comparing all nested objects recursively.
The is operator tests object identity. Python tests whether the two are really the
same object (i.e., live at the same address in memory).
In the preceding example, L1 and L2 pass the == test (they have equivalent values because
all their components are equivalent) but fail the is check (they reference two different
objects, and hence two different pieces of memory). Notice what happens for short
strings, though:
>>> S1 = 'spam'
>>> S2 = 'spam'
>>> S1 == S2, S1 is S2
(True, True)
Here, we should again have two distinct objects that happen to have the same value:
== should be true, and is should be false. But because Python internally caches and
reuses some strings as an optimization, there really is just a single string 'spam' in
memory, shared by S1 and S2; hence, the is identity test reports a true result. To trigger
the normal behavior, we need to use longer strings:
>>> S1 = 'a longer string'
>>> S2 = 'a longer string'
>>> S1 == S2, S1 is S2
(True, False)
Of course, because strings are immutable, the object caching mechanism is irrelevant
to your code—strings can’t be changed in place, regardless of how many variables refer
to them. If identity tests seem confusing, see Chapter 6 for a refresher on object refer-
ence concepts.
As a rule of thumb, the == operator is what you will want to use for almost all equality
checks; is is reserved for highly specialized roles. We’ll see cases later in the book where
both operators are put to use.
Relative magnitude comparisons are also applied recursively to nested data structures:
>>> L1 = [1, ('a', 3)]
>>> L2 = [1, ('a', 2)]
>>> L1 < L2, L1 == L2, L1 > L2 # Less, equal, greater: tuple of results
(False, False, True)
Here, L1 is greater than L2 because the nested 3 is greater than 2. By now you should
know that the result of the last line is really a tuple of three objects—the results of the
three expressions typed (an example of a tuple without its enclosing parentheses).
More specifically, Python compares types as follows:
Numbers are compared by relative magnitude, after conversion to the common
highest type if needed.
Strings are compared lexicographically (by the character set code point values re-
turned by ord), and character by character until the end or first mismatch ("abc"
< "ac").
Lists and tuples are compared by comparing each component from left to right,
and recursively for nested structures, until the end or first mismatch ([2] > [1, 2]).
Sets are equal if both contain the same items (formally, if each is a subset of the
other), and set relative magnitude comparisons apply subset and superset tests.
Dictionaries compare as equal if their sorted (key, value) lists are equal. Relative
magnitude comparisons are not supported for dictionaries in Python 3.X, but they
work in 2.X as though comparing sorted (key, value) lists.
Nonnumeric mixed-type magnitude comparisons (e.g., 1 < 'spam') are errors in
Python 3.X. They are allowed in Python 2.X, but use a fixed but arbitrary ordering
rule based on type name string. By proxy, this also applies to sorts, which use
comparisons internally: nonnumeric mixed-type collections cannot be sorted in
3.X.
In general, comparisons of structured objects proceed as though you had written the
objects as literals and compared all their parts one at a time from left to right. In later
chapters, we’ll see other object types that can change the way they get compared.
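A few interactive spot checks of these rules, run here in Python 3.X:
>>> 'abc' < 'ac'                         # Strings: code points, left to right
True
>>> [2] > [1, 2]                         # Sequences: first mismatch decides
True
>>> {1, 2} < {1, 2, 3}                   # Sets: proper subset test
True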
Python 2.X and 3.X mixed-type comparisons and sorts
Per the last point in the preceding section’s list, the change in Python 3.X for nonnu-
meric mixed-type comparisons applies to magnitude tests, not equality, but it also ap-
plies by proxy to sorting, which does magnitude testing internally. In Python 2.X these
all work, though mixed types compare by an arbitrary ordering:
c:\code> c:\python27\python
>>> 11 == '11' # Equality does not convert non-numbers
False
>>> 11 >= '11' # 2.X compares by type name string: int, str
False
>>> ['11', '22'].sort() # Ditto for sorts
>>> [11, '11'].sort()
But Python 3.X disallows mixed-type magnitude testing, except numeric types and
manually converted types:
c:\code> c:\python33\python
>>> 11 == '11' # 3.X: equality works but magnitude does not
False
>>> 11 >= '11'
TypeError: unorderable types: int() > str()
>>> ['11', '22'].sort() # Ditto for sorts
>>> [11, '11'].sort()
TypeError: unorderable types: str() < int()
>>> 11 > 9.123 # Mixed numbers convert to highest type
True
>>> str(11) >= '11', 11 >= int('11') # Manual conversions force the issue
(True, True)
Python 2.X and 3.X dictionary comparisons
The second-to-last point in the preceding section also merits illustration. In Python
2.X, dictionaries support magnitude comparisons, as though you were comparing sor-
ted key/value lists:
C:\code> c:\python27\python
>>> D1 = {'a':1, 'b':2}
>>> D2 = {'a':1, 'b':3}
>>> D1 == D2 # Dictionary equality: 2.X + 3.X
False
>>> D1 < D2 # Dictionary magnitude: 2.X only
True
As noted briefly in Chapter 8, though, magnitude comparisons for dictionaries are
removed in Python 3.X because they incur too much overhead when equality is desired
(equality uses an optimized scheme in 3.X that doesn’t literally compare sorted key/
value lists):
C:\code> c:\python33\python
>>> D1 = {'a':1, 'b':2}
>>> D2 = {'a':1, 'b':3}
>>> D1 == D2
False
>>> D1 < D2
TypeError: unorderable types: dict() < dict()
The alternative in 3.X is to either write loops to compare values by key, or compare the
sorted key/value lists manually—the items dictionary methods and sorted built-in suf-
fice:
>>> list(D1.items())
[('b', 2), ('a', 1)]
>>> sorted(D1.items())
[('a', 1), ('b', 2)]
>>>
>>> sorted(D1.items()) < sorted(D2.items()) # Magnitude test in 3.X
True
>>> sorted(D1.items()) > sorted(D2.items())
False
This takes more code, but in practice, most programs requiring this behavior will de-
velop more efficient ways to compare data in dictionaries than either this workaround
or the original behavior in Python 2.X.
The Meaning of True and False in Python
Notice that the test results returned in the last two examples represent true and false
values. They print as the words True and False, but now that we’re using logical tests
like these in earnest, I should be a bit more formal about what these names really mean.
In Python, as in most programming languages, an integer 0 represents false, and an
integer 1 represents true. In addition, though, Python recognizes any empty data struc-
ture as false and any nonempty data structure as true. More generally, the notions of
true and false are intrinsic properties of every object in Python—each object is either
true or false, as follows:
Numbers are false if zero, and true otherwise.
Other objects are false if empty, and true otherwise.
Table 9-4 gives examples of true and false values of objects in Python.
Table 9-4. Example object truth values
Object Value
"spam" True
"" False
[1, 2] True
[] False
{'a': 1} True
{} False
1 True
0.0 False
None False
As one application, because objects are true or false themselves, it’s common to see
Python programmers code tests like if X:, which, assuming X is a string, is the same
as if X != '':. In other words, you can test the object itself to see if it contains anything,
instead of comparing it to an empty, and therefore false, object of the same type (more
on if statements in the next chapter).
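For example, this sketch previews the idiom (full if statement coverage is ahead):
X = ''
if X:                                    # False: X is empty
    print('X is nonempty')
else:
    print('X is empty')                  # This branch runs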
The None object
As shown in the last row in Table 9-4, Python also provides a special object called
None, which is always considered to be false. None was introduced briefly in Chap-
ter 4; it is the only value of a special data type in Python and typically serves as an empty
placeholder (much like a NULL pointer in C).
For example, recall that for lists you cannot assign to an offset unless that offset already
exists—the list does not magically grow if you attempt an out-of-bounds assignment.
To preallocate a 100-item list such that you can add to any of the 100 offsets, you can
fill it with None objects:
>>> L = [None] * 100
>>>
>>> L
[None, None, None, None, None, None, None, ... ]
This doesn’t limit the size of the list (it can still grow and shrink later), but simply
presets an initial size to allow for future index assignments. You could initialize a list
with zeros the same way, of course, but best practice dictates using None if the type of
the list’s contents is variable or not yet known.
Keep in mind that None does not mean “undefined.” That is, None is something, not
nothing (despite its name!)—it is a real object and a real piece of memory that is created
and given a built-in name by Python itself. Watch for other uses of this special object
later in the book; as we’ll learn in Part IV, it is also the default return value of functions
that don’t exit by running into a return statement with a result value.
The bool type
While we’re on the topic of truth, also keep in mind that the Python Boolean type
bool, introduced in Chapter 5, simply augments the notions of true and false in Python.
As we learned in Chapter 5, the built-in words True and False are just customized
versions of the integers 1 and 0—it’s as if these two words have been preassigned to 1
and 0 everywhere in Python. Because of the way this new type is implemented, this is
really just a minor extension to the notions of true and false already described, designed
to make truth values more explicit:
When used explicitly in truth test code, the words True and False are equivalent
to 1 and 0, but they make the programmer’s intent clearer.
Results of Boolean tests run interactively print as the words True and False, instead
of as 1 and 0, to make the type of result clearer.
You are not required to use only Boolean types in logical statements such as if; all
objects are still inherently true or false, and all the Boolean concepts mentioned in this
chapter still work as described if you use other types. Python also provides a bool built-
in function that can be used to test the Boolean value of an object if you want to make
this explicit (i.e., whether it is true—that is, nonzero or nonempty):
>>> bool(1)
True
>>> bool('spam')
True
>>> bool({})
False
In practice, though, you’ll rarely notice the Boolean type produced by logic tests, be-
cause Boolean results are used automatically by if statements and other selection tools.
We’ll explore Booleans further when we study logical statements in Chapter 12.
Python’s Type Hierarchies
As a summary and reference, Figure 9-3 sketches all the built-in object types available
in Python and their relationships. We’ve looked at the most prominent of these; most
of the other kinds of objects in Figure 9-3 correspond to program units (e.g., functions
and modules) or exposed interpreter internals (e.g., stack frames and compiled code).
The largest point to notice here is that everything in a Python system is an object type
and may be processed by your Python programs. For instance, you can pass a class to
a function, assign it to a variable, stuff it in a list or dictionary, and so on.
Type Objects
In fact, even types themselves are an object type in Python: the type of an object is an
object of type type (say that three times fast!). Seriously, a call to the built-in function
type(X) returns the type object of object X. The practical application of this is that type
objects can be used for manual type comparisons in Python if statements. However,
for reasons introduced in Chapter 4, manual type testing is usually not the right thing
to do in Python, since it limits your code’s flexibility.
One note on type names: as of Python 2.2, each core type has a new built-in name
added to support type customization through object-oriented subclassing: dict, list,
str, tuple, int, float, complex, bytes, type, set, and more. In Python 3.X these names
all reference classes, and in Python 2.X but not 3.X, file is also a type name and a syn-
onym for open. Calls to these names are really object constructor calls, not simply con-
version functions, though you can treat them as simple functions for basic usage.
In addition, the types standard library module in Python 3.X provides additional type
names for types that are not available as built-ins (e.g., the type of a function; in Python
2.X but not 3.X, this module also includes synonyms for built-in type names), and it is
possible to do type tests with the isinstance function. For example, all of the following
type tests are true:
type([1]) == type([]) # Compare to type of another list
type([1]) == list # Compare to list type name
isinstance([1], list) # Test if list or customization thereof
import types # types has names for other types
def f(): pass
type(f) == types.FunctionType
Because types can be subclassed in Python today, the isinstance technique is generally
recommended. See Chapter 32 for more on subclassing built-in types in Python 2.2
and later.
Figure 9-3. Python’s major built-in object types, organized by categories. Everything is a type of object
in Python, even the type of an object! Some extension types, such as named tuples, might belong in
this figure too, but the criteria for inclusion in the core types set are not formal.
Also in Chapter 32, we will explore how type(X) and type testing in
general apply to instances of user-defined classes. In short, in Python
3.X and for new-style classes in Python 2.X, the type of a class instance
is the class from which the instance was made. For classic classes in
Python 2.X, all class instances instead have the type “instance,” and
we must compare their __class__ attributes to determine their types
meaningfully. Since we’re not yet equipped to tackle the subject of
classes, we’ll postpone the rest of this story until Chapter 32.
Other Types in Python
Besides the core objects studied in this part of the book, and the program-unit objects
such as functions, modules, and classes that we’ll meet later, a typical Python instal-
lation has dozens of additional object types available as linked-in C extensions or
Python classes—regular expression objects, DBM files, GUI widgets, network sockets,
and so on. Depending on whom you ask, the named tuple we met earlier in this chapter
may fall in this category too (Decimal and Fraction of Chapter 5 tend to be more am-
biguous).
The main difference between these extra tools and the built-in types we’ve seen so far
is that the built-ins provide special language creation syntax for their objects (e.g., 4 for
an integer, [1,2] for a list, the open function for files, and def and lambda for functions).
Other tools are generally made available in standard library modules that you must first
import to use, and aren’t usually considered core types. For instance, to make a regular
expression object, you import re and call re.compile(). See Python’s library reference
for a comprehensive guide to all the tools available to Python programs.
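To illustrate, the following sketch builds and uses one such object; the pattern
string here is arbitrary:
>>> import re
>>> pattern = re.compile('[Ss]pam')            # Make a pattern object
>>> pattern.match('Spam and eggs').group()     # Match at the start of a string
'Spam'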
Built-in Type Gotchas
That’s the end of our look at core data types. We’ll wrap up this part of the book with
a discussion of common problems that seem to trap new users (and the occasional
expert), along with their solutions. Some of this is a review of ideas we’ve already cov-
ered, but these issues are important enough to warn about again here.
Assignment Creates References, Not Copies
Because this is such a central concept, I’ll mention it again: shared references to mutable
objects in your program can matter. For instance, in the following example, the list
object assigned to the name L is referenced both from L and from inside the list assigned
to the name M. Changing L in place changes what M references, too:
>>> L = [1, 2, 3]
>>> M = ['X', L, 'Y'] # Embed a reference to L
>>> M
['X', [1, 2, 3], 'Y']
>>> L[1] = 0 # Changes M too
>>> M
['X', [1, 0, 3], 'Y']
This effect usually becomes important only in larger programs, and shared references
are often exactly what you want. If objects change out from under you in unwanted
ways, you can avoid sharing objects by copying them explicitly. For lists, you can always
make a top-level copy by using an empty-limits slice, among other techniques described
earlier:
>>> L = [1, 2, 3]
>>> M = ['X', L[:], 'Y'] # Embed a copy of L (or list(L), or L.copy())
>>> L[1] = 0 # Changes only L, not M
>>> L
[1, 0, 3]
>>> M
['X', [1, 2, 3], 'Y']
Remember, slice limits default to 0 and the length of the sequence being sliced; if both
are omitted, the slice extracts every item in the sequence and so makes a top-level copy
(a new, unshared object).
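Keep in mind that a top-level copy still shares any objects nested inside the
copied sequence. If you need to copy every level of a nested structure, the copy
standard library module can do the job, as this chapter's quiz also notes; a brief
sketch:
>>> import copy
>>> L = [1, [2, 3]]
>>> M = copy.deepcopy(L)     # Copy all nested parts, not just the top level
>>> L[1][0] = 'surprise'     # Changing L's nested list...
>>> M                        # ...leaves M fully independent
[1, [2, 3]]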
Repetition Adds One Level Deep
Repeating a sequence is like adding it to itself a number of times. However, when
mutable sequences are nested, the effect might not always be what you expect. For
instance, in the following example X is assigned to L repeated four times, whereas Y is
assigned to a list containing L repeated four times:
>>> L = [4, 5, 6]
>>> X = L * 4 # Like [4, 5, 6] + [4, 5, 6] + ...
>>> Y = [L] * 4 # [L] + [L] + ... = [L, L,...]
>>> X
[4, 5, 6, 4, 5, 6, 4, 5, 6, 4, 5, 6]
>>> Y
[[4, 5, 6], [4, 5, 6], [4, 5, 6], [4, 5, 6]]
Because L was nested in the second repetition, Y winds up embedding references back
to the original list assigned to L, and so is open to the same sorts of side effects noted
in the preceding section:
>>> L[1] = 0 # Impacts Y but not X
>>> X
[4, 5, 6, 4, 5, 6, 4, 5, 6, 4, 5, 6]
>>> Y
[[4, 0, 6], [4, 0, 6], [4, 0, 6], [4, 0, 6]]
This may seem artificial and academic—until it happens unexpectedly in your code!
The same solutions to this problem apply here as in the previous section, as this is really
just another way to create the shared mutable object reference case—make copies when
you don’t want shared references:
>>> L = [4, 5, 6]
>>> Y = [list(L)] * 4 # Embed a (shared) copy of L
>>> L[1] = 0
>>> Y
[[4, 5, 6], [4, 5, 6], [4, 5, 6], [4, 5, 6]]
Even more subtly, although Y doesn’t share an object with L anymore, it still embeds
four references to the same copy of it. If you must avoid that sharing too, you’ll want
to make sure each embedded copy is unique:
>>> Y[0][1] = 99 # All four copies are still the same
>>> Y
[[4, 99, 6], [4, 99, 6], [4, 99, 6], [4, 99, 6]]
>>> L = [4, 5, 6]
>>> Y = [list(L) for i in range(4)]
>>> Y
[[4, 5, 6], [4, 5, 6], [4, 5, 6], [4, 5, 6]]
>>> Y[0][1] = 99
>>> Y
[[4, 99, 6], [4, 5, 6], [4, 5, 6], [4, 5, 6]]
If you remember that repetition, concatenation, and slicing copy only the top level of
their operand objects, these sorts of cases make much more sense.
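For instance, a slice makes a new top-level sequence, but nested mutable objects
are still shared with the original; a small demonstration of the rule:
>>> L = [[1, 2], 3]
>>> C = L[:]                 # New outer list, same nested list
>>> C[1] = 'new'             # Top-level change: L is unaffected
>>> L
[[1, 2], 3]
>>> C[0][0] = 'shared'       # Nested change: visible through L too
>>> L
[['shared', 2], 3]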
Beware of Cyclic Data Structures
We actually encountered this concept in a prior exercise: if a collection object contains
a reference to itself, it’s called a cyclic object. Python prints a [...] whenever it detects
a cycle in the object, rather than getting stuck in an infinite loop (as it once did long ago):
>>> L = ['grail'] # Append reference to same object
>>> L.append(L) # Generates cycle in object: [...]
>>> L
['grail', [...]]
Besides understanding that the three dots in square brackets represent a cycle in the
object, this case is worth knowing about because it can lead to gotchas—cyclic struc-
tures may cause code of your own to fall into unexpected loops if you don’t anticipate
them.
For instance, some programs that walk through structured data must keep a list, dic-
tionary, or set of already visited items, and check it when they’re about to step into a
cycle that could cause an unwanted loop. See the solutions to the “Test Your Knowl-
edge: Part I Exercises” on page 87 in Appendix D for more on this problem. Also watch
for general discussion of recursion in Chapter 19, as well as the reloadall.py program
in Chapter 25 and the ListTree class in Chapter 31, for concrete examples of programs
where cycle detection can matter.
The solution is knowledge: don’t use cyclic references unless you really need to, and
make sure you anticipate them in programs that must care. There are good reasons to
create cycles, but unless you have code that knows how to handle them, objects that
reference themselves may be more surprise than asset.
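For instance, here is one way such code might guard against cycles. This is only
a sketch: the function name is invented for illustration, and using id to record
visited objects is just one possible scheme:
def count_leaves(obj, visited=None):
    """Count noncollection items in nested lists, tolerating cycles."""
    if visited is None:
        visited = set()
    if not isinstance(obj, list):
        return 1
    if id(obj) in visited:               # Already seen: don't loop forever
        return 0
    visited.add(id(obj))
    return sum(count_leaves(x, visited) for x in obj)

L = ['grail']
L.append(L)                              # A cyclic list: ['grail', [...]]
print(count_leaves(L))                   # Prints 1 instead of recursing forever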
Immutable Types Can’t Be Changed in Place
And once more for completeness: you can’t change an immutable object in place. In-
stead, you construct a new object with slicing, concatenation, and so on, and assign it
back to the original reference, if needed:
T = (1, 2, 3)
T[2] = 4 # Error!
T = T[:2] + (4,) # OK: (1, 2, 4)
That might seem like extra coding work, but the upside is that the previous gotchas in
this section can’t happen when you’re using immutable objects such as tuples and
strings; because they can’t be changed in place, they are not open to the sorts of side
effects that lists are.
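The same pattern applies to strings, for example; a short sketch:
>>> S = 'spam'
>>> S = 'x' + S[1:]      # Make a new string from pieces of the old
>>> S
'xpam'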
Chapter Summary
This chapter explored the last two major core object types—the tuple and the file. We
learned that tuples support all the usual sequence operations, have just a few methods,
do not allow any in-place changes because they are immutable, and are extended by
the named tuple type. We also learned that files are returned by the built-in open func-
tion and provide methods for reading and writing data.
Along the way we explored how to translate Python objects to and from strings for
storing in files, and we looked at the pickle, json, and struct modules for advanced
roles (object serialization and binary data). Finally, we wrapped up by reviewing some
properties common to all object types (e.g., shared references) and went through a list
of common mistakes (“gotchas”) in the object type domain.
In the next part of this book, we’ll shift gears, turning to the topic of statement
syntax: the way you code processing logic in your scripts. Along the way, this next
part explores
all of Python’s basic procedural statements. The next chapter kicks off this topic with
an introduction to Python’s general syntax model, which is applicable to all statement
types. Before moving on, though, take the chapter quiz, and then work through the
end-of-part lab exercises to review type concepts. Statements largely just create and
process objects, so make sure you’ve mastered this domain by working through all the
exercises before reading on.
Test Your Knowledge: Quiz
1. How can you determine how large a tuple is? Why is this tool located where it is?
2. Write an expression that changes the first item in a tuple. (4, 5, 6) should become
(1, 5, 6) in the process.
3. What is the default for the processing mode argument in a file open call?
4. What module might you use to store Python objects in a file without converting
them to strings yourself?
5. How might you go about copying all parts of a nested structure at once?
6. When does Python consider an object true?
7. What is your quest?
Test Your Knowledge: Answers
1. The built-in len function returns the length (number of contained items) for any
container object in Python, including tuples. It is a built-in function instead of a
type method because it applies to many different types of objects. In general, built-
in functions and expressions may span many object types; methods are specific to
a single object type, though some may be available on more than one type (index,
for example, works on lists and tuples).
2. Because they are immutable, you can’t really change tuples in place, but you can
generate a new tuple with the desired value. Given T = (4, 5, 6), you can change
the first item by making a new tuple from its parts by slicing and concatenating: T
= (1,) + T[1:]. (Recall that single-item tuples require a trailing comma.) You could
also convert the tuple to a list, change it in place, and convert it back to a tuple
(see the sketch after these answers), but this is more expensive and is rarely
required in practice—simply use a list if you know that the object will require
in-place changes.
3. The default for the processing mode argument in a file open call is 'r', for reading
text input. For input text files, simply pass in the external file’s name.
4. The pickle module can be used to store Python objects in a file without explicitly
converting them to strings. The struct module is related, but it assumes the data
is to be in packed binary format in the file; json similarly converts a limited set of
Python objects to and from strings per the JSON format.
5. Import the copy module, and call copy.deepcopy(X) if you need to copy all parts of
a nested structure X. This is also rarely seen in practice; references are usually the
desired behavior, and shallow copies (e.g., aList[:], aDict.copy(), set(aSet))
usually suffice for most copies.
6. An object is considered true if it is either a nonzero number or a nonempty collec-
tion object. The built-in words True and False are essentially predefined to have
the same meanings as integer 1 and 0, respectively.
7. Acceptable answers include “To learn Python,” “To move on to the next part of
the book,” or “To seek the Holy Grail.”
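As a footnote to answer 2, the list-conversion alternative it mentions looks like
this in practice; a quick sketch:
>>> T = (4, 5, 6)
>>> L = list(T)          # Convert to a mutable list
>>> L[0] = 1             # Change it in place
>>> tuple(L)             # Convert back to a tuple
(1, 5, 6)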
Test Your Knowledge: Part II Exercises
This session asks you to get your feet wet with built-in object fundamentals. As before,
a few new ideas may pop up along the way, so be sure to flip to the answers in Appen-
dix D when you’re done (or even when you’re not). If you have limited time, I suggest
starting with exercises 10 and 11 (the most practical of the bunch), and then working
from first to last as time allows. This is all fundamental material, so try to do as many
of these as you can; programming is a hands-on activity, and there is no substitute for
practicing what you’ve read to make ideas gel.
1. The basics. Experiment interactively with the common type operations found in
the various operation tables in this part of the book. To get started, bring up the
Python interactive interpreter, type each of the following expressions, and try to
explain what’s happening in each case. Note that the semicolon in some of these
is being used as a statement separator, to squeeze multiple statements onto a single
line: for example, X=1;X assigns and then prints a variable (more on statement
syntax in the next part of the book). Also remember that a comma between ex-
pressions usually builds a tuple, even if there are no enclosing parentheses: X,Y,Z
is a three-item tuple, which Python prints back to you in parentheses.
2 ** 16
2 / 5, 2 / 5.0
"spam" + "eggs"
S = "ham"
"eggs " + S
S * 5
S[:0]
"green %s and %s" % ("eggs", S)
'green {0} and {1}'.format('eggs', S)
('x',)[0]
('x', 'y')[1]
L = [1,2,3] + [4,5,6]
L, L[:], L[:0], L[-2], L[-2:]
([1,2,3] + [4,5,6])[2:4]
[L[2], L[3]]
L.reverse(); L
L.sort(); L
L.index(4)
{'a':1, 'b':2}['b']
D = {'x':1, 'y':2, 'z':3}
D['w'] = 0
D['x'] + D['w']
D[(1,2,3)] = 4
list(D.keys()), list(D.values()), (1,2,3) in D
[[]], ["",[],(),{},None]
2. Indexing and slicing. At the interactive prompt, define a list named L that contains
four strings or numbers (e.g., L=[0,1,2,3]). Then, experiment with the following
boundary cases. You may never see these cases in real programs (especially not in
the bizarre ways they appear here!), but they are intended to make you think about
the underlying model, and some may be useful in less artificial forms—slicing out
of bounds can help, for example, if a sequence isn’t as long as you expect:
a. What happens when you try to index out of bounds (e.g., L[4])?
b. What about slicing out of bounds (e.g., L[-1000:100])?
c. Finally, how does Python handle it if you try to extract a sequence in reverse,
with the lower bound greater than the higher bound (e.g., L[3:1])? Hint: try
assigning to this slice (L[3:1]=['?']), and see where the value is put. Do you
think this may be the same phenomenon you saw when slicing out of bounds?
3. Indexing, slicing, and del. Define another list L with four items, and assign an empty
list to one of its offsets (e.g., L[2]=[]). What happens? Then, assign an empty list
to a slice (L[2:3]=[]). What happens now? Recall that slice assignment deletes the
slice and inserts the new value where it used to be.
The del statement deletes offsets, keys, attributes, and names. Use it on your list
to delete an item (e.g., del L[0]). What happens if you delete an entire slice (del
L[1:])? What happens when you assign a nonsequence to a slice (L[1:2]=1)?
4. Tuple assignment. Type the following lines:
>>> X = 'spam'
>>> Y = 'eggs'
>>> X, Y = Y, X
What do you think is happening to X and Y when you type this sequence?
5. Dictionary keys. Consider the following code fragments:
>>> D = {}
>>> D[1] = 'a'
>>> D[2] = 'b'
You’ve learned that dictionaries aren’t accessed by offsets, so what’s going on here?
Does the following shed any light on the subject? (Hint: strings, integers, and tuples
share which type category?)
>>> D[(1, 2, 3)] = 'c'
>>> D
{1: 'a', 2: 'b', (1, 2, 3): 'c'}
6. Dictionary indexing. Create a dictionary named D with three entries, for keys 'a',
'b', and 'c'. What happens if you try to index a nonexistent key (D['d'])? What
does Python do if you try to assign to a nonexistent key 'd' (e.g., D['d']='spam')?
How does this compare to out-of-bounds assignments and references for lists?
Does this sound like the rule for variable names?
7. Generic operations. Run interactive tests to answer the following questions:
a. What happens when you try to use the + operator on different/mixed types
(e.g., string + list, list + tuple)?
b. Does + work when one of the operands is a dictionary?
c. Does the append method work for both lists and strings? How about using the
keys method on lists? (Hint: what does append assume about its subject object?)
d. Finally, what type of object do you get back when you slice or concatenate two
lists or two strings?
8. String indexing. Define a string S of four characters: S = "spam". Then type the
following expression: S[0][0][0][0][0]. Any clue as to what’s happening this time?
(Hint: recall that a string is a collection of characters, but Python characters are
one-character strings.) Does this indexing expression still work if you apply it to a
list such as ['s', 'p', 'a', 'm']? Why?
9. Immutable types. Define a string S of four characters again: S = "spam". Write an
assignment that changes the string to "slam", using only slicing and concatenation.
Could you perform the same operation using just indexing and concatenation?
How about index assignment?
10. Nesting. Write a data structure that represents your personal information: name
(first, middle, last), age, job, address, email address, and phone number. You may
build the data structure with any combination of built-in object types you like (lists,
tuples, dictionaries, strings, numbers). Then, access the individual components of
your data structures by indexing. Do some structures make more sense than others
for this object?
11. Files. Write a script that creates a new output file called myfile.txt and writes the
string "Hello file world!" into it. Then write another script that opens my-
file.txt and reads and prints its contents. Run your two scripts from the system
command line. Does the new file show up in the directory where you ran your
scripts? What if you add a different directory path to the filename passed to open?
Note: file write methods do not add newline characters to your strings; add an
explicit \n at the end of the string if you want to fully terminate the line in the file.
PART III
Statements and Syntax
CHAPTER 10
Introducing Python Statements
Now that you’re familiar with Python’s core built-in object types, this chapter begins
our exploration of its fundamental statement forms. As in the previous part, we’ll begin
here with a general introduction to statement syntax, and we’ll follow up with more
details about specific statements in the next few chapters.
In simple terms, statements are the things you write to tell Python what your programs
should do. If, as suggested in Chapter 4, programs “do things with stuff,” then state-
ments are the way you specify what sort of things a program does. Less informally,
Python is a procedural, statement-based language; by combining statements, you spec-
ify a procedure that Python performs to satisfy a program’s goals.
The Python Conceptual Hierarchy Revisited
Another way to understand the role of statements is to revisit the concept hierarchy
introduced in Chapter 4, which talked about built-in objects and the expressions used
to manipulate them. This chapter climbs the hierarchy to the next level of Python
program structure:
1. Programs are composed of modules.
2. Modules contain statements.
3. Statements contain expressions.
4. Expressions create and process objects.
At their base, programs written in the Python language are composed of statements
and expressions. Expressions process objects and are embedded in statements. State-
ments code the larger logic of a program’s operation—they use and direct expressions
to process the objects we studied in the preceding chapters. Moreover, statements are
where objects spring into existence (e.g., in expressions within assignment statements),
and some statements create entirely new kinds of objects (functions, classes, and so
on). At the top, statements always exist in modules, which themselves are managed
with statements.
Python’s Statements
Table 10-1 summarizes Python’s statement set. Each statement in Python has its own
specific purpose and its own specific syntax—the rules that define its structure—
though, as we’ll see, many share common syntax patterns, and some statements’ roles
overlap. Table 10-1 also gives examples of each statement, when coded according to
its syntax rules. In your programs, these units of code can perform actions, repeat tasks,
make choices, build larger program structures, and so on.
This part of the book deals with entries in the table from the top through break and
continue. You’ve informally been introduced to a few of the statements in Table 10-1
already; this part of the book will fill in details that were skipped earlier, introduce the
rest of Python’s procedural statement set, and cover the overall syntax model. State-
ments lower in Table 10-1 that have to do with larger program units—functions,
classes, modules, and exceptions—lead to larger programming ideas, so they will each
have a section of their own. More focused statements (like del, which deletes various
components) are covered elsewhere in the book, or in Python’s standard manuals.
Table 10-1. Python statements

Statement                     Role                            Example
Assignment                    Creating references             a, b = 'good', 'bad'
Calls and other expressions   Running functions               log.write("spam, ham")
print calls                   Printing objects                print('The Killer', joke)
if/elif/else                  Selecting actions               if "python" in text:
                                                                  print(text)
for/else                      Iteration                       for x in mylist:
                                                                  print(x)
while/else                    General loops                   while X > Y:
                                                                  print('hello')
pass                          Empty placeholder               while True:
                                                                  pass
break                         Loop exit                       while True:
                                                                  if exittest(): break
continue                      Loop continue                   while True:
                                                                  if skiptest(): continue
def                           Functions and methods           def f(a, b, c=1, *d):
                                                                  print(a+b+c+d[0])
return                        Function results                def f(a, b, c=1, *d):
                                                                  return a+b+c+d[0]
yield                         Generator functions             def gen(n):
                                                                  for i in n: yield i*2
global                        Namespaces                      x = 'old'
                                                              def function():
                                                                  global x, y; x = 'new'
nonlocal                      Namespaces (3.X)                def outer():
                                                                  x = 'old'
                                                                  def function():
                                                                      nonlocal x; x = 'new'
import                        Module access                   import sys
from                          Attribute access                from sys import stdin
class                         Building objects                class Subclass(Superclass):
                                                                  staticData = []
                                                                  def method(self): pass
try/except/finally            Catching exceptions             try:
                                                                  action()
                                                              except:
                                                                  print('action error')
raise                         Triggering exceptions           raise EndSearch(location)
assert                        Debugging checks                assert X > Y, 'X too small'
with/as                       Context managers (3.X, 2.6+)    with open('data') as myfile:
                                                                  process(myfile)
del                           Deleting references             del data[k]
                                                              del data[i:j]
                                                              del obj.attr
                                                              del variable
Technically, Table 10-1 reflects Python 3.X’s statements. Though sufficient as a quick
preview and reference, it’s not quite complete as is. Here are a few fine points about its
content:
- Assignment statements come in a variety of syntax flavors, described in
  Chapter 11: basic, sequence, augmented, and more.
- print is technically neither a reserved word nor a statement in 3.X, but a
  built-in function call; because it will nearly always be run as an expression
  statement, though (and often on a line by itself), it’s generally thought of as
  a statement type. We’ll study print operations in Chapter 11.
- yield is also an expression instead of a statement as of 2.5; like print, it’s
  typically used as an expression statement and so is included in this table, but
  scripts occasionally assign or otherwise use its result, as we’ll see in
  Chapter 20. As an expression, yield is also a reserved word, unlike print.
Most of this table applies to Python 2.X, too, except where it doesn’t—if you are using
Python 2.X, here are a few notes for your Python, too:
- In 2.X, nonlocal is not available; as we’ll see in Chapter 17, there are
  alternative ways to achieve this statement’s writeable state-retention effect.
- In 2.X, print is a statement instead of a built-in function call, with specific
  syntax covered in Chapter 11.
- In 2.X, the 3.X exec code execution built-in function is a statement, with
  specific syntax; since it supports enclosing parentheses, though, you can
  generally use its 3.X call form in 2.X code.
- In 2.5, the try/except and try/finally statements were merged: the two were
  formerly separate statements, but we can now say both except and finally in the
  same try statement.
- In 2.5, with/as is an optional extension, and it is not available unless you
  explicitly turn it on by running the statement from __future__ import
  with_statement (see Chapter 34).
A Tale of Two ifs
Before we delve into the details of any of the concrete statements in Table 10-1, I want
to begin our look at Python statement syntax by showing you what you are not going
to type in Python code so you can compare and contrast it with other syntax models
you might have seen in the past.
Consider the following if statement, coded in a C-like language:
if (x > y) {
x = 1;
y = 2;
}
This might be a statement in C, C++, Java, JavaScript, or similar. Now, look at the
equivalent statement in the Python language:
if x > y:
x = 1
y = 2
The first thing that may pop out at you is that the equivalent Python statement is less,
well, cluttered—that is, there are fewer syntactic components. This is by design; as a
scripting language, one of Python’s goals is to make programmers’ lives easier by re-
quiring less typing.
More specifically, when you compare the two syntax models, you’ll notice that Python
adds one new thing to the mix, and that three items that are present in the C-like
language are not present in Python code.
What Python Adds
The one new syntax component in Python is the colon character (:). All Python com-
pound statements—statements that have other statements nested inside them—follow
the same general pattern of a header line terminated in a colon, followed by a nested
block of code usually indented underneath the header line, like this:
Header line:
Nested statement block
The colon is required, and omitting it is probably the most common coding mistake
among new Python programmers—it’s certainly one I’ve witnessed thousands of times
in Python training classes I’ve taught. In fact, if you are new to Python, you’ll almost
certainly forget the colon character very soon. You’ll get an error message if you do,
and most Python-friendly editors make this mistake easy to spot. Including it eventually
becomes an unconscious habit (so much so that you may start typing colons in your
C-like language code, too, generating many entertaining error messages from that lan-
guage’s compiler!).
What Python Removes
Although Python requires the extra colon character, there are three things programmers
in C-like languages must include that you don’t generally have to in Python.
Parentheses are optional
The first of these is the set of parentheses around the tests at the top of the statement:
if (x < y)
The parentheses here are required by the syntax of many C-like languages. In Python,
though, they are not—we simply omit the parentheses, and the statement works the
same way:
if x < y
Technically speaking, because every expression can be enclosed in parentheses, in-
cluding them will not hurt in this Python code, and they are not treated as an error if
present.
But don’t do that: you’ll be wearing out your keyboard needlessly, and broadcasting to
the world that you’re a programmer of a C-like language still learning Python (I know,
because I was once, too). The “Python way” is to simply omit the parentheses in these
kinds of statements altogether.
End-of-line is end of statement
The second and more significant syntax component you won’t find in Python code is
the semicolon. You don’t need to terminate statements with semicolons in Python the
way you do in C-like languages:
x = 1;
In Python, the general rule is that the end of a line automatically terminates the state-
ment that appears on that line. In other words, you can leave off the semicolons, and
it works the same way:
x = 1
There are some ways to work around this rule, as you’ll see in a moment (for instance,
wrapping code in a bracketed structure allows it to span lines). But, in general, you
write one statement per line for the vast majority of Python code, and no semicolon is
required.
Here, too, if you are pining for your C programming days (if such a state is possible)
you can continue to use semicolons at the end of each statement—the language lets
you get away with them if they are present, because the semicolon is also a separator
when statements are combined.
But don’t do that either (really!). Again, doing so tells the world that you’re a program-
mer of a C-like language who still hasn’t quite made the switch to Python coding. The
Pythonic style is to leave off the semicolons altogether. Judging from students in classes,
this seems a tough habit for some veteran programmers to break. But you’ll get there;
semicolons are useless noise in this role in Python.
End of indentation is end of block
The third and final syntax component that Python removes, and the one that may seem
the most unusual to soon-to-be-ex-programmers of C-like languages (until they’ve used
it for 10 minutes and realize it’s actually a feature), is that you do not type anything
explicit in your code to syntactically mark the beginning and end of a nested block of
code. You don’t need to include begin/end, then/endif, or braces around the nested
block, as you do in C-like languages:
if (x > y) {
x = 1;
y = 2;
}
Instead, in Python, we consistently indent all the statements in a given single nested
block the same distance to the right, and Python uses the statements’ physical inden-
tation to determine where the block starts and stops:
if x > y:
x = 1
y = 2
By indentation, I mean the blank whitespace all the way to the left of the two nested
statements here. Python doesn’t care how you indent (you may use either spaces or
tabs), or how much you indent (you may use any number of spaces or tabs). In fact, the
indentation of one nested block can be totally different from that of another. The syntax
rule is only that for a given single nested block, all of its statements must be indented
the same distance to the right. If this is not the case, you will get a syntax error, and
your code will not run until you repair its indentation to be consistent.
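For example, Python rejects a block like the following because its lines don’t
line up; this is a sketch, and the exact error message may vary by version:
if x > y:
    x = 1
      y = 2        # IndentationError: unexpected indent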
Why Indentation Syntax?
The indentation rule may seem unusual at first glance to programmers accustomed to
C-like languages, but it is a deliberate feature of Python, and it’s one of the main ways
that Python almost forces programmers to produce uniform, regular, and readable
code. It essentially means that you must line up your code vertically, in columns, ac-
cording to its logical structure. The net effect is to make your code more consistent and
readable (unlike much of the code written in C-like languages).
To put that more strongly, aligning your code according to its logical structure is a
major part of making it readable, and thus reusable and maintainable, by yourself and
others. In fact, even if you never use Python after reading this book, you should get into
the habit of aligning your code for readability in any block-structured language. Python
underscores the issue by making this a part of its syntax, but it’s an important thing to
do in any programming language, and it has a huge impact on the usefulness of your
code.
Your experience may vary, but when I was still doing development on a full-time basis,
I was mostly paid to work on large old C++ programs that had been worked on by
many programmers over the years. Almost invariably, each programmer had his or her
own style for indenting code. For example, I’d often be asked to change a while loop
coded in the C++ language that began like this:
while (x > 0) {
Before we even get into indentation, there are three or four ways that programmers can
arrange these braces in a C-like language, and organizations often endure political bat-
tles and standards manuals to address the options (which seems more than a little off-
topic for the problem to be solved by programming). Be that as it may, here’s the
scenario I often encountered in C++ code. The first person who worked on the code
indented the loop four spaces:
while (x > 0) {
--------;
--------;
That person eventually moved on to management, only to be replaced by someone who
liked to indent further to the right:
while (x > 0) {
--------;
--------;
--------;
--------;
That person later moved on to other opportunities (ending that individual’s reign of
coding terror...), and someone else picked up the code who liked to indent less:
while (x > 0) {
--------;
--------;
--------;
--------;
--------;
--------;
}
And so on. Eventually, the block is terminated by a closing brace (}), which of course
makes this “block-structured code” (he says, sarcastically). No: in any block-structured
language, Python or otherwise, if nested blocks are not indented consistently, they
become very difficult for the reader to interpret, change, or reuse, because the code no
longer visually reflects its logical meaning. Readability matters, and indentation is a
major component of readability.
Here is another example that may have burned you in the past if you’ve done much
programming in a C-like language. Consider the following statement in C:
if (x)
if (y)
statement1;
else
statement2;
Which if does the else here go with? Surprisingly, the else is paired with the nested
if statement (if (y)) in C, even though it looks visually as though it is associated with
the outer if (x). This is a classic pitfall in the C language, and it can lead to the reader
completely misinterpreting the code and changing it incorrectly in ways that might not
be uncovered until the Mars rover crashes into a giant rock!
This cannot happen in Python—because indentation is significant, the way the code
looks is the way it will work. Consider an equivalent Python statement:
if x:
if y:
statement1
else:
statement2
In this example, the if that the else lines up with vertically is the one it is associated
with logically (the outer if x). In a sense, Python is a WYSIWYG language—what you
see is what you get—because the way code looks is the way it runs, regardless of who
coded it.
If this still isn’t enough to underscore the benefits of Python’s syntax, here’s another
anecdote. Early in my career, I worked at a successful company that developed systems
software in the C language, where consistent indentation is not required. Even so, when
we checked our code into source control at the end of the day, this company ran an
automated script that analyzed the indentation used in the code. If the script noticed
that we’d indented our code inconsistently, we received an automated email about it
the next morning—and so did our managers!
The point is that even when a language doesn’t require it, good programmers know
that consistent use of indentation has a huge impact on code readability and quality.
The fact that Python promotes this to the level of syntax is seen by most as a feature of
the language.
Also keep in mind that nearly every programmer-friendly text editor has built-in sup-
port for Python’s syntax model. In the IDLE Python GUI, for example, lines of code
are automatically indented when you are typing a nested block; pressing the Backspace
key backs up one level of indentation, and you can customize how far to the right IDLE
indents statements in a nested block. There is no universal standard on this: four spaces
or one tab per level is common, but it’s generally up to you to decide how and how
much you wish to indent (unless you work at a company that’s endured politics and
manuals to standardize this too). Indent further to the right for further nested blocks,
and less to close the prior block.
As a rule of thumb, you probably shouldn’t mix tabs and spaces in the same block in
Python, unless you do so consistently; use tabs or spaces in a given block, but not both
(in fact, Python 3.X now issues an error for inconsistent use of tabs and spaces, as we’ll
see in Chapter 12). Then again, you probably shouldn’t mix tabs or spaces in inden-
tation in any structured language—such code can cause major readability issues if the
next programmer has his or her editor set to display tabs differently than yours. C-like
languages might let coders get away with this, but they shouldn’t: the result can be a
mangled mess.
Regardless of which language you code in, you should be indenting consistently for
readability. In fact, if you weren’t taught to do this earlier in your career, your teachers
did you a disservice. Most programmers—especially those who must read others’ code
—consider it a major asset that Python elevates this to the level of syntax. Moreover,
generating tabs instead of braces is no more difficult in practice for tools that must
output Python code. In general, if you do what you should be doing in a C-like language
anyhow, but get rid of the braces, your code will satisfy Python’s syntax rules.
A Few Special Cases
As mentioned previously, in Python’s syntax model:
- The end of a line terminates the statement on that line (without semicolons).
- Nested statements are blocked and associated by their physical indentation
  (without braces).
Those rules cover almost all Python code you’ll write or see in practice. However,
Python also provides some special-purpose rules that allow customization of both
statements and nested statement blocks. They’re not required and should be used
sparingly, but programmers have found them useful in practice.
Statement rule special cases
Although statements normally appear one per line, it is possible to squeeze more than
one statement onto a single line in Python by separating them with semicolons:
a = 1; b = 2; print(a + b) # Three statements on one line
This is the only place in Python where semicolons are required: as statement separa-
tors. This only works, though, if the statements thus combined are not themselves
compound statements. In other words, you can chain together only simple statements,
like assignments, prints, and function calls. Compound statements like if tests and
while loops must still appear on lines of their own (otherwise, you could squeeze an
entire program onto one line, which probably would not make you very popular among
your coworkers!).
The other special rule for statements is essentially the inverse: you can make a single
statement span across multiple lines. To make this work, you simply have to enclose
part of your statement in a bracketed pair—parentheses (()), square brackets ([]), or
curly braces ({}). Any code enclosed in these constructs can cross multiple lines: your
statement doesn’t end until Python reaches the line containing the closing part of the
pair. For instance, to continue a list literal:
mylist = [1111,
2222,
3333]
Because the code is enclosed in a square brackets pair, Python simply drops down to
the next line until it encounters the closing bracket. The curly braces surrounding dic-
tionaries (as well as set literals and dictionary and set comprehensions in 3.X and 2.7)
allow them to span lines this way too, and parentheses handle tuples, function calls,
and expressions. The indentation of the continuation lines does not matter, though
common sense dictates that the lines should be aligned somehow for readability.
Parentheses are the catchall device—because any expression can be wrapped in them,
simply inserting a left parenthesis allows you to drop down to the next line and continue
your statement:
X = (A + B +
C + D)
This technique works with compound statements, too, by the way. Anywhere you need
to code a large expression, simply wrap it in parentheses to continue it on the next line:
if (A == 1 and
B == 2 and
C == 3):
print('spam' * 3)
An older rule also allows for continuation lines when the prior line ends in a backslash:
X = A + B + \
C + D # An error-prone older alternative
This alternative technique is dated, though, and is frowned on today because it’s dif-
ficult to notice and maintain the backslashes. It’s also fairly brittle and error-prone—
there can be no spaces after the backslash, and accidentally omitting it can have unex-
pected effects if the next line is mistaken to be a new statement (in this example, “C +
D” is a valid statement by itself if it’s not indented). This rule is also another throwback
to the C language, where it is commonly used in “#define” macros; again, when in
Pythonland, do as Pythonistas do, not as C programmers do.
Block rule special case
As mentioned previously, statements in a nested block of code are normally associated
by being indented the same amount to the right. As one special case here, the body of
a compound statement can instead appear on the same line as the header in Python,
after the colon:
if x > y: print(x)
This allows us to code single-line if statements, single-line while and for loops, and
so on. Here again, though, this will work only if the body of the compound statement
itself does not contain any compound statements. That is, only simple statements—
assignments, prints, function calls, and the like—are allowed after the colon. Larger
statements must still appear on lines by themselves. Extra parts of compound state-
ments (such as the else part of an if, which we’ll meet in the next section) must also
be on separate lines of their own. Compound statement bodies can also consist of
multiple simple statements separated by semicolons, but this tends to be frowned upon.
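For instance, the following is legal, though most would find it harder to read
than an indented equivalent; a small illustration:
if x > y: print(x); print(y)     # Both prints run only if the test is true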
In general, even though it’s not always required, if you keep all your statements on
individual lines and always indent your nested blocks, your code will be easier to read
and change in the future. Moreover, some code profiling and coverage tools may not
be able to distinguish between multiple statements squeezed onto a single line or the
header and body of a one-line compound statement. It is almost always to your ad-
vantage to keep things simple in Python. You can use the special-case exceptions to
write Python code that’s hard to read, but it takes a lot of work, and there are probably
better ways to spend your time.
To see a prime and common exception to one of these rules in action, however (the use
of a single-line if statement to break out of a loop), and to introduce more of Python’s
syntax, let’s move on to the next section and write some real code.
A Quick Example: Interactive Loops
We’ll see all these syntax rules in action when we tour Python’s specific compound
statements in the next few chapters, but they work the same everywhere in the Python
language. To get started, let’s work through a brief, realistic example that demonstrates
the way that statement syntax and statement nesting come together in practice, and
introduces a few statements along the way.
A Simple Interactive Loop
Suppose you’re asked to write a Python program that interacts with a user in a console
window. Maybe you’re accepting inputs to send to a database, or reading numbers to
be used in a calculation. Regardless of the purpose, you need to code a loop that reads
one or more inputs from a user typing on a keyboard, and prints back a result for each.
In other words, you need to write a classic read/evaluate/print loop program.
In Python, typical boilerplate code for such an interactive loop might look like this:
while True:
reply = input('Enter text:')
if reply == 'stop': break
print(reply.upper())
This code makes use of a few new ideas and some we’ve already seen:
- The code leverages the Python while loop, Python’s most general looping
  statement. We’ll study the while statement in more detail later, but in short,
  it consists of the word while, followed by an expression that is interpreted as
  a true or false result, followed by a nested block of code that is repeated
  while the test at the top is true (the word True here is considered always true).
- The input built-in function we met earlier in the book is used here for general
  console input—it prints its optional argument string as a prompt and returns
  the user’s typed reply as a string. Use raw_input in 2.X instead, per the
  upcoming note.
- A single-line if statement that makes use of the special rule for nested blocks
  also appears here: the body of the if appears on the header line after the colon
  instead of being indented on a new line underneath it. This would work either
  way, but as it’s coded, we’ve saved an extra line.
- Finally, the Python break statement is used to exit the loop immediately—it
  simply jumps out of the loop statement altogether, and the program continues
  after the loop. Without this exit statement, the while would loop forever, as
  its test is always true.
In effect, this combination of statements essentially means “read a line from the user
and print it in uppercase until the user enters the word ‘stop.’” There are other ways
to code such a loop, but the form used here is very common in Python code.
Notice that all three lines nested under the while header line are indented the same
amount—because they line up vertically in a column this way, they are the block of
code that is associated with the while test and repeated. Either the end of the source
file or a lesser-indented statement will suffice to terminate the loop body block.
When this code is run, either interactively or as a script file, here is the sort of interaction
we get—all of the code for this example is in interact.py in the book’s examples package:
Enter text:spam
SPAM
Enter text:42
42
Enter text:stop
Version skew note: This example is coded for Python 3.X. If you are
working in Python 2.X, the code works the same, but you must use
raw_input instead of input in all of this chapter’s examples, and you can
omit the outer parentheses in print statements (though they don’t hurt).
In fact, if you study the interact.py file in the examples package, you’ll
see that it does this automatically—to support 2.X compatibility, it re-
sets input if the running Python’s major version is 2 (“input” winds up
running raw_input):
import sys
if sys.version[0] == '2': input = raw_input # 2.X compatible
In 3.X, raw_input was renamed input, and print is a built-in function
instead of a statement (more on prints in the next chapter). Python 2.X
has an input too, but it tries to evaluate the input string as though it
were Python code, which probably won’t work in this context;
eval(input()) can yield the same effect in 3.X.
Doing Math on User Inputs
Our script works, but now suppose that instead of converting a text string to uppercase,
we want to do some math with numeric input—squaring it, for example, perhaps in
some misguided effort of an age-input program to tease its users. We might try state-
ments like these to achieve the desired effect:
>>> reply = '20'
>>> reply ** 2
...error text omitted...
TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'
This won’t quite work in our script, though, because (as discussed in the prior part of
the book) Python won’t convert object types in expressions unless they are all numeric,
and input from a user is always returned to our script as a string. We cannot raise a
string of digits to a power unless we convert it manually to an integer:
>>> int(reply) ** 2
400
Armed with this information, we can now recode our loop to perform the necessary
math. Type the following in a file to test it:
while True:
reply = input('Enter text:')
if reply == 'stop': break
print(int(reply) ** 2)
print('Bye')
This script uses a single-line if statement to exit on “stop” as before, but it also converts
inputs to perform the required math. This version also adds an exit message at the
bottom. Because the print statement in the last line is not indented as much as the
nested block of code, it is not considered part of the loop body and will run only once,
after the loop is exited:
Enter text:2
4
Enter text:40
1600
Enter text:stop
Bye
Usage note: From this point on I’ll assume that this code is stored in and
run from a script file, via command line, IDLE menu option, or any of
the other file launching techniques we met in Chapter 3. Again, it’s
named interact.py in the book’s examples. If you are entering this code
interactively, though, be sure to include a blank line (i.e., press Enter
twice) before the final print statement, to terminate the loop. This im-
plies that you also can’t cut and paste the code in its entirety into an
interactive prompt: an extra blank line is required interactively, but not
in script files. The final print doesn’t quite make sense in interactive
mode, though—you’ll have to code it after interacting with the loop!
Handling Errors by Testing Inputs
So far so good, but notice what happens when the input is invalid:
Enter text:xxx
...error text omitted...
ValueError: invalid literal for int() with base 10: 'xxx'
The built-in int function raises an exception here in the face of a mistake. If we want
our script to be robust, we can check the string’s content ahead of time with the string
object’s isdigit method:
>>> S = '123'
>>> T = 'xxx'
>>> S.isdigit(), T.isdigit()
(True, False)
This also gives us an excuse to further nest the statements in our example. The following
new version of our interactive script uses a full-blown if statement to work around the
exception on errors:
while True:
reply = input('Enter text:')
if reply == 'stop':
break
elif not reply.isdigit():
print('Bad!' * 8)
else:
print(int(reply) ** 2)
print('Bye')
We’ll study the if statement in more detail in Chapter 12, but it’s a fairly lightweight
tool for coding logic in scripts. In its full form, it consists of the word if followed by a
test and an associated block of code, one or more optional elif (“else if”) tests and
code blocks, and an optional else part, with an associated block of code at the bottom
to serve as a default. Python runs the block of code associated with the first test that is
true, working from top to bottom, or the else part if all tests are false.
The if, elif, and else parts in the preceding example are associated as part of the same
statement because they all line up vertically (i.e., share the same level of indentation).
The if statement spans from the word if to the start of the print statement on the last
line of the script. In turn, the entire if block is part of the while loop because all of it
is indented under the loop’s header line. Statement nesting like this is natural once you
get the hang of it.
When we run our new script, its code catches errors before they occur and prints an
error message before continuing (which you’ll probably want to improve in a later
release), but “stop” still gets us out, and valid numbers are still squared:
Enter text:5
25
Enter text:xyz
Bad!Bad!Bad!Bad!Bad!Bad!Bad!Bad!
Enter text:10
100
Enter text:stop
Handling Errors with try Statements
The preceding solution works, but as you’ll see later in the book, the most general way
to handle errors in Python is to catch and recover from them completely using the
Python try statement. We’ll explore this statement in depth in Part VII of this book,
but as a preview, using a try here can lead to code that some would see as simpler than
the prior version:
while True:
reply = input('Enter text:')
if reply == 'stop': break
try:
num = int(reply)
except:
print('Bad!' * 8)
else:
print(num ** 2)
print('Bye')
This version works exactly like the previous one, but we’ve replaced the explicit error
check with code that assumes the conversion will work and wraps it in an exception
handler for cases when it doesn’t. In other words, rather than detecting an error, we
simply respond if one occurs.
This try statement is another compound statement, and follows the same pattern as
if and while. It’s composed of the word try, followed by the main block of code (the
action we are trying to run), followed by an except part that gives the exception handler
code and an else part to be run if no exception is raised in the try part. Python first
runs the try part, then runs either the except part (if an exception occurs) or the else
part (if no exception occurs).
In terms of statement nesting, because the words try, except, and else are all indented
to the same level, they are all considered part of the same single try statement. Notice
that the else part is associated with the try here, not the if. As we’ve seen, else can
appear in if statements in Python, but it can also appear in try statements and loops
—its indentation tells you what statement it is a part of. In this case, the try statement
spans from the word try through the code indented under the word else, because the
else is indented the same as try. The if statement in this code is a one-liner and ends
after the break.
Supporting floating-point numbers
Again, we’ll come back to the try statement later in this book. For now, be aware that
because try can be used to intercept any error, it reduces the amount of error-checking
code you have to write, and it’s a very general approach to dealing with unusual cases.
If we’re sure that print won’t fail, for instance, this example could be even more concise:
while True:
reply = input('Enter text:')
if reply == 'stop': break
try:
print(int(reply) ** 2)
except:
print('Bad!' * 8)
print('Bye')
And if we wanted to support input of floating-point numbers instead of just integers,
for example, using try would be much easier than manual error testing—we could
simply run a float call and catch its exceptions:
while True:
reply = input('Enter text:')
if reply == 'stop': break
try:
print(float(reply) ** 2)
except:
print('Bad!' * 8)
print('Bye')
There is no isfloat for strings today, so this exception-based approach spares us from
having to analyze all possible floating-point syntax in an explicit error check. When
coding this way, we can enter a wider variety of numbers, but errors and exits still work
as before:
Enter text:50
2500.0
Enter text:40.5
1640.25
Enter text:1.23E-100
1.5129e-200
Enter text:spam
Bad!Bad!Bad!Bad!Bad!Bad!Bad!Bad!
Enter text:stop
Bye
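In fact, you could package this test as a tool of its own. There is no isfloat
built-in, but a short function can stand in for one; the helper here is
hypothetical, not part of Python’s library:
def is_float(text):
    """Return True if text converts to a float, else False."""
    try:
        float(text)
        return True
    except ValueError:
        return False

print(is_float('40.5'), is_float('spam'))    # Prints: True False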
Python’s eval call, which we used in Chapter 5 and Chapter 9 to convert
data in strings and files, would work in place of float here too, and
would allow input of arbitrary expressions (“2 ** 100” would be a legal,
if curious, input, especially if we’re assuming the program is processing
ages!). This is a powerful concept that is open to the same security issues
mentioned in the prior chapters. If you can’t trust the source of a code
string, use more restrictive conversion tools like int and float.
Python’s exec, used in Chapter 3 to run code read from a file, is similar
to eval (but assumes the string is a statement instead of an expression
and has no result), and its compile call precompiles frequently used code
strings to bytecode objects for speed. Run a help on any of these for
more details; as mentioned, exec is a statement in 2.X but a function in
3.X, so see its manual entry in 2.X instead. We’ll also use exec to import
modules by name string in Chapter 25—an example of its more dynamic
roles.
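For instance, precompiling with compile can pay off when a trusted code string
must be run many times; a brief sketch, with an arbitrary expression:
>>> code = compile('x ** 2', '<string>', 'eval')    # Compile once...
>>> for x in (2, 3, 4):
...     print(eval(code))                           # ...then run many times
...
4
9
16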
Nesting Code Three Levels Deep
Let’s look at one last mutation of our code. Nesting can take us even further if we need
it to—we could, for example, extend our prior integer-only script to branch to one of
a set of alternatives based on the relative magnitude of a valid input:
while True:
reply = input('Enter text:')
if reply == 'stop':
break
elif not reply.isdigit():
print('Bad!' * 8)
else:
num = int(reply)
if num < 20:
print('low')
else:
print(num ** 2)
print('Bye')
This version adds an if statement nested in the else clause of another if statement,
which is in turn nested in the while loop. When code is conditional or repeated like
this, we simply indent it further to the right. The net effect is like that of prior versions,
but we’ll now print “low” for numbers less than 20:
Enter text:19
low
Enter text:20
400
Enter text:spam
Bad!Bad!Bad!Bad!Bad!Bad!Bad!Bad!
Enter text:stop
Bye
Chapter Summary
That concludes our quick look at Python statement syntax. This chapter introduced
the general rules for coding statements and blocks of code. As you’ve learned, in Python
we normally code one statement per line and indent all the statements in a nested block
the same amount (indentation is part of Python’s syntax). However, we also looked at
a few exceptions to these rules, including continuation lines and single-line tests and
loops. Finally, we put these ideas to work in an interactive script that demonstrated a
handful of statements and showed statement syntax in action.
In the next chapter, we’ll start to dig deeper by going over each of Python’s basic pro-
cedural statements in depth. As you’ll see, though, all statements follow the same gen-
eral rules introduced here.
Test Your Knowledge: Quiz
1. What three things are required in a C-like language but omitted in Python?
2. How is a statement normally terminated in Python?
3. How are the statements in a nested block of code normally associated in Python?
4. How can you make a single statement span multiple lines?
5. How can you code a compound statement on a single line?
6. Is there any valid reason to type a semicolon at the end of a statement in Python?
7. What is a try statement for?
8. What is the most common coding mistake among Python beginners?
Test Your Knowledge: Answers
1. C-like languages require parentheses around the tests in some statements, semi-
colons at the end of each statement, and braces around a nested block of code.
2. The end of a line terminates the statement that appears on that line. Alternatively,
if more than one statement appears on the same line, they can be terminated with
semicolons; similarly, if a statement spans many lines, you must terminate it by
closing a bracketed syntactic pair.
3. The statements in a nested block are all indented the same number of tabs or spaces.
4. You can make a statement span many lines by enclosing part of it in parentheses,
square brackets, or curly braces; the statement ends when Python sees a line that
contains the closing part of the pair.
5. The body of a compound statement can be moved to the header line after the colon,
but only if the body consists of only noncompound statements.
6. Only when you need to squeeze more than one statement onto a single line of code.
Even then, this only works if all the statements are noncompound, and it’s dis-
couraged because it can lead to code that is difficult to read.
7. The try statement is used to catch and recover from exceptions (errors) in a Python
script. It’s usually an alternative to manually checking for errors in your code.
8. Forgetting to type the colon character at the end of the header line in a compound
statement is the most common beginner’s mistake. If you’re new to Python and
haven’t made it yet, you probably will soon!
CHAPTER 11
Assignments, Expressions, and Prints
Now that we’ve had a quick introduction to Python statement syntax, this chapter
begins our in-depth tour of specific Python statements. We’ll begin with the basics:
assignment statements, expression statements, and print operations. We’ve already
seen all of these in action, but here we’ll fill in important details we’ve skipped so far.
Although they’re relatively simple, as you’ll see, there are optional variations for each
of these statement types that will come in handy once you begin writing realistic Python
programs.
Assignment Statements
We’ve been using the Python assignment statement for a while to assign objects to
names. In its basic form, you write the target of an assignment on the left of an equals
sign, and the object to be assigned on the right. The target on the left may be a name
or object component, and the object on the right can be an arbitrary expression that
computes an object. For the most part, assignments are straightforward, but here are
a few properties to keep in mind:
Assignments create object references. As discussed in Chapter 6, Python as-
signments store references to objects in names or data structure components. They
always create references to objects instead of copying the objects. Because of that,
Python variables are more like pointers than data storage areas.
Names are created when first assigned. Python creates a variable name the first
time you assign it a value (i.e., an object reference), so there’s no need to predeclare
names ahead of time. Some (but not all) data structure slots are created when
assigned, too (e.g., dictionary entries, some object attributes). Once assigned, a
name is replaced with the value it references whenever it appears in an expression.
Names must be assigned before being referenced. It’s an error to use a name
to which you haven’t yet assigned a value. Python raises an exception if you try,
rather than returning some sort of ambiguous default value. This turns out to be
crucial in Python because names are not predeclared—if Python provided default
values for unassigned names used in your program instead of treating them as
errors, it would be much more difficult for you to spot name typos in your code.
Some operations perform assignments implicitly. In this section we’re con-
cerned with the = statement, but assignment occurs in many contexts in Python.
For instance, we’ll see later that module imports, function and class definitions,
for loop variables, and function arguments are all implicit assignments. Because
assignment works the same everywhere it pops up, all these contexts simply bind
names to object references at runtime.
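For a quick sketch of a few of these implicit-assignment contexts—each of the following lines binds a name just as = does (the names here are arbitrary):

    import sys                         # Import: assigns sys to the loaded module
    for x in [1, 2, 3]:                # Loop: assigns x to each item in turn
        pass
    def func(a, b):                    # def: assigns func; a and b are assigned at call time
        return a + b
    func(1, 2)                         # Arguments are passed by assignment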
Assignment Statement Forms
Although assignment is a general and pervasive concept in Python, we are primarily
interested in assignment statements in this chapter. Table 11-1 illustrates the different
assignment statement forms in Python, and their syntax patterns.
Table 11-1. Assignment statement forms
Operation                       Interpretation
spam = 'Spam'                   Basic form
spam, ham = 'yum', 'YUM'        Tuple assignment (positional)
[spam, ham] = ['yum', 'YUM']    List assignment (positional)
a, b, c, d = 'spam'             Sequence assignment, generalized
a, *b = 'spam'                  Extended sequence unpacking (Python 3.X)
spam = ham = 'lunch'            Multiple-target assignment
spams += 42                     Augmented assignment (equivalent to spams = spams + 42)
The first form in Table 11-1 is by far the most common: binding a name (or data struc-
ture component) to a single object. In fact, you could get all your work done with this
basic form alone. The other table entries represent special forms that are all optional,
but that programmers often find convenient in practice:
Tuple- and list-unpacking assignments
The second and third forms in the table are related. When you code a tuple or list
on the left side of the =, Python pairs objects on the right side with targets on the
left by position and assigns them from left to right. For example, in the second line
of Table 11-1, the name spam is assigned the string 'yum', and the name ham is bound
to the string 'YUM'. In this case Python internally may make a tuple of the items on
the right, which is why this is called tuple-unpacking assignment.
Sequence assignments
In later versions of Python, tuple and list assignments were generalized into in-
stances of what we now call sequence assignment—any sequence of names can be
assigned to any sequence of values, and Python assigns the items one at a time by
position. We can even mix and match the types of the sequences involved. The
fourth line in Table 11-1, for example, pairs a tuple of names with a string of
characters: a is assigned 's', b is assigned 'p', and so on.
Extended sequence unpacking
In Python 3.X (only), a new form of sequence assignment allows us to be more
flexible in how we select portions of a sequence to assign. The fifth line in Ta-
ble 11-1, for example, matches a with the first character in the string on the right
and b with the rest: a is assigned 's', and b is assigned 'pam'. This provides a simpler
alternative to assigning the results of manual slicing operations.
Multiple-target assignments
The sixth line in Table 11-1 shows the multiple-target form of assignment. In this
form, Python assigns a reference to the same object (the object farthest to the right)
to all the targets on the left. In the table, the names spam and ham are both assigned
references to the same string object, 'lunch'. The effect is the same as if we had
coded ham = 'lunch' followed by spam = ham, as ham evaluates to the original string
object (i.e., not a separate copy of that object).
Augmented assignments
The last line in Table 11-1 is an example of augmented assignment—a shorthand
that combines an expression and an assignment in a concise way. Saying spam +=
42, for example, has the same effect as spam = spam + 42, but the augmented form
requires less typing and is generally quicker to run. In addition, if the subject is
mutable and supports the operation, an augmented assignment may run even
quicker by choosing an in-place update operation instead of an object copy. There
is one augmented assignment statement for every binary expression operator in
Python.
Sequence Assignments
We’ve already used and explored basic assignments in this book, so we’ll take them as
a given. Here are a few simple examples of sequence-unpacking assignments in action:
% python
>>> nudge = 1 # Basic assignment
>>> wink = 2
>>> A, B = nudge, wink # Tuple assignment
>>> A, B # Like A = nudge; B = wink
(1, 2)
>>> [C, D] = [nudge, wink] # List assignment
>>> C, D
(1, 2)
Notice that we really are coding two tuples in the third line in this interaction—we’ve
just omitted their enclosing parentheses. Python pairs the values in the tuple on the
right side of the assignment operator with the variables in the tuple on the left side and
assigns the values one at a time.
Tuple assignment leads to a common coding trick in Python that was introduced in a
solution to the exercises at the end of Part II. Because Python creates a temporary tuple
that saves the original values of the variables on the right while the statement runs,
unpacking assignments are also a way to swap two variables’ values without creating
a temporary variable of your own—the tuple on the right remembers the prior values
of the variables automatically:
>>> nudge = 1
>>> wink = 2
>>> nudge, wink = wink, nudge # Tuples: swaps values
>>> nudge, wink # Like T = nudge; nudge = wink; wink = T
(2, 1)
In fact, the original tuple and list assignment forms in Python have been generalized to
accept any type of sequence (really, iterable) on the right as long as it is of the same
length as the sequence on the left. You can assign a tuple of values to a list of variables,
a string of characters to a tuple of variables, and so on. In all cases, Python assigns items
in the sequence on the right to variables in the sequence on the left by position, from
left to right:
>>> [a, b, c] = (1, 2, 3) # Assign tuple of values to list of names
>>> a, c
(1, 3)
>>> (a, b, c) = "ABC" # Assign string of characters to tuple
>>> a, c
('A', 'C')
Technically speaking, sequence assignment actually supports any iterable object on the
right, not just any sequence. This is a more general category that includes collections
both physical (e.g., lists) and virtual (e.g., a file’s lines), which was defined briefly in
Chapter 4 and has popped up in passing ever since. We’ll firm up this term when we
explore iterables in Chapter 14 and Chapter 20.
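As a quick preview, even a generator expression—an iterable that is not a sequence at all—works on the right side (the values here are arbitrary):

    >>> a, b = (ord(c) for c in 'AB')  # Any iterable of matching length works
    >>> a, b
    (65, 66)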
Advanced sequence assignment patterns
Although we can mix and match sequence types around the = symbol, we must generally
have the same number of items on the right as we have variables on the left, or we’ll get
an error. Python 3.X allows us to be more general with extended unpacking * syntax,
described in the next section. But normally in 3.X—and always in 2.X—the number of
items in the assignment target and subject must match:
>>> string = 'SPAM'
>>> a, b, c, d = string # Same number on both sides
>>> a, d
('S', 'M')
>>> a, b, c = string # Error if not
...error text omitted...
ValueError: too many values to unpack (expected 3)
To be more flexible, we can slice in both 2.X and 3.X. There are a variety of ways to
employ slicing to make this last case work:
>>> a, b, c = string[0], string[1], string[2:] # Index and slice
>>> a, b, c
('S', 'P', 'AM')
>>> a, b, c = list(string[:2]) + [string[2:]] # Slice and concatenate
>>> a, b, c
('S', 'P', 'AM')
>>> a, b = string[:2] # Same, but simpler
>>> c = string[2:]
>>> a, b, c
('S', 'P', 'AM')
>>> (a, b), c = string[:2], string[2:] # Nested sequences
>>> a, b, c
('S', 'P', 'AM')
As the last example in this interaction demonstrates, we can even assign nested se-
quences, and Python unpacks their parts according to their shape, as expected. In this
case, we are assigning a tuple of two items, where the first item is a nested sequence (a
string), exactly as though we had coded it this way:
>>> ((a, b), c) = ('SP', 'AM') # Paired by shape and position
>>> a, b, c
('S', 'P', 'AM')
Python pairs the first string on the right ('SP') with the first tuple on the left ((a, b))
and assigns one character at a time, before assigning the entire second string ('AM') to
the variable c all at once. In this event, the sequence-nesting shape of the object on the
left must match that of the object on the right. Nested sequence assignment like this is
somewhat rare to see, but it can be convenient for picking out the parts of data struc-
tures with known shapes.
For example, we’ll see in Chapter 13 that this technique also works in for loops, because
loop items are assigned to the target given in the loop header:
for (a, b, c) in [(1, 2, 3), (4, 5, 6)]: ... # Simple tuple assignment
for ((a, b), c) in [((1, 2), 3), ((4, 5), 6)]: ... # Nested tuple assignment
In a note in Chapter 18, we’ll also see that this nested tuple (really, sequence) unpacking
assignment form works for function argument lists in Python 2.X (though not in 3.X),
because function arguments are passed by assignment as well:
def f(((a, b), c)): ... # For arguments too in Python 2.X, but not 3.X
f(((1, 2), 3))
Sequence-unpacking assignments also give rise to another common coding idiom in
Python—assigning an integer series to a set of variables:
>>> red, green, blue = range(3)
>>> red, blue
(0, 2)
This initializes the three names to the integer codes 0, 1, and 2, respectively (it’s Python’s
equivalent of the enumerated data types you may have seen in other languages). To
make sense of this, you need to know that the range built-in function generates a list
of successive integers (in 3.X only, it requires a list around it if you wish to display its
values all at once like this):
>>> list(range(3)) # list() required in Python 3.X only
[0, 1, 2]
This call was previewed briefly in Chapter 4; because range is commonly used in for
loops, we’ll say more about it in Chapter 13.
Another place you may see a tuple assignment at work is for splitting a sequence into
its front and the rest in loops like this:
>>> L = [1, 2, 3, 4]
>>> while L:
...     front, L = L[0], L[1:]     # See next section for 3.X * alternative
...     print(front, L)
...
1 [2, 3, 4]
2 [3, 4]
3 [4]
4 []
The tuple assignment in the loop here could be coded as the following two lines instead,
but it’s often more convenient to string them together:
...     front = L[0]
...     L = L[1:]
Notice that this code is using the list as a sort of stack data structure, which can often
also be achieved with the append and pop methods of list objects; here, front =
L.pop(0) would have much the same effect as the tuple assignment statement, but it
would be an in-place change. We’ll learn more about while loops, and other (often
better) ways to step through a sequence with for loops, in Chapter 13.
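Here is a sketch of that pop alternative—the same stepping loop, but shrinking the list in place instead of reslicing it:

    >>> L = [1, 2, 3, 4]
    >>> while L:
    ...     front = L.pop(0)           # In place: remove and return the front item
    ...     print(front, L)
    ...
    1 [2, 3, 4]
    2 [3, 4]
    3 [4]
    4 []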
Extended Sequence Unpacking in Python 3.X
The prior section demonstrated how to use manual slicing to make sequence assign-
ments more general. In Python 3.X (but not 2.X), sequence assignment has been gen-
eralized to make this easier. In short, a single starred name, *X, can be used in the
assignment target in order to specify a more general matching against the sequence—
the starred name is assigned a list, which collects all items in the sequence not assigned
to other names. This is especially handy for common coding patterns such as splitting
a sequence into its “front” and “rest,” as in the preceding section’s last example.
Extended unpacking in action
Let’s look at an example. As we’ve seen, sequence assignments normally require exactly
as many names in the target on the left as there are items in the subject on the right.
We get an error if the lengths disagree in both 2.X and 3.X (unless we manually sliced
on the right, as shown in the prior section):
C:\code> c:\python33\python
>>> seq = [1, 2, 3, 4]
>>> a, b, c, d = seq
>>> print(a, b, c, d)
1 2 3 4
>>> a, b = seq
ValueError: too many values to unpack (expected 2)
In Python 3.X, though, we can use a single starred name in the target to match more
generally. In the following continuation of our interactive session, a matches the first
item in the sequence, and b matches the rest:
>>> a, *b = seq
>>> a
1
>>> b
[2, 3, 4]
When a starred name is used, the number of items in the target on the left need not
match the length of the subject sequence. In fact, the starred name can appear anywhere
in the target. For instance, in the next interaction b matches the last item in the se-
quence, and a matches everything before the last:
>>> *a, b = seq
>>> a
[1, 2, 3]
>>> b
4
When the starred name appears in the middle, it collects everything between the other
names listed. Thus, in the following interaction a and c are assigned the first and last
items, and b gets everything in between them:
>>> a, *b, c = seq
>>> a
1
>>> b
[2, 3]
>>> c
4
More generally, wherever the starred name shows up, it will be assigned a list that
collects every unassigned name at that position:
>>> a, b, *c = seq
>>> a
1
>>> b
2
>>> c
[3, 4]
Naturally, like normal sequence assignment, extended sequence unpacking syntax
works for any sequence types (really, again, any iterable), not just lists. Here it is un-
packing characters in a string and a range (an iterable in 3.X):
>>> a, *b = 'spam'
>>> a, b
('s', ['p', 'a', 'm'])
>>> a, *b, c = 'spam'
>>> a, b, c
('s', ['p', 'a'], 'm')
>>> a, *b, c = range(4)
>>> a, b, c
(0, [1, 2], 3)
This is similar in spirit to slicing, but not exactly the same—a sequence unpacking
assignment always returns a list for multiple matched items, whereas slicing returns a
sequence of the same type as the object sliced:
>>> S = 'spam'
>>> S[0], S[1:] # Slices are type-specific, * assignment always returns a list
('s', 'pam')
>>> S[0], S[1:3], S[3]
('s', 'pa', 'm')
Given this extension in 3.X, as long as we’re processing a list the last example of the
prior section becomes even simpler, since we don’t have to manually slice to get the
first and rest of the items:
>>> L = [1, 2, 3, 4]
>>> while L:
...     front, *L = L              # Get first, rest without slicing
...     print(front, L)
...
1 [2, 3, 4]
2 [3, 4]
3 [4]
4 []
Boundary cases
Although extended sequence unpacking is flexible, some boundary cases are worth
noting. First, the starred name may match just a single item, but is always assigned a list:
>>> seq = [1, 2, 3, 4]
>>> a, b, c, *d = seq
>>> print(a, b, c, d)
1 2 3 [4]
Second, if there is nothing left to match the starred name, it is assigned an empty list,
regardless of where it appears. In the following, a, b, c, and d have matched every item
in the sequence, but Python assigns e an empty list instead of treating this as an error
case:
>>> a, b, c, d, *e = seq
>>> print(a, b, c, d, e)
1 2 3 4 []
>>> a, b, *e, c, d = seq
>>> print(a, b, c, d, e)
1 2 3 4 []
Finally, errors can still be triggered if there is more than one starred name, if there are
too few values and no star (as before), and if the starred name is not itself coded inside
a sequence:
>>> a, *b, c, *d = seq
SyntaxError: two starred expressions in assignment
>>> a, b = seq
ValueError: too many values to unpack (expected 2)
>>> *a = seq
SyntaxError: starred assignment target must be in a list or tuple
>>> *a, = seq
>>> a
[1, 2, 3, 4]
A useful convenience
Keep in mind that extended sequence unpacking assignment is just a convenience. We
can usually achieve the same effects with explicit indexing and slicing (and in fact must
in Python 2.X), but extended unpacking is simpler to code. The common “first, rest”
splitting coding pattern, for example, can be coded either way, but slicing involves extra
work:
>>> seq
[1, 2, 3, 4]
>>> a, *b = seq # First, rest
>>> a, b
(1, [2, 3, 4])
>>> a, b = seq[0], seq[1:] # First, rest: traditional
>>> a, b
(1, [2, 3, 4])
The also-common “rest, last” splitting pattern can similarly be coded either way, but
the new extended unpacking syntax requires noticeably fewer keystrokes:
>>> *a, b = seq # Rest, last
>>> a, b
([1, 2, 3], 4)
>>> a, b = seq[:-1], seq[-1] # Rest, last: traditional
>>> a, b
([1, 2, 3], 4)
Because it is not only simpler but, arguably, more natural, extended sequence unpack-
ing syntax will likely become widespread in Python code over time.
Application to for loops
Because the loop variable in the for loop statement can be any assignment target, ex-
tended sequence assignment works here too. We met the for loop iteration tool briefly
in Chapter 4 and will study it formally in Chapter 13. In Python 3.X, extended assign-
ments may show up after the word for, where a simple variable name is more commonly
used:
for (a, *b, c) in [(1, 2, 3, 4), (5, 6, 7, 8)]:
...
When used in this context, on each iteration Python simply assigns the next tuple of
values to the tuple of names. On the first loop, for example, it’s as if we’d run the
following assignment statement:
a, *b, c = (1, 2, 3, 4) # b gets [2, 3]
The names a, b, and c can be used within the loop’s code to reference the extracted
components. In fact, this is really not a special case at all, but just an instance of general
assignment at work. As we saw earlier in this chapter, we can do the same thing with
simple tuple assignment in both Python 2.X and 3.X:
for (a, b, c) in [(1, 2, 3), (4, 5, 6)]: # a, b, c = (1, 2, 3), ...
And we can always emulate 3.X’s extended assignment behavior in 2.X by manually
slicing:
for all in [(1, 2, 3, 4), (5, 6, 7, 8)]:
    a, b, c = all[0], all[1:3], all[3]
Since we haven’t learned enough to get more detailed about the syntax of for loops,
we’ll return to this topic in Chapter 13.
Multiple-Target Assignments
A multiple-target assignment simply assigns all the given names to the object all the
way to the right. The following, for example, assigns the three variables a, b, and c to
the string 'spam':
>>> a = b = c = 'spam'
>>> a, b, c
('spam', 'spam', 'spam')
This form is equivalent to (but easier to code than) these three assignments:
>>> c = 'spam'
>>> b = c
>>> a = b
Multiple-target assignment and shared references
Keep in mind that there is just one object here, shared by all three variables (they all
wind up pointing to the same object in memory). This behavior is fine for immutable
types—for example, when initializing a set of counters to zero (recall that variables
must be assigned before they can be used in Python, so you must initialize counters to
zero before you can start adding to them):
>>> a = b = 0
>>> b = b + 1
>>> a, b
(0, 1)
Here, changing b only changes b because numbers do not support in-place changes. As
long as the object assigned is immutable, it’s irrelevant if more than one name references
it.
As usual, though, we have to be more cautious when initializing variables to an empty
mutable object such as a list or dictionary:
>>> a = b = []
>>> b.append(42)
>>> a, b
([42], [42])
This time, because a and b reference the same object, appending to it in place through
b will impact what we see through a as well. This is really just another example of the
shared reference phenomenon we first met in Chapter 6. To avoid the issue, initialize
mutable objects in separate statements instead, so that each creates a distinct empty
object by running a distinct literal expression:
>>> a = []
>>> b = [] # a and b do not share the same object
>>> b.append(42)
>>> a, b
([], [42])
A tuple assignment like the following has the same effect—by running two list expres-
sions, it creates two distinct objects:
>>> a, b = [], [] # a and b do not share the same object
Augmented Assignments
Beginning with Python 2.0, the set of additional assignment statement formats listed
in Table 11-2 became available. Known as augmented assignments, and borrowed from
the C language, these formats are mostly just shorthand. They imply the combination
of a binary expression and an assignment. For instance, the following two formats are
roughly equivalent:
X = X + Y # Traditional form
X += Y # Newer augmented form
Table 11-2. Augmented assignment statements
X += Y      X &= Y      X -= Y      X |= Y
X *= Y      X ^= Y      X /= Y      X >>= Y
X %= Y      X <<= Y     X **= Y     X //= Y
Augmented assignment works on any type that supports the implied binary expression.
For example, here are two ways to add 1 to a name:
>>> x = 1
>>> x = x + 1 # Traditional
>>> x
2
>>> x += 1 # Augmented
>>> x
3
When applied to a sequence such as a string, the augmented form performs concate-
nation instead. Thus, the second line here is equivalent to typing the longer S = S +
"SPAM":
>>> S = "spam"
>>> S += "SPAM" # Implied concatenation
>>> S
'spamSPAM'
As shown in Table 11-2, there are analogous augmented assignment forms for every
Python binary expression operator (i.e., each operator with values on the left and right
side). For instance, X *= Y multiplies and assigns, X >>= Y shifts right and assigns, and
so on. X //= Y (for floor division) was added in version 2.2.
Augmented assignments have three advantages:1
There’s less for you to type. Need I say more?
The left side has to be evaluated only once. In X += Y, X may be a complicated object
expression. In the augmented form, its code must be run only once. However, in the
long form, X = X + Y, X appears twice and must be run twice. Because of this,
augmented assignments usually run faster.
The optimal technique is automatically chosen. That is, for objects that support
in-place changes, the augmented forms automatically perform in-place change
operations instead of slower copies.
1. C/C++ programmers take note: although Python now supports statements like X += Y, it still does not
have C’s auto-increment/decrement operators (e.g., X++, --X). These don’t quite map to the Python object
model because Python has no notion of in-place changes to immutable objects like numbers.
The last point here requires a bit more explanation. For augmented assignments, in-
place operations may be applied for mutable objects as an optimization. Recall that
lists can be extended in a variety of ways. To add a single item to the end of a list, we
can concatenate or call append:
>>> L = [1, 2]
>>> L = L + [3] # Concatenate: slower
>>> L
[1, 2, 3]
>>> L.append(4) # Faster, but in place
>>> L
[1, 2, 3, 4]
And to add a set of items to the end, we can either concatenate again or call the list
extend method:2
>>> L = L + [5, 6] # Concatenate: slower
>>> L
[1, 2, 3, 4, 5, 6]
>>> L.extend([7, 8]) # Faster, but in place
>>> L
[1, 2, 3, 4, 5, 6, 7, 8]
In both cases, concatenation is less prone to the side effects of shared object references
but will generally run slower than the in-place equivalent. Concatenation operations
must create a new object, copy in the list on the left, and then copy in the list on the
right. By contrast, in-place method calls simply add items at the end of a memory block
(it can be a bit more complicated than that internally, but this description suffices).
When we use augmented assignment to extend a list, we can largely forget these details
—Python automatically calls the quicker extend method instead of using the slower
concatenation operation implied by +:
>>> L += [9, 10] # Mapped to L.extend([9, 10])
>>> L
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Note, however, that because of this equivalence, += for a list is not exactly the same
as + and = in all cases—for lists, += allows arbitrary sequences (just like extend), but
concatenation normally does not:
>>> L = []
>>> L += 'spam' # += and extend allow any sequence, but + does not!
>>> L
['s', 'p', 'a', 'm']
>>> L = L + 'spam'
TypeError: can only concatenate list (not "str") to list
2. As suggested in Chapter 6, we can also use slice assignment (e.g., L[len(L):] = [11,12,13]), but this
works roughly the same as the simpler and more mnemonic list extend method.
Augmented assignment and shared references
This behavior is usually what we want, but notice that it implies that the += is an in-
place change for lists; thus, it is not exactly like + concatenation, which always makes
a new object. As for all shared reference cases, this difference might matter if other
names reference the object being changed:
>>> L = [1, 2]
>>> M = L # L and M reference the same object
>>> L = L + [3, 4] # Concatenation makes a new object
>>> L, M # Changes L but not M
([1, 2, 3, 4], [1, 2])
>>> L = [1, 2]
>>> M = L
>>> L += [3, 4] # But += really means extend
>>> L, M # M sees the in-place change too!
([1, 2, 3, 4], [1, 2, 3, 4])
This only matters for mutables like lists and dictionaries, and it is a fairly obscure case
(at least, until it impacts your code!). As always, make copies of your mutable objects
if you need to break the shared reference structure.
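For instance, slicing out a top-level copy before the in-place change keeps the second name independent:

    >>> L = [1, 2]
    >>> M = L[:]                       # Copy L's items: M no longer shares the object
    >>> L += [3, 4]                    # In-place change affects L only
    >>> L, M
    ([1, 2, 3, 4], [1, 2])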
Variable Name Rules
Now that we’ve explored assignment statements, it’s time to get more formal about the
use of variable names. In Python, names come into existence when you assign values
to them, but there are a few rules to follow when choosing names for the subjects of
your programs:
Syntax: (underscore or letter) + (any number of letters, digits, or underscores)
Variable names must start with an underscore or letter, which can be followed by
any number of letters, digits, or underscores. _spam, spam, and Spam_1 are legal
names, but 1_Spam, spam$, and @#! are not.
Case matters: SPAM is not the same as spam
Python always pays attention to case in programs, both in names you create and
in reserved words. For instance, the names X and x refer to two different variables.
For portability, case also matters in the names of imported module files, even on
platforms where the filesystems are case-insensitive. That way, your imports still
work after programs are copied to differing platforms.
Reserved words are off-limits
Names you define cannot be the same as words that mean special things in the
Python language. For instance, if you try to use a variable name like class, Python
will raise a syntax error, but klass and Class work fine. Table 11-3 lists the words
that are currently reserved (and hence off-limits for names of your own) in Python.
Table 11-3. Python 3.X reserved words
False class finally is return
None continue for lambda try
True def from nonlocal while
and del global not with
as elif if or yield
assert else import pass
break except in raise
Table 11-3 is specific to Python 3.X. In Python 2.X, the set of reserved words differs
slightly:
print is a reserved word, because printing is a statement, not a built-in function
(more on this later in this chapter).
exec is a reserved word, because it is a statement, not a built-in function.
nonlocal is not a reserved word because this statement is not available.
In older Pythons the story is also more or less the same, with a few variations:
with and as were not reserved until 2.6, when context managers were officially
enabled.
yield was not reserved until Python 2.3, when generator functions came online.
yield morphed from statement to expression in 2.5, but it’s still a reserved word,
not a built-in function.
As you can see, most of Python’s reserved words are all lowercase. They are also all
truly reserved—unlike names in the built-in scope that you will meet in the next part
of this book, you cannot redefine reserved words by assignment (e.g., and = 1 results
in a syntax error).3
Besides being of mixed case, the first three entries in Table 11-3, True, False, and
None, are somewhat unusual in meaning—they also appear in the built-in scope of
Python described in Chapter 17, and they are technically names assigned to objects. In
3.X they are truly reserved in all other senses, though, and cannot be used for any other
purpose in your script other than that of the objects they represent. All the other re-
served words are hardwired into Python’s syntax and can appear only in the specific
contexts for which they are intended.
3. In standard CPython, at least. Alternative implementations of Python might allow user-defined variable
names to be the same as Python reserved words. See Chapter 2 for an overview of alternative
implementations, such as Jython.
Furthermore, because module names in import statements become variables in your
scripts, variable name constraints extend to your module filenames too. For instance,
you can code files called and.py and my-code.py and run them as top-level scripts, but
you cannot import them: their names without the “.py” extension become variables in
your code and so must follow all the variable rules just outlined. Reserved words are
off-limits, and dashes won’t work, though underscores will. We’ll revisit this module
idea in Part V of this book.
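A quick sketch of this constraint—both failures happen at the syntax level, before Python even looks for a file:

    >>> import and                     # Reserved word: illegal as a module name
    SyntaxError: invalid syntax
    >>> import my-code                 # Dashes don't work in names (underscores do)
    SyntaxError: invalid syntax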
Python’s Deprecation Protocol
It is interesting to note how reserved word changes are gradually phased into the lan-
guage. When a new feature might break existing code, Python normally makes it an
option and begins issuing “deprecation” warnings one or more releases before the fea-
ture is officially enabled. The idea is that you should have ample time to notice the
warnings and update your code before migrating to the new release. This is not true
for major new releases like 3.0 (which breaks existing code freely), but it is generally
true in other cases.
For example, yield was an optional extension in Python 2.2, but is a standard keyword
as of 2.3. It is used in conjunction with generator functions. This was one of a small
handful of instances where Python broke with backward compatibility. Still, yield was
phased in over time: it began generating deprecation warnings in 2.2 and was not en-
abled until 2.3.
Similarly, in Python 2.6, the words with and as became new reserved words for use in
context managers (a newer form of exception handling). These two words are not
reserved in 2.5, unless the context manager feature is turned on manually with a
from __future__ import (discussed later in this book). When used in 2.5, with and as
generate warnings about the upcoming change—except in the version of IDLE in
Python 2.5, which appears to have enabled this feature for you (that is, using these
words as variable names does generate errors in 2.5, but only in its version of the IDLE
GUI).
Naming conventions
Besides these rules, there is also a set of naming conventions—rules that are not required
but are followed in normal practice. For instance, because names with two leading and
trailing underscores (e.g., __name__) generally have special meaning to the Python in-
terpreter, you should avoid this pattern for your own names. Here is a list of the con-
ventions Python follows:
Names that begin with a single underscore (_X) are not imported by a from module
import * statement (described in Chapter 23).
Names that have two leading and trailing underscores (__X__) are system-defined
names that have special meaning to the interpreter.
Names that begin with two underscores and do not end with two more (__X) are
localized (“mangled”) to enclosing classes (see the discussion of pseudoprivate
attributes in Chapter 31).
The name that is just a single underscore (_) retains the result of the last expression
when you are working interactively.
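The last of these conventions is easy to see live at the interactive prompt:

    >>> 2 ** 8
    256
    >>> _ + 1                          # _ retains the last expression's result
    257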
In addition to these Python interpreter conventions, there are various other conventions
that Python programmers usually follow. For instance, later in the book we’ll see that
class names commonly start with an uppercase letter and module names with a low-
ercase letter, and that the name self, though not reserved, usually has a special role in
classes. In Chapter 17 we’ll also study another, larger category of names known as the
built-ins, which are predefined but not reserved (and so can be reassigned: open = 42
works, though sometimes you might wish it didn’t!).
Names have no type, but objects do
This is mostly review, but remember that it’s crucial to keep Python’s distinction be-
tween names and objects clear. As described in Chapter 6, objects have a type (e.g.,
integer, list) and may be mutable or not. Names (a.k.a. variables), on the other hand,
are always just references to objects; they have no notion of mutability and have no
associated type information, apart from the type of the object they happen to reference
at a given point in time.
Thus, it’s OK to assign the same name to different kinds of objects at different times:
>>> x = 0 # x bound to an integer object
>>> x = "Hello" # Now it's a string
>>> x = [1, 2, 3] # And now it's a list
In later examples, you’ll see that this generic nature of names can be a decided advantage
in Python programming. In Chapter 17, you’ll also learn that names also live in some-
thing called a scope, which defines where they can be used; the place where you assign
a name determines where it is visible.4
4. If you’ve used a more restrictive language like C++, you may be interested to know that there is no notion
of C++’s const declaration in Python; certain objects may be immutable, but names can always be
assigned. Python also has ways to hide names in classes and modules, but they’re not the same as C++’s
declarations (if hiding attributes matters to you, see the coverage of _X module names in Chapter 25,
__X class names in Chapter 31, and the Private and Public class decorators example in Chapter 39).
For additional naming suggestions, see the discussion of naming con-
ventions in Python’s semi-official style guide, known as PEP 8. This
guide is available at http://www.python.org/dev/peps/pep-0008, or via a
web search for “Python PEP 8.” Technically, this document formalizes
coding standards for Python library code.
Though useful, the usual caveats about coding standards apply here.
For one thing, PEP 8 comes with more detail than you are probably ready
for at this point in the book. And frankly, it has become more complex,
rigid, and subjective than it may need to be—some of its suggestions
are not at all universally accepted or followed by Python programmers
doing real work. Moreover, some of the most prominent companies
using Python today have adopted coding standards of their own that
differ.
PEP 8 does codify useful rule-of-thumb Python knowledge, though, and
it’s a great read for Python beginners, as long as you take its recom-
mendations as guidelines, not gospel.
Expression Statements
In Python, you can use an expression as a statement, too—that is, on a line by itself.
But because the result of the expression won’t be saved, it usually makes sense to do
so only if the expression does something useful as a side effect. Expressions are com-
monly used as statements in two situations:
For calls to functions and methods
Some functions and methods do their work without returning a value. Such func-
tions are sometimes called procedures in other languages. Because they don’t return
values that you might be interested in retaining, you can call these functions with
expression statements.
For printing values at the interactive prompt
Python echoes back the results of expressions typed at the interactive command
line. Technically, these are expression statements, too; they serve as a shorthand
for typing print statements.
Table 11-4 lists some common expression statement forms in Python. Calls to functions
and methods are coded with zero or more argument objects (really, expressions that
evaluate to objects) in parentheses, after the function/method name.
Table 11-4. Common Python expression statements
Operation                   Interpretation
spam(eggs, ham)             Function calls
spam.ham(eggs)              Method calls
spam                        Printing variables in the interactive interpreter
print(a, b, c, sep='')      Printing operations in Python 3.X
yield x ** 2                Yielding expression statements
The last two entries in Table 11-4 are somewhat special cases—as we’ll see later in this
chapter, printing in Python 3.X is a function call usually coded on a line by itself, and
the yield operation in generator functions (discussed in Chapter 20) is often coded as
a statement as well. Both are really just instances of expression statements.
For instance, though you normally run a 3.X print call on a line by itself as an expression
statement, it returns a value like any other function call (its return value is None, the
default return value for functions that don’t return anything meaningful):
>>> x = print('spam') # print is a function call expression in 3.X
spam
>>> print(x) # But it is coded as an expression statement
None
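The yield entry in Table 11-4 works the same way—here is a quick sketch with a hypothetical generator function (generators are covered in Chapter 20):

    >>> def squares(N):
    ...     for i in range(N):
    ...         yield i ** 2           # An expression coded as a statement
    ...
    >>> list(squares(4))
    [0, 1, 4, 9]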
Also keep in mind that although expressions can appear as statements in Python, state-
ments cannot be used as expressions. A statement that is not an expression must gen-
erally appear on a line all by itself, not nested in a larger syntactic structure. For example,
Python doesn’t allow you to embed assignment statements (=) in other expressions.
The rationale for this is that it avoids common coding mistakes; you can’t accidentally
change a variable by typing = when you really mean to use the == equality test. You’ll
see how to code around this restriction when you meet the Python while loop in Chap-
ter 13.
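For instance, the typo this rule is designed to catch fails loudly, instead of silently rebinding the variable:

    >>> x = 1
    >>> if x = 2: print('spam')        # = is rejected where == was meant
    SyntaxError: invalid syntax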
Expression Statements and In-Place Changes
This brings up another mistake that is common in Python work. Expression statements
are often used to run list methods that change a list in place:
>>> L = [1, 2]
>>> L.append(3) # Append is an in-place change
>>> L
[1, 2, 3]
However, it’s not unusual for Python newcomers to code such an operation as an as-
signment statement instead, intending to assign L to the larger list:
>>> L = L.append(4) # But append returns None, not L
>>> print(L) # So we lose our list!
None
This doesn’t quite work, though. Calling an in-place change operation such as append,
sort, or reverse on a list always changes the list in place, but these methods do not
return the list they have changed; instead, they return the None object. Thus, if you
assign such an operation’s result back to the variable name, you effectively lose the list
(and it is probably garbage-collected in the process!).
The moral of the story is, don’t do this—call in-place change operations without as-
signing their results. We’ll revisit this phenomenon in the section “Common Coding
Gotchas” on page 463 because it can also appear in the context of some looping state-
ments we’ll meet in later chapters.
Print Operations
In Python, print prints things—it’s simply a programmer-friendly interface to the stan-
dard output stream.
Technically, printing converts one or more objects to their textual representations, adds
some minor formatting, and sends the resulting text to either standard output or an-
other file-like stream. In a bit more detail, print is strongly bound up with the notions
of files and streams in Python:
File object methods
In Chapter 9, we learned about file object methods that write text (e.g.,
file.write(str)). Printing operations are similar, but more focused—whereas file
write methods write strings to arbitrary files, print writes objects to the stdout
stream by default, with some automatic formatting added. Unlike with file meth-
ods, there is no need to convert objects to strings when using print operations.
Standard output stream
The standard output stream (often known as stdout) is simply a default place to
send a program’s text output. Along with the standard input and error streams,
it’s one of three data connections created when your script starts. The standard
output stream is usually mapped to the window where you started your Python
program, unless it’s been redirected to a file or pipe in your operating system’s shell.
Because the standard output stream is available in Python as the stdout file object
in the built-in sys module (i.e., sys.stdout), it’s possible to emulate print with file
write method calls. However, print is noticeably easier to use and makes it easy to
print text to other files and streams.
Printing is also one of the most visible places where Python 3.X and 2.X have diverged.
In fact, this divergence is usually the first reason that most 2.X code won’t run un-
changed under 3.X. Specifically, the way you code print operations depends on which
version of Python you use:
In Python 3.X, printing is a built-in function, with keyword arguments for special
modes.
In Python 2.X, printing is a statement with specific syntax all its own.
Because this book covers both 3.X and 2.X, we will look at each form in turn here. If
you are fortunate enough to be able to work with code written for just one version of
Python, feel free to pick the section that is relevant to you. Because your needs may
change, however, it probably won’t hurt to be familiar with both cases. Moreover, users
of recent Python 2.X releases can also import and use 3.X’s flavor of printing in their
Pythons if desired—both for its extra functionality and to ease future migration to 3.X.
The Python 3.X print Function
Strictly speaking, printing is not a separate statement form in 3.X. Instead, it is simply
an instance of the expression statement we studied in the preceding section.
The print built-in function is normally called on a line of its own, because it doesn’t
return any value we care about (technically, it returns None, as we saw in the preceding
section). Because it is a normal function, though, printing in 3.X uses standard function-
call syntax, rather than a special statement form. And because it provides special op-
eration modes with keyword arguments, this form is both more general and supports
future enhancements better.
By comparison, Python 2.X print statements have somewhat ad hoc syntax to support
extensions such as end-of-line suppression and target files. Further, the 2.X statement
does not support separator specification at all; in 2.X, you wind up building strings
ahead of time more often than you do in 3.X. Rather than adding yet more ad hoc
syntax, Python 3.X’s print takes a single, general approach that covers them all.
Call format
Syntactically, calls to the 3.X print function have the following form (the flush argu-
ment is new as of Python 3.3):
print([object, ...][, sep=' '][, end='\n'][, file=sys.stdout][, flush=False])
In this formal notation, items in square brackets are optional and may be omitted in a
given call, and values after = give argument defaults. In English, this built-in function
prints the textual representation of one or more objects separated by the string sep and
followed by the string end to the stream file, flushing buffered output or not per flush.
The sep, end, file, and (in 3.3 and later) flush parts, if present, must be given as keyword
arguments—that is, you must use a special “name=value” syntax to pass the arguments
by name instead of position. Keyword arguments are covered in depth in Chapter 18,
but they’re straightforward to use. The keyword arguments sent to this call may appear
in any left-to-right order following the objects to be printed, and they control the
print operation:
sep is a string inserted between each object’s text, which defaults to a single space
if not passed; passing an empty string suppresses separators altogether.
end is a string added at the end of the printed text, which defaults to a \n newline
character if not passed. Passing an empty string avoids dropping down to the next
output line at the end of the printed text—the next print will keep adding to the
end of the current output line.
file specifies the file, standard stream, or other file-like object to which the text
will be sent; it defaults to the sys.stdout standard output stream if not passed. Any
object with a file-like write(string) method may be passed, but real files should
be already opened for output.
flush, added in 3.3, defaults to False. It allows prints to mandate that their text be
flushed through the output stream immediately to any waiting recipients. Nor-
mally, whether printed output is buffered in memory or not is determined by
file; passing a true value to flush forcibly flushes the stream.
The textual representation of each object to be printed is obtained by passing the object
to the str built-in call (or its equivalent inside Python); as we’ve seen, this built-in
returns a “user friendly” display string for any object.5 With no arguments at all, the
print function simply prints a newline character to the standard output stream, which
usually displays a blank line.
5. Technically, printing uses the equivalent of str in the internal implementation of Python, but the effect
is the same. Besides this to-string conversion role, str is also the name of the string data type and can be
used to decode Unicode strings from raw bytes with an extra encoding argument, as we’ll learn in
Chapter 37; this latter role is an advanced usage that we can safely ignore here.
The 3.X print function in action
Printing in 3.X is probably simpler than some of its details may imply. To illustrate,
let’s run some quick examples. The following prints a variety of object types to the
default standard output stream, with the default separator and end-of-line formatting
added (these are the defaults because they are the most common use case):
C:\code> c:\python33\python
>>> print() # Display a blank line
>>> x = 'spam'
>>> y = 99
>>> z = ['eggs']
>>>
>>> print(x, y, z) # Print three objects per defaults
spam 99 ['eggs']
There’s no need to convert objects to strings here, as would be required for file write
methods. By default, print calls add a space between the objects printed. To suppress
this, send an empty string to the sep keyword argument, or send an alternative separator
of your choosing:
>>> print(x, y, z, sep='') # Suppress separator
spam99['eggs']
>>>
>>> print(x, y, z, sep=', ') # Custom separator
spam, 99, ['eggs']
Also by default, print adds an end-of-line character to terminate the output line. You
can suppress this and avoid the line break altogether by passing an empty string to the
end keyword argument, or you can pass a different terminator of your own including a
\n character to break the line manually if desired (the second of the following is two
statements on one line, separated by a semicolon):
>>> print(x, y, z, end='') # Suppress line break
spam 99 ['eggs']>>>
>>>
>>> print(x, y, z, end=''); print(x, y, z) # Two prints, same output line
spam 99 ['eggs']spam 99 ['eggs']
>>> print(x, y, z, end='...\n') # Custom line end
spam 99 ['eggs']...
>>>
You can also combine keyword arguments to specify both separators and end-of-line
strings—they may appear in any order but must appear after all the objects being
printed:
>>> print(x, y, z, sep='...', end='!\n') # Multiple keywords
spam...99...['eggs']!
>>> print(x, y, z, end='!\n', sep='...') # Order doesn't matter
spam...99...['eggs']!
Here is how the file keyword argument is used—it directs the printed text to an open
output file or other compatible object for the duration of the single print (this is really
a form of stream redirection, a topic we will revisit later in this section):
>>> print(x, y, z, sep='...', file=open('data.txt', 'w')) # Print to a file
>>> print(x, y, z) # Back to stdout
spam 99 ['eggs']
>>> print(open('data.txt').read()) # Display file text
spam...99...['eggs']
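Because any object with a write method will do, the printed text doesn’t have to go to a real file at all. Here’s a small sketch that captures it in an in-memory io.StringIO buffer from the standard library (an illustration of the file-like rule, not a form used elsewhere in this chapter):

    >>> import io
    >>> buffer = io.StringIO()                       # Any object with a write method
    >>> print(x, y, z, file=buffer)
    >>> buffer.getvalue()
    "spam 99 ['eggs']\n"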
Finally, keep in mind that the separator and end-of-line options provided by print op-
erations are just conveniences. If you need to display more specific formatting, don’t
print this way. Instead, build up a more complex string ahead of time or within the
print itself using the string tools we met in Chapter 7, and print the string all at once:
>>> text = '%s: %-.4f, %05d' % ('Result', 3.14159, 42)
>>> print(text)
Result: 3.1416, 00042
>>> print('%s: %-.4f, %05d' % ('Result', 3.14159, 42))
Result: 3.1416, 00042
As we’ll see in the next section, almost everything we’ve just seen about the 3.X
print function also applies directly to 2.X print statements—which makes sense, given
that the function was intended to both emulate and improve upon 2.X printing support.
The Python 2.X print Statement
As mentioned earlier, printing in Python 2.X uses a statement with unique and specific
syntax, rather than a built-in function. In practice, though, 2.X printing is mostly a
variation on a theme; with the exception of separator strings (which are supported in
3.X but not 2.X) and flushes on prints (available as of 3.3 only), everything we can do
with the 3.X print function has a direct translation to the 2.X print statement.
Statement forms
Table 11-5 lists the print statement’s forms in Python 2.X and gives their Python 3.X
print function equivalents for reference. Notice that the comma is significant in
print statements—it separates objects to be printed, and a trailing comma suppresses
the end-of-line character normally added at the end of the printed text (not to be con-
fused with tuple syntax!). The >> syntax, normally used as a bitwise right-shift opera-
tion, is used here as well, to specify a target output stream other than the sys.stdout
default.
Table 11-5. Python 2.X print statement forms
Python 2.X statement      Python 3.X equivalent        Interpretation
print x, y                print(x, y)                  Print objects’ textual forms to sys.stdout;
                                                       add a space between the items and an
                                                       end-of-line at the end
print x, y,               print(x, y, end='')          Same, but don’t add end-of-line at end of text
print >> afile, x, y      print(x, y, file=afile)      Send text to afile.write, not to
                                                       sys.stdout.write
The 2.X print statement in action
Although the 2.X print statement has more unique syntax than the 3.X function, it’s
similarly easy to use. Let’s turn to some basic examples again. The 2.X print statement
adds a space between the items separated by commas and by default adds a line break
at the end of the current output line:
C:\code> c:\python27\python
>>> x = 'a'
>>> y = 'b'
>>> print x, y
a b
This formatting is just a default; you can choose to use it or not. To suppress the line
break so you can add more text to the current line later, end your print statement with
a comma, as shown in the second line of Table 11-5 (the following uses a semicolon to
separate two statements on one line again):
>>> print x, y,; print x, y
a b a b
To suppress the space between items, again, don’t print this way. Instead, build up an
output string using the string concatenation and formatting tools covered in Chap-
ter 7, and print the string all at once:
>>> print x + y
ab
>>> print '%s...%s' % (x, y)
a...b
As you can see, apart from their special syntax for usage modes, 2.X print statements
are roughly as simple to use as 3.X’s function. The next section uncovers the way that
files are specified in 2.X prints.
Print Stream Redirection
In both Python 3.X and 2.X, printing sends text to the standard output stream by
default. However, it’s often useful to send it elsewhere—to a text file, for example, to
save results for later use or testing purposes. Although such redirection can be accom-
plished in system shells outside Python itself, it turns out to be just as easy to redirect
a script’s streams from within the script.
The Python “hello world” program
Let’s start off with the usual (and largely pointless) language benchmark—the “hello
world” program. To print a “hello world” message in Python, simply print the string
per your version’s print operation:
>>> print('hello world') # Print a string object in 3.X
hello world
>>> print 'hello world' # Print a string object in 2.X
hello world
Because expression results are echoed on the interactive command line, you often don’t
even need to use a print statement there—simply type the expressions you’d like to
have printed, and their results are echoed back:
>>> 'hello world' # Interactive echoes
'hello world'
This code isn’t exactly an earth-shattering piece of software mastery, but it serves to
illustrate printing behavior. Really, the print operation is just an ergonomic feature of
Python—it provides a simple interface to the sys.stdout object, with a bit of default
formatting. In fact, if you enjoy working harder than you must, you can also code print
operations this way:
>>> import sys # Printing the hard way
>>> sys.stdout.write('hello world\n')
hello world
This code explicitly calls the write method of sys.stdout—an attribute preset when
Python starts up to an open file object connected to the output stream. The print
operation hides most of those details, providing a simple tool for simple printing tasks.
Manual stream redirection
So, why did I just show you the hard way to print? The sys.stdout print equivalent
turns out to be the basis of a common technique in Python. In general, print and
sys.stdout are directly related as follows. This statement:
print(X, Y) # Or, in 2.X: print X, Y
is equivalent to the longer:
import sys
sys.stdout.write(str(X) + ' ' + str(Y) + '\n')
which manually performs a string conversion with str, adds a separator and newline
with +, and calls the output stream’s write method. Which would you rather code? (He
says, hoping to underscore the programmer-friendly nature of prints...)
Obviously, the long form isn’t all that useful for printing by itself. However, it is useful
to know that this is exactly what print operations do because it is possible to reas-
sign sys.stdout to something different from the standard output stream. In other words,
this equivalence provides a way of making your print operations send their text to other
places. For example:
import sys
sys.stdout = open('log.txt', 'a') # Redirects prints to a file
...
print(x, y, x) # Shows up in log.txt
Here, we reset sys.stdout to a manually opened file named log.txt, located in the script’s
working directory and opened in append mode (so we add to its current content). After
the reset, every print operation anywhere in the program will write its text to the end
of the file log.txt instead of to the original output stream. The print operations are
happy to keep calling sys.stdout’s write method, no matter what sys.stdout happens
to refer to. Because there is just one sys module in your process, assigning
sys.stdout this way will redirect every print anywhere in your program.
In fact, as the sidebar “Why You Will Care: print and stdout” on page 368 will explain,
you can even reset sys.stdout to an object that isn’t a file at all, as long as it has the
expected interface: a method named write to receive the printed text string argument.
When that object is an instance of a class, printed text can be routed and processed arbitrarily per a
write method you code yourself.
This trick of resetting the output stream might be more useful for programs originally
coded with print statements. If you know that output should go to a file to begin with,
you can always call file write methods instead. To redirect the output of a print-based
program, though, resetting sys.stdout provides a convenient alternative to changing
every print statement or using system shell-based redirection syntax.
In other roles, streams may be reset to objects that display them in pop-up windows in
GUIs, colorize them in IDEs like IDLE, and so on. It's a general technique.
Automatic stream redirection
Although redirecting printed text by assigning sys.stdout is a useful tool, a potential
problem with the last section’s code is that there is no direct way to restore the original
output stream should you need to switch back after printing to a file. Because
sys.stdout is just a normal file object, though, you can always save it and restore it if
needed:6
C:\code> c:\python33\python
>>> import sys
>>> temp = sys.stdout # Save for restoring later
>>> sys.stdout = open('log.txt', 'a') # Redirect prints to a file
>>> print('spam') # Prints go to file, not here
>>> print(1, 2, 3)
>>> sys.stdout.close() # Flush output to disk
>>> sys.stdout = temp # Restore original stream
>>> print('back here') # Prints show up here again
back here
>>> print(open('log.txt').read()) # Result of earlier prints
spam
1 2 3
As you can see, though, manual saving and restoring of the original output stream like
this involves quite a bit of extra work. Because this crops up fairly often, a print ex-
tension is available to make it unnecessary.
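A related convenience arrived after the Python 3.3 coverage of this edition: as of Python 3.4, the standard library's contextlib.redirect_stdout context manager performs the save-and-restore for you. A minimal sketch of its use (see the contextlib documentation for details):

import contextlib

with open('log.txt', 'a') as log:
    with contextlib.redirect_stdout(log):
        print('this goes to the file')       # Redirected inside the with block
print('back on the original stream')         # sys.stdout restored automatically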
In 3.X, the file keyword allows a single print call to send its text to the write method
of a file (or file-like object), without actually resetting sys.stdout. Because the redirec-
tion is temporary, normal print calls keep printing to the original output stream. In
2.X, a print statement that begins with a >> followed by an output file object (or other
compatible object) has the same effect. For example, the following again sends printed
text to a file named log.txt:
log = open('log.txt', 'a') # 3.X
print(x, y, z, file=log) # Print to a file-like object
print(a, b, c) # Print to original stdout
log = open('log.txt', 'a') # 2.X
print >> log, x, y, z # Print to a file-like object
print a, b, c # Print to original stdout
These redirected forms of print are handy if you need to print to both files and the
standard output stream in the same program. If you use these forms, however, be sure
to give them a file object (or an object that has the same write method as a file object),
not a file’s name string. Here is the technique in action:
C:\code> c:\python33\python
>>> log = open('log.txt', 'w')
>>> print(1, 2, 3, file=log) # For 2.X: print >> log, 1, 2, 3
>>> print(4, 5, 6, file=log)
>>> log.close()
>>> print(7, 8, 9) # For 2.X: print 7, 8, 9
7 8 9
>>> print(open('log.txt').read())
1 2 3
4 5 6

6. In both 2.X and 3.X you may also be able to use the __stdout__ attribute in the sys module, which refers
to the original value sys.stdout had at program startup time. You still need to restore sys.stdout to
sys.__stdout__ to go back to this original stream value, though. See the sys module documentation for
more details.
These extended forms of print are also commonly used to print error messages to the
standard error stream, available to your script as the preopened file object
sys.stderr. You can either use its file write methods and format the output manually,
or print with redirection syntax:
>>> import sys
>>> sys.stderr.write(('Bad!' * 8) + '\n')
Bad!Bad!Bad!Bad!Bad!Bad!Bad!Bad!
>>> print('Bad!' * 8, file=sys.stderr) # In 2.X: print >> sys.stderr, 'Bad!' * 8
Bad!Bad!Bad!Bad!Bad!Bad!Bad!Bad!
Now that you know all about print redirections, the equivalence between printing and
file write methods should be fairly obvious. The following interaction prints both ways
in 3.X, then redirects the output to an external file to verify that the same text is printed:
>>> X = 1; Y = 2
>>> print(X, Y) # Print: the easy way
1 2
>>> import sys # Print: the hard way
>>> sys.stdout.write(str(X) + ' ' + str(Y) + '\n')
1 2
4
>>> print(X, Y, file=open('temp1', 'w')) # Redirect text to file
>>> open('temp2', 'w').write(str(X) + ' ' + str(Y) + '\n') # Send to file manually
4
>>> print(open('temp1', 'rb').read()) # Binary mode for bytes
b'1 2\r\n'
>>> print(open('temp2', 'rb').read())
b'1 2\r\n'
As you can see, unless you happen to enjoy typing, print operations are usually the best
option for displaying text. For another example of the equivalence between prints and
file writes, watch for a 3.X print function emulation example in Chapter 18; it uses this
code pattern to provide a general 3.X print function equivalent for use in Python 2.X.
Version-Neutral Printing
Finally, if you need your prints to work on both Python lines (2.X and 3.X), you have some options.
This is true whether you’re writing 2.X code that strives for 3.X compatibility, or 3.X
code that aims to support 2.X too.
2to3 converter
For one, you can code 2.X print statements and let 3.X’s 2to3 conversion script translate
them to 3.X function calls automatically. See the Python 3.X manuals for more details
about this script; it attempts to translate 2.X code to run under 3.X—a useful tool, but
perhaps more than you want to make just your print operations version-neutral. A
related tool named 3to2 attempts to do the inverse: convert 3.X code to run on 2.X; see
Appendix C for more information.
Importing from __future__
Alternatively, you can code 3.X print function calls in code to be run by 2.X, by enabling
the function call variant with a statement like the following coded at the top of a script,
or anywhere in an interactive session:
from __future__ import print_function
This statement changes 2.X to support 3.X’s print functions exactly. This way, you
can use 3.X print features and won’t have to change your prints if you later migrate to
3.X. Two usage notes here:
This statement is simply ignored if it appears in code run by 3.X—it doesn’t hurt
if included in 3.X code for 2.X compatibility.
This statement must appear at the top of each file that prints in 2.X—because it
modifies the parser for a single file only, it's not enough to import another file that
includes this statement.
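For instance, here is a tiny script that runs the same under both 2.6+ and 3.X once the import is in place (the filename is invented for illustration):

# portable_prints.py: a minimal sketch; works on Python 2.6+ and 3.X
from __future__ import print_function

print('spam', 'eggs', sep=', ')              # 3.X keyword arguments now work in 2.X
print('no newline here', end='')
print(' ...and done')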
Neutralizing display differences with code
Also keep in mind that simple prints, like those in the first row of Table 11-5, work in
either version of Python—because any expression may be enclosed in parentheses, we
can always pretend to be calling a 3.X print function in 2.X by adding outer parenthe-
ses. The main downside to this is that it makes a tuple out of your printed objects if
there is more than one object, or none at all—they will print with extra enclosing parentheses. In
3.X, for example, any number of objects may be listed in the call’s parentheses:
C:\code> c:\python33\python
>>> print('spam') # 3.X print function call syntax
spam
>>> print('spam', 'ham', 'eggs') # These are multiple arguments
spam ham eggs
The first of these works the same in 2.X, but the second generates a tuple in the output:
C:\code> c:\python27\python
>>> print('spam') # 2.X print statement, enclosing parens
spam
>>> print('spam', 'ham', 'eggs') # This is really a tuple object!
('spam', 'ham', 'eggs')
The same applies when there are no objects printed to force a line-feed: 2.X shows a
tuple, unless you print an empty string:
c:\code> py -2
>>> print()                          # This is just a line-feed on 3.X
()
>>> print('') # This is a line-feed in both 2.X and 3.X
Strictly speaking, outputs may in some cases differ in more than just extra enclosing
parentheses in 2.X. If you look closely at the preceding results, you’ll notice that the
strings also print with enclosing quotes in 2.X only. This is because objects may print
differently when nested in another object than they do as top-level items. Technically,
nested appearances display with repr and top-level objects with str—the two alterna-
tive display formats we noted in Chapter 5.
Here this just means extra quotes around strings nested in the tuple that is created for
printing multiple parenthesized items in 2.X. Displays of nested objects can differ much
more for other object types, though, and especially for class objects that define alter-
native displays with operator overloading—a topic we’ll cover in Part VI in general and
Chapter 30 in particular.
To be truly portable without enabling 3.X prints everywhere, and to sidestep display
differences for nested appearances, you can always format the print string as a single
object to unify displays across versions, using the string formatting expression or
method call, or other string tools that we studied in Chapter 7:
>>> print('%s %s %s' % ('spam', 'ham', 'eggs'))
spam ham eggs
>>> print('{0} {1} {2}'.format('spam', 'ham', 'eggs'))
spam ham eggs
>>> print('answer: ' + str(42))
answer: 42
Of course, if you can use 3.X exclusively you can forget such mappings entirely, but
many Python programmers will at least encounter, if not write, 2.X code and systems
for some time to come. We’ll use both __future__ and version-neutral code to achieve
2.X/3.X portability in many examples in this book.
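If you would rather avoid __future__ imports altogether, a small wrapper function can also paper over the differences, in the spirit of the fuller Chapter 18 emulation mentioned earlier. The name vprint and this particular coding are inventions of this sketch, not a standard tool:

import sys

def vprint(*args, **kw):                     # A minimal sketch of a neutral print
    sep = kw.get('sep', ' ')                 # 2.X lacks keyword-only arguments,
    end = kw.get('end', '\n')                # so fetch options from **kw manually
    out = kw.get('file', sys.stdout)
    out.write(sep.join(str(arg) for arg in args) + end)

vprint('spam', 42)                           # Same display on 2.X and 3.X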
I use Python 3.X print function calls throughout this book. I’ll often
make prints version-neutral, and will usually warn you when the results
may differ in 2.X, but I sometimes don’t, so please consider this note a
blanket warning. If you see extra parentheses in your printed text in 2.X,
either drop the parentheses in your print statements, import 3.X prints
from the __future__, recode your prints using the version-neutral
scheme outlined here, or learn to love superfluous text.
Why You Will Care: print and stdout
The equivalence between the print operation and writing to sys.stdout is important.
It makes it possible to reassign sys.stdout to any user-defined object that provides the
same write method as files. Because the print statement just sends text to the
sys.stdout.write method, you can capture printed text in your programs by assigning
sys.stdout to an object whose write method processes the text in arbitrary ways.
For instance, you can send printed text to a GUI window, or tee it off to multiple
destinations, by defining an object with a write method that does the required routing.
You’ll see an example of this trick when we study classes in Part VI of this book, but
abstractly, it looks like this:
class FileFaker:
    def write(self, string):
        # Do something with printed text in string

import sys
sys.stdout = FileFaker()
print(someObjects)                   # Sends to class write method
This works because print is what we will call in the next part of this book a polymor-
phic operation—it doesn’t care what sys.stdout is, only that it has a method (i.e.,
interface) called write. This redirection to objects is made even simpler with the file
keyword argument in 3.X and the >> extended form of print in 2.X, because we don’t
need to reset sys.stdout explicitly—normal prints will still be routed to the stdout
stream:
myobj = FileFaker() # 3.X: Redirect to object for one print
print(someObjects, file=myobj) # Does not reset sys.stdout
myobj = FileFaker() # 2.X: same effect
print >> myobj, someObjects # Does not reset sys.stdout
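To make the idea concrete, here is a minimal runnable variant of the same trick; the class name Capture is invented for this sketch. It simply collects printed text in a list instead of displaying it:

import sys

class Capture:                               # Hypothetical name, for illustration
    def __init__(self):
        self.text = []
    def write(self, string):                 # The only interface print requires
        self.text.append(string)

trap = Capture()
print('spam', 42, file=trap)                 # In 2.X: print >> trap, 'spam', 42
print(''.join(trap.text))                    # spam 42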
Python 3.X's built-in input function (named raw_input in 2.X) reads from the
sys.stdin file, so you can intercept read requests in a similar way, using classes that
implement file-like read methods instead. See the input and while loop example in
Chapter 10 for more background on this function.
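Abstractly, an input interceptor might look like the following sketch; the class name StdinFaker is invented here, and input falls back to sys.stdin's readline method when the stream has been replaced:

import sys

class StdinFaker:                            # Hypothetical name, for illustration
    def __init__(self, lines):
        self.lines = list(lines)
    def readline(self):                      # The interface input() relies on
        return self.lines.pop(0) if self.lines else ''

sys.stdin = StdinFaker(['spam\n'])
print(input())                               # In 2.X: raw_input(); displays spam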
Notice that because printed text goes to the stdout stream, it’s also the way to print
HTML reply pages in CGI scripts used on the Web, and enables you to redirect Python
script input and output at the operating system’s shell command line as usual:
python script.py < inputfile > outputfile
python script.py | filterProgram
Python’s print operation redirection tools are essentially pure-Python alternatives to
these shell syntax forms. See other resources for more on CGI scripts and shell syntax.
Chapter Summary
In this chapter, we began our in-depth look at Python statements by exploring assign-
ments, expressions, and print operations. Although these are generally simple to use,
they have some alternative forms that, while optional, are often convenient in practice
—augmented assignment statements and the redirection form of print operations, for
example, allow us to avoid some manual coding work. Along the way, we also studied
the syntax of variable names, stream redirection techniques, and a variety of common
mistakes to avoid, such as assigning the result of an append method call back to a vari-
able.
In the next chapter, we’ll continue our statement tour by filling in details about the
if statement, Python’s main selection tool; there, we’ll also revisit Python’s syntax
model in more depth and look at the behavior of Boolean expressions. Before we move
on, though, the end-of-chapter quiz will test your knowledge of what you’ve learned
here.
Test Your Knowledge: Quiz
1. Name three ways that you can assign three variables to the same value.
2. Why might you need to care when assigning three variables to a mutable object?
3. What’s wrong with saying L = L.sort()?
4. How might you use the print operation to send text to an external file?
Test Your Knowledge: Answers
1. You can use multiple-target assignments (A = B = C = 0), sequence assignment
(A, B, C = 0, 0, 0), or multiple assignment statements on three separate lines (A
= 0, B = 0, and C = 0). With the latter technique, as introduced in Chapter 10, you
can also string the three separate statements together on the same line by separating
them with semicolons (A = 0; B = 0; C = 0).
2. If you assign them this way:
A = B = C = []
all three names reference the same object, so changing it in place from one (e.g.,
A.append(99)) will affect the others. This is true only for in-place changes to mu-
table objects like lists and dictionaries; for immutable objects such as numbers and
strings, this issue is irrelevant.
3. The list sort method is like append in that it makes an in-place change to the subject
list—it returns None, not the list it changes. The assignment back to L sets L to
None, not to the sorted list. As discussed both earlier and later in this book (e.g.,
Chapter 8), a newer built-in function, sorted, sorts any sequence and returns a new
list with the sorting result; because this is not an in-place change, its result can be
meaningfully assigned to a name.
4. To print to a file for a single print operation, you can use 3.X’s print(X, file=F)
call form, use 2.X’s extended print >> file, X statement form, or assign
sys.stdout to a manually opened file before the print and restore the original after.
You can also redirect all of a program’s printed text to a file with special syntax in
the system shell, but this is outside Python’s scope.
CHAPTER 12
if Tests and Syntax Rules
This chapter presents the Python if statement, which is the main statement used for
selecting from alternative actions based on test results. Because this is our first in-depth
look at compound statements—statements that embed other statements—we will also
explore the general concepts behind the Python statement syntax model here in more
detail than we did in the introduction in Chapter 10. Because the if statement intro-
duces the notion of tests, this chapter will also deal with Boolean expressions, cover
the “ternary” if expression, and fill in some details on truth tests in general.
if Statements
In simple terms, the Python if statement selects actions to perform. Along with its
expression counterpart, it’s the primary selection tool in Python and represents much
of the logic a Python program possesses. It’s also our first compound statement. Like
all compound Python statements, the if statement may contain other statements, in-
cluding other ifs. In fact, Python lets you combine statements in a program sequentially
(so that they execute one after another), and in an arbitrarily nested fashion (so that
they execute only under certain conditions such as selections and loops).
General Format
The Python if statement is typical of if statements in most procedural languages. It
takes the form of an if test, followed by one or more optional elif (“else if”) tests and
a final optional else block. The tests and the else part each have an associated block
of nested statements, indented under a header line. When the if statement runs, Python
executes the block of code associated with the first test that evaluates to true, or the
else block if all tests prove false. The general form of an if statement looks like this:
if test1:                            # if test
    statements1                      # Associated block
elif test2:                          # Optional elifs
    statements2
else:                                # Optional else
    statements3
Basic Examples
To demonstrate, let’s look at a few simple examples of the if statement at work. All
parts are optional, except the initial if test and its associated statements. Thus, in the
simplest case, the other parts are omitted:
>>> if 1:
...     print('true')
...
true
Notice how the prompt changes to ... for continuation lines when you’re typing in-
teractively in the basic interface used here; in IDLE, you’ll simply drop down to an
indented line instead (hit Backspace to back up). A blank line (which you can get by
pressing Enter twice) terminates and runs the entire statement. Remember that 1 is
Boolean true (as we’ll see later, the word True is its equivalent), so this statement’s test
always succeeds. To handle a false result, code the else:
>>> if not 1:
...     print('true')
... else:
...     print('false')
...
false
Multiway Branching
Now here’s an example of a more complex if statement, with all its optional parts
present:
>>> x = 'killer rabbit'
>>> if x == 'roger':
...     print("shave and a haircut")
... elif x == 'bugs':
...     print("what's up doc?")
... else:
...     print('Run away! Run away!')
...
Run away! Run away!
This multiline statement extends from the if line through the block nested under the
else. When it’s run, Python executes the statements nested under the first test that is
true, or the else part if all tests are false (in this example, they are). In practice, both
the elif and else parts may be omitted, and there may be more than one statement
nested in each section. Note that the words if, elif, and else are associated by the fact
that they line up vertically, with the same indentation.
If you’ve used languages like C or Pascal, you might be interested to know that there
is no switch or case statement in Python that selects an action based on a variable’s
value. Instead, you usually code multiway branching as a series of if/elif tests, as in
the prior example, and occasionally by indexing dictionaries or searching lists. Because
dictionaries and lists can be built at runtime dynamically, they are sometimes more
flexible than hardcoded if logic in your script:
>>> choice = 'ham'
>>> print({'spam': 1.25,             # A dictionary-based 'switch'
...         'ham': 1.99,             # Use has_key or get for default
...         'eggs': 0.99,
...         'bacon': 1.10}[choice])
1.99
Although it may take a few moments for this to sink in the first time you see it, this
dictionary is a multiway branch—indexing on the key choice branches to one of a set
of values, much like a switch in C. An almost equivalent but more verbose Python if
statement might look like the following:
>>> if choice == 'spam':             # The equivalent if statement
...     print(1.25)
... elif choice == 'ham':
...     print(1.99)
... elif choice == 'eggs':
...     print(0.99)
... elif choice == 'bacon':
...     print(1.10)
... else:
...     print('Bad choice')
...
1.99
Though it’s perhaps more readable, the potential downside of an if like this is that,
short of constructing it as a string and running it with tools like the prior chapter’s
eval or exec, you cannot construct it at runtime as easily as a dictionary. In more dy-
namic programs, data structures offer added flexibility.
Handling switch defaults
Notice the else clause on the if here to handle the default case when no key matches.
As we saw in Chapter 8, dictionary defaults can be coded with in expressions, get
method calls, or exception catching with the try statement introduced in the preceding
chapter. All of the same techniques can be used here to code a default action in a
dictionary-based multiway branch. As a review in the context of this use case, here’s
the get scheme at work with defaults:
>>> branch = {'spam': 1.25,
...           'ham': 1.99,
...           'eggs': 0.99}
>>> print(branch.get('spam', 'Bad choice'))
1.25
>>> print(branch.get('bacon', 'Bad choice'))
Bad choice
An in membership test in an if statement can have the same default effect:
>>> choice = 'bacon'
>>> if choice in branch:
...     print(branch[choice])
... else:
...     print('Bad choice')
...
Bad choice
And the try statement is a general way to handle defaults by catching and handling the
exceptions they’d otherwise trigger (for more on exceptions, see Chapter 11’s overview
and Part VII’s full treatment):
>>> try:
...     print(branch[choice])
... except KeyError:
...     print('Bad choice')
...
Bad choice
Handling larger actions
Dictionaries are good for associating values with keys, but what about the more com-
plicated actions you can code in the statement blocks associated with if statements?
In Part IV, you’ll learn that dictionaries can also contain functions to represent more
complex branch actions and implement general jump tables. Such functions appear as
dictionary values; they may be coded as function names or inline lambdas, and they are
called by adding parentheses to trigger their actions. Here's an abstract sampler, but
stay tuned for a rehash of this topic in Chapter 19 after we’ve learned more about
function definition:
def function(): ...
def default(): ...

branch = {'spam': lambda: ...,       # A table of callable function objects
          'ham': function,
          'eggs': lambda: ...}

branch.get(choice, default)()
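In a concrete, if contrived, form, the same pattern might look like the following sketch; all the names here are invented for illustration:

def count_spam():                            # Hypothetical branch actions
    return 'spam spam spam'

def default():
    return 'Bad choice'

branch = {'spam': count_spam,
          'ham': lambda: 'HAM!',
          'eggs': lambda: 'just eggs'}

choice = 'ham'
print(branch.get(choice, default)())         # Calls the selected function: HAM!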
Although dictionary-based multiway branching is useful in programs that deal with
more dynamic data, most programmers will probably find that coding an if statement
is the most straightforward way to perform multiway branching. As a rule of thumb in
coding, when in doubt, err on the side of simplicity and readability; it’s the “Pythonic”
way.
Python Syntax Revisited
I introduced Python’s syntax model in Chapter 10. Now that we’re stepping up to larger
statements like if, this section reviews and expands on the syntax ideas introduced
earlier. In general, Python has a simple, statement-based syntax. However, there are a
few properties you need to know about:
Statements execute one after another, until you say otherwise. Python nor-
mally runs statements in a file or nested block in order from first to last as a
sequence, but statements like if (as well as loops and exceptions) cause the inter-
preter to jump around in your code. Because Python’s path through a program is
called the control flow, statements such as if that affect it are often called control-
flow statements.
Block and statement boundaries are detected automatically. As we’ve seen,
there are no braces or “begin/end” delimiters around blocks of code in Python;
instead, Python uses the indentation of statements under a header to group the
statements in a nested block. Similarly, Python statements are not normally ter-
minated with semicolons; rather, the end of a line usually marks the end of the
statement coded on that line. As a special case, statements can span lines and be
combined on a line with special syntax.
Compound statements = header + “:” + indented statements. All Python com-
pound statements—those with nested statements—follow the same pattern: a
header line terminated with a colon, followed by one or more nested statements,
usually indented under the header. The indented statements are called a block (or
sometimes, a suite). In the if statement, the elif and else clauses are part of the
if, but they are also header lines with nested blocks of their own. As a special case,
blocks can show up on the same line as the header if they are simple noncompound
code.
Blank lines, spaces, and comments are usually ignored. Blank lines are both
optional and ignored in files (but not at the interactive prompt, when they termi-
nate compound statements). Spaces inside statements and expressions are almost
always ignored (except in string literals, and when used for indentation). Com-
ments are always ignored: they start with a # character (not inside a string literal)
and extend to the end of the current line.
Docstrings are ignored but are saved and displayed by tools. Python supports
an additional comment form called documentation strings (docstrings for short),
which, unlike # comments, are retained at runtime for inspection. Docstrings are
simply strings that show up at the top of program files and some statements. Python
ignores their contents, but they are automatically attached to objects at runtime
and may be displayed with documentation tools like PyDoc. Docstrings are part
of Python’s larger documentation strategy and are covered in the last chapter in
this part of the book.
As you’ve seen, there are no variable type declarations in Python; this fact alone makes
for a much simpler language syntax than what you may be used to. However, for most
new users the lack of the braces and semicolons used to mark blocks and statements
in many other languages seems to be the most novel syntactic feature of Python, so let’s
explore what this means in more detail.
Block Delimiters: Indentation Rules
As introduced in Chapter 10, Python detects block boundaries automatically, by line
indentation—that is, the empty space to the left of your code. All statements indented
the same distance to the right belong to the same block of code. In other words, the
statements within a block line up vertically, as in a column. The block ends when the
end of the file or a lesser-indented line is encountered, and more deeply nested blocks
are simply indented further to the right than the statements in the enclosing block.
Compound statement bodies can appear on the header’s line in some cases we’ll explore
later, but most are indented under it.
For instance, Figure 12-1 demonstrates the block structure of the following code:
x = 1
if x:
    y = 2
    if y:
        print('block2')
    print('block1')
print('block0')
This code contains three blocks: the first (the top-level code of the file) is not indented
at all, the second (within the outer if statement) is indented four spaces, and the third
(the print statement under the nested if) is indented eight spaces.
Figure 12-1. Nested blocks of code: a nested block starts with a statement indented further to the right
and ends with either a statement that is indented less, or the end of the file.
In general, top-level (unnested) code must start in column 1. Nested blocks can start
in any column; indentation may consist of any number of spaces and tabs, as long as
it’s the same for all the statements in a given single block. That is, Python doesn’t care
how you indent your code; it only cares that it’s done consistently. Four spaces or one
tab per indentation level are common conventions, but there is no absolute standard
in the Python world.
Indenting code is quite natural in practice. For example, the following (arguably silly)
code snippet demonstrates common indentation errors in Python code:
     x = 'SPAM'                      # Error: first line indented
if 'rubbery' in 'shrubbery':
    print(x * 8)
        x += 'NI'                    # Error: unexpected indentation
        if x.endswith('NI'):
            x *= 2
      print(x)                       # Error: inconsistent indentation
The properly indented version of this code looks like the following—even for an arti-
ficial example like this, proper indentation makes the code’s intent much more appa-
rent:
x = 'SPAM'
if 'rubbery' in 'shrubbery':
    print(x * 8)                     # Prints 8 "SPAM"
    x += 'NI'
    if x.endswith('NI'):
        x *= 2
    print(x)                         # Prints "SPAMNISPAMNI"
It’s important to know that the only major place in Python where whitespace matters
is where it’s used to the left of your code, for indentation; in most other contexts, space
can be coded or not. However, indentation is really part of Python syntax, not just a
stylistic suggestion: all the statements within any given single block must be indented
to the same level, or Python reports a syntax error. This is intentional—because you
don’t need to explicitly mark the start and end of a nested block of code, some of the
syntactic clutter found in other languages is unnecessary in Python.
As described in Chapter 10, making indentation part of the syntax model also enforces
consistency, a crucial component of readability in structured programming languages
like Python. Python’s syntax is sometimes described as “what you see is what you get”
—the indentation of each line of code unambiguously tells readers what it is associated
with. This uniform and consistent appearance makes Python code easier to maintain
and reuse.
Indentation is simpler in practice than its details might initially imply, and it makes
your code reflect its logical structure. Consistently indented code always satisfies
Python’s rules. Moreover, most text editors (including IDLE) make it easy to follow
Python’s indentation model by automatically indenting code as you type it.
Avoid mixing tabs and spaces: New error checking in 3.X
One rule of thumb: although you can use spaces or tabs to indent, it’s usually not a
good idea to mix the two within a block—use one or the other. Technically, tabs count
for enough spaces to move the current column number up to a multiple of 8, and your
code will work if you mix tabs and spaces consistently. However, such code can be
difficult to change. Worse, mixing tabs and spaces makes your code difficult to read
completely apart from Python’s syntax rules—tabs may look very different in the next
programmer’s editor than they do in yours.
In fact, Python 3.X issues an error, for these very reasons, when a script mixes tabs and
spaces for indentation inconsistently within a block (that is, in a way that makes it
dependent on a tab’s equivalent in spaces). Python 2.X allows such scripts to run, but
it has a -t command-line flag that will warn you about inconsistent tab usage and a
-tt flag that will issue errors for such code (you can use these switches in a command
line like python -t main.py in a system shell window). Python 3.X's error case is equiv-
alent to 2.X’s -tt switch.
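To see what the 3.X failure looks like without writing a throwaway file, you can compile a string of source code whose indentation mixes a tab and spaces inconsistently; in CPython 3 this raises a TabError, a subclass of SyntaxError (a small demonstration sketch):

src = "if True:\n\tx = 1\n        y = 2\n"   # Tab, then spaces, in one block
try:
    compile(src, '<string>', 'exec')
except TabError as err:
    print('caught:', err)                    # ...inconsistent use of tabs and spaces...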
Statement Delimiters: Lines and Continuations
A statement in Python normally ends at the end of the line on which it appears. When
a statement is too long to fit on a single line, though, a few special rules may be used
to make it span multiple lines:
Statements may span multiple lines if you’re continuing an open syntactic
pair. Python lets you continue typing a statement on the next line if you’re coding
something enclosed in a (), {}, or [] pair. For instance, expressions in parentheses
and dictionary and list literals can span any number of lines; your statement doesn’t
end until the Python interpreter reaches the line on which you type the closing part
of the pair (a ), }, or ]). Continuation lines—lines 2 and beyond of the statement
—can start at any indentation level you like, but you should try to make them align
vertically for readability if possible. This open pairs rule also covers set and dic-
tionary comprehensions in Python 3.X and 2.7.
Statements may span multiple lines if they end in a backslash. This is a some-
what outdated feature that’s not generally recommended, but if a statement needs
to span multiple lines, you can also add a backslash (a \ not embedded in a string
literal or comment) at the end of the prior line to indicate you’re continuing on the
next line. Because you can also continue by adding parentheses around most con-
structs, backslashes are rarely used today. This approach is also error-prone: ac-
cidentally forgetting a \ usually generates a syntax error and might even cause the
next line to be silently mistaken (i.e., without warning) for a new statement, with
unexpected results.
Special rules for string literals. As we learned in Chapter 7, triple-quoted string
blocks are designed to span multiple lines normally. We also learned in Chap-
ter 7 that adjacent string literals are implicitly concatenated; when it’s used in
conjunction with the open pairs rule mentioned earlier, wrapping this construct in
parentheses allows it to span multiple lines.
Other rules. There are a few other points to mention with regard to statement
delimiters. Although it is uncommon, you can terminate a statement with a semi-
colon—this convention is sometimes used to squeeze more than one simple (non-
compound) statement onto a single line. Also, comments and blank lines can ap-
pear anywhere in a file; comments (which begin with a # character) terminate at
the end of the line on which they appear.
A Few Special Cases
Here’s what a continuation line looks like using the open syntactic pairs rule just de-
scribed. Delimited constructs, such as lists in square brackets, can span across any
number of lines:
L = ["Good",
"Bad",
"Ugly"] # Open pairs may span lines
This also works for anything in parentheses (expressions, function arguments, function
headers, tuples, and generator expressions), as well as anything in curly braces (dic-
tionaries and, in 3.X and 2.7, set literals and set and dictionary comprehensions). Some
of these are tools we’ll study in later chapters, but this rule naturally covers most con-
structs that span lines in practice.
If you like using backslashes to continue lines, you can, but it’s not common practice
in Python:
if a == b and c == d and \
   d == e and f == g:
    print('olde')                    # Backslashes allow continuations...
Because any expression can be enclosed in parentheses, you can usually use the open
pairs technique instead if you need your code to span multiple lines—simply wrap a
part of your statement in parentheses:
if (a == b and c == d and
    d == e and e == f):
    print('new')                     # But parentheses usually do too, and are obvious
In fact, backslashes are generally frowned on by most Python developers, because
they’re too easy to not notice and too easy to omit altogether. In the following, x is
assigned 10 with the backslash, as intended; if the backslash is accidentally omitted,
though, x is assigned 6 instead, and no error is reported (the +4 is a valid expression
statement by itself).
In a real program with a more complex assignment, this could be the source of a very
nasty bug:1
x = 1 + 2 + 3 \
    +4                               # Omitting the \ makes this very different!

1. Candidly, it was a bit surprising that backslash continuations were not removed in Python 3.0, given the
broad scope of its other changes! See the 3.0 changes tables in Appendix C for a list of 3.0 removals; some
seem fairly innocuous in comparison with the dangers inherent in backslash continuations. Then again,
this book's goal is Python instruction, not populist outrage, so the best advice I can give is simply: don't
do this. You should generally avoid backslash continuations in new Python code, even if you developed
the habit in your C programming days.
As another special case, Python allows you to write more than one noncompound
statement (i.e., statements without nested statements) on the same line, separated by
semicolons. Some coders use this form to save program file real estate, but it usually
makes for more readable code if you stick to one statement per line for most of your
work:
x = 1; y = 2; print(x) # More than one simple statement
As we learned in Chapter 7, triple-quoted string literals span lines too. In addition, if
two string literals appear next to each other, they are concatenated as if a + had been
added between them—when used in conjunction with the open pairs rule, wrapping
in parentheses allows this form to span multiple lines. For example, the first of the
following inserts newline characters at line breaks and assigns S to '\naaaa\nbbbb
\ncccc', and the second implicitly concatenates and assigns S to 'aaaabbbbcccc'; as we
also saw in Chapter 7, # comments are ignored in the second form, but included in the
string in the first:
S = """
aaaa
bbbb
cccc"""
S = ('aaaa'
'bbbb' # Comments here are ignored
'cccc')
Finally, Python lets you move a compound statement’s body up to the header line,
provided the body contains just simple (noncompound) statements. You’ll most often
see this used for simple if statements with a single test and action, as in the interactive
loops we coded in Chapter 10:
if 1: print('hello') # Simple statement on header line
You can combine some of these special cases to write code that is difficult to read, but
I don’t recommend it; as a rule of thumb, try to keep each statement on a line of its
own, and indent all but the simplest of blocks. Six months down the road, you’ll be
happy you did.
Truth Values and Boolean Tests
The notions of comparison, equality, and truth values were introduced in Chapter 9.
Because the if statement is the first statement we’ve looked at that actually uses test
results, we’ll expand on some of these ideas here. In particular, Python’s Boolean oper-
ators are a bit different from their counterparts in languages like C. In Python:
All objects have an inherent Boolean true or false value.
Any nonzero number or nonempty object is true.
Zero numbers, empty objects, and the special object None are considered false.
Comparisons and equality tests are applied recursively to data structures.
Comparisons and equality tests return True or False (custom versions of 1 and 0).
Boolean and and or operators return a true or false operand object.
Boolean operators stop evaluating (“short circuit”) as soon as a result is known.
The if statement takes action on truth values, but Boolean operators are used to com-
bine the results of other tests in richer ways to produce new truth values. More formally,
there are three Boolean expression operators in Python:
X and Y
Is true if both X and Y are true
X or Y
Is true if either X or Y is true
not X
Is true if X is false (the expression returns True or False)
Here, X and Y may be any truth value, or any expression that returns a truth value (e.g.,
an equality test, range comparison, and so on). Boolean operators are typed out as
words in Python (instead of C’s &&, ||, and !). Also, Boolean and and or operators return
a true or false object in Python, not the values True or False. Let’s look at a few examples
to see how this works:
>>> 2 < 3, 3 < 2 # Less than: return True or False (1 or 0)
(True, False)
Magnitude comparisons such as these return True or False as their truth results, which,
as we learned in Chapter 5 and Chapter 9, are really just custom versions of the integers
1 and 0 (they print themselves differently but are otherwise the same).
On the other hand, the and and or operators always return an object—either the object
on the left side of the operator or the object on the right. If we test their results in if or
other statements, they will be as expected (remember, every object is inherently true
or false), but we won’t get back a simple True or False.
For or tests, Python evaluates the operand objects from left to right and returns the first
one that is true. Moreover, Python stops at the first true operand it finds. This is usually
called short-circuit evaluation, as determining a result short-circuits (terminates) the
rest of the expression as soon as the result is known:
>>> 2 or 3, 3 or 2 # Return left operand if true
(2, 3) # Else, return right operand (true or false)
>>> [] or 3
3
>>> [] or {}
{}
In the first line of the preceding example, both operands (2 and 3) are true (i.e., are
nonzero), so Python always stops and returns the one on the left—it determines the
result because true or anything is always true. In the other two tests, the left operand
is false (an empty object), so Python simply evaluates and returns the object on the
right—which may happen to have either a true or a false value when tested.
Python and operations also stop as soon as the result is known; however, in this case
Python evaluates the operands from left to right and stops if the left operand is a false
object because it determines the result—false and anything is always false:
>>> 2 and 3, 3 and 2 # Return left operand if false
(3, 2) # Else, return right operand (true or false)
>>> [] and {}
[]
>>> 3 and []
[]
Here, both operands are true in the first line, so Python evaluates both sides and returns
the object on the right. In the second test, the left operand is false ([]), so Python stops
and returns it as the test result. In the last test, the left side is true (3), so Python evaluates
and returns the object on the right—which happens to be a false [].
The end result of all this is the same as in C and most other languages—you get a value
that is logically true or false if tested in an if or while according to the normal definitions
of or and and. However, in Python Booleans return either the left or the right object,
not a simple integer flag.
This behavior of and and or may seem esoteric at first glance, but see this chapter’s
sidebar “Why You Will Care: Booleans” on page 384 for examples of how it is some-
times used to advantage in coding by Python programmers. The next section also shows
a common way to leverage this behavior, and its more mnemonic replacement in recent
versions of Python.
The if/else Ternary Expression
One common role for the prior section’s Boolean operators is to code an expression
that runs the same as an if statement. Consider the following statement, which sets
A to either Y or Z, based on the truth value of X:
if X:
    A = Y
else:
    A = Z
Sometimes, though, the items involved in such a statement are so simple that it seems
like overkill to spread them across four lines. At other times, we may want to nest such
a construct in a larger statement instead of assigning its result to a variable. For these
reasons (and, frankly, because the C language has a similar tool), Python 2.5 introduced
a new expression format that allows us to say the same thing in one expression:
A = Y if X else Z
This expression has the exact same effect as the preceding four-line if statement, but
it’s simpler to code. As in the statement equivalent, Python runs expression Y only if
X turns out to be true, and runs expression Z only if X turns out to be false. That is, it
short-circuits, just like the Boolean operators described in the prior section, running
just Y or Z but not both. Here are some examples of it in action:
>>> A = 't' if 'spam' else 'f' # For strings, nonempty means true
>>> A
't'
>>> A = 't' if '' else 'f'
>>> A
'f'
Prior to Python 2.5 (and after 2.5, if you insist), the same effect can often be achieved
by a careful combination of the and and or operators, because they return either the
object on the left side or the object on the right as the preceding section described:
A = ((X and Y) or Z)
This works, but there is a catch—you have to be able to assume that Y will be Boolean
true. If that is the case, the effect is the same: the and runs first and returns Y if X is true;
if X is false the and skips Y, and the or simply returns Z. In other words, we get “if X then
Y else Z.” This is equivalent to the ternary form:
A = Y if X else Z
The and/or combination form also seems to require a “moment of great clarity” to
understand the first time you see it, and it’s no longer required as of 2.5—use the
equivalent and more robust and mnemonic if/else expression when you need this
structure, or use a full if statement if the parts are nontrivial.
As a side note, using the following expression in Python is similar because the bool
function will translate X into the equivalent of integer 1 or 0, which can then be used as
offsets to pick true and false values from a list:
A = [Z, Y][bool(X)]
For example:
>>> ['f', 't'][bool('')]
'f'
>>> ['f', 't'][bool('spam')]
't'
However, this isn’t exactly the same, because Python will not short-circuit—it will al-
ways run both Z and Y, regardless of the value of X. Because of such complexities, you’re
better off using the simpler and more easily understood if/else expression as of Python
2.5 and later. Again, though, you should use even that sparingly, and only if its parts
are all fairly simple; otherwise, you’re better off coding the full if statement form to
make changes easier in the future. Your coworkers will be happy you did.
Still, you may see the and/or version in code written prior to 2.5 (and in Python code
written by ex–C programmers who haven’t quite let go of their dark coding pasts).2

2. In fact, Python's Y if X else Z has a slightly different order than C's X ? Y : Z, and uses more readable
words. Its differing order was reportedly chosen in response to analysis of common usage patterns in
Python code. According to Python folklore, this order was also chosen in part to discourage ex–C
programmers from overusing it! Remember, simple is better than complex, in Python and elsewhere. If
you have to work at packing logic into expressions like this, statements are probably your better bet.
Why You Will Care: Booleans
One common way to use the somewhat unusual behavior of Python Boolean operators
is to select from a set of objects with an or. A statement such as this:
X = A or B or C or None
assigns X to the first nonempty (that is, true) object among A, B, and C, or to None if all
of them are empty. This works because the or operator returns one of its two objects,
and it turns out to be a fairly common coding paradigm in Python: to select a nonempty
object from among a fixed-size set, simply string them together in an or expression. In
simpler form, this is also commonly used to designate a default—the following sets X
to A if A is true (or nonempty), and to default otherwise:
X = A or default
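For instance, a sketch of the default idiom with concrete values (the names are invented here):

configured = ''                              # Empty string: tests false
host = configured or 'localhost'             # Fall back on the default
print(host)                                  # localhost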
It’s also important to understand the short-circuit evaluation of Boolean operators and
the if/else, because it may prevent actions from running. Expressions on the right of
a Boolean operator, for example, might call functions that perform substantial or im-
portant work, or have side effects that won’t happen if the short-circuit rule takes effect:
if f1() or f2(): ...
Here, if f1 returns a true (or nonempty) value, Python will never run f2. To guarantee
that both functions will be run, call them before the or:
tmp1, tmp2 = f1(), f2()
if tmp1 or tmp2: ...
You’ve already seen another application of this behavior in this chapter: because of the
way Booleans work, the expression ((A and B) or C) can be used to emulate an if
statement—almost (see this chapter’s discussion of this form for details).
We met additional Boolean use cases in prior chapters. As we saw in Chapter 9, because
all objects are inherently true or false, it’s common and easier in Python to test an object
directly (if X:) than to compare it to an empty value (if X != '':). For a string, the
two tests are equivalent. As we also saw in Chapter 5, the preset Boolean values True
and False are the same as the integers 1 and 0 and are useful for initializing variables
(X = False), for loop tests (while True:), and for displaying results at the interactive
prompt.
Also watch for related discussion in operator overloading in Part VI: when we define
new object types with classes, we can specify their Boolean nature with either the
__bool__ or __len__ methods (__bool__ is named __nonzero__ in 2.7). The latter of these
is tried if the former is absent and designates false by returning a length of zero—an
empty object is considered false.
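As a tiny preview of that technique, a class that defines only __len__ reports false when its length is zero; the class name Basket is invented for this sketch:

class Basket:                                # Hypothetical class, for illustration
    def __init__(self, items):
        self.items = items
    def __len__(self):                       # Tried when __bool__ is absent
        return len(self.items)

print(bool(Basket([])), bool(Basket(['spam'])))      # False True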
Finally, and as a preview, other tools in Python have roles similar to the or chains at
the start of this sidebar: the filter call and list comprehensions we’ll meet later can be
used to select true values when the set of candidates isn’t known until runtime (though
they evaluate all values and return all that are true), and the any and all built-ins can
be used to test if any or all items in a collection are true (though they don’t select an
item):
>>> L = [1, 0, 2, 0, 'spam', '', 'ham', []]
>>> list(filter(bool, L)) # Get true values
[1, 2, 'spam', 'ham']
>>> [x for x in L if x] # Comprehensions
[1, 2, 'spam', 'ham']
>>> any(L), all(L) # Aggregate truth
(True, False)
As seen in Chapter 9, the bool function here simply returns its argument’s true or false
value, as though it were tested in an if. Watch for more on these related tools in
Chapter 14, Chapter 19, and Chapter 20.
Chapter Summary
In this chapter, we studied the Python if statement. Additionally, because this was our
first compound and logical statement, we reviewed Python’s general syntax rules and
explored the operation of truth values and tests in more depth than we were able to
previously. Along the way, we also looked at how to code multiway branching in
Python, learned about the if/else expression introduced in Python 2.5, and explored
some common ways that Boolean values crop up in code.
The next chapter continues our look at procedural statements by expanding on the
while and for loops. There, we’ll learn about alternative ways to code loops in Python,
some of which may be better than others. Before that, though, here is the usual chapter
quiz.
Test Your Knowledge: Quiz
1. How might you code a multiway branch in Python?
2. How can you code an if/else statement as an expression in Python?
3. How can you make a single statement span many lines?
4. What do the words True and False mean?
Test Your Knowledge: Answers
1. An if statement with multiple elif clauses is often the most straightforward way
to code a multiway branch, though not necessarily the most concise or flexible.
Dictionary indexing can often achieve the same result, especially if the dictionary
contains callable functions coded with def statements or lambda expressions.
2. In Python 2.5 and later, the expression form Y if X else Z returns Y if X is true, or
Z otherwise; it’s the same as a four-line if statement. The and/or combination
((X and Y) or Z) can work the same way, but it’s more obscure and requires that
the Y part be true.
3. Wrap up the statement in an open syntactic pair ((), [], or {}), and it can span as
many lines as you like; the statement ends when Python sees the closing (right) half
of the pair, and lines 2 and beyond of the statement can begin at any indentation
level. Backslash continuations work too, but are broadly discouraged in the Python
world.
4. True and False are just custom versions of the integers 1 and 0, respectively: they
always stand for Boolean true and false values in Python. They’re available for use
in truth tests and variable initialization, and are printed for expression results at
the interactive prompt. In all these roles, they serve as a more mnemonic and hence
readable alternative to 1 and 0.
CHAPTER 13
while and for Loops
This chapter concludes our tour of Python procedural statements by presenting the
language’s two main looping constructs—statements that repeat an action over and
over. The first of these, the while statement, provides a way to code general loops. The
second, the for statement, is designed for stepping through the items in a sequence or
other iterable object and running a block of code for each.
We’ve seen both of these informally already, but we’ll fill in additional usage details
here. While we’re at it, we’ll also study a few less prominent statements used within
loops, such as break and continue, and cover some built-ins commonly used with loops,
such as range, zip, and map.
Although the while and for statements covered here are the primary syntax provided
for coding repeated actions, there are additional looping operations and concepts in
Python. Because of that, the iteration story is continued in the next chapter, where we’ll
explore the related ideas of Python’s iteration protocol (used by the for loop) and list
comprehensions (a close cousin to the for loop). Later chapters explore even more exotic
iteration tools such as generators, filter, and reduce. For now, though, let’s keep things
simple.
while Loops
Python’s while statement is the most general iteration construct in the language. In
simple terms, it repeatedly executes a block of (normally indented) statements as long
as a test at the top keeps evaluating to a true value. It is called a “loop” because control
keeps looping back to the start of the statement until the test becomes false. When the
test becomes false, control passes to the statement that follows the while block. The
net effect is that the loop’s body is executed repeatedly while the test at the top is true.
If the test is false to begin with, the body never runs and the while statement is skipped.
General Format
In its most complex form, the while statement consists of a header line with a test
expression, a body of one or more normally indented statements, and an optional
else part that is executed if control exits the loop without a break statement being
encountered. Python keeps evaluating the test at the top and executing the statements
nested in the loop body until the test returns a false value:
while test:                            # Loop test
    statements                         # Loop body
else:                                  # Optional else
    statements                         # Run if didn't exit loop with break
Examples
To illustrate, let’s look at a few simple while loops in action. The first, which consists
of a print statement nested in a while loop, just prints a message forever. Recall that
True is just a custom version of the integer 1 and always stands for a Boolean true value;
because the test is always true, Python keeps executing the body forever, or until you
stop its execution. This sort of behavior is usually called an infinite loop—it’s not really
immortal, but you may need a Ctrl-C key combination to forcibly terminate one:
>>> while True:
... print('Type Ctrl-C to stop me!')
The next example keeps slicing off the first character of a string until the string is empty
and hence false. It’s typical to test an object directly like this instead of using the more
verbose equivalent (while x != '':). Later in this chapter, we’ll see other ways to step
through the items in a string more easily with a for loop.
>>> x = 'spam'
>>> while x: # While x is not empty
... print(x, end=' ') # In 2.X use print x,
... x = x[1:] # Strip first character off x
...
spam pam am m
Note the end=' ' keyword argument used here to place all outputs on the same line
separated by a space; see Chapter 11 if you’ve forgotten why this works as it does. This
may leave your input prompt in an odd state at the end of your output; type Enter to
reset. Python 2.X readers: also remember to use a trailing comma instead of end in the
prints like this.
The following code counts from the value of a up to, but not including, b. We’ll also
see an easier way to do this with a Python for loop and the built-in range function later:
>>> a=0; b=10
>>> while a < b: # One way to code counter loops
... print(a, end=' ')
... a += 1 # Or, a = a + 1
...
0 1 2 3 4 5 6 7 8 9
Finally, notice that Python doesn’t have what some languages call a “do until” loop
statement. However, we can simulate one with a test and break at the bottom of the
loop body, so that the loop’s body is always run at least once:
while True:
    ...loop body...
    if exitTest(): break
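For instance, here is a minimal, hypothetical sketch of this pattern, coded as a loop that keeps asking for a reply until a nonempty one is entered; the prompt string here is illustrative only:
    while True:
        reply = input('Enter text: ')  # Body always runs at least once (raw_input in 2.X)
        if reply != '': break          # Exit test at the bottom
    print('You said:', reply)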
To fully understand how this structure works, we need to move on to the next section
and learn more about the break statement.
break, continue, pass, and the Loop else
Now that we’ve seen a few Python loops in action, it’s time to take a look at two simple
statements that have a purpose only when nested inside loops—the break and
continue statements. While we're looking at oddballs, we will also study the loop else
clause here because it is intertwined with break, and Python’s empty placeholder state-
ment, pass (which is not tied to loops per se, but falls into the general category of simple
one-word statements). In Python:
break
Jumps out of the closest enclosing loop (past the entire loop statement)
continue
Jumps to the top of the closest enclosing loop (to the loop’s header line)
pass
Does nothing at all: it’s an empty statement placeholder
Loop else block
Runs if and only if the loop is exited normally (i.e., without hitting a break)
General Loop Format
Factoring in break and continue statements, the general format of the while loop looks
like this:
while test:
    statements
    if test: break                     # Exit loop now, skip else if present
    if test: continue                  # Go to top of loop now, to test
else:
    statements                         # Run if we didn't hit a 'break'
break and continue statements can appear anywhere inside the while (or for) loop’s
body, but they are usually coded further nested in an if test to take action in response
to some condition.
Let’s turn to a few simple examples to see how these statements come together in
practice.
pass
Simple things first: the pass statement is a no-operation placeholder that is used when
the syntax requires a statement, but you have nothing useful to say. It is often used to
code an empty body for a compound statement. For instance, if you want to code an
infinite loop that does nothing each time through, do it with a pass:
while True: pass # Type Ctrl-C to stop me!
Because the body is just an empty statement, Python gets stuck in this loop. pass is
roughly to statements as None is to objects—an explicit nothing. Notice that here the
while loop’s body is on the same line as the header, after the colon; as with if state-
ments, this only works if the body isn’t a compound statement.
This example does nothing forever. It probably isn’t the most useful Python program
ever written (unless you want to warm up your laptop computer on a cold winter’s
day!); frankly, though, I couldn’t think of a better pass example at this point in the book.
We’ll see other places where pass makes more sense later—for instance, to ignore ex-
ceptions caught by try statements, and to define empty class objects with attributes
that behave like “structs” and “records” in other languages. A pass is also sometimes
coded to mean “to be filled in later,” to stub out the bodies of functions temporarily:
def func1():
    pass                               # Add real code here later

def func2():
    pass
We can’t leave the body empty without getting a syntax error, so we say pass instead.
Version skew note: Python 3.X (but not 2.X) allows ellipses coded
as ... (literally, three consecutive dots) to appear any place an expres-
sion can. Because ellipses do nothing by themselves, this can serve as
an alternative to the pass statement, especially for code to be filled in
later—a sort of Python “TBD”:
def func1():
    ...                                # Alternative to pass

def func2():
    ...

func1()                                # Does nothing if called
Ellipses can also appear on the same line as a statement header and may
be used to initialize variable names if no specific type is required:
def func1(): ... # Works on same line too
def func2(): ...
>>> X = ... # Alternative to None
>>> X
Ellipsis
This notation is new in Python 3.X—and goes well beyond the original
intent of ... in slicing extensions—so time will tell if it becomes wide-
spread enough to challenge pass and None in these roles.
continue
The continue statement causes an immediate jump to the top of a loop. It also some-
times lets you avoid statement nesting. The next example uses continue to skip odd
numbers. This code prints all even numbers less than 10 and greater than or equal to
0. Remember, 0 means false and % is the remainder of division (modulus) operator, so
this loop counts down to 0, skipping numbers that aren’t multiples of 2—it prints 8 6
4 2 0:
x = 10
while x:
    x = x - 1                          # Or, x -= 1
    if x % 2 != 0: continue            # Odd? -- skip print
    print(x, end=' ')
Because continue jumps to the top of the loop, you don’t need to nest the print state-
ment here inside an if test; the print is only reached if the continue is not run. If this
sounds similar to a “go to” in other languages, it should. Python has no “go to” state-
ment, but because continue lets you jump about in a program, many of the warnings
about readability and maintainability you may have heard about “go to” apply.
continue should probably be used sparingly, especially when you're first getting started
with Python. For instance, the last example might be clearer if the print were nested
under the if:
x = 10
while x:
    x = x - 1
    if x % 2 == 0:                     # Even? -- print
        print(x, end=' ')
Later in this book, we’ll also learn that raised and caught exceptions can also emulate
“go to” statements in limited and structured ways; stay tuned for more on this technique
in Chapter 36 where we will learn how to use it to break out of multiple nested loops,
a feat not possible with the next section’s topic alone.
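As a preview, here is one hedged sketch of that technique, with an exception raised inside nested loops and caught outside them; the Exitloop class name is an illustrative assumption, not the book's later example:
    class Exitloop(Exception): pass    # Hypothetical marker exception

    try:
        for i in range(10):
            for j in range(10):
                if i * j > 12:
                    raise Exitloop     # Jumps past both loops at once
    except Exitloop:
        print('exited at', i, j)       # Prints: exited at 2 7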
break
The break statement causes an immediate exit from a loop. Because the code that fol-
lows it in the loop is not executed if the break is reached, you can also sometimes avoid
nesting by including a break. For example, here is a simple interactive loop (a variant
of a larger example we studied in Chapter 10) that inputs data with input (known as
raw_input in Python 2.X) and exits when the user enters “stop” for the name request:
>>> while True:
... name = input('Enter name:') # Use raw_input() in 2.X
... if name == 'stop': break
... age = input('Enter age: ')
... print('Hello', name, '=>', int(age) ** 2)
...
Enter name:bob
Enter age: 40
Hello bob => 1600
Enter name:sue
Enter age: 30
Hello sue => 900
Enter name:stop
Notice how this code converts the age input to an integer with int before raising it to
the second power; as you’ll recall, this is necessary because input returns user input as
a string. In Chapter 36, you’ll see that input also raises an exception at end-of-file (e.g.,
if the user types Ctrl-Z on Windows or Ctrl-D on Unix); if this matters, wrap input in
try statements.
Loop else
When combined with the loop else clause, the break statement can often eliminate the
need for the search status flags used in other languages. For instance, the following
piece of code determines whether a positive integer y is prime by searching for factors
greater than 1:
x = y // 2                             # For some y > 1
while x > 1:
    if y % x == 0:                     # Remainder
        print(y, 'has factor', x)
        break                          # Skip else
    x -= 1
else:                                  # Normal exit
    print(y, 'is prime')
Rather than setting a flag to be tested when the loop is exited, it inserts a break where
a factor is found. This way, the loop else clause can assume that it will be executed
only if no factor is found; if you don’t hit the break, the number is prime. Trace through
this code to see how this works.
The loop else clause is also run if the body of the loop is never executed, as you don’t
run a break in that event either; in a while loop, this happens if the test in the header
is false to begin with. Thus, in the preceding example you still get the “is prime” message
if x is initially less than or equal to 1 (for instance, if y is 2).
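A quick interactive sketch makes this behavior easy to verify; the else runs even though the body never does:
    >>> while False:                   # Test false up front: body skipped
    ...     print('body')
    ... else:
    ...     print('else runs anyway')
    ...
    else runs anyway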
This example determines primes, but only informally so. Numbers less
than 2 are not considered prime by the strict mathematical definition.
To be really picky, this code also fails for negative numbers and succeeds
for floating-point numbers with no decimal digits. Also note that its
code must use // instead of / in Python 3.X because of the migration
of / to “true division,” as described in Chapter 5 (we need the initial
division to truncate remainders, not retain them!). If you want to ex-
periment with this code, be sure to see the exercise at the end of
Part IV, which wraps it in a function for reuse.
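For reference, a minimal sketch of such a function wrapper might look like the following; the name isprime is an assumption here, not the exercise's official solution:
    def isprime(y):
        x = y // 2                     # For some y > 1
        while x > 1:
            if y % x == 0:             # Found a factor: not prime
                return False
            x -= 1
        return True                    # Normal exit: no factor found

    print(isprime(13), isprime(15))    # Prints: True False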
More on the loop else
Because the loop else clause is unique to Python, it tends to perplex some newcomers
(and go unused by some veterans; I’ve met some who didn’t even know there was an
else on loops!). In general terms, the loop else simply provides explicit syntax for a
common coding scenario—it is a coding structure that lets us catch the “other” way
out of a loop, without setting and checking flags or conditions.
Suppose, for instance, that we are writing a loop to search a list for a value, and we
need to know whether the value was found after we exit the loop. We might code such
a task this way (this code is intentionally abstract and incomplete; x is a sequence and
match is a tester function to be defined):
found = False
while x and not found:
    if match(x[0]):                    # Value at front?
        print('Ni')
        found = True
    else:
        x = x[1:]                      # Slice off front and repeat
if not found:
    print('not found')
Here, we initialize, set, and later test a flag to determine whether the search succeeded
or not. This is valid Python code, and it does work; however, this is exactly the sort of
structure that the loop else clause is there to handle. Here’s an else equivalent:
while x:                               # Exit when x empty
    if match(x[0]):
        print('Ni')
        break                          # Exit, go around else
    x = x[1:]
else:
    print('Not found')                 # Only here if exhausted x
This version is more concise. The flag is gone, and we’ve replaced the if test at the loop
end with an else (lined up vertically with the word while). Because the break inside the
main part of the while exits the loop and goes around the else, this serves as a more
structured way to catch the search-failure case.
Some readers might have noticed that the prior example’s else clause could be replaced
with a test for an empty x after the loop (e.g., if not x:). Although that’s true in this
example, the else provides explicit syntax for this coding pattern (it’s more obviously
a search-failure clause here), and such an explicit empty test may not apply in some
cases. The loop else becomes even more useful when used in conjunction with the
for loop—the topic of the next section—because sequence iteration is not under your
control.
Why You Will Care: Emulating C while Loops
The section on expression statements in Chapter 11 stated that Python doesn’t allow
statements such as assignments to appear in places where it expects an expression. That
is, each statement must generally appear on a line by itself, not nested in a larger con-
struct. That means this common C language coding pattern won’t work in Python:
while ((x = next(obj)) != NULL) {...process x...}
C assignments return the value assigned, but Python assignments are just statements,
not expressions. This eliminates a notorious class of C errors: you can’t accidentally
type = in Python when you mean ==. If you need similar behavior, though, there are at
least three ways to get the same effect in Python while loops without embedding as-
signments in loop tests. You can move the assignment into the loop body with a break:
while True:
    x = next(obj)
    if not x: break
    ...process x...
or move the assignment into the loop with tests:
x = True
while x:
    x = next(obj)
    if x:
        ...process x...
or move the first assignment outside the loop:
x = next(obj)
while x:
    ...process x...
    x = next(obj)
Of these three coding patterns, the first may be considered by some to be the least
structured, but it also seems to be the simplest and is the most commonly used. A simple
Python for loop may replace such C loops as well and be more Pythonic, but C doesn’t
have a directly analogous tool:
for x in obj: ...process x...
for Loops
The for loop is a generic iterator in Python: it can step through the items in any ordered
sequence or other iterable object. The for statement works on strings, lists, tuples, and
other built-in iterables, as well as new user-defined objects that we’ll learn how to create
later with classes. We met for briefly in Chapter 4 and in conjunction with sequence
object types; let’s expand on its usage more formally here.
General Format
The Python for loop begins with a header line that specifies an assignment target (or
targets), along with the object you want to step through. The header is followed by a
block of (normally indented) statements that you want to repeat:
for target in object:                  # Assign object items to target
    statements                         # Repeated loop body: use target
else:                                  # Optional else part
    statements                         # If we didn't hit a 'break'
When Python runs a for loop, it assigns the items in the iterable object to the target
one by one and executes the loop body for each. The loop body typically uses the
assignment target to refer to the current item in the sequence as though it were a cursor
stepping through the sequence.
The name used as the assignment target in a for header line is usually a (possibly new)
variable in the scope where the for statement is coded. There’s not much unique about
this name; it can even be changed inside the loop’s body, but it will automatically be
set to the next item in the sequence when control returns to the top of the loop again.
After the loop this variable normally still refers to the last item visited, which is the last
item in the sequence unless the loop exits with a break statement.
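A quick sketch shows this retention in action:
    >>> for x in [1, 2, 3]: pass       # Empty body: just run the loop
    ...
    >>> x                              # Target still refers to last item
    3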
The for statement also supports an optional else block, which works exactly as it does
in a while loop—it’s executed if the loop exits without running into a break statement
(i.e., if all items in the sequence have been visited). The break and continue statements
introduced earlier also work the same in a for loop as they do in a while. The for loop’s
complete format can be described this way:
for target in object:                  # Assign object items to target
    statements
    if test: break                     # Exit loop now, skip else
    if test: continue                  # Go to top of loop now
else:
    statements                         # If we didn't hit a 'break'
Examples
Let’s type a few for loops interactively now, so you can see how they are used in practice.
Basic usage
As mentioned earlier, a for loop can step across any kind of sequence object. In our
first example, for instance, we’ll assign the name x to each of the three items in a list in
turn, from left to right, and the print statement will be executed for each. Inside the
print statement (the loop body), the name x refers to the current item in the list:
>>> for x in ["spam", "eggs", "ham"]:
... print(x, end=' ')
...
spam eggs ham
The next two examples compute the sum and product of all the items in a list. Later in
this chapter and later in the book we’ll meet tools that apply operations such as + and
* to items in a list automatically, but it’s often just as easy to use a for:
>>> sum = 0
>>> for x in [1, 2, 3, 4]:
... sum = sum + x
...
>>> sum
10
>>> prod = 1
>>> for item in [1, 2, 3, 4]: prod *= item
...
>>> prod
24
Other data types
Any sequence works in a for, as it’s a generic tool. For example, for loops work on
strings and tuples:
>>> S = "lumberjack"
>>> T = ("and", "I'm", "okay")
>>> for x in S: print(x, end=' ') # Iterate over a string
...
l u m b e r j a c k
>>> for x in T: print(x, end=' ') # Iterate over a tuple
...
and I'm okay
In fact, as we’ll learn in the next chapter when we explore the notion of “iterables,”
for loops can even work on some objects that are not sequences—files and dictionaries
work, too.
Tuple assignment in for loops
If you’re iterating through a sequence of tuples, the loop target itself can actually be a
tuple of targets. This is just another case of the tuple-unpacking assignment we studied
in Chapter 11 at work. Remember, the for loop assigns items in the sequence object
to the target, and assignment works the same everywhere:
>>> T = [(1, 2), (3, 4), (5, 6)]
>>> for (a, b) in T: # Tuple assignment at work
... print(a, b)
...
1 2
3 4
5 6
Here, the first time through the loop is like writing (a,b) = (1,2), the second time is
like writing (a,b) = (3,4), and so on. The net effect is to automatically unpack the
current tuple on each iteration.
This form is commonly used in conjunction with the zip call we’ll meet later in this
chapter to implement parallel traversals. It also makes regular appearances in conjunc-
tion with SQL databases in Python, where query result tables are returned as sequences
of sequences like the list used here—the outer list is the database table, the nested tuples
are the rows within the table, and tuple assignment extracts columns.
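As an abstract sketch of that idea (the table data here is made up for illustration), such a result might be traversed like this:
    >>> rows = [('bob', 40), ('sue', 30)]    # Hypothetical query result rows
    >>> for (name, age) in rows:             # Tuple assignment extracts columns
    ...     print(name, age)
    ...
    bob 40
    sue 30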
Tuples in for loops also come in handy to iterate through both keys and values in
dictionaries using the items method, rather than looping through the keys and indexing
to fetch the values manually:
>>> D = {'a': 1, 'b': 2, 'c': 3}
>>> for key in D:
... print(key, '=>', D[key]) # Use dict keys iterator and index
...
a => 1
c => 3
b => 2
>>> list(D.items())
[('a', 1), ('c', 3), ('b', 2)]
>>> for (key, value) in D.items():
... print(key, '=>', value) # Iterate over both keys and values
...
a => 1
c => 3
b => 2
It’s important to note that tuple assignment in for loops isn’t a special case; any as-
signment target works syntactically after the word for. We can always assign manually
within the loop to unpack:
>>> T
[(1, 2), (3, 4), (5, 6)]
>>> for both in T:
... a, b = both # Manual assignment equivalent
... print(a, b) # 2.X: prints with enclosing tuple "()"
...
1 2
3 4
5 6
But tuples in the loop header save us an extra step when iterating through sequences
of sequences. As suggested in Chapter 11, even nested structures may be automatically
unpacked this way in a for:
>>> ((a, b), c) = ((1, 2), 3) # Nested sequences work too
>>> a, b, c
(1, 2, 3)
>>> for ((a, b), c) in [((1, 2), 3), ((4, 5), 6)]: print(a, b, c)
...
1 2 3
4 5 6
Even this is not a special case, though—the for loop simply runs the sort of assignment
we ran just before it, on each iteration. Any nested sequence structure may be unpacked
this way, simply because sequence assignment is so generic:
>>> for ((a, b), c) in [([1, 2], 3), ['XY', 6]]: print(a, b, c)
...
1 2 3
X Y 6
Python 3.X extended sequence assignment in for loops
In fact, because the loop variable in a for loop can be any assignment target, we can
also use Python 3.X’s extended sequence-unpacking assignment syntax here to extract
items and sections of sequences within sequences. Really, this isn’t a special case either,
but simply a new assignment form in 3.X, as discussed in Chapter 11; because it works
in assignment statements, it automatically works in for loops.
Consider the tuple assignment form introduced in the prior section. A tuple of values
is assigned to a tuple of names on each iteration, exactly like a simple assignment state-
ment:
>>> a, b, c = (1, 2, 3) # Tuple assignment
>>> a, b, c
(1, 2, 3)
>>> for (a, b, c) in [(1, 2, 3), (4, 5, 6)]: # Used in for loop
... print(a, b, c)
...
1 2 3
4 5 6
In Python 3.X, because a sequence can be assigned to a more general set of names with
a starred name to collect multiple items, we can use the same syntax to extract parts of
nested sequences in the for loop:
>>> a, *b, c = (1, 2, 3, 4) # Extended seq assignment
>>> a, b, c
(1, [2, 3], 4)
>>> for (a, *b, c) in [(1, 2, 3, 4), (5, 6, 7, 8)]:
... print(a, b, c)
...
1 [2, 3] 4
5 [6, 7] 8
In practice, this approach might be used to pick out multiple columns from rows of
data represented as nested sequences. In Python 2.X starred names aren’t allowed, but
you can achieve similar effects by slicing. The only difference is that slicing returns a
type-specific result, whereas starred names are always assigned lists:
>>> for all in [(1, 2, 3, 4), (5, 6, 7, 8)]: # Manual slicing in 2.X
... a, b, c = all[0], all[1:3], all[3]
... print(a, b, c)
...
1 (2, 3) 4
5 (6, 7) 8
See Chapter 11 for more on this assignment form.
Nested for loops
Now let’s look at a for loop that’s a bit more sophisticated than those we’ve seen so
far. The next example illustrates statement nesting and the loop else clause in a for.
Given a list of objects (items) and a list of keys (tests), this code searches for each key
in the objects list and reports on the search’s outcome:
>>> items = ["aaa", 111, (4, 5), 2.01] # A set of objects
>>> tests = [(4, 5), 3.14] # Keys to search for
>>>
>>> for key in tests: # For all keys
... for item in items: # For all items
... if item == key: # Check for match
... print(key, "was found")
... break
... else:
... print(key, "not found!")
...
(4, 5) was found
3.14 not found!
Because the nested if runs a break when a match is found, the loop else clause can
assume that if it is reached, the search has failed. Notice the nesting here. When this
code runs, there are two loops going at the same time: the outer loop scans the keys
list, and the inner loop scans the items list for each key. The nesting of the loop else
clause is critical; it’s indented to the same level as the header line of the inner for loop,
so it’s associated with the inner loop, not the if or the outer for.
This example is illustrative, but it may be easier to code if we employ the in operator
to test membership. Because in implicitly scans an object looking for a match (at least
logically), it replaces the inner loop:
>>> for key in tests: # For all keys
... if key in items: # Let Python check for a match
... print(key, "was found")
... else:
... print(key, "not found!")
...
(4, 5) was found
3.14 not found!
In general, it’s a good idea to let Python do as much of the work as possible (as in this
solution) for the sake of brevity and performance.
The next example is similar, but builds a list as it goes for later use instead of printing.
It performs a typical data-structure task with a for—collecting common items in two
sequences (strings)—and serves as a rough set intersection routine. After the loop runs,
res refers to a list that contains all the items found in seq1 and seq2:
>>> seq1 = "spam"
>>> seq2 = "scam"
>>>
>>> res = [] # Start empty
>>> for x in seq1: # Scan first sequence
... if x in seq2: # Common item?
... res.append(x) # Add to result end
...
>>> res
['s', 'a', 'm']
Unfortunately, this code is equipped to work only on two specific variables: seq1 and
seq2. It would be nice if this loop could somehow be generalized into a tool you could
use more than once. As you’ll see, that simple idea leads us to functions, the topic of
the next part of the book.
This code also exhibits the classic list comprehension pattern—collecting a results list
with an iteration and optional filter test—and could be coded more concisely too:
>>> [x for x in seq1 if x in seq2] # Let Python collect results
['s', 'a', 'm']
But you’ll have to read on to the next chapter for the rest of this story.
Why You Will Care: File Scanners
In general, loops come in handy anywhere you need to repeat an operation or process
something more than once. Because files contain multiple characters and lines, they are
one of the more typical use cases for loops. To load a file’s contents into a string all at
once, you simply call the file object’s read method:
file = open('test.txt', 'r') # Read contents into a string
print(file.read())
But to load a file in smaller pieces, it’s common to code either a while loop with breaks
on end-of-file, or a for loop. To read by characters, either of the following codings will
suffice:
file = open('test.txt')
while True:
    char = file.read(1)                # Read by character
    if not char: break                 # Empty string means end-of-file
    print(char)
for char in open('test.txt').read():
    print(char)
The for loop here also processes each character, but it loads the file into memory all at
once (and assumes it fits!). To read by lines or blocks instead, you can use while loop
code like this:
file = open('test.txt')
while True:
    line = file.readline()             # Read line by line
    if not line: break
    print(line.rstrip())               # Line already has a \n
file = open('test.txt', 'rb')
while True:
    chunk = file.read(10)              # Read byte chunks: up to 10 bytes
    if not chunk: break
    print(chunk)
You typically read binary data in blocks. To read text files line by line, though, the
for loop tends to be easiest to code and the quickest to run:
for line in open('test.txt').readlines():
    print(line.rstrip())

for line in open('test.txt'):          # Use iterators: best for text input
    print(line.rstrip())
Both of these versions work in both Python 2.X and 3.X. The first uses the file
readlines method to load a file all at once into a line-string list, and the last example here
relies on file iterators to automatically read one line on each loop iteration.
The last example is also generally the best option for text files—besides its simplicity,
it works for arbitrarily large files because it doesn’t load the entire file into memory all
at once. The iterator version may also be the quickest, though I/O performance may
vary per Python line and release.
File readlines calls can still be useful, though—to reverse a file’s lines, for example,
assuming its content can fit in memory. The reversed built-in accepts a sequence, but
not an arbitrary iterable that generates values; in other words, a list works, but a file
object doesn’t:
for line in reversed(open('test.txt').readlines()): ...
In some 2.X Python code, you may also see the name open replaced with file and the
file object’s older xreadlines method used to achieve the same effect as the file’s auto-
matic line iterator (it’s like readlines but doesn’t load the file into memory all at once).
Both file and xreadlines are removed in Python 3.X, because they are redundant. You
should generally avoid them in new 2.X code too—use file iterators and the open call in
recent 2.X releases—but they may pop up in older code and resources.
See the library manual for more on the calls used here, and Chapter 14 for more on file
line iterators. Also watch for the sidebar “Why You Will Care: Shell Commands and
More” on page 411 in this chapter; it applies these same file tools to the os.popen
command-line launcher to read program output. There’s more on reading files in
Chapter 37 too; as we’ll see there, text and binary files have slightly different semantics
in 3.X.
Loop Coding Techniques
The for loop we just studied subsumes most counter-style loops. It’s generally simpler
to code and often quicker to run than a while, so it’s the first tool you should reach for
whenever you need to step through a sequence or other iterable. In fact, as a general
rule, you should resist the temptation to count things in Python—its iteration tools au-
tomate much of the work you do to loop over collections in lower-level languages like
C.
Still, there are situations where you will need to iterate in more specialized ways. For
example, what if you need to visit every second or third item in a list, or change the list
along the way? How about traversing more than one sequence in parallel, in the same
for loop? What if you need indexes too?
You can always code such unique iterations with a while loop and manual indexing,
but Python provides a set of built-ins that allow you to specialize the iteration in a for:
• The built-in range function (available since Python 0.X) produces a series of
  successively higher integers, which can be used as indexes in a for.
• The built-in zip function (available since Python 2.0) returns a series of parallel-
  item tuples, which can be used to traverse multiple sequences in a for.
• The built-in enumerate function (available since Python 2.3) generates both the
  values and indexes of items in an iterable, so we don't need to count manually.
• The built-in map function (available since Python 1.0) can have a similar effect to
  zip in Python 2.X, though this role is removed in 3.X.
Because for loops may run quicker than while-based counter loops, though, it’s to your
advantage to use tools like these that allow you to use for whenever possible. Let’s look
at each of these built-ins in turn, in the context of common use cases. As we’ll see, their
usage may differ slightly between 2.X and 3.X, and some of their applications are more
valid than others.
Counter Loops: range
Our first loop-related function, range, is really a general tool that can be used in a
variety of contexts. We met it briefly in Chapter 4. Although it’s used most often to
generate indexes in a for, you can use it anywhere you need a series of integers. In
Python 2.X range creates a physical list; in 3.X, range is an iterable that generates items
on demand, so we need to wrap it in a list call to display its results all at once in 3.X
only:
>>> list(range(5)), list(range(2, 5)), list(range(0, 10, 2))
([0, 1, 2, 3, 4], [2, 3, 4], [0, 2, 4, 6, 8])
With one argument, range generates a list of integers from zero up to but not including
the argument’s value. If you pass in two arguments, the first is taken as the lower bound.
An optional third argument can give a step; if it is used, Python adds the step to each
successive integer in the result (the step defaults to +1). Ranges can also be nonpositive
and nonascending, if you want them to be:
>>> list(range(-5, 5))
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
>>> list(range(5, -5, -1))
[5, 4, 3, 2, 1, 0, -1, -2, -3, -4]
We’ll get more formal about iterables like this one in Chapter 14. There, we’ll also see
that Python 2.X has a cousin named xrange, which is like its range but doesn’t build
the result list in memory all at once. This is a space optimization, which is subsumed
in 3.X by the generator behavior of its range.
Although such range results may be useful all by themselves, they tend to come in most
handy within for loops. For one thing, they provide a simple way to repeat an action
a specific number of times. To print three lines, for example, use a range to generate
the appropriate number of integers:
>>> for i in range(3):
... print(i, 'Pythons')
...
0 Pythons
1 Pythons
2 Pythons
Note that for loops force results from range automatically in 3.X, so we don’t need to
use a list wrapper here in 3.X (in 2.X we get a temporary list unless we call xrange
instead).
Sequence Scans: while and range Versus for
The range call is also sometimes used to iterate over a sequence indirectly, though it’s
often not the best approach in this role. The easiest and generally fastest way to step
through a sequence exhaustively is always with a simple for, as Python handles most
of the details for you:
>>> X = 'spam'
>>> for item in X: print(item, end=' ') # Simple iteration
...
s p a m
Internally, the for loop handles the details of the iteration automatically when used
this way. If you really need to take over the indexing logic explicitly, you can do it with
a while loop:
>>> i = 0
>>> while i < len(X): # while loop iteration
... print(X[i], end=' ')
... i += 1
...
s p a m
You can also do manual indexing with a for, though, if you use range to generate a list
of indexes to iterate through. It’s a multistep process, but it’s sufficient to generate
offsets, rather than the items at those offsets:
>>> X
'spam'
>>> len(X) # Length of string
4
>>> list(range(len(X))) # All legal offsets into X
[0, 1, 2, 3]
>>>
>>> for i in range(len(X)): print(X[i], end=' ') # Manual range/len iteration
...
s p a m
Note that because this example is stepping over a list of offsets into X, not the actual
items of X, we need to index back into X within the loop to fetch each item. If this seems
like overkill, though, it’s because it is: there’s really no reason to work this hard in this
example.
Although the range/len combination suffices in this role, it’s probably not the best
option. It may run slower, and it’s also more work than we need to do. Unless you have
a special indexing requirement, you’re better off using the simple for loop form in
Python:
>>> for item in X: print(item, end=' ') # Use simple iteration if you can
As a general rule, use for instead of while whenever possible, and don’t use range calls
in for loops except as a last resort. This simpler solution is almost always better. Like
every good rule, though, there are plenty of exceptions—as the next section demon-
strates.
Sequence Shufflers: range and len
Though not ideal for simple sequence scans, the coding pattern used in the prior ex-
ample does allow us to do more specialized sorts of traversals when required. For ex-
ample, some algorithms can make use of sequence reordering—to generate alternatives
in searches, to test the effect of different value orderings, and so on. Such cases may
require offsets in order to pull sequences apart and put them back together, as in the
following; the range’s integers provide a repeat count in the first, and a position for
slicing in the second:
>>> S = 'spam'
>>> for i in range(len(S)): # For repeat counts 0..3
... S = S[1:] + S[:1] # Move front item to end
... print(S, end=' ')
...
pams amsp mspa spam
>>> S
'spam'
>>> for i in range(len(S)): # For positions 0..3
... X = S[i:] + S[:i] # Rear part + front part
... print(X, end=' ')
...
spam pams amsp mspa
Trace through these one iteration at a time if they seem confusing. The second creates
the same results as the first, though in a different order, and doesn’t change the original
variable as it goes. Because both slice to obtain parts to concatenate, they also work on
any type of sequence, and return sequences of the same type as that being shuffled—
if you shuffle a list, you create reordered lists:
>>> L = [1, 2, 3]
>>> for i in range(len(L)):
... X = L[i:] + L[:i] # Works on any sequence type
... print(X, end=' ')
...
[1, 2, 3] [2, 3, 1] [3, 1, 2]
We’ll make use of code like this to test functions with different argument orderings in
Chapter 18, and will extend it to functions, generators, and more complete permuta-
tions in Chapter 20—it’s a widely useful tool.
Nonexhaustive Traversals: range Versus Slices
Cases like that of the prior section are valid applications for the range/len combination.
We might also use this technique to skip items as we go:
>>> S = 'abcdefghijk'
>>> list(range(0, len(S), 2))
[0, 2, 4, 6, 8, 10]
>>> for i in range(0, len(S), 2): print(S[i], end=' ')
...
a c e g i k
Here, we visit every second item in the string S by stepping over the generated range
list. To visit every third item, change the third range argument to be 3, and so on. In
effect, using range this way lets you skip items in loops while still retaining the simplicity
of the for loop construct.
In most cases, though, this is also probably not the “best practice” technique in Python
today. If you really mean to skip items in a sequence, the extended three-limit form of
the slice expression, presented in Chapter 7, provides a simpler route to the same goal.
To visit every second character in S, for example, slice with a stride of 2:
>>> S = 'abcdefghijk'
>>> for c in S[::2]: print(c, end=' ')
...
a c e g i k
The result is the same, but substantially easier for you to write and for others to read.
The potential advantage to using range here instead is space: slicing makes a copy of
the string in both 2.X and 3.X, while range in 3.X and xrange in 2.X do not create a list;
for very large strings, they may save memory.
Changing Lists: range Versus Comprehensions
Another common place where you may use the range/len combination with for is in
loops that change a list as it is being traversed. Suppose, for example, that you need to
add 1 to every item in a list (maybe you’re giving everyone a raise in an employee
database list). You can try this with a simple for loop, but the result probably won’t
be exactly what you want:
>>> L = [1, 2, 3, 4, 5]
>>> for x in L:
... x += 1 # Changes x, not L
...
>>> L
[1, 2, 3, 4, 5]
>>> x
6
This doesn’t quite work—it changes the loop variable x, not the list L. The reason is
somewhat subtle. Each time through the loop, x refers to the next integer already pulled
out of the list. In the first iteration, for example, x is integer 1. In the next iteration, the
loop body sets x to a different object, integer 2, but it does not update the list where 1
originally came from; it’s a piece of memory separate from the list.
To really change the list as we march across it, we need to use indexes so we can assign
an updated value to each position as we go. The range/len combination can produce
the required indexes for us:
>>> L = [1, 2, 3, 4, 5]
>>> for i in range(len(L)): # Add one to each item in L
... L[i] += 1 # Or L[i] = L[i] + 1
...
>>> L
[2, 3, 4, 5, 6]
When coded this way, the list is changed as we proceed through the loop. There is no
way to do the same with a simple for x in L:-style loop, because such a loop iterates
through actual items, not list positions. But what about the equivalent while loop? Such
a loop requires a bit more work on our part, and might run more slowly depending on
your Python (it does on both 2.7 and 3.3, though less so on 3.3—we'll see how to verify this
in Chapter 21):
>>> i = 0
>>> while i < len(L):
... L[i] += 1
... i += 1
...
>>> L
[3, 4, 5, 6, 7]
Here again, though, the range solution may not be ideal either. A list comprehension
expression of the form:
[x + 1 for x in L]
likely runs faster today and would do similar work, albeit without changing the original
list in place (we could assign the expression’s new list object result back to L, but this
would not update any other references to the original list). Because this is such a central
looping concept, we’ll save a complete exploration of list comprehensions for the next
chapter, and continue this story there.
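A brief sketch illustrates the shared-reference caveat just mentioned; rebinding L to the comprehension's new list leaves other names still referencing the original:
    >>> L = [1, 2, 3, 4, 5]
    >>> M = L                          # M references the same list object
    >>> L = [x + 1 for x in L]         # Rebinds L to a new list
    >>> L, M                           # M still sees the original list
    ([2, 3, 4, 5, 6], [1, 2, 3, 4, 5])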
Parallel Traversals: zip and map
Our next loop coding technique extends a loop’s scope. As we’ve seen, the range built-
in allows us to traverse sequences with for in a nonexhaustive fashion. In the same
spirit, the built-in zip function allows us to use for loops to visit multiple sequences
in parallel—not overlapping in time, but during the same loop. In basic operation,
zip takes one or more sequences as arguments and returns a series of tuples that pair
up parallel items taken from those sequences. For example, suppose we’re working
with two lists (a list of names and addresses paired by position, perhaps):
>>> L1 = [1,2,3,4]
>>> L2 = [5,6,7,8]
To combine the items in these lists, we can use zip to create a list of tuple pairs. Like
range, zip is a list in Python 2.X, but an iterable object in 3.X where we must wrap it
in a list call to display all its results at once (again, there’s more on iterables coming
up in the next chapter):
>>> zip(L1, L2)
<zip object at 0x026523C8>
>>> list(zip(L1, L2)) # list() required in 3.X, not 2.X
[(1, 5), (2, 6), (3, 7), (4, 8)]
Such a result may be useful in other contexts as well, but when wedded with the for
loop, it supports parallel iterations:
>>> for (x, y) in zip(L1, L2):
... print(x, y, '--', x+y)
...
1 5 -- 6
2 6 -- 8
3 7 -- 10
4 8 -- 12
Here, we step over the result of the zip call—that is, the pairs of items pulled from the
two lists. Notice that this for loop again uses the tuple assignment form we met earlier
to unpack each tuple in the zip result. The first time through, it’s as though we ran the
assignment statement (x, y) = (1, 5).
The net effect is that we scan both L1 and L2 in our loop. We could achieve a similar
effect with a while loop that handles indexing manually, but it would require more
typing and would likely run more slowly than the for/zip approach.
Strictly speaking, the zip function is more general than this example suggests. For in-
stance, it accepts any type of sequence (really, any iterable object, including files), and
it accepts more than two arguments. With three arguments, as in the following exam-
ple, it builds a list of three-item tuples with items from each sequence, essentially pro-
jecting by columns (technically, we get an N-ary tuple for N arguments):
>>> T1, T2, T3 = (1,2,3), (4,5,6), (7,8,9)
>>> T3
(7, 8, 9)
>>> list(zip(T1, T2, T3)) # Three tuples for three arguments
[(1, 4, 7), (2, 5, 8), (3, 6, 9)]
Moreover, zip truncates result tuples at the length of the shortest sequence when the
argument lengths differ. In the following, we zip together two strings to pick out char-
acters in parallel, but the result has only as many tuples as the length of the shortest
sequence:
>>> S1 = 'abc'
>>> S2 = 'xyz123'
>>>
>>> list(zip(S1, S2)) # Truncates at len(shortest)
[('a', 'x'), ('b', 'y'), ('c', 'z')]
map equivalence in Python 2.X
In Python 2.X only, the related built-in map function pairs items from sequences in a
similar fashion when passed None for its function argument, but it pads shorter se-
quences with None if the argument lengths differ instead of truncating to the shortest
length:
>>> S1 = 'abc'
>>> S2 = 'xyz123'
>>> map(None, S1, S2) # 2.X only: pads to len(longest)
[('a', 'x'), ('b', 'y'), ('c', 'z'), (None, '1'), (None, '2'), (None,'3')]
This example is using a degenerate form of the map built-in, which is no longer supported
in 3.X. Normally, map takes a function and one or more sequence arguments and collects
the results of calling the function with parallel items taken from the sequence(s).
We’ll study map in detail in Chapter 19 and Chapter 20, but as a brief example, the
following maps the built-in ord function across each item in a string and collects the
results (like zip, map is a value generator in 3.X and so must be passed to list to collect
all its results at once in 3.X only):
>>> list(map(ord, 'spam'))
[115, 112, 97, 109]
This works the same as the following loop statement, but map is often quicker, as
Chapter 21 will show:
>>> res = []
>>> for c in 'spam': res.append(ord(c))
>>> res
[115, 112, 97, 109]
Version skew note: The degenerate form of map using a function argu-
ment of None is no longer supported in Python 3.X, because it largely
overlaps with zip (and was, frankly, a bit at odds with map’s function-
application purpose). In 3.X, either use zip or write loop code to pad
results yourself. In fact, we’ll see how to write such loop code in Chap-
ter 20, after we’ve had a chance to study some additional iteration con-
cepts.
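One standard-library alternative worth noting here, though not covered by this chapter, is itertools.zip_longest in Python 3.X (izip_longest in 2.6 and later), which pads to the longest argument much like the 2.X map form shown earlier:
    >>> from itertools import zip_longest
    >>> list(zip_longest('abc', 'xyz123'))   # 3.X: pads with None by default
    [('a', 'x'), ('b', 'y'), ('c', 'z'), (None, '1'), (None, '2'), (None, '3')]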
Dictionary construction with zip
Let’s look at another zip use case. Chapter 8 suggested that the zip call used here can
also be handy for generating dictionaries when the sets of keys and values must be
computed at runtime. Now that we’re becoming proficient with zip, let’s explore more
fully how it relates to dictionary construction. As you’ve learned, you can always create
a dictionary by coding a dictionary literal, or by assigning to keys over time:
>>> D1 = {'spam':1, 'eggs':3, 'toast':5}
>>> D1
{'eggs': 3, 'toast': 5, 'spam': 1}
>>> D1 = {}
>>> D1['spam'] = 1
>>> D1['eggs'] = 3
>>> D1['toast'] = 5
What to do, though, if your program obtains dictionary keys and values in lists at
runtime, after you’ve coded your script? For example, say you had the following keys
and values lists, collected from a user, parsed from a file, or obtained from another
dynamic source:
>>> keys = ['spam', 'eggs', 'toast']
>>> vals = [1, 3, 5]
One solution for turning those lists into a dictionary would be to zip the lists and step
through them in parallel with a for loop:
>>> list(zip(keys, vals))
[('spam', 1), ('eggs', 3), ('toast', 5)]
>>> D2 = {}
>>> for (k, v) in zip(keys, vals): D2[k] = v
...
>>> D2
{'eggs': 3, 'toast': 5, 'spam': 1}
It turns out, though, that in Python 2.2 and later you can skip the for loop altogether
and simply pass the zipped keys/values lists to the built-in dict constructor call:
>>> keys = ['spam', 'eggs', 'toast']
>>> vals = [1, 3, 5]
>>> D3 = dict(zip(keys, vals))
>>> D3
{'eggs': 3, 'toast': 5, 'spam': 1}
The built-in name dict is really a type name in Python (you’ll learn more about type
names, and subclassing them, in Chapter 32). Calling it achieves something like a list-
to-dictionary conversion, but it’s really an object construction request.
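As a related sketch, the dict call can also build a dictionary from keyword arguments when the keys are valid identifiers known as you write your code:
    >>> dict(spam=1, eggs=3, toast=5)  # Keyword arguments form
    {'eggs': 3, 'toast': 5, 'spam': 1}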
In the next chapter we’ll explore the related but richer concept, the list comprehension,
which builds lists in a single expression; we’ll also revisit Python 3.X and 2.7 dictionary
comprehensions, an alternative to the dict call for zipped key/value pairs:
>>> {k: v for (k, v) in zip(keys, vals)}
{'eggs': 3, 'toast': 5, 'spam': 1}
Generating Both Offsets and Items: enumerate
Our final loop helper function is designed to support dual usage modes. Earlier, we
discussed using range to generate the offsets of items in a string, rather than the items
at those offsets. In some programs, though, we need both: the item to use, plus an offset
as we go. Traditionally, this was coded with a simple for loop that also kept a counter
of the current offset:
>>> S = 'spam'
>>> offset = 0
>>> for item in S:
... print(item, 'appears at offset', offset)
... offset += 1
...
s appears at offset 0
p appears at offset 1
a appears at offset 2
m appears at offset 3
This works, but in all recent Python 2.X and 3.X releases (since 2.3) a new built-in
named enumerate does the job for us—its net effect is to give loops a counter “for free,”
without sacrificing the simplicity of automatic iteration:
>>> S = 'spam'
>>> for (offset, item) in enumerate(S):
... print(item, 'appears at offset', offset)
...
s appears at offset 0
p appears at offset 1
a appears at offset 2
m appears at offset 3
The enumerate function returns a generator object—a kind of object that supports the
iteration protocol that we will study in the next chapter and will discuss in more detail
in the next part of the book. In short, it has a method called by the next built-in function,
which returns an (index, value) tuple each time through the loop. The for steps
through these tuples automatically, which allows us to unpack their values with tuple
assignment, much as we did for zip:
>>> E = enumerate(S)
>>> E
<enumerate object at 0x0000000002A8B900>
>>> next(E)
(0, 's')
>>> next(E)
(1, 'p')
>>> next(E)
(2, 'a')
We don’t normally see this machinery because all iteration contexts—including list
comprehensions, the subject of Chapter 14—run the iteration protocol automatically:
>>> [c * i for (i, c) in enumerate(S)]
['', 'p', 'aa', 'mmm']
>>> for (i, l) in enumerate(open('test.txt')):
... print('%s) %s' % (i, l.rstrip()))
...
0) aaaaaa
1) bbbbbb
2) cccccc
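As one more hedged sketch, enumerate also pairs naturally with the in-place list changes we coded with range and len earlier in this chapter:
    >>> L = [1, 2, 3, 4, 5]
    >>> for (i, x) in enumerate(L):
    ...     L[i] = x + 1               # Assign back by position
    ...
    >>> L
    [2, 3, 4, 5, 6]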
To fully understand iteration concepts like enumerate, zip, and list comprehensions,
though, we need to move on to the next chapter for a more formal dissection.
Why You Will Care: Shell Commands and More
An earlier sidebar showed loops applied to files. As briefly noted in Chapter 9, Python’s
related os.popen call also gives a file-like interface, for reading the outputs of spawned
shell commands. Now that we’ve studied looping statements in full, here’s an example
of this tool in action—to run a shell command and read its standard output text, pass
the command as a string to os.popen, and read text from the file-like object it returns
(if this triggers a Unicode encoding issue on your computer, Chapter 25’s discussion
of currency symbols may apply):
>>> import os
>>> F = os.popen('dir') # Read line by line
>>> F.readline()
' Volume in drive C has no label.\n'
>>> F = os.popen('dir') # Read by sized blocks
>>> F.read(50)
' Volume in drive C has no label.\n Volume Serial Nu'
>>> os.popen('dir').readlines()[0] # Read all lines: index
' Volume in drive C has no label.\n'
>>> os.popen('dir').read()[:50] # Read all at once: slice
' Volume in drive C has no label.\n Volume Serial Nu'
>>> for line in os.popen('dir'): # File line iterator loop
... print(line.rstrip())
...
Volume in drive C has no label.
Volume Serial Number is D093-D1F7
...and so on...
This runs a dir directory listing on Windows, but any program that can be started with
a command line can be launched this way. We might use this scheme, for example, to
display the output of the Windows systeminfo command—os.system simply runs a shell
command, but os.popen also connects to its streams; both of the following show the
shell command’s output in a simple console window, but the first might not in a GUI
interface such as IDLE:
>>> os.system('systeminfo')
...output in console, popup in IDLE...
0
>>> for line in os.popen('systeminfo'): print(line.rstrip())
Host Name: MARK-VAIO
OS Name: Microsoft Windows 7 Professional
OS Version: 6.1.7601 Service Pack 1 Build 7601
...lots of system information text...
And once we have a command’s output in text form, any string processing tool or
technique applies—including display formatting and content parsing:
# Formatted, limited display
>>> for (i, line) in enumerate(os.popen('systeminfo')):
... if i == 4: break
... print('%05d) %s' % (i, line.rstrip()))
...
00000)
00001) Host Name: MARK-VAIO
00002) OS Name: Microsoft Windows 7 Professional
00003) OS Version: 6.1.7601 Service Pack 1 Build 7601
# Parse for specific lines, case neutral
>>> for line in os.popen('systeminfo'):
... parts = line.split(':')
... if parts and parts[0].lower() == 'system type':
... print(parts[1].strip())
...
x64-based PC
We’ll see os.popen in action again in Chapter 21, where we’ll deploy it to read the results
of a constructed command line that times code alternatives, and in Chapter 25, where
it will be used to compare outputs of scripts being tested.
Tools like os.popen and os.system (and the subprocess module not shown here) allow
you to leverage every command-line program on your computer, but you can also write
emulators with in-process code. For example, simulating the Unix awk utility’s ability
to strip columns out of text files is almost trivial in Python, and can become a reusable
function in the process:
# awk emulation: extract column 7 from whitespace-delimited file
for val in [line.split()[6] for line in open('input.txt')]:
    print(val)

# Same, but more explicit code that retains result
col7 = []
for line in open('input.txt'):
    cols = line.split()
    col7.append(cols[6])
for item in col7: print(item)

# Same, but a reusable function (see next part of book)
def awker(file, col):
    return [line.rstrip().split()[col-1] for line in open(file)]

print(awker('input.txt', 7))            # List of strings
print(','.join(awker('input.txt', 7)))  # Put commas between
By itself, though, Python provides file-like access to a wide variety of data—including
the text returned by websites and their pages identified by URL, though we’ll have to
defer to Part V for more on the package import used here, and other resources for more
on such tools in general (e.g., this works in 2.X, but uses urllib instead of
urllib.request, and returns text strings):
>>> from urllib.request import urlopen
>>> for line in urlopen('http://home.rmi.net/~lutz'):
... print(line)
...
b'<HTML>\n'
b'\n'
b'<HEAD>\n'
b"<TITLE>Mark Lutz's Book Support Site</TITLE>\n"
...etc...
Chapter Summary
In this chapter, we explored Python’s looping statements as well as some concepts
related to looping in Python. We looked at the while and for loop statements in depth,
and we learned about their associated else clauses. We also studied the break and
continue statements, which have meaning only inside loops, and met several built-in
tools commonly used in for loops, including range, zip, map, and enumerate, although
some of the details regarding their roles as iterables in Python 3.X were intentionally
cut short.
In the next chapter, we continue the iteration story by discussing list comprehensions
and the iteration protocol in Python—concepts strongly related to for loops. There,
we’ll also give the rest of the picture behind the iterable tools we met here, such as
range and zip, and study some of the subtleties of their operation. As always, though,
before moving on let’s exercise what you’ve picked up here with a quiz.
Test Your Knowledge: Quiz
1. What are the main functional differences between a while and a for?
2. What’s the difference between break and continue?
3. When is a loop’s else clause executed?
4. How can you code a counter-based loop in Python?
5. What can a range be used for in a for loop?
Test Your Knowledge: Answers
1. The while loop is a general looping statement, but the for is designed to iterate
across items in a sequence or other iterable. Although the while can imitate the
for with counter loops, it takes more code and might run slower.
2. The break statement exits a loop immediately (you wind up below the entire
while or for loop statement), and continue jumps back to the top of the loop (you
wind up positioned just before the test in while or the next item fetch in for).
3. The else clause in a while or for loop will be run once as the loop is exiting, if the
loop exits normally (without running into a break statement). A break exits the
loop immediately, skipping the else part on the way out (if there is one).
4. Counter loops can be coded with a while statement that keeps track of the index
manually, or with a for loop that uses the range built-in function to generate suc-
cessive integer offsets. Neither is the preferred way to work in Python, if you need
to simply step across all the items in a sequence. Instead, use a simple for loop
instead, without range or counters, whenever possible; it will be easier to code and
usually quicker to run.
5. The range built-in can be used in a for to implement a fixed number of repetitions,
to scan by offsets instead of items at offsets, to skip successive items as you go, and
to change a list while stepping across it. None of these roles requires range, and
most have alternatives—scanning actual items, three-limit slices, and list compre-
hensions are often better solutions today (despite the natural inclinations of ex–C
programmers to want to count things!).
CHAPTER 14
Iterations and Comprehensions
In the prior chapter we met Python’s two looping statements, while and for. Although
they can handle most repetitive tasks programs need to perform, the need to iterate
over sequences is so common and pervasive that Python provides additional tools to
make it simpler and more efficient. This chapter begins our exploration of these tools.
Specifically, it presents the related concepts of Python’s iteration protocol, a method-
call model used by the for loop, and fills in some details on list comprehensions, which
are a close cousin to the for loop that applies an expression to items in an iterable.
Because these tools are related to both the for loop and functions, we’ll take a two-pass
approach to covering them in this book, along with a postscript:
• This chapter introduces their basics in the context of looping tools, serving as something of a continuation of the prior chapter.
• Chapter 20 revisits them in the context of function-based tools, and extends the topic to include built-in and user-defined generators.
• Chapter 30 also provides a shorter final installment in this story, where we'll learn about user-defined iterable objects coded with classes.
In this chapter, we’ll also sample additional iteration tools in Python, and touch on the
new iterables available in Python 3.X—where the notion of iterables grows even more
pervasive.
One note up front: some of the concepts presented in these chapters may seem ad-
vanced at first glance. With practice, though, you’ll find that these tools are useful and
powerful. Although never strictly required, because they’ve become commonplace in
Python code, a basic understanding can also help if you must read programs written
by others.
Iterations: A First Look
In the preceding chapter, I mentioned that the for loop can work on any sequence type
in Python, including lists, tuples, and strings, like this:
>>> for x in [1, 2, 3, 4]: print(x ** 2, end=' ') # In 2.X: print x ** 2,
...
1 4 9 16
>>> for x in (1, 2, 3, 4): print(x ** 3, end=' ')
...
1 8 27 64
>>> for x in 'spam': print(x * 2, end=' ')
...
ss pp aa mm
Actually, the for loop turns out to be even more generic than this—it works on any iterable object. In fact, this is true of all iteration tools that scan objects from left to right in Python, including for loops, the list comprehensions we'll study in this chapter, the in membership test, the map built-in function, and more.
The concept of “iterable objects” is relatively recent in Python, but it has come to
permeate the language’s design. It’s essentially a generalization of the notion of se-
quences—an object is considered iterable if it is either a physically stored sequence, or
an object that produces one result at a time in the context of an iteration tool like a
for loop. In a sense, iterable objects include both physical sequences and virtual se-
quences computed on demand.
Terminology in this topic tends to be a bit loose. The terms “iterable”
and “iterator” are sometimes used interchangeably to refer to an object
that supports iteration in general. For clarity, this book has a very strong
preference for using the term iterable to refer to an object that supports
the iter call, and iterator to refer to an object returned by an iterable on
iter that supports the next(I) call. Both these calls are defined ahead.
That convention is not universal in either the Python world or this book,
though; “iterator” is also sometimes used for tools that iterate. Chap-
ter 20 extends this category with the term “generator”—which refers to
objects that automatically support the iteration protocol, and hence are
iterable—even though all iterables generate results!
The Iteration Protocol: File Iterators
One of the easiest ways to understand the iteration protocol is to see how it works with
a built-in type such as the file. In this chapter, we’ll be using the following input file to
demonstrate:
>>> print(open('script2.py').read())
import sys
print(sys.path)
x = 2
print(x ** 32)
>>> open('script2.py').read()
'import sys\nprint(sys.path)\nx = 2\nprint(x ** 32)\n'
Recall from Chapter 9 that open file objects have a method called readline, which reads
one line of text from a file at a time—each time we call the readline method, we advance
to the next line. At the end of the file, an empty string is returned, which we can detect
to break out of the loop:
>>> f = open('script2.py') # Read a four-line script file in this directory
>>> f.readline() # readline loads one line on each call
'import sys\n'
>>> f.readline()
'print(sys.path)\n'
>>> f.readline()
'x = 2\n'
>>> f.readline() # Last lines may have a \n or not
'print(x ** 32)\n'
>>> f.readline() # Returns empty string at end-of-file
''
However, files also have a method named __next__ in 3.X (and next in 2.X) that has a
nearly identical effect—it returns the next line from a file each time it is called. The
only noticeable difference is that __next__ raises a built-in StopIteration exception at
end-of-file instead of returning an empty string:
>>> f = open('script2.py') # __next__ loads one line on each call too
>>> f.__next__() # But raises an exception at end-of-file
'import sys\n'
>>> f.__next__() # Use f.next() in 2.X, or next(f) in 2.X or 3.X
'print(sys.path)\n'
>>> f.__next__()
'x = 2\n'
>>> f.__next__()
'print(x ** 32)\n'
>>> f.__next__()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
This interface is most of what we call the iteration protocol in Python. Any object with
a __next__ method to advance to a next result, which raises StopIteration at the end
of the series of results, is considered an iterator in Python. Any such object may also
be stepped through with a for loop or other iteration tool, because all iteration tools
normally work internally by calling __next__ on each iteration and catching the StopIt
eration exception to determine when to exit. As we’ll see in a moment, for some objects
the full protocol includes an additional first step to call iter, but this isn’t required for
files.
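To make this concrete, here is a rough sketch of what iteration tools do internally for a file. Manual code like this is never required, but it spells out the protocol just described (we'll return to this equivalence for lists ahead):

f = open('script2.py')
while True:
    try:
        line = f.__next__()        # Or next(f): fetch the next result
    except StopIteration:          # Raised at the end of results
        break
    print(line.upper(), end='')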
The net effect of this magic is that, as mentioned in Chapter 9 and Chapter 13, the best
way to read a text file line by line today is to not read it at all—instead, allow the for
loop to automatically call __next__ to advance to the next line on each iteration. The
file object’s iterator will do the work of automatically loading lines as you go. The
following, for example, reads a file line by line, printing the uppercase version of each
line along the way, without ever explicitly reading from the file at all:
>>> for line in open('script2.py'): # Use file iterators to read by lines
... print(line.upper(), end='') # Calls __next__, catches StopIteration
...
IMPORT SYS
PRINT(SYS.PATH)
X = 2
PRINT(X ** 32)
Notice that the print uses end='' here to suppress adding a \n, because line strings
already have one (without this, our output would be double-spaced; in 2.X, a trailing
comma works the same as the end). This is considered the best way to read text files
line by line today, for three reasons: it’s the simplest to code, might be the quickest to
run, and is the best in terms of memory usage. The older, original way to achieve the
same effect with a for loop is to call the file readlines method to load the file’s content
into memory as a list of line strings:
>>> for line in open('script2.py').readlines():
... print(line.upper(), end='')
...
IMPORT SYS
PRINT(SYS.PATH)
X = 2
PRINT(X ** 32)
This readlines technique still works but is not considered the best practice today and
performs poorly in terms of memory usage. In fact, because this version really does load
the entire file into memory all at once, it will not even work for files too big to fit into
the memory space available on your computer. By contrast, because it reads one line
at a time, the iterator-based version is immune to such memory-explosion issues. The
iterator version might run quicker too, though this can vary per release.
As mentioned in the prior chapter’s sidebar, “Why You Will Care: File Scan-
ners” on page 400, it’s also possible to read a file line by line with a while loop:
>>> f = open('script2.py')
>>> while True:
... line = f.readline()
... if not line: break
... print(line.upper(), end='')
...
...same output...
However, this may run slower than the iterator-based for loop version, because itera-
tors run at C language speed inside Python, whereas the while loop version runs Python
byte code through the Python virtual machine. Anytime we trade Python code for C
code, speed tends to increase. This is not an absolute truth, though, especially in Python
3.X; we’ll see timing techniques later in Chapter 21 for measuring the relative speed of
alternatives like these.1
Version skew note: In Python 2.X, the iteration method is named
X.next() instead of X.__next__(). For portability, a next(X) built-in
function is also available in both Python 3.X and 2.X (2.6 and later), and
calls X.__next__() in 3.X and X.next() in 2.X. Apart from method
names, iteration works the same in 2.X and 3.X in all other ways. In 2.6
and 2.7, simply use X.next() or next(X) for manual iterations instead of
3.X’s X.__next__(); prior to 2.6, use X.next() calls instead of next(X).
Manual Iteration: iter and next
To simplify manual iteration code, Python 3.X also provides a built-in function, next,
that automatically calls an object’s __next__ method. Per the preceding note, this call
also is supported on Python 2.X for portability. Given an iterator object X, the call
next(X) is the same as X.__next__() on 3.X (and X.next() on 2.X), but is noticeably
simpler and more version-neutral. With files, for instance, either form may be used:
>>> f = open('script2.py')
>>> f.__next__() # Call iteration method directly
'import sys\n'
>>> f.__next__()
'print(sys.path)\n'
>>> f = open('script2.py')
>>> next(f) # The next(f) built-in calls f.__next__() in 3.X
'import sys\n'
>>> next(f) # next(f) => [3.X: f.__next__()], [2.X: f.next()]
'print(sys.path)\n'
Technically, there is one more piece to the iteration protocol alluded to earlier. When
the for loop begins, it first obtains an iterator from the iterable object by passing it to
the iter built-in function; the object returned by iter in turn has the required next
method. The iter function internally runs the __iter__ method, much like next and
__next__.
1. Spoiler alert: the file iterator still appears to be slightly faster than readlines and at least 30% faster than
the while loop in both 2.7 and 3.3 on tests I’ve run with this chapter’s code on a 1,000-line file (while is
twice as slow on 2.7). The usual benchmarking caveats apply—this is true only for my Pythons, my
computer, and my test file, and Python 3.X complicates such analyses by rewriting I/O libraries to support
Unicode text and be less system-dependent. Chapter 21 covers tools and techniques you can use to time
these loop statements on your own.
The full iteration protocol
As a more formal definition, Figure 14-1 sketches this full iteration protocol, used by
every iteration tool in Python, and supported by a wide variety of object types. It’s really
based on two objects, used in two distinct steps by iteration tools:
• The iterable object you request iteration for, whose __iter__ is run by iter
• The iterator object returned by the iterable that actually produces values during the iteration, whose __next__ is run by next and raises StopIteration when finished producing results
These steps are orchestrated automatically by iteration tools in most cases, but it helps
to understand these two objects’ roles. For example, in some cases these two objects
are the same when only a single scan is supported (e.g., files), and the iterator object is
often temporary, used internally by the iteration tool.
Moreover, some objects are both an iteration context tool (they iterate) and an iterable
object (their results are iterable)—including Chapter 20’s generator expressions, and
map and zip in Python 3.X. As we’ll see ahead, more tools become iterables in 3.X—
including map, zip, range, and some dictionary methods—to avoid constructing result
lists in memory all at once.
Figure 14-1. The Python iteration protocol, used by for loops, comprehensions, maps, and more, and
supported by files, lists, dictionaries, Chapter 20’s generators, and more. Some objects are both
iteration context and iterable object, such as generator expressions and 3.X’s flavors of some tools
(such as map and zip). Some objects are both iterable and iterator, returning themselves for the iter()
call, which is then a no-op.
In actual code, the protocol’s first step becomes obvious if we look at how for loops
internally process built-in sequence types such as lists:
>>> L = [1, 2, 3]
>>> I = iter(L) # Obtain an iterator object from an iterable
>>> I.__next__() # Call iterator's next to advance to next item
1
>>> I.__next__() # Or use I.next() in 2.X, next(I) in either line
2
>>> I.__next__()
3
>>> I.__next__()
...error text omitted...
StopIteration
This initial step is not required for files, because a file object is its own iterator. Because
they support just one iteration (they can’t seek backward to support multiple active
scans), files have their own __next__ method and do not need to return a different object
that does:
>>> f = open('script2.py')
>>> iter(f) is f
True
>>> iter(f) is f.__iter__()
True
>>> f.__next__()
'import sys\n'
Lists and many other built-in objects, though, are not their own iterators because they
do support multiple open iterations—for example, there may be multiple iterations in
nested loops all at different positions. For such objects, we must call iter to start iter-
ating:
>>> L = [1, 2, 3]
>>> iter(L) is L
False
>>> L.__next__()
AttributeError: 'list' object has no attribute '__next__'
>>> I = iter(L)
>>> I.__next__()
1
>>> next(I) # Same as I.__next__()
2
Manual iteration
Although Python iteration tools call these functions automatically, we can use them to
apply the iteration protocol manually, too. The following interaction demonstrates the
equivalence between automatic and manual iteration:2
>>> L = [1, 2, 3]
>>>
>>> for X in L: # Automatic iteration
... print(X ** 2, end=' ') # Obtains iter, calls __next__, catches exceptions
...
1 4 9
2. Technically speaking, the for loop calls the internal equivalent of I.__next__, instead of the next(I) used
here, though there is rarely any difference between the two. Your manual iterations can generally use
either call scheme.
>>> I = iter(L) # Manual iteration: what for loops usually do
>>> while True:
... try: # try statement catches exceptions
... X = next(I) # Or call I.__next__ in 3.X
... except StopIteration:
... break
... print(X ** 2, end=' ')
...
1 4 9
To understand this code, you need to know that try statements run an action and catch
exceptions that occur while the action runs (we met exceptions briefly in Chapter 11
but will explore them in depth in Part VII). I should also note that for loops and other
iteration contexts can sometimes work differently for user-defined classes, repeatedly
indexing an object instead of running the iteration protocol, but prefer the iteration
protocol if it’s used. We’ll defer that story until we study class operator overloading in
Chapter 30.
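As a brief preview of that story, the following hypothetical class supports iteration by indexing alone, a minimal sketch of the fallback scheme in which iteration tools index an object from zero until IndexError is raised (the class and its values are illustrative only):

class Squares:                       # No __iter__: iterated by indexing
    def __getitem__(self, i):
        if i > 3: raise IndexError   # IndexError ends the iteration
        return i ** 2

for x in Squares():                  # Indexes 0, 1, 2, ... automatically
    print(x, end=' ')                # Prints: 0 1 4 9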
Other Built-in Type Iterables
Besides files and physical sequences like lists, other types have useful iterators as well.
The classic way to step through the keys of a dictionary, for example, is to request its
keys list explicitly:
>>> D = {'a':1, 'b':2, 'c':3}
>>> for key in D.keys():
... print(key, D[key])
...
a 1
b 2
c 3
In recent versions of Python, though, dictionaries are iterables with an iterator that
automatically returns one key at a time in an iteration context:
>>> I = iter(D)
>>> next(I)
'a'
>>> next(I)
'b'
>>> next(I)
'c'
>>> next(I)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
The net effect is that we no longer need to call the keys method to step through dic-
tionary keys—the for loop will use the iteration protocol to grab one key each time
through:
>>> for key in D:
... print(key, D[key])
...
a 1
b 2
c 3
We can’t delve into their details here, but other Python object types also support the
iteration protocol and thus may be used in for loops too. For instance, shelves (an
access-by-key filesystem for Python objects) and the results from os.popen (a tool for
reading the output of shell commands, which we met in the preceding chapter) are
iterable as well:
>>> import os
>>> P = os.popen('dir')
>>> P.__next__()
' Volume in drive C has no label.\n'
>>> P.__next__()
' Volume Serial Number is D093-D1F7\n'
>>> next(P)
TypeError: _wrap_close object is not an iterator
Notice that popen objects themselves support a P.next() method in Python 2.X. In 3.X,
they support the P.__next__() method, but not the next(P) built-in. Since the latter is
defined to call the former, this may seem unusual, though both calls work correctly if
we use the full iteration protocol employed automatically by for loops and other iter-
ation contexts, with its top-level iter call (this performs internal steps required to also
support next calls for this object):
>>> P = os.popen('dir')
>>> I = iter(P)
>>> next(I)
' Volume in drive C has no label.\n'
>>> I.__next__()
' Volume Serial Number is D093-D1F7\n'
Also in the systems domain, the standard directory walker in Python, os.walk, is sim-
ilarly iterable, but we’ll save an example until Chapter 20’s coverage of this tool’s basis
—generators and yield.
The iteration protocol also is the reason that we’ve had to wrap some results in a
list call to see their values all at once. Objects that are iterable return results one at a
time, not in a physical list:
>>> R = range(5)
>>> R # Ranges are iterables in 3.X
range(0, 5)
>>> I = iter(R) # Use iteration protocol to produce results
>>> next(I)
0
>>> next(I)
1
>>> list(range(5)) # Or use list to collect all results at once
[0, 1, 2, 3, 4]
Note that the list call here is not required in 2.X (where range builds a real list), and
is not needed in 3.X for contexts where iteration happens automatically (such as within
for loops). It is needed for displaying values here in 3.X, though, and may also be
required when list-like behavior or multiple scans are required for objects that produce
results on demand in 2.X or 3.X (more on this ahead).
Now that you have a better understanding of this protocol, you should be able to see
how it explains why the enumerate tool introduced in the prior chapter works the way
it does:
>>> E = enumerate('spam') # enumerate is an iterable too
>>> E
<enumerate object at 0x00000000029B7678>
>>> I = iter(E)
>>> next(I) # Generate results with iteration protocol
(0, 's')
>>> next(I) # Or use list to force generation to run
(1, 'p')
>>> list(enumerate('spam'))
[(0, 's'), (1, 'p'), (2, 'a'), (3, 'm')]
We don’t normally see this machinery because for loops run it for us automatically to
step through results. In fact, everything that scans left to right in Python employs the
iteration protocol in the same way—including the topic of the next section.
List Comprehensions: A First Detailed Look
Now that we’ve seen how the iteration protocol works, let’s turn to one of its most
common use cases. Together with for loops, list comprehensions are one of the most
prominent contexts in which the iteration protocol is applied.
In the previous chapter, we learned how to use range to change a list as we step across
it:
>>> L = [1, 2, 3, 4, 5]
>>> for i in range(len(L)):
... L[i] += 10
...
>>> L
[11, 12, 13, 14, 15]
This works, but as I mentioned there, it may not be the optimal “best practice” approach
in Python. Today, the list comprehension expression makes many such prior coding
patterns obsolete. Here, for example, we can replace the loop with a single expression
that produces the desired result list:
>>> L = [x + 10 for x in L]
>>> L
[21, 22, 23, 24, 25]
The net result is similar, but it requires less coding on our part and is likely to run
substantially faster. The list comprehension isn’t exactly the same as the for loop
statement version because it makes a new list object (which might matter if there are
multiple references to the original list), but it’s close enough for most applications and
is a common and convenient enough approach to merit a closer look here.
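For instance, the following sketches the difference just noted: because the comprehension builds a new list and rebinds the name, a second reference to the original list still sees the unchanged object (the names here are illustrative only):

>>> L = [1, 2, 3]
>>> M = L                          # M references the same list object
>>> L = [x + 10 for x in L]        # Builds a new list; rebinds L only
>>> L, M
([11, 12, 13], [1, 2, 3])

By contrast, the in-place for loop statement version changes the shared object itself, so M would see the updated values too.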
List Comprehension Basics
We met the list comprehension briefly in Chapter 4. Syntactically, it is derived from a construct in set theory notation that applies an operation to each item in a set, but you don't have to know set theory to use this tool. In Python, most people find that a list comprehension simply looks like a backward for loop.
To get a handle on the syntax, let’s dissect the prior section’s example in more detail:
L = [x + 10 for x in L]
List comprehensions are written in square brackets because they are ultimately a way
to construct a new list. They begin with an arbitrary expression that we make up, which
uses a loop variable that we make up (x + 10). That is followed by what you should
now recognize as the header of a for loop, which names the loop variable, and an
iterable object (for x in L).
To run the expression, Python executes an iteration across L inside the interpreter,
assigning x to each item in turn, and collects the results of running the items through
the expression on the left side. The result list we get back is exactly what the list com-
prehension says—a new list containing x + 10, for every x in L.
Technically speaking, list comprehensions are never really required because we can
always build up a list of expression results manually with for loops that append results
as we go:
>>> res = []
>>> for x in L:
... res.append(x + 10)
...
>>> res
[31, 32, 33, 34, 35]
In fact, this is exactly what the list comprehension does internally.
However, list comprehensions are more concise to write, and because this code pattern
of building up result lists is so common in Python work, they turn out to be very useful
in many contexts. Moreover, depending on your Python and code, list comprehensions
might run much faster than manual for loop statements (often roughly twice as fast)
because their iterations are performed at C language speed inside the interpreter, rather
than with manual Python code. Especially for larger data sets, there is often a major
performance advantage to using this expression.
Using List Comprehensions on Files
Let’s work through another common application of list comprehensions to explore
them in more detail. Recall that the file object has a readlines method that loads the
file into a list of line strings all at once:
>>> f = open('script2.py')
>>> lines = f.readlines()
>>> lines
['import sys\n', 'print(sys.path)\n', 'x = 2\n', 'print(x ** 32)\n']
This works, but the lines in the result all include the newline character (\n) at the end.
For many programs, the newline character gets in the way—we have to be careful to
avoid double-spacing when printing, and so on. It would be nice if we could get rid of
these newlines all at once, wouldn’t it?
Anytime we start thinking about performing an operation on each item in a sequence,
we’re in the realm of list comprehensions. For example, assuming the variable lines is
as it was in the prior interaction, the following code does the job by running each line
in the list through the string rstrip method to remove whitespace on the right side (a
line[:-1] slice would work, too, but only if we can be sure all lines are properly \n
terminated, and this may not always be the case for the last line in a file):
>>> lines = [line.rstrip() for line in lines]
>>> lines
['import sys', 'print(sys.path)', 'x = 2', 'print(x ** 32)']
This works as planned. Because list comprehensions are an iteration context just like
for loop statements, though, we don’t even have to open the file ahead of time. If we
open it inside the expression, the list comprehension will automatically use the iteration
protocol we met earlier in this chapter. That is, it will read one line from the file at a
time by calling the file’s next handler method, run the line through the rstrip expres-
sion, and add it to the result list. Again, we get what we ask for—the rstrip result of
a line, for every line in the file:
>>> lines = [line.rstrip() for line in open('script2.py')]
>>> lines
['import sys', 'print(sys.path)', 'x = 2', 'print(x ** 32)']
This expression does a lot implicitly, but we’re getting a lot of work for free here—
Python scans the file by lines and builds a list of operation results automatically. It’s
also an efficient way to code this operation: because most of this work is done inside
the Python interpreter, it may be faster than an equivalent for statement, and won’t
load a file into memory all at once like some other techniques. Again, especially for
large files, the advantages of list comprehensions can be significant.
Besides their efficiency, list comprehensions are also remarkably expressive. In our
example, we can run any string operation on a file’s lines as we iterate. To illustrate,
here’s the list comprehension equivalent to the file iterator uppercase example we met
earlier, along with a few other representative operations:
>>> [line.upper() for line in open('script2.py')]
['IMPORT SYS\n', 'PRINT(SYS.PATH)\n', 'X = 2\n', 'PRINT(X ** 32)\n']
>>> [line.rstrip().upper() for line in open('script2.py')]
['IMPORT SYS', 'PRINT(SYS.PATH)', 'X = 2', 'PRINT(X ** 32)']
>>> [line.split() for line in open('script2.py')]
[['import', 'sys'], ['print(sys.path)'], ['x', '=', '2'], ['print(x', '**', '32)']]
>>> [line.replace(' ', '!') for line in open('script2.py')]
['import!sys\n', 'print(sys.path)\n', 'x!=!2\n', 'print(x!**!32)\n']
>>> [('sys' in line, line[:5]) for line in open('script2.py')]
[(True, 'impor'), (True, 'print'), (False, 'x = 2'), (False, 'print')]
Recall that the method chaining in the second of these examples works because string
methods return a new string, to which we can apply another string method. The last
of these shows how we can also collect multiple results, as long as they’re wrapped in
a collection like a tuple or list.
One fine point here: recall from Chapter 9 that file objects close them-
selves automatically when garbage-collected if still open. Hence, these
list comprehensions will also automatically close the file when their
temporary file object is garbage-collected after the expression runs.
Outside CPython, though, you may want to code these to close man-
ually if this is run in a loop, to ensure that file resources are freed im-
mediately. See Chapter 9 for more on file close calls if you need a re-
fresher on this.
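For instance, a with statement (covered in full later in this book) is one portable way to guarantee that the file is closed promptly. This is a sketch of the idiom, not a requirement of the examples here:

>>> with open('script2.py') as f:                 # Closes f on block exit
...     lines = [line.rstrip() for line in f]
...
>>> lines
['import sys', 'print(sys.path)', 'x = 2', 'print(x ** 32)']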
Extended List Comprehension Syntax
In fact, list comprehensions can be even richer in practice, and even constitute a sort
of iteration mini-language in their fullest forms. Let’s take a quick look at their syntax
tools here.
Filter clauses: if
As one particularly useful extension, the for loop nested in a comprehension expression can have an associated if clause to filter out of the result any items for which the test is not true.
For example, suppose we want to repeat the prior section’s file-scanning example, but
we need to collect only lines that begin with the letter p (perhaps the first character on
each line is an action code of some sort). Adding an if filter clause to our expression
does the trick:
>>> lines = [line.rstrip() for line in open('script2.py') if line[0] == 'p']
>>> lines
['print(sys.path)', 'print(x ** 32)']
Here, the if clause checks each line read from the file to see whether its first character
is p; if not, the line is omitted from the result list. This is a fairly big expression, but it’s
easy to understand if we translate it to its simple for loop statement equivalent. In
general, we can always translate a list comprehension to a for statement by appending
as we go and further indenting each successive part:
>>> res = []
>>> for line in open('script2.py'):
... if line[0] == 'p':
... res.append(line.rstrip())
...
>>> res
['print(sys.path)', 'print(x ** 32)']
This for statement equivalent works, but it takes up four lines instead of one and may
run slower. In fact, you can squeeze a substantial amount of logic into a list compre-
hension when you need to—the following works like the prior but selects only lines
that end in a digit (before the newline at the end), by filtering with a more sophisticated
expression on the right side:
>>> [line.rstrip() for line in open('script2.py') if line.rstrip()[-1].isdigit()]
['x = 2']
As another if filter example, the first result in the following gives the total lines in a
text file, and the second strips whitespace on both ends to omit blank lines in the tally
in just one line of code (this file, not included, contains lines describing typos found in
the first draft of this book by my proofreader):
>>> fname = r'd:\books\5e\lp5e\draft1typos.txt'
>>> len(open(fname).readlines()) # All lines
263
>>> len([line for line in open(fname) if line.strip() != '']) # Nonblank lines
185
Nested loops: for
List comprehensions can become even more complex if we need them to—for instance,
they may contain nested loops, coded as a series of for clauses. In fact, their full syntax
allows for any number of for clauses, each of which can have an optional associated
if clause.
For example, the following builds a list of the concatenation of x + y for every x in one
string and every y in another. It effectively collects all the ordered combinations of the
characters in two strings:
>>> [x + y for x in 'abc' for y in 'lmn']
['al', 'am', 'an', 'bl', 'bm', 'bn', 'cl', 'cm', 'cn']
Again, one way to understand this expression is to convert it to statement form by
indenting its parts. The following is an equivalent, but likely slower, alternative way to
achieve the same effect:
>>> res = []
>>> for x in 'abc':
... for y in 'lmn':
... res.append(x + y)
...
>>> res
['al', 'am', 'an', 'bl', 'bm', 'bn', 'cl', 'cm', 'cn']
Beyond this complexity level, though, list comprehension expressions can often be-
come too compact for their own good. In general, they are intended for simple types
of iterations; for more involved work, a simpler for statement structure will probably
be easier to understand and modify in the future. As usual in programming, if something
is difficult for you to understand, it’s probably not a good idea.
Because comprehensions are generally best taken in multiple doses, we’ll cut this story
short here for now. We’ll revisit list comprehensions in Chapter 20 in the context of
functional programming tools, and will define their syntax more formally and explore
additional examples there. As we’ll find, comprehensions turn out to be just as related
to functions as they are to looping statements.
A blanket qualification for all performance claims in this book, list com-
prehension or other: the relative speed of code depends much on the
exact code tested and Python used, and is prone to change from release
to release.
For example, in CPython 2.7 and 3.3 today, list comprehensions can
still be twice as fast as corresponding for loops on some tests, but just
marginally quicker on others, and perhaps even slightly slower on some
when if filter clauses are used.
We’ll see how to time code in Chapter 21, and will learn how to interpret
the file listcomp-speed.txt in the book examples package, which times
this chapter’s code. For now, keep in mind that absolutes in perfor-
mance benchmarks are as elusive as consensus in open source projects!
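As the quickest of previews of Chapter 21's tools, rough comparisons like these can be run with the standard library's timeit module. This is a sketch only; your numbers will vary per Python, machine, and test:

import timeit
# Time 1,000 runs of each alternative; a larger result means slower code
print(timeit.timeit("[x + 10 for x in range(1000)]", number=1000))
print(timeit.timeit("res = []\nfor x in range(1000): res.append(x + 10)",
                    number=1000))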
Other Iteration Contexts
Later in the book, we’ll see that user-defined classes can implement the iteration pro-
tocol too. Because of this, it’s sometimes important to know which built-in tools make
use of it—any tool that employs the iteration protocol will automatically work on any
built-in type or user-defined class that provides it.
So far, I’ve been demonstrating iterators in the context of the for loop statement, be-
cause this part of the book is focused on statements. Keep in mind, though, that
every built-in tool that scans from left to right across objects uses the iteration protocol.
This includes the for loops we’ve seen:
>>> for line in open('script2.py'): # Use file iterators
... print(line.upper(), end='')
...
IMPORT SYS
PRINT(SYS.PATH)
X = 2
PRINT(X ** 32)
But also much more. For instance, list comprehensions and the map built-in function
use the same protocol as their for loop cousin. When applied to a file, they both leverage
the file object’s iterator automatically to scan line by line, fetching an iterator with
__iter__ and calling __next__ each time through:
>>> uppers = [line.upper() for line in open('script2.py')]
>>> uppers
['IMPORT SYS\n', 'PRINT(SYS.PATH)\n', 'X = 2\n', 'PRINT(X ** 32)\n']
>>> map(str.upper, open('script2.py')) # map is itself an iterable in 3.X
<map object at 0x00000000029476D8>
>>> list(map(str.upper, open('script2.py')))
['IMPORT SYS\n', 'PRINT(SYS.PATH)\n', 'X = 2\n', 'PRINT(X ** 32)\n']
We introduced the map call used here briefly in the preceding chapter (and in passing
in Chapter 4); it’s a built-in that applies a function call to each item in the passed-in
iterable object. map is similar to a list comprehension but is more limited because it
requires a function instead of an arbitrary expression. It also returns an iterable object
itself in Python 3.X, so we must wrap it in a list call to force it to give us all its values
at once; more on this change later in this chapter. Because map, like the list compre-
hension, is related to both for loops and functions, we’ll also explore both again in
Chapter 19 and Chapter 20.
Many of Python’s other built-ins process iterables, too. For example, sorted sorts items
in an iterable; zip combines items from iterables; enumerate pairs items in an iterable
with relative positions; filter selects items for which a function is true; and reduce
runs pairs of items in an iterable through a function. All of these accept iterables, and
zip, enumerate, and filter also return an iterable in Python 3.X, like map. Here they are
in action running the file’s iterator automatically to read line by line:
>>> sorted(open('script2.py'))
['import sys\n', 'print(sys.path)\n', 'print(x ** 32)\n', 'x = 2\n']
>>> list(zip(open('script2.py'), open('script2.py')))
[('import sys\n', 'import sys\n'), ('print(sys.path)\n', 'print(sys.path)\n'),
('x = 2\n', 'x = 2\n'), ('print(x ** 32)\n', 'print(x ** 32)\n')]
>>> list(enumerate(open('script2.py')))
[(0, 'import sys\n'), (1, 'print(sys.path)\n'), (2, 'x = 2\n'),
(3, 'print(x ** 32)\n')]
>>> list(filter(bool, open('script2.py'))) # nonempty=True
['import sys\n', 'print(sys.path)\n', 'x = 2\n', 'print(x ** 32)\n']
>>> import functools, operator
>>> functools.reduce(operator.add, open('script2.py'))
'import sys\nprint(sys.path)\nx = 2\nprint(x ** 32)\n'
All of these are iteration tools, but they have unique roles. We met zip and enumerate
in the prior chapter; filter and reduce are in Chapter 19’s functional programming
domain, so we’ll defer their details for now; the point to notice here is their use of the
iteration protocol for files and other iterables.
We first saw the sorted function at work earlier in this book, and we used it for dictionaries
in Chapter 8. sorted is a built-in that employs the iteration protocol—it’s like the orig-
inal list sort method, but it returns the new sorted list as a result and runs on any iterable
object. Notice that, unlike map and others, sorted returns an actual list in Python 3.X
instead of an iterable.
Interestingly, the iteration protocol is even more pervasive in Python today than the
examples so far have demonstrated—essentially everything in Python’s built-in toolset
that scans an object from left to right is defined to use the iteration protocol on the
subject object. This even includes tools such as the list and tuple built-in functions
(which build new objects from iterables), and the string join method (which makes a
new string by putting a substring between strings contained in an iterable). Conse-
quently, these will also work on an open file and automatically read one line at a time:
>>> list(open('script2.py'))
['import sys\n', 'print(sys.path)\n', 'x = 2\n', 'print(x ** 32)\n']
>>> tuple(open('script2.py'))
('import sys\n', 'print(sys.path)\n', 'x = 2\n', 'print(x ** 32)\n')
>>> '&&'.join(open('script2.py'))
'import sys\n&&print(sys.path)\n&&x = 2\n&&print(x ** 32)\n'
Even some tools you might not expect fall into this category. For example, sequence
assignment, the in membership test, slice assignment, and the list’s extend method also
leverage the iteration protocol to scan, and thus read a file by lines automatically:
>>> a, b, c, d = open('script2.py') # Sequence assignment
>>> a, d
('import sys\n', 'print(x ** 32)\n')
>>> a, *b = open('script2.py') # 3.X extended form
>>> a, b
('import sys\n', ['print(sys.path)\n', 'x = 2\n', 'print(x ** 32)\n'])
>>> 'y = 2\n' in open('script2.py') # Membership test
False
>>> 'x = 2\n' in open('script2.py')
True
>>> L = [11, 22, 33, 44] # Slice assignment
>>> L[1:3] = open('script2.py')
>>> L
[11, 'import sys\n', 'print(sys.path)\n', 'x = 2\n', 'print(x ** 32)\n', 44]
>>> L = [11]
>>> L.extend(open('script2.py')) # list.extend method
>>> L
[11, 'import sys\n', 'print(sys.path)\n', 'x = 2\n', 'print(x ** 32)\n']
Per Chapter 8, extend iterates automatically, but append does not—use the latter (or
similar) to add an iterable to a list without iterating, with the potential to be iterated
across later:
>>> L = [11]
>>> L.append(open('script2.py')) # list.append does not iterate
>>> L
[11, <_io.TextIOWrapper name='script2.py' mode='r' encoding='cp1252'>]
>>> list(L[1])
['import sys\n', 'print(sys.path)\n', 'x = 2\n', 'print(x ** 32)\n']
Iteration is a broadly supported and powerful model. Earlier, we saw that the built-in
dict call accepts an iterable zip result, too (see Chapter 8 and Chapter 13). For that
matter, so does the set call, as well as the newer set and dictionary comprehension
expressions in Python 3.X and 2.7, which we met in Chapter 4, Chapter 5, and Chap-
ter 8:
>>> set(open('script2.py'))
{'print(x ** 32)\n', 'import sys\n', 'print(sys.path)\n', 'x = 2\n'}
>>> {line for line in open('script2.py')}
{'print(x ** 32)\n', 'import sys\n', 'print(sys.path)\n', 'x = 2\n'}
>>> {ix: line for ix, line in enumerate(open('script2.py'))}
{0: 'import sys\n', 1: 'print(sys.path)\n', 2: 'x = 2\n', 3: 'print(x ** 32)\n'}
In fact, both set and dictionary comprehensions support the extended syntax of list
comprehensions we met earlier in this chapter, including if tests:
>>> {line for line in open('script2.py') if line[0] == 'p'}
{'print(x ** 32)\n', 'print(sys.path)\n'}
>>> {ix: line for (ix, line) in enumerate(open('script2.py')) if line[0] == 'p'}
{1: 'print(sys.path)\n', 3: 'print(x ** 32)\n'}
Like the list comprehension, both of these scan the file line by line and pick out lines
that begin with the letter p. They also happen to build sets and dictionaries in the end,
but we get a lot of work “for free” by combining file iteration and comprehension
syntax. Later in the book we’ll meet a relative of comprehensions—generator expres-
sions—that deploys the same syntax and works on iterables too, but is also iterable
itself:
>>> list(line.upper() for line in open('script2.py')) # See Chapter 20
['IMPORT SYS\n', 'PRINT(SYS.PATH)\n', 'X = 2\n', 'PRINT(X ** 32)\n']
Other built-in functions support the iteration protocol as well, but frankly, some are
harder to cast in interesting examples related to files! For example, the sum call computes
the sum of all the numbers in any iterable; the any and all built-ins return True if any
or all items in an iterable are True, respectively; and max and min return the largest and
smallest item in an iterable, respectively. Like reduce, all of the tools in the following
examples accept any iterable as an argument and use the iteration protocol to scan it,
but return a single result:
>>> sum([3, 2, 4, 1, 5, 0]) # sum expects numbers only
15
>>> any(['spam', '', 'ni'])
True
>>> all(['spam', '', 'ni'])
False
>>> max([3, 2, 5, 1, 4])
5
>>> min([3, 2, 5, 1, 4])
1
Strictly speaking, the max and min functions can be applied to files as well—they auto-
matically use the iteration protocol to scan the file and pick out the lines with the highest
and lowest string values, respectively (though I’ll leave valid use cases to your imagi-
nation):
>>> max(open('script2.py')) # Line with max/min string value
'x = 2\n'
>>> min(open('script2.py'))
'import sys\n'
There’s one last iteration context that’s worth mentioning, although it’s mostly a pre-
view: in Chapter 18, we’ll learn that a special *arg form can be used in function calls
to unpack a collection of values into individual arguments. As you can probably predict
by now, this accepts any iterable, too, including files (see Chapter 18 for more details
on this call syntax; Chapter 20 for a section that extends this idea to generator expres-
sions; and Chapter 11 for tips on using the following’s 3.X print in 2.X as usual):
>>> def f(a, b, c, d): print(a, b, c, d, sep='&')
...
>>> f(1, 2, 3, 4)
1&2&3&4
>>> f(*[1, 2, 3, 4]) # Unpacks into arguments
1&2&3&4
>>>
>>> f(*open('script2.py')) # Iterates by lines too!
import sys
&print(sys.path)
&x = 2
&print(x ** 32)
In fact, because this argument-unpacking syntax in calls accepts iterables, it’s also pos-
sible to use the zip built-in to unzip zipped tuples, by making prior or nested zip results
arguments for another zip call (warning: you probably shouldn’t read the following
example if you plan to operate heavy machinery anytime soon!):
>>> X = (1, 2)
>>> Y = (3, 4)
>>>
>>> list(zip(X, Y)) # Zip tuples: returns an iterable
[(1, 3), (2, 4)]
>>>
>>> A, B = zip(*zip(X, Y)) # Unzip a zip!
>>> A
(1, 2)
>>> B
(3, 4)
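If the unzip trick seems opaque, expanding it step by step may help: the starred call simply passes each zipped pair as a separate argument to the second zip, which then pairs up their firsts and their seconds:

>>> pairs = list(zip(X, Y))        # [(1, 3), (2, 4)]
>>> list(zip(*pairs))              # Same as zip((1, 3), (2, 4))
[(1, 2), (3, 4)]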
Still other tools in Python, such as the range built-in and dictionary view objects,
return iterables instead of processing them. To see how these have been absorbed into
the iteration protocol in Python 3.X as well, we need to move on to the next section.
New Iterables in Python 3.X
One of the fundamental distinctions of Python 3.X is its stronger emphasis on iterators
than 2.X. This, along with its Unicode model and mandated new-style classes, is one
of 3.X’s most sweeping changes.
Specifically, in addition to the iterators associated with built-in types such as files and
dictionaries, the dictionary methods keys, values, and items return iterable objects in
Python 3.X, as do the built-in functions range, map, zip, and filter. As shown in the
prior section, the last three of these functions both return iterables and process them.
All of these tools produce results on demand in Python 3.X, instead of constructing
result lists as they do in 2.X.
Impacts on 2.X Code: Pros and Cons
Although this saves memory space, it can impact your coding styles in some contexts.
In various places in this book so far, for example, we’ve had to wrap up some function
and method call results in a list(...) call in order to force them to produce all their
results at once for display:
>>> zip('abc', 'xyz') # An iterable in Python 3.X (a list in 2.X)
<zip object at 0x000000000294C308>
>>> list(zip('abc', 'xyz')) # Force list of results in 3.X to display
[('a', 'x'), ('b', 'y'), ('c', 'z')]
A similar conversion is required if we wish to apply list or sequence operations to most
iterables that generate items on demand—to index, slice, or concatenate the iterable
itself, for example. The list results for these tools in 2.X support such operations di-
rectly:
>>> Z = zip((1, 2), (3, 4)) # Unlike 2.X lists, cannot index, etc.
>>> Z[0]
TypeError: 'zip' object is not subscriptable
As we’ll see in more detail in Chapter 20, conversion to lists may also be more subtly
required to support multiple iterations for newly iterable tools that support just one
scan such as map and zip—unlike their 2.X list forms, their values in 3.X are exhausted
after a single pass:
>>> M = map(lambda x: 2 ** x, range(3))
>>> for i in M: print(i)
...
1
2
4
>>> for i in M: print(i) # Unlike 2.X lists, one pass only (zip too)
...
>>>
Such conversion isn’t required in 2.X, because functions like zip return lists of results.
In 3.X, though, they return iterable objects, producing results on demand. This may
break 2.X code, and means extra typing is required to display the results at the inter-
active prompt (and possibly in some other contexts), but it’s an asset in larger programs
—delayed evaluation like this conserves memory and avoids pauses while large result
lists are computed. Let’s take a quick look at some of the new 3.X iterables in action.
The range Iterable
We studied the range built-in’s basic behavior in the preceding chapter. In 3.X, it returns
an iterable that generates numbers in the range on demand, instead of building the
result list in memory. This subsumes the older 2.X xrange (see the upcoming version
skew note), and you must use list(range(...)) to force an actual range list if one is
needed (e.g., to display results):
C:\code> c:\python33\python
>>> R = range(10) # range returns an iterable, not a list
>>> R
range(0, 10)
>>> I = iter(R) # Make an iterator from the range iterable
>>> next(I) # Advance to next result
0 # What happens in for loops, comprehensions, etc.
>>> next(I)
1
>>> next(I)
2
>>> list(range(10)) # To force a list if required
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Unlike the list returned by this call in 2.X, range objects in 3.X support only iteration,
indexing, and the len function. They do not support any other sequence operations
(use list(...) if you require more list tools):
>>> len(R) # range also does len and indexing, but no others
10
>>> R[0]
0
>>> R[-1]
9
>>> next(I) # Continue taking from iterator, where left off
3
>>> I.__next__() # .next() becomes .__next__(), but use new next()
4
Version skew note: As first mentioned in the preceding chapter, Python
2.X also has a built-in called xrange, which is like range but produces
items on demand instead of building a list of results in memory all at
once. Since this is exactly what the new iterator-based range does in
Python 3.X, xrange is no longer available in 3.X—it has been subsumed.
You may still both see and use it in 2.X code, though, especially since
range builds result lists there and so is not as efficient in its memory
usage.
As noted in the prior chapter, the file.xreadlines() method used to
minimize memory use in 2.X has been dropped in Python 3.X for similar
reasons, in favor of file iterators.
The map, zip, and filter Iterables
Like range, the map, zip, and filter built-ins also become iterables in 3.X to conserve
space, rather than producing a result list all at once in memory. All three not only
process iterables, as in 2.X, but also return iterable results in 3.X. Unlike range, though,
they are their own iterators—after you step through their results once, they are ex-
hausted. In other words, you can’t have multiple iterators on their results that maintain
different positions in those results.
Here is the case for the map built-in we met in the prior chapter. As with other iterables,
you can force a list with list(...) if you really need one, but the default behavior can
save substantial space in memory for large result sets:
>>> M = map(abs, (-1, 0, 1)) # map returns an iterable, not a list
>>> M
<map object at 0x00000000029B75C0>
>>> next(M) # Use iterator manually: exhausts results
1 # These do not support len() or indexing
>>> next(M)
0
>>> next(M)
1
>>> next(M)
StopIteration
>>> for x in M: print(x) # map iterator is now empty: one pass only
...
>>> M = map(abs, (-1, 0, 1)) # Make a new iterable/iterator to scan again
>>> for x in M: print(x) # Iteration contexts auto call next()
...
1
0
1
>>> list(map(abs, (-1, 0, 1))) # Can force a real list if needed
[1, 0, 1]
The zip built-in, introduced in the prior chapter, is an iteration context itself, but also
returns an iterable with an iterator that works the same way:
>>> Z = zip((1, 2, 3), (10, 20, 30)) # zip is the same: a one-pass iterator
>>> Z
<zip object at 0x0000000002951108>
>>> list(Z)
[(1, 10), (2, 20), (3, 30)]
>>> for pair in Z: print(pair) # Exhausted after one pass
...
>>> Z = zip((1, 2, 3), (10, 20, 30))
>>> for pair in Z: print(pair) # Iterator used automatically or manually
...
(1, 10)
(2, 20)
(3, 30)
>>> Z = zip((1, 2, 3), (10, 20, 30)) # Manual iteration (iter() not needed)
>>> next(Z)
(1, 10)
>>> next(Z)
(2, 20)
The filter built-in, which we met briefly in Chapter 12 and will study in the next part
of this book, is also analogous. It returns items in an iterable for which a passed-in
function returns True (as we’ve learned, in Python True includes nonempty objects, and
bool returns an object’s truth value):
>>> filter(bool, ['spam', '', 'ni'])
<filter object at 0x00000000029B7B70>
>>> list(filter(bool, ['spam', '', 'ni']))
['spam', 'ni']
Like most of the tools discussed in this section, filter both accepts an iterable to process
and returns an iterable to generate results in 3.X. It can also generally be emulated by
extended list comprehension syntax that automatically tests truth values:
>>> [x for x in ['spam', '', 'ni'] if bool(x)]
['spam', 'ni']
>>> [x for x in ['spam', '', 'ni'] if x]
['spam', 'ni']
Multiple Versus Single Pass Iterators
It’s important to see how the range object differs from the built-ins described in this
section—it supports len and indexing, it is not its own iterator (you make one with
iter when iterating manually), and it supports multiple iterators over its result that
remember their positions independently:
>>> R = range(3) # range allows multiple iterators
>>> next(R)
TypeError: range object is not an iterator
>>> I1 = iter(R)
>>> next(I1)
0
>>> next(I1)
1
>>> I2 = iter(R) # Two iterators on one range
>>> next(I2)
0
>>> next(I1) # I1 is at a different spot than I2
2
By contrast, in 3.X zip, map, and filter do not support multiple active iterators on the
same result; because of this the iter call is optional for stepping through such objects’
results—their iter is themselves (in 2.X these built-ins return multiple-scan lists so the
following does not apply):
>>> Z = zip((1, 2, 3), (10, 11, 12))
>>> I1 = iter(Z)
>>> I2 = iter(Z) # Two iterators on one zip
>>> next(I1)
(1, 10)
>>> next(I1)
(2, 11)
>>> next(I2) # (3.X) I2 is at same spot as I1!
(3, 12)
>>> M = map(abs, (-1, 0, 1)) # Ditto for map (and filter)
>>> I1 = iter(M); I2 = iter(M)
>>> print(next(I1), next(I1), next(I1))
1 0 1
>>> next(I2) # (3.X) Single scan is exhausted!
StopIteration
>>> R = range(3) # But range allows many iterators
>>> I1, I2 = iter(R), iter(R)
>>> [next(I1), next(I1), next(I1)]
[0, 1, 2]
>>> next(I2) # Multiple active scans, like 2.X lists
0
When we code our own iterable objects with classes later in the book (Chapter 30),
we’ll see that multiple iterators are usually supported by returning new objects for the
iter call; a single iterator generally means an object returns itself. In Chapter 20, we’ll
also find that generator functions and expressions behave like map and zip instead of
range in this regard, supporting just a single active iteration scan. In that chapter, we’ll
see some subtle implications of one-shot iterators in loops that attempt to scan multiple
times—code that formerly treated these as lists may fail without manual list conver-
sions.
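To preview that coding pattern, the following hypothetical classes sketch both flavors; their names and wrapped sequences are illustrative only:

class MultiScan:                          # Like range: new iterator per request
    def __init__(self, seq): self.seq = seq
    def __iter__(self): return iter(self.seq)

class SingleScan:                         # Like 3.X map/zip: one shared iterator
    def __init__(self, seq): self.it = iter(seq)
    def __iter__(self): return self.it

M = MultiScan([1, 2, 3])
I1, I2 = iter(M), iter(M)
print(next(I1), next(I2))                 # 1 1: independent positions

S = SingleScan([1, 2, 3])
I1, I2 = iter(S), iter(S)
print(next(I1), next(I2))                 # 1 2: a single shared scan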
Dictionary View Iterables
Finally, as we saw briefly in Chapter 8, in Python 3.X the dictionary keys, values, and
items methods return iterable view objects that generate result items one at a time,
instead of producing result lists all at once in memory. Views are also available in 2.7
as an option, but under special method names to avoid impacting existing code. View
items maintain the same physical ordering as that of the dictionary and reflect changes
made to the underlying dictionary. Now that we know more about iterables, here's the
rest of this story—in Python 3.3 (your key order may vary):
>>> D = dict(a=1, b=2, c=3)
>>> D
{'a': 1, 'b': 2, 'c': 3}
>>> K = D.keys() # A view object in 3.X, not a list
>>> K
dict_keys(['a', 'b', 'c'])
>>> next(K) # Views are not iterators themselves
TypeError: dict_keys object is not an iterator
>>> I = iter(K) # View iterables have an iterator,
>>> next(I) # which can be used manually,
'a' # but does not support len(), index
>>> next(I)
'b'
>>> for k in D.keys(): print(k, end=' ') # All iteration contexts use auto
...
a b c
As for all iterables that produce values on request, you can always force a 3.X dictionary
view to build a real list by passing it to the list built-in. However, this usually isn’t
required except to display results interactively or to apply list operations like indexing:
>>> K = D.keys()
>>> list(K) # Can still force a real list if needed
['a', 'b', 'c']
>>> V = D.values() # Ditto for values() and items() views
>>> V
dict_values([1, 2, 3])
>>> list(V) # Need list() to display or index as list
[1, 2, 3]
>>> V[0]
TypeError: 'dict_values' object does not support indexing
>>> list(V)[0]
1
>>> list(D.items())
[('a', 1), ('b', 2), ('c', 3)]
>>> for (k, v) in D.items(): print(k, v, end=' ')
...
a 1 b 2 c 3
In addition, 3.X dictionaries still are iterables themselves, with an iterator that returns
successive keys. Thus, it’s not often necessary to call keys directly in this context:
>>> D # Dictionaries still produce an iterator
{'a': 1, 'b': 2, 'c': 3} # Returns next key on each iteration
>>> I = iter(D)
>>> next(I)
'a'
>>> next(I)
'b'
>>> for key in D: print(key, end=' ') # Still no need to call keys() to iterate
... # But keys is an iterable in 3.X too!
a b c
Finally, remember again that because keys no longer returns a list, the traditional coding
pattern for scanning a dictionary by sorted keys won’t work in 3.X. Instead, convert
keys views first with a list call, or use the sorted call on either a keys view or the
dictionary itself, as follows. We saw this in Chapter 8, but it’s important enough to 2.X
programmers making the switch to demonstrate again:
>>> D
{'a': 1, 'b': 2, 'c': 3}
>>> for k in sorted(D.keys()): print(k, D[k], end=' ')
...
a 1 b 2 c 3
>>> for k in sorted(D): print(k, D[k], end=' ') # "Best practice" key sorting
...
a 1 b 2 c 3
Other Iteration Topics
As mentioned in this chapter’s introduction, there is more coverage of both list com-
prehensions and iterables in Chapter 20, in conjunction with functions, and again in
Chapter 30 when we study classes. As you’ll see later:
• User-defined functions can be turned into iterable generator functions, with
  yield statements.
• List comprehensions morph into iterable generator expressions when coded in
  parentheses.
• User-defined classes are made iterable with __iter__ or __getitem__ operator
  overloading.
In particular, user-defined iterables defined with classes allow arbitrary objects and
operations to be used in any of the iteration contexts we’ve met in this chapter. By
supporting just a single operation—iteration—objects may be used in a wide variety of
contexts and tools.
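As a tiny foretaste of the first two of these (Chapter 20 tells the full story), the following
sketch codes the same squares with a generator function and a generator expression;
both produce results on demand and support just one scan, like map:
>>> def squares(N):                          # Generator function: yield, not return
...     for i in range(N):
...         yield i ** 2
...
>>> list(squares(3))
[0, 1, 4]
>>> G = (i ** 2 for i in range(3))           # Generator expression: parentheses
>>> next(G), next(G)                         # One-shot scan, like map and zip
(0, 1)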
Chapter Summary
In this chapter, we explored concepts related to looping in Python. We took our first
substantial look at the iteration protocol in Python—a way for nonsequence objects to
take part in iteration loops—and at list comprehensions. As we saw, a list comprehen-
sion is an expression similar to a for loop that applies another expression to all the
items in any iterable object. Along the way, we also saw other built-in iteration tools
at work and studied recent iteration additions in Python 3.X.
This wraps up our tour of specific procedural statements and related tools. The next
chapter closes out this part of the book by discussing documentation options for Python
code. Though a bit of a diversion from the more detailed aspects of coding, documen-
tation is also part of the general syntax model, and it’s an important component of well-
written programs. In the next chapter, we’ll also dig into a set of exercises for this part
of the book before we turn our attention to larger structures such as functions. As usual,
though, let’s first exercise what we’ve learned here with a quiz.
Test Your Knowledge: Quiz
1. How are for loops and iterable objects related?
2. How are for loops and list comprehensions related?
3. Name four iteration contexts in the Python language.
4. What is the best way to read line by line from a text file today?
5. What sort of weapons would you expect to see employed by the Spanish Inquisi-
tion?
Test Your Knowledge: Answers
1. The for loop uses the iteration protocol to step through items in the iterable object
across which it is iterating. It first fetches an iterator from the iterable by passing
the object to iter, and then calls this iterator object’s __next__ method in 3.X on
each iteration and catches the StopIteration exception to determine when to stop
looping. The method is named next in 2.X, and is run by the next built-in function
in both 3.X and 2.X. Any object that supports this model works in a for loop and
in all other iteration contexts. For some objects that are their own iterator, the
initial iter call is extraneous but harmless.
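To make this concrete, here is roughly what a for loop runs under the hood in 3.X: a
manual sketch of the protocol, not how you would normally code it:
>>> L = [1, 2]
>>> I = iter(L)                              # Fetch an iterator from the iterable
>>> next(I)                                  # Runs I.__next__() in 3.X
1
>>> next(I)
2
>>> next(I)                                  # StopIteration ends a for loop
StopIteration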
2. Both are iteration tools and contexts. List comprehensions are a concise and often
efficient way to perform a common for loop task: collecting the results of applying
an expression to all items in an iterable object. It’s always possible to translate a
list comprehension to a for loop, and part of the list comprehension expression
looks like the header of a for loop syntactically.
3. Iteration contexts in Python include the for loop; list comprehensions; the map
built-in function; the in membership test expression; and the built-in functions
sorted, sum, any, and all. This category also includes the list and tuple built-ins,
string join methods, and sequence assignments, all of which use the iteration pro-
tocol (see answer #1) to step across iterable objects one item at a time.
4. The best way to read lines from a text file today is to not read it explicitly at all:
instead, open the file within an iteration context tool such as a for loop or list
comprehension, and let the iteration tool automatically scan one line at a time by
running the file’s next handler method on each iteration. This approach is generally
best in terms of coding simplicity, memory space, and possibly execution speed
requirements.
5. I’ll accept any of the following as correct answers: fear, intimidation, nice red uni-
forms, a comfy chair, and soft pillows.
CHAPTER 15
The Documentation Interlude
This part of the book concludes with a look at techniques and tools used for docu-
menting Python code. Although Python code is designed to be readable, a few well-
placed human-accessible comments can do much to help others understand the work-
ings of your programs. As we’ll see, Python includes both syntax and tools to make
documentation easier. In particular, the PyDoc system covered here can render a mod-
ule’s internal documentation as either plain text in a shell, or HTML in a web browser.
Although this is something of a tools-related concept, this topic is presented here partly
because it involves Python’s syntax model, and partly as a resource for readers strug-
gling to understand Python’s toolset. For the latter purpose, I’ll also expand here on
documentation pointers first given in Chapter 4. As usual, because this chapter closes
out its part, it also ends with some warnings about common pitfalls and a set of exercises
for this part of the text, in addition to its chapter quiz.
Python Documentation Sources
By this point in the book, you’re probably starting to realize that Python comes with
an amazing amount of prebuilt functionality—built-in functions and exceptions, pre-
defined object attributes and methods, standard library modules, and more. And we’ve
really only scratched the surface of each of these categories.
One of the first questions that bewildered beginners often ask is: how do I find infor-
mation on all the built-in tools? This section provides hints on the various documen-
tation sources available in Python. It also presents documentation strings (docstrings)
and the PyDoc system that makes use of them. These topics are somewhat peripheral
to the core language itself, but they become essential knowledge as soon as your code
reaches the level of the examples and exercises in this part of the book.
As summarized in Table 15-1, there are a variety of places to look for information on
Python, with generally increasing verbosity. Because documentation is such a crucial
tool in practical programming, we’ll explore each of these categories in the sections
that follow.
Table 15-1. Python documentation sources
Form Role
# comments In-file documentation
The dir function Lists of attributes available in objects
Docstrings: __doc__ In-file documentation attached to objects
PyDoc: the help function Interactive help for objects
PyDoc: HTML reports Module documentation in a browser
Sphinx third-party tool Richer documentation for larger projects
The standard manual set Official language and library descriptions
Web resources Online tutorials, examples, and so on
Published books Commercially polished reference texts
# Comments
As we’ve learned, hash-mark comments are the most basic way to document your code.
Python simply ignores all the text following a # (as long as it’s not inside a string literal),
so you can follow this character with any words and descriptions meaningful to pro-
grammers. Such comments are accessible only in your source files, though; to code
comments that are more widely available, you’ll need to use docstrings.
In fact, current best practice generally dictates that docstrings are best for larger func-
tional documentation (e.g., “my file does this”), and # comments are best limited to
smaller code documentation (e.g., “this strange expression does that”) and are best
limited in scope to a statement or small group of statements within a script or function.
More on docstrings in a moment; first, let’s see how to explore objects.
The dir Function
As we’ve also seen, the built-in dir function is an easy way to grab a list of all the
attributes available inside an object (i.e., its methods and simpler data items). It can be
called with no arguments to list variables in the caller’s scope. More usefully, it can also
be called on any object that has attributes, including imported modules and built-in
types, as well as the name of a data type. For example, to find out what’s available in
a module such as the standard library’s sys, import it and pass it to dir:
>>> import sys
>>> dir(sys)
['__displayhook__', ...more names omitted..., 'winver']
These results are from Python 3.3, and I’m omitting most returned names because they
vary slightly elsewhere; run this on your own for a better look. In fact, there are currently
78 attributes in sys, though we generally care only about the 69 that do not have leading
double underscores (two usually means interpreter-related) or the 62 that have no
leading underscore at all (one underscore usually means informal implementation pri-
vate)—a prime example of the preceding chapter’s list comprehension at work:
>>> len(dir(sys)) # Number names in sys
78
>>> len([x for x in dir(sys) if not x.startswith('__')]) # Non __X names only
69
>>> len([x for x in dir(sys) if not x[0] == '_']) # Non underscore names
62
To find out what attributes are provided in objects of built-in types, run dir on a literal
or an existing instance of the desired type. For example, to see list and string attributes,
you can pass empty objects:
>>> dir([])
['__add__', '__class__', '__contains__', ...more..., 'append', 'clear', 'copy',
'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
>>> dir('')
['__add__', '__class__', '__contains__', ...more..., 'split', 'splitlines',
'startswith','strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
The dir results for any built-in type include a set of attributes that are related to the
implementation of that type (technically, operator overloading methods); much as in
modules they all begin and end with double underscores to make them distinct, and
you can safely ignore them at this point in the book (they are used for OOP). For
instance, there are 45 list attributes, but only 11 that correspond to named methods:
>>> len(dir([])), len([x for x in dir([]) if not x.startswith('__')])
(45, 11)
>>> len(dir('')), len([x for x in dir('') if not x.startswith('__')])
(76, 44)
In fact, to filter out double-underscored items that are not of common program interest,
run the same list comprehensions but print the attributes. For instance, here are the
named attributes in lists and dictionaries in Python 3.3:
>>> [a for a in dir(list) if not a.startswith('__')]
['append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop',
'remove', 'reverse', 'sort']
>>> [a for a in dir(dict) if not a.startswith('__')]
['clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem',
'setdefault', 'update', 'values']
This may seem like a lot to type to get an attribute list, but beginning in the next chapter
we’ll learn how to wrap such code in an importable and reusable function so we don’t
need to type it again:
>>> def dir1(x): return [a for a in dir(x) if not a.startswith('__')] # See Part IV
...
>>> dir1(tuple)
['count', 'index']
Notice that you can list built-in type attributes by passing a type name to dir instead
of a literal:
>>> dir(str) == dir('') # Same result, type name or literal
True
>>> dir(list) == dir([])
True
This works because names like str and list that were once type converter functions
are actually names of types in Python today; calling one of these invokes its constructor
to generate an instance of that type. Part VI will have more to say about constructors
and operator overloading methods when we discuss classes.
The dir function serves as a sort of memory-jogger—it provides a list of attribute names,
but it does not tell you anything about what those names mean. For such extra infor-
mation, we need to move on to the next documentation source.
Some IDEs for Python work, including IDLE, have features that list at-
tributes on objects automatically within their GUIs, and can be viewed
as alternatives to dir. IDLE, for example, will list an object’s attributes
in a pop-up selection window when you type a period after the object’s
name and pause or press Tab. This is mostly meant as an autocomplete
feature, though, not an information source. Chapter 3 has more on
IDLE.
Docstrings: __doc__
Besides # comments, Python supports documentation that is automatically attached to
objects and retained at runtime for inspection. Syntactically, such comments are coded
as strings at the tops of module files and function and class statements, before any other
executable code (# comments, including Unix-style #! lines, are OK before them). Python
automatically stuffs the text of these strings, known informally as docstrings, into the
__doc__ attributes of the corresponding objects.
User-defined docstrings
For example, consider the following file, docstrings.py. Its docstrings appear at the
beginning of the file and at the start of a function and a class within it. Here, I’ve used
triple-quoted block strings for multiline comments in the file and the function, but any
sort of string will work; single- or double-quoted one-liners like those in the class are
fine, but don’t allow multiple-line text. We haven’t studied the def or class statements
in detail yet, so ignore everything about them here except the strings at their tops:
"""
Module documentation
Words Go Here
"""
446 | Chapter 15:The Documentation Interlude
www.it-ebooks.info
spam = 40
def square(x):
"""
function documentation
can we have your liver then?
"""
return x ** 2 # square
class Employee:
"class documentation"
pass
print(square(4))
print(square.__doc__)
The whole point of this documentation protocol is that your comments are retained
for inspection in __doc__ attributes after the file is imported. Thus, to display the doc-
strings associated with the module and its objects, we simply import the file and print
their __doc__ attributes, where Python has saved the text:
>>> import docstrings
16
function documentation
can we have your liver then?
>>> print(docstrings.__doc__)
Module documentation
Words Go Here
>>> print(docstrings.square.__doc__)
function documentation
can we have your liver then?
>>> print(docstrings.Employee.__doc__)
class documentation
Note that you will generally want to use print to print docstrings; otherwise, you’ll get
a single string with embedded \n newline characters.
You can also attach docstrings to methods of classes (covered in Part VI), but because
these are just def statements nested in class statements, they’re not a special case. To
fetch the docstring of a method function inside a class within a module, you would
simply extend the path to go through the class: module.class.method.__doc__ (we’ll see
an example of method docstrings in Chapter 29).
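For instance, if the docstrings.py module's Employee class had a method of its own
(say, a hypothetical giveRaise, which our file doesn't define), its docstring would be
fetched with the same sort of attribute path:
>>> import docstrings
>>> print(docstrings.Employee.__doc__)       # module.class.__doc__
class documentation
>>> # Hypothetical: print(docstrings.Employee.giveRaise.__doc__)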
Docstring standards and priorities
As mentioned earlier, common practice today recommends hash-mark comments for
only smaller-scale documentation about an expression, statement, or small group of
statements. Docstrings are better used for higher-level and broader functional docu-
mentation for a file, function, or class, and have become an expected part of Python
software. Beyond these guidelines, though, you still must decide what to write.
Although some companies have internal standards, there is no broad standard about
what should go into the text of a docstring. There have been various markup language
and template proposals (e.g., HTML or XML), but they don’t seem to have caught on
in the Python world. Frankly, convincing Python programmers to document their code
using handcoded HTML is probably not going to happen in our lifetimes. Markup may
be too much to ask, but that reluctance shouldn't extend to documenting code in general.
Documentation tends to have a lower priority among some programmers than it
should. Too often, if you get any comments in a file at all, you count yourself lucky
(and even better if it’s accurate and up to date). I strongly encourage you to document
your code liberally—it really is an important part of well-written programs. When you
do, though, there is presently no standard on the structure of docstrings; if you want
to use them, anything goes today. Just as for writing code itself, it’s up to you to create
documentation content and keep it up to date, but common sense is probably your
best ally on this task too.
Built-in docstrings
As it turns out, built-in modules and objects in Python use similar techniques to attach
documentation above and beyond the attribute lists returned by dir. For example, to
see an actual human-readable description of a built-in module, import it and print its
__doc__ string:
>>> import sys
>>> print(sys.__doc__)
This module provides access to some objects used or maintained by the
interpreter and to functions that interact strongly with the interpreter.
Dynamic objects:
argv -- command line arguments; argv[0] is the script pathname if known
path -- module search path; path[0] is the script directory, else ''
modules -- dictionary of loaded modules
...more text omitted...
Functions, classes, and methods within built-in modules have attached descriptions in
their __doc__ attributes as well:
>>> print(sys.getrefcount.__doc__)
getrefcount(object) -> integer
Return the reference count of object. The count returned is generally
one higher than you might expect, because it includes the (temporary)
reference as an argument to getrefcount().
You can also read about built-in functions via their docstrings:
>>> print(int.__doc__)
int(x[, base]) -> integer
Convert a string or number to an integer, if possible. A floating
point argument will be truncated towards zero (this does not include a
...more text omitted...
>>> print(map.__doc__)
map(func, *iterables) --> map object
Make an iterator that computes the function using arguments from
each of the iterables. Stops when the shortest iterable is exhausted.
You can get a wealth of information about built-in tools by inspecting their docstrings
this way, but you don’t have to—the help function, the topic of the next section, does
this automatically for you.
PyDoc: The help Function
The docstring technique proved to be so useful that Python eventually added a tool
that makes docstrings even easier to display. The standard PyDoc tool is Python code
that knows how to extract docstrings and associated structural information and format
them into nicely arranged reports of various types. Additional tools for extracting and
formatting docstrings are available in the open source domain (including tools that may
support structured text—search the Web for pointers), but Python ships with PyDoc
in its standard library.
There are a variety of ways to launch PyDoc, including command-line script options
that can save the resulting documentation for later viewing (described both ahead and
in the Python library manual). Perhaps the two most prominent PyDoc interfaces are
the built-in help function and the PyDoc GUI- and web-based HTML report interfaces.
We met the help function briefly in Chapter 4; it invokes PyDoc to generate a simple
textual report for any Python object. In this mode, help text looks much like a “man-
page” on Unix-like systems, and in fact pages the same way as a Unix “more” outside
GUIs like IDLE when there are multiple pages of text—press the space bar to move to
the next page, Enter to go to the next line, and Q to quit:
>>> import sys
>>> help(sys.getrefcount)
Help on built-in function getrefcount in module sys:
getrefcount(...)
getrefcount(object) -> integer
Return the reference count of object. The count returned is generally
one higher than you might expect, because it includes the (temporary)
reference as an argument to getrefcount().
Note that you do not have to import sys in order to call help, but you do have to import
sys to get help on sys this way; it expects an object reference to be passed in. In Pythons
3.3 and 2.7, you can get help for a module you have not imported by quoting the
module’s name as a string—for example, help('re'), help('email.message')—but
support for this and other modes may differ across Python versions.
For larger objects such as modules and classes, the help display is broken down into
multiple sections, the preambles of which are shown here. Run this interactively to see
the full report (I’m running this on 3.3):
>>> help(sys)
Help on built-in module sys:
NAME
sys
MODULE REFERENCE
http://docs.python.org/3.3/library/sys
...more omitted...
DESCRIPTION
This module provides access to some objects used or maintained by the
interpreter and to functions that interact strongly with the interpreter.
...more omitted...
FUNCTIONS
__displayhook__ = displayhook(...)
displayhook(object) -> None
...more omitted...
DATA
__stderr__ = <_io.TextIOWrapper name='<stderr>' mode='w' encoding='cp4...
__stdin__ = <_io.TextIOWrapper name='<stdin>' mode='r' encoding='cp437...
__stdout__ = <_io.TextIOWrapper name='<stdout>' mode='w' encoding='cp4...
...more omitted...
FILE
(built-in)
Some of the information in this report is docstrings, and some of it (e.g., function call
patterns) is structural information that PyDoc gleans automatically by inspecting ob-
jects’ internals, when available.
Besides modules, you can also use help on built-in functions, methods, and types. Usage
varies slightly across Python versions, but to get help for a built-in type, try either the
type name (e.g., dict for dictionary, str for string, list for list); an actual object of the
type (e.g., {}, '', []); or a method of an actual object or type name (e.g., str.join,
's'.join).1 You’ll get a large display that describes all the methods available for that
type or the usage of that method:
>>> help(dict)
Help on class dict in module builtins:
class dict(object)
| dict() -> new empty dictionary.
| dict(mapping) -> new dictionary initialized from a mapping object's
...more omitted...
>>> help(str.replace)
Help on method_descriptor:
replace(...)
S.replace(old, new[, count]) -> str
Return a copy of S with all occurrences of substring
...more omitted...
>>> help(''.replace)
...similar to prior result...
>>> help(ord)
Help on built-in function ord in module builtins:
ord(...)
ord(c) -> integer
Return the integer ordinal of a one-character string.

1. Note that asking for help on an actual string object directly (e.g., help('')) doesn't work in recent Pythons:
you usually get no help, because strings are interpreted specially—as a request for help on an unimported
module, for instance (see earlier). You must use the str type name in this context, though both other
types of actual objects (help([])) and string method names referenced through actual objects
(help(''.join)) work fine (at least in Python 3.3—this has been prone to change over time). There is also
an interactive help mode, which you start by typing just help().
Finally, the help function works just as well on your modules as it does on built-ins.
Here it is reporting on the docstrings.py file we coded earlier. Again, some of this is
docstrings, and some is information automatically extracted by inspecting objects’
structures:
>>> import docstrings
>>> help(docstrings.square)
Help on function square in module docstrings:
square(x)
function documentation
can we have your liver then?
>>> help(docstrings.Employee)
Help on class Employee in module docstrings:
class Employee(builtins.object)
| class documentation
|
...more omitted...
>>> help(docstrings)
Help on module docstrings:
NAME
docstrings
DESCRIPTION
Module documentation
Words Go Here
CLASSES
builtins.object
Employee
class Employee(builtins.object)
| class documentation
|
...more omitted...
FUNCTIONS
square(x)
function documentation
can we have your liver then?
DATA
spam = 40
FILE
c:\code\docstrings.py
PyDoc: HTML Reports
The text displays of the help function are adequate in many contexts, especially at the
interactive prompt. To readers who’ve grown accustomed to richer presentation me-
diums, though, they may seem a bit primitive. This section presents the HTML-based
flavor of PyDoc, which renders module documentation more graphically for viewing
in a web browser, and can even open one automatically for you. The way this is run
has changed as of Python 3.3:
• Prior to 3.3, Python ships with a simple GUI desktop client for submitting search
  requests. This client launches a web browser to view documentation produced by
  an automatically started local server.
• As of 3.3, the former GUI client is replaced by an all-browser interface scheme,
  which combines both search and display in a web page that communicates with
  an automatically started local server.
• Python 3.2 straddles this fence, supporting both the original GUI client scheme, as
  well as the newer all-browser mode mandated as of 3.3.
Because this book’s audience is both users of the latest-and-greatest as well as the
masses still using older tried-and-true Pythons, we’ll explore both schemes here. As we
do, keep in mind that the way these schemes differ pertains only to the top level of their
user interfaces. Their documentation displays are nearly identical, and under either
regime PyDoc can also be used to generate both text in a console, and HTML files for
later viewing in whatever manner you wish.
Python 3.2 and later: PyDoc’s all-browser mode
As of Python 3.3 the original GUI client mode of PyDoc, present in 2.X and earlier 3.X
releases, is no longer available. This mode is present through Python 3.2 with the
“Module Docs” Start button entry on Windows 7 and earlier, and via the pydoc -g
command line. This GUI mode was reportedly deprecated in 3.2, though you had to
look closely to notice—it works fine and without warning on 3.2 on my machine.
In 3.3, though, this mode goes away altogether, and is replaced with a pydoc -b com-
mand line, which instead spawns both a locally running documentation server, as well
as a web browser that functions as both search engine client and page display. The
browser is initially opened on a module index page with enhanced functionality. There
are additional ways to use PyDoc (e.g., to save the HTML page to a file for later viewing,
as described ahead), so this is a relatively minor operational change.
To launch the newer browser-only mode of PyDoc in Python 3.2 and later, a command
line like any of the following suffices; they all use the -m Python command-line argument
for convenience to locate PyDoc's module file on your module import search path. The
first assumes Python is on your system path; the second employs Python 3.3's new
Windows launcher; and the third gives the full path to your Python if the other two
schemes won't work. See Appendix A for more on -m, and Appendix B for coverage of
the Windows launcher.
c:\code> python -m pydoc -b
Server ready at http://localhost:62135/
Server commands: [b]rowser, [q]uit
server> q
Server stopped
c:\code> py -3 -m pydoc -b
Server ready at http://localhost:62144/
Server commands: [b]rowser, [q]uit
server> q
Server stopped
c:\code> C:\python33\python -m pydoc -b
Server ready at http://localhost:62153/
Server commands: [b]rowser, [q]uit
server> q
Server stopped
However you run this command line, the effect is to start PyDoc as a locally running
web server on a dedicated (but by default arbitrary unused) port, and pop up a web
browser to act as client, displaying a page giving links to documentation for all the
modules importable on your module search path (including the directory where PyDoc
is launched). PyDoc’s top-level web page interface is captured in Figure 15-1.
Besides the module index, PyDoc’s web page also includes input fields at the top to
request a specific module’s documentation page (Get) and search for related entries
(Search), which stand in for the prior interface’s GUI client fields. You can also click
on this page’s links to go to the Module Index (the start page), Topics (general Python
subjects), and Keywords (overviews of statements and some expressions).

Figure 15-1. The top-level index start page of the all-browser PyDoc HTML interface in Python 3.2
and later, which as of 3.3 replaces the former GUI client in earlier Pythons.
Notice that the index page in Figure 15-1 lists both modules and top-level scripts in the
current directory—the book’s C:\code, where PyDoc was started by the earlier com-
mand lines. PyDoc is mostly intended for documenting importable modules, but can
sometimes be used to show documentation for scripts too. A selected file must be
imported in order to render its documentation, and as we’ve learned, importing runs
a file’s code. Modules normally just define tools when run, so this is usually irrelevant.
If you ask for documentation for a top-level script file, though, the shell window where
you launched PyDoc serves as the script’s standard input and output for any user in-
teraction. The net effect is that the documentation page for a script will appear after it
runs, and after its printed output shows up in the shell window. This may work better
for some scripts than others, though; interactive input, for example, may interleave
oddly with PyDoc’s own server command prompts.
Once you get past the new start page in Figure 15-1, the documentation pages for
specific modules are essentially the same in both the newer all-browser mode and the
earlier GUI-client scheme, apart from the additional input fields at the top of page in
the former. For instance, Figure 15-2 shows the new documentation display pages—
opened on two user-defined modules we’ll be writing in the next part of this book, as
part of Chapter 21’s benchmarking case study. In either scheme, documentation pages
contain automatically created hyperlinks that allow you to click your way through the
documentation of related components in your application. For instance, you’ll find
links to open imported modules' pages too.

Figure 15-2. PyDoc's module display page in Python 3.2 and later with input fields at the top,
displaying two modules we will be coding in the next part of this book (Chapter 21).
Because of the similarity in their display pages, the next section on pre-3.2 PyDoc and
its screen shots largely apply after 3.2 too, so be sure to read ahead for additional notes
even if you’re using more recent Python. In effect, 3.3’s PyDoc simply cuts out the
pre-3.2 GUI client “middleman,” while retaining its browser and server.
PyDoc in Python 3.3 also still supports other former usage modes. For instance, pydoc
-p port can be used to set its PyDoc server port, and pydoc -w module still writes a
module’s HTML documentation to a file named module.html for later viewing. Only
the pydoc -g GUI client mode is removed and replaced by pydoc -b. You can also run
PyDoc to generate a plain-text form of the documentation (its Unix “manpage” flavor
shown earlier in this chapter)—the following command line is equivalent to the help
call at an interactive Python prompt:
c:\code> py -3 -m pydoc timeit # Command-line text help
c:\code> py -3
>>> help("timeit") # Interactive prompt text help
As an interactive system, your best bet is to take PyDoc’s web-based interface for a test
drive, so we’ll cut its usage details short here; see Python’s manuals for additional details
and command-line options. Also note that PyDoc’s server and browser functionality
come largely “for free” from tools that automate such utility in the portable modules
of Python’s standard library (e.g., webbrowser, http.server). Consult PyDoc’s Python
code in the standard library file pydoc.py for additional details and inspiration.
Changing PyDoc’s Colors
You won’t be able to tell in the paper version of this book, but if you have an ebook or
start PyDoc live, you’ll notice that it chooses colors that may or may not be to your
liking. Unfortunately, there presently is no easy way to customize PyDoc’s colors. They
are hardcoded deep in its source code, and can’t be passed in as arguments to functions
or command lines, or changed in configuration files or global variables in the PyDoc
module itself.
Except that, in an open source system, you can always change the code—PyDoc lives
in the file pydoc.py in Python’s standard library, which is directory C:\Python33\Lib on
Windows for Python 3.3. Its colors are hardcoded RGB value hex strings embedded
throughout its code. For instance, its string '#eeaa77' specifies two-digit hexadecimal (8-bit) values
for red, green, and blue levels (decimal 238, 170, and 119), yielding a shade of orange
for function banners. The string '#ee77aa' similarly renders the dark pinkish color used
in nine places, including class and index page banners.
Figure 15-2. PyDoc’s module display page in Python 3.2 and later with input fields at the top,
displaying two modules we will be coding in the next part of this book (Chapter 21).
To tailor, search for these color value strings and replace them with your preferences.
In IDLE, an Edit/Find for regular expression #\w{6} will locate color strings (this
matches six alphanumeric characters after a # per Python’s re module pattern syntax;
see the library manual for details).
To pick colors, in most programs with color selection dialogs you can map to and from
RGB values; the book’s examples include a GUI script setcolor.py that does the same.
In my copy of PyDoc, I replaced all #ee77aa with #008080 (teal) to banish the dark pink.
Replacing #ffc8d8 with #c0c0c0 (grey) does similar for the light pink background of
class docstrings.
Such surgery isn’t for the faint of heart—PyDoc’s file is currently 2,600 lines long—but
makes for a fair exercise in code maintenance. Be cautious when replacing colors like
#ffffff and #000000 (white and black), and be sure to make a backup copy of py-
doc.py first so you have a fallback. This file uses tools we haven’t yet met, but you can
safely ignore the rest of its code while you make your tactical changes.
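If you'd rather not hunt and replace by hand, a throwaway session like the following
swaps one color globally; an illustrative sketch only: adjust the path for your install,
and work on a backup copy as just suggested:
>>> path = r'C:\Python33\Lib\pydoc.py'       # Your install's copy (back it up!)
>>> text = open(path).read()
>>> text.count('#ee77aa')                    # The dark pink, used in nine places
9
>>> n = open(path, 'w').write(text.replace('#ee77aa', '#008080'))  # Teal swap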
Be sure to watch for PyDoc changes on the configurations front; this seems a prime
candidate for improvement. In fact, there already is an effort under way: issue 10716
on the Python developers’ list seeks to make PyDoc more user-customizable by chang-
ing it to support CSS style sheets. If successful, this may allow users to make color and
other display choices in external CSS files instead of PyDoc’s source code.
On the other hand, this is currently not planned to appear until Python 3.4, and will
require PyDoc’s users to also be proficient with CSS code—which unfortunately has a
nontrivial structure all its own that many people using Python may not understand well
enough to change. As I write this, for example, the proposed PyDoc CSS file is already
234 lines of code that probably won’t mean much to people not already familiar with
web development (and it hardly seems reasonable to ask them to learn a web develop-
ment tool just to tailor PyDoc!).
Today’s PyDoc in 3.3 already supports a CSS style sheet that offers some customization
options, but only half-heartedly, and ships with one that is empty. Until this is hashed
out, code changes seem the best option. In any event, CSS style sheets are well beyond
this Python book’s scope—see the Web for details, and check future Python release
notes for PyDoc developments.
Python 3.2 and earlier: GUI client
This section documents the original GUI client mode of PyDoc, for readers using 3.2
and earlier, and gives some additional PyDoc context in general. It builds on the basics
covered in the prior section, which aren’t repeated here, so be sure to at least scan the
prior section if you’re using an older Python.
As mentioned, through Python 3.2, PyDoc provides a top-level GUI interface—a simple
but portable Python/tkinter script for submitting requests—as well as a documentation
server. Requests in the client are routed to the server, which produces reports displayed
in a popped-up web browser. Apart from your having to submit search requests, this
process is largely automatic.
To start PyDoc in this mode, you generally first launch the search engine GUI captured
in Figure 15-3. You can start this either by selecting the Module Docs item in Python’s
Start button menu on Windows 7 and earlier, or by launching the pydoc.py script in
Python’s standard library directory with a -g command-line argument: it lives in Lib
on Windows, but you can use Python’s –m flag to avoid typing script paths here too:
c:\code> c:\python32\python -m pydoc -g # Explicit Python path
c:\code> py -3.2 -m pydoc -g # Windows 3.3+ launcher version
Enter the name of a module you’re interested in, and press the Enter key; PyDoc will
march down your module import search path (sys.path), looking for the requested
module and references to it.

Figure 15-3. The PyDoc top-level search engine GUI client in 3.2 and earlier: type the name of a
module you want documentation for, press Enter, select the module, and then press "go to selected"
(or omit the module name and press "open browser" to see all available modules).
Once you’ve found a promising entry, select it and click “go to selected.” PyDoc will
spawn a web browser on your machine to display the report rendered in HTML format.
Figure 15-4 shows the information PyDoc displays for the built-in glob module. Notice
the hyperlinks in the Modules section of this page—you can click these to jump to the
PyDoc pages for related (imported) modules. For larger pages, PyDoc also generates
hyperlinks to sections within the page.
Like the help function interface, the GUI interface works on user-defined modules as
well as built-ins. Figure 15-5 shows the page generated for our docstrings.py module
file coded earlier.
Make sure that the directory containing your module is on your module import search
path—as mentioned, PyDoc must be able to import a file to render its documentation.
This includes the current working directory—PyDoc might not check the directory it
was launched from (which is probably meaningless when started from the Windows
Start button anyhow), so you may need to extend your PYTHONPATH setting to get this to
work. On Pythons 3.2 and 2.7, I had to add “.” to my PYTHONPATH to get PyDoc’s GUI
client mode to look in the directory it was started from by command line:
c:\code> set PYTHONPATH=.;%PYTHONPATH%
c:\code> py -3.2 -m pydoc -g
This setting was also required to see the current directory for the new all-browser pydoc
-b mode in 3.2. However, Python 3.3 automatically includes “.” in its index list, so no
path setting is required to view files in the directory where PyDoc is started—a minor
but noteworthy improvement.
PyDoc can be customized and launched in various ways we won’t cover here; see its
entry in Python’s standard library manual for more details. The main thing to take away
from this section is that PyDoc essentially gives you implementation reports “for free”
—if you are good about using docstrings in your files, PyDoc does all the work of
collecting and formatting them for display. PyDoc helps only for objects like functions
and modules, but it provides an easy way to access a middle level of documentation for
such tools—its reports are more useful than raw attribute lists, and less exhaustive than
the standard manuals.
Figure 15-4. When you find a module in the Figure 15-3 GUI (such as this built-in standard library
module) and press “go to selected,” the module’s documentation is rendered in HTML and displayed
in a web browser window like this one.
PyDoc can also be run to save the HTML documentation for a module in a file for later
viewing or printing; see the preceding section for pointers. Also, note that PyDoc might
not work well if run on scripts that read from standard input—PyDoc imports the target
module to inspect its contents, and there may be no connection for standard input text
when it is run in GUI mode, especially if run from the Windows Start button. Modules
that can be imported without immediate input requirements will always work under
PyDoc, though. See also the preceding section’s notes regarding scripts in PyDoc’s -b
mode in 3.2 and later; launching PyDoc’s GUI mode by command line works the same
—you interact in the launch window.
Figure 15-5. PyDoc can serve up documentation pages for both built-in and user-coded modules on
the module search path. Here is the page for a user-defined module, showing all its documentation
strings (docstrings) extracted from the source file.
PyDoc GUI client trick of the day: If you press the “open browser” button
in Figure 15-3’s window, PyDoc will produce an index page containing
a hyperlink to every module you can possibly import on your computer.
This includes Python standard library modules, modules of installed
third-party extensions, user-defined modules on your import search
path, and even statically or dynamically linked-in C-coded modules.
Such information is hard to come by otherwise without writing code
that inspects all module sources. On Python 3.2, you’ll want to do this
immediately after the GUI opens, as it may not fully work after searches.
Also note that in PyDoc's all-browser -b interface in 3.2 and later, you
get the same index functionality on its top-level start page of Figure 15-1.
Beyond docstrings: Sphinx
If you’re looking for a way to document your Python system in a more sophisticated
way, you may wish to check out Sphinx (currently at http://sphinx-doc.org). Sphinx is
used by the standard Python documentation described in the next section, and many
other projects. It uses simple reStructuredText as its markup language, and inherits
much from the Docutils suite of reStructuredText parsing and translating tools.
Among other things, Sphinx supports a variety of output formats (HTML including
Windows HTML Help, LaTeX for printable PDF versions, manual pages, and plain
text); extensive and automatic cross-references; hierarchical structure with automatic
links to relatives; automatic indexes; automatic code highlighting using Pygments (itself
a notable Python tool); and more. This is probably overkill for smaller programs where
docstrings and PyDoc may suffice, but can yield professional-grade documentation for
large projects. See the Web for more details on Sphinx and its related tools.
The Standard Manual Set
For the complete and most up-to-date description of the language and its toolset,
Python’s standard manuals stand ready to serve. Python’s manuals ship in HTML and
other formats, and they are installed with the Python system on Windows—they are
available in your Start button’s menu for Python on Windows 7 and earlier, and they
can also be opened from the Help menu within IDLE. You can also fetch the manual
set separately from http://www.python.org in a variety of formats, or read it online at
that site (follow the Documentation link). On Windows, the manuals are a compiled
help file to support searches, and the online versions at the Python website include a
web-based search page.
When opened, the Windows format of the manuals displays a root page like that in
Figure 15-6, showing the local copy on Windows. The two most important entries here
are most likely the Library Reference (which documents built-in types, functions, ex-
ceptions, and standard library modules) and the Language Reference (which provides
a formal description of language-level details). The tutorial listed on this page also
provides a brief introduction for newcomers, which you're probably already beyond.

Figure 15-6. Python's standard manual set, available online at http://www.python.org, from IDLE's
Help menu, and in the Windows 7 and earlier Start button menu. It's a searchable help file on
Windows, and there is a search engine for the online version. Of these, the Library Reference is the
one you'll want to use most of the time.
Of notable interest, the What’s New documents in this standard manual set chronicle
Python changes made in each release beginning with Python 2.0, which came out in
late 2000—useful for those porting older Python code, or older Python skills. These
documents are especially useful for uncovering additional details on the differences in
the Python 2.X and 3.X language lines covered in this book, as well as in their standard
libraries.
Web Resources
At the official Python website (http://www.python.org), you’ll find links to various
Python resources, some of which cover special topics or domains. Click the Documen-
tation link to access an online tutorial and the Beginners Guide to Python. The site also
lists non-English Python resources, and introductions scaled to different target audi-
ences.
Today you will also find numerous Python wikis, blogs, websites, and a host of other
resources on the Web at large. To sample the online community, try searching for a
term like “Python programming” in Google, or search on any topic of interest; chances
are good you’ll find ample material to browse.
Published Books
As a final resource, you can choose from a collection of professionally edited and pub-
lished reference books for Python. Bear in mind that books tend to lag behind the
cutting edge of Python changes, partly because of the work involved in writing, and
partly because of the natural delays built into the publishing cycle. Usually, by the time
a book comes out, it’s three or more months behind the current Python state (trust me
on that—my books have a nasty habit of falling out of date in minor ways between the
time I write them and the time they hit the shelves!). Unlike standard manuals, books
are also generally not free.
Still, for many, the convenience and quality of a professionally published text is worth
the cost. Moreover, Python changes so slowly that books are usually still relevant years
after they are published, especially if their authors post updates on the Web. See the
preface for pointers to other Python books.
Common Coding Gotchas
Before the programming exercises for this part of the book, let’s run through some of
the most common mistakes beginners make when coding Python statements and pro-
grams. Many of these are warnings I’ve thrown out earlier in this part of the book,
collected here for ease of reference. You’ll learn to avoid these pitfalls once you’ve
gained a bit of Python coding experience, but a few words now might help you avoid
falling into some of these traps initially:
Don’t forget the colons. Always remember to type a : at the end of compound
statement headers—the first line of an if, while, for, etc. You’ll probably forget at
first (I did, and so have most of my roughly 4,000 Python students over the years),
but you can take some comfort from the fact that it will soon become an uncon-
scious habit.
Start in column 1. Be sure to start top-level (unnested) code in column 1. That
includes unnested code typed into module files, as well as unnested code typed at
the interactive prompt.
Blank lines matter at the interactive prompt. Blank lines in compound state-
ments are always irrelevant and ignored in module files, but when you’re typing
code at the interactive prompt, they end the statement. In other words, blank lines
tell the interactive command line that you’ve finished a compound statement; if
you want to continue, don’t hit the Enter key at the ... prompt (or in IDLE) until
you’re really done. This also means you can’t paste multiline code at this prompt;
it must run one full statement at a time.
Indent consistently. Avoid mixing tabs and spaces in the indentation of a block,
unless you know what your text editor does with tabs. Otherwise, what you see in
your editor may not be what Python sees when it counts tabs as a number of spaces.
This is true in any block-structured language, not just Python—if the next pro-
grammer has tabs set differently, it will be difficult or impossible to understand the
structure of your code. It’s safer to use all tabs or all spaces for each block.
Don’t code C in Python. A reminder for C/C++ programmers: you don’t need to
type parentheses around tests in if and while headers (e.g., if (X==1):). You can,
if you like (any expression can be enclosed in parentheses), but they are fully su-
perfluous in this context. Also, do not terminate all your statements with semico-
lons; it’s technically legal to do this in Python as well, but it’s totally useless unless
you’re placing more than one statement on a single line (the end of a line normally
terminates a statement). And remember, don’t embed assignment statements in
while loop tests, and don’t use {} around blocks (indent your nested code blocks
consistently instead).
Use simple for loops instead of while or range. Another reminder: a simple
for loop (e.g., for x in seq:) is almost always simpler to code and often quicker
to run than a while- or range-based counter loop. Because Python handles indexing
internally for a simple for, it can sometimes be faster than the equivalent while,
though this can vary per code and Python. For code simplicity alone, though, avoid
the temptation to count things in Python!
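For instance, both of the following loops visit every item, but the first says what it
means and lets Python manage the indexing:
>>> seq = 'spam'
>>> for x in seq: print(x, end=' ')          # Simple: no counter to manage
...
s p a m
>>> i = 0
>>> while i < len(seq):                      # Counter loop: more code to get wrong
...     print(seq[i], end=' '); i += 1
...
s p a m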
Beware of mutables in assignments. I mentioned this in Chapter 11: you need
to be careful about using mutables in a multiple-target assignment (a = b = []),
as well as in an augmented assignment (a += [1, 2]). In both cases, in-place
changes may impact other variables. See Chapter 11 for details if you’ve forgotten
why this is true.
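A quick refresher on the shared-reference trap described here:
>>> a = b = []                               # One list object, two names
>>> b.append(42)
>>> a, b                                     # In-place change shows up in both
([42], [42])
>>> a = a + [1]                              # Concatenation builds a new object
>>> a, b
([42, 1], [42])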
Don’t expect results from functions that change objects in place. We en-
countered this one earlier, too: in-place change operations like the list.append and
list.sort methods introduced in Chapter 8 do not return values (other than
None), so you should call them without assigning the result. It’s not uncommon for
beginners to say something like mylist = mylist.append(X) to try to get the result
of an append, but what this actually does is assign mylist to None, not to the modified
list (in fact, you’ll lose your reference to the list altogether).
A more devious example of this pops up in Python 2.X code when trying to step
through dictionary items in a sorted fashion. It’s fairly common to see code like
for k in D.keys().sort():. This almost works—the keys method builds a keys
list, and the sort method orders it—but because the sort method returns None, the
loop fails because it is ultimately a loop over None (a nonsequence). This fails even
sooner in Python 3.X, because dictionary keys are views, not lists! To code this
correctly, either use the newer sorted built-in function, which returns the sorted
list, or split the method calls out to statements: Ks = list(D.keys()), then
Ks.sort(), and finally, for k in Ks:. This, by the way, is one case where you may
still want to call the keys method explicitly for looping, instead of relying on the
dictionary iterators—iterators do not sort.
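To illustrate both the trap and the recommended fix in live code:
>>> mylist = [3, 1, 2]
>>> mylist = mylist.sort()                   # The trap: sort returns None
>>> print(mylist)
None
>>> D = dict(b=2, a=1, c=3)
>>> for k in sorted(D): print(k, D[k], end=' ')   # The fix: sorted returns a list
...
a 1 b 2 c 3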
Always use parentheses to call a function. You must add parentheses after a
function name to call it, whether it takes arguments or not (e.g., use function(),
not function). In the next part of this book, we’ll learn that functions are simply
objects that have a special operation—a call that you trigger with the parentheses.
They can be referenced like any other object without triggering a call.
In classes, this problem seems to occur most often with files; it’s common to see
beginners type file.close to close a file, rather than file.close(). Because it’s
legal to reference a function without calling it, the first version with no parentheses
succeeds silently, but it does not close the file!
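The file case in live code, using a hypothetical data.txt; the address in the reply will
vary:
>>> f = open('data.txt', 'w')
>>> f.close                                  # A reference only: file still open!
<built-in method close of _io.TextIOWrapper object at 0x029E7B28>
>>> f.close()                                # Parentheses make the call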
Don’t use extensions or paths in imports and reloads. Omit directory paths
and file extensions in import statements—say import mod, not import mod.py. We
discussed module basics in Chapter 3 and will continue studying modules in
Part V. Because modules may have other extensions besides .py (.pyc, for instance),
hardcoding a particular extension is not only illegal syntax, it doesn’t make sense.
Python picks an extension automatically, and any platform-specific directory path
syntax comes from module search path settings, not the import statement.
And other pitfalls in other parts. Be sure to also see the built-in type warnings
at the end of the prior part, as they may qualify as coding issues too. There are
additional “gotchas” that crop up commonly in Python coding—losing a built-in
function by reassigning its name, hiding a library module by using its name for one
of your own, changing mutable argument defaults, and so on—but we don’t have
enough background to cover them yet. To learn more about both what you should
and shouldn’t do in Python, you’ll have to read on; later parts extend the set of
“gotchas” and fixes we’ve added to here.
Chapter Summary
This chapter took us on a tour of program documentation—both documentation we
write ourselves for our own programs, and documentation available for tools we use.
We met docstrings, explored the online and manual resources for Python reference,
and learned how PyDoc’s help function and web page interfaces provide extra sources
of documentation. Because this is the last chapter in this part of the book, we also
reviewed common coding mistakes to help you avoid them.
In the next part of this book, we’ll start applying what we already know to larger pro-
gram constructs. Specifically, the next part takes up the topic of functions—a tool used
to group statements for reuse. Before moving on, however, be sure to work through
the set of lab exercises for this part of the book that appear at the end of this chapter.
And even before that, let’s run through this chapter’s quiz.
Test Your Knowledge: Quiz
1. When should you use documentation strings instead of hash-mark comments?
2. Name three ways you can view documentation strings.
3. How can you obtain a list of the available attributes in an object?
4. How can you get a list of all available modules on your computer?
5. Which Python book should you purchase after this one?
Test Your Knowledge: Answers
1. Documentation strings (docstrings) are considered best for larger, functional doc-
umentation, describing the use of modules, functions, classes, and methods in your
code. Hash-mark comments are today best limited to smaller-scale documentation
about arcane expressions or statements at strategic points on your code. This is
partly because docstrings are easier to find in a source file, but also because they
can be extracted and displayed by the PyDoc system.
2. You can see docstrings by printing an object’s __doc__ attribute, by passing it to
PyDoc’s help function, and by selecting modules in PyDoc’s HTML-based user
interfaces—either the -g GUI client mode in Python 3.2 and earlier, or the -b all-
browser mode in Python 3.2 and later (and required as of 3.3). Both run a client/
server system that displays documentation in a popped-up web browser. PyDoc
can also be run to save a module’s documentation in an HTML file for later viewing
or printing.
3. The built-in dir(X) function returns a list of all the attributes attached to any object.
A list comprehension of the form [a for a in dir(X) if not a.startswith('__')]
can be used to filter out internal names with underscores (we'll learn
how to wrap this in a function in the next part of the book to make it easier to use).
4. In Python 3.2 and earlier, you can run the PyDoc GUI interface, and select “open
browser”; this opens a web page containing a link to every module available to
your programs. This GUI mode no longer works as of Python 3.3. In Python 3.2
and later, you get the same functionality by running PyDoc’s newer all-browser
mode with a -b command-line switch; the top-level start page displayed in a web
browser in this newer mode has the same index page listing all available modules.
5. Mine, of course. (Seriously, there are hundreds today; the preface lists a few rec-
ommended follow-up books, both for reference and for application tutorials, and
you should browse for books that fit your needs.)
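As a runnable footnote to answers 3 and 4, here’s a small sketch of the dir filtering
idiom; sys serves only as a convenient example object, and your results will vary by
Python version:
import sys                                               # Any object can be inspected
print(len(dir(sys)))                                     # All attributes, internals included
print([a for a in dir(sys) if not a.startswith('__')])   # Underscore internals filtered out
And from a system shell, PyDoc’s two interfaces are typically launched like this (the
directory prompt is illustrative):
c:\code> python -m pydoc -b          # 3.2 and later: all-browser mode
c:\code> python -m pydoc -g          # 3.2 and earlier: GUI client (removed in 3.3)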
Test Your Knowledge: Part III Exercises
Now that you know how to code basic program logic, the following exercises will ask
you to implement some simple tasks with statements. Most of the work is in exercise
4, which lets you explore coding alternatives. There are always many ways to arrange
statements, and part of learning Python is learning which arrangements work better
than others. You’ll eventually gravitate naturally toward what experienced Python
programmers call “best practice,” but best practice takes practice.
See Part III in Appendix D for the solutions.
1. Coding basic loops. This exercise asks you to experiment with for loops.
a. Write a for loop that prints the ASCII code of each character in a string named
S. Use the built-in function ord(character) to convert each character to an
ASCII integer. This function technically returns a Unicode code point in
Python 3.X, but if you restrict its content to ASCII characters, you’ll get back
ASCII codes. (Test it interactively to see how it works.)
b. Next, change your loop to compute the sum of the ASCII codes of all the
characters in a string.
c. Finally, modify your code again to return a new list that contains the ASCII
codes of each character in the string. Does the expression map(ord, S) have a
similar effect? How about [ord(c) for c in S]? Why? (Hint: see Chapter 14.)
2. Backslash characters. What happens on your machine when you type the following
code interactively?
for i in range(50):
    print('hello %d\n\a' % i)
Beware that if it’s run outside of the IDLE interface this example may beep at you,
so you may not want to run it in a crowded room! IDLE prints odd characters
instead of beeping—spoiling much of the joke (see the backslash escape characters
in Table 7-2).
3. Sorting dictionaries. In Chapter 8, we saw that dictionaries are unordered collec-
tions. Write a for loop that prints a dictionary’s items in sorted (ascending) order.
(Hint: use the dictionary keys and list sort methods, or the newer sorted built-in
function.)
4. Program logic alternatives. Consider the following code, which uses a while loop
and found flag to search a list of powers of 2 for the value of 2 raised to the fifth
power (32). It’s stored in a module file called power.py.
L = [1, 2, 4, 8, 16, 32, 64]
X = 5

found = False
i = 0
while not found and i < len(L):
    if 2 ** X == L[i]:
        found = True
    else:
        i = i+1

if found:
    print('at index', i)
else:
    print(X, 'not found')
C:\book\tests> python power.py
at index 5
As is, the example doesn’t follow normal Python coding techniques. Follow the
steps outlined here to improve it (for all the transformations, you may either type
your code interactively or store it in a script file run from the system command line
—using a file makes this exercise much easier):
a. First, rewrite this code with a while loop else clause to eliminate the found flag
and final if statement.
b. Next, rewrite the example to use a for loop with an else clause, to eliminate
the explicit list-indexing logic. (Hint: to get the index of an item, use the list
index method—L.index(X) returns the offset of the first X in list L.)
c. Next, remove the loop completely by rewriting the example with a simple in
operator membership expression. (See Chapter 8 for more details, or type this
to test: 2 in [1,2,3].)
d. Finally, use a for loop and the list append method to generate the powers-of-2
list (L) instead of hardcoding a list literal.
Deeper thoughts:
e. Do you think it would improve performance to move the 2 ** X expression
outside the loops? How would you code that?
f. As we saw in exercise 1, Python includes a map(function, list) tool that can
generate a powers-of-2 list, too: map(lambda x: 2 ** x, range(7)). Try typing
this code interactively; we’ll meet lambda more formally in the next part of this
book, especially in Chapter 19. Would a list comprehension help here (see
Chapter 14)?
5. Code maintenance. If you haven’t already done so, experiment with making the
code changes suggested in this chapter’s sidebar “Changing PyDoc’s Col-
ors” on page 456. Much of the work of real software development is in changing
existing code, so the sooner you begin doing so, the better. For reference, my edited
copy of PyDoc is in the book’s examples package, named mypydoc.py; to see how
it differs, you can run a file compare (fc on Windows) with the original pydoc.py
in 3.3 (also included, lest it change radically in 3.4 as the sidebar describes). If
PyDoc is more easily customized by the time you read these words, customize
colors per its current convention instead; if this involves changing a CSS file, let’s
hope the procedure will be well documented in Python’s manuals.
PART IV
Functions and Generators
CHAPTER 16
Function Basics
In Part III, we studied basic procedural statements in Python. Here, we’ll move on to
explore a set of additional statements and expressions that we can use to create func-
tions of our own.
In simple terms, a function is a device that groups a set of statements so they can be run
more than once in a program—a packaged procedure invoked by name. Functions also
can compute a result value and let us specify parameters that serve as function inputs
and may differ each time the code is run. Coding an operation as a function makes it
a generally useful tool, which we can use in a variety of contexts.
More fundamentally, functions are the alternative to programming by cutting and past-
ing—rather than having multiple redundant copies of an operation’s code, we can fac-
tor it into a single function. In so doing, we reduce our future work radically: if the
operation must be changed later, we have only one copy to update in the function, not
many scattered throughout the program.
Functions are also the most basic program structure Python provides for maximizing
code reuse, and lead us to the larger notions of program design. As we’ll see, functions
let us split complex systems into manageable parts. By implementing each part as a
function, we make it both reusable and easier to code.
Table 16-1 previews the primary function-related tools we’ll study in this part of the
book—a set that includes call expressions, two ways to make functions (def and
lambda), two ways to manage scope visibility (global and nonlocal), and two ways to
send results back to callers (return and yield).
Table 16-1. Function-related statements and expressions

Statement or expression    Examples

Call expressions           myfunc('spam', 'eggs', meat=ham, *rest)

def                        def printer(message):
                               print('Hello ' + message)

return                     def adder(a, b=1, *c):
                               return a + b + c[0]

global                     x = 'old'
                           def changer():
                               global x; x = 'new'

nonlocal (3.X)             def outer():
                               x = 'old'
                               def changer():
                                   nonlocal x; x = 'new'

yield                      def squares(x):
                               for i in range(x): yield i ** 2

lambda                     funcs = [lambda x: x**2, lambda x: x**3]
Why Use Functions?
Before we get into the details, let’s establish a clear picture of what functions are all
about. Functions are a nearly universal program-structuring device. You may have
come across them before in other languages, where they may have been called subrou-
tines or procedures. As a brief introduction, functions serve two primary development
roles:
Maximizing code reuse and minimizing redundancy
As in most programming languages, Python functions are the simplest way to
package logic you may wish to use in more than one place and more than one time.
Up until now, all the code we’ve been writing has run immediately. Functions allow
us to group and generalize code to be used arbitrarily many times later. Because
they allow us to code an operation in a single place and use it in many places,
Python functions are the most basic factoring tool in the language: they allow us
to reduce code redundancy in our programs, and thereby reduce maintenance ef-
fort.
Procedural decomposition
Functions also provide a tool for splitting systems into pieces that have well-defined
roles. For instance, to make a pizza from scratch, you would start by mixing the
dough, rolling it out, adding toppings, baking it, and so on. If you were program-
ming a pizza-making robot, functions would help you divide the overall “make
pizza” task into chunks—one function for each subtask in the process. It’s easier
to implement the smaller tasks in isolation than it is to implement the entire process
at once. In general, functions are about procedure—how to do something, rather
than what you’re doing it to. We’ll see why this distinction matters in Part VI, when
we start making new objects with classes.
In this part of the book, we’ll explore the tools used to code functions in Python: func-
tion basics, scope rules, and argument passing, along with a few related concepts such
as generators and functional tools. Because its importance begins to become more ap-
parent at this level of coding, we’ll also revisit the notion of polymorphism, which was
introduced earlier in the book. As you’ll see, functions don’t imply much new syntax,
but they do lead us to some bigger programming ideas.
Coding Functions
Although it wasn’t made very formal, we’ve already used some functions in earlier
chapters. For instance, to make a file object, we called the built-in open function; sim-
ilarly, we used the len built-in function to ask for the number of items in a collection
object.
In this chapter, we will explore how to write new functions in Python. Functions we
write behave the same way as the built-ins we’ve already seen: they are called in ex-
pressions, are passed values, and return results. But writing new functions requires the
application of a few additional ideas that haven’t yet been introduced. Moreover, func-
tions behave very differently in Python than they do in compiled languages like C. Here
is a brief introduction to the main concepts behind Python functions, all of which we
will study in this part of the book:
def is executable code. Python functions are written with a new statement, the
def. Unlike functions in compiled languages such as C, def is an executable state-
ment—your function does not exist until Python reaches and runs the def. In fact,
it’s legal (and even occasionally useful) to nest def statements inside if statements,
while loops, and even other defs. In typical operation, def statements are coded in
module files and are naturally run to generate functions when the module file they
reside in is first imported.
def creates an object and assigns it to a name. When Python reaches and runs
a def statement, it generates a new function object and assigns it to the function’s
name. As with all assignments, the function name becomes a reference to the func-
tion object. There’s nothing magic about the name of a function—as you’ll see,
the function object can be assigned to other names, stored in a list, and so on.
Function objects may also have arbitrary user-defined attributes attached to them
to record data.
lambda creates an object but returns it as a result. Functions may also be created
with the lambda expression, a feature that allows us to in-line function definitions
in places where a def statement won’t work syntactically. This is a more advanced
concept that we’ll defer until Chapter 19.
return sends a result object back to the caller. When a function is called, the
caller stops until the function finishes its work and returns control to the caller.
Functions that compute a value send it back to the caller with a return statement;
the returned value becomes the result of the function call. A return without a value
simply returns to the caller (and sends back None, the default result).
yield sends a result object back to the caller, but remembers where it left
off. Functions known as generators may also use the yield statement to send back
a value and suspend their state such that they may be resumed later, to produce a
series of results over time. This is another advanced topic covered later in this part
of the book.
global declares module-level variables that are to be assigned. By default, all
names assigned in a function are local to that function and exist only while the
function runs. To assign a name in the enclosing module, functions need to list it
in a global statement. More generally, names are always looked up in scopes—
places where variables are stored—and assignments bind names to scopes.
nonlocal declares enclosing function variables that are to be assigned. Simi-
larly, the nonlocal statement added in Python 3.X allows a function to assign a
name that exists in the scope of a syntactically enclosing def statement. This allows
enclosing functions to serve as a place to retain state—information remembered
between function calls—without using shared global names.
Arguments are passed by assignment (object reference). In Python, arguments
are passed to functions by assignment (which, as we’ve learned, means by object
reference). As you’ll see, in Python’s model the caller and function share objects
by references, but there is no name aliasing. Changing an argument name within
a function does not also change the corresponding name in the caller, but changing
passed-in mutable objects in place can change objects shared by the caller, and
serve as a function result.
Arguments are passed by position, unless you say otherwise. Values you pass
in a function call match argument names in a function’s definition from left to right
by default. For flexibility, function calls can also pass arguments by name with
name=value keyword syntax, and unpack arbitrarily many arguments to send with
*pargs and **kargs starred-argument notation. Function definitions use the same
two forms to specify argument defaults, and collect arbitrarily many arguments
received.
Arguments, return values, and variables are not declared. As with everything
in Python, there are no type constraints on functions. In fact, nothing about a
function needs to be declared ahead of time: you can pass in arguments of any type,
return any kind of object, and so on. As one consequence, a single function can
often be applied to a variety of object types—any objects that sport a compatible
interface (methods and expressions) will do, regardless of their specific types.
If some of the preceding words didn’t sink in, don’t worry—we’ll explore all of these
concepts with real code in this part of the book. Let’s get started by expanding on some
of these ideas and looking at a few examples.
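As a tiny preview of the argument-passing bullets above, here is a minimal sketch; the
names are illustrative, and Chapter 18 tells the full story:
def changer(a, b):
    a = 2                           # Rebinds the local name only: caller's X unchanged
    b[0] = 'spam'                   # Changes the shared mutable object in place

X = 1
L = [1, 2]
changer(X, L)
print(X, L)                         # 1 ['spam', 2]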
def Statements
The def statement creates a function object and assigns it to a name. Its general format
is as follows:
def name(arg1, arg2,... argN):
    statements
As with all compound Python statements, def consists of a header line followed by a
block of statements, usually indented (or a simple statement after the colon). The
statement block becomes the function’s body—that is, the code Python executes each
time the function is later called.
The def header line specifies a function name that is assigned the function object, along
with a list of zero or more arguments (sometimes called parameters) in parentheses.
The argument names in the header are assigned to the objects passed in parentheses at
the point of call.
Function bodies often contain a return statement:
def name(arg1, arg2,... argN):
    ...
    return value
The Python return statement can show up anywhere in a function body; when reached,
it ends the function call and sends a result back to the caller. The return statement
consists of an optional object value expression that gives the function’s result. If the
value is omitted, return sends back a None.
The return statement itself is optional too; if it’s not present, the function exits when
the control flow falls off the end of the function body. Technically, a function without
a return statement also returns the None object automatically, but this return value is
usually ignored at the call.
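Here’s a quick interactive sketch of the default None result (the function and its
message are illustrative):
>>> def show(message):
...     print(message)              # No return statement: None comes back
...
>>> result = show('spam')
spam
>>> print(result)
None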
Functions may also contain yield statements, which are designed to produce a series
of values over time, but we’ll defer discussion of these until we survey generator topics
in Chapter 20.
def Executes at Runtime
The Python def is a true executable statement: when it runs, it creates a new function
object and assigns it to a name. (Remember, all we have in Python is runtime; there is
no such thing as a separate compile time.) Because it’s a statement, a def can appear
anywhere a statement can—even nested in other statements. For instance, although
defs normally are run when the module enclosing them is imported, it’s also completely
legal to nest a function def inside an if statement to select between alternative defini-
tions:
if test:
    def func():                     # Define func this way
        ...
else:
    def func():                     # Or else this way
        ...
...
func()                              # Call the version selected and built
One way to understand this code is to realize that the def is much like an = statement:
it simply assigns a name at runtime. Unlike in compiled languages such as C, Python
functions do not need to be fully defined before the program runs. More generally,
defs are not evaluated until they are reached and run, and the code inside defs is not
evaluated until the functions are later called.
Because function definition happens at runtime, there’s nothing special about the
function name. What’s important is the object to which it refers:
othername = func # Assign function object
othername() # Call func again
Here, the function was assigned to a different name and called through the new name.
Like everything else in Python, functions are just objects; they are recorded explicitly
in memory at program execution time. In fact, besides calls, functions allow arbitrary
attributes to be attached to record information for later use:
def func(): ... # Create function object
func() # Call object
func.attr = value # Attach attributes
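For instance, here is a minimal sketch that uses an attribute to count calls; the function
and attribute names here are arbitrary, and we’ll meet practical roles for such state
later in this part of the book:
def greet():
    greet.count += 1                # State attached to the function object itself
    print('hello')

greet.count = 0                     # Create the attribute after the def runs
greet()
greet()
print(greet.count)                  # 2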
A First Example: Definitions and Calls
Apart from such runtime concepts (which tend to seem most unique to programmers
with backgrounds in traditional compiled languages), Python functions are straight-
forward to use. Let’s code a first real example to demonstrate the basics. As you’ll see,
there are two sides to the function picture: a definition (the def that creates a function)
and a call (an expression that tells Python to run the function’s body).
Definition
Here’s a definition typed interactively that defines a function called times, which re-
turns the product of its two arguments:
>>> def times(x, y):                # Create and assign function
...     return x * y                # Body executed when called
...
When Python reaches and runs this def, it creates a new function object that packages
the function’s code and assigns the object to the name times. Typically, such a state-
ment is coded in a module file and runs when the enclosing file is imported; for some-
thing this small, though, the interactive prompt suffices.
Calls
The def statement makes a function but does not call it. After the def has run, you can
call (run) the function in your program by adding parentheses after the function’s name.
The parentheses may optionally contain one or more object arguments, to be passed
(assigned) to the names in the function’s header:
>>> times(2, 4) # Arguments in parentheses
8
This expression passes two arguments to times. As mentioned previously, arguments
are passed by assignment, so in this case the name x in the function header is assigned
the value 2, y is assigned the value 4, and the function’s body is run. For this function,
the body is just a return statement that sends back the result as the value of the call
expression. The returned object was printed here interactively (as in most languages,
2 * 4 is 8 in Python), but if we needed to use it later we could instead assign it to a
variable. For example:
>>> x = times(3.14, 4) # Save the result object
>>> x
12.56
Now, watch what happens when the function is called a third time, with very different
kinds of objects passed in:
>>> times('Ni', 4) # Functions are "typeless"
'NiNiNiNi'
This time, our function means something completely different (Monty Python reference
again intended). In this third call, a string and an integer are passed to x and y, instead
of two numbers. Recall that * works on both numbers and sequences; because we never
declare the types of variables, arguments, or return values in Python, we can use
times to either multiply numbers or repeat sequences.
In other words, what our times function means and does depends on what we pass into
it. This is a core idea in Python (and perhaps the key to using the language well), which
merits a bit of expansion here.
Polymorphism in Python
As we just saw, the very meaning of the expression x * y in our simple times function
depends completely upon the kinds of objects that x and y are—thus, the same function
can perform multiplication in one instance and repetition in another. Python leaves it
up to the objects to do something reasonable for the syntax. Really, * is just a dispatch
mechanism that routes control to the objects being processed.
This sort of type-dependent behavior is known as polymorphism, a term we first met
in Chapter 4 that essentially means that the meaning of an operation depends on the
objects being operated upon. Because it’s a dynamically typed language, polymorphism
runs rampant in Python. In fact, every operation is a polymorphic operation in Python:
printing, indexing, the * operator, and much more.
This is deliberate, and it accounts for much of the language’s conciseness and flexibility.
A single function, for instance, can generally be applied to a whole category of object
types automatically. As long as those objects support the expected interface (a.k.a.
protocol), the function can process them. That is, if the objects passed into a function
have the expected methods and expression operators, they are plug-and-play compat-
ible with the function’s logic.
Even in our simple times function, this means that any two objects that support a * will
work, no matter what they may be, and no matter when they are coded. This function
will work on two numbers (performing multiplication), or a string and a number (per-
forming repetition), or any other combination of objects supporting the expected in-
terface—even class-based objects we have not even imagined yet.
Moreover, if the objects passed in do not support this expected interface, Python will
detect the error when the * expression is run and raise an exception automatically. It’s
therefore usually pointless to code error checking ourselves. In fact, doing so would
limit our function’s utility, as it would be restricted to work only on objects whose types
we test for.
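For example, passing in a pair of objects that don’t support * fails all on its own,
with no checking code in times at all (error text here is abridged and may vary by
Python version):
>>> times({}, 4)                    # Dictionaries don't support *
...error text omitted...
TypeError: unsupported operand type(s) for *: 'dict' and 'int'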
This turns out to be a crucial philosophical difference between Python and statically
typed languages like C++ and Java: in Python, your code is not supposed to care about
specific data types. If it does, it will be limited to working on just the types you antici-
pated when you wrote it, and it will not support other compatible object types that
may be coded in the future. Although it is possible to test for types with tools like the
type built-in function, doing so breaks your code’s flexibility. By and large, we code to
object interfaces in Python, not data types.1
Of course, some programs have unique requirements, and this polymorphic model of
programming means we have to test our code to detect errors, rather than providing
type declarations a compiler can use to detect some types of errors for us ahead of time.
In exchange for an initial bit of testing, though, we radically reduce the amount of code
we have to write and radically increase our code’s flexibility. As you’ll learn, it’s a net
win in practice.

1. This polymorphic behavior has in recent years come to also be known as duck typing—the essential idea
being that your code is not supposed to care if an object is a duck, only that it quacks. Anything that
quacks will do, duck or not, and the implementation of quacks is up to the object, a principle which will
become even more apparent when we study classes in Part VI. Graphic metaphor to be sure, though this
is really just a new label for an older idea, and use cases for quacking software would seem limited in the
tangible world (he says, bracing for emails from militant ornithologists...).

A Second Example: Intersecting Sequences
Let’s look at a second function example that does something a bit more useful than
multiplying arguments and further illustrates function basics.
In Chapter 13, we coded a for loop that collected items held in common in two strings.
We noted there that the code wasn’t as useful as it could be because it was set up to
work only on specific variables and could not be rerun later. Of course, we could copy
the code and paste it into each place where it needs to be run, but this solution is neither
good nor general—we’d still have to edit each copy to support different sequence
names, and changing the algorithm would then require changing multiple copies.
Definition
By now, you can probably guess that the solution to this dilemma is to package the
for loop inside a function. Doing so offers a number of advantages:
Putting the code in a function makes it a tool that you can run as many times as
you like.
Because callers can pass in arbitrary arguments, functions are general enough to
work on any two sequences (or other iterables) you wish to intersect.
When the logic is packaged in a function, you have to change code in only one
place if you ever need to change the way the intersection works.
Coding the function in a module file means it can be imported and reused by any
program run on your machine.
In effect, wrapping the code in a function makes it a general intersection utility:
def intersect(seq1, seq2):
    res = []                        # Start empty
    for x in seq1:                  # Scan seq1
        if x in seq2:               # Common item?
            res.append(x)           # Add to end
    return res
The transformation from the simple code of Chapter 13 to this function is straightfor-
ward; we’ve just nested the original logic under a def header and made the objects on
which it operates passed-in parameter names. Because this function computes a result,
we’ve also added a return statement to send a result object back to the caller.
Calls
Before you can call a function, you have to make it. To do this, run its def statement,
either by typing it interactively or by coding it in a module file and importing the file.
Once you’ve run the def, you can call the function by passing any two sequence objects
in parentheses:
>>> s1 = "SPAM"
>>> s2 = "SCAM"
>>> intersect(s1, s2) # Strings
['S', 'A', 'M']
Here, we’ve passed in two strings, and we get back a list containing the characters in
common. The algorithm the function uses is simple: “for every item in the first argu-
ment, if that item is also in the second argument, append the item to the result.” It’s a
little shorter to say that in Python than in English, but it works out the same.
To be fair, our intersect function is fairly slow (it executes nested loops), isn’t really
mathematical intersection (there may be duplicates in the result), and isn’t required at
all (as we’ve seen, Python’s set data type provides a built-in intersection operation).
Indeed, the function could be replaced with a single list comprehension expression, as
it exhibits the classic loop collector code pattern:
>>> [x for x in s1 if x in s2]
['S', 'A', 'M']
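For comparison, a set-based intersection is a one-liner too, though it removes any
duplicates and its display order may differ on your machine:
>>> set(s1) & set(s2)               # True mathematical intersection
{'S', 'A', 'M'}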
As a function basics example, though, it does the job—this single piece of code can
apply to an entire range of object types, as the next section explains. In fact, we’ll
improve and extend this to support arbitrarily many operands in Chapter 18, after we
learn more about argument passing modes.
Polymorphism Revisited
Like all good functions in Python, intersect is polymorphic. That is, it works on ar-
bitrary types, as long as they support the expected object interface:
>>> x = intersect([1, 2, 3], (1, 4)) # Mixed types
>>> x # Saved result object
[1]
This time, we passed in different types of objects to our function—a list and a tuple
(mixed types)—and it still picked out the common items. Because you don’t have to
specify the types of arguments ahead of time, the intersect function happily iterates
through any kind of sequence objects you send it, as long as they support the expected
interfaces.
For intersect, this means that the first argument has to support the for loop, and the
second has to support the in membership test. Any two such objects will work, re-
gardless of their specific types—that includes physically stored sequences like strings
and lists; all the iterable objects we met in Chapter 14, including files and dictionaries;
and even any class-based objects we code that apply operator overloading techniques
we’ll discuss later in the book.2
Here again, if we pass in objects that do not support these interfaces (e.g., numbers),
Python will automatically detect the mismatch and raise an exception for us—which
is exactly what we want, and the best we could do on our own if we coded explicit type
tests. By not coding type tests and allowing Python to detect the mismatches for us, we
both reduce the amount of code we need to write and increase our code’s flexibility.

2. This code will always work if we intersect files’ contents obtained with file.readlines(). It may not work
to intersect lines in open input files directly, though, depending on the file object’s implementation of
the in operator or general iteration. Files must generally be rewound (e.g., with a file.seek(0) or another
open) after they have been read to end-of-file once, and so are single-pass iterators. As we’ll see in
Chapter 30 when we study operator overloading, objects implement the in operator either by providing
the specific __contains__ method or by supporting the general iteration protocol with the __iter__ or
older __getitem__ methods; classes can code these methods arbitrarily to define what iteration means for
their data.
Local Variables
Probably the most interesting part of this example, though, is its names. It turns out
that the variable res inside intersect is what in Python is called a local variable—a
name that is visible only to code inside the function def and that exists only while the
function runs. In fact, because all names assigned in any way inside a function are
classified as local variables by default, nearly all the names in intersect are local vari-
ables:
res is obviously assigned, so it is a local variable.
Arguments are passed by assignment, so seq1 and seq2 are, too.
The for loop assigns items to a variable, so the name x is also local.
All these local variables appear when the function is called and disappear when the
function exits—the return statement at the end of intersect sends back the result
object, but the name res goes away. Because of this, a function’s variables won’t re-
member values between calls; although the object returned by a function lives on, re-
taining other sorts of state information requires other sorts of techniques. To fully
explore the notion of locals and state, though, we need to move on to the scopes
coverage of Chapter 17.
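You can verify this interactively: once a call returns, the function’s local names are
gone (error text abridged):
>>> intersect('SPAM', 'SCAM')
['S', 'A', 'M']
>>> res                             # The local res did not survive the call
...error text omitted...
NameError: name 'res' is not defined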
Chapter Summary
This chapter introduced the core ideas behind function definition—the syntax and
operation of the def and return statements, the behavior of function call expressions,
and the notion and benefits of polymorphism in Python functions. As we saw, a def
statement is executable code that creates a function object at runtime; when the func-
tion is later called, objects are passed into it by assignment (recall that assignment
means object reference in Python, which, as we learned in Chapter 6, really means
pointer internally), and computed values are sent back by return. We also began ex-
ploring the concepts of local variables and scopes in this chapter, but we’ll save all the
details on those topics for Chapter 17. First, though, a quick quiz.
Test Your Knowledge: Quiz
1. What is the point of coding functions?
2. At what time does Python create a function?
3. What does a function return if it has no return statement in it?
4. When does the code nested inside the function definition statement run?
5. What’s wrong with checking the types of objects passed into a function?
Test Your Knowledge: Answers
1. Functions are the most basic way of avoiding code redundancy in Python—factor-
ing code into functions means that we have only one copy of an operation’s code
to update in the future. Functions are also the basic unit of code reuse in Python
—wrapping code in functions makes it a reusable tool, callable in a variety of pro-
grams. Finally, functions allow us to divide a complex system into manageable
parts, each of which may be developed individually.
2. A function is created when Python reaches and runs the def statement; this state-
ment creates a function object and assigns it the function’s name. This normally
happens when the enclosing module file is imported by another module (recall that
imports run the code in a file from top to bottom, including any defs), but it can
also occur when a def is typed interactively or nested in other statements, such as
ifs.
3. A function returns the None object by default if the control flow falls off the end of
the function body without running into a return statement. Such functions are
usually called with expression statements, as assigning their None results to vari-
ables is generally pointless. A return statement with no expression in it also returns
None.
4. The function body (the code nested inside the function definition statement) is run
when the function is later called with a call expression. The body runs anew each
time the function is called.
5. Checking the types of objects passed into a function effectively breaks the func-
tion’s flexibility, constraining the function to work on specific types only. Without
such checks, the function would likely be able to process an entire range of object
types—any objects that support the interface expected by the function will work.
(The term interface means the set of methods and expression operators the func-
tion’s code runs.)
CHAPTER 17
Scopes
Chapter 16 introduced basic function definitions and calls. As we saw, Python’s core
function model is simple to use, but even simple function examples quickly led us to
questions about the meaning of variables in our code. This chapter moves on to present
the details behind Python’s scopes—the places where variables are defined and looked
up. Like module files, scopes help prevent name clashes across your program’s code:
names defined in one program unit don’t interfere with names in another.
As we’ll see, the place where a name is assigned in our code is crucial to determining
what the name means. We’ll also find that scope usage can have a major impact on
program maintenance effort; overuse of globals, for example, is a generally bad thing.
On the plus side, we’ll learn that scopes can provide a way to retain state information
between function calls, and offer an alternative to classes in some roles.
Python Scope Basics
Now that you’re ready to start writing your own functions, we need to get more formal
about what names mean in Python. When you use a name in a program, Python creates,
changes, or looks up the name in what is known as a namespace—a place where names
live. When we talk about the search for a name’s value in relation to code, the term
scope refers to a namespace: that is, the location of a name’s assignment in your source
code determines the scope of the name’s visibility to your code.
Just about everything related to names, including scope classification, happens at as-
signment time in Python. As we’ve seen, names in Python spring into existence when
they are first assigned values, and they must be assigned before they are used. Because
names are not declared ahead of time, Python uses the location of the assignment of a
name to associate it with (i.e., bind it to) a particular namespace. In other words, the
place where you assign a name in your source code determines the namespace it will
live in, and hence its scope of visibility.
Besides packaging code for reuse, functions add an extra namespace layer to your pro-
grams to minimize the potential for collisions among variables of the same name—by
default, all names assigned inside a function are associated with that function’s namespace,
and no other. This rule means that:
Names assigned inside a def can only be seen by the code within that def. You
cannot even refer to such names from outside the function.
Names assigned inside a def do not clash with variables outside the def, even if the
same names are used elsewhere. A name X assigned outside a given def (i.e., in a
different def or at the top level of a module file) is a completely different variable
from a name X assigned inside that def.
In all cases, the scope of a variable (where it can be used) is always determined by where
it is assigned in your source code and has nothing to do with which functions call which.
In fact, as we’ll learn in this chapter, variables may be assigned in three different places,
corresponding to three different scopes:
If a variable is assigned inside a def, it is local to that function.
If a variable is assigned in an enclosing def, it is nonlocal to nested functions.
If a variable is assigned outside all defs, it is global to the entire file.
We call this lexical scoping because variable scopes are determined entirely by the lo-
cations of the variables in the source code of your program files, not by function calls.
For example, in the following module file, the X = 99 assignment creates a global vari-
able named X (visible everywhere in this file), but the X = 88 assignment creates a
local variable X (visible only within the def statement):
X = 99                              # Global (module) scope X

def func():
    X = 88                          # Local (function) scope X: a different variable
Even though both variables are named X, their scopes make them different. The net
effect is that function scopes help to avoid name clashes in your programs and help to
make functions more self-contained program units—their code need not be concerned
with names used elsewhere.
Scope Details
Before we started writing functions, all the code we wrote was at the top level of a
module (i.e., not nested in a def), so the names we used either lived in the module itself
or were built-ins predefined by Python (e.g., open). Technically, the interactive prompt
is a module named __main__ that prints results and doesn’t save its code; in all other
ways, though, it’s like the top level of a module file.
Functions, though, provide nested namespaces (scopes) that localize the names they
use, such that names inside a function won’t clash with those outside it (in a module
or another function). Functions define a local scope and modules define a global
scope with the following properties:
The enclosing module is a global scope. Each module is a global scope—that
is, a namespace in which variables created (assigned) at the top level of the module
file live. Global variables become attributes of a module object to the outside world
after imports but can also be used as simple variables within the module file itself.
The global scope spans a single file only. Don’t be fooled by the word “global”
here—names at the top level of a file are global to code within that single file only.
There is really no notion of a single, all-encompassing global file-based scope in
Python. Instead, names are partitioned into modules, and you must always import
a module explicitly if you want to be able to use the names its file defines. When
you hear “global” in Python, think “module.”
Assigned names are local unless declared global or nonlocal. By default, all
the names assigned inside a function definition are put in the local scope (the
namespace associated with the function call). If you need to assign a name that
lives at the top level of the module enclosing the function, you can do so by de-
claring it in a global statement inside the function. If you need to assign a name
that lives in an enclosing def, as of Python 3.X you can do so by declaring it in a
nonlocal statement.
All other names are enclosing function locals, globals, or built-ins. Names
not assigned a value in the function definition are assumed to be enclosing scope
locals, defined in a physically surrounding def statement; globals that live in the
enclosing module’s namespace; or built-ins in the predefined built-ins module
Python provides.
Each call to a function creates a new local scope. Every time you call a function,
you create a new local scope—that is, a namespace in which the names created
inside that function will usually live. You can think of each def statement (and
lambda expression) as defining a new local scope, but the local scope actually cor-
responds to a function call. Because Python allows functions to call themselves to
loop—an advanced technique known as recursion and noted briefly in Chapter 9
when we explored comparisons—each active call receives its own copy of the
function’s local variables. Recursion is useful in functions we write as well, to pro-
cess structures whose shapes can’t be predicted ahead of time; we’ll explore it more
fully in Chapter 19.
There are a few subtleties worth underscoring here. First, keep in mind that code typed
at the interactive command prompt lives in a module, too, and follows the normal scope
rules: they are global variables, accessible to the entire interactive session. You’ll learn
more about modules in the next part of this book.
Also note that any type of assignment within a function classifies a name as local. This
includes = statements, module names in import, function names in def, function argu-
ment names, and so on. If you assign a name in any way within a def, it will become a
local to that function by default.
Conversely, in-place changes to objects do not classify names as locals; only actual name
assignments do. For instance, if the name L is assigned to a list at the top level of a
module, a statement L = X within a function will classify L as a local, but L.append(X)
will not. In the latter case, we are changing the list object that L references, not L itself—
L is found in the global scope as usual, and Python happily modifies it without re-
quiring a global (or nonlocal) declaration. As usual, it helps to keep the distinction
between names and objects clear: changing an object is not an assignment to a name.
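Here’s a minimal sketch of that distinction (the function names are illustrative):
L = [1, 2]                          # Global list

def grow(X):
    L.append(X)                     # In-place change: L is still the global

def rebind(X):
    L = [X]                         # Assignment: this L is local to rebind

grow(3)
rebind(4)
print(L)                            # [1, 2, 3]: grow changed it, rebind did not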
Name Resolution: The LEGB Rule
If the prior section sounds confusing, it really boils down to three simple rules. With a
def statement:
Name assignments create or change local names by default.
Name references search at most four scopes: local, then enclosing functions (if any),
then global, then built-in.
Names declared in global and nonlocal statements map assigned names to en-
closing module and function scopes, respectively.
In other words, all names assigned inside a function def statement (or a lambda, an
expression we’ll meet later) are locals by default. Functions can freely use names as-
signed in syntactically enclosing functions and the global scope, but they must declare
such nonlocals and globals in order to change them.
Python’s name-resolution scheme is sometimes called the LEGB rule, after the scope
names:
When you use an unqualified name inside a function, Python searches up to four
scopes—the local (L) scope, then the local scopes of any enclosing (E) defs and
lambdas, then the global (G) scope, and then the built-in (B) scope—and stops at
the first place the name is found. If the name is not found during this search, Python
reports an error.
When you assign a name in a function (instead of just referring to it in an expres-
sion), Python always creates or changes the name in the local scope, unless it’s
declared to be global or nonlocal in that function.
When you assign a name outside any function (i.e., at the top level of a module
file, or at the interactive prompt), the local scope is the same as the global scope—
the module’s namespace.
Because names must be assigned before they can be used (as we learned in Chap-
ter 6), there are no automatic components in this model: assignments always determine
name scopes unambiguously. Figure 17-1 illustrates Python’s four scopes. Note that
the second scope lookup layer, E—the scopes of enclosing defs or lambdas—can tech-
nically correspond to more than one lookup level. This case only comes into play when
you nest functions within functions, and is enhanced by the nonlocal statement in 3.X.1
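The following sketch traces the rule through all three user-defined levels (the names
here are illustrative):
X = 'global'                        # Module: global scope

def outer():
    X = 'enclosing'                 # Enclosing def's local scope
    def inner():
        X = 'local'                 # Innermost local scope
        print(X)                    # Finds the local X first
    inner()
    print(X)                        # Enclosing X, untouched by inner

outer()                             # Prints 'local', then 'enclosing'
print(X)                            # Prints 'global': module X untouched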
Figure 17-1. The LEGB scope lookup rule. When a variable is referenced, Python searches for it in
this order: in the local scope, in any enclosing functions’ local scopes, in the global scope, and finally
in the built-in scope. The first occurrence wins. The place in your code where a variable is assigned
usually determines its scope. In Python 3.X, nonlocal declarations can also force names to be mapped
to enclosing function scopes, whether assigned or not.

1. The scope lookup rule was called the “LGB rule” in the first edition of this book. The enclosing def “E”
layer was added later in Python to obviate the task of passing in enclosing scope names explicitly with
default arguments—a topic usually of marginal interest to Python beginners that we’ll defer until later in
this chapter. Since this scope is now addressed by the nonlocal statement in Python 3.X, the lookup rule
might be better named “LNGB” today, but backward compatibility matters in books, too. The present
form of this acronym also does not account for the newer obscure scopes of some comprehensions and
exception handlers, but acronyms longer than four letters tend to defeat their purpose!

Also keep in mind that these rules apply only to simple variable names (e.g., spam). In
Parts V and VI, we’ll see that qualified attribute names (e.g., object.spam) live in
particular objects and follow a completely different set of lookup rules than those covered
here. References to attribute names following periods (.) search one or more objects,
not scopes, and in fact may invoke something called inheritance in Python’s OOP
model; more on this in Part VI of this book.

Other Python scopes: Preview
Though obscure at this point in the book, there are technically three more scopes in
Python—temporary loop variables in some comprehensions, exception reference vari-
ables in some try handlers, and local scopes in class statements. The first two of these
are special cases that rarely impact real code, and the third falls under the LEGB um-
brella rule.
Most statement blocks and other constructs do not localize the names used within
them, with the following version-specific exceptions (whose variables are not available
to, but also will not clash with, surrounding code, and which involve topics covered in
full later):
Comprehension variables—the variable X used to refer to the current iteration item
in a comprehension expression such as [X for X in I]. Because they might clash
with other names and reflect internal state in generators, in 3.X, such variables are
local to the expression itself in all comprehension forms: generator, list, set, and
dictionary. In 2.X, they are local to generator expressions and set and dictionary
compressions, but not to list comprehensions that map their names to the scope
outside the expression. By contrast, for loop statements never localize their vari-
ables to the statement block in any Python. See Chapter 20 for more details and
examples.
Exception variables—the variable X used to reference the raised exception in a
try statement handler clause such as except E as X. Because they might defer
garbage collection’s memory recovery, in 3.X, such variables are local to that
except block, and in fact are removed when the block is exited (even if you’ve used
it earlier in your code!). In 2.X, these variables live on after the try statement. See
Chapter 34 for additional information.
These contexts augment the LEGB rule, rather than modifying it. Variables assigned
in a comprehension, for example, are simply bound to a further nested and special-case
scope; other names referenced within these expressions follow the usual LEGB lookup
rules.
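For instance, a quick 3.X session demonstrates the comprehension case; in 2.X, the
list comprehension here would leave X rebound to 2 instead:
>>> X = 99
>>> [X for X in range(3)]           # This X is local to the expression in 3.X
[0, 1, 2]
>>> X                               # The outer X is untouched
99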
It’s also worth noting that the class statement we’ll meet in Part VI creates a new
local scope too for the names assigned inside the top level of its block. As for def, names
assigned inside a class don’t clash with names elsewhere, and follow the LEGB lookup
rule, where the class block is the “L” level. Like modules and imports, these names
also morph into class object attributes after the class statement ends.
Unlike functions, though, class names are not created per call: class object calls gen-
erate instances, which inherit names assigned in the class and record per-object state
as attributes. As we’ll also learn in Chapter 29, although the LEGB rule is used to resolve
names used in both the top level of a class itself as well as the top level of method
functions nested within it, classes themselves are skipped by scope lookups—their
names must be fetched as object attributes. Because Python searches enclosing func-
tions for referenced names, but not enclosing classes, the LEGB rule still applies to
OOP code.
Scope Example
Let’s step through a larger example that demonstrates scope ideas. Suppose we wrote
the following code in a module file:
# Global scope
X = 99                              # X and func assigned in module: global

def func(Y):                        # Y and Z assigned in function: locals
    # Local scope
    Z = X + Y                       # X is a global
    return Z

func(1)                             # func in module: result=100
This module and the function it contains use a number of names to do their business.
Using Python’s scope rules, we can classify the names as follows:
Global names: X, func
X is global because it’s assigned at the top level of the module file; it can be refer-
enced inside the function as a simple unqualified variable without being declared
global. func is global for the same reason; the def statement assigns a function
object to the name func at the top level of the module.
Local names: Y, Z
Y and Z are local to the function (and exist only while the function runs) because
they are both assigned values in the function definition: Z by virtue of the = state-
ment, and Y because arguments are always passed by assignment.
The underlying rationale for this name-segregation scheme is that local variables serve
as temporary names that you need only while a function is running. For instance, in
the preceding example, the argument Y and the addition result Z exist only inside the
function; these names don’t interfere with the enclosing module’s namespace (or any
other function, for that matter). In fact, local variables are removed from memory when
the function call exits, and objects they reference may be garbage-collected if not ref-
erenced elsewhere. This is an automatic, internal step, but it helps minimize memory
requirements.
The local/global distinction also makes functions easier to understand, as most of the
names a function uses appear in the function itself, not at some arbitrary place in a
module. Also, because you can be sure that local names will not be changed by some
remote function in your program, they tend to make programs easier to debug and
modify. Functions are self-contained units of software.
The Built-in Scope
We’ve been talking about the built-in scope in the abstract, but it’s a bit simpler than
you may think. Really, the built-in scope is just a built-in module called builtins, but
you have to import builtins to query built-ins because the name builtins is not itself
built in...
No, I’m serious! The built-in scope is implemented as a standard library module named
builtins in 3.X, but that name itself is not placed in the built-in scope, so you have to
import it in order to inspect it. Once you do, you can run a dir call to see which names
are predefined. In Python 3.3 (see ahead for 2.X usage):
>>> import builtins
>>> dir(builtins)
['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException',
'BlockingIOError', 'BrokenPipeError', 'BufferError', 'BytesWarning',
...many more names omitted...
'ord', 'pow', 'print', 'property', 'quit', 'range', 'repr', 'reversed',
'round', 'set', 'setattr', 'slice', 'sorted', 'staticmethod', 'str', 'sum',
'super', 'tuple', 'type', 'vars', 'zip']
The names in this list constitute the built-in scope in Python; roughly the first half are
built-in exceptions, and the second half are built-in functions. Also in this list are the
special names None, True, and False, though they are treated as reserved words in 3.X.
Because Python automatically searches this module last in its LEGB lookup, you get
all the names in this list “for free”—that is, you can use them without importing any
modules. Thus, there are really two ways to refer to a built-in function—by taking
advantage of the LEGB rule, or by manually importing the builtins module:
>>> zip # The normal way
<class 'zip'>
>>> import builtins # The hard way: for customizations
>>> builtins.zip
<class 'zip'>
>>> zip is builtins.zip # Same object, different lookups
True
The second of these approaches is sometimes useful in advanced ways we’ll meet in
this chapter’s sidebars.
Redefining built-in names: For better or worse
The careful reader might also notice that because the LEGB lookup procedure takes
the first occurrence of a name that it finds, names in the local scope may override
variables of the same name in both the global and built-in scopes, and global names
may override built-ins. A function can, for instance, create a local variable called open
by assigning to it:
def hider():
    open = 'spam'                   # Local variable, hides built-in here
    ...
    open('data.txt')                # Error: this no longer opens a file in this scope!
However, this will hide the built-in function called open that lives in the built-in (outer)
scope, such that the name open will no longer work within the function to open files—
it’s now a string, not the opener function. This isn’t a problem if you don’t need to
open files in this function, but triggers an error if you attempt to open through this
name.
This can even occur more simply at the interactive prompt, which works as a global,
module scope:
>>> open = 99 # Assign in global scope, hides built-in here too
Now, there is nothing inherently wrong with using a built-in name for variables of your
own, as long as you don’t need the original built-in version. After all, if these were truly
off limits, we would need to memorize the entire built-in names list and treat all its
names as reserved. With over 140 names in this module in 3.3, that would be far too
restrictive and daunting:
>>> len(dir(builtins)), len([x for x in dir(builtins) if not x.startswith('__')])
(148, 142)
In fact, there are times in advanced programming where you may really want to replace
a built-in name by redefining it in your code—to define a custom open that verifies
access attempts, for instance (see this chapter’s sidebar “Breaking the Universe in
Python 2.X” on page 494 for more on this thread).
Still, redefining a built-in name is often a bug, and a nasty one at that, because Python
will not issue a warning message about it. Tools like PyChecker (see the Web) can warn
you of such mistakes, but knowledge may be your best defense on this point: don’t
redefine a built-in name you need. If you accidentally reassign a built-in name at the
interactive prompt this way, you can either restart your session or run a del name state-
ment to remove the redefinition from your scope, thereby restoring the original in the
built-in scope.
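For instance, here's what that recovery looks like in a 3.X session (a hedged sketch; the
shadowing assignment is deliberate here):

>>> open = 99                  # Accidentally hide the built-in here
>>> open('data.txt')
TypeError: 'int' object is not callable
>>> del open                   # Remove the redefinition from this scope
>>> open                       # The built-in scope's version is found again
<built-in function open>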
Note that functions can similarly hide global variables of the same name with locals,
but this is more broadly useful, and in fact is much of the point of local scopes—because
they minimize the potential for name clashes, your functions are self-contained name-
space scopes:
X = 88             # Global X
def func():
    X = 99         # Local X: hides global, but we want this here
func()
print(X)           # Prints 88: unchanged
Here, the assignment within the function creates a local X that is a completely different
variable from the global X in the module outside the function. As one consequence,
though, there is no way to change a name outside a function without adding a global
(or nonlocal) declaration to the def, as described in the next section.
Version skew note: Actually, the tongue twisting gets a bit worse. The
Python 3.X builtins module used here is named __builtin__ in Python
2.X. In addition, the name __builtins__ (with the s) is preset in most
global scopes, including the interactive session, to reference the module
known as builtins in 3.X and __builtin__ in 2.X, so you can often use
__builtins__ without an import but cannot run an import on that name
itself—it’s a preset variable, not a module’s name.
That is, in 3.X builtins is __builtins__ is True after you import
builtins, and in 2.X __builtin__ is __builtins__ is True after you import
__builtin__. The upshot is that we can usually inspect the built-in scope
by simply running dir(__builtins__) with no import in both 3.X and
2.X, but we are advised to use builtins for real work and customization
in 3.X, and __builtin__ for the same in 2.X. Who said documenting this
stuff was easy?
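To make the note concrete, here's the equivalence at a 3.3 interactive prompt (a
sketch; in 2.X, substitute __builtin__ for builtins):

>>> __builtins__                   # Preset variable: usable with no import
<module 'builtins' (built-in)>
>>> import builtins
>>> builtins is __builtins__       # Same module object in this scope
True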
Breaking the Universe in Python 2.X
Here’s another thing you can do in Python that you probably shouldn’t—because the
names True and False in 2.X are just variables in the built-in scope and are not reserved,
it’s possible to reassign them with a statement like True = False. Don’t worry: you
won’t actually break the logical consistency of the universe in so doing! This statement
merely redefines the word True for the single scope in which it appears to return
False. All other scopes still find the originals in the built-in scope.
For more fun, though, in Python 2.X you could say __builtin__.True = False, to reset
True to False for the entire Python process. This works because there is only one built-
in scope module in a program, shared by all its clients. Alas, this type of assignment
has been disallowed in Python 3.X, because True and False are treated as actual reserved
words, just like None. In 2.X, though, it sends IDLE into a strange panic state that resets
the user code process (in other words, don’t try this at home, kids).
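As a sketch of the scope-level behavior just described (Python 2.X only, and purely for
illustration):

>>> True = False               # 2.X only: rebinds True in this scope alone
>>> True                       # The module-level name now refers to False
False
>>> import __builtin__
>>> __builtin__.True           # The built-in scope's original is unchanged
True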
This technique can be useful, however, both to illustrate the underlying namespace
model, and for tool writers who must change built-ins such as open to customized
functions. By reassigning a function’s name in the built-in scope, you reset it to your
customization for every module in the process. If you do, you’ll probably also need to
remember the original version to call from your customization—in fact, we’ll see one
way to achieve this for a custom open in the sidebar “Why You Will Care: Customizing
open” on page 517 after we’ve had a chance to explore nested scope closures and state
retention options.
Also, note again that third-party tools such as PyChecker, and others such as PyLint,
will warn about common programming mistakes, including accidental assignment to
built-in names (this is usually known as “shadowing” a built-in in such tools). It’s not
a bad idea to run your first few Python programs through tools like these to see what
they point out.
The global Statement
The global statement and its nonlocal 3.X cousin are the only things that are remotely
like declaration statements in Python. They are not type or size declarations, though;
they are namespace declarations. The global statement tells Python that a function plans
to change one or more global names—that is, names that live in the enclosing module’s
scope (namespace).
We’ve talked about global in passing already. Here’s a summary:
• Global names are variables assigned at the top level of the enclosing module file.
• Global names must be declared only if they are assigned within a function.
• Global names may be referenced within a function without being declared.
In other words, global allows us to change names that live outside a def at the top level
of a module file. As we’ll see later, the nonlocal statement is almost identical but applies
to names in the enclosing def’s local scope, rather than names in the enclosing module.
The global statement consists of the keyword global, followed by one or more names
separated by commas. All the listed names will be mapped to the enclosing module’s
scope when assigned or referenced within the function body. For instance:
X = 88             # Global X
def func():
    global X
    X = 99         # Global X: outside def
func()
print(X)           # Prints 99
We’ve added a global declaration to the example here, such that the X inside the def
now refers to the X outside the def; they are the same variable this time, so changing
X inside the function changes the X outside it. Here is a slightly more involved example
of global at work:
y, z = 1, 2              # Global variables in module
def all_global():
    global x             # Declare globals assigned
    x = y + z            # No need to declare y, z: LEGB rule
Here, x, y, and z are all globals inside the function all_global. y and z are global because
they aren’t assigned in the function; x is global because it was listed in a global statement
to map it to the module’s scope explicitly. Without the global here, x would be con-
sidered local by virtue of the assignment.
Notice that y and z are not declared global; Python’s LEGB lookup rule finds them in
the module automatically. Also, notice that x does not even exist in the enclosing mod-
ule before the function runs; in this case, the first assignment in the function creates
x in the module.
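For instance, if the prior listing is typed at the interactive prompt, calling the function
makes the new global visible at the top level (a small sketch to verify the effect):

>>> all_global()           # First assignment in the function creates x
>>> x                      # x now lives in the module scope: 1 + 2
3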
Program Design: Minimize Global Variables
Functions in general, and global variables in particular, raise some larger design ques-
tions. How should our functions communicate? Although some of these will become
more apparent when you begin writing larger functions of your own, a few guidelines
up front might spare you from problems later. In general, functions should rely on
arguments and return values instead of globals, but I need to explain why.
By default, names assigned in functions are locals, so if you want to change names
outside functions you have to write extra code (e.g., global statements). This is delib-
erate—as is common in Python, you have to say more to do the potentially “wrong”
thing. Although there are times when globals are useful, variables assigned in a def are
local by default because that is normally the best policy. Changing globals can lead to
well-known software engineering problems: because the variables’ values are depen-
dent on the order of calls to arbitrarily distant functions, programs can become difficult
to debug, or to understand at all.
Consider this module file, for example, which is presumably imported and used else-
where:
X = 99
def func1():
    global X
    X = 88
def func2():
    global X
    X = 77
Now, imagine that it is your job to modify or reuse this code. What will the value of
X be here? Really, that question has no meaning unless it’s qualified with a point of
reference in time—the value of X is timing-dependent, as it depends on which function
was called last (something we can’t tell from this file alone).
The net effect is that to understand this code, you have to trace the flow of control
through the entire program. And, if you need to reuse or modify the code, you have to
keep the entire program in your head all at once. In this case, you can’t really use one
of these functions without bringing along the other. They are dependent on—that is,
coupled with—the global variable. This is the problem with globals: they generally make
code more difficult to understand and reuse than code consisting of self-contained
functions that rely on locals.
On the other hand, short of using tools like nested scope closures or object-oriented
programming with classes, global variables are probably the most straightforward way
in Python to retain shared state information—information that a function needs to
remember for use the next time it is called. Local variables disappear when the function
returns, but globals do not. As we’ll see later, other techniques can achieve this, too,
and allow for multiple copies of the retained information, but they are generally more
complex than pushing values out to the global scope for retention in simple use cases
where this applies.
Moreover, some programs designate a single module to collect globals; as long as this
is expected, it is not as harmful. Programs that use multithreading to do parallel pro-
cessing in Python also commonly depend on global variables—they become shared
memory between functions running in parallel threads, and so act as a communication
device.2
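For instance, here's a minimal sketch of that pattern; the worker function, thread count,
and lock are illustrative inventions, not code from this book:

import threading

count = 0                            # Global: shared memory for all threads
mutex = threading.Lock()             # Guard concurrent updates

def worker():
    global count
    for i in range(100000):
        with mutex:
            count += 1               # Threads communicate via the global

threads = [threading.Thread(target=worker) for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(count)                         # Prints 400000: one shared copy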
For now, though, especially if you are relatively new to programming, avoid the temp-
tation to use globals whenever you can—they tend to make programs difficult to un-
derstand and reuse, and won’t work for cases where one copy of saved data is not
enough. Try to communicate with passed-in arguments and return values instead. Six
months from now, both you and your coworkers may be happy you did.
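For comparison, here's a hedged rework of the earlier two-function module that passes
state explicitly instead of changing a global; the specific values are arbitrary:

def func1(x):
    return x - 11          # Compute and return instead of changing a global

def func2(x):
    return x - 22

X = 99
X = func1(X)               # Changes are explicit at the point of call
X = func2(X)               # X is 66 here, and it's clear why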
Program Design: Minimize Cross-File Changes
Here’s another scope-related design issue: although we can change variables in another
file directly, we usually shouldn’t. Module files were introduced in Chapter 3 and are
covered in more depth in the next part of this book. To illustrate their relationship to
scopes, consider these two module files:
# first.py
X = 99 # This code doesn't know about second.py
# second.py
import first
print(first.X) # OK: references a name in another file
first.X = 88 # But changing it can be too subtle and implicit
The first defines a variable X, which the second prints and then changes by assignment.
Notice that we must import the first module into the second file to get to its variable
at all—as we’ve learned, each module is a self-contained namespace (package of vari-
ables), and we must import one module to see inside it from another. That’s the main
point about modules: by segregating variables on a per-file basis, they avoid name
collisions across files, in much the same way that local variables avoid name clashes
across functions.
Really, though, in terms of this chapter’s topic, the global scope of a module file be-
comes the attribute namespace of the module object once it is imported—importers
automatically have access to all of the file’s global variables, because a file’s global scope
morphs into an object’s attribute namespace when it is imported.
After importing the first module, the second module prints its variable and then assigns
it a new value. Referencing the module’s variable to print it is fine—this is how modules
are linked together into a larger system normally. The problem with the assignment to
first.X, however, is that it is far too implicit: whoever’s charged with maintaining or
reusing the first module probably has no clue that some arbitrarily far-removed module
on the import chain can change X out from under him or her at runtime. In fact, the
second module may be in a completely different directory, and so difficult to notice at
all.

2. Multithreading runs function calls in parallel with the rest of the program and is supported by Python’s
standard library modules _thread, threading, and queue (thread, threading, and Queue in Python 2.X).
Because all threaded functions run in the same process, global scopes often serve as one form of shared
memory between them (threads may share both names in global scopes, as well as objects in a process’s
memory space). Threading is commonly used for long-running tasks in GUIs, to implement nonblocking
operations in general and to maximize CPU capacity. It is also beyond this book’s scope; see the Python
library manual, as well as the follow-up texts listed in the preface (such as O’Reilly’s Programming
Python), for more details.
Although such cross-file variable changes are always possible in Python, they are usually
much more subtle than you will want. Again, this sets up too strong a coupling between
the two files—because they are both dependent on the value of the variable X, it’s
difficult to understand or reuse one file without the other. Such implicit cross-file de-
pendencies can lead to inflexible code at best, and outright bugs at worst.
Here again, the best prescription is generally to not do this—the best way to commu-
nicate across file boundaries is to call functions, passing in arguments and getting back
return values. In this specific case, we would probably be better off coding an accessor
function to manage the change:
# first.py
X = 99
def setX(new):         # Accessor makes external changes explicit
    global X           # And can manage access in a single place
    X = new

# second.py
import first
first.setX(88)         # Call the function instead of changing directly
This requires more code and may seem like a trivial change, but it makes a huge dif-
ference in terms of readability and maintainability—when a person reading the first
module by itself sees a function, that person will know that it is a point of interface and
will expect changes to X to be made through it. In other words, it removes the element of surprise that
is rarely a good thing in software projects. Although we cannot prevent cross-file
changes from happening, common sense dictates that they should be minimized unless
widely accepted across the program.
When we meet classes in Part VI, we’ll see similar techniques for coding
attribute accessors. Unlike modules, classes can also intercept attribute
fetches automatically with operator overloading, even when accessors
aren’t used by their clients.
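As a brief, hedged preview of that idea (operator overloading is covered in Part VI), a
class can run code automatically on every attribute assignment, even when clients
assign directly:

class First:
    def __setattr__(self, name, value):     # Runs on each attribute assignment
        print('Changing %s to %s' % (name, value))
        self.__dict__[name] = value         # Assign via __dict__ to avoid loops

first = First()
first.X = 88                                # Intercepted: no accessor required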
Other Ways to Access Globals
Interestingly, because global-scope variables morph into the attributes of a loaded
module object, we can emulate the global statement by importing the enclosing module
and assigning to its attributes, as in the following example module file. Code in this file
imports the enclosing module, first by name, and then by indexing the sys.modules
loaded modules table (more on this table in Chapter 22 and Chapter 25):
# thismod.py
var = 99                             # Global variable == module attribute

def local():
    var = 0                          # Change local var

def glob1():
    global var                       # Declare global (normal)
    var += 1                         # Change global var

def glob2():
    var = 0                          # Change local var
    import thismod                   # Import myself
    thismod.var += 1                 # Change global var

def glob3():
    var = 0                          # Change local var
    import sys                       # Import system table
    glob = sys.modules['thismod']    # Get module object (or use __name__)
    glob.var += 1                    # Change global var

def test():
    print(var)
    local(); glob1(); glob2(); glob3()
    print(var)
When run, this adds 3 to the global variable (only the first function does not impact it):
>>> import thismod
>>> thismod.test()
99
102
>>> thismod.var
102
This works, and it illustrates the equivalence of globals to module attributes, but it’s
much more work than using the global statement to make your intentions explicit.
As we’ve seen, global allows us to change names in a module outside a function. It has
a close relative named nonlocal that can be used to change names in enclosing func-
tions, too—but to understand how that can be useful, we first need to explore enclosing
functions in general.
Scopes and Nested Functions
So far, I’ve omitted one part of Python’s scope rules on purpose, because it’s relatively
uncommon to encounter it in practice. However, it’s time to take a deeper look at the
letter E in the LEGB lookup rule. The E layer was added in Python 2.2; it takes the form
of the local scopes of any and all lexically enclosing functions. Enclosing scopes
are sometimes also called statically nested scopes. Really, the nesting is a lexical one—
nested scopes correspond to physically and syntactically nested code structures in your
program’s source code text.
Nested Scope Details
With the addition of nested function scopes, variable lookup rules become slightly more
complex. Within a function:
• A reference (X) looks for the name X first in the current local scope (function); then in the local scopes of any lexically enclosing functions in your source code, from inner to outer; then in the current global scope (the module file); and finally in the built-in scope (the module builtins). global declarations make the search begin in the global (module file) scope instead.
• An assignment (X = value) creates or changes the name X in the current local scope, by default. If X is declared global within the function, the assignment creates or changes the name X in the enclosing module’s scope instead. If, on the other hand, X is declared nonlocal within the function in 3.X (only), the assignment changes the name X in the closest enclosing function’s local scope.
Notice that the global declaration still maps variables to the enclosing module. When
nested functions are present, variables in enclosing functions may be referenced, but
they require 3.X nonlocal declarations to be changed.
Nested Scope Examples
To clarify the prior section’s points, let’s illustrate with some real code. Here is what
an enclosing function scope looks like (type this into a script file or at the interactive
prompt to run it live):
X = 99             # Global scope name: not used
def f1():
    X = 88         # Enclosing def local
    def f2():
        print(X)   # Reference made in nested def
    f2()
f1()               # Prints 88: enclosing def local
First off, this is legal Python code: the def is simply an executable statement, which can
appear anywhere any other statement can—including nested in another def. Here, the
nested def runs while a call to the function f1 is running; it generates a function and
assigns it to the name f2, a local variable within f1’s local scope. In a sense, f2 is a
temporary function that lives only during the execution of (and is visible only to code
in) the enclosing f1.
But notice what happens inside f2: when it prints the variable X, it refers to the X that
lives in the enclosing f1 function’s local scope. Because functions can access names in
all physically enclosing def statements, the X in f2 is automatically mapped to the X in
f1, by the LEGB lookup rule.
This enclosing scope lookup works even if the enclosing function has already returned.
For example, the following code defines a function that makes and returns another
function, and represents a more common usage pattern:
def f1():
    X = 88
    def f2():
        print(X)   # Remembers X in enclosing def scope
    return f2      # Return f2 but don't call it

action = f1()      # Make, return function
action()           # Call it now: prints 88
In this code, the call to action is really running the function we named f2 when f1 ran.
This works because functions are objects in Python like everything else, and can be
passed back as return values from other functions. Most importantly, f2 remembers
the enclosing scope’s X in f1, even though f1 is no longer active—which leads us to the
next topic.
Factory Functions: Closures
Depending on whom you ask, this sort of behavior is also sometimes called a closure
or a factory function—the former describing a functional programming technique, and
the latter denoting a design pattern. Whatever the label, the function object in question
remembers values in enclosing scopes regardless of whether those scopes are still
present in memory. In effect, they have attached packets of memory (a.k.a. state re-
tention), which are local to each copy of the nested function created, and often provide
a simple alternative to classes in this role.
A simple function factory
Factory functions (a.k.a. closures) are sometimes used by programs that need to gen-
erate event handlers on the fly in response to conditions at runtime. For instance,
imagine a GUI that must define actions according to user inputs that cannot be antici-
pated when the GUI is built. In such cases, we need a function that creates and returns
another function, with information that may vary per function made.
To illustrate this in simple terms, consider the following function, typed at the inter-
active prompt (and shown here without the “...” continuation-line prompts, per the
presentation note ahead):
>>> def maker(N):
        def action(X):         # Make and return action
            return X ** N      # action retains N from enclosing scope
        return action
This defines an outer function that simply generates and returns a nested function,
without calling it—maker makes action, but simply returns action without running it.
If we call the outer function:
>>> f = maker(2) # Pass 2 to argument N
>>> f
<function maker.<locals>.action at 0x0000000002A4A158>
what we get back is a reference to the generated nested function—the one created when
the nested def runs. If we now call what we got back from the outer function:
>>> f(3) # Pass 3 to X, N remembers 2: 3 ** 2
9
>>> f(4) # 4 ** 2
16
we invoke the nested function—the one called action within maker. In other words,
we’re calling the nested function that maker created and passed back.
Perhaps the most unusual part of this, though, is that the nested function remembers
integer 2, the value of the variable N in maker, even though maker has returned and exited
by the time we call action. In effect, N from the enclosing local scope is retained as state
information attached to the generated action, which is why we get back its argument
squared when it is later called.
Just as important, if we now call the outer function again, we get back a new nested
function with different state information attached. That is, we get the argument cubed
instead of squared when calling the new function, but the original still squares as before:
>>> g = maker(3) # g remembers 3, f remembers 2
>>> g(4) # 4 ** 3
64
>>> f(4) # 4 ** 2
16
This works because each call to a factory function like this gets its own set of state
information. In our case, the function we assign to name g remembers 3, and f remem-
bers 2, because each has its own state information retained by the variable N in maker.
This is a somewhat advanced technique that you may not see very often in most code,
and may be popular among programmers with backgrounds in functional program-
ming languages. On the other hand, enclosing scopes are often employed by the
lambda function-creation expressions we’ll expand on later in this chapter—because
they are expressions, they are almost always nested within a def. For example, a
lambda would serve in place of a def in our example:
>>> def maker(N):
        return lambda X: X ** N      # lambda functions retain state too
>>> h = maker(3)
>>> h(4) # 4 ** 3 again
64
For a more tangible example of closures at work, see the upcoming sidebar “Why You
Will Care: Customizing open” on page 517. It uses similar techniques to store infor-
mation for later use in an enclosing scope.
Presentation note: In this chapter, I’ve started listing interactive exam-
ples without the “...” continuation-line prompts that may or may not
appear in your interface (they do at the shell, but not in IDLE). This
convention will be followed from this point on to make larger code ex-
amples a bit easier to cut and paste from an ebook or other source. I’m assuming
that by now you understand indentation rules and have had your fair
share of typing Python code, and some functions and classes ahead may
be too large for rote input.
I’m also listing more and more code alone or in files, and switching
between these and interactive input arbitrarily; when you see a “>>>”
prompt, the code is typed interactively, and can generally be cut and
pasted into your Python shell if you omit the “>>>” itself. If this fails,
you can still run by pasting line by line, or editing in a file.
Closures versus classes, round 1
To some, classes, described in full in Part VI of this book, may seem better at state
retention like this, because they make their memory more explicit with attribute as-
signments. Classes also directly support additional tools that closure functions do not,
such as customization by inheritance and operator overloading, and more naturally
implement multiple behaviors in the form of methods. Because of such distinctions,
classes may be better at implementing more complete objects.
Still, closure functions often provide a lighter-weight and viable alternative when re-
taining state is the only goal. They provide for per-call localized storage for data required
by a single nested function. This is especially true when we add the 3.X nonlocal state-
ment described ahead to allow enclosing scope state changes (in 2.X, enclosing scopes
are read-only, and so have more limited uses).
From a broader perspective, there are multiple ways for Python functions to retain state
between calls. Although the values of normal local variables go away when a function
returns, values can be retained from call to call in global variables; in class instance
attributes; in the enclosing scope references we’ve met here; and in argument defaults
and function attributes. Some might include mutable default arguments to this list too
(though others may wish they didn’t).
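For instance, here's a minimal sketch of the mutable-default technique just mentioned;
the default list is created only once, when the def runs, so it persists between calls:

>>> def counter(history=[]):           # Default object made once, at def time
        history.append(len(history))   # In-place changes persist across calls
        return history

>>> counter()
[0]
>>> counter()                          # Same shared list object on each call
[0, 1]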
We’ll preview class-based alternatives and meet function attributes later in this chapter,
and get the full story on arguments and defaults in Chapter 18. To help us judge how
defaults compete on state retention, though, the next section gives enough of an in-
troduction to get us started.
Closures can also be created when a class is nested in a def: the values
of the enclosing function’s local names are retained by references within
the class, or one of its method functions. See Chapter 29 for more on
nested classes. As we’ll see in later examples (e.g., Chapter 39’s deco-
rators), the outer def in such code serves a similar role: it becomes a
class factory, and provides state retention for the nested class.
Retaining Enclosing Scope State with Defaults
In early versions of Python (prior to 2.2), the sort of code in the prior section failed
because nested defs did not do anything about scopes—a reference to a variable within
f2 in the following would search only the local (f2), then global (the code outside f1),
and then built-in scopes. Because it skipped the scopes of enclosing functions, an error
would result. To work around this, programmers typically used default argument val-
ues to pass in and remember the objects in an enclosing scope:
def f1():
    x = 88
    def f2(x=x):       # Remember enclosing scope X with defaults
        print(x)
    f2()

f1()                   # Prints 88
This coding style works in all Python releases, and you’ll still see this pattern in some
existing Python code. In fact, it’s still required for loop variables, as we’ll see in a mo-
ment, which is why it remains worth studying today. In short, the syntax arg=val in a
def header means that the argument arg will default to the value val if no real value is
passed to arg in a call. This syntax is used here to explicitly assign enclosing scope state
to be retained.
Specifically, in the modified f2 here, the x=x means that the argument x will default to
the value of x in the enclosing scope—because the second x is evaluated before Python
steps into the nested def, it still refers to the x in f1. In effect, the default argument
remembers what x was in f1: the object 88.
That’s fairly complex, and it depends entirely on the timing of default value evaluations.
In fact, the nested scope lookup rule was added to Python to make defaults unnecessary
for this role—today, Python automatically remembers any values required in the en-
closing scope for use in nested defs.
Of course, the best prescription for much code is simply to avoid nesting defs within
defs, as it will make your programs much simpler—in the Pythonic view, flat is generally
better than nested. The following is an equivalent of the prior example that avoids
nesting altogether. Notice the forward reference in this code—it’s OK to call a function
defined after the function that calls it, as long as the second def runs before the first
function is actually called. Code inside a def is never evaluated until the function is
actually called:
>>> def f1():
        x = 88         # Pass x along instead of nesting
        f2(x)          # Forward reference OK

>>> def f2(x):
        print(x)       # Flat is still often better than nested!

>>> f1()
88
If you avoid nesting this way, you can almost forget about the nested scopes concept
in Python. On the other hand, the nested functions of closure (factory) functions are
fairly common in modern Python code, as are lambda functions—which almost natu-
rally appear nested in defs and often rely on the nested scopes layer, as the next section
explains.
Nested scopes, defaults, and lambdas
Although they see increasing use in defs these days, you may be more likely to care
about nested function scopes when you start coding or reading lambda expressions.
We’ve met lambda briefly and won’t cover it in depth until Chapter 19, but in short, it’s
an expression that generates a new function to be called later, much like a def statement.
Because it’s an expression, though, it can be used in places that def cannot, such as
within list and dictionary literals.
Like a def, a lambda expression also introduces a new local scope for the function it
creates. Thanks to the enclosing scopes lookup layer, lambdas can see all the variables
that live in the functions in which they are coded. Thus, the following code—a variation
on the factory we saw earlier—works, but only because the nested scope rules are
applied:
def func():
    x = 4
    action = (lambda n: x ** n)      # x remembered from enclosing def
    return action

x = func()
print(x(2))                          # Prints 16, 4 ** 2
Prior to the introduction of nested function scopes, programmers used defaults to pass
values from an enclosing scope into lambdas, just as for defs. For instance, the following
works on all Pythons:
def func():
    x = 4
    action = (lambda n, x=x: x ** n)     # Pass x in manually
    return action
Because lambdas are expressions, they naturally (and even normally) nest inside en-
closing defs. Hence, they were perhaps the biggest initial beneficiaries of the addition
of enclosing function scopes in the lookup rules; in most cases, it is no longer necessary
to pass values into lambdas with defaults.
Loop variables may require defaults, not scopes
There is one notable exception to the rule I just gave (and a reason why I’ve shown you
the otherwise dated default argument technique we just saw): if a lambda or def defined
within a function is nested inside a loop, and the nested function references an enclosing
scope variable that is changed by that loop, all functions generated within the loop will
have the same value—the value the referenced variable had in the last loop iteration.
In such cases, you must still use defaults to save the variable’s current value instead.
This may seem a fairly obscure case, but it can come up in practice more often than
you may think, especially in code that generates callback handler functions for a num-
ber of widgets in a GUI—for instance, handlers for button-clicks for all the buttons in
a row. If these are created in a loop, you may need to be careful to save state with
defaults, or all your buttons’ callbacks may wind up doing the same thing.
Here’s an illustration of this phenomenon reduced to simple code: the following at-
tempts to build up a list of functions that each remember the current variable i from
the enclosing scope:
>>> def makeActions():
        acts = []
        for i in range(5):                    # Tries to remember each i
            acts.append(lambda x: i ** x)     # But all remember same last i!
        return acts
>>> acts = makeActions()
>>> acts[0]
<function makeActions.<locals>.<lambda> at 0x0000000002A4A400>
This doesn’t quite work, though—because the enclosing scope variable is looked up
when the nested functions are later called, they all effectively remember the same value:
the value the loop variable had on the last loop iteration. That is, when we pass a power
argument of 2 in each of the following calls, we get back 4 to the power of 2 for each
function in the list, because i is the same in all of them—4:
>>> acts[0](2) # All are 4 ** 2, 4=value of last i
16
>>> acts[1](2) # This should be 1 ** 2 (1)
16
>>> acts[2](2) # This should be 2 ** 2 (4)
16
>>> acts[4](2) # Only this should be 4 ** 2 (16)
16
This is the one case where we still have to explicitly retain enclosing scope values with
default arguments, rather than enclosing scope references. That is, to make this sort of
code work, we must pass in the current value of the enclosing scope’s variable with a
default. Because defaults are evaluated when the nested function is created (not when
it’s later called), each remembers its own value for i:
>>> def makeActions():
        acts = []
        for i in range(5):                       # Use defaults instead
            acts.append(lambda x, i=i: i ** x)   # Remember current i
        return acts
>>> acts = makeActions()
>>> acts[0](2) # 0 ** 2
0
>>> acts[1](2) # 1 ** 2
1
>>> acts[2](2) # 2 ** 2
4
>>> acts[4](2) # 4 ** 2
16
This seems an implementation artifact that is prone to change, and may become more
important as you start writing larger programs. We’ll talk more about defaults in
Chapter 18 and lambdas in Chapter 19, so you may also want to return and review this
section later.3
Arbitrary scope nesting
Before ending this discussion, we should note that scopes may nest arbitrarily, but only
enclosing function def statements (not classes, described in Part VI) are searched when
names are referenced:
>>> def f1():
        x = 99
        def f2():
            def f3():
                print(x)       # Found in f1's local scope!
            f3()
        f2()
>>> f1()
99
Python will search the local scopes of all enclosing defs, from inner to outer, after the
referencing function’s local scope and before the module’s global scope or built-ins.
However, this sort of code is even less likely to pop up in practice. Again, in Python,
we say flat is better than nested, and this still holds generally true even with the addition
of nested scope closures. Except in limited contexts, your life (and the lives of your
coworkers) will generally be better if you minimize nested function definitions.

3. In the section “Function Gotchas” on page 656, we’ll also see that there is a similar issue with using
mutable objects like lists and dictionaries for default arguments (e.g., def f(a=[]))—because defaults are
implemented as single objects attached to functions, mutable defaults retain state from call to call, rather
than being initialized anew on each call. Depending on whom you ask, this is either considered a feature
that supports another way to implement state retention, or a strange corner of the language; more on this
at the end of Chapter 21.
The nonlocal Statement in 3.X
In the prior section we explored the way that nested functions can reference variables
in an enclosing function’s scope, even if that function has already returned. It turns out
that, in Python 3.X (though not in 2.X), we can also change such enclosing scope vari-
ables, as long as we declare them in nonlocal statements. With this statement, nested
defs can have both read and write access to names in enclosing functions. This makes
nested scope closures more useful, by providing changeable state information.
The nonlocal statement is similar in both form and role to global, covered earlier. Like
global, nonlocal declares that a name will be changed in an enclosing scope. Unlike
global, though, nonlocal applies to a name in an enclosing function’s scope, not the
global module scope outside all defs. Also unlike global, nonlocal names must already
exist in the enclosing function’s scope when declared—they can exist only in enclosing
functions and cannot be created by a first assignment in a nested def.
In other words, nonlocal both allows assignment to names in enclosing function scopes
and limits scope lookups for such names to enclosing defs. The net effect is a more
direct and reliable implementation of changeable state information, for contexts that
do not desire or need classes with attributes, inheritance, and multiple behaviors.
nonlocal Basics
Python 3.X introduces a new nonlocal statement, which has meaning only inside a
function:
def func():
    nonlocal name1, name2, ...       # OK here

>>> nonlocal X
SyntaxError: nonlocal declaration not allowed at module level
This statement allows a nested function to change one or more names defined in a
syntactically enclosing function’s scope. In Python 2.X, when one function def is nested
in another, the nested function can reference any of the names defined by assignment
in the enclosing def’s scope, but it cannot change them. In 3.X, declaring the enclosing
scopes’ names in a nonlocal statement enables nested functions to assign and thus
change such names as well.
This provides a way for enclosing functions to provide writeable state information,
remembered when the nested function is later called. Allowing the state to change
makes it more useful to the nested function (imagine a counter in the enclosing scope,
for instance). In 2.X, programmers usually achieve similar goals by using classes or
other schemes. Because nested functions have become a more common coding pattern
for state retention, though, nonlocal makes it more generally applicable.
Besides allowing names in enclosing defs to be changed, the nonlocal statement also
forces the issue for references—much like the global statement, nonlocal causes
searches for the names listed in the statement to begin in the enclosing defs’ scopes,
not in the local scope of the declaring function. That is, nonlocal also means “skip my
local scope entirely.”
In fact, the names listed in a nonlocal must have been previously defined in an enclosing
def when the nonlocal is reached, or an error is raised. The net effect is much like global:
global means the names reside in the enclosing module, and nonlocal means they reside
in an enclosing def. nonlocal is even more strict, though—scope search is restricted to
only enclosing defs. That is, nonlocal names can appear only in enclosing defs, not in
the module’s global scope or built-in scopes outside the defs.
The addition of nonlocal does not alter name reference scope rules in general; they still
work as before, per the “LEGB” rule described earlier. The nonlocal statement mostly
serves to allow names in enclosing scopes to be changed rather than just referenced.
However, both global and nonlocal statements do tighten up and even restrict the
lookup rules somewhat, when coded in a function:
• global makes scope lookup begin in the enclosing module’s scope and allows names there to be assigned. Scope lookup continues on to the built-in scope if the name does not exist in the module, but assignments to global names always create or change them in the module’s scope.
• nonlocal restricts scope lookup to just enclosing defs, requires that the names already exist there, and allows them to be assigned. Scope lookup does not continue on to the global or built-in scopes.
In Python 2.X, references to enclosing def scope names are allowed, but not assignment.
However, you can still use classes with explicit attributes to achieve the same change-
able state information effect as nonlocals (and you may be better off doing so in some
contexts); globals and function attributes can sometimes accomplish similar goals as
well. More on this in a moment; first, let’s turn to some working code to make this
more concrete.
nonlocal in Action
On to some examples, all run in 3.X. References to enclosing def scopes work in 3.X as
they do in 2.X—in the following, tester builds and returns the function nested, to be
called later, and the state reference in nested maps the local scope of tester using the
normal scope lookup rules:
C:\code> c:\python33\python
>>> def tester(start):
        state = start                # Referencing nonlocals works normally
        def nested(label):
            print(label, state)      # Remembers state in enclosing scope
        return nested

>>> F = tester(0)
>>> F('spam')
spam 0
>>> F('ham')
ham 0
Changing a name in an enclosing def’s scope is not allowed by default, though; this is
the normal case in 2.X as well:
>>> def tester(start):
        state = start
        def nested(label):
            print(label, state)
            state += 1               # Cannot change by default (never in 2.X)
        return nested
>>> F = tester(0)
>>> F('spam')
UnboundLocalError: local variable 'state' referenced before assignment
Using nonlocal for changes
Now, under 3.X, if we declare state in the tester scope as nonlocal within nested, we
get to change it inside the nested function, too. This works even though tester has
returned and exited by the time we call the returned nested function through the name
F:
>>> def tester(start):
        state = start                # Each call gets its own state
        def nested(label):
            nonlocal state           # Remembers state in enclosing scope
            print(label, state)
            state += 1               # Allowed to change it if nonlocal
        return nested
>>> F = tester(0)
>>> F('spam') # Increments state on each call
spam 0
>>> F('ham')
ham 1
>>> F('eggs')
eggs 2
As usual with enclosing scope references, we can call the tester factory (closure) func-
tion multiple times to get multiple copies of its state in memory. The state object in
the enclosing scope is essentially attached to the nested function object returned; each
call makes a new, distinct state object, such that updating one function’s state won’t
impact the other. The following continues the prior listing’s interaction:
>>> G = tester(42) # Make a new tester that starts at 42
>>> G('spam')
spam 42
>>> G('eggs') # My state information updated to 43
eggs 43
>>> F('bacon') # But F's is where it left off: at 3
bacon 3 # Each call has different state information
In this sense, Python’s nonlocals are more functional than function locals typical in
some other languages: in a closure function, nonlocals are per-call, multiple copy data.
Boundary cases
Though useful, nonlocals come with some subtleties to be aware of. First, unlike the
global statement, nonlocal names really must have previously been assigned in an en-
closing def’s scope when a nonlocal is evaluated, or else you’ll get an error—you cannot
create them dynamically by assigning them anew in the enclosing scope. In fact, they
are checked at function definition time before either an enclosing or nested function is
called:
>>> def tester(start):
        def nested(label):
            nonlocal state           # Nonlocals must already exist in enclosing def!
            state = 0
            print(label, state)
        return nested

SyntaxError: no binding for nonlocal 'state' found

>>> def tester(start):
        def nested(label):
            global state             # Globals don't have to exist yet when declared
            state = 0                # This creates the name in the module now
            print(label, state)
        return nested
>>> F = tester(0)
>>> F('abc')
abc 0
>>> state
0
Second, nonlocal restricts the scope lookup to just enclosing defs; nonlocals are not
looked up in the enclosing module’s global scope or the built-in scope outside all
defs, even if they are already there:
>>> spam = 99
>>> def tester():
        def nested():
            nonlocal spam            # Must be in a def, not the module!
            print('Current=', spam)
            spam += 1
        return nested

SyntaxError: no binding for nonlocal 'spam' found
These restrictions make sense once you realize that Python would not otherwise gen-
erally know which enclosing scope to create a brand-new name in. In the prior listing,
should spam be assigned in tester, or the module outside? Because this is ambiguous,
Python must resolve nonlocals at function creation time, not function call time.
Why nonlocal? State Retention Options
Given the extra complexity of nested functions, you might wonder what the fuss is
about. Although it’s difficult to see in our small examples, state information becomes
crucial in many programs. While functions can return results, their local variables won’t
normally retain other values that must live on between calls. Moreover, many appli-
cations require such values to differ per context of use.
As mentioned earlier, there are a variety of ways to “remember” information across
function and method calls in Python. While there are tradeoffs for all, nonlocal does
improve this story for enclosing scope references—the nonlocal statement allows mul-
tiple copies of changeable state to be retained in memory. It addresses simple state-
retention needs where classes may not be warranted and global variables do not apply,
though function attributes can often serve similar roles more portably. Let’s review the
options to see how they stack up.
State with nonlocal: 3.X only
As we saw in the prior section, the following code allows state to be retained and
modified in an enclosing scope. Each call to tester creates a self-contained package of
changeable information, whose names do not clash with any other part of the program:
>>> def tester(start):
        state = start                # Each call gets its own state
        def nested(label):
            nonlocal state           # Remembers state in enclosing scope
            print(label, state)
            state += 1               # Allowed to change it if nonlocal
        return nested
>>> F = tester(0)
>>> F('spam') # State visible within closure only
spam 0
>>> F.state
AttributeError: 'function' object has no attribute 'state'
We need to declare variables nonlocal only if they must be changed (other enclosing
scope name references are automatically retained as usual), and nonlocal names are
still not visible outside the enclosing function.
Unfortunately, this code works in Python 3.X only. If you are using Python 2.X, other
options are available, depending on your goals. The next three sections present some
alternatives. Some of the code in these sections uses tools we haven’t covered yet and
is intended partially as preview, but we’ll keep the examples simple here so that you
can compare and contrast along the way.
State with Globals: A Single Copy Only
One common prescription for achieving the nonlocal effect in 2.X and earlier is to
simply move the state out to the global scope (the enclosing module):
>>> def tester(start):
        global state                 # Move it out to the module to change it
        state = start                # global allows changes in module scope
        def nested(label):
            global state
            print(label, state)
            state += 1
        return nested
>>> F = tester(0)
>>> F('spam') # Each call increments shared global state
spam 0
>>> F('eggs')
eggs 1
This works in this case, but it requires global declarations in both functions and is
prone to name collisions in the global scope (what if “state” is already being used?). A
worse, and more subtle, problem is that it only allows for a single shared copy of the
state information in the module scope—if we call tester again, we’ll wind up resetting
the module’s state variable, such that prior calls will see their state overwritten:
>>> G = tester(42) # Resets state's single copy in global scope
>>> G('toast')
toast 42
>>> G('bacon')
bacon 43
>>> F('ham') # But my counter has been overwritten!
ham 44
As shown earlier, when you are using nonlocal and nested function closures instead of
global, each call to tester remembers its own unique copy of the state object.
State with Classes: Explicit Attributes (Preview)
The other prescription for changeable state information in 2.X and earlier is to use
classes with attributes to make state information access more explicit than the implicit
magic of scope lookup rules. As an added benefit, each instance of a class gets a fresh
copy of the state information, as a natural byproduct of Python’s object model. Classes
also support inheritance, multiple behaviors, and other tools.
We haven’t explored classes in detail yet, but as a brief preview for comparison, the
following is a reformulation of the earlier tester/nested functions as a class, which
records state in objects explicitly as they are created. To make sense of this code, you
need to know that a def within a class like this works exactly like a normal def, except
that the function’s self argument automatically receives the implied subject of the call
(an instance object created by calling the class itself). The function named __init__ is
run automatically when the class is called:
>>> class tester:                          # Class-based alternative (see Part VI)
        def __init__(self, start):         # On object construction,
            self.state = start             # save state explicitly in new object
        def nested(self, label):
            print(label, self.state)       # Reference state explicitly
            self.state += 1                # Changes are always allowed
>>> F = tester(0) # Create instance, invoke __init__
>>> F.nested('spam') # F is passed to self
spam 0
>>> F.nested('ham')
ham 1
In classes, we save every attribute explicitly, whether it’s changed or just referenced,
and they are available outside the class. As for nested functions and nonlocal, the class
alternative supports multiple copies of the retained data:
>>> G = tester(42) # Each instance gets new copy of state
>>> G.nested('toast') # Changing one does not impact others
toast 42
>>> G.nested('bacon')
bacon 43
>>> F.nested('eggs') # F's state is where it left off
eggs 2
>>> F.state # State may be accessed outside class
3
With just slightly more magic—which we’ll delve into later in this book—we could
also make our class objects look like callable functions using operator overloading.
__call__ intercepts direct calls on an instance, so we don’t need to call a named method:
>>> class tester:
        def __init__(self, start):
            self.state = start
        def __call__(self, label):         # Intercept direct instance calls
            print(label, self.state)       # So .nested() not required
            self.state += 1
>>> H = tester(99)
>>> H('juice') # Invokes __call__
juice 99
>>> H('pancakes')
pancakes 100
Don’t sweat the details in this code too much at this point in the book; it’s mostly a
preview, intended for general comparison to closures only. We’ll explore classes in
depth in Part VI, and will look at specific operator overloading tools like __call__ in
Chapter 30. The point to notice here is that classes can make state information more
obvious, by leveraging explicit attribute assignment instead of implicit scope lookups.
In addition, class attributes are always changeable and don’t require a nonlocal state-
ment, and classes are designed to scale up to implementing richer objects with many
attributes and behaviors.
While using classes for state information is generally a good rule of thumb to follow,
they might also be overkill in cases like this, where state is a single counter. Such trivial
state cases are more common than you might think; in such contexts, nested defs are
sometimes more lightweight than coding classes, especially if you’re not familiar with
OOP yet. Moreover, there are some scenarios in which nested defs may actually work
better than classes—stay tuned for the description of method decorators in Chap-
ter 39 for an example that is far beyond this chapter’s already well-stretched scope!
State with Function Attributes: 3.X and 2.X
As a portable and often simpler state-retention option, we can also sometimes achieve
the same effect as nonlocals with function attributes—user-defined names attached to
functions directly. When you attach user-defined attributes to nested functions gener-
ated by enclosing factory functions, they can also serve as per-call, multiple copy, and
writeable state, just like nonlocal scope closures and class attributes. Such user-defined
attribute names won’t clash with names Python creates itself, and as for nonlocal, need
be used only for state variables that must be changed; other scope references are retained
and work normally.
Crucially, this scheme is portable—like classes, but unlike nonlocal, function attributes
work in both Python 3.X and 2.X. In fact, they’ve been available since 2.1, much longer
than 3.X’s nonlocal. Because factory functions make a new function on each call any-
how, this does not require extra objects—the new function’s attributes become per-
call state in much the same way as nonlocals, and are similarly associated with the
generated function in memory.
Moreover, function attributes allow state variables to be accessed outside the nested
function, like class attributes; with nonlocal, state variables can be seen directly only
within the nested def. If you need to access a call counter externally, it’s a simple func-
tion attribute fetch in this model.
Here’s a final version of our example based on this technique—it replaces a nonlocal
with an attribute attached to the nested function. This scheme may not seem as intuitive
to some at first glance; you access state through the function’s name instead of as simple
variables, and must initialize after the nested def. Still, it’s far more portable, allows
state to be accessed externally, and saves a line by not requiring a nonlocal declaration:
>>> def tester(start):
        def nested(label):
            print(label, nested.state)     # nested is in enclosing scope
            nested.state += 1              # Change attr, not nested itself
        nested.state = start               # Initial state after func defined
        return nested
>>> F = tester(0)
>>> F('spam') # F is a 'nested' with state attached
spam 0
>>> F('ham')
ham 1
>>> F.state # Can access state outside functions too
2
Because each call to the outer function produces a new nested function object, this
scheme supports multiple copy per-call changeable data just like nonlocal closures and
classes—a usage mode that global variables cannot provide:
>>> G = tester(42) # G has own state, doesn't overwrite F's
>>> G('eggs')
eggs 42
>>> F('ham')
ham 2
>>> F.state # State is accessible and per-call
3
>>> G.state
43
>>> F is G # Different function objects
False
This code relies on the fact that the function name nested is a local variable in the
tester scope enclosing nested; as such, it can be referenced freely inside nested. This
code also relies on the fact that changing an object in place is not an assignment to a
name; when it increments nested.state, it is changing part of the object nested refer-
ences, not the name nested itself. Because we’re not really assigning a name in the
enclosing scope, no nonlocal declaration is required.
Function attributes are supported in both Python 3.X and 2.X; we’ll explore them
further in Chapter 19. Importantly, we’ll see there that Python uses naming conventions
in both 2.X and 3.X that ensure that the arbitrary names you assign as function at-
tributes won’t clash with names related to internal implementation, making the name-
space equivalent to a scope. Subjective factors aside, function attributes’ utility does
overlap with the newer nonlocal in 3.X, making the latter technically redundant and
far less portable.
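To see that namespace separation at work, here's a small sketch: the names Python
itself attaches to function objects all follow the double-underscore convention, so
arbitrary attributes stand apart:

>>> def f(): pass
>>> f.count = 0                                     # User-defined attribute
>>> [x for x in dir(f) if not x.startswith('__')]   # Only our names remain
['count']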
State with mutables: Obscure ghost of Pythons past?
On a related note, it’s also possible to change a mutable object in the enclosing scope
in 2.X and 3.X without declaring its name nonlocal. The following, for example, works
the same as the previous version, is just as portable, and provides changeable per-call
state:
def tester(start):
    def nested(label):
        print(label, state[0])     # Leverage in-place mutable change
        state[0] += 1              # Extra syntax, deep magic?
    state = [start]
    return nested
This leverages the mutability of lists, and like function attributes, relies on the fact that
in-place object changes do not classify a name as local. This is perhaps more obscure
than either function attributes or 3.X’s nonlocal, though—a technique that predates
even function attributes, and seems to lie today somewhere on the spectrum from clever
hack to dark magic! You’re probably better off using named function attributes than
lists and numeric offsets this way, though this may show up in code you must use.
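To prove it to yourself, a quick session like the following should behave just like the
function attribute version shown earlier; this simply runs the tester just defined, so
nothing new is assumed here:

>>> F = tester(0)
>>> F('spam')                      # State lives in the enclosing scope's list
spam 0
>>> F('ham')
ham 1
>>> G = tester(42)                 # Each call makes a new, independent list
>>> G('eggs')
eggs 42
>>> F('bacon')                     # F's state is unaffected by G's
bacon 2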
To summarize: globals, nonlocals, classes, and function attributes all offer changeable
state-retention options. Globals support only single-copy shared data; nonlocals can
be changed in 3.X only; classes require a basic knowledge of OOP; and both classes
and function attributes provide portable solutions that allow state to be accessed di-
rectly from outside the stateful callable object itself. As usual, the best tool for your
program depends upon your program’s goals.
We’ll revisit all the state options introduced here in Chapter 39 in a more realistic
context—decorators, a tool that by nature involves multilevel state retention. State
options have additional selection factors (e.g., performance), which we’ll have to leave
unexplored here for space (we’ll learn how to time code speed in Chapter 21). For now,
it’s time to move on to explore argument passing modes.
Why You Will Care: Customizing open
For another example of closures at work, consider changing the built-in open call to a
custom version, as suggested in this chapter’s earlier sidebar “Breaking the Universe in
Python 2.X” on page 494. If the custom version needs to call the original, it must save
it before changing it, and retain it for later use—a classic state retention scenario.
Moreover, if we wish to support multiple customizations to the same function, globals
won’t do: we need per-customizer state.
The following, coded for Python 3.X in file makeopen.py, is one way to achieve this (in
2.X, change the built-in scope name and prints). It uses a nested scope closure to
remember a value for later use, without relying on global variables (which can clash
and allow just one value) and without using a class (which may require more code than
is warranted here):
import builtins
def makeopen(id):
    original = builtins.open
    def custom(*kargs, **pargs):
        print('Custom open call %r:' % id, kargs, pargs)
        return original(*kargs, **pargs)
    builtins.open = custom
To change open for every module in a process, this code reassigns it in the built-in scope
to a custom version coded with a nested def, after saving the original in the enclosing
scope so the customization can call it later. This code is also partly a preview, as it
relies on starred-argument forms to collect and later unpack arbitrary positional and
keyword arguments meant for open—a topic coming up in the next chapter. Much of
the magic here, though, is nested scope closures: the custom open found by the scope
lookup rules retains the original for later use:
>>> F = open('script2.py') # Call built-in open in builtins
>>> F.read()
'import sys\nprint(sys.path)\nx = 2\nprint(x ** 32)\n'
>>> from makeopen import makeopen # Import open resetter function
>>> makeopen('spam') # Custom open calls built-in open
>>> F = open('script2.py') # Call custom open in builtins
Custom open call 'spam': ('script2.py',) {}
>>> F.read()
'import sys\nprint(sys.path)\nx = 2\nprint(x ** 32)\n'
Because each customization remembers the former built-in scope version in its own
enclosing scope, they can even be nested naturally in ways that global variables cannot
support—each call to the makeopen closure function remembers its own versions of id
and original, so multiple customizations may be run:
>>> makeopen('eggs') # Nested customizers work too!
>>> F = open('script2.py') # Because each retains own state
Custom open call 'eggs': ('script2.py',) {}
Custom open call 'spam': ('script2.py',) {}
>>> F.read()
'import sys\nprint(sys.path)\nx = 2\nprint(x ** 32)\n'
As is, our function simply adds possibly nested call tracing to a built-in function, but
the general technique may have other applications. A class-based equivalent may
require more code because it would need to save the id and original values explicitly
in object attributes; it also requires more background knowledge than we yet have, so
consider this a Part VI preview only:
import builtins

class makeopen:                            # See Part VI: call catches self()
    def __init__(self, id):
        self.id = id
        self.original = builtins.open
        builtins.open = self
    def __call__(self, *kargs, **pargs):
        print('Custom open call %r:' % self.id, kargs, pargs)
        return self.original(*kargs, **pargs)
The point to notice here is that classes may be more explicit but also may take extra
code when state retention is the only goal. We’ll see additional closure use cases later,
especially when exploring decorators in Chapter 39, where we’ll find that closures are
actually preferred to classes in certain roles.
Chapter Summary
In this chapter, we studied one of two key concepts related to functions: scopes, which
determine how variables are looked up when used. As we learned, variables are con-
sidered local to the function definitions in which they are assigned, unless they are
specifically declared to be global or nonlocal. We also explored some more advanced
scope concepts here, including nested function scopes and function attributes. Finally,
we looked at some general design ideas, such as the need to avoid globals and cross-
file changes.
In the next chapter, we’re going to continue our function tour with the second key
function-related concept: argument passing. As we’ll find, arguments are passed into
a function by assignment, but Python also provides tools that allow functions to be
flexible in how items are passed. Before we move on, let’s take this chapter’s quiz to
review the scope concepts we’ve covered here.
Test Your Knowledge: Quiz
1. What is the output of the following code, and why?
>>> X = 'Spam'
>>> def func():
        print(X)

>>> func()
2. What is the output of this code, and why?
>>> X = 'Spam'
>>> def func():
        X = 'NI!'

>>> func()
>>> print(X)
3. What does this code print, and why?
>>> X = 'Spam'
>>> def func():
        X = 'NI'
        print(X)

>>> func()
>>> print(X)
4. What output does this code produce? Why?
>>> X = 'Spam'
>>> def func():
        global X
        X = 'NI'

>>> func()
>>> print(X)
5. What about this code—what’s the output, and why?
>>> X = 'Spam'
>>> def func():
        X = 'NI'
        def nested():
            print(X)
        nested()

>>> func()
>>> X
6. How about this example: what is its output in Python 3.X, and why?
>>> def func():
        X = 'NI'
        def nested():
            nonlocal X
            X = 'Spam'
        nested()
        print(X)

>>> func()
7. Name three or more ways to retain state information in a Python function.
Test Your Knowledge: Answers
1. The output here is 'Spam', because the function references a global variable in the
enclosing module (because it is not assigned in the function, it is considered global).
2. The output here is 'Spam' again because assigning the variable inside the function
makes it a local and effectively hides the global of the same name. The print state-
ment finds the variable unchanged in the global (module) scope.
3. It prints 'NI' on one line and 'Spam' on another, because the reference to the vari-
able within the function finds the assigned local and the reference in the print
statement finds the global.
4. This time it just prints 'NI' because the global declaration forces the variable as-
signed inside the function to refer to the variable in the enclosing global scope.
5. The output in this case is again 'NI' on one line and 'Spam' on another, because
the print statement in the nested function finds the name in the enclosing func-
tion’s local scope, and the print at the end finds the variable in the global scope.
6. This example prints 'Spam', because the nonlocal statement (available in Python
3.X but not 2.X) means that the assignment to X inside the nested function changes
X in the enclosing function’s local scope. Without this statement, this assignment
would classify X as local to the nested function, making it a different variable; the
code would then print 'NI' instead.
7. Although the values of local variables go away when a function returns, you can
make a Python function retain state information by using shared global variables,
enclosing function scope references within nested functions, or using default ar-
gument values. Function attributes can sometimes allow state to be attached to the
function itself, instead of looked up in scopes. Another alternative, using classes
and OOP, sometimes supports state retention better than any of the scope-based
techniques because it makes it explicit with attribute assignments; we’ll explore
this option in Part VI.
CHAPTER 18
Arguments
Chapter 17 explored the details behind Python’s scopes—the places where variables
are defined and looked up. As we learned, the place where a name is defined in our
code determines much of its meaning. This chapter continues the function story by
studying the concepts in Python argument passing—the way that objects are sent to
functions as inputs. As we’ll see, arguments (a.k.a. parameters) are assigned to names
in a function, but they have more to do with object references than with variable scopes.
We’ll also find that Python provides extra tools, such as keywords, defaults, and arbi-
trary argument collectors and extractors that allow for wide flexibility in the way ar-
guments are sent to a function, and we’ll put them to work in examples.
Argument-Passing Basics
Earlier in this part of the book, I noted that arguments are passed by assignment. This
has a few ramifications that aren’t always obvious to newcomers, which I’ll expand on
in this section. Here is a rundown of the key points in passing arguments to functions:
Arguments are passed by automatically assigning objects to local variable
names. Function arguments—references to (possibly) shared objects sent by the
caller—are just another instance of Python assignment at work. Because references
are implemented as pointers, all arguments are, in effect, passed by pointer. Objects
passed as arguments are never automatically copied.
Assigning to argument names inside a function does not affect the caller.
Argument names in the function header become new, local names when the func-
tion runs, in the scope of the function. There is no aliasing between function ar-
gument names and variable names in the scope of the caller.
Changing a mutable object argument in a function may impact the caller.
On the other hand, as arguments are simply assigned to passed-in objects, func-
tions can change passed-in mutable objects in place, and the results may affect the
caller. Mutable arguments can be input and output for functions.
For more details on references, see Chapter 6; everything we learned there also applies
to function arguments, though the assignment to argument names is automatic and
implicit.
Python’s pass-by-assignment scheme isn’t quite the same as C++’s reference parame-
ters option, but it turns out to be very similar to the argument-passing model of the C
language (and others) in practice:
Immutable arguments are effectively passed “by value.” Objects such as in-
tegers and strings are passed by object reference instead of by copying, but because
you can’t change immutable objects in place anyhow, the effect is much like making
a copy.
Mutable arguments are effectively passed “by pointer.” Objects such as lists
and dictionaries are also passed by object reference, which is similar to the way C
passes arrays as pointers—mutable objects can be changed in place in the function,
much like C arrays.
Of course, if you’ve never used C, Python’s argument-passing mode will seem simpler
still—it involves just the assignment of objects to names, and it works the same whether
the objects are mutable or not.
Arguments and Shared References
To illustrate argument-passing properties at work, consider the following code:
>>> def f(a):                 # a is assigned to (references) the passed object
        a = 99                # Changes local variable a only

>>> b = 88
>>> f(b)                      # a and b both reference same 88 initially
>>> print(b)                  # b is not changed
88
In this example the variable a is assigned the object 88 at the moment the function is
called with f(b), but a lives only within the called function. Changing a inside the
function has no effect on the place where the function is called; it simply resets the local
variable a to a completely different object.
That’s what is meant by a lack of name aliasing—assignment to an argument name
inside a function (e.g., a=99) does not magically change a variable like b in the scope of
the function call. Argument names may share passed objects initially (they are essen-
tially pointers to those objects), but only temporarily, when the function is first called.
As soon as an argument name is reassigned, this relationship ends.
At least, that’s the case for assignment to argument names themselves. When arguments
are passed mutable objects like lists and dictionaries, we also need to be aware that in-
place changes to such objects may live on after a function exits, and hence impact callers.
Here’s an example that demonstrates this behavior:
>>> def changer(a, b):        # Arguments assigned references to objects
        a = 2                 # Changes local name's value only
        b[0] = 'spam'         # Changes shared object in place

>>> X = 1
>>> L = [1, 2] # Caller:
>>> changer(X, L) # Pass immutable and mutable objects
>>> X, L # X is unchanged, L is different!
(1, ['spam', 2])
In this code, the changer function assigns values to argument a itself, and to a compo-
nent of the object referenced by argument b. These two assignments within the function
are only slightly different in syntax but have radically different results:
Because a is a local variable name in the function’s scope, the first assignment has
no effect on the caller—it simply changes the local variable a to reference a com-
pletely different object, and does not change the binding of the name X in the caller’s
scope. This is the same as in the prior example.
Argument b is a local variable name, too, but it is passed a mutable object (the list
that L references in the caller’s scope). As the second assignment is an in-place
object change, the result of the assignment to b[0] in the function impacts the value
of L after the function returns.
Really, the second assignment statement in changer doesn’t change b—it changes part
of the object that b currently references. This in-place change impacts the caller only
because the changed object outlives the function call. The name L hasn’t changed either
—it still references the same, changed object—but it seems as though L differs after the
call because the value it references has been modified within the function. In effect, the
list name L serves as both input to and output from the function.
Figure 18-1 illustrates the name/object bindings that exist immediately after the func-
tion has been called, and before its code has run.
If this example is still confusing, it may help to notice that the effect of the automatic
assignments of the passed-in arguments is the same as running a series of simple as-
signment statements. In terms of the first argument, the assignment has no effect on
the caller:
>>> X = 1
>>> a = X # They share the same object
>>> a = 2 # Resets 'a' only, 'X' is still 1
>>> print(X)
1
The assignment through the second argument does affect a variable at the call, though,
because it is an in-place object change:
>>> L = [1, 2]
>>> b = L # They share the same object
>>> b[0] = 'spam' # In-place change: 'L' sees the change too
>>> print(L)
['spam', 2]
Figure 18-1. References: arguments. Because arguments are passed by assignment, argument names
in the function may share objects with variables in the scope of the call. Hence, in-place changes to
mutable arguments in a function can impact the caller. Here, a and b in the function initially reference
the objects referenced by variables X and L when the function is first called. Changing the list through
variable b makes L appear different after the call returns.
If you recall our discussions about shared mutable objects in Chapter 6 and Chap-
ter 9, you’ll recognize the phenomenon at work: changing a mutable object in place
can impact other references to that object. Here, the effect is to make one of the argu-
ments work like both an input and an output of the function.
Avoiding Mutable Argument Changes
This behavior of in-place changes to mutable arguments isn’t a bug—it’s simply the
way argument passing works in Python, and turns out to be widely useful in practice.
Arguments are normally passed to functions by reference because that is what we nor-
mally want. It means we can pass large objects around our programs without making
multiple copies along the way, and we can easily update these objects as we go. In fact,
as we’ll see in Part VI, Python’s class model depends upon changing a passed-in “self”
argument in place, to update object state.
If we don’t want in-place changes within functions to impact objects we pass to them,
though, we can simply make explicit copies of mutable objects, as we learned in Chap-
ter 6. For function arguments, we can always copy the list at the point of call, with tools
like list, list.copy as of 3.3, or an empty slice:
L = [1, 2]
changer(X, L[:]) # Pass a copy, so our 'L' does not change
We can also copy within the function itself, if we never want to change passed-in ob-
jects, regardless of how the function is called:
def changer(a, b):
    b = b[:]                  # Copy input list so we don't impact caller
    a = 2
    b[0] = 'spam'             # Changes our list copy only
Neither of these copying schemes stops the function from changing the object; they
just prevent those changes from impacting the caller. To really prevent changes, we can
always convert to immutable objects to force the issue. Tuples, for example, raise an
exception when changes are attempted:
L = [1, 2]
changer(X, tuple(L)) # Pass a tuple, so changes are errors
This scheme uses the built-in tuple function, which builds a new tuple out of all the
items in a sequence (really, any iterable). It’s also something of an extreme—because
it forces the function to be written to never change passed-in arguments, this solution
might impose more limitations on the function than it should, and so should generally
be avoided (you never know when changing arguments might come in handy for other
calls in the future). Using this technique will also make the function lose the ability to
call any list-specific methods on the argument, including methods that do not change
the object in place.
The main point to remember here is that functions might update mutable objects like
lists and dictionaries passed into them. This isn’t necessarily a problem if it’s expected,
and often serves useful purposes. Moreover, functions that change passed-in mutable
objects in place are probably designed and intended to do so—the change is likely part
of a well-defined API that you shouldn’t violate by making copies.
However, you do have to be aware of this property—if objects change out from under
you unexpectedly, check whether a called function might be responsible, and make
copies when objects are passed if needed.
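On a related note, a top-level copy like L[:] copies only the outer object; if mutable
objects nest inside it, the nested parts are still shared with the caller. The following is
a minimal sketch of the difference, assuming the standard library’s copy module; the
deepchanger function here is invented for illustration only:

import copy

def deepchanger(b):                 # Hypothetical: changes a nested list in place
    b[0][0] = 'spam'

L = [[1, 2], [3, 4]]
deepchanger(L[:])                   # Top-level copy: nested lists still shared!
print(L)                            # [['spam', 2], [3, 4]]

L = [[1, 2], [3, 4]]
deepchanger(copy.deepcopy(L))       # Fully independent copy: caller unaffected
print(L)                            # [[1, 2], [3, 4]]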
Simulating Output Parameters and Multiple Results
We’ve already discussed the return statement and used it in a few examples. Here’s
another way to use this statement: because return can send back any sort of object, it
can return multiple values by packaging them in a tuple or other collection type. In fact,
although Python doesn’t support what some languages label “call by reference” argu-
ment passing, we can usually simulate it by returning tuples and assigning the results
back to the original argument names in the caller:
>>> def multiple(x, y):
        x = 2                 # Changes local names only
        y = [3, 4]
        return x, y           # Return multiple new values in a tuple

>>> X = 1
>>> L = [1, 2]
>>> X, L = multiple(X, L) # Assign results to caller's names
>>> X, L
(2, [3, 4])
It looks like the code is returning two values here, but it’s really just one—a two-item
tuple with the optional surrounding parentheses omitted. After the call returns, we can
use tuple assignment to unpack the parts of the returned tuple. (If you’ve forgotten why
this works, flip back to “Tuples” in Chapter 4 and Chapter 9, and “Assignment State-
ments” in Chapter 11.) The net effect of this coding pattern is to both send back multiple
results and simulate the output parameters of other languages by explicit assignments.
Here, X and L change after the call, but only because the code said so.
Unpacking arguments in Python 2.X: The preceding example unpacks a
tuple returned by the function with tuple assignment. In Python 2.X,
it’s also possible to automatically unpack tuples in arguments passed
to a function. In 2.X (only), a function defined by this header:
def f((a, (b, c))):
can be called with tuples that match the expected structure: f((1, (2,
3))) assigns a, b, and c to 1, 2, and 3, respectively. Naturally, the passed
tuple can also be an object created before the call (f(T)). This def syntax
is no longer supported in Python 3.X. Instead, code this function as:
def f(T): (a, (b, c)) = T
to unpack in an explicit assignment statement. This explicit form works
in both 3.X and 2.X. Argument unpacking is reportedly an obscure and
rarely used feature in Python 2.X (except in code that uses it!). More-
over, a function header in 2.X supports only the tuple form of sequence
assignment; more general sequence assignments (e.g., def f((a, [b,
c])):) fail on syntax errors in 2.X as well and require the explicit as-
signment form mandated in 3.X. Conversely, arbitrary sequences in the
call successfully match tuples in the header (e.g., f((1, [2, 3])), f((1,
"ab"))).
Tuple unpacking argument syntax is also disallowed by 3.X in lambda
function argument lists: see the Chapter 20 sidebar “Why You Will
Care: List Comprehensions and map” on page 590 for a lambda un-
packing example. Somewhat asymmetrically, tuple unpacking assign-
ment is still automatic in 3.X for loop targets; see Chapter 13 for ex-
amples.
Special Argument-Matching Modes
As we’ve just seen, arguments are always passed by assignment in Python; names in the
def header are assigned to passed-in objects. On top of this model, though, Python
provides additional tools that alter the way the argument objects in a call are
matched with argument names in the header prior to assignment. These tools are all
optional, but they allow us to write functions that support more flexible calling pat-
terns, and you may encounter some libraries that require them.
By default, arguments are matched by position, from left to right, and you must pass
exactly as many arguments as there are argument names in the function header. How-
ever, you can also specify matching by name, provide default values, and use collectors
for extra arguments.
Argument Matching Basics
Before we go into the syntactic details, I want to stress that these special modes are
optional and deal only with matching objects to names; the underlying passing mech-
anism after the matching takes place is still assignment. In fact, some of these tools are
intended more for people writing libraries than for application developers. But because
you may stumble across these modes even if you don’t code them yourself, here’s a
synopsis of the available tools:
Positionals: matched from left to right
The normal case, which we’ve mostly been using so far, is to match passed argu-
ment values to argument names in a function header by position, from left to right.
Keywords: matched by argument name
Alternatively, callers can specify which argument in the function is to receive a
value by using the argument’s name in the call, with the name=value syntax.
Defaults: specify values for optional arguments that aren’t passed
Functions themselves can specify default values for arguments to receive if the call
passes too few values, again using the name=value syntax.
Varargs collecting: collect arbitrarily many positional or keyword arguments
Functions can use special arguments preceded with one or two * characters to
collect an arbitrary number of possibly extra arguments. This feature is often re-
ferred to as varargs, after a variable-length argument list tool in the C language; in
Python, the arguments are collected in a normal object.
Varargs unpacking: pass arbitrarily many positional or keyword arguments
Callers can also use the * syntax to unpack argument collections into separate
arguments. This is the inverse of a * in a function header—in the header it means
collect arbitrarily many arguments, while in the call it means unpack arbitrarily
many arguments, and pass them individually as discrete values.
Keyword-only arguments: arguments that must be passed by name
In Python 3.X (but not 2.X), functions can also specify arguments that must be
passed by name with keyword arguments, not by position. Such arguments are
typically used to define configuration options in addition to actual arguments.
Argument Matching Syntax
Table 18-1 summarizes the syntax that invokes the special argument-matching modes.
Table 18-1. Function argument-matching forms
Syntax                    Location    Interpretation
func(value)               Caller      Normal argument: matched by position
func(name=value)          Caller      Keyword argument: matched by name
func(*iterable)           Caller      Pass all objects in iterable as individual positional arguments
func(**dict)              Caller      Pass all key/value pairs in dict as individual keyword arguments
def func(name)            Function    Normal argument: matches any passed value by position or name
def func(name=value)      Function    Default argument value, if not passed in the call
def func(*name)           Function    Matches and collects remaining positional arguments in a tuple
def func(**name)          Function    Matches and collects remaining keyword arguments in a dictionary
def func(*other, name)    Function    Arguments that must be passed by keyword only in calls (3.X)
def func(*, name=value)   Function    Arguments that must be passed by keyword only in calls (3.X)
These special matching modes break down into function calls and definitions as fol-
lows:
In a function call (the first four rows of the table), simple values are matched by
position, but using the name=value form tells Python to match by name to argu-
ments instead; these are called keyword arguments. Using a *iterable or **dict in
a call allows us to package up arbitrarily many positional or keyword objects in
sequences (and other iterables) and dictionaries, respectively, and unpack them as
separate, individual arguments when they are passed to the function.
In a function header (the rest of the table), a simple name is matched by position or
name depending on how the caller passes it, but the name=value form specifies a
default value. The *name form collects any extra unmatched positional arguments
in a tuple, and the **name form collects extra keyword arguments in a dictionary.
In Python 3.X, any normal or defaulted argument names following a *name or a
bare * are keyword-only arguments and must be passed by keyword in calls.
Of these, keyword arguments and defaults are probably the most commonly used in
Python code. We’ve informally used both of these earlier in this book:
We’ve already used keywords to specify options to the 3.X print function, but they
are more general—keywords allow us to label any argument with its name, to make
calls more informational.
We met defaults earlier, too, as a way to pass in values from the enclosing function’s
scope, but they are also more general—they allow us to make any argument op-
tional, providing its default value in a function definition.
As we’ll see, the combination of defaults in a function header and keywords in a call
further allows us to pick and choose which defaults to override.
In short, special argument-matching modes let you be fairly liberal about how many
arguments must be passed to a function. If a function specifies defaults, they are used
if you pass too few arguments. If a function uses the * variable argument list forms, you
can seemingly pass too many arguments; the * names collect the extra arguments in
data structures for processing in the function.
The Gritty Details
If you choose to use and combine the special argument-matching modes, Python will
ask you to follow these ordering rules among the modes’ optional components:
In a function call, arguments must appear in this order: any positional arguments
(value); followed by a combination of any keyword arguments (name=value) and
the *iterable form; followed by the **dict form.
In a function header, arguments must appear in this order: any normal arguments
(name); followed by any default arguments (name=value); followed by the *name (or
* in 3.X) form; followed by any name or name=value keyword-only arguments (in
3.X); followed by the **name form.
In both the call and header, the **args form must appear last if present. If you mix
arguments in any other order, you will get a syntax error because the combinations can
be ambiguous. The steps that Python internally carries out to match arguments before
assignment can roughly be described as follows:
1. Assign nonkeyword arguments by position.
2. Assign keyword arguments by matching names.
3. Assign extra nonkeyword arguments to *name tuple.
4. Assign extra keyword arguments to **name dictionary.
5. Assign default values to unassigned arguments in header.
After this, Python checks to make sure each argument is passed just one value; if not,
an error is raised. When all matching is complete, Python assigns argument names to
the objects passed to them.
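To see a few of these steps in action, consider the following hedged sketch; the function
and values here are invented for illustration only:

>>> def f(a, b=2, *c, **d): print(a, b, c, d)

>>> f(1, 9, 10, 11, x=12)     # a, b by position; 10, 11 to *c; x to **d
1 9 (10, 11) {'x': 12}
>>> f(1, x=12)                # b unmatched after steps 1 through 4: default used
1 2 () {'x': 12}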
The actual matching algorithm Python uses is a bit more complex (it must also account
for keyword-only arguments in 3.X, for instance), so we’ll defer to Python’s standard
language manual for a more exact description. It’s not required reading, but tracing
Python’s matching algorithm may help you to understand some convoluted cases, es-
pecially when modes are mixed.
In Python 3.X only, argument names in a function header can also have
annotation values, specified as name:value (or name:value=default when
defaults are present). This is simply additional syntax for arguments and
does not augment or change the argument-ordering rules described
here. The function itself can also have an annotation value, given as def
f()->value. Python attaches annotation values to the function object.
See the discussion of function annotation in Chapter 19 for more details.
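As a brief and hedged preview of that discussion, the syntax looks like the following;
the annotation values chosen here are arbitrary, and the dictionary’s key order may vary:

>>> def f(a: 'any', b: int = 2) -> str:    # Annotate arguments and result
        return str(a + b)

>>> f(1)                                   # Calls are unaffected
'3'
>>> f.__annotations__                      # Python attaches values to the function
{'a': 'any', 'b': <class 'int'>, 'return': <class 'str'>}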
Keyword and Default Examples
This is all simpler in code than the preceding descriptions may imply. If you don’t use
any special matching syntax, Python matches names by position from left to right, like
most other languages. For instance, if you define a function that requires three argu-
ments, you must call it with three arguments:
>>> def f(a, b, c): print(a, b, c)
>>> f(1, 2, 3)
1 2 3
Here, we pass by position—a is matched to 1, b is matched to 2, and so on (this works
the same in Python 3.X and 2.X, but extra tuple parentheses are displayed in 2.X be-
cause we’re using 3.X print calls again).
Keywords
In Python, though, you can be more specific about what goes where when you call a
function. Keyword arguments allow us to match by name, instead of by position. Using
the same function:
>>> f(c=3, b=2, a=1)
1 2 3
The c=3 in this call, for example, means send 3 to the argument named c. More formally,
Python matches the name c in the call to the argument named c in the function defi-
nition’s header, and then passes the value 3 to that argument. The net effect of this call
is the same as that of the prior call, but notice that the left-to-right order of the argu-
ments no longer matters when keywords are used because arguments are matched by
name, not by position. It’s even possible to combine positional and keyword arguments
in a single call. In this case, all positionals are matched first from left to right in the
header, before keywords are matched by name:
>>> f(1, c=3, b=2) # a gets 1 by position, b and c passed by name
1 2 3
When most people see this the first time, they wonder why one would use such a tool.
Keywords typically have two roles in Python. First, they make your calls a bit more self-
documenting (assuming that you use better argument names than a, b, and c!). For
example, a call of this form:
func(name='Bob', age=40, job='dev')
is much more meaningful than a call with three naked values separated by commas,
especially in larger programs—the keywords serve as labels for the data in the call. The
second major use of keywords occurs in conjunction with defaults, which we turn to
next.
Defaults
We talked about defaults in brief earlier, when discussing nested function scopes. In
short, defaults allow us to make selected function arguments optional; if not passed a
value, the argument is assigned its default before the function runs. For example, here
is a function that requires one argument and provides defaults for two others:
>>> def f(a, b=2, c=3): print(a, b, c) # a required, b and c optional
When we call this function, we must provide a value for a, either by position or by
keyword; however, providing values for b and c is optional. If we don’t pass values to
b and c, they default to 2 and 3, respectively:
>>> f(1) # Use defaults
1 2 3
>>> f(a=1)
1 2 3
If we pass two values, only c gets its default, and with three values, no defaults are used:
>>> f(1, 4) # Override defaults
1 4 3
>>> f(1, 4, 5)
1 4 5
Finally, here is how the keyword and default features interact. Because they subvert the
normal left-to-right positional mapping, keywords allow us to essentially skip over
arguments with defaults:
>>> f(1, c=6) # Choose defaults
1 2 6
Here, a gets 1 by position, c gets 6 by keyword, and b, in between, defaults to 2.
Be careful not to confuse the special name=value syntax in a function header and a
function call; in the call it means a match-by-name keyword argument, while in the
header it specifies a default for an optional argument. In both cases, this is not an
assignment statement (despite its appearance); it is special syntax for these two con-
texts, which modifies the default argument-matching mechanics.
Combining keywords and defaults
Here is a slightly larger example that demonstrates keywords and defaults in action. In
the following, the caller must always pass at least two arguments (to match spam and
eggs), but the other two are optional. If they are omitted, Python assigns toast and
ham to the defaults specified in the header:
def func(spam, eggs, toast=0, ham=0):      # First 2 required
    print((spam, eggs, toast, ham))
func(1, 2) # Output: (1, 2, 0, 0)
func(1, ham=1, eggs=0) # Output: (1, 0, 0, 1)
func(spam=1, eggs=0) # Output: (1, 0, 0, 0)
func(toast=1, eggs=2, spam=3) # Output: (3, 2, 1, 0)
func(1, 2, 3, 4) # Output: (1, 2, 3, 4)
Notice again that when keyword arguments are used in the call, the order in which the
arguments are listed doesn’t matter; Python matches by name, not by position. The
caller must supply values for spam and eggs, but they can be matched by position or by
name. Again, keep in mind that the form name=value means different things in the call
and the def: a keyword in the call and a default in the header.
Beware mutable defaults: As footnoted in the prior chapter, if you code
a default to be a mutable object (e.g., def f(a=[])), the same, single
mutable object is reused every time the function is later called—even if
it is changed in place within the function. The net effect is that the ar-
gument’s default retains its value from the prior call, and is not reset to
its original value coded in the def header. To reset anew on each call,
move the assignment into the function body instead. Mutable defaults
allow state retention, but this is often a surprise. Since this is such a
common trap, we’ll postpone further exploration until this part’s
“gotchas” list at the end of Chapter 21.
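A short session demonstrates both the trap and the common workaround the note
describes; the None test here is a convention, not required syntax:

>>> def grow(x, a=[]):         # Default list is created once, at def time
        a.append(x)
        return a

>>> grow(1)
[1]
>>> grow(2)                    # Same list object: prior call's state lingers!
[1, 2]

>>> def grow(x, a=None):       # Workaround: move the assignment into the body
        if a is None: a = []
        a.append(x)
        return a

>>> grow(1)
[1]
>>> grow(2)                    # A fresh list on each call
[2]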
Arbitrary Arguments Examples
The last two matching extensions, * and **, are designed to support functions that take
any number of arguments. Both can appear in either the function definition or a function
call, and they have related purposes in the two locations.
Headers: Collecting arguments
The first use, in the function definition, collects unmatched positional arguments into
a tuple:
>>> def f(*args): print(args)
When this function is called, Python collects all the positional arguments into a new
tuple and assigns the variable args to that tuple. Because it is a normal tuple object, it
can be indexed, stepped through with a for loop, and so on:
>>> f()
()
>>> f(1)
(1,)
>>> f(1, 2, 3, 4)
(1, 2, 3, 4)
The ** feature is similar, but it only works for keyword arguments—it collects them
into a new dictionary, which can then be processed with normal dictionary tools. In a
sense, the ** form allows you to convert from keywords to dictionaries, which you can
then step through with keys calls, dictionary iterators, and the like (this is roughly what
the dict call does when passed keywords, but it returns the new dictionary):
>>> def f(**args): print(args)
>>> f()
{}
>>> f(a=1, b=2)
{'a': 1, 'b': 2}
Finally, function headers can combine normal arguments, the *, and the ** to imple-
ment wildly flexible call signatures. For instance, in the following, 1 is passed to a by
position, 2 and 3 are collected into the pargs positional tuple, and x and y wind up in
the kargs keyword dictionary:
>>> def f(a, *pargs, **kargs): print(a, pargs, kargs)
>>> f(1, 2, 3, x=1, y=2)
1 (2, 3) {'y': 2, 'x': 1}
Such code is rare, but shows up in functions that need to support multiple call patterns
(for backward compatibility, for instance). In fact, these features can be combined in
even more complex ways that may seem ambiguous at first glance—an idea we will
revisit later in this chapter. First, though, let’s see what happens when * and ** are
coded in function calls instead of definitions.
Calls: Unpacking arguments
In all recent Python releases, we can use the * syntax when we call a function, too. In
this context, its meaning is the inverse of its meaning in the function definition—it
unpacks a collection of arguments, rather than building a collection of arguments. For
example, we can pass four arguments to a function in a tuple and let Python unpack
them into individual arguments:
>>> def func(a, b, c, d): print(a, b, c, d)
>>> args = (1, 2)
>>> args += (3, 4)
>>> func(*args) # Same as func(1, 2, 3, 4)
1 2 3 4
Similarly, the ** syntax in a function call unpacks a dictionary of key/value pairs into
separate keyword arguments:
>>> args = {'a': 1, 'b': 2, 'c': 3}
>>> args['d'] = 4
>>> func(**args) # Same as func(a=1, b=2, c=3, d=4)
1 2 3 4
Again, we can combine normal, positional, and keyword arguments in the call in very
flexible ways:
>>> func(*(1, 2), **{'d': 4, 'c': 3}) # Same as func(1, 2, d=4, c=3)
1 2 3 4
>>> func(1, *(2, 3), **{'d': 4}) # Same as func(1, 2, 3, d=4)
1 2 3 4
>>> func(1, c=3, *(2,), **{'d': 4}) # Same as func(1, 2, c=3, d=4)
1 2 3 4
>>> func(1, *(2, 3), d=4) # Same as func(1, 2, 3, d=4)
1 2 3 4
>>> func(1, *(2,), c=3, **{'d':4}) # Same as func(1, 2, c=3, d=4)
1 2 3 4
This sort of code is convenient when you cannot predict the number of arguments that
will be passed to a function when you write your script; you can build up a collection
of arguments at runtime instead and call the function generically this way. Again, don’t
confuse the */** starred-argument syntax in the function header and the function call
—in the header it collects any number of arguments, while in the call it unpacks any
number of arguments. In both, one star means positionals, and two applies to key-
words.
As we saw in Chapter 14, the *pargs form in a call is an iteration con-
text, so technically it accepts any iterable object, not just tuples or other
sequences as shown in the examples here. For instance, a file object
works after the *, and unpacks its lines into individual arguments (e.g.,
func(*open('fname'))). Watch for additional examples of this utility in
Chapter 20, after we study generators.
This generality is supported in both Python 3.X and 2.X, but it holds
true only for calls—a *pargs in a call allows any iterable, but the same
form in a def header always bundles extra arguments into a tuple. This
header behavior is similar in spirit and syntax to the * in Python 3.X
extended sequence unpacking assignment forms we met in Chap-
ter 11 (e.g., x, *y = z), though that star usage always creates lists, not
tuples.
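For a quick hedged taste of this generality, any iterable works after a * in a call; this
sketch reuses the four-argument func defined earlier in this section:

>>> def func(a, b, c, d): print(a, b, c, d)

>>> func(*range(4))                      # Unpack an iterable, not just a sequence
0 1 2 3
>>> func(*(x ** 2 for x in range(4)))    # Generator expressions qualify too
0 1 4 9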
Applying functions generically
The prior section’s examples may seem academic (if not downright esoteric), but they
are used more often than you might expect. Some programs need to call arbitrary func-
tions in a generic fashion, without knowing their names or arguments ahead of time.
In fact, the real power of the special “varargs” call syntax is that you don’t need to know
how many arguments a function call requires before you write a script. For example,
you can use if logic to select from a set of functions and argument lists, and call any
of them generically (functions in some of the following examples are hypothetical):
if sometest:
    action, args = func1, (1,)           # Call func1 with one arg in this case
else:
    action, args = func2, (1, 2, 3)      # Call func2 with three args here
...etc...
action(*args)                            # Dispatch generically
This leverages both the * form, and the fact that functions are objects that may be both
referenced by, and called through, any variable. More generally, this varargs call syntax
is useful anytime you cannot predict the arguments list. If your user selects an arbitrary
function via a user interface, for instance, you may be unable to hardcode a function
call when writing your script. To work around this, simply build up the arguments list
with sequence operations, and call it with starred-argument syntax to unpack the ar-
guments:
>>> ...define or import func3...
>>> args = (2,3)
>>> args += (4,)
>>> args
(2, 3, 4)
>>> func3(*args)
Because the arguments list is passed in as a tuple here, the program can build it at
runtime. This technique also comes in handy for functions that test or time other func-
tions. For instance, in the following code we support any function with any arguments
by passing along whatever arguments were sent in (this is file tracer0.py in the book
examples package):
def tracer(func, *pargs, **kargs):       # Accept arbitrary arguments
    print('calling:', func.__name__)
    return func(*pargs, **kargs)         # Pass along arbitrary arguments

def func(a, b, c, d):
    return a + b + c + d

print(tracer(func, 1, 2, c=3, d=4))
This code uses the built-in __name__ attribute attached to every function (as you might
expect, it’s the function’s name string), and uses stars to collect and then unpack the
arguments intended for the traced function. In other words, when this code is run,
arguments are intercepted by the tracer and then propagated with varargs call syntax:
calling: func
10
For another example of this technique, see the preview near the end of the preceding
chapter, where it was used to reset the built-in open function. We’ll code additional
examples of such roles later in this book; see especially the sequence timing examples
in Chapter 21 and the various decorator utilities we will code in Chapter 39. It’s a
common technique in general tools.
The defunct apply built-in (Python 2.X)
Prior to Python 3.X, the effect of the *args and **args varargs call syntax could be
achieved with a built-in function named apply. This original technique has been re-
moved in 3.X because it is now redundant (3.X cleans up many such dusty tools that
have been subsumed over the years). It’s still available in all Python 2.X releases,
though, and you may come across it in older 2.X code.
In short, the following are equivalent prior to Python 3.X:
func(*pargs, **kargs) # Newer call syntax: func(*sequence, **dict)
apply(func, pargs, kargs) # Defunct built-in: apply(func, sequence, dict)
For example, consider the following function, which accepts any number of positional
or keyword arguments:
>>> def echo(*args, **kwargs): print(args, kwargs)
>>> echo(1, 2, a=3, b=4)
(1, 2) {'a': 3, 'b': 4}
In Python 2.X, we can call it generically with apply, or with the call syntax that is now
required in 3.X:
>>> pargs = (1, 2)
>>> kargs = {'a':3, 'b':4}
>>> apply(echo, pargs, kargs)
(1, 2) {'a': 3, 'b': 4}
>>> echo(*pargs, **kargs)
(1, 2) {'a': 3, 'b': 4}
Both forms work for built-in functions in 2.X too (notice 2.X’s trailing L for its long
integers):
>>> apply(pow, (2, 100))
1267650600228229401496703205376L
>>> pow(*(2, 100))
1267650600228229401496703205376L
The unpacking call syntax form is newer than the apply function, is preferred in general,
and is required in 3.X. (Technically, it was added in 2.0, was documented as deprecated
in 2.3, is still usable without warning in 2.7, and is gone in 3.0 and later.) Apart from
its symmetry with the * collector forms in def headers, and the fact that it requires fewer
keystrokes, the newer call syntax also allows us to pass along additional arguments
without having to manually extend argument sequences or dictionaries:
>>> echo(0, c=5, *pargs, **kargs) # Normal, keyword, *sequence, **dictionary
(0, 1, 2) {'a': 3, 'c': 5, 'b': 4}
That is, the call syntax form is more general. Since it’s required in 3.X, you should now
disavow all knowledge of apply (unless, of course, it appears in 2.X code you must use
or maintain...).
Python 3.X Keyword-Only Arguments
Python 3.X generalizes the ordering rules in function headers to allow us to specify
keyword-only arguments—arguments that must be passed by keyword only and will
never be filled in by a positional argument. This is useful if we want a function to both
process any number of arguments and accept possibly optional configuration options.
Syntactically, keyword-only arguments are coded as named arguments that may appear
after *args in the arguments list. All such arguments must be passed using keyword
syntax in the call. For example, in the following, a may be passed by name or position,
b collects any extra positional arguments, and c must be passed by keyword only. In 3.X:
>>> def kwonly(a, *b, c):
        print(a, b, c)
>>> kwonly(1, 2, c=3)
1 (2,) 3
>>> kwonly(a=1, c=3)
1 () 3
>>> kwonly(1, 2, 3)
TypeError: kwonly() missing 1 required keyword-only argument: 'c'
We can also use a * character by itself in the arguments list to indicate that a function
does not accept a variable-length argument list but still expects all arguments following
the * to be passed as keywords. In the next function, a may be passed by position or
name again, but b and c must be keywords, and no extra positionals are allowed:
>>> def kwonly(a, *, b, c):
        print(a, b, c)
>>> kwonly(1, c=3, b=2)
1 2 3
>>> kwonly(c=3, b=2, a=1)
1 2 3
>>> kwonly(1, 2, 3)
TypeError: kwonly() takes 1 positional argument but 3 were given
>>> kwonly(1)
TypeError: kwonly() missing 2 required keyword-only arguments: 'b' and 'c'
You can still use defaults for keyword-only arguments, even though they appear after
the * in the function header. In the following code, a may be passed by name or position,
and b and c are optional but must be passed by keyword if used:
>>> def kwonly(a, *, b='spam', c='ham'):
        print(a, b, c)
>>> kwonly(1)
1 spam ham
>>> kwonly(1, c=3)
1 spam 3
>>> kwonly(a=1)
1 spam ham
>>> kwonly(c=3, b=2, a=1)
1 2 3
>>> kwonly(1, 2)
TypeError: kwonly() takes 1 positional argument but 2 were given
In fact, keyword-only arguments with defaults are optional, but those without defaults
effectively become required keywords for the function:
>>> def kwonly(a, *, b, c='spam'):
        print(a, b, c)
>>> kwonly(1, b='eggs')
1 eggs spam
>>> kwonly(1, c='eggs')
TypeError: kwonly() missing 1 required keyword-only argument: 'b'
>>> kwonly(1, 2)
TypeError: kwonly() takes 1 positional argument but 2 were given
>>> def kwonly(a, *, b=1, c, d=2):
        print(a, b, c, d)
>>> kwonly(3, c=4)
3 1 4 2
>>> kwonly(3, c=4, b=5)
3 5 4 2
>>> kwonly(3)
TypeError: kwonly() missing 1 required keyword-only argument: 'c'
>>> kwonly(1, 2, 3)
TypeError: kwonly() takes 1 positional argument but 3 were given
Ordering rules
Finally, note that keyword-only arguments must be specified after a single star, not two
—named arguments cannot appear after the **args arbitrary keywords form, and a
** can’t appear by itself in the arguments list. Both attempts generate a syntax error:
>>> def kwonly(a, **pargs, b, c):
SyntaxError: invalid syntax
>>> def kwonly(a, **, b, c):
SyntaxError: invalid syntax
This means that in a function header, keyword-only arguments must be coded before
the **args arbitrary keywords form and after the *args arbitrary positional form, when
both are present. Whenever an argument name appears before *args, it is a possibly
default positional argument, not keyword-only:
>>> def f(a, *b, **d, c=6): print(a, b, c, d) # Keyword-only before **!
SyntaxError: invalid syntax
>>> def f(a, *b, c=6, **d): print(a, b, c, d) # Collect args in header
>>> f(1, 2, 3, x=4, y=5) # Default used
1 (2, 3) 6 {'y': 5, 'x': 4}
>>> f(1, 2, 3, x=4, y=5, c=7) # Override default
1 (2, 3) 7 {'y': 5, 'x': 4}
>>> f(1, 2, 3, c=7, x=4, y=5) # Anywhere in keywords
1 (2, 3) 7 {'y': 5, 'x': 4}
>>> def f(a, c=6, *b, **d): print(a, b, c, d) # c is not keyword-only here!
>>> f(1, 2, 3, x=4)
1 (3,) 2 {'x': 4}
In fact, similar ordering rules hold true in function calls: when keyword-only arguments
are passed, they must appear before a **args form. The keyword-only argument can
be coded either before or after the *args, though, and may be included in **args:
>>> def f(a, *b, c=6, **d): print(a, b, c, d) # KW-only between * and **
>>> f(1, *(2, 3), **dict(x=4, y=5)) # Unpack args at call
1 (2, 3) 6 {'y': 5, 'x': 4}
>>> f(1, *(2, 3), **dict(x=4, y=5), c=7) # Keywords before **args!
SyntaxError: invalid syntax
>>> f(1, *(2, 3), c=7, **dict(x=4, y=5)) # Override default
1 (2, 3) 7 {'y': 5, 'x': 4}
>>> f(1, c=7, *(2, 3), **dict(x=4, y=5)) # After or before *
1 (2, 3) 7 {'y': 5, 'x': 4}
>>> f(1, *(2, 3), **dict(x=4, y=5, c=7)) # Keyword-only in **
1 (2, 3) 7 {'y': 5, 'x': 4}
Trace through these cases on your own, in conjunction with the general argument-
ordering rules described formally earlier. They may appear to be worst cases in the
artificial examples here, but they can come up in real practice, especially for people
who write libraries and tools for other Python programmers to use.
Why keyword-only arguments?
So why care about keyword-only arguments? In short, they make it easier to allow a
function to accept both any number of positional arguments to be processed, and con-
figuration options passed as keywords. While their use is optional, without keyword-
only arguments extra work may be required to provide defaults for such options and
to verify that no superfluous keywords were passed.
Imagine a function that processes a set of passed-in objects and allows a tracing flag to
be passed:
process(X, Y, Z) # Use flag's default
process(X, Y, notify=True) # Override flag default
Without keyword-only arguments we have to use both *args and **args and manually
inspect the keywords, but with keyword-only arguments less code is required. The
following guarantees that no positional argument will be incorrectly matched against
notify and requires that it be a keyword if passed:
def process(*args, notify=False): ...
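For contrast, a rough sketch of the manual alternative alluded to above might look like
the following; the process_all name and its body are invented here for illustration only:

def process_all(*args, **options):
    notify = options.pop('notify', False)      # Supply the default by hand
    if options:                                # Verify no superfluous keywords
        raise TypeError('unexpected keywords: %s' % list(options))
    for arg in args:
        print('processing', arg, 'notify =', notify)

process_all(1, 2, notify=True)                 # Flag consumed from **options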
Since we’re going to see a more realistic example of this later in this chapter, in “Em-
ulating the Python 3.X print Function,” I’ll postpone the rest of this story until then.
For an additional example of keyword-only arguments in action, see the iteration op-
tions timing case study in Chapter 21. And for additional function definition enhance-
ments in Python 3.X, stay tuned for the discussion of function annotation syntax in
Chapter 19.
The min Wakeup Call!
OK—it’s time for something more realistic. To make this chapter’s concepts more
concrete, let’s work through an exercise that demonstrates a practical application of
argument-matching tools.
Suppose you want to code a function that is able to compute the minimum value from
an arbitrary set of arguments and an arbitrary set of object data types. That is, the
function should accept zero or more arguments, as many as you wish to pass. Moreover,
the function should work for all kinds of Python object types: numbers, strings, lists,
lists of dictionaries, files, and even None.
The first requirement provides a natural example of how the * feature can be put to
good use—we can collect arguments into a tuple and step over each of them in turn
with a simple for loop. The second part of the problem definition is easy: because every
object type supports comparisons, we don’t have to specialize the function per type (an
application of polymorphism); we can simply compare objects blindly and let Python
worry about what sort of comparison to perform according to the objects being com-
pared.
Full Credit
The following file shows three ways to code this operation, at least one of which was
suggested by a student in one of my courses (this example is often a group exercise to
circumvent dozing after lunch):
The first function fetches the first argument (args is a tuple) and traverses the rest
by slicing off the first (there’s no point in comparing an object to itself, especially
if it might be a large structure).
The second version lets Python pick off the first and rest of the arguments auto-
matically, and so avoids an index and slice.
The third converts from a tuple to a list with the built-in list call and employs the
list sort method.
The sort method is coded in C, so it can be quicker than the other approaches at times,
but the linear scans of the first two techniques may make them faster much of the
time.1 The file mins.py contains the code for all three solutions:
def min1(*args):
    res = args[0]
    for arg in args[1:]:
        if arg < res:
            res = arg
    return res

def min2(first, *rest):
    for arg in rest:
        if arg < first:
            first = arg
    return first

def min3(*args):
    tmp = list(args)     # Or, in Python 2.4+: return sorted(args)[0]
    tmp.sort()
    return tmp[0]

print(min1(3, 4, 1, 2))
print(min2("bb", "aa"))
print(min3([2,2], [1,1], [3,3]))
All three solutions produce the same result when the file is run. Try typing a few calls
interactively to experiment with these on your own:
% python mins.py
1
aa
[1, 1]
Notice that none of these three variants tests for the case where no arguments are passed
in. They could, but there’s no point in doing so here—in all three solutions, Python
will automatically raise an exception if no arguments are passed in. The first variant
raises an exception when we try to fetch item 0, the second when Python detects an
argument list mismatch, and the third when we try to return item 0 at the end.
This is exactly what we want to happen—because these functions support any data
type, there is no valid sentinel value that we could pass back to designate an error, so
we may as well let the exception be raised. There are exceptions to this rule (e.g., you
1. Actually, this is fairly complicated. The Python sort routine is coded in C and uses a highly optimized
algorithm that attempts to take advantage of partial ordering in the items to be sorted. It’s named “timsort”
after Tim Peters, its creator, and in its documentation it claims to have “supernatural performance” at
times (pretty good, for a sort!). Still, sorting is inherently costlier than a single scan (comparison
sorts run in O(N log N) time at best, chopping the sequence up and putting it back together many
times), and the other versions simply perform one linear left-to-
right scan. The net effect is that sorting is quicker if the arguments are partially ordered, but is likely to
be slower otherwise (this still holds true in test runs in 3.3). Even so, Python performance can change
over time, and the fact that sorting is implemented in the C language can help greatly; for an exact analysis,
you should time the alternatives with the time or timeit modules—we’ll see how in Chapter 21.
The min Wakeup Call! | 543
www.it-ebooks.info
might test for errors yourself if you’d rather avoid actions run before reaching the code
that triggers an error automatically), but in general it’s better to assume that arguments
will work in your functions’ code and let Python raise errors for you when they do not.
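For instance, a variant of min1 that tests its own arguments might look like the following sketch (hypothetical code, not part of the book’s mins.py); it trades the automatic IndexError for a more explicit message:

def min1_checked(*args):
    if not args:                         # Manual test: fail early and clearly
        raise TypeError('min1_checked expected at least 1 argument, got 0')
    res = args[0]
    for arg in args[1:]:
        if arg < res:
            res = arg
    return res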
Bonus Points
You can get bonus points here for changing these functions to compute the maxi-
mum, rather than minimum, values. This one’s easy: the first two versions only require
changing < to >, and the third simply requires that we return tmp[-1] instead of
tmp[0]. For an extra point, be sure to set the function name to “max” as well (though
this part is strictly optional).
It’s also possible to generalize a single function to compute either a minimum or a
maximum value, by evaluating comparison expression strings with a tool like the
eval built-in function (see the library manual, and various appearances here, especially
in Chapter 10) or passing in an arbitrary comparison function. The file minmax.py
shows how to implement the latter scheme:
def minmax(test, *args):
    res = args[0]
    for arg in args[1:]:
        if test(arg, res):
            res = arg
    return res

def lessthan(x, y): return x < y         # See also: lambda, eval
def grtrthan(x, y): return x > y

print(minmax(lessthan, 4, 2, 1, 5, 6, 3))    # Self-test code
print(minmax(grtrthan, 4, 2, 1, 5, 6, 3))
% python minmax.py
1
6
Functions are another kind of object that can be passed into a function like this one.
To make this a max (or other) function, for example, we simply pass in the right sort of
test function. This may seem like extra work, but the main point of generalizing func-
tions this way—instead of cutting and pasting to change just a single character—is that
we’ll only have one version to change in the future, not two.
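Because any callable works as the test, the comparison doesn’t need a named def at all; with minmax defined as above, inline lambda expressions (previewed here, and covered in the next chapter) do the same job. A quick usage sketch:

>>> minmax((lambda x, y: x < y), 4, 2, 1, 5, 6, 3)      # Minimum
1
>>> minmax((lambda x, y: x > y), 4, 2, 1, 5, 6, 3)      # Maximum
6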
The Punch Line...
Of course, all this was just a coding exercise. There’s really no reason to code min or
max functions, because both are built-ins in Python! We met them briefly in Chap-
ter 5 in conjunction with numeric tools, and again in Chapter 14 when exploring iter-
ation contexts. The built-in versions work almost exactly like ours, but they’re coded
in C for optimal speed and accept either a single iterable or multiple arguments. Still,
though it’s superfluous in this context, the general coding pattern we used here might
be useful in other scenarios.
Generalized Set Functions
Let’s look at a more useful example of special argument-matching modes at work. At
the end of Chapter 16, we wrote a function that returned the intersection of two se-
quences (it picked out items that appeared in both). Here is a version that intersects an
arbitrary number of sequences (one or more) by using the varargs matching form
*args to collect all the passed-in arguments. Because the arguments come in as a tuple,
we can process them in a simple for loop. Just for fun, we’ll code a union function that
also accepts an arbitrary number of arguments to collect items that appear in any of
the operands:
def intersect(*args):
    res = []
    for x in args[0]:                    # Scan first sequence
        if x in res: continue            # Skip duplicates
        for other in args[1:]:           # For all other args
            if x not in other: break     # Item in each one?
        else:                            # No: break out of loop
            res.append(x)                # Yes: add items to end
    return res

def union(*args):
    res = []
    for seq in args:                     # For all args
        for x in seq:                    # For all nodes
            if not x in res:
                res.append(x)            # Add new items to result
    return res
Because these are tools potentially worth reusing (and they’re too big to retype inter-
actively), we’ll store the functions in a module file called inter2.py (if you’ve forgotten
how modules and imports work, see the introduction in Chapter 3, or stay tuned for
in-depth coverage in Part V). In both functions, the arguments passed in at the call
come in as the args tuple. As in the original intersect, both work on any kind of
sequence. Here, they are processing strings, mixed types, and more than two sequences:
% python
>>> from inter2 import intersect, union
>>> s1, s2, s3 = "SPAM", "SCAM", "SLAM"
>>> intersect(s1, s2), union(s1, s2) # Two operands
(['S', 'A', 'M'], ['S', 'P', 'A', 'M', 'C'])
>>> intersect([1, 2, 3], (1, 4)) # Mixed types
[1]
>>> intersect(s1, s2, s3) # Three operands
['S', 'A', 'M']
>>> union(s1, s2, s3)
['S', 'P', 'A', 'M', 'C', 'L']
To test more thoroughly, the following codes a function to apply the two tools to
arguments in different orders using a simple shuffling technique that we saw in Chap-
ter 13—it slices to move the first to the end on each loop, uses a * to unpack arguments,
and sorts so results are comparable:
>>> def tester(func, items, trace=True):
        for i in range(len(items)):
            items = items[1:] + items[:1]
            if trace: print(items)
            print(sorted(func(*items)))
>>> tester(intersect, ('a', 'abcdefg', 'abdst', 'albmcnd'))
('abcdefg', 'abdst', 'albmcnd', 'a')
['a']
('abdst', 'albmcnd', 'a', 'abcdefg')
['a']
('albmcnd', 'a', 'abcdefg', 'abdst')
['a']
('a', 'abcdefg', 'abdst', 'albmcnd')
['a']
>>> tester(union, ('a', 'abcdefg', 'abdst', 'albmcnd'), False)
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'l', 'm', 'n', 's', 't']
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'l', 'm', 'n', 's', 't']
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'l', 'm', 'n', 's', 't']
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'l', 'm', 'n', 's', 't']
>>> tester(intersect, ('ba', 'abcdefg', 'abdst', 'albmcnd'), False)
['a', 'b']
['a', 'b']
['a', 'b']
['a', 'b']
The argument scrambling here doesn’t generate all possible argument orders (that
would require a full permutation, and 24 orderings for 4 arguments), but suffices to
check if argument order impacts results here. If you test these further, you’ll notice that
duplicates won’t appear in either intersection or union results, which qualify them as
set operations from a mathematical perspective:
>>> intersect([1, 2, 1, 3], (1, 1, 4))
[1]
>>> union([1, 2, 1, 3], (1, 1, 4))
[1, 2, 3, 4]
>>> tester(intersect, ('ababa', 'abcdefga', 'aaaab'), False)
['a', 'b']
['a', 'b']
['a', 'b']
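To check every ordering rather than just these rotations, the standard library’s itertools.permutations can generate all 24; a quick sketch (not part of inter2.py), run in the same session where intersect was imported:

>>> from itertools import permutations
>>> {tuple(sorted(intersect(*args)))
     for args in permutations(('a', 'abcdefg', 'abdst', 'albmcnd'))}
{('a',)}

A single-element result set here means the outcome was the same for every ordering.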
These are still far from optimal from an algorithmic perspective, but due to the following note, we’ll leave further improvements to this code as a suggested exercise. Also
notice that the argument scrambling in our tester function might be a generally useful
tool, and the tester would be simpler if we delegated this to another function, one that
would be free to create or generate argument combinations as it saw fit:
>>> def tester(func, items, trace=True):
        for args in scramble(items):
            ...use args...
In fact we will—watch for this example to be revised in Chapter 20 to address this last
point, after we’ve learned how to code user-defined generators. We’ll also recode the
set operations one last time in Chapter 32 and in a solution to a Part VI exercise, as
classes that extend the list object with methods.
Because Python now has a set object type (described in Chapter 5), none
of the set-processing examples in this book are strictly required any-
more; they are included just as demonstrations of coding techniques,
and are today instructional only. Because it’s constantly improving and
growing, Python has an uncanny way of conspiring to make my book
examples obsolete over time!
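For reference, here is roughly how the built-in set type handles the same requests today; a quick sketch only, and note that sets display in arbitrary order, so these aren’t exact replacements for the ordered lists the preceding functions return:

>>> s1, s2, s3 = "SPAM", "SCAM", "SLAM"
>>> set(s1) & set(s2) & set(s3)          # Intersection, any number of operands
{'S', 'A', 'M'}
>>> set(s1) | set(s2) | set(s3)          # Union
{'S', 'P', 'A', 'M', 'C', 'L'}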
Emulating the Python 3.X print Function
To round out the chapter, let’s look at one last example of argument matching at work.
The code you’ll see here is intended for use in Python 2.X or earlier (it works in 3.X,
too, but is pointless there): it uses both the *args arbitrary positional tuple and the
**kargs arbitrary keyword-arguments dictionary to simulate most of what the Python
3.X print function does. Python might have offered code like this as an option in 3.X
rather than removing the 2.X print entirely, but 3.X chose a clean break with the past
instead.
As we learned in Chapter 11, this isn’t actually required, because 2.X programmers can
always enable the 3.X print function with an import of this form (available in 2.6 and
2.7):
from __future__ import print_function
To demonstrate argument matching in general, though, the following file, print3.py,
does the same job in a small amount of reusable code, by building up the print string
and routing it per configuration arguments:
#!python
"""
Emulate most of the 3.X print function for use in 2.X (and 3.X).
Call signature: print3(*args, sep=' ', end='\n', file=sys.stdout)
"""
import sys

def print3(*args, **kargs):
    sep  = kargs.get('sep', ' ')         # Keyword arg defaults
    end  = kargs.get('end', '\n')
    file = kargs.get('file', sys.stdout)
    output = ''
    first = True
    for arg in args:
        output += ('' if first else sep) + str(arg)
        first = False
    file.write(output + end)
To test it, import this into another file or the interactive prompt, and use it like the 3.X
print function. Here is a test script, testprint3.py (notice that the function must be called
“print3”, because “print” is a reserved word in 2.X):
from print3 import print3
print3(1, 2, 3)
print3(1, 2, 3, sep='') # Suppress separator
print3(1, 2, 3, sep='...')
print3(1, [2], (3,), sep='...') # Various object types
print3(4, 5, 6, sep='', end='') # Suppress newline
print3(7, 8, 9)
print3() # Add newline (or blank line)
import sys
print3(1, 2, 3, sep='??', end='.\n', file=sys.stderr) # Redirect to file
When this is run under 2.X, we get the same results as 3.X’s print function:
C:\code> c:\python27\python testprint3.py
1 2 3
123
1...2...3
1...[2]...(3,)
4567 8 9
1??2??3.
Although pointless in 3.X, the results are identical when run there. As usual, the gen-
erality of Python’s design allows us to prototype or develop concepts in the Python
language itself. In this case, argument-matching tools are as flexible in Python code as
they are in Python’s internal implementation.
Using Keyword-Only Arguments
It’s interesting to notice that this example could be coded with Python 3.X keyword-
only arguments, described earlier in this chapter, to automatically validate configura-
tion arguments. The following variant, in the file print3_alt1.py, illustrates:
#!python3
"Use 3.X-only keyword-only args"
import sys

def print3(*args, sep=' ', end='\n', file=sys.stdout):
    output = ''
    first = True
    for arg in args:
        output += ('' if first else sep) + str(arg)
        first = False
    file.write(output + end)
This version works the same as the original, and it’s a prime example of how keyword-
only arguments come in handy. The original version assumes that all positional argu-
ments are to be printed, and all keywords are for options only. That’s almost sufficient,
but any extra keyword arguments are silently ignored. A call like the following, for
instance, will generate an exception correctly with the keyword-only form:
>>> print3(99, name='bob')
TypeError: print3() got an unexpected keyword argument 'name'
but will silently ignore the name argument in the original version. To detect superfluous
keywords manually, we could use dict.pop() to delete fetched entries, and check if the
dictionary is not empty. The following version, in the file print3_alt2.py, is equivalent
to the keyword-only version—it triggers a built-in exception with a raise statement,
which works just as though Python had done so (we’ll study this in more detail in
Part VII):
#!python
"Use 2.X/3.X keyword args deletion with defaults"
import sys

def print3(*args, **kargs):
    sep  = kargs.pop('sep', ' ')
    end  = kargs.pop('end', '\n')
    file = kargs.pop('file', sys.stdout)
    if kargs: raise TypeError('extra keywords: %s' % kargs)
    output = ''
    first = True
    for arg in args:
        output += ('' if first else sep) + str(arg)
        first = False
    file.write(output + end)
This works as before, but it now catches extraneous keyword arguments, too:
>>> print3(99, name='bob')
TypeError: extra keywords: {'name': 'bob'}
This version of the function runs under Python 2.X, but it requires four more lines of
code than the keyword-only version. Unfortunately, the extra code is unavoidable in
this case—the keyword-only version works on 3.X only, which negates most of the
reason that I wrote this example in the first place: a 3.X emulator that only works on
3.X isn’t incredibly useful! In programs written to run on 3.X only, though, keyword-
only arguments can simplify a specific category of functions that accept both arguments
and options. For another example of 3.X keyword-only arguments, be sure to see the
iteration timing case study in Chapter 21.
Why You Will Care: Keyword Arguments
As you can probably tell, advanced argument-matching modes can be complex. They
are also largely optional in your code; you can get by with just simple positional match-
ing, and it’s probably a good idea to do so when you’re starting out. However, because
some Python tools make use of them, some general knowledge of these modes is im-
portant.
For example, keyword arguments play an important role in tkinter, the de facto stan-
dard GUI API for Python (this module’s name is Tkinter in Python 2.X). We touch on
tkinter only briefly at various points in this book, but in terms of its call patterns,
keyword arguments set configuration options when GUI components are built. For
instance, a call of the form:
from tkinter import *
widget = Button(text="Press me", command=someFunction)
creates a new button and specifies its text and callback function, using the text and
command keyword arguments. Since the number of configuration options for a widget
can be large, keyword arguments let you pick and choose which to apply. Without
them, you might have to either list all the possible options by position or hope for a
judicious positional argument defaults protocol that would handle every possible op-
tion arrangement.
Many built-in functions in Python expect us to use keywords for usage-mode options
as well, which may or may not have defaults. As we learned in Chapter 8, for instance,
the sorted built-in:
sorted(iterable, key=None, reverse=False)
expects us to pass an iterable object to be sorted, but also allows us to pass in optional
keyword arguments to specify a sort key and a reversal flag, which default
to None and False, respectively. Since we normally don’t use these options, they may
be omitted to use defaults.
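For instance, calls might pass just the options they need; a quick sketch of typical usage:

>>> sorted(['bb', 'A', 'c'])                          # All defaults
['A', 'bb', 'c']
>>> sorted(['bb', 'A', 'c'], key=str.lower, reverse=True)
['c', 'bb', 'A']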
As we’ve also seen, the dict, str.format, and 3.X print calls accept keywords as well
—other usages we had to introduce in earlier chapters because of their forward de-
pendence on argument-passing modes we’ve studied here (alas, those who change
Python already know Python!).
Chapter Summary
In this chapter, we studied the second of two key concepts related to functions: argu-
ments—how objects are passed into a function. As we learned, arguments are passed
into a function by assignment, which means by object reference (which really means
by pointer). We also studied some more advanced extensions, including default and
keyword arguments, tools for using arbitrarily many arguments, and keyword-only
arguments in 3.X. Finally, we saw how mutable arguments can exhibit the same behavior as other shared references to objects—unless the object is explicitly copied when
it’s sent in, changing a passed-in mutable in a function can impact the caller.
The next chapter continues our look at functions by exploring some more advanced
function-related ideas: function annotations, recursion, lambdas, and functional tools
such as map and filter. Many of these concepts stem from the fact that functions are
normal objects in Python, and so support some advanced and very flexible processing
modes. Before diving into those topics, however, take this chapter’s quiz to review the
argument ideas we’ve studied here.
Test Your Knowledge: Quiz
In most of this quiz’s questions, results may vary slightly in 2.X—with enclosing paren-
theses and commas when multiple values are printed. To match the 3.X answers exactly
in 2.X, import print_function from __future__ before starting.
1. What is the output of the following code, and why?
>>> def func(a, b=4, c=5):
        print(a, b, c)
>>> func(1, 2)
2. What is the output of this code, and why?
>>> def func(a, b, c=5):
        print(a, b, c)
>>> func(1, c=3, b=2)
3. How about this code: what is its output, and why?
>>> def func(a, *pargs):
        print(a, pargs)
>>> func(1, 2, 3)
4. What does this code print, and why?
>>> def func(a, **kargs):
        print(a, kargs)
>>> func(a=1, c=3, b=2)
5. What gets printed by this, and why?
>>> def func(a, b, c=3, d=4): print(a, b, c, d)
>>> func(1, *(5, 6))
6. One last time: what is the output of this code, and why?
>>> def func(a, b, c): a = 2; b[0] = 'x'; c['a'] = 'y'
>>> l=1; m=[1]; n={'a':0}
>>> func(l, m, n)
>>> l, m, n
Test Your Knowledge: Answers
1. The output here is 1 2 5, because 1 and 2 are passed to a and b by position, and
c is omitted in the call and defaults to 5.
2. The output this time is 1 2 3: 1 is passed to a by position, and b and c are passed
2 and 3 by name (the left-to-right order doesn’t matter when keyword arguments
are used like this).
3. This code prints 1 (2, 3), because 1 is passed to a and the *pargs collects the
remaining positional arguments into a new tuple object. We can step through the
extra positional arguments tuple with any iteration tool (e.g., for arg in
pargs: ...).
4. This time the code prints 1 {'b': 2, 'c': 3}, because 1 is passed to a by name
and the **kargs collects the remaining keyword arguments into a dictionary. We
could step through the extra keyword arguments dictionary by key with any iter-
ation tool (e.g., for key in kargs: ...). Note that the order of the dictionary’s
keys may vary per Python version and other factors.
5. The output here is 1 5 6 4: the 1 matches a by position, 5 and 6 are unpacked from
the *(5, 6) and match b and c by position (6 overrides c’s default), and d defaults
to 4 because it was not passed a value.
6. This displays (1, ['x'], {'a': 'y'})—the first assignment in the function doesn’t
impact the caller, but the second two do because they change passed-in mutable
objects in place.
CHAPTER 19
Advanced Function Topics
This chapter introduces a collection of more advanced function-related topics: recur-
sive functions, function attributes and annotations, the lambda expression, and func-
tional programming tools such as map and filter. These are all somewhat advanced
tools that, depending on your job description, you may not encounter on a regular
basis. Because of their roles in some domains, though, a basic understanding can be
useful; lambdas, for instance, are regular customers in GUIs, and functional program-
ming techniques are increasingly common in Python code.
Part of the art of using functions lies in the interfaces between them, so we will also
explore some general function design principles here. The next chapter continues this
advanced theme with an exploration of generator functions and expressions and a re-
vival of list comprehensions in the context of the functional tools we will study here.
Function Design Concepts
Now that we’ve had a chance to study function basics in Python, let’s begin this chapter
with a few words of context. When you start using functions in earnest, you’re faced
with choices about how to glue components together—for instance, how to decompose
a task into purposeful functions (known as cohesion), how your functions should com-
municate (called coupling), and so on. You also need to take into account concepts such
as the size of your functions, because they directly impact code usability. Some of this
falls into the category of structured analysis and design, but it applies to Python code
as to any other.
We introduced some ideas related to function and module coupling in Chapter 17 when
studying scopes, but here is a review of a few general guidelines for readers new to
function design principles:
Coupling: use arguments for inputs and return for outputs. Generally, you
should strive to make a function independent of things outside of it. Arguments
and return statements are often the best ways to isolate external dependencies to
a small number of well-known places in your code.
Coupling: use global variables only when truly necessary. Global variables
(i.e., names in the enclosing module) are usually a poor way for functions to com-
municate. They can create dependencies and timing issues that make programs
difficult to debug, change, and reuse.
Coupling: don’t change mutable arguments unless the caller expects it.
Functions can change parts of passed-in mutable objects, but (as with global vari-
ables) this creates a tight coupling between the caller and callee, which can make
a function too specific and brittle.
Cohesion: each function should have a single, unified purpose. When de-
signed well, each of your functions should do one thing—something you can sum-
marize in a simple declarative sentence. If that sentence is very broad (e.g., “this
function implements my whole program”), or contains lots of conjunctions (e.g.,
“this function gives employee raises and submits a pizza order”), you might want
to think about splitting it into separate and simpler functions. Otherwise, there is
no way to reuse the code behind the steps mixed together in the function.
Size: each function should be relatively small. This naturally follows from the
preceding goal, but if your functions start spanning multiple pages on your display,
it’s probably time to split them. Especially given that Python code is so concise to
begin with, a long or deeply nested function is often a symptom of design problems.
Keep it simple, and keep it short.
Coupling: avoid changing variables in another module file directly. We in-
troduced this concept in Chapter 17, and we’ll revisit it in the next part of the book
when we focus on modules. For reference, though, remember that changing vari-
ables across file boundaries sets up a coupling between modules similar to how
global variables couple functions—the modules become difficult to understand
and reuse. Use accessor functions whenever possible, instead of direct assignment
statements.
Figure 19-1 summarizes the ways functions can talk to the outside world; inputs may
come from items on the left side, and results may be sent out in any of the forms on the
right. Good function designers prefer to use only arguments for inputs and return
statements for outputs, whenever possible.

Figure 19-1. Function execution environment. Functions may obtain input and produce output in a
variety of ways, though functions are usually easier to understand and maintain if you use arguments
for input and return statements and anticipated mutable argument changes for output. In Python 3.X
only, outputs may also take the form of declared nonlocal names that exist in an enclosing function
scope.
Of course, there are plenty of exceptions to the preceding design rules, including some
related to Python’s OOP support. As you’ll see in Part VI, Python classes depend on
changing a passed-in mutable object—class functions set attributes of an automatically
passed-in argument called self to change per-object state information (e.g.,
self.name='bob'). Moreover, if classes are not used, global variables are often the most
straightforward way for functions in modules to retain single-copy state between calls.
Side effects are usually dangerous only if they’re unexpected.
In general though, you should strive to minimize external dependencies in functions
and other program components. The more self-contained a function is, the easier it will
be to understand, reuse, and modify.
Recursive Functions
We mentioned recursion in relation to comparisons of core types in Chapter 9. While
discussing scope rules near the start of Chapter 17, we also briefly noted that Python
supports recursive functions—functions that call themselves either directly or indirectly
in order to loop. In this section, we’ll explore what this looks like in our functions’ code.
Recursion is a somewhat advanced topic, and it’s relatively rare to see in Python, partly
because Python’s procedural statements include simpler looping structures. Still, it’s a
useful technique to know about, as it allows programs to traverse structures that have
arbitrary and unpredictable shapes and depths—planning travel routes, analyzing lan-
guage, and crawling links on the Web, for example. Recursion is even an alternative to
simple loops and iterations, though not necessarily the simplest or most efficient one.
Summation with Recursion
Let’s look at some examples. To sum a list (or other sequence) of numbers, we can
either use the built-in sum function or write a more custom version of our own. Here’s
what a custom summing function might look like when coded with recursion:
>>> def mysum(L):
        if not L:
            return 0
        else:
            return L[0] + mysum(L[1:])   # Call myself recursively

>>> mysum([1, 2, 3, 4, 5])
15
At each level, this function calls itself recursively to compute the sum of the rest of the
list, which is later added to the item at the front. The recursive loop ends and zero is
returned when the list becomes empty. When using recursion like this, each open level
of call to the function has its own copy of the function’s local scope on the runtime call
stack—here, that means L is different in each level.
If this is difficult to understand (and it often is for new programmers), try adding a
print of L to the function and run it again, to trace the current list at each call level:
>>> def mysum(L):
        print(L)                         # Trace recursive levels
        if not L:                        # L shorter at each level
            return 0
        else:
            return L[0] + mysum(L[1:])
>>> mysum([1, 2, 3, 4, 5])
[1, 2, 3, 4, 5]
[2, 3, 4, 5]
[3, 4, 5]
[4, 5]
[5]
[]
15
As you can see, the list to be summed grows smaller at each recursive level, until it
becomes empty—the termination of the recursive loop. The sum is computed as the
recursive calls unwind on returns.
Coding Alternatives
Interestingly, we can use Python’s if/else ternary expression (described in Chap-
ter 12) to save some code real estate here. We can also generalize for any summable
type (which is easier if we assume at least one item in the input, as we did in Chap-
ter 18’s minimum value example) and use Python 3.X’s extended sequence assignment
to make the first/rest unpacking simpler (as covered in Chapter 11):
def mysum(L):
    return 0 if not L else L[0] + mysum(L[1:])           # Use ternary expression

def mysum(L):
    return L[0] if len(L) == 1 else L[0] + mysum(L[1:])  # Any type, assume one

def mysum(L):
    first, *rest = L
    return first if not rest else first + mysum(rest)    # Use 3.X ext seq assign
The latter two of these fail for empty lists but allow for sequences of any object type
that supports +, not just numbers:
>>> mysum([1]) # mysum([]) fails in last 2
1
>>> mysum([1, 2, 3, 4, 5])
15
>>> mysum(('s', 'p', 'a', 'm')) # But various types now work
'spam'
>>> mysum(['spam', 'ham', 'eggs'])
'spamhameggs'
Run these on your own for more insight. If you study these three variants, you’ll find
that:
The latter two also work on a single string argument (e.g., mysum('spam')), because
strings are sequences of one-character strings.
The third variant works on arbitrary iterables, including open input files
(mysum(open(name))), but the others do not because they index (Chapter 14 illus-
trates extended sequence assignment on files).
The function header def mysum(first, *rest), although similar to the third var-
iant, wouldn’t work at all, because it expects individual arguments, not a single
iterable.
Keep in mind that recursion can be direct, as in the examples so far, or indirect, as in
the following (a function that calls another function, which calls back to its caller). The
net effect is the same, though there are two function calls at each level instead of one:
>>> def mysum(L):
        if not L: return 0
        return nonempty(L)               # Call a function that calls me

>>> def nonempty(L):
        return L[0] + mysum(L[1:])       # Indirectly recursive
>>> mysum([1.1, 2.2, 3.3, 4.4])
11.0
Loop Statements Versus Recursion
Though recursion works for summing in the prior sections’ examples, it’s probably
overkill in this context. In fact, recursion is not used nearly as often in Python as in
more esoteric languages like Prolog or Lisp, because Python emphasizes simpler pro-
cedural statements like loops, which are usually more natural. The while, for example,
often makes things a bit more concrete, and it doesn’t require that a function be defined
to allow recursive calls:
>>> L = [1, 2, 3, 4, 5]
>>> sum = 0
>>> while L:
        sum += L[0]
        L = L[1:]

>>> sum
15
Better yet, for loops iterate for us automatically, making recursion largely extraneous
in many cases (and, in all likelihood, less efficient in terms of memory space and exe-
cution time):
>>> L = [1, 2, 3, 4, 5]
>>> sum = 0
>>> for x in L: sum += x
>>> sum
15
With looping statements, we don’t require a fresh copy of a local scope on the call stack
for each iteration, and we avoid the speed costs associated with function calls in general.
(Stay tuned for Chapter 21’s timer case study for ways to compare the execution times
of alternatives like these.)
Handling Arbitrary Structures
On the other hand, recursion—or equivalent explicit stack-based algorithms we’ll meet
shortly—can be required to traverse arbitrarily shaped structures. As a simple example
of recursion’s role in this context, consider the task of computing the sum of all the
numbers in a nested sublists structure like this:
[1, [2, [3, 4], 5], 6, [7, 8]] # Arbitrarily nested sublists
Simple looping statements won’t work here because this is not a linear iteration. Nested
looping statements do not suffice either, because the sublists may be nested to arbitrary
depth and in an arbitrary shape—there’s no way to know how many nested loops to
code to handle all cases. Instead, the following code accommodates such general nest-
ing by using recursion to visit sublists along the way:
# file sumtree.py
def sumtree(L):
    tot = 0
    for x in L:                          # For each item at this level
        if not isinstance(x, list):
            tot += x                     # Add numbers directly
        else:
            tot += sumtree(x)            # Recur for sublists
    return tot

L = [1, [2, [3, 4], 5], 6, [7, 8]]       # Arbitrary nesting
print(sumtree(L))                        # Prints 36

# Pathological cases
print(sumtree([1, [2, [3, [4, [5]]]]]))  # Prints 15 (right-heavy)
print(sumtree([[[[[1], 2], 3], 4], 5]))  # Prints 15 (left-heavy)
Trace through the test cases at the bottom of this script to see how recursion traverses
their nested lists.
Recursion versus queues and stacks
It sometimes helps to understand that internally, Python implements recursion by
pushing information on a call stack at each recursive call, so it remembers where it must
return and continue later. In fact, it’s generally possible to implement recursive-style
procedures without recursive calls, by using an explicit stack or queue of your own to
keep track of remaining steps.
For instance, the following computes the same sums as the prior example, but uses an
explicit list to schedule when it will visit items in the subject, instead of issuing recursive
calls; the item at the front of the list is always the next to be processed and summed:
def sumtree(L):                          # Breadth-first, explicit queue
    tot = 0
    items = list(L)                      # Start with copy of top level
    while items:
        front = items.pop(0)             # Fetch/delete front item
        if not isinstance(front, list):
            tot += front                 # Add numbers directly
        else:
            items.extend(front)          # <== Append all in nested list
    return tot
Technically, this code traverses the list in breadth-first fashion by levels, because it adds
nested lists’ contents to the end of the list, forming a first-in-first-out queue. To emulate
the traversal of the recursive call version more closely, we can change it to perform
depth-first traversal simply by adding the content of nested lists to the front of the list,
forming a last-in-first-out stack:
def sumtree(L):                          # Depth-first, explicit stack
    tot = 0
    items = list(L)                      # Start with copy of top level
    while items:
        front = items.pop(0)             # Fetch/delete front item
        if not isinstance(front, list):
            tot += front                 # Add numbers directly
        else:
            items[:0] = front            # <== Prepend all in nested list
    return tot
For more on the last two examples (and another variant), see file sumtree2.py in the
book’s examples. It adds items list tracing so you can watch it grow in both schemes,
and can show numbers as they are visited so you see the search order. For instance, the
breadth-first and depth-first variants visit items in the same three test lists used for the
recursive version in the following orders, respectively (sums are shown last):
c:\code> sumtree2.py
1, 6, 2, 5, 7, 8, 3, 4, 36
1, 2, 3, 4, 5, 15
5, 4, 3, 2, 1, 15
----------------------------------------
1, 2, 3, 4, 5, 6, 7, 8, 36
1, 2, 3, 4, 5, 15
1, 2, 3, 4, 5, 15
----------------------------------------
In general, though, once you get the hang of recursive calls, they are more natural than
the explicit scheduling lists they automate, and are generally preferred unless you need
to traverse structure in specialized ways. Some programs, for example, perform a best-
first search that requires an explicit search queue ordered by relevance or other criteria.
If you think of a web crawler that scores pages visited by content, the applications may
start to become clearer.
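As a rough sketch of that idea (hypothetical code, not one of the book’s files; sum_best_first and its score are made up for illustration), Python’s heapq module can serve as the priority queue. Here the “relevance” score is simply a number’s own value, so numbers are summed smallest-first:

import heapq
from itertools import count

def sum_best_first(L):
    # Explicit search queue ordered by a score (here, a number's own value);
    # the tie counter keeps heapq from ever comparing unorderable items
    tot, tie, heap = 0, count(), []
    def push(item):
        score = 0 if isinstance(item, list) else item
        heapq.heappush(heap, (score, next(tie), item))
    for x in L:
        push(x)
    while heap:
        _, _, front = heapq.heappop(heap)
        if isinstance(front, list):
            for x in front:              # Schedule nested items by score
                push(x)
        else:
            tot += front                 # Lowest-scored number processed next
    return tot

print(sum_best_first([1, [2, [3, 4], 5], 6, [7, 8]]))   # Prints 36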
Cycles, paths, and stack limits
As is, these programs suffice for our example, but larger recursive applications can
sometimes require a bit more infrastructure than shown here: they may need to avoid
cycles or repeats, record paths taken for later use, and expand stack space when using
recursive calls instead of explicit queues or stacks.
For instance, neither the recursive call nor the explicit queue/stack examples in this
section do anything about avoiding cycles—visiting a location already visited. That’s
not required here, because we’re traversing strictly hierarchical list object trees. If data
can be a cyclic graph, though, both these schemes will fail: the recursive call version
will fall into an infinite recursive loop (and may run out of call-stack space), and the
others will fall into simple infinite loops, re-adding the same items to their lists (and
may or may not run out of general memory). Some programs also need to avoid repeated
processing for a state reached more than once, even if that wouldn’t lead to a loop.
To do better, the recursive call version could simply keep and pass a set, dictionary, or
list of states visited so far and check for repeats as it goes. We will use this scheme in
later recursive examples in this book:
if state not in visited:
    visited.add(state)                   # x.add(state), x[state]=True, or x.append(state)
    ...proceed...
The nonrecursive alternatives could similarly avoid adding states already visited with
code like the following. Note that checking for duplicates already on the items list
would avoid scheduling a state twice, but would not prevent revisiting a state traversed
earlier and hence removed from that list:
visited.add(front)
...proceed...
items.extend([x for x in front if x not in visited])
This model doesn’t quite apply to this section’s use case that simply adds numbers in
lists, but larger applications will be able to identify repeated states—a URL of a previously visited web page, for instance. In fact, we’ll use such techniques to avoid cycles
and repeats in later examples listed in the next section.
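For instance, here is a minimal sketch of the visited-set scheme applied to the recursive sumtree (hypothetical code, not the book’s sumtree.py); it keys on list identities with id, because lists aren’t hashable:

def sumtree_safe(L, visited=None):
    if visited is None:
        visited = set()
    if id(L) in visited:                 # Already seen this list object?
        return 0
    visited.add(id(L))
    tot = 0
    for x in L:
        if isinstance(x, list):
            tot += sumtree_safe(x, visited)   # Pass the visited set along
        else:
            tot += x
    return tot

L = [1, [2, 3]]
L.append(L)                              # Create a cycle deliberately
print(sumtree_safe(L))                   # Prints 6 instead of looping forever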
Some programs may also need to record complete paths for each state followed so they
can report solutions when finished. In such cases, each item in the nonrecursive
scheme’s stack or queue may be a full path list that suffices for a record of states visited,
and contains the next item to explore at either end.
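A sketch of that idea for the breadth-first scheme (numberpaths is hypothetical, for illustration only): each queue entry is a full path whose last item is the next to explore, and complete paths are recorded whenever a number is reached:

def numberpaths(L):
    paths = []
    queue = [[L, x] for x in L]          # Each entry: path ending in next item
    while queue:
        path = queue.pop(0)
        front = path[-1]
        if isinstance(front, list):
            queue.extend(path + [x] for x in front)   # Grow the path
        else:
            paths.append(path)           # Record full route to this number
    return paths

for path in numberpaths([1, [2, [3, 4]]]):
    print(path[-1], 'found at depth', len(path) - 1)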
Also note that standard Python limits the depth of its runtime call stack—crucial to
recursive call programs—to trap infinite recursion errors. To expand it, use the sys
module:
>>> import sys
>>> sys.getrecursionlimit()              # 1000 calls deep default
1000
>>> sys.setrecursionlimit(10000)         # Allow deeper nesting
>>> help(sys.setrecursionlimit)          # Read more about it
The maximum allowed setting can vary per platform. This isn’t required for programs
that use stacks or queues to avoid recursive calls and gain more control over the traversal
process.
More recursion examples
Although this section’s example is artificial, it is representative of a larger class of pro-
grams; inheritance trees and module import chains, for example, can exhibit similarly
general structures, and computing structures such as permutations can require arbi-
trarily many nested loops. In fact, we will use recursion again in such roles in more
realistic examples later in this book:
In Chapter 20’s permute.py, to shuffle arbitrary sequences
In Chapter 25’s reloadall.py, to traverse import chains
In Chapter 29’s classtree.py, to traverse class inheritance trees
In Chapter 31’s lister.py, to traverse class inheritance trees again
In Appendix D’s solutions to two exercises at the end of this part of the book:
countdowns and factorials
The second and third of these will also detect states already visited to avoid cycles and
repeats. Although simple loops should generally be preferred to recursion for linear
iterations on the grounds of simplicity and efficiency, we’ll find that recursion is es-
sential in scenarios like those in these later examples.
Moreover, you sometimes need to be aware of the potential of unintended recursion in
your programs. As you’ll also see later in the book, some operator overloading methods
in classes such as __setattr__ and __getattribute__ and even __repr__ have the po-
tential to recursively loop if used incorrectly. Recursion is a powerful tool, but it tends
to be best when both understood and expected!
Function Objects: Attributes and Annotations
Python functions are more flexible than you might think. As we’ve seen in this part of
the book, functions in Python are much more than code-generation specifications for
a compiler—Python functions are full-blown objects, stored in pieces of memory all
their own. As such, they can be freely passed around a program and called indirectly.
They also support operations that have little to do with calls at all—attribute storage
and annotation.
Indirect Function Calls: “First Class” Objects
Because Python functions are objects, you can write programs that process them ge-
nerically. Function objects may be assigned to other names, passed to other functions,
embedded in data structures, returned from one function to another, and more, as if
they were simple numbers or strings. Function objects also happen to support a special
operation: they can be called by listing arguments in parentheses after a function ex-
pression. Still, functions belong to the same general category as other objects.
This is usually called a first-class object model; it’s ubiquitous in Python, and a necessary
part of functional programming. We’ll explore this programming mode more fully in
this and the next chapter; because its motif is founded on the notion of applying func-
tions, functions must be treated as data.
We’ve seen some of these generic use cases for functions in earlier examples, but a quick
review helps to underscore the object model. For example, there’s really nothing special
about the name used in a def statement: it’s just a variable assigned in the current scope,
as if it had appeared on the left of an = sign. After a def runs, the function name is simply
a reference to an object—you can reassign that object to other names freely and call it
through any reference:
>>> def echo(message):                   # Name echo assigned to function object
        print(message)

>>> echo('Direct call')                  # Call object through original name
Direct call
>>> x = echo                             # Now x references the function too
>>> x('Indirect call!')                  # Call object through name by adding ()
Indirect call!
Because arguments are passed by assigning objects, it’s just as easy to pass functions to
other functions as arguments. The callee may then call the passed-in function just by
adding arguments in parentheses:
>>> def indirect(func, arg):
        func(arg)                        # Call the passed-in object by adding ()
>>> indirect(echo, 'Argument call!') # Pass the function to another function
Argument call!
You can even stuff function objects into data structures, as though they were integers
or strings. The following, for example, embeds the function twice in a list of tuples, as
a sort of actions table. Because Python compound types like these can contain any sort
of object, there’s no special case here, either:
>>> schedule = [ (echo, 'Spam!'), (echo, 'Ham!') ]
>>> for (func, arg) in schedule:
        func(arg)                        # Call functions embedded in containers

Spam!
Ham!
This code simply steps through the schedule list, calling the echo function with one
argument each time through (notice the tuple-unpacking assignment in the for loop
header, introduced in Chapter 13). As we saw in Chapter 17’s examples, functions can
also be created and returned for use elsewhere—the closure created in this mode also
retains state from the enclosing scope:
>>> def make(label):                     # Make a function but don't call it
        def echo(message):
            print(label + ':' + message)
        return echo

>>> F = make('Spam')                     # Label in enclosing scope is retained
>>> F('Ham!')                            # Call the function that make returned
Spam:Ham!
>>> F('Eggs!')
Spam:Eggs!
Python’s universal first-class object model and lack of type declarations make for an
incredibly flexible programming language.
Function Introspection
Because they are objects, we can also process functions with normal object tools. In
fact, functions are more flexible than you might expect. For instance, once we make a
function, we can call it as usual:
>>> def func(a):
        b = 'spam'
        return b * a

>>> func(8)
'spamspamspamspamspamspamspamspam'
But the call expression is just one operation defined to work on function objects. We
can also inspect their attributes generically (the following is run in Python 3.3, but 2.X
results are similar):
>>> func.__name__
'func'
>>> dir(func)
['__annotations__', '__call__', '__class__', '__closure__', '__code__',
 ...more omitted: 34 total...
 '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']
Introspection tools allow us to explore implementation details too—functions have
attached code objects, for example, which provide details on aspects such as the func-
tions’ local variables and arguments:
>>> func.__code__
<code object func at 0x00000000021A6030, file "<stdin>", line 1>
>>> dir(func.__code__)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__',
...more omitted: 37 total...
'co_argcount', 'co_cellvars', 'co_code', 'co_consts', 'co_filename',
'co_firstlineno', 'co_flags', 'co_freevars', 'co_kwonlyargcount', 'co_lnotab',
'co_name', 'co_names', 'co_nlocals', 'co_stacksize', 'co_varnames']
>>> func.__code__.co_varnames
('a', 'b')
>>> func.__code__.co_argcount
1
Tool writers can make use of such information to manage functions (in fact, we will
too in Chapter 39, to implement validation of function arguments in decorators).
Function Attributes
Function objects are not limited to the system-defined attributes listed in the prior
section, though. As we learned in Chapter 17, it’s been possible to attach arbitrary user-
defined attributes to them as well since Python 2.1:
>>> func
<function func at 0x000000000296A1E0>
>>> func.count = 0
>>> func.count += 1
>>> func.count
1
>>> func.handles = 'Button-Press'
>>> func.handles
'Button-Press'
>>> dir(func)
['__annotations__', '__call__', '__class__', '__closure__', '__code__',
...and more: in 3.X all others have double underscores so your names won't clash...
__str__', '__subclasshook__', 'count', 'handles']
Python’s own implementation-related data stored on functions follows naming con-
ventions that prevent them from clashing with the more arbitrary attribute names you
might assign yourself. In 3.X, all function internals’ names have leading and trailing
double underscores (“__X__”); 2.X follows the same scheme, but also assigns some
names that begin with “func_X”:
c:\code> py -3
>>> def f(): pass
>>> dir(f)
...run on your own to see...
>>> len(dir(f))
34
>>> [x for x in dir(f) if not x.startswith('__')]
[]
c:\code> py -2
>>> def f(): pass
>>> dir(f)
...run on your own to see...
>>> len(dir(f))
31
>>> [x for x in dir(f) if not x.startswith('__')]
['func_closure', 'func_code', 'func_defaults', 'func_dict', 'func_doc',
'func_globals', 'func_name']
If you’re careful not to name attributes the same way, you can safely use the function’s
namespace as though it were your own namespace or scope.
As we saw in that chapter, such attributes can be used to attach state information to
function objects directly, instead of using other techniques such as globals, nonlocals,
and classes. Unlike nonlocals, such attributes are accessible anywhere the function itself
is, even from outside its code.
In a sense, this is also a way to emulate “static locals” in other languages—variables
whose names are local to a function, but whose values are retained after a function
exits. Attributes are related to objects instead of scopes (and must be referenced through
the function name within its code), but the net effect is similar.
Moreover, as we learned in Chapter 17, when attributes are attached to functions gen-
erated by other factory functions, they also support multiple copy, per-call, and write-
able state retention, much like nonlocal closures and class instance attributes.
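For instance, a quick sketch of that combination (hypothetical code, in the spirit of Chapter 17’s examples): each function returned by the factory carries its own attribute state, retained between calls and visible from outside:

>>> def make_counter(label):
        def counter():
            counter.calls += 1           # Per-function, writeable state
            print(label, counter.calls)
        counter.calls = 0                # Attached to this function copy only
        return counter

>>> a, b = make_counter('spam'), make_counter('ham')
>>> a(); a(); b()
spam 1
spam 2
ham 1
>>> a.calls                              # Accessible wherever the function is
2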
Function Annotations in 3.X
In Python 3.X (but not 2.X), it’s also possible to attach annotation information—arbi-
trary user-defined data about a function’s arguments and result—to a function object.
Python provides special syntax for specifying annotations, but it doesn’t do anything
with them itself; annotations are completely optional, and when present are simply
attached to the function object’s __annotations__ attribute for use by other tools. For
instance, such a tool might use annotations in the context of error testing.
We met Python 3.X’s keyword-only arguments in the preceding chapter; annotations
generalize function header syntax further. Consider the following nonannotated func-
tion, which is coded with three arguments and returns a result:
>>> def func(a, b, c):
        return a + b + c

>>> func(1, 2, 3)
6
Syntactically, function annotations are coded in def header lines, as arbitrary expres-
sions associated with arguments and return values. For arguments, they appear after a
colon immediately following the argument’s name; for return values, they are written
after a -> following the arguments list. This code, for example, annotates all three of
the prior function’s arguments, as well as its return value:
>>> def func(a: 'spam', b: (1, 10), c: float) -> int:
        return a + b + c
>>> func(1, 2, 3)
6
Calls to an annotated function work as usual, but when annotations are present Python
collects them in a dictionary and attaches it to the function object itself. Argument
names become keys, the return value annotation is stored under key “return” if coded
(which suffices because this reserved word can’t be used as an argument name), and
the values of annotation keys are assigned to the results of the annotation expressions:
>>> func.__annotations__
{'c': <class 'float'>, 'b': (1, 10), 'a': 'spam', 'return': <class 'int'>}
Because they are just Python objects attached to a Python object, annotations are
straightforward to process. The following annotates just two of three arguments and
steps through the attached annotations generically:
>>> def func(a: 'spam', b, c: 99):
        return a + b + c
>>> func(1, 2, 3)
6
>>> func.__annotations__
{'c': 99, 'a': 'spam'}
>>> for arg in func.__annotations__:
        print(arg, '=>', func.__annotations__[arg])
c => 99
a => spam
There are two fine points to note here. First, you can still use defaults for arguments if
you code annotations—the annotation (and its : character) appear before the default
(and its = character). In the following, for example, a: 'spam' = 4 means that argument
a defaults to 4 and is annotated with the string 'spam':
>>> def func(a: 'spam' = 4, b: (1, 10) = 5, c: float = 6) -> int:
        return a + b + c
>>> func(1, 2, 3)
6
>>> func() # 4 + 5 + 6 (all defaults)
15
>>> func(1, c=10) # 1 + 5 + 10 (keywords work normally)
16
>>> func.__annotations__
{'c': <class 'float'>, 'b': (1, 10), 'a': 'spam', 'return': <class 'int'>}
Second, note that the blank spaces in the prior example are all optional—you can use
spaces between components in function headers or not, but omitting them might de-
grade your code’s readability to some observers (and probably improve it to others!):
>>> def func(a:'spam'=4, b:(1,10)=5, c:float=6)->int:
        return a + b + c
>>> func(1, 2) # 1 + 2 + 6
9
>>> func.__annotations__
{'c': <class 'float'>, 'b': (1, 10), 'a': 'spam', 'return': <class 'int'>}
Annotations are a new feature in 3.X, and some of their potential uses remain to be
uncovered. It’s easy to imagine annotations being used to specify constraints for argu-
ment types or values, though, and larger APIs might use this feature as a way to register
function interface information.
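For instance, here is a minimal sketch of the type-testing idea (checktypes is a hypothetical tool coded for illustration, not a standard Python facility); it treats any annotation that happens to be a class as a constraint on the matching positional argument:

def checktypes(func, *args):
    # Compare positional argument values to any class annotations
    names = func.__code__.co_varnames[:func.__code__.co_argcount]
    for name, value in zip(names, args):
        anno = func.__annotations__.get(name)
        if isinstance(anno, type) and not isinstance(value, anno):
            raise TypeError('%s must be %s' % (name, anno.__name__))
    return func(*args)

def func(a: 'spam', b, c: float) -> int:
    return a + b + c

print(checktypes(func, 1, 2, 3.0))       # OK: prints 6.0
checktypes(func, 1, 2, 3)                # TypeError: c must be float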
In fact, we’ll see a potential application in Chapter 39, where we’ll look at annotations
as an alternative to function decorator arguments—a more general concept in which
information is coded outside the function header and so is not limited to a single role.
Like Python itself, annotation is a tool whose roles are shaped by your imagination.
Finally, note that annotations work only in def statements, not lambda expressions,
because lambda’s syntax already limits the utility of the functions it defines. Coinci-
dentally, this brings us to our next topic.
Anonymous Functions: lambda
Besides the def statement, Python also provides an expression form that generates
function objects. Because of its similarity to a tool in the Lisp language, it’s called
lambda.1 Like def, this expression creates a function to be called later, but it returns the
function instead of assigning it to a name. This is why lambdas are sometimes known
as anonymous (i.e., unnamed) functions. In practice, they are often used as a way to
inline a function definition, or to defer execution of a piece of code.
1. The lambda tends to intimidate people more than it should. This reaction seems to stem from the name
“lambda” itself—a name that comes from the Lisp language, which got it from lambda calculus, which
is a form of symbolic logic. In Python, though, it’s really just a keyword that introduces the expression
syntactically. Obscure mathematical heritage aside, lambda is simpler to use than you may think.
lambda Basics
The lambda’s general form is the keyword lambda, followed by one or more arguments
(exactly like the arguments list you enclose in parentheses in a def header), followed
by an expression after a colon:
lambda argument1, argument2,... argumentN : expression using arguments
Function objects returned by running lambda expressions work exactly the same as
those created and assigned by defs, but there are a few differences that make lambdas
useful in specialized roles:
lambda is an expression, not a statement. Because of this, a lambda can appear in
places a def is not allowed by Python’s syntax—inside a list literal or a function
call’s arguments, for example. With def, functions can be referenced by name but
must be created elsewhere. As an expression, lambda returns a value (a new func-
tion) that can optionally be assigned a name. In contrast, the def statement always
assigns the new function to the name in the header, instead of returning it as a
result.
lambda’s body is a single expression, not a block of statements. The lambda’s
body is similar to what you’d put in a def body’s return statement; you simply type
the result as a naked expression, instead of explicitly returning it. Because it is
limited to an expression, a lambda is less general than a def—you can only squeeze
so much logic into a lambda body without using statements such as if. This is by
design, to limit program nesting: lambda is designed for coding simple functions,
and def handles larger tasks.
Apart from those distinctions, defs and lambdas do the same sort of work. For instance,
we’ve seen how to make a function with a def statement:
>>> def func(x, y, z): return x + y + z
>>> func(2, 3, 4)
9
But you can achieve the same effect with a lambda expression by explicitly assigning its
result to a name through which you can later call the function:
>>> f = lambda x, y, z: x + y + z
>>> f(2, 3, 4)
9
Here, f is assigned the function object the lambda expression creates; this is how def
works, too, but its assignment is automatic.
Defaults work on lambda arguments, just like in a def:
>>> x = (lambda a="fee", b="fie", c="foe": a + b + c)
>>> x("wee")
'weefiefoe'
The code in a lambda body also follows the same scope lookup rules as code inside a
def. lambda expressions introduce a local scope much like a nested def, which auto-
matically sees names in enclosing functions, the module, and the built-in scope (via the
LEGB rule, and per Chapter 17):
>>> def knights():
        title = 'Sir'
        action = (lambda x: title + ' ' + x)     # Title in enclosing def scope
        return action                            # Return a function object
>>> act = knights()
>>> msg = act('robin') # 'robin' passed to x
>>> msg
'Sir robin'
>>> act # act: a function, not its result
<function knights.<locals>.<lambda> at 0x00000000029CA488>
In this example, prior to Release 2.2, the value for the name title would typically have
been passed in as a default argument value instead; flip back to the scopes coverage in
Chapter 17 if you’ve forgotten why.
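For comparison, here’s a sketch of that older default-argument workaround; it still works today, though the enclosing-scope reference is now automatic:

>>> def knights():
        title = 'Sir'
        action = (lambda x, title=title: title + ' ' + x)   # Pass in by default
        return action

>>> knights()('robin')
'Sir robin'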
Why Use lambda?
Generally speaking, lambda comes in handy as a sort of function shorthand that allows
you to embed a function’s definition within the code that uses it. They are entirely
optional—you can always use def instead, and should if your function requires the
power of full statements that the lambda’s expression cannot easily provide—but they
tend to be simpler coding constructs in scenarios where you just need to embed small
bits of executable code inline at the place it is to be used.
For instance, we’ll see later that callback handlers are frequently coded as inline
lambda expressions embedded directly in a registration call’s arguments list, instead of
being defined with a def elsewhere in a file and referenced by name (see the sidebar
“Why You Will Care: lambda Callbacks” on page 573 for an example).
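As a preview of that callback pattern, here is a sketch in 3.X form along the lines of the sidebar’s example (it requires a GUI display to actually run; in 2.X the module is named Tkinter and print is not an expression):

from tkinter import Button, mainloop

# The lambda defers both the call and its argument until the button is pressed
Button(text='Press me',
       command=(lambda: print('Spam!'))).pack()
mainloop()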
lambda is also commonly used to code jump tables, which are lists or dictionaries of
actions to be performed on demand. For example:
L = [lambda x: x ** 2,                   # Inline function definition
     lambda x: x ** 3,
     lambda x: x ** 4]                   # A list of three callable functions

for f in L:
    print(f(2))                          # Prints 4, 8, 16

print(L[0](3))                           # Prints 9
The lambda expression is most useful as a shorthand for def, when you need to stuff
small pieces of executable code into places where statements are illegal syntactically.
The preceding code snippet, for example, builds up a list of three functions by embed-
Anonymous Functions: lambda | 569
www.it-ebooks.info
ding lambda expressions inside a list literal; a def won’t work inside a list literal like this
because it is a statement, not an expression. The equivalent def coding would require
temporary function names (which might clash with others) and function definitions
outside the context of intended use (which might be hundreds of lines away):
def f1(x): return x ** 2
def f2(x): return x ** 3                 # Define named functions
def f3(x): return x ** 4

L = [f1, f2, f3]                         # Reference by name

for f in L:
    print(f(2))                          # Prints 4, 8, 16

print(L[0](3))                           # Prints 9
Multiway branch switches: The finale
In fact, you can do the same sort of thing with dictionaries and other data structures
in Python to build up more general sorts of action tables. Here’s another example to
illustrate, at the interactive prompt:
>>> key = 'got'
>>> {'already': (lambda: 2 + 2),
     'got':     (lambda: 2 * 4),
     'one':     (lambda: 2 ** 6)}[key]()
8
Here, when Python makes the temporary dictionary, each of the nested lambdas gen-
erates and leaves behind a function to be called later. Indexing by key fetches one of
those functions, and parentheses force the fetched function to be called. When coded
this way, a dictionary becomes a more general multiway branching tool than what I
could fully show you in Chapter 12’s coverage of if statements.
To make this work without lambda, you’d need to instead code three def statements
somewhere else in your file, outside the dictionary in which the functions are to be
used, and reference the functions by name:
>>> def f1(): return 2 + 2
>>> def f2(): return 2 * 4
>>> def f3(): return 2 ** 6
>>> key = 'one'
>>> {'already': f1, 'got': f2, 'one': f3}[key]()
64
This works, too, but your defs may be arbitrarily far away in your file, even if they are
just little bits of code. The code proximity that lambdas provide is especially useful for
functions that will only be used in a single context—if the three functions here are not
useful anywhere else, it makes sense to embed their definitions within the dictionary
as lambdas. Moreover, the def form requires you to make up names for these little
functions that may clash with other names in this file (perhaps unlikely, but always
possible).2
lambdas also come in handy in function-call argument lists as a way to inline temporary
function definitions not used anywhere else in your program; we’ll see some examples
of such other uses later in this chapter, when we study map.
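For a quick preview of that idea, a lambda passed as a sort key is a classic case of an
inline, single-use function (the data here is made up purely for illustration):
>>> pairs = [('bob', 35), ('sue', 40), ('ann', 25)]
>>> sorted(pairs, key=lambda pair: pair[1])          # Inline key function
[('ann', 25), ('bob', 35), ('sue', 40)]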
How (Not) to Obfuscate Your Python Code
The fact that the body of a lambda has to be a single expression (not a series of state-
ments) would seem to place severe limits on how much logic you can pack into a
lambda. If you know what you’re doing, though, you can code most statements in
Python as expression-based equivalents.
For example, if you want to print from the body of a lambda function, simply say
print(X) in Python 3.X where this becomes a call expression instead of a statement, or
say sys.stdout.write(str(X)+'\n') in either Python 2.X or 3.X to make sure it’s an
expression portably (recall from Chapter 11 that this is what print really does). Simi-
larly, to nest selection logic in a lambda, you can use the if/else ternary expression
introduced in Chapter 12, or the equivalent but trickier and/or combination also de-
scribed there. As you learned earlier, the following statement:
if a:
    b
else:
    c
can be emulated by either of these roughly equivalent expressions:
b if a else c
((a and b) or c)
Because expressions like these can be placed inside a lambda, they may be used to im-
plement selection logic within a lambda function:
>>> lower = (lambda x, y: x if x < y else y)
>>> lower('bb', 'aa')
'aa'
>>> lower('aa', 'bb')
'aa'
2. A student once noted that you could skip the dispatch table dictionary in such code if the function name
is the same as its string lookup key—run an eval(funcname)() to kick off the call. While true in this case
and sometimes useful, as we saw earlier (e.g., Chapter 10), eval is relatively slow (it must compile and
run code), and insecure (you must trust the string’s source). More fundamentally, jump tables are
generally subsumed by polymorphic method dispatch in Python: calling a method does the “right thing”
based on the type of object. To see why, stay tuned for Part VI.
Furthermore, if you need to perform loops within a lambda, you can also embed things
like map calls and list comprehension expressions—tools we met in earlier chapters and
will revisit in this and the next chapter:
>>> import sys
>>> showall = lambda x: list(map(sys.stdout.write, x)) # 3.X: must use list
>>> t = showall(['spam\n', 'toast\n', 'eggs\n']) # 3.X: can use print
spam
toast
eggs
>>> showall = lambda x: [sys.stdout.write(line) for line in x]
>>> t = showall(('bright\n', 'side\n', 'of\n', 'life\n'))
bright
side
of
life
>>> showall = lambda x: [print(line, end='') for line in x] # Same: 3.X only
>>> showall = lambda x: print(*x, sep='', end='') # Same: 3.X only
There is a limit to emulating statements with expressions: you can’t directly achieve an
assignment statement’s effect, for instance, though tools like the setattr built-in, the
__dict__ of namespaces, and methods that change mutable objects in place can some-
times stand in, and functional programming techniques can take you deep into the dark
realm of convoluted expression.
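To illustrate the sort of stand-ins just mentioned, and assuming a simple made-up empty
class here, a lambda can change state with setattr or a mutable object's methods, though
such code quickly strains readability:
>>> class Rec: pass                                  # Hypothetical empty class
>>> r = Rec()
>>> assign = (lambda obj: setattr(obj, 'name', 'spam'))   # Assignment stand-in
>>> assign(r)
>>> r.name
'spam'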
Now that I’ve shown you these tricks, I am required to ask you to please only use them
as a last resort. Without due care, they can lead to unreadable (a.k.a. obfuscated) Python
code. In general, simple is better than complex, explicit is better than implicit, and full
statements are better than arcane expressions. That’s why lambda is limited to expres-
sions. If you have larger logic to code, use def; lambda is for small pieces of inline code.
On the other hand, you may find these techniques useful in moderation.
Scopes: lambdas Can Be Nested Too
lambdas are the main beneficiaries of nested function scope lookup (the E in the LEGB
scope rule we studied in Chapter 17). As a review, in the following the lambda appears
inside a def—the typical case—and so can access the value that the name x had in the
enclosing function’s scope at the time that the enclosing function was called:
>>> def action(x):
        return (lambda y: x + y)     # Make and return function, remember x
>>> act = action(99)
>>> act
<function action.<locals>.<lambda> at 0x00000000029CA2F0>
>>> act(2) # Call what action returned
101
What wasn’t illustrated in the prior discussion of nested function scopes is that a
lambda also has access to the names in any enclosing lambda. This case is somewhat
obscure, but imagine if we recoded the prior def with a lambda:
>>> action = (lambda x: (lambda y: x + y))
>>> act = action(99)
>>> act(3)
102
>>> ((lambda x: (lambda y: x + y))(99))(4)
103
Here, the nested lambda structure makes a function that makes a function when called.
In both cases, the nested lambda’s code has access to the variable x in the enclosing
lambda. This works, but it seems fairly convoluted code; in the interest of readability,
nested lambdas are generally best avoided.
Why You Will Care: lambda Callbacks
Another very common application of lambda is to define inline callback functions for
Python’s tkinter GUI API (this module is named Tkinter in Python 2.X). For example,
the following creates a button that prints a message on the console when pressed, as-
suming tkinter is available on your computer (it is by default on Windows, Mac, Linux,
and other OSs):
import sys
from tkinter import Button, mainloop                 # Tkinter in 2.X
x = Button(
    text='Press me',
    command=(lambda: sys.stdout.write('Spam\n')))    # 3.X: print()
x.pack()
mainloop()                                           # This may be optional in console mode
Here, we register the callback handler by passing a function generated with a lambda to
the command keyword argument. The advantage of lambda over def here is that the code
that handles a button press is right here, embedded in the button-creation call.
In effect, the lambda defers execution of the handler until the event occurs: the write
call happens on button presses, not when the button is created, and effectively “knows”
the string it should write when the event occurs.
Because the nested function scope rules apply to lambdas as well, they are also easier to
use as callback handlers, as of Python 2.2—they automatically see names in the func-
tions in which they are coded and no longer require passed-in defaults in most cases.
This is especially handy for accessing the special self instance argument that is a local
variable in enclosing class method functions (more on classes in Part VI):
class MyGui:
    def makewidgets(self):
        Button(command=(lambda: self.onPress("spam")))
    def onPress(self, message):
        ...use message...
In early versions of Python, even self had to be passed in to a lambda with defaults. As
we’ll see later, class objects with __call__ and bound methods often serve in callback
roles too—watch for coverage of these in Chapter 30 and Chapter 31.
Functional Programming Tools
By most definitions, today’s Python blends support for multiple programming para-
digms: procedural (with its basic statements), object-oriented (with its classes), and
functional. For the latter of these, Python includes a set of built-ins used for functional
programming—tools that apply functions to sequences and other iterables. This set
includes tools that call functions on an iterable’s items (map); filter out items based on
a test function (filter); and apply functions to pairs of items and running results
(reduce).
Though the boundaries are sometimes a bit grey, by most definitions Python’s func-
tional programming arsenal also includes the first-class object model explored earlier,
the nested scope closures and anonymous function lambdas we met earlier in this part
of the book, the generators and comprehensions we’ll be expanding on in the next
chapter, and perhaps the function and class decorators of this book’s final part. For our
purposes here, let’s wrap up this chapter with a quick survey of built-in functions that
apply other functions to iterables automatically.
Mapping Functions over Iterables: map
One of the more common things programs do with lists and other sequences is apply
an operation to each item and collect the results—selecting columns in database tables,
incrementing pay fields of employees in a company, parsing email attachments, and so
on. Python has multiple tools that make such collection-wide operations easy to code.
For instance, updating all the counters in a list can be done easily with a for loop:
>>> counters = [1, 2, 3, 4]
>>>
>>> updated = []
>>> for x in counters:
        updated.append(x + 10)       # Add 10 to each item
>>> updated
[11, 12, 13, 14]
But because this is such a common operation, Python also provides built-ins that do
most of the work for you. The map function applies a passed-in function to each item
in an iterable object and returns a list containing all the function call results. For ex-
ample:
>>> def inc(x): return x + 10 # Function to be run
>>> list(map(inc, counters)) # Collect results
[11, 12, 13, 14]
We met map briefly in Chapter 13 and Chapter 14, as a way to apply a built-in function
to items in an iterable. Here, we make more general use of it by passing in a user-
defined function to be applied to each item in the list—map calls inc on each list item
and collects all the return values into a new list. Remember that map is an iterable in
Python 3.X, so a list call is used to force it to produce all its results for display here;
this isn’t necessary in 2.X (see Chapter 14 if you’ve forgotten this requirement).
Because map expects a function to be passed in and applied, it also happens to be one
of the places where lambda commonly appears:
>>> list(map((lambda x: x + 3), counters)) # Function expression
[4, 5, 6, 7]
Here, the function adds 3 to each item in the counters list; as this little function isn’t
needed elsewhere, it was written inline as a lambda. Because such uses of map are equiv-
alent to for loops, with a little extra code you can always code a general mapping utility
yourself:
>>> def mymap(func, seq):
        res = []
        for x in seq: res.append(func(x))
        return res
Assuming the function inc is still as it was when it was shown previously, we can map
it across a sequence (or other iterable) with either the built-in or our equivalent:
>>> list(map(inc, [1, 2, 3])) # Built-in is an iterable
[11, 12, 13]
>>> mymap(inc, [1, 2, 3]) # Ours builds a list (see generators)
[11, 12, 13]
However, as map is a built-in, it’s always available, always works the same way, and has
some performance benefits (as we’ll prove in Chapter 21, it’s faster than a manually
coded for loop in some usage modes). Moreover, map can be used in more advanced
ways than shown here. For instance, given multiple sequence arguments, it sends items
taken from sequences in parallel as distinct arguments to the function:
>>> pow(3, 4) # 3**4
81
>>> list(map(pow, [1, 2, 3], [2, 3, 4])) # 1**2, 2**3, 3**4
[1, 8, 81]
With multiple sequences, map expects an N-argument function for N sequences. Here,
the pow function takes two arguments on each call—one from each sequence passed to
map. It’s not much extra work to simulate this multiple-sequence generality in code,
too, but we’ll postpone doing so until later in the next chapter, after we’ve met some
additional iteration tools.
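If you're curious in the meantime, a rough sketch built on the zip built-in we met
earlier hints at how such a simulation might look (the name mymap2 is invented here;
the fuller story waits for the next chapter's tools):
>>> def mymap2(func, *seqs):
        res = []
        for args in zip(*seqs):      # Take one item from each sequence in parallel
            res.append(func(*args))
        return res

>>> mymap2(pow, [1, 2, 3], [2, 3, 4])
[1, 8, 81]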
The map call is similar to the list comprehension expressions we studied in Chap-
ter 14 and will revisit in the next chapter from a functional perspective:
>>> list(map(inc, [1, 2, 3, 4]))
[11, 12, 13, 14]
>>> [inc(x) for x in [1, 2, 3, 4]] # Use () parens to generate items instead
[11, 12, 13, 14]
In some cases, map may be faster to run than a list comprehension (e.g., when mapping
a built-in function), and it may also require less coding. On the other hand, because
map applies a function call to each item instead of an arbitrary expression, it is a somewhat
less general tool, and often requires extra helper functions or lambdas. Moreover, wrap-
ping a comprehension in parentheses instead of square brackets creates an object that
generates values on request to save memory and increase responsiveness, much like
map in 3.X—a topic we’ll take up in the next chapter.
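As a brief preview of that topic, simply swapping the brackets for parentheses produces
results on demand instead of all at once (this assumes the inc function defined earlier):
>>> G = (inc(x) for x in [1, 2, 3, 4])               # Generator expression: lazy
>>> next(G)
11
>>> list(G)                                          # Collect the rest
[12, 13, 14]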
Selecting Items in Iterables: filter
The map function is a primary and relatively straightforward representative of Python’s
functional programming toolset. Its close relatives, filter and reduce, select an itera-
ble’s items based on a test function and apply functions to item pairs, respectively.
Because it also returns an iterable, filter (like range) requires a list call to display all
its results in 3.X. For example, the following filter call picks out items in a sequence
that are greater than zero:
>>> list(range(-5, 5))                               # An iterable in 3.X
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
>>> list(filter((lambda x: x > 0), range(-5, 5)))    # An iterable in 3.X
[1, 2, 3, 4]
We met filter briefly earlier in a Chapter 12 sidebar, and while exploring 3.X iterables
in Chapter 14. Items in the sequence or iterable for which the function returns a true
result are added to the result list. Like map, this function is roughly equivalent to a for
loop, but it is built-in, concise, and often fast:
>>> res = []
>>> for x in range(-5, 5):           # The statement equivalent
        if x > 0:
            res.append(x)

>>> res
[1, 2, 3, 4]
Also like map, filter can be emulated by list comprehension syntax with often-simpler
results (especially when it can avoid creating a new function), and with a similar gen-
erator expression when delayed production of results is desired—though we’ll save the
rest of this story for the next chapter:
>>> [x for x in range(-5, 5) if x > 0]               # Use () to generate items
[1, 2, 3, 4]
Combining Items in Iterables: reduce
The functional reduce call, which is a simple built-in function in 2.X but lives in the
functools module in 3.X, is more complex. It accepts an iterable to process, but it’s not
an iterable itself—it returns a single result. Here are two reduce calls that compute the
sum and product of the items in a list:
>>> from functools import reduce # Import in 3.X, not in 2.X
>>> reduce((lambda x, y: x + y), [1, 2, 3, 4])
10
>>> reduce((lambda x, y: x * y), [1, 2, 3, 4])
24
At each step, reduce passes the current sum or product, along with the next item from
the list, to the passed-in lambda function. By default, the first item in the sequence
initializes the starting value. To illustrate, here’s the for loop equivalent to the first of
these calls, with the addition hardcoded inside the loop:
>>> L = [1, 2, 3, 4]
>>> res = L[0]
>>> for x in L[1:]:
        res = res + x

>>> res
10
10
Coding your own version of reduce is actually fairly straightforward. The following
function emulates most of the built-in’s behavior and helps demystify its operation in
general:
>>> def myreduce(function, sequence):
        tally = sequence[0]
        for next in sequence[1:]:
            tally = function(tally, next)
        return tally
>>> myreduce((lambda x, y: x + y), [1, 2, 3, 4, 5])
15
>>> myreduce((lambda x, y: x * y), [1, 2, 3, 4, 5])
120
The built-in reduce also allows an optional third argument placed before the items in
the sequence to serve as a default result when the sequence is empty, but we’ll leave
this extension as a suggested exercise.
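If you take up that exercise, one possible sketch simply seeds the tally from the extra
argument when one is passed (this mimics, but is not, the built-in's actual code, and it
conflates a passed None with no argument at all; the real reduce uses an internal sentinel):
>>> def myreduce(function, sequence, initial=None):
        tally = initial if initial is not None else sequence[0]
        rest = sequence if initial is not None else sequence[1:]
        for next in rest:
            tally = function(tally, next)
        return tally

>>> myreduce((lambda x, y: x + y), [], 0)            # Default result for empty input
0
>>> myreduce((lambda x, y: x + y), [1, 2, 3, 4], 100)
110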
If this coding technique has sparked your interest, you might also be interested in the
standard library operator module, which provides functions that correspond to built-
in expressions and so comes in handy for some uses of functional tools (see Python’s
library manual for more details on this module):
>>> import operator, functools
>>> functools.reduce(operator.add, [2, 4, 6]) # Function-based +
12
>>> functools.reduce((lambda x, y: x + y), [2, 4, 6])
12
Together, map, filter, and reduce support powerful functional programming techni-
ques. As mentioned, many observers would also extend the functional programming
toolset in Python to include nested function scope closures (a.k.a. factory functions)
and the anonymous function lambda—both discussed earlier—as well as generators and
comprehensions, topics we will return to in the next chapter.
Chapter Summary
This chapter took us on a tour of advanced function-related concepts: recursive func-
tions; function annotations; lambda expression functions; functional tools such as map,
filter, and reduce; and general function design ideas. The next chapter continues the
advanced topics motif with a look at generators and a reprisal of iterables and list
comprehensions—tools that are just as related to functional programming as to looping
statements. Before you move on, though, make sure you’ve mastered the concepts
covered here by working through this chapter’s quiz.
Test Your Knowledge: Quiz
1. How are lambda expressions and def statements related?
2. What’s the point of using lambda?
3. Compare and contrast map, filter, and reduce.
4. What are function annotations, and how are they used?
5. What are recursive functions, and how are they used?
6. What are some general design guidelines for coding functions?
7. Name three or more ways that functions can communicate results to a caller.
Test Your Knowledge: Answers
1. Both lambda and def create function objects to be called later. Because lambda is an
expression, though, it returns a function object instead of assigning it to a name,
and it can be used to nest a function definition in places where a def will not work
syntactically. A lambda allows for only a single implicit return value expression,
though; because it does not support a block of statements, it is not ideal for larger
functions.
2. lambdas allow us to “inline” small units of executable code, defer its execution, and
provide it with state in the form of default arguments and enclosing scope variables.
Using a lambda is never required; you can always code a def instead and reference
the function by name. lambdas come in handy, though, to embed small pieces of
deferred code that are unlikely to be used elsewhere in a program. They commonly
appear in callback-based programs such as GUIs, and they have a natural affinity
with functional tools like map and filter that expect a processing function.
3. These three built-in functions all apply another function to items in a sequence (or
other iterable) object and collect results. map passes each item to the function and
collects all results, filter collects items for which the function returns a True value,
and reduce computes a single value by applying the function to an accumulator
and successive items. Unlike the other two, reduce is available in the functools
module in 3.X, not the built-in scope; reduce is a built-in in 2.X.
4. Function annotations, available in 3.X (3.0 and later), are syntactic embellishments
of a function’s arguments and result, which are collected into a dictionary assigned
to the function’s __annotations__ attribute. Python places no semantic meaning
on these annotations, but simply packages them for potential use by other tools.
5. Recursive functions call themselves either directly or indirectly in order to loop.
They may be used to traverse arbitrarily shaped structures, but they can also be
used for iteration in general (though the latter role is often more simply and effi-
ciently coded with looping statements). Recursion can often be simulated or re-
placed by code that uses explicit stacks or queues to have more control over tra-
versals.
6. Functions should generally be small and as self-contained as possible, have a single
unified purpose, and communicate with other components through input argu-
ments and return values. They may use mutable arguments to communicate results
too if changes are expected, and some types of programs imply other communi-
cation mechanisms.
7. Functions can send back results with return statements, by changing passed-in
mutable arguments, and by setting global variables. Globals are generally frowned
upon (except for very special cases, like multithreaded programs) because they can
make code more difficult to understand and use. return statements are usually
best, but changing mutables is fine (and even useful), if expected. Functions may
also communicate results with system devices such as files and sockets, but these
are beyond our scope here.
CHAPTER 20
Comprehensions and Generations
This chapter continues the advanced function topics theme, with a reprisal of the com-
prehension and iteration concepts previewed in Chapter 4 and introduced in Chap-
ter 14. Because comprehensions are as much related to the prior chapter’s functional
tools (e.g., map and filter) as they are to for loops, we’ll revisit them in this context
here. We’ll also take a second look at iterables in order to study generator functions and
their generator expression relatives—user-defined ways to produce results on demand.
Iteration in Python also encompasses user-defined classes, but we’ll defer that final part
of this story until Part VI, when we study operator overloading. As this is the last pass
we’ll make over built-in iteration tools, though, we will summarize the various tools
we’ve met thus far. The next chapter continues this thread by timing the relative per-
formance of these tools as a larger case study. Before that, though, let’s continue the
comprehensions and iterations story, and extend it to include value generators.
List Comprehensions and Functional Tools
As mentioned early in this book, Python supports the procedural, object-oriented, and
functional programming paradigms. In fact, Python has a host of tools that most would
consider functional in nature, which we enumerated in the preceding chapter—clo-
sures, generators, lambdas, comprehensions, maps, decorators, function objects, and
more. These tools allow us to apply and combine functions in powerful ways, and often
offer state retention and coding solutions that are alternatives to classes and OOP.
For instance, the prior chapter explored tools such as map and filter—key members
of Python’s early functional programming toolset inspired by the Lisp language—that
map operations over iterables and collect results. Because this is such a common task
in Python coding, Python eventually sprouted a new expression—the list comprehen-
sion—that is even more flexible than the tools we just studied.
Per Python history, list comprehensions were originally inspired by a similar tool in the
functional programming language Haskell, around the time of Python 2.0. In short, list
comprehensions apply an arbitrary expression to items in an iterable, rather than
applying a function. Accordingly, they can be more general tools. In later releases, the
comprehension was extended to other roles—sets, dictionaries, and even the value
generator expressions we’ll explore in this chapter. It’s not just for lists anymore.
We first met list comprehensions in Chapter 4’s preview, and studied them further in
Chapter 14, in conjunction with looping statements. Because they’re also related to
functional programming tools like the map and filter calls, though, we’ll resurrect the
topic here for one last look. Technically, this feature is not tied to functions—as we’ll
see, list comprehensions can be a more general tool than map and filter—but it is
sometimes best understood by analogy to function-based alternatives.
List Comprehensions Versus map
Let’s work through an example that demonstrates the basics. As we saw in Chap-
ter 7, Python’s built-in ord function returns the integer code point of a single character
(the chr built-in is the converse—it returns the character for an integer code point).
These happen to be ASCII codes if your characters fall into the ASCII character set’s 7-
bit code point range:
>>> ord('s')
115
Now, suppose we wish to collect the ASCII codes of all characters in an entire string.
Perhaps the most straightforward approach is to use a simple for loop and append the
results to a list:
>>> res = []
>>> for x in 'spam':
        res.append(ord(x))           # Manual results collection
>>> res
[115, 112, 97, 109]
Now that we know about map, though, we can achieve similar results with a single
function call without having to manage list construction in the code:
>>> res = list(map(ord, 'spam')) # Apply function to sequence (or other)
>>> res
[115, 112, 97, 109]
However, we can get the same results from a list comprehension expression—while
map maps a function over an iterable, list comprehensions map an expression over a
sequence or other iterable:
>>> res = [ord(x) for x in 'spam'] # Apply expression to sequence (or other)
>>> res
[115, 112, 97, 109]
List comprehensions collect the results of applying an arbitrary expression to an iterable
of values and return them in a new list. Syntactically, list comprehensions are enclosed
in square brackets—to remind you that they construct lists. In their simple form, within
the brackets you code an expression that names a variable followed by what looks like
a for loop header that names the same variable. Python then collects the expression’s
results for each iteration of the implied loop.
The effect of the preceding example is similar to that of the manual for loop and the
map call. List comprehensions become more convenient, though, when we wish to apply
an arbitrary expression to an iterable instead of a function:
>>> [x ** 2 for x in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
Here, we’ve collected the squares of the numbers 0 through 9 (we’re just letting the
interactive prompt print the resulting list object; assign it to a variable if you need to
retain it). To do similar work with a map call, we would probably need to invent a little
function to implement the square operation. Because we won’t need this function else-
where, we’d typically (but not necessarily) code it inline, with a lambda, instead of using
a def statement elsewhere:
>>> list(map((lambda x: x ** 2), range(10)))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
This does the same job, and it’s only a few keystrokes longer than the equivalent list
comprehension. It’s also only marginally more complex (at least, once you understand
the lambda). For more advanced kinds of expressions, though, list comprehensions will
often require considerably less typing. The next section shows why.
Adding Tests and Nested Loops: filter
List comprehensions are even more general than shown so far. For instance, as we
learned in Chapter 14, you can code an if clause after the for to add selection logic.
List comprehensions with if clauses can be thought of as analogous to the filter built-
in discussed in the preceding chapter—they skip an iterable’s items for which the if
clause is not true.
To demonstrate, following are both schemes picking up even numbers from 0 to 4; like
the map list comprehension alternative of the prior section, the filter version here must
invent a little lambda function for the test expression. For comparison, the equivalent
for loop is shown here as well:
>>> [x for x in range(5) if x % 2 == 0]
[0, 2, 4]
>>> list(filter((lambda x: x % 2 == 0), range(5)))
[0, 2, 4]
>>> res = []
>>> for x in range(5):
        if x % 2 == 0:
            res.append(x)
>>> res
[0, 2, 4]
All of these use the modulus (remainder of division) operator, %, to detect even numbers:
if there is no remainder after dividing a number by 2, it must be even. The filter call
here is not much longer than the list comprehension either. However, we can combine
an if clause and an arbitrary expression in our list comprehension, to give it the effect
of a filter and a map, in a single expression:
>>> [x ** 2 for x in range(10) if x % 2 == 0]
[0, 4, 16, 36, 64]
This time, we collect the squares of the even numbers from 0 through 9: the for loop
skips numbers for which the attached if clause on the right is false, and the expression
on the left computes the squares. The equivalent map call would require a lot more work
on our part—we would have to combine filter selections with map iteration, making
for a noticeably more complex expression:
>>> list( map((lambda x: x**2), filter((lambda x: x % 2 == 0), range(10))) )
[0, 4, 16, 36, 64]
Formal comprehension syntax
In fact, list comprehensions are more general still. In their simplest form, you must
always code an accumulation expression and a single for clause:
[ expression for target in iterable ]
Though all other parts are optional, they allow richer iterations to be expressed—you
can code any number of nested for loops in a list comprehension, and each may have
an optional associated if test to act as a filter. The general structure of list compre-
hensions looks like this:
[ expression for target1 in iterable1 if condition1
             for target2 in iterable2 if condition2 ...
             for targetN in iterableN if conditionN ]
This same syntax is inherited by set and dictionary comprehensions as well as the
generator expressions coming up, though these use different enclosing characters (curly
braces or often-optional parentheses), and the dictionary comprehension begins with
two expressions separated by a colon (for key and value).
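To make that concrete, here are minimal set and dictionary comprehension counterparts
of a simple list comprehension (the display order of the results may vary, as these types
are unordered; we'll meet both forms in full later):
>>> [x * 2 for x in 'spam']                          # List comprehension
['ss', 'pp', 'aa', 'mm']
>>> {x * 2 for x in 'spam'}                          # Set: curly braces
{'ss', 'aa', 'pp', 'mm'}
>>> {x: x * 2 for x in 'spam'}                       # Dictionary: key:value expression
{'s': 'ss', 'p': 'pp', 'a': 'aa', 'm': 'mm'}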
We experimented with the if filter clause in the previous section. When for clauses
are nested within a list comprehension, they work like equivalent nested for loop state-
ments. For example:
>>> res = [x + y for x in [0, 1, 2] for y in [100, 200, 300]]
>>> res
[100, 200, 300, 101, 201, 301, 102, 202, 302]
This has the same effect as this substantially more verbose equivalent:
>>> res = []
>>> for x in [0, 1, 2]:
        for y in [100, 200, 300]:
            res.append(x + y)
>>> res
[100, 200, 300, 101, 201, 301, 102, 202, 302]
Although list comprehensions construct list results, remember that they can iterate over
any sequence or other iterable type. Here’s a similar bit of code that traverses strings
instead of lists of numbers, and so collects concatenation results:
>>> [x + y for x in 'spam' for y in 'SPAM']
['sS', 'sP', 'sA', 'sM', 'pS', 'pP', 'pA', 'pM',
'aS', 'aP', 'aA', 'aM', 'mS', 'mP', 'mA', 'mM']
Each for clause can have an associated if filter, no matter how deeply the loops are
nested—though use cases for the following sort of code, apart from perhaps multidi-
mensional arrays, start to become more and more difficult to imagine at this level:
>>> [x + y for x in 'spam' if x in 'sm' for y in 'SPAM' if y in ('P', 'A')]
['sP', 'sA', 'mP', 'mA']
>>> [x + y + z for x in 'spam' if x in 'sm'
               for y in 'SPAM' if y in ('P', 'A')
               for z in '123' if z > '1']
['sP2', 'sP3', 'sA2', 'sA3', 'mP2', 'mP3', 'mA2', 'mA3']
Finally, here is a similar list comprehension that illustrates the effect of attached if
selections on nested for clauses applied to numeric objects rather than strings:
>>> [(x, y) for x in range(5) if x % 2 == 0 for y in range(5) if y % 2 == 1]
[(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)]
This expression combines even numbers from 0 through 4 with odd numbers from 0
through 4. The if clauses filter out items in each iteration. Here is the equivalent state-
ment-based code:
>>> res = []
>>> for x in range(5):
        if x % 2 == 0:
            for y in range(5):
                if y % 2 == 1:
                    res.append((x, y))
>>> res
[(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)]
Recall that if you’re confused about what a complex list comprehension does, you can
always nest the list comprehension’s for and if clauses inside each other like this—
indenting each clause successively further to the right—to derive the equivalent state-
ments. The result is longer, but perhaps clearer in intent to some human readers on
first glance, especially those more familiar with basic statements.
The map and filter equivalent of this last example would be wildly complex and deeply
nested, so I won’t even try showing it here. I’ll leave its coding as an exercise for Zen
masters, ex–Lisp programmers, and the criminally insane!
Example: List Comprehensions and Matrixes
Not all list comprehensions are so artificial, of course. Let’s look at one more applica-
tion to stretch a few synapses. As we saw in Chapter 4 and Chapter 8, one basic way
to code matrixes (a.k.a. multidimensional arrays) in Python is with nested list struc-
tures. The following, for example, defines two 3 × 3 matrixes as lists of nested lists:
>>> M = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

>>> N = [[2, 2, 2],
         [3, 3, 3],
         [4, 4, 4]]
Given this structure, we can always index rows, and columns within rows, using normal
index operations:
>>> M[1] # Row 2
[4, 5, 6]
>>> M[1][2] # Row 2, item 3
6
List comprehensions are powerful tools for processing such structures, though, because
they automatically scan rows and columns for us. For instance, although this structure
stores the matrix by rows, to collect the second column we can simply iterate across the
rows and pull out the desired column, or iterate through positions in the rows and
index as we go:
>>> [row[1] for row in M] # Column 2
[2, 5, 8]
>>> [M[row][1] for row in (0, 1, 2)] # Using offsets
[2, 5, 8]
Given positions, we can also easily perform tasks such as pulling out a diagonal. The
first of the following expressions uses range to generate the list of offsets and then
indexes with the row and column the same, picking out M[0][0], then M[1][1], and so
on. The second scales the column index to fetch M[0][2], M[1][1], etc. (we assume the
matrix has the same number of rows and columns):
>>> [M[i][i] for i in range(len(M))] # Diagonals
[1, 5, 9]
>>> [M[i][len(M)-1-i] for i in range(len(M))]
[3, 5, 7]
Changing such a matrix in place requires assignment to offsets (use range twice if shapes
differ):
>>> L = [[1, 2, 3], [4, 5, 6]]
>>> for i in range(len(L)):
        for j in range(len(L[i])):   # Update in place
            L[i][j] += 10
>>> L
[[11, 12, 13], [14, 15, 16]]
We can’t really do the same with list comprehensions, as they make new lists, but we
could always assign their results to the original name for a similar effect. For example,
we can apply an operation to every item in a matrix, producing results in either a simple
vector or a matrix of the same shape:
>>> [col + 10 for row in M for col in row] # Assign to M to retain new value
[11, 12, 13, 14, 15, 16, 17, 18, 19]
>>> [[col + 10 for col in row] for row in M]
[[11, 12, 13], [14, 15, 16], [17, 18, 19]]
To understand these, translate to their simple statement form equivalents that follow
—indent parts that are further to the right in the expression (as in the first loop in the
following), and make a new list when comprehensions are nested on the left (like the
second loop in the following). As its statement equivalent makes clearer, the second
expression in the preceding works because the row iteration is an outer loop: for each
row, it runs the nested column iteration to build up one row of the result matrix:
>>> res = []
>>> for row in M:                    # Statement equivalents
        for col in row:              # Indent parts further right
            res.append(col + 10)
>>> res
[11, 12, 13, 14, 15, 16, 17, 18, 19]
>>> res = []
>>> for row in M:
        tmp = []                     # Left-nesting starts new list
        for col in row:
            tmp.append(col + 10)
        res.append(tmp)
>>> res
[[11, 12, 13], [14, 15, 16], [17, 18, 19]]
Finally, with a bit of creativity, we can also use list comprehensions to combine values
of multiple matrixes. The following first builds a flat list that contains the result of
multiplying the matrixes pairwise, and then builds a nested list structure having the
same values by nesting list comprehensions again:
>>> M
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> N
[[2, 2, 2], [3, 3, 3], [4, 4, 4]]
>>> [M[row][col] * N[row][col] for row in range(3) for col in range(3)]
[2, 4, 6, 12, 15, 18, 28, 32, 36]
>>> [[M[row][col] * N[row][col] for col in range(3)] for row in range(3)]
[[2, 4, 6], [12, 15, 18], [28, 32, 36]]
This last expression works because the row iteration is an outer loop again; it’s equiv-
alent to this statement-based code:
res = []
for row in range(3):
    tmp = []
    for col in range(3):
        tmp.append(M[row][col] * N[row][col])
    res.append(tmp)
And for more fun, we can use zip to pair items to be multiplied—the following com-
prehension and loop statement forms both produce the same list-of-lists pairwise mul-
tiplication result as the last preceding example (and because zip is a generator of values
in 3.X, this isn’t as inefficient as it may seem):
[[col1 * col2 for (col1, col2) in zip(row1, row2)] for (row1, row2) in zip(M, N)]
res = []
for (row1, row2) in zip(M, N):
    tmp = []
    for (col1, col2) in zip(row1, row2):
        tmp.append(col1 * col2)
    res.append(tmp)
Compared to their statement equivalents, the list comprehension versions here require
only one line of code, might run substantially faster for large matrixes, and just might
make your head explode! Which brings us to the next section.
Don’t Abuse List Comprehensions: KISS
With such generality, list comprehensions can quickly become, well, incomprehensi-
ble, especially when nested. Some programming tasks are inherently complex, and we
can’t sugarcoat them to make them any simpler than they are (see the upcoming per-
mutations for a prime example). Tools like comprehensions are powerful solutions
when used wisely, and there’s nothing inherently wrong with using them in your scripts.
At the same time, code like that of the prior section may push the complexity envelope
more than it should—and, frankly, tends to disproportionately pique the interest of
those holding the darker and misguided assumption that code obfuscation somehow
implies talent. Because such tools tend to appeal to some people more than they prob-
ably should, I need to be clear about their scope here.
This book demonstrates advanced comprehensions to teach, but in the real world,
using complicated and tricky code where not warranted is both bad engineering and
bad software citizenship. To repurpose a line from the first chapter: programming is
not about being clever and obscure—it’s about how clearly your program communi-
cates its purpose.
Or, to quote from Python’s import this motto:
Simple is better than complex.
Writing complicated comprehension code may be a fun academic recreation, but it
doesn’t have a place in programs that others will someday need to understand.
Consequently, my advice is to use simple for loops when getting started with Python,
and comprehensions or map in isolated cases where they are easy to apply. The “keep
it simple” rule applies here as always: code conciseness is a much less important goal
than code readability. If you have to translate code to statements to understand it, it
should probably be statements in the first place. In other words, the age-old acronym
KISS still applies: Keep It Simple—followed either by a word that is today too sexist
(Sir), or another that is too colorful for a family-oriented book like this...
On the other hand: performance, conciseness, expressiveness
However, in this case, there is currently a substantial performance advantage to the
extra complexity: based on tests run under Python today, map calls can be twice as fast
as equivalent for loops, and list comprehensions are often faster than map calls. This
speed difference can vary per usage pattern and Python, but is generally due to the fact
that map and list comprehensions run at C language speed inside the interpreter, which
is often much faster than stepping through Python for loop bytecode within the PVM.
In addition, list comprehensions offer a code conciseness that’s compelling and even
warranted when that reduction in size doesn’t also imply a reduction in meaning for
the next programmer. Moreover, many find the expressiveness of comprehensions to
be a powerful ally. Because map and list comprehensions are both expressions, they also
can show up syntactically in places that for loop statements cannot, such as in the
bodies of lambda functions, within list and dictionary literals, and more.
Because of this, list comprehensions and map calls are worth knowing and using for
simpler kinds of iterations, especially if your application’s speed is an important con-
sideration. Still, because for loops make logic more explicit, they are generally recom-
mended on the grounds of simplicity, and often make for more straightforward code.
When used, you should try to keep your map calls and list comprehensions simple; for
more complex tasks, use full statements instead.
As I’ve stated before, performance generalizations like those just given
here can depend on call patterns, as well as changes and optimizations
in Python itself. Recent Python releases have sped up the simple for loop
statement, for example. On some code, though, list comprehensions are
still substantially faster than for loops and even faster than map, though
map can still win when the alternatives must apply a function call, built-
in functions or otherwise. At least until this story changes arbitrarily—
to time these alternatives yourself, see tools in the standard library’s
time module or in the newer timeit module added in Release 2.4, or
stay tuned for the extended coverage of both of these in the next chapter,
where we’ll prove the prior paragraph’s claims.
Why You Will Care: List Comprehensions and map
Here are some more realistic examples of list comprehensions and map in action. We
solved the first with list comprehensions in Chapter 14, but we’ll revive it here to add
map alternatives. Recall that the file readlines method returns lines with \n end-of-line
characters at the ends (the following assumes a 3-line text file in the current directory):
>>> open('myfile').readlines()
['aaa\n', 'bbb\n', 'ccc\n']
If you don’t want the end-of-line characters, you can slice them off all the lines in a
single step with a list comprehension or a map call (map results are iterables in Python
3.X, so we must run them through list to display all their results at once):
>>> [line.rstrip() for line in open('myfile').readlines()]
['aaa', 'bbb', 'ccc']
>>> [line.rstrip() for line in open('myfile')]
['aaa', 'bbb', 'ccc']
>>> list(map((lambda line: line.rstrip()), open('myfile')))
['aaa', 'bbb', 'ccc']
The last two of these make use of file iterators; as we saw in Chapter 14, this means
that you don’t need a method call to read lines in iteration contexts such as these. The
map call is slightly longer than the list comprehension, but neither has to manage result
list construction explicitly.
A list comprehension can also be used as a sort of column projection operation.
Python’s standard SQL database API returns query results as a sequence of sequences
like the following—the list is the table, tuples are rows, and items in tuples are column
values:
>>> listoftuple = [('bob', 35, 'mgr'), ('sue', 40, 'dev')]
A for loop could pick up all the values from a selected column manually, but map and
list comprehensions can do it in a single step, and faster:
>>> [age for (name, age, job) in listoftuple]
[35, 40]
>>> list(map((lambda row: row[1]), listoftuple))
[35, 40]
The first of these makes use of tuple assignment to unpack row tuples in the list, and
the second uses indexing. In Python 2.X (but not in 3.X—see the note on 2.X argument
unpacking in Chapter 18), map can use tuple unpacking on its argument, too:
# 2.X only
>>> list(map((lambda (name, age, job): age), listoftuple))
[35, 40]
See other books and resources for more on Python’s database API.
Besides the distinction between running functions versus expressions, the biggest dif-
ference between map and list comprehensions in Python 3.X is that map is an iterable,
generating results on demand. To achieve the same memory economy and execution
time division, list comprehensions must be coded as generator expressions—a major
topic of this chapter.
Generator Functions and Expressions
Python today supports procrastination much more than it did in the past—it provides
tools that produce results only when needed, instead of all at once. We’ve seen this at
work in built-in tools: files that read lines on request, and functions like map and zip
that produce items on demand in 3.X. Such laziness isn’t confined to Python itself,
though. In particular, two language constructs delay result creation whenever possible
in user-defined operations:
Generator functions (available since 2.3) are coded as normal def statements, but
use yield statements to return results one at a time, suspending and resuming their
state between each.
Generator expressions (available since 2.4) are similar to the list comprehensions
of the prior section, but they return an object that produces results on demand
instead of building a result list.
Because neither constructs a result list all at once, they save memory space and allow
computation time to be split across result requests. As we’ll see, both of these ultimately
perform their delayed-results magic by implementing the iteration protocol we studied
in Chapter 14.
These features are not new (generator functions were available as an option as early
as Python 2.2), and are fairly common in Python code today.
erators owes much to other programming languages, especially Icon. Though they may
initially seem unusual if you’re accustomed to simpler programming models, you’ll
probably find generators to be a powerful tool where applicable. Moreover, because
they are a natural extension to the function, comprehension, and iteration ideas we’ve
already explored, you already know more about coding generators than you might
expect.
Generator Functions: yield Versus return
In this part of the book, we’ve learned about coding normal functions that receive input
parameters and send back a single result immediately. It is also possible, however, to
write functions that may send back a value and later be resumed, picking up where they
left off. Such functions, available in both Python 2.X and 3.X, are known as generator
functions because they generate a sequence of values over time.
Generator functions are like normal functions in most respects, and in fact are coded
with normal def statements. However, when created, they are compiled specially into
an object that supports the iteration protocol. And when called, they don’t return a
result: they return a result generator that can appear in any iteration context. We stud-
ied iterables in Chapter 14, and Figure 14-1 gave a formal and graphic summary of their
operation. Here, we’ll revisit them to see how they relate to generators.
State suspension
Unlike normal functions that return a value and exit, generator functions automatically
suspend and resume their execution and state around the point of value generation.
Because of that, they are often a useful alternative to both computing an entire series
of values up front and manually saving and restoring state in classes. The state that
generator functions retain when they are suspended includes both their code location,
and their entire local scope. Hence, their local variables retain information between
results, and make it available when the functions are resumed.
The chief code difference between generator and normal functions is that a generator
yields a value, rather than returning one—the yield statement suspends the function
and sends a value back to the caller, but retains enough state to enable the function to
resume from where it left off. When resumed, the function continues execution im-
mediately after the last yield run. From the function’s perspective, this allows its code
to produce a series of values over time, rather than computing them all at once and
sending them back in something like a list.
Iteration protocol integration
To truly understand generator functions, you need to know that they are closely bound
up with the notion of the iteration protocol in Python. As we’ve seen, iterator objects
define a __next__ method (next in 2.X), which either returns the next item in the iter-
ation, or raises the special StopIteration exception to end the iteration. An iterable
object’s iterator is fetched initially with the iter built-in function, though this step is a
no-op for objects that are their own iterator.
Python for loops, and all other iteration contexts, use this iteration protocol to step
through a sequence or value generator, if the protocol is supported (if not, iteration
falls back on repeatedly indexing sequences instead). Any object that supports this
interface works in all iteration tools.
To support this protocol, functions containing a yield statement are compiled specially
as generators—they are not normal functions, but rather are built to return an object
with the expected iteration protocol methods. When later called, they return a gener-
ator object that supports the iteration interface with an automatically created method
named __next__ to start or resume execution.
Generator functions may also have a return statement that, along with falling off the
end of the def block, simply terminates the generation of values—technically, by raising
a StopIteration exception after any normal function exit actions. From the caller’s
perspective, the generator’s __next__ method resumes the function and runs until either
the next yield result is returned or a StopIteration is raised.
The net effect is that generator functions, coded as def statements containing yield
statements, are automatically made to support the iteration object protocol and thus
may be used in any iteration context to produce results over time and on demand.
As noted in Chapter 14, in Python 2.X, iterator objects define a method
named next instead of __next__. This includes the generator objects we
are using here. In 3.X this method is renamed to __next__. The next
built-in function is provided as a convenience and portability tool:
next(I) is the same as I.__next__() in 3.X and I.next() in 2.6 and 2.7.
Prior to 2.6, programs simply call I.next() instead to iterate manually.
Generator functions in action
To illustrate generator basics, let’s turn to some code. The following code defines a
generator function that can be used to generate the squares of a series of numbers over
time:
>>> def gensquares(N):
        for i in range(N):
            yield i ** 2             # Resume here later
This function yields a value, and so returns to its caller, each time through the loop;
when it is resumed, its prior state is restored, including the last values of its variables
i and N, and control picks up again immediately after the yield statement. For example,
when it’s used in the body of a for loop, the first iteration starts the function and gets
its first result; thereafter, control returns to the function after its yield statement each
time through the loop:
>>> for i in gensquares(5):          # Resume the function
        print(i, end=' : ')          # Print last yielded value
0 : 1 : 4 : 9 : 16 :
>>>
To end the generation of values, functions either use a return statement with no value
or simply allow control to fall off the end of the function body.
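For example, a yielding function may also return early to cut its value stream short,
as in this trivial sketch (the function genshort is invented here for illustration):
>>> def genshort(N):
        for i in range(N):
            if i > 2:
                return               # Ends the generation of values early
            yield i

>>> list(genshort(10))
[0, 1, 2]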
To most people, this process seems a bit implicit (if not magical) on first encounter.
It’s actually quite tangible, though. If you really want to see what is going on inside the
for, call the generator function directly:
>>> x = gensquares(4)
>>> x
<generator object gensquares at 0x000000000292CA68>
You get back a generator object that supports the iteration protocol we met in Chap-
ter 14—the generator function was compiled to return this automatically. The returned
generator object in turn has a __next__ method that starts the function or resumes it
from where it last yielded a value, and raises a StopIteration exception when the end
of the series of values is reached and the function returns. For convenience, the
next(X) built-in calls an object’s X.__next__() method for us in 3.X (and X.next() in
2.X):
>>> next(x) # Same as x.__next__() in 3.X
0
>>> next(x) # Use x.next() or next() in 2.X
1
>>> next(x)
4
>>> next(x)
9
>>> next(x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
As we learned in Chapter 14, for loops (and other iteration contexts) work with gen-
erators in the same way—by calling the __next__ method repeatedly, until an exception
is caught. For a generator, the result is to produce yielded values over time. If the object
to be iterated over does not support this protocol, for loops instead use the indexing
protocol to iterate.
Notice that the top-level iter call of the iteration protocol isn’t required here because
generators are their own iterator, supporting just one active iteration scan. To put that
another way, generators return themselves for iter, because they support next directly.
This also holds true in the generator expressions we’ll meet later in this chapter (more
on this ahead):
>>> y = gensquares(5) # Returns a generator which is its own iterator
>>> iter(y) is y # iter() is not required: a no-op here
True
>>> next(y)                          # Can run next() immediately
0
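Because a generator is its own (and only) iterator, multiple iter calls on it all name
the same object and share one scan; here is a quick sketch of the effect, which we'll
revisit later in this chapter:
>>> z = gensquares(3)
>>> I1, I2 = iter(z), iter(z)        # Both name the same generator object
>>> next(I1)
0
>>> next(I2)                         # I2 advances the same single scan
1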
Why generator functions?
Given the simple examples we’re using to illustrate fundamentals, you might be won-
dering just why you’d ever care to code a generator at all. In this section’s example, for
instance, we could also simply build the list of yielded values all at once:
>>> def buildsquares(n):
        res = []
        for i in range(n): res.append(i ** 2)
        return res
>>> for x in buildsquares(5): print(x, end=' : ')
0 : 1 : 4 : 9 : 16 :
For that matter, we could use any of the for loop, map, or list comprehension techniques:
>>> for x in [n ** 2 for n in range(5)]:
        print(x, end=' : ')
0 : 1 : 4 : 9 : 16 :
>>> for x in map((lambda n: n ** 2), range(5)):
        print(x, end=' : ')
0 : 1 : 4 : 9 : 16 :
However, generators can be better in terms of both memory use and performance in
larger programs. They allow functions to avoid doing all the work up front, which is
especially useful when the result lists are large or when it takes a lot of computation to
produce each value. Generators distribute the time required to produce the series of
values among loop iterations.
Moreover, for more advanced uses, generators can provide a simpler alternative to
manually saving the state between iterations in class objects—with generators, vari-
ables accessible in the function’s scopes are saved and restored automatically.1 We’ll
discuss class-based iterables in more detail in Part VI.
Generator functions are also much more broadly focused than implied so far. They can
operate on and return any type of object, and as iterables may appear in any of Chap-
ter 14’s iteration contexts, including tuple calls, enumerations, and dictionary com-
prehensions:
1. Interestingly, generator functions are also something of a “poor man’s” multithreading device—they
interleave a function’s work with that of its caller, by dividing its operation into steps run between
yields. Generators are not threads, though: the program is explicitly directed to and from the function
within a single thread of control. In one sense, threading is more general (producers can run truly
independently and post results to a queue), but generators may be simpler to code. See the footnote in
Chapter 17 for a brief introduction to Python multithreading tools. Note that because control is routed
explicitly at yield and next calls, generators are also not backtracking, but are more strongly related to
coroutines—formal concepts that are both beyond this chapter’s scope.
>>> def ups(line):
        for sub in line.split(','): # Substring generator
            yield sub.upper()
>>> tuple(ups('aaa,bbb,ccc')) # All iteration contexts
('AAA', 'BBB', 'CCC')
>>> {i: s for (i, s) in enumerate(ups('aaa,bbb,ccc'))}
{0: 'AAA', 1: 'BBB', 2: 'CCC'}
In a moment we’ll see the same assets for generator expressions—a tool that trades
function flexibility for comprehension conciseness. Later in this chapter we’ll also see
that generators can sometimes make the impossible possible, by producing compo-
nents of result sets that would be far too large to create all at once. First, though, let’s
explore some advanced generator function features.
Extended generator function protocol: send versus next
In Python 2.5, a send method was added to the generator function protocol. The send
method advances to the next item in the series of results, just like __next__, but also
provides a way for the caller to communicate with the generator, to affect its operation.
Technically, yield is now an expression form that returns the item passed to send, not
a statement (though it can be called either way—as yield X, or A = (yield X)). The
expression must be enclosed in parentheses unless it’s the only item on the right side
of the assignment statement. For example, X = yield Y is OK, as is X = (yield Y) + 42.
When this extra protocol is used, values are sent into a generator G by calling
G.send(value). The generator’s code is then resumed, and the yield expression in the
generator returns the value passed to send. If the regular G.__next__() method (or its
next(G) equivalent) is called to advance, the yield simply returns None. For example:
>>> def gen():
        for i in range(10):
            X = yield i
            print(X)
>>> G = gen()
>>> next(G) # Must call next() first, to start generator
0
>>> G.send(77) # Advance, and send value to yield expression
77
1
>>> G.send(88)
88
2
>>> next(G) # next() and X.__next__() send None
None
3
The send method can be used, for example, to code a generator that its caller can ter-
minate by sending a termination code, or redirect by passing a new position in data
being processed inside the generator.
In addition, generators in 2.5 and later also support a throw(type) method to raise an
exception inside the generator at the latest yield, and a close method that raises a
special GeneratorExit exception inside the generator to terminate the iteration entirely.
These are advanced features that we won’t delve into in more detail here; see reference
texts and Python’s standard manuals for more information, and watch for more on
exceptions in Part VII.
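Still, a minimal sketch can make their behavior concrete (this example is illustrative only): close triggers any cleanup actions coded in a finally clause, and throw delivers its exception at the paused yield:
>>> def counter(n):
        try:
            while n:
                yield n
                n -= 1
        except ValueError:
            print('reset request seen') # Response to throw()
        finally:
            print('cleaning up') # Runs on close() and normal exit
>>> G = counter(3)
>>> next(G)
3
>>> G.close() # GeneratorExit raised at the yield
cleaning up
>>> G = counter(3)
>>> next(G)
3
>>> G.throw(ValueError) # ValueError raised at the yield
reset request seen
cleaning up
Traceback (most recent call last):
...
StopIteration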
Note that while Python 3.X provides a next(X) convenience built-in that calls the
X.__next__() method of an object, other generator methods, like send, must be called
as methods of generator objects directly (e.g., G.send(X)). This makes sense if you re-
alize that these extra methods are implemented on built-in generator objects only,
whereas the __next__ method applies to all iterable objects—both built-in types and
user-defined classes.
Also note that Python 3.3 introduces an extension to yield—a from clause—that allows
generators to delegate to nested generators. Since this is an extension to what is already
a fairly advanced topic, we’ll delegate this topic itself to a sidebar, and move on here
to a tool that’s close enough to be called a twin.
Generator Expressions: Iterables Meet Comprehensions
Because the delayed evaluation of generator functions was so useful, it eventually
spread to other tools. In both Python 2.X and 3.X, the notions of iterables and list
comprehensions are combined in a new tool: generator expressions. Syntactically, gen-
erator expressions are just like normal list comprehensions, and support all their syntax
—including if filters and loop nesting—but they are enclosed in parentheses instead
of square brackets (like tuples, their enclosing parentheses are often optional):
>>> [x ** 2 for x in range(4)] # List comprehension: build a list
[0, 1, 4, 9]
>>> (x ** 2 for x in range(4)) # Generator expression: make an iterable
<generator object <genexpr> at 0x00000000029A8288>
In fact, at least on a functionality basis, coding a list comprehension is essentially the
same as wrapping a generator expression in a list built-in call to force it to produce
all its results in a list at once:
>>> list(x ** 2 for x in range(4)) # List comprehension equivalence
[0, 1, 4, 9]
Operationally, however, generator expressions are very different: instead of building
the result list in memory, they return a generator object—an automatically created
iterable. This iterable object in turn supports the iteration protocol to yield one piece
of the result list at a time in any iteration context. The iterable object also retains
generator state while active—the variable x in the preceding expressions, along with
the generator’s code location.
The net effect is much like that of generator functions, but in the context of a compre-
hension expression: we get back an object that remembers where it left off after each
part of its result is returned. Also like generator functions, looking under the hood at
the protocol that these objects automatically support can help demystify them; the
iter call is again not required at the top here, for reasons we’ll expand on ahead:
>>> G = (x ** 2 for x in range(4))
>>> iter(G) is G # iter(G) optional: __iter__ returns self
True
>>> next(G) # Generator objects: automatic methods
0
>>> next(G)
1
>>> next(G)
4
>>> next(G)
9
>>> next(G)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> G
<generator object <genexpr> at 0x00000000029A8318>
Again, we don’t typically see the next iterator machinery under the hood of a generator
expression like this because for loops trigger it for us automatically:
>>> for num in (x ** 2 for x in range(4)): # Calls next() automatically
        print('%s, %s' % (num, num / 2.0))
0, 0.0
1, 0.5
4, 2.0
9, 4.5
As we’ve already learned, every iteration context does this—including for loops; the
sum, map, and sorted built-in functions; list comprehensions; and other iteration con-
texts we learned about in Chapter 14, such as the any, all, and list built-in functions.
As iterables, generator expressions can appear in any of these iteration contexts, just
like the result of a generator function call.
For example, the following deploys generator expressions in the string join method
call and tuple assignment, iteration contexts both. In the first test here, join runs the
generator and joins the substrings it produces with nothing between—to simply con-
catenate:
>>> ''.join(x.upper() for x in 'aaa,bbb,ccc'.split(','))
'AAABBBCCC'
>>> a, b, c = (x + '\n' for x in 'aaa,bbb,ccc'.split(','))
>>> a, c
('aaa\n', 'ccc\n')
Notice how the join call in the preceding doesn’t require extra parentheses around the
generator. Syntactically, parentheses are not required around a generator expression
that is the sole item already enclosed in parentheses used for other purposes—like those
of a function call. Parentheses are required in all other cases, however, even if they seem
extra, as in the second call to sorted that follows:
>>> sum(x ** 2 for x in range(4)) # Parens optional
14
>>> sorted(x ** 2 for x in range(4)) # Parens optional
[0, 1, 4, 9]
>>> sorted((x ** 2 for x in range(4)), reverse=True) # Parens required
[9, 4, 1, 0]
Like the often-optional parentheses in tuples, there is no widely accepted rule on this;
unlike a tuple, though, a generator expression is not a fixed collection of objects, which
can make extra parentheses seem all the more spurious here.
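If you do omit the required parentheses, Python flags the error explicitly (the message shown here is from Python 3.3; its wording varies slightly across versions):
>>> sorted(x ** 2 for x in range(4), reverse=True)
SyntaxError: Generator expression must be parenthesized if not sole argument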
Why generator expressions?
Just like generator functions, generator expressions are a memory-space optimization
—they do not require the entire result list to be constructed all at once, as the square-
bracketed list comprehension does. Also like generator functions, they divide the work
of results production into smaller time slices—they yield results in piecemeal fashion,
instead of making the caller wait for the full set to be created in a single call.
On the other hand, generator expressions may also run slightly slower than list com-
prehensions in practice, so they are probably best used only for very large result sets,
or applications that cannot wait for full results generation. A more authoritative state-
ment about performance, though, will have to await the timing scripts we’ll code in the
next chapter.
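If you’d like a rough preview of that comparison, the standard library’s timeit module suffices; this is an informal sketch only, and both the absolute times and the winner’s margin vary by machine and Python version (the list comprehension often wins modestly at sizes like this):
import timeit
# Each call returns total seconds for 1,000 runs of the statement
print(timeit.timeit('[x ** 2 for x in range(1000)]', number=1000)) # List comprehension
print(timeit.timeit('list(x ** 2 for x in range(1000))', number=1000)) # Generator expression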
Though more subjective, generator expressions offer coding advantages too—as the
next sections show.
Generator expressions versus map
One way to see the coding benefits of generator expressions is to compare them to other
functional tools, as we did for list comprehensions. For example, generator expressions
often are equivalent to 3.X map calls, because both generate result items on request. Like
list comprehensions, though, generator expressions may be simpler to code when the
operation applied is not a function call. In 2.X, map makes temporary lists and generator
expressions do not, but the same coding comparisons apply:
>>> list(map(abs, (-1, -2, 3, 4))) # Map function on tuple
[1, 2, 3, 4]
>>> list(abs(x) for x in (-1, -2, 3, 4)) # Generator expression
[1, 2, 3, 4]
>>> list(map(lambda x: x * 2, (1, 2, 3, 4))) # Nonfunction case
[2, 4, 6, 8]
>>> list(x * 2 for x in (1, 2, 3, 4)) # Simpler as generator?
[2, 4, 6, 8]
The same holds true for text-processing use cases like the join call we saw earlier—a
list comprehension makes an extra temporary list of results, which is completely point-
less in this context because the list is not retained, and map loses simplicity points com-
pared to generator expression syntax when the operation being applied is not a call:
>>> line = 'aaa,bbb,ccc'
>>> ''.join([x.upper() for x in line.split(',')]) # Makes a pointless list
'AAABBBCCC'
>>> ''.join(x.upper() for x in line.split(',')) # Generates results
'AAABBBCCC'
>>> ''.join(map(str.upper, line.split(','))) # Generates results
'AAABBBCCC'
>>> ''.join(x * 2 for x in line.split(',')) # Simpler as generator?
'aaaaaabbbbbbcccccc'
>>> ''.join(map(lambda x: x * 2, line.split(',')))
'aaaaaabbbbbbcccccc'
Both map and generator expressions can also be arbitrarily nested, which supports gen-
eral use in programs, and requires a list call or other iteration context to start the
process of producing results. For example, the list comprehension in the following
produces the same result as the 3.X map and generator equivalents that follow it, but
makes two physical lists; the others generate just one integer at a time with nested
generators, and the generator expression form may more clearly reflect its intent:
>>> [x * 2 for x in [abs(x) for x in (-1, -2, 3, 4)]] # Nested comprehensions
[2, 4, 6, 8]
>>> list(map(lambda x: x * 2, map(abs, (-1, -2, 3, 4)))) # Nested maps
[2, 4, 6, 8]
>>> list(x * 2 for x in (abs(x) for x in (-1, -2, 3, 4))) # Nested generators
[2, 4, 6, 8]
Although the effect of all three of these is to combine operations, the generators do so
without making multiple temporary lists. In 3.X, the next example both nests and
combines generators—the nested generator expression is activated by map, which in
turn is only activated by list:
>>> import math
>>> list(map(math.sqrt, (x ** 2 for x in range(4)))) # Nested combinations
[0.0, 1.0, 2.0, 3.0]
Technically speaking, the range on the right in the preceding is a value generator in 3.X
too, activated by the generator expression itself—three levels of value generation, which
produce individual values from inner to outer only on request, and which “just works”
because of Python’s iteration tools and protocol. In fact, generator nestings can be
arbitrarily mixed and deep, though some may be more valid than others:
>>> list(map(abs, map(abs, map(abs, (-1, 0, 1))))) # Nesting gone bad?
[1, 0, 1]
>>> list(abs(x) for x in (abs(x) for x in (abs(x) for x in (-1, 0, 1))))
[1, 0, 1]
These last examples illustrate how general generators can be, but are also coded in an
intentionally complex form to underscore that generator expressions have the same
potential for abuse as the list comprehensions discussed earlier—as usual, you should
keep them simple unless they must be complex, a theme we’ll revisit later in this chap-
ter.
When used well, though, generator expressions combine the expressiveness of list
comprehensions with the space and time benefits of other iterables. Here, for example,
nonnested approaches provide simpler solutions but still leverage generators’ strengths
—per a Python motto, flat is generally better than nested:
>>> list(abs(x) * 2 for x in (-1, -2, 3, 4)) # Unnested equivalents
[2, 4, 6, 8]
>>> list(math.sqrt(x ** 2) for x in range(4)) # Flat is often better
[0.0, 1.0, 2.0, 3.0]
>>> list(abs(x) for x in (-1, 0, 1))
[1, 0, 1]
Generator expressions versus filter
Generator expressions also support all the usual list comprehension syntax—including
if clauses, which work like the filter call we met earlier. Because filter is an iterable
in 3.X that generates its results on request, a generator expression with an if clause is
operationally equivalent (in 2.X, filter produces a temporary list that the generator
does not, but the code comparisons again apply). Again, the join in the following
suffices to force all forms to produce their results:
>>> line = 'aa bbb c'
>>> ''.join(x for x in line.split() if len(x) > 1) # Generator with 'if'
'aabbb'
>>> ''.join(filter(lambda x: len(x) > 1, line.split())) # Similar to filter
'aabbb'
The generator seems marginally simpler than the filter here. As for list comprehen-
sions, though, adding processing steps to filter results requires a map too, which makes
filter noticeably more complex than a generator expression:
>>> ''.join(x.upper() for x in line.split() if len(x) > 1)
'AABBB'
>>> ''.join(map(str.upper, filter(lambda x: len(x) > 1, line.split())))
'AABBB'
In effect, generator expressions do for 3.X iterables like map and filter what list com-
prehensions do for the 2.X list-builder flavors of these calls—they provide more general
coding structures that do not rely on functions, but still delay results production. Also
like list comprehensions, there is always a statement-based equivalent to a generator
expression, though it sometimes renders substantially more code:
>>> ''.join(x.upper() for x in line.split() if len(x) > 1)
'AABBB'
>>> res = ''
>>> for x in line.split(): # Statement equivalent?
        if len(x) > 1: # This is also a join
            res += x.upper()
>>> res
'AABBB'
In this case, though, the statement form isn’t quite the same—it cannot produce items
one at a time, and it’s also emulating the effect of the join that forces results to be
produced all at once. The true equivalent to a generator expression would be a generator
function with a yield, as the next section shows.
Generator Functions Versus Generator Expressions
Let’s recap what we’ve covered so far in this section:
Generator functions
A function def statement that contains a yield statement is turned into a generator
function. When called, it returns a new generator object with automatic retention
of local scope and code position; an automatically created __iter__ method that
simply returns itself; and an automatically created __next__ method (next in 2.X)
that starts the function or resumes it where it last left off, and raises StopItera
tion when finished producing results.
Generator expressions
A comprehension expression enclosed in parentheses is known as a generator ex-
pression. When run, it returns a new generator object with the same automatically
created method interface and state retention as a generator function call’s results
—with an __iter__ method that simply returns itself; and a __next__ method
(next in 2.X) that starts the implied loop or resumes it where it last left off, and
raises StopIteration when finished producing results.
The net effect is to produce results on demand in iteration contexts that employ these
interfaces automatically.
As implied by some of the preceding sections, the same iteration can often be coded
with either a generator function or a generator expression. The following generator
expression, for example, repeats each character in a string four times:
>>> G = (c * 4 for c in 'SPAM') # Generator expression
>>> list(G) # Force generator to produce all results
['SSSS', 'PPPP', 'AAAA', 'MMMM']
The equivalent generator function requires slightly more code, but as a multiple-state-
ment function it will be able to code more logic and use more state information if
needed. In fact, this is essentially the same as the prior chapter’s tradeoff between
lambda and def—expression conciseness versus statement power:
>>> def timesfour(S): # Generator function
        for c in S:
            yield c * 4
>>> G = timesfour('spam')
>>> list(G) # Iterate automatically
['ssss', 'pppp', 'aaaa', 'mmmm']
To clients, the two are more similar than different. Both expressions and functions
support both automatic and manual iteration—the prior list call iterates automati-
cally, and the following iterate manually:
>>> G = (c * 4 for c in 'SPAM')
>>> I = iter(G) # Iterate manually (expression)
>>> next(I)
'SSSS'
>>> next(I)
'PPPP'
>>> G = timesfour('spam')
>>> I = iter(G) # Iterate manually (function)
>>> next(I)
'ssss'
>>> next(I)
'pppp'
In either case, Python automatically creates a generator object, which has both the
methods required by the iteration protocol, and state retention for variables in the
generator’s code and its current code location. Notice how we make new generators
here to iterate again—as explained in the next section, generators are one-shot iterators.
First, though, here’s the true statement-based equivalent of the expression at the end of
the prior section: a function that yields values—though the difference is irrelevant if
the code using it produces all results with a tool like join:
>>> line = 'aa bbb c'
>>> ''.join(x.upper() for x in line.split() if len(x) > 1) # Expression
'AABBB'
>>> def gensub(line): # Function
        for x in line.split():
            if len(x) > 1:
                yield x.upper()
>>> ''.join(gensub(line)) # But why generate?
'AABBB'
Though generators have valid roles, in cases like this the use of generators over the
simple statement equivalent shown earlier may be difficult to justify, except on stylistic
grounds. On the other hand, trading four lines for one may seem fairly compelling
stylistic grounds to many!
Generators Are Single-Iteration Objects
A subtle but important point: both generator functions and generator expressions are
their own iterators and thus support just one active iteration—unlike some built-in
types, you can’t have multiple iterators of either positioned at different locations in the
set of results. Because of this, a generator’s iterator is the generator itself; in fact, as
suggested earlier, calling iter on a generator expression or function is an optional no-
op:
>>> G = (c * 4 for c in 'SPAM')
>>> iter(G) is G # My iterator is myself: G has __next__
True
If you iterate over the results stream manually with multiple iterators, they will all point
to the same position:
>>> G = (c * 4 for c in 'SPAM') # Make a new generator
>>> I1 = iter(G) # Iterate manually
>>> next(I1)
'SSSS'
>>> next(I1)
'PPPP'
>>> I2 = iter(G) # Second iterator at same position!
>>> next(I2)
'AAAA'
Moreover, once any iteration runs to completion, all are exhausted—we have to make
a new generator to start again:
>>> list(I1) # Collect the rest of I1's items
['MMMM']
>>> next(I2) # Other iterators exhausted too
StopIteration
>>> I3 = iter(G) # Ditto for new iterators
>>> next(I3)
StopIteration
>>> I3 = iter(c * 4 for c in 'SPAM') # New generator to start over
>>> next(I3)
'SSSS'
The same holds true for generator functions—the following def statement-based equiv-
alent supports just one active iterator and is exhausted after one pass:
>>> def timesfour(S):
        for c in S:
            yield c * 4
>>> G = timesfour('spam') # Generator functions work the same way
>>> iter(G) is G
True
>>> I1, I2 = iter(G), iter(G)
>>> next(I1)
'ssss'
>>> next(I1)
'pppp'
>>> next(I2) # I2 at same position as I1
'aaaa'
This is different from the behavior of some built-in types, which support multiple iter-
ators and passes and reflect their in-place changes in active iterators:
>>> L = [1, 2, 3, 4]
>>> I1, I2 = iter(L), iter(L)
>>> next(I1)
1
>>> next(I1)
2
>>> next(I2) # Lists support multiple iterators
1
>>> del L[2:] # Changes reflected in iterators
>>> next(I1)
StopIteration
Though not readily apparent in these simple examples, this can matter in your code: if
you wish to scan a generator’s values multiple times, you must either create a new
generator for each scan or build a rescannable list out of its values—a single generator’s
values will be consumed and exhausted after a single pass. See this chapter’s sidebar
“Why You Will Care: One-Shot Iterations” on page 621 for a prime example of the
sort of code that must accommodate this generator property.
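For instance, capturing a generator’s results with list supports any number of later scans, at the cost of the memory the list requires (a quick sketch):
>>> G = (c * 4 for c in 'SPAM')
>>> L = list(G) # Save the values for rescanning
>>> L, L # Lists may be scanned many times
(['SSSS', 'PPPP', 'AAAA', 'MMMM'], ['SSSS', 'PPPP', 'AAAA', 'MMMM'])
>>> list(G) # The generator itself is now spent
[]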
When we begin coding class-based iterables in Part VI, we’ll also see that it’s up to us
to decide how many iterations we wish to support for our objects, if any. In general,
objects that wish to support multiple scans will return supplemental class objects in-
stead of themselves. The next section previews more of this model.
The Python 3.3 yield from Extension
Python 3.3 introduces extended syntax for the yield statement that allows delegation
to a subgenerator with a from generator clause. In simple cases, it’s equivalent to a
yielding for loop—the list call in the following forces the generator to produce all its
values, and the comprehension in parentheses is a generator expression, covered in this
chapter:
>>> def both(N):
        for i in range(N): yield i
        for i in (x ** 2 for x in range(N)): yield i
>>> list(both(5))
[0, 1, 2, 3, 4, 0, 1, 4, 9, 16]
The new 3.3 syntax makes this arguably more concise and explicit, and supports all
the usual generator usage contexts:
>>> def both(N):
        yield from range(N)
        yield from (x ** 2 for x in range(N))
>>> list(both(5))
[0, 1, 2, 3, 4, 0, 1, 4, 9, 16]
>>> ' : '.join(str(i) for i in both(5))
'0 : 1 : 2 : 3 : 4 : 0 : 1 : 4 : 9 : 16'
In more advanced roles, however, this extension allows subgenerators to receive sent
and thrown values directly from the calling scope, and return a final value to the outer
generator. The net effect is to allow such generators to be split into multiple subgen-
erators much as a single function can be split into multiple subfunctions.
Since this is only available in 3.3 and later, and is beyond this chapter’s generator cov-
erage in general, we’ll defer to Python 3.3’s manuals for additional details. For an ad-
ditional yield from example, also see the solution to this part’s Exercise 11 described
at the end of Chapter 21.
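Still, a small sketch can hint at the advanced roles (illustrative names only, and Python 3.3 or later assumed): a subgenerator’s return value becomes the result of the delegating yield from expression:
>>> def subtotal(vals):
        total = 0
        for v in vals:
            total += v
            yield total
        return total # 3.3+: becomes the yield from result
>>> def report(vals):
        total = yield from subtotal(vals) # Delegate, then use final value
        yield 'total: %s' % total
>>> list(report([1, 2, 3]))
[1, 3, 6, 'total: 6']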
Generation in Built-in Types, Tools, and Classes
Finally, although we’ve focused on coding value generators ourselves in this section,
don’t forget that many built-in types behave in similar ways—as we saw in Chap-
ter 14, for example, dictionaries are iterables with iterators that produce keys on each
iteration:
>>> D = {'a':1, 'b':2, 'c':3}
>>> x = iter(D)
>>> next(x)
'c'
>>> next(x)
'b'
Like the values produced by handcoded generators, dictionary keys may be iterated
over both manually and with automatic iteration tools including for loops, map calls,
list comprehensions, and the many other contexts we met in Chapter 14:
>>> for key in D:
        print(key, D[key])
c 3
b 2
a 1
As we’ve also seen, for file iterators, Python simply loads lines from the file on demand:
>>> for line in open('temp.txt'):
        print(line, end='')
Tis but
a flesh wound.
While built-in type iterables are bound to a specific type of value generation, the con-
cept is similar to the multipurpose generators we code with expressions and functions.
Iteration contexts like for loops accept any iterable that has the expected methods,
whether user-defined or built-in.
Generators and library tools: Directory walkers
Though beyond this book’s scope, many Python standard library tools generate values
today too, including email parsers, and the standard directory walker—which at each
level of a tree yields a tuple of the current directory, its subdirectories, and its files:
>>> import os
>>> for (root, subs, files) in os.walk('.'): # Directory walk generator
        for name in files: # A Python 'find' operation
            if name.startswith('call'):
                print(root, name)
. callables.py
.\dualpkg callables.py
In fact, os.walk is coded as a recursive function in Python in its os.py standard library
file, in C:\Python33\Lib on Windows. Because it uses yield (and in 3.3 yield from
instead of a for loop) to return results, it’s a normal generator function, and hence an
iterable object:
>>> G = os.walk(r'C:\code\pkg')
>>> iter(G) is G # Single-scan iterator: iter(G) optional
True
>>> I = iter(G)
>>> next(I)
('C:\\code\\pkg', ['__pycache__'], ['eggs.py', 'eggs.pyc', 'main.py', ...etc...])
>>> next(I)
('C:\\code\\pkg\\__pycache__', [], ['eggs.cpython-33.pyc', ...etc...])
>>> next(I)
StopIteration
By yielding results as it goes, the walker does not require its clients to wait for an entire
tree to be scanned. See Python’s manuals and follow-up books such as Programming
Python for more on this tool. Also see Chapter 14 and others for os.popen—a related
iterable used to run a shell command and read its output.
Generators and function application
In Chapter 18, we noted that starred arguments can unpack an iterable into individual
arguments. Now that we’ve seen generators, we can also see what this means in code.
In both 3.X and 2.X (though 2.X’s range is a list):
>>> def f(a, b, c): print('%s, %s, and %s' % (a, b, c))
>>> f(0, 1, 2) # Normal positionals
0, 1, and 2
>>> f(*range(3)) # Unpack range values: iterable in 3.X
0, 1, and 2
>>> f(*(i for i in range(3))) # Unpack generator expression values
0, 1, and 2
This applies to dictionaries and views too (though dict.values is also a list in 2.X, and
order is arbitrary when passing values by position):
>>> D = dict(a='Bob', b='dev', c=40.5); D
{'b': 'dev', 'c': 40.5, 'a': 'Bob'}
>>> f(a='Bob', b='dev', c=40.5) # Normal keywords
Bob, dev, and 40.5
>>> f(**D) # Unpack dict: key=value
Bob, dev, and 40.5
>>> f(*D) # Unpack keys iterator
b, c, and a
>>> f(*D.values()) # Unpack view iterator: iterable in 3.X
dev, 40.5, and Bob
Because the built-in print function in 3.X prints all its variable number of arguments,
this also makes the following three forms equivalent—the latter using a * to unpack
the results forced from a generator expression (though the second also creates a list of
return values, and the first may leave your cursor at the end of the output line in some
shells, but not in the IDLE GUI):
>>> for x in 'spam': print(x.upper(), end=' ')
S P A M
>>> list(print(x.upper(), end=' ') for x in 'spam')
S P A M [None, None, None, None]
>>> print(*(x.upper() for x in 'spam'))
S P A M
See Chapter 14 for an additional example that unpacks a file’s lines by iterator into
arguments.
Preview: User-defined iterables in classes
Although beyond the scope of this chapter, it is also possible to implement arbitrary
user-defined generator objects with classes that conform to the iteration protocol. Such
classes define a special __iter__ method run by the iter built-in function, which in
turn returns an object having a __next__ method (next in 2.X) run by the next built-in
function:
class SomeIterable:
    def __init__(...): ... # On iter(): return self or supplemental object
    def __next__(...): ... # On next(): coded here, or in another class
As the prior section suggested, these classes usually return their objects directly for
single-iteration behavior, or a supplemental object with scan-specific state for multiple-
scan support.
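To make this skeleton concrete, here is a minimal sketch in the same spirit (a preview only; Chapter 30 develops this pattern fully). This hypothetical Squares class is its own single-scan iterator, much like a generator object:
class Squares: # Sketch: a single-scan iterable class
    def __init__(self, stop):
        self.value = -1 # State retained explicitly
        self.stop = stop
    def __iter__(self): # On iter(): return self
        return self
    def __next__(self): # On next(): produce or stop
        if self.value == self.stop - 1:
            raise StopIteration
        self.value += 1
        return self.value ** 2

>>> list(Squares(4))
[0, 1, 4, 9]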
Alternatively, a user-defined iterable class’s method functions can sometimes use
yield to transform themselves into generators, with an automatically created
__next__ method—a common application of yield we’ll meet in Chapter 30 that is
both wildly implicit and potentially useful! A __getitem__ indexing method is also
available as a fallback option for iteration, though this is often not as flexible as the
__iter__ and __next__ scheme (but has advantages for coding sequences).
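To preview that technique too, here is an illustrative variant of the same sketch whose __iter__ is itself a generator function; because each iter call makes a new generator, this version supports multiple scans automatically:
class SquaresGen: # Sketch: __iter__ coded with yield
    def __init__(self, stop):
        self.stop = stop
    def __iter__(self): # Each iter() returns a new generator
        for value in range(self.stop):
            yield value ** 2

>>> S = SquaresGen(4)
>>> list(S), list(S) # New generator per scan: rescannable
([0, 1, 4, 9], [0, 1, 4, 9])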
The instance objects created from such a class are considered iterable and may be used
in for loops and all other iteration contexts. With classes, though, we have access to
richer logic and data structuring options, such as inheritance, that other generator
constructs cannot offer by themselves. By coding methods, classes also can make iter-
ation behavior much more explicit than the “magic” generator objects associated with
built-in types and generator functions and expressions (though classes wield some
magic of their own).
Hence, the iterator and generator story won’t really be complete until we’ve seen how
it maps to classes, too. For now, we’ll have to settle for postponing its conclusion—
and its final sequel—until we study class-based iterables in Chapter 30.
Example: Generating Scrambled Sequences
To demonstrate the power of iteration tools in action, let’s turn to some more complete
use case examples. In Chapter 18, we wrote a testing function that scrambled the order
of arguments used to test generalized intersection and union functions. There, I noted
that this might be better coded as a generator of values. Now that we’ve learned how
to write generators, this serves to illustrate a practical application.
One note up front: because they slice and concatenate objects, all the examples in the
section (including the permutations at the end) work only on sequences like strings and
lists, not on arbitrary iterables like files, maps, and other generators. That is, some of
these examples will be generators themselves, producing values on request, but they
cannot process generators as their inputs. Generalization for broader categories is left
as an open issue, though the code here will suffice unchanged if you wrap nonsequence
generators in list calls before passing them in.
Scrambling sequences
As coded in Chapter 18, we can reorder a sequence with slicing and concatenation,
moving the front item to the end on each loop; slicing instead of indexing the item
allows + to work for arbitrary sequence types:
>>> L, S = [1, 2, 3], 'spam'
>>> for i in range(len(S)): # For repeat counts 0..3
        S = S[1:] + S[:1] # Move front item to the end
        print(S, end=' ')
pams amsp mspa spam
>>> for i in range(len(L)):
        L = L[1:] + L[:1] # Slice so any sequence type works
        print(L, end=' ')
[2, 3, 1] [3, 1, 2] [1, 2, 3]
Alternatively, as we saw in Chapter 13, we get the same results by moving an entire
front section to the end, though the order of the results varies slightly:
>>> for i in range(len(S)): # For positions 0..3
        X = S[i:] + S[:i] # Rear part + front part (same effect)
        print(X, end=' ')
spam pams amsp mspa
Simple functions
As is, this code works on specific named variables only. To generalize, we can turn it
into a simple function to work on any object passed to its argument and return a result;
since the first of these exhibits the classic list comprehension pattern, we can save some
work by coding it as such in the second:
>>> def scramble(seq):
        res = []
        for i in range(len(seq)):
            res.append(seq[i:] + seq[:i])
        return res
>>> scramble('spam')
['spam', 'pams', 'amsp', 'mspa']
>>> def scramble(seq):
        return [seq[i:] + seq[:i] for i in range(len(seq))]
>>> scramble('spam')
['spam', 'pams', 'amsp', 'mspa']
>>> for x in scramble((1, 2, 3)):
        print(x, end=' ')
(1, 2, 3) (2, 3, 1) (3, 1, 2)
We could use recursion here as well, but it’s probably overkill in this context.
Generator functions
The preceding section’s simple approach works, but must build an entire result list in
memory all at once (not great on memory usage if it’s massive), and requires the caller
to wait until the entire list is complete (less than ideal if this takes a substantial amount
of time). We can do better on both fronts by translating this to a generator function that
yields one result at a time, using either coding scheme:
>>> def scramble(seq):
        for i in range(len(seq)):
            seq = seq[1:] + seq[:1] # Generator function
            yield seq # Assignments work here
>>> def scramble(seq):
        for i in range(len(seq)): # Generator function
            yield seq[i:] + seq[:i] # Yield one item per iteration
>>> list(scramble('spam')) # list() generates all results
['spam', 'pams', 'amsp', 'mspa']
>>> list(scramble((1, 2, 3))) # Any sequence type works
[(1, 2, 3), (2, 3, 1), (3, 1, 2)]
>>>
>>> for x in scramble((1, 2, 3)): # for loops generate results
        print(x, end=' ')
(1, 2, 3) (2, 3, 1) (3, 1, 2)
Generator functions retain their local scope state while active, minimize memory space
requirements, and divide the work into shorter time slices. As full functions, they are
also very general. Importantly, for loops and other iteration tools work the same
whether stepping through a real list or a generator of values—the function can select
between the two schemes freely, and even change strategies in the future.
Generator expressions
As we’ve seen, generator expressions—comprehensions in parentheses instead of
square brackets—also generate values on request and retain their local state. They’re
not as flexible as full functions, but because they yield their values automatically, ex-
pressions can often be more concise in specific use cases like this:
>>> S
'spam'
>>> G = (S[i:] + S[:i] for i in range(len(S))) # Generator expression equivalent
>>> list(G)
['spam', 'pams', 'amsp', 'mspa']
Notice that we can’t use the assignment statement of the first generator function version
here, because generator expressions cannot contain statements. This makes them a bit
narrower in scope; in many cases, though, expressions can do similar work, as shown
here. To generalize a generator expression for an arbitrary subject, wrap it in a simple
function that takes an argument and returns a generator that uses it:
>>> F = lambda seq: (seq[i:] + seq[:i] for i in range(len(seq)))
>>> F(S)
<generator object <genexpr> at 0x00000000029883F0>
>>>
>>> list(F(S))
['spam', 'pams', 'amsp', 'mspa']
>>> list(F([1, 2, 3]))
[[1, 2, 3], [2, 3, 1], [3, 1, 2]]
>>> for x in F((1, 2, 3)):
        print(x, end=' ')
(1, 2, 3) (2, 3, 1) (3, 1, 2)
Tester client
Finally, we can use either the generator function or its expression equivalent in Chap-
ter 18’s tester to produce scrambled arguments—the sequence scrambling function
becomes a tool we can use in other contexts:
# file scramble.py
def scramble(seq):
    for i in range(len(seq)): # Generator function
        yield seq[i:] + seq[:i] # Yield one item per iteration

scramble2 = lambda seq: (seq[i:] + seq[:i] for i in range(len(seq)))
And by moving the values generation out to an external tool, the tester becomes simpler:
>>> from scramble import scramble
>>> from inter2 import intersect, union
>>>
>>> def tester(func, items, trace=True):
        for args in scramble(items): # Use generator (or: scramble2(items))
            if trace: print(args)
            print(sorted(func(*args)))
>>> tester(intersect, ('aab', 'abcde', 'ababab'))
('aab', 'abcde', 'ababab')
['a', 'b']
('abcde', 'ababab', 'aab')
['a', 'b']
('ababab', 'aab', 'abcde')
['a', 'b']
>>> tester(intersect, ([1, 2], [2, 3, 4], [1, 6, 2, 7, 3]), False)
[2]
[2]
[2]
Permutations: All possible combinations
These techniques have many other real-world applications—consider generating at-
tachments in an email message or points to be plotted in a GUI. Moreover, other types
of sequence scrambles serve central roles in other applications, from searches to math-
ematics. As is, our sequence scrambler is a simple reordering, but some programs war-
rant the more exhaustive set of all possible orderings we get from permutations—pro-
duced using recursive functions in both list-builder and generator forms by the follow-
ing module file:
# File permute.py
def permute1(seq):
    if not seq: # Shuffle any sequence: list
        return [seq] # Empty sequence
    else:
        res = []
        for i in range(len(seq)):
            rest = seq[:i] + seq[i+1:] # Delete current node
            for x in permute1(rest): # Permute the others
                res.append(seq[i:i+1] + x) # Add node at front
        return res

def permute2(seq):
    if not seq: # Shuffle any sequence: generator
        yield seq # Empty sequence
    else:
        for i in range(len(seq)):
            rest = seq[:i] + seq[i+1:] # Delete current node
            for x in permute2(rest): # Permute the others
                yield seq[i:i+1] + x # Add node at front
Both of these functions produce the same results, though the second defers much of its
work until it is asked for a result. This code is a bit advanced, especially the second of
these functions (and to some Python newcomers might even be categorized as cruel
and inhumane punishment!). Still, as I’ll explain in a moment, there are cases where
the generator approach can be highly useful.
Study and test this code for more insight, and add prints to trace if it helps. If it’s still
a mystery, try to make sense of the first version first; remember that generator functions
simply return objects with methods that handle next operations run by for loops at
each level, and don’t produce any results until iterated; and trace through some of the
following examples to see how they’re handled by this code.
Permutations produce more orderings than the original shuffler—for N items, we get
N! (factorial) results instead of just N (24 for 4: 4 * 3 * 2 * 1). In fact, that’s why we need
recursion here: the number of nested loops is arbitrary, and depends on the length of
the sequence permuted:
>>> from scramble import scramble
>>> from permute import permute1, permute2
>>> list(scramble('abc')) # Simple scrambles: N
['abc', 'bca', 'cab']
>>> permute1('abc') # Permutations larger: N!
['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
>>> list(permute2('abc')) # Generate all combinations
['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
>>> G = permute2('abc') # Iterate (iter() not needed)
>>> next(G)
'abc'
>>> next(G)
'acb'
>>> for x in permute2('abc'): print(x) # Automatic iteration
...prints six lines...
The list and generator versions’ results are the same, though the generator minimizes
both space usage and delays for results. For larger items, the set of all permutations is
much larger than the simpler scrambler’s:
>>> permute1('spam') == list(permute2('spam'))
True
>>> len(list(permute2('spam'))), len(list(scramble('spam')))
(24, 4)
>>> list(scramble('spam'))
['spam', 'pams', 'amsp', 'mspa']
>>> list(permute2('spam'))
['spam', 'spma', 'sapm', 'samp', 'smpa', 'smap', 'psam', 'psma', 'pasm', 'pams',
'pmsa', 'pmas', 'aspm', 'asmp', 'apsm', 'apms', 'amsp', 'amps', 'mspa', 'msap',
'mpsa', 'mpas', 'masp', 'maps']
Per Chapter 19, there are nonrecursive alternatives here too, using explicit stacks or
queues, and other sequence orderings are common (e.g., fixed-size subsets and com-
binations that filter out duplicates of differing order), but these require coding exten-
sions we’ll forgo here. See the book Programming Python for more on this theme, or
experiment further on your own.
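Though coding such extensions is left as an exercise, note that the standard library’s itertools module already provides generator-based versions of these orderings; here’s a quick taste (see Python’s manuals for the full set):
>>> from itertools import permutations, combinations
>>> list(permutations('abc', 2)) # Fixed-size orderings: order matters
[('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]
>>> list(combinations('abc', 2)) # Subsets: duplicates of differing order filtered
[('a', 'b'), ('a', 'c'), ('b', 'c')]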
Don’t Abuse Generators: EIBTI
Generators are a somewhat advanced tool, and might be better treated as an optional
topic, but for the fact that they permeate the Python language, especially in 3.X. In fact,
they seem less optional to this book’s audience than Unicode (which was exiled to
Part VIII). As we’ve seen, fundamental built-in tools such as range, map, dictionary
keys, and even files are now generators, so you must be familiar with the concept even
if you don’t write new generators of your own. Moreover, user-defined generators are
increasingly common in Python code that you might come across today—in the Python
standard library, for instance.
In general, the same cautions I gave for list comprehensions apply here as well: don’t
complicate your code with user-defined generators if they are not warranted. Especially
for smaller programs and data sets, there may be no good reason to use these tools. In
such cases, simple lists of results will suffice, will be easier to understand, will be
garbage-collected automatically, and may be produced quicker (and they are today: see
the next chapter). Advanced tools like generators that rely on implicit “magic” can be
fun to experiment with, but they have no place in real code that must be used by others
except when clearly justified.
Or, to quote from Python’s import this motto again:
Explicit is better than implicit.
The acronym for this, EIBTI, is one of Python’s core guidelines, and for good reason:
the more explicit your code is about its behavior, the more likely it is that the next
programmer will be able to understand it. This applies directly to generators, whose
implicit behavior may very well be more difficult for some to grasp than less obscure
alternatives. Always: keep it simple unless it must be complicated!
On the other hand: Space and time, conciseness, expressiveness
That being said, there are specific use cases that generators can address well. They can
reduce memory footprint in some programs, reduce delays in others, and can occa-
sionally make the impossible possible. Consider, for example, a program that must
produce all possible permutations of a nontrivial sequence. Since the number of com-
binations is a factorial that explodes exponentially, the preceding permute1 recursive
list-builder function will either introduce a noticeable and perhaps interminable pause
or fail completely due to memory requirements, whereas the permute2 recursive gen-
erator will not—it returns each individual result quickly, and can handle very large
result sets:
>>> import math
>>> math.factorial(10) # 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1
3628800
>>> from permute import permute1, permute2
>>> seq = list(range(10))
>>> p1 = permute1(seq) # 37 seconds on a 2GHz quad-core machine
# Creates a list of 3.6M numbers
>>> len(p1), p1[0], p1[1]
(3628800, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 9, 8])
In this case, the list builder pauses for 37 seconds on my computer to build a 3.6-million-
item list, but the generator can begin returning results immediately:
>>> p2 = permute2(seq) # Returns generator immediately
>>> next(p2) # And produces each result quickly on request
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> next(p2)
[0, 1, 2, 3, 4, 5, 6, 7, 9, 8]
>>> p2 = list(permute2(seq)) # About 28 seconds, though still impractical
>>> p1 == p2 # Same set of results generated
True
Naturally, we might be able to optimize the list builder’s code to run quicker (e.g., an
explicit stack instead of recursion might change its performance), but for larger se-
quences, it’s not an option at all—at just 50 items, the number of permutations pre-
cludes building a results list, and would take far too long for mere mortals like us (and
larger values will overflow the preset recursion stack depth limit: see the preceding
chapter). The generator, however, is still viable—it is able to produce individual results
immediately:
>>> math.factorial(50)
30414093201713378043612608166064768844377641568960512000000000000
>>> p3 = permute2(list(range(50)))
>>> next(p3) # permute1 is not an option here!
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
44, 45, 46, 47, 48, 49]
For more fun—and to yield results that are more variable and less obviously determin-
istic—we could also use Python’s random module of Chapter 5 to randomly shuffle the
sequence to be permuted before the permuter begins its work. (In fact, we might be
able to use the random shuffler as a permutation generator in general, as long as we
either can assume that it won’t repeat shuffles during the time we consume them, or
test its results against prior shuffles to avoid repeats—and hope that we do not live in
the strange universe where a random sequence repeats the same result an infinite num-
ber of times!). In the following, each permute2 and next call returns immediately as
before, but a permute1 hangs:
>>> import random
>>> math.factorial(20) # permute1 is not an option here
2432902008176640000
>>> seq = list(range(20))
>>> random.shuffle(seq) # Shuffle sequence randomly first
>>> p = permute2(seq)
>>> next(p)
[10, 17, 4, 14, 11, 3, 16, 19, 12, 8, 6, 5, 2, 15, 18, 7, 1, 0, 13, 9]
>>> next(p)
[10, 17, 4, 14, 11, 3, 16, 19, 12, 8, 6, 5, 2, 15, 18, 7, 1, 0, 9, 13]
>>> random.shuffle(seq)
>>> p = permute2(seq)
>>> next(p)
[16, 1, 5, 14, 15, 12, 0, 2, 6, 19, 10, 17, 11, 18, 13, 7, 4, 9, 8, 3]
>>> next(p)
[16, 1, 5, 14, 15, 12, 0, 2, 6, 19, 10, 17, 11, 18, 13, 7, 4, 9, 3, 8]
The main point here is that generators can sometimes produce results from large sol-
ution sets when list builders cannot. Then again, it’s not clear how common such use
cases may be in the real world, and this doesn’t necessarily justify the implicit flavor of
value generation that we get with generator functions and expressions. As we’ll see in
Part VI, value generation can also be coded as iterable objects with classes. Class-based
iterables can produce items on request too, and are far more explicit than the magic
objects and methods produced for generator functions and expressions.
Part of programming is finding a balance among tradeoffs like these, and there are no
absolute rules here. While the benefits of generators may sometimes justify their use,
maintainability should always be a top priority too. Like comprehensions, generators
also offer an expressiveness and code economy that’s hard to resist if you understand
how they work—but you’ll want to weigh this against the frustration of coworkers who
might not.
Example: Emulating zip and map with Iteration Tools
To help you evaluate their roles further, let’s take a quick look at one more example of
generators in action that illustrates just how expressive they can be. Once you know
about comprehensions, generators, and other iteration tools, it turns out that emulating
many of Python’s functional built-ins is both straightforward and instructive. For ex-
ample, we’ve already seen how the built-in zip and map functions combine iterables and
project functions across them, respectively. With multiple iterable arguments, map
projects the function across items taken from each iterable in much the same way that
zip pairs them up:
>>> S1 = 'abc'
>>> S2 = 'xyz123'
>>> list(zip(S1, S2)) # zip pairs items from iterables
[('a', 'x'), ('b', 'y'), ('c', 'z')]
# zip pairs items, truncates at shortest
>>> list(zip([-2, -1, 0, 1, 2])) # Single sequence: 1-ary tuples
[(-2,), (-1,), (0,), (1,), (2,)]
>>> list(zip([1, 2, 3], [2, 3, 4, 5])) # N sequences: N-ary tuples
[(1, 2), (2, 3), (3, 4)]
# map passes paired items to function, truncates
>>> list(map(abs, [-2, -1, 0, 1, 2])) # Single sequence: 1-ary function
[2, 1, 0, 1, 2]
>>> list(map(pow, [1, 2, 3], [2, 3, 4, 5])) # N sequences: N-ary function
[1, 8, 81]
# map and zip accept arbitrary iterables
>>> list(map(lambda x, y: x + y, open('script2.py'), open('script2.py')))
['import sys\nimport sys\n', 'print(sys.path)\nprint(sys.path)\n', ...etc...]
>>> [x + y for (x, y) in zip(open('script2.py'), open('script2.py'))]
['import sys\nimport sys\n', 'print(sys.path)\nprint(sys.path)\n', ...etc...]
Though they’re being used for different purposes, if you study these examples long
enough, you might notice a relationship between zip results and mapped function
arguments that our next example can exploit.
Coding your own map(func, ...)
Although the map and zip built-ins are fast and convenient, it’s always possible to em-
ulate them in code of our own. In the preceding chapter, for example, we saw a function
that emulated the map built-in for a single sequence (or other iterable) argument. It
doesn’t take much more work to allow for multiple sequences, as the built-in does:
# map(func, seqs...) workalike with zip
def mymap(func, *seqs):
    res = []
    for args in zip(*seqs):
        res.append(func(*args))
    return res
print(mymap(abs, [-2, -1, 0, 1, 2]))
print(mymap(pow, [1, 2, 3], [2, 3, 4, 5]))
This version relies heavily upon the special *args argument-passing syntax—it collects
multiple sequence (really, iterable) arguments, unpacks them as zip arguments to com-
bine, and then unpacks the paired zip results as arguments to the passed-in function.
That is, we’re using the fact that the zipping is essentially a nested operation in mapping.
The test code at the bottom applies this to both one and two sequences to produce this
output—the same we would get with the built-in map (this code is in file mymap.py in
the book’s examples if you want to run it live):
[2, 1, 0, 1, 2]
[1, 8, 81]
Really, though, the prior version exhibits the classic list comprehension pattern, building
a list of operation results within a for loop. We can code our map more concisely as
an equivalent one-line list comprehension:
# Using a list comprehension
def mymap(func, *seqs):
    return [func(*args) for args in zip(*seqs)]
print(mymap(abs, [-2, -1, 0, 1, 2]))
print(mymap(pow, [1, 2, 3], [2, 3, 4, 5]))
When this is run the result is the same as before, but the code is more concise and might
run faster (more on performance in the section “Timing Iteration Alterna-
tives” on page 629). Both of the preceding mymap versions build result lists all at once,
though, and this can waste memory for larger lists. Now that we know about generator
functions and expressions, it’s simple to recode both these alternatives to produce results
on demand instead:
# Using generators: yield and (...)
def mymap(func, *seqs):
    for args in zip(*seqs):
        yield func(*args)

def mymap(func, *seqs):
    return (func(*args) for args in zip(*seqs))
These versions produce the same results but return generators designed to support the
iteration protocol—the first yields one result at a time, and the second returns a gen-
erator expression’s result to do the same. They produce the same results if we wrap
them in list calls to force them to produce their values all at once:
print(list(mymap(abs, [-2, -1, 0, 1, 2])))
print(list(mymap(pow, [1, 2, 3], [2, 3, 4, 5])))
No work is really done here until the list calls force the generators to run, by activating
the iteration protocol. The generators returned by these functions themselves, as well
as that returned by the Python 3.X flavor of the zip built-in they use, produce results
only on demand.
Coding your own zip(...) and map(None, ...)
Of course, much of the magic in the examples shown so far lies in their use of the zip
built-in to pair arguments from multiple sequences or iterables. Our map workalikes are
also really emulating the behavior of the Python 3.X map—they truncate at the length
of the shortest argument, and they do not support the notion of padding results when
lengths differ, as map does in Python 2.X with a None argument:
c:\code> c:\python27\python
>>> map(None, [1, 2, 3], [2, 3, 4, 5])
[(1, 2), (2, 3), (3, 4), (None, 5)]
>>> map(None, 'abc', 'xyz123')
[('a', 'x'), ('b', 'y'), ('c', 'z'), (None, '1'), (None, '2'), (None, '3')]
Using iteration tools, we can code workalikes that emulate both truncating zip and
2.X’s padding map—these turn out to be nearly the same in code:
# zip(seqs...) and 2.X map(None, seqs...) workalikes
def myzip(*seqs):
    seqs = [list(S) for S in seqs]
    res = []
    while all(seqs):
        res.append(tuple(S.pop(0) for S in seqs))
    return res

def mymapPad(*seqs, pad=None):
    seqs = [list(S) for S in seqs]
    res = []
    while any(seqs):
        res.append(tuple((S.pop(0) if S else pad) for S in seqs))
    return res
S1, S2 = 'abc', 'xyz123'
print(myzip(S1, S2))
print(mymapPad(S1, S2))
print(mymapPad(S1, S2, pad=99))
Both of the functions coded here work on any type of iterable object, because they run
their arguments through the list built-in to force result generation (e.g., files would
work as arguments, in addition to sequences like strings). Notice the use of the all and
any built-ins here—these return True if all and any items in an iterable are True (or
equivalently, nonempty), respectively. These built-ins are used to stop looping when
any or all of the listified arguments become empty after deletions.
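For reference, here is how all and any judge the listified sequences as their items are popped off (a quick interactive aside):
>>> all(['ab', 'c']), all(['ab', '']) # all: every list still nonempty?
(True, False)
>>> any(['', '']), any(['', 'c']) # any: at least one list nonempty?
(False, True)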
Also note the use of the Python 3.X keyword-only argument, pad; unlike the 2.X map,
our version will allow any pad object to be specified (if you’re using 2.X, use a
**kargs form to support this option instead; see Chapter 18 for details). When these
functions are run, the following results are printed—a zip, and two padding maps:
[('a', 'x'), ('b', 'y'), ('c', 'z')]
[('a', 'x'), ('b', 'y'), ('c', 'z'), (None, '1'), (None, '2'), (None, '3')]
[('a', 'x'), ('b', 'y'), ('c', 'z'), (99, '1'), (99, '2'), (99, '3')]
These functions aren’t amenable to list comprehension translation because their loops
are too specific. As before, though, while our zip and map workalikes currently build
and return result lists, it’s just as easy to turn them into generators with yield so that
they each return one piece of their result set at a time. The results are the same as before,
but we need to use list again to force the generators to yield their values for display:
# Using generators: yield
def myzip(*seqs):
    seqs = [list(S) for S in seqs]
    while all(seqs):
        yield tuple(S.pop(0) for S in seqs)

def mymapPad(*seqs, pad=None):
    seqs = [list(S) for S in seqs]
    while any(seqs):
        yield tuple((S.pop(0) if S else pad) for S in seqs)
S1, S2 = 'abc', 'xyz123'
print(list(myzip(S1, S2)))
print(list(mymapPad(S1, S2)))
print(list(mymapPad(S1, S2, pad=99)))
Finally, here’s an alternative implementation of our zip and map emulators—rather than
deleting arguments from lists with the pop method, the following versions do their job
by calculating the minimum and maximum argument lengths. Armed with these
lengths, it’s easy to code nested list comprehensions to step through argument index
ranges:
# Alternate implementation with lengths
def myzip(*seqs):
    minlen = min(len(S) for S in seqs)
    return [tuple(S[i] for S in seqs) for i in range(minlen)]

def mymapPad(*seqs, pad=None):
    maxlen = max(len(S) for S in seqs)
    index = range(maxlen)
    return [tuple((S[i] if len(S) > i else pad) for S in seqs) for i in index]
S1, S2 = 'abc', 'xyz123'
print(myzip(S1, S2))
print(mymapPad(S1, S2))
print(mymapPad(S1, S2, pad=99))
Because these use len and indexing, they assume that arguments are sequences or sim-
ilar, not arbitrary iterables, much like our earlier sequence scramblers and permuters.
The outer comprehensions here step through argument index ranges, and the inner
comprehensions (passed to tuple) step through the passed-in sequences to pull out
arguments in parallel. When they’re run, the results are as before.
Most strikingly, generators and iterators seem to run rampant in this example. The
arguments passed to min and max are generator expressions, which run to completion
before the nested comprehensions begin iterating. Moreover, the nested list comprehensions employ two levels of delayed evaluation—the Python 3.X range built-in is an
iterable, as is the generator expression argument to tuple.
In fact, no results are produced here until the square brackets of the list comprehensions
request values to place in the result list—they force the comprehensions and generators
to run. To turn these functions themselves into generators instead of list builders, use
parentheses instead of square brackets again. Here’s the case for our zip:
# Using generators: (...)

def myzip(*seqs):
    minlen = min(len(S) for S in seqs)
    return (tuple(S[i] for S in seqs) for i in range(minlen))

S1, S2 = 'abc', 'xyz123'
print(list(myzip(S1, S2)))      # Go!... [('a', 'x'), ('b', 'y'), ('c', 'z')]
In this case, it takes a list call to activate the generators and other iterables to produce
their results. Experiment with these on your own for more details. Developing further
coding alternatives is left as a suggested exercise (see also the sidebar “Why You Will
Care: One-Shot Iterations” on page 621 for investigation of one such option).
Watch for more yield examples in Chapter 30, where we’ll use it in
conjunction with the __iter__ operator overloading method to imple-
ment user-defined iterable objects in an automated fashion. The state
retention of local variables in this role serves as an alternative to class
attributes in the same spirit as the closure functions of Chapter 17; as
we’ll see, though, this technique combines classes and functional tools
instead of posing a paradigm alternative.
Why You Will Care: One-Shot Iterations
In Chapter 14, we saw how some built-ins (like map) support only a single traversal and
are empty after it occurs, and I promised to show you an example of how that can
become subtle but important in practice. Now that we’ve studied a few more iteration
topics, I can make good on this promise. Consider the following clever alternative cod-
ing for this chapter’s zip emulation examples, adapted from one in Python’s manuals
at the time I wrote these words:
def myzip(*args):
    iters = map(iter, args)
    while iters:
        res = [next(i) for i in iters]
        yield tuple(res)
Because this code uses iter and next, it works on any type of iterable. Note that there
is no reason to catch the StopIteration raised by the next(i) inside the comprehension
here when any one of the arguments’ iterators is exhausted—allowing it to pass ends
this generator function and has the same effect that a return statement would. The
while iters: suffices to loop if at least one argument is passed, and avoids an infinite
loop otherwise (the list comprehension would always return an empty list).
This code works fine in Python 2.X as is:
>>> list(myzip('abc', 'lmnop'))
[('a', 'l'), ('b', 'm'), ('c', 'n')]
But it falls into an infinite loop and fails in Python 3.X, because the 3.X map returns a
one-shot iterable object instead of a list as in 2.X. In 3.X, as soon as we’ve run the list
comprehension inside the loop once, iters will be exhausted but still True (and res will
be []) forever. To make this work in 3.X, we need to use the list built-in function to
create an object that can support multiple iterations:
def myzip(*args):
    iters = list(map(iter, args))       # Allow multiple scans
    ...rest as is...
Run this on your own to trace its operation. The lesson here: wrapping map calls in
list calls in 3.X is not just for display!
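To see the one-shot behavior in isolation, here is a short interactive sketch under Python 3.X:

>>> M = map(abs, [-1, -2, -3])
>>> list(M)                         # First traversal consumes the iterator
[1, 2, 3]
>>> list(M)                         # Second traversal finds it already exhausted
[]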
Comprehension Syntax Summary
We’ve been focusing on list comprehensions and generators in this chapter, but keep
in mind that there are two other comprehension expression forms available in both 3.X
and 2.7: set and dictionary comprehensions. We met these briefly in Chapter 5 and
Chapter 8, but with our new knowledge of comprehensions and generators, you should
now be able to grasp these extensions in full:
• For sets, the new literal form {1, 3, 2} is equivalent to set([1, 3, 2]), and the new set comprehension syntax {f(x) for x in S if P(x)} is like the generator expression set(f(x) for x in S if P(x)), where f(x) is an arbitrary expression.

• For dictionaries, the new dictionary comprehension syntax {key: val for (key, val) in zip(keys, vals)} works like the form dict(zip(keys, vals)), and {x: f(x) for x in items} is like the generator expression dict((x, f(x)) for x in items).
Here’s a summary of all the comprehension alternatives in 3.X and 2.7. The last two
are new and are not available in 2.6 and earlier:
>>> [x * x for x in range(10)] # List comprehension: builds list
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81] # Like list(generator expr)
>>> (x * x for x in range(10)) # Generator expression: produces items
<generator object at 0x009E7328> # Parens are often optional
>>> {x * x for x in range(10)} # Set comprehension, 3.X and 2.7
{0, 1, 4, 81, 64, 9, 16, 49, 25, 36} # {x, y} is a set in these versions too
>>> {x: x * x for x in range(10)} # Dictionary comprehension, 3.X and 2.7
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
Scopes and Comprehension Variables
Now that we’ve seen all comprehension forms, be sure to also review Chapter 17’s
overview of the localization of loop variables in these expressions. Python 3.X localizes
loop variables in all four forms—temporary loop variable names in generator, set, dictionary, and list comprehensions are local to the expression. They don't clash with
names outside, but are also not available there, and work differently than the for loop
iteration statement:
c:\code> py -3
>>> (X for X in range(5))
<generator object <genexpr> at 0x00000000028E4798>
>>> X
NameError: name 'X' is not defined
>>> X = 99
>>> [X for X in range(5)] # 3.X: generator, set, dict, and list localize
[0, 1, 2, 3, 4]
>>> X
99
>>> Y = 99
>>> for Y in range(5): pass # But loop statements do not localize names
>>> Y
4
As mentioned in Chapter 17, 3.X variables assigned in a comprehension are really a
further nested special-case scope; other names referenced within these expressions follow the usual LEGB rules. In the following generator, for example, Z is localized in the
comprehension, but Y and X are found in the enclosing local and global scopes as usual:
>>> X = 'aaa'
>>> def func():
        Y = 'bbb'
        print(''.join(Z for Z in X + Y))    # Z comprehension, Y local, X global
>>> func()
aaabbb
Python 2.X is the same in this regard, except that list comprehension variables are not
localized—they work just like for loops and keep their last iteration values, but are also
open to unexpected clashes with outside names. Generator, set, and dictionary forms
localize names as in 3.X:
c:\code> py -2
>>> (X for X in range(5))
<generator object <genexpr> at 0x0000000002147EE8>
>>> X
NameError: name 'X' is not defined
>>> X = 99
>>> [X for X in range(5)] # 2.X: List does not localize its names, like for
[0, 1, 2, 3, 4]
>>> X
4
>>> Y = 99
>>> for Y in range(5): pass # for loops do not localize names in 2.X or 3.X
>>> Y
4
If you care about version portability, and symmetry with the for loop statement, use
unique names for variables in comprehension expressions as a rule of thumb. The 2.X
behavior makes sense given that a generator object is discarded after it finishes producing results, but a list comprehension is equivalent to a for loop—though this analogy doesn't hold for the set and dictionary forms that localize their names in both
Pythons, and are, somewhat coincidentally, the topic of the next section.
Comprehending Set and Dictionary Comprehensions
In a sense, set and dictionary comprehensions are just syntactic sugar for passing generator expressions to the type names. Because both accept any iterable, a generator
works well here:
>>> {x * x for x in range(10)} # Comprehension
{0, 1, 4, 81, 64, 9, 16, 49, 25, 36}
>>> set(x * x for x in range(10)) # Generator and type name
{0, 1, 4, 81, 64, 9, 16, 49, 25, 36}
>>> {x: x * x for x in range(10)}
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
>>> dict((x, x * x) for x in range(10))
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
>>> x # Loop variable localized in 2.X + 3.X
NameError: name 'x' is not defined
As for list comprehensions, though, we can always build the result objects with manual
code, too. Here are statement-based equivalents of the last two comprehensions
(though they differ in their name localization):
>>> res = set()
>>> for x in range(10):             # Set comprehension equivalent
        res.add(x * x)

>>> res
{0, 1, 4, 81, 64, 9, 16, 49, 25, 36}

>>> res = {}
>>> for x in range(10):             # Dict comprehension equivalent
        res[x] = x * x

>>> res
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
>>> x # Localized in comprehension expressions, but not in loop statements
9
Notice that although both set and dictionary comprehensions accept and scan iterables, they have no notion of generating results on demand—both forms build complete objects all at once. If you mean to produce keys and values upon request, a generator
expression is more appropriate:
>>> G = ((x, x * x) for x in range(10))
>>> next(G)
(0, 0)
>>> next(G)
(1, 1)
Extended Comprehension Syntax for Sets and Dictionaries
Like list comprehensions and generator expressions, both set and dictionary comprehensions support associated if clauses to filter items out of the result—the following collects squares of even items (i.e., items having no remainder for division by 2) in a range:
>>> [x * x for x in range(10) if x % 2 == 0] # Lists are ordered
[0, 4, 16, 36, 64]
>>> {x * x for x in range(10) if x % 2 == 0} # But sets are not
{0, 16, 4, 64, 36}
>>> {x: x * x for x in range(10) if x % 2 == 0} # Neither are dict keys
{0: 0, 8: 64, 2: 4, 4: 16, 6: 36}
Nested for loops work as well, though the unordered and no-duplicates nature of both
types of objects can make the results a bit less straightforward to decipher:
>>> [x + y for x in [1, 2, 3] for y in [4, 5, 6]] # Lists keep duplicates
[5, 6, 7, 6, 7, 8, 7, 8, 9]
>>> {x + y for x in [1, 2, 3] for y in [4, 5, 6]} # But sets do not
{8, 9, 5, 6, 7}
>>> {x: y for x in [1, 2, 3] for y in [4, 5, 6]} # Neither do dict keys
{1: 6, 2: 6, 3: 6}
Like list comprehensions, the set and dictionary varieties can also iterate over any type
of iterable—lists, strings, files, ranges, and anything else that supports the iteration
protocol:
>>> {x + y for x in 'ab' for y in 'cd'}
{'ac', 'bd', 'bc', 'ad'}
>>> {x + y: (ord(x), ord(y)) for x in 'ab' for y in 'cd'}
{'ac': (97, 99), 'bd': (98, 100), 'bc': (98, 99), 'ad': (97, 100)}
>>> {k * 2 for k in ['spam', 'ham', 'sausage'] if k[0] == 's'}
{'sausagesausage', 'spamspam'}
>>> {k.upper(): k * 2 for k in ['spam', 'ham', 'sausage'] if k[0] == 's'}
{'SAUSAGE': 'sausagesausage', 'SPAM': 'spamspam'}
For more details, experiment with these tools on your own. They may or may not have
a performance advantage over the generator or for loop alternatives, but we would
have to time their performance explicitly to be sure—which seems a natural segue to
the next chapter.
Chapter Summary
This chapter wrapped up our coverage of built-in comprehension and iteration tools.
It explored list comprehensions in the context of functional tools, and presented generator functions and expressions as additional iteration protocol tools. As a finale, we
also summarized the four forms of comprehension in Python today—list, generator,
set, and dictionary. Though we’ve now seen all the built-in iteration tools, the subject
will resurface when we study user-defined iterable class objects in Chapter 30.
The next chapter is something of a continuation of the theme of this one—it rounds
out this part of the book with a case study that times the performance of the tools we’ve
studied here, and serves as a more realistic example at the midpoint in this book. Before
we move ahead to benchmarking comprehensions and generators, though, this chapter's quizzes give you a chance to review what you've learned about them here.
Test Your Knowledge: Quiz
1. What is the difference between enclosing a list comprehension in square brackets
and parentheses?
2. How are generators and iterators related?
3. How can you tell if a function is a generator function?
4. What does a yield statement do?
5. How are map calls and list comprehensions related? Compare and contrast the two.
Test Your Knowledge: Answers
1. List comprehensions in square brackets produce the result list all at once in memory. When they are enclosed in parentheses instead, they are actually generator
expressions—they have a similar meaning but do not produce the result list all at
once. Instead, generator expressions return a generator object, which yields one
item in the result at a time when used in an iteration context.
2. Generators are iterable objects that support the iteration protocol automatically—
they have an iterator with a __next__ method (next in 2.X) that repeatedly advances
to the next item in a series of results and raises an exception at the end of the series.
In Python, we can code generator functions with def and yield, generator expressions with parenthesized comprehensions, and generator objects with classes that
define a special method named __iter__ (discussed later in the book).
3. A generator function has a yield statement somewhere in its code. Generator
functions are otherwise identical to normal functions syntactically, but they are
compiled specially by Python so as to return an iterable generator object when
called. That object retains state and code location between values.
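For instance, the presence of yield alone changes what a call returns; a brief interactive sketch (the squares function here is ours, for illustration only):

>>> def squares(n):
        for i in range(n):
            yield i ** 2

>>> squares(3)                      # Calling makes a generator; no code has run yet
<generator object squares at 0x...>
>>> list(squares(3))                # Iteration runs the function's code
[0, 1, 4]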
4. When present, this statement makes Python compile the function specially as a
generator; when called, the function returns a generator object that supports the
iteration protocol. When the yield statement is run, it sends a result back to the
caller and suspends the function’s state; the function can then be resumed after the
last yield statement, in response to a next built-in or __next__ method call issued
by the caller. In more advanced roles, the generator send method similarly resumes
the generator, but can also pass a value that shows up as the yield expression’s
value. Generator functions may also have a return statement, which terminates the
generator.
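A minimal illustration of the send protocol described here, as an interactive sketch (echo is not one of the book's examples):

>>> def echo():
        while True:
            received = yield 'ready'        # send's value becomes the yield's result
            print('got', received)

>>> G = echo()
>>> next(G)                                 # Run ahead to the first yield
'ready'
>>> G.send(42)                              # Resume; 42 shows up as the yield value
got 42
'ready'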
5. The map call is similar to a list comprehension—both produce a series of values, by
collecting the results of applying an operation to each item in a sequence or other
iterable, one item at a time. The primary difference is that map applies a function
call to each item, and list comprehensions apply arbitrary expressions. Because of
this, list comprehensions are more general; they can apply a function call expression just as map does, but map requires a function in order to apply other kinds of expressions. List
comprehensions also support extended syntax such as nested for loops and if
clauses that subsume the filter built-in. In Python 3.X, map also differs in that it
produces a generator of values; the list comprehension materializes the result list
in memory all at once. In 2.X, both tools create result lists.
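To recap the contrast in code form, a brief sketch:

>>> list(map(abs, [-2, -1, 0, 1]))                   # map: a function is required
[2, 1, 0, 1]
>>> [abs(x) for x in [-2, -1, 0, 1]]                 # Comprehension: any expression works
[2, 1, 0, 1]
>>> [x ** 2 for x in [-2, -1, 0, 1] if x > 0]        # Plus if filters and nested fors
[1]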
CHAPTER 21
The Benchmarking Interlude
Now that we know about coding functions and iteration tools, we’re going to take a
short side trip to put both of them to work. This chapter closes out the function part
of this book with a larger case study that times the relative performance of the iteration
tools we’ve met so far.
Along the way, this case study surveys Python's code timing tools, discusses benchmarking techniques in general, and allows us to explore code that's a bit more realistic
and useful than most of what we’ve seen up to this point. We’ll also measure the speed
of current Python implementations—a data point that may or may not be significant,
depending on the type of code you write.
Finally, because this is the last chapter in this part of the book, we’ll close with the
usual sets of “gotchas” and exercises to help you start coding the ideas you’ve read
about. First, though, let’s have some fun with a tangible Python application.
Timing Iteration Alternatives
We’ve met quite a few iteration alternatives in this book. Like much in programming,
they represent tradeoffs—in terms of both subjective factors like expressiveness, and
more objective criteria such as performance. Part of your job as a programmer and
engineer is selecting tools based on factors like these.
In terms of performance, I've mentioned a few times that list comprehensions sometimes have a speed advantage over for loop statements, and that map calls can be faster or slower than both depending on call patterns. The generator functions and expressions of the preceding chapter tend to be slightly slower than list comprehensions,
though they minimize memory space requirements and don’t delay result generation.
All that is generally true today, but relative performance can vary over time because
Python’s internals are constantly being changed and optimized, and code structure can
influence speed arbitrarily. If you want to verify their performance for yourself, you
need to time these alternatives on your own computer and your own version of Python.
Timing Module: Homegrown
Luckily, Python makes it easy to time code. For example, to get the total time taken to
run multiple calls to a function with arbitrary positional arguments, the following first-cut function might suffice:
# File timer0.py
import time

def timer(func, *args):                 # Simplistic timing function
    start = time.clock()
    for i in range(1000):
        func(*args)
    return time.clock() - start         # Total elapsed time in seconds
This works—it fetches time values from Python's time module, and subtracts the system start time from the stop time after running 1,000 calls to the passed-in function
with the passed-in arguments. On my computer in Python 3.3:
>>> from timer0 import timer
>>> timer(pow, 2, 1000) # Time to call pow(2, 1000) 1000 times
0.00296260674205626
>>> timer(str.upper, 'spam') # Time to call 'spam'.upper() 1000 times
0.0005165746166859719
Though simple, this timer is also fairly limited, and deliberately exhibits some classic
mistakes in both function design and benchmarking. Among these, it:
Doesn’t support keyword arguments in the tested function call
Hardcodes the repetitions count
Charges the cost of range to the tested function’s time
Always uses time.clock, which might not be best outside Windows
Doesn’t give callers a way to verify that the tested function actually worked
Only gives total time, which might fluctuate on some heavily loaded machines
In other words, timing code is more complex than you might expect! To be more general and accurate, let's expand this into still-simple but more useful timer utility functions we can use both to see how iteration alternatives stack up now, and apply to
other timing needs in the future. These functions are coded in a module file so they can
be used in a variety of programs, and have docstrings giving some basic details that
PyDoc can display on request—see Figure 15-2 in Chapter 15 for a screenshot of the
documentation pages rendered for the timing modules we’re coding here:
# File timer.py
"""
Homegrown timing tools for function calls.
Does total time, best-of time, and best-of-totals time
"""
import time, sys
timer = time.clock if sys.platform[:3] == 'win' else time.time

def total(reps, func, *pargs, **kargs):
    """
    Total time to run func() reps times.
    Returns (total time, last result)
    """
    repslist = list(range(reps))            # Hoist out, equalize 2.x, 3.x
    start = timer()                         # Or perf_counter/other in 3.3+
    for i in repslist:
        ret = func(*pargs, **kargs)
    elapsed = timer() - start
    return (elapsed, ret)

def bestof(reps, func, *pargs, **kargs):
    """
    Quickest func() among reps runs.
    Returns (best time, last result)
    """
    best = 2 ** 32                          # 136 years seems large enough
    for i in range(reps):                   # range usage not timed here
        start = timer()
        ret = func(*pargs, **kargs)
        elapsed = timer() - start           # Or call total() with reps=1
        if elapsed < best: best = elapsed   # Or add to list and take min()
    return (best, ret)

def bestoftotal(reps1, reps2, func, *pargs, **kargs):
    """
    Best of totals:
    (best of reps1 runs of (total of reps2 runs of func))
    """
    return bestof(reps1, total, reps2, func, *pargs, **kargs)
Operationally, this module implements both total time and best time calls, and a nested
best of totals that combines the other two. In each, it times a call to any function with
any positional and keyword arguments passed individually, by fetching the start time,
calling the function, and subtracting the start time from the stop time. Points to notice
about how this version addresses the shortcomings of its predecessor:
Python’s time module gives access to the current time, with precision that varies
per platform. On Windows its clock function is claimed to give microsecond gran-
ularity and so is very accurate. Because the time function may be better on Unix,
this script selects between them automatically based on the platform string in the
sys module; it starts with “win” if running in Windows. See also the sidebar “New
Timer Calls in 3.3” on page 633 on other time options in 3.3 and later not used
here for portability; we will also be timing Python 2.X where these newer calls are
not available, and their results on Windows appear similar in 3.3 in any event.
The range call is hoisted out of the timing loop in the total function, so its con-
struction cost is not charged to the timed function in Python 2.X. In 3.X range is
an iterable, so this step is neither required nor harmful, but we still run the result
through list so its traversal cost is the same in both 2.X and 3.X. This doesn’t
apply to the bestof function, since no range factors are charged to the test’s time.
Timing Iteration Alternatives | 631
www.it-ebooks.info
The reps count is passed in as an argument, before the test function and its argu-
ments, to allow repetition to vary per call.
Any number of both positional and keyword arguments are collected with starred-
argument syntax, so they must be sent individually, not in a sequence or dictionary.
If needed, callers can unpack argument collections into individual arguments with
stars in the call, as done by the bestoftotal function at the end. See Chapter 18 for
a refresher if this code doesn’t make sense.
The first function in this module returns total elapsed time for all calls in a tuple,
along with the timed function’s final return value so callers can verify its operation.
The second function does similar, but returns the best (minimum) time among all
calls instead of the total—more useful if you wish to filter out the impacts of other
activity on your computer, but less for tests that run too quickly to produce sub-
stantial runtimes.
To address the prior point, the last function in this file runs nested total tests within
a best-of test, to get the best-of-totals time. The nested total operation can make
runtimes more useful, but we still get the best-of filter. This function’s code may
be easier to understand if you remember that every function is a passable object,
even the testing functions themselves.
From a larger perspective, because these functions are coded in a module file, they
become generally useful tools anywhere we wish to import them. Modules and imports
were introduced in Chapter 3, and you’ll learn more about them in the next part of this
book; for now, simply import the module and call the function to use one of this file’s
timers. In simple usage, this module is similar to its predecessor, but will be more robust
in larger contexts. In Python 3.3 again:
>>> import timer
>>> timer.total(1000, pow, 2, 1000)[0] # Compare to timer0 results above
0.0029542985410557776
>>> timer.total(1000, str.upper, 'spam') # Returns (time, last call's result)
(0.000504845391709686, 'SPAM')
>>> timer.bestof(1000, str.upper, 'spam') # 1/1000 as long as total time
(4.887177027512735e-07, 'SPAM')
>>> timer.bestof(1000, pow, 2, 1000000)[0]
0.00393515497972885
>>> timer.bestof(50, timer.total, 1000, str.upper, 'spam')
(0.0005468751145372153, (0.0005004469323637295, 'SPAM'))
>>> timer.bestoftotal(50, 1000, str.upper, 'spam')
(0.000566912540591602, (0.0005195069228989269, 'SPAM'))
The last two calls here calculate the best-of-totals times—the lowest time among 50
runs, each of which computes the total time to call str.upper 1,000 times (roughly
corresponding to the total times at the start of this listing). The function used in the
last call is really just a convenience that maps to the call form preceding it; both return
the best-of tuple, which embeds the last total call’s result tuple.
632 | Chapter 21:The Benchmarking Interlude
www.it-ebooks.info
Compare these last two results to the following generator-based alternative:
>>> min(timer.total(1000, str.upper, 'spam') for i in range(50))
(0.0005155971812769167, 'SPAM')
Taking the min of an iteration of total results this way has a similar effect because the
times in the result tuples dominate comparisons made by min (they are leftmost in the
tuple). We could use this in our module too (and will in later variations); it varies slightly
by omitting a very small overhead in the best-of function’s code and not nesting result
tuples, though either result suffices for relative comparisons. As is, the best-of function
must pick a high initial lowest time value—though 136 years is probably longer than
most of the tests you’re likely to run!
>>> ((((2 ** 32) / 60) / 60) / 24) / 365 # Plus a few extra days
136.19251953323186
>>> ((((2 ** 32) // 60) // 60) // 24) // 365 # Floor: see Chapter 5
136
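As the inline comments in bestof suggest, another way to avoid choosing a large initial time is to collect every run's elapsed time in a list and reduce with min afterward. The following is a minimal sketch of that variant, using this module's timer; the name bestof2 is ours, not the module's:

def bestof2(reps, func, *pargs, **kargs):
    times = []                              # Collect each run's elapsed time
    for i in range(reps):
        start = timer()
        ret = func(*pargs, **kargs)
        times.append(timer() - start)       # Reduce with min at the end
    return (min(times), ret)

This trades a bit of memory for the times list, but drops the magic 2 ** 32 seed.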
New Timer Calls in 3.3
This section uses the time module’s clock and time calls because they apply to all readers
of this book. Python 3.3 introduces new interfaces in this module that are designed to
be more portable. Specifically, the behavior of this module’s clock and time calls varies
per platform, but its new perf_counter and process_time functions have well-defined
and platform-neutral semantics:
• time.perf_counter() returns the value in fractional seconds of a performance counter, defined as a clock with the highest available resolution to measure a short duration. It includes time elapsed during sleep states and is system-wide.

• time.process_time() returns the value in fractional seconds of the sum of the system and user CPU time of the current process. It does not include time elapsed during sleep, and is process-wide by definition.
For both of these calls, the reference point of the returned value is undefined, so that
only the difference between the results of consecutive calls is valid. The perf_counter
call can be thought of as wall time, and as of Python 3.3 is used by default for benchmarking in the timeit module discussed ahead; process_time gives CPU time portably.
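For example, usage follows the same subtract-two-readings pattern as the older calls; a minimal sketch, assuming Python 3.3 or later:

import time
start = time.perf_counter()
result = sum(x * 2 for x in range(100000))      # The code being timed
elapsed = time.perf_counter() - start           # Only this difference is meaningful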
The time.clock call is still usable on Windows today, as shown in this book. It is doc-
umented as being deprecated in 3.3’s manuals, but issues no warning when used there
—meaning it may or may not become officially deprecated in later releases. If needed,
you can detect a Python 3.3 or later with code like this, which I opted to not use for
the sake of brevity and timer comparability:
if sys.version_info[0] >= 3 and sys.version_info[1] >= 3:
    timer = time.perf_counter               # or process_time
else:
    timer = time.clock if sys.platform[:3] == 'win' else time.time
Alternatively, the following code would also add portability and insulate you from
future deprecations, though it depends on exception topics we haven’t studied in full
yet, and its choices may also make cross-version speed comparisons invalid—timers
may differ in resolution!
try:
    timer = time.perf_counter               # or process_time
except AttributeError:
    timer = time.clock if sys.platform[:3] == 'win' else time.time
If I were writing this book for Python 3.3+ readers only, I’d use the new and apparently
improved calls here, and you should in your work too if they apply to you. The newer
calls won’t work for users of any other Pythons, though, and that’s still the majority of
the Python world today. It would be easier to pretend that the past doesn’t matter, but
that would not only be evasive of reality, it might also be just plain rude.
Timing Script
Now, to time iteration tool speed (our original goal), run the following script—it uses
the timer module we wrote to time the relative speeds of the list construction techniques
we’ve studied:
# File timeseqs.py
"Test the relative speed of iteration tool alternatives."

import sys, timer                           # Import timer functions
reps = 10000
repslist = list(range(reps))                # Hoist out, list in both 2.X/3.X

def forLoop():
    res = []
    for x in repslist:
        res.append(abs(x))
    return res

def listComp():
    return [abs(x) for x in repslist]

def mapCall():
    return list(map(abs, repslist))         # Use list() here in 3.X only!
    # return map(abs, repslist)

def genExpr():
    return list(abs(x) for x in repslist)   # list() required to force results

def genFunc():
    def gen():
        for x in repslist:
            yield abs(x)
    return list(gen())                      # list() required to force results

print(sys.version)
for test in (forLoop, listComp, mapCall, genExpr, genFunc):
    (bestof, (total, result)) = timer.bestoftotal(5, 1000, test)
    print('%-9s: %.5f => [%s...%s]' %
          (test.__name__, bestof, result[0], result[-1]))
This script tests five alternative ways to build lists of results. As shown, its reported
times reflect on the order of 10 million steps for each of the five test functions—each
builds a list of 10,000 items 1,000 times. This process is repeated 5 times to get the
best-of time for each of the 5 test functions, yielding a whopping 250 million total steps
for the script at large (impressive but reasonable on most machines these days).
Notice how we have to run the results of the generator expression and function through
the built-in list call to force them to yield all of their values; if we did not, in both 2.X
and 3.X we would just produce generators that never do any real work. In Python 3.X
only we must do the same for the map result, since it is now an iterable object as well;
for 2.X, the list around map must be removed manually to avoid charging an extra list
construction overhead per test (though its impact seems negligible in most tests).
In a similar way, the inner loops’ range result is hoisted out to the top of the module
to remove its construction cost from total time, and wrapped in a list call so that its
traversal cost isn’t skewed by being a generator in 3.X only (much as we did in the timer
module too). This may be overshadowed by the cost of the inner iterations loop, but
it’s best to remove as many variables as we can.
Also notice how the code at the bottom steps through a tuple of five function objects and prints the __name__ of each: as we've seen, this is a built-in attribute that gives a function's name.1

1. A preview: notice how we must pass functions into the timer manually here. In Chapter 39 and Chapter 40 we'll see decorator-based timer alternatives with which timed functions are called normally, but require extra "@" syntax where defined. Decorators may be more useful to instrument functions with timing logic when they are already being used within a larger system, and don't as easily support the more isolated test call patterns assumed here—when decorated, every call to the function runs the timing logic, which is either a plus or minus depending on your goals.
Timing Results
When the script of the prior section is run under Python 3.3, I get these results on my
Windows 7 laptop—map is slightly faster than list comprehensions, both are quicker
than for loops, and generator expressions and functions place in the middle (times here
are total time in seconds):
C:\code> c:\python33\python timeseqs.py
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
forLoop : 1.33290 => [0...9999]
listComp : 0.69658 => [0...9999]
mapCall : 0.56483 => [0...9999]
genExpr : 1.08457 => [0...9999]
genFunc : 1.07623 => [0...9999]
If you study this code and its output long enough, you'll notice that generator expressions run slower than list comprehensions today. Although wrapping a generator expression in a list call makes it functionally equivalent to a square-bracketed list comprehension, the internal implementations of the two expressions appear to differ (though we're also effectively timing the list call for the generator test):

    return [abs(x) for x in repslist]        # 0.69 seconds
    return list(abs(x) for x in repslist)    # 1.08 seconds: differs internally
Though the exact cause would require deeper analysis (and possibly source code study),
this seems to make sense given that the generator expression must do extra work to
save and restore its state during value production; the list comprehension does not, and
runs quicker by a small constant here and in later tests.
Interestingly, when I ran this on Windows Vista under Python 3.0 for the fourth edition
of this book, and on Windows XP with Python 2.5 for the third, the results were relatively similar—list comprehensions were nearly twice as fast as equivalent for loop
statements, and map was slightly quicker than list comprehensions when mapping a
function such as the abs (absolute value) built-in this way. Python 2.5’s absolute times
were roughly four to five times slower than the current 3.3 output, but this likely reflects
quicker laptops much more than any improvements in Python.
In fact, most of the Python 2.7 results for this script are slightly quicker than 3.3 on this
same machine today—I removed the list call from the map test in the following to avoid
creating the results list twice in that test, though it adds only a very small constant time
if left in:
c:\code> c:\python27\python timeseqs.py
2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)]
forLoop : 1.24902 => [0...9999]
listComp : 0.66970 => [0...9999]
mapCall : 0.57018 => [0...9999]
genExpr : 0.90339 => [0...9999]
genFunc : 0.90542 => [0...9999]
For comparison, following are the same tests’ speed results under the current PyPy, the
optimized Python implementation discussed in Chapter 2, whose current 1.9 release
implements the Python 2.7 language. PyPy is roughly 10X (an order of magnitude)
quicker here; it will do even better when we revisit Python version comparisons later
in this chapter using tools with different code structures (though it will lose on a few
other tests as well):
c:\code> c:\PyPy\pypy-1.9\pypy.exe timeseqs.py
2.7.2 (341e1e3821ff, Jun 07 2012, 15:43:00)
[PyPy 1.9.0 with MSC v.1500 32 bit]
forLoop : 0.10106 => [0...9999]
listComp : 0.05629 => [0...9999]
mapCall : 0.10022 => [0...9999]
genExpr : 0.17234 => [0...9999]
genFunc : 0.17519 => [0...9999]
On PyPy alone, list comprehensions beat map in this test, but the fact that all of PyPy’s
results are so much quicker today seems the larger point here. On CPython, map is still
quickest so far.
The impact of function calls: map
Watch what happens, though, if we change this script to perform an inline operation
on each iteration, such as addition, instead of calling a built-in function like abs (the
omitted parts of the following file are the same as before, and I put list back in around
map for testing on 3.3 only):
# File timeseqs2.py (differing parts)
...

def forLoop():
    res = []
    for x in repslist:
        res.append(x + 10)
    return res

def listComp():
    return [x + 10 for x in repslist]

def mapCall():
    return list(map((lambda x: x + 10), repslist))  # list() in 3.X only

def genExpr():
    return list(x + 10 for x in repslist)           # list() in 2.X + 3.X

def genFunc():
    def gen():
        for x in repslist:
            yield x + 10
    return list(gen())                              # list in 2.X + 3.X
...
Now the need to call a user-defined function for the map call makes it slower than the
for loop statements, despite the fact that the looping statements version is larger in
terms of code—or equivalently, the removal of function calls may make the others
quicker (more on this in an upcoming note). On Python 3.3:
c:\code> c:\python33\python timeseqs2.py
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
forLoop : 1.35136 => [10...10009]
listComp : 0.73730 => [10...10009]
mapCall : 1.68588 => [10...10009]
genExpr : 1.10963 => [10...10009]
genFunc : 1.11074 => [10...10009]
These results have also been consistent in CPython. The prior edition’s Python 3.0
results on a slower machine were again relatively similar, though about twice as slow
due to test machine differences (Python 2.5 results on an even slower machine were
again four to five times as slow as the current results).
Because the interpreter optimizes so much internally, performance analysis of Python
code like this is a very tricky affair. Without numbers, though, it’s virtually impossible
to guess which method will perform the best—the best you can do is time your own
code, on your computer, with your version of Python.
In this case, what we can say for certain is that on this Python, using a user-defined
function in map calls seems to slow performance substantially (though + may also be
slower than a trivial abs), and that list comprehensions run quickest in this case (though
slower than map in some others). List comprehensions seem consistently twice as fast
as for loops, but even this must be qualified—the list comprehension’s relative speed
might be affected by its extra syntax (e.g., if filters), Python changes, and usage modes
we did not time here.
As I’ve mentioned before, however, performance should not be your primary concern
when writing Python code—the first thing you should do to optimize Python code is
to not optimize Python code! Write for readability and simplicity first, then optimize
later, if and only if needed. It could very well be that any of the five alternatives is quick
enough for the data sets your program needs to process; if so, program clarity should
be the chief goal.
For deeper truth, change this code to apply a simple user-defined function in all five iteration techniques timed. For instance (from timeseqs2B.py of the book's examples):
def F(x): return x

def listComp():
    return [F(x) for x in repslist]

def mapCall():
    return list(map(F, repslist))
The results, in file timeseqs-results.txt, are then relatively similar to using
a built-in function like abs—at least in CPython, map is quickest. More
generally, among the five iteration techniques, map is fastest today if all
five call any function, built in or not, but slowest when the others do not.
That is, map appears to be slower simply because it requires function
calls, and function calls are relatively slow in general. Since map can’t
avoid calling functions, it can lose simply by association! The other iteration tools win because they can operate without function calls. We'll
prove this finding in tests run under the timeit module ahead.
Timing Module Alternatives
The timing module of the preceding section works, but it could be a bit more user-friendly. Most obviously, its functions require passing in a repetitions count as a first
argument, and provide no default for it—a minor point, perhaps, but less than ideal in
a general-purpose tool. We could also leverage the min technique we saw earlier to
simplify the return value slightly and remove a minor overhead charge.
The following implements an alternative timer module that addresses these points,
allowing the repeat count to be passed in as a keyword argument named _reps:
# File timer2.py (2.X and 3.X)
"""
total(spam, 1, 2, a=3, b=4, _reps=1000) calls and times spam(1, 2, a=3, b=4)
_reps times, and returns total time for all runs, with final result.

bestof(spam, 1, 2, a=3, b=4, _reps=5) runs best-of-N timer to attempt to
filter out system load variation, and returns best time among _reps tests.

bestoftotal(spam, 1, 2, a=3, b=4, _reps1=5, _reps=1000) runs best-of-totals
test, which takes the best among _reps1 runs of (the total of _reps runs).
"""
import time, sys
timer = time.clock if sys.platform[:3] == 'win' else time.time

def total(func, *pargs, **kargs):
    _reps = kargs.pop('_reps', 1000)        # Passed-in or default reps
    repslist = list(range(_reps))           # Hoist range out for 2.X lists
    start = timer()
    for i in repslist:
        ret = func(*pargs, **kargs)
    elapsed = timer() - start
    return (elapsed, ret)

def bestof(func, *pargs, **kargs):
    _reps = kargs.pop('_reps', 5)
    best = 2 ** 32
    for i in range(_reps):
        start = timer()
        ret = func(*pargs, **kargs)
        elapsed = timer() - start
        if elapsed < best: best = elapsed
    return (best, ret)

def bestoftotal(func, *pargs, **kargs):
    _reps1 = kargs.pop('_reps1', 5)
    return min(total(func, *pargs, **kargs) for i in range(_reps1))
This module’s docstring at the top of the file describes its intended usage. It uses dic-
tionary pop operations to remove the _reps argument from arguments intended for the
test function and provide it with a default (it has an unusual name to avoid clashing
with real keyword arguments meant for the function being timed).
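The pop technique itself is straightforward; a quick interactive sketch of its role here (display order of the dictionary may vary):

>>> kargs = dict(c=3, d=4, _reps=10)
>>> kargs.pop('_reps', 1000)                # Remove and return if present...
10
>>> kargs.pop('_reps', 1000)                # ...else return the default
1000
>>> kargs                                   # _reps never reaches the timed function
{'c': 3, 'd': 4}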
Notice how the best of totals here uses the min and generator scheme we saw earlier
instead of nested calls, in part because this simplifies results and avoids a minor time
overhead in the prior version (whose code fetches best of time after total time has been
computed), but also because it must support two distinct repetition keywords with
defaults—total and bestof can’t both use the same argument name. Add argument
prints in the code if it would help to trace its operation.
To test with this new timer module, you can change the timing scripts as follows, or
use the precoded version in the book’s examples file timeseqs_timer2.py; the results are
essentially the same as before (this is primarily just an API change), so I won’t list them
again here:
import sys, timer2
...

for test in (forLoop, listComp, mapCall, genExpr, genFunc):
    (total, result) = timer2.bestoftotal(test, _reps1=5, _reps=1000)
    # Or:
    # (total, result) = timer2.bestoftotal(test)
    # (total, result) = timer2.bestof(test, _reps=5)
    # (total, result) = timer2.total(test, _reps=1000)
    # (bestof, (total, result)) = timer2.bestof(timer2.total, test, _reps=5)
    print('%-9s: %.5f => [%s...%s]' %
          (test.__name__, total, result[0], result[-1]))
You can also run a few interactive tests as we did for the original version—the results
are again essentially the same as before, but we pass in the repetition counts as keywords
that provide defaults if omitted; in Python 3.3:
>>> from timer2 import total, bestof, bestoftotal
>>> total(pow, 2, 1000)[0] # 2 ** 1000, 1K dflt reps
0.0029562534118596773
>>> total(pow, 2, 1000, _reps=1000)[0] # 2 ** 1000, 1K reps
0.0029733585316193967
>>> total(pow, 2, 1000, _reps=1000000)[0] # 2 ** 1000, 1M reps
1.2451676814889865
>>> bestof(pow, 2, 100000)[0] # 2 ** 100K, 5 dflt reps
0.0007550688578703557
>>> bestof(pow, 2, 1000000, _reps=30)[0] # 2 ** 1M, best of 30
0.004040229286800923
>>> bestoftotal(str.upper, 'spam', _reps1=30, _reps=1000) # Best of 30, tot of 1K
(0.0004945823198454491, 'SPAM')
>>> bestof(total, str.upper, 'spam', _reps=30) # Nested calls work too
(0.0005463863968202531, (0.0004994694969298052, 'SPAM'))
To see how keywords are supported now, define a function with more arguments and
pass some by name:
>>> def spam(a, b, c, d): return a + b + c + d
>>> total(spam, 1, 2, c=3, d=4, _reps=1000)
(0.0009730369554290519, 10)
>>> bestof(spam, 1, 2, c=3, d=4, _reps=1000)
(9.774353202374186e-07, 10)
>>> bestoftotal(spam, 1, 2, c=3, d=4, _reps1=1000, _reps=1000)
(0.00037289161070930277, 10)
>>> bestoftotal(spam, *(1, 2), _reps1=1000, _reps=1000, **dict(c=3, d=4))
(0.00037289161070930277, 10)
Using keyword-only arguments in 3.X
One last point on this thread: we can also make use of Python 3.X keyword-only arguments here to simplify the timer module's code. As we learned in Chapter 18, keyword-only arguments are ideal for configuration options such as our functions' _reps argument. They must be coded after a * and before a ** in the function header, and in a
function call they must be passed by keyword and appear before the ** if used. The
following is a keyword-only-based alternative to the prior module. Though simpler, it
compiles and runs under Python 3.X only, not 2.X:
# File timer3.py (3.X only)
"""
Same usage as timer2.py, but uses 3.X keyword-only default arguments
instead of dict pops for simpler code. No need to hoist range() out
of tests in 3.X: always a generator in 3.X, and this can't run on 2.X.
"""
import time, sys
timer = time.clock if sys.platform[:3] == 'win' else time.time

def total(func, *pargs, _reps=1000, **kargs):
    start = timer()
    for i in range(_reps):
        ret = func(*pargs, **kargs)
    elapsed = timer() - start
    return (elapsed, ret)

def bestof(func, *pargs, _reps=5, **kargs):
    best = 2 ** 32
    for i in range(_reps):
        start = timer()
        ret = func(*pargs, **kargs)
        elapsed = timer() - start
        if elapsed < best: best = elapsed
    return (best, ret)

def bestoftotal(func, *pargs, _reps1=5, **kargs):
    return min(total(func, *pargs, **kargs) for i in range(_reps1))
This version is used the same way as the prior version and produces identical results, so I won't relist its outputs on the same tests here; experiment on your own as you wish. If you do, pay attention to the argument ordering rules in calls. A former bestof that ran total, for instance, called it like this:

    (elapsed, ret) = total(func, *pargs, _reps=1, **kargs)
See Chapter 18 for more on keyword-only arguments in 3.X; they can simplify code for
configurable tools like this one but are not backward compatible with 2.X Pythons. If
you want to compare 2.X and 3.X speed, or support programmers using either Python
line, the prior version is likely a better choice.
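To see the ordering rules at work, note that _reps in this version can be set only by keyword; a positional value in its place is simply absorbed by *pargs and passed on to the timed function. A brief sketch against the timer3 module above:

from timer3 import total

def spam(a, b, c, d):
    return a + b + c + d

print(total(spam, 1, 2, _reps=1000, **dict(c=3, d=4)))  # _reps by keyword, before **
print(total(spam, 1, 2, 3, 4))                          # 3 and 4 go to spam via *pargs;
                                                        # _reps keeps its default 1000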
Also keep in mind that for trivial functions like some of those tested for the prior version,
the costs of the timer’s code may sometimes be as significant as those of a simple timed
function, so you should not take timer results too absolutely. The timer’s results can
help you judge relative speeds of coding alternatives, though, and may be more meaningful for operations that run longer or are repeated often.
Other Suggestions
For more insight, try modifying the repetition counts used by these modules, or explore
the alternative timeit module in Python’s standard library, which automates timing of
code, supports command-line usage modes, and finesses some platform-specific issues
—in fact, we’ll put it to work in the next section.
You might also want to look at the profile standard library module for a complete
source code profiler tool. We’ll learn more about it in Chapter 36 in the context of
development tools for large projects. In general, you should profile code to isolate bottlenecks before recoding and timing alternatives as we've done here.
You might try modifying or emulating the timing script to measure the speed of the 3.X
and 2.7 set and dictionary comprehensions shown in the preceding chapter, and their
for loop equivalents. Using them is less common in Python programs than building
lists of results, so we’ll leave this task in the suggested exercise column (please, no
wagering...); the next section will partly spoil the surprise.
Finally, keep the timing module we wrote here filed away for future reference—we’ll
repurpose it to measure performance of alternative numeric square root operations in
an exercise at the end of this chapter. If you’re interested in pursuing this topic further,
we’ll also experiment with techniques for timing dictionary comprehensions versus
for loops interactively in the exercises.
Timing Iterations and Pythons with timeit
The preceding section used homegrown timing functions to compare code speed. As
mentioned there, the standard library also ships with a module named timeit that can
be used in similar ways, but offers added flexibility and may better insulate clients from
some platform differences.
As usual in Python, it’s important to understand fundamental principles like those
illustrated in the prior section. Python’s “batteries included” approach means you’ll
usually find precoded options as well, though you still need to know the ideas underlying them to use them properly. Indeed, this module is a prime example of this—it
seems to have had a history of being misused by people who don’t yet understand the
principles it embodies. Now that we’ve learned the basics, though, let’s move ahead to
a tool that can automate much of our work.
Basic timeit Usage
Let’s start with this module’s fundamentals before leveraging them in larger scripts.
With timeit, tests are specified by either callable objects or statement strings; the latter
can hold multiple statements if they use ; separators or \n characters for line breaks,
and spaces or tabs to indent statements in nested blocks (e.g., \n\t). Tests may also
give setup actions, and can be launched from both command lines and API calls, and
from both scripts and the interactive prompt.
Interactive usage and API calls
For example, the timeit module’s repeat call returns a list giving the total time taken
to run a test a number of times, for each of repeat runs—the min of this list yields the
best time among the runs, and helps filter out system load fluctuations that can otherwise skew timing results artificially high.
The following shows this call in action, timing a list comprehension on two versions
of CPython and the optimized PyPy implementation of Python described in Chapter 2 (it currently supports Python 2.7 code). The results here give the best total time
in seconds among 5 runs that each execute the code string 1,000 times; the code string
itself constructs a 1,000-item list of integers each time through (see Appendix B for the
Windows launcher used for variety in the first two of these commands):
c:\code> py -3
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit...
>>> import timeit
>>> min(timeit.repeat(stmt="[x ** 2 for x in range(1000)]", number=1000, repeat=5))
0.5062382371756811
c:\code> py -2
Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win32
>>> import timeit
>>> min(timeit.repeat(stmt="[x ** 2 for x in range(1000)]", number=1000, repeat=5))
0.0708020004193198
c:\code> c:\pypy\pypy-1.9\pypy.exe
Python 2.7.2 (341e1e3821ff, Jun 07 2012, 15:43:00)
[PyPy 1.9.0 with MSC v.1500 32 bit] on win32
>>>> import timeit
>>>> min(timeit.repeat(stmt="[x ** 2 for x in range(1000)]", number=1000, repeat=5))
0.0059330329674303905
You’ll notice that PyPy checks in at 10X faster than CPython 2.7 here, and a whopping
100X faster than CPython 3.3, despite the fact that PyPy is a potentially slower 32-bit
build. This is a small artificial benchmark, of course, but seems arguably stunning
nonetheless, and reflects a relative speed ranking that is generally supported by other
tests run in this book (though as we’ll see, CPython still beats PyPy on some types of
code).
This particular test measures the speed of both a list comprehension and integer math.
The latter varies between lines: CPython 3.X has a single integer type, and CPython
2.X has both short and long integers. This may explain part of the size of the difference,
but the results are valid nonetheless. Noninteger tests yield similar rankings (e.g., a
floating-point test in the solutions to this part’s exercises), and integer math matters—
the one and two order of magnitude (power of 10) speedups here will be realized by
many real programs, because integers and iterations are ubiquitous in Python code.
These results also differ from the preceding section’s relative version speeds, where
CPython 2.7 was slightly quicker than 3.3, and PyPy was 10X quicker overall, a figure
affirmed by most other tests in this book too. Apart from the different type of code
being timed here, the different coding structure inside timeit may have an effect too—
for code strings like those tested here, timeit builds, compiles, and executes a function
def statement string that embeds the test string, thereby avoiding a function call per
inner loop. As we’ll see in the next section, though, this appears irrelevant from a
relative-speed perspective.
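Conceptually, this string mode works roughly like the following simplified sketch; this is an approximation of the strategy just described, not timeit's actual source:

# Hypothetical approximation of timeit's string-test technique
import time

src = '''
def inner(_it, _timer):
    _t0 = _timer()
    for _i in _it:
        [x ** 2 for x in range(1000)]       # Test statement embedded inline
    return _timer() - _t0
'''
ns = {}
exec(src, ns)                               # Compile the generated function once
print(ns['inner'](range(1000), time.time))  # One call runs all inner loops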
Command-line usage
The timeit module has reasonable defaults and can also be run as a script, either by explicit filename or automatically located on the module search path with Python's -m flag (see Appendix A). All the following run Python (a.k.a. CPython) 3.3. In this mode timeit reports the average time for a single -n loop, in either microseconds (labeled "usec"), milliseconds ("msec"), or seconds ("sec"); to compare results here to
the total time values reported by other tests, multiply by the number of loops run—
500 usec here * 1,000 loops is 500 msec, or half a second in total time:
c:\code> C:\python33\Lib\timeit.py -n 1000 "[x ** 2 for x in range(1000)]"
1000 loops, best of 3: 506 usec per loop
c:\code> python -m timeit -n 1000 "[x ** 2 for x in range(1000)]"
1000 loops, best of 3: 504 usec per loop
c:\code> py -3 -m timeit -n 1000 -r 5 "[x ** 2 for x in range(1000)]"
1000 loops, best of 5: 505 usec per loop
As an example, we can use command lines to verify that choice of timer call doesn’t
impact cross-version speed comparisons run in this chapter so far—3.3 uses its new
calls by default, and that might matter if timer precision differs widely. To prove that
this is irrelevant, the following uses the -c flag to force timeit to use time.clock in all
versions, an option that 3.3’s manuals call deprecated, but required to even the score
with prior versions (I’m setting my system path to include PyPy here for command
brevity):
c:\code> set PATH=%PATH%;C:\pypy\pypy-1.9
c:\code> py -3 -m timeit -n 1000 -r 5 -c "[x ** 2 for x in range(1000)]"
1000 loops, best of 5: 502 usec per loop
c:\code> py -2 -m timeit -n 1000 -r 5 -c "[x ** 2 for x in range(1000)]"
1000 loops, best of 5: 70.6 usec per loop
c:\code> pypy -m timeit -n 1000 -r 5 -c "[x ** 2 for x in range(1000)]"
1000 loops, best of 5: 5.44 usec per loop
C:\code> py -3 -m timeit -n 1000 -r 5 -c "[abs(x) for x in range(10000)]"
1000 loops, best of 5: 815 usec per loop
C:\code> py -2 -m timeit -n 1000 -r 5 -c "[abs(x) for x in range(10000)]"
1000 loops, best of 5: 700 usec per loop
C:\code> pypy -m timeit -n 1000 -r 5 -c "[abs(x) for x in range(10000)]"
1000 loops, best of 5: 61.7 usec per loop
These results are essentially the same as those for earlier tests in this chapter on the
same types of code. When applying x ** 2, CPython 2.7 and PyPy are again 10X and
100X faster than CPython 3.3, respectively, showing that timer choice isn’t a factor.
For the abs(x) we timed under the homegrown timer earlier (timeseqs.py), these two
Pythons are faster than 3.3 by a small constant and 10X just as before, implying that
timeit’s different code structure doesn’t impact relative comparisons—the type of code
being tested fully determines the size of speed differences.
Subtle point: notice that the results of the last three of these tests, which mimic tests
run for the homegrown timer earlier, are basically the same as before, but seem to incur
a small net overhead for range usage differences—formerly it was a prebuilt list, but
here it is either a 3.X generator or a 2.X list built anew on each inner loop. In other
words, we're not timing exactly the same thing, but the relative speeds of the Pythons
tested are the same.
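If you'd rather factor out such differences, the setup option described ahead can prebuild
the sequence outside the timed code; for example, a command of the following form—a
sketch, with results varying per machine—times just the iteration over a prebuilt list,
much like the homegrown timer did:

c:\code> py -3 -m timeit -n 1000 -s "R = list(range(10000))" "[abs(x) for x in R]"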
Timing multiline statements
To time larger multiline sections of code in API call mode, use line breaks and tabs or
spaces to satisfy Python’s syntax; code read from a source file already will. Because you
pass Python string objects to a Python function in this mode, there are no shell con-
siderations, though be careful to escape nested quotes if needed. The following, for
instance, times Chapter 13 loop alternatives in Python 3.3; you can use the same pattern
to time the file-line-reader alternatives in Chapter 14:
c:\code> py -3
>>> import timeit
>>> min(timeit.repeat(number=10000, repeat=3,
        stmt="L = [1, 2, 3, 4, 5]\nfor i in range(len(L)): L[i] += 1"))
0.01397292797131814
>>> min(timeit.repeat(number=10000, repeat=3,
        stmt="L = [1, 2, 3, 4, 5]\ni=0\nwhile i < len(L):\n\tL[i] += 1\n\ti += 1"))
0.015452276471516813
>>> min(timeit.repeat(number=10000, repeat=3,
        stmt="L = [1, 2, 3, 4, 5]\nM = [x + 1 for x in L]"))
0.009464995838568635
To run multiline statements like these in command-line mode, appease your shell by
passing each statement line as a separate argument, with whitespace for indentation—
timeit concatenates all the lines together with a newline character between them, and
later reindents for its own statement nesting purposes. Leading spaces may work better
for indentation than tabs in this mode, and be sure to quote the code arguments if
required by your shell:
c:\code> py -3 -m timeit -n 1000 -r 3 "L = [1,2,3,4,5]" "i=0" "while i < len(L):"
" L[i] += 1" " i += 1"
1000 loops, best of 3: 1.54 usec per loop
c:\code> py -3 -m timeit -n 1000 -r 3 "L = [1,2,3,4,5]" "M = [x + 1 for x in L]"
1000 loops, best of 3: 0.959 usec per loop
Other usage modes: Setup, totals, and objects
The timeit module also allows you to provide setup code that is run in the main state-
ment’s scope, but whose time is not charged to the main statement’s total—potentially
useful for initialization code you wish to exclude from total time, such as imports of
required modules, test function definition, and test data creation. Because they’re run
in the same scope, any names created by setup code are available to the main test
statement; names defined in the interactive shell generally are not.
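One standard workaround when working interactively is to import the interactive
session's names explicitly in the setup string—the interactive namespace is the module
__main__, so a setup line like the following sketch can make a function you've just
defined at the prompt visible to the timed statement:

>>> def f(x): return x ** 2

>>> import timeit
>>> timeit.timeit(stmt='f(1000)', setup='from __main__ import f', number=1000)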
To specify setup code, use a -s in command-line mode (or many of these for multiline
setups) and a setup argument string in API call mode. This can focus tests more sharply,
as in the following, which splits list initialization off to a setup statement to time just
iteration. As a rule of thumb, though, the more code you include in a test statement,
the more applicable its results will generally be to realistic code:
c:\code> python -m timeit -n 1000 -r 3 "L = [1,2,3,4,5]" "M = [x + 1 for x in L]"
1000 loops, best of 3: 0.956 usec per loop
c:\code> python -m timeit -n 1000 -r 3 -s "L = [1,2,3,4,5]" "M = [x + 1 for x in L]"
1000 loops, best of 3: 0.775 usec per loop
Here’s a setup example in API call mode: I used the following type of code to time the
sort-based option in Chapter 18’s minimum value example—ordered ranges sort much
faster than random numbers, and are faster sorted than scanned linearly in the exam-
ple’s code under 3.3 (adjacent strings are concatenated here):
>>> from timeit import repeat
>>> min(repeat(number=1000, repeat=3,
        setup='from mins import min1, min2, min3\n'
              'vals=list(range(1000))',
        stmt='min3(*vals)'))
0.0387865921275079
>>> min(repeat(number=1000, repeat=3,
        setup='from mins import min1, min2, min3\n'
              'import random\nvals=[random.random() for i in range(1000)]',
        stmt='min3(*vals)'))
0.275656482278373
With timeit, you can also ask for just total time, use the module’s class API, time
callable objects instead of strings, accept automatic loop counts, and use class-based
techniques and additional command-line switches and API argument options we don’t
have space to show here—consult Python’s library manual for more details:
c:\code> py -3
>>> import timeit
>>> timeit.timeit(stmt='[x ** 2 for x in range(1000)]', number=1000) # Total time
0.5238125259325834
>>> timeit.Timer(stmt='[x ** 2 for x in range(1000)]').timeit(1000) # Class API
0.5282652329644009
>>> timeit.repeat(stmt='[x ** 2 for x in range(1000)]', number=1000, repeat=3)
[0.5299034147194845, 0.5082454007998365, 0.5095136232504416]
>>> def testcase():
        y = [x ** 2 for x in range(1000)]        # Callable objects or code strings

>>> min(timeit.repeat(stmt=testcase, number=1000, repeat=3))
0.5073828140463377
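One note on the callable form: timeit invokes the object with no arguments, so to time
a function that requires arguments, you can wrap the call in an argument-free lambda
as in the following sketch—though the extra wrapper call adds a small constant cost to
each repetition:

>>> def squares(n): return [x ** 2 for x in range(n)]

>>> min(timeit.repeat(stmt=lambda: squares(1000), number=1000, repeat=3))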
Benchmark Module and Script: pybench.py
Rather than go into more details on this module, let’s study a program that deploys it
to time both coding alternatives and Python versions. The following file, pybench.py,
is set up to time a set of statements coded in scripts that import and use it, under either
the version running its code or all Python versions named in a list. It uses some appli-
cation-level tools described ahead. Because it mostly applies ideas we’ve already learned
and is amply documented, though, I’m going to list this as mostly self-study material,
and an exercise in reading Python code.
"""
pybench.py: Test speed of one or more Pythons on a set of simple
code-string benchmarks. A function, to allow stmts to vary.
This system itself runs on both 2.X and 3.X, and may spawn both.
Uses timeit to test either the Python running this script by API
calls, or a set of Pythons by reading spawned command-line outputs
(os.popen) with Python's -m flag to find timeit on module search path.
Replaces $listif3 with a list() around generators for 3.X and an
empty string for 2.X, so 3.X does same work as 2.X. In command-line
mode only, must split multiline statements into one separate quoted
argument per line so all will be run (else might run/time first line
only), and replace all \t in indentation with 4 spaces for uniformity.
Caveats: command-line mode (only) may fail if test stmt embeds double
quotes, quoted stmt string is incompatible with shell in general, or
command-line exceeds a length limit on platform's shell--use API call
mode or homegrown timer; does not yet support a setup statement: as is,
time of all statements in the test stmt are charged to the total time.
"""
import sys, os, timeit
defnum, defrep = 1000, 5                         # May vary per stmt

def runner(stmts, pythons=None, tracecmd=False):
    """
    Main logic: run tests per input lists, caller handles usage modes.
    stmts:   [(number?, repeat?, stmt-string)], replaces $listif3 in stmt
    pythons: None=this python only, or [(ispy3?, python-executable-path)]
    """
    print(sys.version)
    for (number, repeat, stmt) in stmts:
        number = number or defnum
        repeat = repeat or defrep                # 0=default
        if not pythons:
            # Run stmt on this python: API call
            # No need to split lines or quote here
            ispy3 = sys.version[0] == '3'
            stmt = stmt.replace('$listif3', 'list' if ispy3 else '')
            best = min(timeit.repeat(stmt=stmt, number=number, repeat=repeat))
            print('%.4f [%r]' % (best, stmt[:70]))
        else:
            # Run stmt on all pythons: command line
            # Split lines into quoted arguments
            print('-' * 80)
            print('[%r]' % stmt)
            for (ispy3, python) in pythons:
                stmt1 = stmt.replace('$listif3', 'list' if ispy3 else '')
                stmt1 = stmt1.replace('\t', ' ' * 4)
                lines = stmt1.split('\n')
                args  = ' '.join('"%s"' % line for line in lines)
                cmd = '%s -m timeit -n %s -r %s %s' % (python, number, repeat, args)
                print(python)
                if tracecmd: print(cmd)
                print('\t' + os.popen(cmd).read().rstrip())
This file is really only half the picture, though. Testing scripts use this module's func-
tion, passing in concrete though variable lists of statements and Pythons to be tested,
as appropriate for the usage mode desired. For example, the following script,
pybench_cases.py, tests a handful of statements and Pythons, and allows command-line
arguments to determine part of its operation: -a tests all listed Pythons instead of just
one, and an added -t traces constructed command lines so you can see how multiline
statements and indentation are handled per the command-line formats shown earlier
(see both files' docstrings for details):
"""
pybench_cases.py: Run pybench on a set of pythons and statements.
Select modes by editing this script or using command-line arguments (in
sys.argv): e.g., run a "C:\python27\python pybench_cases.py" to test just
one specific version on stmts, "pybench_cases.py -a" to test all pythons
listed, or a "py −3 pybench_cases.py -a -t" to trace command lines too.
"""
import pybench, sys
pythons = [                      # (ispy3?, path)
    (1, 'C:\python33\python'),
    (0, 'C:\python27\python'),
    (0, 'C:\pypy\pypy-1.9\pypy')
]

stmts = [                        # (num,rpt,stmt)
    (0, 0, "[x ** 2 for x in range(1000)]"),                       # Iterations
    (0, 0, "res=[]\nfor x in range(1000): res.append(x ** 2)"),    # \n=multistmt
    (0, 0, "$listif3(map(lambda x: x ** 2, range(1000)))"),        # \n\t=indent
    (0, 0, "list(x ** 2 for x in range(1000))"),                   # $=list or ''
    (0, 0, "s = 'spam' * 2500\nx = [s[i] for i in range(10000)]"), # String ops
    (0, 0, "s = '?'\nfor i in range(10000): s += '?'"),
]

tracecmd = '-t' in sys.argv                        # -t: trace command lines?
pythons  = pythons if '-a' in sys.argv else None   # -a: all in list, else one?
pybench.runner(stmts, pythons, tracecmd)
Benchmark Script Results
Here is this script’s output when run to test a specific version (the Python running the
script)—this mode uses direct API calls, not command lines, with total time listed in
the left column, and the statement tested on the right. I’m again using the 3.3 Windows
launcher in the first two of these tests to time CPython 3.3 and 2.7, and am running
release 1.9 of the PyPy implementation in the third:
c:\code> py -3 pybench_cases.py
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
0.5015 ['[x ** 2 for x in range(1000)]']
0.5655 ['res=[]\nfor x in range(1000): res.append(x ** 2)']
0.6044 ['list(map(lambda x: x ** 2, range(1000)))']
0.5425 ['list(x ** 2 for x in range(1000))']
0.8746 ["s = 'spam' * 2500\nx = [s[i] for i in range(10000)]"]
2.8060 ["s = '?'\nfor i in range(10000): s += '?'"]
c:\code> py -2 pybench_cases.py
2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)]
0.0696 ['[x ** 2 for x in range(1000)]']
0.1285 ['res=[]\nfor x in range(1000): res.append(x ** 2)']
0.1636 ['(map(lambda x: x ** 2, range(1000)))']
0.0952 ['list(x ** 2 for x in range(1000))']
0.6143 ["s = 'spam' * 2500\nx = [s[i] for i in range(10000)]"]
2.0657 ["s = '?'\nfor i in range(10000): s += '?'"]
c:\code> c:\pypy\pypy-1.9\pypy pybench_cases.py
2.7.2 (341e1e3821ff, Jun 07 2012, 15:43:00)
[PyPy 1.9.0 with MSC v.1500 32 bit]
0.0059 ['[x ** 2 for x in range(1000)]']
0.0102 ['res=[]\nfor x in range(1000): res.append(x ** 2)']
0.0099 ['(map(lambda x: x ** 2, range(1000)))']
0.0156 ['list(x ** 2 for x in range(1000))']
0.1298 ["s = 'spam' * 2500\nx = [s[i] for i in range(10000)]"]
5.5242 ["s = '?'\nfor i in range(10000): s += '?'"]
The following shows this script’s output when run to test multiple Python versions for
each statement string. In this mode the script itself is run by Python 3.3, but it launches
shell command lines that start other Pythons to run the timeit module on the test
statement strings. This mode must split, format, and quote multiline statements for
use in command lines according to timeit expectations and shell requirements.
This mode also relies on the -m Python command-line flag to locate timeit on the
module search path and run it as a script, and the os.popen and sys.argv standard
library tools to run a shell command and inspect command-line arguments, respec-
tively. See Python manuals and other sources for more on these calls; os.popen is also
mentioned briefly in the files coverage of Chapter 9, and demonstrated in the loops
coverage in Chapter 13. Run with a -t flag to watch the command lines run:
c:\code> py -3 pybench_cases.py -a
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
--------------------------------------------------------------------------------
['[x ** 2 for x in range(1000)]']
C:\python33\python
1000 loops, best of 5: 499 usec per loop
C:\python27\python
1000 loops, best of 5: 71.4 usec per loop
C:\pypy\pypy-1.9\pypy
1000 loops, best of 5: 5.71 usec per loop
--------------------------------------------------------------------------------
['res=[]\nfor x in range(1000): res.append(x ** 2)']
C:\python33\python
1000 loops, best of 5: 562 usec per loop
C:\python27\python
1000 loops, best of 5: 130 usec per loop
C:\pypy\pypy-1.9\pypy
1000 loops, best of 5: 9.81 usec per loop
--------------------------------------------------------------------------------
['$listif3(map(lambda x: x ** 2, range(1000)))']
C:\python33\python
1000 loops, best of 5: 599 usec per loop
C:\python27\python
1000 loops, best of 5: 161 usec per loop
C:\pypy\pypy-1.9\pypy
1000 loops, best of 5: 9.45 usec per loop
--------------------------------------------------------------------------------
['list(x ** 2 for x in range(1000))']
C:\python33\python
1000 loops, best of 5: 540 usec per loop
C:\python27\python
1000 loops, best of 5: 92.3 usec per loop
C:\pypy\pypy-1.9\pypy
1000 loops, best of 5: 15.1 usec per loop
--------------------------------------------------------------------------------
["s = 'spam' * 2500\nx = [s[i] for i in range(10000)]"]
C:\python33\python
1000 loops, best of 5: 873 usec per loop
C:\python27\python
1000 loops, best of 5: 614 usec per loop
C:\pypy\pypy-1.9\pypy
1000 loops, best of 5: 118 usec per loop
--------------------------------------------------------------------------------
["s = '?'\nfor i in range(10000): s += '?'"]
C:\python33\python
1000 loops, best of 5: 2.81 msec per loop
C:\python27\python
1000 loops, best of 5: 1.94 msec per loop
C:\pypy\pypy-1.9\pypy
1000 loops, best of 5: 5.68 msec per loop
As you can see, in most of these tests, CPython 2.7 is still quicker than CPython 3.3,
and PyPy is noticeably faster than both of them—except on the last test where PyPy is
twice as slow as CPython, presumably due to memory management differences. On
the other hand, timing results are often relative at best. In addition to other general
timing caveats mentioned in this chapter:
- timeit may skew results in ways beyond our scope to explore here (e.g., garbage
  collection).
- There is a baseline overhead, which differs per Python version, that is ignored here
  (but appears trivial).
- This script runs very small statements that may or may not reflect real-world code
  (but are still valid).
- Results may occasionally vary in ways that seem random (using process time may
  help here).
- All results here are highly prone to change over time (in each new Python release,
  in fact!).
In other words, you should draw your own conclusions from these numbers, and run
these tests on your Pythons and machines for results more relevant to your needs. To
time the baseline overhead of each Python, run timeit with no statement argument, or
equivalently, with a pass statement.
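For instance, either of the following forms measures just the timing harness itself—a
usage sketch whose numbers will vary per machine and Python, to be subtracted from
other results if you wish to discount the overhead:

c:\code> py -3 -m timeit -n 1000 "pass"

>>> import timeit
>>> timeit.timeit(stmt='pass', number=1000)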
More Fun with Benchmarks
For more insight, try running the script on other Python versions and other statement
test strings. The file pybench_cases2.py in this book’s examples distribution adds more
tests to see how CPython 3.3 compares to 3.2, how PyPy’s 2.0 beta stacks up against
its current release, and how additional use cases fare.
A win for map and a rare loss for PyPy
For example, the following tests in pybench_cases2.py measure the impact of charging
other iteration operations with a function call, which improves map’s chances of winning
the day per this chapter’s earlier note—map usually loses by its association with function
calls in general:
# pybench_cases2.py
pythons += [
    (1, 'C:\python32\python'),
    (0, 'C:\pypy\pypy-2.0-beta1\pypy')]

stmts += [
    # Use function calls: map wins
    (0, 0, "[ord(x) for x in 'spam' * 2500]"),
    (0, 0, "res=[]\nfor x in 'spam' * 2500: res.append(ord(x))"),
    (0, 0, "$listif3(map(ord, 'spam' * 2500))"),
    (0, 0, "list(ord(x) for x in 'spam' * 2500)"),

    # Set and dicts
    (0, 0, "{x ** 2 for x in range(1000)}"),
    (0, 0, "s=set()\nfor x in range(1000): s.add(x ** 2)"),
    (0, 0, "{x: x ** 2 for x in range(1000)}"),
    (0, 0, "d={}\nfor x in range(1000): d[x] = x ** 2"),

    # Pathological: 300k digits
    (1, 1, "len(str(2**1000000))")]              # PyPy loses on this today
Here is the script’s results on these statement tests on CPython 3.X, showing how
map is quickest when function calls level the playing field (it lost earlier when the other
tests ran an inline x ** 2):
c:\code> py -3 pybench_cases2.py
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
0.7237 ["[ord(x) for x in 'spam' * 2500]"]
1.3471 ["res=[]\nfor x in 'spam' * 2500: res.append(ord(x))"]
0.6160 ["list(map(ord, 'spam' * 2500))"]
1.1244 ["list(ord(x) for x in 'spam' * 2500)"]
0.5446 ['{x ** 2 for x in range(1000)}']
0.6053 ['s=set()\nfor x in range(1000): s.add(x ** 2)']
0.5278 ['{x: x ** 2 for x in range(1000)}']
0.5414 ['d={}\nfor x in range(1000): d[x] = x ** 2']
1.8933 ['len(str(2**1000000))']
As before, on these tests today 2.X clocks in faster than 3.X and PyPy is faster still on
all of these tests but the last—which it loses by a full order of magnitude (10X), though
it wins all the other tests here by the same degree. However, if you run file tests precoded
in pybench_cases2.py you’ll see that PyPy also loses to CPython when reading files line
by line, as for the following test tuple on the stmts list:
(0, 0, "f=open('C:/Python33/Lib/pdb.py')\nfor line in f: x=line\nf.close()"),
This test opens and reads a 60K, 1,675-line text file line by line using file iterators. Its
input loop presumably dominates overall test time. On this test, CPython 2.7 is twice
as fast as 3.3, but PyPy is again an order of magnitude slower than CPython in general.
You can find this case in the pybench_cases2 results files, or verify interactively or by
command line (this is just what pybench does internally):
c:\code> py -3 -m timeit -n 1000 -r 5 "f=open('C:/Python33/Lib/pdb.py')"
"for line in f: x=line" "f.close()"
>>> import timeit
>>> min(timeit.repeat(number=1000, repeat=5,
        stmt="f=open('C:/Python33/Lib/pdb.py')\nfor line in f: x=line\nf.close()"))
For another example that measures both list comprehensions and PyPy’s current file
speed, see the file listcomp-speed.txt in the book examples package; it uses direct PyPy
command lines to run code from Chapter 14 with similar results: PyPy’s line input is
slower today by roughly a factor of 10.
I’ll omit other Pythons’ output here both for space and because these findings could
very well change by the time you read these words. As usual, different types of code
can exhibit different types of performance. While PyPy may optimize much algorithmic
code, it may or may not optimize yours. You can find additional results in the book’s
examples package, but you may be better served by running these tests on your own
to verify these findings today or observe their possibly different results in the future.
The impact of function calls revisited
As suggested earlier, map also wins for added user-defined functions—the following tests
prove the earlier note’s claim that map wins the race in CPython if any function must
be applied by its alternatives:
stmts = [
    (0, 0, "def f(x): return x\n[f(x) for x in 'spam' * 2500]"),
    (0, 0, "def f(x): return x\nres=[]\nfor x in 'spam' * 2500: res.append(f(x))"),
    (0, 0, "def f(x): return x\n$listif3(map(f, 'spam' * 2500))"),
    (0, 0, "def f(x): return x\nlist(f(x) for x in 'spam' * 2500)")]
c:\code> py -3 pybench_cases2.py
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
1.5400 ["def f(x): return x\n[f(x) for x in 'spam' * 2500]"]
2.0506 ["def f(x): return x\nres=[]\nfor x in 'spam' * 2500: res.append(f(x))"]
1.2489 ["def f(x): return x\nlist(map(f, 'spam' * 2500))"]
1.6526 ["def f(x): return x\nlist(f(x) for x in 'spam' * 2500)"]
Compare this with the preceding section's ord tests; though user-defined functions may
be slower than built-ins, the larger speed hit today seems to be function calls in general,
whether the function is built-in or not. Notice that the total time here includes the cost of
making a helper function, though only one for every 10,000 inner loop repetitions—a
negligible factor per both common sense and additional tests run.
Comparing techniques: Homegrown versus batteries
For perspective, let’s see how this section’s timeit-based results compare to the home-
grown-based timer results of the prior section, by running the file timeseqs3.py in this
book’s examples package—it uses the homegrown timer but performs the same x **
2 operation and uses the same repetition counts as pybench_cases.py:
c:\code> py -3 timeseqs3.py
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
forLoop : 0.55022 => [0...998001]
listComp : 0.48787 => [0...998001]
mapCall : 0.59499 => [0...998001]
genExpr : 0.52773 => [0...998001]
genFunc : 0.52603 => [0...998001]
c:\code> py -3 pybench_cases.py
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
0.5015 ['[x ** 2 for x in range(1000)]']
0.5657 ['res=[]\nfor x in range(1000): res.append(x ** 2)']
0.6025 ['list(map(lambda x: x ** 2, range(1000)))']
0.5404 ['list(x ** 2 for x in range(1000))']
0.8711 ["s = 'spam' * 2500\nx = [s[i] for i in range(10000)]"]
2.8009 ["s = '?'\nfor i in range(10000): s += '?'"]
The homegrown timer results are very similar to the pybench-based results of this sec-
tion that use timeit, though it's not entirely apples-to-apples—the homegrown timer-
based timeseqs3.py incurs a function call per its middle totals loop and a slight overhead
in the best-of logic of the timer itself, but it also uses a prebuilt list instead of a 3.X range
generator in its inner loop, which seems to make it slightly faster overall on comparable
tests (and I'd call this example a "sanity check," but I'm not sure the term applies in
benchmarking!).
Room for improvement: Setup
Like most software, this section’s program is open-ended and could be expanded ar-
bitrarily. As one example, the files pybench2.py and pybench2_cases.py in the book’s
examples package add support for timeit’s setup statement option described earlier,
in both API call and command-line modes.
This feature was omitted initially for brevity, and frankly, because my tests didn’t seem
to require it—timing more code gives a more complete picture when comparing Py-
thons, and setup actions cost the same when timing alternatives on a single Python.
Even so, it’s sometimes useful to provide setup code that is run once in the tested code’s
scope, but whose time is not charged to the statement’s total—a module import, object
initialization, or helper function definition, for example.
I won’t list these two files in whole, but here are their important varying bits as an
example of software evolution at work—as for the test statement, the setup code state-
ment is passed as is in API call mode, but is split and space-indented in command-line
mode and passed with one -s argument per line (“$listif3” isn’t used because setup
code is not timed):
# pybench2.py
...
def runner(stmts, pythons=None, tracecmd=False):
    for (number, repeat, setup, stmt) in stmts:
        if not pythons:
            ...
            best = min(timeit.repeat(
                setup=setup, stmt=stmt, number=number, repeat=repeat))
        else:
            setup = setup.replace('\t', ' ' * 4)
            setup = ' '.join('-s "%s"' % line for line in setup.split('\n'))
            ...
            for (ispy3, python) in pythons:
                ...
                cmd = '%s -m timeit -n %s -r %s %s %s' % (
                       python, number, repeat, setup, args)
# pybench2_cases.py
import pybench2, sys
...
stmts = [        # (num,rpt,setup,stmt)
    (0, 0, "", "[x ** 2 for x in range(1000)]"),
    (0, 0, "", "res=[]\nfor x in range(1000): res.append(x ** 2)"),
    (0, 0, "def f(x):\n\treturn x",
           "[f(x) for x in 'spam' * 2500]"),
    (0, 0, "def f(x):\n\treturn x",
           "res=[]\nfor x in 'spam' * 2500:\n\tres.append(f(x))"),
    (0, 0, "L = [1, 2, 3, 4, 5]", "for i in range(len(L)): L[i] += 1"),
    (0, 0, "L = [1, 2, 3, 4, 5]", "i=0\nwhile i < len(L):\n\tL[i] += 1\n\ti += 1")]
...
pybench2.runner(stmts, pythons, tracecmd)
Run this script with the -a and -t command-line flags to see how command lines are
constructed for setup code. For instance, the following test specification tuple generates
the command line that follows it for 3.3—not nice to look at, perhaps, but sufficient
to pass lines from Windows to timeit, to be concatenated with line breaks between
and inserted into a generated timing function with appropriate reindentation:
(0, 0, "def f(x):\n\treturn x",
"res=[]\nfor x in 'spam' * 2500:\n\tres.append(f(x))")
C:\python33\python -m timeit -n 1000 -r 5 -s "def f(x):" -s " return x" "res=[]"
"for x in 'spam' * 2500:" " res.append(f(x))"
In API call mode, code strings are passed unchanged, because there’s no need to placate
a shell, and embedded tabs and end-of-line characters suffice. Experiment on your own
to uncover more about Python code alternatives’ speed. You may eventually run into
shell limitations for larger sections of code in command-line mode, but both our home-
grown timer and pybench’s timeit-based API call mode support more arbitrary code.
Benchmarks can be great sport, but we’ll have to leave future improvements as sug-
gested exercises.
Other Benchmarking Topics: pystones
This chapter has focused on code timing fundamentals that you can use on your own
code, that apply to Python benchmarking in general, and that served as a common use
case for developing larger examples for this book. Benchmarking Python is a broader
and richer domain than so far implied, though. If you’re interested in pursuing this
topic further, search the Web for links. Among the topics you’ll find:
- pystone.py—a program designed for measuring Python speed across a range of code,
  which ships with Python in its Lib\test directory
- http://speed.python.org—a project site for coordinating work on common Python
  benchmarks
- http://speed.pypy.org—the PyPy benchmarking site that the preceding bullet is
  partially emulating
The pystone test, for example, is based on a C language benchmark program that was
translated to Python by Python's original creator, Guido van Rossum. It provides another
way to measure the relative speeds of Python implementations, and seems to generally
support our findings here:
c:\Python33\Lib\test> cd C:\python33\lib\test
c:\Python33\Lib\test> py -3 pystone.py
Pystone(1.1) time for 50000 passes = 0.685303
This machine benchmarks at 72960.4 pystones/second
c:\Python33\Lib\test> cd c:\python27\lib\test
c:\Python27\Lib\test> py -2 pystone.py
Pystone(1.1) time for 50000 passes = 0.463547
This machine benchmarks at 107864 pystones/second
c:\Python27\Lib\test> c:\pypy\pypy-1.9\pypy pystone.py
Pystone(1.1) time for 50000 passes = 0.099975
This machine benchmarks at 500125 pystones/second
Since it’s time to wrap up this chapter, this will have to suffice as independent confir-
mation of our tests’ results. Analyzing the meaning of pystone’s results is left as sug-
gested exercise; its code is not identical across 3.X and 2.X, but appears to differ today
only in terms of print operations and an initialization of a global. Also keep in mind
that benchmarking is just one of many aspects of Python code analysis; for pointers on
options in related domains (e.g., testing), see Chapter 36’s review of Python develop-
ment tools.
Function Gotchas
Now that we’ve reached the end of the function story, let’s review some common pit-
falls. Functions have some jagged edges that you might not expect. They’re all relatively
obscure, and a few have started to fall away from the language completely in recent
releases, but most have been known to trip up new users.
Local Names Are Detected Statically
As you know, Python classifies names assigned in a function as locals by default; they
live in the function’s scope and exist only while the function is running. What you may
not realize is that Python detects locals statically, when it compiles the def’s code, rather
than by noticing assignments as they happen at runtime. This leads to one of the most
common oddities posted on the Python newsgroup by beginners.
Normally, a name that isn’t assigned in a function is looked up in the enclosing module:
>>> X = 99
>>> def selector():          # X used but not assigned
        print(X)             # X found in global scope

>>> selector()
99
Here, the X in the function resolves to the X in the module. But watch what happens if
you add an assignment to X after the reference:
>>> def selector():
        print(X)             # Does not yet exist!
        X = 88               # X classified as a local name (everywhere)
                             # Can also happen for "import X", "def X"...
>>> selector()
UnboundLocalError: local variable 'X' referenced before assignment
You get the name usage error shown here, but the reason is subtle. Python reads and
compiles this code when it’s typed interactively or imported from a module. While
compiling, Python sees the assignment to X and decides that X will be a local name
everywhere in the function. But when the function is actually run, because the assign-
ment hasn’t yet happened when the print executes, Python says you’re using an un-
defined name. According to its name rules, it should say this; the local X is used before
being assigned. In fact, any assignment in a function body makes a name local. Imports,
=, nested defs, nested classes, and so on are all susceptible to this behavior.
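You can even watch this static classification happen: the compiler records the names it
decides are local in the function's code object, before the function is ever run. This is
just an illustrative peek, not something normal code needs:

>>> def selector():
        print(X)
        X = 88

>>> selector.__code__.co_varnames        # X was classified local at compile time
('X',)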
The problem occurs because assigned names are treated as locals everywhere in a func-
tion, not just after the statements where they’re assigned. Really, the previous example
is ambiguous: was the intention to print the global X and create a local X, or is this a
real programming error? Because Python treats X as a local everywhere, it’s seen as an
error; if you mean to print the global X, you need to declare it in a global statement:
>>> def selector():
        global X             # Force X to be global (everywhere)
        print(X)
        X = 88
>>> selector()
99
Remember, though, that this means the assignment also changes the global X, not a
local X. Within a function, you can’t use both local and global versions of the same
simple name. If you really meant to print the global and then set a local of the same
name, you’d need to import the enclosing module and use module attribute notation
to get to the global version:
>>> X = 99
>>> def selector():
        import __main__      # Import enclosing module
        print(__main__.X)    # Qualify to get to global version of name
        X = 88               # Unqualified X classified as local
        print(X)             # Prints local version of name
>>> selector()
99
88
Qualification (the .X part) fetches a value from a namespace object. The interactive
namespace is a module called __main__, so __main__.X reaches the global version of X.
If that isn’t clear, check out Chapter 17.
In recent versions, Python has improved on this story somewhat by issuing the more
specific "unbound local" error message shown in the example listing for this case (it
used to simply raise a generic name error); the gotcha itself is still present in general, though.
Defaults and Mutable Objects
As noted briefly in Chapter 17 and Chapter 18, mutable values for default arguments
can retain state between calls, though this is often unexpected. In general, default ar-
gument values are evaluated and saved once when a def statement is run, not each time
the resulting function is later called. Internally, Python saves one object per default
argument attached to the function itself.
That’s usually what you want—because defaults are evaluated at def time, it lets you
save values from the enclosing scope, if needed (functions defined within loops by
factories may even depend on this behavior—see ahead). But because a default retains
an object between calls, you have to be careful about changing mutable defaults. For
instance, the following function uses an empty list as a default value, and then changes
it in place each time the function is called:
>>> def saver(x=[]):         # Saves away a list object
        x.append(1)          # Changes same object each time!
        print(x)

>>> saver([2]) # Default not used
[2, 1]
>>> saver() # Default used
[1]
>>> saver() # Grows on each call!
[1, 1]
>>> saver()
[1, 1, 1]
Some see this behavior as a feature—because mutable default arguments retain their
state between function calls, they can serve some of the same roles as static local func-
tion variables in the C language. In a sense, they work much like global variables, but
their names are local to the functions and so will not clash with names elsewhere in a
program.
To other observers, though, this seems like a gotcha, especially the first time they run
into it. There are better ways to retain state between calls in Python (e.g., using the
nested scope closures we met in this part and the classes we will study in Part VI).
Moreover, mutable defaults are tricky to remember (and to understand at all). They
depend upon the timing of default object construction. In the prior example, there is
just one list object for the default value—the one created when the def is executed. You
don’t get a new list every time the function is called, so the list grows with each new
append; it is not reset to empty on each call.
If that’s not the behavior you want, simply make a copy of the default at the start of
the function body, or move the default value expression into the function body. As long
as the value resides in code that’s actually executed each time the function runs, you’ll
get a new object each time through:
>>> def saver(x=None):
        if x is None:        # No argument passed?
            x = []           # Run code to make a new list each time
        x.append(1)          # Changes new list object
        print(x)

>>> saver([2])
[2, 1]
>>> saver() # Doesn't grow here
[1]
>>> saver()
[1]
By the way, the if statement in this example could almost be replaced by the assignment
x = x or [], which takes advantage of the fact that Python’s or returns one of its
operand objects: if no argument was passed, x would default to None, so the or would
return the new empty list on the right.
However, this isn’t exactly the same. If an empty list were passed in, the or expression
would cause the function to extend and return a newly created list, rather than ex-
tending and returning the passed-in list like the if version. (The expression becomes
[] or [], which evaluates to the new empty list on the right; see the section “Truth
Tests” if you don’t recall why.) Real program requirements may call for either behavior.
Today, another way to achieve the value retention effect of mutable defaults in a pos-
sibly less confusing way is to use the function attributes we discussed in Chapter 19:
>>> def saver():
        saver.x.append(1)
        print(saver.x)

>>> saver.x = []
>>> saver()
[1]
>>> saver()
[1, 1]
>>> saver()
[1, 1, 1]
The function name is global to the function itself, but it need not be declared because
it isn’t changed directly within the function. This isn’t used in exactly the same way,
but when coded like this, the attachment of an object to the function is much more
explicit (and arguably less magical).
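For comparison, here is a sketch of the closure-based alternative mentioned earlier—
the enclosing scope's list plays the same state-retention role as the mutable default,
but each call to the factory starts with fresh state:

>>> def make_saver():
        x = []               # New list per factory call
        def saver():
            x.append(1)      # State retained in enclosing scope
            print(x)
        return saver

>>> saver = make_saver()
>>> saver()
[1]
>>> saver()
[1, 1]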
Functions Without returns
In Python functions, return (and yield) statements are optional. When a function
doesn’t return a value explicitly, the function exits when control falls off the end of the
function body. Technically, all functions return a value; if you don’t provide a return
statement, your function returns the None object automatically:
>>> def proc(x):
        print(x)             # No return is a None return

>>> x = proc('testing 123...')
testing 123...
>>> print(x)
None
Functions such as this without a return are Python’s equivalent of what are called
“procedures” in some languages. They’re usually invoked as statements, and the None
results are ignored, as they do their business without computing a useful result.
This is worth knowing, because Python won’t tell you if you try to use the result of a
function that doesn’t return one. As we noted in Chapter 11, for instance, assigning
the result of a list append method won’t raise an error, but you’ll get back None, not the
modified list:
>>> list = [1, 2, 3]
>>> list = list.append(4) # append is a "procedure"
>>> print(list) # append changes list in place
None
Chapter 15’s section “Common Coding Gotchas” on page 463 discusses this more
broadly. In general, any functions that do their business as a side effect are usually
designed to be run as statements, not expressions.
Miscellaneous Function Gotchas
Here are two additional function-related gotchas—mostly reviews, but common
enough to reiterate.
Enclosing scopes and loop variables: Factory functions
We described this gotcha in Chapter 17’s discussion of enclosing function scopes, but
as a reminder: when coding factory functions (a.k.a. closures), be careful about relying
on enclosing function scope lookup for variables that are changed by enclosing loops
—when a generated function is later called, all such references will remember the value
of the last loop iteration in the enclosing function’s scope. In this case, you must use
defaults to save loop variable values instead of relying on automatic lookup in enclosing
scopes. See “Loop variables may require defaults, not scopes” on page 506 in Chap-
ter 17 for more details on this topic.
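As a quick refresher, the following sketch shows both the gotcha and its default-argument
fix—without the default, every generated function sees the loop variable's final value:

>>> acts = [lambda x: i ** x for i in range(5)]        # All remember the last i
>>> acts[0](2)                                         # 4 ** 2, not 0 ** 2!
16
>>> acts = [lambda x, i=i: i ** x for i in range(5)]   # Defaults save each i
>>> acts[0](2)
0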
Hiding built-ins by assignment: Shadowing
Also in Chapter 17, we saw how it’s possible to reassign built-in names in a closer local
or global scope; the reassignment effectively hides and replaces that built-in’s name for
the remainder of the scope where the assignment occurs. This means you won’t be able
to use the original built-in value for the name. As long as you don’t need the built-in
value of the name you’re assigning, this isn’t an issue—many names are built in, and
they may be freely reused. However, if you reassign a built-in name your code relies
on, you may have problems. So either don’t do that, or use tools like PyChecker that
can warn you if you do. The good news is that the built-ins you commonly use will
soon become second nature, and Python’s error trapping will alert you early in testing
if your built-in name is not what you think it is.
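A minimal sketch of the problem and its simplest interactive fix:

>>> list('spam')             # The built-in works as expected
['s', 'p', 'a', 'm']
>>> list = [1, 2, 3]         # Assignment shadows the built-in name
>>> list('spam')
TypeError: 'list' object is not callable
>>> del list                 # Remove the shadow to uncover the built-in
>>> list('spam')
['s', 'p', 'a', 'm']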
Chapter Summary
This chapter rounded out our look at functions and built-in iteration tools with a larger
case study that measured the performance of iteration alternatives and Pythons, and
closed with a review of common function-related mistakes to help you avoid pitfalls.
The iteration story has one last sequel in Part VI, where we’ll learn how to code user-
defined iterable objects that generate values with classes and __iter__, in Chap-
ter 30’s operator overloading coverage.
This concludes the functions part of this book. In the next part, we will expand on what
we already know about modules—files of tools that form the topmost organizational
unit in Python, and the structure in which our functions always live. After that, we will
explore classes, tools that are largely packages of functions with special first arguments.
As we’ll see, user-defined classes can implement objects that tap into the iteration pro-
tocol, just like the generators and iterables we met here. In fact, everything we have
learned in this part of the book will apply when functions pop up later in the context
of class methods.
Before moving on to modules, though, be sure to work through this chapter’s quiz and
the exercises for this part of the book, to practice what we’ve learned about functions
here.
Test Your Knowledge: Quiz
1. What conclusions can you draw from this chapter about the relative speed of
Python iteration tools?
2. What conclusions can you draw from this chapter about the relative speed of the
Pythons timed?
Test Your Knowledge: Answers
1. In general, list comprehensions are usually the quickest of the bunch; map beats list
comprehensions in Python only when all tools must call functions; for loops tend
to be slower than comprehensions; and generator functions and expressions are
slower than comprehensions by a constant factor. Under PyPy, some of these find-
ings differ; map often turns in a different relative performance, for example, and list
comprehensions always seem quickest, perhaps due to function-level optimiza-
tions.
At least that’s the case today on the Python versions tested, on the test machine
used, and for the type of code timed—these results may vary if any of these three
variables differ. Use the homegrown timer or standard library timeit to test your
use cases for more relevant results. Also keep in mind that iteration is just one
component of a program’s time: more code gives a more complete picture.
2. In general, PyPy 1.9 (implementing Python 2.7) is typically faster than CPython
2.7, and CPython 2.7 is often faster than CPython 3.3. In most cases timed, PyPy
is some 10X faster than CPython, and CPython 2.7 is often a small constant faster
than CPython 3.3. In cases that use integer math, CPython 2.7 can be 10X faster
than CPython 3.3, and PyPy can be 100X faster than 3.3. In other cases (e.g., string
operations and file iterators), PyPy can be slower than CPython by 10X, though
timeit and memory management differences may influence some results. The
pystone benchmark confirms these relative rankings, though the sizes of the dif-
ferences it reports differ due to the code timed.
At least that’s the case today on the Python versions tested, on the test machine
used, and for the type of code timed—these results may vary if any of these three
variables differ. Use the homegrown timer or standard library timeit to test your
use cases for more relevant results. This is especially true when timing Python
implementations, which may be arbitrarily optimized in each new release.
Test Your Knowledge: Part IV Exercises
In these exercises, you’re going to start coding more sophisticated programs. Be sure
to check the solutions in Part IV in Appendix D, and be sure to start writing your code
in module files. You won’t want to retype these exercises if you make a mistake.
1. The basics. At the Python interactive prompt, write a function that prints its single
argument to the screen and call it interactively, passing a variety of object types:
string, integer, list, dictionary. Then, try calling it without passing any argument.
What happens? What happens when you pass two arguments?
2. Arguments. Write a function called adder in a Python module file. The function
should accept two arguments and return the sum (or concatenation) of the two.
Then, add code at the bottom of the file to call the adder function with a variety of
object types (two strings, two lists, two floating points), and run this file as a script
from the system command line. Do you have to print the call statement results to
see results on your screen?
3. varargs. Generalize the adder function you wrote in the last exercise to compute
the sum of an arbitrary number of arguments, and change the calls to pass more
or fewer than two arguments. What type is the return value sum? (Hints: a slice
such as S[:0] returns an empty sequence of the same type as S, and the type built-
in function can test types; but see the manually coded min examples in Chap-
ter 18 for a simpler approach.) What happens if you pass in arguments of different
types? What about passing in dictionaries?
4. Keywords. Change the adder function from exercise 2 to accept and sum/concat-
enate three arguments: def adder(good, bad, ugly). Now, provide default values
for each argument, and experiment with calling the function interactively. Try
passing one, two, three, and four arguments. Then, try passing keyword argu-
ments. Does the call adder(ugly=1, good=2) work? Why? Finally, generalize the
new adder to accept and sum/concatenate an arbitrary number of keyword argu-
ments. This is similar to what you did in exercise 3, but you’ll need to iterate over
a dictionary, not a tuple. (Hint: the dict.keys method returns a list you can step
through with a for or while, but be sure to wrap it in a list call to index it in 3.X;
dict.values may help here too.)
5. Dictionary tools. Write a function called copyDict(dict) that copies its dictionary
argument. It should return a new dictionary containing all the items in its argu-
ment. Use the dictionary keys method to iterate (or, in Python 2.2 and later, step
over a dictionary’s keys without calling keys). Copying sequences is easy (X[:]
makes a top-level copy); does this work for dictionaries, too? As explained in this
exercise’s solution, because dictionaries now come with similar tools, this and the
next exercise are just coding exercises but still serve as representative function
examples.
6. Dictionary tools. Write a function called addDict(dict1, dict2) that computes the
union of two dictionaries. It should return a new dictionary containing all the items
in both its arguments (which are assumed to be dictionaries). If the same key ap-
pears in both arguments, feel free to pick a value from either. Test your function
by writing it in a file and running the file as a script. What happens if you pass lists
instead of dictionaries? How could you generalize your function to handle this case,
too? (Hint: see the type built-in function used earlier.) Does the order of the argu-
ments passed in matter?
7. More argument-matching examples. First, define the following six functions (either
interactively or in a module file that can be imported):
def f1(a, b): print(a, b) # Normal args
def f2(a, *b): print(a, b) # Positional varargs
def f3(a, **b): print(a, b) # Keyword varargs
def f4(a, *b, **c): print(a, b, c) # Mixed modes
def f5(a, b=2, c=3): print(a, b, c) # Defaults
def f6(a, b=2, *c): print(a, b, c) # Defaults and positional varargs
Now, test the following calls interactively, and try to explain each result; in some
cases, you’ll probably need to fall back on the matching algorithm shown in Chap-
ter 18. Do you think mixing matching modes is a good idea in general? Can you
think of cases where it would be useful?
>>> f1(1, 2)
>>> f1(b=2, a=1)
>>> f2(1, 2, 3)
>>> f3(1, x=2, y=3)
>>> f4(1, 2, 3, x=2, y=3)
>>> f5(1)
>>> f5(1, 4)
>>> f6(1)
>>> f6(1, 3, 4)
8. Primes revisited. Recall the following code snippet from Chapter 13, which sim-
plistically determines whether a positive integer is prime:
x = y // 2                       # For some y > 1
while x > 1:
    if y % x == 0:               # Remainder
        print(y, 'has factor', x)
        break                    # Skip else
    x -= 1
else:                            # Normal exit
    print(y, 'is prime')
Package this code as a reusable function in a module file (y should be a passed-in
argument), and add some calls to the function at the bottom of your file. While
you’re at it, experiment with replacing the first line’s // operator with / to see how
true division changes the / operator in Python 3.X and breaks this code (refer back
to Chapter 5 if you need a reminder). What can you do about negatives, and the
values 0 and 1? How about speeding this up? Your outputs should look something
like this:
13 is prime
13.0 is prime
15 has factor 5
15.0 has factor 5.0
9. Iterations and comprehensions. Write code to build a new list containing the square
roots of all the numbers in this list: [2, 4, 9, 16, 25]. Code this as a for loop first,
then as a map call, then as a list comprehension, and finally as a generator expres-
sion. Use the sqrt function in the built-in math module to do the calculation (i.e.,
import math and say math.sqrt(x)). Of the four, which approach do you like best?
10. Timing tools. In Chapter 5, we saw three ways to compute square roots:
math.sqrt(X), X ** .5, and pow(X, .5). If your programs run a lot of these, their
relative performance might become important. To see which is quickest, repurpose
the timeseqs.py script we wrote in this chapter to time each of these three tools.
Use the bestof or bestoftotal functions in one of this chapter’s timer modules to
test (you can use either the original, the 3.X-only keyword-only variant, or the 2.X/
3.X version, and may use Python’s timeit module as well). You might also want
to repackage the testing code in this script for better reusability—by passing a test
functions tuple to a general tester function, for example (for this exercise a copy-
and-modify approach is fine). Which of the three square root tools seems to run
fastest on your machine and Python in general? Finally, how might you go about
interactively timing the speed of dictionary comprehensions versus for loops?
11. Recursive functions. Write a simple recursion function named countdown that prints
numbers as it counts down to zero. For example, a call countdown(5) will print: 5
4 3 2 1 stop. There’s no obvious reason to code this with an explicit stack or
queue, but what about a nonfunction approach? Would a generator make sense
here?
12. Computing factorials. Finally, a computer science classic (but demonstrative none-
theless). We employed the notion of factorials in Chapter 20’s coverage of permu-
tations: N!, computed as N*(N-1)*(N-2)*...1. For instance, 6! is 6*5*4*3*2*1, or
720. Code and time four functions that, for a call fact(N), each return N!. Code
these four functions (1) as a recursive countdown per Chapter 19; (2) using the
functional reduce call per Chapter 19; (3) with a simple iterative counter loop per
Chapter 13; and (4) using the math.factorial library tool per Chapter 20. Use
Chapter 21’s timeit to time each of your functions. What conclusions can you
draw from your results?
PART V
Modules and Packages
CHAPTER 22
Modules: The Big Picture
This chapter begins our in-depth look at the Python module—the highest-level program
organization unit, which packages program code and data for reuse, and provides self-
contained namespaces that minimize variable name clashes across your programs. In
concrete terms, modules typically correspond to Python program files. Each file is a
module, and modules import other modules to use the names they define. Modules
might also correspond to extensions coded in external languages such as C, Java, or
C#, and even to directories in package imports. Modules are processed with two state-
ments and one important function:
import
    Lets a client (importer) fetch a module as a whole
from
    Allows clients to fetch particular names from a module
imp.reload (reload in 2.X)
    Provides a way to reload a module's code without stopping Python
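In brief, and as a preview of the chapters ahead, here is how the three are typically used,
assuming a hypothetical module file mymod.py that defines a function func:

import mymod                     # Load mymod.py as a whole module object
mymod.func()                     # Use one of its attributes, qualified

from mymod import func           # Copy a particular name out of the module
func()                           # Use the name directly, unqualified

import imp                       # 3.X: reload lives in the imp module
imp.reload(mymod)                # Rerun the module file's code in place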
Chapter 3 introduced module fundamentals, and we’ve been using them ever since.
The goal here is to expand on the core module concepts you’re already familiar with,
and move on to explore more advanced module usage. This first chapter reviews mod-
ule basics, and offers a general look at the role of modules in overall program structure.
In the chapters that follow, we’ll dig into the coding details behind the theory.
Along the way, we’ll flesh out module details omitted so far—you’ll learn about reloads,
the __name__ and __all__ attributes, package imports, relative import syntax, 3.3 name-
space packages, and so on. Because modules and classes are really just glorified name-
spaces, we’ll formalize namespace concepts here as well.
Why Use Modules?
In short, modules provide an easy way to organize components into a system by serving
as self-contained packages of variables known as namespaces. All the names defined at
the top level of a module file become attributes of the imported module object. As we
saw in the last part of this book, imports give access to names in a module’s global
scope. That is, the module file’s global scope morphs into the module object’s attribute
namespace when it is imported. Ultimately, Python’s modules allow us to link indi-
vidual files into a larger program system.
More specifically, modules have at least three roles:
Code reuse
As discussed in Chapter 3, modules let you save code in files permanently. Unlike
code you type at the Python interactive prompt, which goes away when you exit
Python, code in module files is persistent—it can be reloaded and rerun as many
times as needed. Just as importantly, modules are a place to define names, known
as attributes, which may be referenced by multiple external clients. When used
well, this supports a modular program design that groups functionality into reus-
able units.
System namespace partitioning
Modules are also the highest-level program organization unit in Python. Although
they are fundamentally just packages of names, these packages are also self-con-
tained—you can never see a name in another file, unless you explicitly import that
file. Much like the local scopes of functions, this helps avoid name clashes across
your programs. In fact, you can't avoid this feature—everything "lives" in a mod-
ule: both the code you run and the objects you create are always implicitly enclosed
in modules. Because of that, modules are natural tools for grouping system com-
ponents.
Implementing shared services or data
From an operational perspective, modules are also useful for implementing com-
ponents that are shared across a system and hence require only a single copy. For
instance, if you need to provide a global object that’s used by more than one func-
tion or file, you can code it in a module that can then be imported by many clients.
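For example, a module is a natural home for an object shared by many files—a minimal
sketch, with hypothetical filenames:

# shared.py: one copy of the data, imported everywhere
settings = {'debug': False}

# client1.py
import shared
shared.settings['debug'] = True  # Change is seen by all importers

# client2.py
import shared
print(shared.settings['debug'])  # Same single module object as client1's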
At least that’s the abstract story—for you to truly understand the role of modules in a
Python system, we need to digress for a moment and explore the general structure of
a Python program.
Python Program Architecture
So far in this book, I’ve sugarcoated some of the complexity in my descriptions of
Python programs. In practice, programs usually involve more than just one file. For all
but the simplest scripts, your programs will take the form of multifile systems—as the
code timing programs of the preceding chapter illustrate. Even if you can get by with
coding a single file yourself, you will almost certainly wind up using external files that
someone else has already written.
This section introduces the general architecture of Python programs—the way you di-
vide a program into a collection of source files (a.k.a. modules) and link the parts into
a whole. As we’ll see, Python fosters a modular program structure that groups func-
tionality into coherent and reusable units, in ways that are natural, and almost auto-
matic. Along the way, we’ll also explore the central concepts of Python modules, im-
ports, and object attributes.
How to Structure a Program
At a base level, a Python program consists of text files containing Python statements,
with one main top-level file, and zero or more supplemental files known as modules.
Here’s how this works. The top-level (a.k.a. script) file contains the main flow of control
of your program—this is the file you run to launch your application. The module files
are libraries of tools used to collect components used by the top-level file, and possibly
elsewhere. Top-level files use tools defined in module files, and modules use tools de-
fined in other modules.
Although they are files of code too, module files generally don’t do anything when run
directly; rather, they define tools intended for use in other files. A file imports a module
to gain access to the tools it defines, which are known as its attributes—variable names
attached to objects such as functions. Ultimately, we import modules and access their
attributes to use their tools.
Imports and Attributes
Let’s make this a bit more concrete. Figure 22-1 sketches the structure of a Python
program composed of three files: a.py, b.py, and c.py. The file a.py is chosen to be the
top-level file; it will be a simple text file of statements, which is executed from top to
bottom when launched. The files b.py and c.py are modules; they are simple text files
of statements as well, but they are not usually launched directly. Instead, as explained
previously, modules are normally imported by other files that wish to use the tools the
modules define.
For instance, suppose the file b.py in Figure 22-1 defines a function called spam, for
external use. As we learned when studying functions in Part IV, b.py will contain a
Python def statement to generate the function, which you can later run by passing zero
or more values in parentheses after the function’s name:
def spam(text):              # File b.py
    print(text, 'spam')
Now, suppose a.py wants to use spam. To this end, it might contain Python statements
such as the following:
import b # File a.py
b.spam('gumby') # Prints "gumby spam"
The first of these, a Python import statement, gives the file a.py access to everything
defined by top-level code in the file b.py. The code import b roughly means:
Load the file b.py (unless it’s already loaded), and give me access to all its attributes
through the name b.
To satisfy such goals, import (and, as you’ll see later, from) statements execute and load
other files on request. More formally, in Python, cross-file module linking is not re-
solved until such import statements are executed at runtime; their net effect is to assign
module names—simple variables like b—to loaded module objects. In fact, the module
name used in an import statement serves two purposes: it identifies the external file to
be loaded, but it also becomes a variable assigned to the loaded module.
Similarly, objects defined by a module are also created at runtime, as the import is
executing: import literally runs statements in the target file one at a time to create its
contents. Along the way, every name assigned at the top-level of the file becomes an
attribute of the module, accessible to importers. For example, the second of the state-
ments in a.py calls the function spam defined in the module b—created by running its
def statement during the import—using object attribute notation. The code b.spam
means:
Fetch the value of the name spam that lives within the object b.
This happens to be a callable function in our example, so we pass a string in parentheses
('gumby'). If you actually type these files, save them, and run a.py, the words “gumby
spam” will be printed.
As we’ve seen, the object.attribute notation appears throughout Python code—most
objects have useful attributes that are fetched with the “.” operator. Some reference
callable objects like functions that take action (e.g., a salary computer), and others are simple data values that denote more static objects and properties (e.g., a person’s name).

Figure 22-1. Program architecture in Python. A program is a system of modules. It has one top-level script file (launched to run the program), and multiple module files (imported libraries of tools). Scripts and modules are both text files containing Python statements, though the statements in modules usually just create objects to be used later. Python’s standard library provides a collection of precoded modules.
The notion of importing is also completely general throughout Python. Any file can
import tools from any other file. For instance, the file a.py may import b.py to call its
function, but b.py might also import c.py to leverage different tools defined there. Im-
port chains can go as deep as you like: in this example, the module a can import b,
which can import c, which can import b again, and so on.
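To make such a chain concrete, here is one possible (and purely hypothetical) set of file contents matching this description:

# c.py: a module imported by another module
def eggs():
    return 'eggs'

# b.py: imports c to leverage its tool while defining its own
import c
def spam(text):
    print(text, 'spam and', c.eggs())

# a.py: the top-level script; running it triggers the entire chain
import b
b.spam('gumby')              # prints "gumby spam and eggs"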
Besides serving as the highest organizational structure, modules (and module packages,
described in Chapter 24) are also the highest level of code reuse in Python. Coding
components in module files makes them useful in your original program, and in any
other programs you may write later. For instance, if after coding the program in Fig-
ure 22-1 we discover that the function b.spam is a general-purpose tool, we can reuse
it in a completely different program; all we have to do is import the file b.py again from
the other program’s files.
Standard Library Modules
Notice the rightmost portion of Figure 22-1. Some of the modules that your programs
will import are provided by Python itself and are not files you will code.
Python automatically comes with a large collection of utility modules known as the
standard library. This collection, over 200 modules large at last count, contains plat-
form-independent support for common programming tasks: operating system inter-
faces, object persistence, text pattern matching, network and Internet scripting, GUI
construction, and much more. None of these tools are part of the Python language
itself, but you can use them by importing the appropriate modules on any standard
Python installation. Because they are standard library modules, you can also be rea-
sonably sure that they will be available and will work portably on most platforms on
which you will run Python.
This book’s examples employ a few of the standard library’s modules—timeit, sys,
and os in last chapter’s code, for instance—but we’ll really only scratch the surface of
the libraries story here. For a complete look, you should browse the standard Python
library reference manual, available either online at http://www.python.org, or with your
Python installation (via IDLE or Python’s Start button menu on some Windows versions). The
PyDoc tool discussed in Chapter 15 is another way to explore standard library modules.
Because there are so many modules, this is really the only way to get a feel for what
tools are available. You can also find tutorials on Python library tools in commercial
books that cover application-level programming, such as O’Reilly’s Programming
Python, but the manuals are free, viewable in any web browser (in HTML format),
viewable in other formats (e.g., Windows help), and updated each time Python is re-
released. See Chapter 15 for more pointers.
How Imports Work
The prior section talked about importing modules without really explaining what hap-
pens when you do so. Because imports are at the heart of program structure in Python,
this section goes into more formal detail on the import operation to make this process
less abstract.
Some C programmers like to compare the Python module import operation to a C
#include, but they really shouldn’t—in Python, imports are not just textual insertions
of one file into another. They are really runtime operations that perform three distinct
steps the first time a program imports a given file:
1. Find the module’s file.
2. Compile it to byte code (if needed).
3. Run the module’s code to build the objects it defines.
To better understand module imports, we’ll explore these steps in turn. Bear in mind
that all three of these steps are carried out only the first time a module is imported
during a program’s execution; later imports of the same module in a program run
bypass all of these steps and simply fetch the already loaded module object in memory.
Technically, Python does this by storing loaded modules in a table named sys.modules and checking there at the start of an import operation. If the module is not present, a three-step process begins.
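You can watch this table at work interactively; for instance, assuming the standard library’s email package has not yet been imported in your session:

>>> import sys
>>> 'email' in sys.modules        # not loaded yet
False
>>> import email
>>> 'email' in sys.modules        # cached: later imports reuse this entry
True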
1. Find It
First, Python must locate the module file referenced by an import statement. Notice
that the import statement in the prior section’s example names the file without a .py
extension and without its directory path: it just says import b, instead of something
like import c:\dir1\b.py. Path and extension details are omitted on purpose; instead,
Python uses a standard module search path and known file types to locate the module
file corresponding to an import statement.1 Because this is the main part of the import
operation that programmers must know about, we’ll return to this topic in a moment.
1. It’s syntactically illegal to include path and extension details in a standard import. However, package
imports, which we’ll discuss in Chapter 24, allow import statements to include part of the directory path
leading to a file as a set of period-separated names. Package imports, though, still rely on the normal
module search path to locate the leftmost directory in a package path (i.e., they are relative to a directory
in the search path). They also cannot make use of any platform-specific directory syntax in the import
statements; such syntax only works on the search path. Also, note that module file search path issues are
not as relevant when you run frozen executables (discussed in Chapter 2), which typically embed byte
code in the binary image.
2. Compile It (Maybe)
After finding a source code file that matches an import statement by traversing the
module search path, Python next compiles it to byte code, if necessary. We discussed
byte code briefly in Chapter 2, but it’s a bit richer than explained there. During an
import operation Python checks both file modification times and the byte code’s Python
version number to decide how to proceed. The former uses file “timestamps,” and the
latter uses either a “magic” number embedded in the byte code or a filename, depending
on the Python release being used. This step chooses an action as follows:
Compile
If the byte code file is older than the source file (i.e., if you’ve changed the source)
or was created by a different Python version, Python automatically regenerates the
byte code when the program is run.
As discussed ahead, this model is modified somewhat in Python 3.2 and later—
byte code files are segregated in a __pycache__ subdirectory and named with their
Python version to avoid contention and recompiles when multiple Pythons are
installed. This obviates the need to check version numbers in the byte code, but
the timestamp check is still used to detect changes in the source.
Don’t compile
If, on the other hand, Python finds a .pyc byte code file that is not older than the
corresponding .py source file and was created by the same Python version, it skips
the source-to-byte-code compile step.
In addition, if Python finds only a byte code file on the search path and no source,
it simply loads the byte code directly; this means you can ship a program as just
byte code files and avoid sending source. In other words, the compile step is by-
passed if possible to speed program startup.
Notice that compilation happens when a file is being imported. Because of this, you
will not usually see a .pyc byte code file for the top-level file of your program, unless it
is also imported elsewhere—only imported files leave behind .pyc files on your ma-
chine. The byte code of top-level files is used internally and discarded; byte code of
imported files is saved in files to speed future imports.
Top-level files are often designed to be executed directly and not imported at all. Later,
we’ll see that it is possible to design a file that serves both as the top-level code of a
program and as a module of tools to be imported. Such a file may be both executed
and imported, and thus does generate a .pyc. To learn how this works, watch for the
discussion of the special __name__ attribute and __main__ in Chapter 25.
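As a quick preview of that technique, a dual-use file typically takes the following form (the names here are hypothetical):

# dualuse.py: usable as both a script and a module
def tool():
    print('tool run')

if __name__ == '__main__':       # true only when run as the top-level file
    tool()                       # self-test code: skipped when imported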
3. Run It
The final step of an import operation executes the byte code of the module. All state-
ments in the file are run in turn, from top to bottom, and any assignments made to
names during this step generate attributes of the resulting module object. This is how
the tools defined by the module’s code are created. For instance, def statements in a
file are run at import time to create functions and assign attributes within the module
to those functions. The functions can then be called later in the program by the file’s
importers.
Because this last import step actually runs the file’s code, if any top-level code in a
module file does real work, you’ll see its results at import time. For example, top-level
print statements in a module show output when the file is imported. Function def
statements simply define objects for later use.
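For example, given a hypothetical module file like the following:

# loud.py: top-level code runs at import time
print('initializing loud')       # displays output on first import

def tool():                      # merely creates a function object
    print('tool called')

the message appears the first time the file is imported, but calling tool produces output only when an importer actually invokes it.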
As you can see, import operations involve quite a bit of work—they search for files,
possibly run a compiler, and run Python code. Because of this, any given module is
imported only once per process by default. Future imports skip all three import steps
and reuse the already loaded module in memory. If you need to import a file again after
it has already been loaded (for example, to support dynamic end-user customizations),
you have to force the issue with an imp.reload call—a tool we’ll meet in the next
chapter.2
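As a brief preview, forcing a reload looks like this in Python 3.X—mymod here is a hypothetical, already-imported module:

>>> import mymod
>>> from imp import reload       # reload must be imported in 3.X
>>> reload(mymod)                # reruns the file's code in the module object
<module 'mymod' from '.\\mymod.py'>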
Byte Code Files: __pycache__ in Python 3.2+
As mentioned briefly, the way that Python stores files to retain the byte code that results
from compiling your source has changed in Python 3.2 and later. First of all, if Python
cannot write a file to save this on your computer for any reason, your program still runs
fine—Python simply creates and uses the byte code in memory and discards it on exit.
To speed startups, though, it will try to save byte code in a file in order to skip the
compile step next time around. The way it does this varies per Python version:
In Python 3.1 and earlier (including all of Python 2.X)
Byte code is stored in files in the same directory as the corresponding source files,
normally with the filename extension .pyc (e.g., module.pyc). Byte code files are
also stamped internally with the version of Python that created them (known as a
“magic” field to developers) so Python knows to recompile when this differs in the
version of Python running your program. For instance, if you upgrade to a new
Python whose byte code differs, all your byte code files will be recompiled auto-
matically due to a version number mismatch, even if you haven’t changed your
source code.
In Python 3.2 and later
Byte code is instead stored in files in a subdirectory named __pycache__, which
Python creates if needed, and which is located in the directory containing the cor-
responding source files. This helps avoid clutter in your source directories by seg-
regating the byte code files in their own directory. In addition, although byte code files still get the .pyc extension as before, they are given more descriptive names that include text identifying the version of Python that created them (e.g., module.cpython-32.pyc). This avoids contention and recompiles: because each version of Python installed can have its own uniquely named version of byte code files in the __pycache__ subdirectory, running under a given version doesn’t overwrite the byte code of another, and doesn’t require recompiles. Technically, byte code filenames also include the name of the Python that created them, so CPython, Jython, and other implementations mentioned in the preface and Chapter 2 can coexist on the same machine without stepping on each other’s work (once they support this model).

In both models, Python always recreates the byte code file if you’ve changed the source code file since the last compile, but version differences are handled differently—by magic numbers and replacement prior to 3.2, and by filenames that allow for multiple copies in 3.2 and later.

2. As described earlier, Python keeps already imported modules in the built-in sys.modules dictionary so it can keep track of what’s been loaded. In fact, if you want to see which modules are loaded, you can import sys and print list(sys.modules.keys()). There’s more on other uses for this internal table in Chapter 25.
Byte Code File Models in Action
The following is a quick example of these two models in action under 2.X and 3.3. I’ve
omitted much of the text displayed by the dir directory listing on Windows here to
save space, and the script used here isn’t listed because it is not relevant to this discus-
sion (it’s from Chapter 2, and simply prints two values). Prior to 3.2, byte code files
show up alongside their source files after being created by import operations:
c:\code\py2x> dir
10/31/2012 10:58 AM 39 script0.py
c:\code\py2x> C:\python27\python
>>> import script0
hello world
1267650600228229401496703205376
>>> ^Z
c:\code\py2x> dir
10/31/2012 10:58 AM 39 script0.py
10/31/2012 11:00 AM 154 script0.pyc
However, in 3.2 and later byte code files are saved in the __pycache__ subdirectory and
include versions and Python implementation details in their names to avoid clutter and
contention among the Pythons on your computer:
c:\code\py2x> cd ..\py3x
c:\code\py3x> dir
10/31/2012 10:58 AM 39 script0.py
c:\code\py3x> C:\python33\python
>>> import script0
hello world
1267650600228229401496703205376
>>> ^Z
c:\code\py3x> dir
10/31/2012 10:58 AM 39 script0.py
10/31/2012 11:00 AM <DIR> __pycache__
c:\code\py3x> dir __pycache__
10/31/2012 11:00 AM 184 script0.cpython-33.pyc
Crucially, under the model used in 3.2 and later, importing the same file with a different
Python creates a different byte code file, instead of overwriting the single file as done
by the pre-3.2 model—in the newer model, each Python version and implementation
has its own byte code files, ready to be loaded on the next program run (earlier Pythons
will happily continue using their scheme on the same machine):
c:\code\py3x> C:\python32\python
>>> import script0
hello world
1267650600228229401496703205376
>>> ^Z
c:\code\py3x> dir __pycache__
10/31/2012 12:28 PM 178 script0.cpython-32.pyc
10/31/2012 11:00 AM 184 script0.cpython-33.pyc
Python 3.2’s newer byte code file model is probably superior, as it avoids recompiles
when there is more than one Python on your machine—a common case in today’s
mixed 2.X/3.X world. On the other hand, it is not without potential incompatibilities
in programs that rely on the prior file and directory structure. This may be a compatibility issue in some tools, for instance, though most well-behaved tools
should work as before. See Python 3.2’s “What’s New?” document for details on po-
tential impacts.
Also keep in mind that this process is completely automatic—it’s a side effect of running
programs—and most programmers probably won’t care about or even notice the dif-
ference, apart from faster startups due to fewer recompiles.
The Module Search Path
As mentioned earlier, the part of the import procedure that most programmers will
need to care about is usually the first—locating the file to be imported (the “find it”
part). Because you may need to tell Python where to look to find files to import, you
need to know how to tap into its search path in order to extend it.
In many cases, you can rely on the automatic nature of the module import search path
and won’t need to configure this path at all. If you want to be able to import user-
defined files across directory boundaries, though, you will need to know how the search
path works in order to customize it. Roughly, Python’s module search path is composed
of the concatenation of these major components, some of which are preset for you and
some of which you can tailor to tell Python where to look:
1. The home directory of the program
2. PYTHONPATH directories (if set)
3. Standard library directories
4. The contents of any .pth files (if present)
5. The site-packages home of third-party extensions
Ultimately, the concatenation of these five components becomes sys.path, a mutable list of directory name strings that I’ll expand upon later in this section. The first, third, and fifth elements of the search path are defined automatically. Because Python searches the concatenation of these components from first to last, though, the second and fourth elements can be used to extend the path to include your own source code directories. Here is how Python uses each of these path components:
Home directory (automatic)
Python first looks for the imported file in the home directory. The meaning of this
entry depends on how you are running the code. When you’re running a pro-
gram, this entry is the directory containing your program’s top-level script file.
When you’re working interactively, this entry is the directory in which you are
working (i.e., the current working directory).
Because this directory is always searched first, if a program is located entirely in a
single directory, all of its imports will work automatically with no path configura-
tion required. On the other hand, because this directory is searched first, its files
will also override modules of the same name in directories elsewhere on the path;
be careful not to accidentally hide library modules this way if you need them in
your program, or use package tools we’ll meet later that can partially sidestep this
issue.
PYTHONPATH directories (configurable)
Next, Python searches all directories listed in your PYTHONPATH environment vari-
able setting, from left to right (assuming you have set this at all: it’s not preset for
you). In brief, PYTHONPATH is simply a list of user-defined and platform-specific
names of directories that contain Python code files. You can add all the directories
from which you wish to be able to import, and Python will extend the module
search path to include all the directories your PYTHONPATH lists.
Because Python searches the home directory first, this setting is only important
when importing files across directory boundaries—that is, if you need to import a
file that is stored in a different directory from the file that imports it. You’ll probably
want to set your PYTHONPATH variable once you start writing substantial programs,
but when you’re first starting out, as long as you save all your module files in the
directory in which you’re working (i.e., the home directory, like the C:\code used
in this book) your imports will work without you needing to worry about this
setting at all.
Standard library directories (automatic)
Next, Python automatically searches the directories where the standard library
modules are installed on your machine. Because these are always searched, they
normally do not need to be added to your PYTHONPATH or included in path files
(discussed next).
.pth path file directories (configurable)
Next, a lesser-used feature of Python allows users to add directories to the module
search path by simply listing them, one per line, in a text file whose name ends
with a .pth suffix (for “path”). These path configuration files are a somewhat ad-
vanced installation-related feature; we won’t cover them fully here, but they pro-
vide an alternative to PYTHONPATH settings.
In short, text files of directory names dropped in an appropriate directory can serve
roughly the same role as the PYTHONPATH environment variable setting. For instance,
if you’re running Windows and Python 3.3, a file named myconfig.pth may be
placed at the top level of the Python install directory (C:\Python33) or in the site-
packages subdirectory of the standard library there (C:\Python33\Lib\site-pack-
ages) to extend the module search path. On Unix-like systems, this file might be
located in /usr/local/lib/python3.3/site-packages or /usr/local/lib/site-python instead.
When such a file is present, Python will add the directories listed on each line of
the file, from first to last, near the end of the module search path list—currently,
after PYTHONPATH and standard libraries, but before the site-packages directory
where third-party extensions are often installed. In fact, Python will collect the
directory names in all the .pth path files it finds and will filter out any duplicates
and nonexistent directories. Because they are files rather than shell settings, path
files can apply to all users of an installation, instead of just one user or shell. More-
over, for some users and applications, text files may be simpler to code than envi-
ronment settings.
This feature is more sophisticated than I’ve described here. For more details, con-
sult the Python library manual, and especially its documentation for the standard
library module site—this module allows the locations of Python libraries and path
files to be configured, and its documentation describes the expected locations of
path files in general. I recommend that beginners use PYTHONPATH or perhaps a sin-
gle .pth file, and then only if you must import across directories. Path files are used
more often by third-party libraries, which commonly install a path file in Python’s
site-packages, described next.
The Lib\site-packages directory of third-party extensions (automatic)
Finally, Python automatically adds the site-packages subdirectory of its standard
library to the module search path. By convention, this is the place that most third-
party extensions are installed, often automatically by the distutils utility de-
scribed in an upcoming sidebar. Because their install directory is always part of the
module search path, clients can import the modules of such extensions without
any path settings.
Configuring the Search Path
The net effect of all of this is that both the PYTHONPATH and path file components of the
search path allow you to tailor the places where imports look for files. The way you set
environment variables and where you store path files varies per platform. For instance,
on Windows, you might use your Control Panel’s System icon to set PYTHONPATH to a
list of directories separated by semicolons, like this:
c:\pycode\utilities;d:\pycode\package1
Or you might instead create a text file called C:\Python33\pydirs.pth, which looks like
this:
c:\pycode\utilities
d:\pycode\package1
These settings are analogous on other platforms, but the details can vary too widely for
us to cover in this chapter. See Appendix A for pointers on extending your module
search path with PYTHONPATH or .pth files on various platforms.
Search Path Variations
This description of the module search path is accurate, but generic; the exact config-
uration of the search path is prone to changing across platforms, Python releases, and
even Python implementations. Depending on your platform, additional directories may
automatically be added to the module search path as well.
For instance, some Pythons may add an entry for the current working directory—the
directory from which you launched your program—to the search path before the
PYTHONPATH directories. When you’re launching from a command line, the current
working directory may not be the same as the home directory of your top-level file (i.e.,
the directory where your program file resides), which is always added. Because the
current working directory can vary each time your program runs, you normally
shouldn’t depend on its value for import purposes. See Chapter 3 for more on launching
programs from command lines.3

3. Also watch for Chapter 24’s discussion of the new relative import syntax and search rules in Python 3.X; they modify the search path for from statements in files inside packages when “.” characters are used (e.g., from . import string). By default, a package’s own directory is not automatically searched by imports in Python 3.X, unless such relative imports are used by files in the package itself.

To see how your Python configures the module search path on your platform, you can always inspect sys.path—the topic of the next section.

The sys.path List

If you want to see how the module search path is truly configured on your machine, you can always inspect the path as Python knows it by printing the built-in sys.path
list (that is, the path attribute of the standard library module sys). This list of directory
name strings is the actual search path within Python; on imports, Python searches each
directory in this list from left to right, and uses the first file match it finds.
Really, sys.path is the module search path. Python configures it at program startup,
automatically merging the home directory of the top-level file (or an empty string to
designate the current working directory), any PYTHONPATH directories, the contents of
any .pth file paths you’ve created, and all the standard library directories. The result is
a list of directory name strings that Python searches on each import of a new file.
Python exposes this list for two good reasons. First, it provides a way to verify the search
path settings you’ve made—if you don’t see your settings somewhere in this list, you
need to recheck your work. For example, here is what my module search path looks
like on Windows under Python 3.3, with my PYTHONPATH set to C:\code and a C:
\Python33\mypath.pth path file that lists C:\Users\mark. The empty string at the front
means current directory, and my two settings are merged in; the rest are standard library
directories and files and the site-packages home for third-party extensions:
>>> import sys
>>> sys.path
['', 'C:\\code', 'C:\\Windows\\system32\\python33.zip', 'C:\\Python33\\DLLs',
'C:\\Python33\\lib', 'C:\\Python33', 'C:\\Users\\mark',
'C:\\Python33\\lib\\site-packages']
Second, if you know what you’re doing, this list provides a way for scripts to tailor their
search paths manually. As you’ll see by example later in this part of the book, by
modifying the sys.path list, you can modify the search path for all future imports made
in a program’s run. Such changes last only for the duration of the script, however;
PYTHONPATH and .pth files offer more permanent ways to modify the path—the first per
user, and the second per installation.
On the other hand, some programs really do need to change sys.path. Scripts that run
on web servers, for example, often run as the user “nobody” to limit machine access.
Because such scripts cannot usually depend on “nobody” to have set PYTHONPATH in any
particular way, they often set sys.path manually to include required source directories,
prior to running any import statements. A sys.path.append or sys.path.insert will
often suffice, though will endure for a single program run only.
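For instance, such a script might arrange its own imports like this—the directory and module names here are hypothetical:

import sys
sys.path.append(r'C:\mycode')    # extend the path for this process only
import mymodule                  # now also searched for in C:\mycode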
Module File Selection
Keep in mind that filename extensions (e.g., .py) are omitted from import statements
intentionally. Python chooses the first file it can find on the search path that matches
the imported name. In fact, imports are the point of interface to a host of external
components—source code, multiple flavors of byte code, compiled extensions, and
more. Python automatically selects any type that matches a module’s name.
Module sources
For example, an import statement of the form import b might today load or resolve to:
A source code file named b.py
A byte code file named b.pyc
An optimized byte code file named b.pyo (a less common format)
A directory named b, for package imports (described in Chapter 24)
A compiled extension module, coded in C, C++, or another language, and dy-
namically linked when imported (e.g., b.so on Linux, or b.dll or b.pyd on Cygwin
and Windows)
A compiled built-in module coded in C and statically linked into Python
A ZIP file component that is automatically extracted when imported
An in-memory image, for frozen executables
A Java class, in the Jython version of Python
A .NET component, in the IronPython version of Python
C extensions, Jython, and package imports all extend imports beyond simple files. To
importers, though, differences in the loaded file type are completely irrelevant, both
when importing and when fetching module attributes. Saying import b gets whatever
module b is, according to your module search path, and b.attr fetches an item in the
module, be it a Python variable or a linked-in C function. Some standard modules we
will use in this book are actually coded in C, not Python; because they look just like
Python-coded module files, their clients don’t have to care.
Selection priorities
If you have both a b.py and a b.so in different directories, Python will always load the
one found in the first (leftmost) directory of your module search path during the left-
to-right search of sys.path. But what happens if it finds both a b.py and a b.so in the
same directory? In this case, Python follows a standard picking order, though this order
is not guaranteed to stay the same over time or across implementations. In general, you
should not depend on which type of file Python will choose within a given directory—
make your module names distinct, or configure your module search path to make your
module selection preferences explicit.
Import hooks and ZIP files
Normally, imports work as described in this section—they find and load files on your
machine. However, it is possible to redefine much of what an import operation does
in Python, using what are known as import hooks. These hooks can be used to make
imports do various useful things, such as loading files from archives, performing de-
cryption, and so on.
In fact, Python itself makes use of these hooks to enable files to be directly imported
from ZIP archives: archived files are automatically extracted at import time when
a .zip file is selected from the module import search path. One of the standard library
directories in the earlier sys.path display, for example, is a .zip file today. For more
details, see the Python standard library manual’s description of the built-in
__import__ function, the customizable tool that import statements actually run.
Also see Python 3.3’s “What’s New?” document for updates on this front that we’ll mostly omit here for space. In short, in this version and later, the __import__ function is now implemented by importlib.__import__, in part to unify and more clearly expose its implementation.

The latter of these calls is also wrapped by importlib.import_module, a tool that, per Python’s current manuals, is generally preferred over __import__ for direct calls to import by name string, a technique discussed in Chapter 25. Both calls still work today, though the __import__ function supports customizing imports by replacement in the built-in scope (see Chapter 17), and other techniques support similar roles. See the Python library manuals for more details.
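For instance, importing by a name string at runtime looks like this with the generally preferred call (string here is the standard library module of that name):

>>> import importlib
>>> modname = 'string'                          # module name as a runtime string
>>> string = importlib.import_module(modname)   # preferred over __import__ today
>>> string.capwords('spam and eggs')
'Spam And Eggs'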
Optimized byte code files
Finally, Python also supports the notion of .pyo optimized byte code files, created and
run with the -O Python command-line flag, and automatically generated by some install
tools. Because these run only slightly faster than normal .pyc files (typically 5 percent
faster), however, they are infrequently used. The PyPy system (see Chapter 2 and
Chapter 21), for example, provides more substantial speedups. See Appendix A and
Chapter 36 for more on .pyo files.
Third-Party Software: distutils
This chapter’s description of module search path settings is targeted mainly at user-
defined source code that you write on your own. Third-party extensions for Python
typically use the distutils tools in the standard library to automatically install them-
selves, so no path configuration is required to use their code.
Systems that use distutils generally come with a setup.py script, which is run to install
them; this script imports and uses distutils modules to place such systems in a direc-
tory that is automatically part of the module search path (usually in the Lib\site-pack-
ages subdirectory of the Python install tree, wherever that resides on the target ma-
chine).
For more details on distributing and installing with distutils, see the Python standard
manual set; its use is beyond the scope of this book (for instance, it also provides ways
to automatically compile C-coded extensions on the target machine). Also check out
the third-party open source eggs system, which adds dependency checking for installed
Python software.
Note: as this fifth edition is being written, there is some talk of deprecating distutils
and replacing it with a newer distutils2 package in the Python standard library. The
status of this is unclear—it was anticipated in 3.3 but did not appear—so be sure to
see Python’s “What’s New” documents for updates on this front that may emerge after
this book is released.
Chapter Summary
In this chapter, we covered the basics of modules, attributes, and imports and explored
the operation of import statements. We learned that imports find the designated file on
the module search path, compile it to byte code, and execute all of its statements to
generate its contents. We also learned how to configure the search path to be able to
import from directories other than the home directory and the standard library direc-
tories, primarily with PYTHONPATH settings.
As this chapter demonstrated, the import operation and modules are at the heart of
program architecture in Python. Larger programs are divided into multiple files, which
are linked together at runtime by imports. Imports in turn use the module search path
to locate files, and modules define attributes for external use.
Of course, the whole point of imports and modules is to provide a structure to your
program, which divides its logic into self-contained software components. Code in one
module is isolated from code in another; in fact, no file can ever see the names defined
in another, unless explicit import statements are run. Because of this, modules minimize
name collisions between different parts of your program.
You’ll see what this all means in terms of actual statements and code in the next chapter.
Before we move on, though, let’s run through the chapter quiz.
Test Your Knowledge: Quiz
1. How does a module source code file become a module object?
2. Why might you have to set your PYTHONPATH environment variable?
3. Name the five major components of the module import search path.
4. Name four file types that Python might load in response to an import operation.
5. What is a namespace, and what does a module’s namespace contain?
Test Your Knowledge: Answers
1. A module’s source code file automatically becomes a module object when that
module is imported. Technically, the module’s source code is run during the im-
port, one statement at a time, and all the names assigned in the process become
attributes of the module object.
2. You only need to set PYTHONPATH to import from directories other than the one in
which you are working (i.e., the current directory when working interactively, or
the directory containing your top-level file). In practice, this will be a common case
for nontrivial programs.
3. The five major components of the module import search path are the top-level
script’s home directory (the directory containing it), all directories listed in the
PYTHONPATH environment variable, the standard library directories, all directories
listed in .pth path files located in standard places, and the site-packages root di-
rectory for third-party extension installs. Of these, programmers can customize
PYTHONPATH and .pth files.
4. Python might load a source code (.py) file, a byte code (.pyc or .pyo) file, a C ex-
tension module (e.g., a .so file on Linux or a .dll or .pyd file on Windows), or a
directory of the same name for package imports. Imports may also load more exotic
things such as ZIP file components, Java classes under the Jython version of
Python, .NET components under IronPython, and statically linked C extensions
that have no files present at all. In fact, with import hooks, imports can load arbi-
trary items.
5. A namespace is a self-contained package of variables, which are known as the
attributes of the namespace object. A module’s namespace contains all the names
assigned by code at the top level of the module file (i.e., not nested in def or
class statements). Technically, a module’s global scope morphs into the module
object’s attributes namespace. A module’s namespace may also be altered by as-
signments from other files that import it, though this is generally frowned upon
(see Chapter 17 for more on the downsides of cross-file changes).
CHAPTER 23
Module Coding Basics
Now that we’ve looked at the larger ideas behind modules, let’s turn to some examples
of modules in action. Although some of the early topics in this chapter will be review
for linear readers who have already applied them in previous chapters’ examples, we’ll
find that they quickly lead us to further details surrounding Python’s modules that we
haven’t yet met, such as nesting, reloads, scopes, and more.
Python modules are easy to create; they’re just files of Python program code created
with a text editor. You don’t need to write special syntax to tell Python you’re making
a module; almost any text file will do. Because Python handles all the details of finding
and loading modules, modules are also easy to use; clients simply import a module, or
specific names a module defines, and use the objects they reference.
Module Creation
To define a module, simply use your text editor to type some Python code into a text
file, and save it with a “.py” extension; any such file is automatically considered a
Python module. All the names assigned at the top level of the module become its
attributes (names associated with the module object) and are exported for clients to use
—they morph from variable to module object attribute automatically.
For instance, if you type the following def into a file called module1.py and import it,
you create a module object with one attribute—the name printer, which happens to
be a reference to a function object:
def printer(x):              # Module attribute
    print(x)
Module Filenames
Before we go on, I should say a few more words about module filenames. You can call
modules just about anything you like, but module filenames should end in a .py suffix
if you plan to import them. The .py is technically optional for top-level files that will
be run but not imported, but adding it in all cases makes your files’ types more obvious
and allows you to import any of your files in the future.
Because module names become variable names inside a Python program (without
the .py), they should also follow the normal variable name rules outlined in Chap-
ter 11. For instance, you can create a module file named if.py, but you cannot import
it because if is a reserved word—when you try to run import if, you’ll get a syntax
error. In fact, both the names of module files and the names of directories used in
package imports (discussed in the next chapter) must conform to the rules for variable
names presented in Chapter 11; they may, for instance, contain only letters, digits, and
underscores. Package directories also cannot contain platform-specific syntax such as
spaces in their names.
When a module is imported, Python maps the internal module name to an external
filename by adding a directory path from the module search path to the front, and
a .py or other extension at the end. For instance, a module named M ultimately maps
to some external file <directory>\M.<extension> that contains the module’s code.
Other Kinds of Modules
As mentioned in the preceding chapter, it is also possible to create a Python module by
writing code in an external language such as C, C++, and others (e.g., Java, in the
Jython implementation of the language). Such modules are called extension modules,
and they are generally used to wrap up external libraries for use in Python scripts. When
imported by Python code, extension modules look and feel the same as modules coded
as Python source code files—they are accessed with import statements, and they provide
functions and objects as module attributes. Extension modules are beyond the scope
of this book; see Python’s standard manuals or advanced texts such as Programming
Python for more details.
Module Usage
Clients can use the simple module file we just wrote by running an import or from
statement. Both statements find, compile, and run a module file’s code, if it hasn’t yet
been loaded. The chief difference is that import fetches the module as a whole, so you
must qualify to fetch its names; in contrast, from fetches (or copies) specific names out
of the module.
Let’s see what this means in terms of code. All of the following examples wind up calling
the printer function defined in the prior section’s module1.py module file, but in dif-
ferent ways.
The import Statement
In the first example, the name module1 serves two different purposes—it identifies an
external file to be loaded, and it becomes a variable in the script, which references the
module object after the file is loaded:
>>> import module1 # Get module as a whole (one or more)
>>> module1.printer('Hello world!') # Qualify to get names
Hello world!
The import statement simply lists one or more names of modules to load, separated by
commas. Because it gives a name that refers to the whole module object, we must go
through the module name to fetch its attributes (e.g., module1.printer).
The from Statement
By contrast, because from copies specific names from one file over to another scope, it
allows us to use the copied names directly in the script without going through the
module (e.g., printer):
>>> from module1 import printer # Copy out a variable (one or more)
>>> printer('Hello world!') # No need to qualify name
Hello world!
This form of from allows us to list one or more names to be copied out, separated by
commas. Here, it has the same effect as the prior example, but because the imported
name is copied into the scope where the from statement appears, using that name in
the script requires less typing—we can use it directly instead of naming the enclosing
module. In fact, we must; from doesn’t assign the name of the module itself.
As you’ll see in more detail later, the from statement is really just a minor extension to
the import statement—it imports the module file as usual (running the full three-step
procedure of the preceding chapter), but adds an extra step that copies one or more
names (not objects) out of the file. The entire file is loaded, but you’re given names for
more direct access to its parts.
The from * Statement
Finally, the next example uses a special form of from: when we use a * instead of specific
names, we get copies of all names assigned at the top level of the referenced module.
Here again, we can then use the copied name printer in our script without going
through the module name:
>>> from module1 import * # Copy out _all_ variables
>>> printer('Hello world!')
Hello world!
Technically, both import and from statements invoke the same import operation; the
from * form simply adds an extra step that copies all the names in the module into the
importing scope. It essentially collapses one module’s namespace into another; again,
the net effect is less typing for us. Note that only * works in this context; you can’t use
pattern matching to select a subset of names (though you could with more work and
a loop through a module’s __dict__, discussed ahead).
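As a rough sketch of that manual alternative, reusing this chapter’s module1 file, you can copy out just the names matching a pattern yourself:

>>> import module1
>>> for name in list(module1.__dict__):              # all top-level names
...     if name.startswith('print'):                 # select by pattern
...         globals()[name] = getattr(module1, name)
...
>>> printer('Hello world!')
Hello world!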
And that’s it—modules really are simple to use. To give you a better understanding of
what really happens when you define and use modules, though, let’s move on to look
at some of their properties in more detail.
In Python 3.X, the from ...* statement form described here can be used
only at the top level of a module file, not within a function. Python 2.X
allows it to be used within a function, but issues a warning anyhow. It’s
rare to see this statement used inside a function in practice; when
present, it makes it impossible for Python to detect variables statically,
before the function runs. Best practice in all Pythons recommends listing
all your imports at the top of a module file; it’s not required, but makes
them easier to spot.
Imports Happen Only Once
One of the most common questions people seem to ask when they start using modules
is, “Why won’t my imports keep working?” They often report that the first import
works fine, but later imports during an interactive session (or program run) seem to
have no effect. In fact, they’re not supposed to. This section explains why.
Modules are loaded and run on the first import or from, and only the first. This is on
purpose—because importing is an expensive operation, by default Python does it just
once per file, per process. Later import operations simply fetch the already loaded
module object.
Initialization code
As one consequence, because top-level code in a module file is usually executed only
once, you can use it to initialize variables. Consider the file simple.py, for example:
print('hello')
spam = 1 # Initialize variable
In this example, the print and = statements run the first time the module is imported,
and the variable spam is initialized at import time:
% python
>>> import simple # First import: loads and runs file's code
hello
>>> simple.spam # Assignment makes an attribute
1
Second and later imports don’t rerun the module’s code; they just fetch the already
created module object from Python’s internal modules table. Thus, the variable spam
is not reinitialized:
>>> simple.spam = 2 # Change attribute in module
>>> import simple # Just fetches already loaded module
>>> simple.spam # Code wasn't rerun: attribute unchanged
2
Of course, sometimes you really want a module’s code to be rerun on a subsequent
import. We’ll see how to do this with Python’s reload function later in this chapter.
import and from Are Assignments
Just like def, import and from are executable statements, not compile-time declarations.
They may be nested in if tests, to select among options; appear in function defs, to be
loaded only on calls (subject to the preceding note); be used in try statements, to pro-
vide defaults; and so on. They are not resolved or run until Python reaches them while
executing your program. In other words, imported modules and names are not available
until their associated import or from statements run.
Changing mutables in modules
Also, like def, the import and from are implicit assignments:
import assigns an entire module object to a single name.
from assigns one or more names to objects of the same names in another module.
All the things we’ve already discussed about assignment apply to module access, too.
For instance, names copied with a from become references to shared objects; as with
function arguments, reassigning a copied name has no effect on the module from which
it was copied, but changing a shared mutable object through a copied name can also
change it in the module from which it was imported. To illustrate, consider the fol-
lowing file, small.py:
x = 1
y = [1, 2]
When importing with from, we copy names to the importer’s scope that initially share
objects referenced by the module’s names:
% python
>>> from small import x, y # Copy two names out
>>> x = 42 # Changes local x only
>>> y[0] = 42 # Changes shared mutable in place
Here, x is not a shared mutable object, but y is. The names y in the importer and the
importee both reference the same list object, so changing it from one place changes it
in the other:
>>> import small # Get module name (from doesn't)
>>> small.x # Small's x is not my x
1
>>> small.y # But we share a changed mutable
[42, 2]
For more background on this, see Chapter 6. And for a graphical picture of what
from assignments do with references, flip back to Figure 18-1 (function argument pass-
ing), and mentally replace “caller” and “function” with “imported” and “importer.”
The effect is the same, except that here we’re dealing with names in modules, not
functions. Assignment works the same everywhere in Python.
Cross-file name changes
Recall from the preceding example that the assignment to x in the interactive session
changed the name x in that scope only, not the x in the file—there is no link from a
name copied with from back to the file it came from. To really change a global name in
another file, you must use import:
% python
>>> from small import x, y # Copy two names out
>>> x = 42 # Changes my x only
>>> import small # Get module name
>>> small.x = 42 # Changes x in other module
This phenomenon was introduced in Chapter 17. Because changing variables in other
modules like this is a common source of confusion (and often a bad design choice),
we’ll revisit this technique again later in this part of the book. Note that the change to
y[0] in the prior session is different; it changes an object, not a name, and the name in
both modules references the same, changed object.
import and from Equivalence
Notice in the prior example that we have to execute an import statement after the
from to access the small module name at all. from only copies names from one module
to another; it does not assign the module name itself. At least conceptually, a from
statement like this one:
from module import name1, name2 # Copy these two names out (only)
is equivalent to this statement sequence:
import module # Fetch the module object
name1 = module.name1 # Copy names out by assignment
name2 = module.name2
del module # Get rid of the module name
Like all assignments, the from statement creates new variables in the importer, which
initially refer to objects of the same names in the imported file. Only the names are
copied out, though, not the objects they reference, and not the name of the module
itself. When we use the from * form of this statement (from module import *), the
equivalence is the same, but all the top-level names in the module are copied over to
the importing scope this way.
Notice that the first step of the from runs a normal import operation, with all the se-
mantics outlined in the preceding chapter. Because of this, the from always imports the
entire module into memory if it has not yet been imported, regardless of how many
names it copies out of the file. There is no way to load just part of a module file (e.g.,
just one function), but because modules are byte code in Python instead of machine
code, the performance implications are generally negligible.
Potential Pitfalls of the from Statement
Because the from statement makes the location of a variable more implicit and obscure
(name is less meaningful to the reader than module.name), some Python users recommend
using import instead of from most of the time. I’m not sure this advice is warranted,
though; from is commonly and widely used, without too many dire consequences. In
practice, in realistic programs, it’s often convenient not to have to type a module’s name
every time you wish to use one of its tools. This is especially true for large modules that
provide many attributes—the standard library’s tkinter GUI module, for example.
It is true that the from statement has the potential to corrupt namespaces, at least in
principle—if you use it to import variables that happen to have the same names as
existing variables in your scope, your variables will be silently overwritten. This prob-
lem doesn’t occur with the simple import statement because you must always go
through a module’s name to get to its contents (module.attr will not clash with a vari-
able named attr in your scope). As long as you understand and expect that this can
happen when using from, though, this isn’t a major concern in practice, especially if
you list the imported names explicitly (e.g., from module import x, y, z).
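To see this overwriting in action, here is the effect in a fresh session, reusing the small.py file coded earlier in this chapter:

>>> x = 99
>>> from small import x, y       # silently replaces any existing x
>>> x                            # the 99 is gone, with no warning
1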
On the other hand, the from statement has more serious issues when used in conjunc-
tion with the reload call, as imported names might reference prior versions of objects.
Moreover, the from module import * form really can corrupt namespaces and make
names difficult to understand, especially when applied to more than one file—in this
case, there is no way to tell which module a name came from, short of searching the
external source files. In effect, the from * form collapses one namespace into another,
and so defeats the namespace partitioning feature of modules. We will explore these
issues in more detail in the section “Module Gotchas” on page 770 (see Chapter 25).
Probably the best real-world advice here is to generally prefer import to from for simple
modules, to explicitly list the variables you want in most from statements, and to limit
the from * form to just one import per file. That way, any undefined names can be
assumed to live in the module referenced with the from *. Some care is required when
using the from statement, but armed with a little knowledge, most programmers find
it to be a convenient way to access modules.
When import is required
The only time you really must use import instead of from is when you must use the same
name defined in two different modules. For example, if two files define the same name
differently:
# M.py
def func():
    ...do something...

# N.py
def func():
    ...do something else...
and you must use both versions of the name in your program, the from statement will
fail—you can have only one assignment to the name in your scope:
# O.py
from M import func
from N import func # This overwrites the one we fetched from M
func() # Calls N.func only!
An import will work here, though, because including the name of the enclosing module
makes the two names unique:
# O.py
import M, N # Get the whole modules, not their names
M.func() # We can call both names now
N.func() # The module names make them unique
This case is unusual enough that you’re unlikely to encounter it very often in practice.
If you do, though, import allows you to avoid the name collision. Another way out of
this dilemma is using the as extension, which we’ll cover in Chapter 25 but is simple
enough to introduce here:
# O.py
from M import func as mfunc # Rename uniquely with "as"
from N import func as nfunc
mfunc(); nfunc() # Calls one or the other
The as extension works in both import and from as a simple renaming tool (it can also
be used to give a shorter synonym for a long module name in import); more on this
form in Chapter 25.
Module Namespaces
Modules are probably best understood as simply packages of names—i.e., places to
define names you want to make visible to the rest of a system. Technically, modules
usually correspond to files, and Python creates a module object to contain all the names
assigned in a module file. But in simple terms, modules are just namespaces (places
where names are created), and the names that live in a module are called its at-
tributes. This section expands on the details behind this model.
Files Generate Namespaces
I’ve mentioned that files morph into namespaces, but how does this actually happen?
The short answer is that every name that is assigned a value at the top level of a module
file (i.e., not nested in a function or class body) becomes an attribute of that module.
For instance, given an assignment statement such as X = 1 at the top level of a module
file M.py, the name X becomes an attribute of M, which we can refer to from outside the
module as M.X. The name X also becomes a global variable to other code inside M.py,
but we need to consider the notion of module loading and scopes a bit more formally
to understand why:
Module statements run on the first import. The first time a module is imported
anywhere in a system, Python creates an empty module object and executes the
statements in the module file one after another, from the top of the file to the
bottom.
Top-level assignments create module attributes. During an import, statements
at the top level of the file not nested in a def or class that assign names (e.g., =,
def) create attributes of the module object; assigned names are stored in the mod-
ule’s namespace.
Module namespaces can be accessed via the attribute __dict__ or dir(M).
Module namespaces created by imports are dictionaries; they may be accessed
through the built-in __dict__ attribute associated with module objects and may be
inspected with the dir function. The dir function is roughly equivalent to the sorted
keys list of an object’s __dict__ attribute, but it includes inherited names for classes,
may not be complete, and is prone to changing from release to release.
Modules are a single scope (local is global). As we saw in Chapter 17, names
at the top level of a module follow the same reference/assignment rules as names
in a function, but the local and global scopes are the same—or, more formally,
they follow the LEGB scope rule we met in Chapter 17, but without the L and E
lookup layers.
Crucially, though, the module’s global scope becomes an attribute dictionary of a
module object after the module has been loaded. Unlike function scopes, where
the local namespace exists only while the function runs, a module file’s scope be-
comes a module object’s attribute namespace and lives on after the import, pro-
viding a source of tools to importers.
Here’s a demonstration of these ideas. Suppose we create the following module file in
a text editor and call it module2.py:
print('starting to load...')
import sys
name = 42
def func(): pass
class klass: pass
print('done loading.')
The first time this module is imported (or run as a program), Python executes its state-
ments from top to bottom. Some statements create names in the module’s namespace
as a side effect, but others do actual work while the import is going on. For instance,
the two print statements in this file execute at import time:
>>> import module2
starting to load...
done loading.
Once the module is loaded, its scope becomes an attribute namespace in the module
object we get back from import. We can then access attributes in this namespace by
qualifying them with the name of the enclosing module:
>>> module2.sys
<module 'sys' (built-in)>
>>> module2.name
42
>>> module2.func
<function func at 0x000000000222E7B8>
>>> module2.klass
<class 'module2.klass'>
Here, sys, name, func, and klass were all assigned while the module’s statements were
being run, so they are attributes after the import. We’ll talk about classes in Part VI,
but notice the sys attribute—import statements really assign module objects to names,
and any type of assignment to a name at the top level of a file generates a module
attribute.
Namespace Dictionaries: __dict__
In fact, internally, module namespaces are stored as dictionary objects. These are just
normal dictionaries with all the usual methods. When needed—for instance, to write
tools that list module content generically as we will in Chapter 25—we can access a
module’s namespace dictionary through the module’s __dict__ attribute. Continuing
the prior section’s example (remember to wrap this in a list call in Python 3.X—it’s a
view object there, and contents may vary outside 3.3 used here):
>>> list(module2.__dict__.keys())
['__loader__', 'func', 'klass', '__builtins__', '__doc__', '__file__', '__name__',
'name', '__package__', 'sys', '__initializing__', '__cached__']
The names we assigned in the module file become dictionary keys internally, so some
of the names here reflect top-level assignments in our file. However, Python also adds
some names in the module’s namespace for us; for instance, __file__ gives the name
of the file the module was loaded from, and __name__ gives its name as known to im-
porters (without the .py extension and directory path). To see just the names your code
assigns, filter out the double-underscore names as we’ve done before, in Chapter 15’s
dir coverage and Chapter 17’s built-in scope coverage:
>>> list(name for name in module2.__dict__.keys() if not name.startswith('__'))
['func', 'klass', 'name', 'sys']
>>> list(name for name in module2.__dict__ if not name.startswith('__'))
['func', 'sys', 'name', 'klass']
This time we’re filtering with a generator instead of a list comprehension, and can omit
the .keys() because dictionaries generate their keys automatically, though implicitly;
the effect is the same. We’ll see similar __dict__ dictionaries on class-related objects in
Part VI too. In both cases, attribute fetch is similar to dictionary indexing, though only
the former kicks off inheritance in classes:
>>> module2.name, module2.__dict__['name']
(42, 42)
Attribute Name Qualification
Speaking of attribute fetch, now that you’re becoming more familiar with modules, we
should firm up the notion of name qualification more formally too. In Python, you can
access the attributes of any object that has attributes using the qualification (a.k.a.
attribute fetch) syntax object.attribute.
Qualification is really an expression that returns the value assigned to an attribute name
associated with an object. For example, the expression module2.sys in the previous
example fetches the value assigned to sys in module2. Similarly, if we have a built-in list
object L, L.append returns the append method object associated with that list.
It’s important to keep in mind that attribute qualification has nothing to do with the
scope rules we studied in Chapter 17; it’s an independent concept. When you use
qualification to access names, you give Python an explicit object from which to fetch
the specified names. The LEGB scope rule applies only to bare, unqualified names—it
may be used for the leftmost name in a name path, but later names after dots search
specific objects instead. Here are the rules:
Simple variables
X means search for the name X in the current scopes (following the LEGB rule of
Chapter 17).
Qualification
X.Y means find X in the current scopes, then search for the attribute Y in the object
X (not in scopes).
Qualification paths
X.Y.Z means look up the name Y in the object X, then look up Z in the object X.Y.
Generality
Qualification works on all objects with attributes: modules, classes, C extension
types, etc.
In Part VI, we’ll see that attribute qualification means a bit more for classes—it’s also
the place where something called inheritance happens—but in general, the rules out-
lined here apply to all names in Python.
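To see these rules outside the module context, here is a small sketch using generic
namespace objects; types.SimpleNamespace simply stands in for any object with attributes:
import types
Z = types.SimpleNamespace(val=3)    # An object with an attribute
Y = types.SimpleNamespace(Z=Z)      # Its container, another attribute namespace
X = types.SimpleNamespace(Y=Y)      # X itself is found by the LEGB scope rule
print(X.Y.Z.val)                    # 3: Y, Z, and val are attribute fetches, not scope lookups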
Imports Versus Scopes
As we’ve learned, it is never possible to access names defined in another module file
without first importing that file. That is, you never automatically get to see names in
another file, regardless of the structure of imports or function calls in your program. A
variable’s meaning is always determined by the locations of assignments in your source
code, and attributes are always requested of an object explicitly.
For example, consider the following two simple modules. The first, moda.py, defines
a variable X global to code in its file only, along with a function that changes the global
X in this file:
X = 88 # My X: global to this file only
def f():
    global X          # Change this file's X
    X = 99            # Cannot see names in other modules
The second module, modb.py, defines its own global variable X and imports and calls
the function in the first module:
X = 11 # My X: global to this file only
import moda # Gain access to names in moda
moda.f() # Sets moda.X, not this file's X
print(X, moda.X)
When run, moda.f changes the X in moda, not the X in modb. The global scope for
moda.f is always the file enclosing it, regardless of which module it is ultimately called
from:
% python modb.py
11 99
In other words, import operations never give upward visibility to code in imported files
—an imported file cannot see names in the importing file. More formally:
Functions can never see names in other functions, unless they are physically en-
closing.
Module code can never see names in other modules, unless they are explicitly im-
ported.
Such behavior is part of the lexical scoping notion—in Python, the scopes surrounding
a piece of code are completely determined by the code’s physical position in your file.
Scopes are never influenced by function calls or module imports.1
Namespace Nesting
In some sense, although imports do not nest namespaces upward, they do nest down-
ward. That is, although an imported module never has direct access to names in a file
that imports it, using attribute qualification paths it is possible to descend into arbi-
trarily nested modules and access their attributes. For example, consider the next three
files. mod3.py defines a single global name and attribute by assignment:
X = 3
mod2.py in turn defines its own X, then imports mod3 and uses qualification to access
the imported module’s attribute:
X = 2
import mod3
print(X, end=' ') # My global X
print(mod3.X) # mod3's X
mod1.py also defines its own X, then imports mod2, and fetches attributes in both the
first and second files:
X = 1
import mod2
print(X, end=' ') # My global X
print(mod2.X, end=' ') # mod2's X
print(mod2.mod3.X) # Nested mod3's X
Really, when mod1 imports mod2 here, it sets up a two-level namespace nesting. By using
the path of names mod2.mod3.X, it can descend into mod3, which is nested in the imported
mod2. The net effect is that mod1 can see the Xs in all three files, and hence has access to
all three global scopes:
% python mod1.py
2 3
1 2 3
The reverse, however, is not true: mod3 cannot see names in mod2, and mod2 cannot see
names in mod1. This example may be easier to grasp if you don’t think in terms of
namespaces and scopes, but instead focus on the objects involved. Within mod1, mod2
is just a name that refers to an object with attributes, some of which may refer to other
objects with attributes (import is an assignment). For paths like mod2.mod3.X, Python
simply evaluates from left to right, fetching attributes from objects along the way.
1. Some languages act differently and provide for dynamic scoping, where scopes really may depend on
runtime calls. This tends to make code trickier, though, because the meaning of a variable can differ over
time. In Python, scopes more simply correspond to the text of your program.
Note that mod1 can say import mod2, and then mod2.mod3.X, but it cannot say import
mod2.mod3—this syntax invokes something called package (directory) imports, de-
scribed in the next chapter. Package imports also create module namespace nesting,
but their import statements are taken to reflect directory trees, not simple file import
chains.
Reloading Modules
As we’ve seen, a module’s code is run only once per process by default. To force a
module’s code to be reloaded and rerun, you need to ask Python to do so explicitly by
calling the reload built-in function. In this section, we’ll explore how to use reloads to
make your systems more dynamic. In a nutshell:
Imports (via both import and from statements) load and run a module’s code only
the first time the module is imported in a process.
Later imports use the already loaded module object without reloading or rerunning
the file’s code.
The reload function forces an already loaded module’s code to be reloaded and
rerun. Assignments in the file’s new code change the existing module object in
place.
Why care about reloading modules? In short, dynamic customization: the reload func-
tion allows parts of a program to be changed without stopping the whole program.
With reload, the effects of changes in components can be observed immediately. Re-
loading doesn’t help in every situation, but where it does, it makes for a much shorter
development cycle. For instance, imagine a database program that must connect to a
server on startup; because program changes or customizations can be tested immedi-
ately after reloads, you need to connect only once while debugging. Long-running
servers can update themselves this way, too.
Because Python is interpreted (more or less), it already gets rid of the compile/link steps
you need to go through to get a C program to run: modules are loaded dynamically
when imported by a running program. Reloading offers a further performance advan-
tage by allowing you to also change parts of running programs without stopping.
Though beyond this book’s scope, note that reload currently only works on modules
written in Python; compiled extension modules coded in a language such as C can be
dynamically loaded at runtime, too, but they can’t be reloaded (though most users
probably prefer to code customizations in Python anyhow!).
Version skew note: In Python 2.X, reload is available as a built-in func-
tion. In Python 3.X, it has been moved to the imp standard library mod-
ule—it’s known as imp.reload in 3.X. This simply means that an extra
import or from statement is required to load this tool in 3.X only. Readers
using 2.X can ignore these imports in this book’s examples, or use them
anyhow—2.X also has a reload in its imp module to ease migration to
3.X. Reloading works the same regardless of its packaging.
reload Basics
Unlike import and from:
reload is a function in Python, not a statement.
reload is passed an existing module object, not a new name.
reload lives in a module in Python 3.X and must be imported itself.
Because reload expects an object, a module must have been previously imported suc-
cessfully before you can reload it (if the import was unsuccessful due to a syntax or
other error, you may need to repeat it before you can reload the module). Furthermore,
the syntax of import statements and reload calls differs: as a function call, reload requires
parentheses, but import statements do not. Abstractly, reloading looks like this:
import module # Initial import
...use module.attributes...
... # Now, go change the module file
...
from imp import reload # Get reload itself (in 3.X)
reload(module) # Get updated exports
...use module.attributes...
The typical usage pattern is that you import a module, then change its source code in
a text editor, and then reload it. This can occur when working interactively, but also
in larger programs that reload periodically.
When you call reload, Python rereads the module file’s source code and reruns its top-
level statements. Perhaps the most important thing to know about reload is that it
changes a module object in place; it does not delete and re-create the module object.
Because of that, every reference to an entire module object anywhere in your program
is automatically affected by a reload. Here are the details:
reload runs a module file’s new code in the module’s current namespace.
Rerunning a module file’s code overwrites its existing namespace, rather than de-
leting and re-creating it.
Top-level assignments in the file replace names with new values. For instance,
rerunning a def statement replaces the prior version of the function in the module’s
namespace by reassigning the function name.
Reloads impact all clients that use import to fetch modules. Because clients
that use import qualify to fetch attributes, they’ll find new values in the module
object after a reload.
Reloads impact future from clients only. Clients that used from to fetch attributes
in the past won’t be affected by a reload; they’ll still have references to the old
objects fetched before the reload (see the sketch following this list).
Reloads apply to a single module only. You must run them on each module you
wish to update, unless you use code or tools that apply reloads transitively.
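Here is a quick sketch of the first two points in action, assuming a hypothetical module
file mod.py that defines a name attr and is edited between steps:
import mod                 # This client keeps a reference to the module object
from mod import attr       # This client copies out the object attr references now
...change mod.py in a text editor...
from imp import reload     # reload itself must be imported in 3.X
reload(mod)                # Update the module object in place
mod.attr                   # New version: fetched through the module object
attr                       # Old version: still the object fetched before the reload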
reload Example
To demonstrate, here’s a more concrete example of reload in action. In the following,
we’ll change and reload a module file without stopping the interactive Python session.
Reloads are used in many other scenarios, too (see the sidebar “Why You Will Care:
Module Reloads” on page 703), but we’ll keep things simple for illustration here.
First, in the text editor of your choice, write a module file named changer.py with the
following contents:
message = "First version"
def printer():
    print(message)
This module creates and exports two names—one bound to a string, and another to a
function. Now, start the Python interpreter, import the module, and call the function
it exports. The function will print the value of the global message variable:
% python
>>> import changer
>>> changer.printer()
First version
Keeping the interpreter active, now edit the module file in another window:
...modify changer.py without stopping Python...
% notepad changer.py
Change the global message variable, as well as the printer function body:
message = "After editing"
def printer():
    print('reloaded:', message)
Then, return to the Python window and reload the module to fetch the new code. Notice
in the following interaction that importing the module again has no effect; we get the
original message, even though the file’s been changed. We have to call reload in order
to get the new version:
...back to the Python interpreter...
>>> import changer
>>> changer.printer() # No effect: uses loaded module
First version
>>> from imp import reload
>>> reload(changer) # Forces new code to load/run
<module 'changer' from '.\\changer.py'>
>>> changer.printer() # Runs the new version now
reloaded: After editing
Notice that reload actually returns the module object for us—its result is usually ig-
nored, but because expression results are printed at the interactive prompt, Python
shows a default <module 'name'...> representation.
Two final notes here: first, if you use reload, you’ll probably want to pair it with
import instead of from, as the latter isn’t updated by reload operations—leaving your
names in a state that’s strange enough to warrant postponing further elaboration until
this part’s “gotchas” at the end of Chapter 25. Second, reload by itself updates only a
single module, but it’s straightforward to code a function that applies it transitively to
related modules—an extension we’ll save for a case study near the end of Chapter 25.
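As a preview, here is one minimal sketch of such a transitive reloader; this is an
illustration under simplifying assumptions, not the more complete version developed in
Chapter 25:
import types
from imp import reload                     # reload lives in imp in 3.X

def reload_all(module, visited=None):
    # Reload module, then any module objects found in its namespace
    if visited is None:
        visited = set()                    # Guard against import cycles
    if module in visited:
        return
    visited.add(module)
    if getattr(module, '__file__', '').endswith('.py'):
        reload(module)                     # Skip C extensions: they can't be reloaded
    for value in module.__dict__.values():
        if isinstance(value, types.ModuleType):
            reload_all(value, visited)     # Recur into imported modules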
Why You Will Care: Module Reloads
Besides allowing you to reload (and hence rerun) modules at the interactive prompt,
module reloads are also useful in larger systems, especially when the cost of restarting
the entire application is prohibitive. For instance, game servers and systems that must
connect to servers over a network on startup are prime candidates for dynamic reloads.
They’re also useful in GUI work (a widget’s callback action can be changed while the
GUI remains active), and when Python is used as an embedded language in a C or C+
+ program (the enclosing program can request a reload of the Python code it runs,
without having to stop). See Programming Python for more on reloading GUI callbacks
and embedded Python code.
More generally, reloads allow programs to provide highly dynamic interfaces. For in-
stance, Python is often used as a customization language for larger systems—users can
customize products by coding bits of Python code onsite, without having to recompile
the entire product (or even having its source code at all). In such worlds, the Python
code already adds a dynamic flavor by itself.
To be even more dynamic, though, such systems can automatically reload the Python
customization code periodically at runtime. That way, users’ changes are picked up
while the system is running; there is no need to stop and restart each time the Python
code is modified. Not all systems require such a dynamic approach, but for those that
do, module reloads provide an easy-to-use dynamic customization tool.
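As a simple sketch of this periodic scheme, a host system might run a loop like the
following; usercfg and its apply function are hypothetical names for onsite
customization code:
import time, usercfg
from imp import reload

while True:
    time.sleep(60)           # Once a minute, pick up onsite edits
    reload(usercfg)          # Rerun usercfg.py's new code in place
    usercfg.apply()          # Hypothetical hook called after each reload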
Chapter Summary
This chapter delved into the essentials of module coding tools—the import and from
statements, and the reload call. We learned how the from statement simply adds an
extra step that copies names out of a file after it has been imported, and how reload
forces a file to be imported again without stopping and restarting Python. We also
surveyed namespace concepts, saw what happens when imports are nested, explored
the way files become module namespaces, and learned about some potential pitfalls of
the from statement.
Although we’ve already seen enough to handle module files in our programs, the next
chapter extends our coverage of the import model by presenting package imports—a
way for our import statements to specify part of the directory path leading to the desired
module. As we’ll see, package imports give us a hierarchy that is useful in larger systems
and allow us to break conflicts between same-named modules. Before we move on,
though, here’s a quick quiz on the concepts presented here.
Test Your Knowledge: Quiz
1. How do you make a module?
2. How is the from statement related to the import statement?
3. How is the reload function related to imports?
4. When must you use import instead of from?
5. Name three potential pitfalls of the from statement.
6. What...is the airspeed velocity of an unladen swallow?
Test Your Knowledge: Answers
1. To create a module, you simply write a text file containing Python statements; every
source code file is automatically a module, and there is no syntax for declaring one.
Import operations load module files into module objects in memory. You can also
make a module by writing code in an external language like C or Java, but such
extension modules are beyond the scope of this book.
2. The from statement imports an entire module, like the import statement, but as an
extra step it also copies one or more variables from the imported module into the
scope where the from appears. This enables you to use the imported names directly
(name) instead of having to go through the module (module.name).
3. By default, a module is imported only once per process. The reload function forces
a module to be imported again. It is mostly used to pick up new versions of a
module’s source code during development, and in dynamic customization scenar-
ios.
4. You must use import instead of from only when you need to access the same name
in two different modules; because you’ll have to specify the names of the enclosing
modules, the two names will be unique. The as extension can render from usable
in this context as well.
5. The from statement can obscure the meaning of a variable (which module it is
defined in), can have problems with the reload call (names may reference prior
versions of objects), and can corrupt namespaces (it might silently overwrite names
you are using in your scope). The from * form is worse in most regards—it can
seriously corrupt namespaces and obscure the meaning of variables, so it is prob-
ably best used sparingly.
6. What do you mean? An African or European swallow?
CHAPTER 24
Module Packages
So far, when we’ve imported modules, we’ve been loading files. This represents typical
module usage, and it’s probably the technique you’ll use for most imports you’ll code
early on in your Python career. However, the module import story is a bit richer than
I have thus far implied.
In addition to a module name, an import can name a directory path. A directory of
Python code is said to be a package, so such imports are known as package imports. In
effect, a package import turns a directory on your computer into another Python name-
space, with attributes corresponding to the subdirectories and module files that the
directory contains.
This is a somewhat advanced feature, but the hierarchy it provides turns out to be handy
for organizing the files in a large system and tends to simplify module search path
settings. As we’ll see, package imports are also sometimes required to resolve import
ambiguities when multiple program files of the same name are installed on a single
machine.
Because it is relevant to code in packages only, we’ll also introduce Python’s recent
relative imports model and syntax here. As we’ll see, this model modifies search paths
in 3.X, and extends the from statement for imports within packages in both 2.X and
3.X. This model can make such intrapackage imports more explicit and succinct, but
comes with some tradeoffs that can impact your programs.
Finally, for readers using Python 3.3 and later, its new namespace package model—
which allows packages to span multiple directories and requires no initialization file—
is also introduced here. This new-style package model is optional and can be used in
concert with the original (now known as “regular”) package model, but it upends some
of the original model’s basic ideas and rules. Because of that, we’ll explore regular
packages here first for all readers, and present namespace packages last as an optional
topic.
Package Import Basics
At a base level, package imports are straightforward—in the place where you have been
naming a simple file in your import statements, you can instead list a path of names
separated by periods:
import dir1.dir2.mod
The same goes for from statements:
from dir1.dir2.mod import x
The “dotted” path in these statements is assumed to correspond to a path through the
directory hierarchy on your computer, leading to the file mod.py (or similar; the ex-
tension may vary). That is, the preceding statements indicate that on your machine
there is a directory dir1, which has a subdirectory dir2, which contains a module file
mod.py (or similar).
Furthermore, these imports imply that dir1 resides within some container directory
dir0, which is a component of the normal Python module search path. In other words,
these two import statements imply a directory structure that looks something like this
(shown with Windows backslash separators):
dir0\dir1\dir2\mod.py # Or mod.pyc, mod.so, etc.
The container directory dir0 needs to be added to your module search path unless it’s
the home directory of the top-level file, exactly as if dir1 were a simple module file.
More formally, the leftmost component in a package import path is still relative to a
directory included in the sys.path module search path list we explored in Chap-
ter 22. From there down, though, the import statements in your script explicitly give
the directory paths leading to modules in packages.
Packages and Search Path Settings
If you use this feature, keep in mind that the directory paths in your import statements
can be only variables separated by periods. You cannot use any platform-specific path
syntax in your import statements, such as C:\dir1, My Documents.dir2, or ../dir1;
these do not work syntactically. Instead, use any such platform-specific syntax in your
module search path settings to name the container directories.
For instance, in the prior example, dir0—the directory name you add to your module
search path—can be an arbitrarily long and platform-specific directory path leading up
to dir1. You cannot use an invalid statement like this:
import C:\mycode\dir1\dir2\mod # Error: illegal syntax
But you can add C:\mycode to your PYTHONPATH variable or a .pth file, and say this in
your script:
import dir1.dir2.mod
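A script can also arrange this for itself at runtime by extending sys.path before the
import runs; a sketch, assuming the same C:\mycode layout:
import sys
sys.path.append(r'C:\mycode')    # Add the container (dir0 in this example)
import dir1.dir2.mod             # Now found through C:\mycode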
In effect, entries on the module search path provide platform-specific directory path
prefixes, which lead to the leftmost names in import and from statements. These import
statements themselves provide the remainder of the directory path in a platform-neutral
fashion.1
As for simple file imports, you don’t need to add the container directory dir0 to your
module search path if it’s already there—per Chapter 22, it will be if it’s the home
directory of the top-level file, the directory you’re working in interactively, a standard
library directory, or the site-packages third-party install root. One way or another,
though, your module search path must include all the directories containing leftmost
components in your code’s package import statements.
Package __init__.py Files
If you choose to use package imports, there is one more constraint you must follow: at
least until Python 3.3, each directory named within the path of a package import state-
ment must contain a file named __init__.py, or your package imports will fail. That is,
in the example we’ve been using, both dir1 and dir2 must contain a file called
__init__.py; the container directory dir0 does not require such a file because it’s not
listed in the import statement itself.
More formally, for a directory structure such as this:
dir0\dir1\dir2\mod.py
and an import statement of the form:
import dir1.dir2.mod
the following rules apply:
dir1 and dir2 both must contain an __init__.py file.
dir0, the container, does not require an __init__.py file; this file will simply be
ignored if present.
dir0, not dir0\dir1, must be listed on the module search path sys.path.
To satisfy the first two of these rules, package creators must create files of the sort we’ll
explore here. To satisfy the latter of these, dir0 must be an automatic path component
(the home, libraries, or site-packages directories), or be given in PYTHONPATH or .pth file
settings or manual sys.path changes.
1. The dot path syntax was chosen partly for platform neutrality, but also because paths in import statements
become real nested object paths. This syntax also means that you may get odd error messages if you
include the .py extension in your import statements. For example, import mod.py is assumed to be a directory path
import—it loads mod.py, then tries to load a mod\py.py, and ultimately issues a potentially confusing
“No module named py” error message. As of Python 3.3 this error message has been improved to say
“No module named 'm.py'; m is not a package.”
The net effect is that this example’s directory structure should be as follows, with in-
dentation designating directory nesting:
dir0\ # Container on module search path
dir1\
__init__.py
dir2\
__init__.py
mod.py
The __init__.py files can contain Python code, just like normal module files. Their
names are special because their code is run automatically the first time a Python pro-
gram imports a directory; these files thus serve primarily as hooks for performing
initialization steps required by the package. These files can also be completely empty, though,
and sometimes have additional roles—as the next section explains.
As we’ll see near the end of this chapter, the requirement of packages
to have a file named __init__.py has been lifted as of Python 3.3. In that
release and later, directories of modules with no such file may be im-
ported as single-directory namespace packages, which work the same
but run no initialization-time code file. Prior to Python 3.3, though, and
in all of Python 2.X, packages still require __init__.py files. As described
ahead, in 3.3 and later these files also provide a performance advantage
when used.
Package initialization file roles
In more detail, the __init__.py file serves as a hook for package initialization-time ac-
tions, declares a directory as a Python package, generates a module namespace for a
directory, and implements the behavior of from * (i.e., from .. import *) statements
when used with directory imports:
Package initialization
The first time a Python program imports through a directory, it automatically runs
all the code in the directory’s __init__.py file. Because of that, these files are a
natural place to put code to initialize the state required by files in a package. For
instance, a package might use its initialization file to create required data files, open
connections to databases, and so on. Typically, __init__.py files are not meant to
be useful if executed directly; they are run automatically when a package is first
accessed.
Module usability declarations
Package __init__.py files are also partly present to declare that a directory is a
Python package. In this role, these files serve to prevent directories with common
names from unintentionally hiding true modules that appear later on the module
search path. Without this safeguard, Python might pick a directory that has nothing
to do with your code, just because it appears nested in an earlier directory on the
search path. As we’ll see later, Python 3.3’s namespace packages obviate much of
this role, but achieve a similar effect algorithmically by scanning ahead on the path
to find later files.
Module namespace initialization
In the package import model, the directory paths in your script become real nested
object paths after an import. For instance, in the preceding example, after the im-
port the expression dir1.dir2 works and returns a module object whose namespace
contains all the names assigned by dir2’s __init__.py initialization file. Such files
provide a namespace for module objects created for directories, which would
otherwise have no real associated module file.
from * statement behavior
As an advanced feature, you can use __all__ lists in __init__.py files to define what
is exported when a directory is imported with the from * statement form. In an
__init__.py file, the __all__ list is taken to be the list of submodule names that
should be automatically imported when from * is used on the package (directory)
name. If __all__ is not set, the from * statement does not automatically load sub-
modules nested in the directory; instead, it loads just names defined by assignments
in the directory’s __init__.py file, including any submodules explicitly imported by
code in this file. For instance, the statement from submodule import X in a direc-
tory’s __init__.py makes the name X available in that directory’s namespace. (We’ll
see additional roles for __all__ in Chapter 25: it serves to declare from * exports
of simple files as well.)
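For instance, a hypothetical package initialization file might combine several of these
roles at once; the package and submodule names here are made up for illustration:
# mypkg\__init__.py
print('initializing mypkg')          # Initialization: runs on first import through mypkg
x = 1                                # Namespace: becomes mypkg.x after import
__all__ = ['mod', 'other']           # from mypkg import * loads these submodules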
You can also simply leave these files empty, if their roles are beyond your needs (and
frankly, they are often empty in practice). They must exist, though, for your directory
imports to work at all.
Don’t confuse package __init__.py files with the class __init__ con-
structor methods we’ll meet in the next part of the book. The former
are files of code run when imports first step through a package directory
in a program run, while the latter are called when an instance is created.
Both have initialization roles, but they are otherwise very different.
Package Import Example
Let’s actually code the example we’ve been talking about to show how initialization
files and paths come into play. The following three files are coded in a directory dir1
and its subdirectory dir2—comments give the pathnames of these files:
# dir1\__init__.py
print('dir1 init')
x = 1
# dir1\dir2\__init__.py
print('dir2 init')
y = 2
# dir1\dir2\mod.py
print('in mod.py')
z = 3
Here, dir1 will be either an immediate subdirectory of the one we’re working in (i.e.,
the home directory), or an immediate subdirectory of a directory that is listed on the
module search path (technically, on sys.path). Either way, dir1’s container does not
need an __init__.py file.
import statements run each directory’s initialization file the first time that directory is
traversed, as Python descends the path; print statements are included here to trace
their execution:
C:\code> python # Run in dir1's container directory
>>> import dir1.dir2.mod # First imports run init files
dir1 init
dir2 init
in mod.py
>>>
>>> import dir1.dir2.mod # Later imports do not
Just like module files, an already imported directory may be passed to reload to force
reexecution of that single item. As shown here, reload accepts a dotted pathname to
reload nested directories and files:
>>> from imp import reload # from needed in 3.X only
>>> reload(dir1)
dir1 init
<module 'dir1' from '.\\dir1\\__init__.py'>
>>>
>>> reload(dir1.dir2)
dir2 init
<module 'dir1.dir2' from '.\\dir1\\dir2\\__init__.py'>
Once imported, the path in your import statement becomes a nested object path in your
script. Here, mod is an object nested in the object dir2, which in turn is nested in the
object dir1:
>>> dir1
<module 'dir1' from '.\\dir1\\__init__.py'>
>>> dir1.dir2
<module 'dir1.dir2' from '.\\dir1\\dir2\\__init__.py'>
>>> dir1.dir2.mod
<module 'dir1.dir2.mod' from '.\\dir1\\dir2\\mod.py'>
In fact, each directory name in the path becomes a variable assigned to a module object
whose namespace is initialized by all the assignments in that directory’s __init__.py
file. dir1.x refers to the variable x assigned in dir1\__init__.py, much as mod.z refers to
the variable z assigned in mod.py:
>>> dir1.x
1
>>> dir1.dir2.y
2
>>> dir1.dir2.mod.z
3
from Versus import with Packages
import statements can be somewhat inconvenient to use with packages, because you
may have to retype the paths frequently in your program. In the prior section’s example,
for instance, you must retype and rerun the full path from dir1 each time you want to
reach z. If you try to access dir2 or mod directly, you’ll get an error:
>>> dir2.mod
NameError: name 'dir2' is not defined
>>> mod.z
NameError: name 'mod' is not defined
It’s often more convenient, therefore, to use the from statement with packages to avoid
retyping the paths at each access. Perhaps more importantly, if you ever restructure
your directory tree, the from statement requires just one path update in your code,
whereas imports may require many. The import as extension, discussed formally in the
next chapter, can also help here by providing a shorter synonym for the full path, and
a renaming tool when the same name appears in multiple modules:
C:\code> python
>>> from dir1.dir2 import mod # Code path here only
dir1 init
dir2 init
in mod.py
>>> mod.z # Don't repeat path
3
>>> from dir1.dir2.mod import z
>>> z
3
>>> import dir1.dir2.mod as mod # Use shorter name (see Chapter 25)
>>> mod.z
3
>>> from dir1.dir2.mod import z as modz # Ditto if names clash (see Chapter 25)
>>> modz
3
Why Use Package Imports?
If you’re new to Python, make sure that you’ve mastered simple modules before step-
ping up to packages, as they are a somewhat more advanced feature. They do serve
useful roles, though, especially in larger programs: they make imports more informa-
tive, serve as an organizational tool, simplify your module search path, and can resolve
ambiguities.
First of all, because package imports give some directory information in program files,
they both make it easier to locate your files and serve as an organizational tool. Without
package paths, you must often resort to consulting the module search path to find files.
Moreover, if you organize your files into subdirectories for functional areas, package
imports make it more obvious what role a module plays, and so make your code more
readable. For example, a normal import of a file in a directory somewhere on the module
search path, like this:
import utilities
offers much less information than an import that includes the path:
import database.client.utilities
Package imports can also greatly simplify your PYTHONPATH and .pth file search path
settings. In fact, if you use explicit package imports for all your cross-directory imports,
and you make those package imports relative to a common root directory where all
your Python code is stored, you really only need a single entry on your search path: the
common root. Finally, package imports serve to resolve ambiguities by making explicit
exactly which files you want to import—and resolve conflicts when the same module
name appears in more than one place. The next section explores this role in more detail.
A Tale of Three Systems
The only time package imports are actually required is to resolve ambiguities that may
arise when multiple programs with same-named files are installed on a single machine.
This is something of an install issue, but it can also become a concern in general practice
—especially given the tendency of developers to use simple and similar names for
module files. Let’s turn to a hypothetical scenario to illustrate.
Suppose that a programmer develops a Python program that contains a file called
utilities.py for common utility code, and a top-level file named main.py that users launch
to start the program. All over this program, its files say import utilities to load and
use the common code. When the program is shipped, it arrives as a single .tar or .zip
file containing all the program’s files, and when it is installed, it unpacks all its files into
a single directory named system1 on the target machine:
system1\
utilities.py # Common utility functions, classes
main.py # Launch this to start the program
other.py # Import utilities to load my tools
Now, suppose that a second programmer develops a different program with files also
called utilities.py and main.py, and again uses import utilities throughout the pro-
gram to load the common code file. When this second system is fetched and installed
on the same computer as the first system, its files will unpack into a new directory called
system2 somewhere on the receiving machine—ensuring that they do not overwrite
same-named files from the first system:
system2\
utilities.py # Common utilities
main.py # Launch this to run
other.py # Imports utilities
So far, there’s no problem: both systems can coexist and run on the same computer.
In fact, you won’t even need to configure the module search path to use these programs
on your computer—because Python always searches the home directory first (that is,
the directory containing the top-level file), imports in either system’s files will auto-
matically see all the files in that system’s directory. For instance, if you click on sys-
tem1\main.py, all imports will search system1 first. Similarly, if you launch sys-
tem2\main.py, system2 will be searched first instead. Remember, module search path
settings are only needed to import across directory boundaries.
However, suppose that after you’ve installed these two programs on your machine, you
decide that you’d like to use some of the code in each of the utilities.py files in a system
of your own. It’s common utility code, after all, and Python code by nature “wants” to
be reused. In this case, you’d like to be able to say the following from code that you’re
writing in a third directory to load one of the two files:
import utilities
utilities.func('spam')
Now the problem starts to materialize. To make this work at all, you’ll have to set the
module search path to include the directories containing the utilities.py files. But which
directory do you put first in the path—system1 or system2?
The problem is the linear nature of the search path. It is always scanned from left to
right, so no matter how long you ponder this dilemma, you will always get just one
utilities.py—from the directory listed first (leftmost) on the search path. As is, you’ll
never be able to import it from the other directory at all.
You could try changing sys.path within your script before each import operation, but
that’s both extra work and highly error prone. And changing PYTHONPATH before each
Python program run is too tedious, and won’t allow you to use both versions in a single
file in any event. By default, you're stuck.
This is the issue that packages actually fix. Rather than installing programs in inde-
pendent directories listed on the module search path individually, you can package and
install them as subdirectories under a common root. For instance, you might organize
all the code in this example as an install hierarchy that looks like this:
root\
system1\
__init__.py
utilities.py
main.py
other.py
system2\
__init__.py
utilities.py
main.py
other.py
system3\ # Here or elsewhere
__init__.py # Need __init__.py here only if imported elsewhere
myfile.py # Your new code here
Now, add just the common root directory to your search path. If your code’s imports
are all relative to this common root, you can import either system’s utility file with a
package import—the enclosing directory name makes the path (and hence, the module
reference) unique. In fact, you can import both utility files in the same module, as long
as you use an import statement and repeat the full path each time you reference the
utility modules:
import system1.utilities
import system2.utilities
system1.utilities.function('spam')
system2.utilities.function('eggs')
The names of the enclosing directories here make the module references unique.
Note that you have to use import instead of from with packages only if you need to
access the same attribute name in two or more paths. If the name of the called function
here were different in each path, you could use from statements to avoid repeating the
full package path whenever you call one of the functions, as described earlier; the as
extension in from can also be used to provide unique synonyms.
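For example, either of the following forms lets you call both utility functions without
retyping full paths, while keeping the two names distinct:
from system1.utilities import function as func1    # Unique synonyms via as
from system2.utilities import function as func2
func1('spam')
func2('eggs')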
Also, notice in the install hierarchy shown earlier that __init__.py files were added to
the system1 and system2 directories to make this work, but not to the root directory.
Only directories listed within import statements in your code require these files; as
we’ve seen, they are run automatically the first time the Python process imports through
a package directory.
Technically, in this case the system3 directory doesn’t have to be under root—just the
packages of code from which you will import. However, because you never know when
your own modules might be useful in other programs, you might as well place them
under the common root directory as well to avoid similar name-collision problems in
the future.
Finally, notice that both of the two original systems’ imports will keep working un-
changed. Because their home directories are searched first, the addition of the common
root on the search path is irrelevant to code in system1 and system2; they can keep
saying just import utilities and expect to find their own files when run as programs
—though not when used as packages in 3.X, as the next section explains. If you’re
careful to unpack all your Python systems under a common root like this, path con-
figuration also becomes simple: you’ll only need to add the common root directory
once.
Why You Will Care: Module Packages
Because packages are a standard part of Python, it’s common to see larger third-party
extensions shipped as sets of package directories, rather than flat lists of modules. The
win32all Windows extensions package for Python, for instance, was one of the first to
jump on the package bandwagon. Many of its utility modules reside in packages im-
ported with paths. For instance, to load client-side COM tools, you use a statement
like this:
from win32com.client import constants, Dispatch
This line fetches names from the client module of the win32com package—an install
subdirectory.
Package imports are also pervasive in code run under the Jython Java-based imple-
mentation of Python, because Java libraries are organized into hierarchies as well. In
recent Python releases, the email and XML tools are likewise organized into package
subdirectories in the standard library, and Python 3.X groups even more related mod-
ules into packages—including tkinter GUI tools, HTTP networking tools, and more.
The following imports access various standard library tools in 3.X (2.X usage may vary):
from email.message import Message
from tkinter.filedialog import askopenfilename
from http.server import CGIHTTPRequestHandler
Whether you create package directories or not, you will probably import from them
eventually.
Package Relative Imports
The coverage of package imports so far has focused mostly on importing package files
from outside the package. Within the package itself, imports of same-package files can
use the same full path syntax as imports from outside the package—and as we’ll see,
sometimes should. However, package files can also make use of special intrapackage
search rules to simplify import statements. That is, rather than listing package import
paths, imports within the package can be relative to the package.
The way this works is version-dependent: Python 2.X implicitly searches package di-
rectories first on imports, while 3.X requires explicit relative import syntax in order to
import from the package directory. This 3.X change can enhance code readability by
making same-package imports more obvious, but it’s also incompatible with 2.X and
may break some programs.
If you’re starting out in Python with version 3.X, your focus in this section will likely
be on its new import syntax and model. If you’ve used other Python packages in the
past, though, you’ll probably also be interested in how the 3.X model differs. Let’s
begin our tour with the latter perspective on this topic.
As we’ll learn in this section, use of package relative imports can actually
limit your files’ roles. In short, they can no longer be used as executable
program files in both 2.X and 3.X. Because of this, normal package im-
port paths may be a better option in many cases. Still, this feature has
found its way into many a Python file, and merits a review by most
Python programmers to better understand both its tradeoffs and moti-
vation.
Changes in Python 3.X
The way import operations in packages work has changed slightly in Python 3.X. This
change applies only to imports within files when files are used as part of a package
directory; imports in other usage modes work as before. For imports in packages,
though, Python 3.X introduces two changes:
It modifies the module import search path semantics to skip the package’s own
directory by default. Imports check only paths on the sys.path search path. These
are known as absolute imports.
It extends the syntax of from statements to allow them to explicitly request that
imports search the package’s directory only, with leading dots. This is known as
relative import syntax.
These changes are fully present in Python 3.X. The new from statement relative syntax
is also available in Python 2.X, but the default absolute search path change must be
enabled as an option there. Enabling this can break 2.X programs, but is available for
3.X forward compatibility.
The impact of this change is that in 3.X (and optionally in 2.X), you must generally use
special from dotted syntax to import modules located in the same package as the im-
porter, unless your imports list a complete path relative to a package root on
sys.path, or your imports are relative to the always-searched home directory of the
program’s top-level file (which is usually the current working directory).
By default, though, your package directory is not automatically searched, and intra-
package imports made by files in a directory used as a package will fail without the
special from syntax. As we’ll see, in 3.X this can affect the way you will structure imports
or directories for modules meant for use in both top-level programs and importable
packages. First, though, let’s take a more detailed look at how this all works.
Relative Import Basics
In both Python 3.X and 2.X, from statements can now use leading dots (“.”) to specify
that they require modules located within the same package (known as package relative
imports), instead of modules located elsewhere on the module import search path
(called absolute imports). That is:
Imports with dots: In both Python 3.X and 2.X, you can use leading dots in from
statements’ module names to indicate that imports should be relative-only to the
containing package—such imports will search for modules inside the package di-
rectory only and will not look for same-named modules located elsewhere on the
import search path (sys.path). The net effect is that package modules override
outside modules.
Imports without dots: In Python 2.X, normal imports in a package’s code without
leading dots currently default to a relative-then-absolute search path order—that
is, they search the package’s own directory first. However, in Python 3.X, normal
imports within a package are absolute-only by default—in the absence of any spe-
cial dot syntax, imports skip the containing package itself and look elsewhere on
the sys.path search path.
For example, in both Python 3.X and 2.X a statement of the form:
from . import spam # Relative to this package
instructs Python to import a module named spam located in the same package directory
as the file in which this statement appears. Similarly, this statement:
from .spam import name
means “from a module named spam located in the same package as the file that contains
this statement, import the variable name.”
The behavior of a statement without the leading dot depends on which version of
Python you use. In 2.X, such an import will still default to the original relative-then-
absolute search path order (i.e., searching the package’s directory first), unless a state-
ment of the following form is included at the top of the importing file (as its first exe-
cutable statement):
from __future__ import absolute_import # Use 3.X relative import model in 2.X
If present, this statement enables the Python 3.X absolute-only search path change. In
3.X, and in 2.X when enabled, an import without a leading dot in the module name
always causes Python to skip the relative components of the module import search path
and look instead in the absolute directories that sys.path contains. For instance, in
3.X’s model, a statement of the following form will always find a string module some-
where on sys.path, instead of a module of the same name in the package:
import string # Skip this package's version
By contrast, without the from __future__ statement in 2.X, if there’s a local string
module in the package, it will be imported instead. To get the same behavior in 3.X,
and in 2.X when the absolute import change is enabled, run a statement of the following
form to force a relative import:
from . import string # Searches this package only
This statement works in both Python 2.X and 3.X today. The only difference in the 3.X
model is that it is required in order to load a module that is located in the same package
directory as the file in which this appears, when the file is being used as part of a package
(and unless full package paths are spelled out).
Notice that leading dots can be used to force relative imports only with the from state-
ment, not with the import statement. In Python 3.X, the import modname statement is
always absolute-only, skipping the containing package’s directory. In 2.X, this state-
ment form still performs relative imports, searching the package’s directory first. from
statements without leading dots behave the same as import statements—absolute-only
in 3.X (skipping the package directory), and relative-then-absolute in 2.X (searching
the package directory first).
Other dot-based relative reference patterns are possible, too. Within a module file lo-
cated in a package directory named mypkg, the following alternative import forms work
as described:
from .string import name1, name2 # Imports names from mypkg.string
from . import string # Imports mypkg.string
from .. import string # Imports string sibling of mypkg
To understand these latter forms better, and to justify all this added complexity, we
need to take a short detour to explore the rationale behind this change.
Why Relative Imports?
Besides making intrapackage imports more explicit, this feature is designed in part to
allow scripts to resolve ambiguities that can arise when a same-named file appears in
multiple places on the module search path. Consider the following package directory:
mypkg\
__init__.py
main.py
string.py
This defines a package named mypkg containing modules named mypkg.main and
mypkg.string. Now, suppose that the main module tries to import a module named
string. In Python 2.X and earlier, Python will first look in the mypkg directory to per-
form a relative import. It will find and import the string.py file located there, assigning
it to the name string in the mypkg.main module’s namespace.
It could be, though, that the intent of this import was to load the Python standard
library’s string module instead. Unfortunately, in these versions of Python, there’s no
straightforward way to ignore mypkg.string and look for the standard library’s string
module located on the module search path. Moreover, we cannot resolve this with full
package import paths, because we cannot depend on any extra package directory
structure above the standard library being present on every machine.
In other words, simple imports in packages can be both ambiguous and error-prone.
Within a package, it’s not clear whether an import spam statement refers to a module
within or outside the package. As one consequence, a local module or package can hide
another hanging directly off of sys.path, whether intentionally or not.
In practice, Python users can avoid reusing the names of standard library modules they
need for modules of their own (if you need the standard string, don’t name a new
module string!). But this doesn’t help if a package accidentally hides a standard mod-
ule; moreover, Python might add a new standard library module in the future that has
the same name as a module of your own. Code that relies on relative imports is also
less easy to understand, because the reader may be confused about which module is
intended to be used. It’s better if the resolution can be made explicit in code.
The relative imports solution in 3.X
To address this dilemma, imports run within packages have changed in Python 3.X to
be absolute-only (and can be made so as an option in 2.X). Under this model, an
import statement of the following form in our example file mypkg/main.py will always
find a string module outside the package, via an absolute import search of sys.path:
import string # Imports string outside package (absolute)
A from import without leading-dot syntax is considered absolute as well:
from string import name # Imports name from string outside package
If you really want to import a module from your package without giving its full path
from the package root, though, relative imports are still possible if you use the dot
syntax in the from statement:
from . import string # Imports mypkg.string here (relative)
This form imports the string module relative to the current package only and is the
relative equivalent to the prior import example’s absolute form (both load a module as
a whole). When this special relative syntax is used, the package’s directory is the only
directory searched.
We can also copy specific names from a module with relative syntax:
from .string import name1, name2 # Imports names from mypkg.string
This statement again refers to the string module relative to the current package. If this
code appears in our mypkg.main module, for example, it will import name1 and name2
from mypkg.string.
In effect, the “.” in a relative import is taken to stand for the package directory con-
taining the file in which the import appears. An additional leading dot performs the
relative import starting from the parent of the current package. For example, this state-
ment:
from .. import spam # Imports a sibling of mypkg
will load a sibling of mypkg—i.e., the spam module located in the package’s own con-
tainer directory, next to mypkg. More generally, code located in some module A.B.C can
use any of these forms:
from . import D # Imports A.B.D (. means A.B)
from .. import E # Imports A.E (.. means A)
from .D import X # Imports A.B.D.X (. means A.B)
from ..E import X # Imports A.E.X (.. means A)
Relative imports versus absolute package paths
Alternatively, a file can sometimes name its own package explicitly in an absolute im-
port statement, relative to a directory on sys.path. For example, in the following,
mypkg will be found in an absolute directory on sys.path:
from mypkg import string # Imports mypkg.string (absolute)
However, this relies on both the configuration and the order of the module search path
settings, while relative import dot syntax does not. In fact, this form requires that the
directory immediately containing mypkg be included in the module search path. It prob-
ably is if mypkg is the package root (or else the package couldn’t be used from the outside
in the first place!), but this directory may be nested in a much larger package tree. If
mypkg isn’t the package’s root, absolute import statements must list all the directories
below the package’s root entry in sys.path when naming packages explicitly like this:
from system.section.mypkg import string # system container on sys.path only
In large or deep packages, that could be substantially more work to code than a dot:
from . import string # Relative import syntax
With this latter form, the containing package is searched automatically, regardless of
the search path settings, search path order, and directory nesting. On the other hand,
the full-path absolute form will work regardless of how the file is being used—as part
of a program or package—as we’ll explore ahead.
The Scope of Relative Imports
Relative imports can seem a bit perplexing on first encounter, but it helps if you re-
member a few key points about them:
Relative imports apply to imports within packages only. Keep in mind that
this feature’s module search path change applies only to import statements within
module files used as part of a package—that is, intrapackage imports. Normal
imports in files not used as part of a package still work exactly as described earlier,
automatically searching the directory containing the top-level script first.
Relative imports apply to the from statement only. Also remember that this
feature’s new syntax applies only to from statements, not import statements. It’s
detected by the fact that the module name in a from begins with one or more dots
(periods). Module names that contain embedded dots but don’t have a leading dot
are package imports, not relative imports.
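For instance, under these rules—a quick recap sketch, reusing this chapter's mypkg and string example names:
from . import string # Relative: leading dot, searches this package only
from .string import name # Relative: copies name out of mypkg.string
from mypkg.string import name # Package import: embedded dots, absolute via sys.path
import mypkg.string # Package import too; import never takes leading dots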
In other words, package relative imports in 3.X really boil down to just the removal of
2.X’s inclusive search path behavior for packages, along with the addition of special
from syntax to explicitly request that relative package-only behavior be used. If you
coded your package imports in the past so that they did not depend upon 2.X’s implicit
relative lookup (e.g., by always spelling out full paths from a package root), this change
is largely a moot point. If you didn’t, you’ll need to update your package files to use
the new from syntax for local package files, or full absolute paths.
Module Lookup Rules Summary
With packages and relative imports, the module search story in Python 3.X that we
have seen so far can be summarized as follows:
Basic modules with simple names (e.g., A) are located by searching each directory
on the sys.path list, from left to right. This list is constructed from both system
defaults and user-configurable settings described in Chapter 22.
Packages are simply directories of Python modules with a special __init__.py file,
which enables A.B.C directory path syntax in imports. In an import of A.B.C, for
example, the directory named A is located relative to the normal module import
search of sys.path, B is another package subdirectory within A, and C is a module
or other importable item within B.
Within a package’s files, normal import and from statements use the same
sys.path search rule as imports elsewhere. Imports in packages using from state-
ments and leading dots, however, are relative to the package; that is, only the
package directory is checked, and the normal sys.path lookup is not used. In from .
import A, for example, the module search is restricted to the directory containing
the file in which this statement appears.
Python 2.X works the same, except that normal imports without dots also automatically
search the package directory first before proceeding on to sys.path.
In sum, Python imports select between relative (in the containing directory) and abso-
lute (in a directory on sys.path) resolutions as follows:
Dotted imports: from . import m
Are relative-only in both 2.X and 3.X
Nondotted imports: import m, from m import x
Are relative-then-absolute in 2.X, and absolute-only in 3.X
As we’ll see later, Python 3.3 adds another flavor to modules—namespace packages
which is largely disjointed from the package-relative story we’re covering here. This
newer model supports package-relative imports too, and is simply a different way to
construct a package. It augments the import search procedure to allow package content
to be spread across multiple simple directories as a last-resort resolution. Thereafter,
though, the composite package behaves the same in terms of relative import rules.
Relative Imports in Action
But enough theory: let’s run some simple code to demonstrate the concepts behind
relative imports.
Imports outside packages
First of all, as mentioned previously, this feature does not impact imports outside a
package. Thus, the following finds the standard library string module as expected:
C:\code> c:\Python33\python
>>> import string
>>> string
<module 'string' from 'C:\\Python33\\lib\\string.py'>
But if we add a module of the same name in the directory we’re working in, it is selected
instead, because the first entry on the module search path is the current working di-
rectory (CWD):
# code\string.py
print('string' * 8)
C:\code> c:\Python33\python
>>> import string
stringstringstringstringstringstringstringstring
>>> string
<module 'string' from '.\\string.py'>
In other words, normal imports are still relative to the “home” directory (the top-level
script’s container, or the directory you’re working in). In fact, package relative import
syntax is not even allowed in code that is not in a file being used as part of a package:
>>> from . import string
SystemError: Parent module '' not loaded, cannot perform relative import
In this section, code entered at the interactive prompt behaves the same as it would if
run in a top-level script, because the first entry on sys.path is either the interactive
working directory or the directory containing the top-level file. The only difference is
that the start of sys.path is an absolute directory, not an empty string:
# code\main.py
import string # Same code but in a file
print(string)
C:\code> C:\python33\python main.py # Equivalent results in 2.X
stringstringstringstringstringstringstringstring
<module 'string' from 'c:\\code\\string.py'>
Similarly, a from . import string in this nonpackage file fails the same as it does at the
interactive prompt—programs and packages are different file usage modes.
Imports within packages
Now, let’s get rid of the local string module we coded in the CWD and build a package
directory there with two modules, including the required but empty pkg\__init__.py file. Package roots in this section are located in the CWD, which is added automatically to sys.path, so we don’t need to set PYTHONPATH. I’ll also largely omit empty
__init__.py files and most error message text for space (and non-Windows readers will
have to pardon the shell commands here, and translate for your platform):
C:\code> del string* # del __pycache__\string* for bytecode in 3.2+
C:\code> mkdir pkg
c:\code> notepad pkg\__init__.py
# code\pkg\spam.py
import eggs # <== Works in 2.X but not 3.X!
print(eggs.X)
# code\pkg\eggs.py
X = 99999
import string
print(string)
The first file in this package tries to import the second with a normal import statement.
Because this is taken to be relative in 2.X but absolute in 3.X, it fails in the latter. That
is, 2.X searches the containing package first, but 3.X does not. This is the incompatible
behavior you have to be aware of in 3.X:
C:\code> c:\Python27\python
>>> import pkg.spam
<module 'string' from 'C:\Python27\lib\string.pyc'>
99999
C:\code> c:\Python33\python
>>> import pkg.spam
ImportError: No module named 'eggs'
To make this work in both 2.X and 3.X, change the first file to use the special relative
import syntax, so that its import searches the package directory in 3.X too:
# code\pkg\spam.py
from . import eggs # <== Use package relative import in 2.X or 3.X
print(eggs.X)
# code\pkg\eggs.py
X = 99999
import string
print(string)
C:\code> c:\Python27\python
>>> import pkg.spam
<module 'string' from 'C:\Python27\lib\string.pyc'>
99999
C:\code> c:\Python33\python
>>> import pkg.spam
<module 'string' from 'C:\\Python33\\lib\\string.py'>
99999
Imports are still relative to the CWD
Notice in the preceding example that the package modules still have access to standard
library modules like string—their normal imports are still relative to the entries on the
module search path. In fact, if you add a string module to the CWD again, imports in
a package will find it there instead of in the standard library. Although you can skip
the package directory with an absolute import in 3.X, you still can’t skip the home
directory of the program that imports the package:
# code\string.py
print('string' * 8)
# code\pkg\spam.py
from . import eggs
print(eggs.X)
# code\pkg\eggs.py
X = 99999
import string # <== Gets string in CWD, not Python lib!
print(string)
C:\code> c:\Python33\python # Same result in 2.X
>>> import pkg.spam
stringstringstringstringstringstringstringstring
<module 'string' from '.\\string.py'>
99999
Selecting modules with relative and absolute imports
To show how this applies to imports of standard library modules, reset the package
again. Get rid of the local string module, and define a new one inside the package itself:
C:\code> del string* # del __pycache__\string* for bytecode in 3.2+
# code\pkg\spam.py
import string # <== Relative in 2.X, absolute in 3.X
print(string)
# code\pkg\string.py
print('Ni' * 8)
Now, which version of the string module you get depends on which Python you use.
As before, 3.X interprets the import in the first file as absolute and skips the package,
but 2.X does not—another example of the incompatible behavior in 3.X:
C:\code> c:\Python33\python
>>> import pkg.spam
<module 'string' from 'C:\\Python33\\lib\\string.py'>
C:\code> c:\Python27\python
>>> import pkg.spam
NiNiNiNiNiNiNiNi
<module 'pkg.string' from 'pkg\string.py'>
Using relative import syntax in 3.X forces the package to be searched again, as it is in
2.X—by using absolute or relative import syntax in 3.X, you can either skip or select
the package directory explicitly. In fact, this is the use case that the 3.X model addresses:
# code\pkg\spam.py
from . import string # <== Relative in both 2.X and 3.X
print(string)
# code\pkg\string.py
print('Ni' * 8)
C:\code> c:\Python33\python
>>> import pkg.spam
NiNiNiNiNiNiNiNi
<module 'pkg.string' from '.\\pkg\\string.py'>
C:\code> c:\Python27\python
>>> import pkg.spam
NiNiNiNiNiNiNiNi
<module 'pkg.string' from 'pkg\string.py'>
Relative imports search packages only
It’s also important to note that relative import syntax is really a binding declaration, not
just a preference. If we delete the string.py file and any associated byte code in this
example now, the relative import in spam.py fails in both 3.X and 2.X, instead of falling
back on the standard library (or any other) version of this module:
# code\pkg\spam.py
from . import string # <== Fails in both 2.X and 3.X if no string.py here!
C:\code> del pkg\string*
C:\code> C:\python33\python
>>> import pkg.spam
ImportError: cannot import name string
C:\code> C:\python27\python
>>> import pkg.spam
ImportError: cannot import name string
Modules referenced by relative imports must exist in the package directory.
Imports are still relative to the CWD, again
Although absolute imports let you skip package modules this way, they still rely on
other components of sys.path. For one last test, let’s define two string modules of our
own. In the following, there is one module by that name in the CWD, one in the pack-
age, and another in the standard library:
# code\string.py
print('string' * 8)
# code\pkg\spam.py
from . import string # <== Relative in both 2.X and 3.X
print(string)
# code\pkg\string.py
print('Ni' * 8)
When we import the string module with relative import syntax like this, we get the
version in the package in both 2.X and 3.X, as desired:
C:\code> c:\Python33\python # Same result in 2.X
>>> import pkg.spam
NiNiNiNiNiNiNiNi
<module 'pkg.string' from '.\\pkg\\string.py'>
When absolute syntax is used, though, the module we get varies per version again. 2.X
interprets this as relative to the package first, but 3.X makes it “absolute,” which in this
case really just means it skips the package and loads the version relative to the CWD—not the version in the standard library:
# code\string.py
print('string' * 8)
# code\pkg\spam.py
import string # <== Relative in 2.X, "absolute" in 3.X: CWD!
print(string)
# code\pkg\string.py
print('Ni' * 8)
C:\code> c:\Python33\python
>>> import pkg.spam
stringstringstringstringstringstringstringstring
<module 'string' from '.\\string.py'>
C:\code> c:\Python27\python
>>> import pkg.spam
NiNiNiNiNiNiNiNi
<module 'pkg.string' from 'pkg\string.pyc'>
As you can see, although packages can explicitly request modules within their own
directories with dots, their “absolute” imports are otherwise still relative to the rest of
the normal module search path. In this case, a file in the program using the package
hides the standard library module the package may want. The change in 3.X simply
allows package code to select files either inside or outside the package (i.e., relatively
or absolutely). Because import resolution can depend on an enclosing context that may
not be foreseen, though, absolute imports in 3.X are not a guarantee of finding a module
in the standard library.
Experiment with these examples on your own for more insight. In practice, this is not
usually as ad hoc as it might seem: you can generally structure your imports, search
paths, and module names to work the way you wish during development. You should
keep in mind, though, that imports in larger systems may depend upon context of use,
and the module import protocol is part of a successful library’s design.
Pitfalls of Package-Relative Imports: Mixed Use
Now that you’ve learned about package-relative imports, you should also keep in mind
that they may not always be your best option. Absolute package imports, with a com-
plete directory path relative to a directory on sys.path, are still sometimes preferred
over both implicit package-relative imports in Python 2.X, and explicit package-relative
import dot syntax in both Python 2.X and 3.X. This issue may seem obscure, but will
likely become important fairly soon after you start coding packages of your own.
As we’ve seen, Python 3.X’s relative import syntax and absolute search rule default
make intrapackage imports explicit and thus easier to notice and maintain, and allow
explicit choice in some name conflict scenarios. However, there are also two major
ramifications of this model that you should be aware of:
In both Python 3.X and 2.X, use of package-relative import statements implicitly
binds a file to a package directory and role, and precludes it from being used in
other ways.
In Python 3.X, the new relative search rule change means that a file can no longer
serve as both script and package module as easily as it could in 2.X.
These constraint’s causes are a bit subtle, but because the following are simultaneously
true:
Python 3.X and 2.X do not allow from . relative syntax to be used unless the im-
porter is being used as part of a package (i.e., is being imported from somewhere
else).
Python 3.X does not search a package module’s own directory for imports, unless
from . relative syntax is used (or the module is in the current working directory or
main script’s home directory).
Use of relative imports prevents you from creating directories that serve as both exe-
cutable programs and externally importable packages in 3.X and 2.X. Moreover, some
files can no longer serve as both script and package module in 3.X as they could in 2.X.
In terms of import statements, the rules pan out as follows—the first is for package
mode only in both Pythons, and the second is for program mode only in 3.X:
from . import mod # Not allowed in nonpackage mode in both 2.X and 3.X
import mod # Does not search file's own directory in package mode in 3.X
The net effect is that for files to be used in either 2.X or 3.X, you may need to choose a
single usage mode—package (with relative imports) or program (with simple imports),
and isolate true package module files in a subdirectory apart from top-level script files.
Alternatively, you can attempt manual sys.path changes (a generally brittle and error-
prone task), or always use full package paths in absolute imports instead of either
package-relative syntax or simple imports, and assume the package root is on the mod-
ule search path:
from system.section.mypkg import mod # Works in both program and package mode
Of all these schemes, the last—full package path imports—may be the most portable
and functional, but we need to turn to more concrete code to see why.
The issue
For example, in Python 2.X it’s common to use the same single directory as both pro-
gram and package, using normal undotted imports. This relies on the script’s home
directory to resolve imports when used as a program, and the 2.X relative-then-absolute
rule to resolve intrapackage imports when used as a package. This won’t quite work in
3.X, though—in package mode, plain imports do not load modules in the same direc-
tory anymore, unless that directory also happens to be the same as the main file’s con-
tainer or the current working directory (and hence, be on sys.path).
Here’s what this looks like in action, stripped to a bare minimum of code (for brevity
in this section I again omit __init__.py package directory files required prior to Python
3.3, and for variety use the 3.3 Windows launcher covered in Appendix B):
# code\pkg\main.py
import spam
# code\pkg\spam.py
import eggs # <== Works if in "." = home of main script file
# code\pkg\eggs.py
print('Eggs' * 4) # But won't load this file when used as pkg in 3.X!
c:\code> python pkg\main.py # OK as program, in both 2.X and 3.X
EggsEggsEggsEggs
c:\code> python pkg\spam.py
EggsEggsEggsEggs
c:\code> py -2 # OK as package in 2.X: relative-then-absolute
>>> import pkg.spam # 2.X: plain imports search package directory first
EggsEggsEggsEggs
C:\code> py -3 # But 3.X fails to find file here: absolute only
>>> import pkg.spam # 3.X: plain imports search only CWD plus sys.path
ImportError: No module named 'eggs'
Your next step might be to add the required relative import syntax for 3.X use, but it
won’t help here. The following retains the single directory for both a main top-level
script and package modules, and adds the required dots—in both 2.X and 3.X this now
works when the directory is imported as a package, but fails when it is used as a program
directory (including attempts to run a module as a script directly):
# code\pkg\main.py
import spam
# code\pkg\spam.py
from . import eggs # <== Not a package if main file here (even if me)!
# code\pkg\eggs.py
print('Eggs' * 4)
c:\code> python # OK as package but not program in both 3.X and 2.X
>>> import pkg.spam
EggsEggsEggsEggs
c:\code> python pkg\main.py
SystemError: ... cannot perform relative import
c:\code> python pkg\spam.py
SystemError: ... cannot perform relative import
Fix 1: Package subdirectories
In a mixed-use case like this, one solution is to isolate all but the main files used only
by the program in a subdirectory—this way, your intrapackage imports still work in all
Pythons, you can use the top directory as a standalone program, and the nested direc-
tory still serves as a package for use from other programs:
# code\pkg\main.py
import sub.spam # <== Works if move modules to pkg below main file
# code\pkg\sub\spam.py
from . import eggs # Package relative works now: in subdirectory
# code\pkg\sub\eggs.py
print('Eggs' * 4)
c:\code> python pkg\main.py # From main script: same result in 2.X and 3.X
EggsEggsEggsEggs
c:\code> python # From elsewhere: same result in 2.X and 3.X
>>> import pkg.sub.spam
EggsEggsEggsEggs
The potential downside of this scheme is that you won’t be able to run package modules
directly to test them with embedded self-test code, though tests can be coded separately
in their parent directory instead:
c:\code> py -3 pkg\sub\spam.py # But individual modules can't be run to test
SystemError: ... cannot perform relative import
Fix 2: Full path absolute import
Alternatively, full path package import syntax would address this case too—it requires
the directory above the package root to be in your path, though this is probably not an
extra requirement for a realistic software package. Most Python packages will either
require this setting, or arrange for it to be handled automatically with install tools (such
as distutils, which may store a package’s code in a directory on the default module
search path such as the site-packages root; see Chapter 22 for more details):
# code\pkg\main.py
import spam
# code\pkg\spam.py
import pkg.eggs # <== Full package paths work in all cases, 2.X+3.X
# code\pkg\eggs.py
print('Eggs' * 4)
c:\code> set PYTHONPATH=C:\code
c:\code> python pkg\main.py # From main script: Same result in 2.X and 3.X
EggsEggsEggsEggs
c:\code> python # From elsewhere: Same result in 2.X and 3.X
>>> import pkg.spam
EggsEggsEggsEggs
Unlike the subdirectory fix, full path absolute imports like these also allow you to run
your modules standalone to test:
c:\code> python pkg\spam.py # Individual modules are runnable too in 2.X and 3.X
EggsEggsEggsEggs
Example: Application to module self-test code (preview)
To summarize, here’s another typical example of the issue and its full path resolution.
This uses a common technique we’ll expand on in the next chapter, but the idea is simple enough to include as a preview here (though you may want to review this again after the next chapter’s coverage).
Consider the following two modules in a package directory, the second of which in-
cludes self-test code. In short, a module’s __name__ attribute is the string “__main__”
when it is being run as a top-level script, but not when it is being imported, which
allows it to be used as both module and script:
# code\dualpkg\m1.py
def somefunc():
print('m1.somefunc')
# code\dualpkg\m2.py
...import m1 here... # Replace me with a real import statement
def somefunc():
m1.somefunc()
print('m2.somefunc')
if __name__ == '__main__':
somefunc() # Self-test or top-level script usage mode code
The second of these needs to import the first where the “...import m1 here...” place-
holder appears. Replacing this line with a relative import statement works when the
file is used as a package, but is not allowed in nonpackage mode by either 2.X or 3.X
(results and error messages are omitted here for space; see the file dualpkg\results.txt
in the book’s examples for the full listing):
# code\dualpkg\m2.py
from . import m1
c:\code> py -3
>>> import dualpkg.m2 # OK
C:\code> py -2
>>> import dualpkg.m2 # OK
c:\code> py -3 dualpkg\m2.py # Fails!
c:\code> py -2 dualpkg\m2.py # Fails!
Conversely, a simple import statement works in nonpackage mode in both 2.X and
3.X, but fails in package mode in 3.X only, because such statements do not search the
package directory in 3.X:
# code\dualpkg\m2.py
import m1
c:\code> py -3
>>> import dualpkg.m2 # Fails!
c:\code> py -2
>>> import dualpkg.m2 # OK
c:\code> py -3 dualpkg\m2.py # OK
c:\code> py -2 dualpkg\m2.py # OK
And finally, using full package paths works again in both usage modes and Pythons, as
long as the package’s root is on the module search path (as it must be to be used
elsewhere):
# code\dualpkg\m2.py
import dualpkg.m1 as m1 # And: set PYTHONPATH=c:\code
c:\code> py -3
>>> import dualpkg.m2 # OK
C:\code> py -2
>>> import dualpkg.m2 # OK
c:\code> py -3 dualpkg\m2.py # OK
c:\code> py -2 dualpkg\m2.py # OK
In sum, unless you’re willing and able to isolate your modules in subdirectories below
scripts, full package path imports are probably preferable to package-relative imports
—though they’re more typing, they handle all cases, and they work the same in 2.X
and 3.X. There may be additional workarounds that involve extra tasks (e.g., manually
setting sys.path in your code), but we’ll skip them here because they are more obscure
and rely on import semantics, which is error-prone; full package imports rely only on
the basic package mechanism.
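For reference, here is a minimal sketch of the manual sys.path workaround just mentioned—shown only to illustrate the approach and its brittleness, and assuming the dualpkg layout of the prior example:
# code\dualpkg\m2.py (sketch: manual sys.path change, generally not recommended)
import os, sys
here = os.path.dirname(os.path.abspath(__file__)) # This file's own directory
if here not in sys.path:
    sys.path.insert(1, here) # May shadow same-named modules elsewhere!
import m1 # Now resolves in both usage modes
This works in both usage modes and both Pythons, but it mutates a process-wide setting—one reason full package paths are usually the better bet.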
Naturally, the extent to which this may impact your modules can vary per package;
absolute imports may also require changes when directories are reorganized, and rel-
ative imports may become invalid if a local module is relocated.
Be sure to also watch for future Python changes on this front. Although
this book covers Python up to 3.3 only, at this writing, there is talk in a
PEP of possibly addressing some package issues in Python 3.4, perhaps
even allowing relative imports to be used in program mode. On the other
hand, this initiative’s scope and outcome is uncertain and would work
only on 3.4 and later; the full path solution given here is version-neutral;
and 3.4 is more than a year away in any event. That is, you can wait for
a change to a 3.X change that limited functionality, or simply use tried-
and-true full package paths.
Python 3.3 Namespace Packages
Now that you’ve learned all about package and package-relative imports, I need to
explain that there’s a new option that modifies some of the ideas we just covered. At
least abstractly, as of release 3.3 Python has four import models. From original to new-
est:
Basic module imports: import mod, from mod import attr
The original model: imports of files and their contents, relative to the sys.path
module search path
Package imports: import dir1.dir2.mod, from dir1.mod import attr
Imports that give directory path extensions relative to the sys.path module search
path, where each package is contained in a single directory and has an initialization
file, in Python 2.X and 3.X
Package-relative imports: from . import mod (relative), import mod (absolute)
The model used for intrapackage imports of the prior section, with its relative or
absolute lookup schemes for dotted and nondotted imports, available but differing
in Python 2.X and 3.X
Namespace packages: import splitdir.mod
The new namespace package model that we’ll survey here, which allows packages
to span multiple directories, and requires no initialization file, introduced in Python
3.3
The first two of these are self-contained, but the third tightens up the search order and
extends syntax for intrapackage imports, and the fourth upends some of the core no-
tions and requirements of the prior package model. In fact, Python 3.3 (and later) now
has two flavors of packages:
The original model, now known as regular packages
The alternative model, known as namespace packages
This is similar in spirit to the “classic” and “new style” class model dichotomy we’ll
meet in the next part of this book, though the new is more an addition to the old here.
The original and new package models are not mutually exclusive, and can be used
simultaneously in the same program. In fact, the new namespace package model works
as something of a fallback option, recognized only if normal modules and regular pack-
ages of the same name are not present on the module search path.
The rationale for namespace packages is rooted in package installation goals that may
seem obscure unless you are responsible for such tasks, and is better addressed by this
feature’s PEP document. In short, though, they resolve a potential for collision of mul-
tiple __init__.py files when package parts are merged, by removing this file completely.
Moreover, by providing standard support for packages that can be split across multiple
directories and located in multiple sys.path entries, namespace packages both enhance
install flexibility and provide a common mechanism to replace the multiple incompat-
ible solutions that have arisen to address this goal.
Though it’s too early to judge their uptake, average Python users may find namespace packages to be a useful alternative to the regular package model—one
that does not require initialization files, and allows any directory of code to be used as
an importable package. To see why, let’s move on to the details.
Namespace Package Semantics
A namespace package is not fundamentally different from a regular package; it is just
a different way of creating packages. Moreover, they are still relative to sys.path at the
top level: the leftmost component of a dotted namespace package path must still be
located in an entry on the normal module search path.
In terms of physical structure, though, the two can differ substantially. Regular pack-
ages still must have an __init__.py file that is run automatically, and reside in a single
directory as before. By contrast, new-style namespace packages cannot contain an
__init__.py, and may span multiple directories that are collected at import time. In fact,
none of the directories that make up a namespace package can have an __init__.py, but
the content nested within each of them is treated as a single package.
The import algorithm
To truly understand namespace packages, we have to look under the hood to see how
the import operation works in 3.3. During imports, Python still iterates over each di-
rectory in the module search path, sys.path, just as in 3.2 and earlier. In 3.3, though,
while looking for an imported module or package named spam, for each directory in
the module search path, Python tests for a wider variety of matching criteria, in the
following order:
1. If directory\spam\__init__.py is found, a regular package is imported and re-
turned.
2. If directory\spam.{py, pyc, or other module extension} is found, a simple module
is imported and returned.
3. If directory\spam is found and is a directory, it is recorded and the scan continues
with the next directory in the search path.
4. If none of the above was found, the scan continues with the next directory in the
search path.
If the search path scan completes without returning a module or package by steps 1 or
2, and at least one directory was recorded by step 3, then a namespace package is created.
The creation of the namespace package happens immediately, and is not deferred until
a sublevel import occurs. The new namespace package has a __path__ attribute set to
an iterable of the directory path strings that were found and recorded during the scan
by step 3, but does not have a __file__.
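To make this procedure more concrete, here is a rough Python-coded sketch of the scan just described—an illustrative approximation only, not the real importlib implementation, with the module name spam hardwired and the result tuples invented for display:
import os

def import_scan(sys_path): # Illustrative sketch only
    namespace_dirs = [] # Step 3's recorded directories
    for entry in sys_path:
        pkg = os.path.join(entry, 'spam')
        if os.path.isfile(os.path.join(pkg, '__init__.py')):
            return ('regular package', pkg) # Step 1: import and return
        for ext in ('.py', '.pyc'): # Other extensions omitted here
            if os.path.isfile(pkg + ext):
                return ('module', pkg + ext) # Step 2: import and return
        if os.path.isdir(pkg):
            namespace_dirs.append(pkg) # Step 3: record and keep scanning
    if namespace_dirs: # Scan ended without steps 1 or 2
        return ('namespace package', namespace_dirs) # Recorded dirs become __path__
    raise ImportError('No module named spam')
Because regular packages and module files return as soon as they are found, they naturally take precedence over namespace directories—a point we’ll revisit ahead.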
The __path__ attribute is then used in later, deeper accesses to search all package com-
ponents—each recorded entry on a namespace package’s __path__ is searched when-
ever further nested items are requested, much like the sole directory of a regular pack-
age.
Viewed another way, the __path__ attribute of a namespace package serves the same
role for lower-level components that sys.path does at the top for the leftmost compo-
nent of package import paths; it becomes the “parent path” for accessing lower items
using the same four-step procedure just sketched.
The net result is that a namespace package is a sort of virtual concatenation of directories
located via multiple sys.path entries. Once a namespace package is created, though,
there is no functional difference between it and a regular package; it supports everything
we’ve learned for regular packages, including package-relative import syntax.
Impacts on Regular Packages: Optional __init__.py
As one consequence of this new import procedure, as of Python 3.3 packages no longer
require __init__.py files—when a single-directory package does not have this file, it will
be treated as a single-directory namespace package, and no warning will be issued. This
is a major relaxation of prior rules, but a commonly requested change; many packages
require no initialization code, and it seemed extraneous to have to create an empty
initialization file in such cases. This is finally no longer required as of 3.3.
At the same time, the original regular package model is still fully supported, and au-
tomatically runs code in __init__.py as before as an initialization hook. Moreover, when
it’s known that a package will never be a portion of a split namespace package, there
is a performance advantage to coding it as a regular package with an __init__.py. Cre-
ation and loading of a regular package occurs immediately when it is located along the
path. With namespace packages, all entries in the path must be scanned before the
package is created. More formally, regular packages stop the prior section’s algorithm
at step 1; namespace packages do not.
Per this change’s PEP, there is no plan to remove support of regular packages—at least,
that’s the story today; change is always a possibility in open source projects (indeed,
the prior edition quoted plans on string formatting and relative imports in 2.X that were
later abandoned), so as usual, be sure to watch for future developments on this front.
Given the performance advantage and auto-initialization code of regular packages,
though, it seems unlikely that they would be removed altogether.
Namespace Packages in Action
To see how namespace packages work, consider the following two modules and nested
directory structure—with two subdirectories named sub located in different parent
directories, dir1 and dir2:
C:\code\ns\dir1\sub\mod1.py
C:\code\ns\dir2\sub\mod2.py
If we add both dir1 and dir2 to the module search path, sub becomes a namespace
package spanning both, with the two module files available under that name even
though they live in separate physical directories. Here are the files’ contents and the required path settings on Windows; there are no __init__.py files here—in fact, there cannot be in namespace packages, as this is their chief physical differentiation:
c:\code> mkdir ns\dir1\sub # Two dirs of same name in different dirs
c:\code> mkdir ns\dir2\sub # And similar outside Windows
c:\code> type ns\dir1\sub\mod1.py # Module files in different directories
print(r'dir1\sub\mod1')
c:\code> type ns\dir2\sub\mod2.py
print(r'dir2\sub\mod2')
c:\code> set PYTHONPATH=C:\code\ns\dir1;C:\code\ns\dir2
Now, when imported directly in 3.3 and later, the namespace package is the virtual
concatenation of its individual directory components, and allows further nested parts
to be accessed through its single, composite name with normal imports:
c:\code> C:\Python33\python
>>> import sub
>>> sub # Namespace packages: nested search paths
<module 'sub' (namespace)>
>>> sub.__path__
_NamespacePath(['C:\\code\\ns\\dir1\\sub', 'C:\\code\\ns\\dir2\\sub'])
>>> from sub import mod1
dir1\sub\mod1
>>> import sub.mod2 # Content from two different directories
dir2\sub\mod2
>>> mod1
<module 'sub.mod1' from 'C:\\code\\ns\\dir1\\sub\\mod1.py'>
>>> sub.mod2
<module 'sub.mod2' from 'C:\\code\\ns\\dir2\\sub\\mod2.py'>
This is also true if we import through the namespace package name immediately—because the namespace package is made when it is first reached, the timing of path extensions is irrelevant:
c:\code> C:\Python33\python
>>> import sub.mod1
dir1\sub\mod1
>>> import sub.mod2 # One package spanning two directories
dir2\sub\mod2
>>> sub.mod1
<module 'sub.mod1' from 'C:\\code\\ns\\dir1\\sub\\mod1.py'>
>>> sub.mod2
<module 'sub.mod2' from 'C:\\code\\ns\\dir2\\sub\\mod2.py'>
>>> sub
<module 'sub' (namespace)>
>>> sub.__path__
_NamespacePath(['C:\\code\\ns\\dir1\\sub', 'C:\\code\\ns\\dir2\\sub'])
Interestingly, relative imports work in namespace packages too—in the following, the
relative import statement references a file in the package, even though the referenced
file resides in a different directory:
c:\code> type ns\dir1\sub\mod1.py
from . import mod2 # And "from . import string" still fails
print(r'dir1\sub\mod1')
c:\code> C:\Python33\python
>>> import sub.mod1 # Relative import of mod2 in another dir
dir2\sub\mod2
dir1\sub\mod1
>>> import sub.mod2 # Already imported module not rerun
>>> sub.mod2
<module 'sub.mod2' from 'C:\\code\\ns\\dir2\\sub\\mod2.py'>
As you can see, namespace packages are like ordinary single-directory packages in every
way, except for having split physical storage—which is why single-directory namespace packages without __init__.py files are exactly like regular packages, but with no
initialization logic to be run.
Namespace Package Nesting
Namespace packages even support arbitrary nesting—once a namespace package is created, it serves essentially the same role at its level that sys.path does at the
top, becoming the “parent path” for lower levels. Continuing the prior section’s ex-
ample:
c:\code> mkdir ns\dir2\sub\lower # Further nested components
c:\code> type ns\dir2\sub\lower\mod3.py
print(r'dir2\sub\lower\mod3')
c:\code> C:\Python33\python
>>> import sub.lower.mod3 # Namespace pkg nested in namespace pkg
dir2\sub\lower\mod3
c:\code> C:\Python33\python
>>> import sub # Same effect if accessed incrementally
>>> import sub.mod2
dir2\sub\mod2
>>> import sub.lower.mod3
dir2\sub\lower\mod3
>>> sub.lower # A single-directory namespace pkg
<module 'sub.lower' (namespace)>
>>> sub.lower.__path__
_NamespacePath(['C:\\code\\ns\\dir2\\sub\\lower'])
In the preceding, sub is a namespace package split across two directories, and
sub.lower is a single-directory namespace package nested within the portion of sub
physically located in dir2. sub.lower is also the namespace package equivalent of a
regular package with no __init__.py.
This nesting behavior holds true whether the lower component is a module, regular
package, or another namespace package—by serving as new import search paths,
namespace packages allow all three to be nested within them freely:
c:\code> mkdir ns\dir1\sub\pkg
C:\code> type ns\dir1\sub\pkg\__init__.py
print(r'dir1\sub\pkg\__init__.py')
c:\code> C:\Python33\python
>>> import sub.mod2 # Nested module
dir2\sub\mod2
>>> import sub.pkg # Nested regular package
dir1\sub\pkg\__init__.py
>>> import sub.lower.mod3 # Nested namespace package
dir2\sub\lower\mod3
>>> sub # Modules, packages, and namespaces
<module 'sub' (namespace)>
>>> sub.mod2
<module 'sub.mod2' from 'C:\\code\\ns\\dir2\\sub\\mod2.py'>
>>> sub.pkg
<module 'sub.pkg' from 'C:\\code\\ns\\dir1\\sub\\pkg\\__init__.py'>
>>> sub.lower
<module 'sub.lower' (namespace)>
>>> sub.lower.mod3
<module 'sub.lower.mod3' from 'C:\\code\\ns\\dir2\\sub\\lower\\mod3.py'>
Trace through this example’s files and directories for more insight. As you can see,
namespace packages integrate seamlessly into the former import models, and extend them with new functionality.
Files Still Have Precedence over Directories
As explained earlier, part of the purpose of __init__.py files in regular packages is to
declare the directory as a package—it tells Python to use the directory, rather than
skipping ahead to a possible file of the same name later on the path. This avoids inad-
vertently choosing a noncode subdirectory that accidentally appears early on the path,
over a desired module of the same name.
Because namespace packages do not require these special files, they would seem to
invalidate this safeguard. This isn’t the case, though—because the namespace algo-
rithm outlined earlier continues scanning the path after a namespace directory has been
found, files later on the path still have priority over earlier directories with no
__init__.py. For example, consider the following directories and modules:
c:\code> mkdir ns2
c:\code> mkdir ns3
c:\code> mkdir ns3\dir
c:\code> notepad ns3\dir\ns2.py
c:\code> type ns3\dir\ns2.py
print(r'ns3\dir\ns2.py!')
The ns2 directory here cannot be imported in Python 3.2 and earlier—it’s not a regular
package, as it lacks an __init__.py initialization file. This directory can be imported
under 3.3, though—it’s a namespace package directory in the current working direc-
tory, which is always the first item on the sys.path module search path irrespective of
PYTHONPATH settings:
c:\code> set PYTHONPATH=
c:\code> py -3.2
>>> import ns2
ImportError: No module named ns2
c:\code> py -3.3
>>> import ns2
>>> ns2 # A single-directory namespace package in CWD
<module 'ns2' (namespace)>
>>> ns2.__path__
_NamespacePath(['.\\ns2'])
But watch what happens when the directory containing a file of the same name as a
namespace directory is added later on the search path, via PYTHONPATH settings—the file
is used instead, because Python keeps searching later path entries after a namespace
package directory is found. It stops searching only when a module or regular package
is located, or the path has been completely scanned. Namespace packages are returned
only if nothing else was found along the way:
c:\code> set PYTHONPATH=C:\code\ns3\dir
c:\code> py -3.3
>>> import ns2 # Use later module file, not same-named directory!
ns3\dir\ns2.py!
>>> ns2
<module 'ns2' from 'C:\\code\\ns3\\dir\\ns2.py'>
>>> import sys
>>> sys.path[:2] # First '' means current working directory, CWD
['', 'C:\\code\\ns3\\dir']
In fact, setting the path to include a module works the same as it does in earlier Pythons,
even if a same-named namespace directory appears earlier on the path; namespace
packages are used in 3.3 only in cases that would be errors in earlier Pythons:
c:\code> py -3.2
>>> import ns2
ns3\dir\ns2.py!
>>> ns2
<module 'ns2' from 'C:\code\ns3\dir\ns2.py'>
This is also why none of the directories in a namespace package is allowed to have an __init__.py file: as soon as the import algorithm finds one that does, it returns a regular
package immediately, and abandons the path search and the namespace package. Put
more formally, the import algorithm chooses a namespace package only at the end of
the path scan, and stops at steps 1 or 2 if either a regular package or module file is found
sooner.
The net effect is that both module files and regular packages anywhere on the module
search path have precedence over namespace package directories. In the following, for
example, a namespace package called sub exists as the concatenation of same-named
directories under dir1 and dir2 on the path:
c:\code> mkdir ns4\dir1\sub
c:\code> mkdir ns4\dir2\sub
c:\code> set PYTHONPATH=c:\code\ns4\dir1;c:\code\ns4\dir2
c:\code> py -3
>>> import sub
>>> sub
<module 'sub' (namespace)>
>>> sub.__path__
_NamespacePath(['c:\\code\\ns4\\dir1\\sub', 'c:\\code\\ns4\\dir2\\sub'])
Much like a module file, though, a regular package added in the rightmost path entry
takes priority over same-named namespace package directories too—the import path
scan starts recording a namespace package tentatively in dir1 as before, but abandons
it when the regular package is detected in dir2:
c:\code> notepad ns4\dir2\sub\__init__.py
c:\code> py −3
>>> import sub # Use later reg. package, not same-named directory!
>>> sub
<module 'sub' from 'c:\\code\\ns4\\dir2\\sub\\__init__.py'>
Though a useful extension, because namespace packages are available only to readers using Python 3.3 and later, I’m going to defer to Python’s manuals for more details on the subject. See especially this change’s PEP document for its rationale, additional details, and more comprehensive examples.
Chapter Summary
This chapter introduced Python’s package import model—an optional but useful way
to explicitly list part of the directory path leading up to your modules. Package imports
are still relative to a directory on your module import search path, but your script gives
the rest of the path to the module explicitly.
As we’ve seen, packages not only make imports more meaningful in larger systems, but
also simplify import search path settings if all cross-directory imports are relative to a
common root directory, and resolve ambiguities when there is more than one module
of the same name—including the name of the enclosing directory in a package import
helps distinguish between them.
Because it’s relevant only to code in packages, we also explored the newer relative
import model here—a way for imports in package files to select modules in the same
package explicitly using leading dots in a from, instead of relying on an older and error-
prone implicit package search rule. Finally, we surveyed Python 3.3 namespace pack-
ages, which allow a logical package to span multiple physical directories as a fallback
option of import searches, and remove the initialization file requirements of the prior
model.
In the next chapter, we will survey a handful of more advanced module-related topics,
such as the __name__ usage mode variable and name-string imports. As usual, though,
let’s close out this chapter first with a short quiz to review what you’ve learned here.
Test Your Knowledge: Quiz
1. What is the purpose of an __init__.py file in a module package directory?
2. How can you avoid repeating the full package path every time you reference a
package’s content?
3. Which directories require __init__.py files?
4. When must you use import instead of from with packages?
5. What is the difference between from mypkg import spam and from . import spam?
6. What is a namespace package?
Test Your Knowledge: Answers
1. The __init__.py file serves to declare and initialize a regular module package;
Python automatically runs its code the first time you import through a directory
in a process. Its assigned variables become the attributes of the module object
created in memory to correspond to that directory. It is also not optional prior to 3.3—in earlier Pythons, you can’t import through a directory with package syntax unless the directory contains this file.
2. Use the from statement with a package to copy names out of the package directly,
or use the as extension with the import statement to rename the path to a shorter
synonym. In both cases, the path is listed in only one place, in the from or import
statement.
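For instance, with this chapter’s example path names (and a hypothetical func in the module):
from dir1.dir2 import mod # Copy the name out of the package once
import dir1.dir2.mod as mod # Or rename the full path to a short synonym
mod.func() # Either way, the path is spelled just once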
3. In Python 3.2 and earlier, each directory listed in an executed import or from state-
ment must contain an __init__.py file. Other directories, including the directory
that contains the leftmost component of a package path, do not need to include
this file.
4. You must use import instead of from with packages only if you need to access the
same name defined in more than one path. With import, the path makes the ref-
erences unique, but from allows only one version of any given name (unless you
also use the as extension to rename).
5. In Python 3.X, from mypkg import spam is an absolute import—the search for
mypkg skips the package directory and the module is located in an absolute directory
in sys.path. A statement from . import spam, on the other hand, is a relative import—spam is looked up relative to the package in which this statement is contained only. In Python 2.X, the absolute import searches the package directory first before
proceeding to sys.path; relative imports work as described.
6. A namespace package is an extension to the import model, available in Python 3.3
and later, that corresponds to one or more directories that do not have
__init__.py files. When Python finds these during an import search, and does not
find a simple module or regular package first, it creates a namespace package that
is the virtual concatenation of all found directories having the requested module
name. Further nested components are looked up in all the namespace package’s
directories. The effect is similar to a regular package, but content may be split across
multiple directories.
CHAPTER 25
Advanced Module Topics
This chapter concludes this part of the book with a collection of more advanced mod-
ule-related topics—data hiding, the __future__ module, the __name__ variable,
sys.path changes, listing tools, importing modules by name string, transitive reloads,
and so on—along with the standard set of gotchas and exercises related to what we’ve
covered in this part of the book.
Along the way, we’ll build some larger and more useful tools than we have so far that
combine functions and modules. Like functions, modules are more effective when their
interfaces are well defined, so this chapter also briefly reviews module design concepts,
some of which we have explored in prior chapters.
Despite the word “advanced” used in this chapter’s title for symmetry, this is mostly a
grab-bag assortment of additional module topics. Because some of the topics discussed
here are widely used—especially the __name__ trick—be sure to browse here before
moving on to classes in the next part of the book.
Module Design Concepts
Like functions, modules present design tradeoffs: you have to think about which func-
tions go in which modules, module communication mechanisms, and so on. All of this
will become clearer when you start writing bigger Python systems, but here are a few
general ideas to keep in mind:
You’re always in a module in Python. There’s no way to write code that doesn’t
live in some module. As mentioned briefly in Chapter 17 and Chapter 21, even
code typed at the interactive prompt really goes in a built-in module called
__main__; the only unique things about the interactive prompt are that code runs
and is discarded immediately, and expression results are printed automatically.
• Minimize module coupling: global variables. Like functions, modules work
best if they’re written to be closed boxes. As a rule of thumb, they should be as
independent of global variables used within other modules as possible, except for
functions and classes imported from them. The only things a module should share
with the outside world are the tools it uses, and the tools it defines.
• Maximize module cohesion: unified purpose. You can minimize a module's
couplings by maximizing its cohesion; if all the components of a module share a
general purpose, you’re less likely to depend on external names.
• Modules should rarely change other modules' variables. We illustrated this
with code in Chapter 17, but it’s worth repeating here: it’s perfectly OK to use
globals defined in another module (that’s how clients import services, after all),
but changing globals in another module is often a symptom of a design problem.
There are exceptions, of course, but you should try to communicate results through
devices such as function arguments and return values, not cross-module changes.
Otherwise, your globals’ values become dependent on the order of arbitrarily re-
mote assignments in other files, and your modules become harder to understand
and reuse.
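To make the last point concrete, here is a minimal sketch; the module and variable
names are hypothetical, not examples from this book:

# config.py (hypothetical)
tax_rate = 0.05

# client.py -- BAD: rebinds another module's global directly
import config
config.tax_rate = 0.07                 # Result depends on when this assignment runs

# client.py -- BETTER: pass state in and get results back
def total(amount, rate=0.05):
    return amount * (1 + rate)         # Caller controls the value explicitly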
As a summary, Figure 25-1 sketches the environment in which modules operate. Mod-
ules contain variables, functions, classes, and other modules (if imported). Functions
have local variables of their own, as do classes—objects that live within modules and
which we’ll begin studying in the next chapter. As we saw in Part IV, functions can
nest, too, but all are ultimately contained by modules at the top.
Figure 25-1. Module execution environment. Modules are imported, but modules also import and use
other modules, which may be coded in Python or another language such as C. Modules in turn contain
variables, functions, and classes to do their work, and their functions and classes may contain variables
and other items of their own. At the top, though, programs are just sets of modules.
Data Hiding in Modules
As we’ve seen, a Python module exports all the names assigned at the top level of its
file. There is no notion of declaring which names should and shouldn’t be visible out-
side the module. In fact, there’s no way to prevent a client from changing names inside
a module if it wants to.
In Python, data hiding in modules is a convention, not a syntactical constraint. If you
want to break a module by trashing its names, you can, but fortunately, I’ve yet to meet
a programmer for whom this was a life goal. Some purists object to this liberal attitude
toward data hiding, claiming that it means Python can’t implement encapsulation.
However, encapsulation in Python is more about packaging than about restricting.
We’ll expand this idea in the next part in relation to classes, which also have no privacy
syntax but can often emulate its effect in code.
Minimizing from * Damage: _X and __all__
As a special case, you can prefix names with a single underscore (e.g., _X) to prevent
them from being copied out when a client imports a module’s names with a from *
statement. This really is intended only to minimize namespace pollution; because from
* copies out all names, the importer may get more than it’s bargained for (including
names that overwrite names in the importer). Underscores aren’t “private” declara-
tions: you can still see and change such names with other import forms, such as the
import statement:
# unders.py
a, _b, c, _d = 1, 2, 3, 4
>>> from unders import * # Load non _X names only
>>> a, c
(1, 3)
>>> _b
NameError: name '_b' is not defined
>>> import unders # But other importers get every name
>>> unders._b
2
Alternatively, you can achieve a hiding effect similar to the _X naming convention by
assigning a list of variable name strings to the variable __all__ at the top level of the
module. When this feature is used, the from * statement will copy out only those names
listed in the __all__ list. In effect, this is the converse of the _X convention: __all__
identifies names to be copied, while _X identifies names not to be copied. Python looks
for an __all__ list in the module first and copies its names irrespective of any under-
scores; if __all__ is not defined, from * copies all names without a single leading un-
derscore:
# alls.py
__all__ = ['a', '_c'] # __all__ has precedence over _X
a, b, _c, _d = 1, 2, 3, 4
>>> from alls import * # Load __all__ names only
>>> a, _c
(1, 3)
>>> b
NameError: name 'b' is not defined
>>> from alls import a, b, _c, _d # But other importers get every name
>>> a, b, _c, _d
(1, 2, 3, 4)
>>> import alls
>>> alls.a, alls.b, alls._c, alls._d
(1, 2, 3, 4)
Like the _X convention, the __all__ list has meaning only to the from * statement form
and does not amount to a privacy declaration: other import statements can still access
all names, as the last two tests show. Still, module writers can use either technique to
implement modules that are well behaved when used with from *. See also the discus-
sion of __all__ lists in package __init__.py files in Chapter 24; there, these lists declare
submodules to be automatically loaded for a from * on their container.
Enabling Future Language Features: __future__
Changes to the language that may potentially break existing code are usually introduced
gradually in Python. They often initially appear as optional extensions, which are dis-
abled by default. To turn on such extensions, use a special import statement of this form:
from __future__ import featurename
When used in a script, this statement must appear as the first executable statement in
the file (possibly following a docstring or comment), because it enables special com-
pilation of code on a per-module basis. It’s also possible to submit this statement at the
interactive prompt to experiment with upcoming language changes; the feature will
then be available for the remainder of the interactive session.
For example, in this book we’ve seen how to use this statement in Python 2.X to activate
3.X true division in Chapter 5, 3.X print calls in Chapter 11, and 3.X absolute imports
for packages in Chapter 24. Prior editions of this book used this statement form to
demonstrate generator functions, which required a keyword that was not yet enabled
by default (they use a featurename of generators).
All of these changes have the potential to break existing code in Python 2.X, so they
were phased in gradually or offered as optional extensions, enabled with this special
import. At the same time, some are available to allow you to write code that is forward
compatible with later releases you may port to someday.
For a list of futurisms you may import and turn on this way, run a dir call on the
__future__ module after importing it, or see its library manual entry. Per its documen-
tation, none of its feature names will ever be removed, so it’s safe to leave in a
__future__ import even in code run by a version of Python where the feature is present
normally.
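For instance, here is the sort of check you might run interactively (output abridged,
and the exact set of names varies per Python version):

>>> import __future__
>>> dir(__future__)
['CO_FUTURE_ABSOLUTE_IMPORT', ...more deleted..., 'absolute_import', 'division',
'generators', 'print_function', 'unicode_literals', ...more deleted...]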
Mixed Usage Modes: __name__ and __main__
Our next module-related trick lets you both import a file as a module and run it as a
standalone program, and is widely used in Python files. It’s actually so simple that some
miss the point at first: each module has a built-in attribute called __name__, which
Python creates and assigns automatically as follows:
• If the file is being run as a top-level program file, __name__ is set to the string
"__main__" when it starts.
• If the file is being imported instead, __name__ is set to the module's name as known
by its clients.
The upshot is that a module can test its own __name__ to determine whether it’s being
run or imported. For example, suppose we create the following module file, named
runme.py, to export a single function called tester:
def tester():
    print("It's Christmas in Heaven...")

if __name__ == '__main__':              # Only when run
    tester()                            # Not when imported
This module defines a function for clients to import and use as usual:
c:\code> python
>>> import runme
>>> runme.tester()
It's Christmas in Heaven...
But the module also includes code at the bottom that is set up to call the function
automatically when this file is run as a program:
c:\code> python runme.py
It's Christmas in Heaven...
In effect, a module’s __name__ variable serves as a usage mode flag, allowing its code to
be leveraged as both an importable library and a top-level script. Though simple, you’ll
see this hook used in the majority of the Python program files you are likely to encounter
in the wild—both for testing and dual usage.
For instance, perhaps the most common way you’ll see the __name__ test applied is for
self-test code. In short, you can package code that tests a module’s exports in the module
itself by wrapping it in a __name__ test at the bottom of the file. This way, you can use
the file in clients by importing it, but also test its logic by running it from the system
shell or via another launching scheme.
Coding self-test code at the bottom of a file under the __name__ test is probably the most
common and simplest unit-testing protocol in Python. It’s much more convenient than
retyping all your tests at the interactive prompt. (Chapter 36 will discuss other com-
monly used options for testing Python code—as you’ll see, the unittest and doctest
standard library modules provide more advanced testing tools.)
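As a tiny preview of one of those options (a sketch, not this book's example), the
doctest module can verify expected-output examples embedded in docstrings,
launched under the same __name__ hook:

def double(x):
    """
    >>> double(21)
    42
    """
    return 2 * x

if __name__ == '__main__':
    import doctest
    doctest.testmod()                   # Rerun and check the docstring examples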
In addition, the __name__ trick is also commonly used when you’re writing files that
can be used both as command-line utilities and as tool libraries. For instance, suppose
you write a file-finder script in Python. You can get more mileage out of your code if
you package it in functions and add a __name__ test in the file to automatically call those
functions when the file is run standalone. That way, the script’s code becomes reusable
in other programs.
Unit Tests with __name__
In fact, we’ve already seen a prime example in this book of an instance where the
__name__ check could be useful. In the section on arguments in Chapter 18, we coded
a script that computed the minimum value from the set of arguments sent in (this was
the file minmax.py in “The min Wakeup Call!”):
def minmax(test, *args):
    res = args[0]
    for arg in args[1:]:
        if test(arg, res):
            res = arg
    return res

def lessthan(x, y): return x < y
def grtrthan(x, y): return x > y

print(minmax(lessthan, 4, 2, 1, 5, 6, 3))     # Self-test code
print(minmax(grtrthan, 4, 2, 1, 5, 6, 3))
This script includes self-test code at the bottom, so we can test it without having to
retype everything at the interactive command line each time we run it. The problem
with the way it is currently coded, however, is that the output of the self-test call will
appear every time this file is imported from another file to be used as a tool—not exactly
a user-friendly feature! To improve it, we can wrap up the self-test call in a __name__
check, so that it will be launched only when the file is run as a top-level script, not when
it is imported (this new version of the module file is renamed minmax2.py here):
print('I am:', __name__)

def minmax(test, *args):
    res = args[0]
    for arg in args[1:]:
        if test(arg, res):
            res = arg
    return res

def lessthan(x, y): return x < y
def grtrthan(x, y): return x > y

if __name__ == '__main__':
    print(minmax(lessthan, 4, 2, 1, 5, 6, 3))     # Self-test code
    print(minmax(grtrthan, 4, 2, 1, 5, 6, 3))
We’re also printing the value of __name__ at the top here to trace its value. Python creates
and assigns this usage-mode variable as soon as it starts loading a file. When we run
this file as a top-level script, its name is set to __main__, so its self-test code kicks in
automatically:
c:\code> python minmax2.py
I am: __main__
1
6
If we import the file, though, its name is not __main__, so we must explicitly call the
function to make it run:
c:\code> python
>>> import minmax2
I am: minmax2
>>> minmax2.minmax(minmax2.lessthan, 's', 'p', 'a', 'a')
'a'
Again, regardless of whether this is used for testing, the net effect is that we get to use
our code in two different roles—as a library module of tools, or as an executable pro-
gram.
Per Chapter 24’s discussion of package relative imports, this section’s
technique can also have some implications for imports run by files that
are also used as package components in 3.X, but can still be leveraged
with absolute package path imports and other techniques. See the prior
chapter’s discussion and example for more details.
Example: Dual Mode Code
Here’s a more substantial module example that demonstrates another way that the
prior section’s __name__ trick is commonly employed. The following module, for-
mats.py, defines string formatting utilities for importers, but also checks its name to
see if it is being run as a top-level script; if so, it tests and uses arguments listed on the
system command line to run a canned or passed-in test. In Python, the sys.argv list
contains command-line arguments—it is a list of strings reflecting words typed on the
command line, where the first item is always the name of the script being run. We used
this in Chapter 21’s benchmark tool as switches, but leverage it as a general input
mechanism here:
#!python
"""
File: formats.py (2.X and 3.X)
Various specialized string display formatting utilities.
Test me with canned self-test or command-line arguments.
To do: add parens for negative money, add more features.
"""

def commas(N):
    """
    Format positive integer-like N for display with
    commas between digit groupings: "xxx,yyy,zzz".
    """
    digits = str(N)
    assert(digits.isdigit())
    result = ''
    while digits:
        digits, last3 = digits[:-3], digits[-3:]
        result = (last3 + ',' + result) if result else last3
    return result

def money(N, numwidth=0, currency='$'):
    """
    Format number N for display with commas, 2 decimal digits,
    leading $ and sign, and optional padding: "$ -xxx,yyy.zz".
    numwidth=0 for no space padding, currency='' to omit symbol,
    and non-ASCII for others (e.g., pound=u'\xA3' or u'\u00A3').
    """
    sign = '-' if N < 0 else ''
    N = abs(N)
    whole = commas(int(N))
    fract = ('%.2f' % N)[-2:]
    number = '%s%s.%s' % (sign, whole, fract)
    return '%s%*s' % (currency, numwidth, number)

if __name__ == '__main__':
    def selftest():
        tests = 0, 1                           # fails: -1, 1.23
        tests += 12, 123, 1234, 12345, 123456, 1234567
        tests += 2 ** 32, 2 ** 100
        for test in tests:
            print(commas(test))

        print('')
        tests = 0, 1, -1, 1.23, 1., 1.2, 3.14159
        tests += 12.34, 12.344, 12.345, 12.346
        tests += 2 ** 32, (2 ** 32 + .2345)
        tests += 1.2345, 1.2, 0.2345
        tests += -1.2345, -1.2, -0.2345
        tests += -(2 ** 32), -(2 ** 32 + .2345)
        tests += (2 ** 100), -(2 ** 100)
        for test in tests:
            print('%s [%s]' % (money(test, 17), test))

    import sys
    if len(sys.argv) == 1:
        selftest()
    else:
        print(money(float(sys.argv[1]), int(sys.argv[2])))
This file works identically in Python 2.X and 3.X. When run directly, it tests itself as
before, but it uses options on the command line to control the test behavior. Run this
file directly with no command-line arguments on your own to see what its self-test code
prints—it’s too extensive to list in full here:
c:\code> python formats.py
0
1
12
123
1,234
12,345
123,456
1,234,567
...etc...
To test specific strings, pass them in on the command line along with a minimum field
width; the script’s __main__ code passes them on to its money function, which in turn
runs commas:
C:\code> python formats.py 999999999 0
$999,999,999.00
C:\code> python formats.py -999999999 0
$-999,999,999.00
C:\code> python formats.py 123456789012345 0
$123,456,789,012,345.00
C:\code> python formats.py -123456789012345 25
$  -123,456,789,012,345.00
C:\code> python formats.py 123.456 0
$123.46
C:\code> python formats.py -123.454 0
$-123.45
As before, because this code is instrumented for dual-mode usage, we can also import
its tools normally to reuse them as library components in scripts, modules, and the
interactive prompt:
>>> from formats import money, commas
>>> money(123.456)
'$123.46'
>>> money(-9999999.99, 15)
'$  -9,999,999.99'
>>> X = 99999999999999999999
>>> '%s (%s)' % (commas(X), X)
'99,999,999,999,999,999,999 (99999999999999999999)'
You can use command-line arguments in ways similar to this example to provide gen-
eral inputs to scripts that may also package their code as functions and classes for reuse
by importers. For more advanced command-line processing, see “Python Command-
Line Arguments” on page 1432 in Appendix A, and the getopt, optparse, and
argparse modules' documentation in Python's standard library manual. In some scenarios,
you might also use the built-in input function, used in Chapter 3 and Chapter 10, to
prompt the shell user for test inputs instead of pulling them from the command line.
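As a hedged sketch of one of those options, here is roughly how argparse might
replace this script's raw sys.argv handling; the file name is hypothetical, and
formats.py itself does not use this:

# moneyargs.py: hypothetical argparse variant of formats.py's __main__ logic
import argparse
from formats import money

parser = argparse.ArgumentParser(description='Format a number as money')
parser.add_argument('number', type=float)          # Value to format
parser.add_argument('numwidth', type=int)          # Minimum field width
args = parser.parse_args()
print(money(args.number, args.numwidth))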
Also see Chapter 7’s discussion of the new {,d} string format method
syntax added in Python 2.7 and 3.1; this formatting extension separates
thousands groups with commas much like the code here. The module
listed here, though, adds money formatting, can be changed, and serves
as a manual alternative for comma insertions in earlier Pythons.
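For reference, that built-in alternative looks like this in 2.7, 3.1, and later:

>>> '{:,d}'.format(1234567)
'1,234,567'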
Currency Symbols: Unicode in Action
This module’s money function defaults to dollars, but supports other currency symbols
by allowing you to pass in non-ASCII Unicode characters. The Unicode ordinal with
hexadecimal value 00A3, for example, is the pound symbol, and 00A5 is the yen. You
can code these in a variety of forms, as:
• The character's decoded Unicode code point ordinal (integer) in a text string,
with either Unicode or hex escapes (for 2.X compatibility, use a leading u in such
string literals in Python 3.3)
• The character's raw encoded form in a byte string that is decoded before being
passed, with hex escapes (for 3.X compatibility, use a leading b in such string
literals in Python 2.X)
• The actual character itself in your program's text, along with a source code
encoding declaration
We previewed Unicode in Chapter 4 and will get into more details in Chapter 37, but
its basic requirements here are fairly simple, and serve as a decent use case. To test
alternative currencies, I typed the following in a file, formats_currency.py, because it
was too much to reenter interactively on changes:
from __future__ import print_function # 2.X
from formats import money
X = 54321.987
print(money(X), money(X, 0, ''))
print(money(X, currency=u'\xA3'), money(X, currency=u'\u00A5'))
print(money(X, currency=b'\xA3'.decode('latin-1')))
print(money(X, currency=u'\u20AC'), money(X, 0, b'\xA4'.decode('iso-8859-15')))
print(money(X, currency=b'\xA4'.decode('latin-1')))
The following gives this test file’s output in Python 3.3 in IDLE, and in other contexts
configured properly. It works the same in 2.X because it prints and codes strings port-
ably. Per Chapter 11, a __future__ import enables 3.X print calls in 2.X. And as intro-
duced in Chapter 4, 3.X b'...' bytes literals are taken as simple strings in 2.X, and 2.X
u'...' Unicode literals are treated as normal strings in 3.X as of 3.3.
$54,321.99 54,321.99
£54,321.99 ¥54,321.99
£54,321.99
€54,321.99 €54,321.99
¤54,321.99
If this works on your computer, you can probably skip the next few paragraphs. De-
pending on your interface and system settings, though, getting this to run and display
properly may require additional steps. On my machine, it behaves correctly when
Python and the display medium are in sync, but the euro and generic currency symbols
in the last two lines fail with errors in a basic Command Prompt on Windows.
Specifically, this test script always runs and produces the output shown in the IDLE
GUI in both 3.X and 2.X, because Unicode-to-glyph mappings are handled well. It also
works as advertised in 3.X on Windows if you redirect the output to a file and open it
with Notepad, because 3.X encodes content on this platform in a default Windows
format that Notepad understands:
c:\code> formats_currency.py > temp
c:\code> notepad temp
However, this doesn’t work in 2.X, because Python tries to encode printed text as ASCII
by default. To show all the non-ASCII characters in a Windows Command Prompt
window directly, on some computers you may need to change the Windows code
page (used to render characters) as well as Python’s PYTHONIOENCODING environment
variable (used as the encoding of text in standard streams, including the translation of
characters to bytes when they are printed) to a common Unicode format such as UTF-8:
c:\code> chcp 65001 # Console matches Python
c:\code> set PYTHONIOENCODING=utf-8 # Python matches console
c:\code> formats_currency.py > temp # Both 3.X and 2.X write UTF-8 text
c:\code> type temp # Console displays it properly
c:\code> notepad temp # Notepad recognizes UTF-8 too
You may not need to take these steps on some platforms and even on some Windows
distributions. I did because my laptop’s code page is set to 437 (U.S. characters), but
your code pages may vary.
Subtly, the only reason this test works on Python 2.X at all is because 2.X allows normal
and Unicode strings to be mixed, as long as the normal string is all 7-bit ASCII char-
acters. On 3.3, the 2.X u'...' Unicode literal is supported for compatibility, but taken
the same as normal '...' strings, which are always Unicode (removing the leading u
makes the test work in 3.0 through 3.2 too, but breaks 2.X compatibility):
c:\code> py -2
>>> print u'\xA5' + '1', '%s2' % u'\u00A3' # 2.X: unicode/str mix for ASCII str
¥1 £2
c:\code> py -3
>>> print(u'\xA5' + '1', '%s2' % u'\u00A3') # 3.X: str is Unicode, u'' optional
¥1 £2
>>> print('\xA5' + '1', '%s2' % '\u00A3')
¥1 £2
Again, there’s much more on Unicode in Chapter 37—a topic many see as peripheral,
but which can crop up even in relatively simple contexts like this! The takeaway point
here is that, operational issues aside, a carefully coded script can often manage to sup-
port Unicode in both 3.X and 2.X.
Docstrings: Module Documentation at Work
Finally, because this example’s main file uses the docstring feature introduced in Chap-
ter 15, we can use the help function or PyDoc’s GUI/browser modes to explore its tools
as well—modules are almost automatically general-purpose tools. Here’s help at work;
Figure 25-2 gives the PyDoc view on our file.
>>> import formats
>>> help(formats)
Help on module formats:

NAME
    formats

DESCRIPTION
    File: formats.py (2.X and 3.X)
    Various specialized string display formatting utilities.
    Test me with canned self-test or command-line arguments.
    To do: add parens for negative money, add more features.

FUNCTIONS
    commas(N)
        Format positive integer-like N for display with
        commas between digit groupings: "xxx,yyy,zzz".

    money(N, numwidth=0, currency='$')
        Format number N for display with commas, 2 decimal digits,
        leading $ and sign, and optional padding: "$ -xxx,yyy.zz".
        numwidth=0 for no space padding, currency='' to omit symbol,
        and non-ASCII for others (e.g., pound=u'£' or u'£').

FILE
    c:\code\formats.py
Changing the Module Search Path
Let’s return to more general module topics. In Chapter 22, we learned that the module
search path is a list of directories that can be customized via the environment variable
PYTHONPATH, and possibly via .pth files. What I haven’t shown you until now is how a
Python program itself can actually change the search path by changing the built-in
sys.path list. Per Chapter 22, sys.path is initialized on startup, but thereafter you can
delete, append, and reset its components however you like:
>>> import sys
>>> sys.path
['', 'c:\\temp', 'C:\\Windows\\system32\\python33.zip', ...more deleted...]
>>> sys.path.append('C:\\sourcedir') # Extend module search path
>>> import string # All imports search the new dir last
Once you’ve made such a change, it will impact all future imports anywhere while a
Python program runs, as all importers share the same single sys.path list (there’s only
one copy of a given module in memory during a program’s run—that’s why reload
exists). In fact, this list may be changed arbitrarily:
>>> sys.path = [r'd:\temp'] # Change module search path
>>> sys.path.append('c:\\lp5e\\examples') # For this run (process) only
>>> sys.path.insert(0, '..')
>>> sys.path
['..', 'd:\\temp', 'c:\\lp5e\\examples']
>>> import string
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'string'
Thus, you can use this technique to dynamically configure a search path inside a Python
program. Be careful, though: if you delete a critical directory from the path, you may
lose access to critical utilities. In the prior example, for instance, we no longer have
access to the string module because we deleted the Python source library's directory
from the path!

Figure 25-2. PyDoc's view of formats.py, obtained by running a “py -3 -m pydoc -b” command line
in 3.2 and later and clicking on the file's index entry (see Chapter 15)
Also, remember that such sys.path settings endure for only as long as the Python ses-
sion or program (technically, process) that made them runs; they are not retained after
Python exits. By contrast, PYTHONPATH and .pth file path configurations live in the op-
erating system instead of a running Python program, and so are more global: they are
picked up by every program on your machine and live on after a program completes.
On some systems, the former can be per-user and the latter can be installation-wide.
The as Extension for import and from
Both the import and from statements were eventually extended to allow an imported
name to be given a different name in your script. We’ve used this extension earlier, but
here are some additional details: the following import statement:
import modulename as name # And use name, not modulename
is equivalent to the following, which renames the module in the importer’s scope only
(it’s still known by its original name to other files):
import modulename
name = modulename
del modulename # Don't keep original name
After such an import, you can—and in fact must—use the name listed after the as to
refer to the module. This works in a from statement, too, to assign a name imported
from a file to a different name in the importer’s scope; as before you get only the new
name you provide, not its original:
from modulename import attrname as name # And use name, not attrname
As discussed in Chapter 23, this extension is commonly used to provide short syno-
nyms for longer names, and to avoid name clashes when you are already using a name
in your script that would otherwise be overwritten by a normal import statement:
import reallylongmodulename as name # Use shorter nickname
name.func()
from module1 import utility as util1 # Can have only 1 "utility"
from module2 import utility as util2
util1(); util2()
It also comes in handy for providing a short, simple name for an entire directory path
and avoiding name collisions when using the package import feature described in
Chapter 24:
import dir1.dir2.mod as mod # Only list full path once
mod.func()
from dir1.dir2.mod import func as modfunc # Rename to make unique if needed
modfunc()
This is also something of a hedge against name changes: if a new release of a library
renames a module or tool your code uses extensively, or provides a new alternative
you’d rather use instead, you can simply rename it to its prior name on import to avoid
breaking your code:
import newname as oldname
from library import newname as oldname
...and keep happily using oldname until you have time to update all your code...
For example, this approach can address some 3.X library changes (e.g., 3.X’s tkinter
versus 2.X’s Tkinter), though they’re often substantially more than just a new name!
Example: Modules Are Objects
Because modules expose most of their interesting properties as built-in attributes, it’s
easy to write programs that manage other programs. We usually call such manager
programs metaprograms because they work on top of other systems. This is also referred
to as introspection, because programs can see and process object internals. Introspec-
tion is a somewhat advanced feature, but it can be useful for building programming
tools.
For instance, to get to an attribute called name in a module called M, we can use attribute
qualification or index the module’s attribute dictionary, exposed in the built-in
__dict__ attribute we met in Chapter 23. Python also exports the list of all loaded
modules as the sys.modules dictionary and provides a built-in called getattr that lets
us fetch attributes from their string names—it’s like saying object.attr, but attr is an
expression that yields a string at runtime. Because of that, all the following expressions
reach the same attribute and object:1
M.name # Qualify object by attribute
M.__dict__['name'] # Index namespace dictionary manually
sys.modules['M'].name # Index loaded-modules table manually
getattr(M, 'name') # Call built-in fetch function
By exposing module internals like this, Python helps you build programs about pro-
grams. For example, here is a module named mydir.py that puts these ideas to work to
implement a customized version of the built-in dir function. It defines and exports a
function called listing, which takes a module object as an argument and prints a for-
matted listing of the module’s namespace sorted by name:
1. As we saw briefly in “Other Ways to Access Globals” in Chapter 17, because a function can access its
enclosing module by going through the sys.modules table like this, it can also be used to emulate the effect
of the global statement. For instance, the effect of global X; X=0 can be simulated (albeit with much
more typing!) by saying this inside a function: import sys; glob=sys.modules[__name__]; glob.X=0.
Remember, each module gets a __name__ attribute for free; it’s visible as a global name inside the functions
within the module. This trick provides another way to change both local and global variables of the same
name inside a function.
#!python
"""
mydir.py: a module that lists the namespaces of other modules
"""
from __future__ import print_function         # 2.X compatibility

seplen = 60
sepchr = '-'

def listing(module, verbose=True):
    sepline = sepchr * seplen
    if verbose:
        print(sepline)
        print('name:', module.__name__, 'file:', module.__file__)
        print(sepline)

    count = 0
    for attr in sorted(module.__dict__):      # Scan namespace keys (or enumerate)
        print('%02d) %s' % (count, attr), end=' ')
        if attr.startswith('__'):
            print('<built-in name>')          # Skip __file__, etc.
        else:
            print(getattr(module, attr))      # Same as .__dict__[attr]
        count += 1

    if verbose:
        print(sepline)
        print(module.__name__, 'has %d names' % count)
        print(sepline)

if __name__ == '__main__':
    import mydir
    listing(mydir)                            # Self-test code: list myself
Notice the docstring at the top; as in the prior formats.py example, because we may
want to use this as a general tool, the docstring provides functional information acces-
sible via help and GUI/browser mode of PyDoc—a tool that uses similar introspection
tools to do its job. A self-test is also provided at the bottom of this module, which
narcissistically imports and lists itself. Here’s the sort of output produced in Python
3.3; this script works on 2.X too (where it may list fewer names) because it prints from
the __future__:
c:\code> py -3 mydir.py
------------------------------------------------------------
name: mydir file: c:\code\mydir.py
------------------------------------------------------------
00) __builtins__ <built-in name>
01) __cached__ <built-in name>
02) __doc__ <built-in name>
03) __file__ <built-in name>
04) __initializing__ <built-in name>
05) __loader__ <built-in name>
06) __name__ <built-in name>
07) __package__ <built-in name>
08) listing <function listing at 0x000000000295B488>
09) print_function _Feature((2, 6, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 65536)
10) sepchr -
11) seplen 60
------------------------------------------------------------
mydir has 12 names
------------------------------------------------------------
To use this as a tool for listing other modules, simply pass the modules in as objects to
this file’s function. Here it is listing attributes in the tkinter GUI module in the standard
library (a.k.a. Tkinter in Python 2.X); it will technically work on any object with
__name__, __file__, and __dict__ attributes:
>>> import mydir
>>> import tkinter
>>> mydir.listing(tkinter)
------------------------------------------------------------
name: tkinter file: C:\Python33\lib\tkinter\__init__.py
------------------------------------------------------------
00) ACTIVE active
01) ALL all
02) ANCHOR anchor
03) ARC arc
04) At <function At at 0x0000000002BD41E0>
...many more names omitted...
156) image_types <function image_types at 0x0000000002BE2378>
157) mainloop <function mainloop at 0x0000000002BCBBF8>
158) sys <module 'sys' (built-in)>
159) wantobjects 1
160) warnings <module 'warnings' from 'C:\\Python33\\lib\\warnings.py'>
------------------------------------------------------------
tkinter has 161 names
------------------------------------------------------------
We’ll meet getattr and its relatives again later. The point to notice here is that mydir
is a program that lets you browse other programs. Because Python exposes its internals,
you can process objects generically.2
Importing Modules by Name String
The module name in an import or from statement is a hardcoded variable name. Some-
times, though, your program will get the name of a module to be imported as a string
at runtime—from a user selection in a GUI, or a parse of an XML document, for in-
stance. Unfortunately, you can’t use import statements directly to load a module given
its name as a string—Python expects a variable name that’s taken literally and not
evaluated, not a string or expression. For instance:
2. You can preload tools such as mydir.listing and the reloader we’ll meet in a moment into the interactive
namespace by importing them in the file referenced by the PYTHONSTARTUP environment variable. Because
code in the startup file runs in the interactive namespace (module __main__), importing common tools in
the startup file can save you some typing. See Appendix A for more details.
>>> import 'string'
  File "<stdin>", line 1
    import 'string'
           ^
SyntaxError: invalid syntax
It also won’t work to simply assign the string to a variable name:
x = 'string'
import x
Here, Python will try to import a file x.py, not the string module—the name in an
import statement both becomes a variable assigned to the loaded module and identifies
the external file literally.
Running Code Strings
To get around this, you need to use special tools to load a module dynamically from a
string that is generated at runtime. The most general approach is to construct an
import statement as a string of Python code and pass it to the exec built-in function to
run (exec is a statement in Python 2.X, but it can be used exactly as shown here—the
parentheses are simply ignored):
>>> modname = 'string'
>>> exec('import ' + modname) # Run a string of code
>>> string # Imported in this namespace
<module 'string' from 'C:\\Python33\\lib\\string.py'>
We met the exec function (and its cousin for expressions, eval) earlier, in Chapter 3
and Chapter 10. It compiles a string of code and passes it to the Python interpreter to
be executed. In Python, the byte code compiler is available at runtime, so you can write
programs that construct and run other programs like this. By default, exec runs the
code in the current scope, but you can get more specific by passing in optional name-
space dictionaries if needed. It also has security issues noted earlier in the book, which
may be minor in a code string you are building yourself.
Direct Calls: Two Options
The only real drawback to exec here is that it must compile the import statement each
time it runs, and compiling can be slow. Precompiling to byte code with the compile
built-in may help for code strings run many times, but in most cases it’s probably
simpler and may run quicker to use the built-in __import__ function to load from a
name string instead, as noted in Chapter 22. The effect is similar, but __import__ returns
the module object, so assign it to a name here to keep it:
>>> modname = 'string'
>>> string = __import__(modname)
>>> string
<module 'string' from 'C:\\Python33\\lib\\string.py'>
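As a sketch of the precompiling option just mentioned, the compile built-in turns
the statement string into a code object once, so that repeated runs skip the per-run
compile step:

>>> code = compile('import ' + modname, '<string>', 'exec')
>>> exec(code)                        # Rerun the code object: no recompile
>>> string
<module 'string' from 'C:\\Python33\\lib\\string.py'>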
As also noted in Chapter 22, the newer call importlib.import_module does the same
work, and is generally preferred in more recent Pythons for direct calls to import by
name string—at least per the current “official” policy stated in Python’s manuals:
>>> import importlib
>>> modname = 'string'
>>> string = importlib.import_module(modname)
>>> string
<module 'string' from 'C:\\Python33\\lib\\string.py'>
The import_module call takes a module name string, and an optional second argument
that gives the package used as the anchor point for resolving relative imports, which
defaults to None. This call works the same as __import__ in its basic roles, but see
Python’s manuals for more details.
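For instance, a package-relative lookup might be coded like the following sketch,
assuming a hypothetical package mypkg that contains a module spam:

>>> import importlib
>>> spam = importlib.import_module('.spam', 'mypkg')   # Similar to: from . import spam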
Though both calls still work, in Pythons where both are available, the original
__import__ is generally intended for customizing import operations by reassignment in
the built-in scope (and any future changes in “official” policy are beyond the scope of
this book!).
Example: Transitive Module Reloads
This section develops a module tool that ties together and applies some earlier topics,
and serves as a larger case study to close out this chapter and part. We studied module
reloads in Chapter 23, as a way to pick up changes in code without stopping and re-
starting a program. When you reload a module, though, Python reloads only that par-
ticular module’s file; it doesn’t automatically reload modules that the file being reloaded
happens to import.
For example, if you reload some module A, and A imports modules B and C, the reload
applies only to A, not to B and C. The statements inside A that import B and C are rerun
during the reload, but they just fetch the already loaded B and C module objects (as-
suming they’ve been imported before). In actual yet abstract code, here’s the file A.py:
# A.py
import B # Not reloaded when A is!
import C # Just an import of an already loaded module: no-ops
% python
>>> . . .
>>> from imp import reload
>>> reload(A)
By default, this means that you cannot depend on reloads to pick up changes in all the
modules in your program transitively—instead, you must use multiple reload calls to
update the subcomponents independently. This can require substantial work for large
systems you’re testing interactively. You can design your systems to reload their sub-
components automatically by adding reload calls in parent modules like A, but this
complicates the modules’ code.
A Recursive Reloader
A better approach is to write a general tool to do transitive reloads automatically by
scanning modules’ __dict__ namespace attributes and checking each item’s type to
find nested modules to reload. Such a utility function could call itself recursively to
navigate arbitrarily shaped and deep import dependency chains. Module __dict__ at-
tributes were introduced in Chapter 23 and employed earlier in this chapter, and the
type call was presented in Chapter 9; we just need to combine the two tools.
The module reloadall.py listed next defines a reload_all function that automatically
reloads a module, every module that the module imports, and so on, all the way to the
bottom of each import chain. It uses a dictionary to keep track of already reloaded
modules, recursion to walk the import chains, and the standard library’s types module,
which simply predefines type results for built-in types. The visited dictionary techni-
que works to avoid cycles here when imports are recursive or redundant, because mod-
ule objects are immutable and so can be dictionary keys; as we learned in Chapter 5
and Chapter 8, a set would offer similar functionality if we use visited.add(module) to
insert:
#!python
"""
reloadall.py: transitively reload nested modules (2.X + 3.X).
Call reload_all with one or more imported module objects.
"""
import types
from imp import reload                        # from required in 3.X

def status(module):
    print('reloading ' + module.__name__)

def tryreload(module):
    try:
        reload(module)                        # 3.3 (only?) fails on some
    except:
        print('FAILED: %s' % module)

def transitive_reload(module, visited):
    if not module in visited:                 # Trap cycles, duplicates
        status(module)                        # Reload this module
        tryreload(module)                     # And visit children
        visited[module] = True
        for attrobj in module.__dict__.values():     # For all attrs
            if type(attrobj) == types.ModuleType:    # Recur if module
                transitive_reload(attrobj, visited)

def reload_all(*args):
    visited = {}                              # Main entry point
    for arg in args:                          # For all passed in
        if type(arg) == types.ModuleType:
            transitive_reload(arg, visited)

def tester(reloader, modname):                # Self-test code
    import importlib, sys                     # Import on tests only
    if len(sys.argv) > 1: modname = sys.argv[1]      # Command line (or passed)
    module = importlib.import_module(modname)        # Import by name string
    reloader(module)                          # Test passed-in reloader

if __name__ == '__main__':
    tester(reload_all, 'reloadall')           # Test: reload myself?
Besides namespace dictionaries, this script makes use of other tools we’ve studied here:
it includes a __name__ test to launch self-test code when run as a top-level script only,
and its tester function uses sys.argv to inspect command-line arguments and
importlib to import a module by name string passed in as a function or command-line ar-
gument. One curious bit: notice how this code must wrap the basic reload call in a
try statement to catch exceptions—in Python 3.3, reloads sometimes fail due to a re-
write of the import machinery. The try was previewed in Chapter 10, and is covered
in full in Part VII.
Testing recursive reloads
Now, to leverage this utility for normal use, import its reload_all function and pass it
an already loaded module object—just as you would for the built-in reload function.
When the file runs standalone, its self-test code calls reload_all automatically, reload-
ing its own module by default if no command-line arguments are used. In this mode,
the module must import itself because its own name is not defined in the file without
an import. This code works in both 3.X and 2.X because we’ve used + and % instead of
a comma in the prints, though the set of modules used and thus reloaded may vary
across lines:
C:\code> c:\Python33\python reloadall.py
reloading reloadall
reloading types
c:\code> C:\Python27\python reloadall.py
reloading reloadall
reloading types
With a command-line argument, the tester instead reloads the given module by its name
string—here, the benchmark module we coded in Chapter 21. Note that we give a
module name in this mode, not a filename (as for import statements, don’t include
the .py extension); the script ultimately imports the module using the module search
path as usual:
c:\code> reloadall.py pybench
reloading pybench
reloading timeit
reloading itertools
reloading sys
reloading time
reloading gc
reloading os
reloading errno
reloading ntpath
reloading stat
reloading genericpath
reloading copyreg
Perhaps most commonly, we can also deploy this module at the interactive prompt—
here, in 3.3 for some standard library modules. Notice how os is imported by
tkinter, but tkinter reaches sys before os can (if you want to test this on Python 2.X,
substitute Tkinter for tkinter):
>>> from reloadall import reload_all
>>> import os, tkinter
>>> reload_all(os) # Normal usage mode
reloading os
reloading ntpath
reloading stat
reloading sys
reloading genericpath
reloading errno
reloading copyreg
>>> reload_all(tkinter)
reloading tkinter
reloading _tkinter
reloading warnings
reloading sys
reloading linecache
reloading tokenize
reloading builtins
FAILED: <module 'builtins'>
reloading re
...etc...
reloading os
reloading ntpath
reloading stat
reloading genericpath
reloading errno
...etc...
And finally here is a session that shows the effect of normal versus transitive reloads—
changes made to the two nested files are not picked up by reloads, unless the transitive
utility is used:
import b # File a.py
X = 1
import c # File b.py
Y = 2
Z = 3 # File c.py
C:\code> py -3
>>> import a
>>> a.X, a.b.Y, a.b.c.Z
(1, 2, 3)
# Without stopping Python, change all three files' assignment values and save
>>> from imp import reload
>>> reload(a) # Built-in reload is top level only
<module 'a' from '.\\a.py'>
>>> a.X, a.b.Y, a.b.c.Z
(111, 2, 3)
>>> from reloadall import reload_all
>>> reload_all(a) # Normal usage mode
reloading a
reloading b
reloading c
>>> a.X, a.b.Y, a.b.c.Z # Reloads all nested modules too
(111, 222, 333)
Study the reloader’s code and results for more on its operation. The next section exer-
cises its tools further.
Alternative Codings
For all the recursion fans in the audience, the following lists an alternative recursive
coding for the function in the prior section—it uses a set instead of a dictionary to detect
cycles, is marginally more direct because it eliminates a top-level loop, and serves to
illustrate recursive function techniques in general (compare with the original to see how
this differs). This version also gets some of its work for free from the original, though
the order in which it reloads modules might vary if namespace dictionary order does
too:
"""
reloadall2.py: transitively reload nested modules (alternative coding)
"""
import types
from imp import reload # from required in 3.X
from reloadall import status, tryreload, tester
def transitive_reload(objects, visited):
for obj in objects:
if type(obj) == types.ModuleType and obj not in visited:
status(obj)
tryreload(obj) # Reload this, recur to attrs
visited.add(obj)
transitive_reload(obj.__dict__.values(), visited)
def reload_all(*args):
transitive_reload(args, set())
if __name__ == '__main__':
tester(reload_all, 'reloadall2') # Test code: reload myself?
As we saw in Chapter 19, there is usually an explicit stack or queue equivalent to most
recursive functions, which may be preferable in some contexts. The following is one
such transitive reloader; it uses a generator expression to filter out nonmodules and
modules already visited in the current module’s namespace. Because it both pops and
adds items at the end of its list, it is stack based, though the order of both pushes and
dictionary values influences the order in which it reaches and reloads modules—it visits
submodules in namespace dictionaries from right to left, unlike the left-to-right order
of the recursive versions (trace through the code to see how). We could change this,
but dictionary order is arbitrary anyhow.
"""
reloadall3.py: transitively reload nested modules (explicit stack)
"""
import types
from imp import reload # from required in 3.X
from reloadall import status, tryreload, tester
def transitive_reload(modules, visited):
while modules:
next = modules.pop() # Delete next item at end
status(next) # Reload this, push attrs
tryreload(next)
visited.add(next)
modules.extend(x for x in next.__dict__.values()
if type(x) == types.ModuleType and x not in visited)
def reload_all(*modules):
transitive_reload(list(modules), set())
if __name__ == '__main__':
tester(reload_all, 'reloadall3') # Test code: reload myself?
If the recursion and nonrecursion used in this example is confusing, see the discussion
of recursive functions in Chapter 19 for background on the subject.
Testing reload variants
To prove that these work the same, let’s test all three of our reloader variants. Thanks
to their common testing function, we can run all three from a command line both with
no arguments to test the module reloading itself, and with the name of a module to be
reloaded listed on the command line (in sys.argv):
c:\code> reloadall.py
reloading reloadall
reloading types
c:\code> reloadall2.py
reloading reloadall2
reloading types
c:\code> reloadall3.py
reloading reloadall3
reloading types
Though it’s hard to see here, we really are testing the individual reloader alternatives
—each of these tests shares a common tester function, but passes it the reload_all
from its own file. Here are the variants reloading the 3.X tkinter GUI module and all
the modules its imports reach:
c:\code> reloadall.py tkinter
reloading tkinter
reloading _tkinter
reloading tkinter._fix
...etc...
c:\code> reloadall2.py tkinter
reloading tkinter
reloading tkinter.constants
reloading tkinter._fix
...etc...
c:\code> reloadall3.py tkinter
reloading tkinter
reloading sys
reloading tkinter.constants
...etc...
All three work on both Python 3.X and 2.X too—they’re careful to unify prints with
formatting, and avoid using version-specific tools (though you must use 2.X module
names like Tkinter, and I’m using the 3.3 Windows launcher here to run per Appen-
dix B):
c:\code> py -2 reloadall.py
reloading reloadall
reloading types
c:\code> py -2 reloadall2.py Tkinter
reloading Tkinter
reloading _tkinter
reloading FixTk
...etc...
As usual we can test interactively, too, by importing and calling either a module’s main
reload entry point with a module object, or the testing function with a reloader function
and module name string:
C:\code> py -3
>>> import reloadall, reloadall2, reloadall3
>>> import tkinter
>>> reloadall.reload_all(tkinter) # Normal use case
reloading tkinter
reloading tkinter._fix
reloading os
...etc...
>>> reloadall.tester(reloadall2.reload_all, 'tkinter') # Testing utility
reloading tkinter
reloading tkinter._fix
reloading os
...etc...
>>> reloadall.tester(reloadall3.reload_all, 'reloadall3') # Mimic self-test code
reloading reloadall3
reloading types
Finally, if you look at the output of tkinter reloads earlier, you may notice that each
of the three variants may produce results in a different order; they all depend on name-
space dictionary ordering, and the last also relies on the order in which items are added
to its stack. In fact, under Python 3.3, the reload order for a given reloader can vary
from run to run. To ensure that all three are reloading the same modules irrespective
of the order in which they do so, we can use sets (or sorts) to test for order-neutral
equality of their printed messages—obtained here by running shell commands with the
os.popen utility we met in Chapter 13 and used in Chapter 21:
>>> import os
>>> res1 = os.popen('reloadall.py tkinter').read()
>>> res2 = os.popen('reloadall2.py tkinter').read()
>>> res3 = os.popen('reloadall3.py tkinter').read()
>>> res1[:75]
'reloading tkinter\nreloading tkinter.constants\nreloading tkinter._fix\nreload'
>>> res1 == res2, res2 == res3
(False, False)
>>> set(res1) == set(res2), set(res2) == set(res3)
(True, True)
Run these scripts, study their code, and experiment on your own for more insight; these
are the sort of importable tools you might want to add to your own source code library.
Watch for a similar testing technique in the coverage of class tree listers in Chap-
ter 31, where we’ll apply it to passed class objects and extend it further.
Also keep in mind that all three variants reload only modules that were loaded with
import statements—since names copied with from statements do not cause a module
to be nested and referenced in the importer’s namespace, their containing module is
not reloaded. More fundamentally, the transitive reloaders rely on the fact that module
reloads update module objects in place, such that all references to those modules in any
scope will see the updated version automatically. Because they copy names out, from
importers are not updated by reloads—transitive or not—and supporting this may
require either source code analysis, or customization of the import operation (see
Chapter 22 for pointers).
Tool impacts like this are perhaps another reason to prefer import to from—which
brings us to the end of this chapter and part, and the standard set of warnings for this
part’s topic.
Module Gotchas
In this section, we’ll take a look at the usual collection of boundary cases that can make
life interesting for Python beginners. Some are review here, and a few are so obscure
that coming up with representative examples can be a challenge, but most illustrate
something important about the language.
Module Name Clashes: Package and Package-Relative Imports
If you have two modules of the same name, you may only be able to import one of them
—by default, the one whose directory is leftmost in the sys.path module search path
will always be chosen. This isn’t an issue if the module you prefer is in your top-level
script’s directory; since that is always first in the module path, its contents will be
located first automatically. For cross-directory imports, however, the linear nature of
the module search path means that same-named files can clash.
To fix, either avoid same-named files or use the package imports feature of Chap-
ter 24. If you need to get to both same-named files, structure your source files in sub-
directories, such that package import directory names make the module references
unique. As long as the enclosing package directory names are unique, you’ll be able to
access either or both of the same-named modules.
Note that this issue can also crop up if you accidentally use a name for a module of
your own that happens to be the same as a standard library module you need—your
local module in the program’s home directory (or another directory early in the module
path) can hide and replace the library module.
To fix, either avoid using the same name as another module you need or store your
modules in a package directory and use Python 3.X’s package-relative import model,
available in 2.X as an option. In this model, normal imports skip the package directory
(so you’ll get the library’s version), but special dotted import statements can still select
the local version of the module if needed.
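In skeleton form, the second fix looks like this inside a module of a hypothetical
package mypkg that has its own string.py; the two statements are alternatives, not
meant to be used together:

# mypkg\main.py
import string              # Absolute: skips mypkg, finds the standard library's module
from . import string       # Relative: explicitly selects mypkg\string.py instead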
Statement Order Matters in Top-Level Code
As we’ve seen, when a module is first imported (or reloaded), Python executes its
statements one by one, from the top of the file to the bottom. This has a few subtle
implications regarding forward references that are worth underscoring here:
• Code at the top level of a module file (not nested in a function) runs as soon as
Python reaches it during an import; because of that, it cannot reference names
assigned lower in the file.
• Code inside a function body doesn't run until the function is called; because names
in a function aren’t resolved until the function actually runs, they can usually ref-
erence names anywhere in the file.
Generally, forward references are only a concern in top-level module code that executes
immediately; functions can reference names arbitrarily. Here’s a file that illustrates
forward reference dos and don’ts:
func1()                      # Error: "func1" not yet assigned

def func1():
    print(func2())           # OK: "func2" looked up later

func1()                      # Error: "func2" not yet assigned

def func2():
    return "Hello"

func1()                      # OK: "func1" and "func2" assigned
When this file is imported (or run as a standalone program), Python executes its state-
ments from top to bottom. The first call to func1 fails because the func1 def hasn’t run
yet. The call to func2 inside func1 works as long as func2’s def has been reached by the
time func1 is called—and it hasn’t when the second top-level func1 call is run. The last
call to func1 at the bottom of the file works because func1 and func2 have both been
assigned.
Mixing defs with top-level code is not only difficult to read, it’s also dependent on
statement ordering. As a rule of thumb, if you need to mix immediate code with defs,
put your defs at the top of the file and your top-level code at the bottom. That way,
your functions are guaranteed to be defined and assigned by the time Python runs the
code that uses them.
from Copies Names but Doesn’t Link
Although it’s commonly used, the from statement is the source of a variety of potential
gotchas in Python. As we’ve learned, the from statement is really an assignment to names
in the importer’s scope—a name-copy operation, not a name aliasing. The implications
of this are the same as for all assignments in Python, but they’re subtle, especially given
that the code that shares the objects lives in different files. For instance, suppose we
define the following module, nested1.py:
# nested1.py
X = 99
def printer(): print(X)
If we import its two names using from in another module, nested2.py, we get copies of
those names, not links to them. Changing a name in the importer resets only the binding
of the local version of that name, not the name in nested1.py:
# nested2.py
from nested1 import X, printer # Copy names out
X = 88 # Changes my "X" only!
printer() # nested1's X is still 99
% python nested2.py
99
If we use import to get the whole module and then assign to a qualified name, however,
we change the name in nested1.py. Attribute qualification directs Python to a name in
the module object, rather than a name in the importer, nested3.py:
# nested3.py
import nested1 # Get module as a whole
nested1.X = 88 # OK: change nested1's X
nested1.printer()
% python nested3.py
88
from * Can Obscure the Meaning of Variables
I mentioned this earlier but saved the details for here. Because you don’t list the vari-
ables you want when using the from module import * statement form, it can accidentally
overwrite names you’re already using in your scope. Worse, it can make it difficult to
determine where a variable comes from. This is especially true if the from * form is used
on more than one imported file.
For example, if you use from * on three modules in the following, you’ll have no way
of knowing what a raw function call really means, short of searching all three external
module files—all of which may be in other directories:
>>> from module1 import * # Bad: may overwrite my names silently
>>> from module2 import * # Worse: no way to tell what we get!
>>> from module3 import *
>>> . . .
>>> func() # Huh???
The solution again is not to do this: try to explicitly list the attributes you want in your
from statements, and restrict the from * form to at most one imported module per file.
That way, any undefined names must by deduction be in the module named in the
single from *. You can avoid the issue altogether if you always use import instead of
from, but that advice is too harsh; like much else in programming, from is a convenient
tool if used wisely. Even this example isn’t an absolute evil—it’s OK for a program to
use this technique to collect names in a single space for convenience, as long as it’s well
known.
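Relatedly, a module can limit what from * copies in the first place, which tames this
gotcha at its source. Here's a minimal sketch, with hypothetical file and names (the
quiz ahead returns to this tool):

# File lib.py
__all__ = ['public']    # from * copies only the names listed here

def public():
    return 'exported'

def helper():           # Skipped by from *, though "import lib"
    return 'internal'   # and "from lib import helper" still work

>>> from lib import *
>>> public()
'exported'
>>> helper()            # NameError: helper was not copied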
reload May Not Impact from Imports
Here’s another from-related gotcha: as discussed previously, because from copies (as-
signs) names when run, there’s no link back to the modules where the names came
from. Names imported with from simply become references to objects, which happen
to have been referenced by the same names in the importee when the from ran.
Because of this behavior, reloading the importee has no effect on clients that import its
names using from. That is, the client’s names will still reference the original objects
fetched with from, even if the names in the original module are later reset:
from module import X # X may not reflect any module reloads!
. . .
from imp import reload
reload(module) # Changes module, but not my names
X # Still references old object
To make reloads more effective, use import and name qualification instead of from.
Because qualifications always go back to the module, they will find the new bindings
of module names after reloading has updated the module’s content in place:
import module # Get module, not names
. . .
from imp import reload
reload(module) # Changes module in place
module.X # Get current X: reflects module reloads
As a related consequence, our transitive reloader earlier in this chapter doesn’t apply
to names fetched with from, only import; again, if you’re going to use reloads, you’re
probably better off with import.
reload, from, and Interactive Testing
In fact, the prior gotcha is even more subtle than it appears. Chapter 3 warned that it’s
usually better not to launch programs with imports and reloads because of the com-
plexities involved. Things get even worse when from is brought into the mix. Python
beginners most often stumble onto its issues in scenarios like this—imagine that after
opening a module file in a text edit window, you launch an interactive session to load
and test your module with from:
from module import function
function(1, 2, 3)
Finding a bug, you jump back to the edit window, make a change, and try to reload
the module this way:
from imp import reload
reload(module)
This doesn’t work, because the from statement assigned only the name function, not
module. To refer to the module in a reload, you have to first bind its name with an
import statement at least once:
from imp import reload
import module
reload(module)
function(1, 2, 3)
However, this doesn’t quite work either—reload updates the module object in place,
but as discussed in the preceding section, names like function that were copied out of
the module in the past still refer to the old objects; in this instance, function is still the
original version of the function. To really get the new function, you must refer to it as
module.function after the reload, or rerun the from:
from imp import reload
import module
reload(module)
from module import function # Or give up and use module.function()
function(1, 2, 3)
Now, the new version of the function will finally run, but it seems an awful lot of work
to get there.
As you can see, there are problems inherent in using reload with from: not only do you
have to remember to reload after imports, but you also have to remember to rerun your
from statements after reloads. This is complex enough to trip up even an expert once
in a while. In fact, the situation has gotten even worse in Python 3.X, because you must
also remember to import reload itself!
The short story is that you should not expect reload and from to play together nicely.
Again, the best policy is not to combine them at all—use reload with import, or launch
your programs other ways, as suggested in Chapter 3: using the Run→Run Module
menu option in IDLE, file icon clicks, system command lines, or the exec built-in
function.
Recursive from Imports May Not Work
I saved the most bizarre (and, thankfully, obscure) gotcha for last. Because imports
execute a file’s statements from top to bottom, you need to be careful when using
modules that import each other. This is often called recursive imports, but the recursion
doesn’t really occur (in fact, circular may be a better term here)—such imports won’t
get stuck in infinite importing loops. Still, because the statements in a module may not
all have been run when it imports another module, some of its names may not yet exist.
If you use import to fetch the module as a whole, this probably doesn’t matter; the
module’s names won’t be accessed until you later use qualification to fetch their values,
and by that time the module is likely complete. But if you use from to fetch specific
names, you must bear in mind that you will only have access to names in that module
that have already been assigned when a recursive import is kicked off.
For instance, consider the following modules, recur1 and recur2. recur1 assigns a name
X, and then imports recur2 before assigning the name Y. At this point, recur2 can fetch
recur1 as a whole with an import—it already exists in Python’s internal modules table,
which makes it importable, and also prevents the imports from looping. But if recur2
uses from, it will be able to see only the name X; the name Y, which is assigned below
the import in recur1, doesn’t yet exist, so you get an error:
# recur1.py
X = 1
import recur2 # Run recur2 now if it doesn't exist
Y = 2
# recur2.py
from recur1 import X # OK: "X" already assigned
from recur1 import Y # Error: "Y" not yet assigned
C:\code> py -3
>>> import recur1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".\recur1.py", line 2, in <module>
import recur2
File ".\recur2.py", line 2, in <module>
from recur1 import Y
ImportError: cannot import name Y
Python avoids rerunning recur1’s statements when they are imported recursively from
recur2 (otherwise the imports would send the script into an infinite loop that might
require a Ctrl-C solution or worse), but recur1’s namespace is incomplete when it’s
imported by recur2.
The solution? Don’t use from in recursive imports (no, really!). Python won’t get stuck
in a cycle if you do, but your programs will once again be dependent on the order of
the statements in the modules. In fact, there are two ways out of this gotcha:
• You can usually eliminate import cycles like this by careful design—maximizing
cohesion and minimizing coupling are good first steps.
• If you can't break the cycles completely, postpone module name accesses by using
import and attribute qualification (instead of from and direct names), or by running
your froms either inside functions (instead of at the top level of the module) or near
the bottom of your file to defer their execution—see the sketch below.
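As a minimal sketch of that second fix, a recur2 variant like the following imports
without error, because the from for Y is deferred into a function and doesn't run until
after recur1's top-level code has finished:

# recur2.py — deferred variant
from recur1 import X          # OK: "X" already assigned

def getY():
    from recur1 import Y      # Deferred: runs at call time, not import time
    return Y

By the time a client calls getY, recur1 has run to completion, so its Y exists.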
There is additional perspective on this issue in the exercises at the end of this chapter
—which we’ve officially reached.
Chapter Summary
This chapter surveyed some more advanced module-related concepts. We studied data
hiding techniques, enabling new language features with the __future__ module, the
__name__ usage mode variable, transitive reloads, importing by name strings, and more.
We also explored and summarized module design issues, wrote some more substantial
programs, and looked at common mistakes related to modules to help you avoid them
in your code.
The next chapter begins our look at Python’s class—its object-oriented programming
tool. Much of what we’ve covered in the last few chapters will apply there, too: classes
live in modules and are namespaces as well, but they add an extra component to at-
tribute lookup called inheritance search. As this is the last chapter in this part of the
book, however, before we dive into that topic, be sure to work through this part’s set
of lab exercises. And before that, here is this chapter’s quiz to review the topics covered
here.
Test Your Knowledge: Quiz
1. What is significant about variables at the top level of a module whose names begin
with a single underscore?
2. What does it mean when a module’s __name__ variable is the string "__main__"?
3. If the user interactively types the name of a module to test, how can your code
import it?
4. How is changing sys.path different from setting PYTHONPATH to modify the module
search path?
5. If the module __future__ allows us to import from the future, can we also import
from the past?
Test Your Knowledge: Answers
1. Variables at the top level of a module whose names begin with a single underscore
are not copied out to the importing scope when the from * statement form is used.
They can still be accessed by an import or the normal from statement form, though.
The __all__ list is similar, but the logical converse; its contents are the only names
that are copied out on a from *.
2. If a module’s __name__ variable is the string "__main__", it means that the file is
being executed as a top-level script instead of being imported from another file in
the program. That is, the file is being used as a program, not a library. This usage
mode variable supports dual-mode code and tests.
3. User input usually comes into a script as a string; to import the referenced module
given its string name, you can build and run an import statement with exec, or pass
the string name in a call to __import__ or importlib.import_module (a short sketch
follows these answers).
4. Changing sys.path only affects one running program (process), and is temporary
—the change goes away when the program ends. PYTHONPATH settings live in the
operating system—they are picked up globally by all your programs on a machine,
and changes to these settings endure after programs exit.
5. No, we can’t import from the past in Python. We can install (or stubbornly use)
an older version of the language, but the latest Python is generally the best Python
(at least within lines—see 2.X longevity!).
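To make answer 3 concrete, here's a minimal sketch of importing by name string; the
module name mymod here is hypothetical:

>>> modname = 'mymod'                          # A string, perhaps from input() or sys.argv
>>> import importlib
>>> module = importlib.import_module(modname)  # Preferred technique
>>> module = __import__(modname)               # Older built-in equivalent
>>> exec('import ' + modname)                  # Works too: builds and runs a statement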
Test Your Knowledge: Part V Exercises
See Part V in Appendix D for the solutions.
1. Import basics. Write a program that counts the lines and characters in a file (similar
in spirit to part of what wc does on Unix). With your text editor, code a Python
module called mymod.py that exports three top-level names:
• A countLines(name) function that reads an input file and counts the number
of lines in it (hint: file.readlines does most of the work for you, and len does
the rest, though you could count with for and file iterators to support massive
files too).
• A countChars(name) function that reads an input file and counts the number
of characters in it (hint: file.read returns a single string, which may be used
in similar ways).
• A test(name) function that calls both counting functions with a given input
filename. Such a filename generally might be passed in, hardcoded, input with
the input built-in function, or pulled from a command line via the sys.argv
list shown in this chapter's formats.py and reloadall.py examples; for now, you
can assume it's a passed-in function argument.
All three mymod functions should expect a filename string to be passed in. If you
type more than two or three lines per function, you’re working much too hard—
use the hints I just gave!
Next, test your module interactively, using import and attribute references to fetch
your exports. Does your PYTHONPATH need to include the directory where you created
mymod.py? Try running your module on itself: for example, test("mymod.py").
Note that test opens the file twice; if you’re feeling ambitious, you may be able to
improve this by passing an open file object into the two count functions (hint:
file.seek(0) is a file rewind).
2. from/from *. Test your mymod module from exercise 1 interactively by using from to
load the exports directly, first by name, then using the from * variant to fetch
everything.
3. __main__. Add a line in your mymod module that calls the test function automati-
cally only when the module is run as a script, not when it is imported. The line you
add will probably test the value of __name__ for the string "__main__", as shown in
this chapter. Try running your module from the system command line; then, im-
port the module and test its functions interactively. Does it still work in both
modes?
4. Nested imports. Write a second module, myclient.py, that imports mymod and tests
its functions; then run myclient from the system command line. If myclient uses
from to fetch from mymod, will mymod’s functions be accessible from the top level of
myclient? What if it imports with import instead? Try coding both variations in
myclient and test interactively by importing myclient and inspecting its __dict__
attribute.
5. Package imports. Import your file from a package. Create a subdirectory called
mypkg nested in a directory on your module import search path, copy or move the
mymod.py module file you created in exercise 1 or 3 into the new directory, and
try to import it with a package import of the form import mypkg.mymod and call its
functions. Try to fetch your counter functions with a from too.
You’ll need to add an __init__.py file in the directory your module was moved to in
order to make this go, but it should work on all major Python platforms (that’s part of the
reason Python uses “.” as a path separator). The package directory you create can
be simply a subdirectory of the one you’re working in; if it is, it will be found via
the home directory component of the search path, and you won’t have to configure
your path. Add some code to your __init__.py, and see if it runs on each import.
6. Reloads. Experiment with module reloads: perform the tests in Chapter 23’s
changer.py example, changing the called function’s message and/or behavior re-
peatedly, without stopping the Python interpreter. Depending on your system, you
might be able to edit changer in another window, or suspend the Python interpreter
and edit in the same window (on Unix, a Ctrl-Z key combination usually suspends
the current process, and an fg command later resumes it, though a text edit window
probably works just as well).
7. Circular imports. In the section on recursive (a.k.a. circular) import gotchas, im-
porting recur1 raised an error. But if you restart Python and import recur2 inter-
actively, the error doesn’t occur—test this and see for yourself. Why do you think
it works to import recur2, but not recur1? (Hint: Python stores new modules in
the built-in sys.modules table—a dictionary—before running their code; later im-
ports fetch the module from this table first, whether the module is “complete” yet
or not.) Now, try running recur1 as a top-level script file: python recur1.py. Do
you get the same error that occurs when recur1 is imported interactively? Why?
(Hint: when modules are run as programs, they aren’t imported, so this case has
the same effect as importing recur2 interactively; recur2 is the first module impor-
ted.) What happens when you run recur2 as a script? Circular imports are uncom-
mon and rarely this bizarre in practice. On the other hand, if you can understand
why they are a potential problem, you know a lot about Python’s import semantics.
PART VI
Classes and OOP
CHAPTER 26
OOP: The Big Picture
So far in this book, we’ve been using the term “object” generically. Really, the code
written up to this point has been object-based—we’ve passed objects around our scripts,
used them in expressions, called their methods, and so on. For our code to qualify as
being truly object-oriented (OO), though, our objects will generally need to also par-
ticipate in something called an inheritance hierarchy.
This chapter begins our exploration of the Python class—a coding structure and device
used to implement new kinds of objects in Python that support inheritance. Classes are
Python’s main object-oriented programming (OOP) tool, so we’ll also look at OOP
basics along the way in this part of the book. OOP offers a different and often more
effective way of programming, in which we factor code to minimize redundancy, and
write new programs by customizing existing code instead of changing it in place.
In Python, classes are created with a new statement: the class. As you’ll see, the objects
defined with classes can look a lot like the built-in types we studied earlier in the book.
In fact, classes really just apply and extend the ideas we’ve already covered; roughly,
they are packages of functions that use and process built-in object types. Classes,
though, are designed to create and manage new objects, and support inheritance—a
mechanism of code customization and reuse above and beyond anything we’ve seen
so far.
One note up front: in Python, OOP is entirely optional, and you don’t need to use
classes just to get started. You can get plenty of work done with simpler constructs such
as functions, or even simple top-level script code. Because using classes well requires
some up-front planning, they tend to be of more interest to people who work in stra-
tegic mode (doing long-term product development) than to people who work in tacti-
cal mode (where time is in very short supply).
Still, as you’ll see in this part of the book, classes turn out to be one of the most useful
tools Python provides. When used well, classes can actually cut development time
radically. They’re also employed in popular Python tools like the tkinter GUI API, so
most Python programmers will usually find at least a working knowledge of class basics
helpful.
Why Use Classes?
Remember when I told you that programs “do things with stuff” in Chapter 4 and
Chapter 10? In simple terms, classes are just a way to define new sorts of stuff, reflecting
real objects in a program’s domain. For instance, suppose we decide to implement that
hypothetical pizza-making robot we used as an example in Chapter 16. If we implement
it using classes, we can model more of its real-world structure and relationships. Two
aspects of OOP prove useful here:
Inheritance
Pizza-making robots are kinds of robots, so they possess the usual robot-y prop-
erties. In OOP terms, we say they “inherit” properties from the general category
of all robots. These common properties need to be implemented only once for the
general case and can be reused in part or in full by all types of robots we may build
in the future.
Composition
Pizza-making robots are really collections of components that work together as a
team. For instance, for our robot to be successful, it might need arms to roll dough,
motors to maneuver to the oven, and so on. In OOP parlance, our robot is an
example of composition; it contains other objects that it activates to do its bidding.
Each component might be coded as a class, which defines its own behavior and
relationships.
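As a minimal sketch of how these two ideas might be coded—all the class names here
are hypothetical, and the method bodies are just stubs:

class Robot:                          # The general category
    def move(self): print('moving')

class Arm:                            # Components embedded by composition
    def roll(self): print('rolling dough')

class Oven:
    def bake(self): print('baking pizza')

class PizzaRobot(Robot):              # Inheritance: a kind of Robot
    def __init__(self):
        self.arm = Arm()              # Composition: contains its parts
        self.oven = Oven()
    def work(self):
        self.move()                   # Inherited from Robot
        self.arm.roll()               # Delegated to embedded components
        self.oven.bake()

bot = PizzaRobot()
bot.work()                            # Prints: moving, rolling dough, baking pizza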
General OOP ideas like inheritance and composition apply to any application that can
be decomposed into a set of objects. For example, in typical GUI systems, interfaces
are written as collections of widgets—buttons, labels, and so on—which are all drawn
when their container is drawn (composition). Moreover, we may be able to write our
own custom widgets—buttons with unique fonts, labels with new color schemes, and
the like—which are specialized versions of more general interface devices (inheritance).
From a more concrete programming perspective, classes are Python program units, just
like functions and modules: they are another compartment for packaging logic and
data. In fact, classes also define new namespaces, much like modules. But, compared
to other program units we’ve already seen, classes have three critical distinctions that
make them more useful when it comes to building new objects:
Multiple instances
Classes are essentially factories for generating one or more objects. Every time we
call a class, we generate a new object with a distinct namespace. Each object gen-
erated from a class has access to the class’s attributes and gets a namespace of its
own for data that varies per object. This is similar to the per-call state retention of
Chapter 17’s closure functions, but is explicit and natural in classes, and is just one
of the things that classes do. Classes offer a complete programming solution.
Customization via inheritance
Classes also support the OOP notion of inheritance; we can extend a class by re-
defining its attributes outside the class itself in new software components coded
as subclasses. More generally, classes can build up namespace hierarchies, which
define names to be used by objects created from classes in the hierarchy. This
supports multiple customizable behaviors more directly than other tools.
Operator overloading
By providing special protocol methods, classes can define objects that respond to
the sorts of operations we saw at work on built-in types. For instance, objects made
with classes can be sliced, concatenated, indexed, and so on. Python provides
hooks that classes can use to intercept and implement any built-in type operation.
At its base, the mechanism of OOP in Python is largely just two bits of magic: a special
first argument in functions (to receive the subject of a call) and inheritance attribute
search (to support programming by customization). Other than this, the model is
largely just functions that ultimately process built-in types. While not radically new,
though, OOP adds an extra layer of structure that supports better programming than
flat procedural models. Along with the functional tools we met earlier, it represents a
major abstraction step above computer hardware that helps us build more sophisticated
programs.
OOP from 30,000 Feet
Before we see what this all means in terms of code, I’d like to say a few words about
the general ideas behind OOP. If you’ve never done anything object-oriented in your
life before now, some of the terminology in this chapter may seem a bit perplexing on
the first pass. Moreover, the motivation for these terms may be elusive until you’ve had
a chance to study the ways that programmers apply them in larger systems. OOP is as
much an experience as a technology.
Attribute Inheritance Search
The good news is that OOP is much simpler to understand and use in Python than in
other languages, such as C++ or Java. As a dynamically typed scripting language,
Python removes much of the syntactic clutter and complexity that clouds OOP in other
tools. In fact, much of the OOP story in Python boils down to this expression:
object.attribute
We’ve been using this expression throughout the book to access module attributes, call
methods of objects, and so on. When we say this to an object that is derived from a
class statement, however, the expression kicks off a search in Python—it searches a
tree of linked objects, looking for the first appearance of attribute that it can find.
When classes are involved, the preceding Python expression effectively translates to
the following in natural language:
Find the first occurrence of attribute by looking in object, then in all classes above it,
from bottom to top and left to right.
In other words, attribute fetches are simply tree searches. The term inheritance is ap-
plied because objects lower in a tree inherit attributes attached to objects higher in that
tree. As the search proceeds from the bottom up, in a sense, the objects linked into a
tree are the union of all the attributes defined in all their tree parents, all the way up
the tree.
In Python, this is all very literal: we really do build up trees of linked objects with code,
and Python really does climb this tree at runtime searching for attributes every time we
use the object.attribute expression. To make this more concrete, Figure 26-1 sketches
an example of one of these trees.
Figure 26-1. A class tree, with two instances at the bottom (I1 and I2), a class above them (C1), and
two superclasses at the top (C2 and C3). All of these objects are namespaces (packages of variables),
and the inheritance search is simply a search of the tree from bottom to top looking for the lowest
occurrence of an attribute name. Code implies the shape of such trees.
In this figure, there is a tree of five objects labeled with variables, all of which have
attached attributes, ready to be searched. More specifically, this tree links together three
class objects (the ovals C1, C2, and C3) and two instance objects (the rectangles I1 and
I2) into an inheritance search tree. Notice that in the Python object model, classes and
the instances you generate from them are two distinct object types:
Classes
Serve as instance factories. Their attributes provide behavior—data and functions
—that is inherited by all the instances generated from them (e.g., a function to
compute an employee’s salary from pay and hours).
Instances
Represent the concrete items in a program’s domain. Their attributes record data
that varies per specific object (e.g., an employee’s Social Security number).
In terms of search trees, an instance inherits attributes from its class, and a class inherits
attributes from all classes above it in the tree.
In Figure 26-1, we can further categorize the ovals by their relative positions in the tree.
We usually call classes higher in the tree (like C2 and C3) superclasses; classes lower in
the tree (like C1) are known as subclasses. These terms refer to both relative tree positions
and roles. Superclasses provide behavior shared by all their subclasses, but because the
search proceeds from the bottom up, subclasses may override behavior defined in their
superclasses by redefining superclass names lower in the tree.1
As these last few words are really the crux of the matter of software customization in
OOP, let’s expand on this concept. Suppose we build up the tree in Figure 26-1, and
then say this:
I2.w
Right away, this code invokes inheritance. Because this is an object.attribute expres-
sion, it triggers a search of the tree in Figure 26-1—Python will search for the attribute
w by looking in I2 and above. Specifically, it will search the linked objects in this order:
I2, C1, C2, C3
and stop at the first attached w it finds (or raise an error if w isn’t found at all). In this
case, w won’t be found until C3 is searched because it appears only in that object. In
other words, I2.w resolves to C3.w by virtue of the automatic search. In OOP termi-
nology, I2 “inherits” the attribute w from C3.
Ultimately, the two instances inherit four attributes from their classes: w, x, y, and z.
Other attribute references will wind up following different paths in the tree. For ex-
ample:
• I1.x and I2.x both find x in C1 and stop because C1 is lower than C2.
• I1.y and I2.y both find y in C1 because that’s the only place y appears.
• I1.z and I2.z both find z in C2 because C2 is further to the left than C3.
• I2.name finds name in I2 without climbing the tree at all.
Trace these searches through the tree in Figure 26-1 to get a feel for how inheritance
searches work in Python.
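Here's one way the tree in Figure 26-1 might actually be coded—a minimal sketch with
hypothetical attribute values filled in, so you can run the searches yourself:

class C2:                 # The ovals in Figure 26-1
    x = 11
    z = 22

class C3:
    w = 33
    z = 44

class C1(C2, C3):         # Lower in the tree: overrides x
    x = 1
    y = 2

I1 = C1()                 # The rectangles
I2 = C1()
I2.name = 'instance2'     # Attached to I2 only

print(I2.w)               # 33: inherited from C3
print(I2.x)               # 1: C1's x overrides C2's
print(I2.z)               # 22: C2 is searched before C3
print(I2.name)            # 'instance2': found in I2 itself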
The first item in the preceding list is perhaps the most important to notice—because
C1 redefines the attribute x lower in the tree, it effectively replaces the version above it
in C2. As you’ll see in a moment, such redefinitions are at the heart of software cus-
tomization in OOP—by redefining and replacing the attribute, C1 effectively customizes
what it inherits from its superclasses.
1. In other literature and circles, you may also occasionally see the terms base classes and derived classes
used to describe superclasses and subclasses, respectively. Python people and this book tend to use the
latter terms.
Classes and Instances
Although they are technically two separate object types in the Python model, the classes
and instances we put in these trees are almost identical—each type’s main purpose is
to serve as another kind of namespace—a package of variables, and a place where we
can attach attributes. If classes and instances therefore sound like modules, they should;
however, the objects in class trees also have automatically searched links to other
namespace objects, and classes correspond to statements, not entire files.
The primary difference between classes and instances is that classes are a kind of fac-
tory for generating instances. For example, in a realistic application, we might have an
Employee class that defines what it means to be an employee; from that class, we generate
actual Employee instances. This is another difference between classes and modules—
we only ever have one instance of a given module in memory (that’s why we have to
reload a module to get its new code), but with classes, we can make as many instances
as we need.
Operationally, classes will usually have functions attached to them (e.g., computeSa
lary), and the instances will have more basic data items used by the class’s functions
(e.g., hoursWorked). In fact, the object-oriented model is not that different from the
classic data-processing model of programs plus records—in OOP, instances are like
records with “data,” and classes are the “programs” for processing those records. In
OOP, though, we also have the notion of an inheritance hierarchy, which supports
software customization better than earlier models.
Method Calls
In the prior section, we saw how the attribute reference I2.w in our example class tree
was translated to C3.w by the inheritance search procedure in Python. Perhaps just as
important to understand as the inheritance of attributes, though, is what happens when
we try to call methods—functions attached to classes as attributes.
If this I2.w reference is a function call, what it really means is “call the C3.w function to
process I2.” That is, Python will automatically map the call I2.w() into the call
C3.w(I2), passing in the instance as the first argument to the inherited function.
In fact, whenever we call a function attached to a class in this fashion, an instance of
the class is always implied. This implied subject or context is part of the reason we refer
to this as an object-oriented model—there is always a subject object when an operation
is run. In a more realistic example, we might invoke a method called giveRaise attached
as an attribute to an Employee class; such a call has no meaning unless qualified with
the employee to whom the raise should be given.
As we’ll see later, Python passes in the implied instance to a special first argument in
the method, called self by convention. Methods go through this argument to process
the subject of the call. As we’ll also learn, methods can be called through either an
instance—bob.giveRaise()—or a class—Employee.giveRaise(bob)—and both forms
serve purposes in our scripts. These calls also illustrate both of the key ideas in OOP:
to run a bob.giveRaise() method call, Python:
1. Looks up giveRaise from bob, by inheritance search
2. Passes bob to the located giveRaise function, in the special self argument
When you call Employee.giveRaise(bob), you’re just performing both steps yourself.
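Here's a minimal sketch of these two steps in code; the Employee class and its pay
attribute are hypothetical stand-ins:

class Employee:
    def giveRaise(self, percent):     # self receives the subject instance
        self.pay = int(self.pay * (1 + percent))

bob = Employee()
bob.pay = 100000
bob.giveRaise(0.10)                   # Python runs Employee.giveRaise(bob, 0.10)
Employee.giveRaise(bob, 0.10)         # Same effect: both steps done explicitly
print(bob.pay)                        # 121000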
This description is technically the default case (Python has additional method types
we’ll meet later), but it applies to the vast majority of the OOP code written in the
language. To see how methods receive their subjects, though, we need to move on to
some code.
Coding Class Trees
Although we are speaking in the abstract here, there is tangible code behind all these
ideas, of course. We construct trees and their objects with class statements and class
calls, which we’ll meet in more detail later. In short:
• Each class statement generates a new class object.
• Each time a class is called, it generates a new instance object.
• Instances are automatically linked to the classes from which they are created.
• Classes are automatically linked to their superclasses according to the way we list
them in parentheses in a class header line; the left-to-right order there gives the
order in the tree.
To build the tree in Figure 26-1, for example, we would run Python code of the following
form. Like function definition, classes are normally coded in module files and are run
during an import (I’ve omitted the guts of the class statements here for brevity):
class C2: ... # Make class objects (ovals)
class C3: ...
class C1(C2, C3): ... # Linked to superclasses (in this order)
I1 = C1() # Make instance objects (rectangles)
I2 = C1() # Linked to their classes
Here, we build the three class objects by running three class statements, and make the
two instance objects by calling the class C1 twice, as though it were a function. The
instances remember the class they were made from, and the class C1 remembers its listed
superclasses.
Technically, this example is using something called multiple inheritance, which simply
means that a class has more than one superclass above it in the class tree—a useful
technique when you wish to combine multiple tools. In Python, if there is more than
one superclass listed in parentheses in a class statement (like C1’s here), their left-to-
right order gives the order in which those superclasses will be searched for attributes
by inheritance. The leftmost version of a name is used by default, though you can always
choose a name by asking for it from the class it lives in (e.g., C3.z).
Because of the way inheritance searches proceed, the object to which you attach an
attribute turns out to be crucial—it determines the name’s scope. Attributes attached
to instances pertain only to those single instances, but attributes attached to classes are
shared by all their subclasses and instances. Later, we’ll study the code that hangs
attributes on these objects in depth. As we’ll find:
• Attributes are usually attached to classes by assignments made at the top level in
class statement blocks, and not nested inside function def statements there.
• Attributes are usually attached to instances by assignments to the special argument
passed to functions coded inside classes, called self.
For example, classes provide behavior for their instances with method functions we
create by coding def statements inside class statements. Because such nested defs as-
sign names within the class, they wind up attaching attributes to the class object that
will be inherited by all instances and subclasses:
class C2: ...                 # Make superclass objects
class C3: ...

class C1(C2, C3):             # Make and link class C1
    def setname(self, who):   # Assign name: C1.setname
        self.name = who       # Self is either I1 or I2

I1 = C1()                     # Make two instances
I2 = C1()

I1.setname('bob')             # Sets I1.name to 'bob'
I2.setname('sue')             # Sets I2.name to 'sue'
print(I1.name)                # Prints 'bob'
There’s nothing syntactically unique about def in this context. Operationally, though,
when a def appears inside a class like this, it is usually known as a method, and it
automatically receives a special first argument—called self by convention—that pro-
vides a handle back to the instance to be processed. Any values you pass to the method
yourself go to arguments after self (here, to who).2
Because classes are factories for multiple instances, their methods usually go through
this automatically passed-in self argument whenever they need to fetch or set attributes
of the particular instance being processed by a method call. In the preceding code,
self is used to store a name in one of two instances.
Like simple variables, attributes of classes and instances are not declared ahead of time,
but spring into existence the first time they are assigned values. When a method assigns
to a self attribute, it creates or changes an attribute in an instance at the bottom of the
class tree (i.e., one of the rectangles in Figure 26-1) because self automatically refers
to the instance being processed—the subject of the call.

2. If you’ve ever used C++ or Java, you’ll recognize that Python’s self is the same as the this pointer, but
self is always explicit in both headers and bodies of Python methods to make attribute accesses more
obvious: a name has fewer possible meanings.
In fact, because all the objects in class trees are just namespace objects, we can fetch or
set any of their attributes by going through the appropriate names. Saying C1.setname
is as valid as saying I1.setname, as long as the names C1 and I1 are in your code’s scopes.
Operator Overloading
As currently coded, our C1 class doesn’t attach a name attribute to an instance until the
setname method is called. Indeed, referencing I1.name before calling I1.setname would
produce an undefined name error. If a class wants to guarantee that an attribute like
name is always set in its instances, it more typically will fill out the attribute at con-
struction time, like this:
class C2: ...                   # Make superclass objects
class C3: ...

class C1(C2, C3):
    def __init__(self, who):    # Set name when constructed
        self.name = who         # Self is either I1 or I2

I1 = C1('bob')                  # Sets I1.name to 'bob'
I2 = C1('sue')                  # Sets I2.name to 'sue'
print(I1.name)                  # Prints 'bob'
If it’s coded or inherited, Python automatically calls a method named __init__ each
time an instance is generated from a class. The new instance is passed in to the self
argument of __init__ as usual, and any values listed in parentheses in the class call go
to arguments two and beyond. The effect here is to initialize instances when they are
made, without requiring extra method calls.
The __init__ method is known as the constructor because of when it is run. It’s the
most commonly used representative of a larger class of methods called operator over-
loading methods, which we’ll discuss in more detail in the chapters that follow. Such
methods are inherited in class trees as usual and have double underscores at the start
and end of their names to make them distinct. Python runs them automatically when
instances that support them appear in the corresponding operations, and they are
mostly an alternative to using simple method calls. They’re also optional: if omitted,
the operations are not supported. If no __init__ is present, class calls return an empty
instance, without initializing it.
For example, to implement set intersection, a class might either provide a method
named intersect, or overload the & expression operator to dispatch to the required
logic by coding a method named __and__. Because the operator scheme makes instances
look and feel more like built-in types, it allows some classes to provide a consistent and
natural interface, and be compatible with code that expects a built-in type. Still, apart
from the __init__ constructor—which appears in most realistic classes—many programs may be better off with simpler named methods unless their objects are similar
to built-ins. A giveRaise may make sense for an Employee, but a & might not.
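To make the intersection example concrete, here's a minimal sketch of a class that
offers both interfaces; the class and its attribute names are hypothetical:

class MySet:
    def __init__(self, items):
        self.items = set(items)
    def intersect(self, other):        # A simply named method...
        return MySet(self.items & other.items)
    def __and__(self, other):          # ...or overload &: x & y runs x.__and__(y)
        return self.intersect(other)

x = MySet('spam')
y = MySet('scam')
print((x & y).items)                   # Same as x.intersect(y): {'s', 'a', 'm'}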
OOP Is About Code Reuse
And that, along with a few syntax details, is most of the OOP story in Python. Of course,
there’s a bit more to it than just inheritance. For example, operator overloading is much
more general than I’ve described so far—classes may also provide their own imple-
mentations of operations such as indexing, fetching attributes, printing, and more. By
and large, though, OOP is about looking up attributes in trees with a special first ar-
gument in functions.
So why would we be interested in building and searching trees of objects? Although it
takes some experience to see how, when used well, classes support code reuse in ways
that other Python program components cannot. In fact, this is their highest purpose.
With classes, we code by customizing existing software, instead of either changing
existing code in place or starting from scratch for each new project. This turns out to
be a powerful paradigm in realistic programming.
At a fundamental level, classes are really just packages of functions and other names,
much like modules. However, the automatic attribute inheritance search that we get
with classes supports customization of software above and beyond what we can do
with modules and functions. Moreover, classes provide a natural structure for code
that packages and localizes logic and names, and so aids in debugging.
For instance, because methods are simply functions with a special first argument, we
can mimic some of their behavior by manually passing objects to be processed to simple
functions. The participation of methods in class inheritance, though, allows us to nat-
urally customize existing software by coding subclasses with new method definitions,
rather than changing existing code in place. There is really no such concept with mod-
ules and functions.
Polymorphism and classes
As an example, suppose you’re assigned the task of implementing an employee database
application. As a Python OOP programmer, you might begin by coding a general su-
perclass that defines default behaviors common to all the kinds of employees in your
organization:
class Employee:                   # General superclass
    def computeSalary(self): ...  # Common or default behaviors
    def giveRaise(self): ...
    def promote(self): ...
    def retire(self): ...
Once you’ve coded this general behavior, you can specialize it for each specific kind of
employee to reflect how the various types differ from the norm. That is, you can code
subclasses that customize just the bits of behavior that differ per employee type; the
rest of the employee types’ behavior will be inherited from the more general class. For
example, if engineers have a unique salary computation rule (perhaps it’s not hours
times rate), you can replace just that one method in a subclass:
class Engineer(Employee):         # Specialized subclass
    def computeSalary(self): ...  # Something custom here
Because the computeSalary version here appears lower in the class tree, it will replace
(override) the general version in Employee. You then create instances of the kinds of
employee classes that the real employees belong to, to get the correct behavior:
bob = Employee() # Default behavior
sue = Employee() # Default behavior
tom = Engineer() # Custom salary calculator
Notice that you can make instances of any class in a tree, not just the ones at the bottom
—the class you make an instance from determines the level at which the attribute search
will begin, and thus which versions of the methods it will employ.
Ultimately, these three instance objects might wind up embedded in a larger container
object—for instance, a list, or an instance of another class—that represents a depart-
ment or company using the composition idea mentioned at the start of this chapter.
When you later ask for these employees’ salaries, they will be computed according to
the classes from which the objects were made, due to the principles of the inheritance
search:
company = [bob, sue, tom]           # A composite object
for emp in company:
    print(emp.computeSalary())      # Run this object's version: default or custom
This is yet another instance of the idea of polymorphism introduced in Chapter 4 and
expanded in Chapter 16. Recall that polymorphism means that the meaning of an op-
eration depends on the object being operated on. That is, code shouldn’t care about
what an object is, only about what it does. Here, the method computeSalary is located
by inheritance search in each object before it is called. The net effect is that we auto-
matically run the correct version for the object being processed. Trace the code to see
why.3
In other applications, polymorphism might also be used to hide (i.e., encapsulate) in-
terface differences. For example, a program that processes data streams might be coded
to expect objects with input and output methods, without caring what those methods
actually do:
def processor(reader, converter, writer):
    while True:
        data = reader.read()
        if not data: break
        data = converter(data)
        writer.write(data)

3. The company list in this example could be a database if stored in a file with Python object pickling,
introduced in Chapter 9, to make the employees persistent. Python also comes with a module named
shelve, which allows the pickled representation of class instances to be stored in an access-by-key
filesystem; we’ll deploy it in Chapter 28.
By passing in instances of subclasses that specialize the required read and write method
interfaces for various data sources, we can reuse the processor function for any data
source we need to use, both now and in the future:
class Reader:
    def read(self): ...        # Default behavior and tools
    def other(self): ...
class FileReader(Reader):
    def read(self): ...        # Read from a local file
class SocketReader(Reader):
    def read(self): ...        # Read from a network socket
...
processor(FileReader(...), Converter, FileWriter(...))
processor(SocketReader(...), Converter, TapeWriter(...))
processor(FtpReader(...), Converter, XmlWriter(...))
Moreover, because the internal implementations of those read and write methods have
been factored into single locations, they can be changed without impacting code such
as this that uses them. The processor function might even be a class itself to allow the
conversion logic of converter to be filled in by inheritance, and to allow readers and
writers to be embedded by composition (we’ll see how this works later in this part of
the book).
Programming by customization
Once you get used to programming this way (by software customization), you’ll find
that when it’s time to write a new program, much of your work may already be done
—your task largely becomes one of mixing together existing superclasses that already
implement the behavior required by your program. For example, someone else might
have written the Employee, Reader, and Writer classes in this section’s examples for use
in completely different programs. If so, you get all of that person’s code “for free.”
In fact, in many application domains, you can fetch or purchase collections of super-
classes, known as frameworks, that implement common programming tasks as classes,
ready to be mixed into your applications. These frameworks might provide database
interfaces, testing protocols, GUI toolkits, and so on. With frameworks, you often
simply code a subclass that fills in an expected method or two; the framework classes
higher in the tree do most of the work for you. Programming in such an OOP world is
just a matter of combining and specializing already debugged code by writing subclasses
of your own.
Of course, it takes a while to learn how to leverage classes to achieve such OOP utopia.
In practice, object-oriented work also entails substantial design work to fully realize
the code reuse benefits of classes—to this end, programmers have begun cataloging
common OOP structures, known as design patterns, to help with design issues. The
actual code you write to do OOP in Python, though, is so simple that it will not in itself
pose an additional obstacle to your OOP quest. To see why, you’ll have to move on to
Chapter 27.
Chapter Summary
We took an abstract look at classes and OOP in this chapter, taking in the big picture
before we dive into syntax details. As we’ve seen, OOP is mostly about an argument
named self, and a search for attributes in trees of linked objects called inheritance.
Objects at the bottom of the tree inherit attributes from objects higher up in the tree
—a feature that enables us to program by customizing code, rather than changing it or
starting from scratch. When used well, this model of programming can cut develop-
ment time radically.
The next chapter will begin to fill in the coding details behind the picture painted here.
As we get deeper into Python classes, though, keep in mind that the OOP model in
Python is very simple; as we’ve seen here, it’s really just about looking up attributes in
object trees and a special function argument. Before we move on, here’s a quick quiz
to review what we’ve covered here.
Test Your Knowledge: Quiz
1. What is the main point of OOP in Python?
2. Where does an inheritance search look for an attribute?
3. What is the difference between a class object and an instance object?
4. Why is the first argument in a class’s method function special?
5. What is the __init__ method used for?
6. How do you create a class instance?
7. How do you create a class?
8. How do you specify a class’s superclasses?
Test Your Knowledge: Answers
1. OOP is about code reuse—you factor code to minimize redundancy and program
by customizing what already exists instead of changing code in place or starting
from scratch.
2. An inheritance search looks for an attribute first in the instance object, then in the
class the instance was created from, then in all higher superclasses, progressing
from the bottom to the top of the object tree, and from left to right (by default).
The search stops at the first place the attribute is found. Because the lowest version
of a name found along the way wins, class hierarchies naturally support customi-
zation by extension in new subclasses.
3. Both class and instance objects are namespaces (packages of variables that appear
as attributes). The main difference between them is that classes are a kind of factory
for creating multiple instances. Classes also support operator overloading meth-
ods, which instances inherit, and treat any functions nested in the class as methods
for processing instances.
4. The first argument in a class’s method function is special because it always receives
the instance object that is the implied subject of the method call. It’s usually called
self by convention. Because method functions always have this implied subject
and object context by default, we say they are “object-oriented” (i.e., designed to
process or change objects).
5. If the __init__ method is coded or inherited in a class, Python calls it automatically
each time an instance of that class is created. It’s known as the constructor method;
it is passed the new instance implicitly, as well as any arguments passed explicitly
to the class name. It’s also the most commonly used operator overloading method.
If no __init__ method is present, instances simply begin life as empty namespaces.
6. You create a class instance by calling the class name as though it were a function;
any arguments passed into the class name show up as arguments two and beyond
in the __init__ constructor method. The new instance remembers the class it was
created from for inheritance purposes.
7. You create a class by running a class statement; like function definitions, these
statements normally run when the enclosing module file is imported (more on this
in the next chapter).
8. You specify a class’s superclasses by listing them in parentheses in the class state-
ment, after the new class’s name. The left-to-right order in which the classes are
listed in the parentheses gives the left-to-right inheritance search order in the class
tree.
CHAPTER 27
Class Coding Basics
Now that we’ve talked about OOP in the abstract, it’s time to see how this translates
to actual code. This chapter begins to fill in the syntax details behind the class model
in Python.
If you’ve never been exposed to OOP in the past, classes can seem somewhat compli-
cated if taken in a single dose. To make class coding easier to absorb, we’ll begin our
detailed exploration of OOP by taking a first look at some basic classes in action in this
chapter. We’ll expand on the details introduced here in later chapters of this part of
the book, but in their basic form, Python classes are easy to understand.
In fact, classes have just three primary distinctions. At a base level, they are mostly just
namespaces, much like the modules we studied in Part V. Unlike modules, though,
classes also have support for generating multiple objects, for namespace inheritance,
and for operator overloading. Let’s begin our class statement tour by exploring each
of these three distinctions in turn.
Classes Generate Multiple Instance Objects
To understand how the multiple objects idea works, you have to first understand that
there are two kinds of objects in Python’s OOP model: class objects and instance ob-
jects. Class objects provide default behavior and serve as factories for instance objects.
Instance objects are the real objects your programs process—each is a namespace in
its own right, but inherits (i.e., has automatic access to) names in the class from which
it was created. Class objects come from statements, and instances come from calls; each
time you call a class, you get a new instance of that class.
This object-generation concept is very different from most of the other program con-
structs we’ve seen so far in this book. In effect, classes are essentially factories for gen-
erating multiple instances. By contrast, only one copy of each module is ever imported
into a single program. In fact, this is why reload works as it does, updating a single-
instance shared object in place. With classes, each instance can have its own, inde-
pendent data, supporting multiple versions of the object that the class models.
In this role, class instances are similar to the per-call state of the closure (a.k.a. factory)
functions of Chapter 17, but this is a natural part of the class model, and state in classes
is explicit attributes instead of implicit scope references. Moreover, this is just part of
what classes do—they also support customization by inheritance, operator overload-
ing, and multiple behaviors via methods. Generally speaking, classes are a more com-
plete programming tool, though OOP and functional programming are not mutually ex-
clusive paradigms. We may combine them by using functional tools in methods, by
coding methods that are themselves generators, by writing user-defined iterators (as
we’ll see in Chapter 30), and so on.
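
As a quick hedged illustration of that mix—the Squares class here is invented for this sketch, not one of this part’s running examples—a method may be a generator just like any other def:

>>> class Squares:
        def __init__(self, n):           # State is saved on the instance
            self.n = n
        def gen(self):                   # A method that is itself a generator
            for i in range(self.n):
                yield i ** 2

>>> list(Squares(4).gen())               # Methods support yield like normal defs
[0, 1, 4, 9]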
The following is a quick summary of the bare essentials of Python OOP in terms of its
two object types. As you’ll see, Python classes are in some ways similar to both defs
and modules, but they may be quite different from what you’re used to in other lan-
guages.
Class Objects Provide Default Behavior
When we run a class statement, we get a class object. Here’s a rundown of the main
properties of Python classes:
The class statement creates a class object and assigns it a name. Just like the
function def statement, the Python class statement is an executable statement.
When reached and run, it generates a new class object and assigns it to the name
in the class header. Also, like defs, class statements typically run when the files
they are coded in are first imported.
Assignments inside class statements make class attributes. Just like in module
files, top-level assignments within a class statement (not nested in a def) generate
attributes in a class object. Technically, the class statement defines a local scope
that morphs into the attribute namespace of the class object, just like a module’s
global scope. After running a class statement, class attributes are accessed by name
qualification: object.name.
Class attributes provide object state and behavior. Attributes of a class object
record state information and behavior to be shared by all instances created from
the class; function def statements nested inside a class generate methods, which
process instances.
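
To make these three points concrete before moving on, consider the following minimal sketch; the Config class and its attributes are invented here purely for illustration:

>>> class Config:                        # Statement runs, makes a class object
        retries = 3                      # Top-level assignment: class attribute
        def reset(self):                 # Nested def: method shared by instances
            self.count = 0

>>> Config.retries                       # Fetched by qualification: object.name
3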
Instance Objects Are Concrete Items
When we call a class object, we get an instance object. Here’s an overview of the key
points behind class instances:
Calling a class object like a function makes a new instance object. Each time
a class is called, it creates and returns a new instance object. Instances represent
concrete items in your program’s domain.
Each instance object inherits class attributes and gets its own namespace.
Instance objects created from classes are new namespaces; they start out empty
but inherit attributes that live in the class objects from which they were generated.
Assignments to attributes of self in methods make per-instance attributes.
Inside a class’s method functions, the first argument (called self by convention)
references the instance object being processed; assignments to attributes of self
create or change data in the instance, not the class.
The end result is that classes define common, shared data and behavior, and generate
instances. Instances reflect concrete application entities, and record per-instance data
that may vary per object.
A First Example
Let’s turn to a real example to show how these ideas work in practice. To begin, let’s
define a class named FirstClass by running a Python class statement interactively:
>>> class FirstClass:                    # Define a class object
        def setdata(self, value):        # Define class's methods
            self.data = value            # self is the instance
        def display(self):
            print(self.data)             # self.data: per instance
We’re working interactively here, but typically, such a statement would be run when
the module file it is coded in is imported. Like functions created with defs, this class
won’t even exist until Python reaches and runs this statement.
Like all compound statements, the class starts with a header line that lists the class
name, followed by a body of one or more nested and (usually) indented statements.
Here, the nested statements are defs; they define functions that implement the behavior
the class means to export.
As we learned in Part IV, def is really an assignment. Here, it assigns function objects
to the names setdata and display in the class statement’s scope, and so generates
attributes attached to the class—FirstClass.setdata and FirstClass.display. In fact,
any name assigned at the top level of the class’s nested block becomes an attribute of
the class.
Functions inside a class are usually called methods. They’re coded with normal defs,
and they support everything we’ve learned about functions already (they can have de-
faults, return values, yield items on request, and so on). But in a method function, the
first argument automatically receives an implied instance object when called—the sub-
ject of the call. We need to create a couple of instances to see how this works:
>>> x = FirstClass() # Make two instances
>>> y = FirstClass() # Each is a new namespace
By calling the class this way (notice the parentheses), we generate instance objects,
which are just namespaces that have access to their classes’ attributes. Properly speak-
ing, at this point, we have three objects: two instances and a class. Really, we have three
linked namespaces, as sketched in Figure 27-1. In OOP terms, we say that x “is a”
FirstClass, as is y—they both inherit names attached to the class.
The two instances start out empty but have links back to the class from which they
were generated. If we qualify an instance with the name of an attribute that lives in the
class object, Python fetches the name from the class by inheritance search (unless it
also lives in the instance):
>>> x.setdata("King Arthur") # Call methods: self is x
>>> y.setdata(3.14159) # Runs: FirstClass.setdata(y, 3.14159)
Neither x nor y has a setdata attribute of its own, so to find it, Python follows the link
from instance to class. And that’s about all there is to inheritance in Python: it happens
at attribute qualification time, and it just involves looking up names in linked objects
—here, by following the is-a links in Figure 27-1.
In the setdata function inside FirstClass, the value passed in is assigned to
self.data. Within a method, self—the name given to the leftmost argument by con-
vention—automatically refers to the instance being processed (x or y), so the assign-
ments store values in the instances’ namespaces, not the class’s; that’s how the data
names in Figure 27-1 are created.
Because classes can generate multiple instances, methods must go through the self
argument to get to the instance to be processed. When we call the class’s display
method to print self.data, we see that it’s different in each instance; on the other hand,
the name display itself is the same in x and y, as it comes (is inherited) from the class:
>>> x.display() # self.data differs in each instance
King Arthur
>>> y.display() # Runs: FirstClass.display(y)
3.14159
Notice that we stored different object types in the data member in each instance—a
string and a floating-point number. As with everything else in Python, there are no
declarations for instance attributes (sometimes called members); they spring into exis-
tence the first time they are assigned values, just like simple variables. In fact, if we were
Figure 27-1. Classes and instances are linked namespace objects in a class tree that is searched by
inheritance. Here, the “data” attribute is found in instances, but “setdata” and “display” are in the
class above them.
to call display on one of our instances before calling setdata, we would trigger an
undefined name error—the attribute named data doesn’t even exist in memory until it
is assigned within the setdata method.
As another way to appreciate how dynamic this model is, consider that we can change
instance attributes in the class itself, by assigning to self in methods, or outside the
class, by assigning to an explicit instance object:
>>> x.data = "New value" # Can get/set attributes
>>> x.display() # Outside the class too
New value
Although less common, we could even generate an entirely new attribute in the in-
stance’s namespace by assigning to its name outside the class’s method functions:
>>> x.anothername = "spam" # Can set new attributes here too!
This would attach a new attribute called anothername, which may or may not be used
by any of the class’s methods, to the instance object x. Classes usually create all of the
instance’s attributes by assignment to the self argument, but they don’t have to—
programs can fetch, change, or create attributes on any objects to which they have
references.
It usually doesn’t make sense to add data that the class cannot use, and it’s possible to
prevent this with extra “privacy” code based on attribute access operator overloading,
as we’ll discuss later in this book (see Chapter 30 and Chapter 39). Still, free attribute
access translates to less syntax, and there are cases where it’s even useful—for example,
in coding data records of the sort we’ll see later in this chapter.
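
As a preview, such privacy code might be sketched with the __setattr__ attribute-assignment hook we’ll study later; the following is one hypothetical coding among several, not a canonical recipe:

>>> class Private:
        allowed = ['data']                       # Hypothetical attribute whitelist
        def __setattr__(self, name, value):      # Runs on every attribute assignment
            if name not in self.allowed:
                raise AttributeError(name + ' not allowed')
            self.__dict__[name] = value          # Assign via __dict__ to avoid a loop

>>> p = Private()
>>> p.data = 'spam'                              # OK: name is in the allowed list
>>> p.other = 99                                 # Fails with an AttributeError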
Classes Are Customized by Inheritance
Let’s move on to the second major distinction of classes. Besides serving as factories
for generating multiple instance objects, classes also allow us to make changes by in-
troducing new components (called subclasses), instead of changing existing compo-
nents in place.
As we’ve seen, instance objects generated from a class inherit the class’s attributes.
Python also allows classes to inherit from other classes, opening the door to coding
hierarchies of classes that specialize behavior—by redefining attributes in subclasses
that appear lower in the hierarchy, we override the more general definitions of those
attributes higher in the tree. In effect, the further down the hierarchy we go, the more
specific the software becomes. Here, too, there is no parallel with modules, whose
attributes live in a single, flat namespace that is not as amenable to customization.
In Python, instances inherit from classes, and classes inherit from superclasses. Here
are the key ideas behind the machinery of attribute inheritance:
Superclasses are listed in parentheses in a class header. To make a class inherit
attributes from another class, just list the other class in parentheses in the new
class statement’s header line. The class that inherits is usually called a subclass,
and the class that is inherited from is its superclass.
Classes inherit attributes from their superclasses. Just as instances inherit the
attribute names defined in their classes, classes inherit all of the attribute names
defined in their superclasses; Python finds them automatically when they’re ac-
cessed, if they don’t exist in the subclasses.
Instances inherit attributes from all accessible classes. Each instance gets
names from the class it’s generated from, as well as all of that class’s superclasses.
When looking for a name, Python checks the instance, then its class, then all su-
perclasses.
Each object.attribute reference invokes a new, independent search. Python
performs an independent search of the class tree for each attribute fetch expression.
This includes references to instances and classes made outside class statements
(e.g., X.attr), as well as references to attributes of the self instance argument in a
class’s method functions. Each self.attr expression in a method invokes a new
search for attr in self and above.
Logic changes are made by subclassing, not by changing superclasses. By
redefining superclass names in subclasses lower in the hierarchy (class tree), sub-
classes replace and thus customize inherited behavior.
The net effect—and the main purpose of all this searching—is that classes support
factoring and customization of code better than any other language tool we’ve seen so
far. On the one hand, they allow us to minimize code redundancy (and so reduce
maintenance costs) by factoring operations into a single, shared implementation; on
the other, they allow us to program by customizing what already exists, rather than
changing it in place or starting from scratch.
Strictly speaking, Python’s inheritance is a bit richer than described here,
when we factor in new-style descriptors and metaclasses—advanced
topics we’ll study later—but we can safely restrict our scope to instances
and their classes, both at this point in the book and in most Python
application code. We’ll define inheritance formally in Chapter 40.
A Second Example
To illustrate the role of inheritance, this next example builds on the previous one. First,
we’ll define a new class, SecondClass, that inherits all of FirstClass’s names and pro-
vides one of its own:
>>> class SecondClass(FirstClass):       # Inherits setdata
        def display(self):               # Changes display
            print('Current value = "%s"' % self.data)
SecondClass defines the display method to print with a different format. By defining
an attribute with the same name as an attribute in FirstClass, SecondClass effectively
replaces the display attribute in its superclass.
Recall that inheritance searches proceed upward from instances to subclasses to su-
perclasses, stopping at the first appearance of the attribute name that it finds. In this
case, since the display name in SecondClass will be found before the one in First
Class, we say that SecondClass overrides FirstClass’s display. Sometimes we call this
act of replacing attributes by redefining them lower in the tree overloading.
The net effect here is that SecondClass specializes FirstClass by changing the behavior
of the display method. On the other hand, SecondClass (and any instances created from
it) still inherits the setdata method in FirstClass verbatim. Let’s make an instance to
demonstrate:
>>> z = SecondClass()
>>> z.setdata(42) # Finds setdata in FirstClass
>>> z.display() # Finds overridden method in SecondClass
Current value = "42"
As before, we make a SecondClass instance object by calling it. The setdata call still
runs the version in FirstClass, but this time the display attribute comes from Second
Class and prints a custom message. Figure 27-2 sketches the namespaces involved.
Now, here’s a crucial thing to notice about OOP: the specialization introduced in
SecondClass is completely external to FirstClass. That is, it doesn’t affect existing or
future FirstClass objects, like the x from the prior example:
>>> x.display() # x is still a FirstClass instance (old message)
New value
Rather than changing FirstClass, we customized it. Naturally, this is an artificial ex-
ample, but as a rule, because inheritance allows us to make changes like this in external
components (i.e., in subclasses), classes often support extension and reuse better than
functions or modules can.
Figure 27-2. Specialization: overriding inherited names by redefining them in extensions lower in the
class tree. Here, SecondClass redefines and so customizes the “display” method for its instances.
Classes Are Attributes in Modules
Before we move on, remember that there’s nothing magic about a class name. It’s just
a variable assigned to an object when the class statement runs, and the object can be
referenced with any normal expression. For instance, if our FirstClass were coded in
a module file instead of being typed interactively, we could import it and use its name
normally in a class header line:
from modulename import FirstClass           # Copy name into my scope
class SecondClass(FirstClass):              # Use class name directly
    def display(self): ...
Or, equivalently:
import modulename                           # Access the whole module
class SecondClass(modulename.FirstClass):   # Qualify to reference
    def display(self): ...
Like everything else, class names always live within a module, so they must follow all
the rules we studied in Part V. For example, more than one class can be coded in a
single module file—like other statements in a module, class statements are run during
imports to define names, and these names become distinct module attributes. More
generally, each module may arbitrarily mix any number of variables, functions, and
classes, and all names in a module behave the same way. The file food.py demonstrates:
# food.py
var = 1 # food.var
def func(): ... # food.func
class spam: ... # food.spam
class ham: ... # food.ham
class eggs: ... # food.eggs
This holds true even if the module and class happen to have the same name. For ex-
ample, given the following file, person.py:
class person: ...
we need to go through the module to fetch the class as usual:
import person # Import module
x = person.person() # Class within module
Although this path may look redundant, it’s required: person.person refers to the per
son class inside the person module. Saying just person gets the module, not the class,
unless the from statement is used:
from person import person # Get class from module
x = person() # Use class name
As with any other variable, we can never see a class in a file without first importing and
somehow fetching it from its enclosing file. If this seems confusing, don’t use the same
name for a module and a class within it. In fact, common convention in Python dictates
that class names should begin with an uppercase letter, to help make them more distinct:
import person # Lowercase for modules
x = person.Person() # Uppercase for classes
Also, keep in mind that although classes and modules are both namespaces for attach-
ing attributes, they correspond to very different source code structures: a module re-
flects an entire file, but a class is a statement within a file. We’ll say more about such
distinctions later in this part of the book.
Classes Can Intercept Python Operators
Let’s move on to the third and final major difference between classes and modules:
operator overloading. In simple terms, operator overloading lets objects coded with
classes intercept and respond to operations that work on built-in types: addition, slic-
ing, printing, qualification, and so on. It’s mostly just an automatic dispatch mechanism
—expressions and other built-in operations route control to implementations in
classes. Here, too, there is nothing similar in modules: modules can implement function
calls, but not the behavior of expressions.
Although we could implement all class behavior as method functions, operator over-
loading lets objects be more tightly integrated with Python’s object model. Moreover,
because operator overloading makes our own objects act like built-ins, it tends to foster
object interfaces that are more consistent and easier to learn, and it allows class-based
objects to be processed by code written to expect a built-in type’s interface. Here is a
quick rundown of the main ideas behind overloading operators:
Methods named with double underscores (__X__) are special hooks. In Python
classes we implement operator overloading by providing specially named methods
to intercept operations. The Python language defines a fixed and unchangeable
mapping from each of these operations to a specially named method.
Such methods are called automatically when instances appear in built-in
operations. For instance, if an instance object inherits an __add__ method, that
method is called whenever the object appears in a + expression. The method’s
return value becomes the result of the corresponding expression.
Classes may override most built-in type operations. There are dozens of special
operator overloading method names for intercepting and implementing nearly
every operation available for built-in types. This includes expressions, but also
basic operations like printing and object creation.
There are no defaults for operator overloading methods, and none are re-
quired. If a class does not define or inherit an operator overloading method, it just
means that the corresponding operation is not supported for the class’s instances.
If there is no __add__, for example, + expressions raise exceptions.
New-style classes have some defaults, but not for common operations. In
Python 3.X, and so-called “new style” classes in 2.X that we’ll define later, a root
class named object does provide defaults for some __X__ methods, but not for
many, and not for most commonly used operations.
Operators allow classes to integrate with Python’s object model. By over-
loading type operations, the user-defined objects we implement with classes can
act just like built-ins, and so provide consistency as well as compatibility with
expected interfaces.
Operator overloading is an optional feature; it’s used primarily by people developing
tools for other Python programmers, not by application developers. And, candidly, you
probably shouldn’t use it just because it seems clever or “cool.” Unless a class needs to
mimic built-in type interfaces, it should usually stick to simpler named methods. Why
would an employee database application support expressions like * and +, for example?
Named methods like giveRaise and promote would usually make more sense.
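
To illustrate, a simple employee class of that sort might read as follows—a hedged sketch with invented names and numbers, though Chapter 28 develops a fuller version of this idea:

>>> class Employee:
        def __init__(self, pay):
            self.pay = pay
        def giveRaise(self, amount):     # A named method: clearer than overloading +
            self.pay += amount

>>> e = Employee(1000)
>>> e.giveRaise(100)                     # Reads as intent, not arithmetic
>>> e.pay
1100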
Because of this, we won’t go into details on every operator overloading method available
in Python in this book. Still, there is one operator overloading method you are likely
to see in almost every realistic Python class: the __init__ method, which is known as
the constructor method and is used to initialize objects’ state. You should pay special
attention to this method, because __init__, along with the self argument, turns out
to be a key requirement to reading and understanding most OOP code in Python.
A Third Example
On to another example. This time, we’ll define a subclass of the prior section’s Second
Class that implements three specially named attributes that Python will call automat-
ically:
__init__ is run when a new instance object is created: self is the new ThirdClass
object.1
__add__ is run when a ThirdClass instance appears in a + expression.
__str__ is run when an object is printed (technically, when it’s converted to its
print string by the str built-in function or its Python internals equivalent).
Our new subclass also defines a normally named method called mul, which changes the
instance object in place. Here’s the new subclass:
>>> class ThirdClass(SecondClass):               # Inherit from SecondClass
        def __init__(self, value):               # On "ThirdClass(value)"
            self.data = value
        def __add__(self, other):                # On "self + other"
            return ThirdClass(self.data + other)
        def __str__(self):                       # On "print(self)", "str()"
            return '[ThirdClass: %s]' % self.data
        def mul(self, other):                    # In-place change: named
            self.data *= other

>>> a = ThirdClass('abc')                # __init__ called
>>> a.display()                          # Inherited method called
Current value = "abc"
>>> print(a)                             # __str__: returns display string
[ThirdClass: abc]

>>> b = a + 'xyz'                        # __add__: makes a new instance
>>> b.display()                          # b has all ThirdClass methods
Current value = "abcxyz"
>>> print(b)                             # __str__: returns display string
[ThirdClass: abcxyz]

>>> a.mul(3)                             # mul: changes instance in place
>>> print(a)
[ThirdClass: abcabcabc]

1. Not to be confused with the __init__.py files in module packages! The method here is a class constructor
function used to initialize the newly created instance, not a module package. See Chapter 24 for more
details.
ThirdClass “is a” SecondClass, so its instances inherit the customized display method
from SecondClass of the preceding section. This time, though, ThirdClass creation calls
pass an argument (e.g., “abc”). This argument is passed to the value argument in the
__init__ constructor and assigned to self.data there. The net effect is that Third
Class arranges to set the data attribute automatically at construction time, instead of
requiring setdata calls after the fact.
Further, ThirdClass objects can now show up in + expressions and print calls. For +,
Python passes the instance object on the left to the self argument in __add__ and the
value on the right to other, as illustrated in Figure 27-3; whatever __add__ returns be-
comes the result of the + expression (more on its result in a moment).
For print, Python passes the object being printed to self in __str__; whatever string
this method returns is taken to be the print string for the object. With __str__ (or its
more broadly relevant twin __repr__, which we’ll meet and use in the next chapter),
we can use a normal print to display objects of this class, instead of calling the special
display method.
Figure 27-3. In operator overloading, expression operators and other built-in operations performed
on class instances are mapped back to specially named methods in the class. These special methods
are optional and may be inherited as usual. Here, a + expression triggers the __add__ method.
Specially named methods such as __init__, __add__, and __str__ are inherited by sub-
classes and instances, just like any other names assigned in a class. If they’re not coded
in a class, Python looks for such names in all its superclasses, as usual. Operator over-
loading method names are also not built-in or reserved words; they are just attributes
that Python looks for when objects appear in various contexts. Python usually calls
them automatically, but they may occasionally be called by your code as well. For
example, the __init__ method is often called manually to trigger initialization steps in
a superclass, as we’ll see in the next chapter.
Returning results, or not
Some operator overloading methods like __str__ require results, but others are more
flexible. For example, notice how the __add__ method makes and returns a new instance
object of its class, by calling ThirdClass with the result value—which in turn triggers
__init__ to initialize the result. This is a common convention, and explains why b in
the listing has a display method; it’s a ThirdClass object too, because that’s what +
returns for this class’s objects. This essentially propagates the type.
By contrast, mul changes the current instance object in place, by reassigning an
attribute of self. We could overload the * expression to do the latter, but this would be too
different from the behavior of * for built-in types such as numbers and strings, for which
it always makes new objects. Common practice dictates that overloaded operators
should work the same way that built-in operator implementations do. Because operator
overloading is really just an expression-to-method dispatch mechanism, though, you
can interpret operators any way you like in your own class objects.
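
For instance, if we did choose to overload *, convention suggests making and returning a new object rather than changing self in place, to mimic built-in behavior. The following hypothetical variant—not part of the running example’s final code—shows one way to do so:

>>> class ThirdClassX(ThirdClass):               # Invented subclass for this sketch
        def __mul__(self, other):                # On "self * other"
            return ThirdClassX(self.data * other)    # New object, like built-ins

>>> c = ThirdClassX('abc')
>>> print(c * 3)                                 # __mul__ makes and returns a new instance
[ThirdClass: abcabcabc]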
Why Use Operator Overloading?
As a class designer, you can choose to use operator overloading or not. Your choice
simply depends on how much you want your object to look and feel like built-in types.
As mentioned earlier, if you omit an operator overloading method and do not inherit
it from a superclass, the corresponding operation will not be supported for your in-
stances; if it’s attempted, an exception will be raised (or, in some cases like printing, a
standard default will be used).
Frankly, many operator overloading methods tend to be used only when you are im-
plementing objects that are mathematical in nature; a vector or matrix class may over-
load the addition operator, for example, but an employee class likely would not. For
simpler classes, you might not use overloading at all, and would rely instead on explicit
method calls to implement your objects’ behavior.
On the other hand, you might decide to use operator overloading if you need to pass
a user-defined object to a function that was coded to expect the operators available on
a built-in type like a list or a dictionary. Implementing the same operator set in your
class will ensure that your objects support the same expected object interface and so
are compatible with the function. Although we won’t cover every operator overloading
method in this book, we’ll survey additional common operator overloading techniques
in action in Chapter 30.
One overloading method we will use often here is the __init__ constructor method,
used to initialize newly created instance objects, and present in almost every realistic
class. Because it allows classes to fill out the attributes in their new instances immedi-
ately, the constructor is useful for almost every kind of class you might code. In fact,
even though instance attributes are not declared in Python, you can usually find out
which attributes an instance will have by inspecting its class’s __init__ method.
Of course, there’s nothing wrong with experimenting with interesting language tools,
but they don’t always translate to production code. With time and experience, you’ll
find these programming patterns and guidelines to be natural and nearly automatic.
The World’s Simplest Python Class
We’ve begun studying class statement syntax in detail in this chapter, but I’d again
like to remind you that the basic inheritance model that classes produce is very simple
—all it really involves is searching for attributes in trees of linked objects. In fact, we
can create a class with nothing in it at all. The following statement makes a class with
no attributes attached, an empty namespace object:
>>> class rec: pass # Empty namespace object
We need the no-operation pass placeholder statement (discussed in Chapter 13) here
because we don’t have any methods to code. After we make the class by running this
statement interactively, we can start attaching attributes to the class by assigning names
to it completely outside of the original class statement:
>>> rec.name = 'Bob' # Just objects with attributes
>>> rec.age = 40
And, after we’ve created these attributes by assignment, we can fetch them with the
usual syntax. When used this way, a class is roughly similar to a “struct” in C, or a
“record” in Pascal. It’s basically an object with field names attached to it (as we’ll see
ahead, doing similar with dictionary keys requires extra characters):
>>> print(rec.name) # Like a C struct or a record
Bob
Notice that this works even though there are no instances of the class yet; classes are
objects in their own right, even without instances. In fact, they are just self-contained
namespaces; as long as we have a reference to a class, we can set or change its attributes
anytime we wish. Watch what happens when we do create two instances, though:
>>> x = rec() # Instances inherit class names
>>> y = rec()
These instances begin their lives as completely empty namespace objects. Because they
remember the class from which they were made, though, they will obtain the attributes
we attached to the class by inheritance:
>>> x.name, y.name # name is stored on the class only
('Bob', 'Bob')
Really, these instances have no attributes of their own; they simply fetch the name at-
tribute from the class object where it is stored. If we do assign an attribute to an instance,
though, it creates (or changes) the attribute in that object, and no other—crucially,
attribute references kick off inheritance searches, but attribute assignments affect only
the objects in which the assignments are made. Here, this means that x gets its own
name, but y still inherits the name attached to the class above it:
>>> x.name = 'Sue' # But assignment changes x only
>>> rec.name, x.name, y.name
('Bob', 'Sue', 'Bob')
In fact, as we’ll explore in more detail in Chapter 29, the attributes of a namespace
object are usually implemented as dictionaries, and class inheritance trees are (generally
speaking) just dictionaries with links to other dictionaries. If you know where to look,
you can see this explicitly.
For example, the __dict__ attribute is the namespace dictionary for most class-based
objects. Some classes may also (or instead) define attributes in __slots__, an advanced
and seldom-used feature that we’ll note in Chapter 28, but largely postpone until
Chapter 31 and Chapter 32. Normally, __dict__ literally is an instance’s attribute
namespace.
To illustrate, the following was run in Python 3.3; the order of names and set of
__X__ internal names present can vary from release to release, and we filter out built-
ins with a generator expression as we’ve done before, but the names we assigned are
present in all:
>>> list(rec.__dict__.keys())
['age', '__module__', '__qualname__', '__weakref__', 'name', '__dict__', '__doc__']
>>> list(name for name in rec.__dict__ if not name.startswith('__'))
['age', 'name']
>>> list(x.__dict__.keys())
['name']
>>> list(y.__dict__.keys()) # list() not required in Python 2.X
[]
Here, the class’s namespace dictionary shows the name and age attributes we assigned
to it, x has its own name, and y is still empty. Because of this model, an attribute can
often be fetched by either dictionary indexing or attribute notation, but only if it’s
present on the object in question—attribute notation kicks off inheritance search, but
indexing looks in the single object only (as we’ll see later, both have valid roles):
>>> x.name, x.__dict__['name'] # Attributes present here are dict keys
('Sue', 'Sue')
>>> x.age # But attribute fetch checks classes too
40
>>> x.__dict__['age'] # Indexing dict does not do inheritance
KeyError: 'age'
To facilitate inheritance search on attribute fetches, each instance has a link to its class
that Python creates for us—it’s called __class__, if you want to inspect it:
>>> x.__class__ # Instance to class link
<class '__main__.rec'>
Classes also have a __bases__ attribute, which is a tuple of references to their superclass
objects—in this example just the implied object root class in Python 3.X we’ll explore
later (you’ll get an empty tuple in 2.X instead):
>>> rec.__bases__ # Class to superclasses link, () in 2.X
(<class 'object'>,)
These two attributes are how class trees are literally represented in memory by Python.
Internal details like these are not required knowledge—class trees are implied by the
code you run, and their search is normally automatic—but they can often help demys-
tify the model.
The main point to take away from this look under the hood is that Python’s class model
is extremely dynamic. Classes and instances are just namespace objects, with attributes
created on the fly by assignment. Those assignments usually happen within the class
statements you code, but they can occur anywhere you have a reference to one of the
objects in the tree.
Even methods, normally created by a def nested in a class, can be created completely
independently of any class object. The following, for example, defines a simple function
outside of any class that takes one argument:
>>> def uppername(obj):
        return obj.name.upper()          # Still needs a self argument (obj)
There is nothing about a class here yet—it’s a simple function, and it can be called as
such at this point, provided we pass in an object obj with a name attribute, whose value
in turn has an upper method—our class instances happen to fit the expected interface,
and kick off string uppercase conversion:
>>> uppername(x) # Call as a simple function
'SUE'
If we assign this simple function to an attribute of our class, though, it becomes a
method, callable through any instance, as well as through the class name itself as long
as we pass in an instance manually—a technique we’ll leverage further in the next
chapter:2
>>> rec.method = uppername # Now it's a class's method!
>>> x.method() # Run method to process x
'SUE'
>>> y.method() # Same, but pass y to self
'BOB'
>>> rec.method(x) # Can call through instance or class
'SUE'
Normally, classes are filled out by class statements, and instance attributes are created
by assignments to self attributes in method functions. The point again, though, is that
they don’t have to be; OOP in Python really is mostly about looking up attributes in
linked namespace objects.
Records Revisited: Classes Versus Dictionaries
Although the simple classes of the prior section are meant to illustrate class model
basics, the techniques they employ can also be used for real work. For example, Chap-
ter 8 and Chapter 9 showed how to use dictionaries, tuples, and lists to record properties
of entities in our programs, generically called records. It turns out that classes can often
serve better in this role—they package information like dictionaries, but can also bun-
dle processing logic in the form of methods. For reference, here is an example for tuple-
and dictionary-based records we used earlier in the book (using one of many dictionary
coding techniques):
>>> rec = ('Bob', 40.5, ['dev', 'mgr']) # Tuple-based record
>>> print(rec[0])
Bob
>>> rec = {}
>>> rec['name'] = 'Bob' # Dictionary-based record
>>> rec['age'] = 40.5 # Or {...}, dict(n=v), etc.
>>> rec['jobs'] = ['dev', 'mgr']
>>>
>>> print(rec['name'])
Bob
This code emulates tools like records in other languages. As we just saw, though, there
are also multiple ways to do the same with classes. Perhaps the simplest is this—trading
keys for attributes:
>>> class rec: pass
>>> rec.name = 'Bob'                 # Class-based record
>>> rec.age = 40.5
>>> rec.jobs = ['dev', 'mgr']
>>>
>>> print(rec.name)
Bob

2. In fact, this is one of the reasons the self argument must always be explicit in Python methods—because
methods can be created as simple functions independent of a class, they need to make the implied instance
argument explicit. They can be called as either functions or methods, and Python can neither guess nor
assume that a simple function might eventually become a class’s method. The main reason for the explicit
self argument, though, is to make the meanings of names more obvious: names not referenced through
self are simple variables mapped to scopes, while names referenced through self with attribute notation
are obviously instance attributes.
This code has substantially less syntax than the dictionary equivalent. It uses an empty
class statement to generate an empty namespace object. Once we make the empty
class, we fill it out by assigning class attributes over time, as before.
This works, but a new class statement will be required for each distinct record we will
need. Perhaps more typically, we can instead generate instances of an empty class to
represent each distinct entity:
>>> class rec: pass
>>> pers1 = rec() # Instance-based records
>>> pers1.name = 'Bob'
>>> pers1.jobs = ['dev', 'mgr']
>>> pers1.age = 40.5
>>>
>>> pers2 = rec()
>>> pers2.name = 'Sue'
>>> pers2.jobs = ['dev', 'cto']
>>>
>>> pers1.name, pers2.name
('Bob', 'Sue')
Here, we make two records from the same class. Instances start out life empty, just like
classes. We then fill in the records by assigning to attributes. This time, though, there
are two separate objects, and hence two separate name attributes. In fact, instances of
the same class don’t even have to have the same set of attribute names; in this example,
one has a unique age name. Instances really are distinct namespaces, so each has a
distinct attribute dictionary. Although they are normally filled out consistently by a
class’s methods, they are more flexible than you might expect.
Finally, we might instead code a more full-blown class to implement the record and its
processing—something that data-oriented dictionaries do not directly support:
>>> class Person:
        def __init__(self, name, jobs, age=None):    # class = data + logic
            self.name = name
            self.jobs = jobs
            self.age = age
        def info(self):
            return (self.name, self.jobs)
>>> rec1 = Person('Bob', ['dev', 'mgr'], 40.5) # Construction calls
>>> rec2 = Person('Sue', ['dev', 'cto'])
>>>
>>> rec1.jobs, rec2.info() # Attributes + methods
(['dev', 'mgr'], ('Sue', ['dev', 'cto']))
This scheme also makes multiple instances, but the class is not empty this time: we’ve
added logic (methods) to initialize instances at construction time and collect attributes
into a tuple on request. The constructor imposes some consistency on instances here
by always setting the name, jobs, and age attributes, even though the latter can be omitted
when an object is made. Together, the class’s methods and instance attributes create a
package, which combines both data and logic.
We could further extend this code by adding logic to compute salaries, parse names,
and so on. Ultimately, we might link the class into a larger hierarchy to inherit and
customize an existing set of methods via the automatic attribute search of classes, or
perhaps even store instances of the class in a file with Python object pickling to make
them persistent. In fact, we will—in the next chapter, we’ll expand on this analogy
between classes and records with a more realistic running example that demonstrates
class basics in action.
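
As a small taste of that last idea, instances like these can already be saved to a file with the standard library’s pickle module—a minimal hedged sketch with a made-up filename; the richer shelve variant appears in the next chapter:

>>> import pickle
>>> with open('persons.pkl', 'wb') as f:         # Hypothetical file name
        pickle.dump([rec1, rec2], f)             # Store instance objects in a file

>>> with open('persons.pkl', 'rb') as f:
        loaded = pickle.load(f)

>>> loaded[0].name                               # Rebuilt with attributes intact
'Bob'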
To be fair to other tools, in this form, the two class construction calls above more closely
resemble dictionaries made all at once, but still seem less cluttered and provide extra
processing methods. In fact, the class’s construction calls more closely resemble Chap-
ter 9’s named tuples—which makes sense, given that named tuples really are classes
with extra logic to map attributes to tuple offsets:
>>> rec = dict(name='Bob', age=40.5, jobs=['dev', 'mgr']) # Dictionaries
>>> rec = {'name': 'Bob', 'age': 40.5, 'jobs': ['dev', 'mgr']}
>>> rec = Rec('Bob', 40.5, ['dev', 'mgr']) # Named tuples
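
The Rec name in that last line assumes a named tuple built per Chapter 9; a minimal reconstruction with the standard library might look like this, with field names inferred from the call above:

>>> from collections import namedtuple
>>> Rec = namedtuple('Rec', ['name', 'age', 'jobs'])     # Assumed definition
>>> rec = Rec('Bob', 40.5, ['dev', 'mgr'])
>>> rec.name, rec[0]                                      # Attribute or position access
('Bob', 'Bob')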
In the end, although types like dictionaries and tuples are flexible, classes allow us to
add behavior to objects in ways that built-in types and simple functions do not directly
support. Although we can store functions in dictionaries, too, using them to process
implied instances is nowhere near as natural and structured as it is in classes. To see
this more clearly, let’s move ahead to the next chapter.
Chapter Summary
This chapter introduced the basics of coding classes in Python. We studied the syntax
of the class statement, and we saw how to use it to build up a class inheritance tree.
We also studied how Python automatically fills in the first argument in method func-
tions, how attributes are attached to objects in a class tree by simple assignment, and
how specially named operator overloading methods intercept and implement built-in
operations for our instances (e.g., expressions and printing).
Now that we’ve learned all about the mechanics of coding classes in Python, the next
chapter turns to a larger and more realistic example that ties together much of what
we’ve learned about OOP so far, and introduces some new topics. After that, we’ll
continue our look at class coding, taking a second pass over the model to fill in some
of the details that were omitted here to keep things simple. First, though, let’s work
through a quiz to review the basics we’ve covered so far.
Test Your Knowledge: Quiz
1. How are classes related to modules?
2. How are instances and classes created?
3. Where and how are class attributes created?
4. Where and how are instance attributes created?
5. What does self mean in a Python class?
6. How is operator overloading coded in a Python class?
7. When might you want to support operator overloading in your classes?
8. Which operator overloading method is most commonly used?
9. What are two key concepts required to understand Python OOP code?
Test Your Knowledge: Answers
1. Classes are always nested inside a module; they are attributes of a module object.
Classes and modules are both namespaces, but classes correspond to statements
(not entire files) and support the OOP notions of multiple instances, inheritance,
and operator overloading (modules do not). In a sense, a module is like a single-
instance class, without inheritance, which corresponds to an entire file of code.
2. Classes are made by running class statements; instances are created by calling a
class as though it were a function.
3. Class attributes are created by assigning attributes to a class object. They are nor-
mally generated by top-level assignments nested in a class statement—each name
assigned in the class statement block becomes an attribute of the class object
(technically, the class statement’s local scope morphs into the class object’s at-
tribute namespace, much like a module). Class attributes can also be created,
though, by assigning attributes to the class anywhere a reference to the class object
exists—even outside the class statement.
4. Instance attributes are created by assigning attributes to an instance object. They
are normally created within a class’s method functions coded inside the class
statement, by assigning attributes to the self argument (which is always the im-
plied instance). Again, though, they may be created by assignment anywhere a
reference to the instance appears, even outside the class statement. Normally, all
instance attributes are initialized in the __init__ constructor method; that way,
later method calls can assume the attributes already exist.
5. self is the name commonly given to the first (leftmost) argument in a class’s
method function; Python automatically fills it in with the instance object that is
the implied subject of the method call. This argument need not be called self
(though this is a very strong convention); its position is what is significant. (Ex-
C++ or Java programmers might prefer to call it this because in those languages
that name reflects the same idea; in Python, though, this argument must always be
explicit.)
6. Operator overloading is coded in a Python class with specially named methods;
they all begin and end with double underscores to make them unique. These are
not built-in or reserved names; Python just runs them automatically when an in-
stance appears in the corresponding operation. Python itself defines the mappings
from operations to special method names.
7. Operator overloading is useful to implement objects that resemble built-in types
(e.g., sequences or numeric objects such as matrixes), and to mimic the built-in
type interface expected by a piece of code. Mimicking built-in type interfaces en-
ables you to pass in class instances that also have state information (i.e., attributes
that remember data between operation calls). You shouldn’t use operator over-
loading when a simple named method will suffice, though.
8. The __init__ constructor method is the most commonly used; almost every class
uses this method to set initial values for instance attributes and perform other
startup tasks.
9. The special self argument in method functions and the __init__ constructor
method are the two cornerstones of OOP code in Python; if you get these, you
should be able to read the text of most OOP Python code—apart from these, it’s
largely just packages of functions. The inheritance search matters too, of course,
but self represents the automatic object argument, and __init__ is widespread.
CHAPTER 28
A More Realistic Example
We’ll dig into more class syntax details in the next chapter. Before we do, though, I’d
like to show you a more realistic example of classes in action that’s more practical than
what we’ve seen so far. In this chapter, we’re going to build a set of classes that do
something more concrete—recording and processing information about people. As
you’ll see, what we call instances and classes in Python programming can often serve
the same roles as records and programs in more traditional terms.
Specifically, in this chapter we’re going to code two classes:
Person—a class that creates and processes information about people
Manager—a customization of Person that modifies inherited behavior
Along the way, we’ll make instances of both classes and test out their functionality.
When we’re done, I’ll show you a nice example use case for classes—we’ll store our
instances in a shelve object-oriented database, to make them permanent. That way, you
can use this code as a template for fleshing out a full-blown personal database written
entirely in Python.
Besides actual utility, though, our aim here is also educational: this chapter provides a
tutorial on object-oriented programming in Python. Often, people grasp the last chap-
ter’s class syntax on paper, but have trouble seeing how to get started when confronted
with having to code a new class from scratch. Toward this end, we’ll take it one step
at a time here, to help you learn the basics; we’ll build up the classes gradually, so you
can see how their features come together in complete programs.
In the end, our classes will still be relatively small in terms of code, but they will
demonstrate all of the main ideas in Python’s OOP model. Despite its syntax details,
Python’s class system really is largely just a matter of searching for an attribute in a tree
of objects, along with a special first argument for functions.
Step 1: Making Instances
OK, so much for the design phase—let’s move on to implementation. Our first task is
to start coding the main class, Person. In your favorite text editor, open a new file for
the code we’ll be writing. It’s a fairly strong convention in Python to begin module
names with a lowercase letter and class names with an uppercase letter; like the name
of self arguments in methods, this is not required by the language, but it’s so common
that deviating might be confusing to people who later read your code. To conform,
we’ll call our new module file person.py and our class within it Person, like this:
# File person.py (start)
class Person: # Start a class
All our work will be done in this file until later in this chapter. We can code any number
of functions and classes in a single module file in Python, and this one’s person.py name
might not make much sense if we add unrelated components to it later. For now, we’ll
assume everything in it will be Person-related. It probably should be anyhow—as we’ve
learned, modules tend to work best when they have a single, cohesive purpose.
Coding Constructors
Now, the first thing we want to do with our Person class is record basic information
about people—to fill out record fields, if you will. Of course, these are known as in-
stance object attributes in Python-speak, and they generally are created by assignment
to self attributes in a class’s method functions. The normal way to give instance at-
tributes their first values is to assign them to self in the __init__ constructor method,
which contains code run automatically by Python each time an instance is created. Let’s
add one to our class:
# Add record field initialization
class Person:
    def __init__(self, name, job, pay):      # Constructor takes three arguments
        self.name = name                     # Fill out fields when created
        self.job = job                       # self is the new instance object
        self.pay = pay
This is a very common coding pattern: we pass in the data to be attached to an instance
as arguments to the constructor method and assign them to self to retain them per-
manently. In OO terms, self is the newly created instance object, and name, job, and
pay become state information—descriptive data saved on an object for later use. Al-
though other techniques (such as enclosing scope reference closures) can save details,
too, instance attributes make this very explicit and easy to understand.
Notice that the argument names appear twice here. This code might even seem a bit
redundant at first, but it’s not. The job argument, for example, is a local variable in the
scope of the __init__ function, but self.job is an attribute of the instance that’s the
implied subject of the method call. They are two different variables, which happen to
have the same name. By assigning the job local to the self.job attribute with
self.job=job, we save the passed-in job on the instance for later use. As usual in Python,
where a name is assigned, or what object it is assigned to, determines what it means.
Speaking of arguments, there’s really nothing magical about __init__, apart from the
fact that it’s called automatically when an instance is made and has a special first ar-
gument. Despite its weird name, it’s a normal function and supports all the features of
functions we’ve already covered. We can, for example, provide defaults for some of its
arguments, so they need not be provided in cases where their values aren’t available or
useful.
To demonstrate, let’s make the job argument optional—it will default to None, meaning
the person being created is not (currently) employed. If job defaults to None, we’ll
probably want to default pay to 0, too, for consistency (unless some of the people you
know manage to get paid without having jobs!). In fact, we have to specify a default
for pay because, per Python’s syntax rules covered in Chapter 18, any arguments in
a function’s header after the first default must all have defaults, too:
# Add defaults for constructor arguments
class Person:
    def __init__(self, name, job=None, pay=0):   # Normal function args
        self.name = name
        self.job = job
        self.pay = pay
What this code means is that we’ll need to pass in a name when making Persons, but
job and pay are now optional; they’ll default to None and 0 if omitted. The self argu-
ment, as usual, is filled in by Python automatically to refer to the instance object—
assigning values to attributes of self attaches them to the new instance.
Testing As You Go
This class doesn’t do much yet—it essentially just fills out the fields of a new record—
but it’s a real working class. At this point we could add more code to it for more features,
but we won’t do that yet. As you’ve probably begun to appreciate already, programming
in Python is really a matter of incremental prototyping—you write some code, test it,
write more code, test again, and so on. Because Python provides both an interactive
session and nearly immediate turnaround after code changes, it’s more natural to test
as you go than to write a huge amount of code to test all at once.
Before adding more features, then, let’s test what we’ve got so far by making a few
instances of our class and displaying their attributes as created by the constructor. We
could do this interactively, but as you’ve also probably surmised by now, interactive
testing has its limits—it gets tedious to have to reimport modules and retype test cases
each time you start a new testing session. More commonly, Python programmers use
the interactive prompt for simple one-off tests but do more substantial testing by writing
code at the bottom of the file that contains the objects to be tested, like this:
# Add incremental self-test code
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay

bob = Person('Bob Smith')                            # Test the class
sue = Person('Sue Jones', job='dev', pay=100000)     # Runs __init__ automatically
print(bob.name, bob.pay)                             # Fetch attached attributes
print(sue.name, sue.pay)                             # sue's and bob's attrs differ
Notice here that the bob object accepts the defaults for job and pay, but sue provides
values explicitly. Also note how we use keyword arguments when making sue; we could
pass by position instead, but the keywords may help remind us later what the data is,
and they allow us to pass the arguments in any left-to-right order we like. Again, despite
its unusual name, __init__ is a normal function, supporting everything you already
know about functions—including both defaults and pass-by-name keyword argu-
ments.
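
For instance, either of the following calls builds the same sue record—a quick aside here, not a line in the file itself:

>>> sue = Person('Sue Jones', 'dev', 100000)             # By position
>>> sue = Person('Sue Jones', pay=100000, job='dev')     # By keyword, in any order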
When this file runs as a script, the test code at the bottom makes two instances of our
class and prints two attributes of each (name and pay):
C:\code> person.py
Bob Smith 0
Sue Jones 100000
You can also type this file’s test code at Python’s interactive prompt (assuming you
import the Person class there first), but coding canned tests inside the module file like
this makes it much easier to rerun them in the future.
Although this is fairly simple code, it’s already demonstrating something important.
Notice that bob’s name is not sue’s, and sue’s pay is not bob’s. Each is an independent
record of information. Technically, bob and sue are both namespace objects—like all
class instances, they each have their own independent copy of the state information
created by the class. Because each instance of a class has its own set of self attributes,
classes are a natural for recording information for multiple objects this way; just like
built-in types such as lists and dictionaries, classes serve as a sort of object factory.
Other Python program structures, such as functions and modules, have no such con-
cept. Chapter 17’s closure functions come close in terms of per-call state, but don’t
have the multiple methods, inheritance, and larger structure we get from classes.
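You can see this independence for yourself at the interactive prompt. Here's a minimal sketch, assuming the Person class above has been defined or imported: each instance gets its own attribute dictionary, so changing one never affects the other:

>>> bob = Person('Bob Smith')
>>> sue = Person('Sue Jones', job='dev', pay=100000)
>>> bob.__dict__ is sue.__dict__           # Each instance is its own namespace
False
>>> sue.pay = 110000                       # Changing sue's state...
>>> bob.pay                                # ...leaves bob's untouched
0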
Using Code Two Ways
As is, the test code at the bottom of the file works, but there’s a big catch—its top-level
print statements run both when the file is run as a script and when it is imported as a
module. This means if we ever decide to import the class in this file in order to use it
somewhere else (and we will soon in this chapter), we’ll see the output of its test code
every time the file is imported. That’s not very good software citizenship, though: client
programs probably don’t care about our internal tests and won’t want to see our output
mixed in with their own.
Although we could split the test code off into a separate file, it’s often more convenient
to code tests in the same file as the items to be tested. It would be better to arrange to
run the test statements at the bottom only when the file is run for testing, not when the
file is imported. That’s exactly what the module __name__ check is designed for, as you
learned in the preceding part of this book. Here's what this addition looks like—add
the required test and indent your self-test code:
# Allow this file to be imported as well as run/tested
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay

if __name__ == '__main__':                             # When run for testing only
    # self-test code
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob.name, bob.pay)
    print(sue.name, sue.pay)
Now, we get exactly the behavior we’re after—running the file as a top-level script tests
it because its __name__ is __main__, but importing it as a library of classes later does not:
C:\code> person.py
Bob Smith 0
Sue Jones 100000
C:\code> python
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) ...
>>> import person
>>>
When imported, the file now defines the class, but does not use it. When run directly,
this file creates two instances of our class as before, and prints two attributes of each;
again, because each instance is an independent namespace object, the values of their
attributes differ.
Version Portability: Prints
All of this chapter’s code works on both Python 2.X and 3.X, but I’m running it under
Python 3.X, and a few of its outputs use 3.X print function calls with multiple argu-
ments. As explained in Chapter 11, this means that some of its outputs may vary slightly
under Python 2.X. If you run under 2.X the code will work as is, but you’ll notice
parentheses around some output lines because the extra parentheses in a print turn
multiple items into a tuple in 2.X only:
C:\code> c:\python27\python person.py
('Bob Smith', 0)
('Sue Jones', 100000)
If this difference is the sort of detail that might keep you awake at nights, simply remove
the parentheses to use 2.X print statements, or add an import of Python 3.X’s print
function at the top of your script, as shown in Chapter 11 (I’d add this everywhere here,
but it’s a bit distracting):
from __future__ import print_function
You can also avoid the extra parentheses portably by using formatting to yield a single
object to print. Either of the following works in both 2.X and 3.X, though the method
form is newer:
print('{0} {1}'.format(bob.name, bob.pay)) # Format method
print('%s %s' % (bob.name, bob.pay)) # Format expression
As also described in Chapter 11, such formatting may be required in some cases, be-
cause objects nested in a tuple may print differently than those printed as top-level
objects—the former prints with __repr__ and the latter with __str__ (operator over-
loading methods discussed further in this chapter as well as Chapter 30).
To sidestep this issue, this edition codes displays with __repr__ (the fallback in all cases,
including nesting and the interactive prompt) instead of __str__ (the default for prints)
so that all object appearances print the same in 3.X and 2.X, even those in superfluous
tuple parentheses!
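To see the division of labor between these two methods, here's a minimal interactive sketch (the class and display strings are illustrative only). print and str prefer __str__; interactive echoes and nested appearances use __repr__:

>>> class C:
        def __str__(self): return 'str display'
        def __repr__(self): return 'repr display'

>>> print(C())               # print prefers __str__
str display
>>> C()                      # Interactive echo uses __repr__
repr display
>>> print([C()])             # Nested in a list: __repr__ again
[repr display]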
Step 2: Adding Behavior Methods
Everything looks good so far—at this point, our class is essentially a record factory; it
creates and fills out fields of records (attributes of instances, in more Pythonic terms).
Even as limited as it is, though, we can still run some operations on its objects. Although
classes add an extra layer of structure, they ultimately do most of their work by em-
bedding and processing basic core data types like lists and strings. In other words, if
you already know how to use Python’s simple core types, you already know much of
the Python class story; classes are really just a minor structural extension.
For example, the name field of our objects is a simple string, so we can extract last names
from our objects by splitting on spaces and indexing. These are all core data type op-
erations, which work whether their subjects are embedded in class instances or not:
>>> name = 'Bob Smith' # Simple string, outside class
>>> name.split() # Extract last name
['Bob', 'Smith']
>>> name.split()[-1] # Or [1], if always just two parts
'Smith'
Similarly, we can give an object a pay raise by updating its pay field—that is, by changing
its state information in place with an assignment. This task also involves basic opera-
tions that work on Python’s core objects, regardless of whether they are standalone or
embedded in a class structure (I’m formatting the result in the following to mask the
fact that different Pythons print a different number of decimal digits):
>>> pay = 100000 # Simple variable, outside class
>>> pay *= 1.10 # Give a 10% raise
>>> print('%.2f' % pay) # Or: pay = pay * 1.10, if you like to type
110000.00 # Or: pay = pay + (pay * .10), if you _really_ do!
To apply these operations to the Person objects created by our script, simply do to
bob.name and sue.pay what we just did to name and pay. The operations are the same,
but the subjects are attached as attributes to objects created from our class:
# Process embedded built-in types: strings, mutability
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay

if __name__ == '__main__':
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob.name, bob.pay)
    print(sue.name, sue.pay)
    print(bob.name.split()[-1])        # Extract object's last name
    sue.pay *= 1.10                    # Give this object a raise
    print('%.2f' % sue.pay)
We’ve added the last three lines here; when they’re run, we extract bob’s last name by
using basic string and list operations on his name field, and give sue a pay raise by
modifying her pay attribute in place with basic number operations. In a sense, sue is
also a mutable object—her state changes in place just like a list after an append call.
Here’s the new version’s output:
Bob Smith 0
Sue Jones 100000
Smith
110000.00
The preceding code works as planned, but if you show it to a veteran software developer
he or she will probably tell you that its general approach is not a great idea in practice.
Hardcoding operations like these outside of the class can lead to maintenance problems
in the future.
For example, what if you’ve hardcoded the last-name-extraction formula at many dif-
ferent places in your program? If you ever need to change the way it works (to support
a new name structure, for instance), you’ll need to hunt down and update every oc-
currence. Similarly, if the pay-raise code ever changes (e.g., to require approval or
database updates), you may have multiple copies to modify. Just finding all the appear-
ances of such code may be problematic in larger programs—they may be scattered
across many files, split into individual steps, and so on. In a prototype like this, frequent
change is almost guaranteed.
Coding Methods
What we really want to do here is employ a software design concept known as encap-
sulation—wrapping up operation logic behind interfaces, such that each operation is
coded only once in our program. That way, if our needs change in the future, there is
just one copy to update. Moreover, we’re free to change the single copy’s internals
almost arbitrarily, without breaking the code that uses it.
In Python terms, we want to code operations on objects in a class’s methods, instead
of littering them throughout our program. In fact, this is one of the things that classes
are very good at—factoring code to remove redundancy and thus optimize maintaina-
bility. As an added bonus, turning operations into methods enables them to be applied
to any instance of the class, not just those that they’ve been hardcoded to process.
This is all simpler in code than it may sound in theory. The following achieves encap-
sulation by moving the two operations from code outside the class to methods inside
the class. While we’re at it, let’s change our self-test code at the bottom to use the new
methods we’re creating, instead of hardcoding operations:
# Add methods to encapsulate operations for maintainability
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def lastName(self):                                # Behavior methods
        return self.name.split()[-1]                   # self is implied subject
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))       # Must change here only

if __name__ == '__main__':
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob.name, bob.pay)
    print(sue.name, sue.pay)
    print(bob.lastName(), sue.lastName())              # Use the new methods
    sue.giveRaise(.10)                                 # instead of hardcoding
    print(sue.pay)
As we’ve learned, methods are simply normal functions that are attached to classes and
designed to process instances of those classes. The instance is the subject of the method
call and is passed to the method’s self argument automatically.
The transformation to the methods in this version is straightforward. The new
lastName method, for example, simply does to self what the previous version hardcoded
for bob, because self is the implied subject when the method is called. lastName also
returns the result, because this operation is a called function now; it computes a value
for its caller to use arbitrarily, even if it is just to be printed. Similarly, the new
giveRaise method just does to self what we did to sue before.
When run now, our file’s output is similar to before—we’ve mostly just refactored the
code to allow for easier changes in the future, not altered its behavior:
Bob Smith 0
Sue Jones 100000
Smith Jones
110000
A few coding details are worth pointing out here. First, notice that sue’s pay is now still
an integer after a pay raise—we convert the math result back to an integer by calling
the int built-in within the method. Changing the value to either int or float is probably
not a significant concern for this demo: integer and floating-point objects have the same
interfaces and can be mixed within expressions. Still, we may need to address truncation
and rounding issues in a real system—money probably is significant to Persons!
As we learned in Chapter 5, we might handle this by using the round(N, 2) built-in to
round and retain cents, using the decimal type to fix precision, or storing monetary
values as full floating-point numbers and displaying them with a %.2f or {0:.2f} for-
matting string to show cents as we did earlier. For now, we’ll simply truncate any cents
with int. For another idea, also see the money function in the formats.py module of
Chapter 25; you could import this tool to show pay with commas, cents, and currency
signs.
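For a quick comparison, here's a hedged interactive sketch of the three options just mentioned (exact float displays may vary slightly across Python versions):

>>> pay = 100000 * 1.10                      # Float math may carry artifacts
>>> round(pay, 2)                            # Option 1: round to cents
110000.0
>>> from decimal import Decimal
>>> Decimal('100000') * Decimal('1.10')      # Option 2: fixed decimal precision
Decimal('110000.00')
>>> '%.2f' % pay                             # Option 3: format at display time only
'110000.00'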
Second, notice that we’re also printing sue’s last name this time—because the last-name
logic has been encapsulated in a method, we get to use it on any instance of the class.
As we’ve seen, Python tells a method which instance to process by automatically pass-
ing it in to the first argument, usually called self. Specifically:
In the first call, bob.lastName(), bob is the implied subject passed to self.
In the second call, sue.lastName(), sue goes to self instead.
Trace through these calls to see how the instance winds up in self—it’s a key concept.
The net effect is that the method fetches the name of the implied subject each time.
The same happens for giveRaise. We could, for example, give bob a raise by calling
giveRaise for both instances this way, too. Unfortunately for bob, though, his zero
starting pay will prevent him from getting a raise as the program is currently coded—
nothing times anything is nothing, something we may want to address in a future 2.0
release of our software.
Finally, notice that the giveRaise method assumes that percent is passed in as a floating-
point number between zero and one. That may be too radical an assumption in the real
world (a 1000% raise would probably be a bug for most of us!); we’ll let it pass for this
prototype, but we might want to test or at least document this in a future iteration of
this code. Stay tuned for a rehash of this idea in a later chapter in this book, where we’ll
code something called function decorators and explore Python’s assert statement—
alternatives that can do the validity test for us automatically during development. In
Chapter 39, for example, we’ll write a tool that lets us validate with strange incantations
like the following:
@rangetest(percent=(0.0, 1.0))                   # Use decorator to validate
def giveRaise(self, percent):
    self.pay = int(self.pay * (1 + percent))
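In the meantime, a plain assert provides a simpler interim check; the following is just a sketch of that alternative, not this chapter's final code:

def giveRaise(self, percent):
    assert 0.0 <= percent <= 1.0, 'percent out of range'    # Fail loudly on bad input
    self.pay = int(self.pay * (1 + percent))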
Step 3: Operator Overloading
At this point, we have a fairly full-featured class that generates and initializes instances,
along with two new bits of behavior for processing instances in the form of methods.
So far, so good.
As it stands, though, testing is still a bit less convenient than it needs to be—to trace
our objects, we have to manually fetch and print individual attributes (e.g., bob.name,
sue.pay). It would be nice if displaying an instance all at once actually gave us some
useful information. Unfortunately, the default display format for an instance object
isn’t very good—it displays the object’s class name, and its address in memory (which
is essentially useless in Python, except as a unique identifier).
To see this, change the last line in the script to print(sue) so it displays the object as a
whole. Here’s what you’ll get—the output says that sue is an “object” in 3.X, and an
“instance” in 2.X as coded:
Bob Smith 0
Sue Jones 100000
Smith Jones
<__main__.Person object at 0x00000000029A0668>
Providing Print Displays
Fortunately, it’s easy to do better by employing operator overloading—coding methods
in a class that intercept and process built-in operations when run on the class’s instan-
ces. Specifically, we can make use of what are probably the second most commonly
used operator overloading methods in Python, after __init__: the __repr__ method
we’ll deploy here, and its __str__ twin introduced in the preceding chapter.
These methods are run automatically every time an instance is converted to its print
string. Because that’s what printing an object does, the net transitive effect is that
printing an object displays whatever is returned by the object’s __str__ or __repr__
method, if the object either defines one itself or inherits one from a superclass. Double-
underscored names are inherited just like any other.
Technically, __str__ is preferred by print and str, and __repr__ is used as a fallback
for these roles and in all other contexts. Although the two can be used to implement
different displays in different contexts, coding just __repr__ alone suffices to give a
single display in all cases—prints, nested appearances, and interactive echoes. This still
allows clients to provide an alternative display with __str__, but for limited contexts
only; since this is a self-contained example, this is a moot point here.
The __init__ constructor method we’ve already coded is, strictly speaking, operator
overloading too—it is run automatically at construction time to initialize a newly cre-
ated instance. Constructors are so common, though, that they almost seem like a special
case. More focused methods like __repr__ allow us to tap into specific operations and
provide specialized behavior when our objects are used in those contexts.
Let’s put this into code. The following extends our class to give a custom display that
lists attributes when our class’s instances are displayed as a whole, instead of relying
on the less useful default display:
# Add __repr__ overload method for printing objects
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def lastName(self):
        return self.name.split()[-1]
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))
    def __repr__(self):                                        # Added method
        return '[Person: %s, %s]' % (self.name, self.pay)      # String to print

if __name__ == '__main__':
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob)
    print(sue)
    print(bob.lastName(), sue.lastName())
    sue.giveRaise(.10)
    print(sue)
Notice that we’re doing string % formatting to build the display string in __repr__ here;
at the bottom, classes use built-in type objects and operations like these to get their
work done. Again, everything you’ve already learned about both built-in types and
functions applies to class-based code. Classes largely just add an additional layer of
structure that packages functions and data together and supports extensions.
We’ve also changed our self-test code to print objects directly, instead of printing in-
dividual attributes. When run, the output is more coherent and meaningful now; the
“[...]” lines are returned by our new __repr__, run automatically by print operations:
[Person: Bob Smith, 0]
[Person: Sue Jones, 100000]
Smith Jones
[Person: Sue Jones, 110000]
Design note: as we’ll learn in Chapter 30, the __repr__ method is often used to provide
an as-code low-level display of an object when present, and __str__ is reserved for more
user-friendly informational displays like ours here. Sometimes classes provide both a
__str__ for user-friendly displays and a __repr__ with extra details for developers to
view. Because printing runs __str__ and the interactive prompt echoes results with
__repr__, this can provide both target audiences with an appropriate display.
Since __repr__ applies to more display cases, including nested appearances, and be-
cause we’re not interested in displaying two different formats, the all-inclusive
__repr__ is sufficient for our class. Here, this also means that our custom display will
be used in 2.X if we list both bob and sue in a 3.X print call—a technically nested
appearance, per the sidebar in “Version Portability: Prints” on page 821.
Step 4: Customizing Behavior by Subclassing
At this point, our class captures much of the OOP machinery in Python: it makes
instances, provides behavior in methods, and even does a bit of operator overloading
now to intercept print operations in __repr__. It effectively packages our data and logic
together into a single, self-contained software component, making it easy to locate code
and straightforward to change it in the future. By allowing us to encapsulate behavior,
it also allows us to factor that code to avoid redundancy and its associated maintenance
headaches.
The only major OOP concept it does not yet capture is customization by inheritance.
In some sense, we’re already doing inheritance, because instances inherit methods from
their classes. To demonstrate the real power of OOP, though, we need to define a
superclass/subclass relationship that allows us to extend our software and replace bits
of inherited behavior. That’s the main idea behind OOP, after all; by fostering a coding
model based upon customization of work already done, it can dramatically cut devel-
opment time.
Coding Subclasses
As a next step, then, let’s put OOP’s methodology to use and customize our Person
class by extending our software hierarchy. For the purpose of this tutorial, we’ll define
a subclass of Person called Manager that replaces the inherited giveRaise method with
a more specialized version. Our new class begins as follows:
class Manager(Person): # Define a subclass of Person
This code means that we’re defining a new class named Manager, which inherits from
and may add customizations to the superclass Person. In plain terms, a Manager is almost
like a Person (admittedly, a very long journey for a very small joke...), but Manager has
a custom way to give raises.
For the sake of argument, let’s assume that when a Manager gets a raise, it receives the
passed-in percentage as usual, but also gets an extra bonus that defaults to 10%. For
instance, if a Manager’s raise is specified as 10%, it will really get 20%. (Any relation to
Persons living or dead is, of course, strictly coincidental.) Our new method begins as
follows; because this redefinition of giveRaise will be closer in the class tree to
Manager instances than the original version in Person, it effectively replaces, and thereby
customizes, the operation. Recall that according to the inheritance search rules, the
lowest version of the name wins:1
class Manager(Person):                            # Inherit Person attrs
    def giveRaise(self, percent, bonus=.10):      # Redefine to customize
Augmenting Methods: The Bad Way
Now, there are two ways we might code this Manager customization: a good way and a
bad way. Let’s start with the bad way, since it might be a bit easier to understand. The
bad way is to cut and paste the code of giveRaise in Person and modify it for Manager,
like this:
class Manager(Person):
    def giveRaise(self, percent, bonus=.10):
        self.pay = int(self.pay * (1 + percent + bonus))    # Bad: cut and paste
This works as advertised—when we later call the giveRaise method of a Manager in-
stance, it will run this custom version, which tacks on the extra bonus. So what’s wrong
with something that runs correctly?
The problem here is a very general one: anytime you copy code with cut and paste, you
essentially double your maintenance effort in the future. Think about it: because we
copied the original version, if we ever have to change the way raises are given (and we
probably will), we’ll have to change the code in two places, not one. Although this is a
small and artificial example, it’s also representative of a universal issue—anytime you’re
tempted to program by copying code this way, you probably want to look for a better
approach.
Augmenting Methods: The Good Way
What we really want to do here is somehow augment the original giveRaise, instead of
replacing it altogether. The good way to do that in Python is by calling to the original
version directly, with augmented arguments, like this:
class Manager(Person):
    def giveRaise(self, percent, bonus=.10):
        Person.giveRaise(self, percent + bonus)             # Good: augment original
1. And no offense to any managers in the audience, of course. I once taught a Python class in New Jersey,
and nobody laughed at this joke, among others. The organizers later told me it was a group of managers
evaluating Python.
This code leverages the fact that a class’s method can always be called either through
an instance (the usual way, where Python sends the instance to the self argument
automatically) or through the class (the less common scheme, where you must pass the
instance manually). In more symbolic terms, recall that a normal method call of this
form:
instance.method(args...)
is automatically translated by Python into this equivalent form:
class.method(instance, args...)
where the class containing the method to be run is determined by the inheritance search
rule applied to the method’s name. You can code either form in your script, but there
is a slight asymmetry between the two—you must remember to pass along the instance
manually if you call through the class directly. The method always needs a subject
instance one way or another, and Python provides it automatically only for calls made
through an instance. For calls through the class name, you need to send an instance to
self yourself; for code inside a method like giveRaise, self already is the subject of the
call, and hence the instance to pass along.
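You can demonstrate this equivalence interactively with our class; a brief sketch, assuming Person has already been defined or imported:

>>> bob = Person('Bob Smith')
>>> bob.lastName()               # Instance call: Python passes bob to self
'Smith'
>>> Person.lastName(bob)         # Class call: you pass the instance yourself
'Smith'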
Calling through the class directly effectively subverts inheritance and kicks the call
higher up the class tree to run a specific version. In our case, we can use this technique
to invoke the default giveRaise in Person, even though it's been redefined at the
Manager level. In some sense, we must call through Person this way, because a
self.giveRaise() inside Manager's giveRaise code would loop—since self already is a Manager,
self.giveRaise() would resolve again to Manager.giveRaise, and so on and so forth
recursively until available memory is exhausted.
This “good” version may seem like a small difference in code, but it can make a huge
difference for future code maintenance—because the giveRaise logic lives in just one
place now (Person’s method), we have only one version to change in the future as needs
evolve. And really, this form captures our intent more directly anyhow—we want to
perform the standard giveRaise operation, but simply tack on an extra bonus. Here’s
our entire module file with this step applied:
# Add customization of one behavior in a subclass
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def lastName(self):
        return self.name.split()[-1]
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))
    def __repr__(self):
        return '[Person: %s, %s]' % (self.name, self.pay)

class Manager(Person):
    def giveRaise(self, percent, bonus=.10):       # Redefine at this level
        Person.giveRaise(self, percent + bonus)    # Call Person's version

if __name__ == '__main__':
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob)
    print(sue)
    print(bob.lastName(), sue.lastName())
    sue.giveRaise(.10)
    print(sue)
    tom = Manager('Tom Jones', 'mgr', 50000)       # Make a Manager: __init__
    tom.giveRaise(.10)                             # Runs custom version
    print(tom.lastName())                          # Runs inherited method
    print(tom)                                     # Runs inherited __repr__
To test our Manager subclass customization, we’ve also added self-test code that makes
a Manager, calls its methods, and prints it. When we make a Manager, we pass in a name,
and an optional job and pay as before—because Manager had no __init__ constructor,
it inherits that in Person. Here’s the new version’s output:
[Person: Bob Smith, 0]
[Person: Sue Jones, 100000]
Smith Jones
[Person: Sue Jones, 110000]
Jones
[Person: Tom Jones, 60000]
Everything looks good here: bob and sue are as before, and when tom the Manager is
given a 10% raise, he really gets 20% (his pay goes from $50K to $60K), because the
customized giveRaise in Manager is run for him only. Also notice how printing tom as a
whole at the end of the test code displays the nice format defined in Person’s __repr__:
Manager objects get this, lastName, and the __init__ constructor method’s code “for
free” from Person, by inheritance.
What About super?
To extend inherited methods, the examples in this chapter simply call the original
through the superclass name: Person.giveRaise(...). This is the traditional and sim-
plest scheme in Python, and the one used in most of this book.
Java programmers may especially be interested to know that Python also has a super
built-in function that allows calling back to a superclass’s methods more generically—
but it’s cumbersome to use in 2.X; differs in form between 2.X and 3.X; relies on unusual
semantics in 3.X; works unevenly with Python’s operator overloading; and does not
always mesh well with traditionally coded multiple inheritance, where a single super-
class call won’t suffice.
In its defense, the super call has a valid use case too—cooperative same-named method
dispatch in multiple inheritance trees—but it relies on the “MRO” ordering of classes,
which many find esoteric and artificial; unrealistically assumes universal deployment
to be used reliably; does not fully support method replacement and varying argument
lists; and to many observers seems an obscure solution to a use case that is rare in real
Python code.
Because of these downsides, this book prefers to call superclasses by explicit name
instead of super, recommends the same policy for newcomers, and defers presenting
super until Chapter 32. It’s usually best judged after you learn the simpler, and generally
more traditional and “Pythonic” ways of achieving the same goals, especially if you’re
new to OOP. Topics like MROs and cooperative multiple inheritance dispatch seem a
lot to ask of beginners—and others.
And to any Java programmers in the audience: I suggest resisting the temptation to use
Python’s super until you’ve had a chance to study its subtle implications. Once you
step up to multiple inheritance, it’s not what you think it is, and more than you probably
expect. The class it invokes may not be the superclass at all, and can even vary per
context. Or to paraphrase a movie line: Python’s super is like a box of chocolates—you
never know what you’re going to get!
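For reference only, here is what the superclass call in this chapter's Manager would look like in 3.X super form. This is a sketch of the deferred alternative, not the style this book recommends at this point:

class Manager(Person):
    def giveRaise(self, percent, bonus=.10):
        super().giveRaise(percent + bonus)       # 3.X form; 2.X requires super(Manager, self)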
Polymorphism in Action
To make this acquisition of inherited behavior even more striking, we can add the
following code at the end of our file temporarily:
if __name__ == '__main__':
    ...
    print('--All three--')
    for obj in (bob, sue, tom):          # Process objects generically
        obj.giveRaise(.10)               # Run this object's giveRaise
        print(obj)                       # Run the common __repr__
Here's the resulting output; the --All three-- section at the end is new:
[Person: Bob Smith, 0]
[Person: Sue Jones, 100000]
Smith Jones
[Person: Sue Jones, 110000]
Jones
[Person: Tom Jones, 60000]
--All three--
[Person: Bob Smith, 0]
[Person: Sue Jones, 121000]
[Person: Tom Jones, 72000]
In the added code, obj is either a Person or a Manager, and Python runs the appro-
priate giveRaise automatically—our original version in Person for bob and sue, and our
customized version in Manager for tom. Trace the method calls yourself to see how
Python selects the right giveRaise method for each object.
This is just Python’s notion of polymorphism, which we met earlier in the book, at work
again—what giveRaise does depends on what you do it to. Here, it’s made all the more
obvious when it selects from code we’ve written ourselves in classes. The practical effect
in this code is that sue gets another 10% but tom gets another 20%, because
giveRaise is dispatched based upon the object’s type. As we’ve learned, polymorphism
is at the heart of Python’s flexibility. Passing any of our three objects to a function that
calls a giveRaise method, for example, would have the same effect: the appropriate
version would be run automatically, depending on which type of object was passed.
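A hypothetical helper like the following (not part of person.py) illustrates this; the same call dispatches to a different method per object:

def giveBonus(employee, percent):        # Works on any object with a giveRaise
    employee.giveRaise(percent)          # Person's or Manager's version, per object

giveBonus(sue, .10)                      # sue gets another 10%
giveBonus(tom, .10)                      # tom really gets another 20%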
On the other hand, printing runs the same __repr__ for all three objects, because it’s
coded just once in Person. Manager both specializes and applies the code we originally
wrote in Person. Although this example is small, it’s already leveraging OOP’s talent
for code customization and reuse; with classes, this almost seems automatic at times.
Inherit, Customize, and Extend
In fact, classes can be even more flexible than our example implies. In general, classes
can inherit, customize, or extend existing code in superclasses. For example, although
we’re focused on customization here, we can also add unique methods to Manager that
are not present in Person, if Managers require something completely different (Python
namesake reference intended). The following snippet illustrates. Here, giveRaise re-
defines a superclass’s method to customize it, but someThingElse defines something
new to extend:
class Person:
    def lastName(self): ...
    def giveRaise(self): ...
    def __repr__(self): ...

class Manager(Person):                       # Inherit
    def giveRaise(self, ...): ...            # Customize
    def someThingElse(self, ...): ...        # Extend

tom = Manager()
tom.lastName()                               # Inherited verbatim
tom.giveRaise()                              # Customized version
tom.someThingElse()                          # Extension here
print(tom)                                   # Inherited overload method
Extra methods like this code’s someThingElse extend the existing software and are avail-
able on Manager objects only, not on Persons. For the purposes of this tutorial, however,
we’ll limit our scope to customizing some of Person’s behavior by redefining it, not
adding to it.
OOP: The Big Idea
As is, our code may be small, but it’s fairly functional. And really, it already illustrates
the main point behind OOP in general: in OOP, we program by customizing what has
already been done, rather than copying or changing existing code. This isn’t always an
obvious win to newcomers at first glance, especially given the extra coding requirements
of classes. But overall, the programming style implied by classes can cut development
time radically compared to other approaches.
For instance, in our example we could theoretically have implemented a custom
giveRaise operation without subclassing, but none of the other options yield code as op-
timal as ours:
Although we could have simply coded Manager from scratch as new, independent
code, we would have had to reimplement all the behaviors in Person that are the
same for Managers.
Although we could have simply changed the existing Person class in place for the
requirements of Manager’s giveRaise, doing so would probably break the places
where we still need the original Person behavior.
Although we could have simply copied the Person class in its entirety, renamed the
copy to Manager, and changed its giveRaise, doing so would introduce code re-
dundancy that would double our work in the future—changes made to Person in
the future would not be picked up automatically, but would have to be manually
propagated to Manager’s code. As usual, the cut-and-paste approach may seem
quick now, but it doubles your work in the future.
The customizable hierarchies we can build with classes provide a much better solution
for software that will evolve over time. No other tools in Python support this develop-
ment mode. Because we can tailor and extend our prior work by coding new subclasses,
we can leverage what we’ve already done, rather than starting from scratch each time,
breaking what already works, or introducing multiple copies of code that may all have
to be updated in the future. When done right, OOP is a powerful programmer’s ally.
Step 5: Customizing Constructors, Too
Our code works as it is, but if you study the current version closely, you may be struck
by something a bit odd—it seems pointless to have to provide a mgr job name for
Manager objects when we create them: this is already implied by the class itself. It would
be better if we could somehow fill in this value automatically when a Manager is made.
The trick we need to improve on this turns out to be the same as the one we employed
in the prior section: we want to customize the constructor logic for Managers in such a
way as to provide a job name automatically. In terms of code, we want to redefine an
__init__ method in Manager that provides the mgr string for us. And as in giveRaise
customization, we also want to run the original __init__ in Person by calling through
the class name, so it still initializes our objects’ state information attributes.
The following extension to person.py will do the job—we’ve coded the new Manager
constructor and changed the call that creates tom to not pass in the mgr job name:
# File person.py
# Add customization of constructor in a subclass

class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def lastName(self):
        return self.name.split()[-1]
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))
    def __repr__(self):
        return '[Person: %s, %s]' % (self.name, self.pay)

class Manager(Person):
    def __init__(self, name, pay):                   # Redefine constructor
        Person.__init__(self, name, 'mgr', pay)      # Run original with 'mgr'
    def giveRaise(self, percent, bonus=.10):
        Person.giveRaise(self, percent + bonus)

if __name__ == '__main__':
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob)
    print(sue)
    print(bob.lastName(), sue.lastName())
    sue.giveRaise(.10)
    print(sue)
    tom = Manager('Tom Jones', 50000)        # Job name not needed:
    tom.giveRaise(.10)                       # Implied/set by class
    print(tom.lastName())
    print(tom)
Again, we’re using the same technique to augment the __init__ constructor here that
we used for giveRaise earlier—running the superclass version by calling through the
class name directly and passing the self instance along explicitly. Although the con-
structor has a strange name, the effect is identical. Because we need Person’s construc-
tion logic to run too (to initialize instance attributes), we really have to call it this way;
otherwise, instances would not have any attributes attached.
Calling superclass constructors from redefinitions this way turns out to be a very com-
mon coding pattern in Python. By itself, Python uses inheritance to look for and call
only one __init__ method at construction time—the lowest one in the class tree. If you
need higher __init__ methods to be run at construction time (and you usually do), you
must call them manually, and usually through the superclass’s name. The upside to
this is that you can be explicit about which argument to pass up to the superclass’s
constructor and can choose to not call it at all: not calling the superclass constructor
allows you to replace its logic altogether, rather than augmenting it.
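As a sketch of that replacement option (useful only when you truly don't want any of Person's construction logic), a subclass constructor can skip the superclass call and set every attribute itself:

class Manager(Person):
    def __init__(self, name, pay):       # Replace, don't augment: no Person.__init__
        self.name = name                 # Must initialize all state here ourselves
        self.job = 'mgr'
        self.pay = pay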
The output of this file’s self-test code is the same as before—we haven’t changed what
it does, we’ve simply restructured to get rid of some logical redundancy:
[Person: Bob Smith, 0]
[Person: Sue Jones, 100000]
Smith Jones
[Person: Sue Jones, 110000]
Jones
[Person: Tom Jones, 60000]
OOP Is Simpler Than You May Think
In this complete form, and despite their relatively small sizes, our classes capture nearly
all the important concepts in Python’s OOP machinery:
Instance creation—filling out instance attributes
Behavior methods—encapsulating logic in a class’s methods
Operator overloading—providing behavior for built-in operations like printing
Customizing behavior—redefining methods in subclasses to specialize them
Customizing constructors—adding initialization logic to superclass steps
Most of these concepts are based upon just three simple ideas: the inheritance search
for attributes in object trees, the special self argument in methods, and operator over-
loading’s automatic dispatch to methods.
Along the way, we’ve also made our code easy to change in the future, by harnessing
the class’s propensity for factoring code to reduce redundancy. For example, we wrap-
ped up logic in methods and called back to superclass methods from extensions to
avoid having multiple copies of the same code. Most of these steps were a natural
outgrowth of the structuring power of classes.
By and large, that’s all there is to OOP in Python. Classes certainly can become larger
than this, and there are some more advanced class concepts, such as decorators and
metaclasses, which we will meet in later chapters. In terms of the basics, though, our
classes already do it all. In fact, if you’ve grasped the workings of the classes we’ve
written, most OOP Python code should now be within your reach.
Other Ways to Combine Classes
Having said that, I should also tell you that although the basic mechanics of OOP are
simple in Python, some of the art in larger programs lies in the way that classes are put
together. We’re focusing on inheritance in this tutorial because that’s the mechanism
the Python language provides, but programmers sometimes combine classes in other
ways, too.
For example, a common coding pattern involves nesting objects inside each other to
build up composites. We’ll explore this pattern in more detail in Chapter 31, which is
really more about design than about Python. As a quick example, though, we could
use this composition idea to code our Manager extension by embedding a Person, instead
of inheriting from it.
The following alternative, coded in file person-composite.py, does so by using the
__getattr__ operator overloading method to intercept undefined attribute fetches and del-
egate them to the embedded object with the getattr built-in. The getattr call was
introduced in Chapter 25—it's the same as X.Y attribute fetch notation and thus per-
forms inheritance, but the attribute name Y is a runtime string—and __getattr__ is
covered in full in Chapter 30, but its basic usage is simple enough to leverage here.
By combining these tools, the giveRaise method here still achieves customization, by
changing the argument passed along to the embedded object. In effect, Manager becomes
a controller layer that passes calls down to the embedded object, rather than up to
superclass methods:
# File person-composite.py
# Embedding-based Manager alternative

class Person:
    ...same...

class Manager:
    def __init__(self, name, pay):
        self.person = Person(name, 'mgr', pay)       # Embed a Person object
    def giveRaise(self, percent, bonus=.10):
        self.person.giveRaise(percent + bonus)       # Intercept and delegate
    def __getattr__(self, attr):
        return getattr(self.person, attr)            # Delegate all other attrs
    def __repr__(self):
        return str(self.person)                      # Must overload again (in 3.X)

if __name__ == '__main__':
    ...same...
The output of this version is the same as the prior, so I won’t list it again. The more
important point here is that this Manager alternative is representative of a general coding
pattern usually known as delegation—a composite-based structure that manages a
wrapped object and propagates method calls to it.
This pattern works in our example, but it requires about twice as much code and is less
well suited than inheritance to the kinds of direct customizations we meant to express
(in fact, no reasonable Python programmer would code this example this way in prac-
tice, except perhaps those writing general tutorials!). Manager isn’t really a Person here,
so we need extra code to manually dispatch method calls to the embedded object;
operator overloading methods like __repr__ must be redefined (in 3.X, at least, as noted
in the upcoming sidebar “Catching Built-in Attributes in 3.X” on page 839); and
adding new Manager behavior is less straightforward since state information is one level
removed.
Still, object embedding, and design patterns based upon it, can be a very good fit when
embedded objects require more limited interaction with the container than direct cus-
tomization implies. A controller layer, or proxy, like this alternative Manager, for ex-
ample, might come in handy if we want to adapt a class to an expected interface it does
not support, or trace or validate calls to another object’s methods (indeed, we will use
a nearly identical coding pattern when we study class decorators later in the book).
Moreover, a hypothetical Department class like the following could aggregate other ob-
jects in order to treat them as a set. Replace the self-test code at the bottom of the
person.py file temporarily to try this on your own; the file person-department.py in the
book’s examples does:
# File person-department.py
# Aggregate embedded objects into a composite

class Person:
    ...same...

class Manager(Person):
    ...same...

class Department:
    def __init__(self, *args):
        self.members = list(args)
    def addMember(self, person):
        self.members.append(person)
    def giveRaises(self, percent):
        for person in self.members:
            person.giveRaise(percent)
    def showAll(self):
        for person in self.members:
            print(person)

if __name__ == '__main__':
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    tom = Manager('Tom Jones', 50000)
    development = Department(bob, sue)       # Embed objects in a composite
    development.addMember(tom)
    development.giveRaises(.10)              # Runs embedded objects' giveRaise
    development.showAll()                    # Runs embedded objects' __repr__
When run, the department’s showAll method lists all of its contained objects after up-
dating their state in true polymorphic fashion with giveRaises:
[Person: Bob Smith, 0]
[Person: Sue Jones, 110000]
[Person: Tom Jones, 60000]
Interestingly, this code uses both inheritance and composition—Department is a com-
posite that embeds and controls other objects to aggregate, but the embedded Person
and Manager objects themselves use inheritance to customize. As another example, a
GUI might similarly use inheritance to customize the behavior or appearance of labels
and buttons, but also composition to build up larger packages of embedded widgets,
such as input forms, calculators, and text editors. The class structure to use depends
on the objects you are trying to model—in fact, the ability to model real-world entities
this way is one of OOP’s strengths.
Design issues like composition are explored in Chapter 31, so we’ll postpone further
investigations for now. But again, in terms of the basic mechanics of OOP in Python,
our Person and Manager classes already tell the entire story. Now that you’ve mastered
the basics of OOP, though, developing general tools for applying it more easily in your
scripts is often a natural next step—and the topic of the next section.
Catching Built-in Attributes in 3.X
An implementation note: in Python 3.X—and in 2.X when 3.X’s “new style” classes
are enabled—the alternative delegation-based Manager class of the file person-compo-
site.py that we coded in this chapter will not be able to intercept and delegate operator
overloading method attributes like __repr__ without redefining them itself. Although
we know that __repr__ is the only such name used in our specific example, this is a
general issue for delegation-based classes.
Recall that built-in operations like printing and addition implicitly invoke operator
overloading methods such as __repr__ and __add__. In 3.X’s new-style classes, built-in
operations like these do not route their implicit attribute fetches through generic at-
tribute managers: neither __getattr__ (run for undefined attributes) nor its cousin
__getattribute__ (run for all attributes) is invoked. This is why we have to redefine
__repr__ redundantly in the alternative Manager, in order to ensure that printing is
routed to the embedded Person object in 3.X.
Comment out this method to see this live—the Manager instance prints with a default
in 3.X, but still uses Person’s __repr__ in 2.X. In fact, the __repr__ in Manager isn’t
required in 2.X at all, as it's coded to use 2.X normal and default (a.k.a. "classic")
classes:
c:\code> py -3 person-composite.py
[Person: Bob Smith, 0]
...etc...
<__main__.Manager object at 0x00000000029AA8D0>

c:\code> py -2 person-composite.py
[Person: Bob Smith, 0]
...etc...
[Person: Tom Jones, 60000]
Technically, this happens because built-in operations begin their implicit search for
method names at the instance in 2.X’s default classic classes, but start at the class in
3.X’s mandated new-style classes, skipping the instance entirely. By contrast, explicit
by-name attribute fetches are always routed to the instance first in both models. In 2.X
classic classes, built-ins route attributes this way too—printing, for example, routes
__repr__ through __getattr__. This is why commenting out Manager’s __repr__ has no
effect in 2.X: the call is delegated to Person. New-style classes also inherit a default for
__repr__ from their automatic object superclass that would foil __getattr__, but the
new-style __getattribute__ doesn’t intercept the name either.
This is a change, but isn’t a show-stopper—delegation-based new-style classes can
generally redefine operator overloading methods to delegate them to wrapped objects,
either manually or via tools or superclasses. This topic is too advanced to explore further
in this tutorial, though, so don’t sweat the details too much here. Watch for it to be
revisited in Chapter 31 and Chapter 32 (the latter of which defines new-style classes
more formally); to impact examples again in the attribute management coverage of
Chapter 38 and the Private class decorator in Chapter 39 (the last of these also codes
workarounds); and to be a special-case factor in a nearly formal inheritance definition
in Chapter 40. In a language like Python that supports both attribute interception and
operator overloading, the impacts of this change can be as broad as this spread implies!
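A minimal sketch reproduces the 3.X behavior described here (the class and names are illustrative only):

>>> class C:                                    # New-style automatically in 3.X
        def __getattr__(self, name):            # Run for undefined explicit fetches
            print('getattr run for:', name)
            return lambda *args: None

>>> x = C()
>>> x.anyName()                                 # Explicit fetch: routed to __getattr__
getattr run for: anyName
>>> print(x)                                    # Built-in print skips __getattr__ in 3.X
<__main__.C object at 0x...>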
Step 6: Using Introspection Tools
Let’s make one final tweak before we throw our objects onto a database. As they are,
our classes are complete and demonstrate most of the basics of OOP in Python. They
still have two remaining issues we probably should iron out, though, before we go live
with them:
First, if you look at the display of the objects as they are right now, you’ll notice
that when you print tom the Manager, the display labels him as a Person. That’s not
technically incorrect, since Manager is a kind of customized and specialized Per
son. Still, it would be more accurate to display an object with the most specific (that
is, lowest) class possible: the one an object is made from.
Second, and perhaps more importantly, the current display format shows only the
attributes we include in our __repr__, and that might not account for future goals.
For example, we can’t yet verify that tom’s job name has been set to mgr correctly
by Manager’s constructor, because the __repr__ we coded for Person does not print
this field. Worse, if we ever expand or otherwise change the set of attributes as-
signed to our objects in __init__, we’ll have to remember to also update
__repr__ for new names to be displayed, or it will become out of sync over time.
The last point means that, yet again, we’ve made potential extra work for ourselves in
the future by introducing redundancy in our code. Because any disparity in __repr__
will be reflected in the program’s output, this redundancy may be more obvious than
the other forms we addressed earlier; still, avoiding extra work in the future is generally
a good thing.
Special Class Attributes
We can address both issues with Python’s introspection tools—special attributes and
functions that give us access to some of the internals of objects’ implementations. These
tools are somewhat advanced and generally used more by people writing tools for other
programmers to use than by programmers developing applications. Even so, a basic
knowledge of some of these tools is useful because they allow us to write code that
processes classes in generic ways. In our code, for example, there are two hooks that
can help us out, both of which were introduced near the end of the preceding chapter
and used in earlier examples:
The built-in instance.__class__ attribute provides a link from an instance to the
class from which it was created. Classes in turn have a __name__, just like modules,
and a __bases__ sequence that provides access to superclasses. We can use these
here to print the name of the class from which an instance is made rather than one
we’ve hardcoded.
The built-in object.__dict__ attribute provides a dictionary with one key/value
pair for every attribute attached to a namespace object (including modules, classes,
and instances). Because it is a dictionary, we can fetch its keys list, index by key,
iterate over its keys, and so on, to process all attributes generically. We can use this
here to print every attribute in any instance, not just those we hardcode in custom
displays, much as we did in Chapter 25’s module tools.
We met the first of these categories in the prior chapter, but here’s a quick review at
Python’s interactive prompt with the latest versions of our person.py classes. Notice
how we load Person at the interactive prompt with a from statement here—class names
live in and are imported from modules, exactly like function names and other variables:
>>> from person import Person
>>> bob = Person('Bob Smith')
>>> bob # Show bob's __repr__ (not __str__)
[Person: Bob Smith, 0]
>>> print(bob) # Ditto: print => __str__ or __repr__
[Person: Bob Smith, 0]
>>> bob.__class__ # Show bob's class and its name
<class 'person.Person'>
>>> bob.__class__.__name__
'Person'
>>> list(bob.__dict__.keys()) # Attributes are really dict keys
['pay', 'job', 'name'] # Use list to force list in 3.X
>>> for key in bob.__dict__:
        print(key, '=>', bob.__dict__[key])       # Index manually

pay => 0
job => None
name => Bob Smith

>>> for key in bob.__dict__:
        print(key, '=>', getattr(bob, key))       # obj.attr, but attr is a var

pay => 0
job => None
name => Bob Smith
As noted briefly in the prior chapter, some attributes accessible from an instance might
not be stored in the __dict__ dictionary if the instance’s class defines __slots__: an
optional and relatively obscure feature of new-style classes (and hence all classes in
Python 3.X) that stores attributes sequentially in the instance; may preclude an instance
__dict__ altogether; and which we won’t study in full until Chapter 31 and Chap-
ter 32. Since slots really belong to classes instead of instances, and since they are rarely
used in any event, we can reasonably ignore them here and focus on the normal
__dict__.
As we do, though, keep in mind that some programs may need to catch exceptions for
a missing __dict__, or use hasattr to test or getattr with a default if its users might
deploy slots. As we’ll see in Chapter 32, the next section’s code won’t fail if used by a
class with slots (its lack of them is enough to guarantee a __dict__) but slots—and other
“virtual” attributes—won’t be reported as instance data.
A Generic Display Tool
We can put these interfaces to work in a superclass that displays accurate class names
and formats all attributes of an instance of any class. Open a new file in your text editor
to code the following—it’s a new, independent module named classtools.py that im-
plements just such a class. Because its __repr__ display overload uses generic intro-
spection tools, it will work on any instance, regardless of the instance’s attributes set.
And because this is a class, it automatically becomes a general formatting tool: thanks
to inheritance, it can be mixed into any class that wishes to use its display format. As
an added bonus, if we ever want to change how instances are displayed we need only
change this class, as every class that inherits its __repr__ will automatically pick up the
new format when it’s next run:
# File classtools.py (new)
"Assorted class utilities and tools"

class AttrDisplay:
    """
    Provides an inheritable display overload method that shows
    instances with their class names and a name=value pair for
    each attribute stored on the instance itself (but not attrs
    inherited from its classes). Can be mixed into any class,
    and will work on any instance.
    """
    def gatherAttrs(self):
        attrs = []
        for key in sorted(self.__dict__):
            attrs.append('%s=%s' % (key, getattr(self, key)))
        return ', '.join(attrs)
    def __repr__(self):
        return '[%s: %s]' % (self.__class__.__name__, self.gatherAttrs())

if __name__ == '__main__':

    class TopTest(AttrDisplay):
        count = 0
        def __init__(self):
            self.attr1 = TopTest.count
            self.attr2 = TopTest.count + 1
            TopTest.count += 2

    class SubTest(TopTest):
        pass

    X, Y = TopTest(), SubTest()        # Make two instances
    print(X)                           # Show all instance attrs
    print(Y)                           # Show lowest class name
Notice the docstrings here—because this is a general-purpose tool, we want to add
some functional documentation for potential users to read. As we saw in Chapter 15,
docstrings can be placed at the top of simple functions and modules, and also at the
start of classes and any of their methods; the help function and the PyDoc tool extract
and display these automatically. We’ll revisit docstrings for classes in Chapter 29.
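For instance, assuming the classtools module is importable here, its docstrings can be displayed like this (output abbreviated, and it may vary across Python versions):
>>> import classtools
>>> help(classtools.AttrDisplay)
Help on class AttrDisplay in module classtools:

class AttrDisplay(builtins.object)
 |  Provides an inheritable display overload method that shows
 |  instances with their class names and a name=value pair for
...more omitted...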
When run directly, this module’s self-test makes two instances and prints them; the
__repr__ defined here shows the instance's class, and all its attribute names and values,
in sorted attribute name order. This output is the same in Python 3.X and 2.X because
each object’s display is a single constructed string:
C:\code> classtools.py
[TopTest: attr1=0, attr2=1]
[SubTest: attr1=2, attr2=3]
Another design note here: because this class uses __repr__ instead of __str__, its displays
are used in all contexts, but its clients also won’t have the option of providing an al-
ternative low-level display—they can still add a __str__, but this applies to print and
str only. In a more general tool, using __str__ instead limits a display’s scope, but
leaves clients the option of adding a __repr__ for a secondary display at interactive
prompts and nested appearances. We’ll follow this alternative policy when we code
expanded versions of this class in Chapter 31; for this demo, we’ll stick with the all-
inclusive __repr__.
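To see the two policies side by side, here is a minimal sketch (the Dual class is hypothetical): print and str prefer __str__ when one is defined, while interactive echoes and nested appearances always use __repr__:
>>> class Dual:
        def __repr__(self): return 'as-code'
        def __str__(self): return 'user-friendly'

>>> x = Dual()
>>> print(x)                                 # print, str => __str__
user-friendly
>>> x                                        # Echo => __repr__
as-code
>>> print([x])                               # Nested appearance => __repr__
[as-code]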
Instance Versus Class Attributes
If you study the classtools module’s self-test code long enough, you’ll notice that its
class displays only instance attributes, attached to the self object at the bottom of the
inheritance tree; that’s what self’s __dict__ contains. As an intended consequence, we
don’t see attributes inherited by the instance from classes above it in the tree (e.g.,
count in this file’s self-test code—a class attribute used as an instance counter). Inher-
ited class attributes are attached to the class only, not copied down to instances.
If you ever do wish to include inherited attributes too, you can climb the __class__ link
to the instance’s class, use the __dict__ there to fetch class attributes, and then iterate
through the class’s __bases__ attribute to climb to even higher superclasses, repeating
as necessary. If you’re a fan of simple code, running a built-in dir call on the instance
instead of using __dict__ and climbing would have much the same effect, since dir
results include inherited names in the sorted results list. In Python 2.7:
>>> from person import Person # 2.X: keys is list, dir shows less
>>> bob = Person('Bob Smith')
>>> bob.__dict__.keys() # Instance attrs only
['pay', 'job', 'name']
>>> dir(bob) # Plus inherited attrs in classes
['__doc__', '__init__', '__module__', '__repr__', 'giveRaise', 'job', 'lastName',
'name', 'pay']
If you’re using Python 3.X, your output will vary, and may be more than you bargained
for; here’s the 3.3 result for the last two statements (keys list order can vary per run):
>>> list(bob.__dict__.keys()) # 3.X keys is a view, not a list
['name', 'job', 'pay']
>>> dir(bob) # 3.X includes class type methods
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
...more omitted: 31 attrs...
'__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__',
'giveRaise', 'job', 'lastName', 'name', 'pay']
The code and output here vary between Python 2.X and 3.X, because 3.X's
dict.keys is not a list, and 3.X’s dir returns extra class-type implementation attributes.
Technically, dir returns more in 3.X because classes are all “new style” and inherit a
large set of operator overloading names from the class type. In fact, as usual you’ll
probably want to filter out most of the __X__ names in the 3.X dir result, since they are
internal implementation details and not something you’d normally want to display:
>>> len(dir(bob))
31
>>> list(name for name in dir(bob) if not name.startswith('__'))
['giveRaise', 'job', 'lastName', 'name', 'pay']
In the interest of space, we’ll leave optional display of inherited class attributes with
either tree climbs or dir as suggested experiments for now. For more hints on this front,
though, watch for the classtree.py inheritance tree climber we will write in Chapter 29, and the lister.py attribute listers and climbers we'll code in Chapter 31.
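If you'd like to experiment now, though, here is one rough sketch of the tree-climb idea: a hypothetical helper, not this chapter's code, and much simpler than the Chapter 31 listers:
def allAttrs(instance):
    # Gather the instance's own names, then climb the class tree for the rest
    names = set(instance.__dict__)           # Instance attributes
    classes = [instance.__class__]           # Start at the instance's class
    while classes:
        cls = classes.pop()
        names.update(cls.__dict__)           # Names stored on this class
        classes.extend(cls.__bases__)        # Climb to all superclasses
    return sorted(names)
Run on bob, this reports lastName and giveRaise alongside the instance's data, much like dir; as with dir, you may want to filter out the __X__ names it sweeps in.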
Name Considerations in Tool Classes
One last subtlety here: because our AttrDisplay class in the classtools module is a
general tool designed to be mixed into other arbitrary classes, we have to be aware of
the potential for unintended name collisions with client classes. As is, I’ve assumed that
client subclasses may want to use both its __repr__ and gatherAttrs, but the latter of
these may be more than a subclass expects—if a subclass innocently defines a gather
Attrs name of its own, it will likely break our class, because the lower version in the
subclass will be used instead of ours.
To see this for yourself, add a gatherAttrs to TopTest in the file’s self-test code; unless
the new method is identical, or intentionally customizes the original, our tool class will
no longer work as planned—self.gatherAttrs within AttrDisplay searches anew from
the TopTest instance:
class TopTest(AttrDisplay):
    ....
    def gatherAttrs(self):                   # Replaces method in AttrDisplay!
        return 'Spam'
This isn’t necessarily bad—sometimes we want other methods to be available to sub-
classes, either for direct calls or for customization this way. If we really meant to provide
a __repr__ only, though, this is less than ideal.
To minimize the chances of name collisions like this, Python programmers often prefix
methods not meant for external use with a single underscore: _gatherAttrs in our case.
This isn’t foolproof (what if another class defines _gatherAttrs, too?), but it’s usually
sufficient, and it’s a common Python naming convention for methods internal to a class.
A better and less commonly used solution would be to use two underscores at the front
of the method name only: __gatherAttrs for us. Python automatically expands such
names to include the enclosing class’s name, which makes them truly unique when
looked up by the inheritance search. This is a feature usually called pseudoprivate class
attributes, which we’ll expand on in Chapter 31 and deploy in an expanded version of
this class there. For now, we’ll make both our methods available.
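As a quick preview of the mangling at work, though, consider this hypothetical sketch:
>>> class C:
        def __method(self): return 'spam'    # Expands to _C__method

>>> x = C()
>>> x._C__method()                           # Stored under the expanded name
'spam'
>>> x.__method()                             # The unexpanded name isn't found
AttributeError: 'C' object has no attribute '__method'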
Our Classes’ Final Form
Now, to use this generic tool in our classes, all we need to do is import it from its
module, mix it in by inheritance in our top-level class, and get rid of the more specific
__repr__ we coded before. The new display overload method will be inherited by in-
stances of Person, as well as Manager; Manager gets __repr__ from Person, which now
obtains it from the AttrDisplay coded in another module. Here is the final version of
our person.py file with these changes applied:
# File classtools.py (new)
...as listed earlier...

# File person.py (final)
"""
Record and process information about people.
Run this file directly to test its classes.
"""
from classtools import AttrDisplay           # Use generic display tool

class Person(AttrDisplay):                   # Mix in a repr at this level
    """
    Create and process person records
    """
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def lastName(self):                      # Assumes last is last
        return self.name.split()[-1]

    def giveRaise(self, percent):            # Percent must be 0..1
        self.pay = int(self.pay * (1 + percent))

class Manager(Person):
    """
    A customized Person with special requirements
    """
    def __init__(self, name, pay):
        Person.__init__(self, name, 'mgr', pay)  # Job name is implied

    def giveRaise(self, percent, bonus=.10):
        Person.giveRaise(self, percent + bonus)

if __name__ == '__main__':
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob)
    print(sue)
    print(bob.lastName(), sue.lastName())
    sue.giveRaise(.10)
    print(sue)
    tom = Manager('Tom Jones', 50000)
    tom.giveRaise(.10)
    print(tom.lastName())
    print(tom)
As this is the final revision, we’ve added a few comments here to document our work
—docstrings for functional descriptions and # for smaller notes, per best-practice con-
ventions, as well as blank lines between methods for readability—a generally good style
choice when classes or methods grow large, which I resisted earlier for these small
classes, in part to save space and keep the code more compact.
When we run this code now, we see all the attributes of our objects, not just the ones
we hardcoded in the original __repr__. And our final issue is resolved: because AttrDis
play takes class names off the self instance directly, each object is shown with the name
of its closest (lowest) class—tom displays as a Manager now, not a Person, and we can
finally verify that his job name has been correctly filled in by the Manager constructor:
C:\code> person.py
[Person: job=None, name=Bob Smith, pay=0]
[Person: job=dev, name=Sue Jones, pay=100000]
Smith Jones
[Person: job=dev, name=Sue Jones, pay=110000]
Jones
[Manager: job=mgr, name=Tom Jones, pay=60000]
This is the more useful display we were after. From a larger perspective, though, our
attribute display class has become a general tool, which we can mix into any class by
inheritance to leverage the display format it defines. Further, all its clients will automatically pick up future changes in our tool. Later in the book, we'll meet even more
powerful class tool concepts, such as decorators and metaclasses; along with Python’s
many introspection tools, they allow us to write code that augments and manages
classes in structured and maintainable ways.
Step 7 (Final): Storing Objects in a Database
At this point, our work is almost complete. We now have a two-module system that not
only implements our original design goals for representing people, but also provides a
general attribute display tool we can use in other programs in the future. By coding
functions and classes in module files, we’ve ensured that they naturally support reuse.
And by coding our software as classes, we've ensured that it naturally supports extension.
Although our classes work as planned, though, the objects they create are not real
database records. That is, if we kill Python, our instances will disappear—they’re tran-
sient objects in memory and are not stored in a more permanent medium like a file, so
they won’t be available in future program runs. It turns out that it’s easy to make
instance objects more permanent, with a Python feature called object persistence
making objects live on after the program that creates them exits. As a final step in this
tutorial, let’s make our objects permanent.
Pickles and Shelves
Object persistence is implemented by three standard library modules, available in every
Python:
pickle
Serializes arbitrary Python objects to and from a string of bytes
dbm (named anydbm in Python 2.X)
Implements an access-by-key filesystem for storing strings
shelve
Uses the other two modules to store Python objects on a file by key
We met these modules very briefly in Chapter 9 when we studied file basics. They
provide powerful data storage options. Although we can’t do them complete justice in
this tutorial or book, they are simple enough that a brief introduction is enough to get
you started.
The pickle module
The pickle module is a sort of super-general object formatting and deformatting tool:
given a nearly arbitrary Python object in memory, it’s clever enough to convert the
object to a string of bytes, which it can use later to reconstruct the original object in
memory. The pickle module can handle almost any object you can create—lists, dictionaries, nested combinations thereof, and class instances. The latter are especially
useful things to pickle, because they provide both data (attributes) and behavior (meth-
ods); in fact, the combination is roughly equivalent to “records” and “programs.” Be-
cause pickle is so general, it can replace extra code you might otherwise write to create
and parse custom text file representations for your objects. By storing an object’s pickle
string on a file, you effectively make it permanent and persistent: simply load and un-
pickle it later to re-create the original object.
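As a rough sketch of the idea (separate from this chapter's scripts, and with an arbitrary filename), flat-file pickling takes just a few lines:
import pickle
from person import Person

bob = Person('Bob Smith')
with open('bob.pkl', 'wb') as f:             # Pickled data is binary bytes
    pickle.dump(bob, f)                      # Serialize the object to the file

with open('bob.pkl', 'rb') as f:
    clone = pickle.load(f)                   # Re-create the original object
print(clone.lastName())                      # Prints: Smith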
The shelve module
Although it’s easy to use pickle by itself to store objects in simple flat files and load
them from there later, the shelve module provides an extra layer of structure that allows
you to store pickled objects by key. shelve translates an object to its pickled string with
pickle and stores that string under a key in a dbm file; when later loading, shelve fetches
the pickled string by key and re-creates the original object in memory with pickle. This
is all quite a trick, but to your script a shelve of pickled objects looks just like a
dictionary—you index by key to fetch, assign to keys to store, and use dictionary tools
such as len, in, and dict.keys to get information. Shelves automatically map dictionary
operations to objects stored in a file.
In fact, to your script the only coding difference between a shelve and a normal dictio-
nary is that you must open shelves initially and must close them after making changes.
The net effect is that a shelve provides a simple database for storing and fetching native
Python objects by keys, and thus makes them persistent across program runs. It does
not support query tools such as SQL, and it lacks some advanced features found in
enterprise-level databases (such as true transaction processing), but native Python ob-
jects stored on a shelve may be processed with the full power of the Python language
once they are fetched back by key.
Storing Objects on a Shelve Database
Pickling and shelves are somewhat advanced topics, and we won’t go into all their
details here; you can read more about them in the standard library manuals, as well as
application-focused books such as the Programming Python follow-up text. This is all
simpler in Python than in English, though, so let’s jump into some code.
Let’s write a new script that throws objects of our classes onto a shelve. In your text
editor, open a new file we’ll call makedb.py. Since this is a new file, we’ll need to import
our classes in order to create a few instances to store. We used from to load a class at
the interactive prompt earlier, but really, as with functions and other variables, there
are two ways to load a class from a file (class names are variables like any other, and
not at all magic in this context):
2. Yes, we use “shelve” as a noun in Python, much to the chagrin of a variety of editors I’ve worked with
over the years, both electronic and human.
import person # Load class with import
bob = person.Person(...) # Go through module name
from person import Person # Load class with from
bob = Person(...) # Use name directly
We’ll use from to load in our script, just because it’s a bit less to type. To keep this
simple, copy or retype into our new script the self-test lines from person.py that make
instances of our classes, so we have something to store (this is a simple demo, so we
won’t worry about the test-code redundancy here). Once we have some instances, it’s
almost trivial to store them on a shelve. We simply import the shelve module, open a
new shelve with an external filename, assign the objects to keys in the shelve, and close
the shelve when we’re done because we’ve made changes:
# File makedb.py: store Person objects on a shelve database
from person import Person, Manager           # Load our classes
bob = Person('Bob Smith')                    # Re-create objects to be stored
sue = Person('Sue Jones', job='dev', pay=100000)
tom = Manager('Tom Jones', 50000)

import shelve
db = shelve.open('persondb')                 # Filename where objects are stored
for obj in (bob, sue, tom):                  # Use object's name attr as key
    db[obj.name] = obj                       # Store object on shelve by key
db.close()                                   # Close after making changes
Notice how we assign objects to the shelve using their own names as keys. This is just
for convenience; in a shelve, the key can be any string, including one we might create
to be unique using tools such as process IDs and timestamps (available in the os and
time standard library modules). The only rule is that the keys must be strings and should
be unique, since we can store just one object per key, though that object can be a list,
dictionary, or other object containing many objects itself.
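For instance, a hypothetical key generator along these lines might look like the following sketch:
import os, time

def makeKey():
    # Process ID plus timestamp: reasonably unique on a single machine
    return '%s-%s' % (os.getpid(), time.time())

print(makeKey())                             # e.g. '1234-1370970520.12'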
In fact, the values we store under keys can be Python objects of almost any sort—built-
in types like strings, lists, and dictionaries, as well as user-defined class instances, and
nested combinations of all of these and more. For example, the name and job attributes
of our objects could be nested dictionaries and lists as in earlier incarnations in this
book (though this would require a bit of redesign to the current code).
That’s all there is to it—if this script has no output when run, it means it probably
worked; we’re not printing anything, just creating and storing objects in a file-based
database.
C:\code> makedb.py
Exploring Shelves Interactively
At this point, there are one or more real files in the current directory whose names all
start with “persondb”. The actual files created can vary per platform, and just as in the
built-in open function, the filename in shelve.open() is relative to the current working
directory unless it includes a directory path. Wherever they are stored, these files im-
plement a keyed-access file that contains the pickled representation of our three Python
objects. Don’t delete these files—they are your database, and are what you’ll need to
copy or transfer when you back up or move your storage.
You can look at the shelve’s files if you want to, either from Windows Explorer or the
Python shell, but they are binary hash files, and most of their content makes little sense
outside the context of the shelve module. With Python 3.X and no extra software
installed, our database is stored in three files (in 2.X, it’s just one file, persondb, because
the bsddb extension module is preinstalled with Python for shelves; in 3.X, bsddb is an
optional third-party open source add-on).
For example, Python’s standard library glob module allows us to get directory listings
in Python code to verify the files here, and we can open the files in text or binary mode
to explore strings and bytes:
>>> import glob
>>> glob.glob('person*')
['person-composite.py', 'person-department.py', 'person.py', 'person.pyc',
'persondb.bak', 'persondb.dat', 'persondb.dir']
>>> print(open('persondb.dir').read())
'Sue Jones', (512, 92)
'Tom Jones', (1024, 91)
'Bob Smith', (0, 80)
>>> print(open('persondb.dat','rb').read())
b'\x80\x03cperson\nPerson\nq\x00)\x81q\x01}q\x02(X\x03\x00\x00\x00jobq\x03NX\x03\x00
...more omitted...
This content isn’t impossible to decipher, but it can vary on different platforms and
doesn’t exactly qualify as a user-friendly database interface! To verify our work better,
we can write another script, or poke around our shelve at the interactive prompt. Be-
cause shelves are Python objects containing Python objects, we can process them with
normal Python syntax and development modes. Here, the interactive prompt effectively
becomes a database client:
>>> import shelve
>>> db = shelve.open('persondb') # Reopen the shelve
>>> len(db) # Three 'records' stored
3
>>> list(db.keys()) # keys is the index
['Sue Jones', 'Tom Jones', 'Bob Smith'] # list() to make a list in 3.X
>>> bob = db['Bob Smith'] # Fetch bob by key
>>> bob # Runs __repr__ from AttrDisplay
[Person: job=None, name=Bob Smith, pay=0]
>>> bob.lastName() # Runs lastName from Person
'Smith'
>>> for key in db: # Iterate, fetch, print
        print(key, '=>', db[key])
Sue Jones => [Person: job=dev, name=Sue Jones, pay=100000]
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000]
Bob Smith => [Person: job=None, name=Bob Smith, pay=0]
>>> for key in sorted(db):
        print(key, '=>', db[key])            # Iterate by sorted keys
Bob Smith => [Person: job=None, name=Bob Smith, pay=0]
Sue Jones => [Person: job=dev, name=Sue Jones, pay=100000]
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000]
Notice that we don’t have to import our Person or Manager classes here in order to load
or use our stored objects. For example, we can call bob’s lastName method freely, and
get his custom print display format automatically, even though we don’t have his
Person class in our scope here. This works because when Python pickles a class instance,
it records its self instance attributes, along with the name of the class it was created
from and the module where the class lives. When bob is later fetched from the shelve
and unpickled, Python will automatically reimport the class and link bob to it.
The upshot of this scheme is that class instances automatically acquire all their class
behavior when they are loaded in the future. We have to import our classes only to
make new instances, not to process existing ones. Although a deliberate feature, this
scheme has somewhat mixed consequences:
• The downside is that classes and their module's files must be importable when an instance is later loaded. More formally, pickleable classes must be coded at the top level of a module file accessible from a directory listed on the sys.path module search path (and shouldn't live in the topmost script file's module __main__ unless they're always in that module when used). Because of this external module file requirement, some applications choose to pickle simpler objects such as dictionaries or lists, especially if they are to be transferred across the Internet.
• The upside is that changes in a class's source code file are automatically picked up when instances of the class are loaded again; there is often no need to update stored objects themselves, since updating their class's code changes their behavior.
Shelves also have well-known limitations (the database suggestions at the end of this
chapter mention a few of these). For simple object storage, though, shelves and pickles
are remarkably easy-to-use tools.
Updating Objects on a Shelve
Now for one last script: let’s write a program that updates an instance (record) each
time it runs, to prove the point that our objects really are persistent—that their current
values are available every time a Python program runs. The following file, updatedb.py, prints the database and gives a raise to one of our stored objects each time. If
you trace through what’s going on here, you’ll notice that we’re getting a lot of utility
“for free”—printing our objects automatically employs the general __repr__ overload-
ing method, and we give raises by calling the giveRaise method we wrote earlier. This
all “just works” for objects based on OOP’s inheritance model, even when they live in
a file:
# File updatedb.py: update Person object on database
import shelve
db = shelve.open('persondb')                 # Reopen shelve with same filename

for key in sorted(db):                       # Iterate to display database objects
    print(key, '\t=>', db[key])              # Prints with custom format

sue = db['Sue Jones']                        # Index by key to fetch
sue.giveRaise(.10)                           # Update in memory using class's method
db['Sue Jones'] = sue                        # Assign to key to update in shelve
db.close()                                   # Close after making changes
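As an aside, shelve's writeback option, which the database notes at the end of this chapter mention again, can make the reassignment step implicit by caching fetched objects in memory and flushing them at close time, at some memory and speed cost. A minimal sketch, not used by this chapter's scripts:
import shelve
db = shelve.open('persondb', writeback=True) # Cache objects fetched from shelve
db['Sue Jones'].giveRaise(.10)               # Changes the cached copy only...
db.close()                                   # ...cached objects written back here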
Because this script prints the database when it starts up, we have to run it at least twice
to see our objects change. Here it is in action, displaying all records and increasing
sue’s pay each time it is run (it’s a pretty good script for sue...something to schedule to
run regularly as a cron job perhaps?):
C:\code> updatedb.py
Bob Smith => [Person: job=None, name=Bob Smith, pay=0]
Sue Jones => [Person: job=dev, name=Sue Jones, pay=100000]
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000]
C:\code> updatedb.py
Bob Smith => [Person: job=None, name=Bob Smith, pay=0]
Sue Jones => [Person: job=dev, name=Sue Jones, pay=110000]
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000]
C:\code> updatedb.py
Bob Smith => [Person: job=None, name=Bob Smith, pay=0]
Sue Jones => [Person: job=dev, name=Sue Jones, pay=121000]
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000]
C:\code> updatedb.py
Bob Smith => [Person: job=None, name=Bob Smith, pay=0]
Sue Jones => [Person: job=dev, name=Sue Jones, pay=133100]
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000]
Again, what we see here is a product of the shelve and pickle tools we get from Python,
and of the behavior we coded in our classes ourselves. And once again, we can verify
our script’s work at the interactive prompt—the shelve’s equivalent of a database client:
C:\code> python
>>> import shelve
>>> db = shelve.open('persondb') # Reopen database
>>> rec = db['Sue Jones'] # Fetch object by key
>>> rec
[Person: job=dev, name=Sue Jones, pay=146410]
>>> rec.lastName()
'Jones'
>>> rec.pay
146410
For another example of object persistence in this book, see the sidebar in Chapter 31
titled “Why You Will Care: Classes and Persistence” on page 941. It stores a some-
what larger composite object in a flat file with pickle instead of shelve, but the effect
is similar. For more details and examples for both pickles and shelves, see also Chap-
ter 9 (file basics) and Chapter 37 (3.X string tool changes), other books, and Python’s
manuals.
Future Directions
And that’s a wrap for this tutorial. At this point, you’ve seen all the basics of Python’s
OOP machinery in action, and you’ve learned ways to avoid redundancy and its asso-
ciated maintenance issues in your code. You’ve built full-featured classes that do real
work. As an added bonus, you’ve made them real database records by storing them in
a Python shelve, so their information lives on persistently.
There is much more we could explore here, of course. For example, we could extend
our classes to make them more realistic, add new kinds of behavior to them, and so on.
Giving a raise, for instance, should in practice verify that pay increase rates are between
zero and one—an extension we’ll add when we meet decorators later in this book. You
might also mutate this example into a personal contacts database, by changing the state
information stored on objects, as well as the classes’ methods used to process it. We’ll
leave this a suggested exercise open to your imagination.
We could also expand our scope to use tools that either come with Python or are freely
available in the open source world:
GUIs
As is, we can only process our database with the interactive prompt’s command-
based interface, and scripts. We could also work on expanding our object data-
base’s usability by adding a desktop graphical user interface for browsing and up-
dating its records. GUIs can be built portably with either Python’s tkinter
(Tkinter in 2.X) standard library support, or third-party toolkits such as WxPython
and PyQt. tkinter ships with Python, lets you build simple GUIs quickly, and is
ideal for learning GUI programming techniques; WxPython and PyQt tend to be
more complex to use but often produce higher-grade GUIs in the end.
Websites
Although GUIs are convenient and fast, the Web is hard to beat in terms of acces-
sibility. We might also implement a website for browsing and updating records,
instead of or in addition to GUIs and the interactive prompt. Websites can be
constructed with either basic CGI scripting tools that come with Python, or full-
featured third-party web frameworks such as Django, TurboGears, Pylons,
web2Py, Zope, or Google’s App Engine. On the Web, your data can still be stored
in a shelve, pickle file, or other Python-based medium; the scripts that process it
are simply run automatically on a server in response to requests from web browsers
and other clients, and they produce HTML to interact with a user, either directly
or by interfacing with framework APIs. Rich Internet application (RIA) systems
such as Silverlight and pyjamas also attempt to combine GUI-like interactivity with
web-based deployment.
Web services
Although web clients can often parse information in the replies from websites (a
technique colorfully known as “screen scraping”), we might go further and provide
a more direct way to fetch records on the Web via a web services interface such as
SOAP or XML-RPC calls—APIs supported by either Python itself or the third-party
open source domain, which generally map data to and from XML format for trans-
mission. To Python scripts, such APIs return data more directly than text embed-
ded in the HTML of a reply page.
Databases
If our database becomes higher-volume or critical, we might eventually move it
from shelves to a more full-featured storage mechanism such as the open source
ZODB object-oriented database system (OODB), or a more traditional SQL-based
relational database system such as MySQL, Oracle, or PostgreSQL. Python itself
comes with the in-process SQLite database system built-in, but other open source
options are freely available on the Web. ZODB, for example, is similar to Python’s
shelve but addresses many of its limitations, better supporting larger databases,
concurrent updates, transaction processing, and automatic write-through on in-
memory changes (shelves can cache objects and flush to disk at close time with
their writeback option, but this has limitations: see other resources). SQL-based
systems like MySQL offer enterprise-level tools for database storage and may be
directly used from a Python script. As we saw in Chapter 9, MongoDB offers an
alternative approach that stores JSON documents, which closely parallel Python
dictionaries and lists, and are language neutral, unlike pickle data.
ORMs
If we do migrate to a relational database system for storage, we don’t have to sac-
rifice Python’s OOP tools. Object-relational mappers (ORMs) like SQLObject and
SQLAlchemy can automatically map relational tables and rows to and from Python
classes and instances, such that we can process the stored data using normal Python
class syntax. This approach provides an alternative to OODBs like shelve and
ZODB and leverages the power of both relational databases and Python’s class
model.
While I hope this introduction whets your appetite for future exploration, all of these
topics are of course far beyond the scope of this tutorial and this book at large. If you
want to explore any of them on your own, see the Web, Python’s standard library
manuals, and application-focused books such as Programming Python. In the latter I
pick up this example where we’ve stopped here, showing how to add both a GUI and
a website on top of the database to allow for browsing and updating instance records.
I hope to see you there eventually, but first, let’s return to class fundamentals and finish
up the rest of the core Python language story.
Chapter Summary
In this chapter, we explored all the fundamentals of Python classes and OOP in action,
by building upon a simple but real example, step by step. We added constructors,
methods, operator overloading, customization with subclasses, and introspection-
based tools, and we met other concepts such as composition, delegation, and poly-
morphism along the way.
In the end, we took objects created by our classes and made them persistent by storing
them on a shelve object database—an easy-to-use system for saving and retrieving na-
tive Python objects by key. While exploring class basics, we also encountered multiple
ways to factor our code to reduce redundancy and minimize future maintenance costs.
Finally, we briefly previewed ways to extend our code with application-programming
tools such as GUIs and databases, covered in follow-up books.
In the next chapters of this part of the book, we’ll return to our study of the details
behind Python’s class model and investigate its application to some of the design con-
cepts used to combine classes in larger programs. Before we move ahead, though, let’s
work through this chapter’s quiz to review what we covered here. Since we’ve already
done a lot of hands-on work in this chapter, we’ll close with a set of mostly theory-
oriented questions designed to make you trace through some of the code and ponder
some of the bigger ideas behind it.
Test Your Knowledge: Quiz
1. When we fetch a Manager object from the shelve and print it, where does the display
format logic come from?
2. When we fetch a Person object from a shelve without importing its module, how
does the object know that it has a giveRaise method that we can call?
3. Why is it so important to move processing into methods, instead of hardcoding it
outside the class?
4. Why is it better to customize by subclassing rather than copying the original and
modifying?
5. Why is it better to call back to a superclass method to run default actions, instead
of copying and modifying its code in a subclass?
6. Why is it better to use tools like __dict__ that allow objects to be processed ge-
nerically than to write more custom code for each type of class?
7. In general terms, when might you choose to use object embedding and composition
instead of inheritance?
8. What would you have to change if the objects coded in this chapter used a dictio-
nary for names and a list for jobs, as in similar examples earlier in this book?
9. How might you modify the classes in this chapter to implement a personal contacts
database in Python?
Test Your Knowledge: Answers
1. In the final version of our classes, Manager ultimately inherits its __repr__ printing method from AttrDisplay in the separate classtools module, two levels up in the class tree. Manager doesn't have one itself, so the inheritance search climbs to
its Person superclass; because there is no __repr__ there either, the search climbs
higher and finds it in AttrDisplay. The class names listed in parentheses in a
class statement’s header line provide the links to higher superclasses.
2. Shelves (really, the pickle module they use) automatically relink an instance to the
class it was created from when that instance is later loaded back into memory.
Python reimports the class from its module internally, creates an instance with its
stored attributes, and sets the instance’s __class__ link to point to its original class.
This way, loaded instances automatically obtain all their original methods (like
lastName, giveRaise, and __repr__), even if we have not imported the instance’s
class into our scope.
3. It’s important to move processing into methods so that there is only one copy to
change in the future, and so that the methods can be run on any instance. This is
Python’s notion of encapsulation—wrapping up logic behind interfaces, to better
support future code maintenance. If you don’t do so, you create code redundancy
that can multiply your work effort as the code evolves in the future.
4. Customizing with subclasses reduces development effort. In OOP, we code by
customizing what has already been done, rather than copying or changing existing
code. This is the real “big idea” in OOP—because we can easily extend our prior
work by coding new subclasses, we can leverage what we’ve already done. This is
much better than either starting from scratch each time, or introducing multiple
redundant copies of code that may all have to be updated in the future.
5. Copying and modifying code doubles your potential work effort in the future, re-
gardless of the context. If a subclass needs to perform default actions coded in a
superclass method, it’s much better to call back to the original through the super-
class’s name than to copy its code. This also holds true for superclass constructors.
Again, copying code creates redundancy, which is a major issue as code evolves.
6. Generic tools can avoid hardcoded solutions that must be kept in sync with the
rest of the class as it evolves over time. A generic __repr__ print method, for ex-
ample, need not be updated each time a new attribute is added to instances in an
__init__ constructor. In addition, a generic print method inherited by all classes
appears and need be modified in only one place—changes in the generic version
are picked up by all classes that inherit from the generic class. Again, eliminating
code redundancy cuts future development effort; that’s one of the primary assets
classes bring to the table.
7. Inheritance is best at coding extensions based on direct customization (like our
Manager specialization of Person). Composition is well suited to scenarios where
multiple objects are aggregated into a whole and directed by a controller layer class.
Inheritance passes calls up to reuse, and composition passes down to delegate.
Inheritance and composition are not mutually exclusive; often, the objects em-
bedded in a controller are themselves customizations based upon inheritance.
8. Not much, since this was really a first-cut prototype, but the lastName method would need to be updated for the new name format; the Person constructor would have to change the job default to an empty list; and the Manager class would probably
need to pass along a job list in its constructor instead of a single string (self-test
code would change as well, of course). The good news is that these changes would
need to be made in just one place—in our classes, where such details are encap-
sulated. The database scripts should work as is, as shelves support arbitrarily nested data.
9. The classes in this chapter could be used as boilerplate “template” code to imple-
ment a variety of types of databases. Essentially, you can repurpose them by mod-
ifying the constructors to record different attributes and providing whatever meth-
ods are appropriate for the target application. For instance, you might use at-
tributes such as name, address, birthday, phone, email, and so on for a contacts
database, and methods appropriate for this purpose. A method named sendmail,
for example, might use Python's standard library smtplib module to send an email
to one of the contacts automatically when called (see Python’s manuals or appli-
cation-level books for more details on such tools). The AttrDisplay tool we wrote
here could be used verbatim to print your objects, because it is intentionally
generic. Most of the shelve database code here can be used to store your objects,
too, with minor changes.
CHAPTER 29
Class Coding Details
If you haven’t quite gotten all of Python OOP yet, don’t worry; now that we’ve had a
first tour, we’re going to dig a bit deeper and study the concepts introduced earlier in
further detail. In this and the following chapter, we’ll take another look at class me-
chanics. Here, we’re going to study classes, methods, and inheritance, formalizing and
expanding on some of the coding ideas introduced in Chapter 27. Because the class is
our last namespace tool, we’ll summarize Python’s namespace and scope concepts as
well.
The next chapter continues this in-depth second pass over class mechanics by covering
one specific aspect: operator overloading. Besides presenting additional details, this
chapter and the next also give us an opportunity to explore some larger classes than
those we have studied so far.
Content note: if you’ve been reading linearly, some of this chapter will be review and
summary of topics introduced in the preceding chapter’s case study, revisited here by
language topics with smaller and more self-contained examples for readers new to
OOP. Others may be tempted to skip some of this chapter, but be sure to see the
namespace coverage here, as it explains some subtleties in Python’s class model.
The class Statement
Although the Python class statement may seem similar to tools in other OOP languages
on the surface, on closer inspection, it is quite different from what some programmers
are used to. For example, as in C++, the class statement is Python’s main OOP tool,
but unlike in C++, Python’s class is not a declaration. Like a def, a class statement is
an object builder, and an implicit assignment—when run, it generates a class object
and stores a reference to it in the name used in the header. Also like a def, a class
statement is true executable code—your class doesn’t exist until Python reaches and
runs the class statement that defines it. This typically occurs while importing the
module it is coded in, but not before.
General Form
class is a compound statement, with a body of statements, typically indented, appearing
under the header. In the header, superclasses are listed in parentheses after the class
name, separated by commas. Listing more than one superclass leads to multiple in-
heritance, which we’ll discuss more formally in Chapter 31. Here is the statement’s
general form:
class name(superclass,...):                  # Assign to name
    attr = value                             # Shared class data
    def method(self,...):                    # Methods
        self.attr = value                    # Per-instance data
Within the class statement, any assignments generate class attributes, and specially
named methods overload operators; for instance, a function called __init__ is called
at instance object construction time, if defined.
Example
As we’ve seen, classes are mostly just namespaces—that is, tools for defining names
(i.e., attributes) that export data and logic to clients. A class statement effectively de-
fines a namespace. Just as in a module file, the statements nested in a class statement
body create its attributes. When Python executes a class statement (not a call to a
class), it runs all the statements in its body, from top to bottom. Assignments that
happen during this process create names in the class’s local scope, which become at-
tributes in the associated class object. Because of this, classes resemble both modules
and functions:
• Like functions, class statements are local scopes where names created by nested assignments live.
• Like names in a module, names assigned in a class statement become attributes in a class object.
The main distinction for classes is that their namespaces are also the basis of inheritance in Python; referenced attributes that are not found in a class or instance object are fetched from other classes.
Because class is a compound statement, any sort of statement can be nested inside its
body—print, assignments, if, def, and so on. All the statements inside the class state-
ment run when the class statement itself runs (not when the class is later called to make
an instance). Typically, assignment statements inside the class statement make data
attributes, and nested defs make method attributes. In general, though, any type of
name assignment at the top level of a class statement creates a same-named attribute
of the resulting class object.
For example, assignments of simple nonfunction objects to class attributes produce
data attributes, shared by all instances:
>>> class SharedData:
        spam = 42                            # Generates a class data attribute
>>> x = SharedData() # Make two instances
>>> y = SharedData()
>>> x.spam, y.spam # They inherit and share 'spam' (a.k.a. SharedData.spam)
(42, 42)
Here, because the name spam is assigned at the top level of a class statement, it is
attached to the class and so will be shared by all instances. We can change it by going
through the class name, and we can refer to it through either instances or the class:
>>> SharedData.spam = 99
>>> x.spam, y.spam, SharedData.spam
(99, 99, 99)
Such class attributes can be used to manage information that spans all the instances—
a counter of the number of instances generated, for example (we’ll expand on this idea
by example in Chapter 32). Now, watch what happens if we assign the name spam
through an instance instead of the class:
>>> x.spam = 88
>>> x.spam, y.spam, SharedData.spam
(88, 99, 99)
Assignments to instance attributes create or change the names in the instance, rather
than in the shared class. More generally, inheritance searches occur only on attribute
references, not on assignment: assigning to an object’s attribute always changes that
object, and no other. For example, y.spam is looked up in the class by inheritance, but
the assignment to x.spam attaches a name to x itself.
Here’s a more comprehensive example of this behavior that stores the same name in
two places. Suppose we run the following class:
class MixedNames:                            # Define class
    data = 'spam'                            # Assign class attr
    def __init__(self, value):               # Assign method name
        self.data = value                    # Assign instance attr
    def display(self):
        print(self.data, MixedNames.data)    # Instance attr, class attr
1. If you’ve used C++ you may recognize this as similar to the notion of C++’s “static” data members—
members that are stored in the class, independent of instances. In Python, it’s nothing special: all class
attributes are just names assigned in the class statement, whether they happen to reference functions
(C++’s “methods”) or something else (C++’s “members”). In Chapter 32, we’ll also meet Python static
methods (akin to those in C++), which are just self-less functions that usually process class attributes.
2. Unless the class has redefined the attribute assignment operation to do something unique with the
__setattr__ operator overloading method (discussed in Chapter 30), or uses advanced attribute tools
such as properties and descriptors (discussed in Chapter 32 and Chapter 38). Much of this chapter presents
the normal case, which suffices at this point in the book, but as we’ll see later, Python hooks allow
programs to deviate from the norm often.
This class contains two defs, which bind class attributes to method functions. It also
contains an = assignment statement; because this assignment assigns the name data
inside the class, it lives in the class’s local scope and becomes an attribute of the class
object. Like all class attributes, this data is inherited and shared by all instances of the
class that don’t have data attributes of their own.
When we make instances of this class, the name data is attached to those instances by
the assignment to self.data in the constructor method:
>>> x = MixedNames(1) # Make two instance objects
>>> y = MixedNames(2) # Each has its own data
>>> x.display(); y.display() # self.data differs, MixedNames.data is the same
1 spam
2 spam
The net result is that data lives in two places: in the instance objects (created by the
self.data assignment in __init__), and in the class from which they inherit names
(created by the data assignment in the class). The class’s display method prints both
versions, by first qualifying the self instance, and then the class.
By using these techniques to store attributes in different objects, we determine their
scope of visibility. When attached to classes, names are shared; in instances, names
record per-instance data, not shared behavior or data. Although inheritance searches
look up names for us, we can always get to an attribute anywhere in a tree by accessing
the desired object directly.
In the preceding example, for instance, specifying x.data or self.data will return an
instance name, which normally hides the same name in the class; however, MixedNames.data grabs the class's version of the name explicitly. The next section describes
one of the most common roles for such coding patterns, and explains more about the
way we deployed it in the prior chapter.
Methods
Because you already know about functions, you also know about methods in classes.
Methods are just function objects created by def statements nested in a class state-
ment’s body. From an abstract perspective, methods provide behavior for instance
objects to inherit. From a programming perspective, methods work in exactly the same
way as simple functions, with one crucial exception: a method’s first argument always
receives the instance object that is the implied subject of the method call.
In other words, Python automatically maps instance method calls to a class’s method
functions as follows. Method calls made through an instance, like this:
instance.method(args...)
are automatically translated to class method function calls of this form:
class.method(instance, args...)
where Python determines the class by locating the method name using the inheritance
search procedure. In fact, both call forms are valid in Python.
Besides the normal inheritance of method attribute names, the special first argument
is the only real magic behind method calls. In a class’s method, the first argument is
usually called self by convention (technically, only its position is significant, not its
name). This argument provides methods with a hook back to the instance that is the
subject of the call—because classes generate many instance objects, they need to use
this argument to manage data that varies per instance.
C++ programmers may recognize Python’s self argument as being similar to C++’s
this pointer. In Python, though, self is always explicit in your code: methods must
always go through self to fetch or change attributes of the instance being processed
by the current method call. This explicit nature of self is by design—the presence of
this name makes it obvious that you are using instance attribute names in your script,
not names in the local or global scope.
Method Example
To clarify these concepts, let’s turn to an example. Suppose we define the following
class:
class NextClass:                             # Define class
    def printer(self, text):                 # Define method
        self.message = text                  # Change instance
        print(self.message)                  # Access instance
The name printer references a function object; because it’s assigned in the class state-
ment’s scope, it becomes a class object attribute and is inherited by every instance made
from the class. Normally, because methods like printer are designed to process in-
stances, we call them through instances:
>>> x = NextClass() # Make instance
>>> x.printer('instance call') # Call its method
instance call
>>> x.message # Instance changed
'instance call'
When we call the method by qualifying an instance like this, printer is first located by
inheritance, and then its self argument is automatically assigned the instance object
(x); the text argument gets the string passed at the call ('instance call'). Notice that
because Python automatically passes the first argument to self for us, we only actually
have to pass in one argument. Inside printer, the name self is used to access or set
per-instance data because it refers back to the instance currently being processed.
As we’ve seen, though, methods may be called in one of two ways—through an in-
stance, or through the class itself. For example, we can also call printer by going
through the class name, provided we pass an instance to the self argument explicitly:
>>> NextClass.printer(x, 'class call') # Direct class call
class call
>>> x.message # Instance changed again
'class call'
Calls routed through the instance and the class have the exact same effect, as long as
we pass the same instance object ourselves in the class form. By default, in fact, you get
an error message if you try to call a method without any instance:
>>> NextClass.printer('bad call')
TypeError: unbound method printer() must be called with NextClass instance...
Calling Superclass Constructors
Methods are normally called through instances. Calls to methods through a class,
though, do show up in a variety of special roles. One common scenario involves the
constructor method. The __init__ method, like all attributes, is looked up by inheri-
tance. This means that at construction time, Python locates and calls just one
__init__. If subclass constructors need to guarantee that superclass construction-time
logic runs, too, they generally must call the superclass’s __init__ method explicitly
through the class:
class Super:
    def __init__(self, x):
        ...default code...

class Sub(Super):
    def __init__(self, x, y):
        Super.__init__(self, x)              # Run superclass __init__
        ...custom code...                    # Do my init actions

I = Sub(1, 2)
This is one of the few contexts in which your code is likely to call an operator over-
loading method directly. Naturally, you should call the superclass constructor this way
only if you really want it to run—without the call, the subclass replaces it completely.
For a more realistic illustration of this technique in action, see the Manager class example
in the prior chapter’s tutorial.3
Other Method Call Possibilities
This pattern of calling methods through a class is the general basis of extending—
instead of completely replacing—inherited method behavior. It requires an explicit
instance to be passed because all methods require one by default. Technically, this is because
methods are instance methods in the absence of any special code.
3. On a related note, you can also code multiple __init__ methods within the same class, but only the last
definition will be used; see Chapter 31 for more details on multiple method definitions.
In Chapter 32, we’ll also meet a newer option added in Python 2.2, static methods, that
allow you to code methods that do not expect instance objects in their first arguments.
Such methods can act like simple instanceless functions, with names that are local to
the classes in which they are coded, and may be used to manage class data. A related
concept we’ll meet in the same chapter, the class method, receives a class when called
instead of an instance and can be used to manage per-class data, and is implied in
metaclasses.
These are both advanced and usually optional extensions, though. Normally, an in-
stance must always be passed to a method—whether automatically when it is called
through an instance, or manually when you call through a class.
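As a very brief and entirely optional preview of those tools, here is a minimal sketch; the Spam class is hypothetical, and the decorator syntax used is covered later in the book:
class Spam:
    count = 0                                # Per-class data, shared by instances

    @staticmethod
    def tally():                             # No instance or class argument
        return Spam.count

    @classmethod
    def make(cls):                           # Receives the class, not an instance
        cls.count += 1
        return cls()

x = Spam.make()                              # Called through the class: no instance
print(Spam.tally())                          # Prints: 1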
Per the sidebar “What About super?” on page 831 in Chapter 28, Python
also has a super built-in function that allows calling back to a super-
class’s methods more generically, but we’ll defer its presentation until
Chapter 32 due to its downsides and complexities. See the aforemen-
tioned sidebar for more details; this call has well-known tradeoffs in
basic usage, and an esoteric advanced use case that requires universal
deployment to be most effective. Because of these issues, this book pre-
fers to call superclasses by explicit name instead of super as a policy; if
you’re new to Python, I recommend the same approach for now, espe-
cially for your first pass over OOP. Learn the simple way now, so you
can compare it to others later.
Inheritance
Of course, the whole point of the namespace created by the class statement is to sup-
port name inheritance. This section expands on some of the mechanisms and roles of
attribute inheritance in Python.
As we’ve seen, in Python, inheritance happens when an object is qualified, and it in-
volves searching an attribute definition tree—one or more namespaces. Every time you
use an expression of the form object.attr where object is an instance or class object,
Python searches the namespace tree from bottom to top, beginning with object, looking
for the first attr it can find. This includes references to self attributes in your methods.
Because lower definitions in the tree override higher ones, inheritance forms the basis
of specialization.
Attribute Tree Construction
Figure 29-1 summarizes the way namespace trees are constructed and populated with
names. Generally:
• Instance attributes are generated by assignments to self attributes in methods.
• Class attributes are created by statements (assignments) in class statements.
• Superclass links are made by listing classes in parentheses in a class statement header.
The net result is a tree of attribute namespaces that leads from an instance, to the class
it was generated from, to all the superclasses listed in the class header. Python searches
upward in this tree, from instances to superclasses, each time you use qualification to
fetch an attribute name from an instance object.
Figure 29-1. Program code creates a tree of objects in memory to be searched by attribute inheritance. Calling a class creates a new instance that remembers its class, running a class statement creates a new class, and superclasses are listed in parentheses in the class statement header. Each attribute reference triggers a new bottom-up tree search—even references to self attributes within a class's methods.

4. Two fine points here: first, this description isn't 100% complete, because we can also create instance and class attributes by assigning them to objects outside class statements—but that's a much less common and sometimes more error-prone approach (changes aren't isolated to class statements). In Python, all attributes are always accessible by default. We'll talk more about attribute name privacy in Chapter 30 when we study __setattr__, in Chapter 31 when we meet __X names, and again in Chapter 39, where we'll implement it with a class decorator. Second, as also noted in Chapter 27, the full inheritance story grows more convoluted when advanced topics such as metaclasses and descriptors are added to the mix—and we're deferring a formal definition until Chapter 40 for this reason. In common usage, though, it's simply a way to redefine, and hence customize, behavior coded in classes.

Specializing Inherited Methods

The tree-searching model of inheritance just described turns out to be a great way to specialize systems. Because inheritance finds names in subclasses before it checks superclasses, subclasses can replace default behavior by redefining their superclasses' attributes. In fact, you can build entire systems as hierarchies of classes, which you extend by adding new external subclasses rather than changing existing logic in place.
The idea of redefining inherited names leads to a variety of specialization techniques.
For instance, subclasses may replace inherited attributes completely, provide attributes
that a superclass expects to find, and extend superclass methods by calling back to the
superclass from an overridden method. We’ve already seen some of these patterns in
action; here’s a self-contained example of extension at work:
>>> class Super:
        def method(self):
            print('in Super.method')

>>> class Sub(Super):
        def method(self):                           # Override method
            print('starting Sub.method')            # Add actions here
            Super.method(self)                      # Run default action
            print('ending Sub.method')
Direct superclass method calls are the crux of the matter here. The Sub class replaces
Super’s method function with its own specialized version, but within the replacement,
Sub calls back to the version exported by Super to carry out the default behavior. In
other words, Sub.method just extends Super.method’s behavior, rather than replacing it
completely:
>>> x = Super() # Make a Super instance
>>> x.method() # Runs Super.method
in Super.method
>>> x = Sub() # Make a Sub instance
>>> x.method() # Runs Sub.method, calls Super.method
starting Sub.method
in Super.method
ending Sub.method
This extension coding pattern is also commonly used with constructors; see the section
“Methods” on page 862 for an example.
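As a quick preview of that section's idea, a subclass __init__ typically runs the superclass's constructor by calling it through the class name and passing the instance manually; the classes in this sketch are invented for illustration:
>>> class Person:
        def __init__(self, name):
            self.name = name

>>> class Employee(Person):
        def __init__(self, name, job):
            Person.__init__(self, name)             # Run superclass constructor too
            self.job = job                          # Then do subclass-specific setup

>>> e = Employee('Bob', 'dev')
>>> e.name, e.job
('Bob', 'dev')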
Class Interface Techniques
Extension is only one way to interface with a superclass. The file shown in this section,
specialize.py, defines multiple classes that illustrate a variety of common techniques:
Super
Defines a method function and a delegate that expects an action in a subclass.
Inheritor
Doesn’t provide any new names, so it gets everything defined in Super.
Replacer
Overrides Super’s method with a version of its own.
Extender
Customizes Super’s method by overriding and calling back to run the default.
Provider
Implements the action method expected by Super’s delegate method.
Study each of these subclasses to get a feel for the various ways they customize their
common superclass. Here’s the file:
class Super:
    def method(self):
        print('in Super.method')                # Default behavior
    def delegate(self):
        self.action()                           # Expected to be defined

class Inheritor(Super):                         # Inherit method verbatim
    pass

class Replacer(Super):                          # Replace method completely
    def method(self):
        print('in Replacer.method')

class Extender(Super):                          # Extend method behavior
    def method(self):
        print('starting Extender.method')
        Super.method(self)
        print('ending Extender.method')

class Provider(Super):                          # Fill in a required method
    def action(self):
        print('in Provider.action')

if __name__ == '__main__':
    for klass in (Inheritor, Replacer, Extender):
        print('\n' + klass.__name__ + '...')
        klass().method()
    print('\nProvider...')
    x = Provider()
    x.delegate()
A few things are worth pointing out here. First, notice how the self-test code at the end
of this example creates instances of three different classes in a for loop. Because classes
are objects, you can store them in a tuple and create instances generically with no extra
syntax (more on this idea later). Classes also have the special __name__ attribute, like
modules; it’s preset to a string containing the name in the class header. Here’s what
happens when we run the file:
% python specialize.py
Inheritor...
in Super.method
Replacer...
in Replacer.method
Extender...
starting Extender.method
in Super.method
ending Extender.method
Provider...
in Provider.action
Abstract Superclasses
Of the prior example’s classes, Provider may be the most crucial to understand. When
we call the delegate method through a Provider instance, two independent inheritance
searches occur:
1. On the initial x.delegate call, Python finds the delegate method in Super by
searching the Provider instance and above. The instance x is passed into the
method’s self argument as usual.
2. Inside the Super.delegate method, self.action invokes a new, independent in-
heritance search of self and above. Because self references a Provider instance,
the action method is located in the Provider subclass.
This “filling in the blanks” sort of coding structure is typical of OOP frameworks. In a
more realistic context, the method filled in this way might handle an event in a GUI,
provide data to be rendered as part of a web page, process a tag’s text in an XML file,
and so on—your subclass provides specific actions, but the framework handles the rest
of the overall job.
At least in terms of the delegate method, the superclass in this example is what is
sometimes called an abstract superclass—a class that expects parts of its behavior to be
provided by its subclasses. If an expected method is not defined in a subclass, Python
raises an undefined name exception when the inheritance search fails.
Class coders sometimes make such subclass requirements more obvious with assert
statements, or by raising the built-in NotImplementedError exception with raise state-
ments. We’ll study statements that may trigger exceptions in depth in the next part of
this book; as a quick preview, here’s the assert scheme in action:
class Super:
    def delegate(self):
        self.action()
    def action(self):
        assert False, 'action must be defined!'     # If this version is called

>>> X = Super()
>>> X.delegate()
AssertionError: action must be defined!
We’ll meet assert in Chapter 33 and Chapter 34; in short, if its first expression evaluates
to false, it raises an exception with the provided error message. Here, the expression is
always false so as to trigger an error message if a method is not redefined, and inheri-
tance locates the version here. Alternatively, some classes simply raise a NotImplemen
tedError exception directly in such method stubs to signal the mistake:
class Super:
    def delegate(self):
        self.action()
    def action(self):
        raise NotImplementedError('action must be defined!')

>>> X = Super()
>>> X.delegate()
NotImplementedError: action must be defined!
For instances of subclasses, we still get the exception unless the subclass provides the
expected method to replace the default in the superclass:
>>> class Sub(Super): pass
>>> X = Sub()
>>> X.delegate()
NotImplementedError: action must be defined!
>>> class Sub(Super):
        def action(self): print('spam')
>>> X = Sub()
>>> X.delegate()
spam
For a somewhat more realistic example of this section’s concepts in action, see the “Zoo
animal hierarchy” exercise (Exercise 8) at the end of Chapter 32, and its solution in
“Part VI, Classes and OOP” in Appendix D. Such taxonomies are a traditional way to
introduce OOP, but they’re a bit removed from most developers’ job descriptions (with
apologies to any readers who happen to work at the zoo!).
Abstract superclasses in Python 3.X and 2.6+: Preview
As of Python 2.6 and 3.0, the prior section’s abstract superclasses (a.k.a. “abstract base
classes”), which require methods to be filled in by subclasses, may also be implemented
with special class syntax. The way we code this varies slightly depending on the version.
In Python 3.X, we use a keyword argument in a class header, along with special @
decorator syntax, both of which we’ll study in detail later in this book:
from abc import ABCMeta, abstractmethod

class Super(metaclass=ABCMeta):
    @abstractmethod
    def method(self, ...):
        pass
But in Python 2.6 and 2.7, we use a class attribute instead:
class Super:
    __metaclass__ = ABCMeta
    @abstractmethod
    def method(self, ...):
        pass
Either way, the effect is the same—we can’t make an instance unless the method is
defined lower in the class tree. In 3.X, for example, here is the special syntax equivalent
of the prior section’s example:
>>> from abc import ABCMeta, abstractmethod
>>>
>>> class Super(metaclass=ABCMeta):
        def delegate(self):
            self.action()
        @abstractmethod
        def action(self):
            pass

>>> X = Super()
TypeError: Can't instantiate abstract class Super with abstract methods action

>>> class Sub(Super): pass
>>> X = Sub()
TypeError: Can't instantiate abstract class Sub with abstract methods action

>>> class Sub(Super):
        def action(self): print('spam')

>>> X = Sub()
>>> X.delegate()
spam
Coded this way, a class with an abstract method cannot be instantiated (that is, we
cannot create an instance by calling it) unless all of its abstract methods have been
defined in subclasses. Although this requires more code and extra knowledge, the po-
tential advantage of this approach is that errors for missing methods are issued when
we attempt to make an instance of the class, not later when we try to call a missing
method. This feature may also be used to define an expected interface, automatically
verified in client classes.
Unfortunately, this scheme also relies on two advanced language tools we have not met
yet—function decorators, introduced in Chapter 32 and covered in depth in Chap-
ter 39, as well as metaclass declarations, mentioned in Chapter 32 and covered in
Chapter 40—so we will finesse other facets of this option here. See Python’s standard
manuals for more on this, as well as precoded abstract superclasses Python provides.
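As a brief taste of those precoded abstract superclasses, here's a minimal sketch using the standard library's collections.abc module (this module location is new as of Python 3.3); subclasses must fill in __getitem__ and __len__, and inherit sequence extras such as in membership tests and the index method for free:
>>> from collections.abc import Sequence

>>> class Letters(Sequence):                # Precoded abstract superclass
        def __init__(self, data):
            self.data = data
        def __getitem__(self, i):           # Required abstract method
            return self.data[i]
        def __len__(self):                  # Required abstract method
            return len(self.data)

>>> L = Letters('spam')
>>> 'a' in L, L.index('m'), list(L)         # Inherited sequence behavior
(True, 3, ['s', 'p', 'a', 'm'])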
Namespaces: The Conclusion
Now that we’ve examined class and instance objects, the Python namespace story is
complete. For reference, I’ll quickly summarize all the rules used to resolve names here.
The first things you need to remember are that qualified and unqualified names are
treated differently, and that some scopes serve to initialize object namespaces:
• Unqualified names (e.g., X) deal with scopes.
• Qualified attribute names (e.g., object.X) use object namespaces.
• Some scopes initialize object namespaces (for modules and classes).
These concepts sometimes interact—in object.X, for example, object is looked up per
scopes, and then X is looked up in the result objects. Since scopes and namespaces are
essential to understanding Python code, let’s summarize the rules in more detail.
Simple Names: Global Unless Assigned
As we’ve learned, unqualified simple names follow the LEGB lexical scoping rule out-
lined when we explored functions in Chapter 17:
Assignment (X = value)
Makes names local by default: creates or changes the name X in the current local
scope, unless declared global (or nonlocal in 3.X).
Reference (X)
Looks for the name X in the current local scope, then any and all enclosing func-
tions, then the current global scope, then the built-in scope, per the LEGB rule.
Enclosing classes are not searched: class names are fetched as object attributes
instead.
Also per Chapter 17, some special-case constructs localize names further (e.g., variables
in some comprehensions and try statement clauses), but the vast majority of names
follow the LEGB rule.
Attribute Names: Object Namespaces
We’ve also seen that qualified attribute names refer to attributes of specific objects and
obey the rules for modules and classes. For class and instance objects, the reference
rules are augmented to include the inheritance search procedure:
Assignment (object.X = value)
Creates or alters the attribute name X in the namespace of the object being quali-
fied, and none other. Inheritance-tree climbing happens only on attribute refer-
ence, not on attribute assignment.
Reference (object.X)
For class-based objects, searches for the attribute name X in object, then in all
accessible classes above it, using the inheritance search procedure. For nonclass
objects such as modules, fetches X from object directly.
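A short session makes the asymmetry concrete: references climb the tree, but assignments simply create or change a name in the one object being qualified:
>>> class C: X = 1                  # Class attribute

>>> I = C()
>>> I.X                             # Reference: climbs to the class
1
>>> I.X = 2                         # Assignment: stored in I itself, not C
>>> I.X, C.X                        # Instance now hides the class's X
(2, 1)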
As noted earlier, the preceding captures the normal and typical case. These attribute
rules can vary in classes that utilize more advanced tools, especially for new-style classes
—an option in 2.X and the standard in 3.X, which we’ll explore in Chapter 32. For
example, reference inheritance can be richer than implied here when metaclasses are
deployed, and classes which leverage attribute management tools such as properties,
descriptors, and __setattr__ can intercept and route attribute assignments arbitrarily.
In fact, some inheritance is run on assignment too, to locate descriptors with a
__set__ method in new-style classes; such tools override the normal rules for both
reference and assignment. We’ll explore attribute management tools in depth in Chap-
ter 38, and formalize inheritance and its use of descriptors in Chapter 40. For now,
most readers should focus on the normal rules given here, which cover most Python
application code.
The “Zen” of Namespaces: Assignments Classify Names
With distinct search procedures for qualified and unqualified names, and multiple
lookup layers for both, it can sometimes be difficult to tell where a name will wind up
going. In Python, the place where you assign a name is crucial—it fully determines the
scope or object in which a name will reside. The file manynames.py illustrates how this
principle translates to code and summarizes the namespace ideas we have seen through-
out this book (sans obscure special-case scopes like comprehensions):
# File manynames.py

X = 11                          # Global (module) name/attribute (X, or manynames.X)

def f():
    print(X)                    # Access global X (11)

def g():
    X = 22                      # Local (function) variable (X, hides module X)
    print(X)

class C:
    X = 33                      # Class attribute (C.X)
    def m(self):
        X = 44                  # Local variable in method (X)
        self.X = 55             # Instance attribute (instance.X)
This file assigns the same name, X, five times—illustrative, though not exactly best
practice! Because this name is assigned in five different locations, though, all five Xs in
this program are completely different variables. From top to bottom, the assignments
to X here generate: a module attribute (11), a local variable in a function (22), a class
attribute (33), a local variable in a method (44), and an instance attribute (55). Although
all five are named X, the fact that they are all assigned at different places in the source
code or to different objects makes all of these unique variables.
You should take the time to study this example carefully because it collects ideas we’ve
been exploring throughout the last few parts of this book. When it makes sense to you,
you will have achieved Python namespace enlightenment. Or, you can run the code
and see what happens—here’s the remainder of this source file, which makes an in-
stance and prints all the Xs that it can fetch:
# manynames.py, continued
if __name__ == '__main__':
    print(X)                    # 11: module (a.k.a. manynames.X outside file)
    f()                         # 11: global
    g()                         # 22: local
    print(X)                    # 11: module name unchanged

    obj = C()                   # Make instance
    print(obj.X)                # 33: class name inherited by instance

    obj.m()                     # Attach attribute name X to instance now
    print(obj.X)                # 55: instance
    print(C.X)                  # 33: class (a.k.a. obj.X if no X in instance)

    #print(C.m.X)               # FAILS: only visible in method
    #print(g.X)                 # FAILS: only visible in function
The outputs that are printed when the file is run are noted in the comments in the code;
trace through them to see which variable named X is being accessed each time. Notice
in particular that we can go through the class to fetch its attribute (C.X), but we can
never fetch local variables in functions or methods from outside their def statements.
Locals are visible only to other code within the def, and in fact only live in memory
while a call to the function or method is executing.
Some of the names defined by this file are visible outside the file to other modules too,
but recall that we must always import before we can access names in another file—
name segregation is the main point of modules, after all:
# otherfile.py
import manynames
X = 66
print(X) # 66: the global here
print(manynames.X) # 11: globals become attributes after imports
manynames.f() # 11: manynames's X, not the one here!
manynames.g() # 22: local in other file's function
print(manynames.C.X) # 33: attribute of class in other module
I = manynames.C()
print(I.X) # 33: still from class here
I.m()
print(I.X) # 55: now from instance!
Notice here how manynames.f() prints the X in manynames, not the X assigned in this file
—scopes are always determined by the position of assignments in your source code
(i.e., lexically) and are never influenced by what imports what or who imports whom.
Also, notice that the instance’s own X is not created until we call I.m()—attributes, like
all variables, spring into existence when assigned, and not before. Normally we create
instance attributes by assigning them in class __init__ constructor methods, but this
isn’t the only option.
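For example, nothing prevents a program from attaching attributes to an instance anywhere; the names in this tiny sketch are invented for illustration:
>>> class Rec: pass                 # Empty namespace object

>>> r = Rec()
>>> r.name = 'Bob'                  # Attribute created by this assignment
>>> r.name
'Bob'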
Finally, as we learned in Chapter 17, it’s also possible for a function to change names
outside itself, with global and (in Python 3.X) nonlocal statements—these statements
provide write access, but also modify assignment’s namespace binding rules:
X = 11                          # Global in module

def g1():
    print(X)                    # Reference global in module (11)

def g2():
    global X
    X = 22                      # Change global in module

def h1():
    X = 33                      # Local in function
    def nested():
        print(X)                # Reference local in enclosing scope (33)

def h2():
    X = 33                      # Local in function
    def nested():
        nonlocal X              # Python 3.X statement
        X = 44                  # Change local in enclosing scope
Of course, you generally shouldn’t use the same name for every variable in your script
—but as this example demonstrates, even if you do, Python’s namespaces will work to
keep names used in one context from accidentally clashing with those used in another.
Nested Classes: The LEGB Scopes Rule Revisited
The preceding example summarized the effect of nested functions on scopes, which we
studied in Chapter 17. It turns out that classes can be nested too—a useful coding
pattern in some types of programs, with scope implications that follow naturally from
what you already know, but that may not be obvious on first encounter. This section
illustrates the concept by example.
Though they are normally coded at the top level of a module, classes also sometimes
appear nested in functions that generate them—a variation on the “factory function”
(a.k.a. closure) theme in Chapter 17, with similar state retention roles. There we noted
that class statements introduce new local scopes much like function def statements,
which follow the same LEGB scope lookup rule as function definitions.
This rule applies both to the top level of the class itself, as well as to the top level of
method functions nested within it. Both form the L layer in this rule—they are normal
local scopes, with access to their names, names in any enclosing functions, globals in
the enclosing module, and built-ins. Like modules, the class’s local scope morphs into
an attribute namespace after the class statement is run.
Although classes have access to enclosing functions’ scopes, though, they do not act
as enclosing scopes to code nested within the class: Python searches enclosing functions
for referenced names, but never any enclosing classes. That is, a class is a local scope
and has access to enclosing local scopes, but it does not serve as an enclosing local scope
to further nested code. Because the search for names used in method functions skips
the enclosing class, class attributes must be fetched as object attributes using inheri-
tance.
For example, in the following nester function, all references to X are routed to the global
scope except the last, which picks up a local scope redefinition (the section’s code is in
file classscope.py, and the output of each example is described in its last two comments):
X = 1

def nester():
    print(X)                    # Global: 1
    class C:
        print(X)                # Global: 1
        def method1(self):
            print(X)            # Global: 1
        def method2(self):
            X = 3               # Hides global
            print(X)            # Local: 3
    I = C()
    I.method1()
    I.method2()

print(X)                        # Global: 1
nester()                        # Rest: 1, 1, 1, 3
print('-' * 40)
Watch what happens, though, when we reassign the same name in nested function
layers: the redefinitions of X create locals that hide those in enclosing scopes, just as for
simple nested functions; the enclosing class layer does not change this rule, and in fact
is irrelevant to it:
X = 1

def nester():
    X = 2                       # Hides global
    print(X)                    # Local: 2
    class C:
        print(X)                # In enclosing def (nester): 2
        def method1(self):
            print(X)            # In enclosing def (nester): 2
        def method2(self):
            X = 3               # Hides enclosing (nester)
            print(X)            # Local: 3
    I = C()
    I.method1()
    I.method2()

print(X)                        # Global: 1
nester()                        # Rest: 2, 2, 2, 3
print('-' * 40)
And here’s what happens when we reassign the same name at multiple stops along the
way: assignments in the local scopes of both functions and classes hide globals or en-
closing function locals of the same name, regardless of the nesting involved:
X = 1

def nester():
    X = 2                       # Hides global
    print(X)                    # Local: 2
    class C:
        X = 3                   # Class local hides nester's: C.X or I.X (not scoped)
        print(X)                # Local: 3
        def method1(self):
            print(X)            # In enclosing def (not 3 in class!): 2
            print(self.X)       # Inherited class local: 3
        def method2(self):
            X = 4               # Hides enclosing (nester, not class)
            print(X)            # Local: 4
            self.X = 5          # Hides class
            print(self.X)       # Located in instance: 5
    I = C()
    I.method1()
    I.method2()

print(X)                        # Global: 1
nester()                        # Rest: 2, 3, 2, 3, 4, 5
print('-' * 40)
Most importantly, the lookup rules for simple names like X never search enclosing
class statements—just defs, modules, and built-ins (it’s the LEGB rule, not CLEGB!).
In method1, for example, X is found in a def outside the enclosing class that has the same
name in its local scope. To get to names assigned in the class (e.g., methods), we must
fetch them as class or instance object attributes, via self.X in this case.
Believe it or not, we’ll see use cases for this nested classes coding pattern later in this
book, especially in some of Chapter 39’s decorators. In this role, the enclosing function
usually both serves as a class factory and provides retained state for later use in the
enclosed class or its methods.
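Here is a minimal sketch of that pattern, with names invented for illustration; the enclosing def builds and returns a new class whose method retains the enclosing scope's state per the usual closure rules:
>>> def factory(label):                 # Enclosing def retains state
        class Spam:
            def method(self):
                print(label)            # From the enclosing scope
        return Spam                     # New class made per call

>>> Maker = factory('hello')
>>> Maker().method()
hello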
Namespace Dictionaries: Review
In Chapter 23, we learned that module namespaces have a concrete implementation as
dictionaries, exposed with the built-in __dict__ attribute. In Chapter 27 and Chap-
ter 28, we learned that the same holds true for class and instance objects—attribute
qualification is mostly a dictionary indexing operation internally, and attribute inher-
itance is largely a matter of searching linked dictionaries. In fact, within Python, in-
stance and class objects are mostly just dictionaries with links between them. Python
exposes these dictionaries, as well as their links, for use in advanced roles (e.g., for
coding tools).
We put some of these tools to work in the prior chapter, but to summarize and help
you better understand how attributes work internally, let’s work through an interactive
session that traces the way namespace dictionaries grow when classes are involved.
Now that we know more about methods and superclasses, we can also embellish the
coverage here for a better look. First, let’s define a superclass and a subclass with meth-
ods that will store data in their instances:
>>> class Super:
        def hello(self):
            self.data1 = 'spam'

>>> class Sub(Super):
        def hola(self):
            self.data2 = 'eggs'
When we make an instance of the subclass, the instance starts out with an empty
namespace dictionary, but it has links back to the class for the inheritance search to
follow. In fact, the inheritance tree is explicitly available in special attributes, which
you can inspect. Instances have a __class__ attribute that links to their class, and classes
have a __bases__ attribute that is a tuple containing links to higher superclasses (I’m
running this on Python 3.3; your name formats, internal attributes, and key orders may
vary):
>>> X = Sub()
>>> X.__dict__ # Instance namespace dict
{}
>>> X.__class__ # Class of instance
<class '__main__.Sub'>
>>> Sub.__bases__ # Superclasses of class
(<class '__main__.Super'>,)
>>> Super.__bases__ # () empty tuple in Python 2.X
(<class 'object'>,)
As classes assign to self attributes, they populate the instance objects—that is, at-
tributes wind up in the instances’ attribute namespace dictionaries, not in the classes’.
An instance object’s namespace records data that can vary from instance to instance,
and self is a hook into that namespace:
>>> Y = Sub()
>>> X.hello()
>>> X.__dict__
{'data1': 'spam'}
>>> X.hola()
>>> X.__dict__
{'data2': 'eggs', 'data1': 'spam'}
>>> list(Sub.__dict__.keys())
['__qualname__', '__module__', '__doc__', 'hola']
>>> list(Super.__dict__.keys())
['__module__', 'hello', '__dict__', '__qualname__', '__doc__', '__weakref__']
>>> Y.__dict__
{}
Notice the extra underscore names in the class dictionaries; Python sets these auto-
matically, and we can filter them out with the generator expressions we saw in Chap-
ter 27 and Chapter 28 that we won’t repeat here. Most are not used in typical programs,
but there are tools that use some of them (e.g., __doc__ holds the docstrings discussed
in Chapter 15).
Also, observe that Y, a second instance made at the start of this series, still has an empty
namespace dictionary at the end, even though X’s dictionary has been populated by
assignments in methods. Again, each instance has an independent namespace dictio-
nary, which starts out empty and can record completely different attributes than those
recorded by the namespace dictionaries of other instances of the same class.
Because attributes are actually dictionary keys inside Python, there are really two ways
to fetch and assign their values—by qualification, or by key indexing:
>>> X.data1, X.__dict__['data1']
('spam', 'spam')
>>> X.data3 = 'toast'
>>> X.__dict__
{'data2': 'eggs', 'data3': 'toast', 'data1': 'spam'}
>>> X.__dict__['data3'] = 'ham'
>>> X.data3
'ham'
This equivalence applies only to attributes actually attached to the instance, though.
Because attribute fetch qualification also performs an inheritance search, it can access
inherited attributes that namespace dictionary indexing cannot. The inherited attribute
X.hello, for instance, cannot be accessed by X.__dict__['hello'].
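You can verify this in the same session; qualification runs the inheritance search, but dictionary indexing checks just the one namespace (output abbreviated here):
>>> X.hello                         # Qualification: climbs to Super
<bound method Super.hello of <__main__.Sub object at ...>>
>>> X.__dict__['hello']             # Indexing: instance dict only
KeyError: 'hello'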
Experiment with these special attributes on your own to get a better feel for how name-
spaces actually do their attribute business. Also try running these objects through the
dir function we met in the prior two chapters—dir(X) is similar to
X.__dict__.keys(), but dir sorts its list and includes some inherited and built-in
attributes. Even if you will never use these in the kinds of programs you write, seeing that
they are just normal dictionaries can help solidify namespaces in general.
In Chapter 32, we’ll learn also about slots, a somewhat advanced new-
style class feature that stores attributes in instances, but not in their
namespace dictionaries. It’s tempting to treat these as class attributes,
and indeed, they appear in class namespaces where they manage the
per-instance values. As we’ll see, though, slots may prevent a __dict__
from being created in the instance entirely—a potential that generic
tools must sometimes account for by using storage-neutral tools such
as dir and getattr.
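As a quick, minimal preview of the idea (covered in full in Chapter 32):
>>> class Limited:
        __slots__ = ['a', 'b']      # Declare attributes: no instance __dict__

>>> x = Limited()
>>> x.a = 1                         # Slot attributes work as usual
>>> x.c = 2                         # But undeclared names fail
AttributeError: 'Limited' object has no attribute 'c'
>>> x.__dict__                      # And no namespace dictionary exists
AttributeError: 'Limited' object has no attribute '__dict__'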
Namespace Links: A Tree Climber
The prior section demonstrated the special __class__ and __bases__ instance and class
attributes, without really explaining why you might care about them. In short, these
attributes allow you to inspect inheritance hierarchies within your own code. For ex-
ample, they can be used to display a class tree, as in the following Python 3.X and 2.X
example:
#!python
"""
classtree.py: Climb inheritance trees using namespace links,
displaying higher superclasses with indentation for height
"""

def classtree(cls, indent):
    print('.' * indent + cls.__name__)      # Print class name here
    for supercls in cls.__bases__:          # Recur to all superclasses
        classtree(supercls, indent + 3)     # May visit super > once

def instancetree(inst):
    print('Tree of %s' % inst)              # Show instance
    classtree(inst.__class__, 3)            # Climb to its class

def selftest():
    class A:       pass
    class B(A):    pass
    class C(A):    pass
    class D(B, C): pass
    class E:       pass
    class F(D, E): pass
    instancetree(B())
    instancetree(F())

if __name__ == '__main__': selftest()
The classtree function in this script is recursive—it prints a class’s name using
__name__, then climbs up to the superclasses by calling itself. This allows the function
to traverse arbitrarily shaped class trees; the recursion climbs to the top, and stops at
root superclasses that have empty __bases__ attributes. When using recursion, each
active level of a function gets its own copy of the local scope; here, this means that
cls and indent are different at each classtree level.
Most of this file is self-test code. When run standalone in Python 2.X, it builds an empty
class tree, makes two instances from it, and prints their class tree structures:
C:\code> c:\python27\python classtree.py
Tree of <__main__.B instance at 0x00000000022C3A88>
...B
......A
Tree of <__main__.F instance at 0x00000000022C3A88>
...F
......D
.........B
............A
.........C
............A
......E
When run by Python 3.X, the tree includes the implied object superclass that is auto-
matically added above standalone root (i.e., topmost) classes, because all classes are
“new style” in 3.X—more on this change in Chapter 32:
C:\code> c:\python33\python classtree.py
Tree of <__main__.selftest.<locals>.B object at 0x00000000029216A0>
...B
......A
.........object
Tree of <__main__.selftest.<locals>.F object at 0x00000000029216A0>
...F
......D
.........B
............A
...............object
.........C
............A
...............object
......E
.........object
Here, indentation marked by periods is used to denote class tree height. Of course, we
could improve on this output format, and perhaps even sketch it in a GUI display. Even
as is, though, we can import these functions anywhere we want a quick display of a
physical class tree:
C:\code> c:\python33\python
>>> class Emp: pass
>>> class Person(Emp): pass
>>> bob = Person()
>>> import classtree
>>> classtree.instancetree(bob)
Tree of <__main__.Person object at 0x000000000298B6D8>
...Person
......Emp
.........object
Regardless of whether you will ever code or use such tools, this example demonstrates
one of the many ways that you can make use of special attributes that expose interpreter
internals. You’ll see another when we code the lister.py general-purpose class display
tools in Chapter 31’s section “Multiple Inheritance: ‘Mix-in’ Classes” on page 956
—there, we will extend this technique to also display attributes in each object in a class
tree and function as a common superclass.
In the last part of this book, we’ll revisit such tools in the context of Python tool building
at large, to code tools that implement attribute privacy, argument validation, and more.
While not in every Python programmer’s job description, access to internals enables
powerful development tools.
Documentation Strings Revisited
The last section’s example includes a docstring for its module, but remember that doc-
strings can be used for class components as well. Docstrings, which we covered in detail
in Chapter 15, are string literals that show up at the top of various structures and are
automatically saved by Python in the corresponding objects’ __doc__ attributes. This
works for module files, function defs, and classes and methods.
Now that we know more about classes and methods, the following file, docstr.py, pro-
vides a quick but comprehensive example that summarizes the places where docstrings
can show up in your code. All of these can be triple-quoted blocks or simpler one-liner
literals like those here:
"I am: docstr.__doc__"
def func(args):
"I am: docstr.func.__doc__"
pass
class spam:
"I am: spam.__doc__ or docstr.spam.__doc__ or self.__doc__"
def method(self):
"I am: spam.method.__doc__ or self.method.__doc__"
print(self.__doc__)
print(self.method.__doc__)
The main advantage of documentation strings is that they stick around at runtime.
Thus, if it’s been coded as a docstring, you can qualify an object with its __doc__ at-
tribute to fetch its documentation (printing the result interprets line breaks if it’s a
multiline string):
>>> import docstr
>>> docstr.__doc__
'I am: docstr.__doc__'
>>> docstr.func.__doc__
'I am: docstr.func.__doc__'
>>> docstr.spam.__doc__
'I am: spam.__doc__ or docstr.spam.__doc__ or self.__doc__'
>>> docstr.spam.method.__doc__
'I am: spam.method.__doc__ or self.method.__doc__'
>>> x = docstr.spam()
>>> x.method()
I am: spam.__doc__ or docstr.spam.__doc__ or self.__doc__
I am: spam.method.__doc__ or self.method.__doc__
A discussion of the PyDoc tool, which knows how to format all these strings in reports
and web pages, appears in Chapter 15. Here it is running its help function on our code
under Python 2.X (Python 3.X shows additional attributes inherited from the implied
object superclass in the new-style class model—run this on your own to see the 3.X
extras, and watch for more about this difference in Chapter 32):
>>> help(docstr)
Help on module docstr:

NAME
    docstr - I am: docstr.__doc__

FILE
    c:\code\docstr.py

CLASSES
    spam

    class spam
     |  I am: spam.__doc__ or docstr.spam.__doc__ or self.__doc__
     |
     |  Methods defined here:
     |
     |  method(self)
     |      I am: spam.method.__doc__ or self.method.__doc__

FUNCTIONS
    func(args)
        I am: docstr.func.__doc__
Documentation strings are available at runtime, but they are less flexible syntactically
than # comments, which can appear anywhere in a program. Both forms are useful
tools, and any program documentation is good (as long as it’s accurate, of course!). As
stated before, the Python “best practice” rule of thumb is to use docstrings for func-
tional documentation (what your objects do) and hash-mark comments for more
micro-level documentation (how arcane bits of code work).
Classes Versus Modules
Finally, let’s wrap up this chapter by briefly comparing the topics of this book’s last
two parts: modules and classes. Because they’re both about namespaces, the distinction
can be confusing. In short:
• Modules
  - Implement data/logic packages
  - Are created with Python files or other-language extensions
  - Are used by being imported
  - Form the top level in Python program structure
• Classes
  - Implement new full-featured objects
  - Are created with class statements
  - Are used by being called
  - Always live within a module
Classes also support extra features that modules don’t, such as operator overloading,
multiple instance generation, and inheritance. Although both classes and modules are
namespaces, you should be able to tell by now that they are very different things. We
need to move ahead to see just how different classes can be.
Chapter Summary
This chapter took us on a second, more in-depth tour of the OOP mechanisms of the
Python language. We learned more about classes, methods, and inheritance, and we
wrapped up the namespaces and scopes story in Python by extending it to cover its
application to classes. Along the way, we looked at some more advanced concepts,
such as abstract superclasses, class data attributes, namespace dictionaries and links,
and manual calls to superclass methods and constructors.
Now that we’ve learned all about the mechanics of coding classes in Python, Chap-
ter 30 turns to a specific facet of those mechanics: operator overloading. After that we’ll
explore common design patterns, looking at some of the ways that classes are com-
monly used and combined to optimize code reuse. Before you read ahead, though, be
sure to work through the usual chapter quiz to review what we’ve covered here.
Test Your Knowledge: Quiz
1. What is an abstract superclass?
2. What happens when a simple assignment statement appears at the top level of a
class statement?
3. Why might a class need to manually call the __init__ method in a superclass?
4. How can you augment, instead of completely replacing, an inherited method?
5. How does a class’s local scope differ from that of a function?
6. What...was the capital of Assyria?
Test Your Knowledge: Answers
1. An abstract superclass is a class that calls a method, but does not inherit or define
it—it expects the method to be filled in by a subclass. This is often used as a way
to generalize classes when behavior cannot be predicted until a more specific sub-
class is coded. OOP frameworks also use this as a way to dispatch to client-defined,
customizable operations.
2. When a simple assignment statement (X = Y) appears at the top level of a class
statement, it attaches a data attribute to the class (Class.X). Like all class attributes,
this will be shared by all instances; data attributes are not callable method func-
tions, though.
3. A class must manually call the __init__ method in a superclass if it defines an
__init__ constructor of its own and still wants the superclass’s construction code
to run. Python itself automatically runs just one constructor—the lowest one in
the tree. Superclass constructors are usually called through the class name, passing
in the self instance manually: Superclass.__init__(self, ...).
4. To augment instead of completely replacing an inherited method, redefine it in a
subclass, but call back to the superclass’s version of the method manually from the
new version of the method in the subclass. That is, pass the self instance to the
superclass’s version of the method manually: Superclass.method(self, ...).
5. A class is a local scope and has access to enclosing local scopes, but it does not
serve as an enclosing local scope to further nested code. Like modules, the class
local scope morphs into an attribute namespace after the class statement is run.
6. Ashur (or Qalat Sherqat), Calah (or Nimrud), the short-lived Dur Sharrukin (or
Khorsabad), and finally Nineveh.
CHAPTER 30
Operator Overloading
This chapter continues our in-depth survey of class mechanics by focusing on operator
overloading. We looked briefly at operator overloading in prior chapters; here, we’ll
fill in more details and look at a handful of commonly used overloading methods.
Although we won’t demonstrate each of the many operator overloading methods avail-
able, those we will code here are a representative sample large enough to uncover the
possibilities of this Python class feature.
The Basics
Really “operator overloading” simply means intercepting built-in operations in a class’s
methods—Python automatically invokes your methods when instances of the class
appear in built-in operations, and your method’s return value becomes the result of the
corresponding operation. Here’s a review of the key ideas behind overloading:
• Operator overloading lets classes intercept normal Python operations.
• Classes can overload all Python expression operators.
• Classes can also overload built-in operations such as printing, function calls, attribute access, etc.
• Overloading makes class instances act more like built-in types.
• Overloading is implemented by providing specially named methods in a class.
In other words, when certain specially named methods are provided in a class, Python
automatically calls them when instances of the class appear in their associated expres-
sions. Your class provides the behavior of the corresponding operation for instance
objects created from it.
As we’ve learned, operator overloading methods are never required and generally don’t
have defaults (apart from a handful that some classes get from object); if you don’t
code or inherit one, it just means that your class does not support the corresponding
operation. When used, though, these methods allow classes to emulate the interfaces
of built-in objects, and so appear more consistent.
Constructors and Expressions: __init__ and __sub__
As a review, consider the following simple example: its Number class, coded in the file
number.py, provides a method to intercept instance construction (__init__), as well as
one for catching subtraction expressions (__sub__). Special methods such as these are
the hooks that let you tie into built-in operations:
# File number.py
class Number:
    def __init__(self, start):              # On Number(start)
        self.data = start
    def __sub__(self, other):               # On instance - other
        return Number(self.data - other)    # Result is a new instance

>>> from number import Number               # Fetch class from module
>>> X = Number(5)                           # Number.__init__(X, 5)
>>> Y = X - 2                               # Number.__sub__(X, 2)
>>> Y.data                                  # Y is new Number instance
3
As we’ve already learned, the __init__ constructor method seen in this code is the most
commonly used operator overloading method in Python; it’s present in most classes,
and used to initialize the newly created instance object using any arguments passed to
the class name. The __sub__ method plays the binary operator role that __add__ did in
Chapter 27’s introduction, intercepting subtraction expressions and returning a new
instance of the class as its result (and running __init__ along the way).
We’ve already studied __init__ and basic binary operators like __sub__ in some depth,
so we won’t rehash their usage further here. In this chapter, we will tour some of the
other tools available in this domain and look at example code that applies them in
common use cases.
Technically, instance creation first triggers the __new__ method, which
creates and returns the new instance object, which is then passed into
__init__ for initialization. Since __new__ has a built-in implementation
and is redefined in only very limited roles, though, nearly all Python
classes initialize by defining an __init__ method. We’ll see one use case
for __new__ when we study metaclasses in Chapter 40; though rare, it is
sometimes also used to customize creation of instances of mutable
types.
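To make the ordering concrete, here is a minimal sketch that simply traces both steps; the class name is invented for illustration:
>>> class Trace:
        def __new__(cls, *args):            # Step 1: create the instance
            print('in __new__', args)
            return object.__new__(cls)
        def __init__(self, value):          # Step 2: initialize it
            print('in __init__', value)
            self.value = value

>>> t = Trace(42)
in __new__ (42,)
in __init__ 42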
Common Operator Overloading Methods
Just about everything you can do to built-in objects such as integers and lists has a
corresponding specially named method for overloading in classes. Table 30-1 lists a
few of the most common; there are many more. In fact, many overloading methods
come in multiple versions (e.g., __add__, __radd__, and __iadd__ for addition), which
is one reason there are so many. See other Python books, or the Python language ref-
erence manual, for an exhaustive list of the special method names available.
Table 30-1. Common operator overloading methods

Method               Implements                     Called for
__init__             Constructor                    Object creation: X = Class(args)
__del__              Destructor                     Object reclamation of X
__add__              Operator +                     X + Y, X += Y if no __iadd__
__or__               Operator | (bitwise OR)        X | Y, X |= Y if no __ior__
__repr__, __str__    Printing, conversions          print(X), repr(X), str(X)
__call__             Function calls                 X(*args, **kargs)
__getattr__          Attribute fetch                X.undefined
__setattr__          Attribute assignment           X.any = value
__delattr__          Attribute deletion             del X.any
__getattribute__     Attribute fetch                X.any
__getitem__          Indexing, slicing, iteration   X[key], X[i:j], for loops and other
                                                    iterations if no __iter__
__setitem__          Index and slice assignment     X[key] = value, X[i:j] = iterable
__delitem__          Index and slice deletion       del X[key], del X[i:j]
__len__              Length                         len(X), truth tests if no __bool__
__bool__             Boolean tests                  bool(X), truth tests (named
                                                    __nonzero__ in 2.X)
__lt__, __gt__,      Comparisons                    X < Y, X > Y, X <= Y, X >= Y,
__le__, __ge__,                                     X == Y, X != Y
__eq__, __ne__                                      (or else __cmp__ in 2.X only)
__radd__             Right-side operators           Other + X
__iadd__             In-place augmented operators   X += Y (or else __add__)
__iter__, __next__   Iteration contexts             I = iter(X), next(I); for loops,
                                                    in if no __contains__, all
                                                    comprehensions, map(F, X), others
                                                    (__next__ is named next in 2.X)
__contains__         Membership test                item in X (any iterable)
__index__            Integer value                  hex(X), bin(X), oct(X), O[X], O[X:]
                                                    (replaces 2.X __oct__, __hex__)
__enter__, __exit__  Context manager (Chapter 34)   with obj as var:
__get__, __set__,    Descriptor attributes          X.attr, X.attr = value, del X.attr
__delete__           (Chapter 38)
__new__              Creation (Chapter 40)          Object creation, before __init__
All overloading methods have names that start and end with two underscores to keep
them distinct from other names you define in your classes. The mappings from special
method names to expressions or operations are predefined by the Python language,
and documented in full in the standard language manual and other reference resources.
For example, the name __add__ always maps to + expressions by Python language def-
inition, regardless of what an __add__ method’s code actually does.
Operator overloading methods may be inherited from superclasses if not defined, just
like any other methods. Operator overloading methods are also all optional—if you
don’t code or inherit one, that operation is simply unsupported by your class, and
attempting it will raise an exception. Some built-in operations, like printing, have de-
faults (inherited from the implied object class in Python 3.X), but most built-ins fail
for class instances if no corresponding operator overloading method is present.
Most overloading methods are used only in advanced programs that require objects to
behave like built-ins, though the __init__ constructor we’ve already met tends to ap-
pear in most classes. Let’s explore some of the additional methods in Table 30-1 by
example.
Although expressions trigger operator methods, be careful not to as-
sume that there is a speed advantage to cutting out the middleman and
calling the operator method directly. In fact, calling the operator method
directly might be twice as slow, presumably because of the overhead of
a function call, which Python avoids or optimizes in built-in cases.
Here’s the story for len and __len__ using Appendix B’s Windows
launcher and Chapter 21’s timing techniques on Python 3.3 and 2.7: in
both, calling __len__ directly takes twice as long:
c:\code> py -3 -m timeit -n 1000 -r 5 -s "L = list(range(100))" "x = L.__len__()"
1000 loops, best of 5: 0.134 usec per loop

c:\code> py -3 -m timeit -n 1000 -r 5 -s "L = list(range(100))" "x = len(L)"
1000 loops, best of 5: 0.063 usec per loop

c:\code> py -2 -m timeit -n 1000 -r 5 -s "L = list(range(100))" "x = L.__len__()"
1000 loops, best of 5: 0.117 usec per loop

c:\code> py -2 -m timeit -n 1000 -r 5 -s "L = list(range(100))" "x = len(L)"
1000 loops, best of 5: 0.0596 usec per loop
This is not as artificial as it may seem—I’ve actually come across rec-
ommendations for using the slower alternative in the name of speed at
a noted research institution!
Indexing and Slicing: __getitem__ and __setitem__
Our first method set allows your classes to mimic some of the behaviors of sequences
and mappings. If defined in a class (or inherited by it), the __getitem__ method is called
automatically for instance-indexing operations. When an instance X appears in an in-
dexing expression like X[i], Python calls the __getitem__ method inherited by the in-
stance, passing X to the first argument and the index in brackets to the second argument.
For example, the following class returns the square of an index value—atypical perhaps,
but illustrative of the mechanism in general:
>>> class Indexer:
        def __getitem__(self, index):
            return index ** 2

>>> X = Indexer()
>>> X[2]                            # X[i] calls X.__getitem__(i)
4

>>> for i in range(5):
        print(X[i], end=' ')        # Runs __getitem__(X, i) each time

0 1 4 9 16
Intercepting Slices
Interestingly, in addition to indexing, __getitem__ is also called for slice expressions
always in 3.X, and conditionally in 2.X if you don’t provide more specific slicing meth-
ods. Formally speaking, built-in types handle slicing the same way. Here, for example,
is slicing at work on a built-in list, using upper and lower bounds and a stride (see
Chapter 7 if you need a refresher on slicing):
>>> L = [5, 6, 7, 8, 9]
>>> L[2:4] # Slice with slice syntax: 2..(4-1)
[7, 8]
>>> L[1:]
[6, 7, 8, 9]
>>> L[:-1]
[5, 6, 7, 8]
>>> L[::2]
[5, 7, 9]
Really, though, slicing bounds are bundled up into a slice object and passed to the list’s
implementation of indexing. In fact, you can always pass a slice object manually—slice
syntax is mostly syntactic sugar for indexing with a slice object:
>>> L[slice(2, 4)] # Slice with slice objects
[7, 8]
>>> L[slice(1, None)]
[6, 7, 8, 9]
>>> L[slice(None, -1)]
[5, 6, 7, 8]
>>> L[slice(None, None, 2)]
[5, 7, 9]
This matters in classes with a __getitem__ method—in 3.X, the method will be called
both for basic indexing (with an index) and for slicing (with a slice object). Our previous
class won’t handle slicing because its math assumes integer indexes are passed, but the
following class will. When called for indexing, the argument is an integer as before:
>>> class Indexer:
        data = [5, 6, 7, 8, 9]
        def __getitem__(self, index):       # Called for index or slice
            print('getitem:', index)
            return self.data[index]         # Perform index or slice

>>> X = Indexer()
>>> X[0]                            # Indexing sends __getitem__ an integer
getitem: 0
5
>>> X[1]
getitem: 1
6
>>> X[-1]
getitem: -1
9
When called for slicing, though, the method receives a slice object, which is simply
passed along to the embedded list indexer in a new index expression:
>>> X[2:4] # Slicing sends __getitem__ a slice object
getitem: slice(2, 4, None)
[7, 8]
>>> X[1:]
getitem: slice(1, None, None)
[6, 7, 8, 9]
>>> X[:-1]
getitem: slice(None, -1, None)
[5, 6, 7, 8]
>>> X[::2]
getitem: slice(None, None, 2)
[5, 7, 9]
Where needed, __getitem__ can test the type of its argument, and extract slice object
bounds—slice objects have attributes start, stop, and step, any of which can be None
if omitted:
>>> class Indexer:
        def __getitem__(self, index):
            if isinstance(index, int):      # Test usage mode
                print('indexing', index)
            else:
                print('slicing', index.start, index.stop, index.step)

>>> X = Indexer()
>>> X[99]
indexing 99
>>> X[1:99:2]
slicing 1 99 2
>>> X[1:]
slicing 1 None None
If used, the __setitem__ index assignment method similarly intercepts both index and
slice assignments—in 3.X (and usually in 2.X) it receives a slice object for the latter,
which may be passed along in another index assignment or used directly in the same
way:
class IndexSetter:
    def __setitem__(self, index, value):    # Intercept index or slice assignment
        ...
        self.data[index] = value            # Assign index or slice
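Here is a more complete sketch that fills in the state elided above so you can watch both cases fire; the data list is invented for the demo:
>>> class IndexSetter:
        def __init__(self):
            self.data = [0, 0, 0, 0, 0]
        def __setitem__(self, index, value):    # Called for index or slice
            print('setitem:', index)
            self.data[index] = value            # Works for int and slice object

>>> X = IndexSetter()
>>> X[1] = 'spam'                       # Index assignment
setitem: 1
>>> X[2:4] = ['a', 'b']                 # Slice assignment: passes a slice object
setitem: slice(2, 4, None)
>>> X.data
[0, 'spam', 'a', 'b', 0]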
In fact, __getitem__ may be called automatically in even more contexts than indexing
and slicing—it’s also an iteration fallback option, as we’ll see in a moment. First,
though, let’s take a quick look at 2.X’s flavor of these operations for 2.X readers, and
clarify a potential point of confusion in this category.
Slicing and Indexing in Python 2.X
In Python 2.X only, classes can also define __getslice__ and __setslice__ methods to
intercept slice fetches and assignments specifically. If defined, these methods are passed
the bounds of the slice expression, and are preferred over __getitem__ and __seti
tem__ for two-limit slices. In all other cases, though, this context works the same as in
3.X; for example, a slice object is still created and passed to __getitem__ if no __get
slice__ is found or a three-limit extended slice form is used:
C:\code> c:\python27\python
>>> class Slicer:
        def __getitem__(self, index): print index
        def __getslice__(self, i, j): print i, j
        def __setslice__(self, i, j, seq): print i, j, seq
>>> Slicer()[1] # Runs __getitem__ with int, like 3.X
1
>>> Slicer()[1:9] # Runs __getslice__ if present, else __getitem__
1 9
>>> Slicer()[1:9:2] # Runs __getitem__ with slice(), like 3.X!
slice(1, 9, 2)
These slice-specific methods are removed in 3.X, so even in 2.X you should generally
use __getitem__ and __setitem__ instead and allow for both indexes and slice objects
as arguments—both for forward compatibility, and to avoid having to handle two- and
three-limit slices differently. In most classes, this works without any special code, be-
cause indexing methods can manually pass along the slice object in the square brackets
of another index expression, as in the prior section’s example. See the section “Mem-
bership: __contains__, __iter__, and __getitem__” on page 906 for another example
of slice interception at work.
But 3.X’s __index__ Is Not Indexing!
On a related note, don’t confuse the (perhaps unfortunately named) __index__ method
in Python 3.X for index interception—this method returns an integer value for an in-
stance when needed and is used by built-ins that convert to digit strings (and in retro-
spect, might have been better named __asindex__):
>>> class C:
        def __index__(self):
            return 255
>>> X = C()
>>> hex(X) # Integer value
'0xff'
>>> bin(X)
'0b11111111'
>>> oct(X)
'0o377'
Although this method does not intercept instance indexing like __getitem__, it is also
used in contexts that require an integer—including indexing:
>>> ('C' * 256)[255]
'C'
>>> ('C' * 256)[X] # As index (not X[i])
'C'
>>> ('C' * 256)[X:] # As index (not X[i:])
'C'
This method works the same way in Python 2.X, except that it is not called for the
hex and oct built-in functions; use __hex__ and __oct__ in 2.X (only) instead to intercept
these calls.
Index Iteration: __getitem__
Here’s a hook that isn’t always obvious to beginners, but turns out to be surprisingly
useful. In the absence of more-specific iteration methods we’ll get to in the next section,
the for statement works by repeatedly indexing a sequence from zero to higher indexes,
until an out-of-bounds IndexError exception is detected. Because of that, __geti
tem__ also turns out to be one way to overload iteration in Python—if this method is
defined, for loops call the class’s __getitem__ each time through, with successively
higher offsets.
It’s a case of “code one, get one free”—any built-in or user-defined object that responds
to indexing also responds to for loop iteration:
>>> class StepperIndex:
        def __getitem__(self, i):
            return self.data[i]
>>> X = StepperIndex() # X is a StepperIndex object
>>> X.data = "Spam"
>>>
>>> X[1] # Indexing calls __getitem__
'p'
>>> for item in X:                      # for loops call __getitem__
        print(item, end=' ')            # for indexes items 0..N

S p a m
In fact, it’s really a case of “code one, get a bunch free.” Any class that supports for
loops automatically supports all iteration contexts in Python, many of which we’ve seen
in earlier chapters (iteration contexts were presented in Chapter 14). For example, the
in membership test, list comprehensions, the map built-in, list and tuple assignments,
and type constructors will also call __getitem__ automatically, if it’s defined:
>>> 'p' in X # All call __getitem__ too
True
>>> [c for c in X] # List comprehension
['S', 'p', 'a', 'm']
>>> list(map(str.upper, X)) # map calls (use list() in 3.X)
['S', 'P', 'A', 'M']
>>> (a, b, c, d) = X # Sequence assignments
>>> a, c, d
('S', 'a', 'm')
>>> list(X), tuple(X), ''.join(X) # And so on...
(['S', 'p', 'a', 'm'], ('S', 'p', 'a', 'm'), 'Spam')
>>> X
<__main__.StepperIndex object at 0x000000000297B630>
In practice, this technique can be used to create objects that provide a sequence interface
and to add logic to built-in sequence type operations; we’ll revisit this idea when ex-
tending built-in types in Chapter 32.
Iterable Objects: __iter__ and __next__
Although the __getitem__ technique of the prior section works, it’s really just a fallback
for iteration. Today, all iteration contexts in Python will try the __iter__ method first,
before trying __getitem__. That is, they prefer the iteration protocol we learned about
in Chapter 14 to repeatedly indexing an object; only if the object does not support the
iteration protocol is indexing attempted instead. Generally speaking, you should prefer
__iter__ too—it supports general iteration contexts better than __getitem__ can.
Technically, iteration contexts work by passing an iterable object to the iter built-in
function to invoke an __iter__ method, which is expected to return an iterator object.
If it’s provided, Python then repeatedly calls this iterator object’s __next__ method to
produce items until a StopIteration exception is raised. A next built-in function is also
available as a convenience for manual iterations—next(I) is the same as
I.__next__(). For a review of this model’s essentials, see Figure 14-1 in Chapter 14.
This iterable object interface is given priority and attempted first. Only if no such
__iter__ method is found does Python fall back on the __getitem__ scheme, repeatedly
indexing by offsets as before until an IndexError exception is raised.
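In rough terms, iteration contexts behave as if they ran code like the following sketch. This is an approximation for illustration only, not Python's actual implementation, and the pseudo_iter name is made up here:

def pseudo_iter(obj):
    if hasattr(type(obj), '__iter__'):           # Prefer the iteration protocol
        return iter(obj)                         # __iter__ returns an iterator
    elif hasattr(type(obj), '__getitem__'):      # Else fall back on indexing
        def index_scan():
            offset = 0
            while True:
                try:
                    yield obj[offset]            # Index until IndexError
                except IndexError:
                    return                       # Ends this generator
                offset += 1
        return index_scan()
    else:
        raise TypeError('object is not iterable')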
Version skew note: As described in Chapter 14, if you are using Python
2.X, the I.__next__() iterator method just described is named
I.next() in your Python, and the next(I) built-in is present for porta-
bility—it calls I.next() in 2.X and I.__next__() in 3.X. Iteration works
the same in 2.X in all other respects.
User-Defined Iterables
In the __iter__ scheme, classes implement user-defined iterables by simply imple-
menting the iteration protocol introduced in Chapter 14 and elaborated in Chap-
ter 20. For example, the following file uses a class to define a user-defined iterable that
generates squares on demand, instead of all at once (per the preceding note, in Python
2.X define next instead of __next__, and print with a trailing comma as usual):
# File squares.py
class Squares:
    def __init__(self, start, stop):             # Save state when created
        self.value = start - 1
        self.stop = stop
    def __iter__(self):                          # Get iterator object on iter
        return self
    def __next__(self):                          # Return a square on each iteration
        if self.value == self.stop:              # Also called by next built-in
            raise StopIteration
        self.value += 1
        return self.value ** 2
When imported, its instances can appear in iteration contexts just like built-ins:
% python
>>> from squares import Squares
>>> for i in Squares(1, 5):               # for calls iter, which calls __iter__
    print(i, end=' ')                     # Each iteration calls __next__
1 4 9 16 25
Here, the iterator object returned by __iter__ is simply the instance self, because the
__next__ method is part of this class itself. In more complex scenarios, the iterator
object may be defined as a separate class and object with its own state information to
support multiple active iterations over the same data (we’ll see an example of this in a
moment). The end of the iteration is signaled with a Python raise statement—intro-
duced in Chapter 29 and covered in full in the next part of this book, but which simply
raises an exception as if Python itself had done so. Manual iterations work the same on
user-defined iterables as they do on built-in types as well:
>>> X = Squares(1, 5) # Iterate manually: what loops do
>>> I = iter(X) # iter calls __iter__
>>> next(I) # next calls __next__ (in 3.X)
1
>>> next(I)
4
...more omitted...
>>> next(I)
25
>>> next(I) # Can catch this in try statement
StopIteration
An equivalent coding of this iterable with __getitem__ might be less natural, because
the for would then iterate through all offsets zero and higher; the offsets passed in
would be only indirectly related to the range of values produced (0..N would need to
map to start..stop). Because __iter__ objects retain explicitly managed state between
next calls, they can be more general than __getitem__.
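To illustrate, here is roughly what such a __getitem__-based coding might look like (a hypothetical sketch, not one of the book's example files); notice how the method must map the 0..N offsets that for passes in onto the start..stop range:

class SquaresGetitem:
    def __init__(self, start, stop):
        self.start, self.stop = start, stop
    def __getitem__(self, i):                    # for passes in offsets 0..N
        value = self.start + i                   # Map offset to start..stop
        if value > self.stop:
            raise IndexError                     # IndexError ends the loop
        return value ** 2

list(SquaresGetitem(1, 5)) still yields [1, 4, 9, 16, 25], but expressions such as SquaresGetitem(1, 5)[-1] behave oddly here, a hint at why __iter__ is often the more natural tool.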
On the other hand, iterables based on __iter__ can sometimes be more complex and
less functional than those based on __getitem__. They are really designed for iteration,
not random indexing—in fact, they don’t overload the indexing expression at all,
though you can collect their items in a sequence such as a list to enable other operations:
>>> X = Squares(1, 5)
>>> X[1]
TypeError: 'Squares' object does not support indexing
>>> list(X)[1]
4
Single versus multiple scans
The __iter__ scheme is also the implementation for all the other iteration contexts we
saw in action for the __getitem__ method—membership tests, type constructors, se-
quence assignment, and so on. Unlike our prior __getitem__ example, though, we also
need to be aware that a class’s __iter__ may be designed for a single traversal only, not
many. Classes choose scan behavior explicitly in their code.
For example, because the current Squares class’s __iter__ always returns self with just
one copy of iteration state, it is a one-shot iteration; once you’ve iterated over an in-
stance of that class, it’s empty. Calling __iter__ again on the same instance returns
self again, in whatever state it may have been left. You generally need to make a new
iterable instance object for each new iteration:
>>> X = Squares(1, 5) # Make an iterable with state
>>> [n for n in X] # Exhausts items: __iter__ returns self
[1, 4, 9, 16, 25]
>>> [n for n in X] # Now it's empty: __iter__ returns same self
[]
>>> [n for n in Squares(1, 5)] # Make a new iterable object
[1, 4, 9, 16, 25]
>>> list(Squares(1, 3)) # A new object for each new __iter__ call
[1, 4, 9]
To support multiple iterations more directly, we could also recode this example with
an extra class or other technique, as we will in a moment. As is, though, by creating a
new instance for each iteration, you get a fresh copy of iteration state:
>>> 36 in Squares(1, 10) # Other iteration contexts
True
>>> a, b, c = Squares(1, 3) # Each calls __iter__ and then __next__
>>> a, b, c
(1, 4, 9)
>>> ':'.join(map(str, Squares(1, 5)))
'1:4:9:16:25'
Just like single-scan built-ins such as map, converting to a list supports multiple scans
as well, but adds time and space performance costs, which may or may not be significant
to a given program:
>>> X = Squares(1, 5)
>>> tuple(X), tuple(X) # Iterator exhausted in second tuple()
((1, 4, 9, 16, 25), ())
>>> X = list(Squares(1, 5))
>>> tuple(X), tuple(X)
((1, 4, 9, 16, 25), (1, 4, 9, 16, 25))
We’ll improve this to support multiple scans more directly ahead, after a bit of compare-
and-contrast.
Classes versus generators
Notice that the preceding example would probably be simpler if it were coded with
generator functions or expressions—tools introduced in Chapter 20 that automatically
produce iterable objects and retain local variable state between iterations:
>>> def gsquares(start, stop):
    for i in range(start, stop + 1):
        yield i ** 2

>>> for i in gsquares(1, 5):
    print(i, end=' ')
1 4 9 16 25
>>> for i in (x ** 2 for x in range(1, 6)):
    print(i, end=' ')
1 4 9 16 25
Unlike classes, generator functions and expressions implicitly save their state and create
the methods required to conform to the iteration protocol—with obvious advantages
in code conciseness for simpler examples like these. On the other hand, the class’s more
explicit attributes and methods, extra structure, inheritance hierarchies, and support
for multiple behaviors may be better suited for richer use cases.
Of course, for this artificial example, you could in fact skip both techniques and simply
use a for loop, map, or a list comprehension to build the list all at once. Barring perfor-
mance data to the contrary, the best and fastest way to accomplish a task in Python is
often also the simplest:
>>> [x ** 2 for x in range(1, 6)]
[1, 4, 9, 16, 25]
However, classes may be better at modeling more complex iterations, especially when
they can benefit from the assets of classes in general. An iterable that produces items
in a complex database or web service result, for example, might be able to take fuller
advantage of classes. The next section explores another use case for classes in user-
defined iterables.
Multiple Iterators on One Object
Earlier, I mentioned that the iterator object (with a __next__) produced by an iterable
may be defined as a separate class with its own state information to more directly
support multiple active iterations over the same data. Consider what happens when
we step across a built-in type like a string:
>>> S = 'ace'
>>> for x in S:
    for y in S:
        print(x + y, end=' ')
aa ac ae ca cc ce ea ec ee
Here, the outer loop grabs an iterator from the string by calling iter, and each nested
loop does the same to get an independent iterator. Because each active iterator has its
own state information, each loop can maintain its own position in the string, regardless
of any other active loops. Moreover, we’re not required to make a new string or convert
to a list each time; the single string object itself supports multiple scans.
We saw related examples earlier, in Chapter 14 and Chapter 20. For instance, generator
functions and expressions, as well as built-ins like map and zip, proved to be single-
iterator objects, thus supporting a single active scan. By contrast, the range built-in,
and other built-in types like lists, support multiple active iterators with independent
positions.
When we code user-defined iterables with classes, it’s up to us to decide whether we
will support a single active iteration or many. To achieve the multiple-iterator effect,
__iter__ simply needs to define a new stateful object for the iterator, instead of re-
turning self for each iterator request.
The following SkipObject class, for example, defines an iterable object that skips every
other item on iterations. Because its iterator object is created anew from a supplemental
class for each iteration, it supports multiple active loops directly (this is file
skipper.py in the book's examples):
#!python3
# File skipper.py
class SkipObject:
    def __init__(self, wrapped):                 # Save item to be used
        self.wrapped = wrapped
    def __iter__(self):
        return SkipIterator(self.wrapped)        # New iterator each time

class SkipIterator:
    def __init__(self, wrapped):
        self.wrapped = wrapped                   # Iterator state information
        self.offset = 0
    def __next__(self):
        if self.offset >= len(self.wrapped):     # Terminate iterations
            raise StopIteration
        else:
            item = self.wrapped[self.offset]     # else return and skip
            self.offset += 2
            return item

if __name__ == '__main__':
    alpha = 'abcdef'
    skipper = SkipObject(alpha)                  # Make container object
    I = iter(skipper)                            # Make an iterator on it
    print(next(I), next(I), next(I))             # Visit offsets 0, 2, 4
    for x in skipper:                            # for calls __iter__ automatically
        for y in skipper:                        # Nested fors call __iter__ again each time
            print(x + y, end=' ')                # Each iterator has its own state, offset
A quick portability note: as is, this is 3.X-only code. To make it 2.X compatible, import
the 3.X print function, and either use next instead of __next__ for 2.X-only use, or alias
the two names in the class’s scope for dual 2.X/3.X usage (file skipper_2x.py in the
book’s examples does):
#!python
from __future__ import print_function            # 2.X/3.X compatibility
...
class SkipIterator:
    ...
    def __next__(self):
        ...
    next = __next__                              # 2.X/3.X compatibility
When the appropriate version is run in either Python, this example works like the nested
loops with built-in strings. Each active loop has its own position in the string because
each obtains an independent iterator object that records its own state information:
% python skipper.py
a c e
aa ac ae ca cc ce ea ec ee
By contrast, our earlier Squares example supports just one active iteration, unless we
call Squares again in nested loops to obtain new objects. Here, there is just one
SkipObject iterable, with multiple iterator objects created from it.
Classes versus slices
As before, we could achieve similar results with built-in tools—for example, slicing
with a third bound to skip items:
>>> S = 'abcdef'
>>> for x in S[::2]:
    for y in S[::2]:                      # New objects on each iteration
        print(x + y, end=' ')
aa ac ae ca cc ce ea ec ee
This isn’t quite the same, though, for two reasons. First, each slice expression here will
physically store the result list all at once in memory; iterables, on the other hand, pro-
duce just one value at a time, which can save substantial space for large result lists.
Second, slices produce new objects, so we’re not really iterating over the same object in
multiple places here. To be closer to the class, we would need to make a single object
to step across by slicing ahead of time:
>>> S = 'abcdef'
>>> S = S[::2]
>>> S
'ace'
>>> for x in S:
    for y in S:                           # Same object, new iterators
        print(x + y, end=' ')
aa ac ae ca cc ce ea ec ee
This is more similar to our class-based solution, but it still stores the slice result in
memory all at once (there is no generator form of built-in slicing today), and it’s only
equivalent for this particular case of skipping every other item.
Because user-defined iterables coded with classes can do anything a class can do, they
are much more general than this example may imply. Though such generality is not
required in all applications, user-defined iterables are a powerful tool—they allow us
to make arbitrary objects look and feel like the other sequences and iterables we have
met in this book. We could use this technique with a database object, for example, to
support iterations over large database fetches, with multiple cursors into the same query
result.
Coding Alternative: __iter__ plus yield
And now, for something completely implicit—but potentially useful nonetheless. In
some applications, it’s possible to minimize coding requirements for user-defined itera-
bles by combining the __iter__ method we’re exploring here and the yield generator
function statement we studied in Chapter 20. Because generator functions automati-
cally save local variable state and create required iterator methods, they fit this role
well, and complement the state retention and other utility we get from classes.
As a review, recall that any function that contains a yield statement is turned into a
generator function. When called, it returns a new generator object with automatic re-
tention of local scope and code position, an automatically created __iter__ method
that simply returns itself, and an automatically created __next__ method (next in 2.X)
that starts the function or resumes it where it last left off:
>>> def gen(x):
    for i in range(x): yield i ** 2
>>> G = gen(5) # Create a generator with __iter__ and __next__
>>> G.__iter__() == G # Both methods exist on the same object
True
>>> I = iter(G) # Runs __iter__: generator returns itself
>>> next(I), next(I) # Runs __next__ (next in 2.X)
(0, 1)
>>> list(gen(5)) # Iteration contexts automatically run iter and next
[0, 1, 4, 9, 16]
This is still true even if the generator function with a yield happens to be a method
named __iter__: whenever invoked by an iteration context tool, such a method will
return a new generator object with the requisite __next__. As an added bonus, generator
functions coded as methods in classes have access to saved state in both instance at-
tributes and local scope variables.
For example, the following class is equivalent to the initial Squares user-defined iterable
we coded earlier in squares.py.
# File squares_yield.py
class Squares:                                   # __iter__ + yield generator
    def __init__(self, start, stop):             # __next__ is automatic/implied
        self.start = start
        self.stop = stop
    def __iter__(self):
        for value in range(self.start, self.stop + 1):
            yield value ** 2
There’s no need to alias next to __next__ for 2.X compatibility here, because this
method is now automated and implied by the use of yield. As before, for loops and
other iteration tools iterate through instances of this class automatically:
% python
>>> from squares_yield import Squares
>>> for i in Squares(1, 5): print(i, end=' ')
1 4 9 16 25
And as usual, we can look under the hood to see how this actually works in iteration
contexts. Running our class instance through iter obtains the result of calling
__iter__ as usual, but in this case the result is a generator object with an automatically
created __next__ of the same sort we always get when calling a generator function that
contains a yield. The only difference here is that the generator function is automatically
called on iter. Invoking the result object’s next interface produces results on demand:
>>> S = Squares(1, 5) # Runs __init__: class saves instance state
>>> S
<squares_yield.Squares object at 0x000000000294B630>
>>> I = iter(S) # Runs __iter__: returns a generator
>>> I
<generator object __iter__ at 0x00000000029A8CF0>
>>> next(I)
1
>>> next(I) # Runs generator's __next__
4
...etc...
>>> next(I) # Generator has both instance and local scope state
StopIteration
It may also help to notice that we could name the generator method something other
than __iter__ and call it manually to iterate—Squares(1, 5).gen(), for example. Using
the __iter__ name invoked automatically by iteration tools simply skips a manual at-
tribute fetch and call step:
class Squares:                                   # Non __iter__ equivalent (squares_manual.py)
    def __init__(...):
        ...
    def gen(self):
        for value in range(self.start, self.stop + 1):
            yield value ** 2
% python
>>> from squares_manual import Squares
>>> for i in Squares(1, 5).gen(): print(i, end=' ')
...same results...
>>> S = Squares(1, 5)
>>> I = iter(S.gen()) # Call generator manually for iterable/iterator
>>> next(I)
...same results...
Coding the generator as __iter__ instead cuts out the middleman in your code, though
both schemes ultimately wind up creating a new generator object for each iteration:
With __iter__, iteration triggers __iter__, which returns a new generator with
__next__.
Without __iter__, your code calls gen to make a generator, which returns itself for
__iter__.
See Chapter 20 for more on yield and generators if this is puzzling, and compare it
with the more explicit __next__ version in squares.py earlier. You’ll notice that this new
squares_yield.py version is 4 lines shorter (7 versus 11). In a sense, this scheme reduces
class coding requirements much like the closure functions of Chapter 17, but in this
case does so with a combination of functional and OOP techniques, instead of an al-
ternative to classes. For example, the generator method still leverages self attributes.
This may also very well seem like one too many levels of magic to some observers—it
relies on both the iteration protocol and the object creation of generators, both of which
are highly implicit (in contradiction of longstanding Python themes: see import this).
Opinions aside, it’s important to understand the non-yield flavor of class iterables too,
because it’s explicit, general, and sometimes broader in scope.
Still, the __iter__/yield technique may prove effective in cases where it applies. It also
comes with a substantial advantage—as the next section explains.
Multiple iterators with yield
Besides its code conciseness, the user-defined class iterable of the prior section based
upon the __iter__/yield combination has an important added bonus—it also supports
multiple active iterators automatically. This naturally follows from the fact that each
call to __iter__ is a call to a generator function, which returns a new generator with its
own copy of the local scope for state retention:
% python
>>> from squares_yield import Squares # Using the __iter__/yield Squares
>>> S = Squares(1, 5)
>>> I = iter(S)
>>> next(I); next(I)
1
4
>>> J = iter(S) # With yield, multiple iterators automatic
>>> next(J)
1
>>> next(I) # I is independent of J: own local state
9
Although generator functions are single-scan iterables, the implicit calls to __iter__ in
iteration contexts make new generators supporting new independent scans:
>>> S = Squares(1, 3)
>>> for i in S: # Each for calls __iter__
    for j in S:
        print('%s:%s' % (i, j), end=' ')
1:1 1:4 1:9 4:1 4:4 4:9 9:1 9:4 9:9
To do the same without yield requires a supplemental class that stores iterator state
explicitly and manually, using techniques of the preceding section (and grows to 15
lines: 8 more than with yield):
# File squares_nonyield.py
class Squares:
    def __init__(self, start, stop):             # Non-yield generator
        self.start = start                       # Multiscans: extra object
        self.stop = stop
    def __iter__(self):
        return SquaresIter(self.start, self.stop)

class SquaresIter:
    def __init__(self, start, stop):
        self.value = start - 1
        self.stop = stop
    def __next__(self):
        if self.value == self.stop:
            raise StopIteration
        self.value += 1
        return self.value ** 2
This works the same as the yield multiscan version, but with more, and more explicit,
code:
% python
>>> from squares_nonyield import Squares
>>> for i in Squares(1, 5): print(i, end=' ')
1 4 9 16 25
>>>
>>> S = Squares(1, 5)
>>> I = iter(S)
>>> next(I); next(I)
1
4
>>> J = iter(S) # Multiple iterators without yield
>>> next(J)
1
>>> next(I)
9
>>> S = Squares(1, 3)
>>> for i in S:                           # Each for calls __iter__
    for j in S:
        print('%s:%s' % (i, j), end=' ')
1:1 1:4 1:9 4:1 4:4 4:9 9:1 9:4 9:9
Finally, the generator-based approach could similarly remove the need for an extra
iterator class in the prior item-skipper example of file skipper.py, thanks to its automatic
methods and local variable state retention (and checks in at 9 lines versus the original’s
16):
# File skipper_yield.py
class SkipObject:                                # Another __iter__ + yield generator
    def __init__(self, wrapped):                 # Instance scope retained normally
        self.wrapped = wrapped                   # Local scope state saved auto
    def __iter__(self):
        offset = 0
        while offset < len(self.wrapped):
            item = self.wrapped[offset]
            offset += 2
            yield item
This works the same as the non-yield multiscan version, but with less, and less explicit,
code:
% python
>>> from skipper_yield import SkipObject
>>> skipper = SkipObject('abcdef')
>>> I = iter(skipper)
>>> next(I); next(I); next(I)
'a'
'c'
'e'
>>> for x in skipper: # Each for calls __iter__: new auto generator
    for y in skipper:
        print(x + y, end=' ')
aa ac ae ca cc ce ea ec ee
Of course, these are all artificial examples that could be replaced with simpler tools like
comprehensions, and their code may or may not scale up in kind to more realistic tasks.
Study these alternatives to see how they compare. As so often in programming, the best
tool for the job will likely be the best tool for your job!
Membership: __contains__, __iter__, and __getitem__
The iteration story is even richer than we’ve seen thus far. Operator overloading is often
layered: classes may provide specific methods, or more general alternatives used as
fallback options. For example:
Comparisons in Python 2.X use specific methods such as __lt__ for “less than” if
present, or else the general __cmp__. Python 3.X uses only specific methods, not
__cmp__, as discussed later in this chapter.
Boolean tests similarly try a specific __bool__ first (to give an explicit True/False
result), and if it’s absent fall back on the more general __len__ (a nonzero length
means True). As we’ll also see later in this chapter, Python 2.X works the same but
uses the name __nonzero__ instead of __bool__.
In the iterations domain, classes can implement the in membership operator as an
iteration, using either the __iter__ or __getitem__ methods. To support more specific
membership, though, classes may code a __contains__ method—when present, this
method is preferred over __iter__, which is preferred over __getitem__. The
__contains__ method should define membership as applying to keys for a mapping (and
can use quick lookups), and as a search for sequences.
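For instance, the following brief sketch (a hypothetical MapWrapper class, not from the book's examples) applies in to a mapping's keys rather than its values, using the dictionary's quick hashed lookup instead of a sequential scan:

class MapWrapper:
    def __init__(self, mapping):
        self.mapping = mapping
    def __contains__(self, key):                 # 'in' tests keys, not values
        return key in self.mapping               # Quick lookup: no scan needed

>>> D = MapWrapper(dict(a=1, b=2))
>>> 'a' in D, 1 in D
(True, False)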
Consider the following class, whose file has been instrumented for dual 2.X/3.X usage
using the techniques described earlier. It codes all three methods and tests membership
and various iteration contexts applied to an instance. Its methods print trace messages
when called:
# File contains.py
from __future__ import print_function            # 2.X/3.X compatibility

class Iters:
    def __init__(self, value):
        self.data = value
    def __getitem__(self, i):                    # Fallback for iteration
        print('get[%s]:' % i, end='')            # Also for index, slice
        return self.data[i]
    def __iter__(self):                          # Preferred for iteration
        print('iter=> ', end='')                 # Allows only one active iterator
        self.ix = 0
        return self
    def __next__(self):
        print('next:', end='')
        if self.ix == len(self.data): raise StopIteration
        item = self.data[self.ix]
        self.ix += 1
        return item
    def __contains__(self, x):                   # Preferred for 'in'
        print('contains: ', end='')
        return x in self.data
    next = __next__                              # 2.X/3.X compatibility

if __name__ == '__main__':
    X = Iters([1, 2, 3, 4, 5])                   # Make instance
    print(3 in X)                                # Membership
    for i in X:                                  # for loops
        print(i, end=' | ')
    print()
    print([i ** 2 for i in X])                   # Other iteration contexts
    print( list(map(bin, X)) )
    I = iter(X)                                  # Manual iteration (what other contexts do)
    while True:
        try:
            print(next(I), end=' @ ')
        except StopIteration:
            break
As is, the class in this file has an __iter__ that supports multiple scans, but only a single
scan can be active at any point in time (e.g., nested loops won’t work), because each
iteration attempt resets the scan cursor to the front. Now that you know about yield
in iteration methods, you should be able to tell that the following is equivalent but
allows multiple active scans—and judge for yourself whether its more implicit nature
is worth the nested-scan support and six lines shaved (this is in file contains_yield.py):
class Iters:
    def __init__(self, value):
        self.data = value
    def __getitem__(self, i):                    # Fallback for iteration
        print('get[%s]:' % i, end='')            # Also for index, slice
        return self.data[i]
    def __iter__(self):                          # Preferred for iteration
        print('iter=> next:', end='')            # Allows multiple active iterators
        for x in self.data:                      # no __next__ to alias to next
            yield x
            print('next:', end='')
    def __contains__(self, x):                   # Preferred for 'in'
        print('contains: ', end='')
        return x in self.data
On both Python 3.X and 2.X, when either version of this file runs its output is as follows
—the specific __contains__ intercepts membership, the general __iter__ catches other
iteration contexts such that __next__ (whether explicitly coded or implied by yield) is
called repeatedly, and __getitem__ is never called:
contains: True
iter=> next:1 | next:2 | next:3 | next:4 | next:5 | next:
iter=> next:next:next:next:next:next:[1, 4, 9, 16, 25]
iter=> next:next:next:next:next:next:['0b1', '0b10', '0b11', '0b100', '0b101']
iter=> next:1 @ next:2 @ next:3 @ next:4 @ next:5 @ next:
Watch what happens to this code’s output if we comment out its __contains__ method,
though—membership is now routed to the general __iter__ instead:
iter=> next:next:next:True
iter=> next:1 | next:2 | next:3 | next:4 | next:5 | next:
iter=> next:next:next:next:next:next:[1, 4, 9, 16, 25]
iter=> next:next:next:next:next:next:['0b1', '0b10', '0b11', '0b100', '0b101']
iter=> next:1 @ next:2 @ next:3 @ next:4 @ next:5 @ next:
And finally, here is the output if both __contains__ and __iter__ are commented out
—the indexing __getitem__ fallback is called with successively higher indexes until it
raises IndexError, for membership and other iteration contexts:
get[0]:get[1]:get[2]:True
get[0]:1 | get[1]:2 | get[2]:3 | get[3]:4 | get[4]:5 | get[5]:
get[0]:get[1]:get[2]:get[3]:get[4]:get[5]:[1, 4, 9, 16, 25]
get[0]:get[1]:get[2]:get[3]:get[4]:get[5]:['0b1', '0b10', '0b11', '0b100', '0b101']
get[0]:1 @ get[1]:2 @ get[2]:3 @ get[3]:4 @ get[4]:5 @ get[5]:
As we’ve seen, the __getitem__ method is even more general: besides iterations, it also
intercepts explicit indexing as well as slicing. Slice expressions trigger __getitem__ with
a slice object containing bounds, both for built-in types and user-defined classes, so
slicing is automatic in our class:
>>> from contains import Iters
>>> X = Iters('spam') # Indexing
>>> X[0] # __getitem__(0)
get[0]:'s'
>>> 'spam'[1:] # Slice syntax
'pam'
>>> 'spam'[slice(1, None)] # Slice object
'pam'
>>> X[1:] # __getitem__(slice(..))
get[slice(1, None, None)]:'pam'
>>> X[:-1]
get[slice(None, -1, None)]:'spa'
>>> list(X) # And iteration too!
iter=> next:next:next:next:next:['s', 'p', 'a', 'm']
In more realistic iteration use cases that are not sequence-oriented, though, the
__iter__ method may be easier to write since it need not manage an integer index, and
__contains__ allows for membership optimization as a special case.
Attribute Access: __getattr__ and __setattr__
In Python, classes can also intercept basic attribute access (a.k.a. qualification) when
needed or useful. Specifically, for an object created from a class, the dot operator ex-
pression object.attribute can be implemented by your code too, for reference, as-
signment, and deletion contexts. We saw a limited example in this category in Chap-
ter 28, but will review and expand on the topic here.
Attribute Reference
The __getattr__ method intercepts attribute references. It’s called with the attribute
name as a string whenever you try to qualify an instance with an undefined (nonexistent)
attribute name. It is not called if Python can find the attribute using its inheritance tree
search procedure.
Because of its behavior, __getattr__ is useful as a hook for responding to attribute
requests in a generic fashion. It’s commonly used to delegate calls to embedded (or
“wrapped”) objects from a proxy controller object—of the sort introduced in Chap-
ter 28’s introduction to delegation. This method can also be used to adapt classes to an
interface, or add accessors for data attributes after the fact—logic in a method that
validates or computes an attribute after it’s already being used with simple dot notation.
The basic mechanism underlying these goals is straightforward—the following class
catches attribute references, computing the value for one dynamically, and triggering
an error for others unsupported with the raise statement described earlier in this chap-
ter for iterators (and fully covered in Part VII):
>>> class Empty:
    def __getattr__(self, attrname):             # On self.undefined
        if attrname == 'age':
            return 40
        else:
            raise AttributeError(attrname)
>>> X = Empty()
>>> X.age
40
>>> X.name
...error text omitted...
AttributeError: name
Here, the Empty class and its instance X have no real attributes of their own, so the access
to X.age gets routed to the __getattr__ method; self is assigned the instance (X), and
attrname is assigned the undefined attribute name string ('age'). The class makes age
look like a real attribute by returning a real value as the result of the X.age qualification
expression (40). In effect, age becomes a dynamically computed attribute—its value is
formed by running code, not fetching an object.
For attributes that the class doesn’t know how to handle, __getattr__ raises the built-
in AttributeError exception to tell Python that these are bona fide undefined names;
asking for X.name triggers the error. You’ll see __getattr__ again when we see delegation
and properties at work in the next two chapters; let’s move on to related tools here.
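As a preview of the delegation role, the following sketch routes fetches of undefined names to an embedded object, adding a trace message along the way; this is a simplified variant of the wrapper idea revisited in the next chapter:

>>> class Wrapper:
    def __init__(self, wrapped):                 # Save embedded object
        self.wrapped = wrapped
    def __getattr__(self, attrname):             # Undefined fetches only
        print('Trace:', attrname)                # Extra logic goes here
        return getattr(self.wrapped, attrname)   # Route to wrapped object

>>> X = Wrapper([1, 2, 3])
>>> X.append(4)                                  # Runs __getattr__, then list's append
Trace: append
>>> X.wrapped
[1, 2, 3, 4]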
Attribute Assignment and Deletion
In the same department, the __setattr__ method intercepts all attribute assignments. If this
method is defined or inherited, self.attr = value becomes self.__setattr__('attr',
value). Like __getattr__, this allows your class to catch attribute changes, and validate
or transform as desired.
This method is a bit trickier to use, though, because assigning to any self attributes
within __setattr__ calls __setattr__ again, potentially causing an infinite recursion
loop (and a fairly quick stack overflow exception!). In fact, this applies to all self at-
tribute assignments anywhere in the class—all are routed to __setattr__, even those
in other methods, and those to names other than that which may have triggered
__setattr__ in the first place. Remember, this catches all attribute assignments.
If you wish to use this method, you can avoid loops by coding instance attribute as-
signments as assignments to attribute dictionary keys. That is, use
self.__dict__['name'] = x, not self.name = x; because you’re not assigning to
__dict__ itself, this avoids the loop:
>>> class Accesscontrol:
    def __setattr__(self, attr, value):
        if attr == 'age':
            self.__dict__[attr] = value + 10     # Not self.name=val or setattr
        else:
            raise AttributeError(attr + ' not allowed')
>>> X = Accesscontrol()
>>> X.age = 40 # Calls __setattr__
>>> X.age
50
>>> X.name = 'Bob'
...text omitted...
AttributeError: name not allowed
If you change the __dict__ assignment in this to either of the following, it triggers the
infinite recursion loop and exception—both dot notation and its setattr built-in func-
tion equivalent (the assignment analog of getattr) fail when age is assigned outside the
class:
self.age = value + 10 # Loops
setattr(self, attr, value + 10) # Loops (attr is 'age')
An assignment to another name within the class triggers a recursive __setattr__ call
too, though in this class ends less dramatically in the manual AttributeError exception:
self.other = 99 # Recurs but doesn't loop: fails
It’s also possible to avoid recursive loops in a class that uses __setattr__ by routing
any attribute assignments to a higher superclass with a call, instead of assigning keys
in __dict__:
self.__dict__[attr] = value + 10 # OK: doesn't loop
object.__setattr__(self, attr, value + 10) # OK: doesn't loop (new-style only)
Because the object form requires use of new-style classes in 2.X, though, we’ll postpone
details on this form until Chapter 38’s deeper look at attribute management at large.
A third attribute management method, __delattr__, is passed the attribute name string
and invoked on all attribute deletions (i.e., del object.attr). Like __setattr__, it must
avoid recursive loops, by routing attribute deletions through __dict__ keys or a
superclass call instead of plain self attribute deletions.
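A brief sketch of the idea, using a made-up class; deleting through a __dict__ key sidesteps another __delattr__ call the same way it does for __setattr__:

>>> class Deleter:
    def __delattr__(self, attrname):             # On del instance.attr
        print('Deleting:', attrname)
        del self.__dict__[attrname]              # Not del self.attr: would loop

>>> X = Deleter()
>>> X.data = 99                                  # Normal assignment: no __setattr__ here
>>> del X.data                                   # Runs __delattr__
Deleting: data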
As we’ll learn in Chapter 32, attributes implemented with new-style
class features such as slots and properties are not physically stored in the
instance’s __dict__ namespace dictionary (and slots may even preclude
its existence entirely!). Because of this, code that wishes to support such
attributes should code __setattr__ to assign with the
object.__setattr__ scheme shown here, not by self.__dict__ indexing
unless it’s known that subject classes store all their data in the instance
itself. In Chapter 38 we’ll also see that the new-style __getattribute__
has similar requirements. This change is mandated in Python 3.X, but
also applies to 2.X if new-style classes are used.
Other Attribute Management Tools
These three attribute-access overloading methods allow you to control or specialize
access to attributes in your objects. They tend to play highly specialized roles, some of
which we’ll explore later in this book. For another example of __getattr__ at work, see
Chapter 28’s person-composite.py. And for future reference, keep in mind that there are
other ways to manage attribute access in Python:
The __getattribute__ method intercepts all attribute fetches, not just those that
are undefined, but when using it you must be more cautious than with
__getattr__ to avoid loops.
The property built-in function allows us to associate methods with fetch and set
operations on a specific class attribute.
Descriptors provide a protocol for associating __get__ and __set__ methods of a
class with accesses to a specific class attribute.
Slots attributes are declared in classes but create implicit storage in each instance.
Because these are somewhat advanced tools not of interest to every Python program-
mer, we’ll defer a look at properties until Chapter 32 and detailed coverage of all the
attribute management techniques until Chapter 38.
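Still, to give a very small taste of the second of these, the following preview uses property to route fetches of one specific attribute through a method (an illustrative class only; Chapter 32 tells the full story, and the object superclass matters in 2.X only):

>>> class Person(object):
    def __init__(self, name):
        self._name = name
    def get_name(self):
        print('fetch...')
        return self._name
    name = property(get_name)                    # Fetching name runs get_name

>>> Person('Bob').name
fetch...
'Bob'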
Emulating Privacy for Instance Attributes: Part 1
As another use case for such tools, the following code—file private0.py—generalizes
the previous example, to allow each subclass to have its own list of private names that
cannot be assigned to its instances (and uses a user-defined exception class, which you’ll
have to take on faith until Part VII):
class PrivateExc(Exception): pass                # More on exceptions in Part VII

class Privacy:
    def __setattr__(self, attrname, value):      # On self.attrname = value
        if attrname in self.privates:
            raise PrivateExc(attrname, self)     # Make, raise user-defined exception
        else:
            self.__dict__[attrname] = value      # Avoid loops by using dict key

class Test1(Privacy):
    privates = ['age']

class Test2(Privacy):
    privates = ['name', 'pay']
    def __init__(self):
        self.__dict__['name'] = 'Tom'            # To do better, see Chapter 39!

if __name__ == '__main__':
    x = Test1()
    y = Test2()
    x.name = 'Bob'                               # Works
    #y.name = 'Sue'                              # Fails
    print(x.name)
    y.age = 30                                   # Works
    #x.age = 40                                  # Fails
    print(y.age)
In fact, this is a first-cut solution for an implementation of attribute privacy in Python
—disallowing changes to attribute names outside a class. Although Python doesn’t
support private declarations per se, techniques like this can emulate much of their
purpose.
This is a partial—and even clumsy—solution, though; to make it more effective, we
must augment it to allow classes to set their private attributes more naturally, without
having to go through __dict__ each time, as the constructor must do here to avoid
triggering __setattr__ and an exception. A better and more complete approach might
require a wrapper (“proxy”) class to check for private attribute accesses made outside
the class only, and a __getattr__ to validate attribute fetches too.
We’ll postpone a more complete solution to attribute privacy until Chapter 39, where
we’ll use class decorators to intercept and validate attributes more generally. Even
though privacy can be emulated this way, though, it almost never is in practice. Python
programmers are able to write large OOP frameworks and applications without private
declarations—an interesting finding about access controls in general that is beyond the
scope of our purposes here.
Still, catching attribute references and assignments is generally a useful technique; it
supports delegation, a design technique that allows controller objects to wrap up em-
bedded objects, add new behaviors, and route other operations back to the wrapped
objects. Because they involve design topics, we’ll revisit delegation and wrapper classes
in the next chapter.
String Representation: __repr__ and __str__
Our next methods deal with display formats—a topic we’ve already explored in prior
chapters, but will summarize and formalize here. As a review, the following code ex-
ercises the __init__ constructor and the __add__ overload method, both of which we’ve
already seen (+ is an in-place operation here, just to show that it can be; per Chap-
ter 27, a named method may be preferred). As we’ve learned, the default display of
instance objects for a class like this is neither generally useful nor aesthetically pretty:
>>> class adder:
    def __init__(self, value=0):
        self.data = value                        # Initialize data
    def __add__(self, other):
        self.data += other                       # Add other in place (bad form?)
>>> x = adder() # Default displays
>>> print(x)
<__main__.adder object at 0x00000000029736D8>
>>> x
<__main__.adder object at 0x00000000029736D8>
But coding or inheriting string representation methods allows us to customize the dis-
play—as in the following, which defines a __repr__ method in a subclass that returns
a string representation for its instances.
>>> class addrepr(adder):                        # Inherit __init__, __add__
    def __repr__(self):                          # Add string representation
        return 'addrepr(%s)' % self.data         # Convert to as-code string
>>> x = addrepr(2) # Runs __init__
>>> x + 1 # Runs __add__ (x.add() better?)
>>> x # Runs __repr__
addrepr(3)
>>> print(x) # Runs __repr__
addrepr(3)
>>> str(x), repr(x) # Runs __repr__ for both
('addrepr(3)', 'addrepr(3)')
If defined, __repr__ (or its close relative, __str__) is called automatically when class
instances are printed or converted to strings. These methods allow you to define a better
display format for your objects than the default instance display. Here, __repr__ uses
basic string formatting to convert the managed self.data object to a more human-
friendly string for display.
Why Two Display Methods?
So far, what we’ve seen is largely review. But while these methods are generally straight-
forward to use, their roles and behavior have some subtle implications both for design
and coding. In particular, Python provides two display methods to support alternative
displays for different audiences:
__str__ is tried first for the print operation and the str built-in function (the in-
ternal equivalent of which print runs). It generally should return a user-friendly
display.
__repr__ is used in all other contexts: for interactive echoes, the repr function, and
nested appearances, as well as by print and str if no __str__ is present. It should
generally return an as-code string that could be used to re-create the object, or a
detailed display for developers.
That is, __repr__ is used everywhere, except by print and str when a __str__ is defined.
This means you can code a __repr__ to define a single display format used everywhere,
and may code a __str__ to either support print and str exclusively, or to provide an
alternative display for them.
As noted in Chapter 28, general tools may also prefer __str__ to leave other classes the
option of adding an alternative __repr__ display for use in other contexts, as long as
print and str displays suffice for the tool. Conversely, a general tool that codes a
__repr__ still leaves clients the option of adding alternative displays with a __str__ for
print and str. In other words, if you code either, the other is available for an additional
display. In cases where the choice isn’t clear, __str__ is generally preferred for larger
user-friendly displays, and __repr__ for lower-level or as-code displays and all-inclusive
roles.
Let’s write some code to illustrate these two methods’ distinctions in more concrete
terms. The prior example in this section showed how __repr__ is used as the fallback
option in many contexts. However, while printing falls back on __repr__ if no
__str__ is defined, the inverse is not true—other contexts, such as interactive echoes,
use __repr__ only and don’t try __str__ at all:
>>> class addstr(adder):
    def __str__(self):                           # __str__ but no __repr__
        return '[Value: %s]' % self.data         # Convert to nice string
>>> x = addstr(3)
>>> x + 1
>>> x # Default __repr__
<__main__.addstr object at 0x00000000029738D0>
>>> print(x) # Runs __str__
[Value: 4]
>>> str(x), repr(x)
('[Value: 4]', '<__main__.addstr object at 0x00000000029738D0>')
Because of this, __repr__ may be best if you want a single display for all contexts. By
defining both methods, though, you can support different displays in different contexts
—for example, an end-user display with __str__, and a low-level display for program-
mers to use during development with __repr__. In effect, __str__ simply overrides
__repr__ for more user-friendly display contexts:
>>> class addboth(adder):
    def __str__(self):
        return '[Value: %s]' % self.data         # User-friendly string
    def __repr__(self):
        return 'addboth(%s)' % self.data         # As-code string
>>> x = addboth(4)
>>> x + 1
>>> x # Runs __repr__
addboth(5)
>>> print(x) # Runs __str__
[Value: 5]
>>> str(x), repr(x)
('[Value: 5]', 'addboth(5)')
Display Usage Notes
Though generally simple to use, I should mention three usage notes regarding these
methods here. First, keep in mind that __str__ and __repr__ must both return strings;
other result types are not converted and raise errors, so be sure to run them through a
to-string converter (e.g., str or %) if needed.
Second, depending on a container’s string-conversion logic, the user-friendly display
of __str__ might only apply when objects appear at the top level of a print operation;
objects nested in larger objects might still print with their __repr__ or its default. The
following illustrates both of these points:
>>> class Printer:
    def __init__(self, val):
        self.val = val
    def __str__(self):                           # Used for instance itself
        return str(self.val)                     # Convert to a string result
>>> objs = [Printer(2), Printer(3)]
>>> for x in objs: print(x)                      # __str__ run when instance printed,
                                                 # but not when instance is in a list!
2
3
>>> print(objs)
[<__main__.Printer object at 0x000000000297AB38>, <__main__.Printer obj...etc...>]
>>> objs
[<__main__.Printer object at 0x000000000297AB38>, <__main__.Printer obj...etc...>]
To ensure that a custom display is run in all contexts regardless of the container, code
__repr__, not __str__; the former is run in all cases if the latter doesn’t apply, including
nested appearances:
>>> class Printer:
    def __init__(self, val):
        self.val = val
    def __repr__(self):                          # __repr__ used by print if no __str__
        return str(self.val)                     # __repr__ used if echoed or nested
>>> objs = [Printer(2), Printer(3)]
>>> for x in objs: print(x) # No __str__: runs __repr__
2
3
>>> print(objs)                                  # Runs __repr__, not __str__
[2, 3]
>>> objs
[2, 3]
Third, and perhaps most subtle, the display methods also have the potential to trigger
infinite recursion loops in rare contexts—because some objects’ displays include dis-
plays of other objects, it’s not impossible that a display may trigger a display of an object
being displayed, and thus loop. This is rare and obscure enough to skip here, but watch
for an example of this looping potential to appear for these methods in a note near the
end of the next chapter in its listinherited.py example’s class, where __repr__ can loop.
In practice, __str__, and its more inclusive relative __repr__, seem to be the second
most commonly used operator overloading methods in Python scripts, behind
__init__. Anytime you can print an object and see a custom display, one of these two
tools is probably in use. For additional examples of these tools at work and the design
tradeoffs they imply, see Chapter 28’s case study and Chapter 31’s class lister mix-ins,
as well as their role in Chapter 35’s exception classes, where __str__ is required over
__repr__.
Right-Side and In-Place Uses: __radd__ and __iadd__
Our next group of overloading methods extends the functionality of binary operator
methods such as __add__ and __sub__ (called for + and -), which we’ve already seen.
As mentioned earlier, part of the reason there are so many operator overloading meth-
ods is because they come in multiple flavors—for every binary expression, we can im-
plement a left, right, and in-place variant. Though defaults are also applied if you don’t
code all three, your objects’ roles dictate how many variants you’ll need to code.
Right-Side Addition
For instance, the __add__ methods coded so far technically do not support the use of
instance objects on the right side of the + operator:
>>> class Adder:
    def __init__(self, value=0):
        self.data = value
    def __add__(self, other):
        return self.data + other
>>> x = Adder(5)
>>> x + 2
7
>>> 2 + x
TypeError: unsupported operand type(s) for +: 'int' and 'Adder'
To implement more general expressions, and hence support commutative-style opera-
tors, code the __radd__ method as well. Python calls __radd__ only when the object on
the right side of the + is your class instance, but the object on the left is not an instance
of your class. The __add__ method for the object on the left is called instead in all other
cases (all of this section’s five Commuter classes are coded in file commuter.py in the
book’s examples, along with a self-test):
class Commuter1:
    def __init__(self, val):
        self.val = val
    def __add__(self, other):
        print('add', self.val, other)
        return self.val + other
    def __radd__(self, other):
        print('radd', self.val, other)
        return other + self.val
>>> from commuter import Commuter1
>>> x = Commuter1(88)
>>> y = Commuter1(99)
>>> x + 1 # __add__: instance + noninstance
add 88 1
89
>>> 1 + y # __radd__: noninstance + instance
radd 99 1
100
>>> x + y # __add__: instance + instance, triggers __radd__
add 88 <commuter.Commuter1 object at 0x00000000029B39E8>
radd 99 88
187
Notice how the order is reversed in __radd__: self is really on the right of the +, and
other is on the left. Also note that x and y are instances of the same class here; when
instances of different classes appear mixed in an expression, Python prefers the class
of the one on the left. When we add the two instances together, Python runs __add__,
which in turn triggers __radd__ by simplifying the left operand.
Reusing __add__ in __radd__
For truly commutative operations that do not require special-casing by position, it is
also sometimes sufficient to reuse __add__ for __radd__: either by calling __add__ di-
rectly; by swapping order and re-adding to trigger __add__ indirectly; or by simply
assigning __radd__ to be an alias for __add__ at the top level of the class statement (i.e.,
in the class’s scope). The following alternatives implement all three of these schemes,
and return the same results as the original—though the last saves an extra call or dis-
patch and hence may be quicker (in all, __radd__ is run when self is on the right side
of a +):
class Commuter2:
    def __init__(self, val):
        self.val = val
    def __add__(self, other):
        print('add', self.val, other)
        return self.val + other
    def __radd__(self, other):
        return self.__add__(other)               # Call __add__ explicitly

class Commuter3:
    def __init__(self, val):
        self.val = val
    def __add__(self, other):
        print('add', self.val, other)
        return self.val + other
    def __radd__(self, other):
        return self + other                      # Swap order and re-add

class Commuter4:
    def __init__(self, val):
        self.val = val
    def __add__(self, other):
        print('add', self.val, other)
        return self.val + other
    __radd__ = __add__                           # Alias: cut out the middleman
In all these, right-side instance appearances trigger the single, shared __add__ method,
passing the right operand to self, to be treated the same as a left-side appearance. Run
these on your own for more insight; their returned values are the same as the original.
Propagating class type
In more realistic classes where the class type may need to be propagated in results,
things can become trickier: type testing may be required to tell whether it’s safe to
convert and thus avoid nesting. For instance, without the isinstance test in the fol-
lowing, we could wind up with a Commuter5 whose val is another Commuter5 when two
instances are added and __add__ triggers __radd__:
class Commuter5:                                 # Propagate class type in results
    def __init__(self, val):
        self.val = val
    def __add__(self, other):
        if isinstance(other, Commuter5):         # Type test to avoid object nesting
            other = other.val
        return Commuter5(self.val + other)       # Else + result is another Commuter
    def __radd__(self, other):
        return Commuter5(other + self.val)
    def __str__(self):
        return '<Commuter5: %s>' % self.val
>>> from commuter import Commuter5
>>> x = Commuter5(88)
>>> y = Commuter5(99)
>>> print(x + 10) # Result is another Commuter instance
<Commuter5: 98>
>>> print(10 + y)
<Commuter5: 109>
>>> z = x + y # Not nested: doesn't recur to __radd__
>>> print(z)
<Commuter5: 187>
>>> print(z + 10)
<Commuter5: 197>
>>> print(z + z)
<Commuter5: 374>
>>> print(z + z + 1)
<Commuter5: 375>
The need for the isinstance type test here is very subtle—uncomment, run, and trace
to see why it’s required. If you do, you’ll see that the last part of the preceding test
winds up differing and nesting objects—which still do the math correctly, but kick off
pointless recursive calls to simplify their values, and extra constructor calls build re-
sults:
>>> z = x + y # With isinstance test commented-out
>>> print(z)
<Commuter5: <Commuter5: 187>>
>>> print(z + 10)
<Commuter5: <Commuter5: 197>>
>>> print(z + z)
<Commuter5: <Commuter5: <Commuter5: <Commuter5: 374>>>>
>>> print(z + z + 1)
<Commuter5: <Commuter5: <Commuter5: <Commuter5: 375>>>>
To test, the rest of commuter.py looks and runs like this—classes can appear in tuples
naturally:
#!python
from __future__ import print_function            # 2.X/3.X compatibility
...classes defined here...

if __name__ == '__main__':
    for klass in (Commuter1, Commuter2, Commuter3, Commuter4, Commuter5):
        print('-' * 60)
        x = klass(88)
        y = klass(99)
        print(x + 1)
        print(1 + y)
        print(x + y)
c:\code> commuter.py
------------------------------------------------------------
add 88 1
89
radd 99 1
100
add 88 <__main__.Commuter1 object at 0x000000000297F2B0>
radd 99 88
187
------------------------------------------------------------
...etc...
There are too many coding variations to explore here, so experiment with these classes
on your own for more insight; aliasing __radd__ to __add__ in Commuter5, for example,
saves a line, but doesn’t prevent object nesting without isinstance. See also Python’s
manuals for a discussion of other options in this domain; for example, classes may also
return the special NotImplemented object for unsupported operands to influence method
selection (this is treated as though the method were not defined).
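A quick sketch of that last point, using a made-up Safe class; the NotImplemented result makes Python proceed as if __add__ were absent and try the other operand's method next:

>>> class Safe:
    def __init__(self, val):
        self.val = val
    def __add__(self, other):
        if not isinstance(other, (int, float)):
            return NotImplemented                # As if __add__ were undefined
        return Safe(self.val + other)

>>> (Safe(1) + 2).val                            # Works: makes a new Safe
3
>>> Safe(1) + 'spam'                             # str can't handle it either
TypeError: unsupported operand type(s) for +: 'Safe' and 'str'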
In-Place Addition
To also implement += in-place augmented addition, code either an __iadd__ or an
__add__. The latter is used if the former is absent. In fact, the prior section’s Commuter
classes already support += for this reason—Python runs __add__ and assigns the result
manually. The __iadd__ method, though, allows for more efficient in-place changes to
be coded where applicable:
>>> class Number:
    def __init__(self, val):
        self.val = val
    def __iadd__(self, other):                   # __iadd__ explicit: x += y
        self.val += other                        # Usually returns self
        return self
>>> x = Number(5)
>>> x += 1
>>> x += 1
>>> x.val
7
For mutable objects, this method can often specialize for quicker in-place changes:
>>> y = Number([1]) # In-place change faster than +
>>> y += [2]
>>> y += [3]
>>> y.val
[1, 2, 3]
The normal __add__ method is run as a fallback, but may not be able to optimize in-place
cases:
>>> class Number:
    def __init__(self, val):
        self.val = val
    def __add__(self, other):                    # __add__ fallback: x = (x + y)
        return Number(self.val + other)          # Propagates class type
>>> x = Number(5)
>>> x += 1
>>> x += 1                                       # += falls back on __add__: not in place
>>> x.val
7
Though we’ve focused on + here, keep in mind that every binary operator has similar
right-side and in-place overloading methods that work the same (e.g., __mul__,
__rmul__, and __imul__). Still, right-side methods are an advanced topic and tend to be
fairly uncommon in practice; you only code them when you need operators to be com-
mutative, and then only if you need to support such operators at all. For instance, a
Vector class may use these tools, but an Employee or Button class probably would not.
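For instance, a minimal sketch of the sort of class that might code all three variants (a hypothetical one-dimensional Vector, for illustration only):

class Vector:
    def __init__(self, items):
        self.items = list(items)
    def __mul__(self, scalar):                   # vector * scalar
        return Vector(x * scalar for x in self.items)
    __rmul__ = __mul__                           # scalar * vector: commutative
    def __imul__(self, scalar):                  # vector *= scalar: in place
        self.items = [x * scalar for x in self.items]
        return self
    def __repr__(self):
        return 'Vector(%s)' % self.items

Here, Vector([1, 2]) * 2 and 2 * Vector([1, 2]) both return Vector([2, 4]), and *= changes the object in place instead of making a new one.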
Call Expressions: __call__
On to our next overloading method: the __call__ method is called when your instance
is called. No, this isn’t a circular definition—if defined, Python runs a __call__ method
for function call expressions applied to your instances, passing along whatever posi-
tional or keyword arguments were sent. This allows instances to conform to a function-
based API:
>>> class Callee:
        def __call__(self, *pargs, **kargs):    # Intercept instance calls
            print('Called:', pargs, kargs)      # Accept arbitrary arguments
>>> C = Callee()
>>> C(1, 2, 3) # C is a callable object
Called: (1, 2, 3) {}
>>> C(1, 2, 3, x=4, y=5)
Called: (1, 2, 3) {'y': 5, 'x': 4}
More formally, all the argument-passing modes we explored in Chapter 18 are sup-
ported by the __call__ method—whatever is passed to the instance is passed to this
method, along with the usual implied instance argument. For example, the method
definitions:
class C:
    def __call__(self, a, b, c=5, d=6): ...          # Normals and defaults

class C:
    def __call__(self, *pargs, **kargs): ...         # Collect arbitrary arguments

class C:
    def __call__(self, *pargs, d=6, **kargs): ...    # 3.X keyword-only argument
all match all the following instance calls:
X = C()
X(1, 2) # Omit defaults
X(1, 2, 3, 4) # Positionals
X(a=1, b=2, d=4) # Keywords
X(*[1, 2], **dict(c=3, d=4)) # Unpack arbitrary arguments
X(1, *(2,), c=3, **dict(d=4)) # Mixed modes
See Chapter 18 for a refresher on function arguments. The net effect is that classes and
instances with a __call__ support the exact same argument syntax and semantics as
normal functions and methods.
Intercepting call expressions like this allows class instances to emulate the look and feel
of things like functions, but also retain state information for use during calls. We saw
an example similar to the following while exploring scopes in Chapter 17, but you
should now be familiar enough with operator overloading to understand this pattern
better:
>>> class Prod:
        def __init__(self, value):              # Accept just one argument
            self.value = value
        def __call__(self, other):
            return self.value * other
>>> x = Prod(2) # "Remembers" 2 in state
>>> x(3) # 3 (passed) * 2 (state)
6
>>> x(4)
8
In this example, the __call__ may seem a bit gratuitous at first glance. A simple method
can provide similar utility:
>>> class Prod:
        def __init__(self, value):
            self.value = value
        def comp(self, other):
            return self.value * other
>>> x = Prod(3)
>>> x.comp(3)
9
>>> x.comp(4)
12
However, __call__ can become more useful when interfacing with APIs (i.e., libraries)
that expect functions—it allows us to code objects that conform to an expected func-
tion call interface, but also retain state information, and other class assets such as in-
heritance. In fact, it may be the third most commonly used operator overloading
method, behind the __init__ constructor and the __str__ and __repr__ display-format
alternatives.
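Because such instances are interchangeable with real functions, they can also be passed directly to function-based tools; as a quick sketch, reusing the first Prod class above with the map built-in:
>>> x = Prod(2)                                 # State: remembers 2
>>> list(map(x, [1, 2, 3]))                     # Passed where a function is expected
[2, 4, 6]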
Function Interfaces and Callback-Based Code
As an example, the tkinter GUI toolkit (named Tkinter in Python 2.X) allows you to
register functions as event handlers (a.k.a. callbacks)—when events occur, tkinter calls
the registered objects. If you want an event handler to retain state between events, you
can register either a class’s bound method, or an instance that conforms to the expected
interface with __call__.
In the prior section’s code, for example, both x.comp from the second example and x
from the first can pass as function-like objects this way. Chapter 17’s closure func-
tions with state in enclosing scopes can achieve similar effects, but don’t provide as
much support for multiple operations or customization.
I’ll have more to say about bound methods in the next chapter, but for now, here’s a
hypothetical example of __call__ applied to the GUI domain. The following class de-
fines an object that supports a function-call interface, but also has state information
that remembers the color a button should change to when it is later pressed:
class Callback:
    def __init__(self, color):                  # Function + state information
        self.color = color
    def __call__(self):                         # Support calls with no arguments
        print('turn', self.color)
Now, in the context of a GUI, we can register instances of this class as event handlers
for buttons, even though the GUI expects to be able to invoke event handlers as simple
functions with no arguments:
# Handlers
cb1 = Callback('blue') # Remember blue
cb2 = Callback('green') # Remember green
B1 = Button(command=cb1) # Register handlers
B2 = Button(command=cb2)
When the button is later pressed, the instance object is called as a simple function with
no arguments, exactly like in the following calls. Because it retains state as instance
attributes, though, it remembers what to do—it becomes a stateful function object:
# Events
cb1() # Prints 'turn blue'
cb2() # Prints 'turn green'
In fact, many consider such classes to be the best way to retain state information in the
Python language (per generally accepted Pythonic principles, at least). With OOP, the
state remembered is made explicit with attribute assignments. This is different than
other state retention techniques (e.g., global variables, enclosing function scope refer-
ences, and default mutable arguments), which rely on more limited or implicit behavior.
Moreover, the added structure and customization in classes goes beyond state reten-
tion.
On the other hand, tools such as closure functions are useful in basic state retention
roles too, and 3.X’s nonlocal statement makes enclosing scopes a viable alternative in
more programs. We’ll revisit such tradeoffs when we start coding substantial decorators
in Chapter 39, but here’s a quick closure equivalent:
def callback(color):                            # Enclosing scope versus attrs
    def oncall():
        print('turn', color)
    return oncall
cb3 = callback('yellow') # Handler to be registered
cb3() # On event: prints 'turn yellow'
Before we move on, there are two other ways that Python programmers sometimes tie
information to a callback function like this. One option is to use default arguments in
lambda functions:
cb4 = (lambda color='red': 'turn ' + color) # Defaults retain state too
print(cb4())
The other is to use bound methods of a class—a bit of a preview, but simple enough to
introduce here. A bound method object is a kind of object that remembers both the
self instance and the referenced function. This object may therefore be called later as
a simple function without an instance:
class Callback:
    def __init__(self, color):                  # Class with state information
        self.color = color
    def changeColor(self):                      # A normal named method
        print('turn', self.color)
cb1 = Callback('blue')
cb2 = Callback('yellow')
B1 = Button(command=cb1.changeColor) # Bound method: reference, don't call
B2 = Button(command=cb2.changeColor) # Remembers function + self pair
In this case, when this button is later pressed it’s as if the GUI does this, which invokes
the instance’s changeColor method to process the object’s state information, instead of
the instance itself:
cb1 = Callback('blue')
obj = cb1.changeColor # Registered event handler
obj() # On event prints 'turn blue'
Note that a lambda is not required here, because a bound method reference by itself
already defers a call until later. This technique is simpler, but perhaps less general than
overloading calls with __call__. Again, watch for more about bound methods in the
next chapter.
You’ll also see another __call__ example in Chapter 32, where we will use it to imple-
ment something known as a function decorator—a callable object often used to add a
layer of logic on top of an embedded function. Because __call__ allows us to attach
state information to a callable object, it’s a natural implementation technique for a
function that must remember to call another function when called itself. For more
__call__ examples, see the state retention preview examples in Chapter 17, and the
more advanced decorators and metaclasses of Chapter 39 and Chapter 40.
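As a minimal preview of that technique (a simplified sketch only, not the fuller decorator coding to come), a __call__ proxy can both remember and invoke another function, adding extra logic around each call:
>>> class Tracer:
        def __init__(self, func):               # Remember the wrapped function
            self.func = func
            self.calls = 0
        def __call__(self, *args):              # Add logic, then run it
            self.calls += 1
            print('call %s to %s' % (self.calls, self.func.__name__))
            return self.func(*args)

>>> spam = Tracer(lambda x, y: x + y)           # Wrap any callable
>>> spam(1, 2)                                  # Runs __call__
call 1 to <lambda>
3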
Comparisons: __lt__, __gt__, and Others
Our next batch of overloading methods supports comparisons. As suggested in Ta-
ble 30-1, classes can define methods to catch all six comparison operators: <, >, <=, >=,
==, and !=. These methods are generally straightforward to use, but keep the following
qualifications in mind:
Unlike the __add__/__radd__ pairings discussed earlier, there are no right-side var-
iants of comparison methods. Instead, reflective methods are used when only one
operand supports comparison (e.g., __lt__ and __gt__ are each other’s reflection).
There are no implicit relationships among the comparison operators. The truth of
== does not imply that != is false, for example, so both __eq__ and __ne__ should
be defined to ensure that both operators behave correctly.
In Python 2.X, a __cmp__ method is used by all comparisons if no more specific
comparison methods are defined; it returns a number that is less than, equal to, or
greater than zero, to signal less than, equal, and greater than results for the com-
parison of its two arguments (self and another operand). This method often uses
the cmp(x, y) built-in to compute its result. Both the __cmp__ method and the
cmp built-in function are removed in Python 3.X: use the more specific methods
instead.
We don’t have space for an in-depth exploration of comparison methods, but as a quick
introduction, consider the following class and test code:
class C:
    data = 'spam'
    def __gt__(self, other):                    # 3.X and 2.X version
        return self.data > other
    def __lt__(self, other):
        return self.data < other
X = C()
print(X > 'ham') # True (runs __gt__)
print(X < 'ham') # False (runs __lt__)
When run under Python 3.X or 2.X, the prints at the end display the expected results
noted in their comments, because the class’s methods intercept and implement com-
parison expressions. Consult Python’s manuals and other reference resources for more
details in this category; for example, __lt__ is used for sorts in Python 3.X, and as for
binary expression operators, these methods can also return NotImplemented for unsup-
ported arguments.
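To illustrate the second qualification above (that == does not imply !=), here is a minimal sketch that codes both methods explicitly, using an illustration class only:
>>> class Equal:
        def __init__(self, data):
            self.data = data
        def __eq__(self, other):                # == runs __eq__
            return self.data == other
        def __ne__(self, other):                # Define both: != isn't implied in 2.X
            return self.data != other

>>> x = Equal('spam')
>>> x == 'spam', x != 'spam'
(True, False)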
The __cmp__ Method in Python 2.X
In Python 2.X only, the __cmp__ method is used as a fallback if more specific methods
are not defined: its integer result is used to evaluate the operator being run. The fol-
lowing produces the same result as the prior section’s code under 2.X, for example, but
fails in 3.X because __cmp__ is no longer used:
class C:
    data = 'spam'                               # 2.X only
    def __cmp__(self, other):                   # __cmp__ not used in 3.X
        return cmp(self.data, other)            # cmp not defined in 3.X
X = C()
print(X > 'ham') # True (runs __cmp__)
print(X < 'ham') # False (runs __cmp__)
Notice that this fails in 3.X because __cmp__ is no longer special, not because the cmp
built-in function is no longer present. If we change the prior class to the following to
try to simulate the cmp call, the code still works in 2.X but fails in 3.X:
class C:
    data = 'spam'
    def __cmp__(self, other):
        return (self.data > other) - (self.data < other)
So why, you might be asking, did I just show you a comparison method that is no longer
supported in 3.X? While it would be easier to erase history entirely, this book is designed
to support both 2.X and 3.X readers. Because __cmp__ may appear in code 2.X readers
must reuse or maintain, it’s fair game in this book. Moreover, __cmp__ was removed
more abruptly than the __getslice__ method described earlier, and so may endure
longer. If you use 3.X, though, or care about running your code under 3.X in the future,
don’t use __cmp__ anymore: use the more specific comparison methods instead.
Boolean Tests: __bool__ and __len__
The next set of methods is truly useful (yes, pun intended!). As we’ve learned, every
object is inherently true or false in Python. When you code classes, you can define what
this means for your objects by coding methods that give the True or False values of
instances on request. The names of these methods differ per Python line; this section
starts with the 3.X story, then shows 2.X’s equivalent.
As mentioned briefly earlier, in Boolean contexts, Python first tries __bool__ to obtain
a direct Boolean value; if that method is missing, Python tries __len__ to infer a truth
value from the object’s length. The first of these generally uses object state or other
information to produce a Boolean result. In 3.X:
>>> class Truth:
        def __bool__(self): return True
>>> X = Truth()
>>> if X: print('yes!')
yes!
>>> class Truth:
        def __bool__(self): return False
>>> X = Truth()
>>> bool(X)
False
If this method is missing, Python falls back on length because a nonempty object is
considered true (i.e., a nonzero length is taken to mean the object is true, and a zero
length means it is false):
>>> class Truth:
        def __len__(self): return 0
>>> X = Truth()
>>> if not X: print('no!')
no!
If both methods are present Python prefers __bool__ over __len__, because it is more
specific:
>>> class Truth:
        def __bool__(self): return True         # 3.X tries __bool__ first
        def __len__(self): return 0             # 2.X tries __len__ first
>>> X = Truth()
>>> if X: print('yes!')
yes!
If neither truth method is defined, the object is vacuously considered true (though any
potential implications for more metaphysically inclined readers are strictly coinciden-
tal):
>>> class Truth:
        pass
>>> X = Truth()
>>> bool(X)
True
At least that’s the Truth in 3.X. These examples won’t generate exceptions in 2.X, but
some of their results there may look a bit odd (and trigger an existential crisis or two)
unless you read the next section.
Boolean Methods in Python 2.X
Alas, it’s not nearly as dramatic as billed—Python 2.X users simply use __nonzero__
instead of __bool__ in all of the preceding section’s code. Python 3.X renamed the 2.X
__nonzero__ method to __bool__, but Boolean tests work the same otherwise; both 3.X
and 2.X use __len__ as a fallback.
Subtly, if you don’t use the 2.X name, the first test in the prior section will work the
same for you anyhow, but only because __bool__ is not recognized as a special method
name in 2.X, and objects are considered true by default! To witness this version dif-
ference live, you need to return False:
C:\code> c:\python33\python
>>> class C:
        def __bool__(self):
            print('in bool')
            return False
>>> X = C()
>>> bool(X)
in bool
False
>>> if X: print(99)
in bool
This works as advertised in 3.X. In 2.X, though, __bool__ is ignored and the object is
always considered true by default:
C:\code> c:\python27\python
>>> class C:
        def __bool__(self):
            print('in bool')
            return False
>>> X = C()
>>> bool(X)
True
>>> if X: print(99)
99
The short story here: in 2.X, use __nonzero__ for Boolean values, or return 0 from the
__len__ fallback method to designate false:
C:\code> c:\python27\python
>>> class C:
        def __nonzero__(self):
            print('in nonzero')
            return False                        # Returns int (or True/False, same as 1/0)
>>> X = C()
>>> bool(X)
in nonzero
False
>>> if X: print(99)
in nonzero
But keep in mind that __nonzero__ works in 2.X only; if used in 3.X it will be silently
ignored and the object will be classified as true by default—just like using 3.X’s
__bool__ in 2.X!
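If a class must run its Boolean test correctly on both Python lines, one informal workaround (a sketch of just one option) is to code the 3.X name and alias the 2.X name to the same function object in the class:
>>> class C:
        def __bool__(self):                     # 3.X runs __bool__
            return False
        __nonzero__ = __bool__                  # 2.X alias: same function

>>> bool(C())                                   # Same result on 3.X and 2.X
False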
And now that we’ve managed to cross over into the realm of philosophy, let’s move on
to look at one last overloading context: object demise.
Object Destruction: __del__
It’s time to close out this chapter—and learn how to do the same for our class objects.
We’ve seen how the __init__ constructor is called whenever an instance is generated
(and noted how __new__ is run first to make the object). Its counterpart, the destruc-
tor method __del__, is run automatically when an instance’s space is being reclaimed
(i.e., at “garbage collection” time):
>>> class Life:
        def __init__(self, name='unknown'):
            print('Hello ' + name)
            self.name = name
        def live(self):
            print(self.name)
        def __del__(self):
            print('Goodbye ' + self.name)
>>> brian = Life('Brian')
Hello Brian
>>> brian.live()
Brian
>>> brian = 'loretta'
Goodbye Brian
Here, when brian is assigned a string, we lose the last reference to the Life instance
and so trigger its destructor method. This works, and it may be useful for implementing
some cleanup activities, such as terminating a server connection. However, destructors
are not as commonly used in Python as in some OOP languages, for a number of reasons
that the next section describes.
Destructor Usage Notes
The destructor method works as documented, but it has some well-known caveats and
a few outright dark corners that make it somewhat rare to see in Python code:
Need: For one thing, destructors may not be as useful in Python as they are in some
other OOP languages. Because Python automatically reclaims all memory space
held by an instance when the instance is reclaimed, destructors are not necessary
for space management. In the current CPython implementation of Python, you
also don’t need to close file objects held by the instance in destructors because they
are automatically closed when reclaimed. As mentioned in Chapter 9, though, it’s
still sometimes best to run file close methods anyhow, because this autoclose be-
havior may vary in alternative Python implementations (e.g., Jython).
Predictability: For another, you cannot always easily predict when an instance will
be reclaimed. In some cases, there may be lingering references to your objects in
system tables that prevent destructors from running when your program expects
them to be triggered. Python also does not guarantee that destructor methods will
be called for objects that still exist when the interpreter exits.
Exceptions: In fact, __del__ can be tricky to use for even more subtle reasons. Ex-
ceptions raised within it, for example, simply print a warning message to
sys.stderr (the standard error stream) rather than triggering an exception event,
because of the unpredictable context under which it is run by the garbage collector
—it’s not always possible to know where such an exception should be delivered.
Cycles: In addition, cyclic (a.k.a. circular) references among objects may prevent
garbage collection from happening when you expect it to. An optional cycle de-
tector, enabled by default, can automatically collect such objects eventually, but
only if they do not have __del__ methods. Since this is relatively obscure, we’ll
ignore further details here; see Python’s standard manuals’ coverage of both
__del__ and the gc garbage collector module for more information.
Because of these downsides, it’s often better to code termination activities in an ex-
plicitly called method (e.g., shutdown). As described in the next part of the book, the
try/finally statement also supports termination actions, as does the with statement
for objects that support its context manager model.
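As a quick sketch of these explicit alternatives (the class and method names here are hypothetical), termination code then runs at a predictable time, either because it is called by name or because it is guaranteed by try/finally, instead of depending on garbage collection:
class Connection:                               # Hypothetical resource class
    def shutdown(self):                         # Explicitly called terminator
        print('closing connection')

conn = Connection()
try:
    pass                                        # Use the resource here
finally:
    conn.shutdown()                             # Always runs, exception or not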
Chapter Summary
That’s as many overloading examples as we have space for here. Most of the other
operator overloading methods work similarly to the ones we’ve explored, and all are
just hooks for intercepting built-in type operations. Some overloading methods, for
example, have unique argument lists or return values, but the general usage pattern is
the same. We’ll see a few others in action later in the book:
Chapter 34 uses __enter__ and __exit__ in with statement context managers.
Chapter 38 uses the __get__ and __set__ class descriptor fetch/set methods.
Chapter 40 uses the __new__ object creation method in the context of metaclasses.
In addition, some of the methods we’ve studied here, such as __call__ and __str__,
will be employed by later examples in this book. For complete coverage, though, I’ll
defer to other documentation sources—see Python’s standard language manual or ref-
erence books for details on additional overloading methods.
In the next chapter, we leave the realm of class mechanics behind to explore common
design patterns—the ways that classes are commonly used and combined to optimize
code reuse. After that, we’ll survey a handful of advanced topics and move on to ex-
ceptions, the last core subject of this book. Before you read on, though, take a moment
to work through the chapter quiz below to review the concepts we’ve covered.
Test Your Knowledge: Quiz
1. What two operator overloading methods can you use to support iteration in your
classes?
2. What two operator overloading methods handle printing, and in what contexts?
3. How can you intercept slice operations in a class?
4. How can you catch in-place addition in a class?
5. When should you provide operator overloading?
Test Your Knowledge: Answers
1. Classes can support iteration by defining (or inheriting) __getitem__ or __iter__.
In all iteration contexts, Python tries to use __iter__ first, which returns an object
that supports the iteration protocol with a __next__ method: if no __iter__ is found
by inheritance search, Python falls back on the __getitem__ indexing method,
which is called repeatedly, with successively higher indexes. If used, the yield
statement can create the __next__ method automatically.
2. The __str__ and __repr__ methods implement object print displays. The former is
called by the print and str built-in functions; the latter is called by print and str
if there is no __str__, and always by the repr built-in, interactive echoes, and nested
appearances. That is, __repr__ is used everywhere, except by print and str when
a __str__ is defined. A __str__ is usually used for user-friendly displays;
__repr__ gives extra details or the object’s as-code form.
3. Slicing is caught by the __getitem__ indexing method: it is called with a slice object,
instead of a simple integer index, and slice objects may be passed on or inspected
as needed. In Python 2.X, __getslice__ (defunct in 3.X) may be used for two-limit
slices as well.
4. In-place addition tries __iadd__ first, and __add__ with an assignment second. The
same pattern holds true for all binary operators. The __radd__ method is also avail-
able for right-side addition.
5. When a class naturally matches, or needs to emulate, a built-in type’s interfaces.
For example, collections might imitate sequence or mapping interfaces, and call-
ables might be coded for use with an API that expects a function. You generally
shouldn't implement expression operators if they don't map to your objects naturally
and logically, though—use normally named methods instead.
CHAPTER 31
Designing with Classes
So far in this part of the book, we’ve concentrated on using Python’s OOP tool, the
class. But OOP is also about design issues—that is, how to use classes to model useful
objects. This chapter will touch on a few core OOP ideas and present some additional
examples that are more realistic than many shown so far.
Along the way, we’ll code some common OOP design patterns in Python, such as
inheritance, composition, delegation, and factories. We’ll also investigate some design-
focused class concepts, such as pseudoprivate attributes, multiple inheritance, and
bound methods.
One note up front: some of the design terms mentioned here require more explanation
than I can provide in this book. If this material sparks your curiosity, I suggest exploring
a text on OOP design or design patterns as a next step. As we’ll see, the good news is
that Python makes many traditional design patterns trivial.
Python and OOP
Let’s begin with a review—Python’s implementation of OOP can be summarized by
three ideas:
Inheritance
Inheritance is based on attribute lookup in Python (in X.name expressions).
Polymorphism
In X.method, the meaning of method depends on the type (class) of subject object X.
Encapsulation
Methods and operators implement behavior, though data hiding is a convention
by default.
By now, you should have a good feel for what inheritance is all about in Python. We’ve
also talked about Python’s polymorphism a few times already; it flows from Python’s
lack of type declarations. Because attributes are always resolved at runtime, objects that
implement the same interfaces are automatically interchangeable; clients don’t need to
know what sorts of objects are implementing the methods they call.
Encapsulation means packaging in Python—that is, hiding implementation details be-
hind an object’s interface. It does not mean enforced privacy, though that can be im-
plemented with code, as we’ll see in Chapter 39. Encapsulation is available and useful
in Python nonetheless: it allows the implementation of an object’s interface to be
changed without impacting the users of that object.
Polymorphism Means Interfaces, Not Call Signatures
Some OOP languages also define polymorphism to mean overloading functions based
on the type signatures of their arguments—the number passed and/or their types. Be-
cause there are no type declarations in Python, this concept doesn’t really apply; as
we’ve seen, polymorphism in Python is based on object interfaces, not types.
If you’re pining for your C++ days, you can try to overload methods by their argument
lists, like this:
class C:
    def meth(self, x):
        ...
    def meth(self, x, y, z):
        ...
This code will run, but because the def simply assigns an object to a name in the class’s
scope, the last definition of the method function is the only one that will be retained.
Put another way, it’s just as if you say X = 1 and then X = 2; X will be 2. Hence, there
can be only one definition of a method name.
If they are truly required, you can always code type-based selections using the type-
testing ideas we met in Chapter 4 and Chapter 9, or the argument list tools introduced
in Chapter 18:
class C:
    def meth(self, *args):
        if len(args) == 1:                      # Branch on number of arguments
            ...
        elif type(args[0]) == int:              # Branch on argument types (or isinstance())
            ...
You normally shouldn’t do this, though—it’s not the Python way. As described in
Chapter 16, you should write your code to expect only an object interface, not a specific
data type. That way, it will be useful for a broader category of types and applications,
both now and in the future:
class C:
    def meth(self, x):
        x.operation()                           # Assume x does the right thing
It’s also generally considered better to use distinct method names for distinct opera-
tions, rather than relying on call signatures (no matter what language you code in).
Although Python’s object model is straightforward, much of the art in OOP is in the
way we combine classes to achieve a program’s goals. The next section begins a tour
of some of the ways larger programs use classes to their advantage.
OOP and Inheritance: “Is-a” Relationships
We’ve explored the mechanics of inheritance in depth already, but I’d now like to show
you an example of how it can be used to model real-world relationships. From a pro-
grammer’s point of view, inheritance is kicked off by attribute qualifications, which
trigger searches for names in instances, their classes, and then any superclasses. From
a designer’s point of view, inheritance is a way to specify set membership: a class defines
a set of properties that may be inherited and customized by more specific sets (i.e.,
subclasses).
To illustrate, let’s put that pizza-making robot we talked about at the start of this part
of the book to work. Suppose we’ve decided to explore alternative career paths and
open a pizza restaurant (not bad, as career paths go). One of the first things we’ll need
to do is hire employees to serve customers, prepare the food, and so on. Being engineers
at heart, we’ve decided to build a robot to make the pizzas; but being politically and
cybernetically correct, we’ve also decided to make our robot a full-fledged employee
with a salary.
Our pizza shop team can be defined by the four classes in the following Python 3.X and
2.X example file, employees.py. The most general class, Employee, provides common
behavior such as bumping up salaries (giveRaise) and printing (__repr__). There are
two kinds of employees, and so two subclasses of Employee: Chef and Server. Both
override the inherited work method to print more specific messages. Finally, our pizza
robot is modeled by an even more specific class—PizzaRobot is a kind of Chef, which
is a kind of Employee. In OOP terms, we call these relationships “is-a” links: a robot is
a chef, which is an employee. Here’s the employees.py file:
# File employees.py (2.X + 3.X)
from __future__ import print_function

class Employee:
    def __init__(self, name, salary=0):
        self.name = name
        self.salary = salary
    def giveRaise(self, percent):
        self.salary = self.salary + (self.salary * percent)
    def work(self):
        print(self.name, "does stuff")
    def __repr__(self):
        return "<Employee: name=%s, salary=%s>" % (self.name, self.salary)

class Chef(Employee):
    def __init__(self, name):
        Employee.__init__(self, name, 50000)
    def work(self):
        print(self.name, "makes food")

class Server(Employee):
    def __init__(self, name):
        Employee.__init__(self, name, 40000)
    def work(self):
        print(self.name, "interfaces with customer")

class PizzaRobot(Chef):
    def __init__(self, name):
        Chef.__init__(self, name)
    def work(self):
        print(self.name, "makes pizza")

if __name__ == "__main__":
    bob = PizzaRobot('bob')          # Make a robot named bob
    print(bob)                       # Run inherited __repr__
    bob.work()                       # Run type-specific action
    bob.giveRaise(0.20)              # Give bob a 20% raise
    print(bob); print()

    for klass in Employee, Chef, Server, PizzaRobot:
        obj = klass(klass.__name__)
        obj.work()
When we run the self-test code included in this module, we create a pizza-making robot
named bob, which inherits names from three classes: PizzaRobot, Chef, and Employee.
For instance, printing bob runs the Employee.__repr__ method, and giving bob a raise
invokes Employee.giveRaise because that’s where the inheritance search finds that
method:
c:\code> python employees.py
<Employee: name=bob, salary=50000>
bob makes pizza
<Employee: name=bob, salary=60000.0>
Employee does stuff
Chef makes food
Server interfaces with customer
PizzaRobot makes pizza
In a class hierarchy like this, you can usually make instances of any of the classes, not
just the ones at the bottom. For instance, the for loop in this module’s self-test code
creates instances of all four classes; each responds differently when asked to work be-
cause the work method is different in each. bob the robot, for example, gets work from
the most specific (i.e., lowest) PizzaRobot class.
Of course, these classes just simulate real-world objects; work prints a message for the
time being, but it could be expanded to do real work later (see Python’s interfaces to
devices such as serial ports, Arduino boards, and the Raspberry Pi if you’re taking this
section much too literally!).
OOP and Composition: “Has-a” Relationships
The notion of composition was introduced in Chapter 26 and Chapter 28. From a
programmer’s point of view, composition involves embedding other objects in a con-
tainer object, and activating them to implement container methods. To a designer,
composition is another way to represent relationships in a problem domain. But, rather
than set membership, composition has to do with components—parts of a whole.
Composition also reflects the relationships between parts, called “has-a” relationships.
Some OOP design texts refer to composition as aggregation, or distinguish between the
two terms by using aggregation to describe a weaker dependency between container
and contained. In this text, a “composition” simply refers to a collection of embedded
objects. The composite class generally provides an interface all its own and implements
it by directing the embedded objects.
Now that we’ve implemented our employees, let’s put them in the pizza shop and let
them get busy. Our pizza shop is a composite object: it has an oven, and it has employees
like servers and chefs. When a customer enters and places an order, the components
of the shop spring into action—the server takes the order, the chef makes the pizza,
and so on. The following example—file pizzashop.py—runs the same on Python 3.X
and 2.X and simulates all the objects and relationships in this scenario:
# File pizzashop.py (2.X + 3.X)
from __future__ import print_function
from employees import PizzaRobot, Server
class Customer:
    def __init__(self, name):
        self.name = name
    def order(self, server):
        print(self.name, "orders from", server)
    def pay(self, server):
        print(self.name, "pays for item to", server)

class Oven:
    def bake(self):
        print("oven bakes")

class PizzaShop:
    def __init__(self):
        self.server = Server('Pat')          # Embed other objects
        self.chef = PizzaRobot('Bob')        # A robot named bob
        self.oven = Oven()

    def order(self, name):
        customer = Customer(name)            # Activate other objects
        customer.order(self.server)          # Customer orders from server
        self.chef.work()
        self.oven.bake()
        customer.pay(self.server)

if __name__ == "__main__":
    scene = PizzaShop()                      # Make the composite
    scene.order('Homer')                     # Simulate Homer's order
    print('...')
    scene.order('Shaggy')                    # Simulate Shaggy's order
The PizzaShop class is a container and controller; its constructor makes and embeds
instances of the employee classes we wrote in the prior section, as well as an Oven class
defined here. When this module’s self-test code calls the PizzaShop order method, the
embedded objects are asked to carry out their actions in turn. Notice that we make a
new Customer object for each order, and we pass on the embedded Server object to
Customer methods; customers come and go, but the server is part of the pizza shop
composite. Also notice that employees are still involved in an inheritance relationship;
composition and inheritance are complementary tools.
When we run this module, our pizza shop handles two orders—one from Homer, and
then one from Shaggy:
c:\code> python pizzashop.py
Homer orders from <Employee: name=Pat, salary=40000>
Bob makes pizza
oven bakes
Homer pays for item to <Employee: name=Pat, salary=40000>
...
Shaggy orders from <Employee: name=Pat, salary=40000>
Bob makes pizza
oven bakes
Shaggy pays for item to <Employee: name=Pat, salary=40000>
Again, this is mostly just a toy simulation, but the objects and interactions are repre-
sentative of composites at work. As a rule of thumb, classes can represent just about
any objects and relationships you can express in a sentence; just replace nouns with
classes (e.g., Oven), and verbs with methods (e.g., bake), and you’ll have a first cut at a
design.
Stream Processors Revisited
For a composition example that may be a bit more tangible than pizza-making robots,
recall the generic data stream processor function we partially coded in the introduction
to OOP in Chapter 26:
def processor(reader, converter, writer):
    while True:
        data = reader.read()
        if not data: break
        data = converter(data)
        writer.write(data)
Rather than using a simple function here, we might code this as a class that uses com-
position to do its work in order to provide more structure and support inheritance. The
following 3.X/2.X file, streams.py, demonstrates one way to code the class:
class Processor:
    def __init__(self, reader, writer):
        self.reader = reader
        self.writer = writer
    def process(self):
        while True:
            data = self.reader.readline()
            if not data: break
            data = self.converter(data)
            self.writer.write(data)
    def converter(self, data):
        assert False, 'converter must be defined'    # Or raise exception
This class defines a converter method that it expects subclasses to fill in; it's an example
of the abstract superclass model we outlined in Chapter 29 (more on assert in Part VII—
it simply raises an exception if its test is false). Coded this way, reader and writer objects
are embedded within the class instance (composition), and we supply the conversion
logic in a subclass rather than passing in a converter function (inheritance). The file
converters.py shows how:
from streams import Processor

class Uppercase(Processor):
    def converter(self, data):
        return data.upper()

if __name__ == '__main__':
    import sys
    obj = Uppercase(open('trispam.txt'), sys.stdout)
    obj.process()
Here, the Uppercase class inherits the stream-processing loop logic (and anything else
that may be coded in its superclasses). It needs to define only what is unique about it
—the data conversion logic. When this file is run, it makes and runs an instance that
reads from the file trispam.txt and writes the uppercase equivalent of that file to the
stdout stream:
c:\code> type trispam.txt
spam
Spam
SPAM!
c:\code> python converters.py
SPAM
SPAM
SPAM!
To process different sorts of streams, pass in different sorts of objects to the class con-
struction call. Here, we use an output file instead of a stream:
C:\code> python
>>> import converters
>>> prog = converters.Uppercase(open('trispam.txt'), open('trispamup.txt', 'w'))
>>> prog.process()
C:\code> type trispamup.txt
SPAM
SPAM
SPAM!
But, as suggested earlier, we could also pass in arbitrary objects coded as classes that
define the required input and output method interfaces. Here’s a simple example that
passes in a writer class that wraps up the text inside HTML tags:
C:\code> python
>>> from converters import Uppercase
>>>
>>> class HTMLize:
        def write(self, line):
            print('<PRE>%s</PRE>' % line.rstrip())
>>> Uppercase(open('trispam.txt'), HTMLize()).process()
<PRE>SPAM</PRE>
<PRE>SPAM</PRE>
<PRE>SPAM!</PRE>
If you trace through this example’s control flow, you’ll see that we get both uppercase
conversion (by inheritance) and HTML formatting (by composition), even though the
core processing logic in the original Processor superclass knows nothing about either
step. The processing code only cares that writers have a write method and that a method
named converter is defined; it doesn't care what those methods do when they are called.
Such polymorphism and encapsulation of logic is behind much of the power of classes
in Python.
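The same substitution works on the input side of the interface: any object with the readline method the loop expects will do. Here is a quick sketch using a made-up reader class that serves lines from a list:
>>> class ListReader:                           # Hypothetical reader class
        def __init__(self, lines):
            self.lines = lines
        def readline(self):                     # The only method process() calls
            if not self.lines:
                return ''                       # Empty string ends the loop
            return self.lines.pop(0)

>>> Uppercase(ListReader(['spam\n', 'eggs\n']), HTMLize()).process()
<PRE>SPAM</PRE>
<PRE>EGGS</PRE>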
As is, the Processor superclass only provides a file-scanning loop. In more realistic
work, we might extend it to support additional programming tools for its subclasses,
and, in the process, turn it into a full-blown application framework. Coding such a tool
once in a superclass enables you to reuse it in all of your programs. Even in this simple
example, because so much is packaged and inherited with classes, all we had to code
was the HTML formatting step; the rest was free.
For another example of composition at work, see exercise 9 at the end of Chapter 32
and its solution in Appendix D; it’s similar to the pizza shop example. We’ve focused
on inheritance in this book because that is the main tool that the Python language itself
provides for OOP. But, in practice, composition may be used as much as inheritance
as a way to structure classes, especially in larger systems. As we’ve seen, inheritance
and composition are often complementary (and sometimes alternative) techniques.
Because composition is a design issue outside the scope of the Python language and
this book, though, I’ll defer to other resources for more on this topic.
Why You Will Care: Classes and Persistence
I’ve mentioned Python’s pickle and shelve object persistence support a few times in
this part of the book because it works especially well with class instances. In fact, these
tools are often compelling enough to motivate the use of classes in general—by pickling
or shelving a class instance, we get data storage that contains both data and logic com-
bined.
For example, besides allowing us to simulate real-world interactions, the pizza shop
classes developed in this chapter could also be used as the basis of a persistent restaurant
database. Instances of classes can be stored away on disk in a single step using Python’s
pickle or shelve modules. We used shelves to store instances of classes in the OOP
tutorial in Chapter 28, but the object pickling interface is remarkably easy to use as well:
import pickle
object = SomeClass()
file = open(filename, 'wb')          # Create external file
pickle.dump(object, file)            # Save object in file

import pickle
file = open(filename, 'rb')
object = pickle.load(file)           # Fetch it back later
Pickling converts in-memory objects to serialized byte streams (in Python, strings),
which may be stored in files, sent across a network, and so on; unpickling converts
back from byte streams to identical in-memory objects. Shelves are similar, but they
automatically pickle objects to an access-by-key database, which exports a dictionary-
like interface:
import shelve
object = SomeClass()
dbase = shelve.open(filename)
dbase['key'] = object                # Save under key

import shelve
dbase = shelve.open(filename)
object = dbase['key']                # Fetch it back later
In our pizza shop example, using classes to model employees means we can get a simple
database of employees and shops with little extra work—pickling such instance objects
to a file makes them persistent across Python program executions:
>>> from pizzashop import PizzaShop
>>> shop = PizzaShop()
>>> shop.server, shop.chef
(<Employee: name=Pat, salary=40000>, <Employee: name=Bob, salary=50000>)
>>> import pickle
>>> pickle.dump(shop, open('shopfile.pkl', 'wb'))
This stores an entire composite shop object in a file all at once. To bring it back later in
another session or program, a single step suffices as well. In fact, objects restored this
way retain both state and behavior:
>>> import pickle
>>> obj = pickle.load(open('shopfile.pkl', 'rb'))
>>> obj.server, obj.chef
(<Employee: name=Pat, salary=40000>, <Employee: name=Bob, salary=50000>)
>>> obj.order('LSP')
LSP orders from <Employee: name=Pat, salary=40000>
Bob makes pizza
oven bakes
LSP pays for item to <Employee: name=Pat, salary=40000>
This just runs a simulation as is, but we might extend the shop to keep track of inven-
tory, revenue, and so on—saving it to its file after changes would retain its updated
state. See the standard library manual and related coverage in Chapter 9, Chapter 28,
and Chapter 37 for more on pickles and shelves.
OOP and Delegation: “Wrapper” Proxy Objects
Beside inheritance and composition, object-oriented programmers often speak of del-
egation, which usually implies controller objects that embed other objects to which
they pass off operation requests. The controllers can take care of administrative activ-
ities, such as logging or validating accesses, adding extra steps to interface components,
or monitoring active instances.
In a sense, delegation is a special form of composition, with a single embedded object
managed by a wrapper (sometimes called a proxy) class that retains most or all of the
embedded object’s interface. The notion of proxies sometimes applies to other mech-
anisms too, such as function calls; in delegation, we’re concerned with proxies for all
of an object’s behavior, including method calls and other operations.
This concept was introduced by example in Chapter 28, and in Python is often imple-
mented with the __getattr__ method hook we studied in Chapter 30. Because this
operator overloading method intercepts accesses to nonexistent attributes, a wrapper
class can use __getattr__ to route arbitrary accesses to a wrapped object. Because this
method allows attribute requests to be routed generically, the wrapper class retains the
interface of the wrapped object and may add additional operations of its own.
By way of review, consider the file trace.py (which runs the same in 2.X and 3.X):
class Wrapper:
    def __init__(self, object):
        self.wrapped = object                        # Save object
    def __getattr__(self, attrname):
        print('Trace: ' + attrname)                  # Trace fetch
        return getattr(self.wrapped, attrname)       # Delegate fetch
Recall from Chapter 30 that __getattr__ gets the attribute name as a string. This code
makes use of the getattr built-in function to fetch an attribute from the wrapped object
by name string—getattr(X,N) is like X.N, except that N is an expression that evaluates
to a string at runtime, not a variable. In fact, getattr(X,N) is similar to X.__dict__[N],
but the former also performs an inheritance search, like X.N, while the latter does not
(see Chapter 22 and Chapter 29 for more on the __dict__ attribute).
You can use the approach of this module’s wrapper class to manage access to any object
with attributes—lists, dictionaries, and even classes and instances. Here, the Wrapper
class simply prints a trace message on each attribute access and delegates the attribute
request to the embedded wrapped object:
>>> from trace import Wrapper
>>> x = Wrapper([1, 2, 3]) # Wrap a list
>>> x.append(4) # Delegate to list method
Trace: append
>>> x.wrapped # Print my member
[1, 2, 3, 4]
>>> x = Wrapper({'a': 1, 'b': 2}) # Wrap a dictionary
>>> list(x.keys()) # Delegate to dictionary method
Trace: keys
['a', 'b']
The net effect is to augment the entire interface of the wrapped object, with additional
code in the Wrapper class. We can use this to log our method calls, route method calls
to extra or custom logic, adapt a class to a new interface, and so on.
We’ll revive the notions of wrapped objects and delegated operations as one way to
extend built-in types in the next chapter. If you are interested in the delegation design
pattern, also watch for the discussions in Chapter 32 and Chapter 39 of function dec-
orators, a strongly related concept designed to augment a specific function or method
call rather than the entire interface of an object, and class decorators, which serve as a
way to automatically add such delegation-based wrappers to all instances of a class.
Version skew note: As we saw by example in Chapter 28, delegation of
object interfaces by general proxies has changed substantially in 3.X
when wrapped objects implement operator overloading methods. Tech-
nically, this is a new-style class difference, and can appear in 2.X code
too if it enables this option; per the next chapter, it’s mandatory in 3.X
and thus often considered a 3.X change.
In Python 2.X’s default classes, operator overloading methods run by
built-in operations are routed through generic attribute interception
methods like __getattr__. Printing a wrapped object directly, for ex-
ample, calls this method for __repr__ or __str__, which then passes the
call on to the wrapped object. This pattern holds for __iter__,
__add__, and the other operator methods of the prior chapter.
In Python 3.X, this no longer happens: printing does not trigger __getattr__
(or its __getattribute__ cousin we'll study in the next chapter)
and a default display is used instead. In 3.X, new-style classes look up
methods invoked implicitly by built-in operations in classes and skip
the normal instance lookup entirely. Explicit name attribute fetches are
routed to __getattr__ the same way in both 2.X and 3.X, but built-in
operation method lookup differs in ways that may impact some dele-
gation-based tools.
We’ll return to this issue in the next chapter as a new-style class change,
and see it live in Chapter 38 and Chapter 39, in the context of managed
attributes and decorators. For now, keep in mind that for delegation
coding patterns, you may need to redefine operator overloading meth-
ods in wrapper classes (either by hand, by tools, or by superclasses) if
they are used by embedded objects and you want them to be intercepted
in new-style classes.
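As a minimal sketch of that last point (extending this section's Wrapper by hand, for illustration only), redefining an operator method in the proxy class itself makes new-style classes find it by their class-level lookup, and lets you forward the call explicitly:
class Wrapper2(Wrapper):                        # Extend the earlier tracer
    def __repr__(self):                         # Not routed to __getattr__
        return repr(self.wrapped)               # in new-style classes
    def __len__(self):
        return len(self.wrapped)                # Forward by hand instead
Printing a Wrapper2 instance now displays the wrapped object in 3.X too, because the explicit __repr__ is found by the built-in operation's class lookup.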
Pseudoprivate Class Attributes
Besides larger structuring goals, class designs often must address name usage too. In
Chapter 28’s case study, for example, we noted that methods defined within a general
tool class might be modified by subclasses if exposed, and noted the tradeoffs of this
policy—while it supports method customization and direct calls, it’s also open to ac-
cidental replacements.
In Part V, we learned that every name assigned at the top level of a module file is
exported. By default, the same holds for classes—data hiding is a convention, and
clients may fetch or change attributes in any class or instance to which they have a
reference. In fact, attributes are all “public” and “virtual,” in C++ terms; they’re all
accessible everywhere and are looked up dynamically at runtime.1
That said, Python today does support the notion of name “mangling” (i.e., expansion)
to localize some names in classes. Mangled names are sometimes misleadingly called
“private attributes,” but really this is just a way to localize a name to the class that
created it—name mangling does not prevent access by code outside the class. This
feature is mostly intended to avoid namespace collisions in instances, not to restrict
access to names in general; mangled names are therefore better called “pseudoprivate”
than “private.”
Pseudoprivate names are an advanced and entirely optional feature, and you probably
won’t find them very useful until you start writing general tools or larger class hierar-
chies for use in multiprogrammer projects. In fact, they are not always used even when
they probably should be—more commonly, Python programmers code internal names
with a single underscore (e.g., _X), which is just an informal convention to let you know
that a name shouldn’t generally be changed (it means nothing to Python itself).
1. This tends to scare people with a C++ background disproportionately. In Python, it’s even possible to
change or completely delete a class’s method at runtime. On the other hand, almost nobody ever does
this in practical programs. As a scripting language, Python is more about enabling than restricting. Also,
recall from our discussion of operator overloading in Chapter 30 that __getattr__ and __setattr__ can
be used to emulate privacy, but are generally not used for this purpose in practice. More on this when we
code a more realistic privacy decorator in Chapter 39.
Because you may see this feature in other people’s code, though, you need to be some-
what aware of it, even if you don’t use it yourself. And once you learn its advantages
and contexts of use, you may find this feature to be more useful in your own code than
some programmers realize.
Name Mangling Overview
Here’s how name mangling works: within a class statement only, any names that
start with two underscores but don’t end with two underscores are automatically ex-
panded to include the name of the enclosing class at their front. For instance, a name
like __X within a class named Spam is changed to _Spam__X automatically: the original
name is prefixed with a single underscore and the enclosing class’s name. Because the
modified name contains the name of the enclosing class, it’s generally unique; it won’t
clash with similar names created by other classes in a hierarchy.
Name mangling happens only for names that appear inside a class statement’s code,
and then only for names that begin with two leading underscores. It works for every
name preceded with double underscores, though—both class attributes (including
method names) and instance attribute names assigned to self. For example, in a class
named Spam, a method named __meth is mangled to _Spam__meth, and an instance at-
tribute reference self.__X is transformed to self._Spam__X.
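A short interactive session shows the expansion at work (the class and attribute names here are arbitrary):
>>> class Spam:
        def store(self):
            self.__X = 42                       # Really assigns self._Spam__X

>>> s = Spam()
>>> s.store()
>>> s._Spam__X                                  # Expanded name, accessible outside
42
>>> s.__X                                       # Unexpanded name: not mangled here
AttributeError: 'Spam' object has no attribute '__X'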
Despite the mangling, as long as the class uses the double underscore version every-
where it refers to the name, all its references will still work. Because more than one class
may add attributes to an instance, though, this mangling helps avoid clashes—but we
need to move on to an example to see how.
Why Use Pseudoprivate Attributes?
One of the main issues that the pseudoprivate attribute feature is meant to alleviate has
to do with the way instance attributes are stored. In Python, all instance attributes wind
up in the single instance object at the bottom of the class tree, and are shared by all
class-level method functions the instance is passed into. This is different from the
C++ model, where each class gets its own space for data members it defines.
Within a class’s method in Python, whenever a method assigns to a self attribute (e.g.,
self.attr = value), it changes or creates an attribute in the instance (recall that inher-
itance searches happen only on reference, not on assignment). Because this is true even
if multiple classes in a hierarchy assign to the same attribute, collisions are possible.
For example, suppose that when a programmer codes a class, it is assumed that the
class owns the attribute name X in the instance. In this class’s methods, the name is set,
and later fetched:
class C1:
    def meth1(self): self.X = 88       # I assume X is mine
    def meth2(self): print(self.X)
Suppose further that another programmer, working in isolation, makes the same as-
sumption in another class:
class C2:
    def metha(self): self.X = 99       # Me too
    def methb(self): print(self.X)
Both of these classes work by themselves. The problem arises if the two classes are ever
mixed together in the same class tree:
class C3(C1, C2): ...
I = C3() # Only 1 X in I!
Now, the value that each class gets back when it says self.X will depend on which class
assigned it last. Because all assignments to self.X refer to the same single instance,
there is only one X attribute—I.X—no matter how many classes use that attribute name.
This isn’t a problem if it’s expected, and indeed, this is how classes communicate—the
instance is shared memory. To guarantee that an attribute belongs to the class that uses
it, though, prefix the name with double underscores everywhere it is used in the class,
as in this 2.X/3.X file, pseudoprivate.py:
class C1:
    def meth1(self): self.__X = 88     # Now X is mine
    def meth2(self): print(self.__X)   # Becomes _C1__X in I

class C2:
    def metha(self): self.__X = 99     # Me too
    def methb(self): print(self.__X)   # Becomes _C2__X in I

class C3(C1, C2): pass
I = C3()                               # Two X names in I

I.meth1(); I.metha()
print(I.__dict__)
I.meth2(); I.methb()
When thus prefixed, the X attributes will be expanded to include the names of their
classes before being added to the instance. If you run a dir call on I or inspect its
namespace dictionary after the attributes have been assigned, you’ll see the expanded
names, _C1__X and _C2__X, but not X. Because the expansion makes the names more
unique within the instance, the class coders can be fairly safe in assuming that they
truly own any names that they prefix with two underscores:
% python pseudoprivate.py
{'_C2__X': 99, '_C1__X': 88}
88
99
This trick can avoid potential name collisions in the instance, but note that it does not
amount to true privacy. If you know the name of the enclosing class, you can still access
either of these attributes anywhere you have a reference to the instance by using the
fully expanded name (e.g., I._C1__X = 77). Moreover, names could still collide if un-
knowing programmers use the expanded naming pattern explicitly (unlikely, but not
impossible). On the other hand, this feature makes it less likely that you will acciden-
tally step on a class’s names.
Pseudoprivate attributes are also useful in larger frameworks or tools, both to avoid
introducing new method names that might accidentally hide definitions elsewhere in
the class tree and to reduce the chance of internal methods being replaced by names
defined lower in the tree. If a method is intended for use only within a class that may
be mixed into other classes, the double underscore prefix virtually ensures that the
method won’t interfere with other names in the tree, especially in multiple-inheritance
scenarios:
class Super:
    def method(self): ...                      # A real application method

class Tool:
    def __method(self): ...                    # Becomes _Tool__method
    def other(self): self.__method()           # Use my internal method

class Sub1(Tool, Super):
    def actions(self): self.method()           # Runs Super.method as expected

class Sub2(Tool):
    def __init__(self): self.method = 99       # Doesn't break Tool.__method
We met multiple inheritance briefly in Chapter 26 and will explore it in more detail
later in this chapter. Recall that superclasses are searched according to their left-to-right
order in class header lines. Here, this means Sub1 prefers Tool attributes to those in
Super. Although in this example we could force Python to pick the application class’s
methods first by switching the order of the superclasses listed in the Sub1 class header,
pseudoprivate attributes resolve the issue altogether. Pseudoprivate names also prevent
subclasses from accidentally redefining the internal method’s names, as in Sub2.
Again, I should note that this feature tends to be of use primarily for larger, multiprogrammer projects, and then only for selected names. Don't be tempted to clutter your
code unnecessarily; only use this feature for names that truly need to be controlled by
a single class. Although useful in some general class-based tools, for simpler programs,
it’s probably overkill.
For more examples that make use of the __X naming feature, see the lister.py mix-in
classes introduced later in this chapter in the multiple inheritance section, as well as
the discussion of Private class decorators in Chapter 39.
If you care about privacy in general, you might want to review the emulation of private
instance attributes sketched in the section “Attribute Access: __getattr__ and __setattr__” on page 909 in Chapter 30, and watch for the more complete Private class
decorator we’ll build with delegation in Chapter 39. Although it’s possible to emulate
true access controls in Python classes, this is rarely done in practice, even for large
systems.
Methods Are Objects: Bound or Unbound
Methods in general, and bound methods in particular, simplify the implementation of
many design goals in Python. We met bound methods briefly while studying __call__ in
Chapter 30. The full story, which we’ll flesh out here, turns out to be more general and
flexible than you might expect.
In Chapter 19, we learned how functions can be processed as normal objects. Methods
are a kind of object too, and can be used generically in much the same way as other
objects—they can be assigned to names, passed to functions, stored in data structures,
and so on—and like simple functions, qualify as “first class” objects. Because a class’s
methods can be accessed from an instance or a class, though, they actually come in two
flavors in Python:
Unbound (class) method objects: no self
    Accessing a function attribute of a class by qualifying the class returns an unbound
    method object. To call the method, you must provide an instance object explicitly
    as the first argument. In Python 3.X, an unbound method is the same as a simple
    function and can be called through the class's name; in 2.X it's a distinct type and
    cannot be called without providing an instance.
Bound (instance) method objects: self + function pairs
    Accessing a function attribute of a class by qualifying an instance returns a bound
    method object. Python automatically packages the instance with the function in
    the bound method object, so you don't need to pass an instance to call the method.
Both kinds of methods are full-fledged objects; they can be transferred around a program at will, just like strings and numbers. Both also require an instance in their first argument when run (i.e., a value for self). This is why we've had to pass in an instance explicitly when calling superclass methods from subclass methods in previous examples (including this chapter's employees.py); technically, such calls produce unbound method objects along the way.
When calling a bound method object, Python provides an instance for you automatically—the instance used to create the bound method object. This means that bound
method objects are usually interchangeable with simple function objects, and makes
them especially useful for interfaces originally written for functions (see the sidebar
“Why You Will Care: Bound Method Callbacks” on page 953 for a realistic use case
in GUIs).
To illustrate in simple terms, suppose we define the following class:
class Spam:
    def doit(self, message):
        print(message)
Now, in normal operation, we make an instance and call its method in a single step to
print the passed-in argument:
object1 = Spam()
object1.doit('hello world')
Really, though, a bound method object is generated along the way, just before the
method call’s parentheses. In fact, we can fetch a bound method without actually calling it. An object.name expression evaluates to an object as all expressions do. In the
following, it returns a bound method object that packages the instance (object1) with
the method function (Spam.doit). We can assign this bound method pair to another
name and then call it as though it were a simple function:
object1 = Spam()
x = object1.doit # Bound method object: instance+function
x('hello world') # Same effect as object1.doit('...')
On the other hand, if we qualify the class to get to doit, we get back an unbound method
object, which is simply a reference to the function object. To call this type of method,
we must pass in an instance as the leftmost argument—there isn’t one in the expression
otherwise, and the method expects it:
object1 = Spam()
t = Spam.doit # Unbound method object (a function in 3.X: see ahead)
t(object1, 'howdy') # Pass in instance (if the method expects one in 3.X)
By extension, the same rules apply within a class’s method if we reference self attributes
that refer to functions in the class. A self.method expression is a bound method object
because self is an instance object:
class Eggs:
    def m1(self, n):
        print(n)
    def m2(self):
        x = self.m1                          # Another bound method object
        x(42)                                # Looks like a simple function

Eggs().m2()                                  # Prints 42
Most of the time, you call methods immediately after fetching them with attribute
qualification, so you don’t always notice the method objects generated along the way.
But if you start writing code that calls objects generically, you need to be careful to treat
unbound methods specially—they normally require an explicit instance object to be
passed in.
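For example, here is a minimal sketch (the dispatch helper is hypothetical, not a standard tool) of the kind of check such generic code might make; in 2.X an unbound method's __self__ is None, and in 3.X a class-fetched method is a plain function with no __self__ at all:

class Spam:
    def doit(self, message):
        print(message)

def dispatch(method, instance, *args):
    # Bound methods carry their instance; class-fetched methods need one passed
    if getattr(method, '__self__', None) is not None:
        method(*args)                        # Bound: instance already packaged
    else:
        method(instance, *args)              # Unbound/function: supply instance

obj = Spam()
dispatch(obj.doit, obj, 'hello')             # Bound method object
dispatch(Spam.doit, obj, 'world')            # Simple function in 3.X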
For an optional exception to this rule, see the discussion of static and
class methods in the next chapter, and the brief mention of one in the
next section. Like bound methods, static methods can masquerade as
basic functions because they do not expect instances when called. Formally speaking, Python supports three kinds of class-level methods—instance, static, and class—and 3.X allows simple functions in classes, too. Chapter 40's metaclass methods are distinct too, but they are essentially class methods with less scope.
Unbound Methods Are Functions in 3.X
In Python 3.X, the language has dropped the notion of unbound methods. What we
describe as an unbound method here is treated as a simple function in 3.X. For most
purposes, this makes no difference to your code; either way, an instance will be passed
to a method’s first argument when it’s called through an instance.
Programs that do explicit type testing might be impacted, though—if you print the type
of an instance-less class-level method, it displays “unbound method” in 2.X, and
“function” in 3.X.
Moreover, in 3.X it is OK to call a method without an instance, as long as the method
does not expect one and you call it only through the class and never through an instance.
That is, Python 3.X will pass along an instance to methods only for through-instance
calls. When calling through a class, you must pass an instance manually only if the
method expects one:
C:\code> c:\python33\python
>>> class Selfless:
        def __init__(self, data):
            self.data = data
        def selfless(arg1, arg2):            # A simple function in 3.X
            return arg1 + arg2
        def normal(self, arg1, arg2):        # Instance expected when called
            return self.data + arg1 + arg2
>>> X = Selfless(2)
>>> X.normal(3, 4) # Instance passed to self automatically: 2+(3+4)
9
>>> Selfless.normal(X, 3, 4) # self expected by method: pass manually
9
>>> Selfless.selfless(3, 4) # No instance: works in 3.X, fails in 2.X!
7
The last test in this listing fails in 2.X, because unbound methods require an instance to be
passed by default; it works in 3.X because such methods are treated as simple functions
not requiring an instance. Although this removes some potential error trapping in 3.X
(what if a programmer accidentally forgets to pass an instance?), it allows a class’s
methods to be used as simple functions as long as they are not passed and do not expect
a “self” instance argument.
The following two calls still fail in both 3.X and 2.X, though—the first (calling through
an instance) automatically passes an instance to a method that does not expect one,
while the second (calling through a class) does not pass an instance to a method that
does expect one (error message text here is per 3.3):
>>> X.selfless(3, 4)
TypeError: selfless() takes 2 positional arguments but 3 were given
>>> Selfless.normal(3, 4)
TypeError: normal() missing 1 required positional argument: 'arg2'
Because of this change, the staticmethod built-in function and decorator described in
the next chapter is not needed in 3.X for methods without a self argument that are
called only through the class name, and never through an instance—such methods are
run as simple functions, without receiving an instance argument. In 2.X, such calls are
errors unless an instance is passed manually or the method is marked as being static
(more on static methods in the next chapter).
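To preview the difference with a minimal sketch (the class and method names here are hypothetical; staticmethod is covered fully in the next chapter):

class Utility:
    def plain(arg):                          # Usable via class name in 3.X only
        return arg * 2

    @staticmethod
    def portable(arg):                       # Usable the same way in 2.X and 3.X
        return arg * 2

print(Utility.plain(2))                      # 4 in 3.X; TypeError in 2.X
print(Utility.portable(2))                   # 4 in both lines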
It’s important to be aware of the differences in behavior in 3.X, but bound methods are generally more important from a practical perspective anyway. Because they pair together
gether the instance and function in a single object, they can be treated as callables
generically. The next section demonstrates what this means in code.
For a more visual illustration of unbound method treatment in Python 3.X and 2.X, see also the lister.py example in the multiple inheritance section later in this chapter. Its classes print the value of methods fetched from both instances and classes, in both versions of Python—as unbound methods in 2.X and simple functions in 3.X. Also note that this change is inherent in 3.X itself, not the new-style class model it mandates.
Bound Methods and Other Callable Objects
As mentioned earlier, bound methods can be processed as generic objects, just like
simple functions—they can be passed around a program arbitrarily. Moreover, because
bound methods combine both a function and an instance in a single package, they can
be treated like any other callable object and require no special syntax when invoked.
The following, for example, stores four bound method objects in a list and calls them
later with normal call expressions:
>>> class Number:
        def __init__(self, base):
            self.base = base
        def double(self):
            return self.base * 2
        def triple(self):
            return self.base * 3
>>> x = Number(2) # Class instance objects
>>> y = Number(3) # State + methods
>>> z = Number(4)
>>> x.double() # Normal immediate calls
4
>>> acts = [x.double, y.double, y.triple, z.double] # List of bound methods
>>> for act in acts:                                 # Calls are deferred
        print(act())                                 # Call as though functions
4
6
9
8
Like simple functions, bound method objects have introspection information of their
own, including attributes that give access to the instance object and method function
they pair. Calling the bound method simply dispatches the pair:
>>> bound = x.double
>>> bound.__self__, bound.__func__
(<__main__.Number object at 0x...etc...>, <function Number.double at 0x...etc...>)
>>> bound.__self__.base
2
>>> bound() # Calls bound.__func__(bound.__self__, ...)
4
Other callables
In fact, bound methods are just one of a handful of callable object types in Python. As
the following demonstrates, simple functions coded with a def or lambda, instances that
inherit a __call__, and bound instance methods can all be treated and called the same
way:
>>> def square(arg):
        return arg ** 2                      # Simple functions (def or lambda)

>>> class Sum:
        def __init__(self, val):             # Callable instances
            self.val = val
        def __call__(self, arg):
            return self.val + arg

>>> class Product:
        def __init__(self, val):             # Bound methods
            self.val = val
        def method(self, arg):
            return self.val * arg

>>> sobject = Sum(2)
>>> pobject = Product(3)
>>> actions = [square, sobject, pobject.method]      # Function, instance, method
>>> for act in actions:                              # All three called same way
        print(act(5))                                # Call any one-arg callable
25
7
15
>>> actions[-1](5) # Index, comprehensions, maps
15
>>> [act(5) for act in actions]
[25, 7, 15]
>>> list(map(lambda act: act(5), actions))
[25, 7, 15]
Technically speaking, classes belong in the callable objects category too, but we normally call them to generate instances rather than to do actual work—a single action is
better coded as a simple function than a class with a constructor, but the class here
serves to illustrate its callable nature:
>>> class Negate:
        def __init__(self, val):             # Classes are callables too
            self.val = -val                  # But called for object, not work
        def __repr__(self):                  # Instance print format
            return str(self.val)
>>> actions = [square, sobject, pobject.method, Negate] # Call a class too
>>> for act in actions:
        print(act(5))
25
7
15
-5
>>> [act(5) for act in actions] # Runs __repr__ not __str__!
[25, 7, 15, -5]
>>> table = {act(5): act for act in actions} # 3.X/2.7 dict comprehension
>>> for (key, value) in table.items():
        print('{0:2} => {1}'.format(key, value))     # 2.6+/3.X str.format
25 => <function square at 0x0000000002987400>
15 => <bound method Product.method of <__main__.Product object at ...etc...>>
-5 => <class '__main__.Negate'>
7 => <__main__.Sum object at 0x000000000298BE48>
As you can see, bound methods, and Python’s callable objects model in general, are
some of the many ways that Python’s design makes for an incredibly flexible language.
You should now understand the method object model. For other examples of bound
methods at work, see the upcoming sidebar “Why You Will Care: Bound Method
Callbacks” on page 953 as well as the prior chapter’s discussion of callback handlers
in the section on the method __call__.
Why You Will Care: Bound Method Callbacks
Because bound methods automatically pair an instance with a class’s method function,
you can use them anywhere a simple function is expected. One of the most common
places you’ll see this idea put to work is in code that registers methods as event callback
handlers in the tkinter GUI interface (named Tkinter in Python 2.X) we’ve met before.
As review, here’s the simple case:
def handler():
    ...use globals or closure scopes for state...
...
widget = Button(text='spam', command=handler)
To register a handler for button click events, we usually pass a callable object that takes
no arguments to the command keyword argument. Function names (and lambdas) work
here, and so do class-level methods—though they must be bound methods if they expect an instance when called:
class MyGui:
    def handler(self):
        ...use self.attr for state...
    def makewidgets(self):
        b = Button(text='spam', command=self.handler)
Here, the event handler is self.handler—a bound method object that remembers both
self and MyGui.handler. Because self will refer to the original instance when handler
is later invoked on events, the method will have access to instance attributes that can
retain state between events, as well as class-level methods. With simple functions, state
normally must be retained in global variables or enclosing function scopes instead.
See also the discussion of __call__ operator overloading in Chapter 30 for another way
to make classes compatible with function-based APIs, and lambda in Chapter 19 for
another tool often used in callback roles. As noted in the former of these, you don’t
generally need to wrap a bound method in a lambda; the bound method in the preceding
example already defers the call (note that there are no parentheses to trigger one), so
adding a lambda here would be pointless!
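To see the equivalence outside a GUI, here is a minimal sketch (the callback list stands in for a widget's registration machinery and is hypothetical): the bound method is already a deferred call, so the lambda adds nothing.

class MyGui:
    def __init__(self):
        self.clicks = 0
    def handler(self):                       # State lives on self between calls
        self.clicks += 1
        print('click', self.clicks)

gui = MyGui()
callbacks = [gui.handler,                    # Bound method: deferred already
             lambda: gui.handler()]          # Works too, but is redundant

for callback in callbacks:                   # Later: events fire
    callback()                               # click 1, click 2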
Classes Are Objects: Generic Object Factories
Sometimes, class-based designs require objects to be created in response to conditions
that can’t be predicted when a program is written. The factory design pattern allows
such a deferred approach. Due in large part to Python’s flexibility, factories can take
multiple forms, some of which don’t seem special at all.
Because classes are also “first class” objects, it’s easy to pass them around a program,
store them in data structures, and so on. You can also pass classes to functions that
generate arbitrary kinds of objects; such functions are sometimes called factories in
OOP design circles. Factories can be a major undertaking in a strongly typed language
such as C++ but are almost trivial to implement in Python.
For example, the call syntax we met in Chapter 18 can call any class with any number
of positional or keyword constructor arguments in one step to generate any sort of
instance:
def factory(aClass, *pargs, **kargs):            # Varargs tuple, dict
    return aClass(*pargs, **kargs)               # Call aClass (or apply in 2.X only)

class Spam:
    def doit(self, message):
        print(message)

class Person:
    def __init__(self, name, job=None):
        self.name = name
        self.job = job

object1 = factory(Spam)                          # Make a Spam object
object2 = factory(Person, "Arthur", "King")      # Make a Person object
object3 = factory(Person, name='Brian')          # Ditto, with keywords and default

2. Actually, this syntax can invoke any callable object, including functions, classes, and methods. Hence, the factory function here can also run any callable object, not just a class (despite the argument name). Also, as we learned in Chapter 18, Python 2.X has an alternative to aClass(*pargs, **kargs): the apply(aClass, pargs, kargs) built-in call, which has been removed in Python 3.X because of its redundancy and limitations.
In this code, we define an object generator function called factory. It expects to be passed a class object (any class will do) along with any number of arguments for the class's constructor. The function uses special “varargs” call syntax to call the passed-in class and return an instance.
The rest of the example simply defines two classes and generates instances of both by
passing them to the factory function. And that’s the only factory function you’ll ever
need to write in Python; it works for any class and any constructor arguments. If you
run this live (factory.py), your objects will look like this:
>>> object1.doit(99)
99
>>> object2.name, object2.job
('Arthur', 'King')
>>> object3.name, object3.job
('Brian', None)
By now, you should know that everything is a “first class” object in Python—including
classes, which are usually just compiler input in languages like C++. It’s natural to pass
them around this way. As mentioned at the start of this part of the book, though, only
objects derived from classes do full OOP in Python.
Why Factories?
So what good is the factory function (besides providing an excuse to illustrate first-
class class objects in this book)? Unfortunately, it’s difficult to show applications of
this design pattern without listing much more code than we have space for here. In
general, though, such a factory might allow code to be insulated from the details of
dynamically configured object construction.
For instance, recall the processor example presented in the abstract in Chapter 26, and
then again as a composition example earlier in this chapter. It accepts reader and writer
objects for processing arbitrary data streams. The original version of this example
manually passed in instances of specialized classes like FileWriter and SocketReader to
customize the data streams being processed; later, we passed in hardcoded file, stream,
and formatter objects. In a more dynamic scenario, external devices such as configuration files or GUIs might be used to configure the streams.
In such a dynamic world, we might not be able to hardcode the creation of stream
interface objects in our scripts, but might instead create them at runtime according to
the contents of a configuration file.
Such a file might simply give the string name of a stream class to be imported from a
module, plus an optional constructor call argument. Factory-style functions or code
might come in handy here because they would allow us to fetch and pass in classes that
are not hardcoded in our program ahead of time. Indeed, those classes might not even
have existed at all when we wrote our code:
classname = ...parse from config file...
classarg = ...parse from config file...
import streamtypes # Customizable code
aclass = getattr(streamtypes, classname) # Fetch from module
reader = factory(aclass, classarg) # Or aclass(classarg)
processor(reader, ...)
Here, the getattr built-in is again used to fetch a module attribute given a string name
(it’s like saying obj.attr, but attr is a string). Because this code snippet assumes a
single constructor argument, it doesn't strictly need factory—we could make an instance with just aclass(classarg). The factory function may prove more useful in the
presence of unknown argument lists, however, and the general factory coding pattern
can improve the code’s flexibility.
Multiple Inheritance: “Mix-in” Classes
Our last design pattern is one of the most useful, and will serve as a subject for a more
realistic example to wrap up this chapter and point toward the next. As a bonus, the
code we’ll write here may be a useful tool.
Many class-based designs call for combining disparate sets of methods. As we’ve seen,
in a class statement, more than one superclass can be listed in parentheses in the header
line. When you do this, you leverage multiple inheritance—the class and its instances
inherit names from all the listed superclasses.
When searching for an attribute, Python’s inheritance search traverses all superclasses
in the class header from left to right until a match is found. Technically, because any
of the superclasses may have superclasses of its own, this search can be a bit more
complex for larger class trees:
In classic classes (the default until Python 3.0), the attribute search in all cases
proceeds depth-first all the way to the top of the inheritance tree, and then from
left to right. This order is usually called DFLR, for its depth-first, left-to-right path.
In new-style classes (optional in 2.X and standard in 3.X), the attribute search is usually as before, but in diamond patterns proceeds across by tree levels before moving up, in a more breadth-first fashion. This order is usually called the new-style MRO, for method resolution order, though it's used for all attributes, not just methods.
The second of these search rules is explained fully in the new-style class discussion in
the next chapter. Though difficult to understand without the next chapter’s code (and
somewhat rare to create yourself), diamond patterns appear when multiple classes in a
tree share a common superclass; the new-style search order is designed to visit such a
shared superclass just once, and after all its subclasses. In either model, though, when
a class has multiple superclasses, they are searched from left to right according to the
order listed in the class statement header lines.
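For a quick preview of the difference (a minimal sketch with hypothetical class names; full details are in the next chapter), a new-style class reports its search order in its __mro__ attribute, which makes the diamond behavior visible:

class A:                                     # Diamond: B and C share superclass A
    attr = 'A'
class B(A): pass
class C(A):
    attr = 'C'
class D(B, C): pass                          # New-style: searches D, B, C, then A

print(D.__mro__)                             # (D, B, C, A, object) in 3.X
print(D().attr)                              # 'C' in 3.X; 'A' under classic DFLR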
In general, multiple inheritance is good for modeling objects that belong to more than
one set. For instance, a person may be an engineer, a writer, a musician, and so on, and
inherit properties from all such sets. With multiple inheritance, objects obtain the union of the behavior in all their superclasses. As we'll see ahead, multiple inheritance
also allows classes to function as general packages of mixable attributes.
Though a useful pattern, multiple inheritance’s chief downside is that it can pose a
conflict when the same method (or other attribute) name is defined in more than one
superclass. When this occurs, the conflict is resolved either automatically by the inheritance search order, or manually in your code:
Default: By default, inheritance chooses the first occurrence of an attribute it finds when an attribute is referenced normally—by self.method(), for example. In this mode, Python chooses the lowest and leftmost in classic classes, and in nondiamond patterns in all classes; new-style classes may choose an option to the right before one above in diamonds.
Explicit: In some class models, you may sometimes need to select an attribute explicitly by referencing it through its class name—with superclass.method(self), for instance. Your code breaks the conflict and overrides the search's default—to select an option to the right of or above the inheritance search's default.
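As a minimal sketch of the explicit form (with hypothetical class names), a subclass can break such a conflict by naming the superclass whose version it wants:

class Engineer:
    def describe(self): return 'engineer'

class Writer:
    def describe(self): return 'writer'

class Person(Engineer, Writer):              # Same name in both superclasses
    def describe(self):
        return Writer.describe(self)         # Override the leftmost default

print(Person().describe())                   # 'writer', not 'engineer'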
This is an issue only when the same name appears in multiple superclasses, and you do
not wish to use the first one inherited. Because this isn’t as common an issue in typical
Python code as it may sound, we’ll defer details on this topic until we study new-style
classes and their MRO and super tools in the next chapter, and revisit this as a “gotcha”
at the end of that chapter. First, though, the next section demonstrates a practical use
case for multiple inheritance-based tools.
Coding Mix-in Display Classes
Perhaps the most common way multiple inheritance is used is to “mix in” general-
purpose methods from superclasses. Such superclasses are usually called mix-in classes
—they provide methods you add to application classes by inheritance. In a sense, mix-
in classes are similar to modules: they provide packages of methods for use in their
client subclasses. Unlike simple functions in modules, though, methods in mix-in
classes also can participate in inheritance hierarchies, and have access to the self in-
stance for using state information and other methods in their trees.
For example, as we’ve seen, Python’s default way to print a class instance object isn’t
incredibly useful:
>>> class Spam:
        def __init__(self):                  # No __repr__ or __str__
            self.data1 = "food"
>>> X = Spam()
>>> print(X) # Default: class name + address (id)
<__main__.Spam object at 0x00000000029CA908> # Same in 2.X, but says "instance"
As you saw in both Chapter 28’s case study and Chapter 30’s operator overloading
coverage, you can provide a __str__ or __repr__ method to implement a custom string
representation of your own. But, rather than coding one of these in each and every class
you wish to print, why not code it once in a general-purpose tool class and inherit it in
all your classes?
That’s what mix-ins are for. Defining a display method in a mix-in superclass once
enables us to reuse it anywhere we want to see a custom display format—even in classes
that may already have another superclass. We’ve already seen tools that do related
work:
Chapter 28's AttrDisplay class formatted instance attributes in a generic __repr__ method, but it did not climb class trees and was utilized in single-inheritance mode only.
Chapter 29's classtree.py module defined functions for climbing and sketching class trees, but it did not display object attributes along the way and was not architected as an inheritable class.
Here, we're going to revisit these examples' techniques and expand upon them to code a set of three mix-in classes that serve as generic display tools for listing instance attributes, inherited attributes, and attributes on all objects in a class tree. We'll also use our tools in multiple-inheritance mode and deploy coding techniques that make classes better suited to use as generic tools.
Unlike Chapter 28, we’ll also code this with a __str__ instead of a __repr__. This is
partially a style issue and limits their role to print and str, but the displays we’ll be
developing will be rich enough to be categorized as more user-friendly than as-code.
This policy also leaves client classes the option of coding an alternative lower-level
display for interactive echoes and nested appearances with a __repr__. Using
__repr__ here would still allow an alternative __str__, but the nature of the displays
we’ll be implementing more strongly suggests a __str__ role. See Chapter 30 for a review
of these distinctions.
Listing instance attributes with __dict__
Let’s get started with the simple case—listing attributes attached to an instance. The
following class, coded in the file listinstance.py, defines a mix-in called ListInstance
that overloads the __str__ method for all classes that include it in their header lines.
Because this is coded as a class, ListInstance is a generic tool whose formatting logic
can be used for instances of any subclass client:
#!python
# File listinstance.py (2.X + 3.X)

class ListInstance:
    """
    Mix-in class that provides a formatted print() or str() of instances via
    inheritance of __str__ coded here; displays instance attrs only; self is
    instance of lowest class; __X names avoid clashing with client's attrs
    """
    def __attrnames(self):
        result = ''
        for attr in sorted(self.__dict__):
            result += '\t%s=%s\n' % (attr, self.__dict__[attr])
        return result

    def __str__(self):
        return '<Instance of %s, address %s:\n%s>' % (
            self.__class__.__name__,         # My class's name
            id(self),                        # My address
            self.__attrnames())              # name=value list

if __name__ == '__main__':
    import testmixin
    testmixin.tester(ListInstance)
All the code in this section runs in both Python 2.X and 3.X. A coding note: this code exhibits a classic comprehension pattern, and you could save some program real estate by implementing the __attrnames method here more concisely with a generator expression that is triggered by the string join method, but it's arguably less clear—expressions that wrap lines like this should generally make you consider simpler coding alternatives:
def __attrnames(self):
    return ''.join('\t%s=%s\n' % (attr, self.__dict__[attr])
                   for attr in sorted(self.__dict__))
ListInstance uses some previously explored tricks to extract the instance’s class name
and attributes:
Each instance has a built-in __class__ attribute that references the class from which
it was created, and each class has a __name__ attribute that references the name in
the header, so the expression self.__class__.__name__ fetches the name of an instance's class.
This class does most of its work by simply scanning the instance's attribute dictionary (remember, it's exported in __dict__) to build up a string showing the
names and values of all instance attributes. The dictionary’s keys are sorted to
finesse any ordering differences across Python releases.
In these respects, ListInstance is similar to Chapter 28’s attribute display; in fact, it’s
largely just a variation on a theme. Our class here uses two additional techniques,
though:
It displays the instance's memory address by calling the id built-in function, which
returns any object’s address (by definition, a unique object identifier, which will
be useful in later mutations of this code).
It uses the pseudoprivate naming pattern for its worker method: __attrnames. As
we learned earlier in this chapter, Python automatically localizes any such name
to its enclosing class by expanding the attribute name to include the class name (in
this case, it becomes _ListInstance__attrnames). This holds true for both class
attributes (like methods) and instance attributes attached to self. As noted in
Chapter 28’s first-cut version, this behavior is useful in a general tool like this, as
it ensures that its names don’t clash with any names used in its client subclasses.
Because ListInstance defines a __str__ operator overloading method, instances derived from this class display their attributes automatically when printed, giving a bit
more information than a simple address. Here is the class in action, in single-inheritance
mode, mixed in to the previous section’s class (this code works the same in both Python
3.X and 2.X, though 2.X default repr displays use the label “instance” instead of “object”):
>>> from listinstance import ListInstance
>>> class Spam(ListInstance):                # Inherit a __str__ method
        def __init__(self):
            self.data1 = 'food'
>>> x = Spam()
>>> print(x) # print() and str() run __str__
<Instance of Spam, address 43034496:
    data1=food
>
You can also fetch and save the listing output as a string without printing it with str,
and interactive echoes still use the default format because we've left __repr__ as an
option for clients:
>>> display = str(x) # Print this to interpret escapes
>>> display
'<Instance of Spam, address 43034496:\n\tdata1=food\n>'
>>> x # The __repr__ still is a default
<__main__.Spam object at 0x000000000290A780>
The ListInstance class is useful for any classes you write—even classes that already
have one or more superclasses. This is where multiple inheritance comes in handy: by
adding ListInstance to the list of superclasses in a class header (i.e., mixing it in), you
get its __str__ “for free” while still inheriting from the existing superclass(es). The file
testmixin0.py demonstrates with a first-cut testing script:
# File testmixin0.py

from listinstance import ListInstance        # Get lister tool class

class Super:
    def __init__(self):                      # Superclass __init__
        self.data1 = 'spam'                  # Create instance attrs
    def ham(self):
        pass

class Sub(Super, ListInstance):              # Mix in ham and a __str__
    def __init__(self):                      # Listers have access to self
        Super.__init__(self)
        self.data2 = 'eggs'                  # More instance attrs
        self.data3 = 42
    def spam(self):                          # Define another method here
        pass

if __name__ == '__main__':
    X = Sub()
    print(X)                                 # Run mixed-in __str__
Here, Sub inherits names from both Super and ListInstance; it’s a composite of its own
names and names in both its superclasses. When you make a Sub instance and print it,
you automatically get the custom representation mixed in from ListInstance (in this
case, this script’s output is the same under both Python 3.X and 2.X, except for object
addresses, which can naturally vary per process):
c:\code> python testmixin0.py
<Instance of Sub, address 44304144:
    data1=spam
    data2=eggs
    data3=42
>
This testmixin0 testing script works, but it hardcodes the tested class’s name in the
code, and makes it difficult to experiment with alternatives—as we will in a moment.
To be more flexible, we can borrow a page from Chapter 25’s module reloaders, and
pass in the object to be tested, as in the following improved test script, testmixin—the
one actually used by all the lister class modules’ self-test code. In this context the object
passed in to the tester is a mix-in class instead of a function, but the principle is similar:
everything qualifies as a passable “first class” object in Python:
#!python
# File testmixin.py (2.X + 3.X)
"""
Generic lister mixin tester: similar to transitive reloader in
Chapter 25, but passes a class object to tester (not function),
and testByNames adds loading of both module and class by name
strings here, in keeping with Chapter 31's factories pattern.
"""

import importlib

def tester(listerclass, sept=False):

    class Super:
        def __init__(self):                  # Superclass __init__
            self.data1 = 'spam'              # Create instance attrs
        def ham(self):
            pass

    class Sub(Super, listerclass):           # Mix in ham and a __str__
        def __init__(self):                  # Listers have access to self
            Super.__init__(self)
            self.data2 = 'eggs'              # More instance attrs
            self.data3 = 42
        def spam(self):                      # Define another method here
            pass

    instance = Sub()                         # Return instance with lister's __str__
    print(instance)                          # Run mixed-in __str__ (or via str(x))
    if sept: print('-' * 80)

def testByNames(modname, classname, sept=False):
    modobject = importlib.import_module(modname)      # Import by namestring
    listerclass = getattr(modobject, classname)       # Fetch attr by namestring
    tester(listerclass, sept)

if __name__ == '__main__':
    testByNames('listinstance', 'ListInstance', True)     # Test all three here
    testByNames('listinherited', 'ListInherited', True)
    testByNames('listtree', 'ListTree', False)
While it’s at it, this script also adds the ability to specify test module and class by name
string, and leverages this in its self-test code—an application of the factory pattern’s
mechanics described earlier. Here is the new script in action, being run by the lister
module that imports it to test its own class (with the same results in 2.X and 3.X again);
we can run the test script itself too, but that mode tests the two lister variants, which
we have yet to see (or code!):
c:\code> python listinstance.py
<Instance of Sub, address 43256968:
    data1=spam
    data2=eggs
    data3=42
>
c:\code> python testmixin.py
<Instance of Sub, address 43977584:
    data1=spam
    data2=eggs
    data3=42
>
...and tests of two other lister classes coming up...
The ListInstance class we’ve coded so far works in any class it’s mixed into because
self refers to an instance of the subclass that pulls this class in, whatever that may be.
Again, in a sense, mix-in classes are the class equivalent of modules—packages of
methods useful in a variety of clients. For example, here is ListInstance working again
in single-inheritance mode on a different class’s instances, loaded with import, and
displaying attributes assigned outside the class:
>>> import listinstance
>>> class C(listinstance.ListInstance): pass
>>> x = C()
>>> x.a, x.b, x.c = 1, 2, 3
>>> print(x)
<Instance of C, address 43230824:
    a=1
    b=2
    c=3
>
Besides the utility they provide, mix-ins optimize code maintenance, like all classes do.
For example, if you later decide to extend ListInstance’s __str__ to also print all the
class attributes that an instance inherits, you’re safe; because it’s an inherited method,
changing __str__ automatically updates the display of each subclass that imports the
class and mixes it in. And since it’s now officially “later,” let’s move on to the next
section to see what such an extension might look like.
Listing inherited attributes with dir
As it is, our ListInstance mix-in displays instance attributes only (i.e., names attached to the instance object itself). It's trivial to extend the class to display all the attributes accessible from an instance, though—both its own and those it inherits from its classes. The trick is to use the dir built-in function instead of scanning the instance's __dict__ dictionary; the latter holds instance attributes only, but the former also collects all inherited attributes in Python 2.2 and later.
The following mutation codes this scheme; I've coded this in its own module to facilitate simple testing, but if existing clients were to use this version instead they would
pick up the new display automatically (and recall from Chapter 25 that an import’s
as clause can rename a new version to a prior name being used):
#!python
# File listinherited.py (2.X + 3.X)

class ListInherited:
    """
    Use dir() to collect both instance attrs and names inherited from
    its classes; Python 3.X shows more names than 2.X because of the
    implied object superclass in the new-style class model; getattr()
    fetches inherited names not in self.__dict__; use __str__, not
    __repr__, or else this loops when printing bound methods!
    """
    def __attrnames(self):
        result = ''
        for attr in dir(self):                           # Instance dir()
            if attr[:2] == '__' and attr[-2:] == '__':   # Skip internals
                result += '\t%s\n' % attr
            else:
                result += '\t%s=%s\n' % (attr, getattr(self, attr))
        return result

    def __str__(self):
        return '<Instance of %s, address %s:\n%s>' % (
            self.__class__.__name__,                     # My class's name
            id(self),                                    # My address
            self.__attrnames())                          # name=value list

if __name__ == '__main__':
    import testmixin
    testmixin.tester(ListInherited)
Notice that this code skips __X__ names’ values; most of these are internal names that
we don’t generally care about in a generic listing like this. This version also must use
the getattr built-in function to fetch attributes by name string instead of using instance
attribute dictionary indexing—getattr employs the inheritance search protocol, and
some of the names we’re listing here are not stored on the instance itself.
To test the new version, run its file directly—it passes the class it defines to the testmixin.py file's test function to be used as a mix-in in a subclass. The output of this test and lister class varies per release, though, because dir results differ. In Python 2.X, we
get the following; notice the name mangling at work in the lister’s method name (I
truncated some of the full value displays to fit on this page):
c:\code> c:\python27\python listinherited.py
<Instance of Sub, address 35161352:
_ListInherited__attrnames=<bound method Sub.__attrnames of <test...more...>>
__doc__
__init__
__module__
__str__
data1=spam
data2=eggs
data3=42
ham=<bound method Sub.ham of <testmixin.Sub instance at 0x00000...more...>>
spam=<bound method Sub.spam of <testmixin.Sub instance at 0x00000...more...>>
>
In Python 3.X, more attributes are displayed because all classes are “new style” and
inherit names from the implied object superclass; more on this in Chapter 32. Because
so many names are inherited from the default superclass, I’ve omitted many here—
there are 32 in total in 3.3. Run this on your own for the full listing:
c:\code> c:\python33\python listinherited.py
<Instance of Sub, address 43253152:
_ListInherited__attrnames=<bound method Sub.__attrnames of <test...more...>>
__class__
__delattr__
__dict__
__dir__
__doc__
__eq__
...more names omitted 32 total...
__repr__
__setattr__
__sizeof__
__str__
__subclasshook__
__weakref__
data1=spam
data2=eggs
data3=42
ham=<bound method Sub.ham of <testmixin.tester.<locals>.Sub ...more...>>
spam=<bound method Sub.spam of <testmixin.tester.<locals>.Sub ...more...>>
>
As one possible improvement to address the proliferation of inherited built-in names
and long values here, the following alternative for __attrnames in the file listinherited2.py of the book's examples package groups the double-underscore names separately,
and minimizes line wrapping for large attribute values; notice how it escapes a % with
%% so that just one remains for the final formatting operation at the end:
def __attrnames(self, indent=' '*4):
    result = 'Unders%s\n%s%%s\nOthers%s\n' % ('-'*77, indent, '-'*77)
    unders = []
    for attr in dir(self):                           # Instance dir()
        if attr[:2] == '__' and attr[-2:] == '__':   # Skip internals
            unders.append(attr)
        else:
            display = str(getattr(self, attr))[:82-(len(indent) + len(attr))]
            result += '%s%s=%s\n' % (indent, attr, display)
    return result % ', '.join(unders)
With this change, the class’s test output is a bit more sophisticated, but also more
concise and usable:
c:\code> c:\python27\python listinherited2.py
<Instance of Sub, address 36299208:
Unders-----------------------------------------------------------------------------
__doc__, __init__, __module__, __str__
Others-----------------------------------------------------------------------------
_ListInherited__attrnames=<bound method Sub.__attrnames of <testmixin.Sub insta
data1=spam
data2=eggs
data3=42
ham=<bound method Sub.ham of <testmixin.Sub instance at 0x000000000229E1C8>>
spam=<bound method Sub.spam of <testmixin.Sub instance at 0x000000000229E1C8>>
>
c:\code> c:\python33\python listinherited2.py
<Instance of Sub, address 43318912:
Unders-----------------------------------------------------------------------------
__class__, __delattr__, __dict__, __dir__, __doc__, __eq__, __format__, __ge__,
__getattribute__, __gt__, __hash__, __init__, __le__, __lt__, __module__, __ne__,
__new__, __qualname__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__,
__str__, __subclasshook__, __weakref__
Others-----------------------------------------------------------------------------
_ListInherited__attrnames=<bound method Sub.__attrnames of <testmixin.tester.<l
data1=spam
data2=eggs
data3=42
ham=<bound method Sub.ham of <testmixin.tester.<locals>.Sub object at 0x0000000
spam=<bound method Sub.spam of <testmixin.tester.<locals>.Sub object at 0x00000
>
Display format is an open-ended problem (e.g., Python's standard pprint “pretty printer” module may offer options here too), so we'll leave further polishing as a suggested exercise. The tree lister of the next section may be more useful in any event.
Looping in __repr__: One caution here—now that we're displaying inherited methods too, we have to use __str__ instead of __repr__ to overload printing. With __repr__, this code will fall into recursive loops—displaying the value of a method triggers the __repr__ of the method's class, in order to display the class. That is, if the lister's __repr__ tries to display a method, displaying the method's class will trigger the lister's __repr__ again. Subtle, but true! Change __str__ to __repr__ here to see this for yourself. If you must use __repr__ in such a context, you can avoid the loops by using isinstance to compare the type of attribute values against types.MethodType in the standard library, to know which items to skip.
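Here is a minimal sketch of that workaround (a hypothetical lister, not one of this chapter's files), which tests attribute values with isinstance and skips methods rather than displaying them:

import types

class SafeRepr:
    def __repr__(self):                      # __repr__ is safe: methods skipped
        result = ''
        for attr in dir(self):
            if attr.startswith('__'):        # Skip internals for brevity
                continue
            value = getattr(self, attr)
            if isinstance(value, types.MethodType):
                result += '\t%s=<method>\n' % attr      # Don't repr bound methods
            else:
                result += '\t%s=%s\n' % (attr, value)
        return '<%s:\n%s>' % (self.__class__.__name__, result)

class Client(SafeRepr):
    def __init__(self): self.data = 'spam'
    def method(self): pass

print(repr(Client()))                        # No recursive loop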
Listing attributes per object in class trees
Let’s code one last extension. As it is, our latest lister includes inherited names, but
doesn’t give any sort of designation of the classes from which the names are acquired.
As we saw in the classtree.py example near the end of Chapter 29, though, it's straightforward to climb class inheritance trees in code. The following mix-in class, coded in
the file listtree.py, makes use of this same technique to display attributes grouped by
the classes they live in—it sketches the full physical class tree, displaying attributes
attached to each object along the way. The reader must still infer attribute inheritance,
but this gives substantially more detail than a simple flat list:
#!python
# File listtree.py (2.X + 3.X)

class ListTree:
    """
    Mix-in that returns an __str__ trace of the entire class tree and all
    its objects' attrs at and above self; run by print(), str() returns
    constructed string; uses __X attr names to avoid impacting clients;
    recurses to superclasses explicitly, uses str.format() for clarity;
    """
    def __attrnames(self, obj, indent):
        spaces = ' ' * (indent + 1)
        result = ''
        for attr in sorted(obj.__dict__):
            if attr.startswith('__') and attr.endswith('__'):
                result += spaces + '{0}\n'.format(attr)
            else:
                result += spaces + '{0}={1}\n'.format(attr, getattr(obj, attr))
        return result

    def __listclass(self, aClass, indent):
        dots = '.' * indent
        if aClass in self.__visited:
            return '\n{0}<Class {1}:, address {2}: (see above)>\n'.format(
                dots,
                aClass.__name__,
                id(aClass))
        else:
            self.__visited[aClass] = True
            here = self.__attrnames(aClass, indent)
            above = ''
            for super in aClass.__bases__:
                above += self.__listclass(super, indent+4)
            return '\n{0}<Class {1}, address {2}:\n{3}{4}{5}>\n'.format(
                dots,
                aClass.__name__,
                id(aClass),
                here, above,
                dots)

    def __str__(self):
        self.__visited = {}
        here = self.__attrnames(self, 0)
        above = self.__listclass(self.__class__, 4)
        return '<Instance of {0}, address {1}:\n{2}{3}>'.format(
            self.__class__.__name__,
            id(self),
            here, above)

if __name__ == '__main__':
    import testmixin
    testmixin.tester(ListTree)
This class achieves its goal by traversing the inheritance tree—from an instance’s
__class__ to its class, and then from the class’s __bases__ to all superclasses recursively,
scanning each object’s attribute __dict__ along the way. Ultimately, it concatenates
each tree portion’s string as the recursion unwinds.
It can take a while to understand recursive programs like this, but given the arbitrary shape and depth of class trees, we really have no choice here (apart from explicit stack equivalents of the sorts we met in Chapter 19 and Chapter 25, which tend to be no simpler, and which we'll omit here for space and time). This class is coded to keep its business as explicit as possible, though, to maximize clarity.
For example, you could replace the __listclass method’s loop statement in the first
of the following with the implicitly run generator expression in the second, but the
second seems unnecessarily convoluted in this context—recursive calls embedded in a
generator expression—and has no obvious performance advantage, especially given this
program’s limited scope (neither alternative makes a temporary list, though the first
may create more temporary results depending on the internal implementation of
strings, concatenation, and join—something you’d need to time with Chapter 21’s
tools to determine):
above = ''
for super in aClass.__bases__:
    above += self.__listclass(super, indent+4)

...or...

above = ''.join(
    self.__listclass(super, indent+4) for super in aClass.__bases__)
You could also code the else clause in __listclass like the following, as in the prior
edition of this book—an alternative that embeds everything in the format arguments
list; relies on the fact that the join call kicks off the generator expression and its recursive
calls before the format operation even begins building up the result text; and seems
more difficult to understand, despite the fact that I wrote it (never a good sign!):
self.__visited[aClass] = True
genabove = (self.__listclass(c, indent+4) for c in aClass.__bases__)
return '\n{0}<Class {1}, address {2}:\n{3}{4}{5}>\n'.format(
    dots,
    aClass.__name__,
    id(aClass),
    self.__attrnames(aClass, indent),        # Runs before format!
    ''.join(genabove),
    dots)
As always, explicit is better than implicit, and your code can be as big a factor in this
as the tools it uses.
Also notice how this version uses the Python 3.X and 2.6/2.7 string format method instead of % formatting expressions, in an effort to make substitutions arguably clearer; when many substitutions are applied like this, explicit argument numbers may make the code easier to decipher. In short, in this version we exchange the first of the following lines for the second:
return '<Instance of %s, address %s:\n%s%s>' % (...) # Expression
return '<Instance of {0}, address {1}:\n{2}{3}>'.format(...) # Method
This policy has an unfortunate downside in 3.2 and 3.3 too, but we have to run the
code to see why.
Running the tree lister
Now, to test, run this class’s module file as before; it passes the ListTree class to
testmixin.py to be mixed in with a subclass in the test function. The file’s tree-sketcher
output in Python 2.X is as follows:
c:\code> c:\python27\python listtree.py
<Instance of Sub, address 36690632:
_ListTree__visited={}
data1=spam
data2=eggs
data3=42
....<Class Sub, address 36652616:
__doc__
__init__
__module__
spam=<unbound method Sub.spam>
........<Class Super, address 36652712:
__doc__
__init__
__module__
ham=<unbound method Super.ham>
........>
........<Class ListTree, address 30795816:
_ListTree__attrnames=<unbound method ListTree.__attrnames>
_ListTree__listclass=<unbound method ListTree.__listclass>
__doc__
__module__
__str__
........>
....>
>
Notice in this output how methods are unbound now under 2.X, because we fetch them
from classes directly. In the previous section’s version they displayed as bound methods,
because ListInherited fetched these from instances with getattr instead (the first version indexed the instance __dict__ and did not display inherited methods on classes at
all). Also observe how the lister’s __visited table has its name mangled in the instance’s
attribute dictionary; unless we’re very unlucky, this won’t clash with other data there.
Some of the lister class’s methods are mangled for pseudoprivacy as well.
Under Python 3.X in the following, we again get extra attributes which may vary within
the 3.X line, and extra superclasses—as we’ll learn in the next chapter, all top-level
classes inherit from the built-in object class automatically in 3.X; Python 2.X classes
do so manually if they desire new-style class behavior. Also notice that the attributes
that were unbound methods in 2.X are simple functions in 3.X, as described earlier in
this chapter (and that again, I’ve deleted most built-in attributes in object to save space
here; run this on your own for the complete listing):
c:\code> c:\python33\python listtree.py
<Instance of Sub, address 44277488:
_ListTree__visited={}
data1=spam
data2=eggs
data3=42
....<Class Sub, address 36990264:
__doc__
__init__
__module__
__qualname__
spam=<function tester.<locals>.Sub.spam at 0x0000000002A3C840>
........<Class Super, address 36989352:
__dict__
__doc__
__init__
__module__
__qualname__
__weakref__
ham=<function tester.<locals>.Super.ham at 0x0000000002A3C730>
............<Class object, address 506770624:
__class__
__delattr__
__dir__
__doc__
__eq__
...more omitted: 22 total...
__repr__
__setattr__
__sizeof__
__str__
__subclasshook__
............>
........>
........<Class ListTree, address 36988440:
_ListTree__attrnames=<function ListTree.__attrnames at 0x0000000002A3C158>
_ListTree__listclass=<function ListTree.__listclass at 0x0000000002A3C1E0>
__dict__
__doc__
__module__
__qualname__
__str__
__weakref__
............<Class object:, address 506770624: (see above)>
........>
....>
>
This version avoids listing the same class object twice by keeping a table of classes
visited so far (this is why an object’s id is included—to serve as a key for a previously
displayed item in the report). Like the transitive module reloader of Chapter 25, a
dictionary works to avoid repeats in the output because class objects are hashable and
thus may be dictionary keys; a set would provide similar functionality.
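As a minimal sketch of the set-based alternative (a hypothetical variation, not the listing's actual code), the same membership logic works because class objects are hashable:

def classnames(aClass, visited=None):
    # Collect each class in a tree once, tracking visits in a set
    visited = set() if visited is None else visited
    if aClass in visited:                    # Same membership test as the dict
        return []
    visited.add(aClass)                      # Was: visited[aClass] = True
    result = [aClass.__name__]
    for superclass in aClass.__bases__:
        result += classnames(superclass, visited)
    return result

class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass

print(classnames(D))                         # ['D', 'B', 'A', 'object', 'C']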
Technically, cycles are not generally possible in class inheritance trees—a class must
already have been defined to be named as a superclass, and Python raises an exception
as it should if you attempt to create a cycle later by __bases__ changes—but the visited
mechanism here avoids relisting a class twice:
>>> class C: pass
>>> class B(C): pass
>>> C.__bases__ = (B,) # Deep, dark magic!
TypeError: a __bases__ item causes an inheritance cycle
Usage variation: Showing underscore name values
This version also takes care to avoid displaying large internal objects by skipping
__X__ names again. If you comment out the code that treats these names specially:
for attr in sorted(obj.__dict__):
    # if attr.startswith('__') and attr.endswith('__'):
    #     result += spaces + '{0}\n'.format(attr)
    # else:
    result += spaces + '{0}={1}\n'.format(attr, getattr(obj, attr))
then their values will display normally. Here’s the output in 2.X with this temporary
change made, giving the values of every attribute in the class tree:
c:\code> c:\python27\python listtree.py
<Instance of Sub, address 35750408:
_ListTree__visited={}
data1=spam
data2=eggs
data3=42
....<Class Sub, address 36353608:
__doc__=None
__init__=<unbound method Sub.__init__>
__module__=testmixin
spam=<unbound method Sub.spam>
........<Class Super, address 36353704:
__doc__=None
__init__=<unbound method Super.__init__>
__module__=testmixin
ham=<unbound method Super.ham>
........>
........<Class ListTree, address 31254568:
_ListTree__attrnames=<unbound method ListTree.__attrnames>
_ListTree__listclass=<unbound method ListTree.__listclass>
__doc__=
Mix-in that returns an __str__ trace of the entire class tree and all
its objects' attrs at and above self; run by print(), str() returns
Multiple Inheritance: “Mix-in” Classes | 971
www.it-ebooks.info
constructed string; uses __X attr names to avoid impacting clients;
recurses to superclasses explicitly, uses str.format() for clarity;
__module__=__main__
__str__=<unbound method ListTree.__str__>
........>
....>
>
This test’s output is much larger in 3.X and may justify isolating underscore names in
general as we did earlier. In fact, this test may not even work as is in some recent 3.X releases:
c:\code> c:\python33\python listtree.py
...etc...
File "listtree.py", line 18, in __attrnames
result += spaces + '{0}={1}\n'.format(attr, getattr(obj, attr))
TypeError: Type method_descriptor doesn't define __format__
I debated recoding to work around this issue, but it serves as a fair example of debugging
requirements and techniques in a dynamic open source project like Python. Per the
following note, the str.format call no longer supports certain object types that are the
values of built-in attribute names—yet another reason these names are probably better
skipped.
Debugging a str.format issue: In 3.X, running the commented-out ver-
sion works in 3.0 and 3.1, but there seems to be a bug, or at least a
regression, here in 3.2 and 3.3—these Pythons fail with an exception
because five built-in methods in object do not define a __format__ ex-
pected by str.format, and the default in object is apparently no longer
applied correctly in such cases with empty and generic formatting tar-
gets. To see this live, it’s enough to run simplified code that isolates the
problem:
c:\code> py -3.1
>>> '{0}'.format(object.__reduce__)
"<method '__reduce__' of 'object' objects>"
c:\code> py -3.3
>>> '{0}'.format(object.__reduce__)
TypeError: Type method_descriptor doesn't define __format__
Per both prior behavior and current Python documentation, empty tar-
gets like this are supposed to convert the object to its str print string
(see both the original PEP 3101 and the 3.3 language reference manual).
Oddly, the {0} and {0:s} string targets both now fail, but the {0!s}
forced str conversion target works, as does manual str preconversion
—apparently reflecting a change for a type-specific case that neglected
perhaps more common generic usage modes:
c:\code> py -3.3
>>> '{0:s}'.format(object.__reduce__)
TypeError: Type method_descriptor doesn't define __format__
>>> '{0!s}'.format(object.__reduce__)
"<method '__reduce__' of 'object' objects>"
>>> '{0}'.format(str(object.__reduce__))
"<method '__reduce__' of 'object' objects>"
To fix, wrap the format call in a try statement to catch the exception;
use % formatting expressions instead of the str.format method; use one
of the aforementioned still-working str.format usage modes and hope
it does not change too; or wait for a repair of this in a later 3.X release.
Here’s the recommended workaround using the tried-and-true % (it’s
also noticeably shorter, but I won’t repeat Chapter 7’s comparisons
here):
c:\code> py -3.3
>>> '%s' % object.__reduce__
"<method '__reduce__' of 'object' objects>"
To apply this in the tree lister’s code, change the first of these to its
follower:
result += spaces + '{0}={1}\n'.format(attr, getattr(obj, attr))
result += spaces + '%s=%s\n' % (attr, getattr(obj, attr))
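The try-statement option mentioned earlier isn’t shown in the lister itself; a minimal, self-contained sketch of that route might look like the following (the format_attr helper is hypothetical, not part of listtree.py):
def format_attr(spaces, attr, obj):
    # Hypothetical helper: try str.format first, and fall back to the
    # % expression on the method_descriptor TypeError seen in 3.2/3.3
    try:
        return spaces + '{0}={1}\n'.format(attr, getattr(obj, attr))
    except TypeError:
        return spaces + '%s=%s\n' % (attr, getattr(obj, attr))

print(format_attr('....', '__reduce__', object), end='')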
Python 2.X has the same regression in 2.7 but not 2.6—inherited from
the 3.2 change, apparently—but does not show object methods in this
chapter’s example. Since this example generates too much output in 3.X
anyhow, it’s a moot point here, but is a decent example of real-world
coding. Unfortunately, using newer features like str.format sometimes
puts your code in the awkward position of beta tester in the current 3.X
line!
Usage variation: Running on larger modules
For more fun, uncomment the underscore handler lines to enable them again, and try
mixing this class into something more substantial, like the Button class of Python’s
tkinter GUI toolkit module. In general, you’ll want to name ListTree first (leftmost)
in a class header, so its __str__ is picked up; Button has one, too, and the leftmost
superclass is always searched first in multiple inheritance.
The output of the following is fairly massive (20K characters and 330 lines in 3.X—and
38K if you forget to uncomment the underscore detection!), so run this code on your
own to see the full listing. Notice how our lister’s __visited dictionary attribute mixes
harmlessly with those created by tkinter itself. If you’re using Python 2.X, also recall
that you should use Tkinter for the module name instead of tkinter:
>>> from listtree import ListTree
>>> from tkinter import Button # Both classes have a __str__
>>> class MyButton(ListTree, Button): pass # ListTree first: use its __str__
>>> B = MyButton(text='spam')
>>> open('savetree.txt', 'w').write(str(B)) # Save to a file for later viewing
20513
>>> len(open('savetree.txt').readlines()) # Lines in the file
330
>>> print(B) # Print the display here
<Instance of MyButton, address 43363688:
_ListTree__visited={}
_name=43363688
_tclCommands=[]
_w=.43363688
children={}
master=.
...much more omitted...
>
>>> S = str(B) # Or print just the first part
>>> print(S[:1000])
Experiment arbitrarily on your own. The main point here is that OOP is all about code
reuse, and mix-in classes are a powerful example. Like almost everything else in pro-
gramming, multiple inheritance can be a useful device when applied well. In practice,
though, it is an advanced feature and can become complicated if used carelessly or
excessively. We’ll revisit this topic as a gotcha at the end of the next chapter.
Collector module
Finally, to make importing our tools even easier, we can provide a collector module
that combines them in a single namespace—importing just the following gives access
to all three lister mix-ins at once:
# File lister.py
# Collect all three listers in one module for convenience
from listinstance import ListInstance
from listinherited import ListInherited
from listtree import ListTree
Lister = ListTree # Choose a default lister
Importers can use the individual class names as is, or alias them to a common name
that their subclasses employ; which lister that name refers to can then be changed in
the import statement alone:
>>> import lister
>>> lister.ListInstance # Use a specific lister
<class 'listinstance.ListInstance'>
>>> lister.Lister # Use Lister default
<class 'listtree.ListTree'>
>>> from lister import Lister # Use Lister default
>>> Lister
<class 'listtree.ListTree'>
>>> from lister import ListInstance as Lister # Use Lister alias
>>> Lister
<class 'listinstance.ListInstance'>
Python often makes flexible tool APIs nearly automatic.
Room for improvement: MRO, slots, GUIs
Like most software, there’s much more we could do here. The following gives some
pointers on extensions you may wish to explore. Some are interesting projects, and two
serve as segues to the next chapter, but for space reasons they will have to remain in
the suggested-exercise category here.
General ideas: GUIs, built-ins
Grouping double-underscore names as we did earlier may help reduce the size of
the tree display, though some like __init__ are user-defined and may merit special
treatment. Sketching the tree in a GUI might be a natural next step too—the
tkinter toolkit that we utilized in the prior section’s lister examples ships with
Python and provides basic but easy support, and others offer richer but more com-
plex alternatives. See the notes at the end of Chapter 28’s case study for more
pointers in this department.
Physical trees versus inheritance: using the MRO (preview)
In the next chapter, we’ll also meet the new-style class model, which modifies the
search order for one special multiple inheritance case (diamonds). There, we’ll also
study the class.__mro__ new-style class object attribute—a tuple giving the class
tree search order used by inheritance, known as the new-style MRO.
As is, our ListTree tree lister sketches the physical shape of the inheritance tree,
and expects the viewer to infer from this where an attribute is inherited from. This
was its goal, but a general object viewer might also use the MRO tuple to auto-
matically associate an attribute with the class from which it is inherited—by scan-
ning the new-style MRO (or the classic classes’ DFLR ordering) for each inherited
attribute in a dir result, we can simulate Python’s inheritance search, and map
attributes to their source objects in the physical class tree displayed (see the
sketch following this list for a first cut).
In fact, we will write code that comes very close to this idea in the next chapter’s
mapattrs module, and reuse this example’s test classes there to demonstrate the
idea, so stay tuned for an epilogue to this story. This might be used instead of or
in addition to displaying attribute physical locations in __attrnames here; both
forms might be useful data for programmers to see. This approach is also one way
to deal with slots, the topic of the next note.
Virtual data: slots, properties, and more (preview)
Because they scan instance __dict__ namespace dictionaries, the ListInstance and
ListTree classes presented here raise some subtle design issues. In Python classes,
some names associated with instance data may not be stored at the instance itself.
This includes topics presented in the next chapter such as new-style properties,
slots, and descriptors, but also attributes dynamically computed in all classes with
tools like __getattr__. None of these “virtual” attributes’ names are stored in an
instance’s namespace dictionary, so none will be displayed as part of an instance’s
own data.
Of these, slots seem the most strongly associated with an instance; they store data
on instances, even though their names don’t appear in instance namespace dic-
tionaries. Properties and descriptors are associated with instances too, but they
don’t reserve space in the instance, their computed nature is much more explicit,
and they may seem closer to class-level methods than instance data.
As we’ll see in the next chapter, slots function like instance attributes, but are
created and managed by automatically created items in classes. They are a relatively
infrequently used new-style class option, where instance attributes are declared in
a __slots__ class attribute, and not physically stored in an instance’s __dict__; in
fact, slots may suppress a __dict__ entirely. Because of this, tools that display in-
stances by scanning their namespaces alone won’t directly associate the instance
with attributes stored in slots. As is, ListTree displays slots as class attributes
wherever they appear (though not at the instance), and ListInstance doesn’t dis-
play them at all.
Though this will make more sense after we study this feature in the next chapter,
it impacts code here and similar tools. For example, if in testmixin.py we assign
__slots__=['data1'] in Super and __slots__=['data3'] in Sub, only the data2 at-
tribute is displayed in the instance by these two lister classes. ListTree also displays
data1 and data3, but as attributes of the Super and Sub class objects and with a
special format for their values (technically, they are class-level descriptors, another
new-style tool introduced in the next chapter).
As the next chapter will explain, to show slot attributes as instance names, tools
generally need to use dir to get a list of all attributes—both physically present and
inherited—and then use either getattr to fetch their values from the instance, or
fetch values from their inheritance source via __dict__ in tree scans and accept the
display of the implementations of some at classes. Because dir includes the names
of inherited “virtual” attributes—including both slots and properties—they would
be included in the instance set. As we’ll also find, the MRO might assist here to
map dir attributes to their sources (again, see the sketch after this list), or restrict
instance displays to names coded in user-defined classes by filtering out names
inherited from the built-in object.
ListInherited is immune to most of this, because it already displays the full dir
results set, which include both __dict__ names and all classes’ __slots__ names,
though its display is of marginal use as is. A ListTree variant using the dir technique
along with the MRO sequence to map attributes to classes would apply to slots
too, because slots-based names appear in a class’s __dict__ results individually as
slot management tools, though not in the instance __dict__.
Alternatively, as a policy we could simply let our code handle slot-based attributes
as it currently does, rather than complicating it for a rarely used, advanced feature
that’s even questionable practice today. Slots and normal instance attributes are
different kinds of names. In fact, displaying slots names as attributes of classes
instead of instances is technically more accurate—as we’ll see in the next chapter
their implementation is at classes, though their space is at instances.
Ultimately, attempting to collect all the “virtual” attributes associated with a class
may be a bit of a pipe dream anyhow. Techniques like those outlined here may
address slots and properties, but some attributes are entirely dynamic, with no
physical basis at all: those computed on fetch by generic methods such as
__getattr__ are not data in the classic sense. Tools that attempt to display data in a
language as wildly dynamic as Python must come with the caveat that some data is
ethereal at best!
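To make the dir-plus-MRO idea in the preceding notes concrete, here is a minimal sketch (not the next chapter’s mapattrs module) that assumes new-style classes and guesses each dir name’s source by scanning __mro__ class namespaces; notice that slot names show up at their defining classes this way:
class Super:
    __slots__ = ['data1']                    # Slot: implemented at the class
    def ham(self): pass

class Sub(Super):
    def __init__(self):
        self.data1 = 'spam'                  # Stored via Super's slot descriptor

def attrsources(instance):
    # Map each dir() name to the first class in the MRO whose namespace
    # defines it; names defined by no class are tagged as instance data
    sources = {}
    for name in dir(instance):
        for klass in type(instance).__mro__:
            if name in vars(klass):
                sources[name] = klass
                break
        else:
            sources[name] = instance
    return sources

sources = attrsources(Sub())
print(sources['data1'])                      # <class '__main__.Super'>
print(sources['ham'])                        # <class '__main__.Super'>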
We’ll also make a minor extension to this section’s code in the exercises at the end of
this part of the book, to list superclass names in parentheses at the start of instance
displays, so keep it filed for future reference for now. To better understand the last of
the preceding two points, we need to wrap up this chapter and move on to the next
and last in this part of the book.
Other Design-Related Topics
In this chapter, we’ve studied inheritance, composition, delegation, multiple inheri-
tance, bound methods, and factories—all common patterns used to combine classes
in Python programs. We’ve really only scratched the surface here in the design patterns
domain, though. Elsewhere in this book you’ll find coverage of other design-related
topics, such as:
Abstract superclasses (Chapter 29)
Decorators (Chapter 32 and Chapter 39)
Type subclasses (Chapter 32)
Static and class methods (Chapter 32)
Managed attributes (Chapter 32 and Chapter 38)
Metaclasses (Chapter 32 and Chapter 40)
For more details on design patterns, though, we’ll delegate to other resources on OOP
at large. Although patterns are important in OOP work and are often more natural in
Python than in other languages, they are not specific to Python itself, and are a subject
that’s often best acquired by experience.
Chapter Summary
In this chapter, we sampled common ways to use and combine classes to optimize their
reusability and factoring benefits—what are usually considered design issues that are
often independent of any particular programming language (though Python can make
them easier to implement). We studied delegation (wrapping objects in proxy classes),
composition (controlling embedded objects), and inheritance (acquiring behavior from
other classes), as well as some more esoteric concepts such as pseudoprivate attributes,
multiple inheritance, bound methods, and factories.
The next chapter ends our look at classes and OOP by surveying more advanced class-
related topics. Some of its material may be of more interest to tool writers than appli-
cation programmers, but it still merits a review by most people who will do OOP in
Python—if not for your code, then for the code of others you may need to understand.
First, though, here’s another quick chapter quiz to review.
Test Your Knowledge: Quiz
1. What is multiple inheritance?
2. What is delegation?
3. What is composition?
4. What are bound methods?
5. What are pseudoprivate attributes used for?
Test Your Knowledge: Answers
1. Multiple inheritance occurs when a class inherits from more than one superclass;
it’s useful for mixing together multiple packages of class-based code. The left-to-
right order in class statement headers determines the general order of attribute
searches.
2. Delegation involves wrapping an object in a proxy class, which adds extra behavior
and passes other operations to the wrapped object. The proxy retains the interface
of the wrapped object.
3. Composition is a technique whereby a controller class embeds and directs a num-
ber of objects, and provides an interface all its own; it’s a way to build up larger
structures with classes.
4. Bound methods combine an instance and a method function; you can call them
without passing in an instance object explicitly because the original instance is still
available.
5. Pseudoprivate attributes (whose names begin but do not end with two leading
underscores: __X) are used to localize names to the enclosing class. This includes
both class attributes like methods defined inside the class, and self instance at-
tributes assigned inside the class’s methods. Such names are expanded to include
the class name, which makes them generally unique.
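For instance, a quick interactive sketch of the expansion (the class and attribute names here are arbitrary):
>>> class Mixin:
        def tag(self): self.__visited = True     # Expands to _Mixin__visited

>>> X = Mixin()
>>> X.tag()
>>> X._Mixin__visited                            # Localized to the Mixin class
True
>>> X.__visited
AttributeError: 'Mixin' object has no attribute '__visited'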
CHAPTER 32
Advanced Class Topics
This chapter concludes our look at OOP in Python by presenting a few more advanced
class-related topics: we will survey subclassing built-in types, “new style” class changes
and extensions, static and class methods, slots and properties, function and class dec-
orators, the MRO and the super call, and more.
As we’ve seen, Python’s OOP model is, at its core, relatively simple, and some of the
topics presented in this chapter are so advanced and optional that you may not en-
counter them very often in your Python applications-programming career. In the in-
terest of completeness, though—and because you never know when an “advanced”
topic may crop up in code you use—we’ll round out our discussion of classes with a
brief look at these advanced tools for OOP work.
As usual, because this is the last chapter in this part of the book, it ends with a section
on class-related “gotchas,” and the set of lab exercises for this part. I encourage you to
work through the exercises to help cement the ideas we’ve studied here. I also suggest
working on or studying larger OOP Python projects as a supplement to this book. As
with much in computing, the benefits of OOP tend to become more apparent with
practice.
Content notes: This chapter collects advanced class topics, but some are
too large for this chapter to cover well. Topics such as properties, de-
scriptors, decorators, and metaclasses are mentioned only briefly here,
and given a fuller treatment in the final part of this book, after excep-
tions. Be sure to look ahead for more complete examples and extended
coverage of some of the subjects that fall into this chapter’s category.
You’ll also notice that this is the largest chapter in this book—I’m as-
suming that readers courageous enough to take on this chapter’s topics
are ready to roll up their sleeves and explore its in-depth coverage. If
you’re not looking for advanced OOP topics, you may wish to skip
ahead to chapter-end materials, and come back here when you confront
these tools in the code of your programming future.
Extending Built-in Types
Besides implementing new kinds of objects, classes are sometimes used to extend the
functionality of Python’s built-in types to support more exotic data structures. For
instance, to add queue insert and delete methods to lists, you can code classes that wrap
(embed) a list object and export insert and delete methods that process the list specially,
like the delegation technique we studied in Chapter 31. As of Python 2.2, you can also
use inheritance to specialize built-in types. The next two sections show both techniques
in action.
Extending Types by Embedding
Do you remember those set functions we wrote in Chapter 16 and Chapter 18? Here’s
what they look like brought back to life as a Python class. The following example (the
file setwrapper.py) implements a new set object type by moving some of the set func-
tions to methods and adding some basic operator overloading. For the most part, this
class just wraps a Python list with extra set operations. But because it’s a class, it also
supports multiple instances and customization by inheritance in subclasses. Unlike our
earlier functions, using classes here allows us to make multiple self-contained set ob-
jects with preset data and behavior, rather than passing lists into functions manually:
class Set:
    def __init__(self, value = []):          # Constructor
        self.data = []                       # Manages a list
        self.concat(value)

    def intersect(self, other):              # other is any sequence
        res = []                             # self is the subject
        for x in self.data:
            if x in other:                   # Pick common items
                res.append(x)
        return Set(res)                      # Return a new Set

    def union(self, other):                  # other is any sequence
        res = self.data[:]                   # Copy of my list
        for x in other:                      # Add items in other
            if not x in res:
                res.append(x)
        return Set(res)

    def concat(self, value):                 # value: list, Set...
        for x in value:                      # Removes duplicates
            if not x in self.data:
                self.data.append(x)

    def __len__(self): return len(self.data)                  # len(self), if self
    def __getitem__(self, key): return self.data[key]         # self[i], self[i:j]
    def __and__(self, other): return self.intersect(other)    # self & other
    def __or__(self, other): return self.union(other)         # self | other
    def __repr__(self): return 'Set:' + repr(self.data)       # print(self),...
    def __iter__(self): return iter(self.data)                # for x in self,...
To use this class, we make instances, call methods, and run defined operators as usual:
from setwrapper import Set
x = Set([1, 3, 5, 7])
print(x.union(Set([1, 4, 7]))) # prints Set:[1, 3, 5, 7, 4]
print(x | Set([1, 4, 6])) # prints Set:[1, 3, 5, 7, 4, 6]
Overloading operations such as indexing and iteration also enables instances of our
Set class to often masquerade as real lists. Because you will interact with and extend
this class in an exercise at the end of this chapter, I won’t say much more about this
code until Appendix D.
Extending Types by Subclassing
Beginning with Python 2.2, all the built-in types in the language can now be subclassed
directly. Type-conversion functions such as list, str, dict, and tuple have become
built-in type names—although transparent to your script, a type-conversion call (e.g.,
list('spam')) is now really an invocation of a type’s object constructor.
This change allows you to customize or extend the behavior of built-in types with user-
defined class statements: simply subclass the new type names to customize them. In-
stances of your type subclasses can generally be used anywhere that the original built-
in type can appear. For example, suppose you have trouble getting used to the fact that
Python list offsets begin at 0 instead of 1. Not to worry—you can always code your
own subclass that customizes this core behavior of lists. The file typesubclass.py shows
how:
# Subclass built-in list type/class
# Map 1..N to 0..N-1; call back to built-in version.
class MyList(list):
    def __getitem__(self, offset):
        print('(indexing %s at %s)' % (self, offset))
        return list.__getitem__(self, offset - 1)

if __name__ == '__main__':
    print(list('abc'))
    x = MyList('abc')                        # __init__ inherited from list
    print(x)                                 # __repr__ inherited from list
    print(x[1])                              # MyList.__getitem__
    print(x[3])                              # Customizes list superclass method
    x.append('spam'); print(x)               # Attributes from list superclass
    x.reverse(); print(x)
In this file, the MyList subclass extends the built-in list’s __getitem__ indexing method
only, to map indexes 1 to N back to the required 0 to N−1. All it really does is decrement
the submitted index and call back to the superclass’s version of indexing, but it’s
enough to do the trick:
% python typesubclass.py
['a', 'b', 'c']
['a', 'b', 'c']
(indexing ['a', 'b', 'c'] at 1)
a
(indexing ['a', 'b', 'c'] at 3)
c
['a', 'b', 'c', 'spam']
['spam', 'c', 'b', 'a']
This output also includes tracing text the class prints on indexing. Of course, whether
changing indexing this way is a good idea in general is another issue—users of your
MyList class may very well be confused by such a core departure from Python sequence
behavior! The ability to customize built-in types this way can be a powerful asset,
though.
For instance, this coding pattern gives rise to an alternative way to code a set—as a
subclass of the built-in list type, rather than a standalone class that manages an em-
bedded list object as shown in the prior section. As we learned in Chapter 5, Python
today comes with a powerful built-in set object, along with literal and comprehension
syntax for making new sets. Coding one yourself, though, is still a great way to learn
about type subclassing in general.
The following class, coded in the file setsubclass.py, customizes lists to add just methods
and operators related to set processing. Because all other behavior is inherited from the
built-in list superclass, this makes for a shorter and simpler alternative—everything
not defined here is routed to list directly:
from __future__ import print_function # 2.X compatibility
class Set(list):
    def __init__(self, value = []):          # Constructor
        list.__init__(self)                  # Customizes list (note: self, not [])
        self.concat(value)                   # Copies mutable defaults

    def intersect(self, other):              # other is any sequence
        res = []                             # self is the subject
        for x in self:
            if x in other:                   # Pick common items
                res.append(x)
        return Set(res)                      # Return a new Set

    def union(self, other):                  # other is any sequence
        res = Set(self)                      # Copy me and my list
        res.concat(other)
        return res

    def concat(self, value):                 # value: list, Set, etc.
        for x in value:                      # Removes duplicates
            if not x in self:
                self.append(x)

    def __and__(self, other): return self.intersect(other)
    def __or__(self, other): return self.union(other)
    def __repr__(self): return 'Set:' + list.__repr__(self)

if __name__ == '__main__':
    x = Set([1,3,5,7])
    y = Set([2,1,4,5,6])
    print(x, y, len(x))
    print(x.intersect(y), y.union(x))
    print(x & y, x | y)
    x.reverse(); print(x)
Here is the output of the self-test code at the end of this file. Because subclassing core
types is a somewhat advanced feature with a limited target audience, I’ll omit further
details here, but I invite you to trace through these results in the code to study its
behavior (which is the same on Python 3.X and 2.X):
% python setsubclass.py
Set:[1, 3, 5, 7] Set:[2, 1, 4, 5, 6] 4
Set:[1, 5] Set:[2, 1, 4, 5, 6, 3, 7]
Set:[1, 5] Set:[1, 3, 5, 7, 2, 4, 6]
Set:[7, 5, 3, 1]
There are more efficient ways to implement sets with dictionaries in Python, which
replace the nested linear search scans in the set implementations shown here with more
direct dictionary index operations (hashing) and so run much quicker. For more details,
see the continuation of this thread in the follow-up book Programming Python. Again,
if you’re interested in sets, also take another look at the set object type we explored in
Chapter 5; this type provides extensive set operations as built-in tools. Set implemen-
tations are fun to experiment with, but they are no longer strictly required in Python
today.
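As a rough sketch of the dictionary-based idea (not Programming Python’s version, and with only intersection shown), members can be stored as dictionary keys so that membership tests run as hashed lookups instead of list scans:
class DictSet:
    def __init__(self, value=[]):
        self.data = {}                            # Members are dict keys
        for x in value:
            self.data[x] = True
    def intersect(self, other):
        return DictSet([x for x in self.data if x in other])
    def __contains__(self, x):
        return x in self.data                     # Hashed, not a linear scan
    def __repr__(self):
        return 'DictSet:' + repr(list(self.data))

x = DictSet([1, 3, 5, 7])
print(x.intersect(DictSet([1, 4, 7])))            # DictSet:[1, 7]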
For another type subclassing example, explore the implementation of the bool type in
Python 2.3 and later. As mentioned earlier in the book, bool is a subclass of int with
two instances (True and False) that behave like the integers 1 and 0 but inherit custom
string-representation methods that display their names.
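You can verify this at the interactive prompt; the following behavior should hold in any recent Python:
>>> issubclass(bool, int)                # bool is a subclass of int
True
>>> isinstance(True, int)                # True and False are its instances
True
>>> True + 1                             # They behave like 1 and 0...
2
>>> str(True), str(False)                # ...but display their names
('True', 'False')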
The “New Style” Class Model
In release 2.2, Python introduced a new flavor of classes, known as new-style classes;
classes following the original and traditional model became known as classic classes
when compared to the new kind. In 3.X the class story has merged, but it remains split
for Python 2.X users and code:
In Python 3.X, all classes are automatically what were formerly called “new style,”
whether they explicitly inherit from object or not. Coding the object superclass is
optional and implied.
In Python 2.X, classes must explicitly inherit from object (or another built-in type)
to be considered “new style” and enable and obtain all new-style behavior. Classes
without this are “classic.”
Because all classes are automatically new-style in 3.X, the features of new-style classes
are simply normal class features in that line. I’ve opted to keep their descriptions in this
section separate, however, in deference to users of Python 2.X code—classes in such
code acquire new-style features and behavior only when they are derived from object.
In other words, when Python 3.X users see descriptions of “new style” topics in this
book, they should take them to be descriptions of existing properties of their classes.
For 2.X readers, these are a set of optional changes and extensions that you may choose
to enable or not, unless the code you must use already employs them.
In Python 2.X, the identifying syntactic difference for new-style classes is that they are
derived from either a built-in type, such as list, or a special built-in class known as
object. The built-in name object is provided to serve as a superclass for new-style
classes if no other built-in type is appropriate to use:
class newstyle(object):                      # 2.X explicit new-style derivation
    ...normal class code...                  # Not required in 3.X: automatic
Any class derived from object, or any other built-in type, is automatically treated as a
new-style class. That is, as long as a built-in type is somewhere in its superclass tree, a
2.X class acquires new-style class behavior and extensions. Classes not derived from
built-ins such as object are considered classic.
Just How New Is New-Style?
As we’ll see, new-style classes come with profound differences that impact programs
broadly, especially when code leverages their added advanced features. In fact, at least
in terms of its OOP support, these changes on some levels transform Python into a
different language altogether—one that’s mandated in the 3.X line, one that’s optional
in 2.X only if ignored by every programmer, and one that borrows much more from
(and is often as complex as) other languages in this domain.
New-style classes stem in part from an attempt to merge the notion of class with that
of type around the time of Python 2.2, though they went unnoticed by many until they
were escalated to required knowledge in 3.X. You’ll need to judge the success of that
merging for yourself, but as we’ll see, there are still distinctions in the model—now
between class and metaclass—and one of its side effects is to make normal classes more
powerful but also substantially more complex. The new-style inheritance algorithm
formalized in Chapter 40, for example, grows in complexity by at least a factor of 2.
Still, some programmers using straightforward application code may notice only slight
divergence from traditional “classic” classes. After all, we’ve managed to get to this
point in this book writing substantial class examples, with mostly just passing mentions
of this change. Moreover, the classic class model still available in 2.X works exactly as
it has for some two decades.1

1. As a data point, the book Programming Python, a 1,600-page applications programming follow-up to this book that uses 3.X exclusively, neither uses nor needs to accommodate any of the new-style class tools of this chapter, and still manages to build significant programs for GUIs, websites, systems programming, databases, and text. It’s mostly straightforward code that leverages built-in types and libraries to do its work, not obscure and esoteric OOP extensions. When it does use classes, they are relatively simple, providing structure and code factoring. That book’s code is also probably more representative of real-world programming than some in this language tutorial text—which suggests that many of Python’s advanced OOP tools may be artificial, having more to do with language design than practical program goals. Then again, that book has the luxury of restricting its toolset to such code; as soon as your coworker finds a way to use an arcane language feature, all bets are off!
However, because they modify core class behaviors, new-style classes had to be intro-
duced in Python 2.X as a distinct tool so as to avoid impacting any existing code that
depends on the prior model. For example, some subtle differences, such as diamond
pattern inheritance search and the interaction of built-in operations and managed at-
tribute methods such as __getattr__ can cause some existing code to fail if left un-
changed. Using optional extensions in the new model such as slots can have the same
effect.
The class model split is removed in Python 3.X, which mandates new-style classes, but
it still exists for readers using 2.X, or reusing the vast amount of existing 2.X code in
production use. Because this has been an optional extension in 2.X, code written for
that line may use either class model.
The next two top-level sections provide overviews of the ways in which new-style
classes differ and the new tools they provide. These topics represent potential changes
to some Python 2.X readers, but simply additional advanced class topics to many
Python 3.X readers. If you’re in the latter group, you’ll find full coverage here, though
some of it is presented in the context of changes—which you can accept as features,
but only if you never must deal with any of the millions of lines of existing 2.X code.
New-Style Class Changes
New-style classes differ from classic classes in a number of ways, some of which are
subtle but can impact both existing 2.X code and common coding styles. As preview
and summary, here are some of the most prominent ways they differ:
Attribute fetch for built-ins: instance skipped
The __getattr__ and __getattribute__ generic attribute interception methods are
still run for attributes accessed by explicit name, but no longer for attributes im-
plicitly fetched by built-in operations. They are not called for __X__ operator over-
loading method names in built-in contexts only—the search for such names begins
at classes, not instances. This breaks or complicates objects that serve as proxies
for another object’s interface, if wrapped objects implement operator overloading.
Such methods must be redefined for the sake of differing built-ins dispatch in new-style classes.
Classes and types merged: type testing
Classes are now types, and types are now classes. In fact, the two are essentially
synonyms, though the metaclasses that now subsume types are still somewhat dis-
tinct from normal classes. The type(I) built-in returns the class an instance is made
from, instead of a generic instance type, and is normally the same as I.__class__.
Moreover, classes are instances of the type class, and type may be subclassed to
customize class creation with metaclasses coded with class statements. This can
impact code that tests types or otherwise relies on the prior type model.
Automatic object root class: defaults
All new-style classes (and hence types) inherit from object, which comes with a
small set of default operator overloading methods (e.g., __repr__). In 3.X, this class
is added automatically above the user-defined root (i.e., topmost) classes in a tree,
and need not be listed as a superclass explicitly. This can affect code that assumes
the absence of method defaults and root classes.
Inheritance search order: MRO and diamonds
Diamond patterns of multiple inheritance have a slightly different search order—
roughly, at diamonds they are searched across before up, and more breadth-first
than depth-first. This attribute search order, known as the MRO, can be traced
with a new __mro__ attribute available on new-style classes (sketched briefly after
this list). The new search order
largely applies only to diamond class trees, though the new model’s implied
object root itself forms a diamond in all multiple inheritance trees. Code that relies
on the prior order will not work the same.
Inheritance algorithm: Chapter 40
The algorithm used for inheritance in new-style classes is substantially more com-
plex than the depth-first model of classic classes, incorporating special cases for
descriptors, metaclasses, and built-ins. We won’t be able to formalize this until
Chapter 40 after we’ve studied metaclasses and descriptors in more depth, but it
can impact code that does not anticipate its extra convolutions.
New advanced tools: code impacts
New-style classes have a set of new class tools, including slots, properties, descrip-
tors, super, and the __getattribute__ method. Most of these have very specific
tool-building purposes. Their use can also impact or break existing code, though;
slots, for example, sometimes prevent creation of an instance namespace dictionary
altogether, and generic attribute handlers may require different coding.
We’ll explore the extensions noted in the last of these items in a later top-level section
of its own, and will defer formal inheritance algorithm coverage until Chapter 40 as
noted. Because the other items on this list have the potential to break traditional Python
code, though, let’s take a closer look at each in turn here.
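First, as a quick preview of the MRO item above, here is a minimal sketch run under 3.X (in 2.X these classes would need to derive from object explicitly to produce the same result):
>>> class A: pass                        # A diamond: D's two superclasses
>>> class B(A): pass                     # both lead back to the same A
>>> class C(A): pass
>>> class D(B, C): pass
>>> D.__mro__                            # Across before up: B, then C, then A
(<class '__main__.D'>, <class '__main__.B'>, <class '__main__.C'>,
 <class '__main__.A'>, <class 'object'>)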
Content note: Keep in mind that new-style class changes apply to both
3.X and 2.X, even though they are an option in the latter. This chapter
and book sometimes label features as 3.X changes to contrast with tra-
ditional 2.X code, but some are technically introduced by new-style
classes—which are mandated in 3.X, but can show up in 2.X code too.
For space, this distinction is called out often but not dogmatically here.
Complicating this distinction, some 3.X class-related changes owe to
new-style classes (e.g., skipping __getattr__ for operator methods) but
some do not (e.g., replacing unbound methods with functions). More-
over, many 2.X programmers stick to classic classes, ignoring what they
view as a 3.X feature. New-style classes are not new, though, and apply
to both Pythons—if they appear in 2.X code, they’re required reading
for 2.X users too.
Attribute Fetch for Built-ins Skips Instances
We introduced this new-style class change in sidebars in both Chapter 28 and Chap-
ter 31 because of their impact on prior examples and topics. In new-style classes (and
hence all classes in 3.X), the generic instance attribute interception methods
__getattr__ and __getattribute__ are no longer called by built-in operations for __X__ op-
erator overloading method names—the search for such names begins at classes, not
instances. Attributes accessed by explicit name, however, are routed through these
methods, even if they are __X__ names. Hence, this is primarily a change to the behavior
of built-in operations.
More formally, if a class defines a __getitem__ index overload method and X is an in-
stance of this class, then an index expression like X[I] is roughly equivalent to
X.__getitem__(I) for classic classes, but type(X).__getitem__(X, I) for new-style classes—the
latter beginning its search in the class, and thus skipping a __getattr__ step from the
instance for an undefined name.
Technically, this method search for built-in operations like X[I] uses normal inheri-
tance beginning at the class level, and inspects only the namespace dictionaries of all
the classes from which X derives—a distinction that can matter in the metaclass model
we’ll meet later in this chapter and focus on in Chapter 40, where classes may acquire
behavior differently. The instance, however, is omitted by built-ins’ search.
Why the lookup change?
You can find formal rationales for this change elsewhere; this book is disinclined to
parrot justifications for a change that breaks many working programs. But this is im-
agined as both an optimization path and a solution to a seemingly obscure call pat-
tern issue. The former rationale is supported by the frequency of built-in operations. If
every +, for example, requires extra steps at the instance, it can degrade program speed
—especially so given the new-style model’s many attribute-level extensions.
The latter rationale is more obscure, and is described in Python manuals; in short, it
reflects a conundrum introduced by the metaclass model. Because classes are now in-
stances of metaclasses, and because metaclasses can define built-in operator methods
to process the classes they generate, a method call run for a class must skip the class
itself and look one level higher to pick up a method that processes the class, rather than
selecting the class’s own version. Its own version would result in an unbound method
call, because the class’s own method processes lower instances. This is just the usual
unbound method model we discussed in the prior chapter, but is potentially aggravated
by the fact that classes can acquire type behavior from metaclasses too.
As a result, because classes are both types and instances in their own right, all instances
are skipped for built-in operation method lookup. This is supposedly applied to normal
instances for uniformity and consistency, but both non-built-in names and direct and
explicit calls to built-in names still check the instance anyhow. Though perhaps a con-
sequence of the new-style class model, to some this may seem a solution arrived at for
the sake of a usage pattern that was more artificial and obscure than the widely used
one it broke. Its role as optimization path seems more defensible, but also not without
repercussions.
In particular, this has potentially broad implications for the delegation-based classes,
often known as proxy classes, when embedded objects implement operator overload-
ing. In new-style classes, such a proxy object’s class must generally redefine any such
names to catch and delegate, either manually or with tools. The net effect is to either
significantly complicate or wholly obviate an entire category of programs. We explored
delegation in Chapter 28 and Chapter 31; it’s a common pattern used to augment or
adapt another class’s interface—to add validation, tracing, timing, and many other
sorts of logic. Though proxies may be more the exception than the rule in typical Python
code, many Python programs depend upon them.
Implications for attribute interception
In simple terms, and run in Python 2.X to show how new-style classes differ, indexing
and prints are routed to __getattr__ in traditional classes, but not for new-style classes,
where printing uses a default:2
>>> class C:
        data = 'spam'
        def __getattr__(self, name):                # Classic in 2.X: catches built-ins
            print(name)
            return getattr(self.data, name)
>>> X = C()
>>> X[0]
__getitem__
's'
>>> print(X)                                        # Classic doesn't inherit default
__str__
spam

>>> class C(object):                                # New-style in 2.X and 3.X
        ...rest of class unchanged...

>>> X = C()                                         # Built-ins not routed to getattr
>>> X[0]
TypeError: 'C' object does not support indexing

>>> print(X)
<__main__.C object at 0x02205780>

2. As of this chapter’s interaction listings, I’ve started omitting some blank lines and shortening some hex addresses to 32 bits in object displays, to reduce size and clutter. I’m going to assume that by this point in the book, you’ll find such small details irrelevant.
Though apparently rationalized in the name of class metaclass methods and optimizing
built-in operations, this divergence is not addressed by special-casing normal instances
having a __getattr__, and applies only to built-in operations—not to normally named
methods, or explicit calls to built-in methods by name:
>>> class C: pass # 2.X classic class
>>> X = C()
>>> X.normal = lambda: 99
>>> X.normal()
99
>>> X.__add__ = lambda y: 88 + y
>>> X.__add__(1)
89
>>> X + 1
89
>>> class C(object): pass # 2.X/3.X new-style class
>>> X = C()
>>> X.normal = lambda: 99
>>> X.normal() # Normals still from instance
99
>>> X.__add__ = lambda y: 88 + y
>>> X.__add__(1) # Ditto for explicit built-in names
89
>>> X + 1
TypeError: unsupported operand type(s) for +: 'C' and 'int'
This behavior winds up being inherited by the __getattr__ attribute interception
method:
>>> class C(object):
        def __getattr__(self, name): print(name)
>>> X = C()
>>> X.normal # Normal names are still routed to getattr
normal
>>> X.__add__ # Direct calls by name are too, but expressions are not!
__add__
>>> X + 1
TypeError: unsupported operand type(s) for +: 'C' and 'int'
Proxy coding requirements
In a more realistic delegation scenario, this means that built-in operations like expres-
sions no longer work the same as their traditional direct-call equivalents. Asymmetri-
cally, direct calls to built-in method names still work, but equivalent expressions do
not, because through-type calls fail for names not defined at the class level or above.
In other words, this distinction arises in built-in operations only; explicit fetches run
correctly:
>>> class C(object):
        data = 'spam'
        def __getattr__(self, name):
            print('getattr: ' + name)
            return getattr(self.data, name)
>>> X = C()
>>> X.__getitem__(1) # Traditional mapping works but new-style's does not
getattr: __getitem__
'p'
>>> X[1]
TypeError: 'C' object does not support indexing
>>> type(X).__getitem__(X, 1)
AttributeError: type object 'C' has no attribute '__getitem__'
>>> X.__add__('eggs') # Ditto for +: instance skipped for expression only
getattr: __add__
'spameggs'
>>> X + 'eggs'
TypeError: unsupported operand type(s) for +: 'C' and 'str'
>>> type(X).__add__(X, 'eggs')
AttributeError: type object 'C' has no attribute '__add__'
The net effect: to code a proxy of an object whose interface may in part be invoked by
built-in operations, new-style classes require both __getattr__ for normal names, as
well as method redefinitions for all names accessed by built-in operations—whether
coded manually, obtained from superclasses, or generated by tools. When redefinitions
are so incorporated, calls through both instances and types are equivalent to built-in
operations, though redefined names are no longer routed to the generic __getattr__
undefined name handler, even for explicit name calls:
>>> class C(object):                                # New-style: 3.X and 2.X
        data = 'spam'
        def __getattr__(self, name):                # Catch normal names
            print('getattr: ' + name)
            return getattr(self.data, name)
        def __getitem__(self, i):                   # Redefine built-ins
            print('getitem: ' + str(i))
            return self.data[i]                     # Run expr or getattr
        def __add__(self, other):
            print('add: ' + other)
            return getattr(self.data, '__add__')(other)
>>> X = C()
>>> X.upper
getattr: upper
<built-in method upper of str object at 0x0233D670>
>>> X.upper()
getattr: upper
'SPAM'
>>> X[1] # Built-in operation (implicit)
getitem: 1
'p'
>>> X.__getitem__(1) # Traditional equivalence (explicit)
getitem: 1
'p'
>>> type(X).__getitem__(X, 1) # New-style equivalence
getitem: 1
'p'
>>> X + 'eggs' # Ditto for + and others
add: eggs
'spameggs'
>>> X.__add__('eggs')
add: eggs
'spameggs'
>>> type(X).__add__(X, 'eggs')
add: eggs
'spameggs'
For more details
We will revisit this change in Chapter 40 on metaclasses, and by example in the contexts
of attribute management in Chapter 38 and privacy decorators in Chapter 39. In the
latter of these, we’ll also explore coding structures for providing proxies with the re-
quired operator methods generically—it’s not an impossible task, and may need to be
coded just once if done well. For more of the sort of code influenced by this issue, see
those later chapters, as well as the earlier examples in Chapter 28 and Chapter 31.
Because we’ll expand on this issue later in the book, we’ll cut the coverage short here.
For external links and pointers on this issue, though, see the following (along with your
local search engine):
Python Issue 643841: this issue has been discussed widely, but its most official
history seems to be documented at http://bugs.python.org/issue643841. There, it
was raised as a concern for real programs and escalated to be addressed, but a
proposed library remedy or broader change in Python was struck down in favor of
a simple documentation change to describe the new mandated behavior.
Tool recipes: also see http://code.activestate.com/recipes/252151, an Active State
Python recipe that describes a tool that automatically fills in special method names
as generic call dispatchers in a proxy class created with metaclass techniques in-
troduced later in this chapter. This tool still must ask you to pass in the operator
method names that a wrapped object may implement, though (it must, as interface
components of a wrapped object may be inherited from arbitrary sources).
Other approaches: a web search today will uncover numerous additional tools that
similarly populate proxy classes with overloading methods; it’s a widespread con-
cern! Again, in Chapter 39, we’ll also see how to code straightforward and general
superclasses once that provide the required methods or attributes as mix-ins,
without metaclasses, redundant code generation, or similarly complex techniques.
This story may evolve over time, of course, but it has been an issue for many years. As
this stands today, proxies coded as classic classes for objects that do any operator
overloading are effectively broken when treated as new-style classes. Such classes in
both 2.X and 3.X require coding
or generating wrappers for all the implicitly invoked operator methods a wrapped ob-
ject may support. This is not ideal for such programs—some proxies may require doz-
ens of wrapper methods (potentially over 50!)—but reflects, or is at least an artifact of,
the design goals of new-style class developers.
Be sure to see Chapter 40’s metaclass coverage for an additional illus-
tration of this issue and its rationale. We’ll also see there that this be-
havior of built-ins qualifies as a special case in new-style inheritance.
Understanding this well requires more background on metaclasses than
the current chapter can provide, a regrettable byproduct of metaclasses
in general—they’ve become prerequisite to more usage than their orig-
inators may have foreseen.
Type Model Changes
On to our next new-style change: depending on your assessment, in new-style classes
the distinction between type and class has either been greatly muted or has vanished
entirely. Specifically:
Classes are types
The type object generates classes as its instances, and classes generate instances of
themselves. Both are considered types, because they generate instances. In fact,
there is no real difference between built-in types like lists and strings and user-
defined types coded as classes. This is why we can subclass built-in types, as shown
earlier in this chapter—a subclass of a built-in type such as list qualifies as a new-
style class and becomes a new user-defined type.
Types are classes
New class-generating types may be coded in Python as the metaclasses we’ll meet
later in this chapter—user-defined type subclasses that are coded with normal
class statements, and control creation of the classes that are their instances. As
we’ll see, metaclasses are both class and type, though they are distinct enough to
support a reasonable argument that the prior type/class dichotomy has become
one of metaclass/class, perhaps at the cost of added complexity in normal classes.
Besides allowing us to subclass built-in types and code metaclasses, one of the most
practical contexts where this type/class merging becomes most obvious is when we do
explicit type testing. With Python 2.X’s classic classes, the type of a class instance is a
generic “instance,” but the types of built-in objects are more specific:
C:\code> c:\python27\python
>>> class C: pass # Classic classes in 2.X
>>> I = C() # Instances are made from classes
>>> type(I), I.__class__
(<type 'instance'>, <class __main__.C at 0x02399768>)
>>> type(C) # But classes are not the same as types
<type 'classobj'>
>>> C.__class__
AttributeError: class C has no attribute '__class__'
>>> type([1, 2, 3]), [1, 2, 3].__class__
(<type 'list'>, <type 'list'>)
>>> type(list), list.__class__
(<type 'type'>, <type 'type'>)
But with new-style classes in 2.X, the type of a class instance is the class it’s created
from, since classes are simply user-defined types—the type of an instance is its class,
and the type of a user-defined class is the same as the type of a built-in object type.
Classes have a __class__ attribute now, too, because they are instances of type:
C:\code> c:\python27\python
>>> class C(object): pass # New-style classes in 2.X
>>> I = C() # Type of instance is class it's made from
>>> type(I), I.__class__
(<class '__main__.C'>, <class '__main__.C'>)
>>> type(C), C.__class__ # Classes are user-defined types
(<type 'type'>, <type 'type'>)
The same is true for all classes in Python 3.X, since all classes are automatically new-
style, even if they have no explicit superclasses. In fact, the distinction between built-
in types and user-defined class types seems to melt away altogether in 3.X:
C:\code> c:\python33\python
>>> class C: pass
>>> I = C() # All classes are new-style in 3.X
>>> type(I), I.__class__ # Type of instance is class it's made from
(<class '__main__.C'>, <class '__main__.C'>)
>>> type(C), C.__class__ # Class is a type, and type is a class
(<class 'type'>, <class 'type'>)
>>> type([1, 2, 3]), [1, 2, 3].__class__
(<class 'list'>, <class 'list'>)
>>> type(list), list.__class__ # Classes and built-in types work the same
(<class 'type'>, <class 'type'>)
As you can see, in 3.X classes are types, but types are also classes. Technically, each
class is generated by a metaclass—a class that is normally either type itself, or a subclass
of it customized to augment or manage generated classes. Besides impacting code that
does type testing, this turns out to be an important hook for tool developers. We’ll talk
more about metaclasses later in this chapter, and again in more detail in Chapter 40.
Implications for type testing
Besides providing for built-in type customization and metaclass hooks, the merging of
classes and types in the new-style class model can impact code that does type testing.
In Python 3.X, for example, the types of class instances compare directly and mean-
ingfully, and in the same way as built-in type objects. This follows from the fact that
classes are now types, and an instance’s type is the instance’s class:
C:\code> c:\python33\python
>>> class C: pass
>>> class D: pass
>>> c, d = C(), D()
>>> type(c) == type(d) # 3.X: compares the instances' classes
False
>>> type(c), type(d)
(<class '__main__.C'>, <class '__main__.D'>)
>>> c.__class__, d.__class__
(<class '__main__.C'>, <class '__main__.D'>)
>>> c1, c2 = C(), C()
>>> type(c1) == type(c2)
True
With classic classes in 2.X, though, comparing instance types is almost useless, because
all instances have the same “instance” type. To truly compare types, the instance
__class__ attributes must be compared (if you care about portability, this works in 3.X,
too, but it’s not required there):
C:\code> c:\python27\python
>>> class C: pass
>>> class D: pass
>>> c, d = C(), D()
>>> type(c) == type(d) # 2.X: all instances are same type!
True
>>> c.__class__ == d.__class__ # Compare classes explicitly if needed
False
>>> type(c), type(d)
(<type 'instance'>, <type 'instance'>)
>>> c.__class__, d.__class__
(<class __main__.C at 0x024585A0>, <class __main__.D at 0x024588D0>)
And as you should expect by now, new-style classes in 2.X work the same as all classes
in 3.X in this regard—comparing instance types compares the instances’ classes auto-
matically:
C:\code> c:\python27\python
>>> class C(object): pass
>>> class D(object): pass
>>> c, d = C(), D()
>>> type(c) == type(d) # 2.X new-style: same as all in 3.X
False
>>> type(c), type(d)
(<class '__main__.C'>, <class '__main__.D'>)
>>> c.__class__, d.__class__
(<class '__main__.C'>, <class '__main__.D'>)
Of course, as I’ve pointed out numerous times in this book, type checking is usually
the wrong thing to do in Python programs (we code to object interfaces, not object
types), and the more general isinstance built-in is more likely what you’ll want to use
in the rare cases where instance class types must be queried. However, knowledge of
Python’s type model can help clarify the class model in general.
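For instance, a quick sketch of the more typical isinstance approach, which also honors subclass relationships (the classes here are arbitrary):
>>> class C: pass
>>> class D(C): pass
>>> isinstance(D(), C)                   # True for C and any of its subclasses
True
>>> isinstance(C(), D)                   # But not the other way around
False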
All Classes Derive from “object”
Another ramification of the type change in the new-style class model is that because all
classes derive (inherit) from the class object either implicitly or explicitly, and because
all types are now classes, every object derives from the object built-in class, whether
directly or through a superclass. Consider the following interaction in Python 3.X:
>>> class C: pass # For new-style classes
>>> X = C()
>>> type(X), type(C) # Type is class instance was created from
(<class '__main__.C'>, <class 'type'>)
As before, the type of a class instance is the class it was made from, and the type of a
class is the type class because classes and types have merged. It is also true, though,
that the instance and class are both derived from the built-in object class and type, an
implicit or explicit superclass of every class:
>>> isinstance(X, object)
True
>>> isinstance(C, object) # Classes always inherit from object
True
The preceding returns the same results for both new-style and classic classes in 2.X
today, though 2.X type results differ. More importantly, as we’ll see ahead, object is
not added to or present in a 2.X classic class’s __bases__ tuple, and so is not a true
superclass.
The same relationship holds true for built-in types like lists and strings, because types
are classes in the new-style model—built-in types are now classes, and their instances
derive from object, too:
>>> type('spam'), type(str)
(<class 'str'>, <class 'type'>)
>>> isinstance('spam', object) # Same for built-in types (classes)
True
>>> isinstance(str, object)
True
In fact, type itself derives from object, and object derives from type, even though the
two are different objects—a circular relationship that caps the object model and stems
from the fact that types are classes that generate classes:
>>> type(type) # All classes are types, and vice versa
<class 'type'>
>>> type(object)
<class 'type'>
>>> isinstance(type, object) # All classes derive from object, even type
True
>>> isinstance(object, type) # Types make classes, and type is a class
True
>>> type is object
False
Implications for defaults
The preceding may seem obscure, but this model has a number of practical implica-
tions. For one thing, it means that we sometimes must be aware of the method defaults
that come with the explicit or implicit object root class in new-style classes only:
c:\code> py −2
>>> dir(object)
['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__',
 '__hash__', '__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
 '__setattr__', '__sizeof__', '__str__', '__subclasshook__']
>>> class C: pass
>>> C.__bases__ # Classic classes do not inherit from object
()
>>> X = C()
>>> X.__repr__
AttributeError: C instance has no attribute '__repr__'
>>> class C(object): pass # New-style classes inherit object defaults
>>> C.__bases__
(<type 'object'>,)
>>> X = C()
>>> X.__repr__
<method-wrapper '__repr__' of C object at 0x00000000020B5978>
c:\code> py −3
>>> class C: pass # This means all classes get defaults in 3.X
>>> C.__bases__
(<class 'object'>,)
>>> C().__repr__
<method-wrapper '__repr__' of C object at 0x0000000002955630>
This model also makes for fewer special cases than the prior type/class distinction of
classic classes, and it allows us to write code that can safely assume and use an
object superclass (e.g., by assuming it as an “anchor” in some super built-in roles de-
scribed ahead, and by passing it method calls to invoke default behavior). We’ll see
examples of the latter later in the book; for now, let’s move on to explore the last major
new-style change.
Diamond Inheritance Change
Our final new-style class model change is also one of its most visible: its slightly different
inheritance search order for so-called diamond pattern multiple inheritance trees—a
tree pattern in which more than one superclass leads to the same higher superclass
further above (and whose name comes from the diamond shape of the tree if you sketch
it out: a square resting on one of its corners).
The diamond pattern is a fairly advanced design concept, only occurs in multiple in-
heritance trees, and tends to be coded rarely in Python practice, so we won’t cover this
topic in full depth. In short, though, the differing search orders were introduced briefly
in the prior chapter’s multiple inheritance coverage:
For classic classes (the default in 2.X): DFLR
The inheritance search path is strictly depth first, and then left to right—Python
climbs all the way to the top, hugging the left side of the tree, before it backs up
and begins to look further to the right. This search order is known as DFLR for the
first letters in its path’s directions.
For new-style classes (optional in 2.X and automatic in 3.X): MRO
The inheritance search path is more breadth-first in diamond cases—Python first
looks in any superclasses to the right of the one just searched before ascending to
the common superclass at the top. In other words, this search proceeds across by
levels before moving up. This search order is called the new-style MRO for “method
resolution order” (and often just MRO for short when used in contrast with the
DFLR order). Despite the name, this is used for all attributes in Python, not just
methods.
The new-style MRO algorithm is a bit more complex than just described—and we’ll
expand on it a bit more formally later—but this is as much as many programmers need
to know. Still, it has both important benefits for new-style class code and program-
breaking potential for existing classic class code.
For example, the new-style MRO allows lower superclasses to overload attributes of
higher superclasses, regardless of the sort of multiple inheritance trees they are mixed
into. Moreover, the new-style search rule avoids visiting the same superclass more than
once when it is accessible from multiple subclasses. It’s arguably better than DFLR, but
applies to a small subset of Python user code; as we’ll see, though, the new-style class
model itself makes diamonds much more common, and the MRO more important.
At the same time, the new MRO will locate attributes differently, creating a potential
incompatibility for 2.X classic classes. Let’s move on to some code to see how its dif-
ferences pan out in practice.
Implications for diamond inheritance trees
To illustrate how the new-style MRO search differs, consider this simplistic incarnation
of the diamond multiple inheritance pattern for classic classes. Here, D’s superclasses
B and C both lead to the same common ancestor, A:
>>> class A: attr = 1 # Classic (Python 2.X)
>>> class B(A): pass # B and C both lead to A
>>> class C(A): attr = 2
>>> class D(B, C): pass # Tries A before C
>>> x = D()
>>> x.attr # Searches x, D, B, A
1
The attribute x.attr here is found in superclass A, because with classic classes, the
inheritance search climbs as high as it can before backing up and moving right. The
full DFLR search order would visit x, D, B, A, C, and then A. For this attribute, the search
stops as soon as attr is found in A, above B.
However, with new-style classes derived from a built-in like object (and all classes in
3.X), the search order is different: Python looks in C to the right of B, before trying A
above B. The full MRO search order would visit x, D, B, C, and then A. For this attribute,
the search stops as soon as attr is found in C:
>>> class A(object): attr = 1 # New-style ("object" not required in 3.X)
>>> class B(A): pass
>>> class C(A): attr = 2
>>> class D(B, C): pass # Tries C before A
>>> x = D()
>>> x.attr # Searches x, D, B, C
2
This change in the inheritance search procedure is based upon the assumption that if
you mix in C lower in the tree, you probably intend to grab its attributes in preference
to A’s. It also assumes that C is always intended to override A’s attributes in all contexts,
which is probably true when it’s used standalone but may not be when it’s mixed into
a diamond with classic classes—you might not even know that C may be mixed in like
this when you code it.
Since it is most likely that the programmer meant that C should override A in this case,
though, new-style classes visit C first. Otherwise, C could be essentially pointless in a
diamond context for any names in A too—it could not customize A and would be used
only for names unique to C.
Explicit conflict resolution
Of course, the problem with assumptions is that they assume things! If this search order
deviation seems too subtle to remember, or if you want more control over the search
process, you can always force the selection of an attribute from anywhere in the tree
by assigning or otherwise naming the one you want at the place where the classes are
mixed together. The following, for example, chooses new-style order in a classic class
by resolving the choice explicitly:
>>> class A: attr = 1 # Classic
>>> class B(A): pass
>>> class C(A): attr = 2
>>> class D(B, C): attr = C.attr # <== Choose C, to the right
>>> x = D()
>>> x.attr # Works like new-style (all 3.X)
2
Here, a tree of classic classes is emulating the search order of new-style classes for a
specific attribute: the assignment to the attribute in D picks the version in C, thereby
subverting the normal inheritance search path (D.attr will be lowest in the tree). New-
style classes can similarly emulate classic classes by choosing the higher version of the
target attribute at the place where the classes are mixed together:
>>> class A(object): attr = 1 # New-style
>>> class B(A): pass
>>> class C(A): attr = 2
>>> class D(B, C): attr = B.attr # <== Choose A.attr, above
>>> x = D()
>>> x.attr # Works like classic (default 2.X)
1
If you are willing to always resolve conflicts like this, you may be able to largely ignore
the search order difference and not rely on assumptions about what you meant when
you coded your classes.
Naturally, attributes picked this way can also be method functions—methods are nor-
mal, assignable attributes that happen to reference callable function objects:
>>> class A:
        def meth(s): print('A.meth')
>>> class C(A):
        def meth(s): print('C.meth')
>>> class B(A):
        pass
>>> class D(B, C): pass                 # Use default search order
>>> x = D()                             # Will vary per class type
>>> x.meth()                            # Defaults to classic order in 2.X
A.meth
>>> class D(B, C): meth = C.meth        # <== Pick C's method: new-style (and 3.X)
>>> x = D()
>>> x.meth()
C.meth
>>> class D(B, C): meth = B.meth        # <== Pick B's method: classic
>>> x = D()
>>> x.meth()
A.meth
Here, we select methods by explicitly assigning to names lower in the tree. We might
also simply call the desired class explicitly; in practice, this pattern might be more
common, especially for things like constructors:
class D(B, C):
    def meth(self):                     # Redefine lower
        ...
        C.meth(self)                    # <== Pick C's method by calling
Such selections by assignment or call at mix-in points can effectively insulate your code
from this difference in class flavors. This applies only to the attributes you handle this
way, of course, but explicitly resolving the conflicts ensures that your code won’t vary
per Python version, at least in terms of attribute conflict selection. In other words, this
can serve as a portability technique for classes that may need to be run under both the
new-style and classic class models.
Explicit is better than implicit—for method resolution too: Even without
the classic/new-style class divergence, the explicit method resolution
technique shown here may come in handy in multiple inheritance sce-
narios in general. For instance, if you want part of a superclass on the
left and part of a superclass on the right, you might need to tell Python
which same-named attributes to choose by using explicit assignments
or calls in subclasses. We’ll revisit this notion in a “gotcha” at the end
of this chapter.
Also note that diamond inheritance patterns might be more problematic
in some cases than I’ve implied here (e.g., what if B and C both have
required constructors that call to the constructor in A?). Since such con-
texts are rare in real-world Python, we’ll defer this topic until we explore
the super built-in function near the end of this chapter; besides provid-
ing generic access to superclasses in single inheritance trees, super sup-
ports a cooperative mode for resolving conflicts in multiple inheritance
trees by ordering method calls per the MRO—assuming this order
makes sense in this context too!
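As a tiny preview of that cooperative mode (and assuming the no-argument super form of 3.X), each method in the following sketch calls super().__init__ to invoke the next class on the MRO, so the diamond's top is run exactly once:
>>> class A:
        def __init__(self): print('A.init')
>>> class B(A):
        def __init__(self): print('B.init'); super().__init__()
>>> class C(A):
        def __init__(self): print('C.init'); super().__init__()
>>> class D(B, C):
        def __init__(self): print('D.init'); super().__init__()
>>> x = D()                               # Dispatch follows D.__mro__: D, B, C, A
D.init
B.init
C.init
A.init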
Scope of search order change
In sum, by default, the diamond pattern is searched differently for classic and new-style
classes, and this is a non-backward-compatible change. Keep in mind, though, that this
change primarily affects diamond pattern cases of multiple inheritance; new-style class
inheritance works the same for most other inheritance tree structures. Further, this entire
issue may be more theoretical than practical in importance—because the new-style search
wasn't deemed significant enough to address until Python 2.2 and didn't become standard
until 3.0, it seems unlikely to impact most existing Python code.
Having said that, I should also note that even though you might not code diamond
patterns in classes you write yourself, because the implied object superclass is above
every root class in 3.X as we saw earlier, every case of multiple inheritance exhibits the
diamond pattern today. That is, in new-style classes, object automatically plays the
role that the class A does in the example we just considered. Hence the new-style MRO
search rule not only modifies logical semantics, but is also an important performance
optimization—it avoids visiting and searching the same class more than once, even the
automatic object.
Just as important, we’ve also seen that the implied object superclass in the new-style
model provides default methods for a variety of built-in operations, including the
__str__ and __repr__ display format methods. Run a dir(object) to see which methods
are provided. Without the new-style MRO search order, in multiple inheritance cases
the defaults in object would always override redefinitions in user-coded classes, unless
they were always made in the leftmost superclass. In other words, the new-style class
model itself makes using the new-style search order more critical!
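To make this concrete, here is a minimal sketch (the Lister name is invented here, but mimics the preceding chapter's mix-ins): because the MRO visits object last, the mix-in's __str__ is selected even though it is not leftmost; under a DFLR-style search, the default in object above Super would be found first:
>>> class Lister:
        def __str__(self): return '<Instance listing...>'
>>> class Super: pass
>>> class Sub(Super, Lister): pass        # Mix-in to the right of Super
>>> print(Sub())                          # MRO: Sub, Super, Lister, object
<Instance listing...>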
For a more visual example of the implied object superclass in 3.X, and other examples
of diamond patterns created by it, see the ListTree class’s output in the lister.py example
in the preceding chapter, as well as the classtree.py tree walker example in Chapter 29
and the next section.
More on the MRO: Method Resolution Order
To trace how new-style inheritance works by default, we can also use the new
class.__mro__ attribute mentioned in the preceding chapter’s class lister examples—
technically a new-style extension, but useful here to explore a change. This attribute
returns a class’s MRO—the order in which inheritance searches classes in a new-style
class tree. This MRO is based on the C3 superclass linearization algorithm initially
developed in the Dylan programming language, but later adopted by other languages
including Python 2.3 and Perl 6.
The MRO algorithm
This book avoids a full description of the MRO algorithm deliberately, because many
Python programmers don't need to care (this only impacts diamonds, which are relatively
rare in real-world code); because it differs between 2.X and 3.X; and because the
details of the MRO are a bit too arcane and academic for this text. As a rule, this book
avoids formal algorithms and prefers to teach informally by example.
On the other hand, some readers may still have an interest in the formal theory behind
new-style MRO. If this set includes you, it’s described in full detail online; search
Python’s manuals and the Web for current MRO links. In short, though, the MRO
essentially works like this:
1. List all the classes that an instance inherits from using the classic class’s DFLR
lookup rule, and include a class multiple times if it’s visited more than once.
2. Scan the resulting list for duplicate classes, removing all but the last occurrence of
duplicates in the list.
The resulting MRO list for a given class includes the class, its superclasses, and all
higher superclasses up to the object root class at the top of the tree. It’s ordered such
that each class appears before its parents, and multiple parents retain the order in which
they appear in the __bases__ superclass tuple.
Crucially, though, because common parents in diamonds appear only at the position
of their last visitation, lower classes are searched first when the MRO list is later used
by attribute inheritance. Moreover, each class is included and thus visited just once,
no matter how many classes lead to it.
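For reference, here is an informal sketch of these two steps in code, run in 3.X; this naive version matches Python's result for the simple trees shown in this chapter, but it is not the full C3 algorithm (which also detects inconsistent orderings):
>>> def dflr(cls):                        # Step 1: DFLR scan, repeats included
        return [cls] + [c for base in cls.__bases__ for c in dflr(base)]
>>> def naive_mro(cls):                   # Step 2: keep last occurrence only
        scan = dflr(cls)
        return [c for (i, c) in enumerate(scan) if c not in scan[i+1:]]
>>> class A: pass
>>> class B(A): pass
>>> class C(A): pass
>>> class D(B, C): pass
>>> naive_mro(D) == list(D.__mro__)       # Same as Python's MRO for this diamond
True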
We'll see applications of this algorithm later in this chapter, including its use by the
super built-in, a call that elevates the MRO to required reading if you wish to fully
understand how methods are dispatched by it, should you choose to use it. As we'll see,
despite its name, this call invokes the next class on the MRO, which might not be a
superclass at all.
Tracing the MRO
If you just want to see how Python’s new-style inheritance orders superclasses in gen-
eral, though, new-style classes (and hence all classes in 3.X) have a class.__mro__ at-
tribute, which is a tuple giving the linear search order Python uses to look up attributes
in superclasses. Really, this attribute is the inheritance order in new-style classes, and
is often as much MRO detail as many Python users need.
Here are some illustrative examples, run in 3.X; for diamond inheritance patterns only,
the search is the new order we’ve been studying—across before up, per the MRO for
new-style classes always used in 3.X, and available as an option in 2.X:
>>> class A: pass
>>> class B(A): pass # Diamonds: order differs for new-style
>>> class C(A): pass # Breadth-first across lower levels
>>> class D(B, C): pass
>>> D.__mro__
(<class '__main__.D'>, <class '__main__.B'>, <class '__main__.C'>,
<class '__main__.A'>, <class 'object'>)
For nondiamonds, though, the search is still as it has always been (albeit with an extra
object root)—to the top, and then to the right (a.k.a. DFLR, depth first and left to
right, the model used for all classic classes in 2.X):
>>> class A: pass
>>> class B(A): pass # Nondiamonds: order same as classic
>>> class C: pass # Depth first, then left to right
>>> class D(B, C): pass
>>> D.__mro__
(<class '__main__.D'>, <class '__main__.B'>, <class '__main__.A'>,
<class '__main__.C'>, <class 'object'>)
The MRO of the following tree, for example, is the same as the earlier diamond, per
DFLR:
>>> class A: pass
>>> class B: pass # Another nondiamond: DFLR
>>> class C(A): pass
>>> class D(B, C): pass
>>> D.__mro__
(<class '__main__.D'>, <class '__main__.B'>, <class '__main__.C'>,
<class '__main__.A'>, <class 'object'>)
Notice how the implied object superclass always shows up at the end of the MRO; as
we’ve seen, it’s added automatically above root (topmost) classes in new-style class trees
in 3.X (and optionally in 2.X):
>>> A.__bases__ # Superclass links: object at two roots
(<class 'object'>,)
>>> B.__bases__
(<class 'object'>,)
>>> C.__bases__
(<class '__main__.A'>,)
>>> D.__bases__
(<class '__main__.B'>, <class '__main__.C'>)
Technically, the implied object superclass always creates a diamond in multiple in-
heritance even if your classes do not—your classes are searched as before, but the new-
style MRO ensures that object is visited last, so your classes can override its defaults:
>>> class X: pass
>>> class Y: pass
>>> class A(X): pass # Nondiamond: depth first then left to right
>>> class B(Y): pass # Though implied "object" always forms a diamond
>>> class D(A, B): pass
>>> D.mro()
[<class '__main__.D'>, <class '__main__.A'>, <class '__main__.X'>,
<class '__main__.B'>, <class '__main__.Y'>, <class 'object'>]
>>> X.__bases__, Y.__bases__
((<class 'object'>,), (<class 'object'>,))
>>> A.__bases__, B.__bases__
((<class '__main__.X'>,), (<class '__main__.Y'>,))
The class.__mro__ attribute is available only on new-style classes; it’s not present in
2.X unless classes derive from object. Strictly speaking, new-style classes also have a
class.mro() method used in the prior example for variety; it’s called at class instantia-
tion time and its return value is a list used to initialize the __mro__ attribute when the
class is created (the method is available for customization in metaclasses, described
later). You can also list just the classes' names if the MRO's full object displays are too
detailed, though this book usually shows the objects to remind you of their true form:
>>> D.mro() == list(D.__mro__)
True
>>> [cls.__name__ for cls in D.__mro__]
['D', 'A', 'X', 'B', 'Y', 'object']
However you access or display them, class MRO paths might be useful to resolve con-
fusion, and in tools that must imitate Python’s inheritance search order. The next sec-
tion shows the latter role in action.
Example: Mapping Attributes to Inheritance Sources
As a prime MRO use case, we noted at the end of the prior chapter that class tree
climbers—such as the class tree lister mix-in we wrote there—might benefit from the
MRO. As coded, the tree lister gave the physical locations of attributes in a class tree.
However, by mapping the list of inherited attributes in a dir result to the linear MRO
sequence (or DFLR order for classic classes), such tools can more directly associate
attributes with the classes from which they are inherited—also a useful relationship for
programmers.
We won’t recode our tree lister here, but as a first major step, the following file,
mapattrs.py, implements tools that can be used to associate attributes with their in-
heritance source; as an added bonus, its mapattrs function demonstrates how inheri-
tance actually searches for attributes in class tree objects, though the new-style MRO
is largely automated for us:
"""
File mapattrs.py (3.X + 2.X)
Main tool: mapattrs() maps all attributes on or inherited by an
instance to the instance or class from which they are inherited.
Assumes dir() gives all attributes of an instance. To simulate
inheritance, uses either the class's MRO tuple, which gives the
search order for new-style classes (and all in 3.X), or a recursive
traversal to infer the DFLR order of classic classes in 2.X.
Also here: inheritance() gives version-neutral class ordering;
assorted dictionary tools using 3.X/2.7 comprehensions.
"""
import pprint

def trace(X, label='', end='\n'):
    print(label + pprint.pformat(X) + end)              # Print nicely

def filterdictvals(D, V):
    """
    dict D with entries for value V removed.
    filterdictvals(dict(a=1, b=2, c=1), 1) => {'b': 2}
    """
    return {K: V2 for (K, V2) in D.items() if V2 != V}

def invertdict(D):
    """
    dict D with values changed to keys (grouped by values).
    Values must all be hashable to work as dict/set keys.
    invertdict(dict(a=1, b=2, c=1)) => {1: ['a', 'c'], 2: ['b']}
    """
    def keysof(V):
        return sorted(K for K in D.keys() if D[K] == V)
    return {V: keysof(V) for V in set(D.values())}

def dflr(cls):
    """
    Classic depth-first left-to-right order of class tree at cls.
    Cycles not possible: Python disallows on __bases__ changes.
    """
    here = [cls]
    for sup in cls.__bases__:
        here += dflr(sup)
    return here

def inheritance(instance):
    """
    Inheritance order sequence: new-style (MRO) or classic (DFLR)
    """
    if hasattr(instance.__class__, '__mro__'):
        return (instance,) + instance.__class__.__mro__
    else:
        return [instance] + dflr(instance.__class__)

def mapattrs(instance, withobject=False, bysource=False):
    """
    dict with keys giving all inherited attributes of instance,
    with values giving the object that each is inherited from.
    withobject: False=remove object built-in class attributes.
    bysource: True=group result by objects instead of attributes.
    Supports classes with slots that preclude __dict__ in instances.
    """
    attr2obj = {}
    inherits = inheritance(instance)
    for attr in dir(instance):
        for obj in inherits:
            if hasattr(obj, '__dict__') and attr in obj.__dict__:   # See slots
                attr2obj[attr] = obj
                break
    if not withobject:
        attr2obj = filterdictvals(attr2obj, object)
    return attr2obj if not bysource else invertdict(attr2obj)

if __name__ == '__main__':
    print('Classic classes in 2.X, new-style in 3.X')
    class A:       attr1 = 1
    class B(A):    attr2 = 2
    class C(A):    attr1 = 3
    class D(B, C): pass
    I = D()
    print('Py=>%s' % I.attr1)                       # Python's search == ours?
    trace(inheritance(I), 'INH\n')                  # [Inheritance order]
    trace(mapattrs(I), 'ATTRS\n')                   # Attrs => Source
    trace(mapattrs(I, bysource=True), 'OBJS\n')     # Source => [Attrs]

    print('New-style classes in 2.X and 3.X')
    class A(object): attr1 = 1                      # "(object)" optional in 3.X
    class B(A):      attr2 = 2
    class C(A):      attr1 = 3
    class D(B, C):   pass
    I = D()
    print('Py=>%s' % I.attr1)
    trace(inheritance(I), 'INH\n')
    trace(mapattrs(I), 'ATTRS\n')
    trace(mapattrs(I, bysource=True), 'OBJS\n')
This file assumes dir gives all an instance’s attributes. It maps each attribute in a dir
result to its source by scanning either the MRO order for new-style classes, or the DFLR
order for classic classes, searching each object’s namespace __dict__ along the way.
For classic classes, the DFLR order is computed with a simple recursive scan. The net
effect is to simulate Python’s inheritance search in both class models.
This file’s self-test code applies its tools to the diamond multiple-inheritance trees we
saw earlier. It uses Python's pprint library module to display lists and dictionaries nicely:
pprint.pprint is its basic call, and its pformat returns a print string. Run this on
Python 2.7 to see both classic DFLR and new-style MRO search orders; on Python 3.3,
the object derivation is unnecessary, and both tests give the same, new-style results.
Importantly, attr1, whose value is labeled with “Py=>” and whose name appears in
the results lists, is inherited from class A in classic search, but from class C in new-style
search:
c:\code> py −2 mapattrs.py
Classic classes in 2.X, new-style in 3.X
Py=>1
INH
[<__main__.D instance at 0x000000000225A688>,
<class __main__.D at 0x0000000002248828>,
<class __main__.B at 0x0000000002248768>,
<class __main__.A at 0x0000000002248708>,
<class __main__.C at 0x00000000022487C8>,
<class __main__.A at 0x0000000002248708>]
ATTRS
{'__doc__': <class __main__.D at 0x0000000002248828>,
'__module__': <class __main__.D at 0x0000000002248828>,
'attr1': <class __main__.A at 0x0000000002248708>,
'attr2': <class __main__.B at 0x0000000002248768>}
OBJS
{<class __main__.A at 0x0000000002248708>: ['attr1'],
<class __main__.B at 0x0000000002248768>: ['attr2'],
<class __main__.D at 0x0000000002248828>: ['__doc__', '__module__']}
New-style classes in 2.X and 3.X
Py=>3
INH
(<__main__.D object at 0x0000000002257B38>,
<class '__main__.D'>,
<class '__main__.B'>,
<class '__main__.C'>,
<class '__main__.A'>,
<type 'object'>)
ATTRS
{'__dict__': <class '__main__.A'>,
'__doc__': <class '__main__.D'>,
'__module__': <class '__main__.D'>,
'__weakref__': <class '__main__.A'>,
'attr1': <class '__main__.C'>,
'attr2': <class '__main__.B'>}
OBJS
{<class '__main__.A'>: ['__dict__', '__weakref__'],
<class '__main__.B'>: ['attr2'],
<class '__main__.C'>: ['attr1'],
<class '__main__.D'>: ['__doc__', '__module__']}
As a larger application of these tools, the following is our inheritance simulator at work
in 3.3 on the preceding chapter’s testmixin0.py file’s test classes (I’ve deleted some built-
in names here for space; as usual, run live for the whole list). Notice how __X pseudo-
private names are mapped to their defining classes, and how ListInstance appears in
the MRO before object, which has a __str__ that would otherwise be chosen first—as
you’ll recall, mixing this method in was the whole point of the lister classes!
c:\code> py −3
>>> from mapattrs import trace, dflr, inheritance, mapattrs
>>> from testmixin0 import Sub
>>> I = Sub() # Sub inherits from Super and ListInstance roots
>>> trace(dflr(I.__class__)) # 2.X search order: implied object before lister!
[<class 'testmixin0.Sub'>,
<class 'testmixin0.Super'>,
<class 'object'>,
<class 'listinstance.ListInstance'>,
<class 'object'>]
>>> trace(inheritance(I)) # 3.X (+ 2.X newstyle) search order: lister first
(<testmixin0.Sub object at 0x0000000002974630>,
<class 'testmixin0.Sub'>,
<class 'testmixin0.Super'>,
<class 'listinstance.ListInstance'>,
<class 'object'>)
>>> trace(mapattrs(I))
{'_ListInstance__attrnames': <class 'listinstance.ListInstance'>,
'__init__': <class 'testmixin0.Sub'>,
'__str__': <class 'listinstance.ListInstance'>,
...etc...
'data1': <testmixin0.Sub object at 0x0000000002974630>,
'data2': <testmixin0.Sub object at 0x0000000002974630>,
'data3': <testmixin0.Sub object at 0x0000000002974630>,
'ham': <class 'testmixin0.Super'>,
'spam': <class 'testmixin0.Sub'>}
>>> trace(mapattrs(I, bysource=True))
{<testmixin0.Sub object at 0x0000000002974630>: ['data1', 'data2', 'data3'],
<class 'listinstance.ListInstance'>: ['_ListInstance__attrnames', '__str__'],
<class 'testmixin0.Super'>: ['__dict__', '__weakref__', 'ham'],
<class 'testmixin0.Sub'>: ['__doc__',
'__init__',
'__module__',
'__qualname__',
'spam']}
>>> trace(mapattrs(I, withobject=True))
{'_ListInstance__attrnames': <class 'listinstance.ListInstance'>,
'__class__': <class 'object'>,
'__delattr__': <class 'object'>,
...etc...
Here’s the bit you might run if you want to label class objects with names inherited by
an instance, though you may want to filter out some built-in double-underscore names
for the sake of users’ eyesight!
>>> amap = mapattrs(I, withobject=True, bysource=True)
>>> trace(amap)
{<testmixin0.Sub object at 0x0000000002974630>: ['data1', 'data2', 'data3'],
<class 'listinstance.ListInstance'>: ['_ListInstance__attrnames', '__str__'],
<class 'testmixin0.Super'>: ['__dict__', '__weakref__', 'ham'],
<class 'testmixin0.Sub'>: ['__doc__',
'__init__',
'__module__',
'__qualname__',
'spam'],
<class 'object'>: ['__class__',
'__delattr__',
...etc...
'__sizeof__',
'__subclasshook__']}
Finally, and as both a follow-up to the prior chapter’s ruminations and segue to the
next section here, the following shows how this scheme works for class-based slots
attributes too. Because a class’s __dict__ includes both normal class attributes and
individual entries for the instance attributes defined by its __slots__ list, the slots
attributes inherited by an instance will be correctly associated with the implementing
class from which they are acquired, even though they are not physically stored in the
instance’s __dict__ itself:
# mapattrs-slots.py: test __slots__ attribute inheritance

from mapattrs import mapattrs, trace

class A(object): __slots__ = ['a', 'b']; x = 1; y = 2
class B(A):      __slots__ = ['b', 'c']
class C(A):      x = 2
class D(B, C):
    z = 3
    def __init__(self): self.name = 'Bob'

I = D()
trace(mapattrs(I, bysource=True))          # Also: trace(mapattrs(I))
For explicitly new-style classes like those in this file, the results are the same under both
2.7 and 3.3, though 3.3 adds an extra built-in name to the set. The attribute names here
reflect all those inherited by the instance from user-defined classes, even those imple-
mented by slots defined at classes and stored in space allocated in the instance:
c:\code> py −3 mapattrs-slots.py
{<__main__.D object at 0x00000000028988E0>: ['name'],
<class '__main__.C'>: ['x'],
<class '__main__.D'>: ['__dict__',
'__doc__',
'__init__',
'__module__',
'__qualname__',
'__weakref__',
'z'],
<class '__main__.A'>: ['a', 'y'],
<class '__main__.B'>: ['__slots__', 'b', 'c']}
But we need to move ahead to understand the role of slots better—and to understand why
mapattrs must be careful to check whether a __dict__ is present before fetching it!
Study this code for more insight. For the prior chapter’s tree lister, your next step might
be to index the mapattrs function’s bysource=True dictionary result to obtain an object’s
attributes during the tree sketch traversal, instead of (or perhaps in addition to?) its
current physical __dict__ scan. You’ll probably need to use getattr on the instance to
fetch attribute values, because some may be implemented as slots or other “virtual”
attributes at their source classes, and fetching these at the class directly won’t return
the instance’s value. If I code anymore here, though, I’ll deprive readers of the remaining
fun, and the next section of its subject matter.
Python’s pprint module used in this example works as shown in Pythons
3.3 and 2.7, but appears to have an issue in Pythons 3.2 and 3.1 where
it raises a wrong-number-arguments exception internally for the objects
displayed here. Since I’ve already devoted too much space to covering
transitory Python defects, and since this has been repaired in the ver-
sions of Python used in this edition, we’ll leave working around this in
the suggested exercises column for readers running this on the infected
Pythons; change trace to simple prints as needed, and mind the note
on battery dependence in Chapter 1!
New-Style Class Extensions
Beyond the changes described in the prior section (some of which, frankly, may seem
too academic and obscure to matter to many readers of this book), new-style classes
provide a handful of more advanced class tools that have more direct and practical
application—slots, properties, descriptors, and more. The following sections provide an
overview of each of these additional features, available for new-style classes in Python
2.X and all classes in Python 3.X. Also in this extensions category are the __mro__ at-
tribute and the super call, both covered elsewhere—the former in the previous section
to explore a change, and the latter postponed until chapter end to serve as a larger case
study.
Slots: Attribute Declarations
By assigning a sequence of string attribute names to a special __slots__ class attribute,
we can enable a new-style class to both limit the set of legal attributes that instances of
the class will have, and optimize memory usage and possibly program speed. As we’ll
find, though, slots should be used only in applications that clearly warrant the added
complexity. They will complicate your code, may complicate or break code you may
use, and require universal deployment to be effective.
Slot basics
To use slots, assign a sequence of string names to the special __slots__ variable and
attribute at the top level of a class statement: only those names in the __slots__ list
can be assigned as instance attributes. However, like all names in Python, instance
attribute names must still be assigned before they can be referenced, even if they’re
listed in __slots__:
>>> class limiter(object):
        __slots__ = ['age', 'name', 'job']
>>> x = limiter()
>>> x.age # Must assign before use
AttributeError: age
>>> x.age = 40 # Looks like instance data
>>> x.age
40
>>> x.ape = 1000 # Illegal: not in __slots__
AttributeError: 'limiter' object has no attribute 'ape'
This feature is envisioned both as a way to catch typo errors like this (assignments to
illegal attribute names not in __slots__ are detected) and as an optimization mech-
anism.
Allocating a namespace dictionary for every instance object can be expensive in terms
of memory if many instances are created and only a few attributes are required. To save
space, instead of allocating a dictionary for each instance, Python reserves just enough
space in each instance to hold a value for each slot attribute, along with inherited at-
tributes in the common class to manage slot access. This might additionally speed
execution, though this benefit is less clear and might vary per program, platform, and
Python.
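As a rough illustration of the space difference, the following sketch compares a slot-based instance with a normal instance plus its namespace dictionary; the file name and classes here are invented, sizes vary per Python version and platform, and sys.getsizeof reports just shallow object sizes:

# sizecheck.py: rough slots-versus-dict space probe
import sys

class Slotted(object): __slots__ = ['a', 'b']           # "(object)" for 2.X
class Normal(object): pass

x, y = Slotted(), Normal()
x.a, x.b = 1, 2
y.a, y.b = 1, 2
print(sys.getsizeof(x))                                 # Slot storage only
print(sys.getsizeof(y) + sys.getsizeof(y.__dict__))     # Instance plus its dict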
Slots are also something of a major break with Python’s core dynamic nature, which
dictates that any name may be created by assignment. In fact, they imitate C++ for
efficiency at the expense of flexibility, and even have the potential to break some pro-
grams. As we’ll see, slots also come with a plethora of special-case usage rules. Per
Python’s own manual, they should not be used except in clearly warranted cases—they
are difficult to use correctly, and are, to quote the manual:
best reserved for rare cases where there are large numbers of instances in a memory-
critical application.
In other words, this is yet another feature that should be used only if clearly warranted.
Unfortunately, slots seem to be showing up in Python code much more often than they
should; their obscurity seems to be a draw in itself. As usual, knowledge is your best
ally in such things, so let’s take a quick look here.
In Python 3.3, non-slots attribute space requirements have been reduced
with a key-sharing dictionary model, where the __dict__ dictionaries
used for objects’ attributes may share part of their internal storage, in-
cluding that of their keys. This may lessen some of the value of
__slots__ as an optimization tool; per benchmark reports, this change
reduces memory use by 10% to 20% for object-oriented programs, gives
a small improvement in speed for programs that create many similar
objects, and may be optimized further in the future. On the other hand,
this won’t negate the presence of __slots__ in existing code you may
need to understand!
Slots and namespace dictionaries
Potential benefits aside, slots can complicate the class model—and code that relies on
it—substantially. In fact, some instances with slots may not have a __dict__ attribute
namespace dictionary at all, and others will have data attributes that this dictionary
does not include. To be clear: this is a major incompatibility with the traditional class
model—one that can complicate any code that accesses attributes generically, and may
even cause some programs to fail altogether.
For instance, programs that list or access instance attributes by name string may need
to use more storage-neutral interfaces than __dict__ if slots may be used. Because an
instance’s data may include class-level names such as slots—either in addition to or
instead of namespace dictionary storage—both attribute sources may need to be quer-
ied for completeness.
Let’s see what this means in terms of code, and explore more about slots along the way.
First off, when slots are used, instances do not normally have an attribute dictionary
—instead, Python uses the class descriptors feature introduced ahead to allocate and
manage space reserved for slot attributes in the instance. In Python 3.X, and in 2.X for
new-style classes derived from object:
>>> class C:                              # Requires "(object)" in 2.X only
        __slots__ = ['a', 'b']            # __slots__ means no __dict__ by default
>>> X = C()
>>> X.a = 1
>>> X.a
1
>>> X.__dict__
AttributeError: 'C' object has no attribute '__dict__'
However, we can still fetch and set slot-based attributes by name string using storage-
neutral tools such as getattr and setattr (which look beyond the instance __dict__
and thus include class-level names like slots) and dir (which collects all inherited names
throughout a class tree):
>>> getattr(X, 'a')
1
>>> setattr(X, 'b', 2) # But getattr() and setattr() still work
>>> X.b
2
>>> 'a' in dir(X) # And dir() finds slot attributes too
True
>>> 'b' in dir(X)
True
Also keep in mind that without an attribute namespace dictionary, it’s not possible to
assign new names to instances that are not names in the slots list:
>>> class D:                              # Use D(object) for same result in 2.X
        __slots__ = ['a', 'b']
        def __init__(self):
            self.d = 4                    # Cannot add new names if no __dict__
>>> X = D()
AttributeError: 'D' object has no attribute 'd'
We can still accommodate extra attributes, though, by including __dict__ explicitly in
__slots__, in order to create an attribute namespace dictionary too:
>>> class D:
        __slots__ = ['a', 'b', '__dict__']   # Name __dict__ to include one too
        c = 3                                # Class attrs work normally
        def __init__(self):
            self.d = 4                       # d stored in __dict__, a is a slot
>>> X = D()
>>> X.d
4
>>> X.c
3
>>> X.a # All instance attrs undefined until assigned
AttributeError: a
>>> X.a = 1
>>> X.b = 2
In this case, both storage mechanisms are used. This renders __dict__ too limited for
code that wishes to treat slots as instance data, but generic tools such as getattr still
allow us to process both storage forms as a single set of attributes:
>>> X.__dict__ # Some objects have both __dict__ and slot names
{'d': 4} # getattr() can fetch either type of attr
>>> X.__slots__
['a', 'b', '__dict__']
>>> getattr(X, 'a'), getattr(X, 'c'), getattr(X, 'd') # Fetches all 3 forms
(1, 3, 4)
Because dir also returns all inherited attributes, though, it might be too broad in some
contexts; it also includes class-level methods, and even all object defaults. Code that
wishes to list just instance attributes may in principle still need to allow for both storage
forms explicitly. We might at first naively code this as follows:
>>> for attr in list(X.__dict__) + X.__slots__:      # Wrong...
        print(attr, '=>', getattr(X, attr))
Since either can be omitted, we may more correctly code this as follows, using
getattr to allow for defaults—a noble but nonetheless inaccurate approach, as the next
section will explain:
>>> for attr in list(getattr(X, '__dict__', [])) + getattr(X, '__slots__', []):
        print(attr, '=>', getattr(X, attr))
d => 4
a => 1 # Less wrong...
b => 2
__dict__ => {'d': 4}
Multiple __slots__ lists in superclasses
The preceding code works in this specific case, but in general it’s not entirely accu-
rate. Specifically, this code addresses only slot names in the lowest __slots__ attribute
inherited by an instance, but slot lists may appear more than once in a class tree. That
is, a name’s absence in the lowest __slots__ list does not preclude its existence in a
higher __slots__. Because slot names become class-level attributes, instances acquire
the union of all slot names anywhere in the tree, by the normal inheritance rule:
>>> class E:
        __slots__ = ['c', 'd']            # Superclass has slots
>>> class D(E):
        __slots__ = ['a', '__dict__']     # But so does its subclass
>>> X = D()
>>> X.a = 1; X.b = 2; X.c = 3 # The instance is the union (slots: a, c)
>>> X.a, X.c
(1, 3)
Inspecting just the inherited slots list won’t pick up slots defined higher in a class tree:
>>> E.__slots__ # But slots are not concatenated
['c', 'd']
>>> D.__slots__
['a', '__dict__']
>>> X.__slots__ # Instance inherits *lowest* __slots__
['a', '__dict__']
>>> X.__dict__ # And has its own attr dict
{'b': 2}
>>> for attr in list(getattr(X, '__dict__', [])) + getattr(X, '__slots__', []):
        print(attr, '=>', getattr(X, attr))
b => 2 # Other superclass slots missed!
a => 1
__dict__ => {'b': 2}
>>> dir(X) # But dir() includes all slot names
[...many names omitted... 'a', 'b', 'c', 'd']
In other words, in terms of listing instance attributes generically, one __slots__ isn’t
always enough—they are potentially subject to the full inheritance search procedure.
See the earlier mapattrs-slots.py for another example of slots appearing in multiple
superclasses. If multiple classes in a class tree have their own __slots__ attributes,
generic programs must develop other policies for listing attributes—as the next section
explains.
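One such policy, sketched minimally here (the allslots name is invented for illustration), collects the union of every __slots__ list on a new-style class's MRO:
>>> def allslots(instance):
        return [name for klass in type(instance).__mro__
                         for name in getattr(klass, '__slots__', [])]
>>> class E:
        __slots__ = ['c', 'd']
>>> class D(E):
        __slots__ = ['a', '__dict__']
>>> allslots(D())                         # Slot names from every tree level
['a', '__dict__', 'c', 'd']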
Handling slots and other “virtual” attributes generically
At this point, you may wish to review the discussion of slots policy options at the
coverage of the lister.py display mix-in classes near the end of the preceding chapter—
a prime example of why generic programs may need to care about slots. Such tools that
attempt to list instance data attributes generically must account for slots, and perhaps
other such “virtual” instance attributes like properties and descriptors discussed ahead
—names that similarly reside in classes but may provide attribute values for instances
on request. Slots are the most data-centric of these, but are representative of a larger
category.
Such attributes require inclusive approaches, special handling, or general avoidance—
the latter of which becomes unsatisfactory as soon as any programmer uses slots in
subject code. Really, class-level instance attributes like slots probably necessitate a re-
definition of the term instance data—as locally stored attributes, the union of all in-
herited attributes, or some subset thereof.
For example, some programs might classify slot names as attributes of classes instead
of instances; these attributes do not exist in instance namespace dictionaries, after all.
Alternatively, as shown earlier, programs can be more inclusive by relying on dir to
fetch all inherited attribute names and getattr to fetch their corresponding values for
the instance—without regard to their physical location or implementation. If you must
support slots as instance data, this is likely the most robust way to proceed:
>>> class Slotful:
        __slots__ = ['a', 'b', '__dict__']
        def __init__(self, data):
            self.c = data
>>> I = Slotful(3)
>>> I.a, I.b = 1, 2
>>> I.a, I.b, I.c # Normal attribute fetch
(1, 2, 3)
>>> I.__dict__ # Both __dict__ and slots storage
{'c': 3}
>>> [x for x in dir(I) if not x.startswith('__')]
['a', 'b', 'c']
>>> I.__dict__['c'] # __dict__ is only one attr source
3
>>> getattr(I, 'c'), getattr(I, 'a') # dir+getattr is broader than __dict__:
(3, 1) # applies to slots, properties, descriptors
>>> for a in (x for x in dir(I) if not x.startswith('__')):
        print(a, getattr(I, a))
a 1
b 2
c 3
Under this dir/getattr model, you can still map attributes to their inheritance sources,
and filter them more selectively by source or type if needed, by scanning the MRO
as we did earlier in both mapattrs.py and its application to slots in mapattrs-slots.py.
As an added bonus, such tools and policies for handling slots will potentially apply
automatically to properties and descriptors too, though these attributes are more ex-
plicitly computed values, and less obviously instance-related data than slots.
Also keep in mind that this is not just a tools issue. Class-based instance attributes like
slots also impact the traditional coding of the __setattr__ operator overloading method
we met in Chapter 30. Because slots and some other attributes are not stored in the
instance __dict__, and may even imply its absence, new-style classes must instead gen-
erally run attribute assignments by routing them to the object superclass. In practice,
this may make this method fundamentally different in some classic and new-style
classes.
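For instance, here is a minimal sketch of the new-style pattern (the class and its print are illustrative only): because a slots-based instance may have no __dict__ at all, the override routes the actual assignment to the object superclass rather than to self.__dict__:
>>> class Tracked(object):                # "(object)" needed in 2.X only
        __slots__ = ['a']
        def __setattr__(self, name, value):
            print('set: %s' % name)
            object.__setattr__(self, name, value)   # Not self.__dict__[name]=value
>>> x = Tracked()
>>> x.a = 1
set: a
>>> x.a
1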
Slot usage rules
Slot declarations can appear in multiple classes in a class tree, but when they do they
are subject to a number of constraints that are somewhat difficult to rationalize unless
you understand the implementation of slots as class-level descriptors for each slot name
that are inherited by the instances where the managed space is reserved (descriptors
are an advanced tool we’ll study in detail in the last part of this book):
Slots in subs are pointless when absent in supers: If a subclass inherits from a su-
perclass without a __slots__, the instance __dict__ attribute created for the su-
perclass will always be accessible, making a __slots__ in the subclass largely point-
less. The subclass still manages its slots, but doesn’t compute their values in any
way, and doesn’t avoid a dictionary—the main reason to use slots.
Slots in supers are pointless when absent in subs: Similarly, because the meaning of
a __slots__ declaration is limited to the class in which it appears, subclasses will
produce an instance __dict__ if they do not define a __slots__, rendering a
__slots__ in a superclass largely pointless.
Redefinition renders super slots pointless: If a class defines the same slot name as a
superclass, its redefinition hides the slot in the superclass per normal inheritance.
You can access the version of the name defined by the superclass slot only by
fetching its descriptor directly from the superclass.
Slots prevent class-level defaults: Because slots are implemented as class-level de-
scriptors (along with per-instance space), you cannot use class attributes of the
same name to provide defaults as you can for normal instance attributes: assigning
the same name in the class overwrites the slot descriptor.
Slots and __dict__: As shown earlier, __slots__ preclude both an instance
__dict__ and assigning names not listed, unless __dict__ is listed explicitly too.
We’ve already seen the last of these in action, and the earlier mapattrs-slots.py illustrates
the third. It’s easy to demonstrate how the new rules here translate to actual code—
most crucially, a namespace dictionary is created when any class in a tree omits slots,
thereby negating the memory optimization benefit:
>>> class C: pass # Bullet 1: slots in sub but not super
>>> class D(C): __slots__ = ['a'] # Makes instance dict for nonslots
>>> X = D() # But slot name still managed in class
>>> X.a = 1; X.b = 2
>>> X.__dict__
{'b': 2}
>>> D.__dict__.keys()
dict_keys([... 'a', '__slots__', ...])
>>> class C: __slots__ = ['a'] # Bullet 2: slots in super but not sub
>>> class D(C): pass # Makes instance dict for nonslots
>>> X = D() # But slot name still managed in class
>>> X.a = 1; X.b = 2
>>> X.__dict__
{'b': 2}
>>> C.__dict__.keys()
dict_keys([... 'a', '__slots__', ...])
>>> class C: __slots__ = ['a'] # Bullet 3: only lowest slot accessible
>>> class D(C): __slots__ = ['a']
>>> class C: __slots__ = ['a']; a = 99 # Bullet 4: no class-level defaults
ValueError: 'a' in __slots__ conflicts with class variable
In other words, besides their program-breaking potential, slots essentially require both
universal and careful deployment to be effective—because slots do not compute values
dynamically like properties (coming up in the next section), they are largely pointless
unless each class in a tree uses them and is cautious to define only new slot names not
defined by other classes. It’s an all-or-nothing feature—an unfortunate property shared
by the super call discussed ahead:
>>> class C: __slots__ = ['a'] # Assumes universal use, differing names
>>> class D(C): __slots__ = ['b']
>>> X = D()
>>> X.a = 1; X.b = 2
>>> X.__dict__
AttributeError: 'D' object has no attribute '__dict__'
>>> C.__dict__.keys(), D.__dict__.keys()
(dict_keys([... 'a', '__slots__', ...]), dict_keys([... 'b', '__slots__', ...]))
Such rules—among others regarding weak references omitted here for space—are part
of the reason slots are not generally recommended, except in pathological cases where
their space reduction is significant. Even then, their potential to complicate or break
code should be ample cause to carefully consider the tradeoffs. Not only must they be
spread almost neurotically throughout a framework, they may also break tools you rely
on.
Example impacts of slots: ListTree and mapattrs
As a more realistic example of slots’ effects, due to the first bullet in the prior section,
Chapter 31’s ListTree class does not fail when mixed in to a class that defines
__slots__, even though it scans instance namespace dictionaries. The lister class’s own
lack of slots is enough to ensure that the instance will still have a __dict__, and hence
not trigger an exception when fetched or indexed. For example, both of the following
display without error—the second also allows names not in the slots list to be assigned
as instance attributes, including any required by the superclass:
class C(ListTree): pass
X = C()                                       # OK: no __slots__ used
print(X)

class C(ListTree): __slots__ = ['a', 'b']     # OK: superclass produces __dict__
X = C()
X.c = 3
print(X)                                      # Displays c at X, a and b at C
The following classes display correctly as well—any nonslot class like ListTree gener-
ates an instance __dict__, and can thus safely assume its presence:
class A: __slots__ = ['a'] # Both OK by bullet 1 above
class B(A, ListTree): pass
class A: __slots__ = ['a']
class B(A, ListTree): __slots__ = ['b'] # Displays b at B, a at A
Although it renders subclass slots pointless, this is a positive side effect for tool
classes like ListTree (and its Chapter 28 predecessor). In general, though, some tools might
need to catch exceptions when __dict__ is absent or use a hasattr or getattr to test or
provide defaults if slot usage may preclude a namespace dictionary in instance objects
inspected.
For example, you should now be able to understand why the mapattrs.py program
earlier in this chapter must check for the presence of a __dict__ before fetching it—
instance objects created from classes with __slots__ won’t have one. In fact, if we use
the highlighted alternative line in the following, the mapattrs function fails with an
exception when attempting to look for an attribute name in the instance at the front of
the inheritance path sequence:
def mapattrs(instance, withobject=False, bysource=False):
    for attr in dir(instance):
        for obj in inherits:
            if attr in obj.__dict__:          # May fail if __slots__ used
>>> class C: __slots__ = ['a']
>>> X = C()
>>> mapattrs(X)
AttributeError: 'C' object has no attribute '__dict__'
Either of the following works around the issue, and allows the tool to support slots—
the first provides a default, and the second is more verbose but seems marginally more
explicit in its intent:
if attr in getattr(obj, '__dict__', {}):
if hasattr(obj, '__dict__') and attr in obj.__dict__:
As mentioned earlier, some tools may benefit from mapping dir results to objects in
the MRO this way, instead of scanning an instance __dict__ in general—without this
more inclusive approach, attributes implemented by class-level tools like slots won’t
be reported as instance data. Even so, this doesn’t necessarily excuse such tools from
allowing for a missing __dict__ in the instance too!
What about slots speed?
Finally, while slots primarily optimize memory use, their speed impact is less clear-cut.
Here’s a simple test script using the timeit techniques we studied in Chapter 21. For
both the slots and nonslots (instance dictionary) storage models, it makes 1,000 in-
stances, assigns and fetches 4 attributes on each, and repeats 1,000 times—for both
models taking the best of 3 runs that each exercise a total of 8M attribute operations:
# File slots-test.py
from __future__ import print_function
import timeit

base = """
Is = []
for i in range(1000):
    X = C()
    X.a = 1; X.b = 2; X.c = 3; X.d = 4
    t = X.a + X.b + X.c + X.d
    Is.append(X)
"""

stmt = """
class C:
    __slots__ = ['a', 'b', 'c', 'd']
""" + base

print('Slots =>', end=' ')
print(min(timeit.repeat(stmt, number=1000, repeat=3)))

stmt = """
class C:
    pass
""" + base

print('Nonslots=>', end=' ')
print(min(timeit.repeat(stmt, number=1000, repeat=3)))
At least on this code, on my laptop, and in my installed versions (Python 3.3 and 2.7),
the best times imply that slots are slightly quicker in 3.X and a wash in 2.X, though this
says little about memory space, and is prone to change arbitrarily in the future:
c:\code> py −3 slots-test.py
Slots => 0.7780903942045899
Nonslots=> 0.9888108080898417
c:\code> py −2 slots-test.py
Slots => 0.80868754371
Nonslots=> 0.802224740747
For more on slots in general, see the Python standard manual set. Also watch for the
Private decorator case study of Chapter 39—an example that naturally allows for at-
tributes based on both __slots__ and __dict__ storage, by using delegation and storage-
neutral accessor tools like getattr.
Properties: Attribute Accessors
Our next new-style extension is properties—a mechanism that provides another way
for new-style classes to define methods called automatically for access or assignment
to instance attributes. This feature is similar to properties (a.k.a. “getters” and “setters”)
in languages like Java and C#, but in Python is generally best used sparingly, as a way
to add accessors to attributes after the fact as needs evolve and warrant. Where needed,
though, properties allow attribute values to be computed dynamically without requir-
ing method calls at the point of access.
Though properties cannot support generic attribute routing goals, at least for specific
attributes they are an alternative to some traditional uses of the __getattr__ and
__setattr__ overloading methods we first studied in Chapter 30. Properties have a
similar effect to these two methods, but by contrast incur an extra method call only for
accesses to names that require dynamic computation—other nonproperty names are
accessed normally with no extra calls. Although __getattr__ is invoked only for unde-
fined names, the __setattr__ method is instead called for assignment to every attribute.
Properties and slots are related too, but serve different goals. Both implement instance
attributes that are not physically stored in instance namespace dictionaries—a sort of
“virtual” attribute—and both are based on the notion of class-level attribute descriptors. In contrast, slots manage instance storage, while properties intercept access and
compute values arbitrarily. Because their underlying descriptor implementation tool is
too advanced for us to cover here, properties and descriptors both get full treatment in
Chapter 38.
Property basics
As a brief introduction, though, a property is a type of object assigned to a class attribute
name. You generate a property by calling the property built-in function, passing in up
to three accessor methods—handlers for get, set, and delete operations—as well as an
optional docstring for the property. If any argument is passed as None or omitted, that
operation is not supported.
The resulting property object is typically assigned to a name at the top level of a
class statement (e.g., name=property()), and a special @ syntax we’ll meet later is avail-
able to automate this step. When thus assigned, later accesses to the class property
name itself as an object attribute (e.g., obj.name) are automatically routed to one of the
accessor methods passed into the property call.
For example, we’ve seen how the __getattr__ operator overloading method allows
classes to intercept undefined attribute references in both classic and new-style classes:
>>> class operators:
        def __getattr__(self, name):
            if name == 'age':
                return 40
            else:
                raise AttributeError(name)
>>> x = operators()
>>> x.age # Runs __getattr__
40
>>> x.name # Runs __getattr__
AttributeError: name
Here is the same example, coded with properties instead; note that properties are available for all classes but require the new-style object derivation in 2.X to work properly for intercepting attribute assignments (and won’t complain if you forget this—but will silently overwrite your property with the new data!):
>>> class properties(object):                # Need object in 2.X for setters
        def getage(self):
            return 40
        age = property(getage, None, None, None)   # (get, set, del, docs), or use @
>>> x = properties()
>>> x.age # Runs getage
40
>>> x.name # Normal fetch
AttributeError: 'properties' object has no attribute 'name'
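Though not shown above, assignment to such a read-only property is intercepted and rejected as well—a quick aside; the exact error message text varies by Python version:

>>> x.age = 999                     # No setter was passed to property()
AttributeError: can't set attribute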
For some coding tasks, properties can be less complex and quicker to run than the traditional techniques. For example, when we add attribute assignment support, properties become more attractive—there’s less code to type, and no extra method calls are incurred for assignments to attributes we don’t wish to compute dynamically:
>>> class properties(object):                # Need object in 2.X for setters
        def getage(self):
            return 40
        def setage(self, value):
            print('set age: %s' % value)
            self._age = value
        age = property(getage, setage, None, None)
>>> x = properties()
>>> x.age # Runs getage
40
>>> x.age = 42 # Runs setage
set age: 42
>>> x._age # Normal fetch: no getage call
42
>>> x.age # Runs getage
40
>>> x.job = 'trainer' # Normal assign: no setage call
>>> x.job # Normal fetch: no getage call
'trainer'
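For completeness, the property call’s remaining two arguments work the same way—a hedged sketch with illustrative names that also supplies a deleter and a docstring:

>>> class Person(object):           # Illustrative names, not a prior example
        def __init__(self, name):
            self._name = name
        def getname(self):
            return self._name
        def setname(self, value):
            self._name = value
        def delname(self):
            del self._name           # Runs on: del instance.name
        name = property(getname, setname, delname, "the person's name")

>>> bob = Person('Bob')
>>> bob.name                         # Runs getname
'Bob'
>>> del bob.name                     # Runs delname
>>> Person.name.__doc__              # The property's docstring
"the person's name"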
The equivalent of the getage/setage class based on operator overloading incurs extra method calls for assignments to attributes not being managed and needs to route attribute assignments through the attribute dictionary to avoid loops (or, for new-style classes, to the object superclass’s __setattr__ to better support “virtual” attributes such as slots and properties coded in other classes):
>>> class operators:
        def __getattr__(self, name):             # On undefined reference
            if name == 'age':
                return 40
            else:
                raise AttributeError(name)
        def __setattr__(self, name, value):      # On all assignments
            print('set: %s %s' % (name, value))
            if name == 'age':
                self.__dict__['_age'] = value    # Or object.__setattr__()
            else:
                self.__dict__[name] = value
>>> x = operators()
>>> x.age # Runs __getattr__
40
>>> x.age = 41 # Runs __setattr__
set: age 41
>>> x._age # Defined: no __getattr__ call
41
>>> x.age # Runs __getattr__
40
>>> x.job = 'trainer' # Runs __setattr__ again
set: job trainer
>>> x.job # Defined: no __getattr__ call
'trainer'
Properties seem like a win for this simple example. However, some applications of
__getattr__ and __setattr__ still require more dynamic or generic interfaces than
properties directly provide.
For example, in many cases the set of attributes to be supported cannot be determined
when the class is coded, and may not even exist in any tangible form (e.g., when
delegating arbitrary attribute references to a wrapped/embedded object generically). In
such contexts, a generic __getattr__ or a __setattr__ attribute handler with a passed-
in attribute name is usually preferable. Because such generic handlers can also support
simpler cases, properties are often an optional and redundant extension—albeit one
that may avoid extra calls on assignments, and one that some programmers may prefer
when applicable.
For more details on both options, stay tuned for Chapter 38 in the final part of this book. As we’ll see there, it’s also possible to code properties using the @ symbol function decorator syntax—a topic introduced later in this chapter, and an equivalent and automatic alternative to manual assignment in the class scope:
class properties(object):
    @property                        # Coding properties with decorators: ahead
    def age(self):
        ...
    @age.setter
    def age(self, value):
        ...
To make sense of this decorator syntax, though, we must move ahead.
__getattribute__ and Descriptors: Attribute Tools
Also in the class extensions department, the __getattribute__ operator overloading method, available for new-style classes only, allows a class to intercept all attribute references, not just undefined references. This makes it more potent than its __getattr__ cousin we used in the prior section, but also trickier to use—it’s prone to loops much like __setattr__, but in different ways.
For more specialized attribute interception goals, in addition to properties and operator
overloading methods, Python supports the notion of attribute descriptors—classes with
__get__ and __set__ methods, assigned to class attributes and inherited by instances,
that intercept read and write accesses to specific attributes. As a preview, here’s one of
the simplest descriptors you’re likely to encounter:
>>> class AgeDesc(object):
        def __get__(self, instance, owner): return 40
        def __set__(self, instance, value): instance._age = value

>>> class descriptors(object):
        age = AgeDesc()
>>> x = descriptors()
>>> x.age # Runs AgeDesc.__get__
40
>>> x.age = 42 # Runs AgeDesc.__set__
>>> x._age # Normal fetch: no AgeDesc call
42
Descriptors have access to state in instances of themselves as well as their client class,
and are in a sense a more general form of properties; in fact, properties are a simplified way to define a specific type of descriptor—one that runs functions on access. Descriptors are also used to implement the slots feature we met earlier, and other Python tools.
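To underscore that relationship, here is a hedged sketch of a property-like descriptor that runs a function on fetch only—illustrative code, not the built-in property’s actual implementation:

>>> class FetchProperty(object):
        def __init__(self, fget):
            self.fget = fget
        def __get__(self, instance, owner):
            if instance is None:
                return self              # Class-level fetch: return descriptor
            return self.fget(instance)   # Instance fetch: run the getter

>>> class C(object):
        def _age(self):
            return 40
        age = FetchProperty(_age)

>>> C().age                              # Routed through FetchProperty.__get__
40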
Because __getattribute__ and descriptors are too substantial to cover well here, we’ll
defer the rest of their coverage, as well as much more on properties, to Chapter 38 in
the final part of this book. We’ll also employ them in examples in Chapter 39 and study
how they factor into inheritance in Chapter 40.
Other Class Changes and Extensions
As mentioned, we’re also postponing coverage of the super built-in—an additional
major new-style class extension that relies on its MRO—until the end of this chapter.
Before we get there, though, we’re going to explore additional class-related changes
and extensions that are not necessarily bound to new-style classes, but were introduced
at roughly the same time: static and class methods, decorators, and more.
Many of the changes and feature additions of new-style classes integrate with the notion
of subclassable types mentioned earlier in this chapter, because subclassable types and
new-style classes were introduced in conjunction with a merging of the type/class dichotomy in Python 2.2 and beyond. As we’ve seen, in 3.X, this merging is complete:
classes are now types, and types are classes, and Python classes today still reflect both
that conceptual merging and its implementation.
Along with these changes, Python also grew a more coherent and generalized protocol
for coding metaclasses—classes that subclass the type object, intercept class creation
calls, and may provide behavior acquired by classes. Accordingly, they provide a well-
defined hook for management and augmentation of class objects. They are also an
advanced topic that is optional for most Python programmers, so we’ll postpone further
details here. We’ll glimpse metaclasses again later in this chapter in conjunction with
class decorators—a feature whose roles often overlap—but we’ll postpone their full
coverage until Chapter 40, in the final part of this book. For our purpose here, let’s
move on to a handful of additional class-related extensions.
Static and Class Methods
As of Python 2.2, it is possible to define two kinds of methods within a class that can
be called without an instance: static methods work roughly like simple instance-less
functions inside a class, and class methods are passed a class instead of an instance.
Both are similar to tools in other languages (e.g., C++ static methods). Although this feature was added in conjunction with the new-style classes discussed in the prior sections, static and class methods work for classic classes too.
To enable these method modes, you must call special built-in functions named
staticmethod and classmethod within the class, or invoke them with the special @name
decoration syntax we’ll meet later in this chapter. These functions are required to enable
these special method modes in Python 2.X, and are generally needed in 3.X. In Python
3.X, a staticmethod declaration is not required for instance-less methods called only
through a class name, but is still required if such methods are called through instances.
Why the Special Methods?
As we’ve learned, a class’s method is normally passed an instance object in its first
argument, to serve as the implied subject of the method call—that’s the “object” in
“object-oriented programming.” Today, though, there are two ways to modify this
model. Before I explain what they are, I should explain why this might matter to you.
Sometimes, programs need to process data associated with classes instead of instances.
Consider keeping track of the number of instances created from a class, or maintaining
a list of all of a class’s instances that are currently in memory. This type of information
and its processing are associated with the class rather than its instances. That is, the
information is usually stored on the class itself and processed apart from any instance.
For such tasks, simple functions coded outside a class can often suffice—because they
can access class attributes through the class name, they have access to class data and
never require access to an instance. However, to better associate such code with a class,
and to allow such processing to be customized with inheritance as usual, it would be
better to code these types of functions inside the class itself. To make this work, we need methods in a class that are not passed, and do not expect, a self instance argument.
Python supports such goals with the notion of static methods—simple functions with
no self argument that are nested in a class and are designed to work on class attributes
instead of instance attributes. Static methods never receive an automatic self argument,
whether called through a class or an instance. They usually keep track of information
that spans all instances, rather than providing behavior for instances.
Although less commonly used, Python also supports the notion of class methods—methods of a class that are passed a class object in their first argument instead of an
instance, regardless of whether they are called through an instance or a class. Such
methods can access class data through their class argument—what we’ve called self
thus far—even if called through an instance. Normal methods, now known in formal
circles as instance methods, still receive a subject instance when called; static and class
methods do not.
Static Methods in 2.X and 3.X
The concept of static methods is the same in both Python 2.X and 3.X, but its implementation requirements have evolved somewhat in Python 3.X. Since this book covers
both versions, I need to explain the differences in the two underlying models before we
get to the code.
Really, we already began this story in the preceding chapter, when we explored the
notion of unbound methods. Recall that both Python 2.X and 3.X always pass an instance to a method that is called through an instance. However, Python 3.X treats
methods fetched directly from a class differently than 2.X—a difference in Python lines
that has nothing to do with new-style classes:
• Both Python 2.X and 3.X produce a bound method when a method is fetched through an instance.
• In Python 2.X, fetching a method from a class produces an unbound method, which cannot be called without manually passing an instance.
• In Python 3.X, fetching a method from a class produces a simple function, which can be called normally with no instance present.
In other words, Python 2.X class methods always require an instance to be passed in, whether they are called through an instance or a class. By contrast, in Python 3.X we are required to pass an instance to a method only if the method expects one—methods that do not include an instance argument can be called through the class without passing an instance. That is, 3.X allows simple functions in a class, as long as they do not expect and are not passed an instance argument. The net effect is that:
• In Python 2.X, we must always declare a method as static in order to call it without an instance, whether it is called through a class or an instance.
• In Python 3.X, we need not declare such methods as static if they will be called through a class only, but we must do so in order to call them through an instance.
To illustrate, suppose we want to use class attributes to count how many instances are
generated from a class. The following file, spam.py, makes a first attempt—its class has
a counter stored as a class attribute, a constructor that bumps up the counter by one
each time a new instance is created, and a method that displays the counter’s value.
Remember, class attributes are shared by all instances. Therefore, storing the counter
in the class object itself ensures that it effectively spans all instances:
class Spam:
    numInstances = 0
    def __init__(self):
        Spam.numInstances = Spam.numInstances + 1
    def printNumInstances():
        print("Number of instances created: %s" % Spam.numInstances)
The printNumInstances method is designed to process class data, not instance data—
it’s about all the instances, not any one in particular. Because of that, we want to be
able to call it without having to pass an instance. Indeed, we don’t want to make an
instance to fetch the number of instances, because this would change the number of
instances we’re trying to fetch! In other words, we want a self-less “static” method.
Whether this code’s printNumInstances works or not, though, depends on which
Python you use, and which way you call the method—through the class or through an
instance. In 2.X, calls to a self-less method function through both the class and instances fail (as usual, I’ve omitted some error text here for space):
C:\code> c:\python27\python
>>> from spam import Spam
>>> a = Spam() # Cannot call unbound class methods in 2.X
>>> b = Spam() # Methods expect a self object by default
>>> c = Spam()
>>> Spam.printNumInstances()
TypeError: unbound method printNumInstances() must be called with Spam instance
as first argument (got nothing instead)
>>> a.printNumInstances()
TypeError: printNumInstances() takes no arguments (1 given)
The problem here is that unbound instance methods aren’t exactly the same as simple functions in 2.X. Even though there are no arguments in the def header, the method still expects an instance to be passed in when it’s called, because the function is associated with a class. In Python 3.X, calls to self-less methods made through classes work, but calls from instances fail:
work, but calls from instances fail:
C:\code> c:\python33\python
>>> from spam import Spam
>>> a = Spam() # Can call functions in class in 3.X
>>> b = Spam() # Calls through instances still pass a self
>>> c = Spam()
>>> Spam.printNumInstances() # Differs in 3.X
Number of instances created: 3
>>> a.printNumInstances()
TypeError: printNumInstances() takes 0 positional arguments but 1 was given
That is, calls to instance-less methods like printNumInstances made through the class
fail in Python 2.X but work in Python 3.X. On the other hand, calls made through an
instance fail in both Pythons, because an instance is automatically passed to a method
that does not have an argument to receive it:
Spam.printNumInstances() # Fails in 2.X, works in 3.X
instance.printNumInstances() # Fails in both 2.X and 3.X (unless static)
If you’re able to use 3.X and stick with calling self-less methods through classes only,
you already have a static method feature. However, to allow self-less methods to be
called through classes in 2.X and through instances in both 2.X and 3.X, you need to
either adopt other designs or be able to somehow mark such methods as special. Let’s
look at both options in turn.
Static Method Alternatives
Short of marking a self-less method as special, you can sometimes achieve similar
results with different coding structures. For example, if you just want to call functions
that access class members without an instance, perhaps the simplest idea is to use
normal functions outside the class, not class methods. This way, an instance isn’t expected in the call. The following mutation of spam.py illustrates, and works the same
in Python 3.X and 2.X:
def printNumInstances():
    print("Number of instances created: %s" % Spam.numInstances)

class Spam:
    numInstances = 0
    def __init__(self):
        Spam.numInstances = Spam.numInstances + 1
C:\code> c:\python33\python
>>> import spam
>>> a = spam.Spam()
>>> b = spam.Spam()
>>> c = spam.Spam()
>>> spam.printNumInstances() # But function may be too far removed
Number of instances created: 3 # And cannot be changed via inheritance
>>> spam.Spam.numInstances
3
Because the class name is accessible to the simple function as a global variable, this
works fine. Also, note that the name of the function becomes global, but only to this
single module; it will not clash with names in other files of the program.
Prior to static methods in Python, this structure was the general prescription. Because
Python already provides modules as a namespace-partitioning tool, one could argue
that there’s not typically any need to package functions in classes unless they implement
object behavior. Simple functions within modules like the one here do much of what
instance-less class methods could, and are already associated with the class because
they live in the same module.
Unfortunately, this approach is still less than ideal. For one thing, it adds to this file’s
scope an extra name that is used only for processing a single class. For another, the
function is much less directly associated with the class by structure; in fact, its definition
could be hundreds of lines away. Perhaps worse, simple functions like this cannot be
customized by inheritance, since they live outside a class’s namespace: subclasses cannot directly replace or extend such a function by redefining it.
We might try to make this example work in a version-neutral way by using a normal
method and always calling it through (or with) an instance, as usual:
class Spam:
    numInstances = 0
    def __init__(self):
        Spam.numInstances = Spam.numInstances + 1
    def printNumInstances(self):
        print("Number of instances created: %s" % Spam.numInstances)
C:\code> c:\python33\python
>>> from spam import Spam
>>> a, b, c = Spam(), Spam(), Spam()
>>> a.printNumInstances()
Number of instances created: 3
>>> Spam.printNumInstances(a)
Number of instances created: 3
>>> Spam().printNumInstances() # But fetching counter changes counter!
Number of instances created: 4
Unfortunately, as mentioned earlier, such an approach is completely unworkable if we
don’t have an instance available, and making an instance changes the class data, as
illustrated in the last line here. A better solution would be to somehow mark a method
inside a class as never requiring an instance. The next section shows how.
Using Static and Class Methods
Today, there is another option for coding simple functions associated with a class that
may be called through either the class or its instances. As of Python 2.2, we can code
classes with static and class methods, neither of which requires an instance argument
to be passed in when invoked. To designate such methods, classes call the built-in
functions staticmethod and classmethod, as hinted in the earlier discussion of new-style
classes. Both mark a function object as special—that is, as requiring no instance if static
and requiring a class argument if a class method. For example, in the file bothmethods.py (which unifies 2.X and 3.X printing with lists, though displays still vary slightly for 2.X classic classes):
# File bothmethods.py
class Methods:
    def imeth(self, x):              # Normal instance method: passed a self
        print([self, x])
    def smeth(x):                    # Static: no instance passed
        print([x])
    def cmeth(cls, x):               # Class: gets class, not instance
        print([cls, x])
    smeth = staticmethod(smeth)      # Make smeth a static method (or @: ahead)
    cmeth = classmethod(cmeth)       # Make cmeth a class method (or @: ahead)
Notice how the last two assignments in this code simply reassign (a.k.a. rebind) the
method names smeth and cmeth. Attributes are created and changed by any assignment
in a class statement, so these final assignments simply overwrite the assignments made
earlier by the defs. As we’ll see in a few moments, the special @ syntax works here as
an alternative to this just as it does for properties—but makes little sense unless you
first understand the assignment form here that it automates.
Technically, Python now supports three kinds of class-related methods, with differing
argument protocols:
• Instance methods, passed a self instance object (the default)
• Static methods, passed no extra object (via staticmethod)
• Class methods, passed a class object (via classmethod, and inherent in metaclasses)
Moreover, Python 3.X extends this model by also allowing simple functions in a class
to serve the role of static methods without extra protocol, when called through a class
object only. Despite its name, the bothmethods.py module illustrates all three method
types, so let’s expand on these in turn.
Instance methods are the normal and default case that we’ve seen in this book. An
instance method must always be called with an instance object. When you call it
through an instance, Python passes the instance to the first (leftmost) argument automatically; when you call it through a class, you must pass along the instance manually:
>>> from bothmethods import Methods # Normal instance methods
>>> obj = Methods() # Callable through instance or class
>>> obj.imeth(1)
[<bothmethods.Methods object at 0x0000000002A15710>, 1]
>>> Methods.imeth(obj, 2)
[<bothmethods.Methods object at 0x0000000002A15710>, 2]
Static methods, by contrast, are called without an instance argument. Unlike simple
functions outside a class, their names are local to the scopes of the classes in which they
are defined, and they may be looked up by inheritance. Instance-less functions can be
called through a class normally in Python 3.X, but never by default in 2.X. Using the
staticmethod built-in allows such methods to also be called through an instance in 3.X
and through both a class and an instance in Python 2.X (that is, the first of the following
works in 3.X without staticmethod, but the second does not):
>>> Methods.smeth(3) # Static method: call through class
[3] # No instance passed or expected
>>> obj.smeth(4) # Static method: call through instance
[4] # Instance not passed
Class methods are similar, but Python automatically passes the class (not an instance)
in to a class method’s first (leftmost) argument, whether it is called through a class or
an instance:
>>> Methods.cmeth(5) # Class method: call through class
[<class 'bothmethods.Methods'>, 5] # Becomes cmeth(Methods, 5)
>>> obj.cmeth(6) # Class method: call through instance
[<class 'bothmethods.Methods'>, 6] # Becomes cmeth(Methods, 6)
In Chapter 40, we’ll also find that metaclass methods—a unique, advanced, and technically distinct method type—behave similarly to the explicitly declared class methods we’re exploring here.
Counting Instances with Static Methods
Now, given these built-ins, here is the static method equivalent of this section’s instance-counting example—it marks the method as special, so it will never be passed an instance automatically:
class Spam:
    numInstances = 0                 # Use static method for class data
    def __init__(self):
        Spam.numInstances += 1
    def printNumInstances():
        print("Number of instances: %s" % Spam.numInstances)
    printNumInstances = staticmethod(printNumInstances)
Using the static method built-in, our code now allows the self-less method to be called
through the class or any instance of it, in both Python 2.X and 3.X:
>>> from spam_static import Spam
>>> a = Spam()
>>> b = Spam()
>>> c = Spam()
>>> Spam.printNumInstances() # Call as simple function
Number of instances: 3
>>> a.printNumInstances() # Instance argument not passed
Number of instances: 3
Compared to simply moving printNumInstances outside the class, as prescribed earlier,
this version requires an extra staticmethod call (or an @ line we’ll see ahead). However,
it also localizes the function name in the class scope (so it won’t clash with other names
in the module); moves the function code closer to where it is used (inside the class
statement); and allows subclasses to customize the static method with inheritance—a
more convenient and powerful approach than importing functions from the files in
which superclasses are coded. The following subclass and new testing session illustrate
(be sure to start a new session after changing files, so that your from imports load the
latest version of the file):
class Sub(Spam):
    def printNumInstances():         # Override a static method
        print("Extra stuff...")      # But call back to original
        Spam.printNumInstances()
    printNumInstances = staticmethod(printNumInstances)
>>> from spam_static import Spam, Sub
>>> a = Sub()
>>> b = Sub()
>>> a.printNumInstances() # Call from subclass instance
Extra stuff...
Number of instances: 2
>>> Sub.printNumInstances() # Call from subclass itself
Extra stuff...
Number of instances: 2
>>> Spam.printNumInstances() # Call original version
Number of instances: 2
Moreover, classes can inherit the static method without redefining it—it is run without
an instance, regardless of where it is defined in a class tree:
>>> class Other(Spam): pass # Inherit static method verbatim
>>> c = Other()
>>> c.printNumInstances()
Number of instances: 3
Notice how this also bumps up the superclass’s instance counter, because its constructor is inherited and run—a behavior that begins to encroach on the next section’s subject.
Counting Instances with Class Methods
Interestingly, a class method can do similar work here—the following has the same
behavior as the static method version listed earlier, but it uses a class method that
receives the instance’s class in its first argument. Rather than hardcoding the class
name, the class method uses the automatically passed class object generically:
class Spam:
    numInstances = 0                 # Use class method instead of static
    def __init__(self):
        Spam.numInstances += 1
    def printNumInstances(cls):
        print("Number of instances: %s" % cls.numInstances)
    printNumInstances = classmethod(printNumInstances)
This class is used in the same way as the prior versions, but its printNumInstances
method receives the Spam class, not the instance, when called from both the class and
an instance:
>>> from spam_class import Spam
>>> a, b = Spam(), Spam()
>>> a.printNumInstances() # Passes class to first argument
Number of instances: 2
>>> Spam.printNumInstances() # Also passes class to first argument
Number of instances: 2
When using class methods, though, keep in mind that they receive the most specific
(i.e., lowest) class of the call’s subject. This has some subtle implications when trying
to update class data through the passed-in class. For example, if in module
spam_class.py we subclass to customize as before, augment Spam.printNumInstances to
also display its cls argument, and start a new testing session:
class Spam:
    numInstances = 0                 # Trace class passed in
    def __init__(self):
        Spam.numInstances += 1
    def printNumInstances(cls):
        print("Number of instances: %s %s" % (cls.numInstances, cls))
    printNumInstances = classmethod(printNumInstances)

class Sub(Spam):
    def printNumInstances(cls):      # Override a class method
        print("Extra stuff...", cls) # But call back to original
        Spam.printNumInstances()
    printNumInstances = classmethod(printNumInstances)

class Other(Spam): pass              # Inherit class method verbatim
The lowest class is passed in whenever a class method is run, even for subclasses that
have no class methods of their own:
>>> from spam_class import Spam, Sub, Other
>>> x = Sub()
>>> y = Spam()
>>> x.printNumInstances() # Call from subclass instance
Extra stuff... <class 'spam_class.Sub'>
Number of instances: 2 <class 'spam_class.Spam'>
>>> Sub.printNumInstances() # Call from subclass itself
Extra stuff... <class 'spam_class.Sub'>
Number of instances: 2 <class 'spam_class.Spam'>
>>> y.printNumInstances() # Call from superclass instance
Number of instances: 2 <class 'spam_class.Spam'>
In the first call here, a class method call is made through an instance of the Sub subclass,
and Python passes the lowest class, Sub, to the class method. All is well in this case—
since Sub’s redefinition of the method calls the Spam superclass’s version explicitly, the
superclass method in Spam receives its own class in its first argument. But watch what
happens for an object that inherits the class method verbatim:
>>> z = Other() # Call from lower sub's instance
>>> z.printNumInstances()
Number of instances: 3 <class 'spam_class.Other'>
This last call here passes Other to Spam’s class method. This works in this example
because fetching the counter finds it in Spam by inheritance. If this method tried to
assign to the passed class’s data, though, it would update Other, not Spam! In this specific
case, Spam is probably better off hardcoding its own class name to update its data if it
means to count instances of all its subclasses too, rather than relying on the passed-in
class argument.
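For illustration, here is a hedged sketch of that hardcoding fix—the class method still receives the passed-in class, but deliberately ignores it when updating the tree-wide counter:

class Spam:
    numInstances = 0
    def __init__(self):
        self.count()
    def count(cls):
        Spam.numInstances += 1       # Not cls: always update Spam's own data
    count = classmethod(count)

class Other(Spam): pass

x, y = Spam(), Other()
print(Spam.numInstances)             # 2: subclass instances counted here too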
Counting instances per class with class methods
In fact, because class methods always receive the lowest class in an instance’s tree:
• Static methods and explicit class names may be a better solution for processing data local to a class.
• Class methods may be better suited to processing data that may differ for each class in a hierarchy.
Code that needs to manage per-class instance counters, for example, might be best off
leveraging class methods. In the following, the top-level superclass uses a class method
to manage state information that varies for and is stored on each class in the tree—
similar in spirit to the way instance methods manage state information that varies per
class instance:
class Spam:
    numInstances = 0
    def count(cls):                  # Per-class instance counters
        cls.numInstances += 1        # cls is lowest class above instance
    def __init__(self):
        self.count()                 # Passes self.__class__ to count
    count = classmethod(count)

class Sub(Spam):
    numInstances = 0
    def __init__(self):              # Redefines __init__
        Spam.__init__(self)

class Other(Spam):                   # Inherits __init__
    numInstances = 0
>>> from spam_class2 import Spam, Sub, Other
>>> x = Spam()
>>> y1, y2 = Sub(), Sub()
>>> z1, z2, z3 = Other(), Other(), Other()
>>> x.numInstances, y1.numInstances, z1.numInstances # Per-class data!
(1, 2, 3)
>>> Spam.numInstances, Sub.numInstances, Other.numInstances
(1, 2, 3)
Static and class methods have additional advanced roles, which we will finesse here;
see other resources for more use cases. In recent Python versions, though, the static
and class method designations have become even simpler with the advent of function
decoration syntax—a way to apply one function to another that has roles well beyond
the static method use case that was its initial motivation. This syntax also allows us to
augment classes in Python 2.X and 3.X—to initialize data like the numInstances counter
in the last example, for instance. The next section explains how.
For a postscript on Python’s method types, be sure to watch for coverage of metaclass methods in Chapter 40—because these are designed to process a class that is an instance of a metaclass, they turn out to be very similar to the class methods defined here, but require no classmethod declaration, and apply only to the shadowy metaclass realm.
Decorators and Metaclasses: Part 1
Because the staticmethod and classmethod call technique described in the prior section
initially seemed obscure to some observers, a device was eventually added to make the
operation simpler. Python decorators—similar to the notion and syntax of annotations
in Java—both addressed this specific need and provided a general tool for adding logic
that manages both functions and classes, or later calls to them.
This is called a “decoration,” but in more concrete terms is really just a way to run extra
processing steps at function and class definition time with explicit syntax. It comes in
two flavors:
• Function decorators—the initial entry in this set, added in Python 2.4—augment function definitions. They specify special operation modes for both simple functions and classes’ methods by wrapping them in an extra layer of logic implemented as another function, usually called a metafunction.
• Class decorators—a later extension, added in Python 2.6 and 3.0—augment class definitions. They do the same for classes, adding support for management of whole objects and their interfaces. Though perhaps simpler, they often overlap in roles with metaclasses.
Function decorators turn out to be very general tools: they are useful for adding many
types of logic to functions besides the static and class method use cases. For instance,
they may be used to augment functions with code that logs calls made to them, checks
the types of passed arguments during debugging, and so on. Function decorators can
be used to manage either functions themselves or later calls to them. In the latter mode, function decorators are similar to the delegation design pattern we explored in Chapter 31, but they are designed to augment a specific function or method call, not an entire object interface.
Python provides a few built-in function decorators for operations such as marking static
and class methods and defining properties (as sketched earlier, the property built-in
works as a decorator automatically), but programmers can also code arbitrary decorators of their own. Although they are not strictly tied to classes, user-defined function
decorators often are coded as classes to save the original functions for later dispatch,
along with other data as state information.
This proved such a useful hook that it was extended in Python 2.6, 2.7, and 3.X—class
decorators bring augmentation to classes too, and are more directly tied to the class
model. Like their function cohorts, class decorators may manage classes themselves or
later instance creation calls, and often employ delegation in the latter mode. As we’ll
find, their roles also often overlap with metaclasses; when they do, the newer class
decorators may offer a more lightweight way to achieve the same goals.
Function Decorator Basics
Syntactically, a function decorator is a sort of runtime declaration about the function
that follows. A function decorator is coded on a line by itself just before the def statement that defines a function or method. It consists of the @ symbol, followed by what
we call a metafunction—a function (or other callable object) that manages another
function. Static methods since Python 2.4, for example, may be coded with decorator
syntax like this:
class C:
    @staticmethod                    # Function decoration syntax
    def meth():
        ...
Internally, this syntax has the same effect as the following—passing the function
through the decorator and assigning the result back to the original name:
class C:
    def meth():
        ...
    meth = staticmethod(meth)        # Name rebinding equivalent
Decoration rebinds the method name to the decorator’s result. The net effect is that calling the method function’s name later actually triggers the result of its staticmethod decorator first. Because a decorator can return any sort of object, this allows the
decorator to insert a layer of logic to be run on every call. The decorator function is free
to return either the original function itself, or a new proxy object that saves the original
function passed to the decorator to be invoked indirectly after the extra logic layer runs.
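As a minimal hedged sketch of those two outcomes—the names here are illustrative only:

def passthrough(func):               # Returns the original function unchanged
    print('decorating %s' % func.__name__)
    return func

def proxy(func):                     # Returns a new callable wrapping the original
    def wrapper(*args):
        print('extra logic runs first')
        return func(*args)           # Then dispatch to the saved original
    return wrapper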
With this addition, here’s a better way to code our static method example from the
prior section in either Python 2.X or 3.X:
class Spam:
    numInstances = 0
    def __init__(self):
        Spam.numInstances = Spam.numInstances + 1
    @staticmethod
    def printNumInstances():
        print("Number of instances created: %s" % Spam.numInstances)
>>> from spam_static_deco import Spam
>>> a = Spam()
>>> b = Spam()
>>> c = Spam()
>>> Spam.printNumInstances() # Calls from classes and instances work
Number of instances created: 3
>>> a.printNumInstances()
Number of instances created: 3
Because they also accept and return functions, the classmethod and property built-in
functions may be used as decorators in the same way—as in the following mutation of
the prior bothmethods.py:
# File bothmethods_decorators.py
class Methods(object):               # object needed in 2.X for property setters
    def imeth(self, x):              # Normal instance method: passed a self
        print([self, x])

    @staticmethod
    def smeth(x):                    # Static: no instance passed
        print([x])

    @classmethod
    def cmeth(cls, x):               # Class: gets class, not instance
        print([cls, x])

    @property                        # Property: computed on fetch
    def name(self):
        return 'Bob ' + self.__class__.__name__
>>> from bothmethods_decorators import Methods
>>> obj = Methods()
>>> obj.imeth(1)
[<bothmethods_decorators.Methods object at 0x0000000002A256A0>, 1]
>>> obj.smeth(2)
[2]
>>> obj.cmeth(3)
[<class 'bothmethods_decorators.Methods'>, 3]
>>> obj.name
'Bob Methods'
Keep in mind that staticmethod and its kin here are still built-in functions; they may be used in decoration syntax, just because they take a function as an argument and return a callable to which the original function name can be rebound. In fact, any such function can be used in this way—even user-defined functions we code ourselves, as the next section explains.
A First Look at User-Defined Function Decorators
Although Python provides a handful of built-in functions that can be used as decorators,
we can also write custom decorators of our own. Because of their wide utility, we’re
going to devote an entire chapter to coding decorators in the final part of this book. As
a quick example, though, let’s look at a simple user-defined decorator at work.
Recall from Chapter 30 that the __call__ operator overloading method implements a
function-call interface for class instances. The following code uses this to define a call
proxy class that saves the decorated function in the instance and catches calls to the
original name. Because this is a class, it also has state information—a counter of calls
made:
class tracer:
    def __init__(self, func):        # Remember original, init counter
        self.calls = 0
        self.func = func
    def __call__(self, *args):       # On later calls: add logic, run original
        self.calls += 1
        print('call %s to %s' % (self.calls, self.func.__name__))
        return self.func(*args)

@tracer                              # Same as spam = tracer(spam)
def spam(a, b, c):                   # Wrap spam in a decorator object
    return a + b + c

print(spam(1, 2, 3))                 # Really calls the tracer wrapper object
print(spam('a', 'b', 'c'))           # Invokes __call__ in class
Because the spam function is run through the tracer decorator, when the original
spam name is called it actually triggers the __call__ method in the class. This method
counts and logs the call, and then dispatches it to the original wrapped function. Note
how the *name argument syntax is used to pack and unpack the passed-in arguments;
because of this, this decorator can be used to wrap any function with any number of
positional arguments.
The net effect, again, is to add a layer of logic to the original spam function. Here is the
script’s 3.X and 2.X output—the first line comes from the tracer class, and the second
gives the return value of the spam function itself:
c:\code> python tracer1.py
call 1 to spam
6
call 2 to spam
abc
Trace through this example’s code for more insight. As it is, this decorator works for any function that takes positional arguments, but it does not handle keyword arguments, and cannot decorate class-level method functions (in short, for methods its __call__ would be passed a tracer instance only). As we’ll see in Part VIII, there are a variety of ways to code function decorators, including nested def statements; some of the alternatives are better suited to methods than the version shown here.
For example, by using nested functions with enclosing scopes for state, instead of callable class instances with attributes, function decorators often become more broadly applicable to class-level methods too. We’ll postpone the full details on this, but here’s a brief look at this closure-based coding model; it uses function attributes for counter state for portability, but could leverage variables and nonlocal instead in 3.X only:
def tracer(func):                    # Remember original
    def oncall(*args):               # On later calls
        oncall.calls += 1
        print('call %s to %s' % (oncall.calls, func.__name__))
        return func(*args)
    oncall.calls = 0
    return oncall

class C:
    @tracer
    def spam(self, a, b, c): return a + b + c

x = C()
print(x.spam(1, 2, 3))
print(x.spam('a', 'b', 'c'))         # Same output as tracer1 (in tracer2.py)
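A hedged variant of the same closure extends it to keyword arguments with **kargs pack-and-unpack syntax—a sketch only; Part VIII explores fuller alternatives:

def tracer(func):                    # Sketch: positional and keyword support
    def oncall(*args, **kargs):      # Pack both argument kinds
        oncall.calls += 1
        print('call %s to %s' % (oncall.calls, func.__name__))
        return func(*args, **kargs)  # Unpack both when dispatching
    oncall.calls = 0
    return oncall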
A First Look at Class Decorators and Metaclasses
Function decorators turned out to be so useful that Python 2.6 and 3.0 expanded the
model, allowing decorators to be applied to classes as well as functions. In short, class
decorators are similar to function decorators, but they are run at the end of a class
statement to rebind a class name to a callable. As such, they can be used to either
manage classes just after they are created, or insert a layer of wrapper logic to manage
instances when they are later created. Symbolically, the code structure:
def decorator(aClass): ...
@decorator # Class decoration syntax
class C: ...
is mapped to the following equivalent:
def decorator(aClass): ...
class C: ... # Name rebinding equivalent
C = decorator(C)
The class decorator is free to augment the class itself, or return a proxy object that intercepts later instance construction calls. For example, in the code of the section “Counting instances per class with class methods” on page 1033, we could use this hook to automatically augment the classes with instance counters and any other data required:
def count(aClass):
    aClass.numInstances = 0
    return aClass                    # Return class itself, instead of a wrapper

@count
class Spam: ...                      # Same as Spam = count(Spam)

@count
class Sub(Spam): ...                 # numInstances = 0 not needed here

@count
class Other(Spam): ...
In fact, as coded, this decorator can be applied to classes or functions—it happily returns the object being defined in either context after initializing the object’s attribute:
@count
def spam(): pass # Like spam = count(spam)
@count
class Other: pass # Like Other = count(Other)
spam.numInstances # Both are set to zero
Other.numInstances
Though this decorator manages a function or class itself, as we’ll see later in this book, class decorators can also manage an object’s entire interface by intercepting construction calls, and wrapping the new instance object in a proxy that deploys attribute accessor tools to intercept later requests—a multilevel coding technique we’ll use to implement class attribute privacy in Chapter 39. Here’s a preview of the model:
def decorator(cls):                  # On @ decoration
    class Proxy:
        def __init__(self, *args):   # On instance creation: make a cls
            self.wrapped = cls(*args)
        def __getattr__(self, name): # On attribute fetch: extra ops here
            return getattr(self.wrapped, name)
    return Proxy

@decorator
class C: ...                         # Like C = decorator(C)

X = C()                              # Makes a Proxy that wraps a C, and catches later X.attr
Metaclasses, mentioned briefly earlier, are a similarly advanced class-based tool whose
roles often intersect with those of class decorators. They provide an alternate model,
which routes the creation of a class object to a subclass of the top-level type class, at
the conclusion of a class statement:
class Meta(type):
    def __new__(meta, classname, supers, classdict):
        ...extra logic + class creation via type call...

class C(metaclass=Meta):
    ...my creation routed to Meta... # Like C = Meta('C', (), {...})
In Python 2.X, the effect is the same, but the coding differs—use a class attribute instead
of a keyword argument in the class header:
class C:
    __metaclass__ = Meta
    ...my creation routed to Meta...
In either line, Python calls a class’s metaclass to create the new class object, passing in
the data defined during the class statement’s run; in 2.X, the metaclass simply defaults
to the classic class creator:
classname = Meta(classname, superclasses, attributedict)
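To make this protocol concrete, here is a minimal runnable sketch in the 3.X form; the extra attribute it injects is purely illustrative:

class Meta(type):
    def __new__(meta, classname, supers, classdict):
        classdict['created_by'] = meta.__name__      # Illustrative extra logic
        return type.__new__(meta, classname, supers, classdict)

class C(metaclass=Meta):
    pass

print(C.created_by)                  # 'Meta': added at class creation time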
To assume control of the creation or initialization of a new class object, a metaclass
generally redefines the __new__ or __init__ method of the type class that normally
intercepts this call. The net effect, as with class decorators, is to define code to be run
automatically at class creation time. Here, this step binds the class name to the result
of a call to a user-defined metaclass. In fact, a metaclass need not be a class at all—a
possibility we’ll explore later that blurs some of the distinction between this tool and
decorators, and may even qualify the two as functionally equivalent in many roles.
Both schemes, class decorators and metaclasses, are free to augment a class or return an arbitrary object to replace it—a protocol with almost limitless class-based customization possibilities. As we’ll see later, metaclasses may also define methods that process their instance classes, rather than normal instances of them—a technique that’s similar to class methods, and might be emulated in spirit by methods and data in class decorator proxies, or even a class decorator that returns a metaclass instance. Such mind-binding concepts will require Chapter 40’s conceptual groundwork (and quite possibly sedation!).
For More Details
Naturally, there’s much more to the decorator and metaclass stories than I’ve shown
here. Although they are a general mechanism whose usage may be required by some
packages, coding new user-defined decorators and metaclasses is an advanced topic of
interest primarily to tool writers, not application programmers. Because of this, we’ll
defer additional coverage until the final and optional part of this book:
• Chapter 38 shows how to code properties using function decorator syntax in more depth.
• Chapter 39 has much more on decorators, including more comprehensive examples.
• Chapter 40 covers metaclasses, and more on the class and instance management story.
Although these chapters cover advanced topics, they’ll also provide us with a chance
to see Python at work in more substantial examples than much of the rest of the book
was able to provide. For now, let’s move on to our final class-related topic.
The super Built-in Function: For Better or Worse?
So far, I’ve mentioned Python’s super built-in function only briefly in passing because it is relatively uncommon and may even be controversial to use. Given this call’s increased visibility in recent years, though, it merits some further elaboration in this edition. Besides introducing super, this section also serves as a language design case study to close out a chapter on so many tools whose presence may to some seem curious in a scripting language like Python.

Some of this section calls this proliferation of tools into question, and I encourage you to judge any subjective content here for yourself (and we’ll return to such things at the end of this book after we’ve expanded on other advanced tools such as metaclasses and descriptors). Still, Python’s rapid growth rate in recent years represents a strategic decision point for its community going forward, and super seems as good a representative example as any.
The Great super Debate
As noted in Chapter 28 and Chapter 29, Python has a super built-in function that can
be used to invoke superclass methods generically, but was deferred until this point of
the book. This was deliberate—because super has substantial downsides in typical
code, and a sole use case that seems obscure and complex to many observers, most
beginners are better served by the traditional explicit-name call scheme used so far. See
the sidebar “What About super?” on page 831 in Chapter 28 for a brief summary of
the rationale for this policy.
The Python community itself seems split on this subject, with online articles about it
running the gamut from “Python’s Super Considered Harmful” to “Python’s super()
considered super!”3 Frankly, in my live classes this call seems to be most often of interest
to Java programmers starting to use Python anew, because of its conceptual similarity
to a tool in that language (many a new Python feature ultimately owes its existence to
programmers of other languages bringing their old habits to a new model). Python’s
super is not Java’s—it translates differently to Python’s multiple inheritance, and has
3. Both are opinion pieces in part, but are suggested reading. The first was eventually retitled “Python’s
Super is nifty, but you can’t use it,” and is today at https://fuhm.net/super-harmful. Oddly—and despite
its subjective tone—the second article (“Python’s super() considered super!”) alone somehow found its
way into Python’s official library manual; see its link in the manual’s super section...and consider
demanding that differing opinions be represented more evenly in your tools’ documentation, or omitted
altogether. Python’s manuals are not the place for personal opinion and one-sided propaganda!
a use case beyond Java’s—but it has managed to generate both controversy and misunderstanding since its conception.
This book postponed the super call until now (and omitted it almost entirely in prior
editions) because it has significant issues—it’s prohibitively cumbersome to use in 2.X,
differs in form between 2.X and 3.X, is based upon unusual semantics in 3.X, and mixes
poorly with Python’s multiple inheritance and operator overloading in typical Python
code. In fact, as we’ll see, in some code super can actually mask problems, and discourage a more explicit coding style that offers better control.
In its defense, this call does have a valid use case too—cooperative same-named method
dispatch in diamond multiple inheritance trees—but it seems to ask a lot of newcomers.
It requires that super be used universally and consistently (if not neurotically), much
like __slots__ discussed earlier; relies on the arguably obscure MRO algorithm to order
calls; and addresses a use case that seems far more the exception than the norm in
Python programs. In this role, super seems an advanced tool based upon esoteric principles, which may be beyond much of Python’s audience, and seems artificial to real
program goals. That aside, its expectation of universal use seems unrealistic for the vast
amount of existing Python code.
Because of all these factors, this introductory-level book has preferred the traditional
explicit-name call scheme thus far and recommends the same for newcomers. You’re
better off learning the traditional scheme first, and might be better off sticking with
that in general, rather than using an extra special-case tool that may not work in some
contexts, and relies on arcane magic in the valid but atypical use case it addresses. This
is not just your author’s opinion; despite its advocate’s best intentions, super is not
widely recognized as “best practice” in Python today, for completely valid reasons.
On the other hand, just as for other tools the increasing use of this call in Python code
in recent years makes it no longer optional for many Python programmers—the first
time you see it, it’s officially mandatory! For readers who may wish to experiment with
super, and for other readers who may have it imposed upon them, this section provides
a brief look at this tool and its rationale—beginning with alternatives to it.
Traditional Superclass Call Form: Portable, General
In general, this book’s examples prefer to call back to superclass methods when needed
by naming the superclass explicitly, because this technique is traditional in Python,
because it works the same in both Python 2.X and 3.X, and because it sidesteps limitations and complexities related to this call in both 2.X and 3.X. As shown earlier, the
traditional superclass method call scheme to augment a superclass method works as
follows:
>>> class C:                         # In Python 2.X and 3.X
        def act(self):
            print('spam')

>>> class D(C):
        def act(self):
            C.act(self)              # Name superclass explicitly, pass self
            print('eggs')

>>> X = D()
>>> X.act()
spam
eggs
This form works the same in 2.X and 3.X, follows Python’s normal method call mapping model, applies to all inheritance tree forms, and does not lead to confusing behavior when operator overloading is used. To see why these distinctions matter, let’s see how super compares.
Basic super Usage and Its Tradeoffs
In this section, we’ll both introduce super in basic, single-inheritance mode, and look
at its perceived downsides in this role. As we’ll find, in this context super does work as
advertised, but is not much different from traditional calls, relies on unusual semantics,
and is cumbersome to deploy in 2.X. More critically, as soon as your classes grow to
use multiple inheritance, this super usage mode can both mask problems in your code
and route calls in ways you may not expect.
Odd semantics: A magic proxy in Python 3.X
The super built-in actually has two intended roles. The more esoteric of these—cooperative multiple inheritance dispatch protocols in diamond multiple-inheritance trees (yes, a mouthful!)—relies on the 3.X MRO, was borrowed from the Dylan language, and will be covered later in this section.

The role we’re interested in here is more commonly used, and more frequently requested by people with Java backgrounds—to allow superclasses to be named generically in inheritance trees. This is intended to promote simpler code maintenance, and to avoid having to type long superclass reference paths in calls. In Python 3.X, this call seems at least at first glance to achieve this purpose well:
>>> class C:                         # In Python 3.X (only: see 2.X super form ahead)
        def act(self):
            print('spam')

>>> class D(C):
        def act(self):
            super().act()            # Reference superclass generically, omit self
            print('eggs')

>>> X = D()
>>> X.act()
spam
eggs
This works, and minimizes code changes—you don’t need to update the call if D’s
superclass changes in the future. One of the biggest downsides of this call in 3.X,
though, is its reliance on deep magic: though prone to change, it operates today by
inspecting the call stack in order to automatically locate the self argument and find
the superclass, and pairs the two in a special proxy object that routes the later call to
the superclass version of the method. If that sounds complicated and strange, it’s because
it is. In fact, this call form doesn’t work at all outside the context of a class’s
method:
>>> super # A "magic" proxy object that routes later calls
<class 'super'>
>>> super()
SystemError: super(): no arguments
>>> class E(C):
        def method(self):                        # self is implicit in super...only!
            proxy = super()                      # This form has no meaning outside a method
            print(proxy)                         # Show the normally hidden proxy object
            proxy.act()                          # No arguments: implicitly calls superclass method!
>>> E().method()
<super: <class 'E'>, <E object>>
spam
Really, this call’s semantics resemble nothing else in Python—it’s neither a bound nor
unbound method, and somehow finds a self even though you omit one in the call. In
single inheritance trees, a superclass is available from self via the path
self.__class__.__bases__[0], but the heavily implicit nature of this call makes this
difficult to see, and even flies in the face of Python’s explicit self policy that holds true
everywhere else. That is, this call violates a fundamental Python idiom for a single use
case. It also soundly contradicts Python’s longstanding EIBTI design rule (run an
“import this” for more on this rule).
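To make the explicit path concrete, here is a minimal sketch—not one of the book’s
examples—that spells out in a single-inheritance tree the lookup super performs implicitly:

class C:
    def act(self):
        print('spam')

class D(C):
    def act(self):
        self.__class__.__bases__[0].act(self)    # Explicit form of what super() hides
        print('eggs')

D().act()                                        # Prints spam, then eggs

Note that this spelling assumes a single-inheritance tree and an instance made directly
from D; it’s shown only to expose the machinery, not as a recommended idiom.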
Pitfall: Adding multiple inheritance naively
Besides its unusual semantics, even in 3.X this super role applies most directly to single
inheritance trees, and can become problematic as soon as classes employ multiple inheritance
with traditionally coded classes. This seems a major limitation of scope; due
to the utility of mix-in classes in Python, multiple inheritance from disjoint and independent
superclasses is probably more the norm than the exception in realistic code.
The super call seems a recipe for disaster in classes coded to naively use its basic mode,
without allowing for its much more subtle implications in multiple inheritance trees.
The following illustrates the trap. This code begins its life happily deploying super in
single-inheritance mode to invoke a method one level up from C:
>>> class A:                                     # In Python 3.X
        def act(self): print('A')

>>> class B:
        def act(self): print('B')
>>> class C(A):
        def act(self):
            super().act()                        # super applied to a single-inheritance tree
>>> X = C()
>>> X.act()
A
If such classes later grow to use more than one superclass, though, super can become
error-prone, and even unusable—it does not raise an exception for multiple inheritance
trees, but will naively pick just the leftmost superclass having the method being run
(technically, the first per the MRO), which may or may not be the one that you want:
>>> class C(A, B):                               # Add a B mix-in class with the same method
        def act(self):
            super().act()                        # Doesn't fail on multi-inher, but picks just one!
>>> X = C()
>>> X.act()
A
>>> class C(B, A):
        def act(self):
            super().act()                        # If B is listed first, A.act() is no longer run!
>>> X = C()
>>> X.act()
B
Perhaps worse, this silently masks the fact that you should probably be selecting superclasses
explicitly in this case, as we learned earlier in both this chapter and its predecessor.
In other words, super usage may obscure a common source of errors in Python
—one so common that it shows up again in this part’s “Gotchas.” If you may need to
use direct calls later, why not use them earlier too?
>>> class C(A, B):                               # Traditional form
        def act(self):                           # You probably need to be more explicit here
            A.act(self)                          # This form handles both single and multiple inher
            B.act(self)                          # And works the same in both Python 3.X and 2.X

>>> X = C()                                      # So why use the super() special case at all?
>>> X.act()
A
B
As we’ll see in a few moments, you might also be able to address such cases by deploying
super calls in every class of the tree. But that’s also one of the biggest downsides of
super—why code it in every class, when it’s usually not needed, and when using the
preceding simpler traditional form in a single class will usually suffice? Especially in
existing code—and new code that uses existing code—this super requirement seems
harsh, if not unrealistic.
Much more subtly, as we’ll also see ahead, once you step up to multiple inheritance
calls this way, the super calls in your code might not invoke the class you expect them
to. They’ll be routed per the MRO order, which, depending on where else super might
be used, may invoke a method in a class that is not the caller’s superclass at all—an
implicit ordering that might make for interesting debugging sessions! Unless you completely
understand what super means once multiple inheritance is introduced, you may
be better off not deploying it in single-inheritance mode either.
This coding situation isn’t nearly as abstract as it may seem. Here’s a real-world example
of such a case, taken from the PyMailGUI case study in Programming Python—the
following very typical Python classes use multiple inheritance to mix in both application
logic and window tools from independent, standalone classes, and hence must invoke
both superclass constructors explicitly with direct calls by name. As coded, a
super().__init__() here would run only one constructor, and adding super throughout
this example’s disjoint class trees would be more work, would be no simpler, and
wouldn’t make sense in tools meant for arbitrary deployment in clients that may use
super or not:
class PyMailServerWindow(PyMailServer, windows.MainWindow):
    "a Tk, with extra protocol and mixed-in methods"
    def __init__(self):
        windows.MainWindow.__init__(self, appname, srvrname)
        PyMailServer.__init__(self)

class PyMailFileWindow(PyMailFile, windows.PopupWindow):
    "a Toplevel, with extra protocol and mixed-in methods"
    def __init__(self, filename):
        windows.PopupWindow.__init__(self, appname, filename)
        PyMailFile.__init__(self, filename)
The crucial point here is that using super for just the single inheritance cases where it
applies most clearly is a potential source of error and confusion, and means that programmers
must remember two ways to accomplish the same goal, when just one—
explicit direct calls—could suffice for all cases.
In other words, unless you can be sure that you will never add a second superclass to
a class in a tree over your software’s entire lifespan, you cannot use super in single-
inheritance mode without understanding and allowing for its much more sophisticated
role in multiple-inheritance trees. We’ll discuss the latter ahead, but it’s not optional
if you deploy super at all.
From a more practical view, it’s also not clear that the trivial amount of code maintenance
that this super role is envisioned to avoid fully justifies its presence. In Python
practice, superclass names in headers are rarely changed; when they are, there are usually
at most a very small number of superclass calls to update within the class. And
consider this: if you add a new superclass in the future that doesn’t use super (as in the
preceding example), you’ll have to either wrap it in an adaptor proxy or augment all
the super calls in your class to use the traditional explicit-name call scheme anyhow—
a maintenance task that seems just as likely, but perhaps more error-prone if you’ve
grown to rely on super magic.
Limitation: Operator overloading
As briefly noted in Python’s library manual, super also doesn’t fully work in the presence
of __X__ operator overloading methods. If you study the following code, you’ll see
that direct named calls to overload methods in the superclass operate normally, but
using the super result in an expression fails to dispatch to the superclass’s overload
method:
>>> class C:                                     # In Python 3.X
        def __getitem__(self, ix):               # Indexing overload method
            print('C index')

>>> class D(C):
        def __getitem__(self, ix):               # Redefine to extend here
            print('D index')
            C.__getitem__(self, ix)              # Traditional call form works
            super().__getitem__(ix)              # Direct name calls work too
            super()[ix]                          # But operators do not! (__getattribute__)
>>> X = C()
>>> X[99]
C index
>>> X = D()
>>> X[99]
D index
C index
C index
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in __getitem__
TypeError: 'super' object is not subscriptable
This behavior is due to the very same new-style (and 3.X) class change described earlier
in this chapter (see “Attribute Fetch for Built-ins Skips Instances” on page 987)—because
the proxy object returned by super uses __getattribute__ to catch and dispatch
later method calls, it fails to intercept the automatic __X__ method invocations run by
built-in operations including expressions, as these begin their search in the class instead
of the instance. This may seem less severe than the multiple-inheritance limitation, but
operators should generally work the same as the equivalent method call, especially for
a built-in like this. Not supporting this adds another exception for super users to confront
and remember.
Other languages’ mileage may vary, but in Python, self is explicit, multiple-inheritance
mix-ins and operator overloading are common, and superclass name updates are rare.
Because super adds an odd special case to the language—one with strange semantics,
limited scope, rigid requirements, and questionable reward—most Python programmers
may be better served by the more broadly applicable traditional call scheme. While
super has some advanced applications too that we’ll study ahead, they may be too
obscure to warrant making it a mandatory part of every Python programmer’s toolbox.
Use differs in Python 2.X: Verbose calls
If you are a Python 2.X user reading this dual-version book, you should also know that
the super technique is not portable between Python lines. Its form differs between 2.X
and 3.X—and not just between classic and new-style classes. It’s really a different tool
in 2.X, which cannot run 3.X’s simpler form.
To make this call work in Python 2.X, you must first use new-style classes. Even then,
you must also explicitly pass in the immediate class name and self to super, making
this call so complex and verbose that in most cases it’s probably easier to avoid it
completely, and simply name the superclass explicitly per the previous traditional code
pattern (for brevity, I’ll leave it to readers to consider what changing a class’s own name
means for code maintenance when using the 2.X super form!):
>>> class C(object):                             # In Python 2.X: for new-style classes only
        def act(self):
            print('spam')
>>> class D(C):
        def act(self):
            super(D, self).act()                 # 2.X: different call format - seems too complex
            print('eggs')                        # "D" may be just as much to type/change as "C"!
>>> X = D()
>>> X.act()
spam
eggs
Although you can use the 2.X call form in 3.X for backward compatibility, it’s too
cumbersome to deploy in 3.X-only code, and the more reasonable 3.X form is not usable
in 2.X:
>>> class D(C):
        def act(self):
            super().act()                        # Simpler 3.X call format fails in 2.X
            print('eggs')
>>> X = D()
>>> X.act()
TypeError: super() takes at least 1 argument (0 given)
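Conversely, as just noted, the verbose 2.X form does run under 3.X for backward
compatibility—a quick confirmation, reusing the preceding class C:

>>> class D(C):
        def act(self):
            super(D, self).act()                 # 2.X form works in 3.X too, but is verbose
            print('eggs')

>>> X = D()
>>> X.act()
spam
eggs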
On the other hand, the traditional call form with explicit class names works in 2.X in
both classic and new-style classes, and exactly as it does in 3.X:
>>> class D(C):
        def act(self):
            C.act(self)                          # But traditional pattern works portably
            print('eggs')                        # And may often be simpler in 2.X code
>>> X = D()
>>> X.act()
spam
eggs
So why use a technique that works in only limited contexts instead of one that works
in many more? Though its basis is complex, the next sections attempt to rally support
for the super cause.
The super Upsides: Tree Changes and Dispatch
Having just shown you the downsides of super, I should also confess that I’ve been
tempted to use this call in code that would only ever run on 3.X, and which used a very
long superclass reference path through a module package (that is, mostly for laziness,
but coding brevity can matter too). To be fair, super may still be useful in some use
cases, the chief among which merit a brief introduction here:
Changing class trees at runtime: When a superclass may be changed at runtime, it’s
not possible to hardcode its name in a call expression, but it is possible to dispatch
calls via super.
On the other hand, this case is extremely rare in Python programming, and other
techniques can often be used in this context as well.
Cooperative multiple inheritance method dispatch: When multiple inheritance trees
must dispatch to the same-named method in multiple classes, super can provide a
protocol for orderly call routing.
On the other hand, the class tree must rely upon the ordering of classes by the
MRO—a complex tool in its own right that is artificial to the problem a program
is meant to address—and must be coded or augmented to use super in each version
of the method in the tree to be effective. Such dispatch can also often be imple-
mented in other ways (e.g., via instance state).
As discussed earlier, super can also be used to select a superclass generically as long as
the MRO’s default makes sense, though in traditional code naming a superclass explicitly
is often preferable, and may even be required. Moreover, even valid super use
cases tend to be uncommon in many Python programs—to the point of seeming academic
curiosity to some. The two cases just listed, however, are most often cited as
super rationales, so let’s take a quick look at each.
Runtime Class Changes and super
Superclasses that might be changed dynamically at runtime preclude hardcoding their
names in a subclass’s methods, while super will happily look up the current superclass
dynamically. Still, this case may be too rare in practice to warrant the super model by
itself, and can often be implemented in other ways in the exceptional cases where it is
needed. To illustrate, the following changes the superclass of C dynamically by changing
the subclass’s __bases__ tuple in 3.X:
>>> class X:
        def m(self): print('X.m')

>>> class Y:
        def m(self): print('Y.m')

>>> class C(X):                                  # Start out inheriting from X
        def m(self): super().m()                 # Can't hardcode class name here
>>> i = C()
>>> i.m()
X.m
>>> C.__bases__ = (Y,) # Change superclass at runtime!
>>> i.m()
Y.m
This works (and shares behavior-morphing goals with other deep magic, such as
changing an instance’s __class__), but seems rare in the extreme. Moreover, there may
be other ways to achieve the same effect—perhaps most simply, calling through the
current superclass tuple’s value indirectly: special code to be sure, but only for a very
special case (and perhaps not any more special than implicit routing by MROs):
>>> class C(X):
        def m(self): C.__bases__[0].m(self)      # Special code for a special case
>>> i = C()
>>> i.m()
X.m
>>> C.__bases__ = (Y,) # Same effect, without super()
>>> i.m()
Y.m
Given the preexisting alternatives, this case alone doesn’t seem to justify super, though
in more complex trees, the next rationale—based on the tree’s MRO order instead of
physical superclass links—may apply here as well.
Cooperative Multiple Inheritance Method Dispatch
The second of the use cases listed earlier is the main rationale commonly given for
super, and also borrows from other programming languages (most notably, Dylan),
where its use case may be more common than it is in typical Python code. It generally
applies to diamond pattern multiple inheritance trees, discussed earlier in this chapter,
and allows for cooperative and conformant classes to route calls to a same-named
method coherently among multiple class implementations. Especially for constructors,
which have multiple implementations normally, this can simplify call routing protocol
when used consistently.
In this mode, each super call selects the method from a next class following it in the
MRO ordering of the class of the self subject of a method call. The MRO was introduced
earlier; it’s the path Python follows for inheritance in new-style classes. Because
the MRO’s linear ordering depends on which class self was made from, the order of
method dispatch orchestrated by super can vary per class tree, and visits each class just
once as long as all classes use super to dispatch.
Since every class participates in a diamond under object in 3.X (and 2.X new-style
classes), the applications are broader than you might expect. In fact, some of the earlier
examples that demonstrated super shortcomings in multiple inheritance trees could
use this call to achieve their dispatch goals. To do so, however, super must be used
universally in the class tree to ensure that method call chains are passed on—a fairly
major requirement that may be difficult to enforce in much existing and new code.
The basics: Cooperative super call in action
Let’s take a look at what this role means in code. In this and the following sections,
we’ll both learn how super works, and explore the tradeoffs it implies along the way.
To get started, consider the following traditionally coded Python classes (condensed
somewhat here as usual for space):
>>> class B:
        def __init__(self): print('B.__init__')      # Disjoint class tree branches

>>> class C:
        def __init__(self): print('C.__init__')

>>> class D(B, C): pass
>>> x = D() # Runs leftmost only by default
B.__init__
In this case, superclass tree branches are disjoint (they don’t share a common explicit
ancestor), so subclasses that combine them must call through each superclass by name
—a common situation in much existing Python code that super cannot address directly
without code changes:
>>> class D(B, C):
        def __init__(self):                      # Traditional form
            B.__init__(self)                     # Invoke supers by name
            C.__init__(self)
>>> x = D()
B.__init__
C.__init__
In diamond class tree patterns, though, explicit-name calls may by default trigger the
top-level class’s method more than once, though this can be avoided with additional
protocols (e.g., status markers in the instance):
>>> class A:
        def __init__(self): print('A.__init__')

>>> class B(A):
        def __init__(self): print('B.__init__'); A.__init__(self)

>>> class C(A):
        def __init__(self): print('C.__init__'); A.__init__(self)
>>> x = B()
B.__init__
A.__init__
>>> x = C() # Each super works by itself
C.__init__
A.__init__
>>> class D(B, C): pass # Still runs leftmost only
>>> x = D()
B.__init__
A.__init__
>>> class D(B, C):
        def __init__(self):                      # Traditional form
            B.__init__(self)                     # Invoke both supers by name
            C.__init__(self)
>>> x = D() # But this now invokes A twice!
B.__init__
A.__init__
C.__init__
A.__init__
By contrast, if all classes use super, or are appropriately coerced by proxies to behave
as if they do, the method calls are dispatched according to class order in the MRO, such
that the top-level class’s method is run just once:
>>> class A:
        def __init__(self): print('A.__init__')

>>> class B(A):
        def __init__(self): print('B.__init__'); super().__init__()

>>> class C(A):
        def __init__(self): print('C.__init__'); super().__init__()
>>> x = B() # Runs B.__init__, A is next super in self's B MRO
B.__init__
A.__init__
>>> x = C()
C.__init__
A.__init__
>>> class D(B, C): pass
>>> x = D() # Runs B.__init__, C is next super in self's D MRO!
B.__init__
C.__init__
A.__init__
The real magic behind this is the linear MRO list constructed for the class of self:
because each class appears just once on this list, and because super dispatches to the
next class on this list, it ensures an orderly invocation chain that visits each class just
once. Crucially, the next class following B in the MRO differs depending on the class
of self—it’s A for a B instance, but C for a D instance, accounting for the order of constructors
run:
>>> B.__mro__
(<class '__main__.B'>, <class '__main__.A'>, <class 'object'>)
>>> D.__mro__
(<class '__main__.D'>, <class '__main__.B'>, <class '__main__.C'>,
<class '__main__.A'>, <class 'object'>)
The MRO and its algorithm were presented earlier in this chapter. By selecting a next
class in the MRO sequence, a super call in a class’s method propagates the call through
the tree, so long as all classes do the same. In this mode super does not necessarily
choose a superclass at all; it picks the next in the linearized MRO, which might be a
sibling—or even a lower relative—in the class tree of a given instance. See “Tracing the
MRO” on page 1002 for other examples of the path super dispatch would follow, especially
for nondiamonds.
The preceding works—and may even seem clever at first glance—but its scope may
also appear limited to some. Most Python programs do not rely on the nuances of
diamond pattern multiple inheritance trees (in fact, many Python programmers I’ve
met do not know what the term means!). Moreover, super applies most directly to single
inheritance and cooperative diamond cases, and may seem superfluous for disjoint
nondiamond cases, where we might want to invoke superclass methods selectively or
independently. Even cooperative diamonds can be managed in other ways that may
afford programmers more control than an automatic MRO ordering can. To evaluate
this tool objectively, though, we need to look deeper.
Constraint: Call chain anchor requirement
The super call comes with complexities that may not be apparent on first encounter,
and may even seem initially like features. For example, because all classes inherit from
object in 3.X automatically (and explicitly in 2.X new-style classes), the MRO ordering
can be used even in cases where the diamond is only implicit—in the following, triggering
constructors in independent classes automatically:
>>> class B:
        def __init__(self): print('B.__init__'); super().__init__()

>>> class C:
        def __init__(self): print('C.__init__'); super().__init__()
>>> x = B() # object is an implied super at the end of MRO
B.__init__
>>> x = C()
C.__init__
>>> class D(B, C): pass # Inherits B.__init__ but B's MRO differs for D
>>> x = D() # Runs B.__init__, C is next super in self's D MRO!
B.__init__
C.__init__
Technically, this dispatch model generally requires that the method being called by
super must exist, and must have the same argument signature across the class tree, and
every appearance of the method but the last must use super itself. This prior example
works only because the implied object superclass at the end of the MRO of all three
classes happens to have a compatible __init__ that satisfies these rules:
>>> B.__mro__
(<class '__main__.B'>, <class 'object'>)
>>> D.__mro__
(<class '__main__.D'>, <class '__main__.B'>, <class '__main__.C'>, <class 'object'>)
Here, for a D instance, the next class in the MRO after B is C, which is followed by
object whose __init__ silently accepts the call from C and ends the chain. Thus, B’s
method calls C’s, which ends in object’s version, even though C is not a superclass to B.
Really, though, this example is atypical—and perhaps even lucky. In most cases, no
such suitable default will exist in object, and it may be less trivial to satisfy this model’s
expectations. Most trees will require an explicit—and possibly extra—superclass to
serve the anchoring role that object does here, to accept but not forward the call. Other
trees may require careful design to adhere to this requirement. Moreover, unless Python
optimizes it away, the call to object (or other anchor) defaults at the end of the chain
may also add extra performance costs.
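To illustrate the sort of anchor such trees require, the following sketch—hypothetical
code, not the book’s—adds an explicit root whose method accepts the call but does not
forward it, giving the super chains in its subclasses a place to end:

class Anchor:                                    # Hypothetical explicit anchor class
    def meth(self):
        print('Anchor.meth')                     # Accept the call, but don't forward it

class B(Anchor):
    def meth(self):
        print('B.meth'); super().meth()

class C(Anchor):
    def meth(self):
        print('C.meth'); super().meth()

class D(B, C):
    def meth(self):
        print('D.meth'); super().meth()

D().meth()                                       # D.meth, B.meth, C.meth, Anchor.meth

Without Anchor, the chain would reach object, which has no meth to accept the call.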
By contrast, in such cases direct calls incur neither extra coding requirements nor added
performance cost, and make dispatch more explicit and direct:
>>> class B:
        def __init__(self): print('B.__init__')

>>> class C:
        def __init__(self): print('C.__init__')

>>> class D(B, C):
        def __init__(self): B.__init__(self); C.__init__(self)
>>> x = D()
B.__init__
C.__init__
Scope: An all-or-nothing model
Also keep in mind that traditional classes that were not written to use super in this role
cannot be directly used in such cooperative dispatch trees, as they will not forward calls
along the MRO chain. It’s possible to incorporate such classes with proxies that wrap
the original object and add the requisite super calls, but this imposes both additional
coding requirements and performance costs on the model. Given that there are many
millions of lines of existing Python code that do not use super, this seems a major
detriment.
Watch what happens, for example, if any one class fails to pass along the call chain by
omitting a super, ending the call chain prematurely—like __slots__, super is generally
an all-or-nothing feature:
>>> class B:
        def __init__(self): print('B.__init__'); super().__init__()

>>> class C:
        def __init__(self): print('C.__init__'); super().__init__()

>>> class D(B, C):
        def __init__(self): print('D.__init__'); super().__init__()
>>> X = D()
D.__init__
B.__init__
C.__init__
>>> D.__mro__
(<class '__main__.D'>, <class '__main__.B'>, <class '__main__.C'>, <class 'object'>)
# What if you must use a class that doesn't call super?
>>> class B:
        def __init__(self): print('B.__init__')

>>> class D(B, C):
        def __init__(self): print('D.__init__'); super().__init__()
>>> X = D()
D.__init__
B.__init__ # It's an all-or-nothing tool...
Satisfying this mandatory propagation requirement may be no simpler than direct by-name
calls—which you might still forget, but which you won’t need to require of all
the code your classes employ. As mentioned, it’s possible to adapt a class like B by
inheriting from a proxy class that embeds B instances, but that seems artificial to program
goals, adds an extra call to each wrapped method, is subject to the new-style class
problems we met earlier regarding interface proxies and built-ins, and seems an extraordinary
and even stunning added coding requirement inherent in a model intended to
simplify code.
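To give a rough sense of what such adaptation entails, here is one possible shape for
an embedding proxy—a hypothetical sketch with invented names, not the book’s code:

class B:                                         # Existing class that doesn't call super
    def __init__(self): print('B.__init__')

class BAdapter:                                  # Hypothetical proxy: embeds B, keeps the chain alive
    def __init__(self):
        self.wrapped = B()                       # Run the noncooperative constructor
        super().__init__()                       # Then pass the call along the MRO
    # A real adapter must also route attribute fetches to self.wrapped

class C:
    def __init__(self): print('C.__init__'); super().__init__()

class D(BAdapter, C):
    def __init__(self): print('D.__init__'); super().__init__()

x = D()                                          # D.__init__, B.__init__, C.__init__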
Flexibility: Call ordering assumptions
Routing with super also assumes that you really mean to pass method calls throughout
all your classes per the MRO, which may or may not match your call ordering requirements.
For example, imagine that—irrespective of other inheritance ordering needs—
the following requires that the class C’s version of a given method be run before B’s in
some contexts. If the MRO says otherwise, you’re back to traditional calls, which may
conflict with super usage—in the following, invoking C’s method twice:
# What if method call ordering needs differ from the MRO?
>>> class B:
        def __init__(self): print('B.__init__'); super().__init__()

>>> class C:
        def __init__(self): print('C.__init__'); super().__init__()

>>> class D(B, C):
        def __init__(self): print('D.__init__'); C.__init__(self); B.__init__(self)
>>> X = D()
D.__init__
C.__init__
B.__init__
C.__init__ # It's the MRO xor explicit calls...
Similarly, if you want some methods to not run at all, the super automatic path won’t
apply as directly as explicit calls may, and will make it difficult to take more explicit
control of the dispatch process. In realistic programs with many methods, resources,
and state variables, these seem entirely plausible scenarios. While you could reorder
superclasses in D for this method, that may break other expectations.
Customization: Method replacement
On a related note, the universal deployment expectations of super may make it difficult
for a single class to replace (override) an inherited method altogether. Not passing the
call higher with super—intentionally in this case—works fine for the class itself, but
may break the call chain of trees it’s mixed into, thereby preventing methods elsewhere
in the tree from running. Consider the following tree:
>>> class A:
        def method(self): print('A.method'); super().method()

>>> class B(A):
        def method(self): print('B.method'); super().method()

>>> class C:
        def method(self): print('C.method')      # No super: must anchor the chain!

>>> class D(B, C):
        def method(self): print('D.method'); super().method()
>>> X = D()
>>> X.method()
D.method
B.method
A.method # Dispatch to all per the MRO automatically
C.method
Method replacement here breaks the super model, and probably leads us back to the
traditional form:
# What if a class needs to replace a super's default entirely?
>>> class B(A):
        def method(self): print('B.method')      # Drop super to replace A's method

>>> class D(B, C):
        def method(self): print('D.method'); super().method()
>>> X = D()
>>> X.method()
D.method
B.method # But replacement also breaks the call chain...
>>> class D(B, C):
        def method(self): print('D.method'); B.method(self); C.method(self)
>>> D().method()
D.method
B.method
C.method # It's back to explicit calls...
Once again, the problem with assumptions is that they assume things! Although the
assumption of universal routing might be reasonable for constructors, it would also
seem to conflict with one of the core tenets of OOP—unrestricted subclass customization.
This might suggest restricting super usage to constructors, but even these might
sometimes warrant replacement, and this adds an odd special-case requirement for one
specific context. A tool that can be used only for certain categories of methods might
be seen by some as redundant—and even spurious, given the extra complexity it implies.
Coupling: Application to mix-in classes
Subtly, when we say super selects the next class in the MRO, we really mean the next
class in the MRO that implements the requested method—it technically skips ahead until
it finds a class with the requested name. This matters for independent mix-in classes,
which might be added to arbitrary client trees. Without this skipping-ahead behavior,
such mix-ins wouldn’t work at all—they would otherwise drop the call chain of their
clients’ arbitrary methods, and couldn’t rely on their own super calls to work as expected.
In the following independent branches, for example, C’s call to method is passed on,
even though Mixin, the next class in the C instance’s MRO, doesn’t define that method’s
name. As long as method name sets are disjoint, this just works—the call chains of each
branch can exist independently:
# Mix-ins work for disjoint method sets
>>> class A:
        def other(self): print('A.other')

>>> class Mixin(A):
        def other(self): print('Mixin.other'); super().other()

>>> class B:
        def method(self): print('B.method')

>>> class C(Mixin, B):
        def method(self): print('C.method'); super().other(); super().method()
>>> C().method()
C.method
Mixin.other
A.other
B.method
>>> C.__mro__
(<class '__main__.C'>, <class '__main__.Mixin'>, <class '__main__.A'>,
<class '__main__.B'>, <class 'object'>)
Similarly, mixing the other way doesn’t break call chains of the mix-in either. For instance,
in the following, even though B doesn’t define other when called in C, classes
do later in the MRO. In fact, the call chains work even if one of the branches doesn’t
use super at all—as long as a method is defined somewhere ahead on the MRO, its call
works:
>>> class C(B, Mixin):
        def method(self): print('C.method'); super().other(); super().method()
>>> C().method()
C.method
Mixin.other
A.other
B.method
>>> C.__mro__
(<class '__main__.C'>, <class '__main__.B'>, <class '__main__.Mixin'>,
<class '__main__.A'>, <class 'object'>)
This is also true in the presence of diamonds—disjoint method sets are dispatched as
expected, even if not implemented by each disjoint branch, because we select the next
on the MRO with the method. Really, because the MRO contains the same classes in
these cases, and because a subclass always appears before its superclass in the MRO,
they are equivalent contexts. For example, the call in Mixin to other in the following
still finds it in A, even though the next class after Mixin on the MRO is B (the call to
method in C works again for similar reasons):
# Explicit diamonds work too
>>> class A:
        def other(self): print('A.other')

>>> class Mixin(A):
        def other(self): print('Mixin.other'); super().other()

>>> class B(A):
        def method(self): print('B.method')

>>> class C(Mixin, B):
        def method(self): print('C.method'); super().other(); super().method()
>>> C().method()
C.method
Mixin.other
A.other
B.method
>>> C.__mro__
(<class '__main__.C'>, <class '__main__.Mixin'>, <class '__main__.B'>,
<class '__main__.A'>, <class 'object'>)
# Other mix-in orderings work too
>>> class C(B, Mixin):
        def method(self): print('C.method'); super().other(); super().method()
>>> C().method()
C.method
Mixin.other
A.other
B.method
>>> C.__mro__
(<class '__main__.C'>, <class '__main__.B'>, <class '__main__.Mixin'>,
<class '__main__.A'>, <class 'object'>)
Still, this has an effect that is no different—but may seem wildly more implicit—than
direct by-name calls, which also work the same in this case regardless of superclass
ordering, and whether there is a diamond or not. In this case, the motivation for relying
on MRO ordering seems on shaky ground, if the traditional form is both simpler and
more explicit, and offers more control and flexibility:
# But direct calls work here too: explicit is better than implicit
>>> class C(Mixin, B):
        def method(self): print('C.method'); Mixin.other(self); B.method(self)
>>> X = C()
>>> X.method()
C.method
Mixin.other
A.other
B.method
More crucially, this example so far assumes that method names are disjoint in its
branches; the dispatch order for same-named methods in diamonds like this may be
much less fortuitous. In a diamond like the preceding, for example, it’s not impossible
that a client class could invalidate a super call’s intent—the call to method in Mixin in
the following works to run A’s version as expected, unless it’s mixed into a tree that
drops the call chain:
# But for nondisjoint methods: super creates overly strong coupling
>>> class A:
        def method(self): print('A.method')

>>> class Mixin(A):
        def method(self): print('Mixin.method'); super().method()
>>> Mixin().method()
Mixin.method
A.method
>>> class B(A):
        def method(self): print('B.method')      # super here would invoke A after B

>>> class C(Mixin, B):
        def method(self): print('C.method'); super().method()
>>> C().method()
C.method
Mixin.method
B.method # We miss A in this context only!
It may be that B shouldn’t redefine this method anyhow (and frankly, we may be encroaching
on problems inherent in multiple inheritance in general), but this need not
also break the mix-in—direct calls give you more control in such cases, and allow mix-in
classes to be much more independent of usage contexts:
# And direct calls do not: they are immune to context of use
>>> class A:
        def method(self): print('A.method')

>>> class Mixin(A):
        def method(self): print('Mixin.method'); A.method(self)       # C irrelevant

>>> class C(Mixin, B):
        def method(self): print('C.method'); Mixin.method(self)
>>> C().method()
C.method
Mixin.method
A.method
More to the point, by making mix-ins more self-contained, direct calls minimize component
coupling that always skews program complexity higher—a fundamental software
principle that seems neglected by super’s variable and context-specific dispatch
model.
Customization: Same-argument constraints
As a final note, you should also consider the consequences of using super when method
arguments differ per class—because a class coder can’t be sure which version of a
method super might invoke (indeed, this may vary per tree!), every version of the
method must generally accept the same arguments list, or choose its inputs with analysis
of generic argument lists—either of which imposes additional requirements on
your code. In realistic programs, this constraint may in fact be a true showstopper for
many potential super applications, precluding its use entirely.
To illustrate why this can matter, recall the pizza shop employee classes we wrote in
Chapter 31. As coded there, both subclasses use direct by-name calls to invoke the
superclass constructor, filling in an expected salary argument automatically—the logic
being that the subclass implies the pay grade:
>>> class Employee:
        def __init__(self, name, salary):        # Common superclass
            self.name = name
            self.salary = salary

>>> class Chef1(Employee):
        def __init__(self, name):                # Differing arguments
            Employee.__init__(self, name, 50000) # Dispatch by direct call

>>> class Server1(Employee):
        def __init__(self, name):
            Employee.__init__(self, name, 40000)
>>> bob = Chef1('Bob')
>>> sue = Server1('Sue')
>>> bob.salary, sue.salary
(50000, 40000)
This works, but since this is a single-inheritance tree, we might be tempted to deploy
super here to route the constructor calls generically. Doing so works for either subclass
in isolation, since its MRO includes just itself and its actual superclass:
>>> class Chef2(Employee):
        def __init__(self, name):
            super().__init__(name, 50000)        # Dispatch by super()

>>> class Server2(Employee):
        def __init__(self, name):
            super().__init__(name, 40000)
>>> bob = Chef2('Bob')
>>> sue = Server2('Sue')
>>> bob.salary, sue.salary
(50000, 40000)
Watch what happens, though, when an employee is a member of both categories. Because
the constructors in the tree have differing argument lists, we’re in trouble:
>>> class TwoJobs(Chef2, Server2): pass
>>> tom = TwoJobs('Tom')
TypeError: __init__() takes 2 positional arguments but 3 were given
The problem here is that the super call in Chef2 no longer invokes its Employee superclass,
but instead invokes its sibling class and follower on the MRO, Server2. Since this
sibling has a differing argument list than the true superclass—expecting just self and
name—the code breaks. This is inherent in super use: because the MRO can differ per
tree, it might call different versions of a method in different trees—even some you may
not be able to anticipate when coding a class by itself:
>>> TwoJobs.__mro__
(<class '__main__.TwoJobs'>, <class '__main__.Chef2'>, <class '__main__.Server2'>,
<class '__main__.Employee'>, <class 'object'>)
>>> Chef2.__mro__
(<class '__main__.Chef2'>, <class '__main__.Employee'>, <class 'object'>)
By contrast, the direct by-name call scheme still works when the classes are mixed,
though the results are a bit dubious—the combined category gets the pay of the leftmost
superclass:
>>> class TwoJobs(Chef1, Server1): pass
>>> tom = TwoJobs('Tom')
>>> tom.salary
50000
Really, we probably want to route the call to the top-level class in this event with a new
salary—a model that is possible with direct calls but not with super alone. Moreover,
calling Employee directly in this one class means our code uses two dispatch techniques
when just one—direct calls—would suffice:
>>> class TwoJobs(Chef1, Server1):
        def __init__(self, name): Employee.__init__(self, name, 70000)
>>> tom = TwoJobs('Tom')
>>> tom.salary
70000
>>> class TwoJobs(Chef2, Server2):
        def __init__(self, name): super().__init__(name, 70000)
>>> tom = TwoJobs('Tom')
TypeError: __init__() takes 2 positional arguments but 3 were given
This example may warrant redesign in general—splitting off shareable parts of Chef
and Server to mix-in classes without a constructor, for example. It’s also true that
polymorphism in general assumes that the methods in an object’s external interface
have the same argument signature, though this doesn’t quite apply to customization
of superclass methods—an internal implementation technique that should by nature
support variation, especially in constructors.
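To hint at what such a redesign might look like, the following sketch—hypothetical
code, not the book’s—moves shared behavior to constructor-less mix-ins so that only
Employee’s constructor ever runs:

class Employee:
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

class ChefSkills:                                # Mix-ins carry behavior, not constructors
    def cook(self): print('cooking')

class ServerSkills:
    def serve(self): print('serving')

class Chef(ChefSkills, Employee):
    def __init__(self, name): Employee.__init__(self, name, 50000)

class Server(ServerSkills, Employee):
    def __init__(self, name): Employee.__init__(self, name, 40000)

class TwoJobs(ChefSkills, ServerSkills, Employee):
    def __init__(self, name): Employee.__init__(self, name, 70000)

tom = TwoJobs('Tom')
tom.cook(); tom.serve()                          # Both skill sets, one constructor
print(tom.salary)                                # 70000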
But the crucial point here is that because direct calls do not make code dependent on
a magic ordering that can vary per tree, they more directly support argument list flexibility.
More broadly, the questionable (or weak) performances super turns in on
method replacement, mix-in coupling, call ordering, and argument constraints should
make you evaluate its deployment carefully. Even in single-inheritance mode, its potential
for later impacts as trees grow is considerable.
In sum, the three requirements of super in this role are also the source of most of its
usability issues:

• The method called by super must exist—which requires extra code if no anchor is
present.
• The method called by super must have the same argument signature across the
class tree—which impairs flexibility, especially for implementation-level methods
like constructors.
• Every appearance of the method called by super but the last must use super itself
—which makes it difficult to use existing code, change call ordering, override
methods, and code self-contained classes.
Taken together, these seem to make for a tool with both substantial complexity and
significant tradeoffs—downsides that will assert themselves the moment the code
grows to incorporate multiple inheritance.
Naturally, there may be creative workarounds for the super dilemmas just posed, but
additional coding steps would further dilute the call’s benefits—and we’ve run out of
space here in any event. There are also alternative non-super solutions to some diamond
method dispatch problems, but these will have to be left as a user exercise for space
reasons too. In general, when superclass methods are called by explicit name, root
classes of diamonds might check state in instances to avoid firing twice—a similarly
complex coding pattern, but required rarely in most code, and which to some may seem
no more difficult than using super itself.
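As a starting point for that exercise, here is one possible shape for the state-checking
pattern just mentioned—a hypothetical sketch, not the book’s code:

class A:
    def __init__(self):
        if getattr(self, '_A_done', False):      # Hypothetical status marker in the instance
            return                               # Constructor body already ran for this object
        self._A_done = True
        print('A.__init__')

class B(A):
    def __init__(self): print('B.__init__'); A.__init__(self)

class C(A):
    def __init__(self): print('C.__init__'); A.__init__(self)

class D(B, C):
    def __init__(self): B.__init__(self); C.__init__(self)

x = D()                                          # B.__init__, A.__init__, C.__init__: A fires just once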
The super Summary
So there it is—the bad and the good. As with all Python extensions, you should be the
judge on this one too. I’ve tried to give both sides of the debate a fair shake here to help
you decide. But because the super call:
• Differs in form between 2.X and 3.X
• In 3.X, relies on arguably non-Pythonic magic, and does not fully apply to operator
overloading or traditionally coded multiple-inheritance trees
• In 2.X, seems so verbose in this intended role that it may make code more complex
instead of less
• Claims code maintenance benefits that may be more hypothetical than real in
Python practice
even ex–Java programmers should also consider this book’s preferred traditional technique
of explicit-name superclass calls to be at least as valid a solution as Python’s super
—a call that on some levels seems an unusual and limited answer to a question that
was not being asked by most Python programmers, and was not deemed important for
much of Python’s history.
At the same time, the super call offers one solution to the difficult problem of same-
named method dispatch in multiple inheritance trees, for programs that choose to use
it universally and consistently. But therein lies one of its largest obstacles: it requires
universal deployment to address a problem most programmers probably do not have.
Moreover, at this point in Python’s history, asking programmers to change their existing
code to use this call widely enough to make it reliable seems highly unrealistic.
Perhaps the chief problem of this role, though, is the role itself—same-named method
dispatch in multiple inheritance trees is relatively rare in real Python programs, and
obscure enough to have generated both much controversy and much misunderstanding
surrounding this role. People don’t use Python the same way they use C++, Java, or
Dylan, and lessons from other such languages do not necessarily apply.
Also keep in mind that using super makes your program’s behavior dependent on the
MRO algorithm—a procedure that we’ve covered only informally here due to its complexity,
that is artificial to your program’s purpose, and that seems tersely documented
and understood in the Python world. As we’ve seen, even if you understand the MRO,
its implications on customization, coupling, and flexibility are remarkably subtle. If you
don’t completely understand this algorithm—or have goals that its application does
not address—you may be better served not relying on it to implicitly trigger actions in
your code.
Or, to quote a Python motto from its import this creed:
If the implementation is hard to explain, it’s a bad idea.
The super call seems firmly in this category. Most programmers won’t use an arcane
tool aimed at a rare use case, no matter how clever it may be. This is especially true in
a scripting language that bills itself as friendly to nonspecialists. Regrettably, use by any
programmer can impose such a tool on others anyhow—the real reason I’ve covered it
here, and a theme we’ll revisit at the end of this book.
As usual, time and user base will tell if this call’s tradeoffs or momentum lead to broader
adoption or not. At the least, it behooves you to also know about the traditional explicit-name
superclass call technique, as it is still commonly used and often either simpler or
required in today’s real-world Python programming. If you do choose to use this tool,
my own advice to readers is to remember that using super:
• In single-inheritance mode can mask later problems and lead to unexpected behavior
as trees grow
• In multiple-inheritance mode brings with it substantial complexity for an atypical
Python use case
For other opinions on Python’s super that go into further details both good and bad,
search the Web for related articles. You can find plenty of additional positions, though
in the end, Python’s future relies as much on yours as any other.
Class Gotchas
We’ve reached the end of the primary OOP coverage in this book. After exceptions,
we’ll explore additional class-related examples and topics in the last part of the book,
but that part mostly just gives expanded coverage to concepts introduced here. As
usual, let’s wrap up this part with the standard warnings about pitfalls to avoid.
Most class issues can be boiled down to namespace issues—which makes sense, given
that classes are just namespaces with a handful of extra tricks. Some of the items in this
section are more like class usage pointers than problems, but even experienced class
coders have been known to stumble on a few.
Changing Class Attributes Can Have Side Effects
Theoretically speaking, classes (and class instances) are mutable objects. As with built-in
lists and dictionaries, you can change them in place by assigning to their attributes—
and as with lists and dictionaries, this means that changing a class or instance object
may impact multiple references to it.
That’s usually what we want, and is how objects change their state in general, but
awareness of this issue becomes especially critical when changing class attributes. Because
all instances generated from a class share the class’s namespace, any changes at
the class level are reflected in all instances, unless they have their own versions of the
changed class attributes.
Because classes, modules, and instances are all just objects with attribute namespaces,
you can normally change their attributes at runtime by assignments. Consider the following
class. Inside the class body, the assignment to the name a generates an attribute
X.a, which lives in the class object at runtime and will be inherited by all of X’s instances:
>>> class X:
        a = 1                                    # Class attribute
>>> I = X()
>>> I.a # Inherited by instance
1
>>> X.a
1
So far, so good—this is the normal case. But notice what happens when we change the
class attribute dynamically outside the class statement: it also changes the attribute in
every object that inherits from the class. Moreover, new instances created from the class
during this session or program run also get the dynamically set value, regardless of what
the class’s source code says:
>>> X.a = 2 # May change more than X
>>> I.a # I changes too
2
>>> J = X() # J inherits from X's runtime values
>>> J.a # (but assigning to J.a changes a in J, not X or I)
2
Is this a useful feature or a dangerous trap? You be the judge. As we learned in Chapter 27,
you can actually get work done by changing class attributes without ever making
a single instance—a technique that can simulate the use of “records” or “structs” in
other languages. As a refresher, consider the following unusual but legal Python program:
class X: pass # Make a few attribute namespaces
class Y: pass
X.a = 1 # Use class attributes as variables
X.b = 2 # No instances anywhere to be found
X.c = 3
Y.a = X.a + X.b + X.c
for X.i in range(Y.a): print(X.i) # Prints 0..5
Here, the classes X and Y work like “fileless” modules—namespaces for storing variables
we don’t want to clash. This is a perfectly legal Python programming trick, but it’s less
appropriate when applied to classes written by others; you can’t always be sure that
class attributes you change aren’t critical to the class’s internal behavior. If you’re out
to simulate a C struct, you may be better off changing instances than classes, as that
way only one object is affected:
class Record: pass
X = Record()
X.name = 'bob'
X.job = 'Pizza maker'
Changing Mutable Class Attributes Can Have Side Effects, Too
This gotcha is really an extension of the prior. Because class attributes are shared by all
instances, if a class attribute references a mutable object, changing that object in place
from any instance impacts all instances at once:
>>> class C:
        shared = []                              # Class attribute
        def __init__(self):
            self.perobj = []                     # Instance attribute
>>> x = C() # Two instances
>>> y = C() # Implicitly share class attrs
>>> y.shared, y.perobj
([], [])
>>> x.shared.append('spam') # Impacts y's view too!
>>> x.perobj.append('spam') # Impacts x's data only
>>> x.shared, x.perobj
(['spam'], ['spam'])
>>> y.shared, y.perobj # y sees change made through x
(['spam'], [])
>>> C.shared # Stored on class and shared
['spam']
This effect is no different than many we’ve seen in this book already: mutable objects
are shared by simple variables, globals are shared by functions, module-level objects
are shared by multiple importers, and mutable function arguments are shared by the
caller and the callee. All of these are cases of general behavior—multiple references to
a mutable object—and all are impacted if the shared object is changed in place from
any reference. Here, this occurs in class attributes shared by all instances via inheritance,
but it’s the same phenomenon at work. It may be made more subtle by the
different behavior of assignments to instance attributes themselves:
x.shared.append('spam')       # Changes shared object attached to class in place
x.shared = 'spam'             # Changes or creates instance attribute attached to x
But again, this is not a problem, it’s just something to be aware of; shared mutable class
attributes can have many valid uses in Python programs.
Multiple Inheritance: Order Matters
This may be obvious by now, but it’s worth underscoring: if you use multiple inheritance,
the order in which superclasses are listed in the class statement header can be
critical. Python always searches superclasses from left to right, according to their order
in the header line.
For instance, in the multiple inheritance example we studied in Chapter 31, suppose
that the Super class implemented a __str__ method, too:
class ListTree:
    def __str__(self): ...

class Super:
    def __str__(self): ...

class Sub(ListTree, Super):          # Get ListTree's __str__ by listing it first
x = Sub() # Inheritance searches ListTree before Super
Which class would we inherit it from—ListTree or Super? As inheritance searches proceed
from left to right, we would get the method from whichever class is listed first
(leftmost) in Sub’s class header. Presumably, we would list ListTree first because its
whole purpose is its custom __str__ (indeed, we had to do this in Chapter 31 when
mixing this class with a tkinter.Button that had a __str__ of its own).
But now suppose Super and ListTree have their own versions of other same-named
attributes, too. If we want one name from Super and another from ListTree, the order
in which we list them in the class header won’t help—we will have to override inheritance
by manually assigning to the attribute name in the Sub class:
class ListTree:
    def __str__(self): ...
    def other(self): ...

class Super:
    def __str__(self): ...
    def other(self): ...

class Sub(ListTree, Super):          # Get ListTree's __str__ by listing it first
    other = Super.other              # But explicitly pick Super's version of other
    def __init__(self):
        ...
x = Sub() # Inheritance searches Sub before ListTree/Super
Here, the assignment to other within the Sub class creates Sub.other—a reference back
to the Super.other object. Because it is lower in the tree, Sub.other effectively hides
ListTree.other, the attribute that the inheritance search would normally find. Similarly,
if we listed Super first in the class header to pick up its other, we would need to
select ListTree’s method explicitly:
class Sub(Super, ListTree):          # Get Super's other by order
    __str__ = ListTree.__str__       # Explicitly pick ListTree.__str__
Multiple inheritance is an advanced tool. Even if you understood the last paragraph,
it’s still a good idea to use it sparingly and carefully. Otherwise, the meaning of a name
may come to depend on the order in which classes are mixed in an arbitrarily far-removed
subclass. (For another example of the technique shown here in action, see the
discussion of explicit conflict resolution in “The ‘New-Style’ Class Model”, as well as
the earlier super coverage.)
As a rule of thumb, multiple inheritance works best when your mix-in classes are as
self-contained as possible—because they may be used in a variety of contexts, they
should not make assumptions about names related to other classes in a tree. The pseudoprivate
__X attributes feature we studied in Chapter 31 can help by localizing names
that a class relies on owning and limiting the names that your mix-in classes add to the
mix. In this example, for instance, if ListTree only means to export its custom
__str__, it can name its other method __other to avoid clashing with like-named classes
in the tree.
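In code, assuming ListTree means to keep its helper to itself, that renaming is as simple as:

class ListTree:
    def __str__(self): ...           # The only name meant for export
    def __other(self): ...           # Mangled to _ListTree__other: avoids clashes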
Scopes in Methods and Classes
When working out the meaning of names in class-based code, it helps to remember
that classes introduce local scopes, just as functions do, and methods are simply further
nested functions. In the following example, the generate function returns an instance
of the nested Spam class. Within its code, the class name Spam is assigned in the generate
function’s local scope, and hence is visible to any further nested functions, including
code inside method; it’s the E in the “LEGB” scope lookup rule:
def generate():
    class Spam:                      # Spam is a name in generate's local scope
        count = 1
        def method(self):
            print(Spam.count)        # Visible in generate's scope, per LEGB rule (E)
    return Spam()
generate().method()
This example works in Python since version 2.2 because the local scopes of all enclosing
function defs are automatically visible to nested defs (including nested method defs,
as in this example).
Even so, keep in mind that method defs cannot see the local scope of the enclosing
class; they can see only the local scopes of enclosing defs. That’s why methods must
go through the self instance or the class name to reference methods and other attributes
defined in the enclosing class statement. For example, code in the method must use
self.count or Spam.count, not just count.
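For instance, the following fails at call time—a minimal demonstration, not part of the
book’s listing, of why the bare name doesn’t work:

class Spam:
    count = 1
    def method(self):
        print(count)                 # Class scope is not on the LEGB path here

Spam().method()                      # NameError: name 'count' is not defined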
To avoid nesting, we could restructure this code such that the class Spam is defined at
the top level of the module: the nested method function and the top-level generate will
then both find Spam in their global scopes; it’s not localized to a function’s scope, but
is still local to a single module:
def generate():
    return Spam()

class Spam:                        # Define at top level of module
    count = 1
    def method(self):
        print(Spam.count)          # Works: in global (enclosing module)

generate().method()
In fact, this approach is recommended for all Python releases—code tends to be simpler
in general if you avoid nesting classes and functions. On the other hand, class nesting
is useful in closure contexts, where the enclosing function’s scope retains state used by
the class or its methods. In the following, the nested method has access to its own scope,
the enclosing function’s scope (for label), the enclosing module’s global scope, any-
thing saved in the self instance by the class, and the class itself via its nonlocal name:
>>> def generate(label):           # Returns a class instead of an instance
        class Spam:
            count = 1
            def method(self):
                print("%s=%s" % (label, Spam.count))
        return Spam

>>> aclass = generate('Gotchas')
>>> I = aclass()
>>> I.method()
Gotchas=1
Miscellaneous Class Gotchas
Here’s a handful of additional class-related warnings, mostly as review.
Choose per-instance or class storage wisely
On a similar note, be careful when you decide whether an attribute should be stored
on a class or its instances: the former is shared by all instances, and the latter will differ
per instance. This can be a crucial design issue in practice. In a GUI program, for
instance, if you want information to be shared by all of the window class objects your
application will create (e.g., the last directory used for a Save operation, or an already
entered password), it must be stored as class-level data; if stored in the instance as
self attributes, it will vary per window or be missing entirely when looked up by in-
heritance.
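For instance, here is a minimal sketch of the difference; the Window class is a
made-up illustration, not real GUI code:

class Window:
    lastdir = '.'                  # Class attribute: state shared by all windows

w1, w2 = Window(), Window()
Window.lastdir = 'C:\\docs'        # Changing the class changes every instance
print(w1.lastdir, w2.lastdir)      # Both print 'C:\docs'

w1.lastdir = '/tmp'                # Assigning through an instance creates a new
print(w2.lastdir)                  # per-instance attribute; w2 still sees 'C:\docs'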
You usually want to call superclass constructors
Remember that Python runs only one __init__ constructor method when an instance
is made—the lowest in the class inheritance tree. It does not automatically run the
constructors of all superclasses higher up. Because constructors normally perform re-
quired startup work, you’ll usually need to run a superclass constructor from a subclass
constructor—using a manual call through the superclass’s name (or super), passing
along whatever arguments are required—unless you mean to replace the super’s con-
structor altogether, or the superclass doesn’t have or inherit a constructor at all.
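A minimal sketch of the usual pattern follows; in 3.X, super().__init__(name) in
the subclass works as well:

class Super:
    def __init__(self, name):
        self.name = name           # Required startup work

class Sub(Super):
    def __init__(self, name, data):
        Super.__init__(self, name) # Run the superclass constructor manually
        self.data = data           # ...then perform our own initialization

x = Sub('spam', 42)
print(x.name, x.data)              # Prints: spam 42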
Delegation-based classes in 3.X: __getattr__ and built-ins
Another reminder: as described earlier in this chapter and elsewhere, classes that use
the __getattr__ operator overloading method to delegate attribute fetches to wrapped
objects may fail in Python 3.X (and 2.X when new-style classes are used) unless operator
overloading methods are redefined in the wrapper class. The names of operator over-
loading methods implicitly fetched by built-in operations are not routed through
generic attribute-interception methods. To work around this, you must redefine such
methods in wrapper classes, either manually, with tools, or by definition in super-
classes; we’ll see how in Chapter 40.
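Here is a minimal sketch of the manual workaround; the Wrapper class is illustrative
only. Explicitly fetched names such as append are routed through __getattr__, but
built-in operations such as len must find a real method in the class:

class Wrapper:
    def __init__(self, obj):
        self.wrapped = obj
    def __getattr__(self, name):             # Catches explicit name fetches only
        return getattr(self.wrapped, name)
    def __len__(self):                       # Built-ins bypass __getattr__ in 3.X,
        return len(self.wrapped)             # so redefine such methods manually

x = Wrapper([1, 2, 3])
x.append(4)                                  # Routed through __getattr__
print(len(x))                                # Prints 4: uses the redefined __len__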
KISS Revisited: “Overwrapping-itis”
When used well, the code reuse features of OOP make it excel at cutting development
time. Sometimes, though, OOP’s abstraction potential can be abused to the point of
making code difficult to understand. If classes are layered too deeply, code can become
obscure; you may have to search through many classes to discover what an operation
does.
For example, I once worked in a C++ shop with thousands of classes (some machine-
generated), and up to 15 levels of inheritance. Deciphering method calls in such a
complex system was often a monumental task: multiple classes had to be consulted for
even the most basic of operations. In fact, the logic of the system was so deeply wrapped
that understanding a piece of code in some cases required days of wading through
related files. This obviously isn’t ideal for programmer productivity!
The most general rule of thumb of Python programming applies here, too: don’t make
things complicated unless they truly must be. Wrapping your code in multiple layers of
classes to the point of incomprehensibility is always a bad idea. Abstraction is the basis
of polymorphism and encapsulation, and it can be a very effective tool when used well.
However, you’ll simplify debugging and aid maintainability if you make your class
interfaces intuitive, avoid making your code overly abstract, and keep your class hier-
archies short and flat unless there is a good reason to do otherwise. Remember: code
you write is generally code that others must read. See Chapter 20 for more on KISS.
Chapter Summary
This chapter presented an assortment of advanced class-related topics, including sub-
classing built-in types, new-style classes, static methods, and decorators. Most of these
are optional extensions to the OOP model in Python, but they may become more useful
as you start writing larger object-oriented programs, and are fair game if they appear
in code you must understand. As mentioned earlier, our discussion of some of the more
advanced class tools continues in the final part of this book; be sure to look ahead if
you need more details on properties, descriptors, decorators, and metaclasses.
This is the end of the class part of this book, so you’ll find the usual lab exercises at the
end of the chapter: be sure to work through them to get some practice coding real
classes. In the next chapter, we’ll begin our look at our last core language topic, ex-
ceptions—Python’s mechanism for communicating errors and other conditions to your
code. This is a relatively lightweight topic, but I’ve saved it for last because new ex-
ceptions are supposed to be coded as classes today. Before we tackle that final core
subject, though, take a look at this chapter’s quiz and the lab exercises.
Test Your Knowledge: Quiz
1. Name two ways to extend a built-in object type.
2. What are function and class decorators used for?
3. How do you code a new-style class?
4. How are new-style and classic classes different?
5. How are normal and static methods different?
6. Are tools like __slots__ and super valid to use in your code?
7. How long should you wait before lobbing a “Holy Hand Grenade”?
Test Your Knowledge: Answers
1. You can embed a built-in object in a wrapper class, or subclass the built-in type
directly. The latter approach tends to be simpler, as most original behavior is au-
tomatically inherited.
2. Function decorators are generally used to manage a function or method, or add to
it a layer of logic that is run each time the function or method is called. They can
be used to log or count calls to a function, check its argument types, and so on.
They are also used to “declare” static methods (simple functions in a class that are
not passed an instance when called), as well as class methods and properties. Class
decorators are similar, but manage whole objects and their interfaces instead of a
function call.
3. New-style classes are coded by inheriting from the object built-in class (or any
other built-in type). In Python 3.X, all classes are new-style automatically, so this
derivation is not required (but doesn’t hurt); in 2.X, classes with this explicit der-
ivation are new-style and those without it are “classic.”
4. New-style classes search the diamond pattern of multiple inheritance trees differ-
ently—they essentially search breadth-first (across), instead of depth-first (up) in
diamond trees. New-style classes also change the result of the type built-in for
instances and classes, do not run generic attribute fetch methods such as
__getattr__ for built-in operation methods, and support a set of advanced extra tools
including properties, descriptors, super, and __slots__ instance attribute lists.
5. Normal (instance) methods receive a self argument (the implied instance), but
static methods do not. Static methods are simple functions nested in class objects.
To make a method static, it must either be run through a special built-in function
or be decorated with decorator syntax. Python 3.X allows simple functions in a
class to be called through the class without this step, but calls through instances
still require static method declaration.
6. Of course, but you shouldn’t use advanced tools automatically without carefully
considering their implications. Slots, for example, can break code; super can mask
later problems when used for single inheritance, and in multiple inheritance brings
with it substantial complexity for an isolated use case; and both require universal
deployment to be most useful. Evaluating new or advanced tools is a primary task
of any engineer, and is why we explored tradeoffs so carefully in this chapter. This
book’s goal is not to tell you which tools to use, but to underscore the importance
of objectively analyzing them—a task often given too low a priority in the software
field.
7. Three seconds. (Or, more accurately: “And the Lord spake, saying, ‘First shalt thou
take out the Holy Pin. Then, shalt thou count to three, no more, no less. Three
shalt be the number thou shalt count, and the number of the counting shall be
three. Four shalt thou not count, nor either count thou two, excepting that thou
then proceed to three. Five is right out. Once the number three, being the third
number, be reached, then lobbest thou thy Holy Hand Grenade of Antioch towards
thy foe, who, being naughty in my sight, shall snuff it.’”)4
Test Your Knowledge: Part VI Exercises
These exercises ask you to write a few classes and experiment with some existing code.
Of course, the problem with existing code is that it must be existing. To work with the
set class in exercise 5, either pull the class source code off this book’s website (see the
preface for a pointer) or type it up by hand (it’s fairly brief). These programs are starting
to get more sophisticated, so be sure to check the solutions at the end of the book for
pointers. You’ll find them in Appendix D, under Part VI.
1. Inheritance. Write a class called Adder that exports a method add(self, x, y) that
prints a “Not Implemented” message. Then, define two subclasses of Adder that
implement the add method:
ListAdder
With an add method that returns the concatenation of its two list arguments
DictAdder
With an add method that returns a new dictionary containing the items in both
its two dictionary arguments (any definition of dictionary addition will do)
Experiment by making instances of all three of your classes interactively and calling
their add methods.
Now, extend your Adder superclass to save an object in the instance with a con-
structor (e.g., assign self.data a list or a dictionary), and overload the + operator
with an __add__ method to automatically dispatch to your add methods (e.g., X +
Y triggers X.add(X.data,Y)). Where is the best place to put the constructors and
4. This quote is from Monty Python and the Holy Grail (and if you didn’t know that, it may be time to find
a copy!).
operator overloading methods (i.e., in which classes)? What sorts of objects can
you add to your class instances?
In practice, you might find it easier to code your add methods to accept just one
real argument (e.g., add(self,y)), and add that one argument to the instance’s
current data (e.g., self.data + y). Does this make more sense than passing two
arguments to add? Would you say this makes your classes more “object-oriented”?
2. Operator overloading. Write a class called MyList that shadows (“wraps”) a Python
list: it should overload most list operators and operations, including +, indexing,
iteration, slicing, and list methods such as append and sort. See the Python reference
manual or other documentation for a list of all possible methods to support. Also,
provide a constructor for your class that takes an existing list (or a MyList instance)
and copies its components into an instance attribute. Experiment with your class
interactively. Things to explore:
a. Why is copying the initial value important here?
b. Can you use an empty slice (e.g., start[:]) to copy the initial value if it’s a
MyList instance?
c. Is there a general way to route list method calls to the wrapped list?
d. Can you add a MyList and a regular list? How about a list and a MyList instance?
e. What type of object should operations like + and slicing return? What about
indexing operations?
f. If you are working with a reasonably recent Python release (version 2.2 or
later), you may implement this sort of wrapper class by embedding a real list
in a standalone class, or by extending the built-in list type with a subclass.
Which is easier, and why?
3. Subclassing. Make a subclass of MyList from exercise 2 called MyListSub, which
extends MyList to print a message to stdout before each call to the + overloaded
operation and counts the number of such calls. MyListSub should inherit basic
method behavior from MyList. Adding a sequence to a MyListSub should print a
message, increment the counter for + calls, and perform the superclass’s method.
Also, introduce a new method that prints the operation counters to stdout, and
experiment with your class interactively. Do your counters count calls per instance,
or per class (for all instances of the class)? How would you program the other
option? (Hint: it depends on which object the count members are assigned to: class
members are shared by instances, but self members are per-instance data.)
4. Attribute methods. Write a class called Attrs with methods that intercept every
attribute qualification (both fetches and assignments), and print messages listing
their arguments to stdout. Create an Attrs instance, and experiment with quali-
fying it interactively. What happens when you try to use the instance in expres-
sions? Try adding, indexing, and slicing the instance of your class. (Note: a fully
generic approach based upon __getattr__ will work in 2.X’s classic classes but not
in 3.X’s new-style classes—which are optional in 2.X—for reasons noted in Chap-
ter 28, Chapter 31, and Chapter 32, and summarized in the solution to this exer-
cise.)
5. Set objects. Experiment with the set class described in “Extending Types by Em-
bedding”. Run commands to do the following sorts of operations:
a. Create two sets of integers, and compute their intersection and union by using
& and | operator expressions.
b. Create a set from a string, and experiment with indexing your set. Which
methods in the class are called?
c. Try iterating through the items in your string set using a for loop. Which
methods run this time?
d. Try computing the intersection and union of your string set and a simple
Python string. Does it work?
e. Now, extend your set by subclassing to handle arbitrarily many operands using
the *args argument form. (Hint: see the function versions of these algorithms
in Chapter 18.) Compute intersections and unions of multiple operands with
your set subclass. How can you intersect three or more sets, given that & has
only two sides?
f. How would you go about emulating other list operations in the set class? (Hint:
__add__ can catch concatenation, and __getattr__ can pass most named list
method calls like append to the wrapped list.)
6. Class tree links. In “Namespaces: The Whole Story” in Chapter 29 and in “Multiple
Inheritance: ‘Mix-in’ Classes” in Chapter 31, we learned that classes have a
__bases__ attribute that returns a tuple of their superclass objects (the ones listed
in parentheses in the class header). Use __bases__ to extend the lister.py mix-in
classes we wrote in Chapter 31 so that they print the names of the immediate
superclasses of the instance’s class. When you’re done, the first line of the string
representation should look like this (your address will almost certainly vary):
<Instance of Sub(Super, Lister), address 7841200:
7. Composition. Simulate a fast-food ordering scenario by defining four classes:
Lunch
A container and controller class
Customer
The actor who buys food
Employee
The actor from whom a customer orders
Food
What the customer buys
To get you started, here are the classes and methods you’ll be defining:
class Lunch:
    def __init__(self)                        # Make/embed Customer and Employee
    def order(self, foodName)                 # Start a Customer order simulation
    def result(self)                          # Ask the Customer what Food it has

class Customer:
    def __init__(self)                        # Initialize my food to None
    def placeOrder(self, foodName, employee)  # Place order with an Employee
    def printFood(self)                       # Print the name of my food

class Employee:
    def takeOrder(self, foodName)             # Return a Food, with requested name

class Food:
    def __init__(self, name)                  # Store food name
The order simulation should work as follows:
a. The Lunch class’s constructor should make and embed an instance of Customer
and an instance of Employee, and it should export a method called
order. When called, this order method should ask the Customer to place an
order by calling its placeOrder method. The Customer’s placeOrder method
should in turn ask the Employee object for a new Food object by calling
Employee’s takeOrder method.
b. Food objects should store a food name string (e.g., “burritos”), passed down
from Lunch.order, to Customer.placeOrder, to Employee.takeOrder, and finally
to Food’s constructor. The top-level Lunch class should also export a method
called result, which asks the customer to print the name of the food it received
from the Employee via the order (this can be used to test your simulation).
Note that Lunch needs to pass either the Employee or itself to the Customer to allow
the Customer to call Employee methods.
Experiment with your classes interactively by importing the Lunch class, calling its
order method to run an interaction, and then calling its result method to verify
that the Customer got what he or she ordered. If you prefer, you can also simply
code test cases as self-test code in the file where your classes are defined, using the
module __name__ trick of Chapter 25. In this simulation, the Customer is the active
agent; how would your classes change if Employee were the object that initiated
customer/employee interaction instead?
8. Zoo animal hierarchy. Consider the class tree shown in Figure 32-1.
Code a set of six class statements to model this taxonomy with Python inheri-
tance. Then, add a speak method to each of your classes that prints a unique mes-
sage, and a reply method in your top-level Animal superclass that simply calls
self.speak to invoke the category-specific message printer in a subclass below (this
will kick off an independent inheritance search from self). Finally, remove the
speak method from your Hacker class so that it picks up the default above it. When
you’re finished, your classes should work this way:
% python
>>> from zoo import Cat, Hacker
>>> spot = Cat()
>>> spot.reply() # Animal.reply: calls Cat.speak
meow
>>> data = Hacker() # Animal.reply: calls Primate.speak
>>> data.reply()
Hello world!
Figure 32-1. A zoo hierarchy composed of classes linked into a tree to be searched by attribute
inheritance. Animal has a common “reply” method, but each class may have its own custom
“speak” method called by “reply”.
9. The Dead Parrot Sketch. Consider the object embedding structure captured in
Figure 32-2.
Code a set of Python classes to implement this structure with composition. Code
your Scene object to define an action method, and embed instances of the Customer,
Clerk, and Parrot classes (each of which should define a line method that
prints a unique message). The embedded objects may either inherit from a common
superclass that defines line and simply provide message text, or define line them-
selves. In the end, your classes should operate like this:
% python
>>> import parrot
>>> parrot.Scene().action() # Activate nested objects
customer: "that's one ex-bird!"
clerk: "no it isn't..."
parrot: None
Figure 32-2. A scene composite with a controller class (Scene) that embeds and directs instances of
three other classes (Customer, Clerk, Parrot). The embedded instance’s classes may also participate
in an inheritance hierarchy; composition and inheritance are often equally useful ways to structure
classes for code reuse.
Why You Will Care: OOP by the Masters
When I teach Python classes, I invariably find that about halfway through the class,
people who have used OOP in the past are following along intensely, while people who
have not are beginning to glaze over (or nod off completely). The point behind the
technology just isn’t apparent.
In a book like this, I have the luxury of including material like the new Big Picture
overview in Chapter 26, and the gradual tutorial of Chapter 28—in fact, you should
probably review that section if you’re starting to feel like OOP is just some computer
science mumbo-jumbo. Though it adds much more structure than the generators we
met earlier, OOP similarly relies on some magic (inheritance search and a special first
argument) that beginners can find difficult to rationalize.
In real classes, however, to help get the newcomers on board (and keep them awake),
I have been known to stop and ask the experts in the audience why they use OOP. The
answers they’ve given might help shed some light on the purpose of OOP, if you’re new
to the subject.
Here, then, with only a few embellishments, are the most common reasons to use OOP,
as cited by my students over the years:
Code reuse
This one’s easy (and is the main reason for using OOP). By supporting inheritance,
classes allow you to program by customization instead of starting each project from
scratch.
Encapsulation
Wrapping up implementation details behind object interfaces insulates users of a
class from code changes.
Structure
Classes provide new local scopes, which minimizes name clashes. They also pro-
vide a natural place to write and look for implementation code, and to manage
object state.
Maintenance
Classes naturally promote code factoring, which allows us to minimize redun-
dancy. Thanks both to the structure and code reuse support of classes, usually only
one copy of the code needs to be changed.
Consistency
Classes and inheritance allow you to implement common interfaces, and hence
create a common look and feel in your code; this eases debugging, comprehension,
and maintenance.
Polymorphism
This is more a property of OOP than a reason for using it, but by supporting code
generality, polymorphism makes code more flexible and widely applicable, and
hence more reusable.
Other
And, of course, the number one reason students gave for using OOP: it looks good
on a résumé! (OK, I threw this one in as a joke, but it is important to be familiar
with OOP if you plan to work in the software field today.)
Finally, keep in mind what I said at the beginning of this part of the book: you won’t
fully appreciate OOP until you’ve used it for a while. Pick a project, study larger ex-
amples, work through the exercises—do whatever it takes to get your feet wet with OO
code; it’s worth the effort.
PART VII
Exceptions and Tools
CHAPTER 33
Exception Basics
This part of the book deals with exceptions, which are events that can modify the flow
of control through a program. In Python, exceptions are triggered automatically on
errors, and they can be triggered and intercepted by your code. They are processed by
four statements we’ll study in this part, the first of which has two variations (listed
separately here) and the last of which was an optional extension until Python 2.6 and
3.0:
try/except
Catch and recover from exceptions raised by Python, or by you.
try/finally
Perform cleanup actions, whether exceptions occur or not.
raise
Trigger an exception manually in your code.
assert
Conditionally trigger an exception in your code.
with/as
Implement context managers in Python 2.6, 3.0, and later (optional in 2.5).
This topic was saved until nearly the end of the book because you need to know about
classes to code exceptions of your own. With a few exceptions (pun intended), though,
you’ll find that exception handling is simple in Python because it’s integrated into the
language itself as another high-level tool.
Why Use Exceptions?
In a nutshell, exceptions let us jump out of arbitrarily large chunks of a program. Con-
sider the hypothetical pizza-making robot we discussed earlier in the book. Suppose
we took the idea seriously and actually built such a machine. To make a pizza, our
culinary automaton would need to execute a plan, which we would implement as a
Python program: it would take an order, prepare the dough, add toppings, bake the
pie, and so on.
Now, suppose that something goes very wrong during the “bake the pie” step. Perhaps
the oven is broken, or perhaps our robot miscalculates its reach and spontaneously
combusts. Clearly, we want to be able to jump to code that handles such states quickly.
As we have no hope of finishing the pizza task in such unusual cases, we might as well
abandon the entire plan.
That’s exactly what exceptions let you do: you can jump to an exception handler in a
single step, abandoning all function calls begun since the exception handler was en-
tered. Code in the exception handler can then respond to the raised exception as ap-
propriate (by calling the fire department, for instance!).
One way to think of an exception is as a sort of structured “super go to.” An exception
handler (try statement) leaves a marker and executes some code. Somewhere further
ahead in the program, an exception is raised that makes Python jump back to that
marker, abandoning any active functions that were called after the marker was left.
This protocol provides a coherent way to respond to unusual events. Moreover, because
Python jumps to the handler statement immediately, your code is simpler—there is
usually no need to check status codes after every call to a function that could possibly
fail.
Exception Roles
In Python programs, exceptions are typically used for a variety of purposes. Here are
some of their most common roles:
Error handling
Python raises exceptions whenever it detects errors in programs at runtime. You
can catch and respond to the errors in your code, or ignore the exceptions that are
raised. If an error is ignored, Python’s default exception-handling behavior kicks
in: it stops the program and prints an error message. If you don’t want this default
behavior, code a try statement to catch and recover from the exception—Python
will jump to your try handler when the error is detected, and your program will
resume execution after the try.
Event notification
Exceptions can also be used to signal valid conditions without you having to pass
result flags around a program or test them explicitly. For instance, a search routine
might raise an exception on failure, rather than returning an integer result code—
and hoping that the code will never be a valid result!
Special-case handling
Sometimes a condition may occur so rarely that it’s hard to justify convoluting your
code to handle it in multiple places. You can often eliminate special-case code by
handling unusual cases in exception handlers in higher levels of your program. An
assert can similarly be used to check that conditions are as expected during de-
velopment.
Termination actions
As you’ll see, the try/finally statement allows you to guarantee that required
closing-time operations will be performed, regardless of the presence or absence
of exceptions in your programs. The newer with statement offers an alternative in
this department for objects that support it.
Unusual control flows
Finally, because exceptions are a sort of high-level and structured “go to,” you can
use them as the basis for implementing exotic control flows. For instance, although
the language does not explicitly support backtracking, you can implement it in
Python by using exceptions and a bit of support logic to unwind assignments.1
There is no “go to” statement in Python (thankfully!), but exceptions can some-
times serve similar roles; a raise, for instance, can be used to jump out of multiple
loops.
We saw some of these roles briefly earlier, and will study typical exception use cases
in action later in this part of the book. For now, let’s get started with a look at Python’s
exception-processing tools.
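As a small taste of the last role on this list, here is a minimal sketch that uses
a raise to escape two nested loops in a single step; the Found class is a
hypothetical signal, not a Python built-in:

class Found(Exception): pass                  # Hypothetical signal class

def search(matrix, target):
    try:
        for row in matrix:
            for item in row:
                if item == target:
                    raise Found()              # Exit both loops in one step
    except Found:
        return True
    return False

print(search([[1, 2], [3, 4]], 3))             # Prints: True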
Exceptions: The Short Story
Compared to some other core language topics we’ve met in this book, exceptions are
a fairly lightweight tool in Python. Because they are so simple, let’s jump right into
some code.
Default Exception Handler
Suppose we write the following function:
>>> def fetcher(obj, index):
        return obj[index]
There’s not much to this function—it simply indexes an object on a passed-in index.
In normal operation, it returns the result of a legal index:
>>> x = 'spam'
>>> fetcher(x, 3) # Like x[3]
'm'
1. But true backtracking is not part of the Python language. Backtracking undoes all computations before
it jumps, but Python exceptions do not: variables assigned between the time a try statement is entered
and the time an exception is raised are not reset to their prior values. Even the generator functions and
expressions we met in Chapter 20 don’t do full backtracking—they simply respond to next(G) requests
by restoring state and resuming. For more on backtracking, see books on artificial intelligence or the
Prolog or Icon programming languages.
However, if we ask this function to index off the end of the string, an exception will be
triggered when the function tries to run obj[index]. Python detects out-of-bounds in-
dexing for sequences and reports it by raising (triggering) the built-in IndexError ex-
ception:
>>> fetcher(x, 4)                  # Default handler - shell interface
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in fetcher
IndexError: string index out of range
Because our code does not explicitly catch this exception, it filters back up to the top
level of the program and invokes the default exception handler, which simply prints the
standard error message. By this point in the book, you’ve probably seen your share of
standard error messages. They include the exception that was raised, along with a stack
trace—a list of all the lines and functions that were active when the exception occurred.
The error message text here was printed by Python 3.3; it can vary slightly per release,
and even per interactive shell, so you shouldn’t rely upon its exact form—in either this
book or your code. When you’re coding interactively in the basic shell interface, the
filename is just “<stdin>,” meaning the standard input stream.
When working in the IDLE GUI’s interactive shell, the filename is “<pyshell>,” and
source lines are displayed, too. Either way, file line numbers are not very meaningful
when there is no file (we’ll see more interesting error messages later in this part of the
book):
>>> fetcher(x, 4)                  # Default handler - IDLE GUI interface
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    fetcher(x, 4)
  File "<pyshell#3>", line 2, in fetcher
    return obj[index]
IndexError: string index out of range
In a more realistic program launched outside the interactive prompt, after printing an
error message the default handler at the top also terminates the program immediately.
That course of action makes sense for simple scripts; errors often should be fatal, and
the best you can do when they occur is inspect the standard error message.
Catching Exceptions
Sometimes, this isn’t what you want, though. Server programs, for instance, typically
need to remain active even after internal errors. If you don’t want the default exception
behavior, wrap the call in a try statement to catch exceptions yourself:
>>> try:
...     fetcher(x, 4)
... except IndexError:             # Catch and recover
...     print('got exception')
...
got exception
>>>
Now, Python jumps to your handler—the block under the except clause that names
the exception raised—automatically when an exception is triggered while the try block
is running. The net effect is to wrap a nested block of code in an error handler that
intercepts the block’s exceptions.
When working interactively like this, after the except clause runs, we wind up back at
the Python prompt. In a more realistic program, try statements not only catch excep-
tions, but also recover from them:
>>> def catcher():
        try:
            fetcher(x, 4)
        except IndexError:
            print('got exception')
        print('continuing')
>>> catcher()
got exception
continuing
>>>
This time, after the exception is caught and handled, the program resumes execution
after the entire try statement that caught it—which is why we get the “continuing”
message here. We don’t see the standard error message, and the program continues on
its way normally.
Notice that there’s no way in Python to go back to the code that triggered the exception
(short of rerunning the code that reached that point all over again, of course). Once
you’ve caught the exception, control continues after the entire try that caught the
exception, not after the statement that kicked it off. In fact, Python clears the memory
of any functions that were exited as a result of the exception, like fetcher in our ex-
ample; they’re not resumable. The try both catches exceptions, and is where the pro-
gram resumes.
Presentation note: The interactive prompt’s “...” reappears in this part
for some top-level try statements, because their code won’t work if cut
and pasted unless nested in a function or class (the except and other
lines must align with the try, and not have extra preceding spaces that
are needed to illustrate their indentation structure). To run, simply type
or paste statements with “...” prompts one line at a time.
Raising Exceptions
So far, we’ve been letting Python raise exceptions for us by making mistakes (on pur-
pose this time!), but our scripts can raise exceptions too—that is, exceptions can be
raised by Python or by your program, and can be caught or not. To trigger an exception
manually, simply run a raise statement. User-triggered exceptions are caught the same
way as those Python raises. The following may not be the most useful Python code ever
penned, but it makes the point—raising the built-in IndexError exception:
>>> try:
...     raise IndexError           # Trigger exception manually
... except IndexError:
...     print('got exception')
...
got exception
As usual, if they’re not caught, user-triggered exceptions are propagated up to the top-
level default exception handler and terminate the program with a standard error mes-
sage:
>>> raise IndexError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError
As we’ll see in the next chapter, the assert statement can be used to trigger exceptions,
too—it’s a conditional raise, used mostly for debugging purposes during development:
>>> assert False, 'Nobody expects the Spanish Inquisition!'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: Nobody expects the Spanish Inquisition!
User-Defined Exceptions
The raise statement introduced in the prior section raises a built-in exception defined
in Python’s built-in scope. As you’ll learn later in this part of the book, you can also
define new exceptions of your own that are specific to your programs. User-defined
exceptions are coded with classes, which inherit from a built-in exception class: usually
the class named Exception:
>>> class AlreadyGotOne(Exception): pass      # User-defined exception

>>> def grail():
        raise AlreadyGotOne()                 # Raise an instance

>>> try:
...     grail()
... except AlreadyGotOne:                     # Catch class name
...     print('got exception')
...
got exception
>>>
As we’ll see in the next chapter, an as clause on an except can gain access to the ex-
ception object itself. Class-based exceptions allow scripts to build exception categories,
which can inherit behavior, and have attached state information and methods. They
can also customize their error message text displayed if they’re not caught:
>>> class Career(Exception):
        def __str__(self): return 'So I became a waiter...'

>>> raise Career()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
__main__.Career: So I became a waiter...
>>>
Termination Actions
Finally, try statements can say “finally”—that is, they may include finally blocks.
These look like except handlers for exceptions, but the try/finally combination speci-
fies termination actions that always execute “on the way out,” regardless of whether
an exception occurs in the try block or not:
>>> try:
...     fetcher(x, 3)
... finally:                       # Termination actions
...     print('after fetch')
...
'm'
after fetch
>>>
Here, if the try block finishes without an exception, the finally block will run, and
the program will resume after the entire try. In this case, this statement seems a bit silly
—we might as well have simply typed the print right after a call to the function, and
skipped the try altogether:
fetcher(x, 3)
print('after fetch')
There is a problem with coding this way, though: if the function call raises an exception,
the print will never be reached. The try/finally combination avoids this pitfall—when
an exception does occur in a try block, finally blocks are executed while the program
is being unwound:
>>> def after():
        try:
            fetcher(x, 4)
        finally:
            print('after fetch')
        print('after try?')

>>> after()
after fetch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in after
  File "<stdin>", line 2, in fetcher
IndexError: string index out of range
>>>
Here, we don’t get the “after try?” message because control does not resume after the
try/finally block when an exception occurs. Instead, Python jumps back to run the
finally action, and then propagates the exception up to a prior handler (in this case,
to the default handler at the top). If we change the call inside this function so as not to
trigger an exception, the finally code still runs, but the program continues after the try:
>>> def after():
        try:
            fetcher(x, 3)
        finally:
            print('after fetch')
        print('after try?')

>>> after()
after fetch
after try?
>>>
In practice, try/except combinations are useful for catching and recovering from ex-
ceptions, and try/finally combinations come in handy to guarantee that termination
actions will fire regardless of any exceptions that may occur in the try block’s code.
For instance, you might use try/except to catch errors raised by code that you import
from a third-party library, and try/finally to ensure that calls to close files or terminate
server connections are always run. We’ll see some such practical examples later in this
part of the book.
Although they serve conceptually distinct purposes, as of Python 2.5, we can mix
except and finally clauses in the same try statement—the finally is run on the way
out regardless of whether an exception was raised, and regardless of whether the ex-
ception was caught by an except clause.
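For instance, a single try can combine all of these clauses; here is a minimal
sketch using the fetcher function and x string defined earlier:

try:
    result = fetcher(x, 3)
except IndexError:                 # Run only on out-of-bounds indexes
    print('got exception')
else:                              # Run only if no exception was raised
    print('result:', result)
finally:                           # Always run on the way out
    print('after fetch')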
As we’ll learn in the next chapter, Python 2.X and 3.X both provide an alternative to
try/finally when using some types of objects. The with/as statement runs an object’s
context management logic to guarantee that termination actions occur, irrespective of
any exceptions in its nested block:
>>> with open('lumberjack.txt', 'w') as file:      # Always close file on exit
        file.write('The larch!\n')
Although this option requires fewer lines of code, it’s applicable only when processing
certain object types, so try/finally is a more general termination structure, and is often
simpler than coding a class in cases where with is not already supported. On the other
hand, with/as may also run startup actions too, and supports user-defined context
management code with access to Python’s full OOP toolset.
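As a preview, here is a minimal sketch of a user-defined context manager; the
Tracer class is illustrative only, and the full protocol is covered in the next
chapter:

class Tracer:
    def __enter__(self):                           # Startup action
        print('starting')
        return self
    def __exit__(self, exc_type, exc_value, tb):   # Termination action: always
        print('exiting')                           # runs, exception or not
        return False                               # Don't suppress exceptions

with Tracer():
    print('working')

# Prints: starting, then working, then exiting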
Why You Will Care: Error Checks
One way to see how exceptions are useful is to compare coding styles in Python and
languages without exceptions. For instance, if you want to write robust programs in
the C language, you generally have to test return values or status codes after every
operation that could possibly go astray, and propagate the results of the tests as your
programs run:
doStuff()
{                                      # C program
    if (doFirstThing() == ERROR)       # Detect errors everywhere
        return ERROR;                  # even if not handled here
    if (doNextThing() == ERROR)
        return ERROR;
    ...
    return doLastThing();
}

main()
{
    if (doStuff() == ERROR)
        badEnding();
    else
        goodEnding();
}
In fact, realistic C programs often have as much code devoted to error detection as to
doing actual work. But in Python, you don’t have to be so methodical (and neurotic!).
You can instead wrap arbitrarily vast pieces of a program in exception handlers and
simply write the parts that do the actual work, assuming all is normally well:
def doStuff():                         # Python code
    doFirstThing()                     # We don't care about exceptions here,
    doNextThing()                      # so we don't need to detect them
    ...
    doLastThing()

if __name__ == '__main__':
    try:
        doStuff()                      # This is where we care about results,
    except:                            # so it's the only place we must check
        badEnding()
    else:
        goodEnding()
Because control jumps immediately to a handler when an exception occurs, there’s no
need to instrument all your code to guard for errors, and there’s no extra performance
overhead to run all the tests. Moreover, because Python detects errors automatically,
your code often doesn’t need to check for errors in the first place. The upshot is that
exceptions let you largely ignore the unusual cases and avoid error-checking code that
can distract from your program’s goals.
Chapter Summary
And that is the majority of the exception story; exceptions really are a simple tool.
To summarize, Python exceptions are a high-level control flow device. They may be
raised by Python, or by your own programs. In both cases, they may be ignored (to
trigger the default error message), or caught by try statements (to be processed by your
code). The try statement comes in two logical formats that, as of Python 2.5, can be
combined—one that handles exceptions, and one that executes finalization code re-
gardless of whether exceptions occur or not. Python’s raise and assert statements
trigger exceptions on demand—both built-ins and new exceptions we define with
classes—and the with/as statement is an alternative way to ensure that termination
actions are carried out for objects that support it.
In the rest of this part of the book, we’ll fill in some of the details about the statements
involved, examine the other sorts of clauses that can appear under a try, and discuss
class-based exception objects. The next chapter begins our tour by taking a closer look
at the statements we introduced here. Before you turn the page, though, here are a few
quiz questions to review.
Test Your Knowledge: Quiz
1. Name three things that exception processing is good for.
2. What happens to an exception if you don’t do anything special to handle it?
3. How can your script recover from an exception?
4. Name two ways to trigger exceptions in your script.
5. Name two ways to specify actions to be run at termination time, whether an ex-
ception occurs or not.
Test Your Knowledge: Answers
1. Exception processing is useful for error handling, termination actions, and event
notification. It can also simplify the handling of special cases and can be used to
implement alternative control flows as a sort of structured “go to” operation. In
general, exception processing also cuts down on the amount of error-checking code
your program may require—because all errors filter up to handlers, you may not
need to test the outcome of every operation.
2. Any uncaught exception eventually filters up to the default exception handler
Python provides at the top of your program. This handler prints the familiar error
message and shuts down your program.
3. If you don’t want the default message and shutdown, you can code try/except
statements to catch and recover from exceptions that are raised within its nested
code block. Once an exception is caught, the exception is terminated and your
program continues after the try.
4. The raise and assert statements can be used to trigger an exception, exactly as if
it had been raised by Python itself. In principle, you can also raise an exception by
making a programming mistake, but that’s not usually an explicit goal!
5. The try/finally statement can be used to ensure actions are run after a block of
code exits, regardless of whether the block raises an exception or not. The with/
as statement can also be used to ensure termination actions are run, but only when
processing object types that support it.
CHAPTER 34
Exception Coding Details
In the prior chapter we took a quick look at exception-related statements in action.
Here, we’re going to dig a bit deeper—this chapter provides a more formal introduction
to exception processing syntax in Python. Specifically, we’ll explore the details behind
the try, raise, assert, and with statements. As we’ll see, although these statements are
mostly straightforward, they offer powerful tools for dealing with exceptional condi-
tions in Python code.
One procedural note up front: The exception story has changed in major
ways in recent years. As of Python 2.5, the finally clause can appear in
the same try statement as except and else clauses (previously, they
could not be combined). Also, as of Python 3.0 and 2.6, the new with
context manager statement has become official, and user-defined ex-
ceptions must now be coded as class instances, which should inherit
from a built-in exception superclass. Moreover, 3.X sports slightly
modified syntax for the raise statement and except clauses, some of
which is available in 2.6 and 2.7.
I will focus on the state of exceptions in recent Python 2.X and 3.X
releases in this edition, but because you are still very likely to see the
original techniques in code for some time to come, along the way I’ll
point out how things have evolved in this domain.
The try/except/else Statement
Now that we’ve seen the basics, it’s time for the details. In the following discussion,
I’ll first present try/except/else and try/finally as separate statements, because in
versions of Python prior to 2.5 they serve distinct roles and cannot be combined, and
still are at least logically distinct today. Per the preceding note, in Python 2.5 and later
except and finally can be mixed in a single try statement; we’ll see the implications
of that merging after we’ve explored the two original forms in isolation.
Syntactically, the try is a compound, multipart statement. It starts with a try header
line, followed by a block of (usually) indented statements; then one or more except
clauses that identify exceptions to be caught and blocks to process them; and an op-
tional else clause and block at the end. You associate the words try, except, and
else by indenting them to the same level (i.e., lining them up vertically). For reference,
here’s the general and most complete format in Python 3.X:
try:
    statements                  # Run this main action first
except name1:
    statements                  # Run if name1 is raised during try block
except (name2, name3):
    statements                  # Run if any of these exceptions occur
except name4 as var:
    statements                  # Run if name4 is raised, assign instance raised to var
except:
    statements                  # Run for all other exceptions raised
else:
    statements                  # Run if no exception was raised during try block
Semantically, the block under the try header in this statement represents the main
action of the statement—the code you’re trying to run and wrap in error processing
logic. The except clauses define handlers for exceptions raised during the try block,
and the else clause (if coded) provides a handler to be run if no exceptions occur. The
var entry here has to do with a feature of raise statements and exception classes, which
we will discuss in full later in this chapter.
How try Statements Work
Operationally, here’s how try statements are run. When a try statement is entered,
Python marks the current program context so it can return to it if an exception occurs.
The statements nested under the try header are run first. What happens next depends
on whether exceptions are raised while the try block’s statements are running, and
whether they match those that the try is watching for:
If an exception occurs while the try block’s statements are running, and the ex-
ception matches one that the statement names, Python jumps back to the try and
runs the statements under the first except clause that matches the raised exception,
after assigning the raised exception object to the variable named after the as key-
word in the clause (if present). After the except block runs, control then resumes
below the entire try statement (unless the except block itself raises another ex-
ception, in which case the process is started anew from this point in the code).
If an exception occurs while the try block’s statements are running, but the ex-
ception does not match one that the statement names, the exception is propagated
up to the next most recently entered try statement that matches the exception; if
no such matching try statement can be found and the search reaches the top level
of the process, Python kills the program and prints a default error message.
If an exception does not occur while the try block’s statements are running, Python
runs the statements under the else line (if present), and control then resumes below
the entire try statement.
In other words, except clauses catch any matching exceptions that happen while the
try block is running, and the else clause runs only if no exceptions happen while the
try block runs. Exceptions raised are matched to exceptions named in except clauses
by superclass relationships we’ll explore in the next chapter, and the empty except clause
(with no exception name) matches all (or all other) exceptions.
The except clauses are focused exception handlers—they catch exceptions that occur
only within the statements in the associated try block. However, as the try block’s
statements can call functions coded elsewhere in a program, the source of an exception
may be outside the try statement itself.
In fact, a try block might invoke arbitrarily large amounts of program code—including
code that may have try statements of its own, which will be searched first when ex-
ceptions occur. That is, try statements can nest at runtime, a topic I’ll have more to say
about in Chapter 36.
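Here is a minimal sketch of such nesting at work. The exception is caught by the
most recently entered try statement that matches it, so the outer handler is never
run:

def inner():
    try:
        raise IndexError           # Matched by the closest enclosing try
    except IndexError:
        print('handled in inner')

def outer():
    try:
        inner()
    except IndexError:
        print('handled in outer')  # Not reached: inner caught it first

outer()                            # Prints: handled in inner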
try Statement Clauses
When you write a try statement, a variety of clauses can appear after the try header.
Table 34-1 summarizes all the possible forms—you must use at least one. We’ve already
met some of these: as you know, except clauses catch exceptions, finally clauses run
on the way out, and else clauses run if no exceptions are encountered.
Formally, there may be any number of except clauses, but you can code else only if
there is at least one except, and there can be only one else and one finally. Through
Python 2.4, the finally clause must appear alone (without else or except); the try/
finally is really a different statement. As of Python 2.5, however, a finally can appear
in the same statement as except and else (more on the ordering rules later in this chapter
when we meet the unified try statement).
Table 34-1. try statement clause forms
Clause form                         Interpretation
except:                             Catch all (or all other) exception types.
except name:                        Catch a specific exception only.
except name as value:               Catch the listed exception and assign its instance.
except (name1, name2):              Catch any of the listed exceptions.
except (name1, name2) as value:     Catch any listed exception and assign its instance.
else:                               Run if no exceptions are raised in the try block.
finally:                            Always perform this block on exit.
We’ll explore the entries with the extra as value part in more detail when we meet the
raise statement later in this chapter. They provide access to the objects that are raised
as exceptions.
Catching any and all exceptions
The first and fourth entries in Table 34-1 are new here:
except clauses that list no exception name (except:) catch all exceptions not pre-
viously listed in the try statement.
except clauses that list a set of exceptions in parentheses (except (e1, e2, e3):)
catch any of the listed exceptions.
Because Python looks for a match within a given try by inspecting the except clauses
from top to bottom, the parenthesized version has the same effect as listing each ex-
ception in its own except clause, but you have to code the statement body associated
with each only once. Here’s an example of multiple except clauses at work, which
demonstrates just how specific your handlers can be:
try:
    action()
except NameError:
    ...
except IndexError:
    ...
except KeyError:
    ...
except (AttributeError, TypeError, SyntaxError):
    ...
else:
    ...
In this example, if an exception is raised while the call to the action function is running,
Python returns to the try and searches for the first except that names the exception
raised. It inspects the except clauses from top to bottom and left to right, and runs the
statements under the first one that matches. If none match, the exception is propagated
past this try. Note that the else runs only when no exception occurs in action—it does
not run when an exception without a matching except is raised.
Catching all: The empty except and Exception
If you really want a general “catchall” clause, an empty except does the trick:
try:
    action()
except NameError:
    ...                         # Handle NameError
except IndexError:
    ...                         # Handle IndexError
except:
    ...                         # Handle all other exceptions
else:
    ...                         # Handle the no-exception case
The empty except clause is a sort of wildcard feature—because it catches everything, it
allows your handlers to be as general or specific as you like. In some scenarios, this
form may be more convenient than listing all possible exceptions in a try. For example,
the following catches everything without listing anything:
try:
    action()
except:
    ...                         # Catch all possible exceptions
Empty excepts also raise some design issues, though. Although convenient, they may
catch unexpected system exceptions unrelated to your code, and they may inadver-
tently intercept exceptions meant for another handler. For example, even system exit
calls and Ctrl-C key combinations in Python trigger exceptions, and you usually want
these to pass. Even worse, the empty except may also catch genuine programming
mistakes for which you probably want to see an error message. We’ll revisit this as a
gotcha at the end of this part of the book. For now, I’ll just say, “use with care.”
Python 3.X more strongly supports an alternative that solves one of these problems—
catching an exception named Exception has almost the same effect as an empty
except, but ignores exceptions related to system exits:
try:
    action()
except Exception:
    ...                         # Catch all possible exceptions, except exits
We’ll explore how this form works its voodoo formally in the next chapter when we
study exception classes. In short, it works because exceptions match if they are a sub-
class of one named in an except clause, and Exception is a superclass of all the exceptions
you should generally catch this way. This form has most of the same convenience of
the empty except, without the risk of catching exit events. Though better, it also has
some of the same dangers—especially with regard to masking programming errors.
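As a minimal preview of why this works (the class-based details come in the next
chapter): exit events such as SystemExit derive from BaseException rather than
Exception, so they propagate past an except Exception handler:

try:
    try:
        raise SystemExit                   # Exit requests derive from BaseException
    except Exception:
        print('not reached')               # SystemExit is not an Exception subclass
except BaseException:
    print('exit event caught explicitly')  # An empty except would catch it here too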
Version skew note: See also the raise statement ahead for more on the as portion of except clauses in try. Syntactically, Python 3.X requires the except E as V: handler clause form listed in Table 34-1 and used in this book, rather than the older except E, V: form. The latter form is still available (but not recommended) in Python 2.6 and 2.7: if used, it’s converted to the former.

The change was made to eliminate confusion regarding the dual role of commas in the older form. In this form, two alternate exceptions are properly coded as except (E1, E2):. Because 3.X supports the as form only, commas in a handler clause are always taken to mean a tuple, regardless of whether parentheses are used or not, and the values are interpreted as alternative exceptions to be caught.

As we’ll see ahead, though, this option does not modify the scoping rules in 2.X: even with the new as syntax, the variable V is still available after the except block in 2.X. In 3.X, V is not available later, and is in fact forcibly deleted.
The try else Clause
The purpose of the else clause is not always immediately obvious to Python newcomers. Without it, though, there is no direct way to tell (without setting and checking Boolean flags) whether the flow of control has proceeded past a try statement because no exception was raised, or because an exception occurred and was handled. Either way, we wind up after the try:
try:
    ...run code...
except IndexError:
    ...handle exception...

# Did we get here because the try failed or not?
Much like the way else clauses in loops make the exit cause more apparent, the else clause provides syntax in a try that makes what has happened obvious and unambiguous:
try:
    ...run code...
except IndexError:
    ...handle exception...
else:
    ...no exception occurred...
You can almost emulate an else clause by moving its code into the try block:
try:
    ...run code...
    ...no exception occurred...
except IndexError:
    ...handle exception...
This can lead to incorrect exception classifications, though. If the “no exception occurred” action triggers an IndexError, it will register as a failure of the try block and erroneously trigger the exception handler below the try (subtle, but true!). By using an explicit else clause instead, you make the logic more obvious and guarantee that except handlers will run only for real failures in the code you’re wrapping in a try, not for failures in the else no-exception case’s action.
Example: Default Behavior
Because the control flow through a program is easier to capture in Python than in
English, let’s run some examples that further illustrate exception basics in the context
of larger code samples in files.
I’ve mentioned that exceptions not caught by try statements percolate up to the top level of the Python process and run Python’s default exception-handling logic (i.e., Python terminates the running program and prints a standard error message). To illustrate, running the following module file, bad.py, generates a divide-by-zero exception:
def gobad(x, y):
    return x / y

def gosouth(x):
    print(gobad(x, 0))

gosouth(1)
Because the program ignores the exception it triggers, Python kills the program and
prints a message:
% python bad.py
Traceback (most recent call last):
  File "bad.py", line 7, in <module>
    gosouth(1)
  File "bad.py", line 5, in gosouth
    print(gobad(x, 0))
  File "bad.py", line 2, in gobad
    return x / y
ZeroDivisionError: division by zero
I ran this in a shell window with Python 3.X. The message consists of a stack trace (“Traceback”) and the name of and details about the exception that was raised. The stack trace lists all lines active when the exception occurred, from oldest to newest. Note that because we’re not working at the interactive prompt, in this case the file and line number information is more useful. For example, here we can see that the bad divide happens at the last entry in the trace—line 2 of the file bad.py, a return statement.1
Because Python detects and reports all errors at runtime by raising exceptions, exceptions are intimately bound up with the ideas of error handling and debugging in general. If you’ve worked through this book’s examples, you’ve undoubtedly seen an exception or two along the way—even typos usually generate a SyntaxError or other exception when a file is imported or executed (that’s when the compiler is run). By default, you get a useful error display like the one just shown, which helps you track down the problem.
Often, this standard error message is all you need to resolve problems in your code. For more heavy-duty debugging jobs, you can catch exceptions with try statements, or use one of the debugging tools that I introduced in Chapter 3 and will summarize again in Chapter 36, such as the pdb standard library module.

1. As mentioned in the prior chapter, the text of error messages and stack traces tends to vary slightly over time and shells. Don’t be alarmed if your error messages don’t exactly match mine. When I ran this example in Python 3.3’s IDLE GUI, for instance, its error message text showed filenames with full absolute directory paths.
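As a brief hedged aside on the latter, pdb’s post-mortem mode can inspect the stack at the point of failure. You might run a failing script under the debugger from a command line, or trigger post-mortem inspection from code (a sketch, reusing the bad.py file shown earlier):

% python -m pdb bad.py               # Run the whole script under the debugger

import pdb
try:
    import bad                       # bad.py fails while being imported
except:
    pdb.post_mortem()                # Step around at the point of failure

Both forms are standard pdb usage; see Chapter 36 and the library manuals for the full story.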
Example: Catching Built-in Exceptions
Python’s default exception handling is often exactly what you want—especially for code in a top-level script file, where an error often should terminate your program immediately. For many programs, there is no need to be more specific about errors in your code.

Sometimes, though, you’ll want to catch errors and recover from them instead. If you don’t want your program terminated when Python raises an exception, simply catch it by wrapping the program logic in a try. This is an important capability for programs such as network servers, which must keep running persistently. For example, the following code, in the file kaboom.py, catches and recovers from the TypeError Python raises immediately when you try to concatenate a list and a string (remember, the + operator expects the same sequence type on both sides):
def kaboom(x, y):
    print(x + y)                     # Trigger TypeError

try:
    kaboom([0, 1, 2], 'spam')
except TypeError:                    # Catch and recover here
    print('Hello world!')
print('resuming here')               # Continue here if exception or not
When the exception occurs in the function kaboom, control jumps to the try statement’s except clause, which prints a message. Since an exception is “dead” after it’s been caught like this, the program continues executing below the try rather than being terminated by Python. In effect, the code processes and clears the error, and your script recovers:
% python kaboom.py
Hello world!
resuming here
Keep in mind that once you’ve caught an error, control resumes at the place where you
caught it (i.e., after the try); there is no direct way to go back to the place where the
exception occurred (here, in the function kaboom). In a sense, this makes exceptions
more like simple jumps than function calls—there is no way to return to the code that
triggered the error.
The try/finally Statement
The other flavor of the try statement is a specialization that has to do with finalization
(a.k.a. termination) actions. If a finally clause is included in a try, Python will always
run its block of statements “on the way out” of the try statement, whether an exception
occurred while the try block was running or not. Its general form is:
try:
    statements                       # Run this action first
finally:
    statements                       # Always run this code on the way out
With this variant, Python begins by running the statement block associated with the
try header line as usual. What happens next depends on whether an exception occurs
during the try block:
• If an exception does not occur while the try block is running, Python continues on to run the finally block, and then continues execution past the try statement.
• If an exception does occur during the try block’s run, Python still comes back and runs the finally block, but it then propagates the exception up to a previously entered try or the top-level default handler; the program does not resume execution below the finally clause’s try statement. That is, the finally block is run even if an exception is raised, but unlike an except, the finally does not terminate the exception—it continues being raised after the finally block runs.
The try/finally form is useful when you want to be completely sure that an action will
happen after some code runs, regardless of the exception behavior of the program. In
practice, it allows you to specify cleanup actions that always must occur, such as file
closes and server disconnects where required.
Note that the finally clause cannot be used in the same try statement as except and else in Python 2.4 and earlier, so the try/finally is best thought of as a distinct statement form if you are using an older release. In Python 2.5 and later, however, finally can appear in the same statement as except and else, so today there is really a single try statement with many optional clauses (more about this shortly). Whichever version you use, though, the finally clause still serves the same purpose—to specify “cleanup” actions that must always be run, regardless of any exceptions.
As we’ll also see later in this chapter, as of Python 2.6 and 3.0, the new
with statement and its context managers provide an object-based way
to do similar work for exit actions. Unlike finally, this new statement
also supports entry actions, but it is limited in scope to objects that
implement the context manager protocol it leverages.
Example: Coding Termination Actions with try/finally
We saw some simple try/finally examples in the prior chapter. Here’s a more realistic
example that illustrates a typical role for this statement:
class MyError(Exception): pass

def stuff(file):
    raise MyError()

file = open('data', 'w')             # Open an output file (this can fail too)
try:
    stuff(file)                      # Raises exception
finally:
    file.close()                     # Always close file to flush output buffers
print('not reached')                 # Continue here only if no exception
When the function in this code raises its exception, the control flow jumps back and runs the finally block to close the file. The exception is then propagated on to either another try or the default top-level handler, which prints the standard error message and shuts down the program. Hence, the statement after this try is never reached. If the function here did not raise an exception, the program would still execute the finally block to close the file, but it would then continue below the entire try statement.
In this specific case, we’ve wrapped a call to a file-processing function in a try with a
finally clause to make sure that the file is always closed, and thus finalized, whether
the function triggers an exception or not. This way, later code can be sure that the file’s
output buffer’s content has been flushed from memory to disk. A similar code structure
can guarantee that server connections are closed, and so on.
As we learned in Chapter 9, file objects are automatically closed on garbage collection
in standard Python (CPython); this is especially useful for temporary files that we don’t
assign to variables. However, it’s not always easy to predict when garbage collection
will occur, especially in larger programs or alternative Python implementations with
differing garbage collection policies (e.g., Jython, PyPy). The try statement makes file
closes more explicit and predictable and pertains to a specific block of code. It ensures
that the file will be closed on block exit, regardless of whether an exception occurs or
not.
This particular example’s function isn’t all that useful (it just raises an exception), but
wrapping calls in try/finally statements is a good way to ensure that your closing-time
termination activities always run. Again, Python always runs the code in your
finally blocks, regardless of whether an exception happens in the try block.2
Notice how the user-defined exception here is again defined with a class—as we’ll see
more formally in the next chapter, exceptions today must all be class instances in 2.6,
3.0, and later releases in both lines.
2. Unless Python crashes completely, of course. It does a good job of avoiding this, though, by checking all possible errors as a program runs. When a program does crash hard, it is usually due to a bug in linked-in C extension code, outside of Python’s scope.

Unified try/except/finally

In all versions of Python prior to release 2.5 (for its first 15 years of life, more or less), the try statement came in two flavors and was really two separate statements—we could either use a finally to ensure that cleanup code was always run, or write except blocks to catch and recover from specific exceptions and optionally specify an else clause to be run if no exceptions occurred.
That is, the finally clause could not be mixed with except and else. This was partly
because of implementation issues, and partly because the meaning of mixing the two
seemed obscure—catching and recovering from exceptions seemed a disjoint concept
from performing cleanup actions.
In Python 2.5 and later, though, the two statements have merged. Today, we can mix
finally, except, and else clauses in the same statement—in part because of similar
utility in the Java language. That is, we can now write a statement of this form:
try:                                 # Merged form
    main-action
except Exception1:
    handler1
except Exception2:                   # Catch exceptions
    handler2
...
else:                                # No-exception handler
    else-block
finally:                             # The finally encloses all else
    finally-block
The code in this statement’s main-action block is executed first, as usual. If that code
raises an exception, all the except blocks are tested, one after another, looking for a
match to the exception raised. If the exception raised is Exception1, the handler1 block
is executed; if it’s Exception2, handler2 is run, and so on. If no exception is raised, the
else-block is executed.
No matter what’s happened previously, the finally-block is executed once the main
action block is complete and any raised exceptions have been handled. In fact, the code
in the finally-block will be run even if there is an error in an exception handler or the
else-block and a new exception is raised.
As always, the finally clause does not end the exception—if an exception is active
when the finally-block is executed, it continues to be propagated after the finally-
block runs, and control jumps somewhere else in the program (to another try, or to
the default top-level handler). If no exception is active when the finally is run, control
resumes after the entire try statement.
The net effect is that the finally is always run, regardless of whether:
• An exception occurred in the main action and was handled.
• An exception occurred in the main action and was not handled.
• No exceptions occurred in the main action.
• A new exception was triggered in one of the handlers.
Again, the finally serves to specify cleanup actions that must always occur on the way
out of the try, regardless of what exceptions have been raised or handled.
Unified try Statement Syntax
When combined like this, the try statement must have either an except or a finally,
and the order of its parts must be like this:
try -> except -> else -> finally
where the else and finally are optional, and there may be zero or more excepts, but
there must be at least one except if an else appears. Really, the try statement consists
of two parts: excepts with an optional else, and/or the finally.
In fact, it’s more accurate to describe the merged statement’s syntactic form this way
(square brackets mean optional and star means zero-or-more here):
try:                                 # Format 1
    statements
except [type [as value]]:            # [type [, value]] in Python 2.X
    statements
[except [type [as value]]:
    statements]*
[else:
    statements]
[finally:
    statements]

try:                                 # Format 2
    statements
finally:
    statements
Because of these rules, the else can appear only if there is at least one except, and it’s
always possible to mix except and finally, regardless of whether an else appears or
not. It’s also possible to mix finally and else, but only if an except appears too (though
the except can omit an exception name to catch everything and run a raise statement,
described later, to reraise the current exception). If you violate any of these ordering
rules, Python will raise a syntax error exception before your code runs.
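For instance, a quick sketch of one such violation (the exact SyntaxError message text varies by Python version): an else without an except is rejected before any of the code runs.

>>> try:
...     print('action')
... else:
...     print('else')
...
SyntaxError: invalid syntax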
Combining finally and except by Nesting
Prior to Python 2.5, it is actually possible to combine finally and except clauses in a
try by syntactically nesting a try/except in the try block of a try/finally statement.
We’ll explore this technique more fully in Chapter 36, but the basics may help clarify
the meaning of a combined try—the following has the same effect as the new merged
form shown at the start of this section:
try:                                 # Nested equivalent to merged form
    try:
        main-action
    except Exception1:
        handler1
    except Exception2:
        handler2
    ...
    else:
        no-error
finally:
    cleanup
Again, the finally block is always run on the way out, regardless of what happened in the main action and regardless of any exception handlers run in the nested try (trace through the four cases listed previously to see how this works the same). Since an else always requires an except, this nested form even sports the same mixing constraints of the unified statement form outlined in the preceding section.

However, this nested equivalent seems more obscure to some, and requires more code than the new merged form—though just one four-character line plus extra indentation. Mixing finally into the same statement makes your code arguably easier to write and read, and is a generally preferred technique today.
Unified try Example
Here’s a demonstration of the merged try statement form at work. The following file,
mergedexc.py, codes four common scenarios, with print statements that describe the
meaning of each:
# File mergedexc.py (Python 3.X + 2.X)
sep = '-' * 45 + '\n'

print(sep + 'EXCEPTION RAISED AND CAUGHT')
try:
    x = 'spam'[99]
except IndexError:
    print('except run')
finally:
    print('finally run')
print('after run')

print(sep + 'NO EXCEPTION RAISED')
try:
    x = 'spam'[3]
except IndexError:
    print('except run')
finally:
    print('finally run')
print('after run')

print(sep + 'NO EXCEPTION RAISED, WITH ELSE')
try:
    x = 'spam'[3]
except IndexError:
    print('except run')
else:
    print('else run')
finally:
    print('finally run')
print('after run')

print(sep + 'EXCEPTION RAISED BUT NOT CAUGHT')
try:
    x = 1 / 0
except IndexError:
    print('except run')
finally:
    print('finally run')
print('after run')
When this code is run, the following output is produced in Python 3.3; in 2.X, its behavior and output are the same because the print calls each print a single item, though the error message text varies slightly. Trace through the code to see how exception handling produces the output of each of the four tests here:
c:\code> py −3 mergedexc.py
---------------------------------------------
EXCEPTION RAISED AND CAUGHT
except run
finally run
after run
---------------------------------------------
NO EXCEPTION RAISED
finally run
after run
---------------------------------------------
NO EXCEPTION RAISED, WITH ELSE
else run
finally run
after run
---------------------------------------------
EXCEPTION RAISED BUT NOT CAUGHT
finally run
Traceback (most recent call last):
  File "mergedexc.py", line 39, in <module>
    x = 1 / 0
ZeroDivisionError: division by zero
This example uses built-in operations in the main action to trigger exceptions (or not),
and it relies on the fact that Python always checks for errors as code is running. The
next section shows how to raise exceptions manually instead.
The raise Statement
To trigger exceptions explicitly, you can code raise statements. Their general form is
simple—a raise statement consists of the word raise, optionally followed by the class
to be raised or an instance of it:
raise instance # Raise instance of class
raise class # Make and raise instance of class: makes an instance
raise # Reraise the most recent exception
As mentioned earlier, exceptions are always instances of classes in Python 2.6, 3.0, and later. Hence, the first raise form here is the most common—we provide an instance directly, either created before the raise or within the raise statement itself. If we pass a class instead, Python calls the class with no constructor arguments, to create an instance to be raised; this form is equivalent to adding parentheses after the class reference. The last form reraises the most recently raised exception; it’s commonly used in exception handlers to propagate exceptions that have been caught.
Version skew note: Python 3.X no longer supports the raise Exc, Args form that is still available in Python 2.X. In 3.X, use the raise Exc(Args) instance-creation call form described in this book instead. The equivalent comma form in 2.X is legacy syntax provided for compatibility with the now-defunct string-based exceptions model, and it’s deprecated in 2.X. If used, it is converted to the 3.X call form.

As in earlier releases, a raise Exc form is also allowed to name a class—it is converted to raise Exc() in both versions, calling the class constructor with no arguments. Besides its defunct comma syntax, Python 2.X’s raise also allowed for either string or class exceptions, but the former is removed in 2.6, deprecated in 2.5, and not covered here except for a brief mention in the next chapter. Use classes for new exceptions today.
Raising Exceptions
To make this clearer, let’s look at some examples. With built-in exceptions, the fol-
lowing two forms are equivalent—both raise an instance of the exception class named,
but the first creates the instance implicitly:
raise IndexError # Class (instance created)
raise IndexError() # Instance (created in statement)
We can also create the instance ahead of time—because the raise statement accepts
any kind of object reference, the following two examples raise IndexError just like the
prior two:
exc = IndexError() # Create instance ahead of time
raise exc
excs = [IndexError, TypeError]
raise excs[0]
When an exception is raised, Python sends the raised instance along with the exception.
If a try includes an except name as X: clause, the variable X will be assigned the instance
provided in the raise:
try:
    ...
except IndexError as X:              # X assigned the raised instance object
    ...
The as is optional in a try handler (if it’s omitted, the instance is simply not assigned
to a name), but including it allows the handler to access both data in the instance and
methods in the exception class.
This model works the same for user-defined exceptions we code with classes—the following, for example, passes to the exception class constructor arguments that become available in the handler through the assigned instance:
class MyExc(Exception): pass
...
raise MyExc('spam')                  # Exception class with constructor args
...
try:
    ...
except MyExc as X:                   # Instance attributes available in handler
    print(X.args)
Because this encroaches on the next chapter’s topic, though, I’ll defer further details
until then.
Regardless of how you name them, exceptions are always identified by class instance
objects, and at most one is active at any given time. Once caught by an except clause
anywhere in the program, an exception dies (i.e., won’t propagate to another try),
unless it’s reraised by another raise statement or error.
Scopes and try except Variables
We’ll study exception objects in more detail in the next chapter. Now that we’ve seen
the as variable in action, though, we can finally clarify the related version-specific scope
issue summarized in Chapter 17. In Python 2.X, the exception reference variable name
in an except clause is not localized to the clause itself, and is available after the associated
block runs:
c:\code> py -2
>>> try:
...     1 / 0
... except Exception as X:           # 2.X does not localize X either way
...     print X
...
integer division or modulo by zero
>>> X
ZeroDivisionError('integer division or modulo by zero',)
This is true in 2.X whether we use the 3.X-style as or the earlier comma syntax:
>>> try:
...     1 / 0
... except Exception, X:
...     print X
...
integer division or modulo by zero
>>> X
ZeroDivisionError('integer division or modulo by zero',)
By contrast, Python 3.X localizes the exception reference name to the except block—
the variable is not available after the block exits, much like a temporary loop variable
in 3.X comprehension expressions (3.X also doesn’t accept 2.X’s except comma syntax,
as noted earlier):
c:\code> py -3
>>> try:
...     1 / 0
... except Exception, X:
SyntaxError: invalid syntax

>>> try:
...     1 / 0
... except Exception as X:           # 3.X localizes 'as' names to except block
...     print(X)
...
division by zero
>>> X
NameError: name 'X' is not defined
Unlike comprehension loop variables, though, this variable is removed after the except block exits in 3.X. It does so because it would otherwise retain a reference to the runtime call stack, which would defer garbage collection and thus retain excess memory space. This removal occurs, though, even if you’re using the name elsewhere, and is a more extreme policy than that used for comprehensions:
>>> X = 99
>>> try:
...     1 / 0
... except Exception as X:           # 3.X localizes _and_ removes on exit!
...     print(X)
...
division by zero
>>> X
NameError: name 'X' is not defined

>>> X = 99
>>> {X for X in 'spam'}              # 2.X/3.X localizes only: not removed
{'s', 'a', 'p', 'm'}
>>> X
99
Because of this, you should generally use unique variable names in your try statement’s
except clauses, even if they are localized by scope. If you do need to reference the
exception instance after the try statement, simply assign it to another name that won’t
be automatically removed:
>>> try:
...     1 / 0
... except Exception as X:           # Python removes this reference
...     print(X)
...     Saveit = X                   # Assign exc to retain exc if needed
...
division by zero
>>> X
NameError: name 'X' is not defined
>>> Saveit
ZeroDivisionError('division by zero',)
Propagating Exceptions with raise
The raise statement is a bit more feature-rich than we’ve seen thus far. For example,
a raise that does not include an exception name or extra data value simply reraises the
current exception. This form is typically used if you need to catch and handle an ex-
ception but don’t want the exception to die in your code:
>>> try:
...     raise IndexError('spam')     # Exceptions remember arguments
... except IndexError:
...     print('propagating')
...     raise                        # Reraise most recent exception
...
propagating
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
IndexError: spam
Running a raise this way reraises the exception and propagates it to a higher handler (or the default handler at the top, which stops the program with a standard error message). Notice how the argument we passed to the exception class shows up in the error messages; you’ll learn why this happens in the next chapter.
Python 3.X Exception Chaining: raise from
Exceptions can sometimes be triggered in response to other exceptions—both deliberately and by new program errors. To support full disclosure in such cases, Python 3.X (but not 2.X) also allows raise statements to have an optional from clause:
raise newexception from otherexception
When the from is used in an explicit raise request, the expression following from specifies another exception class or instance to attach to the __cause__ attribute of the new exception being raised. If the raised exception is not caught, Python prints both exceptions as part of the standard error message:
>>> try:
...     1 / 0
... except Exception as E:
...     raise TypeError('Bad') from E    # Explicitly chained exceptions
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ZeroDivisionError: division by zero

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
TypeError: Bad
When an exception is raised implicitly by a program error inside an exception handler,
a similar procedure is followed automatically: the previous exception is attached to the
new exception’s __context__ attribute and is again displayed in the standard error
message if the exception goes uncaught:
>>> try:
...     1 / 0
... except:
...     badname                      # Implicitly chained exceptions
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ZeroDivisionError: division by zero

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
NameError: name 'badname' is not defined
In both cases, because the original exception objects thus attached to new exception objects may themselves have attached causes, the causality chain can be arbitrarily long, and is displayed in full in error messages. That is, error messages might give more than two exceptions. The net effect in both explicit and implicit contexts is to allow programmers to know all exceptions involved, when one exception triggers another:
>>> try:
...     try:
...         raise IndexError()
...     except Exception as E:
...         raise TypeError() from E
... except Exception as E:
...     raise SyntaxError() from E
...
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
IndexError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
TypeError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 7, in <module>
SyntaxError: None
Code like the following would similarly display three exceptions, though implicitly
triggered here:
try:
    try:
        1 / 0
    except:
        badname
except:
    open('nonesuch')
Like the unified try, chained exceptions resemble utility in other languages (including Java and C#), though it’s not clear which languages were borrowers. In Python, it’s still a somewhat obscure extension, so we’ll defer to Python’s manuals for more details. In fact, Python 3.3 adds a way to stop exceptions from chaining, per the following note.
Python 3.3 chained exception suppression: raise from None. Python 3.3
introduces a new syntax form—using None as the exception name in the
raise from statement:
raise newexception from None
This allows the display of the chained exception context described in
the preceding section to be disabled. This makes for less cluttered error
messages in applications that convert between exception types while
processing exception chains.
The assert Statement

As a somewhat special case for debugging purposes, Python includes the assert statement. It is mostly just syntactic shorthand for a common raise usage pattern, and an assert can be thought of as a conditional raise statement. A statement of the form:

assert test, data                    # The data part is optional
works like the following code:
if __debug__:
    if not test:
        raise AssertionError(data)
In other words, if the test evaluates to false, Python raises an exception: the data item
(if it’s provided) is used as the exception’s constructor argument. Like all exceptions,
the AssertionError exception will kill your program if it’s not caught with a try, in
which case the data item shows up as part of the standard error message.
As an added feature, assert statements may be removed from a compiled program’s byte code if the -O Python command-line flag is used, thereby optimizing the program. AssertionError is a built-in exception, and the __debug__ flag is a built-in name that is automatically set to True unless the -O flag is used. Use a command line like python -O main.py to run in optimized mode and disable (and hence skip) asserts.
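For instance, a small sketch using a hypothetical file name: given a script asserts.py whose assert always fails, the -O flag makes the test disappear from the byte code entirely.

# File asserts.py
assert 1 == 2, 'asserts are enabled'
print('made it past the assert')

% python asserts.py
Traceback (most recent call last):
  File "asserts.py", line 2, in <module>
    assert 1 == 2, 'asserts are enabled'
AssertionError: asserts are enabled

% python -O asserts.py
made it past the assert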
Example: Trapping Constraints (but Not Errors!)
Assertions are typically used to verify program conditions during development. When
displayed, their error message text automatically includes source code line information
and the value listed in the assert statement. Consider the file asserter.py:
def f(x):
    assert x < 0, 'x must be negative'
    return x ** 2

% python
>>> import asserter
>>> asserter.f(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".\asserter.py", line 2, in f
    assert x < 0, 'x must be negative'
AssertionError: x must be negative
It’s important to keep in mind that assert is mostly intended for trapping user-defined constraints, not for catching genuine programming errors. Because Python traps programming errors itself, there is usually no need to code assert to catch things like out-of-bounds indexes, type mismatches, and zero divides:
def reciprocal(x):
    assert x != 0                    # A generally useless assert!
    return 1 / x                     # Python checks for zero automatically
Such assert use cases are usually superfluous—because Python raises exceptions on
errors automatically, you might as well let it do the job for you. As a rule, you don’t
need to do error checking explicitly in your own code.
Of course, there are exceptions for most rules—as suggested earlier in the book, if a
function has to perform long-running or unrecoverable actions before it reaches the
place where an exception will be triggered, you still might want to test for errors. Even
in this case, though, be careful not to make your tests overly specific or restrictive, or
you will limit your code’s utility.
For another example of common assert usage, see the abstract superclass example in Chapter 29; there, we used assert to make calls to undefined methods fail with a message. It’s a rare but useful tool.
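As a reminder of the technique, here is a sketch along the lines of that chapter’s example: calling a method a subclass was supposed to fill in fails with a meaningful message rather than an obscure error.

class Super:
    def delegate(self):
        self.action()                # Expected to be defined in subclasses
    def action(self):
        assert False, 'action must be defined!'

>>> Super().delegate()
...traceback text omitted...
AssertionError: action must be defined!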
with/as Context Managers
Python 2.6 and 3.0 introduced a new exception-related statement—the with, and its
optional as clause. This statement is designed to work with context manager objects,
which support a new method-based protocol, similar in spirit to the way that iteration
tools work with methods of the iteration protocol. This feature is also available as an
option in 2.5, but must be enabled there with an import of this form:
from __future__ import with_statement
The with statement is also similar to a “using” statement in the C# language. Although
a somewhat optional and advanced tools-oriented topic (and once a candidate for the
next part of the book), context managers are lightweight and useful enough to group
with the rest of the exception toolset here.
In short, the with/as statement is designed to be an alternative to a common try/finally usage idiom; like that statement, with is in large part intended for specifying termination-time or “cleanup” activities that must run regardless of whether an exception occurs during a processing step.
Unlike try/finally, the with statement is based upon an object protocol for specifying
actions to be run around a block of code. This makes with less general, qualifies it as
redundant in termination roles, and requires coding classes for objects that do not
support its protocol. On the other hand, with also handles entry actions, can reduce
code size, and allows code contexts to be managed with full OOP.
Python enhances some built-in tools with context managers, such as files that automatically close themselves and thread locks that automatically lock and unlock, but programmers can code context managers of their own with classes, too. Let’s take a brief look at the statement and its implicit protocol.
Basic Usage
The basic format of the with statement looks like this, with an optional part in square
brackets here:
with expression [as variable]:
    with-block
The expression here is assumed to return an object that supports the context management protocol (more on this protocol in a moment). This object may also return a value that will be assigned to the name variable if the optional as clause is present.

Note that the variable is not necessarily assigned the result of the expression; the result of the expression is the object that supports the context protocol, and the variable may be assigned something else intended to be used inside the statement. The object returned by the expression may then run startup code before the with-block is started, as well as termination code after the block is done, regardless of whether the block raised an exception or not.
Some built-in Python objects have been augmented to support the context management protocol, and so can be used with the with statement. For example, file objects (covered in Chapter 9) have a context manager that automatically closes the file after the with block, regardless of whether an exception is raised, and regardless of if or when the version of Python running the code might otherwise close it automatically:
with open(r'C:\misc\data') as myfile:
    for line in myfile:
        print(line)
        ...more code here...
Here, the call to open returns a simple file object that is assigned to the name myfile.
We can use myfile with the usual file tools—in this case, the file iterator reads line by
line in the for loop.
However, this object also supports the context management protocol used by the
with statement. After this with statement has run, the context management machinery
guarantees that the file object referenced by myfile is automatically closed, even if the
for loop raised an exception while processing the file.
Although file objects may be automatically closed on garbage collection, it’s not always
straightforward to know when that will occur, especially when using alternative Python
implementations. The with statement in this role is an alternative that allows us to be
sure that the close will occur after execution of a specific block of code.
As we saw earlier, we can achieve a similar effect with the more general and explicit
try/finally statement, but it requires three more lines of administrative code in this
case (four instead of just one):
myfile = open(r'C:\misc\data')
try:
    for line in myfile:
        print(line)
        ...more code here...
finally:
    myfile.close()
We won’t cover Python’s multithreading modules in this book (for more on that topic,
see follow-up application-level texts such as Programming Python) but the lock and
condition synchronization objects they define may also be used with the with statement,
because they support the context management protocol—in this case adding both entry
and exit actions around a block:
lock = threading.Lock()              # After: import threading
with lock:
    # critical section of code
    ...access shared resources...
Here, the context management machinery guarantees that the lock is automatically acquired before the block is executed and released once the block is complete, regardless of exception outcomes.
As introduced in Chapter 5, the decimal module also uses context managers to simplify
saving and restoring the current decimal context, which specifies the precision and
rounding characteristics for calculations:
with decimal.localcontext() as ctx:  # After: import decimal
    ctx.prec = 2
    x = decimal.Decimal('1.00') / decimal.Decimal('3.00')
After this statement runs, the current thread’s context manager state is automatically restored to what it was before the statement began. To do the same with a try/finally, we would need to save the context before and restore it manually after the nested block.
The Context Management Protocol
Although some built-in types come with context managers, we can also write new ones
of our own. To implement context managers, classes use special methods that fall into
the operator overloading category to tap into the with statement. The interface expected
of objects used in with statements is somewhat complex, and most programmers only
need to know how to use existing context managers. For tool builders who might want
to write new application-specific context managers, though, let’s take a quick look at
what’s involved.
Here’s how the with statement actually works:
1. The expression is evaluated, resulting in an object known as a context manager that must have __enter__ and __exit__ methods.
2. The context manager’s __enter__ method is called. The value it returns is assigned to the variable in the as clause if present, or simply discarded otherwise.
3. The code in the nested with block is executed.
4. If the with block raises an exception, the __exit__(type, value, traceback) method is called with the exception details. These are the same three values returned by sys.exc_info, described in the Python manuals and later in this part of the book. If this method returns a false value, the exception is reraised; otherwise, the exception is terminated. The exception should normally be reraised so that it is propagated outside the with statement.
5. If the with block does not raise an exception, the __exit__ method is still called, but its type, value, and traceback arguments are all passed in as None.
Let’s look at a quick demo of the protocol in action. The following, file withas.py,
defines a context manager object that traces the entry and exit of the with block in any
with statement it is used for:
class TraceBlock:
    def message(self, arg):
        print('running ' + arg)
    def __enter__(self):
        print('starting with block')
        return self
    def __exit__(self, exc_type, exc_value, exc_tb):
        if exc_type is None:
            print('exited normally\n')
        else:
            print('raise an exception! ' + str(exc_type))
        return False                 # Propagate

if __name__ == '__main__':
    with TraceBlock() as action:
        action.message('test 1')
        print('reached')

    with TraceBlock() as action:
        action.message('test 2')
        raise TypeError
        print('not reached')
Notice that this class’s __exit__ method returns False to propagate the exception;
deleting the return statement would have the same effect, as the default None return
value of functions is False by definition. Also notice that the __enter__ method returns
self as the object to assign to the as variable; in other use cases, this might return a
completely different object instead.
When run, the context manager traces the entry and exit of the with statement block
with its __enter__ and __exit__ methods. Here’s the script in action being run under
either Python 3.X or 2.X (as usual, mileage varies slightly in some 2.X displays, and this
runs on 2.6, 2.7, and 2.5 if enabled):
c:\code> py -3 withas.py
starting with block
running test 1
reached
exited normally

starting with block
running test 2
raise an exception! <class 'TypeError'>
Traceback (most recent call last):
  File "withas.py", line 22, in <module>
    raise TypeError
TypeError
Context managers can also utilize OOP state information and inheritance, but are somewhat advanced devices for tool builders, so we’ll skip additional details here (see Python’s standard manuals for the full story—for example, there’s a new contextlib standard module that provides additional tools for coding context managers). For simpler purposes, the try/finally statement provides sufficient support for termination-time activities without coding classes.
Multiple Context Managers in 3.1, 2.7, and Later
Python 3.1 introduced a with extension that eventually appeared in Python 2.7 as well.
In these and later Pythons, the with statement may also specify multiple (sometimes
referred to as “nested”) context managers with new comma syntax. In the following,
for example, both files’ exit actions are automatically run when the statement block
exits, regardless of exception outcomes:
with open('data') as fin, open('res', 'w') as fout:
    for line in fin:
        if 'some key' in line:
            fout.write(line)
Any number of context manager items may be listed, and multiple items work the same
as nested with statements. In Pythons that support this, the following code:
with A() as a, B() as b:
    ...statements...
is equivalent to the following, which also works in 3.0 and 2.6:
with A() as a:
    with B() as b:
        ...statements...
Python 3.1’s release notes have additional details, but here’s a quick look at the extension in action—to implement a parallel lines scan of two files, the following uses with to open two files at once and zip together their lines, without having to manually close when finished (assuming manual closes are required):
>>> with open('script1.py') as f1, open('script2.py') as f2:
...     for pair in zip(f1, f2):
...         print(pair)
...
('# A first Python script\n', 'import sys\n')
('import sys # Load a library module\n', 'print(sys.path)\n')
('print(sys.platform)\n', 'x = 2\n')
('print(2 ** 32) # Raise 2 to a power\n', 'print(x ** 32)\n')
You might use this coding structure to do a line-by-line comparison of two text files,
for example—replace the print with an if for a simple file comparison operation, and
use enumerate for line numbers:
with open('script1.py') as f1, open('script2.py') as f2:
    for (linenum, (line1, line2)) in enumerate(zip(f1, f2)):
        if line1 != line2:
            print('%s\n%r\n%r' % (linenum, line1, line2))
Still, the preceding technique isn’t all that useful in CPython, because input file objects
don’t require a buffer flush, and file objects are closed automatically when reclaimed
if still open. In CPython, the files would be reclaimed immediately if the parallel scan
were coded the following simpler way:
for pair in zip(open('script1.py'), open('script2.py')):    # Same effect, auto close
    print(pair)
On the other hand, alternative implementations such as PyPy and Jython may require
more direct closure inside loops to avoid taxing system resources, due to differing
garbage collectors. Even more usefully, the following automatically closes the output
file on statement exit, to ensure that any buffered text is transferred to disk immediately:
>>> with open('script2.py') as fin, open('upper.py', 'w') as fout:
...     for line in fin:
...         fout.write(line.upper())
...
>>> print(open('upper.py').read())
IMPORT SYS
PRINT(SYS.PATH)
X = 2
PRINT(X ** 32)
In both cases, we can instead simply open files in individual statements and close after
processing if needed, and in some scripts we probably should—there’s no point in using
statements that catch an exception if it means your program is out of business anyhow!
fin = open('script2.py')
fout = open('upper.py', 'w')
for line in fin:                     # Same effect as preceding code, auto close
    fout.write(line.upper())
However, in cases where programs must continue after exceptions, the with forms also
implicitly catch exceptions, and thereby also avoid a try/finally in cases where close
is required. The equivalent without with is more explicit, but requires noticeably more
code:
fin = open('script2.py')
fout = open('upper.py', 'w')
try:                                 # Same effect but explicit close on error
    for line in fin:
        fout.write(line.upper())
finally:
    fin.close()
    fout.close()
On the other hand, the try/finally is a single tool that applies to all finalization cases, whereas the with adds a second tool that can be more concise, but applies only to certain object types, and doubles the required knowledge base of programmers. As usual, you’ll have to weigh the tradeoffs for yourself.
Chapter Summary
In this chapter, we took a more detailed look at exception processing by exploring the
statements related to exceptions in Python: try to catch them, raise to trigger them,
assert to raise them conditionally, and with to wrap code blocks in context managers
that specify entry and exit actions.
Up to this point, exceptions probably seem like a fairly lightweight tool, and in fact,
they are; the only substantially complex thing about them is how they are identified.
The next chapter continues our exploration by describing how to implement exception
objects of your own; as you’ll see, classes allow you to code new exceptions specific to
your programs. Before we move ahead, though, let’s work through the following short
quiz on the basics covered here.
Test Your Knowledge: Quiz
1. What is the try statement for?
2. What are the two common variations of the try statement?
3. What is the raise statement for?
4. What is the assert statement designed to do, and what other statement is it like?
5. What is the with/as statement designed to do, and what other statement is it like?
Test Your Knowledge: Answers
1. The try statement catches and recovers from exceptions—it specifies a block of
code to run, and one or more handlers for exceptions that may be raised during
the block’s execution.
2. The two common variations on the try statement are try/except/else (for catching
exceptions) and try/finally (for specifying cleanup actions that must occur
whether an exception is raised or not). Through Python 2.4, these were separate
statements that could be combined by syntactic nesting; in 2.5 and later, except
and finally blocks may be mixed in the same statement, so the two statement
forms are merged. In the merged form, the finally is still run on the way out of
the try, regardless of what exceptions may have been raised or handled. In fact,
the merged form is equivalent to nesting a try/except/else in a try/finally, and
the two still have logically distinct roles.
3. The raise statement raises (triggers) an exception. Python raises built-in exceptions on errors internally, but your scripts can trigger built-in or user-defined exceptions with raise, too.
4. The assert statement raises an AssertionError exception if a condition is false. It works like a conditional raise statement wrapped up in an if statement, and can be disabled with a -O switch.
5. The with/as statement is designed to automate startup and termination activities that must occur around a block of code. It is roughly like a try/finally statement in that its exit actions run whether an exception occurred or not, but it allows a richer object-based protocol for specifying entry and exit actions, and may reduce code size. Still, it’s not quite as general, as it applies only to objects that support its protocol; try handles many more use cases.
CHAPTER 35
Exception Objects
So far, I’ve been deliberately vague about what an exception actually is. As suggested
in the prior chapter, as of Python 2.6 and 3.0 both built-in and user-defined exceptions
are identified by class instance objects. This is what is raised and propagated along by
exception processing, and the source of the class matched against exceptions named
in try statements.
Although this means you must use object-oriented programming to define new exceptions in your programs—and introduces a knowledge dependency that deferred full exception coverage to this part of the book—basing exceptions on classes and OOP offers a number of benefits. Among them, class-based exceptions:
• Can be organized into categories. Exceptions coded as classes support future changes by providing categories—adding new exceptions in the future won’t generally require changes in try statements.
• Have state information and behavior. Exception classes provide a natural place for us to store context information and tools for use in the try handler—instances have access to both attached state information and callable methods.
• Support inheritance. Class-based exceptions can participate in inheritance hierarchies to obtain and customize common behavior—inherited display methods, for example, can provide a common look and feel for error messages.
Because of these advantages, class-based exceptions support program evolution and
larger systems well. As we’ll find, all built-in exceptions are identified by classes and
are organized into an inheritance tree, for the reasons just listed. You can do the same
with user-defined exceptions of your own.
In fact, in Python 3.X the built-in exceptions we’ll study here turn out to be integral to
new exceptions you define. Because 3.X requires user-defined exceptions to inherit
from built-in exception superclasses that provide useful defaults for printing and state
retention, the task of coding user-defined exceptions also involves understanding the
roles of these built-ins.
Version skew note: Python 2.6, 3.0, and later require exceptions to be defined by classes. In addition, 3.X requires exception classes to be derived from the BaseException built-in exception superclass, either directly or indirectly. As we’ll see, most programs inherit from this class’s Exception subclass, to support catchall handlers for normal exception types—naming it in a handler will thus catch everything most programs should. Python 2.X allows standalone classic classes to serve as exceptions, too, but it requires new-style classes to be derived from built-in exception classes, the same as 3.X.
Exceptions: Back to the Future

Once upon a time (well, prior to Python 2.6 and 3.0), it was possible to define exceptions in two different ways. This complicated try statements, raise statements, and Python in general. Today, there is only one way to do it. This is a good thing: it removes from the language substantial cruft accumulated for the sake of backward compatibility. Because the old way helps explain why exceptions are as they are today, though, and because it’s not really possible to completely erase the history of something that has been used by on the order of a million people over the course of nearly two decades, let’s begin our exploration of the present with a brief look at the past.
String Exceptions Are Right Out!

Prior to Python 2.6 and 3.0, it was possible to define exceptions with both class instances and string objects. String-based exceptions began issuing deprecation warnings in 2.5 and were removed in 2.6 and 3.0, so today you should use class-based exceptions, as shown in this book. If you work with legacy code, though, you might still come across string exceptions. They might also appear in books, tutorials, and web resources written a few years ago (which qualifies as an eternity in Python years!).
String exceptions were straightforward to use—any string would do, and they matched
by object identity, not value (that is, using is, not ==):
C:\code> C:\Python25\python
>>> myexc = "My exception string"    # Were we ever this young?...
>>> try:
...     raise myexc
... except myexc:
...     print('caught')
...
caught
This form of exception was removed because it was not as good as classes for larger programs and code maintenance. In modern Pythons, string exceptions trigger exceptions instead:
C:\code> py −3
>>> raise 'spam'
TypeError: exceptions must derive from BaseException
C:\code> py −2
>>> raise 'spam'
TypeError: exceptions must be old-style classes or derived from BaseException, ...etc
Although you can’t use string exceptions today, they actually provide a natural vehicle
for introducing the class-based exceptions model.
Class-Based Exceptions
Strings were a simple way to define exceptions. As described earlier, however, classes
have some added advantages that merit a quick look. Most prominently, they allow us
to identify exception categories that are more flexible to use and maintain than simple
strings. Moreover, classes naturally allow for attached exception details and support
inheritance. Because they are seen by many as the better approach, they are now re-
quired.
Coding details aside, the chief difference between string and class exceptions has to do with the way that exceptions raised are matched against except clauses in try statements:
• String exceptions were matched by simple object identity: the raised exception was matched to except clauses by Python’s is test.
• Class exceptions are matched by superclass relationships: the raised exception matches an except clause if that except clause names the exception instance’s class or any superclass of it.
That is, when a try statement’s except clause lists a superclass, it catches instances of that superclass, as well as instances of all its subclasses lower in the class tree. The net effect is that class exceptions naturally support the construction of exception hierarchies: superclasses become category names, and subclasses become specific kinds of exceptions within a category. By naming a general exception superclass, an except clause can catch an entire category of exceptions—any more specific subclass will match.
String exceptions had no such concept: because they were matched by simple object
identity, there was no direct way to organize exceptions into more flexible categories
or groups. The net result was that exception handlers were coupled with exception sets
in a way that made changes difficult.
In addition to this category idea, class-based exceptions better support exception state
information (attached to instances) and allow exceptions to participate in inheritance
hierarchies (to obtain common behaviors). Because they offer all the benefits of classes
and OOP in general, they provide a more powerful alternative to the now-defunct
string-based exceptions model in exchange for a small amount of additional code.
Exceptions: Back to the Future | 1125
www.it-ebooks.info
Coding Exception Classes
Let’s look at an example to see how class exceptions translate to code. In the following
file, classexc.py, we define a superclass called General and two subclasses called
Specific1 and Specific2. This example illustrates the notion of exception categories—
General is a category name, and its two subclasses are specific types of exceptions within
the category. Handlers that catch General will also catch any subclasses of it, including
Specific1 and Specific2:
class General(Exception): pass
class Specific1(General): pass
class Specific2(General): pass

def raiser0():
    X = General()           # Raise superclass instance
    raise X

def raiser1():
    X = Specific1()         # Raise subclass instance
    raise X

def raiser2():
    X = Specific2()         # Raise different subclass instance
    raise X

for func in (raiser0, raiser1, raiser2):
    try:
        func()
    except General:         # Match General or any subclass of it
        import sys
        print('caught: %s' % sys.exc_info()[0])
C:\code> python classexc.py
caught: <class '__main__.General'>
caught: <class '__main__.Specific1'>
caught: <class '__main__.Specific2'>
This code is mostly straightforward, but here are a few points to notice:
Exception superclass
Classes used to build exception category trees have very few requirements—in fact,
in this example they are mostly empty, with bodies that do nothing but pass.
Notice, though, how the top-level class here inherits from the built-in Exception
class. This is required in Python 3.X; Python 2.X allows standalone classic classes
to serve as exceptions too, but it requires new-style classes to be derived from
built-in exception classes just as in 3.X. Although we don’t employ it here, because
Exception provides some useful behavior we’ll meet later, it’s a good idea to inherit
from it in either Python.
Raising instances
In this code, we call classes to make instances for the raise statements. In the class
exception model, we always raise and catch a class instance object. If we list a class
name without parentheses in a raise, Python calls the class with no constructor
argument to make an instance for us. Exception instances can be created before
the raise, as done here, or within the raise statement itself.
Catching categories
This code includes functions that raise instances of all three of our classes as ex-
ceptions, as well as a top-level try that calls the functions and catches General
exceptions. The same try also catches the two specific exceptions, because they
are subclasses of General—members of its category.
Exception details
The exception handler here uses the sys.exc_info call—as we’ll see in more detail
in the next chapter, it’s how we can grab hold of the most recently raised exception
in a generic fashion. Briefly, the first item in its result is the class of the exception
raised, and the second is the actual instance raised. In a general except clause like
the one here that catches all classes in a category, sys.exc_info is one way to de-
termine exactly what’s occurred. In this particular case, it’s equivalent to fetching
the instance’s __class__ attribute. As we’ll see in the next chapter, the
sys.exc_info scheme is also commonly used with empty except clauses that catch
everything.
The last point merits further explanation. When an exception is caught, we can be sure
that the instance raised is an instance of the class listed in the except, or one of its more
specific subclasses. Because of this, the __class__ attribute of the instance also gives
the exception type. The following variant in classexc2.py, for example, works the same
as the prior example—it uses the as extension in its except clause to assign a variable
to the instance actually raised:
class General(Exception): pass
class Specific1(General): pass
class Specific2(General): pass

def raiser0(): raise General()
def raiser1(): raise Specific1()
def raiser2(): raise Specific2()

for func in (raiser0, raiser1, raiser2):
    try:
        func()
    except General as X:                    # X is the raised instance
        print('caught: %s' % X.__class__)   # Same as sys.exc_info()[0]
Because __class__ can be used like this to determine the specific type of exception
raised, sys.exc_info is more useful for empty except clauses that do not otherwise have
a way to access the instance or its class. Furthermore, more realistic programs usually
should not have to care about which specific exception was raised at all—by calling
methods of the exception class instance generically, we automatically dispatch to be-
havior tailored for the exception raised.
More on this and sys.exc_info in the next chapter; also see Chapter 29 and Part VI at
large if you’ve forgotten what __class__ means in an instance, and the prior chapter
for a review of the as used here.
Why Exception Hierarchies?
Because there are only three possible exceptions in the prior section’s example, it
doesn’t really do justice to the utility of class exceptions. In fact, we could achieve the
same effects by coding a list of exception names in parentheses within the except clause:
try:
    func()
except (General, Specific1, Specific2):     # Catch any of these
    ...
This approach worked for the defunct string exception model too. For large or deep
exception hierarchies, however, it may be easier to catch categories using class-based
categories than to list every member of a category in a single except clause. Perhaps
more importantly, you can extend exception hierarchies as software needs evolve by
adding new subclasses without breaking existing code.
Suppose, for example, you code a numeric programming library in Python, to be used
by a large number of people. While you are writing your library, you identify two things
that can go wrong with numbers in your code—division by zero, and numeric overflow.
You document these as the two standalone exceptions that your library may raise:
# mathlib.py
class Divzero(Exception): pass
class Oflow(Exception): pass

def func():
    ...
    raise Divzero()

...and so on...
Now, when people use your library, they typically wrap calls to your functions or classes
in try statements that catch your two exceptions; after all, if they do not catch your
exceptions, exceptions from your library will kill their code:
# client.py
import mathlib

try:
    mathlib.func(...)
except (mathlib.Divzero, mathlib.Oflow):
    ...handle and recover...
This works fine, and lots of people start using your library. Six months down the road,
though, you revise it (as programmers are prone to do!). Along the way, you identify a
new thing that can go wrong—underflow, perhaps—and add that as a new exception:
# mathlib.py
class Divzero(Exception): pass
class Oflow(Exception): pass
class Uflow(Exception): pass
Unfortunately, when you re-release your code, you create a maintenance problem for
your users. If they’ve listed your exceptions explicitly, they now have to go back and
change every place they call your library to include the newly added exception name:
# client.py
try:
    mathlib.func(...)
except (mathlib.Divzero, mathlib.Oflow, mathlib.Uflow):
    ...handle and recover...
This may not be the end of the world. If your library is used only in-house, you can
make the changes yourself. You might also ship a Python script that tries to fix such
code automatically (it would probably be only a few dozen lines, and it would guess
right at least some of the time). If many people have to change all their try statements
each time you alter your exception set, though, this is not exactly the most polite of
upgrade policies.
Your users might try to avoid this pitfall by coding empty except clauses to catch all
possible exceptions:
# client.py
try:
    mathlib.func(...)
except:                      # Catch everything here (or catch Exception super)
    ...handle and recover...
But this workaround might catch more than they bargained for—things like running
out of memory, keyboard interrupts (Ctrl-C), system exits, and even typos in their own
try block’s code will all trigger exceptions, and such things should pass, not be caught
and erroneously classified as library errors. Catching the Exception superclass improves
on this, but still intercepts—and thus may mask—program errors.
And really, in this scenario users want to catch and recover from only the specific ex-
ceptions the library is defined and documented to raise. If any other exception occurs
during a library call, it’s likely a genuine bug in the library (and probably time to contact
the vendor!). As a rule of thumb, it’s usually better to be specific than general in ex-
ception handlers—an idea we’ll revisit as a “gotcha” in the next chapter.1
So what to do, then? Class exception hierarchies fix this dilemma completely. Rather
than defining your library’s exceptions as a set of autonomous classes, arrange them
into a class tree with a common superclass to encompass the entire category:
# mathlib.py
class NumErr(Exception): pass
class Divzero(NumErr): pass
class Oflow(NumErr): pass

def func():
    ...
    raise Divzero()

...and so on...
This way, users of your library simply need to list the common superclass (i.e., category)
to catch all of your library’s exceptions, both now and in the future:
# client.py
import mathlib

try:
    mathlib.func(...)
except mathlib.NumErr:
    ...report and recover...
When you go back and hack (update) your code again, you can add new exceptions as
new subclasses of the common superclass:
# mathlib.py
...
class Uflow(NumErr): pass
The end result is that user code that catches your library’s exceptions will keep working,
unchanged. In fact, you are free to add, delete, and change exceptions arbitrarily in the
future—as long as clients name the superclass, and that superclass remains intact, they
are insulated from changes in your exceptions set. In other words, class exceptions
provide a better answer to maintenance issues than strings could.
Class-based exception hierarchies also support state retention and inheritance in ways
that make them ideal in larger programs. To understand these roles, though, we first
1. As a clever student of mine suggested, the library module could also provide a tuple object that contains
all the exceptions the library can possibly raise—the client could then import the tuple and name it in an
except clause to catch all the library’s exceptions (recall that including a tuple in an except means catch
any of its exceptions). When new exceptions are added later, the library can just expand the exported
tuple. This would work, but you’d still need to keep the tuple up-to-date with raised exceptions inside
the library module. Also, class hierarchies offer more benefits than just categories—they also support
inherited state and methods and a customization model that individual exceptions do not.
need to see how user-defined exception classes relate to the built-in exceptions from
which they inherit.
Built-in Exception Classes
I didn’t really pull the prior section’s examples out of thin air. All built-in exceptions
that Python itself may raise are predefined class objects. Moreover, they are organized
into a shallow hierarchy with general superclass categories and specific subclass types,
much like the prior section’s exceptions class tree.
In Python 3.X, all the familiar exceptions you’ve seen (e.g., SyntaxError) are really just
predefined classes, available as built-in names in the module named builtins; in Python
2.X, they instead live in __builtin__ and are also attributes of the standard library
module exceptions. In addition, Python organizes the built-in exceptions into a hier-
archy, to support a variety of catching modes. For example:
BaseException: topmost root, printing and constructor defaults
The top-level root superclass of exceptions. This class is not supposed to be directly
inherited by user-defined classes (use Exception instead). It provides default print-
ing and state retention behavior inherited by subclasses. If the str built-in is called
on an instance of this class (e.g., by print), the class returns the display strings of
the constructor arguments passed when the instance was created (or an empty
string if there were no arguments). In addition, unless subclasses replace this class’s
constructor, all of the arguments passed to this class at instance construction time
are stored in its args attribute as a tuple.
Exception: root of user-defined exceptions
The top-level root superclass of application-related exceptions. This is an immediate
subclass of BaseException and is a superclass to every other built-in exception,
except the system exit event classes (SystemExit, KeyboardInterrupt, and
GeneratorExit). Nearly all user-defined classes should inherit from this class, not
BaseException. When this convention is followed, naming Exception in a try
statement’s handler ensures that your program will catch everything but system
exit events, which should normally be allowed to pass. In effect, Exception becomes
a catchall in try statements and is more accurate than an empty except.
ArithmeticError: root of numeric errors
A subclass of Exception, and the superclass of all numeric errors. Its subclasses
identify specific numeric errors: OverflowError, ZeroDivisionError, and
FloatingPointError.
LookupError: root of indexing errors
A subclass of Exception, and the superclass category for indexing errors for both
sequences and mappings—IndexError and KeyError—as well as some Unicode
lookup errors.
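You can check these category relationships for yourself with the issubclass built-in.
The following is a quick interactive sketch, with results as seen on recent CPythons
(the exact shape of the tree can vary slightly by version):

>>> issubclass(ZeroDivisionError, ArithmeticError)     # Specific type in category
True
>>> issubclass(IndexError, LookupError)                # Indexing error category
True
>>> issubclass(Exception, BaseException)               # Exception sits below the root
True
>>> issubclass(KeyboardInterrupt, Exception)           # Exit events live outside it
False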
And so on—because the built-in exception set is prone to frequent changes, this book
doesn’t document it exhaustively. You can read further about this structure in reference
texts such as Python Pocket Reference or the Python library manual. In fact, the excep-
tions class tree differs slightly between Python 3.X and 2.X in ways we’ll omit here,
because they are not relevant to examples.
You can also see the built-in exceptions class tree in the help text of the exceptions
module in Python 2.X only (see Chapter 4 and Chapter 15 for help on help):
>>> import exceptions
>>> help(exceptions)
...lots of text omitted...
This module is removed in 3.X, where you’ll find up-to-date help in the other resources
mentioned.
Built-in Exception Categories
The built-in class tree allows you to choose how specific or general your handlers will
be. For example, because the built-in exception ArithmeticError is a superclass for
more specific exceptions such as OverflowError and ZeroDivisionError:
By listing ArithmeticError in a try, you will catch any kind of numeric error raised.
By listing ZeroDivisionError, you will intercept just that specific type of error, and
no others.
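For instance, a minimal interactive sketch of the first case: the general category name
intercepts its more specific subclass automatically:

>>> try:
...     1 / 0
... except ArithmeticError:          # Catches ZeroDivisionError, a subclass
...     print('numeric error')
...
numeric error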
Similarly, because Exception is the superclass of all application-level exceptions in
Python 3.X, you can generally use it as a catchall—the effect is much like an empty
except, but it allows system exit exceptions to pass and propagate as they usually
should:
try:
    action()
except Exception:            # Exits not caught here
    ...handle all application exceptions...
else:
    ...handle no-exception case...
This doesn’t quite work universally in Python 2.X, however, because standalone user-
defined exceptions coded as classic classes are not required to be subclasses of the
Exception root class. This technique is more reliable in Python 3.X, since it requires all
classes to derive from built-in exceptions. Even in Python 3.X, though, this scheme
suffers most of the same potential pitfalls as the empty except, as described in the prior
chapter—it might intercept exceptions intended for elsewhere, and it might mask gen-
uine programming errors. Since this is such a common issue, we’ll revisit it as a “gotcha”
in the next chapter.
Whether or not you will leverage the categories in the built-in class tree, it serves as a
good example; by using similar techniques for class exceptions in your own code, you
can provide exception sets that are flexible and easily modified.
Python 3.3 reworks the built-in IO and OS exception hierarchies. It adds
new specific exception classes corresponding to common file and system
error numbers, and groups these and others related to operating system
calls under the OSError category superclass. Former exception names
are retained for backward compatibility.
Prior to this, programs had to inspect the data attached to the exception
instance to see which specific error occurred, possibly reraising others to
be propagated (the errno module has names preset to the error codes
for convenience, and the error number is available in both the generic
tuple as V.args[0] and the attribute V.errno):
c:\temp> py -3.2
>>> try:
...     f = open('nonesuch.txt')
... except IOError as V:
...     if V.errno == 2:             # Or errno.N, V.args[0]
...         print('No such file')
...     else:
...         raise                    # Propagate others
...
No such file
This code still works in 3.3, but with the new classes, programs in 3.3
and later can be more specific about the exceptions they mean to pro-
cess, and ignore others:
c:\temp> py -3.3
>>> try:
...     f = open('nonesuch.txt')
... except FileNotFoundError:
...     print('No such file')
...
No such file
For full details on this extension and its classes, see the other resources
listed earlier.
Default Printing and State
Built-in exceptions also provide default print displays and state retention, which is often
as much logic as user-defined classes require. Unless you redefine the constructors your
classes inherit from them, any constructor arguments you pass to these classes are
automatically saved in the instance’s args tuple attribute, and are automatically dis-
played when the instance is printed. An empty tuple and display string are used if no
constructor arguments are passed, and a single argument displays as itself (not as a
tuple).
This explains why arguments passed to built-in exception classes show up in error
messages—any constructor arguments are attached to the instance and displayed when
the instance is printed:
>>> raise IndexError                    # Same as IndexError(): no arguments
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError

>>> raise IndexError('spam')            # Constructor argument attached, printed
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: spam

>>> I = IndexError('spam')              # Available in object attribute
>>> I.args
('spam',)
>>> print(I)                            # Displays args when printed manually
spam
The same holds true for user-defined exceptions in Python 3.X (and for new-style classes
in 2.X), because they inherit the constructor and display methods present in their built-
in superclasses:
>>> class E(Exception): pass
...
>>> raise E
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
__main__.E

>>> raise E('spam')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
__main__.E: spam

>>> I = E('spam')
>>> I.args
('spam',)
>>> print(I)
spam
When intercepted in a try statement, the exception instance object gives access to both
the original constructor arguments and the display method:
>>> try:
...     raise E('spam')
... except E as X:
...     print(X)                # Displays and saves constructor arguments
...     print(X.args)
...     print(repr(X))
...
spam
('spam',)
E('spam',)
>>> try:                        # Multiple arguments save/display a tuple
...     raise E('spam', 'eggs', 'ham')
... except E as X:
...     print('%s %s' % (X, X.args))
...
('spam', 'eggs', 'ham') ('spam', 'eggs', 'ham')
Note that exception instance objects are not strings themselves, but use the __str__
operator overloading protocol we studied in Chapter 30 to provide display strings when
printed; to concatenate with real strings, perform manual conversions: str(X) + 'astr',
'%s' % X, and the like.
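A brief sketch of why the manual conversion matters: concatenating the raw instance
fails, but converting with str first works:

>>> X = IndexError('spam')
>>> X + '!'                          # Not really a string
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'IndexError' and 'str'
>>> str(X) + '!'                     # Convert manually first
'spam!'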
Although this automatic state and display support is useful by itself, for more specific
display and state retention needs you can always redefine inherited methods such as
__str__ and __init__ in Exception subclasses—as the next section shows.
Custom Print Displays
As we saw in the preceding section, by default, instances of class-based exceptions
display whatever you passed to the class constructor when they are caught and printed:
>>> class MyBad(Exception): pass
...
>>> try:
...     raise MyBad('Sorry--my mistake!')
... except MyBad as X:
...     print(X)
...
Sorry--my mistake!
This inherited default display model is also used if the exception is displayed as part of
an error message when the exception is not caught:
>>> raise MyBad('Sorry--my mistake!')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
__main__.MyBad: Sorry--my mistake!
For many roles, this is sufficient. To provide a more custom display, though, you can
define one of two string-representation overloading methods in your class (__repr__ or
__str__) to return the string you want to display for your exception. The string the
method returns will be displayed if the exception either is caught and printed or reaches
the default handler:
>>> class MyBad(Exception):
...     def __str__(self):
...         return 'Always look on the bright side of life...'
...
>>> try:
...     raise MyBad()
... except MyBad as X:
...     print(X)
...
Always look on the bright side of life...
>>> raise MyBad()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
__main__.MyBad: Always look on the bright side of life...
Whatever your method returns is included in error messages for uncaught exceptions
and used when exceptions are printed explicitly. The method returns a hardcoded
string here to illustrate, but it can also perform arbitrary text processing, possibly using
state information attached to the instance object. The next section looks at state in-
formation options.
A subtle point here: you generally must redefine __str__ for exception
display purposes, because the built-in exception superclasses already
have a __str__ method, and __str__ is preferred to __repr__ in some
contexts—including error message displays. If you define a __repr__,
printing will happily call the built-in superclass’s __str__ instead!
>>> class E(Exception):
        def __repr__(self): return 'Not called!'

>>> raise E('spam')
...
__main__.E: spam

>>> class E(Exception):
        def __str__(self): return 'Called!'

>>> raise E('spam')
...
__main__.E: Called!
See Chapter 30 for more details on these special operator overloading
methods.
Custom Data and Behavior
Besides supporting flexible hierarchies, exception classes also provide storage for extra
state information as instance attributes. As we saw earlier, built-in exception super-
classes provide a default constructor that automatically saves constructor arguments
in an instance tuple attribute named args. Although the default constructor is adequate
for many cases, for more custom needs we can provide a constructor of our own. In
addition, classes may define methods for use in handlers that provide precoded excep-
tion processing logic.
Providing Exception Details
When an exception is raised, it may cross arbitrary file boundaries—the raise state-
ment that triggers an exception and the try statement that catches it may be in com-
pletely different module files. It is not generally feasible to store extra details in global
variables because the try statement might not know which file the globals reside in.
Passing extra state information along in the exception itself allows the try statement
to access it more reliably.
With classes, this is nearly automatic. As we’ve seen, when an exception is raised,
Python passes the class instance object along with the exception. Code in try statements
can access the raised instance by listing an extra variable after the as keyword in an
except handler. This provides a natural hook for supplying data and behavior to the
handler.
For example, a program that parses data files might signal a formatting error by raising
an exception instance that is filled out with extra details about the error:
>>> class FormatError(Exception):
        def __init__(self, line, file):
            self.line = line
            self.file = file

>>> def parser():
        raise FormatError(42, file='spam.txt')       # When error found

>>> try:
...     parser()
... except FormatError as X:
...     print('Error at: %s %s' % (X.file, X.line))
...
Error at: spam.txt 42
In the except clause here, the variable X is assigned a reference to the instance that was
generated when the exception was raised. This gives access to the attributes attached
to the instance by the custom constructor. Although we could rely on the default state
retention of built-in superclasses, it’s less relevant to our application (and doesn’t sup-
port the keyword arguments used in the prior example):
>>> class FormatError(Exception): pass               # Inherited constructor

>>> def parser():
        raise FormatError(42, 'spam.txt')            # No keywords allowed!

>>> try:
...     parser()
... except FormatError as X:
...     print('Error at:', X.args[0], X.args[1])     # Not specific to this app
...
Error at: 42 spam.txt
Providing Exception Methods
Besides enabling application-specific state information, custom constructors also better
support extra behavior for exception objects. That is, the exception class can also define
methods to be called in the handler. The following code in excparse.py, for example,
adds a method that uses exception state information to log errors to a file automatically:
from __future__ import print_function        # 2.X compatibility

class FormatError(Exception):
    logfile = 'formaterror.txt'
    def __init__(self, line, file):
        self.line = line
        self.file = file
    def logerror(self):
        log = open(self.logfile, 'a')
        print('Error at:', self.file, self.line, file=log)

def parser():
    raise FormatError(40, 'spam.txt')

if __name__ == '__main__':
    try:
        parser()
    except FormatError as exc:
        exc.logerror()
When run, this script writes its error message to a file in response to method calls in
the exception handler:
c:\code> del formaterror.txt
c:\code> py -3 excparse.py
c:\code> py -2 excparse.py
c:\code> type formaterror.txt
Error at: spam.txt 40
Error at: spam.txt 40
In such a class, methods (like logerror) may also be inherited from superclasses, and
instance attributes (like line and file) provide a place to save state information that
provides extra context for use in later method calls. Moreover, exception classes are
free to customize and extend inherited behavior:
class CustomFormatError(FormatError):
    def logerror(self):
        ...something unique here...

raise CustomFormatError(...)
In other words, because they are defined with classes, all the benefits of OOP that we
studied in Part VI are available for use with exceptions in Python.
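To make that concrete, here is a minimal runnable variant of the preceding sketch;
the print-based logger stands in for the file logger of excparse.py and is illustrative
only:

class FormatError(Exception):
    def __init__(self, line, file):
        self.line, self.file = line, file
    def logerror(self):
        print('Error at:', self.file, self.line)

class CustomFormatError(FormatError):
    def logerror(self):
        print('Custom preprocessing...')         # Something unique here
        FormatError.logerror(self)               # Extend, don't just replace

try:
    raise CustomFormatError(40, 'spam.txt')
except FormatError as exc:                       # Superclass catches subclass
    exc.logerror()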
Two final notes here: first, the raised instance object assigned to exc in this code is also
available generically as the second item in the result tuple of the sys.exc_info() call—
a tool that returns information about the most recently raised exception. This interface
must be used if you do not list an exception name in an except clause but still need
access to the exception that occurred, or to any of its attached state information or
methods. Second, although our class’s logerror method appends a custom message to
a logfile, it could also generate Python’s standard error message with stack trace using
tools in the traceback standard library module, which uses traceback objects.
To learn more about sys.exc_info and tracebacks, though, we need to move ahead to
the next chapter.
Chapter Summary
In this chapter, we explored coding user-defined exceptions. As we learned, exceptions
are implemented as class instance objects as of Python 2.6 and 3.0 (a string-based
alternative was available in earlier releases but has since been removed). Exception
classes support the concept of exception hierarchies that ease maintenance, allow data
and behavior to be attached to exceptions as instance attributes and methods, and
allow exceptions to inherit data and behavior from superclasses.
We saw that in a try statement, catching a superclass catches that class as well as all
subclasses below it in the class tree—superclasses become exception category names,
and subclasses become more specific exception types within those categories. We also
saw that the built-in exception superclasses we must inherit from provide usable de-
faults for printing and state retention, which we can override if desired.
The next chapter wraps up this part of the book by exploring some common use cases
for exceptions and surveying tools commonly used by Python programmers. Before we
get there, though, here’s this chapter’s quiz.
Test Your Knowledge: Quiz
1. What are the two new constraints on user-defined exceptions in Python 3.X?
2. How are raised class-based exceptions matched to handlers?
3. Name two ways that you can attach context information to exception objects.
4. Name two ways that you can specify the error message text for exception objects.
5. Why should you not use string-based exceptions anymore today?
Test Your Knowledge: Answers
1. In 3.X, exceptions must be defined by classes (that is, a class instance object is
raised and caught). In addition, exception classes must be derived from the built-
in class BaseException; most programs inherit from its Exception subclass, to sup-
port catchall handlers for normal kinds of exceptions.
2. Class-based exceptions match by superclass relationships: naming a superclass in
an exception handler will catch instances of that class, as well as instances of any
of its subclasses lower in the class tree. Because of this, you can think of superclasses
as general exception categories and subclasses as more specific types of exceptions
within those categories.
3. You can attach context information to class-based exceptions by filling out instance
attributes in the instance object raised, usually in a custom class constructor. For
simpler needs, built-in exception superclasses provide a constructor that stores its
arguments on the instance automatically (as a tuple in the attribute args). In ex-
ception handlers, you list a variable to be assigned to the raised instance, then go
through this name to access attached state information and call any methods de-
fined in the class.
4. The error message text in class-based exceptions can be specified with a custom
__str__ operator overloading method. For simpler needs, built-in exception su-
perclasses automatically display anything you pass to the class constructor. Oper-
ations like print and str automatically fetch the display string of an exception
object when it is printed either explicitly or as part of an error message.
5. Because Guido said so—they have been removed as of both Python 2.6 and 3.0.
There are arguably good reasons for this: string-based exceptions did not support
categories, state information, or behavior inheritance in the way class-based ex-
ceptions do. In practice, this made string-based exceptions easier to use at first
when programs were small, but more complex to use as programs grew larger.
The downsides of requiring exceptions to be classes are that it breaks existing code and
creates a forward knowledge dependency—beginners must first learn classes and
OOP before they can code new exceptions, or even truly understand exceptions at
all. In fact, this is why this relatively straightforward topic was largely postponed
until this point in the book. For better or worse, such dependencies are not un-
common in Python today (see the preface and conclusion for more on such things).
CHAPTER 36
Designing with Exceptions
This chapter rounds out this part of the book with a collection of exception design
topics and common use case examples, followed by this part’s gotchas and exercises.
Because this chapter also closes out the fundamentals portion of the book at large, it
includes a brief overview of development tools as well to help you as you make the
migration from Python beginner to Python application developer.
Nesting Exception Handlers
Most of our examples so far have used only a single try to catch exceptions, but what
happens if one try is physically nested inside another? For that matter, what does it
mean if a try calls a function that runs another try? Technically, try statements can
nest, in terms of both syntax and the runtime control flow through your code. I’ve
mentioned this briefly, but let’s clarify the idea here.
Both of these cases can be understood if you realize that Python stacks try statements
at runtime. When an exception is raised, Python returns to the most recently entered
try statement with a matching except clause. Because each try statement leaves a
marker, Python can jump back to earlier trys by inspecting the stacked markers. This
nesting of active handlers is what we mean when we talk about propagating exceptions
up to “higher” handlers—such handlers are simply try statements entered earlier in
the program’s execution flow.
Figure 36-1 illustrates what occurs when try statements with except clauses nest at
runtime. The amount of code that goes into a try block can be substantial, and it may
contain function calls that invoke other code watching for the same exceptions. When
an exception is eventually raised, Python jumps back to the most recently entered
try statement that names that exception, runs that statement’s except clause, and then
resumes execution after that try.
Once the exception is caught, its life is over—control does not jump back to all match-
ing trys that name the exception; only the first (i.e., most recent) one is given the
opportunity to handle it. In Figure 36-1, for instance, the raise statement in the
function func2 sends control back to the handler in func1, and then the program continues
within func1.
By contrast, when try statements that contain only finally clauses are nested, each
finally block is run in turn when an exception occurs—Python continues propagating
the exception up to other trys, and eventually perhaps to the top-level default handler
(the standard error message printer). As Figure 36-2 illustrates, the finally clauses do
not kill the exception—they just specify code to be run on the way out of each try
during the exception propagation process. If there are many try/finally clauses active
when an exception occurs, they will all be run, unless a try/except catches the exception
somewhere along the way.
In other words, where the program goes when an exception is raised depends entirely
upon where it has been—it’s a function of the runtime flow of control through the script,
not just its syntax. The propagation of an exception essentially proceeds backward
through time to try statements that have been entered but not yet exited. This propa-
gation stops as soon as control is unwound to a matching except clause, but not as it
passes through finally clauses on the way.
Figure 36-1. Nested try/except statements: when an exception is raised (by you or by Python), control
jumps back to the most recently entered try statement with a matching except clause, and the program
resumes after that try statement. except clauses intercept and stop the exception—they are where you
process and recover from exceptions.
Figure 36-2. Nested try/finally statements: when an exception is raised here, control returns to the
most recently entered try to run its finally statement, but then the exception keeps propagating to all
finallys in all active try statements and eventually reaches the default top-level handler, where an
error message is printed. finally clauses intercept (but do not stop) an exception—they are for actions
to be performed “on the way out.”
Example: Control-Flow Nesting
Let’s turn to an example to make this nesting concept more concrete. The following
module file, nestexc.py, defines two functions. action2 is coded to trigger an exception
(you can’t add numbers and sequences), and action1 wraps a call to action2 in a try
handler, to catch the exception:
def action2():
    print(1 + [])               # Generate TypeError

def action1():
    try:
        action2()
    except TypeError:           # Most recent matching try
        print('inner try')

try:
    action1()
except TypeError:               # Here, only if action1 re-raises
    print('outer try')

% python nestexc.py
inner try
Notice, though, that the top-level module code at the bottom of the file wraps a call to
action1 in a try handler, too. When action2 triggers the TypeError exception, there will
be two active try statements—the one in action1, and the one at the top level of the
module file. Python picks and runs just the most recent try with a matching except,
which in this case is the try inside action1.
Again, the place where an exception winds up jumping to depends on the control flow
through the program at runtime. Because of this, to know where you will go, you need
to know where you’ve been. In this case, where exceptions are handled is more a func-
tion of control flow than of statement syntax. However, we can also nest exception
handlers syntactically—an equivalent case we turn to next.
Example: Syntactic Nesting
As I mentioned when we looked at the new unified try/except/finally statement in
Chapter 34, it is possible to nest try statements syntactically by their position in your
source code:
try:
    try:
        action2()
    except TypeError:           # Most recent matching try
        print('inner try')
except TypeError:               # Here, only if nested handler re-raises
    print('outer try')
Really, though, this code just sets up the same handler-nesting structure as (and behaves
identically to) the prior example. In fact, syntactic nesting works just like the cases
sketched in Figure 36-1 and Figure 36-2. The only difference is that the nested handlers
are physically embedded in a try block, not coded elsewhere in functions that are called
from the try block. For example, nested finally handlers all fire on an exception,
whether they are nested syntactically or by means of the runtime flow through physi-
cally separated parts of your code:
>>> try:
...     try:
...         raise IndexError
...     finally:
...         print('spam')
... finally:
...     print('SPAM')
...
spam
SPAM
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
IndexError
See Figure 36-2 for a graphic illustration of this code’s operation; the effect is the same,
but the function logic has been inlined as nested statements here. For a more useful
example of syntactic nesting at work, consider the following file, except-finally.py:
def raise1():  raise IndexError
def noraise(): return
def raise2():  raise SyntaxError

for func in (raise1, noraise, raise2):
    print('<%s>' % func.__name__)
    try:
        try:
            func()
        except IndexError:
            print('caught IndexError')
    finally:
        print('finally run')
    print('...')
This code catches an exception if one is raised and performs a finally termination-
time action regardless of whether an exception occurs. This may take a few moments
to digest, but the effect is the same as combining an except and a finally clause in a
single try statement in Python 2.5 and later:
% python except-finally.py
<raise1>
caught IndexError
finally run
...
<noraise>
finally run
...
<raise2>
finally run
Traceback (most recent call last):
  File "except-finally.py", line 9, in <module>
    func()
  File "except-finally.py", line 3, in raise2
    def raise2(): raise SyntaxError
SyntaxError: None
As we saw in Chapter 34, as of Python 2.5, except and finally clauses can be mixed
in the same try statement. This, along with multiple except clause support, makes some
of the syntactic nesting described in this section unnecessary, though the equivalent
runtime nesting is common in larger Python programs. Moreover, syntactic nesting still
works today: it may still appear in code written prior to Python 2.5, it can make the
disjoint roles of except and finally more explicit, and it can serve as a technique for
implementing alternative exception-handling behaviors in general.
Exception Idioms
We’ve seen the mechanics behind exceptions. Now let’s take a look at some of the other
ways they are typically used.
Breaking Out of Multiple Nested Loops: “go to”
As mentioned at the start of this part of the book, exceptions can often be used to serve
the same roles as other languages’ “go to” statements to implement more arbitrary
control transfers. Exceptions, however, provide a more structured option that localizes
the jump to a specific block of nested code.
In this role, raise is like “go to,” and except clauses and exception names take the place
of program labels. You can jump only out of code wrapped in a try this way, but that’s
a crucial feature—truly arbitrary “go to” statements can make code extraordinarily
difficult to understand and maintain.
For example, Python’s break statement exits just the single closest enclosing loop, but
we can always use exceptions to break out of more than one loop level if needed:
>>> class Exitloop(Exception): pass
...
>>> try:
...     while True:
...         while True:
...             for i in range(10):
...                 if i > 3: raise Exitloop       # break exits just one level
...                 print('loop3: %s' % i)
...             print('loop2')
...         print('loop1')
... except Exitloop:
...     print('continuing')                        # Or just pass, to move on
...
loop3: 0
loop3: 1
loop3: 2
loop3: 3
continuing
>>> i
4
If you change the raise in this to break, you’ll get an infinite loop, because you’ll break
only out of the most deeply nested for loop, and wind up in the second-level loop
nesting. The code would then print “loop2” and start the for again.
Also notice that variable i is still what it was after the try statement exits. Variable
assignments made in a try are not undone in general, though as we’ve seen, exception
instance variables listed in except clause headers are localized to that clause, and the
local variables of any functions that are exited as a result of a raise are discarded.
Technically, active functions’ local variables are popped off the call stack and the ob-
jects they reference may be garbage-collected as a result, but this is an automatic step.
Exceptions Aren’t Always Errors
In Python, all errors are exceptions, but not all exceptions are errors. For instance, we
saw in Chapter 9 that file object read methods return an empty string at the end of a
file. In contrast, the built-in input function—which we first met in Chapter 3, deployed
in an interactive loop in Chapter 10, and learned is named raw_input in 2.X—reads a
line of text from the standard input stream, sys.stdin, at each call and raises the built-
in EOFError at end-of-file.
Unlike file methods, this function does not return an empty string—an empty string
from input means an empty line. Despite its name, though, the EOFError exception is
just a signal in this context, not an error. Because of this behavior, unless the end-of-
file should terminate a script, input often appears wrapped in a try handler and nested
in a loop, as in the following code:
while True:
    try:
        line = input()          # Read line from stdin (raw_input in 2.X)
    except EOFError:
        break                   # Exit loop at end-of-file
    else:
        ...process next line here...
Several other built-in exceptions are similarly signals, not errors—for example, calling
sys.exit() and pressing Ctrl-C on your keyboard raise SystemExit and KeyboardInter
rupt, respectively.
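As a small sketch of the signal idea, SystemExit can be intercepted like any other
exception, and the status passed to sys.exit is available in its code attribute:

>>> import sys
>>> try:
...     sys.exit(40)                 # Raises SystemExit: a signal, not an error
... except SystemExit as exc:
...     print('exit status:', exc.code)
...
exit status: 40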
Python also has a set of built-in exceptions that represent warnings rather than errors;
some of these are used to signal use of deprecated (phased out) language features. See
the standard library manual’s description of built-in exceptions for more information,
and consult the warnings module’s documentation for more on exceptions raised as
warnings.
Functions Can Signal Conditions with raise
User-defined exceptions can also signal nonerror conditions. For instance, a search
routine can be coded to raise an exception when a match is found instead of returning
a status flag for the caller to interpret. In the following, the try/except/else exception
handler does the work of an if/else return-value tester:
class Found(Exception): pass

def searcher():
    if ...success...:
        raise Found()            # Raise exceptions instead of returning flags
    else:
        return

try:
    searcher()
except Found:                    # Exception if item was found
    ...success...
else:                            # else returned: not found
    ...failure...
More generally, such a coding structure may also be useful for any function that cannot
return a sentinel value to designate success or failure. In a widely applicable function,
for instance, if all objects are potentially valid return values, it’s impossible for any
return value to signal a failure condition. Exceptions provide a way to signal results
without a return value:
class Failure(Exception): pass

def searcher():
    if ...success...:
        return ...founditem...
    else:
        raise Failure()

try:
    item = searcher()
except Failure:
    ...not found...
else:
    ...use item here...
Because Python is dynamically typed and polymorphic to the core, exceptions, rather
than sentinel return values, are the generally preferred way to signal such conditions.
Closing Files and Server Connections
We encountered examples in this category in Chapter 34. As a summary, though, ex-
ception processing tools are also commonly used to ensure that system resources are
finalized, regardless of whether an error occurs during processing or not.
For example, some servers require connections to be closed in order to terminate a
session. Similarly, output files may require close calls to flush their buffers to disk for
waiting consumers, and input files may consume file descriptors if not closed; although
file objects are automatically closed when garbage-collected if still open, in some Py-
thons it may be difficult to be sure when that will occur.
As we saw in Chapter 34, the most general and explicit way to guarantee termination
actions for a specific block of code is the try/finally statement:
myfile = open(r'C:\code\textdata', 'w')
try:
    ...process myfile...
finally:
    myfile.close()
As we also saw, some objects make this potentially easier in Python 2.6, 3.0, and later
by providing context managers that terminate or close the objects for us automatically
when run by the with/as statement:
with open(r'C:\code\textdata', 'w') as myfile:
    ...process myfile...
So which option is better here? As usual, it depends on your programs. Compared to
the traditional try/finally, context managers are more implicit, which runs contrary
to Python’s general design philosophy. Context managers are also arguably less general
—they are available only for select objects, and writing user-defined context managers
to handle general termination requirements is more complex than coding a try/finally.
On the other hand, using existing context managers requires less code than using try/
finally, as shown by the preceding examples. Moreover, the context manager protocol
supports entry actions in addition to exit actions. In fact, it can save a line of code when
no exceptions are expected at all (albeit at the expense of further nesting and indenting
file processing logic):
myfile = open(filename, 'w')             # Traditional form
...process myfile...
myfile.close()

with open(filename) as myfile:           # Context manager form
    ...process myfile...
Still, the implicit exception processing of with makes it more directly comparable to
the explicit exception handling of try/finally. Although try/finally is the more widely
applicable technique, context managers may be preferable where they are already
available, or where their extra complexity is warranted.
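For reference, coding your own context manager takes more work than a try/finally,
though not prohibitively so. The following is a minimal sketch of the protocol
introduced in Chapter 34, with illustrative names:

class Closing:
    def __init__(self, obj):
        self.obj = obj
    def __enter__(self):
        return self.obj                  # Entry action: pass the object to "as"
    def __exit__(self, exc_type, exc_value, exc_tb):
        self.obj.close()                 # Exit action: runs on error or not
        return False                     # Don't suppress exceptions

with Closing(open('textdata', 'w')) as myfile:
    myfile.write('spam\n')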
Debugging with Outer try Statements
You can also make use of exception handlers to replace Python’s default top-level ex-
ception-handling behavior. By wrapping an entire program (or a call to it) in an outer
try in your top-level code, you can catch any exception that may occur while your
program runs, thereby subverting the default program termination.
In the following, the empty except clause catches any uncaught exception raised while
the program runs. To get hold of the actual exception that occurred in this mode, fetch
the sys.exc_info function call result from the built-in sys module; it returns a tuple
whose first two items contain the current exception’s class and the instance object
raised (more on sys.exc_info in a moment):
try:
    ...run program...
except:                          # All uncaught exceptions come here
    import sys
    print('uncaught!', sys.exc_info()[0], sys.exc_info()[1])
This structure is commonly used during development, to keep programs active even
after errors occur—within a loop, it allows you to run additional tests without having
to restart. It’s also used when testing other program code, as described in the next
section.
On a related note, for more about handling program shutdowns
without recovery from them, see also Python’s atexit standard library
module. It’s also possible to customize what the top-level exception
handler does with sys.excepthook. These and other related tools are
described in Python’s library manual.
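For instance, a brief sketch of customizing the top-level handler with sys.excepthook;
the hook receives the same class, instance, and traceback values that sys.exc_info
returns:

import sys

def myhook(exc_type, exc_value, exc_tb):         # Illustrative replacement hook
    print('uncaught!', exc_type.__name__, exc_value)

sys.excepthook = myhook          # Called for uncaught exceptions in a script
raise IndexError('spam')         # Prints: uncaught! IndexError spam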
Running In-Process Tests
Some of the coding patterns we’ve just looked at can be combined in a test-driver
application that tests other code within the same process. The following partial code
sketches the general model:
import sys
log = open('testlog', 'a')
from testapi import moreTests, runNextTest, testName

def testdriver():
    while moreTests():
        try:
            runNextTest()
        except:
            print('FAILED', testName(), sys.exc_info()[:2], file=log)
        else:
            print('PASSED', testName(), file=log)

testdriver()
The testdriver function here cycles through a series of test calls (the module testapi
is left abstract in this example). Because an uncaught exception in a test case would
normally kill this test driver, you need to wrap test case calls in a try if you want to
continue the testing process after a test fails. The empty except catches any uncaught
exception generated by a test case as usual, and it uses sys.exc_info to log the exception
to a file. The else clause is run when no exception occurs—the test success case.
Such boilerplate code is typical of systems that test functions, modules, and classes by
running them in the same process as the test driver. In practice, however, testing can
be much more sophisticated than this. For instance, to test external programs, you could
instead check status codes or outputs generated by program-launching tools such as
os.system and os.popen, used earlier in this book and covered in the standard library
manual. Such tools do not generally raise exceptions for errors in the external programs
—in fact, the test cases may run in parallel with the test driver.
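For example, a rough sketch of status-code checking for an external test; the script
name here is hypothetical, and on most platforms os.system returns a nonzero status
when the launched program fails:

import os

status = os.system('python testcase.py')     # Run the test as a separate program
if status != 0:
    print('FAILED, exit status:', status)
else:
    print('PASSED')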
At the end of this chapter, we’ll also briefly meet more complete testing frameworks
provided by Python, such as doctest and PyUnit, which provide tools for comparing
expected outputs with actual results.
More on sys.exc_info
The sys.exc_info result used in the last two sections allows an exception handler to
gain access to the most recently raised exception generically. This is especially useful
when using the empty except clause to catch everything blindly, to determine what was
raised:
try:
    ...
except:
    # sys.exc_info()[0:2] are the exception class and instance
If no exception is being handled, this call returns a tuple containing three None values.
Otherwise, the values returned are (type, value, traceback), where:
type is the exception class of the exception being handled.
value is the exception class instance that was raised.
traceback is a traceback object that represents the call stack at the point where the
exception originally occurred, and is used by the traceback module to generate error
messages.
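A brief sketch of unpacking the three components inside a blind handler (output as
shown in Python 3.X):

import sys

try:
    1 / 0
except:
    etype, evalue, etb = sys.exc_info()
    print(etype.__name__, evalue)        # Prints: ZeroDivisionError division by zero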
As we saw in the prior chapter, sys.exc_info can also sometimes be useful to determine
the specific exception type when catching exception category superclasses. As we’ve
also learned, though, because in this case you can also get the exception type by fetching
the __class__ attribute of the instance obtained with the as clause, sys.exc_info is often
redundant apart from the empty except:
try:
    ...
except General as instance:
    # instance.__class__ is the exception class
As we’ve seen, using Exception for the General exception name here would catch all
nonexit exceptions, similar to an empty except but less extreme, and still giving access
to the exception instance and its class. Even so, using the instance object’s interfaces
and polymorphism is often a better approach than testing exception types—exception
methods can be defined per class and run generically:
try:
    ...
except General as instance:
    # instance.method() does the right thing for this instance
As usual, being too specific in Python can limit your code’s flexibility. A polymorphic
approach like the last example here generally supports future evolution better than
explicitly type-specific tests or actions.
Displaying Errors and Tracebacks
Finally, the exception traceback object available in the prior section’s sys.exc_info
result is also used by the standard library’s traceback module to generate the standard
error message and stack display manually. This module has a handful of interfaces that
support wide customization, which we don’t have space to cover usefully here, but the
basics are simple. Consider the following aptly named file, badly.py:
import traceback

def inverse(x):
    return 1 / x

try:
    inverse(0)
except Exception:
    traceback.print_exc(file=open('badly.exc', 'w'))
print('Bye')
This code uses the print_exc convenience function in the traceback module, which
uses sys.exc_info data by default; when run, the script prints the error message to a
file—handy in testing programs that need to catch errors but still record them in full:
c:\code> python badly.py
Bye

c:\code> type badly.exc
Traceback (most recent call last):
  File "badly.py", line 7, in <module>
    inverse(0)
  File "badly.py", line 4, in inverse
    return 1 / x
ZeroDivisionError: division by zero
For much more on traceback objects, the traceback module that uses them, and related
topics, consult other reference resources and manuals.
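One related interface is worth a brief mention here: traceback.format_exc returns the
same standard error text as a string instead of printing it, which can be handy for
logging:

import traceback

try:
    1 / 0
except Exception:
    text = traceback.format_exc()        # Full traceback as a string
    print('last line:', text.splitlines()[-1])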
Version skew note: In Python 2.X, the older tools sys.exc_type and
sys.exc_value still work to fetch the most recent exception type and
value, but they can manage only a single, global exception for the entire
process. These two names have been removed in Python 3.X. The newer
and preferred sys.exc_info() call available in both 2.X and 3.X instead
keeps track of each thread’s exception information, and so is thread-
specific. Of course, this distinction matters only when using multiple
threads in Python programs (a subject beyond this book’s scope), but
3.X forces the issue. See other resources for more details.
Exception Design Tips and Gotchas
I’m lumping design tips and gotchas together in this chapter, because it turns out that
the most common gotchas largely stem from design issues. By and large, exceptions
are easy to use in Python. The real art behind them is in deciding how specific or general
your except clauses should be and how much code to wrap up in try statements. Let’s
address the second of these concerns first.
What Should Be Wrapped
In principle, you could wrap every statement in your script in its own try, but that
would just be silly (the try statements would then need to be wrapped in try state-
ments!). What to wrap is really a design issue that goes beyond the language itself, and
it will become more apparent with use. But for now, here are a few rules of thumb:
• Operations that commonly fail should generally be wrapped in try statements. For
  example, operations that interface with system state (file opens, socket calls, and
  the like) are prime candidates for try.
• However, there are exceptions to the prior rule—in a simple script, you may
  want failures of such operations to kill your program instead of being caught and
  ignored. This is especially true if the failure is a showstopper. Failures in Python
  typically result in useful error messages (not hard crashes), and this is the best
  outcome some programs could hope for.
• You should implement termination actions in try/finally statements to guarantee
  their execution, unless a context manager is available as a with/as option (see the
  sketch after this list). The try/finally statement form allows you to run code
  whether exceptions occur or not in arbitrary scenarios.
• It is sometimes more convenient to wrap the call to a large function in a single
  try statement, rather than littering the function itself with many try statements.
  That way, all exceptions in the function percolate up to the try around the call,
  and you reduce the amount of code within the function.
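As a minimal sketch of the termination-actions rule in this list (the filename is
hypothetical, and process is a stand-in for real work):

def process(text):
    return text.upper()                # Stand-in for real work

f = open('data.txt')                   # Hypothetical input file
try:
    result = process(f.read())
finally:
    f.close()                          # Runs whether process raises or not

with open('data.txt') as f:            # The with/as equivalent, when available
    result = process(f.read())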
The types of programs you write will probably influence the amount of exception han-
dling you code as well. Servers, for instance, must generally keep running persistently
and so will likely require try statements to catch and recover from exceptions. In-
process testing programs of the kind we saw in this chapter will probably handle ex-
ceptions as well. Simpler one-shot scripts, though, will often ignore exception handling
completely because failure at any step requires script shutdown.
Catching Too Much: Avoid Empty except and Exception
As mentioned, exception handler generality is a key design choice. Python lets you pick
and choose which exceptions to catch, but you sometimes have to be careful to not be
too inclusive. For example, you’ve seen that an empty except clause catches every ex-
ception that might be raised while the code in the try block runs.
That’s easy to code, and sometimes desirable, but you may also wind up intercepting
an error that’s expected by a try handler higher up in the exception nesting structure.
For example, an exception handler such as the following catches and stops every ex-
ception that reaches it, regardless of whether another handler is waiting for it:
def func():
    try:
        ...                  # IndexError is raised in here
    except:
        ...                  # But everything comes here and dies!

try:
    func()
except IndexError:           # Exception should be processed here
    ...
Perhaps worse, such code might also catch unrelated system exceptions. Even things
like memory errors, genuine programming mistakes, iteration stops, keyboard inter-
rupts, and system exits raise exceptions in Python. Unless you’re writing a debugger or
similar tool, such exceptions should not usually be intercepted in your code.
For example, scripts normally exit when control falls off the end of the top-level file.
However, Python also provides a built-in sys.exit(statuscode) call to allow early ter-
minations. This actually works by raising a built-in SystemExit exception to end the
program, so that try/finally handlers run on the way out and special types of programs
can intercept the event.1 Because of this, a try with an empty except might unknowingly
prevent a crucial exit, as in the following file (exiter.py):
1. A related call, os._exit, also ends a program, but via an immediate termination—it skips cleanup actions,
including any registered with the atexit module noted earlier, and cannot be intercepted with try/
except or try/finally blocks. It is usually used only in spawned child processes, a topic beyond this book’s
scope. See the library manual or follow-up texts for details.
import sys

def bye():
    sys.exit(40)             # Crucial error: abort now!

try:
    bye()
except:
    print('got it')          # Oops--we ignored the exit
print('continuing...')

% python exiter.py
got it
continuing...
You simply might not expect all the kinds of exceptions that could occur during an
operation. Using the built-in exception classes of the prior chapter can help in this
particular case, because the Exception superclass is not a superclass of SystemExit:
try:
    bye()
except Exception:            # Won't catch exits, but _will_ catch many others
    ...
In other cases, though, this scheme is no better than an empty except clause—because
Exception is a superclass above all built-in exceptions except system-exit events, it still
has the potential to catch exceptions meant for elsewhere in the program.
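You can verify this hierarchy for yourself interactively: Exception omits exit events,
but BaseException, the root of the built-in exception tree, includes them:

>>> issubclass(SystemExit, Exception)          # Exception won't catch exits
False
>>> issubclass(SystemExit, BaseException)      # The hierarchy's root will
True
>>> issubclass(KeyboardInterrupt, Exception)   # Ctrl-C events are outside it too
False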
Probably worst of all, both using an empty except and catching the Exception superclass
will also catch genuine programming errors, which should be allowed to pass most of
the time. In fact, these two techniques can effectively turn off Python’s error-reporting
machinery, making it difficult to notice mistakes in your code. Consider this code, for
example:
mydictionary = {...}
...
try:
    x = myditctionary['spam']    # Oops: misspelled
except:
    x = None                     # Assume we got KeyError

...continue here with x...
The coder here assumes that the only sort of error that can happen when indexing a
dictionary is a missing key error. But because the name myditctionary is misspelled (it
should say mydictionary), Python raises a NameError instead for the undefined name
reference, which the handler will silently catch and ignore. The event handler will in-
correctly fill in a None default for the dictionary access, masking the program error.
Moreover, catching Exception here will not help—it would have the exact same effect
as an empty except, happily and silently filling in a default and masking a genuine
program error you will probably want to know about. If this happens in code that is
far removed from the place where the fetched values are used, it might make for a very
interesting debugging task!
As a rule of thumb, be as specific in your handlers as you can be—empty except clauses
and Exception catchers are handy, but potentially error-prone. In the last example, for
instance, you would be better off saying except KeyError: to make your intentions
explicit and avoid intercepting unrelated events. In simpler scripts, the potential for
problems might not be significant enough to outweigh the convenience of a catchall,
but in general, general handlers are generally trouble.
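Applied to the prior example, the fix is a one-word change in the handler; here is a
sketch with the misspelling corrected as well:

mydictionary = {'eggs': 1}           # Illustrative content
try:
    x = mydictionary['spam']         # Name spelled correctly now
except KeyError:                     # Catch only the expected event
    x = None                         # A NameError would now be reported, not masked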
Catching Too Little: Use Class-Based Categories
On the other hand, neither should handlers be too specific. When you list specific
exceptions in a try, you catch only what you actually list. This isn’t necessarily a bad
thing, but if a system evolves to raise other exceptions in the future, you may need to
go back and add them to exception lists elsewhere in your code.
We saw this phenomenon at work in the prior chapter. For instance, the following
handler is written to treat MyExcept1 and MyExcept2 as normal cases and everything else
as an error. If you add a MyExcept3 in the future, though, it will be processed as an error
unless you update the exception list:
try:
    ...
except (MyExcept1, MyExcept2):       # Breaks if you add a MyExcept3 later
    ...                              # Nonerrors
else:
    ...                              # Assumed to be an error
Luckily, careful use of the class-based exceptions we discussed in Chapter 34 can make
this code maintenance trap go away completely. As we saw, if you catch a general
superclass, you can add and raise more specific subclasses in the future without having
to extend except clause lists manually—the superclass becomes an extendible excep-
tions category:
try:
    ...
except SuccessCategoryName:          # OK if you add a MyExcept3 subclass later
    ...                              # Nonerrors
else:
    ...                              # Assumed to be an error
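To make this concrete, here is a brief sketch with hypothetical names; new subclasses
of the category are caught without any handler changes:

class SuccessCategoryName(Exception): pass     # The extendible category root
class MyExcept1(SuccessCategoryName): pass
class MyExcept2(SuccessCategoryName): pass
class MyExcept3(SuccessCategoryName): pass     # Added later: no handler edits needed

try:
    raise MyExcept3('all done')
except SuccessCategoryName as X:               # Catches current and future subclasses
    print('nonerror:', X)
else:
    print('error case')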
In other words, a little design goes a long way. The moral of the story is to be careful
to be neither too general nor too specific in exception handlers, and to pick the gran-
ularity of your try statement wrappings wisely. Especially in larger systems, exception
policies should be a part of the overall design.
Core Language Summary
Congratulations! This concludes your look at the fundamentals of the Python pro-
gramming language. If you’ve gotten this far, you’ve become a fully operational Python
programmer. There’s more optional reading in the advanced topics part ahead that I’ll
describe in a moment. In terms of the essentials, though, the Python story—and this
book’s main journey—is now complete.
Along the way, you’ve seen just about everything there is to see in the language itself,
and in enough depth to apply to most of the code you are likely to encounter in the
open source “wild.” You’ve studied built-in types, statements, and exceptions, as well
as tools used to build up the larger program units of functions, modules, and classes.
You’ve also explored important software design issues, the complete OOP paradigm,
functional programming tools, program architecture concepts, alternative tool tradeoffs,
and more—compiling a skill set now qualified to be turned loose on the task of devel-
oping real applications.
The Python Toolset
From this point forward, your future Python career will largely consist of becoming
proficient with the toolset available for application-level Python programming. You’ll
find this to be an ongoing task. The standard library, for example, contains hundreds
of modules, and the public domain offers still more tools. It’s possible to spend decades
seeking proficiency with all these tools, especially as new ones are constantly appearing
to address new technologies (trust me on this—I’m at 20 years and counting!).
Speaking generally, Python provides a hierarchy of toolsets:
Built-ins
Built-in types like strings, lists, and dictionaries make it easy to write simple pro-
grams fast.
Python extensions
For more demanding tasks, you can extend Python by writing your own functions,
modules, and classes.
Compiled extensions
Although we don’t cover this topic in this book, Python can also be extended with
modules written in an external language like C or C++.
Because Python layers its toolsets, you can decide how deeply your programs need to
delve into this hierarchy for any given task—you can use built-ins for simple scripts,
add Python-coded extensions for larger systems, and code compiled extensions for
advanced work. We’ve only covered the first two of these categories in this book, and
that’s plenty to get you started doing substantial programming in Python.
Beyond this, there are tools, resources, or precedents for using Python in nearly any
computer domain you can imagine. For pointers on where to go next, see Chapter 1’s
overview of Python applications and users. You’ll likely find that with a powerful open
source language like Python, common tasks are often much easier, and even enjoyable,
than you might expect.
Development Tools for Larger Projects
Most of the examples in this book have been fairly small and self-contained. They were
written that way on purpose, to help you master the basics. But now that you know all
about the core language, it’s time to start learning how to use Python’s built-in and
third-party interfaces to do real work.
In practice, Python programs can become substantially larger than the examples you’ve
experimented with so far in this book. Even in Python, thousands of lines of code are
not uncommon for nontrivial and useful programs, once you add up all the individual
modules in the system. Though Python's basic program structuring tools such as
modules and classes do much to manage this complexity, other tools can sometimes offer
additional support.
For developing larger systems, you’ll find such support available in both Python and
the public domain. You’ve seen some of these in action, and I’ve mentioned a few
others. To help you on your next steps, here is a quick tour and summary of some of
the most commonly used tools in this domain:
PyDoc and docstrings
PyDoc’s help function and HTML interfaces were introduced in Chapter 15. PyDoc
provides a documentation system for your modules and objects, integrates with
Python’s docstrings syntax, and is a standard part of the Python system. See Chap-
ter 15 and Chapter 4 for more documentation source hints.
PyChecker and PyLint
Because Python is such a dynamic language, some programming errors are not
reported until your program runs (even syntax errors are not caught until a file is
run or imported). This isn’t a big drawback—as with most languages, it just means
that you have to test your Python code before shipping it. At worst, with Python
you essentially trade a compile phase for an initial testing phase. Furthermore,
Python’s dynamic nature, automatic error messages, and exception model make it
easier and quicker to find and fix errors than it is in some other languages. Unlike
C, for example, Python does not crash completely on errors.
Still, tools can help here too. The PyChecker and PyLint systems provide support
for catching common errors ahead of time, before your script runs. They serve
similar roles to the lint program in C development. Some Python developers run
their code through PyChecker prior to testing or delivery, to catch any lurking
potential problems. In fact, it’s not a bad idea to try this when you’re first starting
out—some of these tools’ warnings may help you learn to spot and avoid common
Python mistakes. PyChecker and PyLint are third-party open source packages,
available at the PyPI website or your friendly neighborhood web search engine.
They may appear in IDE GUIs as well.
PyUnit (a.k.a. unittest)
In Chapter 25, we learned how to add self-test code to a Python file by using the
__name__ == '__main__' trick at the bottom of the file—a simple unit-testing protocol.
For more advanced testing purposes, Python comes with two testing support
tools. The first, PyUnit (called unittest in the library manual), provides an object-
oriented class framework for specifying and customizing test cases and expected
results. It mimics the JUnit framework for Java. This is a sophisticated class-based
unit testing system; see the Python library manual for details.
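A tiny hedged sketch of the unittest flavor, with a made-up function under test:

import unittest

def double(x):                           # A made-up function under test
    return x * 2

class DoubleTests(unittest.TestCase):
    def test_numbers(self):
        self.assertEqual(double(2), 4)
    def test_strings(self):
        self.assertEqual(double('-'), '--')

if __name__ == '__main__':
    unittest.main()                      # Runs all TestCase methods, reports results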
doctest
The doctest standard library module provides a second and simpler approach to
regression testing, based upon Python's docstrings feature. Roughly, to use doctest,
you cut and paste a log of an interactive testing session into the docstrings of
your source files. doctest then extracts your docstrings, parses out the test cases
and results, and reruns the tests to verify the expected results. doctest’s operation
can be tailored in a variety of ways; see the library manual for more details.
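As a minimal hedged sketch of the doctest approach (the function and its expected
results are made up for illustration):

def double(x):
    """
    >>> double(2)
    4
    >>> double('-')
    '--'
    """
    return x * 2

if __name__ == '__main__':
    import doctest
    doctest.testmod()                  # Reruns the docstring session, verifies results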
IDEs
We discussed IDEs for Python in Chapter 3. IDEs such as IDLE provide a graphical
environment for editing, running, debugging, and browsing your Python pro-
grams. Some advanced IDEs—such as Eclipse, Komodo, NetBeans, and others
listed in Chapter 3—may support additional development tasks, including source
control integration, code refactoring, project management tools, and more. See
Chapter 3, the text editors page at http://www.python.org, and your favorite web
search engine for more on available IDEs and GUI builders for Python.
Profilers
Because Python is so high-level and dynamic, intuitions about performance
gleaned from experience with other languages usually don’t apply to Python code.
To truly isolate performance bottlenecks in your code, you need to add timing logic
with clock tools in the time or timeit modules, or run your code under the profile
module. We saw an example of the timing modules at work when comparing the
speed of iteration tools and Python versions in Chapter 21.
Profiling is usually your first optimization step—code for clarity, then profile to
isolate bottlenecks, and then time alternative codings of the slow parts of your
program. For the second of these steps, profile is a standard library module that
implements a source code profiler for Python. It runs a string of code you provide
(e.g., a script file import, or a call to a function) and then, by default, prints a report
to the standard output stream that gives performance statistics—number of calls
to each function, time spent in each function, and more.
The profile module can be run as a script or imported, and it may be customized
in various ways; for example, it can save run statistics to a file to be analyzed later
with the pstats module. To profile interactively, import the profile module and
call profile.run('code'), passing in the code you wish to profile as a string (e.g.,
a call to a function, an import of a file, or code read from a file). To profile from a
system shell command line, use a command of the form python -m profile
main.py args (see Appendix A for more on this format). Also see Python’s standard
library manuals for other profiling options; the cProfile module, for example, has
identical interfaces to profile but runs with less overhead, so it may be better suited
to profiling long-running programs.
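For instance, a minimal interactive profiling sketch; the function being timed is a
made-up stand-in:

import cProfile                        # Same interfaces as profile, less overhead

def work():
    return sum(i * i for i in range(100000))

cProfile.run('work()')                 # Prints call counts and times per function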
Debuggers
We also discussed debugging options in Chapter 3 (see its sidebar “Debugging
Python Code” on page 83). As a review, most development IDEs for Python support
GUI-based debugging, and the Python standard library also includes a source code
debugger module called pdb. This module provides a command-line interface and
works much like common C language debuggers (e.g., dbx, gdb).
Much like the profiler, the pdb debugger can be run either interactively or from a
command line and can be imported and called from a Python program. To use it
interactively, import the module, start running code by calling a pdb function (e.g.,
pdb.run('main()')), and then type debugging commands from pdb’s interactive
prompt. To launch pdb from a system shell command line, use a command of the
form python -m pdb main.py args. pdb also includes a useful postmortem analysis
call, pdb.pm(), which starts the debugger after an exception has been encountered,
possibly in conjunction with Python’s -i flag. See Appendix A for more on these
tools.
Because IDEs such as IDLE also include point-and-click debugging interfaces,
pdb isn’t as critical a tool today, except when a GUI isn’t available or when more
control is desired. See Chapter 3 for tips on using IDLE’s debugging GUI interfaces.
Really, neither pdb nor IDEs seem to be used much in practice—as noted in Chap-
ter 3, most programmers either insert print statements or simply read Python’s
error messages: perhaps not the most high-tech of approaches, but the practical
tends to win the day in the Python world!
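When you do want it, though, a small post-mortem sketch looks like this (the failing
function is contrived for illustration):

import pdb, sys

def main():
    return 1 / 0                       # A deliberate failure

try:
    main()
except Exception:
    pdb.post_mortem(sys.exc_info()[2]) # Open the debugger at the raise point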
Shipping options
In Chapter 2, we introduced common tools for packaging Python programs.
py2exe, PyInstaller, and others listed in that chapter can package byte code and
the Python Virtual Machine into “frozen binary” standalone executables, which
don’t require that Python be installed on the target machine and hide your system’s
code. In addition, we learned in Chapter 2 that Python programs may be shipped
in their source (.py) or byte code (.pyc) forms, and that import hooks support
special packaging techniques such as automatic extraction of .zip files and byte
code encryption.
We also briefly met the standard library’s distutils modules, which provide pack-
aging options for Python modules and packages, and C-coded extensions; see the
Python manuals for more details. The emerging Python “eggs” third-party pack-
aging system provides another alternative that also accounts for dependencies;
search the Web for more details.
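For a taste of the distutils option, here is a minimal hypothetical setup.py for a
one-module project; names and version are made up:

from distutils.core import setup

setup(
    name='mymod',                      # Hypothetical project metadata
    version='1.0',
    py_modules=['mymod'],              # Modules to install
)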
Optimization options
When speed counts, there are a handful of options for optimizing your programs.
The PyPy system described in Chapter 2 provides a just-in-time compiler for trans-
lating Python byte code to binary machine code, and Shed Skin offers a Python-to-
C++ translator. You may also occasionally see .pyo optimized byte code files, gen-
erated and run with the -O Python command-line flag discussed in Chapter 22 and
Chapter 34, and to be deployed in Chapter 39; because this provides a very modest
performance boost, however, it is not commonly used except to remove debugging
code.
As a last resort, you can also move parts of your program to a compiled language
such as C to boost performance. See the book Programming Python and the Python
standard manuals for more on C extensions. In general, Python’s speed tends to
also improve over time, so upgrading to later releases may improve speed too—
once you verify that they are faster for your code, that is (Python 3.0's initial release,
for example, was up to 1000X slower than 2.X on some I/O operations, though this
has largely been repaired since!).
Other hints for larger projects
We’ve met a variety of core language features in this text that will also tend to
become more useful once you start coding larger projects. These include module
packages (Chapter 24), class-based exceptions (Chapter 34), class pseudoprivate
attributes (Chapter 31), documentation strings (Chapter 15), module path con-
figuration files (Chapter 22), hiding names from from * with __all__ lists and _X-
style names (Chapter 25), adding self-test code with the __name__ == '__main__'
trick (Chapter 25), using common design rules for functions and modules (Chap-
ter 17, Chapter 19, and Chapter 25), using object-oriented design patterns (Chap-
ter 31 and others), and so on.
To learn about other large-scale Python development tools available in the public do-
main, be sure to browse the pages at the PyPI website at http://www.python.org, and
the Web at large. Applying Python is actually a larger topic than learning Python, and
one we’ll have to delegate to follow-up resources here.
Chapter Summary
This chapter wrapped up the exceptions part of the book with a survey of design con-
cepts, a look at common exception use cases, and a brief summary of commonly used
development tools.
This chapter also wrapped up the core material of this book. At this point, you’ve been
exposed to the full subset of Python that most programmers use—and probably more.
In fact, if you have read this far, you should feel free to consider yourself an official
Python programmer. Be sure to pick up a t-shirt or laptop sticker the next time you’re
online (and don’t forget to add Python to your résumé the next time you dig it out).
The next and final part of this book is a collection of chapters dealing with topics that
are advanced, but still in the core language category. These chapters are all optional
reading, or at least deferrable reading, because not every Python programmer must delve
into their subjects, and others can postpone these chapters’ topics until they are needed.
Indeed, many of you can stop here and begin exploring Python’s roles in your appli-
cation domains. Frankly, application libraries tend to be more important in practice
than advanced—and to some, esoteric—language features.
On the other hand, if you do need to care about things like Unicode or binary data,
have to deal with API-building tools such as descriptors, decorators, and metaclasses,
or just want to dig a bit further in general, the next part of the book will help you get
started. The larger examples in the final part will also give you a chance to see the
concepts you’ve already learned being applied in more realistic ways.
As this is the end of the core material of this book, though, you get a break on the
chapter quiz—just one question this time. As always, be sure to work through this
part’s closing exercises to cement what you’ve learned in the past few chapters; because
the next part is optional reading, this is the final end-of-part exercises session. If you
want to see some examples of how what you’ve learned comes together in real scripts
drawn from common applications, be sure to check out the “solution” to exercise 4 in
Appendix D.
And if this is the end of your journey in this book, be sure to also see the “Bonus” section
at the end of Chapter 41, the very last chapter in this book (for the sake of readers
continuing on to the Advanced Topics part, I won’t spill the beans here).
Test Your Knowledge: Quiz
1. (This question is a repeat from the first quiz in Chapter 1—see, I told you it would
be easy! :-) Why does “spam” show up in so many Python examples in books and
on the Web?
Test Your Knowledge: Answers
1. Because Python is named after the British comedy group Monty Python (based on
surveys I’ve conducted in classes, this is a much-too-well-kept secret in the Python
world!). The spam reference comes from a Monty Python skit, set in a cafeteria
whose menu items all seem to come with Spam. A couple trying to order food there
keeps getting drowned out by a chorus of Vikings singing a song about Spam. No,
really. And if I could insert an audio clip of that song here, I would...
Test Your Knowledge: Part VII Exercises
As we’ve reached the end of this part of the book, it’s time for a few exception exercises
to give you a chance to practice the basics. Exceptions really are simple tools; if you get
these, you’ve probably mastered the exceptions domain. See Part VII in Appendix D
for the solutions.
1. try/except. Write a function called oops that explicitly raises an IndexError excep-
tion when called. Then write another function that calls oops inside a try/except
statement to catch the error. What happens if you change oops to raise a
KeyError instead of an IndexError? Where do the names KeyError and IndexError
come from? (Hint: recall that all unqualified names generally come from one of
four scopes.)
2. Exception objects and lists. Change the oops function you just wrote to raise an
exception you define yourself, called MyError. Identify your exception with a class
(a requirement unless you're using Python 2.5 or earlier). Then, extend the try state-
ment in the catcher function to catch this exception and its instance in addition to
IndexError, and print the instance you catch.
3. Error handling. Write a function called safe(func, *pargs, **kargs) that runs any
function with any number of positional and/or keyword arguments by using the
* arbitrary arguments header and call syntax, catches any exception raised while
the function runs, and prints the exception using the exc_info call in the sys mod-
ule. Then use your safe function to run your oops function from exercise 1 or 2.
Put safe in a module file called exctools.py, and pass it the oops function interac-
tively. What kind of error messages do you get? Finally, expand safe to also print
a Python stack trace when an error occurs by calling the built-in print_exc function
in the standard traceback module; see earlier in this chapter, and consult the
Python library reference manual for usage details. We could probably code safe as
a function decorator using Chapter 32 techniques, but we’ll have to move on to the
next part of the book to learn fully how (see the solutions for a preview).
4. Self-study examples. At the end of Appendix D, I’ve included a handful of example
scripts developed as group exercises in live Python classes for you to study and run
on your own in conjunction with Python’s standard manual set. These are not
described, and they use tools in the Python standard library that you’ll have to
research on your own. Still, for many readers, it helps to see how the concepts
we’ve discussed in this book come together in real programs. If these whet your
appetite for more, you can find a wealth of larger and more realistic application-
level Python program examples in follow-up books like Programming Python and
on the Web.
PART VIII
Advanced Topics
CHAPTER 37
Unicode and Byte Strings
So far, our exploration of strings in this book has been deliberately incomplete. Chap-
ter 4’s types preview briefly introduced Python’s Unicode strings and files without
giving many details, and the strings chapter in the core types part of this book (Chap-
ter 7) deliberately limited its scope to the subset of string topics that most Python
programmers need to know about.
This was by design: because many programmers, including most beginners, deal with
simple forms of text like ASCII, they can happily work with Python’s basic str string
type and its associated operations and don’t need to come to grips with more advanced
string concepts. In fact, such programmers can often ignore the string changes in Python
3.X and continue to use strings as they may have in the past.
On the other hand, many other programmers deal with more specialized types of data:
non-ASCII character sets, image file contents, and so on. For those programmers, and
others who may someday join them, in this chapter we’re going to fill in the rest of the
Python string story and look at some more advanced concepts in Python’s string model.
Specifically, we’ll explore the basics of Python’s support for Unicode text—rich char-
acter strings used in internationalized applications—as well as binary data—strings
that represent absolute byte values. As we’ll see, the advanced string representation
story has diverged in recent versions of Python:
• Python 3.X provides an alternative string type for binary data, and supports Uni-
  code text (including ASCII) in its normal string type.
• Python 2.X provides an alternative string type for non-ASCII Unicode text, and
  supports both simple text and binary data in its normal string type.
In addition, because Python’s string model has a direct impact on how you process
non-ASCII files, we’ll explore the fundamentals of that related topic here as well. Fi-
nally, we’ll take a brief look at some advanced string and binary tools, such as pattern
matching, object pickling, binary data packing, and XML parsing, and the ways in
which they are impacted by 3.X’s string changes.
This is officially an advanced topics chapter, because not all programmers will need to
delve into the worlds of Unicode encodings or binary data. For some readers, Chap-
ter 4’s preview may suffice, and others may wish to file this chapter away for future
reference. If you ever need to care about processing either of these, though, you’ll find
that Python’s string models provide the support you need.
String Changes in 3.X
One of the most noticeable changes in the Python 3.X line is the mutation of string
object types. In a nutshell, 2.X’s str and unicode types have morphed into 3.X’s
bytes and str types, and a new mutable bytearray type has been added. The
bytearray type is technically available in Python 2.6 and 2.7 too (though not earlier), but it's
a back-port from 3.X and does not as clearly distinguish between text and binary con-
tent in 2.X.
Especially if you process data that is either Unicode or binary in nature, these changes
can have substantial impacts on your code. As a general rule of thumb, how much you
need to care about this topic depends in large part upon which of the following cate-
gories you fall into:
• If you deal with non-ASCII Unicode text—for instance, in the context of interna-
  tionalized domains like the Web, or the results of some XML and JSON parsers
  and databases—you will find support for text encodings to be different in 3.X, but
  also probably more direct, accessible, and seamless than in 2.X.
• If you deal with binary data—for example, in the form of image or audio files or
  packed data processed with the struct module—you will need to understand 3.X's
  new bytes object and 3.X's different and sharper distinction between text and bi-
  nary data and files.
• If you fall into neither of the prior two categories, you can generally use strings in
  3.X much as you would in 2.X, with the general str string type, text files, and all
  the familiar string operations we studied earlier. Your strings will be encoded and
  decoded by 3.X using your platform's default encoding (e.g., ASCII, or UTF-8 on
  Windows in the U.S.—sys.getdefaultencoding gives your default if you care to
  check), but you probably won't notice.
In other words, if your text is always ASCII, you can get by with normal string objects
and text files and can avoid most of the following story for now. As we’ll see in a
moment, ASCII is a simple kind of Unicode and a subset of other encodings, so string
operations and files generally “just work” if your programs process only ASCII text.
Even if you fall into the last of the three categories just mentioned, though, a basic
understanding of Unicode and 3.X’s string model can help both to demystify some of
the underlying behavior now, and to make mastering Unicode or binary data issues
easier if they impact you later.
To put that more strongly: like it or not, Unicode will be part of most software devel-
opment in the interconnected future we’ve sown, and will probably impact you even-
tually. Though applications are beyond our scope here, if you work with the Internet,
files, directories, network interfaces, databases, pipes, JSON, XML, and even GUIs,
Unicode may no longer be an optional topic for you in Python 3.X.
Python 3.X’s support for Unicode and binary data is also available in 2.X, albeit in
different forms. Although our main focus in this chapter is on string types in 3.X, we’ll
also explore how 2.X’s equivalent support differs along the way for readers using 2.X.
Regardless of which version you use, the tools we’ll explore here can become important
in many types of programs.
String Basics
Before we look at any code, let’s begin with a general overview of Python’s string model.
To understand why 3.X changed the way it did on this front, we have to start with a
brief look at how characters are actually represented in computers—both when enco-
ded in files and when stored in memory.
Character Encoding Schemes
Most programmers think of strings as series of characters used to represent textual data.
While that’s accurate, the way characters are stored can vary, depending on what sort
of character set must be recorded. When text is stored on files, for example, its character
set determines its format.
Character sets are standards that assign integer codes to individual characters so they
can be represented in computer memory. The ASCII standard, for example, was created
in the U.S., and it defines many U.S. programmers’ notion of text strings. ASCII defines
character codes from 0 through 127 and allows each character to be stored in one 8-
bit byte, only 7 bits of which are actually used.
For example, the ASCII standard maps the character 'a' to the integer value 97 (0x61
in hex), which can be stored in a single byte in memory and files. If you wish to see
how this works, Python’s ord built-in function gives the binary identifying value for a
character, and chr returns the character for a given integer code value:
>>> ord('a') # 'a' is a byte with binary value 97 in ASCII (and others)
97
>>> hex(97)
'0x61'
>>> chr(97) # Binary value 97 stands for character 'a'
'a'
Sometimes one byte per character isn’t enough, though. Various symbols and accented
characters, for instance, do not fit into the range of possible characters defined by
ASCII. To accommodate special characters, some standards use all the possible values
in an 8-bit byte, 0 through 255, to represent characters, and assign the values 128
through 255 (outside ASCII’s range) to special characters.
One such standard, known as the Latin-1 character set, is widely used in Western
Europe. In Latin-1, character codes above 127 are assigned to accented and otherwise
special characters. The character assigned to byte value 196, for example, is a specially
marked non-ASCII character:
>>> 0xC4
196
>>> chr(196) # Python 3.X result form shown
'Ä'
This standard allows for a wide array of extra special characters, but still supports ASCII
as a 7-bit subset of its 8-bit representation.
Still, some alphabets define so many characters that it is impossible to represent each
of them as one byte. Unicode allows more flexibility. Unicode text is sometimes referred
to as “wide-character” strings, because characters may be represented with multiple
bytes if needed. Unicode is typically used in internationalized programs, to represent
European, Asian, and other non-English character sets that have more characters than
8-bit bytes can represent.
To store such rich text in computer memory, we say that characters are translated to
and from raw bytes using an encoding—the rules for translating a string of Unicode
characters to a sequence of bytes, and extracting a string from a sequence of bytes.
More procedurally, this translation back and forth between bytes and strings is defined
by two terms:
• Encoding is the process of translating a string of characters into its raw bytes form,
  according to a desired encoding name.
• Decoding is the process of translating a raw string of bytes into its character string
  form, according to its encoding name.
That is, we encode from string to raw bytes, and decode from raw bytes to string. To
scripts, decoded strings are just characters in memory, but may be encoded into a
variety of byte string representations when stored on files, transferred over networks,
embedded in documents and databases, and so on.
For some encodings, the translation process is trivial—ASCII and Latin-1, for instance,
map each character to a fixed-size single byte, so no translation work is required. For
other encodings, the mapping can be more complex and yield multiple bytes per char-
acter, even for simple 8-bit forms of text.
The widely used UTF-8 encoding, for example, allows a wide range of characters to be
represented by employing a variable-sized number of bytes scheme. Character codes
less than 128 are represented as a single byte; codes between 128 and 0x7ff (2047) are
turned into 2 bytes, where each byte has a value between 128 and 255; and codes above
0x7ff are turned into 3- or 4-byte sequences having values between 128 and 255. This
keeps simple ASCII strings compact, sidesteps byte ordering issues, and avoids null
(zero value) bytes that can cause problems for C libraries and networking.
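You can watch the variable-length effect directly in 3.X; here, the Latin-1 character
we met a moment ago costs one byte in Latin-1 but two in UTF-8:

>>> chr(196)
'Ä'
>>> chr(196).encode('latin-1')         # One byte in Latin-1
b'\xc4'
>>> chr(196).encode('utf-8')           # Two bytes in UTF-8
b'\xc3\x84'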
Because their encodings’ character maps assign characters to the same codes for com-
patibility, ASCII is a subset of both Latin-1 and UTF-8. That is, a valid ASCII character
string is also a valid Latin-1- and UTF-8-encoded string. For example, every ASCII file
is a valid UTF-8 file, because the ASCII character set is a 7-bit subset of UTF-8.
Conversely, the UTF-8 encoding is binary compatible with ASCII, but only for character
codes less than 128. Latin-1 and UTF-8 simply allow for additional characters: Latin-1
for characters mapped to values 128 through 255 within a byte, and UTF-8 for char-
acters that may be represented with multiple bytes.
Other encodings allow for richer character sets in different ways. UTF-16 and
UTF-32, for example, format text with fixed-size schemes of 2 and 4 bytes per
character, respectively, even for characters that could otherwise fit in a single byte. Some
encodings may also insert prefixes that identify byte ordering.
To see this for yourself, run a string’s encode method, which gives its encoded byte-
string format under a named scheme—a two-character ASCII string is 2 bytes in ASCII,
Latin-1, and UTF-8, but it’s much wider in UTF-16 and UTF-32, and includes header
bytes:
>>> S = 'ni'
>>> S.encode('ascii'), S.encode('latin1'), S.encode('utf8')
(b'ni', b'ni', b'ni')
>>> S.encode('utf16'), len(S.encode('utf16'))
(b'\xff\xfen\x00i\x00', 6)
>>> S.encode('utf32'), len(S.encode('utf32'))
(b'\xff\xfe\x00\x00n\x00\x00\x00i\x00\x00\x00', 12)
These results differ slightly in Python 2.X (you won’t get the leading b for byte strings).
But all of these encoding schemes—ASCII, Latin-1, UTF-8, and many others—are
considered to be Unicode.
To Python programmers, encodings are specified as strings containing the encoding’s
name. Python comes with roughly 100 different encodings; see the Python library ref-
erence for a complete list. Importing the module encodings and running help(encodings)
shows you many encoding names as well; some are implemented in Python, and
some in C. Some encodings have multiple names, too; for example, latin-1,
iso_8859_1, and 8859 are all synonyms for the same encoding, Latin-1. We’ll revisit
encodings later in this chapter, when we study techniques for writing Unicode strings
in a script.
For more on the underlying Unicode story, see the Python standard manual set. It
includes a “Unicode HOWTO” in its “Python HOWTOs” section, which provides
additional background that we will skip here in the interest of space.
How Python Stores Strings in Memory
The prior section’s encodings really only apply when text is stored or transferred ex-
ternally, in files and other mediums. In memory, Python always stores decoded text
strings in an encoding-neutral format, which may or may not use multiple bytes for each
character. All text processing occurs in this uniform internal format. Text is translated
to and from an encoding-specific format only when it is transferred to or from external
text files, byte strings, or APIs with specific encoding requirements. Once in memory,
though, strings have no encoding. They are just the string object presented in this book.
Though irrelevant to your code, it may help some readers to make this more tangible.
The way Python actually stores text in memory is prone to change over time, and in
fact mutated substantially as of 3.3:
Python 3.2 and earlier
Through Python 3.2, strings are stored internally in fixed-length UTF-16 (roughly,
UCS-2) format with 2 bytes per character, unless Python is configured to use 4
bytes per character (UCS-4).
Python 3.3 and later
Python 3.3 and later instead use a variable-length scheme with 1, 2, or 4 bytes per
character, depending on a string’s content. The size is chosen based upon the
character with the largest Unicode ordinal value in the represented string. This
scheme allows a space-efficient representation in common cases, but also allows
for full UCS-4 on all platforms.
Python 3.3’s new scheme is an optimization, especially compared to former wide Uni-
code builds. Per Python documentation: memory footprint is divided by 2 to 4 de-
pending on the text; encoding an ASCII string to UTF-8 doesn’t need to encode char-
acters anymore, because its ASCII and UTF-8 representations are the same; repeating
a single ASCII letter and getting a substring of an ASCII string is 4 times faster; UTF-8
is 2 to 4 times faster; and UTF-16 encoding is up to 10 times faster. On some bench-
marks, Python 3.3’s overall memory usage is 2 to 3 times smaller than 3.2, and similar
to the less Unicode-centric 2.7.
Regardless of the storage scheme used, as noted in Chapter 6 Unicode clearly requires
us to think of strings in terms of characters, instead of bytes. This may be a bigger hurdle
for programmers accustomed to the simpler ASCII-only world where each character
mapped to a single byte, but that idea no longer applies, in terms of both the results of
text string tools and physical character size:
Text tools
Today, both string content and length really correspond to Unicode code points
identifying ordinal numbers for characters. For instance, the built-in ord function
now returns a character’s Unicode code point ordinal, which is not necessarily an
ASCII code, and which may or may not fit in a single 8-bit byte’s value. Similarly,
len returns the number of characters, not bytes; the string is probably larger in
memory, and its characters may not fit in bytes anyhow.
Text size
As we saw by example in Chapter 4, under Unicode a single character does not
necessarily map directly to a single byte, either when encoded in a file or when
stored in memory. Even characters in simple 7-bit ASCII text may not map to bytes
—UTF-16 uses multiple bytes per character in files, and Python may allocate 1, 2,
or 4 bytes per character in memory. Thinking in terms of characters allows us to
abstract away the details of external and internal storage.
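A quick interactive sketch of both points, run in 3.X with one non-ASCII character:

>>> S = 'spÄm'
>>> len(S)                             # Characters, not bytes
4
>>> ord(S[2])                          # Code point ordinal: outside ASCII's range
196
>>> len(S.encode('utf-8'))             # The encoded form grows: 'Ä' takes 2 bytes
5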
The key point here, though, is that encoding pertains mostly to files and transfers. Once
loaded into a Python string, text in memory has no notion of an “encoding,” and is
simply a sequence of Unicode characters (a.k.a. code points) stored generically. In your
script, that string is accessed as a Python string object—the next section’s topic.
Python’s String Types
At a more concrete level, the Python language provides string data types to represent
character text in your scripts. The string types you will use in your scripts depend upon
the version of Python you’re using. Python 2.X has a general string type for representing
binary data and simple 8-bit text like ASCII, along with a specific type for representing
richer Unicode text:
• str for representing 8-bit text and binary data
• unicode for representing decoded Unicode text
Python 2.X’s two string types are different (unicode allows for the extra size of some
Unicode characters and has extra support for encoding and decoding), but their oper-
ation sets largely overlap. The str string type in 2.X is used for text that can be repre-
sented with 8-bit bytes (including ASCII and Latin-1), as well as binary data that rep-
resents absolute byte values.
By contrast, Python 3.X comes with three string object types—one for textual data and
two for binary data:
• str for representing decoded Unicode text (including ASCII)
• bytes for representing binary data (including encoded text)
• bytearray, a mutable flavor of the bytes type
As mentioned earlier, bytearray is also available in Python 2.6 and 2.7, but it’s simply
a back-port from 3.X with less content-specific behavior and is generally considered a
3.X type.
Why the different string types?
All three string types in 3.X support similar operation sets, but they have different roles.
The main goal behind this change in 3.X was to merge the normal and Unicode string
types of 2.X into a single string type that supports both simple and Unicode text: de-
velopers wanted to remove the 2.X string dichotomy and make Unicode processing
more natural. Given that ASCII and other 8-bit text is really a simple kind of Unicode,
this convergence seems logically sound.
To achieve this, 3.X stores text in a redefined str type—an immutable sequence of
characters (not necessarily bytes), which may contain either simple text such as ASCII
whose character values fit in single bytes, or richer character set text such as UTF-8
whose character values may require multiple bytes. Strings processed by your script
with this type are stored generically in memory, and are encoded to and decoded from
byte strings per either the platform Unicode default or an explicit encoding name. This
allows scripts to translate text to different encoding schemes, both in memory and when
transferring to and from files.
While 3.X’s new str type does achieve the desired string/unicode merging, many pro-
grams still need to process raw binary data that is not encoded per any text format.
Image and audio files, as well as packed data used to interface with devices or C pro-
grams you might process with Python’s struct module, fall into this category. Because
Unicode strings are decoded from bytes, they cannot be used to represent bytes.
To support processing of such truly binary data, a new string type, bytes, also was
introduced—an immutable sequence of 8-bit integers representing absolute byte values,
which prints as ASCII characters when possible. Though a distinct object type, bytes
supports almost all the same operations that the str type does; this includes string
methods, sequence operations, and even re module pattern matching, but not string
formatting. In 2.X, the general str type fills this binary data role, because its strings are
just sequences of bytes; the separate unicode type handles richer text strings.
In more detail, a 3.X bytes object really is a sequence of small integers, each of which
is in the range 0 through 255; indexing a bytes returns an int, slicing one returns
another bytes, and running the list built-in on one returns a list of integers, not char-
acters. When processed with operations that assume characters, though, the contents
of bytes objects are assumed to be ASCII-encoded bytes (e.g., the isalpha method
assumes each byte is an ASCII character code). Further, bytes objects are printed as
character strings instead of integers for convenience.
While they were at it, Python developers also added a bytearray type in 3.X.
bytearray is a variant of bytes that is mutable and so supports in-place changes. It supports
the usual string operations that str and bytes do, as well as many of the same in-place
change operations as lists (e.g., the append and extend methods, and assignment to
indexes). This can be useful both for truly binary data and simple types of text. As-
suming your text strings can be treated as raw 8-bit bytes (e.g., ASCII or Latin-1 text),
bytearray finally adds direct in-place mutability for text data—something not possible
without conversion to a mutable type in Python 2.X, and not supported by Python 3.X’s
str or bytes.
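A brief interactive sketch of this in-place mutability, run in 3.X:

>>> B = bytearray(b'spam')
>>> B[0] = ord('S')                    # Index assignment takes an int byte value
>>> B.append(ord('!'))                 # List-like in-place growth
>>> B.extend(b'!!')
>>> B
bytearray(b'Spam!!!')
>>> B.decode()                         # Back to str text, if it's really ASCII
'Spam!!!'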
Although Python 2.X and 3.X offer much the same functionality, they package it dif-
ferently. In fact, the mapping from 2.X to 3.X string types is not completely direct—
2.X’s str equates to both str and bytes in 3.X, and 3.X’s str equates to both str and
unicode in 2.X. Moreover, the mutability of 3.X’s bytearray is unique.
In practice, though, this asymmetry is not as daunting as it might sound. It boils down
to the following: in 2.X, you will use str for simple text and binary data and unicode
for advanced forms of text whose character sets don’t map to 8-bit bytes; in 3.X, you’ll
use str for any kind of text (ASCII, Latin-1, and all other kinds of Unicode) and bytes or
bytearray for binary data. In practice, the choice is often made for you by the tools you
use—especially in the case of file processing tools, the topic of the next section.
Text and Binary Files
File I/O (input and output) was also revamped in 3.X to reflect the str/bytes distinction
and automatically support encoding Unicode text on transfers. Python now makes a
sharp platform-independent distinction between text files and binary files; in 3.X:
Text files
When a file is opened in text mode, reading its data automatically decodes its con-
tent and returns it as a str; writing takes a str and automatically encodes it before
transferring it to the file. Both reads and writes translate per a platform default or
a provided encoding name. Text-mode files also support universal end-of-line
translation and additional encoding specification arguments. Depending on the
encoding name, text files may also automatically process the byte order mark se-
quence at the start of a file (more on this momentarily).
Binary files
When a file is opened in binary mode by adding a b (lowercase only) to the mode-
string argument in the built-in open call, reading its data does not decode it in any
way but simply returns its content raw and unchanged, as a bytes object; writing
similarly takes a bytes object and transfers it to the file unchanged. Binary-mode
files also accept a bytearray object for the content to be written to the file.
Because the language sharply differentiates between str and bytes, you must decide
whether your data is text or binary in nature and use either str or bytes objects to
represent its content in your script, as appropriate. Ultimately, the mode in which you
open a file will dictate which type of object your script will use to represent its content:
If you are processing image files, data transferred over networks, packed binary
data whose content you must extract, or some device data streams, chances are
good that you will want to deal with it using bytes and binary-mode files. You might
also opt for bytearray if you wish to update the data without making copies of it
in memory.
If instead you are processing something that is textual in nature, such as program
output, HTML, email content, or CSV or XML files, you’ll probably want to use
str and text-mode files.
Notice that the mode string argument to built-in function open (its second argument)
becomes fairly crucial in Python 3.X—its content not only specifies a file processing
mode, but also implies a Python object type. By adding a b to the mode string, you specify
binary mode and will receive, or must provide, a bytes object to represent the file’s
content when reading or writing. Without the b, your file is processed in text mode,
and you’ll use str objects to represent its content in your script. For example, the modes
rb, wb, and rb+ imply bytes; r, w+, and rt (the default) imply str.
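For example, here is a short sketch of the mode/type pairing, using a scratch file and
an explicit UTF-8 encoding so the bytes shown don't depend on your platform's
default:

>>> open('temp.txt', 'w', encoding='utf8').write('spÄm')    # Text mode: str, encoded
4
>>> open('temp.txt', 'rb').read()                           # Binary mode: raw bytes
b'sp\xc3\x84m'
>>> open('temp.txt', 'r', encoding='utf8').read()           # Text mode: decoded str
'spÄm'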
Text-mode files also handle the byte order marker (BOM) sequence that may appear at
the start of files under some encoding schemes. In the UTF-16 and UTF-32 encodings,
for example, the BOM specifies big- or little-endian format (essentially, which end of
a bit-string is most significant)—see the leading bytes in the results of the UTF-16 and
UTF-32 encoding calls we ran earlier for examples. A UTF-8 text file might also include
a BOM to declare that it is UTF-8 in general. When reading and writing data using
these encoding schemes, Python skips or writes the BOM according to rules we’ll study
later in this chapter.
In Python 2.X, the same behavior is supported, but normal files created by open are
used to access bytes-based data, and Unicode files opened with the codecs.open call are
used to process Unicode text data. The latter of these also encode and decode on trans-
fer, as we’ll see later in this chapter. First, let’s explore Python’s Unicode string model
live.
Coding Basic Strings
Let’s step through a few examples that demonstrate how the 3.X string types are used.
One note up front: the code in this section was run with and applies to 3.X only. Still,
basic string operations are generally portable across Python versions. Simple ASCII
strings represented with the str type work the same in 2.X and 3.X (and exactly as we
saw in Chapter 7 of this book).
Moreover, although there is no bytes type in Python 2.X (it has just the general str), it
can usually run code that thinks there is—in 2.6 and 2.7, the call bytes(X) is present
as a synonym for str(X), and the new literal form b'...' is taken to be the same as the
normal string literal '...'. You may still run into version skew in some isolated cases,
though; the 2.6/2.7 bytes call, for instance, does not require or allow the second argu-
ment (encoding name) that is required by 3.X’s bytes.
Python 3.X String Literals
Python 3.X string objects originate when you call a built-in function such as str or
bytes, read a file created by calling open (described in the next section), or code literal
syntax in your script. For the latter, a new literal form, b'xxx' (and equivalently,
B'xxx') is used to create bytes objects in 3.X, and you may create bytearray objects by
calling the bytearray function, with a variety of possible arguments.
More formally, in 3.X all the current string literal forms—'xxx', "xxx", and triple-
quoted blocks—generate a str; adding a b or B just before any of them creates a
bytes instead. This new b'...' bytes literal is similar in form to the r'...' raw string
used to suppress backslash escapes. Consider the following, run in 3.X:
C:\code> C:\python33\python
>>> B = b'spam' # 3.X bytes literal makes a bytes object (8-bit bytes)
>>> S = 'eggs' # 3.X str literal makes a Unicode text string
>>> type(B), type(S)
(<class 'bytes'>, <class 'str'>)
>>> B # bytes: sequence of int, prints as character string
b'spam'
>>> S
'eggs'
The 3.X bytes object is actually a sequence of short integers, though it prints its content
as characters whenever possible:
>>> B[0], S[0] # Indexing returns an int for bytes, str for str
(115, 'e')
>>> B[1:], S[1:] # Slicing makes another bytes or str object
(b'pam', 'ggs')
>>> list(B), list(S)
([115, 112, 97, 109], ['e', 'g', 'g', 's']) # bytes is really 8-bit small ints
The bytes object is also immutable, just like str (though bytearray, described later, is
not); you cannot assign a str, bytes, or integer to an offset of a bytes object.
>>> B[0] = 'x' # Both are immutable
TypeError: 'bytes' object does not support item assignment
>>> S[0] = 'x'
TypeError: 'str' object does not support item assignment
Finally, note that the bytes literal’s b or B prefix also works for any string literal form,
including triple-quoted blocks, though you get back a string of raw bytes that may or
may not map to characters:
>>> # bytes prefix works on single, double, triple quotes, raw
>>> B = B"""
... xxxx
... yyyy
... """
>>> B
b'\nxxxx\nyyyy\n'
Python 2.X Unicode literals in Python 3.3
Python 2.X’s u'xxx' and U'xxx' Unicode string literal forms were removed in Python
3.0 because they were deemed redundant—normal strings are Unicode in 3.X. To aid
both forward and backward compatibility, though, they are available again as of 3.3,
where they are treated as normal str strings:
C:\code> C:\python33\python
>>> U = u'spam' # 2.X Unicode literal accepted in 3.3+
>>> type(U) # It is just str, but is backward compatible
<class 'str'>
>>> U
'spam'
>>> U[0]
's'
>>> list(U)
['s', 'p', 'a', 'm']
These literals are gone in 3.0 through 3.2, where you must use 'xxx' instead. You should
generally use 3.X 'xxx' text literals in new 3.X-only code, because the 2.X form is
superfluous. However, in 3.3 and later, using the 2.X literal form can ease the task of
porting 2.X code, and boost 2.X code compatibility (for a case in point, see Chap-
ter 25’s currency example, described in an upcoming note). Regardless of how text
strings are coded in 3.X, though, they are all Unicode, even if they contain only ASCII
characters (more on writing non-ASCII Unicode text in the section “Coding Non-ASCII
Text” on page 1179).
Python 2.X String Literals
All three of the 3.X string forms of the prior section can be coded in 2.X, but their
meaning differs. As mentioned earlier, in Python 2.6 and 2.7 the b'xxx' bytes literal is
present for forward compatibility with 3.X, but is the same as 'xxx' and makes a str
(the b is ignored), and bytes is just a synonym for str; as you’ve seen, in 3.X both of
these address the distinct bytes type:
C:\code> C:\python27\python
>>> B = b'spam' # 3.X bytes literal is just str in 2.6/2.7
>>> S = 'eggs' # str is a bytes/character sequence
>>> type(B), type(S)
(<type 'str'>, <type 'str'>)
>>> B, S
('spam', 'eggs')
>>> B[0], S[0]
('s', 'e')
>>> list(B), list(S)
(['s', 'p', 'a', 'm'], ['e', 'g', 'g', 's'])
In 2.X the special Unicode literal and type accommodate richer forms of text:
>>> U = u'spam' # 2.X Unicode literal makes a distinct type
>>> type(U) # Works in 3.3 too, but is just a str there
<type 'unicode'>
>>> U
u'spam'
>>> U[0]
u's'
>>> list(U)
[u's', u'p', u'a', u'm']
As we saw, for compatibility this form works in 3.3 and later too, but it simply makes
a normal str there (the u is ignored).
String Type Conversions
Although Python 2.X allowed str and unicode type objects to be mixed in expressions
(when the str contained only 7-bit ASCII text), 3.X draws a much sharper distinction:
str and bytes type objects never mix automatically in expressions and are never
converted to one another automatically when passed to functions. A function that
expects an argument to be a str object won't generally accept a bytes, and vice versa.
Because of this, Python 3.X basically requires that you commit to one type or the other,
or perform manual, explicit conversions when needed:
str.encode() and bytes(S, encoding) translate a string to its raw bytes form and
create an encoded bytes from a decoded str in the process.
bytes.decode() and str(B, encoding) translate raw bytes into their string form and
create a decoded str from an encoded bytes in the process.
These encode and decode methods (as well as file objects, described in the next section)
use either a default encoding for your platform or an explicitly passed-in encoding
name. For example, in Python 3.X:
>>> S = 'eggs'
>>> S.encode() # str->bytes: encode text into raw bytes
b'eggs'
>>> bytes(S, encoding='ascii') # str->bytes, alternative
b'eggs'
>>> B = b'spam'
>>> B.decode() # bytes->str: decode raw bytes into text
'spam'
>>> str(B, encoding='ascii') # bytes->str, alternative
'spam'
Two cautions here. First of all, your platform’s default encoding is available in the
sys module, but the encoding argument to bytes is not optional, even though it is in
str.encode (and bytes.decode).
Second, although calls to str do not require the encoding argument like bytes does,
leaving it off in str calls does not mean that it defaults—instead, a str call without an
encoding returns the bytes object’s print string, not its str converted form (this is usu-
ally not what you’ll want!). Assuming B and S are still as in the prior listing:
>>> import sys
>>> sys.platform # Underlying platform
'win32'
>>> sys.getdefaultencoding() # Default encoding for str here
'utf-8'
>>> bytes(S)
TypeError: string argument without an encoding
>>> str(B) # str without encoding
"b'spam'" # A print string, not conversion!
>>> len(str(B))
7
>>> len(str(B, encoding='ascii')) # Use encoding to convert to str
4
When in doubt, pass in an encoding name argument in 3.X, even if it may have a default.
Conversions are similar in Python 2.X, though 2.X’s support for mixing string types in
expressions makes conversions optional for ASCII text, and the tool names differ for
the different string type model—conversions in 2.X occur between encoded str and
decoded unicode, rather than 3.X’s encoded bytes and decoded str:
>>> S = 'spam' # 2.X type string conversion tools
>>> U = u'eggs'
>>> S, U
('spam', u'eggs')
>>> unicode(S), str(U) # 2.X converts str->uni, uni->str
(u'spam', 'eggs')
>>> S.decode(), U.encode() # versus 3.X byte->str, str->bytes
(u'spam', 'eggs')
Coding Unicode Strings
Encoding and decoding become more meaningful when you start dealing with non-
ASCII Unicode text. To code arbitrary Unicode characters in your strings, some of
which you might not even be able to type on your keyboard, Python string literals
support "\xNN" hex byte value escapes, as well as "\uNNNN" and "\UNNNNNNNN" Unicode
escapes. In Unicode escapes, the first form gives four hex digits to
encode a 2-byte (16-bit) character code point, and the second gives eight hex digits for
a 4-byte (32-bit) code point. Byte strings support only hex escapes for encoded text and
other forms of byte-based data.
Coding ASCII Text
Let’s step through some examples that demonstrate text coding basics. As we’ve seen,
ASCII text is a simple type of Unicode, stored as a sequence of byte values that represent
characters:
C:\code> C:\python33\python
>>> ord('X') # 'X' is binary code point value 88 in the default encoding
88
>>> chr(88) # 88 stands for character 'X'
'X'
>>> S = 'XYZ' # A Unicode string of ASCII text
>>> S
'XYZ'
>>> len(S) # Three characters long
3
>>> [ord(c) for c in S] # Three characters with integer ordinal values
[88, 89, 90]
Normal 7-bit ASCII text like this is represented with one character per byte under each
of the Unicode encoding schemes described earlier in this chapter:
>>> S.encode('ascii') # Values 0..127 in 1 byte (7 bits) each
b'XYZ'
>>> S.encode('latin-1') # Values 0..255 in 1 byte (8 bits) each
b'XYZ'
>>> S.encode('utf-8') # Values 0..127 in 1 byte, 128..2047 in 2, others 3 or 4
b'XYZ'
In fact, the bytes objects returned by encoding ASCII text this way are really a sequence
of short integers, which just happen to print as ASCII characters when possible:
>>> S.encode('latin-1')
b'XYZ'
>>> S.encode('latin-1')[0]
88
>>> list(S.encode('latin-1'))
[88, 89, 90]
Coding Non-ASCII Text
Formally, to code non-ASCII characters, we can use:
Hex or Unicode escapes to embed Unicode code point ordinal values in text strings
—normal string literals in 3.X, and Unicode string literals in 2.X (and in 3.3 for
compatibility).
Hex escapes to embed the encoded representation of characters in byte strings—
normal string literals in 2.X, and bytes string literals in 3.X (and in 2.X for
compatibility).
Note that text strings embed actual code point values, while byte strings embed their
encoded form. The value of a character’s encoded representation in a byte string is the
same as its decoded Unicode code point value in a text string for only certain characters
and encodings. In any event, hex escapes are limited to coding a single byte’s value,
but Unicode escapes can name characters with values 2 and 4 bytes wide. The chr
function can also be used to create a single non-ASCII character from its code point
value, and as we’ll see later, source code declarations apply to such characters embed-
ded in your script.
For instance, the hex values 0xC4 and 0xE8 are codes for two special accented characters
outside the 7-bit range of ASCII, but we can embed them in 3.X str objects because
str supports Unicode:
>>> chr(0xc4) # 0xC4, 0xE8: characters outside ASCII's range
'Ä'
>>> chr(0xe8)
'è'
>>> S = '\xc4\xe8' # Single 8-bit value hex escapes: two digits
>>> S
'Äè'
>>> S = '\u00c4\u00e8' # 16-bit Unicode escapes: four digits each
>>> S
'Äè'
>>> len(S) # Two characters long (not number of bytes!)
2
Note that in Unicode text string literals like these, hex and Unicode escapes denote a
Unicode code point value, not byte values. The x hex escapes require exactly two digits
(for 8-bit code point values), and u and U Unicode escapes require exactly four and eight
hexadecimal digits, respectively, for denoting code point values that can be as big as
16 and 32 bits will allow:
>>> S = '\U000000c4\U000000e8' # 32-bit Unicode escapes: eight digits each
>>> S
'Äè'
As shown later, Python 2.X works similarly in this regard, but Unicode escapes are
allowed only in its Unicode literal form. They work in normal string literals in 3.X here
simply because its normal strings are always Unicode.
Encoding and Decoding Non-ASCII Text
Now, if we try to encode the prior section's non-ASCII text string into raw bytes
as ASCII, we'll get an error, because its characters are outside ASCII's 7-bit code point
value range:
>>> S = '\u00c4\u00e8' # Non-ASCII text string, two characters long
>>> S
'Äè'
>>> len(S)
2
>>> S.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1:
ordinal not in range(128)
Encoding this as Latin-1 works, though, because each character falls into that encod-
ing’s 8-bit range, and we get 1 byte per character allocated in the encoded byte string.
Encoding as UTF-8 also works: this encoding supports a wide range of Unicode code
points, but allocates 2 bytes per non-ASCII character instead. If these encoded strings
are written to a file, the raw bytes shown here for encoding results are what is actually
stored on the file for the encoding types given:
>>> S.encode('latin-1') # 1 byte per character when encoded
b'\xc4\xe8'
>>> S.encode('utf-8') # 2 bytes per character when encoded
b'\xc3\x84\xc3\xa8'
>>> len(S.encode('latin-1')) # 2 bytes in latin-1, 4 in utf-8
2
>>> len(S.encode('utf-8'))
4
Note that you can also go the other way, reading raw bytes from a file and decoding
them back to a Unicode string. However, as we’ll see later, the encoding mode you give
to the open call causes this decoding to be done for you automatically on input (and
avoids issues that may arise from reading partial character sequences when reading by
blocks of bytes):
>>> B = b'\xc4\xe8' # Text encoded per Latin-1
>>> B
b'\xc4\xe8'
>>> len(B) # 2 raw bytes, two encoded characters
2
>>> B.decode('latin-1') # Decode to text per Latin-1
'Äè'
>>> B = b'\xc3\x84\xc3\xa8' # Text encoded per UTF-8
>>> len(B) # 4 raw bytes, two encoded characters
4
>>> B.decode('utf-8') # Decode to text per UTF-8
'Äè'
>>> len(B.decode('utf-8')) # Two Unicode characters in memory
2
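To make the parenthetical's point about block reads concrete, here is a hedged sketch of what can go wrong when a read boundary splits a multibyte character (error message shortened):
>>> B = b'\xc3\x84\xc3\xa8'         # Two characters encoded per UTF-8
>>> B[:3].decode('utf-8')           # A 3-byte "block" splits the second
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 2: ...
>>> B.decode('utf-8')               # Whole sequences decode fine
'Äè'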
Other Encoding Schemes
Some encodings use even larger byte sequences to represent characters. When needed,
you can specify both 16- and 32-bit Unicode code point values for characters in your
strings—as shown earlier, we can use "\u..." with four hex digits for the former, and
"\U..." with eight hex digits for the latter, and can mix these in literals with simpler
ASCII characters freely:
>>> S = 'A\u00c4B\U000000e8C'
>>> S # A, B, C, and 2 non-ASCII characters
'AÄBèC'
>>> len(S) # Five characters long
5
>>> S.encode('latin-1')
b'A\xc4B\xe8C'
>>> len(S.encode('latin-1')) # 5 bytes when encoded per latin-1
5
>>> S.encode('utf-8')
b'A\xc3\x84B\xc3\xa8C'
>>> len(S.encode('utf-8')) # 7 bytes when encoded per utf-8
7
Technically speaking, you can also build Unicode strings piecemeal using chr instead
of Unicode or hex escapes, but this might become tedious for large strings:
>>> S = 'A' + chr(0xC4) + 'B' + chr(0xE8) + 'C'
>>> S
'AÄBèC'
Some other encodings may use very different byte formats, though. The cp500 EBCDIC
encoding, for example, doesn’t even encode ASCII the same way as the encodings we’ve
been using so far; since Python encodes and decodes for us, we generally need to
care about this only when providing encoding names for data sources:
>>> S
'AÄBèC'
>>> S.encode('cp500') # Two other Western European encodings
b'\xc1c\xc2T\xc3'
>>> S.encode('cp850') # 5 bytes each, different encoded values
b'A\x8eB\x8aC'
>>> S = 'spam' # ASCII text is the same in most
>>> S.encode('latin-1')
b'spam'
>>> S.encode('utf-8')
b'spam'
>>> S.encode('cp500') # But not in cp500: IBM EBCDIC!
b'\xa2\x97\x81\x94'
>>> S.encode('cp850')
b'spam'
The same holds true for the UTF-16 and UTF-32 encodings, which use fixed 2- and 4-
byte-per-character schemes with same-sized headers—non-ASCII encodes differently,
and ASCII is not 1 byte per character:
>>> S = 'A\u00c4B\U000000e8C'
>>> S.encode('utf-16')
b'\xff\xfeA\x00\xc4\x00B\x00\xe8\x00C\x00'
>>> S = 'spam'
>>> S.encode('utf-16')
b'\xff\xfes\x00p\x00a\x00m\x00'
>>> S.encode('utf-32')
b'\xff\xfe\x00\x00s\x00\x00\x00p\x00\x00\x00a\x00\x00\x00m\x00\x00\x00'
Byte String Literals: Encoded Text
Two cautions here too. First, Python 3.X allows special characters to be coded with
both hex and Unicode escapes in str strings, but only with hex escapes in bytes strings
—Unicode escape sequences are silently taken verbatim in bytes literals, not as escapes.
In fact, bytes must be decoded to str strings to print their non-ASCII characters prop-
erly:
>>> S = 'A\xC4B\xE8C' # 3.X: str recognizes hex and Unicode escapes
>>> S
'AÄBèC'
>>> S = 'A\u00C4B\U000000E8C'
>>> S
'AÄBèC'
>>> B = b'A\xC4B\xE8C' # bytes recognizes hex but not Unicode
>>> B
b'A\xc4B\xe8C'
>>> B = b'A\u00C4B\U000000E8C' # Escape sequences taken literally!
>>> B
b'A\\u00C4B\\U000000E8C'
>>> B = b'A\xC4B\xE8C' # Use hex escapes for bytes
>>> B # Prints non-ASCII as hex
b'A\xc4B\xe8C'
>>> print(B)
b'A\xc4B\xe8C'
>>> B.decode('latin-1') # Decode as latin-1 to interpret as text
'AÄBèC'
Second, bytes literals require characters either to be ASCII characters or, if their values
are greater than 127, to be escaped; str strings, on the other hand, allow literals con-
taining any character in the source character set—which, as discussed later, defaults
to UTF-8 unless an encoding declaration is given in the source file:
>>> S = 'AÄBèC' # Chars from UTF-8 if no encoding declaration
>>> S
'AÄBèC'
>>> B = b'AÄBèC'
SyntaxError: bytes can only contain ASCII literal characters.
>>> B = b'A\xC4B\xE8C' # Chars must be ASCII, or escapes
>>> B
b'A\xc4B\xe8C'
>>> B.decode('latin-1')
'AÄBèC'
>>> S.encode() # Source code encoded per UTF-8 by default
b'A\xc3\x84B\xc3\xa8C' # Uses system default to encode, unless passed
>>> S.encode('utf-8')
b'A\xc3\x84B\xc3\xa8C'
>>> B.decode() # Raw bytes do not correspond to utf-8
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: ...
Both these constraints make sense if you remember that byte strings hold bytes-based
data, not decoded Unicode code point ordinals; while they may contain the encoded
form of text, decoded code point values don’t quite apply to byte strings unless the
characters are first encoded.
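A brief sketch may help nail down when the two values do coincide: under Latin-1, a character's encoded byte equals its code point, but under UTF-8 it generally does not:
>>> ord('è'), hex(ord('è'))         # Code point of 'è'
(232, '0xe8')
>>> 'è'.encode('latin-1')           # Encoded byte == code point here
b'\xe8'
>>> 'è'.encode('utf-8')             # But not in utf-8's multibyte format
b'\xc3\xa8'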
Converting Encodings
So far, we’ve been encoding and decoding strings to inspect their structure. It’s also
possible to convert a string to a different encoding than its original, but we must provide
an explicit encoding name to encode to and decode from. This is true whether the
original text string originated in a file or a literal.
The term conversion may be a misnomer here—it really just means encoding a text
string to raw bytes per a different encoding scheme than the one it was decoded from.
As stressed earlier, decoded text in memory has no encoding type, and is simply a string
of Unicode code points (a.k.a. characters); there is no concept of changing its encoding
in this form. Still, this scheme allows scripts to read data in one encoding and store it
in another, to support multiple clients of the same data:
>>> B = b'A\xc3\x84B\xc3\xa8C' # Text encoded in UTF-8 format originally
>>> S = B.decode('utf-8') # Decode to Unicode text per UTF-8
>>> S
'AÄBèC'
>>> T = S.encode('cp500') # Convert to encoded bytes per EBCDIC
>>> T
b'\xc1c\xc2T\xc3'
>>> U = T.decode('cp500') # Convert back to Unicode per EBCDIC
>>> U
'AÄBèC'
>>> U.encode() # Per default utf-8 encoding again
b'A\xc3\x84B\xc3\xa8C'
Keep in mind that the special Unicode and hex character escapes are only necessary
when you code non-ASCII Unicode strings manually. In practice, you’ll often load such
text from files instead. As we’ll see later in this chapter, 3.X’s file object (created with
the open built-in function) automatically decodes text strings as they are read and
encodes them when they are written; because of this, your script can often deal with strings
generically, without having to code special characters directly.
Later in this chapter we’ll also see that it’s possible to convert between encodings when
transferring strings to and from files, using a technique very similar to that in the last
example; although you’ll still need to provide explicit encoding names when opening
a file, the file interface does most of the conversion work for you automatically.
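Though we'll study files in depth ahead, a minimal hedged sketch of such a file-based conversion, using an arbitrary file name data.txt:
>>> open('data.txt', 'w', encoding='latin-1').write('AÄBèC')   # Store per latin-1
5
>>> S = open('data.txt', 'r', encoding='latin-1').read()       # Decode on input
>>> open('data.txt', 'w', encoding='utf-8').write(S)           # Re-encode per utf-8
5
>>> open('data.txt', 'rb').read()                              # New raw format on file
b'A\xc3\x84B\xc3\xa8C'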
Coding Unicode Strings in Python 2.X
I stress Python 3.X Unicode support in this chapter because it’s new. But now that I’ve
shown you the basics of Unicode strings in 3.X, I need to explain more fully how you
can do much the same in 2.X, though the tools differ. unicode is available in Python
2.X, but is a distinct type from str, supports most of the same operations, and allows
mixing of normal and Unicode strings when the str is all ASCII.
In fact, you can essentially pretend 2.X’s str is 3.X’s bytes when it comes to decoding
raw bytes into a Unicode string, as long as it’s in the proper form. Here is 2.X in action;
Unicode characters display in hex in 2.X unless you explicitly print, and non-ASCII
displays can vary per shell (most of this section ran outside IDLE, which sometimes
detects and prints Latin-1 characters in encoded byte strings—see ahead for more on
PYTHONIOENCODING and Windows Command Prompt display issues):
C:\code> C:\python27\python
>>> S = 'A\xC4B\xE8C' # String of 8-bit bytes
>>> S # Text encoded per Latin-1, some non-ASCII
'A\xc4B\xe8C'
>>> print S # Nonprintable characters (IDLE may differ)
ABΦC
>>> U = S.decode('latin1') # Decode bytes to Unicode text per latin-1
>>> U
u'A\xc4B\xe8C'
>>> print U
AÄBèC
>>> S.decode('utf-8') # Encoded form not compatible with utf-8
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc4 in position 1: invalid c
ontinuation byte
>>> S.decode('ascii') # Encoded bytes are also outside ASCII range
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 1: ordinal
not in range(128)
To code Unicode text, make a unicode object with the u'xxx' literal form (as mentioned,
this literal is available again in 3.3, but superfluous in 3.X in general, since its normal
strings support Unicode):
>>> U = u'A\xC4B\xE8C' # Make Unicode string, hex escapes
>>> U
u'A\xc4B\xe8C'
>>> print U
AÄBèC
Once you’ve created it, you can convert Unicode text to different raw byte encodings,
similar to encoding str objects into bytes objects in 3.X:
>>> U.encode('latin-1') # Encode per latin-1: 8-bit bytes
'A\xc4B\xe8C'
>>> U.encode('utf-8') # Encode per utf-8: multibyte
'A\xc3\x84B\xc3\xa8C'
Non-ASCII characters can be coded with hex or Unicode escapes in string literals in
2.X, just as in 3.X. However, as with bytes in 3.X, the "\u..." and "\U..." escapes are
recognized only for unicode strings in 2.X, not 8-bit str strings—again, these are used
to give the values of decoded Unicode ordinal integers, which don’t make sense in a
raw byte string:
C:\code> C:\python27\python
>>> U = u'A\xC4B\xE8C' # Hex escapes for non-ASCII
>>> U
u'A\xc4B\xe8C'
>>> print U
AÄBèC
>>> U = u'A\u00C4B\U000000E8C' # Unicode escapes for non-ASCII
>>> U # \u escapes = 16 bits, \U escapes = 32 bits
u'A\xc4B\xe8C'
>>> print U
AÄBèC
>>> S = 'A\xC4B\xE8C' # Hex escapes work
>>> S
'A\xc4B\xe8C'
>>> print S # But some may print oddly, unless decoded
ABΦC
>>> print S.decode('latin-1')
AÄBèC
>>> S = 'A\u00C4B\U000000E8C' # Not Unicode escapes: taken literally!
>>> S
'A\\u00C4B\\U000000E8C'
>>> print S
A\u00C4B\U000000E8C
>>> len(S)
19
Mixing string types in 2.X
Like 3.X’s str and bytes, 2.X’s unicode and str share nearly identical operation sets,
so unless you need to convert to other encodings you can often treat unicode as though
it were str. One of the primary differences between 2.X and 3.X, though, is that
unicode and non-Unicode str objects can be freely mixed in 2.X expressions—as long as
the str is compatible with the unicode object, Python will automatically convert it up
to unicode:
>>> u'ab' + 'cd' # Can mix if compatible in 2.X
u'abcd' # But 'ab' + b'cd' not allowed in 3.X
However, this liberal approach to mixing string types in 2.X works only if the 8-bit
string happens to contain only 7-bit (ASCII) bytes:
>>> S = 'A\xC4B\xE8C' # Can't mix in 2.X if str is non-ASCII!
>>> U = u'A\xC4B\xE8C'
>>> S + U
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 1: ordinal
not in range(128)
>>> 'abc' + U # Can mix only if str is all 7-bit ASCII
u'abcA\xc4B\xe8C'
>>> print 'abc' + U # Use print to display characters
abcAÄBèC
>>> S.decode('latin-1') + U # Manual conversion may be required in 2.X too
u'A\xc4B\xe8CA\xc4B\xe8C'
>>> print S.decode('latin-1') + U
AÄBèCAÄBèC
>>> print u'\xA3' + '999.99' # Also see Chapter 25's currency example
£999.99
By contrast, in 3.X, str and bytes never mix automatically and require manual con-
versions—the preceding code actually runs in 3.3, but only because 2.X’s Unicode
literal is taken to be the same as a normal string by 3.X (the u is ignored); the 3.X
equivalent would be a str added to a bytes (i.e., 'ab' + b'cd') which fails in 3.X, unless
objects are converted to a common type.
In 2.X, though, the difference in types is often trivial to your code. Like normal strings,
Unicode strings may be concatenated, indexed, sliced, matched with the re module,
and so on, and they cannot be changed in place. If you ever need to convert between
the two types explicitly, you can use the built-in str and unicode functions as shown
earlier:
>>> str(u'spam') # Unicode to normal
'spam'
>>> unicode('spam') # Normal to Unicode
u'spam'
If you are using Python 2.X, also watch for an example of its different file interface
later in this chapter. The 2.X open call supports only files of 8-bit bytes, returning their
contents as str strings, and it's up to you to interpret the contents as text or binary data
and decode if needed. To read and write Unicode files and encode or decode their
content automatically, use the 2.X codecs.open call, which we'll see in action later in this chapter.
This call provides much the same functionality as 3.X’s open and uses 2.X unicode
objects to represent file content—reading a file translates encoded bytes into decoded
Unicode characters, and writing translates strings to the desired encoding specified
when the file is opened.
Source File Character Set Encoding Declarations
Finally, Unicode escape codes are fine for the occasional Unicode character in string
literals, but they can become tedious if you need to embed non-ASCII text in your
strings frequently. To interpret the content of strings you code and hence embed within
the text of your script files, Python uses the UTF-8 encoding by default in 3.X (2.X
assumes ASCII), but it allows you to change this to support arbitrary character sets by
including a comment that names your desired encoding. The comment must be of this
form and must appear as either the first or second line in your script in either Python
2.X or 3.X:
# -*- coding: latin-1 -*-
When a comment of this form is present, Python will recognize strings represented
natively in the given encoding. This means you can edit your script file in a text editor
that accepts and displays accented and other non-ASCII characters correctly, and
Python will decode them correctly in your string literals. For example, notice how the
comment at the top of the following file, text.py, allows Latin-1 characters to be em-
bedded in strings, which are themselves embedded in the script file’s text:
# -*- coding: latin-1 -*-
# Any of the following string literal forms work in latin-1.
# Changing the encoding above to either ascii or utf-8 fails,
# because the 0xc4 and 0xe8 in myStr1 are not valid in either.
myStr1 = 'aÄBèC'
myStr2 = 'A\u00c4B\U000000e8C'
myStr3 = 'A' + chr(0xC4) + 'B' + chr(0xE8) + 'C'
import sys
print('Default encoding:', sys.getdefaultencoding())
for aStr in myStr1, myStr2, myStr3:
    print('{0}, strlen={1}, '.format(aStr, len(aStr)), end='')
    bytes1 = aStr.encode()              # Per default utf-8: 2 bytes for non-ASCII
    bytes2 = aStr.encode('latin-1')     # One byte per char
    #bytes3 = aStr.encode('ascii')      # ASCII fails: outside 0..127 range
    print('byteslen1={0}, byteslen2={1}'.format(len(bytes1), len(bytes2)))
When run, this script produces the following output, giving, for each of three coding
techniques, the string, its length, and the lengths of its UTF-8 and Latin-1 encoded byte
string forms:
C:\code> C:\python33\python text.py
Default encoding: utf-8
aÄBèC, strlen=5, byteslen1=7, byteslen2=5
AÄBèC, strlen=5, byteslen1=7, byteslen2=5
AÄBèC, strlen=5, byteslen1=7, byteslen2=5
Since many programmers are likely to fall back on the standard UTF-8 encoding, I’ll
defer to Python’s standard manual set for more details on this option and other ad-
vanced Unicode support topics, such as properties and character name escapes in
strings I’m omitting here. For this chapter, let’s take a quick look at the new byte string
object types in Python 3.X, before moving on to its file and tool changes.
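Before we do, though, one quick hedged taste of the character name escapes just mentioned, which name characters by their official Unicode names instead of code point values:
>>> '\N{LATIN CAPITAL LETTER A WITH DIAERESIS}'     # Same character as '\xc4'
'Ä'
>>> 'A\N{LATIN SMALL LETTER E WITH GRAVE}C'         # Mixes with ASCII freely
'AèC'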
For an additional example of non-ASCII character coding and source
file declarations, see the currency symbols used in the money formatting
example of Chapter 25, as well as its associated file in this book’s ex-
amples package, formats_currency2.py. The latter requires a source-file
declaration to be usable by Python, because it embeds non-ASCII cur-
rency symbol characters. This example also illustrates the portability
gains possible when using 2.X’s Unicode literal in 3.X code in 3.3 and
later.
Using 3.X bytes Objects
We studied a wide variety of operations available for Python 3.X’s general str string
type in Chapter 7; the basic string type works identically in 2.X and 3.X, so we won’t
rehash this topic. Instead, let’s dig a bit deeper into the operation sets provided by the
new bytes type in 3.X.
As mentioned previously, the 3.X bytes object is a sequence of small integers, each of
which is in the range 0 through 255, that happens to print as ASCII characters when
displayed. It supports sequence operations and most of the same methods available on
str objects (and present in 2.X's str type). However, bytes does not support the
format method or the % formatting expression, and you cannot mix and match bytes and
str type objects without explicit conversions—you generally will use all str type objects
and text files for text data, and all bytes type objects and binary files for binary data.
Method Calls
If you really want to see what attributes str has that bytes doesn’t, you can always
check their dir built-in function results. The output can also tell you something about
the expression operators they support (e.g., __mod__ and __rmod__ implement the %
operator):
C:\code> C:\python33\python
# Attributes in str but not bytes
>>> set(dir('abc')) - set(dir(b'abc'))
{'isdecimal', '__mod__', '__rmod__', 'format_map', 'isprintable',
'casefold', 'format', 'isnumeric', 'isidentifier', 'encode'}
# Attributes in bytes but not str
>>> set(dir(b'abc')) - set(dir('abc'))
{'decode', 'fromhex'}
As you can see, str and bytes have almost identical functionality. Their unique at-
tributes are generally methods that don’t apply to the other; for instance, decode trans-
lates a raw bytes into its str representation, and encode translates a string into its raw
bytes representation. Most of the methods are the same, though bytes methods require
bytes arguments (again, 3.X string types don’t mix). Also recall that bytes objects are
immutable, just like str objects in both 2.X and 3.X (error messages here have been
shortened for brevity):
>>> B = b'spam' # b'...' bytes literal
>>> B.find(b'pa')
1
>>> B.replace(b'pa', b'XY') # bytes methods expect bytes arguments
b'sXYm'
>>> B.split(b'pa') # bytes methods return bytes results
[b's', b'm']
>>> B
b'spam'
>>> B[0] = 'x'
TypeError: 'bytes' object does not support item assignment
One notable difference is that string formatting works only on str objects in 3.X, not
on bytes objects, at least as of the 3.3 used here; Python 3.5 restored a % operator for
bytes per PEP 461, though format remains str-only (see Chapter 7 for more on string
formatting expressions and methods):
>>> '%s' % 99
'99'
>>> b'%s' % 99
TypeError: unsupported operand type(s) for %: 'bytes' and 'int'
>>> '{0}'.format(99)
'99'
>>> b'{0}'.format(99)
AttributeError: 'bytes' object has no attribute 'format'
Sequence Operations
Besides method calls, all the usual generic sequence operations you know (and possibly
love) from Python 2.X strings and lists work as expected on both str and bytes in 3.X;
this includes indexing, slicing, concatenation, and so on. Notice in the following that
indexing a bytes object returns an integer giving the byte’s binary value; bytes really is
a sequence of 8-bit integers, but for convenience prints as a string of ASCII-coded char-
acters where possible when displayed as a whole. To check a given byte’s value, use
the chr built-in to convert it back to its character, as in the following:
>>> B = b'spam' # A sequence of small ints
>>> B # Prints as ASCII characters (and/or hex escapes)
b'spam'
>>> B[0] # Indexing yields an int
115
>>> B[-1]
109
>>> chr(B[0]) # Show character for int
's'
>>> list(B) # Show all the byte's int values
[115, 112, 97, 109]
>>> B[1:], B[:-1]
(b'pam', b'spa')
>>> len(B)
4
>>> B + b'lmn'
b'spamlmn'
>>> B * 4
b'spamspamspamspam'
Other Ways to Make bytes Objects
So far, we’ve been mostly making bytes objects with the b'...' literal syntax. We can
also create them by calling the bytes constructor with a str and an encoding name,
calling the bytes constructor with an iterable of integers representing byte values, or
encoding a str object per the default (or passed-in) encoding. As we’ve seen, encoding
takes a text str and returns the raw encoded byte values of the string per the encoding
specified; conversely, decoding takes a raw bytes sequence and translates it to its str
text string representation—a series of Unicode characters. Both operations create new
string objects:
>>> B = b'abc' # Literal
>>> B
b'abc'
>>> B = bytes('abc', 'ascii') # Constructor with encoding name
>>> B
b'abc'
>>> ord('a')
97
>>> B = bytes([97, 98, 99]) # Integer iterable
>>> B
b'abc'
>>> B = 'spam'.encode() # str.encode() (or bytes())
>>> B
b'spam'
>>>
>>> S = B.decode() # bytes.decode() (or str())
>>> S
'spam'
From a functional perspective, the last two of these operations are really tools for
converting between str and bytes, a topic introduced earlier and expanded upon in the
next section.
Mixing String Types
In the replace call of the section “Method Calls” on page 1189, we had to pass in two
bytes objects—str types won’t work there. Although Python 2.X automatically con-
verts str to and from unicode when possible (i.e., when the str is 7-bit ASCII text),
Python 3.X requires specific string types in some contexts and expects manual conver-
sions if needed:
# Must pass expected types to function and method calls
>>> B = b'spam'
>>> B.replace('pa', 'XY')
TypeError: expected an object with the buffer interface
>>> B.replace(b'pa', b'XY')
b'sXYm'
>>> B = B'spam'
>>> B.replace(bytes('pa'), bytes('xy'))
TypeError: string argument without an encoding
>>> B.replace(bytes('pa', 'ascii'), bytes('xy', 'utf-8'))
b'sxym'
# Must convert manually in 3.X mixed-type expressions
>>> b'ab' + 'cd'
TypeError: can't concat bytes to str
>>> b'ab'.decode() + 'cd' # bytes to str
'abcd'
>>> b'ab' + 'cd'.encode() # str to bytes
b'abcd'
>>> b'ab' + bytes('cd', 'ascii') # str to bytes
b'abcd'
Although you can create bytes objects yourself to represent packed binary data, they
can also be made automatically by reading files opened in binary mode, as we’ll see in
more detail later in this chapter. First, though, let’s introduce bytes’s very close, and
mutable, cousin.
Using 3.X/2.6+ bytearray Objects
So far we’ve focused on str and bytes, because they subsume Python 2’s unicode and
str. Python 3.X grew a third string type, though—bytearray, a mutable sequence of
integers in the range 0 through 255, which is a mutable variant of bytes. As such, it
supports the same string methods and sequence operations as bytes, as well as many
of the mutable in-place-change operations supported by lists.
Bytearrays support in-place changes to both truly binary data and simple forms
of text such as ASCII, which can be represented with 1 byte per character (richer Uni-
code text generally requires Unicode strings, which are still immutable). The
bytearray type is also available in Python 2.6 and 2.7 as a back-port from 3.X, but it does not
enforce the strict text/binary distinction there that it does in 3.X.
bytearrays in Action
Let’s take a quick tour. We can create bytearray objects by calling the bytearray built-
in. In Python 2.X, any string may be used to initialize it:
# Creation in 2.6/2.7: a mutable sequence of small (0..255) ints
>>> S = 'spam'
>>> C = bytearray(S) # A back-port from 3.X in 2.6+
>>> C # b'..' == '..' in 2.6+ (str)
bytearray(b'spam')
In Python 3.X, an encoding name or byte string is required, because text and binary
strings do not mix (though byte strings may reflect encoded Unicode text):
# Creation in 3.X: text/binary do not mix
>>> S = 'spam'
>>> C = bytearray(S)
TypeError: string argument without an encoding
>>> C = bytearray(S, 'latin1') # A content-specific type in 3.X
>>> C
bytearray(b'spam')
>>> B = b'spam' # b'..' != '..' in 3.X (bytes/str)
>>> C = bytearray(B)
>>> C
bytearray(b'spam')
Once created, bytearray objects are sequences of small integers like bytes and are mu-
table like lists, though they require an integer for index assignments, not a string (all
of the following is a continuation of this session and is run under Python 3.X unless
otherwise noted—see comments for 2.X usage notes):
# Mutable, but must assign ints, not strings
>>> C[0]
115
>>> C[0] = 'x' # This and the next work in 2.6/2.7
TypeError: an integer is required
>>> C[0] = b'x'
TypeError: an integer is required
>>> C[0] = ord('x') # Use ord() to get a character's ordinal
>>> C
bytearray(b'xpam')
>>> C[1] = b'Y'[0] # Or index a byte string
>>> C
bytearray(b'xYam')
Processing bytearray objects borrows from both strings and lists, since they are mutable
byte strings. While the bytearray's methods overlap with both str and bytes, it also
has many of the list's mutable methods. Besides named methods, the __iadd__ and
__setitem__ methods in bytearray implement += in-place concatenation and index as-
signment, respectively:
# in bytes but not bytearray
>>> set(dir(b'abc')) - set(dir(bytearray(b'abc')))
{'__getnewargs__'}
# in bytearray but not bytes
>>> set(dir(bytearray(b'abc'))) - set(dir(b'abc'))
{'__iadd__', 'reverse', '__setitem__', 'extend', 'copy', '__alloc__',
'__delitem__', '__imul__', 'remove', 'clear', 'insert', 'append', 'pop'}
You can change a bytearray in place with both index assignment, as you’ve just seen,
and list-like methods like those shown here (to change text in place prior to 2.6, you
would need to convert to and then from a list, with list(str) and ''.join(list)—see
Chapter 4 and Chapter 6 for examples):
# Mutable method calls
>>> C
bytearray(b'xYam')
>>> C.append(b'LMN') # 2.X requires string of size 1
TypeError: an integer is required
>>> C.append(ord('L'))
>>> C
bytearray(b'xYamL')
>>> C.extend(b'MNO')
>>> C
bytearray(b'xYamLMNO')
All the usual sequence operations and string methods work on bytearrays, as you would
expect (notice that like bytes objects, their expressions and methods expect bytes ar-
guments, not str arguments):
# Sequence operations and string methods
>>> C
bytearray(b'xYamLMNO')
>>> C + b'!#'
bytearray(b'xYamLMNO!#')
>>> C[0]
120
>>> C[1:]
bytearray(b'YamLMNO')
>>> len(C)
8
>>> C.replace('xY', 'sp') # This works in 2.X
TypeError: Type str doesn't support the buffer API
>>> C.replace(b'xY', b'sp')
bytearray(b'spamLMNO')
>>> C
bytearray(b'xYamLMNO')
>>> C * 4
bytearray(b'xYamLMNOxYamLMNOxYamLMNOxYamLMNO')
Python 3.X String Types Summary
Finally, by way of summary, the following examples demonstrate how bytes and
bytearray objects are sequences of ints, and str objects are sequences of characters:
# Binary versus text
>>> B # B is same as S in 2.6/2.7
b'spam'
>>> list(B)
[115, 112, 97, 109]
>>> C
bytearray(b'xYamLMNO')
>>> list(C)
[120, 89, 97, 109, 76, 77, 78, 79]
>>> S
'spam'
>>> list(S)
['s', 'p', 'a', 'm']
Although all three Python 3.X string types can contain character values and support
many of the same operations, again, you should always:
Use str for textual data.
Use bytes for binary data.
Use bytearray for binary data you wish to change in place.
Related tools such as files, the next section’s topic, often make the choice for you.
Using Text and Binary Files
This section expands on the impact of Python 3.X’s string model on the file processing
basics introduced earlier in the book. As mentioned earlier, the mode in which you
open a file is crucial—it determines which object type you will use to represent the file’s
content in your script. Text mode implies str objects, and binary mode implies bytes
objects:
Text-mode files interpret file contents according to a Unicode encoding—either the
default for your platform, or one whose name you pass in. By passing in an encoding
name to open, you can force conversions for various types of Unicode files. Text-
mode files also perform universal line-end translations: by default, all line-end
forms map to the single '\n' character in your script, regardless of the platform on
which you run it. As described earlier, text files also handle reading and writing
the byte order mark (BOM) stored at the start-of-file in some Unicode encoding
schemes.
Binary-mode files instead return file content to you raw, as a sequence of integers
representing byte values, with no encoding or decoding and no line-end transla-
tions.
The second argument to open determines whether you want text or binary processing,
just as it does in 2.X Python—adding a b to this string implies binary mode (e.g.,
"rb" to read binary data files). The default mode is "rt"; this is the same as "r", which
means text input (just as in 2.X).
In 3.X, though, this mode argument to open also implies an object type for file content
representation, regardless of the underlying platform—text files return a str for reads
and expect one for writes, but binary files return a bytes for reads and expect one (or
a bytearray) for writes.
Text File Basics
To demonstrate, let’s begin with basic file I/O. As long as you’re processing basic text
files (e.g., ASCII) and don’t care about circumventing the platform-default encoding of
strings, files in 3.X look and feel much as they do in 2.X (for that matter, so do strings
in general). The following, for instance, writes one line of text to a file and reads it back
in 3.X, exactly as it would in 2.X (note that file is no longer a built-in name in 3.X, so
it’s perfectly OK to use it as a variable here):
C:\code> C:\python33\python
# Basic text files (and strings) work the same as in 2.X
>>> file = open('temp', 'w')
>>> size = file.write('abc\n') # Returns number of characters written
>>> file.close() # Manual close to flush output buffer
>>> file = open('temp') # Default mode is "r" (== "rt"): text input
>>> text = file.read()
>>> text
'abc\n'
>>> print(text)
abc
Text and Binary Modes in 2.X and 3.X
In Python 2.X, there is no major distinction between text and binary files—both accept
and return content as str strings. The only major difference is that text files automat-
ically map \n end-of-line characters to and from \r\n on Windows, while binary files
do not (I’m stringing operations together into one-liners here just for brevity):
C:\code> C:\python27\python
>>> open('temp', 'w').write('abd\n') # Write in text mode: adds \r
>>> open('temp', 'r').read() # Read in text mode: drops \r
'abd\n'
>>> open('temp', 'rb').read() # Read in binary mode: verbatim
'abd\r\n'
>>> open('temp', 'wb').write('abc\n') # Write in binary mode
>>> open('temp', 'r').read() # \n not expanded to \r\n
'abc\n'
>>> open('temp', 'rb').read()
'abc\n'
In Python 3.X, things are a bit more complex because of the distinction between str
for text data and bytes for binary data. To demonstrate, let’s write a text file and read
it back in both modes in 3.X. Notice that we are required to provide a str for writing,
but reading gives us a str or a bytes, depending on the open mode:
C:\code> C:\python33\python
# Write and read a text file
>>> open('temp', 'w').write('abc\n') # Text mode output, provide a str
4
>>> open('temp', 'r').read() # Text mode input, returns a str
'abc\n'
>>> open('temp', 'rb').read() # Binary mode input, returns a bytes
b'abc\r\n'
Notice how on Windows text-mode files translate the \n end-of-line character to \r\n
on output; on input, text mode translates the \r\n back to \n, but binary-mode files do
not. This is the same in 2.X, and it’s normally what we want—text files should for
portability map end-of-line markers to and from \n (which is what is actually present
in files in Linux, where no mapping occurs), and such translations should never occur
for binary data (where end-of-line bytes are irrelevant). Although you can control this
behavior with extra open arguments in 3.X if desired, the default usually works well.
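For reference, a minimal sketch of the open argument alluded to here: in 3.X, passing newline='' suppresses the line-end translations in text mode (this transcript assumes Windows, as above):
>>> open('temp', 'w', newline='').write('abc\n')    # No \n-to-\r\n mapping
4
>>> open('temp', 'rb').read()                       # Stored verbatim
b'abc\n'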
Now let’s do the same again, but with a binary file. We provide a bytes to write in this
case, and we still get back a str or a bytes, depending on the input mode:
# Write and read a binary file
>>> open('temp', 'wb').write(b'abc\n') # Binary mode output, provide a bytes
4
>>> open('temp', 'r').read() # Text mode input, returns a str
'abc\n'
>>> open('temp', 'rb').read() # Binary mode input, returns a bytes
b'abc\n'
Note that the \n end-of-line character is not expanded to \r\n in binary-mode output
—again, a desired result for binary data. Type requirements and file behavior are the
same even if the data we’re writing to the binary file is truly binary in nature. In the
following, for example, the "\x00" is a binary zero byte and not a printable character:
# Write and read truly binary data
>>> open('temp', 'wb').write(b'a\x00c') # Provide a bytes
3
>>> open('temp', 'r').read() # Receive a str
'a\x00c'
>>> open('temp', 'rb').read() # Receive a bytes
b'a\x00c'
Binary-mode files always return contents as a bytes object, but accept either a bytes or
bytearray object for writing; this naturally follows, given that bytearray is basically just
a mutable variant of bytes. In fact, most APIs in Python 3.X that accept a bytes also
allow a bytearray:
# bytearrays work too
>>> BA = bytearray(b'\x01\x02\x03')
>>> open('temp', 'wb').write(BA)
3
>>> open('temp', 'r').read()
'\x01\x02\x03'
>>> open('temp', 'rb').read()
b'\x01\x02\x03'
Type and Content Mismatches in 3.X
Notice that you cannot get away with violating Python’s str/bytes type distinction
when it comes to files. As the following examples illustrate, we get errors (shortened
here) if we try to write a bytes to a text file or a str to a binary file (the exact text of the
error messages here is prone to change):
# Types are not flexible for file content
>>> open('temp', 'w').write('abc\n') # Text mode makes and requires str
4
>>> open('temp', 'w').write(b'abc\n')
TypeError: must be str, not bytes
>>> open('temp', 'wb').write(b'abc\n') # Binary mode makes and requires bytes
4
>>> open('temp', 'wb').write('abc\n')
TypeError: 'str' does not support the buffer interface
This makes sense: text has no meaning in binary terms before it is encoded. Although
it is often possible to convert between the types by encoding str and decoding bytes,
as described earlier in this chapter, you will usually want to stick to either str for text
data or bytes for binary data. Because the str and bytes operation sets largely intersect,
the choice won’t be much of a dilemma for most programs (see the string tools coverage
in the final section of this chapter for some prime examples of this).
In addition to type constraints, file content can matter in 3.X. Text-mode output files
require a str instead of a bytes for content, so there is no way in 3.X to write truly
binary data to a text-mode file. Depending on the encoding rules, bytes outside the
default character set can sometimes be embedded in a normal string, and they can
always be written in binary mode (some of the following raise errors when displaying
their string results in Pythons prior to 3.3, but the file operations work successfully):
# Can't read truly binary data in text mode
>>> chr(0xFF) # FF is a valid char, FE is not
'ÿ'
>>> chr(0xFE) # An error in some Pythons
'\xfe'
>>> open('temp', 'w').write(b'\xFF\xFE\xFD') # Can't use arbitrary bytes!
TypeError: must be str, not bytes
>>> open('temp', 'w').write('\xFF\xFE\xFD') # Can write if embeddable in str
3
>>> open('temp', 'wb').write(b'\xFF\xFE\xFD') # Can also write in binary mode
3
>>> open('temp', 'rb').read() # Can always read as binary bytes
b'\xff\xfe\xfd'
>>> open('temp', 'r').read() # Can't read text unless decodable!
'ÿ\xfe\xfd' # An error in some Pythons
In general, however, because text-mode input files in 3.X must be able to decode con-
tent per a Unicode encoding, there is no way to read truly binary data in text mode, as
the next section explains.
Using Unicode Files
So far, we’ve been reading and writing basic text and binary files. It turns out to be easy
to read and write Unicode text stored in files too, because the 3.X open call accepts an
encoding for text files, and arranges to run the required encoding and decoding for us
automatically as data is transferred. This allows us to process a variety of Unicode text
created with different encodings than the default for the platform, and store the same
text in different encodings for different purposes.
Reading and Writing Unicode in 3.X
In fact, we can effectively convert a string to different encoded forms both manually
with method calls as we did earlier, and automatically on file input and output. We’ll
use the following Unicode string in this section to demonstrate:
C:\code> C:\python33\python
>>> S = 'A\xc4B\xe8C' # Five-character decoded string, non-ASCII
>>> S
'AÄBèC'
>>> len(S)
5
Manual encoding
As we’ve already learned, we can always encode such a string to raw bytes according
to the target encoding name:
# Encode manually with methods
>>> L = S.encode('latin-1') # 5 bytes when encoded as latin-1
>>> L
b'A\xc4B\xe8C'
>>> len(L)
5
>>> U = S.encode('utf-8') # 7 bytes when encoded as utf-8
>>> U
b'A\xc3\x84B\xc3\xa8C'
>>> len(U)
7
File output encoding
Now, to write our string to a text file in a particular encoding, we can simply pass the
desired encoding name to open—although we could manually encode first and write in
binary mode, there’s no need to:
# Encoding automatically when written
>>> open('latindata', 'w', encoding='latin-1').write(S) # Write as latin-1
5
>>> open('utf8data', 'w', encoding='utf-8').write(S) # Write as utf-8
5
>>> open('latindata', 'rb').read() # Read raw bytes
b'A\xc4B\xe8C'
>>> open('utf8data', 'rb').read() # Different in files
b'A\xc3\x84B\xc3\xa8C'
File input decoding
Similarly, to read arbitrary Unicode data, we simply pass in the file’s encoding type
name to open, and it decodes from raw bytes to strings automatically; we could read
raw bytes and decode manually too, but that can be tricky when reading in blocks (we
might read an incomplete character), and it isn’t necessary:
# Decoding automatically when read
>>> open('latindata', 'r', encoding='latin-1').read() # Decoded on input
'AÄBèC'
>>> open('utf8data', 'r', encoding='utf-8').read() # Per encoding type
'AÄBèC'
>>> X = open('latindata', 'rb').read() # Manual decoding:
>>> X.decode('latin-1') # Not necessary
'AÄBèC'
>>> X = open('utf8data', 'rb').read()
>>> X.decode() # UTF-8 is default
'AÄBèC'
Decoding mismatches
Finally, keep in mind that this behavior of files in 3.X limits the kind of content you
can load as text. As suggested in the prior section, Python 3.X really must be able to
decode the data in text files into a str string, according to either the default or a passed-
in Unicode encoding name. Trying to open a truly binary data file in text mode, for
example, is unlikely to work in 3.X even if you use the correct object types:
>>> file = open(r'C:\Python33\python.exe', 'r')
>>> text = file.read()
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 2: ...
>>> file = open(r'C:\Python33\python.exe', 'rb')
>>> data = file.read()
>>> data[:20]
b'MZ\x90\x00\x03\x00\x00\x00\x04\x00\x00\x00\xff\xff\x00\x00\xb8\x00\x00\x00'
The first of these examples might not fail in Python 2.X (normal files do not decode
text), even though it probably should: reading the file may return corrupted data in the
string, due to automatic end-of-line translations in text mode (any embedded \r\n bytes
will be translated to \n on Windows when read). To treat file content as Unicode text
in 2.X, we need to use special tools instead of the general open built-in function, as we’ll
see in a moment. First, though, let’s turn to a more explosive topic.
Handling the BOM in 3.X
As described earlier in this chapter, some encoding schemes store a special byte order
marker (BOM) sequence at the start of files, to specify data endianness (which end of
a string of bits is most significant to its value) or declare the encoding type. Python both
skips this marker on input and writes it on output if the encoding name implies it, but
we sometimes must use a specific encoding name to force BOM processing explicitly.
For example, in the UTF-16 and UTF-32 encodings, the BOM specifies big- or little-
endian format. A UTF-8 text file may also include a BOM, but this isn’t guaranteed,
and serves only to declare that it is UTF-8 in general. When reading and writing data
using these encoding schemes, Python automatically skips or writes the BOM if it is
either implied by a general encoding name, or if you provide a more specific encoding
name to force the issue. For instance:
In UTF-16, the BOM is always processed for “utf-16,” and the more specific en-
coding name “utf-16-le” denotes little-endian format.
In UTF-8, the more specific encoding “utf-8-sig” forces Python to both skip and
write a BOM on input and output, respectively, but the general “utf-8” does not.
Dropping the BOM in Notepad
Let’s make some files with BOMs to see how this works in practice. When you save a
text file in Windows Notepad, you can specify its encoding type in a drop-down list—
simple ASCII text, UTF-8, or little- or big-endian UTF-16. If a two-line text file named
spam.txt is saved in Notepad as the encoding type ANSI, for instance, it’s written as
simple ASCII text without a BOM. When this file is read in binary mode in Python, we
can see the actual bytes stored in the file. When it’s read as text, Python performs end-
of-line translation by default; we can also decode it as explicit UTF-8 text since ASCII
is a subset of this scheme (and UTF-8 is Python 3.X’s default encoding):
C:\code> C:\python33\python # File saved in Notepad
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>> open('spam.txt', 'rb').read() # ASCII (UTF-8) text file
b'spam\r\nSPAM\r\n'
>>> open('spam.txt', 'r').read() # Text mode translates line end
'spam\nSPAM\n'
>>> open('spam.txt', 'r', encoding='utf-8').read()
'spam\nSPAM\n'
If this file is instead saved as UTF-8 in Notepad, it is prepended with a 3-byte UTF-8
BOM sequence, and we need to give a more specific encoding name (“utf-8-sig”) to
force Python to skip the marker:
>>> open('spam.txt', 'rb').read() # UTF-8 with 3-byte BOM
b'\xef\xbb\xbfspam\r\nSPAM\r\n'
>>> open('spam.txt', 'r').read()
'spam\nSPAM\n'
>>> open('spam.txt', 'r', encoding='utf-8').read()
'\ufeffspam\nSPAM\n'
>>> open('spam.txt', 'r', encoding='utf-8-sig').read()
'spam\nSPAM\n'
If the file is stored as Unicode big endian in Notepad, we get UTF-16-format data in the
file, with 2-byte (16-bit) characters prepended with a 2-byte BOM sequence—the en-
coding name “utf-16” in Python skips the BOM because it is implied (since all UTF-16
files have a BOM), and “utf-16-be” handles the big-endian format but does not skip
the BOM (the second of the following fails to print on older Pythons):
>>> open('spam.txt', 'rb').read()
b'\xfe\xff\x00s\x00p\x00a\x00m\x00\r\x00\n\x00S\x00P\x00A\x00M\x00\r\x00\n'
>>> open('spam.txt', 'r').read()
'\xfeÿ\x00s\x00p\x00a\x00m\x00\n\x00\n\x00S\x00P\x00A\x00M\x00\n\x00\n'
>>> open('spam.txt', 'r', encoding='utf-16').read()
'spam\nSPAM\n'
>>> open('spam.txt', 'r', encoding='utf-16-be').read()
'\ufeffspam\nSPAM\n'
Notepad’s “Unicode,” by the way, is UTF-16 little endian (which, of course, is one of
very many kinds of Unicode encoding!).
Dropping the BOM in Python
The same patterns generally hold true for output. When writing a Unicode file in Python
code, we need a more explicit encoding name to force the BOM in UTF-8—“utf-8”
does not write (or skip) the BOM, but “utf-8-sig” does:
>>> open('temp.txt', 'w', encoding='utf-8').write('spam\nSPAM\n')
10
>>> open('temp.txt', 'rb').read() # No BOM
b'spam\r\nSPAM\r\n'
>>> open('temp.txt', 'w', encoding='utf-8-sig').write('spam\nSPAM\n')
10
>>> open('temp.txt', 'rb').read() # Wrote BOM
b'\xef\xbb\xbfspam\r\nSPAM\r\n'
>>> open('temp.txt', 'r').read()
'spam\nSPAM\n'
>>> open('temp.txt', 'r', encoding='utf-8').read() # Keeps BOM
'\ufeffspam\nSPAM\n'
>>> open('temp.txt', 'r', encoding='utf-8-sig').read() # Skips BOM
'spam\nSPAM\n'
Notice that although “utf-8” does not drop the BOM, data without a BOM can be read
with both “utf-8” and “utf-8-sig”—use the latter for input if you’re not sure whether a
BOM is present in a file (and don’t read this paragraph out loud in an airport security
line!):
>>> open('temp.txt', 'w').write('spam\nSPAM\n')
10
>>> open('temp.txt', 'rb').read() # Data without BOM
b'spam\r\nSPAM\r\n'
>>> open('temp.txt', 'r').read() # Either utf-8 works
'spam\nSPAM\n'
>>> open('temp.txt', 'r', encoding='utf-8').read()
'spam\nSPAM\n'
>>> open('temp.txt', 'r', encoding='utf-8-sig').read()
'spam\nSPAM\n'
Finally, for the encoding name “utf-16,” the BOM is handled automatically: on out-
put, data is written in the platform’s native endianness, and the BOM is always written;
on input, data is decoded per the BOM, and the BOM is always stripped because it’s
standard in this scheme:
>>> sys.byteorder
'little'
>>> open('temp.txt', 'w', encoding='utf-16').write('spam\nSPAM\n')
10
>>> open('temp.txt', 'rb').read()
b'\xff\xfes\x00p\x00a\x00m\x00\r\x00\n\x00S\x00P\x00A\x00M\x00\r\x00\n\x00'
>>> open('temp.txt', 'r', encoding='utf-16').read()
'spam\nSPAM\n'
More specific UTF-16 encoding names can specify different endianness, though you
may have to manually write and skip the BOM yourself in some scenarios if it is required
or present—study the following examples for more BOM-making instructions:
>>> open('temp.txt', 'w', encoding='utf-16-be').write('\ufeffspam\nSPAM\n')
11
>>> open('temp.txt', 'rb').read()
b'\xfe\xff\x00s\x00p\x00a\x00m\x00\r\x00\n\x00S\x00P\x00A\x00M\x00\r\x00\n'
>>> open('temp.txt', 'r', encoding='utf-16').read()
'spam\nSPAM\n'
>>> open('temp.txt', 'r', encoding='utf-16-be').read()
'\ufeffspam\nSPAM\n'
The more specific UTF-16 encoding names work fine with BOM-less files, though
“utf-16” requires one on input in order to determine byte order:
>>> open('temp.txt', 'w', encoding='utf-16-le').write('SPAM')
4
>>> open('temp.txt', 'rb').read() # OK if BOM not present or expected
b'S\x00P\x00A\x00M\x00'
>>> open('temp.txt', 'r', encoding='utf-16-le').read()
'SPAM'
>>> open('temp.txt', 'r', encoding='utf-16').read()
UnicodeError: UTF-16 stream does not start with BOM
Experiment with these encodings yourself or see Python’s library manuals for more
details on the BOM.
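If you do need to handle BOMs manually, the codecs module exposes the marker byte
strings themselves. The following is a minimal sketch (sniff_bom is a hypothetical
helper, not a standard library tool) that inspects a file's first bytes and suggests an
encoding name that will drop any BOM on reads:

import codecs

def sniff_bom(path):
    with open(path, 'rb') as f:                      # Raw bytes: no decoding yet
        prefix = f.read(4)
    for bom, name in [(codecs.BOM_UTF8,     'utf-8-sig'),   # Skips BOM on read
                      (codecs.BOM_UTF16_LE, 'utf-16'),      # Decodes per BOM
                      (codecs.BOM_UTF16_BE, 'utf-16')]:
        if prefix.startswith(bom):
            return name
    return None                                      # No BOM: caller must choose

# Usage sketch: open(path, 'r', encoding=sniff_bom(path) or 'utf-8')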
Unicode Files in 2.X
The preceding discussion applies to Python 3.X's string types and files. You can achieve
similar effects for Unicode files in 2.X, but the interface is different. If you replace
str with unicode and open with codecs.open, though, the result is essentially the same
as in 3.X:
C:\code> C:\python27\python
>>> S = u'A\xc4B\xe8C' # 2.X type
>>> print S
AÄBèC
>>> len(S)
5
>>> S.encode('latin-1') # Manual calls
'A\xc4B\xe8C'
>>> S.encode('utf-8')
'A\xc3\x84B\xc3\xa8C'
>>> import codecs # 2.X files
>>> codecs.open('latindata', 'w', encoding='latin-1').write(S) # Writes encode
>>> codecs.open('utfdata', 'w', encoding='utf-8').write(S)
>>> open('latindata', 'rb').read()
'A\xc4B\xe8C'
>>> open('utfdata', 'rb').read()
'A\xc3\x84B\xc3\xa8C'
>>> codecs.open('latindata', 'r', encoding='latin-1').read() # Reads decode
u'A\xc4B\xe8C'
>>> codecs.open('utfdata', 'r', encoding='utf-8').read()
u'A\xc4B\xe8C'
>>> print codecs.open('utfdata', 'r', encoding='utf-8').read() # Print to view
AÄBèC
For more 2.X Unicode details, see earlier sections of this chapter and Python 2.X man-
uals.
Unicode Filenames and Streams
In closing, this section has focused on the encoding and decoding of Unicode text file
content, but Python also supports the notion of non-ASCII file names. In fact, the
encodings used for file content and for filenames are independent settings, available
in sys, which can vary per Python version and platform (2.X returns ASCII for the first
of the following on Windows):
>>> import sys
>>> sys.getdefaultencoding(), sys.getfilesystemencoding() # File content, names
('utf-8', 'mbcs')
Filenames: Text versus bytes
Filename encoding is often a nonissue. In short, for filenames given as Unicode text
strings, the open call encodes automatically to and from the underlying platform’s file-
name conventions. Passing arbitrarily pre-encoded filenames as byte strings to file tools
(including open and directory walkers and listers) overrides automatic encodings, and
forces filename results to be returned in encoded byte string form too—useful if file-
names are undecodable per the underlying platform’s conventions (I’m using Win-
dows, but some of the following may fail on other platforms):
>>> f = open('xxx\u00A5', 'w') # Non-ASCII filename
>>> f.write('\xA5999\n') # Writes five characters
>>> f.close()
>>> print(open('xxx\u00A5').read()) # Text: auto-encoded
¥999
>>> print(open(b'xxx\xA5').read()) # Bytes: pre-encoded
¥999
>>> import glob # Filename expansion tool
>>> glob.glob('*\u00A5*') # Get decoded text for decoded text
['xxx¥']
>>> glob.glob(b'*\xA5*') # Get encoded bytes for encoded bytes
[b'xxx\xa5']
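The same text-in/text-out, bytes-in/bytes-out convention holds for other filename tools
such as os.listdir; a quick sketch in the same session (the directory here contains the
file made above):

>>> import os
>>> type(os.listdir('.')[0])           # str names in, decoded str names out
<class 'str'>
>>> type(os.listdir(b'.')[0])          # bytes names in, encoded bytes names out
<class 'bytes'>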
Stream content: PYTHONIOENCODING
In addition, the environment variable PYTHONIOENCODING can be used to set the encoding
used for text in the standard streams—input, output, and error. This setting overrides
Python’s default encoding for printed text, which on Windows currently uses a Win-
dows format on 3.X and ASCII on 2.X. Setting this to a general Unicode format like
UTF-8 may sometimes be required to print non-ASCII text, and to display such text in
shell windows (possibly in conjunction with code page changes on some Windows
machines). A script that prints non-ASCII filenames, for example, may fail unless this
setting is made.
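For example, setting the variable in the shell before launching the script is typically
all that's required; a hypothetical Windows session (print-filenames.py stands in for
any script that prints non-ASCII text):

C:\code> set PYTHONIOENCODING=utf-8
C:\code> py −3 print-filenames.py

The setting applies only to Python processes started after it is made, and covers
sys.stdin, sys.stdout, and sys.stderr.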
For more background on this subject, see also “Currency Symbols: Unicode in Action”
in Chapter 25. There, we work through an example that demonstrates the essentials of
portable Unicode coding, as well as the roles and requirements of PYTHONIOENCODING
settings, which we won’t rehash here.
For more on these topics in general, see Python manuals or books such as Programming
Python, 4th Edition (or a later edition, if one is available). The latter digs deeper
into streams and files from an applications-level perspective.
Other String Tool Changes in 3.X
Many of the other popular string-processing tools in Python’s standard library have
also been revamped for the new str/bytes type dichotomy. We won’t cover any of these
application-focused tools in much detail in this core language book, but to wrap up
this chapter, here’s a quick look at four of the major tools impacted: the re pattern-
matching module, the struct binary data module, the pickle object serialization mod-
ule, and the xml package for parsing XML text. As noted ahead, other Python tools,
such as its json module, differ in ways similar to those presented here.
The re Pattern-Matching Module
Python’s re pattern-matching module supports text processing that is more general
than that afforded by simple string method calls such as find, split, and replace. With
re, strings that designate searching and splitting targets can be described by general
patterns, instead of absolute text. This module has been generalized to work on objects
of any string type in 3.X—str, bytes, and bytearray—and returns result substrings of
the same type as the subject string. In 2.X it supports both unicode and str.
Here it is at work in 3.X, extracting substrings from a line of text—borrowed, of course,
from Monty Python’s The Meaning of Life. Within pattern strings, (.*) means any
character (the .), zero or more times (the *), saved away as a matched substring (the
()). Parts of the string matched by the parts of a pattern enclosed in parentheses are
available after a successful match, via the group or groups method:
C:\code> C:\python33\python
>>> import re
>>> S = 'Bugger all down here on earth!' # Line of text
>>> B = b'Bugger all down here on earth!' # Usually from a file
>>> re.match('(.*) down (.*) on (.*)', S).groups() # Match line to pattern
('Bugger all', 'here', 'earth!') # Matched substrings
>>> re.match(b'(.*) down (.*) on (.*)', B).groups() # bytes substrings
(b'Bugger all', b'here', b'earth!')
In Python 2.X results are similar, but the unicode type is used for non-ASCII text, and
str handles both 8-bit and binary text:
C:\code> C:\python27\python
>>> import re
>>> S = 'Bugger all down here on earth!' # Simple text and binary
>>> U = u'Bugger all down here on earth!' # Unicode text
>>> re.match('(.*) down (.*) on (.*)', S).groups()
('Bugger all', 'here', 'earth!')
>>> re.match('(.*) down (.*) on (.*)', U).groups()
(u'Bugger all', u'here', u'earth!')
Since bytes and str support essentially the same operation sets, this type distinction is
largely transparent. But note that, like in other APIs, you can’t mix str and bytes types
in its calls’ arguments in 3.X (although if you don’t plan to do pattern matching on
binary data, you probably don’t need to care):
C:\code> C:\python33\python
>>> import re
>>> S = 'Bugger all down here on earth!'
>>> B = b'Bugger all down here on earth!'
>>> re.match('(.*) down (.*) on (.*)', B).groups()
TypeError: can't use a string pattern on a bytes-like object
>>> re.match(b'(.*) down (.*) on (.*)', S).groups()
TypeError: can't use a bytes pattern on a string-like object
>>> re.match(b'(.*) down (.*) on (.*)', bytearray(B)).groups()
(bytearray(b'Bugger all'), bytearray(b'here'), bytearray(b'earth!'))
>>> re.match('(.*) down (.*) on (.*)', bytearray(B)).groups()
TypeError: can't use a string pattern on a bytes-like object
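The same type-matching rule applies to the module's other calls, such as re.split and
re.sub; a brief sketch continuing the prior session:

>>> re.split(b' ', B)                  # bytes pattern with bytes subject
[b'Bugger', b'all', b'down', b'here', b'on', b'earth!']
>>> re.split(' ', B)                   # Mixing types fails here too
TypeError: can't use a string pattern on a bytes-like object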
The struct Binary Data Module
The Python struct module, used to create and extract packed binary data from strings,
also works the same in 3.X as it does in 2.X, but in 3.X packed data is represented as
bytes and bytearray objects only, not str objects (which makes sense, given that it’s
intended for processing binary data, not decoded text); and “s” data code values must
be bytes as of 3.2 (the former str UTF-8 auto-encode is dropped).
Here are both Pythons in action, packing three objects into a string according to a binary
type specification (they create a 4-byte integer, a 4-byte string, and a 2-byte integer):
C:\code> C:\python33\python
>>> from struct import pack
>>> pack('>i4sh', 7, b'spam', 8) # bytes in 3.X (8-bit strings)
b'\x00\x00\x00\x07spam\x00\x08'
C:\code> C:\python27\python
>>> from struct import pack
>>> pack('>i4sh', 7, 'spam', 8) # str in 2.X (8-bit strings)
'\x00\x00\x00\x07spam\x00\x08'
Since bytes has an almost identical interface to that of str in 3.X and 2.X, though, most
programmers probably won’t need to care—the change is irrelevant to most existing
code, especially since reading from a binary file creates a bytes object automatically. Although
the last test in the following example fails on a type mismatch, most scripts will read
binary data from a file, not create it as a string as we do here:
C:\code> C:\python33\python
>>> import struct
>>> B = struct.pack('>i4sh', 7, b'spam', 8)
>>> B
b'\x00\x00\x00\x07spam\x00\x08'
>>> vals = struct.unpack('>i4sh', B)
>>> vals
(7, b'spam', 8)
>>> vals = struct.unpack('>i4sh', B.decode())
TypeError: 'str' does not support the buffer interface
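If you need to know in advance how many bytes a format spec describes (to size a file
read, for instance), the module's calcsize call computes this from the format string
alone:

>>> struct.calcsize('>i4sh')           # 4-byte int + 4-byte string + 2-byte int
10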
Apart from the new syntax for bytes, creating and reading binary files works almost the
same in 3.X as it does in 2.X. Still, code like this is one of the main places where pro-
grammers will notice the bytes object type:
C:\code> C:\python33\python
# Write values to a packed binary file
>>> F = open('data.bin', 'wb') # Open binary output file
>>> import struct
>>> data = struct.pack('>i4sh', 7, b'spam', 8) # Create packed binary data
>>> data # bytes in 3.X, not str
b'\x00\x00\x00\x07spam\x00\x08'
>>> F.write(data) # Write to the file
10
>>> F.close()
# Read values from a packed binary file
>>> F = open('data.bin', 'rb') # Open binary input file
>>> data = F.read() # Read bytes
>>> data
b'\x00\x00\x00\x07spam\x00\x08'
>>> values = struct.unpack('>i4sh', data) # Extract packed binary data
>>> values # Back to Python objects
(7, b'spam', 8)
Once you’ve extracted packed binary data into Python objects like this, you can dig
even further into the binary world if you have to—strings can be indexed and sliced to
get individual bytes’ values, individual bits can be extracted from integers with bitwise
operators, and so on (see earlier in this book for more on the operations applied here):
>>> values # Result of struct.unpack
(7, b'spam', 8)
# Accessing bits of parsed integers
>>> bin(values[0]) # Can get to bits in ints
'0b111'
>>> values[0] & 0x01 # Test first (lowest) bit in int
1
>>> values[0] | 0b1010 # Bitwise or: turn bits on
15
>>> bin(values[0] | 0b1010) # 15 decimal is 1111 binary
'0b1111'
>>> bin(values[0] ^ 0b1010) # Bitwise xor: off if both true
'0b1101'
>>> bool(values[0] & 0b100) # Test if bit 3 is on
True
>>> bool(values[0] & 0b1000) # Test if bit 4 is set
False
Since parsed bytes strings are sequences of small integers, we can do similar processing
with their individual bytes:
# Accessing bytes of parsed strings and bits within them
>>> values[1]
b'spam'
>>> values[1][0] # bytes string: sequence of ints
115
>>> values[1][1:] # Prints as ASCII characters
b'pam'
>>> bin(values[1][0]) # Can get to bits of bytes in strings
'0b1110011'
>>> bin(values[1][0] | 0b1100) # Turn bits on
'0b1111111'
>>> values[1][0] | 0b1100
127
Of course, most Python programmers don't deal with binary bits; Python has higher-
level object types, such as lists and dictionaries, that are generally a better choice
for representing information in Python scripts. However, if you must use or produce
lower-level data for C programs, networking libraries, or other interfaces, Python has
tools to assist.
The pickle Object Serialization Module
We met the pickle module briefly in Chapter 9, Chapter 28, and Chapter 31. In Chap-
ter 28, we also used the shelve module, which uses pickle internally. For completeness
here, keep in mind that the Python 3.X version of the pickle module always creates a
bytes object, regardless of the default or passed-in “protocol” (data format level). You
can see this by using the module’s dumps call to return an object’s pickle string:
C:\code> C:\python33\python
>>> import pickle # dumps() returns pickle string
>>> pickle.dumps([1, 2, 3]) # Python 3.X default protocol=3=binary
b'\x80\x03]q\x00(K\x01K\x02K\x03e.'
>>> pickle.dumps([1, 2, 3], protocol=0) # ASCII protocol 0, but still bytes!
b'(lp0\nL1L\naL2L\naL3L\na.'
This implies that files used to store pickled objects must always be opened in binary
mode in Python 3.X, since text files use str strings to represent data, not bytes—the
dump call simply attempts to write the pickle string to an open output file:
>>> pickle.dump([1, 2, 3], open('temp', 'w')) # Text files fail on bytes!
TypeError: must be str, not bytes # Despite protocol value
>>> pickle.dump([1, 2, 3], open('temp', 'w'), protocol=0)
TypeError: must be str, not bytes
>>> pickle.dump([1, 2, 3], open('temp', 'wb')) # Always use binary in 3.X
>>> open('temp', 'r').read() # This works, but just by luck
'\u20ac\x03]q\x00(K\x01K\x02K\x03e.'
Notice that the last result here didn't issue an error in text mode only because the
stored binary data happened to decode under the Windows platform's default encoding:
the leading \x80 byte maps to the euro character \u20ac in the cp1252 Windows default,
though a UTF-8 decoder would reject it outright. This was really just luck, and in fact
this command failed when printing in older Pythons and may fail on other platforms.
Because pickle data is not generally decodable Unicode text, the same rule holds on
input: correct usage in 3.X requires that pickle data always be both written and read
in binary mode, whether unpickling or not:
>>> pickle.dump([1, 2, 3], open('temp', 'wb'))
>>> pickle.load(open('temp', 'rb'))
[1, 2, 3]
>>> open('temp', 'rb').read()
b'\x80\x03]q\x00(K\x01K\x02K\x03e.'
In Python 2.X, we can get by with text-mode files for pickled data, as long as the pro-
tocol is level 0 (the default in 2.X) and we use text mode consistently to convert line
ends:
C:\code> C:\python27\python
>>> import pickle
>>> pickle.dumps([1, 2, 3]) # Python 2.X default=0=ASCII
'(lp0\nI1\naI2\naI3\na.'
>>> pickle.dumps([1, 2, 3], protocol=1)
']q\x00(K\x01K\x02K\x03e.'
>>> pickle.dump([1, 2, 3], open('temp', 'w')) # Text mode works in 2.X
>>> pickle.load(open('temp'))
[1, 2, 3]
>>> open('temp').read()
'(lp0\nI1\naI2\naI3\na.'
If you care about version neutrality, though, or don’t want to care about protocols or
their version-specific defaults, always use binary-mode files for pickled data—the fol-
lowing works the same in Python 3.X and 2.X:
>>> import pickle
>>> pickle.dump([1, 2, 3], open('temp', 'wb')) # Version neutral
>>> pickle.load(open('temp', 'rb')) # And required in 3.X
[1, 2, 3]
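If you'd rather not remember the mode rules at each call, it's easy to wrap them in
small functions; the following is a minimal sketch (save and load are hypothetical
helper names, not standard tools), and works the same in 2.X and 3.X:

import pickle

def save(obj, path):
    with open(path, 'wb') as f:        # Binary mode: required in 3.X
        pickle.dump(obj, f)

def load(path):
    with open(path, 'rb') as f:        # Binary mode on reads too
        return pickle.load(f)

save([1, 2, 3], 'temp')
print(load('temp'))                    # [1, 2, 3]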
Because almost all programs let Python pickle and unpickle objects automatically and
do not deal with the content of pickled data itself, the requirement to always use binary
file modes is the only significant incompatibility in Python 3.X’s newer pickling model.
See reference books or Python’s manuals for more details on object pickling.
XML Parsing Tools
XML is a tag-based language for defining structured information, commonly used to
define documents and data shipped over the Web. Although some information can be
extracted from XML text with basic string methods or the re pattern module, XML’s
nesting of constructs and arbitrary attribute text tend to make full parsing more accu-
rate.
Because XML is such a pervasive format, Python itself comes with an entire package of
XML parsing tools that support the SAX and DOM parsing models, as well as a package
known as ElementTree—a Python-specific API for parsing and constructing XML. Be-
yond basic parsing, the open source domain provides support for additional XML tools,
such as XPath, XQuery, XSLT, and more.
XML by definition represents text in Unicode form, to support internationalization.
Although most of Python’s XML parsing tools have always returned Unicode strings,
in Python 3.X their results have mutated from the 2.X unicode type to the 3.X general
str string type—which makes sense, given that 3.X’s str string is Unicode, whether
the encoding is ASCII or other.
We can’t go into many details here, but to sample the flavor of this domain, suppose
we have a simple XML text file, mybooks.xml:
<books>
<date>1995~2013</date>
<title>Learning Python</title>
<title>Programming Python</title>
<title>Python Pocket Reference</title>
<publisher>O'Reilly Media</publisher>
</books>
and we want to run a script to extract and display the content of all the nested title
tags, as follows:
Learning Python
Programming Python
Python Pocket Reference
There are at least four basic ways to accomplish this (not counting more advanced tools
like XPath). First, we could run basic pattern matching on the file’s text, though this
tends to be inaccurate if the text is unpredictable. Where applicable, the re module we
met earlier does the job—its match method looks for a match at the start of a string,
search scans ahead for a match, and the findall method used here locates all places
where the pattern matches in the string (the result comes back as a list of matched
substrings corresponding to parenthesized pattern groups, or tuples of such for mul-
tiple groups):
# File patternparse.py
import re
text = open('mybooks.xml').read()
found = re.findall('<title>(.*)</title>', text)
for title in found: print(title)
Second, to be more robust, we could perform complete XML parsing with the standard
library’s DOM parsing support. DOM parses XML text into a tree of objects and pro-
vides an interface for navigating the tree to extract tag attributes and values; the inter-
face is a formal specification, independent of Python:
# File domparse.py
from xml.dom.minidom import parse, Node
xmltree = parse('mybooks.xml')
for node1 in xmltree.getElementsByTagName('title'):
    for node2 in node1.childNodes:
        if node2.nodeType == Node.TEXT_NODE:
            print(node2.data)
As a third option, Python’s standard library supports SAX parsing for XML. Under the
SAX model, a class’s methods receive callbacks as a parse progresses and use state
information to keep track of where they are in the document and collect its data:
# File saxparse.py
import xml.sax.handler

class BookHandler(xml.sax.handler.ContentHandler):
    def __init__(self):
        self.inTitle = False
    def startElement(self, name, attributes):
        if name == 'title':
            self.inTitle = True
    def characters(self, data):
        if self.inTitle:
            print(data)
    def endElement(self, name):
        if name == 'title':
            self.inTitle = False

import xml.sax
parser = xml.sax.make_parser()
handler = BookHandler()
parser.setContentHandler(handler)
parser.parse('mybooks.xml')
Finally, the ElementTree system available in the etree package of the standard library
can often achieve the same effects as XML DOM parsers, but with remarkably less code.
It’s a Python-specific way to both parse and generate XML text; after a parse, its API
gives access to components of the document:
# File etreeparse.py
from xml.etree.ElementTree import parse
tree = parse('mybooks.xml')
for E in tree.findall('title'):
    print(E.text)
When run in either 2.X or 3.X, all four of these scripts display the same printed result:
C:\code> C:\python27\python domparse.py
Learning Python
Programming Python
Python Pocket Reference
C:\code> C:\python33\python domparse.py
Learning Python
Programming Python
Python Pocket Reference
Technically, though, in 2.X some of these scripts produce unicode string objects, while
in 3.X all produce str strings, since that type includes Unicode text (whether ASCII or
other):
C:\code> C:\python33\python
>>> from xml.dom.minidom import parse, Node
>>> xmltree = parse('mybooks.xml')
>>> for node in xmltree.getElementsByTagName('title'):
        for node2 in node.childNodes:
            if node2.nodeType == Node.TEXT_NODE:
                node2.data
'Learning Python'
'Programming Python'
'Python Pocket Reference'
C:\code> C:\python27\python
>>> ...same code...
u'Learning Python'
u'Programming Python'
u'Python Pocket Reference'
Programs that must deal with XML parsing results in nontrivial ways will need to ac-
count for the different object type in 3.X. Again, though, because all strings have nearly
identical interfaces in both 2.X and 3.X, most scripts won’t be affected by the change;
tools available on unicode in 2.X are generally available on str in 3.X. The main task,
if there is one, is likely getting the encoding names right when transferring the parsed-
out data to and from files, network connections, GUIs, and so on.
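For instance, transferring parsed titles to a file reduces to naming the target encoding
explicitly when the file is opened, just as for any other Unicode text; a sketch in 3.X
reusing the ElementTree example (titles.txt is a hypothetical output file):

# File etreesave.py (hypothetical)
from xml.etree.ElementTree import parse
tree = parse('mybooks.xml')
with open('titles.txt', 'w', encoding='utf-8') as out:
    for E in tree.findall('title'):
        out.write(E.text + '\n')       # str is encoded to UTF-8 on write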
Regrettably, going into further XML parsing details is beyond this book’s scope. If you
are interested in text or XML parsing, it is covered in more detail in the applications-
focused follow-up book Programming Python. For more details on re, struct, pickle,
and XML, as well as the additional impacts of Unicode on other library tools such as
filename expansion and directory walkers, consult the Web, the aforementioned book
and others, and Python’s standard library manual.
For a related topic, see also the JSON example in Chapter 9—a language-neutral data
exchange format, whose structure is very similar to Python dictionaries and lists, and
whose strings are all Unicode that differs in type between Pythons 2.X and 3.X much
the same as shown for XML here.
Why You Will Care: Inspecting Files, and Much More
As I was updating this chapter, I stumbled onto a use case for some of its tools. After
saving a formerly ASCII HTML file in Notepad as “UTF8,” I found that it had grown
a mystery non-ASCII character along the way due to an apparent keyboard operator
error, and would no longer work as ASCII in text tools. To find the bad character, I
simply started Python, decoded the file’s content from its UTF-8 format via a text
mode file, and scanned character by character, looking for the first character that was
not also valid ASCII:
>>> f = open('py33-windows-launcher.html', encoding='utf8')
>>> t = f.read()
>>> import sys
>>> for (i, c) in enumerate(t):
        try:
            x = c.encode(encoding='ascii')
        except:
            print(i, sys.exc_info()[0])
9886 <class 'UnicodeEncodeError'>
With the bad character’s index in hand, it’s easy to slice the Unicode string for more
details:
>>> len(t)
31021
>>> t[9880:9890]
'ugh. \u206cThi'
>>> t[9870:9890]
'trace through. \u206cThi'
After fixing, I could also open in binary mode to verify and explore actual undecoded
file content further:
>>> f = open('py33-windows-launcher.html', 'rb')
>>> b = f.read()
>>> b[0]
60
>>> b[:10]
b'<HTML>\r\n<T'
Not rocket science, perhaps, and there are other approaches, but Python makes for a
convenient tactical tool in such cases, and its file objects give you a tangible window
on your data when needed, both in scripts and interactive mode.
For more realistically scaled examples of Unicode at work, I suggest my other book
Programming Python, 4th Edition (or later). That book develops much larger programs
than we can here, and has numerous up close and personal encounters with Unicode
along the way, in the context of files, directory walkers, network sockets, GUIs, email
content and headers, web page content, databases, and more. Though clearly an im-
portant topic in today’s global software world, Unicode is more mandatory than you
might expect, especially in a language like Python 3.X, which elevates it to its core string
and file types, thus bringing all its users into the Unicode fold—ready or not!
Chapter Summary
This chapter explored in-depth the advanced string types available in Python 3.X and
2.X for processing Unicode text and binary data. As we saw, many programmers use
ASCII text and can get by with the basic string type and its operations. For more ad-
vanced applications, Python’s string models fully support both richer Unicode text (via
the normal string type in 3.X and a special type in 2.X) and byte-oriented data (repre-
sented with a bytes type in 3.X and normal strings in 2.X).
In addition, we learned how Python’s file object has mutated in 3.X to automatically
encode and decode Unicode text and deal with byte strings for binary-mode files, and
saw similar utility for 2.X. Finally, we briefly met some text and binary data tools in
Python’s library, and sampled their behavior in 3.X and 2.X.
In the next chapter, we’ll shift our focus to tool-builder topics, with a look at ways to
manage access to object attributes by inserting automatically run code. Before we move
on, though, here’s a set of questions to review what we’ve learned here. This has been
a substantial chapter, so be sure to read the quiz answers eventually for a more in-depth
summary.
Test Your Knowledge: Quiz
1. What are the names and roles of string object types in Python 3.X?
2. What are the names and roles of string object types in Python 2.X?
3. What is the mapping between 2.X and 3.X string types?
4. How do Python 3.X’s string types differ in terms of operations?
5. How can you code non-ASCII Unicode characters in a string in 3.X?
6. What are the main differences between text- and binary-mode files in Python 3.X?
7. How would you read a Unicode text file that contains text in a different encoding
than the default for your platform?
8. How can you create a Unicode text file in a specific encoding format?
9. Why is ASCII text considered to be a kind of Unicode text?
10. How large an impact does Python 3.X’s string types change have on your code?
Test Your Knowledge: Answers
1. Python 3.X has three string types: str (for Unicode text, including ASCII), bytes
(for binary data with absolute byte values), and bytearray (a mutable flavor of
bytes). The str type usually represents content stored on a text file, and the other
two types generally represent content stored on binary files.
2. Python 2.X has two main string types: str (for 8-bit text and binary data) and
unicode (for possibly wider character Unicode text). The str type is used for both
text and binary file content; unicode is used for text file content that is generally
more complex than 8-bit characters. Python 2.6 (but not earlier) also has 3.X’s
bytearray type, but it’s mostly a back-port and doesn’t exhibit the sharp text/binary
distinction that it does in 3.X.
3. The mapping from 2.X to 3.X string types is not direct, because 2.X’s str equates
to both str and bytes in 3.X, and 3.X’s str equates to both str and unicode in 2.X.
The mutability of bytearray in 3.X is also unique. In general, though: Unicode text
is handled by 3.X str and 2.X unicode, byte-based data is handled by 3.X bytes
and 2.X str, and 3.X bytes and 2.X str can both handle some simpler types of text.
4. Python 3.X’s string types share almost all the same operations: method calls, se-
quence operations, and even larger tools like pattern matching work the same way.
On the other hand, only str supports string formatting operations, and bytear
ray has an additional set of operations that perform in-place changes. The str and
bytes types also have methods for encoding and decoding text, respectively.
5. Non-ASCII Unicode characters can be coded in a string with both hex (\xNN) and
Unicode (\uNNNN, \UNNNNNNNN) escapes. On some machines, some non-ASCII char-
acters—certain Latin-1 characters, for example—can also be typed or pasted di-
rectly into code, and are interpreted per the UTF-8 default or a source code en-
coding directive comment.
6. In 3.X, text-mode files assume their file content is Unicode text (even if it’s all
ASCII) and automatically decode when reading and encode when writing. With
binary-mode files, bytes are transferred to and from the file unchanged. The con-
tents of text-mode files are usually represented as str objects in your script, and
the contents of binary files are represented as bytes (or bytearray) objects. Text-
mode files also handle the BOM for certain encoding types and automatically
translate end-of-line sequences to and from the single \n character on input and
output unless this is explicitly disabled; binary-mode files do not perform either of
these steps. Python 2.X uses codecs.open for Unicode files, which encodes and
decodes similarly; 2.X’s open only translates line ends in text mode.
7. To read files encoded in a different encoding than the default for your platform,
simply pass the name of the file’s encoding to the open built-in in 3.X
(codecs.open() in 2.X); data will be decoded per the specified encoding when it is
read from the file. You can also read in binary mode and manually decode the bytes
to a string by giving an encoding name, but this involves extra work and is some-
what error-prone for multibyte characters (you may accidentally read a partial
character sequence).
8. To create a Unicode text file in a specific encoding format, pass the desired en-
coding name to open in 3.X (codecs.open() in 2.X); strings will be encoded per the
desired encoding when they are written to the file. You can also manually encode
a string to bytes and write it in binary mode, but this is usually extra work.
9. ASCII text is considered to be a kind of Unicode text, because its 7-bit range of
values is a subset of most Unicode encodings. For example, valid ASCII text is also
valid Latin-1 text (Latin-1 simply assigns the remaining possible values in an 8-bit
byte to additional characters) and valid UTF-8 text (UTF-8 defines a variable-byte
scheme for representing more characters, but ASCII characters are still represented
with the same codes, in a single byte). This makes Unicode backward-compatible
with the mass of ASCII text data in the world (though this compatibility may also have
limited Unicode's options: self-identifying text, for instance, may have been difficult,
though BOMs serve much the same role).
10. The impact of Python 3.X’s string types change depends upon the types of strings
you use. For scripts that use simple ASCII text on platforms with ASCII-compatible
default encodings, the impact is probably minor: the str string type works the same
in 2.X and 3.X in this case. Moreover, although string-related tools in the standard
library such as re, struct, pickle, and xml may technically use different types in
3.X than in 2.X, the changes are largely irrelevant to most programs because 3.X’s
str and bytes and 2.X’s str support almost identical interfaces. If you process
Unicode data, the toolset you need has simply moved from 2.X’s unicode and
codecs.open() to 3.X’s str and open. If you deal with binary data files, you’ll need
to deal with content as bytes objects; since they have a similar interface to 2.X
strings, though, the impact should again be minimal. That said, the update of the
book Programming Python for 3.X ran across numerous cases where Unicode’s
mandatory status in 3.X implied changes in standard library APIs—from network-
ing and GUIs, to databases and email. In general, Unicode will probably impact
most 3.X users eventually.
CHAPTER 38
Managed Attributes
This chapter expands on the attribute interception techniques introduced earlier, in-
troduces another, and employs them in a handful of larger examples. Like everything
in this part of the book, this chapter is classified as an advanced topic and optional
reading, because most applications programmers don’t need to care about the material
discussed here—they can fetch and set attributes on objects without concern for at-
tribute implementations.
Especially for tools builders, though, managing attribute access can be an important
part of flexible APIs. Moreover, an understanding of the descriptor model covered here
can make related tools such as slots and properties more tangible, and may even be
required reading if it appears in code you must use.
Why Manage Attributes?
Object attributes are central to most Python programs—they are where we often store
information about the entities our scripts process. Normally, attributes are simply
names for objects; a person’s name attribute, for example, might be a simple string,
fetched and set with basic attribute syntax:
person.name # Fetch attribute value
person.name = value # Change attribute value
In most cases, the attribute lives in the object itself, or is inherited from a class from
which it derives. That basic model suffices for most programs you will write in your
Python career.
Sometimes, though, more flexibility is required. Suppose you’ve written a program to
use a name attribute directly, but then your requirements change—for example, you
decide that names should be validated with logic when set or mutated in some way
when fetched. It’s straightforward to code methods to manage access to the attribute’s
value (valid and transform are abstract here):
class Person:
    def getName(self):
        if not valid():
            raise TypeError('cannot fetch name')
        else:
            return self.name.transform()
    def setName(self, value):
        if not valid(value):
            raise TypeError('cannot change name')
        else:
            self.name = transform(value)

person = Person()
person.getName()
person.setName('value')
However, this also requires changing all the places where names are used in the entire
program—a possibly nontrivial task. Moreover, this approach requires the program to
be aware of how values are exported: as simple names or as called methods. If you begin
with a method-based interface to data, clients are immune to changes; if you do not,
later changes can become problematic.
This issue can crop up more often than you might expect. The value of a cell in a
spreadsheet-like program, for instance, might begin its life as a simple discrete value,
but later mutate into an arbitrary calculation. Since an object’s interface should be
flexible enough to support such future changes without breaking existing code, switch-
ing to methods later is less than ideal.
Inserting Code to Run on Attribute Access
A better solution would allow you to run code automatically on attribute access, if
needed. That’s one of the main roles of managed attributes—they provide ways to add
attribute accessor logic after the fact. More generally, they support arbitrary attribute
usage modes that go beyond simple data storage.
At various points in this book, we’ve met Python tools that allow our scripts to dy-
namically compute attribute values when fetching them and validate or change at-
tribute values when storing them. In this chapter, we’re going to expand on the tools
already introduced, explore other available tools, and study some larger use-case ex-
amples in this domain. Specifically, this chapter presents four accessor techniques:
The __getattr__ and __setattr__ methods, for routing undefined attribute fetches
and all attribute assignments to generic handler methods.
The __getattribute__ method, for routing all attribute fetches to a generic handler
method.
The property built-in, for routing specific attribute access to get and set handler
functions.
The descriptor protocol, for routing specific attribute accesses to instances of classes
with arbitrary get and set handler methods, and the basis for other tools such as
properties and slots.
The tools in the first of these bullets are available in all Pythons. The last three bullets’
tools are available in Python 3.X and new-style classes in 2.X—they first appeared in
Python 2.2, along with many of the other advanced tools of Chapter 32 such as slots
and super. We briefly met the first and third of these in Chapter 30 and Chapter 32,
respectively; the second and fourth are largely new topics we’ll explore in full here.
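To preview the flavor of the generic techniques in the first two bullets before we get
to the details, here is a minimal sketch of attribute tracing (the Logged class is
hypothetical, not one of this chapter's examples):

class Logged:
    def __getattr__(self, name):           # Run for undefined fetches only
        print('get:', name)
        raise AttributeError(name)
    def __setattr__(self, name, value):    # Run for all assignments
        print('set:', name)
        self.__dict__[name] = value        # Assign via __dict__ to avoid a loop

x = Logged()
x.a = 1                                    # Prints "set: a"
print(x.a)                                 # Defined, so __getattr__ not run: 1
print(getattr(x, 'b', None))               # Prints "get: b", then None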
As we’ll see, all four techniques share goals to some degree, and it’s usually possible to
code a given problem using any one of them. They do differ in some important ways,
though. For example, the last two techniques listed here apply to specific attributes,
whereas the first two are generic enough to be used by delegation-based proxy classes
that must route arbitrary attributes to wrapped objects. As we’ll see, all four schemes
also differ in both complexity and aesthetics, in ways you must see in action to judge
for yourself.
Besides studying the specifics behind the four attribute interception techniques listed
in this section, this chapter also presents an opportunity to explore larger programs
than we’ve seen elsewhere in this book. The CardHolder case study at the end, for ex-
ample, should serve as a self-study example of larger classes in action. We’ll also be
using some of the techniques outlined here in the next chapter to code decorators, so
be sure you have at least a general understanding of these topics before you move on.
Properties
The property protocol allows us to route a specific attribute’s get, set, and delete op-
erations to functions or methods we provide, enabling us to insert code to be run au-
tomatically on attribute access, intercept attribute deletions, and provide documenta-
tion for the attributes if desired.
Properties are created with the property built-in and are assigned to class attributes,
just like method functions. Accordingly, they are inherited by subclasses and instances,
like any other class attributes. Their access-interception functions are provided with
the self instance argument, which grants access to state information and class at-
tributes available on the subject instance.
A property manages a single, specific attribute; although it can’t catch all attribute
accesses generically, it allows us to control both fetch and assignment accesses and
enables us to change an attribute from simple data to a computation freely, without
breaking existing code. As we’ll see, properties are strongly related to descriptors; in
fact, they are essentially a restricted form of them.
The Basics
A property is created by assigning the result of a built-in function to a class attribute:
attribute = property(fget, fset, fdel, doc)
None of this built-in’s arguments are required, and all default to None if not passed. For
the first three, this None means that the corresponding operation is not supported, and
attempting it will raise an AttributeError exception automatically.
When these arguments are used, we pass fget a function for intercepting attribute
fetches, fset a function for assignments, and fdel a function for attribute deletions.
Technically, all three of these arguments accept any callable, including a class’s method,
having a first argument to receive the instance being qualified. When later invoked, the
fget function returns the computed attribute value, fset and fdel return nothing
(really, None), and all three may raise exceptions to reject access requests.
The doc argument receives a documentation string for the attribute, if desired; other-
wise, the property copies the docstring of the fget function, which as usual defaults to
None.
This built-in property call returns a property object, which we assign to the name of
the attribute to be managed in the class scope, where it will be inherited by every in-
stance.
A First Example
To demonstrate how this translates to working code, the following class uses a property
to trace access to an attribute named name; the actual stored data is named _name so it
does not clash with the property (if you’re working along with the book examples
package, some filenames in this chapter are implied by the command-lines that run
them following their listings):
class Person:                                  # Add (object) in 2.X
    def __init__(self, name):
        self._name = name
    def getName(self):
        print('fetch...')
        return self._name
    def setName(self, value):
        print('change...')
        self._name = value
    def delName(self):
        print('remove...')
        del self._name
    name = property(getName, setName, delName, "name property docs")

bob = Person('Bob Smith')                      # bob has a managed attribute
print(bob.name)                                # Runs getName
bob.name = 'Robert Smith'                      # Runs setName
print(bob.name)
del bob.name # Runs delName
print('-'*20)
sue = Person('Sue Jones') # sue inherits property too
print(sue.name)
print(Person.name.__doc__) # Or help(Person.name)
Properties are available in both 2.X and 3.X, but they require new-style object deriva-
tion in 2.X to work correctly for assignments—add object as a superclass here to run
this in 2.X. You can list the superclass in 3.X too, but it’s implied and not required, and
is sometimes omitted in this book to reduce clutter.
This particular property doesn’t do much—it simply intercepts and traces an attribute
—but it serves to demonstrate the protocol. When this code is run, two instances inherit
the property, just as they would any other attribute attached to their class. However,
their attribute accesses are caught:
c:\code> py −3 prop-person.py
fetch...
Bob Smith
change...
fetch...
Robert Smith
remove...
--------------------
fetch...
Sue Jones
name property docs
Like all class attributes, properties are inherited by both instances and lower subclasses.
If we change our example as follows, for instance:
class Super:
    ...the original Person class code...
    name = property(getName, setName, delName, 'name property docs')

class Person(Super):
    pass                                       # Properties are inherited (class attrs)

bob = Person('Bob Smith')
...rest unchanged...
the output is the same—the Person subclass inherits the name property from Super, and
the bob instance gets it from Person. In terms of inheritance, properties work the same
as normal methods; because they have access to the self instance argument, they can
access instance state information and methods irrespective of subclass depth, as the
next section further demonstrates.
Computed Attributes
The example in the prior section simply traces attribute accesses. Usually, though,
properties do much more—computing the value of an attribute dynamically when
fetched, for example. The following example illustrates:
class PropSquare:
    def __init__(self, start):
        self.value = start
    def getX(self):                            # On attr fetch
        return self.value ** 2
    def setX(self, value):                     # On attr assign
        self.value = value
    X = property(getX, setX)                   # No delete or docs

P = PropSquare(3)                              # Two instances of class with property
Q = PropSquare(32)                             # Each has different state information

print(P.X)                                     # 3 ** 2
P.X = 4
print(P.X)                                     # 4 ** 2
print(Q.X)                                     # 32 ** 2 (1024)
This class defines an attribute X that is accessed as though it were static data, but really
runs code to compute its value when fetched. The effect is much like an implicit method
call. When the code is run, the value is stored in the instance as state information, but
each time we fetch it via the managed attribute, its value is automatically squared:
c:\code> py −3 prop-computed.py
9
16
1024
Notice that we’ve made two different instances—because property methods automat-
ically receive a self argument, they have access to the state information stored in in-
stances. In our case, this means the fetch computes the square of the subject instance’s
own data.
Coding Properties with Decorators
Although we’re saving additional details until the next chapter, we introduced function
decorator basics earlier, in Chapter 32. Recall that the function decorator syntax:
@decorator
def func(args): ...
is automatically translated to this equivalent by Python, to rebind the function name
to the result of the decorator callable:
def func(args): ...
func = decorator(func)
Because of this mapping, it turns out that the property built-in can serve as a decorator,
to define a function that will run automatically when an attribute is fetched:
class Person:
    @property
    def name(self): ...                        # Rebinds: name = property(name)
When run, the decorated method is automatically passed to the first argument of the
property built-in. This is really just alternative syntax for creating a property and re-
binding the attribute name manually, but may be seen as more explicit in this role:
class Person:
    def name(self): ...
    name = property(name)
Setter and deleter decorators
As of Python 2.6 and 3.0, property objects also have getter, setter, and deleter meth-
ods that assign the corresponding property accessor methods and return a copy of the
property itself. We can use these to specify components of properties by decorating
normal methods too, though the getter component is usually filled in automatically
by the act of creating the property itself:
class Person:
    def __init__(self, name):
        self._name = name

    @property
    def name(self):                            # name = property(name)
        "name property docs"
        print('fetch...')
        return self._name

    @name.setter
    def name(self, value):                     # name = name.setter(name)
        print('change...')
        self._name = value

    @name.deleter
    def name(self):                            # name = name.deleter(name)
        print('remove...')
        del self._name

bob = Person('Bob Smith')                      # bob has a managed attribute
print(bob.name)                                # Runs name getter (name 1)
bob.name = 'Robert Smith'                      # Runs name setter (name 2)
print(bob.name)
del bob.name                                   # Runs name deleter (name 3)
print('-'*20)
sue = Person('Sue Jones')                      # sue inherits property too
print(sue.name)
print(Person.name.__doc__)                     # Or help(Person.name)
In fact, this code is equivalent to the first example in this section—decoration is just
an alternative way to code properties in this case. When it’s run, the results are the same:
c:\code> py −3 prop-person-deco.py
fetch...
Bob Smith
change...
fetch...
Robert Smith
remove...
--------------------
fetch...
Sue Jones
name property docs
Compared to manual assignment of property results, in this case using decorators to
code properties requires just three extra lines of code—a seemingly negligible differ-
ence. As is so often the case with alternative tools, though, the choice between the two
techniques is largely subjective.
Descriptors
Descriptors provide an alternative way to intercept attribute access; they are strongly
related to the properties discussed in the prior section. Really, a property is a kind of
descriptor—technically speaking, the property built-in is just a simplified way to create
a specific type of descriptor that runs method functions on attribute accesses. In fact,
descriptors are the underlying implementation mechanism for a variety of class tools,
including both properties and slots.
Functionally speaking, the descriptor protocol allows us to route a specific attribute’s
get, set, and delete operations to methods of a separate class’s instance object that we
provide. This allows us to insert code to be run automatically on attribute fetches and
assignments, intercept attribute deletions, and provide documentation for the at-
tributes if desired.
Descriptors are created as independent classes, and they are assigned to class attributes
just like method functions. Like any other class attribute, they are inherited by sub-
classes and instances. Their access-interception methods are provided with both a
self for the descriptor instance itself, as well as the instance of the client class whose
attribute references the descriptor object. Because of this, they can retain and use state
information of their own, as well as state information of the subject instance. For ex-
ample, a descriptor may call methods available in the client class, as well as descriptor-
specific methods it defines.
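For example, because the descriptor is itself a class instance, it can keep state of its
own, such as a fetch counter shared by all client instances; a minimal hypothetical
sketch:

class Count:
    def __init__(self):
        self.fetches = 0                       # State kept in the descriptor itself
    def __get__(self, instance, owner):
        self.fetches += 1                      # One counter for all client instances
        return self.fetches

class C:                                       # Use (object) in 2.X
    count = Count()

a, b = C(), C()
print(a.count, b.count, a.count)               # 1 2 3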
Like a property, a descriptor manages a single, specific attribute; although it can’t catch
all attribute accesses generically, it provides control over both fetch and assignment
accesses and allows us to change an attribute name freely from simple data to a com-
putation without breaking existing code. Properties really are just a convenient way to
create a specific kind of descriptor, and as we shall see, they can be coded as descriptors
directly.
Unlike properties, descriptors are broader in scope, and provide a more general tool.
For instance, because they are coded as normal classes, descriptors have their own state,
may participate in descriptor inheritance hierarchies, can use composition to aggregate
objects, and provide a natural structure for coding internal methods and attribute doc-
umentation strings.
The Basics
As mentioned previously, descriptors are coded as separate classes and provide spe-
cially named accessor methods for the attribute access operations they wish to intercept
—get, set, and deletion methods in the descriptor class are automatically run when the
attribute assigned to the descriptor class instance is accessed in the corresponding way:
class Descriptor:
    "docstring goes here"
    def __get__(self, instance, owner): ...    # Return attr value
    def __set__(self, instance, value): ...    # Return nothing (None)
    def __delete__(self, instance): ...        # Return nothing (None)
Classes with any of these methods are considered descriptors, and their methods are
special when one of their instances is assigned to another class’s attribute—when the
attribute is accessed, they are automatically invoked. If any of these methods are absent,
it generally means that the corresponding type of access is not supported. Unlike prop-
erties, however, omitting a __set__ allows the descriptor attribute’s name to be assigned
and thus redefined in an instance, thereby hiding the descriptor—to make an attribute
read-only, you must define __set__ to catch assignments and raise an exception.
Descriptors with __set__ methods also have some special-case implications for inher-
itance that we’ll largely defer until Chapter 40’s coverage of metaclasses and the com-
plete inheritance specification. In short, a descriptor with a __set__ is known formally
as a data descriptor, and is given precedence over other names located by normal inher-
itance rules. The inherited descriptor for name __class__, for example, overrides the
same name in an instance’s namespace dictionary. This also works to ensure that data
descriptors you code in your own classes take precedence over others.
Descriptor method arguments
Before we code anything realistic, let’s take a brief look at some fundamentals. All three
descriptor methods outlined in the prior section are passed both the descriptor class
instance (self), and the instance of the client class to which the descriptor instance is
attached (instance).
The __get__ access method additionally receives an owner argument, specifying the class
to which the descriptor instance is attached. Its instance argument is either the instance
through which the attribute was accessed (for instance.attr), or None when the at-
tribute is accessed through the owner class directly (for class.attr). The former of
these generally computes a value for instance access, and the latter usually returns
self if descriptor object access is supported.
For example, in the following 3.X session, when X.attr is fetched, Python automatically
runs the __get__ method of the Descriptor class instance to which the Subject.attr
class attribute is assigned. In 2.X, use the print statement equivalent, and derive both
classes here from object, as descriptors are a new-style class tool; in 3.X this derivation
is implied and can be omitted, but doesn’t hurt:
>>> class Descriptor:                          # Add "(object)" in 2.X
        def __get__(self, instance, owner):
            print(self, instance, owner, sep='\n')

>>> class Subject:                             # Add "(object)" in 2.X
        attr = Descriptor()                    # Descriptor instance is class attr
>>> X = Subject()
>>> X.attr
<__main__.Descriptor object at 0x0281E690>
<__main__.Subject object at 0x028289B0>
<class '__main__.Subject'>
>>> Subject.attr
<__main__.Descriptor object at 0x0281E690>
None
<class '__main__.Subject'>
Notice the arguments automatically passed in to the __get__ method in the first at-
tribute fetch—when X.attr is fetched, it’s as though the following translation occurs
(though the Subject.attr here doesn’t invoke __get__ again):
X.attr -> Descriptor.__get__(Subject.attr, X, Subject)
The descriptor knows it is being accessed directly when its instance argument is None.
Read-only descriptors
As mentioned earlier, unlike properties, simply omitting the __set__ method in a de-
scriptor isn’t enough to make an attribute read-only, because the descriptor name can
be assigned to an instance. In the following, the attribute assignment to X.a stores a in
the instance object X, thereby hiding the descriptor stored in class C:
>>> class D:
        def __get__(*args): print('get')

>>> class C:
        a = D()                                # Attribute a is a descriptor instance

>>> X = C()
>>> X.a                                        # Runs inherited descriptor __get__
get
>>> C.a
get
>>> X.a = 99                                   # Stored on X, hiding C.a!
>>> X.a
99
>>> list(X.__dict__.keys())
['a']
>>> Y = C()
>>> Y.a                                        # Y still inherits descriptor
get
>>> C.a
get
This is the way all instance attribute assignments work in Python, and it allows classes
to selectively override class-level defaults in their instances. To make a descriptor-based
attribute read-only, catch the assignment in the descriptor class and raise an exception
to prevent attribute assignment—when assigning an attribute that is a descriptor,
Python effectively bypasses the normal instance-level assignment behavior and routes
the operation to the descriptor object:
>>> class D:
        def __get__(*args): print('get')
        def __set__(*args): raise AttributeError('cannot set')

>>> class C:
        a = D()

>>> X = C()
>>> X.a                                        # Routed to C.a.__get__
get
>>> X.a = 99                                   # Routed to C.a.__set__
AttributeError: cannot set
Also be careful not to confuse the descriptor __delete__ method with
the general __del__ method. The former is called on attempts to delete
the managed attribute name on an instance of the owner class; the latter
is the general instance destructor method, run when an instance of any
kind of class is about to be garbage-collected. __delete__ is more closely
related to the __delattr__ generic attribute deletion method we’ll meet
later in this chapter. See Chapter 30 for more on operator overloading
methods.
A First Example
To see how this all comes together in more realistic code, let’s get started with the same
first example we wrote for properties. The following defines a descriptor that intercepts
access to an attribute named name in its clients. Its methods use their instance argument
to access state information in the subject instance, where the name string is actually
stored. Like properties, descriptors work properly only for new-style classes, so be sure
to derive both classes in the following from object if you’re using 2.X—it’s not enough
to derive just the descriptor, or just its client:
class Name:                                      # Use (object) in 2.X
    "name descriptor docs"
    def __get__(self, instance, owner):
        print('fetch...')
        return instance._name
    def __set__(self, instance, value):
        print('change...')
        instance._name = value
    def __delete__(self, instance):
        print('remove...')
        del instance._name

class Person:                                    # Use (object) in 2.X
    def __init__(self, name):
        self._name = name
    name = Name()                                # Assign descriptor to attr

bob = Person('Bob Smith')                        # bob has a managed attribute
print(bob.name)                                  # Runs Name.__get__
bob.name = 'Robert Smith'                        # Runs Name.__set__
print(bob.name)
del bob.name                                     # Runs Name.__delete__

print('-'*20)
sue = Person('Sue Jones')                        # sue inherits descriptor too
print(sue.name)
print(Name.__doc__)                              # Or help(Name)
Notice in this code how we assign an instance of our descriptor class to a class at-
tribute in the client class; because of this, it is inherited by all instances of the class, just
like a class’s methods. Really, we must assign the descriptor to a class attribute like this
—it won’t work if assigned to a self instance attribute instead. When the descriptor’s
__get__ method is run, it is passed three objects to define its context:
self is the Name class instance.
instance is the Person class instance.
owner is the Person class.
When this code is run the descriptor’s methods intercept accesses to the attribute, much
like the property version. In fact, the output is the same again:
c:\code> py -3 desc-person.py
fetch...
Bob Smith
change...
fetch...
Robert Smith
remove...
--------------------
fetch...
Sue Jones
name descriptor docs
Also like in the property example, our descriptor class instance is a class attribute and
thus is inherited by all instances of the client class and any subclasses. If we change the
Person class in our example to the following, for instance, the output of our script is
the same:
...
class Super:
    def __init__(self, name):
        self._name = name
    name = Name()

class Person(Super):                             # Descriptors are inherited (class attrs)
    pass
...
Also note that when a descriptor class is not useful outside the client class, it’s perfectly
reasonable to embed the descriptor’s definition inside its client syntactically. Here’s
what our example looks like if we use a nested class:
class Person:
    def __init__(self, name):
        self._name = name
    class Name:                                  # Using a nested class
        "name descriptor docs"
        def __get__(self, instance, owner):
            print('fetch...')
            return instance._name
        def __set__(self, instance, value):
            print('change...')
            instance._name = value
        def __delete__(self, instance):
            print('remove...')
            del instance._name
    name = Name()
When coded this way, Name becomes a local variable in the scope of the Person class
statement, such that it won’t clash with any names outside the class. This version works
the same as the original—we’ve simply moved the descriptor class definition into the
client class’s scope—but the last line of the testing code must change to fetch the doc-
string from its new location (per the example file desc-person-nested.py):
...
print(Person.Name.__doc__) # Differs: not Name.__doc__ outside class
Computed Attributes
As was the case when using properties, our first descriptor example of the prior section
didn’t do much—it simply printed trace messages for attribute accesses. In practice,
descriptors can also be used to compute attribute values each time they are fetched.
The following illustrates—it’s a rehash of the same example we coded for properties,
which uses a descriptor to automatically square an attribute’s value each time it is
fetched:
class DescSquare:
    def __init__(self, start):                   # Each desc has own state
        self.value = start
    def __get__(self, instance, owner):          # On attr fetch
        return self.value ** 2
    def __set__(self, instance, value):          # On attr assign
        self.value = value                       # No delete or docs

class Client1:
    X = DescSquare(3)                            # Assign descriptor instance to class attr

class Client2:
    X = DescSquare(32)                           # Another instance in another client class
                                                 # Could also code two instances in same class
c1 = Client1()
c2 = Client2()
print(c1.X)                                      # 3 ** 2
c1.X = 4
print(c1.X)                                      # 4 ** 2
print(c2.X)                                      # 32 ** 2 (1024)
When run, the output of this example is the same as that of the original property-based
version, but here a descriptor class object is intercepting the attribute accesses:
c:\code> py -3 desc-computed.py
9
16
1024
Using State Information in Descriptors
If you study the two descriptor examples we’ve written so far, you might notice that
they get their information from different places—the first (the name attribute example)
uses data stored on the client instance, and the second (the attribute squaring example)
uses data attached to the descriptor object itself (a.k.a. self). In fact, descriptors can
use both instance state and descriptor state, or any combination thereof:
Descriptor state is used to manage either data internal to the workings of the de-
scriptor, or data that spans all instances. It can vary per attribute appearance (often,
per client class).
Instance state records information related to and possibly created by the client class.
It can vary per client class instance (that is, per application object).
In other words, descriptor state is per-descriptor data and instance state is per-client-
instance data. As usual in OOP, you must choose state carefully. For instance, you
would not normally use descriptor state to record employee names, since each client
instance requires its own value—if stored in the descriptor, each client class instance
will effectively share the same single copy. On the other hand, you would not usually
use instance state to record data pertaining to descriptor implementation internals—if
stored in each instance, there would be multiple varying copies.
Descriptor methods may use either state form, but descriptor state often makes it un-
necessary to use special naming conventions to avoid name collisions in the instance
for data that is not instance-specific. For example, the following descriptor attaches
information to its own instance, so it doesn’t clash with that on the client class’s in-
stance—but also shares that information between two client instances:
class DescState:                                 # Use descriptor state, (object) in 2.X
    def __init__(self, value):
        self.value = value
    def __get__(self, instance, owner):          # On attr fetch
        print('DescState get')
        return self.value * 10
    def __set__(self, instance, value):          # On attr assign
        print('DescState set')
        self.value = value

# Client class
class CalcAttrs:
    X = DescState(2)                             # Descriptor class attr
    Y = 3                                        # Class attr
    def __init__(self):
        self.Z = 4                               # Instance attr

obj = CalcAttrs()
print(obj.X, obj.Y, obj.Z)                       # X is computed, others are not
obj.X = 5                                        # X assignment is intercepted
CalcAttrs.Y = 6                                  # Y reassigned in class
obj.Z = 7                                        # Z assigned in instance
print(obj.X, obj.Y, obj.Z)

obj2 = CalcAttrs()                               # But X uses shared data, like Y!
print(obj2.X, obj2.Y, obj2.Z)
This code’s internal value information lives only in the descriptor, so there won’t be a
collision if the same name is used in the client’s instance. Notice that only the descriptor
attribute is managed here—get and set accesses to X are intercepted, but accesses to Y
and Z are not (Y is attached to the client class and Z to the instance). When this code is
run, X is computed when fetched, but its value is also the same for all client instances
because it uses descriptor-level state:
c:\code> py -3 desc-state-desc.py
DescState get
20 3 4
DescState set
DescState get
50 6 7
DescState get
50 6 4
It’s also feasible for a descriptor to store or use an attribute attached to the client class’s
instance, instead of itself. Crucially, unlike data stored in the descriptor itself, this allows
for data that can vary per client class instance. The descriptor in the following example
assumes the instance has an attribute _X attached by the client class, and uses it to
compute the value of the attribute it represents:
class InstState:                                 # Using instance state, (object) in 2.X
    def __get__(self, instance, owner):
        print('InstState get')                   # Assume set by client class
        return instance._X * 10
    def __set__(self, instance, value):
        print('InstState set')
        instance._X = value

# Client class
class CalcAttrs:
    X = InstState()                              # Descriptor class attr
    Y = 3                                        # Class attr
    def __init__(self):
        self._X = 2                              # Instance attr
        self.Z = 4                               # Instance attr

obj = CalcAttrs()
print(obj.X, obj.Y, obj.Z)                       # X is computed, others are not
obj.X = 5                                        # X assignment is intercepted
CalcAttrs.Y = 6                                  # Y reassigned in class
obj.Z = 7                                        # Z assigned in instance
print(obj.X, obj.Y, obj.Z)

obj2 = CalcAttrs()                               # But X differs now, like Z!
print(obj2.X, obj2.Y, obj2.Z)
Here, X is assigned to a descriptor as before that manages accesses. The new descriptor
here, though, has no information itself, but it uses an attribute assumed to exist in the
instance—that attribute is named _X, to avoid collisions with the name of the descriptor
itself. When this version is run the results are similar, but the value of the descriptor
attribute can vary per client instance due to the differing state policy:
c:\code> py -3 desc-state-inst.py
InstState get
20 3 4
InstState set
InstState get
50 6 7
InstState get
20 6 4
Both descriptor and instance state have roles. In fact, this is a general advantage that
descriptors have over properties—because they have state of their own, they can easily
retain data internally, without adding it to the namespace of the client instance object.
As a summary, the following uses both state sources—its self.data retains per-attribute
information, while its instance.data can vary per client instance:
>>> class DescBoth:
        def __init__(self, data):
            self.data = data
        def __get__(self, instance, owner):
            return '%s, %s' % (self.data, instance.data)
        def __set__(self, instance, value):
            instance.data = value

>>> class Client:
        def __init__(self, data):
            self.data = data
        managed = DescBoth('spam')

>>> I = Client('eggs')
>>> I.managed                                    # Show both data sources
'spam, eggs'
>>> I.managed = 'SPAM'                           # Change instance data
>>> I.managed
'spam, SPAM'
We’ll revisit the implications of this choice in a larger case study later in this chapter.
Before we move on, recall from Chapter 32’s coverage of slots that we can access “vir-
tual” attributes like properties and descriptors with tools like dir and getattr, even
though they don’t exist in the instance’s namespace dictionary. Whether you should
access these this way probably varies per program—properties and descriptors may
run arbitrary computation, and may be less obviously instance “data” than slots:
>>> I.__dict__
{'data': 'SPAM'}
>>> [x for x in dir(I) if not x.startswith('__')]
['data', 'managed']
>>> getattr(I, 'data')
'SPAM'
>>> getattr(I, 'managed')
'spam, SPAM'
>>> for attr in (x for x in dir(I) if not x.startswith('__')):
        print('%s => %s' % (attr, getattr(I, attr)))
data => SPAM
managed => spam, SPAM
The more generic __getattr__ and __getattribute__ tools we'll meet later are not designed to support this functionality—because they have no class-level attributes, their "virtual" attribute names do not appear in dir results. In exchange, they are also not limited to specific attribute names coded as properties or descriptors, tools that share even more than this behavior, as the next section explains.
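For a quick demonstration of this contrast, a name computed by __getattr__ works when fetched but never registers in dir results, because no class-level attribute records it. This is a brief illustrative session, not one of this chapter's example files:
>>> class Virtual:
        def __getattr__(self, name):             # No class-level 'managed' here
            if name == 'managed': return 'spam'
            raise AttributeError(name)

>>> V = Virtual()
>>> V.managed                                    # Computed on fetch...
'spam'
>>> 'managed' in dir(V)                          # ...but invisible to dir
False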
How Properties and Descriptors Relate
As mentioned earlier, properties and descriptors are strongly related—the property
built-in is just a convenient way to create a descriptor. Now that you know how both
work, you should also be able to see that it’s possible to simulate the property built-in
with a descriptor class like the following:
class Property:
    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel                         # Save unbound methods
        self.__doc__ = doc                       # or other callables
    def __get__(self, instance, instancetype=None):
        if instance is None:
            return self
        if self.fget is None:
            raise AttributeError("can't get attribute")
        return self.fget(instance)               # Pass instance to self
                                                 # in property accessors
    def __set__(self, instance, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(instance, value)
    def __delete__(self, instance):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(instance)

class Person:
    def getName(self): print('getName...')
    def setName(self, value): print('setName...')
    name = Property(getName, setName)            # Use like property()

x = Person()
x.name
x.name = 'Bob'
del x.name
This Property class catches attribute accesses with the descriptor protocol and routes
requests to functions or methods passed in and saved in descriptor state when the class
is created. Attribute fetches, for example, are routed from the Person class, to the
Property class’s __get__ method, and back to the Person class’s getName. With descrip-
tors, this “just works”:
c:\code> py -3 prop-desc-equiv.py
getName...
setName...
AttributeError: can't delete attribute
Note that this descriptor class equivalent only handles basic property usage, though; to use @ decorator syntax to also specify set and delete operations, we'd have to extend our Property class with setter and deleter methods, which would save the decorated accessor function and return the property object (self should suffice). Since the property built-in already does this, we'll omit a formal coding of this extension here.
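If you're curious, here is one minimal sketch of what such an extension might look like. This is an illustration of the idea only, not the book's official code, and for simplicity its methods mutate and return self, where the real property builds new objects:
class Property:
    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget, self.fset, self.fdel = fget, fset, fdel
        self.__doc__ = doc
    def __get__(self, instance, instancetype=None):
        if instance is None:
            return self
        return self.fget(instance)
    def __set__(self, instance, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(instance, value)
    def __delete__(self, instance):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(instance)
    def setter(self, fset):                      # Used as @name.setter
        self.fset = fset                         # Save decorated accessor
        return self                              # Returning self suffices here
    def deleter(self, fdel):                     # Used as @name.deleter
        self.fdel = fdel
        return self

class Person:
    @Property                                    # name = Property(fget)
    def name(self): print('getName...')
    @name.setter                                 # name = name.setter(fset)
    def name(self, value): print('setName...')

x = Person()
x.name                                           # getName...
x.name = 'Bob'                                   # setName...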
Descriptors and slots and more
You can also probably now at least in part imagine how descriptors are used to imple-
ment Python’s slots extension: instance attribute dictionaries are avoided by creating
class-level descriptors that intercept slot name access, and map those names to se-
quential storage space in the instance. Unlike the explicit property call, though, much
of the magic behind slots is orchestrated at class creation time both automatically and
implicitly, when a __slots__ attribute is present in a class.
See Chapter 32 for more on slots (and why they’re not recommended except in patho-
logical use cases). Descriptors are also used for other class tools, but we’ll omit further
internals details here; see Python’s manuals and source code for more details.
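To make the idea more concrete, here is a toy sketch of the general technique: class-level descriptors that translate fixed names into offsets in sequential per-instance storage. The Member class and its list-based storage are illustrative inventions only; Python's real slots machinery is implemented in C and avoids the instance __dict__ altogether:
class Member:                                    # Toy stand-in for a slot descriptor
    def __init__(self, index):
        self.index = index                       # Each descriptor records its offset
    def __get__(self, instance, owner):
        return instance._slots[self.index]
    def __set__(self, instance, value):
        instance._slots[self.index] = value

class Point:                                     # As though __slots__ = ['x', 'y']
    x = Member(0)                                # Class-level descriptors intercept
    y = Member(1)                                # slot names on each access
    def __init__(self, x, y):
        self._slots = [None, None]               # Sequential storage per instance
        self.x, self.y = x, y                    # Routed through the descriptors

p = Point(1, 2)
print(p.x, p.y)                                  # 1 2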
In Chapter 39, we’ll also make use of descriptors to implement function
decorators that apply to both functions and methods. As you’ll see there,
because descriptors receive both descriptor and subject class instances
they work well in this role, though nested functions are usually a con-
ceptually much simpler solution. We’ll also deploy descriptors as one
way to intercept built-in operation method fetches in Chapter 39.
Be sure to also see Chapter 40's coverage of data descriptors' precedence in the full inheritance model mentioned earlier: with a __set__, descriptors override other names, and are thus fairly binding—they cannot be hidden by names in instance dictionaries.
__getattr__ and __getattribute__
So far, we’ve studied properties and descriptors—tools for managing specific attributes.
The __getattr__ and __getattribute__ operator overloading methods provide still
other ways to intercept attribute fetches for class instances. Like properties and de-
scriptors, they allow us to insert code to be run automatically when attributes are ac-
cessed. As we’ll see, though, these two methods can also be used in more general ways.
Because they intercept arbitrary names, they apply in broader roles such as delegation,
but may also incur extra calls in some contexts, and are too dynamic to register in
dir results.
Attribute fetch interception comes in two flavors, coded with two different methods:
__getattr__ is run for undefined attributes—because it is run only for attributes not stored on an instance or inherited from one of its classes, its use is straightforward.
__getattribute__ is run for every attribute—because it is all-inclusive, you must be cautious when using this method to avoid recursive loops by passing attribute accesses to a superclass.
We met the former of these in Chapter 30; it’s available for all Python versions. The
latter of these is available for new-style classes in 2.X, and for all (implicitly new-style)
classes in 3.X. These two methods are representatives of a set of attribute interception
methods that also includes __setattr__ and __delattr__. Because these methods have
similar roles, though, we will generally treat them all as a single topic here.
Unlike properties and descriptors, these methods are part of Python’s general operator
overloading protocol—specially named methods of a class, inherited by subclasses, and
run automatically when instances are used in the implied built-in operation. Like all
normal methods of a class, they each receive a first self argument when called, giving
access to any required instance state information as well as other methods of the class
in which they appear.
The __getattr__ and __getattribute__ methods are also more generic than properties
and descriptors—they can be used to intercept access to any (or even all) instance
attribute fetches, not just a single specific name. Because of this, these two methods
are well suited to general delegation-based coding patterns—they can be used to im-
plement wrapper (a.k.a. proxy) objects that manage all attribute accesses for an em-
bedded object. By contrast, we must define one property or descriptor for every at-
tribute we wish to intercept. As we’ll see ahead, this role is impaired somewhat in new-
style classes for built-in operations, but still applies to all named methods in a wrapped
object’s interface.
Finally, these two methods are more narrowly focused than the alternatives we considered earlier: they intercept attribute fetches only, not assignments. To also catch attribute changes by assignment, we must code a __setattr__ method—an operator overloading method run for every attribute assignment, which must take care to avoid recursive loops by routing attribute assignments through the instance namespace dictionary
or a superclass method. Although less common, we can also code a __delattr__ over-
loading method (which must avoid looping in the same way) to intercept attribute
deletions. By contrast, properties and descriptors catch get, set, and delete operations
by design.
Most of these operator overloading methods were introduced earlier in the book; here,
we’ll expand on their usage and study their roles in larger contexts.
The Basics
__getattr__ and __setattr__ were introduced in Chapter 30 and Chapter 32, and
__getattribute__ was mentioned briefly in Chapter 32. In short, if a class defines or
inherits the following methods, they will be run automatically when an instance is used
in the context described by the comments to the right:
def __getattr__(self, name): # On undefined attribute fetch [obj.name]
def __getattribute__(self, name): # On all attribute fetch [obj.name]
def __setattr__(self, name, value): # On all attribute assignment [obj.name=value]
def __delattr__(self, name): # On all attribute deletion [del obj.name]
In all of these, self is the subject instance object as usual, name is the string name of the
attribute being accessed, and value is the object being assigned to the attribute. The
two get methods normally return an attribute’s value, and the other two return nothing
(None). All can raise exceptions to signal prohibited access.
For example, to catch every attribute fetch, we can use either of the first two previous
methods, and to catch every attribute assignment we can use the third. The following
uses __getattr__ and works portably on both Python 2.X and 3.X, not requiring new-
style object derivation in 2.X:
class Catcher:
    def __getattr__(self, name):
        print('Get: %s' % name)
    def __setattr__(self, name, value):
        print('Set: %s %s' % (name, value))

X = Catcher()
X.job                                            # Prints "Get: job"
X.pay                                            # Prints "Get: pay"
X.pay = 99                                       # Prints "Set: pay 99"
Using __getattribute__ works exactly the same in this specific case, but requires
object derivation in 2.X (only), and has subtle looping potential, which we’ll take up
in the next section:
class Catcher(object):                           # Need (object) in 2.X only
    def __getattribute__(self, name):            # Works same as getattr here
        print('Get: %s' % name)                  # But prone to loops in general
    ...rest unchanged...
Such a coding structure can be used to implement the delegation design pattern we met
earlier, in Chapter 31. Because all attributes are routed to our interception methods
generically, we can validate and pass them along to embedded, managed objects. The
following class (borrowed from Chapter 31), for example, traces every attribute fetch
made to another object passed to the wrapper (proxy) class:
class Wrapper:
    def __init__(self, object):
        self.wrapped = object                    # Save object
    def __getattr__(self, attrname):
        print('Trace: ' + attrname)              # Trace fetch
        return getattr(self.wrapped, attrname)   # Delegate fetch

X = Wrapper([1, 2, 3])
X.append(4)                                      # Prints "Trace: append"
print(X.wrapped)                                 # Prints "[1, 2, 3, 4]"
There is no such analog for properties and descriptors, short of coding accessors for
every possible attribute in every possibly wrapped object. On the other hand, when
such generality is not required, generic accessor methods may incur additional calls for
assignments in some contexts—a tradeoff described in Chapter 30 and mentioned in
the context of the case study example we’ll explore at the end of this chapter.
Avoiding loops in attribute interception methods
These methods are generally straightforward to use; their only substantially complex
aspect is the potential for looping (a.k.a. recursing). Because __getattr__ is called for
undefined attributes only, it can freely fetch other attributes within its own code. How-
ever, because __getattribute__ and __setattr__ are run for all attributes, their code
needs to be careful when accessing other attributes to avoid calling themselves again
and triggering a recursive loop.
For example, another attribute fetch run inside a __getattribute__ method's code will trigger __getattribute__ again, and the code will usually loop until memory is exhausted:
def __getattribute__(self, name):
    x = self.other                               # LOOPS!
Technically, this method is even more loop-prone than this may imply—a self attribute reference run anywhere in a class that defines this method will trigger __getattribute__, and also has the potential to loop depending on the class's logic. This is normally desired behavior—intercepting every attribute fetch is this method's purpose, after all—but you should be aware that this method catches all attribute fetches wherever they are coded. When coded within __getattribute__ itself, this almost always causes a loop. To avoid this loop, route the fetch through a higher superclass instead to skip this level's version—because the object class is always a new-style superclass, it serves well in this role:
def __getattribute__(self, name):
    x = object.__getattribute__(self, 'other')   # Force higher to avoid me
For __setattr__, the situation is similar, as summarized in Chapter 30—assigning
any attribute inside this method triggers __setattr__ again and may create a similar
loop:
def __setattr__(self, name, value):
    self.other = value                           # Recurs (and might LOOP!)
Here too, self attribute assignments anywhere in a class defining this method trigger
__setattr__ as well, though the potential for looping is much stronger when they show
up in __setattr__ itself. To work around this problem, you can assign the attribute as
a key in the instance’s __dict__ namespace dictionary instead. This avoids direct at-
tribute assignment:
def __setattr__(self, name, value):
    self.__dict__['other'] = value               # Use attr dict to avoid me
Although it’s a less traditional approach, __setattr__ can also pass its own attribute
assignments to a higher superclass to avoid looping, just like __getattribute__ (and
per the upcoming note, this scheme is sometimes preferred):
def __setattr__(self, name, value):
    object.__setattr__(self, 'other', value)     # Force higher to avoid me
By contrast, though, we cannot use the __dict__ trick to avoid loops in __getattribute__:
def __getattribute__(self, name):
    x = self.__dict__['other']                   # Loops!
Fetching the __dict__ attribute itself triggers __getattribute__ again, causing a recursive loop. Strange but true!
The __delattr__ method is less commonly used in practice, but when it is, it is called
for every attribute deletion (just as __setattr__ is called for every attribute assignment).
When using this method, you must take care to avoid loops when deleting attributes,
by using the same techniques: namespace dictionary operations or superclass method calls.
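For reference, a minimal __delattr__ coded with the namespace dictionary technique might look like the following sketch; the commented line shows the equivalent superclass-call form:
class Data:
    def __delattr__(self, name):
        print('del: ' + name)
        del self.__dict__[name]                  # __dict__ op avoids looping here
        # object.__delattr__(self, name)         # Superclass call: alternative form

X = Data()
X.attr = 99                                      # Normal assignment: no __setattr__ coded
del X.attr                                       # Prints "del: attr"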
As noted in Chapter 30, attributes implemented with new-style class
features such as slots and properties are not physically stored in the in-
stance’s __dict__ namespace dictionary (and slots may even preclude
its existence entirely). Because of this, code that wishes to support such
attributes should code __setattr__ to assign with the
object.__setattr__ scheme shown here, not by self.__dict__ index-
ing. Namespace __dict__ operations suffice for classes known to store
data in instances, like this chapter’s self-contained examples; general
tools, though, should prefer object.
A First Example
Generic attribute management is not nearly as complicated as the prior section may
have implied. To see how to put these ideas to work, here is the same first example we
used for properties and descriptors in action again, this time implemented with at-
tribute operator overloading methods. Because these methods are so generic, we test
attribute names here to know when a managed attribute is being accessed; others are
allowed to pass normally:
class Person:                                    # Portable: 2.X or 3.X
    def __init__(self, name):                    # On [Person()]
        self._name = name                        # Triggers __setattr__!
    def __getattr__(self, attr):                 # On [obj.undefined]
        print('get: ' + attr)
        if attr == 'name':                       # Intercept name: not stored
            return self._name                    # Does not loop: real attr
        else:                                    # Others are errors
            raise AttributeError(attr)
    def __setattr__(self, attr, value):          # On [obj.any = value]
        print('set: ' + attr)
        if attr == 'name':
            attr = '_name'                       # Set internal name
        self.__dict__[attr] = value              # Avoid looping here
    def __delattr__(self, attr):                 # On [del obj.any]
        print('del: ' + attr)
        if attr == 'name':
            attr = '_name'                       # Avoid looping here too
        del self.__dict__[attr]                  # but much less common

bob = Person('Bob Smith')                        # bob has a managed attribute
print(bob.name)                                  # Runs __getattr__
bob.name = 'Robert Smith'                        # Runs __setattr__
print(bob.name)
del bob.name                                     # Runs __delattr__

print('-'*20)
sue = Person('Sue Jones')                        # sue inherits property too
print(sue.name)
#print(Person.name.__doc__) # No equivalent here
Notice that the attribute assignment in the __init__ constructor triggers __setattr__
too—this method catches every attribute assignment, even those anywhere within the
class itself. When this code is run, the same output is produced, but this time it’s the
result of Python’s normal operator overloading mechanism and our attribute intercep-
tion methods:
c:\code> py -3 getattr-person.py
set: _name
get: name
Bob Smith
set: name
get: name
Robert Smith
del: name
--------------------
set: _name
get: name
Sue Jones
Also note that, unlike with properties and descriptors, there’s no direct notion of spec-
ifying documentation for our attribute here; managed attributes exist within the code
of our interception methods, not as distinct objects.
Using __getattribute__
To achieve exactly the same results with __getattribute__, replace __getattr__ in the
example with the following; because it catches all attribute fetches, this version must
be careful to avoid looping by passing new fetches to a superclass, and it can’t generally
assume unknown names are errors:
# Replace __getattr__ with this
def __getattribute__(self, attr):                # On [obj.any]
    print('get: ' + attr)
    if attr == 'name':                           # Intercept all names
        attr = '_name'                           # Map to internal name
    return object.__getattribute__(self, attr)   # Avoid looping here
When run with this change, the output is similar, but we get an extra __getattribute__ call for the fetch in __setattr__ (the first time originating in __init__):
c:\code> py -3 getattribute-person.py
set: _name
get: __dict__
get: name
Bob Smith
set: name
get: __dict__
get: name
Robert Smith
del: name
get: __dict__
--------------------
set: _name
get: __dict__
get: name
Sue Jones
This example is equivalent to that coded for properties and descriptors, but it's a bit artificial, and it doesn't really highlight these tools' assets. Because they are generic, __getattr__ and __getattribute__ are probably more commonly used in delegation-based code (as sketched earlier), where attribute access is validated and routed to an embedded object. Where just a single attribute must be managed, properties and descriptors might do as well or better.
Computed Attributes
As before, our prior example doesn’t really do anything but trace attribute fetches; it’s
not much more work to compute an attribute’s value when fetched. As for properties
and descriptors, the following creates a virtual attribute X that runs a calculation when
fetched:
class AttrSquare:
    def __init__(self, start):
        self.value = start                       # Triggers __setattr__!
    def __getattr__(self, attr):                 # On undefined attr fetch
        if attr == 'X':
            return self.value ** 2               # value is not undefined
        else:
            raise AttributeError(attr)
    def __setattr__(self, attr, value):          # On all attr assignments
        if attr == 'X':
            attr = 'value'
        self.__dict__[attr] = value

A = AttrSquare(3)                                # 2 instances of class with overloading
B = AttrSquare(32)                               # Each has different state information

print(A.X)                                       # 3 ** 2
A.X = 4
print(A.X)                                       # 4 ** 2
print(B.X)                                       # 32 ** 2 (1024)
Running this code results in the same output that we got earlier when using properties
and descriptors, but this script’s mechanics are based on generic attribute interception
methods:
c:\code> py -3 getattr-computed.py
9
16
1024
Using __getattribute__
As before, we can achieve the same effect with __getattribute__ instead of __getattr__; the following replaces the fetch method with a __getattribute__ and changes
the __setattr__ assignment method to avoid looping by using direct superclass method
calls instead of __dict__ keys:
class AttrSquare:                                # Add (object) for 2.X
    def __init__(self, start):
        self.value = start                       # Triggers __setattr__!
    def __getattribute__(self, attr):            # On all attr fetches
        if attr == 'X':
            return self.value ** 2               # Triggers __getattribute__ again!
        else:
            return object.__getattribute__(self, attr)
    def __setattr__(self, attr, value):          # On all attr assignments
        if attr == 'X':
            attr = 'value'
        object.__setattr__(self, attr, value)
When this version, getattribute-computed.py, is run, the results are the same again.
Notice, though, the implicit routing going on inside this class’s methods:
self.value=start inside the constructor triggers __setattr__
self.value inside __getattribute__ triggers __getattribute__ again
In fact, __getattribute__ is run twice each time we fetch attribute X. This doesn’t hap-
pen in the __getattr__ version, because the value attribute is not undefined. If you care
about speed and want to avoid this, change __getattribute__ to use the superclass to
fetch value as well:
def __getattribute__(self, attr):
    if attr == 'X':
        return object.__getattribute__(self, 'value') ** 2
Of course, this still incurs a call to the superclass method, but not an additional recur-
sive call before we get there. Add print calls to these methods to trace how and when
they run.
__getattr__ and __getattribute__ Compared
To summarize the coding differences between __getattr__ and __getattribute__, the
following example uses both to implement three attributes—attr1 is a class attribute,
attr2 is an instance attribute, and attr3 is a virtual managed attribute computed when
fetched:
class GetAttr:
    attr1 = 1
    def __init__(self):
        self.attr2 = 2
    def __getattr__(self, attr):                 # On undefined attrs only
        print('get: ' + attr)                    # Not on attr1: inherited from class
        if attr == 'attr3':                      # Not on attr2: stored on instance
            return 3
        else:
            raise AttributeError(attr)

X = GetAttr()
print(X.attr1)
print(X.attr2)
print(X.attr3)
print('-'*20)

class GetAttribute(object):                      # (object) needed in 2.X only
    attr1 = 1
    def __init__(self):
        self.attr2 = 2
    def __getattribute__(self, attr):            # On all attr fetches
        print('get: ' + attr)                    # Use superclass to avoid looping here
        if attr == 'attr3':
            return 3
        else:
            return object.__getattribute__(self, attr)

X = GetAttribute()
print(X.attr1)
print(X.attr2)
print(X.attr3)
When run, the __getattr__ version intercepts only attr3 accesses, because it is unde-
fined. The __getattribute__ version, on the other hand, intercepts all attribute fetches
and must route those it does not manage to the superclass fetcher to avoid loops:
c:\code> py -3 getattr-v-getattr.py
1
2
get: attr3
3
--------------------
get: attr1
1
get: attr2
2
get: attr3
3
Although __getattribute__ can catch more attribute fetches than __getattr__, in prac-
tice they are often just variations on a theme—if attributes are not physically stored,
the two have the same effect.
Management Techniques Compared
To summarize the coding differences in all four attribute management schemes we’ve
seen in this chapter, let’s quickly step through a somewhat more comprehensive com-
puted-attribute example using each technique, coded to run in either Python 3.X or
2.X. The following first version uses properties to intercept and calculate attributes
named square and cube. Notice how their base values are stored in names that begin
with an underscore, so they don’t clash with the names of the properties themselves:
# Two dynamically computed attributes with properties
class Powers(object):                            # Need (object) in 2.X only
    def __init__(self, square, cube):
        self._square = square                    # _square is the base value
        self._cube = cube                        # square is the property name
    def getSquare(self):
        return self._square ** 2
    def setSquare(self, value):
        self._square = value
    square = property(getSquare, setSquare)
    def getCube(self):
        return self._cube ** 3
    cube = property(getCube)

X = Powers(3, 4)
print(X.square)                                  # 3 ** 2 = 9
print(X.cube)                                    # 4 ** 3 = 64
X.square = 5
print(X.square)                                  # 5 ** 2 = 25
To do the same with descriptors, we define the attributes with complete classes. Note
that these descriptors store base values as instance state, so they must use leading un-
derscores again so as not to clash with the names of descriptors; as we’ll see in the final
example of this chapter, we could avoid this renaming requirement by storing base
values as descriptor state instead, but that doesn’t as directly address data that must
vary per client class instance:
# Same, but with descriptors (per-instance state)
class DescSquare(object):
    def __get__(self, instance, owner):
        return instance._square ** 2
    def __set__(self, instance, value):
        instance._square = value

class DescCube(object):
    def __get__(self, instance, owner):
        return instance._cube ** 3

class Powers(object):                            # Need all (object) in 2.X only
    square = DescSquare()
    cube = DescCube()
    def __init__(self, square, cube):
        self._square = square                    # "self.square = square" works too,
        self._cube = cube                        # because it triggers desc __set__!

X = Powers(3, 4)
print(X.square)                                  # 3 ** 2 = 9
print(X.cube)                                    # 4 ** 3 = 64
X.square = 5
print(X.square)                                  # 5 ** 2 = 25
To achieve the same result with __getattr__ fetch interception, we again store base
values with underscore-prefixed names so that accesses to managed names are unde-
fined and thus invoke our method; we also need to code a __setattr__ to intercept
assignments, and take care to avoid its potential for looping:
# Same, but with generic __getattr__ undefined attribute interception
class Powers:
    def __init__(self, square, cube):
        self._square = square
        self._cube = cube
    def __getattr__(self, name):
        if name == 'square':
            return self._square ** 2
        elif name == 'cube':
            return self._cube ** 3
        else:
            raise TypeError('unknown attr:' + name)
    def __setattr__(self, name, value):
        if name == 'square':
            self.__dict__['_square'] = value     # Or use object
        else:
            self.__dict__[name] = value

X = Powers(3, 4)
print(X.square)                                  # 3 ** 2 = 9
print(X.cube)                                    # 4 ** 3 = 64
X.square = 5
print(X.square)                                  # 5 ** 2 = 25
The final option, coding this with __getattribute__, is similar to the prior version.
Because we catch every attribute now, though, we must also route base value fetches
to a superclass to avoid looping or extra calls—fetching self._square directly works
too, but runs a second __getattribute__ call:
# Same, but with generic __getattribute__ all attribute interception
class Powers(object):                            # Need (object) in 2.X only
    def __init__(self, square, cube):
        self._square = square
        self._cube = cube
    def __getattribute__(self, name):
        if name == 'square':
            return object.__getattribute__(self, '_square') ** 2
        elif name == 'cube':
            return object.__getattribute__(self, '_cube') ** 3
        else:
            return object.__getattribute__(self, name)
    def __setattr__(self, name, value):
        if name == 'square':
            object.__setattr__(self, '_square', value)   # Or use __dict__
        else:
            object.__setattr__(self, name, value)

X = Powers(3, 4)
print(X.square)                                  # 3 ** 2 = 9
print(X.cube)                                    # 4 ** 3 = 64
X.square = 5
print(X.square)                                  # 5 ** 2 = 25
As you can see, each technique takes a different form in code, but all four produce the
same result when run:
9
64
25
For more on how these alternatives compare, and other coding options, stay tuned for
a more realistic application of them in the attribute validation example in the section
“Example: Attribute Validations” on page 1256. First, though, we need to take a short
side trip to study a new-style-class pitfall associated with two of these tools—the generic
attribute interceptors presented in this section.
Intercepting Built-in Operation Attributes
If you’ve been reading this book linearly, some of this section is review and elaboration
on material covered earlier, especially in Chapter 32. For others, this topic is presented
in this chapter’s context here.
When I introduced __getattr__ and __getattribute__, I stated that they intercept un-
defined and all attribute fetches, respectively, which makes them ideal for delegation-
based coding patterns. While this is true for both normally named and explicitly
called attributes, their behavior needs some additional clarification: for method-name
attributes implicitly fetched by built-in operations, these methods may not be run at
all. This means that operator overloading method calls cannot be delegated to wrapped
objects unless wrapper classes somehow redefine these methods themselves.
For example, attribute fetches for the __str__, __add__, and __getitem__ methods run
implicitly by printing, + expressions, and indexing, respectively, are not routed to the
generic attribute interception methods in 3.X. Specifically:
In Python 3.X, neither __getattr__ nor __getattribute__ is run for such attributes.
In Python 2.X classic classes, __getattr__ is run for such attributes if they are un-
defined in the class.
In Python 2.X, __getattribute__ is available for new-style classes only and works
as it does in 3.X.
In other words, in all Python 3.X classes (and 2.X new-style classes), there is no direct
way to generically intercept built-in operations like printing and addition. In Python
2.X’s default classic classes, the methods such operations invoke are looked up at run-
time in instances, like all other attributes; in Python 3.X’s new-style classes such meth-
ods are looked up in classes instead. Since 3.X mandates new-style classes and 2.X
defaults to classic, this is understandably attributed to 3.X, but it can happen in 2.X
new-style code too. In 2.X, though, you at least have a way to avoid this change; in 3.X,
you do not.
Per Chapter 32, the official (though tersely documented) rationale for this change appears to revolve around metaclasses and optimization of built-in operations. Regardless, given that all attributes—both normally named and others—still dispatch generically through the instance and these methods when accessed explicitly by name, this does not seem meant to preclude delegation in general; it seems more an optimization step for built-in operations' implicit behavior. This does, however, make delegation-based coding patterns more complex in 3.X, because object interface proxies cannot generically intercept operator overloading method calls and route them to an embedded object.
This is an inconvenience, but is not necessarily a showstopper—wrapper classes can work around this constraint by redefining all relevant operator overloading methods in the wrapper itself, in order to delegate calls. These extra methods can be added either manually, with tools, or by definition in and inheritance from common superclasses. This does, however, make object wrappers more work than they used to be when operator overloading methods are a part of a wrapped object's interface.
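The following sketch shows the general shape of this workaround: an illustrative proxy, not a complete solution, which must spell out each operator overloading method it wishes to pass along under 3.X:
class Proxy:
    def __init__(self, wrapped):
        self.wrapped = wrapped                   # Save embedded object
    def __getattr__(self, name):                 # Normally named attrs still route here
        return getattr(self.wrapped, name)
    def __str__(self):                           # Built-ins must be redefined to
        return str(self.wrapped)                 # delegate in 3.X's classes
    def __add__(self, other):
        return self.wrapped + other              # One such method per operator used

X = Proxy(42)
print(X)                                         # 42: via the redefined __str__
print(X + 1)                                     # 43: via the redefined __add__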
Keep in mind that this issue applies only to __getattr__ and __getattribute__. Because
properties and descriptors are defined for specific attributes only, they don’t really
apply to delegation-based classes at all—a single property or descriptor cannot be used
to intercept arbitrary attributes. Moreover, a class that defines both operator overload-
ing methods and attribute interception will work correctly, regardless of the type of
attribute interception defined. Our concern here is only with classes that do not have
operator overloading methods defined, but try to intercept them generically.
Consider the following example, the file getattr-builtins.py, which tests various attribute types and built-in operations on instances of classes containing __getattr__ and __getattribute__ methods:
class GetAttr:
    eggs = 88                                    # eggs stored on class, spam on instance
    def __init__(self):
        self.spam = 77
    def __len__(self):                           # len here, else __getattr__ called with __len__
        print('__len__: 42')
        return 42
    def __getattr__(self, attr):                 # Provide __str__ if asked, else dummy func
        print('getattr: ' + attr)
        if attr == '__str__':
            return lambda *args: '[Getattr str]'
        else:
            return lambda *args: None

class GetAttribute(object):                      # object required in 2.X, implied in 3.X
    eggs = 88                                    # In 2.X all are isinstance(object) auto
    def __init__(self):                          # But must derive to get new-style tools,
        self.spam = 77                           # incl __getattribute__, some __X__ defaults
    def __len__(self):
        print('__len__: 42')
        return 42
    def __getattribute__(self, attr):
        print('getattribute: ' + attr)
        if attr == '__str__':
            return lambda *args: '[GetAttribute str]'
        else:
            return lambda *args: None

for Class in GetAttr, GetAttribute:
    print('\n' + Class.__name__.ljust(50, '='))
    X = Class()
    X.eggs                                       # Class attr
    X.spam                                       # Instance attr
    X.other                                      # Missing attr
    len(X)                                       # __len__ defined explicitly

    # New-styles must support [], +, call directly: redefine
    try:    X[0]                                 # __getitem__?
    except: print('fail []')
    try:    X + 99                               # __add__?
    except: print('fail +')
    try:    X()                                  # __call__? (implicit via built-in)
    except: print('fail ()')

    X.__call__()                                 # __call__? (explicit, not inherited)
    print(X.__str__())                           # __str__? (explicit, inherited from type)
    print(X)                                     # __str__? (implicit via built-in)
When run under Python 2.X as coded, __getattr__ does receive a variety of implicit
attribute fetches for built-in operations, because Python looks up such attributes in
instances normally. Conversely, __getattribute__ is not run for any of the operator
overloading names invoked by built-in operations, because such names are looked up
in classes only in the new-style class model:
c:\code> py -2 getattr-builtins.py
GetAttr===========================================
getattr: other
__len__: 42
getattr: __getitem__
getattr: __coerce__
getattr: __add__
getattr: __call__
getattr: __call__
getattr: __str__
[Getattr str]
getattr: __str__
[Getattr str]
GetAttribute======================================
getattribute: eggs
getattribute: spam
getattribute: other
__len__: 42
fail []
fail +
fail ()
getattribute: __call__
getattribute: __str__
[GetAttribute str]
<__main__.GetAttribute object at 0x02287898>
Note how __getattr__ intercepts both implicit and explicit fetches of __call__ and
__str__ in 2.X here. By contrast, __getattribute__ fails to catch implicit fetches of either
attribute name for built-in operations.
Really, the __getattribute__ case is the same in 2.X as it is in 3.X, because in 2.X classes
must be made new-style by deriving from object to use this method. This code’s
object derivation is optional in 3.X because all classes are new-style.
When run under Python 3.X, though, results for __getattr__ differ—none of the im-
plicitly run operator overloading methods trigger either attribute interception method
when their attributes are fetched by built-in operations. Python 3.X (and new-style
classes in general) skips the normal instance lookup mechanism when resolving such
names, though normally named methods are still intercepted as before:
c:\code> py -3 getattr-builtins.py
GetAttr===========================================
getattr: other
__len__: 42
fail []
fail +
fail ()
getattr: __call__
<__main__.GetAttr object at 0x02987CC0>
<__main__.GetAttr object at 0x02987CC0>
GetAttribute======================================
getattribute: eggs
getattribute: spam
getattribute: other
__len__: 42
fail []
fail +
fail ()
getattribute: __call__
getattribute: __str__
[GetAttribute str]
<__main__.GetAttribute object at 0x02987CF8>
Trace these outputs back to prints in the script to see how this works. Some highlights:
__str__ access fails to be caught twice by __getattr__ in 3.X: once for the built-in
print, and once for explicit fetches because a default is inherited from the class
(really, from the built-in object, which is an automatic superclass to every class in
3.X).
__str__ fails to be caught only once by the __getattribute__ catchall, during the
built-in print operation; explicit fetches bypass the inherited version.
__call__ fails to be caught in both schemes in 3.X for built-in call expressions, but
it is intercepted by both when fetched explicitly; unlike __str__, there is no inher-
ited __call__ default for object instances to defeat __getattr__.
__len__ is caught by both classes, simply because it is an explicitly defined method in the classes themselves—though its name is not routed to either __getattr__ or __getattribute__ in 3.X if we delete the class's __len__ methods.
All other built-in operations fail to be intercepted by both schemes in 3.X.
Again, the net effect is that operator overloading methods implicitly run by built-in
operations are never routed through either attribute interception method in 3.X: Python
3.X’s new-style classes search for such attributes in classes and skip instance lookup
entirely. Normally named attributes do not.
This makes delegation-based wrapper classes more difficult to code in 3.X’s new-style
classes—if wrapped classes may contain operator overloading methods, those methods
must be redefined redundantly in the wrapper class in order to delegate to the wrapped
object. In general delegation tools, this can add dozens of extra methods.
Of course, the addition of such methods can be partly automated by tools that augment
classes with new methods (the class decorators and metaclasses of the next two chapters
might help here). Moreover, a superclass might be able to define all these extra methods
once, for inheritance in delegation-based classes. Still, delegation coding patterns re-
quire extra work in 3.X’s classes.
For a more realistic illustration of this phenomenon as well as its workaround, see the Private decorator example in the following chapter. There, we'll explore alternatives for coding the operator methods required of proxies in 3.X's classes—including reusable mix-in superclass models. We'll also see there that it's possible to insert a __getattribute__ in the client class to retain its original type, although this method still won't be called for operator overloading methods; printing still runs a __str__ defined in such a class directly, for example, instead of routing the request through __getattribute__.
As a more realistic example of this, the next section resurrects our class tutorial exam-
ple. Now that you understand how attribute interception works, I’ll be able to explain
one of its stranger bits.
Delegation-based managers revisited
The object-oriented tutorial of Chapter 28 presented a Manager class that used object
embedding and method delegation to customize its superclass, rather than inheritance.
Here is the code again for reference, with some irrelevant testing removed:
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def lastName(self):
        return self.name.split()[-1]
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))
    def __repr__(self):
        return '[Person: %s, %s]' % (self.name, self.pay)

class Manager:
    def __init__(self, name, pay):
        self.person = Person(name, 'mgr', pay)   # Embed a Person object
    def giveRaise(self, percent, bonus=.10):
        self.person.giveRaise(percent + bonus)   # Intercept and delegate
    def __getattr__(self, attr):
        return getattr(self.person, attr)        # Delegate all other attrs
    def __repr__(self):
        return str(self.person)                  # Must overload again (in 3.X)

if __name__ == '__main__':
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(sue.lastName())
    sue.giveRaise(.10)
    print(sue)
    tom = Manager('Tom Jones', 50000)            # Manager.__init__
    print(tom.lastName())                        # Manager.__getattr__ -> Person.lastName
    tom.giveRaise(.10)                           # Manager.giveRaise -> Person.giveRaise
    print(tom)                                   # Manager.__repr__ -> Person.__repr__
Comments at the end of this file show which methods are invoked for a line’s operation.
In particular, notice how lastName calls are undefined in Manager, and thus are routed
into the generic __getattr__ and from there on to the embedded Person object. Here
is the script’s output—Sue receives a 10% raise from Person, but Tom gets 20% because
giveRaise is customized in Manager:
c:\code> py -3 getattr-delegate.py
Jones
[Person: Sue Jones, 110000]
Jones
[Person: Tom Jones, 60000]
By contrast, though, notice what occurs when we print a Manager at the end of the script: the wrapper class's __repr__ is invoked, and it delegates to the embedded Person object's __repr__. With that in mind, watch what happens if we delete the Manager.__repr__ method in this code:
# Delete the Manager __repr__ method
class Manager:
    def __init__(self, name, pay):
        self.person = Person(name, 'mgr', pay)   # Embed a Person object
    def giveRaise(self, percent, bonus=.10):
        self.person.giveRaise(percent + bonus)   # Intercept and delegate
    def __getattr__(self, attr):
        return getattr(self.person, attr)        # Delegate all other attrs
Now printing does not route its attribute fetch through the generic __getattr__ inter-
ceptor under Python 3.X’s new-style classes for Manager objects. Instead, a default
__repr__ display method inherited from the class’s implicit object superclass is looked
up and run (sue still prints correctly, because Person has an explicit __repr__):
c:\code> py -3 getattr-delegate.py
Jones
[Person: Sue Jones, 110000]
Jones
<__main__.Manager object at 0x029E7B70>
As coded, running without a __repr__ like this does trigger __getattr__ in Python 2.X’s
default classic classes, because operator overloading attributes are routed through this
method, and such classes do not inherit a default for __repr__:
c:\code> py -2 getattr-delegate.py
Jones
[Person: Sue Jones, 110000]
Jones
[Person: Tom Jones, 60000]
Switching to __getattribute__ won’t help 3.X here either—like __getattr__, it is not
run for operator overloading attributes implied by built-in operations in either Python
2.X or 3.X:
# Replace __getattr__ with __getattribute__
class Manager(object):                           # Use "(object)" in 2.X
    def __init__(self, name, pay):
        self.person = Person(name, 'mgr', pay)   # Embed a Person object
    def giveRaise(self, percent, bonus=.10):
        self.person.giveRaise(percent + bonus)   # Intercept and delegate
    def __getattribute__(self, attr):
        print('**', attr)
        if attr in ['person', 'giveRaise']:
            return object.__getattribute__(self, attr)   # Fetch my attrs
        else:
            return getattr(self.person, attr)    # Delegate all others
Regardless of which attribute interception method is used in 3.X, we still must include
a redefined __repr__ in Manager (as shown previously) in order to intercept printing
operations and route them to the embedded Person object:
C:\code> py -3 getattr-delegate.py
Jones
[Person: Sue Jones, 110000]
** lastName
** person
Jones
** giveRaise
** person
<__main__.Manager object at 0x028E0590>
Notice that __getattribute__ gets called twice here for methods—once for the method
name, and again for the self.person embedded object fetch. We could avoid that with
a different coding, but we would still have to redefine __repr__ to catch printing, albeit
differently here (self.person would cause this __getattribute__ to fail):
# Code __getattribute__ differently to minimize extra calls
class Manager:
    def __init__(self, name, pay):
        self.person = Person(name, 'mgr', pay)
    def __getattribute__(self, attr):
        print('**', attr)
        person = object.__getattribute__(self, 'person')
        if attr == 'giveRaise':
            return lambda percent: person.giveRaise(percent+.10)
        else:
            return getattr(person, attr)
    def __repr__(self):
        person = object.__getattribute__(self, 'person')
        return str(person)
When this alternative runs, our object prints properly, but only because we’ve added
an explicit __repr__ in the wrapper—this attribute is still not routed to our generic
attribute interception method:
Jones
[Person: Sue Jones, 110000]
** lastName
Jones
** giveRaise
[Person: Tom Jones, 60000]
The short story here is that delegation-based classes like Manager must redefine some operator overloading methods (like __repr__ and __str__) to route them to embedded objects in Python 3.X, but not in Python 2.X unless new-style classes are used. Our only direct options seem to be using __getattr__ and Python 2.X, or redefining operator overloading methods in wrapper classes redundantly in 3.X.
Again, this isn’t an impossible task; many wrappers can predict the set of operator
overloading methods required, and tools and superclasses can automate part of this
task—in fact, we’ll study coding patterns that can fill this need in the next chapter.
Moreover, not all classes use operator overloading methods (indeed, most application
classes usually should not). It is, however, something to keep in mind for delegation
coding models used in Python 3.X; when operator overloading methods are part of an
object’s interface, wrappers must accommodate them portably by redefining them lo-
cally.
Example: Attribute Validations
To close out this chapter, let’s turn to a more realistic example, coded in all four of our
attribute management schemes. The example we will use defines a CardHolder object
with four attributes, three of which are managed. The managed attributes validate or
transform values when fetched or stored. All four versions produce the same results for
the same test code, but they implement their attributes in very different ways. The
examples are included largely for self-study; although I won’t go through their code in
detail, they all use concepts we’ve already explored in this chapter.
Using Properties to Validate
Our first coding in the file that follows uses properties to manage three attributes. As
usual, we could use simple methods instead of managed attributes, but properties help
if we have been using attributes in existing code already. Properties run code automat-
ically on attribute access, but are focused on a specific set of attributes; they cannot be
used to intercept all attributes generically.
To understand this code, it’s crucial to notice that the attribute assignments inside the
__init__ constructor method trigger property setter methods too. When this method
assigns to self.name, for example, it automatically invokes the setName method, which
transforms the value and assigns it to an instance attribute called __name so it won’t
clash with the property’s name.
This renaming (sometimes called name mangling) is necessary because properties use
common instance state and have none of their own. Data is stored in an attribute called
__name, and the attribute called name is always a property, not data. As we saw in
Chapter 31, names like __name are known as pseudoprivate attributes, and are changed
by Python to include the enclosing class’s name when stored in the instance’s name-
space; here, this helps keep the implementation-specific attributes distinct from others,
including that of the property that manages them.
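If the mangling is hard to visualize, a quick interactive sketch (not part of the book’s files) shows where the data actually lives:
>>> class CardHolder(object):
...     def setName(self, value):
...         self.__name = value              # Expands to _CardHolder__name
...     name = property(None, setName)       # Write-only property for brevity
...
>>> x = CardHolder()
>>> x.name = 'Bob Smith'                     # Runs setName
>>> x.__dict__                               # Data lives under the mangled name
{'_CardHolder__name': 'Bob Smith'}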
In the end, this class manages attributes called name, age, and acct; allows the attribute
addr to be accessed directly; and provides a read-only attribute called remain that is
entirely virtual and computed on demand. For comparison purposes, this property-
based coding weighs in at 39 lines of code, not counting its two initial lines, and includes
the object derivation required in 2.X but optional in 3.X:
# File validate_properties.py
class CardHolder(object):                    # Need "(object)" for setter in 2.X
    acctlen = 8                              # Class data
    retireage = 59.5
    def __init__(self, acct, name, age, addr):
        self.acct = acct                     # Instance data
        self.name = name                     # These trigger prop setters too!
        self.age = age                       # __X mangled to have class name
        self.addr = addr                     # addr is not managed
                                             # remain has no data
    def getName(self):
        return self.__name
    def setName(self, value):
        value = value.lower().replace(' ', '_')
        self.__name = value
    name = property(getName, setName)
    def getAge(self):
        return self.__age
    def setAge(self, value):
        if value < 0 or value > 150:
            raise ValueError('invalid age')
        else:
            self.__age = value
    age = property(getAge, setAge)
    def getAcct(self):
        return self.__acct[:-3] + '***'
    def setAcct(self, value):
        value = value.replace('-', '')
        if len(value) != self.acctlen:
            raise TypeError('invalid acct number')
        else:
            self.__acct = value
    acct = property(getAcct, setAcct)
    def remainGet(self):                     # Could be a method, not attr
        return self.retireage - self.age     # Unless already using as attr
    remain = property(remainGet)
Testing code
The following code, validate_tester.py, tests our class; run this script with the name of
the class’s module (sans “.py”) as a single command-line argument (you could also add
most of its test code to the bottom of each file, or interactively import it from a module
after importing the class). We’ll use this same testing code for all four versions of this
example. When it runs, it makes two instances of our managed-attribute class and
fetches and changes their various attributes. Operations expected to fail are wrapped
in try statements, and identical behavior on 2.X is supported by enabling the 3.X
print function:
# File validate_tester.py
from __future__ import print_function               # 2.X
def loadclass():
    import sys, importlib
    modulename = sys.argv[1]                         # Module name in command line
    module = importlib.import_module(modulename)     # Import module by name string
    print('[Using: %s]' % module.CardHolder)         # No need for getattr() here
    return module.CardHolder
def printholder(who):
    print(who.acct, who.name, who.age, who.remain, who.addr, sep=' / ')
if __name__ == '__main__':
    CardHolder = loadclass()
    bob = CardHolder('1234-5678', 'Bob Smith', 40, '123 main st')
    printholder(bob)
    bob.name = 'Bob Q. Smith'
    bob.age = 50
    bob.acct = '23-45-67-89'
    printholder(bob)
    sue = CardHolder('5678-12-34', 'Sue Jones', 35, '124 main st')
    printholder(sue)
    try:
        sue.age = 200
    except:
        print('Bad age for Sue')
    try:
        sue.remain = 5
    except:
        print("Can't set sue.remain")
    try:
        sue.acct = '1234567'
    except:
        print('Bad acct for Sue')
Here is the output of our self-test code on both Python 3.X and 2.X; again, this is the
same for all versions of this example, except for the tested class’s name. Trace through
this code to see how the class’s methods are invoked; accounts are displayed with some
digits hidden, names are converted to a standard format, and time remaining until
retirement is computed when fetched using a class attribute cutoff:
c:\code> py -3 validate_tester.py validate_properties
[Using: <class 'validate_properties.CardHolder'>]
12345*** / bob_smith / 40 / 19.5 / 123 main st
23456*** / bob_q._smith / 50 / 9.5 / 123 main st
56781*** / sue_jones / 35 / 24.5 / 124 main st
Bad age for Sue
Can't set sue.remain
Bad acct for Sue
Using Descriptors to Validate
Now, let’s recode our example using descriptors instead of properties. As we’ve seen,
descriptors are very similar to properties in terms of functionality and roles; in fact,
properties are basically a restricted form of descriptor. Like properties, descriptors are
designed to handle specific attributes, not generic attribute access. Unlike properties,
descriptors can also have their own state, and are a more general scheme.
Option 1: Validating with shared descriptor instance state
To understand the following code, it’s again important to notice that the attribute
assignments inside the __init__ constructor method trigger descriptor __set__ meth-
ods. When the constructor method assigns to self.name, for example, it automatically
invokes the Name.__set__() method, which transforms the value and assigns it to a
descriptor attribute called name.
In the end, this class implements the same attributes as the prior version: it manages
attributes called name, age, and acct; allows the attribute addr to be accessed directly;
and provides a read-only attribute called remain that is entirely virtual and computed
on demand. Notice how we must catch assignments to the remain name in its descriptor
and raise an exception; as we learned earlier, if we did not do this, assigning to this
attribute of an instance would silently create an instance attribute that hides the class
attribute descriptor.
For comparison purposes, this descriptor-based coding takes 45 lines of code; I’ve
added the required object derivation to the main descriptor classes for 2.X compati-
bility (they can be omitted for code to be run in 3.X only, but don’t hurt in 3.X, and
aid portability if present):
# File validate_descriptors1.py: using shared descriptor state
class CardHolder(object):                    # Need all "(object)" in 2.X only
    acctlen = 8                              # Class data
    retireage = 59.5
    def __init__(self, acct, name, age, addr):
        self.acct = acct                     # Instance data
        self.name = name                     # These trigger __set__ calls too!
        self.age = age                       # __X not needed: in descriptor
        self.addr = addr                     # addr is not managed
                                             # remain has no data
    class Name(object):
        def __get__(self, instance, owner):  # Class names: CardHolder locals
            return self.name
        def __set__(self, instance, value):
            value = value.lower().replace(' ', '_')
            self.name = value
    name = Name()
    class Age(object):
        def __get__(self, instance, owner):
            return self.age                  # Use descriptor data
        def __set__(self, instance, value):
            if value < 0 or value > 150:
                raise ValueError('invalid age')
            else:
                self.age = value
    age = Age()
    class Acct(object):
        def __get__(self, instance, owner):
            return self.acct[:-3] + '***'
        def __set__(self, instance, value):
            value = value.replace('-', '')
            if len(value) != instance.acctlen:    # Use instance class data
                raise TypeError('invalid acct number')
            else:
                self.acct = value
    acct = Acct()
    class Remain(object):
        def __get__(self, instance, owner):
            return instance.retireage - instance.age    # Triggers Age.__get__
        def __set__(self, instance, value):
            raise TypeError('cannot set remain')        # Else set allowed here
    remain = Remain()
When run with the prior testing script, all examples in this section produce the same
output as shown for properties earlier, except that the name of the class in the first line
varies:
C:\code> python validate_tester.py validate_descriptors1
...same output as properties, except class name...
Option 2: Validating with per-client-instance state
Unlike in the prior property-based variant, though, in this case the actual name value is
attached to the descriptor object, not the client class instance. Although we could store
this value in either instance or descriptor state, the latter avoids the need to mangle
names with underscores to avoid collisions. In the CardHolder client class, the attribute
called name is always a descriptor object, not data.
Importantly, the downside of this scheme is that state stored inside a descriptor itself
is class-level data that is effectively shared by all client class instances, and so cannot
vary between them. That is, storing state in the descriptor instance instead of the
owner (client) class instance means that the state will be the same in all owner class
instances. Descriptor state can vary only per attribute appearance.
To see this at work, in the preceding descriptor-based CardHolder example, try printing
attributes of the bob instance after creating the second instance, sue. The values of
sue’s managed attributes (name, age, and acct) overwrite those of the earlier object
bob, because both share the same, single descriptor instance attached to their class:
# File validate_tester2.py
from __future__ import print_function # 2.X
from validate_tester import loadclass
CardHolder = loadclass()
bob = CardHolder('1234-5678', 'Bob Smith', 40, '123 main st')
print('bob:', bob.name, bob.acct, bob.age, bob.addr)
sue = CardHolder('5678-12-34', 'Sue Jones', 35, '124 main st')
print('sue:', sue.name, sue.acct, sue.age, sue.addr) # addr differs: client data
print('bob:', bob.name, bob.acct, bob.age, bob.addr) # name,acct,age overwritten?
The results confirm the suspicion—in terms of managed attributes, bob has morphed
into sue!
c:\code> py -3 validate_tester2.py validate_descriptors1
[Using: <class 'validate_descriptors1.CardHolder'>]
bob: bob_smith 12345*** 40 123 main st
sue: sue_jones 56781*** 35 124 main st
bob: sue_jones 56781*** 35 123 main st
There are valid uses for descriptor state, of course—to manage descriptor implemen-
tation and data that spans all instances—and this code was implemented to illustrate
the technique. Moreover, the state scope implications of class versus instance attributes
should be more or less a given at this point in the book.
However, in this particular use case, attributes of CardHolder objects are probably better
stored as per-instance data instead of descriptor instance data, perhaps using the same
__X naming convention as the property-based equivalent to avoid name clashes in the
instance—a more important factor this time, as the client is a different class with its
own state attributes. Here are the required coding changes; it doesn’t change line counts
(we’re still at 45):
# File validate_descriptors2.py: using per-client-instance state
class CardHolder(object):                    # Need all "(object)" in 2.X only
    acctlen = 8                              # Class data
    retireage = 59.5
    def __init__(self, acct, name, age, addr):
        self.acct = acct                     # Client instance data
        self.name = name                     # These trigger __set__ calls too!
        self.age = age                       # __X needed: in client instance
        self.addr = addr                     # addr is not managed
                                             # remain managed but has no data
    class Name(object):
        def __get__(self, instance, owner):  # Class names: CardHolder locals
            return instance.__name
        def __set__(self, instance, value):
            value = value.lower().replace(' ', '_')
            instance.__name = value
    name = Name()                            # class.name vs mangled attr
    class Age(object):
        def __get__(self, instance, owner):
            return instance.__age            # Use instance data
        def __set__(self, instance, value):
            if value < 0 or value > 150:
                raise ValueError('invalid age')
            else:
                instance.__age = value
    age = Age()                              # class.age vs mangled attr
    class Acct(object):
        def __get__(self, instance, owner):
            return instance.__acct[:-3] + '***'
        def __set__(self, instance, value):
            value = value.replace('-', '')
            if len(value) != instance.acctlen:    # Use instance class data
                raise TypeError('invalid acct number')
            else:
                instance.__acct = value
    acct = Acct()                            # class.acct vs mangled name
    class Remain(object):
        def __get__(self, instance, owner):
            return instance.retireage - instance.age    # Triggers Age.__get__
        def __set__(self, instance, value):
            raise TypeError('cannot set remain')        # Else set allowed here
    remain = Remain()
This supports per-instance data for the name, age, and acct managed fields as expected
(bob remains bob), and other tests work as before:
c:\code> py -3 validate_tester2.py validate_descriptors2
[Using: <class 'validate_descriptors2.CardHolder'>]
bob: bob_smith 12345*** 40 123 main st
sue: sue_jones 56781*** 35 124 main st
bob: bob_smith 12345*** 40 123 main st
c:\code> py -3 validate_tester.py validate_descriptors2
...same output as properties, except class name...
One small caveat here: as coded, this version doesn’t support through-class descriptor
access, because such access passes a None to the instance argument (also notice the
attribute __X name mangling to _Name__name in the error message when the fetch attempt
is made):
>>> from validate_descriptors1 import CardHolder
>>> bob = CardHolder('1234-5678', 'Bob Smith', 40, '123 main st')
>>> bob.name
'bob_smith'
>>> CardHolder.name
'bob_smith'
>>> from validate_descriptors2 import CardHolder
>>> bob = CardHolder('1234-5678', 'Bob Smith', 40, '123 main st')
>>> bob.name
'bob_smith'
>>> CardHolder.name
AttributeError: 'NoneType' object has no attribute '_Name__name'
We could detect this with a minor amount of additional code to trigger the error more
explicitly, but there’s probably no point—because this version stores data in the client
instance, there’s no meaning to its descriptors unless they’re accompanied by a client
instance (much like a normal unbound instance method). In fact, that’s really the entire
point of this version’s change!
Because they are classes, descriptors are a useful and powerful tool, but they present
choices that can deeply impact a program’s behavior. As always in OOP, choose your
state retention policies carefully.
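If through-class access matters, the usual idiom, shown here as a sketch rather than a change to the book’s files, is to return the descriptor object itself when __get__ receives no instance:
class Name(object):
    def __get__(self, instance, owner):
        if instance is None:                 # CardHolder.name: class access
            return self                      # Return the descriptor itself
        return instance.__name               # Mangled to _Name__name as before
    def __set__(self, instance, value):
        value = value.lower().replace(' ', '_')
        instance.__name = value
With this test added, CardHolder.name returns the Name descriptor object instead of raising an exception, which is how built-in tools such as property behave as well.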
Using __getattr__ to Validate
As we’ve seen, the __getattr__ method intercepts all undefined attributes, so it can be
more generic than using properties or descriptors. For our example, we simply test the
attribute name to know when a managed attribute is being fetched; others are stored
physically on the instance and so never reach __getattr__. Although this approach is
more general than using properties or descriptors, extra work may be required to imitate
the specific-attribute focus of other tools. We need to check names at runtime, and we
must code a __setattr__ in order to intercept and validate attribute assignments.
As for the property and descriptor versions of this example, it’s critical to notice that
the attribute assignments inside the __init__ constructor method trigger the class’s
__setattr__ method too. When this method assigns to self.name, for example, it au-
tomatically invokes the __setattr__ method, which transforms the value and assigns
it to an instance attribute called name. Storing name on the instance ensures that
future accesses will not trigger __getattr__. In contrast, acct is stored as _acct, so that
later accesses to acct do invoke __getattr__.
In the end, this class, like the prior two, manages attributes called name, age, and
acct; allows the attribute addr to be accessed directly; and provides a read-only attribute
called remain that is entirely virtual and is computed on demand.
For comparison purposes, this alternative comes in at 32 lines of code—7 fewer than
the property-based version, and 13 fewer than the version using descriptors. Clarity
matters more than code size, of course, but extra code can sometimes imply extra
development and maintenance work. Probably more important here are roles: generic
tools like __getattr__ may be better suited to generic delegation, while properties and
descriptors are more directly designed to manage specific attributes.
Also note that the code here incurs extra calls when setting unmanaged attributes (e.g.,
addr), although no extra calls are incurred for fetching unmanaged attributes, since they
are defined. Though this will likely result in negligible overhead for most programs, the
more narrowly focused properties and descriptors incur an extra call only when man-
aged attributes are accessed, and also appear in dir results when needed by generic
tools.
Here’s the __getattr__ version of our validations code:
# File validate_getattr.py
class CardHolder:
    acctlen = 8                              # Class data
    retireage = 59.5
    def __init__(self, acct, name, age, addr):
        self.acct = acct                     # Instance data
        self.name = name                     # These trigger __setattr__ too
        self.age = age                       # _acct not mangled: name tested
        self.addr = addr                     # addr is not managed
                                             # remain has no data
    def __getattr__(self, name):
        if name == 'acct':                   # On undefined attr fetches
            return self._acct[:-3] + '***'   # name, age, addr are defined
        elif name == 'remain':
            return self.retireage - self.age # Doesn't trigger __getattr__
        else:
            raise AttributeError(name)
    def __setattr__(self, name, value):
        if name == 'name':                   # On all attr assignments
            value = value.lower().replace(' ', '_')    # addr stored directly
        elif name == 'age':                  # acct mangled to _acct
            if value < 0 or value > 150:
                raise ValueError('invalid age')
        elif name == 'acct':
            name = '_acct'
            value = value.replace('-', '')
            if len(value) != self.acctlen:
                raise TypeError('invalid acct number')
        elif name == 'remain':
            raise TypeError('cannot set remain')
        self.__dict__[name] = value          # Avoid looping (or via object)
When this code is run with either test script, it produces the same output (with a dif-
ferent class name):
c:\code> py -3 validate_tester.py validate_getattr
...same output as properties, except class name...
c:\code> py -3 validate_tester2.py validate_getattr
...same output as instance-state descriptors, except class name...
Using __getattribute__ to Validate
Our final variant uses the __getattribute__ catchall to intercept attribute fetches and
manage them as needed. Every attribute fetch is caught here, so we test the attribute
names to detect managed attributes and route all others to the superclass for normal
fetch processing. This version uses the same __setattr__ to catch assignments as the
prior version.
The code works very much like the __getattr__ version, so I won’t repeat the full
description here. Note, though, that because every attribute fetch is routed to __getattribute__, we don't need to mangle names to intercept them here (acct is stored as
acct). On the other hand, this code must take care to route nonmanaged attribute
fetches to a superclass to avoid looping or extra calls.
Also notice that this version incurs extra calls for both setting and fetching unmanaged
attributes (e.g., addr); if speed is paramount, this alternative may be the slowest of the
bunch. For comparison purposes, this version amounts to 32 lines of code, just like the
prior version, and includes the requisite object derivation for 2.X compatibility; like
properties and descriptors, __getattribute__ is a new-style class tool:
# File validate_getattribute.py
class CardHolder(object):                    # Need "(object)" in 2.X only
    acctlen = 8                              # Class data
    retireage = 59.5
    def __init__(self, acct, name, age, addr):
        self.acct = acct                     # Instance data
        self.name = name                     # These trigger __setattr__ too
        self.age = age                       # acct not mangled: name tested
        self.addr = addr                     # addr is not managed
                                             # remain has no data
    def __getattribute__(self, name):
        superget = object.__getattribute__   # Don't loop: one level up
        if name == 'acct':                   # On all attr fetches
            return superget(self, 'acct')[:-3] + '***'
        elif name == 'remain':
            return superget(self, 'retireage') - superget(self, 'age')
        else:
            return superget(self, name)      # name, age, addr: stored
    def __setattr__(self, name, value):
        if name == 'name':                   # On all attr assignments
            value = value.lower().replace(' ', '_')    # addr stored directly
        elif name == 'age':
            if value < 0 or value > 150:
                raise ValueError('invalid age')
        elif name == 'acct':
            value = value.replace('-', '')
            if len(value) != self.acctlen:
                raise TypeError('invalid acct number')
        elif name == 'remain':
            raise TypeError('cannot set remain')
        self.__dict__[name] = value          # Avoid loops, orig names
Both the getattr and getattribute scripts work the same as the property and per-client-instance descriptor versions when run by both tester scripts on either 2.X or 3.X. That’s four ways to achieve the same goal in Python, though they vary in structure, and may prove less redundant in other roles. Be sure to study and run this section’s code on your own for more pointers on managed attribute coding techniques.
Chapter Summary
This chapter covered the various techniques for managing access to attributes in
Python, including the __getattr__ and __getattribute__ operator overloading meth-
ods, class properties, and class attribute descriptors. Along the way, it compared and
contrasted these tools and presented a handful of use cases to demonstrate their be-
havior.
Chapter 39 continues our tool-building survey with a look at decorators—code run
automatically at function and class creation time, rather than on attribute access. Before
we continue, though, let’s work through a set of questions to review what we’ve covered
here.
Test Your Knowledge: Quiz
1. How do __getattr__ and __getattribute__ differ?
2. How do properties and descriptors differ?
3. How are properties and decorators related?
4. What are the main functional differences between __getattr__ and __getattribute__, and properties and descriptors?
5. Isn’t all this feature comparison just a kind of argument?
Test Your Knowledge: Answers
1. The __getattr__ method is run for fetches of undefined attributes only (i.e., those
not present on an instance and not inherited from any of its classes). By contrast,
the __getattribute__ method is called for every attribute fetch, whether the at-
tribute is defined or not. Because of this, code inside a __getattr__ can freely fetch
other attributes if they are defined, whereas __getattribute__ must use special code
for all such attribute fetches to avoid looping or extra calls (it must route fetches
to a superclass to skip itself).
2. Properties serve a specific role, while descriptors are more general. Properties define
get, set, and delete functions for a specific attribute; descriptors provide a class
with methods for these actions, too, but they provide extra flexibility to support
more arbitrary actions. In fact, properties are really a simple way to create a specific
kind of descriptor—one that runs functions on attribute accesses. Coding differs
too: a property is created with a built-in function, and a descriptor is coded with
a class; thus, descriptors can leverage all the usual OOP features of classes, such
as inheritance. Moreover, in addition to the instance’s state information, descrip-
tors have local state of their own, so they can sometimes avoid name collisions in
the instance.
3. Properties can be coded with decorator syntax. Because the property built-in ac-
cepts a single function argument, it can be used directly as a function decorator to
define a fetch access property. Due to the name rebinding behavior of decorators,
the name of the decorated function is assigned to a property whose get accessor is
set to the original function decorated (name = property(name)). Property setter
and deleter attributes allow us to further add set and delete accessors with deco-
ration syntax—they set the accessor to the decorated function and return the aug-
mented property.
4. The __getattr__ and __getattribute__ methods are more generic: they can be used
to catch arbitrarily many attributes. In contrast, each property or descriptor pro-
vides access interception for only one specific attribute—we can’t catch every at-
tribute fetch with a single property or descriptor. On the other hand, properties
and descriptors handle both attribute fetch and assignment by design: __getattr__ and __getattribute__ handle fetches only; to intercept assignments as well, __setattr__ must also be coded. The implementation is also different: __getattr__ and __getattribute__ are operator overloading methods, whereas properties and descriptors are objects manually assigned to class attributes. Unlike the
others, properties and descriptors can also sometimes avoid extra calls on assign-
ment to unmanaged names, and show up in dir results automatically, but are also
narrower in scope—they can’t address generic dispatch goals. In Python evolution,
new features tend to offer alternatives, but do not fully subsume what came before.
5. No it isn’t. To quote from Python namesake Monty Python’s Flying Circus:
An argument is a connected series of statements intended to establish a
proposition.
No it isn't.
Yes it is! It's not just contradiction.
Look, if I argue with you, I must take up a contrary position.
Yes, but that's not just saying "No it isn't."
Yes it is!
No it isn't!
Yes it is!
No it isn't. Argument is an intellectual process. Contradiction is just
the automatic gainsaying of any statement the other person makes.
(short pause) No it isn't.
It is.
Not at all.
Now look...
CHAPTER 39
Decorators
In the advanced class topics chapter of this book (Chapter 32), we met static and class
methods, took a quick look at the @ decorator syntax Python offers for declaring them,
and previewed decorator coding techniques. We also met function decorators briefly
in Chapter 38, while exploring the property built-in’s ability to serve as one, and in
Chapter 29 while studying the notion of abstract superclasses.
This chapter picks up where this previous decorator coverage left off. Here, we’ll dig
deeper into the inner workings of decorators and study more advanced ways to code
new decorators ourselves. As we’ll see, many of the concepts we studied earlier—es-
pecially state retention—show up regularly in decorators.
This is a somewhat advanced topic, and decorator construction tends to be of more
interest to tool builders than to application programmers. Still, given that decorators
are becoming increasingly common in popular Python frameworks, a basic under-
standing can help demystify their role, even if you’re just a decorator user.
Besides covering decorator construction details, this chapter serves as a more realistic
case study of Python in action. Because its examples grow somewhat larger than most
of the others we’ve seen in this book, they better illustrate how code comes together
into more complete systems and tools. As an extra perk, some of the code we’ll write
here may be used as general-purpose tools in your day-to-day programs.
What’s a Decorator?
Decoration is a way to specify management or augmentation code for functions and
classes. Decorators themselves take the form of callable objects (e.g., functions) that
process other callable objects. As we saw earlier in this book, Python decorators come
in two related flavors, neither of which requires 3.X or new-style classes:
Function decorators, added in Python 2.4, do name rebinding at function definition
time, providing a layer of logic that can manage functions and methods, or later
calls to them.
Class decorators, added in Python 2.6 and 3.0, do name rebinding at class definition
time, providing a layer of logic that can manage classes, or the instances created
by later calls to them.
In short, decorators provide a way to insert automatically run code at the end of function
and class definition statements—at the end of a def for function decorators, and at the
end of a class for class decorators. Such code can play a variety of roles, as described
in the following sections.
Managing Calls and Instances
In typical use, this automatically run code may be used to augment calls to functions
and classes. It arranges this by installing wrapper (a.k.a. proxy) objects to be invoked
later:
Call proxies
Function decorators install wrapper objects to intercept later function calls and
process them as needed, usually passing the call on to the original function to run
the managed action.
Interface proxies
Class decorators install wrapper objects to intercept later instance creation calls
and process them as required, usually passing the call on to the original class to
create a managed instance.
Decorators achieve these effects by automatically rebinding function and class names
to other callables, at the end of def and class statements. When later invoked, these
callables can perform tasks such as tracing and timing function calls, managing access
to class instance attributes, and so on.
Managing Functions and Classes
Although most examples in this chapter deal with using wrappers to intercept later
calls to functions and classes, this is not the only way decorators can be used:
Function managers
Function decorators can also be used to manage function objects, instead of or in
addition to later calls to them—to register a function to an API, for instance. Our
primary focus here, though, will be on their more commonly used call wrapper
application.
Class managers
Class decorators can also be used to manage class objects directly, instead of or in
addition to instance creation calls—to augment a class with new methods, for
example. Because this role intersects strongly with that of metaclasses, we’ll see
additional use cases in the next chapter. As we’ll find, both tools run at the end of
the class creation process, but class decorators often offer a lighter-weight solution.
In other words, function decorators can be used to manage both function calls and
function objects, and class decorators can be used to manage both class instances and
classes themselves. By returning the decorated object itself instead of a wrapper, dec-
orators become a simple post-creation step for functions and classes.
Regardless of the role they play, decorators provide a convenient and explicit way to
code tools useful both during program development and in live production systems.
Using and Defining Decorators
Depending on your job description, you might encounter decorators as a user or a
provider (you might also be a maintainer, but that just means you straddle the fence).
As we’ve seen, Python itself comes with built-in decorators that have specialized roles
—static and class method declaration, property creation, and more. In addition, many
popular Python toolkits include decorators to perform tasks such as managing database
or user-interface logic. In such cases, we can get by without knowing how the decorators
are coded.
For more general tasks, programmers can code arbitrary decorators of their own. For
example, function decorators may be used to augment functions with code that adds
call tracing or logging, performs argument validity testing during debugging, automat-
ically acquires and releases thread locks, times calls made to functions for optimization,
and so on. Any behavior you can imagine adding to—really, wrapping around—a
function call is a candidate for custom function decorators.
On the other hand, function decorators are designed to augment only a specific function
or method call, not an entire object interface. Class decorators fill the latter role better
—because they can intercept instance creation calls, they can be used to implement
arbitrary object interface augmentation or management tasks. For example, custom
class decorators can trace, validate, or otherwise augment every attribute reference
made for an object. They can also be used to implement proxy objects, singleton classes,
and other common coding patterns. In fact, we’ll find that many class decorators bear
a strong resemblance to—and in fact are a prime application of—the delegation coding
pattern we met in Chapter 31.
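As a preview of the latter role, the following is a minimal singleton class decorator, a sketch only; it assumes one instance per decorated class, with the first call’s constructor arguments winning:
def singleton(cls):                          # On @ decoration
    instances = {}                           # One entry per decorated class
    def onCall(*args, **kwargs):             # On instance creation calls
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]                # Same instance every time
    return onCall

@singleton
class Spam:                                  # Spam = singleton(Spam)
    def __init__(self, val):
        self.attr = val

a = Spam(42)
b = Spam(99)                                 # 99 ignored: first instance returned
print(a is b, a.attr)                        # Prints "True 42"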
Why Decorators?
Like many advanced Python tools, decorators are never strictly required from a purely
technical perspective: we can often implement their functionality instead using simple
helper function calls or other techniques. And at a base level, we can always manually
code the name rebinding that decorators perform automatically.
That said, decorators provide an explicit syntax for such tasks, which makes intent
clearer, can minimize augmentation code redundancy, and may help ensure correct
API usage:
Decorators have a very explicit syntax, which makes them easier to spot than helper
function calls that may be arbitrarily far-removed from the subject functions or
classes.
Decorators are applied once, when the subject function or class is defined; it’s not
necessary to add extra code at every call to the class or function, which may have
to be changed in the future.
Because of both of the prior points, decorators make it less likely that a user of an
API will forget to augment a function or class according to API requirements.
In other words, beyond their technical model, decorators offer some advantages in
terms of both code maintenance and consistency. Moreover, as structuring tools, dec-
orators naturally foster encapsulation of code, which reduces redundancy and makes
future changes easier.
Decorators do have some potential drawbacks, too—when they insert wrapper logic,
they can alter the types of the decorated objects, and they may incur extra calls when
used as call or interface proxies. On the other hand, the same considerations apply to
any technique that adds wrapping logic to objects.
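The first of these drawbacks is easy to see interactively; in this sketch (not the book’s code), a class-based wrapper changes what the decorated name references:
>>> class decorator:
...     def __init__(self, func): self.func = func
...     def __call__(self, *args): return self.func(*args)
...
>>> @decorator
... def func(x): return x * 2
...
>>> func(21)                                 # Calls still work through the proxy
42
>>> type(func)                               # But func is no longer a function
<class '__main__.decorator'>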
We’ll explore these tradeoffs in the context of real code later in this chapter. Although
the choice to use decorators is still somewhat subjective, their advantages are compel-
ling enough that they are quickly becoming best practice in the Python world. To help
you decide for yourself, let’s turn to the details.
Decorators versus macros: Python’s decorators bear similarities to what
some call aspect-oriented programming in other languages—code inser-
ted to run automatically before or after a function call runs. Their syntax
also very closely resembles (and is likely borrowed from) Java’s anno-
tations, though Python’s model is usually considered more flexible and
general.
Some liken decorators to macros too, but this isn’t entirely apt, and
might even be misleading. Macros (e.g., C’s #define preprocessor di-
rective) are typically associated with textual replacement and expan-
sion, and designed for generating code. By contrast, Python’s decorators
are a runtime operation, based upon name rebinding, callable objects,
and often, proxies. While the two may have use cases that sometimes
overlap, decorators and macros are fundamentally different in scope,
implementation, and coding patterns. Comparing the two seems akin
to comparing Python’s import with a C #include, which similarly con-
fuses a runtime object-based operation with text insertion.
Of course, the term macro has been a bit diluted over time—to some,
it now can also refer to any canned series of steps or procedure—and
users of other languages might find the analogy to decorators useful
anyhow. But they should probably also keep in mind that decorators
are about callable objects managing callable objects, not text expansion.
Python tends to be best understood and used in terms of Python idioms.
The Basics
Let’s get started with a first-pass look at decoration behavior from a symbolic perspec-
tive. We’ll write real and more substantial code soon, but since most of the magic of
decorators boils down to an automatic rebinding operation, it’s important to under-
stand this mapping first.
Function Decorators
Function decorators have been available in Python since version 2.4. As we saw earlier
in this book, they are largely just syntactic sugar that runs one function through another
at the end of a def statement, and rebinds the original function name to the result.
Usage
A function decorator is a kind of runtime declaration about the function whose defini-
tion follows. The decorator is coded on a line just before the def statement that defines
a function or method, and it consists of the @ symbol followed by a reference to a
metafunction—a function (or other callable object) that manages another function.
In terms of code, function decorators automatically map the following syntax:
@decorator                                   # Decorate function
def F(arg):
    ...

F(99)                                        # Call function
into this equivalent form, where decorator is a one-argument callable object that returns a callable object with the same number of arguments as F (if not F itself):
def F(arg):
    ...

F = decorator(F)                             # Rebind function name to decorator result
F(99)                                        # Essentially calls decorator(F)(99)
This automatic name rebinding works on any def statement, whether it’s for a simple
function or a method within a class. When the function F is later called, it’s actually
calling the object returned by the decorator, which may be either another object that
implements required wrapping logic, or the original function itself.
In other words, decoration essentially maps the first of the following into the second
—though the decorator is really run only once, at decoration time:
func(6, 7)
decorator(func)(6, 7)
This automatic name rebinding accounts for the static method and property decoration
syntax we met earlier in the book:
class C:
    @staticmethod
    def meth(...): ...                       # meth = staticmethod(meth)

class C:
    @property
    def name(self): ...                      # name = property(name)
In both cases, the method name is rebound to the result of a built-in function decorator,
at the end of the def statement. Calling the original name later invokes whatever object
the decorator returns. In these specific cases, the original names are rebound to a static
method router and property descriptor, but the process is much more general than this
—as the next section explains.
Implementation
A decorator itself is a callable that returns a callable. That is, it returns the object to be
called later when the decorated function is invoked through its original name—either
a wrapper object to intercept later calls, or the original function augmented in some
way. In fact, decorators can be any type of callable and return any type of callable: any
combination of functions and classes may be used, though some are better suited to
certain contexts.
For example, to tap into the decoration protocol in order to manage a function just
after it is created, we might code a decorator of this form:
def decorator(F):
    # Process function F
    return F

@decorator
def func(): ...                              # func = decorator(func)
Because the original decorated function is assigned back to its name, this simply adds
a post-creation step to function definition. Such a structure might be used to register a
function to an API, assign function attributes, and so on.
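The following sketch fills in this structure for the registration role just mentioned; the registry table here is an invented example, not a standard tool:
registry = {}
def register(F):                             # Process function F at def time
    registry[F.__name__] = F                 # Add to a made-up API table
    return F                                 # Return F itself, not a wrapper

@register
def spam(x):                                 # spam = register(spam)
    return x * 2

print(registry['spam'](21))                  # Dispatch through the table: 42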
In more typical use, to insert logic that intercepts later calls to a function, we might
code a decorator to return a different object than the original function—a proxy for
later calls:
def decorator(F):
    # Save or use function F
    # Return a different callable: nested def, class with __call__, etc.

@decorator
def func(): ...                              # func = decorator(func)
This decorator is invoked at decoration time, and the callable it returns is invoked when
the original function name is later called. The decorator itself receives the decorated
function; the callable returned receives whatever arguments are later passed to the
decorated function’s name. When coded properly, this works the same for class-level
methods: the implied instance object simply shows up in the first argument of the re-
turned callable.
In skeleton terms, here’s one common coding pattern that captures this idea—the dec-
orator returns a wrapper that retains the original function in an enclosing scope:
def decorator(F):                            # On @ decoration
    def wrapper(*args):                      # On wrapped function call
        # Use F and args
        # F(*args) calls original function
    return wrapper

@decorator                                   # func = decorator(func)
def func(x, y):                              # func is passed to decorator's F
    ...

func(6, 7)                                   # 6, 7 are passed to wrapper's *args
When the name func is later called, it really invokes the wrapper function returned by
decorator; the wrapper function can then run the original func because it is still available
in an enclosing scope. When coded this way, each decorated function produces a new
scope to retain state.
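Filled out minimally, this skeleton runs as is; the following sketch (not one of the book’s files) counts calls with enclosing-scope state, though its nonlocal statement makes it 3.X-only:
def decorator(F):                            # On @ decoration
    calls = 0
    def wrapper(*args):                      # On wrapped function call
        nonlocal calls                       # Per-decoration enclosing-scope state
        calls += 1
        print('call %s to %s' % (calls, F.__name__))
        return F(*args)                      # Run the original function
    return wrapper

@decorator
def square(x):                               # square = decorator(square)
    return x ** 2

print(square(4))                             # Prints "call 1 to square", then 16
print(square(5))                             # Prints "call 2 to square", then 25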
To do the same with classes, we can overload the call operation and use instance at-
tributes instead of enclosing scopes:
class decorator:
    def __init__(self, func):                # On @ decoration
        self.func = func
    def __call__(self, *args):               # On wrapped function call
        # Use self.func and args
        # self.func(*args) calls original function

@decorator
def func(x, y):                              # func = decorator(func)
    ...                                      # func is passed to __init__

func(6, 7)                                   # 6, 7 are passed to __call__'s *args
When the name func is later called now, it really invokes the __call__ operator over-
loading method of the instance created by decorator; the __call__ method can then
run the original func because it is still available in an instance attribute. When coded
this way, each decorated function produces a new instance to retain state.
Supporting method decoration
One subtle point about the prior class-based coding is that while it works to intercept
simple function calls, it does not quite work when applied to class-level method func-
tions:
class decorator:
    def __init__(self, func):                # func is method without instance
        self.func = func
    def __call__(self, *args):               # self is decorator instance
        # self.func(*args) fails!            # C instance not in args!

class C:
    @decorator
    def method(self, x, y):                  # method = decorator(method)
        ...                                  # Rebound to decorator instance
When coded this way, the decorated method is rebound to an instance of the decorator
class, instead of a simple function.
The problem with this is that the self in the decorator’s __call__ receives the decora
tor class instance when the method is later run, and the instance of class C is never
included in *args. This makes it impossible to dispatch the call to the original method
—the decorator object retains the original method function, but it has no instance to
pass to it.
To support both functions and methods, the nested function alternative works better:
def decorator(F):                            # F is func or method without instance
    def wrapper(*args):                      # class instance in args[0] for method
        # F(*args) runs func or method
    return wrapper

@decorator
def func(x, y):                              # func = decorator(func)
    ...

func(6, 7)                                   # Really calls wrapper(6, 7)

class C:
    @decorator
    def method(self, x, y):                  # method = decorator(method)
        ...                                  # Rebound to simple function

X = C()
X.method(6, 7)                               # Really calls wrapper(X, 6, 7)
When coded this way, wrapper receives the C class instance in its first argument, so it
can dispatch to the original method and access state information.
Technically, this nested-function version works because Python creates a bound
method object and thus passes the subject class instance to the self argument only
when a method attribute references a simple function; when it references an instance
of a callable class instead, the callable class’s instance is passed to self to give the
callable class access to its own state information. We’ll see how this subtle difference
can matter in more realistic examples later in this chapter.
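The following sketch (invented names) demonstrates this binding difference directly; the simple function is bound to the instance, but the callable class instance is not:
class CallableObj:
    def __call__(self, *args):
        print('CallableObj call:', args)     # No C instance shows up here

def function(*args):
    print('function call:', args)            # C instance appears in args[0]

class C:
    meth1 = function                         # Simple function: binds instance
    meth2 = CallableObj()                    # Callable instance: no binding

X = C()
X.meth1(1)                                   # function call: (<C object>, 1)
X.meth2(2)                                   # CallableObj call: (2,)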
Also note that nested functions are perhaps the most straightforward way to support
decoration of both functions and methods, but not necessarily the only way. The prior
chapter’s descriptors, for example, receive both the descriptor and subject class instance
when called. Though more complex, later in this chapter we’ll see how this tool can be
leveraged in this context as well.
Class Decorators
Function decorators proved so useful that the model was extended to allow class dec-
oration as of Python 2.6 and 3.0. They were initially resisted because of role overlap
with metaclasses; in the end, though, they were adopted because they provide a simpler
way to achieve many of the same goals.
Class decorators are strongly related to function decorators; in fact, they use the same
syntax and very similar coding patterns. Rather than wrapping individual functions or
methods, though, class decorators are a way to manage classes, or wrap up instance
construction calls with extra logic that manages or augments instances created from a
class. In the latter role, they may manage full object interfaces.
Usage
Syntactically, class decorators appear just before class statements, in the same way that
function decorators appear just before def statements. In symbolic terms, for a decorator that must be a one-argument callable that returns a callable, the class decorator
syntax:
@decorator                                   # Decorate class
class C:
    ...

x = C(99)                                    # Make an instance
is equivalent to the following—the class is automatically passed to the decorator func-
tion, and the decorator’s result is assigned back to the class name:
class C:
    ...

C = decorator(C)                             # Rebind class name to decorator result
x = C(99)                                    # Essentially calls decorator(C)(99)
The net effect is that calling the class name later to create an instance winds up triggering
the callable returned by the decorator, which may or may not call the original class itself.
Implementation
New class decorators are coded with many of the same techniques used for function
decorators, though some may involve two levels of augmentation—to manage both
instance construction calls, as well as instance interface access. Because a class deco-
rator is also a callable that returns a callable, most combinations of functions and classes
suffice.
However it’s coded, the decorator’s result is what runs when an instance is later created.
For example, to simply manage a class just after it is created, return the original class
itself:
def decorator(C):
    # Process class C
    return C

@decorator
class C: ...                                 # C = decorator(C)
To instead insert a wrapper layer that intercepts later instance creation calls, return a
different callable object:
def decorator(C):
    # Save or use class C
    # Return a different callable: nested def, class with __call__, etc.

@decorator
class C: ...                                 # C = decorator(C)
The callable returned by such a class decorator typically creates and returns a new
instance of the original class, augmented in some way to manage its interface. For
example, the following inserts an object that intercepts undefined attributes of a class
instance:
def decorator(cls):                          # On @ decoration
    class Wrapper:
        def __init__(self, *args):           # On instance creation
            self.wrapped = cls(*args)
        def __getattr__(self, name):         # On attribute fetch
            return getattr(self.wrapped, name)
    return Wrapper

@decorator
class C:                                     # C = decorator(C)
    def __init__(self, x, y):                # Run by Wrapper.__init__
        self.attr = 'spam'

x = C(6, 7)                                  # Really calls Wrapper(6, 7)
print(x.attr)                                # Runs Wrapper.__getattr__, prints "spam"
In this example, the decorator rebinds the class name to another class, which retains
the original class in an enclosing scope and creates and embeds an instance of the
original class when it’s called. When an attribute is later fetched from the instance, it
is intercepted by the wrapper’s __getattr__ and delegated to the embedded instance
of the original class. Moreover, each decorated class creates a new scope, which re-
members the original class. We’ll flesh out this example into some more useful code
later in this chapter.
Like function decorators, class decorators are commonly coded as either “factory”
functions that create and return callables, classes that use __init__ or __call__ methods
to intercept call operations, or some combination thereof. Factory functions typically
retain state in enclosing scope references, and classes in attributes.
Supporting multiple instances
As for function decorators, some callable type combinations work better for class dec-
orators than others. Consider the following invalid alternative to the class decorator of
the prior example:
class Decorator:
    def __init__(self, C):                   # On @ decoration
        self.C = C
    def __call__(self, *args):               # On instance creation
        self.wrapped = self.C(*args)
        return self
    def __getattr__(self, attrname):         # On attribute fetch
        return getattr(self.wrapped, attrname)

@Decorator
class C: ...                                 # C = Decorator(C)

x = C()
y = C()                                      # Overwrites x!
This code handles multiple decorated classes (each makes a new Decorator instance)
and will intercept instance creation calls (each runs __call__). Unlike the prior version,
however, this version fails to handle multiple instances of a given class—each instance
creation call overwrites the prior saved instance. The original version does support
multiple instances, because each instance creation call makes a new independent wrap-
per object. More generally, either of the following patterns supports multiple wrapped
instances:
def decorator(C):                            # On @ decoration
    class Wrapper:
        def __init__(self, *args):           # On instance creation: new Wrapper
            self.wrapped = C(*args)          # Embed instance in instance
    return Wrapper

class Wrapper: ...
def decorator(C):                            # On @ decoration
    def onCall(*args):                       # On instance creation: new Wrapper
        return Wrapper(C(*args))             # Embed instance in instance
    return onCall
We’ll study this phenomenon in a more realistic context later in the chapter too; in
practice, though, we must be careful to combine callable types properly to support our
intent, and choose state policies wisely.
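To make the second of these patterns concrete, here is a runnable composite, a sketch assembled from the fragments above rather than one of the book’s files:
class Wrapper:
    def __init__(self, wrapped):             # Embed instance in instance
        self.wrapped = wrapped
    def __getattr__(self, name):
        return getattr(self.wrapped, name)   # Delegate attribute fetches

def decorator(C):                            # On @ decoration
    def onCall(*args):                       # On instance creation calls
        return Wrapper(C(*args))             # A new Wrapper per instance
    return onCall

@decorator
class C:                                     # C = decorator(C)
    def __init__(self, val):
        self.attr = val

x = C(1)
y = C(2)                                     # Does not overwrite x's state
print(x.attr, y.attr)                        # Prints "1 2"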
Decorator Nesting
Sometimes one decorator isn’t enough. For instance, suppose you’ve coded two func-
tion decorators to be used during development—one to test argument types before
function calls, and another to test return value types after function calls. You can use
either independently, but what to do if you want to employ both on a single function?
What you really need is a way to nest the two, such that the result of one decorator is
the function decorated by the other. It’s irrelevant which is nested, as long as both steps
run on later calls.
To support multiple nested steps of augmentation this way, decorator syntax allows
you to add multiple layers of wrapper logic to a decorated function or method. When
this feature is used, each decorator must appear on a line of its own. Decorator syntax
of this form:
@A
@B
@C
def f(...):
    ...
runs the same as the following:
def f(...):
    ...

f = A(B(C(f)))
Here, the original function is passed through three different decorators, and the re-
sulting callable object is assigned back to the original name. Each decorator processes
the result of the prior, which may be the original function or an inserted wrapper.
If all the decorators insert wrappers, the net effect is that when the original function
name is called, three different layers of wrapping object logic will be invoked, to aug-
ment the original function in three different ways. The last decorator listed is the first
applied, and is the most deeply nested when the original function name is later called
(insert joke about Python “interior decorators” here).
Just as for functions, multiple class decorators result in multiple nested function calls,
and possibly multiple levels and steps of wrapper logic around instance creation calls.
For example, the following code:
@spam
@eggs
class C:
    ...

X = C()
is equivalent to the following:
class C:
    ...

C = spam(eggs(C))
X = C()
Again, each decorator is free to return either the original class or an inserted wrapper
object. With wrappers, when an instance of the original C class is finally requested, the
call is redirected to the wrapping layer objects provided by both the spam and eggs
decorators, which may have arbitrarily different roles—they might trace and validate
attribute access, for example, and both steps would be run on later requests.
For instance, the following do-nothing decorators simply return the decorated func-
tion:
def d1(F): return F
def d2(F): return F
def d3(F): return F

@d1
@d2
@d3
def func():                                  # func = d1(d2(d3(func)))
    print('spam')

func()                                       # Prints "spam"
The same syntax works on classes, as do these same do-nothing decorators.
When decorators insert wrapper function objects, though, they may augment the orig-
inal function when called—the following concatenates to its result in the decorator
layers, as it runs the layers from inner to outer:
def d1(F): return lambda: 'X' + F()
def d2(F): return lambda: 'Y' + F()
def d3(F): return lambda: 'Z' + F()

@d1
@d2
@d3
def func():                                  # func = d1(d2(d3(func)))
    return 'spam'

print(func())                                # Prints "XYZspam"
We use lambda functions to implement wrapper layers here (each retains the wrapped
function in an enclosing scope); in practice, wrappers can take the form of functions,
callable classes, and more. When designed well, decorator nesting allows us to combine
augmentation steps in a wide variety of ways.
Decorator Arguments
Both function and class decorators can also seem to take arguments, although really
these arguments are passed to a callable that in effect returns the decorator, which in
turn returns a callable. By nature, this usually sets up multiple levels of state retention.
The following, for instance:
@decorator(A, B)
def F(arg):
    ...

F(99)
is automatically mapped into this equivalent form, where decorator is a callable that
returns the actual decorator. The returned decorator in turn returns the callable run
later for calls to the original function name:
def F(arg):
    ...

F = decorator(A, B)(F)                       # Rebind F to result of decorator's return value
F(99)                                        # Essentially calls decorator(A, B)(F)(99)
Decorator arguments are resolved before decoration ever occurs, and they are usually
used to retain state information for use in later calls. The decorator function in this
example, for instance, might take a form like the following:
def decorator(A, B):
    # Save or use A, B
    def actualDecorator(F):
        # Save or use function F
        # Return a callable: nested def, class with __call__, etc.
        return callable
    return actualDecorator
The outer function in this structure generally saves the decorator arguments away as
state information, for use in the actual decorator, the callable it returns, or both. This
code snippet retains the state information argument in enclosing function scope refer-
ences, but class attributes are commonly used as well.
In other words, decorator arguments often imply three levels of callables: a callable to
accept decorator arguments, which returns a callable to serve as decorator, which re-
turns a callable to handle calls to the original function or class. Each of the three levels
may be a function or class and may retain state in the form of scopes or class attributes.
Decorator arguments can be used to provide attribute initialization values, call trace
message labels, attribute names to be validated, and much more—any sort of config-
uration parameter for objects or their proxies is a candidate. We’ll see concrete exam-
ples of decorator arguments employed later in this chapter.
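As a preview, this sketch fills in the structure with an invented label argument, retained in enclosing scopes and used on later calls:
def trace(label):                            # Level 1: accepts decorator arguments
    def actualDecorator(F):                  # Level 2: accepts the function
        def wrapper(*args):                  # Level 3: handles later calls
            print(label, 'calling', F.__name__, args)
            return F(*args)
        return wrapper
    return actualDecorator

@trace('**')
def multiply(x, y):                          # multiply = trace('**')(multiply)
    return x * y

print(multiply(3, 4))                        # Prints "** calling multiply (3, 4)", then 12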
Decorators Manage Functions and Classes, Too
Although much of the rest of this chapter focuses on wrapping later calls to functions
and classes, it’s important to remember that the decorator mechanism is more general
than this—it is a protocol for passing functions and classes through any callable im-
mediately after they are created. As such, it can also be used to invoke arbitrary post-
creation processing:
def decorator(O):
    # Save or augment function or class O
    return O

@decorator
def F(): ...                                 # F = decorator(F)
@decorator
class C: ... # C = decorator(C)
As long as we return the original decorated object this way instead of a proxy, we can
manage functions and classes themselves, not just later calls to them. We’ll see more
realistic examples later in this chapter that use this idea to register callable objects to
an API with decoration and assign attributes to functions when they are created.
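As a preview, the registration idea might look something like the following minimal sketch; the registry dictionary and the names here are assumptions for illustration, not the chapter's later code:

registry = {}
def register(obj):                       # Both functions and classes pass through
    registry[obj.__name__] = obj         # Record the object, return it unchanged
    return obj

@register
def spam(x):
    return x * 2

@register
class Eggs:
    pass

print(list(registry))                    # ['spam', 'Eggs']; objects work as normal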
Coding Function Decorators
On to the code—in the rest of this chapter, we are going to study working examples that demonstrate the decorator concepts we just explored. This section presents a handful of function decorators at work, and the next shows class decorators in action. Following that, we'll close out with some larger case studies of class and function decorator usage—complete implementations of class privacy and argument range tests.
Tracing Calls
To get started, let’s revive the call tracer example we met in Chapter 32. The following
defines and applies a function decorator that counts the number of calls made to the
decorated function and prints a trace message for each call:
# File decorator1.py

class tracer:
    def __init__(self, func):            # On @ decoration: save original func
        self.calls = 0
        self.func = func
    def __call__(self, *args):           # On later calls: run original func
        self.calls += 1
        print('call %s to %s' % (self.calls, self.func.__name__))
        self.func(*args)

@tracer
def spam(a, b, c):                       # spam = tracer(spam)
    print(a + b + c)                     # Wraps spam in a decorator object
Notice how each function decorated with this class will create a new instance, with its own saved function object and calls counter. Also observe how the *args argument syntax is used to pack and unpack arbitrarily many passed-in arguments. This generality enables this decorator to be used to wrap any function with any number of positional arguments; this version doesn't yet work on keyword arguments or class-level methods, and doesn't return results, but we'll fix these shortcomings later in this section.
Now, if we import this module's function and test it interactively, we get the following sort of behavior—each call generates a trace message initially, because the decorator class intercepts it. This code runs as is under both Python 2.X and 3.X, as does all code in this chapter unless otherwise noted (I've made prints version-neutral, and decorators do not require new-style classes; some hex addresses have also been shortened to protect the sighted):
>>> from decorator1 import spam
>>> spam(1, 2, 3) # Really calls the tracer wrapper object
call 1 to spam
6
>>> spam('a', 'b', 'c') # Invokes __call__ in class
call 2 to spam
abc
>>> spam.calls # Number calls in wrapper state information
2
>>> spam
<decorator1.tracer object at 0x02D9A730>
When run, the tracer class saves away the decorated function, and intercepts later calls
to it, in order to add a layer of logic that counts and prints each call. Notice how the
total number of calls shows up as an attribute of the decorated function—spam is really
an instance of the tracer class when decorated, a finding that may have ramifications
for programs that do type checking, but is generally benign (decorators might copy the
original function’s __name__, but such forgery is limited, and could lead to confusion).
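As an aside, for function-based wrappers the standard library's functools.wraps can copy over such metadata automatically; here is a minimal sketch of that idea, not used by this chapter's class-based version:

from functools import wraps

def tracer(func):
    @wraps(func)                         # Copies __name__, __doc__, etc. to wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@tracer
def spam(): 'docstring'
print(spam.__name__)                     # Prints "spam", not "wrapper"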
For function calls, the @ decoration syntax can be more convenient than modifying each
call to account for the extra logic level, and it avoids accidentally calling the original
function directly. Consider a nondecorator equivalent such as the following:
calls = 0
def tracer(func, *args):
    global calls
    calls += 1
    print('call %s to %s' % (calls, func.__name__))
    func(*args)

def spam(a, b, c):
    print(a, b, c)

>>> spam(1, 2, 3)                        # Normal nontraced call: accidental?
1 2 3
>>> tracer(spam, 1, 2, 3)                # Special traced call without decorators
call 1 to spam
1 2 3
This alternative can be used on any function without the special @ syntax, but unlike the decorator version, it requires extra syntax at every place where the function is called in your code. Furthermore, its intent may not be as obvious, and it does not ensure that the extra layer will be invoked for normal calls. Although decorators are never required (we can always rebind names manually), they are often the most convenient and uniform option.
Decorator State Retention Options
The last example of the prior section raises an important issue. Function decorators have a variety of options for retaining state information provided at decoration time, for use during the actual function call. They generally need to support multiple decorated objects and multiple calls, but there are a number of ways to implement these goals: instance attributes, global variables, nonlocal closure variables, and function attributes can all be used for retaining state.
Class instance attributes
For example, here is an augmented version of the prior example, which adds support
for keyword arguments with ** syntax, and returns the wrapped function’s result to
support more use cases (for nonlinear readers, we first studied keyword arguments in
Chapter 18, and for readers working with the book examples package, some filenames
in this chapter are again implied by the command-lines that follow their listings):
class tracer:                            # State via instance attributes
    def __init__(self, func):            # On @ decorator
        self.calls = 0                   # Save func for later call
        self.func = func
    def __call__(self, *args, **kwargs): # On call to original function
        self.calls += 1
        print('call %s to %s' % (self.calls, self.func.__name__))
        return self.func(*args, **kwargs)

@tracer
def spam(a, b, c):                       # Same as: spam = tracer(spam)
    print(a + b + c)                     # Triggers tracer.__init__

@tracer
def eggs(x, y):                          # Same as: eggs = tracer(eggs)
    print(x ** y)                        # Wraps eggs in a tracer object

spam(1, 2, 3)                            # Really calls tracer instance: runs tracer.__call__
spam(a=4, b=5, c=6)                      # spam is an instance attribute

eggs(2, 16)                              # Really calls tracer instance, self.func is eggs
eggs(4, y=4)                             # self.calls is per-decoration here
Like the original, this uses class instance attributes to save state explicitly. Both the
wrapped function and the calls counter are per-instance information—each decoration
gets its own copy. When run as a script under either 2.X or 3.X, the output of this
version is as follows; notice how the spam and eggs functions each have their own calls
counter, because each decoration creates a new class instance:
c:\code> python decorator2.py
call 1 to spam
6
call 2 to spam
15
call 1 to eggs
65536
call 2 to eggs
256
While useful for decorating functions, this coding scheme still has issues when applied
to methods—a shortcoming we’ll address in a later revision.
Enclosing scopes and globals
Closure functions—with enclosing def scope references and nested defs—can often
achieve the same effect, especially for static data like the decorated original function.
In this example, though, we would also need a counter in the enclosing scope that
changes on each call, and that’s not possible in Python 2.X (recall from Chapter 17 that
the nonlocal statement is 3.X-only).
In 2.X, we can still use either classes and attributes per the prior section, or other options. Moving state variables out to the global scope with declarations is one candidate, and works in both 2.X and 3.X:
calls = 0
def tracer(func):                        # State via enclosing scope and global
    def wrapper(*args, **kwargs):        # Instead of class attributes
        global calls                     # calls is global, not per-function
        calls += 1
        print('call %s to %s' % (calls, func.__name__))
        return func(*args, **kwargs)
    return wrapper

@tracer
def spam(a, b, c):                       # Same as: spam = tracer(spam)
    print(a + b + c)

@tracer
def eggs(x, y):                          # Same as: eggs = tracer(eggs)
    print(x ** y)

spam(1, 2, 3)                            # Really calls wrapper, assigned to spam
spam(a=4, b=5, c=6)                      # wrapper calls spam

eggs(2, 16)                              # Really calls wrapper, assigned to eggs
eggs(4, y=4)                             # Global calls is not per-decoration here!
Unfortunately, moving the counter out to the common global scope to allow it to be
changed like this also means that it will be shared by every wrapped function. Unlike
class instance attributes, global counters are cross-program, not per-function—the
counter is incremented for any traced function call. You can tell the difference if you
compare this version’s output with the prior version’s—the single, shared global call
counter is incorrectly updated by calls to every decorated function:
c:\code> python decorator3.py
call 1 to spam
6
call 2 to spam
15
call 3 to eggs
65536
call 4 to eggs
256
Enclosing scopes and nonlocals
Shared global state may be what we want in some cases. If we really want a per-function counter, though, we can either use classes as before, or make use of closure (a.k.a. factory) functions and the nonlocal statement in Python 3.X, described in Chapter 17. Because this new statement allows enclosing function scope variables to be changed, they can serve as per-decoration and changeable data. In 3.X only:
def tracer(func):                        # State via enclosing scope and nonlocal
    calls = 0                            # Instead of class attrs or global
    def wrapper(*args, **kwargs):        # calls is per-function, not global
        nonlocal calls
        calls += 1
        print('call %s to %s' % (calls, func.__name__))
        return func(*args, **kwargs)
    return wrapper

@tracer
def spam(a, b, c):                       # Same as: spam = tracer(spam)
    print(a + b + c)

@tracer
def eggs(x, y):                          # Same as: eggs = tracer(eggs)
    print(x ** y)

spam(1, 2, 3)                            # Really calls wrapper, bound to func
spam(a=4, b=5, c=6)                      # wrapper calls spam

eggs(2, 16)                              # Really calls wrapper, bound to eggs
eggs(4, y=4)                             # Nonlocal calls _is_ per-decoration here
Now, because enclosing scope variables are not cross-program globals, each wrapped
function gets its own counter again, just as for classes and attributes. Here’s the new
output when run under 3.X:
c:\code> py −3 decorator4.py
call 1 to spam
6
call 2 to spam
15
call 1 to eggs
65536
call 2 to eggs
256
Function attributes
Finally, if you are not using Python 3.X and don't have a nonlocal statement—or you want your code to work portably on both 3.X and 2.X—you may still be able to avoid globals and classes by making use of function attributes for some changeable state instead. In all Pythons since 2.1, we can assign arbitrary attributes to functions to attach them, with func.attr=value. Because a factory function makes a new function on each call, its attributes become per-call state. Moreover, you need to use this technique only for state variables that must change; enclosing scope references are still retained and work normally.
In our example, we can simply use wrapper.calls for state. The following works the same as the preceding nonlocal version because the counter is again per-decorated-function, but it also runs in Python 2.X:
def tracer(func):                        # State via enclosing scope and func attr
    def wrapper(*args, **kwargs):        # calls is per-function, not global
        wrapper.calls += 1
        print('call %s to %s' % (wrapper.calls, func.__name__))
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@tracer
def spam(a, b, c):                       # Same as: spam = tracer(spam)
    print(a + b + c)

@tracer
def eggs(x, y):                          # Same as: eggs = tracer(eggs)
    print(x ** y)

spam(1, 2, 3)                            # Really calls wrapper, assigned to spam
spam(a=4, b=5, c=6)                      # wrapper calls spam

eggs(2, 16)                              # Really calls wrapper, assigned to eggs
eggs(4, y=4)                             # wrapper.calls _is_ per-decoration here
As we learned in Chapter 17, this works only because the name wrapper is retained in the enclosing tracer function's scope. When we later increment wrapper.calls, we are not changing the name wrapper itself, so no nonlocal declaration is required. This version runs in either Python line:
c:\code> py −2 decorator5.py
...same output as prior version, but works on 2.X too...
This scheme was almost relegated to a footnote, because it may be more obscure than nonlocal in 3.X and might be better saved for cases where other schemes don't help. However, function attributes also have substantial advantages. For one, they allow access to the saved state from outside the decorator's code; nonlocals can only be seen inside the nested function itself, but function attributes have wider visibility. For another, they are far more portable; this scheme also works in 2.X, making it version-neutral.
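For example, given the function-attribute version above, client code can read (or even reset) the counter directly through the rebound name; a small sketch of the visibility advantage just described:

spam(1, 2, 3)                            # spam is really the wrapper function
print(spam.calls)                        # State is visible outside the decorator
spam.calls = 0                           # And can even be reset by clients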
We will employ function attributes again in an answer to one of the end-of-chapter
questions, where their visibility outside callables becomes an asset. As changeable state
associated with a context of use, they are equivalent to enclosing scope nonlocals. As
usual, choosing from multiple tools is an inherent part of the programming task.
Because decorators often imply multiple levels of callables, you can combine functions with enclosing scopes, classes with attributes, and function attributes to achieve a variety of coding structures. As we'll see later, though, this sometimes may be subtler than you expect—each decorated function should have its own state, and each decorated class may require state both for itself and for each generated instance.
In fact, as the next section will explain in more detail, if we want to apply function decorators to class-level methods, too, we also have to be careful about the distinction Python makes between decorators coded as callable class instance objects and decorators coded as functions.
Class Blunders I: Decorating Methods
When I wrote the first class-based tracer function decorator in decorator1.py earlier,
I naively assumed that it could also be applied to any method—decorated methods
should work the same, I reasoned, but the automatic self instance argument would
simply be included at the front of *args. The only real downside to this assumption is
that it is completely wrong! When applied to a class’s method, the first version of the
tracer fails, because self is the instance of the decorator class and the instance of the
decorated subject class is not included in *args at all. This is true in both Python 3.X
and 2.X.
I introduced this phenomenon earlier in this chapter, but now we can see it in the
context of realistic working code. Given the class-based tracing decorator:
class tracer:
    def __init__(self, func):            # On @ decorator
        self.calls = 0                   # Save func for later call
        self.func = func
    def __call__(self, *args, **kwargs): # On call to original function
        self.calls += 1
        print('call %s to %s' % (self.calls, self.func.__name__))
        return self.func(*args, **kwargs)
decoration of simple functions works as advertised earlier:
@tracer
def spam(a, b, c):                       # spam = tracer(spam)
    print(a + b + c)                     # Triggers tracer.__init__

>>> spam(1, 2, 3)                        # Runs tracer.__call__
call 1 to spam
6
>>> spam(a=4, b=5, c=6)                  # spam saved in an instance attribute
call 2 to spam
15
However, decoration of class-level methods fails (more lucid sequential readers might
recognize this as an adaptation of our Person class resurrected from the object-oriented
tutorial in Chapter 28):
class Person:
    def __init__(self, name, pay):
        self.name = name
        self.pay = pay

    @tracer
    def giveRaise(self, percent):        # giveRaise = tracer(giveRaise)
        self.pay *= (1.0 + percent)

    @tracer
    def lastName(self):                  # lastName = tracer(lastName)
        return self.name.split()[-1]

>>> bob = Person('Bob Smith', 50000)     # tracer remembers method funcs
>>> bob.giveRaise(.25)                   # Runs tracer.__call__(???, .25)
call 1 to giveRaise
TypeError: giveRaise() missing 1 required positional argument: 'percent'
>>> print(bob.lastName())                # Runs tracer.__call__(???)
call 1 to lastName
TypeError: lastName() missing 1 required positional argument: 'self'
The root of the problem here is in the self argument of the tracer class's __call__ method—is it a tracer instance or a Person instance? We really need both as it's coded: the tracer for decorator state, and the Person for routing on to the original method. Really, self must be the tracer object, to provide access to tracer's state information (its calls and func); this is true whether decorating a simple function or a method.
Unfortunately, when our decorated method name is rebound to a class instance object with a __call__, Python passes only the tracer instance to self; it doesn't pass along the Person subject in the arguments list at all. Moreover, because the tracer knows nothing about the Person instance we are trying to process with method calls, there's no way to create a bound method with an instance, and thus no way to correctly dispatch the call. This isn't a bug, but it's wildly subtle.
In the end, the prior listing winds up passing too few arguments to the decorated
method, and results in an error. Add a line to the decorator’s __call__ to print all its
arguments to verify this—as you can see, self is the tracer instance, and the Person
instance is entirely absent:
>>> bob.giveRaise(.25)
<__main__.tracer object at 0x02A486D8> (0.25,) {}
call 1 to giveRaise
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 9, in __call__
TypeError: giveRaise() missing 1 required positional argument: 'percent'
As mentioned earlier, this happens because Python passes the implied subject instance
to self when a method name is bound to a simple function only; when it is an instance
of a callable class, that class’s instance is passed instead. Technically, Python makes a
bound method object containing the subject instance only when the method is a simple
function, not when it is a callable instance of another class.
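You can verify this distinction interactively; here is a quick hedged sketch (not from the book's files) leveraging the fact that simple functions are descriptors with a __get__ method, while instances of callable classes are not:

class C:
    def method(self): pass

class CallableObj:
    def __call__(self): pass

print(hasattr(C.__dict__['method'], '__get__'))   # True: functions bind to instances
print(hasattr(CallableObj(), '__get__'))          # False: no bound method is made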
Using nested functions to decorate methods
If you want your function decorators to work on both simple functions and class-level
methods, the most straightforward solution lies in using one of the other state retention
solutions described earlier—code your function decorator as nested defs, so that you
don’t depend on a single self instance argument to be both the wrapper class instance
and the subject class instance.
The following alternative applies this fix using Python 3.X nonlocals; recode this to use
function attributes for the changeable calls to use in 2.X. Because decorated methods
are rebound to simple functions instead of instance objects, Python correctly passes
the Person object as the first argument, and the decorator propagates it on in the first
item of *args to the self argument of the real, decorated methods:
# A call tracer decorator for both functions and methods

def tracer(func):                        # Use function, not class with __call__
    calls = 0                            # Else "self" is decorator instance only!
    def onCall(*args, **kwargs):         # Or in 2.X+3.X: use [onCall.calls += 1]
        nonlocal calls
        calls += 1
        print('call %s to %s' % (calls, func.__name__))
        return func(*args, **kwargs)
    return onCall

if __name__ == '__main__':

    # Applies to simple functions

    @tracer
    def spam(a, b, c):                   # spam = tracer(spam)
        print(a + b + c)                 # onCall remembers spam

    @tracer
    def eggs(N):
        return 2 ** N

    spam(1, 2, 3)                        # Runs onCall(1, 2, 3)
    spam(a=4, b=5, c=6)
    print(eggs(32))

    # Applies to class-level method functions too!

    class Person:
        def __init__(self, name, pay):
            self.name = name
            self.pay = pay

        @tracer
        def giveRaise(self, percent):    # giveRaise = tracer(giveRaise)
            self.pay *= (1.0 + percent)  # onCall remembers giveRaise

        @tracer
        def lastName(self):              # lastName = tracer(lastName)
            return self.name.split()[-1]

    print('methods...')
    bob = Person('Bob Smith', 50000)
    sue = Person('Sue Jones', 100000)
    print(bob.name, sue.name)
    sue.giveRaise(.10)                   # Runs onCall(sue, .10)
    print(int(sue.pay))
    print(bob.lastName(), sue.lastName())   # Runs onCall(bob), lastName in scopes
We’ve also indented the file’s self-test code under a __name__ test so the decorator can
be imported and used elsewhere. This version works the same on both functions and
methods, but runs in 3.X only due to its nonlocal:
c:\code> py −3 calltracer.py
call 1 to spam
6
call 2 to spam
15
call 1 to eggs
4294967296
methods...
Bob Smith Sue Jones
call 1 to giveRaise
110000
call 1 to lastName
call 2 to lastName
Smith Jones
Trace through these results to make sure you have a handle on this model; the next
section provides an alternative to it that supports classes, but is also substantially more
complex.
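For reference, the 2.X-portable recoding suggested earlier might look like this minimal sketch, swapping the nonlocal for a function attribute on onCall:

def tracer(func):                        # Version-neutral variant (2.X and 3.X)
    def onCall(*args, **kwargs):
        onCall.calls += 1                # Function attribute, not nonlocal
        print('call %s to %s' % (onCall.calls, func.__name__))
        return func(*args, **kwargs)
    onCall.calls = 0
    return onCall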
Using descriptors to decorate methods
Although the nested function solution illustrated in the prior section is the most
straightforward way to support decorators that apply to both functions and class-level
methods, other schemes are possible. The descriptor feature we explored in the prior
chapter, for example, can help here as well.
Recall from our discussion in that chapter that a descriptor is normally a class attribute
assigned to an object with a __get__ method run automatically whenever that attribute
is referenced and fetched; new-style class object derivation is required for descriptors
in Python 2.X, but not 3.X:
class Descriptor(object):
    def __get__(self, instance, owner): ...

class Subject:
    attr = Descriptor()

X = Subject()
X.attr                                   # Roughly runs Descriptor.__get__(Subject.attr, X, Subject)
Descriptors may also have __set__ and __del__ access methods, but we don’t need
them here. More relevant to this chapter’s topic, because the descriptor’s __get__
method receives both the descriptor class instance and subject class instance when
invoked, it’s well suited to decorating methods when we need both the decorator’s state
and the original class instance for dispatching calls. Consider the following alternative
tracing decorator, which also happens to be a descriptor when used for a class-level
method:
class tracer(object):                    # A decorator+descriptor
    def __init__(self, func):            # On @ decorator
        self.calls = 0                   # Save func for later call
        self.func = func
    def __call__(self, *args, **kwargs): # On call to original func
        self.calls += 1
        print('call %s to %s' % (self.calls, self.func.__name__))
        return self.func(*args, **kwargs)
    def __get__(self, instance, owner):  # On method attribute fetch
        return wrapper(self, instance)

class wrapper:
    def __init__(self, desc, subj):      # Save both instances
        self.desc = desc                 # Route calls back to deco/desc
        self.subj = subj
    def __call__(self, *args, **kwargs):
        return self.desc(self.subj, *args, **kwargs)   # Runs tracer.__call__

@tracer
def spam(a, b, c):                       # spam = tracer(spam)
    ...same as prior...                  # Uses __call__ only

class Person:
    @tracer
    def giveRaise(self, percent):        # giveRaise = tracer(giveRaise)
        ...same as prior...              # Makes giveRaise a descriptor
This works the same as the preceding nested function coding. Its operation varies by usage context:

- Decorated functions invoke only its __call__, and never invoke its __get__.
- Decorated methods invoke its __get__ first to resolve the method name fetch (on I.method); the object returned by __get__ retains the subject class instance and is then invoked to complete the call expression, thereby triggering the decorator's __call__ (on ()).
For example, the test code’s call to:
sue.giveRaise(.10) # Runs __get__ then __call__
runs tracer.__get__ first, because the giveRaise attribute in the Person class has been rebound to a descriptor by the method function decorator. The call expression then triggers the __call__ method of the returned wrapper object, which in turn invokes tracer.__call__. In other words, decorated method calls trigger a four-step process: tracer.__get__, followed by three call operations—wrapper.__call__, tracer.__call__, and finally the original wrapped method.
The wrapper object retains both descriptor and subject instances, so it can route control back to the original decorator/descriptor class instance. In effect, the wrapper object saves the subject class instance available during method attribute fetch and adds it to the later call's arguments list, which is passed to the decorator's __call__. Routing the call back to the descriptor class instance this way is required in this application so that all calls to a wrapped method use the same calls counter state information in the descriptor instance object.
Alternatively, we could use a nested function and enclosing scope references to achieve the same effect—the following version works the same as the preceding one, by swapping the class and its object attributes for a nested function and scope references. It requires noticeably less code, but follows the same four-step process on each decorated method call:
class tracer(object):
    def __init__(self, func):            # On @ decorator
        self.calls = 0                   # Save func for later call
        self.func = func
    def __call__(self, *args, **kwargs): # On call to original func
        self.calls += 1
        print('call %s to %s' % (self.calls, self.func.__name__))
        return self.func(*args, **kwargs)
    def __get__(self, instance, owner):  # On method fetch
        def wrapper(*args, **kwargs):    # Retain both inst
            return self(instance, *args, **kwargs)   # Runs __call__
        return wrapper
Add print statements to these alternatives' methods to trace the multistep get/call process on your own, and run them with the same test code as in the nested function alternative shown earlier (see file calltracer-descr.py for their source). In either coding, this descriptor-based scheme is also substantially subtler than the nested function option, and so is probably a second choice here. To be more blunt, if its complexity doesn't send you screaming into the night, its performance costs probably should! Still, this may be a useful coding pattern in other contexts.
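For instance, the suggested experiment might look like the following hedged sketch; the step labels are illustrative additions, not the book's output:

class tracer(object):
    def __init__(self, func):
        self.calls, self.func = 0, func
    def __call__(self, *args, **kwargs):
        print('step: tracer.__call__')   # Third: route to original func
        self.calls += 1
        return self.func(*args, **kwargs)
    def __get__(self, instance, owner):
        print('step: tracer.__get__')    # First: method attribute fetch
        def wrapper(*args, **kwargs):
            print('step: wrapper call')  # Second: call expression
            return self(instance, *args, **kwargs)
        return wrapper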
It's also worth noting that we might code this descriptor-based decorator more simply as follows, but it would then apply only to methods, not to simple functions—an intrinsic limitation of attribute descriptors (and just the inverse of the problem we're trying to solve: application to both functions and methods):
class tracer(object):                    # For methods, but not functions!
    def __init__(self, meth):            # On @ decorator
        self.calls = 0
        self.meth = meth
    def __get__(self, instance, owner):  # On method fetch
        def wrapper(*args, **kwargs):    # On method call: proxy with self+inst
            self.calls += 1
            print('call %s to %s' % (self.calls, self.meth.__name__))
            return self.meth(instance, *args, **kwargs)
        return wrapper

class Person:
    @tracer                              # Applies to class methods
    def giveRaise(self, percent):        # giveRaise = tracer(giveRaise)
        ...                              # Makes giveRaise a descriptor

@tracer                                  # But fails for simple functions
def spam(a, b, c):                       # spam = tracer(spam)
    ...                                  # No attribute fetch occurs here
In the rest of this chapter we're going to be fairly casual about using classes or functions to code our function decorators, as long as they are applied only to functions. Some decorators may not require the instance of the original class, and will still work on both functions and methods if coded as a class—something like Python's own staticmethod decorator, for example, wouldn't require an instance of the subject class (indeed, its whole point is to remove the instance from the call).
The moral of this story, though, is that if you want your decorators to work on both
simple functions and methods, you’re probably better off using the nested-function-
based coding pattern outlined here instead of a class with call interception.
Timing Calls
To sample the fuller flavor of what function decorators are capable of, let’s turn to a
different use case. Our next decorator times calls made to a decorated function—both
the time for one call, and the total time among all calls. The decorator is applied to two
functions, in order to compare the relative speed of list comprehensions and the map
built-in call:
# File timerdeco1.py
# Caveat: range still differs - a list in 2.X, an iterable in 3.X
# Caveat: timer won't work on methods as coded (see quiz solution)

import time, sys
force = list if sys.version_info[0] == 3 else (lambda X: X)

class timer:
    def __init__(self, func):
        self.func = func
        self.alltime = 0
    def __call__(self, *args, **kargs):
        start = time.clock()
        result = self.func(*args, **kargs)
        elapsed = time.clock() - start
        self.alltime += elapsed
        print('%s: %.5f, %.5f' % (self.func.__name__, elapsed, self.alltime))
        return result

@timer
def listcomp(N):
    return [x * 2 for x in range(N)]

@timer
def mapcall(N):
    return force(map((lambda x: x * 2), range(N)))

result = listcomp(5)                     # Time for this call, all calls, return value
listcomp(50000)
listcomp(500000)
listcomp(1000000)
print(result)
print('allTime = %s' % listcomp.alltime) # Total time for all listcomp calls

print('')
result = mapcall(5)
mapcall(50000)
mapcall(500000)
mapcall(1000000)
print(result)
print('allTime = %s' % mapcall.alltime)  # Total time for all mapcall calls

print('\n**map/comp = %s' % round(mapcall.alltime / listcomp.alltime, 3))
When run in either Python 3.X or 2.X, the output of this file’s self-test code is as follows
—giving for each function call the function name, time for this call, and time for all
calls so far, along with the first call’s return value, cumulative time for each function,
and the map-to-comprehension time ratio at the end:
c:\code> py −3 timerdeco1.py
listcomp: 0.00001, 0.00001
listcomp: 0.00499, 0.00499
listcomp: 0.05716, 0.06215
listcomp: 0.11565, 0.17781
[0, 2, 4, 6, 8]
allTime = 0.17780527629411225
mapcall: 0.00002, 0.00002
mapcall: 0.00988, 0.00990
mapcall: 0.10601, 0.11591
mapcall: 0.21690, 0.33281
[0, 2, 4, 6, 8]
allTime = 0.3328064956447921
**map/comp = 1.872
Times vary per Python line and test machine, of course, and cumulative time is available as a class instance attribute here. As usual, map calls are almost twice as slow as list comprehensions when the latter can avoid a function call (or equivalently, its requirement of function calls can make map slower).
Decorators versus per-call timing
For comparison, see Chapter 21 for a nondecorator approach to timing iteration alternatives like these. As a review, we saw two per-call timing techniques there, homegrown and library—here deployed to time the 1M list comprehension case of the decorator's test code, though incurring extra costs for management code including an outer loop and function calls:
>>> def listcomp(N): [x * 2 for x in range(N)]
>>> import timer # Chapter 21 techniques
>>> timer.total(1, listcomp, 1000000)
(0.1461295268088542, None)
>>> import timeit
>>> timeit.timeit(number=1, stmt=lambda: listcomp(1000000))
0.14964829430189397
In this specific case, a nondecorator approach would allow the subject functions to be used with or without timing, but it would also complicate the call signature when timing is desired—we'd need to add code at every call instead of once at the def. Moreover, in the nondecorator scheme there would be no direct way to guarantee that all list builder calls in a program are routed through timer logic, short of finding and potentially changing them all. This may make it difficult to collect cumulative data for all calls.
In general, decorators may be preferred when functions are already deployed as part of
a larger system, and may not be easily passed to analysis functions at calls. On the other
hand, because decorators charge each call to a function with augmentation logic, a
nondecorator approach may be better if you wish to augment calls more selectively. As
usual, different tools serve different roles.
Timer call portability and new options in 3.3: Also see Chapter 21's more complete handling and selection of time module functions, as well as its sidebar concerning the new and improved timer functions in this module available as of Python 3.3 (e.g., perf_counter). We're taking a simplistic approach here for both brevity and version neutrality, but time.clock may not be best on some platforms even prior to 3.3, and platform or version tests may be required outside Windows.
Testing subtleties
Notice how this script uses its force setting to make it portable between 2.X and 3.X.
As described in Chapter 14, the map built-in returns an iterable that generates results
on demand in 3.X, but an actual list in 2.X. Hence, 3.X’s map by itself doesn’t compare
directly to a list comprehension’s work. In fact, without wrapping it in a list call to
force results production, the map test takes virtually no time at all in 3.X—it returns an
iterable without iterating!
At the same time, adding this list call in 2.X too charges map with an unfair penalty—
the map test’s results would include the time required to build two lists, not one. To
work around this, the script selects a map enclosing function per the Python version
number in sys: in 3.X, picking list, and in 2.X using a no-op function that simply
returns its input argument unchanged. This adds a very minor constant time in 2.X,
which is probably fully overshadowed by the cost of the inner loop iterations in the
timed function.
While this makes the comparison between list comprehensions and map more fair in either 2.X or 3.X, because range is also an iterator in 3.X, the results for 2.X and 3.X won't compare directly unless you also hoist this call out of the timed code. They'll be relatively comparable—and will reflect best practice code in each line anyhow—but a range iteration adds extra time in 3.X only. For more on all such things, see Chapter 21's benchmark recreations; producing comparable numbers is often a nontrivial task.
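To illustrate the hoisting idea, here is a minimal self-contained sketch, an assumption-laden illustration using time.time directly rather than this section's timer decorator:

import time

N = 1000000
seq = list(range(N))                     # Hoist range out of the timed code

start = time.time()
result = [x * 2 for x in seq]            # Time just the comprehension itself
print('listcomp: %.5f' % (time.time() - start))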
Finally, as we did for the tracer decorator earlier, we could make this timing decorator
reusable in other modules by indenting the self-test code at the bottom of the file under
a __name__ test so it runs only when the file is run, not when it’s imported. We won’t
do this here, though, because we’re about to add another feature to our code.
Adding Decorator Arguments
The timer decorator of the prior section works, but it would be nice if it were more
configurable—providing an output label and turning trace messages on and off, for
instance, might be useful in a general-purpose tool like this. Decorator arguments come
in handy here: when they’re coded properly, we can use them to specify configuration
options that can vary for each decorated function. A label, for instance, might be added
as follows:
def timer(label=''):
    def decorator(func):
        def onCall(*args):               # Multilevel state retention:
            ...                          # args passed to function
            func(*args)                  # func retained in enclosing scope
            print(label, ...)            # label retained in enclosing scope
        return onCall
    return decorator                     # Returns the actual decorator

@timer('==>')                            # Like listcomp = timer('==>')(listcomp)
def listcomp(N): ...                     # listcomp is rebound to new onCall

listcomp(...)                            # Really calls onCall
This code adds an enclosing scope to retain a decorator argument for use on a later actual call. When the listcomp function is defined, Python really invokes decorator—the result of timer, run before decoration actually occurs—with the label value available in its enclosing scope. That is, timer returns the decorator, which remembers both the decorator argument and the original function, and returns the callable onCall, which ultimately invokes the original function on later calls. Because this structure creates new decorator and onCall functions, their enclosing scopes are per-decoration state retention.
We can put this structure to use in our timer to allow a label and a trace control flag to
be passed in at decoration time. Here’s an example that does just that, coded in a
module file named timerdeco2.py so it can be imported as a general tool; it uses a class
for the second state retention level instead of a nested function, but the net result is
similar:
import time

def timer(label='', trace=True):         # On decorator args: retain args
    class Timer:
        def __init__(self, func):        # On @: retain decorated func
            self.func = func
            self.alltime = 0
        def __call__(self, *args, **kargs):   # On calls: call original
            start = time.clock()
            result = self.func(*args, **kargs)
            elapsed = time.clock() - start
            self.alltime += elapsed
            if trace:
                format = '%s %s: %.5f, %.5f'
                values = (label, self.func.__name__, elapsed, self.alltime)
                print(format % values)
            return result
    return Timer
Mostly all we've done here is embed the original Timer class in an enclosing function, in order to create a scope that retains the decorator arguments per deployment. The outer timer function is called before decoration occurs, and it simply returns the Timer class to serve as the actual decorator. On decoration, an instance of Timer is made that remembers the decorated function itself, but also has access to the decorator arguments in the enclosing function scope.
Timing with decorator arguments
This time, rather than embedding self-test code in this file, we'll run the decorator in a different file. Here's a client of our timer decorator, the module file testseqs.py, applying it to sequence iteration alternatives again:
import sys
from timerdeco2 import timer
force = list if sys.version_info[0] == 3 else (lambda X: X)

@timer(label='[CCC]==>')
def listcomp(N):                         # Like listcomp = timer(...)(listcomp)
    return [x * 2 for x in range(N)]     # listcomp(...) triggers Timer.__call__

@timer(trace=True, label='[MMM]==>')
def mapcall(N):
    return force(map((lambda x: x * 2), range(N)))

for func in (listcomp, mapcall):
    result = func(5)                     # Time for this call, all calls, return value
    func(50000)
    func(500000)
    func(1000000)
    print(result)
    print('allTime = %s\n' % func.alltime)   # Total time for all calls

print('**map/comp = %s' % round(mapcall.alltime / listcomp.alltime, 3))
Again, to make this fair, map is wrapped in a list call in 3.X only. When run as is in 3.X
or 2.X, this file prints the following—each decorated function now has a label of its
own defined by decorator arguments, which will be more useful when we need to find
trace displays mixed in with a larger program’s output:
c:\code> py −3 testseqs.py
[CCC]==> listcomp: 0.00001, 0.00001
[CCC]==> listcomp: 0.00504, 0.00505
[CCC]==> listcomp: 0.05839, 0.06344
[CCC]==> listcomp: 0.12001, 0.18344
[0, 2, 4, 6, 8]
allTime = 0.1834406801777564
[MMM]==> mapcall: 0.00003, 0.00003
[MMM]==> mapcall: 0.00961, 0.00964
[MMM]==> mapcall: 0.10929, 0.11892
[MMM]==> mapcall: 0.22143, 0.34035
[0, 2, 4, 6, 8]
allTime = 0.3403542519173618
**map/comp = 1.855
As usual, we can also test interactively to see how the decorator's configuration arguments come into play:
>>> from timerdeco2 import timer
>>> @timer(trace=False) # No tracing, collect total time
... def listcomp(N):
... return [x * 2 for x in range(N)]
...
>>> x = listcomp(5000)
>>> x = listcomp(5000)
>>> x = listcomp(5000)
>>> listcomp.alltime
0.0037191417530599152
>>> listcomp
<timerdeco2.timer.<locals>.Timer object at 0x02957518>
>>> @timer(trace=True, label='\t=>') # Turn on tracing, custom label
... def listcomp(N):
... return [x * 2 for x in range(N)]
...
>>> x = listcomp(5000)
=> listcomp: 0.00106, 0.00106
>>> x = listcomp(5000)
=> listcomp: 0.00108, 0.00214
>>> x = listcomp(5000)
=> listcomp: 0.00107, 0.00321
>>> listcomp.alltime
0.003208920466562404
As is, this timing function decorator can be used for any function, both in modules and interactively. In other words, it automatically qualifies as a general-purpose tool for timing code in our scripts. Watch for another example of decorator arguments in the section “Implementing Private Attributes” on page 1314, and again in “A Basic Range-Testing Decorator for Positional Arguments”.
Supporting methods: This section's timer decorator works on any function, but a minor rewrite is required to be able to apply it to class-level methods too. In short, as our earlier section “Class Blunders I: Decorating Methods” on page 1289 illustrated, it must avoid using a nested class. Because this mutation was deliberately reserved to be a subject of one of our end-of-chapter quiz questions, though, I'll avoid giving away the answer completely here.
Coding Class Decorators
So far we've been coding function decorators to manage function calls, but as we've seen, decorators have been extended to work on classes too as of Python 2.6 and 3.0. As described earlier, while similar in concept to function decorators, class decorators are applied to classes instead—they may be used either to manage classes themselves, or to intercept instance creation calls in order to manage instances. Also like function decorators, class decorators are really just optional syntactic sugar, though many believe that they make a programmer's intent more obvious and minimize erroneous or missed calls.
Singleton Classes
Because class decorators may intercept instance creation calls, they can be used to either
manage all the instances of a class, or augment the interfaces of those instances. To
demonstrate, here’s a first class decorator example that does the former—managing all
instances of a class. This code implements the classic singleton coding pattern, where
at most one instance of a class ever exists. Its singleton function defines and returns a
function for managing instances, and the @ syntax automatically wraps up a subject
class in this function:
# 3.X and 2.X: global table

instances = {}
def singleton(aClass):                   # On @ decoration
    def onCall(*args, **kwargs):         # On instance creation
        if aClass not in instances:      # One dict entry per class
            instances[aClass] = aClass(*args, **kwargs)
        return instances[aClass]
    return onCall
To use this, decorate the classes for which you want to enforce a single-instance model
(for reference, all the code in this section is in the file singletons.py):
@singleton                               # Person = singleton(Person)
class Person:                            # Rebinds Person to onCall
    def __init__(self, name, hours, rate):   # onCall remembers Person
        self.name = name
        self.hours = hours
        self.rate = rate
    def pay(self):
        return self.hours * self.rate

@singleton                               # Spam = singleton(Spam)
class Spam:                              # Rebinds Spam to onCall
    def __init__(self, val):             # onCall remembers Spam
        self.attr = val

bob = Person('Bob', 40, 10)              # Really calls onCall
print(bob.name, bob.pay())

sue = Person('Sue', 50, 20)              # Same, single object
print(sue.name, sue.pay())

X = Spam(val=42)                         # One Person, one Spam
Y = Spam(99)
print(X.attr, Y.attr)
Now, when the Person or Spam class is later used to create an instance, the wrapping
logic layer provided by the decorator routes instance construction calls to onCall, which
in turn ensures a single instance per class, regardless of how many construction calls
are made. Here’s this code’s output (2.X prints extra tuple parentheses):
c:\code> python singletons.py
Bob 400
Bob 400
42 42
Coding alternatives
Interestingly, you can code a more self-contained solution here if you're able to use the nonlocal statement (available in Python 3.X only) to change enclosing scope names, as described earlier—the following alternative achieves an identical effect, by using one enclosing scope per class, instead of one global table entry per class. It works the same, but it does not depend on names in the global scope outside the decorator (note that the None check could use is instead of == here, but it's a trivial test either way):
# 3.X only: nonlocal

def singleton(aClass):                   # On @ decoration
    instance = None
    def onCall(*args, **kwargs):         # On instance creation
        nonlocal instance                # 3.X and later nonlocal
        if instance == None:
            instance = aClass(*args, **kwargs)   # One scope per class
        return instance
    return onCall
In either Python 3.X or 2.X (2.6 and later), you can also code a self-contained solution
with either function attributes or a class instead. The first of the following codes the
former, leveraging the fact that there will be one onCall function per decoration—the
object namespace serves the same role as an enclosing scope. The second uses one
instance per decoration, rather than an enclosing scope, function object, or global table.
In fact, the second relies on the same coding pattern that we will later see is a common
decorator class blunder—here we want just one instance, but that’s not usually the case:
# 3.X and 2.X: func attrs, classes (alternative codings)

def singleton(aClass):                   # On @ decoration
    def onCall(*args, **kwargs):         # On instance creation
        if onCall.instance == None:
            onCall.instance = aClass(*args, **kwargs)   # One function per class
        return onCall.instance
    onCall.instance = None
    return onCall

class singleton:
    def __init__(self, aClass):          # On @ decoration
        self.aClass = aClass
        self.instance = None
    def __call__(self, *args, **kwargs): # On instance creation
        if self.instance == None:
            self.instance = self.aClass(*args, **kwargs)   # One instance per class
        return self.instance
To make this decorator a fully general-purpose tool, choose one, store it in an importable module file, and indent the self-test code under a __name__ check—steps we'll leave as a suggested exercise. The final class-based version offers a portable and explicit option, with extra structure that may better support later evolution, but OOP might not be warranted in all contexts.
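For example, packaging the first version might look like the following minimal sketch; the file and test names here are assumptions for illustration only:

# File singleton.py: importable singleton decorator

instances = {}
def singleton(aClass):
    def onCall(*args, **kwargs):
        if aClass not in instances:
            instances[aClass] = aClass(*args, **kwargs)
        return instances[aClass]
    return onCall

if __name__ == '__main__':               # Self-test runs only when run, not imported
    @singleton
    class Person:
        def __init__(self, name): self.name = name

    print(Person('Bob') is Person('Sue'))   # True: second call returns first instance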
Tracing Object Interfaces
The singleton example of the prior section illustrated using class decorators to manage all the instances of a class. Another common use case for class decorators augments the interface of each generated instance. Class decorators can essentially install on instances a wrapper or “proxy” logic layer that manages access to their interfaces in some way.
For example, in Chapter 31, the __getattr__ operator overloading method is shown as a way to wrap up entire object interfaces of embedded instances, in order to implement the delegation coding pattern. We saw similar examples in the managed attribute coverage of the prior chapter. Recall that __getattr__ is run when an undefined attribute name is fetched; we can use this hook to intercept method calls in a controller class and propagate them to an embedded object.
For reference, here’s the original nondecorator delegation example, working on two
built-in type objects:
class Wrapper:
    def __init__(self, object):
        self.wrapped = object            # Save object
    def __getattr__(self, attrname):
        print('Trace:', attrname)        # Trace fetch
        return getattr(self.wrapped, attrname)   # Delegate fetch

>>> x = Wrapper([1, 2, 3])               # Wrap a list
>>> x.append(4)                          # Delegate to list method
Trace: append
>>> x.wrapped                            # Print my member
[1, 2, 3, 4]

>>> x = Wrapper({"a": 1, "b": 2})        # Wrap a dictionary
>>> list(x.keys())                       # Delegate to dictionary method (use list() in 3.X)
Trace: keys
['a', 'b']
In this code, the Wrapper class intercepts access to any of the wrapped object's named attributes, prints a trace message, and uses the getattr built-in to pass off the request to the wrapped object. Specifically, it traces attribute accesses made outside the wrapped object's class; accesses inside the wrapped object's methods are not caught and run normally by design. This whole-interface model differs from the behavior of function decorators, which wrap up just one specific method.
Tracing interfaces with class decorators
Class decorators provide an alternative and convenient way to code this __getattr__
technique to wrap an entire interface. As of both 2.6 and 3.0, for example, the prior
class example can be coded as a class decorator that triggers wrapped instance creation,
instead of passing a premade instance into the wrapper’s constructor (also augmented
here to support keyword arguments with **kargs and to count the number of accesses
made to illustrate changeable state):
def Tracer(aClass):                      # On @ decorator
    class Wrapper:
        def __init__(self, *args, **kargs):   # On instance creation
            self.fetches = 0
            self.wrapped = aClass(*args, **kargs)   # Use enclosing scope name
        def __getattr__(self, attrname):
            print('Trace: ' + attrname)  # Catches all but own attrs
            self.fetches += 1
            return getattr(self.wrapped, attrname)   # Delegate to wrapped obj
    return Wrapper

if __name__ == '__main__':

    @Tracer
    class Spam:                          # Spam = Tracer(Spam)
        def display(self):               # Spam is rebound to Wrapper
            print('Spam!' * 8)

    @Tracer
    class Person:                        # Person = Tracer(Person)
        def __init__(self, name, hours, rate):   # Wrapper remembers Person
            self.name = name
            self.hours = hours
            self.rate = rate
        def pay(self):                   # Accesses outside class traced
            return self.hours * self.rate    # In-method accesses not traced

    food = Spam()                        # Triggers Wrapper()
    food.display()                       # Triggers __getattr__
    print([food.fetches])

    bob = Person('Bob', 40, 50)          # bob is really a Wrapper
    print(bob.name)                      # Wrapper embeds a Person
    print(bob.pay())

    print('')
    sue = Person('Sue', rate=100, hours=60)   # sue is a different Wrapper
    print(sue.name)                      # with a different Person
    print(sue.pay())

    print(bob.name)                      # bob has different state
    print(bob.pay())
    print([bob.fetches, sue.fetches])    # Wrapper attrs not traced
It's important to note that this is very different from the tracer decorator we met earlier (despite the name!). In “Coding Function Decorators”, we looked at decorators that enabled us to trace and time calls to a given function or method. In contrast, by intercepting instance creation calls, the class decorator here allows us to trace an entire object interface—that is, accesses to any of the instance's attributes.
The following is the output produced by this code under both 3.X and 2.X (2.6 and
later): attribute fetches on instances of both the Spam and Person classes invoke the
__getattr__ logic in the Wrapper class, because food and bob are really instances of
Wrapper, thanks to the decorator’s redirection of instance creation calls:
c:\code> python interfacetracer.py
Trace: display
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!
[1]
Trace: name
Bob
Trace: pay
2000
Trace: name
Sue
Trace: pay
6000
Trace: name
Bob
Trace: pay
2000
[4, 2]
Notice how there is one Wrapper class with state retention per decoration, generated by
the nested class statement in the Tracer function, and how each instance gets its own
fetches counter by virtue of generating a new Wrapper instance. As we’ll see ahead,
orchestrating this is trickier than you may expect.
Applying class decorators to built-in types
Also notice that the preceding decorates a user-defined class. Just like in the original example in Chapter 31, we can also use the decorator to wrap up a built-in type such as a list, as long as we either subclass to allow decoration syntax or perform the decoration manually—decorator syntax requires a class statement for the @ line. In the following, x is really a Wrapper again due to the indirection of decoration:
>>> from interfacetracer import Tracer
>>> @Tracer
... class MyList(list): pass # MyList = Tracer(MyList)
>>> x = MyList([1, 2, 3]) # Triggers Wrapper()
>>> x.append(4) # Triggers __getattr__, append
Trace: append
>>> x.wrapped
[1, 2, 3, 4]
>>> WrapList = Tracer(list) # Or perform decoration manually
>>> x = WrapList([4, 5, 6]) # Else subclass statement required
>>> x.append(7)
Trace: append
>>> x.wrapped
[4, 5, 6, 7]
The decorator approach allows us to move instance creation into the decorator itself,
instead of requiring a premade object to be passed in. Although this seems like a minor
difference, it lets us retain normal instance creation syntax and realize all the benefits
of decorators in general. Rather than requiring all instance creation calls to route objects
through a wrapper manually, we need only augment class definitions with decorator
syntax:
@Tracer                                  # Decorator approach
class Person: ...

bob = Person('Bob', 40, 50)
sue = Person('Sue', rate=100, hours=60)

class Person: ...                        # Nondecorator approach

bob = Wrapper(Person('Bob', 40, 50))
sue = Wrapper(Person('Sue', rate=100, hours=60))
Assuming you will make more than one instance of a class, and want to apply the
augmentation to every instance of a class, decorators will generally be a net win in terms
of both code size and code maintenance.
Attribute version skew note: The preceding tracer decorator works for explicitly accessed attribute names on all Pythons. As we learned in Chapter 38, Chapter 32, and elsewhere, though, __getattr__ intercepts built-ins' implicit accesses to operator overloading methods like __str__ and __repr__ in Python 2.X's default classic classes, but not in 3.X's new-style classes.
In Python 3.X's classes, instances inherit defaults for some, but not all of these names from the class (really, from the object superclass). Moreover, in 3.X, implicitly invoked attributes for built-in operations like printing and + are not routed through __getattr__, or its cousin, __getattribute__. In new-style classes, built-ins start such searches at classes and skip the normal instance lookup entirely.
Here, this means that the __getattr__ based tracing wrapper will automatically trace and propagate operator overloading calls for built-ins in 2.X as coded, but not in 3.X. To see this, display “x” directly at the end of the preceding interactive session—in 2.X the attribute __repr__ is traced and the list prints as expected, but in 3.X no trace occurs and the list prints using a default display for the Wrapper class:
>>> x # 2.X
Trace: __repr__
[4, 5, 6, 7]
>>> x # 3.X
<interfacetracer.Tracer.<locals>.Wrapper object at 0x02946358>
To work the same in 3.X, operator overloading methods generally must be redefined redundantly in the wrapper class, either by hand, by tools, or by definition in superclasses. We'll see this at work again in a Private decorator later in this chapter—where we'll also study ways to add the methods required of such code in 3.X.
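For instance, a hedged sketch of the by-hand workaround in 3.X might add the needed operator method to the generated wrapper class explicitly; only __repr__ is shown here, and other operator methods would follow the same pattern:

def Tracer(aClass):
    class Wrapper:
        def __init__(self, *args, **kargs):
            self.fetches = 0
            self.wrapped = aClass(*args, **kargs)
        def __getattr__(self, attrname):
            print('Trace: ' + attrname)
            self.fetches += 1
            return getattr(self.wrapped, attrname)
        def __repr__(self):              # Redefined explicitly: 3.X built-ins
            return repr(self.wrapped)    # skip __getattr__ for this method
    return Wrapper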
Class Blunders II: Retaining Multiple Instances
Curiously, the decorator function in this example can almost be coded as a class instead of a function, with the proper operator overloading protocol. The following slightly simplified alternative works similarly because its __init__ is triggered when the @ decorator is applied to the class, and its __call__ is triggered when a subject class instance is created. Our objects are really instances of Tracer this time, and we essentially just trade an enclosing scope reference for an instance attribute here:
class Tracer:
    def __init__(self, aClass):          # On @decorator
        self.aClass = aClass             # Use instance attribute
    def __call__(self, *args):           # On instance creation
        self.wrapped = self.aClass(*args)    # ONE (LAST) INSTANCE PER CLASS!
        return self
    def __getattr__(self, attrname):
        print('Trace: ' + attrname)
        return getattr(self.wrapped, attrname)

@Tracer                                  # Triggers __init__
class Spam:                              # Like: Spam = Tracer(Spam)
    def display(self):
        print('Spam!' * 8)

...
food = Spam()                            # Triggers __call__
food.display()                           # Triggers __getattr__
As we saw in the abstract earlier, though, this class-only alternative handles multiple
classes as before, but it won’t quite work for multiple instances of a given class: each
instance construction call triggers __call__, which overwrites the prior instance. The
net effect is that Tracer saves just one instance—the last one created. Experiment with
this yourself to see how, but here’s an example of the problem:
@Tracer
class Person:                            # Person = Tracer(Person)
    def __init__(self, name):            # Wrapper bound to Person
        self.name = name

bob = Person('Bob')                      # bob is really a Wrapper
print(bob.name)                          # Wrapper embeds a Person

sue = Person('Sue')
print(sue.name)                          # sue overwrites bob
print(bob.name)                          # OOPS: now bob's name is 'Sue'!
This code’s output follows—because this tracer only has a single shared instance, the
second overwrites the first:
Trace: name
Bob
Trace: name
Sue
Trace: name
Sue
The problem here is bad state retention—we make one decorator instance per class, but not per class instance, such that only the last instance is retained. The solution, as in our prior class blunder for decorating methods, lies in abandoning class-based decorators.
The earlier function-based Tracer version does work for multiple instances, because
each instance construction call makes a new Wrapper instance, instead of overwriting
the state of a single shared Tracer instance; the original nondecorator version handles
multiple instances correctly for the same reason. The moral here: decorators are not
only arguably magical, they can also be incredibly subtle!
Decorators Versus Manager Functions
Regardless of such subtleties, the Tracer class decorator example ultimately still relies on __getattr__ to intercept fetches on a wrapped and embedded instance object. As we saw earlier, all we've really accomplished is moving the instance creation call inside a class, instead of passing the instance into a manager function. With the original nondecorator tracing example, we would simply code instance creation differently:
class Spam:                                    # Nondecorator version
    ...                                        # Any class will do
food = Wrapper(Spam())                         # Special creation syntax

@Tracer
class Spam:                                    # Decorator version
    ...                                        # Requires @ syntax at class
food = Spam()                                  # Normal creation syntax
Essentially, class decorators shift special syntax requirements from the instance creation
call to the class statement itself. This is also true for the singleton example earlier in
this section—rather than decorating a class and using normal instance creation calls,
we could simply pass the class and its construction arguments into a manager function:
instances = {}
def getInstance(aClass, *args, **kwargs):
    if aClass not in instances:
        instances[aClass] = aClass(*args, **kwargs)
    return instances[aClass]

bob = getInstance(Person, 'Bob', 40, 10)       # Versus: bob = Person('Bob', 40, 10)
Alternatively, we could use Python’s introspection facilities to fetch the class from an
already created instance (assuming creating an initial instance is acceptable):
instances = {}
def getInstance(object):
    aClass = object.__class__
    if aClass not in instances:
        instances[aClass] = object
    return instances[aClass]

bob = getInstance(Person('Bob', 40, 10))       # Versus: bob = Person('Bob', 40, 10)
The same holds true for function decorators like the tracer we wrote earlier: rather than
decorating a function with logic that intercepts later calls, we could simply pass the
function and its arguments into a manager that dispatches the call:
def func(x, y):                                # Nondecorator version
    ...                                        # def tracer(func, args): ... func(*args)
result = tracer(func, (1, 2))                  # Special call syntax

@tracer
def func(x, y):                                # Decorator version
    ...                                        # Rebinds name: func = tracer(func)
result = func(1, 2)                            # Normal call syntax
Manager function approaches like this place the burden of using special syntax on
calls, instead of expecting decoration syntax at function and class definitions, but also
allow you to selectively apply augmentation on a call-by-call basis.
Why Decorators? (Revisited)
So why did I just show you ways to not use decorators to implement singletons? As I
mentioned at the start of this chapter, decorators present us with tradeoffs. Although
syntax matters, we all too often forget to ask the “why” questions when confronted
with new tools. Now that we’ve seen how decorators actually work, let’s step back for
a minute to glimpse the big picture here before moving on to more code.
Like most language features, decorators have both pros and cons. For example, in the
negatives column, decorators may suffer from three potential drawbacks, which can
vary per decorator type:
Type changes
As we’ve seen, when wrappers are inserted, a decorated function or class does not
retain its original type—it is rebound to a wrapper (proxy) object, which might
matter in programs that use object names or test object types. In the singleton
example, both the decorator and manager function approaches retain the original
class type for instances; in the tracer code, neither approach does, because wrap-
pers are required. Of course, you should avoid type checks in a polymorphic lan-
guage like Python anyhow, but there are exceptions to most rules.
Extra calls
A wrapping layer added by decoration incurs the additional performance cost of
an extra call each time the decorated object is invoked—calls are relatively time-
expensive operations, so decoration wrappers can make a program slower. In the
tracer code, both approaches require each attribute to be routed through a wrapper
layer; the singleton example avoids extra calls by retaining the original class type.
All or nothing
Because decorators augment a function or class, they generally apply to every later
call to the decorated object. That ensures uniform deployment, but can also be a
negative if you’d rather apply an augmentation more selectively on a call-by-call
basis.
That said, none of these is a very serious issue. For most programs, decorations’ uni-
formity is an asset, the type difference is unlikely to matter, and the speed hit of the
extra calls will be insignificant. Furthermore, the latter of these occurs only when
wrappers are used, can often be negated if we simply remove the decorator when op-
timal performance is required, and is also incurred by nondecorator solutions that add
wrapping logic (including metaclasses, as we’ll see in Chapter 40).
Conversely, as we saw at the start of this chapter, decorators have three main advan-
tages. Compared to the manager (a.k.a. “helper”) function solutions of the prior sec-
tion, decorators offer:
Explicit syntax
Decorators make augmentation explicit and obvious. Their @ syntax is easier to
recognize than special code in calls that may appear anywhere in a source file—in
our singleton and tracer examples, for instance, the decorator lines seem more
likely to be noticed than extra code at calls would be. Moreover, decorators allow
function and instance creation calls to use normal syntax familiar to all Python
programmers.
Code maintenance
Decorators avoid repeated augmentation code at each function or class call. Be-
cause they appear just once, at the definition of the class or function itself, they
obviate redundancy and simplify future code maintenance. For our singleton and
tracer cases, we need to use special code at each call to use a manager function
approach—extra work is required both initially and for any modifications that
must be made in the future.
Consistency
Decorators make it less likely that a programmer will forget to use required wrap-
ping logic. This derives mostly from the two prior advantages—because decoration
is explicit and appears only once, at the decorated objects themselves, decorators
promote more consistent and uniform API usage than special code that must be
included at each call. In the singleton example, for instance, it would be easy to
forget to route all class creation calls through special code, which would subvert
the singleton management altogether.
Decorators also promote code encapsulation to reduce redundancy and minimize future
maintenance effort; although other code structuring tools do too, decorators add ex-
plicit structure that makes this natural for augmentation tasks.
None of these benefits completely requires decorator syntax to be achieved, though,
and decorator usage is ultimately a stylistic choice. That said, most programmers find
them to be a net win, especially as a tool for using libraries and APIs correctly.
Historic anecdote: I can recall similar arguments being made both for
and against constructor functions in classes—prior to the introduction
of __init__ methods, programmers achieved the same effect by running
an instance through a method manually when creating it (e.g.,
X=Class().init()). Over time, though, despite being fundamentally a
stylistic choice, the __init__ syntax came to be universally preferred
because it was more explicit, consistent, and maintainable. Although
you should be the judge, decorators seem to bring many of the same
assets to the table.
Managing Functions and Classes Directly
Most of our examples in this chapter have been designed to intercept function and
instance creation calls. Although this is typical for decorators, they are not limited to
this role. Because decorators work by running new functions and classes through dec-
orator code, they can also be used to manage function and class objects themselves,
not just later calls made to them.
Imagine, for example, that you require methods or classes used by an application to be
registered to an API for later processing (perhaps that API will call the objects later, in
response to events). Although you could provide a registration function to be called
manually after the objects are defined, decorators make your intent more explicit.
The following simple implementation of this idea defines a decorator that can be ap-
plied to both functions and classes, to add the object to a dictionary-based registry.
Because it returns the object itself instead of a wrapper, it does not intercept later calls:
# Registering decorated objects to an API
from __future__ import print_function         # 2.X

registry = {}
def register(obj):                             # Both class and func decorator
    registry[obj.__name__] = obj               # Add to registry
    return obj                                 # Return obj itself, not a wrapper

@register
def spam(x):
    return x ** 2                              # spam = register(spam)

@register
def ham(x):
    return x ** 3

@register
class Eggs:                                    # Eggs = register(Eggs)
    def __init__(self, x):
        self.data = x ** 4
    def __str__(self):
        return str(self.data)
print('Registry:')
for name in registry:
    print(name, '=>', registry[name], type(registry[name]))

print('\nManual calls:')
print(spam(2))                                 # Invoke objects manually
print(ham(2))                                  # Later calls not intercepted
X = Eggs(2)
print(X)

print('\nRegistry calls:')
for name in registry:
    print(name, '=>', registry[name](2))       # Invoke from registry
When this code is run, the decorated objects are added to the registry by name, but
they still work as originally coded when they're called later, without being routed
through a wrapper layer. In fact, our objects can be run both manually and from inside
the registry table:
c:\code> py -3 registry-deco.py
Registry:
spam => <function spam at 0x02969158> <class 'function'>
ham => <function ham at 0x02969400> <class 'function'>
Eggs => <class '__main__.Eggs'> <class 'type'>
Manual calls:
4
8
16
Registry calls:
spam => 4
ham => 8
Eggs => 16
A user interface might use this technique, for example, to register callback handlers for
user actions. Handlers might be registered by function or class name, as done here, or
decorator arguments could be used to specify the subject event; an extra def statement
enclosing our decorator could be used to retain such arguments for use on decoration.
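That idea might look like the following sketch. The events table and the event names
here are hypothetical, invented for the demo, but the enclosing def retains the
decorator argument exactly as just described:
events = {}                                    # Hypothetical registry: event -> handlers

def on(event):                                 # Retain argument in enclosing scope
    def register(handler):
        events.setdefault(event, []).append(handler)
        return handler                         # No wrapper: later calls not intercepted
    return register

@on('click')
def save(data):
    print('saving', data)

@on('click')
def log(data):
    print('logging', data)

for handler in events['click']:                # The API fires the event later
    handler('spam')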
This example is artificial, but its technique is very general. For example, function dec-
orators might also be used to process function attributes, and class decorators might
insert new class attributes, or even new methods, dynamically. Consider the following
function decorators—they assign function attributes to record information for later use
by an API, but they do not insert a wrapper layer to intercept later calls:
# Augmenting decorated objects directly

>>> def decorate(func):
        func.marked = True           # Assign function attribute for later use
        return func

>>> @decorate
    def spam(a, b):
        return a + b

>>> spam.marked
True

>>> def annotate(text):              # Same, but value is decorator argument
        def decorate(func):
            func.label = text
            return func
        return decorate

>>> @annotate('spam data')
    def spam(a, b):                  # spam = annotate(...)(spam)
        return a + b

>>> spam(1, 2), spam.label
(3, 'spam data')
Such decorators augment functions and classes directly, without catching later calls to
them. We’ll see more examples of class decorations managing classes directly in the
next chapter, because this turns out to encroach on the domain of metaclasses; for the
remainder of this chapter, let’s turn to two larger case studies of decorators at work.
Example: “Private” and “Public” Attributes
The final two sections of this chapter present larger examples of decorator use. Both
are presented with minimal description, partly because this chapter has hit its size
limits, but mostly because you should already understand decorator basics well enough
to study these on your own. Being general-purpose tools, these examples give us a
chance to see how decorator concepts come together in more useful code.
Implementing Private Attributes
The following class decorator implements a Private declaration for class instance at-
tributes—that is, attributes stored on an instance, or inherited from one of its classes.
It disallows fetch and change access to such attributes from outside the decorated class,
but still allows the class itself to access those names freely within its own methods. It’s
not exactly C++ or Java, but it provides similar access control as an option in Python.
We saw an incomplete first-cut implementation of instance attribute privacy for
changes only in Chapter 30. The version here extends this concept to validate attribute
fetches too, and it uses delegation instead of inheritance to implement the model. In
fact, in a sense this is just an extension to the attribute tracer class decorator we met
earlier.
Although this example utilizes the new syntactic sugar of class decorators to code
attribute privacy, its attribute interception is ultimately still based upon the
__getattr__ and __setattr__ operator overloading methods we met in prior chapters. When
a private attribute access is detected, this version uses the raise statement to raise an
exception, along with an error message; the exception may be caught in a try or allowed
to terminate the script.
Here is the code, along with a self test at the bottom of the file. It will work under both
Python 3.X and 2.X (2.6 and later) because it employs version-neutral print and
raise syntax, though as coded it catches built-ins’ dispatch to operator overloading
method attributes in 2.X only (more on this in a moment):
"""
File access1.py (3.X + 2.X)
Privacy for attributes fetched from class instances.
See self-test code at end of file for a usage example.
Decorator same as: Doubler = Private('data', 'size')(Doubler).
Private returns onDecorator, onDecorator returns onInstance,
and each onInstance instance embeds a Doubler instance.
"""
traceMe = False
def trace(*args):
    if traceMe: print('[' + ' '.join(map(str, args)) + ']')

def Private(*privates):                        # privates in enclosing scope
    def onDecorator(aClass):                   # aClass in enclosing scope
        class onInstance:                      # wrapped in instance attribute
            def __init__(self, *args, **kargs):
                self.wrapped = aClass(*args, **kargs)
            def __getattr__(self, attr):       # My attrs don't call getattr
                trace('get:', attr)            # Others assumed in wrapped
                if attr in privates:
                    raise TypeError('private attribute fetch: ' + attr)
                else:
                    return getattr(self.wrapped, attr)
            def __setattr__(self, attr, value):          # Outside accesses
                trace('set:', attr, value)               # Others run normally
                if attr == 'wrapped':                    # Allow my attrs
                    self.__dict__[attr] = value          # Avoid looping
                elif attr in privates:
                    raise TypeError('private attribute change: ' + attr)
                else:
                    setattr(self.wrapped, attr, value)   # Wrapped obj attrs
        return onInstance                      # Or use __dict__
    return onDecorator

if __name__ == '__main__':
    traceMe = True

    @Private('data', 'size')                   # Doubler = Private(...)(Doubler)
    class Doubler:
        def __init__(self, label, start):
            self.label = label                 # Accesses inside the subject class
            self.data = start                  # Not intercepted: run normally
        def size(self):
            return len(self.data)              # Methods run with no checking
        def double(self):                      # Because privacy not inherited
            for i in range(self.size()):
                self.data[i] = self.data[i] * 2
        def display(self):
            print('%s => %s' % (self.label, self.data))

    X = Doubler('X is', [1, 2, 3])
    Y = Doubler('Y is', [-10, -20, -30])

    # The following all succeed
    print(X.label)                             # Accesses outside subject class
    X.display(); X.double(); X.display()       # Intercepted: validated, delegated
    print(Y.label)
    Y.display(); Y.double()
    Y.label = 'Spam'
    Y.display()

    # The following all fail properly
    """
    print(X.size())      # prints "TypeError: private attribute fetch: size"
    print(X.data)
    X.data = [1, 1, 1]
    X.size = lambda S: 0
    print(Y.data)
    print(Y.size())
    """
When traceMe is True, the module file’s self-test code produces the following output.
Notice how the decorator catches and validates both attribute fetches and assignments
run outside of the wrapped class, but does not catch attribute accesses inside the class
itself:
c:\code> py -3 access1.py
[set: wrapped <__main__.Doubler object at 0x00000000029769B0>]
[set: wrapped <__main__.Doubler object at 0x00000000029769E8>]
[get: label]
X is
[get: display]
X is => [1, 2, 3]
[get: double]
[get: display]
X is => [2, 4, 6]
[get: label]
Y is
[get: display]
Y is => [-10, -20, -30]
[get: double]
[set: label Spam]
[get: display]
Spam => [-20, -40, -60]
Implementation Details I
This code is a bit complex, and you’re probably best off tracing through it on your own
to see how it works. To help you study, though, here are a few highlights worth men-
tioning.
Inheritance versus delegation
The first-cut privacy example shown in Chapter 30 used inheritance to mix in a
__setattr__ to catch accesses. Inheritance makes this difficult, however, because dif-
ferentiating between accesses from inside or outside the class is not straightforward
(inside access should be allowed to run normally, and outside access should be restric-
ted). To work around this, the Chapter 30 example requires inheriting classes to use
__dict__ assignments to set attributes—an incomplete solution at best.
The version here uses delegation (embedding one object inside another) instead of in-
heritance; this pattern is better suited to our task, as it makes it much easier to distin-
guish between accesses inside and outside of the subject class. Attribute accesses from
outside the subject class are intercepted by the wrapper layer’s overloading methods
and delegated to the class if valid. Accesses inside the class itself (i.e., through self
within its methods’ code) are not intercepted and are allowed to run normally without
checks, because privacy is not inherited in this version.
Decorator arguments
The class decorator used here accepts any number of arguments, to name private at-
tributes. What really happens, though, is that the arguments are passed to the
Private function, and Private returns the decorator function to be applied to the subject
class. That is, the arguments are used before decoration ever occurs; Private returns
the decorator, which in turn “remembers” the privates list as an enclosing scope ref-
erence.
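As the file's docstring notes, the decoration is equivalent to a manual rebinding that
makes this two-step protocol explicit:
class Doubler:
    ...                                        # Class coded as before

Doubler = Private('data', 'size')(Doubler)     # Same effect as the @ line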
State retention and enclosing scopes
Speaking of enclosing scopes, there are actually three levels of state retention at work
in this code:
The arguments to Private are used before decoration occurs and are retained as
an enclosing scope reference for use in both onDecorator and onInstance.
The class argument to onDecorator is used at decoration time and is retained as an
enclosing scope reference for use at instance construction time.
The wrapped instance object is retained as an instance attribute in the onIn
stance proxy object, for use when attributes are later accessed from outside the
class.
This all works fairly naturally, given Python’s scope and namespace rules.
Using __dict__ and __slots__ (and other virtual names)
The __setattr__ method in this code relies on an instance object’s __dict__ attribute
namespace dictionary in order to set onInstance’s own wrapped attribute. As we learned
in the prior chapter, this method cannot assign an attribute directly without looping.
However, it uses the setattr built-in instead of __dict__ to set attributes in the wrap-
ped object itself. Moreover, getattr is used to fetch attributes in the wrapped object,
since they may be stored in the object itself or inherited by it.
Because of that, this code will work for most classes—including those with “virtual”
class-level attributes based on slots, properties, descriptors, and even __getattr__ and
its ilk. By assuming a namespace dictionary for itself only and using storage-neutral
tools for the wrapped object, the wrapper class avoids limitations inherent in other
tools.
For example, you may recall from Chapter 32 that new-style classes with __slots__
may not store attributes in a __dict__ (and in fact may not even have one of these at
all). However, because we rely on a __dict__ only at the onInstance level here, and not
in the wrapped instance, this concern does not apply. In addition, because setattr and
getattr apply to attributes based on both __dict__ and __slots__, our decorator applies
to classes using either storage scheme. By the same reasoning, the decorator also applies
to new-style properties and similar tools: delegated names will be looked up anew in
the wrapped instance, irrespective of attributes of the decorator proxy object itself.
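For instance, here is a quick test of that claim under 3.X, where all classes are
new-style; the slots-based client is a hypothetical demo, but it relies only on the
access1.py module listed earlier:
from access1 import Private

@Private('data')
class Limited:
    __slots__ = ['data', 'label']              # Wrapped instances have no __dict__
    def __init__(self):
        self.data = 1
        self.label = 'spam'

x = Limited()
print(x.label)                                 # OK: getattr handles slots too
x.data                                         # TypeError: private attribute fetch: data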
Generalizing for Public Declarations, Too
Now that we have a Private implementation, it’s straightforward to generalize the code
to allow for Public declarations too—they are essentially the inverse of Private decla-
rations, so we need only negate the inner test. The example listed in this section allows
a class to use decorators to define a set of either Private or Public instance attributes
—attributes of any kind stored on an instance or inherited from its classes—with the
following semantics:
Private declares attributes of a class’s instances that cannot be fetched or assigned,
except from within the code of the class’s methods. That is, any name declared
Private cannot be accessed from outside the class, while any name not declared
Private can be freely fetched or assigned from outside the class.
Public declares attributes of a class’s instances that can be fetched or assigned from
both outside the class and within the class’s methods. That is, any name declared
Public can be freely accessed anywhere, while any name not declared Public cannot
be accessed from outside the class.
Private and Public declarations are intended to be mutually exclusive: when using
Private, all undeclared names are considered Public, and when using Public, all un-
declared names are considered Private. They are essentially inverses, though unde-
clared names not created by a class’s methods behave slightly differently—new names
can be assigned and thus created outside the class under Private (all undeclared names
are accessible), but not under Public (all undeclared names are inaccessible).
Again, study this code on your own to get a feel for how this works. Notice that this
scheme adds an additional fourth level of state retention at the top, beyond that de-
scribed in the preceding section: the test functions used by the lambdas are saved in an
extra enclosing scope. This example is coded to run under either Python 3.X or 2.X
(2.6 or later), though it comes with a caveat when run under 3.X (explained briefly in
the file’s docstring and expanded on after the code):
"""
File access2.py (3.X + 2.X)
Class decorator with Private and Public attribute declarations.
Controls external access to attributes stored on an instance, or
inherited by it from its classes. Private declares attribute names
that cannot be fetched or assigned outside the decorated class,
and Public declares all the names that can.
Caveat: this works in 3.X for explicitly named attributes only: __X__
operator overloading methods implicitly run for built-in operations
do not trigger either __getattr__ or __getattribute__ in new-style
classes. Add __X__ methods here to intercept and delegate built-ins.
"""
traceMe = False
def trace(*args):
    if traceMe: print('[' + ' '.join(map(str, args)) + ']')

def accessControl(failIf):
    def onDecorator(aClass):
        class onInstance:
            def __init__(self, *args, **kargs):
                self.__wrapped = aClass(*args, **kargs)
            def __getattr__(self, attr):
                trace('get:', attr)
                if failIf(attr):
                    raise TypeError('private attribute fetch: ' + attr)
                else:
                    return getattr(self.__wrapped, attr)
            def __setattr__(self, attr, value):
                trace('set:', attr, value)
                if attr == '_onInstance__wrapped':
                    self.__dict__[attr] = value
                elif failIf(attr):
                    raise TypeError('private attribute change: ' + attr)
                else:
                    setattr(self.__wrapped, attr, value)
        return onInstance
    return onDecorator

def Private(*attributes):
    return accessControl(failIf=(lambda attr: attr in attributes))

def Public(*attributes):
    return accessControl(failIf=(lambda attr: attr not in attributes))
See the prior example’s self-test code for a usage example. Here’s a quick look at these
class decorators in action at the interactive prompt; they work the same in 2.X and 3.X
for attributes referenced by explicit name like those tested here. As advertised, non-
Private or Public names can be fetched and changed from outside the subject class,
but Private or non-Public names cannot:
>>> from access2 import Private, Public

>>> @Private('age')                  # Person = Private('age')(Person)
    class Person:                    # Person = onInstance with state
        def __init__(self, name, age):
            self.name = name
            self.age = age           # Inside accesses run normally

>>> X = Person('Bob', 40)
>>> X.name                           # Outside accesses validated
'Bob'
>>> X.name = 'Sue'
>>> X.name
'Sue'
>>> X.age
TypeError: private attribute fetch: age
>>> X.age = 'Tom'
TypeError: private attribute change: age

>>> @Public('name')
    class Person:
        def __init__(self, name, age):
            self.name = name
            self.age = age

>>> X = Person('bob', 40)            # X is an onInstance
>>> X.name                           # onInstance embeds Person
'bob'
>>> X.name = 'Sue'
>>> X.name
'Sue'
>>> X.age
TypeError: private attribute fetch: age
>>> X.age = 'Tom'
TypeError: private attribute change: age
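The asymmetry for undeclared names described earlier shows up here as well. Continuing
the same session, brand-new attributes can be created from outside the class under
Private, but not under Public:
>>> X.extra = 'new'                  # Undeclared name: blocked under Public
TypeError: private attribute change: extra

>>> @Private('age')
    class Person:
        def __init__(self, name, age):
            self.name = name
            self.age = age

>>> X = Person('Bob', 40)
>>> X.extra = 'new'                  # Undeclared name: allowed under Private
>>> X.extra
'new'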
Implementation Details II
To help you analyze the code, here are a few final notes on this version. Since this is
just a generalization of the preceding section’s version, the implementation notes there
apply here as well.
Using __X pseudoprivate names
Besides generalizing, this version also makes use of Python’s __X pseudoprivate name
mangling feature (which we met in Chapter 31) to localize the wrapped attribute to the
proxy control class, by automatically prefixing it with this class’s name. This avoids
the prior version’s risk for collisions with a wrapped attribute that may be used by the
real, wrapped class, and it’s useful in a general tool like this. It’s not quite “privacy,”
though, because the mangled version of the name can be used freely outside the class.
Notice that we also have to use the fully expanded name string—'_onInstance__wrapped'—
as a test value in __setattr__, because that's what Python changes it to.
Breaking privacy
Although this example does implement access controls for attributes of an instance and
its classes, it is possible to subvert these controls in various ways—for instance, by
going through the expanded version of the wrapped attribute explicitly (bob.pay might
not work, but the fully mangled bob._onInstance__wrapped.pay could!). If you have to
explicitly try to do so, though, these controls are probably sufficient for normal in-
tended use. Of course, privacy controls can generally be subverted in other languages
if you try hard enough (#define private public may work in some C++ implementa-
tions, too). Although access controls can reduce accidental changes, much of this is up
to programmers in any language; whenever source code may be changed, airtight access
control will always be a bit of a pipe dream.
Decorator tradeoffs
We could again achieve the same results without decorators, by using manager func-
tions or coding the name rebinding of decorators manually; the decorator syntax, how-
ever, makes this consistent and a bit more obvious in the code. The chief potential
downsides of this and any other wrapper-based approach are that attribute access in-
curs an extra call, and instances of decorated classes are not really instances of the
original decorated class—if you test their type with X.__class__ or isinstance(X, C),
for example, you’ll find that they are instances of the wrapper class. Unless you plan
to do introspection on objects’ types, though, the type issue is probably irrelevant, and
the extra call may apply mostly to development time; as we’ll see later, there are ways
to remove decorations automatically if desired.
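To make the type difference concrete, here is a quick interactive check against the
access2.py module listed earlier:
>>> from access2 import Private
>>> @Private('age')
    class Person:
        def __init__(self, name, age):
            self.name, self.age = name, age

>>> bob = Person('Bob', 40)
>>> bob.__class__.__name__           # The proxy's class, not the original Person
'onInstance'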
Open Issues
As is, this example works as planned under both Python 2.X and 3.X for methods called
explicitly by name. As with most software, though, there is always room for improve-
ment. Most notably, this tool turns in mixed performance on operator overloading
methods if they are used by client classes.
As coded, the proxy class is a classic class when run under 2.X, but a new-style class
when run by 3.X. As such, the code supports any client class in 2.X, but in 3.X fails to
validate or delegate operator overloading methods dispatched implicitly by built-in
operations, unless they are redefined in the proxy. Clients that do not use operator
overloading are fully supported, but others may require additional code in 3.X.
Importantly, this is not a new-style class issue here; it's a Python version issue—the
same code runs differently and fails in 3.X only. Because the nature of the wrapped
object’s class is irrelevant to the proxy, we are concerned only with the proxy’s own
code, which works under 2.X but not 3.X.
We’ve met this issue a few times already in this book, but let’s take a quick look at its
impact on the very realistic code we’ve written here, and explore a workaround to it.
Caveat: Implicitly run operator overloading methods fail to delegate under 3.X
Like all delegation-based classes that use __getattr__, this decorator works cross-ver-
sion for normally named or explicitly called attributes only. When run implicitly by
built-in operations, operator overloading methods like __str__ and __add__ work dif-
ferently for new-style classes. Because this code is interpreted as a new-style class in
3.X only, such operations fail to reach an embedded object that defines them when run
under this Python line as currently coded.
As we learned in the prior chapter, built-in operations look for operator overloading
names in instances for classic classes, but not for new-style classes—for the latter, they
skip the instance entirely and begin the search for such methods in classes (technically,
in the namespace dictionaries of all classes in the instance’s tree). Hence, the __X__
operator overloading methods implicitly run for built-in operations do not trigger either
__getattr__ or __getattribute__ in new-style classes; because such attribute fetches
skip our onInstance class’s __getattr__ altogether, they cannot be validated or dele-
gated.
Our decorator’s class is not coded as explicitly new-style (by deriving from object), so
it will catch operator overloading methods if run under 2.X as a default classic class.
In 3.X, though, because all classes are new-style automatically (and by mandate), such
methods will fail if they are implemented by the embedded object—because they are
not caught by the proxy, they won’t be passed on.
The most direct workaround in 3.X is to redefine redundantly in onInstance all the
operator overloading methods that can possibly be used in wrapped objects. Such extra
methods can be added by hand, by tools that partly automate the task (e.g., with class
decorators or the metaclasses discussed in the next chapter), or by definition in reusable
superclasses. Though tedious—and code-intensive enough to largely omit here—we’ll
explore approaches to satisfying this 3.X-only requirement in a moment.
First, though, to see the difference for yourself, try applying the decorator to a class
that uses operator overloading methods under 2.X; validations work as before, and
both the __str__ method used by printing and the __add__ method run for + invoke the
decorator’s __getattr__ and hence wind up being validated and delegated to the subject
Person object correctly:
C:\code> c:\python27\python
>>> from access2 import Private
>>> @Private('age')
    class Person:
        def __init__(self):
            self.age = 42
        def __str__(self):
            return 'Person: ' + str(self.age)
        def __add__(self, yrs):
            self.age += yrs

>>> X = Person()
>>> X.age                            # Name validations fail correctly
TypeError: private attribute fetch: age
>>> print(X)                         # __getattr__ => runs Person.__str__
Person: 42
>>> X + 10                           # __getattr__ => runs Person.__add__
>>> print(X)                         # __getattr__ => runs Person.__str__
Person: 52
When the same code is run under Python 3.X, though, the implicitly invoked
__str__ and __add__ skip the decorator’s __getattr__ and look for definitions in or
above the decorator class itself; print winds up finding the default display inherited
from the class type (technically, from the implied object superclass in 3.X), and + gen-
erates an error because no default is inherited:
C:\code> c:\python33\python
>>> from access2 import Private
>>> @Private('age')
    class Person:
        def __init__(self):
            self.age = 42
        def __str__(self):
            return 'Person: ' + str(self.age)
        def __add__(self, yrs):
            self.age += yrs

>>> X = Person()                     # Name validations still work
>>> X.age                            # But 3.X fails to delegate built-ins!
TypeError: private attribute fetch: age
>>> print(X)
<access2.accessControl.<locals>.onDecorator.<locals>.onInstance object at ...etc>
>>> X + 10
TypeError: unsupported operand type(s) for +: 'onInstance' and 'int'
>>> print(X)
<access2.accessControl.<locals>.onDecorator.<locals>.onInstance object at ...etc>
Strangely, this occurs only for dispatch from built-in operations; explicit direct calls to
overload methods are routed to __getattr__, though clients using operator overloading
can’t be expected to do the same:
>>> X.__add__(10) # Though calls by name work normally
>>> X._onInstance__wrapped.age # Break privacy to view result...
52
In other words, this is a matter of built-in operations versus explicit calls; it has little to
do with the actual names of the methods involved. Just for built-in operations, Python
skips a step for 3.X’s new-style classes.
Using the alternative __getattribute__ method won’t help here—although it is defined
to catch every attribute reference (not just undefined names), it is also not run by built-
in operations. Python’s property feature, which we met in Chapter 38, won’t help di-
rectly here either; recall that properties are automatically run code associated with
specific attributes defined when a class is written, and are not designed to handle arbi-
trary attributes in wrapped objects.
Approaches to redefining operator overloading methods for 3.X
As mentioned earlier, the most straightforward solution under 3.X is to redundantly
redefine operator overloading names that may appear in embedded objects in delega-
tion-based classes like our decorator. This isn’t ideal because it creates some code re-
dundancy, especially compared to 2.X solutions. However, it isn’t an impossibly major
coding effort; can be automated to some extent with tools or superclasses; suffices to
make our decorator work in 3.X; and may allow operator overloading names to be
declared Private or Public too, assuming overloading methods trigger the failIf test
internally.
Inline definition. For instance, the following is an inline redefinition approach—add
method redefinitions to the proxy for every operator overloading method a wrapped
object may define itself, to catch and delegate. We're adding just four operation inter-
ceptors here to illustrate, but others are coded similarly:
def accessControl(failIf):
    def onDecorator(aClass):
        class onInstance:
            def __init__(self, *args, **kargs):
                self.__wrapped = aClass(*args, **kargs)

            # Intercept and delegate built-in operations specifically
            def __str__(self):
                return str(self.__wrapped)
            def __add__(self, other):
                return self.__wrapped + other            # Or getattr(x, '__add__')(y)
            def __getitem__(self, index):
                return self.__wrapped[index]             # If needed
            def __call__(self, *args, **kargs):
                return self.__wrapped(*args, **kargs)    # If needed
            # plus any others needed

            # Intercept and delegate by-name attribute access generically
            def __getattr__(self, attr): ...
            def __setattr__(self, attr, value): ...
        return onInstance
    return onDecorator
Mix-in superclasses. Alternatively, these methods can be inserted by a common superclass—
given that there are dozens of such methods, an external class may be better suited
to the task, especially if it is general enough to be used in any such interface proxy class.
Either of the following mix-in class schemes (among likely others) suffices to catch and
delegate built-in operations:
The first catches built-ins and forcibly reroutes down to the subclass __getattr__.
It requires that operator overloading names be public per the decorator's
specifications, but built-in operation calls will work the same as both explicit name
calls and 2.X's classic classes.
The second catches built-ins and reroutes to the wrapped object directly. It requires
access to and assumes a proxy attribute named _wrapped giving access to the em-
bedded object—which is less than ideal because it precludes wrapped objects from
using the same name and creates a subclass dependency, but better than using the
mangled and class-specific _onInstance__wrapped, and no worse than a similarly
named method.
Like the inline approach, both of these mix-ins also require one method per built-in
operation in general tools that proxy arbitrary objects’ interfaces. Notice how these
classes catch operation calls rather than operation attribute fetches, and thus must per-
form the actual operation by delegating a call or expression:
class BuiltinsMixin:
    def __add__(self, other):
        return self.__class__.__getattr__(self, '__add__')(other)
    def __str__(self):
        return self.__class__.__getattr__(self, '__str__')()
    def __getitem__(self, index):
        return self.__class__.__getattr__(self, '__getitem__')(index)
    def __call__(self, *args, **kargs):
        return self.__class__.__getattr__(self, '__call__')(*args, **kargs)
    # plus any others needed

def accessControl(failIf):
    def onDecorator(aClass):
        class onInstance(BuiltinsMixin):
            ...rest unchanged...
            def __getattr__(self, attr): ...
            def __setattr__(self, attr, value): ...
class BuiltinsMixin:
    def __add__(self, other):
        return self._wrapped + other           # Assume a _wrapped
    def __str__(self):                         # Bypass __getattr__
        return str(self._wrapped)
    def __getitem__(self, index):
        return self._wrapped[index]
    def __call__(self, *args, **kargs):
        return self._wrapped(*args, **kargs)
    # plus any others needed

def accessControl(failIf):
    def onDecorator(aClass):
        class onInstance(BuiltinsMixin):
            ...and use self._wrapped instead of self.__wrapped...
            def __getattr__(self, attr): ...
            def __setattr__(self, attr, value): ...
Either one of these superclass mix-ins will be extraneous code, but must be imple-
mented only once, and seem much more straightforward than the various metaclass-
or decorator-based tool approaches you’ll find online that populate each proxy class
with the requisite methods redundantly (see the class augmentation examples in Chap-
ter 40 for the principles behind such tools).
Coding variations: Routers, descriptors, automation. Naturally, both of the prior section's
mix-in superclasses might be improved with additional code changes we'll largely pass on
here, except for two variations worth noting briefly. First, compare the following mu-
tation of the first mix-in—which uses a simpler coding structure but will incur an extra
call per built-in operation, making it slower (though perhaps not significantly so in a
proxy context):
class BuiltinsMixin:
    def reroute(self, attr, *args, **kargs):
        return self.__class__.__getattr__(self, attr)(*args, **kargs)
    def __add__(self, other):
        return self.reroute('__add__', other)
    def __str__(self):
        return self.reroute('__str__')
    def __getitem__(self, index):
        return self.reroute('__getitem__', index)
    def __call__(self, *args, **kargs):
        return self.reroute('__call__', *args, **kargs)
    # plus any others needed
Second, all the preceding built-in mix-in classes code each operator overloading
method explicitly, and intercept the call issued for the operation. With an alternative
coding, we could instead generate methods from a list of names mechanically, and
intercept only the attribute fetch preceding the call by creating class-level descriptors
like those of the prior chapter—as in the following, which, like the second mix-in alternative, as-
sumes the proxied object is named _wrapped in the proxy instance itself:
class BuiltinsMixin:
    class ProxyDesc(object):                   # object for 2.X
        def __init__(self, attrname):
            self.attrname = attrname
        def __get__(self, instance, owner):
            return getattr(instance._wrapped, self.attrname)   # Assume a _wrapped

    builtins = ['add', 'str', 'getitem', 'call']               # Plus any others
    for attr in builtins:
        exec('__%s__ = ProxyDesc("__%s__")' % (attr, attr))
This coding may be the most concise, but also the most implicit and complex, and is
fairly tightly coupled with its subclasses by the shared name. The loop at the end of
this class is equivalent to the following, run in the mix-in class’s local scope—it creates
descriptors that respond to initial name lookups by fetching from the wrapped object
in __get__, rather than catching the later operation call itself:
__add__ = ProxyDesc("__add__")
__str__ = ProxyDesc("__str__")
...etc...
With such operator overloading methods added—either inline or by mix-in inheritance
—the prior Private example client that overloaded + and print with __str__ and
__add__ works correctly under 2.X and 3.X, as do subclasses that overload indexing
and calls. If you care to experiment further, see files access2_builtins*.py in the book
examples package for complete codings of these options; we’ll also employ that third
of the mix-in options in a solution to an end-of-chapter quiz.
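For example, with any of those extensions in place, the failing 3.X session shown
earlier behaves like its 2.X counterpart; the module name in this sketch assumes one
of the access2_builtins file variants just mentioned:
C:\code> c:\python33\python
>>> from access2_builtins import Private
>>> @Private('age')
    class Person:
        def __init__(self):
            self.age = 42
        def __str__(self):
            return 'Person: ' + str(self.age)

>>> X = Person()
>>> print(X)                         # Built-in dispatch now delegated in 3.X too
Person: 42
>>> X.age                            # And by-name validations still work
TypeError: private attribute fetch: age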
Should operator methods be validated?
Adding support for operator overloading methods is required of interface proxies in
general, to delegate calls correctly. In our specific privacy application, though, it also
raises some additional design choices. In particular, privacy of operator overloading
methods differs per implementation:
Because they invoke __getattr__, the rerouter mix-ins require either that all
__X__ names accessed be listed in Public decorations, or that Private be used in-
stead when operator overloading is present in clients. In classes that use overload-
ing heavily, Public may be impractical.
Because they bypass __getattr__ entirely, as coded here both the inline scheme
and self._wrapped mix-ins do not have these constraints, but they preclude built-
in operations from being made private, and cause built-in operation dispatch to
work asymmetrically from both explicit __X__ calls by-name and 2.X’s default
classic classes.
Python 2.X classic classes have the first bullet’s constraints, simply because all
__X__ names are routed through __getattr__ automatically.
Operator overloading names and protocols differ between 2.X and 3.X, making
truly cross-version decoration less than trivial (e.g., Public decorators may need to
list names from both lines).
We’ll leave final policy here a TBD, but some interface proxies might prefer to allow
__X__ operator names to always pass unchecked when delegated.
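For instance, that last policy might be coded as a small change to the failIf tests
passed to accessControl; this is a sketch only, not part of the book's files:
def exemptBuiltins(test):                      # Never block __X__ names
    return lambda attr: (not (attr.startswith('__') and attr.endswith('__'))
                         and test(attr))

def Private(*attributes):
    return accessControl(failIf=exemptBuiltins(lambda attr: attr in attributes))

def Public(*attributes):
    return accessControl(failIf=exemptBuiltins(lambda attr: attr not in attributes))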
In the general case, though, a substantial amount of extra code is required to accom-
modate 3.X’s new-style classes as delegation proxies—in principle, every operator
overloading method that is no longer dispatched as a normal instance attribute auto-
matically will need to be defined redundantly in a general tool class like this privacy
decorator. This is why this extension is omitted in our code: there are potentially more
than 50 such methods! Because all its classes are new-style, delegation-based code is
more difficult—though not necessarily impossible—in Python 3.X.
Implementation alternatives: __getattribute__ inserts, call stack inspection
Although redundantly defining operator overloading methods in wrappers is probably
the most straightforward workaround to the Python 3.X dilemma outlined in the prior
section, it's not necessarily the only one. We don't have space to explore this issue
much further here, so deeper investigation will have to be relegated to a suggested exer-
cise. Because one dead-end alternative illustrates class concepts well, though, it merits
a brief mention.
One downside of the privacy example is that instance objects are not truly instances of
the original class—they are instances of the wrapper instead. In some programs that
rely on type testing, this might matter. To support such cases, we might try to achieve
similar effects by inserting a __getattribute__ and a __setattr__ method into the orig-
inal class, to catch every attribute reference and assignment made on its instances. These
inserted methods would pass valid requests up to their superclass to avoid loops, using
the techniques we studied in the prior chapter. Here is the potential change to our class
decorator’s code:
# Method insertion: rest of access2.py code as before

def accessControl(failIf):
    def onDecorator(aClass):
        def getattributes(self, attr):
            trace('get:', attr)
            if failIf(attr):
                raise TypeError('private attribute fetch: ' + attr)
            else:
                return object.__getattribute__(self, attr)

        def setattributes(self, attr, value):
            trace('set:', attr)
            if failIf(attr):
                raise TypeError('private attribute change: ' + attr)
            else:
                return object.__setattr__(self, attr, value)

        aClass.__getattribute__ = getattributes
        aClass.__setattr__ = setattributes     # Insert accessors
        return aClass                          # Return original class
    return onDecorator
This alternative addresses the type-testing issue but suffers from others. For one thing,
this decorator can be used by new-style class clients only: because __getattribute__ is
a new-style-only tool (as is this __setattr__ coding), decorated classes in 2.X must use
new-style derivation, which may or may not be appropriate for their goals. In fact, the
set of classes supported is even further limited: inserting methods will break clients that
are already using a __setattr__ or __getattribute__ of their own.
Worse, this scheme does not address the built-in operation attributes issue described
in the prior section, because __getattribute__ is also not run in these contexts. In our
case, if Person had a __str__ it would be run by print operations, but only because it
was actually present in that class. As before, the __str__ attribute would not be routed
to the inserted __getattribute__ method generically—printing would bypass this
method altogether and call the class’s __str__ directly.
Although this is probably better than not supporting operator overloading methods in
a wrapped object at all (barring redefinition, at least), this scheme still cannot intercept
and validate __X__ methods, making it impossible for any of them to be private.
Whether operator overloading methods should be private is another matter, but this
structure precludes the possibility.
Much worse, because this nonwrapper approach works by adding a __getattribute__
and __setattr__ to the decorated class, it also intercepts attribute accesses made by the
class itself and validates them the same as accesses made from outside. In other words,
the class’s own method won’t be able to use its private names either! This is a show-
stopper for the insertion approach.
In fact, inserting these methods this way is functionally equivalent to inheriting them,
and implies the same constraints as our original Chapter 30 privacy code. To know
whether an attribute access originated inside or outside the class, our methods might
need to inspect frame objects on the Python call stack. This might ultimately yield a
solution—implementing private attributes as properties or descriptors that check the
stack and validate for outside accesses only, for example—but it would slow access
further, and is far too dark a magic for us to explore here. (Descriptors seem to make
all things possible, even when they shouldn’t!)
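For the curious, the gist of the stack-inspection idea looks like the following sketch,
meant as a drop-in for the inserted setattributes above. It relies on the
CPython-specific sys._getframe, and its inside/outside test is a crude heuristic that
is easily fooled, which is part of why we don't pursue it here:
import sys

def setattributes(self, attr, value):
    caller = sys._getframe(1)                  # Frame of the accessing code
    if caller.f_locals.get('self') is self:    # Rough test: inside a method?
        object.__setattr__(self, attr, value)  # Inside access: run normally
    elif failIf(attr):                         # failIf from enclosing scope
        raise TypeError('private attribute change: ' + attr)
    else:
        object.__setattr__(self, attr, value)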
While interesting, and possibly relevant for some other use cases, this method insertion
technique doesn’t meet our goals. We won’t explore this option’s coding pattern fur-
ther here because we will study class augmentation techniques in the next chapter, in
conjunction with metaclasses. As we’ll see there, metaclasses are not strictly required
for changing classes this way, because class decorators can often serve the same role.
Python Isn’t About Control
Now that I’ve gone to such great lengths to implement Private and Public attribute
declarations for Python code, I must again remind you that it is not entirely Pythonic
to add access controls to your classes like this. In fact, most Python programmers will
probably find this example to be largely or totally irrelevant, apart from serving as a
demonstration of decorators in action. Most large Python programs get by successfully
without any such controls at all.
That said, you might find this tool useful in limited scopes during development. If you
do wish to regulate attribute access in order to eliminate coding mistakes, or happen
to be a soon-to-be-ex-C++-or-Java programmer, most things are possible with Python’s
operator overloading and introspection tools.
Example: Validating Function Arguments
As a final example of the utility of decorators, this section develops a function decora-
tor that automatically tests whether arguments passed to a function or method are
within a valid numeric range. It’s designed to be used during either development or
production, and it can be used as a template for similar tasks (e.g., argument type
testing, if you must). Because this chapter’s size limits have been broached, this exam-
ple’s code is largely self-study material, with limited narrative; as usual, browse the
code for more details.
The Goal
In the object-oriented tutorial of Chapter 28, we wrote a class that gave a pay raise to
objects representing people based upon a passed-in percentage:
class Person:
    ...
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))
There, we noted that if we wanted the code to be robust it would be a good idea to
check the percentage to make sure it’s not too large or too small. We could implement
such a check with either if or assert statements in the method itself, using inline tests:
class Person:
    def giveRaise(self, percent):              # Validate with inline code
        if percent < 0.0 or percent > 1.0:
            raise TypeError('percent invalid')
        self.pay = int(self.pay * (1 + percent))

class Person:                                  # Validate with asserts
    def giveRaise(self, percent):
        assert percent >= 0.0 and percent <= 1.0, 'percent invalid'
        self.pay = int(self.pay * (1 + percent))
However, this approach clutters up the method with inline tests that will probably be
useful only during development. For more complex cases, this can become tedious
(imagine trying to inline the code needed to implement the attribute privacy provided
by the last section’s decorator). Perhaps worse, if the validation logic ever needs to
change, there may be arbitrarily many inline copies to find and update.
A more useful and interesting alternative would be to develop a general tool that can
perform range tests for us automatically, for the arguments of any function or method
we might code now or in the future. A decorator approach makes this explicit and
convenient:
class Person:
    @rangetest(percent=(0.0, 1.0))             # Use decorator to validate
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))
Isolating validation logic in a decorator simplifies both clients and future maintenance.
Notice that our goal here is different than the attribute validations coded in the prior
chapter’s final example. Here, we mean to validate the values of function arguments
when passed, rather than attribute values when set. Python’s decorator and introspec-
tion tools allow us to code this new task just as easily.
A Basic Range-Testing Decorator for Positional Arguments
Let’s start with a basic range test implementation. To keep things simple, we’ll begin
by coding a decorator that works only for positional arguments and assumes they al-
ways appear at the same position in every call; they cannot be passed by keyword name,
and we don’t support additional **args keywords in calls because this can invalidate
the positions declared in the decorator. Code the following in a file called rangetest1.py:
def rangetest(*argchecks):                     # Validate positional arg ranges
    def onDecorator(func):
        if not __debug__:                      # True if "python -O main.py args..."
            return func                        # No-op: call original directly
        else:                                  # Else wrapper while debugging
            def onCall(*args):
                for (ix, low, high) in argchecks:
                    if args[ix] < low or args[ix] > high:
                        errmsg = 'Argument %s not in %s..%s' % (ix, low, high)
                        raise TypeError(errmsg)
                return func(*args)
            return onCall
    return onDecorator
As is, this code is mostly a rehash of the coding patterns we explored earlier: we use
decorator arguments, nested scopes for state retention, and so on.
We also use nested def statements to ensure that this works for both simple functions
and methods, as we learned earlier. When used for a class’s method, onCall receives the
subject class’s instance in the first item in *args and passes this along to self in the
original method function; argument numbers in range tests start at 1 in this case, not 0.
New here, notice this code's use of the __debug__ built-in variable—Python sets this to
True, unless it's being run with the -O optimize command-line flag (e.g., python -O
main.py). When __debug__ is False, the decorator returns the original function un-
changed, to avoid extra later calls and their associated performance penalty. In other
words, the decorator automatically removes its augmentation logic when -O is used,
without requiring you to physically remove the decoration lines in your code.
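You can verify the flag's effect directly at a shell prompt:
c:\code> python -c "print(__debug__)"
True
c:\code> python -O -c "print(__debug__)"
False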
This first iteration solution is used as follows:
# File rangetest1_test.py
from __future__ import print_function # 2.X
from rangetest1 import rangetest
print(__debug__) # False if "python -O main.py"
@rangetest((1, 0, 120)) # persinfo = rangetest(...)(persinfo)
def persinfo(name, age): # age must be in 0..120
print('%s is %s years old' % (name, age))
@rangetest([0, 1, 12], [1, 1, 31], [2, 0, 2009])
def birthday(M, D, Y):
print('birthday = {0}/{1}/{2}'.format(M, D, Y))
class Person:
def __init__(self, name, job, pay):
self.job = job
self.pay = pay
@rangetest([1, 0.0, 1.0]) # giveRaise = rangetest(...)(giveRaise)
def giveRaise(self, percent): # Arg 0 is the self instance here
self.pay = int(self.pay * (1 + percent))
# Comment lines raise TypeError unless "python -O" used on shell command line
persinfo('Bob Smith', 45) # Really runs onCall(...) with state
#persinfo('Bob Smith', 200) # Or person if -O cmd line argument
birthday(5, 31, 1963)
#birthday(5, 32, 1963)
sue = Person('Sue Jones', 'dev', 100000)
sue.giveRaise(.10) # Really runs onCall(self, .10)
print(sue.pay) # Or giveRaise(self, .10) if -O
#sue.giveRaise(1.10)
#print(sue.pay)
When run, valid calls in this code produce the following output (all the code in this
section works the same under Python 2.X and 3.X, because function decorators are
supported in both, we’re not using attribute delegation, and we use version-neutral
exception construction and printing techniques):
C:\code> python rangetest1_test.py
True
Bob Smith is 45 years old
birthday = 5/31/1963
110000
Uncommenting any of the invalid calls causes a TypeError to be raised by the decorator.
Here’s the result when the last two lines are allowed to run (as usual, I’ve omitted some
of the error message text here to save space):
C:\code> python rangetest1_test.py
True
Bob Smith is 45 years old
birthday = 5/31/1963
110000
TypeError: Argument 1 not in 0.0..1.0
Running Python with its -O flag at a system command line will disable range testing,
but also avoid the performance overhead of the wrapping layer—we wind up calling
the original undecorated function directly. Assuming this is a debugging tool only, you
can use this flag to optimize your program for production use:
C:\code> python -O rangetest1_test.py
False
Bob Smith is 45 years old
birthday = 5/31/1963
110000
231000
Generalizing for Keywords and Defaults, Too
The prior version illustrates the basics we need to employ, but it's fairly limited—it supports validating arguments passed by position only, and it does not validate keyword arguments (in fact, it assumes that no keywords are passed that would throw off its argument position numbers). Additionally, it does nothing about arguments with defaults that may be omitted in a given call. That's fine if all your arguments are passed by position and never defaulted, but less than ideal in a general tool. Python supports much more flexible argument-passing modes, which we're not yet addressing.
The mutation of our example shown next does better. By matching the wrapped function's expected arguments against the actual arguments passed in a call, it supports range validations for arguments passed by either position or keyword name, and it skips testing for default arguments omitted in the call. In short, arguments to be validated are specified by keyword arguments to the decorator, which later steps through both the *pargs positionals tuple and the **kargs keywords dictionary to validate.
"""
File rangetest.py: function decorator that performs range-test
validation for arguments passed to any function or method.
Arguments are specified by keyword to the decorator. In the actual
call, arguments may be passed by position or keyword, and defaults
may be omitted. See rangetest_test.py for example use cases.
"""
trace = True

def rangetest(**argchecks):                 # Validate ranges for both+defaults
    def onDecorator(func):                  # onCall remembers func and argchecks
        if not __debug__:                   # True if "python -O main.py args..."
            return func                     # Wrap if debugging; else use original
        else:
            code     = func.__code__
            allargs  = code.co_varnames[:code.co_argcount]
            funcname = func.__name__
            def onCall(*pargs, **kargs):
                # All pargs match first N expected args by position
                # The rest must be in kargs or be omitted defaults
                expected    = list(allargs)
                positionals = expected[:len(pargs)]

                for (argname, (low, high)) in argchecks.items():
                    # For all args to be checked
                    if argname in kargs:
                        # Was passed by name
                        if kargs[argname] < low or kargs[argname] > high:
                            errmsg = '{0} argument "{1}" not in {2}..{3}'
                            errmsg = errmsg.format(funcname, argname, low, high)
                            raise TypeError(errmsg)
                    elif argname in positionals:
                        # Was passed by position
                        position = positionals.index(argname)
                        if pargs[position] < low or pargs[position] > high:
                            errmsg = '{0} argument "{1}" not in {2}..{3}'
                            errmsg = errmsg.format(funcname, argname, low, high)
                            raise TypeError(errmsg)
                    else:
                        # Assume not passed: default
                        if trace:
                            print('Argument "{0}" defaulted'.format(argname))
                return func(*pargs, **kargs)        # OK: run original call
            return onCall
    return onDecorator
The following test script shows how the decorator is used—arguments to be validated are given by keyword decorator arguments, and at actual calls we may pass by name or position, and may omit arguments with defaults even when they are otherwise slated for validation:
"""
File rangetest_test.py (3.X + 2.X)
Comment lines raise TypeError unless "python -O" used on shell command line
"""
from __future__ import print_function # 2.X
from rangetest import rangetest
# Test functions, positional and keyword
@rangetest(age=(0, 120)) # persinfo = rangetest(...)(persinfo)
def persinfo(name, age):
print('%s is %s years old' % (name, age))
@rangetest(M=(1, 12), D=(1, 31), Y=(0, 2013))
def birthday(M, D, Y):
print('birthday = {0}/{1}/{2}'.format(M, D, Y))
persinfo('Bob', 40)
persinfo(age=40, name='Bob')
birthday(5, D=1, Y=1963)
#persinfo('Bob', 150)
#persinfo(age=150, name='Bob')
#birthday(5, D=40, Y=1963)

# Test methods, positional and keyword

class Person:
    def __init__(self, name, job, pay):
        self.job = job
        self.pay = pay

    # giveRaise = rangetest(...)(giveRaise)
    @rangetest(percent=(0.0, 1.0))          # percent passed by name or position
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))

bob = Person('Bob Smith', 'dev', 100000)
sue = Person('Sue Jones', 'dev', 100000)
bob.giveRaise(.10)
sue.giveRaise(percent=.20)
print(bob.pay, sue.pay)
#bob.giveRaise(1.10)
#bob.giveRaise(percent=1.20)

# Test omitted defaults: skipped

@rangetest(a=(1, 10), b=(1, 10), c=(1, 10), d=(1, 10))
def omitargs(a, b=7, c=8, d=9):
    print(a, b, c, d)

omitargs(1, 2, 3, 4)
omitargs(1, 2, 3)
omitargs(1, 2, 3, d=4)
omitargs(1, d=4)
omitargs(d=4, a=1)
omitargs(1, b=2, d=4)
omitargs(d=8, c=7, a=1)
#omitargs(1, 2, 3, 11)          # Bad d
#omitargs(1, 2, 11)             # Bad c
#omitargs(1, 2, 3, d=11)        # Bad d
#omitargs(11, d=4)              # Bad a
#omitargs(d=4, a=11)            # Bad a
#omitargs(1, b=11, d=4)         # Bad b
#omitargs(d=8, c=7, a=11)       # Bad a
When this script is run, out-of-range arguments raise an exception as before, but arguments may be passed by either name or position, and omitted defaults are not validated. This code runs on both 2.X and 3.X. Trace its output and test this further on your own to experiment; it works as before, but its scope has been broadened:
C:\code> python rangetest_test.py
Bob is 40 years old
Bob is 40 years old
birthday = 5/1/1963
110000 120000
1 2 3 4
Argument "d" defaulted
1 2 3 9
1 2 3 4
Argument "c" defaulted
Argument "b" defaulted
1 7 8 4
Argument "c" defaulted
Argument "b" defaulted
1 7 8 4
Argument "c" defaulted
1 2 8 4
Argument "b" defaulted
1 7 7 8
On validation errors, we get an exception as before when one of the method test lines
is uncommented, unless the -O command-line argument is passed to Python to disable
the decorator’s logic:
TypeError: giveRaise argument "percent" not in 0.0..1.0
Implementation Details
This decorator's code relies on both introspection APIs and subtle constraints of argument passing. To be fully general we could in principle try to mimic Python's argument matching logic in its entirety to see which names have been passed in which modes, but that's far too much complexity for our tool. It would be better if we could somehow match arguments passed by name against the set of all expected arguments' names, in order to determine the position each argument actually occupies in a given call.
Function introspection
It turns out that the introspection API available on function objects and their associated code objects has exactly the tool we need. This API was briefly introduced in Chapter 19, but we'll actually put it to use here. The set of expected argument names is simply the first N variable names attached to a function's code object:
# In Python 3.X (and 2.6+ for compatibility)
>>> def func(a, b, c, e=True, f=None):      # Args: three required, two defaults
        x = 1                               # Plus two more local variables
        y = 2

>>> code = func.__code__                    # Code object of function object
>>> code.co_nlocals
7
>>> code.co_varnames                        # All local variable names
('a', 'b', 'c', 'e', 'f', 'x', 'y')
>>> code.co_varnames[:code.co_argcount]     # <== First N locals are expected args
('a', 'b', 'c', 'e', 'f')
And as usual, starred-argument names in the call proxy allow it to collect arbitrarily
many arguments to be matched against the expected arguments so obtained from the
function’s introspection API:
>>> def catcher(*pargs, **kargs): print('%s, %s' % (pargs, kargs))
>>> catcher(1, 2, 3, 4, 5)
(1, 2, 3, 4, 5), {}
>>> catcher(1, 2, c=3, d=4, e=5) # Arguments at calls
(1, 2), {'d': 4, 'e': 5, 'c': 3}
The function object's API is available in older Pythons, but the func.__code__ attribute is named func.func_code in 2.5 and earlier; the newer __code__ attribute is also redundantly available in 2.6 and later for portability. Run a dir call on function and code objects for more details. Code like the following would support 2.5 and earlier, though the sys.version_info result itself is similarly nonportable—it's a named tuple in recent Pythons, but we can use offsets on newer and older Pythons alike:
>>> import sys # For backward compatibility
>>> tuple(sys.version_info) # [0] is major release number
(3, 3, 0, 'final', 0)
>>> code = func.__code__ if sys.version_info[0] == 3 else func.func_code
Argument assumptions
Given the decorated function’s set of expected argument names, the solution relies
upon two constraints on argument passing order imposed by Python (these still hold
true in both 2.X and 3.X current releases):
• At the call, all positional arguments appear before all keyword arguments.
• In the def, all nondefault arguments appear before all default arguments.
That is, a nonkeyword argument cannot generally follow a keyword argument at a call, and a nondefault argument cannot follow a default argument at a definition. All "name=value" syntax must appear after any simple "name" in both places. As we've also learned, Python matches argument values passed by position to argument names in function headers from left to right, such that these values always match the leftmost names in headers. Keywords match by name instead, and a given argument can receive only one value.
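These ordering rules are easy to verify interactively; the following is a minimal illustration (the exact SyntaxError message text varies by Python version):

def f(a, b=2): pass
f(1, b=3)                   # OK: all positionals precede all keywords
#f(b=3, 1)                  # SyntaxError: positional follows keyword
#def g(a=1, b): pass        # SyntaxError: nondefault follows default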
To simplify our work, we can also make the assumption that a call is valid in general—that is, that all arguments either will receive values (by name or position), or will be omitted intentionally to pick up defaults. This assumption won't necessarily hold, because the function has not yet actually been called when the wrapper logic tests validity—the call may still fail later when invoked by the wrapper layer, due to incorrect argument passing. As long as that doesn't cause the wrapper to fail any more badly, though, we can finesse the validity of the call. This helps, because validating calls before they are actually made would require us to emulate Python's argument-matching algorithm in full—again, too complex a procedure for our tool.
Matching algorithm
Now, given these constraints and assumptions, we can allow for both keywords and
omitted default arguments in the call with this algorithm. When a call is intercepted,
we can make the following assumptions and deductions:
1. Let N be the number of passed positional arguments, obtained from the length of
the *pargs tuple.
2. All N positional arguments in *pargs must match the first N expected arguments
obtained from the function’s code object. This is true per Python’s call ordering
rules, outlined earlier, since all positionals precede all keywords in a call.
3. To obtain the names of arguments actually passed by position, we can slice the list
of all expected arguments up to the length N of the *pargs passed positionals tuple.
4. Any arguments after the first N expected arguments either were passed by keyword
or were defaulted by omission at the call.
5. For each argument name to be validated by the decorator:
a. If the name is in **kargs, it was passed by name—indexing **kargs gives its
passed value.
b. If the name is in the first N expected arguments, it was passed by position—
its relative position in the expected list gives its relative position in *pargs.
c. Otherwise, we can assume it was omitted in the call and defaulted, and need
not be checked.
In other words, we can skip tests for arguments that were omitted in a call by assuming that the first N actually passed positional arguments in *pargs must match the first N argument names in the list of all expected arguments, and that any others must either have been passed by keyword and thus be in **kargs, or have been defaulted. Under this scheme, the decorator will simply skip any argument to be checked that was omitted between the rightmost positional argument and the leftmost keyword argument; between keyword arguments; or after the rightmost positional in general. Trace through the decorator and its test script to see how this is realized in code.
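As a quick worked example of these deductions, consider a hypothetical function and call (the names here are illustrative only, not part of the decorator's code):

def f(a, b=2, c=3): pass                # Expected args: a, b, c
# Call: f(1, c=9)  =>  pargs = (1,), kargs = {'c': 9}, N = 1
expected    = ['a', 'b', 'c']           # From f.__code__ introspection
positionals = expected[:1]              # ['a'] was passed by position
# 'c' is in kargs        => passed by name: test kargs['c']
# 'a' is in positionals  => passed by position 0: test pargs[0]
# 'b' is in neither      => omitted and defaulted: skip its test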
Open Issues
Although our range-testing tool works as planned, three caveats remain—it doesn't detect invalid calls, doesn't handle some arbitrary-argument signatures, and doesn't fully support nesting. Improvements may require extension or altogether different approaches. Here's a quick rundown of the issues.
Invalid calls
First, as mentioned earlier, calls to the original function that are not valid still fail in
our final decorator. The following both trigger exceptions, for example:
omitargs()
omitargs(d=8, c=7, b=6)
These only fail, though, where we try to invoke the original function, at the end of the
wrapper. While we could try to imitate Python’s argument matching to avoid this,
there’s not much reason to do so—since the call would fail at this point anyhow, we
might as well let Python’s own argument-matching logic detect the problem for us.
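For instance, the first of these fails inside the wrapper roughly as follows under recent 3.X releases; the exact message text varies by Python version, and the decorator's "defaulted" trace lines are omitted here:

>>> omitargs()
TypeError: omitargs() missing 1 required positional argument: 'a'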
Arbitrary arguments
Second, although our final version handles positional arguments, keyword arguments, and omitted defaults, it still doesn't do anything explicit about *pargs and **kargs starred-argument names that may be used in a decorated function that accepts arbitrarily many arguments itself. We probably don't need to care for our purposes, though:
• If an extra keyword argument is passed, its name will show up in **kargs and can be tested normally if mentioned to the decorator.
• If an extra keyword argument is not passed, its name won't be in either **kargs or the sliced expected positionals list, and it will thus not be checked—it is treated as though it were defaulted, even though it is really an optional extra argument.
• If an extra positional argument is passed, there's no way to reference it in the decorator anyhow—its name won't be in either **kargs or the sliced expected arguments list, so it will simply be skipped. Because such arguments are not listed in the function's definition, there's no way to map a name given to the decorator back to an expected relative position.
In other words, as it is, the code supports testing arbitrary keyword arguments by name, but not arbitrary positionals that are unnamed and hence have no set position in the function's argument signature. In terms of the function object's API, here's the effect of these starred names in decorated functions:
>>> def func(*pargs, **kargs): pass

>>> code = func.__code__
>>> code.co_nlocals, code.co_varnames
(2, ('pargs', 'kargs'))
>>> code.co_argcount, code.co_varnames[:code.co_argcount]
(0, ())

>>> def func(a, b, *pargs, **kargs): pass

>>> code = func.__code__
>>> code.co_argcount, code.co_varnames[:code.co_argcount]
(2, ('a', 'b'))
Because starred-argument names show up as locals but not as expected arguments, they won't be a factor in our matching algorithm—names preceding them in function headers can be validated as usual, but not any extra positional arguments passed. In principle, we could extend the decorator's interface to support *pargs in the decorated function, too, for the rare cases where this might be useful (e.g., a special argument
name with a test to apply to all arguments in the wrapper's *pargs beyond the length of the expected arguments list), but we'll pass on such an extension here.
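Still, to suggest what such an extension might look like, here is a standalone sketch that applies a single range test to every extra positional collected by a *more argument; extrarange is a hypothetical name, and this is not part of this chapter's decorator:

def extrarange(low, high):
    def onDecorator(func):
        nargs = func.__code__.co_argcount       # Named args only
        def onCall(*pargs, **kargs):
            for arg in pargs[nargs:]:           # Extras beyond the named args
                if arg < low or arg > high:
                    raise TypeError('%s extra arg not in %s..%s' %
                                    (func.__name__, low, high))
            return func(*pargs, **kargs)
        return onCall
    return onDecorator

@extrarange(0, 10)
def adder(label, *more):        # *more collects extra positionals
    print(label, sum(more))

adder('sum:', 1, 2, 3)          # OK
#adder('sum:', 1, 99)           # TypeError: adder extra arg not in 0..10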
Decorator nesting
Finally, and perhaps most subtly, this code does not fully support decorator nesting as a way to combine validation steps. Because it analyzes argument names in function definitions, and the call proxy function returned by a nested decoration has argument names that correspond to neither the original function's nor the decorator's arguments, outer nesting levels cannot see the names they need to validate.
Technically, when nested, only the most deeply nested appearance’s validations are
run in full; all other nesting levels run tests on arguments passed by keyword only.
Trace the code to see why; because the onCall proxy’s call signature expects no named
positional arguments, any to-be-validated arguments passed to it by position are treated
as if they were omitted and hence defaulted, and are thus skipped.
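You can see this for yourself by nesting the rangetest decorator above on itself; a minimal sketch, assuming rangetest.py is importable (with its trace switch on, the skipped outer test also prints a "defaulted" message):

from rangetest import rangetest

@rangetest(X=(1, 10))               # Outer: introspects the inner proxy
@rangetest(Y=(1, 10))               # Inner: introspects nester itself
def nester(X, Y):
    return (X, Y)

print(nester.__code__.co_argcount)  # 0: nester is now onCall(*pargs, **kargs)
print(nester(0, 2))                 # (0, 2): outer X test skipped for positionals
#nester(X=0, Y=2)                   # TypeError: keywords still reach outer test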
This may be inherent in this tool's approach—proxies change the argument name signatures at their levels, making it impossible to directly map names in decorator arguments to positions in passed argument sequences. When proxies are present, argument names ultimately apply to keywords only; by contrast, the first-cut solution's argument positions may support proxies better, but do not fully support keywords.
In lieu of this nesting capability, we’ll generalize this decorator to support multiple
types of validations in a single decoration in an end-of-chapter quiz solution, which
also gives examples of the nesting limitation in action. Since we’ve already neared the
space allocation for this example, though, if you care about these or any other further
improvements, you’ve officially crossed over into the realm of suggested exercises.
Decorator Arguments Versus Function Annotations
Interestingly, the function annotation feature introduced in Python 3.X (3.0 and later)
could provide an alternative to the decorator arguments used by our example to specify
range tests. As we learned in Chapter 19, annotations allow us to associate expressions
with arguments and return values, by coding them in the def header line itself; Python
collects annotations in a dictionary and attaches it to the annotated function.
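For example, the following quick interactive session shows where the annotations wind up (dictionary display order can vary per Python):

>>> def func(a:(1, 5), b, c:(0.0, 1.0)): pass

>>> func.__annotations__
{'a': (1, 5), 'c': (0.0, 1.0)}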
We could use this in our example to code range limits in the header line, instead of in
decorator arguments. We would still need a function decorator to wrap the function
in order to intercept later calls, but we would essentially trade decorator argument
syntax:
@rangetest(a=(1, 5), c=(0.0, 1.0))
def func(a, b, c):                      # func = rangetest(...)(func)
    print(a + b + c)
for annotation syntax like this:
@rangetest
def func(a:(1, 5), b, c:(0.0, 1.0)):
    print(a + b + c)
That is, the range constraints would be moved into the function itself, instead of being coded externally. The following script illustrates the structure of the resulting decorators under both schemes, in incomplete skeleton code for brevity. The decorator arguments code pattern is that of our complete solution shown earlier; the annotation alternative requires one less level of nesting, because it doesn't need to retain decorator arguments as state:
# Using decorator arguments (3.X + 2.X)

def rangetest(**argchecks):
    def onDecorator(func):
        def onCall(*pargs, **kargs):
            print(argchecks)
            for check in argchecks:
                pass                    # Add validation code here
            return func(*pargs, **kargs)
        return onCall
    return onDecorator

@rangetest(a=(1, 5), c=(0.0, 1.0))
def func(a, b, c):                      # func = rangetest(...)(func)
    print(a + b + c)

func(1, 2, c=3)                         # Runs onCall, argchecks in scope
# Using function annotations (3.X only)

def rangetest(func):
    def onCall(*pargs, **kargs):
        argchecks = func.__annotations__
        print(argchecks)
        for check in argchecks:
            pass                        # Add validation code here
        return func(*pargs, **kargs)
    return onCall

@rangetest
def func(a:(1, 5), b, c:(0.0, 1.0)):    # func = rangetest(func)
    print(a + b + c)

func(1, 2, c=3)                         # Runs onCall, annotations on func
When run, both schemes have access to the same validation test information, but in different forms—the decorator argument version's information is retained in an argument in an enclosing scope, and the annotation version's information is retained in an attribute of the function itself. In 3.X only, due to the use of function annotations:
C:\code> py −3 decoargs-vs-annotation.py
{'a': (1, 5), 'c': (0.0, 1.0)}
6
{'a': (1, 5), 'c': (0.0, 1.0)}
6
I'll leave fleshing out the rest of the annotation-based version as a suggested exercise; its code would be identical to that of our complete solution shown earlier, because range-test information is simply on the function instead of in an enclosing scope. Really, all this buys us is a different user interface for our tool—it will still need to match argument names against expected argument names to obtain relative positions as before.
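If you'd like a head start on that exercise, here is one possible fleshed-out sketch, reusing the matching logic of rangetest.py; it's 3.X only, omits the __debug__ test for brevity, and its names are assumptions, not code from the book's examples package:

def rangetest(func):                            # No decorator arguments needed
    code      = func.__code__
    allargs   = code.co_varnames[:code.co_argcount]
    funcname  = func.__name__
    argchecks = func.__annotations__            # Tests live on the function
    def onCall(*pargs, **kargs):
        positionals = list(allargs)[:len(pargs)]
        for (argname, (low, high)) in argchecks.items():
            if argname in kargs:                         # Passed by name
                arg = kargs[argname]
            elif argname in positionals:                 # Passed by position
                arg = pargs[positionals.index(argname)]
            else:
                continue                                 # Omitted: defaulted
            if arg < low or arg > high:
                errmsg = '{0} argument "{1}" not in {2}..{3}'
                raise TypeError(errmsg.format(funcname, argname, low, high))
        return func(*pargs, **kargs)
    return onCall

@rangetest
def persinfo(name, age:(0, 120)):               # Annotate only checked args
    print('%s is %s years old' % (name, age))

persinfo('Bob', 40)                             # OK
#persinfo('Bob', 150)                           # TypeError: age not in 0..120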
In fact, using annotation instead of decorator arguments in this example actually limits its utility. For one thing, annotation only works under Python 3.X, so 2.X is no longer supported; function decorators with arguments, on the other hand, work in both versions.
More importantly, by moving the validation specifications into the def header, we essentially commit the function to a single role—since annotation allows us to code only one expression per argument, it can have only one purpose. For instance, we cannot use range-test annotations for any other role.
By contrast, because decorator arguments are coded outside the function itself, they are both easier to remove and more general—the code of the function itself does not imply a single decoration purpose. Crucially, by nesting decorators with arguments, we can apply multiple augmentation steps to the same function; annotation directly supports only one. With decorator arguments, the function itself also retains a simpler, normal appearance.
Still, if you have a single purpose in mind, and you can commit to supporting 3.X only, the choice between annotation and decorator arguments is largely stylistic and subjective. As is so often true in life, one person's decoration or annotation may well be another's syntactic clutter!
Other Applications: Type Testing (If You Insist!)
The coding pattern we've arrived at for processing arguments in decorators could be applied in other contexts. Checking argument data types at development time, for example, is a straightforward extension:
def typetest(**argchecks):
    def onDecorator(func):
        ...
        def onCall(*pargs, **kargs):
            positionals = list(allargs)[:len(pargs)]
            for (argname, type) in argchecks.items():
                if argname in kargs:
                    if not isinstance(kargs[argname], type):
                        ...
                        raise TypeError(errmsg)
                elif argname in positionals:
                    position = positionals.index(argname)
                    if not isinstance(pargs[position], type):
                        ...
                        raise TypeError(errmsg)
                else:
                    ...                         # Assume not passed: default
            return func(*pargs, **kargs)
        return onCall
    return onDecorator

@typetest(a=int, c=float)
def func(a, b, c, d):                           # func = typetest(...)(func)
    ...

func(1, 2, 3.0, 4)                              # OK
func('spam', 2, 99, 4)                          # Triggers exception correctly
Using function annotations instead of decorator arguments for such a decorator, as
described in the prior section, would make this look even more like type declarations
in other languages:
@typetest
def func(a: int, b, c: float, d):               # func = typetest(func)
    ...                                         # Gasp!...
But we’re getting dangerously close to triggering a “flag on the play” here. As you should
have learned in this book, this particular role is generally a bad idea in working code,
and, much like private declarations, is not at all Pythonic (and is often a symptom of
an ex-C++ programmer’s first attempts to use Python).
Type testing restricts your function to work on specific types only, instead of allowing it to operate on any types with compatible interfaces. In effect, it limits your code and breaks its flexibility. On the other hand, every rule has exceptions; type checking may come in handy in isolated cases while debugging and when interfacing with code written in more restrictive languages, such as C++.
Still, this general pattern of argument processing might also be applicable in a variety of less controversial roles. We might even generalize further by passing in a test function, much as we did to add Public decorations earlier; a single copy of this sort of code would then suffice for both range and type testing, and perhaps other similar goals. In fact, we will generalize this way in the end-of-chapter quiz coming up, so we'll leave this extension as a cliffhanger here.
Chapter Summary
In this chapter, we explored decorators—both the function and class varieties. As we
learned, decorators are a way to insert code to be run automatically when a function
or class is defined. When a decorator is used, Python rebinds a function or class name
to the callable object it returns. This hook allows us to manage functions and classes
themselves, or later calls to them—by adding a layer of wrapper logic to catch later
calls, we can augment both function calls and instance interfaces. As we also saw,
manager functions and manual name rebinding can achieve the same effect, but decorators provide a more explicit and uniform solution.
As we also learned, class decorators can be used to manage classes themselves, rather than just their instances. Because this functionality overlaps with metaclasses—the topic of the next and final technical chapter—you'll have to read ahead for the conclusion to this story, and that of this book at large. First, though, let's work through the following quiz. Because this chapter was mostly focused on its examples, its quiz will ask you to modify some of its code in order to review. You can find the original versions' code in the book's examples package (see the preface for access pointers). If you're pressed for time, study the modifications listed in the answers instead—programming is as much about reading code as writing it.
Test Your Knowledge: Quiz
1. Method decorators: As mentioned in one of this chapter's notes, the timerdeco2.py module's timer function decorator with decorator arguments that we wrote in the section "Adding Decorator Arguments" on page 1298 can be applied only to simple functions, because it uses a nested class with a __call__ operator overloading method to catch calls. This structure does not work for a class's methods because the decorator instance is passed to self, not the subject class instance. Rewrite this decorator so that it can be applied to both simple functions and methods in classes, and test it on both functions and methods. (Hint: see the section "Class Blunders I: Decorating Methods" on page 1289 for pointers.) Note that you will probably need to use function object attributes to keep track of total time, since you won't have a nested class for state retention and can't access nonlocals from outside the decorator code. As an added bonus, this makes your decorator usable on both Python 3.X and 2.X.
2. Class decorators: The Public/Private class decorators we wrote in module access2.py in this chapter's first case study example will add performance costs to every attribute fetch in a decorated class. Although we could simply delete the @ decoration line to gain speed, we could also augment the decorator itself to check the __debug__ switch and perform no wrapping at all when the –O Python flag is passed on the command line—just as we did for the argument range-test decorators. That way, we can speed our program without changing its source, via command-line arguments (python –O main.py...). While we're at it, we could also use one of the mix-in superclass techniques we studied to catch a few built-in operations in Python 3.X too. Code and test these two extensions.
3. Generalized argument validations: The function and method decorator we wrote
in rangetest.py checks that passed arguments are in a valid range, but we also saw
that the same pattern could apply to similar goals such as argument type testing,
and possibly more. Generalize the range tester so that its single code base can be
used for multiple argument validations. Passed-in functions may be the simplest
solution given the coding structure here, though in more OOP-based contexts,
subclasses that provide expected methods can often provide similar generalization
routes as well.
Test Your Knowledge: Answers
1. Here's one way to code the first question's solution, and its output (though some methods may run too fast to register reported time). The trick lies in replacing nested classes with nested functions, so the self argument is not the decorator's instance, and assigning the total time to the decorator function itself so it can be fetched later through the original rebound name (see the section "State Information Retention Options" of this chapter for details—functions support arbitrary attribute attachment, and the function name is an enclosing scope reference in this context). If you wish to expand this further, it might be useful to also record the best (minimum) call time in addition to the total time, as we did in Chapter 21's timer examples.
"""
File timerdeco.py (3.X + 2.X)
Call timer decorator for both functions and methods.
"""
import time
def timer(label='', trace=True): # On decorator args: retain args
def onDecorator(func): # On @: retain decorated func
def onCall(*args, **kargs): # On calls: call original
start = time.clock() # State is scopes + func attr
result = func(*args, **kargs)
elapsed = time.clock() - start
onCall.alltime += elapsed
if trace:
format = '%s%s: %.5f, %.5f'
values = (label, func.__name__, elapsed, onCall.alltime)
print(format % values)
return result
onCall.alltime = 0
return onCall
return onDecorator
I’ve coded tests in a separate file here to allow the decorator to be easily reused:
"""
File timerdeco-test.py
"""
from __future__ import print_function # 2.X
from timerdeco import timer
import sys
force = list if sys.version_info[0] == 3 else (lambda X: X)
print('---------------------------------------------------')
# Test on functions
@timer(trace=True, label='[CCC]==>')
def listcomp(N):                                # Like listcomp = timer(...)(listcomp)
    return [x * 2 for x in range(N)]            # listcomp(...) triggers onCall

@timer('[MMM]==>')
def mapcall(N):
    return force(map((lambda x: x * 2), range(N)))    # list() for 3.X views

for func in (listcomp, mapcall):
    result = func(5)                            # Time for this call, all calls, return value
    func(5000000)
    print(result)
    print('allTime = %s\n' % func.alltime)      # Total time for all calls

print('---------------------------------------------------')
# Test on methods

class Person:
    def __init__(self, name, pay):
        self.name = name
        self.pay  = pay

    @timer()
    def giveRaise(self, percent):               # giveRaise = timer()(giveRaise)
        self.pay *= (1.0 + percent)             # tracer remembers giveRaise

    @timer(label='**')
    def lastName(self):                         # lastName = timer(...)(lastName)
        return self.name.split()[-1]            # alltime per class, not instance

bob = Person('Bob Smith', 50000)
sue = Person('Sue Jones', 100000)
bob.giveRaise(.10)
sue.giveRaise(.20)                              # runs onCall(sue, .20)
print(int(bob.pay), int(sue.pay))
print(bob.lastName(), sue.lastName())           # runs onCall(bob), remembers lastName
print('%.5f %.5f' % (Person.giveRaise.alltime, Person.lastName.alltime))
If all goes according to plan, you’ll see the following output in both Python 3.X
and 2.X, albeit with timing results that will vary per Python and machine:
c:\code> py −3 timerdeco-test.py
---------------------------------------------------
[CCC]==>listcomp: 0.00001, 0.00001
[CCC]==>listcomp: 0.57930, 0.57930
[0, 2, 4, 6, 8]
allTime = 0.5793010457092784
[MMM]==>mapcall: 0.00002, 0.00002
[MMM]==>mapcall: 1.08609, 1.08611
[0, 2, 4, 6, 8]
allTime = 1.0861149923442373
---------------------------------------------------
giveRaise: 0.00001, 0.00001
giveRaise: 0.00000, 0.00001
55000 120000
**lastName: 0.00001, 0.00001
**lastName: 0.00000, 0.00001
Smith Jones
0.00001 0.00001
2. The following three files satisfy the second question. The first gives the decorator—it's been augmented to return the original class in optimized mode (–O), so attribute accesses don't incur a speed hit. Mostly, it just adds the debug mode test statements and indents the class further to the right:
"""
File access.py (3.X + 2.X)
Class decorator with Private and Public attribute declarations.
Controls external access to attributes stored on an instance, or
inherited by it from its classes in any fashion.
Private declares attribute names that cannot be fetched or assigned
outside the decorated class, and Public declares all the names that can.
Caveats: in 3.X catches built-ins coded in BuiltinMixins only (expand me);
as coded, Public may be less useful than Private for operator overloading.
"""
from access_builtins import BuiltinsMixin       # A partial set!

traceMe = False

def trace(*args):
    if traceMe: print('[' + ' '.join(map(str, args)) + ']')

def accessControl(failIf):
    def onDecorator(aClass):
        if not __debug__:
            return aClass
        else:
            class onInstance(BuiltinsMixin):
                def __init__(self, *args, **kargs):
                    self.__wrapped = aClass(*args, **kargs)

                def __getattr__(self, attr):
                    trace('get:', attr)
                    if failIf(attr):
                        raise TypeError('private attribute fetch: ' + attr)
                    else:
                        return getattr(self.__wrapped, attr)

                def __setattr__(self, attr, value):
                    trace('set:', attr, value)
                    if attr == '_onInstance__wrapped':
                        self.__dict__[attr] = value
                    elif failIf(attr):
                        raise TypeError('private attribute change: ' + attr)
                    else:
                        setattr(self.__wrapped, attr, value)
            return onInstance
    return onDecorator

def Private(*attributes):
    return accessControl(failIf=(lambda attr: attr in attributes))

def Public(*attributes):
    return accessControl(failIf=(lambda attr: attr not in attributes))
I've also used one of our mix-in techniques to add some operator overloading method redefinitions to the wrapper class, so that in 3.X it correctly delegates built-in operations to subject classes that use these methods. As coded, the proxy is a default classic class in 2.X that routes these through __getattr__ already, but in 3.X is a new-style class that does not. The mix-in used here requires listing such methods in Public decorators; see earlier for alternatives that do not (but that also do not allow built-ins to be made private), and expand this class as needed:
"""
File access_builtins.py (from access2_builtins2b.py)
Route some built-in operations back to proxy class __getattr__, so they
work the same in 3.X as direct by-name calls and 2.X's default classic classes.
Expand me as needed to include other __X__ names used by proxied objects.
"""
class BuiltinsMixin:
    def reroute(self, attr, *args, **kargs):
        return self.__class__.__getattr__(self, attr)(*args, **kargs)

    def __add__(self, other):
        return self.reroute('__add__', other)

    def __str__(self):
        return self.reroute('__str__')

    def __getitem__(self, index):
        return self.reroute('__getitem__', index)

    def __call__(self, *args, **kargs):
        return self.reroute('__call__', *args, **kargs)

    # Plus any others used by wrapped objects in 3.X only
Here too I split the self-test code off to a separate file, so the decorator could be imported elsewhere without triggering the tests, and without requiring a __name__ test and indenting:
"""
File: access-test.py
Test code: separate file to allow decorator reuse.
"""
import sys
from access import Private, Public
print('---------------------------------------------------------')
# Test 1: names are public if not private
@Private('age') # Person = Private('age')(Person)
class Person: # Person = onInstance with state
    def __init__(self, name, age):
        self.name = name
        self.age  = age                 # Inside accesses run normally
    def __add__(self, N):
        self.age += N                   # Built-ins caught by mix-in in 3.X
    def __str__(self):
        return '%s: %s' % (self.name, self.age)

X = Person('Bob', 40)
print(X.name)                           # Outside accesses validated
X.name = 'Sue'
print(X.name)
X + 10
print(X)

try: t = X.age                          # FAILS unless "python -O"
except: print(sys.exc_info()[1])
try: X.age = 999                        # ditto
except: print(sys.exc_info()[1])

print('---------------------------------------------------------')
# Test 2: names are private if not public
# Operators must be non-Private or Public in BuiltinMixin used

@Public('name', '__add__', '__str__', '__coerce__')
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age  = age
    def __add__(self, N):
        self.age += N                   # Built-ins caught by mix-in in 3.X
    def __str__(self):
        return '%s: %s' % (self.name, self.age)

X = Person('bob', 40)                   # X is an onInstance
print(X.name)                           # onInstance embeds Person
X.name = 'sue'
print(X.name)
X + 10
print(X)

try: t = X.age                          # FAILS unless "python -O"
except: print(sys.exc_info()[1])
try: X.age = 999                        # ditto
except: print(sys.exc_info()[1])
Finally, if all works as expected, this test’s output is as follows in both Python 3.X
and 2.X—the same code applied to the same class decorated with Private and then
with Public:
c:\code> py −3 access-test.py
---------------------------------------------------------
Bob
Sue
Sue: 50
private attribute fetch: age
private attribute change: age
---------------------------------------------------------
bob
sue
sue: 50
private attribute fetch: age
private attribute change: age
c:\code> py −3 -O access-test.py # Suppresses the four access error messages
3. Here's a generalized argument validator for you to study on your own. It uses a passed-in validation function, to which it passes the test's criteria value coded for the argument in the decorator. This handles ranges, type tests, value testers, and almost anything else you can dream up in an expressive language like Python. I've also refactored the code a bit to remove some redundancy, and automated test failure processing. See this module's self-test for usage examples and expected output. Per this example's caveats described earlier, this decorator doesn't fully work in nested mode as is—only the most deeply nested validation is run for positional arguments—but its arbitrary valuetest can be used to combine differing types of tests in a single decoration (though the amount of code needed in this mode may negate much of its benefits over a simple assert!).
"""
File argtest.py: (3.X + 2.X) function decorator that performs
arbitrary passed-in validations for arguments passed to any
function method. Range and type tests are two example uses;
valuetest handles more arbitrary tests on an argument's value.
Arguments are specified by keyword to the decorator. In the actual
call, arguments may be passed by position or keyword, and defaults
may be omitted. See self-test code below for example use cases.
Caveats: doesn't fully support nesting because call proxy args
differ; doesn't validate extra args passed to a decoratee's *args;
and may be no easier than an assert except for canned use cases.
"""
trace = False

def rangetest(**argchecks):
    return argtest(argchecks, lambda arg, vals: arg < vals[0] or arg > vals[1])

def typetest(**argchecks):
    return argtest(argchecks, lambda arg, type: not isinstance(arg, type))

def valuetest(**argchecks):
    return argtest(argchecks, lambda arg, tester: not tester(arg))

def argtest(argchecks, failif):         # Validate args per failif + criteria
    def onDecorator(func):              # onCall retains func, argchecks, failif
        if not __debug__:               # No-op if "python -O main.py args..."
            return func
        else:
            code = func.__code__
            expected = list(code.co_varnames[:code.co_argcount])

            def onError(argname, criteria):
                errfmt = '%s argument "%s" not %s'
                raise TypeError(errfmt % (func.__name__, argname, criteria))

            def onCall(*pargs, **kargs):
                positionals = expected[:len(pargs)]
                for (argname, criteria) in argchecks.items():   # For all to test
                    if argname in kargs:                        # Passed by name
                        if failif(kargs[argname], criteria):
                            onError(argname, criteria)
                    elif argname in positionals:                # Passed by posit
                        position = positionals.index(argname)
                        if failif(pargs[position], criteria):
                            onError(argname, criteria)
                    else:                                       # Not passed-dflt
                        if trace:
                            print('Argument "%s" defaulted' % argname)
                return func(*pargs, **kargs)                    # OK: run original call
            return onCall
    return onDecorator
if __name__ == '__main__':
    import sys
    def fails(test):
        try:    result = test()
        except: print('[%s]' % sys.exc_info()[1])
        else:   print('?%s?' % result)

    print('--------------------------------------------------------------------')
    # Canned use cases: ranges, types

    @rangetest(m=(1, 12), d=(1, 31), y=(1900, 2013))
    def date(m, d, y):
        print('date = %s/%s/%s' % (m, d, y))

    date(1, 2, 1960)
    fails(lambda: date(1, 2, 3))

    @typetest(a=int, c=float)
    def sum(a, b, c, d):
        print(a + b + c + d)

    sum(1, 2, 3.0, 4)
    sum(1, d=4, b=2, c=3.0)
    fails(lambda: sum('spam', 2, 99, 4))
    fails(lambda: sum(1, d=4, b=2, c=99))

    print('--------------------------------------------------------------------')
    # Arbitrary/mixed tests
    @valuetest(word1=str.islower, word2=(lambda x: x[0].isupper()))
    def msg(word1='mighty', word2='Larch', label='The'):
        print('%s %s %s' % (label, word1, word2))

    msg()                               # word1 and word2 defaulted
    msg('majestic', 'Moose')
    fails(lambda: msg('Giant', 'Redwood'))
    fails(lambda: msg('great', word2='elm'))

    print('--------------------------------------------------------------------')
    # Manual type and range tests

    @valuetest(A=lambda x: isinstance(x, int), B=lambda x: x > 0 and x < 10)
    def manual(A, B):
        print(A + B)

    manual(100, 2)
    fails(lambda: manual(1.99, 2))
    fails(lambda: manual(100, 20))

    print('--------------------------------------------------------------------')
    # Nesting: runs both, by nesting proxies on original.
    # Open issue: outer levels do not validate positionals due
    # to call proxy function's differing argument signature;
    # when trace=True, in all but the last of these "X" is
    # classified as defaulted due to the proxy's signature.

    @rangetest(X=(1, 10))
    @typetest(Z=str)                    # Only innermost validates positional args
    def nester(X, Y, Z):
        return('%s-%s-%s' % (X, Y, Z))

    print(nester(1, 2, 'spam'))                 # Original function runs properly
    fails(lambda: nester(1, 2, 3))              # Nested typetest is run: positional
    fails(lambda: nester(1, 2, Z=3))            # Nested typetest is run: keyword
    fails(lambda: nester(0, 2, 'spam'))         # <== Outer rangetest not run: posit.
    fails(lambda: nester(X=0, Y=2, Z='spam'))   # Outer rangetest is run: keyword
This module's self-test output in both 3.X and 2.X follows; some 2.X object displays vary slightly, and as usual, you should correlate the output with the source code for more insights:
c:\code> py −3 argtest.py
--------------------------------------------------------------------
date = 1/2/1960
[date argument "y" not (1900, 2013)]
10.0
10.0
[sum argument "a" not <class 'int'>]
[sum argument "c" not <class 'float'>]
--------------------------------------------------------------------
The mighty Larch
The majestic Moose
[msg argument "word1" not <method 'islower' of 'str' objects>]
[msg argument "word2" not <function <lambda> at 0x0000000002A096A8>]
--------------------------------------------------------------------
102
[manual argument "A" not <function <lambda> at 0x0000000002A09950>]
[manual argument "B" not <function <lambda> at 0x0000000002A09B70>]
--------------------------------------------------------------------
1-2-spam
[nester argument "Z" not <class 'str'>]
[nester argument "Z" not <class 'str'>]
?0-2-spam?
[onCall argument "X" not (1, 10)]
Finally, as we’ve learned, this decorator’s coding structure works for both functions
and methods:
# File argtest_testmeth.py
from argtest import rangetest, typetest

class C:
    @rangetest(a=(1, 10))
    def meth1(self, a):
        return a * 1000

    @typetest(a=int)
    def meth2(self, a):
        return a * 1000
>>> from argtest_testmeth import C
>>> X = C()
>>> X.meth1(5)
5000
>>> X.meth1(20)
TypeError: meth1 argument "a" not (1, 10)
>>> X.meth2(20)
20000
>>> X.meth2(20.9)
TypeError: meth2 argument "a" not <class 'int'>
CHAPTER 40
Metaclasses
In the prior chapter, we explored decorators and studied various examples of their use. In this final technical chapter of the book, we're going to continue our tool-builder's focus and investigate another advanced topic: metaclasses.
In a sense, metaclasses simply extend the code-insertion model of decorators. As we
learned in the prior chapter, function and class decorators allow us to intercept and
augment function calls and class instance creation calls. In a similar spirit, metaclasses
allow us to intercept and augment class creation—they provide an API for inserting
extra logic to be run at the conclusion of a class statement, albeit in different ways than
decorators. Accordingly, they provide a general protocol for managing class objects in
a program.
Like all the subjects dealt with in this part of the book, this is an advanced topic that
can be investigated on an as-needed basis. In practice, metaclasses allow us to gain a
high level of control over how a set of classes works. This is a powerful concept, and
metaclasses are not intended for most application programmers. Nor, frankly, is this a
topic for the faint of heart—some parts of this chapter may warrant extra focus (and
others might even owe attribution to Dr. Seuss!).
On the other hand, metaclasses open the door to a variety of coding patterns that may
be difficult or impossible to achieve otherwise, and they are especially of interest to
programmers seeking to write flexible APIs or programming tools for others to use.
Even if you don’t fall into that category, though, metaclasses can teach you much about
Python’s class model in general (as we’ll see, they even impact inheritance), and are
prerequisite to understanding code that employs them. Like other advanced tools,
metaclasses have begun appearing in Python programs more often than their creators
may have intended.
As in the prior chapter, part of our goal here is also to show more realistic code examples
than we did earlier in this book. Although metaclasses are a core language topic and
not themselves an application domain, part of this chapter’s agenda is to spark your
interest in exploring larger application-programming examples after you finish this
book.
Because this is the final technical chapter in this book, it also begins to wrap up some
threads concerning Python itself that we’ve met often along the way and will finalize
in the conclusion that follows. Where you go after this book is up to you, of course,
but in an open source project it’s important to keep the big picture in mind while
hacking the small details.
To Metaclass or Not to Metaclass
Metaclasses are perhaps the most advanced topic in this book, if not the Python lan-
guage as a whole. To borrow a quote from the comp.lang.python newsgroup by veteran
Python core developer Tim Peters (who is also the author of the famous “import this”
Python motto):
[Metaclasses] are deeper magic than 99% of users should ever worry about. If you wonder
whether you need them, you don’t (the people who actually need them know with cer-
tainty that they need them, and don’t need an explanation about why).
In other words, metaclasses are primarily intended for a subset of programmers building APIs and tools for others to use. In many (if not most) cases, they are probably not the best choice in applications work. This is especially true if you're developing code that other people will use in the future. Coding something "because it seems cool" is not generally a reasonable justification, unless you are experimenting or learning.
Still, metaclasses have a wide variety of potential roles, and it's important to know when they can be useful. For example, they can be used to enhance classes with features like tracing, object persistence, exception logging, and more. They can also be used to construct portions of a class at runtime based upon configuration files, apply function decorators to every method of a class generically, verify conformance to expected interfaces, and so on.
In their more grandiose incarnations, metaclasses can even be used to implement alternative coding patterns such as aspect-oriented programming, object/relational mappers (ORMs) for databases, and more. Although there are often alternative ways to achieve such results—as we'll see, the roles of class decorators and metaclasses often intersect—metaclasses provide a formal model tailored to those tasks. We don't have space to explore all such applications first-hand in this chapter, of course, but you should feel free to search the Web for additional use cases after studying the basics here.
Probably the reason for studying metaclasses most relevant to this book is that this topic can help demystify Python's class mechanics in general. For instance, we'll see that they are an intrinsic part of the language's new-style inheritance model finally formalized in full here. Although you may or may not code or reuse them in your work, a cursory understanding of metaclasses can impart a deeper understanding of Python at large.1

1. And to quote a Python 3.3 error message I just came across: "TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases" (!). This reflects an erroneous use of a module as a superclass, but metaclasses may not be as optional as developers imply—a theme we'll revisit in the next chapter's conclusion to this book.
Increasing Levels of “Magic”
Most of this book has focused on straightforward application-coding techniques—the
modules, functions, and classes that most programmers spend their time writing to
achieve real-world goals. The majority of Python’s users may use classes and make
instances, and might even do a bit of operator overloading, but they probably won’t
get too deep into the details of how their classes actually work.
However, in this book we’ve also seen a variety of tools that allow us to control Python’s
behavior in generic ways, and that often have more to do with Python internals or tool
building than with application-programming domains. As a review, and to help us place
metaclasses in the tools spectrum:
Introspection attributes and tools
Special attributes like __class__ and __dict__ allow us to inspect internal implementation aspects of Python objects, in order to process them generically—to list all attributes of an object, display a class's name, and so on. As we've also seen, tools such as dir and getattr can serve similar roles when "virtual" attributes such as slots must be supported.
Operator overloading methods
Specially named methods such as __str__ and __add__ coded in classes intercept
and provide behavior for built-in operations applied to class instances, such as
printing, expression operators, and so on. They are run automatically in response
to built-in operations and allow classes to conform to expected interfaces.
Attribute interception methods
A special category of operator overloading methods provides a way to intercept attribute accesses on instances generically: __getattr__, __setattr__, __delattr__, and __getattribute__ allow wrapper (a.k.a. proxy) classes to insert automatically run code that may validate attribute requests and delegate them to embedded objects. They allow any number of attributes of an object to be computed when accessed—either selected attributes, or all of them.
Class properties
The property built-in allows us to associate code with a specific class attribute that
is automatically run when the attribute is fetched, assigned, or deleted. Though
not as generic as the prior paragraph’s tools, properties allow for automatic code
invocation on access to specific attributes.
Class attribute descriptors
Really, property is a succinct way to define an attribute descriptor that runs functions on access automatically. Descriptors allow us to code in a separate class
__get__, __set__, and __delete__ handler methods that are run automatically when
an attribute assigned to an instance of that class is accessed. They provide a general
way to insert arbitrary code that is run implicitly when a specific attribute is ac-
cessed as part of the normal attribute lookup procedure.
Function and class decorators
As we saw in Chapter 39, the special @callable syntax for decorators allows us to
add logic to be automatically run when a function is called or a class instance is
created. This wrapper logic can trace or time calls, validate arguments, manage all
instances of a class, augment instances with extra behavior such as attribute fetch
validation, and more. Decorator syntax inserts name-rebinding logic to be run at
the end of function and class definition statements—decorated function and class
names may be rebound to either augmented original objects, or to object proxies
that intercept later calls.
Metaclasses
The last topic of magic introduced in Chapter 32, which we take up here.
As mentioned in this chapter’s introduction, metaclasses are a continuation of this story
—they allow us to insert logic to be run automatically at the end of a class statement,
when a class object is being created. Though strongly reminiscent of class decorators,
the metaclass mechanism doesn’t rebind the class name to a decorator callable’s result,
but rather routes creation of the class itself to specialized logic.
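In skeleton form, the hook looks like this; a minimal, runnable sketch using names of our own choosing, ahead of this chapter's fuller coverage:

class Meta(type):                           # Metaclasses subclass type
    def __new__(meta, classname, supers, classdict):
        print('Making class', classname)    # Runs as the class is created
        return type.__new__(meta, classname, supers, classdict)

class C(metaclass=Meta):                    # 3.X form: prints at def time
    data = 1

x = C()                                     # Instance creation not caught here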
A Language of Hooks
In other words, metaclasses are ultimately just another way to define automatically run
code. With the tools listed in the prior section, Python provides ways for us to interject
logic in a variety of contexts—at operator evaluation, attribute access, function calls,
class instance creation, and now class object creation. It’s a language with hooks galore
—a feature open to abuse like any other, but one that also offers the flexibility that
some programmers desire, and that some programs may require.
As we’ve also seen, many of these advanced Python tools have intersecting roles. For
example, attributes can often be managed with properties, descriptors, or attribute
interception methods. As we’ll see in this chapter, class decorators and metaclasses can
often be used interchangeably as well. By way of preview:
• Although class decorators are often used to manage instances, they can also be used to manage classes instead, much like metaclasses.
• Similarly, while metaclasses are designed to augment class construction, they can also insert proxies to manage instances instead, much like class decorators.
In fact, the main functional difference between these two tools is simply their place in
the timing of class creation. As we saw in the prior chapter, class decorators run after
the decorated class has already been created. Thus, they are often used to add logic to
be run at instance creation time. When they do provide behavior for a class, it is typically
through changes or proxies, instead of a more direct relationship.
As we’ll see here, metaclasses, by contrast, run during class creation to make and return
the new client class. Therefore, they are often used for managing or augmenting
classes themselves, and can even provide methods to process the classes that are created
from them, via a direct instance relationship.
For example, metaclasses can be used to add decoration to all methods of classes automatically, register all classes in use to an API, add user-interface logic to classes automatically, create or extend classes from simplified specifications in text files, and so on. Because they can control how classes are made—and by proxy the behavior their instances acquire—metaclass applicability is potentially very wide.
As we’ll also see here, though, these two tools are more similar than different in many
common roles. Since tool choices are sometimes partly subjective, knowledge of the
alternatives can help you pick the right tool for a given task. To understand the options
better, let’s see how metaclasses stack up.
The Downside of “Helper” Functions
Also like the decorators of the prior chapter, metaclasses are often optional from a
theoretical perspective. We can usually achieve the same effect by passing class objects
through manager functions—sometimes known as helper functions—much as we can
achieve the goals of decorators by passing functions and instances through manager
code. Just like decorators, though, metaclasses:
• Provide a more formal and explicit structure
• Help ensure that application programmers won't forget to augment their classes according to an API's requirements
• Avoid code redundancy and its associated maintenance costs by factoring class customization logic into a single location, the metaclass
To illustrate, suppose we want to automatically insert a method into a set of classes.
Of course, we could do this with simple inheritance, if the subject method is known
when we code the classes. In that case, we can simply code the method in a superclass
and have all the classes in question inherit from it:
class Extras:
    def extra(self, args):          # Normal inheritance: too static
        ...

class Client1(Extras): ...          # Clients inherit extra methods
class Client2(Extras): ...
class Client3(Extras): ...

X = Client1()                       # Make an instance
X.extra()                           # Run the extra methods
Sometimes, though, it's impossible to predict such augmentation when classes are coded. Consider the case where classes are augmented in response to choices made in a user interface at runtime, or to specifications typed in a configuration file. Although we could code every class in our imaginary set to manually check these, too, it's a lot to ask of clients (required is abstract here—it's something to be filled in):
def extra(self, arg): ...

class Client1: ...                      # Client augments: too distributed
if required():
    Client1.extra = extra

class Client2: ...
if required():
    Client2.extra = extra

class Client3: ...
if required():
    Client3.extra = extra

X = Client1()
X.extra()
We can add methods to a class after the class statement like this because a class-level
method is just a function that is associated with a class and has a first argument to
receive the self instance. Although this works, it might become untenable for larger
method sets, and puts all the burden of augmentation on client classes (and assumes
they’ll remember to do this at all!).
It would be better from a maintenance perspective to isolate the choice logic in a single
place. We might encapsulate some of this extra work by routing classes through a
manager function—such a manager function would extend the class as required and
handle all the work of runtime testing and configuration:
def extra(self, arg): ...

def extras(Class):                      # Manager function: too manual
    if required():
        Class.extra = extra

class Client1: ...
extras(Client1)

class Client2: ...
extras(Client2)

class Client3: ...
extras(Client3)

X = Client1()
X.extra()
This code runs the class through a manager function immediately after it is created. Although manager functions like this one can achieve our goal here, they still put a fairly heavy burden on class coders, who must understand the requirements and adhere to them in their code. It would be better if there were a simple way to enforce the augmentation in the subject classes, so that they don't need to deal with the augmentation so explicitly, and would be less likely to forget to use it altogether. In other words, we'd like to be able to insert some code to run automatically at the end of a class statement, to augment the class.
This is exactly what metaclasses do—by declaring a metaclass, we tell Python to route
the creation of the class object to another class we provide:
def extra(self, arg): ...

class Extras(type):
    def __init__(Class, classname, superclasses, attributedict):
        if required():
            Class.extra = extra

class Client1(metaclass=Extras): ...    # Metaclass declaration only (3.X form)
class Client2(metaclass=Extras): ...    # Client class is instance of meta
class Client3(metaclass=Extras): ...

X = Client1()                           # X is instance of Client1
X.extra()
Because Python invokes the metaclass automatically at the end of the class statement
when the new class is created, it can augment, register, or otherwise manage the class
as needed. Moreover, the only requirement for the client classes is that they declare the
metaclass; every class that does so will automatically acquire whatever augmentation
the metaclass provides, both now and in the future if the metaclass changes.
Of course, this is the standard rationale, which you’ll need to judge for yourself—in
truth, clients might forget to list a metaclass just as easily as they could forget to call a
manager function! Still, the explicit nature of metaclasses may make this less likely.
Moreover, metaclasses have additional potentials we haven’t yet seen. Although it may
be difficult to glean from this small example, metaclasses generally handle such tasks
better than more manual approaches.
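Because the snippets above leave required and extra abstract, here is one hedged, runnable rendering of the metaclass version, with required stubbed in to stand for a real runtime configuration test:

def required():
    return True                          # Imagine a config file or UI choice here

def extra(self, arg):
    return self.__class__.__name__ + ' got ' + repr(arg)

class Extras(type):
    def __init__(Class, classname, superclasses, attributedict):
        if required():
            Class.extra = extra          # Augment every client class

class Client1(metaclass=Extras): pass
class Client2(metaclass=Extras): pass

X = Client1()
print(X.extra(42))                       # Client1 got 42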
Metaclasses Versus Class Decorators: Round 1
Having said that, it’s also important to note that the class decorators described in the
preceding chapter sometimes overlap with metaclasses—in terms of both utility and
benefit. Although they are often used for managing instances, class decorators can also
augment classes, independent of any created instances. Their syntax makes their usage
similarly explicit, and arguably more obvious than manager function calls.
For example, suppose we coded our manager function to return the augmented class,
instead of simply modifying it in place. This would allow a greater degree of flexibility,
because the manager would be free to return any type of object that implements the
class’s expected interface:
def extra(self, arg): ...

def extras(Class):
    if required():
        Class.extra = extra
    return Class

class Client1: ...
Client1 = extras(Client1)

class Client2: ...
Client2 = extras(Client2)

class Client3: ...
Client3 = extras(Client3)

X = Client1()
X.extra()
If you think this is starting to look reminiscent of class decorators, you’re right. In the
prior chapter we emphasized class decorators’ role in augmenting instance creation
calls. Because they work by automatically rebinding a class name to the result of a
function, though, there’s no reason that we can’t use them to augment the class by
changing it before any instances are ever created. That is, class decorators can apply
extra logic to classes, not just instances, at class creation time:
def extra(self, arg): ...

def extras(Class):
    if required():
        Class.extra = extra
    return Class

@extras
class Client1: ...                      # Client1 = extras(Client1)

@extras
class Client2: ...                      # Rebinds class independent of instances

@extras
class Client3: ...

X = Client1()                           # Makes instance of augmented class
X.extra()                               # X is instance of original Client1
Decorators essentially automate the prior example’s manual name rebinding here. Just
as for metaclasses, because this decorator returns the original class, instances are made
from it, not from a wrapper object. In fact, instance creation is not intercepted at all in
this example.
In this specific case—adding methods to a class when it’s created—the choice between
metaclasses and decorators is somewhat arbitrary. Decorators can be used to manage
both instances and classes, and intersect most strongly with metaclasses in the second
of these roles, but this discrimination is not absolute. In fact, the roles of each are
determined in part by their mechanics.
As we’ll see ahead, decorators technically correspond to metaclass __init__ methods,
used to initialize newly created classes. Metaclasses have additional customization
hooks beyond class initialization, though, and may perform arbitrary class construction
tasks that might be more difficult with decorators. This can make them more complex,
but also better suited for augmenting classes as they are being formed.
For example, metaclasses also have a __new__ method used to create a class, which has no analogy in decorators; making a new class in a decorator would incur an extra step. Moreover, metaclasses may also provide behavior acquired by classes in the form of methods, which have no direct counterpart in decorators either; decorators must provide class behavior in less direct ways.
Conversely, because metaclasses are designed to manage classes, applying them to
managing instances alone is less optimal. Because they are also responsible for making
the class itself, metaclasses incur this as an extra step in instance management roles.
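To make the correspondence concrete, here is a minimal sketch in which a class decorator and a metaclass __init__ apply the same augmentation; the method function is invented for illustration:

def method(self):
    return 'augmented'

def decorate(cls):                       # Decorator: runs after cls is created
    cls.method = method
    return cls

class Meta(type):
    def __init__(Class, classname, supers, classdict):
        Class.method = method            # Metaclass __init__: same timing role

@decorate
class A: pass

class B(metaclass=Meta): pass

print(A().method(), B().method())        # augmented augmented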
We’ll explore these differences in code later in this chapter, and will flesh out this
section’s partial code into a real working example later in this chapter. To understand
how metaclasses do their work, though, we first need to get a clearer picture of their
underlying model.
There’s Magic, and Then There’s Magic
This chapter’s “Increasing Levels of Magic” list deals with types of magic beyond those
widely seen as beneficial by programmers. Some might add Python’s functional tools
like closures and generators, and even its basic OOP support, to this list—the former
relying on scope retention and automatic generator object creation, and the latter on
inheritance attribute search and a special first function argument. Though based on
magic too, these represent paradigms that ease the task of programming by providing
abstractions above and beyond the underlying hardware architecture.
For example, OOP—Python’s earlier paradigm—is broadly accepted in the software
world. It provides a model for writing programs that is more complete, explicit, and
richly structured than functional tools. That is, some levels of magic are considered
more warranted than others; after all, if it were not for some magic, programs would
still consist of machine code (or physical switches).
It’s usually the accumulation of new magic that puts systems at risk of breaching a
complexity threshold—such as adding a functional paradigm to what was always an
OO language, or adding redundant or advanced ways to achieve goals that are rarely
pursued in the common practice of most users. Such magic can set the entry bar far too
high for a large part of your tool’s audience.
Moreover, some magic is imposed on its users more than others. The translation step of a compiler, for instance, does not generally require its users to be compiler developers. By contrast, Python's super assumes full mastery and deployment of the arguably obscure and artificial MRO algorithm. The new-style inheritance algorithm presented in this chapter similarly assumes descriptors, metaclasses, and the MRO as its prerequisites—all advanced tools in their own right. Even implicit "hooks" like descriptors remain implicit only until their first failure or maintenance cycle. Such magic exposed escalates a tool's prerequisites and downgrades its usability.
In open source systems, only time and downloads can determine where such thresholds
may lie. Finding the proper balance of power and complexity depends as much on
shifting opinion as on technology. Subjective factors aside, though, new magic that
imposes itself on users inevitably skews a system’s learning curve higher—a topic we’ll
return to in the next chapter’s final words.
The Metaclass Model
To understand metaclasses, you first need to understand a bit more about Python’s
type model and what happens at the end of a class statement. As we’ll see here, the
two are intimately related.
Classes Are Instances of type
So far in this book, we’ve done most of our work by making instances of built-in types
like lists and strings, as well as instances of classes we code ourselves. As we’ve seen,
instances of classes have some state information attributes of their own, but they also
inherit behavioral attributes from the classes from which they are made. The same holds
true for built-in types; list instances, for example, have values of their own, but they
inherit methods from the list type.
While we can get a lot done with such instance objects, Python’s type model turns out
to be a bit richer than I’ve formally described. Really, there’s a hole in the model we’ve
seen thus far: if instances are created from classes, what is it that creates our classes? It
turns out that classes are instances of something, too:
- In Python 3.X, user-defined class objects are instances of the object named type, which is itself a class.
- In Python 2.X, new-style classes inherit from object, which is a subclass of type; classic classes are instances of type and are not created from a class.
We explored the notion of types in Chapter 9 and the relationship of classes to types in Chapter 32, but let's review the basics here so we can see how they apply to metaclasses.
Recall that the type built-in returns the type of any object (which is itself an object)
when called with a single argument. For built-in types like lists, the type of the instance
is the built-in list type, but the type of the list type is the type type itself—the type object
at the top of the hierarchy creates specific types, and specific types create instances.
You can see this for yourself at the interactive prompt. In Python 3.X, for example, the
type of a list instance is the list class, and the type of the list class is the type class:
C:\code> py −3 # In 3.X:
>>> type([]), type(type([])) # List instance is created from list class
(<class 'list'>, <class 'type'>) # List class is created from type class
>>> type(list), type(type) # Same, but with type names
(<class 'type'>, <class 'type'>) # Type of type is type: top of hierarchy
As we learned when studying new-style class changes in Chapter 32, the same is generally true in Python 2.X, but types are not quite the same as classes—type is a unique kind of built-in object that caps the type hierarchy and is used to construct types:
C:\code> py −2
>>> type([]), type(type([])) # In 2.X, type is a bit different
(<type 'list'>, <type 'type'>)
>>> type(list), type(type)
(<type 'type'>, <type 'type'>)
As it happens, the type/instance relationship holds true for user-defined classes as well:
instances are created from classes, and classes are created from type. In Python 3.X,
though, the notion of a “type” is merged with the notion of a “class.” In fact, the two
are essentially synonyms—classes are types, and types are classes. That is:
- Types are defined by classes that derive from type.
- User-defined classes are instances of type classes.
- User-defined classes are types that generate instances of their own.
As we saw earlier, this equivalence affects code that tests the type of instances: the type
of an instance is the class from which it was generated. It also has implications for the
way that classes are created that turn out to be the key to this chapter’s subject. Because
classes are normally created from a root type class by default, most programmers don’t
need to think about this type/class equivalence. However, it opens up new possibilities
for customizing both classes and their instances.
For example, all user-defined classes in 3.X (and new-style classes in 2.X) are instances
of the type class, and instance objects are instances of their classes; in fact, classes now
have a __class__ that links to type, just as an instance has a __class__ that links to the
class from which it was made:
C:\code> py −3
>>> class C: pass # 3.X class object (new-style)
>>> X = C() # Class instance object
>>> type(X) # Instance is instance of class
<class '__main__.C'>
>>> X.__class__ # Instance's class
<class '__main__.C'>
>>> type(C) # Class is instance of type
<class 'type'>
>>> C.__class__ # Class's class is type
<class 'type'>
Notice especially the last two lines here—classes are instances of the type class, just as
normal instances are instances of a user-defined class. This works the same for both
built-ins and user-defined class types in 3.X. In fact, classes are not really a separate
concept at all: they are simply user-defined types, and type itself is defined by a class.
In Python 2.X, things work similarly for new-style classes derived from object, because this enables 3.X class behavior (as we've seen, 3.X adds object to the __bases__ superclass tuple of top-level root classes automatically to qualify them as new-style):
C:\code> py −2
>>> class C(object): pass # In 2.X new-style classes,
>>> X = C() # classes have a class too
>>> type(X)
<class '__main__.C'>
>>> X.__class__
<class '__main__.C'>
>>> type(C)
<type 'type'>
>>> C.__class__
<type 'type'>
Classic classes in 2.X are a bit different, though—because they reflect the original class
model in older Pythons, they do not have a __class__ link, and like built-in types in
2.X they are instances of type, not a type class (I’ve shortened some of the hex addresses
in object displays in this chapter for clarity):
C:\code> py −2
>>> class C: pass # In 2.X classic classes,
>>> X = C() # classes have no class themselves
>>> type(X)
<type 'instance'>
>>> X.__class__
<class __main__.C at 0x005F85A0>
>>> type(C)
<type 'classobj'>
>>> C.__class__
AttributeError: class C has no attribute '__class__'
Metaclasses Are Subclasses of Type
Why would we care that classes are instances of a type class in 3.X? It turns out that
this is the hook that allows us to code metaclasses. Because the notion of type is the
same as class today, we can subclass type with normal object-oriented techniques and
class syntax to customize it. And because classes are really instances of the type class,
creating classes from customized subclasses of type allows us to implement custom
kinds of classes. In full detail, this all works out quite naturally—in 3.X, and in 2.X
new-style classes:
- type is a class that generates user-defined classes.
- Metaclasses are subclasses of the type class.
- Class objects are instances of the type class, or a subclass thereof.
- Instance objects are generated from a class.
In other words, to control the way classes are created and augment their behavior, all we need to do is specify that a user-defined class be created from a user-defined metaclass instead of the normal type class.
Notice that this type instance relationship is not quite the same as normal inheritance. User-defined classes may also have superclasses from which they and their instances inherit attributes as usual. As we've seen, inheritance superclasses are listed in parentheses in the class statement and show up in a class's __bases__ tuple. The type from which a class is created, though, and of which it is an instance, is a different relationship. Inheritance searches instance and class namespace dictionaries, but classes may also acquire behavior from their type that is not exposed to the normal inheritance search.
To lay the groundwork for understanding this distinction, the next section describes
the procedure Python follows to implement this instance-of type relationship.
Class Statement Protocol
Subclassing the type class to customize it is really only half of the magic behind metaclasses. We still need to somehow route a class's creation to the metaclass, instead of the default type. To fully understand how this is arranged, we also need to know how class statements do their business.
We’ve already learned that when Python reaches a class statement, it runs its nested
block of code to create its attributes—all the names assigned at the top level of the
nested code block generate attributes in the resulting class object. These names are
usually method functions created by nested defs, but they can also be arbitrary at-
tributes assigned to create class data shared by all instances.
Technically speaking, Python follows a standard protocol to make this happen: at the
end of a class statement, and after running all its nested code in a namespace dictionary
corresponding to the class’s local scope, Python calls the type object to create the
class object like this:
class = type(classname, superclasses, attributedict)
The type object in turn defines a __call__ operator overloading method that runs two
other methods when the type object is called:
type.__new__(typeclass, classname, superclasses, attributedict)
type.__init__(class, classname, superclasses, attributedict)
The __new__ method creates and returns the new class object, and then the __init__
method initializes the newly created object. As we’ll see in a moment, these are the
hooks that metaclass subclasses of type generally use to customize classes.
For example, given a class definition like the following for Spam:
class Eggs: ...                         # Inherited names here

class Spam(Eggs):                       # Inherits from Eggs
    data = 1                            # Class data attribute
    def meth(self, arg):                # Class method attribute
        return self.data + arg
Python will internally run the nested code block to create two attributes of the class
(data and meth), and then call the type object to generate the class object at the end of
the class statement:
Spam = type('Spam', (Eggs,), {'data': 1, 'meth': meth, '__module__': '__main__'})
In fact, you can call type this way yourself to create a class dynamically—albeit here
with a fabricated method function and empty superclasses tuple (Python adds object
automatically in both 3.X and 2.X):
>>> x = type('Spam', (), {'data': 1, 'meth': (lambda x, y: x.data + y)})
>>> i = x()
>>> x, i
(<class '__main__.Spam'>, <__main__.Spam object at 0x029E7780>)
>>> i.data, i.meth(2)
(1, 3)
The class produced is exactly like that you’d get from running a class statement:
>>> x.__bases__
(<class 'object'>,)
>>> [(a, v) for (a, v) in x.__dict__.items() if not a.startswith('__')]
[('data', 1), ('meth', <function <lambda> at 0x0297A158>)]
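Since type accepts any attribute dictionary, the mapping can in principle be derived from data. The following is a hedged sketch of the "classes from simplified specifications" idea previewed earlier; the one-line spec format and eval-based parsing are inventions used purely for illustration:

spec = 'data=1; double=lambda self: self.data * 2'    # Hypothetical spec format

attrs = {}
for pair in spec.split(';'):
    name, value = pair.split('=', 1)
    attrs[name.strip()] = eval(value)    # eval is unsafe outside a demo

Dynamic = type('Dynamic', (), attrs)     # Same machinery as a class statement
print(Dynamic.data, Dynamic().double())  # 1 2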
Because this type call is made automatically at the end of the class statement, though,
it’s an ideal hook for augmenting or otherwise processing a class. The trick lies in
replacing the default type with a custom subclass that will intercept this call. The next
section shows how.
Declaring Metaclasses
As we’ve just seen, classes are created by the type class by default. To tell Python to
create a class with a custom metaclass instead, you simply need to declare a metaclass
to intercept the normal instance creation call in a user-defined class. How you do so
depends on which Python version you are using.
Declaration in 3.X
In Python 3.X, list the desired metaclass as a keyword argument in the class header:
class Spam(metaclass=Meta): # 3.X version (only)
Inheritance superclasses can be listed in the header as well. In the following, for example, the new class Spam inherits from superclass Eggs, but is also an instance of and is created by metaclass Meta:
class Spam(Eggs, metaclass=Meta): # Normal supers OK: must list first
In this form, superclasses must be listed before the metaclass; in effect, the ordering
rules used for keyword arguments in function calls apply here.
Declaration in 2.X
We can get the same effect in Python 2.X, but we must specify the metaclass differently
—using a class attribute instead of a keyword argument:
class Spam(object):                     # 2.X version (only), object optional?
    __metaclass__ = Meta

class Spam(Eggs, object):               # Normal supers OK: object suggested
    __metaclass__ = Meta
Technically, some classes in 2.X do not have to derive from object explicitly to make
use of metaclasses. The generalized metaclass dispatch mechanism was added at the
same time as new-style classes, but is not itself bound to them. It does, however,
produce them—in the presence of a __metaclass__ declaration, 2.X makes the resulting
class new-style automatically, adding object to its __bases__ sequence. In the absence
of this declaration, 2.X simply uses the classic class creator as the metaclass default.
Because of this, some classes in 2.X require only the __metaclass__ attribute.
On the other hand, notice that metaclasses imply that your class will be new-style in 2.X even without an explicit object. They'll behave somewhat differently as outlined in Chapter 32, and as we'll see ahead, 2.X may require that they or their superclasses derive from object explicitly, because a new-style class cannot have only classic superclasses in this context. Given this, deriving from object doesn't hurt as a sort of warning about the class's nature, and may be required to avoid potential problems.
Also in 2.X, a module-level __metaclass__ global variable is available to link all classes
in the module to a metaclass. This is no longer supported in 3.X, as it was intended as
a temporary measure to make it easier to default to new-style classes without deriving
every class from object. Python 3.X also ignores the 2.X class attribute, and the 3.X
keyword form is a syntax error in 2.X, so there is no simple portability route. Apart
from differing syntax, though, metaclass declaration in 2.X and 3.X has the same effect,
which we turn to next.
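One hedged, partial workaround worth noting before moving on: because a declaration ultimately reduces to a call (covered next), invoking the metaclass manually uses neither declaration syntax, and so runs unchanged in both 2.X and 3.X, though it forgoes the class statement entirely:

class Meta(type):
    def __init__(Class, classname, supers, classdict):
        Class.tag = 'made by Meta'       # Illustrative augmentation only

class Eggs(object): pass

Spam = Meta('Spam', (Eggs,), {'data': 1})    # Same net effect as a declaration
print(Spam.tag)                              # made by Meta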
Metaclass Dispatch in Both 3.X and 2.X
When a specific metaclass is declared per the prior sections' syntax, the call to create the class object run at the end of the class statement is modified to invoke the metaclass instead of the type default:
class = Meta(classname, superclasses, attributedict)
And because the metaclass is a subclass of type, the type class’s __call__ delegates the
calls to create and initialize the new class object to the metaclass, if it defines custom
versions of these methods:
Meta.__new__(Meta, classname, superclasses, attributedict)
Meta.__init__(class, classname, superclasses, attributedict)
To demonstrate, here's the prior section's example again, augmented with a 3.X metaclass specification:
class Spam(Eggs, metaclass=Meta):       # Inherits from Eggs, instance of Meta
    data = 1                            # Class data attribute
    def meth(self, arg):                # Class method attribute
        return self.data + arg
At the end of this class statement, Python internally runs the following to create the
class object—again, a call you could make manually too, but automatically run by
Python’s class machinery:
Spam = Meta('Spam', (Eggs,), {'data': 1, 'meth': meth, '__module__': '__main__'})
If the metaclass defines its own versions of __new__ or __init__, they will be invoked
in turn during this call by the inherited type class’s __call__ method, to create and
initialize the new class. The net effect is to automatically run methods the metaclass
provides, as part of the class construction process. The next section shows how we
might go about coding this final piece of the metaclass puzzle.
This chapter uses Python 3.X metaclass keyword argument syntax, not the 2.X class attribute. 2.X readers will need to translate, but version neutrality is not straightforward here—3.X doesn't recognize the attribute and 2.X doesn't allow keyword syntax—and listing examples twice doesn't address portability (or chapter size!).
Coding Metaclasses
So far, we’ve seen how Python routes class creation calls to a metaclass, if one is speci-
fied and provided. How, though, do we actually code a metaclass that customizes type?
It turns out that you already know most of the story—metaclasses are coded with
normal Python class statements and semantics. By definition, they are simply classes
that inherit from type. Their only substantial distinctions are that Python calls them
1370 | Chapter 40:Metaclasses
www.it-ebooks.info
automatically at the end of a class statement, and that they must adhere to the inter-
face expected by the type superclass.
A Basic Metaclass
Perhaps the simplest metaclass you can code is simply a subclass of type with a __new__ method that creates the class object by running the default version in type. A metaclass __new__ like this is run by the __call__ method inherited from type; it typically performs whatever customization is required and calls the type superclass's __new__ method to create and return the new class object:
class Meta(type):
    def __new__(meta, classname, supers, classdict):
        # Run by inherited type.__call__
        return type.__new__(meta, classname, supers, classdict)
This metaclass doesn’t really do anything (we might as well let the default type class
create the class), but it demonstrates the way a metaclass taps into the metaclass hook
to customize—because the metaclass is called at the end of a class statement, and
because the type object’s __call__ dispatches to the __new__ and __init__ methods,
code we provide in these methods can manage all the classes created from the metaclass.
Here’s our example in action again, with prints added to the metaclass and the file at
large to trace (again, some filenames are implied by later command-lines in this chap-
ter):
class MetaOne(type):
    def __new__(meta, classname, supers, classdict):
        print('In MetaOne.new:', meta, classname, supers, classdict, sep='\n...')
        return type.__new__(meta, classname, supers, classdict)

class Eggs:
    pass

print('making class')
class Spam(Eggs, metaclass=MetaOne):    # Inherits from Eggs, instance of MetaOne
    data = 1                            # Class data attribute
    def meth(self, arg):                # Class method attribute
        return self.data + arg

print('making instance')
X = Spam()
print('data:', X.data, X.meth(2))
Here, Spam inherits from Eggs and is an instance of MetaOne, but X is an instance of and inherits from Spam. When this code is run with Python 3.X, notice how the metaclass is invoked at the end of the class statement, before we ever make an instance—metaclasses are for processing classes, and classes are for processing normal instances:
c:\code> py −3 metaclass1.py
making class
In MetaOne.new:
...<class '__main__.MetaOne'>
...Spam
...(<class '__main__.Eggs'>,)
...{'data': 1, 'meth': <function Spam.meth at 0x02A191E0>, '__module__': '__main__'}
making instance
data: 1 3
Presentation note: I'm truncating addresses and omitting some irrelevant built-in __X__ names in namespace dictionaries in this chapter for brevity, and as noted earlier am forgoing 2.X portability due to differing declaration syntax. To run in 2.X, use the class attribute form, and change print operations as desired. This example works in 2.X with the following modifications, in the file metaclass1-2x.py; notice that either Eggs or Spam must be derived from object explicitly, or else 2.X issues a warning because a new-style class can't have only classic bases here—when in doubt, use object in 2.X metaclass clients:
from __future__ import print_function    # To run the same in 2.X (only)

class Eggs(object):                      # One of the "object" optional

class Spam(Eggs, object):
    __metaclass__ = MetaOne
Customizing Construction and Initialization
Metaclasses can also tap into the __init__ protocol invoked by the type object's __call__. In general, __new__ creates and returns the class object, and __init__ initializes the already created class passed in as an argument. Metaclasses can use either or both hooks to manage the class at creation time:
class MetaTwo(type):
    def __new__(meta, classname, supers, classdict):
        print('In MetaTwo.new: ', classname, supers, classdict, sep='\n...')
        return type.__new__(meta, classname, supers, classdict)
    def __init__(Class, classname, supers, classdict):
        print('In MetaTwo.init:', classname, supers, classdict, sep='\n...')
        print('...init class object:', list(Class.__dict__.keys()))

class Eggs:
    pass

print('making class')
class Spam(Eggs, metaclass=MetaTwo):    # Inherits from Eggs, instance of MetaTwo
    data = 1                            # Class data attribute
    def meth(self, arg):                # Class method attribute
        return self.data + arg

print('making instance')
X = Spam()
print('data:', X.data, X.meth(2))
In this case, the class initialization method is run after the class construction method, but both run at the end of the class statement before any instances are made. Conversely, an __init__ in Spam would run at instance creation time, and is not affected or run by the metaclass's __init__:
c:\code> py −3 metaclass2.py
making class
In MetaTwo.new:
...Spam
...(<class '__main__.Eggs'>,)
...{'data': 1, 'meth': <function Spam.meth at 0x02967268>, '__module__': '__main__'}
In MetaTwo.init:
...Spam
...(<class '__main__.Eggs'>,)
...{'data': 1, 'meth': <function Spam.meth at 0x02967268>, '__module__': '__main__'}
...init class object: ['__qualname__', 'data', '__module__', 'meth', '__doc__']
making instance
data: 1 3
Other Metaclass Coding Techniques
Although redefining the type superclass’s __new__ and __init__ methods is the most
common way to insert logic into the class object creation process with the metaclass
hook, other schemes are possible.
Using simple factory functions
For example, metaclasses need not really be classes at all. As we've learned, the class statement issues a simple call to create a class at the conclusion of its processing. Because of this, any callable object can in principle be used as a metaclass, provided it accepts the arguments passed and returns an object compatible with the intended class. In fact, a simple object factory function may serve just as well as a type subclass:
# A simple function can serve as a metaclass too
def MetaFunc(classname, supers, classdict):
    print('In MetaFunc: ', classname, supers, classdict, sep='\n...')
    return type(classname, supers, classdict)

class Eggs:
    pass

print('making class')
class Spam(Eggs, metaclass=MetaFunc):   # Run simple function at end
    data = 1                            # Function returns class
    def meth(self, arg):
        return self.data + arg

print('making instance')
X = Spam()
print('data:', X.data, X.meth(2))
When run, the function is called at the end of the declaring class statement, and it
returns the expected new class object. The function is simply catching the call that the
type object’s __call__ normally intercepts by default:
c:\code> py −3 metaclass3.py
making class
In MetaFunc:
...Spam
...(<class '__main__.Eggs'>,)
...{'data': 1, 'meth': <function Spam.meth at 0x029471E0>, '__module__': '__main__'}
making instance
data: 1 3
Overloading class creation calls with normal classes
Because normal class instances can respond to call operations with operator overloading, they can serve in some metaclass roles too, much like the preceding function. The output of the following is similar to the prior class-based versions, but it's based on a simple class—one that doesn't inherit from type at all, and provides a __call__ for its instances that catches the metaclass call using normal operator overloading. Note that __new__ and __init__ must have different names here, or else they will run when the Meta instance is created, not when it is later called in the role of metaclass:
# A normal class instance can serve as a metaclass too
class MetaObj:
    def __call__(self, classname, supers, classdict):
        print('In MetaObj.call: ', classname, supers, classdict, sep='\n...')
        Class = self.__New__(classname, supers, classdict)
        self.__Init__(Class, classname, supers, classdict)
        return Class
    def __New__(self, classname, supers, classdict):
        print('In MetaObj.new: ', classname, supers, classdict, sep='\n...')
        return type(classname, supers, classdict)
    def __Init__(self, Class, classname, supers, classdict):
        print('In MetaObj.init:', classname, supers, classdict, sep='\n...')
        print('...init class object:', list(Class.__dict__.keys()))

class Eggs:
    pass

print('making class')
class Spam(Eggs, metaclass=MetaObj()):  # MetaObj is normal class instance
    data = 1                            # Called at end of statement
    def meth(self, arg):
        return self.data + arg

print('making instance')
X = Spam()
print('data:', X.data, X.meth(2))
When run, the three methods are dispatched via the normal instance's __call__ inherited from its normal class, but without any dependence on type dispatch mechanics or semantics:
c:\code> py −3 metaclass4.py
making class
In MetaObj.call:
...Spam
...(<class '__main__.Eggs'>,)
...{'data': 1, 'meth': <function Spam.meth at 0x029492F0>, '__module__': '__main__'}
In MetaObj.new:
...Spam
...(<class '__main__.Eggs'>,)
...{'data': 1, 'meth': <function Spam.meth at 0x029492F0>, '__module__': '__main__'}
In MetaObj.init:
...Spam
...(<class '__main__.Eggs'>,)
...{'data': 1, 'meth': <function Spam.meth at 0x029492F0>, '__module__': '__main__'}
...init class object: ['__module__', '__doc__', 'data', '__qualname__', 'meth']
making instance
data: 1 3
In fact, we can use normal superclass inheritance to acquire the call interceptor in this
coding model—the superclass here is serving essentially the same role as type, at least
in terms of metaclass dispatch:
# Instances inherit from classes and their supers normally
class SuperMetaObj:
    def __call__(self, classname, supers, classdict):
        print('In SuperMetaObj.call: ', classname, supers, classdict, sep='\n...')
        Class = self.__New__(classname, supers, classdict)
        self.__Init__(Class, classname, supers, classdict)
        return Class

class SubMetaObj(SuperMetaObj):
    def __New__(self, classname, supers, classdict):
        print('In SubMetaObj.new: ', classname, supers, classdict, sep='\n...')
        return type(classname, supers, classdict)
    def __Init__(self, Class, classname, supers, classdict):
        print('In SubMetaObj.init:', classname, supers, classdict, sep='\n...')
        print('...init class object:', list(Class.__dict__.keys()))

class Spam(Eggs, metaclass=SubMetaObj()):   # Invoke Sub instance via Super.__call__
    ...rest of file unchanged...
c:\code> py −3 metaclass4-super.py
making class
In SuperMetaObj.call:
...as before...
In SubMetaObj.new:
...as before...
In SubMetaObj.init:
...as before...
making instance
data: 1 3
Although such alternative forms work, most metaclasses get their work done by redefining the type superclass's __new__ and __init__; in practice, this is usually as much control as is required, and it's often simpler than other schemes. Moreover, metaclasses have access to additional tools, such as class methods we'll explore ahead, which can influence class behavior more directly than some other schemes.
Still, we’ll see later that a simple callable-based metaclass can often work much like a
class decorator, which allows the metaclasses to manage instances as well as classes.
First, though, the next section presents an example drawn from the Python “Twilight
Zone” to introduce metaclass name resolution concepts.
Overloading class creation calls with metaclasses
Since they participate in normal OOP mechanics, it’s also possible for metaclasses to
catch the creation call at the end of a class statement directly, by redefining the type
object’s __call__. The redefinitions of both __new__ and __call__ must be careful to
call back to their defaults in type if they mean to make a class in the end, and
__call__ must invoke type to kick off the other two here:
# Classes can catch calls too (but built-ins look in metas, not supers!)
class SuperMeta(type):
    def __call__(meta, classname, supers, classdict):
        print('In SuperMeta.call: ', classname, supers, classdict, sep='\n...')
        return type.__call__(meta, classname, supers, classdict)
    def __init__(Class, classname, supers, classdict):
        print('In SuperMeta init:', classname, supers, classdict, sep='\n...')
        print('...init class object:', list(Class.__dict__.keys()))

print('making metaclass')
class SubMeta(type, metaclass=SuperMeta):
    def __new__(meta, classname, supers, classdict):
        print('In SubMeta.new: ', classname, supers, classdict, sep='\n...')
        return type.__new__(meta, classname, supers, classdict)
    def __init__(Class, classname, supers, classdict):
        print('In SubMeta init:', classname, supers, classdict, sep='\n...')
        print('...init class object:', list(Class.__dict__.keys()))

class Eggs:
    pass

print('making class')
class Spam(Eggs, metaclass=SubMeta):    # Invoke SubMeta, via SuperMeta.__call__
    data = 1
    def meth(self, arg):
        return self.data + arg

print('making instance')
X = Spam()
print('data:', X.data, X.meth(2))
This code has some oddities I’ll explain in a moment. When run, though, all three
redefined methods run in turn for Spam as in the prior section. This is again essentially
what the type object does by default, but there’s an additional metaclass call for the
metaclass subclass (metasubclass?):
c:\code> py −3 metaclass5.py
making metaclass
In SuperMeta init:
...SubMeta
...(<class 'type'>,)
...{'__init__': <function SubMeta.__init__ at 0x028F92F0>, ...}
...init class object: ['__doc__', '__module__', '__new__', '__init__, ...]
making class
In SuperMeta.call:
...Spam
...(<class '__main__.Eggs'>,)
...{'data': 1, 'meth': <function Spam.meth at 0x028F9378>, '__module__': '__main__'}
In SubMeta.new:
...Spam
...(<class '__main__.Eggs'>,)
...{'data': 1, 'meth': <function Spam.meth at 0x028F9378>, '__module__': '__main__'}
In SubMeta init:
...Spam
...(<class '__main__.Eggs'>,)
...{'data': 1, 'meth': <function Spam.meth at 0x028F9378>, '__module__': '__main__'}
...init class object: ['__qualname__', '__module__', '__doc__', 'data', 'meth']
making instance
data: 1 3
This example is complicated by the fact that it overrides a method invoked by a built-in operation—in this case, the call run automatically to create a class. Metaclasses are used to create class objects, but only generate instances of themselves when called in a metaclass role. Because of this, name lookup with metaclasses may be somewhat different than what we are accustomed to. The __call__ method, for example, is looked up by built-ins in the class (a.k.a. type) of an object; for metaclasses, this means the metaclass of a metaclass!
As we’ll see ahead, metaclasses also inherit names from other metaclasses normally,
but as for normal classes, this seems to apply to explicit name fetches only, not to the
implicit lookup of names for built-in operations such as calls. The latter appears to look
in the metaclass’s class, available in its __class__ link—which is either the default
type or a metaclass. This is the same built-ins routing issue we’ve seen so often in this
book for normal class instances. The metaclass in SubMeta is required to set this link,
though this also kicks off a metaclass construction step for the metaclass itself.
Trace the invocations in the output. SuperMeta's __call__ method is not run for the call to SuperMeta when making SubMeta (this goes to type instead), but is run for the SubMeta call when making Spam. Inheriting normally from SuperMeta does not suffice to catch SubMeta calls, and for reasons we'll see later is actually the wrong thing to do for operator overloading methods: SuperMeta's __call__ is then acquired by Spam, causing Spam instance creation calls to fail before any instance is ever created. Subtle but true!
Here’s an illustration of the issue in simpler terms—a normal superclass is skipped for
built-ins, but not for explicit fetches and calls, the latter relying on normal attribute
name inheritance:
class SuperMeta(type):
    def __call__(meta, classname, supers, classdict):       # By name, not built-in
        print('In SuperMeta.call:', classname)
        return type.__call__(meta, classname, supers, classdict)

class SubMeta(SuperMeta):                                   # Created by type default
    def __init__(Class, classname, supers, classdict):      # Overrides type.__init__
        print('In SubMeta init:', classname)

print(SubMeta.__class__)
print([n.__name__ for n in SubMeta.__mro__])
print()
print(SubMeta.__call__)                     # Not a data descriptor if found by name
print()
SubMeta.__call__(SubMeta, 'xxx', (), {})    # Explicit calls work: class inheritance
print()
SubMeta('yyy', (), {})                      # But implicit built-in calls do not: type
c:\code> py −3 metaclass5b.py
<class 'type'>
['SubMeta', 'SuperMeta', 'type', 'object']
<function SuperMeta.__call__ at 0x029B9158>
In SuperMeta.call: xxx
In SubMeta init: xxx
In SubMeta init: yyy
Of course, this specific example is a special case: catching a built-in run on a metaclass,
a likely rare usage related to __call__ here. But it underscores a core asymmetry and
apparent inconsistency: normal attribute inheritance is not fully used for built-in dispatch
—for both instances and classes.
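In fact, you can see the instance half of this asymmetry without any metaclasses at all; here is a short sketch for comparison:

class C:
    def __str__(self):
        return 'class str'

I = C()
I.__str__ = lambda: 'instance str'       # Visible to explicit fetches only

print(I.__str__())                       # instance str: explicit lookup
print(str(I))                            # class str: built-in skips the instance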
To truly understand this example’s subtleties, though, we need to get more formal
about what metaclasses mean for Python name resolution in general.
Inheritance and Instance
Because metaclasses are specified in similar ways to inheritance superclasses, they can
be a bit confusing at first glance. A few key points should help summarize and clarify
the model:
Metaclasses inherit from the type class (usually)
    Although they have a special role, metaclasses are coded with class statements and follow the usual OOP model in Python. For example, as subclasses of type, they can redefine the type object's methods, overriding and customizing them as needed. Metaclasses typically redefine the type class's __new__ and __init__ to customize class creation and initialization. Although it's less common, they can also redefine __call__ if they wish to catch the end-of-class creation call directly (albeit with the complexities we saw in the prior section), and can even be simple functions or other callables that return arbitrary objects, instead of type subclasses.

Metaclass declarations are inherited by subclasses
    The metaclass=M declaration in a user-defined class is inherited by the class's normal subclasses, too, so the metaclass will run for the construction of each class that inherits this specification in a superclass inheritance chain.

Metaclass attributes are not inherited by class instances
    Metaclass declarations specify an instance relationship, which is not the same as what we've called inheritance thus far. Because classes are instances of metaclasses, the behavior defined in a metaclass applies to the class, but not the class's later instances. Instances obtain behavior from their classes and superclasses, but not from any metaclasses. Technically, attribute inheritance for normal instances usually searches only the __dict__ dictionaries of the instance, its class, and all its superclasses; metaclasses are not included in inheritance lookup for normal instances.

Metaclass attributes are acquired by classes
    By contrast, classes do acquire methods of their metaclasses by virtue of the instance relationship. This is a source of class behavior that processes classes themselves. Technically, classes acquire metaclass attributes through the class's __class__ link just as normal instances acquire names from their class, but inheritance via __dict__ search is attempted first: when the same name is available to a class in both a metaclass and a superclass, the superclass (inheritance) version is used instead of that on a metaclass (instance). The class's __class__, however, is not followed for its own instances: metaclass attributes are made available to their instance classes, but not to instances of those instance classes (and see the earlier reference to Dr. Seuss...).
This may be easier to understand in code than in prose. To illustrate all these points,
consider the following example:
# File metainstance.py

class MetaOne(type):
    def __new__(meta, classname, supers, classdict):    # Redefine type method
        print('In MetaOne.new:', classname)
        return type.__new__(meta, classname, supers, classdict)
    def toast(self):
        return 'toast'

class Super(metaclass=MetaOne):         # Metaclass inherited by subs too
    def spam(self):                     # MetaOne run twice for two classes
        return 'spam'

class Sub(Super):                       # Superclass: inheritance versus instance
    def eggs(self):                     # Classes inherit from superclasses
        return 'eggs'                   # But not from metaclasses
When this code is run (as a script or module), the metaclass handles construction of
both client classes, and instances inherit class attributes but not metaclass attributes:
>>> from metainstance import * # Runs class statements: metaclass run twice
In MetaOne.new: Super
In MetaOne.new: Sub
>>> X = Sub() # Normal instance of user-defined class
>>> X.eggs() # Inherited from Sub
'eggs'
>>> X.spam() # Inherited from Super
'spam'
>>> X.toast() # Not inherited from metaclass
AttributeError: 'Sub' object has no attribute 'toast'
By contrast, classes both inherit names from their superclasses, and acquire names from
their metaclass (which in this example is itself inherited from a superclass):
>>> Sub.eggs(X) # Own method
'eggs'
>>> Sub.spam(X) # Inherited from Super
'spam'
>>> Sub.toast() # Acquired from metaclass
'toast'
>>> Sub.toast(X) # Not a normal class method
TypeError: toast() takes 1 positional argument but 2 were given
Notice how the last of the preceding calls fails when we pass in an instance, because
the name resolves to a metaclass method, not a normal class method. In fact, both the
object you fetch a name from and its source become crucial here. Methods acquired
from metaclasses are bound to the subject class, while methods from normal classes
are unbound if fetched through the class but bound when fetched through the instance:
>>> Sub.toast
<bound method MetaOne.toast of <class 'metainstance.Sub'>>
>>> Sub.spam
<function Super.spam at 0x0298A2F0>
>>> X.spam
<bound method Sub.spam of <metainstance.Sub object at 0x02987438>>
We’ve studied the last two of these rules before in Chapter 31’s bound method cover-
age; the first is new, but reminiscent of class methods. To understand why this works
the way it does, we need to explore the metaclass instance relationship further.
Metaclass Versus Superclass
In even simpler terms, watch what happens in the following: as an instance of the A
metaclass type, class B acquires A’s attribute, but this attribute is not made available for
inheritance by B’s own instances—the acquisition of names by metaclass instances is
distinct from the normal inheritance used for class instances:
>>> class A(type): attr = 1
>>> class B(metaclass=A): pass # B is meta instance and acquires meta attr
>>> I = B() # I inherits from class but not meta!
>>> B.attr
1
>>> I.attr
AttributeError: 'B' object has no attribute 'attr'
>>> 'attr' in B.__dict__, 'attr' in A.__dict__
(False, True)
By contrast, if A morphs from metaclass to superclass, then names inherited from an A superclass become available to later instances of B, and are located by searching namespace dictionaries in classes in the tree—that is, by checking the __dict__ of objects in the method resolution order (MRO), much like the mapattrs example we coded back in Chapter 32:
>>> class A: attr = 1
>>> class B(A): pass # I inherits from class and supers
>>> I = B()
>>> B.attr
1
>>> I.attr
1
>>> 'attr' in B.__dict__, 'attr' in A.__dict__
(False, True)
This is why metaclasses often do their work by manipulating a new class’s namespace
dictionary, if they wish to influence the behavior of later instance objects—instances
will see names in a class, but not its metaclass. Watch what happens, though, if the
same name is available in both attribute sources—the inheritance name is used instead
of instance acquisition:
>>> class M(type): attr = 1
>>> class A: attr = 2
>>> class B(A, metaclass=M): pass # Supers have precedence over metas
>>> I = B()
>>> B.attr, I.attr
(2, 2)
>>> 'attr' in B.__dict__, 'attr' in A.__dict__, 'attr' in M.__dict__
(False, True, True)
This is true regardless of the relative height of the inheritance and instance sources—
Python checks the __dict__ of each class on the MRO (inheritance), before falling back
on metaclass acquisition (instance):
>>> class M(type): attr = 1
>>> class A: attr = 2
>>> class B(A): pass
>>> class C(B, metaclass=M): pass # Super two levels above meta: still wins
>>> I = C()
>>> I.attr, C.attr
(2, 2)
>>> [x.__name__ for x in C.__mro__] # See Chapter 32 for all things MRO
['C', 'B', 'A', 'object']
In fact, classes acquire metaclass attributes through their __class__ link, in the same
way that normal instances inherit from classes through their __class__, which makes
sense, given that classes are also instances of metaclasses. The chief distinction is that
instance inheritance does not follow a class’s __class__, but instead restricts its scope
to the __dict__ of each class in a tree per the MRO—following __bases__ at each class
only, and using only the instance’s __class__ link once:
>>> I.__class__ # Followed by inheritance: instance's class
<class '__main__.C'>
>>> C.__bases__ # Followed by inheritance: class's supers
(<class '__main__.B'>,)
>>> C.__class__ # Followed by instance acquisition: metaclass
<class '__main__.M'>
>>> C.__class__.attr # Another way to get to metaclass attributes
1
If you study this, you’ll probably notice a nearly glaring symmetry here, which leads us
to the next section.
Inheritance: The Full Story
As it turns out, instance inheritance works in similar ways, whether the "instance" is created from a normal class, or is a class created from a metaclass subclass of type—a single attribute search rule, which fosters the grander and parallel notion of metaclass inheritance hierarchies. To illustrate the basics of this conceptual merger, in the following, the instance inherits from all its classes; the class inherits from both classes and metaclasses; and metaclasses inherit from higher metaclasses (supermetaclasses?):
>>> class M1(type): attr1 = 1 # Metaclass inheritance tree
>>> class M2(M1): attr2 = 2 # Gets __bases__, __class__, __mro__
>>> class C1: attr3 = 3 # Superclass inheritance tree
>>> class C2(C1,metaclass=M2): attr4 = 4 # Gets __bases__, __class__, __mro__
>>> I = C2() # I gets __class__ but not others
>>> I.attr3, I.attr4 # Instance inherits from super tree
(3, 4)
>>> C2.attr1, C2.attr2, C2.attr3, C2.attr4 # Class gets names from both trees!
(1, 2, 3, 4)
>>> M2.attr1, M2.attr2 # Metaclass inherits names too!
(1, 2)
Both inheritance paths—class and metaclass—employ the same links, though not recursively: instances do not inherit their class's metaclass names, but may request them explicitly:
>>> I.__class__ # Links followed at instance with no __bases__
<class '__main__.C2'>
>>> C2.__bases__
(<class '__main__.C1'>,)
>>> C2.__class__ # Links followed at class after __bases__
<class '__main__.M2'>
>>> M2.__bases__
(<class '__main__.M1'>,)
>>> I.__class__.attr1 # Route inheritance to the class's meta tree
1
>>> I.attr1 # Though class's __class__ not followed normally
AttributeError: 'C2' object has no attribute 'attr1'
>>> M2.__class__ # Both trees have MROs and instance links
<class 'type'>
>>> [x.__name__ for x in C2.__mro__] # __bases__ tree from I.__class__
['C2', 'C1', 'object']
>>> [x.__name__ for x in M2.__mro__] # __bases__ tree from C2.__class__
['M2', 'M1', 'type', 'object']
If you care about metaclasses, or must use code that does, study these examples, and then study them again. In effect, inheritance follows __bases__ before following a single __class__; normal instances have no __bases__; and classes have both—whether normal or metaclass. In fact, understanding this example is important to Python name resolution in general, as the next section explains.
Python’s inheritance algorithm: The simple version
Now that we know about metaclass acquisition, we're finally able to formalize the inheritance rules that they augment. Technically, inheritance deploys two distinct but similar lookup routines, and is based on MROs. Because __bases__ are used to construct the __mro__ ordering at class creation time, and because a class's __mro__ includes itself, the prior section's generalization is the same as the following—a first-cut definition of Python's new-style inheritance algorithm:
To look up an explicit attribute name:
1. From an instance I, search the instance, then its class, and then all its superclasses, using:
   a. The __dict__ of the instance I
   b. The __dict__ of all classes on the __mro__ found at I's __class__, from left to right
2. From a class C, search the class, then all its superclasses, and then its metaclasses tree, using:
   a. The __dict__ of all classes on the __mro__ found at C itself, from left to right
   b. The __dict__ of all metaclasses on the __mro__ found at C's __class__, from left to right
3. In both rule 1 and 2, give precedence to data descriptors located in step b sources (see ahead).
4. In both rule 1 and 2, skip step a and begin the search at step b for built-in operations (see ahead).
The first two steps are followed for normal, explicit attribute fetch only. There are
exceptions for both built-ins and descriptors, both of which we’ll clarify in a moment.
In addition, a __getattr__ or __getattribute__ may also be used for missing or all
names, respectively, per Chapter 38.
Most programmers need only be aware of the first of these rules, and perhaps the first
step of the second—which taken together correspond to 2.X classic class inheritance.
There’s an extra acquisition step added for metaclasses (2b), but it’s essentially the
same as others—a fairly subtle equivalence to be sure, but metaclass acquisition is not
as novel as it may seem. In fact, it’s just one component of the larger model.
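As an illustration only (not Python's actual implementation), the first two rules can be emulated for explicit names by scanning namespace dictionaries directly, ignoring the descriptor and built-in special cases of steps 3 and 4; the lookup functions here are inventions for this sketch:

def lookup_from_instance(I, name):
    if name in I.__dict__:                       # Step 1a: the instance itself
        return I.__dict__[name]
    for klass in type(I).__mro__:                # Step 1b: class tree, left to right
        if name in klass.__dict__:
            return klass.__dict__[name]
    raise AttributeError(name)

def lookup_from_class(C, name):
    for klass in C.__mro__:                      # Step 2a: class's own MRO
        if name in klass.__dict__:
            return klass.__dict__[name]
    for meta in type(C).__mro__:                 # Step 2b: metaclass tree
        if name in meta.__dict__:
            return meta.__dict__[name]
    raise AttributeError(name)

class M(type): attr1 = 1
class A: attr2 = 2
class B(A, metaclass=M): pass

print(lookup_from_class(B, 'attr1'))             # 1: acquired from the metaclass
print(lookup_from_class(B, 'attr2'))             # 2: inherited from the superclass
print(lookup_from_instance(B(), 'attr2'))        # 2: instances skip the meta tree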
The descriptors special case
At least that’s the normal—and simplistic—case. I listed step 3 in the prior section
specially, because it doesn’t apply to most code, and complicates the algorithm sub-
stantially. It turns out, though, that inheritance also has a special case interaction with
Chapter 38’s attribute descriptors. In short, some descriptors known as data descriptors
—those that define __set__ methods to intercept assignments—are given precedence,
such that their names override other inheritance sources.
This exception serves some practical roles. For example, it is used to ensure that the
special __class__ and __dict__ attributes cannot be redefined by the same names in an
instance’s own __dict__:
>>> class C: pass # Inheritance special case #1...
>>> I = C() # Class data descriptors have precedence
>>> I.__class__, I.__dict__
(<class '__main__.C'>, {})
>>> I.__dict__['name'] = 'bob' # Dynamic data in the instance
>>> I.__dict__['__class__'] = 'spam' # Assign keys, not attributes
>>> I.__dict__['__dict__'] = {}
>>> I.name # I.name comes from I.__dict__ as usual
'bob' # But I.__class__ and I.__dict__ do not!
>>> I.__class__, I.__dict__
(<class '__main__.C'>, {'__class__': 'spam', '__dict__': {}, 'name': 'bob'})
This data descriptor exception is tested before the preceding two inheritance rules as
a preliminary step, may be more important to Python implementers than Python pro-
grammers, and can be reasonably ignored by most application code in any event—that
is, unless you code data descriptors of your own, which follow the same inheritance
special case precedence rule:
>>> class D:
        def __get__(self, instance, owner): print('__get__')
        def __set__(self, instance, value): print('__set__')
>>> class C: d = D() # Data descriptor attribute
>>> I = C()
>>> I.d # Inherited data descriptor access
__get__
>>> I.d = 1
__set__
>>> I.__dict__['d'] = 'spam' # Define same name in instance namespace dict
>>> I.d # But doesn't hide data descriptor in class!
__get__
Conversely, if this descriptor did not define a __set__, the name in the instance’s dic-
tionary would hide the name in its class instead, per normal inheritance:
>>> class D:
        def __get__(self, instance, owner): print('__get__')
>>> class C: d = D()
>>> I = C()
>>> I.d # Inherited nondata descriptor access
__get__
>>> I.__dict__['d'] = 'spam' # Hides class names per normal inheritance rules
>>> I.d
'spam'
In both cases, Python automatically runs the descriptor’s __get__ when it’s found by
inheritance, rather than returning the descriptor object itself—part of the attribute
magic we met earlier in the book. The special status afforded to data descriptors, how-
ever, also modifies the meaning of attribute inheritance, and thus the meaning of names
in your code.
Python’s inheritance algorithm: The somewhat-more-complete version
With both the data descriptor special case and general descriptor invocation factored
in with class and metaclass trees, Python’s full new-style inheritance algorithm can be
stated as follows—a complex procedure, which assumes knowledge of descriptors,
metaclasses, and MROs, but is the final arbiter of attribute names nonetheless (in the
following, items are attempted in sequence either as numbered, or per their left-to-right
order in “or” conjunctions):
To look up an explicit attribute name:
1. From an instance I, search the instance, its class, and its superclasses, as follows:
a. Search the __dict__ of all classes on the __mro__ found at I’s __class__
b. If a data descriptor was found in step a, call it and exit
c. Else, return a value in the __dict__ of the instance I
d. Else, call a nondata descriptor or return a value found in step a
2. From a class C, search the class, its superclasses, and its metaclasses tree, as follows:
a. Search the __dict__ of all metaclasses on the __mro__ found at C’s __class__
b. If a data descriptor was found in step a, call it and exit
c. Else, call a descriptor or return a value in the __dict__ of a class on C’s own
__mro__
d. Else, call a nondata descriptor or return a value found in step a
3. In both rule 1 and 2, built-in operations essentially use just step a sources (see
ahead)
Note here again that this applies to normal, explicit attribute fetch only. The implicit
lookup of method names for built-ins doesn’t follow these rules, and essentially uses
just step a sources in both cases, as the next section will demonstrate.
On top of all this, method __getattr__ may be run if defined when an attribute is not
found, and method __getattribute__ may be run for every attribute fetch, though they
are special-case extensions to the name lookup model. See Chapter 38 for more on
these tools and descriptors.
Assignment inheritance
Also note that the prior section defines inheritance in terms of attribute reference
(lookup), but parts of it apply to attribute assignment as well. As we’ve learned, as-
signment normally changes attributes in the subject object itself, but inheritance is also
invoked on assignment to test first for some of Chapter 38’s attribute management
tools, including descriptors and properties. When present, such tools intercept at-
tribute assignment, and may route it arbitrarily.
For example, when an attribute assignment is run for new-style classes, a data descrip-
tor with a __set__ method is acquired from a class by inheritance using the MRO, and
has precedence over the normal storage model. In terms of the prior section’s rules:
When applied to an instance, such assignments essentially follow steps a through
c of rule 1, searching the instance’s class tree, though step b calls __set__ instead
of __get__, and step c stops and stores in the instance instead of attempting a fetch.
When applied to a class, such assignments run the same procedure on the class’s
metaclass tree: roughly the same as rule 2, but step c stops and stores in the class.
Because descriptors are also the basis for other advanced attribute tools such as prop-
erties and slots, this inheritance pre-check on assignment is utilized in multiple con-
texts. The net effect is that descriptors are treated as an inheritance special case in new-
style classes, for both reference and assignment.
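The following minimal session sketches this assignment pre-check; it assumes nothing
beyond the rules just given. A data descriptor's __set__ intercepts the assignment,
while a name with no such descriptor is simply stored in the instance as usual:
>>> class D:
        def __set__(self, instance, value): print('__set__')

>>> class C:
        d = D()                              # Data descriptor in the class
        x = 99                               # Normal class attribute

>>> I = C()
>>> I.d = 1                                  # Routed to D.__set__ by the pre-check
__set__
>>> I.x = 2                                  # No descriptor: stored in the instance
>>> I.__dict__
{'x': 2}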
The built-ins special case
At least that’s almost the full story. As we’ve seen, built-ins don’t follow these rules.
Instances and classes may both be skipped for built-in operations only, as a special case
that differs from normal or explicit name inheritance. Because this is a context-spe-
cific divergence, it’s easier to demonstrate in code than to weave into a single algorithm.
In the following, str is the built-in, __str__ is its explicit name equivalent, and the
instance is skipped for the built-in only:
>>> class C:                                 # Inheritance special case #2...
        attr = 1                             # Built-ins skip a step
        def __str__(self): return('class')
>>> I = C()
>>> I.__str__(), str(I) # Both from class if not in instance
('class', 'class')
>>> I.__str__ = lambda: 'instance'
>>> I.__str__(), str(I) # Explicit=>instance, built-in=>class!
('instance', 'class')
>>> I.attr # Asymmetric with normal or explicit names
1
>>> I.attr = 2; I.attr
2
As we saw in metaclass5.py earlier, the same holds true for classes: explicit names start
at the class, but built-ins start at the class’s class, which is its metaclass, and defaults
to type:
>>> class D(type):
        def __str__(self): return('D class')
>>> class C(D):
        pass
>>> C.__str__(C), str(C) # Explicit=>super, built-in=>metaclass!
('D class', "<class '__main__.C'>")
>>> class C(D):
        def __str__(self): return('C class')
>>> C.__str__(C), str(C) # Explicit=>class, built-in=>metaclass!
('C class', "<class '__main__.C'>")
>>> class C(metaclass=D):
        def __str__(self): return('C class')
>>> C.__str__(C), str(C) # Built-in=>user-defined metaclass
('C class', 'D class')
In fact, it can sometimes be nontrivial to know where a name comes from in this model,
since all classes also inherit from object—including the default type metaclass. In the
following’s explicit call, C appears to get a default __str__ from object instead of the
metaclass, per the first source of class inheritance (the class’s own MRO); by contrast,
the built-in skips ahead to the metaclass as before:
>>> class C(metaclass=D):
        pass
>>> C.__str__(C), str(C) # Explicit=>object, built-in=>metaclass
("<class '__main__.C'>", 'D class')
>>> C.__str__
<slot wrapper '__str__' of 'object' objects>
>>> for k in (C, C.__class__, type): print([x.__name__ for x in k.__mro__])
['C', 'object']
['D', 'type', 'object']
['type', 'object']
All of which leads us to this book’s final import this quote—a tenet that seems to
conflict with the status given to descriptors and built-ins in the attribute inheritance
mechanism of new-style classes:
Special cases aren’t special enough to break the rules.
Some practical needs warrant exceptions, of course. We’ll forgo rationales here, but
you should carefully consider the implications of an object-oriented language that ap-
plies inheritance—its foundational operation—in such an uneven and inconsistent
fashion. At a minimum, this should underscore the importance of keeping your code
simple, to avoid making it dependent on such convoluted rules. As always, your code’s
users and maintainers will be glad you did.
For more fidelity on this story, see Python’s internal implementation of inheritance—
a complete saga chronicled today in its object.c and typeobject.c, the former for normal
instances, and the latter for classes. Delving into internals shouldn’t be required to use
Python, of course, but it’s the ultimate source of truth in a complex and evolving system,
and sometimes the best you’ll find. This is especially true in boundary cases born of
accrued exceptions. For our purposes here, let’s move on to the last bit of metaclass
magic.
Metaclass Methods
Just as important as the inheritance of names, methods in metaclasses process their
instance classes—not the normal instance objects we’ve known as “self,” but classes
themselves. This makes them similar in spirit and form to the class methods we studied
in Chapter 32, though they again are available in the metaclass instance realm only,
not to normal instance inheritance. The failure at the end of the following, for example,
stems from the explicit name inheritance rules of the prior section:
>>> class A(type):
        def x(cls): print('ax', cls)         # A metaclass (instances=classes)
        def y(cls): print('ay', cls)         # y is overridden by instance B

>>> class B(metaclass=A):
        def y(self): print('by', self)       # A normal class (normal instances)
        def z(self): print('bz', self)       # Namespace dict holds y and z
>>> B.x # x acquired from metaclass
<bound method A.x of <class '__main__.B'>>
>>> B.y # y and z defined in class itself
<function B.y at 0x0295F1E0>
>>> B.z
<function B.z at 0x0295F378>
>>> B.x() # Metaclass method call: gets cls
ax <class '__main__.B'>
>>> I = B() # Instance method calls: get inst
>>> I.y()
by <__main__.B object at 0x02963BE0>
>>> I.z()
bz <__main__.B object at 0x02963BE0>
>>> I.x() # Instance doesn't see meta names
AttributeError: 'B' object has no attribute 'x'
Metaclass Methods Versus Class Methods
Though they differ in inheritance visibility, much like class methods, metaclass meth-
ods are designed to manage class-level data. In fact, their roles can overlap—much as
metaclasses do in general with class decorators—but metaclass methods are not ac-
cessible except through the class, and do not require an explicit classmethod
declaration in order to be bound with the class. In other words, metaclass methods
can be thought of as implicit class methods, with limited visibility:
>>> class A(type):
        def a(cls):                          # Metaclass method: gets class
            cls.x = cls.y + cls.z

>>> class B(metaclass=A):
        y, z = 11, 22
        @classmethod                         # Class method: gets class
        def b(cls):
            return cls.x
>>> B.a() # Call metaclass method; visible to class only
>>> B.x # Creates class data on B, accessible to normal instances
33
>>> I = B()
>>> I.x, I.y, I.z
(33, 11, 22)
>>> I.b() # Class method: sends class, not instance; visible to instance
33
>>> I.a() # Metaclass methods: accessible through class only
AttributeError: 'B' object has no attribute 'a'
Operator Overloading in Metaclass Methods
Just like normal classes, metaclasses may also employ operator overloading to make
built-in operations applicable to their instance classes. The __getitem__ indexing
method in the following metaclass, for example, is a metaclass method designed to
process classes themselves—the classes that are instances of the metaclass, not those
classes’ own later instances. In fact, per the inheritance algorithms sketched earlier,
normal class instances don’t inherit names acquired via the metaclass instance rela-
tionship at all, though they can access names present on their own classes:
>>> class A(type):
        def __getitem__(cls, i):             # Meta method for processing classes:
            return cls.data[i]               # Built-ins skip class, use meta
                                             # Explicit names search class + meta

>>> class B(metaclass=A):                    # Data descriptors in meta used first
        data = 'spam'
>>> B[0] # Metaclass instance names: visible to class only
's'
>>> B.__getitem__
<bound method A.__getitem__ of <class '__main__.B'>>
>>> I = B()
>>> I.data, B.data # Normal inheritance names: visible to instance and class
('spam', 'spam')
>>> I[0]
TypeError: 'B' object does not support indexing
It’s possible to define a __getattr__ on a metaclass too, but it can be used to process
its instance classes only, not their normal instances—as usual, it’s not even acquired
by a class’s instances:
>>> class A(type):
        def __getattr__(cls, name):          # Acquired by class B getattr
            return getattr(cls.data, name)   # But not run same by built-ins

>>> class B(metaclass=A):
        data = 'spam'
>>> B.upper()
'SPAM'
>>> B.upper
<built-in method upper of str object at 0x029E7420>
>>> B.__getattr__
<bound method A.__getattr__ of <class '__main__.B'>>
>>> I = B()
>>> I.upper
AttributeError: 'B' object has no attribute 'upper'
>>> I.__getattr__
AttributeError: 'B' object has no attribute '__getattr__'
Moving the __getattr__ to a metaclass doesn’t help with its built-in interception short-
comings, though. In the following continuation, explicit attributes are routed to the
metaclass’s __getattr__, but built-ins are not, despite that fact the indexing is routed
to a metaclass’s __getitem__ in the first example of the section—strongly suggesting
that new-style __getattr__ is a special case of a special case, and further recommending
code simplicity that avoids dependence on such boundary cases:
>>> B.data = [1, 2, 3]
>>> B.append(4) # Explicit normal names routed to meta's getattr
>>> B.data
[1, 2, 3, 4]
>>> B.__getitem__(0) # Explicit special names routed to meta's getattr
1
>>> B[0] # But built-ins skip meta's getattr too?!
TypeError: 'A' object does not support indexing
As you can probably tell, metaclasses are interesting to explore, but it’s easy to lose
track of their big picture. In the interest of space, we’ll omit additional fine points here.
For the purposes of this chapter, it’s more important to show why you’d care to use
such a tool in the first place. Let’s move on to some larger examples to sample the roles
of metaclasses in action. As we’ll find, like so many tools in Python, metaclasses are
first and foremost about easing maintenance work by eliminating redundancy.
Example: Adding Methods to Classes
In this and the following section, we’re going to study examples of two common use
cases for metaclasses: adding methods to a class, and decorating all methods automat-
ically. These are just two of the many metaclass roles, which unfortunately will consume
the space we have left for this chapter; again, you should consult the Web for more
advanced applications. These examples are representative of metaclasses in action,
though, and they suffice to illustrate their application.
Moreover, both give us an opportunity to contrast class decorators and metaclasses—
our first example compares metaclass- and decorator-based implementations of class
augmentation and instance wrapping, and the second applies a decorator with a met-
aclass first and then with another decorator. As you’ll see, the two tools are often in-
terchangeable, and even complementary.
Manual Augmentation
Earlier in this chapter, we looked at skeleton code that augmented classes by adding
methods to them in various ways. As we saw, simple class-based inheritance suffices if
the extra methods are statically known when the class is coded. Composition via object
embedding can often achieve the same effect too. For more dynamic scenarios, though,
other techniques are sometimes required—helper functions can usually suffice, but
metaclasses provide an explicit structure and minimize the maintenance costs of
changes in the future.
Let’s put these ideas in action here with working code. Consider the following example
of manual class augmentation—it adds two methods to two classes, after they have
been created:
# Extend manually - adding new methods to classes
class Client1:
    def __init__(self, value):
        self.value = value
    def spam(self):
        return self.value * 2

class Client2:
    value = 'ni?'

def eggsfunc(obj):
    return obj.value * 4

def hamfunc(obj, value):
    return value + 'ham'

Client1.eggs = eggsfunc
Client1.ham = hamfunc
Client2.eggs = eggsfunc
Client2.ham = hamfunc
X = Client1('Ni!')
print(X.spam())
print(X.eggs())
print(X.ham('bacon'))
Y = Client2()
print(Y.eggs())
print(Y.ham('bacon'))
This works because methods can always be assigned to a class after it’s been created,
as long as the methods assigned are functions with an extra first argument to receive
the subject self instance—this argument can be used to access state information ac-
cessible from the class instance, even though the function is defined independently of
the class.
When this code runs, we receive the output of a method coded inside the first class, as
well as the two methods added to the classes after the fact:
c:\code> py -3 extend-manual.py
Ni!Ni!
Ni!Ni!Ni!Ni!
baconham
ni?ni?ni?ni?
baconham
This scheme works well in isolated cases and can be used to fill out a class arbitrarily
at runtime. It suffers from a potentially major downside, though: we have to repeat the
augmentation code for every class that needs these methods. In our case, it wasn’t too
onerous to add the two methods to both classes, but in more complex scenarios this
approach can be time-consuming and error-prone. If we ever forget to do this consis-
tently, or we ever need to change the augmentation, we can run into problems.
Metaclass-Based Augmentation
Although manual augmentation works, in larger programs it would be better if we could
apply such changes to an entire set of classes automatically. That way, we’d avoid the
chance of the augmentation being botched for any given class. Moreover, coding the
augmentation in a single location better supports future changes—all classes in the set
will pick up changes automatically.
One way to meet this goal is to use metaclasses. If we code the augmentation in a
metaclass, every class that declares that metaclass will be augmented uniformly and
correctly and will automatically pick up any changes made in the future. The following
code demonstrates:
# Extend with a metaclass - supports future changes better
def eggsfunc(obj):
    return obj.value * 4

def hamfunc(obj, value):
    return value + 'ham'

class Extender(type):
    def __new__(meta, classname, supers, classdict):
        classdict['eggs'] = eggsfunc
        classdict['ham'] = hamfunc
        return type.__new__(meta, classname, supers, classdict)

class Client1(metaclass=Extender):
    def __init__(self, value):
        self.value = value
    def spam(self):
        return self.value * 2

class Client2(metaclass=Extender):
    value = 'ni?'

X = Client1('Ni!')
print(X.spam())
print(X.eggs())
print(X.ham('bacon'))

Y = Client2()
print(Y.eggs())
print(Y.ham('bacon'))
This time, both of the client classes are extended with the new methods because they
are instances of a metaclass that performs the augmentation. When run, this version’s
output is the same as before—we haven’t changed what the code does, we’ve just re-
factored it to encapsulate the augmentation more cleanly:
c:\code> py -3 extend-meta.py
Ni!Ni!
Ni!Ni!Ni!Ni!
baconham
ni?ni?ni?ni?
baconham
Notice that the metaclass in this example still performs a fairly static task: adding two
known methods to every class that declares it. In fact, if all we need to do is always add
the same two methods to a set of classes, we might as well code them in a normal
superclass and inherit in subclasses. In practice, though, the metaclass structure sup-
ports much more dynamic behavior. For instance, the subject class might also be con-
figured based upon arbitrary logic at runtime:
# Can also configure class based on runtime tests
class MetaExtend(type):
    def __new__(meta, classname, supers, classdict):
        if sometest():
            classdict['eggs'] = eggsfunc1
        else:
            classdict['eggs'] = eggsfunc2
        if someothertest():
            classdict['ham'] = hamfunc
        else:
            classdict['ham'] = lambda *args: 'Not supported'
        return type.__new__(meta, classname, supers, classdict)
Metaclasses Versus Class Decorators: Round 2
Keep in mind again that the prior chapter’s class decorators often overlap with this
chapter’s metaclasses in terms of functionality. This derives from the fact that:
Class decorators rebind class names to the result of a function at the end of a
class statement, after the new class has been created.
Metaclasses work by routing class object creation through an object at the end of
a class statement, in order to create the new class.
Although these are slightly different models, in practice they can often achieve the same
goals, albeit in different ways. As you’ve now seen, class decorators correspond directly
to metaclass __init__ methods called to initialize newly created classes. Decorators
have no direct analog to the metaclass __new__ (called to make classes in the first place)
or to metaclass methods (used to process instance classes), but many or most use cases
for these tools do not require these extra steps.
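To make the correspondence concrete, the following sketch, assuming the eggsfunc and
hamfunc coded earlier, performs this chapter's augmentation in a metaclass __init__
instead of __new__; this is the metaclass step that class decorators most directly
resemble, because the class object already exists by the time it is run:
class Extender(type):
    def __init__(cls, classname, supers, classdict):
        cls.eggs = eggsfunc                  # Class already exists here, much as
        cls.ham = hamfunc                    # it does when a decorator is run
        super().__init__(classname, supers, classdict)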
Because of this, both tools in principle can be used to manage both instances of a class
and the class itself. In practice, though, metaclasses incur extra steps to manage in-
stances, and decorators incur extra steps to create new classes. Hence, while their roles
often overlap, metaclasses are probably best used for class object management. Let’s
translate these ideas to code.
Decorator-based augmentation
In pure augmentation cases, decorators can often stand in for metaclasses. For example,
the prior section’s metaclass example, which adds methods to a class on creation, can
also be coded as a class decorator; in this mode, decorators roughly correspond to the
__init__ method of metaclasses, since the class object has already been created by the
time the decorator is invoked. Also as for metaclasses, the original class type is retained,
since no wrapper object layer is inserted. The output of the following, file
extend-deco.py, is the same as that of the prior metaclass code:
# Extend with a decorator: same as providing __init__ in a metaclass
def eggsfunc(obj):
    return obj.value * 4

def hamfunc(obj, value):
    return value + 'ham'

def Extender(aClass):
    aClass.eggs = eggsfunc                   # Manages class, not instance
    aClass.ham = hamfunc                     # Equiv to metaclass __init__
    return aClass

@Extender
class Client1:                               # Client1 = Extender(Client1)
    def __init__(self, value):               # Rebound at end of class stmt
        self.value = value
    def spam(self):
        return self.value * 2

@Extender
class Client2:
    value = 'ni?'

X = Client1('Ni!')                           # X is a Client1 instance
print(X.spam())
print(X.eggs())
print(X.ham('bacon'))

Y = Client2()
print(Y.eggs())
print(Y.ham('bacon'))
In other words, at least in certain cases, decorators can manage classes as easily as
metaclasses. The converse isn’t quite so straightforward, though; metaclasses can be
used to manage instances, but only with a certain amount of extra magic. The next
section demonstrates.
Managing instances instead of classes
As we’ve just seen, class decorators can often serve the same class-management role as
metaclasses. Metaclasses can often serve the same instance-management role as deco-
rators, too, but this requires extra code and may seem less natural. That is:
Class decorators can manage both classes and instances, but don’t create classes
normally.
Metaclasses can manage both classes and instances, but instances require extra
work.
That said, certain applications may be better coded in one or the other. For example,
consider the following class decorator example from the prior chapter; it’s used to print
a trace message whenever any normally named attribute of a class instance is fetched:
# Class decorator to trace external instance attribute fetches
def Tracer(aClass):                          # On @ decorator
    class Wrapper:
        def __init__(self, *args, **kargs):  # On instance creation
            self.wrapped = aClass(*args, **kargs)    # Use enclosing scope name
        def __getattr__(self, attrname):
            print('Trace:', attrname)        # Catches all but .wrapped
            return getattr(self.wrapped, attrname)   # Delegate to wrapped object
    return Wrapper

@Tracer
class Person:                                # Person = Tracer(Person)
    def __init__(self, name, hours, rate):   # Wrapper remembers Person
        self.name = name
        self.hours = hours
        self.rate = rate                     # In-method fetch not traced
    def pay(self):
        return self.hours * self.rate

bob = Person('Bob', 40, 50)                  # bob is really a Wrapper
print(bob.name)                              # Wrapper embeds a Person
print(bob.pay())                             # Triggers __getattr__
When this code is run, the decorator uses class name rebinding to wrap instance objects
in an object that produces the trace lines in the following output:
c:\code> py -3 manage-inst-deco.py
Trace: name
Bob
Trace: pay
2000
Although it’s possible for a metaclass to achieve the same effect, it seems less straight-
forward conceptually. Metaclasses are designed explicitly to manage class object cre-
ation, and they have an interface tailored for this purpose. To use a metaclass just to
manage instances, we must also take on responsibility for creating the class—an
extra step if normal class creation would otherwise suffice. The following metaclass, in
file manage-inst-meta.py, has the same effect as the prior decorator:
# Manage instances like the prior example, but with a metaclass
def Tracer(classname, supers, classdict):    # On class creation call
    aClass = type(classname, supers, classdict)      # Make client class
    class Wrapper:
        def __init__(self, *args, **kargs):  # On instance creation
            self.wrapped = aClass(*args, **kargs)
        def __getattr__(self, attrname):
            print('Trace:', attrname)        # Catches all but .wrapped
            return getattr(self.wrapped, attrname)   # Delegate to wrapped object
    return Wrapper

class Person(metaclass=Tracer):              # Make Person with Tracer
    def __init__(self, name, hours, rate):   # Wrapper remembers Person
        self.name = name
        self.hours = hours
        self.rate = rate                     # In-method fetch not traced
    def pay(self):
        return self.hours * self.rate

bob = Person('Bob', 40, 50)                  # bob is really a Wrapper
print(bob.name)                              # Wrapper embeds a Person
print(bob.pay())                             # Triggers __getattr__
This works, but it relies on two tricks. First, it must use a simple function instead of a
class, because type subclasses must adhere to object creation protocols. Second, it must
create the subject class by calling type manually; it needs to return an instance
wrapper, but metaclasses are also responsible for creating and returning the subject
class. Really, we’re using the metaclass protocol to imitate decorators in this example,
rather than vice versa; because both run at the conclusion of a class statement, in many
roles they are just variations on a theme. This metaclass version produces the same
output as the decorator when run live:
c:\code> py -3 manage-inst-meta.py
Trace: name
Bob
Trace: pay
2000
You should study both versions of these examples for yourself to weigh their tradeoffs.
In general, though, metaclasses are probably best suited to class management, due to
their design; class decorators can manage either instances or classes, though they may
not be the best option for more advanced metaclass roles that we don’t have space to
cover in this book. See the Web for more metaclass examples, but keep in mind that
some are more appropriate than others (and some of their authors may know less of
Python than you do!).
Metaclass and class decorator equivalence?
The preceding section illustrated that metaclasses incur an extra step to create the class
when used in instance management roles, and hence can’t quite subsume decorators
in all use cases. But what about the inverse—are decorators a replacement for meta-
classes?
Just in case this chapter has not yet managed to make your head explode, consider the
following metaclass coding alternative too—a class decorator that returns a metaclass
instance:
# A decorator can call a metaclass, though not vice versa without type()
>>> class Metaclass(type):
        def __new__(meta, clsname, supers, attrdict):
            print('In M.__new__:')
            print([clsname, supers, list(attrdict.keys())])
            return type.__new__(meta, clsname, supers, attrdict)

>>> def decorator(cls):
        return Metaclass(cls.__name__, cls.__bases__, dict(cls.__dict__))

>>> class A:
        x = 1

>>> @decorator
    class B(A):
        y = 2
        def m(self): return self.x + self.y
In M.__new__:
['B', (<class '__main__.A'>,), ['__qualname__', '__doc__', 'm', 'y', '__module__']]
>>> B.x, B.y
(1, 2)
>>> I = B()
>>> I.x, I.y, I.m()
(1, 2, 3)
This nearly proves the equivalence of the two tools, but really just in terms of dis-
patch at class construction time. Again, decorators essentially serve the same role as
metaclass __init__ methods. Because this decorator returns a metaclass instance, met-
aclasses—or at least their type superclass—are still assumed here. Moreover, this winds
up triggering an additional metaclass call after the class is created, and isn’t an ideal
scheme in real code—you might as well move this metaclass to the first creation step:
>>> class B(A, metaclass=Metaclass): ... # Same effect, but makes just one class
Still, there is some tool redundancy here, and decorator and metaclass roles often over-
lap in practice. And although decorators don’t directly support the notion of class-level
methods in metaclasses discussed earlier, methods and state in proxy objects created
by decorators can achieve similar effects, though for space we’ll leave this last obser-
vation in the suggested explorations column.
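For the curious, one minimal sketch of that observation follows; the ClassProxy class
and all its names here are hypothetical, not from this book's examples. A decorator's
proxy object can carry "class-level" behavior of its own, while delegating other names
and instance creation to the wrapped class, much like a metaclass method that is
visible through the class only:
>>> class ClassProxy:
        def __init__(self, cls): self.cls = cls
        def meta_style(self):                # Plays a metaclass-method role
            return 'managed: ' + self.cls.__name__
        def __call__(self, *args, **kargs):
            return self.cls(*args, **kargs)  # Instance creation still works
        def __getattr__(self, name):
            return getattr(self.cls, name)   # Other names reach the class

>>> @ClassProxy
    class C:
        attr = 99

>>> C.meta_style()                           # Visible through the proxy only
'managed: C'
>>> C.attr, C().attr                         # Normal names still work
(99, 99)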
The inverse may not seem applicable—a metaclass can’t generally defer to a nonme-
taclass decorator, because the class doesn’t yet exist until the metaclass call completes
—although a metaclass can take the form of a simple callable that invokes type to create
the class directly and passes it on to the decorator. In other words, the crucial hook in
the model is the type call issued for class construction. Given that, metaclasses and
class decorators are often functionally equivalent, with varying dispatch protocol mod-
els:
>>> def Metaclass(clsname, supers, attrdict):
        return decorator(type(clsname, supers, attrdict))

>>> def decorator(cls): ...
>>> class B(A, metaclass=Metaclass): ... # Metas can call decos and vice versa
In fact, metaclasses need not necessarily return a type instance either—any object
compatible with the class coder’s expectations will do—and this further blurs the dec-
orator/metaclass distinction:
>>> def func(name, supers, attrs):
        return 'spam'

>>> class C(metaclass=func):                 # A class whose metaclass makes it a string!
        attr = 'huh?'

>>> C, C.upper()
('spam', 'SPAM')

>>> def func(cls):
        return 'spam'

>>> @func
    class C:                                 # A class whose decorator makes it a string!
        attr = 'huh?'

>>> C, C.upper()
('spam', 'SPAM')
Odd metaclass and decorator tricks like these aside, timing often determines roles in
practice, as stated earlier:
Because decorators run after a class is created, they incur an extra runtime step in
class creation roles.
Because metaclasses must create classes, they incur an extra coding step in instance
management roles.
In other words, neither completely subsumes the other. Strictly speaking, metaclasses
might be a functional superset, as they can call decorators during class creation; but
metaclasses can also be substantially heavier to understand and code, and many roles
intersect completely. In practice, the need to take over class creation entirely is probably
much less important than tapping into the process in general.
Rather than follow this rabbit hole further, though, let’s move on to explore metaclass
roles that may be a bit more typical and practical. The next section concludes this
chapter with one more common use case—applying operations to a class’s methods
automatically at class creation time.
Example: Applying Decorators to Methods
As we saw in the prior section, because they are both run at the end of a class statement,
metaclasses and decorators can often be used interchangeably, albeit with different
syntax. The choice between the two is arbitrary in many contexts. It’s also possible to
use them in combination, as complementary tools. In this section, we’ll explore an
example of just such a combination—applying a function decorator to all the methods
of a class.
Tracing with Decoration Manually
In the prior chapter we coded two function decorators, one that traced and counted all
calls made to a decorated function and another that timed such calls. They took various
forms there, some of which were applicable to both functions and methods and some
of which were not. The following collects both decorators’ final forms into a module
file for reuse and reference here:
# File decotools.py: assorted decorator tools
import time

def tracer(func):                            # Use function, not class with __call__
    calls = 0                                # Else self is decorator instance only
    def onCall(*args, **kwargs):
        nonlocal calls
        calls += 1
        print('call %s to %s' % (calls, func.__name__))
        return func(*args, **kwargs)
    return onCall

def timer(label='', trace=True):             # On decorator args: retain args
    def onDecorator(func):                   # On @: retain decorated func
        def onCall(*args, **kargs):          # On calls: call original
            start = time.clock()             # State is scopes + func attr
            result = func(*args, **kargs)
            elapsed = time.clock() - start
            onCall.alltime += elapsed
            if trace:
                format = '%s%s: %.5f, %.5f'
                values = (label, func.__name__, elapsed, onCall.alltime)
                print(format % values)
            return result
        onCall.alltime = 0
        return onCall
    return onDecorator
As we learned in the prior chapter, to use these decorators manually, we simply import
them from the module and code the decoration @ syntax before each method we wish
to trace or time:
from decotools import tracer

class Person:
    @tracer
    def __init__(self, name, pay):
        self.name = name
        self.pay = pay
    @tracer
    def giveRaise(self, percent):            # giveRaise = tracer(giveRaise)
        self.pay *= (1.0 + percent)          # onCall remembers giveRaise
    @tracer
    def lastName(self):                      # lastName = tracer(lastName)
        return self.name.split()[-1]

bob = Person('Bob Smith', 50000)
sue = Person('Sue Jones', 100000)
print(bob.name, sue.name)
sue.giveRaise(.10)                           # Runs onCall(sue, .10)
print('%.2f' % sue.pay)
print(bob.lastName(), sue.lastName())        # Runs onCall(bob), remembers lastName
When this code is run, we get the following output—calls to decorated methods are
routed to logic that intercepts and then delegates the call, because the original method
names have been bound to the decorator:
c:\code> py -3 decoall-manual.py
call 1 to __init__
call 2 to __init__
Bob Smith Sue Jones
call 1 to giveRaise
110000.00
call 1 to lastName
call 2 to lastName
Smith Jones
Tracing with Metaclasses and Decorators
The manual decoration scheme of the prior section works, but it requires us to add
decoration syntax before each method we wish to trace and to later remove that syntax
when we no longer desire tracing. If we want to trace every method of a class, this can
become tedious in larger programs. In more dynamic contexts where augmentations
depend upon runtime parameters, it may not be possible at all. It would be better if we
could somehow apply the tracer decorator to all of a class’s methods automatically.
With metaclasses, we can do exactly that—because they are run when a class is con-
structed, they are a natural place to add decoration wrappers to a class’s methods. By
scanning the class’s attribute dictionary and testing for function objects there, we can
automatically run methods through the decorator and rebind the original names to the
results. The effect is the same as the automatic method name rebinding of decorators,
but we can apply it more globally:
# Metaclass that adds tracing decorator to every method of a client class
from types import FunctionType
from decotools import tracer

class MetaTrace(type):
    def __new__(meta, classname, supers, classdict):
        for attr, attrval in classdict.items():
            if type(attrval) is FunctionType:        # Method?
                classdict[attr] = tracer(attrval)    # Decorate it
        return type.__new__(meta, classname, supers, classdict)  # Make class

class Person(metaclass=MetaTrace):
    def __init__(self, name, pay):
        self.name = name
        self.pay = pay
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)
    def lastName(self):
        return self.name.split()[-1]

bob = Person('Bob Smith', 50000)
sue = Person('Sue Jones', 100000)
print(bob.name, sue.name)
sue.giveRaise(.10)
print('%.2f' % sue.pay)
print(bob.lastName(), sue.lastName())
When this code is run, the results are the same as before—calls to methods are routed
to the tracing decorator first for tracing, and then propagated on to the original method:
c:\code> py -3 decoall-meta.py
call 1 to __init__
call 2 to __init__
Bob Smith Sue Jones
call 1 to giveRaise
110000.00
call 1 to lastName
call 2 to lastName
Smith Jones
The result you see here is a combination of decorator and metaclass work—the meta-
class automatically applies the function decorator to every method at class creation
time, and the function decorator automatically intercepts method calls in order to print
the trace messages in this output. The combination “just works,” thanks to the gener-
ality of both tools.
Applying Any Decorator to Methods
The prior metaclass example works for just one specific function decorator—tracing.
However, it’s trivial to generalize this to apply any decorator to all the methods of a
class. All we have to do is add an outer scope layer to retain the desired decorator, much
like we did for decorators in the prior chapter. The following, for example, codes such
a generalization and then uses it to apply the tracer decorator again:
# Metaclass factory: apply any decorator to all methods of a class
from types import FunctionType
from decotools import tracer, timer

def decorateAll(decorator):
    class MetaDecorate(type):
        def __new__(meta, classname, supers, classdict):
            for attr, attrval in classdict.items():
                if type(attrval) is FunctionType:
                    classdict[attr] = decorator(attrval)
            return type.__new__(meta, classname, supers, classdict)
    return MetaDecorate

class Person(metaclass=decorateAll(tracer)): # Apply a decorator to all
    def __init__(self, name, pay):
        self.name = name
        self.pay = pay
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)
    def lastName(self):
        return self.name.split()[-1]

bob = Person('Bob Smith', 50000)
sue = Person('Sue Jones', 100000)
print(bob.name, sue.name)
sue.giveRaise(.10)
print('%.2f' % sue.pay)
print(bob.lastName(), sue.lastName())
When this code is run as it is, the output is again the same as that of the previous
examples—we’re still ultimately decorating every method in a client class with the
tracer function decorator, but we’re doing so in a more generic fashion:
c:\code> py -3 decoall-meta-any.py
call 1 to __init__
call 2 to __init__
Bob Smith Sue Jones
call 1 to giveRaise
110000.00
call 1 to lastName
call 2 to lastName
Smith Jones
Now, to apply a different decorator to the methods, we can simply replace the decorator
name in the class header line. To use the timer function decorator shown earlier, for
example, we could use either of the last two header lines in the following when defining
our class—the first accepts the timer’s default arguments, and the second specifies label
text:
class Person(metaclass=decorateAll(tracer)): # Apply tracer
class Person(metaclass=decorateAll(timer())): # Apply timer, defaults
class Person(metaclass=decorateAll(timer(label='**'))): # Decorator arguments
Notice that this scheme cannot support nondefault decorator arguments differing per
method in the client class, but it can pass in decorator arguments that apply to all such
methods, as done here. To test, use the last of these metaclass declarations to apply the
timer, and add the following lines at the end of the script to see the timer’s extra in-
formational attributes:
# If using timer: total time per method
print('-'*40)
print('%.5f' % Person.__init__.alltime)
print('%.5f' % Person.giveRaise.alltime)
print('%.5f' % Person.lastName.alltime)
The new output is as follows—the metaclass wraps methods in timer decorators now,
so we can tell how long each and every call takes, for every method of the class:
c:\code> py -3 decoall-meta-any2.py
**__init__: 0.00001, 0.00001
**__init__: 0.00001, 0.00001
Bob Smith Sue Jones
**giveRaise: 0.00002, 0.00002
110000.00
**lastName: 0.00002, 0.00002
**lastName: 0.00002, 0.00004
Smith Jones
----------------------------------------
0.00001
0.00002
0.00004
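If decorators or decorator arguments that differ per method are ever required, one
hypothetical extension, sketched here but not developed further, passes a mapping of
method names to decorators, with a default for all the rest; this assumes the same
imports as the prior example:
def decorateSome(default, **permethod):      # Hypothetical decorateAll variant
    class MetaDecorate(type):
        def __new__(meta, classname, supers, classdict):
            for attr, attrval in classdict.items():
                if type(attrval) is FunctionType:
                    decorator = permethod.get(attr, default)  # Per-method choice
                    classdict[attr] = decorator(attrval)
            return type.__new__(meta, classname, supers, classdict)
    return MetaDecorate

class Person(metaclass=decorateSome(tracer, giveRaise=timer(label='**'))):
    ...                                      # giveRaise timed; all others traced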
Metaclasses Versus Class Decorators: Round 3 (and Last)
As you might expect, class decorators intersect with metaclasses here, too. The fol-
lowing version replaces the preceding example’s metaclass with a class decorator. That
is, it defines and uses a class decorator that applies a function decorator to all methods
of a class. Although the prior sentence may sound more like a Zen statement than a
technical description, this all works quite naturally—Python’s decorators support ar-
bitrary nesting and combinations:
# Class decorator factory: apply any decorator to all methods of a class
from types import FunctionType
from decotools import tracer, timer
def decorateAll(decorator):
    def DecoDecorate(aClass):
        for attr, attrval in aClass.__dict__.items():
            if type(attrval) is FunctionType:
                setattr(aClass, attr, decorator(attrval))    # Not __dict__
        return aClass
    return DecoDecorate

@decorateAll(tracer)                         # Use a class decorator
class Person:                                # Applies func decorator to methods
    def __init__(self, name, pay):           # Person = decorateAll(..)(Person)
        self.name = name                     # Person = DecoDecorate(Person)
        self.pay = pay
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)
    def lastName(self):
        return self.name.split()[-1]

bob = Person('Bob Smith', 50000)
sue = Person('Sue Jones', 100000)
print(bob.name, sue.name)
sue.giveRaise(.10)
print('%.2f' % sue.pay)
print(bob.lastName(), sue.lastName())
When this code is run as it is, the class decorator applies the tracer function decorator
to every method and produces a trace message on calls (the output is the same as that
of the preceding metaclass version of this example):
c:\code> py -3 decoall-deco-any.py
call 1 to __init__
call 2 to __init__
Bob Smith Sue Jones
call 1 to giveRaise
110000.00
call 1 to lastName
call 2 to lastName
Smith Jones
Notice that the class decorator returns the original, augmented class, not a wrapper
layer for it (as is common when wrapping instance objects instead). As for the metaclass
version, we retain the type of the original class—an instance of Person is an instance of
Person, not of some wrapper class. In fact, this class decorator deals with class creation
only; instance creation calls are not intercepted at all.
This distinction can matter in programs that require type testing for instances to yield
the original class, not a wrapper. When augmenting a class instead of an instance, class
decorators can retain the original class type. The class’s methods are not their original
functions because they are rebound to decorators, but this is likely less important in
practice, and it’s true in the metaclass alternative as well.
Also note that, like the metaclass version, this structure cannot support function dec-
orator arguments that differ per method in the decorated class, but it can handle such
arguments if they apply to all such methods. To use this scheme to apply the timer
decorator, for example, either of the last two decoration lines in the following will
suffice if coded just before our class definition—the first uses decorator argument de-
faults, and the second provides one explicitly:
@decorateAll(tracer) # Decorate all with tracer
@decorateAll(timer()) # Decorate all with timer, defaults
@decorateAll(timer(label='@@')) # Same but pass a decorator argument
As before, let’s use the last of these decorator lines and add the following at the end of
the script to test our example with a different decorator (better schemes are possible
on both the testing and timing fronts here, of course, but we’re at chapter end; improve
as desired):
# If using timer: total time per method
print('-'*40)
print('%.5f' % Person.__init__.alltime)
print('%.5f' % Person.giveRaise.alltime)
print('%.5f' % Person.lastName.alltime)
The same sort of output appears—for every method we get timing data for each and
all calls, but we’ve passed a different label argument to the timer decorator:
c:\code> py -3 decoall-deco-any2.py
@@__init__: 0.00001, 0.00001
@@__init__: 0.00001, 0.00001
Bob Smith Sue Jones
@@giveRaise: 0.00002, 0.00002
110000.00
@@lastName: 0.00002, 0.00002
@@lastName: 0.00002, 0.00004
Smith Jones
----------------------------------------
0.00001
0.00002
0.00004
Finally, it’s possible to combine decorators such that each runs per method call, but it
will likely require changes to those we’ve coded here. As is, nesting calls to them directly
winds up tracing or timing the other’s creation-time application, listing the two on
separate lines results in tracing or timing the other’s wrapper before running the original
method, and metaclasses seem to fare no better on this front:
@decorateAll(tracer(timer(label='@@')))      # Traces applying the timer
class Person:

@decorateAll(tracer)                         # Traces onCall wrapper, times methods
@decorateAll(timer(label='@@'))
class Person:

@decorateAll(timer(label='@@'))
@decorateAll(tracer) # Times onCall wrapper, traces methods
class Person:
Pondering this further will have to remain suggested study—both because we’re out of
space and time, and because this may quite possibly be illegal in some states!
As you can see, metaclasses and class decorators are not only often interchangeable,
but also commonly complementary. Both provide advanced but powerful ways to cus-
tomize and manage both class and instance objects, because both ultimately allow you
to insert code into the class creation process. Although some more advanced applica-
tions may be better coded with one or the other, the way you choose or combine these
two tools in many cases is largely up to you.
Chapter Summary
In this chapter, we studied metaclasses and explored examples of them in action.
Metaclasses allow us to tap into the class creation protocol of Python, in order to man-
age or augment user-defined classes. Because they automate this process, they may
provide better solutions for API writers than manual code or helper functions; because
they encapsulate such code, they may minimize maintenance costs better than some
other approaches.
Along the way, we also saw how the roles of class decorators and metaclasses often
intersect: because both run at the conclusion of a class statement, they can sometimes
be used interchangeably. Class decorators and metaclasses can both be used to manage
both class and instance objects, though each tool may present tradeoffs in some use
cases.
Since this chapter covered an advanced topic, we’ll work through just a few quiz ques-
tions to review the basics (candidly, if you’ve made it this far in a chapter on metaclasses,
you probably already deserve extra credit!). Because this is the last part of the book,
we’ll forgo the end-of-part exercises. Be sure to see the appendixes that follow for
Python changes, the solutions to the prior parts’ exercises, and more; the last of these
includes a sampling of typical application-level programs for self-study.
Once you finish the quiz, you’ve officially reached the end of this book’s technical
material. The next and final chapter offers some brief closing thoughts to wrap up the
book at large. I’ll see you there in the Python benediction after you work through this
final quiz.
Test Your Knowledge: Quiz
1. What is a metaclass?
2. How do you declare the metaclass of a class?
3. How do class decorators overlap with metaclasses for managing classes?
4. How do class decorators overlap with metaclasses for managing instances?
5. Would you rather count decorators or metaclasses amongst your weaponry? (And
please phrase your answer in terms of a popular Monty Python skit.)
Test Your Knowledge: Answers
1. A metaclass is a class used to create a class. Normal new-style classes are instances
of the type class by default. Metaclasses are usually subclasses of the type class,
which redefines class creation protocol methods in order to customize the class
creation call issued at the end of a class statement; they typically redefine the
methods __new__ and __init__ to tap into the class creation protocol. Metaclasses
can also be coded other ways—as simple functions, for example—but they are
always responsible for making and returning an object for the new class. Meta-
classes may have methods and data to provide behavior for their classes too—and
constitute a secondary pathway for inheritance search—but their attributes are
accessible only to their class instances, not to their instances' instances.
2. In Python 3.X, use a keyword argument in the class header line: class C(metaclass=M).
In Python 2.X, use a class attribute instead: __metaclass__ = M. In 3.X,
the class header line can also name normal superclasses before the metaclass key-
word argument; in 2.X you generally should derive from object too, though this
is sometimes optional.
3. Because both are automatically triggered at the end of a class statement, class
decorators and metaclasses can both be used to manage classes. Decorators rebind
a class name to a callable’s result and metaclasses route class creation through a
callable, but both hooks can be used for similar purposes. To manage classes,
decorators simply augment and return the original class objects. Metaclasses aug-
ment a class after they create it. Decorators may have a slight disadvantage in this
role if a new class must be defined, because the original class has already been
created.
4. Because both are automatically triggered at the end of a class statement, we can
use both class decorators and metaclasses to manage class instances, by inserting
a wrapper (proxy) object to catch instance creation calls. Decorators may rebind
the class name to a callable run on instance creation that retains the original class
object. Metaclasses can do the same, but may have a slight disadvantage in this
role, because they must also create the class object.
5. Our chief weapon is decorators...decorators and metaclasses...metaclasses and
decorators... Our two weapons are metaclasses and decorators...and ruthless effi-
ciency... Our three weapons are metaclasses, decorators, and ruthless effi-
ciency...and an almost fanatical devotion to Python... Our four...no... Amongst our
weapons... Amongst our weaponry...are such elements as metaclasses, decora-
tors... I’ll come in again...
CHAPTER 41
All Good Things
Welcome to the end of the book! Now that you’ve made it this far, I want to say a few
words in closing about Python’s evolution before turning you loose on the software
field. This topic is subjective by nature, of course, but vital to all Python users none-
theless.
You’ve now had a chance to see the entire language yourself—including some advanced
features that may seem at odds with its scripting paradigm. Though many will under-
standably accept this as status quo, in an open source project it’s crucial that some ask
the “why” questions too. Ultimately, the trajectory of the Python story—and its true
conclusion—is at least in part up to you.
The Python Paradox
If you’ve read this book, or reasonable subsets of it, you should now be able to weigh
Python’s tradeoffs fairly. As you’ve seen, Python is a powerful, expressive, and even
fun programming language, which will serve as an enabling technology for wherever
you choose to go next. At the same time, you’ve also seen that today’s Python is some-
thing of a paradox: it has expanded to incorporate tools that many consider both need-
lessly redundant and curiously advanced—and at a rate that appears to be only accel-
erating.
For my part, as one of Python’s earliest advocates, I’ve watched it morph over the years
from simple to sophisticated tool, with a steadily shifting scope. By most measures, it
seems to have grown at least as complex as other languages that drove many of us to
Python in the first place. And just as in those other languages, this has inevitably fos-
tered a growing culture in which obscurity is a badge of honor.
That’s as contrary to Python’s original goals as it could be. Run an import this in any
Python interactive session to see what I mean—the creed I’ve quoted from repeatedly
in this book in contexts where it was clearly violated. On many levels, its core ideals of
explicitness, simplicity, and lack of redundancy have been either naively forgotten or
carelessly abandoned.
The end result is a language and community that could in part be described today in
some of the same terms I used in the Perl sidebar of Chapter 1. While Python still has
much to offer, this trend threatens to negate much of its perceived advantage, as the
next section explains.
On “Optional” Language Features
I included a quote near the start of the prior chapter about metaclasses not being of
interest to 99% of Python programmers, to underscore their perceived obscurity. That
statement is not quite accurate, though, and not just numerically so. The quote’s author
is a noted Python contributor and friend from the early days of Python, and I don’t
mean to pick on anyone unfairly. Moreover, I’ve often made such statements about
language feature obscurity myself—in the various editions of this very book, in fact.
The problem, though, is that such statements really apply only to people who work
alone and only ever use code that they’ve written themselves. As soon as an “optional”
advanced language feature is used by anyone in an organization, it is no longer optional
—it is effectively imposed on everyone in the organization. The same holds true for
externally developed software you use in your systems—if the software’s author uses
an advanced or extraneous language feature, it’s no longer entirely optional for you,
because you have to understand the feature to reuse or change the code.
This observation applies to all the advanced topics covered in this book, including those
listed as “magic” hooks near the beginning of the prior chapter, and many others:
Generators, decorators, slots, properties, descriptors, metaclasses, context managers,
closures, super, namespace packages, Unicode, function annotations, relative imports,
keyword-only arguments, class and static methods, and even obscure applications of
comprehensions and operator overloading
If any person or program you need to work with uses such tools, they automatically
become part of your required knowledge base too.
To see just how daunting this can be, one need only consider Chapter 40’s new-style
inheritance procedure—a horrifically convoluted model that can make descriptors and
metaclasses prerequisite to understanding even basic name resolution. Chapter 32’s
super similarly ups the intellectual ante—imposing an obscenely implicit and artificial
MRO algorithm on readers of any code that uses this tool.
The net effect of such over-engineering is to either escalate learning requirements rad-
ically, or foster a user base that only partially understands the tools they employ. This
is obviously less than ideal for those hoping to use Python in simpler ways, and con-
tradictory to the scripting motif.
Against Disquieting Improvements
This observation also applies to the many redundant features we’ve seen, such as
Chapter 7’s str.format method and Chapter 34’s with statement—tools borrowed
from other languages, and overlapping with others long present in Python. When
programmers use multiple ways to achieve the same goal, all become required knowledge.
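To see what this means in practice, here is a quick sketch of my own showing the three
string formatting tools producing the same result; a reader must be able to recognize
all three on sight:

import string
name, rel = 'Python', 3.3
print('%s %s' % (name, rel))                                # Original expression form
print('{0} {1}'.format(name, rel))                          # Method form, added later
print(string.Template('$n $r').substitute(n=name, r=rel))   # Template class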
Let’s be honest: Python has grown rife with redundancy in recent years. As I suggested
in the preface—and as you’ve now seen first-hand—today’s Python world comes
replete with all the functional duplications and expansions chronicled in Table 41-1,
among others we’ve seen in this book.
Table 41-1. A sampling of redundancy and feature explosion in Python

Category                          Specifics
3 major paradigms                 Procedural, functional, object-oriented
2 incompatible lines              2.X and 3.X, with new-style classes in both
3 string formatting tools         % expression, str.format, string.Template
4 attribute accessor tools        __getattr__, __getattribute__, properties, descriptors
2 finalization statements         try/finally, with
4 varieties of comprehension      List, generator, set, dictionary
3 class augmentation tools        Function calls, decorators, metaclasses
4 kinds of methods                Instance, static, class, metaclass
2 attribute storage systems       Dictionaries, slots
4 flavors of imports              Module, package, package relative, namespace package
2 superclass dispatch protocols   Direct calls, super + MRO
5 assignment statement forms      Basic, multiname, augmented, sequence, starred
2 types of functions              Normal, generator
5 function argument forms         Basic, name=value, *pargs, **kargs, keyword-only
2 class behavior sources          Superclasses, metaclasses
4 state retention options         Classes, closures, function attributes, mutables
2 class models                    Classic + new-style in 2.X, mandated new-style in 3.X
2 Unicode models                  Optional in 2.X, mandated in 3.X
2 PyDoc modes                     GUI client, required all-browser in recent 3.X
2 byte code storage schemes       Original, __pycache__ only in recent 3.X
If you care about Python, you should take a moment to browse this table. It reflects a
virtual explosion in functionality and toolbox size—59 concepts that are all fair game
for newcomers. Most of its categories began with just one original member in Python;
many were expanded in part to imitate other languages; and only the last few can be
simplified by pretending that the latest Python is the only Python that matters to its
programmers.
I’ve stressed avoiding unwarranted complexity in this book, but in practice, both
advanced and new tools tend to encourage their own adoption—often for no better reason
than a programmer’s personal desire to demonstrate prowess. The net result is that
much Python code today is littered with these complex and extraneous tools. That is,
nothing is truly “optional” if nothing is truly optional.
Complexity Versus Power
This is why some Python old-timers (myself included) sometimes worry that Python
seems to have grown larger and more complex over time. New features added by
veterans, converts, and even amateurs may have raised the intellectual bar for newcomers.
Although Python’s core ideas, like dynamic typing and built-in types, have remained
essentially the same, its advanced additions can become required reading for any
Python programmer. I chose to cover these topics here for this reason, despite their
omission in early editions. It’s not possible to skip the advanced stuff if it’s in code you
have to understand.
On the other hand, as mentioned in Chapter 1, to most observers Python is still
noticeably simpler than most of its contemporaries, and perhaps only as complex as its
many roles require. Though it’s acquired many of the same tools as Java, C#, and
C++, they tend to be lighter weight in the context of a dynamically typed scripting
language. For all its growth over the years, Python is still relatively easy to learn and
use when compared to the alternatives, and new learners can often pick up advanced
topics as needed.
And frankly, application programmers tend to spend most of their time dealing with
libraries and extensions, not advanced and sometimes-arcane language features. For
instance, the book Programming Python—a follow-up to this one—deals mostly with
the marriage of Python to application libraries for tasks such as GUIs, databases, and
the Web, not with esoteric language tools (though Unicode still forces itself onto many
stages, and the odd generator expression and yield crop up along the way).
Moreover, the flipside of this growth is that Python has become more powerful. When
used well, tools like decorators and metaclasses are not only arguably “cool,” but allow
creative programmers to build more flexible and useful APIs for other programmers to
use. As we’ve seen, they can also provide good solutions to problems of encapsulation
and maintenance.
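For instance, a decorator can add behavior to every call of a function without changing
the function or its callers; the following is a minimal sketch of mine of the sort of API
flexibility meant here, not a recipe from earlier chapters:

def trace(func):                       # Wrap any function with call logging
    def wrapper(*args, **kwargs):
        print('calling %s' % func.__name__)
        return func(*args, **kwargs)
    return wrapper

@trace
def add(x, y):
    return x + y

print(add(2, 3))                       # Prints "calling add", then 5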
Simplicity Versus Elitism
Whether this justifies the potential expansion of required Python knowledge is up to
you to decide. For better or worse, a person’s skill level often decides this issue by
default—more advanced programmers like more advanced tools and tend to forget
about their impact on other camps. Fortunately, though, this isn’t an absolute; good
programmers also understand that simplicity is good engineering, and advanced tools
should be used only when warranted. This is true in any programming language, but
especially in one like Python that is frequently exposed to new or novice programmers
as an extension tool.
And if you’re still not buying this, keep in mind that many people using Python are not
comfortable with even basic OOP. Trust me on this; I’ve met thousands of them.
Although Python was never a trivial subject, the reports from the software trenches are
very clear on this point: unwarranted added complexity is never a welcome feature,
especially when it is driven by the personal preferences of an unrepresentative few.
Whether intended or not, this is often understandably perceived as elitism—a mindset
that is both unproductive and rude, and has no place in a tool as widely used as Python.
This is also a social issue, of course, and pertains as much to individual programmers
as to language designers. In the “real world” where open source software is measured,
though, Python-based systems that require their users to master the nuances of
metaclasses, descriptors, and the like should probably scale their market expectations
accordingly. Hopefully, if this book has done its job, you’ll find the importance of
simplicity in programming to be one of its most important and lasting takeaways.
Closing Thoughts
So there you have it—some observations from someone who has been using, teaching,
and advocating Python for two decades, and still wishes nothing but the best for its
future. None of these concerns are entirely new, of course. Indeed, the growth of this
very book over the years seems testament to the effect of Python’s own growth—if not
an ironic eulogy to its original conception as a tool that would simplify programming
and be accessible to both experts and nonspecialists alike. Judging by language heft
alone, that dream seems to have been either neglected or abandoned entirely.
That said, Python’s present rise in popularity seems to show no signs of abating—a
powerful counterargument to complexity concerns. Today’s Python world may be
understandably less concerned with its original and perhaps idealistic goals than with
applying its present form to the work at hand. Python gets many a job done in the practical
world of complex programming requirements, and this is still ample cause to
recommend it for many tasks. Original goals aside, mass appeal does qualify as one form of
success, though one whose significance will have to await the verdict of time.
If you’re interested in musing further over Python’s evolution and learning curve, I
wrote a more in-depth article in 2012 on such things: Answer Me These Questions
Three..., available online at http://learning-python.com/pyquestions3.html. These are
important pragmatic questions that are crucial to Python’s future, and deserve more
attention than I’ve given here. But these are highly subjective issues; this is not a
philosophy text; and this book has already exceeded its page-count targets.
More importantly, in an open source project like Python the answers to such questions
must be formed anew by each wave of newcomers. I hope the wave you ride in will
have as much common sense as fun while plotting Python’s future.
Where to Go From Here
And that’s a wrap, folks. You’ve officially reached the end of this book. Now that you
know Python inside and out, your next step, should you choose to take it, is to explore
the libraries, techniques, and tools available in the application domains in which you
work.
Because Python is so widely used, you’ll find ample resources for using it in almost any
application you can think of—from GUIs, the Web, and databases to numeric
programming, robotics, and system administration. See Chapter 1 and your favorite web
browser for pointers to popular tools and topics.
This is where Python starts to become truly fun, but this is also where this book’s story
ends, and others’ begin. For pointers on where to turn after this book, see the
recommended follow-up texts mentioned in the preface. I hope to see you in an applications
programming domain soon.
Good luck with your journey. And of course, “Always look on the bright side of Life!”
Encore: Print Your Own Completion Certificate!
And one last thing: in lieu of exercises for this part of the book, I’m going to post a
bonus script here for you to study and run on your own. I can’t provide completion
certificates for readers of this book (and the certificates would be worthless if I could),
but I can include an arguably cheesy Python script that does—the following file,
certificate.py, is a Python 2.X and 3.X script that creates a simple book completion
certificate in both text and HTML file forms, and pops them up in a web browser on
your machine by default.
#!/usr/bin/python
"""
File certificate.py: a Python 2.X and 3.X script.
Generate a bare-bones class completion certificate: printed,
and saved in text and html files displayed in a web browser.
"""
from __future__ import print_function      # 2.X compatibility
import time, sys, webbrowser

if sys.version_info[0] == 2:               # 2.X compatibility
    input = raw_input
    import cgi
    htmlescape = cgi.escape
else:
    import html
    htmlescape = html.escape

maxline = 60                               # For separator lines
browser = True                             # Display in a browser
saveto  = 'Certificate.txt'                # Output filenames

template = """
%s

            ===> Official Certificate <===

Date: %s

This certifies that:

\t%s

has survived the massive tome:

\t%s

and is now entitled to all privileges thereof, including
the right to proceed on to learning how to develop Web
sites, desktop GUIs, scientific models, and assorted apps,
with the possible assistance of follow-up applications
books such as Programming Python (shameless plug intended).

--Mark Lutz, Instructor

(Note: certificate void where obtained by skipping ahead.)

%s
"""

# Interact, setup
for c in 'Congratulations!'.upper():
    print(c, end=' ')
    sys.stdout.flush()                     # Else some shells wait for \n
    time.sleep(0.25)
print()

date = time.asctime()
name = input('Enter your name: ').strip() or 'An unknown reader'
sept = '*' * maxline
book = 'Learning Python 5th Edition'

# Make text file version
file = open(saveto, 'w')
text = template % (sept, date, name, book, sept)
print(text, file=file)
file.close()

# Make html file version
htmlto = saveto.replace('.txt', '.html')
file = open(htmlto, 'w')
tags = text.replace(sept, '<hr>')          # Insert a few tags
tags = tags.replace('===>', '<h1 align=center>')
tags = tags.replace('<===', '</h1>')
tags = tags.split('\n')                    # Line-by-line mods
tags = ['<p>' if line == ''
        else line for line in tags]
tags = ['<i>%s</i>' % htmlescape(line) if line[:1] == '\t'
        else line for line in tags]
tags = '\n'.join(tags)

link = '<i><a href="http://www.rmi.net/~lutz">Book support site</a></i>\n'
foot = '<table>\n<td><img src="ora-lp.jpg" hspace=5>\n<td>%s</table>\n' % link
tags = '<html><body bgcolor=beige>' + tags + foot + '</body></html>'
print(tags, file=file)
file.close()

# Display results
print('[File: %s]' % saveto, end='')
print('\n' * 2, open(saveto).read())
if browser:
    webbrowser.open(saveto, new=True)      # Text version, then HTML version
    webbrowser.open(htmlto, new=False)
if sys.platform.startswith('win'):
    input('[Press Enter]')                 # Keep window open if clicked on Windows
Run this script on your own, and study its code for a summary of some of the ideas
we’ve covered in this book. Fetch it from this book’s website described in the preface
if you wish. You won’t find any descriptors, decorators, metaclasses, or super calls in
this code, but it’s typical Python nonetheless.
When run, it generates the web page captured in the fully gratuitous Figure 41-1. This
could be much more grandiose, of course; see the Web for pointers to Python support
for PDFs and other document tools such as Sphinx surveyed in Chapter 15. But hey: if
you’ve made it to the end of this book, you deserve another joke or two...
Figure 41-1. Web page created and opened by certificate.py.
PART IX
Appendixes
APPENDIX A
Installation and Configuration
This appendix provides additional installation and configuration details as a resource
for people new to these topics. It’s located here because not all readers will need to deal
with these subjects up front. Because it covers some peripheral topics such as
environment variables and command-line arguments, though, this material probably
merits at least a quick scan for most readers.
Installing the Python Interpreter
Because you need the Python interpreter to run Python scripts, the first step in using
Python is usually installing Python. Unless one is already available on your machine,
you’ll need to fetch, install, and possibly configure a recent version of Python on your
computer. You’ll only need to do this once per machine, and if you will be running a
frozen binary (described in Chapter 2) or self-installing system, your setup tasks may
be trivial or null.
Is Python Already Present?
Before you do anything else, check whether you already have a recent Python on your
machine. If you are working on Linux, Mac OS X, or some Unix systems, Python is
probably already installed on your computer, though it may be one or two releases
behind the cutting edge. Here’s how to check:
- On Windows 7 and earlier, check whether there is a Python entry in the Start
  button’s All Programs menu (at the bottom left of the screen). On Windows 8, look
  for Python in a Start screen tile, your Search tool, the “All apps” display on your
  Start screen, or a File Explorer in desktop mode (more on Windows 8 in an
  upcoming sidebar).
- On Mac OS X, open a Terminal window (Applications→Utilities→Terminal) and
  type python at the prompt. Python, IDLE, and its tkinter GUI toolkit are standard
  components of this system.
- On Linux and Unix, type python at a shell prompt (a.k.a. terminal window), and
  see what happens. Alternatively, try searching for “python” in the usual places:
  /usr/bin, /usr/local/bin, etc. As on Macs, Python is a standard part of Linux
  systems.
If you find a Python, make sure it’s a recent version. Although any recent Python will
do for most of this text, this edition focuses on Python 3.3 and 2.7 specifically, so you
may want to install one of these to run some of the examples in this book.
Speaking of versions, per the preface, I recommend starting out with Python 3.3 or later
if you’re learning Python anew and don’t need to deal with existing 2.X code; otherwise,
you should generally use Python 2.7. Some popular Python-based systems still use older
releases, though (2.6 and even 2.5 are still widespread), so if you’re working with ex-
isting systems be sure to use a version relevant to your needs; the next section describes
locations where you can fetch a variety of Python versions.
Where to Get Python
If there is no Python on your machine, you will need to install one yourself. The good
news is that Python is an open source system that is freely available on the Web and
very easy to install on most platforms.
You can always fetch the latest and greatest standard Python release from
http://www.python.org, Python’s official website. Look for the Downloads link on that page, and
choose a release for the platform on which you will be working. You’ll find prebuilt
self-installer files for Windows (run to install), Installer Disk Images for Mac OS X
(installed per Mac conventions), the full source code distribution (typically compiled
on Linux, Unix, or OS X machines to generate an interpreter), and more.
Although Python is standard on Linux these days, you can also find RPMs for Linux
on the Web (unpack them with rpm). Python’s website also has links to pages where
versions for other platforms are maintained, either at Python.org (http://www.python.org)
itself or offsite. For example, you can find third-party Python installers for
Google’s Android, as well as apps to install Python on Apple’s iOS.
A Google web search is another great way to find Python installation packages. Among
other platforms, you can find Python prebuilt for iPods, Palm handhelds, Nokia cell
phones, PlayStation and PSP, Solaris, AS/400, and Windows Mobile, though some of
these are typically a few releases behind the curve.
If you find yourself pining for a Unix environment on a Windows machine, you might
also be interested in installing Cygwin and its version of Python (see
http://www.cygwin.com). Cygwin is a GPL-licensed library and toolset that provides full Unix functionality
on Windows machines, and it includes a prebuilt Python that makes use of all the Unix
tools provided.
You can also find Python on CD-ROMs supplied with Linux distributions, included
with some products and computer systems, and enclosed with some other Python
books. These tend to lag behind the current release somewhat, but usually not seriously
so.
In addition, you can find Python in some free and commercial development bundles.
At this writing, this alternative distributions category includes:
ActiveState ActivePython
    A package that combines Python with extensions for scientific, Windows, and
    other development needs, including PyWin32 and the PythonWin IDE
Enthought Python Distribution
    A combination of Python and a host of additional libraries and tools oriented
    toward scientific computing needs
Portable Python
    A blend of Python and add-on packages configured to run directly from a
    portable device
Pythonxy
    A scientific-oriented Python distribution based on Qt and Spyder
Conceptive Python SDK
    A bundle targeted at business, desktop, and database applications
PyIMSL Studio
    A commercial distribution for numerical analysis
Anaconda Python
    A distribution for analysis and visualization of large data sets
This set is prone to change, so search the Web for details on all of the above, and others.
Some of these are free, some are not, and some have both free and nonfree versions.
All combine the standard Python freely available at http://www.python.org with
additional tools, but can simplify install tasks for many.
Finally, if you are interested in alternative Python implementations, run a web search
to check out Jython (the Python port to the Java environment) and IronPython (Python
for the C#/.NET world), both of which are described in Chapter 2. Installation of these
systems is beyond the scope of this book.
Installation Steps
Once you’ve downloaded Python, you need to install it. Installation steps are very
platform-specific, but here are a few pointers for the major Python platforms (biased
in volume toward Windows, only because that is the platform where most Python
newcomers are likely to encounter the language first):
Windows
For Windows (including XP, Vista, 7, and 8), Python comes as a self-installer MSI
program file—simply double-click on its file icon, and answer Yes or Next at every
prompt to perform a default install. The default install includes Python’s
documentation set and support for tkinter (Tkinter in Python 2.X) GUIs, shelve
databases, and the IDLE development GUI. Python 3.3 and 2.7 are normally installed
in the directories C:\Python33 and C:\Python27, though this can be changed at
install time.
For convenience, on Windows 7 and earlier Python shows up after the install in
the Start button’s All Programs menu (see ahead for Windows 8 notes). Python’s
menu there has five entries that give quick access to common tasks: starting the
IDLE user interface, reading module documentation, starting an interactive
session, reading Python’s standard manuals, and uninstalling. Most of these options
involve concepts explored in detail elsewhere in this text.
When installed on Windows, Python also automatically uses filename
associations to register itself to be the program that opens Python files when their icons
are clicked (a program launch technique described in Chapter 3). It is also possible
to build Python from its source code on Windows, but this is not commonly done
so we’ll skip the details here (see python.org).
Three additional install-related notes for Windows users: first, be sure to see the
next appendix for an introduction to the new Windows launcher shipped with 3.3;
it changes some of the rules for installation, file associations, and command lines,
but can be an asset if you have multiple Python versions on your computer (e.g.,
both 2.X and 3.X). Per Appendix B, Python 3.3’s MSI installer also has an option
to set your PATH variable to include Python’s directory.
Second, Windows 8 users should see the sidebar “Using Python on Windows 8”
in this appendix. Standard Python installs and works the same on
Windows 8, where it runs in desktop mode, but you won’t get the Start button
menu described earlier, and the tablet interface on top is not yet directly supported.
Finally, some Windows Vista users may run into install issues related to security
features. This seems to have been resolved over time (and Vista is relatively rare
these days), but if running the MSI installer file directly doesn’t work as expected,
it’s probably because MSI files are not true executables and do not correctly inherit
administrator permissions (they run per the registry). To fix, run the installer from
a command line with appropriate permissions: Select Command Prompt, choose
“Run as administrator,” cd to the directory where your Python MSI file resides,
and run the MSI installer with a command line of the form: msiexec /i
python-2.5.1.msi.
Linux
For Linux, if Python or your desired flavor of it is not already present, you can
probably obtain it as one or more RPM files, which you unpack in the usual way
(consult the RPM manpage for details). Depending on which RPMs you download,
there may be one for Python itself, and another that adds support for tkinter GUIs
and the IDLE environment. Because Linux is a Unix-like system, the next
paragraph applies as well.
Unix
For Unix systems, Python is usually compiled from its full C source code
distribution. This usually only requires you to unpack the file and run simple config
and make commands; Python configures its own build procedure automatically,
according to the system on which it is being compiled. However, be sure to see the
package’s README file for more details on this process. Because Python is open
source, its source code may be used and distributed free of charge.
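In outline, the canonical steps look like the following; this is a sketch only, with an
illustrative version number, so consult the distribution's README for the specifics of
your platform and release:

% tar -xf Python-3.3.0.tgz             # Unpack the source archive (version illustrative)
% cd Python-3.3.0
% ./configure                          # Configure the build for this machine
% make                                 # Compile the interpreter
% make install                         # Optional: install system-wide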
On other platforms the installation details can differ widely, but they generally follow
the platform’s normal conventions. For example, installing the “Pippy” port of Python
for PalmOS required a hotsync operation with your PDA, and Python for the Sharp
Zaurus Linux-based PDA was one or more .ipk files, which you simply ran to install
(these likely still work, though finding the devices today may be a logistical challenge!).
More recently, Python can be installed and used on Android and iOS platforms too,
but installation and usage techniques are too platform-specific to cover here. For
additional install procedures and the latest on available ports, try both Python’s website
and a web search.
Using Python on Windows 8
Windows 8 was released as this edition was being written. As mentioned in the preface,
this book was developed on both Windows 7 and 8, but mostly under Windows 7
because the choice is irrelevant to almost everything in this book—both Python 2.X
and 3.X presently work only in desktop mode on Windows 8, but install and run there
the same as in Windows 7, Vista, XP, and others. Once you navigate past the
tablet-like layer at the top, usage is almost entirely as before.
The only notable exception to this is Windows 8’s lack of a Start button menu in desktop
mode. You don’t get the nice menu of Python options automatically, though you can
simulate it manually. Although this story is prone to change (and you should take this
sidebar as an early report), here are a few Windows 8 usage notes.
At this writing, the standard Python Windows MSI installer program installs Python
on Windows 8 correctly, and exactly as in the past: you get the same filename
associations for icon clicks, access from command lines, and so on. The installer also creates
a Start screen button on Windows 8, but Python itself runs in Windows 8’s desktop
mode, which is essentially the same as Windows 7 without a Start button menu. For
example, the Windows 8 Start screen button created by the Python install simply
switches control to desktop mode to open a Python interactive shell.
The upside to this is that all existing Python software works on Windows 8’s desktop
just as before. One downside is that you’ll need to create shortcuts for the user-friendly
Start button menu items created automatically on former Windows versions. This
includes the former menu’s links to the IDLE GUI, PyDoc, Python’s command-line
interface, and Python’s manuals set.
This isn’t a showstopper—you can emulate the former Start button menu’s items with
either tiles on the Start screen or shortcuts on the desktop taskbar. To do so, you might
look up these tools in a variety of ways:
- By navigating to their corresponding filename in a File Explorer, opened by
  right-clicking the screen’s lower-left corner.
- By searching for their name in the Search “charm,” opened by pulling down the
  screen’s top-right corner.
- By finding their entry after right-clicking on the Start screen to open the All apps
  display, which is reminiscent of the former Start button menus.
- By locating their tiles on your Start screen, if they have any.
For example, you can locate IDLE by navigating to the file idle.py in C:\Python33\Lib,
by searching on “idle,” by finding IDLE in “All apps,” or by clicking a Start screen tile
if one exists. You can find Python itself in the same ways (and probably others). This
isn’t quite as nice as the original Start button menus out of the box, but it suffices.
Probably the bigger potential downside on Windows 8 is that while Python runs fine in
desktop mode, it doesn’t yet have an official port to run as a Start screen style “app.”
That is, standard Python does not yet run programs in the WinRT (formerly known as
Metro) environment—the tile-based media consumption layer that appears first when
you start Windows 8, and before you can click your way to the desktop. This may be
a temporary state, though, as a number of options either already exist or are being
actively explored.
On one front, it’s not impossible that Python’s installer may be enhanced for Windows
8’s nondesktop mode. There has already been work on porting Python to run as a Start
screen “app,” though this may appear as a separate installer package due to differences
in the underlying libraries (in short, WinRT runs programs in a classic “sandbox”
model, with a restricted subset of the libraries available normally).
On other fronts, the C#/.NET-based IronPython system may offer additional Windows
8 “app” development options, and some of Python’s major GUI toolkits such as tkinter,
wxPython, and PyQt could eventually provide portability to the Windows 8 “apps”
environment as well. The Qt library underlying the latter of these seems to have already
shown some progress in this department.
For now, existing Python software runs fine in Windows 8’s desktop mode unchanged.
Developing or running Python code in the Start screen “apps” environment will likely
require special handling and platform-specific APIs not unlike those required to run
Python on other tablet- and phone-oriented platforms based on Google’s Android and
Apple’s iOS (iPhone and iPad) operating systems.
Also note that much of this sidebar applies to Windows 8, but not Windows RT. The
latter does not run third-party desktop mode applications directly, and may need to
await a sanctioned Python installer that supports the WinRT “app” API in general.
Then again, the Windows 8 story remains to be told. Be sure to watch for developments
in both Windows and Python’s installer for it. For now, a simple tile click or Windows-
key press to hop into desktop mode will allow most Python programmers on Windows
to safely ignore the tablet-like interface on top—at least until “apps” trounce
“programs” altogether.1

1. Lest that seem too sarcastic, I should note that Windows 8.1 may address some launch
screen and Start button (if not menu) concerns per late-breaking rumors, and this edition's
new Windows 8 sidebar replaces one in prior editions that discussed a Windows Vista
issue. Any similarities you might deduce from that are officially coincidental.
Configuring Python
After you’ve installed Python, you may want to configure some system settings that
impact the way Python runs your code. (If you are just getting started with the language,
you can probably skip this section completely; there is usually no need to specify any
system settings for basic programs.)
Generally speaking, parts of the Python interpreter’s behavior can be configured with
environment variable settings and command-line options. In this section, we’ll take a
brief look at both, but be sure to see other documentation sources for more details on
the topics we introduce here.
Python Environment Variables
Environment variables—known to some as shell variables, or DOS variables—are
system-wide settings that live outside Python and thus can be used to customize the
interpreter’s behavior each time it is run on a given computer. Python recognizes a
handful of environment variable settings, but only a few are used often enough to
warrant explanation here. Table A-1 summarizes the main Python-related environment
variable settings (you’ll find information on others in Python reference resources).
Table A-1. Important environment variables

Variable                            Role
PATH (or path)                      System shell search path (for finding “python”)
PYTHONPATH                          Python module search path (for imports)
PYTHONSTARTUP                       Path to Python interactive startup file
TCL_LIBRARY, TK_LIBRARY             GUI extension variables (tkinter)
PY_PYTHON, PY_PYTHON3, PY_PYTHON2   Windows launcher defaults (see Appendix B)
These variables are straightforward to use, but here are a few pointers:
PATH
The PATH setting lists a set of directories that the operating system searches for
executable programs, when they are invoked without a full directory path. It should
normally include the directory where your Python interpreter lives (the python
program on Unix, or the python.exe file on Windows).
You don’t need to set this variable at all if you are willing to work in the directory
where Python resides, or type the full path to Python in command lines. On Windows,
for instance, the PATH is irrelevant if you run a cd C:\Python33 before running any
code (to change to the directory where Python lives—though you shouldn’t generally
store your own code in this directory per Chapter 3), or always type
C:\Python33\python instead of just python (giving a full path).
Also note that PATH settings are mostly for launching programs from command
lines; they are usually irrelevant when launching via icon clicks and IDEs—the
former uses filename associations, and the latter uses built-in mechanisms; neither
generally requires this configuration step. See also Appendix B for details
on 3.3’s automatic PATH setting option at install time.
PYTHONPATH
The PYTHONPATH setting serves a role similar to PATH: the Python interpreter consults
the PYTHONPATH variable to locate module files when you import them in a program.
If used, this variable is set to a platform-dependent list of directory names,
separated by colons on Unix and semicolons on Windows. This list normally includes
just your own source code directories. Its content is merged into the sys.path
module import search path, along with the script’s container directory, any .pth
path file settings, and standard library directories.
You don’t need to set this variable unless you will be performing cross-directory
imports—because Python always searches the home directory of the program’s top-
level file automatically, this setting is required only if a module needs to import
another module that lives in a different directory. See also the discussion of .pth
path files later in this appendix for an alternative to PYTHONPATH. For more on the
module search path, refer to Chapter 22.
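If you're ever unsure what your setting produced, the merged result is easy to
inspect; this one-liner is a quick sketch using the -c argument form described
later in this appendix:

C:\code> python -c "import sys; print(sys.path)"    # Script dir, PYTHONPATH, stdlib, ...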
PYTHONSTARTUP
If PYTHONSTARTUP is set to the pathname of a file of Python code, Python executes
the file’s code automatically whenever you start the interactive interpreter, as
though you had typed it at the interactive command line. This is a rarely used but
handy way to make sure you always load certain utilities when working
interactively; it saves an import each time you start a Python session.
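For example, a hypothetical startup file might preload tools you commonly use at
the interactive prompt; the filename and contents here are illustrative only:

# File C:\code\startup.py (hypothetical): runs when PYTHONSTARTUP points here
import os, sys, pprint                 # Preload common interactive utilities
print('Loaded utilities for Python %s' % sys.version.split()[0])

With PYTHONSTARTUP set to this file's path, every interactive session begins with
os, sys, and pprint already imported.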
tkinter settings
If you wish to use the tkinter GUI toolkit (named Tkinter in 2.X), you might have
to set the two GUI variables in Table A-1 to the names of the source
library directories of the Tcl and Tk systems (much like PYTHONPATH). However,
these settings are not required on Windows systems (where tkinter support is
installed alongside Python), and are usually not required on Mac OS X and Linux
systems, unless the underlying Tcl and Tk libraries are either invalid or reside in
nonstandard directories (see python.org’s Download page for more details).
PY_PYTHON, PY_PYTHON3, PY_PYTHON2
These settings are used to specify default Pythons when you are using the new (at
this writing) Windows launcher that ships with Python 3.3 and is available
separately for other versions. Since we’ll be exploring the launcher in Appendix B, I’ll
postpone further details here.
Note that because these environment settings are external to Python itself, when you
set them is usually irrelevant: this can be done before or after Python is installed, as
long as they are set the way you require before Python is actually run—be sure to restart
your Python IDEs and interactive sessions after making such changes if you want them
to apply.
tkinter and IDLE GUIs on Linux and Macs
The IDLE interface described in Chapter 3 is a Python tkinter GUI program. The
tkinter module (named Tkinter in 2.X) is a GUI toolkit that is automatically installed
with standard Python on Windows, and is an inherent part of Mac OS X and most
Linux installations.
On some Linux systems, though, the underlying GUI library may not be a standard
installed component. To add GUI support to your Python on Linux if needed, try
running a command line of the form yum tkinter to automatically install tkinter’s
underlying libraries. This should work on Linux distributions (and some other systems)
on which the yum installation program is available; for others, see your platform’s
installation documentation.
As also discussed in Chapter 3, on Mac OS X IDLE probably lives in the MacPython
(or Python N.M) folder of your Applications folder (along with PythonLauncher, used
for starting programs with clicks in Finder), but be sure to see the Download page at
python.org if IDLE has problems; you may need to install an update on some OS X
versions (see Chapter 3).
How to Set Configuration Options
The way to set Python-related environment variables, and what to set them to, depends
on the type of computer you’re working on. And again, remember that you won’t
necessarily have to set these at all right away; especially if you’re working in IDLE
(described in Chapter 3) and save all your files in the same directory, configuration is
probably not required up front.
But suppose, for illustration, that you have generally useful module files in directories
called utilities and package1 somewhere on your machine, and you want to be able to
import these modules from files located in other directories. That is, to load a file called
spam.py in either the utilities or package1 directories, you want to be able to say this in
another file in another directory:
import spam
To make this work, you’ll have to configure your module search path one way or
another to include the directory containing spam.py. Here are a few tips on this process
using PYTHONPATH as an example; do the same for other settings like PATH as needed
(though 3.3 can set PATH automatically: see Appendix B).
Unix/Linux shell variables
On Unix systems, the way to set environment variables depends on the shell you use.
Under the csh shell, you might add a line like the following in your .cshrc or .login file
to set the Python module search path:
setenv PYTHONPATH /usr/home/pycode/utilities:/usr/lib/pycode/package1
This tells Python to look for imported modules in two user-defined directories. Alter-
natively, if you’re using the ksh shell, the setting might instead appear in your .kshrc
file and look like this:
export PYTHONPATH="/usr/home/pycode/utilities:/usr/lib/pycode/package1"
Other shells may use different (but analogous) syntax.
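For instance, under the widely used bash shell, the ksh-style export form shown
above typically works as is, placed in your ~/.bashrc file; verify the details for
your shell and distribution:

export PYTHONPATH="/usr/home/pycode/utilities:/usr/lib/pycode/package1"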
DOS variables (and older Windows)
If you are using MS-DOS or some now fairly old flavors of Windows, you may need to
add an environment variable configuration command to your C:\autoexec.bat file, and
reboot your machine for the changes to take effect. The configuration command on
such machines has a syntax unique to DOS:
set PYTHONPATH=c:\pycode\utilities;d:\pycode\package1
You can type such a command in a DOS console window, too, but the setting will then
be active only for that one console window. Changing your .bat file makes the change
permanent and global to all programs, though this technique has been superseded in
recent years by that described in the next section.
Windows environment variable GUI
On all recent versions of Windows (including XP, Vista, 7, and 8), you can instead set
PYTHONPATH and other variables via the system environment variable GUI without having
to edit files, type command lines, or reboot. Select the Control Panel (in your Start
button in Windows 7 and earlier, and in the desktop mode’s Settings “charm” on
Windows 8), choose the System icon, pick the Advanced settings tab or link, and click
the Environment Variables button at the bottom to edit or add new variables
(PYTHONPATH is usually a new user variable). Use the same variable name and values syntax
shown in the DOS set command in the preceding section. On Vista you may have to
verify operations along the way.
You do not need to reboot your machine after this, but be sure to restart Python if it’s
open so that it picks up your changes—it configures its import search path at startup
time only. If you’re working in a Windows Command Prompt window, you’ll probably
need to restart that to pick up your changes as well.
Windows registry
If you are an experienced Windows user, you may also be able to configure the module
search path by using the Windows Registry Editor. To open this tool, type regedit in
the Start→Run... interface on some Windows versions, in the search field at the bottom of the
Start button display on Windows 7, and in a Command Prompt window on Windows
8 and others (among other routes). Assuming the typical registry tool is available on
your machine, you can then navigate to Python’s entries and make your changes. This
is a delicate and error-prone procedure, though, so unless you’re familiar with the
registry, I suggest using other options (indeed, this is akin to performing brain surgery
on your computer, so be careful!).
Path files
Finally, if you choose to extend the module search path with a .pth path file instead of
the PYTHONPATH variable, you might instead code a text file that looks like the following
on Windows (e.g., file C:\Python33\mypath.pth):
c:\pycode\utilities
d:\pycode\package1
Its contents will differ per platform, and its container directory may differ per both
platform and Python release. Python locates this file automatically when it starts up.
Directory names in path files may be absolute, or relative to the directory containing
the path file; multiple .pth files can be used (all their directories are added), and .pth
files may appear in various automatically checked directories that are platform- and
version-specific. In general, a Python release numbered Python N.M typically looks for
path files in C:\PythonNM and C:\PythonNM\Lib\site-packages on Windows, and in
/usr/local/lib/pythonN.M/site-packages and /usr/local/lib/site-python on Unix and Linux.
See Chapter 22 for more on using path files to configure the sys.path import search
path.
Because environment settings are often optional, and because this isn’t a book on
operating system shells, I’ll defer to other sources for further details. Consult your system
shell’s manpages or other documentation for more information, and if you have trouble
figuring out what your settings should be, ask your system administrator or another
local expert for help.
Python Command-Line Arguments
When you start Python from a system command line (a.k.a. a shell prompt, or
Command Prompt window), you can pass in a variety of option flags to control how Python
runs your code. Unlike the system-wide environment variables of the prior section,
command-line arguments can be different each time you run a script. The complete
form of a Python command-line invocation in 3.3 looks like this (2.7 is roughly the
same, with a few differences described ahead):
python [-bBdEhiOqsSuvVWxX] [-c command | -m module-name | script | - ] [args]
The rest of this section briefly demonstrates some of Python’s most commonly used
arguments. For more details on available command-line options not covered here, see
the Python manuals or reference texts. Or better yet, ask Python itself—run a command
line like this:
C:\code> python -h
to request Python’s help display, which documents all available command-line options.
If you deal with complex command lines, be sure to also check out the standard library
modules in this domain: the original getopt, the newer argparse, and the now-deprecated
(since 3.2) optparse, which support more sophisticated command-line processing. Also
see Python’s library manuals and other references for more on the pdb and
profile modules the following tour deploys.
Running script files with arguments
Most command lines make use of only the script and args parts of the last section’s
Python command-line format, to run a program’s source file with arguments to be used
by the program itself. To illustrate, consider the following script—a text file named
showargs.py, created in directory C:\code or another of your choosing—which prints
the command-line arguments made available to the script as sys.argv, a Python list of
Python strings (if you don’t yet know how to create or run Python script files, see the
full coverage in Chapter 2 and Chapter 3; we’re interested only in command-line
arguments here):
# File showargs.py
import sys
print(sys.argv)
In the following command line, both python and showargs.py can also be complete
directory paths—the former is assumed to be on your PATH here, and the latter is
assumed to be in the current directory. The three arguments (a b -c) meant for the script
show up in the sys.argv list and can be inspected by your script’s code there; the first
item in sys.argv is always the script file’s name, when it is known:
C:\code> python showargs.py a b -c # Most common: run a script file
['showargs.py', 'a', 'b', '-c']
As covered elsewhere in this book, Python lists print in square brackets and strings
display in quotes.
Running code given in arguments and standard input
Other code format specification options allow you to give Python code to be run on
the command line itself (-c), and accept code to run from the standard input stream
(a “-” means read from a pipe or redirected input stream file, terms defined in full
elsewhere in this text):
C:\code> python -c "print(2 ** 100)" # Read code from command argument
1267650600228229401496703205376
C:\code> python -c "import showargs" # Import a file to run its code
['-c']
C:\code> python - < showargs.py a b -c # Read code from standard input
['-', 'a', 'b', '-c']
C:\code> python - a b -c < showargs.py # Same effect as prior line
['-', 'a', 'b', '-c']
Running modules on the search path
The -m code specification locates a module on Python’s module search path and then
runs it as a top-level script (as module __main__). That is, it looks up a script the same
way import operations do, using the directory list normally known as sys.path, which
includes the current directory, PYTHONPATH settings, and standard libraries. Leave off the
“.py” suffix here, as the filename is treated as a module.
C:\code> python -m showargs a b -c # Locate/run module as script
['c:\\code\\showargs.py', 'a', 'b', '-c']
The -m option also supports running tools, modules in packages with and without
relative import syntax, and modules located in .zip archives. For instance, this switch
is commonly used to run the pdb debugger and profile profiler modules from a
command line for a script invocation, rather than interactively:
C:\code> python # Interactive debugger session
>>> import pdb
>>> pdb.run('import showargs')
...more omitted: see pdb docs
C:\code> python -m pdb showargs.py a b -c # Debugging a script (c=continue)
> C:\code\showargs.py(2)<module>()
-> import sys
(Pdb) c
['showargs.py', 'a', 'b', '-c']
...more omitted: q to exit
The profiler runs and times your code; its output can vary per Python, operating system,
and computer:
C:\code> python -m profile showargs.py a b -c # Profiling a script
['showargs.py', 'a', 'b', '-c']
9 function calls in 0.016 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
2 0.000 0.000 0.000 0.000 :0(charmap_encode)
1 0.000 0.000 0.000 0.000 :0(exec)
...more omitted: see profile docs
You might also use the -m switch to spawn Chapter 3’s IDLE GUI program located in
the standard library from any other directory, and to start the pydoc and timeit tools
modules with command lines as we do in this book in Chapter 15 and Chapter 21 (see
those chapters for more details on the tools launched here):
c:\code> python -m idlelib.idle -n # Run IDLE in package, no subprocess
c:\code> python -m pydoc -b # Run pydoc and timeit tools modules
c:\code> python -m timeit -n 1000 -r 3 -s "L = [1,2,3,4,5]" "M = [x + 1 for x in L]"
Optimized and unbuffered modes
Immediately after the “python” and before the designation of code to be run, Python
accepts additional arguments that control its own behavior. These arguments are
consumed by Python itself and are not meant for the script being run. For example, -O runs
Python in optimized mode and -u forces standard streams to be unbuffered—with the
latter, any printed text will be finalized immediately, and won’t be delayed in a buffer:
C:\code> python -O showargs.py a b -c # Optimized: make/run ".pyo" byte code
C:\code> python -u showargs.py a b -c # Unbuffered standard output stream
Post-run interactive mode
Finally, the -i flag enters interactive mode after running a script—especially useful as
a debugging tool, because you can print variables’ final values after a successful run to
get more details:
C:\code> python -i showargs.py a b -c # Go to interactive mode on script exit
['showargs.py', 'a', 'b', '-c']
>>> sys # Final value of sys: imported module
<module 'sys' (built-in)>
>>> ^Z
You can also print variables this way after an exception shuts down your script to see
what they looked like when the exception occurred, even if not running in debug
mode—though you can start the debugger’s postmortem tool here as well (type is the
Windows file display command; try cat or an equivalent elsewhere):
C:\code> type divbad.py
X = 0
print(1 / X)
C:\code> python divbad.py # Run the buggy script
...error text omitted
ZeroDivisionError: division by zero
C:\code> python -i divbad.py # Print variable values at error
...error text omitted
ZeroDivisionError: division by zero
>>> X
0
>>> import pdb # Start full debugger session now
>>> pdb.pm()
> C:\code\divbad.py(2)<module>()
-> print(1 / X)
(Pdb) quit
Python 2.X command-line arguments
Besides those just mentioned, Python 2.7 supports additional options that promote 3.X
compatibility (-3 to warn about incompatibilities, and -Q to control division operator
models), as well as an option for detecting inconsistent tab indentation usage, which is
always detected and reported in 3.X (-t; see Chapter 12). Again, you can always ask Python 2.X itself for
more on the subject as needed:
C:\code> c:\python27\python -h
Python 3.3 Windows Launcher Command Lines
Technically, the preceding section described the arguments you can pass to the Python
interpreter itself—the program usually named python.exe on Windows, and python on
Linux (the .exe suffix is normally omitted in command lines on Windows). As we’ll see in the next appendix,
the Windows launcher shipped with Python 3.3 augments this story for users of 3.3
and later or the standalone launcher package. It adds new executables that accept
Python version numbers as arguments in command lines used to start Python and your
scripts (file what.py is listed and described in the next appendix, and simply prints the
Python version number):
C:\code> py what.py # Windows launcher command lines
3.3.0
C:\code> py -2 what.py # Version number switch
2.7.3
C:\code> py -3.3 -i what.py -a -b -c # Arguments for all 3: py, python, script
3.3.0
>>> ^Z
In fact, as the last run of the preceding example shows, command lines using the
launcher can give arguments for the launcher itself (-3.3), Python itself (-i), and your
script (-a, -b, and -c). The launcher can also parse version numbers out of #! Unix lines
at the top of script files instead. Because the next appendix is devoted to this launcher
entirely, though, you’ll have to read on for the rest of this story.
For More Help
Python’s standard manual set today includes valuable pointers for usage on various
platforms. The standard manual set is available in your Start button on Windows 7 and
earlier after Python is installed (option “Python Manuals”), and online at
http://www.python.org. Look for the manual set’s top-level section titled “Using Python” for more
platform-specific pointers and hints, as well as up-to-date cross-platform environment
and command-line details.
As always, the Web is your ally, too, especially in a field that often evolves faster than
books like this can be updated. Given Python’s widespread adoption, chances are good
that answers to any high-level usage questions you may have can be found with a web
search.
APPENDIX B
The Python 3.3 Windows Launcher
This appendix describes the new Windows launcher for Python, installed with Python
3.3 automatically, and available separately on the Web for use with older versions.
Though the new launcher comes with some pitfalls, it provides some much-needed
coherence for program execution when multiple Pythons coexist on the same
computer.
I’ve written this appendix for programmers using Python on Windows. Though it is
platform-specific by nature, it’s targeted at both Python beginners (most of whom get
started on this platform), as well as Python developers who write code to work portably
between Windows and Unix. As we will see, the new launcher changes the rules on
Windows radically enough to impact everyone who uses Python on Windows, or may
in the future.
The Unix Legacy
To fully understand the launcher’s protocols, we have to begin with a short history
lesson. Unix developers long ago devised a protocol for designating a program to run
a script’s code. On Unix systems (including Linux and Mac OS X), the first line in a
script’s text file is special if it begins with a two-character sequence: #!, sometimes
called a shebang (an arguably silly phrase I promise not to repeat from here on).
Chapter 3 gives a brief overview of this topic, but here’s another look. In Unix scripts,
such lines designate a program to run the rest of the script’s contents, by coding it after
the #!—using either the directory path to the desired program itself, or an invocation
of the env Unix utility that looks up the target per your PATH setting, the customizable
system environment variable that lists directories to be searched for executables:
#!/usr/local/bin/python
...script's code # Run under this specific program
#!/usr/bin/env python
...script's code # Run under "python" found on PATH
By making such a script executable (e.g., via chmod +x script.py), you can run it by
giving just its filename in a command line; the #! line at the top then directs the Unix
shell to a program that will run the rest of the file’s code. Depending on the platform’s
install structure, the python that these #! lines name might be a real executable, or a
symbolic link to a version-specific executable located elsewhere. These lines might also
name a more specific executable explicitly, such as python3. Either way, by changing
#! lines, symbolic links, or PATH settings, Unix developers can route a script to the
appropriate installed Python.
None of this applies to Windows itself, of course, where #! lines have no inherent
meaning. Python itself has historically ignored such lines as comments if present on
Windows (“#” starts a comment in the language). Still, the idea of selecting Python
executables on a per-file basis is a compelling feature in a world where Python 2.X and
3.X often coexist on the same machine. Given that many programmers coded #! lines
for portability to Unix anyhow, the idea seemed ripe for emulating.
The Windows Legacy
The install model has been very different on the other side of the fence. In the past (well,
in every Python until 3.3), the Windows installer updated the global Windows registry
such that the latest Python version installed on your computer was the version that
opened Python files when they were clicked or run by direct filename in command lines.
Some Windows users may know this registry as filename associations, configurable in
Control Panel’s Default Programs dialog. You do not need to give files executable
privileges for this to work, as you do for Unix scripts. In fact, there’s no such concept on
Windows—filename associations and commands suffice to launch files as programs.
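If you're curious, you can inspect these associations yourself in a Command Prompt
window with Windows' built-in assoc and ftype commands; the output here is a
sketch, as the registered type names and commands vary per install:

C:\code> assoc .py                     # Which file type .py maps to
.py=Python.File
C:\code> ftype Python.File             # Which command opens that type
Python.File="C:\Python33\python.exe" "%1" %*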
Under this install model, if you wished to open a file with a different version than the
latest install, you had to run a command line giving the full path to the Python you
wanted, or update your filename associations manually to use the desired version. You
could also point generic python command lines to a specific Python by setting or chang-
ing your PATH setting, but Python didn’t set this for you, and this wouldn’t apply to
scripts launched by icon clicks and other contexts.
This reflects the natural order on Windows (when you click on a .doc file, Windows
usually opens it in the latest Word installed), and has been the state of things ever since
there was a Python on Windows. It’s less ideal if you have Python scripts that require
different versions on the same machine, though—a situation that has become increas-
ingly common, and perhaps even normal in the dual Python 2.X/3.X era. Running
multiple Pythons on Windows prior to 3.3 can be tedious for developers, and discour-
aging for newcomers.
Introducing the New Windows Launcher
The new Windows launcher, shipped and installed automatically with Python 3.3 (and
presumably later), and available as a standalone package for use with other versions,
addresses these deficits in the former install model by providing two new executables:
py.exe for console programs
pyw.exe for nonconsole (typically GUI) programs
These two programs are registered to open .py and .pyw files, respectively, via Windows
filename associations. Like Python’s original python.exe main program (which they do
not deprecate but can largely subsume), these new executables are also registered to
open byte code files launched directly. Amongst their weapons, these two new execut-
ables:
Automatically open Python source and byte-code files launched by icon clicks or
filename commands, via Windows associations
Are normally installed on your system search path and do not require a directory
path or PATH settings when used as command lines
Allow Python version numbers to be passed in easily as command-line arguments,
when starting both scripts and interactive sessions
Attempt to parse Unix-style #! comment lines at the top of scripts to determine
which Python version should be used to run a file’s code
The net effect is that under the new launcher, when multiple Pythons are installed on
Windows, you are no longer limited to either the latest version installed or explicit/full
command lines. Instead, you can now select versions explicitly on both a per-file and
per-command basis, and specify versions in either partial or full form in both contexts.
Here’s how this works:
1. To select versions per file, use Unix-style top-of-script comments like these:
#!python2
#!/usr/bin/python2.7
#!/usr/bin/env python3
2. To select versions per command, use command lines of the following forms:
py -2 m.py
py -2.7 m.py
py -3 m.py
For example, the first of these techniques can serve as a sort of directive to declare which
Python version the script depends upon, and will be applied by the launcher whenever
the script is run by command line or icon click (these are variants of a file named
script.py):
#!python3
...
...a 3.X script # Runs under latest 3.X installed
...
#!python2
...
...a 2.X script # Runs under latest 2.X installed
...
#!python2.6
...
...a 2.6 script # Runs under 2.6 (only)
...
On Windows, command lines are typed in a Command Prompt window, designated
by its C:\code> prompt in this appendix. The first of the following is the same as both
the second and an icon click, because of filename associations:
C:\code> script.py # Run per file's #! line if present, else per default
C:\code> py script.py # Ditto, but py.exe is run explicitly
Alternatively, the second technique just listed can select versions with argument
switches in command lines instead:
C:\code> py -3 script.py # Runs under latest 3.X
C:\code> py -2 script.py # Runs under latest 2.X
C:\code> py -2.6 script.py # Runs under 2.6 (only)
This works both when launching scripts and starting the interactive interpreter (when
no script is named):
C:\code> py -3 # Starts latest 3.X, interactive
C:\code> py -2 # Starts latest 2.X, interactive
C:\code> py -3.1 # Starts 3.1 (only), interactive
C:\code> py # Starts default Python (initially 2.X: see ahead)
If there are both #! lines in the file and a version number switch in the command line
used to start it, the command line’s version overrides that in the file’s directive:
#! python3.2
...
...a 3.X script
...
C:\code> py script.py # Runs under 3.2, per file directive
C:\code> py -3.1 script.py # Runs under 3.1, even if 3.2 present
The launcher also applies heuristics to select a specific Python version when it is missing
or only partly described. For instance, the latest 2.X is run when only a 2 is specified,
and a 2.X is preferred for files that do not name a version in a #! line when launched
by icon click or generic command lines (e.g., py m.py, m.py), unless you configure the
default to use 3.X instead by setting PY_PYTHON or a configuration file entry (more on
this ahead).
Especially in the current dual 2.X/3.X Python world, explicit version selection seems a
useful addition for Windows, where many (and probably most) newcomers get their
first exposure to the language. Although it is not without potential pitfalls—including
failures on unrecognized Unix #! lines and a puzzling 2.X default—it does allow for a
more graceful coexistence of 2.X and 3.X files on the same machine, and provides a
rational approach to version control in command lines.
For the complete story on the Windows launcher, including more advanced features
and use cases I’ll either condense or largely omit here, see Python’s release notes and
try a web search to find the PEP (the proposal document). Among other things, the
launcher also allows selecting between 32- and 64-bit installs, specifying defaults in
configuration files, and defining custom #! command string expansion.
A Windows Launcher Tutorial
Some readers familiar with Unix scripting may find the prior section enough to get
started. For others, this section provides additional context in the form of a tutorial,
which gives concrete examples of the launcher in action for you to trace through. This
section also discloses additional launcher details along the way, though, so even well-
seasoned Unix veterans may benefit from a quick scan here before FTPing all their
Python scripts to the local Windows box.
To get started, we’ll be using the following simple script, what.py, which can be run
under both 2.X and 3.X to echo the version number of the Python that runs its code.
It uses sys.version—a string whose first component after splitting on whitespace is
Python’s version number:
#!python3
import sys
print(sys.version.split()[0]) # First part of string
If you want to work along, type this script’s code in your favorite text file editor, open
a Command Prompt window for typing the command lines we’ll be running, and cd to
the directory where you've saved the script (C:\code is where I'm working, but feel free
to save this wherever you wish, and see Chapter 3 for more Windows usage pointers).
This script’s first-line comment serves to designate the required Python version; it must
begin with #! per Unix convention, and allows for a space before the python3 or not.
On my machine I currently have Pythons 2.7, 3,1, 3.2, and 3.3 all installed; let’s watch
which version is invoked as the script’s first line is modified in the following sections,
exploring file directives, command lines, and defaults along the way.
Step 1: Using Version Directives in Files
As this script is coded, when run by icon click or command line, the first line directs
the registered py.exe launcher to run using the latest 3.X installed:
#! python3
import sys
print(sys.version.split()[0])
C:\code> what.py # Run per file directive
3.3.0
C:\code> py what.py # Ditto: latest 3.X
3.3.0
Again, the space after #! is optional; I added a space to demonstrate the point here.
Note that the first what.py command here is equivalent to both an icon click and a full
py what.py, because the py.exe program is registered to open .py files automatically in
the Windows filename associations registry when the launcher is installed.
Also note that when launcher documentation (including this appendix) talks about the
latest version, it means the highest-numbered version. That is, it refers to the highest
version number present, not the most recently installed on your computer (e.g., if you
install 3.1 after 3.3, #!python3 still selects 3.3). The launcher cycles through the Pythons
on your computer to find the highest-numbered version that matches your specification
or defaults; this differs from the former last-installed-wins model.
Now, changing the first line name to python2 triggers the latest (really, highest-num-
bered) 2.X installed instead. Here’s this change at work; I’ll omit the last two lines of
our script from this point on because they won’t be altered:
#! python2
...rest of script unchanged
C:\code> what.py # Run with latest 2.X per #!
2.7.3
And you can request a more specific version if needed—for example, if you don’t want
the latest in a Python line:
#! python3.1
...
C:\code> what.py # Run with 3.1 per #!
3.1.4
Requesting a version that is not installed, on the other hand, is treated as an error
case by the launcher:
#! python2.6
...
C:\code> what.py
Requested Python version (2.6) is not installed
Unrecognized Unix #! lines are also treated as errors, unless you give a version number
as a command-line switch to compensate, as the next section describes in more detail
(and as the section on launcher issues will revisit as a pitfall):
#!/bin/python
...
C:\code> what.py
Unable to create process using '/bin/python "C:\code\what.py" '
C:\code> py what.py
Unable to create process using '/bin/python what.py'
C:\code> py -3 what.py
3.3.0
Technically, the launcher recognizes Unix-style #! lines at the top of script files that
match one of the following four patterns:
#!/usr/bin/env python*
#!/usr/bin/python*
#!/usr/local/bin/python*
#!python*
Any #! line that does not take one of these recognized and parseable forms is assumed
to be a fully specified command line to start a process to run the file, which is passed
to Windows as is, and generates the error message we saw previously if it is not a valid
Windows command. (The launcher also supports “customized” command expansions
via its configuration files, which are attempted before passing unrecognized commands
on to Windows, but we’ll gloss over these here.)
In recognizable #! lines, directory paths are coded per Unix convention, for portability
to that platform. The * part at the end of the four preceding recognized patterns denotes
an optional Python version number, in one of three forms:
Partial (e.g., python3)
To run the version installed with the highest minor release number among those
with the major release number given
Full (e.g., python3.1)
To run that specific version only, optionally suffixed by -32 to prefer a 32-bit
version (e.g., python3.1-32)
Omitted (e.g., python)
To run the launcher’s default version, which is 2 unless changed (e.g., by setting
the PY_PYTHON environment variable to 3), another pitfall described ahead
Files with no #! line at all behave the same as those that name just a generic
python (the omitted case just described), and are influenced by PY_PYTHON default settings.
The first case, partials, may also be affected by version-specific environment settings
(e.g., set PY_PYTHON3 to 3.1 to select 3.1 for python3, and set PY_PYTHON2 to 2.6 to pick
2.6 for python2). We'll revisit defaults later in this tutorial.
First, though, note that anything after the * part in a #! line’s format is assumed to be
command-line arguments to Python itself (i.e., program python.exe), unless you also
give arguments in a py command line that are deemed to supersede #! line arguments
by the launcher:
#!python3 [any python.exe arguments go here]
...
These include all the Python command-line arguments we met in Appendix A. But this
leads us to launcher command lines in general, and will suffice as a natural segue to
the next section.
Step 2: Using Command-Line Version Switches
As mentioned, version switches on command lines can be used to select a Python ver-
sion if one isn’t present in the file. You run a py or pyw command line to pass them a
switch this way, instead of relying on filename associations in the registry, and instead
of (or in addition to) giving versions in #! lines in files. In the following, we modify our
script so that it has no #! directive:
# not a launcher directive
...
C:\code> py -3 what.py # Run per command-line switch
3.3.0
C:\code> py -2 what.py # Ditto: latest 2.X installed
2.7.3
C:\code> py -3.2 what.py # Ditto: 3.2 specifically (and only)
3.2.3
C:\code> py what.py # Run per launcher's default (ahead)
2.7.3
But command-line switches also take precedence over a version designation in a file’s
directive:
#! python3.1
...
C:\code> what.py # Run per file directive
3.1.4
C:\code> py what.py # Ditto
3.1.4
C:\code> py -3.2 what.py # Switches override directives
3.2.3
C:\code> py -2 what.py # Ditto
2.7.3
Formally, the launcher accepts the following command-line argument types (which
exactly mirror the * part at the end of a file’s #! line described in the prior section):
-2       Launch the latest Python 2.X version
-3       Launch the latest Python 3.X version
-X.Y     Launch the specified Python version (X is 2 or 3)
-X.Y-32  Launch the specified 32-bit Python version
And the launcher’s command lines take the following general form:
py [py.exe arg] [python.exe args] script.py [script.py args]
Anything following the launcher’s own argument (if present) is treated as though it
were passed to the python.exe program—typically, this includes any arguments for
Python itself, followed by the script filename, followed by any arguments meant for the
script.
The usual -m mod, -c cmd, and - program specification forms work in a py command
line too, as do all the other Python command-line arguments covered in Appendix A.
As mentioned earlier, arguments to python.exe can also appear at the end of the #!
directive line in a file, if used, though arguments in py command lines override them.
To see how this works, let's write a new script that extends the prior one to display
command-line arguments; sys.argv is the script's own arguments, and I'm using the Python
(python.exe) -i switch, which directs it to the interactive prompt (>>>) after a script
runs:
# args.py, show my arguments too
import sys
print(sys.version.split()[0])
print(sys.argv)
C:\code> py -3 -i args.py -a 1 -b -c # -3: py, -i: python, rest: script
3.3.0
['args.py', '-a', '1', '-b', '-c']
>>> ^Z
C:\code> py -i args.py -a 1 -b -c # Args to python, script
2.7.3
['args.py', '-a', '1', '-b', '-c']
>>> ^Z
C:\code> py -3 -c print(99) # -3 to py, rest to python: "-c cmd"
99
C:\code> py -2 -c "print 99"
99
Notice how the first two launches run the default Python unless a version is given in
the command line, because no #! line appears in the script itself. Somewhat coinci-
dentally, that leads us to the last topic of this tutorial.
Step 3: Using and Changing Defaults
As also mentioned, the launcher defaults to 2.X for a generic python in a #! directive
with no specific version number. This is true whether this generic form appears in a
full Unix path (e.g., #!/usr/bin/python) or not (#!python). Here’s the latter case in
action, coded in our original what.py script:
#!python
... # Same as #!/usr/bin/python
C:\code> what.py # Run per launcher default
2.7.3
The default is also applied when no directive is present at all—perhaps the most com-
mon case for code written to be used on Windows primarily or exclusively:
# not a launcher directive
...
C:\code> what.py # Also run per default
2.7.3
C:\code> py what.py # Ditto
2.7.3
But you can set the launcher’s default to 3.X with initialization file or environment
variable settings, which will apply to both files run from command lines and by icon
clicks via their name’s association with py.exe or pyw.exe in the Windows registry:
# not a launcher directive
...
C:\code> what.py # Run per default
2.7.3
C:\code> set PY_PYTHON=3 # Or via Control Panel/System
C:\code> what.py # Run per changed default
3.3.0
As suggested earlier, for more fine-grained control you can also set version-specific
environment variables to direct partial selections to a specific release, instead of falling
back on the installed release with the highest minor number:
#!python3
...
C:\code> py what.py # Runs "latest" 3.X
3.3.0
C:\code> set PY_PYTHON3=3.1 # Use PY_PYTHON2 for 2.X
C:\code> py what.py # Override highest-minor choice
3.1.4
The set used in these interactions applies to its Command Prompt window only; mak-
ing such settings in the Control Panel’s System window will make them apply globally
across your machine (see Appendix A for help with these settings). You may or may
not want to set defaults this way depending on the majority of the Python code you’ll
be running. Many Python 2.X users can probably rely on defaults unchanged, and
override them in #! lines or py command lines as needed.
However, the setting used for directive-less files, PY_PYTHON, seems fairly crucial. Most
programmers who have used Python on Windows in the past will probably expect 3.X
to be the default after installing 3.3, especially given that the launcher is installed by
3.3 in the first place—a seeming paradox, which leads us to the next section.
Pitfalls of the New Windows Launcher
Though the new Windows launcher in 3.3 is a nice addition, like much in 3.X it may
have been nicer had it appeared years ago. Unfortunately, it comes with some backward
incompatibilities, which may be an inevitable byproduct of today’s multiversion Python
world, but which may also break some existing programs. This includes examples in
books I’ve written, and probably many others. While porting code to 3.3, I’ve come
across three launcher issues worth noting:
Unrecognized Unix #! lines now make scripts fail on Windows.
The launcher defaults to using 2.X unless told otherwise.
The new PATH extension is off by default and seems contradictory.
The rest of this section gives a rundown of each of these three issues in turn. In the
following, I use the programs in my book Programming Python, 4th Edition, as an
example to illustrate the impacts of launcher incompatibilities, because porting these
3.1/3.2 examples to 3.3 was my first exposure to the new launcher. In my specific case,
installing 3.3 broke numerous book examples that worked formerly under 3.2 and 3.1.
The causes for these failures outlined here may break your code too.
Pitfall 1: Unrecognized Unix #! Lines Fail
The new Windows launcher recognizes Unix #! lines that begin with #!/usr/bin/env
python but not the other common Unix form #!/bin/env python (which is actually
mandated on some Unixes). Scripts that use the latter of these, including some of my
book examples, worked on Windows in the past because their #! lines coded for Unix
compatibility have been ignored as comments by all Windows Pythons to date. These
scripts now fail to run in 3.3 because the new launcher doesn’t recognize their directive’s
format and posts an error message.
More generally, scripts with any #! Unix line not recognized will now fail to run on
Windows. This includes scripts having any first line that begins with a #! that is not
followed by one of the four recognized patterns described earlier: /usr/bin/env
python*, /usr/bin/python*, /usr/local/bin/python*, or python*. Anything else won’t
work, and requires code changes. For instance, a somewhat common #!/bin/python
line also causes a script to now fail on Windows, unless a version number is given in
command-line switches.
Unix-style #! lines probably aren’t present in Windows-only programs, but can be
common in programs meant to be run on Unix too. Treating unrecognized Unix di-
rectives as errors on Windows seems a bit extreme, especially given that this is new
behavior in 3.3, and will likely be unexpected. Why not just ignore unrecognized #!
lines and run the file with the default Python—like every Windows Python to date has?
It’s possible that this might be improved in a future 3.X release (there may be some
pushback on this), but today you must change any files using a #!/bin/env or other
unrecognized pattern, if you want them to run under the launcher installed with Python
3.3 on Windows.
Book examples impact and fix
With respect to the book examples I ported to 3.3, this broke roughly a dozen scripts
that started with #!/bin/env python. Regrettably, this includes some of the book’s user-
friendly and top-level demo launcher scripts (PyGadgets and PyDemos). To fix, I
changed these to use the accepted #!/usr/bin/env python form instead. Altering your
Windows file associations to omit the launcher altogether may be another option (e.g.,
associating .py files with python.exe instead of py.exe), but this negates the launcher’s
benefits, and seems a bit much to ask of users, especially newcomers.
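In other words, the repair is a one-line edit per affected script; for example:

Before (fails under the 3.3 launcher):    #!/bin/env python
After (recognized, runs normally):        #!/usr/bin/env python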
One open issue here: strangely, passing any command-line switch to the launcher, even
a python.exe argument, seems to negate this effect and fall back on the default Python.
That is, m.py and py m.py both issue errors on unrecognized #! lines, but py -i m.py
runs such a file with the default Python. This seems a possible launcher bug, but also
relies on the default, the subject of the next issue.
Pitfall 2: The Launcher Defaults to 2.X
Oddly, the Windows 3.3 launcher defaults to using an installed Python 2.X when run-
ning scripts that don’t select 3.X explicitly. That is, scripts that either have no #! di-
rective or use one that names python generically will be run by a 2.X Python by default
when launched by icon clicks, direct filename command lines (m.py), or launcher com-
mand lines that give no version switch (py m.py). This is true even if 3.3 is installed after
a 2.X on your machine, and has the potential to make many 3.X scripts fail initially.
The implications of this are potentially broad. As one example, clicking the icon of a
directive-less 3.X file just after installing 3.3 may now fail, because the associated
launcher assumes you mean to use 2.X by default. This probably won’t be a pleasant
first encounter for some Python newcomers! This assumes the 3.X file has no #! direc-
tive that provides an explicit python3 version number, but most scripts meant to run
on Windows won’t have a #! line at all, and many files coded before the launcher came
online won’t accommodate its version number expectations. Most 3.X users will be
basically compelled to set PY_PYTHON after installing 3.3—hardly a usability win.
Program launches that don't give an explicit version number are arguably ambiguous
on Unix too, and often rely on symbolic links from python to a specific version
(which is most likely 2.X today—a state the new Windows launcher seems to emulate).
But as for the prior issue, this probably shouldn’t trigger a new error on Windows in
3.3 for scripts that worked there formerly. Most programmers wouldn’t expect Unix
comment lines to matter on Windows, and wouldn’t expect 2.X to be used by default
just after installing 3.X.
Book examples impact and fix
In terms of my book examples port, this 2.X default caused multiple 3.X script failures
after installing 3.3, both for scripts with no #! line and for scripts with a Unix-
compatible #!/usr/bin/python line. To fix just the latter, change all scripts in this
category to name python3 explicitly instead of just python. To fix both the former and the
latter in a single step, set the Windows launcher’s default to be 3.X globally with either
a py.ini configuration file (see the launcher’s documentation for details) or a
PY_PYTHON environment variable setting as shown in the earlier examples (e.g., set
PY_PYTHON=3). As mentioned in the prior point, manually changing your file associations
is another solution, but none of these options seem simpler than those imposed by prior
install schemes.
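As a sketch of the configuration-file route, a py.ini along the following lines sets the same defaults as the environment variables shown earlier; this assumes the key names described in the launcher's documentation, which also gives the file's expected location (e.g., your user application-data directory):

[defaults]
python=3         ; default for directive-less and generic "python" files
python3=3.3      ; optional: pin partial "python3" requests to 3.3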
Pitfall 3: The New PATH Extension Option
Besides installing the new launcher, the Windows Python 3.3 installer can automati-
cally add the directory containing 3.3’s python.exe executable to your system PATH set-
ting. The reasoning behind this is that it might make life easier for some Windows
beginners—they can type just python instead of the full directory path to it. This isn’t
a feature of the launcher per se, and shouldn’t cause scripts to fail in general. It had no
impact on the book examples. But it seems to clash with the launcher’s operation and
goals, and may be best avoided. This is a bit subtle, but I’ll explain why.
As described, the new launcher’s py and pyw executables are by default installed on your
system search path, and running them requires neither directory paths nor PATH settings.
If you start scripts with py instead of python command lines, the new PATH feature is
irrelevant. In fact, py completely subsumes python in most contexts. Given that file
associations will launch py or pyw instead of python anyhow, you probably should too
—using python instead of py may prove redundant and inconsistent, and might even
launch a version different than that used in launcher contexts should the two schemes’
settings grow out of sync. In short, adding python to PATH seems contradictory to the
new launcher’s worldview, and potentially error-prone.
Also note that updating your PATH assumes you want a python command to run 3.3
normally, and this feature is disabled by default; be sure to select this in the install screen
if you want this to work (but not if you don’t!). Due to the second pitfall mentioned
earlier, many users may still need to set PY_PYTHON to 3 for programs run by icon clicks
that invoke the new launcher, which seems no simpler than setting PATH, a step that the
launcher was meant to remove. You may be better served by using just the launcher’s
executables, and changing just PY_PYTHON as needed.
Conclusions: A Net Win for Windows
To be fair, some of the prior section's pitfalls may be an inevitable consequence of trying
to simultaneously support a Unix feature on Windows and multiple installed versions.
In exchange, the launcher provides a coherent way to manage mixed-version scripts and
installations. You'll probably find the Windows launcher shipped with 3.3 and later to be a
tions. You’ll probably find the Windows launcher shipped with 3.3 and later to be a
major asset once you start using it, and get past any initial incompatibilities you may
encounter.
In fact, you may also want to start getting into the habit of coding compatible Unix-
style #! lines in your Windows scripts, with explicit version numbers (e.g., #!/usr/bin/
python3). Not only does this declare your code's requirements and arrange for its proper
execution on Windows, it also sidesteps the launcher's defaults, and may make
your script usable as a Unix executable in the future.
But you should be aware that the launcher may break some formerly valid scripts having
#! lines, may choose a default version that you don’t expect and your scripts can’t use,
and may require configuration and code changes on the order of those it was intended
to obviate. The new boss is better than the old boss, but seems to have gone to the same
school.
For more on Windows usage, see Appendix A for installation and configuration, Chap-
ter 3 for general concepts, and platform-specific documents in Python’s manuals set.
APPENDIX C
Python Changes and This Book
This appendix briefly summarizes changes made in recent releases of Python, organized
by the book editions where they first appeared, and gives links to their coverage in this
book. It is intended as a reference both for readers of prior editions and for developers
migrating from prior Python releases.
Here’s how changes in Python relate to this book’s recent editions:
This fifth edition of 2013 covers Python 3.3 and 2.7.
The fourth edition of 2009 covered Python 2.6 and 3.0 (with some 3.1 features).
The third edition of 2007 covered Python 2.5.
The first and second editions of 1999 and 2003 covered Pythons 2.0 and 2.2.
The predecessor of this book, 1996’s Programming Python, covered Python 1.3.
Hence, to see changes made in just this fifth edition, see the Python 2.7, 3.2, and 3.3
changes listed ahead. For changes incorporated into both the fourth and fifth editions
(that is, since the third), also see Python 2.6, 3.0, and 3.1 changes here. Third edition
language changes are listed very briefly too, though this seems of only historical value
today.
Also note that this appendix focuses on major changes and book impacts, and is not
intended as a complete guide to Python’s evolution. For the fuller story on changes
applied in each new Python release, consult the “What’s New” documents that are part
of its standard documentation set, and available at the Documentation page of
python.org. Chapter 15 covers Python documentation and its manuals set.
Major 2.X/3.X Differences
Much of this appendix relates Python changes to book coverage. If you’re instead
looking for a quick summary of the most prominent 2.X/3.X distinctions, the following
may suffice. Note that this section primarily compares the latest 3.X and 2.X releases
—3.3 and 2.7. Many 3.X features are not listed here because they were either also added
to 2.6 (e.g., the with statement and class decorators), or back-ported later to 2.7 (e.g.,
set and dictionary comprehensions), but are not available in earlier 2.X releases. See
later sections for more fine-grained information about changes in earlier versions, and
see Python’s “What’s New” documents for changes that may appear in future releases.
3.X Differences
The following summarizes tools that differ across Python lines.
Unicode string model: In 3.X, normal str strings support all Unicode text including
ASCII, and the separate bytes type represents raw 8-bit byte sequences. In 2.X,
normal str strings support both 8-bit text including ASCII, and a separate uni
code type represents richer Unicode text as an option.
File model: In 3.X, files created by open are specialized by content—text files im-
plement Unicode encodings and represent content as str strings, and binary files
represent content as bytes strings. In 2.X, files use distinct interfaces—files created
by open represent content as str strings for content that is either 8-bit text or bytes-
based data, and codecs.open implements Unicode text encodings.
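A brief interactive sketch of the 3.X side of both models (runnable under any recent 3.X; file names are arbitrary):

>>> s = 'sp\xc4m'                    # str: Unicode text
>>> b = s.encode('utf8')             # bytes: raw binary data
>>> b
b'sp\xc3\x84m'
>>> b.decode('utf8') == s            # decode bytes back to text
True
>>> open('temp.bin', 'wb').write(b)                    # binary files take bytes
5
>>> open('temp.txt', 'w', encoding='utf8').write(s)    # text files take str
4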
Class model: In 3.X, all classes derive from object automatically and acquire the
numerous changes and extensions of new-style classes, including their differing
inheritance algorithm, built-ins dispatch, and MRO search order for diamond-
pattern trees. In 2.X, normal classes follow the classic model, and explicit inheri-
tance from object or other built-in types enables the new-style model as an option.
Built-in iterables: In 3.X, map, zip, range, filter, and dictionary keys, values, and
items are all iterable objects that generate values on request. In 2.X, these calls
create physical lists.
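For instance, the following runs under 3.X, where map returns an iterable that produces results on demand; in 2.X the same call returns a physical list immediately:

>>> M = map(abs, (-1, 0, 2))      # 3.X: an iterable object, not a list
>>> list(M)                       # force production of all results
[1, 0, 2]
>>> list(M)                       # one-shot iterator: already exhausted
[]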
Printing: 3.X provides a built-in function with keyword arguments for configura-
tion, while 2.X provides a statement with special syntax for configuration.
Relative imports: Both 2.X and 3.X support from . relative import statements, but
3.X changes the search rule to skip a package’s own directory for normal imports.
True division: Both 2.X and 3.X support the // floor division operator, but the / is
true division in 3.X and retains fractional remainders, while / is type-specific in 2.X.
Integer types: 3.X has a single integer type that supports extended precision. 2.X
has both normal int and extended long, and automatic conversion to long.
Comprehension scopes: In 3.X, all comprehension forms—list, set, dictionary, gen-
erator—localize variables to the expression. In 2.X, list comprehensions do not.
PyDoc: An all-browser pydoc -b interface is supported as of 3.2 and required as of
3.3. In 2.X, the original pydoc -g GUI client interface may be used instead.
Byte code storage: As of 3.2, 3.X stores byte code files in a __pycache__ subdirectory
of the source directory, with version-identifying names. In 2.X, byte code is stored
in the source file directory with generic names.
Built-in system exceptions: As of 3.3, 3.X has a reworked exception hierarchy for
OS and IO classes that includes additional categories and granularity. In 2.X, ex-
ception attributes must sometimes be inspected on system errors.
Comparisons and sorts: In 3.X, relative magnitude comparisons of both mixed-
types and dictionaries are errors, and sorts do not support mixed types or general
comparison functions (use key mappers instead). In 2.X all these forms work.
String exceptions and module functions: String-based exceptions are fully removed
in 3.X, though they are also gone in 2.X as of 2.6 (use classes instead). string
module functions redundant with string object methods are also removed in 3.X.
Language removals: Per Table C-2, 3.X removes, renames, or relocates many 2.X
language items: reload, apply, `x`, <>, 0177, 999L, dict.has_key, raw_input, xrange,
file, reduce, and file.xreadlines.
3.X-Only Extensions
The following summarizes tools available in 3.X only.
Extended sequence assignment: 3.X allows a * in sequence assignment targets to
collect remaining unmatched iterable items in a list. 2.X can achieve similar effects
with slicing.
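For example, under 3.X:

>>> a, *b, c = [1, 2, 3, 4]       # * collects unmatched items in a list
>>> a, b, c
(1, [2, 3], 4)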
Nonlocal: 3.X provides a nonlocal statement, which allows names in enclosing
function scopes to be changed from within nested functions. 2.X can achieve sim-
ilar effects with function attributes, mutable objects, and class state.
Function annotations: 3.X allows function arguments and return types to be anno-
tated with objects that are retained in the function but not otherwise used. 2.X may
often achieve similar effects with extra objects or decorator arguments.
Keyword-only arguments: 3.X allows specification of function arguments that must
be passed as keywords, typically used for extra configuration options. 2.X may
often achieve similar effects with argument analysis and dictionary pops.
Chained exceptions: 3.X allows exceptions to be chained and thus appear in error
messages, with a raise from extension; 3.3 allows a None to cancel the chain.
Yield from: As of 3.3, the yield statement may delegate to a nested generator with
from. 2.X can often achieve similar results with a for loop in simpler use cases.
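A minimal 3.3 sketch of the delegation form:

>>> def squares():
...     yield from (x ** 2 for x in range(4))     # delegate to a subgenerator
...
>>> list(squares())
[0, 1, 4, 9]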
Namespace packages: As of 3.3, the package model is extended to allow packages
that span multiple directories with no initialization file, as a fallback option. 2.X
might achieve similar effects with import extensions.
Windows launcher: As of 3.3, a launcher is shipped with Python for Windows,
though this is also available separately for use on other Pythons, including 2.X.
Internals: As of 3.2, threading is implemented with time slices instead of virtual
machine instruction counts, and 3.3 stores Unicode text in a variable-length
scheme instead of fixed-size bytes. 2.X’s string model minimizes Unicode use in
general.
General Remarks: 3.X Changes
Although the Python 3.X line covered in the two most recent editions of this book is
largely the same language as its 2.X predecessor, it differs in some crucial ways. As
discussed in the preface and summarized in the preceding section, 3.X’s nonoptional
Unicode model, mandatory new-style classes, and broader emphasis on generators and
other functional tools alone can make it a materially different experience.
On the whole, Python 3.X may be a cleaner language, but it is also in many ways a more
sophisticated language, relying upon concepts that are substantially more advanced. In
fact, some of its changes seem to assume you must already know Python in order to
learn Python. The preface mentioned some of the more prominent circular knowledge
dependencies in 3.X that imply forward topic dependencies.
As a random example, the rationale for wrapping dictionary views in a list call in 3.X
is incredibly subtle and requires substantial foreknowledge—of views, generators, and
the iteration protocol, at the least. Keyword arguments are similarly required in simple
tools (e.g., printing, string formatting, dictionary creation, and sorting) that crop up
long before a newcomer learns enough about functions to understand them fully. One
of this book’s goals is to help bridge this knowledge gap in today’s 2.X/3.X dual-version
world.
Changes in Libraries and Tools
There are additional changes in Python 3.X not listed in this appendix, simply because
they don’t affect this book. For example, some standard libraries and development
tools are outside this book’s core language scope, though some are mentioned along
the way (e.g., timeit), and others have always been covered here (e.g., PyDoc).
For completeness, the following sections note 3.X developments in these categories.
Some of the changes in these categories are also listed later in this appendix, in con-
junction with the book edition and Python version in which they were introduced.
Standard library changes
Formally speaking, the Python standard library is not a part of this book’s core language
subject, even though it’s always available with Python, and permeates realistic Python
programs. In fact, the libraries were not subject to the temporary 3.X language changes
moratorium enacted during 3.2’s development.
Because of this, changes in the standard library have a larger impact on applications-
focused books like Programming Python than they do here. Although most standard
library functionality is still present, Python 3.X takes further liberties with renaming
modules, grouping them into packages, and changing API call patterns.
Some library changes are much broader, though. Python 3.X’s Unicode model, for ex-
ample, creates widespread differences in 3.X’s standard library—it potentially impacts
any program that processes file content, filenames, directory walkers, pipes, descriptor
files, sockets, text in GUIs, Internet protocols such as FTP and email, CGI scripts, web
content of many kinds, and even some persistence tools such as DBM files, shelves,
and pickles.
For a more comprehensive list of changes in 3.X’s standard libraries, see the “What’s
New” documents for 3.X releases (especially 3.0) in Python’s standard manual set.
Because it uses Python 3.X throughout, the aforementioned Programming Python can
also serve as a guide to 3.X library changes.
Tools changes
Though most development tools are the same between 2.X and 3.X (e.g., for debugging,
profiling, timing, and testing), a few have undergone changes in 3.X along with the
language and library. Among these, the PyDoc module documentation system has
moved away from its former GUI client model in 3.2 and earlier, replacing it with an
all web browser interface.
Other noteworthy changes in this category: the distutils package, used to distribute and
install third-party software, is to be subsumed by a new packaging system in 3.X; the
new __pycache__ byte code storage scheme described in this book, though an improve-
ment, potentially impacts many Python tools and programs; and the internal imple-
mentation of threading changed as of 3.2 to reduce contention by modifying the global
interpreter lock (GIL) to use absolute time slices instead of a virtual machine instruction
counter.
Migrating to 3.X
If you are migrating from Python 2.X to Python 3.X, be sure to also see the 2to3 auto-
matic code conversion script that is shipped with Python 3.X. It’s currently available
in Python’s Tools\Scripts install folder, or via a web search. This script cannot translate
everything, and attempts to translate core language code primarily—3.X standard li-
brary APIs may differ further. Still, it does a reasonable job of converting much 2.X
code to run under 3.X.
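For instance, assuming a 2.X file named oldscript.py and a shell where 2to3 is available as a command, typical usage looks like the following (run 2to3 --help for the full option set):

% 2to3 oldscript.py         # print a diff of proposed 3.X changes
% 2to3 -w oldscript.py      # rewrite the file in place (original saved as a .bak)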
Conversely, the 3to2 back-conversion program, currently available in the third-party
domain, can also translate much Python 3.X code to run in 2.X environments. De-
pending on your goals and constraints, either 2to3 or 3to2 may prove useful if you must
maintain code for both Python lines; see the Web for details, and additional tools and
techniques.
It’s also possible to write code that runs portably on both 2.X and 3.X using techniques
presented in this book—importing 3.X features from __future__, avoiding version-
specific tools, and so on. Many of the examples in this book are version-neutral. For
examples, see the benchmarking tools in Chapter 21, the module reloaders and comma
formatter in Chapter 25, the class tree listers in Chapter 31, most of the larger decorator
examples in Chapter 38 and Chapter 39, the joke script at the end of Chapter 41, and
more. As long as you understand 2.X/3.X core language differences, coding around
them is often straightforward.
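As a small illustration of the sorts of techniques involved (a generic sketch, not one of the book's examples), the following runs unchanged on both 2.6+ and 3.X:

# dual-version script: runs on 2.6+ and 3.X
from __future__ import print_function     # enable 3.X-style print in 2.X
import sys

if sys.version_info[0] == 2:              # map version-specific names
    input = raw_input                     # 2.X's raw_input plays 3.X's input role

print('version:', sys.version.split()[0])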
If you’re interested in writing code for both 2.X and 3.X, see also six—a library of cross-
version mapping and renaming tools, which currently lives at http://packages.python
.org/six. Naturally, this package can’t offset every difference in language semantics and
library APIs, and in many cases you must use its library tools instead of straight Python
to realize its portability gains. In exchange, though, your programs become much more
version-neutral when using this library’s tools.
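For a rough flavor of six's approach (a sketch only; see six's documentation for its current API), the following uses its renaming tools to stay version-neutral:

from __future__ import print_function    # neutral printing, per the prior sketch
import six                               # third-party: install with pip
from six.moves import input              # maps to raw_input on 2.X, input on 3.X

if six.PY2:
    print('running on a 2.X interpreter')

D = dict(a=1, b=2)
for key, value in six.iteritems(D):      # iteritems on 2.X, items on 3.X
    print(key, '=>', value)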
Fifth Edition Python Changes: 2.7, 3.2, 3.3
The following specific changes were made in the Python 2.X and 3.X lines after the
fourth edition was published, and have been incorporated into this edition. Specifically,
this section documents Python book-related changes in Pythons 2.7, 3.2, and 3.3.
Changes in Python 2.7
On the technical front, Python 2.7 mostly incorporates as back-ports a handful of 3.X
features that were covered in the prior edition of this book, but formerly as 3.X-only
features. This new fifth edition presents these as 2.7 tools as well. Among these:
Set literals:
{1, 4, 2, 3, 4}
Set and dictionary comprehensions:
{c * 4 for c in 'spam'}, {c: c * 4 for c in 'spam'}
Dictionary views, incorporated as optional methods:
dict.viewkeys(), dict.viewvalues(), dict.viewitems()
Comma separators and field autonumbering in str.format (from 3.1):
'{:,.2f} {}'.format(1234567.891, 'spam')
Nested with statement context managers (from 3.1):
with X() as x, Y() as y: ...
Float object repr display improvements (back-ported from 3.1: see ahead)
To see where these topics are covered in the book, look for their entries in the 3.0
changes list of Table C-1, or the Python 3.1 changes section, both ahead. They were
already present for 3.X, but have been updated to reflect their availability in 2.7 as well.
On the logistical front, per current plans 2.7 will be the last major 2.X series release,
but will have a long maintenance period in which it will continue to be used in pro-
duction work. After 2.7, new development is to shift to the Python 3.X line.
That said, it’s impossible to foresee how this official posture will stand the test of time,
given 2.X’s still very wide user base. See the preface for more on this; the optimized
PyPy implementation, for example, is still Python 2.X only. Or, to borrow a Monty
Python line, "I'm not dead yet..."—stay tuned for developments on the Python 2.X
story.
Changes in Python 3.3
Python 3.3 includes a surprisingly large number of changes for a point release. Some
of these are not entirely compatible with code written for prior releases in the 3.X line.
Among these, the new Windows launcher, installed as a mandatory part of 3.3, has
broad potential to break existing 3.X scripts run on Windows.
Here’s a brief rundown of noteworthy 3.3 changes, along with their location in this
book where applicable. Python 3.3 comes with:
A reduced memory footprint that is more in line with 2.X, thanks mainly to its new
variable-length string storage scheme, and also to its attribute name-sharing dic-
tionaries system (see Chapter 37 and Chapter 32)
A new namespace package model, where new-style packages may span multiple
directories and require no __init__.py file (see Chapter 24)
New syntax for delegating to subgenerators: yield from ... (see Chapter 20)
New syntax for suppressing exception context: raise ... from None (see Chap-
ter 34)
New syntax for accepting 2.X’s Unicode literal form to ease migration: 3.3 now
treats 2.X’s Unicode literal u'xxxx' the same as its normal string 'xxxx', similar to
the way 2.X treats 3.X’s bytes literal b'xxxx' the same as its normal string
'xxxx' (see Chapter 4, Chapter 7, and Chapter 37)
Reworked OS and IO exception hierarchies, which provide more inclusive general
superclasses, as well as new subclasses for common errors that can obviate the need
to inspect exception object attributes (see Chapter 35)
An all-web-browser-based interface to PyDoc documentation started via pydoc -b,
replacing its former standalone GUI client search interface, which was in the Win-
dows 7 and earlier Start button and invoked by pydoc -g (see Chapter 15)
Changes to some longstanding standard library modules, including ftplib, time,
and email, and potentially distutils; impacts in this book: time has new portable
calls in 3.X (see Chapter 21 and Chapter 39)
An implementation of the __import__ function in importlib.__import__, in part to
unify and more clearly expose its implementation (see Chapter 22 and Chapter 25)
A new capability in the Windows 3.3 installer that extends the system PATH setting
to include 3.3’s directory as an install-time option to simplify some command lines
(see Appendixes A and B)
A new Windows launcher, which attempts to interpret Unix-style #! lines for dis-
patching Python scripts on Windows, and allows both #! lines and new py com-
mand lines to select between Python 2.X and 3.X versions explicitly on both a per-
file and per-command basis (see the new Appendix B)
Changes in Python 3.2
Python 3.2 continued the 3.X line’s evolution. It was developed during a moratorium
on 3.X core language changes, so its relevant changes were minor. Here’s a quick review
of major 3.2 changes, and their location in this fifth edition where relevant:
Byte-code files storage model change: __pycache__ (see Chapter 2 and Chapter 22)
The struct module’s autoencoding for strings is gone (see Chapter 9 and Chap-
ter 37)
3.X str/bytes split supported better by Python itself (not relevant to this book)
The cgi.escape call was to be moved in 3.2+ (not relevant to this book)
Threading implementation change: time slices (not relevant to this book)
Fourth Edition Python Changes: 2.6, 3.0, 3.1
The fourth edition was updated to cover Python 3.0 and 2.6, and incorporated a small
number of major changes made in 3.1. Its 3.0 and 3.1 changes apply to all future releases
in the 3.X line including this fifth edition’s Python 3.3, and its 2.6 changes are also part
of this edition’s 2.7. As noted earlier, some of the changes described here as 3.X changes
also later found their way into Python 2.7 as back-ports (e.g., set literals, and set and
dictionary comprehensions).
Changes in Python 3.1
In addition to the 3.0 and 2.6 changes listed in upcoming sections, shortly before going
to press the fourth edition was also augmented with notes about prominent extensions
in the then upcoming Python 3.1 release, including:
Comma separators and automatic field numbering in string format method calls
(Chapter 7)
Multiple context manager syntax in with statements (Chapter 34)
New methods for number objects (Chapter 5)
(Not added until this fifth edition) Floating-point display changes (Chapter 4 and
Chapter 5)
This fifth edition covers these topics in the chapters just noted. Because Python 3.1 was
targeted primarily at optimization and was released relatively soon after 3.0, the fourth
edition also applied directly to 3.1. In fact, because Python 3.1 superseded 3.0 entirely,
and because the latest Python is usually the best Python to fetch and use anyhow,
whenever that edition used the term “Python 3.0” it generally referred to the language
variations introduced by Python 3.0 but that are present in the entire 3.X line, including
this edition’s Python 3.3.
One notable exception: the fourth edition did not incorporate 3.1’s new repr display
scheme for floating-point numbers. The new display algorithm attempts to display
floating-point numbers more intelligently when possible, usually with fewer (but oc-
casionally with more) decimal digits—a change that is reflected in this fifth edition.
Changes in Python 3.0 and 2.6
The fourth edition’s language changes stem from Python 3.0 and 2.6. All of its 2.6 and
many of its 3.0 changes are shared by Python 2.7 and 3.3 today. Python 2.7 was ex-
tended with some 3.0 features not present in 2.6 (see earlier in this appendix), and
Python 3.3 inherits all the features introduced by 3.0.
Because there were so many changes in the initial 3.X release, they are noted only briefly
in tables here, with links to more details in this book. Table C-1 provides the first set
of 3.X changes, listing the most prominent new language features covered in the fourth
edition, along with the primary chapters in the current fifth edition in which they ap-
pear.
Table C-1. Extensions in Python 2.6 and 3.0

Extension                                                          Covered in chapter(s)
The print function in 3.0                                          11
The nonlocal x,y statement in 3.0                                  17
The str.format method in 2.6 and 3.0                               7
String types in 3.0: str for Unicode text, bytes for binary data   7, 37
Text and binary file distinctions in 3.0                           9, 37
Class decorators in 2.6 and 3.0: @private('age')                   32, 39
New iterators in 3.0: range, map, zip                              14, 20
Dictionary views in 3.0: D.keys, D.values, D.items                 8, 14
Division operators in 3.0: remainders, / and //                    5
Set literals in 3.0: {a, b, c}                                     5
Set comprehensions in 3.0: {x**2 for x in seq}                     4, 5, 14, 20
Dictionary comprehensions in 3.0: {x: x**2 for x in seq}           4, 8, 14, 20
Binary digit-string support in 2.6 and 3.0: 0b0101, bin(I)         5
The fraction number type in 2.6 and 3.0: Fraction(1, 3)            5
Function annotations in 3.0: def f(a:99, b:str)->int               19
Keyword-only arguments in 3.0: def f(a, *b, c, **d)                18, 20
Extended sequence unpacking in 3.0: a, *b = seq                    11, 13
Relative import syntax for packages enabled in 3.0: from .         24
Context managers enabled in 2.6 and 3.0: with/as                   34, 36
Exception syntax changes in 3.0: raise, except/as, superclass      34, 35
Exception chaining in 3.0: raise e2 from e1                        34
Reserved word changes in 2.6 and 3.0                               11
New-style class cutover in 3.0                                     32
Property decorators in 2.6 and 3.0: @property                      38
Descriptor use in 2.6 and 3.0                                      32, 38
Metaclass use in 2.6 and 3.0                                       32, 40
Abstract base classes support in 2.6 and 3.0                       29
Specific Language Removals in 3.0
In addition to extensions, a number of 2.X language tools have been removed in 3.X in
an effort to clean up its design. Table C-2 summarizes the 3.X removals that impact
this book, covered in various chapters of this edition as noted. As also shown in this
table, many of the 3.X removals have direct replacements, some of which are also
available in 2.6 and 2.7 to support future migration to 3.X.
Table C-2. Removals in Python 3.0 that impact this book

Removed                          Replacement                                          Covered in chapter(s)
reload(M)                        imp.reload(M) (or exec)                              3, 23
apply(f, ps, ks)                 f(*ps, **ks)                                         18
`X`                              repr(X)                                              5
X <> Y                           X != Y                                               5
long                             int                                                  5
9999L                            9999                                                 5
D.has_key(K)                     K in D (or D.get(key) != None)                       8
raw_input                        input                                                3, 10
old input                        eval(input())                                        3
xrange                           range                                                13, 14
file                             open (and io module classes)                         9
X.next                           X.__next__, called by next(X)                        14, 20, 30
X.__getslice__                   X.__getitem__ passed a slice object                  7, 30
X.__setslice__                   X.__setitem__ passed a slice object                  7, 30
reduce                           functools.reduce (or loop code)                      14, 19
execfile(filename)               exec(open(filename).read())                          3
exec open(filename)              exec(open(filename).read())                          3
0777                             0o777                                                5
print x, y                       print(x, y)                                          11
print >> F, x, y                 print(x, y, file=F)                                  11
print x, y,                      print(x, y, end=' ')                                 11
u'ccc' (back in 3.3)             'ccc'                                                4, 7, 37
'bbb' for byte strings           b'bbb'                                               4, 7, 9, 37
raise E, V                       raise E(V)                                           33, 34, 35
except E, X:                     except E as X:                                       33, 34, 35
def f((a, b)):                   def f(x): (a, b) = x                                 11, 18, 20
file.xreadlines                  for line in file: (or X=iter(file))                  13, 14
D.keys(), etc. as lists          list(D.keys()) (dictionary views)                    8, 14
map(), range(), etc. as lists    list(map()), list(range()) (built-ins)               14
map(None, ...)                   zip (or manual code to pad results)                  13, 20
X=D.keys(); X.sort()             sorted(D) (or list(D.keys()))                        4, 8, 14
cmp(x, y)                        (x > y) - (x < y)                                    30
X.__cmp__(y)                     __lt__, __gt__, __eq__, etc.                         30
X.__nonzero__                    X.__bool__                                           30
X.__hex__, X.__oct__             X.__index__                                          30
Sort comparison functions        Use key=transform or reverse=True                    8
Dictionary <, >, <=, >=          Compare sorted(D.items()) (or loop code)             8, 9
types.ListType                   list (types is for non-built-in names only)          9
__metaclass__ = M                class C(metaclass=M):                                29, 32, 40
__builtin__                      builtins (renamed)                                   17
Tkinter                          tkinter (renamed)                                    18, 19, 25, 30, 31
sys.exc_type, exc_value          sys.exc_info()[0], [1]                               35, 36
function.func_code               function.__code__                                    19, 39
__getattr__ run by built-ins     Redefine __X__ methods in wrapper classes            31, 38, 39
-t, -tt command-line switches    Inconsistent tabs/spaces use is always an error      10, 12
from ... *, within a function    May only appear at the top level of a file           23
import mod, in same package      from . import mod, package-relative form            24
class MyException:               class MyException(Exception):                        35
exceptions module                Built-in scope, library manual                       35
thread, Queue modules            _thread, queue (both renamed)                        17
anydbm module                    dbm (renamed)                                        28
cPickle module                   _pickle (renamed, used automatically)                9
os.popen2/3/4                    subprocess.Popen (os.popen retained)                 14
String-based exceptions          Class-based exceptions (also required in 2.6)        33, 34, 35
String module functions          String object methods                                7
Unbound methods                  Functions (staticmethod to call via instance)        31, 32
Mixed type comparisons, sorts    Nonnumeric mixed type magnitude comparisons (and sorts) are errors    5, 9
Third Edition Python Changes: 2.3, 2.4, 2.5
The third edition of this book was thoroughly updated to reflect Python 2.5 and all
changes to the language made after the publication of the second edition in late 2003.
(The second edition was based largely on Python 2.2, with some 2.3 features grafted
on at the end of the project.) In addition, brief discussions of anticipated changes in
the upcoming Python 3.0 release were incorporated where appropriate. Here are some
of the major language topics for which new or expanded coverage was provided (chap-
ter numbers here have been updated to reflect this fifth edition):
The new B if A else C conditional expression (Chapter 12, Chapter 19)
with/as context managers (Chapter 34)
try/except/finally unification (Chapter 34)
Relative import syntax (Chapter 24)
Generator expressions (Chapter 20)
New generator function features (Chapter 20)
Function decorators (Chapter 32, Chapter 39)
The set object type (Chapter 5)
New built-in functions: sorted, sum, any, all, enumerate (Chapter 13 and Chap-
ter 14)
The decimal fixed-precision object type (Chapter 5)
Files, list comprehensions, and iterators (Chapter 14 and Chapter 20)
New development tools: Eclipse, distutils, unittest and doctest, IDLE enhance-
ments, Shed Skin, and so on (Chapter 2 and Chapter 36)
Smaller language changes (for instance, the widespread use of True and False; the new sys.exc_info for fetching exception details; and the demise of string-based exceptions, string methods, and the apply and reduce built-ins) were incorporated throughout the book. The third edition also expanded coverage of some of the features that were new in the second edition, including three-limit slices and the arbitrary arguments call syntax that subsumed apply.
Earlier and Later Python Changes
Each edition before the third incorporated Python changes too—the first two editions from 1999 and 2003 covered Pythons 2.0 and 2.2, and their 1996 Programming Python 1st Edition predecessor, from which my three later books were all derived, began the process with Python 1.3—but I've omitted these here because they are now ancient history (well, in computer field terms, at least).
See the first and second editions for more details, if you can manage to scare one up. While it's impossible to predict the future, given how much has stood the test of time, it's likely that the core ideas stressed in this book will apply to future Pythons as well.
APPENDIX D
Solutions to End-of-Part Exercises
Part I, Getting Started
See “Test Your Knowledge: Part I Exercises” on page 87 in Chapter 3 for the exercises.
1. Interaction. Assuming Python is configured properly, the interaction should look something like the following (you can run this any way you like: in IDLE, from a shell prompt, and so on):
% python
...copyright information lines...
>>> "Hello World!"
'Hello World!'
>>> # Use Ctrl-D or Ctrl-Z to exit, or close window
2. Programs. Your code (i.e., module) file module1.py and the operating system shell
interactions should look like this:
print('Hello module world!')
% python module1.py
Hello module world!
Again, feel free to run this other ways—by clicking the file's icon, by using IDLE's Run→Run Module menu option, and so on.
3. Modules. The following interaction listing illustrates running a module file by im-
porting it:
% python
>>> import module1
Hello module world!
>>>
Remember that you will need to reload the module to run it again without stopping and restarting the interpreter. The question about moving the file to a different directory and importing it again is a trick question: if Python generates a module1.pyc file in the original directory, it uses that when you import the module, even if the source code (.py) file has been moved to a directory not in Python's search path. The .pyc file is written automatically if Python has access to the source file's directory; it contains the compiled byte code version of a module. See Chapter 3 for more on modules.
4. Scripts. Assuming your platform supports the #! trick, your solution will look like the following (although your #! line may need to list another path on your machine). Note that these lines are significant under the Windows launcher shipped and installed with Python 3.3, where they are parsed to select a version of Python to run the script, along with a default setting; see Appendix B for details and examples.
#!/usr/local/bin/python (or #!/usr/bin/env python)
print('Hello module world!')
% chmod +x module1.py
% module1.py
Hello module world!
5. Errors. The following interaction (run in Python 3.X) demonstrates the sorts of error messages you'll get when you complete this exercise. Really, you're triggering Python exceptions; the default exception-handling behavior terminates the running Python program and prints an error message and stack trace on the screen. The stack trace shows where you were in a program when the exception occurred (if function calls are active when the error happens, the "Traceback" section displays all active call levels). In Chapter 10 and Part VII, you will learn that you can catch exceptions using try statements and process them arbitrarily; you'll also see there that Python includes a full-blown source code debugger for special error-detection requirements. For now, notice that Python gives meaningful messages when programming errors occur, instead of crashing silently:
% python
>>> 2 ** 500
32733906078961418700131896968275991522166420460430647894832913680961337964046745
54883270092325904157150886684127560071009217256545885393053328527589376
>>>
>>> 1 / 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: int division or modulo by zero
>>>
>>> spam
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'spam' is not defined
6. Breaks and cycles. When you type this code:
L = [1, 2]
L.append(L)
you create a cyclic data structure in Python. In Python releases before 1.5.1, the
Python printer wasn’t smart enough to detect cycles in objects, and it would print
an unending stream of [1, 2, [1, 2, [1, 2, [1, 2, and so on, until you hit the
break-key combination on your machine (which, technically, raises a keyboard-
interrupt exception that prints a default message). Beginning with Python 1.5.1,
the printer is clever enough to detect cycles and prints [[...]] instead to let you
know that it has detected a loop in the object’s structure and avoided getting stuck
printing forever.
The reason for the cycle is subtle and requires information you will glean in
Part II, so this is something of a preview. But in short, assignments in Python always
generate references to objects, not copies of them. You can think of objects as
chunks of memory and of references as implicitly followed pointers. When you run
the first assignment above, the name L becomes a named reference to a two-item
list object—a pointer to a piece of memory. Python lists are really arrays of object
references, with an append method that changes the array in place by tacking on
another object reference at the end. Here, the append call adds a reference to the
front of L at the end of L, which leads to the cycle illustrated in Figure D-1: a pointer
at the end of the list that points back to the front of the list.
Besides being printed specially, as you'll learn in Chapter 6, cyclic objects must also be handled specially by Python's garbage collector, or their space will remain unreclaimed even when they are no longer in use. Though rare in practice, in some
programs that traverse arbitrary objects or structures you might have to detect such
cycles yourself by keeping track of where you’ve been to avoid looping. Believe it
or not, cyclic data structures can sometimes be useful, despite their special-case
printing.
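As a concrete illustration of that last point, here is a minimal sketch of manual cycle detection while walking an arbitrarily nested list structure; the traverse function and its visited set are illustrative names, not code from the book's examples package:

def traverse(obj, visited=None):
    # Remember object identities already visited to avoid looping on cycles
    if visited is None:
        visited = set()
    if id(obj) in visited:
        print('cycle detected!')
        return
    visited.add(id(obj))
    if isinstance(obj, list):
        for item in obj:
            traverse(item, visited)
    else:
        print(obj)

L = [1, 2]
L.append(L)                 # The cyclic list from this exercise
traverse(L)                 # Prints 1 and 2 once, then flags the cycle

Because the check keys on id (object identity), it flags only true cycles, not repeated but distinct equal values.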
Part II, Types and Operations
See “Test Your Knowledge: Part II Exercises” on page 313 in Chapter 9 for the exercises.
Figure D-1. A cyclic object, created by appending a list to itself. By default, Python appends a reference
to the original list, not a copy of the list.
1. The basics. Here are the sorts of results you should get, along with a few comments
about their meaning. Again, note that ; is used in a few of these to squeeze more
than one statement onto a single line (the ; is a statement separator), and commas
build up tuples displayed in parentheses. Also keep in mind that the / division
result near the top differs in Python 2.X and 3.X (see Chapter 5 for details), and
the list wrapper around dictionary method calls is needed to display results in
3.X, but not 2.X (see Chapter 8):
# Numbers
>>> 2 ** 16 # 2 raised to the power 16
65536
>>> 2 / 5, 2 / 5.0 # Integer / truncates in 2.X, but not 3.X
(0.40000000000000002, 0.40000000000000002)
# Strings
>>> "spam" + "eggs" # Concatenation
'spameggs'
>>> S = "ham"
>>> "eggs " + S
'eggs ham'
>>> S * 5 # Repetition
'hamhamhamhamham'
>>> S[:0] # An empty slice at the front -- [0:0]
'' # Empty of same type as object sliced
>>> "green %s and %s" % ("eggs", S) # Formatting
'green eggs and ham'
>>> 'green {0} and {1}'.format('eggs', S)
'green eggs and ham'
# Tuples
>>> ('x',)[0] # Indexing a single-item tuple
'x'
>>> ('x', 'y')[1] # Indexing a two-item tuple
'y'
# Lists
>>> L = [1,2,3] + [4,5,6] # List operations
>>> L, L[:], L[:0], L[-2], L[-2:]
([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6], [], 5, [5, 6])
>>> ([1,2,3]+[4,5,6])[2:4]
[3, 4]
>>> [L[2], L[3]] # Fetch from offsets; store in a list
[3, 4]
>>> L.reverse(); L # Method: reverse list in place
[6, 5, 4, 3, 2, 1]
>>> L.sort(); L # Method: sort list in place
[1, 2, 3, 4, 5, 6]
>>> L.index(4) # Method: offset of first four (search)
3
# Dictionaries
>>> {'a':1, 'b':2}['b'] # Index a dictionary by key
2
>>> D = {'x':1, 'y':2, 'z':3}
>>> D['w'] = 0 # Create a new entry
>>> D['x'] + D['w']
1
>>> D[(1,2,3)] = 4 # A tuple used as a key (immutable)
>>> D
{'w': 0, 'z': 3, 'y': 2, (1, 2, 3): 4, 'x': 1}
>>> list(D.keys()), list(D.values()), (1,2,3) in D # Methods, key test
(['w', 'z', 'y', (1, 2, 3), 'x'], [0, 3, 2, 4, 1], True)
# Empties
>>> [[]], ["",[],(),{},None] # Lots of nothings: empty objects
([[]], ['', [], (), {}, None])
2. Indexing and slicing. Indexing out of bounds (e.g., L[4]) raises an error; Python
always checks to make sure that all offsets are within the bounds of a sequence.
On the other hand, slicing out of bounds (e.g., L[-1000:100]) works because Python
scales out-of-bounds slices so that they always fit (the limits are set to zero and the
sequence length, if required).
Extracting a sequence in reverse, with the lower bound greater than the higher
bound (e.g., L[3:1]), doesn't really work. You get back an empty slice ([]) because
Python scales the slice limits to make sure that the lower bound is always less than
or equal to the upper bound (e.g., L[3:1] is scaled to L[3:3], the empty insertion
point at offset 3). Python slices are always extracted from left to right, even if you
use negative indexes (they are first converted to positive indexes by adding the
sequence length). Note that Python 2.3’s three-limit slices modify this behavior
somewhat. For instance, L[3:1:-1] does extract from right to left:
>>> L = [1, 2, 3, 4]
>>> L[4]
Traceback (innermost last):
File "<stdin>", line 1, in ?
IndexError: list index out of range
>>> L[-1000:100]
[1, 2, 3, 4]
>>> L[3:1]
[]
>>> L
[1, 2, 3, 4]
>>> L[3:1] = ['?']
>>> L
[1, 2, 3, '?', 4]
3. Indexing, slicing, and del. Your interaction with the interpreter should look some-
thing like the following code. Note that assigning an empty list to an offset stores
an empty list object there, but assigning an empty list to a slice deletes the slice.
Slice assignment expects another sequence, or you’ll get a type error; it inserts items
inside the sequence assigned, not the sequence itself:
>>> L = [1,2,3,4]
>>> L[2] = []
>>> L
[1, 2, [], 4]
>>> L[2:3] = []
>>> L
[1, 2, 4]
>>> del L[0]
>>> L
[2, 4]
>>> del L[1:]
>>> L
[2]
>>> L[1:2] = 1
Traceback (innermost last):
File "<stdin>", line 1, in ?
TypeError: illegal argument type for built-in operation
4. Tuple assignment. The values of X and Y are swapped. When tuples appear on the
left and right of an assignment symbol (=), Python assigns objects on the right to
targets on the left according to their positions. This is probably easiest to under-
stand by noting that the targets on the left aren’t a real tuple, even though they
look like one; they are simply a set of independent assignment targets. The items
on the right are a tuple, which gets unpacked during the assignment (the tuple
provides the temporary assignment needed to achieve the swap effect):
>>> X = 'spam'
>>> Y = 'eggs'
>>> X, Y = Y, X
>>> X
'eggs'
>>> Y
'spam'
5. Dictionary keys. Any immutable object can be used as a dictionary key, including
integers, tuples, strings, and so on. This really is a dictionary, even though some
of its keys look like integer offsets. Mixed-type keys work fine, too:
>>> D = {}
>>> D[1] = 'a'
>>> D[2] = 'b'
>>> D[(1, 2, 3)] = 'c'
>>> D
{1: 'a', 2: 'b', (1, 2, 3): 'c'}
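Conversely, mutable objects such as lists fail immediately as keys because they are not hashable; a quick check (3.X error text shown):

>>> D[[4, 5, 6]] = 'd'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'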
6. Dictionary indexing. Indexing a nonexistent key (D['d']) raises an error; assigning
to a nonexistent key (D['d']='spam') creates a new dictionary entry. On the other
hand, out-of-bounds indexing for lists raises an error too, but so do out-of-bounds
assignments. Variable names work like dictionary keys; they must have already
been assigned when referenced, but they are created when first assigned. In fact,
variable names can be processed as dictionary keys if you wish (they’re made visible
in module namespace or stack-frame dictionaries):
>>> D = {'a':1, 'b':2, 'c':3}
>>> D['a']
1
>>> D['d']
Traceback (innermost last):
File "<stdin>", line 1, in ?
KeyError: d
>>> D['d'] = 4
>>> D
{'b': 2, 'd': 4, 'a': 1, 'c': 3}
>>>
>>> L = [0, 1]
>>> L[2]
Traceback (innermost last):
File "<stdin>", line 1, in ?
IndexError: list index out of range
>>> L[2] = 3
Traceback (innermost last):
File "<stdin>", line 1, in ?
IndexError: list assignment index out of range
7. Generic operations. Question answers:
• The + operator doesn't work on different/mixed types (e.g., string + list, list + tuple).
• + doesn't work for dictionaries, as they aren't sequences.
• The append method works only for lists, not strings, and keys works only on dictionaries. append assumes its target is mutable, since it's an in-place extension; strings are immutable.
• Slicing and concatenation always return a new object of the same type as the objects processed:
>>> "x" + 1
Traceback (innermost last):
File "<stdin>", line 1, in ?
TypeError: illegal argument type for built-in operation
>>>
>>> {} + {}
Traceback (innermost last):
File "<stdin>", line 1, in ?
TypeError: bad operand type(s) for +
>>>
>>> [].append(9)
>>> "".append('s')
Traceback (innermost last):
File "<stdin>", line 1, in ?
AttributeError: attribute-less object
>>>
>>> list({}.keys()) # list() needed in 3.X, not 2.X
[]
>>> [].keys()
Traceback (innermost last):
File "<stdin>", line 1, in ?
AttributeError: keys
>>>
>>> [][:]
[]
>>> ""[:]
''
8. String indexing. This is a bit of a trick question—because strings are collections of
one-character strings, every time you index a string, you get back a string that can
be indexed again. S[0][0][0][0][0] just keeps indexing the first character over and
over. This generally doesn’t work for lists (lists can hold arbitrary objects) unless
the list contains strings:
>>> S = "spam"
>>> S[0][0][0][0][0]
's'
>>> L = ['s', 'p']
>>> L[0][0][0]
's'
9. Immutable types. Either of the following solutions works. Index assignment
doesn’t, because strings are immutable:
>>> S = "spam"
>>> S = S[0] + 'l' + S[2:]
>>> S
'slam'
>>> S = S[0] + 'l' + S[2] + S[3]
>>> S
'slam'
(See also the Python 3.X and 2.6+ bytearray string type in Chapter 37—it's a mutable sequence of small integers that is essentially processed the same as a string.)
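For reference, a quick bytearray rendition of the same change; unlike str, it supports in-place index assignment (3.X form shown):

>>> B = bytearray(b'spam')
>>> B[1] = ord('l')              # Mutable: change in place
>>> B
bytearray(b'slam')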
10. Nesting. Here is a sample:
>>> me = {'name':('John', 'Q', 'Doe'), 'age':'?', 'job':'engineer'}
>>> me['job']
'engineer'
>>> me['name'][2]
'Doe'
11. Files. Here’s one way to create and read back a text file in Python (ls is a Unix
command; use dir on Windows):
# File: maker.py
file = open('myfile.txt', 'w')
file.write('Hello file world!\n') # Or: open().write()
file.close() # close not always needed
# File: reader.py
file = open('myfile.txt') # 'r' is default open mode
print(file.read()) # Or print(open().read())
% python maker.py
% python reader.py
Hello file world!
% ls -l myfile.txt
-rwxrwxrwa 1 0 0 19 Apr 13 16:33 myfile.txt
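As a minor modernization (not part of the original exercise), the same files could be written and read with with statements, which close the file automatically on block exit:

# Sketch: with/as context managers close files automatically
with open('myfile.txt', 'w') as file:
    file.write('Hello file world!\n')

with open('myfile.txt') as file:
    print(file.read())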
Part III, Statements and Syntax
See "Test Your Knowledge: Part III Exercises" on page 467 in Chapter 15 for the exercises.
1. Coding basic loops. As you work through this exercise, you’ll wind up with code
that looks like the following:
>>> S = 'spam'
>>> for c in S:
... print(ord(c))
...
115
112
97
109
>>> x = 0
>>> for c in S: x += ord(c) # Or: x = x + ord(c)
...
>>> x
433
>>> x = []
>>> for c in S: x.append(ord(c))
...
>>> x
[115, 112, 97, 109]
>>> list(map(ord, S)) # list() required in 3.X, not 2.X
[115, 112, 97, 109]
>>> [ord(c) for c in S] # map and listcomps automate list builders
[115, 112, 97, 109]
2. Backslash characters. The example prints the bell character (\a) 50 times; assuming your machine can handle it and it's run outside of IDLE, you may get a series of beeps (or one sustained tone, if your machine is fast enough). Hey—I warned you.
3. Sorting dictionaries. Here’s one way to work through this exercise (see Chapter 8
or Chapter 14 if this doesn’t make sense). Remember, you really do have to split
up the keys and sort calls like this because sort returns None. In Python 2.2 and
later, you can iterate through dictionary keys directly without calling keys (e.g.,
for key in D:), but the keys list will not be sorted like it is by this code. In more
recent Pythons, you can achieve the same effect with the sorted built-in, too:
>>> D = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5, 'f':6, 'g':7}
>>> D
{'f': 6, 'c': 3, 'a': 1, 'g': 7, 'e': 5, 'd': 4, 'b': 2}
>>>
>>> keys = list(D.keys()) # list() required in 3.X, not in 2.X
>>> keys.sort()
>>> for key in keys:
... print(key, '=>', D[key])
...
a => 1
b => 2
c => 3
d => 4
e => 5
f => 6
g => 7
>>> for key in sorted(D): # Better, in more recent Pythons
... print(key, '=>', D[key])
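An equivalent pass over key/value pairs works too; a quick sketch with sorted and items (tuples sort on their keys first):

>>> for key, value in sorted(D.items()):
...     print(key, '=>', value)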
4. Program logic alternatives. Here's some sample code for the solutions. For step e, assign the result of 2 ** X to a variable outside the loops of steps a and b, and use it inside the loop (a sketch appears after step f below). Your results may vary a bit; this exercise is mostly designed to get you playing with code alternatives, so anything reasonable gets full credit:
# a
L = [1, 2, 4, 8, 16, 32, 64]
X = 5
i = 0
while i < len(L):
    if 2 ** X == L[i]:
        print('at index', i)
        break
    i += 1
else:
    print(X, 'not found')

# b
L = [1, 2, 4, 8, 16, 32, 64]
X = 5
for p in L:
    if (2 ** X) == p:
        print((2 ** X), 'was found at', L.index(p))
        break
else:
    print(X, 'not found')

# c
L = [1, 2, 4, 8, 16, 32, 64]
X = 5
if (2 ** X) in L:
    print((2 ** X), 'was found at', L.index(2 ** X))
else:
    print(X, 'not found')

# d
X = 5
L = []
for i in range(7): L.append(2 ** i)
print(L)
if (2 ** X) in L:
    print((2 ** X), 'was found at', L.index(2 ** X))
else:
    print(X, 'not found')

# f
X = 5
L = list(map(lambda x: 2**x, range(7)))     # Or [2**x for x in range(7)]
print(L)                                    # list() to print all in 3.X, not 2.X
if (2 ** X) in L:
    print((2 ** X), 'was found at', L.index(2 ** X))
else:
    print(X, 'not found')
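For step e itself, which the text describes but doesn't list, a sketch that hoists the repeated power computation out of step a's loop (the name power is illustrative):

# e
L = [1, 2, 4, 8, 16, 32, 64]
X = 5
power = 2 ** X                      # Compute once, outside the loop
i = 0
while i < len(L):
    if power == L[i]:
        print('at index', i)
        break
    i += 1
else:
    print(X, 'not found')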
5. Code maintenance. There is no fixed solution to show here; see mypydoc.py in the
book’s examples package for my edits on this code as one example.
Part IV, Functions and Generators
See "Test Your Knowledge: Part IV Exercises" on page 663 in Chapter 21 for the exercises.
1. The basics. There’s not much to this one, but notice that using print (and hence
your function) is technically a polymorphic operation, which does the right thing
for each type of object:
% python
>>> def func(x): print(x)
...
>>> func("spam")
spam
>>> func(42)
42
>>> func([1, 2, 3])
[1, 2, 3]
>>> func({'food': 'spam'})
{'food': 'spam'}
2. Arguments. Here’s a sample solution. Remember that you have to use print to see
results in the test calls because a file isn’t the same as code typed interactively;
Python doesn’t normally echo the results of expression statements in files:
def adder(x, y):
    return x + y

print(adder(2, 3))
print(adder('spam', 'eggs'))
print(adder(['a', 'b'], ['c', 'd']))
% python mod.py
5
spameggs
['a', 'b', 'c', 'd']
3. varargs. Two alternative adder functions are shown in the following file, adders.py. The hard part here is figuring out how to initialize an accumulator to an empty value of whatever type is passed in. The first solution uses manual type testing to look for an integer, and an empty slice of the first argument (assumed to be a sequence) if the argument is determined not to be an integer. The second solution uses the first argument to initialize and scan items 2 and beyond, much like one of the min function variants shown in Chapter 18.
The second solution is better. Both of these assume all arguments are of the same type, and neither works on dictionaries (as we saw in Part II, + doesn't work on mixed types or dictionaries). You could add a type test and special code to allow dictionaries, too, but that's extra credit (one possibility is sketched after the test run below).
def adder1(*args):
    print('adder1', end=' ')
    if type(args[0]) == type(0):        # Integer?
        sum = 0                         # Init to zero
    else:                               # else sequence:
        sum = args[0][:0]               # Use empty slice of arg1
    for arg in args:
        sum = sum + arg
    return sum

def adder2(*args):
    print('adder2', end=' ')
    sum = args[0]                       # Init to arg1
    for next in args[1:]:
        sum += next                     # Add items 2..N
    return sum

for func in (adder1, adder2):
    print(func(2, 3, 4))
    print(func('spam', 'eggs', 'toast'))
    print(func(['a', 'b'], ['c', 'd'], ['e', 'f']))
% python adders.py
adder1 9
adder1 spameggstoast
adder1 ['a', 'b', 'c', 'd', 'e', 'f']
adder2 9
adder2 spameggstoast
adder2 ['a', 'b', 'c', 'd', 'e', 'f']
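For the extra-credit dictionary case mentioned above, one hedged possibility (adder3 here is an illustrative name, not part of the book's adders.py):

def adder3(*args):
    if isinstance(args[0], dict):           # Special-case dictionaries:
        sum = {}                            # merge by copy-and-update
        for arg in args:
            sum.update(arg)
        return sum
    sum = args[0]                           # Else: same as adder2
    for next in args[1:]:
        sum += next
    return sum

print(adder3({1: 'a'}, {2: 'b'}))           # {1: 'a', 2: 'b'}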
4. Keywords. Here is my solution to the first and second parts of this exercise (coded
in the file mod.py). To iterate over keyword arguments, use the **args form in the
function header and use a loop (e.g., for x in args.keys(): use args[x]), or use
args.values() to make this the same as summing *args positionals:
def adder(good=1, bad=2, ugly=3):
    return good + bad + ugly

print(adder())
print(adder(5))
print(adder(5, 6))
print(adder(5, 6, 7))
print(adder(ugly=7, good=6, bad=5))
% python mod.py
6
10
14
18
18
# Second part solutions

def adder1(*args):                      # Sum any number of positional args
    tot = args[0]
    for arg in args[1:]:
        tot += arg
    return tot

def adder2(**args):                     # Sum any number of keyword args
    argskeys = list(args.keys())        # list needed in 3.X!
    tot = args[argskeys[0]]
    for key in argskeys[1:]:
        tot += args[key]
    return tot

def adder3(**args):                     # Same, but convert to list of values
    args = list(args.values())          # list needed to index in 3.X!
    tot = args[0]
    for arg in args[1:]:
        tot += arg
    return tot

def adder4(**args):                     # Same, but reuse positional version
    return adder1(*args.values())

print(adder1(1, 2, 3), adder1('aa', 'bb', 'cc'))
print(adder2(a=1, b=2, c=3), adder2(a='aa', b='bb', c='cc'))
print(adder3(a=1, b=2, c=3), adder3(a='aa', b='bb', c='cc'))
print(adder4(a=1, b=2, c=3), adder4(a='aa', b='bb', c='cc'))
5. (and 6.) Dictionary tools. Here are my solutions to exercises 5 and 6 (file dicts.py).
These are just coding exercises, though, because Python 1.5 added the dictionary
methods D.copy() and D1.update(D2) to handle things like copying and adding
(merging) dictionaries. See Chapter 8 for dict.update examples, and Python’s li-
brary manual or O’Reilly’s Python Pocket Reference for more details. X[:] doesn’t
work for dictionaries, as they’re not sequences (see Chapter 8 for details). Also,
remember that if you assign (e = d) rather than copying, you generate a reference
to a shared dictionary object; changing d changes e, too:
def copyDict(old):
    new = {}
    for key in old.keys():
        new[key] = old[key]
    return new

def addDict(d1, d2):
    new = {}
    for key in d1.keys():
        new[key] = d1[key]
    for key in d2.keys():
        new[key] = d2[key]
    return new
% python
>>> from dicts import *
>>> d = {1: 1, 2: 2}
>>> e = copyDict(d)
>>> d[2] = '?'
>>> d
{1: 1, 2: '?'}
>>> e
{1: 1, 2: 2}
>>> x = {1: 1}
>>> y = {2: 2}
>>> z = addDict(x, y)
>>> z
{1: 1, 2: 2}
6. See #5.
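For comparison, the built-in methods mentioned above produce the same effects directly; a quick sketch reusing this session's dictionaries:

>>> e = d.copy()                 # Same effect as copyDict(d)
>>> z = dict(x)                  # Same effect as addDict(x, y):
>>> z.update(y)                  # copy one dict, merge in the other
>>> z
{1: 1, 2: 2}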
7. More argument-matching examples. Here is the sort of interaction you should get,
along with comments that explain the matching that goes on:
def f1(a, b): print(a, b) # Normal args
def f2(a, *b): print(a, b) # Positional varargs
def f3(a, **b): print(a, b) # Keyword varargs
def f4(a, *b, **c): print(a, b, c) # Mixed modes
def f5(a, b=2, c=3): print(a, b, c) # Defaults
def f6(a, b=2, *c): print(a, b, c) # Defaults and positional varargs
% python
>>> f1(1, 2) # Matched by position (order matters)
1 2
>>> f1(b=2, a=1) # Matched by name (order doesn't matter)
1 2
>>> f2(1, 2, 3) # Extra positionals collected in a tuple
1 (2, 3)
>>> f3(1, x=2, y=3) # Extra keywords collected in a dictionary
1 {'x': 2, 'y': 3}
>>> f4(1, 2, 3, x=2, y=3) # Extra of both kinds
1 (2, 3) {'x': 2, 'y': 3}
>>> f5(1) # Both defaults kick in
1 2 3
>>> f5(1, 4) # Only one default used
1 4 3
>>> f6(1) # One argument: matches "a"
1 2 ()
>>> f6(1, 3, 4) # Extra positional collected
1 3 (4,)
8. Primes revisited. Here is the primes example, wrapped up in a function and a module (file primes.py) so it can be run multiple times. I added an if test to trap negatives, 0, and 1. I also changed / to // in this edition to make this solution immune to the Python 3.X / true division changes we studied in Chapter 5, and to enable it to support floating-point numbers (uncomment the from statement and change // to / to see the differences in 2.X):
#from __future__ import division

def prime(y):
    if y <= 1:                   # For some y > 1
        print(y, 'not prime')
    else:
        x = y // 2               # 3.X / fails
        while x > 1:
            if y % x == 0:       # No remainder?
                print(y, 'has factor', x)
                break            # Skip else
            x -= 1
        else:
            print(y, 'is prime')

prime(13); prime(13.0)
prime(15); prime(15.0)
prime(3); prime(2)
prime(1); prime(-3)
Here is the module in action; the // operator allows it to work for floating-point
numbers too, even though it perhaps should not:
% python primes.py
13 is prime
13.0 is prime
15 has factor 5
15.0 has factor 5.0
3 is prime
2 is prime
1 not prime
-3 not prime
This function still isn't very reusable—it could return values, instead of printing them—but it's enough to run experiments. It's also not a strict mathematical prime (floating points work), and it's still inefficient. Improvements are left as exercises for more mathematically minded readers. (Hint: a for loop over range(y, 1, -1) may be a bit quicker than the while, as in the sketch below, but the algorithm is the real bottleneck here.) To time alternatives, use the homegrown timer or standard library timeit modules and coding patterns like those used in Chapter 21's timing sections (and see Solution 10).
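A sketch of the hinted for-loop variant (prime2 is an illustrative name; same algorithm and the same inefficiency, just without the manual counter):

def prime2(y):
    if y <= 1:
        print(y, 'not prime')
        return
    for x in range(y // 2, 1, -1):          # Count down from y // 2 to 2
        if y % x == 0:
            print(y, 'has factor', x)
            break
    else:
        print(y, 'is prime')                # No break: no factor found

prime2(13)                                  # 13 is prime
prime2(15)                                  # 15 has factor 5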
9. Iterations and comprehensions. Here is the sort of code you should write; I may
have a preference, but yours may vary:
>>> values = [2, 4, 9, 16, 25]
>>> import math
>>> res = []
>>> for x in values: res.append(math.sqrt(x))
...
>>> res
[1.4142135623730951, 2.0, 3.0, 4.0, 5.0]
>>> list(map(math.sqrt, values))
[1.4142135623730951, 2.0, 3.0, 4.0, 5.0]
>>> [math.sqrt(x) for x in values]
[1.4142135623730951, 2.0, 3.0, 4.0, 5.0]
>>> list(math.sqrt(x) for x in values)
[1.4142135623730951, 2.0, 3.0, 4.0, 5.0]
10. Timing tools. Here is some code I wrote to time the three square root options, along with the results in CPythons 3.3 and 2.7 and PyPy 1.9 (which implements Python 2.7). Each test takes the best of three runs; each run takes the total time required to call the test function 1,000 times; and each test function iterates 10,000 times. The last result of each function is printed to verify that all three do the same work:
# File timer2.py (2.X and 3.X)
...same as listed in Chapter 21...
# File timesqrt.py

import sys, timer2
reps = 10000
repslist = range(reps)                  # Pull out range list time for 2.X
from math import sqrt                   # Not math.sqrt: adds attr fetch time

def mathMod():
    for i in repslist:
        res = sqrt(i)
    return res

def powCall():
    for i in repslist:
        res = pow(i, .5)
    return res

def powExpr():
    for i in repslist:
        res = i ** .5
    return res

print(sys.version)
for test in (mathMod, powCall, powExpr):
    elapsed, result = timer2.bestoftotal(test, _reps1=3, _reps=1000)
    print('%s: %.5f => %s' % (test.__name__, elapsed, result))
Following are the test results for the three Pythons. The 3.3 and 2.7 results are
roughly twice as fast as 3.0 and 2.6 in the prior edition, due largely to a faster test
machine. For each Python tested, it looks like the math module is quicker than the
** expression, which is quicker than the pow call; however, you should try this with
your code and on your own machine and version of Python. Also, note that Python
3.3 is essentially twice as slow as 2.7 on this test, and PyPy is a rough order of
magnitude (10X) faster than both CPythons, despite the fact that this is running
floating-point math and iterations. Later versions of any of these Pythons might
differ, so time this in the future to see for yourself:
c:\code> py -3 timesqrt.py
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
mathMod: 2.04481 => 99.99499987499375
powCall: 3.40973 => 99.99499987499375
powExpr: 2.56458 => 99.99499987499375
c:\code> py -2 timesqrt.py
2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)]
mathMod: 1.04337 => 99.994999875
powCall: 2.57516 => 99.994999875
powExpr: 1.89560 => 99.994999875
c:\code> c:\pypy\pypy-1.9\pypy timesqrt.py
2.7.2 (341e1e3821ff, Jun 07 2012, 15:43:00)
[PyPy 1.9.0 with MSC v.1500 32 bit]
mathMod: 0.07491 => 99.994999875
powCall: 0.85678 => 99.994999875
powExpr: 0.85453 => 99.994999875
To time the relative speeds of Python 3.X and 2.7 dictionary comprehensions and equivalent for loops interactively, you can run a session like the following. It appears that the two are roughly the same in this regard under Python 3.3; unlike list comprehensions, though, dictionary comprehensions are only slightly faster than manual loops today (and the difference isn't exactly earth-shattering—at the end we save about half a second when making 50 dictionaries of 1,000,000 items each). Again, rather than taking these results as gospel you should investigate further on your own, on your computer and with your Python:
C:\code> c:\python33\python
>>>
>>> def dictcomp(I):
...     return {i: i for i in range(I)}
...
>>> def dictloop(I):
...     new = {}
...     for i in range(I): new[i] = i
...     return new
...
>>> dictcomp(10)
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
>>> dictloop(10)
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
>>>
>>> from timer2 import total, bestof
>>> bestof(dictcomp, 10000)[0] # 10,000-item dict
0.0017095345403959072
>>> bestof(dictloop, 10000)[0]
0.002097576400046819
>>>
>>> bestof(dictcomp, 100000)[0] # 100,000-items: 10X slower
0.012716923463358398
>>> bestof(dictloop, 100000)[0]
0.014129806355413166
>>>
>>> bestof(dictcomp, 1000000)[0] # 1 of 1M-items: 10X time
0.11614425187337929
>>> bestof(dictloop, 1000000)[0]
0.1331144855439561
>>>
>>> total(dictcomp, 1000000, _reps=50)[0] # Total to make 50 1M-item dicts
5.8162020671780965
>>> total(dictloop, 1000000, _reps=50)[0]
6.626680761285343
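The standard library's timeit module could be used for the same comparison; a rough sketch (timeit.repeat accepts callables like these, and the number/repeat values here are illustrative):

import timeit
# Best of 3 runs, one call per run, reusing dictcomp/dictloop from above
print(min(timeit.repeat(lambda: dictcomp(10000), number=1, repeat=3)))
print(min(timeit.repeat(lambda: dictloop(10000), number=1, repeat=3)))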
11. Recursive functions. I coded this function as follows; a simple range, comprehension, or map will do the job here as well, but recursion is useful enough to experiment with here (print is a function in 3.X only, unless you import it from __future__ or code your own equivalent):
def countdown(N):
    if N == 0:
        print('stop')                   # 2.X: print 'stop'
    else:
        print(N, end=' ')               # 2.X: print N,
        countdown(N-1)
>>> countdown(5)
5 4 3 2 1 stop
>>> countdown(20)
20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 stop
# Nonrecursive options:
>>> list(range(5, 0, -1))
[5, 4, 3, 2, 1]

# On 3.X only:
>>> t = [print(i, end=' ') for i in range(5, 0, -1)]
5 4 3 2 1
>>> t = list(map(lambda x: print(x, end=' '), range(5, 0, -1)))
5 4 3 2 1
I didn’t include a generator-based solution in this exercise on the grounds of merit
(and humanity!), but one is listed below; all the other techniques seem much sim-
pler in this case—a good example of cases where generators should probably be
avoided. Remember that generators produce no results until iterated, so we need
a for or yield from here (yield from works in 3.3 and later only):
def countdown2(N):                      # Generator function, recursive
    if N == 0:
        yield 'stop'
    else:
        yield N
        for x in countdown2(N-1): yield x   # 3.3+: yield from countdown2(N-1)
>>> list(countdown2(5))
[5, 4, 3, 2, 1, 'stop']
# Nonrecursive options:
>>> def countdown3():                   # Generator function, simpler
...     yield from range(5, 0, -1)      # Pre 3.3: for x in range(): yield x
...
>>> list(countdown3())
[5, 4, 3, 2, 1]

>>> list(x for x in range(5, 0, -1))    # Equivalent generator expression
[5, 4, 3, 2, 1]
>>> list(range(5, 0, -1))               # Equivalent nongenerator form
[5, 4, 3, 2, 1]
12. Computing factorials. The following file shows how I coded this exercise; it runs on Python 3.X and 2.X, and its output on 3.3 is given in a string literal at the end of the file. Naturally, there are many possible variations on its code; its ranges, for instance, could run from 2..N+1 to skip an iteration, and fact2 could use reduce(operator.mul, range(N, 1, -1)) to avoid a lambda.
#!python
from __future__ import print_function              # File factorials.py
from functools import reduce
from timeit import repeat
import math

def fact0(N):                                      # Recursive
    if N == 1:                                     # Fails at 999 by default
        return N
    else:
        return N * fact0(N-1)

def fact1(N):
    return N if N == 1 else N * fact1(N-1)         # Recursive, one-liner

def fact2(N):                                      # Functional
    return reduce(lambda x, y: x * y, range(1, N+1))

def fact3(N):
    res = 1
    for i in range(1, N+1): res *= i               # Iterative
    return res

def fact4(N):
    return math.factorial(N)                       # Stdlib "batteries"

# Tests
print(fact0(6), fact1(6), fact2(6), fact3(6), fact4(6))     # 6*5*4*3*2*1: all 720
print(fact0(500) == fact1(500) == fact2(500) == fact3(500) == fact4(500))  # True
for test in (fact0, fact1, fact2, fact3, fact4):
    print(test.__name__, min(repeat(stmt=lambda: test(500), number=20, repeat=3)))
r"""
C:\code> py -3 factorials.py
720 720 720 720 720
True
fact0 0.003990868798355564
fact1 0.003901433457907475
fact2 0.002732909419593966
fact3 0.002052614370939676
fact4 0.0003401475243271501
"""
Conclusions: recursion is slowest on my Python and machine, and fails once N reaches 999 due to sys's default recursion depth limit; per Chapter 19, this limit can be increased, but simple loops or the standard library tool seem the best route here in any event.
This general finding holds true often. For instance, ''.join(reversed(S)) may be the preferred way to reverse a string, even though recursive solutions are possible. Time the following to see how: as for factorials in 3.X, recursion is today an order of magnitude slower in CPython, though these results vary in PyPy:
def rev1(S):
    if len(S) == 1:
        return S
    else:
        return S[-1] + rev1(S[:-1])     # Recursive: 10x slower in CPython today

def rev2(S):
    return ''.join(reversed(S))         # Nonrecursive iterable: simpler, faster

def rev3(S):
    return S[::-1]                      # Even better?: sequence reversal by slice
Part V, Modules and Packages
See "Test Your Knowledge: Part V Exercises" on page 778 in Chapter 25 for the exercises.
1. Import basics. When you’re done, your file (mymod.py) and interaction should look
similar to the following; remember that Python can read a whole file into a list of
line strings, and the len built-in returns the lengths of strings and lists:
def countLines(name):
    file = open(name)
    return len(file.readlines())

def countChars(name):
    return len(open(name).read())

def test(name):                                    # Or pass file object
    return countLines(name), countChars(name)      # Or return a dictionary
% python
>>> import mymod
>>> mymod.test('mymod.py')
(10, 291)
Your counts may vary, as mine may or may not include comments and an extra
line at the end. Note that these functions load the entire file in memory all at once,
so they won’t work for pathologically large files too big for your machine’s memory.
To be more robust, you could read line by line with iterators instead and count as
you go:
def countLines(name):
    tot = 0
    for line in open(name): tot += 1
    return tot

def countChars(name):
    tot = 0
    for line in open(name): tot += len(line)
    return tot
A generator expression can have the same effect (though the instructor might take
off points for excessive magic!):
def countlines(name): return sum(+1 for line in open(name))
def countchars(name): return sum(len(line) for line in open(name))
On Unix, you can verify your output with a wc command; on Windows, right-click
on your file to view its properties. Note that your script may report fewer characters
than Windows does—for portability, Python converts Windows \r\n line-end
markers to \n, thereby dropping 1 byte (character) per line. To match byte counts
with Windows exactly, you must open in binary mode ('rb'), or add the number
of bytes corresponding to the number of lines. See Chapter 9 and Chapter 37 for
more on end-of-line translations in text files.
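Here's a quick sketch of that binary-mode variant (countBytes is an illustrative name, not from the book's files):

def countBytes(name):
    return len(open(name, 'rb').read())     # 'rb': no \r\n-to-\n translation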
The "ambitious" part of this exercise (passing in a file object so you only open the file once) will require you to use the seek method of the built-in file object. It works like C's fseek call (and may call it behind the scenes): seek resets the current position in the file to a passed-in offset. After a seek, future input/output operations are relative to the new position. To rewind to the start of a file without closing and reopening it, call file.seek(0); the file read methods all pick up at the current position in the file, so you need to rewind to reread. Here's what this tweak would look like:
def countLines(file):
    file.seek(0)                                   # Rewind to start of file
    return len(file.readlines())

def countChars(file):
    file.seek(0)                                   # Ditto (rewind if needed)
    return len(file.read())

def test(name):
    file = open(name)                              # Pass file object
    return countLines(file), countChars(file)      # Open file only once
>>> import mymod2
>>> mymod2.test("mymod2.py")
(11, 392)
2. from/from *. Here’s the from * part; replace * with countChars to do the rest:
% python
>>> from mymod import *
>>> countChars("mymod.py")
291
3. __main__. If you code it properly, this file works in either mode—program run or
module import:
def countLines(name):
    file = open(name)
    return len(file.readlines())

def countChars(name):
    return len(open(name).read())

def test(name):                                    # Or pass file object
    return countLines(name), countChars(name)      # Or return a dictionary

if __name__ == '__main__':
    print(test('mymod.py'))
% python mymod.py
(13, 346)
This is where I would probably begin to consider using command-line arguments or user input to provide the filename to be counted, instead of hardcoding it in the script (see Chapter 25 for more on sys.argv, and Chapter 10 for more on input; use raw_input instead in 2.X):
if __name__ == '__main__':
    print(test(input('Enter file name:')))         # Console (raw_input in 2.X)

if __name__ == '__main__':
    import sys                                     # Command line
    print(test(sys.argv[1]))
4. Nested imports. Here is my solution (file myclient.py):
from mymod import countLines, countChars
print(countLines('mymod.py'), countChars('mymod.py'))
% python myclient.py
13 346
As for the rest of this one, mymod’s functions are accessible (that is, importable) from
the top level of myclient, since from simply assigns to names in the importer (it
works as if mymod’s defs appeared in myclient). For example, another file can say:
import myclient
myclient.countLines(...)
from myclient import countChars
countChars(...)
If myclient used import instead of from, you’d need to use a path to get to the
functions in mymod through myclient:
import myclient
myclient.mymod.countLines(...)
from myclient import mymod
mymod.countChars(...)
In general, you can define collector modules that import all the names from other modules so they're available in a single convenience module. The following partial code, for example, creates three different copies of the name somename: mod1.somename, collector.somename, and __main__.somename; all three share the same integer object initially, and only the name somename exists at the interactive prompt as is:
# File mod1.py
somename = 42
# File collector.py
from mod1 import * # Collect lots of names here
from mod2 import * # from assigns to my names
from mod3 import *
>>> from collector import somename
5. Package imports. For this, I put the mymod.py solution file listed for exercise 3 into
a directory package. The following is what I did in a Windows console interface to
set up the directory and the __init__.py file that it’s required to have until Python
3.3; you’ll need to interpolate for other platforms (e.g., use cp and vi instead of
copy and notepad). This works in any directory (I’m using my own code directory
here), and you can do some of this from a file explorer GUI, too.
When I was done, I had a mypkg subdirectory that contained the files
__init__.py and mymod.py. Until Python 3.3’s namespace package extension, you
need an __init__.py in the mypkg directory, but not in its parent; technically,
mypkg is located in the home directory component of the module search path.
Notice how a print statement coded in the directory’s initialization file fires only
the first time it is imported, not the second; raw strings are also used here to avoid
escape issues in the file paths:
C:\code> mkdir mypkg
C:\code> copy mymod.py mypkg\mymod.py
C:\code> notepad mypkg\__init__.py
...coded a print statement...
C:\code> python
>>> import mypkg.mymod
initializing mypkg
>>> mypkg.mymod.countLines(r'mypkg\mymod.py')
13
>>> from mypkg.mymod import countChars
>>> countChars(r'mypkg\mymod.py')
346
6. Reloads. This exercise just asks you to experiment with changing the changer.py
example in the book, so there’s nothing to show here.
7. Circular imports. The short story is that importing recur2 first works because the
recursive import then happens at the import in recur1, not at a from in recur2.
The long story goes like this: importing recur2 first works because the recursive
import from recur1 to recur2 fetches recur2 as a whole, instead of getting specific
names. recur2 is incomplete when it’s imported from recur1, but because it uses
import instead of from, you’re safe: Python finds and returns the already created
recur2 module object and continues to run the rest of recur1 without a glitch.
When the recur2 import resumes, the second from finds the name Y in recur1 (it’s
been run completely), so no error is reported.
Running a file as a script is not the same as importing it as a module; these cases
are the same as running the first import or from in the script interactively. For
instance, running recur1 as a script works, because it is the same as importing
recur2 interactively, as recur2 is the first module imported in recur1. Running
recur2 as a script fails for the same reason—it’s the same as running its first import
interactively.
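The recur1 and recur2 files themselves aren't relisted in this appendix; a minimal pair consistent with the description above might look like this sketch (contents assumed for illustration, not copied from the book's examples package):

# File recur1.py
X = 1
import recur2               # Recursive import: fetches module as a whole
Y = 2

# File recur2.py
from recur1 import X        # OK: X exists by the time this runs
from recur1 import Y        # Fails only if recur1 was imported first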
Part VI, Classes and OOP
See “Test Your Knowledge: Part VI Exercises” on page 1072 in Chapter 32 for the
exercises.
1. Inheritance. Here's the solution code for this exercise (file adder.py), along with some interactive tests. The __add__ overload has to appear only once, in the superclass, as it invokes type-specific add methods in subclasses:
class Adder:
    def add(self, x, y):
        print('not implemented!')
    def __init__(self, start=[]):
        self.data = start
    def __add__(self, other):               # Or in subclasses?
        return self.add(self.data, other)   # Or return type?

class ListAdder(Adder):
    def add(self, x, y):
        return x + y

class DictAdder(Adder):
    def add(self, x, y):
        new = {}
        for k in x.keys(): new[k] = x[k]
        for k in y.keys(): new[k] = y[k]
        return new
% python
>>> from adder import *
>>> x = Adder()
>>> x.add(1, 2)
not implemented!
>>> x = ListAdder()
>>> x.add([1], [2])
[1, 2]
>>> x = DictAdder()
>>> x.add({1:1}, {2:2})
{1: 1, 2: 2}
>>> x = Adder([1])
>>> x + [2]
not implemented!
>>>
>>> x = ListAdder([1])
>>> x + [2]
[1, 2]
>>> [2] + x
In 3.3: TypeError: can only concatenate list (not "ListAdder") to list
Earlier: TypeError: __add__ nor __radd__ defined for these operands
Notice in the last test that you get an error for expressions where a class instance appears on the right of a +; if you want to fix this, use __radd__ methods, as described in "Operator Overloading" in Chapter 30.
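A hedged sketch of that __radd__ fix, added to the Adder superclass from the listing above (assuming right-side addition should reuse the same type-specific add logic):

class Adder:
    ...rest as listed above...
    def __radd__(self, other):                  # For "other + instance"
        return self.add(other, self.data)       # Reuse the subclass add

With this in place, [2] + x delegates to ListAdder.add just as x + [2] does.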
If you are saving a value in the instance anyhow, you might as well rewrite the
add method to take just one argument, in the spirit of other examples in this part
of the book (this is adder2.py):
class Adder:
    def __init__(self, start=[]):
        self.data = start
    def __add__(self, other):               # Pass a single argument
        return self.add(other)              # The left side is in self
    def add(self, y):
        print('not implemented!')

class ListAdder(Adder):
    def add(self, y):
        return self.data + y

class DictAdder(Adder):
    def add(self, y):
        d = self.data.copy()                # Change to use self.data instead of x
        d.update(y)                         # Or "cheat" by using quicker built-ins
        return d

x = ListAdder([1, 2, 3])
y = x + [4, 5, 6]
print(y)                                    # Prints [1, 2, 3, 4, 5, 6]

z = DictAdder(dict(name='Bob')) + {'a':1}
print(z)                                    # Prints {'name': 'Bob', 'a': 1}
Because values are attached to objects rather than passed around, this version is
arguably more object-oriented. And, once you’ve gotten to this point, you’ll prob-
ably find that you can get rid of add altogether and simply define type-specific
__add__ methods in the two subclasses.
2. Operator overloading. The solution code (file mylist.py) uses a handful of operator overloading methods we explored in Chapter 30. Copying the initial value in the constructor is important because it may be mutable; you don't want to change or have a reference to an object that's possibly shared somewhere outside the class. The __getattr__ method routes calls to the wrapped list. For hints on an easier way to code this in Python 2.2 and later, see "Extending Types by Subclassing" on page 981 in Chapter 32:
class MyList:
    def __init__(self, start):
        #self.wrapped = start[:]            # Copy start: no side effects
        self.wrapped = list(start)          # Make sure it's a list here
    def __add__(self, other):
        return MyList(self.wrapped + other)
    def __mul__(self, time):
        return MyList(self.wrapped * time)
    def __getitem__(self, offset):          # Also passed a slice in 3.X
        return self.wrapped[offset]         # For iteration if no __iter__
    def __len__(self):
        return len(self.wrapped)
    def __getslice__(self, low, high):      # Ignored in 3.X: uses __getitem__
        return MyList(self.wrapped[low:high])
    def append(self, node):
        self.wrapped.append(node)
    def __getattr__(self, name):            # Other methods: sort/reverse/etc
        return getattr(self.wrapped, name)
    def __repr__(self):                     # Catchall display method
        return repr(self.wrapped)

if __name__ == '__main__':
    x = MyList('spam')
    print(x)
    print(x[2])
    print(x[1:])
    print(x + ['eggs'])
    print(x * 3)
    x.append('a')
    x.sort()
    print(' '.join(c for c in x))
c:\code> python mylist.py
['s', 'p', 'a', 'm']
a
['p', 'a', 'm']
['s', 'p', 'a', 'm', 'eggs']
['s', 'p', 'a', 'm', 's', 'p', 'a', 'm', 's', 'p', 'a', 'm']
a a m p s
Note that it's important to copy the start value by calling list instead of slicing here, because otherwise the result may not be a true list and so will not respond to expected list methods, such as append (e.g., slicing a string returns another string, not a list). You would be able to copy a MyList start value by slicing because its class overloads the slicing operation and provides the expected list interface; however, you need to avoid slice-based copying for objects such as strings.
3. Subclassing. My solution (mysub.py) appears as follows. Your solution should be
similar:
from mylist import MyList

class MyListSub(MyList):
    calls = 0                               # Shared by instances
    def __init__(self, start):
        self.adds = 0                       # Varies in each instance
        MyList.__init__(self, start)
    def __add__(self, other):
        print('add: ' + str(other))
        MyListSub.calls += 1                # Class-wide counter
        self.adds += 1                      # Per-instance counts
        return MyList.__add__(self, other)
    def stats(self):
        return self.calls, self.adds        # All adds, my adds

if __name__ == '__main__':
    x = MyListSub('spam')
    y = MyListSub('foo')
    print(x[2])
    print(x[1:])
    print(x + ['eggs'])
    print(x + ['toast'])
    print(y + ['bar'])
    print(x.stats())
c:\code> python mysub.py
a
['p', 'a', 'm']
add: ['eggs']
['s', 'p', 'a', 'm', 'eggs']
add: ['toast']
['s', 'p', 'a', 'm', 'toast']
add: ['bar']
['f', 'o', 'o', 'bar']
(3, 2)
4. Attribute methods. I worked through this exercise as follows. Notice that in Python 2.X's classic classes, operators try to fetch attributes through __getattr__, too; you need to return a value to make them work. As noted in Chapter 32 and elsewhere, __getattr__ is not called for built-in operations in Python 3.X (and in 2.X if new-style classes are used), so the expressions aren't intercepted at all here; in new-style classes, a class like this must redefine __X__ operator overloading methods explicitly. More on this in Chapter 28, Chapter 31, Chapter 32, Chapter 38, and Chapter 39: it can impact much code!
c:\code> py -2
>>> class Attrs:
...     def __getattr__(self, name):
...         print('get %s' % name)
...     def __setattr__(self, name, value):
...         print('set %s %s' % (name, value))
...
>>> x = Attrs()
>>> x.append
get append
>>> x.spam = 'pork'
set spam pork
>>> x + 2
get __coerce__
TypeError: 'NoneType' object is not callable
>>> x[1]
get __getitem__
TypeError: 'NoneType' object is not callable
>>> x[1:5]
get __getslice__
TypeError: 'NoneType' object is not callable
c:\code> py -3
>>> ...same startup code...
>>> x + 2
TypeError: unsupported operand type(s) for +: 'Attrs' and 'int'
>>> x[1]
TypeError: 'Attrs' object does not support indexing
>>> x[1:5]
TypeError: 'Attrs' object is not subscriptable
5. Set objects. Here’s the sort of interaction you should get. Comments explain which
methods are called. Also, note that sets are a built-in type in Python today, so this
is largely just a coding exercise (see Chapter 5 for more on sets).
% python
>>> from setwrapper import Set
>>> x = Set([1, 2, 3, 4]) # Runs __init__
>>> y = Set([3, 4, 5])
>>> x & y # __and__, intersect, then __repr__
Set:[3, 4]
>>> x | y # __or__, union, then __repr__
Set:[1, 2, 3, 4, 5]
>>> z = Set("hello") # __init__ removes duplicates
>>> z[0], z[-1], z[2:] # __getitem__
('h', 'o', ['l', 'o'])
>>> for c in z: print(c, end=' ') # __iter__ (else __getitem__) [3.X print]
...
h e l o
>>> ''.join(c.upper() for c in z) # __iter__ (else __getitem__)
'HELO'
>>> len(z), z # __len__, __repr__
(4, Set:['h', 'e', 'l', 'o'])
>>> z & "mello", z | "mello"
(Set:['e', 'l', 'o'], Set:['h', 'e', 'l', 'o', 'm'])
My solution to the multiple-operand extension subclass looks like the following
class (file multiset.py). It needs to replace only two methods in the original set. The
class’s documentation string explains how it works:
from setwrapper import Set

class MultiSet(Set):
    """
    Inherits all Set names, but extends intersect and union to support
    multiple operands; note that "self" is still the first argument
    (stored in the *args argument now); also note that the inherited
    & and | operators call the new methods here with 2 arguments, but
    processing more than 2 requires a method call, not an expression;
    intersect doesn't remove duplicates here: the Set constructor does;
    """
    def intersect(self, *others):
        res = []
        for x in self:                      # Scan first sequence
            for other in others:            # For all other args
                if x not in other: break    # Item in each one?
            else:                           # No: break out of loop
                res.append(x)               # Yes: add item to end
        return Set(res)

    def union(*args):                       # self is args[0]
        res = []
        for seq in args:                    # For all args
            for x in seq:                   # For all nodes
                if not x in res:
                    res.append(x)           # Add new items to result
        return Set(res)
Your interaction with the extension will look something like the following. Note that you can intersect by using & or calling intersect, but you must call intersect for three or more operands; & is a binary (two-sided) operator. Also, note that we could have called MultiSet simply Set to make this change more transparent if we used setwrapper.Set to refer to the original within multiset (the as clause in an import could rename the class too if desired):
>>> from multiset import *
>>> x = MultiSet([1, 2, 3, 4])
>>> y = MultiSet([3, 4, 5])
>>> z = MultiSet([0, 1, 2])
>>> x & y, x | y # Two operands
(Set:[3, 4], Set:[1, 2, 3, 4, 5])
>>> x.intersect(y, z) # Three operands
Set:[]
>>> x.union(y, z)
Set:[1, 2, 3, 4, 5, 0]
>>> x.intersect([1,2,3], [2,3,4], [1,2,3]) # Four operands
Set:[2, 3]
>>> x.union(range(10)) # Non-MultiSets work, too
Set:[1, 2, 3, 4, 0, 5, 6, 7, 8, 9]
>>> w = MultiSet('spam') # String sets
>>> w
Set:['s', 'p', 'a', 'm']
>>> ''.join(w | 'super')
'spamuer'
>>> (w | 'super') & MultiSet('slots')
Set:['s']
6. Class tree links. Here is the way I changed the lister classes, and a rerun of the test
to show its format. Do the same for the dir-based version, and also do this when
formatting class objects in the tree climber variant:
class ListInstance:
    def __attrnames(self):
        ...unchanged...
    def __str__(self):
        return '<Instance of %s(%s), address %s:\n%s>' % (
            self.__class__.__name__,            # My class's name
            self.__supers(),                    # My class's own supers
            id(self),                           # My address
            self.__attrnames())                 # name=value list
    def __supers(self):
        names = []
        for super in self.__class__.__bases__:  # One level up from class
            names.append(super.__name__)        # name, not str(super)
        return ', '.join(names)
        # Or: ', '.join(super.__name__ for super in self.__class__.__bases__)
c:\code> py listinstance-exercise.py
<Instance of Sub(Super, ListInstance), address 43671000:
data1=spam
data2=eggs
data3=42
>
7. Composition. My solution is as follows (file lunch.py), with comments from the
description mixed in with the code. This is one case where it’s probably easier to
express a problem in Python than it is in English:
class Lunch:
    def __init__(self):                     # Make/embed Customer, Employee
        self.cust = Customer()
        self.empl = Employee()
    def order(self, foodName):              # Start Customer order simulation
        self.cust.placeOrder(foodName, self.empl)
    def result(self):                       # Ask the Customer about its Food
        self.cust.printFood()

class Customer:
    def __init__(self):                     # Initialize my food to None
        self.food = None
    def placeOrder(self, foodName, employee):    # Place order with Employee
        self.food = employee.takeOrder(foodName)
    def printFood(self):                    # Print the name of my food
        print(self.food.name)

class Employee:
    def takeOrder(self, foodName):          # Return Food, with desired name
        return Food(foodName)

class Food:
    def __init__(self, name):               # Store food name
        self.name = name

if __name__ == '__main__':
    x = Lunch()                             # Self-test code
    x.order('burritos')                     # If run, not imported
    x.result()
    x.order('pizza')
    x.result()
% python lunch.py
burritos
pizza
8. Zoo animal hierarchy. Here is the way I coded the taxonomy in Python (file zoo.py); it's artificial, but the general coding pattern applies to many real structures, from GUIs to employee databases to spacecraft. Notice that the self.speak reference in Animal triggers an independent inheritance search, which finds speak in a subclass. Test this interactively per the exercise description (a sample session is sketched after the listing). Try extending this hierarchy with new classes, and making instances of various classes in the tree:
class Animal:
    def reply(self): self.speak()           # Back to subclass
    def speak(self): print('spam')          # Custom message

class Mammal(Animal):
    def speak(self): print('huh?')

class Cat(Mammal):
    def speak(self): print('meow')

class Dog(Mammal):
    def speak(self): print('bark')

class Primate(Mammal):
    def speak(self): print('Hello world!')

class Hacker(Primate): pass                 # Inherit from Primate
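A sketch of the sort of session the exercise asks for; the output follows directly from the class tree above:

% python
>>> from zoo import Cat, Hacker
>>> spot = Cat()
>>> spot.reply()                # Animal.reply: finds Cat.speak
meow
>>> data = Hacker()
>>> data.reply()                # Animal.reply: finds Primate.speak
Hello world!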
9. The Dead Parrot Sketch. Here’s how I implemented this one (file parrot.py). Notice
how the line method in the Actor superclass works: by accessing self attributes
twice, it sends Python back to the instance twice, and hence invokes two inheritance
searches—self.name and self.says() find information in the specific subclasses:
class Actor:
    def line(self): print(self.name + ':', repr(self.says()))

class Customer(Actor):
    name = 'customer'
    def says(self):  return "that's one ex-bird!"

class Clerk(Actor):
    name = 'clerk'
    def says(self):  return "no it isn't..."

class Parrot(Actor):
    name = 'parrot'
    def says(self):  return None

class Scene:
    def __init__(self):
        self.clerk = Clerk()           # Embed some instances
        self.customer = Customer()     # Scene is a composite
        self.subject = Parrot()

    def action(self):
        self.customer.line()           # Delegate to embedded
        self.clerk.line()
        self.subject.line()
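Running the composite shows the delegation in action; a minimal session sketch (the displays follow from line's use of repr):

>>> from parrot import Scene
>>> Scene().action()           # Each line() bounces back to a subclass
customer: "that's one ex-bird!"
clerk: "no it isn't..."
parrot: None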
Part VII, Exceptions and Tools
See “Test Your Knowledge: Part VII Exercises” on page 1161 in Chapter 36 for the
exercises.
1. try/except. My version of the oops function (file oops.py) follows. As for the non-
coding questions, changing oops to raise a KeyError instead of an IndexError means
that the try handler won’t catch the exception—it “percolates” to the top level and
triggers Python’s default error message. The names KeyError and IndexError come
from the outermost built-in names scope (the B in “LEGB”). Import builtins in
3.X (and __builtin__ in Python 2.X) and pass it as an argument to the dir function
to see this for yourself.
def oops():
    raise IndexError()

def doomed():
    try:
        oops()
    except IndexError:
        print('caught an index error!')
    else:
        print('no error caught...')

if __name__ == '__main__': doomed()

% python oops.py
caught an index error!
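To see the percolation just described, change the raise statement; a minimal sketch (the handler no longer matches, so Python's default handler takes over and prints a standard traceback):

def oops():
    raise KeyError()                   # No longer an IndexError

try:
    oops()
except IndexError:                     # Doesn't match: exception percolates
    print('caught an index error!')

# Output is roughly:
# Traceback (most recent call last):
#   ...
# KeyError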
2. Exception objects and lists. Here’s the way I extended this module for an exception
of my own, file oops2.py:
from __future__ import print_function    # 2.X

class MyError(Exception): pass

def oops():
    raise MyError('Spam!')

def doomed():
    try:
        oops()
    except IndexError:
        print('caught an index error!')
    except MyError as data:
        print('caught error:', MyError, data)
    else:
        print('no error caught...')

if __name__ == '__main__':
    doomed()
% python oops2.py
caught error: <class '__main__.MyError'> Spam!
Like all class exceptions, the instance is accessible via the as variable data; the error
message shows both the class (<...>) and its instance (Spam!). The instance must
be inheriting both an __init__ and a __repr__ or __str__ from Python's Exception
class, or it would print much like the class does. See Chapter 35 for details on
how this works in built-in exception classes.
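For example (a minimal sketch), providing your own __str__ in the exception class overrides the inherited display used when the instance is printed:

class MyError(Exception):
    def __str__(self):
        return 'my custom display'     # Overrides Exception's display

try:
    raise MyError('Spam!')
except MyError as data:
    print(data)                        # Prints: my custom display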
3. Error handling. Here’s one way to solve this one (file exctools.py). I did my tests in
a file, rather than interactively, but the results are similar enough for full credit.
Notice that the empty except and sys.exc_info approach used here will catch exit-
related exceptions that listing Exception with an as variable won’t; that’s probably
not ideal in most applications code, but might be useful in a tool like this designed
to work as a sort of exceptions firewall.
import sys, traceback

def safe(callee, *pargs, **kargs):
    try:
        callee(*pargs, **kargs)        # Catch everything else
    except:                            # Or "except Exception as E:"
        traceback.print_exc()
        print('Got %s %s' % (sys.exc_info()[0], sys.exc_info()[1]))

if __name__ == '__main__':
    import oops2
    safe(oops2.oops)

c:\code> py -3 exctools.py
Traceback (most recent call last):
  File "C:\code\exctools.py", line 5, in safe
    callee(*pargs, **kargs)            # Catch everything else
  File "C:\code\oops2.py", line 6, in oops
    raise MyError('Spam!')
oops2.MyError: Spam!
Got <class 'oops2.MyError'> Spam!
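The point about exit-related exceptions can be demonstrated directly; a minimal sketch, relying on the fact that SystemExit derives from BaseException but not Exception:

import sys

try:
    sys.exit(1)                        # Raises SystemExit
except Exception:
    print('not reached')               # SystemExit is not an Exception
except:
    print('caught by bare except')     # But an empty except catches it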
The following sort of code turns this into a function decorator that can wrap
and catch exceptions raised by any function, using techniques introduced in
Chapter 32 but covered more fully in Chapter 39 in the next part of the book;
it augments a function, rather than expecting it to be passed in explicitly:
import sys, traceback

def safe(callee):
    def callproxy(*pargs, **kargs):
        try:
            return callee(*pargs, **kargs)
        except:
            traceback.print_exc()
            print('Got %s %s' % (sys.exc_info()[0], sys.exc_info()[1]))
            raise
    return callproxy

if __name__ == '__main__':
    import oops2

    @safe
    def test():
        oops2.oops()

    test()
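Note that because decoration simply rebinds the name, the @safe line is equivalent to running test = safe(test) after the def: calls to test actually invoke the callproxy wrapper, which reports the exception and then reraises it so callers still see the failure.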
4. Self-study examples. Here are a few examples for you to study as time allows; for
more, see follow-up books—such as Programming Python, from which these
examples were borrowed or derived—and the Web:
# Find the largest Python source file in a single directory

import os, glob
dirname = r'C:\Python33\Lib'

allsizes = []
allpy = glob.glob(dirname + os.sep + '*.py')
for filename in allpy:
    filesize = os.path.getsize(filename)
    allsizes.append((filesize, filename))

allsizes.sort()
print(allsizes[:2])
print(allsizes[-2:])
# Find the largest Python source file in an entire directory tree

import sys, os, pprint
if sys.platform[:3] == 'win':
    dirname = r'C:\Python33\Lib'
else:
    dirname = '/usr/lib/python'

allsizes = []
for (thisDir, subsHere, filesHere) in os.walk(dirname):
    for filename in filesHere:
        if filename.endswith('.py'):
            fullname = os.path.join(thisDir, filename)
            fullsize = os.path.getsize(fullname)
            allsizes.append((fullsize, fullname))

allsizes.sort()
pprint.pprint(allsizes[:2])
pprint.pprint(allsizes[-2:])
# Find the largest Python source file on the module import search path

import sys, os, pprint
visited = {}
allsizes = []

for srcdir in sys.path:
    for (thisDir, subsHere, filesHere) in os.walk(srcdir):
        thisDir = os.path.normpath(thisDir)
        if thisDir.upper() in visited:
            continue
        else:
            visited[thisDir.upper()] = True
        for filename in filesHere:
            if filename.endswith('.py'):
                pypath = os.path.join(thisDir, filename)
                try:
                    pysize = os.path.getsize(pypath)
                except:
                    print('skipping', pypath)
                else:
                    allsizes.append((pysize, pypath))    # Record only if size was fetched

allsizes.sort()
pprint.pprint(allsizes[:3])
pprint.pprint(allsizes[-3:])
# Sum columns in a text file separated by commas

filename = 'data.txt'
sums = {}

for line in open(filename):
    cols = line.split(',')
    nums = [int(col) for col in cols]
    for (ix, num) in enumerate(nums):
        sums[ix] = sums.get(ix, 0) + num

for key in sorted(sums):
    print(key, '=', sums[key])
# Similar to prior, but using lists instead of dictionaries for sums

import sys
filename = sys.argv[1]
numcols = int(sys.argv[2])
totals = [0] * numcols

for line in open(filename):
    cols = line.split(',')
    nums = [int(x) for x in cols]
    totals = [(x + y) for (x, y) in zip(totals, nums)]

print(totals)
# Test for regressions in the output of a set of scripts

import os
testscripts = [dict(script='test1.py', args=''),       # Or glob script/args dir
               dict(script='test2.py', args='spam')]

for testcase in testscripts:
    commandline = '%(script)s %(args)s' % testcase
    output = os.popen(commandline).read()
    result = testcase['script'] + '.result'
    if not os.path.exists(result):
        open(result, 'w').write(output)
        print('Created:', result)
    else:
        priorresult = open(result).read()
        if output != priorresult:
            print('FAILED:', testcase['script'])
            print(output)
        else:
            print('Passed:', testcase['script'])
# Build GUI with tkinter (Tkinter in 2.X) with buttons that change color and grow

from tkinter import *                  # Use Tkinter in 2.X
import random
fontsize = 25
colors = ['red', 'green', 'blue', 'yellow', 'orange', 'white', 'cyan', 'purple']

def reply(text):
    print(text)
    popup = Toplevel()
    color = random.choice(colors)
    Label(popup, text='Popup', bg='black', fg=color).pack()
    L.config(fg=color)

def timer():
    L.config(fg=random.choice(colors))
    win.after(250, timer)

def grow():
    global fontsize
    fontsize += 5
    L.config(font=('arial', fontsize, 'italic'))
    win.after(100, grow)

win = Tk()
L = Label(win, text='Spam',
          font=('arial', fontsize, 'italic'), fg='yellow', bg='navy',
          relief=RAISED)
L.pack(side=TOP, expand=YES, fill=BOTH)
Button(win, text='press', command=(lambda: reply('red'))).pack(side=BOTTOM, fill=X)
Button(win, text='timer', command=timer).pack(side=BOTTOM, fill=X)
Button(win, text='grow', command=grow).pack(side=BOTTOM, fill=X)
win.mainloop()
# Similar to prior, but use classes so each window has own state information

from tkinter import *
import random

class MyGui:
    """
    A GUI with buttons that change color and make the label grow
    """
    colors = ['blue', 'green', 'orange', 'red', 'brown', 'yellow']

    def __init__(self, parent, title='popup'):
        parent.title(title)
        self.growing = False
        self.fontsize = 10
        self.lab = Label(parent, text='Gui1', fg='white', bg='navy')
        self.lab.pack(expand=YES, fill=BOTH)
        Button(parent, text='Spam', command=self.reply).pack(side=LEFT)
        Button(parent, text='Grow', command=self.grow).pack(side=LEFT)
        Button(parent, text='Stop', command=self.stop).pack(side=LEFT)

    def reply(self):
        "change the button's color at random on Spam presses"
        self.fontsize += 5
        color = random.choice(self.colors)
        self.lab.config(bg=color,
                        font=('courier', self.fontsize, 'bold italic'))

    def grow(self):
        "start making the label grow on Grow presses"
        self.growing = True
        self.grower()

    def grower(self):
        if self.growing:
            self.fontsize += 5
            self.lab.config(font=('courier', self.fontsize, 'bold'))
            self.lab.after(500, self.grower)

    def stop(self):
        "stop the button growing on Stop presses"
        self.growing = False

class MySubGui(MyGui):
    colors = ['black', 'purple']       # Customize to change color choices

MyGui(Tk(), 'main')
MyGui(Toplevel())
MySubGui(Toplevel())
mainloop()
# Email inbox scanning and maintenance utility

"""
scan pop email box, fetching just headers, allowing
deletions without downloading the complete message
"""

import poplib, getpass, sys

mailserver = 'your pop email server name here'    # pop.server.net
mailuser   = 'your pop email user name here'
mailpasswd = getpass.getpass('Password for %s?' % mailserver)

print('Connecting...')
server = poplib.POP3(mailserver)
server.user(mailuser)
server.pass_(mailpasswd)

try:
    print(server.getwelcome())
    msgCount, mboxSize = server.stat()
    print('There are', msgCount, 'mail messages, size ', mboxSize)
    msginfo = server.list()
    print(msginfo)
    for i in range(msgCount):
        msgnum  = i + 1
        msgsize = msginfo[1][i].split()[1]
        resp, hdrlines, octets = server.top(msgnum, 0)    # Get hdrs only
        print('-' * 80)
        print('[%d: octets=%d, size=%s]' % (msgnum, octets, msgsize))
        for line in hdrlines: print(line)

        if input('Print?') in ['y', 'Y']:
            for line in server.retr(msgnum)[1]: print(line)    # Get whole msg
        if input('Delete?') in ['y', 'Y']:
            print('deleting')
            server.dele(msgnum)        # Delete on srvr
        else:
            print('skipping')
finally:
    server.quit()                      # Make sure we unlock mbox
input('Bye.')                          # Keep window up on Windows
# CGI server-side script to interact with a web browser

#!/usr/bin/python
import cgi
form = cgi.FieldStorage()              # Parse form data
print("Content-type: text/html\n")     # hdr plus blank line
print("<HTML>")
print("<title>Reply Page</title>")     # HTML reply page
print("<BODY>")
if not 'user' in form:
    print("<h1>Who are you?</h1>")
else:
    print("<h1>Hello <i>%s</i>!</h1>" % cgi.escape(form['user'].value))
print("</BODY></HTML>")
# Database script to populate a shelve with Python objects
# see also Chapter 28 shelve and Chapter 31 pickle examples

rec1 = {'name': {'first': 'Bob', 'last': 'Smith'},
        'job':  ['dev', 'mgr'],
        'age':  40.5}

rec2 = {'name': {'first': 'Sue', 'last': 'Jones'},
        'job':  ['mgr'],
        'age':  35.0}

import shelve
db = shelve.open('dbfile')
db['bob'] = rec1
db['sue'] = rec2
db.close()
# Database script to print and update shelve created in prior script

import shelve
db = shelve.open('dbfile')

for key in db:
    print(key, '=>', db[key])

bob = db['bob']
bob['age'] += 1
db['bob'] = bob
db.close()
# Database script to populate and query a MySql database

from MySQLdb import Connect
conn = Connect(host='localhost', user='root', passwd='XXXXXXX')
curs = conn.cursor()
try:
    curs.execute('drop database testpeopledb')
except:
    pass                               # Did not exist

curs.execute('create database testpeopledb')
curs.execute('use testpeopledb')
curs.execute('create table people (name char(30), job char(10), pay int(4))')

curs.execute('insert people values (%s, %s, %s)', ('Bob', 'dev', 50000))
curs.execute('insert people values (%s, %s, %s)', ('Sue', 'dev', 60000))
curs.execute('insert people values (%s, %s, %s)', ('Ann', 'mgr', 40000))

curs.execute('select * from people')
for row in curs.fetchall():
    print(row)

curs.execute('select * from people where name = %s', ('Bob',))
print(curs.description)
colnames = [desc[0] for desc in curs.description]
while True:
    print('-' * 30)
    row = curs.fetchone()
    if not row: break
    for (name, value) in zip(colnames, row):
        print('%s => %s' % (name, value))

conn.commit()                          # Save inserted records
# Fetch and open/play a file by FTP

import webbrowser, sys
from ftplib import FTP                 # Socket-based FTP tools
from getpass import getpass            # Hidden password input

if sys.version[0] == '2': input = raw_input    # 2.X compatibility

nonpassive = False                     # Force active mode FTP for server?
filename = input('File?')              # File to be downloaded
dirname  = input('Dir? ') or '.'       # Remote directory to fetch from
sitename = input('Site?')              # FTP site to contact
user     = input('User?')              # Use () for anonymous
if not user:
    userinfo = ()
else:
    userinfo = (user, getpass('Pswd?'))

print('Connecting...')
connection = FTP(sitename)             # Connect to FTP site
connection.login(*userinfo)            # Default is anonymous login
connection.cwd(dirname)                # Xfer 1k at a time to localfile
if nonpassive:                         # Force active FTP if server requires
    connection.set_pasv(False)

print('Downloading...')
localfile = open(filename, 'wb')       # Local file to store download
connection.retrbinary('RETR ' + filename, localfile.write, 1024)
connection.quit()
localfile.close()

print('Playing...')
webbrowser.open(filename)
Index
Symbols
# character
comments, 48, 55, 141, 444
directives, 48
#! characters, 59, 60–62, 1441–1444
% (percent sign)
formatting expression operator, 217, 227–229, 1189
system shell prompt, 44, 48, 56
( ) (parentheses)
comprehensions and, 112
expression operators and, 139
statements and, 323, 328
superclasses and, 801
tuples and, 277
* (multiplication) operator
multiplying numbers, 97
repeating lists, 242
repeating strings, 100, 200
+ (plus) operator
adding numbers, 97
concatenating lists, 242, 1100
concatenating strings, 100, 200, 1100
+= in-place addition, 920, 1194
, (comma), 277
// operator, 146–150
/ operator, 146–150
: (colon), 322
; (semicolon), 323, 327
<< (left-shift) operator, 207
== (equivalence) operator, 301
>>> prompt
about, 45, 49
common usage mistakes, 52
@ symbol
about, 1029, 1035
function decorators and, 1022, 1035, 1273
[ ] (square brackets), 96, 224, 328
\ (backslash)
escape sequences and, 105, 193–197
multiline statements and, 328, 378
_ (underscore)
class names, 845
module names, 70, 747
name mangling and, 945
operator overloading, 104, 805
showing name values, 971
{ } (curly braces), 114, 328
A
abs built-in function, 155
absolute imports, 718, 722, 726, 731
abstract superclasses, 869–871, 939
access-by-key databases and filesystems
dictionary interfaces, 848
exploring interactively, 849–851
iterations and, 423
object persistence and, 116, 847
pickle module, 290
storing objects on, 271, 848
updating objects, 851
accessor functions, 498, 554
__add__ method, 104, 808, 889, 918
addition operation, 97
all built-in function, 432
__all__ variable, 711, 747
Android platform, 1425
annotations, 565–567, 1340–1342
__annotations__ attribute, 565
We’d like to hear your suggestions for improving our indexes. Send email to index@oreilly.com.
anonymous functions (see lambda expressions)
any built-in function, 432
anydbm module, 847
apply built-in function, 538, 954
arbitrary arguments, 534–538, 1339
arbitrary expressions, 100
arbitrary structures, 558–561
argparse module, 754
arguments, 529
(see also keyword arguments)
about, 523
arbitrary, 534–538, 1339
avoiding changes, 526
calculating lengths of, 620
call expressions and, 209
coding functions, 476
decorator, 1281, 1298–1301, 1330–1343,
1340–1342
default values, 529, 532–534, 658–660
intersecting sequences example, 545–547
matching, 528–531, 547–549
positional, 529, 534, 1331–1343
quiz questions and answers, 551
running script files with, 1432
self, 554, 573, 811, 818, 1025
shared references and, 524–526
simulating multiple results, 527
simulating output parameters, 527
unpacking, 528, 535
usage example, 542–545
ArithmeticError class, 1131, 1132
arrays, lists and, 109
as extension for import/from statements, 758
ascii built-in function, 224
ASCII character set
about, 1167
character code conversions, 207
encoding, 1178
encoding and decoding, 194
usage example, 582
aspect-oriented programming, 1272
assert statement
about, 321, 1081, 1112
special-case handling, 1083
triggering exceptions, 1086
usage example, 1113
AssertionError exception, 1112
assignment statements
about, 320, 339, 500
augmented assignments, 341, 350–352
extended sequence unpacking, 341, 344–348, 398
list-unpacking assignments, 340
multiple-target assignments, 341, 348–349
quiz questions and answers, 370
sequence assignments, 340–344
syntax patterns, 340–341
tuple-unpacking assignments, 340, 396–398, 591
variable name rules, 352–355
asterisk (*), 97, 100, 200, 242
atexit module, 1149
attribute fetches
about, 209, 839
attribute name qualification and, 697
built-in types and, 985, 987–992
AttributeError exception, 910, 1222
attributes, 910
(see also specific types of attributes)
about, 68, 798
accessing, 1220
assigning and deleting, 910
attribute tools, 1023
built-in, 839, 1249–1256
data, 860
descriptors and, 1226–1237
handling generically, 1014–1016
inheritance and, 785–787, 802, 963–966
listing per object, 966–971
managing, 912, 1219–1221
mapping to inheritance sources, 1004–1009
modules and, 68–71, 671–673, 804
name qualification for, 697
namespaces and, 872
operator overloading and, 909–913, 1237–1256
privacy considerations, 913
properties for, 1221–1226
quiz questions and answers, 1266–1268
referencing, 909
special class, 840–842
string method calls and, 231
validating, 1256–1266
augmented assignments, 341, 350–352
augmented classes, 1391–1394, 1395
automatic memory management, 18
awk utility, 413
B
backslash (\)
escape sequences and, 105, 193–197
multiline statements and, 328, 378
backtracking, exception handlers and, 1083
base classes, 787
BaseException class, 1123, 1131
__bases__ attribute
about, 811, 878
inheritance and, 843, 967, 1383
BDFL (Benevolent Dictator for Life), 17
benchmarking
pystone.py program, 656
quiz questions and answers, 662
timeit module, 647–655
timing iteration, 629–655
usage examples, 651
Benevolent Dictator for Life (BDFL), 17
big-endian format, 1201–1204
bin built-in function, 135, 207
binary files
about, 107, 123, 1173–1174, 1196
escape sequences and, 196
frozen executables, 39, 82
storing data, 293
struct module and, 1207–1209
text files and, 123
version considerations, 287, 1197
binary formatting, 231
binary notation, 135, 151–153
binary operator methods, 917–921
bitwise operations, 137, 153–155
blank lines
common usage mistakes, 53
statements and, 53, 375
block strings, 198–199
blocks of code
delimiting, 376–378
indenting, 376–378
loop coding techniques, 402–411
nesting, 335, 376–378
special case rules, 329
BOM (byte order marker), 1174, 1201–1204
__bool__ method, 889, 927–929
Booleans (bool type)
about, 127, 384
operator overloading and, 927–929
truth test and, 171, 305, 380–382
version considerations, 928
bound methods, 573, 948–953, 1025
bounds checking for lists, 110
branching in if statements, 372–374
break statement
about, 320, 389
nested loops and, 391, 1145
bsddb extension module, 850
built-in attributes, 839, 1249–1256
built-in exception classes
about, 1131
built-in categories, 1132
default printing and state, 1133–1135
built-in exceptions, 1086, 1100
built-in object types
about, 19, 94, 295
attribute fetches for, 985, 987–992
class decorators and, 1306
common usage mistakes, 308–311
comparison operations, 300–303
core data types, 95, 295
dictionaries (see dictionaries)
equality and, 300–303
extending, 980–983
files (see files)
general type categories, 235–236
generation in, 606–609
iteration and, 422–424
lists (see lists)
metaclasses and, 1387–1388
numbers (see numbers)
object flexibility, 297
references versus copies, 297–300, 308
strings (see strings)
tuples (see tuples)
type hierarchies, 306–308
built-in scope
about, 487, 491–493
LEGB rule and, 488
__builtin__ module, 156, 493
builtins module, 156, 491–493
byte code
about, 30–31
modules and, 676–678
optimizing, 684
PVM and, 31
Python versions and, 31
source changes and, 31
byte order marker (BOM), 1174, 1201–1204
bytearray string type
about, 107, 190, 1166, 1171–1173, 1192–1195
changing strings, 102, 209, 211
bytes built-in function, 306
bytes string type
about, 106, 287, 1171–1173, 1189
bytearray string type and, 190
encoded text, 1183
making bytes objects, 1191
method calls, 1189
mixing string types, 1192
pickle module and, 290
quiz questions and answers, 1215–1217
sequence operations, 1190
type and content mismatches, 1198
C
C language
argument-passing model, 524
#define directive, 1272
error checks, 1088
#include directive, 1272
memory address pointers, 177
while loops, 394
C++ language, 861, 863, 944
call expressions, 209, 921–925
__call__ method
about, 889, 921–925
bound methods and, 952–953
class decorators and, 1278, 1308
class objects and, 573
function decorators and, 1037, 1275, 1293
case considerations
case conversion for strings, 192
variable name rules, 352
certificate, completion, 1414–1416
cgi.FieldStorage class, 271
chaining
exceptions, 1110–1112
methods, 427
numeric comparisons, 144–146
character code conversions, 206
character sets, 1167
(see also ASCII character set; Unicode
character set)
chr built-in function, 206, 582, 1167, 1179
circular references, 179
__class__ attribute
about, 840, 878, 993
coding exception classes, 1127
comparing class instances, 306
exception type and, 1150
inheritance and, 843
listing attributes in class trees, 967
class attributes
about, 843
abstract superclasses and, 870
class gotchas, 1064–1066
creating, 865
function decorators and, 1285
instance attributes versus, 843
pseudoprivate, 845, 944–947, 1160, 1321
special, 840
usage examples, 878
class decorators
about, 943, 1034, 1270, 1277
built-in types and, 1306
coding, 1301–1311
implementing, 1277–1278
manager functions versus, 1309
metaclasses and, 1034, 1038–1040, 1361–1363, 1394–1400, 1404–1407
multiple instances and, 1279, 1308
singleton classes, 1301–1303
state retention options, 1317
tracing object interfaces, 1303–1307
usage considerations, 1277, 1310
class methods
about, 1024
counting instances, 1031–1034
metaclass methods versus, 1389
usage considerations, 1028–1030
class statement
about, 321, 789–791, 798, 859
decorators and, 1270, 1277, 1306
general form, 860
inheritance and, 865–871
local scope and, 489
metaclasses and, 1367
methods and, 862–865
modules and, 804
objects and, 130
properties for, 798, 861, 1020
usage examples, 799–801, 802, 860
classes
about, 96, 784, 809–812
augmented, 1391–1394, 1395
class instances, 306
closures versus, 503
coding class trees, 789–791
coding gotchas, 1064–1070
combining, 836–839
customizing behavior, 828–834
customizing constructors, 834–839
decorators and, 1270, 1282, 1312–1314
descriptors and, 1226
dictionaries versus, 812–814
docstrings and, 882
exception, 1123–1140
explicit attributes and, 513
extending built-in types, 980–983
functions and, 788, 860
generators and, 606–609, 898
inheritance and, 789, 801–805, 802, 865–
871, 956–977
instances (see instances)
intercepting operators, 805–809, 826–828
introspection tools, 840–847
listing attributes in class trees, 966–971
methods and, 788, 822–826, 862–865
mix-in, 956–977, 1057–1060
modules and, 788, 804, 860, 884
MRO ordering, 831, 957
name considerations, 844
namespaces and, 860, 872–882
nesting, 875–877
new-style (see new-style classes)
objects and, 954–956, 983
operator overloading, 104
persistence and, 941
polymorphism and, 792–794, 832
proxy (see proxy classes)
quiz questions and answers, 815, 855–857,
884, 978, 1071
scopes in, 1068
singleton, 1301–1303
storing objects in databases, 847–853
subclasses (see subclasses)
superclasses (see superclasses)
type object and, 1364–1366
usage examples, 799–801, 802, 806–808,
845–847
user-defined, 129
version considerations, 983
classmethod built-in function, 1024, 1029–
1030
client module, 717
closures (see factory functions)
__cmp__ method, 925–927
code points, 106, 194, 206, 1170
codecs module
about, 108
open method, 125, 283, 287, 1204
coding (see development considerations)
cohesion
in functions, 553
in modules, 746
collections module
namedtuple function, 256, 281
OrderedDict subclass, 256
colon (:), 322
comma (,), 277
command lines and files (see system command
lines and files)
command-line arguments, 204, 751, 1432–1435
comments
# character, 48, 55, 141, 444
#! characters, 60–62
statements and, 375
comparison operations
built-in object types, 300–303
dictionaries, 270, 302
lists, 302
numbers, 144–146, 302
operator overloading and, 925–927
recursive, 300
sets, 302
strings, 302
testing truth values, 380–382
tuples, 302
version considerations, 247, 302
compile built-in function, 762
compiled extensions, 7
completion certificate, 1414–1416
complex built-in function, 135, 306
complex numbers, 97, 151
component coupling, 1060
component integration, 12
composition
about, 784
combining classes, 836–839
OOP considerations, 937–941
compound object types, 297
compound statements, 371
(see also specific statements)
about, 371, 375
colon character, 322
common usage mistakes, 52, 53
special case rules, 327–328, 379–380
terminating, 375
timing, 645
comprehension variables, 490, 623
comprehensions, 441
(see also list comprehensions)
dictionary, 263, 265–266, 432, 624–626
iterables versus, 597–602
quiz questions and answers, 441, 626
set, 166, 168, 432, 624–626
syntax summary, 622–626
concatenating
lists, 242, 1100
strings, 100, 104, 200, 1100
tuples, 277
virtual concatenation, 737
conflict resolution, diamond inheritance trees,
999–1000
constants, 95
constructors
class gotchas, 1069
coding, 818
customizing, 834–839
__init__ method and, 818, 864
operator overloading and, 888
superclass, 864
__contains__ method, 482, 889, 906–909
__context__ attribute, 1111
context managers
about, 1114–1117
closing files and server connections, 1148
decimals, 159
files, 285, 294
implementing, 1081, 1088
multiple, 1118–1119
version considerations, 1118–1119
contextlib module, 1117
continuation lines in statements, 328, 378
continue statement, 320, 389, 391
control flows
exceptions and, 1083
nesting, 1143
control languages, 6, 7, 12
control-flow statements, 375
conversions
case, 192
for encodings, 1184
fraction, 162
hex, octal, binary notation, 151–153
integer, 152
mixed types, 139, 162
storing objects in files, 288–290
string, 152, 205–207
string types, 1192
3to2 converter, 367
tuple, 278–279
2to3 converter, 366
copy module, 299
core data types (see built-in object types)
coupling
component, 1060
functions, 553
modules, 745
super built-in function, 1057–1060
cPickle module, 290
CPython system
about, 33, 34
timeit module and, 643–645
CSV file format, 214, 292
csv module, 214, 292
curly braces { }, 114, 328
currency symbols, 754–756
current working directory (CWD), 724, 726,
727
CWD (current working directory), 724, 726,
727
cx_freeze tool, 39
cycles, recursive calls and, 560
cyclic data structures, 310
cyclic references, 179
Cygwin system, 45
Cython system, 33, 37
D
data attributes, 860
data structures
about, 94
arbitrary, 558–561
built-in object types and, 95
cyclic, 310
dictionaries and, 259
empty, 304
data types (see object types)
database programming, 12
databases, access-by-key (see access-by-key
databases and filesystems)
dbm module, 847
__debug__ built-in variable, 1113, 1331
debugging Python code
about, 83–85
IDLE debugger, 77
string issues, 972
tools supporting, 1159
with try statement, 1149
decimal module, 158
decimals (decimal object)
about, 97, 127, 157–158
context manager, 159
from_float method, 158
getcontext method, 159
localcontext method, 159
numeric accuracy in, 161
resetting precision temporarily, 159
setting precision globally, 158
decoding (see encoding and decoding)
decorators, 1034
(see also class decorators; function
decorators)
about, 1034, 1269
abstract superclasses and, 870
arguments and, 1281, 1298–1301, 1330–1343, 1340–1342
coding properties, 1224–1226
defining, 1271
encapsulation and, 1311
macros versus, 1272
manager functions versus, 1309
managing calls and instances, 1270
managing functions and classes, 1270, 1282,
1312–1314
methods and, 1289–1295, 1400–1407
nesting, 1279–1281, 1340
private and public attributes, 1314–1330
quiz questions and answers, 1344–1353
range-testing for positional arguments,
1331–1343
timing alternatives, 635
usage considerations, 1271–1272, 1310
def statement
about, 320, 476
class statement and, 859
coding functions, 475
decorators and, 1270, 1273
lambda expressions and, 505
name resolution and, 488
nesting, 477, 506, 790
runtime execution, 477
scope and, 486, 487, 488
default exception handler, 1083
__del__ method
about, 889
descriptors and, 1293
managing attributes, 1229
object destruction and, 929–931
del statement, 249, 321
__delattr__ method, 889, 1238
delegation
about, 942
class gotchas, 1069
function decorators and, 1035
inheritance versus, 1317
OOP considerations, 942–943
operator overloading and, 1322–1329
proxy classes and, 988
__delete__ method, 889, 1229
deleting
dictionary items, 252
list items, 249
delimiters
blocks of code, 376–378
statement, 378
__delitem__ method, 889
deprecation protocol for Python, 354
depth-first, left-to-right (DFLR) path, 956, 997
derived classes, 787, 995–997
descriptors
about, 861, 1023, 1226–1237
attribute access and, 1221
builtins routing mixin, 1326
decorating methods with, 1292–1295
inheritance and, 866, 1384–1386
management techniques compared, 1246–1248
managing attributes, 912
metaclasses and, 1384
properties and, 1236–1237
read-only, 1228
slots and, 1237
state information in, 1232–1235
validating with, 1259–1263
design patterns, 501–503
destructor method, 929–931
development considerations
built-in type gotchas, 308–311
class gotchas, 1064–1070
code reuse, 792–795
coding class decorators, 1301–1311
coding class trees, 789–791
coding constructors, 818
coding exception classes, 1126–1128
coding exception details, 1093–1121, 1141–1162
coding function decorators, 1283–1301
coding functions, 475–478, 556, 571–572
coding metaclasses, 1370–1378
coding methods, 824–826
coding strings, 1174–1178
coding subclasses, 828
common coding gotchas, 463–465
database programming, 12
development community, 15
EIBTI acronym, 614–616
function gotchas, 656–661
improvement suggestions, 975–977
for larger projects, 1157–1160
minimizing cross-file changes, 497–498
minimizing global variables, 495
module coding, 687–705
program execution, 32
rapid development cycle, 6, 33
rapid prototyping, 13
DFLR (depth-first, left-to-right) path, 956, 997
diamond patterns
about, 957
attribute searches, 956
inheritance search order, 986, 997–1009
__dict__ attribute
about, 841, 843
listing attributes in class trees, 967
listing instance attributes, 959–963
metaprogram example, 761
namespace dictionaries and, 696
namespaces and, 572, 695
private attributes and, 1318
slots and, 1011–1013
usage example, 759, 810
wrapper classes and, 943
dict built-in function, 306, 432
dictionaries (dict object)
about, 96, 113, 250–251
adding keys, 223
alternate ways to make, 262
changing in place, 254
classes versus, 812–814
clear method, 252
common operations, 251, 253
comparison operations, 270, 302
copy method, 252, 299
data structures and, 259
deleting items, 252
dictionary views, 266–269, 439–440
empty dictionaries, 252
fromkeys method, 266
get method, 118, 252, 255, 260
has_key method, 118, 264, 270
indexing, 114, 252, 258
interface considerations, 271
items method, 138, 252, 254, 264, 266, 303,
397, 434
iteration in, 120, 253, 422, 439–440
keys method, 118, 252, 253, 254, 264, 266,
269, 422, 434
keyword arguments, 114
lists versus, 259, 263
literals, 96, 251
mapping operations, 114, 116–118, 257
missing keys, 116–118
movie database example, 256–258
mutable nature of, 113
nesting, 115, 252, 254, 260
optimization in, 120
OrderedDict subclass, 256
pop method, 252, 255, 549
popitem method, 252
quiz questions and answers, 272
sequence operations and, 258
setdefault method, 252
sorting keys, 118–119, 269
string formatting expressions, 221
type-specific methods, 252, 254–256
update method, 252, 255
usage considerations, 258–262
values method, 252, 254, 264, 266, 434,
608
version considerations, 264–271, 303
viewitems method, 264
viewkeys method, 252, 264
viewvalues method, 264
zip built-in function and, 409
dictionary comprehensions, 263, 265–266,
432, 624–626
dir built-in function
about, 104
customizing version of, 759
as documentation source, 444–446
inheritance and, 843, 963–966
inspecting namespaces, 695
directives, 48
directories, file precedence over, 740–741
directory walkers, 607
display formats
generic display tool, 842
neutralizing difference with code, 367–368
numeric, 143
print operations, 826–828, 1135
distutils modules, 684, 731, 1159
division operations
about, 146
floor division, 146–150
truncating division, 146–150
version considerations, 146–148
__doc__ attribute, 444, 446–449, 879, 882
doctest module, 750, 1158
documentation
# comments, 48, 55, 141, 444
about, 443
additional resources, 444, 461–463, 673
dir built-in function, 104, 444–446
docstrings, 199, 375, 444, 446–449
help built-in function, 104, 215, 444, 449–451
PyDoc system, 105, 444, 449–460
quiz questions and answers, 466
Sphinx tool, 444, 461
documentation strings
about, 444, 446–449, 882, 1157
modules and, 756
triple-quoted strings, 199
usage considerations, 375
DOM parsing, 1212
dot path syntax, 709
duck typing, 480
dynamic typing
about, 18, 97, 175, 185
objects and, 176–179
quiz questions and answers, 186
references and, 176, 180–185
variables and, 176–178
E
Easter eggs, 5
EBCDIC encoding, 1182
Eclipse IDE, 80
EIBTI acronym, 614–616
ElementTree package, 1211, 1213
else clause (loop blocks), 389, 392–394
embedded programs, 81
empty data structures, 304
empty dictionaries, 252
empty lists, 240
empty strings, 191
empty tuples, 276
encapsulation
about, 933
decorators and, 1311
polymorphism and, 793
enclosing scope
about, 487
function-related gotchas, 661
LEGB rule and, 488
nonlocal statement and, 924, 1286–1287,
1302
retaining state with defaults, 504–508
state retention and, 1317
encoding and decoding
about, 1167–1169
additional schemes, 1181
ASCII, 194, 1178
byte string literals, 1183
character set declarations, 1187–1188
converting, 1184
EBCDIC, 1182
filenames, 1205
non-ASCII text, 1179–1181
Unicode, 123, 190, 192, 1178–1188, 1199–1206
encodings module, 1169
endianness, 1201–1204
__enter__ method, 889, 1116
enumerate built-in function
about, 402, 410
iteration and, 430
usage example, 424
env program, 60
environment variables
about, 1427–1429
PATH, 45, 46, 57, 60, 1427, 1428
PYTHONIOENCODING, 1205
PYTHONPATH, 72, 458, 679, 708, 756,
1427, 1428, 1430
PYTHONSTARTUP, 1427, 1428
PY_PYTHON, 61, 1427, 1429
PY_PYTHON2, 1427, 1429
PY_PYTHON3, 1427, 1429
TCL_LIBRARY, 1427
TK_LIBRARY, 1427
EOFError exception, 1146
__eq__ method, 889, 925
equality
built-in object types, 300–303
shared references and, 183–185
testing truth values, 380–382
value equality operators, 137
equivalence (==) operator, 301
errno module, 1133
error handling
displaying errors and tracebacks, 1151
exceptions and, 1082, 1088
missing keys, 116–118, 260
scripts and, 66
testing inputs, 332
with try statements, 333–334
escape sequences, 105, 191, 193–197
etree package, 1213
eval built-in function
jump tables and, 571
strings and, 152, 206, 289
event notification, 1082
Exception class
about, 1126, 1131
as catchall, 1097, 1132, 1151
user-defined exceptions and, 1086
exception classes
about, 1123–1125
built-in, 1131–1135
coding, 1126–1128
custom data and behavior, 1136–1139
custom print displays, 1135
hierarchies in, 1128–1131
quiz questions and answers, 1139
version considerations, 1123
exception handlers, 1082
(see also specific statements)
about, 1082
backtracking and, 1083
default, 1083
defining methods for, 1137–1139
interactive prompt and, 1085
nesting, 1141–1145
termination actions and, 1084, 1088
exception variables, 490
exceptions, 1081
(see also specific exceptions and specific
statements)
about, 1081
built-in, 1086, 1100
catching, 1084, 1088, 1096–1097, 1100
catching too little, 1155
catching too much, 1153–1155
chaining, 1110–1112
class-based, 1123–1140
coding details, 1093–1121
common roles, 1082
design tips and gotchas, 1152–1155
errors versus, 1146
propagating, 1110
quiz questions and answers, 1090, 1120,
1161
raising, 549, 896, 1085, 1107
string-based, 1124
usage considerations, 1081, 1145–1152
user-defined, 1086, 1134, 1147
exceptions module, 1132
exec built-in function, 72–73
execfile built-in function, 73
executable scripts
#! comment in Windows, 60–62
about, 59
env program, 60
executing programs (see program execution)
exercises
part I, 87–89, 1465–1467
part II, 313–315, 1468–1472
part III, 467–469, 1473–1475
part IV, 663–665, 1475–1485
part V, 778–779, 1485–1489
part VI, 1072–1076, 1489–1496
part VII, 1161, 1497–1499
__exit__ method, 889, 1116
explicit attributes, 513
exponentiation operation, 97
expression operators
converting mixed types, 139
grouping with parentheses, 139
listed, 136
operator overloading, 141
operator precedence, 139
polymorphism and, 141
set operations and, 164
version considerations, 138
expressions, 136
(see also lambda expressions)
about, 136
arbitrary, 100
call, 209, 921–925
code examples, 50
expression operators, 136–141
functions versus, 234, 356
generator, 439, 440, 591, 597–604
indexing, 99
numbers in, 141–143
objects and, 93
operator overloading and, 888
quiz questions and answers, 370
slice, 891
statements and, 93, 356–357
string formatting, 216–222
variables and, 176
extended sequence unpacking
about, 341, 344–348
for loops and, 398
extending built-in types
about, 980
by embedding, 980
by subclassing, 981–983
extension modules, 688
F
factorials, 615, 665, 1483
factory functions
about, 501–503
generic, 954–956
gotchas, 661
metaclasses and, 1373
false value in Python
Booleans and, 171, 304–305, 380–382
built-in scope and, 494
operator overloading and, 927–929
FieldStorage class, 271
FIFO (first-in-first-out), 559
__file__ attribute, 761
files (file object), 286
(see also binary files; text files)
about, 122, 282
close method, 283, 285
closing, 1148
common operations, 282, 285
context manager, 285
context managers, 294
flush method, 283, 294
generating namespaces, 695
__init__.py files, 709–711
inspecting, 1214
iteration in, 286, 416–419, 590
list comprehensions and, 426
literals, 96
minimizing cross-file changes, 497–498
module filenames, 687
__next__ method, 417, 419, 426
open built-in function and, 122, 126, 283
precedence over directories, 740–741
print operations and, 358
quiz questions and answers, 311
read method, 123, 283, 286, 400
readline method, 123, 283, 285, 417
readlines method, 283, 401, 418, 426, 482
seek method, 123, 283, 482
storing objects in files, 288–290
tools supporting, 294
type-specific methods, 123, 285
usage considerations, 58, 284–285
write method, 283, 288
writelines method, 283
xreadlines method, 401
filesystems, access-by-key (see access-by-key
databases and filesystems)
filter built-in function
generator expressions versus, 601
iteration and, 430, 434, 437, 576
list comprehensions and, 112, 583–586
filtering test results, 427
first-class object model, 562, 574
first-in-first-out (FIFO), 559
float built-in function, 206, 306
floating-point numbers (float type)
about, 97, 127, 134
as_integer_ratio method, 136
is_integer method, 136
try statements and, 334
FloatingPointError exception, 1131
floor division, 146–150
for statement
about, 320, 395
extended sequence unpacking, 348
filter clauses, 427
general format, 395
iteration and, 416, 418, 420, 422
list comprehensions and, 400, 415, 425
nested loops, 428
nesting, 399–400
parallel traversals, 407–410
quiz questions and answers, 414
range built-in function and, 344
recursion versus, 558
sequence scans, 403
sorting keys, 118–119
terminating, 53
usage examples, 395–400
format built-in function, 225, 227
__format__ method, 225
formatting strings (see string formatting)
fractions (fraction object)
about, 127, 160
conversions and mixed types, 162
from_float method, 162
numeric accuracy in, 161
fractions module, 160
freeze tool, 39
from * statement
about, 689
modules and, 689
namespace pollution and, 747
package imports and, 710
variables and, 773
from statement
about, 69, 321, 669, 689, 691–694
as extension, 758
copying names, 772
exec built-in function and, 73
import statement versus, 692, 713, 775
package imports and, 708
potential pitfalls, 693–694
relative imports model and, 707
reload built-in function and, 773–775
testing and, 774
frozen binaries, 39, 82
frozenset built-in function, 168, 296
function attributes, 515–517, 564–565, 1288
function decorators
about, 943, 1034–1037, 1269
adding arguments, 1298–1301
coding, 1283–1301
implementing, 1274
manager functions versus, 1310
method blunders, 1289–1295
method declaration and, 1275
properties and, 1022
state retention options, 1285–1289
timing calls, 1295–1301
tracing calls, 1283–1284
usage considerations, 1273
user-defined, 1037
validating arguments, 1330–1343
functional programming
built-in functions for, 574–577
classes, 798
closures, 501–503
list comprehensions, 581–589
functions, 96
(see also specific functions)
about, 96, 473–475
accessor, 498, 554
annotations and, 565–567, 1340–1342
anonymous, 567–573
applying generically, 536–537
*arg form, 433, 534–538, 541
**args form, 531, 534–538, 541
calling, 478
classes and, 788, 860
coding, 475–478, 556, 571–572
cohesion in, 553
common pitfalls, 656–661
coupling, 553
decorators and, 1270, 1282, 1312–1314
defining, 478
design concepts, 553–554
expressions versus, 234, 356
factory, 501–503, 661, 954–956, 1373
first-class object model, 562, 574
generator, 439, 440, 591–597, 602–604
helper, 1309, 1359–1361
intersecting sequences, 480–483, 545–547
introspection tools, 563, 1336
**kargs form, 620, 1304, 1333, 1338
keyword arguments, 114
manager, 1309, 1359–1361
mapping operations, 574–576
metafunctions, 1034–1037, 1273
methods and, 799
nesting, 499–508, 572, 1291
*pargs form, 1333, 1338
polymorphism in, 479, 482
programming tools, 574–577
quiz questions and answers, 483, 578
recursive, 487, 555–561, 764–767, 775,
880
scope considerations, 486
signaling conditions with, 1147
unbound methods as, 950
functools module, 576
__future__ module, 148, 367, 748
G
garbage collection
about, 116, 208
exception variables and, 490
objects and, 178, 929–931
gc module, 179
__ge__ method, 889
generators
about, 112, 591
classes and, 606–609, 898
EIBTI acronym, 614–616
functions versus expressions, 602–604
generating scrambled sequences, 609–614
iterables versus comprehensions, 597–602
iteration and, 120, 439, 440, 592, 604–605,
617–621
multithreading and, 595
__next__ method, 593
quiz questions and answers, 626
recursive calls, 968
send method, 596
yield versus return statement, 592–597
__get__ method
about, 889
descriptors and, 912, 1292
managing attributes, 1228
getattr built-in function, 836, 942
__getattr__ method
about, 889, 909–913, 1220
attribute fetches and, 836
attribute interception and, 988
class decorators and, 1278, 1304–1307
emulating privacy, 944
__getattribute__ method comparison,
1245
implementation alternatives, 1328–1329
intercepting built-in operation attributes,
1249–1256
managing attributes, 1237–1256
metaclasses and, 1390
new-style classes and, 987
validating with, 1263–1265
wrapper classes and, 942
__getattribute__ method
about, 889, 912, 1023, 1220
attribute fetches and, 839
__getattr__ method comparison, 1245
intercepting built-in operation attributes,
1249–1256
managing attributes, 1237–1256
new-style classes and, 987
recursive looping and, 561
validating with, 1265
__getitem__ method
about, 889, 890–894
index iteration and, 894–895
membership and, 906–909
user defined class and, 441
user defined iterables and, 609
getopt module, 754
__getslice__ method, 893, 927
global scope
about, 487
LEGB rule and, 488, 872
state retention and, 513
global statement
about, 320, 487, 494, 509
coding functions, 476
global variables
alternatives for accessing, 498
minimizing, 495
modules and, 745
go to statements, 1145
Graphical User Interface (GUI), 11, 853
__gt__ method, 889, 925
GUI (Graphical User Interface), 11, 853
H
hash character (#)
comments, 48, 55, 141, 444
directives, 48
help built-in function
about, 104, 215, 444
PyDoc system and, 449–451, 883, 1157
helper functions, 1309, 1359–1361
hex built-in function, 135
__hex__ method, 894
hexadecimal notation
integers, 135, 151–153
string escape sequences, 105
HTML reports, 452–460
I
-i command-line argument, 84
__iadd__ method, 889, 917–921, 1194
icon clicks
about, 62
limitations, 63–66
Windows platform, 63–65
id built-in function, 960
IDEs (integrated development environments)
about, 74, 1158
alternative, 79–81
IDLE user interface
about, 73, 446, 1429
advanced tools, 77
basic usage, 75–76
common usage mistakes, 58, 78–79
multiline block strings, 198
startup details, 74
usability features, 76
if (elif/else) statement
about, 320, 322, 371
basic examples, 372
filter clauses, 427
general format, 371
interactive loops example, 330, 333
missing keys tests, 116–118
multiway branching, 372–374, 570
quiz questions and answers, 385
terminating, 53
if/else ternary expression, 137, 382–384
immutable objects
about, 101, 208
changing in place, 297, 311
constraints with, 167
immutable sequences, 191
imp.reload function, 66–71, 78, 669
implementation-related object types, 96
implicit assignments, 691
__import__ built-in function, 684, 762
import statement, 708
(see also package imports)
about, 66–71, 321, 669, 671–673, 689
as extension, 758
as one-time occurrence, 690
common usage mistakes, 71, 78
dot path syntax, 709
enabling context managers, 1114
as executable statement, 691
from statement versus, 692, 713, 775
importing modules by name string, 761–
763
packages and, 708
potential pitfalls, 693–694
process overview, 674–676
scopes versus, 698
import this command, 5, 589
importlib.import_module function, 684, 763
in operator
dictionaries and, 117, 253
sets and, 127
strings and, 214
in-place change operations
avoiding mutable argument changes, 526
dictionaries and, 254
expression statements and, 357
immutable objects and, 297, 311
lists and, 244–250, 357
scope and, 488
shared references and, 181–183
indenting
blocks of code, 376–378
common usage mistakes, 52
statements, 324–327
__index__ method, 889, 894
IndexError exception, 896, 1084, 1086
indexing
dictionaries, 114, 252, 258
lists, 243, 244–246
operator overloading and, 890–894
strings, 99, 201–204
tuples, 277
indirect function calls, 562
inheritance, 956
(see also multiple inheritance)
about, 783–785, 865, 933
abstract superclasses, 869–871
assignment and, 1386
attribute tree construction, 865
attributes and, 785–787, 802, 963–966
built-ins and, 1387
class interface techniques, 867–868
classes and, 801–805, 802, 956–977
delegation versus, 1317
descriptors and, 866, 1384–1386
formal definition and algorithm, 1382–1388
instances and, 799
mapping attributes to sources, 1004–1009
metaclasses and, 866, 1378–1388
multiple, 789
namespaces and, 865
OOP considerations, 935–937
specializing inherited methods, 866
subclasses and, 808
type object and, 1379
usage examples, 802
__init__ method
about, 791, 889
attribute validation and, 1257
class decorators and, 1278, 1308
coding multiple, 864
constructors and, 818, 864, 888
inheritance and, 808
metaclasses and, 1371
__init__.py files, 709–711, 735–737
input built-in function
input trick on Windows, 64, 65
prompting for test inputs, 754
usage example, 330
installing Python, 28, 1421–1425
instance attributes, 843
class attributes versus, 843
creating, 865
emulating privacy for, 912
function decorators and, 1285
listing with __dict__, 959–963
usage examples, 878
instance methods, 864
instances
about, 784, 798
about and, 788
counting with class methods, 1031–1034
counting with static methods, 1030
creating, 798, 818–821
decorators managing, 1270
inheritance and, 799
metaclasses and, 1378–1388, 1396–1397
multiple, 797–801, 1279, 1308
namespaces and, 799
raising with raise statement, 1126
type object and, 1364–1366
usage examples, 799–801
int built-in function
about, 152, 205, 306
alternatives to, 207
interactive loops example, 332
integers (int type)
about, 97, 134
bit_length method, 154
converting to strings, 152
hex, octal, binary notation, 135, 151–153
integer keys, 259
precision in, 150
integrated development environments (IDEs)
about, 74, 1158
alternative, 79–81
interactive loops, 329–336
interactive prompt
about, 31, 43
code directories, 47
common usage mistakes, 52–54
exception handling and, 1085
as experimenting tool, 50
new Windows options, 46
printing values at, 356
prompts and comments and, 48
recursive reload example, 766
running code interactively, 49
scope and, 487
starting interactive sessions, 44
system path, 45
terminating compound statements, 53
as testing tool, 51
Internet scripting, 11
interpreters (see Python interpreter)
introspection tools
classes, 840–847
functions, 563, 1336
iOS platform, 1425
IronPython system, 34, 35
is operator, 136, 301
isinstance built-in function, 306, 919, 966
iter built-in function
about, 120, 419–422, 438
user defined iterables and, 608
__iter__ method
about, 608, 889
coding example, 902–906
iterable objects and, 895–906
membership and, 906–909
user defined classes and, 441
iteration
about, 120, 416
additional contexts, 429–434
additional information, 440
built-in functions and, 430, 434, 436, 574–
576, 617–621
built-in types supported, 422–424
comprehensions versus, 597–602
in dictionaries, 120, 253, 422, 439–440
in files, 286, 416–419, 590
generators and, 120, 439, 440, 592, 604–
605, 617–621
in lists, 242
loop coding techniques, 402–411, 416–424
manual, 419–422
multiple versus single pass, 438
one-shot, 621
operator overloading and, 894–906
quiz questions and answers, 441
timing iteration alternatives, 629–655
in tuples, 277
version considerations, 419, 434–440
J
JIT (just-in-time) compiler, 36, 38
JSON format
about, 116, 271
storing objects in, 291–292
json module, 13, 116, 271, 291
jump tables, 569, 571
just-in-time (JIT) compiler, 36, 38
Jython system, 34
K
KeyboardInterrupt exception, 1146
keys, 269
(see also access-by-key databases and
filesystems)
access-by-key databases and filesystems,
116
dictionary, 118, 223, 252, 253, 258, 269
integer-based, 259
mapping values to, 257
missing, 116–118, 260
sorting, 118–119, 256, 269
string method calls and, 231
tuple-based, 259
usage notes, 258
keyword arguments
about, 529, 532–534
abstract superclasses and, 870
decorators and, 1285
homegrown timing module, 641
mapping operations and, 114
modifying sort behavior, 247
printing example, 548
usage examples, 820
version considerations, 539–542
KISS principle, 588–589, 1070
Komodo IDE, 80
L
lambda expressions
about, 505, 567–572
callbacks and, 924
coding functions, 475, 571–572
def statement and, 505
inline callbacks, 573
multiway branching and, 374, 570
nesting, 506, 572
scope and, 487, 488, 572
unpacking arguments, 528
usage example, 137
language features, enabling in modules, 748
last-in-first-out (LIFO), 249, 559
Latin-1 character set, 1168
__le__ method, 889
left-shift (<<) operator, 207
LEGB rule
built-in scope and, 492
name resolution, 488–490
namespaces and, 872
nested classes, 875–877
len built-in function
dictionaries and, 253
sequence shufflers, 404
strings, 98
strings and, 194, 197, 200, 1171
__len__ method, 889, 927–929
lexical scoping, 486
LIFO (last-in-first-out), 249, 559
Linux platform
configuring Python, 1430
frozen binaries and, 39
GUI support, 11
icon clicks, 62
IDLE startup details, 75
installing Python, 28, 1424
system shell prompt, 44
working directory, 48
list built-in function
about, 212
converting objects to lists, 279
iteration protocol and, 267, 423, 430, 431,
434, 439
type customization and, 306
list comprehensions
about, 111–113, 424–425, 581
extended syntax, 427–429
files and, 426
filter built-in function and, 112, 583–586
for statement and, 400, 415, 425
functional tools, 581–589
generator expressions and, 440
map built-in function versus, 112, 243, 582,
590
matrixes and, 586–588
range built-in function and, 112, 242, 406
usage considerations, 588–589
list-unpacking assignments, 340
lists (list object)
about, 96, 109, 239–242
append method, 110, 246, 248, 344, 432
bounds checking, 110
changing in place, 244–250, 357
common operations, 240, 242, 249
comparison operations, 302
concatenating, 242, 1100
copy method, 250, 299
count method, 249
deleting items, 249
dictionaries versus, 259, 263
empty lists, 240
extend method, 110, 246, 248, 431
index method, 249
indexing, 243, 244–246
insert method, 110, 245, 249
iteration in, 242
literals, 96, 240
matrixes and, 243
mutable nature of, 109, 239
nesting, 110, 243
pop method, 110, 245, 249, 344
quiz questions and answers, 272
remove method, 110, 245, 249
repeating, 242
reverse method, 110, 247, 248
sequence operations, 109, 243
slicing, 243, 244–246
sort method, 110, 118, 246–248
tuples versus, 279
type-specific methods, 246–249
type-specific operations, 109
literals
about, 95
built-in object type examples, 95
byte string, 1183
dictionary, 96, 251
file, 96
hex, octal, binary notation, 151–153
integer objects, 134, 151–153
list, 96, 240
numeric, 96, 134–136
set, 96, 166
string, 96, 190–199, 378, 1175–1177
tuple, 96, 276
Unicode, 106, 190, 1176
little-endian format, 1201–1204
local scope
about, 487
class statement and, 489
LEGB rule and, 488, 872
local variables, 483
logical operations, 137
LookupError class, 1131
loops, 329
(see also specific statements)
attribute interception methods and, 1240
breaking out of, 1145
coding techniques for, 402–411
else clause, 389, 392–394
function-related gotchas, 661
interactive, 329–336
iterations and, 402–411, 416–424
nesting, 427–429, 506, 583–586, 1145
quiz questions and answers, 414
recursion versus, 557
__lt__ method, 889, 925
M
Mac OS X platform
frozen binaries and, 39
GUI support, 11
icon clicks, 62
IDLE startup details, 74
installing Python, 28
launch options, 82
system shell prompt, 44
working directory, 48
macros, decorators versus, 1272
__main__ module, 749–751, 851
manager functions, 1309, 1359–1361
map built-in function
benchmarking, 652
generator expressions versus, 599–601
homegrown timing module and, 635–638
iteration and, 430, 434, 436, 574–576, 617–621
list comprehensions versus, 112, 243, 582,
590
loop coding techniques and, 402, 407–410
parallel traversals, 407–410
timing calls example, 1295
version considerations, 408
mapattrs module, 975
mapping operations
about, 113
dictionaries and, 114, 116–118, 257
functions and, 574–576
mapping attributes to inheritance sources,
1004–1009
mapping values to keys, 257
missing keys, 116–118
math module
about, 98
built-in numeric tools, 155
floor function, 148
trunc function, 149
mathematical operations
about, 97
division, 146–150
expression operators, 137
nesting, 98
Matlab numeric programming system, 4, 111
matrixes
list comprehensions and, 586–588
nested lists and, 243
max built-in function, 155, 433
membership
in operator and, 117, 127, 214, 253
operator overloading and, 906–909
memory management
automatic, 18
garbage collection, 116, 178, 208
generator expressions and, 599
storing strings, 1170
metaclasses
about, 1355
built-in object types and, 1387–1388
call pattern issues and, 988
class decorators and, 1034, 1038–1040, 1361–1363, 1394–1400, 1404–1407
class statement and, 1367
coding, 1370–1378
customizing construction and initialization,
1372
declaring, 1368–1370
descriptors and, 1384
factory functions and, 1373
inheritance and, 866, 1378–1388
instances and, 1378–1388, 1396–1397
methods in, 1388–1407
model overview, 1364–1368
operator overloading and, 1374–1378,
1390
superclasses versus, 1381
type object and, 1366
usage considerations, 1356–1363
usage examples, 1391–1407
version considerations, 1369–1370
metafunctions, 1034–1037, 1273
metaprograms, 759–761
method resolution order (see MRO)
methods, 822
(see also operator overloading)
about, 788, 862
adding, 822–826, 1391–1400
attribute fetches, 209
augmenting, 829–831
binary operator, 917–921
bound, 573, 948–953, 1025
chaining, 427
coding, 824–826
decorators and, 1289–1295, 1400–1407
dictionary, 252, 254–256
exception handler, 1137–1139
expressions and, 356
file, 123, 285
functions and, 799
instance, 864
list, 110, 246–249
metaclass, 1388–1407
number-specific, 136
as objects, 948–953
scopes in, 1068
static, 865
string, 102, 191, 209–216, 215–216
string formatting, 222–234
superclass constructors, 864
tuple, 278–279
unbound, 948–953, 1025
underscores in, 805
usage examples, 863
microthreads, 35
min built-in function, 155, 433, 633
missing keys, 116–118, 260
mix-in classes, 956–977, 1057–1060
__mod__ method, 1189
module search path
about, 72, 678–684, 1431
changing, 756–758
lookup rules summary, 723
running modules, 1433
modules, 707
(see also from statement; import statement; packages)
about, 54, 96, 98, 669–670
as extension for import/from statements, 758
attributes and, 68–71, 671–673, 804
byte code files, 676–678
classes and, 788, 804, 860, 884
common usage mistakes, 71, 78
copying names, 689, 691–694, 772
creating, 687–688
data hiding in, 747
design concepts, 745–746
dual mode code example, 751–756
embedding calls, 81
enabling future language features, 748
exec built-in function and, 72
extension, 688
importing, 707
mixed usage modes, 749–751
module search path, 72, 678–684, 723, 756–758, 1431, 1433
name clashes, 771
namespaces and, 71, 669, 694–700
as objects, 759–761
potential gotchas, 770–776
programs and, 54, 93
Python program architecture and, 670–673
quiz questions and answers, 685, 704, 777
reloading, 66–71, 78, 700–703, 763–770
scope considerations, 486
statements and, 54, 93, 771
usage considerations, 688–694, 973
modulus operator, 217
mod_python package, 12
MongoDB database, 116
movie database example, 256–258
MRO (method resolution order)
about, 986, 997–1009
new-style classes and, 957, 975
super built-in function and, 831, 1050–1058
__mro__ attribute
about, 975, 986, 1001–1004
inheritance and, 1383
multiline block strings, 198–199
multiline statements (see compound statements)
multiple context managers, 1118–1119
multiple inheritance
about, 789
class gotchas, 1066–1068
diamond patterns of, 986, 997–1009
mix-in classes and, 956–977
super built-in function and, 1043–1046, 1050–1062
multiple instances, 797–801, 1279, 1308
multiple-target assignments, 341, 348–349
multiplication (*) operator
multiplying numbers, 97
repeating lists, 242
repeating strings, 100, 200
multithreading, 496, 595
multiway branching in if statements, 372–374, 570
mutable objects
avoiding argument changes, 526
changing in modules, 691
default values for arguments, 534
dictionaries as, 113
function gotchas, 658–660
lists as, 109, 239
N
__name__ attribute
about, 959
functions and, 537
inspecting inheritance hierarchies, 880
metaprogram example, 761
mixed usage modes, 749–751
modules and, 732, 759, 821
preset value, 868
name collisions, 71
name mangling, 944, 1321
name resolution, 488–490
named tuples, 122, 256, 277, 280–282
namespace declarations, 494
namespace dictionaries
about, 878–880
__dict__ attribute, 696
slots and, 1011–1013
namespace package model
about, 707, 723, 734
file precedence in, 740–741
nesting, 738
semantics, 735–737
usage examples, 737–738
namespaces
about, 68, 71, 485, 788, 872
assigning names, 873–875
attribute names and, 872
classes and, 860
__dict__ attribute, 572, 695, 810
files generating, 695
inheritance and, 865
instances and, 799
LEGB rule and, 872, 875–877
minimizing namespace pollution, 747
modules and, 71, 669, 694–700
nested classes and, 875–877
nesting, 699
scope and, 485
naming conventions and rules
classes, 844
LEGB rule, 488–490, 492, 872, 875–877
scope and, 485
for variables, 352–355
_x name prefix, 747
__ne__ method, 889, 925
nesting
blocks of code, 335, 376–378
classes, 875–877
control flows, 1143
decorators, 1279–1281, 1340
def statement, 477, 506, 790
dictionaries, 115, 252, 254, 260
exception handlers, 1141–1145
for statement, 399–400
functions, 499–508, 572, 1291
lambda expressions, 506, 572
lists, 110, 243
loops, 427–429, 506, 583–586, 1145
mathematical operations, 98
namespace packages, 738
namespaces, 699
string formatting, 225
try/except/finally statement, 1104, 1143–1145
tuples, 277
NetBeans IDE, 80
__new__ method, 889, 929, 1371
new-style classes
about, 839, 983–985
attribute tools, 1023
changes in, 985–1009, 1023
class tools, 986, 1004–1009
extensions to, 1010–1024
MRO and, 957, 975
multiple inheritance in, 956
properties, 1020–1023
slots, 1008, 1010–1019
next built-in function, 120, 419–422, 608
__next__ method
about, 419, 889
file iterators and, 417
generator functions and, 593
iterable objects and, 895–906
user defined iterables and, 608
None object, 127, 304
nonlocal statement
about, 320, 487, 494, 508–512
boundary cases, 511
coding functions, 476
enclosing scope and, 924, 1286–1287, 1302
state retention options, 512–517
version considerations, 508–512
normal versus chained comparisons, 144–146
NotImplemented object, 920
NotImplementedError exception, 869
numbers
about, 97–99, 133
bitwise operations, 153–155
Booleans (see Booleans)
built-in tools, 136, 155–157
comparison operations, 144–146, 302
complex, 97, 151
decimals (see decimals)
division operation, 146–150
expression operators, 136–141
in expressions, 141–143
floating-point (see floating-point numbers)
fractions (see fractions)
integers (see integers)
numeric display formats, 143
numeric extensions, 172
numeric literals, 96, 134–136
quiz questions and answers, 173
rational, 97, 127
sequence operations, 98
sets (see sets)
in variables, 141–143
numeric programming, 13
NumPy numeric programming extension
about, 4, 7, 13
customer base, 172
matrix support, 111
O
object persistence, 116
(see also specific modules)
about, 116, 847
classes and, 941
database programming and, 13
implementing, 847
object relational mappers (ORMs), 13, 854
object serialization, 1209–1211
object superclass, 881, 986, 995–997
object types
built-in (see built-in object types)
compound, 297
dictionaries (see dictionaries)
dynamic typing, 18, 97, 175–187
files (see files)
general type categories, 235–236
implementation-related, 96
lists (see lists)
numbers (see numbers)
quiz questions and answers, 131
strings (see strings)
strong typing, 97
testing, 986, 992–995, 1342
tuples (see tuples)
object-oriented programming (see OOP)
objects, 93
(see also immutable objects; mutable objects)
about, 93, 94, 177
attributes for, 785–787
classes and, 954–956, 983
dynamic typing, 18, 97, 175–187
expressions and, 93
garbage collection, 116, 178
iterable, 120, 416, 420, 423, 895–906
listing attributes per, 966–971
methods as, 948–953
modules as, 759–761
optimizing, 120
reference counters, 177, 179
shared references and, 180–185
slice, 891
strong typing, 97
type designators, 177
updating on shelves, 851
variables and, 176
oct built-in function, 135, 152
__oct__ method, 894
octal notation, 135, 151–153
OOP (object-oriented programming)
about, 833
attribute inheritance, 785–787
bound and unbound methods, 948–953
class gotchas, 1064–1070
classes and instances, 784, 788
code reuse, 792–795
coding class trees, 789–791
coding classes, 797–816, 859–885
composition and, 937–941
by customization, 794
decorators and metaclasses, 1034–1041
delegation and, 942–943
exception classes, 1123–1140
extending built-in types, 980–983
generic object factories, 954–956
important concepts in, 836
inheritance and, 935–937
KISS principle, 1070
metaclasses, 1376
method calls, 788
mix-in classes, 956–977
new-style classes, 983–1024
operator overloading, 791, 887–932
polymorphism and, 934
pseudoprivate class attributes, 944–947
Python and, 16, 129, 554, 933
quiz questions and answers, 795
realistic example of classes, 817–857
state information, 1232
static and class methods, 1024–1034
super built-in function, 1041–1064
open built-in function
customizing, 517–519
file processing and, 122, 126, 283
version considerations, 287
Windows platform and, 286
operations (see specific operations)
operator module, 577
operator overloading
about, 296, 785, 791, 805, 887, 1238
attributes and, 909–913, 1237–1256
binary operator methods, 917–921
Boolean tests and, 927–929
call expressions and, 921–925
common methods, 888–890
comparisons and, 925–927
constructors and expressions, 888
delegation and, 1322
display formats and, 368
double underscores and, 104
indexing and slicing, 890–894
iteration and, 894–906
membership and, 906–909
metaclasses and, 1374–1378, 1390
object destruction, 929–931
polymorphism and, 141
quiz questions and answers, 931
string representation and, 913–917
super built-in function and, 1047
usage considerations, 808
usage examples, 806–808, 826–828
validating methods, 1327
operator precedence, 139
optimizing objects
about, 120, 1159
byte code files, 684
execution optimization tools, 37–38
optparse module, 754
__or__ method, 889
or operator, 384
ord built-in function, 206, 582, 1167
OrderedDict subclass, 256
ordering (see sorting)
ORMs (object relational mappers), 13, 854
os module
descriptor files, 295
_exit function, 1153
popen function, 295, 402, 411, 423, 607, 650, 1150
system function, 412, 1150
walk function, 423, 607
OSError class, 1133
OverflowError exception, 1131, 1132
P
package imports
about, 707–711
__all__ variable, 711
from versus import statement, 713
__init__.py files, 709–711
relative imports model, 707, 717–733
search path settings, 708, 719
usage considerations, 713–716
usage example, 711–713
version considerations, 718
packages
about, 707, 716
__all__ variable, 711
namespace package model, 707, 723, 734–741
package imports, 708–716
quiz questions and answers, 742
relative imports model, 707, 717–733
search path settings, 708, 719
parameters (see arguments)
parentheses ( )
comprehensions and, 112
expression operators and, 139
statements and, 323, 328
superclasses and, 801
tuples and, 277
Parrot project, 40
parsing text in strings, 213
pass statement, 320, 389–390
passing-arguments-by-pointer, 524
passing-arguments-by-value, 524
PATH environment variable
about, 1427, 1428
env program and, 60
new Windows options, 46
setting, 45, 57
paths
module search paths, 72, 678–684, 723, 756–758, 1431, 1433
package imports, 708
package search paths, 708, 719
recording for recursive calls, 561
pattern matching in strings
about, 108
re module and, 108, 215, 1206
pdb command-line debugger, 84, 1159
PEP (Python Enhancement Proposal), 15
percent sign (%)
formatting expression operator, 217, 227–229, 1189
system shell prompt, 44, 48, 56
performance considerations, 589
(see also benchmarking)
list comprehensions, 589
MRO and, 1001
program execution, 32
Python alternatives, 35
slots, 1019
Perl programming language, 24
permutations, 612–614
persistence (see object persistence)
Peters, Tim, 543
pexpect system, 295
pickle module
about, 847
object persistence and, 13, 116
object serialization and, 1209–1211
persistence and, 941
storing objects, 290, 295
plus (+) operator
adding numbers, 97
concatenating lists, 242, 1100
concatenating strings, 100, 200, 1100
PMW extension package, 11
polymorphism
about, 101, 129, 933
classes and, 792–794, 832
in functions, 479, 482
OOP considerations, 934
operator overloading and, 141
testing exception types, 1151
portability, 17
positional arguments, 529, 534, 1331–1343
pow built-in function, 155
pprint module
pformat function, 1006
pprint function, 1006
usage considerations, 1009
precedence rules, 139
print built-in function, 49, 359–361, 547–549
print operations, 49
(see also print statement)
about, 358
built-in exception classes and, 1133–1135
completion certificate, 1414–1416
custom displays, 1135
display formats, 826–828
expression statements and, 356
file object methods and, 358
print built-in function, 49, 359–361, 547–549
print stream redirection, 363–366
quiz questions and answers, 370
standard output stream, 295, 358, 368
version considerations, 359–363, 547–549, 821
version-neutral, 366–368
print statement
about, 49, 361–363
common usage mistakes, 52, 59
debugging code and, 84
numeric display formats and, 143
private attributes, 1314–1318
procedures (see functions)
profile module, 121, 642, 1158
program architecture
about, 67
conceptual hierarchy, 93, 319
modules and, 670–673
program execution
about, 27
alternative IDEs, 79–81
alternative launch options, 81–83
byte code compilation and, 30–31
clicking file icons, 62–66
debugging code, 83–85
development considerations, 32
embedding calls, 81
exec built-in function and, 72–73
frozen binaries, 39, 82
future possibilities, 40, 83
IDLE user interface, 73–79
interactive prompt, 31, 43–54
interpreters and, 27–28, 30
model variations in, 33–40
module imports and reloads, 66–72
optimization tools, 37–38
performance considerations, 32
programmer's perspective, 28–30
PVM and, 31
quiz questions and answers, 41, 85–87
selecting from options, 83
system command lines and files, 54–59
text editor launch options, 82
Unix-style scripts, 59–62
program units, 96
(see also classes; functions; modules)
Programming Python (Lutz), 985
programs
about, 54
metaprograms, 759–761
modules and, 54, 93
prompts (see interactive prompt; system prompt)
properties
about, 1020–1023
attribute, 1221–1226
class statement, 798, 861, 1020
coding with decorators, 1224–1226
descriptors and, 1236–1237
validating with, 1256–1259
property built-in function, 1020, 1036, 1220, 1236
prototyping systems, 13
proxy classes (wrappers)
about, 942–943
decorators installing, 1270
delegation and, 988
pseudoprivate class attributes
about, 845, 944–947
larger projects and, 1160
public attributes and, 1321
PSF (Python Software Foundation), 15
pstats module, 1158
Psyco system, 36, 38
.pth file extension, 708
public attributes, 1318–1321
PVM (Python Virtual Machine), 31
.py file extension
about, 29, 675, 687
common usage mistakes, 78
imported files and, 55
py2app tool, 39
py2exe tool, 39
.pyc file extension, 30, 675, 676
__pycache__ subdirectory, 31, 63, 675, 676–678
PyChecker tool, 661, 1157
PyDev IDE, 80
PyDoc system
about, 105, 444
changing colors in, 456
help function, 449–451, 883, 1157
HTML reports, 452–460
version considerations, 452–460
pydoc.py script, 458
pygame toolkit, 39
PyInstaller tool, 39, 1159
PyLint system, 1157
PyMongo interface, 116
.pyo file extension, 684
PyPy system
about, 7, 34
benchmarking, 652
performance considerations, 35
timeit module and, 643–645
Pyrolog interpreter, 36
PySerial extension, 295
PySolFC program, 15
pystone.py program, 656
Python Enhancement Proposal (PEP), 15
Python interpreter
about, 27, 30
additional information, 1436
alternatives to, 36
byte code and, 30–31
configuring, 1427–1436
development considerations, 32
installing, 28, 1421–1425
locating with env program, 60
performance considerations, 32
PVM and, 31
Python programming language
additional information, 1436
advantages of, 3–5
common applications of, 10–15
compared to other languages, 21–22
compared to Perl, 24
development community, 15
execution speed, 7
future directions, 853–855
implementation alternatives, 33–36
new Windows options, 46
paradox of, 1409–1414
pillars of programming, 94
portability, 17
quiz questions and answers, 23–24
scripting and, 5
technical strengths, 16–21
tools supporting, 10–15, 19, 1156
tradeoffs using, 8, 15
user base, 9–10, 35
version considerations (see version considerations for Python)
Python Software Foundation (PSF), 15
Python Virtual Machine (PVM), 31
PYTHONIOENCODING environment variable, 1205
PythonLauncher, 62
PYTHONPATH environment variable
about, 1427, 1428
module search paths, 72, 679, 756
package search paths, 708
PyDoc HTML reports, 458
Windows platform and, 1430
PYTHONSTARTUP environment variable, 1427, 1428
PythonWin IDE, 80
PyUnit tool, 1157
.pyw file extension, 46
PY_PYTHON environment variable, 61, 1427, 1429
PY_PYTHON2 environment variable, 1427, 1429
PY_PYTHON3 environment variable, 1427, 1429
Q
queue module, 496
queues
best-first searches, 560
FIFO, 559
recursion versus, 559, 768
quiz questions and answers
chapter 1: Python Q&A session, 23–24
chapter 2: program execution, 41
chapter 3: program execution, 85–87
chapter 4: object types, 131
chapter 5: numbers, 173
chapter 6: dynamic typing, 186
chapter 7: strings, 237
chapter 8: lists and dictionaries, 272
chapter 9: tuples, files, and everything else, 311
chapter 10: statements, 336
chapter 11: assignments, expressions, and prints, 370
chapter 12: if tests and syntax rules, 385
chapter 13: while and for loops, 414
chapter 14: iterations and comprehensions, 441
chapter 15: documentation, 466
chapter 16: functions, 483
chapter 17: scopes, 519–521
chapter 18: arguments, 551
chapter 19: functions, 578
chapter 20: comprehensions and generators, 626
chapter 21: benchmarking, 662
chapter 22: modules, 685
chapter 23: modules, 704
chapter 24: module packages, 742
chapter 25: modules, 777
chapter 26: OOP, 795
chapter 27: classes, 815
chapter 28: classes, 855–857
chapter 29: classes, 884
chapter 30: operator overloading, 931
chapter 31: classes, 978
chapter 32: classes, 1071
chapter 33: exceptions, 1090
chapter 34: exceptions, 1120
chapter 35: exception classes, 1139
chapter 36: exceptions, 1161
chapter 37: Unicode and byte strings, 1215–1217
chapter 38: attributes, 1266–1268
chapter 39: decorators, 1344–1353
quotation marks
interchangeable, 193
multiline block strings, 198–199
strings in, 105, 191
R
r file processing mode, 122, 283
__radd__ method, 889, 917–921
raise statement
about, 321, 1081, 1106
built-in exceptions and, 1086
chaining exceptions, 1110–1112
from clause, 1110–1112
propagating exceptions, 1110
raising exceptions, 549, 896, 1086, 1107
raising instances, 1126
signaling conditions with, 1147
version considerations, 1107
random module
about, 98, 156
generator example, 616
range built-in function
counter loops, 402
iteration and, 434, 435
list comprehensions and, 112, 242, 406
loop coding techniques and, 344, 402–407
nonexhaustive traversals, 405
sequence scans, 403
sequence shufflers, 404
timing calls example, 1298
rapid development cycle, 6, 33
rational numbers, 97, 127
raw_input built-in function, 64, 65, 330
re module
about, 96
findall function, 1212
match function, 192, 1212
pattern matching and, 108, 215, 1206
search function, 1212
read-only descriptors, 1228
recursive comparisons, 300
recursive functions
about, 487, 555, 880
coding alternatives, 556
from statement and, 775
generators and, 968
handling arbitrary structures, 558–561
loops versus, 557
reloaders, 764–767
summation with, 555
usage examples, 561
reduce built-in function, 430, 576
reference counters, 177, 179
references, 180
(see also shared references)
about, 177, 500
assignments and, 339
attribute, 909
circular, 179
copies versus, 297–300, 308
cyclic, 179
dynamic typing and, 176, 180–185
string method calls, 231
weak, 185
relative imports model
about, 707, 717–720
absolute imports versus, 722
lookup rules summary, 723
pitfalls of, 729–733
scope of, 722
usage considerations, 720–722
usage examples, 723–728
version considerations, 721
reload built-in function
about, 66–71, 700–703
common usage mistakes, 71, 78
from statement and, 773–775
usage examples, 763–770
repetition
as programming pillar, 94
in strings, 100, 200, 242
in tuples, 277
usage considerations, 309
repr built-in function
about, 205
display formats, 144
string formatting method calls and, 224
version considerations, 138
__repr__ method
about, 889, 913–917
custom print displays, 1135
inheritance and, 842, 966
print display example, 826–828
recursive looping and, 561
reserved words, 352
reStructuredText markup language, 461
return statement
about, 320, 477
coding functions, 475
function gotchas, 660
returning multiple values, 527
yield statement versus, 592–597
reversed built-in function, 248, 401
RIAs (rich Internet applications), 12
rich Internet applications (RIAs), 12
__rmod__ method, 1189
round built-in function, 149, 156
running programs (see program execution)
S
SAX parsing, 1212
ScientificPython programming extension, 14
SciPy programming extension, 14, 111
scopes
about, 485–488
accessing global variables, 498
builtins module, 491–493
comprehension variables and, 623
function-related gotchas, 661
global statement, 320, 476, 487, 494, 509
imports versus, 698
LEGB rule and, 488, 492, 872, 875–877
in methods and classes, 1068
minimizing cross-file changes, 497–498
minimizing global variables, 495
modules and, 695
name resolution and, 488–490
nested functions and, 499–508, 572
nonlocal statement, 320, 476, 487, 494, 508–517
quiz questions and answers, 519–521
relative imports model, 722
try/except statement and, 1108
usage example, 490
screen scraping technique, 854
scripts and scripting
about, 5
common usage mistakes, 78
error handling, 66
executable, 59–62
Internet, 11
launching scripts with icon clicks, 62–66
modules and, 54
Python support, 16
running with arguments, 1432
terminating compound statements, 53
timeit module and, 644, 647–651
timing script, 634
writing scripts, 55
search path
modules, 72, 678–684, 756–758, 1431, 1433
packages, 708, 719
selection as programming pillar, 94
self argument
about, 554, 863
coding constructors, 818
lambda callbacks and, 573
static methods and, 1025
usage considerations, 811
semicolon (;), 323, 327
sentinel value, 1147
sequence assignments
about, 340–342
advanced patterns, 342–344
sequence operations
bytes string type, 1190
dictionaries and, 258
generating scrambled sequences, 609–614
iteration and, 434
lists, 109
loop coding techniques, 403–405
numbers, 98
statement execution, 375
strings, 99–101
sequences
about, 99
escape, 105, 191, 193–197
intersecting, 480–483, 545–547
list, 109, 243
as programming pillar, 94
repeating, 309
string, 99
tuple, 121
server connections, closing, 1148
set built-in function, 126, 167, 306
set comprehensions, 166, 168, 432, 624–626
__set__ method
about, 889
descriptors and, 912, 1293
managing attributes, 1227
set notation, 111
setattr built-in function, 572
__setattr__ method
about, 889, 909–913, 1220, 1238
attribute assignments and, 861
emulating privacy, 944
private attributes and, 1318
recursive looping and, 561
__setitem__ method, 889, 890–894, 1194
sets
about, 97, 126, 163, 169–171, 547
comparison operations, 302
copy method, 299
creating, 126
dictionary views and, 268
frozen, 167
immutable constraints, 167
literals, 96, 166
version considerations, 164–169
__setslice__ method, 893
shadowing, 661
shared references
about, 180–181
arguments and, 524–526
augmented assignments and, 352
equality and, 183–185
in-place changes and, 181–183
multiple-target assignments and, 349
Shed Skin system, 33, 37
shell tools and commands, 6, 11, 411–413
(see also system command lines and files)
shelve module
about, 793, 847, 848
dictionary interfaces and, 271
exploring shelves interactively, 849–851
object persistence and, 116, 941
open function, 849
pickle module and, 1209–1211
storing objects on database, 848
updating objects, 851
shelves (see access-by-key databases and filesystems)
singleton classes, 1301–1303
slice expressions, 891
slice objects, 891
slicing
lists, 243, 244–246
nonexhaustive traversals, 405
operator overloading and, 890–893
strings, 100, 201–204
tuples, 277
slots
about, 1008, 1010
descriptors and, 1237
example impacts of, 1017
handling generically, 1014–1016
managing attributes, 912
namespace dictionaries and, 1011–1013
private attributes and, 1318
speed considerations, 1019
superclasses and, 1013
usage rules, 1016
socket module, 96
sorted built-in function
about, 118
dictionaries and, 269, 303
iteration and, 430
lists and, 248
tuples and, 278
sorting
keys, 118–119, 256, 269
lists, 246–248
version considerations, 247, 302
source code
about, 30
timestamps in, 31
spaces versus tabs, 378
Sphinx tool, 444, 461
SQL database API, 590
square brackets [ ], 96, 224, 328
square roots, 156
stack traces, 1084, 1099
Stackless Python, 11, 35
stacks
inspecting, 1328–1329
LIFO, 249, 559
limiting depth of, 561
recursion versus, 559, 768
standard error stream (stderr), 366, 930
standard input stream (stdin), 369
standard library
about, 4
launch options, 82
standard output stream (stdout), 295, 358, 368
state information
about, 129
built-in exception classes and, 1133–1135
class decorators and, 1317
in descriptors, 1232–1235
factory functions and, 501
function attributes and, 565
function decorators and, 1285–1289
generator functions, 592
nonlocal statement and, 512–517
recursive functions and, 560
validating with descriptors and, 1259–1263
statements
about, 319
assignment, 320, 339–370
colon character and, 322
common usage mistakes, 53
compound (see compound statements)
continuation lines in, 328, 378
control-flow, 375
delimiting, 378
expressions and, 93, 356–357
indenting, 324–327
interactive loops example, 329–336
listed, 320–322
modules and, 54, 93, 771
parentheses and, 323, 328
Python syntax model, 322–329
quiz questions and answers, 336
semicolon and, 323
special case rules, 327–329, 379–380
syntax rules, 375–380
terminating, 323
version considerations, 321
static methods
about, 865, 1024
alternatives for, 1027
counting instances, 1030
usage considerations, 1028–1030
version considerations, 1025–1027
staticmethod built-in function, 951, 1024,
1029–1030
steps in slicing, 203
StopIteration exception, 417, 420, 593
storing objects and data
binary data, 293
class building example, 847–853
class gotchas, 1069
in files, 288–290
in JSON format, 291–292
pickle module, 290, 295
on shelve database, 848
strings, 1170
struct module, 293
str built-in function
about, 205, 306
display formats, 144
string formatting method calls and, 224
usage example, 98
version considerations, 138
__str__ method
about, 889, 913–917
custom print displays, 1135
inheritance and, 808, 966
print display example, 826–828, 843
str string type
about, 1171–1173
converting, 1192
encoded text, 1183
re module and, 1206
text files and, 287
Unicode literals, 106, 190, 1176
version considerations, 194
stream redirection, 57
strides in slicing, 203
string formatting
about, 103, 144, 1190
converting integers to strings, 152
expressions technique, 216–222
literals, 191
method calls technique, 222–234
nesting, 225
type codes, 218–220
string module
about, 215–216
relative imports examples, 719–722, 724–728
strings (str object)
about, 96, 99, 189, 1167–1174
__add__ method, 104
alternate ways to code, 105
backslash characters, 193–197
casefold method, 247
changing, 102, 208, 211–213
coding, 1174–1178
common operations, 190–201
comparison operations, 302
concatenating, 100, 104, 200, 1100
converting, 152, 192, 205–207
debugging, 972
decode method, 192, 1177
documentation, 199, 375, 444, 446–449
empty strings, 191
encode method, 192, 1169, 1177
endswith method, 192, 214
exceptions based on, 1124
find method, 102, 191, 200, 212
format method, 222, 225, 227–234, 968
formatting (see string formatting)
garbage collection and, 208
immutable, 101, 191, 208
importing modules by name string, 761–763
indexing, 99, 201–204
isdigit method, 192, 332
join method, 192, 213, 431, 598
literals, 96, 190–199, 378, 1175–1177
lower method, 192, 247
multiline block strings, 198–199
nonexhaustive traversals, 405
operator overloading and, 913–917
parsing text, 213
pattern matching, 108, 215
quiz questions and answers, 237, 1215–1217
repeating, 100, 200
replace method, 102, 191, 208, 211
rstrip method, 191, 205, 289, 426
sequence operations, 99–101
slicing, 100, 201–204
split method, 191, 213, 289
tool changes, 1206–1214
type and content mismatches, 1198
type-specific methods, 102, 191, 209–216
Unicode, 106–108, 189, 754–756, 1178–1188
upper method, 103, 632
version considerations, 190, 194, 215–216, 229, 968, 1165–1167, 1204
strong typing, 97
struct module, 124, 293, 1172, 1207–1209
__sub__ method, 888
subclasses
about, 787
class interface techniques, 867–868
coding, 828
customizing behavior, 802, 828–834
extending built-in types, 981–983
inheritance and, 808, 866
type object and, 1366
subprocess module, 295, 413
substitution operations in string formatting, 103
sum built-in function, 112, 155, 555
super built-in function
about, 831, 865, 1041
basic usage and tradeoffs, 1043–1049
debates about, 1041–1042
multiple inheritance and, 1043–1046,
1050–1062
operator overloading and, 1047
runtime class changes, 1049
summary of, 1062
version considerations, 1048
superclasses
about, 787
abstract, 869–871, 939
class gotchas, 1069
class interface techniques, 867–868
constructor methods, 864
customizing, 833
inheritance and, 866
metaclasses versus, 1381
multiple inheritance and, 956
operator overloading methods and, 1325
parentheses and, 801
slots and, 1013
traditional forms, 1042
sys module
argv attribute, 204, 650, 751
excepthook function, 1149
exc_info function, 1138, 1149, 1150–1152
exit function, 1146, 1153
file name settings, 1205
getrecursionlimit function, 561
modules dictionary, 498, 676, 759
path list, 72, 682, 719, 723, 756
platform attribute, 55, 223, 631
setrecursionlimit function, 561
stderr attribute, 366, 930
stdin attribute, 369
stdout attribute, 295, 358, 363–366, 368
system command lines and files
about, 54, 1432–1436
common usage mistakes, 58
running files with command lines, 56
running in Python, 294, 411–413, 423, 650, 770
starting interactive sessions, 44
system shell prompt, 44, 48
timeit module and, 644
usage variations, 57
writing scripts, 55
system shell prompt
about, 44, 48
running files, 56
SystemExit exception, 1146, 1153
systeminfo command, 412
systems programming, 11
T
-t command-line flag, 378
tabs versus spaces, 378
TCL_LIBRARY environment variable, 1427
termination actions
about, 1083, 1087–1088
default exception handler and, 1084
try/finally statement and, 1083, 1087–1088, 1100–1102, 1152
with/as statement and, 1083, 1088, 1152
testing
error handling, 332–334
filtering results, 427
from statement and, 774
interactive prompt and, 51
list comprehensions and, 583–586
for missing keys, 117
with __name__ attribute, 750
for positional arguments, 1331–1343
processes, 819–820, 1149
reloading variants, 769
timing calls example, 1297
truth values, 171, 305, 380–382, 927–929
type, 986, 992–995, 1342
text editor launch options, 82
text files
about, 107, 1173–1174, 1196
binary files and, 123
creating, 122
Unicode, 124–126, 190, 754–756
version considerations, 287, 1197
_thread module, 496
threading module, 496
3to2 converter, 367
time module
about, 121, 543
clock function, 631, 633, 644, 1297
homegrown timing module and, 630
perf_counter function, 633, 1297
process_time function, 633
time function, 631, 633
timeit module
about, 121, 543, 642–647
benchmark and script, 647–651
other examples, 890, 1019, 1297, 1434, 1484
repeat function, 643
setup code, 646, 654
timestamps in source code, 31
timing calls with function decorators, 1295–1301
timing iterations
alternatives for, 629–642
timeit module, 642–655
timsort algorithm, 543
tkinter GUI toolkit
about, 11
callbacks and, 923, 953
common usage mistakes, 79
configuring Python and, 1428
IDLE and, 74
keyword arguments and, 550
lambda callbacks, 573
quit function, 79
testing reloading variants, 769
TK_LIBRARY environment variable, 1427
traceback module, 1151
traceback objects, 1150, 1151
translation (see encoding and decoding)
triple quotes, 198–199
true value in Python
Booleans and, 171, 304–305, 380–382
built-in scope and, 494
operator overloading and, 927–929
truncating division, 146–150
try statement
about, 1094–1095
catching built-in exceptions, 1100
clauses supported, 1095–1098
debugging with, 1149
default behavior, 1098
wrapping statements with, 1152
try/except statement
about, 321, 1081
catching exceptions, 1084, 1088, 1096–1097, 1100
error handling, 333–334
nesting, 1141–1145
scopes and, 1108
try/except/else statement, 1093–1100, 1147, 1150
try/except/finally statement, 1102–1106, 1143–1145
try/finally statement
about, 321, 1081
closing files and server connections, 1148
closing files example, 294
termination actions, 1083, 1087–1088, 1100–1102, 1152
-tt command-line flag, 378
tuple built-in function, 279, 306, 431
tuple-unpacking assignments, 340, 396–398
tuples
about, 70, 121–122, 276
assignments and, 591
common operations, 276, 277–279
comparison operations, 302
concatenating, 277
converting, 278–279
count method, 277, 279
empty, 276
exception hierarchies and, 1129
immutable, 121, 278–279
index method, 277, 279
indexing, 277
iteration in, 277
lists versus, 279
literals, 96, 276
named, 122, 256, 277, 280–282
nesting, 277
quiz questions and answers, 311
repeating, 277
slicing, 277
tuple keys, 259
type-specific methods, 278–279
2to3 converter, 366
type built-in function, 128, 306, 986, 992–995
type designators, 177
type object
classes as instances of, 1364–1366
inheritance and, 1379
metaclasses as subclasses of, 1366
TypeError exception, 1100, 1356
types module, 306, 764
U
unbound methods, 948–953, 1025
underscore (_)
class names, 845
module names, 70, 747
name mangling and, 945
operator overloading, 104, 805
showing name values, 971
Unicode character set
character code conversions, 207
code points, 106, 194, 206, 1170
currency symbols, 754–756
encoding and decoding, 123, 190, 192, 1178–1188, 1199–1206
JSON format and, 292
literals and, 106, 190, 1176
quiz questions and answers, 1215–1217
strings and, 106–108, 189, 754–756, 1178–1188
text files and, 124–126, 190, 754–756
unicode string type
about, 106, 190, 287, 1171–1173
coding strings with, 1185–1187
converting, 1192
re module and, 1206
unittest module, 750, 1158
Unix platform
awk utility, 413
configuring Python, 1430
env program, 60
executable scripts, 59–62
frozen binaries, 39
GUI support, 11
icon clicks, 62
IDLE startup details, 75
installing Python, 28, 1425
system shell prompt, 44
Windows launcher and, 1437
working directory, 48
Unladen Swallow project, 40
unpacking arguments, 528, 535
user-defined classes, 129
user-defined exceptions, 1086, 1134, 1147
UTF-16 encoding, 1170
UTF-8 encoding, 1168
V
validating
attributes, 1256–1266
function arguments, 1330–1343
operator overloading methods, 1327
value equality operators, 137
van Rossum, Guido, 17
varargs, 529, 536–537
variables
about, 177
assigning values to, 50, 99, 176, 177
attributes and, 68
comprehension, 490, 623
creating, 99, 176
dynamic typing and, 176–178
exception, 490
expressions and, 176
from * statement and, 773
function-related gotchas, 661
global, 495, 498, 745
local, 483
name collisions and, 71
name rules for, 352–355
numbers in, 141–143
objects and, 176
scope of, 486
shared references and, 180–185
try/except statement and, 1108
version considerations for Python
about, xxxvi–xxxix
abstract superclasses, 870
Booleans, 928
builtins module, 156, 493
classes, 983
comparisons and sorts, 247, 302
context managers, 1118–1119
dictionaries, 264–271, 303
division operations, 146–148
exception classes, 1123
expression operators, 138
files, 287, 1197
function attributes, 515–517
iteration, 419, 434–440, 896
keyword arguments, 539–542
map built-in function, 408
metaclasses, 1369–1370
next method, 593
nonlocal statement, 508–512
package imports, 718
printing, 359–363, 547–549, 821
PyDoc system, 452–460
raise statement, 1107
relative imports model, 721
sets, 164–169
statements, 321
static methods, 1025–1027
storing strings in memory, 1170
strings, 190, 194, 215–216, 229, 968, 1165–1167, 1204
summarized, 1451–1463
super built-in function, 1048
threading, 496
unbound methods, 950
wrapper classes, 943
view objects, 266–269, 439–440
virtual attributes, 1014–1016
virtual concatenation, 737
W
w file processing mode, 122, 283
warnings module, 1147
weak references, 185
weakref module, 185
websites, 462, 853
while statement
about, 119, 320, 387
C language, 394
general format, 388
interactive loops example, 330
iteration and, 418
quiz questions and answers, 414
recursion versus, 557
sequence scans, 403
usage examples, 388
whitespace, 191
win32all package, 716
Windows launcher
#! comment support, 60–62
about, 46, 57, 1439–1441, 1450
command lines, 1435
icon clicks, 62
pitfalls, 1447–1450
tutorial on, 1441–1447
Unix legacy, 1437
Windows legacy, 1438
Windows platform
#! comment support, 60–62
command-line interface, 45, 57
common usage mistakes, 58
configuring Python, 1430
frozen binaries and, 39
GUI support, 11
icon clicks, 62, 63–65
IDLE startup details, 74
installing Python, 28, 1424, 1425–1427
new options, 46
open built-in function, 286
Python documentation, 461
system shell prompt, 44
systeminfo command, 412
win32all package, 716
working directory, 47
Windows Registry Editor, 1431
Wing IDE, 80
Winpdb system, 85
with/as statement
about, 321, 1081, 1114–1117
closing files and server connections, 1148
file objects and, 285, 294
resetting precision, 159
termination actions, 1083, 1088, 1152
version considerations, 1118–1119
working directory, 47
wrappers (proxy classes)
about, 942–943
decorators installing, 1270
delegation and, 988
writing scripts, 55
wxPython GUI API, 11
X
_x naming convention, 747, 1321
XML parsing tools, 1211–1214
xrange built-in function, 403, 435
Y
yield operator, 137
yield statement
about, 320
coding example, 902–906
coding functions, 475
extended syntax, 605
function gotchas, 660
generator functions and, 591
iteration and, 423, 440
return statement versus, 592–597
Z
ZeroDivisionError exception, 1131, 1132
zip built-in function
dictionary keys and, 262, 265
iteration and, 430, 433, 434, 437, 617–621
loop coding techniques and, 402, 407–410
parallel traversals, 407–410
.zip file extension, 684
ZODB object-oriented database system, 854
About the Author
Mark Lutz is a leading Python trainer, the author of Python’s earliest and best-selling texts, and a pioneering figure in the Python world.

Mark is the author of the three O’Reilly books Learning Python, Programming Python, and Python Pocket Reference, all currently in fourth or fifth editions. He has been using and promoting Python since 1992, started writing Python books in 1995, and began teaching Python classes in 1997. As of Spring 2013, Mark has instructed 260 Python training sessions, taught roughly 4,000 students in live classes, and written Python books that have sold 400,000 units and been translated to at least a dozen languages.

Together, his two decades of Python efforts have helped to establish it as one of the most widely used programming languages in the world today. In addition, Mark has been in the software field for 30 years. He holds BS and MS degrees in computer science from the University of Wisconsin, where he explored implementations of the Prolog language, and over his career has worked as a professional software developer on compilers, programming tools, scripting applications, and assorted client/server systems. Mark maintains a training website and an additional book support site on the Web.
Colophon
The animal on the cover of Learning Python, Fifth Edition, is a wood rat (Neotoma Muridae). The wood rat lives in a wide range of conditions (mostly rocky, scrub, and desert areas) over much of North and Central America, generally at some distance from humans. Wood rats are good climbers, nesting in trees or bushes up to six meters off the ground; some species burrow underground or in rock crevices or inhabit other species’ abandoned holes.

These grayish-beige, medium-size rodents are the original pack rats: they carry anything and everything into their homes, whether or not it’s needed, and are especially attracted to shiny objects such as tin cans, glass, and silverware.

The cover image is a 19th-century engraving from Cuvier’s Animals. The cover font is Adobe ITC Garamond. The text font is Linotype Birka; the heading font is Adobe Myriad Condensed; and the code font is LucasFont’s TheSansMonoCondensed.