Python: Visual QuickStart Guide Python 3rd Edition

User Manual:

Open the PDF directly: View PDF .
Page Count: 226

Download
Open PDF In Browser	View PDF

V I S U A L Q U I C K S TA RT G U I D E

Python
Third Edition

LARRY ULLMAN
TOBY DONALDSON

L E A R N T H E Q U I C K A N D E A S Y WAY !
From the Library of Mo Medwani

V I S UA L Q U I C K S TA R T G U I D E

Python
TOBY DONALDSON

Peachpit Press

From the Library of Mo Medwani

Visual QuickStart Guide

Python, Third Edition
Toby Donaldson
Peachpit Press
www.peachpit.com
To report errors, please send a note to errata@peachpit.com
Peachpit Press is a division of Pearson Education
Copyright © 2014 by Toby Donaldson
Editor: Scout Festa
Production Editor: Katerina Malone
Compositor: David Van Ness
Indexer: Valerie Haynes Perry
Cover Design: RHDG / Riezebos Holzbaur Design Group, Peachpit Press
Interior Design: Peachpit Press
Logo Design: MINE™ www.minesf.com

Notice of Rights
All rights reserved. No part of this book may be reproduced or transmitted in any form by any means,
electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the
publisher. For information on getting permission for reprints and excerpts, contact permissions@peachpit.com.

Notice of Liability
The information in this book is distributed on an “As Is” basis, without warranty. While every precaution has
been taken in the preparation of the book, neither the author nor Peachpit shall have any liability to any
person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the
instructions contained in this book or by the computer software and hardware products described in it.

Trademarks
Visual QuickStart Guide is a registered trademark of Peachpit Press, a division of Pearson Education.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as
trademarks. Where those designations appear in this book, and Peachpit was aware of a trademark claim,
the designations appear as requested by the owner of the trademark. All other product names and services
identified throughout this book are used in editorial fashion only and for the benefit of such companies with no
intention of infringement of the trademark. No such use, or the use of any trade name, is intended to convey
endorsement or other affiliation with this book.
ISBN-13: 978-0-321-92955-6
ISBN-10:
0-321-92955-1
9 8 7 6 5 4 3 2 1
Printed and bound in the United States of America

From the Library of Mo Medwani

Acknowledgments
Thanks to Clifford Colby and Scout Festa for their expertise and
patience in bringing this edition of the book to life; to the many students
at SFU who continue to teach me how best to learn Python; to John
Edgar and the other computer science teachers at SFU with whom I’ve
had the pleasure to work; and to Bonnie, Thomas, and Emily for recommending I avoid using the word blithering more than once in these
acknowledgments. And a special thank you to Guido van Rossum and
the rest of the Python community for creating a programming language
that is so much fun to use.

From the Library of Mo Medwani

Contents at a Glance
Chapter 1

Introduction to Programming . . . . . . . . . . . . . 1

Chapter 2

Arithmetic, Strings, and Variables . . . . . . . . . . 9

Chapter 3

Writing Programs . . . . . . . . . . . . . . . . . . . . 31

Chapter 4

Flow of Control . . . . . . . . . . . . . . . . . . . . . 43

Chapter 5

Functions . . . . . . . . . . . . . . . . . . . . . . . . . 67

Chapter 6

Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

Chapter 7

Data Structures . . . . . . . . . . . . . . . . . . . . . 101

Chapter 8

Input and Output . . . . . . . . . . . . . . . . . . . . 123

Chapter 9

Exception Handling . . . . . . . . . . . . . . . . . . 143

Chapter 10

Object-Oriented Programming . . . . . . . . . . . 153

Chapter 11

Case Study: Text Statistics . . . . . . . . . . . . . . 177

Appendix A Popular Python Packages . . . . . . . . . . . . . . . 195
Appendix B Comparing Python 2 and Python 3 . . . . . . . . 199
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203

iv

Contents at a Glance

From the Library of Mo Medwani

Table of Contents
Chapter 1

Introduction to Programming . . . . . . . . . . . . . . 1
The Python Language . . .
What Is Python Useful For?
How Programmers Work . .
Installing Python . . . . . .

Chapter 2

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

2
3
4
6

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

10
11
13
16
17
19
20
22
24
26
28
29

Writing Programs . . . . . . . . . . . . . . . . . . . . . 31
Using IDLE’s Editor . . . . . . . . . .
Compiling Source Code . . . . . . .
Reading Strings from the Keyboard .
Printing Strings on the Screen . . . .
Source Code Comments . . . . . . .
Structuring a Program . . . . . . . .

Chapter 4

.
.
.
.

Arithmetic, Strings, and Variables . . . . . . . . . . . 9
The Interactive Command Shell
Integer Arithmetic . . . . . . . .
Floating Point Arithmetic . . . .
Other Math Functions . . . . .
Strings . . . . . . . . . . . . . .
String Concatenation . . . . . .
Getting Help . . . . . . . . . . .
Converting Between Types . .
Variables and Values . . . . . .
Assignment Statements . . . .
How Variables Refer to Values
Multiple Assignment . . . . . .

Chapter 3

.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

32
35
36
39
41
42

Flow of Control . . . . . . . . . . . . . . . . . . . . . . 43
Boolean Logic . . . . . . . . .
If-Statements . . . . . . . . .
Code Blocks and Indentation
Loops . . . . . . . . . . . . . .

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

44
49
51
54

Table of Contents v

From the Library of Mo Medwani

Comparing For-Loops and While-Loops . . . . . . . . . 59
Breaking Out of Loops and Blocks . . . . . . . . . . . . 64
Loops Within Loops . . . . . . . . . . . . . . . . . . . . . 66

Chapter 5

Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Calling Functions . . .
Defining Functions . .
Variable Scope . . . .
Using a main Function
Function Parameters .
Modules . . . . . . . .

Chapter 6

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

68
70
73
75
76
80

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

84
87
89
92
98

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

102
103
104
108
. 110
. 113
. 115
. 118
122

Input and Output . . . . . . . . . . . . . . . . . . . . . 123
Formatting Strings . . . . .
String Formatting . . . . . .
Reading and Writing Files .
Examining Files and Folders
Processing Text Files . . . .
Processing Binary Files . .
Reading Webpages . . . . .

vi

.
.
.
.
.
.

Data Structures . . . . . . . . . . . . . . . . . . . . . . 101
The type Command .
Sequences . . . . . .
Tuples . . . . . . . .
Lists . . . . . . . . . .
List Functions . . . .
Sorting Lists . . . . .
List Comprehensions
Dictionaries . . . . .
Sets . . . . . . . . . .

Chapter 8

.
.
.
.
.
.

Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
String Indexing . . . . . .
Characters . . . . . . . . .
Slicing Strings . . . . . . .
Standard String Functions
Regular Expressions . . .

Chapter 7

.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

124
126
128
. 131
134
138
. 141

Table of Contents

From the Library of Mo Medwani

Chapter 9

Exception Handling . . . . . . . . . . . . . . . . . . 143
Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . 144
Catching Exceptions . . . . . . . . . . . . . . . . . . . 146
Clean-Up Actions . . . . . . . . . . . . . . . . . . . . . 150

Chapter 10

Object-Oriented Programming . . . . . . . . . . . . 153
Writing a Class . . .
Displaying Objects .
Flexible Initialization
Setters and Getters .
Inheritance . . . . . .
Polymorphism . . . .
Learning More . . . .

Chapter 11

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

154
156
160
162
168
. 171
175

Case Study: Text Statistics . . . . . . . . . . . . . . . 177
Problem Description . . . . . . . . . . . . . . .
Keeping the Letters We Want . . . . . . . . . .
Testing the Code on a Large Data File . . . . .
Finding the Most Frequent Words . . . . . . .
Converting a String to a Frequency Dictionary
Putting It All Together . . . . . . . . . . . . . .
Exercises . . . . . . . . . . . . . . . . . . . . . .
The Final Program . . . . . . . . . . . . . . . .

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

178
180
182
184
187
188
190
192

Appendix A Popular Python Packages . . . . . . . . . . . . . . . 195
Some Popular Packages . . . . . . . . . . . . . . . . . 196

Appendix B Comparing Python 2 and Python 3 . . . . . . . . 199
What’s New in Python 3 . . . . . . . . . . . . . . . . . 200

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203

Table of Contents vii

From the Library of Mo Medwani

This page intentionally left blank

From the Library of Mo Medwani

1
Introduction to
Programming
Before we dive into the details of Python
programming, it helps to learn a bit about
what Python is and what kinds of programs
it is used for. We will also outline exactly
what it is that programmers do. Finally,
we’ll learn how to install Python and run
the IDLE editor that comes with it.
If you are new to programming, this short
introduction should help you get your footing in preparation for learning the Python
programming language.

In This Chapter
The Python Language

2

What Is Python Useful For?

3

How Programmers Work

4

Installing Python

6

If you already have a grasp of the basic
concepts, feel free to jump ahead to the
sections on how to install Python and run
the editor.

From the Library of Mo Medwani

The Python Language
So what is Python? Briefly, it is a computer
programming language and a corresponding set of software tools and libraries. It was
originally developed in the early 1990s by
Guido van Rossum, and it is now actively
maintained by dozens of programmers
around the world (including van Rossum).
Python was designed to be easy to read
and learn. Compared with programs written in most other programming languages,
Python programs look neat and clean:
Python has few unnecessary symbols, and
it uses straightforward English names.
Python is a very productive language:
Once you’re proficient with Python, you
can get more done with it in less time
than you can in most other programming
languages. Python supports—but doesn’t
force you to use—object-oriented programming (OOP).
Python comes with a wide range of readymade libraries that can be used in your
own programs; as some Python programmers like to say, Python comes with “batteries included.”
A very practical feature of Python is its
maintainability. Since Python programs are
relatively easy to read and modify, they are
easy for programmers to keep up to date.
Program maintenance can easily account
for 50 percent or more of the work a
programmer does, and so Python’s support
for maintenance is a big win in the eyes of
many professionals.
Finally, a word about the name. According
to Python’s originator, Guido van Rossum,
Python was named after the Monty Python
comedy troupe. Despite this mirthful origin,
Python now uses a pair of iconic blue and
yellow snakes—presumably pythons—as its
standard symbol.

2

Chapter 1

From the Library of Mo Medwani

What Is Python
Useful For?
While Python is a general-purpose language that can be used to write any kind
of program, it is especially popular for the
following applications:
■

Scripts. These short programs automate common administrative tasks,
such as adding new users to a system,
uploading files to a website, downloading webpages without using a browser,
and so on.

■

Website development. A number of
Python projects—such as Django
(www.djangoproject.com), Bottle
(www.bottlepy.org), and Zope
(www.zope.org)—are popular among
developers as tools for quickly creating dynamic websites. For instance, the
popular news site www.reddit.com was
written using Python.

■

Text processing. Python has excellent
support for handling strings and text
files, including regular expressions and
Unicode.

■

Scientific computing. Many superb
scientific Python libraries are available
on the web, providing functions for
statistics, mathematics, and graphing.

■

Education. Thanks to its relative
simplicity and utility, Python is becoming more and more popular as a first
programming language in schools.

Of course, Python isn’t the best choice
for all projects. It is often slower than
languages such as Java, C#, or C++. So,
for example, you wouldn’t use Python to
create a new operating system.
But when you need to minimize the
amount of time a programmer spends on a
project, Python is often an excellent choice.

Introduction to Programming 3

From the Library of Mo Medwani

How Programmers
Work
While there is no strict recipe for writing
programs, most programmers follow a
similar process.

The programming process
1. Determine what your program is supposed to do—that is, figure out its
requirements.
2. Write the source code (in our case, the
Python code) in IDLE (Python’s integrated development environment) or
any other text editor. This is often the
most interesting and challenging step,
and it often involves creative problem solving. Python source code files
end with .py: web.py, urlexpand.py,
clean.py, and so on.
3. Convert the source code to object
code using the Python interpreter.
Python puts object code in .pyc files.
For example, if your source code is in
urlexpand.py, its object code will be
put in urlexpand.pyc.
4. Run, or execute, the program. With
Python, this step is usually done immediately and automatically after step 2 is
finished. In practice, Python programmers rarely work directly with object
code or .pyc files.

4

Chapter 1

From the Library of Mo Medwani

5. Finally, check the program’s output.
If errors are discovered, go back to
step 2 to try to fix them. The process of
fixing errors is called debugging. For
large or complex programs, debugging
can sometimes take up most of the
program development time, so experienced programmers try to design their
programs in ways that will minimize
debugging time.
As A shows, this is an iterative process:
You write your program, test it, fix errors,
test it again, and so on until the program
behaves correctly.

A The basic steps of writing any computer

program. Typically, after you check your program
output, you find errors and so must go back to the
code-writing step to fix them.

Lingo Alert
We typically call the contents of a .py file
a program, source code, or just code.
Object code is sometimes referred to
as executable code, the executable, or
even just software.

Introduction to Programming 5

From the Library of Mo Medwani

Installing Python
Python is a hands-on language, so now we
will see how to install it on your computer.

To install Python on Windows:
1. Go to the Python download page at
www.python.org/download.
2. Choose the most recent version of
Python 3 (it should have a name like
Python 3.x, where x is some small number). This will take you to the appropriate download page with instructions
for downloading Python on different
computer systems.

B The starting screen of the IDLE editor. The

first line tells you which version of Python you are
using—in this case, it is version 3.0b1.

3. Click the appropriate installer link for
your computer. For instance, if you are
running Windows, click Windows x86
MSI Installer (3.x).
4. Once the installer has finished downloading, run it by double-clicking it.
5. After the installation has finished (which
could take a few minutes), test to see
that Python is installed properly. Open
the Windows Start menu and choose
All Programs. You should see an entry
for Python 3.0 (often highlighted in
yellow). Select IDLE (Python GUI), and
wait a moment for the IDLE program to
launch B.
6. Try typing in 24 * 7 and pressing Return.
The number 168 should appear.

6

Chapter 1

From the Library of Mo Medwani

Installing Python on the Mac
OS X comes with a version of Python
already installed, although it lacks the
IDLE editor and is typically not the most
up-to-date version. To install a more recent
version of Python, follow the instructions
given at www.python.org/download/mac/.
Or, just download and run an installer from
www.pythonmac.org/packages/. Be careful
to ensure that you have the right version of
Python (3.0 or better) and that the Mac OS
version number matches yours.

Installing Python on Linux
If you are using Linux, chances are you
already have Python installed. To find out,
open a command-line window and type
python. If you get something similar to the
text shown in B, then Python is working.
Be sure to check the version number: This
book covers Python 3. If you have Python
2.x or earlier, then you should install
Python 3.
The exact details for doing so will depend
upon your Linux system. For example, on
Ubuntu Linux, you would search for Python
in the Synaptic Package Manager. You
can also get Linux installation help from
www.python.org/download.

Introduction to Programming 7

From the Library of Mo Medwani

This page intentionally left blank

From the Library of Mo Medwani

2
Arithmetic, Strings,
and Variables
The first step in learning how to program is
to understand the basic Python data types:
integers (whole numbers), floating point
numbers (numbers with a decimal point),
and strings. All programs use these (and
other) data types, so it is important to have
a good grasp of their basic uses.
Strings, in particular, are used in so many
different kinds of programs that Python
provides a tremendous amount of support
for them. In this chapter, we’ll introduce the
basics of strings, and then we’ll return to
them in a later chapter.
We’ll also introduce the important concept
of a programming variable. Variables are
used to store and manipulate data, and
it’s hard to write a useful program without
employing at least a few of them.

In This Chapter
The Interactive Command Shell

10

Integer Arithmetic

11

Floating Point Arithmetic

13

Other Math Functions

16

Strings

17

String Concatenation

19

Getting Help

20

Converting Between Types

22

Variables and Values

24

Assignment Statements

26

How Variables Refer to Values

28

Multiple Assignment

29

Just like learning how to play the piano or
speak a foreign language, the best way to
learn how to program is to practice. Thus,
we’ll introduce all of this using the interactive command shell IDLE, and ideally you
should follow along on your own computer
by typing the examples as we go.

From the Library of Mo Medwani

The Interactive
Command Shell
Let’s see how to interact with the Python
shell. Start IDLE; you should find it listed as
a program in your Start menu on Windows.
On Mac or Linux you should be able to run
it directly from a command line by typing
python. The window that pops up is the
Python interactive command shell, and it
looks something like what’s shown in A.

Lingo Alert
The interactive command shell is
often abbreviated as interactive
shell, command shell, shell, or even
command line.
Shell transcripts are sometimes called
transcripts, interactive sessions, or just
sessions.

The shell prompt
In a Python transcript, >>> is the Python
shell prompt. A >>> marks a line of input
from you, the user, while lines without a
>>> are generated by Python. Thus it is
easy to distinguish at a glance what is from
Python and what is from you.

Transcripts
A shell transcript is a snapshot of the command shell showing a series of user inputs
and Python replies. We’ll be using them
frequently; they’re a great way to learn
Python by seeing real examples in action.

A What you should see when you first launch the Python interactive command shell. The top two lines tell
you what version of Python you are running. The version you see here is Python 3.30, and it was created a
little before 11 a.m. on September 29, 2012.
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>>

10

Chapter 2

From the Library of Mo Medwani

Integer Arithmetic
An integer is a whole number, such as 25,
−86, or 0. Python supports the four basic
arithmetic operations: + (addition), − (subtraction), * (multiplication), and / (division).
Python also uses ** for exponentiation
and % to calculate remainders (for example,
25 % 7 is 4 because 7 goes into 25 three
times, with 4 left over). For example:
>>> 5 + 9
14
>>> 22 - 6
16
>>> 12 * 14
168
>>> 22 / 7
3.1428571428571428
>>> 2 ** 4
16
>>> 25 % 7
4
>>> 1 + 2 * 3
7
>>> (1 + 2) * 3
9

Integer division
Python also has an integer division operator, //. It works like /, except that it always
returns an integer. For example, 7 // 3
evaluates to the integer 2; the digits after
the decimal are simply chopped off
(// doesn’t round!).

Arithmetic, Strings, and Variables 11

From the Library of Mo Medwani

Order of evaluation
Table 2.1 summarizes Python’s basic arithmetic operators. They are grouped from
lowest precedence to highest precedence.
For example, when Python evaluates
the expression 1 + 2 * 3, it evaluates *
before + because * has higher precedence
(so the expression evaluates to 7—not 9!).
Operators at the same level of precedence
are evaluated in the order they are written.
You can use round brackets, (), to change
the order of evaluation—so, for example,
(1 + 2) * 3 evaluates to 9. In other
words, Python arithmetic follows the same
evaluation rules as regular arithmetic.

Unlimited size
Unlike most other programming languages,
Python puts no limit on the size of an
integer. You can do calculations involving numbers with dozens (or hundreds, or
thousands) of digits:

TABLE 2.1 Basic Arithmetic Operators
Name

Operator

Example

addition

+

>>>3 + 4
7

subtraction

–

>>> 5 – 3
2

multiplication

*

>>> 2 * 3
6

division

/

>>> 3 / 2
1.5

integer division

//

>>> 3 // 2
1

remainder

%

>>> 25 % 3
1

exponentiation

**

>>> 3 ** 3
27

>>> 27 ** 100
136891479058588375991326027382088315
➝ 966463695625337436471480190078368
➝ 997177499076593800206155688941388
➝ 250484440597994042813512732765695
➝ 774566001

12

Chapter 2

From the Library of Mo Medwani

B Examples of basic floating point arithmetic

using the Python command shell. Notice that
approximation errors are quite common, so exact
values are often not printed.
>>> 3.8 + -43.2
-39.400000000000006
>>> 12.6 * 0.5
6.3
>>> 12.6 + 0.01
12.61
>>> 365.0 / 12
30.416666666666668
>>> 8.8 ** -5.4
7.939507629591553e-06
>>> 5.6 // 2
2.0
>>> 5.6 % 3.2
2.3999999999999995

Floating Point
Arithmetic
Floating point arithmetic is done with floating point numbers, which in Python are
numbers that contain a decimal point. For
instance, –3.1, 2.999, and –4.0 are floating
point numbers. We’ll call them floats for
short.
All the basic arithmetic operations that
work with integers also work with floats,
even % (remainder) and // (integer division). See B for some examples.

Float literals
Very large or small floats are often written
in scientific notation:
>>> 8.8 ** -5.4
7.939507629591553e-06

The e-06 means to multiply the preceding number by 10–6. You can use scientific
notation directly if you like:
>>> 2.3e02
230.0

Python is quite forgiving about the use of
decimal points:
>>> 3.
3.0
>>> 3.0
3.0
It’s usually clearer to write 5.0 instead
of 5., as the latter notation can be quite confusing—it looks like the end of a sentence.
The difference between 5 and 5.0
matters: 5 is an integer, while 5.0 is a floating
point number. Their internal representations
are significantly different.

You can write numbers like 0.5 with or
without the leading 0:
>>> .5
0.5
>>> 0.5
0.5

Arithmetic, Strings, and Variables 13

From the Library of Mo Medwani

Overflow
Unlike integers, floating point numbers
have minimum and maximum values that,
if exceeded, will cause overflow errors.
An overflow error means you’ve tried to
calculate a number that Python cannot
represent as a float, because it is either too
big or too small C. Overflow errors can be
silent errors, meaning that Python does the
calculation incorrectly without telling you
that anything bad has happened. Generally
speaking, it is up to you, the programmer,
to avoid overflow errors.

C Floating point overflow: 500.0 ** 10000 is too
big to store as a float.

>>> 500.0 ** 10000
Traceback (most recent call last):
File "", line 1, in 
500.0 ** 10000
OverflowError: (34, 'Result too large')

Limited precision
Precision (or accuracy) is a fundamental difficulty with floats on all computers.
Numbers are represented in binary (base
2) in a computer, and it turns out that not all
floating point numbers can be represented
precisely in binary. Even the simplest
examples can have problems:
>>> 1 - 2 / 3
0.33333333333333337

This should have an infinite number of 3s
after the decimal, but there are only (!) 17
digits here. Plus, the last digit is wrong
(the 7 should be a 3).

14

Chapter 2

From the Library of Mo Medwani

These small errors are not usually a problem: 17 digits after the decimal is enough
for most programs. However, little errors
like this have a nasty habit of becoming big errors when you are doing lots of
calculations. If you are, say, computing
the stresses on a newly designed bridge,
it is necessary to take small floating point
errors into account to ensure that they
don’t balloon into significant errors.
In general, you should prefer integers
to floating point numbers. They are always
accurate and never suffer overflow.

Complex numbers
Python has built-in support for complex
numbers—that is, numbers that involve the
square root of –1. In Python, 1j denotes
the square root of –1:
>>> 1j
1j
>>> 1j * 1j
(-1+0j)

Complex numbers are useful in certain
engineering and scientific calculations; we
won’t be using them again in this book.

Arithmetic, Strings, and Variables 15

From the Library of Mo Medwani

Other Math Functions
Python comes with many different modules of prewritten code, including the math
module. Table 2.2 lists some of the most
commonly used math module functions.

Using return values
We say that these functions return a value.
That means they evaluate to either an integer or a floating point number, depending
on the function.
You can use these functions anywhere that
you can use a number. Python automatically evaluates the function and replaces it
with its return value.

Importing a module
To use the math module, or any existing
Python module, you must first import it:

TABLE 2.2 Some math Module Functions
Name

Description

ceil(x)

Ceiling of x

cos(x)

Cosine of x

degrees(x)

Converts x from radians to
degrees

exp(x)

e to the power of x

factorial(n)

Calculates n! = 1*2*3*…*n
n must be an integer

log(x)

Base e logarithm of x

log(x, b)

Base b logarithm of x

pow(x, y)

x to the power of y

radians(x

Converts x from degrees to
radians

sin(x)

Sine of x

sqrt(x)

Square root of x

tan(x)

Tangent of x

>>> import math

You can now access any math function by
putting math. in front of it:
>>> math.sqrt(5)
2.2360679774997898
>>> math.sqrt(2) * math.tan(22)
0.012518132023611912

An alternative way of importing a module
is this:
>>> from math import *

Now you can call all the math module functions without first appending math.:
>>> log(25 + 5)
3.4011973816621555
>>> sqrt(4) * sqrt(10 * 10)
20.0

16

When using the from math import *
style of importing, if you have functions with
the same name as any of the functions in the
math module, the math functions will overwrite them!
Thus, it’s generally safer to use the
import math style of importing. This will
never overwrite existing functions.
You can also import specific functions
from the math module. For example, from
math import sqrt, tan imports just the
sqrt and tan functions.

Chapter 2

From the Library of Mo Medwani

Strings
A string is a sequence of one or more characters, such as "cat!", "567-45442", and
"Up and Down". Characters include letters,
numbers, punctuation, plus hundreds of
other special symbols and unprintable
characters.

Indicating a string
Python lets you write string literals in three
main ways:
■

Single quotes, such as 'http', 'open
house', or 'cat'

■

Double quotes, such as "http", "open
house", or "cat"

■

Triple quotes, such as """http""", or
multiline strings, such as
"""
Me and my monkey
Have something to hide
"""

Many Python programmers prefer using
single quotes to indicate strings, simply
because they involve less typing than double
quotes (which require pressing the Shift key).
One of the main uses of single and
double quotes is to conveniently handle
" and ' characters inside strings:

"It's great"
'She said "Yes!"'
You’ll get an error if you use the wrong
kind of quote within a string.
Triple quotes are useful when you need
to create long, multiline strings. They can also
contain " and ' characters at the same time.

Arithmetic, Strings, and Variables 17

From the Library of Mo Medwani

String length
To determine the number of characters in a
string, use the len(s) function:
>>> len('pear')
4
>>> len('up, up, and away')
16
>>> len("moose")
5
>>> len("")
0

The last example uses the empty string,
usually denoted by '' or "". The empty
string has zero characters in it.
Since len evaluates to (that is, returns) an
integer, we can use len anywhere that an
integer is allowed—for example:
>>> 5 + len('cat') * len('dog')
14

18

Chapter 2

From the Library of Mo Medwani

String Concatenation
You can create new strings by “adding”
together old strings:
>>> 'hot ' + 'dog'
'hot dog'
>>> 'Once' + " " + 'Upon' + ' ' +
➝ "a Time"
'Once Upon a Time'

This operation is known as concatenation.
There’s a neat shortcut for concatenating
the same string many times:
>>> 10 * 'ha'
'hahahahahahahahahaha'
>>> 'hee' * 3
'heeheehee'
>>> 3 * 'hee' + 2 * "!"
'heeheehee!!'

The result of string concatenation is always
another string, so you can use concatenation anywhere that requires a string:
>>> len(12 * 'pizza pie!')
120
>>> len("house" + 'boat') * '12'
'121212121212121212'

Arithmetic, Strings, and Variables 19

From the Library of Mo Medwani

Getting Help
Python is a largely self-documenting language. Most functions and modules come
with brief explanations to help you figure
out how to use them without resorting to a
book or website.

Listing functions in a module
Once you’ve imported a module, you can
list all of its functions using the dir(m)
function:
>>> import math
>>> dir(math)
['__doc__', '__name__',
➝ '__package__', 'acos', 'acosh',
➝ 'asin', 'asinh', 'atan', 'atan2',
➝ 'atanh', 'ceil', 'copysign',
➝ 'cos', 'cosh', 'degrees', 'e',
➝ 'exp', 'fabs', 'factorial',
➝ 'floor', 'fmod', 'frexp', 'hypot',
➝ 'isinf', 'isnan', 'ldexp', 'log',
➝ 'log10', 'log1p', 'modf', 'pi',
➝ 'pow', 'radians', 'sin', 'sinh',
➝ 'sqrt', 'sum', 'tan', 'tanh',
➝ 'trunc']

This gives you a quick overview of the
functions in a module, and many Python
programmers use dir(m) all the time.
For now, you can ignore the names beginning with a double underscore __; they
are used only in more advanced Python
programming.

20

Chapter 2

From the Library of Mo Medwani

To see a list of all the built-in functions
in Python, type dir(__builtins__) at the
command prompt.
An alternative way to see the doc
string for a function f is to use the help(f)
function.
You can run the Python help utility by
typing help() at a prompt. This will provide
you with all kinds of useful information, such
as a list of all available modules, help with
individual functions and keywords, and more.
You can also get help from the Python
documentation (www.python.org/doc/). There
you’ll find helpful tutorials, plus complete
details of all the Python language and standard modules.

Printing documentation strings
Another useful trick is to print a function’s
documentation string (doc string for short):
>>> print(math.tanh.__doc__)
tanh(x)
Return the hyperbolic tangent of x.

Most built-in Python functions, along with
most functions in Python’s standard modules (such as math), have short doc strings
you can access in this way.
As another example, here’s the doc string
for the built-in function bin:
>>> print(bin.__doc__)
bin(number) -> string
Return the binary representation of
➝ an integer or long integer.
>>> bin(25)
'0b11001'

Arithmetic, Strings, and Variables 21

From the Library of Mo Medwani

Converting
Between Types
Converting from one type of data to
another is a common task, and Python
provides a number of built-in functions to
make this easy.

Implicit Conversions
Sometimes Python will convert
between numeric types without requiring an explicit conversion function. For
example:
>>> 25 * 8.5
212.5

Converting integers and
strings to floats:
To convert the integer 3 to a float, use the
float(x) function:

Here, 25 is automatically converted to
25.0, and then multiplied by 8.5. In general, when you mix integers and floats in
the same expression, Python automatically converts the integers to floats.

>>> float(3)
3.0

Converting a string to a float is similar:
>>> float('3.2')
3.2000000000000002
>>> float('3')
3.0

Converting integers and
floats to strings:
The str(n) function converts any number
to a corresponding string:
>>> str(85)
'85'
>>> str(-9.78)
'-9.78'

22

Chapter 2

From the Library of Mo Medwani

Converting a float to an integer:
Rounding
Many people are surprised that
round(8.5) is 8 in Python, and not 9.
In elementary school, you were probably taught that numbers ending with
.5 should always be rounded up.
But always rounding up leads to a bias
that can cause inaccurate calculations.
So Python uses a different strategy for
rounding, called “round half to even,”
or, sometimes, “bankers rounding.” The
idea is that numbers ending in .5 are
rounded to the nearest even integer.
Thus, sometimes numbers ending in
.5 are rounded down, and sometimes
they are rounded up.
This strategy might seem strange at
first, and it is different from how rounding works in Python 2. But it is generally accepted as the standard way to
round numbers on a computer. If you
are curious about the details, take a
look at the Wikipedia entry on rounding:
http://en.wikipedia.org/wiki/Rounding.

This is a little tricky because you must
decide how to handle any digits after the
decimal in your float. The int(x) function simply chops off extra digits, while
round(x) does the usual kind of rounding
off:
>>> int(8.64)
8
>>> round(8.64)
9
>>> round(8.5)
8

Converting strings to numbers:
This is easily done with the int(s) and
float(s) functions:
>>> int('5')
5
>>> float('5.1')
5.1
For most applications, you should
be able to handle numeric conversions
using int(x), float(x), and round(x).
However, for more specific conversions,
the Python math module has a number of
functions for removing digits after decimals:
math.trunc(x), math.ceil(x), and
math.floor(x).
The int(s) and float(s) conversions
from strings to floats/integers assume that
the string s “looks” like a Python float/integer.
If not, you’ll get an error message saying the
conversion could not be done.

Arithmetic, Strings, and Variables 23

From the Library of Mo Medwani

Variables and Values
Variables are one of the most important
concepts in all of programming. In Python,
variables label, or point to, a value.

Lingo Alert
Just like variables, functions, modules,
and classes all have names. We refer to
these names collectively as identifiers.

For example:
>>> fruit = "cherry"
>>> fruit
'cherry'

Here, fruit is a variable name, and it
points to the string value "cherry". Notice
that variables are not surrounded by quotation marks.
The line fruit = "cherry" is called an
assignment statement. The = (equals sign)
is called the assignment operator and is
used to make a variable point to a value.
When Python encounters a variable, it
replaces it with the value it points to. Thus:
>>> cost = 2.99
>>> 0.1 * cost
0.29900000000000004
>>> 1.06 * cost + 5.99
9.1594000000000015

24

Chapter 2

From the Library of Mo Medwani

TABLE 2.3 Legal and Illegal Variable Names
Legal

Illegal

M

"m"

x1

1x

tax_rate

tax rate

taxRate

taxRate!

Else

else

D else is a Python keyword, so it cannot be used

Rules for making variable names
Variable names must follow a few basic
rules (see Table 2.3 for some examples):
■

A variable name can be of any length,
although the characters in it must be
either letters, numbers, or the underscore character (_). Spaces, dashes,
punctuation, quotation marks, and other
such characters are not allowed.

■

The first character of a variable name
cannot be a number; it must be a letter
or an underscore character.

■

Python is case sensitive—it distinguishes between uppercase and lowercase letters. Thus TAX, Tax, and tax
are three completely different variable
names.

■

You cannot use Python keywords as
variable names. For example, if, else,
while, def, or, and, not, in, and is are
some of Python’s keywords (we’ll learn
what these are used for later in the
book). If you try to use one as a variable, you’ll get an error D.

as a variable.

>>> else = 25
SyntaxError: invalid syntax

Arithmetic, Strings, and Variables 25

From the Library of Mo Medwani

Assignment
Statements
Assignment statements have three main
parts: a left-hand side, the assignment
operator, and a right-hand side E.
Assignment statements have two purposes: They define new variable names,
and they make already-defined variables
point to values. For instance:
>>> x = 5

E Anatomy of an assignment statement. This

makes var point to value. The left-hand side must
always be a variable, while the right-hand side
can be a variable, a value, or any expression that
evaluates to a value.

>>> 2 * x + 1
11
>>> x = 99

The first assignment statement, x = 5,
does double duty: It is an initialization
statement. It tells Python to create a new
variable named x and that it should be
assigned the value 5. We can now use x
anywhere an integer can be used.
The second assignment statement, x = 99,
reassigns x to point to a different value.
It does not create x, because x already
exists thanks to the previous assignment
statement.
If you don’t initialize a variable, Python
complains with an error:
>>> 2 * y + 1
Traceback (most recent call last):
File "", line 1, in
➝ 
2 * y + 1
NameError: name 'y' is not defined

26

Chapter 2

From the Library of Mo Medwani

Lingo Alert
A number of terms are commonly used to
describe variables and values. We sometimes say a variable is assigned a value,
or given a value.
A variable with an assigned value is said
to point to its value, or label it, or simply
have it.
Sometimes programmers say a variable
contains its value, as if the variable were
a bucket and the value was inside of
it. The problem with this is that Python
variables don’t quite follow the rules you
would expect a “containment” model to
follow. For instance, an object can be in
only one bucket at a time, but multiple
values are allowed to point to the same
value in Python.

This error message tells you that the variable y has not been defined, and so Python
does not know what value to replace it with
in the expression 2 * y + 1.
A variable can be assigned any value, even
if it comes from other variables. Consider
this sequence of assignments:
>>> x = 5
>>> x
5
>>> y = 'cat'
>>> y
'cat'
>>> x = y
>>> x
'cat'
>>> y
'cat'

Arithmetic, Strings, and Variables 27

From the Library of Mo Medwani

How Variables
Refer to Values
A Python assignment statement of the form
x = expr can be summarized in English
like this: Make x point to the value that
expr evaluates to.
Keep in mind that expr can be any Python
expression that evaluates to a value.

F After running the statement rate

= 0.04.

There’s a nice way of drawing diagrams
to help understand sequences of assignments. For example, after the assignment
rate = 0.04, you can imagine that your
computer’s memory looks like F. Then,
after rate_2008 = 0.06, we get G. Finally,
rate = rate_2008 gives us H.
When a value no longer has any variable
pointing to it (for example, 0.04 in H),
Python automatically deletes it. In general, Python keeps track of all values and
automatically deletes them when they are
no longer referenced by a variable. This is
called garbage collection, and so Python
programmers rarely need to worry about
deleting values themselves.

Assignments don’t copy
It’s essential to understand that assignment
statements don’t make a copy of the value
they point to. All they do is label, and relabel, existing values. Thus, no matter how
big or complex the object a variable points
to, assignment statements are always
exceedingly efficient.

Numbers and strings are immutable
An important feature of Python numbers
and strings is that they are immutable—
that is, they cannot be changed in any
way, ever. Whenever it seems that you are
modifying a number or string, Python is in
fact making a modified copy I.
28

G After rate_2008

= 0.06.

H After rate = rate_2008. Notice that the value
0.04 no longer has any variable pointing to it. Thus
Python automatically deletes it, a process known
as garbage collection.
I Whenever it appears that you are modifying a

string, Python is in fact making a copy. There is no
way to modify numbers or strings in Python.
>>> s = 'apple'
>>> s + 's'
'apples'
>>> s
'apple'
>>> 5 = 1
SyntaxError: can't assign to literal

Chapter 2

From the Library of Mo Medwani

Multiple Assignment
Python has a convenient trick that lets you
assign more than one variable at a time:
>>> x, y, z = 1, 'two', 3.0
>>> x
1
>>> y
'two'
>>> z
3.0
>>> x, y, z
(1, 'two', 3.0)

As the last statement shows, you can also
display multiple values on one line by writing them as a tuple. Tuples always begin
with an open round bracket (() and end
with a closed round bracket ()).

Arithmetic, Strings, and Variables 29

From the Library of Mo Medwani

Swapping variable values
A useful trick you can do with multiple
assignment is to swap the values of two
variables:
>>> a, b = 5, 9
>>> a, b
(5, 9)
>>> a, b = b, a
>>> a, b
(9, 5)

The statement a, b = b, a is said to assign
values to a and b in parallel. Without using
multiple assignment, the standard way to
swap variables is like this:
>>> a, b = 5, 9
>>> temp = a
>>> a = b
>>> b = temp
>>> a, b
(9, 5)

Multiple assignment doesn’t do anything
you can’t already do with regular assignment. It is just a convenient shorthand that
we will sometimes be using.

30

Chapter 2

From the Library of Mo Medwani

3
Writing Programs
Up to now, we’ve been writing single
Python statements and running them at the
interactive command line. While that’s useful for learning about Python functions, it
quickly becomes tiresome when you need
to write many lines of Python code.
Thus we turn to writing programs (also
known as scripts). Programs are just text
files containing a collection of Python
commands. When you run (or execute) a
program, Python performs each statement
in the file one after the other.

In This Chapter
Using IDLE’s Editor

32

Compiling Source Code

35

Reading Strings from the Keyboard

36

Printing Strings on the Screen

39

Source Code Comments

41

Structuring a Program

42

In this chapter, we’ll learn how to write and
run programs in IDLE and from the command line. We’ll see how to get keyboard
input from the user and print strings to the
screen.
You should make an effort to type the
code yourself, since it is an excellent way
to get used to the various rules of writing Python. For larger programs, you can
download the code from this book’s website: http://pythonintro.googlecode.com.

From the Library of Mo Medwani

Using IDLE’s Editor

TABLE 3.1 Some Useful IDLE Shortcuts

IDLE comes with a Python-aware text
editor. The best way to learn about it is to
write a simple program.

Command

To write a new program in IDLE:
1. Launch IDLE.
2. Choose File > New Window.

What It Does

Ctrl-N

Open a new editor window.

Ctrl-O

Open a new file for editing.

Ctrl-S

Save the current program.

F5

Run the current program.

Ctrl-Z

Undo the last action.

Shift-Ctrl-Z

Redo the last undo.

A blank editor window should pop up.
3. To test it, enter the following into it:
print('Welcome to Python!')

4. Save your program by choosing
File > Save. Save it in your Python
programs folder with the name
welcome.py; the .py at the end
indicates that this is a Python file.
5. Run your program by choosing
Run > Run Module.
A Python shell should appear, and
you should see Welcome to Python!
within it.
When you start to get more familiar with
the IDLE editor, you may want to start using
some of the key commands listed in Table
3.1. They can really speed up your editing.
Create a special folder called, say,
python on your computer’s Desktop to store
all your Python programs. Never save them in
the Python directory; otherwise, you run the
risk of accidentally overwriting one of Python’s
core files.
You must type in Python programs
exactly as you see them, character for character. A single wrong character—an extra space,
an l instead of a 1—can cause errors.
If you do see an error when you run your
program, go back to the editor window and
carefully check that your program was typed
correctly, character for character.

32

Chapter 3

From the Library of Mo Medwani

Other Editors
IDLE is an excellent editor for beginners, and even some professionals use
it all the time. But if IDLE is not to your
liking, a quick web search for “programming editors” will give you many other
suggestions. For instance, on Windows,
Notepad++ is a popular free programming editor. Another popular, although
not free, choice is Sublime Text, which
works on Windows, Mac, and Linux
systems.
Take a look at http://wiki.python.org/
moin/PythonEditors for many more
suggestions.

Running programs from
the command line
Another common way to run a Python
program is from the command line. For
example, to run welcome.py, you can open
a command-line window and run it by typing this:
C:\> python welcome.py
Welcome to Python!

You can also just call Python without a program and get a bare-bones (but still quite
useful) version of the interactive interpreter.

To call Python from the
command line:
Type the following:
C:\> python
Python 3.0b2 (r30b2:65106, Jul 18
➝ 2008, 18:44:17) [MSC v.1500 32 bit
➝ (Intel)] on win32
Type "help", "copyright", "credits"
"license" for more information.

➝ or

>>>

Calling Python from the command line is
most commonly used when you run Python
scripts as parts of other programs.

Writing Programs 33

From the Library of Mo Medwani

The easiest way to open a command
window in Windows is to click the Start menu;
then type cmd in the search box and press
Return. This should give you a command-line
window.
Running Python from the command line
is similar on Mac and Linux systems: run a
command shell (the exact details for doing this
differ from system to system, but try browsing
programs available through menus on your
Desktop), and then type python followed by
the name of the program you want to run.
One annoyance with running Python
from the command line is that it is often necessary to configure environment variables, in
particular your system’s path variable, so that
your system knows where to find Python on
your computer. The details are finicky and system specific, and are beyond the scope of this
book. However, it is not hard to find detailed
instructions online if you want to set this up.
For instance, just type set windows path into
your favorite search engine. Take care when
you are modifying environment variables: If
you are not sure exactly what you are doing,
it is quite possible to “break” your system so
that programs no longer run correctly. In that
case, your best option is usually to start over
and reinstall Python.

34

Chapter 3

From the Library of Mo Medwani

Compiling Source Code

A Python consists of three major components:

an interpreter for running single statements; a
compiler for converting .py files to .pyc files; and
a virtual machine for running .pyc files. Note that
IDLE is not strictly part of Python; it is a separate
application that sits on top of Python to make it
easier to use.

We often refer to the statements inside
a Python program as source code, and
so a program file is sometimes called a
source code file, or source file. By convention, all Python source code files end with
the extension .py. This makes it easy for
people and programs to see at a glance
that the file contains Python source code.

Object code
When you run a .py file, Python automatically creates a corresponding .pyc
file A. A .pyc file contains object code, or
compiled code. Object code is essentially
a Python-specific language that represents
your Python source code in a way that is
easier for the computer to run efficiently.
It is not meant for humans to read, and so
most of the time you should just ignore the
.pyc files that start to appear.
A Python program runs using a special
piece of software called a virtual machine.
This is essentially a software simulation of
a computer designed just to run Python,
and it is part of the reason why many .pyc
files can run on different computer systems
without any change.
You will rarely, if ever, have to worry
about .pyc files. Python automatically creates
them when needed, and also automatically
updates them when you change the corresponding .py files. Don’t delete, rename, or
modify the .pyc files!
Since they are meant to be read only by
the computer, .pyc files are not stored as text
files. If you try to view a .pyc file in a text editor, you’ll see nothing but junk characters.

Writing Programs 35

From the Library of Mo Medwani

Reading Strings
from the Keyboard
Reading a string from the keyboard is one
of the most basic ways of getting information from a user. For example, consider this
simple program:
# name.py
name = input('What is your first
➝ name? ')
print('Hello ' + name.capitalize()
'!')

➝+

To run this in IDLE, open name.py in an
IDLE window, and then to run it press
F5 (or, equivalently, choose Run > Run
Module).
You should see this in the window that
appears:
What is your first name? jack
Hello Jack!

You, the user, must type in the name (in this
case, the string 'jack').

Tracing the program
Let’s look carefully at each line of the
program. The first line is a source code
comment, or comment for short. A comment is just a note to the programmer, and
Python ignores it. Python comments always
start with a # symbol and continue to the
end of the line. This particular comment
tells you that the program is stored in a file
called name.py.
The second line calls the input function,
which is the standard built-in function for
reading strings from the keyboard. When it
runs, the prompt 'What is your name?'
appears in the output window, followed by
a blinking cursor. The program waits until
the user enters a string and presses Enter.

36

Chapter 3

From the Library of Mo Medwani

The input function evaluates to whatever
string the user enters, and so the variable
name ends up labeling the string that the
user types in.
The third and final line of the program
displays a greeting. The function
name.capitalize() ensures that the
first character of the string is uppercase
and the remaining characters are lowercase. This way, if the user happens to
enter a name that isn’t correctly capitalized, Python will correct it.
To see what functions are available for
strings, type dir('') at IDLE’s interactive
command line.
If you run name.py with a number of
sample strings, you’ll soon discover that entering a name like 'Jack Aubrey' will actually
uncapitalize the last name: 'Hello Jack
aubrey!'. That’s because the capitalize
function is very simpleminded—it knows nothing about words or spaces.
Another common and useful trick when
reading strings from the keyboard is to use the
strip() function to remove any leading/trailing whitespace characters. For instance:

>>> ' oven '.strip()
'oven'
Stripping a string of unwanted spaces is so
common that we often write calls to input
like this:

name = input('Enter age: ').strip()

Writing Programs 37

From the Library of Mo Medwani

Reading numbers from the keyboard
The input function only returns strings, so
if you need a number data type (for example, to do arithmetic), you must use one of
Python’s numeric conversion functions. For
example, consider this program:
# age.py
age = input('How old are you
➝ today? ')
age10 = int(age) + 10
print('In 10 years you will be ' +
+ ' years old.')

➝ str(age10)

Suppose the user types in 22 in response
to this program. Then the variable age
labels the string '22'—Python does not
automatically convert strings that look
like numbers to integer or float values. If
you want to do arithmetic with a string,
you must first convert it to a number using
either int(s) (if you want an integer) or
float(s) (if you want a float).
The one final trick to notice is that in the
print statement, it’s necessary to convert
the variable age10 (which labels an integer)
back into a string so that it can be printed.
If you forget this conversion, Python issues
an error saying it can’t add numbers and
strings.

38

Different Types of Numbers
All the different types of numbers can
be confusing at first. Consider these four
different values: 5, 5.0, '5', and '5.0'.
While they look similar, they have very
different internal representations.
5 is an integer and can be used directly
for arithmetic.
5.0 is a floating point number that can
also be used for arithmetic, but it allows
for digits after the decimal place.
'5' and '5.0' are strings consisting of
one and three characters, respectively.
Strings are meant for being displayed
on the screen or for doing characterbased operations (such as removing
whitespace or counting characters).
Strings can’t be used to do numeric
arithmetic. Of course, strings can be
used with concatenation, although the
results might be a bit jarring at first—for
example:
>>> 3 * '5'
'555'
>>> 3 * '5.0'
'5.05.05.0'

Chapter 3

From the Library of Mo Medwani

Lingo Alert
Programmers often use the terminology
standard output, abbreviated stdout,
to refer to the window where text goes
when printed. Typically, stdout is a
simple text window that does little more
than display strings: No graphics of any
kind are allowed.
Similarly, standard input, abbreviated
stdin, is the location from where the
input function reads strings. Usually this
is the same window as stdout, but it is
possible to change one or both of stdout
and stdin if necessary.
You will sometimes also see the term
standard error, abbreviated stderr, to
refer to where error messages are displayed. By default, error messages are
usually displayed on stdout.

Printing Strings
on the Screen
The print statement is the standard built-in
function for printing strings to the screen.
As we will see, it is extremely flexible and
has many useful features for formatting
strings and numbers in just the right way.
You can pass any number of strings to
print:
>>> print('jack', 'ate', 'no', 'fat')
jack ate no fat

By default, it prints out each string in the
standard output window, separating the
strings with a space. You can easily change
the string separator like this:
>>> print('jack', 'ate', 'no', 'fat',
➝ sep = '.')
jack.ate.no.fat

By default, a printed string ends with a
newline character: \n. A newline character causes the cursor to move to the next
line when the string is printed, and so, by
default, you can’t print anything on the
same line after calling print:
# jack1.py
print('jack ate ')
print('no fat')

This prints two lines of text:
jack ate
no fat

To put all the text on a single line, you can
specify the end character of the first line to
be the empty string:
# jack2.py
print('jack ate ', end = '')
print('no fat')

Writing Programs 39

From the Library of Mo Medwani

The print function is one of the major
differences between Python 2 and Python 3.
Before Python 3, print was not technically
a function, but instead was a built-in part of
the language. The one advantage of this was
that you didn’t have to type the brackets—for
example, you would type print('jack ate
no fat'). However, despite that small convenience, print’s not being a function made it
very difficult to change the default separator
and ending strings, which is often necessary in
more advanced programs.
Another difference between Python 2
and 3 is that Python 3’s input function was
called raw_input in Python 2. Python 2 also
had a function called input, but it evaluated
the string that the user entered, which was
occasionally handy. There is no equivalent
of the Python 2 input function in Python 3,
although you can easily simulate it by typing
eval(input(prompt)). For example:

>>> eval(input('? '))
? 4 + 5 * 6
34

40

Chapter 3

From the Library of Mo Medwani

Source Code
Comments
We’ve already seen source code comments used to specify the name of a file.
But comments are useful for any kind of
note that you might want to put into a program, such as documentation, reminders,
explanations, or warnings. Python ignores
all comments, and they are only there to be
read by you and other programmers who
might read the source code.
Here’s a sample program that shows some
more uses of comments:
# coins_short.py
# This program asks the user how
➝ many
# coins of various types they have,
# and then prints the total amount
# of money in pennies.
# get the number of nickels, dimes,
# and quarters from the user
n = int(input('Nickels? '))
d = int(input('Dimes? '))
q = int(input('Quarters? '))
# calculate the total amount of
➝ money

total = 5 * n + 10 * d + 25 * q
# print the results
print() # prints a blank line
print(str(total) + ' cents')

Writing Programs 41

From the Library of Mo Medwani

Structuring a Program
As you start to write more programs, you
will soon notice that they tend to follow
a common structure. Typically, programs
are organized as in B: They have an input
part, a processing part, and an output part.
For the small programs that we are starting
out with, this structure is usually obvious
and does not require much thought. But as
your programs get bigger and more complex, it is easy to lose sight of this overall
structure, which often results in messy
code that is hard to understand.
Thus, indicating in comments what parts
are for input, processing, and output is a
good habit to get into. It helps clarify the
different tasks your program performs; and,
when we start writing functions, it provides
a natural way of dividing up your programs
into sensible functions.

42

B Most programs have the structure shown here:

First you get input (for example, from the user
using the input function), then you process it, and
then you display the results for the user to see.

Chapter 3

From the Library of Mo Medwani

4
Flow of Control
The programs we’ve written so far are
straight-line programs that consist of a
sequence of Python statements executed
one after the other. The flow of execution is
simply a straight sequence of statements,
with no branching or looping back to previous statements.
In this chapter, we look at how to change
the order in which statements are executed
by using if-statements and loops. Both are
essential in almost any nontrivial program.

In This Chapter
Boolean Logic

44

If-Statements

49

Code Blocks and Indentation

51

Loops

54

Comparing For-Loops and While-Loops

59

Breaking Out of Loops and Blocks

64

Loops Within Loops

66

Both if-statements and loops are controlled
by logical expressions, and so the first part
of this chapter will introduce the idea of
Boolean logic.
Read the sample programs in this chapter
carefully. Take the time to try them out and
make your own modifications.

From the Library of Mo Medwani

Boolean Logic
In Python, as in most programming languages, decisions are made using Boolean
logic. Boolean logic is all about manipulating so-called truth values, which in Python
are written True and False. Boolean logic
is simpler than numeric arithmetic, and is
a formalization of logical rules you already
know.
We combine Boolean values using four
main logical operators (or logical connectives): not, and, or, and ==. All decisions that can be made by Python—or any
computer language, for that matter—can
be made using these logical operators.
Suppose that p and q are two Python variables each labeling Boolean values. Since
each has two possible values (True or
False), altogether there are four different
sets of values for p and q (see the first two
columns of Table 4.1). We can now define
the logical operators by specifying exactly
what value they return for the different
truth values of p and q. These kinds of
definitions are known as truth tables, and
Python uses an internal version of them to
evaluate Boolean expressions.

TABLE 4.1 Truth Table for Basic Logical Operators
p

q

p == q

p != q

p and q

p or q

not p

False

False

True

False

False

False

True

False

True

False

True

False

True

True

True

False

False

True

False

True

False

True

True

True

False

True

True

False

44

Chapter 4

From the Library of Mo Medwani

Logical equivalence

Logical “or”

Let’s start with ==. The expression p == q
is True only when p and q both have the
same truth value—that is, when p and q
are either both True or both False. The
expression p != q tests if p and q are not
the same, and returns True only when they
have different values.

The Boolean expression p or q is True
exactly when p is True or q is True, or
when both are True. This is summarized
in the sixth column of Table 4.1. The only
slightly tricky case is when both p and q
are True. In this case, the expression p or
q is True.

>>> False == False

>>> False or False

True

False

>>> True == False

>>> False or True

False

True

>>> True == True

>>> True or False

True

True

>>> False != False

>>> True or True

False

True

>>> True != False
True
>>> True != True
False

Logical “and”
The Boolean expression p and q is True
only when both p is True and q is True. In
every other case it is False. The fifth column of Table 4.1 summarizes each case.

Logical negation
Finally, the Boolean expression not p is
True when p is False, and False when p
is True. It essentially flips the value of the
variable.
>>> not True
False
>>> not False
True

>>> False and False
False
>>> False and True
False
>>> True and False
False
>>> True and True
True

Flow of Control 45

From the Library of Mo Medwani

Evaluating larger Boolean
expressions
Since Boolean expressions are used to
control both if-statements and loops, it is
important to understand how they are evaluated. Just as with arithmetic expressions,
Boolean expressions use both brackets and
operator precedence to specify the order in
which their sub-parts are evaluated.

To evaluate a Boolean
expression with brackets:
Suppose we want to evaluate the expression not (True and (False or True)).
We can do it by following these steps:
■

not (True and (False or True))

Expressions in brackets are always
evaluated first, and so we first evaluate False or True, which is True.
This makes the original expression equivalent to this simpler one:
not (True and True).
■

not (True and True)

To evaluate this expression, we again
evaluate the expression in brackets
first: True and True evaluates to True,
which gives us the equivalent expression: not True.
■

not True

Finally, to evaluate this expression, we
simply look up the answer in the last
column of Table 4.1: not True evaluates
to False. Thus, the entire expression
not (True and (False or True))

evaluates to False. You can easily
check that this is the correct answer in
Python itself:
>>> not (True and (False or
➝ True))
False

46

Chapter 4

From the Library of Mo Medwani

TABLE 4.2 Boolean Operator Priority
(Highest to Lowest)
p == q
p != q
not p
p and q

To evaluate a Boolean
expression without brackets:
Suppose we want to evaluate the expression not True and False or True. This is
the same as the previous one, but this time
there are no brackets.
■

p or q

not True and False or True

We first evaluate the operator with
the highest precedence, as listed in
Table 4.2. In this case, not has the highest precedence, and so not True is
evaluated first (the fact that it happens
to be at the start of the expression is a
coincidence). This simplifies the expression to False and False or True.
■

False and False or True

We again evaluate the operator with the
highest precedence. According to Table
4.2, and has higher precedence than
or, and so False and True is evaluated first. The expression simplifies to
False or True.
■

False or True

This final expression evaluates to True,
which is found by looking up the answer
in Table 4.1. Thus the original expression, False and not False or True,
evaluates to True.
Writing complicated Boolean expressions
without brackets is usually a bad idea because
they are hard to read and evaluate—not all
programmers remember the order of precedence of Boolean operators!
One exception is when you use the same
logical operator many times in a row. Then it
is usually easier to read without the brackets.
For example:

>>> (True or (False or (True or
➝ False)))
True
>>> True or False or True or False
True
Flow of Control 47

From the Library of Mo Medwani

Short-circuit evaluation
The definition of the logical operators
given in Table 4.1 is the standard definition
you would find in any logic textbook. However, like most modern programming languages, Python uses a simple trick called
short-circuit evaluation to speed up the
evaluation of some Boolean expressions.

TABLE 4.3 Definition of Boolean Operators
in Python
Operation

Result

p or q

if p is False, then q, else p

p and q

if p is False, then p, else q

Consider the Boolean expression False
and X, where X is any Boolean expression. It turns out that no matter whether X
is True or X is False, the entire expression is False. The reason is that the initial
False makes the whole and-expression
False. The value of False and X does not
depend on X—it is always False. In such
cases, Python does not evaluate X at all—it
simply stops and returns the value False.
This can speed up the evaluation of Boolean expressions.
Similarly, Boolean expressions of the form
True or X are always True, no matter
the value of X. The precise rules for how
Python does short-circuiting are given in
Table 4.3.
Most of the time you can ignore shortcircuiting and just reap its performance
benefits. However, it is useful to remember that Python does this, since every
once in a while it could be the source of a
subtle bug.
It’s possible to use the definitions of and
and or from Table 4.3 to write short and tricky
code that simulates if-statements (which we
will see in the next section). However, such
expressions are usually quite difficult to read,
so if you ever run across such expressions in
other people’s Python code (you should never
put anything so ugly in your programs!), you
may need to refer to Table 4.3 to figure out
exactly what they are doing.

48

Chapter 4

From the Library of Mo Medwani

If-Statements
If-statements let you change the flow of
control in a Python program. Essentially,
they let you write programs that can
decide, while the programming is running,
whether or not to run one block of code
or another. Almost all nontrivial programs
use one or more if-statements, so they are
important to understand.

If/else-statements
Suppose you are writing a passwordchecking program. The user enters their
password, and if it is correct, you log them
in to their account. If it is not correct, then
you tell them they’ve entered the wrong
password:
# password1.py
pwd = input('What is the password? ')
if pwd == 'apple': # note use of == #
➝ instead of =
print('Logging on ...')
else:
print('Incorrect password.')
print('All done!')

It’s pretty easy to read this program: If
the string that pwd labels is 'apple', then
a login message is printed. But if pwd is
anything other than 'apple', the message
incorrect password is printed.
An if-statement always begins with the
keyword if. It is then (always) followed
by a Boolean expression called the
if-condition, or just condition for short.
After the if-condition comes a colon (:).
As we will see, Python uses the : token
to mark the end of conditions in
if-statements, loops, and functions.

Flow of Control 49

From the Library of Mo Medwani

Everything from the if to the : is referred
to as the if-statement header. If the condition in the header evaluates to True,
then the statement print('Logging
on ...') is immediately executed, and
print('Incorrect password.') is
skipped and never executed.
If the condition in the header evaluates to False, then print('Logging on
...') is skipped, and only the statement
print('Incorrect password.') is
executed.
In all cases, the final print('All done!')
statement is executed.

We will often refer to the entire multiline
if structure as a single if-statement.
You must put at least one space after
the if keyword.
The if keyword, the condition, and the
terminating : must appear all on one line
without breaks.
The else-block of an if-statement is
optional. Depending on the problem you are
solving, you may or may not need one.

The general structure of an if/else-statement
is shown in A.

A This flow chart shows the general format and behavior of an if/else-statement.
The code blocks can consist of any number of Python statements (even other
if-statements!).

50

Chapter 4

From the Library of Mo Medwani

Code Blocks and
Indentation
One of the most distinctive features of
Python is its use of indentation to mark
blocks of code. Consider the if-statement
from our password-checking program:
if pwd == 'apple':
print('Logging on ...')
else:
print('Incorrect password.')
print('All done!')

The lines print('Logging on ...') and
print('Incorrect password.') are two
separate code blocks. These ones happen
to be only a single line long, but Python
lets you write code blocks consisting of
any number of statements.
To indicate a block of code in Python, you
must indent each line of the block by the
same amount. The two blocks of code in
our example if-statement are both indented
four spaces, which is a typical amount of
indentation for Python.
In most other programming languages,
indentation is used only to help make
the code look pretty. But in Python, it is
required for indicating what block of code
a statement belongs to. For instance, the
final print('All done!') is not indented,
and so is not part of the else-block.
IDLE is designed to automatically indent
code for you. For instance, pressing Return
after typing the : in an if-header automatically
indents the cursor on the next line.
The amount of indentation matters: A
missing or extra space in a Python block could
cause an error or unexpected behavior. Statements within the same block of code need to
be indented at the same level.

Programmers familiar with other languages
often bristle at the thought that indentation
matters: Many programmers like the freedom to format their code how they please.
However, Python’s indentation rules follow
a style that many programmers already use
to make their code readable. Python simply
takes this idea one step further and gives
meaning to the indentation.

Flow of Control 51

From the Library of Mo Medwani

If/elif-statements
An if/elif-statement is a generalized ifstatement with more than one condition.
It is used for making complex decisions.
For example, suppose an airline has the
following “child” ticket rates: Kids 2 years
old or younger fly for free, kids older than 2
but younger than 13 pay a discounted child
fare, and anyone 13 years or older pays a
regular adult fare. This program determines
how much a passenger should pay:
# airfare.py

After Python gets age from the user, it
enters the if/elif-statement and checks
each condition one after the other in the
order they are given. So first it checks if
age is less than 2, and if so, it indicates
that the flying is free and jumps out of the
elif-condition. If age is not less than 2, then
it checks the next elif-condition to see if
age is between 2 and 13. If so, it prints the
appropriate message and jumps out of the
if/elif-statement. If neither the if-condition
nor the elif-condition is True, then it
executes the code in the else-block.

age = int(input('How old are you? '))
if age <= 2:
print(' free')
elif 2 < age < 13:
print(' child fare)
else:
print('adult fare')

52

elif is short for else if, and you can use
as many elif-blocks as needed.
Each of the code blocks in an if/elifstatement must be consistently indented the
same amount.
As with a regular if-statement, the elseblock is optional. In an if/elif-statement with
an else-block, exactly one of the if/elif-blocks
will be executed. If there is no else-block, then
it is possible that none of the conditions are
True, in which case none of the if/elif-blocks
are executed.

Chapter 4

From the Library of Mo Medwani

Conditional expressions
Python has one more logical operator that
some programmers like (and some don’t!).
It’s essentially a shorthand notation for ifstatements that can be used directly within
expressions. Consider this code:
food = input("What's your favorite
➝ food? ")
reply = 'yuck' if food == 'lamb'
'yum'

➝ else

The expression on the right-hand side of
= in the second line is called a conditional
expression, and it evaluates to either
'yuck' or 'yum'. It’s equivalent to the
following:
food = input("What's your favorite
➝ food? ")
if food == 'lamb':
reply = 'yuck'
else:
reply = 'yum'

Conditional expressions are usually shorter
than the corresponding if/else-statements,
although not always as flexible or easy
to read. In general, you should use them
when they make your code simpler.

Flow of Control 53

From the Library of Mo Medwani

Loops
Now we turn to loops, which are used to
repeatedly execute blocks of code. Python
has two main kinds of loops: for-loops and
while-loops. For-loops are generally easier
to use and less error prone than whileloops, although not quite as flexible.

Lingo Alert
Programmers often use the variable i
because it is short for index, and is also
commonly used in mathematics. When
we start using loops within loops, it is
common to use j and k as other loop
variable names.

For-loops
The basic for-loop repeats a given block
of code some specified number of times.
For example, this snippet of code prints the
numbers 0 to 9 on the screen:
# count10.py
for i in range(10):
print(i)

The first line of a for-loop is called the forloop header. A for-loop always begins with
the keyword for. After that comes the loop
variable, in this case i. Next is the keyword in, typically (but not always) followed
by range(n) and a terminating : token. A
for-loop repeats its body, the code block
underneath it, exactly n times.
Each time the loop executes, the loop
variable i is set to be the next value. By
default, the initial value of i is 0, and it
goes up to n - 1 (not n!) by ones. Starting
numbering at 0 instead of 1 might seem
unusual, but it is common in programming.
If you want to change the starting value of
the loop, add a starting value to range:
for i in range(5, 10):
print(i)

This prints the numbers from 5 to 9.

54

Chapter 4

From the Library of Mo Medwani

If you want to print the numbers from 1
to 10 (instead of 0 to 9), there are two common
ways of doing so. One is to change the start
and end of the range:

for i in range(1, 11):
print(i)
Or, you can add 1 to i inside the loop body:

for i in range(10):
print(i + 1)
If you would like to print numbers in
reverse order, there are again two standard
ways of doing so. The first is to set the range
parameters like this:

for i in range(10, 0, -1):
print(i)
Notice that the first value of range is 10, the
second value is 0, and the third value, called
the step, is −1. Alternatively, you can use a simpler range and modify i in the loop body:

for i in range(10):
print(10 - i)
For-loops are actually more general than
described in this section: They can be used
with any kind of iterator, which is a special
kind of programming object that returns
values. For instance, we will see later that forloops are the easiest way to read the lines of
a text file.

Flow of Control 55

From the Library of Mo Medwani

While-loops
The second kind of Python loop is a whileloop. Consider this program:
# while10.py
i = 0
while i < 10:
print(i)
i = i + 1 # add 1 to i

This prints out the numbers from 0 to 9 on
the screen. It is noticeably more complicated than a for-loop, but it is also more
flexible.
The while-loop itself begins on the line
beginning with the keyword while; this line
is called the while-loop header, and the
indented code underneath it is called the
while-loop body. The header always starts
with while and is followed by the whileloop condition. The condition is a Boolean
expression that returns True or False.
The flow of control through a while-loop
goes like this: First, Python checks if the
loop condition is True or False. If it’s
True, it executes the body; if it’s False, it
skips over the body (that is, it jumps out of
the loop) and runs whatever statements
appear afterward. When the condition
is True, the body is executed, and then
Python checks the condition again. As long
as the loop condition is True, Python keeps
executing the loop. B shows a flow chart
for this program.

B This is a flow chart for code that counts from 0

to 9. Notice that when the loop condition is False
(that is, the no branch is taken in the decision box),
the arrow does not go into a box. That’s because
in our sample code there is nothing after the
while-loop.

The very first line of the sample program
is i = 0, and in the context of a loop it is
known as an initialization statement, or
an initializer. Unlike with for-loops, which
automatically initialize their loop variable, it
is the programmer’s responsibility to give
initial values to any variables used by a
while-loop.

56

Chapter 4

From the Library of Mo Medwani

The last line of the loop body is i = i + 1.
As it says in the source code comment, this
line causes i to be incremented by 1. Thus,
i increases as the loop executes, which
guarantees that the loop will eventually
stop. In the context of a while-loop, this
line is called an increment, or incrementer,
since its job is to increment the loop
variable.
The general form of a while-loop is shown
in the flow chart of C.
Even though almost all while-loops need
an initializer and an incrementer, Python
does not require that you include them. It
is entirely up to you, the programmer, to
remember these lines. Even experienced
programmers find that while-loop initializers and incrementers are a common
source of errors.

C A flow chart for the general form of a while-loop. Note that the

incrementer is not shown explicitly: It is embedded somewhere in

body_block, often (but not always) at the end of that block.

Flow of Control 57

From the Library of Mo Medwani

While-loops are extremely flexible. You
can put any code whatsoever before a whileloop to do whatever kind of initialization is
necessary. The loop condition can be any
Boolean expression, and the incrementer can
be put anywhere within the while-loop body,
and it can do whatever you like.
A loop that never ends is called an infinite loop. For instance, this runs forever:

while True:
print('spam')
Some programmers like to use infinite
loops as a quick way to write a loop. However,
this is generally considered to be poor style
because such loops often become complex
and hard to understand.
Many Python programmers try to use forloops whenever possible and use while-loops
only when absolutely necessary.
While-loops can be written with an elseblock. However, this unusual feature is rarely
used in practice, so we haven’t discussed
it. If you are curious, you can read about it
in the online Python documentation—for
example, http://docs.python.org/3/reference/
compound_stmts.html.

58

Chapter 4

From the Library of Mo Medwani

Comparing For-Loops
and While-Loops
Let’s take a look at a few examples of how
for-loops and while-loops can be used to
solve the same problems. Plus we’ll see a
simple program that can’t be written using
a for-loop.

Calculating factorials
Factorials are numbers of the form 1 × 2
× 3 × … × n, and they tell you how many
ways n objects can be arranged in a line.
For example, the letters ABCD can be
arranged in 1 × 2 × 3 × 4 = 24 different
ways. Here’s one way to calculate factorials using a for-loop:
# forfact.py
n = int(input('Enter an integer
➝ >= 0: '))
fact = 1
for i in range(2, n + 1):
fact = fact * i
print(str(n) + ' factorial is ' +
➝ str(fact))

Here’s another way to do it using a
while-loop:
# whilefact.py
n = int(input('Enter an integer
➝ >= 0: '))
fact = 1
i = 2
while i <= n:
fact = fact * i
i = i + 1
print(str(n) + ' factorial is ' +
➝ str(fact))

continues on next page

Flow of Control 59

From the Library of Mo Medwani

Both of these programs behave the same
from the user’s perspective, but the internals are quite different. As is usually the
case, the while-loop version is a little more
complicated than the for-loop version.
In mathematics, the notation n! is used to
indicate factorials. For example, 4! = 1 × 2 × 3 ×
4 = 24. By definition, 0! = 1. Interestingly, there
is no simple formula for calculating factorials.
Python has no maximum integer, so you
can use these programs to calculate very large
factorials. For example, a deck of cards can be
arranged in exactly 52! ways:

Enter an integer >= 0: 52
52 factorial is 80658175170943878571
➝ 6606368564037669752895054408832778
➝ 24000000000000

60

Chapter 4

From the Library of Mo Medwani

Summing numbers from the user
The following programs ask the user to
enter some numbers, and then prints their
sum. Here is a version using a for-loop:
# forsum.py
n = int(input('How many numbers to
➝ sum? '))
total = 0
for i in range(n):
s = input('Enter number ' +
+ 1) + ': ')

➝ str(i

total = total + int(s)
print('The sum is ' + str(total))

Here’s a program that does that same thing
using a while-loop:
# whilesum.py
n = int(input('How many numbers to
➝ sum? '))
total = 0
i = 1
while i <= n:
s = input('Enter number ' +
+ ': ')

➝ str(i)

total = total + int(s)
i = i + 1
print('The sum is ' + str(total))

Again, the while-loop version is a little
more complex than the for-loop version.
These programs assume that the user
is entering integers. Floating point numbers
will be truncated when int(s) is called.
Of course, you can easily change this to
float(s) if you want to allow floating point
numbers.

Flow of Control 61

From the Library of Mo Medwani

Summing an unknown
number of numbers
Now here’s something that can’t be done
with the for-loops we’ve seen so far. Suppose we want to let users enter a list of
numbers to be summed without asking
them ahead of time how many numbers
they have. Instead, they just type 'done'
when they have no more numbers to add.
Here’s how to do it using a while-loop:
# donesum.py
total = 0
s = input('Enter a number (or
➝ "done"): ')
while s != 'done':
num = int(s)
total = total + num
s = input('Enter a number (or
')

➝ "done"):

print('The sum is ' + str(total))

The idea here is to keep asking users to
enter a number, quitting only when they
enter 'done'. The program doesn’t know
ahead of time how many times the loop
body will be executed.

62

Chapter 4

From the Library of Mo Medwani

Notice a few more details:
■

We must call input in two different
places: before the loop and inside the
loop body. This is necessary because
the loop condition decides whether or
not the input is a number or 'done'.

■

The ordering of the statements in the
loop body is very important. If the loop
condition is True, then we know s is
not 'done', and so we assume it is an
integer. Thus we can convert it to an
integer, add it to the running total, and
then ask the user for more input.

■

We convert the input string s to an
integer only after we know s is not the
string 'done'. If we had written
s = int(input('Enter a number
➝ (or "done"): '))

as we had previously, the program would
crash when the user typed 'done'.
■

There is no need for the i counter
variable anymore. In the previous summing programs, i tracked how many
numbers had been entered so far. As
a general rule, a program with fewer
variables is easier to read, debug, and
extend.

Flow of Control 63

From the Library of Mo Medwani

Breaking Out of
Loops and Blocks
The break statement is a handy way for
exiting a loop from anywhere within the
loop’s body. For example, here is an alternative way to sum an unknown number of
numbers:
# donesum_break.py
total = 0
while True:
s = input('Enter a number (or
➝ "done"): ')
if s == 'done':
break # jump out of the loop
num = int(s)
total = total + num
print('The sum is ' + str(total))

The while-loop condition is simply True,
which means it will loop forever unless
break is executed. The only way for break
to be executed is if s equals 'done'.
An advantage of this program over
donesum.py is that the input statement is
not repeated. But a disadvantage is that
the reason for why the loop ends is buried
in the loop body. It’s not so hard to see it in
this small example, but in larger programs
break statements can be tricky to see. Furthermore, you can have as many breaks as
you want, which adds to the complexity of
understanding the loop.

64

Chapter 4

From the Library of Mo Medwani

Generally, it is wise to avoid the break
statement, and to use it only when it makes
your code simpler or clearer.
A relative of break is the continue statement: When continue is called inside
a loop body, it immediately jumps up to
the loop condition—thus continuing with
the next iteration of the loop. It is a little
less common than break, and generally it
should be avoided altogether.
Both break and continue also work
with for-loops.

Flow of Control 65

From the Library of Mo Medwani

Loops Within Loops
Loops within loops, also known as nested
loops, occur frequently in programming.
For instance, here’s a program that prints
the times tables up to 10:
# timestable.py
for row in range(1, 10):
for col in range(1, 10):
prod = row * col
if prod < 10:
print(' ', end = '')
print(row * col, ' ', end = '')
print()

Look carefully at the indentation of the
code in this program: It’s how you tell what
statements belong to what blocks. The
final print() statement lines up with the
second for, meaning it is part of the outer
for-loop (but not the inner).
Note that the statement if prod < 10 is
used to make the output look neatly formatted. Without it, the numbers won’t line
up nicely.
When using nested loops, be careful with
loop index variables: Do not accidentally reuse
the same variable for a different loop. Most of
the time, every individual loop needs its own
control variables.
You can nest as many loops within
loops as you need, although the complexity
increases greatly as you do so.
As mentioned previously, if you use
break or continue with nested loops,
break only breaks out of the innermost
loop, and continue only “continues” the
innermost loop.

66

Chapter 4

From the Library of Mo Medwani

5
Functions
A function is a reusable chunk of code. It
is a block of code with a name that takes
input, provides output, and can be stored
in files for later use. Pretty much any useful piece of Python code is stored in a
function.
Python has excellent support for functions.
For instance, it provides many ways to pass
data into a function. It also lets you include
documentation strings within the function
itself so that you—or other programmers—
can read how the function works.

In This Chapter
Calling Functions

68

Defining Functions

70

Variable Scope

73

Using a main Function

75

Function Parameters

76

Modules

80

You need to learn a number of details
in order to completely understand functions. With practice, they will soon become
second nature, so be sure to try out the
examples from this chapter.

From the Library of Mo Medwani

Calling Functions
We’ve been calling functions quite a bit so
far, so let’s take a moment to look a little
more carefully at a function call.
Consider the built-in function pow(x, y),
which calculates x ** y—that is to say, x
raised to the power y:
>>> pow(2, 5)
32

Here, we say that pow is the function
name, and that the values 2 and 5 are
arguments that are passed into pow. The
value 32 is the return value, so we say that
pow(2, 5) returns 32. Figure A gives a
high-level overview of a function call.
When you call a function within an expression, Python essentially replaces the function call with its return value. For example,
the expression pow(2, 5) + 8 is the same
as 32 + 8, which evaluates to 40.
When a function takes no input (that is to
say, it has zero arguments), you must still
include the round brackets () after the
function name:
>>> dir()
['__builtins__', '__doc__',
➝ '__name__', '__package__']

A It’s often useful to think of functions as being

black boxes that accept an input (2 and 5 in this
case) and return an output (32). From the point
of view of a programmer calling the pow function,
there is no (easy) way to see inside of pow. All we
know is what the documentation tells us, and what
the function does when we call it.

Calculating Powers
Calling pow(x, y) is the same as calling
x ** y. You may notice that pow(0, 0)
(and also 0 ** 0) is 1 in Python, and
this has been a matter of some debate.
According to some mathematicians,
pow(0, 0) ought to be indeterminate,
or undefined. But others say that it is
more sensible to define pow(0, 0) as
1. Python has obviously sided with the
latter group.

The () tells Python to execute the function.
If you leave off the (), then you get this:
>>> dir


Without the (), Python does not execute
the dir function and instead tells you that
dir labels a function.

68

Chapter 5

From the Library of Mo Medwani

Functions that don’t return a value
Some functions, such as print, are not
meant to return values. Consider:
>>> print('hello')
hello
>>> x = print('hello')
hello
>>> x
>>> print(x)
None

Here the variable x has been assigned a
special value called None. None indicates
“no return value”: It is not a string or a number, so you can’t do any useful calculations
with it.

Reassigning function names
You need to take care not to accidentally
make a built-in function name refer to
some other function or value. Unfortunately, Python doesn’t stop you from writing code like this:
>>> dir = 3
>>> dir
3
>>> dir()
Traceback (most recent call last):
File "", line 1, in
➝ 

dir()
TypeError: 'int' object is not
➝ callable

Here we’ve made dir label the number
3, so the function that dir used to refer to
is no longer accessible! You will need to
restart Python to get it back.

Functions 69

From the Library of Mo Medwani

Defining Functions
Now we turn to creating our own functions.
As an example, let’s write a function to
calculate the area of a circle. Recall that a
circle’s area is pi times its radius squared.
Here is a Python function that does this
calculation:
# area.py
import math
def area(radius):
""" Returns the area of a circle

Naming Functions
As with variable names, a function name
must be one or more characters long and
consist of letters, numbers, or the underscore ( _ ) character. The first character of
the name cannot be a number.
In general, give functions simple English
names that hint at their purpose. Don’t
make them too long or too short. Other
programmers reading your code or using
your functions (including you, a few
months down the road!) will thank you for
choosing helpful names.

with the given radius.
For example:
>>> area(5.5)
95.033177771091246
"""
return math.pi * radius ** 2

Save this function inside a Python file
(area.py would be a good name); then
load it into the IDLE editor and run it by
pressing F5. If everything is typed correctly, a prompt should appear and nothing
else; a function is not executed until you
call it. To call it, just type the name of the
function, with the radius in brackets:
>>> area(1)
3.1415926535897931
>>> area(5.5)
95.033177771091246
>>> 2 * (area(3) + area(4))
157.07963267948966

The area function can be called just like
any other function, the difference being
that you have written the function and so
you have control over what it does and
how it works.

70

Chapter 5

From the Library of Mo Medwani

Parts of a function
A Formatting Convention
Python doc strings tend to follow a standard formatting convention. Triple quotes
are used to mark the beginning and end
of the doc string. The first line is a succinct one-line description of the function
useful to a programmer. After the first
line come more details and examples.

Extra Benefits of Doc Strings
Just as with built-in functions, you can
easily access the doc strings for your
own functions, like this:
>>> print(area.__doc__)
Returns the area of a circle
➝ with the given radius.
For example:
>>> area(5.5)
95.033177771091246

As you will see when you call area in the
IDLE editor, IDLE automatically reads the
function doc string and pops it up as an
automatic tool tip.
Python also has a useful tool called
doctest that can be used to automatically run example Python code found in
doc strings. This is a good way to test
your code, and to help ensure that the
documentation accurately describes the
function. We won’t go into the details of
doctest in this book, but it is easy to use
and quite helpful, and you can read more
about it here: http://docs.python.org/3/
library/doctest.html.

Let’s look at each part of the area function.
The first line, the one that begins with def,
is called the function header; all the code
indented beneath the header is called the
function body.
Function headers always begin with the
keyword def (short for definition), followed
by a space, and then the name of the function (in this case, area). Function names
follow essentially the same rules as names
for variables.
After the function name comes the function parameter list. This is a list of variable
names that label the input to the function. The area function has a single input,
radius, although a function can have any
number of inputs. If a function has 0 inputs,
then only the round brackets are written, ().
Finally, like loops and if-statements, a function header ends with a colon (:).
After the function header comes an
optional documentation string, or doc
string for short. A doc string briefly
explains what the function will do, and it
may include examples or other helpful
information. While doc strings are optional,
they are almost always a good idea: When
you start writing a lot of functions, it is easy
to forget exactly what they do and how
they work, and a well-written doc string
can be a good reminder.
After the doc string comes the main body
of the function. This is simply an indented
block of code that does whatever you need
it to do. The code in this block is allowed to
use the variables from the function header.
Finally, the function should return a value
using the return keyword. When a return
statement is executed, Python jumps out
of the function and back to the point in the
program where it was called.

Functions 71

From the Library of Mo Medwani

In the case of the area function, the
return statement is the last line of the
function, and it simply returns the value of
the area of a circle using the standard formula. Note that it uses the radius parameter in its calculation; the value for radius
is set when the area function is called.
A return is usually the last line of a function to be executed (the only time it isn’t is
when the function ends unexpectedly due
to an exception being thrown, which we
will talk about in a later chapter). You can
put a return anywhere inside a function
body, although it is typically the last physical line of the function.
A function is not required to have an
explicit return statement. For example:
# hello.py
def say_hello_to(name):
""" Prints a hello message.
"""
cap_name = name.capitalize()

Lingo Alert
When a function makes a change in any
way other than returning a value, we call
that change a side effect. Printing to the
screen, writing to a file, and downloading a webpage are all examples of side
effects.
A style of programming known as
functional programming is characterized by its near-complete banishment of
side effects. In functional programming,
the only changes you can make are via
return values. Python has a lot of support
for functional programming, including the
ability to define functions within functions and to pass functions as values to
other functions. When used correctly,
functional programming can be a very
elegant and powerful way of writing
programs.
Although we won’t be covering functional programming in detail in this book,
it is nonetheless wise to avoid function
side effects whenever possible.

print('Hello ' + cap_name + ', how
➝ are you?')

If you don’t put a return anywhere in a
function, Python treats the function as if it
ended with this line:
return None

The special value None is used to indicate
that the function is not meant to be returning a useful value. This is fairly common:
Functions are often used to perform tasks
where the return values don’t matter, such
as printing output to the screen.

72

Chapter 5

From the Library of Mo Medwani

Variable Scope
An important detail that functions bring up
is the issue of scope. The scope of a variable (or function) is where in a program it is
accessible, or visible. Consider these two
functions:
# local.py
import math
def dist(x, y, a, b):
s = (x - a) ** 2 + (y - b) ** 2
return math.sqrt(s)
def rect_area(x, y, a, b):
width = abs(x - a)
height = abs(y - b)
return width * height

Any variable assigned for the first time
within a function is called a local variable.
The function dist has one local variable,
s, while rect_area has two local variables,
width and height.
The parameters of a function are also considered local. Thus dist has a total of five
local variables—x, y, a, b, and s; rect_area
has a total of six. Notice that variables x, y,
a, and b appear in both functions, but they
generally label different values.
Importantly, local variables are usable only
within the function they are local to. No
code outside of the function can access its
local variables.
When a function ends, its local variables
are automatically deleted.

Functions 73

From the Library of Mo Medwani

Global variables
Variables declared outside of any function are called global variables, and they
are readable anywhere by any function or
code within the program. However, there
is a wrinkle in reassigning global variables
within functions you need to be aware of.
Consider the following:

Nothing changed—name still labels the
value 'Jack'. The problem is that Python
treated name inside the change_name
function as being a local variable, and so
ignored the global name variable.
To access the global variable, you must use
the global statement:
# global_correct.py

# global_error.py

name = 'Jack'

name = 'Jack'

def say_hello():

def say_hello():
print('Hello ' + name + '!')
def change_name(new_name):
name = new_name

print('Hello ' + name + '!')
def change_name(new_name):
global name
name = new_name

The variable name is a global variable
because it is not declared inside any function. The say_hello() function reads the
value of name and prints it to the screen as
you would expect:

This makes all the difference. Both functions now work as expected:

>>> say_hello()

>>> change_name('Piper')

Hello Jack!

>>> say_hello()

However, things don’t work as expected
when you call change_name:

Hello Piper!

>>> say_hello()
Hello Jack!

>>> change_name('Piper')
>>> say_hello()
Hello Jack!

74

Chapter 5

From the Library of Mo Medwani

Main in Other Languages
The idea of using a main function is quite
common, and some other programming
languages, notably C, C++, and Java,
actually define the use of main as part of
the language. In Python, however, main
is entirely optional, and used only as a
helpful convention.

Using a main Function
It is usually a good idea to use at least one
function in any Python program you write:
main(). A main() function is, by convention, assumed to be the starting point of
your program. For instance, you could write
the password program from the previous
chapter using a main function:
# password2.py
def main():
pwd = input('What is the
')

➝ password?

if pwd == 'apple':
print('Logging on ...')
else:
print('Incorrect password.')
print('All done!')

Notice that all the code is indented underneath the main function header.
When you run password2.py in IDLE, nothing happens—only the prompt appears.
You must type main() to execute the code
within in it.
The advantage of using a main function is
that it is now easier to rerun programs and
pass in input values.

Functions 75

From the Library of Mo Medwani

Function Parameters
Parameters pass data into a function, and
Python has several kinds of parameters.

Pass by reference
Python passes parameters to a function
using a technique known as pass by
reference. This means that when you pass
parameters, the function refers to the original passed values using new names. For
example, consider this simple program:

B The state of memory after setting x to 3 and y
to 4.

# reference.py
def add(a, b):
return a + b

Run IDLE’s interactive command line and
type this:

C The state of memory just after add(x, y) is
called, and a and b have been set to refer to the
values of x and y, respectively.

>>> x, y = 3, 4
>>> add(x, y)

Pass by Value

7

Some programming languages, such as
C++, can pass parameters using pass by
value. When a parameter is passed by
value, a copy of it is made and passed to
the function. If the value being passed
is large, the copying can take up a lot of
time and memory. Python does not support pass by value.

After you set x and y in the first line,
Python’s memory looks like B. Now
when add(x, y) is called, Python creates
two new variables, a and b, that refer to
the values of x and y C. The values are
assigned in the order they occur—thus a
refers to x because x is the first argument.
Notice that the values are not copied: They
are simply given new names that the function uses to refer to them.
After a and b are summed and the function returns, references a and b are
automatically deleted. The original x and y
are untouched throughout the entire
function call.

76

Chapter 5

From the Library of Mo Medwani

An important example
Passing by reference is simple and efficient, but there are some things it cannot
do. For example, consider this plausibly
named function:

D After x is assigned 1 in the function call

set1(m), m is unchanged and still refers to its

# reference.py
def set1(x):

original value of 5. However, the local variable x
has indeed been set to 1.

x = 1

The purpose of set1 is to set the value of
the passed-in variable to 1. But when you
try it, it does not work as expected:
>>> m = 5
>>> set1(m)
>>> m
5

Surprisingly, the value of m has not
changed. The reason why is a consequence of pass by reference. It’s helpful to
break the example down into steps:
1. Assign 5 to m.
2. Call set1(m): Assign the value of x to
the value of m (so now both m and x
point to 5).
3. Assign 1 to m. Now the situation is as
shown in D.
4. When the set1 function ends, x is
deleted.
The variable m is simply not accessible
within set1, so there is no way to change
what it points to.

Functions 77

From the Library of Mo Medwani

Default values
It’s often useful to include a default value
with a parameter. For example, here we
have given the greeting parameter a
default value of 'Hello':
# greetings.py
def greet(name, greeting = 'Hello'):
print(greeting, name + '!')

You can now call greet in two distinct
ways:
>>> greet('Bob')
Hello Bob!
>>> greet('Bob', 'Good morning')
Good morning Bob!
Default parameters are quite handy and
are used all the time in Python.
A function can use as many default
parameters as it needs, although no parameter
without a default value can appear before a
parameter with one.
An important detail about default parameters is that they are evaluated only once,
the first time they are called. In complicated
programs, this can sometimes be the source
of subtle bugs, so it is useful to keep this fact
in mind.

78

Chapter 5

From the Library of Mo Medwani

Keyword parameters
Another useful way to specify parameters in
Python is by using keywords. For example:
# shopping.py
def shop(where = 'store',
what = 'pasta',
howmuch = '10 pounds'):
print('I want you to go to the',
➝ where)
print('and buy', howmuch, 'of',
+ '.')

➝ what

To call a function that uses keyword parameters, pass data in the form param = value.
For example:
>>> shop()
I want you to go to the store
and buy 10 pounds of pasta.
>>> shop(what = 'towels')
I want you to go to the store
and buy 10 pounds of towels.
>>> shop(howmuch = 'a ton', what =
➝ 'towels')
I want you to go to the store
and buy a ton of towels.
>>> shop(howmuch = 'a ton', what =
where = 'bakery')

➝ 'towels',

I want you to go to the bakery
and buy a ton of towels.

Keyword parameters have two big benefits. First, they make the parameter values
clear, and thus help to make your programs
easier to read. Second, the order in which
you call keyword parameters does not
matter. Both of these are quite helpful in
functions with many parameters; for such
functions it can be difficult to remember
the exact order in which to put the parameters, and what they mean.

Functions 79

From the Library of Mo Medwani

Modules
A module is collection of related functions
and variables.

To create a Python module:
■

Create a .py file containing your functions and assignments. For example,
here is a simple module for printing
shapes to the screen:
# shapes.py
"""A collection of functions
for printing basic shapes.
"""
CHAR = '*'
def rectangle(height, width):
""" Prints a rectangle. """
for row in range(height):
for col in range(width):
print(CHAR, end = '')
print()
def square(side):
""" Prints a square. """
rectangle(side, side)
def triangle(height):
""" Prints a right triangle. """
for row in range(height):
for col in range(1, row + 2):
print(CHAR, end = '')
print()

The only difference between this and a
regular Python program is the intended
use: A module is a toolbox of helpful
functions that you can use when writing

80

Chapter 5

From the Library of Mo Medwani

other programs. Thus a module usually
does not have a main() function.
■

To use a module, you simply import it.
For example:
>>> import shapes
>>> dir(shapes)
['CHAR', '__builtins__',
➝ '__doc__', '__file__',
➝ '__name__', '__package__',
➝ 'rectangle', 'square',
➝ 'triangle']
>>> print(shapes.__doc__)
A collection of functions
for printing basic shapes.
>>> shapes.CHAR
'*'
>>> shapes.square(5)
*****
*****
*****
*****
*****
>>> shapes.triangle(3)
*
**
***

■

You can also import everything at once:
>>> from shapes import *
>>> rectangle(3, 8)
********
********
********

Functions 81

From the Library of Mo Medwani

Namespaces
A very useful fact about modules is that
they form namespaces. A namespace
is essentially a set of unique variable
and function names. The names within a
module are visible outside the module only
when you use an import statement.

The Zen of Python
To see an interesting Python Easter egg,
try importing the module this at the
interactive command line:
>>> import this

To see why this is important, suppose
Jack and Sophie are working together on
a large programming project. Jack is on
the West Coast, and Sophie is on the East.
They agree that Jack will put all his code in
the module jack.py, and Sophie will put all
her code into sophie.py. They work independently, and it turns out that they both
wrote a function called save_file(fname).
However, only the headers of their functions are the same; they do radically different things. Having two functions with the
same name is fine because the functions
are in different modules, so the names are
in different namespaces. The full name of
Jack’s function is jack.save_file(fname),
and the full name of Sophie’s is sophie.
save_file(fname).
Thus modules support independent development, by preventing name clashes. Even
if you are not working with other programmers, name clashes can be an annoying
problem in your own programs, especially
as they get larger and more complex.
Of course, you can still run into name
clashes as follows:
>>> from jack import *
>>> from sophie import *

These kinds of import statements essentially dump everything from each module
into the current namespace, overwriting
anything with the same name as they
go. Thus, it is generally wise to avoid
from ... import * statements in larger
programs.

82

Chapter 5

From the Library of Mo Medwani

6
Strings
After numbers, strings are the most
important data type in Python. Strings are
ubiquitous: You print them to the screen,
you read them from the user, and as we
will see in Chapter 8, files are often treated
as big strings. The World Wide Web can
be thought of as a collection of webpages,
most of which consist of text.
Strings are an example of an aggregate
data structure, and they provide our first
look at indexing and slicing—techniques
that are used to extract substrings from
strings.

In This Chapter
String Indexing

84

Characters

87

Slicing Strings

89

Standard String Functions

92

Regular Expressions

98

The chapter also contains a brief introduction to Python’s regular expression library,
which is a supercharged mini-language
designed for processing strings.

From the Library of Mo Medwani

String Indexing
We introduced strings in Chapter 2, so you
may want to go back to it if you need a
refresher on string basics.
When working with strings, we often want
to access their individual characters. For
example, suppose you know that s is a
string, and you want to print its individual
characters. String indexing is how you do it:
>>> s = 'apple'

A This diagram shows the index values for the

string 'apple'. Square-bracket indexing notation
is used to access individual characters within
the string.

>>> s[0]
'a'
>>> s[1]
'p'
>>> s[2]
'p'
>>> s[3]
'l'
>>> s[4]
'e'

Python uses square brackets to index
strings: The number in the brackets indicates which character to get A. Python’s
index values always start at 0 and always
end at one less than the length of the string.

Why Start at 0?
Beginning programmers often find it odd
that Python indexes begin at 0 instead of
1. It does take some getting used to and
can be the source of off-by-one errors
that plague many programmers. It can
be helpful to think of an index value as
measuring the distance from the first
character of the string, just like a ruler
(which also starts at 0). This makes some
calculations with indexes a little simpler,
and it also fits nicely with the % (mod)
function, which is often used with index
calculations and naturally returns 0s.

So if s labels a string of length n, s[0] is
the first character, s[1] is the second character, s[2] is the third character, and so on
up to s[n-1], which is the last character.
If you try to index past the right end of the
string, you will get an “out of range” error:
>>> s[5]
Traceback (most recent call last):
File "", line 1, in
➝ 

s[5]
IndexError: string index out of range

84

Chapter 6

From the Library of Mo Medwani

Negative indexing
Suppose instead of the first character of s,
you want to access the last character of
s. The ungainly expression s[len(s) - 1]
works, but it’s rather complicated.
Fortunately, Python has a more convenient
way of accessing characters near the right
end of a string: negative indexing. The
idea is that the characters of a string are
indexed with negative numbers going from
right to left:
>>> s = 'apple'

B Python strings have both positive and negative
indexes. In practice, programmers usually use
whatever index is most convenient.

>>> s[-1]
'e'
>>> s[-2]
'l'
>>> s[-3]
'p'
>>> s[-4]
'p'
>>> s[-5]
'a'

Thus the last character of a string is simply
s[-1]. Figure B shows how negative
index values work.

Strings 85

From the Library of Mo Medwani

Accessing characters with a for-loop
If you need to access every character of a
string in sequence, a for-loop can be helpful. For example, this program calculates
the sum of the character codes for a given
string:
# codesum.py
def codesum1(s):
""" Returns the sums of the
character codes of s.
"""
total = 0
for c in s:
total = total + ord(c)
return total

Here is a sample call:
>>> codesum1('Hi there!')
778

When you use a for-loop like this, at the
beginning of each iteration the loop variable c is set to be the next character in s.
The indexing into s is handled automatically by the for-loop.
Compare codesum1 with this alternative
implementation, which uses regular string
indexing:
def codesum2(s):
""" Returns the sums of the
character codes of s.
"""
total = 0
for i in range(len(s)):
total = total + ord(s[i])
return total

This gives the same results as codesum1,
but the implementation is a little more complex and harder to read.

86

Chapter 6

From the Library of Mo Medwani

The Rise of Unicode
In the 1960s, ’70s, and ’80s, the most
popular character encoding scheme
was ASCII (American Standard Code
for Information Interchange). ASCII is far
simpler than Unicode, but its fatal flaw
is that it can represent only 256 different characters—enough for English
and French and a few other similar
languages, but nowhere near enough to
represent the huge variety of characters
and symbols found in other languages.
For instance, Chinese alone has thousands of ideograms that could appear in
text documents.
Essentially, Unicode provides a far larger
set of character codes. Conveniently,
Unicode mimics the ASCII code for the
first 256 characters, so if you are only
dealing with English characters (as we
are in this book), you’ll rarely need to
worry about the details of Unicode. For
more information, see the Unicode home
page (www.unicode.org).

Characters
Strings consist of characters, and characters themselves turn out to be a surprisingly complex issue. As mentioned in
Chapter 2, all characters have a corresponding character code that you can find
using the ord function:
>>> ord('a')
97
>>> ord('b')
98
>>> ord('c')
99

Given a character code number, you can
retrieve its corresponding character using
the chr function:
>>> chr(97)
'a'
>>> chr(98)
'b'
>>> chr(99)
'c'

Character codes are assigned using
Unicode, which is a large and complex
standard for encoding all the symbols and
characters that occur in all the world’s
languages.

Strings

87

From the Library of Mo Medwani

Escape characters
Not all characters have a standard visible symbol. For example, you can’t see a
newline character, a return character, or a
tab (although you can certainly see their
effects). They are whitespace characters,
characters that appear as blanks on the
printed page.
To handle whitespace and other unprintable characters, Python uses a special
notation called escape sequences, or
escape characters. Table 6.1 shows the
most commonly used escape characters.
The backslash, single-quote, and doublequote escape characters are often needed
for putting those characters into a string.
For instance:
>>> print('\' and \" are quotes')
' and " are quotes
>>> print('\\ must be written \\\\')
\ must be written \\

The standard way in Python for ending a
line is to use the \n character:
>>> print('one\ntwo\nthree')
one
two

TABLE 6.1 Some Common Escape Characters
Character

Meaning

\\

Backslash

\'

Single quote

\"

Double quote

\n

Newline (linefeed)

\r

Return (carriage return)

\t

Tab (horizontal tab)

Ending Lines
Different operating systems follow different standards for ending a line of text.
For instance, Windows uses \r\n to
mark the end of a line, whereas OS X and
Linux use just \n; Mac operating systems
before OS X used \r.
Most good editors handle at least the
\r\n and \n styles. Occasionally you
will run into programs (such as Notepad
on Windows) that do not recognize one
of these line-end formats, so text might
appear all on the same line, contain extra
line breaks, or have a ^M character at the
end of each line. The easiest way to deal
with this problem is to use a text editor
that handles line endings correctly.

three

It’s important to realize that an escape
character is only a single character. The
leading \ is needed to tell Python that this
is a special character, but that \ does not
count as an extra character when determining a string’s length. For example:
>>> len('\\')
1
>>> len('a\nb\nc')
5

88

Chapter 6

From the Library of Mo Medwani

Slicing Strings
Slicing is how Python lets you extract a
substring from a string. To slice a string,
you indicate both the first character you
want and one past the last character you
want. For example:
>>> food = 'apple pie'
>>> food[0:5]
'apple'
>>> food[6:9]
'pie'

The indexing for slicing is the same as for
accessing individual characters: The first
index location is always 0, and the last
is always one less than the length of the
string. In general, s[begin:end] returns
the substring starting at index begin and
ending at index end - 1.
Note that if s is a string, then you can
access the character at location i using
either s[i] or s[i:i+1].

Strings 89

From the Library of Mo Medwani

Slicing shortcuts
If you leave out the begin index of a slice,
then Python assumes you mean 0; and
if you leave off the end index, Python
assumes you want everything to the end of
the string. For instance:
>>> food = 'apple pie'
>>> food[:5]
'apple'
>>> food[6:]
'pie'
>>> food[:]
'apple pie'

Here’s a useful example of slicing in practice. This function returns the extension of
a filename:
# extension.py
def get_ext(fname):
""" Returns the extension of file
fname.
"""
dot = fname.rfind('.')
if dot == -1: # no . in fname
return ''
else:
return fname[dot + 1:]

Here’s what get_ext does:
>>> get_ext('hello.text')
'text'
>>> get_ext('pizza.py')
'py'
>>> get_ext('pizza.old.py')
'py'
>>> get_ext('pizza')
''

90

Chapter 6

From the Library of Mo Medwani

The get_ext function works by determining the index position of the rightmost
'.' (hence the use of rfind to search
for it from right to left). If there is no '.'
in fname, the empty string is returned;
otherwise, all the characters from the '.'
onward are returned.

Slicing with negative indexes
You can also use negative index values
with slicing, although it can be more confusing. For example:
>>> food = 'apple pie'
>>> food[-9:-4]
'apple'
>>> food[:-4]
'apple'
>>> food[-3:0]
''
>>> food[-3:]
'pie'

When working with negative slicing, or
negative indexes in general, it is often
useful to write the string you are working
with on a piece of paper, and then write the
positive and negative index values over
the corresponding characters (as in Figure
6.2). While this does take an extra minute
or two, it’s a good way to prevent common
indexing errors.

Strings 91

From the Library of Mo Medwani

Standard String
Functions
Python strings come prepackaged with a
number of useful functions; use dir on any
string (for example, dir('')) to see them
all. While it’s not necessary to memorize
precisely what all these functions do, it is
a good idea to have a general idea of their
abilities so that you can use them when
you need them. Thus, in this section we
present a list of all the functions that come
with a string, grouped together by type.
This is not meant to be a complete reference: A few infrequently used parameters
are left out, and not every detail of every
function is explained. For more complete
details, read a function’s doc string or the
online Python documentation (http://docs.
python.org/3/).

Testing functions
The first and largest group of functions is
composed of ones that test if a string has
a certain form. The testing functions in
Table 6.2 all return either True or False.
Testing functions are sometimes called
Boolean functions, or predicates.

92

TABLE 6.2 String-Testing Functions
Name

Returns true just when …

s.endswith(t)

s ends with string t

s.startsswith(t)

s starts with string t

s.isalnum()

s contains only letters or
numbers

s.isalpha()

s contains only letters

s.isdecimal()

s contains only decimal
characters

s.isdigit()

s contains only digits

s.isidentifier()

s is a valid Python
identifier (that is, name)

s.islower()

s contains only lowercase
letters

s.isnumeric()

s contains only numeric
characters

s.isprintable()

s contains only printable
characters

s.isspace()

s contains only whitespace
characters

s.istitle()

s is a title-case string

s.isupper()

s contains only uppercase
letters

t in s

s contains t as a substring

Chapter 6

From the Library of Mo Medwani

TABLE 6.3 String-Searching Functions
Name

Return Value

s.find(t)

–1, or index where t starts in s

s.rfind(t)

Same as find, but searches
right to left

s.index(t)

Same as find, but raises
ValueError if t is not in s

s.rindex(t)

Same as index, but searches
right to left

Searching functions
As shown in Table 6.3, there are several
ways to find substrings within a string. The
difference between index and find functions is what happens when they don’t find
what they are looking for. For instance:
>>> s = 'cheese'
>>> s.index('eee')
Traceback (most recent call last):
File "", line 1, in
➝ 

s.index('eee')
ValueError: substring not found
>>> s.find('eee')
-1

The find function raises a ValueError;
this is an example of an exception, which
we will talk about in more detail in Chapter 9. The index function returns -1 if the
string being searched for is not found.
Normally, string-searching functions search
the string from left to right, beginning to
end. However, functions beginning with an
r search from right to left. For example:
>>> s = 'cheese'
>>> s.find('e')
2
>>> s.rfind('e')
5

In general, find and index return the
smallest index where the passed-in string
starts, and rfind and rindex return the
largest index where it starts.

Strings 93

From the Library of Mo Medwani

Case-changing functions
Python gives you a variety of functions for
changing the case of letters (Table 6.4).
Keep in mind that Python never modifies
a string: For all these functions, Python
creates and returns a new string. We often
talk as if the string were being modified,
but this is only a convenient phrasing and
does not mean the string is really being
changed.

Formatting functions
The string-formatting functions listed in
Table 6.5 help you to make strings look
nicer for presenting to the user or printing
to a file.
The string format function is especially
powerful, and it includes its own minilanguage for formatting strings. To use
format, you supply it variables or values—
for example:
>>> '{0} likes {1}'.format('Jack',
➝ 'ice cream')

TABLE 6.4 String-Searching Functions
Name

Returned String

s.capitalize()

s[0] is made uppercase

s.lower()

All letters of s are made
lowercase

s.upper()

All letters of s are made
uppercase

s.swapcase()

Lowercase letters are made
uppercase, and uppercase
letters are made lowercase

s.title()

Title-case version of s

TABLE 6.5 String-Formatting Functions
Name

Returned String

s.center(n, ch)

Centers s within a string of
n ch characters

s.ljust(n, ch)

Left-justifies s within a string
of n ch characters

s.rjust(n, ch)

Right-justifies s within a
string of n ch characters

s.format(vars)

See text

'Jack likes ice cream'

The {0} and {1} in the string refer to the
arguments in format: They are replaced by
the values of the corresponding strings or
variables. You can also refer to the names
of keyword parameters:
>>> '{who} {pet} has fleas'.format
➝ (pet = 'dog', who = 'my')
'my dog has fleas'

These examples show the most basic use
of format; there are many other options
for spacing strings, converting numbers
to strings, and so on. All the details are
provided in Python’s online documentation (http://docs.python.org/3/library/string.
html#format-string-syntax).

94

Chapter 6

From the Library of Mo Medwani

TABLE 6.6 String-Stripping Functions
Name

Returned String

s.strip(ch)

Removes all ch characters in
t occurring at the beginning
or end of s

s.lstrip(ch)

Removes all ch characters in
t occurring at the beginning
(that is, the left side) of s

s.rstrip(ch)

Removes all ch characters in
t occurring at the end (that
is, the right side) of s

Stripping functions
The stripping functions shown in Table 6.6
are used for removing unwanted characters
from the beginning or end of a string. By
default, whitespace characters are stripped,
and if a string argument is given, the characters in that string are stripped. For example:
>>> name = ' Gill Bates '
>>> name.lstrip()
'Gill Bates '
>>> name.rstrip()
' Gill Bates'

TABLE 6.7 String-Splitting Functions
Name

Returned String

s.partition(t)

Chops s into three strings
(head, t, tail), where
head is the substring before
t and tail is the substring
after t

>>> name.strip()
'Gill Bates'
>>> title = '_-_- Happy Days!! _-_-'
>>> title.strip()
'_-_- Happy Days!! _-_-'

s.rpartition(t)

Same as partition but
searches for t starting at the
right end of s

>>> title.strip('_-')

s.split(t)

Returns a list of substrings
of s that are separated by t

>>> title.strip('_ -')

s.rsplit(t)

Same as split, but starts
searching for t at the right
end of s

s.splitlines()

Returns a list of lines in s

' Happy Days!! '

'Happy Days!!'

Splitting functions
The splitting functions listed in Table 6.7
chop a string into substrings.
The partition and rpartition functions
divide a string into three parts:
>>> url = 'www.google.com'
>>> url.partition('.')
('www', '.', 'google.com')
>>> url.rpartition('.')
('www.google', '.', 'com')

These partitioning functions always return
a value consisting of three strings in the
form (head, sep, tail). This kind of return
value is an example of a tuple, which we
will learn about in more detail in Chapter 7.

Strings

95

From the Library of Mo Medwani

The split function divides a string into
substrings based on a given separator
string. For example:

TABLE 6.8 String-Replacement Functions
Name

Returned String

>>> url = 'www.google.com'

s.replace(old, new)

Replaces every
occurrence of old
within s with new

s.expandtabs(n)

Replaces each tab
character in s with n
spaces

>>> url.split('.')
['www', 'google', 'com']
>>> story = 'A long time ago, a
➝ princess ate an apple.'
>>> story.split()
['A', 'long', 'time', 'ago,', 'a',
'ate', 'an', 'apple.']

➝ 'princess',

The split function always returns a list of
strings; a Python list always begins with a
[ and ends with a ], and uses commas to
separate elements. As we’ll see in Chapter
7, lists and tuples are very similar, the main
difference being that lists can be modified,
but tuples are constant.

Replacement functions
Python strings come with two replacing
functions, as shown in Table 6.8. Note that
the replace function can easily be used to
delete substrings within a string:
>>> s = 'up, up and away'
>>> s.replace('up', 'down')
'down, down and away'
>>> s.replace('up', '')
', and away'

96

Chapter 6

From the Library of Mo Medwani

Other functions
Finally, Table 6.9 lists the remaining string
functions.

However, it’s not a very flexible function, so most programmers prefer using
one of Python’s other string-formatting
techniques.

The translate and maketrans functions
are useful when you need to convert one
set of characters into another. For instance,
here’s one way to convert strings to
“leet-speak”:

The join function can be quite useful. It
concatenates a sequence of strings, including a separator string. For example:

>>> leet_table = ''.maketrans
➝ ('EIOBT', '31087')

>>> ' '.join(['once', 'upon', 'a',
➝ 'time'])
'once upon a time'

>>> 'BE COOL. SPEAK LEET!'.translate

>>> '-'.join(['once', 'upon', 'a',

➝ (leet_table)

➝ 'time'])

'83 C00L. SP3AK L337!'

'once-upon-a-time'

The online documentation (http://docs.
python.org/3/library/stdtypes.html#str.
maketrans) also explains how to replace
more than single characters.

>>> ''.join(['once', 'upon', 'a',
➝ 'time'])

'onceuponatime'

The zfill function is used for formatting
numeric strings:
>>> '23'.zfill(4)
'0023'
>>> '-85'.zfill(5)
'-0085'

TABLE 6.9 Other String Functions
Name

Returned Value

s.count(t)

Number of times t occurs within s

s.encode()

Sets the encoding of s; see the online documentation (http://docs.python.org/3/
library/stdtypes.html#str.encode) for more details

s.join(seq)

Concatenates the strings in seq, using s as a separator

s.maketrans(old, new)

Creates a translation table used to change the characters in old with the
corresponding characters in new; note that s can be any string—it has no
influence on the returned table

s.translate(table)

Makes the replacements in s using the given translation table (created with
maketrans)

s.zfill(width)

Adds enough 0s to the left of s to make a string of length width

Strings 97

From the Library of Mo Medwani

Regular Expressions
While Python strings provide many useful functions, real-world string processing
often calls for more powerful tools.
Thus, programmers have developed a
mini-language for advanced string processing known as regular expressions.
Essentially, a regular expression is a way to
compactly describe a set of strings. They
can be used to efficiently perform common
string-processing tasks such as matching,
splitting, and replacing text. In this section,
we’ll introduce the basic ideas of regular
expressions, as well as a few commonly
used operators (Table 6.10).

TABLE 6.10 Some Regular Expression
Operators
Operator

Set of Strings Described

xy?

x, xy

x|y

x, y

x*

' ', x, xx, xxx, xxxx, ...

x+

x, xx, xxx, xxxx, …

Simple regular expressions
Consider the string 'cat'. It represents a
single string consisting of the letters c, a,
and t. Now consider the regular expression
'cats?'. Here, the ? does not mean an
English question mark but instead represents a regular expression operator, meaning that the character to its immediate left
is optional. Thus the regular expression
'cats?' describes a set of two strings:
'cat' and 'cats'.
Another regular expression operator is |,
which means “or.” For example, the regular
expression 'a|b|c' describes the set of
three strings 'a', 'b', and 'c'.
The regular expression 'a*' describes
an infinite set of strings: '', 'a', 'aa',
'aaa', 'aaaa', 'aaaaa', and so on. In
other words, 'a*' describes the set of all
strings consisting of a sequence of 0 or
more 'a's. The regular expression 'a+' is
the same as 'a*' but excludes the empty
string ''.

98

Chapter 6

From the Library of Mo Medwani

Finally, within a regular expression you can
use round brackets to indicate what substring an operator ought to apply to. For
example, the regular expression '(ha)+!'
describes these strings: 'ha!', 'haha!',
'hahaha!', and so on. In contrast, 'ha+!'
describes a very different set: 'ha!',
'haa!', 'haaa!', and so on.
You can mix and match these (and many
other) regular expression operators in any
way you want. This turns out to be a very
useful way to describe many commonly
occurring types of strings, such as phone
numbers or email addresses.

Matching with regular expressions
A common application of regular expressions is string matching. For example,
suppose you are writing a program where
the user must enter a string such as done
or quit to end the program. To help
recognize these strings, you could write a
function like this:
# allover.py
def is_done1(s):
return s == 'done' or s == 'quit'

object otherwise. We don’t care about the
details of the match object in this example,
so we only check to see if the result is
None or not.
In such a simple example, the regular
expression version is not much shorter
or better than the first version; indeed,
is_done1 is probably preferable! However,
regular expressions really start to shine
as your programs become larger and
more complex. For instance, suppose we
decide to add a few more possible stopping strings. For the regular expression
version, we just rewrite the regular expression string to be, say, 'done|quit|over|
finished|end|stop'. In contrast, to make
the same change to the first version, we’d
need to include or s == for each string we
added, which would make for a very long
line of code that would be hard to read.
Here’s a more complex example. Suppose you want to recognize funny strings,
which consist of one or more 'ha' strings
followed immediately by one or more '!'s.
For example, 'haha!', 'ha!!!!!', and
'hahaha!!' are all funny strings. It’s easy
to match these using regular expressions:

Using regular expressions, an identically
behaving function might look like this:

# funny.py

# allover.py

def is_funny(s):

import re # use regular expressions
def is_done2(s):
return re.match('done|quit', s)
➝ != None

The first line of this new version imports
Python’s standard regular expression
library. To match a regular expression, we
use the re.match(regex, s) function,
which returns None if regex does not match
s, and a special regular expression match

import re

return re.match('(ha)+!+', s)
None

➝ !=

Notice that the only essential difference
between this is_funny and is_done2 is
that a different regular expression is used
inside match. If you try writing this same
function without using regular expressions, you will quickly see how much work
'(ha)+!+' is doing for us.

continues on next page

Strings

99

From the Library of Mo Medwani

More regular expressions
We have barely scratched the surface of
regular expressions: Python’s re library
is large and has many regular expression
functions that can perform string-processing tasks such as matching, splitting,
and replacing. There are also tricks for
speeding up the processing of commonly
used regular expressions, and numerous
shortcuts for matching commonly used
characters. The documentation for the re
module contains more examples (http://
docs.python.org/3/library/re.html).

100

Chapter 6

From the Library of Mo Medwani

7
Data Structures
In this chapter, we introduce the important idea of data structures: collections of
values along with commonly performed
functions. Python’s programmer-friendly
philosophy is to provide a few powerful and efficient data structures—tuples,
lists, dictionaries, and sets—that can
be combined as needed to make more
complex ones.
In the previous chapter we discussed
strings, which can be thought of as data
structures restricted to storing sequences
of characters. The data structures in this
chapter can contain not just characters
but almost any kind of data.

In This Chapter
The type Command

102

Sequences

103

Tuples

104

Lists

108

List Functions

110

Sorting Lists

113

List Comprehensions

115

Dictionaries

118

Sets

122

Python’s two workhorse data structures
are lists and dictionaries. Lists store data
in sequential order, and dictionaries are
like little databases that efficiently store
and retrieve data using keys.

From the Library of Mo Medwani

The type Command
It’s occasionally useful to check the data
type of a value or a variable. This is easily
done with the built-in type command:
>>> type(5)

>>> type(5.0)

>>> type('5')

>>> type(None)

>>> type(print)


Notice that the output of the type command uses the term class. Roughly speaking, classes and types are synonymous.
The type command can be quite useful for
debugging. For instance, it is not unusual in
Python to work with data collections where
you don’t know the exact type of the data
items, or where you don’t even know the
exact type of the container holding them.
Using type, you can always determine the
exact type of a Python object.

102

Chapter 7

From the Library of Mo Medwani

Order Matters
When we say that sequences are
ordered, we mean that the order of
the elements in the sequence matters.
Strings are ordered because 'abc' is different from 'acb'. Later we will see that
dictionaries and sets are not ordered:
They only care if an item is inside of
them, and in fact they can make no
promises about their relative order.

How Big Can a Sequence Be?
Theoretically, there is no limit to the
length of a sequence: It can contain as
many items as needed. Practically, however, you are restricted by the amount
of RAM available in your computer when
Python is running.

Sequences
In Python, a sequence is an ordered collection of values. Python has three built-in
sequence types: strings, tuples, and lists.
One very nice feature of sequences is that
they can be indexed and sliced, just as
we saw for strings in the previous chapter.
Thus, all sequences have the following
characteristics:
■

Their first positive index is 0, and it is at
the left end.

■

Their first negative index is –1, and it
starts at the right end.

■

Slice notation can be used to make
copies of sub-sequences. For example,
seq[begin:end] returns a copy of
the elements of seq starting at index
location begin and ending at location
end - 1.

■

They can be concatenated (i.e., combined) using + and *. The sequences
must be of the same type for this to
work—that is to say, you cannot concatenate a tuple and a list.

■

Their length is calculated by the len
function. For example, len(s) is the
number of items in sequence s.

■

The expression x in s tests if the
sequence s contains the element x.
That is, x in s returns True if x is somewhere in s, and False otherwise.

In practice, strings and lists are the most
common kinds of sequences. Tuples have
their uses but appear much less often.

Data Structures 103

From the Library of Mo Medwani

Tuples
A tuple is an immutable sequence of 0
or more values. It can contain any Python
value—even other tuples. For example:
>>> items = (-6, 'cat', (1, 2))
>>> items
(-6, 'cat', (1, 2))
>>> len(items)

Trailing Commas
Whereas singleton tuples require a trailing comma, a trailing comma is allowed,
but not required, in longer tuples (and
lists). For example, (1, 2, 3,) is the
same as (1, 2, 3). Some programmers
prefer to always include the trailing
comma so that they never accidentally
leave it out for singleton tuples.

3
>>> items[-1]
(1, 2)
>>> items[-1][0]
1

As you can see, the items of a tuple are
enclosed in round brackets and separated
by commas. The empty tuple is represented by (), but tuples with a single item
(singleton tuples) have the unusual notation (x,). For instance:
>>> type(())

>>> type((5,))

>>> type((5))


If you forget the comma at the end of a
singleton tuple, you have not created a
tuple—all you’ve done is put brackets
around an expression.

104

Chapter 7

From the Library of Mo Medwani

Tuple immutability
As mentioned, tuples are immutable,
meaning that once you’ve created a
tuple, you cannot change it. This is not so
unusual: Strings, integers, and floats are
also immutable. If you do need to change
a tuple, then you must create a new tuple
that embodies the changes. For example,
here’s how you can chop off the first element of a tuple:
>>> lucky = (6, 7, 21, 77)
>>> lucky
(6, 7, 21, 77)
>>> lucky2 = lucky[1:]
>>> lucky2
(7, 21, 77)
>>> lucky
(6, 7, 21, 77)

On the plus side, immutability makes it
impossible to accidentally modify a tuple,
which helps prevent errors. On the minus
side, making even the smallest change to
a tuple requires copying essentially the
whole thing, and so modifying large tuples
can takes extra time and memory. If you
find yourself needing to make frequent
modifications to a tuple, then you should
be using a list instead.

Data Structures 105

From the Library of Mo Medwani

Tuple functions
Table 7.1 lists the most commonly used
tuple functions. Compared with strings and
lists, tuples have a relatively small number
of functions. Here are some examples of
how they are used:
>>> pets = ('dog', 'cat', 'bird',
➝ 'dog')
>>> pets
('dog', 'cat', 'bird', 'dog')

TABLE 7.1 Tuple Functions
Name

Return Value

x in tup

True if x is an element of tup,
False otherwise

len(tup)

Number of elements in tup

tup.count(x)

Number of times element x
occurs in tup

tup.index(x)

Index location of the first
(leftmost) occurrence of x in
tup; if x is not in tup, raises a
ValueError exception

>>> 'bird' in pets
True
>>> 'cow' in pets
False
>>> len(pets)
4
>>> pets.count('dog')
2
>>> pets.count('fish')
0
>>> pets.index('dog')
0
>>> pets.index('bird')
2
>>> pets.index('mouse')
Traceback (most recent call last):
File "", line 1, in
➝ 

pets.index('mouse')
ValueError: tuple.index(x): x not in
➝ list

106

Chapter 7

From the Library of Mo Medwani

As with strings, you can use + and * to
concatenate tuples:
>>> tup1 = (1, 2, 3)
>>> tup2 = (4, 5, 6)
>>> tup1 + tup2
(1, 2, 3, 4, 5, 6)
>>> tup1 * 2
(1, 2, 3, 1, 2, 3)

Data Structures 107

From the Library of Mo Medwani

Lists
Lists are essentially the same as tuples but
with one key difference: Lists are mutable.
That is, you can add, remove, or modify
elements to a list without making a copy.
In practice, lists are used far more frequently than tuples (indeed, some Python
programmers are only faintly aware that
tuples exist!).
The elements of a list are separated by
commas and enclosed in square brackets.
As with strings and tuples, you can easily get the length of a list (using len), and
concatenate lists (using + and *):
>>> numbers = [7, -7, 2, 3, 2]
>>> numbers
[7, -7, 2, 3, 2]
>>> len(numbers)

>>> lst = [3, (1,), 'dog', 'cat']
>>> lst[0]
3
>>> lst[1]
(1,)
>>> lst[2]
'dog'
>>> lst[1:3]
[(1,), 'dog']
>>> lst[2:]
['dog', 'cat']
>>> lst[-3:]
[(1,), 'dog', 'cat']
>>> lst[:-3]

5
>>> numbers + numbers
[7, -7, 2, 3, 2, 7, -7, 2, 3, 2]
>>> numbers * 2
[7, -7, 2, 3, 2, 7, -7, 2, 3, 2]

108

And just as with strings and tuples, you can
use indexing and slicing to access individual elements and sublists:

[3]

Notice that lists can contain any kinds of
values: numbers, strings, or even other
sequences. The empty list is denoted
by [], and a singleton list containing just
one element x is written [x] (in contrast
to tuples, no trailing comma is necessary
for a singleton list).

Chapter 7

From the Library of Mo Medwani

Mutability
As mentioned earlier, mutability is the key
feature that distinguishes lists from tuples.
For example:
>>> pets = ['frog', 'dog', 'cow',
➝ 'hamster']
>>> pets

A A Python list points to its values.

['frog', 'dog', 'cow', 'hamster']
>>> pets[2] = 'cat'
>>> pets
['frog', 'dog', 'cat', 'hamster']

B A self-referential list. Note that the second

element is not pointing to the first element of the
list, but to the entire list itself.

Lingo Alert
Many Python programmers speak as if a
list contains its elements. Although that
is not technically accurate, it is a common
and convenient phrasing. Much of the
time it does not cause any confusion. But
when it comes to finding errors in programs that process lists, it is often essential to understand that lists actually point
to their values and don’t contain them.

As you can see, this sets the second element of the list pets to point to 'cat'. The
string 'cow' gets replaced and is automatically deleted by Python.
Figure A shows a helpful diagrammatic
representation of a list. Just as with variables, it is important to understand that the
elements of a list only point to their values
and do not actually contain them.
The fact that lists point to their values can
be the source of some surprising behavior.
Consider this nasty example:
>>> snake = [1, 2, 3]
>>> snake[1] = snake
>>> snake
[1, [...], 3]

Here, we’ve made an element of a list
point to the list itself: We’ve created a selfreferential data structure. The [...] in the
printout indicates that Python recognizes
the self-reference and does not stupidly
print the list forever(!). Figure B shows diagrammatically what snake looks like.

Data Structures 109

From the Library of Mo Medwani

List Functions
Lists come with many useful functions
(Table 7.2). All of these functions, except
for count (which just returns a number),
modify the list you call them with. Thus,
these are mutating functions, so you need
to use them with care. It is distressingly
easy, for instance, to accidentally delete a
wrong element or insert a new value at the
wrong place.
The append function adds an element to
the end of a list. One common programming pattern is to create an empty list at
the beginning of a function and then add
values to it in the rest of the function. For
example, here is a function that creates a
string of messages based on a list of input
numbers:

TABLE 7.2 List Functions
Name

Return Value

s.append(x)

Appends x to the end of s

s.count(x)

Returns the number of times
x appears in s

s.extend(lst)

Appends each item of lst
to s

s.index(x)

Returns the index value of
the leftmost occurrence of x

s.insert(i, x)

Inserts x before index
location i (so that s[i] == x)

s.pop(i)

Removes and returns the
item at index i in s

s.remove(x)

Removes the leftmost
occurrence of x in s

s.reverse()

Reverses the order of the
elements of s

s.sort()

Sorts the elements of s into
increasing order

# numnote.py
def numnote(lst):
msg = []
for num in lst:
if num < 0:
s = str(num) + ' is negative'
elif 0 <= num <= 9:
s = str(num) + ' is a digit'
msg.append(s)
return msg

For example:
>>> numnote([1, 5, -6, 22])
['1 is a digit', '5 is a digit',
'-6 is negative']

110

Chapter 7

From the Library of Mo Medwani

Lingo Alert
In computer programming, the term pop
usually refers to the act of removing the
last element of a list. The related term,
push, refers to adding an element to the
same end (that is, exactly what Python’s
append does). When push and pop are
used on the same list, we often refer
to it as a stack: We say that items are
pushed onto the top of the stack and
then popped from the top of the stack.
Despite their simplicity, stacks form the
basis of a number of more advanced programming behaviors, such as recursion
and undo.

To print the messages on their own individual lines, you could do this:
>>> for msg in numnote([1, 5, -6,
➝ 22]): print(msg)
1 is a digit
5 is a digit
-6 is negative

Or even this:
>>> print('\n'.join(numnote([1, 5,
➝ -6, 22])))
1 is a digit
5 is a digit
-6 is negative

The extend function is similar to append,
but it adds an entire sequence:
>>> lst = []
>>> lst.extend('cat')
>>> lst
['c', 'a', 't']
>>> lst.extend([1, 5, -3])
>>> lst
['c', 'a', 't', 1, 5, -3]

Data Structures 111

From the Library of Mo Medwani

The pop function removes an element at
a given index position and then returns it.
For example:
>>> lst = ['a', 'b', 'c', 'd']
>>> lst.pop(2)
'c'
>>> lst
['a', 'b', 'd']
>>> lst.pop()
'd'
>>> lst
['a', 'b']

Notice that if you don’t give pop an index,
it removes and returns the element at the
end of the list.
The remove(x) function removes the first
occurrence of x from a list. However, it
does not return x:
>>> lst = ['a', 'b', 'c', 'a']
>>> lst.remove('a')
>>> lst
['b', 'c', 'a']

As the name suggests, reverse reverses
the order of the elements of a list:
>>> lst = ['a', 'b', 'c', 'a']
>>> lst
['a', 'b', 'c', 'a']
>>> lst.reverse()
>>> lst
['a', 'c', 'b', 'a']

It’s important to realize that reverse does
not make a copy of the list: It moves the
items within the list itself, so we say the
reversal is done in place.

112

Chapter 7

From the Library of Mo Medwani

Lingo Alert
The order in which Python sorts a list
of sequences is called lexicographical ordering. This is just a general term
meaning “alphabetical order,” except that
it applies to any sequence of orderable
values, not just letters. The idea is that
elements are ordered by their initial element, then their second element, then
their third element, and so on.

Sorting Lists
Sorting data is one of the most common
things that computers do. Sorted data is
usually easier to work with than unsorted
data, for both humans and computers. For
instance, finding the smallest element of a
sorted list requires no searching at all: It is
simply the first element of the list. Humans
often prefer to see data in sorted order—
just imagine a phone book that was not
printed alphabetically!
In Python, sorting is most easily done using
the list sort() function. In practice, it can
be used to quickly sort lists with tens of
thousands of elements. Like reverse(),
sort() modifies the list in place:
>>> lst = [6, 0, 4, 3, 2, 6]
>>> lst
[6, 0, 4, 3, 2, 6]
>>> lst.sort()
>>> lst
[0, 2, 3, 4, 6, 6]

Data Structures 113

From the Library of Mo Medwani

The sort function always sorts elements
into ascending order, from smallest to
largest. If you want the elements sorted in
reverse order, from largest to smallest, the
simple trick of calling reverse after sort
works well:
>>> lst = ['up', 'down', 'cat',
➝ 'dog']
>>> lst
['up', 'down', 'cat', 'dog']
>>> lst.sort()
>>> lst
['cat', 'dog', 'down', 'up']
>>> lst.reverse()
>>> lst
['up', 'down', 'dog', 'cat']

Python also knows how to sort tuples and
lists. For example:
>>> pts = [(1, 2), (1, -1), (3, 5),
➝ (2, 1)]
>>> pts
[(1, 2), (1, -1), (3, 5), (2, 1)]
>>> pts.sort()
>>> pts
[(1, -1), (1, 2), (2, 1), (3, 5)]

Tuples (and lists) are sorted by their first
element, then by their second element,
and so on.

114

Chapter 7

From the Library of Mo Medwani

List Comprehensions
Lists are used so frequently that Python
provides a special notation for creating them called list comprehensions.
For example, here’s how you can use a
list comprehension to create a list of the
squares of the numbers from 1 to 10:
>>> [n * n for n in range(1, 11)]
[1, 4, 9, 16, 25, 36, 49, 64, 81,
➝ 100]

The main advantage of this notation is
that it is compact and readable. Compare this with equivalent code without a
comprehension:
result = []
for n in range(1, 11):
result.append(n * n)

Once you get the hang of them, list comprehensions are quick and easy to write,
and you will find many uses for them.

Data Structures 115

From the Library of Mo Medwani

Examples of list comprehensions
Let’s see a few more examples of comprehensions. If you want to double each
number on the list and 7, you can do this:
>>> [2 * n + 7 for n in
➝ range(1, 11)]
[9, 11, 13, 15, 17, 19, 21, 23,
27]

➝ 25,

Or if you want the first ten cubes:
>>> [n ** 3 for n in range(1, 11)]
[1, 8, 27, 64, 125, 216, 343, 512,
➝ 729, 1000]

You can also use strings in comprehensions. For example:
>>> [c for c in 'pizza']
['p', 'i', 'z', 'z', 'a']
>>> [c.upper() for c in 'pizza']
['P', 'I', 'Z', 'Z', 'A']

A common application of comprehensions
is to modify an existing list in some way.
For instance:
>>> names = ['al', 'mei', 'jo',
➝ 'del']
>>> names
['al', 'mei', 'jo', 'del']
>>> cap_names = [n.capitalize() for
in names]

➝n

>>> cap_names
['Al', 'Mei', 'Jo', 'Del']
>>> names
['al', 'mei', 'jo', 'del']

116

Chapter 7

From the Library of Mo Medwani

Filtered comprehensions
List comprehensions can also filter out
elements you don’t want. For example, the
following comprehension returns a list containing just the positive elements of nums:

Here’s a comprehension that removes all
the vowels from a word written inside a
function:
# eatvowels.py
def eat_vowels(s):

>>> nums = [-1, 0, 6, -4, -2, 3]

""" Removes the vowels from s.

>>> result = [n for n in nums if
➝ n > 0]

"""

>>> result

➝ if

return ''.join([c for c in s
c.lower() not in 'aeiou'])

[6, 3]

It works like this:

Here’s equivalent code without a
comprehension:

>>> eat_vowels('Apple Sauce')

result = []
nums = [-1, 0, 6, -4, -2, 3]
for n in nums:
if n > 0:
result.append(n)

Again, we see that list comprehensions
are more compact and readable than a
regular loop.

'ppl Sc'

The body of eat_vowels looks rather cryptic at first, and the trick to understanding it
is to read it a piece at a time. First, look at
the comprehension:
[c for c in s if c.lower() not in
➝ 'aeiou']

This is a filtered comprehension that scans
through the characters of s one at a time. It
converts each character to lowercase and
then checks to see if it is a vowel. If it is a
vowel, it is skipped and not added to the
resulting list; otherwise, it is added.
The result of this comprehension is a list of
strings, so we use join to concatenate all
the strings into a single string that is then
immediately returned.

Generator Expressions
There’s one more simplification we could make to eat_vowels: The square brackets in the comprehension can be removed:
' '.join(c for c in s if c.lower() not in 'aeiou')

The expression inside join is an example of a generator expression. In more advanced Python
programming, generator expressions can be used to efficiently generate only the needed part of a
list or sequence, with the elements being generated on demand instead of all at once as with a list
comprehension.

Data Structures 117

From the Library of Mo Medwani

Dictionaries
A Python dictionary is an extremely
efficient data structure for storing pairs of
values in the form key:value. For example:

Lingo Alert
Dictionaries are also referred to as
associative arrays, maps, or hash
tables.

>>> color = {'red' : 1, 'blue' : 2,
➝ 'green' : 3}
>>> color
{'blue': 2, 'green': 3, 'red': 1}

The dictionary color has three members.
One of them is 'blue':2, where 'blue' is
the key and 2 is its associated value.
You access values in a dictionary by their
keys:
>>> color['green']
3
>>> color['red']
1

Accessing dictionary values by their keys
is extremely efficient, even if the dictionary
has many thousands of pairs.

Hashing
Python’s dictionaries use a clever
programming trick known as hashing.
Essentially, each key in a dictionary is
converted to a number called its hash
value using a specially designed hash
function. The associated values are
stored in an underlying list at the index
location of their hash value. Accessing a
value involves converting the supplied
key to a hash value and then jumping to
that index location in the list. The exact
details of hashing are tricky, but thankfully Python takes care of everything
for us.

Like lists, dictionaries are mutable: You
can add or remove key:value pairs. For
example:
>>> color = {'red' : 1, 'blue' : 2,
➝ 'green' : 3}
>>> color
{'blue': 2, 'green': 3, 'red': 1}
>>> color['red'] = 0
>>> color
{'blue': 2, 'green': 3, 'red': 0}

118

Chapter 7

From the Library of Mo Medwani

Key restrictions
Dictionary keys have a couple of restrictions you need to be aware of. First, keys
are unique within the dictionary: You can’t
have two key:value pairs in the same dictionary with the same key. For example:
>>> color = {'red' : 1, 'blue' : 2,
➝ 'green' : 3, 'red' : 4}
>>> color
{'blue': 2, 'green': 3, 'red': 4}

Even though we’ve written the key 'red'
twice, Python only stores the second pair,
'red':4. There’s simply no way to have
duplicate keys: Dictionary keys must
always be unique.
The second restriction on keys is that they
must be immutable. So, for example, a
dictionary key cannot be a list or another
dictionary. The reason for this requirement
is that the location in a dictionary where a
key:value pair is stored is calculated from
the key. If the key changes even slightly,
the location of the key:value pair in the
dictionary can also change. If that happens,
then pairs in the dictionary can become
lost and inaccessible.
Neither of these restrictions holds for
values. Values can be mutable and can
appear as many times as you like within
the same dictionary.

Data Structures 119

From the Library of Mo Medwani

Dictionary functions
Table 7.3 lists the functions that come with
all dictionaries.
As we’ve seen, the standard way to
retrieve a value from a dictionary is to use
square-bracket notation: d[key] returns
the value associated with key. Calling
d.get(key) will do the same thing. If you
call either function when key is not in d,
you’ll get a KeyError.
If you are not sure whether a key is in a
dictionary ahead of time, you can check by
calling key in d. This expression returns
True if key is in the d, and False otherwise. It is an extremely efficient check
(especially as compared with using in with
sequences!).
You can also retrieve dictionary values
using the pop(key) and popitem() functions. The difference between pop(key)
and get(key) is that pop(key) returns
the value associated with key and also

TABLE 7.3 Dictionary Functions
Name

Return Value

d.items()

Returns a view of the (key, value) pairs in d

d.keys()

Returns a view of the keys of d

d.values()

Returns a view of the values in d

d.get(key)

Returns the value associated with key

d.pop(key)

Removes key and returns its corresponding value

d.popitem()

Returns some (key, value) pair from d

d.clear()

Removes all items from d

d.copy()

A copy of d

d.fromkeys(s, t)

Creates a new dictionary with keys taken from s and values taken from t

d.setdefault(key, v)

If key is in d, returns its value; if key is not in d, returns v and adds (key, v) to d

d.update(e)

Adds the (key, value) pairs in e to d; e may be another dictionary or a sequence
of pairs

120

Chapter 7

From the Library of Mo Medwani

removes its pair from the dictionary (get
only returns the value). The popitem()
function returns and removes some (key,
value) pair from the dictionary. You don’t
know ahead of time which pair will be
popped, so it’s useful only when you don’t
care about the order in which you access
the dictionary elements.
The items(), keys(), and values() functions all return a special object known
as a view. A view is linked to the original
dictionary, so that if the dictionary changes,
so does the view. For example:
>>> color
{'blue': 2, 'orange': 4, 'green': 3,
➝ 'red': 0}
>>> k = color.keys()
>>> for i in k: print(i)
blue
orange
green
red
>>> color.pop('red')
0
>>> color
{'blue': 2, 'orange': 4, 'green': 3}
>>> for i in k: print(i)
blue
orange
green

Data Structures 121

From the Library of Mo Medwani

Sets
In Python, sets are collections of 0 or more
items with no duplicates. A set is similar
to a dictionary that only has keys and no
associated values.
Sets come in two categories: mutable sets
and immutable frozensets. You can add
and remove elements from a regular set,
whereas a frozenset can never change
once it is created.

Sets and Dictionaries
Sets are a relatively new addition to
Python. Before sets, programmers used
dictionaries to simulate sets, and indeed
the first implementations of sets in
Python did the same. If you find yourself
using dictionaries and not caring about
the values, then changing your code to
use sets might make it more readable.

Perhaps the most common use of sets is
to remove duplicates from a sequence.
For example:
>>> lst = [1, 1, 6, 8, 1, 5, 5, 6,
➝ 8, 1, 5]
>>> s = set(lst)
>>> s
{8, 1, 5, 6}

Just as with dictionaries, the order of the
elements in the set cannot be guaranteed.
Calling dir(set) in the interactive command line will list the functions that all sets
come with—there are quite a few! Since
sets are not as frequently used as lists and
dictionaries, we won’t list them all here.
But keep sets in mind, and when you need
them, refer to their online documentation at http://docs.python.org/3/library/
stdtypes.html#set-types-set-frozenset.

122

Chapter 7

From the Library of Mo Medwani

8
Input and Output
To be useful, a program needs to communicate with the world around it. It needs
to interact with the user, or read and write
files, or access webpages, and so on.
In general, we refer to this as input and
output, or I/O for short.
We’ve already seen basic console I/O,
which involves printing messages and
using the input function to read strings
from the user. Now we’ll see some string
formatting that lets you make fancy output
strings for console I/O and anywhere you
need a formatted string.

In This Chapter
Formatting Strings

124

String Formatting

126

Reading and Writing Files

128

Examining Files and Folders

131

Processing Text Files

134

Processing Binary Files

138

Reading Webpages

141

Then we’ll turn to file I/O, which is all about
reading and writing files. Python provides a
lot of support for basic file I/O, making it as
easy as possible for programmers. In particular, we’ll see how to use text files, binary
files, and the powerful pickle module.

From the Library of Mo Medwani

Formatting Strings
Python provides a number of different
ways to create formatted strings. We will
discuss the older string interpolation and
the newer format strings.

String interpolation
String interpolation is a simple approach
to string formatting that Python borrows
from the C programming language. For
instance, here’s how you can control the
number of decimal places in a float:
>>> x = 1/81
>>> print(x)
0.0123456790123
>>> print('value: %.2f' % x)
value: 0.01
>>> print('value: %.5f' % x)
value: 0.01235

String interpolation expressions always
have the form format % values, where
format is a string containing one or more
occurrences of the % character. In the
example 'x = %.2f' % x, the substring
%.2f is a formatting command that tells
Python to take the first supplied value (x)
and to display it as a floating point value
with two decimal places.

124

Chapter 8

From the Library of Mo Medwani

TABLE 8.1 Some Conversion Specifiers

Conversion specifiers
The character f in the format string is a
conversion specifier, and it tells Python
how to render the corresponding value.
Table 8.1 lists the most commonly used
conversion specifiers.

Specifier

Meaning

d

Integer

o

Octal (base 8) value

x

Lowercase hexadecimal (base 16)

X

Uppercase hexadecimal (base 16)

e

Lowercase float exponential

The e, E, and f specifiers give you different
ways of representing floats. For example:

E

Uppercase float exponential

>>> x

F

Float

0.012345679012345678

s

String

>>> print('x = %f' % x)

%

% character

x = 0.012346
>>> print('x = %e' % x)

Octal and Hexadecimal
The o and x conversion specifiers, which
convert values to base 8 (octal) and base
16 (hexadecimal), respectively, might
seem to be of questionable value. However, in many computer-oriented applications it is convenient to represent values
in base 16 or, less frequently, base 8. As
we will see later in this chapter, hexadecimal is commonly used when dealing
with binary files.

x = 1.234568e-02
>>> print('x = %E' % x)
x = 1.234568E-02

You can put as many specifiers as you
need in a format string, although you must
supply exactly one value for each specifier.
For example:
>>> a, b, c = 'cat', 3.14, 6
>>> s = 'There\'s %d %ss older than
➝ %.2f years' % (c, a, b)
>>> s
"There's 6 cats older than 3.14
➝ years"

As this example shows, the format string
acts as a simple template that gets filled
in by the values. The values are given in
a tuple, and they must be in the order in
which you want them replaced.
The d, f, and s conversion specifiers are
the most frequently used, so they are the ones
worth remembering. In particular, f is the easiest way to control the format of floats.
If you need the % character to appear as
% itself, then you must type '%%'.

Input and Output 125

From the Library of Mo Medwani

String Formatting
A second way to create fancy strings in
Python is to use format strings with the
string function format(value, format_
spec). For example:
>>> 'My {pet} has {prob}'.format
➝ (pet = 'dog', prob='fleas')
'My dog has fleas'

In a format string, anything within curly
braces is replaced. This is called named
replacement, and it is especially readable
in this example.

Templating Packages
When neither string interpolation
nor format strings are powerful or
flexible enough, you may want to
use a templating package, such as
Cheetah (www.cheetahtemplate.org)
or the one that comes with Django
(www.djangoproject.com). Both allow
you to do some very sophisticated
replacements and are good choices
if you are making, say, a lot of dynamically generated webpages.

You can also replace values by position:
>>> 'My {0} has {1}'.format
➝ ('dog', 'fleas')
'My dog has fleas'

Or apply formatting codes similar to interpolated strings:
>>> '1/81 = {x}'.format(x=1/81)
'1/81 = 0.0123456790123'
>>> '1/81 = {x:f}'.format(x=1/81)
'1/81 = 0.012346'
>>> '1/81 = {x:.3f}'.format(x=1/81)
'1/81 = 0.012'

126

Chapter 8

From the Library of Mo Medwani

You can specify formatting parameters
within braces, like this:
>>> 'num = {x:.{d}f}'.format
➝ (x=1/81, d=3)
'num = 0.012'
>>> 'num = {x:.{d}f}'.format
d=4)

➝ (x=1/81,

'num = 0.0123'

This is something you can’t do with regular
string interpolation.
If you need the { or }characters to
appear as themselves in a format string, type
them as {{ and }}.
Format strings are more flexible and
powerful than string interpolation, but also
more complicated. If you are creating only a
few simple formatted strings, string interpolation is probably the best choice. Otherwise,
format strings are more useful for larger and
more complex formatting jobs, such as creating webpages or form letters for email.

Input and Output 127

From the Library of Mo Medwani

Reading and
Writing Files
A file is a named collection of bits stored
on a secondary storage device, such
as a hard disk, USB drive, flash memory
stick, and so on. We distinguish between
two categories of files: text files, which
are essentially strings stored on disk, and
binary files, which are everything else.
Text files have the following characteristics:
■

They are essentially “strings on disk.”
Python source code files and HTML
files are examples of text files.

■

They can be edited with any text editor. Thus, they are relatively easy for
humans to read and modify.

■

They tend to be difficult for programs to
read. Typically, programs called parsers
are needed to read each different kind
of text file. For instance, Python uses a
special-purpose parser to help read .py
files, and an HTML-specific parser is
needed to read HTML files.

■

They are usually larger than equivalent
binary files. This can be a major problem when, for instance, you need to
send a large text file over the Internet.
Thus, text files are often compressed
(for example, into zip format) to speed
up transmission and to save disk space.

128

Chapter 8

From the Library of Mo Medwani

Binary files have the following
characteristics:
■

They are not usually human-readable,
at least within a regular text editor. A
binary file is displayed in a text editor
as a random-looking series of characters. Some kinds of binary files, such as
JPEG image files, have special viewers
for displaying their content.

■

They usually take up less space than
equivalent text files. For instance, a
binary file might group the information
within it in chunks of 32 bits without
using separator characters, such as
commas or spaces.

■

They are often easier for programs to
read and write than text files. Although
each binary file is different, it’s often not
necessary to write complex parsers to
read them.

■

They are often tied to a specific program and are often unusable if you lack
that program. Some popular binary files
may have their file formats published
so that you can, if so motivated, write
your programs to read and write them.
However, this usually requires substantial effort.

Input and Output 129

From the Library of Mo Medwani

Folders

The current working directory

In addition to files, folders (or directories)
are used to store files and other folders.
The folder structure of most file systems is
quite large and complex, forming a hierarchical folder structure.

Many programs use the idea of a current
working directory, or cwd. This is simply
one directory that has been designated
as the default directory: Whenever you
do something to a file or a folder without
providing a full pathname, Python assumes
you mean a file or a folder in the current
working directory.

A pathname is the name used to identify
a file or a folder. The full pathname can be
quite long. For example, the Python folder
on my Windows computer has this full
pathname: C:\Documents and Settings\tjd\
Desktop\python.
Windows pathnames use a backward
slash ( \) character to separate names in a
path, and they start with the letter of the
disk drive (in this example, C:).
On Mac and Linux systems, a forward slash
(/) is used to separate names. Plus, there
is no drive letter at the start. For example,
here is the full pathname for my Python
folder on Linux: /home/tjd/Desktop/python.
Recall that if you want to write a \ character in a Python string, it must be doubled:

'C:\\home\\tjd\\Desktop\\python'
To avoid the double backslashes, you can use
a raw string:

r'C:\home\tjd\Desktop\python'
Getting Python programs to work with
both styles of pathnames is a bit tricky, and
you should read the documentation for
Python’s os.path module for (much!) more
information.

130

Chapter 8

From the Library of Mo Medwani

Examining Files
and Folders
Python provides many functions that return
information about your computer’s files
and folders (its file system). Table 8.2 lists
a few of the most useful ones.
Let’s write a couple of useful functions to
see how these work. For instance, a common task is retrieving the files and folders
in the current working directory. Writing
os.listdir(os.getcwd()) is unwieldy, so
we can write this function:
# list.py
def list_cwd():
return os.listdir(os.getcwd())

The following two related helper functions
use list comprehensions to return just the
files and folders in the current working
directory:
# list.py
def files_cwd():
return [p for p in list_cwd()
if os.path.isfile(p)]
def folders_cwd():
return [p for p in list_cwd()
if os.path.isdir(p)]

TABLE 8.2 Useful File and Folder Functions
Name

Action

os.getcwd()

Returns the name of the current working directory

os.listdir(p)

Returns a list of strings of the names of all the files and folders in the folder specified
by path p

os.chdir(p)

Sets the current working directory to be path p

os.path.isfile(p) Returns True just when path p specifies the name of a file, and False otherwise
os.path.isdir(p)

Returns True just when path p specifies the name of a folder, and False otherwise

os.stat(fname)

Returns information about fname, such as its size in bytes and the last modification time

Input and Output 131

From the Library of Mo Medwani

If you just want a list of, say, the .py files
in the current working directory, then this
will work:

A Neat Trick

# list.py

The cwd_size_in_bytes function can
be written as a single-line function:

def list_py(path = None):

def cwd_size_in_bytes2():

if path == None:
path = os.getcwd()
return [fname for fname in
➝ os.listdir(path)
if os.path.isfile(fname)
if fname.endswith('.py')]

This function plays a useful trick with its
input parameter: If you call list_py()
without a parameter, it runs on the current
working directory. Otherwise, it runs on the
directory you pass in.

return sum(size_in_bytes(f)
for f in files_cwd())

The details of how cwd_size_in_bytes2
works is beyond the scope of an
introductory book, but if you are curious about this more compact form,
search the web for python generator
expressions.

Finally, here’s a function that returns the
sum of the sizes of the files in the current
working directory:
# list.py
def size_in_bytes(fname):
return os.stat(fname).st_size
def cwd_size_in_bytes():
total = 0
for name in files_cwd():
total = total +
➝ size_in_bytes(name)
return total

132

Chapter 8

From the Library of Mo Medwani

To save space, we’ve removed the
doc strings for these functions. However,
the supplementary code files on Google’s
“pythonintro” website (http://pythonintro.
googlecode.com) all include doc strings.
You can tell from the name
cwd_size_in_bytes that the return value
will be in bytes. Putting the unit of the return
value in the function name prevents the need
to check the documentation for the units.
In general, it’s a good idea to use lots of
functions. Even single-line functions such as
list_dir() are useful because they make
your programs easier to read and maintain.
The os.stat() function is fairly complex and provides much more information
about files than we’ve shown here. Check
Python’s online documentation for more
information (http://docs.python.org/3/library/
os.html).

Input and Output 133

From the Library of Mo Medwani

Processing Text Files
Python makes it relatively easy to process
text files. In general, file processing follows
the three steps shown in A.

Reading a text file, line by line
Perhaps the most common way of reading
a text file is to read it one line at a time. For
example, this prints the contents of a file to
the screen:
# printfile.py
def print_file1(fname):
f = open(fname, 'r')

A The three main steps for processing a text

file. A file must be opened before you can use it,
and then it should be closed when you are done
with it to ensure that all changes are committed
to the file.

for line in f:
print(line, end = '')
f.close() # optional

The first line of the function opens the file:
open requires the name of the file you want
to process, and also the mode you want it
opened in. We are only reading the file, so
we open the file in read mode 'r'. Table
8.3 lists Python’s main file modes.
The open function returns a special file
object, which represents the file on disk.
Importantly, open does not read the file
into RAM. Instead, in this example, the file
is read a line at a time using a for-loop.

TABLE 8.3 Python File Modes
Character

Meaning

'r'

Open for reading (default)

'w'

Open for writing

'a'

Open for appending to the end of
the file

'b'

Binary mode

't'

Text mode (default)

'+'

Open a file for reading and writing

The last line of print_file1 closes the
file. As the comment notes, this is optional:
Python almost always automatically closes
files for you. In this case, variable f is local
to print_file1, so when print_file1
ends, Python automatically closes and then
deletes the file object (not the file itself, of
course!) that f points to.

134

Chapter 8

From the Library of Mo Medwani

Reading by Default
When reading a text file, you can use
open with just the filename. For example:
f = open(fname)

When no mode parameters are supplied,
Python assumes you are opening a text
file for reading.

The print statement in print_file1
sets end = '' because the lines of a file
always end with a \n character. Thus if we
had written just print(line), the file would
be displayed with extra blank lines (try it
and see!).
If errors occur while a file is open, it is
possible that the program could end without
the file being properly closed. In the next chapter, we will see how to handle such errors and
ensure that a file is always correctly closed.

Reading a text file as a string
Another common way of reading a text file
is to read it as one big string. For example:
# printfile.py
def print_file2(fname):
f = open(fname, 'r')
print(f.read())
f.close()

This is shorter and simpler than print_
file1, so many programmers prefer it.
However, if the file you are reading is very
large, it will take up a lot of RAM, which
could slow down, or even crash, your
computer.
Finally, we note that many programmers
would write this as a single line:
print(open(fname, 'r').read())

While this more compact form might take
some getting used to, many programmers
like this style because it is both quick to
type and still relatively readable.

Input and Output 135

From the Library of Mo Medwani

Writing to a text file

Appending to a text file

Writing text files is only a little more
involved than reading them. For example,
this function creates a new text file named
story.txt:

One common way of adding strings to a
text file is to append them to the end of the
file. Unlike 'w' mode, this does not delete
anything that might already be in the file.
For example:

# write.py
def make_story1():
f = open('story.txt', 'w')
f.write('Mary had a little
➝ lamb,\n')
f.write('and then she had some
➝ more.\n')

def add_to_story(line, fname =
➝ 'story.txt'):
f = open(fname, 'a')
f.write(line)

The important thing to note here is that the
file is opened in append mode 'a'.

The 'w' tells Python to open the file in
write mode. To put text into the file, you
call f.write with the string you want to put
into the file. Strings are written to the file in
the order in which they are given.
Important: If story.txt already exists,
then calling open('story.txt', 'w') will
delete it! If you want to avoid overwriting
story.txt, you need to check to see if it
exists:
# write.py
import os
def make_story2():
if os.path.isfile('story.txt'):
print('story.txt already exists)
else:
f = open('story.txt', 'w')
f.write('Mary had a little
➝ lamb,\n')
f.write('and then she had some
➝ more.\n')

136

Chapter 8

From the Library of Mo Medwani

Inserting a string at
the start of a file
Writing a string to the beginning of a file
is not as easy as appending one to the
end because the Windows, Linux, and
Macintosh operating systems don’t directly
support inserting text at the beginning of a
text file. Perhaps the simplest way to insert
text at the beginning of a file is to read the
file into a string, insert the new text into the
string, and then write the string back to the
original file. For example:
def insert_title(title, fname =
➝ 'story.txt'):
f = open(fname, 'r+')
temp = f.read()
temp = title + '\n\n' + temp
f.seek(0) # reset file pointer
# to beginning
f.write(temp)

First, notice that we open the file using
the special mode 'r+', which means the
file can be both read from and written
to. Then we read the entire file into the
string temp and insert the title using string
concatenation.
Before writing the newly created string
back into the file, we first have to tell the
file object f to reset its internal file pointer.
All text file objects keep track of where
they are in the file, and after f.read() is
called, the file pointer is at the very end.
Calling f.seek(0) puts it back at the start
of the file, so that when we write to f, it
begins at the start of the file.

Input and Output 137

From the Library of Mo Medwani

Processing Binary Files
If a file is not a text file, then it is considered to be a binary file. Binary files are
opened in 'b' mode, and you access the
individual bytes of the file. For example:
def is_gif(fname):
f = open(fname, 'br')
first4 = tuple(f.read(4))
return first4 == (0x47, 0x49,
➝ 0x46, 0x38)

This function tests if fname is a GIF image
file by checking to see if its first four bytes
are (0x47, 0x49, 0x46, 0x38) (all GIFs
must start with those four bytes).
In Python, numbers like 0x47 are base-16
hexadecimal numbers, or hex for short.
They are very convenient for dealing with
bytes, since each hexadecimal digit corresponds to a pattern of four bits, and so one
byte can be described using two hex digits
(such as 0x47).
Notice that the file is opened in 'br'
mode, which means binary reading
mode. When reading a binary file, you call
f.read(n), which reads the next n bytes.
As with text files, binary file objects use
a file pointer to keep track of which byte
should be read next in the file.

138

Chapter 8

From the Library of Mo Medwani

Pickling
Lingo Alert
The Python pickle module performs
what is generally known as object serialization, or just serialization. The idea
is to take a complex data structure and
convert it to a stream of bytes—that is,
create a serial representation of the data
structure.

Accessing the individual bytes of binary
files is a very low-level operation that,
while useful in systems programming, is
less useful in higher-level applications
programming.
Pickling is often a much more convenient
way to deal with binary files. Python’s
pickle module lets you easily read
and write almost any data structure. For
example:
# picklefile.py
import pickle
def make_pickled_file():
grades = {'alan' : [4, 8, 10, 10],
'tom' : [7, 7, 7, 8],
'dan' : [5, None, 7, 7],
'may' : [10, 8, 10, 10]}
outfile = open('grades.dat', 'wb')
pickle.dump(grades, outfile)
def get_pickled_data():
infile = open('grades.dat', 'rb')
grades = pickle.load(infile)
return grades

Essentially, pickling lets you store a data
structure on disk using pickle.dump and
then retrieve it later with pickle.load.
This is an extremely useful feature in many
application programs, so you should keep
it in mind whenever you need to store
binary data.

Input and Output 139

From the Library of Mo Medwani

In addition to data structures, pickling
can store functions.
You can’t use pickling to read or write
binary files that have a specific format, such
as GIF files. For such files, you must work byte
by byte.
Python has a module called shelve that
provides an even higher-level way to store and
retrieve data. The shelve module essentially
lets you treat a file as if it were a dictionary. For
more details, see the Python documentation
(http://docs.python.org/3/library/shelve.html).
Python also has a module named
sqlite3, which provides an interface to the
SQLite database. This lets you write SQL commands to store and retrieve data very much
like using a larger database product such as
Postgres or MySQL. For more details, see the
Python documentation (http://docs.python.
org/3/library/sqlite3.html).

140

Chapter 8

From the Library of Mo Medwani

Reading Webpages
Python has good support for accessing the
web. One common task is to have a program automatically read a webpage. This is
easily done using the urllib module:
>>> import urllib.request
>>> page = urllib.request.
➝ urlopen('http://www.python.org')
>>> html = resp.read()
>>> html[:25]
b'>> import webbrowser
>>> webbrowser.open
➝ ('http://www.yahoo.com')
True
>>>
Input and Output 141

From the Library of Mo Medwani

This page intentionally left blank

From the Library of Mo Medwani

9
Exception Handling
Exceptions are a solution to a difficult
problem: How can programs deal with
unexpected errors? For instance, what
happens if a file disappears in the middle
of being read because some other program on your computer has deleted it? Or
what if the website your program is downloading pages from suddenly crashes?

In This Chapter
Exceptions

144

Catching Exceptions

146

Clean-Up Actions

150

In these and many other situations, what
Python does is raise an exception. An
exception is a special kind of error object
that you can catch and then examine in
order to determine how to handle the error.
Exceptions can change the flow of control
of your program. Depending on when it
occurs, an exception can cause the flow of
control to jump out of the middle of a function or loop into another block of code that
does error handling.
Often, you cannot be sure exactly which
line might raise an exception, and this creates some tricky problems. Thus Python
provides special exception-handling constructs for both catching exceptions and
executing clean-up code whether or not
an exception is raised.

From the Library of Mo Medwani

Exceptions
An example of an exception is IOError,
which is raised when you try to open a file
that doesn’t exist:
>>> open('unicorn.dat')
Traceback (most recent call last):
File "", line 1, in
➝ 

Lingo Alert
In Python, when an exception occurs, we
say that it has been raised, or has been
thrown. If we do nothing with a raised
exception, the program usually halts
immediately with a traceback, or a stack
trace. However, especially in programs
meant to be used by other people, we
usually catch and handle exceptions, as
we will see shortly.

open('unicorn.dat')
File "C:\Python30\lib\io.py", line
in __new__

➝ 284,

return open(*args, **kwargs)
File "C:\Python30\lib\io.py", line
in open

➝ 223,

closefd)
IOError: [Errno 2] No such file or
'unicorn.dat'

➝ directory:

When an exception is raised and is not
caught or handled in any way, Python
immediately halts the program and outputs
a traceback, which is a list of the functions that were called before the exception occurred. This can be quite useful in
pinning down exactly what line causes an
error.
The last line of the traceback indicates that
an IOError exception has been raised,
and, specifically, it means that unicorn.dat
could not be found in the current working
directory. The error message given by an
IOError differs depending on the exact
reason for the exception.

144

Chapter 9

From the Library of Mo Medwani

Raising an exception
As we saw with the open function, Python’s
built-in functions and library functions
usually raise exceptions when something
unexpected happens.
For instance, dividing by zero throws an
exception:
>>> 1/0
Traceback (most recent call last):
File "", line 1, in
➝ 

1/0
ZeroDivisionError: int division or
by zero

➝ modulo

Syntax errors can also cause exceptions in
Python:
>>> x := 5
SyntaxError: invalid syntax
➝ (, line 1)
>>> print('hello world)
SyntaxError: EOL while scanning
literal (,
➝ line 1)
➝ string

You can also intentionally raise an exception anywhere in your code using the
raise statement. For example:
>>> raise IOError('This is a test!')
Traceback (most recent call last):
File "", line 1, in
➝ 

raise IOError('This is a test!')
IOError: This is a test!

Python has numerous built-in exceptions
organized into a hierarchy. See the Python
documentation (http://docs.python.org/3/
library/exceptions.html#bltin-exceptions)
for more details.

Exception Handling 145

From the Library of Mo Medwani

Catching Exceptions
You have essentially two options for dealing with a raised exception:
1. Ignore the exception and let your
program crash with a traceback. This
is usually what you want when you
are developing a program, since the
traceback provides helpful debugging
information.
2. Catch the exception and print a friendly
error message, or possibly even try to
fix the problem. This is almost always
what you want to do with a program
meant to be used by non-programmers.
Regular users don’t want to deal with
tracebacks!
Here’s an example of how to catch an
exception. Suppose you want to read an
integer from the user, prompting repeatedly until a valid integer is entered:

What Exceptions
Do Functions Raise?
How do we know to check for an exception named ValueError in get_age()?
The answer depends on the function’s
documentation. A well-documented
function will tell you what exceptions
it might raise. For instance, the documentation for the open function (http://
docs.python.org/3/library/functions.
html?#open) tells you that it might raise
an IOError. However, not all of Python’s
built-in functions are so forthcoming: The
documentation for the int function says
nothing about what exceptions it might
raise. In this case, you have to figure
out the possible exceptions by reading
samples of other Python code, or by
doing command-line experiments.

def get_age():
while True:
try:
n = int(input('How old are
➝ you? '))
return n
except ValueError:
print('Please enter an integer
➝ value.')

Inside this function’s while-loop is a
try/except block. You put whatever code
you like in the try part of the block, with
the understanding that one or more lines
of that code might raise an exception.

146

Chapter 9

From the Library of Mo Medwani

If any line of the try block does raise an
exception, then the flow of control immediately jumps to the except block, skipping
over any statements that have not been
executed yet. In this example, the return
statement will be skipped when an exception is raised.
If the try block raises no exceptions, then
the except ValueError block is ignored
and not executed.
So in this example, the int() function
raises a ValueError if the user enters a
string that is not a valid integer. When that
happens, the flow of control jumps to the
except ValueError block and prints the
error message. When a ValueError is
raised, the return statement is skipped—
the flow of control jumps immediately to
the except block.
If the user enters a valid integer, then no
exception is raised, and Python proceeds
to the following return statement, thus
ending the function.

Exception Handling 147

From the Library of Mo Medwani

Try/except blocks
Try/except blocks work a little bit like
if-statements. However, they are different in an important way: If-statements
decide what to do based on the evaluation of Boolean expressions, whereas try/
except blocks decide what to do based on
whether or not an exception is raised.
A function can raise more than one kind of
exception, and it can even raise the same
type of exception for different reasons.
Look at these three different int() exceptions (the tracebacks have been trimmed
for readability):
>>> int('two')
ValueError: invalid literal for
➝ int() with base 10: 'two'
>>> int(2, 10)
TypeError: int() can't convert nonwith explicit base

➝ string

>>> int('2', 1)
ValueError: int() arg 2 must be >= 2
<= 36

➝ and

So int() raises ValueError for at least
two different reasons, and it raises
TypeError in at least one other case.

148

Chapter 9

From the Library of Mo Medwani

Catching multiple exceptions
You can write try/except blocks to handle
multiple exceptions. For example, you can
group together multiple exceptions in the
except clause:
def convert_to_int1(s, base):
try:
return int(s, base)
except (ValueError, TypeError):
return 'error'

Or, if you care about the specific exception
that is thrown, you can add extra except
clauses:
def convert_to_int2(s, base):
try:
return int(s, base)
except ValueError:
return 'value error'
except TypeError:
return 'type error'

Catching any exception
If you write an except clause without any
exception name, it will catch any and all
exceptions:
def convert_to_int3(s, base):
try:
return int(s, base)
except:
return 'error'

This form of except clause will catch any
exception—it doesn’t care about what kind
of error has occurred, just that one has
occurred. In many situations, this is all you
need.

Exception Handling 149

From the Library of Mo Medwani

Clean-Up Actions
A finally code block can be added to
any try/except block to perform clean-up
actions. For example:
def invert(x):
try:
return 1 / x
except ZeroDivisionError:
return 'error'
finally:
print('invert(%s) done' % x)

The code block underneath finally will
always be executed after the try block or
the except block. This is quite useful when
you have code that you want to perform
regardless of whether an exception is
raised. For instance, file close statements
are often put in finally clauses so that
files are guaranteed to be closed, even if
an unexpected IOError occurs.

150

Chapter 9

From the Library of Mo Medwani

The with statement
Alternative Formatting
The print statements in these two snippets of code use string interpolation to
print a right-justified and zero-padded
number before each line of the printed
file. If you prefer string formatting, you
could replace the print statements with
this one:
print('{0:04} {1}'.format(num,
➝ line), end = '')

Python’s with statement is another way to
ensure that clean-up actions (such as closing a file) are done as soon as possible,
even if there is an exception. For example,
consider this code, which prints a file to the
screen with numbers for each line:
num = 1
f = open(fname)
for line in f:
print('%04d %s' % (num, line),
= '')

➝ end

num = num + 1
# following code

What’s unknown here is when the file
object f is closed. At some point after the
for-loop, f will usually be closed. But we
don’t know when precisely that will happen; it will remain unclosed but unused for
an indeterminate amount of time, which
might be a problem if other programs try to
access the file.
To ensure that the file is closed as soon
as it is no longer needed, use a with
statement:
num = 1
with open(fname, 'r') as f:
for line in f:
print('%04d %s' % (num, line),
= '')

➝ end

num = num + 1

The onscreen results are the same as the
previous code, but when you use a with
statement, the file objects’ clean-up action
(that is to say, closing the file) is automatically called as soon as the for-loop ends.
Thus f does not sit around unclosed.

Exception Handling 151

From the Library of Mo Medwani

This page intentionally left blank

From the Library of Mo Medwani

10
Object-Oriented
Programming
In this chapter, we will briefly look at
object-oriented programming, or OOP for
short. OOP is a methodology for organizing
programs that encourages careful design
and code reuse. Most modern programming languages support it, and it is has
proved to be a practical way to structure
and create large programs.
Essentially, an object is a collection of
data, and functions that operate on that
data. We’ve already been using objects in
Python; numbers, strings, lists, dictionaries,
and functions are all examples of objects.

In This Chapter
Writing a Class

154

Displaying Objects

156

Flexible Initialization

160

Setters and Getters

162

Inheritance

168

Polymorphism

171

Learning More

175

To create new kinds of objects, you must
first create a class. A class is essentially a
blueprint for creating an object of a particular kind. The class specifies what data
and functions the objects will contain, and
how they relate to other classes. An object
encapsulates both the data and functions
that operate on that data.
An important OOP feature is inheritance:
You can create new classes that inherit their
data and functions from an existing class.
When used properly, inheritance can save
you from rewriting code, and it can also
make your programs easier to understand.

From the Library of Mo Medwani

Writing a Class
Let’s jump right into OOP by creating a
simple class to represent a person:
# person.py
class Person:
""" Class to represent a person
"""

Lingo Alert
In some OOP languages, __init__ is
called a constructor, because it constructs the object. A constructor is called
every time a new object is created. In
languages such as Java and C++, an
explicit new keyword is used to indicate
when an object is being constructed.

def __init__(self):
self.name = ''
self.age = 0

This defines a class named Person. It
defines the data and functions a Person
object will contain. We’ve started simple
and given Person a name and an age. The
only function so far is __init__, which
is the standard function for initializing an
object’s values. As we will see, Python
automatically calls __init__ when you
create a Person object.
A function defined inside a class is called a
method. Just like __init__, methods must
have self as their first parameter (self will
be discussed in more detail shortly).
We can use Person objects like this:
>>> p = Person()
>>> p
<__main__.Person object at
➝ 0x00AC3370>
>>> p.age
0
>>> p.name
''
>>> p.age = 55
>>> p.age
55
>>> p.name = 'Moe'
>>> p.name
'Moe'
154

Chapter 10

From the Library of Mo Medwani

To create a Person object, we simply call
Person(). This causes Python to run the
__init__ function in the Person class and
to return a new object of type Person.
The age and name variables are inside an
object, and every newly created Person
object has its own personal copy of age
and name. To access age or name, you must
specify what object holds them using dot
notation.

A In this example, the variable p points to a

Person object (represented by the circle). As we
know from looking at the Person class, a Person
object contains an age and a name. These can be

used just like regular variables, with the stipulation
that they be accessed using dot notation—that
is, p.age and p.name. The special variable self
is automatically added by Python to all objects; it
points to the object itself and lets functions within
the class unambiguously refer to the data and
functions within the object.

The self parameter
You’ll notice that we don’t provide
any parameters for Person(), but the
__init__(self) function expects an
input named self. That’s because in OOP,
self is a variable that refers to the object
itself A. This is a simple idea, but one that
trips up many beginners.
All classes should have an
__init__(self) method whose job is to
initialize the object—for example, initializing
an object’s variables. The __init__ method
is only called once when the object is created.
As we will see, you can provide extra parameters to __init__ if needed.
We have followed standard Python
terminology and given the first parameter of
__init__ the name self. This name is not
required: You can use any variable name you
like instead of self. However, the use of self
is a universal convention in Python, and using
any other name would likely just cause confusion for any programmer trying to read your
code. Some other languages, such as Java and
C++, use—and require—the name this.
Objects can be used like any other data
type in Python: You can pass them to functions, store them in lists and dictionaries,
pickle them in files, and so on.

Object-Oriented Programming

155

From the Library of Mo Medwani

Displaying Objects
As mentioned, a method is a function
defined within an object. Let’s add a
method to the Person class that prints the
contents of a Person object:
# person.py
class Person:
""" Class to represent a person
"""
def __init__(self):
self.name = ''
self.age = 0
def display(self):
print("Person('%s', age)" %
➝ (self.name, self.age))

The display method prints the contents of
a Person object to the screen in a format
useful to a programmer:
>>> p = Person()
>>> p.display()
Person('', 0)
>>> p.name = 'Bob'
>>> p.age = 25
>>> p.display()
Person('Bob', 25)

156

Chapter 10

From the Library of Mo Medwani

The display method works fine, but we
can do better: Python provides some special methods that let you customize objects
for seamless printing. For instance, the
special __str__ method is used to generate a string representation of an object:
# person.py
class Person:
# __init__ removed for space
def display(self):
print("Person('%s', age)" %
self.age))

➝ (self.name,

def __str__(self):
return "Person('%s', age)" %
self.age)

➝ (self.name,

Now we can write code like this:
>>> p = Person()
>>> str(p)
"Person('', 0)"

We can use str to simplify the display
method:
# person.py
class Person:
# __init__ removed for space
def display(self):
print(str(self))
def __str__(self):
return "Person('%s', age)" %
self.age)

➝ (self.name,

Object-Oriented Programming

157

From the Library of Mo Medwani

You can also define a special method
named __repr__ that returns the “official”
representation of an object. For example,
the default representation of a Person is
not very helpful:
>>> p = Person()
>>> p
<__main__.Person object at
➝ 0x012C3170>

By adding a __repr__ method, we can
control the string that is printed here. In
most objects, it is the same as the __str__
method:
# person.py
class Person:
# __init__ removed for space
def display(self):
print(str(self))
def __str__(self):
return "Person('%s', age)" %
➝ (self.name, self.age)
def __repr__(self):
return str(self)

Now Person objects are easier to work
with:
>>> p = Person()
>>> p
Person('', 0)
>>> str(p)
"Person('', 0)"

158

Chapter 10

From the Library of Mo Medwani

When creating your own classes and
objects, it is almost always worthwhile to write
__str__ and __repr__ functions. They are
extremely useful for displaying the contents of
your objects, which is helpful when debugging
your programs.
If you define a __repr__ method but
not a __str__ method, then when you call
str() on the object, it will run __repr__.
Once you’ve added the __repr__
method, the display method for Person can
be further simplified:

def display(self):
print(self)
In practice, it’s often not necessary to write a
display method.
The Python documentation recommends
that the string representation of an object be
the same as the code you would write to create that object. This is a very useful convention: It lets you easily re-create objects by
cutting and pasting the string representation
into the command line.

Object-Oriented Programming

159

From the Library of Mo Medwani

Flexible Initialization
If you want to create a Person object with a
particular name and age, you must currently
do this:
>>> p = Person()
>>> p.name = 'Moe'
>>> p.age = 55
>>> p
Person('Moe', 55)

A more convenient approach is to pass
the name and age to __init__ when
the object is constructed. So let’s rewrite
__init__ to allow for this:
# person.py
class Person:
def __init__(self, name = '',
age = 0):
self.name = name
self.age = age

Now initializing a Person is much simpler:
>>> p = Person('Moe', 55)
>>> p
Person('Moe', 55)

160

Chapter 10

From the Library of Mo Medwani

Since the parameters to __init__ have
default values, you can even create an
“empty” Person:
>>> p = Person()
>>> p
Person('', 0)

Notice that inside the __init__ method
we use self.name and name (and also
self.age and age). The variable name
refers to the value passed into __init__,
and self.name refers to the value stored
in the object. The use of self helps make
clear which is which.
Although it is easy to create default values for __init__ parameters and thus allow
the creation of empty Person objects, it is not
so clear if this is a good idea from a design
point of view. An empty Person does not
have a real name or age, so you will need to
check for that in code that processes Person
objects. Constantly checking for special cases
can soon become a real burden that’s easy to
forget about. Thus, many programmers prefer
not to give the __init__ parameters default
values in cases like this.

Object-Oriented Programming 161

From the Library of Mo Medwani

Setters and Getters
As it stands now, we can both read and
write the name and age values of a Person
object using dot notation:
>>> p = Person('Moe', 55)
>>> p.age
55
>>> p.name
'Moe'
>>> p.name = 'Joe'
>>> p.name
'Joe'
>>> p
Person('Joe', 55)

A problem with this is that we could, accidentally, set the age to be a nonsensical
value, such as –45 or 509. With regular
Python variables, there is no way to restrict
what values they can be assigned. But
within an object, we can write special setter and getter methods that give us control
over how values are accessed.
First, let’s add a setter method that
changes age only if a sensible value
is given:
def set_age(self, age):
if 0 < age <= 150:
self.age = age

162

Chapter 10

From the Library of Mo Medwani

Now we can write code like this:

Decorators
Decorators are a general-purpose
construct in Python used to systematically modify existing functions. They are
usually placed at the beginning of a
function, and start with the @ character.
We will use them in this book for this one
example of creating setters and getters.

>>> p = Person('Jen', 25)
>>> p
Person('Jen', 25)
>>> p.set_age(30)
>>> p
Person('Jen', 30)
>>> p.set_age(-6)
>>> p
Person('Jen', 30)

A common complaint about this kind of setter is that typing p.set_age(30) is more
cumbersome than p.age = 30. Property
decorators solve this problem.

Property decorators
Property decorators combine the brevity
of variables with the flexibility of functions. Decorators indicate that a function
or method is special in some way, and
here we use them to indicate setters and
getters.
A getter returns the value of a variable,
and we indicate this using the @property
decorator:
@property
def age(self):
""" Returns this person's age.
"""
return self._age

This age method takes no parameters
(other than the required self). We’ve
put @property before it, which indicates
that it’s a getter function. The name of
the method, age, will be used to set the
variable.

Object-Oriented Programming

163

From the Library of Mo Medwani

We have also renamed the underlying
self.age variable to self._age. Putting
an underscore in front of an object variable
is a common convention, and we use it
here to distinguish it from the age method.
You need to replace every occurrence
of self.age in Person with self._age.
For consistency, it is also a good idea
to everywhere replace self.name with
self._name. The modified Person class
should look like this:
# person.py
class Person:
def __init__(self, name = '',
age = 0):
self._name = name
self._age = age

@property
def age(self):
return self._age

def set_age(self, age):
if 0 < age <= 150:
self._age = age

def display(self):
print(self)

def __str__(self):
return "Person('%s', %s)" %
➝ (self._name, self._age)

def __repr__(self):
return str(self)

164

Chapter 10

From the Library of Mo Medwani

To create an age setter, we rename the
set_age method to age and decorate it
with @age.setter:
@age.setter
def age(self, age):
if 0 < age <= 150:
self._age = age

With these changes, we can now write
code like this:
>>> p = Person('Lia', 33)
>>> p
Person('Lia', 33)
>>> p.age = 55
>>> p.age
55
>>> p.age = -4
>>> p.age
55

The setter and getters for age work just as
if we were using the variable age directly.
The difference is that now when you call,
say, p.age = -4, Python is really calling
the age(self, age) method. Similarly,
when you write p.age, the age(self)
method is called. Thus we get the advantage of the simple assignment syntax
combined with the flexibility of controlling
how variables are set and get.

Object-Oriented Programming

165

From the Library of Mo Medwani

Private variables
It’s still possible to access self._age
directly:
>>> p._age = -44

Lingo Alert
Variables that don’t begin with an underscore are called public variables, and any
code can access them.

>>> p
Person('Lia', -44)

The problem is that _age might be modified in some way that makes the object
inconsistent, and so we don’t usually want
to allow it.
One way to decrease the chance of this
kind of problem is to rename self._age
to self.__age—that is to say, to put two
underscores in front of the variable name.
The two underscores declare that age is
a private variable that is not meant to be
accessed by any code outside of Person.
To access self.__age directly, you now
have to put _Person on the front, like this:
>>> p._Person__age = -44
>>> p
Person('Lia', -44)

While this does not prevent you from
modifying internal variables, it does make it
almost impossible to do so accidentally.

166

Chapter 10

From the Library of Mo Medwani

When writing large programs, a useful rule of thumb is to always make object
variables private (that is, starting with two
underscores) by default, and then change
them to be public if you have a good reason to
do so. That way, you will prevent errors caused
by unintended meddling with the internals of
an object.
The syntax for creating setters and getters is strange at first, but once you get used
to it, it is fairly clear. Keep in mind that you
don’t always need to create special setters
and getters; for simple objects, like the original
Person, regular variables may be fine.
Some programmers prefer to avoid
setters whenever possible, thus making the
object immutable ( just like numbers, strings,
and tuples). In an object with no setters, after
you create the object, there is no “official” way
to change anything within it. As with other
immutable objects, this can prevent many
subtle errors and allow different variables to
share the same object (thus saving memory).
The downside, of course, is that if you do need
to modify the object, your only option is to create a new object that incorporates the change.
If the programmer tries to set the age to
be something out of range, then age(self,
age) doesn’t make any change. An alternative
approach is to purposely raise an exception,
thus requiring any code that calls it to handle
the exception. The advantage of raising an
exception is that it might help you find more
errors. Trying to set the age to be a nonsensical value is likely a sign of a problem elsewhere in your program.

Object-Oriented Programming

167

From the Library of Mo Medwani

Inheritance
Inheritance is a mechanism for reusing
classes. Essentially, inheritance allows you
to create a brand new class by adding
extra variables and methods to a copy of
an existing class.
Suppose we are creating a game that has
human players and computer players. Let’s
create a Player class that contains things
common to all players, such as the score
and a name:
# players.py
class Player:
def __init__(self, name):
self._name = name
self._score = 0
def reset_score(self):
self._score = 0
def incr_score(self):
self._score = self._score + 1
def get_name(self):
return self._name
def __str__(self):
return "name = '%s', score = %s"
➝ % (self._name, self._score)
def __repr__(self):
return 'Player(%s)' % str(self)

We can use Player objects this way:
>>> p = Player('Moe')
>>> p
Player(name = 'Moe', score = 0)
>>> p.incr_score()
>>> p
Player(name = 'Moe', score = 1)
>>> p.reset_score()
>>> p
Player(name = 'Moe', score = 0)

168

Chapter 10

From the Library of Mo Medwani

Lingo Alert
Many different terms are used to
describe inheritance. Given that class
Human inherits from class Player, we can
say the following:
.

Human extends Player.

.

Human is derived from Player.

.

Human is a subclass of Player, and
Player is a superclass of Human.

.

Human isa Player.

The last term, isa, implies that all humans
are players. Thinking about possible isa
relationships between classes is one way
to create class hierarchies.

Let’s assume that there are two kinds of
players: humans and computers. The main
difference is that humans enter their moves
from the keyboard, whereas computers generate their moves from functions.
Otherwise they are the same, each having
a name and a score.
So let’s write a Human class that represents a Human player. One way to do
that would be to cut and paste a new
copy of the Player class, and then add
a make_move(self) method that asks
the player to make a move. While that
approach certainly would work, a better
way is to use inheritance. We can define
the Human class to inherit all the variables
and methods from the Player class so
that we don’t have to rewrite them:
class Human(Player):
pass

In Python, the pass statement means
“Do nothing.” This is a complete—and
useful!—definition for the Human class.
It simply inherits the code from Player,
which lets us do the following:
>>> h = Human('Jerry')
>>> h
Player(name = 'Jerry', score = 0)
>>> h.incr_score()
>>> h
Player(name = 'Jerry', score = 1)
>>> h.reset_score()
>>> h
Player(name = 'Jerry', score = 0)

This is pretty impressive given that we
wrote only two lines of code for the
Human class!

Object-Oriented Programming

169

From the Library of Mo Medwani

Overriding methods
One small wart is that the string representation of h says Player when it would be
more accurate for it to say Human. We can
fix that by giving Human its own __repr__
method:
class Human(Player):
def __repr__(self):
return 'Human(%s)' % str(self)

Now we get this:
>>> h = Human('Jerry')
>>> h

B A class diagram showing how the Player,

Human, and Computer classes relate. The arrows

indicate inheritance, and the entire diagram is a
hierarchy of classes. The more abstract (that is,
general) classes appear near the top, and the
more concrete (that is, specific) ones nearer the
bottom.

Human(name = 'Jerry', score = 0)

This is an example of method overriding:
The __repr__ method in Human overrides the __repr__ method inherited from
Player. This is a common way to customize inherited classes.

Lingo Alert
It’s also common to use the term parent
class to refer to the base class, and
child class to refer to the derived class.

Now it’s easy to write a similar Computer
class to represent computer moves:
class Computer(Player):
def __repr__(self):
return Computer(%s)' % str(self)

These three classes form a small class
hierarchy, as shown in the class diagram
of B. The Player class is called the
base class, and the other two classes are
derived, or extended, classes.
Essentially, an extended class inherits the
variables and methods from the base class.
Any code you want to be shared by all the
derived classes should be placed inside
the base class.

170

Chapter 10

From the Library of Mo Medwani

Polymorphism
To demonstrate the power of OOP, let’s
implement a simple game called Undercut.
In Undercut, two players simultaneously
pick an integer from 1 to 10 (inclusive). If a
player picks a number one less than the
other player—if he undercuts the other
player by 1—then he wins. Otherwise, the
game is a draw. For example, if Thomas
and Bonnie are playing Undercut, and they
pick the numbers 9 and 10, respectively,
then Thomas wins. If, instead, they choose
4 and 7, the game is a draw.
Here’s a function for playing one game
of Undercut:
def play_undercut(p1, p2):
p1.reset_score()
p2.reset_score()
m1 = p1.get_move()
m2 = p2.get_move()
print("%s move: %s" % (p1.get_
m1))

➝ name(),

print("%s move: %s" % (p2.get_
m2))

➝ name(),

if m1 == m2 - 1:
p1.incr_score()
return p1, p2, '%s wins!' %
➝ p1.get_name()

elif m2 == m1 - 1:
p2.incr_score()
return p1, p2, '%s wins!' %
➝ p2.get_name()

else:
return p1, p2, 'draw: no winner'

If you read this function carefully, you
will note that p1.get_move() and
p2.get_move() are called. We haven’t yet
implemented these functions because they
are game-dependent. So let’s do that now.

Object-Oriented Programming

171

From the Library of Mo Medwani

Implementing the move functions
Even though moves in Undercut are just
numbers from 1 to 10, humans and computers determine their moves in very different
ways. Human players enter a number from
1 to 10 at the keyboard, whereas computer
players use a function to generate their
moves. Thus the Human and Computer
classes need their own special-purpose
get_move(self) methods.
Here is a get_move method for the human
(the error messages have been shortened
to save space; fuller and more user-friendly
messages are given in the accompanying
source code on the website):
class Human(Player):
def __repr__(self):
return 'Human(%s)' % str(self)

def get_move(self):
while True:
try:
n = int(input('%s move (1 ➝ 10): ' % self.get_name()))
if 1 <= n <= 10:
return n
else:
print('Oops!')
except:
print(Oops!')

This code asks the user to enter an integer
from 1 to 10 and doesn’t quit until the user
does so. The try/except structure is used
to catch the exception that the int function
will throw if the user enters a non-integer
(like “two”).

172

Chapter 10

From the Library of Mo Medwani

For the computer’s move, we will simply
have it always return a random number
from 1 to 10 (we can improve the computer
strategy later if we want):
import random
class Computer(Player):
def __repr__(self):
return 'Computer(%s)' % str(self)
def get_move(self):
return random.randint(1, 10)

Playing Undercut
With all the pieces in place, we can now
start playing Undercut. Let’s try a game
between a human and a computer:
>>> c = Computer('Hal Bot')
>>> h = Human('Lia')
>>> play_undercut(c, h)
Lia move (1 - 10): 7
Hal Bot move: 10
Lia move: 7
(Computer(name = 'Hal Bot',
➝ score = 0), Human(name = 'Lia',
➝ score = 0), 'draw: no winner')

It’s important to realize that the player
objects must be created outside of the
play_undercut function. That’s good
design: The play_undercut function worries only about playing the game, and not
about how to initialize the player objects.
The play_undercut function returns a
3-tuple of the form (p1, p2, message). The
p1 and p2 values are the player objects
that were initially passed in; if one player
happens to win the game, then her score
will have been incremented. The message
is a string indicating who won the game or
if it was a draw.

Object-Oriented Programming 173

From the Library of Mo Medwani

It’s possible to pass two computer players
to play_undercut:
>>> c1 = Computer('Hal Bot')
>>> c2 = Computer('MCP Bot')
>>> play_undercut(c1, c2)
Hal Bot move: 8
MCP Bot move: 7
(Computer(name = 'Hal Bot',
➝ score = 0), Computer(name = 'MCP
➝ Bot', score = 1), 'MCP Bot wins!')

There’s no human player in this game, so
the user is not asked to enter a number.
We can also pass in two human players:
>>> h1 = Human('Bea')
>>> h2 = Human('Dee')
>>> play_undercut(h1, h2)
Bea move (1 - 10): 5
Dee move (1 - 10): 4

Dumb Interface
While play_undercut works if you pass
it two Human objects, it is not a very
sensible interface: The second player will
get to see the first player’s move! For this
to actually be fun for two humans, you
would need to think of some way to keep
the first player’s move hidden from the
second player.
These two examples, plus the earlier one
of a human playing against a computer,
all show the power of polymorphism:
We’ve used the same play_undercut
function to get very different behaviors.
Instead of writing three different functions, we wrote only one and changed
the objects we gave it.
In practice, this often turns out to be a
big win. Although it takes experience
and careful attention to design details to
make polymorphism work out, it is often
worth the extra time and effort.

Bea move: 5
Dee move: 4
(Human(name = 'Bea', score = 0),
➝ Human(name = 'Dee', score = 1),
➝ 'Dee wins!')

174

Chapter 10

From the Library of Mo Medwani

Learning More
This chapter introduced a few of the essentials of OOP. Python has many more OOP
features you can learn about by reading
the online documentation.
Creating good object-oriented designs is
a major topic. Using objects well is much
harder than merely using them. One
popular way of organizing object-oriented
programs is to use object-oriented design
patterns, which are proven recipes for
using objects to solve common programming problems.
The most influential book on this topic
is Design Patterns: Elements of Reusable Object-Oriented Software, by Erich
Gamma, Richard Helm, Ralph Johnson, and
John Vlissides. Once you’ve learned all the
technical details of OOP, reading this book
would be an excellent next step to learning
about larger design issues.

Object-Oriented Programming

175

From the Library of Mo Medwani

This page intentionally left blank

From the Library of Mo Medwani

11
Case Study:
Text Statistics
So far, most of the code we’ve seen
consists of a few statements that demonstrate a feature of Python. New programmers quickly discover that it is a big step
to go from these small snippets to entire
programs. Bigger programs require more
careful planning, and require some understanding of how best to combine individual
Python features. When you first start writing larger programs, there can be a lot of
trial and error.
In this chapter we will walk through the
development of a larger Python program.
We’ll start with a description of a problem
we want to solve, and then create and test
a Python program that solves it.

In This Chapter
Problem Description

178

Keeping the Letters We Want

180

Testing the Code on a Large Data File

182

Finding the Most Frequent Words

184

Converting a String to a Frequency
Dictionary

187

Putting It All Together

188

Exercises

190

The Final Program

192

It is difficult to show how messy writing a
program can be. It will appear that we go
straight from a clear problem description
to a clean and simple solution. In reality,
the process of writing a program is never
so simple. There is a lot of trial and error,
there are false starts, and you often have
to backtrack to re-do things. By writing programs, you start to learn how best to combine techniques and what sorts of solutions
tend to work with what sorts of problems.

From the Library of Mo Medwani

Problem Description
When asked to write a program that solves
some non-trivial problem, beginning
programmers often don’t know where to
start. At a high level at least, the answer is
simple: You start writing a big program by
first understanding the problem you want
to solve. This sounds simple, but misunderstanding what problem you are trying
to solve is an extremely common programming error. Sometimes, writing a program is
hard because you don’t really understand
what it is you want to do.
The problem we want to solve here is to
calculate, and print, statistics about the
contents of a text file. We want to know
how many characters, lines, and words a
given text file contains. In addition to the
number of words, we also want to know
the top ten most frequently occurring
words in the file, sorted by frequency.
Let’s look at an example using a short
piece of text:

A useful thing to do in Python is to play
with examples in the interpreter. For
example:
>>> s = 'A long time ago, in a
➝ galaxy far, far away ...'
>>> len(s)
46
>>> s.split()
['A', 'long', 'time', 'ago,', 'in',
'galaxy', 'far,', 'far',
➝ 'away', '...']
➝ 'a',

As you can see, the len function tells us
there are 46 characters in the string. The
split function divides a string into words;
ignoring the '...' at the end, we can see
there are ten total words in s.
Look carefully at the list of words that
split returns. The word far occurs twice,
but split treats them as the two different
strings: “far,” (with a comma at the end)
and “far” (without a comma). Similarly, A
and a are the same word, differing only in
capitalization.

A long time ago, in a galaxy far, far
away …
We can see that it contains:
■

One line of text. We assume that the
return-line character, \n, is used to indicate the end of a line, and that every
text file (that is not empty!) is at least
one line long.

■

Forty-six characters, including spaces
and punctuation.

■

Ten words in total. However, there are
only eight unique words, because far
and A both occur twice.

178

Chapter 11

From the Library of Mo Medwani

To handle these sorts of details, we will
give a precise definition of what it means
for a string to be a word. For us, a word will
be a string that is one or more characters
in length, and each character is one of the
lowercase letters a to z. We will ignore
non-letters (e.g., digits and punctuation),
and convert uppercase letters to lowercase. So our sentence becomes this:

We can count the number of unique words
by converting the list to a set (recall that a
set never stores duplicates):

Original: A long time ago, in a galaxy far,
far away …

There are some downsides to getting rid of
non-lowercase letters. First, the number of
characters will be wrong since some characters have been removed. But we can
deal with this by counting the characters
before modifying them. Second, there’s no
good way to remove punctuation symbols from some words. For instance, how
should you handle the apostrophe in I’d? If
you delete it (and convert the I to lowercase), you get id, which is a different word.
If you replace the apostrophe with a space,
then you get I and d—one word, and one
non-word. To solve this problem we will
treat apostrophes—and also hyphens—as
“letters.” Third, changing punctuation
can change the meaning of words. For
instance, uncapitalized versions of some
names are words, such as Polish and
polish, or Bonnie and bonnie. We will just
ignore this particular problem, as it does
not seem to be a very big one.

Modified: a long time ago in a galaxy far
far away
Splitting the modified sentence into words
now gives more accurate results:
>>> t = 'a long time ago in a galaxy
➝ far far away'
>>> t.split()
['a', 'long', 'time', 'ago', 'in',
'galaxy', 'far', 'far',
➝ 'away']
➝ 'a',

>>> len(t.split())
10

>>> set(t.split())
{'a', 'ago', 'far', 'away', 'time',
➝ 'long', 'in', 'galaxy'}
>>> len(set(t.split()))
8

Case Study: Text Statistics

179

From the Library of Mo Medwani

Keeping the
Letters We Want

Another Normalize Function

Next, let’s think about how to automatically
convert a string to the format we want.
Converting a string to lowercase is easy:

def normalize2(s):

A more compact way to write this function is this:

"""Convert s to normalized

>>> s = "I'd like a copy!"

➝ string.

>>> s.lower()

"""

"i'd like a copy!"

Getting rid of characters we don’t want is
a bit trickier. One way to do it is to use the
string replace function to replace individual characters with nothing; for example:

return ''.join(c for c in
if c in keep)

➝ s.lower()

Many experienced programmers prefer
this function because it is short and, at
least for them, readable.

>>> s = "I'd like a copy!"
>>> s.replace('!', '')
"I'd like a copy"

The problem with this way of doing things
is that replace needs to be called many
times; that is, once for each character we
don’t want. There are many more characters that we don’t want to keep than we do
want to keep, so this turns out to be quite
inefficient.

180

Chapter 11

From the Library of Mo Medwani

Regular Expressions
Another approach to solving this problem is to use regular expressions. For
instance, you could create a regular
expression defining a word, and then use
the findall function to extract all the
words from a given string. Since we want
to illustrate basic Python programming,
we won't use any regular expressions in
the code that follows.

A better approach is to keep the letters we
want. For example:
# Set of all characters to keep
keep = {'a', 'b', 'c', 'd', 'e',
'f', 'g', 'h', 'i', 'j',
'k', 'l', 'm', 'n', 'o',
'p', 'q', 'r', 's', 't',
'u', 'v', 'w', 'x', 'y',
'z',
' ', '-', "'"}

def normalize(s):
"""Convert s to a normalized
➝ string.

"""
result = ''
for c in s.lower():
if c in keep:
result += c
return result

This function loops through the string s
one character at a time, appending it to
the end of result only if it’s in the set of
characters we want to keep.

Case Study: Text Statistics

181

From the Library of Mo Medwani

Testing the Code on
a Large Data File
We’ve written only a small amount of
code, but it is enough to do some useful
experiments. In the examples that follow, we’ll use a file called bill.txt. It is
a 5.4 megabyte text file containing the
complete works of Shakespeare (which
are free on the Project Gutenberg site,
www.gutenberg.org). This is a relatively
large file, and so is a good test of the efficiency of our code.
One way to process a text file is to read
the entire thing into memory as a string.
Let’s try this by hand in the interpreter:
>>> bill = open('bill.txt',
➝ 'r').read()
>>> len(bill)
5465395
>>> bill.count('\n')
124796
>>> len(bill.split())
904087
>>> len(normalize(bill).split())
897610

We can see that the file has about 5.4
million characters, 125 thousand lines, and
about 900 thousand words.

182

Chapter 11

From the Library of Mo Medwani

Now let’s automate this by putting all the
code in a function:
def file_stats(fname):
"""Print statistics for the given
file.
"""
s = open(fname, 'r').read()
num_chars = len(s)
num_lines = s.count('\n')
num_words = len(normalize(s).
➝ split())

print("The file '%s' has: " %
➝ fname)

print("

%s characters" %

➝ num_chars)

print("

%s lines" % num_lines)

print("

%s words" % num_words)

Calling file_stats prints this:
>>> file_stats('bill.txt')
The file 'bill.txt' has:
5465395 characters
124796 lines
897610 words

On my computer, it takes about 1 second
to run this program. That includes the time
it takes to load the file into memory and to
do all the processing. Not bad for a simple
Python program!

Case Study: Text Statistics

183

From the Library of Mo Medwani

Finding the Most
Frequent Words
Let’s consider the problem of finding the
most frequently occurring words in a text
file. The basic idea will be to use a dictionary whose keys are words and whose values are the counts of the words in the file.
For example, consider our original example
text (in normalized form):
a long time ago in a galaxy far far
away
We can make a count of all the words
like this:
a: 2
long: 1
time: 1
ago: 1
in: 1
galaxy: 1
far: 2
away: 1
If we convert this to a Python dictionary, it
looks like this:
d = {
'a': 2,
'long': 1,
'time': 1,
'ago': 1,
'in': 1,
'galaxy': 1,
'far': 2,
'away': 1
}

184

Chapter 11

From the Library of Mo Medwani

We can extract a lot of useful information
from this dictionary:
■

d.keys() is the list of all the unique

words in the file.
■

len(d.keys()) is the number of unique

words in the file.
■

sum(d[k] for k in d) is the sum
of all the values in d; that is, the total
number of words (including duplicates)
in the file. The sum function is a built-in
Python function that returns the sum of
a sequence.

Dictionaries do not store their data in
sorted order, and so to get a list of all the
words sorted from most frequent to least
frequent, we’ll need to convert it to a list of
tuples, like this:
lst = []
for k in d:
pair = (d[k], k)
lst.append(pair)
#
# [(2, 'a'), (1, 'ago'),
#

(1, 'galaxy'), (1, 'time'),

#

(2, 'far'), ...]

lst.sort()
#
# [(1, 'ago'), (1, 'away'),
#

(1, 'galaxy'), (1, 'in'),

#

(1, 'long'), ...]

lst.reverse()
#
# [(2, 'far'), (2, 'a'),
#

(1, 'time'), (1, 'long'),

#

(1, 'in'), ...]

Case Study: Text Statistics

185

From the Library of Mo Medwani

The for-loop converts the dictionary d to a
list of (count, word) tuples. We do this conversion so that we can then use the list sort
function to order the data by frequency. By
default, the sort function orders data from
smallest to biggest, and so we reverse the
list to put the most frequently occurring
words—the ones we are usually most interested in—at the start of the list.
With lst ordered from most frequent word
to least frequent word, we can use slicing
to access, say, the top three most frequent
words on the list:
print(lst[:3])
#
# [(2, 'far'), (2, 'a'),
#

(1, 'time')]

Or, if we want neater formatting, we can
do this:
for count, word in lst:
print('%4s %s' % (count, word))

Which prints:
2 far
2 a
1 time
1 long
1 in
1 galaxy
1 away
1 ago

Notice that the word counts are preceded
by three blanks each. That’s because the
format command %4s in the print statement
puts the numbers right-justified in a field of
length 4. As long as you have no word with
10,000 or more occurrences, this will keep
the margins of the counts perfectly aligned.

186

Chapter 11

From the Library of Mo Medwani

Converting a String
to a Frequency
Dictionary
Now let’s write a function that takes any
string, s, and generates a dictionary whose
keys are the words of s, and whose values
are the frequency counts for the words:
def make_freq_dict(s):
"""Returns a dictionary whose keys
are the words of s, and whose
values are the counts of those
words.
"""
s = normalize(s)
words = s.split()
d = {}
for w in words:
if w in d:

# seen w before?

d[w] += 1
else:
d[w] = 1
return d

The idea of this function is to scan through
each word of the string s, adding it to the
dictionary d as we go. The if-statement,
if w in d, is true if w is a key in d, and
false otherwise. If w is a key in d, then that
means we’ve seen w before, and so increment its frequency count by 1. But if w is not
a key in d, then we add it as a new key with
the statement d[w] = 1.

Case Study: Text Statistics

187

From the Library of Mo Medwani

Putting It All Together
We now have all the pieces to make a
function that automatically calculates and
displays statistics for any given text file:
def print_file_stats(fname):
"""Print statistics for the given file.
"""
s = open(fname, 'r').read()
num_chars = len(s)

# count characters before normalizing s

num_lines = s.count('\n')

# count lines before normalizing s

d = make_freq_dict(s)
num_words = sum(d[w] for w in d)

# count number of words in s

# create list of (count, pair) words ordered from
# most frequent to least frequent
lst = [(d[w], w) for w in d]
lst.sort()
lst.reverse()
# print the results to the screen
print("The file '%s' has: " % fname)
print("

%s characters" % num_chars)

print("

%s lines" % num_lines)

print("

%s words" % num_words)

print("\nThe top 10 most frequent words are:")
i = 1

# i is the number of the list item

for count, word in lst[:10]:
print('%2s. %4s %s' % (i, count, word))
i += 1

188

Chapter 11

From the Library of Mo Medwani

Running this on the bill.txt file prints this:
The file 'bill.txt' has:
5465395 characters
124796 lines
897610 words
The top 10 most frequent words are:
1. 27568 the
2. 26705 and
3. 20115 i
4. 19211 to
5. 18263 of
6. 14391 a
7. 13606 you
8. 12460 my
9. 11107 that
10. 11001 in

This list of words is not unexpected,
although perhaps unexciting. In English
text, the most frequent words are almost
always small function words, like the and
and. You need to go farther down the list
to find more interesting words.
This program takes less than 1.5 seconds
to run on my computer (a typical desktop
machine), which is pretty good for such a
large file.

Case Study: Text Statistics

189

From the Library of Mo Medwani

Exercises
1. Modify print_file_stats so that it
also prints the total number of unique
words in the file.
2. Modify print_file_stats so that it
prints the average length of the words
in the file.
3. A hapax legomenon is a word that
occurs exactly once in a file. Modify
print_file_stats so that it prints the
total number of hapax legomena.
4. As mentioned, the ten most frequent
words in bill.txt are function words, like
the and and. Often, we are not interested in those words, and so we can
create a set of stop words that contain
all the words we want to ignore.
Add a new variable called stop_words
in the programming containing
print_file_stats, like this:
stop_words = {'the', 'and', 'i',
➝ 'to', 'of', 'a', 'you', 'my',
➝ 'that', 'in'}

Of course, you can change the list of
stop words to be anything you like.
Now modify the code in your program
so that the words on stop_list are not
included in any of the statistics.

190

Chapter 11

From the Library of Mo Medwani

5. (Challenging) The print_file_stats
function takes a file name as input, and
then reads the entire file into a single
string. The problem with this approach
is that storing the entire file as a string
uses a lot of memory if the file is big.
An alternative approach that usually
uses much less memory is to read the
file a line at a time.
Write a new function called
print_file_stats_lines that does
the same thing as print_file_stats,
except it reads the input file line by line.
The output of the two functions should
be the same when they are run on the
same file.

Case Study: Text Statistics

191

From the Library of Mo Medwani

The Final Program
# wordstats.py
# Set of all allowable characters.
keep = {'a', 'b', 'c', 'd', 'e',
'f', 'g', 'h', 'i', 'j',
'k', 'l', 'm', 'n', 'o',
'p', 'q', 'r', 's', 't',
'u', 'v', 'w', 'x', 'y',
'z',
' ', '-', "'"}

def normalize(s):
"""Convert s to a normalized string.
"""
result = ''
for c in s.lower():
if c in keep:
result += c
return result

def make_freq_dict(s):
"""Returns a dictionary whose keys are the words of s, and whose values
are the counts of those words.
"""
s = normalize(s)
words = s.split()
d = {}
for w in words:
if w in d:

# add 1 to its count if w has been seen before

d[w] += 1
else:
d[w] = 1

# initialize to 1 if this is the first time w has been seen

return d

192

Chapter 11

From the Library of Mo Medwani

def print_file_stats(fname):
"""Print statistics for the given file.
"""
s = open(fname, 'r').read()

num_chars = len(s)

# count characters before normalizing s

num_lines = s.count('\n')

# count lines before normalizing s

d = make_freq_dict(s)
num_words = sum(d[w] for w in d)

# count number of words in s

# create list of (count, pair) words ordered from
# most frequent to least frequent
lst = [(d[w], w) for w in d]
lst.sort()
lst.reverse()

# print the results to the screen
print("The file '%s' has: " % fname)
print("

%s characters" % num_chars)

print("

%s lines" % num_lines)

print("

%s words" % num_words)

print("\nThe top 10 most frequent words are:")
i = 1

# i is the number of the list item

for count, word in lst[:10]:
print('%2s. %4s %s' % (i, count, word))
i += 1

def main():
print_file_stats('bill.txt')

if __name__ == '__main__':
main()

Case Study: Text Statistics

193

From the Library of Mo Medwani

This page intentionally left blank

From the Library of Mo Medwani

A
Popular Python
Packages
Part of the reason for Python’s popularity is
the availability of many high-quality libraries that help with various software tasks.
In this appendix are descriptions of a few
popular packages.

In This Appendix
Some Popular Packages

196

It is useful to keep in mind that many of
these packages may work only with specific
versions of Python (which you can always
download for free from www.python.org).
In particular, many packages do not yet
support Python 3, so you may need to use
Python 2.6 (or later) to run some of these.
Fortunately, if you already know Python 3,
it is not too hard to step back a version to
use Python 2. Appendix B briefly discusses
some of the major differences between
Python 2 and Python 3.

From the Library of Mo Medwani

Some Popular
Packages
PIL: The Python Imaging Library
PIL (http://www.pythonware.com/products/
pil/index.htm) is an image-processing
library. It works with many different kinds of
image formats, and can do things like crop,
resize, rotate, and filter images.

Tkinter: Python GUIs
Tkinter comes with the Python library
and is the standard means of accessing
the popular Tk GUI tool kit. If you want to
create a graphical user interface (GUI) in
Python, this should be your first stop. See
http://docs.python.org/3/library/tkinter.html
for more information.

Django: Interactive websites
Django (www.djangoproject.com) is a
framework for creating interactive websites. In this way, it is similar to Ruby on
Rails, but it uses Python instead of Ruby as
the underlying programming language.

Bottle: Interactive websites
Bottle (http://bottlepy.org/docs/dev/) is
similar to Django in the sense that it is a
framework for creating interactive websites. In contrast to Django, Bottle is a small
and light framework that might be a better
choice for smaller websites.

196

Appendix A

From the Library of Mo Medwani

Pygame: 2D animation
Pygame (www.pygame.org) lets you create
and control two-dimensional animations,
especially for games. It provides tools for
graphical animation and sound and for
input devices such as joysticks. There
are also introductory tutorials and sample
programs at the Pygame website to help
get you started.

SciPy: Scientific computing
SciPy (www.scipy.org) is a large and popular library of software tools for scientific
computing (it even has its own associated
conferences!). It provides mathematical
software to do things such as solve optimization problems, perform numerical linear
algebra calculations, process signals, and
much more.

Twisted: Network programming
Twisted (http://twistedmatrix.com/trac) is a
popular Python library for network programming. It supports numerous networking protocols, and includes things like web
servers, mail servers, and chat clients/
servers.

PyPI: The Python Package Index
The Python Package Index (http://pypi.
python.org/pypi) is a frequently updated
list of thousands of user-submitted Python
packages. It’s a good place to look for
special-purpose Python libraries, or just
to browse to see what uses Python has
been put to.
You can easily find thousands of other
Python packages by searching the web.
For almost any programming task that
someone has done before, you are likely
to find a Python library!

Popular Python Packages 197

From the Library of Mo Medwani

This page intentionally left blank

From the Library of Mo Medwani

B
Comparing Python 2
and Python 3
Python 3 was released at the end of
2008 and marked a major update to
Python. Some of the changes introduced
in Python 3 are not backward-compatible
with Python 2, and so development of
Python 2 has continued in parallel with
the newer Python 3.

In This Appendix
What’s New in Python 3

200

In this chapter, we’ll summarize some of
the main changes to Python 3 and also
explain how you can convert a Python 2
program to Python 3.

From the Library of Mo Medwani

What’s New in
Python 3
Python 3 introduced many new features;
the following are some of the most visible:
■

Python 3’s print function is indeed
a function. In Python 2, print was a
language construct, similar to if and
for. The problem with Python 2’s print
was that it was difficult to modify—for
example, changing print statements to
print to a file instead of to the console
is much easier in Python 3 because you
can just reassign the print function.

■

Dividing integers in Python 3 works as
you would expect when fractions are
involved:
Python3>>> 1 / 2
0.5

However, Python 2 chops off all digits
after the decimal when dividing integers:
Python2>>> 1 / 2
0

While Python 2’s way of dividing integers appears in other programming
languages, many programmers find it
counterintuitive and the cause of subtle
errors.
■

Python 2 has two kinds of classes:
old-style classes and new-style
classes. Python 3 drops old-style
classes completely.

■

Python 3 renames a couple of important
functions: The input and range functions are called raw_input and xrange
in Python 2.

■

The format strings described in Chapter 9 exist only in Python 3, and not
Python 2. Python 2 only has string
interpolation with the % operator.

200

Appendix B

From the Library of Mo Medwani

Many other technical changes were
made in Python 3. For a complete list
of differences, see “What’s New in
Python 3.0” (http://docs.python.org/3/
whatsnew/3.0.html).
It is often not too difficult to convert a
Python 2 program into Python 3. A useful
tool that helps with this process is 2to3
(http://docs.python.org/3/library/2to3.html).
It can automatically convert almost all
Python 2 programs into equivalent Python
3 programs.

Which version of Python
should you use?
When deciding what version of Python to
use—2 or 3—there are a few things to take
into consideration:
■

If you must work with programs that are
written in Python 2, then you should
probably use Python 2. Otherwise, you
would need to convert all the existing
Python 2 programs into Python 3, which
might be difficult.

■

Some special-purpose libraries may
only work with one version of Python,
and so if you need to use one of
those, your choice of Python may be
constrained.

■

If you are just starting out as a programmer and have no old Python programs
to maintain or special-purpose libraries that you must use, then Python 3 is
probably the best choice.

Comparing Python 2 and Python 3 201

From the Library of Mo Medwani

This page intentionally left blank

From the Library of Mo Medwani

Index
Numbers

[] (square brackets)

2to3 tool, using for Python conversions, 201
5 vs. 5.0, 13

using with lists, 108
using with strings, 84
− (subtraction) operator, 12

Symbols
'+ file module, meaning of, 134, 137
== operator, 44
+ (addition) operator, 12
= (assignment) operator, example of, 24
\ (backward slash)

using with pathnames, 130
writing, 130
) (closed round bracket), using with
tuples, 29
% conversion specifier, meaning of, 125
@ (decorators), using, 163
/ (division) operator, 12
// (division) operator, 11
" (double quote), using with strings, 17
__ (double underscore), use of, 20–21
'' and "" (empty strings), using, 18
\" escape character, 88
\\ escape character, 88
\' escape character, 88
** (exponentiation) operator, 12
// (integer division) operator, 11–12
% (mod) function, using with strings, 84
* (multiplication) operator, 12
# (number sign), using with comments, 36
( (open round bracket), using with
tuples, 29
% (remainder) operator, 12
() (round brackets)
using with functions, 68
using with regular expressions, 99
using with tuples, 104
>>> (shell prompt), 10
' (single quote), using with strings, 17

A
'a file module, meaning of, 134
addition (+) operator, 12

aggregate data structures, strings as, 83
and operator, 44–45
append function, using with lists, 110–111

append mode, using with text files, 134, 136
area function
calling, 70–71
parts of, 71
return statement, 72
arithmetic operators. See also floating point
arithmetic; integer arithmetic; math
functions
addition (+), 12
division (/), 12
exponentiation (**), 12
integer division (//), 11–12
multiplication (*), 12
remainder (%), 12
subtraction (−), 12
ASCII (American Standard Code for Information
Interchange), 87
assignment (=) operator, example of, 24
assignment statements
diagrams, 28
example of, 24
initialization statement, 26
labeling values, 28
left-hand side, 26
multiple, 29
operator, 26
right-hand side, 26
associative arrays. See dictionaries

Index

203

From the Library of Mo Medwani

B
'b file module, meaning of, 134, 138
backward slash (\)

using with pathnames, 130
writing, 130
base class, explained, 169–170
bill.txt file, using, 182–183, 189
bin built-in function, printing doc string for, 21
binary files, processing, 138–140
binary mode, indicating, 134
binary vs. text files, 128–129
blocks of code
breaking out of, 64–65
indenting, 51–53
indicating, 51
Boolean logic
== operator, 44
and operator, 44–45, 48
with brackets (()), 46
definition of operators, 48
evaluating expressions, 46–47
explained, 44
False values, 44
logical equivalence, 45
logical negation, 44
logical operators, 44
not operator, 44–46
operator priority, 47
or operator, 44–45
or operator, 48
short-circuit evaluation, 48
True values, 44
truth table, 44
truth values, 44
without brackets (()), 47
Bottle framework, 196
brackets, preceding word counts with, 186
break statement, using, 64–65

C
calculating
area of circle, 70
factorials, 59–60
powers, 68
case sensitivity, explained, 25
case study. See text statistics case study

204

case-changing functions
s.capitalize(), 94
s.lower(), 94
s.swapcase(), 94
s.title(), 94
s.upper(), 94
catching exceptions, 146–149
ceil(x) function, 16
character codes, finding, 87
character length, determining, 178
characters
accessing with for-loop, 86
escape, 88
getting rid of unwanted, 180–181
whitespace, 88
Cheetah templating package, 126
child class, explained, 170
chr function, using, 87
circle, calculating area of, 70
class diagram, example of, 170
class hierarchy, example of, 170
classes
defined, 153
deriving, 169–170
extending, 169–170
and methods, 154
and objects, 153
reusing, 168–170
self parameter, 155
subclasses of, 169–170
writing, 154–155
clean-up actions
finally code block, 150
with statement, 151
closed round bracket ()), using with tuples, 29
code blocks
breaking out of, 64–65
indenting, 51–53
indicating, 51
command line
calling Python from, 33–34
environment variables, 34
path variable, 34
running programs from, 33
command shell
interacting with, 10
shell prompt, 10

Index

From the Library of Mo Medwani

command window, opening, 34
comments
defined, 36
using, 41–42
compiled code. See object code
compiling source code, 35
complex numbers, 15
concatenating
strings, 19
tuples, 107
conditional expressions, 53
constructors, explained, 154
continue statement, using, 64–65
conversion specifiers
% character, 125
base 8 value, 125
base 16, 125
float, 125
integers, 125
lowercase float exponential, 125
lowercase hexadecimal, 125
octal value, 125
string, 125
uppercase hexadecimal, 125
uppercase float exponential, 125
converting
floats to integers, 23
floats to strings, 22
integers to floats, 22
integers to strings, 22
strings to floats, 22
strings to numbers, 23
cost(x) function, 16
count function, using with lists, 110
current working directory
cwd_size_in_bytes function, 132
explained, 130

D
d conversion specifier, meaning of, 125

data structures
defined, 101
dictionaries, 118–121
list comprehensions, 115–117
list functions, 110–112
lists, 108–109
reading, 139

self-referential, 109
sequences, 103
sets, 122
sorting lists, 113–114
tuples, 104–107
type command, 102
writing, 139
data types
checking with type command, 102
converting between, 22–23
converting numeric types, 22
explained, 9
floats to strings, 22
implicit conversions, 22–23
integers to floats, 22
integers to strings, 22
strings, 9
strings to floats, 22
decorators (@), using 163
degrees(x) function, 16
derived class, explained, 169–170
dictionaries
converting to, 184, 187
converting to list of tuples, 185–186
defined, 118
extracting information from, 185
key restrictions, 119
and sets, 122
unique keys, 119
dictionary functions
d.clear(), 120
d.copy(), 120
d.fromkeys(), 120
d.get(key), 120
d.items(), 120–121
d.keys(), 120–121
d.popitem(), 120–121
d.pop(key), 120
d.setdefault(), 120
d.update(), 120
d.values(), 120–121
dir ('') command, entering, 37
dir function, using, 92
directory
current working, 130–132
default, 130
dir(m) function, using, 20

Index

205

From the Library of Mo Medwani

display method, using, 157, 159
division (// and /) operators, 11–12
Django framework, 196
documentation strings
accessing for functions, 71
benefits, 71
formatting convention, 71
printing, 21
documentation website, accessing, 133
dot notation, using with objects, 155
double quote ("), using with strings, 17
double underscore (__), use of, 20–21

E
e conversion specifier, meaning of, 125
E conversion specifier, meaning of, 125

Easter egg example, 82
eat_vowels example, 117

editor window, opening in IDLE, 32
elif (else if) statements, 52
else statements, 49–50
empty lists, denoting, 108
empty strings ('' and ""), using, 18, 39
ending lines of text, 88
environment variables, 34
errors, handling, 143
escape characters
\", 88
\', 88
\\, 88
\n, 88
\r, 88
\t, 88
exceptions
built-in, 145
catching, 146–149
checking for, 146
defined, 143
IOError, 143
outputting tracebacks, 144
raising, 143–145
syntax errors, 145
throwing, 144
executable code. See object code
exponentiation (**) operator, 12
exp(x) function, 16

206

F
F conversion specifier, meaning of, 125

factorials, calculating, 59–60
factorial(x) function, 16

file modules, 134
file_stats, calling, 183
files
examining, 131–133
functions, 131
reading, 128–130
text vs. binary, 128–129
writing, 128–130
finally code block, adding, 150
find function vs. index, 93
float, conversion specifier for, 125
float literals, 13
floating point arithmetic. See also arithmetic
operators
5 vs. 5.0, 13
complex numbers, 15
decimal points, 13
errors, 15
examples, 13
limited precision, 14–15
overflow, 14
scientific notation, 13
silent errors, 14
truncating, 61
floats
converting integers to, 22
converting strings to, 22
converting to integers, 23
converting to strings, 22
float(s) conversions, making, 23
flow of control
backing out of blocks, 64–65
backing out of loops, 64–65
Boolean logic, 44–48
code blocks, 51–53
explained, 43
for-loops vs. while-loops, 59–63
if-statements, 49–50
indentation, 51–53
loops, 54–58
nested loops, 66

Index

From the Library of Mo Medwani

folders
backward slash (\), 130
functions, 131
pathnames, 130
structure, 130
for-loops
accessing characters with, 86
changing starting value of, 54
headers, 54
i (index) variable, 54, 63
printing numbers, 55
using iterators with, 55
vs. while-loops, 58–63
format function
using, 94
using with strings, 124
format strings
named replacement, 126
using, 126–127
using curly braces ({}), 127
formatting functions for strings. See
string-formatting functions
formatting parameters, specifying, 127
f.read(), calling, 137–138
frequency dictionaries, converting strings
to, 187
f.seek(), calling, 137
function names, reassigning, 69
function parameters
default values, 78
keyword parameters, 79
pass by reference, 76–77
pass by value, 76
state of memory, 76
functional programming style, 72
functions. See also string functions; tuple
functions
accessing doc strings for, 71
append, 110–111
availability to strings, 37
as black boxes, 68
calculating powers, 68
calling, 68–69
chr, 87
count, 110
defined, 67

defining, 70–72
files and folders, 131
listing built-in, 21
listing in modules, 20
main(), 75
vs. methods, 154
modules, 80–81
naming, 70
not returning values, 69
ord, 87
side effects, 72
using, 133
using round brackets (()) with, 68
variable scope, 73–74

G
generator expressions
explained, 117
searching for, 132
getters and setters
avoiding setters, 167
decorators, 163
name and age values, 162
private variables, 166
property decorators, 163–165
syntax, 167
using, 162–167
global variables, explained, 74

H
hapax legomenon, explained, 190
hash tables. See dictionaries
hashing, using with dictionaries, 118
help
documentation, 21
listing functions in modules, 20
utility, 21
help(f) function, using, 21
hexadecimal numbers, explained, 138
Human class, writing, 169

I
i (index) variable, use of, 54, 63

identifiers, explained, 24
IDLE (integrated development environment), 4

Index

207

From the Library of Mo Medwani

IDLE editor
alternatives, 33
starting screen, 6
using, 32–34
IDLE shortcuts
opening editor window, 32
opening files for editing, 32
redoing last undo, 32
running programs, 32
saving programs, 32
undoing actions in IDLE, 32
if/elif-statements, 52
if/else-statements, 49–50
if-statements
explained, 49
flow chart, 50
headers, 50
structure, 50
immutable objects, 167
importing
modules, 16, 81
this module at command line, 82
indenting code blocks, 51–53
index function vs. find, 93
indexing
beginning at 0, 84
negative, 85, 91
strings, 84–86
using % (mod) function for, 84
infinite loops, 58
inheritance
defined, 153, 168–169
Human class, 169
isa terminology, 169
overriding methods, 170
Player class, 168–169
__init__ function, using, 155, 160–161
initialization, flexibility of, 160–161
initialization statement, explained, 26
input built-in function
explained, 36–37
using, 123
installing Python
on Linux systems, 7
on Macs, 7
on Windows systems, 6
int function, documentation for, 146

208

integer arithmetic. See also arithmetic
operators; math functions
defined, 11
division, 11
operators, 12
order of evaluation, 12
unlimited size, 12
integer division (//) operator, 11–12
integers
conversion specifier, 125
converting floats to, 23
converting to floats, 22
converting to strings, 22
lack of maximum, 60
interactive command shell, 10
interpreter, playing with examples in, 178
int(s) conversions, making, 23
I/O (input and output)
console, 123
examining files, 131–133
examining folders, 131–133
explained, 123
formatting strings, 124–125
processing binary files, 138–140
processing text files, 134–137
reading files, 128–130
reading webpages, 141
string formatting, 126–127
writing files, 128–130
IOError, raising, 143
isa terminology, using with inheritance, 169
iterators, using with for-loops, 55

J
join function

using, 97
using with list comprehensions, 117

K
keywords
restriction for variables, 25
using, 79

L
len function, using with characters, 178

letters, keeping desired, 180–181
lexicographical ordering, 113

Index

From the Library of Mo Medwani

lines of text, ending, 88
Linux, installing Python on, 7
list comprehensions
examples, 116
explained, 115
filtering, 117
generator expressions, 117
list functions
mutating, 110
s.append(), 110–111
s.count(), 110
s.extend(), 110
s.index(), 110
s.insert(), 110
s.pop(), 110
s.remove(), 110, 112
s.reverse(), 110, 112
s.sort(), 110
lists. See also tuples
[] (square brackets), 108
containing elements vs. pointing, 109
empty, 108
lexicographical ordering, 113
mutability, 109
pointing to values, 109
pop and push, 111–112
self-referential data structure, 109
sorting, 113–114
using, 108
local variable, explained, 73
log(x) functions, 16
loops
breaking out of, 64–65
infinite loops, 58
for-loops, 54–55
nesting, 66
while-loops, 56–58
lowercase float exponential, conversion
specifier for, 125
lowercase hexadecimal, conversion specifier
for, 125

M
^M character, handling, 88
Macs, installing Python on, 7
main() function, using, 75
maps. See dictionaries

math functions. See also arithmetic operators;
integer arithmetic
importing modules, 16
return values, 16
math module
ceil(x) function, 16
cost(x) function, 16
degrees(x) function, 16
exp(x) function, 16
factorial(x) function, 16
log(x) functions, 16
pow(x) function, 16
radians(x) function, 16
sin(x) function, 16
sqrt(x) function, 16
tan(x) function, 16
using, 16
methods
vs. functions, 154
overriding, 170
mod (%) function, using with strings, 84
modules
creating, 80
importing, 16, 81
listing functions in, 20
namespaces, 82
pickle, 140
shelve, 140
sqlite3, 140
urllib, 141
using, 81
webbrowser, 141
move functions, implementing for Undercut
game, 172–173
multiplication (*) operator, 12

N
\n (newline) character, explained, 39, 88
n! notation, using, 60

name clashes, preventing, 82
namespaces
explained, 82
preventing name clashes, 82
negative indexing, 85, 91
nested loops
break statement, 66
continue statement, 66
using, 66

Index

209

From the Library of Mo Medwani

new keyword, using with constructors, 154
newline (\n) character, explained, 39
None value, using with functions, 72
normalize() function, using, 180
not operator, 44–46
number sign (#), using with comments, 41

numbers
converting strings to, 23
floating point, 38
immutable quality, 28
integers, 38
reading from keyboard, 38
as strings, 38
summing, 62
summing from users, 61
types of, 38

O
o conversion specifier, meaning of, 125

object code
converting source code to, 5
explained, 35
object serialization, explained, 139
objects
and classes, 153
creating, 159
defined, 153
displaying, 156–159
dot notation, 155
immutable, 167
string representation of, 159
using, 155
octal values, conversion specifier for, 125
OOP (object-oriented programming), 2
classes, 153–155
constructors, 154
explained, 153
getters, 162–167
inheritance, 168–170
initialization, 160–161
objects, 156–159
polymorphism, 171–174
setters, 162–167
open function
documentation, 146
using, 135
open round bracket ((), using with tuples, 29

210

operators. See arithmetic operators;
assignment operator
or operator, 44–45
ord function, using with character codes, 87
order of evaluation, 12
ordered sequences, 103
os.chdir() function, 131
os.getcwd() function, 131
os.listdir() function, 131
os.path.isdir() function, 131
os.path.isfile() function, 131
os.stat() function, 131, 133
overflow errors, 14

P
packages
Bottle, 196
Django, 196
PIL (Python Imaging Library), 196
Pygame, 197
PyPI (Python Package Index), 197
SciPy, 197
Tkinter, 196
Twisted, 197
parent class, explained, 170
partition function, using, 95
pass by reference, explained, 76
pass by value, explained, 76
path variable, 34
pathnames, using with folders, 130
Person class
adding method to, 156
creating, 154
Person objects
with name and age, 160–161
working with, 158
pi calculation, doing, 70
pickle module
restriction, 140
using, 139
PIL (Python Imaging Library) package, 196
play_undercut function, analyzing, 174
Player class, creating, 168
polymorphism
defined, 153
power of, 174
Undercut game, 171–174
pop, using on lists, 111–112

Index

From the Library of Mo Medwani

powers, calculating, 68
pow(x) function, 16
print statement
using, 39–40, 135
using string interpolation with, 151
printing
documentation strings, 21
numbers in for-loops, 55
strings on screen, 39–40
private variables, 166–167
problems, understanding, 178
programming
process, 4–5
requirements, 4
source code, 5
programming problems, understanding, 178
programs
checking output, 5
defined, 31
flow of execution, 43
managing variables, 167
running, 5
running from command line, 33
running with IDLE, 32
storing, 32
straight-line, 43
structuring, 42
tracing, 36–37
writing in IDLE, 32
property decorators, using, 163–165
public variables, 166
push, using on lists, 111
.py files
versus .pyc files, 35
contents of, 5
listing, 132
running, 35
.pyc files
contents, 35
explained, 4
Pygame 2D animation package, 197
PyPI (Python Package Index) package, 197
Python 2
classes, 200
converting into Python 3, 201
dividing integers, 200
vs. Python 3, 40, 200–201
raw_input function, 200

string interpolation, 200
xrange function, 200
Python 3
dividing integers, 200
format strings, 200
input function, 200
print function, 200
range function, 200
Python components
compiler, 35
interpreter, 35
virtual machine, 35
Python language
calling from command line, 33–34
design, 2
download page, 6
education, 3
installing on Linux, 7
installing on Macs, 7
installing on Windows, 6
libraries, 2
maintainability, 2
origin of name, 2
scientific computing, 3
scripts, 3
text processing, 3
uses, 3
website development, 3
Python packages
Bottle, 196
Django, 196
PIL (Python Imaging Library), 196
Pygame, 197
PyPI (Python Package Index), 197
SciPy, 197
Tkinter, 196
Twisted, 197
pythonintro website, accessing, 133

Q
quotes (' and "), using with strings, 17
quotes, triple, 17

R
'r' file module, meaning of, 134
\r escape character, 88
radians(x) function, 16

Index

211

From the Library of Mo Medwani

re module, accessing documentation for, 100

reading
files, 128–130
text files as strings, 135
webpages, 141
regular expressions
examples, 98–99
matching with, 99
operators, 98
using, 181
using round brackets (()) with, 99
x* operator, 98
x|y operator, 98
x+ operator, 98
xy? operator, 98
remainder (%) operator, 12
remove function, using with lists, 112
replace function, using with strings, 96, 180
__repr__ method, using, 158–159
return statement, using with area function, 72
return values, using, 16
reverse function, using with lists, 112
round brackets (())
using with functions, 68
using with regular expressions, 99
using with tuples, 104
rpartition function, using, 95

S
s conversion specifier, meaning of, 125

saving programs with IDLE, 32. See also IDLE
(integrated development environment)
scientific notation, using, 13
SciPy scientific computing package, 197
scope. See variable scope
scripts. See programs
searching functions for strings. See stringsearching functions
self parameter, using with classes, 155
sentences, splitting into words, 179
sequence types
lists, 103
strings, 103
tuples, 103–107
sequences. See also values
defined, 103
ordered, 103

212

size restriction, 103
serialization, explained, 139
sessions. See shell transcripts
sets
calling dir(set), 122
and dictionaries, 122
explained, 122
immutable frozensets, 122
mutable, 122
online documentation, 122
setters and getters
avoiding setters, 167
decorators, 163
name and age values, 162
private variables, 166
property decorators, 163–165
syntax, 167
using, 162–167
shell prompt (>>>), 10
shell transcript, explained, 10
shelve module, explained, 140
side effects, relationship to functions, 72
sin(x) function, 16
single quote ('), using with strings, 17
slicing strings
explained, 89
with negative indexes, 91
shortcuts, 90–91
software. See object code
sort function, using with lists, 114
sorting
lists, 113–114
tuples, 114
source code
comments, 36, 41–42
compiling, 35
converting to object code, 5
writing, 5
split function, using, 96, 178–179
splitting functions for strings. See stringsplitting functions
sqlite3 module, explained, 140
sqrt(x) function, 16
square brackets ([])
using with lists, 108
using with strings, 84
standard error (stderr), explained, 39

Index

From the Library of Mo Medwani

standard input (stdin), explained, 39
standard output (stdout), explained, 39
stop words, creating set of, 190
string functions. See also functions
case-changing, 94
for contents of substrings, 92
s.count(), 97
for searching, 93
s.encode(), 97
s.endswith(), 92
s.find(), 93
s.index(), 93
s.isalnum(), 92
s.isalpha(), 92
s.isdecimal(), 92
s.isdigit(), 92
s.isidentifier(), 92
s.islower(), 92
s.isnumeric(), 92
s.isprintable(), 92
s.isspace(), 92
s.istitle(), 92
s.isupper(), 92
s.join(), 97
s.maketrans(), 97
split, 95–96
s.rfind(), 93
s.rindex(), 93
s.startswith(), 92
s.translate(), 97
for stripping, 95–96
s.zfill(), 97
for testing, 92
string interpolation, 124, 151
string literals, writing, 17
string-formatting functions
s.center(), 94
s.format(), 94
s.ljust(), 94
s.rjust(), 94
string-replacement functions
s.expandtabs(), 96
s.replace(), 96
strings
as aggregate data structures, 83
characters, 86–88
concatenating, 19

conversion specifiers, 125
converting floats to, 22
converting integers to, 22
converting to floats, 22
converting to formats, 180–181
converting to frequency dictionaries, 187
converting to numbers, 23
creating, 19
defined, 17
empty, 18
escape characters, 88
extracting substrings from, 89
formatting, 124–127
immutable quality, 28
indexing, 84–86
indicating, 17
inserting at start of files, 137
lengths, 18
number of characters in, 18
printing on screen, 39–40
reading from keyboard, 36–38
regular expressions, 98–100
representations of objects, 159
returning list of, 131
slicing, 89–91
splitting, 178–179
square brackets ([]) for indexing, 84
uses of, 9
using quotes (' and ") with, 17
using strip() function with, 37
as words, 179
string-searching functions
s.find(), 93
s.index(), 93
s.rfind(), 93
s.rindex(), 93
string-splitting functions
s.partition(), 95
s.rpartition(), 95
s.rsplit(), 95
s.split(), 95
s.splitlines(), 95
string-stripping functions
s.lstrip(), 95
s.rstrip(), 95
s.strip(), 95

Index

213

From the Library of Mo Medwani

string-testing functions
for contents of substrings, 92
s.endswith(), 92
s.isalnum(), 92
s.isalpha(), 92
s.isdecimal(), 92
s.isdigit(), 92
s.isidentifier(), 92
s.islower(), 92
s.isnumeric(), 92
s.isprintable(), 92
s.isspace(), 92
s.istitle(), 92
s.isupper(), 92
s.startswith(), 92
strip() function, using with strings, 37
subclasses, using with classes, 169–170
substrings, extracting from strings, 89
subtraction (−) operator, 12
summing
numbers, 62
numbers from users, 61
syntax errors, causing, 145

T
't file module, meaning of, 134, 137
tan(x) function, 16

templating packages, using, 126
testing functions. See Boolean logic; stringtesting functions
text files
appending to, 136
closing, 134
opening, 134
processing, 134–137
reading as strings, 135
reading line by line, 134–137
writing to, 136
text mode, indicating, 134
text statistics case study
completing, 188–189
converting strings to formats, 180–181
final program, 192–193
finding frequent words, 184–186
normalize() function, 180–181
problem description, 178–179

214

regular expressions, 181
strings to frequency dictionary, 187
testing code on data file, 182–183
text vs. binary files, 128–129
this module, importing at command line, 82
Tkinter package, 196
tracebacks, outputting, 144
tracing programs, 36–37
transcripts, explained, 10
True values, returning for paths, 131
try/except blocks
adding finally code block to, 150
examples of, 146–148
in Undercut game, 172
tuple functions. See also functions
len(), 106
tup.count(), 106
tup.index(), 106
x in tup, 106
tuples. See also lists
concatenating, 107
creating list of, 185–186
defined, 103
example of, 95
immutability, 105
round brackets (()), 104
singleton, 104
sorting, 114
trailing commas, 104
writing values as, 29
Twisted network programming package, 197
type command, using, 102
types. See data types

U
Undercut game
implementing, 171–174
move functions, 172–173
playing, 173–174
try/except blocks, 172
Unicode, rise of, 87
uppercase float exponential, conversion
specifier for, 125
uppercase hexadecimal, conversion specifier
for, 125
urllib module, using, 141

Index

From the Library of Mo Medwani

V
ValueError example, 146–147

values. See also sequences
assigning in parallel, 30
assigning to variables, 27
displaying multiple, 29
referring variables to, 28
replacing by position, 126
and variables, 24–25
writing as tuples, 29
variable names
case sensitivity, 25
first character, 25
keywords, 25
lengths, 25
rules for, 25
variable scope
explained, 73
global variables, 74
local variables, 73
variable values, swapping, 30
variables
adding multiple, 29
assigned values, 27
assigning values to, 27
explained, 9
pointing to values, 27
private vs. public, 166–167
referring to values, 28
terminology, 27
and values, 24–25
virtual machine, explained, 35
von Rossum, Guido, 2

W
'w file module, meaning of, 134

web browsers, creating, 141
webbrowser module, explained, 141
webpages, reading, 141
websites
2to3 conversion for Python, 201
Bottle, 196
Django, 196
online documentation, 133

PIL (Python Imaging Library), 196
Pygame, 197
PyPI (Python Package Index), 197
Python download page, 6
pythonintro, 133
re module documentation, 100
SciPy, 197
templating packages, 126
Tkinter, 196
Twisted, 197
Unicode home page, 87
while-loops
flexibility of, 58
flow of control, 56
vs. for-loops, 58–63
form of, 57
incrementers, 57
initializers, 57
sample program, 56
try/except block in, 146
whitespace characters, handling, 88
Windows, installing Python on, 6
with statement, using, 151
word counts, preceding with brackets, 186
words
creating set of stop words, 190
finding frequent, 184–186
getting sorted list of, 185–186
splitting sentences into, 179
strings as, 179
writing
data structures, 139
files, 128–130
opening text files for, 134
to text files, 136

X
x = expr, 28
x conversion specifier, meaning of, 125
X conversion specifier, meaning of, 125

Z
zfill function, using, 97

Index

215

From the Library of Mo Medwani

Unlimited online access to all Peachpit, Adobe
Press, Apple Training and New Riders videos
and books, as well as content from other
leading publishers including: O’Reilly Media,
Focal Press, Sams, Que, Total Training, John
Wiley & Sons, Course Technology PTR, Class
on Demand, VTC and more.

No time commitment or contract required!
Sign up for one month or a year.
All for $19.99 a month

SIGN UP TODAY
peachpit.com /creativeedge

From the Library of Mo Medwani

Join the

Peachpit
Affiliate Team!

You love our books and you
love to share them with your colleagues and
friends...why not earn some $$ doing it!

If you have a website, blog or even a Facebook page,
you can start earning money by putting a Peachpit
link on your page.
If a visitor clicks on that link and purchases something
on peachpit.com, you earn commissions* on all sales!
Every sale you bring to our site will earn you a
commission. All you have to do is post an ad and
we’ll take care of the rest.

Apply and get started!
It’s quick and easy to apply.
To learn more go to:
http://www.peachpit.com/affiliates/
*Valid for all books, eBooks and video sales at www.Peachpit.com

From the Library of Mo Medwani

Source Exif Data:

File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
Linearized                      : No
Encryption                      : Standard V2.3 (128-bit)
User Access                     : Print, Annotate, Fill forms, Extract, Print high-res
Page Layout                     : SinglePage
Page Count                      : 226
Page Mode                       : UseOutlines
XMP Toolkit                     : 3.1-701
About                           : uuid:97f79606-d870-4ce5-bfcc-02bc01aae5a2
Producer                        : PDFKit.NET 4.0.12.0
Keywords                        : 
PDF Version                     : 1.6
Code Mantra 002 C0020 LLC       : http://www.codemantra.com
Universal 0020 PDF              : The process that creates this PDF constitutes a trade secret of codeMantra, LLC and is protected by the copyright laws of the United States
Create Date                     : 2013:06:11 12:53:47-06:00
Modify Date                     : 2017:01:30 18:35:33-07:00
Metadata Date                   : 2017:01:30 18:35:33-07:00
Creator Tool                    : PScript5.dll Version 5.2
Document ID                     : uuid:8C8F35D067D2E211B68BDC3D91713BBF
Instance ID                     : uuid:b8ba43d4-d781-11e2-89aa-109add4446cd
Derived From Instance ID        : uuid:a1a46d86-4f2c-6a40-8aed-3f9bbeee5f0d
Derived From Document ID        : xmp.id:B3103C773B216811871FDA8F155C9F42
Derived From Rendition Class    : proof:pdf
Format                          : application/pdf
Title                           : Python: Visual QuickStart Guide
Creator                         : Toby Donaldson
Description                     : 
Subject                         : 
Author                          : Toby Donaldson
Universal PDF                   : The process that creates this PDF constitutes a trade secret of codeMantra, LLC and is protected by the copyright laws of the United States
Code Mantra LLC                 : http://www.codemantra.com

EXIF Metadata provided by EXIF.tools

Python: Visual QuickStart Guide Python 3rd Edition

Navigation menu

Versions of this User Manual:

Views

Navigation