The Hitchhiker’s Guide to Python: Best Practices for Development




The Hitchhiker’s Guide to Python
by Kenneth Reitz and Tanya Schlusser
Copyright © 2016 Kenneth Reitz, Tanya Schlusser. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles. For more information, contact our corporate/institutional sales department at 800-998-9938.
Editor: Dawn Schanafelt
Production Editor: Nicole Shelby, Nicholas Adams
Copyeditor: Jasmine Kwityn
Proofreader: Amanda Kersey
Indexer: WordCo Indexing Services, Inc.
Interior Designer: David Futato
Cover Designer: Randy Comer
Illustrator: Rebecca Demarest
September 2016: First Edition
Revision History for the First Edition
2016-08-26: First Release
See for release details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. The Hitchhiker’s Guide to Python, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
While the publisher and the authors have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility
for errors or omissions, including without limitation responsibility for damages resulting from the use of
or reliance on this work. Use of the information and instructions contained in this work is at your own
risk. If any code samples or other technology this work contains or describes is subject to open source
licenses or the intellectual property rights of others, it is your responsibility to ensure that your use
thereof complies with such licenses and/or rights.
Dedicated to you
Table of Contents
Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Part I. Getting Started
1. Picking an Interpreter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
The State of Python 2 Versus Python 3 3
Recommendations 4
So…3? 4
Implementations 5
CPython 5
Stackless 5
PyPy 6
Jython 6
IronPython 6
PythonNet 6
Skulpt 7
MicroPython 7
2. Properly Installing Python. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Installing Python on Mac OS X 9
Setuptools and pip 11
virtualenv 11
Installing Python on Linux 12
Setuptools and pip 12
Development Tools 13
virtualenv 14
Installing Python on Windows 15
Setuptools and pip 17
virtualenv 18
Commercial Python Redistributions 18
3. Your Development Environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Text Editors 22
Sublime Text 23
Vim 23
Emacs 25
TextMate 26
Atom 26
Code 27
IDEs 27
PyCharm/IntelliJ IDEA 29
Aptana Studio 3/Eclipse + LiClipse + PyDev 29
WingIDE 30
Spyder 30
Komodo IDE 31
Eric (the Eric Python IDE) 31
Visual Studio 32
Enhanced Interactive Tools 32
IPython 33
bpython 33
Isolation Tools 33
Virtual Environments 34
pyenv 36
Autoenv 36
virtualenvwrapper 37
Buildout 38
Conda 38
Docker 39
Part II. Getting Down to Business
4. Writing Great Code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Code Style 43
PEP 8 44
PEP 20 (a.k.a. The Zen of Python) 45
General Advice 46
vi | Table of Contents
Conventions 52
Idioms 54
Common Gotchas 58
Structuring Your Project 61
Modules 61
Packages 65
Object-Oriented Programming 66
Decorators 67
Dynamic Typing 68
Mutable and Immutable Types 69
Vendorizing Dependencies 71
Testing Your Code 72
Testing Basics 74
Examples 76
Other Popular Tools 80
Documentation 82
Project Documentation 82
Project Publication 83
Docstring Versus Block Comments 84
Logging 84
Logging in a Library 85
Logging in an Application 86
Choosing a License 88
Upstream Licenses 88
Options 88
Licensing Resources 90
5. Reading Great Code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Common Features 92
HowDoI 93
Reading a Single-File Script 93
Structure Examples from HowDoI 96
Style Examples from HowDoI 97
Diamond 99
Reading a Larger Application 100
Structure Examples from Diamond 105
Style Examples from Diamond 109
Tablib 112
Reading a Small Library 112
Structure Examples from Tablib 116
Style Examples from Tablib 124
Requests 126
Reading a Larger Library 126
Structure Examples from Requests 130
Style Examples from Requests 135
Werkzeug 140
Reading Code in a Toolkit 141
Style Examples from Werkzeug 148
Structure Examples from Werkzeug 149
Flask 155
Reading Code in a Framework 156
Style Examples from Flask 162
Structure Examples from Flask 163
6. Shipping Great Code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Useful Vocabulary and Concepts 168
Packaging Your Code 169
Conda 169
PyPI 170
Freezing Your Code 172
PyInstaller 174
cx_Freeze 176
py2app 177
py2exe 178
bbFreeze 178
Packaging for Linux-Built Distributions 179
Executable ZIP Files 180
Part III. Scenario Guide
7. User Interaction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
Jupyter Notebooks 185
Command-Line Applications 186
GUI Applications 194
Widget Libraries 194
Game Development 200
Web Applications 200
Web Frameworks/Microframeworks 201
Web Template Engines 204
Web Deployment 209
8. Code Management and Improvement. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
Continuous Integration 213
System Administration 214
Server Automation 216
System and Task Monitoring 220
Speed 223
Interfacing with C/C++/FORTRAN Libraries 232
9. Software Interfaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
Web Clients 238
Web APIs 238
Data Serialization 243
Distributed Systems 246
Networking 246
Cryptography 251
10. Data Manipulation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
Scientific Applications 260
Text Manipulation and Text Mining 264
String Tools in Python’s Standard Library 264
Image Manipulation 267
11. Data Persistence. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
Structured Files 271
Database Libraries 272
A. Additional Notes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
Preface
Python is big. Really big. You just won’t believe how vastly hugely mind-bogglingly
big it is.
This guide is not intended to teach you the Python language (we cite lots of great
resources that do that) but is rather an (opinionated) insider’s guide to our communi‐
ty’s favorite tools and best practices. The primary audience is new to mid-level
Python programmers who are interested in contributing to open source or in begin‐
ning a career or starting a company using Python, although casual Python users
should also find Part I and Chapter 5 helpful.
The first part will help you choose the text editor or interactive development environment that fits your situation (for example, those using Java frequently may prefer Eclipse with a Python plug-in) and surveys options for other interpreters that may meet needs you don’t yet know Python could address (e.g., there’s a MicroPython implementation based around the ARM Cortex-M4 chip). The second section demonstrates Pythonic style by highlighting exemplary code in the open source community that will hopefully encourage more in-depth reading and experimentation with open source code. The final section briefly surveys the vast galaxy of libraries most commonly used in the Python community, providing an idea of the scope of what Python can do right now.
All of the royalties from the print version of this book will be directly donated to the
Django Girls, a giddily joyous global organization dedicated to organizing free
Django and Python workshops, creating open-sourced online tutorials, and curating
amazing first experiences with technology. Those who wish to contribute to the
online version can read more about how to do it at our website.
Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program ele‐
ments such as variable or function names, databases, data types, environment
variables, statements, and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values deter‐
mined by context.
This element signifies a tip or suggestion.
This element signifies a general note.
This element indicates a warning or caution.
Safari® Books Online
Safari Books Online is an on-demand digital library that delivers expert content in both book and video form from the world’s leading authors in technology and business.
Technology professionals, software developers, web designers, and business and crea‐
tive professionals use Safari Books Online as their primary resource for research,
problem solving, learning, and certification training.
Safari Books Online offers a range of plans and pricing for enterprise, government,
education, and individuals.
Members have access to thousands of books, training videos, and prepublication
manuscripts in one fully searchable database from publishers like O’Reilly Media,
Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que,
Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kauf‐
mann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders,
McGraw-Hill, Jones & Bartlett, Course Technology, and hundreds more. For more
information about Safari Books Online, please visit us online.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional
information. You can access this page at
To comment or ask technical questions about this book, send email to bookquestions@oreilly.com.
For more information about our books, courses, conferences, and news, see our web‐
site at
Find us on Facebook:
Follow us on Twitter:
Watch us on YouTube:
Welcome, friends, to e Hitchhiker’s Guide to Python.
This book is, to the best of my knowledge, the first of its kind: designed and curated
by a single author (myself—Kenneth), with the majority of the content provided by
hundreds of people from all over the world, for free. Never before in the history of
mankind has the technology been available to allow a beautiful collaboration of this
size and scale.
This book was made possible with:
Love brings us together to conquer all obstacles.
Soware projects
Python, Sphinx, Alabaster, and Git.
GitHub and Read the Docs.
Lastly, I’d like to extend a personal thank you to Tanya, who did all the hard work of
converting this work into book form and preparing it for publication, and the incred‐
ible O’Reilly team—Dawn, Jasmine, Nick, Heather, Nicole, Meg, and the dozens of
other people who worked behind the scenes to make this book the best it could be.
Getting Started
This part of the guide focuses on setting up a Python environment. It was inspired by
Stuart Ellis’s guide for Python on Windows, and consists of the following chapters
and topics:
Chapter 1, Picking an Interpreter
We compare Python 2 and Python 3, and share some interpreter options other
than CPython.
Chapter 2, Properly Installing Python
We show how to get Python, pip, and virtualenv.
Chapter 3, Your Development Environment
We describe our favorite text editors and IDEs for Python development.
1If you don’t do much low-level network programming, the change was barely noticeable outside of the print statement becoming a function. Otherwise, “unhappy with” is kind of a polite understatement: developers responsible for large, popular web, socket, or networking libraries that deal with Unicode and byte strings had (or still have) extensive changes to make. Details about the change, direct from the first introduction of Python 3 to the world, start off with: “Everything you thought you knew about binary data and Unicode has changed.”
Picking an Interpreter
The State of Python 2 Versus Python 3
When choosing a Python interpreter, one looming question is always present: “Should I choose Python 2 or Python 3?” The answer is not as obvious as one might think (although 3 is becoming more compelling every day).
Here is the state of things:
Python 2.7 has been the standard for a long time.
Python 3 introduced major changes to the language, which some developers are
unhappy with.1
Python 2.7 will receive necessary security updates until 2020.
Python 3 is continually evolving, like Python 2 did in years past.
You can now see why this is not such an easy decision.
2Someone who’s really amazingly together. We mean, who really knows where their towel is.
3Here’s a link to a high-level list of changes to Python’s Standard Library.
The way we see it, a truly hoopy frood2 would use Python 3. But if you can only use Python 2, at least you’re still using Python. These are our recommendations:
Use Python 3 if
You love Python 3.
You don’t know which one to use.
You embrace change.
Use Python 2 if
You love Python 2 and are saddened by the future being Python 3.
The stability requirements of your software would be impacted.3
Software that you depend on requires it.
If you’re choosing a Python interpreter to use, and aren’t opinionated, then use the newest Python 3.x; every version brings new and improved standard library modules, security, and bug fixes. Progress is progress. So only use Python 2 if you have a strong reason to, such as a Python 2–exclusive library that has no adequate Python 3–ready alternative, a need for a specific implementation (see “Implementations” on page 5), or you (like some of us) love and are inspired by Python 2.
Check out Can I Use Python 3? to see whether any Python projects you’re depending on will block adoption of Python 3.
For further reading, try Python2orPython3, which lays out some of the reasoning
behind a backward-incompatible break in the language specification, and links to
detailed specifications of the differences.
If you’re a beginner, there are far more important things to worry about than cross-compatibility between all of the Python versions. Just get something working for the system you’ve got, and cross this bridge later.
4The reference implementation accurately reflects the language’s definition. Its behavior is how all other implementations should behave.
5C extension modules are written in C for use in Python.
Implementations
When people speak of Python, they often mean not just the language but also the
CPython implementation. Python is actually a specification for a language that can be
implemented in many different ways.
The different implementations may be for compatibility with other libraries, or
maybe for a little speed. Pure Python libraries should work regardless of your Python
implementation, but those built on C (like NumPy) won’t. This section provides a
quick rundown on the most popular implementations.
This guide presumes you’re working with the standard CPython
implementation of Python 3, although we’ll frequently add notes
when relevant for Python 2.
CPython
CPython is the reference implementation4 of Python, written in C. It compiles Python
code to intermediate bytecode which is then interpreted by a virtual machine. CPy‐
thon provides the highest level of compatibility with Python packages and C exten‐
sion modules.5
If you are writing open source Python code and want to reach the widest possible
audience, use CPython. To use packages that rely on C extensions to function, CPy‐
thon is your only implementation option.
Because CPython is the reference implementation, every version of the Python language has a canonical implementation written in C.
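That compile-then-interpret pipeline is easy to observe from within CPython itself. The standard library’s dis module prints the bytecode a function compiles to (a quick sketch; the exact opcode names vary between CPython versions):

```python
# Disassemble a function to see the bytecode CPython compiles it to.
import dis

def greet(name):
    return "Hello, " + name

# Prints one row per bytecode instruction (names vary by CPython version).
dis.dis(greet)
```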
Stackless
Stackless Python is regular CPython (so it should work with all of the libraries that
CPython can use), but with a patch that decouples the Python interpreter from the
call stack, making it possible to change the order of execution of code. Stackless introduces the concept of tasklets, which can wrap functions and turn them into “micro-threads” that can be serialized to disk for future execution and scheduled, by default in round-robin execution.
The greenlet library implements this same stack-switching functionality for CPython
users. Much of the functionality has also been implemented in PyPy.
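Stackless itself is a separate interpreter, so its tasklet API is not shown here; but the flavor of round-robin micro-thread scheduling can be sketched with plain generators in standard CPython (an analogy only, not the Stackless or greenlet API):

```python
# Sketch: cooperative "micro-threads" scheduled round-robin,
# mimicked with plain generators instead of Stackless tasklets.
from collections import deque

def worker(name, steps):
    for i in range(steps):
        print(f"{name} step {i}")
        yield  # hand control back to the scheduler

def round_robin(tasks):
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)          # run until the next yield
            queue.append(task)  # not finished: reschedule at the back
        except StopIteration:
            pass                # task finished: drop it

round_robin([worker("a", 2), worker("b", 1)])
# prints: a step 0 / b step 0 / a step 1
```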
PyPy
PyPy is a Python interpreter implemented in a restricted statically typed subset of the
Python language called RPython, making certain kinds of optimization possible. The
interpreter features a just-in-time compiler and supports multiple backends, such as C, Common Intermediate Language (CIL), and Java Virtual Machine (JVM) bytecode.
PyPy aims for maximum compatibility with the reference CPython implementation while improving performance. If you are looking to increase the performance of your Python code, it’s worth giving PyPy a try. On a suite of benchmarks, it’s currently over five times faster than CPython.
It supports Python 2.7, and PyPy3 targets Python 3. Both versions are available from
the PyPy download page.
Jython
Jython is a Python interpreter implementation that compiles Python code to Java
bytecode which is then executed by the JVM. Additionally, it is able to import and use
any Java class like a Python module.
If you need to interface with an existing Java code base or have other reasons for
needing to write Python code for the JVM, Jython is the best choice.
Jython currently supports up to Python 2.7.
IronPython
IronPython is an implementation of Python for the .NET framework. It can use both
Python and .NET framework libraries, and can also expose Python code to other lan‐
guages in the .NET framework.
Python Tools for Visual Studio integrates IronPython directly into the Visual Studio
development environment, making it an ideal choice for Windows developers.
IronPython supports Python 2.7.
PythonNet
Python for .NET is a package that provides near seamless integration of a natively
installed Python installation with the .NET Common Language Runtime (CLR). This
is the inverse approach to that taken by IronPython, meaning PythonNet and IronPy‐
thon complement rather than compete with each other.
In conjunction with Mono, PythonNet enables native Python installations on non-
Windows operating systems, such as OS X and Linux, to operate within the .NET
framework. It can be run in addition to IronPython without conflict.
PythonNet supports Python 2.3 through Python 2.7; the installation instructions are on the PythonNet readme page.
Skulpt
Skulpt is a JavaScript implementation of Python. It has not ported all of the CPython
standard library; the library has the modules math, random, turtle, image, unittest,
and parts of time, urllib, DOM, and re. It is intended for use in teaching. There is also
a way to add your own modules.
Notable examples of its use are Interactive Python and CodeSkulptor.
Skulpt supports most of Python 2.7 and Python 3.3. See the Skulpt GitHub page for more information.
MicroPython
MicroPython is an implementation of Python 3 optimized to run on a microcontroller; it supports 32-bit ARM processors with the Thumb v2 instruction set, such as the Cortex-M range used in low-cost microcontrollers. It includes a selection of modules from Python’s Standard Library, plus a few MicroPython-specific libraries for board details, memory information, and network access, and a modified version of ctypes optimized for smaller size. It is not the same as the Raspberry Pi, which has a Debian or other C-based operating system, with Python installed. The pyboard actually uses MicroPython as its operating system.
From here on out, we’re using CPython on a Unix-like system, on OS X, or on a Windows system.
On to installation—grab your towel!
Properly Installing Python
This chapter walks through CPython installation on the Mac OS X, Linux, and Win‐
dows platforms. Sections on packaging tools (like Setuptools and pip) are repetitive,
so you should skip straight to the section for your particular operating system, and
ignore the others.
If you are part of an organization that recommends you use a commercial Python
distribution, such as Anaconda or Canopy, you should follow your vendor’s instruc‐
tions. There is also a small note for you in “Commercial Python Redistributions” on page 18.
If Python already exists on your system, do not, on any account,
allow anybody to change the symbolic link to the python exe‐
cutable to point at anything other than what it is already pointing
at. That would be almost as bad as reading Vogon poetry out loud.
(Think of the system-installed code that depends on a specific
Python in a specific place…)
Installing Python on Mac OS X
The latest version of Mac OS X, El Capitan, comes with its own Mac-specific imple‐
mentation of Python 2.7.
You don’t need to install or configure anything else to use Python. But we strongly
recommend installing Setuptools, pip, and virtualenv before you start building
Python applications for real-world use (i.e., contributing to collaborative projects).
You’ll learn more about these tools and how to install them later in this section. In
particular, you should always install Setuptools, as it makes it much easier for you to
use other third-party Python libraries.
1Other people have different opinions. The OS X Python implementation is not the same. It even has some
separate OS X–specific libraries. A small rant on this subject criticizing our recommendation is at the Stupid
Python Ideas blog. It raises valid concerns about collision of some names for people who switch-hit between
OS X’s CPython 2.7 and the canonical CPython 2.7. If this is a concern, use a virtual environment. Or, at the
very least, leave the OS X Python 2.7 where it is so that the system runs smoothly, install the standard Python
2.7 implemented in CPython, modify your path, and never use the OS X version. Then everything works fine,
including products that rely on Apple’s OS X–specific version.
2The best option is to pick Python 3, honestly, or to use virtual environments from the start and install nothing
but virtualenv and maybe virtualenvwrapper according to the advice of Hynek Schlawack.
3This will ensure that the Python you use is the one Homebrew just installed, while leaving the system’s original Python exactly as it is.
The version of Python that ships with OS X is great for learning, but it’s not good for
collaborative development. The version shipped with OS X may also be out of date
from the official current Python release, which is considered the stable production
version.1 So, if all you want to do is write scripts for yourself to pull information from
websites, or process data, you don’t need anything else. But if you are contributing to
open source projects, or working on a team with people that may have other operat‐
ing systems (or ever intend to in the future2), use the CPython release.
Before you download anything, read through the end of the next few paragraphs for
notes and warnings. Before installing Python, you’ll need to install GCC. It can be
obtained by downloading Xcode, the smaller Command-Line Tools (you need an
Apple account to download it), or the even smaller osx-gcc-installer package.
If you already have Xcode installed, do not install osx-gcc-installer.
In combination, the software can cause issues that are difficult to diagnose.
systems will notice one key component missing: a decent package manager. Home‐
brew fills this void.
To install Homebrew, open Terminal or your favorite OS X terminal emulator and
run the following code:
$ ruby -e "$(curl -fsSL ${BREW_URI})"
The script will explain what changes it will make and prompt you before the installa‐
tion begins. Once you’ve installed Homebrew, insert the Homebrew directory at the
top of your PATH environment variable.3 You can do this by adding the following line
at the bottom of your ~/.profile file:
export PATH=/usr/local/bin:/usr/local/sbin:$PATH
4A symbolic link is a pointer to the actual file location. You can confirm where the link points to by typing, for
example, ls -l /usr/local/bin/python3 at the command prompt.
5Packages that are compliant with Setuptools at a minimum provide enough information for the library to
identify and obtain all package dependencies. For more information, see the documentation for Packaging
and Distributing Python Projects, PEP 302, and PEP 241.
And then to install Python, run this once in a terminal:
$ brew install python3
Or for Python 2:
$ brew install python
By default, Python will then be installed in /usr/local/Cellar/python3/ or /usr/local/Cellar/python/ with symbolic links4 to the interpreter at /usr/local/bin/python3 or /usr/local/bin/python. People who use the --user option to pip install will need to work
around a bug involving distutils and the Homebrew configuration. We recommend
just using virtual environments, described in “virtualenv” on page 11.
Setuptools and pip
Homebrew installs Setuptools and pip for you. The executable installed with pip will
be mapped to pip3 if you are using Python 3 or to pip if you are using Python 2.
With Setuptools, you can download and install any compliant5 Python software over
a network (usually the Internet) with a single command (easy_install). It also ena‐
bles you to add this network installation capability to your own Python software with
very little work.
Both pip’s pip command and Setuptools’s easy_install command are tools to install
and manage Python packages. pip is recommended over easy_install because it
can also uninstall packages, its error messages are more digestible, and partial pack‐
age installs can’t happen (installs that fail partway through will unwind everything
that happened so far). For a more nuanced discussion, see pip vs easy_install in the
Python Packaging User Guide, which should be your first reference for current pack‐
aging information.
To upgrade your installation of pip, type the following in a shell:
$ pip install --upgrade pip
virtualenv
virtualenv creates isolated Python environments. It creates a folder containing all
the necessary executables to use the packages that a Python project would need. Some
6Advocates of this practice say it is the only way to ensure nothing ever overwrites an existing installed library
with a new version that could break other version-dependent code in the OS.
7For additional details, see the pip installation instructions.
people believe best practice is to install nothing except virtualenv and Setuptools and
to then always use virtual environments.6
To install virtualenv via pip, run pip at the command line of a terminal shell:
$ pip3 install virtualenv
or if you are using Python 2:
$ pip install virtualenv
Once you are in a virtual environment, you can always use the command pip,
whether you are working with Python 2 or Python 3, so that is what we will do in the
rest of this guide. “Virtual Environments” on page 34 describes usage and motivation
in more detail.
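A typical virtualenv session looks something like the following sketch (the environment name myenv and the installed package are arbitrary examples; the standard library’s python3 -m venv works the same way):

```shell
# Create an isolated environment in ./myenv and install into it.
virtualenv myenv
source myenv/bin/activate    # on Windows: myenv\Scripts\activate
pip install requests         # installs into ./myenv only
deactivate                   # back to the system Python
```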
Installing Python on Linux
Ubuntu started releasing with only Python 3 installed (and Python 2 available via
apt-get) as of Wily Werewolf (Ubuntu 15.10). All of the details are on Ubuntu’s Python page. Fedora’s release 23 is the first with only Python 3 (both Python 2.7 and 3
are available on releases 20–22), and otherwise Python 2.7 will be available via its
package manager.
Most parallel installations of Python 2 and Python 3 make a symbolic link from
python2 to a Python 2 interpreter and from python3 to a Python 3 interpreter. If you
decide to use Python 2, the current recommendation on Unix-like systems (see
Python Enhancement Proposal [PEP 394]) is to explicitly specify python2 in your
shebang notation (e.g., #!/usr/bin/env python2 as the first line in the file) rather
than rely on the environment python pointing where you expect.
Although not in PEP 394, it has also become convention to use pip2 and pip3 to link
to the respective pip package installers.
Setuptools and pip
Even if pip is available through a package installer on your system, to ensure you get
the most recent version, follow these steps.
First, download get-pip.py.
Next, open a shell, change directories to the same location as get-pip.py, and type:
8Packages that are compliant with Setuptools at a minimum provide enough information for it to identify and
obtain all package dependencies. For more information, see the documentation for Packaging and Distribut‐
ing Python Projects, PEP 302, and PEP 241.
$ wget https://bootstrap.pypa.io/get-pip.py
$ sudo python3 get-pip.py
or for Python 2:
$ wget https://bootstrap.pypa.io/get-pip.py
$ sudo python get-pip.py
This will also install Setuptools.
With the easy_install command that’s installed with Setuptools, you can download
and install any compliant8 Python software over a network (usually the Internet). It
also enables you to add this network installation capability to your own Python soft‐
ware with very little work.
pip is a tool that helps you easily install and manage Python packages. It is recom‐
mended over easy_install because it can also uninstall packages, its error messages
are more digestible, and partial package installs can’t happen (installs that fail partway
through will unwind everything that happened so far). For a more nuanced discus‐
sion, see pip vs easy_install in the Python Packaging User Guide, which should
be your first reference for current packaging information.
Development Tools
Almost everyone will at some point want to use Python libraries that depend on C
extensions. Sometimes your package manager will have these, prebuilt, so you can
check first (using yum search or apt-cache search); and with the newer wheels for‐
mat (precompiled, platform-specific binary files), you may be able to get binaries
directly from PyPI, using pip. But if you expect to create C extensions in the future,
or if the people maintaining your library haven’t made wheels for your platform, you
will need the development tools for Python: various C libraries, make, and the GCC
compiler. The following are some useful packages that use C libraries:
Concurrency tools
The threading library threading
The event-handling library (Python 3.4+) asyncio
The coroutine-based networking library curio
The coroutine-based networking library gevent
The event-driven networking library Twisted
Scientic analysis
The linear algebra library NumPy
The numerical toolkit SciPy
The machine learning library scikit-learn
The plotting library Matplotlib
Data/database interface
The interface to the HDF5 data format h5py
The PostgreSQL database adapter Psycopg
The database abstraction and object-relational mapper SQLAlchemy
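As a taste of what these libraries provide, assuming NumPy is already installed (e.g., via pip3 install --user numpy), solving a small linear system takes a single call:

```python
import numpy as np

# Solve the linear system A.x = b for x
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
b = np.array([4.0, 9.0])

x = np.linalg.solve(A, b)
print(x)  # [2. 3.]
```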
On Ubuntu, in a terminal shell, type:
$ sudo apt-get update --fix-missing
$ sudo apt-get install python3-dev # For Python 3
$ sudo apt-get install python-dev # For Python 2
Or on Fedora, in a terminal shell, type:
$ sudo yum update
$ sudo yum install gcc
$ sudo yum install python3-devel # For Python 3
$ sudo yum install python2-devel # For Python 2
and then pip3 install --user desired-package will be able to build tools that
must be compiled. (Or pip install --user desired-package for Python 2.) You
also will need the tool itself installed (for details on how to do this, see the HDF5
installation documentation). For PostgreSQL on Ubuntu, you'd type this in a terminal:
$ sudo apt-get install libpq-dev
or on Fedora:
$ sudo yum install postgresql-devel
virtualenv is a command installed with the virtualenv package that creates isolated
Python environments. It creates a folder containing all the necessary executables to
use the packages that a Python project would need.
To install virtualenv using Ubuntu's package manager, type:
$ sudo apt-get install python-virtualenv
or on Fedora:
9Or consider IronPython (discussed in “IronPython” on page 6) if you want to integrate Python with the .NET
framework. But if you’re a beginner, this should probably not be your first Python interpreter. This whole
book talks about CPython.
$ sudo yum install python-virtualenv
Or via pip, run pip at the command line of a terminal shell, and use the --user
option to install it locally for yourself rather than doing a system install:
$ pip3 install --user virtualenv
or if you are using Python 2:
$ pip install --user virtualenv
Once you are in a virtual environment, you can always use the command pip,
whether you are working with Python 2 or Python 3, so that is what we will do in the
rest of this guide. “Virtual Environments” on page 34 describes usage and motivation
in more detail.
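A typical session looks like the following sketch; it uses the standard library's python3 -m venv (Python 3.3+), which behaves the same way as the virtualenv command, and requests is just an example package:

```shell
# Create an isolated environment in the folder myproject-env
python3 -m venv myproject-env

# Activate it: this prepends myproject-env/bin to your PATH
. myproject-env/bin/activate

pip install requests   # installs into myproject-env only, no sudo needed
deactivate             # restore the original PATH
```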
Installing Python on Windows
Windows users have it harder than other Pythonistas—because it's harder to compile anything on Windows, and many Python libraries use C extensions under the hood.
Thanks to wheels, binaries can be downloaded from PyPI using pip (if they exist), so
things have gotten a little easier.
There are two paths here: a commercial distribution (discussed in “Commercial Python Redistributions” on page 18) or straight-up CPython. Anaconda is much easier, especially when you're going to do scientific work. Actually, pretty much everyone who does scientific computing on Windows with Python (except those developing C-based Python libraries of their own) will recommend Anaconda. But if you know your way around compiling and linking, if you want to contribute to open source projects that use C code, or if you just don't want a commercial distribution (what you need is free), we hope you consider installing straight-up CPython.9
As time progresses, more and more packages with C libraries will have wheels on
PyPI, and so can be obtained via pip. The trouble comes when required C library
dependencies are not bundled with the wheel. This dependency problem is another
reason you may prefer commercial Python redistributions like Anaconda.
Use CPython if you are the kind of Windows user who:
Doesn’t need Python libraries that rely on C extensions
Owns a Visual C++ compiler (not the free one)
Can handle setting up MinGW
10 You must know at least what version of Python you’re using and whether you selected 32-bit or 64-bit Python.
We recommend 32-bit, as every third-party DLL will have a 32-bit version and some may not have 64-bit
versions. The most widely cited location to obtain compiled binaries is Christoph Gohlke's resource site. For
scikit-learn, Carl Kleffner is building binaries using MinGW in preparation for eventual release on PyPI.
11 Anaconda has more free stuff, and comes bundled with Spyder, a better IDE. If you use Anaconda, you'll find Anaconda's free package index and Canopy's package index to be helpful.
12 Meaning you are 100% certain that any Dynamically Linked Libraries (DLLs) and drivers you need are avail‐
able in 64 bit.
13 The PATH lists every location the operating system will look to find executable programs, like Python and
Python scripts like pip. Each entry is separated by a semicolon.
Is game to download binaries by hand10 and then pip install the binary
If you will use Python as a substitute for R or MATLAB, or just want to get up to
speed quickly and will install CPython later if necessary (see “Commercial Python Redistributions” on page 18 for some tips), use Anaconda.11
If you want your interface to be mostly graphical (point-and-click), or if Python is
your first language and this is your first install, use Canopy.
If your entire team has already committed to one of these options, then you should
go with whatever is currently being used.
To install the standard CPython implementation on Windows, you first need to
download the latest version of Python 3 or Python 2.7 from the official website. If you
want to be sure you are installing a fully up-to-date version (or are certain you really,
really want the 64-bit installer12), then use the Python Releases for Windows site to
find the release you need.
The Windows version is provided as an MSI package. This format allows Windows
administrators to automate installation with their standard tools. To install the pack‐
age manually, just double-click the file.
By design, Python installs to a directory with the version number embedded (e.g.,
Python version 3.5 will install at C:\Python35\) so that you can have multiple versions
of Python on the same system without conflicts. Of course, only one interpreter can
be the default application for Python file types. The installer does not automatically
modify the PATH environment variable,13 so that you always have control over which
copy of Python is run.
Typing the full path name for a Python interpreter each time quickly gets tedious, so
add the directories for your default Python version to the PATH. Assuming that the
Python installation you want to use is in C:\Python35\, you will want to add this to
your PATH:
14 Windows PowerShell provides a command-line shell and scripting language that is similar enough to Unix shells that Unix users will be able to function without reading a manual, but with features specifically for use with Windows. It is built on the .NET Framework. For more information, see Microsoft's “Using Windows PowerShell.”
15 The installer will prompt you whether it's OK to overwrite the existing installation. Say yes; releases in the
same minor version are backward-compatible.
16 For additional details, see the pip installation instructions.
17 Packages that are compliant with Setuptools at a minimum provide enough information for the library to
identify and obtain all package dependencies. For more information, see the documentation for “Packaging and Distributing Python Projects,” PEP 302, and PEP 241.
You can do this easily by running the following in PowerShell:14
PS C:\> [Environment]::SetEnvironmentVariable(
            "Path",
            "$env:Path;C:\Python35\;C:\Python35\Scripts\",
            "User")
The second directory (Scripts) receives command files when certain packages are
installed, so it is a very useful addition. You do not need to install or configure anything else to use Python.
Having said that, we strongly recommend installing Setuptools, pip, and virtualenv
before you start building Python applications for real-world use (i.e., contributing to
collaborative projects). You'll learn more about these tools and how to install them
later in this section. In particular, you should always install Setuptools, as it makes it
much easier for you to use other third-party Python libraries.
Setuptools and pip
The current MSI packaged installers install Setuptools and pip for you with Python,
so if you are following along with this book and just installed now, you have them
already. Otherwise, the best way to get them with Python 2.7 installed is to upgrade to the newest release.15 For Python 3, in versions 3.3 and prior, download the script get-pip.py,16 and run it. Open a shell, change directories to the same location as get-pip.py, and then type:
PS C:\> python get-pip.py
With Setuptools, you can download and install any compliant17 Python software over
a network (usually the Internet) with a single command (easy_install). It also ena‐
bles you to add this network installation capability to your own Python software with
very little work.
Both pip's pip command and Setuptools's easy_install command are tools to install and manage Python packages. pip is recommended over easy_install because it can also uninstall packages, its error messages are more digestible, and partial package installs can't happen (installs that fail partway through will unwind everything that happened so far). For a more nuanced discussion, see “pip vs easy_install” in the Python Packaging User Guide, which should be your first reference for current packaging information.
The virtualenv command creates isolated Python environments. It creates a folder
containing all the necessary executables to use the packages that a Python project
would need. Then, when you activate the environment using a command in the new
folder, it prepends that folder to your PATH environment variable—the Python in the
new folder becomes the first one found, and the packages in its subfolders are the
ones used.
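The lookup rule this relies on is ordinary PATH resolution: the first directory containing a match wins. Here is a small POSIX-flavored sketch (the directory and fake executable are made up for illustration; on Windows the same idea applies with Scripts\ instead of bin/):

```python
import os
import shutil
import stat
import tempfile

# Make a directory containing a fake "python" executable, as an
# activated virtual environment's bin/ folder would.
env_bin = tempfile.mkdtemp()
fake = os.path.join(env_bin, "python")
with open(fake, "w") as f:
    f.write("#!/bin/sh\necho fake\n")
os.chmod(fake, os.stat(fake).st_mode | stat.S_IEXEC)

# Prepend it to PATH -- exactly what an activate script does.
os.environ["PATH"] = env_bin + os.pathsep + os.environ["PATH"]

# "python" now resolves to the shadowing copy first.
print(shutil.which("python"))  # the copy inside env_bin
```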
To install virtualenv via pip, run pip at the command line of a PowerShell terminal:
PS C:\> pip install virtualenv
“Virtual Environments” on page 34 describes usage and motivation in more detail.
On OS X and Linux, because Python comes installed for use by system or third-party software, users must specifically distinguish between the Python 2 and Python 3 versions of pip. On Windows, there is no need to do this, so whenever we say pip3, we
mean pip for Windows users. Regardless of OS, once you are in a virtual environ‐
ment, you can always use the command pip, whether you are working with Python 2
or Python 3, so that is what we will do in the rest of this guide.
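One spelling that sidesteps the pip/pip3 naming question entirely, on every platform, is to ask a specific interpreter to run its own copy of pip with the -m switch:

```shell
python3 -m pip --version                # the pip belonging to python3
python3 -m pip install --user requests  # requests is just an example
```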
Commercial Python Redistributions
Your IT department or classroom teaching assistant may have asked you to install a
commercial redistribution of Python. This is intended to simplify the work an orga‐
nization needs to do to maintain a consistent environment for multiple users. All of
the ones listed here provide the C implementation of Python (CPython).
A technical reviewer for the first draft of this chapter said we massively understated
the trouble it is to use a regular CPython installation on Windows for most users: that
even with wheels, compiling and/or linking to external C libraries is painful for
everyone but seasoned developers. We have a bias toward plain CPython, but the truth is if you're going to be a consumer of libraries and packages (as opposed to a creator or contributor), you should just download a commercial redistribution and get on with your life—doubly so if you're a Windows user. Later, when you want to contribute to open source, you can install the regular distribution of CPython.
18 | Chapter 2: Properly Installing Python
18 Intel and Anaconda have a partnership, and all of the Intel accelerated packages are only available using
conda. However, you can always conda install pip and use pip (or pip install conda and use conda)
when you want to.
It is easier to go back to a standard Python installation if you do
not alter the default settings in vendor-specific installations.
Here's what these commercial distributions have to offer:
The Intel Distribution for Python
The purpose of the Intel Distribution for Python is to deliver high-performance
Python in an easy-to-access, free package. The primary boost to performance
comes from linking Python packages with native libraries such as the Intel Math
Kernel Library (MKL), and enhanced threading capabilities that include the Intel
Threading Building Blocks (TBB) library. It relies on Continuum's conda for package management, but also comes with pip. It can be downloaded by itself or installed from within a conda environment.18
It provides the SciPy stack and the other common libraries listed in the release
notes (PDF). Customers of Intel Parallel Studio XE get commercial support and
everyone else can use the forums for help. So, this option gives you the scientific
libraries without too much fuss, and otherwise is a regular Python distribution.
Continuum Analytics’ Anaconda
Continuum Analytics’ distribution of Python is released under the BSD license
and provides tons of precompiled science and math binaries on its free package
index. It has a different package manager than pip, called conda, that also man‐
ages virtual environments, but acts more like Buildout (discussed in “Buildout
on page 38) than like virtualenv—managing libraries and other external depen‐
dencies for the user. The package formats are incompatible, so each installer can't install from the other's package index.
The Anaconda distribution comes with the SciPy stack and other tools. Anaconda has the best license and the most stuff for free; if you're going to use a commercial distribution—especially if you're already comfortable working with the command line and like R or Scala (also bundled)—choose this. If you don't need all of those other things, use the miniconda distribution instead. Customers get various levels of indemnification (related to open source licenses, and who can use what when, or who gets sued for what), commercial support, and extra Python libraries.
Commercial Python Redistributions | 19
ActiveState's ActivePython
ActiveState's distribution is released under the ActiveState Community License and is free for evaluation only; otherwise it requires a license. ActiveState also
provides solutions for Perl and Tcl. The main selling point of this distribution is
broad indemnification (again related to open source licenses) for the more than
7,000 packages in its cultivated package index, reachable using the ActiveState
pypm tool, a replacement for pip.
Enthought’s Canopy
Enthought's distribution is released under the Canopy Software License, with a
package manager, enpkg, that is used in place of pip to connect to Canopy’s pack‐
age index.
Enthought provides free academic licenses to students and staff from degree-granting institutions. Distinguishing features of Enthought's distribution are
graphical tools to interact with Python, including its own IDE that resembles
MATLAB, a graphical package manager, a graphical debugger, and a graphical
data manipulation tool. Like the other commercial redistributors, there is indem‐
nification and commercial support, in addition to more packages for customers.
1 If at some point you want to build C extensions for Python, check out “Extending Python with C or C++.” For more details, see Chapter 15 of Python Cookbook.
Your Development Environment
This chapter provides an overview of the text editors, integrated development envi‐
ronments (IDEs), and other development tools currently popular in the Python edit
→ test → debug cycle.
We unabashedly prefer Sublime Text (discussed in “Sublime Text” on page 23) as an
editor and PyCharm/IntelliJ IDEA (discussed in “PyCharm/IntelliJ IDEA” on page 29) as an IDE but recognize that the best option depends on the type of coding you
do and the other languages you use. This chapter lists a number of the most popular
ones and reasons for choosing them.
Python does not need build tools like Make or Java's Ant or Maven because it is interpreted, not compiled,1 so we do not discuss them here. But in Chapter 6, we'll describe how to use Setuptools to package projects and Sphinx to build documentation.
We also won't cover version control systems, as these are language-independent, but
the people who maintain the C (reference) implementation of Python just moved
from Mercurial to Git (see PEP 512). The original justification to use Mercurial, in
PEP 374, contains a small but useful comparison between today’s top four options:
Subversion, Bazaar, Git, and Mercurial.
This chapter concludes with a brief review of the current ways to manage different
interpreters to replicate different deployment situations while coding.
Text Editors
Just about anything that can edit plain text will work for writing Python code; how‐
ever, choosing the right editor can save you hours per week. All of the text editors
listed in this section support syntax highlighting and can be extended via plug-ins to
use static code checkers (linters) and debuggers.
Table 3-1 lists our favorite text editors in descending order of preference and articu‐
lates why a developer would choose one over the other. The rest of the chapter briefly
elaborates on each editor. Wikipedia has a very detailed text editor comparison chart
for those who need to check for specific features.
Table 3-1. Text editors at a glance

Sublime Text: Open API/has free trial. OS X, Linux, Windows.
It's fast, with a small footprint. It handles large (> 2 GB) files well. Extensions are written in Python.

Vim: Open source/donations appreciated. OS X, Linux, Windows, Unix.
You already love Vi/Vim. It (or at least Vi) is preinstalled on every OS except Windows. It can be a console application.

Emacs: Open source/donations appreciated. OS X, Linux, Windows, Unix.
You already love Emacs. Extensions are written in Lisp. It can be a console application.

TextMate: Open source/need a license. OS X only.
Great UI. Nearly all interfaces (static code check/debug/test) come bundled. Good Apple tools—for example, the interface to xcodebuild (via the Xcode bundle).

Atom: Open source/free. OS X, Linux, Windows.
Extensions are written in JavaScript/HTML/CSS. Very nice GitHub integration.

Code: Open API (eventually)/free. OS X, Linux, Windows (but Visual Studio, the corresponding IDE, only works on Windows).
IntelliSense (code completion) worthy of Microsoft's Visual Studio. Good for Windows devs, with support for .Net, C#, and F#. Caveat: not yet extensible (to come).
2 Snippets are sets of frequently typed code, like CSS styles or class definitions, that can be autocompleted if you type a few characters and then hit the Tab key.
Sublime Text
Sublime Text is our recommended text editor for code, markup, and prose. Its speed
is the first thing cited when people recommend it; the number of packages available
(3,000+) is next.
Sublime Text was first released in 2008 by Jon Skinner. Written in Python, it has
excellent support for editing Python code and uses Python for its package extension
API. A “Projects” feature allows the user to add/remove files or folders—these can
then be searched via the “Goto Anything” function, which identifies locations within
the project that contain the search term(s).
You need Package Control to access the Sublime Text package repository. Popular packages include SublimeLinter, an interface to the user's selection of installed static code checkers; Emmet for web development snippets;2 and Sublime SFTP for remote editing via FTP.
Anaconda (no relation to the commercial Python distribution of the same name),
released in 2013, by itself turns Sublime almost into an IDE, complete with static code
checks, docstring checks, a test runner, and capability to look up the definition of or
locate uses of highlighted objects.
Vim
Vim is a console-based text editor (with optional GUI) that uses keyboard shortcuts for editing instead of menus or icons. It was first released in 1991 by Bram Moolenaar, and its predecessor, Vi, was released in 1976 by Bill Joy. Both are written in C.
Vim is extensible via vimscript, a simple scripting language. There are options to use other languages: to enable Python scripting, set the build configuration flags --enable-pythoninterp and/or --enable-python3interp before you build from the C source. To check whether Python or Python 3 support is enabled, type :echo has("python") or :echo has("python3"); the result will be “1” if True or “0” if False.
Vi (and frequently Vim) is available out of the box on pretty much every system but
Windows, and there is an executable installer for Vim on Windows. Users who can
tolerate the learning curve will become extremely efficient; so much so that the basic Vi key bindings are available as a configuration option in most other editors and IDEs.
3 Just open the editor by typing vi (or vim) then Enter on the command line, and once inside, type :help then Enter for the tutorial.
4 To locate your home directory on Windows, open Vim and type :echo $HOME.
If you want to work for a large company in any sort of IT role, a functioning awareness of Vi is necessary.3 Vim is much more featureful than Vi, but is close enough that a Vim user can function in Vi.
If you only develop in Python, you can set the default settings for indentation and
line wrapping to values compliant with PEP 8. To do that, create a file called .vimrc in
your home directory,4 and add the following:
set textwidth=79 " lines longer than 79 columns will be broken
set shiftwidth=4 " operation >> indents 4 columns; << unindents 4 columns
set tabstop=4 " a hard TAB displays as 4 columns
set expandtab " insert spaces when hitting TABs
set softtabstop=4 " insert/delete 4 spaces when hitting a TAB/BACKSPACE
set shiftround " round indent to multiple of 'shiftwidth'
set autoindent " align the new line indent with the previous line
With these settings, newlines are inserted after 79 characters and indentation is set to
four spaces per tab, and if you are inside an indented statement, your next line will
also be indented to the same level.
There is also a syntax plug-in called python.vim that features some improvements
over the syntax file included in Vim 6.1, and a small plug-in, SuperTab, that makes
code completion more convenient by using the Tab key or any other customized keys.
If you also use Vim for other languages, there is a handy plug-in called indent, which
handles indentation settings for Python source files.
These plug-ins supply you with a basic environment for developing in Python. If your
Vim is compiled with +python (the default for Vim 7 and newer), you can also use the
plug-in vim-flake8 to do static code checks from within the editor. It provides the
function Flake8, which runs PEP8 and Pyflakes, and can be mapped to any hotkey or
action you want in Vim. The plug-in will display errors at the bottom of the screen
and provide an easy way to jump to the corresponding line.
If you think it's handy, you can make Vim call Flake8 every time you save a Python
file by adding the following line to your .vimrc:
autocmd BufWritePost *.py call Flake8()
Or, if you are already using syntastic, you can set it to run Pyflakes on write and show
errors and warnings in the quickfix window. Here's an example configuration to do
that and also show status and warning messages in the status bar:
5 We love Raymond Hettinger. If everyone coded the way he recommends, the world would be a much better place.
set statusline+=%#warningmsg#
set statusline+=%{SyntasticStatuslineFlag()}
set statusline+=%*
let g:syntastic_auto_loc_list=1
let g:syntastic_loc_list_height=5
Python-mode is a complex solution for working with Python code in Vim. If you like any of the features listed here, use it (but be aware it will slow down Vim's launch a little bit):
Asynchronous Python code checking (pylint, pyflakes, pep8, mccabe), in any combination
Code refactoring and autocompletion with rope
Fast Python folding (you can hide and show code within indents)
Support for virtualenv
The ability to search through Python documentation and run Python code
Auto PEP8 error fixes
Emacs
Emacs is another powerful text editor. It now has a GUI but can still be run directly in the console. It is fully programmable (Lisp), and with a little work can be wired up as a Python IDE. Masochists and Raymond Hettinger5 use it.
Emacs is written in Lisp and was first released in 1976 by Richard Stallman and Guy L. Steele, Jr. Built-in features include remote edit (via FTP), a calendar, mail send/read, and even a shrink (Esc, then x, then doctor). Popular plug-ins include YASnippet to map custom code snippets to keystrokes, and Tramp for debugging. It is extensible via its own dialect of Lisp, elisp.
If you are already an Emacs user, EmacsWiki's “Python Programming in Emacs” has the best advice for Python packages and configuration. Those new to Emacs can get
started with the official Emacs tutorial.
There are three major Python modes for Emacs right now:
6 Electron is a platform to build cross-platform desktop applications using HTML, CSS, and JavaScript.
Fabián Ezequiel Gallina's python.el, now bundled with Emacs (version 24.3+), implements syntax highlighting, indentation, movement, shell interaction, and a number of other common Emacs edit-mode features.
Jorgen Schäfer's Elpy aims to provide a full-featured interactive development environment within Emacs, including debugging, linters, and code completion.
Python's source distribution ships with an alternate version in the directory Misc/python-mode.el. You can download it from the Web as a separate file from Launchpad. It has some tools for programming by speech, additional keystroke shortcuts, and allows you to set up a complete Python IDE.
TextMate
TextMate is a GUI with Emacs roots that works only on OS X. It has a truly Apple-worthy user interface that somehow manages to be unobtrusive while exposing all of the commands with minimal discovery effort.
TextMate is written in C++ and was first released in 2004 by Allan Odgaard and Ciarán Walsh. Sublime Text (discussed in “Sublime Text” on page 23) can directly import TextMate snippets, and Microsoft's Code (discussed in “Code” on page 27) can directly import TextMate syntax highlighting.
Snippets in any language can be added in bundled groups, and it can otherwise be
extended with shell scripts: the user can highlight some text and pipe it as standard
input through the script using the Cmd+| (pipe) key combination. The script output
replaces the highlighted text.
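Such a command is just an ordinary Unix filter: the selected text arrives on standard input, and the script's standard output replaces it. For example, a made-up filter that lowercases a selection behaves like this:

```shell
# Stand-in for a highlighted selection piped through a filter script
echo 'def Totally_Broken_Name():' | tr 'A-Z' 'a-z'
# prints: def totally_broken_name():
```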
It has built-in syntax highlighting for Apple's Swift and Objective-C, and (via the Xcode bundle) an interface to xcodebuild. A veteran TextMate user will not have problems coding in Python using this editor. New users who don't spend much time coding for Apple products are probably better off with the newer cross-platform editors that borrow heavily from TextMate's best-loved features.
Atom
Atom is a “hackable text editor for the 21st century,” according to the folks at GitHub who created it. It was first released in 2014, is written in CoffeeScript (JavaScript) and Less (CSS), and is built on top of Electron (formerly Atom Shell),6 which is GitHub's application shell based on io.js and Chromium.
Atom is extensible via JavaScript and CSS, and users can add snippets in any language (including TextMate-style snippet definitions). As you'd expect, it has very nice GitHub integration. It comes with native package control and a plethora of packages (2,000+). Recommended for Python development is Linter combined with linter-flake8. Web developers may also like the Atom development server, which runs a small HTTP server and can display the HTML preview within Atom.
Code
Microsoft announced Code in 2015. It is a free, closed source text editor in the Visual Studio family, also built on GitHub's Electron. It is cross-platform and has key bindings just like TextMate.
It comes with an extension API—check out the VS Code Extension Marketplace to browse existing extensions—and merges what its developers thought were the best parts of TextMate and Atom with Microsoft. It has IntelliSense (code completion) worthy of Visual Studio, and good support for .Net, C#, and F#.
Visual Studio (the sister IDE to the Code text editor) still only works on Windows,
even though Code is cross-platform.
IDEs
Many developers use both a text editor and an IDE, switching to the IDE for larger, more complex, or more collaborative projects. Table 3-2 highlights the distinguishing features of some popular IDEs, and the sections that follow provide more in-depth information on each one.
One feature frequently cited as a reason to go to a full IDE (outside of great code completion and debugging tools) is the ability to quickly switch between Python interpreters (e.g., from Python 2 to Python 3 to IronPython); this is available in the free version of all of the IDEs listed in Table 3-2, and Visual Studio now offers this at all levels.
Additional features that may or may not come free are tools that interface with ticket‐
ing systems, deployment tools (e.g., Heroku or Google App Engine), collaboration
tools, remote debugging, and extra features for using web development frameworks
such as Django.
Table 3-2. IDEs at a glance

PyCharm/IntelliJ IDEA: Open API/paid professional edition; open source/free community edition. OS X, Linux, Windows.
Nearly perfect code completion. Good support for virtual environments. Good support for web frameworks (in the paid version).

Aptana Studio 3 / Eclipse + LiClipse + PyDev: Open source/free. OS X, Linux, Windows.
You already love Eclipse. Java support (LiClipse/Eclipse).

WingIDE: Open API/free trial. OS X, Linux, Windows.
Great debugger (web)—best of the IDEs listed here. Extensible via Python.

Spyder: Open source/free. OS X, Linux, Windows.
Data science: IPython integrated, and it is bundled with NumPy, SciPy, and matplotlib. The default IDE in popular scientific Python distributions: Anaconda, Python(x,y), and WinPython.

NINJA-IDE: Open source/donations appreciated. OS X, Linux, Windows.
Intentionally lightweight. Strong Python focus.

Komodo IDE: Open API/text editor (Komodo Edit) is open source. OS X, Linux, Windows.
Python, PHP, Perl, Ruby, Node. Extensions are based on Mozilla add-ons.

Eric (the Eric Python IDE): Open source/donations appreciated. OS X, Linux, Windows.
Ruby + Python. Intentionally lightweight. Great debugger (scientific)—can debug one thread while others continue.

Visual Studio (Community): Open API/free community edition; paid professional or enterprise editions. Windows only.
Great integration with Microsoft languages and tools. IntelliSense (code completion) is fantastic. Project management and deployment assistance, including sprint planning tools and manifest templates in the Enterprise edition. Caveat: cannot use virtual environments except in the Enterprise (most expensive) edition.
28 | Chapter 3: Your Development Environment
PyCharm/IntelliJ IDEA
PyCharm is our favorite Python IDE. The top reasons are its nearly perfect code
completion tools, and the quality of its tools for web development. Those in the sci‐
entific community recommend the free edition (which doesn’t have the web develop‐
ment tools) as just fine for their needs, but not as often as they choose Spyder
(discussed in “Spyder” on page 30).
PyCharm is developed by JetBrains, also known for IntelliJ IDEA, a proprietary Java
IDE that competes with Eclipse. PyCharm (first released in 2010) and IntelliJ IDEA
(first released in 2001) share the same code base, and most of PyCharm's features can be brought to IntelliJ with the free Python plug-in.
JetBrains recommends PyCharm for a simpler UI, or IntelliJ IDEA if you want to
introspect into Jython functions, perform cross-language navigation, or do Java-to-
Python refactoring. (PyCharm works with Jython but only as a possible choice of interpreter; the introspection tools aren't there.) The two are licensed separately—so choose before you buy.
The IntelliJ Community Edition and PyCharm Community Edition are open sourced (Apache 2.0 License) and free.
Aptana Studio 3/Eclipse + LiClipse + PyDev
Eclipse is written in Java and was first released in 2001 by IBM as an open, versatile Java IDE. PyDev, the Eclipse plug-in for Python development, was released in 2003 by Aleks Totic, who later passed the torch to Fabio Zadrozny. It is the most popular Eclipse plug-in for Python development.
Although the Eclipse community doesn't push back online when people advocate for IntelliJ IDEA in forums comparing the two, Eclipse is still the most commonly used Java IDE. This is relevant for Python developers who interface with tools written in Java, as many popular ones (e.g., Hadoop, Spark, and proprietary versions of these) come with instructions and plug-ins for development with Eclipse.
A fork of PyDev is baked into Aptana's Studio 3, which is an open source suite of plug-ins bundled with Eclipse that provides an IDE for Python (and Django), Ruby (and Rails), HTML, CSS, and PHP. The primary focus of Aptana's owner, Appcelerator, is the Appcelerator Studio, a proprietary mobile platform for HTML, CSS, and JavaScript that requires a monthly license (once your app goes live). General PyDev and Python support is there, but is not a priority. That said, if you like Eclipse and are primarily a JavaScript developer making apps for mobile platforms with occasional forays into Python, especially if you use Appcelerator at work, Aptana's Studio 3 is a good choice.
LiClipse was born out of a desire to have a better multilanguage experience in Eclipse, and easy access to fully dark themes (i.e., in addition to the text background, menus and borders will also be dark). It is a proprietary suite of Eclipse plug-ins written by Zadrozny; part of the (optional) license fees go to keeping PyDev totally free and open source (EPL License, the same as Eclipse). It comes bundled with PyDev, so Python users don't need to install PyDev themselves.
WingIDE
WingIDE is a Python-specific IDE; it is probably the second most popular Python IDE after PyCharm. It runs on Linux, Windows, and OS X.
Its debugging tools are very good and include tools to debug Django templates. WingIDE users cite its debugger, the quick learning curve, and a lightweight footprint as reasons they prefer this IDE.
Wing was released in 2000 by Wingware and is written in Python, C, and C++. It supports extensions but does not have a plug-in repository yet, so users have to search others' blogs or GitHub accounts to find existing packages.
Spyder
Spyder (an abbreviation of Scientific PYthon Development EnviRonment) is an IDE specifically geared toward working with scientific Python libraries.
Spyder is written in Python by Carlos Córdoba. It is open source (MIT License), and offers code completion, syntax highlighting, a class and function browser, and object inspection. Other features are available via community plug-ins.
Spyder includes integration with pyflakes, pylint, and rope, and comes bundled with NumPy, SciPy, IPython, and Matplotlib. It is itself bundled with the popular scientific Python distributions Anaconda, Python(x,y), and WinPython.
NINJA-IDE
NINJA-IDE (from the recursive acronym “Ninja-IDE Is Not Just Another IDE”) is a cross-platform IDE designed to build Python applications. It runs on Linux/X11, Mac OS X, and Windows. Installers for these platforms can be downloaded from NINJA-IDE's website.
NINJA-IDE is developed in Python and Qt, is open sourced (GPLv3 License), and is intentionally lightweight. Out of the box, its best-liked features are that it highlights problem code when running static code checkers or debugging, and that it can preview web pages in-browser. It is extensible via Python and has a plug-in repository. The idea is that users will add only the tools they need.
Development slowed for a while, but a new NINJA-IDE v3 is planned for some time
in 2016, and there is still active communication on the NINJA-IDE listserv. The com‐
munity has many native Spanish speakers, including the core development team.
Komodo IDE
Komodo IDE is developed by ActiveState and is a commercial IDE for Windows, Mac, and Linux. Komodo Edit, the IDE's text editor, is the open source (Mozilla Public License) alternative.
Komodo was first released in 2000 by ActiveState and uses the Mozilla and Scintilla code base. It is extensible via Mozilla add-ons. It supports Python, Perl, Ruby, PHP, Tcl, SQL, Smarty, CSS, HTML, and XML. Komodo Edit does not have a debugger, but one is available as a plug-in. The IDE does not support virtual environments, but does allow the user to select which Python interpreter to use. Django support is not as extensive as in WingIDE, PyCharm, or Eclipse + PyDev.
Eric (the Eric Python IDE)
Eric is open source (GPLv3 License) with more than 10 years of active development. It is written in Python and based on the Qt GUI toolkit, integrating the Scintilla editor control. It is named after Eric Idle, a member of the Monty Python troupe, and in homage to the IDLE IDE that is bundled with Python distributions.
Its features include source code autocompletion, syntax highlighting, support for version control systems, Python 3 support, an integrated web browser, a Python shell, an integrated debugger, and a flexible plug-in system. It does not have extra tools for web development.
Like NINJA-IDE and Komodo IDE, it is intentionally lightweight. Faithful users believe it has the best debugging tools around, including the ability to stop and debug one thread while others continue to run. If you wish to use Matplotlib for interactive plotting in this IDE, you must use the Qt4 backend:
# This must come first:
import matplotlib
matplotlib.use('Qt4Agg')
# And then pyplot will use the Qt4 backend:
import matplotlib.pyplot as plt
The most recent documentation for the Eric IDE is available on its website. Users leaving positive notes on Eric IDE's web page are almost all from the scientific computation (e.g., weather models, or computational fluid dynamics) community.
Visual Studio
Professional programmers who work with Microsoft products on Windows will want Visual Studio. It is written in C++ and C#, and its first version appeared in 1995. In late 2014, the first Visual Studio Community Edition was made available for free for noncommercial developers.
If you intend to work primarily with enterprise software and use Microsoft products like C# and F#, this is your IDE.
Be sure to install with the Python Tools for Visual Studio (PTVS), which is a checkbox in the list of custom installation options that is unchecked by default. The instructions for installing PTVS with Visual Studio, or after installing Visual Studio, are on the PTVS wiki page.
Enhanced Interactive Tools
The tools listed here enhance the interactive shell experience. IDLE is actually an IDE, but it is not included in the preceding section because most people do not consider it robust enough to use in the same way (for enterprise projects) as the other IDEs listed; however, it is fantastic for teaching. IPython is incorporated into Spyder by default and can be incorporated into other IDEs. These tools do not replace the Python interpreter, but rather augment the user's chosen interpreter shell with additional tools and features.
IDLE
IDLE, which stands for Integrated Development and Learning Environment (and is also the last name of Monty Python member Eric Idle), is part of the Python standard library; it is distributed with Python.
IDLE is completely written in Python by Guido van Rossum (Python's BDFL, Benevolent Dictator for Life) and uses the Tkinter GUI toolkit. Though IDLE is not suited for full-blown development using Python, it is quite helpful to try out small Python snippets and experiment with different features in Python.
It provides the following features:
A Python shell window (interpreter)
A multiwindow text editor that colorizes Python code
Minimal debugging capability
IPython
IPython provides a rich toolkit to help you make the most out of using Python interactively. Its main components are:
Powerful Python shells (terminal- and Qt-based)
A web-based notebook with the same core features as the terminal shell, plus
support for rich media, text, code, mathematical expressions, and inline plots
Support for interactive data visualization (i.e., when configured, your Matplotlib
plots pop up in windows) and use of GUI toolkits
Flexible, embeddable interpreters to load into your own projects
Tools for high-level and interactive parallel computing
To install IPython, type the following in a terminal shell or in PowerShell:
$ pip install ipython
bpython
bpython is an alternative interface to the Python interpreter for Unix-like operating systems. It has the following features:
Inline syntax highlighting
Auto indentation and autocompletion
Expected parameter list for any Python function
A “rewind” function to pop the last line of code from memory and re-evaluate it
The ability to send entered code to a pastebin (to share code online)
The ability to save entered code to a file
To install bpython, type the following in a terminal shell:
$ pip install bpython
Isolation Tools
This section provides more details about the most widely used isolation tools, from
virtualenv, which isolates Python environments from each other, to Docker, which
virtualizes the entire system.
These tools provide various levels of isolation between the running application and
its host environment. They make it possible to test and debug code against different
versions of Python and library dependencies, and can be used to provide a consistent
deployment environment.
8 Or if you prefer, use Set-ExecutionPolicy AllSigned instead.
Virtual Environments
A Python virtual environment keeps dependencies required by different projects in separate places. By installing multiple Python environments, your global site-packages directory (where user-installed Python packages are stored) can stay clean and manageable, and you can simultaneously work on a project that, for example, requires Django 1.3 while also maintaining a project that requires Django 1.0.
The virtualenv command does this by creating a separate folder that contains a softlink to the Python executable, a copy of pip, and a place for Python libraries. It prepends that location to the PATH upon activation, and then returns the PATH to its original state when deactivated. It is also possible to use the system-installed version of Python and system-installed libraries, via command-line options.
You can't move a virtual environment once it's created: the paths in the executables are all hardcoded to the current absolute path to the interpreter in the virtual environment's bin/ directory.
Create and activate the virtual environment
Setup and activation of Python virtual environments is slightly different on different
operating systems.
On Mac OS X and Linux. You can specify the version of Python with the --python argument. Then, use the activate script to set the PATH, entering the virtual environment:
$ cd my-project-folder
$ virtualenv --python python3 my-venv
$ source my-venv/bin/activate
On Windows. If you haven't already, you should set the system execution policies to allow a locally created script to run.8 To do this, open PowerShell as an administrator, and type:
PS C:\> Set-ExecutionPolicy RemoteSigned
Reply Y to the question that appears, exit, and then, in a regular PowerShell, you can create a virtual environment like so:
PS C:\> cd my-project-folder
PS C:\> virtualenv --python python3 my-venv
PS C:\> .\my-venv\Scripts\activate
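Once activated, you can confirm from within Python that you are running the virtual environment's interpreter. A small sketch, assuming a CPython interpreter (virtualenv stores the original prefix in sys.real_prefix, while the standard library's venv module sets sys.base_prefix):

```python
import sys

def in_virtualenv():
    """Return True when the running interpreter belongs to a
    virtual environment rather than the system installation."""
    # virtualenv exposes sys.real_prefix; the stdlib venv module
    # keeps the original location in sys.base_prefix instead.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base
```

Calling in_virtualenv() returns True inside an activated my-venv and False when run with the system interpreter.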
9 POSIX stands for Portable Operating System Interface. It comprises a set of IEEE standards for how an operating system should behave: the behavior of and interface to basic shell commands, I/O, threading, and other services and utilities. Most Linux and Unix distributions are considered POSIX compatible, and Darwin (the operating system underneath Mac OS X and iOS) has been compatible since Leopard (10.5). A “POSIX system” is a system that is considered POSIX compatible.
Add libraries to the virtual environment
Once you have activated the virtual environment, the first pip executable in your
path will be the one located in the my-venv folder you just made, and it will install
libraries into the following directory:
my-venv/lib/python3.4/site-packages/ (on POSIX9 systems)
my-venv\Lib\site-packages (on Windows)
When bundling your own packages or projects for other people, you can use:
$ pip freeze > requirements.txt
while the virtual environment is active. This writes all of the currently installed packages (which are hopefully also project dependencies) to the file requirements.txt. Collaborators can install all of the dependencies in their own virtual environment when given a requirements.txt file by typing:
$ pip install -r requirements.txt
pip will install the listed dependencies, overriding dependency specifications in subpackages if conflicts exist. Dependencies specified in requirements.txt are intended to set the entire Python environment. To set dependencies when distributing a library, it is better to use the install_requires keyword argument to the setup() function in a setup.py file.
Be careful not to use pip install -r requirements.txt outside of a virtual environment. If you do, and anything in requirements.txt is a different version than the one installed on your computer, pip will overwrite the other version of the library with the one specified in requirements.txt.
Deactivate the virtual environment
To return to normal system settings, type:
$ deactivate
For more information, see the Virtual Environments docs, the official virtualenv docs,
or the official Python packaging guide. The package pyvenv, which is distributed as
part of the Python standard library in Python versions 3.3 and above, does not
replace virtualenv (in fact, it is a dependency of virtualenv), so these instructions
work for all versions of Python.
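On Python 3.3 and above, the same create-and-activate workflow also works with the built-in venv module (a sketch; my-venv is an arbitrary name):

```shell
# Create and activate a virtual environment using the standard
# library's venv module instead of the virtualenv package.
python3 -m venv my-venv
. my-venv/bin/activate
```
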
pyenv
pyenv is a tool that allows multiple versions of the Python interpreter to be used at the same time. This solves the problem of having different projects that each require different versions of Python, but you would still need to use virtual environments if the dependency conflict is in the libraries (e.g., requiring different Django versions). For example, you can install Python 2.7 for compatibility in one project, and still use Python 3.5 as the default interpreter. pyenv isn't limited to the CPython versions: it will also install PyPy, Anaconda, Miniconda, Stackless, Jython, and IronPython interpreters.
pyenv works by filling a shims directory with a shim version of the Python interpreter and of executables like pip and 2to3. These will be the executables found if the directory is prepended to the $PATH environment variable. A shim is a pass-through function that interprets the current situation and selects the most appropriate function to perform the desired task. For example, when the system looks for a program named python, it looks inside the shims directory first and uses the shim version, which in turn passes the command on to pyenv. pyenv then works out which version of Python should be run based on environment variables, .python-version files, and the global default.
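In spirit, the version lookup a shim delegates to can be sketched in a few lines of Python (a simplification for illustration; the real pyenv is implemented as shell scripts, though the variable and file names below follow its conventions):

```python
import os

def pick_python_version(cwd, env, global_default="3.5.1"):
    """Mimic pyenv's resolution order: the PYENV_VERSION environment
    variable wins, then the nearest .python-version file found while
    walking up from the current directory, then the global default."""
    if "PYENV_VERSION" in env:
        return env["PYENV_VERSION"]
    directory = os.path.abspath(cwd)
    while True:
        marker = os.path.join(directory, ".python-version")
        if os.path.exists(marker):
            with open(marker) as f:
                return f.read().strip()
        parent = os.path.dirname(directory)
        if parent == directory:  # reached the filesystem root
            return global_default
        directory = parent
```
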
For virtual environments, there is the plug-in pyenv-virtualenv, which automates the
creation of different environments, and also makes it possible to use the existing
pyenv tools to switch to different environments.
Autoenv
Autoenv provides a lightweight option to manage different environment settings outside of the scope of virtualenv. It overrides the cd shell command so that when you change into a directory containing a .env file (e.g., one setting the PATH or an environment variable with a database URL), Autoenv automagically activates the environment, and when you cd out of it, the effect is undone. It does not work in Windows PowerShell.
Install it on Mac OS X using brew:
$ brew install autoenv
or on Linux:
$ git clone git:// ~/.autoenv
$ echo 'source ~/.autoenv/' >> ~/.bashrc
and then open a new terminal shell.
virtualenvwrapper
virtualenvwrapper provides a set of commands that extend Python virtual environments for more control and better manageability. It places all your virtual environments in one directory and provides empty hook functions that can be run before or after creation/activation of the virtual environment or of a project; for example, a hook could set environment variables by sourcing the .env file within a directory.
The problem with placing such functions with the installed items is that the user must somehow acquire these scripts to completely duplicate the environment on another machine. It could be useful on a shared development server, though, if all of the environments were placed in a shared folder and used by multiple users.
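As an illustration, here is a hypothetical hook body of the kind virtualenvwrapper runs after activation (virtualenvwrapper looks for a postactivate script under $WORKON_HOME); this one sources a project-local .env file, if present, every time an environment is activated:

```shell
# Hypothetical postactivate hook: after a virtual environment is
# activated, load any .env file found in the current directory.
if [ -f "$PWD/.env" ]; then
    . "$PWD/.env"
fi
```
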
To skip the full virtualenvwrapper installation instructions, first make sure virtualenv is already installed. Then, on OS X or Linux, type the following in a command terminal:
$ pip3 install virtualenvwrapper
(or pip install virtualenvwrapper if you are using Python 2), and add this to your ~/.profile:
export VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python3
and then add the following to your ~/.bash_profile or favorite other shell profile:
source /usr/local/bin/
Finally, close the current terminal window, and open a new one to activate your new profile, and virtualenvwrapper will be available.
On Windows, use virtualenvwrapper-win instead. With virtualenv already installed, type:
PS C:\> pip install virtualenvwrapper-win
Then, on both platforms, the following commands are the most commonly used:
mkvirtualenv my_venv
Creates the virtual environment in the folder ~/.virtualenvs/my_venv. Or on Windows, my_venv will be created in the directory identified by typing %USERPROFILE%\Envs on the command line. The location is customizable via the environment variable $WORKON_HOME.
workon my_venv
Activates the virtual environment or switches from the current environment to the specified one.
deactivate
Deactivates the virtual environment.
10 An egg is a ZIP file with a specific structure, containing distribution content. Eggs have been replaced by wheels as of PEP 427. They were introduced by the very popular (and now de facto) packaging library, Setuptools, which provides a useful interface to the Python Standard Library's distutils. You can read all about the differences between the formats in “Wheel vs Egg” in the Python Packaging User Guide.
rmvirtualenv my_venv
Deletes the virtual environment.
virtualenvwrapper provides tab-completion on environment names, which really helps when you have a lot of environments and have trouble remembering their names. A number of other convenient functions are documented in the full list of virtualenvwrapper commands.
Buildout
Buildout is a Python framework that allows you to create and compose recipes: Python modules containing arbitrary code (usually system calls to make directories or to check out and build source code, and to add non-Python parts to the project, such as a database or a web server). Install it using pip:
$ pip install zc.buildout
Projects that use Buildout would include zc.buildout and the recipes they need in their requirements.txt, or would directly include custom recipes with the source code. They also include the configuration file buildout.cfg and the script bootstrap.py in the project's top directory. If you run the script by typing python bootstrap.py, it will read the configuration file to determine which recipes to use, plus each recipe's configuration options (e.g., the specific compiler flags and library linking flags).
Buildout gives a Python project with non-Python parts portability: another user can reconstruct the same environment. This is different from the script hooks in virtualenvwrapper, which would need to be copied and transmitted along with the requirements.txt file to be able to re-create a virtual environment.
It includes parts to install eggs,10 which can be skipped in the newer versions of Python that use wheels instead. See the Buildout tutorial for more information.
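For a sense of the format, here is a minimal buildout.cfg sketch (the part name and egg name are hypothetical illustrations; zc.recipe.egg is a commonly used real recipe that installs eggs and generates scripts):

```ini
; Hypothetical minimal buildout.cfg
[buildout]
parts = scripts

[scripts]
recipe = zc.recipe.egg
eggs = my-application
```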
Conda
Conda is like pip, virtualenv, and Buildout together. It comes with the Anaconda distribution of Python and is Anaconda's default package manager. It can be installed via pip:
$ pip install conda
And pip can be installed via conda:
11 YAML, which stands for “YAML Ain't Markup Language,” is a markup language intended to be both human-readable and machine-readable.
12 A virtual machine is an application that emulates a computer system by imitating the desired hardware and providing the desired operating system on a host computer.
$ conda install pip
The packages are stored in different repositories (pip pulls from the Python Package Index, PyPI, while conda pulls from Anaconda's own package repositories), and they use different formats, so the tools are not interchangeable.
This table created by Continuum (the creators of Anaconda) provides a side-by-side comparison of all three options: conda, pip, and virtualenv.
conda-build, Continuum's analogue to Buildout, can be installed on all platforms by typing:
$ conda install conda-build
Like with Buildout, the conda-build configuration file format is called a “recipe,” and the recipes are not limited to using Python tools. Unlike Buildout, the code is specified in shell script, not Python, and the configuration is specified in YAML,11 not Python's ConfigParser format.
The main advantage of conda over pip and virtualenv is for Windows users: Python libraries built as C extensions may or may not be present as wheels, but they are almost always present on the Anaconda package index. And if a package is not available via conda, it is possible to install pip and then install packages hosted on PyPI.
Docker
Docker helps with environment isolation like virtualenv, conda, or Buildout, but instead of providing a virtual environment, it provides a Docker container. Containers provide greater isolation than environments. For example, you can have multiple containers running, each with different network interfaces, firewalling rules, and a different hostname. These running containers are managed by a separate utility, the Docker Engine, which coordinates access to the underlying operating system. If you're running Docker containers on OS X, Windows, or on a remote host, you'll also need Docker Machine, which interfaces with the virtual machine(s)12 that will run the Docker Engine.
13 A virtual environment inside of a Docker container will isolate your Python environment, preserving the OS's Python for the utilities that may be needed to support your application, in keeping with our advice to not install anything via pip (or anything else) in your system Python directory.
Docker containers were originally based on Linux Containers, which were themselves originally related to the shell command chroot. chroot is kind of a system-level version of the virtualenv command: it makes it appear that the root directory (/) is at a user-specified path instead of the actual root, providing a completely separate user space.
Docker doesn't use chroot, and it doesn't even use Linux Containers anymore (allowing the universe of Docker images to include Citrix and Solaris machines), but Docker containers are still doing about the same thing. Its configuration files are called Dockerfiles, which build Docker images that can then be hosted on the Docker Hub, the Docker package repository (like PyPI).
Docker images, when configured correctly, can take up less space than environments created using Buildout or conda, because Docker uses the AUFS union file system, which stores the “diff” of an image instead of the whole image. So, for example, if you want to build and test your package against multiple releases of a dependency, you could make a base Docker image that contains a virtual environment13 (or Buildout environment, or conda environment) containing all of the other dependencies. You'd then inherit from that base for all of your other images, adding only the single changing dependency in the last layer. Then, all of the derived containers will contain only the one different library, sharing the contents of the base image. For more information, see the Docker documentation.
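The layering idea can be sketched as a hypothetical Dockerfile (the image tag and package names are illustrative, not from any real project):

```dockerfile
# Base image: everything that does not change between test runs.
FROM python:3.5
RUN pip install virtualenv && \
    virtualenv /venv && \
    /venv/bin/pip install numpy scipy

# A derived image (a separate Dockerfile) would then add one thin layer:
#   FROM my-base-image
#   RUN /venv/bin/pip install django==1.9
```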
Getting Down to Business
We've got our towels, a Python interpreter, virtual environments, and an editor or IDE; we're ready to get down to business. This part does not teach you the language; “Learning Python” on page 289 lists great resources that already do that. Instead, we want you to come out of this part feeling froody, like a real Python insider, knowing the tricks of some of the best Pythonistas in our community. This part includes the following chapters:
Chapter 4, Writing Great Code
We briefly cover style, conventions, idioms, and gotchas that can help new Pythonistas.
Chapter 5, Reading Great Code
We take you on a guided tour of parts of our favorite Python libraries, with the
hope that you’ll be encouraged to do more reading on your own.
Chapter 6, Shipping Great Code
We briefly talk about the Python Packaging Authority and how to load libraries
to PyPI, plus options to build and ship executables.
1 Originally stated by Ralph Waldo Emerson in “Self-Reliance,” it is quoted in PEP 8 to affirm that the coder's best judgment should supersede the style guide. For example, conformity with surrounding code and existing convention is more important than consistency with PEP 8.
Writing Great Code
This chapter focuses on best practices for writing great Python code. We will review coding style conventions that will be used in Chapter 5, and briefly cover logging best practices, plus list a few of the major differences between available open source licenses. All of this is intended to help you write code that is easy for us, your community, to use and extend.
Code Style
Pythonistas (veteran Python developers) celebrate having a language so accessible that people who have never programmed can still understand what a Python program does when they read its source code. Readability is at the heart of Python's design, following the recognition that code is read much more often than it is written.
One reason Python code can be easily understood is its relatively complete set of code style guidelines (collected in the two Python Enhancement Proposals PEP 20 and PEP 8, described in the next few pages) and “Pythonic” idioms. When a Pythonista points to portions of code and says they are not “Pythonic,” it usually means that those lines of code do not follow the common guidelines and fail to express the intent in what is considered the most readable way. Of course, “a foolish consistency is the hobgoblin of little minds.”1 Pedantic devotion to the letter of the PEP can undermine readability and understandability.
PEP 8
PEP 8 is the de facto code style guide for Python. It covers naming conventions, code layout, whitespace (tabs versus spaces), and other similar style topics.
This is highly recommended reading. The entire Python community does its best to adhere to the guidelines laid out within this document. Some projects may stray from it from time to time, while others (like Requests) may amend its recommendations.
Conforming your Python code to PEP 8 is generally a good idea and helps make code more consistent when working on projects with other developers. The PEP 8 guidelines are explicit enough that they can be programmatically checked. There is a command-line program, pep8, that can check your code for conformity. Install it by running the following command in your terminal:
$ pip3 install pep8
Here's an example of the kinds of things you might see when you run pep8:
$ pep8
E401 multiple imports on one line
E302 expected 2 blank lines, found 1
E301 expected 1 blank line, found 0
W602 deprecated form of raising exception
E211 whitespace before '('
E201 whitespace after '{'
E221 multiple spaces before operator
W601 .has_key() is deprecated, use 'in'
The fixes to most of the complaints are straightforward and stated directly in PEP 8.
The code style guide for Requests gives examples of good and bad code and is only
slightly modified from the original PEP 8.
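For instance, two of the complaints above (E401 and E302) and their straightforward fixes, in a contrived snippet of our own:

```python
# E401 flags "import os, sys" on one line; E302 flags having only one
# blank line between top-level definitions. The version below
# satisfies both rules.
import os
import sys


def interpreter_path():
    """Return the path of the running Python interpreter."""
    return sys.executable


def interpreter_dir():
    """Return the directory containing the interpreter."""
    return os.path.dirname(interpreter_path())
```
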
The linters referenced in “Text Editors” on page 22 usually use pep8, so you can also install one of these to run checks within your editor or IDE. Or, the program autopep8 can be used to automatically reformat code in the PEP 8 style. Install the program with:
$ pip3 install autopep8
Use it to format a file in-place (overwriting the original) with:
$ autopep8 --in-place
Excluding the --in-place flag will cause the program to output the modified code directly to the console for review (or piping to another file). The --aggressive flag will perform more substantial changes and can be applied multiple times for greater effect.
2 Tim Peters is a longtime Python user who eventually became one of its most prolific and tenacious core developers (creating Python's sorting algorithm, Timsort), and a frequent Net presence. He was at one point rumored to be a long-running Python port of the Richard Stallman AI program stallman.el. The original conspiracy theory appeared on a listserv in the late 1990s.
PEP 20 (a.k.a. The Zen of Python)
PEP 20, the set of guiding principles for decision making in Python, is always avail‐
able via import this in a Python shell. Despite its name, PEP 20 only contains 19
aphorisms, not 20 (the last has not been written down…).
The true history of the Zen of Python is immortalized in Barry Warsaw’s blog post
import this and the Zen of Python.
The Zen of Python by Tim Peters2
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one—and preferably only one—obvious way to do it.
Although that way may not be obvious at first unless you’re Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea—let's do more of those!
For an example of each Zen aphorism in action, see Hunter Blanks' presentation “PEP 20 (The Zen of Python) by Example.” Raymond Hettinger also put these principles to fantastic use in his talk “Beyond PEP 8: Best Practices for Beautiful, Intelligible Code.”
3di is a shell utility that identifies and shows lines that differ between two files.
General Advice
This section contains style concepts that are hopefully easy to accept without debate, and often applicable to languages other than Python. Some of them are direct from the Zen of Python, but others are just plain common sense. They reaffirm our preference in Python to select the most obvious way to present code, when multiple options are possible.
Explicit is better than implicit
While any kind of black magic is possible with Python, the simplest, most explicit
way to express something is preferred:
Bad:
def make_dict(*args):
    x, y = args
    return dict(**locals())
Good:
def make_dict(x, y):
    return {'x': x, 'y': y}
In the good code, x and y are explicitly received from the caller, and an explicit dictionary is returned. A good rule of thumb is that another developer should be able to read the first and last lines of your function and understand what it does. That's not the case with the bad example. (Of course, it's also pretty easy when the function is only two lines long.)
Sparse is better than dense
Make only one statement per line. Some compound statements, such as list compre‐
hensions, are allowed and appreciated for their brevity and their expressiveness, but it
is good practice to keep disjoint statements on separate lines of code. It also makes for
more understandable diffs3 when revisions to one statement are made:
Bad:
print('one'); print('two')

if x == 1: print('one')

if (<complex comparison> and
        <other complex comparison>):
    # do something
Good:
print('one')
print('two')

if x == 1:
    print('one')

cond1 = <complex comparison>
cond2 = <other complex comparison>
if cond1 and cond2:
    # do something
46 | Chapter 4: Writing Great Code
Gains in readability, to Pythonistas, are more valuable than a few bytes of total code
(for the two-prints-on-one-line statement) or a few microseconds of computation
time (for the extra-conditionals-on-separate-lines statement). Plus, when a group is
contributing to open source, the "good" code's revision history will be easier to decipher because a change on one line can only affect one thing.
Errors should never pass silently / Unless explicitly silenced
Error handling in Python is done using the try statement. An example from Ben Gleitzman's HowDoI package (described more in "HowDoI" on page 93) shows when silencing an error is OK:
def format_output(code, args):
    if not args['color']:
        return code
    lexer = None
    # try to find a lexer using the Stack Overflow tags
    # or the query arguments
    for keyword in args['query'].split() + args['tags']:
        try:
            lexer = get_lexer_by_name(keyword)
            break
        except ClassNotFound:
            pass
    # no lexer found above, use the guesser
    if not lexer:
        lexer = guess_lexer(code)
    return highlight(code, lexer, TerminalFormatter(bg='dark'))
This is part of a package that provides a command-line script to query the Internet
(Stack Overflow, by default) for how to do a particular coding task, and prints it to
the screen. The function format_output() applies syntax highlighting by first searching through the question's tags for a string understood by the lexer (also called a tokenizer; a "python", "java", or "bash" tag will identify which lexer to use to split and colorize the code), and then, if that fails, by trying to infer the language from the code itself. There are three paths the program can follow when it reaches the try statement:
1. Execution enters the try clause (everything between the try and the except), a lexer is successfully found, the loop breaks, and the function returns the code highlighted with the selected lexer.
2. The lexer is not found, the ClassNotFound exception is thrown, it's caught, and nothing is done. The loop continues until it finishes naturally or a lexer is found.
3. Some other exception occurs (like a KeyboardInterrupt) that is not handled, and it is raised up to the top level, stopping execution.
The "should never pass silently" part of the Zen aphorism discourages the use of overzealous error trapping. Here's an example you can try in a separate terminal so that you can kill it more easily once you get the point:
>>> while True:
...     try:
...         print("nyah", end=" ")
...     except:
...         pass
Or don't try it. The except clause without any specified exception will catch everything, including KeyboardInterrupt (Ctrl+C in a POSIX terminal), and ignore it; so it swallows the dozens of interrupts you try to give it to shut the thing down. It's not just the interrupt issue—a broad except clause can also hide bugs, leaving them to cause some problem later on, when it will be harder to diagnose. We repeat, don't let errors pass silently: always explicitly identify by name the exceptions you will catch, and handle only those exceptions. If you simply want to log or otherwise acknowledge the exception and re-raise it, like in the following snippet, that's OK. Just don't let the error pass silently (without handling or re-raising it):
>>> while True:
...     try:
...         print("ni", end="-")
...     except:
...         print("An exception happened. Raising.")
...         raise
Function arguments should be intuitive to use
Your choices in API design will determine the downstream developer’s experience
when interacting with a function. Arguments can be passed to functions in four dif‐
ferent ways:
def func(positional, keyword=value, *args, **kwargs):
Positional arguments are mandatory and have no default values.
Keyword arguments are optional and have default values.
An arbitrary argument list is optional and has no default values.
An arbitrary keyword argument dictionary is optional and has no default values.
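As a quick illustration of all four kinds at once, here is a minimal sketch (the function and argument names are made up for this example):

```python
def func(positional, keyword="default", *args, **kwargs):
    # positional is mandatory; keyword is optional with a default value;
    # args collects extra positional arguments into a tuple;
    # kwargs collects extra named arguments into a dictionary.
    return positional, keyword, args, kwargs

print(func(1))                       # (1, 'default', (), {})
print(func(1, "k", 2, 3, extra=4))   # (1, 'k', (2, 3), {'extra': 4})
```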
Here are tips for when to use each method of argument passing:
Positional arguments
Use these when there are only a few function arguments, which are fully part of the function's meaning, with a natural order. For instance, in send(message, recipient) or point(x, y), the user of the function has no difficulty remembering that those two functions require two arguments, and in which order.
Usage antipattern: It is possible to use argument names, and switch the order of
arguments when calling functions—for example, calling send(recipi
ent="World", message="The answer is 42.") and point(y=2, x=1). This
reduces readability and is unnecessarily verbose. Use the more straightforward
calls to send("The answer is 42", "World") and point(1, 2).
Keyword arguments
When a function has more than two or three positional parameters, its signature
is more difficult to remember, and using keyword arguments with default values
is helpful. For instance, a more complete send function could have the signature
send(message, to, cc=None, bcc=None). Here cc and bcc are optional and
evaluate to None when they are not passed another value.
Usage antipattern: It is possible to follow the order of arguments in the definition without explicitly naming the arguments, like in send("42", "Frankie", "Benjy", "Trillian"), sending a blind carbon copy to Trillian. It is also possible to name arguments in another order, like in send("42", "Frankie", bcc="Trillian", cc="Benjy"). Unless there's a strong reason not to, it's better to use the form that is closest to the function definition: send("42", "Frankie", cc="Benjy", bcc="Trillian").
Never is often better than right now
It is often harder to remove an optional argument (and its logic
inside the function) that was added “just in case” and is seemingly
never used, than to add a new optional argument and its logic
when needed.
Arbitrary argument list
Defined with the *args construct, it denotes an extensible number of positional
arguments. In the function body, args will be a tuple of all the remaining posi‐
tional arguments. For example, send(message, *args) can also be called with
each recipient as an argument: send("42", "Frankie", "Benjy", "Tril
lian"); and in the function body, args will be equal to ("Frankie", "Benjy",
"Trillian"). A good example of when this works is the print function.
Caveat: If a function receives a list of arguments of the same nature, it's often clearer to use a list or any sequence. Here, if send has multiple recipients, we can define it explicitly: send(message, recipients) and call it with send("42", ["Benjy", "Frankie", "Trillian"]).
Arbitrary keyword argument dictionary
Defined via the **kwargs construct, it passes an undetermined series of named
arguments to the function. In the function body, kwargs will be a dictionary of all
the passed named arguments that have not been caught by other keyword argu‐
ments in the function signature. An example of when this is useful is in logging;
formatters at different levels can seamlessly take what information they need
without inconveniencing the user.
Caveat: The same caution as in the case of *args is necessary, for similar reasons:
these powerful techniques are to be used when there is a proven necessity to use
them, and they should not be used if the simpler and clearer construct is sufficient to express the function's intention.
The variable names *args and **kwargs can (and should) be
replaced with other names, when other names make more sense.
It is up to the programmer writing the function to determine which arguments are
positional arguments and which are optional keyword arguments, and to decide
whether to use the advanced techniques of arbitrary argument passing. After all,
there should be one—and preferably only one—obvious way to do it. Other users will
appreciate your effort when your Python functions are:
Easy to read (meaning the name and arguments need no explanation)
Easy to change (meaning adding a new keyword argument won't break other parts of the code)
If the implementation is hard to explain, it’s a bad idea
A powerful tool for hackers, Python comes with a very rich set of hooks and tools
allowing you to do almost any kind of tricky tricks. For instance, it is possible to:
Change how objects are created and instantiated
Change how the Python interpreter imports modules
Embed C routines in Python
All these options have drawbacks, and it is always better to use the most straightfor‐
ward way to achieve your goal. The main drawback is that readability suffers when
using these constructs, so whatever you gain must be more important than the loss of
readability. Many code analysis tools, such as pylint or pyflakes, will be unable to
parse this “magic” code.
A Python developer should know about these nearly infinite possibilities, because it
instills confidence that no impassable problem will be on the way. However, knowing
how and particularly when not to use them is very important.
Like a kung fu master, a Pythonista knows how to kill with a single finger, and never
to actually do it.
We are all responsible users
As already demonstrated, Python allows many tricks, and some of them are potentially dangerous. A good example is that any client code can override an object's properties and methods: there is no "private" keyword in Python. This philosophy is very different from highly defensive languages like Java, which provide a lot of mechanisms to prevent any misuse, and is expressed by the saying: "We are all responsible users."
This doesn't mean that, for example, no properties are considered private, and that
proper encapsulation is impossible in Python. Rather, instead of relying on concrete
walls erected by the developers between their code and others’ code, the Python com‐
munity prefers to rely on a set of conventions indicating that these elements should
not be accessed directly.
The main convention for private properties and implementation details is to prefix all "internals" with an underscore (e.g., sys._getframe). If the client code breaks this rule and accesses these marked elements, any misbehavior or problems encountered if the code is modified are the responsibility of the client code.
Using this convention generously is encouraged: any method or property that is not
intended to be used by client code should be prefixed with an underscore. This will
guarantee a better separation of duties and easier modification of existing code; it will
always be possible to publicize a private property, but making a public property pri‐
vate might be a much harder operation.
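For instance, a small hypothetical class following this convention might look like the following; the leading underscore on _rpm signals "internal detail, do not touch," while the rpm property is the public face:

```python
class Flywheel:
    def __init__(self):
        self._rpm = 0  # leading underscore: internal detail, not public API

    def spin_up(self, rpm):
        self._rpm = rpm

    @property
    def rpm(self):
        # The public, read-only view of the internal state.
        return self._rpm

wheel = Flywheel()
wheel.spin_up(100)
print(wheel.rpm)  # 100
```

Client code that reaches for wheel._rpm directly still works, but it has broken the convention and owns the consequences.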
Return values from one place
When a function grows in complexity, it is not uncommon to use multiple return statements inside the function's body. However, to keep a clear intent and sustain readability, it is best to return meaningful values from as few points in the body as possible.
The two ways to exit from a function are upon error, or with a return value after the function has been processed normally. In cases when the function cannot perform correctly, it can be appropriate to return a None or False value. In this case, it is better to return from the function as soon as the incorrect context has been detected, to flatten the structure of the function: all the code after the return-because-of-failure statement can assume the condition is met to further compute the function's main result.
Having multiple such return statements is often necessary.
Still, when possible, keep a single exit point—it's difficult to debug functions when you first have to identify which return statement is responsible for your result. Forcing the function to exit in just one place also helps to factor out some code paths, as the multiple exit points are probably a hint that such a refactoring is needed. This example is not bad code, but it could possibly be made clearer, as indicated in the comments:
def select_ad(third_party_ads, user_preferences):
    if not third_party_ads:
        return None  # Raising an exception might be better
    if not user_preferences:
        return None  # Raising an exception might be better
    # Some complex code to pick the best_ad given the
    # available ads and the individual's preferences...
    # Resist the temptation to return best_ad if succeeded...
    if not best_ad:
        # Some Plan-B computation of best_ad
        ...
    return best_ad  # A single exit point for the returned value
                    # will help when maintaining the code
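To make the shape of that advice concrete, here is a runnable sketch in the same spirit (select_value and its inputs are hypothetical stand-ins for select_ad):

```python
def select_value(candidates, preferences):
    # Early returns flatten the structure when the context is wrong...
    if not candidates:
        return None  # raising an exception might be better
    if not preferences:
        return None  # raising an exception might be better
    # ...then all remaining code computes toward one final result.
    best = next((c for c in candidates if c in preferences), None)
    if best is None:
        best = candidates[0]  # Plan-B computation
    return best  # the single exit point for the successful path

print(select_value(["a", "b"], ["b"]))  # b
print(select_value(["a", "b"], ["z"]))  # a
```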
Conventions make sense to everyone, but may not be the only way to do things. The
conventions we show here are the more commonly used choices, and we recommend
them as the more readable option.
Alternatives to checking for equality
When you don’t need to explicitly compare a value to True, or None, or 0, you can just
add it to the if statement, like in the following examples. (See “Truth Value Testing”
for a list of what is considered false).
Bad:
if attr == True:
    print('True!')
Good:
# Just check the value
if attr:
    print('attr is truthy!')
# or check for the opposite
if not attr:
    print('attr is falsey!')
# but if you only want 'True'
if attr is True:
    print('attr is True')
Bad:
if attr == None:
    print('attr is None!')
Good:
# explicitly check for None
if attr is None:
    print('attr is None!')
Accessing dictionary elements
Use the x in d syntax instead of the dict.has_key method (which was removed in Python 3), or pass a default argument to dict.get():
Bad:
>>> d = {'hello': 'world'}
>>> if d.has_key('hello'):
...     print(d['hello'])  # prints 'world'
... else:
...     print('default_value')
Good:
>>> d = {'hello': 'world'}
>>> print(d.get('hello', 'default_value'))
world
>>> print(d.get('howdy', 'default_value'))
default_value
>>> # Or:
... if 'hello' in d:
...     print(d['hello'])
world
Manipulating lists
List comprehensions provide a powerful, concise way to work with lists (for more
information, see the entry in The Python Tutorial). Also, the map() and filter()
functions can perform operations on lists using a different, more concise syntax:
Standard loop:
# Filter elements greater than 4
a = [3, 4, 5]
b = []
for i in a:
    if i > 4:
        b.append(i)
List comprehension:
# The list comprehension is clearer
a = [3, 4, 5]
b = [i for i in a if i > 4]
# Or (list() is needed because filter returns an iterator in Python 3):
b = list(filter(lambda x: x > 4, a))

Standard loop:
# Add three to all list members.
a = [3, 4, 5]
for i in range(len(a)):
    a[i] += 3
List comprehension:
# Also clearer in this case
a = [3, 4, 5]
a = [i + 3 for i in a]
# Or (again wrapping in list() for Python 3):
a = list(map(lambda i: i + 3, a))
Use enumerate() to keep a count of your place in the list. It is more readable than
manually creating a counter, and it is better optimized for iterators:
>>> a = ["icky", "icky", "icky", "p-tang"]
>>> for i, item in enumerate(a):
...     print("{i}: {item}".format(i=i, item=item))
0: icky
1: icky
2: icky
3: p-tang

4 A max of 80 characters according to PEP 8, 100 according to many others, and for you, whatever your boss says. Ha! But honestly, anyone who's ever had to use a terminal to debug code while standing up next to a rack will quickly come to appreciate the 80-character limit (at which code doesn't wrap on a terminal) and in fact prefer 75–77 characters to allow for line numbering in Vi.
5 See Zen 14. Guido, our BDFL, happens to be Dutch.
Continuing a long line of code
When a logical line of code is longer than the accepted limit,4 you need to split it over
multiple physical lines. The Python interpreter will join consecutive lines if the last
character of the line is a backslash. This is helpful in some cases, but should usually
be avoided because of its fragility: a whitespace character added to the end of the line,
after the backslash, will break the code and may have unexpected results.
A better solution is to use parentheses around your elements. Left with an unclosed
parenthesis on an end-of-line, the Python interpreter will join the next line until the
parentheses are closed. The same behavior holds for curly braces and square brackets:
Bad:
french_insult = \
    "Your mother was a hamster, and \
    your father smelt of elderberries!"

from some.deep.module.inside.a.module \
    import a_nice_function, \
        another_nice_function
Good:
french_insult = (
    "Your mother was a hamster, and "
    "your father smelt of elderberries!")

from some.deep.module.inside.a.module import (
    a_nice_function, another_nice_function)
However, more often than not, having to split a long logical line is a sign that you are
trying to do too many things at the same time, which may hinder readability.
Idioms
Although there usually is one—and preferably only one—obvious way to do it, the
way to write idiomatic (or Pythonic) code can be non-obvious to Python beginners at
first (unless they’re Dutch5). So, good idioms must be consciously acquired.
Unpacking
If you know the length of a list or tuple, you can assign names to its elements with unpacking. For example, because it's possible to specify the number of times to split a string in split() and rsplit(), the righthand side of an assignment can be made to split only once (e.g., into a filename and an extension), and the lefthand side can contain both destinations simultaneously, in the correct order, like this:
>>> filename, ext = "my_photo.orig.png".rsplit(".", 1)
>>> print(filename, "is a", ext, "file.")
my_photo.orig is a png file.
You can use unpacking to swap variables as well:
a, b = b, a
Nested unpacking works, too:
a, (b, c) = 1, (2, 3)
In Python 3, a new method of extended unpacking was introduced by PEP 3132:
a, *rest = [1, 2, 3]
# a = 1, rest = [2, 3]
a, *middle, c = [1, 2, 3, 4]
# a = 1, middle = [2, 3], c = 4
Ignoring a value
If you need to assign something while unpacking, but will not need that variable, use
a double underscore (__):
filename = 'foobar.txt'
basename, __, ext = filename.rpartition('.')
Many Python style guides recommend a single underscore (_) for
throwaway variables rather than the double underscore (__) rec‐
ommended here. The issue is that a single underscore is commonly
used as an alias for the gettext.gettext() function, and is also
used at the interactive prompt to hold the value of the last opera‐
tion. Using a double underscore instead is just as clear and almost
as convenient, and eliminates the risk of accidentally overwriting
the single underscore variable, in either of these other use cases.
Creating a length-N list of the same thing
Use the Python list * operator to make a list of the same immutable item:
>>> four_nones = [None] * 4
>>> print(four_nones)
[None, None, None, None]
6 By the way, this is why only hashable objects can be stored in sets or used as dictionary keys. To make your own Python objects hashable, define an object.__hash__(self) member function that returns an integer. Objects that compare equal must have the same hash value. The Python documentation has more information.
But be careful with mutable objects: because lists are mutable, the * operator will cre‐
ate a list of N references to the same list, which is not likely what you want. Instead,
use a list comprehension:
Bad:
>>> four_lists = [[]] * 4
>>> four_lists[0].append("Ni")
>>> print(four_lists)
[['Ni'], ['Ni'], ['Ni'], ['Ni']]
Good:
>>> four_lists = [[] for __ in range(4)]
>>> four_lists[0].append("Ni")
>>> print(four_lists)
[['Ni'], [], [], []]
A common idiom for creating strings is to use str.join() on an empty string. This
idiom can be applied to lists and tuples:
>>> letters = ['s', 'p', 'a', 'm']
>>> word = ''.join(letters)
>>> print(word)
spam
Searching for an item in a collection
Sometimes we need to search through a collection of things. Let's look at two options: lists and sets.
Take the following code for example:
>>> x = list(('foo', 'foo', 'bar', 'baz'))
>>> y = set(('foo', 'foo', 'bar', 'baz'))
>>> print(x)
['foo', 'foo', 'bar', 'baz']
>>> print(y)
{'foo', 'bar', 'baz'}
>>> 'foo' in x
True
>>> 'foo' in y
True
Even though both membership tests look identical, the lookup performance differs: 'foo' in y takes advantage of the fact that sets (and dictionaries) in Python are hash tables.6 Python has to step through each item in the list to find a matching case, which is time-consuming (the time difference becomes significant for larger collections), but finding keys in the set can be done quickly, using the hash lookup. Also, sets and dictionaries drop duplicate entries, which is why dictionaries cannot have two identical keys. For more information, see this Stack Overflow discussion on list versus dict.
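A rough way to see the difference yourself is to time both membership tests with the standard library's timeit module (the collection size and repetition count here are arbitrary):

```python
import timeit

setup = "x = list(range(10000)); y = set(x)"
# Search for the worst-case element (the last one) repeatedly.
list_time = timeit.timeit("9999 in x", setup=setup, number=1000)
set_time = timeit.timeit("9999 in y", setup=setup, number=1000)
print(set_time < list_time)  # the set lookup wins
```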
Exception-safe contexts
It is common to use try/finally clauses to manage resources like files or thread
locks when exceptions may occur. PEP 343 introduced the with statement and a con‐
text manager protocol into Python (in version 2.5 and beyond)—an idiom to replace
these try/finally clauses with more readable code. The protocol consists of two
methods, __enter__() and __exit__(), that when implemented for an object allow
it to be used via the new with statement, like this:
>>> import threading
>>> some_lock = threading.Lock()
>>> with some_lock:
... # Make Earth Mark One, run it for 10 million years ...
... print(
... "Look at me: I design coastlines.\n"
... "I got an award for Norway."
... )
which would previously have been:
>>> import threading
>>> some_lock = threading.Lock()
>>> some_lock.acquire()
>>> try:
... # Make Earth Mark One, run it for 10 million years ...
... print(
... "Look at me: I design coastlines.\n"
... "I got an award for Norway."
... )
... finally:
... some_lock.release()
The standard library module contextlib provides additional tools that help turn functions into context managers, enforce the call of an object's close() method, suppress exceptions (Python 3.4 and greater), and redirect standard output and error streams (Python 3.4 or 3.5 and greater). Here is an example use of contextlib.closing():
>>> from contextlib import closing
>>> with closing(open("outfile.txt", "w")) as output:
...     output.write("Well, he's...he's, ah...probably pining for the fjords.")
7 In this case, the __exit__() method just calls the I/O wrapper's close() method, to close the file descriptor. On many systems, there's a maximum allowable number of open file descriptors, and it's good practice to release them when they're done.
but because __enter__() and __exit__() methods are defined for the object that handles file I/O,7 we can use the with statement directly, without the closing:
>>> with open("outfile.txt", "w") as output:
...     output.write("PININ' for the FJORDS?!?!?!? "
...                  "What kind of talk is that?, look, why did he fall "
...                  "flat on his back the moment I got 'im home?\n")
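Beyond closing, the same module's contextmanager decorator lets you write a context manager as a simple generator function. This sketch (the names are invented) records which side of the yield runs on entry and which on exit:

```python
from contextlib import contextmanager

events = []

@contextmanager
def managed(name):
    events.append("enter " + name)     # runs on __enter__
    try:
        yield name                     # the with-block body runs here
    finally:
        events.append("exit " + name)  # runs on __exit__, even on error

with managed("lock") as resource:
    events.append("using " + resource)

print(events)  # ['enter lock', 'using lock', 'exit lock']
```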
Common Gotchas
For the most part, Python aims to be a clean and consistent language that avoids sur‐
prises. However, there are a few cases that can be confusing to newcomers.
Some of these cases are intentional but can be potentially surprising. Some could
arguably be considered language warts. In general, though, what follows is a collection of potentially tricky behaviors that might seem strange at first glance, but are generally sensible once you're aware of the underlying cause for the surprise.
Mutable default arguments
Seemingly the most common surprise new Python programmers encounter is Python's treatment of mutable default arguments in function definitions.
What you wrote:
def append_to(element, to=[]):
    to.append(element)
    return to
What you might have expected to happen:
my_list = append_to(12)
print(my_list)
my_other_list = append_to(42)
print(my_other_list)
A new list is created each time the function is called if a second argument isn't provided, so that the output is:
[12]
[42]
What actually happens:
[12]
[12, 42]
A new list is created once when the function is defined, and the same list is used in each successive call: Python's default arguments are evaluated once when the function is defined, not each time the function is called (as they are in, say, Ruby). This means that if you use a mutable default argument and mutate it, you will have mutated that object for all future calls to the function as well.
What you should do instead:
Create a new object each time the function is called, by using a default arg to signal that no argument was provided (None is often a good choice):
def append_to(element, to=None):
    if to is None:
        to = []
    to.append(element)
    return to
When this gotcha isn’t a gotcha:
Sometimes you can specifically “exploit” (i.e., use as intended) this behavior to
maintain state between calls of a function. This is often done when writing a
caching function (which stores results in-memory), for example:
def time_consuming_function(x, y, cache={}):
    args = (x, y)
    if args in cache:
        return cache[args]
    # Otherwise this is the first time with these arguments.
    # Do the time-consuming operation...
    result = ...  # placeholder for the expensive result
    cache[args] = result
    return result
Late binding closures
Another common source of confusion is the way Python binds its variables in clo‐
sures (or in the surrounding global scope).
What you wrote:
def create_multipliers():
    return [lambda x: i * x for i in range(5)]
What you might have expected to happen:
for multiplier in create_multipliers():
print(multiplier(2), end=" ... ")
A list containing five functions that each have their own closed-over i variable
that multiplies their argument, producing:
0 ... 2 ... 4 ... 6 ... 8 ...
What actually happens:
8 ... 8 ... 8 ... 8 ... 8 ...
Five functions are created, but all of them just multiply x by 4. Why? Python's closures are late binding. This means that the values of variables used in closures are looked up at the time the inner function is called.
Here, whenever any of the returned functions are called, the value of i is looked up in the surrounding scope at call time. By then, the loop has completed, and i is left with its final value of 4.
What's particularly nasty about this gotcha is the seemingly prevalent misinformation that this has something to do with lambda expressions in Python. Functions created with a lambda expression are in no way special; in fact, the exact same behavior is exhibited by just using an ordinary def:
def create_multipliers():
    multipliers = []
    for i in range(5):
        def multiplier(x):
            return i * x
        multipliers.append(multiplier)
    return multipliers
What you should do instead:
The most general solution is arguably a bit of a hack. Due to Python's aforementioned behavior concerning the evaluation of default arguments to functions (see "Mutable default arguments" on page 58), you can create a closure that binds immediately to its arguments by using a default argument:
def create_multipliers():
    return [lambda x, i=i: i * x for i in range(5)]
Alternatively, you can use the functools.partial() function:
from functools import partial
from operator import mul
def create_multipliers():
return [partial(mul, i) for i in range(5)]
When this gotcha isn’t a gotcha:
Sometimes you want your closures to behave this way. Late binding is good in
lots of situations (e.g., in the Diamond project, “Example use of a closure (when
the gotcha isn’t a gotcha)” on page 109). Looping to create unique functions is
unfortunately a case where it can cause hiccups.
Structuring Your Project
By structure we mean the decisions you make concerning how your project best meets its objective. The goal is to best leverage Python's features to create clean, effective code. In practical terms, that means the logic and dependencies in both your code and in your file and folder structure are clear.
Which functions should go into which modules? How does data flow through the
project? What features and functions can be grouped together and isolated? By
answering questions like these, you can begin to plan, in a broad sense, what your
finished product will look like.
The Python Cookbook has a chapter on modules and packages that describes in detail how __import__ statements and packaging work. The purpose of this section is to outline aspects of Python's module and import systems that are central to enforcing structure in your project. We then discuss various perspectives on how to build code that can be extended and tested reliably.
Thanks to the way imports and modules are handled in Python, it is relatively easy to
structure a Python project: there are few constraints and the model for importing
modules is easy to grasp. Therefore, you are left with the pure architectural task of
crafting the different parts of your project and their interactions.
Modules are one of Python's main abstraction layers, and probably the most natural one. Abstraction layers allow a programmer to separate code into parts that hold related data and functionality.
For example, if one layer of a project handles interfacing with user actions, while
another handles low-level manipulation of data, the most natural way to separate
these two layers is to regroup all interfacing functionality in one file, and all low-level
operations in another file. This grouping places them into two separate modules. The
interface file would then import the low-level file with the import module or from
module import attribute statements.
As soon as you use import statements, you also use modules. These can be either built-in modules (such as os and sys), third-party packages you have installed in your environment (such as Requests or NumPy), or your project's internal modules.
The following code shows some example import statements and confirms that an
imported module is a Python object with its own data type:
>>> import sys # built-in module
>>> import matplotlib.pyplot as plt # third-party module
>>> import mymodule as mod # internal project module
8 If you'd like, you could name your module my_spam.py, but even our friend the underscore should not be seen often in module names (underscores give the impression of a variable name).
>>> print(type(sys), type(plt), type(mod))
<class 'module'> <class 'module'> <class 'module'>
To keep in line with the style guide, keep module names short and lowercase. And be sure to avoid using special symbols like the dot (.) or question mark (?), which would interfere with the way Python looks for modules. So a filename like my.spam.py8 is one you should avoid; Python would expect to find a file named spam.py in a folder named my, which is not the case. The Python documentation gives more details about using dot notation.
Importing modules
Aside from some naming restrictions, nothing special is required to use a Python file
as a module, but it helps to understand the import mechanism. First, the import modu statement will look for the definition of modu in a file named modu.py in the same directory as the caller, if a file with that name exists. If it is not found, the Python interpreter will search for modu.py in Python's search path recursively and raise an ImportError exception if it is not found. The value of the search path is platform-dependent and includes any user- or system-defined directories in the environment's $PYTHONPATH (or %PYTHONPATH% on Windows). It can be manipulated or inspected in a Python session:
>>> import sys
>>> sys.path
['', '/current/absolute/path', 'etc']
# The actual list contains every path that is searched
# when you import libraries into Python, in the order
# that they'll be searched.
Once modu.py is found, the Python interpreter will execute the module in an isolated scope. Any top-level statement in modu.py will be executed, including other imports, if any exist. Function and class definitions are stored in the module's dictionary.
Finally, the module's variables, functions, and classes will be available to the caller through the module's namespace, a central concept in programming that is particularly helpful and powerful in Python. Namespaces provide a scope containing named attributes that are visible to each other but not directly accessible outside of the namespace.
In many languages, an include file directive causes the preprocessor to, effectively, copy the contents of the included file into the caller's code. It's different in Python: the included code is isolated in a module namespace. The result of the import modu statement will be a module object named modu in the global namespace, with the attributes defined in the module accessible via dot notation: modu.sqrt would be the sqrt object defined inside of modu.py, for example. This means you generally don't have to worry that the included code could have unwanted effects—for example, overriding an existing function with the same name.
Namespace Tools
The functions dir(), globals(), and locals() help with quick namespace introspection:

dir(object) returns a list of attributes that are accessible via the object.

globals() returns a dictionary of the attributes currently in the global name‐
space, along with their values.

locals() returns a dictionary of the attributes in the current local namespace
(e.g., within a function), along with their values.

For more information, see "Data Model" in Python's official documentation.
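As a quick sketch of these three tools in action (the standard library's math module is just a convenient stand-in for any imported module):

```python
# Hedged sketch of namespace introspection with dir(), globals(), and
# locals(); math is only an example module.
import math

print('sqrt' in dir(math))   # sqrt is among math's accessible attributes

def shows_locals():
    x = 1
    return 'x' in locals()   # x lives in the function's local namespace

print('math' in globals())   # the import bound the name math here
print(shows_locals())
```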
It is possible to simulate the more standard behavior by using a special syntax of the
import statement: from modu import *. However, this is generally considered bad
practice: using import * makes code harder to read, makes dependencies less com‐
partmentalized, and can clobber (overwrite) existing defined objects with the new
definitions inside the imported module.
Using from modu import func is a way to import only the attribute you want into
the global namespace. While much less harmful than from modu import * because it
shows explicitly what is imported in the global namespace, its only advantage over a
simpler import modu is that it will save you a little typing.
Table 4-1 compares the different ways to import definitions from other modules.

Table 4-1. Different ways to import definitions from modules

Very bad (confusing for a reader):

from modu import *
x = sqrt(4)

Is sqrt part of modu? A built-in? Or defined above?

Better (obvious which new names are in the global namespace):

from modu import sqrt
x = sqrt(4)

Has sqrt been modified or redefined in between, or is it the one in modu?

Best (immediately obvious where the attribute comes from):

import modu
x = modu.sqrt(4)

Now sqrt is visibly part of modu's namespace.
As mentioned in "Code Style" on page 43, readability is one of the main features of
Python. Readable code avoids useless boilerplate text and clutter. But terseness and
obscurity are the limits where brevity should stop. Explicitly stating where a class or
function comes from, as in the modu.func() idiom, greatly improves code readability
and understandability in all but the simplest single-file projects.
Structure Is Key
Though you can structure a project however you like, some pitfalls to avoid are:
Multiple and messy circular dependencies
If your classes Table and Chair in furn.py need to import Carpenter from
workers.py to answer a question such as table.is_done_by(), and if the class Carpen‐
ter needs to import Table and Chair to answer carpenter.what_do(), then you
have a circular dependency: furn.py depends on workers.py, which depends on
furn.py. In this case, you will have to resort to fragile hacks such as using import
statements inside methods to avoid causing an ImportError.
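One such fragile hack, sketched below with the hypothetical furn.py/workers.py module names, is to defer the import until the method is actually called, so neither module needs the other at import time (the snippet fabricates a tiny workers module in sys.modules so it runs standalone):

```python
import sys
import types

# Simulate the hypothetical workers.py so this sketch is self-contained.
workers = types.ModuleType("workers")

class Carpenter:
    def what_do(self):
        return "build tables"

workers.Carpenter = Carpenter
sys.modules["workers"] = workers

class Table:
    def is_done_by(self):
        # Deferred, call-time import: the fragile workaround for a
        # circular dependency; it runs only when the method is called,
        # not when this module is first imported.
        from workers import Carpenter
        return Carpenter().what_do()

print(Table().is_done_by())  # build tables
```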
Hidden coupling
Each and every change in Table's implementation breaks 20 tests in unrelated
test cases because it breaks Carpenter's code, which then requires very careful
surgery to adapt to the change. This means you have too many assumptions about
Table in Carpenter's code.
Heavy use of global state or context
Instead of explicitly passing (height, width, type, wood) to each other, Table and
Carpenter rely on global variables that can be modified and are modified on the
fly by different agents. You need to scrutinize all access to these global variables
to understand why a rectangular table became a square, and discover that remote
template code is also modifying this context, messing with table dimensions.
Spaghetti code
Multiple pages of nested if clauses and for loops with a lot of copy-pasted pro‐
cedural code and no proper segmentation are known as spaghetti code. Python's
meaningful indentation (one of its most controversial features) makes it hard to
maintain this kind of code, so you may not see too much of it.
Ravioli code
This is more likely in Python than spaghetti code. Ravioli code consists of hun‐
dreds of similar little pieces of logic, often classes or objects, without proper
structure. If you can never remember whether you have to use FurnitureTable,
AssetTable, or Table, or even TableNew for your task at hand, you might be
swimming in ravioli code. Diamond, Requests, and Werkzeug (in the next chap‐
ter) avoid ravioli code by collecting their useful but unrelated pieces of logic into
a module or a utils package to reuse across the project.

Packages
Python provides a very straightforward packaging system, which extends the module
mechanism to a directory.

Any directory with an __init__.py file is considered a Python package. The top-level
directory with an __init__.py is the root package. (Thanks to PEP 420, which was
implemented in Python 3.3, there is now an alternative to the root package, called the
namespace package. Namespace packages must not have an __init__.py and can be
dispersed across multiple directories in sys.path; Python will gather all of the pieces
together and present them to the user as a single package.) The different modules in
the package are imported in a similar manner as plain modules, but with a special
behavior for the __init__.py file, which is used to gather all package-wide definitions.
A file modu.py in the directory pack/ is imported with the statement import
pack.modu. The interpreter will look for an __init__.py file in pack and execute all of
its top-level statements. Then it will look for a file named pack/modu.py and execute
all of its top-level statements. After these operations, any variable, function, or class
defined in modu.py is available in the pack.modu namespace.
A commonly seen issue is too much code in __init__.py files. When the project's
complexity grows, there may be subpackages and sub-subpackages in a deep direc‐
tory structure. In this case, importing a single item from a sub-sub-package will
require executing all __init__.py files met while traversing the tree.
It is normal, even good practice, to leave an __init__.py empty when the package's
modules and subpackages do not need to share any code—the HowDoI and Dia‐
mond projects that are used as examples in the next section both have no code except
version numbers in their __init__.py files. The Tablib, Requests, and Flask projects
contain a top-level documentation string and import statements that expose the
intended API for each project, and the Werkzeug project also exposes its top-level
API but does it using lazy loading (extra code that only adds content to the name‐
space as it is used, which speeds up the initial import statement).
Lastly, a convenient syntax is available for importing deeply nested packages: import
very.deep.module as mod. This allows you to use mod in place of the verbose repeti‐
tion of very.deep.module.
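For instance (the standard library's os.path stands in here for a hypothetical very.deep.module):

```python
# Sketch of the aliasing syntax; os.path plays the role of a deeply
# nested module so the example is runnable.
import os.path as pth

# Every use of the long dotted name is replaced by the short alias.
print(pth.join("very", "deep", "module"))
```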
Object-Oriented Programming
Python is sometimes described as an object-oriented programming language. This
can be somewhat misleading and needs to be clarified.
In Python, everything is an object, and can be handled as such. This is what is meant
when we say that functions are first-class objects. Functions, classes, strings, and even
types are objects in Python: they all have a type, can be passed as function arguments,
and may have methods and properties. In this understanding, Python is an object-
oriented language.
However, unlike Java, Python does not impose object-oriented programming as the
main programming paradigm. It is perfectly viable for a Python project to not be
object oriented—that is, to use no (or very few) class definitions, class inheritance, or
any other mechanisms that are specific to object-oriented programming. These fea‐
tures are available, but not obligatory, for us Pythonistas. Moreover, as seen in "Mod‐
ules" on page 61, the way Python handles modules and namespaces gives the
developer a natural way to ensure the encapsulation and separation of abstraction lay‐
ers—the most common reasons to use object orientation—without classes.
Proponents of functional programming (a paradigm that, in its purest form, has no
assignment operator, no side effects, and basically chains functions to accomplish
tasks), say that bugs and confusion occur when a function does different things
depending on the external state of the system—for example, a global variable that
indicates whether or not a person is logged in. Python, although not a purely func‐
tional language, has tools that make functional programming possible, and then we
can restrict our use of custom classes to situations where we want to glue together a
state and a functionality.
In some architectures, typically web applications, multiple instances of Python pro‐
cesses are spawned to respond to external requests that can happen at the same time.
In this case, holding some state in instantiated objects, which means keeping some
static information about the world, is prone to race conditions, a term used to describe
the situation where, at some point between the initialization of the state of an object
(usually done with the Class.__init__() method in Python) and the actual use of
the object state through one of its methods, the state of the world has changed.
For example, a request may load an item in memory and later mark it as added to a
user’s shopping cart. If another request sells the item to another person at the same
time, it may happen that the sale actually occurs after the first session loaded the item,
and then we are trying to sell inventory already flagged as sold. This and other issues
led to a preference for stateless functions.
Our recommendation is as follows: when working with code that relies on some per‐
sistent context or global state (like most web applications), use functions and proce‐
dures with as few implicit contexts and side effects as possible. A function's implicit
context is made up of any of the global variables or items in the persistence layer that
are accessed from within the function. Side effects are the changes that a function
makes to its implicit context. If a function saves or deletes data in a global variable or
in the persistence layer, it is said to have a side effect.
Custom classes in Python should be used to carefully isolate functions with context
and side effects from functions with logic (called pure functions). Pure functions are
deterministic: given a fixed input, the output will always be the same. This is because
they do not depend on context, and do not have side effects. The print() function,
for example, is impure because it returns nothing but writes to standard output as a
side effect. Here are some benefits of having pure, separate functions:
Pure functions are much easier to change or replace if they need to be refactored
or optimized.
Pure functions are easier to test with unit tests: there is less need for complex
context setup and data cleaning afterward.
Pure functions are easier to manipulate, decorate (more on decorators in a
moment), and pass around.
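A minimal sketch of the distinction (the shopping-cart names below are ours, not from any library):

```python
# Impure: reads and mutates global state, i.e., it has an implicit
# context and a side effect.
cart = []

def add_to_cart(item):
    cart.append(item)        # side effect on the global cart
    return len(cart)

# Pure: the result depends only on the inputs, with no side effects,
# so the same call always returns the same value.
def cart_total(prices):
    return sum(prices)

print(cart_total([300, 150, 50]))  # 500, deterministically
print(add_to_cart("towel"))        # depends on every call made so far
```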
In summary, for some architectures, pure functions are more efficient building blocks
than classes and objects because they have no context or side effects. As an example,
the I/O functions related to each of the file formats in the Tablib library (tablib/
formats/*.py—we’ll look at Tablib in the next chapter) are pure functions, and not
part of a class, because all they do is read data into a separate Dataset object that per‐
sists the data, or write the Dataset to a file. But the Session object in the Requests
library (also coming up in the next chapter) is a class, because it has to persist the
cookie and authentication information that may be exchanged in an HTTP session.
Object orientation is useful and even necessary in many cases—for
example, when developing graphical desktop applications or
games, where the things that are manipulated (windows, buttons,
avatars, vehicles) have a relatively long life of their own in the com‐
puter’s memory. This is also one motive behind object-relational
mapping, which maps rows in databases to objects in code, dis‐
cussed further in “Database Libraries” on page 272.
Decorators
Decorators were added to Python in version 2.4 and are defined and discussed in
PEP 318. A decorator is a function or a class method that wraps (or decorates)
another function or method. The decorated function or method will replace the origi‐
nal function or method. Because functions are first-class objects in Python, this can
be done manually, but using the @decorator syntax is clearer and preferred. Here is
an example of how to use a decorator:
>>> def foo():
...     print("I am inside foo.")
...
>>> import logging
>>> logging.basicConfig()
>>> def logged(func, *args, **kwargs):
...     logger = logging.getLogger()
...     def new_func(*args, **kwargs):
...         logger.debug("calling {} with args {} and kwargs {}".format(
...             func.__name__, args, kwargs))
...         return func(*args, **kwargs)
...     return new_func
...
>>> @logged
... def bar():
...     print("I am inside bar.")
...
>>> logging.getLogger().setLevel(logging.DEBUG)
>>> bar()
DEBUG:root:calling bar with args () and kwargs {}
I am inside bar.
>>> foo()
I am inside foo.
This mechanism is useful for isolating the core logic of the function or method. A
good example of a task that is better handled with decoration is memoization or cach‐
ing: you want to store the results of an expensive function in a table and use them
directly instead of recomputing them when they have already been computed. This is
clearly not part of the function logic. As of PEP 3129, starting in Python 3, decorators
can also be applied to classes.
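The standard library already ships a decorator for exactly this caching pattern; a brief sketch:

```python
# functools.lru_cache memoizes a function: results are stored and reused
# instead of recomputed, keeping the caching concern out of the
# function's own logic.
import functools

@functools.lru_cache(maxsize=None)
def fib(n):
    # Naive recursion would be exponential; with the cache, each fib(n)
    # is computed only once.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040
```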
Dynamic Typing
Python is dynamically typed (as opposed to statically typed), meaning variables do
not have a fixed type. Variables are implemented as pointers to an object, making it
possible for the variable a to be set to the value 42, then to the value "thanks for all the
fish", then to a function.
The dynamic typing used in Python is often considered to be a weakness, because it
can lead to complexities and hard-to-debug code: if something named a can be set to
many different things, the developer or the maintainer must track this name in the
code to make sure it has not been set to a completely unrelated object. Table 4-2 illus‐
trates good and bad practice when using names.
Table 4-2. Avoid using the same variable name for different things

Advice: Use short functions or methods to reduce the risk of using the same name
for two unrelated things.

Bad:
a = 1
a = 'answer is {}'.format(a)

Good:
def get_answer(a):
    return 'answer is {}'.format(a)
a = get_answer(1)

Advice: Use different names for related items when they have a different type.

Bad:
# A string ...
items = 'a b c d'
# No, a list ...
items = items.split(' ')
# No, a set ...
items = set(items)

Good:
items_string = 'a b c d'
items_list = items_string.split(' ')
items = set(items_list)
There is no efficiency gain when reusing names: the assignment will still create a new
object. And when the complexity grows and each assignment is separated by other
lines of code, including branches and loops, it becomes harder to determine a given
variable's type.
Some coding practices, like functional programming, recommend against reassigning
variables. In Java, a variable can be forced to always contain the same value after
assignment by using the final keyword. Python does not have a final keyword, and it
would be against its philosophy. But assigning a variable only once may be a good dis‐
cipline; it helps reinforce the concept of mutable versus immutable types.

Pylint will warn you if you reassign a variable to two different types.
Mutable and Immutable Types
Python has two kinds of built-in or user-defined types (instructions to define your
own types in C are provided in the Python extension documentation):

# Lists are mutable
my_list = [1, 2, 3]
my_list[0] = 4
print(my_list)  # [4, 2, 3] <- The same list, changed.

# Integers are immutable
x = 6
x = x + 1  # The new x occupies a different location in memory.

Mutable types
These allow in-place modification of the object's content. Examples are lists and
dictionaries, which have mutating methods like list.append() or dict.pop()
and can be modified in place.

Immutable types
These types provide no method for changing their content. For instance, the vari‐
able x set to the integer 6 has no "increment" method. To compute x + 1, you
have to create another integer and give it a name.

One consequence of this difference in behavior is that mutable types cannot be used
as dictionary keys, because if the value ever changes, it will not hash to the same
value, and dictionaries use hashing for key storage. (An example of a simple hashing
algorithm is to convert the bytes of an item to an integer, and take its value modulo
some number; this is how memcached distributes keys across multiple computers.)
The immutable equivalent of a list is the tuple, created with parentheses—for
example, (1, 2). It cannot be changed in place and so can be used as a dictionary key.
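The difference can be observed directly with id(), which returns an object's identity (this sketch assumes CPython, where small integers are cached as distinct objects):

```python
# Mutating a list keeps the same object; "changing" an int rebinds the
# name to a different object.
my_list = [1, 2, 3]
list_id = id(my_list)
my_list[0] = 4
assert id(my_list) == list_id   # same object, modified in place

x = 6
int_id = id(x)
x = x + 1                       # x now names a new int object (7)
assert id(x) != int_id
print(my_list, x)
```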
Using properly mutable types for objects that are intended to be mutable (e.g.,
my_list = [1, 2, 3]) and immutable types for objects that are intended to have a
fixed value (e.g., islington_phone = ("220", "7946", "0347")) clarifies the intent
of the code for other developers.
One peculiarity of Python that can surprise newcomers is that strings are immutable;
attempting to change one will yield a type error:
>>> s = "I'm not mutable"
>>> s[1:7] = " am"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
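To get the intended result, build a new string from pieces of the old one instead:

```python
# Strings cannot be modified in place, but slicing and concatenation
# produce a brand-new string with the desired content.
s = "I'm not mutable"
s = s[:1] + " am" + s[7:]   # rebinds s to a new string object
print(s)  # I am mutable
```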
This means that when constructing a string from its parts, it is much more efficient to
accumulate the parts in a list, because it is mutable, and then join the parts together to
make the full string. Also, a Python list comprehension, which is a shorthand syntax to
iterate over an input to create a list, is better and faster than constructing a list from
calls to append() within a loop. Table 4-3 shows different ways to create a string from
an iterable.
Table 4-3. Example ways to concatenate a string

Bad:
>>> s = ""
>>> for c in (97, 98, 99):
...     s += unichr(c)
...
>>> print(s)

Good:
>>> s = []
>>> for c in (97, 98, 99):
...     s.append(unichr(c))
...
>>> print("".join(s))

Best:
>>> r = (97, 98, 99)
>>> s = [unichr(c) for c in r]
>>> print("".join(s))

The main Python page has a good discussion on this kind of optimization.

Finally, if the number of elements in a concatenation is known, pure string addition is
faster (and more straightforward) than creating a list of items just to do a "".join().
All of the following formatting options to define cheese do the same thing. (We
should admit that even though, according to PEP 3101, the percent-style formatting
(%s, %d, %f) has been deprecated now for over a decade, most old hats still use it, and
PEP 460 just introduced this same method to format bytes or bytearray objects.)
>>> adj = "Red"
>>> noun = "Leicester"
>>> cheese = "%s %s" % (adj, noun) # This style was deprecated (PEP 3101)
>>> cheese = "{} {}".format(adj, noun) # Possible since Python 3.1
>>> cheese = "{0} {1}".format(adj, noun) # Numbers can also be reused
>>> cheese = "{adj} {noun}".format(adj=adj, noun=noun) # This style is best
>>> print(cheese)
Red Leicester
Vendorizing Dependencies
A package that vendorizes dependencies includes external dependencies (third-party
libraries) within its source, often inside of a folder named vendor or packages. There
is a very good blog post on the subject that lists the main reasons a package owner
might do this (basically, to avoid various dependency issues), and discusses alterna‐
tives.

Consensus is that in almost all cases, it is better to keep the dependency separate, as it
adds unnecessary content (often megabytes of extra code) to the repository; virtual
environments used in combination with setup.py (preferred, especially when your
package is a library) or a requirements.txt (which, when used, will override dependen‐
cies in setup.py in the case of conflicts) can restrict dependencies to a known set of
working versions.
If those options are not enough, it might be helpful to contact the owner of the
dependency to maybe resolve the issue by updating their package (e.g., your library
may depend on an upcoming release of their package, or may need a specific new
feature added), as those changes would likely benefit the entire community. The cav‐
eat is, if you submit pull requests for big changes, you may be expected to maintain
those changes when further suggestions and requests come in; for this reason, both
Tablib and Requests vendorize at least some dependencies. As the community moves
into complete adoption of Python 3, hopefully fewer of the most pressing issues will
remain.
Testing Your Code
Testing your code is very important. People are much more likely to use a project that
actually works.
Python first included doctest and unittest in Python 2.1, released in 2001, embrac‐
ing test-driven development (TDD), where the developer first writes tests that define
the main operation and edge cases for a function, and then writes the function to pass
those tests. Since then, TDD has become accepted and widely adopted in business
and in open source projects—it's a good idea to practice writing the testing code and
the running code in parallel. Used wisely, this method helps you precisely define your
code's intent and have a more modular architecture.
Tips for testing
A test is about the most massively useful code a hitchhiker can write. We've summar‐
ized some of our tips here.
Just one thing per test. A testing unit should focus on one tiny bit of functionality and
prove it correct.
Independence is imperative. Each test unit must be fully independent: able to run
alone, and also within the test suite, regardless of the order in which they are called.
The implication of this rule is that each test must be loaded with a fresh dataset and
may have to do some cleanup afterward. This is usually handled by the setUp() and
tearDown() methods.
Precision is better than parsimony. Use long and descriptive names for testing func‐
tions. This guideline is slightly different than for running code, where short names
are often preferred. The reason is that testing functions are never called explicitly.
square() or even sqr() is OK in running code, but in testing code, you should have
names such as test_square_of_number_2() or test_square_negative_number().
These function names are displayed when a test fails and should be as descriptive as
possible.
Speed counts. Try hard to make tests that are fast. If one test needs more than a few
milliseconds to run, development will be slowed down, or the tests will not be run as
often as is desirable. In some cases, tests can't be fast because they need a complex
data structure to work on, and this data structure must be loaded every time the test
runs. Keep these heavier tests in a separate test suite that is run by some scheduled
task, and run all other tests as often as needed.
RTMF (Read the manual, friend!). Learn your tools and learn how to run a single test or
a test case. Then, when developing a function inside a module, run this function's
tests often, ideally automatically when you save the code.
Test everything when you start—and again when you finish. Always run the full test suite
before a coding session, and run it again after. This will give you more confidence
that you did not break anything in the rest of the code.
Version control automation hooks are fantastic. It is a good idea to implement a hook
that runs all tests before pushing code to a shared repository. You can directly add
hooks to your version control system, and some IDEs provide ways to do this more
simply in their own environments. The documentation for each of the popular
systems will step you through how to set this up.
Write a breaking test if you want to take a break. If you are in the middle of a develop‐
ment session and have to interrupt your work, it is a good idea to write a broken unit
test about what you want to develop next. When coming back to work, you will have
a pointer to where you were and get back on track faster.
In the face of ambiguity, debug using a test. The first step when you are debugging your
code is to write a new test pinpointing the bug. While it is not always possible to do,
these bug-catching tests are among the most valuable pieces of code in your project.
If the test is hard to explain, good luck finding collaborators. When something goes wrong
or has to be changed, if your code has a good set of tests, you or other maintainers
will rely largely on the testing suite to fix the problem or modify a given behavior.
Therefore, the testing code will be read as much as—or even more than—the running
code. A unit test whose purpose is unclear is not very helpful in this case.
If the test is easy to explain, it is almost always a good idea. Another use of the testing
code is as an introduction to new developers. When other people will have to work
on the code base, running and reading the related testing code is often the best thing
they can do. They will (or should) discover the hot spots, where most difficulties
arise, and the corner cases. If they have to add some functionality, the first step
should be to add a test and, by this means, ensure the new functionality is not already
a working path that has not been plugged into the interface.
Above all, don't panic. It's open source! The whole world's got your back.
Testing Basics
This section lists the basics of testing—for an idea about what options are available—
and gives a few examples taken from the Python projects we dive into next, in Chap‐
ter 5. There is an entire book on TDD in Python, and we don't want to rewrite it.
Check out Test-Driven Development with Python (O'Reilly) (obey the testing goat!).
unittest
unittest is the batteries-included test module in the Python standard library. Its API
will be familiar to anyone who has used any of the JUnit (Java)/nUnit (.NET)/CppU‐
nit (C/C++) series of tools.
Creating test cases is accomplished by subclassing unittest.TestCase. In this exam‐
ple code, the test function is just defined as a new method in MyTest:
import unittest

def fun(x):
    return x + 1

class MyTest(unittest.TestCase):
    def test_that_fun_adds_one(self):
        self.assertEqual(fun(3), 4)

class MySecondTest(unittest.TestCase):
    def test_that_fun_fails_when_not_adding_number(self):
        self.assertRaises(TypeError, fun, "multiply six by nine")
Test methods must start with the string test or they will not run.
Test modules (files) are expected to match the pattern test*.py by
default but can match any pattern given to the --pattern keyword
argument on the command line.
To run all tests in that test case, open a terminal shell; and in the same directory as
the file, invoke Python's unittest module on the command line, like this:

$ python -m unittest test_example.MyTest
.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK

Or to run all tests in a file, name the file:

$ python -m unittest test_example
..
----------------------------------------------------------------------
Ran 2 tests in 0.000s

OK
Mock (in unittest)
As of Python 3.3, unittest.mock is available in the standard library. It allows you to
replace parts of your system under test with mock objects and make assertions about
how they have been used.
For example, you can monkey patch a method like in the following example (a mon‐
key patch is code that modifies or replaces other existing code at runtime). In this
code, the existing method named ProductionClass.method, for the instance we cre‐
ate named instance, is replaced with a new object, MagicMock, which will always
return the value 3 when called, and which counts the number of method calls it
receives, records the signature it was called with, and contains assertion methods for
testing purposes:
from unittest.mock import MagicMock
instance = ProductionClass()
instance.method = MagicMock(return_value=3)
instance.method(3, 4, 5, key='value')
instance.method.assert_called_with(3, 4, 5, key='value')
To mock classes or objects in a module under test, use the patch decorator. In the
following example, an external search system is replaced with a mock that always
returns the same result (as used in this example, the patch is only for the duration of
the test):
import unittest.mock as mock

def mock_search(self):
    class MockSearchQuerySet(SearchQuerySet):
        def __iter__(self):
            return iter(["foo", "bar", "baz"])
    return MockSearchQuerySet()

# SearchForm here refers to the imported class reference
# myapp.SearchForm, and modifies this instance, not the
# code where the SearchForm class itself is initially
# defined.
@mock.patch('myapp.SearchForm.search', mock_search)
def test_new_watchlist_activities(self):
    # get_search_results runs a search and iterates over the result
    self.assertEqual(len(myapp.get_search_results(q="fish")), 3)
Mock has many other ways you can configure it and control its behavior. These are
detailed in the Python documentation for unittest.mock.
doctest
The doctest module searches for pieces of text that look like interactive Python ses‐
sions in docstrings, and then executes those sessions to verify that they work exactly
as shown.
Doctests serve a different purpose than proper unit tests. They are usually less
detailed and don’t catch special cases or obscure regression bugs. Instead, they are
useful as an expressive documentation of the main use cases of a module and its com‐
ponents (an example of a happy path). However, doctests should run automatically
each time the full test suite runs.
Here's a simple doctest in a function:

def square(x):
    """Squares x.

    >>> square(2)
    4
    >>> square(-2)
    4
    """
    return x * x

if __name__ == '__main__':
    import doctest
    doctest.testmod()

When you run this module from the command line (i.e., python module.py), the
doctests will run and complain if anything is not behaving as described in the doc‐
strings.
In this section, we'll take excerpts from our favorite packages to highlight good test‐
ing practice using real code. The test suites require additional libraries not included
in the packages (e.g., Requests uses Flask to mock up an HTTP server), which are
included in their projects' requirements.txt file.
For all of these examples, the expected first steps are to open a terminal shell, change
directories to a place where you work on open source projects, clone the source
repository, and set up a virtual environment, like this:
$ git clone
$ cd projectname
$ virtualenv -p python3 venv
$ source venv/bin/activate
(venv)$ pip install -r requirements.txt
Example: Testing in Tablib
Tablib uses the unittest module in Pythons standard library for its testing. The test
suite does not come with the package; you must clone the GitHub repository for the
files. Here is an excerpt, with important parts annotated:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Tests for Tablib."""

import json
import unittest
import sys
import os

import tablib
from tablib.compat import markup, unicode, is_py3
from tablib.core import Row

class TablibTestCase(unittest.TestCase):
    """Tablib test cases."""

    def setUp(self):
        """Create simple data set with headers."""
        global data, book
        data = tablib.Dataset()
        book = tablib.Databook()
        # ... skip additional setup not used here ...

    def tearDown(self):
        pass

    def test_empty_append(self):
        """Verify append() correctly adds tuple with no headers."""
        new_row = (1, 2, 3)
        data.append(new_row)

        # Verify width/data
        self.assertTrue(data.width == len(new_row))
        self.assertTrue(data[0] == new_row)
Testing Your Code | 77
13 Note that unittest.TestCase.tearDown will not be run if the code errors out. This may be a surprise if you've used features in unittest.mock to alter the code's actual behavior. In Python 3.1, the method unittest.TestCase.addCleanup() was added; it pushes a cleanup function and its arguments onto a stack, and they will be called one by one after unittest.TestCase.tearDown(), or called anyway even if tearDown() was never run (for example, when setUp() fails). For more information, see the documentation on unittest.TestCase.addCleanup().
    def test_empty_append_with_headers(self):
        """Verify append() correctly detects mismatch of number of
        headers and data.
        """
        data.headers = ['first', 'second']
        new_row = (1, 2, 3, 4)

        self.assertRaises(tablib.InvalidDimensions, data.append, new_row)
To use unittest, subclass unittest.TestCase, and write test methods whose
names begin with test. The TestCase provides assert methods that check for
equality, truth, data type, set membership, and whether exceptions are raised—
see the documentation for more details.
TestCase.setUp() is run before every single test method in the TestCase.
TestCase.tearDown() is run after every single test method in the TestCase.13
All test methods must begin with test, or they will not be run.
There can be multiple tests within a single TestCase, but each one should test
just one thing.
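Pulled together, these conventions look like the following minimal, self-contained sketch (a toy example of our own, not Tablib's code):

```python
import unittest

class ArithmeticTestCase(unittest.TestCase):
    """A toy TestCase demonstrating the conventions above."""

    def setUp(self):
        # Runs before every single test method.
        self.values = [1, 2, 3]

    def tearDown(self):
        # Runs after every single test method.
        self.values = None

    def test_sum(self):
        # One test, one thing: the total.
        self.assertEqual(sum(self.values), 6)

    def test_membership(self):
        # One test, one thing: membership.
        self.assertIn(2, self.values)
```

Saved in a module, it can be run with python -m unittest, which discovers and runs both test methods.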
If you were contributing to Tablib, the first thing you'd do after cloning it is run the test suite and confirm that nothing breaks. Like this:
(venv)$ ### inside the top-level directory, tablib/
(venv)$ python -m unittest
----------------------------------------------------------------------
Ran 62 tests in 0.289s

OK
As of Python 2.7, unittest also includes its own test discovery mechanisms, using
the discover option on the command line:
(venv)$ ### *above* the top-level directory, tablib/
(venv)$ python -m unittest discover tablib/
----------------------------------------------------------------------
Ran 62 tests in 0.234s

OK
After confirming all of the tests pass, you'd (a) find the test case related to the part you're changing and run it often while you're modifying the code, or (b) write a new test case for the feature you're adding or the bug you're tracking down and run that often while modifying the code. The following snippet is an example:
(venv)$ ### inside the top-level directory, tablib/
(venv)$ python -m unittest test_tablib.TablibTestCase.test_empty_append
----------------------------------------------------------------------
Ran 1 test in 0.001s

OK
Once your code works, you'd run the entire test suite again before pushing it to the repository. Because you're running these tests so often, it makes sense that they should be as fast as possible. There are a lot more details about using unittest in the standard library unittest documentation.
Example: Testing in Requests
Requests uses py.test. To see it in action, open a terminal shell, change into a temporary directory, clone Requests, install the dependencies, and run py.test, as shown here:
$ git clone -q
$ virtualenv venv -q -p python3 # dash -q for 'quiet'
$ source venv/bin/activate
(venv)$ pip install -q -r requests/requirements.txt # 'quiet' again...
(venv)$ cd requests
(venv)$ py.test
========================= test session starts =================================
platform darwin -- Python 3.4.3, pytest-2.8.1, py-1.4.30, pluggy-0.3.1
rootdir: /tmp/requests, inifile:
plugins: cov-2.1.0, httpbin-0.0.7
collected 219 items
tests/ ........................................................
tests/ ..s....................................................
========= 217 passed, 1 skipped, 1 xpassed in 25.75 seconds ===================
Other Popular Tools
The testing tools listed here are less frequently used, but still popular enough to mention.
pytest is a no-boilerplate alternative to Python's standard unittest module, meaning it doesn't require the scaffolding of test classes, and maybe not even setup and teardown methods. To install it, use pip like usual:
$ pip install pytest
Despite being a fully featured and extensible test tool, it boasts a simple syntax. Creating a test suite is as easy as writing a module with a couple of functions:
# content of test_sample.py
def func(x):
    return x + 1

def test_answer():
    assert func(3) == 5
and then running the py.test command is far less work than would be required for the equivalent functionality with the unittest module:
$ py.test
=========================== test session starts ============================
platform darwin -- Python 2.7.1 -- pytest-2.2.1
collecting ... collected 1 items F

================================= FAILURES =================================
_______________________________ test_answer ________________________________

    def test_answer():
>       assert func(3) == 5
E       assert 4 == 5
E        +  where 4 = func(3)

AssertionError
========================= 1 failed in 0.02 seconds =========================
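For contrast, a version whose assertion matches what func() actually returns would pass; any file whose name begins with test_ is discovered automatically (this filename is our own, not from the pytest documentation):

```python
# content of test_sample_pass.py (a hypothetical sibling file)
def func(x):
    return x + 1

def test_answer():
    # func(3) returns 4, so py.test reports this test as passing
    assert func(3) == 4
```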
Nose extends unittest to make testing easier:
$ pip install nose
Nose provides automatic test discovery to save you the hassle of manually creating test suites. It also provides numerous plug-ins for features such as xUnit-compatible test output, coverage reporting, and test selection.
tox is a tool for automating test environment management and testing against multiple interpreter configurations:
$ pip install tox
tox allows you to configure complicated multiparameter test matrices via a simple ini-style configuration file.
Options for older versions of Python
If you aren’t in control of your Python version but still want to use these testing tools,
here are a few options.
unittest2. unittest2 is a backport of Python 2.7's unittest module, which has an improved API and better assertions than the ones available in previous versions of Python.
If you're using Python 2.6 or below (meaning you probably work at a large bank or Fortune 500 company), you can install it with pip:
$ pip install unittest2
You may want to import the module under the name unittest to make it easier to port code to newer versions of the module in the future:
import unittest2 as unittest

class MyTest(unittest.TestCase):
    ...
This way if you ever switch to a newer Python version and no longer need the
unittest2 module, you can simply change the import in your test module without the
need to change any other code.
Mock. If you liked "Mock (in unittest)" on page 75 but use a Python version below 3.3, you can still use unittest.mock by importing it as a separate library:
$ pip install mock
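A common compatibility shim lets the same tests run on both old and new Pythons (a sketch; the pip-installed backport mirrors the standard library's API):

```python
try:
    from unittest import mock  # Python 3.3+: in the standard library
except ImportError:
    import mock  # earlier Pythons: the pip-installed backport

def test_stubbed_call():
    stub = mock.Mock(return_value=42)
    assert stub('anything') == 42
    stub.assert_called_once_with('anything')
```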
fixture. fixture can provide tools that make it easier to set up and tear down database backends for testing. It can load mock datasets for use with SQLAlchemy, SQLObject, Google Datastore, Django ORM, and Storm. There are still new releases, but it has only been tested on Python 2.4 through Python 2.6.
14 For those interested, theres some discussion about adding Markdown support for the README files on PyPI.
Lettuce and Behave
Lettuce and Behave are packages for doing behavior-driven development (BDD) in Python. BDD is a process that sprang out of TDD (obey the testing goat!) in the early 2000s, wishing to substitute the word "test" in test-driven development with "behavior" to overcome newbies' initial trouble grasping TDD. The name was first coined by Dan North in 2003 and introduced to the world along with the Java tool JBehave in a 2006 article for Better Software magazine that is reproduced in Dan North's blog post, "Introducing BDD."
BDD grew very popular after the 2011 release of The Cucumber Book (Pragmatic Bookshelf), which documents a Behave package for Ruby. This inspired Gabriel Falcão's Lettuce, and Peter Parente's Behave in our community.
Behaviors are described in plain text using a syntax named Gherkin that is human-
readable and machine-processable. The following tutorials may be of use:
Gherkin tutorial
Lettuce tutorial
Behave tutorial
Documentation
Readability is a primary focus for Python developers, in both project and code documentation. The best practices described in this section can save both you and others a lot of time.
Project Documentation
There is API documentation for project users, and then there is additional project documentation for those who want to contribute to the project. This section is about the additional project documentation.
A README file at the root directory should give general information to both users and maintainers of a project. It should be raw text or written in some very easy to read markup, such as reStructured Text (recommended because right now it's the only format that can be understood by PyPI14) or Markdown. It should contain a few lines explaining the purpose of the project or library (without assuming the user knows anything about the project), the URL of the main source for the software, and some basic credit information. This file is the main entry point for readers of the code.
15 Other tools that you might see are Pycco, Ronn, Epydoc (now discontinued), and MkDocs. Pretty much
everyone uses Sphinx and we recommend you do, too.
An INSTALL file is less necessary with Python (but may be helpful to comply with license requirements such as the GPL). The installation instructions are often reduced to one command, such as pip install module or python setup.py install, and added to the README file.
A LICENSE file should always be present and specify the license under which the software is made available to the public. (See "Choosing a License" on page 88 for more information.)
A TODO file or a TODO section in README should list the planned development
for the code.
A CHANGELOG file or section in README should compile a short overview of the
changes in the code base for the latest versions.
Project Publication
Depending on the project, your documentation might include some or all of the following components:
An introduction should provide a very short overview of what can be done with the product, using one or two extremely simplified use cases. This is the 30-second pitch for your project.
A tutorial should show some primary use cases in more detail. The reader will
follow a step-by-step procedure to set up a working prototype.
An API reference is typically generated from the code (see "Docstring Versus Block Comments" on page 84). It will list all publicly available interfaces, parameters, and return values.
Developer documentation is intended for potential contributors. This can include
code conventions and the general design strategy of the project.
Sphinx is far and away the most popular15 Python documentation tool. Use it. It converts the reStructured Text markup language into a range of output formats, including HTML, LaTeX (for printable PDF versions), manual pages, and plain text.
There is also great, free hosting for your Sphinx documentation: Read the Docs. Use
that, too. You can configure it with commit hooks to your source repository so that
rebuilding your documentation will happen automatically.
Sphinx is famous for its API generation, but it also works well for
general project documentation. The online Hitchhiker’s Guide to
Python is built with Sphinx and is hosted on Read the Docs.
reStructured Text
Sphinx uses reStructured Text, and nearly all Python documentation is written using it. If the content of your long_description argument to setuptools.setup() is written in reStructured Text, it will be rendered as HTML on PyPI—other formats will just be presented as text. It's like Markdown with all the optional extensions built in. Good resources for the syntax are:
The reStructuredText Primer
reStructuredText Quick Reference
Or just start contributing to your favorite package's documentation and learn by reading.
Docstring Versus Block Comments
Docstrings and block comments aren't interchangeable. Both can be used for a function or class. Here's an example using both:

# This function slows down program execution for some reason.
def square_and_rooter(x):
    """Return the square root of self times self."""
The leading comment block is a programmer’s note.
The docstring describes the operation of the function or class and will be shown
in an interactive Python session when the user types help(square_and_rooter).
Docstrings placed at the beginning of a module or at the top of an __init__.py file will also appear in help(). Sphinx's autodoc feature can also automatically generate documentation using appropriately formatted docstrings. Instructions for how to do this, and how to format your docstrings for autodoc, are in the Sphinx tutorial. For further details on docstrings, see PEP 257.
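The docstring is plain data on the function object; help() and Sphinx's autodoc read the same attribute. Reusing the function from above, with a body added here purely for illustration:

```python
def square_and_rooter(x):
    """Return the square root of self times self."""
    return (x * x) ** 0.5

# help(square_and_rooter) displays this same string:
print(square_and_rooter.__doc__)
```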
Logging
The logging module has been a part of Python's Standard Library since version 2.3. It is succinctly described in PEP 282. The documentation is notoriously hard to read, except for the basic logging tutorial.
Logging serves two purposes:
Diagnostic logging
Diagnostic logging records events related to the application's operation. If a user calls in to report an error, for example, the logs can be searched for context.
Audit logging
Audit logging records events for business analysis. A user’s transactions (such as
a clickstream) can be extracted and combined with other user details (such as
eventual purchases) for reports or to optimize a business goal.
Logging Versus Print
The only time that print is a better option than logging is when the goal is to display
a help statement for a command-line application. Other reasons why logging is better
than print:
The log record, which is created with every logging event, contains readily available diagnostic information such as the filename, full path, function, and line number of the logging event.
Events logged in included modules are automatically accessible via the root logger to your application's logging stream, unless you filter them out.
Logging can be selectively silenced by using the method logging.Logger.setLevel() or disabled by setting the attribute logging.Logger.disabled to True.
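For example, to quiet a chatty dependency (the logger name here is hypothetical):

```python
import logging

noisy = logging.getLogger('chatty_library')
noisy.setLevel(logging.CRITICAL)  # drop everything below CRITICAL
# ...or switch it off entirely:
noisy.disabled = True
```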
Logging in a Library
Notes for configuring logging for a library are in the logging tutorial. Another good
resource for example uses of logging is the libraries we mention in the next chapter.
Because the user, not the library, should dictate what happens when a logging event
occurs, one admonition bears repeating:
It is strongly advised that you do not add any handlers other than NullHandler to your
library’s loggers.
The NullHandler does what its name says—nothing. The user will otherwise have to
expressly turn off your logging if they don’t want it.
Best practice when instantiating loggers in a library is to only create them using the
__name__ global variable: the logging module creates a hierarchy of loggers using dot
notation, so using __name__ ensures no name collisions.
Here is an example of best practice from the Requests source—place this in your project's top-level __init__.py:
# Set default logging handler to avoid "No handler found" warnings.
import logging

try:  # Python 2.7+
    from logging import NullHandler
except ImportError:
    class NullHandler(logging.Handler):
        def emit(self, record):
            pass

logging.getLogger(__name__).addHandler(NullHandler())
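Individual modules in the library then get their own loggers by name; with only the NullHandler attached, nothing is emitted unless the application configures real handlers (a sketch, not Requests' actual code):

```python
import logging

# __name__ gives a dotted, collision-free logger name,
# e.g. 'mypackage.submodule'
logger = logging.getLogger(__name__)
logger.addHandler(logging.NullHandler())
logger.debug('Silent unless the application adds a real handler.')
```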
Logging in an Application
The Twelve-Factor App, an authoritative reference for good practice in application development, contains a section on logging best practice. It emphatically advocates for treating log events as an event stream, and for sending that event stream to standard output to be handled by the application environment.
There are at least three ways to configure a logger:

Using an INI-formatted file
    Pros: It's possible to update configuration while running, using the function logging.config.listen() to listen for changes on a socket.
    Cons: You have less control (e.g., custom subclassed filters or loggers) than possible when configuring a logger in code.

Using a dictionary or a JSON-formatted file
    Pros: In addition to updating while running, it is also possible to load the configuration from a file using the json module, in the standard library since Python 2.6.
    Cons: You have less control than when configuring a logger in code.

Using code
    Pros: You have complete control over the configuration.
    Cons: Any modifications require a change to source code.
Example conguration via an INI le
More details about the INI file format are in the logging configuration section of the
logging tutorial. A minimal configuration file would look like this:
[loggers]
keys=root

[handlers]
keys=stream_handler

[formatters]
keys=formatter

[logger_root]
level=DEBUG
handlers=stream_handler

[handler_stream_handler]
class=StreamHandler
level=DEBUG
formatter=formatter
args=(sys.stderr,)

[formatter_formatter]
format=%(asctime)s %(name)-12s %(levelname)-8s %(message)s
The asctime, name, levelname, and message are all optional attributes available from the logging library. The full list of options and their definitions is available in the Python documentation. Let us say that our logging configuration file is named logging_config.ini. Then to set up the logger using this configuration in the code, we'd use logging.config.fileConfig():
import logging
from logging.config import fileConfig

fileConfig('logging_config.ini')
logger = logging.getLogger()
logger.debug('often makes a very good meal of %s', 'visiting tourists')
Example conguration via a dictionary
As of Python 2.7, you can use a dictionary with configuration details. PEP 391 con‐
tains a list of the mandatory and optional elements in the configuration dictionary.
Heres a minimal implementation:
import logging
from logging.config import dictConfig

logging_config = dict(
    version = 1,
    formatters = {
        'f': {'format':
              '%(asctime)s %(name)-12s %(levelname)-8s %(message)s'}
        },
    handlers = {
        'h': {'class': 'logging.StreamHandler',
              'formatter': 'f',
              'level': logging.DEBUG}
        },
    loggers = {
        'root': {'handlers': ['h'],
                 'level': logging.DEBUG}
        }
)

dictConfig(logging_config)

logger = logging.getLogger()
logger.debug('often makes a very good meal of %s', 'visiting tourists')
16 As of this writing, they were the Academic Free License v. 2.1 or the Apache License, Version 2.0. The full
description of how this works is on the PSF’s contributions page.
Example conguration directly in code
And last, here is a minimal logging configuration directly in code:
import logging

logger = logging.getLogger()
handler = logging.StreamHandler()
formatter = logging.Formatter(
    '%(asctime)s %(name)-12s %(levelname)-8s %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

logger.debug('often makes a very good meal of %s', 'visiting tourists')
Choosing a License
In the United States, when no license is specified with your source publication, users have no legal right to download, modify, or distribute it. Furthermore, people can't contribute to your project unless you tell them what rules to play by. You need a license.
Upstream Licenses
If you are deriving from another project, your choice may be determined by upstream licenses. For example, the Python Software Foundation (PSF) asks all contributors to Python source code to sign a contributor agreement that formally licenses their code to the PSF (retaining their own copyright) under one of two licenses.16
Because both of those licenses allow users to sublicense under different terms, the
PSF is then free to distribute Python under its own license, the Python Software
Foundation License. A FAQ for the PSF License goes into detail about what users can
and cannot do in plain (not legal) language. It is not intended for further use beyond
licensing the PSF’s distribution of Python.
There are plenty of licenses available to choose from. The PSF recommends using one of the Open Source Initiative (OSI)–approved licenses. If you wish to eventually contribute your code to the PSF, the process will be much easier if you start with one of the licenses specified on the contributions page.
17 All of the licenses described here are OSI-approved, and you can learn more about them from the main OSI
license page.
Remember to change the placeholder text in the template licenses to actually reflect your information. For example, the MIT license template contains Copyright (c) <year> <copyright holders> on its second line. Apache License, Version 2.0 requires no modification.
Open source licenses tend to fall into one of two categories:17
Permissive licenses
Permissive licenses, often also called Berkeley Software Distribution (BSD)–style
licenses, focus more on the user’s freedom to do with the software as they please.
Some examples:
The Apache licenses—version 2.0 is the current one, modified so that people
can include it without modification in any project, can include the license by
reference instead of listing it in every file, and can use Apache 2.0–licensed
code with the GNU General Public License version 3.0 (GPLv3).
Both the BSD 2-clause and 3-clause licenses—the three-clause license is the two-clause license plus an additional restriction on use of the issuer's trademarks.
The Massachusetts Institute of Technology (MIT) licenses—both the Expat and the X11 versions are named after popular products that use the respective licenses.
The Internet Software Consortium (ISC) license—it's almost identical to the MIT license except for a few lines now deemed to be extraneous.
Copyle licenses
Copyleft licenses, or less permissive licenses, focus more on making sure that the
source code itself—including any changes made to it—is made available. The
GPL family is the most well known of these. The current version is GPLv3.
The GPLv2 license is not compatible with Apache 2.0; so code
licensed with GPLv2 cannot be mixed with Apache 2.0–licensed
projects. But Apache 2.0–licensed projects can be used in GPLv3
projects (which must subsequently all be GPLv3).
18 tl;dr means "Too long; didn't read," and apparently existed as editor shorthand before popularization on the Internet.
Licenses meeting the OSI criteria all allow commercial use, modification of the software, and distribution downstream—with different restrictions and requirements. All of the ones listed in Table 4-4 also limit the issuer's liability and require the user to retain the original copyright and license in any downstream distribution.
Table 4-4. Topics discussed in popular licenses

License family: BSD
    Restrictions: Protects issuer's trademark (BSD 3-clause)
    Allowances: Allows a warranty (BSD 2-clause and 3-clause)

License family: MIT (X11 or Expat), ISC
    Restrictions: Protects issuer's trademark (ISC and MIT/X11)
    Allowances: Allows sublicensing with a different license

License family: Apache version 2.0
    Restrictions: Protects issuer's trademark
    Allowances: Allows sublicensing, use in patents
    Requirements: Must state changes made to the source

License family: GPL
    Restrictions: Prohibits sublicensing with a different license
    Allowances: Allows a warranty, and (GPLv3 only) use in patents
    Requirements: Must state changes to the source and include source code
Licensing Resources
Van Lindberg's book Intellectual Property and Open Source (O'Reilly) is a great resource on the legal aspects of open source software. It will help you understand not only licenses, but also the legal aspects of other intellectual property topics like trademarks, patents, and copyrights as they relate to open source. If you're not that concerned about legal matters and just want to choose something quickly, these sites can help:
GitHub offers a handy guide that summarizes and compares licenses in a few sentences.
TLDRLegal18 lists what can, cannot, and must be done under the terms of each license in quick bullets.
The OSI list of approved licenses contains the full text of all licenses that have passed their license review process for compliance with the Open Source Definition (allowing software to be freely used, modified, and shared).
1 For a book that contains decades of experience about reading and refactoring code, we recommend Object-Oriented Reengineering Patterns (Square Bracket Associates) by Serge Demeyer, Stéphane Ducasse, and Oscar Nierstrasz.
2 A daemon is a computer program that runs as a background process.
Reading Great Code
Programmers read a lot of code. One of the core tenets behind Python's design is readability, and one secret to becoming a great programmer is to read, understand, and comprehend excellent code. Such code typically follows the guidelines outlined in "Code Style" on page 43 and does its best to express a clear and concise intent to the reader.
This chapter shows excerpts from some very readable Python projects that illustrate
topics covered in Chapter 4. As we describe them, we’ll also share techniques for
reading code.1
Here's a list of projects highlighted in this chapter in the order they will appear:
HowDoI is a console application that searches the Internet for answers to coding
questions, written in Python.
Diamond is a Python daemon2 that collects metrics and publishes them to Graphite or other backends. It is capable of collecting CPU, memory, network, I/O, load and disk metrics. Additionally, it features an API for implementing custom collectors to gather metrics from almost any source.
Tablib is a format-agnostic tabular dataset library.
Requests is a HyperText Transfer Protocol (HTTP) library for human beings (the
90% of us who just want an HTTP client that automatically handles password
authentication and complies with the half-dozen standards to perform things like
a multipart file upload with one function call).
Werkzeug started as a simple collection of various utilities for Web Server Gateway Interface (WSGI) applications and has become one of the most advanced WSGI utility modules.
Flask is a web microframework for Python based on Werkzeug and Jinja2. It's good for getting simple web pages up quickly.
There is a lot more to all of these projects than what we're mentioning, and we really, really hope that after this chapter you'll be motivated to download and read at least one or two of them in depth yourself (and maybe even present what you learn to a local user group).
Common Features
Some features are common across all of the projects: details from a snapshot of each one show very few (fewer than 20, excluding whitespace and comments) lines of code on average per function, and a lot of blank lines. The larger, more complex projects use docstrings and/or comments; usually more than a fifth of the content of the code base is some sort of documentation. But we can see from HowDoI, which has no docstrings because it is not for interactive use, that comments are not necessary when the code is straightforward. Table 5-1 shows common practices in these projects.
Table 5-1. Common features in the example projects

Package    License       Line count  Docstrings    Comments      Blank lines   Average function length
                                     (% of lines)  (% of lines)  (% of lines)
HowDoI     MIT           262         0%            6%            20%           13 lines of code
Diamond    MIT           6,021       21%           9%            16%           11 lines of code
Tablib     MIT           1,802       19%           4%            27%           8 lines of code
Requests   Apache 2.0    4,072       23%           8%            19%           10 lines of code
Flask      BSD 3-clause  10,163      7%            12%           11%           13 lines of code
Werkzeug   BSD 3-clause  25,822      25%           3%            13%           9 lines of code
In each section, we use a different code-reading technique to figure out what the project is about. Next, we single out code excerpts that demonstrate ideas mentioned elsewhere in this guide. (Just because we don't highlight things in one project doesn't mean they don't exist; we just want to provide good coverage of concepts across these examples.) You should finish this chapter more confident about reading code, with examples that reinforce what makes good code, and with some ideas you'd like to incorporate in your own code later.
92 | Chapter 5: Reading Great Code
3 If you run into trouble with lxml requiring a more recent libxml2 shared library, just install an earlier version of lxml by typing: pip uninstall lxml; pip install lxml==3.5.0. It will work fine.
HowDoI
With fewer than 300 lines of code, the HowDoI project, by Benjamin Gleitzman, is a great choice to start our reading odyssey.
Reading a Single-File Script
A script usually has a clear starting point, clear options, and a clear ending point. This
makes it easier to follow than libraries that present an API or provide a framework.
Get the HowDoI module from GitHub:3
$ git clone
$ virtualenv -p python3 venv # or use mkvirtualenv, your choice...
$ source venv/bin/activate
(venv)$ cd howdoi/
(venv)$ pip install --editable .
(venv)$ python # Run the unit tests.
You should now have the howdoi executable installed in venv/bin. (You can look at it if you want by typing cat `which howdoi` on the command line.) It was auto-generated when you ran pip install.
Read HowDoI’s documentation
HowDoI’s documentation is in the README.rst file in the HowDoI repository on
GitHub: it's a small command-line application that allows users to search the Internet for answers to programming questions.
From the command line in a terminal shell, we can type howdoi --help for the usage statement:
(venv)$ howdoi --help
usage: howdoi [-h] [-p POS] [-a] [-l] [-c] [-n NUM_ANSWERS] [-C] [-v]
instant coding answers via the command line
positional arguments:
QUERY the question to answer
optional arguments:
-h, --help show this help message and exit
-p POS, --pos POS select answer in specified position (default: 1)
-a, --all display the full text of the answer
-l, --link display only the answer link
-c, --color enable colorized output
-n NUM_ANSWERS, --num-answers NUM_ANSWERS
number of answers to return
-C, --clear-cache clear the cache
-v, --version displays the current version of howdoi
That's it—from the documentation we know that HowDoI gets answers to coding questions from the Internet, and from the usage statement we know we can choose the answer in a specific position, can colorize the output, get multiple answers, and that it keeps a cache that can be cleared.
Use HowDoI
We can confirm we understand what HowDoI does by actually using it. Here's an example:
(venv)$ howdoi --num-answers 2 python lambda function list comprehension
--- Answer 1 ---
[(lambda x: x*x)(x) for x in range(10)]
--- Answer 2 ---
[x() for x in [lambda m=m: m for m in [1,2,3]]]
# [1, 2, 3]
We've installed HowDoI, read its documentation, and can use it. On to reading actual code!
If you look inside the howdoi/ directory, you'll see it contains two files: an __init__.py, which contains a single line that defines the version number, and howdoi.py, which we'll open and read.
Skimming howdoi.py, we see each new function definition is used in the next function, making it easy to follow. And each function does just one thing—the thing its name says. The main function, command_line_runner(), is near the bottom of howdoi.py.
Rather than reprint HowDoI's source here, we can illustrate its call structure using the call graph in Figure 5-1. It was created by Python Call Graph, which provides a visualization of the functions called when running a Python script. This works well with command-line applications thanks to a single start point and the relatively few paths through their code. (Note that we manually deleted functions not in the HowDoI project from the rendered image to legibly fit it on the page, and slightly recolored and reformatted it.)
Figure 5-1. Clean paths and clear function names in this howdoi call graph
The code could have been all one large, incomprehensible spaghetti function. Instead, intentional choices structure the code into compartmentalized functions with straightforward names. Here's a brief description of the execution depicted in Figure 5-1: command_line_runner() parses the input and passes the user flags and the query to howdoi(). Then, howdoi() wraps _get_instructions() in a try/except statement so that it can catch connection errors and print a reasonable error message (because application code should not terminate on exceptions).
The primary functionality is in _get_instructions(): it calls _get_links() to do a
Google search of Stack Overflow for links that match the query, then calls
_get_answer() once for each resulting link (up to the number of links that the user
specified on the command line—the default is just one link).
The _get_answer() function follows a link to Stack Overflow, extracts code from the
answer, colorizes it, and returns it to _get_instructions(), which will combine all
of the answers into one string, and return it. Both _get_links() and _get_answer()
call _get_result() to actually do the HTTP request: _get_links() for the Google
query, and _get_answer() for the resulting links from the Google query.
All _get_result() does is wrap requests.get() with a try/except statement so
that it can catch SSL errors, print an error message, and re-raise the exception so that
the top-level try/except can catch it and exit. Catching all exceptions before exiting
is best practice for application programs.
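The shape of that layered exception handling, reduced to a runnable sketch (the function names echo howdoi's, but the bodies are stand-ins of our own, not its real code):

```python
def _do_request(url):
    # Stand-in for the real HTTP call (howdoi uses requests.get()).
    if not url.startswith('https://'):
        raise IOError('connection failed')
    return 'some answer text'

def _get_result(url):
    # Inner wrapper: report the error, then re-raise for the top level.
    try:
        return _do_request(url)
    except IOError:
        print('Request failed; passing the error up.')
        raise

def howdoi(url):
    # Top-level wrapper: an application should exit with a message,
    # not a traceback.
    try:
        return _get_result(url)
    except IOError:
        return 'Sorry, could not get a result.'

print(howdoi('https://example.com'))  # some answer text
print(howdoi('ftp://example.com'))    # Sorry, could not get a result.
```

The inner function handles only the error it knows about and re-raises; the outermost function is the single place that decides what the user sees.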
HowDoI’s Packaging
HowDoI's setup.py, above the howdoi/ directory, is a good example setup module because in addition to normal package installation, it also installs an executable (which you can refer to when packaging your own command-line utility). The setuptools.setup() function uses keyword arguments to define all of the configuration options. The part that identifies the executable is associated with the keyword argument entry_points:
setup(
    ##~~ ... Skip the other typical entries ...
    entry_points={
        'console_scripts': [
            'howdoi = howdoi.howdoi:command_line_runner',
        ],
    },
    ## ~~ ... Skip the list of dependencies ...
)
The keyword to list console scripts is console_scripts.
This declares that the executable named howdoi will have as its target the function howdoi.howdoi.command_line_runner(). So later when reading, we will know command_line_runner() is the starting point for running the whole application.
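Put together, a minimal setup module using this mechanism might look like the following (a generic sketch with placeholder project metadata, not HowDoI's actual setup.py):

```python
from setuptools import setup, find_packages

# Guarded so the call only runs when executed as a script,
# e.g. via `pip install .`.
if __name__ == '__main__':
    setup(
        name='mytool',            # placeholder project name
        version='0.1.0',
        packages=find_packages(),
        entry_points={
            'console_scripts': [
                # "installed-name = package.module:function"
                'mytool = mytool.cli:main',
            ],
        },
    )
```

After installation, setuptools generates a mytool executable on the PATH whose target is the hypothetical function mytool.cli:main().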
Structure Examples from HowDoI
HowDoI is a small library, and we’ll be highlighting structure much more elsewhere,
so there are only a few notes here.
Let each function do just one thing
We can't reiterate enough how beneficial it is for readers that HowDoI separates out its internal functions to each do just one thing. Also, there are functions whose sole purpose is to wrap other functions with a try/except statement. (The only function with a try/except that doesn't follow this practice is _format_output(), which leverages try/except clauses to identify the correct coding language for syntax highlighting, not for exception handling.)
Leverage data available from the system
HowDoI checks and uses relevant system values, such as urllib.request.getproxies(), to handle the use of proxy servers (this can be the case in organizations like schools that have an intermediary server filtering the connection to the Internet), or in this snippet:

XDG_CACHE_DIR = os.environ.get(
    'XDG_CACHE_HOME',
    os.path.join(os.path.expanduser('~'), '.cache'))
How do you know that these variables exist? The need for urllib.request.getproxies() is evident from the optional arguments in requests.get(), so part of this information comes from understanding the API of the libraries you call. Environment variables are often utility-specific, so if a library is intended for use with a particular database or other sister application, those applications' documentation lists the relevant environment variables. For plain POSIX systems, a good place to start is Ubuntu's list of default environment variables, or else the base list of environment variables in the POSIX specification, which links to various other relevant lists.
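Both lookups follow the same defensive pattern: ask the environment, and fall back to something sensible. A standalone sketch (XDG_CACHE_HOME mirrors the XDG convention; the actual values depend on whatever your environment provides):

```python
import os
import urllib.request

# getproxies() reads variables such as HTTP_PROXY and returns an
# empty dict when none are set -- so it is always safe to call.
proxies = urllib.request.getproxies()

# os.environ.get() with a default never raises KeyError.
cache_dir = os.environ.get(
    'XDG_CACHE_HOME',
    os.path.join(os.path.expanduser('~'), '.cache'))
```

Code written this way works unchanged whether or not the user has configured anything.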
Style Examples from HowDoI
HowDoI mostly follows PEP 8, but not pedantically, and not when it restricts read‐
ability. For example, import statements are at the top of the file, but standard library
and external modules are intermixed. And although the string constants in
USER_AGENTS are much longer than 80 characters, there is no natural place to break
the strings, so they are left intact.
These next excerpts highlight other style choices we've previously advocated for in Chapter 4.
Underscore-prexed function names (we are all responsible users)
Almost every function in HowDoI is prefixed with an underscore. This identifies them as for internal use only. For most of them, this is because, if called, there is the possibility of an uncaught exception (anything that calls _get_result() risks this) until the howdoi() function, which handles the possible exceptions.
The rest of the internal functions (_format_output(), _is_question(), _enable_cache(), and _clear_cache()) are identified as such because they're simply not intended for use outside of the package. The testing script, howdoi/test_howdoi.py, only calls the nonprefixed functions, checking that the formatter works by feeding a command-line argument for colorization to the top-level howdoi.howdoi() function, rather than by feeding code to howdoi._format_output().
Handle compatibility in just one place (readability counts)
Differences between versions of possible dependencies are handled before the main code body so the reader knows there won't be dependency issues, and version checking doesn't litter the code elsewhere. This is nice because HowDoI is shipped as a command-line tool, and the extra effort means users won't be forced to change their Python environment just to accommodate the tool. Here is the snippet with the version checking:
try:
    from urllib.parse import quote as url_quote
except ImportError:
    from urllib import quote as url_quote

try:
    from urllib import getproxies
except ImportError:
    from urllib.request import getproxies
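The same fallback-import idiom works for any optional or relocated dependency. A hedged sketch using only the standard library (functools.lru_cache exists on Python 3.2+; the fallback here is a do-nothing stand-in, not howdoi's code):

```python
try:
    from functools import lru_cache   # Python 3.2+
except ImportError:
    def lru_cache(maxsize=128):       # minimal no-op replacement
        def decorator(func):
            return func
        return decorator

@lru_cache(maxsize=None)
def square(n):
    return n * n
```

The rest of the module can then use lru_cache without caring which branch ran.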
And the following snippet resolves the difference between Python 2 and Python 3’s
Unicode handling in seven lines, by creating the function u(x) to either do nothing
or emulate Python 3. Plus it follows Stack Overflow’s new citation guideline, by citing
the original source:
# Handle Unicode between Python 2 and 3
if sys.version < '3':
    import codecs
    def u(x):
        return codecs.unicode_escape_decode(x)[0]
else:
    def u(x):
        return x
Pythonic choices (beautiful is better than ugly)
The following snippet from howdoi.py shows thoughtful, Pythonic choices. The function get_link_at_pos() returns False if there are no results, or else identifies the links that are to Stack Overflow questions, and returns the one at the desired position (or the last one if there aren't enough links):
def _is_question(link):
    return re.search(r'questions/\d+/', link)

# [ ... skip a function ... ]

def get_link_at_pos(links, position):
    links = [link for link in links if _is_question(link)]

    if not links:
        return False

    if len(links) >= position:
        link = links[position - 1]
    else:
        link = links[-1]

    return link
The first function, _is_question(), is defined as a separate one-liner, giving clear meaning to an otherwise opaque regular expression search.
The list comprehension reads like a sentence, thanks to the separate definition of
_is_question() and meaningful variable names.
The early return statement flattens the code.
The additional step of assigning to the variable link here…
…and here, rather than two separate return statements with no named variable
at all, reinforces the purpose of get_link_at_pos() with clear variable names.
The code is self-documenting.
The single return statement at the highest indentation level explicitly shows that
all paths through the code exit either right away—because there are no links—or
at the end of the function, returning a link. Our quick rule of thumb works: we
can read the first and last line of this function and understand what it does.
(Given multiple links and a position, get_link_at_pos() returns one single link:
the one at the given position.)
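Repeating both functions so the snippet runs standalone, the behavior described above is easy to verify (the links are made-up examples, not real Stack Overflow URLs):

```python
import re

def _is_question(link):
    return re.search(r'questions/\d+/', link)

def get_link_at_pos(links, position):
    # Keep only links to Stack Overflow questions.
    links = [link for link in links if _is_question(link)]
    if not links:
        return False
    if len(links) >= position:
        link = links[position - 1]
    else:
        link = links[-1]   # not enough links: fall back to the last one
    return link

links = [
    'https://stackoverflow.com/questions/1/split',  # a question link
    'https://stackoverflow.com/tags',               # filtered out
    'https://stackoverflow.com/questions/2/join',   # a question link
]
```

Asking for position 1 returns the first question link; asking for a position past the end falls back to the last one; an empty list returns False.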
Diamond
Diamond is a daemon (an application that runs continuously as a background process) that collects system metrics and publishes them to downstream programs like MySQL, Graphite (a platform open sourced by Orbitz in 2008 that stores, retrieves, and optionally graphs numeric time-series data), and others. We'll get to explore good package structure, as Diamond is a multifile application, much larger than HowDoI.
Reading a Larger Application
Diamond is still a command-line application, so like with HowDoI, there's still a clear starting point and clear paths of execution, although the supporting code now spans multiple files.
Get Diamond from GitHub (the documentation says it only runs on CentOS or Ubuntu, but code in its setup.py makes it appear to support all platforms; however, some of the commands that default collectors use to monitor memory, disk space, and other system metrics are not available on Windows). As of this writing, it still uses Python 2.7:
$ git clone
$ virtualenv -p python2 venv # It's not Python 3 compatible yet...
$ source venv/bin/activate
(venv)$ cd Diamond/
(venv)$ pip install --editable .
(venv)$ pip install mock docker-py # These are dependencies for testing.
(venv)$ python # Run the unit tests.
Like with the HowDoI library, Diamond's setup script installs executables in venv/bin/: diamond and diamond-setup. This time they're not automatically generated; they're prewritten scripts in the project's Diamond/bin/ directory. The documentation says that diamond starts the server, and diamond-setup is an optional tool to walk users through interactive modification of the collector settings in the configuration file.
There are a lot of additional directories, and the diamond package is underneath Diamond/src in this project directory. We are going to look at files in Diamond/src (which contains the main code), Diamond/bin (which contains the executable diamond), and Diamond/conf (which contains the sample configuration file). The rest of the directories and files may be of interest to people distributing similar applications but are not what we want to cover right now.
Read Diamond’s documentation
First, we can get a sense of what the project is and what it does by scanning the online documentation. Diamond's goal is to make it easy to gather system metrics on clusters of machines. Originally open sourced by Brightcove, Inc., in 2011, it now has over 200 contributors.
After describing its history and purpose, the documentation tells you how to install it, and then says how to run it: just modify the example configuration file (in our download it's in conf/diamond.conf.example), put it in the default location (/etc/diamond/diamond.conf) or a path you'll specify on the command line, and you're set. There's also a helpful section on configuration in the Diamond wiki page.

4. When you daemonize a process, you fork it, detach its session ID, and fork it again, so that the process is totally disconnected from the terminal you're running it in. (Nondaemonized programs exit when the terminal is closed; you may have seen the warning message "Are you sure you want to close this terminal? Closing it will kill the following processes:" before it lists all of the currently running processes.) A daemonized process will run even after the terminal window closes. It's named daemon after Maxwell's daemon (a clever daemon, not a nefarious one).
From the command line, we can get the usage statement via diamond --help:

(venv)$ diamond --help
Usage: diamond [options]

Options:
  -h, --help            show this help message and exit
  -c CONFIGFILE, --configfile=CONFIGFILE
                        config file
  -f, --foreground      run in foreground
  -l, --log-stdout      log to stdout
  -p PIDFILE, --pidfile=PIDFILE
                        pid file
  -r COLLECTOR, --run=COLLECTOR
                        run a given collector once and exit
  -v, --version         display the version and exit
  --skip-pidfile        Skip creating PID file
  -u USER, --user=USER  Change to specified unprivileged user
  -g GROUP, --group=GROUP
                        Change to specified unprivileged group
  --skip-change-user    Skip changing to an unprivileged user
  --skip-fork           Skip forking (daemonizing) process

From this, we know it uses a configuration file; by default, it runs in the background; it has logging; you can specify a PID (process ID) file; you can test collectors; you can change the process's user and group; and by default it will daemonize (fork) the process.
Use Diamond
To understand it even better, we can run Diamond. We need a modified configura‐
tion file, which we can put in a directory we make called Diamond/tmp. From inside
the Diamond directory, type:
(venv)$ mkdir tmp
(venv)$ cp conf/diamond.conf.example tmp/diamond.conf
Then edit tmp/diamond.conf to look like this:
### Options for the server
[server]

# Handlers for published metrics.
handlers = diamond.handler.archive.ArchiveHandler

user =
group =

# Directory to load collector modules from
collectors_path = src/collectors/

### Options for handlers
[handlers]

[[ArchiveHandler]]
log_file = /dev/stdout

### Options for collectors
[collectors]

[[default]]
# Default Poll Interval (seconds)
interval = 20

### Default enabled collectors
[[CPUCollector]]
enabled = True

[[MemoryCollector]]
enabled = True
We can tell from the example configuration file that:

• There are multiple handlers, which we can select by class name.
• We have control over the user and group that the daemon runs as (empty means to use the current user and group).
• We can specify a path to look for collector modules. This is how Diamond will know where the custom Collector subclasses are: we directly state it in the configuration file.
• We can also configure handlers individually.
Next, run Diamond with options that set logging to /dev/stdout (with default format‐
ting configurations), that keep the application in the foreground, that skip writing the
PID file, and that use our new configuration file:
(venv)$ diamond -l -f --skip-pidfile --configfile=tmp/diamond.conf
To end the process, type Ctrl+C until the command prompt reappears. The log output demonstrates what collectors and handlers do: collectors collect different metrics (such as the MemoryCollector's total, available, free, and swap memory sizes), which the handlers format and send to various destinations, such as Graphite, MySQL, or, in our test case, as log messages to /dev/stdout.

5. In PyCharm, do this by navigating in the menu bar to PyCharm → Preferences → Project:Diamond → Project Interpreter, and then selecting the path to the Python interpreter in the current virtual environment.
Reading Diamond’s code
IDEs can be useful when reading larger projects—they can quickly locate the original
definitions of functions and classes in the source code. Or, given a definition, they
can find all places in the project where it is used. For this functionality, set the IDE’s
Python interpreter to the one in your virtual environment.5
Instead of following each function as we did with HowDoI, Figure 5-2 follows the
import statements; the diagram just shows which modules in Diamond import which
other modules. Drawing sketches like these helps by providing a very high-level view of larger projects: you hide the trees so you can see the forest. We can start with the
diamond executable file on the top left and follow the imports through the Diamond
project. Aside from the diamond executable, every square outline denotes a file (mod‐
ule) or directory (package) in the src/diamond directory.
Figure 5-2. e module import structure of Diamond
Diamond's well-organized and appropriately named modules make it possible to get an idea of what the code is doing solely from our diagram: diamond gets the version from util, then sets up logging using utils.log and starts a Server instance using server. The Server imports from almost all of the modules in the utils package, using utils.classes to access both the Handlers in handler and the collectors, config to read the configuration file and obtain settings for the collectors (and the extra paths to the user-defined collectors), and scheduler and signals to set the polling interval for the collectors to calculate their metrics, and to set up and start the handlers processing the queue of metrics to send them to their various destinations.
The diagram doesn't include the helper modules that are used by specific collectors, or the over 20 handler implementations defined in the handler subpackage, or the over 100 collector implementations defined in the project's Diamond/src/collectors/ directory (which is installed elsewhere when not installed the way we did for reading, that is, when using PyPI or Linux package distributions instead of the source). These are imported using diamond.classes.load_dynamic_class(), which then calls the function diamond.util.load_class_from_name() to load the classes from the string names given in the configuration file, so the import statements do not explicitly name them.
To understand why there is both a utils package and a util module, you have to dig into the actual code: the util module provides functions related more to Diamond's packaging than to its operation. It has a function to get the version number from version.__VERSION__, and two functions that parse strings that identify either modules or classes, and import them.
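The general technique (import a module by its dotted name, then pull the class off it) takes only a few lines. This is a sketch of the idea, not Diamond's exact implementation:

```python
import importlib

def load_class_from_name(fqcn):
    # Split 'package.module.ClassName' into module path and class name.
    module_name, class_name = fqcn.rsplit('.', 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

# Any importable class works; a stdlib class makes the demo self-contained.
cls = load_class_from_name('collections.OrderedDict')
```

This is exactly what makes it possible for a configuration file to name collectors as strings rather than requiring import statements in the code.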
Logging in Diamond
The function diamond.utils.log.setup_logging(), found in src/diamond/utils/log.py, is called from the main() function in the diamond executable when starting the application:

# Initialize logging
log = setup_logging(options.configfile, options.log_stdout)
If options.log_stdout is True, setup_logging() will set up a logger with default formatting to log to standard output at the DEBUG level. Here's the excerpt that does this:

##~~ ... Skip everything else ...

def setup_logging(configfile, stdout=False):
    log = logging.getLogger('diamond')
    if stdout:
        streamHandler = logging.StreamHandler(sys.stdout)
        ##~~ ... Skip this ...
Otherwise, it parses the configuration file using logging.config.fileConfig() from the Python Standard Library. Here is the function call; it's indented because it's inside the preceding if/else statement, and a try/except block:

        logging.config.fileConfig(configfile)
The logging configuration ignores keywords in the configuration file that aren't related to logging. This is how Diamond can use the same configuration file for both its own and the logging configuration. The sample configuration file, located in Diamond/conf/diamond.conf.example, identifies the logging handler among the other Diamond handlers:
### Options for handlers
[handlers]

# daemon logging handler(s)
keys = rotated_file
It defines example loggers later in the configuration file, under the header “Options
for logging,” recommending the logging config file documentation for details.
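The stdout branch described above boils down to a handful of standard logging calls (a generic reconstruction of the idea, not Diamond's exact code):

```python
import logging
import sys

def setup_stdout_logging(name='demo'):
    log = logging.getLogger(name)
    log.setLevel(logging.DEBUG)                    # log everything
    handler = logging.StreamHandler(sys.stdout)    # to stdout, not stderr
    handler.setFormatter(
        logging.Formatter('%(name)s %(levelname)s %(message)s'))
    log.addHandler(handler)
    return log

log = setup_stdout_logging()
log.debug('collector started')  # visible because the level is DEBUG
```

Routing the stream handler to sys.stdout rather than the default sys.stderr is what lets diamond -l interleave its log lines with ordinary output.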
Structure Examples from Diamond
Diamond is more than an executable application; it's also a library that provides a way for users to create and use custom collectors.
We’ll highlight more things we like about the overall package structure, and then dig
into how exactly Diamond makes it possible for the application to import and use
externally defined collectors.
Separate dierent functionality into namespaces (they are one honking great idea)
The diagram in Figure 5-2 shows the server module interacting with three other
modules in the project: diamond.handler, diamond.collector, and diamond.utils.
The utils subpackage could realistically have contained all of its classes and functions
in a single, large module, but there was an opportunity to use namespaces to
separate code into related groups, and the development team took it. Honking great!
All of the implementations of Handlers are contained in diamond/handler (which makes sense), but the structure for the Collectors is different. There's no directory, only a module, diamond/collector.py, that defines the Collector and ProcessCollector base classes.

6. In Python, an abstract base class is a class that has left certain methods undefined, with the expectation that the developer will define them in the subclass. In the abstract base class, such a method raises a NotImplementedError. A more modern alternative is to use Python's module for abstract base classes, abc, first implemented in Python 2.6, which will error when constructing an incomplete class rather than when trying to access that class's unimplemented method. The full specification is defined in PEP 3119.

7. This is a paraphrase of a great blog post on the subject by Larry Cuban, a professor emeritus of education at Stanford, titled "The Difference Between Complicated and Complex Matters."

All implementations of the Collectors are defined instead in Diamond/src/collectors/ and would be installed in the virtual environment under venv/
share/diamond/collectors when installing from PyPI (as recommended) rather than
from GitHub (like we did to read it). This helps the user to create new implementations of Collectors: placing all of the collectors in the same location makes it easier for the application to find them and easier for library users to follow their example.
Finally, each Collector implementation in Diamond/src/collectors is in its own directory (rather than in a single file), which makes it possible to keep each Collector implementation's tests separate. Also honking great.
User-extensible custom classes (complex is better than complicated)
It's easy to add new Collector implementations: just subclass the diamond.collector.Collector abstract base class,6 implement a Collector.collect() method, and place the implementation in its own directory in venv/src/collectors/.
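The subclassing contract can be sketched with simplified stand-ins (these are not Diamond's real classes, which also handle scheduling, configuration, and metric publishing):

```python
class Collector:
    """Abstract base: subclasses are expected to override collect()."""
    def collect(self):
        raise NotImplementedError  # the abc module offers a stricter approach

class CPUCollector(Collector):
    def collect(self):
        # A real collector would read /proc/stat or similar;
        # this metric name and value are purely illustrative.
        return {'cpu.idle': 98.7}
```

The user writes only the short subclass; everything else the base class provides comes for free.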
Underneath, the implementation is complex, but the user doesn't see it. This section shows both the simple user-facing part of Diamond's Collector API and the complex code that makes this user interface possible.
Complex versus complicated. We can boil down the user experience of working with complex code to be something like experiencing a Swiss watch: it just works, but inside there are a ton of precisely made little pieces, all interfacing with remarkable precision, in order to create the effortless user experience. Using complicated code, on the other hand, is like piloting an airplane: you really have to know what you're doing to not crash and burn.7 We don't want to live in a world without airplanes, but we do want our watches to work without us having to be rocket scientists. Wherever it's possible, less complicated user interfaces are a good thing.
The simple user interface. To create a custom data collector, the user must subclass the abstract class, Collector, and then provide, via the configuration file, the path to that new collector. Here is an example of a new Collector definition from Diamond/src/collectors/cpu/cpu.py. When Python searches for the collect() method, it will look in CPUCollector for a definition first, and then if it doesn't find the definition, it