Python For Athena Manual

ENTHOUGHT SCIENTIFIC COMPUTING SOLUTIONS

Python for Athena

Enthought, Inc.
www.enthought.com

(c) 2001-2014, Enthought, Inc.
All Rights Reserved.
All trademarks and registered trademarks are the property of their
respective owners.
Enthought, Inc.
515 Congress Avenue
Suite 2100
Austin, TX 78701
www.enthought.com

Q1-2014


Background . . . . . . . . . .   2
IPython . . . . . . . . . .   6
Language Introduction . . . . . . . . . .  19
Strings . . . . . . . . . .  23
Lists (Indexing and Slicing) . . . . . . . . . .  32
Mutable vs. Immutable . . . . . . . . . .  38
Tuple . . . . . . . . . .  39
Dictionaries . . . . . . . . . .  40
Sets . . . . . . . . . .  44
Assignment . . . . . . . . . .  47
If Statements . . . . . . . . . .  49
While Loops . . . . . . . . . .  51
For Loops . . . . . . . . . .  52
List Comprehensions . . . . . . . . . .  53
Looping Patterns . . . . . . . . . .  56
Functions . . . . . . . . . .  59
Two types of arguments . . . . . . . . . .  61
Variable number of arguments . . . . . . . . . .  62
Modules . . . . . . . . . .  65
PYTHONPATH . . . . . . . . . .  71
Reading Files . . . . . . . . . .  72
Writing Files . . . . . . . . . .  74
Simple Class Example . . . . . . . . . .  75
Overloading Addition . . . . . . . . . .  76
Sorting . . . . . . . . . .  77
Error and Exception Handling . . . . . . . . . .  80
Finally Clause . . . . . . . . . .  84
Else Clause . . . . . . . . . .  85
Raising Exceptions . . . . . . . . . .  86
Error Messages . . . . . . . . . .  87
Warnings . . . . . . . . . .  88
Defining New Exceptions . . . . . . . . . .  90
File IO Error Handling and 'with' statement . . . . . . . . . .  91
Object Oriented Programming in Python . . . . . . . . . .  93

Class Definition . . . . . . . . . .  97
Object Creation Process . . . . . . . . . .  98
Simple Class Example . . . . . . . . . .  99
Overloading Addition . . . . . . . . . . 100
Inheritance . . . . . . . . . . 101
Super . . . . . . . . . . 104
Advanced Python Topics . . . . . . . . . . 106
Classes and properties . . . . . . . . . . 107
Variable Scoping Rules . . . . . . . . . . 113
Class Scoping Rules . . . . . . . . . . 117
Partial Function Application (Curries) . . . . . . . . . . 119
Decorators . . . . . . . . . . 120
Iterators . . . . . . . . . . 129
Generators . . . . . . . . . . 130
Timing Code . . . . . . . . . . 132
Dynamic Code Execution . . . . . . . . . . 135
Dynamic byte-code generation . . . . . . . . . . 137
csv files . . . . . . . . . . 138
Connecting to Databases . . . . . . . . . . 141
Regular Expressions in Python . . . . . . . . . . 146
datetime . . . . . . . . . . 156
Timing Code . . . . . . . . . . 158
sys module . . . . . . . . . . 161
os module . . . . . . . . . . 164
Remote calls using XMLRPC . . . . . . . . . . 166
Calling Sub-processes . . . . . . . . . . 168
Multiprocessing . . . . . . . . . . 173
What is multiprocessing? . . . . . . . . . . 174
Multiprocessing vs. Threading . . . . . . . . . . 175
The Process Class . . . . . . . . . . 176
Inter-process Communication . . . . . . . . . . 178
Synchronization . . . . . . . . . . 180
Task Pools . . . . . . . . . . 182
Shared Memory . . . . . . . . . . 184
Multiprocessing and NumPy . . . . . . . . . . 185
NumPy . . . . . . . . . . 187
Matplotlib Basics (an interlude) . . . . . . . . . . 194
Plotting errors bars with Matplotlib . . . . . . . . . . 209
Plotting in 3D . . . . . . . . . . 211
Introducing NumPy Arrays . . . . . . . . . . 215
Slicing/Indexing Arrays . . . . . . . . . . 218
Multi-Dimensional Arrays . . . . . . . . . . 219

Fancy Indexing . . . . . . . . . . 224
Array Data Structure . . . . . . . . . . 230
Array Calculation Methods . . . . . . . . . . 243
Summary of Array Attributes and Methods . . . . . . . . . . 247
Array Creation Functions . . . . . . . . . . 251
Trig and Other Functions . . . . . . . . . . 256
Vectorizing Functions . . . . . . . . . . 258
Array Operators . . . . . . . . . . 259
Universal Function Methods . . . . . . . . . . 263
Other NumPy Functions . . . . . . . . . . 269
Array Broadcasting . . . . . . . . . . 272
Vector Quantization . . . . . . . . . . 278
Structured Arrays . . . . . . . . . . 284
Memory Mapped Arrays . . . . . . . . . . 288
Output Formats . . . . . . . . . . 301
Error Handling . . . . . . . . . . 303
Interactive 3D Visualization with Mayavi - mlab . . . . . . . . . . 305
Mlab . . . . . . . . . . 306
Various types of plots with Mlab . . . . . . . . . . 308
Figure decorations . . . . . . . . . . 315
Mayavi GUI . . . . . . . . . . 316
Selected topics on scientific analysis with SciPy . . . . . . . . . . 318
Statistics . . . . . . . . . . 320
Linear Algebra . . . . . . . . . . 328
Interpolation . . . . . . . . . . 335
Optimization . . . . . . . . . . 344
Data analysis with pandas . . . . . . . . . . 351
Pandas . . . . . . . . . . 352
Creating pandas data structures . . . . . . . . . . 354
Reading from and writing to files . . . . . . . . . . 357
Index manipulation and data alignment . . . . . . . . . . 358
Computations with pandas . . . . . . . . . . 362
Pivot tables and advanced reshaping . . . . . . . . . . 365
Data aggregation . . . . . . . . . . 367
Dealing with dates and times . . . . . . . . . . 369
Visualization . . . . . . . . . . 371
Software Craftsmanship in Python . . . . . . . . . . 373
Naming Variables . . . . . . . . . . 381
Documenting Code . . . . . . . . . . 399
Coding Standards . . . . . . . . . . 422
Functions and Refactoring Code . . . . . . . . . . 446
Code Read Example . . . . . . . . . . 447

Testing Code . . . . . . . . . . 456
Monitoring execution . . . . . . . . . . 468
Timing Code . . . . . . . . . . 472
Timing and profiling . . . . . . . . . . 474
Timing Code . . . . . . . . . . 475
Profiling with cProfile . . . . . . . . . . 477
cProfile with kernprof.py script . . . . . . . . . . 478
cProfile from ipython . . . . . . . . . . 479
Profiling with line profiler and kernprof . . . . . . . . . . 487
The Python Debugger . . . . . . . . . . 492
Overview of Python with other languages . . . . . . . . . . 502
Interface with C : hand wrapping . . . . . . . . . . 506
Cython . . . . . . . . . . 512
Def vs. CDef . . . . . . . . . . 517
Def vs. CDef . . . . . . . . . . 518
Functions from C Libraries . . . . . . . . . . 519
Structures from C Libraries . . . . . . . . . . 520
Classes . . . . . . . . . . 521
Classes . . . . . . . . . . 523
Using NumPy with Cython . . . . . . . . . . 525


What Is Python?
ONE LINER
Python is an interpreted programming language that allows you to do almost
anything possible with a compiled language (C/C++/Fortran) without requiring all the
complexity.

PYTHON HIGHLIGHTS

• Automatic garbage collection
• "Batteries Included"
• Dynamic typing
• Portable
• Interpreted and interactive
• Easy to Learn and Use
• Object-oriented
• Truly Modular
• Free

2

Who is using Python?
WALL STREET
Banks and hedge funds rely on Python for their high speed trading
systems, data analysis, and visualization.

GOOGLE
One of the top three languages used at Google along with C++ and Java.
Guido van Rossum worked there.

HOLLYWOOD
Digital animation and special effects:
• Industrial Light and Magic
• Imageworks
• Tippett Studios
• Disney
• Dreamworks

YOUTUBE
Highly scalable web application for delivering video content.

PETROLEUM INDUSTRY
Geophysics and exploration tools:
• ConocoPhillips
• Shell

PROCTER & GAMBLE
Fluid dynamics simulation tools.

REDHAT
Anaconda, the Redhat Linux installer program, is written in Python.

3

PYPL PopularitY of Programming Language index

* https://sites.google.com/site/pydatalog/pypl/PyPL-PopularitY-of-Programming-Language (CC by 3.0 license)

4

GitHub Top Languages 2013

5

IPython
An enhanced
interactive Python shell

6

Starting IPython
IPython can be started:
• w/ the pylab icon

• by typing ipython --pylab in any
terminal/command prompt

7

Starting IPython in Canopy

8

PyLab: Interactive Python Environment

The PyLab mode in IPython handles some gory details behind the scenes.
It allows the Python command interpreter and the GUI plot window to
coexist, which involves a bit of multithreaded magic. PyLab also
imports some handy functions into the command interpreter for user
convenience.
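For instance (a minimal sketch, assuming a standard pylab session),
NumPy and Matplotlib names are usable immediately, without explicit
imports:

In [1]: x = linspace(0, 2*pi, 100)   # linspace and pi come from numpy
In [2]: plot(x, sin(x))              # plot comes from matplotlib.pyplot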

9

IPython notebook: exportable session
A notebook contains in one file the inputs, the outputs, figures,
plain text (markdown), titles and more...
• Stored as .ipynb files.
• Create a new one in Canopy by selecting:
  File > New > IPython notebook

10

IPython
STANDARD PYTHON
In [1]: a=1
In [2]: a
Out[2]: 1

AVAILABLE VARIABLES
In [3]: b = [1,2,3]
# List available variables.
In [4]: %whos
Variable
Type
Data/Info
---------------------------a
int
1
b
list
n=3

RESET
# Remove user defined variables.
In [5]: %reset
In [6]: %whos
Interactive namespace is empty.
“%reset” also removes the
names imported by PyLab,
such as the plot command.
In [7]: plot
NameError: name 'plot' is not
defined
# Reload pylab.
In [8]: %pylab
Welcome to pylab,…
11

Directory Navigation in IPython
# Change directory (note Unix style forward slashes!)
In [9]: cd c:/python_class/Demos/speed_of_light
c:\python_class\Demos\speed_of_light
Tab completion helps you find and type directory and file names.

# List directory contents (Unix style, not “dir”).
In [10]: ls
Volume in drive C has no label.
Volume Serial Number is 5417-593D
Directory of c:\python_class\Demos\speed_of_light
09/01/2008  02:53 PM    <DIR>          .
09/01/2008  02:53 PM    <DIR>          ..
09/01/2008  02:48 PM             1,188 exercise_speed_of_light.txt
09/01/2008  02:48 PM         2,682,023 measurement_description.pdf
09/01/2008  02:48 PM           187,087 newcomb_experiment.pdf
09/01/2008  02:48 PM             1,312 speed_of_light.dat
09/01/2008  02:48 PM             1,436 speed_of_light.py
09/01/2008  02:48 PM             1,232 speed_of_light2.py
              6 File(s)      2,874,278 bytes
              2 Dir(s)  11,997,437,952 bytes free

12

Directory Bookmarks
# Print working directory name (Unix style, not “cd”).
In [11]: pwd
c:\python_class\Demos\speed_of_light

# Bookmark the demo and exercise directories, so we can return
# to them easily.
In [12]: cd ..
c:\python_class\Demos

In [13]: %bookmark demo
In [14]: cd ../Exercises
c:\python_class\Exercises

In [15]: %bookmark exer
In [16]: %bookmark -l
demo -> c:\python_class\Demos
exer -> c:\python_class\Exercises

In [17]: cd demo
(bookmark:demo) -> c:\python_class\Demos
13

Running Scripts in IPython
# tab completion
In [11]: %run speed_of_li
speed_of_light.dat speed_of_light.py
# execute a python file
In [11]: %run speed_of_light.py

14

Function Info
HELP USING ?
# Follow a command with '?' to print its documentation.
In [19]: len?
Type:           builtin_function_or_method
Base Class:     <type 'builtin_function_or_method'>
String Form:    <built-in function len>
Namespace:      Python builtin
Docstring:
    len(object) -> integer

    Return the number of items of a sequence or mapping.

15

Function Info
SHOW SOURCE CODE USING ??
# Follow a command with '??' to print its source code.
In [43]: squeeze??
def squeeze(a):
    """Remove single-dimensional entries from the shape of a.

    Examples
    --------
    >>> x = array([[[1,1,1],[2,2,2],[3,3,3]]])
    >>> x.shape
    (1, 3, 3)
    >>> squeeze(x).shape
    (3, 3)
    """
    try:
        squeeze = a.squeeze
    except AttributeError:
        return _wrapit(a, 'squeeze')
    return squeeze()

?? can't show the source code for "extension" functions that are
implemented in C.
16

IPython History
HISTORY COMMAND
# list previous commands. Use
# 'magic' % because 'hist' is
# histogram function in pylab
In [3]: %hist
a=1
a

INPUT HISTORY
# list string from prompt[2]
In [4]: _i2
Out[4]: 'a\n'

OUTPUT HISTORY
# grab previous result
In [5]: _
Out[5]: 'a\n'
# grab result from prompt[2]
In [6]: _2
Out[6]: 1

The up and down arrows scroll through your ipython input history.

17

Reading Simple Tracebacks
ERROR ADDING AN INTEGER TO A STRING
In [9]: 1 + "hello"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\... in <module>()
----> 1 1 + "hello"
TypeError: unsupported operand type(s) for +: 'int' and 'str'

The traceback shows the location and code where the error occurred,
the "type" of error that occurred, and a short message about why it
occurred.

ERROR TRYING TO ADD A NON-EXISTENT VARIABLE
# Again we fail when adding two variables, but note that the
# traceback tells us we have a completely different problem.
# In this case, our variable doesn't exist, so the operation fails.
In [10]: undefined_var + 1
…
NameError: name 'undefined_var' is not defined
18

Language Introduction

19

Interactive Calculator
# adding two values
>>> 1 + 1
2
# setting a variable
>>> a = 1
>>> a
1
# checking a variable's type
>>> type(a)
<type 'int'>
# an arbitrarily long integer
>>> a = 12345678901234567890
>>> a
12345678901234567890L
>>> type(a)
<type 'long'>
# Remove 'a' from the 'namespace'
>>> del a
>>> a
NameError: name 'a' is not defined

# real numbers
>>> b = 1.4 + 2.3
>>> b
3.6999999999999997
# "prettier" version.
>>> print b
3.7
>>> type(b)
<type 'float'>
# complex numbers
>>> c = 2+1.5j
>>> c
(2+1.5j)

The four numeric types in Python on 32-bit architectures are:
  integer (4 byte)
  long integer (any precision)
  float (8 byte like C's double)
  complex (16 byte)
The numpy module, which we will see later, supports a larger number
of numeric types.
20

More Interactive Calculation
ARITHMETIC OPERATIONS
>>> 1+2-(3*4/6)**5+(7%5)
-27

TYPE CONVERSION
>>> int(2.718281828)
2
>>> float(2)
2.0
>>> 1+2.0
3.0

SIMPLE MATH FUNCTIONS
>>> abs(-3)
3
>>> max(0, min(10, -1, 4, 3))
0
>>> round(2.718281828)
3.0

OVERWRITING FUNCTIONS (!)
# don't do this
>>> max = 100
# ...some time later...
>>> x = max(4, 5)
TypeError: 'int' object is not
callable

ALTERNATIVE NOTATIONS
>>> 0xFF
255
>>> 023 # octal!
19
>>> 6e3
6000.0

IN-PLACE OPERATIONS
>>> b = 2.5
>>> b += 0.5
# b = b + 0.5
>>> b
3.0
# Also -=, *=, /=, etc.
21

Logical expressions, bool data type
COMPARISON OPERATORS

and OPERATOR

# <, >, <=, >=, ==, !=
>>> 1 >= 2
False
>>> 1 + 1 == 2
True
>>> 2**3 != 3**2
True
# Chained comparisons
>>> 1 < 10 < 100
True

>>> 1 > 0 and 5 == 5
True
# If first operand is false,
# the second is not evaluated.
>>> 1 < 0 and max(0,1,2) > 1
False

bool DATA TYPE
>>> q = 1 > 0
>>> q
True
>>> type(q)


or OPERATOR
>>> a = 50
>>> a < 10 or a > 90
False
# If first operand is true,
# the second is not evaluated.
>>> a = 0
>>> a < 10 or a > 90
True

not OPERATOR
>>> not 10 <= a <= 90
True

22

Strings
CREATING STRINGS
# using double quotes
>>> s = "hello world"
>>> print s
hello world
# single quotes also work
>>> s = 'hello world'
>>> print s
hello world

STRING LENGTH
>>> s = "12345"
>>> len(s)
5

STRING OPERATIONS
# concatenating two strings
>>> "hello " + "world"
'hello world'
# repeating a string
>>> "hello " * 3
'hello hello hello '

SPLIT/JOIN STRINGS
# split space-delimited words
>>> s = "hello world"
>>> wrd_lst = s.split()
>>> print wrd_lst
['hello', 'world']
# join words back together
# with a space in between
>>> space = ' '
>>> space.join(wrd_lst)
'hello world'
23

A few string methods and functions
REPLACING TEXT
>>> s = "hello world"
>>> s.replace('world','Mars')
'hello Mars'

CONVERT TO UPPER CASE
>>> s.upper()
'HELLO WORLD'

REMOVE WHITESPACE
>>> s = "\t hello   \n"
>>> s.strip()
'hello'

NUMBERS TO STRINGS
>>> str(1.1 + 2.2)
'3.3'
>>> repr(1.1 + 2.2)
'3.3000000000000003'
>>> str(1)
'1'
>>> hex(255)
'0xff'
>>> oct(19)
'023'

STRINGS TO NUMBERS
>>> int('23')
23
>>> int('FF', 16)
255
>>> float('23')
23.0

24

Available string methods
# list available methods on a string
>>> dir(s)
[...
 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith',
 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha',
 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join',
 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind',
 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split',
 'splitlines', 'startswith', 'strip', 'swapcase', 'title',
 'translate', 'upper', 'zfill']

25

Multi-line Strings
TRIPLE QUOTES
# strings in triple quotes
# retain line breaks
>>> a = """hello
... world"""
>>> print a
hello
world

LINE CONTINUATION
# The \ character means line
# continuation. Be careful
# because it must be the last
# character on the line.
>>> a = "hello " \
...     "world"
>>> print a
hello world

MULTI-LINE WITH PARENTHESES
# group strings using
# parentheses
>>> a = ("hello "
...      "world")
>>> print a
hello world

NEW LINE CHARACTER
# including the new line
>>> a = "hello\nworld"
>>> print a
hello
world
26

String Formatting
str.format(*args, **kwargs)
The format() method replaces the replacement fields in the string with the values given as
arguments. Any other text in the string remains unchanged. For example:
>>> '{0:2d}-{1}: {name}, ${price:.2f}'.format(7, 19, name='SC1', price=3.4)
' 7-19: SC1, $3.40'
Replacement field:  {field_name:format_spec}   (the format_spec is optional)

• Delimited by curly brackets. (Use {{ and }} to include curly brackets
  in the output.)
• If field_name is an integer, it refers to a position in the positional
  arguments:
  >>> '{0} is greater than {1}'.format(100, 50)
  '100 is greater than 50'
• If field_name is a name, it refers to a keyword argument:
  >>> '{last}, {first}'.format(first='Ellen', last='Ripley')
  'Ripley, Ellen'

27

String Formatting - Format Spec
>>> 'price: ${0:=-7.2f}'.format(3.4)
'price: $   3.40'

The format spec is a sequence of characters
including:
• the alignment option,
• the sign option,
• the width (and .precision) option
• the type code.

ALIGNMENT OPTION
Char   Meaning
<      Left aligned.
>      Right aligned.
=      (For numeric types only.) Pad after the sign but before the
       digits (e.g. +000000120).
^      Center within the available space.

If an alignment character is given, it may be preceded by a fill
character.

SIGN OPTION
For numbers only.
Char    Meaning
+       Include a sign for positive and negative numbers.
-       Indicate sign for negative numbers only (default).
space   Include a leading space for positive numbers.

STRING TYPE CODES
Type   Meaning
s      String. This is the default, and may be omitted.

INTEGER TYPE CODES
Type   Meaning
b      Binary format.
c      Character; converts int to unicode char.
d      Decimal integer (base 10).
o      Octal (base 8).
x      Hex (base 16), lower case.
X      Hex (base 16), upper case.
n      Number; same as 'd', but uses current locale.
None   Same as 'd'.

FLOATING POINT TYPE CODES
Type   Meaning
e      Scientific notation.
E      Scientific notation, with upper case 'E'.
f      Fixed point.
F      Fixed point; same as 'f'.
g      General format.
G      General format; same as 'g', with upper case 'E' when necessary.
n      Number; same as 'g', but uses current locale.
%      Percentage. Multiplies by 100 and displays with 'f', followed
       by a percent sign.
None   Same as 'g'.
28

String Formatting - Examples
# Basic string formatting
>>> print '{} {} {}'.format('a', 'b', 'c')
a b c
# Numbered fields refer to the position of the arguments
>>> print '{2} {1} {0}'.format('a', 'b', 'c')
c b a
# Named fields refer to keyword arguments
>>> print '{color} {n} {x}'.format(n=10, x=1.5, color='blue')
blue 10 1.5
# Positional and keyword arguments can be combined
>>> print '{color} {0} {x} {1}'.format(10, 'foo', x=1.5, color='blue')
blue 10 1.5 foo
# Precision and padding
>>> from math import pi
>>> print '{0:10} {1:10d} {c:10.2f}'.format('foo', 5, c=2*pi)
foo                 5       6.28
29

String Formatting - Examples cont.
# Fixed point format (and a named keyword argument).
>>> print '[{x:5.0f}] [{x:5.1f}] [{x:5.2f}]'.format(x=12.3456)
[   12] [ 12.3] [12.35]
# Sign options.
>>> '{:-f}; {:-f}'.format(3.14, -3.14)    # Default
'3.140000; -3.140000'
>>> '{:+f}; {:+f}'.format(3.14, -3.14)    # Use '+'
'+3.140000; -3.140000'
>>> '{: f}; {: f}'.format(3.14, -3.14)    # Use ' '
' 3.140000; -3.140000'

# Alignment (and using a numbered positional argument).
>>> print '[{0:<10s}] [{0:>10s}] [{0:^10s}]'.format('PYTHON')
[PYTHON    ] [    PYTHON] [  PYTHON  ]
# Alignment with fill character.
>>> print '[{0:*<10s}] [{0:*>10s}] [{0:*^10s}]'.format('PYTHON')
[PYTHON****] [****PYTHON] [**PYTHON**]
# Different bases for an integer (hex, decimal, octal, binary).
>>> '{0:X} {0:d} {0:o} {0:b}'.format(254)
'FE 254 376 11111110'

30

Formatting with %
FORMAT OPERATOR %
# the % operator formats values
# to strings using C conventions.
>>> s = "some numbers:"
>>> x = 1.34
>>> y = 2
>>> t = "%s %f, %d" % (s,x,y)
>>> print t
some numbers: 1.340000, 2
>>> y = -2.1
>>> print "%f\n%f" % (x,y)
1.340000
-2.100000
>>> print "% f\n% f" % (x,y)
1.340000
-2.100000
>>> print "%4.2f" % x
1.34

CONVERSION CODES
Conversion   Meaning
d or i       Signed integer decimal
o            Unsigned octal
u            Unsigned decimal
x            Unsigned hexadecimal (lowercase)
X            Unsigned hexadecimal (uppercase)
e            Floating point exponential format (lowercase)
E            Floating point exponential format (uppercase)
F or f       Floating point decimal format
G or g       Floating point format or exponential
c            Single character
r            Converts object using repr()
s            Converts object using str()

CONVERSION FLAGS
Flag       Meaning
0          The conversion will be zero padded for numeric values.
-          The converted value is left adjusted (overrides the "0"
           conversion if both are given).
(a space)  A blank should be left before a positive number (or empty
           string) produced by a signed conversion.
+          A sign character ("+" or "-") will precede the conversion
           (overrides a "space" flag).
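A quick sketch of these flags in use (the values are chosen purely for
illustration):
>>> print "%05d|%-5d|%+d|% d" % (42, 42, 42, 42)
00042|42   |+42| 42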
31

List objects
LIST CREATION WITH BRACKETS
>>> a = [10,11,12,13,14]
>>> print a
[10, 11, 12, 13, 14]
CONCATENATING LIST
# simply use the + operator
>>> [10, 11] + [12, 13]
[10, 11, 12, 13]

range(start, stop, step)
# the range function is helpful
# for creating a sequence
>>> range(5)
[0, 1, 2, 3, 4]
>>> range(2,7)
[2, 3, 4, 5, 6]
>>> range(2,7,2)
[2, 4, 6]

REPEATING ELEMENTS IN LISTS
# the multiply operator
# does the trick
>>> [10, 11] * 3
[10, 11, 10, 11, 10, 11]
32

Indexing
RETRIEVING AN ELEMENT
# list
# indices: 0 1 2 3 4
>>> a = [10,11,12,13,14]
>>> a[0]
10

SETTING AN ELEMENT
>>> a[1] = 21
>>> print a
[10, 21, 12, 13, 14]

NEGATIVE INDICES
# negative indices count
# backward from the end of
# the list
# indices: -5 -4 -3 -2 -1
>>> a = [10,11,12,13,14]
>>> a[-1]
14
>>> a[-2]
13

OUT OF BOUNDS
>>> a[10]
Traceback (innermost last):
  File "<stdin>", line 1, in ?
IndexError: list index out of range

The first element in an array has index=0 as in C. Take note Fortran
programmers!

33

More on list objects
LIST CONTAINING MULTIPLE
TYPES
# list containing integer,
# string, and another list
>>> a = [10,'eleven',[12,13]]
>>> a[1]
'eleven'
>>> a[2]
[12, 13]
# use multiple indices to
# retrieve elements from
# nested lists
>>> a[2][0]
12
Prior to version 2.5, Python was limited to sequences with ~2 billion
elements. Python 2.5 and later can handle up to 2**63 elements.

LENGTH OF A LIST
>>> len(a)
3

DELETING OBJECT FROM LIST
# use the del keyword
>>> del a[2]
>>> a
[10,'eleven']
DOES THE LIST CONTAIN x ?
# use in or not in
>>> a = [10,11,12,13,14]
>>> 13 in a
True
>>> 13 not in a
False
34

Slicing
var[lower:upper:step]
Extracts a portion of a sequence by specifying a lower and upper bound.
The lower-bound element is included, but the upper-bound element is not included.
Mathematically: [lower, upper). The step value specifies the stride between elements.

SLICING LISTS
# indices:  -5 -4 -3 -2 -1
#            0  1  2  3  4
>>> a = [10,11,12,13,14]   # [10,11,12,13,14]
>>> a[1:3]
[11, 12]
# negative indices work also
>>> a[1:-2]
[11, 12]
>>> a[-4:3]
[11, 12]

OMITTING INDICES
# omitted boundaries are
# assumed to be the beginning
# (or end) of the list
# grab first three elements
>>> a[:3]
[10, 11, 12]
# grab last two elements
>>> a[-2:]
[13, 14]
# every other element
>>> a[::2]
[10, 12, 14]
35

A few methods for list objects
some_list.append( x )
    Add the element x to the end of the list some_list.

some_list.count( x )
    Count the number of times x occurs in the list.

some_list.extend( sequence )
    Concatenate sequence onto this list.

some_list.index( x )
    Return the index of the first occurrence of x in the list.

some_list.insert( index, x )
    Insert x before the specified index.

some_list.pop( index )
    Return the element at the specified index. Also, remove it from
    the list.

some_list.remove( x )
    Delete the first occurrence of x from the list.

some_list.reverse( )
    Reverse the order of elements in the list.

some_list.sort( cmp )
    By default, sort the elements in ascending order. If a compare
    function is given, use it to sort the list.

36

List methods in action
>>> a = [10,21,23,11,24]
# add an element to the list
>>> a.append(11)
>>> print a
[10,21,23,11,24,11]
# how many 11s are there?
>>> a.count(11)
2
# extend with another list
>>> a.extend([5,4])
>>> print a
[10,21,23,11,24,11,5,4]
# where does 11 first occur?
>>> a.index(11)
3
# insert 100 at index 2?
>>> a.insert(2, 100)
>>> print a
[10,21,100,23,11,24,11,5,4]

# pop the item at index=3
>>> a.pop(3)
23
# remove the first 11
>>> a.remove(11)
>>> print a
[10,21,100,24,11,5,4]
# sort the list (in-place)
# Note: use sorted(a) to
#   return a new list.
>>> a.sort()
>>> print a
[4,5,10,11,21,24,100]
# reverse the list
>>> a.reverse()
>>> print a
[100,24,21,11,10,5,4]
37

Mutable vs. Immutable
MUTABLE OBJECTS
# Mutable objects, such as
# lists, can be changed
# in place.

# insert new values into list
>>> a = [10,11,12,13,14]
>>> a[1:3] = [5,6]
>>> print a
[10, 5, 6, 13, 14]

IMMUTABLE OBJECTS
# Immutable objects, such as
# integers and strings,
# cannot be changed in place.

# try inserting values into
# a string
>>> s = 'abcde'
>>> s[1:3] = 'xy'
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: object doesn't support slice assignment

# here's how to do it
>>> s = s[:1] + 'xy' + s[3:]
>>> print s
axyde

The cStringIO module treats strings like a file buffer and allows
insertions. It's useful when working with large strings or when speed
is paramount.
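A minimal sketch of that idea (the names here are just for
illustration): build up a large string with cStringIO rather than
repeated concatenation, then pull out the final value once.
>>> from cStringIO import StringIO
>>> buf = StringIO()
>>> for word in ['ab', 'cd', 'ef']:
...     buf.write(word)    # append without copying the whole string
...
>>> buf.getvalue()
'abcdef'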
38

Tuple – Immutable Sequence
TUPLE CREATION
>>> a = (10,11,12,13,14)
>>> print a
(10, 11, 12, 13, 14)

PARENTHESES ARE OPTIONAL
>>> a = 10,11,12,13,14
>>> print a
(10, 11, 12, 13, 14)

LENGTH-1 TUPLE
>>> (10,)
(10,)
>>> (10)
10
(10) is not a tuple, but an integer with parentheses.

TUPLES ARE IMMUTABLE
# create a list
>>> a = range(10,15)
>>> a
[10, 11, 12, 13, 14]
# cast the list to a tuple
>>> b = tuple(a)
>>> print b
(10, 11, 12, 13, 14)
# try inserting a value
>>> b[3] = 23
TypeError: 'tuple' object doesn't support item assignment
39

Dictionaries
Dictionaries store key/value pairs. Indexing a dictionary by a key returns
the value associated with it. The key must be immutable.
DICTIONARY EXAMPLE
# Create an empty dictionary using curly brackets.
>>> record = {}
# Each indexed assignment creates a new key/value pair.
>>> record['first'] = 'Jmes'
>>> record['last'] = 'Maxwell'
>>> record['born'] = 1831
>>> print record
{'first': 'Jmes', 'born': 1831, 'last': 'Maxwell'}
# Create another dictionary with initial entries.
>>> new_record = {'first': 'James', 'middle':'Clerk'}
# Now update the first dictionary with values from the new one.
>>> record.update(new_record)
>>> print record
{'first': 'James', 'middle': 'Clerk', 'last':'Maxwell', 'born':
1831}
40

Accessing and deleting keys and values
ACCESS USING INDEX NOTATION
>>> print record['first']
James

REMOVE AN ENTRY WITH DEL
>>> del record['middle']
>>> record
{'born': 1831, 'first':
'James', 'last': 'Maxwell'}

ACCESS WITH get(key, default)
The get() method returns the value associated with a key; the optional
second argument is the return value if the key is not in the dictionary.
>>> record.get('born',0)
1831
>>> record.get('home', 'TBD')
'TBD'
>>> record['home']
KeyError: ...

REMOVE WITH pop(key, default)
pop() removes the key from the dictionary and returns the value; the
optional second argument is the return value if the key is not in the
dictionary.
>>> record.pop('born', 0)
1831
>>> record
{'first': 'James', 'last': 'Maxwell'}
>>> record.pop('born', 0)
0
41

A few dictionary methods
some_dict.clear( )
    Remove all key/value pairs from the dictionary, some_dict.

some_dict.copy( )
    Create a copy of the dictionary.

x in some_dict
    Test whether the dictionary contains the key x.

some_dict.keys( )
    Return a list of all the keys in the dictionary.

some_dict.values( )
    Return a list of all the values in the dictionary.

some_dict.items( )
    Return a list of all the key/value pairs in the dictionary.
42

Dictionary methods in action
# dict of animals:count pairs
>>> barn = {'cows': 1,
...         'dogs': 5,
...         'cats': 3}

# return key/value tuples
>>> barn.items()
[('cats', 3), ('dogs', 5),
('cows', 1)]

# test for chickens
>>> 'chickens' in barn
False

# How many cats?
>>> barn['cats']
3

# get a list of all keys
>>> barn.keys()
['cats','dogs','cows']
# get a list of all values
>>> barn.values()
[3, 5, 1]

# Change the number of cats.
>>> barn['cats'] = 10
>>> barn['cats']
10
# Add some sheep.
>>> barn['sheep'] = 5
>>> barn['sheep']
5
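The method table on the previous page also listed copy() and clear();
a brief sketch of those two, continuing with the same barn dictionary:
>>> backup = barn.copy()    # shallow copy of the dictionary
>>> barn.clear()            # empty the original in place
>>> barn
{}
>>> backup['dogs']
5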

43

Set objects
DEFINITION
A set is an unordered collection of unique, immutable objects.

CONSTRUCTION
# an empty set
>>> s = set()
# convert a sequence to a set
>>> t = set([1,2,3,1])
# note removal of duplicates
>>> t
set([1, 2, 3])

ADD ELEMENTS
>>> t.add(5)
>>> t
set([1, 2, 3, 5])
>>> t.update([5,6,7])
>>> t
set([1, 2, 3, 5, 6, 7])

REMOVE ELEMENTS
# Remove an element. Raise an
# exception if not in the set.
>>> t.remove(1)
>>> t
set([2, 3, 5, 6, 7])
# Remove and return an
# arbitrary element from the set.
>>> t.pop()
2
# Remove element from the set
# if it exists.
>>> t.discard(3)
>>> t
set([5, 6, 7])
# Otherwise, do nothing.
>>> t.discard(20)
>>> t
set([5, 6, 7])
44

Set Operations
OVERLAPPING SETS
>>> a = set([1,2,3,4])
>>> b = set([3,4,5,6])

UNION
>>> a.union(b)
set([1, 2, 3, 4, 5, 6])
>>> a | b
set([1, 2, 3, 4, 5, 6])

INTERSECTION
>>> a.intersection(b)
set([3, 4])
>>> a & b
set([3, 4])

DIFFERENCE
>>> a.difference(b)
set([1, 2])
>>> a - b
set([1, 2])

SYMMETRIC DIFFERENCE
>>> a.symmetric_difference(b)
set([1, 2, 5, 6])
>>> a ^ b  # xor
set([1, 2, 5, 6])
45

Set Containment
OVERLAPPING SETS
>>> a = set([1,2,3])
>>> b = set([1,2])

ISSUBSET
# Is b fully contained in a?
>>> b.issubset(a)
True
>>> b <= a
True

ISSUPERSET
# Does a fully contain b?
>>> a.issuperset(b)
True
>>> a >= b
True

There is also an immutable type of
set called a frozenset (analogous to
a tuple) which can be used as a
dictionary key or an element of a
set.
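A short sketch of that note (the values are illustrative): a frozenset
can be used as a dictionary key, where a plain (mutable) set cannot.
>>> edge = frozenset([1, 2])
>>> weights = {edge: 0.5}
>>> weights[frozenset([2, 1])]    # element order doesn't matter
0.5
>>> {set([1, 2]): 0.5}
TypeError: unhashable type: 'set'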
46

Assignment of “simple” object
Assignment creates object references.
>>> x = 0

# This causes x and y to point
# to the same value.
>>> y = x

# Re-assigning y to a new value
# decouples the two variables.
>>> y = "foo"
>>> print x
0

(In the diagram: x and y first both refer to the value 0; after the
re-assignment, x still refers to 0 and y refers to "foo".)
47

Assignment of Container object
Assignment creates object references.
>>> x = [0, 1, 2]

# This causes x and y to point
# to the same list.
>>> y = x

# A change to y also changes x.
>>> y[1] = 6
>>> print x
[0, 6, 2]

# Re-assigning y to a new list
# decouples the two variables.
>>> y = [3, 4]

(In the diagram: x and y first both refer to the list [0, 1, 2]; after
y[1] = 6 they both see [0, 6, 2]; after the re-assignment, x still
refers to [0, 6, 2] and y refers to [3, 4].)
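To make the sharing explicit, here is a small additional sketch using
the 'is' operator, which tests object identity:
>>> x = [0, 1, 2]
>>> y = x
>>> y is x        # same object, not a copy
True
>>> y = [3, 4]
>>> y is x
False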
48

If statements
if/elif/else provides conditional execution of code blocks.

IF STATEMENT FORMAT
if <condition>:
    <statements>
elif <condition>:
    <statements>
else:
    <statements>

IF EXAMPLE
# a simple if statement
>>> x = 10
>>> if x > 0:
...     print 'Hey!'
...     print 'x > 0'
... elif x == 0:
...     print 'x is 0'
... else:
...     print 'x is negative'
... < hit return >
Hey!
x > 0
49

Test Values
• zero, None, “”, and empty objects are treated as False.
• All other objects are treated as True.

EMPTY OBJECTS
# empty objects test as false
>>> x = []
>>> if x:
...     print 1
... else:
...     print 0
... < hit return >
0

It often pays to be explicit. If you are testing for an empty list,
then test for:
    if len(x) == 0:
        ...
This is clearer to future readers of your code. It also can avoid bugs
where x==None may be passed in and unexpectedly go down this path.
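A small sketch of that pitfall (the function name is just for
illustration): None and [] both test as False, so an explicit check
distinguishes "missing" from "empty".
>>> def describe(x):
...     if x is None:
...         return 'missing'
...     elif len(x) == 0:
...         return 'empty'
...     else:
...         return 'has data'
...
>>> describe(None), describe([]), describe([1, 2])
('missing', 'empty', 'has data')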

50

While loops
while loops iterate until a condition is met
while <condition>:
    <statements>

WHILE LOOP
# the condition tested is
# whether lst is empty
>>> lst = range(3)
>>> while lst:
...     print lst
...     lst = lst[1:]
... < hit return >
[0, 1, 2]
[1, 2]
[2]

BREAKING OUT OF A LOOP
# breaking from an infinite
# loop
>>> i = 0
>>> while True:
...     if i < 3:
...         print i,
...     else:
...         break
...     i = i + 1
... < hit return >
0 1 2
51

For loops
for loops iterate over a sequence of objects
for <loop_var> in <sequence>:
    <statements>

TYPICAL SCENARIO
>>> for item in range(5):
...     print item,
... < hit return >
0 1 2 3 4
# For a large range, xrange()
# is faster and more memory
# efficient.
>>> for item in xrange(10**6):
...     print item,
... < hit return >
0 1 2 3 4 5 6 7 8 9 10 11 ...

LOOPING OVER A STRING
>>> for item in 'abcde':
...     print item,
... < hit return >
a b c d e

LOOPING OVER A LIST
>>> animals=['dogs','cats',
...          'bears']
>>> accum = ''
>>> for animal in animals:
...     accum += animal + ' '
... < hit return >
>>> print accum
dogs cats bears
52

List Comprehension
LIST TRANSFORM WITH LOOP
# element by element transform of
# a list by applying an
# expression to each element
>>> a = [10,21,23,11,24]
>>> results=[]
>>> for val in a:
...     results.append(val+1)
>>> results
[11, 22, 24, 12, 25]

FILTER-TRANSFORM WITH LOOP
# transform only elements that
# meet a criterion
>>> a = [10,21,23,11,24]
>>> results=[]
>>> for val in a:
...     if val>15:
...         results.append(val+1)
>>> results
[22, 24, 25]

LIST COMPREHENSION
# list comprehensions provide
# a concise syntax for this sort
# of element by element
# transformation
>>> a = [10,21,23,11,24]
>>> [val+1 for val in a]
[11, 22, 24, 12, 25]

LIST COMPREHENSION WITH FILTER
>>> a = [10,21,23,11,24]
>>> [val+1 for val in a if val>15]
[22, 24, 25]

Consider using a list
comprehension whenever you
need to transform one sequence
to another.
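One more illustrative sketch (the data is made up): a comprehension can
transform and filter strings just as easily as numbers.
>>> names = ['ada', 'Grace', 'alan', 'Barbara']
>>> [n.capitalize() for n in names if len(n) > 3]
['Grace', 'Alan', 'Barbara']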
53

Generator Expressions
Generator expressions are like list comprehensions without the brackets:
LIST COMPREHENSION
>>> set([str(i) for i in range(5)])
set(['1', '0', '3', '2', '4'])
# tests on sequences
>>> any([i >= 3 for i in range(5)])
True
>>> all([i >= 3 for i in range(5)])
False
# summation
>>> sum([i**2 for i in range(5)])
30

GENERATOR EXPRESSION
>>> set(str(i) for i in range(5))
set(['1', '0', '3', '2', '4'])
# tests on sequences
>>> any(i >= 3 for i in range(5))
True
>>> all(i >= 3 for i in range(5))
False
# summation
>>> sum(i**2 for i in range(5))
30
54

Multiple assignments
FROM TUPLE TO TUPLE
# creating a tuple without ()
>>> d = 1,2,3
>>> d
(1, 2, 3)
# multiple assignments
>>> a,b,c = 1,2,3
>>> print b
2
# multiple assignments from a
# tuple
>>> a,b,c = d
>>> print b
2

FROM LIST TO TUPLE
# also works for lists
>>> a,b,c = [1,2,3]
>>> print b
2
55

Looping Patterns
MULTIPLE LOOP VARIABLES
# Looping through a sequence of
# tuples allows multiple
# variables to be assigned.
>>> pairs = [(0,'a'),(1,'b'),
...          (2,'c')]
>>> for index, value in pairs:
...     print index, value
0 a
1 b
2 c

ENUMERATE
# enumerate -> index, item.
>>> y = ['a', 'b', 'c']
>>> for index, value in enumerate(y):
...     print index, value
0 a
1 b
2 c

ZIP
# zip 2 or more sequences
# into a list of tuples
>>> x = [0, 1, 2]
>>> y = ['a', 'b', 'c']
>>> zip(x,y)
[(0,'a'), (1,'b'), (2,'c')]
>>> for index, value in zip(x,y):
...     print index, value
0 a
1 b
2 c

REVERSED
>>> z = [(0,'a'),(1,'b'),(2,'c')]
>>> for index, value in reversed(z):
...     print index, value
2 c
1 b
0 a
56

Looping over a dictionary
>>> d = {'a':1, 'b':2, 'c':3}

DEFAULT LOOPING (KEYS)
>>> for key in d:
...     print key, d[key]
a 1
c 3
b 2

LOOPING OVER KEYS (EXPLICIT)
>>> for key in d.keys():
...     print key, d[key]
a 1
c 3
b 2

LOOPING OVER VALUES
>>> for val in d.values():
...     print val
1
3
2

LOOPING OVER ITEMS
>>> for key, val in d.items():
...     print key, val
a 1
c 3
b 2
57

Anatomy of a function
The keyword def indicates the start of a function. Function arguments
are listed, separated by commas. They are passed by assignment. A
colon ( : ) terminates the function signature.

Indentation is used to indicate the contents of the function. It is
not optional, but a part of the syntax.

def add(arg0, arg1):
    """Add two numbers"""
    a = arg0 + arg1
    return a

An optional docstring documents the function in a standard way for
tools like ipython.

An optional return statement specifies the value returned from the
function. If return is omitted, the function returns the special
value None.
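As a small sketch of what the docstring buys you interactively, the
same text is available from the function's __doc__ attribute (and from
help() and IPython's ?):
>>> def add(arg0, arg1):
...     """Add two numbers"""
...     return arg0 + arg1
...
>>> print add.__doc__
Add two numbers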

59

Our new function in action
# We'll create our function
# on the fly in the
# interpreter.
>>> def add(x,y):
...     a = x + y
...     return a

# Test it out with numbers.
>>> val_1 = 2
>>> val_2 = 3
>>> add(val_1,val_2)
5

# How about strings?
>>> val_1 = 'foo'
>>> val_2 = 'bar'
>>> add(val_1,val_2)
'foobar'

# Functions can be assigned
# to variables.
>>> func = add
>>> func(val_1, val_2)
'foobar'

# How about numbers and strings?
>>> add('abc',1)
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in add
TypeError: cannot concatenate 'str' and 'int' objects
60

Function Calling Conventions
POSITIONAL ARGUMENTS
# The "standard" calling
# convention we know and love.
>>> def add(x, y):
...     return x + y

>>> add(2, 3)
5

KEYWORD ARGUMENTS
# specify argument names
>>> add(x=2, y=3)
5
# or even a mixture if you are
# careful with order
>>> add(2, y=3)
5

DEFAULT VALUES
# Arguments can be
# assigned default values.
>>> def quad(x,a=1,b=1,c=0):
...     return a*x**2 + b*x + c

# Use defaults for a, b and c.
>>> quad(2.0)
6.0
# Set b=3. Defaults for a & c.
>>> quad(2.0, b=3)
10.0
# Keyword arguments can be
# passed in out of order.
>>> quad(2.0, c=1, a=3, b=2)
17.0
61

Function Calling Conventions
VARIABLE NUMBER OF ARGS
# Pass in any number of
# arguments. Extra arguments
# are put in the tuple args.
>>> def foo(x, y, *args):
...     print x, y, args

>>> foo(2, 3, 'hello', 4)
2 3 ('hello', 4)

VARIABLE KEYWORD ARGS
# Extra keyword arguments
# are put into the dict kw.
>>> def bar(x, y=1, **kw):
...     print x, y, kw

>>> bar(1, y=2, a=1, b=2)
1 2 {'a': 1, 'b': 2}
62

Function Calling Conventions
THE 'ANYTHING' SIGNATURE
# This signature takes any
# number of positional and
# keyword arguments.
>>> def foo(*args, **kw):
...     print args, kw

>>> foo(2, 3, x='hello', y=4)
(2, 3) {'x': 'hello', 'y': 4}

MULTIPLE FUNCTION RETURNS
# To return multiple values
# from a function, we return
# a tuple containing those
# values. This is a common
# use of multiple (tuple)
# assignment.
>>> def functions(x):
...     y1 = x**2 + x
...     y2 = x**3 + x**2 + 2*x
...     return y1, y2
>>> a, b = functions(c)
63

Expanding Function Arguments
POSITIONAL ARGUMENT EXPANSION
>>> def add(x, y):
...     return x + y

# '*' in a function call
# converts a sequence into the
# arguments to a function.
>>> vars = [1,2]
>>> add(*vars)
3

KEYWORD ARGUMENT EXPANSION
>>> def bar(x, y=1, **kw):
...     print x, y, kw

# '**' expands a
# dictionary into keyword
# arguments for a function.
>>> vars = {'y':3, 'z':4}
>>> bar(1, **vars)
1 3 {'z': 4}

64

Modules
EX1.PY
# ex1.py
PI = 3.1416

def sum(lst):
    tot = lst[0]
    for value in lst[1:]:
        tot = tot + value
    return tot

w = [0,1,2,3]
print sum(w), PI

FROM SHELL
[ej@bull ej]$ python ex1.py
6 3.1416

FROM INTERPRETER
# load and execute the module
>>> import ex1
6 3.1416
# get/set a module variable
>>> print ex1.PI
3.1416
>>> ex1.PI = 3.14159
>>> print ex1.PI
3.14159
# call a module function
>>> t = [2,3,4]
>>> ex1.sum(t)
9
65

Modules cont.
INTERPRETER
# load and execute the module
>>> import ex1
6 3.1416
< edit file >
# import module again
>>> import ex1
# nothing happens!!!
# Use reload to force a
# previously imported library
# to be reloaded.
>>> reload(ex1)
10 3.14159

EDITED EX1.PY
# ex1.py version 2
PI = 3.14159

def sum(lst):
    tot = 0
    for value in lst:
        tot = tot + value
    return tot

w = [0,1,2,3,4]
print sum(w), PI

66

Modules cont. 2
A Python file can be used as a script, or as a module, or both.
EX2.PY
# An example module that can
# be run as a script.
PI = 3.1416

def sum(lst):
    """ Sum the values in a
    list.
    """
    tot = 0
    for value in lst:
        tot = tot + value
    return tot

def add(x,y):
    " Add two values."
    a = x + y
    return a

def test():
    w = [0,1,2,3]
    assert( sum(w) == 6)
    print 'test passed'

# This code runs only if this
# module is the main program.
if __name__ == '__main__':
    test()
67

import Variations
BASIC IMPORTS

IMPORTING SPECIFIC SYMBOLS

# The most basic import
>>> import ex2
>>> ex2.PI
3.1416

# Select specific names to
# bring into the local
# namespace.
>>> from ex2 import add, PI
>>> PI
3.1416
>>> add(2,3)
5

ALIASING A NAME

IMPORTING *EVERYTHING*

# Use an ‘alias’
>>> import ex2 as e2
>>> e2.PI
3.1416

# Pull *everything* into the
# local namespace.
>>> from ex2 import *
>>> PI
3.1416
>>> add(3,4.5)
7.5
68

Modules cont. 3
PACKAGES
Often a library will contain several modules. These may be organized
in a hierarchical directory structure, and imported using "dotted
module names". The first and the intermediate names (if any) are
called "packages".

Example:
>>> from foo.bar import func
>>> from foo.baz import zap
bar.py and baz.py are modules in the package foo.

FILE STRUCTURE
foo/
    __init__.py
    bar.py (defines func)
    baz.py (defines zap)

• The directory foo must be in the python search path.
• The file __init__.py indicates that foo is a package. It may be an
  empty file.
• In the simplest case:
  package = directory
  module = python file
69

Standard Modules
Python has a large library of standard modules ("batteries included"):
re - regular expressions

cmd – support for command interpreters

copy – shallow and deep copy operations

pdb – Python interactive debugger

datetime - time and date objects

profile, cProfile, timeit – Python profilers

math, cmath - real and complex math

collections, heapq, bisect – standard CS
algorithms and data structures

decimal, fractions - arbitrary precision
decimal and rational number objects

mmap – memory-mapped files

os, os.path, shutil - filesystem operations

threading, Queue – threading support

sqlite3 - internal SQLite database

multiprocessing – process based ‘threading’

gzip, bz2, zipfile, tarfile – compression and
archiving formats

subprocess – executing external commands

csv, netrc – file format handling

pickle, cPickle – object serialization
struct – interpret bytes as packed binary data

xml – various modules for handling XML
htmllib – an HTML parser
httplib, ftplib, poplib, socket, etc. –
modules for standard internet protocols

and many more… To see the content of one:
>>> dir(module_name)
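For example (a small sketch using the standard math module):
>>> import math
>>> [name for name in dir(math) if name.startswith('s')]
['sin', 'sinh', 'sqrt']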

70

Setting up PYTHONPATH
PYTHONPATH is an environment variable (or set of registry entries on
Windows) that lists the directories Python searches for modules.
WINDOWS
• Right-click on My Computer
• Click Properties
• Click Advanced Tab
• Click Environment Variables Button at
the bottom of the Advanced Tab
• Click New to create
PYTHONPATH or
• Click Edit to change existing
PYTHONPATH
• Changes take effect in the next
Command Prompt or IPython session.

UNIX: .cshrc
(NOTE: the following should all be on one line)
setenv PYTHONPATH $PYTHONPATH:$HOME/your_modules

UNIX: .bashrc
PYTHONPATH=$PYTHONPATH:$HOME/your_modules
export PYTHONPATH
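A quick sketch for verifying the setting from Python itself (the
directory name and output shown here are only an example):
>>> import sys
>>> [p for p in sys.path if p.endswith('your_modules')]
['/home/you/your_modules']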

71

Reading files
FILE INPUT EXAMPLE
>>> results = []
>>> f = open('c:\\rcs.txt','r')
# Read all the lines.
>>> lines = f.readlines()
>>> f.close()
# Discard the header.
>>> lines = lines[1:]
>>> for line in lines:
...     # split line into fields
...     fields = line.split()
...     # convert text to numbers
...     freq = float(fields[0])
...     vv = float(fields[1])
...     hh = float(fields[2])
...     # group & append to results
...     all = [freq,vv,hh]
...     results.append(all)
... < hit return >

PRINTING THE RESULTS
>>> for i in results: print i
[100.0, -20.30…, -31.20…]
[200.0, -22.70…, -33.60…]

EXAMPLE FILE: RCS.TXT
#freq (MHz)  vv (dB)  hh (dB)
100          -20.3    -31.2
200          -22.7    -33.6

See demo/reading_files directory for code.
72

More compact version
ITERATING ON A FILE AND LIST COMPREHENSIONS
>>> results = []
>>> f = open('c:\\rcs.txt','r')
>>> f.readline()
'#freq (MHz) vv (dB) hh (dB)\n'
>>> for line in f:
...     all = [float(val) for val in line.split()]
...     results.append(all)
... < hit return >
>>> for i in results:
...     print i
... < hit return >

EXAMPLE FILE: RCS.TXT
#freq (MHz)   vv (dB)   hh (dB)
100           -20.3     -31.2
200           -22.7     -33.6
73

Writing files
FILE OUTPUT

REDIRECTED PRINT

>>> # Mode 'w': create new file:
>>> f = open('a.txt', 'w')
>>> f.write('Hello, world!')
>>> f.close()
>>> open('a.txt').read()
'Hello, world!'
>>> # Use the 'with' statement:
>>> with open('a.txt', 'w') as f:
...     f.write('Wow!')
...
>>> open('a.txt').read()
'Wow!'
>>> # Mode 'a': append to file:
>>> with open('a.txt', 'a') as f:
...     f.write(' Boo.')
...
>>> open('a.txt').read()
'Wow! Boo.'

>>> # Redirect output of a
>>> # print statement:
>>> f = open('a.txt', 'w')
>>> print >> f, "Here I am."
>>> f.close()
>>> open('a.txt').read()
'Here I am.\n'
WRITE AND READ
>>> f = open('a.txt', 'w+')
>>> print >> f, 12, 34, 56
>>> f.seek(3)
>>> f.read(2)
'34'
>>> f.close()
74

Particle Class Example
SIMPLE PARTICLE CLASS
>>> class Particle(object):
...     # Initializer method
...     def __init__(self, m, v):
...         self.mass = m        # Assign attribute values of new object
...         self.velocity = v
...     # Method for calculating object momentum
...     def momentum(self):
...         return self.mass * self.velocity
...     # A "magic" method defines object's string representation.
...     # Evaluating the repr recovers the Particle object.
...     def __repr__(self):
...         return "Particle({0}, {1})".format(
...             repr(self.mass), repr(self.velocity))

EXAMPLE
>>> a = Particle(3.2, 4.1)
>>> print a.momentum()
13.12
>>> a
Particle(3.2, 4.1)

75

Overloading Addition
SIMPLE PARTICLE CLASS
Classes can override many behaviors using special method names —
including numeric behavior.
...     def __add__(self, other):
...         if not isinstance(other, Particle):
...             return NotImplemented
...         mnew = self.mass + other.mass
...         vnew = (self.momentum() + other.momentum())/mnew
...         return Particle(mnew, vnew)

EXAMPLE (cont.)
>>> b = Particle(8.6, 10.2)
>>> b, a
(Particle(8.6, 10.2), Particle(3.2, 4.1))
>>> c = a + b
>>> c
Particle(11.8, 8.545762711864406)
>>> print c.momentum()
100.84

76

Sorting
IN-PLACE VS NEW LIST
# sorting
>>> x = ['s','o','r','t']
# new list
>>> sorted(x)
['o', 'r', 's', 't']
# in-place
>>> x.sort()
>>> x
['o', 'r', 's', 't']
# reversing
# new list (note explicit 'list')
>>> x = ['b', 'a', 'c', 'k']
>>> list(reversed(x))
['k', 'c', 'a', 'b']
# in-place
>>> x.reverse()
>>> x
['k', 'c', 'a', 'b']

CUSTOM SORTS
# define a key function to
# transform values before
# comparing in a sort
>>> def ignore_case(x):
...     return x.lower()

>>> x = ['S','o','r','T']
>>> sorted(x)
['S', 'T', 'o', 'r']
>>> sorted(x, key=ignore_case)
['o', 'r', 'S', 'T']

77

Sorting
SORTING CLASS INSTANCES
# key functions for sorting particles by various properties
>>> def by_mass(x):
...     return x.mass
>>> def by_momentum(x):
...     return x.momentum()
>>> def by_kinetic_energy(x):
...     return 0.5*x.mass*x.velocity**2

See demo/particle directory for sample code.
# sorting particles in a list by their various properties
>>> from particle import Particle
>>> x = [Particle(1.2,3.4), Particle(2.1,2.3), Particle(4.6,.7)]
>>> sorted(x, key=by_mass)
[(m:1.2, v:3.4), (m:2.1, v:2.3), (m:4.6, v:0.7)]
>>> sorted(x, key=by_momentum)
[(m:4.6, v:0.7), (m:1.2, v:3.4), (m:2.1, v:2.3)]
>>> sorted(x, key=by_kinetic_energy)
[(m:4.6, v:0.7), (m:2.1, v:2.3), (m:1.2, v:3.4)]

78

Excellent source of help
http://www.python.org/doc

79

Error and Exception
Handling

80

Basic Exception Handling
ERROR ON LOG OF ZERO
>>> import math
>>> math.log10(10.0)
1.0
>>> math.log10(0.0)
Traceback (innermost last):
ValueError: math domain error
CATCHING ERROR AND CONTINUING
>>> a = 0.0
>>> try:
...     r = math.log10(a)
... except ValueError:
...     print 'Warning: overflow occurred. Value set to -100.0'
...     # set value to -100.0 and continue
...     r = -100.0
Warning: overflow occurred. Value set to -100.0
>>> print r
-100.0
81

Exceptions
ACCESS TO THE EXCEPTION OBJECT
try:
    a = value_bag[key]
    r = math.log10(a)
except KeyError as error:      # error is the exception object
    print "Variable '%s' not found" % error.args
MULTIPLE EXCEPTION CLAUSES
try:
    a = value_bag[key]
    r = math.log10(a)
except KeyError as error:
    print "Variable '%s' not found" % error.args
except ValueError:
    print 'Warning: overflow occurred. Value set to -100.0'
    # set value to -100.0 and continue
    r = -100.0
82

Exceptions
HANDLING MULTIPLE EXCEPTIONS IN ONE CODE BLOCK
try:
    a = value_bag[key]
    r = math.log10(a)
except (KeyError, ValueError) as error:
    print "an error occurred", error
CATCH ALL EXCEPTIONS
# If the Exception exception type is
# specified, almost all exceptions
# are caught. Note: Don't do this in
# libraries because it can “mask”
# unexpected exceptions, which
# should be passed to calling code.
try:
    employee = directory_lookup(name)
except Exception as e:
    print "An error occurred: %s" % e

You can also leave off the
exception type completely and
catch all exceptions, including
SystemExit and
KeyboardInterrupt. This can
make it hard to stop execution
of a program, so this is almost
never a good idea.
83

Exceptions – Finally Clause
TRY, FINALLY
# The finally clause always executes. Use it for code that
# needs to “clean up” a resource whether the code executed
# successfully or not.
try:
    # We don't want others to use the radio_station object
    # while it is on the air.
    radio_station.on_air = True
    radio_station.broadcast("Pinball Wizard")
except KeyError as error:
    print "Could not find song '%s'" % error.args
finally:
    # If an exception occurs, or the song finishes,
    # the station is taken off air.
    radio_station.on_air = False

84

Exceptions – Else Clause
TRY, ELSE
# The else clause only executes if an exception *does not*
# occur.
# An example from Python’s standard tutorial.
#
# Print out the line count for all the file names passed in
# on the command line.
for arg in sys.argv[1:]:
    try:
        f = open(arg, 'r')
    except IOError:
        print 'cannot open', arg
    else:
        print arg, 'has', len(f.readlines()), 'lines'
        f.close()

85

Raising Exceptions
RAISE
# Use the raise statement to raise an exception from your code.
def percent_range_check(percent):
    if not 0.0 <= percent <= 100.0:
        msg = "Percent value should be between 0 and 100"
        # This is the standard, Python 3.0 compatible
        # way to raise exceptions.
        raise ValueError(msg)
        # The following also works, but is deprecated.
        # It will not work in Python 3.0.
        # raise ValueError, msg

>>> percent_range_check(101.0)
ValueError: Percent value should be between 0 and 100

86

Error Messages
RAISE
# Use the raise statement to raise an exception from your code.
def percent_range_check(percent):
    if not 0.0 <= percent <= 100.0:
        msg = "Percent value should be between 0 and 100"
        raise ValueError(msg)

>>> percent_range_check(101.0)
ValueError: Percent value should be between 0 and 100

INFORMATIVE ERROR MESSAGES
# If possible, print out the offending value in error messages.
def percent_range_check2(percent):
    if not 0.0 <= percent <= 100.0:
        msg = "Percent (%3.2f) is not between 0 and 100" % percent
        raise ValueError(msg)

>>> percent_range_check2(101.0)
ValueError: Percent (101.00) is not between 0 and 100

87

Warnings
WARNINGS
# Warnings alert users without halting execution.
import warnings
def percent_range_warning(percent):
    if not 0.0 <= percent <= 100.0:
        msg = "Percent(%3.2f) is not between 0 and 100" % percent
        warnings.warn(msg, RuntimeWarning)
# Warnings are different than exceptions because they don’t halt execution.
>>> percent_range_warning(101.0)
RuntimeWarning: Percent(101.00) is not between 0 and 100
# You can choose to ignore certain types of warnings globally.
# allowable actions: error, ignore, always, default, once, module
>>> warnings.filterwarnings(action="ignore", category=RuntimeWarning)
>>> percent_range_warning(101.0)


88

Exception and Warnings Example
import warnings
from numpy import allclose

def weighted_average(values, weights):
    """
    Return the average of all the values weighted by the weights array.
    Both values and weights are numpy arrays and must be the same length.
    The sum of the weights must be 1.0.
    """
    # ensure weights sum to (nearly) 1.0
    weights_total = sum(weights)
    if not allclose(weights_total, 1.0, atol=1e-8):
        msg = "The sum of the weights should usually be 1.0. " \
              "Instead, it was '%f'" % weights_total
        warnings.warn(msg)
    # Provide useful error message for unequal array lengths.
    if len(weights) != len(values):
        msg = "The values (len=%d) and weights (len=%d) arrays " \
              "must have the same lengths." % (len(values), len(weights))
        raise ValueError(msg)
    # weighted average calculation
    avg = sum(values * weights) / weights_total
    return avg

89

Defining New Exceptions
EXCEPTION CLASSES
class LengthMismatchError(ValueError):
    """ Sequence of wrong length was used """
    pass

def check_for_length_mismatch(a, b):
    """ Compare the lengths of sequences a and b. If they are different,
        raise a LengthMismatchError.
    """
    if len(a) != len(b):
        msg = "The two sequences do not have the same length " \
              "(%d!=%d)." % (len(a), len(b))
        raise LengthMismatchError(msg)

>>> check_for_length_mismatch([1,2],[1,2,3])
LengthMismatchError: The two sequences do not have the same length (2!=3).

CATCHING BASE EXCEPTIONS
# Catching either LengthMismatchError, or ValueError will catch our exception.
>>> try:
...     check_for_length_mismatch([1,2],[1,2,3])
... except ValueError:
...     print "exception caught"
exception caught
90

Robust File IO Error Handling
TYPICAL FILE IO TRY BLOCKS
try:
    file = open(file_name, "rb")
    try:
        # Move into the file header and read the "name" field.
        file.seek(NAME_OFFSET)
        name = file.read(12)
        # Other file manipulation...
    finally:
        # Ensure the file is closed, even if an exception occurs.
        file.close()
except IOError as err:
    # Handle file IO Errors that occur during opening the file or
    # reading data from the file here. Use err.errno to determine
    # the type of IOError if necessary.
    # Use the logging system to report unhandled exceptions.
    logging.exception("unexpected IOError")
91

Using ‘with’ for File IO Error Handling
TYPICAL FILE IO TRY BLOCKS
# Enable the "with" statement feature within this module.
# Not necessary for Python >= 2.6.
from __future__ import with_statement
try:
    with open("myfile.txt", "rb") as file:
        # Move into the file header and read the "name" field.
        file.seek(NAME_OFFSET)
        name = file.read(12)
        # Other file manipulation...
except IOError as err:
    # Handle file IO Errors that occur during opening the file or
    # reading data from the file here. Use err.errno to determine
    # the type of IOError if necessary.
    # Use the logging system to report unhandled exceptions.
    logging.exception("unexpected IOError")
92

Object Oriented
Programming
In Python

93

Modeling a Power Plant
ATTRIBUTES
# Data that describes the
# modeled "object"
name: Comanche Peak
maximum_output: 2.3 Gigawatts
current_output: ? Gigawatts
status: normal
night_watchman: Homer Simpson (!)
...

BEHAVIORS (METHODS)
# tasks or operations that the
# model does.
start_up()
shut_down()
emergency_shutdown()
...
94

Creating Objects
Object oriented programming unifies (encapsulates)
attributes and behavior within a single
definition, called a Class.
Instances, or Objects, of a class are specific
manifestations of that class.
INSTANTIATING CLASS INSTANCES (OBJECTS)
# plant_1 is an instance of the PowerPlant class.
>>> plant_1 = PowerPlant(name='Comanche Peak')
# Create a 2nd instance from the same class
>>> plant_2 = PowerPlant(name='Susquehanna')

95

Working with Objects
# Our befuddled friend.
>>> homer = Person(first='Homer', last='Simpson')

# Create the power plant he is in charge of...
>>> plant = PowerPlant(name='Comanche Peak', maximum_output=2.3,
night_watchman=homer)
# Start the plant using the 'start_up' method.
>>> plant.start_up()
# Homer does his thing.
>>> homer.eat('donut')
>>> homer.take_nap()
# Check the status (an attribute) of
# the plant. No Bueno.
>>> if plant.status != 'normal':
...     print 'status:', plant.status
...     homer.speak('Doh!')
...     plant.emergency_shutdown()
status: meltdown
Homer says 'Doh!'

96

Class Definition
class PowerPlant(object):
    # Constructor method
    def __init__(self,          # self is ALWAYS the first argument.
                 name='', maximum_output=0.0, night_watchman=None):
        # assign passed-in arguments to new object
        self.name = name
        self.maximum_output = maximum_output
        self.night_watchman = night_watchman
        # Initialize other attributes.
        self.current_output = 0.0
        self.status = 'normal'
    # class methods
    def start_up(self):
        self.start_reactor_cooling_pump()
        self.reduce_boric_acid_concentration()
        self.remove_control_rods()
    def shut_down(self):
        ...


97

Object Creation Process
CREATING A CLASS OBJECT
# What does Python do when it sees this line of code?
>>> plant = PowerPlant(name='Comanche Peak', maximum_output=2.3,
night_watchman=homer)

UNDER THE COVERS…
# The first step is to call the PowerPlant class' "magic"
# __new__ method to create the object. (This is a secret…)
new_object = PowerPlant.__new__(PowerPlant,
                                name='Comanche Peak',
                                maximum_output=2.3,
                                night_watchman=homer)
# The second step is to call the magic "constructor" method, __init__.
# This is the method you will need to write...
PowerPlant.__init__(new_object,
                    name='Comanche Peak',
                    maximum_output=2.3,
                    night_watchman=homer)
# Python hands this newly created object back to you.
plant = new_object
98

Particle Class Example
SIMPLE PARTICLE CLASS
>>> class Particle(object):
...     # Initializer method
...     def __init__(self, m, v):
...         self.mass = m        # Assign attribute values of new object
...         self.velocity = v
...     # Method for calculating object momentum
...     def momentum(self):
...         return self.mass * self.velocity
...     # A "magic" method defines object's string representation.
...     # Evaluating the repr recovers the Particle object.
...     def __repr__(self):
...         return "Particle({0}, {1})".format(
...             repr(self.mass), repr(self.velocity))

EXAMPLE
>>> a = Particle(3.2, 4.1)
>>> print a.momentum()
13.12
>>> a
Particle(3.2, 4.1)

99

Overloading Addition
SIMPLE PARTICLE CLASS
Classes can override many behaviors using special method names —
including numeric behavior.
...     def __add__(self, other):
...         if not isinstance(other, Particle):
...             return NotImplemented
...         mnew = self.mass + other.mass
...         vnew = (self.momentum() + other.momentum())/mnew
...         return Particle(mnew, vnew)

EXAMPLE (cont.)
>>> b = Particle(8.6, 10.2)
>>> b, a
(Particle(8.6, 10.2), Particle(3.2, 4.1))
>>> c = a + b
>>> c
Particle(11.8, 8.545762711864406)
>>> print c.momentum()
100.84

100

Inheritance
PowerPlant (base class)
    NuclearPowerPlant (subclass)
    HydroPowerPlant (subclass)

101

Base Class
class PowerPlant(object):
    # Constructor method
    def __init__(self,          # "self" is ALWAYS the first argument.
                 name='', maximum_output=0.0, night_watchman=None):
        # assign passed-in arguments to new object
        self.name = name
        self.maximum_output = maximum_output
        self.night_watchman = night_watchman
        # Initialize other attributes.
        self.current_output = 0.0
        self.status = 'normal'
    # Default implementation for methods
    def start_up(self):
        self.current_output = self.maximum_output
    def shut_down(self):
        self.current_output = 0.0
    ...
102

Sub-Classing
# Derive 'specialized' classes from the PowerPlant base class.
# They will 'inherit' the methods of the base class.
class NuclearPowerPlant(PowerPlant):
    # over-ride methods that need custom behavior.
    def start_up(self):
        self.start_reactor_cooling_pump()
        self.reduce_boric_acid_concentration()
        self.remove_control_rods()
    def shut_down(self):
        ...

103

Using Super
# Use 'super' to call the "super-class" (parent class) methods.
class HydroPowerPlant(PowerPlant):
    def __init__(self, name='', river_name='',
                 maximum_output=0.0, night_watchman=None):
        # Use 'super' to call an over-ridden method from the
        # base class.
        super(HydroPowerPlant, self).__init__(name=name,
                                              maximum_output=maximum_output,
                                              night_watchman=night_watchman)
        self.river_name = river_name
    # over-ride methods that need custom behavior.
    def start_up(self):
        self.open_sluice_gate()
        self.unlock_turbine_shaft()
        self.remove_control_rods()

104

Interchangeable Classes
# The same code will work without modification for a
# NuclearPowerPlant or a HydroPowerPlant.
>>> homer = Person(first='Homer', last='Simpson')
# Create the power plant he is in charge of...
>>> plant = HydroPowerPlant(name='Comanche Peak', maximum_output=2.3,
night_watchman=homer)

# Start the plant using the 'start_up' method.
>>> plant.start_up()
# Homer does his thing.
>>> homer.eat('donut')
>>> homer.take_nap()
# Check the status (an attribute) of the plant. No Bueno.
>>> if plant.status != 'normal':
...     print 'status:', plant.status
...     homer.speak('Doh!')
...     plant.emergency_shutdown()
status: Fish stuck in impeller.
Homer says 'Doh!'

105

Advanced Python Topics

106

Properties

107

Properties: an example
# my_number.py
class MyNumberA(object):
    def __init__(self, number):
        self.number = number
    def set_number(self, value):
        if isinstance(value, basestring):
            self.number = eval(value)
        else:
            self.number = value

The set_number method allows the value to be an integer or a
hex or binary string (e.g. '0xFF' or '0b1101').

>>> a = MyNumberA(number=13)
>>> a.number
13
>>> a.set_number('0xFF')
>>> a.number
255
>>> a.number = 200
>>> a.number
200
>>> a.number = '0xFFFF'
>>> a.number        # !!!
'0xFFFF'

108

Properties: definition
Wouldn’t this be nice?
>>> a.number = '0xFFFF'
>>> a.number
65535
>>> a.number = '0b1101'
>>> a.number
13

The number attribute magically converts a hex string or a binary string to the integer value.
We can do this by creating a property. A property allows us to implement something that
acts like an attribute, but attempting to get or set the value of the attribute actually results
in a function call. property() is a built-in function in Python.
property(fget=None, fset=None, fdel=None, doc=None)

109

Properties: application 1
# my_numberB.py
class MyNumberB(object):
    def __init__(self, number):
        self._number = number
    def _get_number(self):
        return self._number
    def _set_number(self, value):
        if isinstance(value, basestring):
            number = eval(value)
        else:
            number = value
        self._number = number
    number = property(_get_number, _set_number)

>>> b = MyNumberB(number=100)
>>> b.number
100
>>> b.number = '0xFF'
>>> b.number
255
>>> b.number = '0b1101'
>>> b.number
13

110

Properties: application 2
For example, the energy and momentum methods of the Particle class can be
implemented as read-only properties:
# my_particle_class.py
class Particle(object):
    ...
    def _get_momentum(self):
        return self.mass * self.velocity
    def _get_energy(self):
        return 0.5 * self.mass * self.velocity**2
    def _not_allowed(self, value):
        raise RuntimeError("Can not set energy or momentum directly.")

    energy = property(_get_energy, _not_allowed)
    momentum = property(_get_momentum, _not_allowed)

111

Properties
Then, for example:
>>> p = Particle(mass=3.0, velocity=4.0)
>>> p.momentum
12.0
>>> p.energy = 100
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "particle_with_properties.py", line 23, in _not_allowed
    raise RuntimeError("can not set energy or momentum directly.")
RuntimeError: can not set energy or momentum directly.

112

General Scoping Rules
Variables are looked up from
namespaces in the following order:
• Local Function Scope
• Enclosing Function Scope
• Global Scope
• Builtin Scope
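A minimal sketch that touches all four scopes at once (the function and variable
names here are made up for illustration):
# len is found in the builtin scope; g in the global scope;
# e in the enclosing function scope; x in the local scope.
g = 10
def outer_func():
    e = 20
    def inner_func(seq):
        x = 1
        return x + e + g + len(seq)
    return inner_func([1, 2, 3])

>>> outer_func()
34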

113

Scoping Rules
LOCAL SCOPE
# a, b, c and d are all local
# variables in the function.
def foo(a,b):
    c = 1
    d = a + b + c

GLOBAL SCOPE
# If a requested variable is not
# found in the local scope, the
# "global" or module level scope
# is searched.

# global module level variable
c = 1
def foo(a,b):
    d = a + b + c

MODIFYING GLOBALS
# Modifying globals from within
# a local scope requires the
# global keyword.
c = 1
def foo():
    global c
    c = 2
>>> foo()
>>> print c
2
114

Scoping Rules
BUILTIN SCOPE
# If not found in the local or
# global scope, the "builtin"
# scope is searched. 'len' is
# a builtin function.
def list_length(a):
    return len(a)

>>> a = [1,2,3]
>>> print list_length(a)
3
# The "builtin" functions are
# found in the special
# __builtin__ module
>>> import __builtin__
>>> __builtin__.len
<built-in function len>
115

Class Scoping Rules
CLASS SCOPING
# global variable
var = 0

class MyClass(object):
    # class variable
    var = 1
    def access_class_c(self):
        print 'class var:', self.var
    def access_global_c(self):
        print 'global var:', var
    def write_class_c(self):
        # Modify the class variable.
        MyClass.var = 2
        print 'class var:', self.var
    def write_instance_c(self):
        # Create an instance variable.
        self.var = 3
        print 'instance var:', self.var

EXAMPLES
>>> obj = MyClass()
>>> obj.access_class_c()
class var: 1
>>> obj.access_global_c()
global var: 0
>>> obj.write_class_c()
class var: 2
>>> obj.write_instance_c()
instance var: 3

See demo/variable_scoping/class_scoping.py.
117

Lexical Scoping Rules
NESTED SCOPE
# Nested functions have access
# to the enclosing function's
# variables.
def outer():
    a = 1
    def inner():
        print "a =", a
    inner()

>>> outer()
a = 1

FUNCTION CLOSURE
# The inner() function has access
# to the outer variables that it
# uses for its entire lifetime.
def outer():
    a = 1
    def inner():
        return a
    return inner   # Note: returns the function object!

>>> func = outer()
>>> print 'a (1):', func()
a (1): 1

See demo/variable_scoping/lexical_scoping.py.
118

Partial Function Application (or Curries)
PARTIAL
# functools.partial "freezes"
# one or more arguments of a
# function to create a simplified
# signature and/or modify default
# arguments.

# By default, int uses base 10
# conversion from string to int.
>>> int('10010')
10010
# Create function to convert
# base 2 strings to ints.
>>> from functools import partial
>>> basetwo = partial(int, base=2)
>>> basetwo('10010')
18

119

Decorators

120

Func as objects and arguments
In Python, functions are first-class objects.
Function Objects
>>> def foo(x):
...     print x
# foo is of type 'function'.
>>> type(foo)
<type 'function'>
# Functions have a set of
# attributes, like other objects.
>>> dir(foo)
['__call__', ..., 'func_closure',
 'func_code', 'func_defaults',
 'func_dict', 'func_doc',
 'func_globals', 'func_name']
# __call__ is the most important
>>> foo(42)
42

Passing Functions
# Since functions are objects,
# like everything else in Python,
# they can be passed as arguments
# to other functions!
>>> def foo(x):
...     print x
...
>>> def bar(f, x):
...     x += 1
...     f(x)
...
>>> bar(foo, 42)
43

Note: Python has a ‘lambda’
keyword for defining anonymous
functions inline.
121

Decorators
Decorators are functions which take a function as an argument and
(usually) return another function.
Example
# A basic decorator
>>> def dec(f):
...     print "I am decorating function ", id(f)
...     return f
>>> declen = dec(len)
I am decorating function 2338768
>>> declen([10, 20, 30])
3

122

Decorators
Example
# Makes a new function that will loudly call the original.
# Note the definition of new_func inside the definition of
# loud.
>>> def loud(f):
...     def new_func(*args, **kw):
...         print "Calling with", args, kw
...         rtn = f(*args, **kw)
...         print "Return value is", rtn
...         return rtn
...     return new_func
>>> loudlen = loud(len)
# Create a modified len().
>>> n = loudlen([10, 20, 30]) # Call it.
Calling with ([10, 20, 30],) {}
Return value is 3
>>> n
3
123

Decorators
Python uses the ‘@’ symbol to replace a function with its decorated version.
@ Notation
# Given a decorator 'dec', the phrasing...
def foo(x):
    print x
foo = dec(foo)
# can be replaced by...
@dec
def foo(x):
    print x
# If the decorator returns a function,
# decorations may be chained.
@dec1
@dec2
def foo(x):
    print x

124

Decorator Examples
# A decorator that adds 1
def plus_one(f):
    def new_func(x):
        return f(x) + 1
    return new_func

# A decorator that multiplies by 2
def times_two(f):
    def new_func(x):
        return f(x) * 2
    return new_func

# A decorated function
@plus_one
@times_two
def foo(x):
    return int(x)

>>> foo(13)
27

125

Decorator Factories
If you want to change the behavior of the decorator when you decorate, you
do this by writing a decorator factory. A factory is a function that returns a
decorator.
# A basic decorator factory
def super_dec(x, y, z):
    # The decorator we are creating
    def dec(f):
        # The function to return
        def new_func(*args, **kw):
            print x + y + z
            return f(*args, **kw)
        return new_func
    return dec

126

Decorator Factory Example
# Appends a string to the function output
def append_string(s):
    def dec(f):
        def new_func(*args, **kw):
            new_s = f(*args, **kw) + s
            return new_s
        return new_func
    return dec

@append_string(", World")
def hello():
    return "Hello"

>>> hello()
'Hello, World'

@append_string(", Moon")
def goodnight():
    return "Goodnight"

>>> goodnight()
'Goodnight, Moon'

127

Iterators, generators and
coroutines

128

Iterators
Iterators are anything that can be looped over. They shield users from
implementation details of looping over an object. Classes that have __iter__ and
next methods can be used as iterators.
CLASS DEFINITION
class FootballRushIterator(object):
    def __init__(self, count=5):
        self.count = count
    def __iter__(self):
        # Prepare an iterable object.
        self._counter = 1
        return self
    def next(self):
        cnt = self._counter
        if cnt <= self.count:
            # Return the next element
            res = str(cnt) + " mississippi"
            self._counter += 1
            return res
        else:
            # Or signal the end of iteration.
            raise StopIteration

USING ITERATORS
>>> rusher = FootballRushIterator()
>>> for count in rusher:
...     print count
1 mississippi
2 mississippi
3 mississippi
4 mississippi
5 mississippi

(WITH A GENERATOR)
def get_rusher(count=5):
    for i in range(count):
        yield "%d mississippi" % (i+1,)

>>> for count in get_rusher():
...     print count

129

Generators
Generators are a special kind of function that contain a yield statement. These
functions are really iterator factories. The returned iterator can be looped over or
its next() method can be called directly to return elements from the sequence.
GENERATOR BASICS
def simple_generator():
    print 'first'
    yield 1
    print 'second'
    yield 2

>>> sequence = simple_generator()
>>> sequence.next()
first
1
>>> sequence.next()
second
2
>>> sequence.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration

USING GENERATORS
# Create a generator that returns the
# first N elements of the Fibonacci
# sequence.
def fib(N=10):
    a, b = 0, 1
    for count in range(N):
        yield b
        a, b = b, a+b

>>> fib_gen = fib(6)
>>> for value in fib_gen:
...     print value
1
1
2
3
5
8

130

Generators --> CoRoutines
Generators can also receive values, using the return value of a yield
expression: such generators are called coroutines.

COROUTINE BASICS
def receiving_generator():
    while True:
        val = (yield)
        print "Received", val

>>> sequence = receiving_generator()
# The first next() call advances to the
# first yield.
>>> sequence.next()
>>> sequence.next()
Received None
>>> sequence.send(1)
Received 1
>>> sequence.send("hello")
Received hello
>>> sequence.close()

EMITTING AND RECEIVING
def emitting_receiving_generator():
    i = -1
    while True:
        i += 1
        val = (yield i)
        print "Received", val

>>> sequence = emitting_receiving_generator()
>>> sequence.next()
0
>>> sequence.next()
Received None
1
>>> sequence.send("hello")
Received hello
2
>>> sequence.close()

131

Context managers and the with
statement

132

Introduction
• New in python 2.5, though it's then required to import it:
  from __future__ import with_statement

• Context managers are objects that possess the
  __enter__ and __exit__ methods. They are used to
  define a sequence of setup and tear-down operations
  that the with statement will call automatically even if
  exceptions are raised, similar to a try/finally block.

• An example of a context manager is the file object,
  which calls f.close() in its __exit__ method, as shown below.
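As an illustration of that last point, the two snippets below behave the same way
for a file object (a minimal sketch of the equivalence, not the exact implementation):
# Explicit try/finally
f = open('data.txt')
try:
    text = f.read()
finally:
    f.close()

# Equivalent 'with' statement: f.close() is called by __exit__,
# even if read() raises an exception.
with open('data.txt') as f:
    text = f.read()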
133

Example
CONTEXT CREATION
class MyContext(object):
    def __enter__(self):
        print "I entered"
        return self
    def __exit__(self, type, value, tb):
        print "Exception info: %s, %s, %s" % (type, value, tb)
        print "Before exiting important things to do..."

USING THE CONTEXT
>>> with MyContext() as c:
...     print "I am doing things..."
...     raise RuntimeError("Crash!")
I entered
I am doing things...
Exception info: <type 'exceptions.RuntimeError'>, Crash!, <traceback object at ...>
Before exiting important things to do...
RuntimeError: Crash!
>>> c
<__main__.MyContext at 0xa3c09b0>
134

Dynamic Code Execution
EVAL
# The "eval" function will
# evaluate a Python expression
# within a given global and local
# namespace (dictionary).
# Both the global and local
# dictionary are optional.
# eval(expr, globals, locals)
>>> a = 1
>>> eval("a+1")
2
# Specify a local dictionary.
>>> local = dict(a=2)
>>> glob = {}
>>> eval("a+1", glob, local)
3

EXEC
# The "exec" statement will
# execute a Python statement
# within a given global and
# local namespace (dictionary).
# Both the global and local
# dictionary are optional.
# exec(stmt, globals, locals)
>>> a = 1
>>> exec("b = a+1")
>>> print b
2
# Specify a local dictionary.
>>> local = dict(a=2)
>>> glob = {}
>>> exec("b=a+1", glob, local)
>>> print local
{'a': 2, 'b': 3}

Careful about eval’ing/exec’ing an untrusted user’s input: they have full
access to python.

135

exec with a dictionary-like namespace
EXEC
class LoudDict(dict):
    """ This dictionary announces whenever one of its values
        is read or set.
    """
    def __getitem__(self, key):
        value = super(LoudDict, self).__getitem__(key)
        print "retrieving %s which is %s" % (key, value)
        return value
    def __setitem__(self, key, value):
        print "setting %s = %s" % (key, value)
        super(LoudDict, self).__setitem__(key, value)

# Execute a statement in our "LoudDict" and watch it announce
# variable get/set during the execution.
>>> local = LoudDict(a=1)
>>> exec "b=a+1" in {}, local
retrieving a which is 1
setting b = 2

See demo/dictionary_like_exec.py
136

Dynamic ByteCode Generation
COMPILE
# The "compile" function will generate bytecode from a Python
# expression or statement. It can then be evaluated multiple
# times without regenerating the bytecode each time.
# Syntax: compile(str, filename, mode)
>>> a = 1
>>> c = compile("a+2", "<string>", "eval")
>>> eval(c)
3
>>> c = compile("b=a+2", "<string>", "exec")
>>> exec(c)
>>> b
3
>>> a = 3
>>> exec(c)
>>> b
5

Careful about eval’ing/exec’ing an untrusted user’s input: they have full
access to python.

137

csv – read and write CSV files
The csv module provides classes for reading and writing Comma Separated
Value (CSV) files.
csv.reader - Read a list of comma-separated values from a file.
csv.writer - Output a list (or other iterable) to a file.
The format is configurable; in particular, the data delimiter does not have to be
a comma.

Default “dialect” is compatible with Excel.

138

csv examples
READER
Sample data file "data.csv":
"alpha 1", 100, -1.443
"beta 3", 12, -0.0934
"gamma 3a", 192, -0.6621
"delta 2a", 15, -4.515

>>> import csv
>>> rdr = csv.reader(open("data.csv"))
>>> for row in rdr:
...     print row
['alpha 1', ' 100', ' -1.443']
['beta 3', ' 12', ' -0.0934']
['gamma 3a', ' 192', ' -0.6621']
['delta 2a', ' 15', ' -4.515']

WRITER
>>> import csv
>>> data = [("One", 1, 1.5),
...         ("Two", 2, 8.0)]
>>> f = open("out.csv", "w")
>>> wrtr = csv.writer(f)
>>> wrtr.writerows(data)
>>> f.close()
Output file "out.csv":
One,1,1.5
Two,2,8.0

139

csv options
OPTIONS (not a complete list)
delimiter: str (a single character)
    The field separator character.
doublequote: bool
    If True, a quote inside a string is doubled.
    If False, prefix the quote with escapechar.
escapechar: str (a single character)
    The character that indicates the following
    character has no special meaning.
quotechar: str (a single character)
    Character used to quote strings containing
    delimiters or other special characters.
quoting: one of csv.QUOTE_* constants
    Controls when quotes should be generated
    by the writer and recognized by the reader.

EXAMPLE
>>> data
[('One', 1, 1.5), ('Two', 2, 8.0)]
>>> f = open("out.txt", "w")
>>> wrtr = csv.writer(f,
...                   delimiter='|',
...                   quoting=csv.QUOTE_ALL)
>>> wrtr.writerows(data)
>>> f.close()
Output file "out.txt":
"One"|"1"|"1.5"
"Two"|"2"|"8.0"
140

Connecting to Databases

141

DB-API 2.0
PEP (Python Enhancement Proposal) 249 detailed version
2.0 of the DB-API to promote consistency between
Python front-ends to databases.

Database                  Module or Modules for Python 2.X
Oracle                    cx_Oracle, DCOracle2, mxODBC, pyodbc
PostgreSQL                psycopg2, PyGreSQL, pyPgSQL, mxODBC, pyodbc, pg8000
MySQL                     MySQLdb, mxODBC, pyodbc, myconnpy
Sqlite                    sqlite3 (included in standard library)
Microsoft SQL Server      adodbapi, pymssql, mssql, mxODBC, pyodbc
Ingres                    ingresdbi
IBM DB2                   ibm_db, PyDB2, ceODBC, mxODBC, pyodbc
Sybase ASE                Sybase, mxODBC
Sybase SQL Anywhere       mxODBC, sqlanydb
SAP DB                    sdb.dbapi, sapdbapi, mxODBC, sdb.sql, sapdb
Informix                  InformixDB, mxODBC
Firebird                  kinterbasdb

142

DB-API 2.0
Connect to the database
import <the db module> as db
# Connect to the data-source.
# Extra arguments typically
# include user and password.
conn = db.connect(<connection string>, ...)

Get a cursor object
c = conn.cursor()

Execute SQL statements
c.execute("create table stocks(date text, trans text, "
          "symbol text, qty real, price real)")
c.execute("insert into stocks values "
          "('2006-01-05','BUY','RHAT',100,35.14)")
conn.commit()

Execute SQL Statements (cont.)
t = ('2006-04-06', 'SELL', 'IBM', 500, 53.00)
c.execute("insert into stocks values (?,?,?,?,?)", t)
# some modules use a different
# place-holder instead of ?
# see db.paramstyle

t = ('IBM',)
c.execute("select * from stocks where symbol=? order by price", t)
for row in c:
    print row
single = c.fetchone()
list_of_lists = c.fetchall()

Close cursor and connection
c.close()
conn.close()
143

An ORM: SQLAlchemy
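The example content for this slide is not reproduced in the text. As a hedged, minimal
sketch of what working with SQLAlchemy's core expression layer looks like on top of the
sqlite driver used above (the table and column names here are made up for illustration,
not the slide's original example):
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        Integer, String, Float)

# An in-memory SQLite database.
engine = create_engine('sqlite:///:memory:')
metadata = MetaData()
stocks = Table('stocks', metadata,
               Column('symbol', String),
               Column('qty', Integer),
               Column('price', Float))
metadata.create_all(engine)

conn = engine.connect()
# Insert and query without writing raw SQL strings.
conn.execute(stocks.insert().values(symbol='RHAT', qty=100, price=35.14))
for row in conn.execute(stocks.select().where(stocks.c.symbol == 'RHAT')):
    print row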

144

Regular Expressions
in Python

146

What is a regular expression?
A regular expression is a pattern designed to match
specific strings or sub-strings.
A regular expression can match very specific strings, as
well as very general strings.
Once matched, specific fields in the string can be extracted
or substituted. Information about the location, and type
of match is stored in a match object.

147

>>> import re
Some of the more common uses of the re API
>>> re.match(pattern, string[, flags])
>>> re.search(pattern, string[, flags])
Match (or search) the pattern against the string, returning a match object on
success and None on failure (flags described later). re.match only looks for a
match at the beginning of the string; re.search scans the entire string.
>>> re.findall(pattern, string)
Returns a list of matches of pattern in string. re.finditer returns an iterator.
>>> re.split(pattern, string[, maxsplit])
Returns a list of sub-strings, delimited by the regular expression pattern. If maxsplit
is specified only maxsplit splits will occur.

>>> re.sub(pattern, repl, string[, count])
Returns a string where every occurrence matching the regular expression pattern in
string is replaced by repl, unless count is reached (specifies how many
substitutions to make).
>>> re.compile(pattern)
Compiles the regex and returns a pattern object which can be used for searching,
matching, substituting, …. match, search, findall … are then methods of the object.
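A quick sketch exercising these functions (the sample string is made up):
>>> import re
>>> text = "cat 10, dog 7, cow 3"
# findall: all matches of the pattern in the string
>>> re.findall("\d+", text)
['10', '7', '3']
# split: break the string on the pattern
>>> re.split(",\s*", text)
['cat 10', 'dog 7', 'cow 3']
# sub: replace every match
>>> re.sub("\d+", "N", text)
'cat N, dog N, cow N'
# compile: reuse a pattern object
>>> patt = re.compile("(\w+) (\d+)")
>>> patt.findall(text)
[('cat', '10'), ('dog', '7'), ('cow', '3')]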
148

Regular expression syntax
A regex pattern is composed of regular characters as well as various symbols for
matching patterns in strings...below are some of the common symbols.
.     Matches any character except newline
\w    Matches any alphanumeric character
\d    Matches any decimal digit, equal to [0-9]
\s    Matches whitespace, equal to [ \t\n\r\f\v]
      (\D, \S and \W denote the complement of the same sets)
[…]   Indicates a set of characters which can be matched.
      Ranges can be specified using - (e.g. a-z, 0-9).
^     Inside […], takes the complement of the following set
(...) Groups characters/symbols into 1 block
|     Stands for logical OR
?     Matches zero or 1 occurrences of the previous RE
{m}   Matches m occurrences of the previous RE
*     Causes the previous RE to match zero or more repetitions
+     Matches 1 or more repetitions of the previous RE

For e.g., ca*t matches 'ct', 'cat', 'caaaat', …
(ab\d)|(ac\d) matches 'ab1', 'ac9', …
([^a-q]bd) matches 'rbd', '5bd', …
149

Match objects & groups
When a regular expression matches, a match object is returned.
>>> myString = "hello world"
>>> print re.match("hello (\w+)", myString)
<_sre.SRE_Match object at 0x009E45E0>

Check if a match was successful, extract matched groups, location, …
>>> myString = "hello there"
>>> match = re.match("hello (\w+)", myString)
>>> if match != None:
...     print match.group(1)
'there'
>>> print match.start(1), match.end(1)
6 11

Groups are numbered left to right, based on the opening parenthesis
(nested groups count too):
>>> re.match("hello (\w+(\d+)*) (\w+)", myString)
# (\w+(\d+)*) is group 1, (\d+) is group 2, (\w+) is group 3
150

Regular expression example
compile() returns a re pattern object
>>> patt1 = re.compile("hello\s+user\s+number\s+(\d+)(\.\d+)*")

\s+ matches one or more space characters. The first group, (\d+), matches
1 or more digits. The second group, (\.\d+)*, matches an optional dot
followed by 1 or more digits; it is optional because of the *, and the dot
is escaped with \ since it is a special symbol.

>>> s1 = "hello   user number   87"
>>> s2 = "hello user number 87.34"
>>> m = patt1.match(s1)
>>> print m.group(1)
'87'
>>> m = patt1.match(s2)
>>> print m.groups()
('87', '.34')
151

Exercise

Parse the following text to extract the
number and the type of all travellers:
"5 men, 7 dogs and 12 goats are
travelling together."

152

A note about parsing XHTML

Source: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

153

NumPy’s fromregex
NumPy contains a function fromregex that can be used to convert arbitrary
text files to (structured) arrays.

data = fromregex(file, pattern, dtype)
file:
file-like object or filename string.
pattern: Regular expression to parse file with.
Groups must correspond to fields in the
dtype.
dtype: data-type for the structured array with
field for each group in the regular
expression.
Output is a structured array where each element is a
match to the regular expression in the file.

154

fromregex example
# Create the file
>>> f = open('test.dat', 'w')
>>> f.write("1312 foo\n1534 bar\n444  qux")
>>> f.close()

# match [digits, whitespaces, anything]
>>> regexp = "(\d+)\s+(...)"
>>> output = fromregex('test.dat', regexp,
...                    [('num', np.int64), ('key', 'S3')])
>>> output
array([(1312L, 'foo'), (1534L, 'bar'), (444L, 'qux')],
      dtype=[('num', '<i8'), ('key', '|S3')])
>>> output['num']
array([1312, 1534, 444], dtype=int64)
155

datetime – Dates and Times
>>> from datetime import date, time
DATE OBJECT
# date(year, month, day)
# date in Gregorian calendar,
# assuming its permanence
>>> d1 = date(2007, 9, 25)
>>> d2 = date(2008, 9, 25)
>>> d1.strftime('%A %m/%d/%y')
'Tuesday 09/25/07'
# difference is a timedelta
>>> print d2 - d1
366 days, 0:00:00
>>> (d2-d1).days
366
>>> print date.today()
2008-09-24

TIME OBJECT
# time(hour, min, sec, us)
# local time of day
# always 24 hrs per day
>>> t1 = time(15, 38)
>>> t2 = time(18)
>>> t1.strftime('%I:%M %p')
'03:38 PM'
# difference is not supported
>>> print t2 - t1
Traceback ...
TypeError: unsupported operand ...
# use datetime objects for
# difference operation.

156

datetime – Dates and Times
>>> from datetime import datetime, timedelta
DATETIME OBJECT
# datetime(year, month, day, hr, min, sec, us)
# combination of date and time
>>> d1 = datetime.now()
>>> print d1
2008-09-24 14:20:30.978207
>>> d2 = d1 + timedelta(30)
>>> d2.strftime('%A %m/%d/%y')
'Friday 10/24/08'
# creating datetime from
# a format string
>>> datetime.strptime('2/10/01', '%m/%d/%y')
datetime.datetime(2001, 2, 10, 0, 0)

DATETIME FORMAT STRING
Directive    Meaning
%a (%A)      Abbrev. (full) weekday name
%w           Weekday number [0(Sun),6]
%b (%B)      Abbrev. (full) month name
%d           Day of month [01,31]
%H (%I)      Hour [00,23] ([01,12])
%j           Day of the year [001,366]
%m           Month [01,12]
%M           Minute [0,59]
%p           AM or PM
%S           Second [00,61]
%U (%W)      Week number of the year [00,53], Sunday (Monday) as first day of week
%y (%Y)      Year without (with) century [00,99]

157

Context managers and the with
statement

158

Introduction
• New in python 2.5, though it's then required to import it:
  from __future__ import with_statement

• Context managers are objects that possess the
  __enter__ and __exit__ methods. They are used to
  define a sequence of setup and tear-down operations
  that the with statement will call automatically even if
  exceptions are raised, similar to a try/finally block.

• An example of a context manager is the file object,
  which calls f.close() in its __exit__ method.
159

Example
CONTEXT CREATION
class MyContext(object):
    def __enter__(self):
        print "I entered"
        return self
    def __exit__(self, type, value, tb):
        print "Exception info: %s, %s, %s" % (type, value, tb)
        print "Before exiting important things to do..."

USING THE CONTEXT
>>> with MyContext() as c:
...     print "I am doing things..."
...     raise RuntimeError("Crash!")
I entered
I am doing things...
Exception info: <type 'exceptions.RuntimeError'>, Crash!, <traceback object at ...>
Before exiting important things to do...
RuntimeError: Crash!
>>> c
<__main__.MyContext at 0xa3c09b0>

160

sys module
>>> import sys
Some frequently used attributes and functions—see the
reference manual for complete details.
Command Line Arguments
sys.argv
    List of command line arguments.
    sys.argv[0] is the name of the python script.
Example:
# File: print_args.py
import sys
print sys.argv

$ python print_args.py 1 foo
['print_args.py', '1', 'foo']

Exception Information
sys.exc_info()
    Returns a tuple (type, value, traceback).
sys.exc_clear()
    Clear all exception information.

>>> try:
...     x = 1/0
... except Exception:
...     print sys.exc_info()
...
(<type 'exceptions.ZeroDivisionError'>,
 ZeroDivisionError('integer division or
 modulo by zero',), <traceback object at ...>)
>>>

161

sys module
Standard File Objects
sys.stdin
sys.stdout
sys.stderr
    The interpreter's standard input, output and
    error streams.
sys.__stdin__
sys.__stdout__
sys.__stderr__
    The original values of sys.stdin, sys.stdout
    and sys.stderr at the start of the program.

Exit
sys.exit(arg)
    Exit from Python. arg is optional. It can be an
    integer giving the exit status (defaults to zero).
    If not an integer, None is equivalent to passing
    0, and any other argument is printed to
    sys.stderr and the exit status is 1.

Python's module search path
sys.path
    A list of strings that specifies the interpreter's
    search path for modules.
    A program is free to modify this list
    dynamically.
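A short sketch of both ideas; the directory and the usage message are placeholders,
not part of the original slide:
# File: path_and_exit.py
import sys

# Add a directory to the module search path at run time.
sys.path.append('/home/user/your_modules')

if len(sys.argv) < 2:
    # Prints the message to sys.stderr and exits with status 1.
    sys.exit("usage: python path_and_exit.py <input_file>")

print "searching", len(sys.path), "directories for modules"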

162

sys module
Platform Information
sys.platform
A string containing the platform identifier.
Windows: 'win32'
Mac OSX: 'darwin'
Linux: 'linux2'
sys.getwindowsversion()
    Return a tuple that describes the version of
    Windows currently running: major, minor,
    build, platform, and service_pack. (More is
    included in Python 2.7.)
>>> sys.platform
'win32'
>>> sys.getwindowsversion()
(5, 1, 2600, 2, 'Service Pack 3')

Python Version
sys.version
A string containing information
about the Python version.
sys.version_info
A tuple containing information about
the Python version: major, minor,
micro, releaselevel and serial.
>>> sys.version
'2.6.5 |EPD 6.2-2 (32-bit)|
(r265:79063, May 7 2010, 13:28:19)
[MSC v.1500 32 bit (Intel)]'
>>> sys.version_info
(2, 6, 5, 'final', 0)

See also the platform module in the
standard library.

163

os module
>>> import os
Path Operations
os.remove(path), os.unlink(path)
    Remove a file from disk (path can be
    either a full path or a file in the
    current working directory).
os.chdir(path)
    Change the current working directory to
    the provided path.
os.getcwd()
    Return the current working directory.
os.listdir(path)
    Return a list of strings containing all the
    files in the given path (does not include '.'
    or '..' in the listing).

Separation Constants
os.linesep (e.g. '\n' or '\r\n')
    Line separator in text mode.
os.sep (e.g. '/' or '\\')
    Path separator on the file system.
os.pathsep (e.g. ':' or ';')
    Search path separator (i.e. in
    environment variables).

Others
os.environ
    Dictionary of all environment variables.
os.urandom(len)
    String of random bytes.
os.error
    Error object.

164

os.path
os.path – tests
os.path.isfile(path)
    Test whether a path is a regular file.
os.path.isdir(path)
    Test whether a path refers to an existing
    directory.
os.path.exists(path)
    Test whether a path exists.
os.path.isabs(path)
    Test whether a path is absolute.

os.path – split and join
os.path.split(path)
    Split a pathname. Returns the tuple
    (head, tail).
os.path.splitext(path)
    Split the extension from a pathname.
    Returns (root, ext).
os.path.join(a, *p)
    Join two or more path components.

Others
os.path.abspath(path)
    Return an absolute path.
os.path.dirname(path)
    Return the directory component of a
    pathname.
os.path.basename(path)
    Return the final component of a
    pathname.
os.path.expanduser(path)
    Expand ~ and ~user. If user or $HOME is
    unknown, do nothing.
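A quick sketch tying a few of these together (the path is made up, so exists()
will normally return False):
>>> import os.path
>>> path = '/home/user/data/rcs.txt'
>>> os.path.split(path)
('/home/user/data', 'rcs.txt')
>>> os.path.splitext(path)
('/home/user/data/rcs', '.txt')
>>> os.path.basename(path)
'rcs.txt'
>>> os.path.join('/home/user', 'data', 'rcs.txt')
'/home/user/data/rcs.txt'
>>> os.path.exists(path)
False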

165

Remote Call example — xmlrpc
XMLRPC FACTORIAL SERVER – VERSION 1
from SimpleXMLRPCServer import SimpleXMLRPCServer

def fact(n):
    if n <= 1:
        return 1
    else:
        return n * fact(n-1)

# Start a server listening for requests on port 8001.
if __name__ == '__main__':
    server = SimpleXMLRPCServer(('localhost', 8001))
    # Register fact as a function provided by the server.
    server.register_function(fact)
    server.serve_forever()

CLIENT CODE — CALLS REMOTE FACTORIAL FUNCTION
>>> import xmlrpclib
>>> svr = xmlrpclib.Server("http://localhost:8001")
>>> svr.fact(10)
3628800

166

Remote Call example — xmlrpc
XMLRPC FACTORIAL SERVER – VERSION 2
from SimpleXMLRPCServer import SimpleXMLRPCRequestHandler, \
                               SimpleXMLRPCServer

def fact(n):
    if n <= 1:
        return 1
    else:
        return n * fact(n-1)

# Override the _dispatch() handler to call the requested function.
class my_handler(SimpleXMLRPCRequestHandler):
    def _dispatch(self, method, params):
        print "CALL", method, params
        return apply(eval(method), params)

# Start a server listening for requests on port 8001.
if __name__ == '__main__':
    server = SimpleXMLRPCServer(('localhost', 8001), my_handler)
    server.serve_forever()

CLIENT CODE — CALLS REMOTE FACTORIAL FUNCTION
>>> import xmlrpclib
>>> svr = xmlrpclib.Server("http://localhost:8001")
>>> svr.fact(10)
3628800

167

Calling Subprocesses

168

Executing other programs
>>> from subprocess import Popen, PIPE, STDOUT
p = Popen([cmd,arg1,…,argn], env=None, cwd=None, shell=False,
          stdin=None, stdout=None, stderr=None)

cmd,arg1,…,argn         Command to run plus any arguments.
stdin, stdout, stderr   Standard file handles: can be None (use
                        handles of the current process), any open file
                        descriptor, any open file object, or PIPE.
                        (stderr can also be STDOUT for a redirect.)
env                     Dictionary of environment variables. If None,
                        then the current process environment is used.
shell                   Run the cmd in the shell if True.
cwd                     Directory to change to before executing the
                        command.

The command begins to execute immediately!
169

Executing other programs
SIMPLE COMMAND EXECUTION
# On Windows
>>> Popen('dir', shell=True, cwd="c:\\")
 Volume in drive C has no label.
 Volume Serial Number is A080-3756
 Directory of c:\
 08/30/2007  ...   AUTOEXEC.BAT
 08/30/2007  ...   CONFIG.SYS
 01/24/2009  ...   Python25
 01/24/2009  ...   WINDOWS
        2 File(s)              0 bytes
        7 Dir(s)   1,401,319,424 bytes free

# On Unix (no shell needed)
>>> fid = open('simple.txt','w')
>>> Popen('ls', cwd='/usr', stdout=fid)
>>> fid.close()

Contents of simple.txt:
bin  games  include  lib  lib32  lib64  local  sbin  share  src  X11R6
170

Executing other programs
WAIT FOR TERMINATION

POLL PERIODICALLY

from subprocess import Popen
print "Starting..."
# sleep for 10 seconds
p = Popen(['sleep','10'])
# wait to finish
# and get return code
ret = p.wait()
# return code also stored
# in subprocess object
# after completion
print "Done", p.returncode

from subprocess import Popen
import sys
print "Starting..."
# sleep for 10 seconds
p = Popen(['sleep','10'])
# continue on with processing
print "Continuing",
k=0
while p.poll() is None:
    k += 1
    if (k % 100000) == 0:
        print k//100000,
        sys.stdout.flush()
print
print "Done", p.returncode

Starting
<10 seconds later>
Done 0

Starting
Continuing 1 2 3 … 45
Done 0
171

Executing other programs
PIPES
"Pipes" are used to send and/or receive data from another process on the same
machine.
# Setup process which will receive
# stdin through a PIPE and send
# stdout to a PIPE
>>> p = Popen(['python', 'count.py', 'c'],
...           stdin=PIPE, stdout=PIPE,
...           stderr=PIPE)
>>> instr = "gatccatgca"
>>> out, err = p.communicate(instr)
>>> print out
3

count.py
import sys
letter = sys.argv[1]
data = sys.stdin.read()
print data.count(letter)

SHLEX TO CREATE COMPLEX COMMAND
>>> import shlex
>>> shlex.split('python count.py c')
['python', 'count.py', 'c']

172

Multiprocessing

173

What is Multiprocessing?
The multiprocessing module provides access to task-based parallel
computing using separate Python processes.
Multiprocessing has several features:
• part of Python’s standard library,
• cross-platform,
• low-level parallel constructs: locks, queues, pipes, semaphores, shared memory
access
• higher-level constructs: task pools with automatic scheduling, shared memory
object wrappers, functional and iterator-like design support, etc.
The multiprocessing module is useful for:
• batch computing, embarrassingly parallel computations
• parallel computing with minimal shared state
• GUI programming: do calculations in a background process
• many, many others.
http://docs.python.org/library/multiprocessing.html
174

Multiprocessing vs. Threading
The threading module provides thread-based parallelism to Python; in CPython,
threads are limited by the Global Interpreter Lock.
Only one thread may hold the lock at any point in time, holding the lock allows
that thread to execute Python instructions.
Multiprocessing provides a way around this performance limitation.
The multiprocessing module has a nearly identical API to the threading module;
this is intentional.

Learn multiprocessing and you will have learned the majority of the threading API
(see the sketch after the list of differences below).
Primary differences:
• Threads share the same address space as the spawning thread, whereas
processes have separate address spaces.

• often more communication is required with processes; shared memory
can mitigate.
• Spawning new processes is generally more expensive (memory, time) than
spawning new threads.
• Trivial API differences.
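A small sketch of how closely the two APIs mirror each other (the worker function
is a placeholder):
import threading
import multiprocessing as mp

def worker(n):
    print "working on", n

if __name__ == '__main__':
    # Same constructor signature, same start()/join() methods.
    t = threading.Thread(target=worker, args=(1,))
    p = mp.Process(target=worker, args=(2,))
    t.start(); p.start()
    t.join(); p.join()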

175

The Process Class
Process(target, name=None, args=(), kwargs={})
target: a callable object, called when the start() method is called

name: string, name to give to the process; default given if None
args: tuple, arguments to pass to the target when called
kwargs: dict, keyword args to pass to the target
The basic methods for Process objects are:

start(): start the process’ activity
join([timeout]): block the calling process until the process finishes, or until
timeout
is_alive(): True if process is still running, false otherwise
terminate(): Terminate the process; may cause deadlocks, synchronization
errors, orphaned processes; use only when necessary.

176

The Process Class
CREATING A PROCESS
import multiprocessing as mp
from time import sleep
from random import random

def worker(num):
    sleep(2.0 * random())
    name = mp.current_process().name
    print "worker {}, name: {}".format(num, name)

if __name__ == '__main__':
    master = mp.current_process().name
    print "Master name: {}".format(master)
    for i in range(2):
        p = mp.Process(target=worker, args=(i,))
        p.start()
    # Wait for all spawned child processes to finish.
    [p.join() for p in mp.active_children()]

OUTPUT
$ python basics.py
Master name: MainProcess
worker 1, name: Process-2
worker 0, name: Process-1

177

Communication Between Processes
Multiprocessing provides a Queue class and a Pipe class for communicating
between processes. Queue and Pipe objects are thread and process safe.
Queue – Communicate in one direction; put() and get() methods.
Pipe – two-ended Queue, each end has a send() and recv() method.

QUEUE BASICS
def worker(q):
    v = q.get()    # blocking!
    print "got '{}' from parent".format(v)

if __name__ == '__main__':
    q = mp.Queue()
    p = mp.Process(target=worker, args=(q,))
    p.start()      # the child blocks at q.get()
    v = 'parent'
    print "putting '{}' on queue".format(v)
    q.put(v)
    p.join()

OUTPUT
$ python queue_basic.py
putting 'parent' on queue
got 'parent' from parent

178

Communication Between Processes
PIPE BASICS
import multiprocessing as mp

def worker(p):
    msg = 'whiskey tango fox trot'
    print "sending {!r} to parent".format(msg)
    p.send(msg)
    v = p.recv()
    print "got {!r} from parent".format(v)

if __name__ == '__main__':
    # The two ends of the pipe: the parent and the child connections
    p_conn, c_conn = mp.Pipe()
    p = mp.Process(target=worker, args=(c_conn,))
    p.start()
    msg = 'foobar'
    print "got {!r} from child".format(p_conn.recv())
    print "sending {!r} to child".format(msg)
    p_conn.send(msg)
    p.join()

179

Synchronization
Lock() objects provide a low-level way to guarantee that code is executed by only one
process at a time.

LOCK BASICS
import multiprocessing as mp

def print_lock(lk, i):
    name = mp.current_process().name
    lk.acquire()
    for j in range(100):
        print i, "from process", name
    lk.release()

if __name__ == '__main__':
    lk = mp.Lock()
    ps = [mp.Process(target=print_lock, args=(lk,i))
          for i in range(100)]
    [p.start() for p in ps]
    [p.join() for p in ps]

USING THE WITH STATEMENT
import multiprocessing as mp

def print_lock(lk, i):
    name = mp.current_process().name
    with lk:
        for j in range(100):
            print i, "from process", name

if __name__ == '__main__':
    lk = mp.Lock()
    ps = [mp.Process(target=print_lock, args=(lk,i))
          for i in range(100)]
    [p.start() for p in ps]
    [p.join() for p in ps]
180

Synchronization
Event() objects provide a way to notify other processes that something
happened.

EVENT USAGE
import multiprocessing as mp

def wait_on_event(e):
    name = mp.current_process().name
    e.wait()
    print name, "finished waiting"

if __name__ == '__main__':
    e = mp.Event()
    ps = [mp.Process(target=wait_on_event, args=(e,))
          for i in range(10)]
    [p.start() for p in ps]
    print "e.is_set()", e.is_set()
    raw_input("press any key to set event")
    e.set()
181

Task Pools
Pool() objects provide a convenient way to perform embarrassingly parallel
computations with automatic load balancing.

Pool(processes=None, initializer=None, initargs=())
processes – int, number of processes in pool, cpu_count() by default
initializer – callable, if not None, initializer(*initargs) is called by each
process when starting.
Methods:
map(func, iterable, chunksize=None) – parallel equivalent of the map()
builtin function. Blocks until result is ready and returns a list.
map_async(func, iterable, chunksize=None, callback=None)
Same as map(), nonblocking. Returns a result object.
imap(func, iterable, chunksize=None)
A version of map that returns an iterator instead of a list.
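A minimal sketch of the non-blocking variants (the square function is just a
placeholder):
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == '__main__':
    pl = mp.Pool()
    # map_async returns immediately; get() blocks until results are ready.
    result = pl.map_async(square, range(10))
    print result.get()              # [0, 1, 4, ..., 81]
    # imap returns results lazily, one at a time.
    for val in pl.imap(square, range(5)):
        print val
    pl.close()
    pl.join()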
182

Task Pools
BASIC POOL USAGE
import multiprocessing as mp

def is_prime(num):
    # determine whether num is prime
    # (a simple trial-division check; any primality test would do)
    isp = num > 1 and all(num % d for d in range(2, int(num**0.5) + 1))
    return (num, isp)

if __name__ == '__main__':
    # create a pool with cpu_count() procs.
    pl = mp.Pool()
    results = pl.map(is_prime, range(50, 100))
    for num, isp in results:
        if isp: print num, "is prime"

OUTPUT
$ python fermat_prime.py
53 is prime
59 is prime
61 is prime
67 is prime
...

Scaling was computed for 10**6 integers while
varying the pool size. Max speedup is ~4.5X
on 8 cores. This problem has high
communication overhead.
See demo_pool.py for the code.

183

Shared Memory
Multiprocessing provides the Value and Array classes for shared memory
programming; there is also the multiprocessing.sharedctypes module to create
ctypes objects and arrays in shared memory.
Passing a shared memory object to the args of a Process() object will pass a proxy
to the target; the actual memory location is in shared memory (via mmap). Any
modifications to the proxy object will be seen by all processes.
This allows parallel array computations without much communication
overhead.
[Diagram: a = Array() and b = Value() live in shared memory; worker1(a) modifies a,
and worker2(a) sees worker1's changes.]
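A minimal sketch of the Value and Array wrappers in use (the worker and the sample
data are made up for illustration):
import multiprocessing as mp

def worker(counter, arr):
    # The proxy objects refer to the same shared memory in every process.
    with counter.get_lock():
        counter.value += 1
    for i in range(len(arr)):
        arr[i] = arr[i] * 2

if __name__ == '__main__':
    counter = mp.Value('i', 0)             # a shared C int
    arr = mp.Array('d', [1.0, 2.0, 3.0])   # a shared array of C doubles
    p = mp.Process(target=worker, args=(counter, arr))
    p.start(); p.join()
    print counter.value, arr[:]            # 1 [2.0, 4.0, 6.0]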

184

Multiprocessing + NumPy
NumPy arrays can be made to view multiprocessing shared memory objects.

SHARED MEMORY VIEW
import multiprocessing as mp
from multiprocessing import sharedctypes
from numpy import ctypeslib

def fill_arr(arr_view, i):
    arr_view.fill(i)

if __name__ == '__main__':
    ra = sharedctypes.RawArray('i', 4)
    arr = ctypeslib.as_array(ra)
    arr.shape = (2, 2)
    p1 = mp.Process(target=fill_arr,
                    args=(arr[:1, :], 1))
    p2 = mp.Process(target=fill_arr,
                    args=(arr[1:, :], 2))
    p1.start(); p2.start()
    p1.join(); p2.join()
    print arr

OUTPUT
$ python shared_numpy.py
[[1 1]
 [2 2]]

185

Multiprocessing and others
The multiprocessing module is typically used for non-numerical workloads, where Python
acts as the glue between several concurrent processes.
For numerical calculations, numerically-optimized third-party libraries such as PyMPI,
MPI4Py, PyOpenCL, CLyther, PyCuda, etc. are typically used.
Multiprocessing is useful when one needs to add concurrency to an application
without requiring external dependencies.

186

NumPy

187

NumPy and SciPy
SciPy [ Scientific Algorithms ]
linalg, stats, interpolate, cluster, special, spatial, io, fftpack,
odr, ndimage, sparse, integrate, signal, optimize, weave

NumPy [ Data Structure Core ]
NDArray: multi-dimensional array object
UFunc: fast array math operations
fft, random, linalg

188

NumPy
• Website: http://numpy.scipy.org/
• Offers Matlab-ish capabilities within Python
• NumPy replaces Numeric and Numarray
• Developed by Travis Oliphant
• >100 "committers" to the project (github.com)
• NumPy 1.0 released October, 2006
• NumPy 1.7.1 released April, 2013
• ~45K downloads/month from SourceForge
  This does not count:
  • Linux distributions that include NumPy
  • Enthought distributions that include NumPy
189

Helpful Sites
SCIPY DOCUMENTATION PAGE
http://docs.scipy.org/doc

NUMPY EXAMPLES
http://www.scipy.org/Numpy_Example_List_With_Doc

190

Getting Started
IMPORT NUMPY
In [1]: from numpy import *
In [2]: __version__
Out[2]: 1.6.0
or

In [1]: from numpy import \
array, ...

Often at the command line, it is
handy to import everything from
NumPy into the command shell.
However, if you are writing scripts,
it is easier for others to read and
debug in the future if you use
explicit imports.

USING IPYTHON --PYLAB
C:\> ipython --pylab
In [1]: array([1,2,3])
Out[1]: array([1, 2, 3])

IPython has a ‘pylab’ mode where
it imports all of NumPy, Matplotlib,
and SciPy into the namespace for
you as a convenience. It also enables
threading for showing plots.

While IPython is used for all the
demos, ‘>>>’ is used on future
slides instead of ‘In [1]:’ to
save space.

191

Array Operations
SIMPLE ARRAY MATH
>>> a = array([1,2,3,4])
>>> b = array([2,3,4,5])
>>> a + b
array([3, 5, 7, 9])
>>> a * b
array([ 2, 6, 12, 20])
>>> a ** b
array([ 1, 8, 81, 1024])

NumPy defines these constants:

pi = 3.14159265359
e = 2.71828182846

MATH FUNCTIONS
# create array from 0 to 10
>>> x = arange(11.)
# multiply entire array by
# scalar value
>>> c = (2*pi)/10.
>>> c
0.62831853071795862
>>> c*x
array([ 0.,0.628,…,6.283])
# in-place operations
>>> x *= c
>>> x
array([ 0.,0.628,…,6.283])
# apply functions to array
>>> y = sin(x)

192

Plotting Arrays
MATPLOTLIB
>>> plot(x,y)

CHACO SHELL
>>> from chaco import shell
>>> shell.plot(x,y)

193

Matplotlib Basics
(an interlude)

194

http://matplotlib.org/

195

Line Plots
PLOT AGAINST INDICES

MULTIPLE DATA SETS

>>> x = linspace(0,2*pi,50)
>>> plot(sin(x))

>>> plot(x, sin(x),
...      x, sin(2*x))

196

Matplotlib Menu Bar
The toolbar at the bottom of every Matplotlib figure window provides:
• Back to original view, and previous/next views
• Pan tool and Zoom tool
• Plot size parameters within the window
• Save to file (png, svg, eps, pdf, …)
197

Line Plots
LINE FORMATTING

MULTIPLE PLOT GROUPS

# red, solid line, triangle markers
>>> plot(x, sin(x), 'r-^')

>>> plot(x, sin(x), 'b-o',
...      x, sin(2*x), 'r-^')

198

Scatter Plots
SIMPLE SCATTER PLOT

COLORMAPPED SCATTER

>>> x = linspace(0,2*pi,50)
>>> y = sin(x)
>>> scatter(x, y)

# marker size/color set with data
>>> x = rand(200)
>>> y = rand(200)
>>> size = rand(200)*30
>>> color = rand(200)
>>> scatter(x, y, size, color)
>>> colorbar()

199

Multiple Figures
>>> t = linspace(0,2*pi,50)
>>> x = sin(t)
>>> y = cos(t)

# Now create a figure
>>> figure()
>>> plot(x)
# Now create a new figure.
>>> figure()
>>> plot(y)

200

Multiple Plots Using subplot
>>> x = array([1,2,3,2,1])
>>> y = array([1,3,2,3,1])
# To divide the plotting area into
# a grid of (rows, columns) and
# select the active plot:
>>> subplot(2, 1, 1)
>>> plot(x)

# Now activate a new plot
# area.
>>> subplot(2, 1, 2)
>>> plot(y)

If this is used in a python script, a call
to the function show() is required.

201

Adding Lines to a Plot
MULTIPLE PLOTS

ERASING OLD PLOTS

# By default, previous lines
# are "held" on a plot.
>>> plot(sin(x))
>>> plot(cos(x))

# Set hold(False) to erase
# old lines
>>> plot(sin(x))
>>> hold(False)
>>> plot(cos(x))

202

Legend
LEGEND LABELS WITH PLOT

LABELING WITH LEGEND

# Add labels in plot command.
>>> plot(sin(x), label='sin')
>>> plot(cos(x), label='cos')
>>> legend()

# Or as a list in legend().
>>> plot(sin(x))
>>> plot(cos(x))
>>> legend(['sin', 'cos'])

203

Titles and Grid
TITLES AND AXIS LABELS

PLOT GRID

>>> plot(x, sin(x))
>>> xlabel('radians')
# Keywords set text properties.
>>> ylabel('amplitude',
...
fontsize='large')
>>> title('Sin(x)')

# Display gridlines in plot
>>> grid()

204

Clearing and Closing Plots
CLEARING A FIGURE

CLOSING PLOT WINDOWS

>>> plot(x, sin(x))
# clf will clear the current
# plot (figure).
>>> clf()

# close() will close the
# currently active plot window.
>>> close()
# close('all') closes all the
# plot windows.
>>> close('all')

205

Image Display
# Get the Lena image from scipy.
>>> from scipy.misc import lena
>>> img = lena()
# Display image with the bone
# colormap, and set the
# x and y extents of the plot.
>>> imshow(img,
...        extent=[-25,25,-25,25],
...        cmap=cm.bone)

# Add a colorbar to the display.
>>> colorbar()

206

Plotting from Scripts
INTERACTIVE MODE

NON-INTERACTIVE MODE

# In IPython, plots show up
# as soon as a plot command
# is called.
>>> figure()
>>> plot(sin(x))
>>> figure()
>>> plot(cos(x))

# script.py
# In a script, you must call
# the show() command to display
# plots. Call it at the end of
# all your plot commands for
# best performance.
figure()
plot(sin(x))
figure()
plot(cos(x))

# Plots will not appear until
# this command is issued.
show()

207

Histograms
HISTOGRAM

HISTOGRAM 2

# plot histogram
# defaults to 10 bins
>>> hist(randn(1000))

# change the number of bins
>>> hist(randn(1000), 30)

208

Plots with error bars
ERRORBAR
# Assume points are known
# with errors in both axis
>>> x = linspace(0,2*pi,10)
>>> y = sin(x)
>>> yerr = rand(10)
>>> xerr = rand(10)/3
>>> errorbar(x, y, yerr, xerr)

209

Plots with error bars II
BOXPLOT
# Assume 4 experiments have measured a
# quantity with certain medians and
# std deviations.
# Data
>>> from numpy.random import normal
>>> exp = [normal(0, 1+i/5, size=(500,))
...        for i in np.arange(4.)]
>>> medians = [np.median(e) for e in exp]
>>> stds = [np.std(e) for e in exp]
# Plotting
>>> conf_intervals = [
...     (med - std, med + std) for
...     med, std in zip(medians, stds)]
>>> positions = np.arange(len(exp)) + 1
>>> boxplot(exp, sym='k+',
...         positions=positions,
...         usermedians=medians,
...         conf_intervals=conf_intervals)

210

3D Plots with Matplotlib
SURFACE PLOT
>>> from mpl_toolkits.mplot3d import Axes3D
>>> x, y = mgrid[-5:5:35j, 0:10:35j]
>>> z = x*sin(x)*cos(0.25*y)
>>> fig = figure()
>>> ax = fig.gca(projection='3d')
>>> ax.plot_surface(x, y, z,
...                 rstride=1, cstride=1,
...                 cmap=cm.jet)
>>> xlabel('x'); ylabel('y')

PARAMETRIC CURVE
>>> from mpl_toolkits.mplot3d import Axes3D
>>> t = linspace(0, 30, 1000)
>>> x, y, z = [t*cos(t), t*sin(t), t]
>>> fig = figure()
>>> ax = fig.gca(projection='3d')
>>> ax.plot(x, y, z)
>>> xlabel('x')
>>> ylabel('y')

211

Surface Plots with mlab
# Create 2D array where values
# are radial distance from
# the center of array.
>>> from numpy import mgrid
>>> from scipy import special
>>> x,y = mgrid[-25:25:100j,
...             -25:25:100j]
>>> r = sqrt(x**2+y**2)
# Calculate Bessel function of
# each point in array and scale.
>>> s = special.j0(r)*25
# Display surface plot.
>>> from mayavi import mlab
>>> mlab.surf(x,y,s)
>>> mlab.scalarbar()
>>> mlab.axes()
212

More Details
• Simple examples with increasing difficulty:
http://matplotlib.org/examples/index.html
• Gallery (huge): http://matplotlib.org/gallery.html

213

Continuing NumPy…

214

Introducing NumPy Arrays
SIMPLE ARRAY CREATION
>>> a = array([0,1,2,3])
>>> a
array([0, 1, 2, 3])
CHECKING THE TYPE
>>> type(a)
numpy.ndarray
NUMERIC ‘TYPE’ OF ELEMENTS
>>> a.dtype
dtype('int32')
BYTES PER ELEMENT

>>> a.itemsize
4

ARRAY SHAPE
# Shape returns a tuple
# listing the length of the
# array along each dimension.
>>> a.shape
(4,)
>>> shape(a)
(4,)
ARRAY SIZE
# Size reports the entire
# number of elements in an
# array.
>>> a.size
4
>>> size(a)
4
215

Introducing NumPy Arrays
BYTES OF MEMORY USED
# Return the number of bytes
# used by the data portion of
# the array.
>>> a.nbytes
16
NUMBER OF DIMENSIONS

>>> a.ndim
1

216

Setting Array Elements
ARRAY INDEXING
>>> a[0]
0
>>> a[0] = 10
>>> a
array([10, 1, 2, 3])
FILL
# set all values in an array
>>> a.fill(0)
>>> a
array([0, 0, 0, 0])
# this also works, but may
# be slower
>>> a[:] = 1
>>> a
array([1, 1, 1, 1])

BEWARE OF TYPE
COERCION

>>> a.dtype
dtype('int32')
# assigning a float into
# an int32 array truncates
# the decimal part
>>> a[0] = 10.6
>>> a
array([10, 1, 2, 3])
# fill has the same behavior
>>> a.fill(-4.8)
>>> a
array([-4, -4, -4, -4])
217

Slicing
var[lower:upper:step]
Extracts a portion of a sequence by specifying a lower and upper bound.
The lower-bound element is included, but the upper-bound element is not included.
Mathematically: [lower, upper). The step value specifies the stride between elements.

SLICING ARRAYS

OMITTING INDICES

# indices:
0 1 2 3 4
>>> a = array([10,11,12,13,14])
# [10,11,12,13,14]
>>> a[1:3]
array([11, 12])

# omitted boundaries are
# assumed to be the beginning
# (or end) of the list

# negative indices work also
>>> a[1:-2]
array([11, 12])
>>> a[-4:3]
array([11, 12])

# grab first three elements
>>> a[:3]
array([10, 11, 12])
# grab last two elements
>>> a[-2:]
array([13, 14])
# every other element
>>> a[::2]
array([10, 12, 14])
218

Multi-Dimensional Arrays
MULTI-DIMENSIONAL ARRAYS
>>> a = array([[ 0, 1, 2, 3],
[10,11,12,13]])
>>> a
array([[ 0, 1, 2, 3],
[10,11,12,13]])
SHAPE = (ROWS,COLUMNS)
>>> a.shape
(2, 4)
ELEMENT COUNT
>>> a.size
8

GET/SET ELEMENTS
>>> a[1,3]
13
column
row

>>> a[1,3] = -1
>>> a
array([[ 0, 1, 2, 3],
[10,11,12,-1]])
ADDRESS SECOND (ONETH)
ROW USING SINGLE INDEX
>>> a[1]
array([10, 11, 12, -1])

NUMBER OF DIMENSIONS
>>> a.ndim
2

219

Arrays from/to ASCII files
BASIC PATTERN

# Read data into a list of lists,
# and THEN convert to an array.
file = open('myfile.txt')
# Create a list for all the data.
data = []
for line in file:
    # Read each row of data into a
    # list of floats.
    fields = line.split()
    row_data = [float(x) for x in fields]
    # And add this row to the
    # entire data set.
    data.append(row_data)
# Finally, convert the "list of
# lists" into a 2D array.
data = array(data)
file.close()

ARRAYS FROM/TO TXT FILES

Data.txt
-- BEGINNING OF THE FILE
% Day, Month, Year, Skip, Avg Power
01, 01, 2000, x876, 13 % crazy day!
% we don't have Jan 03rd
04, 01, 2000, xfed, 55

# loadtxt() automatically generates
# an array from the txt file
arr = loadtxt('Data.txt', skiprows=1,
              dtype=int, delimiter=",",
              usecols=(0,1,2,4),
              comments="%")
# Save an array into a txt file
savetxt('filename', arr)

220

Arrays to/from Files
OTHER FILE FORMATS
Many file formats are supported in various packages:

File format   Package name(s)            Functions
txt           numpy                      loadtxt, savetxt, genfromtxt, fromfile, tofile
csv           csv                        reader, writer
Matlab        scipy.io                   loadmat, savemat
hdf           pytables, h5py
NetCDF        netCDF4, scipy.io.netcdf   netCDF4.Dataset, scipy.io.netcdf.netcdf_file

This includes many industry specific formats:

File format    Package name               Comments
wav            scipy.io.wavfile           Audio files
LAS/SEG-Y      Scipy cookbook, EPD        Data files in Geophysics
jpeg, png, …   PIL, scipy.misc.pilutil    Common image formats
FITS           pyfits, astropy.io.fits    Image files in Astronomy

221

Array Slicing
SLICING WORKS MUCH LIKE
STANDARD PYTHON SLICING

# The examples use a 6x6 array a
# where a[i,j] == 10*i + j:
#  0  1  2  3  4  5
# 10 11 12 13 14 15
# 20 21 22 23 24 25
# 30 31 32 33 34 35
# 40 41 42 43 44 45
# 50 51 52 53 54 55

>>> a[0,3:5]
array([3, 4])
>>> a[4:,4:]
array([[44, 45],
       [54, 55]])
>>> a[:,2]
array([ 2, 12, 22, 32, 42, 52])

STRIDES ARE ALSO POSSIBLE

>>> a[2::2,::2]
array([[20, 22, 24],
       [40, 42, 44]])
222

Slices Are References
Slices are references to memory in the original array.
Changing values in a slice also changes the original array.
>>> a = array((0,1,2,3,4))
# create a slice containing the
# third and fourth elements of a
>>> b = a[2:4]
>>> b
array([2, 3])
>>> b[0] = 10
# changing b changed a!
>>> a
array([ 0, 1, 10, 3, 4])
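If an independent array is needed rather than a view, copy the slice explicitly; a small sketch (not from the slides):

>>> a = array((0,1,2,3,4))
>>> b = a[2:4].copy()
# changing the copy leaves a alone
>>> b[0] = 10
>>> a
array([0, 1, 2, 3, 4])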
223

Fancy Indexing
INDEXING BY POSITION

INDEXING WITH BOOLEANS
# manual creation of masks
>>> mask = array([0,1,1,0,0,1,0,0],
...
dtype=bool)

>>> a = arange(0,80,10)

# fancy indexing
>>> indices = [1, 2, -3]
>>> y = a[indices]
>>> print y
[10 20 50]

# conditional creation of masks
>>> mask2 = a < 30
# fancy indexing
>>> y = a[mask]
>>> print y
[10 20 50]

# a is [ 0 10 20 30 40 50 60 70 ]; both forms of
# fancy indexing above pick out y = [10 20 50].

224

Fancy Indexing in 2-D
# a is the same 6x6 array used above (a[i,j] == 10*i + j).
>>> a[(0,1,2,3,4),(1,2,3,4,5)]
array([ 1, 12, 23, 34, 45])

>>> a[3:, [0, 2, 5]]
array([[30, 32, 35],
       [40, 42, 45],
       [50, 52, 55]])

>>> mask = array([1,0,1,0,0,1],
...              dtype=bool)
>>> a[mask, 2]
array([ 2, 22, 52])

Unlike slicing, fancy indexing creates copies instead of a view into original array.
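A quick, hedged demonstration of that difference, reusing the 1-D array from the previous slide:

>>> a = arange(0,80,10)
# fancy indexing returns a copy ...
>>> f = a[[1, 2]]
>>> f[0] = 999
>>> a
array([ 0, 10, 20, 30, 40, 50, 60, 70])
# ... while a slice is a view
>>> s = a[1:3]
>>> s[0] = 999
>>> a
array([  0, 999,  20,  30,  40,  50,  60,  70])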

225

“Incomplete” Indexing
>>> y = a[:3]
# y is a view of the first three rows of a.

>>> y = a[condition]
# With a boolean condition holding one entry per row,
# y contains only the rows of a where condition is True
# (in the figure, the rows of 10s, 20s and 40s).

227

3D Example
MULTIDIMENSIONAL

# retrieve two slices from a
# 3D cube via indexing
>>> y = a[:,:,[2,-2]]
# y holds plane 2 and plane -2 (second from the end)
# taken along the last axis of the cube a.
228

Where
1 DIMENSION
# find the indices in array
# where expression is True
>>> a = array([0, 12, 5, 20])
>>> a > 10
array([False, True, False,
True], dtype=bool)
# Note: it returns a tuple!
>>> where(a > 10)
(array([1, 3]),)

n DIMENSIONS
# In general, the tuple
# returned is the index of the
# element satisfying the
# condition in each dimension.
>>> a = array([[0, 12, 5, 20],
[1, 2, 11, 15]])
>>> loc = where(a > 10)
>>> loc
(array([0, 0, 1, 1]),
array([1, 3, 2, 3]))
# Result can be used in
# various ways:
>>> a[loc]
array([12, 20, 11, 15])
229

Array Data Structure

230

Array Data Structure

231

Indexing with newaxis
newaxis is a special index that inserts a new axis in the array at
the specified location.
Each newaxis increases the array’s dimensionality by 1.
# The examples use a = array([0, 1, 2]), shape (3,).

1X3
>>> shape(a)
(3,)
>>> y = a[newaxis, :]
>>> shape(y)
(1, 3)

3X1
>>> y = a[:, newaxis]
>>> shape(y)
(3, 1)

1X1X3
>>> y = a[newaxis, newaxis, :]
>>> shape(y)
(1, 1, 3)

232

“Flattening” Arrays
a.flatten()

a.flat

a.flatten() converts a multidimensional array into a 1-D array. The new
array is a copy of the original data.
# Create a 2D array
>>> a = array([[0,1],
[2,3]])

a.flat is an attribute that returns an
iterator object that accesses the data in the
multi-dimensional array data as a 1-D array.
It references the original memory.

# Flatten out elements to 1D
>>> b = a.flatten()
>>> b
array([0,1,2,3])
# Changing b does not change a
>>> b[0] = 10
>>> b
array([10,1,2,3])
no change
>>> a
array([[0, 1],
[2, 3]])

>>> a.flat
<numpy.flatiter object at ...>
>>> a.flat[:]
array([0, 1, 2, 3])
>>> b = a.flat
>>> b[0] = 10
changed!
>>> a
array([[10, 1],
[ 2, 3]])

233

“(Un)raveling” Arrays
a.ravel()
a.ravel() is the same as a.flatten(),
but returns a reference (or view) of the array
if possible (i.e., the memory is contiguous).
Otherwise the new array copies the data.
# create a 2-D array
>>> a = array([[0,1],
[2,3]])
# flatten out elements to 1-D
>>> b = a.ravel()
>>> b
array([0,1,2,3])

# changing b does change a
>>> b[0] = 10
>>> b
array([10,1,2,3])
changed!
>>> a
array([[10, 1],
[ 2, 3]])

a.ravel() MAKES A COPY
# create a 2-D array
>>> a = array([[0,1],
[2,3]])

# transpose array so memory
# layout is no longer contiguous
>>> aa = a.transpose()
>>> aa
array([[0, 2],
[1, 3]])
# ravel creates a copy of data
>>> b = aa.ravel()
>>> b
array([0, 2, 1, 3])
# changing b doesn't change a
>>> b[0] = 10
>>> b
array([10, 2, 1, 3])
>>> a
array([[0, 1],
       [2, 3]])

234

Reshaping Arrays
SHAPE
>>> a = arange(6)
>>> a
array([0, 1, 2, 3, 4, 5])
>>> a.shape
(6,)

# reshape array in-place to
# 2x3
>>> a.shape = (2,3)
>>> a
array([[0, 1, 2],
[3, 4, 5]])

RESHAPE
# return a new array with a
# different shape
>>> a.reshape(3,2)
array([[0, 1],
[2, 3],
[4, 5]])
# reshape cannot change the
# number of elements in an
# array
>>> a.reshape(4,2)
ValueError: total size of new
array must be unchanged

235

Transpose
TRANSPOSE
>>> a = array([[0,1,2],
...
[3,4,5]])
>>> a.shape
(2,3)
# Transpose swaps the order
# of axes. For 2-D this
# swaps rows and columns.
>>> a.transpose()
array([[0, 3],
[1, 4],
[2, 5]])
# The .T attribute is
# equivalent to transpose().
>>> a.T
array([[0, 3],
[1, 4],
[2, 5]])

TRANSPOSE RETURNS VIEWS
>>> b = a.T
# Changes to b alter a.
>>> b[0,1] = 30
>>> a
array([[ 0,  1,  2],
       [30,  4,  5]])

TRANSPOSE AND STRIDES

# Transpose does not move
# values around in memory. It
# only changes the order of
# "strides" in the array
>>> a.strides
(12, 4)
>>> a.T.strides
(4, 12)

236

Squeeze
SQUEEZE
>>> a = array([[1,2,3],
...            [4,5,6]])
>>> a.shape
(2,3)
# insert an "extra" dimension
>>> a.shape = (2,1,3)
>>> a
array([[[1, 2, 3]],
       [[4, 5, 6]]])
# squeeze removes any
# dimension with length==1
>>> a = a.squeeze()
>>> a.shape
(2,3)
237

Diagonals
DIAGONAL
>>> a = array([[11,21,31],
...
[12,22,32],
...
[13,23,33]])

DIAGONALS WITH INDEXING
# "Fancy" indexing also works.
>>> i = [0,1,2]
>>> a[i, i]
array([11, 22, 33])

# Extract the diagonal from
# an array.
>>> a.diagonal()
array([11, 22, 33])

# Indexing can also be used
# to set diagonal values…
>>> a[i, i] = 2
>>> i2 = array([0,1])
# upper diagonal
>>> a[i2, i2+1] = 1
# lower diagonal
>>> a[i2+1, i2]= -1
>>> a
array([[ 2, 1, 31],
[-1, 2, 1],
[13, -1, 2]])

# Use offset to move off the
# main diagonal (offset can
# be negative).
>>> a.diagonal(offset=1)
array([21, 32])

238

Complex Numbers
COMPLEX ARRAY ATTRIBUTES
>>> a = array([1+1j, 2, 3, 4])
>>> a
array([ 1.+1.j,  2.+0.j,  3.+0.j,  4.+0.j])
>>> a.dtype
dtype('complex128')
# real and imaginary parts
>>> a.real
array([ 1.,  2.,  3.,  4.])
>>> a.imag
array([ 1.,  0.,  0.,  0.])
# set imaginary part to a
# different set of values
>>> a.imag = (1,2,3,4)
>>> a
array([ 1.+1.j,  2.+2.j,  3.+3.j,  4.+4.j])

CONJUGATION
>>> a.conj()
array([ 1.-1.j,  2.-2.j,  3.-3.j,  4.-4.j])

FLOAT (AND OTHER) ARRAYS
>>> a = array([0., 1, 2, 3])
# .real and .imag attributes
# are available
>>> a.real
array([ 0.,  1.,  2.,  3.])
>>> a.imag
array([ 0.,  0.,  0.,  0.])
# but .imag is read-only
>>> a.imag = (1,2,3,4)
TypeError: array does not
have imaginary part to set

239

Array Constructor Examples
FLOATING POINT ARRAYS
# Default to double precision
>>> a = array([0,1.0,2,3])
>>> a.dtype
dtype('float64')
>>> a.nbytes
32
REDUCING PRECISION
>>> a = array([0,1.,2,3],
...
dtype=float32)
>>> a.dtype
dtype('float32')
>>> a.nbytes
16

UNSIGNED INTEGER BYTE
>>> a = array([0,1,2,3],
...
dtype=uint8)
>>> a.dtype
dtype('uint8')
>>> a.nbytes
4
ARRAY FROM BINARY DATA
# frombuffer or fromfile
# to create an array from
# binary data.
>>> a = frombuffer('foo',
...
dtype=uint8)
>>> a
array([102, 111, 111])
# Reverse operation
>>> a.tofile('foo.dat')

240

NumPy dtypes
Basic Type         Available NumPy types            Comments

Boolean            bool                             Elements are 1 byte in size.

Integer            int8, int16, int32, int64,       int defaults to the size of long in C
                   int128, int                      for the platform.

Unsigned Integer   uint8, uint16, uint32, uint64,   uint defaults to the size of unsigned
                   uint128, uint                    long in C for the platform.

Float              float16, float32, float64,       float is always a double precision floating
                   float, longfloat                 point value (64 bits). longfloat represents
                                                    large precision floats. Its size is platform
                                                    dependent.

Complex            complex64, complex128,           The real and imaginary elements of a
                   complex, longcomplex             complex64 are each represented by a single
                                                    precision (32 bit) value for a total size
                                                    of 64 bits.

Strings            str, unicode                     For example, dtype='S4' would be used for
                                                    an array of 4-character strings.

Object             object                           Represent items in array as Python objects.

Records            void                             Used for arbitrary data structures.

241

Type Casting
ASARRAY

ASTYPE

>>> a = array([1.5, -3],
...
dtype=float32)
>>> a
array([ 1.5, -3.], dtype=float32)

>>> a = array([1.5, -3],
...
dtype=float64)
>>> a.astype(float32)
array([ 1.5, -3.], dtype=float32)

# upcast
>>> asarray(a, dtype=float64)
array([ 1.5, -3. ])

>>> a.astype(uint8)
array([ 1, 253],dtype=uint8)

# downcast
>>> asarray(a, dtype=uint8)
array([ 1, 253], dtype=uint8)
# asarray is efficient.
# It does not make a copy if the
# type is the same.
>>> b = asarray(a, dtype=float32)
>>> b[0] = 2.0
>>> a
array([ 2., -3.],dtype=float32)

# astype is safe.
# It always returns a copy of
# the array.
>>> b = a.astype(float64)
>>> b[0] = 2.0
>>> a
array([1.5, -3.])
242

Array Calculation Methods
SUM FUNCTION
>>> a = array([[1,2,3],
[4,5,6]])
# sum() defaults to adding up
# all the values in an array.
>>> sum(a)
21
# supply the keyword axis to
# sum along the 0th axis
>>> sum(a, axis=0)
array([5, 7, 9])
# supply the keyword axis to
# sum along the last axis
>>> sum(a, axis=-1)
array([ 6, 15])

SUM ARRAY METHOD
# a.sum() defaults to adding
# up all values in an array.
>>> a.sum()
21
# supply an axis argument to
# sum along a specific axis
>>> a.sum(axis=0)
array([5, 7, 9])
PRODUCT
# product along columns
>>> a.prod(axis=0)
array([ 4, 10, 18])
# functional form
>>> prod(a, axis=0)
array([ 4, 10, 18])

243

Min/Max
MIN
>>> a = array([2.,3.,0.,1.])
>>> a.min(axis=0)
0.0
# Use NumPy's amin() instead
# of Python's built-in min()
# for speedy operations on
# multi-dimensional arrays.
>>> amin(a, axis=0)
0.0

MAX
>>> a = array([2.,3.,0.,1.])
>>> a.max(axis=0)
3.0

ARGMIN
# Find index of minimum value.
>>> a.argmin(axis=0)
2
# functional form
>>> argmin(a, axis=0)
2

ARGMAX
# Find index of maximum value.
>>> a.argmax(axis=0)
1
# functional form
>>> argmax(a, axis=0)
1

# functional form
>>> amax(a, axis=0)
3.0

244

Statistics Array Methods
MEAN
>>> a = array([[1,2,3],
[4,5,6]])
# mean value of each column
>>> a.mean(axis=0)
array([ 2.5, 3.5, 4.5])
>>> mean(a, axis=0)
array([ 2.5, 3.5, 4.5])
>>> average(a, axis=0)
array([ 2.5, 3.5, 4.5])

STANDARD DEV./VARIANCE
# Standard Deviation
>>> a.std(axis=0)
array([ 1.5, 1.5, 1.5])

# variance
>>> a.var(axis=0)
array([2.25, 2.25, 2.25])
>>> var(a, axis=0)
array([2.25, 2.25, 2.25])

# average can also calculate
# a weighted average
>>> average(a, weights=[1,2],
...
axis=0)
array([ 3., 4., 5.])
245

Other Array Methods
CLIP

ROUND

# Limit values to a range.

# Round values in an array.
# NumPy rounds to even, so
# 1.5 and 2.5 both round to 2.
>>> a = array([1.35, 2.5, 1.5])
>>> a.round()
array([ 1., 2., 2.])

>>> a = array([[1,2,3],
[4,5,6]])
# Set values < 3 equal to 3.
# Set values > 5 equal to 5.
>>> a.clip(3, 5)
array([[3, 3, 3],
[4, 5, 5]])

# Round to first decimal place.
>>> a.round(decimals=1)
array([ 1.4, 2.5, 1.5])

PEAK TO PEAK
# Calculate max – min for
# array along columns
>>> a.ptp(axis=0)
array([3, 3, 3])
# max – min for entire array.
>>> a.ptp(axis=None)
5

246

Summary of (most) array
attributes/methods (1/4)
BASIC ATTRIBUTES
a.dtype

Numerical type of array elements: float 32, uint8, etc.

a.shape

Shape of array (m, n, o, …)

a.size

Number of elements in entire array

a.itemsize

Number of bytes used by a single element in the array

a.nbytes

Number of bytes used by entire array (data only)

a.ndim

Number of dimensions in the array

SHAPE OPERATIONS
a.flat

An iterator to step through array as if it were 1D

a.flatten()

Returns a 1D copy of a multi-dimensional array

a.ravel()

Same as flatten(), but returns a "view" if possible

a.resize(new_size)

Changes the size/shape of an array in place

a.swapaxes(axis1, axis2)

Swaps the order of two axes in an array

a.transpose(*axes)

Swaps the order of any number of array axes

a.T

Shorthand for a.transpose()

a.squeeze()

Removes any length==1 dimensions from an array

247

Summary of (most) array
attributes/methods (2/4)
FILL AND COPY
a.copy()

Returns a copy of the array

a.fill(value)

Fills an array with a scalar value

CONVERSION/COERCION
a.tolist()

Converts array into nested lists of values

a.tostring()

Raw copy of array memory into a Python string

a.astype(dtype)

Returns array coerced to the given type

a.byteswap(False)

Converts byte order (big <->little endian)

a.view(type_or_dtype)

Creates a new ndarray that sees the same memory but interprets it as a
new datatype (or subclass of ndarray)

COMPLEX NUMBERS
a.real

Returns the real part of the array

a.imag

Returns the imaginary part of the array

a.conjugate()

Returns the complex conjugate of the array

a.conj()

Returns the complex conjugate of the array (same as conjugate)

248

Summary of (most) array
attributes/methods (3/4)
SAVING
a.dump(file)

Stores binary array data to file

a.dumps()

Returns a binary pickle of the data as a string

a.tofile(fid, sep="",
format="%s")

Formatted ASCII output to a file

SEARCH/SORT
a.nonzero()

Returns indices for all non-zero elements in the array

a.sort(axis=-1)

Sort the array elements in place, along axis

a.argsort(axis=-1)

Finds indices for sorted elements, along axis

a.searchsorted(b)

Finds indices where elements of b would be inserted in a to maintain order

ELEMENT MATH OPERATIONS
a.clip(low, high)

Limits values in the array to the specified range

a.round(decimals=0)

Rounds to the specified number of digits

a.cumsum(axis=None)

Cumulative sum of elements along axis

a.cumprod(axis=None)

Cumulative product of elements along axis

249

Summary of (most) array
attributes/methods (4/4)
REDUCTION METHODS

All the following methods “reduce” the size of the array by 1 dimension by
carrying out an operation along the specified axis. If axis is None, the
operation is carried out across the entire array.
a.sum(axis=None)

Sums values along axis

a.prod(axis=None)

Finds the product of all values along axis

a.min(axis=None)

Finds the minimum value along axis

a.max(axis=None)

Finds the maximum value along axis

a.argmin(axis=None)

Finds the index of the minimum value along axis

a.argmax(axis=None)

Finds the index of the maximum value along axis

a.ptp(axis=None)

Calculates a.max(axis) – a.min(axis)

a.mean(axis=None)

Finds the mean (average) value along axis

a.std(axis=None)

Finds the standard deviation along axis

a.var(axis=None)

Finds the variance along axis

a.any(axis=None)

True if any value along axis is non-zero (logical OR)

a.all(axis=None)

True if all values along axis are non-zero (logical AND)

250

Array Creation Functions
ARANGE
arange(start,stop=None,step=1,
dtype=None)
Nearly identical to Python’s range().
Creates an array of values in the range
[start,stop) with the specified step value.
Allows non-integer values for start, stop, and
step. Default dtype is derived from the start,
stop, and step values.

>>> arange(4)
array([0, 1, 2, 3])
>>> arange(0, 2*pi, pi/4)
array([ 0.000, 0.785, 1.571,
2.356, 3.142, 3.927, 4.712,
5.497])

ONES, ZEROS
ones(shape, dtype=float64)
zeros(shape, dtype=float64)
shape is a number or sequence specifying
the dimensions of the array. If dtype is not
specified, it defaults to float64.

>>> ones((2,3),dtype=float32)
array([[ 1., 1., 1.],
[ 1., 1., 1.]],
dtype=float32)
>>> zeros(3)
array([ 0., 0., 0.])

# Be careful…
>>> arange(1.5, 2.1, 0.3)
array([ 1.5, 1.8, 2.1])
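When an exact number of points is what matters, linspace() (described on a later slide) sidesteps this floating point endpoint surprise; a brief sketch:

>>> linspace(1.5, 2.1, 3)     # ask for the count directly
array([ 1.5,  1.8,  2.1])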

251

Array Creation Functions (cont.)
IDENTITY
# Generate an n by n identity
# array. The default dtype is
# float64.
>>> a = identity(4)
>>> a
array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]])
>>> a.dtype
dtype('float64')
>>> identity(4, dtype=int)
array([[ 1, 0, 0, 0],
[ 0, 1, 0, 0],
[ 0, 0, 1, 0],
[ 0, 0, 0, 1]])

EMPTY AND FILL

# empty(shape, dtype=float64,
#
order='C')
>>> a = empty(2)
>>> a
array([1.78021120e-306,
6.95357225e-308])
# fill array with 5.0
>>> a.fill(5.0)
array([5., 5.])
# alternative approach
# (slightly slower)
>>> a[:] = 4.0
array([4., 4.])
252

Array Creation Functions (cont.)
LINSPACE
# Generate N evenly spaced
# elements between (and
# including) start and
# stop values.
>>> linspace(0,1,5)
array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ])

ROW SHORTCUT
# r_ and c_ are "handy" tools
# (cough hacks…) for creating
# row and column arrays.
# used like arange
# -- real stride value
>>> r_[0:1:.25]
array([ 0.  ,  0.25,  0.5 ,  0.75])

LOGSPACE
# Generate N evenly spaced
# elements on a log scale
# between base**start and
# base**stop (default base=10).
>>> logspace(0,1,5)
array([ 1., 1.77, 3.16, 5.62,
10.])

# used like linspace
# -- complex stride value
>>> r_[0:1:5j]
array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ])
# concatenate elements
>>> r_[(1,2,3),0,0,(4,5)]
array([1, 2, 3, 0, 0, 4, 5])
253

Array Creation Functions (cont.)
MGRID

OGRID

# Get equally spaced points
# in N output arrays for an
# N-dimensional (mesh) grid.


>>> x,y = mgrid[0:5,0:5]
>>> x
array([[0, 0, 0, 0, 0],
[1, 1, 1, 1, 1],
[2, 2, 2, 2, 2],
[3, 3, 3, 3, 3],
[4, 4, 4, 4, 4]])
>>> y
array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]])

# Construct an "open" grid
# of points (not filled in
# but correctly shaped for
# math operations to be
# broadcast correctly).

>>> x,y = ogrid[0:3,0:3]
>>> x
array([[0],
       [1],
       [2]])
>>> y
array([[0, 1, 2]])
>>> print x+y
[[0 1 2]
 [1 2 3]
 [2 3 4]]

254

Matrix Objects
MATRIX CREATION

BLOCK MATRICES

# Matlab-like creation from string
>>> A = mat('1,2,4;2,5,3;7,8,9')
>>> print A
Matrix([[1, 2, 4],
[2, 5, 3],
[7, 8, 9]])

# create a matrix from
# sub-matrices
>>> a = array([[1,2],
[3,4]])
>>> b = array([[10,20],
[30,40]])

# matrix exponents
>>> print A**4
Matrix([[ 6497, 9580, 9836],
[ 7138, 10561, 10818],
[18434, 27220, 27945]])

>>> bmat('a,b;b,a')
matrix([[ 1, 2, 10, 20],
[ 3, 4, 30, 40],
[10, 20, 1, 2],
[30, 40, 3, 4]])

# matrix multiplication
>>> print A*A.I
Matrix([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
255

Trig and Other Functions
TRIGONOMETRIC

sin(x)
cos(x)
arccos(x)
arctan(x)
arcsin(x)
arctan2(x,y)

OTHERS

sinh(x)
cosh(x)
arccosh(x)
arctanh(x)
arcsinh(x)

VECTOR OPERATIONS

dot(x,y)
vdot(x,y)
inner(x,y)
outer(x,y)
cross(x,y)
kron(x,y)
tensordot(x,y[,axis])

exp(x)
log10(x)
absolute(x)
negative(x)
floor(x)
hypot(x,y)
maximum(x,y)

log(x)
sqrt(x)
conjugate(x)
ceil(x)
fabs(x)
fmod(x,y)
minimum(x,y)

hypot(x,y)
Element by element distance
calculation using sqrt(x**2 + y**2)

256

More Basic Functions
TYPE HANDLING
iscomplexobj, iscomplex, isrealobj, isreal, imag, real, real_if_close,
isscalar, isneginf, isposinf, isinf, isfinite, isnan, nan_to_num,
common_type, typename

OTHER USEFUL FUNCTIONS
fix, mod, amax, amin, ptp, sum, cumsum, prod, cumprod, diff, angle,
unwrap, sort_complex, trim_zeros, fliplr, flipud, rot90, eye, diag,
select, extract, insert, roots, poly, any, all, disp, unique, nansum,
nanmax, nanargmax, nanargmin, nanmin

SHAPE MANIPULATION
atleast_1d, atleast_2d, atleast_3d, expand_dims, apply_over_axes,
apply_along_axis, hstack, vstack, dstack, column_stack, hsplit,
vsplit, dsplit, split, squeeze

257

Vectorizing Functions
SCALAR SINC FUNCTION
# special.sinc already available
# This is just for show.
def sinc(x):
    if x == 0.0:
        return 1.0
    else:
        w = pi*x
        return sin(w) / w

# attempt
>>> x = array((1.3, 1.5))
>>> sinc(x)
ValueError: The truth value of
an array with more than one
element is ambiguous. Use
a.any() or a.all()

SOLUTION
>>> from numpy import vectorize
>>> vsinc = vectorize(sinc)
>>> vsinc(x)
array([-0.1981, -0.2122])
>>> x2 = linspace(-5, 5, 101)
>>> plot(x2, vsinc(x2))

258

Mathematical Binary Operators
a + b  →  add(a,b)
a - b  →  subtract(a,b)
a % b  →  remainder(a,b)

MULTIPLY BY A SCALAR
>>> a = array((1,2))
>>> a*3.
array([3., 6.])
ELEMENT BY ELEMENT
ADDITION

>>> a = array([1,2])
>>> b = array([3,4])
>>> a + b
array([4, 6])

a * b   →  multiply(a,b)
a / b   →  divide(a,b)
a ** b  →  power(a,b)

ADDITION USING AN OPERATOR
FUNCTION
>>> add(a,b)
array([4, 6])

IN-PLACE OPERATION
# Overwrite contents of a.
# Saves array creation
# overhead.
>>> add(a,b,a) # a += b
array([4, 6])
>>> a
array([4, 6])

259

Comparison and Logical Operators
equal         (==)    not_equal   (!=)    greater      (>)
greater_equal (>=)    less        (<)     less_equal   (<=)
logical_and           logical_or          logical_xor
logical_not

2-D EXAMPLE
>>> a = array(((1,2,3,4),(2,3,4,5)))
>>> b = array(((1,2,5,4),(1,3,4,5)))
>>> a == b
array([[True, True, False, True],
       [False, True, True, True]])

# functional equivalent
>>> equal(a,b)
array([[True, True, False, True],
       [False, True, True, True]])

Be careful with if statements
involving numpy arrays. To
test for equality of arrays,
don't do:

if a == b:
Rather, do:
if all(a==b):
For floating point,

if allclose(a,b):
is even better.
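For example (a small sketch, not from the slides):

>>> a = array([1.0, 2.0, 3.0])
>>> b = a + 1e-12
>>> all(a == b)
False
>>> allclose(a, b)
True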
260

Bitwise Operators
bitwise_and (&)    invert      (~)    right_shift (>>)
bitwise_or  (|)    bitwise_xor (^)    left_shift  (<<)

BITWISE EXAMPLES
>>> a = array((1,2,4,8))
>>> b = array((16,32,64,128))
>>> bitwise_or(a,b)
array([ 17, 34, 68, 136])
# bit inversion
>>> a = array((1,2,3,4), uint8)
>>> invert(a)
array([254, 253, 252, 251], dtype=uint8)

When possible, recasting
operations as bitwise
operations is another way
to speed up computations.

# left shift operation
>>> left_shift(a,3)
array([ 8, 16, 24, 32], dtype=uint8)
261

Bitwise and Comparison Together
PRECEDENCE ISSUES
# When combining comparisons with bitwise operations,
# precedence requires parentheses around the comparisons.
>>> a = array([1,2,4,8])
>>> b = array([16,32,64,128])
>>> (a > 3) & (b < 100)
array([ False, False, True, False])

LOGICAL AND ISSUES
# Note that logical AND isn't supported for arrays without
# calling the logical_and function.
>>> a>3 and b<100
Traceback (most recent call last):
ValueError: The truth value of an array with more than one
element is ambiguous. Use a.any() or a.all()

# Also, you cannot currently use the "short version" of
# chained comparison with NumPy arrays.
>>> 2 < a < 4
ValueError: The truth value of an array with more than one
element is ambiguous. Use a.any() or a.all()

op.reduce()
ADD EXAMPLE
>>> a = array([1,2,3,4])
>>> add.reduce(a)
10
STRING LIST EXAMPLE
>>> a = array(['ab','cd','ef'],
...
dtype=object)
>>> add.reduce(a)
'abcdef'

LOGICAL OP EXAMPLES
>>> a = array([1,1,0,1])
>>> logical_and.reduce(a)
False
>>> logical_or.reduce(a)
True
264

op.reduce()
For multidimensional arrays, op.reduce(a, axis) applies op to the elements of
a along the specified axis. The resulting array has dimensionality one less than a.
The default value for axis is 0.

# The examples use the 4x3 array
# a = [[ 0,  1,  2],
#      [10, 11, 12],
#      [20, 21, 22],
#      [30, 31, 32]]

SUM COLUMNS BY DEFAULT
>>> add.reduce(a)
array([60, 64, 68])

SUMMING UP EACH ROW
>>> add.reduce(a,1)
array([ 3, 33, 63, 93])
265

op.accumulate()
op.accumulate(a) creates a

ADD EXAMPLE

new array containing the intermediate
results of the reduce operation at

>>> a = array([1,2,3,4])
>>> add.accumulate(a)
array([ 1, 3, 6, 10])

each element in a.

For example:

y = add.accumulate(a)
  = [ a[0], a[0]+a[1], ..., a[0]+a[1]+...+a[N-1] ]

STRING LIST EXAMPLE
>>> a = array(['ab','cd','ef'],
...
dtype=object)
>>> add.accumulate(a)
array([ab,abcd,abcdef],
dtype=object)
LOGICAL OP EXAMPLES
>>> a = array([1,1,0])
>>> logical_and.accumulate(a)
array([True, True, False])
>>> logical_or.accumulate(a)
array([True, True, True])

266

op.reduceat()
op.reduceat(a, indices) applies op to ranges in the 1-D array a
defined by the values in indices. The resulting array has the same
length as indices.

For example:
y = add.reduceat(a, indices)
y[i] = a[indices[i]] + a[indices[i]+1] + ... + a[indices[i+1]-1]

EXAMPLE
>>> a = array([0,10,20,30,40,50])
>>> indices = array([1,4])
>>> add.reduceat(a,indices)
array([60, 90])
# 60 == 10+20+30 (a[1:4]); 90 == 40+50 (a[4:]).

For multidimensional arrays,
reduceat() is always applied along the
last axis (sum of rows for 2-D arrays).
This is different from the default for
reduce() and accumulate().
267

op.outer()
op.outer(a,b) forms all possible combinations of elements between
a and b using op. The shape of the resulting array results from
concatenating the shapes of a and b. (Order matters.)
a

a[0] a[1] a[2] a[3]

b

b[0] b[1] b[2]

>>> add.outer(a,b)            # shape (4, 3)
a[0]+b[0]  a[0]+b[1]  a[0]+b[2]
a[1]+b[0]  a[1]+b[1]  a[1]+b[2]
a[2]+b[0]  a[2]+b[1]  a[2]+b[2]
a[3]+b[0]  a[3]+b[1]  a[3]+b[2]

>>> add.outer(b,a)            # shape (3, 4)
b[0]+a[0]  b[0]+a[1]  b[0]+a[2]  b[0]+a[3]
b[1]+a[0]  b[1]+a[1]  b[1]+a[2]  b[1]+a[3]
b[2]+a[0]  b[2]+a[1]  b[2]+a[2]  b[2]+a[3]
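A concrete version of the table above, with small illustrative arrays (not from the slides):

>>> a = array([0, 1, 2, 3])
>>> b = array([10, 20, 30])
>>> add.outer(a, b)
array([[10, 20, 30],
       [11, 21, 31],
       [12, 22, 32],
       [13, 23, 33]])
>>> add.outer(b, a).shape     # order matters
(3, 4)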

268

Array Functions – choose()
>>> y = choose(choice_array,(c0,c1,c2,c3))
c0
0 1 2
3 4 5
6 7 8

c1
5 5 5
5 5 5
5 5 5

c2
2 2 2
2 2 2
2 2 2

c3
9

choice_array
0 0 0
1 2 3
0 1 3

choose()

y

0 1 2
5 2 9
6 5 9

269

Example - choose()
CLIP LOWER VALUES TO 10

CLIP LOWER AND UPPER
VALUES

>>> a
array([[ 0, 1, 2],
[10, 11, 12],
[20, 21, 22]])

>>> a < 10
array([[ True,  True,  True],
       [False, False, False],
       [False, False, False]], dtype=bool)
>>> choose(a<10,(a,10))
array([[10, 10, 10],
[10, 11, 12],
[20, 21, 22]])

>>> lt = a < 10
>>> gt = a > 15
>>> choice = lt + 2 * gt
>>> choice
array([[1, 1, 1],
[0, 0, 0],
[2, 2, 2]])
>>> choose(choice,(a,10,15))
array([[10, 10, 10],
[10, 11, 12],
[15, 15, 15]])
270

Array Functions – concatenate()
concatenate((a0,a1,…,aN),axis=0)
The input arrays (a0,a1,…,aN) are concatenated along the given
axis. They must have the same shape along every axis except the
one given.
x                 y
 0  1  2          50 51 52
10 11 12          60 61 62

>>> concatenate((x,y))        # along axis 0  ->  shape (4, 3)
 0  1  2
10 11 12
50 51 52
60 61 62

>>> concatenate((x,y),1)      # along axis 1  ->  shape (2, 6)
 0  1  2 50 51 52
10 11 12 60 61 62

>>> array((x,y))              # new leading axis  ->  shape (2, 2, 3)
 0  1  2      50 51 52
10 11 12      60 61 62

See also vstack(), hstack() and dstack() respectively.
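Using the same x and y, the stacking helpers give the equivalent results; a short sketch:

>>> vstack((x, y)).shape      # like concatenate((x,y), axis=0)
(4, 3)
>>> hstack((x, y)).shape      # like concatenate((x,y), axis=1)
(2, 6)
>>> dstack((x, y)).shape      # stacks along a new third axis
(2, 3, 2)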

271

Array Broadcasting
NumPy arrays of different dimensionality can be combined in the same
expression. Arrays with smaller dimension are broadcasted to match the
larger arrays, without copying data. Broadcasting has two rules.
RULE 1: PREPEND ONES TO SMALLER ARRAYS’ SHAPE
In [3]: a = ones((3, 5)) # a.shape == (3, 5)
In [4]: b = ones((5,)) # b.shape == (5,)
In [5]: b.reshape(1, 5) # result is a (1,5)-shaped array.
In [6]: b[newaxis, :] # equivalent, more concise.
RULE 2: DIMENSIONS OF SIZE 1 ARE REPEATED WITHOUT COPYING
In [7]: c = a + b # c.shape == (3, 5)
is logically equivalent to...
In [8]: tmp_b = b.reshape(1, 5)
In [9]: tmp_b_repeat = tmp_b.repeat(3, axis=0)
In [10]: c = a + tmp_b_repeat
# But broadcasting makes no copies of “b”s data!
272

Array Broadcasting
(Figure: a 4x3 array combined with a length-3 row is "stretched" down the
rows to 4x3; a 4x1 column combined with a length-3 row is stretched along
both dimensions to give a 4x3 result.)

273

Broadcasting Rules
The trailing axes of both arrays must either be 1 or have the same
size for broadcasting to occur. Otherwise, a
"ValueError: shape mismatch: objects cannot be
broadcast to a single shape" exception is thrown.
mismatch!
# A (4,3) array cannot be broadcast against a (4,) array:
# the trailing axes are 3 and 4.
#
#   [[ 0,  0,  0],
#    [10, 10, 10],   +   [0, 1, 2, 3]   =   ValueError
#    [20, 20, 20],
#    [30, 30, 30]]
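One common fix for this mismatch is to turn the 4-element vector into a (4,1) column with newaxis so the trailing axes line up; a hedged sketch (this is essentially the pattern the next slide demonstrates):

>>> a = array([[ 0,  0,  0],
...            [10, 10, 10],
...            [20, 20, 20],
...            [30, 30, 30]])
>>> b = array([0, 1, 2, 3])
>>> a + b[:, newaxis]         # (4,3) + (4,1) broadcasts to (4,3)
array([[ 0,  0,  0],
       [11, 11, 11],
       [22, 22, 22],
       [33, 33, 33]])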

274

Broadcasting in Action
>>> a = array((0,10,20,30))
>>> b = array((0,1,2))
>>> y = a[:, newaxis] + b

# a[:, newaxis] is the column [0, 10, 20, 30];
# adding b = [0, 1, 2] broadcasts to:
# [[ 0,  1,  2],
#  [10, 11, 12],
#  [20, 21, 22],
#  [30, 31, 32]]

275

Application: distance from center
In [1]: a = linspace(0, 1, 15) - 0.5
In [2]: b = a[:, newaxis]        # b.shape == (15, 1)
In [3]: dist2 = a**2 + b**2      # broadcasting sum.
In [4]: dist = sqrt(dist2)
In [5]: imshow(dist); colorbar()

276

Broadcasting’s usefulness
Broadcasting can often be used to replace needless data replication inside a
NumPy array expression.
np.meshgrid() – use newaxis appropriately in broadcasting expressions.
np.repeat() – broadcasting makes repeating an array along a dimension of
size 1 unnecessary.
MESHGRID: COPIES DATA
In [3]: x, y = \
meshgrid([1,2],[3,4,5])
In [4]: z = x + y

BROADCASTING: NO COPIES
In [5]: x = array([1,2])
In [6]: y = array([3,4,5])
In [7]: z = \
x[newaxis,:] + y[:,newaxis]

277

Vector Quantization Example
(Figure: five codes (0-4) in the Feature 1 / Feature 2 plane, with Target 1
and Target 2 marked.)
278

Vector Quantization Example
(Figure: each target is assigned to the code (0-4) at minimum distance in
the Feature 1 / Feature 2 plane.)
279

Vector Quantization Example

(Figure: a cloud of observations scattered around the five codes (0-4) in
the Feature 1 / Feature 2 plane.)
280

Vector Quantization Example
Observations ( obs ): rows o0 ... o9, columns x y z   -- shape (10, 3)
Code Book ( book ):   rows c0 ... c4, columns x y z   -- shape (5, 3)

# obs[newaxis,:,:] has shape (1,10,3) and book[:,newaxis,:]
# has shape (5,1,3); the difference broadcasts to (5,10,3).
diff = obs[newaxis,:,:] - book[:,newaxis,:]

# distance[i,j] is the distance from code c_i to observation o_j.
distance = sqrt(sum(diff**2, axis=-1))

# index of the closest code for each of o0 ... o9.
code_index = argmin(distance, axis=0)
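A minimal end-to-end sketch of the same computation with made-up data (sizes and values are illustrative only):

from numpy import array, newaxis, sqrt, sum, argmin

obs = array([[0.0, 0.0, 0.0],
             [1.0, 1.0, 0.0],
             [5.0, 5.0, 5.0]])     # 3 observations
book = array([[0.0, 0.0, 0.0],
              [5.0, 5.0, 5.0]])    # 2 codes

diff = obs[newaxis, :, :] - book[:, newaxis, :]   # shape (2, 3, 3)
distance = sqrt(sum(diff**2, axis=-1))            # shape (2, 3)
code_index = argmin(distance, axis=0)
print code_index     # [0 0 1], the nearest code for each observation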

281

VQ Speed Comparisons
Method                 Run Time (sec)    Speed Up
Matlab 5.3                  1.611           -
Python VQ1, double          2.245          0.71
Python VQ1, float           1.138          1.42
Python VQ2, double          1.637          0.98
Python VQ2, float           0.954          1.69
C, double                   0.066         24.40
C, float                    0.064         24.40

• 4000 observations with 16 features categorized into 40 codes
  on a Pentium III 500 MHz.
• VQ1 uses the technique described on the previous slide verbatim.
• VQ2 applies broadcasting on an observation by observation basis.
  This turned out to be much more efficient because it is less memory intensive. 282

Broadcasting Indices
Broadcasting can also be used to slice elements from different
“depths” in a 3-D (or any other shape) array. This is a very powerful
feature of indexing.
>>> xi,yi = ogrid[:3,:3]
>>> zi = array([[0, 1, 2],
[1, 1, 2],
[2, 2, 2]])
>>> horizon = data_cube[xi,yi,zi]

(Figure: the broadcast index arrays on the left and the data they select
from the cube on the right.)
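data_cube itself comes from the course demo data; a self-contained sketch with a small made-up cube shows the mechanics:

>>> data_cube = arange(3*3*4).reshape(3, 3, 4)   # stand-in data
>>> xi, yi = ogrid[:3, :3]
>>> zi = array([[0, 1, 2],
...             [1, 1, 2],
...             [2, 2, 2]])
>>> horizon = data_cube[xi, yi, zi]
>>> horizon.shape
(3, 3)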

283

“Structured” Arrays
# "Data structure" (dtype) that describes the fields and
# type of the items in each array element.
>>> particle_dtype = dtype([('mass','float32'), ('velocity', 'float32')])
# This must be a list of tuples.
>>> particles = array([(1,1), (1,2), (2,1), (1,3)],
dtype=particle_dtype)
>>> print particles
[(1.0, 1.0) (1.0, 2.0) (2.0, 1.0) (1.0, 3.0)]
# Retrieve the mass for all particles through indexing.
>>> print particles['mass']
[ 1. 1. 2. 1.]
# Retrieve particle 0 through indexing.
>>> particles[0]
(1.0, 1.0)
# Sort particles in place, with velocity as the primary field and
# mass as the secondary field.
>>> particles.sort(order=('velocity','mass'))
>>> print particles
[(1.0, 1.0) (2.0, 1.0) (1.0, 2.0) (1.0, 3.0)]
# See demo/multitype_array/particle.py.
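Field access returns ordinary arrays, so the usual reductions and masks apply; a short follow-on sketch using the particles array above:

>>> particles['mass'].mean()
1.25
>>> particles['velocity'].max()
3.0
# boolean masks work on fields too
>>> particles[particles['mass'] > 1]
array([(2.0, 1.0)],
      dtype=[('mass', '<f4'), ('velocity', '<f4')])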
284

“Structured” Arrays
Elements of an array can be
any fixed-size data structure!
name char[10]
age int
weight double
Brad  33 135.0 | Jane   25 105.0 | John     47 225.0 | Fred 54 140.0
Henry 29 154.0 | George 61 202.0 | Brian    32 137.0 | Amy  27 187.0
Ron   19 188.0 | Susan  33 135.0 | Jennifer 18  88.0 | Jill 54 145.0

EXAMPLE
>>> from numpy import dtype, empty
# structured data format
>>> fmt = dtype([('name', 'S10'),
('age', int),
('weight', float)
])
>>> a = empty((3,4), dtype=fmt)
>>> a.itemsize
22
>>> a['name'] = [['Brad', … ,'Jill']]
>>> a['age'] = [[33, … , 54]]
>>> a['weight'] = [[135, … , 145]]
>>> print a
[[('Brad', 33, 135.0)
…
('Jill', 54, 145.0)]]

285

Nested Datatype
nested.dat

286

Nested Datatype (cont’d)
The data file can be extracted with the following code:
>>> dt = dtype([('time', uint64),
...             ('size', uint32),
...             ('position', [('az', float32),
...                           ('el', float32),
...                           ('region_type', uint8),
...                           ('region_ID', uint16)]),
...             ('gain', uint8),
...             ('samples', int16, 2048)])
>>> data = loadtxt('nested.dat', dtype=dt, skiprows=2)
>>> data['position']['az']
array([ 0.71559399,  0.70687598,  0.69815701,  0.68942302,
        0.68068302, ...], dtype=float32)

287

Memory Mapped Arrays
• Methods for Creating:
– memmap: subclass of ndarray that manages the
memory mapping details.
– frombuffer: Create an array from a memory mapped
buffer object.
– ndarray constructor: Use the buffer keyword to
pass in a memory mapped buffer.

• Limitations:
– Files must be < 2GB on Python 2.4 and before.
– Files must be < 2GB on 32-bit machines.
– Python 2.5 and higher on 64 bit machines is
theoretically "limited" to 17.2 billion GB (17 Exabytes).
288

Memory Mapped Example
# Create a "memory mapped" array where
# the array data is stored in a file on
# disk instead of in main memory.
>>> from numpy import memmap
>>> image = memmap('some_file.dat',
dtype=uint16,
mode='r+',
shape=(5,5),
offset=header_size)
# Standard array methods work.
>>> mean_value = image.mean()
# Standard math operations work.
# The resulting scaled_image *is*
# stored in main memory. It is a
# standard numpy array.
>>> scaled_image = image * .5

image:
2D NumPy array
shape: 5, 5
dtype: uint16

some_file.dat
110111… 0110000001 0010010111011000 1101001001000100 1111010101000010 0010111000101011 00011110101011… 289 memmap The memmap subclass of array handles opening and closing files as well as synchronizing memory with the underlying file system. memmap(filename, dtype=uint8, mode='r+', offset=0, shape=None, order=0) filename Name of the underlying file. For all modes, except for 'w+', the file must already exist and contain at least the number of bytes used by the array. dtype The numpy data type used for the array. This can be a "structured" dtype as well as the standard simple data types. offset Byte offset within the file to the memory used as data within the array. mode shape Tuple specifying the dimensions and size of each dimension in the array. shape=(5,10) would create a 2D array with 5 rows and 10 columns. order 'C' for row major memory ordering (standard in the C programming language) and 'F' for column major memory ordering (standard in Fortran). 290 memmap -- mode The mode setting for memmap arrays is used to set the access flag when opening the specified file using the standard mmap module. memmap(filename, dtype=uint8, mode='r+', offset=0, shape=None, order=0) mode A string indicating how the underlying file should be opened. 'r' or 'readonly': Open an existing file as an array for reading. 'c' or 'copyonwrite': "Copy on write" arrays are "writable" as Python arrays, but they never modify the underlying file. 'r+' or 'readwrite': Create a read/write array from an existing file. The file will have "write through" behavior where changes to the array are written to the underlying file. Use the flush() method to ensure the array is synchronized with the file. 'w+' or 'write': Create the file or overwrite if it exists. The array is filled with zeros and has "write through" behavior similar to 'r+'. 291 memmap -- write through behavior # Create a memory mapped "write through" file, overwriting it if it exists. In [66]: q=memmap('new_file.dat',mode='w+',shape=(2,5)) In [67]: q memmap([[0, 0, 0, 0, 0], [0, 0, 0, 0, 0]], dtype=uint8) # Print out the contents of the underlying file. Note: It # doesn't print because 0 isn't a printable ascii character. In [68]: !cat new_file.dat # Now write the ascii value for 'A' (65) into our array. In [69]: q[:] = ord('A') In [70]: q memmap([[65, 65, 65, 65, 65], [65, 65, 65, 65, 65]], dtype=uint8) # Ensure the OS has written the data to the file, and examine # the underlying file. It is full of 'A's as we hope. In [71]: q.flush() In [72]: !cat new_file.dat AAAAAAAAAA 292 memmap -- copy on write behavior # Create a copy-on-write memory map where the underlying file is never # modified. The file must already exist. # This is a memory efficient way of working with data on disk as arrays but # ensuring you never modify it. In [73]: q=memmap('new_file.dat',mode='c',shape=(2,5)) In [74]: q memmap([[65, 65, 65, 65, 65], [65, 65, 65, 65, 65]], dtype=uint8) # Set values in array to something new. In [75]: q[1] = ord('B') In [76]: q memmap([[65, 65, 65, 65, 65], [66, 66, 66, 66, 66]], dtype=uint8) # Even after calling flush(), the underlying file is not updated. In [77]: q.flush() In [78]: !cat new_file.dat 293 AAAAAAAAAA Using Offsets # Create a memory mapped array with 10 elements. 
In [1]: q=memmap('new_file.dat',mode='w+', dtype=uint8, shape=(10,)) In [2]: q[:] = arange(0,100,10) memmap([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=uint8) new_file.dat 0 10 20 30 40 50 60 70 80 90 # Now, create a new memory mapped array (read only) with an offset into # the previously created file. In [3]: q=memmap('new_file.dat',mode='r', dtype=uint8, shape=6, offset=4) In [4]: q memmap([40, 50, 60, 70, 80, 90], dtype=uint8) new_file.dat 0 10 20 30 40 50 60 70 80 90 # The number of bytes required by the array must be equal or less than # the number of bytes available in the file. In [3]: q=memmap('new_file.dat',mode='r', dtype=uint8, shape=7, offset=4) ValueError: mmap length is greater than file size 294 Working with file headers File Format: header data rows (int32) cols (int32) 64 bit floating point data… # Create a dtype to represent the header. header_dtype = dtype([('rows', int32), ('cols', int32)]) # Create a memory mapped array using this dtype. Note the shape is empty. header = memmap(file_name, mode='r', dtype=header_dtype, shape=()) # Read the row and column sizes from using this structured array. rows = header['rows'] cols = header['cols'] # Create a memory map to the data segment, using rows, cols for shape # information and the header size to determine the correct offset. data = memmap(file_name, mode='r+', dtype=float64, shape=(rows, cols), offset=header_dtype.itemsize) 295 memory maps with ndarray File Format: header data rows (int32) cols (int32) 64 bit floating point data… # mmap is a standard Python module for working with memory maps. import mmap import numpy # Create a dtype to represent the header. header_dtype = numpy.dtype([('rows', int32), ('cols', int32)]) # Open a file for read/write access in binary mode. file = open(file_name,'r+b') # Create a read-only memory map from the opened file with the # correct size to read the header of the file. mm = mmap.mmap(file.fileno(), header_dtype.itemsize, access=mmap.ACCESS_READ) < continued > 296 memory maps with ndarray header data File Format: rows (int32) cols (int32) 64 bit floating point data… # Create a new array using the ndarray constructor. # The first argument is the shape, and we pass in the data type and the # memory buffer to use (mm) as keyword arguments. header = numpy.ndarray((), dtype=header_dtype, buffer=mm) rows = header['rows'] cols = header['cols'] # Create a writable memory map to use for the data array. The size of the # memory map in bytes is the size of a float64 (8) * rows * columns. mm = mmap.mmap(file.fileno(), 8*rows*cols, access=mmap.ACCESS_WRITE) # Create our data array using this new memory map. Start the arrays # data at the memory location directly after the header using offset. data = numpy.ndarray((rows, cols), dtype=float64, buffer=mm, 297 offset=header_dtype.itemsize) Structured Arrays int64 float32 Time Value Name MSFT_profit 10 6.20 GOOG_profit 12 -1.08 MSFT_profit 18 8.40 INTC_profit 25 -0.20 GOOG_profit 1000325 3.20 GOOG_profit 1000350 4.50 INTC_profit 1000385 -1.05 MSFT_profit 1000390 5.60 Elements of array can be any fixed-size data structure! 
Example >>> import numpy as np >>> fmt = np.dtype([('name', 'S12'), ('time', np.int64), ('value', np.float32)]) >>> vals = [('MSFT_profit', 10, 6.20), ('GOOG_profit', 12, -1.08), … char[12] ('INTC_profit', 1000385, -1.05) ('MSFT_profit', 1000390, 5.60)] >>> arr = np.array(vals, dtype=fmt) # or >>> arr = np.fromfile('db.dat', dtype=fmt) # or >>> arr = np.memmap('db.dat', dtype=fmt, mode='c') Disk MSFT_profit 10 6.20 GOOG_profit 12 -1.08 INTC_profit 1000385 -1.05 MSFT_profit 1000390 5.60 298 Memmap Timings (3D arrays) Operations (500x500x1000) y 500 Memory Mapped In Memory Memory Mapped 2103 ms 11.0 ms 3505.00 27.00 x slice 1.8 ms 4.8 ms 1.80 8.30 y slice 2.8 ms 4.6 ms 4.40 7.40 z slice 9.2 ms 13.8 ms 10.40 18.70 0.02 ms 125 ms 0.02 198.70 50 read 1000 500 MB file OS X In Memory 0 x z Linux (16 bit) downsample 4x4 All times in millesconds (ms). Linux: Ubuntu 4.10, Dell Precision 690, Dual Quad Core Zeon X5355 2.6 GHz, 8 GB Memory OS X: OS X 10.5, MacBook Pro Laptop, 2.6 GHz Core Duo, 4 GB Memory 299 Parallel FFT On Memory Mapped File 4 3 P P 1 P 2 x z P y 500 MB Data File Processors Time (seconds) Speed Up 1 11.75 1.0 2 6.06 1.9 4 3.36 3.5 8 2.50 4.7 300 Controlling Output Format set_printoptions(precision=None, threshold=None, edgeitems=None, linewidth=None, suppress=None) precision The number of digits of precision to use for floating point output. The default is 8. threshold Array length where NumPy starts truncating the output and prints only the beginning and end of the array. The default is 1000. edgeitems Number of array elements to print at beginning and end of array when threshold is exceeded. The default is 3. linewidth Characters to print per line of output. The default is 75. suppress Indicates whether NumPy suppresses printing small floating point values in scientific notation. The default is False. 301 Controlling Output Formats PRECISION >>> a = arange(1e6) >>> a array([ 0.00000000e+00, 1.00000000e+00, 2.00000000e+00, ..., 9.99997000e+05, 9.99998000e+05, 9.99999000e+05]) >>> set_printoptions(precision=3) >>> a array([ 0.000e+00, 1.000e+00, 2.000e+00, ..., 1.000e+06, 1.000e+06, 1.000e+06]) SUPPRESSING SMALL NUMBERS >>> set_printoptions(precision=8) >>> a = array((1, 2, 3, 1e-15)) >>> a array([ 1.00000000e+00, 2.00000000e+00, 1.00000000e-15]) >>> set_printoptions(suppress=True) >>> a array([ 1., 2., 3., 0.]) 3.00000000e+00, 302 Controlling Error Handling seterr(all=None, divide=None, over=None, under=None, invalid=None) Set the error handling flags in ufunc operations on a per thread basis. Each of the keyword arguments can be set to ‘ignore’, ‘warn’, ‘print’, ‘log’, ‘raise’, or ‘call’. all All error types to the specified value divide Divide-by-zero errors over Overflow errors under Underflow errors invalid Invalid floating point errors 303 Controlling Error Handling >>> a = array((1,2,3)) >>> a/0. Warning: divide by zero encountered in divide array([ 1.#INF0000e+000, 1.#INF0000e+000, 1.#INF0000e+000]) # Ignore division-by-zero. Also, save old values so that # we can restore them. >>> old_err = seterr(divide='ignore') >>> a/0. array([ 1.#INF0000e+000, 1.#INF0000e+000, 1.#INF0000e+000]) # Restore original error handling mode. >>> old_err {'divide': 'print', 'invalid': 'print', 'over': 'print', 'under': 'ignore'} >>> seterr(**old_err) >>> a/0. 
Warning: divide by zero encountered in divide array([ 1.#INF0000e+000, 1.#INF0000e+000, 1.#INF0000e+000]) 304 Interactive 3D Visualization with Mayavi-mlab 305 3D Visualization with Mayavi C:\ ipython –-pylab >>> from mayavi import mlab matplotlib also has an mlab namespace. Be sure you are using the one from mayavi 306 Interactive Example # create arrays >>> x, y = mgrid[-5:5:100j,-5:5:100j] >>> r = x**2 + y**2 >>> z = sin(r)/r # plot with pylab >>> imshow(z) # plot with mayavi >>> from mayavi import mlab >>> mlab.surf(x,y,5*z) 307 Some Plotting Commands 0D data mlab.points3d(x, y, z) 2D data mlab.surf(x, y, z) 3D data mlab.contour3d(x, y, z) 1D data mlab.plot3d(x, y, z) Vector field mlab.quiver(x, y, z, u, v, w) 308 Points in 3D scaling applied mlab.points3d(x, y, z, color=(1.0,0.0,1.0), mode=‘sphere’, scale_factor=0.1) x.shape == y.shape == z.shape from numpy.random import rand x,y,z = rand(30),rand(30),rand(30) mlab.axes() color = (R, G, B) 0.0 <= R, G, B <= 1.0 default is (1.0, 1.0, 1.0) mode = ‘sphere’, ‘cone’, ‘cube’, ‘arrow’, ‘cylinder’, ‘point’, ‘2darrow’, ‘2dcircle’, ‘2dcross’, ‘2ddash’, ‘2ddiamond’, ‘2dhooked_arrow’, ‘2dsquare’, ‘2dthick_arrow’, ‘2dthick_cross’, ‘2dtriangle’, ‘2dvertex’ 309 Lines in 3D mlab.plot3d(x, y, z, color=(0.0,1.0,1.0), tube_radius=0.03) 1-d arrays giving end-points of connected line-segments from numpy.random import rand x,y,z = rand(5), rand(5), rand(5) mlab.points3d(x,y,z,scale_factor=0.06) mlab.axes() len(x) == len(y) == len(z) color = (R, G, B) 0.0 <= R, G, B <= 1.0 default is (1.0, 1.0, 1.0) A float giving the radius of the tubes representing the lines. default is 0.025 310 Surfaces in 3D 2d array of “heights” mlab.surf(z, colormap=‘RdBu’) or mlab.surf(x, y, z, colormap=‘RdBu’) 2d array of “heights” 1d (or 2d) arrays producing a (possibly non-uniform) orthogonal grid (such as returned from ogrid or mgrid) from numpy import ogrid, hypot from scipy.special import j0 x,y = ogrid[-15:15:100j, -15:15:100j] z = j0(hypot(x,y)) 311 Volumes in 3D mlab.contour3d(data, contours=6, colormap=‘Dark2’) 3d array of volume data Integer number of contours or from numpy import ogrid, tanh, cos from scipy.special import j0 x,y,z = ogrid[-5:5:64j, -5:5:64j, -5:5:64j] data = cos(0.5*y)*j0(x+y*1.3)*sin(0.8*z) mlab.axes(ranges=[-5,5]*3) List of specific contours 312 Vector Fields in 3D – quiver3d mlab.quiver3d(u, v, w) or mlab.quiver3d(x, y, z, u, v, w) 3-d arrays each representing a component of a 3-d vector field. All must be the same shape. from numpy import mgrid x,y,z = mgrid[-5:5:12j, -5:5:12j, -5:5:12j] u,v,w = -z, y, x mlab.quiver3d(u,v,w, colormap=‘OrRd’) 1, 2, or 3-d arrays of the same shape as u, v, and w providing real locations of the vectors. 1, 2, or 3-d arrays of the same shape as x, y, and z representing components of the vectors. any colormap argument will color glyphs according to magnitude of vector. 313 Vector Fields in 3D – flow mlab.flow(u, v, w, linetype=‘tube’, seedtype=‘sphere’, colormap=‘PuRd’) 3-d arrays representing vector field. Can also be called with x,y,z,u,v,w where x, y, and z are all 3-d arrays. any colormap argument will color according to magnitude of vector. from numpy import mgrid x,y,z = mgrid[-5:5:12j, -5:5:12j, -5:5:12j] u,v,w = -z, y, x linetype= ‘line’, ‘ribbon’, or ‘tube’ seedtype = ‘sphere’, ‘plane’, ‘line’, or ‘point’ streamlines are calculated from seedpoints spaced throughout the specified seed widget. The seed widget is shown and can be interacted with to compute new streamlines. 
314
Figure Decorations
>>> mlab.title('A title')
>>> mlab.axes()
>>> mlab.colorbar()
>>> mlab.zlabel('Depth')
>>> mlab.clf()
>>> mlab.figure()
>>> mlab.gcf()
315
Graphical User Interface
mlab.show_pipeline()
316
Examples and Demos
>>> mlab.test_
mlab.test_points3d()    mlab.test_plot3d()     mlab.test_surf()
mlab.test_contour3d()   mlab.test_quiver3d()   mlab.test_molecule()
mlab.test_flow()        mlab.test_mesh()
Use ?? in IPython to look at the source code of these examples.
317
Scientific Analysis with SciPy
318
Overview
• Available at www.scipy.org
• Open source BSD style license
• 38 contributors to the project in 2011
CURRENT PACKAGES
• Special Functions (scipy.special)
• Input/Output (scipy.io)
• Signal Processing (scipy.signal)
• Statistics (scipy.stats)
• Image Processing (scipy.ndimage)
• Fast Execution (scipy.weave)
• Fourier Transforms (scipy.fftpack)
• Clustering Algorithms (scipy.cluster)
• Optimization (scipy.optimize)
• Sparse Matrices (scipy.sparse)
• Numerical Integration (scipy.integrate)
• Interpolation (scipy.interpolate)
• Linear Algebra (scipy.linalg)
• ...and more.
Note: "import scipy" does NOT import all of these packages; they must be
imported individually.
319
Statistics
scipy.stats — CONTINUOUS DISTRIBUTIONS
over 80 continuous distributions!
METHODS: pdf, cdf, rvs, ppf, stats, sf, isf, fit, entropy, nnlf, moment, freeze
320
Using stats objects
DISTRIBUTIONS
>>> from scipy.stats import norm
# Sample normal dist. 100 times.
>>> samp = norm.rvs(size=100)
>>> x = linspace(-5, 5, 100)
# Calculate probability dist.
>>> pdf = norm.pdf(x)
# Calculate cumulative dist.
>>> cdf = norm.cdf(x)
>>> x = linspace(0, 1, 100)
# Calculate Percent Point Function
>>> ppf = norm.ppf(x)
321
Distribution objects
Every distribution can be modified by loc and scale keywords (many
distributions also have required shape arguments to select from a family).
LOCATION (loc) --- shift the distribution left (<0) or right (>0)
SCALE (scale) --- stretch (>1) or compress (<1) the distribution
322
Example distributions
NORMAL (norm) -- N(mu, sigma)
Only location and scale arguments:
    location = mean (mu)
    scale = standard deviation (sigma)
LOG NORMAL (lognorm)
S is lognormal if log(S) is N(mu, sigma); one shape parameter!
    location = offset from zero (rarely used)
    scale = e**mu
    shape = sigma
323
Setting Location and Scale
NORMAL DISTRIBUTION
>>> from scipy.stats import norm
# Normal dist with mean=10 and std=2
>>> dist = norm(loc=10, scale=2)
>>> x = linspace(-5, 15, 100)
# Calculate probability dist.
>>> pdf = dist.pdf(x)
# Calculate cumulative dist.
>>> cdf = dist.cdf(x)
# Get 100 random samples from dist.
>>> samp = dist.rvs(size=100)
# Estimate parameters from data
>>> mu, sigma = norm.fit(samp)
>>> print "%4.2f, %4.2f" % (mu, sigma)
10.07, 1.95
.fit returns the best shape + (loc, scale) that explain the data.
324
Statistics
scipy.stats — Discrete Distributions
10 standard discrete distributions (plus any finite RV)
METHODS: pmf, cdf, rvs, ppf, stats, sf, isf, entropy, moment, freeze
325
Using stats objects
CREATING NEW DISCRETE DISTRIBUTIONS
# Create loaded dice.
>>> from scipy.stats import rv_discrete >>> xk = [1, 2, 3, 4, 5, 6] >>> pk = [0.3, 0.35, 0.25, 0.05, 0.025, 0.025] >>> new = rv_discrete(name='loaded', values=(xk,pk)) # Calculate histogram >>> samples = new.rvs(size=1000) >>> bins = linspace(0.5, 6.5, 7) >>> subplot(211) >>> hist(samples, bins=bins, normed=True) # Calculate pmf >>> x = range(0, 8) >>> subplot(212) >>> stem(x,new.pmf(x)) 326 Statistics CONTINUOUS DISTRIBUTION ESTIMATION USING GAUSSIAN KERNELS # Sample two normal distributions # and create a bimodal distribution >>> rv1 = stats.norm() >>> rv2 = stats.norm(2.0, 0.8) >>> samples = hstack([rv1.rvs(size=100), rv2.rvs(size=100)]) # Use a Gaussian kernel density to # estimate the PDF for the samples. >>> from scipy.stats.kde import gaussian_kde >>> approximate_pdf = gaussian_kde(samples) >>> x = linspace(-3, 6, 200) # Compare the histogram of the samples to # the PDF approximation. >>> hist(samples, bins=25, normed=True) >>> plot(x, approximate_pdf(x), 'r') 327 Linear Algebra scipy.linalg — FAST LINEAR ALGEBRA • Uses optimized BLAS and MKL if available — very fast • Low-level access to BLAS and LAPACK routines in modules linalg.blas, and linalg.lapack (FORTRAN order) • High level matrix routines • Linear Algebra Basics: inv, pinv, solve, det, norm, lstsq, ... • Decompositions: eig, lu, svd, orth, cholesky, qr, schur, ... • Matrix Functions: expm, logm, sqrtm, cosm, coshm, funm (general matrix functions) 328 Solving A*X = B SOLVE Ax=b LEAST SQUARES Ax=b >>> from scipy import linalg >>> a = array([[1.0, 3.0, 2.0], ... [4.0, 0.0, 1.0], ... [2.0, 2.0, 2.0]]) >>> b = array([2.0, 1.0, 0.0]) # Over-specified problem: >>> a = array([[1.0, 4.0], ... [3.0, 0.0], ... [2.0, 1.0]]) >>> b = array([2.0, 1.0, 0.0]) >>> x, res, rnk, svals = \ ... linalg.lstsq(a, b) >>> x = linalg.solve(a, b) >>> x array([ 1.5, 3.5, -5. ]) # Check: >>> dot(a, x) array([ 2., 1., 0.]) # Least squares solution: >>> x array([ 0.1832, 0.4059]) # Sum of squared residuals: >>> res 0.83663366336633671 # Effective rank of `a`: >>> rnk 2 # Singular values of `a`: >>> svals array([ 4.6567, 3.0521]) 329 Factorizations LU FACTORIZATION To solve many equations AX=B for different matrices B, factor A =L*U , L is lower triangular (L) and U is upper-triangular >>> from scipy import linalg >>> a = array([[1,3,5], ... [2,5,1], ... [2,3,6]]) # time consuming factorization >>> lu, piv = linalg.lu_factor(a) # fast solve for 1 or more # right hand sides. >>> b = array([10,8,3]) >>> linalg.lu_solve((lu, piv), b) array([-7.82608696, 4.56521739, 0.82608696]) # check >>> dot(a, _) array([ 10., 8., 3.]) QR FACTORIZATION To solve Ax=b, factor A = QR, where Q is orthogonal and R is upper triangular. >>> a = array([[12, -51, 4], ... [ 6, 167, -68], ... [-4, 24, -41]]) >>> q, r = linalg.qr(a) >>> q array([[-0.8571, 0.3943, 0.3314], [-0.4286, -0.9029, -0.0343], [ 0.2857, -0.1714, 0.9429]]) >>> r array([[ -14., -21., 14.], [ 0., -175., 70.], [ -0., 0., -35.]]) See the qr docstring for more advanced options (e.g. pivoting for rank-revealing QR decomposition). 330 Eigenvalues, eigenvectors EIGENVALUES AND VECTORS SCHUR DECOMPOSITION >>> a = array([[1,3,5], ... [2,5,1], ... 
[2,3,6]]) # compute eigenvalues/vectors >>> vals, vecs = linalg.eig(a) # print eigenvalues >>> vals array([ 9.39895873+0.j, -0.73379338+0.j, 3.33483465+0.j]) # eigenvectors are in columns # print first eigenvector >>> vecs[:,0] array([-0.57028326, -0.41979215, -0.70608183]) # norm of vector should be 1.0 >>> linalg.norm(vecs[:,0]) 1.0 Factor A = ZTZ-1, where Z is unitary and T is upper triangular. >>> t, z = linalg.schur(a) >>> t array( [[ 156.137, 64.737, 85.651], [ 0. , 16.06 , 15.814], [ 0. , 0. , -34.197]]) >>> z array([[ 0.328, -0.94 , 0.098], [-0.937, -0.337, -0.093], [-0.121, 0.061, 0.991]]) # Diagonals in t are eigenvalues: >>> linalg.eigvals(a) array([ 156.137+0.j, 16.060+0.j, -34.197+0.j]) 331 Linear Algebra SINGULAR VALUE DECOMPOSITION Factor A = U S V*, where U and V are unitary, and S is diagonal. This generalizes diagonalization. >>> a = array([[2, 1], ... [2, 0]]) >>> u, s, vt = linalg.svd(a) >>> u array([[-0.75 , -0.662], [-0.662, 0.75 ]]) # s holds the singular values. >>> s array([ 2.921, 0.685]) >>> vt array([[-0.966, -0.257], [ 0.257, -0.966]]) # Use to inverse A >>> inv_a = dot(vt.T, dot(diag(1/s), u.T)) >>> dot(a, inv_a) array([[ 1.0, -4.44089210e-16], [ -2.22044605e-16, 1.0]]) # Nonsquare example: underspecified # set of equations >>> a = array([[2, 1, -4], [2, 3, 1]]) >>> u, s, vt = linalg.svd(a) >>> u array([[-0.938, -0.347], [-0.347, 0.938]]) >>> s array([ 4.702, 3.59 ]) >>> vt array([[-0.546, -0.421, 0.724], [ 0.329, 0.687, 0.648], [ 0.77 , -0.592, 0.237]]) 332 Inverse, and determinants MATRIX INVERSION # Square matrix >>> from scipy import linalg >>> a = array([[2, 1], ... [2, 0]]) >>> a_inv = linalg.inv(a) >>> a_inv array([[ 0. , 0.5], [ 1. , -1. ]]) # Check >>> dot(a, a_inv) array([[ 1., 0.], [ 0., 1.]]) # Alternatively (slower) >>> linalg.solve(a, indentity(2)) array([[ 0. , 0.5], [ 1. , -1. ]]) # Non-square mat: Pseudo-inverse # with least-square method >>> b = array([[2, 1, -4], [2, 3, 1]]) >>> linalg.pinv(b) array([[ 0.07719298, 0.12631579], [ 0.01754386, 0.21052632], [-0.20701754, 0.11578947]]) MATRIX DETERMINANT >>> linalg.det(a) -2.0 #If LU decomposition available: >>> lu, _ = linalg.lu_factor(a) >>> np.prod(lu.diagonal()) -2.0 333 Matrix Objects STRING CONSTRUCTION >>> from numpy import mat >>> a = mat('[1,3,5;2,5,1;2,3,6]') >>> a matrix([[1, 3, 5], [2, 5, 1], [2, 3, 6]]) TRANSPOSE ATTRIBUTE >>> a.T matrix([[1, 2, 2], [3, 5, 3], [5, 1, 6]]) INVERTED ATTRIBUTE >>> a.I matrix([[-1.1739, 0.1304, 0.956], [ 0.4347, 0.1739, -0.391], [ 0.1739, -0.130, 0.0434] ]) DIAGONAL >>> a.diagonal() matrix([[1, 5, 6]]) >>> a.diagonal(-1) matrix([[2, 3]]) SOLVE >>> b = mat('10;8;3') >>> a.I*b matrix([[-7.82608696], [ 4.56521739], [ 0.82608696]]) >>> from scipy import linalg >>> linalg.solve(a,b) matrix([[-7.82608696], [ 4.56521739], [ 0.82608696]]) 334 Interpolation scipy.interpolate — General purpose Interpolation •1D Interpolating Class • Constructs callable function from data points and desired spline interpolation order. • Function takes vector of inputs and returns interpolated value using the spline. • Radial basis functions • Simple but effective N-dimensional interpolation • Works with scattered data 335 1D Spline Interpolation >>> from scipy.interpolate import interp1d interp1d(x, y, kind='linear', axis=-1, copy=True, bounds_error=True, fill_value=numpy.nan) Returns a function that uses interpolation to find the value of new points. 
• x – 1d array of increasing real values which cannot contain duplicates • y – Nd array of real values whose length along the interpolation axis must be len(x) • kind – kind of interpolation (e.g. 'linear', 'nearest', 'quadratic', 'cubic'). Can also be an integer n>1 which returns interpolating spline (with minimum sum-of-squares discontinuity in nth derivative). • axis – axis of y along which to interpolate • copy – make internal copies of x and y • bounds_error – raise error for out-of-bounds • fill_value – if bounds_error is False, then use this value to fill in out-of-bounds. 336 1D Spline Interpolation # demo/interpolate/spline.py from scipy.interpolate import interp1d from pylab import plot, axis, legend from numpy import linspace # sample values x = linspace(0,2*pi,6) y = sin(x) # Create a spline class for interpolation. # kind='nearest' -> zeroth older hold. # kind='linear' -> linear interpolation # kind=n -> use an nth order spline spline_fit = interp1d(x,y,kind=5) xx = linspace(0,2*pi, 50) yy = spline_fit(xx) # display the results. plot(xx, sin(xx), 'r-', x, y, 'ro', xx, yy, 'b--', linewidth=2) axis('tight') legend(['actual sin', 'original samples', 'interpolated curve']) 337 Radial Basis Functions >>> from scipy.interpolate import Rbf func = Rbf(x, y, …, function='multiquadric', epsilon=None, smooth=0, norm=euclidean_norm) Returns a function that uses interpolation to find the value of new points. • x – array of real values • y – array of real values the same shape as x • … - additional array arguments can be provided for N-d interpolation • function – basis function used to interpolate; one of 'multiquadric', 'inverse multiquadric', 'gaussian', 'linear', 'cubic', 'quintic', 'thin-plate' • epsilon – adjustable constant for ’gaussian' or 'multiquadrics' functions. Defaults to approximate average distance between nodes. • smooth – Greater than zero increases smoothness of approximation. Default of 0 is interpolation. • norm – function (vectorized) that returns the distance between two points 340 Radial Basis Functions The function is approximated by a sum of radial functions with certain weights nj: Typical radial functions: 341 1D RBF Interpolation # demo/interpolate/spline.py from scipy.interpolate import Rbf from pylab import plot, axis, legend from numpy import linspace # sample values x = linspace(0,2*pi,6) y = sin(x) # Create a new function fit = Rbf(x,y, function='gaussian') xx = linspace(0,2*pi, 50) yy = fit(xx) # display the results. plot(xx, sin(xx), 'r--', x, y, 'ro', xx, yy, 'b-', linewidth=2) axis('tight') legend(['actual sin', 'original samples', 'interpolated curve']) 342 2D RBF Interpolation EXAMPLE >>> ... 
>>> >>> >>> >>> >>> >>> from scipy.interpolate import \ Rbf from numpy import hypot, mgrid from scipy.special import j0 x, y = mgrid[-5:6,-5:6] z = j0(hypot(x,y)) newfunc = Rbf(x, y, z) xx, yy = mgrid[-5:5:100j, -5:5:100j] # xx and yy are both 2-d # result is evaluated # element-by-element >>> zz = newfunc(xx, yy) >>> from mayavi import mlab >>> mlab.surf(x, y, z*5) >>> mlab.figure() >>> mlab.surf(xx, yy, zz*5) >>> mlab.points3d(x,y,z*5, scale_factor=0.5) 343 Optimization scipy.optimize — Minimization and Root Finding Univariate Function Minimization minimize_scalar - minimize a scalar-valued function of a single variable: available methods include brent, bounded, golden Multivariate Function Minimization (constrained and unconstrained) minimize – minimize scalar-valued function of one or more variables: available methods include BFGS, Nelder-Mead simplex, Newton conjugate gradient, COBYLA, SLSQP, anneal, brute Least Squares Fitting leastsq - wrapper to the Fortran leastsq function from MINPACK Curve Fitting curve_fit – powerful, general, tool to fit functional parameters. Root Finding brentq, brenth, ridder, bisect, newton – find root of a scalar function fixed_point – find the fixed point of a function root – multidimensional root-finding: methods include hybr and lm for small problems and krylov, brodyen2, and anderson for large problems. 344 Optimization: 1-D Minimization EXAMPLE: MINIMIZE BESSEL FUNCTION # minimize 1st order Bessel # function between 4 and 7 >>> from scipy.special import j1 >>> from scipy.optimize import \ ... minimize_scalar >>> >>> >>> >>> ... ... >>> >>> x = np.linspace(2, 7, 200) j1x = j1(x) plot(x, j1x) result = minimize_scalar(j1, method="bounded", bounds=[4, 7]) j1_min = j1(result.x) plot(result.x, j1_min,'ro') 345 Result Object Output of minimize, minimize_scalar, and root is a Result object with attributes: x : solution to the optimization success : True/False – did the optimization succeed? status : integer termination status message : termination status message fun, jac, hess : values of the function, Jacobian, Hessian, (if available) at the optimized value nfev, njev, nhev : number of evaluations of the function, Jacobian, or Hessian (if available) ... : other attributes are added by specific methods Note: older legacy functions (fmin, fmin_bgfs, fminbound, fsolve, etc.) do not have a unified API and are replaced by the above. 346 Optimization: Using Derivatives Rosenbrock function minimize: WITHOUT DERIVATIVE >>> from scipy.optimize import rosen >>> x0 = [1.3, 1.6, -0.5, -1.8, 0.8] >>> result = minimize(rosen, x0) >>> print result.x [ 1. 1. 1. 1. 1.] >>> print result.nfev 462 minimize: USING DERIVATIVE >>> from scipy.optimize import rosen_der >>> result = minimize(rosen, x0, ... jac=rosen_der) >>> print result.x [ 1. 1. 1. 1. 1.] >>> print result.nfev, result.njev 66 66 347 Optimization: Solving Nonlinear Equations ROOT SYSTEM OF EQUATIONS >>> def equations(x,a,b,c): ... x0, x1, x2 = x ... eqs = \ ... [3 * x0 - cos(x1*x2) + a, ... x0**2 - 81*(x1+0.1)**2 + sin(x2) + b, ... exp(-x0*x1) + 20*x2 + c] ... return eqs >>> from scipy.optimize import root # coefficients >>> a = -0.5 ; b = 1.06 >>> c = (10 * pi - 3.0) / 3 # Optimization start location. >>> initial_guess = [0.1, 0.1, -0.1] # Solve the system of non-linear equations. >>> result = root(equations, initial_guess, args=(a, b, c)) >>> print “root:”, result.x root: [ 0.5 0. -0.52] >>> print “solution at root:”, result.fun solution at root: [ 0. -0. 0.] 
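For a single equation in one unknown, the bracketing root finders listed
earlier (brentq, brenth, ridder, bisect) are usually the first choice. A short
sketch, not taken from the slides, using brentq on a function and bracket of
our own choosing:

# Hedged example: scalar root finding with a bracketing method.
from numpy import cos
from scipy.optimize import brentq

def f(x):
    return cos(x) - x

# brentq requires a bracket [a, b] where f(a) and f(b) have opposite signs.
# Here f(0) = 1 > 0 and f(2) = cos(2) - 2 < 0.
x_root = brentq(f, 0.0, 2.0)
print "root:", x_root          # approximately 0.739085
print "f(root):", f(x_root)    # approximately 0.0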
348 Optimization: Data Fitting NONLINEAR LEAST SQUARES CURVE FITTING >>> from scipy.optimize import curve_fit >>> from scipy.stats import norm # Define the function to fit. >>> def function(x, a, b, f, phi): ... result = a * exp(-b * sin(f * x + phi)) ... return result # Create a noisy data set. >>> actual_params = [3, 2, 1, pi/4] >>> x = linspace(0,2*pi,25) >>> exact = function(x, *actual_params) >>> noisy = exact + 0.3*norm.rvs(size=len(x)) # Use curve_fit to estimate the function parameters from the noisy data. >>> initial_guess = [1,1,1,1] >>> estimated_params, err_est = curve_fit(function, x, noisy, p0=initial_guess) >>> estimated_params array([3.1705, 1.9501, 1.0206, 0.7034]) # err_est is an estimate of the covariance matrix of the estimates # (i.e. how good of a fit is it) >>> err_est.diagonal() array([ 0.0572, 0.006362, 0.0005834, 0.008979]) 349 Fitting Polynomials (NumPy) POLYFIT(X, Y, DEGREE) >>> from numpy import polyfit, poly1d >>> from scipy.stats import norm # Create clean data. >>> x = linspace(0, 4.0, 100) >>> y = 1.5 * exp(-0.2 * x) + 0.3 # Add a bit of noise. >>> noise = 0.1 * norm.rvs(size=100) >>> noisy_y = y + noise # Fit noisy data with a linear model. >>> linear_coef = polyfit(x, noisy_y, 1) >>> linear_poly = poly1d(linear_coef) >>> linear_y = linear_poly(x) # Fit noisy data with a quadratic model. >>> quad_coef = polyfit(x, noisy_y, 2) >>> quad_poly = poly1d(quad_coef) >>> quad_y = quad_poly(x) 350 Data Analysis with 351 Pandas Pandas (version 0.12.0, Jul 2013) is a library to make analysis of structured datasets easy. Author: Wes McKinney, http://pandas.pydata.org/ License: BSD GOALS • New array-based data-structures with axis labeling + nice representation in ipython, • Date/time management built-in, including timezones, • Data alignment, data merging, data aggregation (group by), missing data management, • Statistical tools (describe, moving stats, covariance, correlation, panel regression), • Easy visualization (line plot, bar chart, boxplot, scatter plot matrix, ...) with Matplotlib. RELATED PROJECTS • Built on top of: NumPy, pytables, matplotlib. • Replaces scikits.timeseries. Is now an optional dependency of statsmodels. • Offers similar features to larry, datarray (labeled arrays). 352 New Data Structures PANDAs = PANel DAta = multi-dimensional data in stats & econometrics. Data-structures: • A Series is a 1D data-structure that derives from np.ndarray. • A DataFrame is a 2D data-structure that can be viewed as a dictionary of series. It is NOT a subclass of an numpy array. • A Panel is a 3D data-structure that can be viewed as a dict of DataFrames. Rk, Fst N, Lst N, DoB items Gend Grade offset TAMU UT columns index 1 4 John Doe 1981 M 4.18 -1.18 2 2 Jill Ford 1975 F 5.26 1.26 3 5 Chris Jones NaN M 3.91 0.91 4 6 Dave Smith 1965 M 1.23 NaN 5 1 Ellen Frank 1973 F 5.52 0.52 6 3 Frank Hart M 4.71 -0.71 1976 353 Creating (Time)Series FROM LIST AND DICT # Data and corresponding indices can # be stored in lists. 
>>> index = ['a', 'b', 'c', 'd'] >>> Series(range(4), index=index, name='first series') a 0 b 1 c 2 d 3 Name: first series # data + indices in a dict >>> d = {'a':0,'b':1,'c':2,'d':1} >>> s = Series(d, name='first series') >>> print s.name 'first series' >>> s.name = '' >>> print s.index Index([a, b, c, d], dtype=object) >>> print s.values, type(s.values) array([0, 1, 2, 1], dtype=int64) numpy.ndarray >>> s.dtype dtype('int64') FROM A NUMPY ARRAY >>> >>> a b c d from numpy.random import randn Series(randn(4), index=index) -1.062984 -0.961625 -0.720323 1.100681 ACCESS ELEMENTS # Access elements like an array >>> s[2] 2 >>> s[:2] a 0 b 1 # Access elements like a dictionary >>> print s['c'], 'c' in s 2 True # Slicing works on indices! >>> s['a':'c'] a 0 b 1 354 c 2 Creating DataFrames FROM A DICT OF SERIES # DF from a dict of series: keys are # column names. >>> s2=Series(randn(3), name='B', index=index[1:]) >>> d = {'A':s, 'B':s2} >>> df = DataFrame(d) A B a 0 NaN b 1 -0.961625 c 2 -0.720323 d 1 1.100681 >>> df.index, df.columns Index([a, b, c, d], dtype=object) Index([A, B], dtype=object) >>> df.shape, df.dtypes (4,2) A int64 B float64 >>> df.values array([[ 0. , nan], [ 1. , -0.96162549], [ 2. , -0.72032263], [ 1. , 1.100680533]]) FROM A NUMPY ARRAY >>> DataFrame(randn(4,4), index=index, columns=['A','B','C','D']) A B C D a 0.28164 -0.36826 0.04011 1.25030 b -0.71049 -1.23956 -0.08504 -0.08336 c -1.29446 0.70709 1.39642 0.49035 d 0.74632 -0.03512 -0.69237 0.81488 # List(series) interpreted as an array >>> df2 = concat([s,s2], axis=1, keys=['A', 'B']) ACCESS OR ADD ELEMENTS >>> col1 = df['A'] >>> col2 = df.B >>> df['flag'] = df['B'] > 0 >>> df.ix['c'] A 2 B -0.7203226 flag False Name: c >>> df.ix['c','B'] -0.7203226322119064 355 Creating Panels FROM A DICT OF DATAFRAMES # Panel from a dict of dataframes: # keys used for third 'items' axis. 
>>> df2=df.ix[::2,:] >>> data = {'item1': df,'item2': df2} >>> wp = Panel(data) >>> wp.items, wp.major_axis, wp.minor_axis Index([item1, item2], dtype=object) Index([a, b, c, d], dtype=object) Index([A, B, flag], dtype=object) >>> wp.shape (2, 4, 3) >>> wp.values array([[[0, nan, False], [1, -0.96162549, False], [2, -0.72032263, False], [1, 1.100680533, True]], [[0.0, nan, False], [nan, nan, nan], [2.0, -0.72032263, False], [nan, nan, nan]]], dtype=object) FROM A NUMPY ARRAY >>> items = ['foo','bar','baz','biz'] >>> Panel(randn(4,4,4), items=items) Dimensions: 4 (items) x 4 (major) x 4 (minor) Items: foo to biz Major axis: 0 to 3 Minor axis: 0 to 3 ACCESS OR ADD ELEMENTS >>> df1 = wp['item1'] A B flag a 0 NaN False b 1 -0.9616255 False c 2 -0.7203226 False d 1 1.100681 True >>> df2 = wp.item2 >>> del wp['item1'] >>> wp.ix['item2','c','A'] 2.0 356 Pandas IO READING FUNCTIONS Format txt, csv clipboard Method, Function, Class read_table, read_csv read_clipboard EXAMPLES # Pickle is supported >>> df.to_pickle('foo.pickle') >>> read_pickle('foo.pickle') pickle read_pickle HDF5 read_hdf, HDFStore # To load a table from an ascii file Excel read_excel, ExcelFile df = read_table('foo.txt', sep=',', header=1, skiprows=2, index_col=0, names=['Paris', 'NY', 'Tokyo', 'London'], parse_dates=True, na_values = [-9999]) R pandas.rpy.common.load_data WRITING FUNCTIONS Format txt, csv html Method, Function, Class to_string, to_csv to_html # Long term storage: HDF5 >>> st = HDFStore('foo.h5') pickle to_pickle >>> st['data1'] = df HDF5 to_hdf, HDFStore >>> st['ser1'] = s Excel to_excel, ExcelWriter >>> s2 = st['ser1'] >>> st.close() 357 Indexes and data alignment RE-INDEXING # reindex shuffles the existing index >>> s.reindex(['c', 'b', 'a', 'e']) c 2 b 1 a 0 e NaN # The index attr can be overwritten >>> s.index = ['p', 'q', 'r', 's'] # ‘rename’ modifies index label >>> s.rename(lambda a: a.upper()) P 0 Q 1 R 2 S 3 # Sort by values >>> s.values[:] = [5, 3, 0, 2] >>> s.order() r 0 s 2 q 3 p 5 # Explicit alignment of BOTH series >>> s, s2 = s.align(s2) ALIGNMENT FROM OPERATIONS # Operations automatically align on # the index (different from ndarray) >>> s[1:] + s[:-2] a NaN b 1 c 2 d NaN e NaN # Sort a DF by a (list of) column(s) >>> df.sort('A') A B flags a 0 NaN False b 1 -0.961625 False d 1 1.100681 True c 2 -0.720323 False 358 More on indexing INDEX TO/FROM A COLUMN REDUNDANT INDEXES # Any dataframe column can become the # index >>> df A B a 0 NaN b 1 -0.961625 c 2 -0.720323 d 3 1.100681 >>> df2 = df.set_index('A') B A 0 NaN 1 -0.961625 2 -0.720323 3 1.100681 # The opposite operation converts the # index to a column >>> df2.reset_index() A B 0 0 NaN 1 1 -0.961625 2 2 -0.720323 3 3 1.100681 # Although dangerous, it is possible # to have redundant indices >>> index = ['a', 'b', 'c', 'c'] >>> s=Series(range(4), index=index) a b c c 0 1 2 3 >>> s['c'] c c 2 3 >>> type(s['c']) pandas.core.series.Series 359 Multi-indexed pandas MULTI-INDEXING SERIES MULTI-INDEXING DATAFRAMES >>> >>> >>> >>> >>> df = DataFrame(randn(4,2), index=ind) 0 1 bar one 0.24 0.81 two -0.96 0.49 baz one 1.23 0.29 two -0.97 1.18 >>> df.ix['bar'] 0 1 one 0.24 0.81 two -0.96 0.49 >>> df.ix['bar', 'one'] 0 0.24 1 0.81 Name: one # Partial slicing works! 
>>> df.ix[('bar','two'):('baz','two')] 0 1 bar two -0.96 0.49 baz one 1.23 0.29 360 two -0.97 1.18 bar baz ind1=['bar', 'bar', 'baz', 'baz'] ind2=['one', 'two', 'one', 'two'] ind = [ind1,ind2] s = Series(randn(4), index=ind) one two one two 0.03 0.80 0.56 -2.08 >>> s.index MultiIndex [('bar', 'one') ('bar', 'two') ('baz', 'one') ('baz', 'two')] >>> s['bar'] one 0.03 two 0.80 >>> s.reindex([('baz', 'two'), ('bar', 'one')]) baz bar two one -2.08 0.03 Multi-indexed pandas II MANIPULATING INDEXES SLICING # By default each index is identified # by a level 0 (outer), 1 (inner) >>> df.index.names=['first','second'] >>> df.xs("one", level="second") 0 1 first bar 0.24 0.81 baz 1.23 0.29 >>> s.swaplevel(0,1) one bar 0.03 two bar 0.80 one baz 0.56 two baz -2.08 # reorder_levels generalizes that >>> s.reorder_levels([1,0]) (...) >>> s.index.names=['first','second'] >>> s first bar baz second one two one two SORTING >>> s.sortlevel(level='second') bar one 0.03 baz one 0.56 bar two 0.80 baz two -2.08 0.03 0.80 0.56 -2.08 361 Computation with pandas SERIES # Series are subclasses of ndarrays: # computations are element by element >>> i = ['a', 'b', 'c', 'd'] >>> s = Series(range(4), index=i) >>> s + 1 a 1 b 2 c 3 d 4 # All Numpy operators can be applied >>> s2 = np.exp(s) >>> s = s.astype(np.float32) # including fancy indexing with masks >>> s[s > 1.5] = 1.5 DATAFRAMES >>> df A B a 0 NaN b 1 -0.9 c 2 -0.7 d 1 1.1 flags False False False True # Dataframe computations are applied # columns by columns >>> df2 = df + 1 # Adding a series or re-scaling >>> row = df.ix[1] >>> df - row A B flag a -1 NaN False b 0 0 False c 1 0.2 False d 0 2.0 True # This would have been equivalent >>> rescaled = df.sub(row, axis=0) # Adding another dataframe can be done # with a filler. >>> df1.add(df2, fill_value=0) # ‘apply’ a custom function to columns >>> f = lambda x: x.max() - x.min() >>> df.apply(f, axis = 0) A 2 B 2 C True 362 Statistical Analysis DESCRIPTIVE STATS # Descriptive stats available: # count, sum, mean, median, min, max, # abs, prod, std, var, skew, kurt, # quantile, cumsum, cumprod, cummax # Stats on DF are column per column >>> print df.mean(), df.mean(axis=1) A 1.000000 a -0.354328 B -0.227542 b 0.500000 flag 0.250000 c 0.426559 d 1.033560 # Stats func accept a level as kw arg # applicable for multi-index pandas >>> df.describe() A B count 4.000000 3.000000 mean 1.000000 -0.227542 std 0.816497 1.162964 min 0.000000 -1.062984 25% 0.750000 -0.891653 50% 1.000000 -0.720323 75% 1.250000 0.190179 max 2.000000 1.100681 CORRELATIONS & REGRESSION >>> cov = df.cov() >>> ts.corr(ts2, method='kendall') 0.066666666666666693 >>> ols(y = ts2, x = ts) Formula: Y ~ + Number of Observations: 20 Number of Degrees of Freedom: 2 R-squared: 0.9936 Adj R-squared: 0.9932 Rmse: 0.0582 F-stat(1,18): 2772.7496, p-value: 0.0 Degrees of Freedom: model 1, resid 18 (...) For more stats, see statsmodels. 363 >>> pandas.rolling_ Dealing with missing data FIND & REMOVE/REPLACE NaN # A boolean mask with ‘null’ values: # None, np.NaN, np.inf, -np.inf >>> np.all(isnull(s) | s.notnull()) True # Replace missing values manually >>> s[isnull(s)] = 0. # Automatic version with forward fill >>> s.fillna(method='ffill') # Inverse operation >>> s[s == -9999] = np.nan # Remove all entries w/ missing values >>> df.dropna(how='all') # Interpolation also removes NaN >>> s.interpolate() SPARSE DATASETS # If mostly missing data, convert to # SparseSeries, SparseDataFrame, ... 
# Only stores non-null values & loc >>> print len(s), s.count(), s.nbytes 1000, 10, 8000 >>> sp_s = s.to_sparse() >>> print sp_s.nbytes 80 MERGING DATASETS # Combining overlapping datasets with # a priority >>> df1 = DataFrame({ 'A' : [1.,nan,3.,5.,nan], 'B' : [nan,2.,3.,nan,6.]}) >>> df2 = DataFrame({ 'A' : [5.,2.,4.,nan,3.,7.], 'B' : [nan,nan,3.,4.,6.,8.]}) >>> df1.combine_first(df2) A B 0 1 NaN 1 2 2 2 3 3 3 5 4 4 3 6 5 7 8 # More custom merging to be # implemented using combine and a # custom function 364 Pivot tables and reshaping PIVOTING (UN-)STACKING # Repeating columns can be viewed as # an additional axis >>> df # Stacking a DF creates a # series with multiple indexes made # from the column names. >>> s a b one 1 2 two 3 4 >>> stacked = s.stack() one a 1 b 2 two a 3 b 4 # Opposite operation unstacks the # last level by default. If more cplx # DF, can specify a level to unstack >>> stacked.unstack() a b one 1 2 two 3 4 >>> stacked.unstack(level=0) one two a 1 3 365 b 2 4 date variable value 0 2000-01-03 A 1 2000-01-04 A -0.282863 0.469112 2 2000-01-05 A -1.509059 3 2000-01-03 B -1.135632 4 2000-01-04 B 1.212112 5 2000-01-05 B -0.173215 >>> df.pivot(index='date', columns='variable', values='value') variable A B date 2000-01-03 0.469112 -1.135632 2000-01-04 -0.282863 1.212112 2000-01-05 -1.509059 -0.173215 Pivot tables and reshaping II PIVOT_TABLE # Another way to reshape a DF and # aggregate at the same time >>> df A B C D 0 foo one small 1 1 foo one large 2 2 foo one large 2 3 foo two small 3 4 foo two small 3 5 bar one large 4 6 bar one small 5 7 bar two small 6 8 bar two large 7 >>> table= df.pivot_table(values='D', rows=['A', 'B'], cols=['C'], aggfunc=np.sum) >>> table small large foo one 1 4 two 6 NaN bar one 5 4 two 6 7 366 Data aggregation Aggregation in a dataset is done in 3 steps: Split > Apply > Combine SPLIT WITH groupby APPLY WITH aggregate() # Group data by one column’s value # Values in groups can be aggregated >>> gps = df.groupby('flags') >>> gps.sum() # gps = iterator of tuples with # group name and sub part of df A B flags >>> gps.groups {False: ['a', 'b', 'c'], True: ['d']} False 3 -1.6 True 1 1.1 # Version more flexible but slower # groupby can also take a custom # function that acts on index labels >>> gps.aggregate(np.sum) >>> df = df.reset_index() # agg is even more flexible >>> myfunc = lambda x: x%2 == 0 >>> gps.agg([np.mean, np.std]) >>> gps2 = df.groupby(myfunc) >>> gps.agg({'A':'sum', 'B':'std'}) # Series can also be grouped by A B >>> df['A'].groupby(myfunc) flags # which is identical to False 3 0.141421 >>> gps2['A'] True 1 NaN 367 Data aggregation II APPLY WITH transform() >>> gps['A'].apply(desc).unstack() # Values in groups can be # transformed (length conserved) flags >>> f = lambda x: x - x.mean() False 3 1 1 0 0.5 1 1.5 2 >>> gps.transform(f) True 1 1 NaN 1 1.0 1 1.0 1 A B a -1 NaN b 0 -0.1 c 1 0.1 d 0 0.0 APPLY WITH apply() # Computations from values in # groups can be turned into a DF # of calcs >>> desc = lambda x: x.describe() count mean std min 25% 50% 75% max >>> def f(group): return DataFrame({'original':group, 'demeaned': group - group.mean()}) >>> gps['A'].apply(f) demeaned original a -1 0 b 0 1 c 1 2 d 0 1 index 368 Dealing with date & time CREATING DATE/TIME INDEXES UP-/DOWN-SAMPLING # The index can be a list of # dates+times locations that can be # automatically generated >>> date_range('1/1/2000',periods=4) [2000-01-01 00:00:00, ..., 2000-01-04 00:00:00] Length: 4, Freq: D, Timezone: None # Specify frequency: us,ms,S,T,H,D,B, # 
W,M,3min, 2h20min, 2W,... >>> r=date_range('1/1/2000',periods=72, freq='H') >>> i=date_range('1/1/2000',periods=4, freq=datetools.YearEnd()) >>> i=date_range('1/1/2000',periods=4, freq=’3min’) >>> ts=Series(range(4), index=i) 2000-01-01 00:00:00 0 2000-01-01 00:03:00 1 2000-01-01 00:06:00 2 2000-01-01 00:09:00 3 Freq: 3T >>> ts.resample('T') 2000-01-01 00:00:00 0 2000-01-01 00:01:00 NaN 2000-01-01 00:02:00 NaN 2000-01-01 00:03:00 1 2000-01-01 00:04:00 NaN 2000-01-01 00:05:00 NaN 2000-01-01 00:06:00 2 2000-01-01 00:07:00 NaN 2000-01-01 00:08:00 NaN 2000-01-01 00:09:00 3 Freq: T # Group hourly data into daily >>> ts2 = Series(randn(72), index=r) >>> ts2.resample('D', how='mean’, closed='left', label='left') 01-Jan-2000 0.397501 02-Jan-2000 0.186568 03-Jan-2000 0.327240 Freq: D >>> my_avg = lambda x: x[1:-1].mean() >>> ts3=ts2.resample('D', how=my_avg) 369 Dealing with date & time II RANGES OF PERIODS CONVERT TIMESTAMP<->PERIOD # The index can be a list of # dates/times intervals >>> PeriodIndex(['2011-1', '2011-2', '2011-3', '2011-4'], freq='M') freq: M [Jan-2011, ..., Apr-2011] length: 4 # They can be automatically generated >>> pr = period_range('1/1/2000', periods=4, freq='M') freq: M [Jan-2000, ..., Apr-2000] length: 4 >>> ps=Series(range(4), index=pr) 2000-01 0 2000-02 1 2000-03 2 2000-04 3 Freq: M, dtype: int64 >>> ts = ps.to_timestamp(how='end') 2000-01-31 0 2000-02-29 1 2000-03-31 2 2000-04-30 3 Freq: M >>> all(ts.to_period() == ps) True TIMEZONES # Timestamps/pandas can be tz aware >>> ts_utc = ts.tz_localize('UTC') >>> ts_us=ts_utc.tz_convert('US/Eastern’) 2012-01-30 19:00:00-05:00 -0.425792 2012-02-28 19:00:00-05:00 0.788903 2012-03-30 20:00:00-04:00 0.009502 2012-04-29 20:00:00-04:00 1.775263 2012-05-30 20:00:00-04:00 -1.656727 Freq: M >>> ts.index == ts_us.index 370 True Visualizing (Time)series >>> ts.plot() >>> ts2.plot(secondary_y=True, style='g') See also tools.plotting.lag_plot, tools.plotting.autocorrelation_plot 371 Visualizing DataFrames >>> plt.figure() >>> df.boxplot() >>> df.plot(subplots=True) >>> plt.legend(loc='best') See also: plot with kind=‘bar’,‘barh’, ‘kde’ keywd hist method tools.plotting.scatter_matrix, andrews_curves, lag_plot 372 Software Craftsmanship in Python 373 Software Engineering Quotes Programs should be written for people to read, and only incidentally for machines to execute. Structure and Interpretation of Computer Programs Harold Abelson and Gerald Sussman 374 Software Engineering Quotes You need to have empathy not just for your users, but for your readers. It's in your interest, because you'll be one of them. Many a hacker has written a program only to find on returning to it six months later that he has no idea how it works. Hackers and Painters Paul Graham 375 Software Engineering Quotes Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. Brian Kernighan co-author of “The C Programming Language” 376 Useful References or 377 Useful References 378 Software Carpentry Software Carpentry http://software-carpentry.org A Python-based course on software design and engineering for scientists. 379 Coding Modes The mode (context) you’re in changes the way you code. Interactive Mode: Quick Iteration Interactive Prompt and exploratory development. Production Mode: Building for the Ages Creating code that will be re-used by you or others. 
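A small, hypothetical illustration (ours, not from the course) of the two
modes applied to the same calculation: a quick check typed at the IPython
prompt, and the production version that is named, documented, and safe to
reuse later.

# Interactive mode: quick, throwaway exploration at the prompt.
# >>> prices = [10.0, 10.5, 10.25, 10.25]
# >>> sum(prices) / len(prices)
# 10.25

# Production mode: the same calculation, named, documented, and guarded
# so that it can be re-used (and debugged) by others.
def mean_price(prices):
    """ Return the arithmetic mean of a sequence of prices. """
    if len(prices) == 0:
        raise ValueError("prices must not be empty")
    return sum(prices) / float(len(prices))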
380 Naming Variables Goal: Clarity for future readers of the code 381 Typical Scientific Naming Convention STRAIGHT FROM FFTPACK SUBROUTINE CFFTB1 (N,C,CH,WA,IFAC) DIMENSION CH(*) ,C(*) ,WA(*) ,IFAC(*) NF = IFAC(2) NA = 0 L1 = 1 IW = 1 DO 116 K1=1,NF IP = IFAC(K1+2) L2 = IP*L1 IDO = N/L2 IDOT = IDO+IDO IDL1 = IDOT*L1 IF (IP .NE. 4) GO TO 103 IX2 = IW+IDOT IX3 = IX2+IDOT IF (NA .NE. 0) GO TO 101 CALL PASSB4 (IDOT,L1,C,CH,WA(IW),WA(IX2),WA(IX3)) GO TO 102 382 Primary Naming Consideration Priority 1: A variable name should fully and accurately describe the entity and variable it represents. POOR NAME CHOICES # Update Cash Balance after stock trade. c1 = n * ip c2 = c1 + compute_tc(ins, n) b -= c2 DESCRIPTIVE NAME CHOICES # Update Cash Balance after stock trade. instrument_cost = instrument_quantity * instrument_price trade_cost = instrument_cost + transaction_cost(instrument_name, instrument_quantity) cash_balance -= trade_cost 383 Descriptive Name Choices Entity Represented Variable Name The number of trading days in the current month number_of_trading_days_in_current_month, NumberOfTradingDaysPerMonth, trading_days_in_current_month Length of a moving average window length_of_moving_average_window, lengthOfMovingAverageWindow Risk Free Interest Rate risk_free_interest_rate risk_free_rate RiskFreeRate These names are descriptive without much room for ambiguity, and they are trivially decoded. 384 But the Names are Looooong… Yes they are… Here are examples from the VTK library, a 3D visualization tool set with 1800+ classes. (www.vtk.org) 385 The Downside to Long Names DETAILED NAMES AND OBFUSCATED EQUATIONS # Long names can make simple formulas difficult to read, # reducing clarity. historic_daily_percent_price_change_per_day = \ (historic_daily_adjusted_close_price[1:] - \ historic_daily_adjusted_close_price[:-1]) \ / historic_daily_adjusted_close_price[:-1] SHORTER NAMES IMPROVE EQUATION READABILITY # Variable table: # adj_close – Daily adjusted closing price for an instrument. # change_percentage – Daily percent change in adjusted closing price. change_percentage = (adj_close[1:] - adj_close[:-1]) / adj_close[:-1] 386 Optimal Name Lengths Studies on debugging COBOL indicate* 10-16 character names is optimal length. 8-20 characters is almost as good. TOO LONG TOO SHORT JUST_RIGHT risk_free_interest_rate r, rfir, rate risk_free_rate, rsk_fr_rate length_of_moving_average_window N, nw win_len, window_length, Nwindow, mov_avg_win_len * Gorla, N., A. C. Benander, and B. A. Benander. 1990. “Debugging Effort Estimation Using Software Metrics.” IEEE Transactions on Software Engineering SE-16, no. 2 (February): 223-231 387 Strategies for Shortening Names Use Standard Abbreviations administrator_information -> admin_info Truncate consistently after 2nd or 3rd letter of each word display_window -> dis_win Remove Non-leading Vowels risk_free_interest_rate -> rsk_fr_intrst_rt Use only significant words in variable start_of_next_paragraph -> next_paragraph_start Remove useless suffixes checking_account -> check_account Iterate until variables are between 8 and 20 characters. 388 Using extremely short names LOOP INDICES I, J, and K # This is ok. for i in range(10): scores[i] = 0 # But this is better. for event in range(10): decathalon_scores[event] = 0 389 Using extremely short names INDUSTRY STANDARD VARIABLES IN “SMALL” CONTEXT # Quick, what does each variable stand for? 
y = a * sin(w*t + phi) 390 Using extremely short names INDUSTRY STANDARD VARIABLES IN “SMALL” CONTEXT def sin_wave(t, a=1, w=2*pi, phi=0): """ Return a sin wave form for time t. Inputs -----t: time in seconds a: amplitude scale factor w: frequency in radians/second phi: phase shift in radians Returns ------y: sin wave output """ y = a * sin(w*t + phi) return y 391 Domain vs. Computing Names Variable names should relate to the problem space, not generic programming terminology. # Not so good… Algorithm-centric names sum = 0 for record in record_list: sum += will_retire(record.name) # Better. Domain specific names. gold_watch_order = 0 for employee in employee_list: gold_watch_order += will_retire(employee.name) 392 Qualified Names Use the “most important” part of the name as the first word for “grouped” variables. Usually, this is a noun. MOST IMPORTANT WORD FIRST ADJECTIVE FIRST AS IN ENGLISH # The _avg suffix on the price # But... Placing avg as a prefix # variable indicates it is an # read more like english # average price. # (adjective first). So, # arguments can be made for this # approach as well. price_avg = mean(prices) avg_price = mean(prices) price_std = std(prices) avg_volume = mean(volumes) volume_avg = mean(volumes) std_price = std(prices) volume_std = std(volumes) std_volume = std(volumes) 393 Qualified Names Both prefixes and suffixes are found in “real” code, and are quite readable. Stick with one approach for a given variable group. EXAMPLE WITH MIXED PREFIX AND SUFFIX VARIABLES # Find the total profit made from group of trades. profit_total = 0.0 current_trade_index = 0 while current_trade_index < last_trade_index: current_trade = trades[current_trade_index] profit, open_positions = update_positions(current_trade, open_positions) profit_total += profit current_trade_index += 1 394 Global “Constants” GLOBALS CONSTANTS IN MODULES # "Constant" Global variables # are often capitalized. >>> from enthought.kiva import \ ... constants >>> constants.ITALIC 2 # But not always. # [This is a global type.] >>> import numpy >>> numpy.float32 The prefix used to differentiate constants in other languages are often removed and put in a single module in Python. ENUMERATED TYPE PREFIX # Type prefixes (like JOIN) often # indicate a group of enumerated # constants. >>> constants.JOIN_BEVEL 1 >>> constants.JOIN_MITER 2 # wxPython is a GUI library # wrapped around the C++ # wxWdigets library. >>> import wx # In C++, the constant is # wxALIGN_CENTER >>> wx.ALIGN_CENTER 2304 Note that these global values are not truly constants as they are writable variables. 395 Boolean Variables GENERIC NAMES POSITIVE VALUES # Try not to use generic names # as they don’t provide much # information flag = True status = True # Positive values are easy to # interpret if found: employee_of_month = employee STANDARD NAMES # These common names are more # informative. done = True found = True success = True valid = True error = True ok = True # Using the prefix is_ can # help identify booleans. is_done = True # Negative Booleans take more # mental gymnastics to read if not not_found: employee_of_month = employee 396 Naming Containers When naming lists (or sequences) of elements, use the plural of the word, or a suffix that indicates it is a container. VARIABLE SUFFIX FOR CONTAINERS # Using the _list suffix is descriptive and is more # easily differentiated from a single item. for name in name_list: print name PLURAL VARIABLE NAME FOR CONTAINERS # Using the plural version of the variable is also a common paradigm. 
for name in names: print name # For long variables names, the singular and plural approach isn’t # as good. It is to hard to differentiate between the two. close_price = {} for equity_symbol in equity_symbols: close_price[equity_symbol] = lookup_daily_close(equity_sybmol) # And this is obviously problematic... for moose in moose: moose.bellow() 397 Naming Variables Review Priority 1: A variable’s name should fully and accurately describe the entity and variable it represents. Names should refer to the real-world problem rather than to computer programming terms. A name should be long enough so that you don’t have to puzzle out its meaning. Short names are OK for industry standard names, indices, and in “small” contexts. Computed-value qualifiers should be at the end of the name. 398 Documenting Code 399 Documentation Layers Application Documentation End User Manuals, Domain specific Tutorials, etc. Developer Documentation Architecture Description, Concepts, Developer Tutorials API Documentation Doc-strings for functions, classes, and modules Inline Code Comments Describe intent of code or algorithm design Self Documenting Code SELF DOCUMENTING CODE # Snippet out of a draw() method for drawing # “bulls eye” cross hairs at the mouse position # on a plot. plot = self.component if plot is None: return # sx, sy: mouse position in screen space. sx, sy = plot.map_screen(self.current_position) if plot.orientation == "h" and sx is not None: self._draw_vertical_line(gc, sx) elif sy is not None: self._draw_horizontal_line(gc, sy) if self.show_marker: self._draw_marker(gc, sx, sy) Variable and Function Names, Good Design, Clear Layout 400 Documentation Layers Application Documentation End User Manuals, Domain specific Tutorials, etc. Developer Documentation Architecture Description, Concepts, Developer Tutorials API Documentation Doc-strings for functions, classes, and modules Inline Code Comments Describe intent of code or algorithm design Self Documenting Code Variable and Function Names, Good Design, Clear Layout INLINE CODE COMMENTS # Code snippet out of a Enthought Envisage # library written by Martin Chilvers. def unregister_services(self): """ Unregister any service offered by the plugin. """ # Services must be unregistered in reverse # order that they were registered in. service_ids = self._service_ids[:] service_ids.reverse() app = self.application for service_id in service_ids: app.unregister_service(service_id) # Reset services in case the plugin is # started again. self._service_ids = [] 401 Documentation Layers Application Documentation API DOCUMENTATION [FROM NUMPY] End User Manuals, Domain specific Tutorials, etc. def interp(x, xp, fp, left=None, right=None): """ One-dimensional linear interpolation. Developer Documentation Returns the one-dimensional piecewise linear interpolant to a function with given values at discrete data-points. Architecture Description, Concepts, Developer Tutorials API Documentation Doc-strings for functions, classes, and modules Inline Code Comments Describe intent of code or algorithm design Self Documenting Code Variable and Function Names, Good Design, Clear Layout Parameters ---------x : array_like The x-coordinates of the interpolated values. xp : 1-D sequence of floats The x-coordinates of the data points, must be increasing. Returns ------y : {float, ndarray} Interpolated values, same shape as `x`. 402 """ Documentation Layers Application Documentation End User Manuals, Domain specific Tutorials, etc. 
DEVELOPER DOCUMENTATION Provides a high level explanation of concepts and use of a library. Developer Documentation Architecture Description, Concepts, Developer Tutorials API Documentation Doc-strings for functions, classes, and modules Inline Code Comments Describe intent of code or algorithm design Self Documenting Code Variable and Function Names, Good Design, Clear Layout 403 Documentation Layers Application Documentation APPLICATION DOCUMENTATION End User Manuals, Domain specific Tutorials, etc. Developer Documentation Architecture Description, Concepts, Developer Tutorials API Documentation Doc-strings for functions, classes, and modules Inline Code Comments Describe intent of code or algorithm design Self Documenting Code Variable and Function Names, Good Design, Clear Layout 404 Documentation Layers Application Documentation End User Manuals, Domain specific Tutorials, etc. Developer Documentation Architecture Description, Concepts, Developer Tutorials OUR FOCUS • Writing useful inline code comments • Creating function, class, and module level documentation. API Documentation Doc-strings for functions, classes, and modules Inline Code Comments Describe intent of code or algorithm design Self Documenting Code Variable and Function Names, Good Design, Clear Layout 405 Bad Code Comments REPEATING THE CODE Comments that just parrot code are pretty much useless. # Check if the printer is ready. if printer_status == 'ready': document.print() INCORRECT COMMENTS This comment is not even accurate. It likely got out of sync with the code when the bank implemented a minimum balance policy and the comment wasn’t updated. # Flag withdrawals that cause # customer balance to become # negative. new_balance = balance - withdrawal if new_balance < min_allowed_balance: success = False 406 Better Code Comments SUMMARY COMMENTS FIXME COMMENTS Summarizing a few lines with a description of the code’s intent is useful. Flag design decisions and trade-offs that others should be aware of when editing code in the future. # Solve the dense linear system # ZI=V for the currents I, given # the impedance matrix Z and the # driving voltage V. lu_matrix, pivot = lu_factor(Z) I = lu_solve((lu_matrix, pivot), V) # FIXME: Sales tax hard coded to # 8.25%. This should be passed in # or looked up with a function # call. price_total = price * (1.0825) 407 Function Docstring Format TYPICAL DOCUMENTATION FORMAT – FOO.PY """ Short docstring for the module goes here. A longer description for the module goes it here. is typically multiple lines long. """ class Foo(object): """ Short docstring for Foo class. Longer docstring describing the Foo class. """ def some_method(self): """ Short description for some_method. It See the foo.py example in the demo/docstrings directory. Longer description for some_method... """ pass def bar(): """ Short docstring for bar function. And, not surprisingly, the long description for the function. """ pass 408 Command line help IPYTHON HELP PRINTS DOCSTRINGS In [12]: import foo In [13]: foo? Type: module Base Class: String Form: Namespace: Interactive File: c:\temp\foo.py Docstring: Short docstring for the module goes here. A longer description for the module goes here. is typically multiple lines long. It 409 Styles of Documentation SIMPLE IS AS SIMPLE DOES Use simple docstrings for simple functions. 
# from the Python standard library: locale.py def str(val): """Convert float to integer, taking the locale into account.""" return format("%.12g", val) 411 Styles of Documentation FORMAL DOCUMENTATION SPECIFICATION def interp(x, xp, fp, left=None, right=None): """ One-dimensional linear interpolation. Returns the one-dimensional piecewise linear interpolant to a function with given values at discrete data-points. Notes ===== Does not check that the x-coordinate sequence C{xp} is increasing. If C{xp} is not increasing, the results are nonsense. A simple check for increasingness is:: np.all(np.diff(xp) > 0) Examples ======== >>> xp = [1, 2, 3] >>> fp = [3, 2, 0] >>> np.interp(2.5, xp, fp) 1.0 >>> np.interp([0, 1, 1.5, 2.72, 3.14], xp, fp) array([ 3. , 3. , 2.5, 0.56, 0. ]) >>> UNDEF = -99.0 >>> np.interp(3.14, xp, fp, right=UNDEF) -99.0 @param x : array_like. The x-coordinates of the interpolated values. @param xp : 1-D sequence of floats. The x-coordinates of the data points, must be increasing. @param fp : 1-D sequence of floats. The y-coordinates of the data points, same length as C{xp}. @param left : float, optional. Value to return for C{x < xp[0]}, default is C{fp[0]}. @param right : float, optional. Value to return for C{x > xp[-1]}, defaults is C{fp[-1]}. @return: float or ndarray. The interpolated values, same shape as C{x}. @raise ValueError: if C{xp} and C{fp} have different lengths """ 412 Formal Specifications BENEFITS • Formalism helps developers know what to document. • Uniform look to documentation. • Tools like Sphinx and Epydoc can generate nice HTML and PDF docs automatically. DRAWBACKS • Can add (significant) overhead for creating new functions. • Overhead can inhibit developers from re-factoring and creating new functions when they should. 413 Restructured Text Docstring Format http://epydoc.sourceforge.net/othermarkup.html#restructuredtext http://docutils.sourceforge.net/docs/user/rst/quickstart.html __docformat__ = "restructuredtext" def interp(x, xp, fp, left=None, right=None): """ One-dimensional linear interpolation. Returns the one-dimensional piecewise linear interpolant to a function with given values at discrete data-points. :Parameters: x : array_like. The x-coordinates of the interpolated values. xp : 1-D sequence of floats. The x-coordinates of the data points, must be increasing. :return: float or ndarray. The interpolated values, same shape as C{x}. :raises ValueError: if C{xp} and C{fp} have different lengths Notes ===== Does not check that the x-coordinate sequence C{xp} is increasing. If C{xp} is not increasing, the results are nonsense. A simple check for increasingness is:: np.all(np.diff(xp) > 0) Examples ======== >>> xp = [1, 2, 3] >>> fp = [3, 2, 0] >>> np.interp(2.5, xp, fp) 1.0 >>> np.interp([0, 1, 1.5, 2.72, 3.14], xp, fp) array([ 3. , 3. , 2.5, 0.56, 0. ]) >>> UNDEF = -99.0 >>> np.interp(3.14, xp, fp, right=UNDEF) -99.0 fp : 1-D sequence of floats. The y-coordinates of the data points, same length as C{xp}. left : float, optional. Value to return for C{x < xp[0]}, default is C{fp[0]}. right : float, optional. Value to return for C{x > xp[-1]}, defaults is C{fp[-1]}. """ 417 Tips for Restructured Text EXAMPLE CODE PARAMETER DESCRIPTIONS Precede by '::' and a blank line. Parameter lists are “definition lists”, so every item must have a separate description line. 
""" Decorator to turn a codeblock inside of a function into a string:: @func2str def code(): c = a + b d = a - b """ Result: Decorator to turn a code-block inside of a function into a string. x_ctrl : float X-value of the control point. y_ctrl : float Y-value of the control point. Not this ("unexpected indentation" error): x_ctrl : float y_ctrl : float The control point. @func2str def code(): c = a + b d = a -b 418 Tips for Restructured Text MIXING LIST SYNTAX This example looks nice in the source file, but it is mixing syntax for bullet lists and definition lists. It gets an indentation error. - object: The value is treated as a normal Python object and is not modified. - file: The value is a file, and supports various types of conversions. This can be made into valid ReST as a bullet list or as a definition list. BULLET LISTS - object: The value is treated as a normal Python object and is not modified. - file: The value is a file, and supports various types of conversions. OBJECT LISTS object The value is treated as a normal Python object and is not modified. file The value is a file, and supports various types of conversions. 419 Result with Sphinx 420 Line Count Metrics for Documentation PORTION OF CODE THAT SHOULD BE COMMENTS Number for several Python projects using cloc: http://cloc.sourceforge.net/ $ ./cloc-1.56.pl numpy_trunk/ 1023 text files. 1006 unique files. 385 files ignored. http://cloc.sourceforge.net v 1.56 T=7.0 s (96.4 files/s, 53083.3 lines/s) -------------------------------------------------------------------------------Language files blank comment code -------------------------------------------------------------------------------C 81 22565 41042 120573 Python 424 24915 54760 83183 C/C++ Header 93 3064 3082 10007 Cython 6 940 2422 1523 C++ 8 52 116 643 make 9 100 74 457 HTML 5 47 15 403 CSS 3 134 46 369 ~39 % Comments Fortran 77 22 11 65 298 Fortran 90 15 56 20 200 sed 1 0 12 140 Bourne Shell 7 22 23 92 Bourne Again Shell 1 13 12 87 -------------------------------------------------------------------------------SUM: 675 51919 101689 217975 -------------------------------------------------------------------------------- Result: SciPy: 32%, NumPy: 39%, ETS: 38% 421 Coding Standards 422 Foolish Consistencies… A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines. With consistency a great soul has simply nothing to do. He may as well concern himself with his shadow on the wall. Out upon your guarded lips! Sew them up with packthread, do. Else if you would be a man speak what you think to-day in words as hard as cannon balls, and to-morrow speak what to-morrow thinks in hard words again, though it contradict every thing you said to-day. Ah, then, exclaim the aged ladies, you shall be sure to be misunderstood! Misunderstood! It is a right fool’s word. Is it so bad then to be misunderstood? Pythagoras was misunderstood, and Socrates, and Jesus, and Luther, and Copernicus, and Galileo, and Newton, and every pure and wise spirit that ever took flesh. To be great is to be misunderstood. From the essay “Self Reliance,” by Ralph Waldo Emerson Quoted in “PEP 8: Style Guide for Python,” by Guido Von Rossum and Barry Warsaw 423 Foolish Consistencies… True. But this statement is too often used to get rid of good consistencies. Show your creativity and personal style in your algorithms, not your code layout. Code consistencies make sharing and debugging much more efficient. 
424 Foolish Consistencies… Ralph doesn’t differentiate between good and foolish consistencies, but Guido does. From the Python Coding Standard*: [There are] two good reasons to break a particular [coding standard] rule: (1) When applying the rule would make the code less readable, even for someone who is used to reading code that follows the rules. [editorial addition: Think about others, not you.] (2) To be consistent with surrounding code that also breaks it (maybe for historic reasons) -- although this is also an opportunity to clean up someone else's mess (in true XP style). * http://www.python.org/dev/peps/pep-0008/ 425 Python Coding Standard The Python Coding Standard is defined in Python Enhancement Proposal 8* (PEP-8). * http://www.python.org/dev/peps/pep-0008/ 426 Indentation and Spaces INDENTATION TABS OR SPACES? Always use 4 spaces to indent code blocks. Never mix tabs and spaces in a file. def sum(a): total = 0 for value in a: total += a return total 4 spaces each The “Tab Nanny” tool shipped with Python can check this. # mixed.py # This file contains tabs and spaces def foo(a): print 'tabs' print 'spaces' C:>c:\python25\lib\tabnanny.py mixed.py mixed.py 5 " print 'spaces'\n" Most editors (emacs, vim, etc.) have a python mode that will insert 4 spaces when you type . 427 Line Lengths and Wrapping 80 CHARACTER LINES Limit line lengths to 80 characters. Emacs and many other editors wrap at 80 characters by default. # Use ‘\’ at the end of lines to continue on the next line. from numpy import array, sin, cos, allclose, zeros, ones, \ uint32, float32 # Code enclosed in () or [] or {} does not need the ‘\’. if (instrument_volume > VOLUME_THRESHOLD and instrument_price < PRICE_THRESHOLD): # Align function arguments with beginning of function args. a, b = some_long_function_call_name(with_long_var1, and_long_var2) 428 Line Spacing TOP LEVEL FUNCTIONS/CLASSES # Separate Top level functions # and classes with two blank # lines. CLASS METHODS # Separate class methods by a # single line class Person(object): def add(a, b): total = a + b return total 2 blank lines def __init__(self, first, last): self.name = name def full_name(self): return self.first + \ self.last class Person(object): def __init__(self, first, last): self.first = first self.last = last def __repr__(self): string = "Person(%s)" % \ self.full_name class City(object): pass def __init__(self, name, state): self.name = name self.state = state 1 blank line 429 Line Spacing GROUPING FUNCTIONS ONE LINER FUNCTIONS # Extra blank lines may be used # (sparingly) to separate groups # of related functions. # Lines can be omitted between # one-liner implementations (such # as dummy functions). #--------------------------------# Math functions #--------------------------------- def stub1(): pass def stub2(): pass def stub3(): pass def add(a, b): total = a + b return total def subtract(a, b): difference = a – b return difference extra space #--------------------------------# Name functions #--------------------------------def full_name(first, last): full = first + last return full 430 Line Spacing FUNCTION BODIES # Use blank lines (sparingly) in # functions to indicate logical # sections. def blend_color_channel(c1, c2, alpha): # Validation if not 0 <= alpha <= 1.0: raise ValueError, "bad alpha" # Channel blending algorithm. new_c = c1 * alpha + \ c2 * (1 - alpha) extra space return new_c 431 Imports SEPARATE LINES # Imports should usually be on # separate lines. # Like this. import os import sys # NOT like this. 
import os, sys IMPORT FROM A MODULE # It is OK (and encouraged) to have # have items imported from a module # on the same line. from numpy import array, float32 AVOID IMPORT * # Do not use import * except at # the command prompt. # Only at the command prompt... from numpy import * 432 Imports LOCATION IN FILE # # # # Imports should happen at the top of a file so that others can quickly see the dependencies for a module. # foo.py import os import sys def some_function(): pass ... LOCAL IMPORTS # # # # It is OCCASIONALLY useful to break this rule to have “optional” dependencies in a module. Be careful about this. #---------------------------------# Statistical Functions #---------------------------------def mean(a): ... def std(a): ... #---------------------------------# Statistical Plot Functions #---------------------------------def histogram_plot(a): from pylab import plot ... 433 Imports GROUPING IMPORTS # # # # # # PEP-8 Group imports in the following order: 1. Standard library imports 2. Related 3rd party imports 3. Application specific imports import os import sys import wx import pricing_models LOCAL IMPORTS # # # # # # # At of 1. 2. 3. 4. 5. Enthought, we expand the number groupings and label them. Standard library imports Math/science libraries Related 3rd party imports Enthought specific libraries Application specific imports # Standard Imports import os import sys # Math Imports from numpy import array # 3rd Party Imports import wx # Enthought Imports from enthought.traits.api import Int # Application Imports import pricing_models 434 Absolute Imports USE ABSOLUTE IMPORTS Example directory structure for Python libraries. C:/ # # # # # PEP-8 Relative imports of intra-package imports are highly discouraged. Use absolute package path for all imports. python_library/ package/ # File: another_module.py __init__.py some_module.py another_module.py tests/ test_some_module.py test_another_module.py # Recommended from package.some_module import func # Discouraged from some_module import func ... 435 Avoid Extraneous White Space INSIDE (), [], OR {} SLICING AND INDEXING # Yes spam(ham[1], {eggs: 2}) # Yes dict['key'] = list[index] # No spam( ham[ 1 ], { eggs: 2 } ) # No dict ['key'] = list [index] BEFORE COMMAS AND COLONS # Yes if x == 4: print x, y # No If x == 4 : print x , y FUNCTION CALLS # Yes spam(1) # No spam (1) AROUND ASSIGNMENT # PEP-8 encourages the first # Yes # No extra space x = 1 y = 2 long_variable = 3 # No # Align '=' character x = 1 y = 2 long_variable = 3 436 Whitespace AROUND BINARY OPERATORS FUNCTION ARGUMENTS # Binary operators are surrounded by # a space on either side. # Don’t put spaces around ‘=‘ in # keyword arguments. # Yes if a > 1: b = 2 * (a + 1) # Yes def quad(x, a=0, b=1, c=0): y = a * x**2 + b * x + c return y # No if a> 1: b= 2 * (a+1) # No def quad(x, a = 0, b = 1, c = 0): y = a * x**2 + b * x + c return y 437 Compound Statements COMPOUND STATEMENTS # Compound statements are discouraged # Yes if foo == 'blah': do_something() func1() func2() func3() # No if foo == 'blah': do_something() func1(); func2(); func3(); 438 Naming Conventions VARIABLES GLOBAL VARIABLES # Use lower_case notation without # capital letters for variables in # your functions, methods, and scripts. # Use UPPER_CASE notation for # module level “constants” or # “static” class variables. 
# Yes lower_case = 3 # File: foo.py # No CamelCase = 3 mixedCase = 3 Mixed_Case = 3 # Globals # Yes UPPER_CASE = 3 # No CamelCase = 3 mixedCase = 3 Mixed_Case = 3 def some_function(a): result = a * UPPER_CASE return result 439 Naming Conventions FUNCTIONS AND METHODS # # # # function and methods should use lower_case names with underscores. Do not use the Java convention of mixedCase. CLASSES # # # # Class names are CamelCase. If there is one class in a file and class is named FooBar, the file is named foo_bar. # file: foo_bar.py # Yes def some_function(a): pass # No def SomeFunction(a): pass def someFunction(a): pass # Yes class FooBar(object): def some_method(self): pass # No class foo_bar(object): def some_method(self): pass 440 Naming Conventions PRIVATE CLASSES AND FUNCTIONS # Names of ‘private’ classes, methods # and functions start with a single # underscore. def _private_function(a): pass def _PrivateClass(object): pass 441 Naming Packages and Modules MODULE NAMES PACKAGE NAMES # Module names should be lower case # with underscores. # # # # # Yes foo_bar.py # No FooBar.py Package directories should be all lower case alpha-numeric characters. Avoid underscores unless absolutely necessary. # Yes packagename # No PackageName package_name 442 Using Future Python can turn on planned features for future versions of Python that are backward incompatible. Here are several commonly used future features: from __future__ import division, absolute_import, with_statement • division In “old” Python, 1/2 would become 0 because it used “integer” division. In Python 3.0, 1/2 will be 0.50. • with_statement Enable the with statement found in Python 2.5 and beyond. • absolute_import Distinguish between absolute and relative module imports. import foo imports from the top level of the module namespace. import .foo imports from the same package as this module. 443 Using Future All future_statements must appear near the top of the module. The only lines that can appear before a future_statement are: The module docstring (if any). Comments. Blank lines. Other future_statements. FUTURE EXAMPLE """ This is a module docstring. """ # This is a comment, preceded by a blank line and followed by # a future_statement. from __future__ import with_statement from math import sin from __future__ import division # compile-time error! # ERROR because it is preceded by a non-future_statement. 444 Common Directory Structure PACKAGE/MODULE/TESTS LAYOUT Example directory structure for Python libraries. C:/ python_library/ yourpackage/ __init__.py some_module.py another_module.py tests/ test_some_module.py test_another_module.py The tests for module have are named similarly (test_ prefix) but live “one level below” in a tests directory. 445 Functions and Refactoring Code 446 Evolution of a script Useful software often starts its life as a script. EXPLORATORY SCRIPT # Read data files into some structure. for file in files: ... # Check for errors in data. if data > bad_value: raise ValueError ... # Execute one or more algorithms on the data. important_number = data * 2 + blah... ... # Create a report about the results. print important_number ... 448 To A Function Evolves to a function… ONE MEGA-FUNCTION TO BIND THEM ALL def display_data_report(files): # Read data files into some structure. for file in files: ... # Check for errors in data. if data > bad_value: raise ValueError ... # Execute one or more algorithms on the data. important_number = data * 2 + blah ... # Create a report about the results. 
print important_number ... 449 Evolution Stops And Stops… There are some short term benefits to this. MEGA-FUNCTION BENEFITS • Easy (quick) to create from original script. • Easy to read and modify during construction. • All the code is “in one place.” • Access to all variables at any time. (Global namespaces are nice that way.) • Achieves some re-use. 450 Evolution to a library Long term benefits come from continuing to “refactor” function until there is “one idea per function.” LOW LEVEL FUNCTION LIBRARY def data_from_files(files): # Read data files into a structure. for file in files: ... def check_for_errors(data): # Check for errors in data. if data > bad_value: raise ValueError ... DRIVER FUNCTIONS def display_data_report(files): """ "Driver" function that calls the low level library functions. """ data = data_from_files(files) check_for_errors(data) res = calc_important_number(data) create_report(data, res) def calc_important_number(data): # Execute one or more algorithms. important_number = data * 2 ... def create_report(data, calc_data): # Create a report about results. print important_number ... 451 “One Idea Per Function” Benefits • Smaller, less complex snippets of code that are easier for others (and you) to read code in the future. • Builds up a set of re-usable functions. • Easier for others to modify behavior without modifying original code. • Better Potential for Testing. 452 Avoid Duplicate Code DUPLICATED CODE instrument1_prices = lookup_price(instrument1, start_date, stop_date) instrument1_price_avg = mean(instrument1_prices) print_summary(instrument1, instrument1_prices, instrument1_price_avg) instrument2_prices = lookup_price(instrument2, start_date, stop_date) instrument2_price_avg = mean(instrument2_prices) print_summary(instrument2, instrument2_prices, instrument2_price_avg) REFACTOR TO USE A FUNCTION # refactor code so duplicated lines are in a function. def summarize_price_info(instrument, start_date, stop_date): instrument_prices = lookup_price(instrument, start_date, stop_date) instrument_price_avg = mean(instrument_prices) print_summary(instrument, instrument_prices, instrument_price_avg) # Now call the function for the two different instruments. summarize_price_info(instrument1, start_date, stop_date) summarize_price_info(instrument2, start_date, stop_date) 453 Simplify Complex Conditionals COMPLEX CONDITIONAL if ( isinstance(float, value) and sqrt(value**2+other_value**2) > 21 and value < 5): do_something_important(value) REFACTOR TO USE A FUNCTION # Refactor code so the complex expression is in a function. def is_important(value, other_value): important = isinstance(float, value) \ and sqrt(value**2+other_value**2) > 21 \ and value < 5 return important if is_important(value, other_value): do_something_important(value) 454 Useful Reference 455 Testing Code 456 Test Driven Development Overarching Concept: Write Tests First. 
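A note on the "Simplify Complex Conditionals" example a few slides back: isinstance() takes the object first and the type second. A self-contained, runnable sketch of the refactored check is shown below; the sample values in the print statements are made up for illustration.

from math import sqrt

def is_important(value, other_value):
    # The object comes first, then the type: isinstance(value, float).
    return (isinstance(value, float)
            and sqrt(value**2 + other_value**2) > 21
            and value < 5)

print is_important(3.0, 25.0)   # True
print is_important(3, 25.0)     # False: 3 is an int, not a float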
458 Useful Reference 459 Test Driven Development PROBLEM DEFINITION pt2 (x2, y2) dy pt1 (x1, y1) dx slope = dy/dx 460 Test Driven Development TEST # test_fancy_math.py from nose.tools import assert_almost_equal from fancy_math import slope def test_slope(): pt1 = [0.0, 0.0] pt2 = [1.0, 2.0] s = slope(pt1, pt2) assert_almost_equal(s, 2) pt2 (x2, y2) dy pt1 (x1, y1) dx slope = dy/dx 461 Test Driven Development TEST # test_fancy_math.py from nose.tools import assert_almost_equal from fancy_math import slope def test_slope(): pt1 = [0.0, 0.0] pt2 = [1.0, 2.0] s = slope(pt1, pt2) assert_almost_equal(s, 2) pt2 (x2, y2) dy pt1 (x1, y1) dx slope = dy/dx FANCY_MATH # Simplest function that passes, tests… def slope(pt1, pt2): return 2 462 Test Driven Development TEST # test_fancy_math.py from nose.tools import assert_almost_equal from fancy_math import slope def test_slope(): pt1 = [0.0, 0.0] pt2 = [1.0, 2.0] s = slope(pt1, pt2) assert_almost_equal(s, 2) pt2 (x2, y2) dy pt1 (x1, y1) dx slope = dy/dx FANCY_MATH # Simplest function that passes, tests… def slope(pt1, pt2): return 2 Smart Alec... 463 TDD Rationale Build only the features you need. (YAGNI Principle – you ain’t gonna need it) All the features are tested. You are the first consumer of your API. (Helps in design process) 464 Running Nosetests FROM IPYTHON In [27]: ls Volume in drive W is Shared Folders Volume Serial Number is 0000-0064 Directory of w:\...\slope 09/29/2008 09/29/2008 09/29/2008 09/29/2008 2 File(s) 11:41 11:30 11:37 11:41 PM PM PM PM 2 Dir(s) . .. 123 fancy_math.py 195 test_fancy_math.py 48,067,092,480 bytes free In [28]: !nosetests -v test_fancy_math.test_slope ... ok ------------------------------------------------------------------Ran 1 test in 0.000s 465 OK Using Unittest Framework TESTS EMBEDDED IN CLASSES # Test cases: # - each method is a unit test # - found by nosetests as well from unittest import TestCase from fancy_math import * class TestModule(TestCase): def test_slope(self): pt1 = [0.0, 0.0] pt2 = [1.0, 2.0] s = slope(pt1, pt2) self.assertAlmostEqual(s, 2) 466 Doctests TESTS EMBEDDED IN DOCSTRINGS def factorial(n): """ Return the product of all positive integers less than or equal to n. Example: >>> factorial(3) 6 >>> factorial(4) 24 """ if n <= 1: result = 1 else: result = n * factorial(n-1) return result if __name__ == "__main__": import doctest doctest.testmod() Optional 467 Monitoring execution 468 The logging module • Key part of good software craftsmanship • Part of the standard library • Structure of a logging system: 1. 2. 3. 4. 5. 
Create a logger instance Add 1 or more handlers (console, file, http, …) Set a level for these handlers or the logger [OPTIONAL] Control the formatting Add logger calls throughout the code 469 Simple Example from logging import getLogger, FileHandler, StreamHandler logger = getLogger() # Root logger console_handler = StreamHandler() logger.addHandler(console_handler) file_handler = FileHandler("log.txt”) logger.addHandler(file_handler) logger.debug("Debugging level message") # Hidden More numerous logger.info("Informative level message") # Hidden logger.warning("Warning level message") Warning level message logger.error("Error level message") Error level message More critical 470 More complex example from logging import getLogger, StreamHandler, DEBUG, INFO, WARNING, ERROR from logging.handlers import RotatingFileHandler logger = getLogger(__name__) # local file’s logger console_handler = logging.StreamHandler() logger.addHandler(console_handler) # File handler that is a rotating file file_handler = RotatingFileHandler("log.txt", maxBytes=1024, backupCount=3) formatter = logging.Formatter("%(levelname)s - %(asctime)s - %(name)s " "- %(message)s") file_handler.setFormatter(formatter) logger.addHandler(file_handler) logger.setLevel(DEBUG) # highly sensitive logger logger.debug("Debugging level message") DEBUG - 2012-10-16 23:08:03,592 – package.analysis_module - Debugging level message 471 Ways to time execution Timing inside the code (Good) • the time module from std lib Timing in ipython (Better): • %timeit “magic command” • -t option of ‘run’ (optionally –N also) Profiling the code (Best): • cProfile or line_profiler package 472 Timing in python USE TIME PACKAGE import time from numpy.random import randn from numpy import linspace, pi, exp, sin from scipy.optimize import leastsq def func(x,A,a,f,phi): return A*exp(-a*sin(f*x+phi)) def errfunc(params, x, data): return func(x, *params) - data start = time.time() params0 = [1,1,1,1] x = linspace(0,2*pi,25) ptrue = [3,2,1,pi/4] true = func(x, *ptrue) noisy = true + 0.3*randn(len(x)) pmin,ierr = leastsq(errfunc, params0, args=(x, noisy)) print “Total: %f s” %(time.time()-start) USE IPYTHON TOOLS # For a script >>> run –t [-N10] test.py IPython CPU timings (estimated): User : 1.10 s. System : 0.00 s. Wall time: 1.11 s. # For a function call >>> import random as rand >>> import numpy.random as \ nprand >>> %timeit rand.randint(0,10) 100000 loops, best of 3: 2.02 us per loop >>> %timeit nprand.randint(10) 10000000 loops, best of 3: 178 ns per loop 473 Timing and Profiling Code 474 Ways to time execution Timing inside the code (Good) • the time module from std lib Timing in ipython (Better): • %timeit “magic command” • -t option of ‘run’ (optionally –N also) Profiling the code (Best): • cProfile or line_profiler package 475 Timing in python USE TIME PACKAGE import time from numpy.random import randn from numpy import linspace, pi, exp, sin from scipy.optimize import leastsq def func(x,A,a,f,phi): return A*exp(-a*sin(f*x+phi)) def errfunc(params, x, data): return func(x, *params) - data start = time.time() params0 = [1,1,1,1] x = linspace(0,2*pi,25) ptrue = [3,2,1,pi/4] true = func(x, *ptrue) noisy = true + 0.3*randn(len(x)) pmin,ierr = leastsq(errfunc, params0, args=(x, noisy)) print “Total: %f s” %(time.time()-start) USE IPYTHON TOOLS # For a script >>> run –t [-N10] test.py IPython CPU timings (estimated): User : 1.10 s. System : 0.00 s. Wall time: 1.11 s. 
# For a function call >>> import random as rand >>> import numpy.random as \ nprand >>> %timeit rand.randint(0,10) 100000 loops, best of 3: 2.02 us per loop >>> %timeit nprand.randint(10) 10000000 loops, best of 3: 178 ns per loop 476 Profiling with cProfile cProfile (and its pure python version, profile) are profiling tools in the standard library. WORKFLOW The cProfile workflow has two main steps: 1. Run the code to be profiled via the cProfile’s run() (or runctx()) function or using the kernprof.py script. This counts and times function calls, and generates a profiling dataset. 1. Process and display the profile data. In the simplest case (e.g. cProfile.run('foo()')), a predefined report is generated and printed. For finer control, you can save the raw data to a file and process it using the pstats module. 477 Profiling with kernprof.py From the command line (if the line_profiler package is installed): $ kernprof.py datafit.py Wrote profile results to datafit.py.prof $ python -m pstats Welcome to the profile statistics browser. % read datafit.py.prof datafit.py.prof% strip datafit.py.prof% sort time datafit.py.prof% stats 20 (...) ncalls tottime percall filename:lineno(function) (...) cumtime percall 1 0.299 0.299 0.363 0.363 _core.py:4() 1 0.130 0.130 0.635 0.635 mpl.py:1() 1 0.115 0.115 0.482 0.482 __init__.py:14() 12 0.114 0.009 0.415 0.035 __init__.py:1() 478 Profiling from iPython EXAMPLE >>> >>> ... ... ... ... ... ... >>> >>> import time def func(n): if n < 0: return time.sleep(0.1*n) func(n-1) return import cProfile cProfile.run('func(3)') 11 function calls (7 primitive calls) in 0.601 seconds Ordered by: standard name ncalls 5/1 1 1 4 tottime 0.000 0.000 0.000 0.600 percall 0.000 0.000 0.000 0.150 cumtime 0.601 0.601 0.000 0.600 percall 0.601 0.601 0.000 0.150 filename:lineno(function) :1(func) :1() {method 'disable' of … {time.sleep} 479 Profiling with cProfile OUTPUT TABLE COLUMNS ncalls The number of calls. Counts of the form N/M indicate N actual calls (including recursive calls), and M 'primitive' (nonrecursive) calls. tottime The total time spent in the given function (and excluding time made in calls to sub-functions). percall The quotient of tottime divided by ncalls. cumtime The total time spent in this and all subfunctions (from invocation until exit). This figure is accurate even for recursive functions. percall The quotient of cumtime divided by primitive calls. filename:lineno(function) Provides the respective data of each function. 480 Profiling with cProfile USE PSTATS TO CONTROL THE TABLE # Save the raw profile data to a file. >>> cProfile.run('func(3)', filename='func.prof') # Use pstats.Stats to manipulate the data. >>> import pstats >>> stats = pstats.Stats('func.prof') # Clean up the filenames. >>> stats.strip_dirs() # Sort by the cumulative time. >>> stats.sort_stats('cumulative') # Print the table. 
>>> stats.print_stats() 481 Profiling with cProfile USE PSTATS TO CONTROL THE TABLE - CONTINUED Wed May 4 14:56:13 2011 func.prof 11 function calls (7 primitive calls) in 0.600 seconds Ordered by: cumulative time ncalls tottime percall filename:lineno(function) cumtime percall 1 0.000 0.000 0.600 0.600 :1() 5/1 0.000 0.000 0.600 0.600 :1(func) 4 0.600 0.150 0.600 0.150 {time.sleep} 1 0.000 0.000 0.000 0.000 {method 'disable' of …} 482 Profiling with cProfile MORE OUTPUT CONTROL Arguments to Stats.sort_stats(…) Valid Argument Meaning 'calls' call count 'cumulative' cumulative time 'file' file name 'module' file name 'pcalls' primitive call count 'line' line number 'name' function name 'nfl' name/file/line 'stdname' standard name 'time' internal time 483 Profiling with cProfile MORE OUTPUT CONTROL Arguments to Stats.print_stats([restriction, …]) allow to filter the output: Each restriction is either: • an integer to select the number of lines to output; • a decimal fraction between 0.0 and 1.0 to select a percentage of lines; or • a regular expression to match the filename. 484 Profiling with cProfile A FEW USEFUL OUTPUT CASES # Find out what is taking the most time: >>> stats.sort_stats('cumulative').print_stats(10) # Find functions in which the program is spending the # the most time (not including calls to other functions): >>> stats.sort_stats('time').print_stats(10) # Check how often __init__ is called in all the modules # (sorted by name): >>> stats.sort_stats('file').print_stats('__init__') 485 Profiling with cProfile MORE DOCUMENTATION Python standard library documentation: • http://docs.python.org/library/profile.html Doug Hellman's "Python Module of the Week" blog: • http://blog.doughellmann.com/2008/08/pymotw-profile-cprofile-pstats.html 486 line_profiler and kernprof line_profiler and kernprof are profiling tools developed by Robert Kern. • line_profiler is a module for doing line-by-line profiling of functions. • kernprof is a convenient script for running either line_profiler or the standard library's cProfile module. INSTALLATION $ easy_install line_profiler TYPICAL WORKFLOW 1. Decorate the functions to be profiled with @profile. 2. Run your script using kernprof.py with the –l option. For example, $ kernprof.py –l script_to_profile.py 3. Run the line_profiler module to display the results. For example, $ python –m line_profiler script.to_profile.py.lprof 4. Adjust your code, and repeat steps 2-4. 5. Remove the @profile decorators. 487 line_profiler example http_search.py import re PATTERN = r"https?:\/\/[\w\-_]+(\.[\w\-_]+)+([\w\\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?" @profile def scan_for_http(f): addresses = [] for line in f: result = re.search(PATTERN, line) if result: addresses.append(result.group(0)) return addresses if __name__ == "__main__": import sys f = open(sys.argv[1], 'r') addresses = scan_for_http(f) for address in addresses: print address See demo/profiling directory for code. 
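One practical note, not shown in the demo code: the @profile decorator only exists when the script runs under kernprof, which injects profile as a builtin. If you also want to run http_search.py directly with plain python, a small fallback like this sketch near the top of the script keeps it importable either way.

try:
    profile                      # defined as a builtin by kernprof
except NameError:
    def profile(func):           # no-op fallback for normal runs
        return func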
488 line_profiler example Run kernprof.py and line_profiler $ kernprof.py –l http_search.py sample.html … http://sphinx.pocoo.org/ Wrote profile results to http_search.py.lprof $ python –m line_profiler http_search.py.lprof Timer unit: 1e-06 s File: http_search.py Function: scan_for_http at line 6 Total time: 0.016079 s Line # Hits Time Per Hit % Time Line Contents ============================================================== 6 @profile 7 def scan_for_http(f): 8 1 3 3.0 0.0 addresses = [] 9 1350 2080 1.5 12.9 for line in f: 10 1349 12417 9.2 77.2 result = re.search(PATTERN, line) 11 1349 1513 1.1 9.4 if result: 12 39 65 1.7 0.4 addresses.append(result.group(0)) 13 1 1 1.0 0.0 return addresses 489 line_profiler example http_search2.py import re PATTERN = r"https?:\/\/[\w\-_]+(\.[\w\-_]+)+([\w\\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?" @profile def scan_for_http(f): addresses = [] pat = re.compile(PATTERN) for line in f: result = pat.search(line) if result: addresses.append(result.group(0)) return addresses if __name__ == "__main__": import sys f = open(sys.argv[1], 'r') addresses = scan_for_http(f) for address in addresses: print address 490 line_profiler example Run kernprof.py and line_profiler on the modified file $ kernprof.py –l http_search2.py sample.html … Wrote profile results to http_search2.py.lprof $ python –m line_profiler http_search2.py.lprof Timer unit: 1e-06 s File: http_search2.py Function: scan_for_http at line 6 Total time: 0.00911 s Line # Hits Time Per Hit % Time Line Contents ============================================================== 6 @profile 7 def scan_for_http(f): 8 1 3 3.0 0.0 addresses = [] 9 1 3117 3117.0 34.2 pat = re.compile(PATTERN) 10 1350 1995 1.5 21.9 for line in f: 11 1349 2507 1.9 27.5 result = pat.search(line) 12 1349 1415 1.0 15.5 if result: 13 39 72 1.8 0.8 addresses.append(result.group(0)) 14 1 1 1.0 0.0 return addresses 491 pdb The Python debugger 492 What is pdb? pdb is part of the standard library pdb, like Python, is interactive and interpreted, allowing for the execution of arbitrary Python code in the context of any stack frame pdb can debug a “post-mortem” condition, and can also be called under program control ipdb (not in std lib) is similar but includes tab completion 493 Starting pdb Run the script from the command line under debugger control C:\> python -m pdb script.py [arg ...] • • During an iPython session, here are some of the common pdb functions used for debugging : >>> pdb.run( statement ) Execute the statement (given as a string) under debugger control >>> pdb.runcall( function[, argument, ...] ) Call the function (not a string) with the given arguments under debugger control >>> pdb.pm() Start the debugger at the point of the last exception • Inside a script or a module to hard-code a breakpoint import pdb ; pdb.set_trace() • Useful for hard-coding a breakpoint in a program even if code is not being debugged (analyzing a failed assertion, etc.) 494 pdb commands • pdb runs as an interactive session having a specific set of commands • Some of the more common pdb commands: (Pdb) h(help)[command] One of the most important! Lists all the commands available, or help on a specific command (Pdb) u(p) / d(own) Pop up or push down the execution stack (Pdb) b(reak)[[filename:]lineno | function[, condition]] Set a breakpoint at a specific file/line or function and optionally if a specific condition is met. If no args are given, list all the breakpoints & their numbers (Pdb) s(tep) / n(ext) Execute the current line only. 
step will push into a function call and next will execute the function call and move to the next statement in the current function (Pdb) a(rgs) Print the args for the current function 495 pdb commands (Pdb) l(ist) [first[, last]] List the source code at the point of execution. Args first and last set a range for the number of lines printed. No args prints 11 lines around the current line, if only first, prints 11 lines around that line. (Pdb) j(ump) lineno Jump to a line in the bottom-most frame only and execute from there. Not all jumps are possible! (Pdb) p / pp [expression] Print or “pretty print” expression in the context of the current frame (Pdb) a(lias) [name [command]] Create an alias for command named name, or list all aliases. Here are two useful aliases (especially when placed in a .pdbrc file): #print all instance variables (usage "pi classInst") alias pi for k in %1.__dict__.keys(): print "%1.",k,"=",%1.__dict__[k] #print instance variables in self alias ps pi self 496 iPython and pdb # ipython can call pdb automatically upon error In [1]: pdb Automatic pdb calling has been turned ON In [2]: import middle In [3]: middle.run() --------------------------------------------------------------------------IndexError Traceback (most recent call last) Z:\projects\Training\pdb\ Z:\projects\Training\pdb\middle.py in run() 31 """ 32 for i in range( 1, 11 ) : 33 l = make_list( i ) ---> 34 print "The middle item(s) in %s\n\tis/are %s\n" % (l, get_middle( l )) 35 Z:\projects\Training\pdb\middle.py in get_middle(item_list) 9 10 if( num_items % 2 ) : ---> 11 return item_list[half] 12 13 return item_list[(half - 1):(half + 1)] IndexError: list index out of range > z:\projects\training\pdb\middle.py(11)get_middle() -> return item_list[half] 497 Example session DEBUGGING FROM ORIGIN OF EXCEPTION ipdb> l 6 7 8 9 10 11 -> 12 13 14 15 16 def """ num_items = len( item_list ) half = num_items * 2 Print the source code around the line which raised the exception if( num_items % 2 ) : return item_list[half] return item_list[(half - 1):(half + 1)] Print the contents of the list make_list( size=0 ) : ipdb> item_list ['0'] ipdb> half 2 ipdb> c Print the value of the index ah-ha! In [4]: Return to ipython Test this with middle.py in Demos/pdb 498 Example session DEBUGGING MIDDLE.PY FROM THE START >>> import pdb >>> import middle >>> pdb.runcall( middle.run ) > z:\projects\pgtraining\pdb\middle.py(32)run() We know make_list is ok, so -> for i in range( 1, 11 ) : skip over it with “next” (Pdb) s > z:\projects\pgtraining\pdb\middle.py(33)run() -> l = make_list( i ) (Pdb) n > z:\projects\pgtraining\pdb\middle.py(34)run() -> print "The middle item(s) in %s\n\tis/are %s\n" % (l, get_middle( l )) (Pdb) s --Call-Continue to execute lines > z:\projects\pgtraining\pdb\middle.py(2)get_middle() -> def get_middle( item_list ) : until we see something (Pdb) s suspicious ... > z:\projects\pgtraining\pdb\middle.py(11)get_middle() -> return item_list[half] (Pdb) item_list ['0'] Print the contents of the list (Pdb) half 2 Print the value of the index 499 ah-ha! Debugging exercise In Demos/ directory, file_sum.py contains a class which computes the sum of every number in a file. file_sum.py has 2 bugs. 
Once fixed, the script runs like this: C:\> python file_sum.py dir.dat The sum of all numbers in dir.dat is 4355 500 Other debugging tools • ipdb (not in std lib) offers the same functionalities as pdb (set_trace allowing to march through execution) but allow more interactive exploration thanks to the tab completion like in ipython. BUT still only allow 1 line evaluations. • To do more exploration at a given point in an application, IPython can be invoked, with its embed function: from IPython import embed ; embed() It starts a normal ipython session with the namespace populated from the namespace of your application at the break point. To exit, ctrl-d. 501 Python and Other Languages: An Overview 502 Why use another language with Python? • Best of both worlds: Tested and optimized legacy code + flexibility of Python • Python as glue: Combine several components into one larger program • Speedup Python: Speed up performance critical components with a faster language • Division of labor: Let Fortran be an array processing DSL, let Python and its ecosystem handle automatic testing, file IO, text processing, data munging, GUI integration, HTTP serving, etc. 503 External language toolkit Wrapping legacy code and external libraries: • Wrap with hand-written extension module • Cython wrap C/C++ with custom interface • SWIG Automatically wrap large C/C++ libraries • f2py Automatically wrap Fortran libraries • ctypes Easy to manually wrap external libraries Speed up Python code: • Hand-written extension module • Cython Add type information to Python • Weave Speed up NumPy array expressions • Shedskin and others not covered here 504 Wrapping and Levels of Granularity Coarse-grained: • Python communicates with external program via config files and command line interface using subprocess Medium-grained: • In external program, create a handful of functions that are designed to be wrapped with Python • Wrap a select few pieces of external program and customize the interface Fine-grained: • Wrap everything using automatic tool like SWIG or f2py. • Expose the entire library and its API to Python. 505 Hand Wrapping C/C++ 506 The Wrapper Function fact.h #ifndef FACT_H #define FACT_H int fact(int n); #endif fact.c #include "fact.h" int fact(int n) { if (n <=1) return 1; else return n*fact(n-1); } WRAPPER FUNCTION static PyObject* wrap_fact(PyObject *self, PyObject *args) { /* Python-C data conversion */ int n, result; if (!PyArg_ParseTuple(args, "i", &n)) return NULL; /* C Function Call */ result = fact(n); /* C->Python data conversion */ return Py_BuildValue("i", result); } 507 Complete Example /* Must include Python.h before any standard headers! */ #include "Python.h" #include "fact.h" /* Define the wrapper functions exposed to Python (must be static) */ static PyObject* wrap_fact(PyObject *self, PyObject *args) { int n, result; /* Python->C Conversion */ if (!PyArg_ParseTuple(args, "i", &n)) return NULL; /* Call our function */ result = fact(n); /* C->Python Conversion */ return Py_BuildValue("i", result); } /* Method table declaring the names of functions exposed to Python */ static PyMethodDef ExampleMethods[] = { {"fact", wrap_fact, METH_VARARGS, "Calculate the factorial of n"}, {NULL, NULL, 0, NULL} /* Sentinel */ }; /* Module initialization function called at "import example" */ PyMODINIT_FUNC initexample(void) { (void) Py_InitModule("example", ExampleMethods); } 508 Compiling/using your extension BUILD DYNAMIC LIB BY HAND (EPD ON WINDOWS) C:\...\> gcc –c fact.c fact_wrap.c –I. 
–Ic:\python27\include C:\...\> gcc -shared fact.o fact_wrap.o -Lc:\python27\libs -lpython27 -o example.pyd BUILD BY HAND (UNIX) ~\...\> gcc –c -fpic fact.c fact_wrap.c –I. –I/usr/include/python2.7 ~\...\> gcc -shared fact.o fact_wrap.o -L/usr/lib/python2.7/config -lpython2.7 -o example.so USING THE MODULE IN PYTHON In [1]: import example In [2]: dir(example) Out[2]: ['__doc__', '__file__', '__name__', '__package__', ‘fact'] Note that the include and library directories are platform specific. Also, be sure to link against the appropriate version of Python (2.7 in these examples). 509 Building using setup.py SETUP.PY # setup.py files are the Python equivalent of Make from distutils.core import setup, Extension ext = Extension(name='example', sources=['fact_wrap.c', 'fact.c']) setup(name="example", ext_modules=[ext]) CALLING SETUP SCRIPTS # Build and install the module. $ setup.py build $ setup.py install # Or build the module "in-place" to ease testing. $ setup.py build_ext –inplace --compiler=mingw32 * Blue text needed only on windows. 510 More details More details on the wrapping functions (C <-> Python): http://docs.python.org/extending More details on writing a setup.py file to incorporate extension modules: http://docs.python.org/extending/building.html Using NumPy arrays in hand-wrapped C code: http://docs.scipy.org/doc/numpy/reference/c-api.html ... and a simple example: demo/hand_wrap_ndarray/ 511 Cython 512 What is Cython? Cython is a Python-like language for writing extension modules. It allows one to mix Python with C or C++ and significantly lowers the barrier to speeding up Python code. The cython command generates a C or C++ source file from a Cython source file; the C/C++ source is then compiled into a heavily optimized extension module. Cython has built-in support for working with NumPy arrays and Python buffers, making numerical programming nearly as fast as C and (nearly) as easy as Python. http://www.cython.org/ 513 A really simple example PI.PYX # Define a function. Include type information for the argument. def multiply_by_pi(int num): return num * 3.14159265359 SETUP_PI.PY # Cython has its own "extension builder" module that knows how # to build cython files into python modules. from distutils.core import setup from distutils.extension import Extension from Cython.Distutils import build_ext ext = Extension("pi", sources=["pi.pyx"]) setup(ext_modules=[ext], cmdclass={'build_ext': build_ext}) See demo/cython for this example. Build it using build.bat. 514 A simple Cython example CALLING MULTIPLY_BY_PI FROM PYTHON $ python setup_pi.py build_ext --inplace -c mingw32 >>> import pi >>> pi.multiply_by_pi() Traceback (most recent call last): File "", line 1, in ? TypeError: function takes exactly 1 argument (0 given) >>> pi.multiply_by_pi("dsa") Traceback (most recent call last): File "", line 1, in ? 
TypeError: an integer is required >>> pi.multiply_by_pi(3) 9.4247779607700011 515 (some of) the generated code C CODE GENERATED BY CYTHON static PyObject *__pyx_f_2pi_multiply_by_pi(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/ static PyObject *__pyx_f_2pi_multiply_by_pi(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { int __pyx_v_num; PyObject *__pyx_r; PyObject *__pyx_1 = 0; static char *__pyx_argnames[] = {"num",0}; if (!PyArg_ParseTupleAndKeywords(__pyx_args, __pyx_kwds, "i", __pyx_argnames, &__pyx_v_num)) return 0; /* "C:\pi.pyx":2 */ __pyx_1 = PyFloat_FromDouble((__pyx_v_num * 3.14159265359)); if (!__pyx_1) {__ pyx_filename = __pyx_f[0]; __pyx_lineno = 2; goto __pyx_L1;} __pyx_r = __pyx_1; __pyx_1 = 0; 516 Def vs. CDef DEF — PYTHON FUNCTIONS CDEF — C FUNCTIONS # Call inc for values in sequence. def inc_seq(seq, offset): result = [] for val in seq: res = inc(val, offset) result.append(res) return result # cdef becomes a C function call. cdef int fast_inc(int num, int offset): return num + offset # fast_inc for a sequence def fast_inc_seq(seq, offset): result = [] for val in seq: res = fast_inc(val, offset) result.append(res) return result INC FROM PYTHON FAST_INC FROM PYTHON # inc is callable from Python. >>> inc.inc(1,3) 4 >>> a = range(4) >>> inc.inc_seq(a, 3) [3,4,5,6] # fast_inc not callable in Python >>> inc.fast_inc(1,3) Traceback: ... no 'fast_inc' # But fast_inc_seq is 2x faster # for large arrays. >>> inc.fast_inc_seq(a, 3) 517 [3,4,5,6] # Python callable function. def inc(int num, int offset): return num + offset CPdef: combines def + cdef CPDEF — C AND PYTHON FUNCTIONS # cdef becomes a C function call. cpdef fast_inc(int num, int offset): return num + offset # Calls compiled version inside Cython file def inc_seq(seq, offset): result = [] for val in seq: res = fast_inc(val, offset) result.append(res) return result FAST_INC FROM PYTHON # fast_inc is now callable in Python via Python wrapper >>> inc.fast_inc(1,3) 4 # No speed degradation here >>> inc.inc_seq(a, 3) [3,4,5,6] 518 Functions from C Libraries EXTERNAL C FUNCTIONS # len_extern.pyx # First, "include" the header file you need. cdef extern from "string.h": # Describe the interface for the functions used. cdef extern int strlen(char *c) def get_len(char *message): # strlen can now be used from Cython code (but not Python)… return strlen(message) CALL FROM PYTHON >>> import len_extern >>> len_extern.strlen Traceback (most recent call last): AttributeError: 'module' object has no attribute 'strlen' >>> len_extern.get_len("woohoo!") 7 519 Structures from C Libraries TIME_EXTERN.PYX cdef extern from "time.h": # Describe the structure you are using. struct tm: ... int tm_mday # Day of the month: 1-31 int tm_mon # Months *since* january: 0-11 int tm_year # Years since 1900 ... ctypedef long time_t tm* localtime(time_t *timer) time_t time(time_t *tloc) def get_date(): """ Return a tuple with the current day, month, and year.""" cdef time_t t cdef tm* ts t = time(NULL) ts = localtime(&t) return ts.tm_mday, ts.tm_mon + 1, ts.tm_year CALLING FROM PYTHON >>> extern_time.get_date() (8, 4, 2011) 520 Classes SHRUBBERY.PYX cdef class Shrubbery: # Class level variables cdef int width, height def __init__(self, w, h): self.width = w self.height = h def describe(self): print "This shrubbery is",self.width,"by ",self.height," cubits." CALLING FROM PYTHON >>> import shrubbery >>> x = shrubbery.Shrubbery(1, 2) >>> x.describe() This shrubbery is 1 by 2 cubits. 
>>> print x.width 521 AttributeError: 'shrubbery.Shrubbery' object has no attribute 'width' Classes from C++ libraries rectangle_extern.h class Rectangle { public: int x0, y0, x1, y1; Rectangle(int x0, int y0, int x1, int y1); ~Rectangle(); int getLength(); int getHeight(); int getArea(); void move(int dx, int dy);}; The implementation of the class and methods is done inside rectangle_extern.cpp. See demo/cython for this example. 522 Classes from C++ libraries rectangle.pyx cdef extern from "rectangle_extern.h": cdef cppclass Rectangle: Rectangle(int, int, int, int) int x0, y0, x1, y1 int getLength() int getHeight() int getArea() void move(int, int) cdef class PyRectangle: cdef Rectangle *thisptr # hold a C++ instance which we're wrapping def __cinit__(self, int x0, int y0, int x1, int y1): self.thisptr = new Rectangle(x0, y0, x1, y1) def __dealloc__(self): del self.thisptr def getLength(self): 523 return self.thisptr.getLength() Classes from C++ libraries SETUP.PY from distutils.core import setup from Cython.Distutils import build_ext from distutils.extension import Extension setup( ext_modules=[Extension("rectangle", sources = ["rectangle.pyx", "rectangle_extern.cpp"], language = "c++")], cmdclass = {'build_ext': build_ext}) CALLING FROM PYTHON In [1]: import rectangle In [2]: r = rectangle.PyRectangle(1,1,2,2) # calls __cinit__ In [3]: r.getLength() Out[3]: 1 In [4]: r.getHeight() AttributeError:rectangle.PyRectangle object has no attribute getHeight In [5]: del r # calls __dealloc__ 524 Using NumPy with Cython #cython: boundscheck=False # Import the numpy cython module shipped with Cython. cimport numpy as np ctypedef np.float64_t DOUBLE def sum(np.ndarray[DOUBLE] ary): # How long is the array in the first dimension? cdef int n = ary.shape[0] # Define local variables used in calculations. cdef unsigned int i cdef double sum # Sum algorithm implementation. sum = 0.0 for i in range(0, n): sum = sum + ary[i] return sum C:\demo\cython>test_sum.py elements: 1,000,000 python sum(approx sec, result): 7.030 numpy sum(sec, result): 0.047 cython sum(sec, result): 0.047 525 Problem: Make This Fast! def mandelbrot_escape(x, y, n): z_x = x z_y = y for i in range(n): z_x, z_y = z_x**2 - z_y**2 + x, 2*z_x*z_y + y if z_x**2 + z_y**2 >= 4.0: break else: i = -1 return i def generate_mandelbrot(xs, ys, n): d = empty(shape=(len(ys), len(xs))) for j in range(len(ys)): for i in range(len(xs)): d[j,i] = mandelbrot_escape(xs[i], ys[j], n) return d 526 Step 1: Add Type Information Type information can be added to function signatures: def mandelbrot_escape(double x, double y, int n): ... def generate_mandelbrot(xs, ys, int n): ... Variables can be declared to have a type using 'cdef': def generate_mandelbrot(xs, ys, int n): cdef int i,j cdef int N = len(xs) cdef int M = len(ys) ... 527 Step 2: Use Cython C Functions In Cython you can declare functions to be C functions using 'cdef' instead of 'def': cdef int mandelbrot_escape(float x, float y, int n): ... This makes the functions: Generate actual C functions, so they are much faster. Not visible to Python, but freely usable in your Cython module. Arbitrary Python objects can still be passed in and out of C functions using the 'object' type. 528 Solution 2: This is Fast! 
cdef int mandelbrot_escape(double x, double y, int n):
    cdef double z_x = x
    cdef double z_y = y
    cdef int i
    for i in range(n):
        z_x, z_y = z_x**2 - z_y**2 + x, 2*z_x*z_y + y
        if z_x**2 + z_y**2 >= 4.0:
            break
    else:
        i = -1
    return i

def generate_mandelbrot(xs, ys, int n):
    cdef int i, j
    cdef int N = len(xs)
    cdef int M = len(ys)
    d = empty(dtype=int, shape=(M, N))
    for j in range(M):
        for i in range(N):
            d[j, i] = mandelbrot_escape(xs[i], ys[j], n)
    return d

529 Step 3: Use NumPy in Cython

In our example, we are still using Python-level NumPy calls to do our array indexing:

    d[j, i] = mandelbrot_escape(xs[i], ys[j], n)

If we use the Cython interface to NumPy, we can declare our arrays to be C-level NumPy extension types and gain even more speed. NumPy arrays are declared using a special buffer notation:

    cimport numpy as np
    ...
    cdef np.ndarray[int, ndim=2] my_array

You must declare both the type of the array and the number of dimensions. All the standard NumPy types are declared in the numpy Cython declarations.

530 Solution 3: This is Really Fast!

mandel.pyx

    cimport numpy as np
    ...
    def generate_mandelbrot(np.ndarray[double, ndim=1] xs,
                            np.ndarray[double, ndim=1] ys, int n):
        cdef int i, j
        cdef int N = len(xs)
        cdef int M = len(ys)
        cdef np.ndarray[int, ndim=2] d = empty(dtype=int, shape=(M, N))
        for j in range(M):
            for i in range(N):
                d[j, i] = mandelbrot_escape(xs[i], ys[j], n)
        return d

setup.py

    ...
    import numpy
    ...
    ext = Extension("mandel", ["mandel.pyx"],
                    include_dirs=[numpy.get_include()])

531 Step 4: Parallelization using OpenMP

Cython supports native parallelism. To use this kind of parallelism, the Global Interpreter Lock (GIL) must be released. It currently supports OpenMP (more backends might be supported in the future).

1) Release the GIL before a block of code:

    with nogil:
        # This block of code is executed after releasing the GIL

2) Declare that a cdef function can be called safely without the GIL:

    cdef int mandelbrot_escape(double x, double y, int n) nogil:
        ...

3) Parallelize for-loops with prange:

    from cython.parallel import prange
    ...
    for j in prange(M):
        for i in prange(N):
            d[j, i] = mandelbrot_escape(xs[i], ys[j], n)

532 Solution 4: Even faster!

mandel.pyx

    from cython.parallel import prange
    ...
    cdef int mandelbrot_escape(double x, double y, int n) nogil:
        ...
    def generate_mandelbrot(np.ndarray[double, ndim=1] xs,
                            np.ndarray[double, ndim=1] ys, int n):
        """ Generate a mandelbrot set """
        cdef ...
        with nogil:
            for j in prange(M):
                for i in prange(N):
                    d[j, i] = mandelbrot_escape(xs[i], ys[j], n)
        return d

setup.py

    ext = Extension("mandel", ["mandelbrot.pyx"],
                    include_dirs=[numpy.get_include()],
                    extra_compile_args=['-fopenmp'],
                    extra_link_args=['-fopenmp'])

533 Conclusion

Solution                          Time        Speed-up
Pure Python                       630.72 s    x 1
Cython (Step 1)                   2.7776 s    x 227
Cython (Step 2)                   1.9608 s    x 322
Cython+NumPy (Step 3)             0.4012 s    x 1572
Cython+NumPy+prange (Step 4)      0.2449 s    x 2575

Timing performed on a 2.3 GHz Intel Core i7 MacBook Pro with 8 GB RAM using a 2000x2000 array and an escape time of n=100. [July 20, 2012]

534
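For reference, a short driver script along the following lines reproduces the kind of benchmark quoted above. This is a sketch only: the module name mandel and the generate_mandelbrot signature come from the slides, while the grid bounds are illustrative choices.

    import time
    from numpy import linspace
    import mandel

    # 2000x2000 grid over the complex plane, escape time n=100.
    xs = linspace(-2.0, 1.0, 2000)
    ys = linspace(-1.5, 1.5, 2000)

    start = time.time()
    d = mandel.generate_mandelbrot(xs, ys, 100)
    print "generate_mandelbrot: %.3f s" % (time.time() - start)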
