Python For Athena Manual


Enthought
Scientific Computing Solutions
Python for Athena
Enthought, Inc.
www.enthought.com
(c) 2001-2014, Enthought, Inc.
All Rights Reserved.
All trademarks and registered trademarks are the property of their
respective owners.
Enthought, Inc.
515 Congress Avenue
Suite 2100
Austin, TX 78701
www.enthought.com
Q1-2014
Background ......................................................................... 2
IPython ............................................................................. 6
Language Introduction ............................................................. 19
Strings ......................................................................... 23
Lists (Indexing and Slicing) ..................................................... 32
Mutable vs. Immutable ......................................................... 38
Tuple .......................................................................... 39
Dictionaries .................................................................... 40
Sets ........................................................................... 44
Assignment .................................................................... 47
If Statements .................................................................. 49
While Loops ................................................................... 51
For Loops ...................................................................... 52
List Comprehensions ............................................................ 53
Looping Patterns ............................................................... 56
Functions ...................................................................... 59
Two types of arguments ........................................................ 61
Variable number of arguments ................................................... 62
Modules ....................................................................... 65
PYTHONPATH ................................................................ 71
Reading Files ................................................................... 72
Writing Files ................................................................... 74
Simple Class Example ........................................................... 75
Overloading Addition ........................................................... 76
Sorting ........................................................................ 77
Error and Exception Handling ...................................................... 80
Finally Clause .................................................................. 84
Else Clause ..................................................................... 85
Raising Exceptions .............................................................. 86
Error Messages ................................................................. 87
Warnings ...................................................................... 88
Defining New Exceptions ........................................................ 90
File IO Error Handling and ’with’ statement ...................................... 91
Object Oriented Programming in Python .......................................... 93
Class Definition ................................................................ 97
Object Creation Process ........................................................ 98
Simple Class Example ........................................................... 99
Overloading Addition ........................................................... 100
Inheritance ..................................................................... 101
Super .......................................................................... 104
Advanced Python Topics ........................................................... 106
Classes and properties .......................................................... 107
Variable Scoping Rules .......................................................... 113
Class Scoping Rules ............................................................ 117
Partial Function Application (Curries) ............................................ 119
Decorators ..................................................................... 120
Iterators ....................................................................... 129
Generators ..................................................................... 130
Timing Code ................................................................... 132
Dynamic Code Execution ....................................................... 135
Dynamic byte-code generation .................................................. 137
csv files ........................................................................ 138
Connecting to Databases ........................................................ 141
Regular Expressions in Python ................................................... 146
datetime ....................................................................... 156
Timing Code ................................................................... 158
sys module ..................................................................... 161
os module ...................................................................... 164
Remote calls using XMLRPC .................................................... 166
Calling Sub-processes ........................................................... 168
Multiprocessing ..................................................................... 173
What is multiprocessing? ....................................................... 174
Multiprocessing vs. Threading ................................................... 175
The Process Class .............................................................. 176
Inter-process Communication .................................................... 178
Synchronization ................................................................ 180
Task Pools ..................................................................... 182
Shared Memory ................................................................ 184
Multiprocessing and NumPy ..................................................... 185
NumPy ............................................................................. 187
Matplotlib Basics (an interlude) ................................................. 194
Plotting errors bars with Matplotlib .............................................. 209
Plotting in 3D .................................................................. 211
Introducing NumPy Arrays ...................................................... 215
Slicing/Indexing Arrays ......................................................... 218
Multi-Dimensional Arrays ....................................................... 219
Fancy Indexing ................................................................. 224
Array Data Structure ........................................................... 230
Array Calculation Methods ...................................................... 243
Summary of Array Attributes and Methods ....................................... 247
Array Creation Functions ........................................................ 251
Trig and Other Functions ....................................................... 256
Vectorizing Functions ........................................................... 258
Array Operators ................................................................ 259
Universal Function Methods ..................................................... 263
Other NumPy Functions ........................................................ 269
Array Broadcasting ............................................................. 272
Vector Quantization ............................................................ 278
Structured Arrays ............................................................... 284
Memory Mapped Arrays ........................................................ 288
Output Formats ................................................................ 301
Error Handling ................................................................. 303
Interactive 3D Visualization with Mayavi - mlab ................................... 305
Mlab .......................................................................... 306
Various types of plots with Mlab ................................................ 308
Figure decorations .............................................................. 315
Mayavi GUI .................................................................... 316
Selected topics on scientific analysis with SciPy ................................... 318
Statistics ....................................................................... 320
Linear Algebra .................................................................. 328
Interpolation ................................................................... 335
Optimization ................................................................... 344
Data analysis with pandas .......................................................... 351
Pandas ........................................................................ 352
Creating pandas data structures ................................................. 354
Reading from and writing to files ................................................ 357
Index manipulation and data alignment .......................................... 358
Computations with pandas ...................................................... 362
Pivot tables and advanced reshaping ............................................. 365
Data aggregation ............................................................... 367
Dealing with dates and times .................................................... 369
Visualization ................................................................... 371
Software Craftsmanship in Python ................................................. 373
Naming Variables ............................................................... 381
Documenting Code ............................................................. 399
Coding Standards .............................................................. 422
Functions and Refactoring Code ................................................. 446
Code Read Example ............................................................ 447
Testing Code ................................................................... 456
Monitoring execution ........................................................... 468
Timing Code ................................................................... 472
Timing and profiling ................................................................ 474
Timing Code ................................................................... 475
Profiling with cProfile ........................................................... 477
cProfile with kernprof.py script .................................................. 478
cProfile from ipython ........................................................... 479
Profiling with line profiler and kernprof .......................................... 487
The Python Debugger .............................................................. 492
Overview of Python with other languages ......................................... 502
Interface with C : hand wrapping .................................................. 506
Cython .............................................................................. 512
Def vs. CDef ................................................................... 517
Def vs. CDef ................................................................... 518
Functions from C Libraries ...................................................... 519
Structures from C Libraries ...................................................... 520
Classes ......................................................................... 521
Classes ......................................................................... 523
Using NumPy with Cython ...................................................... 525
2
What Is Python?
ONE LINER
Python is an interpreted programming language that allows you to do almost
anything possible with a compiled language (C/C++/Fortran) without requiring all the
complexity.
PYTHON HIGHLIGHTS
Automatic garbage
collection
Dynamic typing
Interpreted and interactive
Object-oriented
“Batteries Included”
Free
Portable
Easy to Learn and Use
Truly Modular
3
Who is using Python?
WALL STREET
PROCTER & GAMBLE
GOOGLE
HOLLYWOOD
Banks and hedge funds rely on
Python for their high speed trading
systems, data analysis, and
visualization.
REDHAT
YOUTUBE
Anaconda, the Redhat Linux installer
program, is written in Python.
One of the top three languages used at
Google, along with C++ and Java.
Guido van Rossum worked there.
Digital animation and special effects:
Industrial Light and Magic
Imageworks
Tippett Studios
Disney
Dreamworks
PETROLEUM INDUSTRY
Geophysics and exploration tools:
ConocoPhillips
Shell
Fluid dynamics simulation tools.
Highly scalable web application for
delivering video content.
4
PYPL PopularitY of Programming Language index
* https://sites.google.com/site/pydatalog/pypl/PyPL-PopularitY-of-Programming-Language (CC by 3.0 license)
GitHub Top Languages 2013
5
6
IPython
An enhanced
interactive Python shell
7
Starting IPython
IPython can be started:
by typing ipython --pylab in any
terminal/command prompt
w/ the pylab icon
8
Starting IPython in Canopy
9
PyLab: Interactive Python Environment
The PyLab mode in IPython handles some
gory details behind the scenes. It allows
both the Python command interpreter
(above) and the GUI plot window (right) to
coexist. This involves a bit of multi-
threaded magic.
PyLab also imports some handy functions
into the command interpreter for user
convenience.
10
IPython notebook: exportable session
Contains in 1 file the
inputs, the outputs,
figures, plain text
(markdown), titles and
more...
.ipynb files.
Create a new one in
Canopy by selecting:
File > New >
IPython notebook
11
IPython
In [1]: a=1
In [2]: a
Out[2]: 1
STANDARD PYTHON
In [3]: b = [1,2,3]
# List available variables.
In [4]: %whos
Variable Type Data/Info
----------------------------
a int 1
b list n=3
# Remove user defined variables.
In [5]: %reset
In [6]: %whos
Interactive namespace is empty.
In [7]: plot
NameError: name 'plot' is not
defined
# Reload pylab.
In [8]: %pylab
Welcome to pylab,…
RESET
AVAILABLE VARIABLES
"%reset" also removes the names imported by PyLab, such as the plot command.
12
# Change directory (note Unix style forward slashes!)
In [9]: cd c:/python_class/Demos/speed_of_light
c:\python_class\Demos\speed_of_light
# List directory contents (Unix style, not “dir”).
In [10]: ls
Volume in drive C has no label.
Volume Serial Number is 5417-593D
Directory of c:\python_class\Demos\speed_of_light
09/01/2008 02:53 PM <DIR> .
09/01/2008 02:53 PM <DIR> ..
09/01/2008 02:48 PM 1,188 exercise_speed_of_light.txt
09/01/2008 02:48 PM 2,682,023 measurement_description.pdf
09/01/2008 02:48 PM 187,087 newcomb_experiment.pdf
09/01/2008 02:48 PM 1,312 speed_of_light.dat
09/01/2008 02:48 PM 1,436 speed_of_light.py
09/01/2008 02:48 PM 1,232 speed_of_light2.py
6 File(s) 2,874,278 bytes
2 Dir(s) 11,997,437,952 bytes free
Directory Navigation in IPython
Tab completion helps you find and type directory and file names.
13
# Print working directory name (Unix style, not “cd”).
In [11]: pwd
c:\python_class\Demos\speed_of_light
# Bookmark the demo and exercise directories, so we can return
# to them easily.
In [12]: cd ..
c:\python_class\Demos
In [13]: %bookmark demo
In [14]: cd ../Exercises
c:\python_class\Exercises
In [15]: %bookmark exer
In [16]: %bookmark -l
demo -> c:\python_class\Demos
exer -> c:\python_class\Exercises
In [17]: cd demo
(bookmark:demo) -> c:\python_class\Demos
Directory Bookmarks
14
# tab completion
In [11]: %run speed_of_li
speed_of_light.dat speed_of_light.py
# execute a python file
In [11]: %run speed_of_light.py
Running Scripts in IPython
15
Function Info
# Follow a command with '?' to print its documentation.
In [19]: len?
Type: builtin_function_or_method
Base Class: <type 'builtin_function_or_method'>
String Form: <built-in function len>
Namespace: Python builtin
Docstring:
len(object) -> integer
Return the number of items of a sequence or mapping.
HELP USING ?
16
Function Info
# Follow a command with '??' to print its source code.
In [43]: squeeze??
def squeeze(a):
"""Remove single-dimensional entries from the shape of a.
Examples
--------
>>> x = array([[[1,1,1],[2,2,2],[3,3,3]]])
>>> x.shape
(1, 3, 3)
>>> squeeze(x).shape
(3, 3)
"""
try:
squeeze = a.squeeze
except AttributeError:
return _wrapit(a, 'squeeze')
return squeeze()
?? can’t show the source
code for “extension” functions
that are implemented in C.
SHOW SOURCE CODE USING ??
17
IPython History
HISTORY COMMAND
# list previous commands. Use
# ‘magic’ % because ‘hist’ is
# histogram function in pylab
In [3]: %hist
a=1
a
# list string from prompt[2]
In [4]: _i2
Out[4]: 'a\n'
INPUT HISTORY
# grab previous result
In [5]: _
Out[5]: 'a\n'
# grab result from prompt[2]
In [6]: _2
Out[6]: 1
OUTPUT HISTORY
The up and down arrows
scroll through your ipython
input history.
18
Reading Simple Tracebacks
In [9]: 1 + "hello"
--------------------
TypeError Traceback (most recent call last)
C:\...<ipython-input…> in <module>()
----> 1 1 + "hello"
TypeError: unsupported operand type(s) for +: 'int' and 'str'
ERROR ADDING AN INTEGER TO A STRING
The "type" of error that occurred.
Short message about why it occurred.
Location and code where the error occurred.
# Again we fail when adding two variables, but note that the
# traceback tells us we have a completely different problem.
# In this case, our variable doesn't exist, so the operation fails.
In [10]: undefined_var + 1
NameError: name 'undefined_var' is not defined
ERROR TRYING TO ADD A NON-EXISTENT VARIABLE
19
Language Introduction
20
# real numbers
>>> b = 1.4 + 2.3
>>> b
3.6999999999999997
# "prettier" version.
>>> print b
3.7
>>> type(b)
<type 'float'>
# complex numbers
>>> c = 2+1.5j
>>> c
(2+1.5j)
# adding two values
>>> 1 + 1
2
# setting a variable
>>> a = 1
>>> a
1
# checking a variable’s type
>>> type(a)
<type 'int'>
# an arbitrarily long integer
>>> a = 12345678901234567890
>>> a
12345678901234567890L
>>> type(a)
<type 'long'>
# Remove 'a' from the 'namespace'
>>> del a
>>> a
NameError: name 'a' is not
defined
Interactive Calculator
The four numeric types in Python on 32-bit
architectures are:
integer (4 byte)
long integer (any precision)
float (8 byte like C’s double)
complex (16 byte)
The numpy module, which we will see later,
supports a larger number of numeric types.
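For example (a quick sketch, assuming numpy is installed; NumPy is covered in detail later in the course):
>>> import numpy
>>> type(numpy.float32(1.5))
<type 'numpy.float32'>
>>> type(numpy.int8(4))
<type 'numpy.int8'>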
21
# don't do this
>>> max = 100
# ...some time later...
>>> x = max(4, 5)
TypeError: 'int' object is not
callable
>>> 1+2-(3*4/6)**5+(7%5)
-27
More Interactive Calculation
ARITHMETIC OPERATIONS
>>> abs(-3)
3
>>> max(0, min(10, -1, 4, 3))
0
>>> round(2.718281828)
3.0
SIMPLE MATH FUNCTIONS
TYPE CONVERSION
>>> int(2.718281828)
2
>>> float(2)
2.0
>>> 1+2.0
3.0
OVERWRITING FUNCTIONS (!)
ALTERNATIVE NOTATIONS
>>> 0xFF
255
>>> 023 # octal!
19
>>> 6e3
6000.0
IN-PLACE OPERATIONS
>>> b = 2.5
>>> b += 0.5 # b = b + 0.5
>>> b
3.0
# Also -=, *=, /=, etc.
22
>>> q = 1 > 0
>>> q
True
>>> type(q)
<type 'bool'>
# <, >, <=, >=, ==, !=
>>> 1 >= 2
False
>>> 1 + 1 == 2
True
>>> 2**3 != 3**2
True
# Chained comparisons
>>> 1 < 10 < 100
True
Logical expressions, bool data type
COMPARISON OPERATORS
bool DATA TYPE
and OPERATOR
>>> 1 > 0 and 5 == 5
True
# If first operand is false,
# the second is not evaluated.
>>> 1 < 0 and max(0,1,2) > 1
False
>>> a = 50
>>> a < 10 or a > 90
False
# If first operand is true,
# the second is not evaluated.
>>> a = 0
>>> a < 10 or a > 90
True
or OPERATOR
>>> not 10 <= a <= 90
True
not OPERATOR
23
Strings
# using double quotes
>>> s = "hello world"
>>> print s
hello world
# single quotes also work
>>> s = 'hello world'
>>> print s
hello world
>>> s = "12345"
>>> len(s)
5
CREATING STRINGS
# concatenating two strings
>>> "hello " + "world"
'hello world'
# repeating a string
>>> "hello " * 3
'hello hello hello '
STRING OPERATIONS
STRING LENGTH
SPLIT/JOIN STRINGS
# split space-delimited words
>>> s = "hello world"
>>> wrd_lst = s.split()
>>> print wrd_lst
['hello', 'world']
# join words back together
# with a space in between
>>> space = ' '
>>> space.join(wrd_lst)
'hello world'
24
A few string methods and functions
>>> s = "hello world"
>>> s.replace('world','Mars')
'hello Mars'
REPLACING TEXT
CONVERT TO UPPER CASE
>>> s.upper()
'HELLO WORLD'
>>> s = "\t hello \n"
>>> s.strip()
'hello'
REMOVE WHITESPACE
NUMBERS TO STRINGS
>>> str(1.1 + 2.2)
'3.3'
>>> repr(1.1 + 2.2)
'3.3000000000000003'
>>> str(1)
'1'
>>> hex(255)
'0xff'
>>> oct(19)
'023'
STRINGS TO NUMBERS
>>> int('23')
23
>>> int('FF', 16)
255
>>> float('23')
23.0
25
Available string methods
# list available methods on a string
>>> dir(s)
[…
'capitalize',
'center',
'count',
'decode',
'encode',
'endswith',
'expandtabs',
'find',
'format',
'index',
'isalnum',
'isalpha',
'isdigit',
'islower',
'isspace',
'istitle',
'isupper',
'join',
'ljust',
'lower',
'lstrip',
'partition',
'replace',
'rfind',
'rindex',
'rjust',
'rpartition',
'rsplit',
'rstrip',
'split',
'splitlines',
'startswith',
'strip',
'swapcase',
'title',
'translate',
'upper',
'zfill']
26
Multi-line Strings
# strings in triple quotes
# retain line breaks
>>> a = """hello
... world"""
>>> print a
hello
world
# The \ character means line
# continuation. Be careful
# because it must be the last
# character on line.
>>> a = "hello " \
... "world"
>>> print a
hello world
TRIPLE QUOTES
# group strings using
# parentheses
>>> a = ("hello "
... "world")
>>> print a
hello world
MULTI-LINE WITH PARENTHESES
LINE CONTINUATION
# including the new line
>>> a = "hello\nworld"
>>> print a
hello
world
NEW LINE CHARACTER
String Formatting
str.format(*args, **kwargs)
Replacement field: {<field_name>:<format_spec>}   (the <format_spec> part is optional)
Delimited by curly brackets. (Use {{ and }} to include curly brackets in the output.)
If field_name is an integer, it refers to a position in the positional arguments:
>>> '{0} is greater than {1}'.format(100, 50)
'100 is greater than 50'
If field_name is a name, it refers to a keyword argument:
>>> '{last}, {first}'.format(first='Ellen', last='Ripley')
'Ripley, Ellen'
The format() method replaces the replacement fields in the string with the values given as
arguments. Any other text in the string remains unchanged. For example:
>>> '{0:2d}-{1}: {name}, ${price:.2f}'.format(7, 19, name='SC1', price=3.4)
' 7-19: SC1, $3.40'
27
28
String Formatting - Format Spec
The format spec is a sequence of characters including:
the alignment option,
the sign option,
the width (and .precision) option,
the type code.
ALIGNMENT OPTION
Char Meaning
<    Left aligned.
>    Right aligned.
=    (For numeric types only.) Pad after the sign but before the digits (e.g. +000000120).
^    Center within the available space.
If an alignment character is given, it may be preceded by a fill character.
SIGN OPTION
For numbers only.
Char  Meaning
+     Include a sign for positive and negative numbers.
-     Indicate sign for negative numbers only (default).
space Include a leading space for positive numbers.
INTEGER TYPE CODES
Type Meaning
b    Binary format.
c    Character; converts int to unicode char.
d    Decimal integer (base 10).
o    Octal (base 8).
x    Hex (base 16), lower case.
X    Hex (base 16), upper case.
n    Number; same as 'd', but uses current locale.
None Same as 'd'.
FLOATING POINT TYPE CODES
Type Meaning
e    Scientific notation.
E    Scientific notation, with upper case 'E'.
f    Fixed point.
F    Fixed point; same as 'f'.
g    General format.
G    General format; same as 'g', with upper case 'E' when necessary.
n    Number; same as 'g', but uses current locale.
%    Percentage. Multiplies by 100 and displays with 'f', followed by a percent sign.
None Same as 'g'.
STRING TYPE CODES
Type Meaning
s    String. This is the default, and may be omitted.
>>> 'price: ${0:=-7.2f}'.format(3.4)
'price: $   3.40'
29
String Formatting - Examples
# Basic string formatting
>>> print '{} {} {}'.format('a', 'b', 'c')
a b c
# Numbered fields refer to the position of the arguments
>>> print '{2} {1} {0}'.format('a', 'b', 'c')
c b a
# Named fields refer to keyword arguments
>>> print '{color} {n} {x}'.format(n=10, x=1.5, color='blue')
blue 10 1.5
# Positional and keyword arguments can be combined
>>> print '{color} {0} {x} {1}'.format(10, 'foo', x=1.5, color='blue')
blue 10 1.5 foo
# Precision and padding
>>> from math import pi
>>> print '{0:10} {1:10d} {c:10.2f}'.format('foo', 5, c=2*pi)
foo                 5       6.28
30
String Formatting - Examples cont.
# Fixed point format (and a named keyword argument).
>>> print '[{x:5.0f}] [{x:5.1f}] [{x:5.2f}]'.format(x=12.3456)
[   12] [ 12.3] [12.35]
# Sign options.
>>> '{:-f}; {:-f}'.format(3.14, -3.14) # Default
'3.140000; -3.140000'
>>> '{:+f}; {:+f}'.format(3.14, -3.14) # Use '+'
'+3.140000; -3.140000'
>>> '{: f}; {: f}'.format(3.14, -3.14) # Use ' '
' 3.140000; -3.140000'
# Alignment (and using a numbered positional argument).
>>> print '[{0:<10s}] [{0:>10s}] [{0:^10s}]'.format('PYTHON')
[PYTHON    ] [    PYTHON] [  PYTHON  ]
# Alignment with fill character.
>>> print '[{0:*<10s}] [{0:*>10s}] [{0:*^10s}]'.format('PYTHON')
[PYTHON****] [****PYTHON] [**PYTHON**]
# Different bases for an integer (hex, decimal, octal, binary).
>>> '{0:X} {0:d} {0:o} {0:b}'.format(254)
'FE 254 376 11111110'
31
Formatting with %
FORMAT OPERATOR %
# the % operator formats values
# to strings using C conventions.
>>> s = "some numbers:"
>>> x = 1.34
>>> y = 2
>>> t = "%s %f, %d" % (s,x,y)
>>> print t
some numbers: 1.340000, 2
>>> y = -2.1
>>> print "%f\n%f" % (x,y)
1.340000
-2.100000
>>> print "% f\n% f" % (x,y)
 1.340000
-2.100000
>>> print "%4.2f" % x
1.34
Conversion Meaning
d or i Signed integer decimal
o Unsigned octal
u Unsigned decimal
x Unsigned hexadecimal (lowercase)
X Unsigned hexadecimal (uppercase)
e Floating point exponential format (lowercase)
E Floating point exponential format (uppercase)
F or f Floating point decimal format
G or g Floating point format or exponential
c Single character
r Converts object using repr()
s Converts object using str()
CONVERSION CODES
Flag Meaning
0 The conversion will be zero padded for
numeric values.
- The converted value is left adjusted (overrides
the "0" conversion if both are given).
<space> (a space) A blank should be left before a
positive number (or empty string) produced by
a signed conversion.
+ A sign character ("+" or "-") will precede the
conversion (overrides a "space" flag).
CONVERSION FLAGS
32
List objects
>>> a = [10,11,12,13,14]
>>> print a
[10, 11, 12, 13, 14]
LIST CREATION WITH BRACKETS
# simply use the + operator
>>> [10, 11] + [12, 13]
[10, 11, 12, 13]
CONCATENATING LIST
REPEATING ELEMENTS IN LISTS
# the range function is helpful
# for creating a sequence
>>> range(5)
[0, 1, 2, 3, 4]
>>> range(2,7)
[2, 3, 4, 5, 6]
>>> range(2,7,2)
[2, 4, 6]
# the multiply operator
# does the trick
>>> [10, 11] * 3
[10, 11, 10, 11, 10, 11]
range(start, stop, step)
33
Indexing
# list
# indices: 0 1 2 3 4
>>> a = [10,11,12,13,14]
>>> a[0]
10
RETRIEVING AN ELEMENT
The first element in an array has index=0
as in C. Take note Fortran programmers!
NEGATIVE INDICES
# negative indices count
# backward from the end of
# the list
#
# indices: -5 -4 -3 -2 -1
>>> a = [10,11,12,13,14]
>>> a[-1]
14
>>> a[-2]
13
SETTING AN ELEMENT
>>> a[1] = 21
>>> print a
[10, 21, 12, 13, 14]
OUT OF BOUNDS
>>> a[10]
Traceback (innermost last):
File "<interactive input>",line 1,in ?
IndexError: list index out of range
34
More on list objects
# use in or not in
>>> a = [10,11,12,13,14]
>>> 13 in a
True
>>> 13 not in a
False
DOES THE LIST CONTAIN x ?
LIST CONTAINING MULTIPLE
TYPES
# list containing integer,
# string, and another list
>>> a = [10,'eleven',[12,13]]
>>> a[1]
'eleven'
>>> a[2]
[12, 13]
# use multiple indices to
# retrieve elements from
# nested lists
>>> a[2][0]
12
>>> len(a)
3
LENGTH OF A LIST
# use the del keyword
>>> del a[2]
>>> a
[10,'eleven']
DELETING OBJECT FROM LIST
Prior to version 2.5, Python was limited to
sequences with ~2 billion elements.
Python 2.5 can handle up to 2**63 elements.
35
Slicing
# indices:
# -5 -4 -3 -2 -1
# 0 1 2 3 4
>>> a = [10,11,12,13,14]
# [10,11,12,13,14]
>>> a[1:3]
[11, 12]
# negative indices work also
>>> a[1:-2]
[11, 12]
>>> a[-4:3]
[11, 12]
SLICING LISTS
# omitted boundaries are
# assumed to be the beginning
# (or end) of the list
# grab first three elements
>>> a[:3]
[10, 11, 12]
# grab last two elements
>>> a[-2:]
[13, 14]
# every other element
>>> a[::2]
[10, 12, 14]
var[lower:upper:step]
Extracts a portion of a sequence by specifying a lower and upper bound.
The lower-bound element is included, but the upper-bound element is not included.
Mathematically: [lower, upper). The step value specifies the stride between elements.
OMITTING INDICES
36
A few methods for list objects
some_list.append( x )
Add the element x to the end of the list some_list.
some_list.count( x )
Count the number of times x occurs in the list.
some_list.index( x )
Return the index of the first occurrence of x in the list.
some_list.remove( x )
Delete the first occurrence of x from the list.
some_list.reverse( )
Reverse the order of elements in the list.
some_list.sort( cmp )
By default, sort the elements in ascending order. If a compare function is given, use it to sort the list.
some_list.extend( sequence )
Concatenate sequence onto this list.
some_list.insert( index, x )
Insert x before the specified index.
some_list.pop( index )
Return the element at the specified index. Also, remove it from the list.
37
List methods in action
>>> a = [10,21,23,11,24]
# add an element to the list
>>> a.append(11)
>>> print a
[10,21,23,11,24,11]
# how many 11s are there?
>>> a.count(11)
2
# extend with another list
>>> a.extend([5,4])
>>> print a
[10,21,23,11,24,11,5,4]
# where does 11 first occur?
>>> a.index(11)
3
# insert 100 at index 2?
>>> a.insert(2, 100)
>>> print a
[10,21,100,23,11,24,11,5,4]
# pop the item at index=3
>>> a.pop(3)
23
# remove the first 11
>>> a.remove(11)
>>> print a
[10,21,100,24,11,5,4]
# sort the list (in-place)
# Note: use sorted(a) to
# return a new list.
>>> a.sort()
>>> print a
[4,5,10,11,21,24,100]
# reverse the list
>>> a.reverse()
>>> print a
[100,24,21,11,10,5,4]
38
Mutable vs. Immutable
# Mutable objects, such as
# lists, can be changed
# in place.
# insert new values into list
>>> a = [10,11,12,13,14]
>>> a[1:3] = [5,6]
>>> print a
[10, 5, 6, 13, 14]
MUTABLE OBJECTS IMMUTABLE OBJECTS
# Immutable objects, such as
# integers and strings,
# cannot be changed in place.
# try inserting values into
# a string
>>> s = 'abcde'
>>> s[1:3] = 'xy'
Traceback (innermost last):
File "<interactive input>",line 1,in ?
TypeError: object doesn't support
slice assignment
# here's how to do it
>>> s = s[:1] + 'xy' + s[3:]
>>> print s
'axyde'
The cStringIO module treats strings
like a file buffer and allows insertions.
It’s useful when working with large
strings or when speed is paramount.
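A minimal sketch of building up a string piecewise with cStringIO (the values are illustrative only):
>>> from cStringIO import StringIO
>>> buf = StringIO()
>>> buf.write('abc')
>>> buf.write('de')
>>> buf.getvalue()
'abcde'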
39
Tuple Immutable Sequence
>>> a = (10,11,12,13,14)
>>> print a
(10, 11, 12, 13, 14)
TUPLE CREATION
PARENTHESES ARE OPTIONAL
LENGTH-1 TUPLE
>>> (10,)
(10,)
>>> (10)
10
TUPLES ARE IMMUTABLE
>>> a = 10,11,12,13,14
>>> print a
(10, 11, 12, 13, 14)
(10) is not a tuple,
but an integer
with parentheses.
# create a list
>>> a = range(10,15)
[10, 11, 12, 13, 14]
# cast the list to a tuple
>>> b = tuple(a)
>>> print b
(10, 11, 12, 13, 14)
# try inserting a value
>>> b[3] = 23
TypeError: 'tuple' object doesn't
support item assignment
40
Dictionaries
Dictionaries store key/value pairs. Indexing a dictionary by a key returns
the value associated with it. The key must be immutable.
# Create an empty dictionary using curly brackets.
>>> record = {}
# Each indexed assignment creates a new key/value pair.
>>> record['first'] = 'Jmes'
>>> record['last'] = 'Maxwell'
>>> record['born'] = 1831
>>> print record
{'first': 'Jmes', 'born': 1831, 'last': 'Maxwell'}
# Create another dictionary with initial entries.
>>> new_record = {'first': 'James', 'middle':'Clerk'}
# Now update the first dictionary with values from the new one.
>>> record.update(new_record)
>>> print record
{'first': 'James', 'middle': 'Clerk', 'last':'Maxwell', 'born':
1831}
DICTIONARY EXAMPLE
41
Accessing and deleting keys and values
ACCESS USING INDEX NOTATION REMOVE AN ENTRY WITH DEL
ACCESS WITH get(key, default)
>>> print record['first']
James
>>> del record['middle']
>>> record
{'born': 1831, 'first':
'James', 'last': 'Maxwell'}
The get() method returns the value
associated with a key; the optional
second argument is the return value if
the key is not in the dictionary.
>>> record.get('born',0)
1831
>>> record.get('home', 'TBD')
'TBD'
>>> record['home']
KeyError: ...
REMOVE WITH pop(key, default)
pop() removes the key from the
dictionary and returns the value; the
optional second argument is the return
value if the key is not in the dictionary.
>>> record.pop('born', 0)
1831
>>> record
{'first': 'James', 'last':
'Maxwell'}
>>> record.pop('born', 0)
0
42
A few dictionary methods
some_dict.clear( )
Remove all key/value pairs from the dictionary, some_dict.
some_dict.copy( )
Create a copy of the dictionary.
x in some_dict
Test whether the dictionary contains the key x.
some_dict.keys( )
Return a list of all the keys in the dictionary.
some_dict.values( )
Return a list of all the values in the dictionary.
some_dict.items( )
Return a list of all the key/value pairs in the dictionary.
43
Dictionary methods in action
# dict of animals:count pairs
>>> barn = {'cows': 1,
... 'dogs': 5,
... 'cats': 3}
# test for chickens
>>> 'chickens' in barn
False
# get a list of all keys
>>> barn.keys()
['cats','dogs','cows']
# get a list of all values
>>> barn.values()
[3, 5, 1]
# return key/value tuples
>>> barn.items()
[('cats', 3), ('dogs', 5),
('cows', 1)]
# How many cats?
>>> barn['cats']
3
# Change the number of cats.
>>> barn['cats'] = 10
>>> barn['cats']
10
# Add some sheep.
>>> barn['sheep'] = 5
>>> barn['sheep']
5
44
Set objects
A set is an unordered collection of
unique, immutable objects.
DEFINITION
# an empty set
>>> s = set()
# convert a sequence to set
>>> t = set([1,2,3,1])
# note removal of duplicates
>>> t
set([1, 2, 3])
CONSTRUCTION
ADD ELEMENTS
>>> t.add(5)
>>> t
set([1, 2, 3, 5])
>>> t.update([5,6,7])
>>> t
set([1, 2, 3, 5, 6, 7])
REMOVE ELEMENTS
# Remove an element. Raise an
# exception if not in the set.
>>> t.remove(1)
>>> t
set([2, 3, 5, 6, 7])
# Remove and return an
# arbitrary element from the set.
>>> t.pop()
2
# Remove an element from the set
# if it exists.
>>> t.discard(3)
>>> t
set([5, 6, 7])
# Otherwise, do nothing.
>>> t.discard(20)
>>> t
set([5, 6, 7])
45
Set Operations
OVERLAPPING SETS
>>> a = set([1,2,3,4])
>>> b = set([3,4,5,6])
UNION
>>> a.union(b)
set([1, 2, 3, 4, 5, 6])
>>> a | b
set([1, 2, 3, 4, 5, 6])
INTERSECTION
>>> a.intersection(b)
set([3, 4])
>>> a & b
set([3, 4])
SYMMETRIC DIFFERENCE
>>> a.symmetric_difference(b)
set([1, 2, 5, 6])
>>> a ^ b  # xor
set([1, 2, 5, 6])
DIFFERENCE
>>> a.difference(b)
set([1, 2])
>>> a - b
set([1, 2])
46
Set Containment
OVERLAPPING SETS
>>> a = set([1,2,3])
>>> b = set([1,2])
ISSUBSET
# Is b fully contained in a?
>>> b.issubset(a)
True
>>> b <= a
True
ISSUPERSET
# Does a fully contain b?
>>> a.issuperset(b)
True
>>> a >= b
True
There is also an immutable type of
set called a frozenset (analogous to
a tuple) which can be used as a
dictionary key or an element of a
set.
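For example (a small sketch; the dictionary value is arbitrary):
>>> key = frozenset([1, 2])
>>> d = {key: 'pair'}
>>> d[frozenset([2, 1])]   # element order does not matter
'pair'
>>> key.add(3)
AttributeError: 'frozenset' object has no attribute 'add'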
47
Assignment of “simple” object
>>> x = 0
Assignment creates object references.
x
y
# This causes x and y to point
# to the same value.
>>> y = x
# Re-assigning y to a new value
# decouples the two variables.
>>> y = "foo"
>>> print x
0
x
y
0
0
foo
48
Assignment of Container object
>>> x = [0, 1, 2]
Assignment creates object references.
0 1 2
x
y
# This causes x and y to point
# to the same list.
>>> y = x
# A change to y also changes x.
>>> y[1] = 6
>>> print x
[0, 6, 2]
x
y
# Re-assigning y to a new list
# decouples the two variables.
>>> y = [3, 4]
x
y
49
If statements
if/elif/else provides conditional execution of code blocks.
if <condition>:
<statement 1>
<statement 2>
elif <condition>:
<statements>
else:
<statements>
# a simple if statement
>>> x = 10
>>> if x > 0:
... print 'Hey!'
... print 'x > 0'
... elif x == 0:
... print 'x is 0'
... else:
... print 'x is negative'
... < hit return >
Hey!
x > 0
IF EXAMPLE IF STATEMENT FORMAT
50
Test Values
zero, None, “”, and empty objects are treated as False.
All other objects are treated as True.
# empty objects test as false
>>> x = []
>>> if x:
... print 1
... else:
... print 0
... < hit return >
0
EMPTY OBJECTS
It often pays to be explicit. If you are
testing for an empty list, then test for:
if len(x) == 0:
...
This is clearer to future readers of
your code. It also can avoid bugs
where x==None may be passed in and
unexpectedly go down this path.
51
While loops
while loops iterate until a condition is met
# the condition tested is
# whether lst is empty
>>> lst = range(3)
>>> while lst:
... print lst
... lst = lst[1:]
... < hit return >
[0, 1, 2]
[1, 2]
[2]
while <condition>:
<statements>
WHILE LOOP BREAKING OUT OF A LOOP
# breaking from an infinite
# loop
>>> i = 0
>>> while True:
... if i < 3:
... print i,
... else:
... break
... i = i + 1
... < hit return >
0 1 2
52
For loops
for loops iterate over a sequence of objects
>>> for item in range(5):
... print item,
... < hit return >
0 1 2 3 4
# For a large range, xrange()
# is faster and more memory
# efficient.
>>> for item in xrange(10**6):
... print item,
... < hit return >
0 1 2 3 4 5 6 7 8 9 10 11 ...
>>> animals=['dogs','cats',
... 'bears']
>>> accum = ''
>>> for animal in animals:
... accum += animal + ' '
... < hit return >
>>> print accum
dogs cats bears
for <loop_var> in <sequence>:
<statements>
TYPICAL SCENARIO LOOPING OVER A STRING
>>> for item in 'abcde':
... print item,
... < hit return >
a b c d e
LOOPING OVER A LIST
53
List Comprehension
# element by element transform of
# a list by applying an
# expression to each element
>>> a = [10,21,23,11,24]
>>> results=[]
>>> for val in a:
... results.append(val+1)
>>> results
[11, 22, 24, 12, 25]
LIST TRANSFORM WITH LOOP LIST COMPREHENSION
# list comprehensions provide
# a concise syntax for this sort
# of element by element
# transformation
>>> a = [10,21,23,11,24]
>>> [val+1 for val in a]
[11, 22, 24, 12, 25]
LIST COMPREHENSION WITH
FILTER
>>> a = [10,21,23,11,24]
>>> [val+1 for val in a if val>15]
[22, 24, 25]
# transform only elements that
# meet a criteria
>>> a = [10,21,23,11,24]
>>> results=[]
>>> for val in a:
... if val>15:
... results.append(val+1)
>>> results
[22, 24, 25]
FILTER-TRANSFORM WITH LOOP
Consider using a list
comprehension whenever you
need to transform one sequence
to another.
54
Generator Expressions
>>> set([str(i) for i in range(5)])
set(['1', '0', '3', '2', '4'])
# tests on sequences
>>> any([i >= 3 for i in range(5)])
True
>>> all([i >= 3 for i in range(5)])
False
# summation
>>> sum([i**2 for i in range(5)])
30
LIST COMPREHENSION
>>> set(str(i) for i in range(5))
set(['1', '0', '3', '2', '4'])
# tests on sequences
>>> any(i >= 3 for i in range(5))
True
>>> all(i >= 3 for i in range(5))
False
# summation
>>> sum(i**2 for i in range(5))
30
Generator expressions are like list comprehensions without the brackets:
GENERATOR EXPRESSION
55
Multiple assignments
# creating a tuple without ()
>>> d = 1,2,3
>>> d
(1, 2, 3)
# multiple assignments
>>> a,b,c = 1,2,3
>>> print b
2
# multiple assignments from a
# tuple
>>> a,b,c = d
>>> print b
2
# also works for lists
>>> a,b,c = [1,2,3]
>>> print b
2
FROM TUPLE TO TUPLE FROM LIST TO TUPLE
56
Looping Patterns
# Looping through a sequence of
# tuples allows multiple
# variables to be assigned.
>>> pairs = [(0,'a'),(1,'b'),
... (2,'c')]
>>> for index, value in pairs:
... print index, value
0 a
1 b
2 c
MULTIPLE LOOP VARIABLES ZIP
# zip 2 or more sequences
# into a list of tuples
>>> x = [0, 1, 2]
>>> y = ['a', 'b', 'c']
>>> zip(x,y)
[(0,'a'), (1,'b'), (2,'c')]
>>> for index, value in zip(x,y):
... print index, value
0 a
1 b
2 c
ENUMERATE
# enumerate -> index, item.
>>> y = ['a', 'b', 'c']
>>> for index, value in enumerate(y):
... print index, value
0 a
1 b
2 c
REVERSED
>>> z = [(0,'a'),(1,'b'),(2,'c')]
for index, value in reversed(z):
... print index, value
2 c
1 b
0 a
57
Looping over a dictionary
>>> d = {'a':1, 'b':2, 'c':3}
DEFAULT LOOPING (KEYS)
>>> for key in d:
... print key, d[key]
a 1
c 3
b 2
LOOPING OVER KEYS (EXPLICIT)
>>> for key in d.keys():
... print key, d[key]
a 1
c 3
b 2
LOOPING OVER VALUES
>>> for val in d.values():
... print val
1
3
2
LOOPING OVER ITEMS
>>> for key, val in d.items():
... print key, val
a 1
c 3
b 2
59
Anatomy of a function
def add(arg0, arg1):
"""Add two numbers"""
a = arg0 + arg1
return a
Function arguments are listed,
separated by commas. They are passed
by assignment.
The keyword def indicates the
start of a function.
Indentation is
used to indicate
the contents of
the function. It
is not optional,
but a part of the
syntax.
An optional return
statement specifies
the value returned from the
function. If return is omitted,
the function returns the
special value None.
A colon ( : )
terminates
the function
signature.
An optional docstring
documents the function
in a standard way for
tools like ipython.
60
Our new function in action
# We'll create our function
# on the fly in the
# interpreter.
>>> def add(x,y):
... a = x + y
... return a
# Test it out with numbers.
>>> val_1 = 2
>>> val_2 = 3
>>> add(val_1,val_2)
5
# How about strings?
>>> val_1 = 'foo'
>>> val_2 = 'bar'
>>> add(val_1,val_2)
'foobar'
# Functions can be assigned
# to variables.
>>> func = add
>>> func(val_1, val_2)
'foobar'
# How about numbers and strings?
>>> add('abc',1)
Traceback (innermost last):
File "<interactive input>", line 1, in ?
File "<interactive input>", line 2, in add
TypeError: cannot concatenate 'str' and 'int' objects
61
Function Calling Conventions
# The "standard" calling
# convention we know and love.
>>> def add(x, y):
... return x + y
>>> add(2, 3)
5
POSITIONAL ARGUMENTS DEFAULT VALUES
# Arguments can be
# assigned default values.
>>> def quad(x,a=1,b=1,c=0):
... return a*x**2 + b*x + c
# Use defaults for a, b and c.
>>> quad(2.0)
6.0
# Set b=3. Defaults for a & c.
>>> quad(2.0, b=3)
10.0
# Keyword arguments can be
# passed in out of order.
>>> quad(2.0, c=1, a=3, b=2)
17.0
# specify argument names
>>> add(x=2, y=3)
5
# or even a mixture if you are
# careful with order
>>> add(2, y=3)
5
KEYWORD ARGUMENTS
62
Function Calling Conventions
# Pass in any number of
# arguments. Extra arguments
# are put in the tuple args.
>>> def foo(x, y, *args):
... print x, y, args
>>> foo(2, 3, 'hello', 4)
2 3 ('hello', 4)
VARIABLE NUMBER OF ARGS VARIABLE KEYWORD ARGS
# Extra keyword arguments
# are put into the dict kw.
>>> def bar(x, y=1, **kw):
... print x, y, kw
>>> bar(1, y=2, a=1, b=2)
1 2 {'a': 1, 'b': 2}
63
Function Calling Conventions
# This signature takes any
# number of positional and
# keyword arguments.
>>> def foo(*args, **kw):
... print args, kw
>>> foo(2, 3, x='hello', y=4)
(2, 3) {'x': 'hello', 'y': 4}
THE ‘ANYTHING’ SIGNATURE
# To return multiple values
# from a function, we return
# a tuple containing those
# values. This is a common
# use of multiple (tuple)
# assignment.
>>> def functions(x):
... y1 = x**2 + x
... y2 = x**3 + x**2 + 2*x
... return y1, y2
>>> c = 3.0
>>> a, b = functions(c)
>>> a, b
(12.0, 42.0)
MULTIPLE FUNCTION RETURNS
64
Expanding Function Arguments
>>> def add(x, y):
... return x + y
# '*' in a function call
# converts a sequence into the
# arguments to a function.
>>> vars = [1,2]
>>> add(*vars)
3
POSITIONAL ARGUMENT
EXPANSION KEYWORD ARGUMENT
EXPANSION
>>> def bar(x, y=1, **kw):
... print x, y, kw
# '**' expands a
# dictionary into keyword
# arguments for a function.
vars = {'y':3, 'z':4}
>>> bar(1, **vars)
1, 3, {'z': 4}
65
Modules
# ex1.py
PI = 3.1416
def sum(lst):
tot = lst[0]
for value in lst[1:]:
tot = tot + value
return tot
w = [0,1,2,3]
print sum(w), PI
EX1.PY FROM SHELL
[ej@bull ej]$ python ex1.py
6 3.1416
FROM INTERPRETER
# load and execute the module
>>> import ex1
6 3.1416
# get/set a module variable
>>> print ex1.PI
3.1416
>>> ex1.PI = 3.14159
>>> print ex1.PI
3.14159
# call a module function
>>> t = [2,3,4]
>>> ex1.sum(t)
9
66
Modules cont.
# ex1.py version 2
PI = 3.14159
def sum(lst):
tot = 0
for value in lst:
tot = tot + value
return tot
w = [0,1,2,3,4]
print sum(w), PI
EDITED EX1.PY
INTERPRETER
# load and execute the module
>>> import ex1
6 3.1416
< edit file >
# import module again
>>> import ex1
# nothing happens!!!
# Use reload to force a
# previously imported library
# to be reloaded.
>>> reload(ex1)
10 3.14159
67
Modules cont. 2
A Python file can be used as a script, or as a module, or both.
# An example module that can
# be run as a script.
PI = 3.1416
def sum(lst):
""" Sum the values in a
list.
"""
tot = 0
for value in lst:
tot = tot + value
return tot
EX2.PY
def add(x,y):
" Add two values."
a = x + y
return a
def test():
w = [0,1,2,3]
assert( sum(w) == 6)
print 'test passed'
# This code runs only if this
# module is the main program.
if __name__ == '__main__':
test()
68
import Variations
# The most basic import
>>> import ex2
>>> ex2.PI
3.1416
BASIC IMPORTS
ALIASING A NAME
IMPORTING SPECIFIC SYMBOLS
IMPORTING *EVERYTHING*
# Pull *everything* into the
# local namespace.
>>> from ex2 import *
>>> PI
3.1416
>>> add(3,4.5)
7.5
# Use an ‘alias’
>>> import ex2 as e2
>>> e2.PI
3.1416
# Select specific names to
# bring into the local
# namespace.
>>> from ex2 import add, PI
>>> PI
3.1416
>>> add(2,3)
5
69
Modules cont. 3
foo/
__init__.py
bar.py (defines func)
baz.py (defines zap)
FILE STRUCTURE PACKAGES
Often a library will contain several
modules. These may be organized
in a hierarchical directory structure,
and imported using "dotted module
names". The first and the intermediate
names (if any) are called "packages".
Example:
>>> from foo.bar import func
>>> from foo.baz import zap
bar.py and baz.py are modules in the
package foo.
The directory foo must be in the
python search path
The file __init__.py indicates
that foo is a package. It may be an
empty file.
In the simplest case:
package = directory
module = python file
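For instance, a minimal version of the foo package could look like this (the bodies of func and zap are placeholders, not taken from the course materials):
# foo/__init__.py        (may be empty)

# foo/bar.py
def func():
    return 'hello from foo.bar'

# foo/baz.py
def zap():
    return 'zap from foo.baz'

# With the directory containing foo/ on the search path:
>>> from foo.bar import func
>>> func()
'hello from foo.bar'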
70
Standard Modules
Python has a large library of standard modules ("batteries included"):
re - regular expressions
copy - shallow and deep copy operations
datetime - time and date objects
math, cmath - real and complex math
decimal, fractions - arbitrary precision decimal and rational number objects
os, os.path, shutil - filesystem operations
sqlite3 - internal SQLite database
gzip, bz2, zipfile, tarfile - compression and archiving formats
csv, netrc - file format handling
xml - various modules for handling XML
htmllib - an HTML parser
httplib, ftplib, poplib, socket, etc. - modules for standard internet protocols
cmd - support for command interpreters
pdb - Python interactive debugger
profile, cProfile, timeit - Python profilers
collections, heapq, bisect - standard CS algorithms and data structures
mmap - memory-mapped files
threading, Queue - threading support
multiprocessing - process based 'threading'
subprocess - executing external commands
pickle, cPickle - object serialization
struct - interpret bytes as packed binary data
and many more... To see the content of one:
>>> dir(module_name)
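For example, to peek inside the standard math module (output abbreviated here):
>>> import math
>>> dir(math)
[..., 'cos', 'cosh', 'degrees', ..., 'sqrt', 'tan', 'tanh', 'trunc']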
71
Setting up PYTHONPATH
PYTHONPATH is an environment variable (or set of registry entries on
Windows) that lists the directories Python searches for modules.
WINDOWS
Right-click on My Computer
Click Properties
Click Advanced Tab
Click Environment Variables Button at the bottom of the Advanced Tab
Click New to create PYTHONPATH or
Click Edit to change existing PYTHONPATH
Changes take effect in the next Command Prompt or IPython session.
UNIX: .cshrc (note: the setenv command should all be on one line)
setenv PYTHONPATH $PYTHONPATH:$HOME/your_modules
UNIX: .bashrc
PYTHONPATH=$PYTHONPATH:$HOME/your_modules
export PYTHONPATH
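The directories listed in PYTHONPATH end up in sys.path, the list of locations Python actually searches at import time. You can inspect it, or extend it for the current session only (the appended path below is just an example):
>>> import sys
>>> print sys.path                                      # contents depend on your installation
>>> sys.path.append(r'C:\python_class\your_modules')    # extend for this session only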
72
Reading files
>>> results = []
>>> f = open('c:\\rcs.txt','r')
# Read all the lines.
>>> lines = f.readlines()
>>> f.close()
# Discard the header.
>>> lines = lines[1:]
>>> for line in lines:
... # split line into fields
... fields = line.split()
... # convert text to numbers
... freq = float(fields[0])
... vv = float(fields[1])
... hh = float(fields[2])
... # group & append to results
... all = [freq,vv,hh]
... results.append(all)
... < hit return >
FILE INPUT EXAMPLE
EXAMPLE FILE: RCS.TXT
#freq (MHz) vv (dB) hh (dB)
100 -20.3 -31.2
200 -22.7 -33.6
>>> for i in results: print i
[100.0, -20.30…, -31.20…]
[200.0, -22.70…, -33.60…]
PRINTING THE RESULTS
See demo/reading_files directory
for code.
73
More compact version
>>> results = []
>>> f = open('c:\\rcs.txt','r')
>>> f.readline()
'#freq (MHz) vv (dB) hh (dB)\n'
>>> for line in f:
... all = [float(val) for val in line.split()]
... results.append(all)
... < hit return >
>>> for i in results:
... print i
... < hit return >
ITERATING ON A FILE AND LIST COMPREHENSIONS
EXAMPLE FILE: RCS.TXT
#freq (MHz) vv (dB) hh (dB)
100 -20.3 -31.2
200 -22.7 -33.6
74
Writing files
>>> # Mode 'w': create new file:
>>> f = open('a.txt', 'w')
>>> f.write('Hello, world!')
>>> f.close()
>>> open('a.txt').read()
'Hello, world!'
>>> # Use the 'with' statement:
>>> with open('a.txt', 'w') as f:
.... f.write('Wow!')
....
>>> open('a.txt').read()
'Wow!'
>>> # Mode 'a': append to file:
>>> with open('a.txt', 'a') as f:
.... f.write(' Boo.')
....
>>> open('a.txt').read()
'Wow! Boo.'
FILE OUTPUT
>>> # Redirect output of a
>>> # print statement:
>>> f = open('a.txt', 'w')
>>> print >> f, "Here I am."
>>> f.close()
>>> open('a.txt').read()
'Here I am.\n'
REDIRECTED PRINT
WRITE AND READ
>>> f = open('a.txt', 'w+')
>>> print >> f, 12, 34, 56
>>> f.seek(3)
>>> f.read(2)
'34'
>>> f.close()
75
Particle Class Example
>>> class Particle(object):
... # Initializer method
... def __init__(self, m, v):
... self.mass = m # Assign attribute values of new object
... self.velocity = v
... # Method for calculating object momentum
... def momentum(self):
... return self.mass * self.velocity
... # A "magic" method defines object's string representation.
... # Evaluating the repr recovers the Particle object.
... def __repr__(self):
... return "Particle({0}, {1})".format(
... repr(self.mass), repr(self.velocity))
SIMPLE PARTICLE CLASS
EXAMPLE
>>> a = Particle(3.2, 4.1)
>>> print a.momentum()
13.12
>>> a
Particle(3.2, 4.1)
76
Overloading Addition
... def __add__(self, other):
... if not isinstance(other, Particle):
... return NotImplemented
... mnew = self.mass + other.mass
... vnew = (self.momentum() + other.momentum())/mnew
... return Particle(mnew, vnew)
SIMPLE PARTICLE CLASS
Classes can override many behaviors using special method names
including numeric behavior.
EXAMPLE (cont.)
>>> b = Particle(8.6, 10.2)
>>> b, a
(Particle(8.6, 10.2), Particle(3.2, 4.1))
>>> c = a + b
>>> c
Particle(11.8, 8.545762711864406)
>>> print c.momentum()
100.84
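Other special methods are overridden the same way; for example, equality comparison could be added like this (a sketch, not part of the course's Particle class):
...     def __eq__(self, other):
...         if not isinstance(other, Particle):
...             return NotImplemented
...         return (self.mass == other.mass and
...                 self.velocity == other.velocity)
>>> Particle(3.2, 4.1) == Particle(3.2, 4.1)
True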
77
Sorting
# sorting
>>> x = ['s','o','r','t']
# new list
>>> sorted(x)
['o', 'r', 's', 't']
# in-place
>>> x.sort()
>>> x
['o', 'r', 's', 't']
# reversing
# new list (note explicit 'list')
>>> x = ['b', 'a', 'c', 'k']
>>> list(reversed(x))
['k', 'c', 'a', 'b']
# in-place
>>> x.reverse()
>>> x
['k', 'c', 'a', 'b']
CUSTOM SORTS IN-PLACE VS NEW LIST
# define a key function to
# transform values before
# comparing in a sort
>>> def ignore_case(x):
... return x.lower()
>>> x = ['S','o','r','T']
>>> sorted(x)
['S', 'T', 'o', 'r']
>>> sorted(x, key=ignore_case)
['o', 'r', 'S', 'T']
78
Sorting
# comparison functions for a variety of particle values
>>> def by_mass(x):
... return x.mass
>>> def by_momentum(x):
... return x.momentum()
>>> def by_kinetic_energy(x):
... return 0.5*x.mass*x.velocity**2
# sorting particles in a list by their various properties
>>> from particle import Particle
>>> x = [Particle(1.2,3.4), Particle(2.1,2.3), Particle(4.6,.7)]
>>> sorted(x, key=by_mass)
[(m:1.2, v:3.4), (m:2.1, v:2.3), (m:4.6, v:0.7)]
>>> sorted(x, key=by_momentum)
[(m:4.6, v:0.7), (m:1.2, v:3.4), (m:2.1, v:2.3)]
>>> sorted(x, key=by_kinetic_energy)
[(m:4.6, v:0.7), (m:2.1, v:2.3), (m:1.2, v:3.4)]
SORTING CLASS INSTANCES
See demo/particle
directory for sample
code.
79
Excellent source of help
http://www.python.org/doc
80
Error and Exception
Handling
81
Basic Exception Handling
>>> import math
>>> math.log10(10.0)
1.0
>>> math.log10(0.0)
Traceback (innermost last):
ValueError: math domain error
>>> a = 0.0
>>> try:
... r = math.log10(a)
... except ValueError:
... print 'Warning: overflow occurred. Value set to -100.0'
... # set value to -100.0 and continue
... r = -100.0
Warning: overflow occurred. Value set to -100.0
>>> print r
-100.0
CATCHING ERROR AND CONTINUING
ERROR ON LOG OF ZERO
82
Exceptions
try:
a = value_bag[key]
r = math.log10(a)
except KeyError as error:
print "Variable '%s' not found" % error.args
ACCESS TO THE EXCEPTION OBJECT
try:
a = value_bag[key]
r = math.log10(a)
except KeyError as error:
print "Variable '%s' not found" % error.args
except ValueError:
print 'Warning: overflow occurred. Value set to -100.0'
# set value to -100.0 and continue
r = -100.0
MULTIPLE EXCEPTION CLAUSES
Exception Object
83
Exceptions
try:
a = value_bag[key]
r = math.log10(a)
except (KeyError, ValueError) as error:
print "an error occurred", error
HANDLING MULTIPLE EXCEPTIONS IN ONE CODE BLOCK
# If the Exception exception type is
# specified, almost all exceptions
# are caught. Note: Don't do this in
# libraries because it can “mask”
# unexpected exceptions, which
# should be passed to calling code.
try:
employee = directory_lookup(name)
except Exception as e:
print "An error occurred: %s" % e
CATCH ALL EXCEPTIONS
You can also leave off the
exception type completely and
catch all exceptions, including
SystemExit and
KeyboardInterrupt. This can
make it hard to stop execution
of a program, so this is almost
never a good idea.
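For illustration only (do_work is a hypothetical placeholder), a bare except clause looks like this and is best avoided:
try:
    result = do_work()
except:      # catches *everything*, including KeyboardInterrupt and SystemExit
    print "something went wrong"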
84
Exceptions Finally Clause
# The finally clause always executes. Use it for code that
# needs to “clean up” a resource whether the code executed
# successfully or not.
try:
# We don’t want others to use the radio_station object
# while it is on the air.
radio_station.on_air = True
radio_station.broadcast("Pinball Wizard")
except KeyError as error:
print "Could not find song '%s'" % error.args
finally:
# If an exception occurs, or the song finishes,
# the station is taken off air.
radio_station.on_air = False
TRY, FINALLY
85
Exceptions Else Clause
# The else clause only executes if an exception *does not*
# occur.
# An example from Python’s standard tutorial.
#
# Print out the line count for all the file names passed in
# on the command line.
for arg in sys.argv[1:]:
try:
f = open(arg, 'r')
except IOError:
print 'cannot open', arg
else:
print arg, 'has', len(f.readlines()), 'lines'
f.close()
TRY, ELSE
86
Raising Exceptions
# Use the raise statement to raise an exception from your code.
def percent_range_check(percent):
if not 0.0 <= percent <= 100.0:
msg = "Percent value should be between 0 and 100"
# This is the standard, Python 3.0 compatible
# way to raise exceptions.
raise ValueError(msg)
# The following also works, but is deprecated. It will not work
# in Python 3.0
# raise ValueError, msg
>>> percent_range_check(101.0)
ValueError: Percent value should be between 0 and 100
RAISE
87
Error Messages
# Use the raise statement to raise an exception from your code.
def percent_range_check(percent):
if not 0.0 <= percent <= 100.0:
msg = "Percent value should be between 0 and 100"
raise ValueError(msg)
>>> percent_range_check(101.0)
ValueError: Percent value should be between 0 and 100
RAISE
# If possible, print out the offending value in error messages.
def percent_range_check2(percent):
if not 0.0 <= percent <= 100.0:
msg = "Percent (%3.2f) is not between 0 and 100" % percent
raise ValueError(msg)
>>> percent_range_check2(101.0)
ValueError: Percent(101.00) is not between 0 and 100
INFORMATIVE ERROR MESSAGES
88
Warnings
# Warnings alert users without halting execution.
import warnings
def percent_range_warning(percent):
if not 0.0 <= percent <= 100.0:
msg = "Percent(%3.2f) is not between 0 and 100" % percent
warnings.warn(msg, RuntimeWarning)
# Warnings are different from exceptions because they don't halt execution.
>>> percent_range_warning(101.0)
RuntimeWarning: Percent(101.00) is not between 0 and 100
# You can choose to ignore certain types of warnings globally.
# allowable actions: error, ignore, always, default, once, module
>>> warnings.filterwarnings(action="ignore", category=RuntimeWarning)
>>> percent_range_warning(101.0)
<no output>
WARNINGS
89
Exception and Warnings Example
import warnings
from numpy import allclose
def weighted_average(values, weights):
"""
Return the average of all the values weighted by the weights array.
Both values and weights are numpy arrays and must be the same length.
The sum of the weights must be 1.0.
"""
#ensure weights sum to (nearly) 1.0
weights_total = sum(weights)
if not allclose(weights_total, 1.0, atol=1e-8):
msg = "The sum of the weights should usually be 1.0." \
"Instead, it was '%f'" % weights_total
warnings.warn(msg)
# Provide useful error message for unequal array lengths.
if len(weights) != len(values):
msg = "The values (len=%d) and weights (len=%d) arrays" \
"must have the same lengths." % (len(values), len(weights))
raise ValueError(msg)
# weighted average calculation
avg = sum(values * weights) / weights_total
return avg
90
Defining New Exceptions
class LengthMismatchError(ValueError):
""" Sequence of wrong length was used """
pass
def check_for_length_mismatch(a, b):
""" Compare the lengths of sequences a and b. If they are different,
raise a LengthMismatchError.
"""
if len(a) != len(b):
msg = "The two sequences do not have the same length " \
"(%d!=%d)." % (len(a), len(b))
raise LengthMismatchError(msg)
>>> check_for_length_mismatch([1,2],[1,2,3])
LengthMismatchError: The two sequences do not have the same length (2!=3).
EXCEPTION CLASSES
# Catching either LengthMismatchError, or ValueError will catch our exception.
>>> try:
... check_for_length_mismatch([1,2],[1,2,3])
... except ValueError:
... print "exception caught"
exception caught
CATCHING BASE EXCEPTIONS
91
Robust File IO Error Handling
try:
file = open(file_name, "rb")
try:
# Move into the file header and read the “name” field.
file.seek(NAME_OFFSET)
name = file.read(12)
# Other file manipulation...
finally:
# Ensure File is closed, even if an exception occurs.
file.close()
except IOError as err:
# Handle file IO Errors that occur during opening the file or
# reading data from the file here. Use err.errno to determine
# the type of IOError if necessary.
# Using the logging system to report unhandled exceptions.
logging.exception("unexpected IOError")
TYPICAL FILE IO TRY BLOCKS
92
Using ‘with’ for File IO Error Handling
# Enable the "with" statement feature within this module.
# Not necessary for Python >= 2.6.
from __future__ import with_statement
try:
with open("myfile.txt", "rb") as file:
# Move into the file header and read the “name” field.
file.seek(NAME_OFFSET)
name = file.read(12)
# Other file manipulation...
except IOError as err:
# Handle file IO Errors that occur during opening the file or
# reading data from the file here. Use err.errno to determine
# the type of IOError if necessary.
# Using the logging system to report unhandled exceptions.
logging.exception("unexpected IOError")
TYPICAL FILE IO TRY BLOCKS
93
Object Oriented
Programming
In Python
94
Modeling a Power Plant
# Data that describes the
# modeled "object"
name: Comanche Peak
maximum_output: 2.3 Gigawatts
current_output: ? Gigawatts
status: normal
night_watchman: Homer Simpson (!)
...
ATTRIBUTES
BEHAVIORS (METHODS)
# tasks or operations that the
# model does.
start_up()
shut_down()
emergency_shutdown()
...
95
Creating Objects
Object oriented programming unifies (encapsulates)
attributes and behavior within a single
definition, called a Class.
Instances, or Objects, of a class are specific
manifestations of that class.
>>> plant_1 = PowerPlant(name='Comanche Peak')
# Create a 2nd instance from the same class
>>> plant_2 = PowerPlant(name='Susquehanna')
INSTANTIATING CLASS INSTANCES (OBJECTS)
Class Instance
96
Working with Objects
# Our befuddled friend.
>>> homer = Person(first='Homer', last='Simpson')
# Create the power plant he is in charge of...
>>> plant = PowerPlant(name='Comanche Peak', maximum_output=2.3,
night_watchman=homer)
# Start the plant using the 'start_up' method.
>>> plant.start_up()
# Homer does his thing.
>>> homer.eat('donut')
>>> homer.take_nap()
# Check the status (an attribute) of
# the plant. No Bueno.
>>> if plant.status != 'normal':
... print 'status:', plant.status
... homer.speak('Doh!')
... plant.emergency_shutdown()
status: meltdown
Homer says 'Doh!'
97
Class Definition
class PowerPlant(object):
# Constructor method
def __init__(self, # self is ALWAYS the first argument.
name='', maximum_output=0.0, night_watchman=None):
# assign passed-in arguments to new object
self.name = name
self.maximum_output = maximum_output
self.night_watchman = night_watchman
# Intialize other attributes.
self.current_output = 0.0
self.status = 'normal'
# class methods
def start_up(self):
self.start_reactor_cooling_pump()
self.reduce_boric_acid_concentration()
self.remove_control_rods()
def shut_down(self):
...
<other methods>
98
Object Creation Process
# What does Python do when it sees this line of code?
>>> plant = PowerPlant(name='Comanche Peak', maximum_output=2.3,
night_watchman=homer)
CREATING A CLASS OBJECT
# The first step is to call the PowerPlant class' "magic"
# __new__ method to create the object. (This is a secret…)
new_object = PowerPlant.__new__(PowerPlant,
name='Comanche Peak',
maximum_output=2.3,
night_watchman=homer)
# The second step is to call the magic "constructor" method, __init__.
# This is the method you will need to write...
PowerPlant.__init__(new_object,
name='Comanche Peak',
maximum_output=2.3,
night_watchman=homer)
# Python hands this newly created object back to you.
plant = new_object
UNDER THE COVERS…
99
Particle Class Example
>>> class Particle(object):
... # Initializer method
... def __init__(self, m, v):
... self.mass = m # Assign attribute values of new object
... self.velocity = v
... # Method for calculating object momentum
... def momentum(self):
... return self.mass * self.velocity
... # A "magic" method defines object's string representation.
... # Evaluating the repr recovers the Particle object.
... def __repr__(self):
... return "Particle({0}, {1})".format(
... repr(self.mass), repr(self.velocity))
SIMPLE PARTICLE CLASS
EXAMPLE
>>> a = Particle(3.2, 4.1)
>>> print a.momentum()
13.12
>>> a
Particle(3.2, 4.1)
100
Overloading Addition
... def __add__(self, other):
... if not isinstance(other, Particle):
... return NotImplemented
... mnew = self.mass + other.mass
... vnew = (self.momentum() + other.momentum())/mnew
... return Particle(mnew, vnew)
SIMPLE PARTICLE CLASS
Classes can override many behaviors using special method names
including numeric behavior.
EXAMPLE (cont.)
>>> b = Particle(8.6, 10.2)
>>> b, a
(Particle(8.6, 10.2), Particle(3.2, 4.1))
>>> c = a + b
>>> c
Particle(11.8, 8.545762711864406)
>>> print c.momentum()
100.84
101
Inheritance
PowerPlant
NuclearPowerPlant HydroPowerPlant
102
Base Class
class PowerPlant(object):
# Constructor method
def __init__(self, # "self" is ALWAYS the first argument.
name='', maximum_output=0.0, night_watchman=None):
# assign passed-in arguments to new object
self.name = name
self.maximum_output = maximum_output
self.night_watchman = night_watchman
# Initialize other attributes.
self.current_output = 0.0
self.status = 'normal'
# Default implementation for methods
def start_up(self):
self.current_output = self.maximum_output
def shut_down(self):
self.current_output = 0.0
...
103
Sub-Classing
# Derive 'specialized' classes from the PowerPlant base class.
# They will 'inherit' the methods of the base class.
class NuclearPowerPlant(PowerPlant):
# over-ride methods that need custom behavior.
def start_up(self):
self.start_reactor_cooling_pump()
self.reduce_boric_acid_concentration()
self.remove_control_rods()
def shut_down(self):
...
104
Using Super
# Use 'super' to call the "super-class" (parent class) methods.
class HydroPowerPlant(PowerPlant):
def __init__(self, name='', river_name='',
maximum_output=0.0, night_watchman=None):
# Use 'super' to call an over-ridden method from the
# base class.
super(HydroPowerPlant, self).__init__(name=name,
maximum_output=maximum_output,
night_watchman=night_watchman)
self.river_name = river_name
# over-ride methods that need custom behavior.
def start_up(self):
self.open_sluice_gate()
self.unlock_turbine_shaft()
105
Interchangeable Classes
# The same code will work without modification for a
# NuclearPowerPlant or a HydroPowerPlant.
>>> homer = Person(first='Homer', last='Simpson')
# Create the power plant he is in charge of...
>>> plant = HydroPowerPlant(name='Comanche Peak', maximum_output=2.3,
night_watchman=homer)
# Start the plant using the 'start_up' method.
>>> plant.start_up()
# Homer does his thing.
>>> homer.eat('donut')
>>> homer.take_nap()
# Check the status (an attribute) of the plant. No Bueno.
>>> if plant.status != 'normal':
... print 'status:', plant.status
... homer.speak('Doh!')
... plant.emergency_shutdown()
status: Fish stuck in impeller.
Homer says 'Doh!'
106
Advanced Python Topics
107
Properties
108
Properties: an example
>>> a = MyNumberA(number=13)
>>> a.number
13
>>> a.number = 200
>>> a.number
200
>>> a.set_number('0xFF')
>>> a.number
255
>>> a.number = '0xFFFF' # !!!
>>> a.number
'0xFFFF'
The set_number method allows
the value to be an integer or a
hex or binary string (e.g. ‘0xFF’
or ‘0b1101’).
# my_number.py
class MyNumberA(object):
def __init__(self, number):
self.number = number
def set_number(self, value):
if isinstance(value, basestring):
self.number = eval(value)
else:
self.number = value
109
Properties: definition
Wouldn’t this be nice?
>>> a.number = '0xFFFF'
>>> a.number
65535
>>> a.number = '0b1101'
>>> a.number
13
The number attribute magically converts a hex string or a binary string to the integer value.
We can do this by creating a property. A property allows us to implement something
that acts like an attribute, but attempting to get or set a value of the attribute actually results
in a function call. property() is a built-in function in Python.
property(fget=None, fset=None, fdel=None, doc=None)
110
Properties: application 1
>>> b = MyNumberB(number=100)
>>> b.number
100
>>> b.number = '0xFF'
>>> b.number
255
>>> b.number = '0b1101'
>>> b.number
13
# my_numberB.py
class MyNumberB(object):
def __init__(self, number):
self._number = number
def _get_number(self):
return self._number
def _set_number(self, value):
if isinstance(value, basestring):
number = eval(value)
else:
number = value
self._number = number
number = property(_get_number, _set_number)
111
Properties: application 2
For example, the energy and momentum methods of the Particle class can be
implemented as read-only properties:
# My_particle_class.py
class Particle(object):
def _get_momentum(self):
return self.mass * self.velocity
def _get_energy(self):
return 0.5 * self.mass * self.velocity**2
def _not_allowed(self, value):
raise RuntimeError("Can not set energy or momentum
directly.")
energy = property(_get_energy, _not_allowed)
momentum = property(_get_momentum, _not_allowed)
112
Properties
Then, for example:
>>> p = Particle(mass=3.0, velocity=4.0)
>>> p.momentum
12.0
>>> p.energy = 100
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "particle_with_properties.py", line 23, in _not_allowed
raise RuntimeError("can not set energy or momentum directly.")
RuntimeError: can not set energy or momentum directly.
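Note that the built-in property can also be applied as a decorator (decorator
syntax is covered later in this manual); a minimal sketch of a read-only
momentum property written that way:
class Particle(object):
    def __init__(self, mass, velocity):
        self.mass = mass
        self.velocity = velocity
    @property                  # property() used as a decorator
    def momentum(self):        # read-only: no setter is defined
        return self.mass * self.velocity

>>> Particle(3.0, 4.0).momentum
12.0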
113
General Scoping Rules
Local Function Scope
Enclosing Function Scope
Global Scope
Builtin Scope
Variables are looked up from
namespaces in the following order:
114
Scoping Rules
# If a requested variable is not
# found in the local scope, the
# "global" or module level scope
# is searched.
# global module level variable
c = 1
def foo(a,b):
d = a + b + c
LOCAL SCOPE
GLOBAL SCOPE
MODIFYING GLOBALS
# a, b, c and d are all local
# variables in the function.
def foo(a,b):
c = 1
d = a + b + c
# Modifying globals from within
# a local scope requires the
# global keyword.
c = 1
def foo():
global c
c = 2
>>> foo()
>>> print c
2
115
Scoping Rules
BUILTIN SCOPE
# If not found in the local or
# global scope, the "builtin"
# scope is searched. 'len' is
# a builtin function.
def list_length(a):
return len(a)
>>> a = [1,2,3]
>>> print list_length(a)
3
# The "builtin" functions are
# found in the special
# __builtin__ module
>>> import __builtin__
>>> __builtin__.len
<built-in function len>
117
Class Scoping Rules
# global variable
var = 0
class MyClass(object):
# class variable
var = 1
def access_class_c(self):
print 'class var:', self.var
def access_global_c(self):
print 'global var:', var
def write_class_c(self):
# Modify the class variable.
MyClass.var = 2
print 'class var:', self.var
def write_instance_c(self):
# Create an instance variable.
self.var = 3
print 'instance var:', self.var
See
demo/variable_scoping/class_
scoping.py.
>>> obj = MyClass()
>>> obj.access_class_c()
class var: 1
>>> obj.access_global_c()
global var: 0
>>> obj.write_class_c()
class var: 2
>>> obj.write_instance_c()
instance var: 3
CLASS SCOPING EXAMPLES
118
Lexical Scoping Rules
NESTED SCOPE FUNCTION CLOSURE
# Nested functions have access
# to the enclosing function’s
# variables.
def outer():
a = 1
def inner():
print "a =", a
inner()
>>> outer()
a = 1
# The inner() function has access
# to the outer variables that it
# uses for its entire lifetime.
def outer():
a = 1
def inner():
return a
return inner #Note: returns
#function object!
>>> func = outer()
>>> print 'a (1):', func()
a (1): 1
See
demo/variable_scoping/lexical
_scoping.py.
119
Partial Function Application (or Curries)
PARTIAL
# functools.partial "freezes"
# one or more arguments of a
# function to create a simplified
# signature and/or modify default
# arguments.
# By default, int uses base 10
# conversion from string to int.
>>> int('10010')
10010
# Create function to convert
# base 2 strings to ints.
>>> from functools import partial
>>> basetwo = partial(int, base=2)
>>> basetwo('10010')
18
120
Decorators
121
Func as objects and arguments
Passing Functions Function Objects
>>> def foo(x):
... print x
# foo is of type ‘function’.
>>> type(foo)
<type 'function'>
# Functions have a set of
# attributes, like other objects.
>>> dir(foo)
['__call__', ..., 'func_closure',
'func_code', 'func_defaults',
'func_dict', 'func_doc',
'func_globals', 'func_name']
# __call__ is the most important
>>> foo(42)
42
In Python, functions are first-class objects.
# Since functions are objects,
# like everything else in Python,
# they can be passed as arguments
# to other functions!
>>> def foo(x):
... print x
...
>>> def bar(f, x):
... x += 1
... f(x)
...
>>> bar(foo, 42)
43
Note: Python has a ‘lambda’
keyword for defining anonymous
functions inline.
122
Decorators
# A basic decorator
>>> def dec(f):
... print "I am decorating function ", id(f)
... return f
>>> declen = dec(len)
I am decorating function 2338768
>>> declen([10, 20, 30])
3
Decorators are functions which take a function as an argument and
(usually) return another function.
Example
123
Decorators
# Makes a new function that will loudly call the original.
# Note the definition of new_func inside the definition of
# loud.
>>> def loud(f):
... def new_func(*args, **kw):
... print "Calling with", args, kw
... rtn = f(*args, **kw)
... print "Return value is", rtn
... return rtn
... return new_func
>>> loudlen = loud(len) # Create a modified len().
>>> n = loudlen([10, 20, 30]) # Call it.
Calling with ([10, 20, 30],) {}
Return value is 3
>>> n
3
Example
124
Decorators
# Given a decorator 'dec', the phrasing...
def foo(x):
print x
foo = dec(foo)
# can be replaced by...
@dec
def foo(x):
print x
# If the decorator returns a function,
# decorations may be chained.
@dec1
@dec2
def foo(x):
print x
Python uses the ‘@’ symbol to replace a function with its decorated version.
@ Notation
125
Decorator Examples
# A decorator that adds 1
def plus_one(f):
def new_func(x):
return f(x) + 1
return new_func
# A decorator that multiplies by 2
def times_two(f):
def new_func(x):
return f(x) * 2
return new_func
# A decorated function
@plus_one
@times_two
def foo(x):
return int(x)
>>> foo(13)
27
126
Decorator Factories
# A basic decorator generator
def super_dec(x, y, z):
# The decorator we are creating
def dec(f):
# The function to return
def new_func(*args, **kw):
print x + y + z
return f(*args, **kw)
return new_func
return dec
If you want to change the behavior of the decorator when you decorate, you
do this by writing a decorator factory. A factory is a function that returns a
decorator.
127
Decorator Factory Example
# Appends a string to the function output
def append_string(s):
def dec(f):
def new_func(*args, **kw):
new_s = f(*args, **kw) + s
return new_s
return new_func
return dec
@append_string(", World")
def hello():
return "Hello"
>>> hello()
Hello, World
@append_string(", Moon")
def goodnight():
return "Goodnight"
>>> goodnight()
Goodnight, Moon
128
Iterators, generators and
coroutines
129
Iterators
class FootballRushIterator(object):
def __init__(self, count=5):
self.count = count
def __iter__(self):
# Prepare an iterable object.
self._counter = 1
return self
def next(self):
cnt = self._counter
if cnt <= self.count:
# Return the next element
res = str(cnt) + " mississippi"
self._counter += 1
return res
else:
# Or signal the end of iteration.
raise StopIteration
CLASS DEFINITION
>>> rusher = FootballRushIterator()
>>> for count in rusher:
... print count
1 mississippi
2 mississippi
3 mississippi
4 mississippi
5 mississippi
Iterators are anything that can be looped over. They shield users from
implementation details of looping over an object. Classes that have __iter__ and
next methods can be used as iterators.
def get_rusher(count=5):
for i in range(count):
yield "%d mississippi" % (i+1,)
>>> for count in get_rusher():
... print count
USING ITERATORS (WITH GENERATOR)
130
Generators
def simple_generator():
print 'first'
yield 1
print 'second'
yield 2
>>> sequence = simple_generator()
>>> sequence.next()
first
1
>>> sequence.next()
second
2
>>> sequence.next()
Traceback (most recent call last):
File "<stdin>" line 1, in ?
StopIteration
USING GENERATORS GENERATOR BASICS
# Create a generator that returns the
# first N elements of the Fibonacci
# sequence.
def fib(N=10):
a, b = 0, 1
for count in range(N):
yield b
a, b = b, a+b
>>> fib_gen = fib(6)
>>> for value in fib_gen:
print value
1
1
2
3
5
8
Generators are a special kind of function that contain a yield statement. These
functions are really iterator factories. The returned iterator can be looped over or
its next() method can be called directly to return elements from the sequence.
131
Generators --> CoRoutines
def receiving_generator():
while True:
val = (yield)
print "Received", val
>>> sequence = receiving_generator()
# The first next() call advances to
# the first yield
>>> sequence.next()
>>> sequence.next()
Received None
>>> sequence.send(1)
Received 1
>>> sequence.send("hello")
Received hello
>>> sequence.close()
EMITTING AND RECEIVING COROUTINES BASICS
Generators can also receive values, through the value of a yield
expression; such generators are called coroutines.
def emitting_receiving_generator():
i = -1
while True:
i += 1
val = (yield i)
print "Received", val
>>> sequence = emitting_receiving_generator()
>>> sequence.next()
0
>>> sequence.next()
Received None
1
>>> sequence.send("hello")
Received hello
2
>>> sequence.close()
132
Context managers and the with
statement
Introduction
New in Python 2.5, where it must first be enabled with an import:
from __future__ import with_statement
Context managers are objects that possess the
__enter__ and __exit__ methods. They are used to
define a sequence of setup and tear-down operations
that the with statement will call automatically even if
exceptions are raised, similar to a try/finally block.
An example of a context manager is the file pointer
which includes f.close() in its __exit__ method.
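As an alternative to writing a class, the standard library's
contextlib.contextmanager decorator (available since Python 2.5) builds a
context manager from a generator. A minimal sketch, reusing the radio_station
idea from the finally-clause slide (the object and its attributes are
illustrative):
from contextlib import contextmanager

@contextmanager
def on_air(station):
    station.on_air = True          # setup, runs on entering 'with'
    try:
        yield station              # body of the 'with' block runs here
    finally:
        station.on_air = False     # tear-down, runs even on exceptions

# with on_air(radio_station) as s:
#     s.broadcast("Pinball Wizard")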
133
Example
134
class MyContext(object):
def __enter__(self):
print "I entered"
return self
def __exit__(self, type, value, tb):
print("Exception info: %s, %s, %s" % (type, value, tb)
print "Before exiting important things to do...
CONTEXT CREATION
>>> with MyContext() as c:
... print "I am doing things..."
... raise RuntimeError("Crash!")
I entered
I am doing things...
Exception info: 'RuntimeError',
Crash!, <traceback object at 0x824>
Before exiting important things
to do...
RuntimeError: Crash!
>>> c
<__main__.MyContext at
0xa3c09b0>
USING THE CONTEXT
135
Dynamic Code Execution
EVAL EXEC
# The "eval" function will
# evaluate a Python expression
# within a given global and local
# namespace (dictionary).
# Both the global and local
# dictionary are optional.
# eval(expr, globals, locals)
>>> a = 1
>>> eval("a+1")
2
# Specify a local dictionary.
>>> local = dict(a=2)
>>> glob = {}
>>> eval("a+1", glob, local)
3
# The "exec" statement will
# execute a Python statement
# within a given global and
# local namespace (dictionary).
# Both the global and local
# dictionary are optional.
# exec(stmt, globals, locals)
>>> a = 1
>>> exec("b = a+1")
>>> print b
2
# Specify a local dictionary.
>>> local = dict(a=2)
>>> glob = {}
>>> exec("b=a+1", glob, local)
>>> print local
{'a': 2, 'b': 3}
Be careful about eval'ing/exec'ing untrusted user input: it gives full
access to Python.
136
exec with a dictionary-like namespace
EXEC
class LoudDict(dict):
""" This dictionary announces whenever one of its values
is get or set.
"""
def __getitem__(self, key):
value = super(LoudDict, self).__getitem__(key)
print "retrieving %s which is %s" % (key, value)
return value
def __setitem__(self, key, value):
print "setting %s = %s" % (key, value)
super(LoudDict, self).__setitem__(key, value)
# Execute a statement in our "LoudDict" and watch it announce
# variable get/set during the execution.
>>> local = LoudDict(a=1)
>>> exec "b=a+1" in {}, local
retrieving a which is 1
setting b = 2
See demo/dictionary_like_exec.py
137
Dynamic ByteCode Generation
COMPILE
# The ”compile" function will generate bytecode from a Python
# expression or statement. It can then be evaluated multiple
# times without regenerating the bytecode each time.
# Syntax: compile(str, filename, mode)
>>> a = 1
>>> c=compile("a+2", "", "eval")
>>> eval(c)
3
>>> c=compile("b=a+2", "", "exec")
>>> exec(c)
>>> b
3
>>> a = 3
>>> exec(c)
>>> b
5
Be careful about eval'ing/exec'ing untrusted user input: it gives full
access to Python.
138
csv read and write CSV files
The csv module provides classes for reading and writing Comma Separated
Value (CSV) files.
csv.reader - Read a list of comma-separated values from a file.
csv.writer - Output a list (or other iterable) to a file.
The format is configurable; in particular, the data delimiter does not have to be
a comma.
Default “dialect” is compatible with Excel.
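The module also provides csv.DictReader and csv.DictWriter, which map each row
to a dictionary keyed by a header row. A small sketch (the file name and
column names are invented):
import csv
# Assumes "people.csv" starts with a header line: name,age
rdr = csv.DictReader(open("people.csv"))
for row in rdr:
    print row["name"], row["age"]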
139
csv examples
Sample data file “data.csv”:
"alpha 1", 100, -1.443
"beta 3", 12, -0.0934
"gamma 3a", 192, -0.6621
"delta 2a", 15, -4.515
>>> import csv
>>> rdr = csv.reader(
open("data.csv"))
>>> for row in rdr:
... print row
['alpha 1', ' 100', ' -1.443']
['beta 3', ' 12', ' -0.0934']
['gamma 3a', ' 192', ' -0.6621']
['delta 2a', ' 15', ' -4.515']
>>> import csv
>>> data = [("One", 1, 1.5),
("Two", 2, 8.0)]
>>> f = open("out.csv", "w")
>>> wrtr = csv.writer(f)
>>> wrtr.writerows(data)
>>> f.close()
One,1,1.5
Two,2,8.0
Output file “out.csv”:
READER WRITER
140
csv options
delimiter: str (a single character)
The field separator character
doublequote: bool
If True, a quote inside a string is doubled.
If False, prefix the quote with escapechar.
escapechar: str (a single character)
The character that indicates the following
character has no special meaning.
quotechar: str (a single character)
Character used to quote strings containing
delimiters or other special characters.
quoting: one of csv.QUOTE_* constants
Controls when quotes should be generated
by the writer and recognized by the reader.
>>> data
[('One', 1, 1.5), ('Two', 2, 8.0)]
>>> f = open("out.txt", "w")
>>> wrtr = csv.writer(f,
delimiter='|',
quoting=csv.QUOTE_ALL)
>>> wrtr.writerows(data)
>>> f.close()
"One"|"1"|"1.5"
"Two"|"2"|"8.0"
OPTIONS (not a complete list) EXAMPLE
Output file “out.txt”:
141
Connecting to Databases
142
DB-API 2.0
PEP (Python Enhancement Proposal) 249 defines version
2.0 of the DB-API, which promotes consistency between
Python front-ends to databases.
Database
Module or Modules for Python 2.X
Oracle cx_Oracle, DCOracle2, mxODBC, pyodbc
PostgreSQL psycopg2, PyGreSQL, pyPgSQL, mxODBC, pyodbc, pg8000
MySQL MySQLdb, mxODBC, pyodbc, myconnpy
Sqlite sqlite3 (included in standard library)
Microsoft SQL Server adodbapi, pymssql, mssql, mxODBC, pyodbc
Ingres ingresdbi
IBM DB2 Ibm_db, PyDB2, ceODBC, mxODBC, pyodbc
Sybase ASE Sybase, mxODBC
Sybase SQL Anywhere mxODBC, sqlanydb
SAP DB sdb.dbapi, sapdbapi, mxODBC, sdb.sql, sapdb
Informix InformixDB, mxODBC,
Firebird KInterbasdb
143
DB-API 2.0
Connect to the database
Get a cursor object
Execute SQL statements
Execute SQL Statements (cont.)
Close cursor and connection
import <somedbmodule> as db
# Connect to the data-source
# Extra arguments typically
# include user and password
conn = db.connect(<dsn>,…)
c = conn.cursor()
c.execute("create table stocks (date
text, trans text, symbol text, qty real,
price real)")
c.execute("insert into stocks values
('2006-01-05','BUY','RHAT',100,35.14)")
conn.commit()
c.close()
conn.close()
t = ('2006-04-06', 'SELL',
'IBM', 500, 53.00)
c.execute("insert into stocks values
(?,?,?,?,?)", t)
# some modules use a different
# place-holder instead of ?
# see db.paramstyle
t = ('IBM',)
c.execute("select * from stocks where
symbol=? order by price", t)
for row in c:
print row
single = c.fetchone()
list_of_lists = c.fetchall()
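Because sqlite3 ships with the standard library, the same steps can be tried
without installing anything. A minimal, self-contained sketch (in-memory
database; the table and values are only for illustration):
import sqlite3 as db
conn = db.connect(":memory:")              # connect to the data-source
c = conn.cursor()                          # get a cursor object
c.execute("create table stocks (date text, trans text, "
          "symbol text, qty real, price real)")
c.execute("insert into stocks values "
          "('2006-01-05', 'BUY', 'RHAT', 100, 35.14)")
conn.commit()
c.execute("select * from stocks where symbol=?", ('RHAT',))
print c.fetchall()
c.close()                                  # close cursor and connection
conn.close()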
144
An ORM: SQLAlchemy
146
Regular Expressions
in Python
147
What is a regular expression?
A regular expression is a pattern designed to match
specific strings or sub-strings.
A regular expression can match very specific strings, as
well as very general strings.
Once matched, specific fields in the string can be extracted
or substituted. Information about the location, and type
of match is stored in a match object.
148
>>> import re
Some of the more common uses of the re API
>>> re.match(pattern, string[, flags])
>>> re.search(pattern, string[, flags])
Matches (or search) the pattern against the string, returning a match object on
success and None on failure (flags described later). re.match only searches for a
match at the beginning of the string. re.search will search through the entire string.
>>> re.findall(pattern, string)
Returns a list of matches of pattern in string. re.finditer returns an iterator.
>>> re.split(pattern, string[, maxsplit])
Returns a list of sub-strings, delimited by the regular expression pattern. If maxsplit
is specified only maxsplit splits will occur.
>>> re.sub(pattern, repl, string[, count])
Returns a string where every occurrence matching the regular expression pattern in
string is replaced by repl, unless count is reached (specifies how many
substitutions to make).
>>> re.compile(pattern)
Compiles the regex and returns a pattern object which can be used for searching,
matching, substituting, etc.; match, search, findall, … are then methods of the object.
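A brief interactive sketch of the functions above that are not demonstrated on
the following slides (the strings are invented for illustration):
>>> import re
>>> re.findall("\d+", "10 apples and 20 pears")
['10', '20']
>>> re.split(",\s*", "a, b,c")
['a', 'b', 'c']
>>> re.sub("\s+", "_", "hello   world")
'hello_world'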
149
Regular expression syntax
. Matches any character except newline
\w Matches any alphanumeric character
\d Matches any decimal digit, equal to [0-9]
\s Matches whitespace, equal to [ \t\n\r\f\v]
[…] Indicates a set of characters which can
be matched. Ranges can be specified
using - (e.g. a-z, 0-9).
(...) Groups characters/symbols into 1 block
| Stands for logical OR
^ Takes the complement of the following set
* Causes the previous RE to match zero
or more repetitions
+ Matches 1 or more repetitions of the
previous RE
? Matches zero or 1 occurrences of the
previous RE
{m} Matches m occurrences of the previous RE
A regex pattern is composed of regular characters as well as various symbols for
matching patterns in strings...below are some of the common symbols.
\D, \S and \W denote the
complements of the same sets.
E.g., ca*t matches ‘ct’, ‘cat’, ‘caaaat’, …
(ab\d)|(ac\d) matches ‘ab1’, ‘ac9’, …
([^a-q]bd) matches ‘rbd’, ‘5bd’, …
150
Match objects & groups
When a regular expression matches, a match object is returned.
>>> myString = "hello world"
>>> print re.match("hello (\w+)", myString)
<_sre.SRE_Match object at 0x009E45E0>
Check if a match was successful, extract matched groups, location, …
>>> myString = "hello there"
>>> match = re.match("hello (\w+)", myString)
>>> if match != None:
... print match.group(1)
'there'
>>> print match.start(1), match.end(1)
6 11
Groups are numbered left to right, based on open parenthesis...
>>> re.match("hello (\w+(\d+)*) (\w+)", myString)
Group 1 Group 2 Group 3
Nested groups
count too!
151
Regular expression example
compile() returns a re pattern object
>>> patt1 = re.compile("hello\s+user\s+number\s+(\d+)(\.\d+)*")
Match one or
more space
characters First group will
match 1 or
more digits
2nd group will match an
optional dot then 1 or more
digits. Optional because of *,
dot is escaped with \ since it
is a special symbol
>>> patt1 = re.compile("hello\s+user\s+number\s+(\d+)(\.\d+)*")
>>> s1 = "hello user number 87"
>>> s2 = "hello user number 87.34“
>>> m = patt1.match(s1)
>>> print m.group(1)
'87'
>>> m = patt1.match(s2)
>>> print m.groups()
('87', '.34')
152
Parse the following text to extract the
number and the type of all travellers:
"5 men, 7 dogs and 12 goats are
travelling together."
Exercise
153
A note about parsing XHTML
Source: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
154
NumPy’s fromregex
NumPy contains a function fromregex that can be used to convert arbitrary
text files to (structured) arrays.
data = fromregex(file, pattern, dtype)
file: file-like object or filename string.
pattern: Regular expression to parse file with.
Groups must correspond to fields in the
dtype.
dtype: data-type for the structured array, with
a field for each group in the regular
expression.
Output is a structured array where each element is a
match to the regular expression in the file.
155
fromregex example
# Create the file
>>> f = open('test.dat', 'w')
>>> f.write("1312 foo\n1534 bar\n444 qux")
>>> f.close()
# match [digits, whitespaces, anything]
>>> regexp = "(\d+)\s+(...)"
>>> output = fromregex('test.dat', regexp,
... [('num', np.int64), ('key', 'S3')])
>>> output
array([(1312L, 'foo'), (1534L, 'bar'), (444L, 'qux')],
dtype=[('num', '<i8'), ('key', '|S3')])
>>> output['num']
array([1312, 1534, 444], dtype=int64)
156
datetime Dates and Times
TIME OBJECT DATE OBJECT
# date(year, month, day)
# date in Gregorian calendar,
# assuming its permanence
>>> d1 = date(2007, 9, 25)
>>> d2 = date(2008, 9, 25)
>>> d1.strftime('%A %m/%d/%y')
'Tuesday 09/25/07'
# difference is timedelta
>>> print d2 - d1
366 days, 0:00:00
>>> (d2-d1).days
366
>>> print date.today()
2008-09-24
>>> from datetime import date, time
# time(hour, min, sec, us)
# local time of day
# always 24 hrs per day
>>> t1 = time(15, 38)
>>> t2 = time(18)
>>> t1.strftime('%I:%M %p')
'03:38 PM'
# difference is not supported
>>> print t2 - t1
Traceback ...
TypeError: unsupported operand ...
# use datetime objects for
# difference operation.
157
datetime Dates and Times
DATETIME OBJECT
# datetime(year, month, day,
hr, min, sec, us)
# combination of date and time
>>> d1 = datetime.now()
>>> print d1
2008-09-24 14:20:30.978207
>>> d2 = d1 + timedelta(30)
>>> d2.strftime('%A %m/%d/%y')
'Friday 10/24/08'
# creating datetime from
# a format string
>>> datetime.strptime('2/10/01',
'%m/%d/%y')
datetime.datetime(2001, 2, 10, 0, 0)
>>> from datetime import datetime, timedelta
Directive Meaning
%a (%A) Abbrev. (full) weekday
name.
%w Weekday number [0(Sun),6]
%b (%B) Abbrev. (full) month name
%d Day of month [01,31]
%H (%I) Hour [00,23] ([01,12])
%j Day of the year [001,366]
%m Month [01,12]
%M Minute [0,59]
%p AM or PM
%S Second [00,61]
%U (%W) Week number of the year
[00,53] Sunday (Monday) as
first day of week.
%y (%Y) Year without (with)
century [00,99]
DATETIME FORMAT STRING
158
sys module
>>> import sys
Some frequently used attributes and functions; see the
reference manual for complete details.
sys.argv
List of command line arguments.
sys.argv[0] is the name of the python script.
Example:
sys.exc_info()
Returns a tuple (type, value, traceback)
sys.exc_clear()
Clear all exception information.
Command Line Arguments Exception Information
# File: print_args.py
import sys
print sys.argv
$ python print_args.py 1 foo
['print_args.py', '1', 'foo']
>>> try:
... x = 1/0
... except Exception:
... print sys.exc_info()
...
(<type 'exceptions.ZeroDivisionError'>,
ZeroDivisionError('integer division or
modulo by zero',), <traceback object at
0x9a8c8>)
>>>
162
sys module
sys.exit(arg)
Exit from Python. arg is optional. It can be an
integer giving the exit status (defaults to zero).
If not an integer, None is equivalent to passing
0, and any other argument is printed to
sys.stderr and the exit status is 1.
sys.stdin
sys.stdout
sys.stderr
The interpreter’s standard input, output and
error streams.
sys.__stdin__
sys.__stdout__
sys.__stderr__
The original values of sys.stdin, sys.stdout
and sys.stderr at the start of the program.
Exit Standard File Objects
Python’s module search path
sys.path
A list of strings that specifies the interpreter’s
search path for modules.
A program is free to modify this list
dynamically.
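For example, a script can put a directory of its own at the front of the
search path (the directory name here is made up):
import sys
# Prepend a (hypothetical) directory so its modules are found first.
sys.path.insert(0, '/home/user/my_modules')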
163
sys module
sys.version
A string containing information
about the Python version.
sys.version_info
A tuple containing information about
the Python version: major, minor,
micro, releaselevel and serial.
>>> sys.version
'2.6.5 |EPD 6.2-2 (32-bit)|
(r265:79063, May 7 2010, 13:28:19)
[MSC v.1500 32 bit (Intel)]'
>>> sys.version_info
(2, 6, 5, 'final', 0)
sys.platform
A string containing the platform identifier.
Windows: 'win32'
Mac OS X: 'darwin'
Linux: 'linux2'
sys.getwindowsversion()
Return a tuple that describes the version of
Windows currently running: major, minor,
build, platform, and service_pack. (More is
included in Python 2.7.)
>>> sys.platform
'win32'
>>> sys.getwindowsversion()
(5, 1, 2600, 2, 'Service Pack 3')
Python Version Platform Information
See also the platform module in the
standard library.
164
os module
>>> import os
os.remove(path), os.unlink(path)
Remove a file from disk. The path can be
either a full path or a path relative to
the current working directory.
os.chdir(path)
Change the current working directory to
the provided path.
os.getcwd()
Return the current working directory.
os.listdir(path)
Return a list of strings containing all the
files in the given path (does not include '.'
or '..' in the listing).
os.linesep (e.g. '\n' or '\r\n')
Line separator in text mode.
os.sep (e.g. '/' or '\')
Path separator on file system.
os.pathsep (e.g. ':' or ';')
Search path separator (i.e. in
environment variables).
Path Operations Separation Constants
os.environ
Dictionary of all environment variables
os.urandom(len)
String of random bytes
os.error
Error object
Others
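A short sketch combining a few of the calls above (the results depend on the
machine, so no output is shown):
import os
print os.getcwd()               # current working directory
print os.listdir('.')           # files in that directory
print os.environ.get('HOME')    # an environment variable (may be None)
print repr(os.linesep)          # '\n' on Unix, '\r\n' on Windows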
165
os.path
os.path.isfile(path)
Test whether a path is a regular file.
os.path.isdir(path)
Test whether a path refers to an existing
directory.
os.path.exists(path)
Test whether a path exists.
os.path.isabs(path)
Test whether a path is absolute.
os.path.abspath(path)
Return an absolute path.
os.path.dirname(path)
Return the directory component of a
pathname.
os.path.basename(path)
Return the final component of a
pathname.
os.path.splitext(path)
Split the extension from a pathname.
Returns (root, ext).
os.path.expanduser(path)
Expand ~ and ~user. If user or $HOME is
unknown, do nothing.
os.path tests Others
os.path split and join
os.path.split(path)
Split a pathname. Returns the tuple
(head, tail).
os.path.join(a, *p)
Join two or more path components.
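A small interactive sketch tying the tests and the split/join helpers together
(the path is made up; output shown for a Unix-style path):
>>> import os.path
>>> p = '/tmp/data/report.txt'
>>> os.path.split(p)
('/tmp/data', 'report.txt')
>>> os.path.splitext(p)
('/tmp/data/report', '.txt')
>>> os.path.basename(p), os.path.dirname(p)
('report.txt', '/tmp/data')
>>> os.path.join('/tmp', 'data', 'report.txt')
'/tmp/data/report.txt'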
166
Remote Call example: xmlrpc
from SimpleXMLRPCServer import SimpleXMLRPCServer
def fact(n):
if n <= 1:
return 1
else:
return n * fact(n-1)
# Start a server listening for requests on port 8001.
if __name__ == '__main__':
server = SimpleXMLRPCServer(('localhost', 8001))
# Register fact as a function provided by the server.
server.register_function(fact)
server.serve_forever()
>>> import xmlrpclib
>>> svr = xmlrpclib.Server("http://localhost:8001")
>>> svr.fact(10)
3628800
CLIENT CODE CALLS REMOTE FACTORIAL FUNCTION
XMLRPC FACTORIAL SERVER VERSION 1
167
Remote Call example: xmlrpc
from SimpleXMLRPCServer import SimpleXMLRPCRequestHandler, \
SimpleXMLRPCServer
def fact(n):
if n <= 1: return 1
else: return n * fact(n-1)
# Override the _dispatch() handler to call the requested function.
class my_handler(SimpleXMLRPCRequestHandler):
def _dispatch(self, method, params):
print "CALL", method, params
return apply(eval(method), params)
# Start a server listening for requests on port 8001.
if __name__ == '__main__':
server = SimpleXMLRPCServer(('localhost', 8001), my_handler)
server.serve_forever()
>>> import xmlrpclib
>>> svr = xmlrpclib.Server("http://localhost:8001")
>>> svr.fact(10)
3628800
CLIENT CODE CALLS REMOTE FACTORIAL FUNCTION
XMLRPC FACTORIAL SERVER VERSION 2
168
Calling Subprocesses
169
Executing other programs
>>> from subprocess import Popen, PIPE, STDOUT
p = Popen([cmd,arg1,…,argn], env=None, cwd=None, shell=False,
stdin=None, stdout=None, stderr=None)
cmd,arg1,…,argn Command to run plus any arguments
stdin, stderr, stdout
Standard file handles: can be None (use
handles of the current process), any open file
descriptor, any open file object, or PIPE.
(stderr can also be STDOUT for a redirect).
env Dictionary of environment variables. If None,
then the current process environment is used.
shell Run the cmd in the shell if True
cwd Directory to change to before executing the
command.
Command begins to execute immediately!
170
Executing other programs
# On Windows
>>> Popen('dir', shell=True, cwd="c:\\")
Volume in drive C has no label.
Volume Serial Number is A080-3756
Directory of c:\
08/30/2007 … AUTOEXEC.BAT
08/30/2007 … CONFIG.SYS
01/24/2009 … Python25
01/24/2009 … WINDOWS
2 File(s) 0 bytes
7 Dir(s) 1,401,319,424 bytes free
# On Unix (no shell needed)
>>> fid = open('simple.txt','w')
>>> Popen('ls', cwd='/usr', stdout=fid)
>>> fid.close()
bin
games
include
lib
lib32
lib64
local
sbin
share
src
X11R6
SIMPLE COMMAND EXECUTION
simple.txt
171
Executing other programs
WAIT FOR TERMINATION POLL PERIODICALLY
from subprocess import Popen
print "Starting..."
# sleep for 10 seconds
p = Popen(['sleep','10'])
# wait to finish
# and get return code
ret = p.wait()
# return code also stored
# in subprocess object
# after completion
print "Done", p.returncode
Starting...
<10 seconds later>
Done 0
from subprocess import Popen
import sys
print "Starting..."
# sleep for 10 seconds
p = Popen(['sleep','10'])
# continue on with processing
print "Continuing",
k=0
while p.poll() is None:
k += 1
if (k % 100000) == 0:
print k//100000,
sys.stdout.flush()
print
print "Done", p.returncode
Starting...
Continuing 1 2 3 … 45
Done 0
172
Executing other programs
# Setup process which will receive
# stdin through a PIPE and send
# stdout to a PIPE
>>> p = Popen(['python', 'count.py',
'c'], stdin=PIPE, stdout=PIPE,
stderr=PIPE)
>>> instr = "gatccatgca"
>>> out, err = p.communicate(instr)
>>> print out
3
>>> import shlex
>>> shlex.split('python count.py c')
['python', 'count.py', 'c']
import sys
letter = sys.argv[1]
data = sys.stdin.read()
print data.count(letter)
count.py
PIPES
"Pipes" are used to send and/or receive data from another process on the same
machine.
SHLEX TO CREATE COMPLEX COMMAND
173
Multiprocessing
174
What is Multiprocessing?
The multiprocessing module provides access to task-based parallel
computing using separate Python processes.
Multiprocessing has several features:
part of Python’s standard library,
cross-platform,
low-level parallel constructs: locks, queues, pipes, semaphores, shared memory
access
higher-level constructs: task pools with automatic scheduling, shared memory
object wrappers, functional and iterator-like design support, etc.
The multiprocessing module is useful for:
batch computing, embarrassingly parallel computations
parallel computing with minimal shared state
GUI programming: do calculations in a background process
many, many others.
http://docs.python.org/library/multiprocessing.html
175
Multiprocessing vs. Threading
The threading module provides thread-based parallelism to Python; in CPython,
threads are limited by the Global Interpreter Lock.
Only one thread may hold the lock at any point in time; holding the lock allows
that thread to execute Python instructions.
Multiprocessing provides a way around this performance limitation.
The multiprocessing module has a nearly identical API to the threading module;
this is intentional.
Learn multiprocessing and you will know the majority of threading.
Primary differences:
Threads share the same address space as the spawning thread, whereas
processes have separate address spaces.
often more communication is required with processes; shared memory
can mitigate.
Spawning new processes is generally more expensive (memory, time) than
spawning new threads.
Trivial API differences.
176
The Process Class
Process(target, name=None, args=(), kwargs={})
target: a callable object, called when the start() method is called
name: string, name to give to the process; default given if None
args: tuple, arguments to pass to the target when called
kwargs: dict, keyword args to pass to the target
The basic methods for Process objects are:
start(): start the process’ activity
join([timeout]): block the calling process until the process finishes, or until
timeout
is_alive(): True if process is still running, false otherwise
terminate(): Terminate the process; may cause deadlocks, synchronization
errors, orphaned processes; use only when necessary.
The Process Class
import multiprocessing as mp
from time import sleep
from random import random
def worker(num):
sleep(2.0 * random())
name = mp.current_process().name
print "worker {},name:{}".format(num, name)
if __name__ == '__main__’:
master = mp.current_process().name
print "Master name: {}".format(master)
for i in range(2):
p = mp.Process(target=worker, args=(i,))
p.start()
# Wait for all spawned child processes to finish
[p.join() for p in mp.active_children()]
CREATING A PROCESS OUTPUT
$ python basics.py
Master name: MainProcess
worker 1, name: Process-2
worker 0, name: Process-1
177
178
def worker(q):
v = q.get() # blocking!
print "got '{}' from parent".format(v)
if __name__ == '__main__':
q = mp.Queue()
p = mp.Process(target=worker, args=(q,))
p.start() # the child blocks at q.get()
v = 'parent'
print "putting '{}' on queue".format(v)
q.put(v)
p.join()
QUEUE BASICS
Multiprocessing provides a Queue class and a Pipe class for communicating
between processes. Queue and Pipe objects are thread and process safe.
Queue Communicate in one direction; put() and get() methods.
Pipe two-ended Queue, each end has a send() and recv() method.
OUTPUT
$ python queue_basic.py
putting 'parent' on queue
got 'parent' from parent
Communication Between Processes
179
Communication Between Processes
import multiprocessing as mp
def worker(p):
msg = 'whiskey tango fox trot'
print "sending {!r} to parent".format(msg)
p.send(msg)
v = p.recv()
print "got {!r} from parent".format(v)
if __name__ == '__main__':
# The two ends of the pipe: the parent and the child connections
p_conn, c_conn = mp.Pipe()
p = mp.Process(target=worker, args=(c_conn,))
p.start()
msg = 'foobar'
print "got {!r} from child".format(p_conn.recv())
print "sending {!r} to child".format(msg)
p_conn.send(msg)
p.join()
PIPE BASICS
Synchronization
180
import multiprocessing as mp
def print_lock(lk, i):
name = mp.current_process().name
lk.acquire()
for j in range(100):
print i, "from process", name
lk.release()
if __name__ == '__main__':
lk = mp.Lock()
ps = [mp.Process(target=print_lock,
args=(lk,i))
for i in range(100)]
[p.start() for p in ps]
[p.join() for p in ps]
Lock() objects provide a low-level way to guarantee that code is executed by only one
process at a time.
LOCK BASICS
import multiprocessing as mp
def print_lock(lk, i):
name = mp.current_process().name
with lk:
for j in range(100):
print i, "from process", name
if __name__ == '__main__':
lk = mp.Lock()
ps = [mp.Process(target=print_lock,
args=(lk,i))
for i in range(100)]
[p.start() for p in ps]
[p.join() for p in ps]
USING THE WITH STATEMENT
181
Synchronization
import multiprocessing as mp
def wait_on_event(e):
name = mp.current_process().name
e.wait()
print name, "finished waiting"
if __name__ == '__main__':
e = mp.Event()
ps = [mp.Process(target=wait_on_event, args=(e,))
for i in range(10)]
[p.start() for p in ps]
print "e.is_set()", e.is_set()
raw_input("press any key to set event")
e.set()
Event() objects provide a way to notify other processes that something
happened.
EVENT USAGE
Task Pools
182
Pool() objects provide a convenient way to perform embarrassingly parallel
computations with automatic load balancing.
Pool(processes=None, initializer=None, initargs=())
processes int, number of processes in pool, cpu_count() by default
initializer callable, if not None, initializer(*initargs) is called by each
process when starting.
Methods:
map(func, iterable, chunksize=None) parallel equivalent of the map()
builtin function. Blocks until result is ready and returns a list.
map_async(func, iterable, chunksize=None, callback=None)
Same as map(), nonblocking. Returns a result object.
imap(func, iterable, chunksize=None)
A version of map that returns an iterator instead of a list.
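A minimal sketch of the non-blocking variant (the square function is only for
illustration):
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == '__main__':
    pl = mp.Pool()
    # map_async returns immediately with an AsyncResult object.
    result = pl.map_async(square, range(10))
    # ... other work can happen here while the pool computes ...
    print result.get()        # blocks until all results are ready
    pl.close()
    pl.join()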
183
Task Pools
import multiprocessing as mp
def is_prime(num):
# determine whether num is prime
isp = num > 1 and all(num % d for d in range(2, int(num ** 0.5) + 1))
return (num, isp)
if __name__ == '__main__':
# create a pool with cpu_count() procs.
pl = mp.Pool()
results = pl.map(is_prime, range(50, 100))
for num, isp in results:
if isp: print num, "is prime"
BASIC POOL USAGE
$ python fermat_prime.py
53 is prime
59 is prime
61 is prime
67 is prime
...
OUTPUT Scaling computed for 10**6 integers while
varying pool size. Max speedup is ~4.5X
on 8 cores. This problem has high
communication overhead.
See demo_pool.py for the code.
Shared Memory
184
Multiprocessing provides the Value and Array classes for shared memory
programming; there is also the multiprocessing.sharedctypes module to create
ctypes objects and arrays in shared memory.
Passing a shared memory object to the args of a Process() object will pass a proxy
to the target; the actual memory location is in shared memory (via mmap). Any
modifications to the proxy object will be seen by all processes.
This allows parallel array computations without much communication
overhead.
Shared Memory
a=Array() b=Value()
def worker1(a):
# modify a
def worker2(a):
# can see worker1’s changes
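A concrete minimal sketch with Value and Array (the type codes 'd' and 'i' are
the usual ctypes codes for double and int; the worker just negates the array
to show that changes propagate):
import multiprocessing as mp

def worker(val, arr):
    val.value = 3.14                    # visible to the parent process
    for i in range(len(arr)):
        arr[i] = -arr[i]

if __name__ == '__main__':
    val = mp.Value('d', 0.0)            # a shared double
    arr = mp.Array('i', range(5))       # a shared int array
    p = mp.Process(target=worker, args=(val, arr))
    p.start()
    p.join()
    print val.value, arr[:]             # 3.14 [0, -1, -2, -3, -4]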
Multiprocessing + NumPy
185
NumPy arrays can be made to view multiprocessing shared memory objects.
import multiprocessing as mp
from multiprocessing import sharedctypes
from numpy import ctypeslib
def fill_arr(arr_view, i):
arr_view.fill(i)
if __name__ == '__main__':
ra = sharedctypes.RawArray('i', 4)
arr = ctypeslib.as_array(ra)
arr.shape = (2, 2)
p1 = mp.Process(target=fill_arr,
args=(arr[:1, :], 1))
p2 = mp.Process(target=fill_arr,
args=(arr[1:, :], 2))
p1.start(); p2.start()
p1.join(); p2.join()
print arr
SHARED MEMORY VIEW OUTPUT
$ python shared_numpy.py
[[1 1]
[2 2]]
186
Multiprocessing and others
The multiprocessing module is typically used for non-numerical calculations, where
Python's role as glue between several concurrent processes can be exploited.
For numerical calculations, numerically-optimized third-party libraries such as PyMPI,
MPI4Py, PyOpenCL, CLyther, PyCuda, etc. are typically used.
Multiprocessing is useful when one needs to add concurrency to an application
without requiring external dependencies.
187
NumPy
188
NumPy and SciPy
[Diagram of the NumPy/SciPy stack]
NumPy [ Data Structure Core ]: NDArray (multi-dimensional array object),
UFunc (fast array math operations), plus the fft, random, and linalg modules.
SciPy [ Scientific Algorithms ]: interpolate, integrate, spatial, weave, odr,
stats, optimize, fftpack, sparse, special, linalg, signal, cluster, io, ndimage.
189
NumPy
Website: http://numpy.scipy.org/
Offers Matlab-ish capabilities within Python
NumPy replaces Numeric and Numarray
Developed by Travis Oliphant
>100 “committers” to the project (github.com)
NumPy 1.0 released October, 2006
NumPy 1.7.1 released April, 2013
~45K downloads/month from SourceForge
This does not count:
Linux distributions that include NumPy
Enthought distributions that include NumPy
190
Helpful Sites
SCIPY DOCUMENTATION PAGE: http://docs.scipy.org/doc
NUMPY EXAMPLES: http://www.scipy.org/Numpy_Example_List_With_Doc
191
Getting Started
In [1]: from numpy import *
In [2]: __version__
Out[2]: '1.6.0'
or
In [1]: from numpy import \
array, ...
C:\> ipython -pylab
In [1]: array([1,2,3])
Out[1]: array([1, 2, 3])
IMPORT NUMPY
USING IPYTHON -PYLAB
Often at the command line, it is
handy to import everything from
NumPy into the command shell.
However, if you are writing scripts,
it is easier for others to read and
debug in the future if you use
explicit imports.
IPython has a ‘pylab’ mode where
it imports all of NumPy, Matplotlib,
and SciPy into the namespace for
you as a convenience. It also enables
threading for showing plots.
While IPython is used for all the
demos, '>>>' is used on future
slides instead of 'In [1]:' to
save space.
192
Array Operations
>>> a = array([1,2,3,4])
>>> b = array([2,3,4,5])
>>> a + b
array([3, 5, 7, 9])
>>> a * b
array([ 2, 6, 12, 20])
>>> a ** b
array([ 1, 8, 81, 1024])
# create array from 0 to 10
>>> x = arange(11.)
# multiply entire array by
# scalar value
>>> c = (2*pi)/10.
>>> c
0.62831853071795862
>>> c*x
array([ 0.,0.628,…,6.283])
# in-place operations
>>> x *= c
>>> x
array([ 0.,0.628,…,6.283])
# apply functions to array
>>> y = sin(x)
SIMPLE ARRAY MATH MATH FUNCTIONS
NumPy defines these constants:
pi = 3.14159265359
e = 2.71828182846
193
Plotting Arrays
>>> plot(x,y)
>>> from chaco import shell
>>> shell.plot(x,y)
MATPLOTLIB CHACO SHELL
194
Matplotlib Basics
(an interlude)
195
http://matplotlib.org/
196
Line Plots
PLOT AGAINST INDICES MULTIPLE DATA SETS
>>> x = linspace(0,2*pi,50)
>>> plot(sin(x))
>>> plot(x, sin(x),
... x, sin(2*x))
Matplotlib Menu Bar
197
[Screenshot of the Matplotlib toolbar. Buttons include: zoom tool; save to file
(png, svg, eps, pdf, …); back to original view; previous/next views; pan tool;
plot size parameters within the window.]
198
Line Plots
LINE FORMATTING MULTIPLE PLOT GROUPS
# red, dot-dash, triangles
>>> plot(x, sin(x), 'r-^')
>>> plot(x, sin(x), 'b-o',
... x, sin(2*x), 'r-^')
199
Scatter Plots
SIMPLE SCATTER PLOT COLORMAPPED SCATTER
>>> x = linspace(0,2*pi,50)
>>> y = sin(x)
>>> scatter(x, y)
# marker size/color set with data
>>> x = rand(200)
>>> y = rand(200)
>>> size = rand(200)*30
>>> color = rand(200)
>>> scatter(x, y, size, color)
>>> colorbar()
200
Multiple Figures
>>> t = linspace(0,2*pi,50)
>>> x = sin(t)
>>> y = cos(t)
# Now create a figure
>>> figure()
>>> plot(x)
# Now create a new figure.
>>> figure()
>>> plot(y)
201
Multiple Plots Using subplot
>>> x = array([1,2,3,2,1])
>>> y = array([1,3,2,3,1])
# To divide the plotting area
>>> subplot(2, 1, 1)
>>> plot(x)
# Now activate a new plot
# area.
>>> subplot(2, 1, 2)
>>> plot(y)
subplot(rows, columns, active plot)
[Figure: the resulting 2x1 grid with Plot 1 above Plot 2.]
If this is used in a python script, a call
to the function show() is required.
202
Adding Lines to a Plot
MULTIPLE PLOTS ERASING OLD PLOTS
# By default, previous lines
# are "held" on a plot.
>>> plot(sin(x))
>>> plot(cos(x))
# Set hold(False) to erase
# old lines
>>> plot(sin(x))
>>> hold(False)
>>> plot(cos(x))
203
Legend
LEGEND LABELS WITH PLOT LABELING WITH LEGEND
# Add labels in plot command.
>>> plot(sin(x), label='sin')
>>> plot(cos(x), label='cos')
>>> legend()
# Or as a list in legend().
>>> plot(sin(x))
>>> plot(cos(x))
>>> legend(['sin', 'cos'])
204
Titles and Grid
TITLES AND AXIS LABELS PLOT GRID
>>> plot(x, sin(x))
>>> xlabel('radians')
# Keywords set text properties.
>>> ylabel('amplitude',
... fontsize='large')
>>> title('Sin(x)')
# Display gridlines in plot
>>> grid()
205
Clearing and Closing Plots
CLEARING A FIGURE CLOSING PLOT WINDOWS
>>> plot(x, sin(x))
# clf will clear the current
# plot (figure).
>>> clf()
# close() will close the
# currently active plot window.
>>> close()
# close('all') closes all the
# plot windows.
>>> close('all')
206
Image Display
# Get the Lena image from scipy.
>>> from scipy.misc import lena
>>> img = lena()
# Display image with the jet
# colormap, and setting
# x and y extents of the plot.
>>> imshow(img,
... extent=[-25,25,-25,25],
... cmap = cm.bone)
# Add a colorbar to the display.
>>> colorbar()
207
Plotting from Scripts
INTERACTIVE MODE NON-INTERACTIVE MODE
# In IPython, plots show up
# as soon as a plot command
# is called.
>>> figure()
>>> plot(sin(x))
>>> figure()
>>> plot(cos(x))
# script.py
# In a script, you must call
# the show() command to display
# plots. Call it at the end of
# all your plot commands for
# best performance.
figure()
plot(sin(x))
figure()
plot(cos(x))
# Plots will not appear until
# this command is issued.
show()
208
Histograms
HISTOGRAM HISTOGRAM 2
# plot histogram
# defaults to 10 bins
>>> hist(randn(1000))
# change the number of bins
>>> hist(randn(1000), 30)
209
Plots with error bars
ERRORBAR
# Assume points are known
# with errors in both axis
>>> x = linspace(0,2*pi,10)
>>> y = sin(x)
>>> yerr = rand(10)
>>> xerr = rand(10)/3
>>> errorbar(x, y, yerr, xerr)
210
Plots with error bars II
BOXPLOT
# Assume 4 experiments have measured a
# quantity with certain medians and
# std deviations.
# Data
>>> from numpy.random import normal
>>> exp = [normal(0, 1+i/5, size=(500,))
for i in np.arange(4.)]
>>> medians=[np.median(e) for e in exp]
>>> stds = [np.std(e) for e in exp]
# Plotting
>>> conf_intervals = [
(med - std, med + std) for
med, std in zip(medians, stds)]
>>> positions = np.arange(len(exp)) + 1
>>> boxplot(exp, sym='k+',
positions=positions,
usermedians=medians,
conf_intervals=conf_intervals)
211
3D Plots with Matplotlib
SURFACE PLOT PARAMETRIC CURVE
>>> from mpl_toolkits.mplot3d import Axes3D
>>> x, y = mgrid[-5:5:35j, 0:10:35j]
>>> z = x*sin(x)*cos(0.25*y)
>>> fig = figure()
>>> ax = fig.gca(projection='3d')
>>> ax.plot_surface(x, y, z,
... rstride=1, cstride=1,
... cmap=cm.jet)
>>> xlabel('x'); ylabel('y')
>>> from mpl_toolkits.mplot3d import Axes3D
>>> t = linspace(0, 30, 1000)
>>> x, y, z = [t*cos(t), t*sin(t), t]
>>> fig = figure()
>>> ax = fig.gca(projection='3d')
>>> ax.plot(x, y, z)
>>> xlabel('x')
>>> ylabel('y')
212
Surface Plots with mlab
# Create 2D array where values
# are radial distance from
# the center of array.
>>> from numpy import mgrid
>>> from scipy import special
>>> x,y = mgrid[-25:25:100j,
... -25:25:100j]
>>> r = sqrt(x**2+y**2)
# Calculate Bessel function of
# each point in array and scale.
>>> s = special.j0(r)*25
# Display surface plot.
>>> from mayavi import mlab
>>> mlab.surf(x,y,s)
>>> mlab.scalarbar()
>>> mlab.axes()
More Details
213
Simple examples with increasing difficulty:
http://matplotlib.org/examples/index.html
Gallery (huge): http://matplotlib.org/gallery.html
214
Continuing NumPy…
215
Introducing NumPy Arrays
>>> a = array([0,1,2,3])
>>> a
array([0, 1, 2, 3])
SIMPLE ARRAY CREATION
>>> type(a)
numpy.ndarray
CHECKING THE TYPE
>>> a.dtype
dtype('int32')
NUMERIC ‘TYPE’ OF ELEMENTS
>>> a.itemsize
4
BYTES PER ELEMENT
# Shape returns a tuple
# listing the length of the
# array along each dimension.
>>> a.shape
(4,)
>>> shape(a)
(4,)
# Size reports the entire
# number of elements in an
# array.
>>> a.size
4
>>> size(a)
4
ARRAY SHAPE
ARRAY SIZE
216
Introducing NumPy Arrays
# Return the number of bytes
# used by the data portion of
# the array.
>>> a.nbytes
16
BYTES OF MEMORY USED
>>> a.ndim
1
NUMBER OF DIMENSIONS
217
Setting Array Elements
>>> a.dtype
dtype('int32')
# assigning a float into
# an int32 array truncates
# the decimal part
>>> a[0] = 10.6
>>> a
array([10, 1, 2, 3])
# fill has the same behavior
>>> a.fill(-4.8)
>>> a
array([-4, -4, -4, -4])
BEWARE OF TYPE
COERCION
# set all values in an array
>>> a.fill(0)
>>> a
array([0, 0, 0, 0])
# this also works, but may
# be slower
>>> a[:] = 1
>>> a
array([1, 1, 1, 1])
FILL
>>> a[0]
0
>>> a[0] = 10
>>> a
array([10, 1, 2, 3])
ARRAY INDEXING
218
Slicing
# indices: 0 1 2 3 4
>>> a = array([10,11,12,13,14])
# [10,11,12,13,14]
>>> a[1:3]
array([11, 12])
# negative indices work also
>>> a[1:-2]
array([11, 12])
>>> a[-4:3]
array([11, 12])
SLICING ARRAYS
# omitted boundaries are
# assumed to be the beginning
# (or end) of the list
# grab first three elements
>>> a[:3]
array([10, 11, 12])
# grab last two elements
>>> a[-2:]
array([13, 14])
# every other element
>>> a[::2]
array([10, 12, 14])
var[lower:upper:step]
Extracts a portion of a sequence by specifying a lower and upper bound.
The lower-bound element is included, but the upper-bound element is not included.
Mathematically: [lower, upper). The step value specifies the stride between elements.
OMITTING INDICES
219
Multi-Dimensional Arrays
MULTI-DIMENSIONAL ARRAYS
>>> a = array([[ 0, 1, 2, 3],
               [10,11,12,13]])
>>> a
array([[ 0,  1,  2,  3],
       [10, 11, 12, 13]])
SHAPE = (ROWS, COLUMNS)
>>> a.shape
(2, 4)
GET/SET ELEMENTS (indexed by [row, column])
>>> a[1,3]
13
>>> a[1,3] = -1
>>> a
array([[ 0,  1,  2,  3],
       [10, 11, 12, -1]])
ADDRESS SECOND (ONETH) ROW USING A SINGLE INDEX
>>> a[1]
array([10, 11, 12, -1])
ELEMENT COUNT
>>> a.size
8
NUMBER OF DIMENSIONS
>>> a.ndim
2
220
Arrays from/to ASCII files
# Read data into a list of lists,
# and THEN convert to an array.
file = open('myfile.txt')
# Create a list for all the data.
data = []
for line in file:
    # Read each row of data into a
    # list of floats.
    fields = line.split()
    row_data = [float(x) for x in fields]
    # And add this row to the
    # entire data set.
    data.append(row_data)
# Finally, convert the "list of
# lists" into a 2D array.
data = array(data)
file.close()
BASIC PATTERN ARRAYS FROM/TO TXT FILES
# loadtxt() automatically generates
# an array from the txt file
arr = loadtxt('Data.txt', skiprows=1,
dtype=int, delimiter=",",
usecols = (0,1,2,4),
comments = "%")
# Save an array into a txt file
savetxt('filename', arr)
-- BEGINNING OF THE FILE
% Day, Month, Year, Skip, Avg Power
01, 01, 2000, x876, 13 % crazy day!
% we don’t have Jan 03rd
04, 01, 2000, xfed, 55
Data.txt
Arrays to/from Files
221
OTHER FILE FORMATS
Many file formats are supported in various packages:
File format    Package name(s)            Functions
txt            numpy                      loadtxt, savetxt, genfromtxt, fromfile, tofile
csv            csv                        reader, writer
Matlab         scipy.io                   loadmat, savemat
hdf            pytables, h5py
NetCDF         netCDF4, scipy.io.netcdf   netCDF4.Dataset, scipy.io.netcdf.netcdf_file
This includes many industry specific formats:
File format    Package name(s)            Comments
wav            scipy.io.wavfile           Audio files
LAS/SEG-Y      Scipy cookbook, EPD        Data files in Geophysics
jpeg, png, …   PIL, scipy.misc.pilutil    Common image formats
FITS           pyfits, astropy.io.fits    Image files in Astronomy
222
Array Slicing
# The examples use a 6x6 array where a[i,j] == 10*i + j:
>>> a = array([[ 0,  1,  2,  3,  4,  5],
...            [10, 11, 12, 13, 14, 15],
...            [20, 21, 22, 23, 24, 25],
...            [30, 31, 32, 33, 34, 35],
...            [40, 41, 42, 43, 44, 45],
...            [50, 51, 52, 53, 54, 55]])
>>> a[0,3:5]
array([3, 4])
>>> a[4:,4:]
array([[44, 45],
       [54, 55]])
>>> a[:,2]
array([ 2, 12, 22, 32, 42, 52])
SLICING WORKS MUCH LIKE
STANDARD PYTHON SLICING
>>> a[2::2,::2]
array([[20, 22, 24],
[40, 42, 44]])
STRIDES ARE ALSO POSSIBLE
223
Slices Are References
>>> a = array((0,1,2,3,4))
# create a slice containing the
# middle two elements of a
>>> b = a[2:4]
>>> b
array([2, 3])
>>> b[0] = 10
# changing b changed a!
>>> a
array([ 0, 1, 10, 3, 4])
Slices are references to memory in the original array.
Changing values in a slice also changes the original array.
224
Fancy Indexing
>>> a = arange(0, 80, 10)
>>> a
array([ 0, 10, 20, 30, 40, 50, 60, 70])
INDEXING BY POSITION
# fancy indexing
>>> indices = [1, 2, -3]
>>> y = a[indices]
>>> print y
[10 20 50]
INDEXING WITH BOOLEANS
# manual creation of masks
>>> mask = array([0,1,1,0,0,1,0,0],
...              dtype=bool)
# conditional creation of masks
>>> mask2 = a < 30
# fancy indexing
>>> y = a[mask]
>>> print y
[10 20 50]
225
Fancy Indexing in 2-D
>>> a[(0,1,2,3,4),(1,2,3,4,5)]
array([ 1, 12, 23, 34, 45])
>>> a[3:,[0, 2, 5]]
array([[30, 32, 35],
[40, 42, 45],
[50, 52, 55]])
>>> mask = array([1,0,1,0,0,1],
dtype=bool)
>>> a[mask,2]
array([2,22,52])
These examples use the same 6x6 array a (a[i,j] == 10*i + j) as on the previous slides.
Unlike slicing, fancy indexing creates copies instead of a view into original array.
227
“Incomplete” Indexing
# With a 5x6 array a (a[i,j] == 10*i + j), an "incomplete" index
# selects along the first dimension; unspecified trailing
# dimensions are taken in full.
# grab the first three rows
>>> y = a[:3]
# grab the rows where a boolean condition array is True
>>> y = a[condition]
228
3D Example
# retrieve two slices from a
# 3D cube via indexing
>>> y = a[:,:,[2,-2]]
[figure: the slices at index 2 and -2 along the last axis of the cube a are gathered into y]
MULTIDIMENSIONAL
229
Where
# find the indices in array
# where expression is True
>>> a = array([0, 12, 5, 20])
>>> a > 10
array([False, True, False,
True], dtype=bool)
# Note: it returns a tuple!
>>> where(a > 10)
(array([1, 3]),)
1 DIMENSION
# In general, the tuple
# returned is the index of the
# element satisfying the
# condition in each dimension.
>>> a = array([[0, 12, 5, 20],
[1, 2, 11, 15]])
>>> loc = where(a > 10)
>>> loc
(array([0, 0, 1, 1]),
array([1, 3, 2, 3]))
# Result can be used in
# various ways:
>>> a[loc]
array([12, 20, 11, 15])
n DIMENSIONS
230
Array Data Structure
231
Array Data Structure
232
Indexing with newaxis
newaxis is a special index that inserts a new axis in the array at
the specified location.
Each newaxis increases the array’s dimensionality by 1.
>>> a = array([0, 1, 2])
>>> shape(a)
(3,)
>>> y = a[newaxis,:]
>>> shape(y)
(1, 3)
>>> y = a[:,newaxis]
>>> shape(y)
(3, 1)
>>> y = a[newaxis, newaxis, :]
>>> shape(y)
(1, 1, 3)
[figure: a is the 1-D array [0, 1, 2]; the reshaped views have shapes 1x3, 3x1, and 1x1x3]
233
“Flattening” Arrays
a.flatten()
# Create a 2D array
>>> a = array([[0,1],
[2,3]])
# Flatten out elements to 1D
>>> b = a.flatten()
>>> b
array([0,1,2,3])
# Changing b does not change a
>>> b[0] = 10
>>> b
array([10,1,2,3])
>>> a                 # no change
array([[0, 1],
       [2, 3]])
a.flatten() converts a multi-
dimensional array into a 1-D array. The new
array is a copy of the original data.
a.flat
>>> a.flat
<numpy.flatiter obj...>
>>> a.flat[:]
array([0, 1, 2, 3])
>>> b = a.flat
>>> b[0] = 10
>>> a                 # changed!
array([[10, 1],
       [ 2, 3]])
a.flat is an attribute that returns an
iterator object that accesses the data in the
multi-dimensional array as if it were a 1-D array.
It references the original memory.
234
“(Un)raveling” Arrays
a.ravel()
# create a 2-D array
>>> a = array([[0,1],
[2,3]])
# flatten out elements to 1-D
>>> b = a.ravel()
>>> b
array([0,1,2,3])
# changing b does change a
>>> b[0] = 10
>>> b
array([10,1,2,3])
>>> a
array([[10, 1],
[ 2, 3]])
changed!
a.ravel() is the same as a.flatten(),
but returns a reference (or view) of the array
if possible (i.e., the memory is contiguous).
Otherwise the new array copies the data.
a.ravel() MAKES A COPY
# create a 2-D array
>>> a = array([[0,1],
[2,3]])
# transpose array so memory
# layout is no longer contiguous
>>> aa = a.transpose()
>>> aa
array([[0, 2],
[1, 3]])
# ravel creates a copy of data
>>> b = aa.ravel()
>>> b
array([0, 2, 1, 3])
# changing b doesn't change a
>>> b[0] = 10
>>> b
array([10, 2, 1, 3])
>>> a
array([[0, 1],
[2, 3]])
235
Reshaping Arrays
>>> a = arange(6)
>>> a
array([0, 1, 2, 3, 4, 5])
>>> a.shape
(6,)
# reshape array in-place to
# 2x3
>>> a.shape = (2,3)
>>> a
array([[0, 1, 2],
[3, 4, 5]])
SHAPE
# return a new array with a
# different shape
>>> a.reshape(3,2)
array([[0, 1],
[2, 3],
[4, 5]])
# reshape cannot change the
# number of elements in an
# array
>>> a.reshape(4,2)
ValueError: total size of new
array must be unchanged
RESHAPE
236
Transpose
>>> a = array([[0,1,2],
... [3,4,5]])
>>> a.shape
(2,3)
# Transpose swaps the order
# of axes. For 2-D this
# swaps rows and columns.
>>> a.transpose()
array([[0, 3],
[1, 4],
[2, 5]])
# The .T attribute is
# equivalent to transpose().
>>> a.T
array([[0, 3],
[1, 4],
[2, 5]])
TRANSPOSE
>>> b = a.T
# Changes to b alter a.
>>> b[0,1] = 30
>>> a
array([[ 0, 1, 2],
[30, 4, 5]])
# Transpose does not move
# values around in memory. It
# only changes the order of
# "strides" in the array
>>> a.strides
(12, 4)
>>> a.T.strides
(4, 12)
TRANSPOSE RETURNS VIEWS
TRANSPOSE AND STRIDES
237
Squeeze
>>> a = array([[1,2,3],
...            [4,5,6]])
>>> a.shape
(2, 3)
# insert an "extra" dimension
>>> a.shape = (2,1,3)
>>> a
array([[[1, 2, 3]],
       [[4, 5, 6]]])
# squeeze removes any
# dimension with length==1
>>> a = a.squeeze()
>>> a.shape
(2,3)
SQUEEZE
238
Diagonals
>>> a = array([[11,21,31],
... [12,22,32],
... [13,23,33]])
# Extract the diagonal from
# an array.
>>> a.diagonal()
array([11, 22, 33])
# Use offset to move off the
# main diagonal (offset can
# be negative).
>>> a.diagonal(offset=1)
array([21, 32])
DIAGONAL DIAGONALS WITH INDEXING
# "Fancy" indexing also works.
>>> i = [0,1,2]
>>> a[i, i]
array([11, 22, 33])
# Indexing can also be used
# to set diagonal values…
>>> a[i, i] = 2
>>> i2 = array([0,1])
# upper diagonal
>>> a[i2, i2+1] = 1
# lower diagonal
>>> a[i2+1, i2]= -1
>>> a
array([[ 2, 1, 31],
[-1, 2, 1],
[13, -1, 2]])
239
Complex Numbers
>>> a = array([1+1j, 2, 3, 4])
>>> a
array([1.+1.j, 2.+0.j, 3.+0.j,
       4.+0.j])
>>> a.dtype
dtype('complex128')
# real and imaginary parts
>>> a.real
array([ 1., 2., 3., 4.])
>>> a.imag
array([ 1., 0., 0., 0.])
# set imaginary part to a
# different set of values
>>> a.imag = (1,2,3,4)
>>> a
array([1.+1.j, 2.+2.j, 3.+3.j,
4.+4.j])
COMPLEX ARRAY ATTRIBUTES
>>> a = array([0., 1, 2, 3])
# .real and .imag attributes
# are available
>>> a.real
array([ 0., 1., 2., 3.])
>>> a.imag
array([ 0., 0., 0., 0.])
# but .imag is read-only
>>> a.imag = (1,2,3,4)
TypeError: array does not
have imaginary part to set
FLOAT (AND OTHER) ARRAYS
>>> a.conj()
array([1.-1.j, 2.-2.j, 3.-3.j,
4.-4.j])
CONJUGATION
240
Array Constructor Examples
# Default to double precision
>>> a = array([0,1.0,2,3])
>>> a.dtype
dtype('float64')
>>> a.nbytes
32
FLOATING POINT ARRAYS
>>> a = array([0,1.,2,3],
... dtype=float32)
>>> a.dtype
dtype('float32')
>>> a.nbytes
16
REDUCING PRECISION
>>> a = array([0,1,2,3],
... dtype=uint8)
>>> a.dtype
dtype('uint8')
>>> a.nbytes
4
UNSIGNED INTEGER BYTE
ARRAY FROM BINARY DATA
# frombuffer or fromfile
# to create an array from
# binary data.
>>> a = frombuffer('foo',
... dtype=uint8)
>>> a
array([102, 111, 111])
# Reverse operation
>>> a.tofile('foo.dat')
241
NumPy dtypes
Basic Type        Available NumPy types            Comments
Boolean           bool                             Elements are 1 byte in size.
Integer           int8, int16, int32, int64,       int defaults to the size of long in C
                  int128, int                      for the platform.
Unsigned Integer  uint8, uint16, uint32, uint64,   uint defaults to the size of unsigned
                  uint128, uint                    long in C for the platform.
Float             float16, float32, float64,       float is always a double precision
                  float, longfloat                 floating point value (64 bits).
                                                   longfloat represents large precision
                                                   floats. Its size is platform dependent.
Complex           complex64, complex128,           The real and imaginary elements of a
                  complex, longcomplex             complex64 are each represented by a
                                                   single precision (32 bit) value for a
                                                   total size of 64 bits.
Strings           str, unicode                     For example, dtype='S4' would be used
                                                   for an array of 4-character strings.
Object            object                           Represent items in array as Python
                                                   objects.
Records           void                             Used for arbitrary data structures.
242
Type Casting
>>> a = array([1.5, -3],
... dtype=float32)
>>> a
array([ 1.5, -3.], dtype=float32)
# upcast
>>> asarray(a, dtype=float64)
array([ 1.5, -3. ])
# downcast
>>> asarray(a, dtype=uint8)
array([ 1, 253], dtype=uint8)
# asarray is efficient.
# It does not make a copy if the
# type is the same.
>>> b = asarray(a, dtype=float32)
>>> b[0] = 2.0
>>> a
array([ 2., -3.],dtype=float32)
>>> a = array([1.5, -3],
... dtype=float64)
>>> a.astype(float32)
array([ 1.5, -3.], dtype=float32)
>>> a.astype(uint8)
array([ 1, 253],dtype=uint8)
# astype is safe.
# It always returns a copy of
# the array.
>>> b = a.astype(float64)
>>> b[0] = 2.0
>>> a
array([1.5, -3.])
ASARRAY ASTYPE
243
Array Calculation Methods
>>> a = array([[1,2,3],
[4,5,6]])
# sum() defaults to adding up
# all the values in an array.
>>> sum(a)
21
# supply the keyword axis to
# sum along the 0th axis
>>> sum(a, axis=0)
array([5, 7, 9])
# supply the keyword axis to
# sum along the last axis
>>> sum(a, axis=-1)
array([ 6, 15])
SUM FUNCTION
# a.sum() defaults to adding
# up all values in an array.
>>> a.sum()
21
# supply an axis argument to
# sum along a specific axis
>>> a.sum(axis=0)
array([5, 7, 9])
SUM ARRAY METHOD
# product along columns
>>> a.prod(axis=0)
array([ 4, 10, 18])
# functional form
>>> prod(a, axis=0)
array([ 4, 10, 18])
PRODUCT
244
Min/Max
MIN
>>> a = array([2.,3.,0.,1.])
>>> a.min(axis=0)
0.0
# Use NumPy's amin() instead
# of Python's built-in min()
# for speedy operations on
# multi-dimensional arrays.
>>> amin(a, axis=0)
0.0
ARGMIN
# Find index of minimum value.
>>> a.argmin(axis=0)
2
# functional form
>>> argmin(a, axis=0)
2
MAX
>>> a = array([2.,3.,0.,1.])
>>> a.max(axis=0)
3.0
# functional form
>>> amax(a, axis=0)
3.0
ARGMAX
# Find index of maximum value.
>>> a.argmax(axis=0)
1
# functional form
>>> argmax(a, axis=0)
1
245
Statistics Array Methods
>>> a = array([[1,2,3],
[4,5,6]])
# mean value of each column
>>> a.mean(axis=0)
array([ 2.5, 3.5, 4.5])
>>> mean(a, axis=0)
array([ 2.5, 3.5, 4.5])
>>> average(a, axis=0)
array([ 2.5, 3.5, 4.5])
# average can also calculate
# a weighted average
>>> average(a, weights=[1,2],
... axis=0)
array([ 3., 4., 5.])
MEAN
# Standard Deviation
>>> a.std(axis=0)
array([ 1.5, 1.5, 1.5])
# variance
>>> a.var(axis=0)
array([2.25, 2.25, 2.25])
>>> var(a, axis=0)
array([2.25, 2.25, 2.25])
STANDARD DEV./VARIANCE
246
Other Array Methods
# Limit values to a range.
>>> a = array([[1,2,3],
[4,5,6]])
# Set values < 3 equal to 3.
# Set values > 5 equal to 5.
>>> a.clip(3, 5)
array([[3, 3, 3],
[4, 5, 5]])
CLIP
# Round values in an array.
# NumPy rounds to even, so
# 1.5 and 2.5 both round to 2.
>>> a = array([1.35, 2.5, 1.5])
>>> a.round()
array([ 1., 2., 2.])
# Round to first decimal place.
>>> a.round(decimals=1)
array([ 1.4, 2.5, 1.5])
ROUND
# Calculate max - min for
# the array along columns.
>>> a.ptp(axis=0)
array([3, 3, 3])
# max - min for entire array.
>>> a.ptp(axis=None)
5
PEAK TO PEAK
247
Summary of (most) array
attributes/methods (1/4)
BASIC ATTRIBUTES
a.dtype Numerical type of array elements: float32, uint8, etc.
a.shape Shape of array (m, n, o, …)
a.size Number of elements in entire array
a.itemsize Number of bytes used by a single element in the array
a.nbytes Number of bytes used by entire array (data only)
a.ndim Number of dimensions in the array
SHAPE OPERATIONS
a.flat An iterator to step through array as if it were 1D
a.flatten() Returns a 1D copy of a multi-dimensional array
a.ravel() Same as flatten(), but returns a "view" if possible
a.resize(new_size) Changes the size/shape of an array in place
a.swapaxes(axis1, axis2) Swaps the order of two axes in an array
a.transpose(*axes) Swaps the order of any number of array axes
a.T Shorthand for a.transpose()
a.squeeze() Removes any length==1 dimensions from an array
248
Summary of (most) array
attributes/methods (2/4)
FILL AND COPY
a.copy() Returns a copy of the array
a.fill(value) Fills an array with a scalar value
CONVERSION/COERCION
a.tolist() Converts array into nested lists of values
a.tostring() Raw copy of array memory into a Python string
a.astype(dtype) Returns array coerced to the given type
a.byteswap(False) Converts byte order (big <->little endian)
a.view(type_or_dtype) Creates a new ndarray that sees the same memory but interprets it as a
new datatype (or subclass of ndarray)
COMPLEX NUMBERS
a.real Returns the real part of the array
a.imag Returns the imaginary part of the array
a.conjugate() Returns the complex conjugate of the array
a.conj() Returns the complex conjugate of the array (same as conjugate)
249
Summary of (most) array
attributes/methods (3/4)
SAVING
a.dump(file) Stores binary array data to file
a.dumps() Returns a binary pickle of the data as a string
a.tofile(fid, sep="",
format="%s") Formatted ASCII output to a file
SEARCH/SORT
a.nonzero() Returns indices for all non-zero elements in the array
a.sort(axis=-1) Sort the array elements in place, along axis
a.argsort(axis=-1) Finds indices for sorted elements, along axis
a.searchsorted(b) Finds indices where elements of b would be inserted in a to maintain order
ELEMENT MATH OPERATIONS
a.clip(low, high) Limits values in the array to the specified range
a.round(decimals=0) Rounds to the specified number of digits
a.cumsum(axis=None) Cumulative sum of elements along axis
a.cumprod(axis=None) Cumulative product of elements along axis
250
Summary of (most) array
attributes/methods (4/4)
All the following methods “reduce” the size of the array by 1 dimension by
carrying out an operation along the specified axis. If axis is None, the
operation is carried out across the entire array.
REDUCTION METHODS
a.sum(axis=None) Sums values along axis
a.prod(axis=None) Finds the product of all values along axis
a.min(axis=None) Finds the minimum value along axis
a.max(axis=None) Finds the maximum value along axis
a.argmin(axis=None) Finds the index of the minimum value along axis
a.argmax(axis=None) Finds the index of the maximum value along axis
a.ptp(axis=None) Calculates a.max(axis) - a.min(axis)
a.mean(axis=None) Finds the mean (average) value along axis
a.std(axis=None) Finds the standard deviation along axis
a.var(axis=None) Finds the variance along axis
a.any(axis=None) True if any value along axis is non-zero (logical OR)
a.all(axis=None) True if all values along axis are non-zero (logical AND)
251
Array Creation Functions
arange(start,stop=None,step=1,
dtype=None)
Nearly identical to Python’s range().
Creates an array of values in the range
[start,stop) with the specified step value.
Allows non-integer values for start, stop, and
step. Default dtype is derived from the start,
stop, and step values.
>>> arange(4)
array([0, 1, 2, 3])
>>> arange(0, 2*pi, pi/4)
array([ 0.000, 0.785, 1.571,
2.356, 3.142, 3.927, 4.712,
5.497])
# Be careful…
>>> arange(1.5, 2.1, 0.3)
array([ 1.5, 1.8, 2.1])
ARANGE ONES, ZEROS
ones(shape, dtype=float64)
zeros(shape, dtype=float64)
shape is a number or sequence specifying
the dimensions of the array. If dtype is not
specified, it defaults to float64.
>>> ones((2,3),dtype=float32)
array([[ 1., 1., 1.],
[ 1., 1., 1.]],
dtype=float32)
>>> zeros(3)
array([ 0., 0., 0.])
252
Array Creation Functions (cont.)
# Generate an n by n identity
# array. The default dtype is
# float64.
>>> a = identity(4)
>>> a
array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]])
>>> a.dtype
dtype('float64')
>>> identity(4, dtype=int)
array([[ 1, 0, 0, 0],
[ 0, 1, 0, 0],
[ 0, 0, 1, 0],
[ 0, 0, 0, 1]])
EMPTY AND FILL
IDENTITY
# empty(shape, dtype=float64,
# order='C')
>>> a = empty(2)
>>> a
array([1.78021120e-306,
6.95357225e-308])
# fill array with 5.0
>>> a.fill(5.0)
>>> a
array([ 5., 5.])
# alternative approach
# (slightly slower)
>>> a[:] = 4.0
>>> a
array([ 4., 4.])
253
Array Creation Functions (cont.)
# Generate N evenly spaced
# elements between (and
# including) start and
# stop values.
>>> linspace(0,1,5)
array([ 0., 0.25, 0.5, 0.75, 1.0])
ROW SHORTCUT
LINSPACE
# r_ and c_ are "handy" tools
# (cough hacks…) for creating
# row and column arrays.
# used like arange
# -- real stride value
>>> r_[0:1:.25]
array([ 0., 0.25, 0.5, 0.75])
# used like linspace
# -- complex stride value
>>> r_[0:1:5j]
array([ 0., 0.25, 0.5, 0.75, 1.0])
# concatenate elements
>>> r_[(1,2,3),0,0,(4,5)]
array([1, 2, 3, 0, 0, 4, 5])
LOGSPACE
# Generate N evenly spaced
# elements on a log scale
# between base**start and
# base**stop (default base=10).
>>> logspace(0,1,5)
array([ 1., 1.77, 3.16, 5.62,
10.])
254
Array Creation Functions (cont.)
OGRID
MGRID
# Construct an "open" grid
# of points (not filled in
# but correctly shaped for
# math operations to be
# broadcast correctly).
>>> x,y = ogrid[0:3,0:3]
>>> x
array([[0],
[1],
[2]])
>>> y
array([[0, 1, 2]])
>>> print x+y
[[0 1 2]
[1 2 3]
[2 3 4]]
# Get equally spaced points
# in N output arrays for an
# N-dimensional (mesh) grid.
>>> x,y = mgrid[0:5,0:5]
>>> x
array([[0, 0, 0, 0, 0],
[1, 1, 1, 1, 1],
[2, 2, 2, 2, 2],
[3, 3, 3, 3, 3],
[4, 4, 4, 4, 4]])
>>> y
array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]])
255
Matrix Objects
# Matlab-like creation from string
>>> A = mat('1,2,4;2,5,3;7,8,9')
>>> print A
Matrix([[1, 2, 4],
[2, 5, 3],
[7, 8, 9]])
# matrix exponents
>>> print A**4
Matrix([[ 6497, 9580, 9836],
[ 7138, 10561, 10818],
[18434, 27220, 27945]])
# matrix multiplication
>>> print A*A.I
Matrix([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
BLOCK MATRICES
MATRIX CREATION
# create a matrix from
# sub-matrices
>>> a = array([[1,2],
[3,4]])
>>> b = array([[10,20],
[30,40]])
>>> bmat('a,b;b,a')
matrix([[ 1, 2, 10, 20],
[ 3, 4, 30, 40],
[10, 20, 1, 2],
[30, 40, 3, 4]])
256
Trig and Other Functions
sin(x) sinh(x)
cos(x) cosh(x)
arccos(x) arccosh(x)
arctan(x) arctanh(x)
arcsin(x) arcsinh(x)
arctan2(x,y)
hypot(x, y) computes the element-by-element distance sqrt(x**2 + y**2).
TRIGONOMETRIC
exp(x) log(x)
log10(x) sqrt(x)
absolute(x) conjugate(x)
negative(x) ceil(x)
floor(x) fabs(x)
hypot(x,y) fmod(x,y)
maximum(x,y) minimum(x,y)
OTHERS
VECTOR OPERATIONS
dot(x,y) vdot(x,y)
inner(x,y) outer(x,y)
cross(x,y) kron(x,y)
tensordot(x,y[,axis])
257
More Basic Functions
TYPE HANDLING
iscomplexobj
iscomplex
isrealobj
isreal
imag
real
real_if_close
isscalar
isneginf
isposinf
isinf
isfinite
isnan
nan_to_num
common_type
typename
SHAPE MANIPULATION
atleast_1d
atleast_2d
atleast_3d
expand_dims
apply_over_axes
apply_along_axis
hstack
vstack
dstack
column_stack
hsplit
vsplit
dsplit
split
squeeze
OTHER USEFUL FUNCTIONS
fix
mod
amax
amin
ptp
sum
cumsum
prod
cumprod
diff
angle
roots
poly
any
all
disp
unique
nansum
nanmax
nanargmax
nanargmin
nanmin
unwrap
sort_complex
trim_zeros
fliplr
flipud
rot90
eye
diag
select
extract
insert
258
Vectorizing Functions
# special.sinc already available
# This is just for show.
def sinc(x):
    if x == 0.0:
        return 1.0
    else:
        w = pi*x
        return sin(w) / w
# attempt
>>> x = array((1.3, 1.5))
>>> sinc(x)
ValueError: The truth value of
an array with more than one
element is ambiguous. Use
a.any() or a.all()
>>> from numpy import vectorize
>>> vsinc = vectorize(sinc)
>>> vsinc(x)
array([-0.1981, -0.2122])
>>> x2 = linspace(-5, 5, 101)
>>> plot(x2, vsinc(x2))
SOLUTION
SCALAR SINC FUNCTION
259
Mathematical Binary Operators
a + b add(a,b)
a - b subtract(a,b)
a % b remainder(a,b)
a * b multiply(a,b)
a / b divide(a,b)
a ** b power(a,b)
MULTIPLY BY A SCALAR
ELEMENT BY ELEMENT
ADDITION
ADDITION USING AN OPERATOR
FUNCTION
>>> a = array((1,2))
>>> a*3.
array([3., 6.])
>>> a = array([1,2])
>>> b = array([3,4])
>>> a + b
array([4, 6])
>>> add(a,b)
array([4, 6])
# Overwrite contents of a.
# Saves array creation
# overhead.
>>> add(a,b,a) # a += b
array([4, 6])
>>> a
array([4, 6])
IN-PLACE OPERATION
260
Comparison and Logical Operators
>>> a = array(((1,2,3,4),(2,3,4,5)))
>>> b = array(((1,2,5,4),(1,3,4,5)))
>>> a == b
array([[True, True, False, True],
[False, True, True, True]])
# functional equivalent
>>> equal(a,b)
array([[True, True, False, True],
[False, True, True, True]])
equal (==)
greater_equal (>=)
logical_and
logical_not
not_equal (!=)
less (<)
logical_or
greater (>)
less_equal (<=)
logical_xor
2-D EXAMPLE
Be careful with if statements
involving numpy arrays. To
test for equality of arrays,
don't do:
    if a == b:
Rather, do:
    if all(a==b):
For floating point,
    if allclose(a,b):
is even better.
261
Bitwise Operators
>>> a = array((1,2,4,8))
>>> b = array((16,32,64,128))
>>> bitwise_or(a,b)
array([ 17, 34, 68, 136])
# bit inversion
>>> a = array((1,2,3,4), uint8)
>>> invert(a)
array([254, 253, 252, 251], dtype=uint8)
# left shift operation
>>> left_shift(a,3)
array([ 8, 16, 24, 32], dtype=uint8)
bitwise_and (&)
bitwise_or (|)
right_shift (>>)
left_shift (<<)
invert (~)
bitwise_xor (^)
BITWISE EXAMPLES
When possible,
bitwise operations
are another way to
speed up
computations.
262
Bitwise and Comparison Together
# When combining comparisons with bitwise operations,
# precedence requires parentheses around the comparisons.
>>> a = array([1,2,4,8])
>>> b = array([16,32,64,128])
>>> (a > 3) & (b < 100)
array([ False, False, True, False])
PRECEDENCE ISSUES
LOGICAL AND ISSUES
# Note that logical AND isn't supported for arrays without
# calling the logical_and function.
>>> a>3 and b<100
Traceback (most recent call last):
ValueError: The truth value of an array with more than one
element is ambiguous. Use a.any() or a.all()
# Also, you cannot currently use the "short version" of
# comparison with NumPy arrays.
>>> 2<a<4
Traceback (most recent call last):
ValueError: The truth value of an array with more than one
element is ambiguous. Use a.any() or a.all()
263
Universal Function Methods
The mathematical, comparative, logical, and bitwise operators op that
take two arguments (binary operators) have special methods that
operate on arrays:
op.reduce(a,axis=0)
op.accumulate(a,axis=0)
op.outer(a,b)
op.reduceat(a,indices)
264
op.reduce()
y = add.reduce(a) = a[0] + a[1] + ... + a[N-1]
op.reduce(a) applies op to all
the elements in a 1-D array a
reducing it to a single value.
For example:
ADD EXAMPLE
>>> a = array([1,2,3,4])
>>> add.reduce(a)
10
LOGICAL OP EXAMPLES
>>> a = array([1,1,0,1])
>>> logical_and.reduce(a)
False
>>> logical_or.reduce(a)
True
STRING LIST EXAMPLE
>>> a = array(['ab','cd','ef'],
...           dtype=object)
>>> add.reduce(a)
'abcdef'
265
op.reduce()
For multidimensional arrays, op.reduce(a,axis) applies op to the elements
of a along the specified axis. The resulting array has dimensionality one
less than a. The default value for axis is 0.
# The examples use the 4x3 array
>>> a = array([[ 0,  1,  2],
...            [10, 11, 12],
...            [20, 21, 22],
...            [30, 31, 32]])
SUM COLUMNS BY DEFAULT
>>> add.reduce(a)
array([60, 64, 68])
SUMMING UP EACH ROW
>>> add.reduce(a,1)
array([ 3, 33, 63, 93])
266
op.accumulate()
LOGICAL OP EXAMPLES
>>> a = array([1,1,0])
>>> logical_and.accumulate(a)
array([True, True, False])
>>> logical_or.accumulate(a)
array([True, True, True])
op.accumulate(a) creates a
new array containing the intermediate
results of the reduce operation at
each element in a.
For example:
y = add.accumulate(a)
  = [a[0], a[0]+a[1], ..., a[0]+a[1]+...+a[N-1]]
ADD EXAMPLE
>>> a = array([1,2,3,4])
>>> add.accumulate(a)
array([ 1, 3, 6, 10])
STRING LIST EXAMPLE
>>> a = array(['ab','cd','ef'],
...           dtype=object)
>>> add.accumulate(a)
array(['ab', 'abcd', 'abcdef'],
      dtype=object)
267
op.reduceat()
>>> a = array([0,10,20,30,
... 40,50])
>>> indices = array([1,4])
>>> add.reduceat(a,indices)
array([60, 90])
op.reduceat(a,indices)
applies op to ranges in the 1-D array a
defined by the values in indices.
The resulting array has the same length
as indices.
For multidimensional arrays,
reduceat() is always applied along the
last axis (sum of rows for 2-D arrays).
This is different from the default for
reduce() and accumulate().
EXAMPLE
For example, y = add.reduceat(a, indices) gives
    y[i] = add.reduce(a[indices[i]:indices[i+1]])
with the last range running to the end of a.
268
op.outer()
op.outer(a,b) forms all possible combinations of elements between
a and b using op. The shape of the resulting array results from
concatenating the shapes of a and b. (Order matters.)
>>> add.outer(a, b)   # shape (len(a), len(b)); element [i, j] is a[i] + b[j]
>>> add.outer(b, a)   # shape (len(b), len(a)); element [i, j] is b[i] + a[j]
269
Array Functions choose()
choose() builds a result array y by selecting, element by element, from the
candidate arrays (c0, c1, c2, c3) according to the index stored in
choice_array at that position (scalar candidates are broadcast):
>>> y = choose(choice_array,(c0,c1,c2,c3))
270
Example - choose()
>>> a
array([[ 0,  1,  2],
       [10, 11, 12],
       [20, 21, 22]])
CLIP LOWER VALUES TO 10
>>> a < 10
array([[ True,  True,  True],
       [False, False, False],
       [False, False, False]], dtype=bool)
>>> choose(a<10, (a, 10))
array([[10, 10, 10],
       [10, 11, 12],
       [20, 21, 22]])
CLIP LOWER AND UPPER VALUES
>>> lt = a < 10
>>> gt = a > 15
>>> choice = lt + 2*gt
>>> choice
array([[1, 1, 1],
       [0, 0, 0],
       [2, 2, 2]])
>>> choose(choice, (a, 10, 15))
array([[10, 10, 10],
       [10, 11, 12],
       [15, 15, 15]])
271
Array Functions concatenate()
concatenate((a0,a1,…,aN),axis=0)
The input arrays (a0,a1,…,aN) are concatenated along the given axis. They
must have the same shape along every axis except the one given.
>>> x = array([[ 0,  1,  2],
...            [10, 11, 12]])
>>> y = array([[50, 51, 52],
...            [60, 61, 62]])
# stack vertically (axis=0): shape (4, 3)
>>> concatenate((x,y))
# stack side by side (axis=1): shape (2, 6)
>>> concatenate((x,y),1)
# stack along a new first axis: shape (2, 2, 3)
>>> array((x,y))
See also vstack(), hstack() and dstack() respectively.
272
Array Broadcasting
NumPy arrays of different dimensionality can be combined in the same
expression. Arrays with smaller dimension are broadcasted to match the
larger arrays, without copying data. Broadcasting has two rules.
In [3]: a = ones((3, 5)) # a.shape == (3, 5)
In [4]: b = ones((5,)) # b.shape == (5,)
In [5]: b.reshape(1, 5) # result is a (1,5)-shaped array.
In [6]: b[newaxis, :] # equivalent, more concise.
RULE 1: PREPEND ONES TO SMALLER ARRAYS’ SHAPE
In [7]: c = a + b # c.shape == (3, 5)
is logically equivalent to...
In [8]: tmp_b = b.reshape(1, 5)
In [9]: tmp_b_repeat = tmp_b.repeat(3, axis=0)
In [10]: c = a + tmp_b_repeat
# But broadcasting makes no copies of “b”s data!
RULE 2: DIMENSIONS OF SIZE 1 ARE REPEATED WITHOUT COPYING
273
Array Broadcasting
[figure: a (4,3) array combined with a (3,)-shaped array and with a (4,1)-shaped
array; the missing and length-1 dimensions are "stretched" to give a (4,3) result]
274
Broadcasting Rules
[figure: a (4,3) array whose rows are 0, 10, 20, 30 plus the (3,)-shaped array
[0, 1, 2] broadcasts to a (4,3) result]
The trailing axes of both arrays must either be 1 or have the same size
for broadcasting to occur. Otherwise, a
"ValueError: shape mismatch: objects cannot be
broadcast to a single shape" exception is thrown.
[figure: a (4,3) array and a (4,)-shaped array do not broadcast; the trailing
dimensions 3 and 4 mismatch]
275
Broadcasting in Action
>>> a = array((0,10,20,30))
>>> b = array((0,1,2))
>>> y = a[:, newaxis] + b
>>> y
array([[ 0,  1,  2],
       [10, 11, 12],
       [20, 21, 22],
       [30, 31, 32]])
276
Application: distance from center
In [1]: a = linspace(0, 1, 15) - 0.5
In [2]: b = a[:, newaxis] # b.shape == (15, 1)
In [3]: dist2 = a**2 + b**2 # broadcasting sum.
In [4]: dist = sqrt(dist2)
In [5]: imshow(dist); colorbar()
277
Broadcasting’s usefulness
Broadcasting can often be used to replace needless data replication inside a
NumPy array expression.
np.meshgrid() -- use newaxis appropriately in broadcasting expressions instead.
np.repeat() -- broadcasting makes repeating an array along a dimension of
size 1 unnecessary.
In [3]: x, y = \
meshgrid([1,2],[3,4,5])
In [4]: z = x + y
MESHGRID: COPIES DATA
In [5]: x = array([1,2])
In [6]: y = array([3,4,5])
In [7]: z = \
x[newaxis,:] + y[:,newaxis]
BROADCASTING: NO COPIES
278
Vector Quantization Example
[figure: Feature 1 vs. Feature 2 scatter plot with code-book points 0-4 and two targets]
279
Vector Quantization Example
[figure: Feature 1 vs. Feature 2 scatter plot; each target is assigned to the
code-book point (0-4) at minimum distance]
280
Vector Quantization Example
[figure: Feature 1 vs. Feature 2 scatter plot of many observations to be
classified against code-book points 0-4]
281
Vector Quantization Example
diff = obs[newaxis,:,:] - book[:,newaxis,:]
[figure: the code book ("book", a 5x3 array with rows c0-c4 and columns x, y, z)
and the observations ("obs", a 10x3 array with rows o0-o9 and columns x, y, z)]
distance = sqrt(sum(diff**2,axis=-1))
code_index = argmin(distance,axis=0)
# broadcast shapes: (5,10,3) = (1,10,3) - (5,1,3)
282
VQ Speed Comparisons
Method               Run Time (sec)   Speed Up
Matlab 5.3           1.611            -
Python VQ1, double   2.245            0.71
Python VQ1, float    1.138            1.42
Python VQ2, double   1.637            0.98
Python VQ2, float    0.954            1.69
C, double            0.066            24.40
C, float             0.064            24.40
4000 observations with 16 features categorized into 40 codes
on Pentium III 500 MHz.
VQ1 uses the technique described on the previous slide verbatim.
VQ2 applies broadcasting on an observation by observation basis.
This turned out to be much more efficient because it is less memory intensive.
283
Broadcasting Indices
Broadcasting can also be used to slice elements from different
“depths” in a 3-D (or any other shape) array. This is a very powerful
feature of indexing.
>>> xi,yi = ogrid[:3,:3]
>>> zi = array([[0, 1, 2],
[1, 1, 2],
[2, 2, 2]])
>>> horizon = data_cube[xi,yi,zi]
Indices Selected Data
284
“Structured” Arrays
# "Data structure" (dtype) that describes the fields and
# type of the items in each array element.
>>> particle_dtype = dtype([('mass','float32'), ('velocity', 'float32')])
# This must be a list of tuples.
>>> particles = array([(1,1), (1,2), (2,1), (1,3)],
dtype=particle_dtype)
>>> print particles
[(1.0, 1.0) (1.0, 2.0) (2.0, 1.0) (1.0, 3.0)]
# Retrieve the mass for all particles through indexing.
>>> print particles['mass']
[ 1. 1. 2. 1.]
# Retrieve particle 0 through indexing.
>>> particles[0]
(1.0, 1.0)
# Sort particles in place, with velocity as the primary field and
# mass as the secondary field.
>>> particles.sort(order=('velocity','mass'))
>>> print particles
[(1.0, 1.0) (2.0, 1.0) (1.0, 2.0) (1.0, 3.0)]
# See demo/multitype_array/particle.py.
285
“Structured” Arrays
Elements of an array can be any fixed-size data structure!
Each element below holds the fields name (char[10]), age (int), weight (double):
Brad/33/135.0    Jane/25/105.0     John/47/225.0      Fred/54/140.0
Henry/29/154.0   George/61/202.0   Brian/32/137.0     Amy/27/187.0
Ron/19/188.0     Susan/33/135.0    Jennifer/18/88.0   Jill/54/145.0
EXAMPLE
>>> from numpy import dtype, empty
# structured data format
>>> fmt = dtype([('name', 'S10'),
('age', int),
('weight', float)
])
>>> a = empty((3,4), dtype=fmt)
>>> a.itemsize
22
>>> a['name'] = [['Brad', … ,'Jill']]
>>> a['age'] = [[33, … , 54]]
>>> a['weight'] = [[135, … , 145]]
>>> print a
[[('Brad', 33, 135.0)
('Jill', 54, 145.0)]]
286
Nested Datatype
nested.dat
287
Nested Datatype (cont’d)
>>> dt = dtype([('time', uint64),
... ('size', uint32),
... ('position', [('az', float32),
... ('el', float32),
... ('region_type', uint8),
... ('region_ID', uint16)]),
... ('gain', uint8),
... ('samples', int16, 2048)])
>>> data = loadtxt('nested.dat', dtype=dt, skiprows = 2)
>>> data['position']['az']
array([ 0.71559399, 0.70687598, 0.69815701, 0.68942302,
0.68068302, ...], dtype=float32)
The data file can be extracted with the following code:
288
Memory Mapped Arrays
Methods for Creating:
memmap: subclass of ndarray that manages the
memory mapping details.
frombuffer: Create an array from a memory mapped
buffer object.
ndarray constructor: Use the buffer keyword to
pass in a memory mapped buffer.
Limitations:
Files must be < 2GB on Python 2.4 and before.
Files must be < 2GB on 32-bit machines.
Python 2.5 and higher on 64 bit machines is
theoretically "limited" to 17.2 billion GB (17 Exabytes).
289
Memory Mapped Example
# Create a "memory mapped" array where
# the array data is stored in a file on
# disk instead of in main memory.
>>> from numpy import memmap
>>> image = memmap('some_file.dat',
dtype=uint16,
mode='r+',
shape=(5,5),
offset=header_size)
# Standard array methods work.
>>> mean_value = image.mean()
# Standard math operations work.
# The resulting scaled_image *is*
# stored in main memory. It is a
# standard numpy array.
>>> scaled_image = image * .5
[figure: "image" is a 2-D NumPy array (shape (5,5), dtype uint16) mapped onto
the binary data section of some_file.dat, after its header]
290
memmap
The memmap subclass of array handles opening and closing files as
well as synchronizing memory with the underlying file system.
memmap(filename, dtype=uint8, mode='r+',
       offset=0, shape=None, order='C')
filename Name of the underlying file. For all modes, except for 'w+', the file must already exist
and contain at least the number of bytes used by the array.
dtype The numpy data type used for the array. This can be a "structured" dtype as well as
the standard simple data types.
offset Byte offset within the file to the memory used as data within the array.
mode <see next slide>
shape Tuple specifying the dimensions and size of each dimension in the array. shape=(5,10)
would create a 2D array with 5 rows and 10 columns.
order 'C' for row major memory ordering (standard in the C programming language) and 'F'
for column major memory ordering (standard in Fortran).
291
memmap -- mode
The mode setting for memmap arrays is used to set the access flag
when opening the specified file using the standard mmap module.
memmap(filename, dtype=uint8, mode='r+',
       offset=0, shape=None, order='C')
mode A string indicating how the underlying file should be opened.
'r' or 'readonly': Open an existing file as an array for reading.
'c' or 'copyonwrite': "Copy on write" arrays are "writable" as Python arrays, but
they never modify the underlying file.
'r+' or 'readwrite': Create a read/write array from an existing file. The file will
have "write through" behavior where changes to the array are written to the
underlying file. Use the flush() method to ensure the array is
synchronized with the file.
'w+' or 'write': Create the file or overwrite if it exists. The array is filled
with zeros and has "write through" behavior similar to 'r+'.
292
memmap -- write through behavior
# Create a memory mapped "write through" file, overwriting it if it exists.
In [66]: q=memmap('new_file.dat',mode='w+',shape=(2,5))
In [67]: q
memmap([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]], dtype=uint8)
# Print out the contents of the underlying file. Note: It
# doesn't print because 0 isn't a printable ascii character.
In [68]: !cat new_file.dat
# Now write the ascii value for 'A' (65) into our array.
In [69]: q[:] = ord('A')
In [70]: q
memmap([[65, 65, 65, 65, 65],
[65, 65, 65, 65, 65]], dtype=uint8)
# Ensure the OS has written the data to the file, and examine
# the underlying file. It is full of 'A's as we hope.
In [71]: q.flush()
In [72]: !cat new_file.dat
AAAAAAAAAA
293
memmap -- copy on write behavior
# Create a copy-on-write memory map where the underlying file is never
# modified. The file must already exist.
# This is a memory efficient way of working with data on disk as arrays but
# ensuring you never modify it.
In [73]: q=memmap('new_file.dat',mode='c',shape=(2,5))
In [74]: q
memmap([[65, 65, 65, 65, 65],
[65, 65, 65, 65, 65]], dtype=uint8)
# Set values in array to something new.
In [75]: q[1] = ord('B')
In [76]: q
memmap([[65, 65, 65, 65, 65],
[66, 66, 66, 66, 66]], dtype=uint8)
# Even after calling flush(), the underlying file is not updated.
In [77]: q.flush()
In [78]: !cat new_file.dat
AAAAAAAAAA
294
Using Offsets
# Create a memory mapped array with 10 elements.
In [1]: q=memmap('new_file.dat',mode='w+', dtype=uint8, shape=(10,))
In [2]: q[:] = arange(0,100,10)
memmap([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=uint8)
# Now, create a new memory mapped array (read only) with an offset into
# the previously created file.
In [3]: q=memmap('new_file.dat',mode='r', dtype=uint8, shape=6, offset=4)
In [4]: q
memmap([40, 50, 60, 70, 80, 90], dtype=uint8)
# The number of bytes required by the array must be equal or less than
# the number of bytes available in the file.
In [3]: q=memmap('new_file.dat',mode='r', dtype=uint8, shape=7, offset=4)
ValueError: mmap length is greater than file size
[figure: new_file.dat holds the bytes 0, 10, ..., 90; with offset=4 the
read-only view maps the bytes 40 through 90]
295
Working with file headers
# Create a dtype to represent the header.
header_dtype = dtype([('rows', int32), ('cols', int32)])
# Create a memory mapped array using this dtype. Note the shape is empty.
header = memmap(file_name, mode='r', dtype=header_dtype, shape=())
# Read the row and column sizes using this structured array.
rows = header['rows']
cols = header['cols']
# Create a memory map to the data segment, using rows, cols for shape
# information and the header size to determine the correct offset.
data = memmap(file_name, mode='r+', dtype=float64,
shape=(rows, cols), offset=header_dtype.itemsize)
[File format: a header of rows (int32) and cols (int32), followed by the
64-bit floating point data]
296
memory maps with ndarray
# mmap is a standard Python module for working with memory maps.
import mmap
import numpy
# Create a dtype to represent the header.
header_dtype = numpy.dtype([('rows', int32), ('cols', int32)])
# Open a file for read/write access in binary mode.
file = open(file_name,'r+b')
# Create a read-only memory map from the opened file with the
# correct size to read the header of the file.
mm = mmap.mmap(file.fileno(), header_dtype.itemsize,
access=mmap.ACCESS_READ)
< continued >
[File format: a header of rows (int32) and cols (int32), followed by the
64-bit floating point data]
297
memory maps with ndarray
# Create a new array using the ndarray constructor.
# The first argument is the shape, and we pass in the data type and the
# memory buffer to use (mm) as keyword arguments.
header = numpy.ndarray((), dtype=header_dtype, buffer=mm)
rows = header['rows']
cols = header['cols']
# Create a writable memory map to use for the data array. The size of the
# memory map in bytes is the size of a float64 (8) * rows * columns.
mm = mmap.mmap(file.fileno(), 8*rows*cols, access=mmap.ACCESS_WRITE)
# Create our data array using this new memory map. Start the arrays
# data at the memory location directly after the header using offset.
data = numpy.ndarray((rows, cols), dtype=float64, buffer=mm,
offset=header_dtype.itemsize)
[File format: a header of rows (int32) and cols (int32), followed by the
64-bit floating point data]
298
Structured Arrays
Example
Elements of array can be any
fixed-size data structure!
Name (char[12])   Time (int64)   Value (float32)
MSFT_profit       10             6.20
GOOG_profit       12             -1.08
MSFT_profit       18             8.40
INTC_profit       25             -0.20
GOOG_profit       1000325        3.20
GOOG_profit       1000350        4.50
INTC_profit       1000385        -1.05
MSFT_profit       1000390        5.60
>>> import numpy as np
>>> fmt = np.dtype([('name', 'S12'),
('time', np.int64),
('value', np.float32)])
>>> vals = [('MSFT_profit', 10, 6.20),
            ('GOOG_profit', 12, -1.08),
            ('INTC_profit', 1000385, -1.05),
            ('MSFT_profit', 1000390, 5.60)]
>>> arr = np.array(vals, dtype=fmt)
# or
>>> arr = np.fromfile('db.dat', dtype=fmt)
# or
>>> arr = np.memmap('db.dat', dtype=fmt,
mode='c')
[figure: on disk, db.dat stores the records contiguously:
MSFT_profit/10/6.20, GOOG_profit/12/-1.08, ...,
INTC_profit/1000385/-1.05, MSFT_profit/1000390/5.60]
299
Memmap Timings (3D arrays)
[figure: a 500 MB file of 16-bit data laid out as a 500 x 500 x 1000 array]
Operations         Linux                         OS X
(500x500x1000)     In Memory    Memory Mapped    In Memory    Memory Mapped
read               2103 ms      11.0 ms          3505.00      27.00
x slice            1.8 ms       4.8 ms           1.80         8.30
y slice            2.8 ms       4.6 ms           4.40         7.40
z slice            9.2 ms       13.8 ms          10.40        18.70
downsample 4x4     0.02 ms      125 ms           0.02         198.70
Linux: Ubuntu 4.10, Dell Precision 690, Dual Quad Core Xeon X5355 2.6 GHz, 8 GB Memory
OS X: OS X 10.5, MacBook Pro Laptop, 2.6 GHz Core Duo, 4 GB Memory
All times in milliseconds (ms).
300
Parallel FFT On Memory Mapped File
[figure: the 500 MB data file is split along z into four chunks handled by
processors P1 through P4]
Processors   Time (seconds)   Speed Up
1            11.75            1.0
2            6.06             1.9
4            3.36             3.5
8            2.50             4.7
301
Controlling Output Format
set_printoptions(precision=None, threshold=None,
edgeitems=None, linewidth=None,
suppress=None)
precision The number of digits of precision to use for floating point output.
The default is 8.
threshold Array length where NumPy starts truncating the output and prints
only the beginning and end of the array. The default is 1000.
edgeitems Number of array elements to print at beginning and end of array
when threshold is exceeded. The default is 3.
linewidth Characters to print per line of output. The default is 75.
suppress Indicates whether NumPy suppresses printing small floating point
values in scientific notation. The default is False.
302
Controlling Output Formats
>>> a = arange(1e6)
>>> a
array([ 0.00000000e+00, 1.00000000e+00, 2.00000000e+00, ...,
9.99997000e+05, 9.99998000e+05, 9.99999000e+05])
>>> set_printoptions(precision=3)
>>> a
array([ 0.000e+00, 1.000e+00, 2.000e+00, ...,
1.000e+06, 1.000e+06, 1.000e+06])
PRECISION
>>> set_printoptions(precision=8)
>>> a = array((1, 2, 3, 1e-15))
>>> a
array([ 1.00000000e+00, 2.00000000e+00, 3.00000000e+00,
1.00000000e-15])
>>> set_printoptions(suppress=True)
>>> a
array([ 1., 2., 3., 0.])
SUPPRESSING SMALL NUMBERS
303
Controlling Error Handling
seterr(all=None, divide=None, over=None,
under=None, invalid=None)
all All error types to the specified value
divide Divide-by-zero errors
over Overflow errors
under Underflow errors
invalid Invalid floating point errors
Set the error handling flags in ufunc operations on a per thread basis.
Each of the keyword arguments can be set to ‘ignore’, ‘warn’, ‘print’, ‘log’,
‘raise’, or ‘call’.
304
Controlling Error Handling
>>> a = array((1,2,3))
>>> a/0.
Warning: divide by zero encountered in divide
array([ 1.#INF0000e+000, 1.#INF0000e+000, 1.#INF0000e+000])
# Ignore division-by-zero. Also, save old values so that
# we can restore them.
>>> old_err = seterr(divide='ignore')
>>> a/0.
array([ 1.#INF0000e+000, 1.#INF0000e+000, 1.#INF0000e+000])
# Restore original error handling mode.
>>> old_err
{'divide': 'print', 'invalid': 'print', 'over': 'print',
'under': 'ignore'}
>>> seterr(**old_err)
>>> a/0.
Warning: divide by zero encountered in divide
array([ 1.#INF0000e+000, 1.#INF0000e+000, 1.#INF0000e+000])
305
Interactive 3D Visualization
with
Mayavi-mlab
306
3D Visualization with Mayavi
C:\ ipython -pylab
>>> from mayavi import mlab
matplotlib also has an mlab
namespace. Be sure you are using
the one from mayavi
307
Interactive Example
# create arrays
>>> x, y = mgrid[-5:5:100j,-5:5:100j]
>>> r = x**2 + y**2
>>> z = sin(r)/r
# plot with mayavi
>>> from mayavi import mlab
>>> mlab.surf(x,y,5*z)
# plot with pylab
>>> imshow(z)
308
Some Plotting Commands
0D data
mlab.points3d(x, y, z)
1D data
mlab.plot3d(x, y, z)
3D data
mlab.contour3d(x, y, z)
Vector field
mlab.quiver3d(x, y, z, u, v, w)
2D data
mlab.surf(x, y, z)
309
Points in 3D
mlab.points3d(x, y, z, color=(1.0,0.0,1.0), mode='sphere', scale_factor=0.1)
x.shape == y.shape == z.shape
color = (R, G, B)
0.0 <= R, G, B <= 1.0
default is (1.0, 1.0, 1.0)
mode = ‘sphere’, ‘cone’, ‘cube’, ‘arrow’, ‘cylinder’, ‘point’,
‘2darrow’, ‘2dcircle’, ‘2dcross’, ‘2ddash’, ‘2ddiamond’,
‘2dhooked_arrow’, ‘2dsquare’, ‘2dthick_arrow’,
‘2dthick_cross’, ‘2dtriangle’, ‘2dvertex’
scaling applied
from numpy.random import rand
x,y,z = rand(30),rand(30),rand(30)
mlab.axes()
310
Lines in 3D
mlab.plot3d(x, y, z, color=(0.0,1.0,1.0), tube_radius=0.03)
1-d arrays giving end-points of
connected line-segments
len(x) == len(y) == len(z)
color = (R, G, B)
0.0 <= R, G, B <= 1.0
default is (1.0, 1.0, 1.0)
A float giving the radius of the
tubes representing the lines.
default is 0.025
from numpy.random import rand
x,y,z = rand(5), rand(5), rand(5)
mlab.points3d(x,y,z,scale_factor=0.06)
mlab.axes()
311
Surfaces in 3D
mlab.surf(z, colormap='RdBu') or mlab.surf(x, y, z, colormap='RdBu')
2d array of “heights”
1d (or 2d) arrays producing a (possibly
non-uniform) orthogonal grid (such as
returned from ogrid or mgrid)
2d array of “heights”
from numpy import ogrid, hypot
from scipy.special import j0
x,y = ogrid[-15:15:100j, -15:15:100j]
z = j0(hypot(x,y))
312
Volumes in 3D
mlab.contour3d(data, contours=6, colormap='Dark2')
3d array of
volume data
from numpy import ogrid, cos, sin
from scipy.special import j0
x,y,z = ogrid[-5:5:64j, -5:5:64j,
-5:5:64j]
data = cos(0.5*y)*j0(x+y*1.3)*sin(0.8*z)
mlab.axes(ranges=[-5,5]*3)
Integer number of contours or
List of specific contours
313
Vector Fields in 3D quiver3d
mlab.quiver3d(u, v, w) or mlab.quiver3d(x, y, z, u, v, w)
3-d arrays each
representing a component
of a 3-d vector field. All
must be the same shape.
1, 2, or 3-d arrays of the same
shape as u, v, and w providing
real locations of the vectors.
from numpy import mgrid
x,y,z = mgrid[-5:5:12j, -5:5:12j, -5:5:12j]
u,v,w = -z, y, x
mlab.quiver3d(u,v,w, colormap='OrRd')
any colormap argument will color glyphs
according to magnitude of vector.
1, 2, or 3-d arrays of the
same shape as x, y, and z
representing components
of the vectors.
314
Vector Fields in 3D flow
mlab.flow(u, v, w, linetype='tube', seedtype='sphere', colormap='PuRd')
3-d arrays representing
vector field. Can also be
called with x,y,z,u,v,w
where x, y, and z are all
3-d arrays.
linetype= ‘line’, ‘ribbon’, or ‘tube’
from numpy import mgrid
x,y,z = mgrid[-5:5:12j, -5:5:12j, -5:5:12j]
u,v,w = -z, y, x
any colormap argument will color
according to magnitude of vector.
seedtype = ‘sphere’, ‘plane’, ‘line’, or
‘point’
streamlines are calculated from
seedpoints spaced throughout the
specified seed widget. The seed
widget is shown and can be interacted
with to compute new streamlines.
315
Figure Decorations
>>> mlab.title('A title')
>>> mlab.axes()
>>> mlab.colorbar()
>>> mlab.zlabel('Depth')
>>> mlab.clf()
>>> mlab.figure()
>>> mlab.gcf()
316
Graphical User Interface
mlab.show_pipeline()
317
Examples and Demos
>>> mlab.test_<TAB>
mlab.test_points3d()
mlab.test_plot3d()
mlab.test_surf()
mlab.test_contour3d()
mlab.test_quiver3d()
mlab.test_molecule()
mlab.test_flow()
mlab.test_mesh()
Use ?? in IPython to look at the
source code of these examples.
318
Scientific Analysis
with
SciPy
319
Overview
CURRENT PACKAGES
Available at www.scipy.org
Open source BSD style license
38 contributors to the project in 2011
Special Functions (scipy.special)
Signal Processing (scipy.signal)
Image Processing (scipy.ndimage)
Fourier Transforms (scipy.fftpack)
Optimization (scipy.optimize)
Numerical Integration (scipy.integrate)
Linear Algebra (scipy.linalg)
Input/Output (scipy.io)
Statistics (scipy.stats)
Fast Execution (scipy.weave)
Clustering Algorithms (scipy.cluster)
Sparse Matrices (scipy.sparse)
Interpolation (scipy.interpolate)
...and more.
Note: “import scipy” does NOT import all these packages. Must be imported individually
320
Statistics
scipy.stats CONTINUOUS DISTRIBUTIONS
over 80
continuous
distributions!
pdf
cdf
rvs
ppf
stats
fit
sf
isf
METHODS
entropy
nnlf
moment
freeze
321
Using stats objects
>>> from scipy.stats import norm
# Sample normal dist. 100 times.
>>> samp = norm.rvs(size=100)
>>> x = linspace(-5, 5, 100)
# Calculate probability dist.
>>> pdf = norm.pdf(x)
# Calculate cumulative dist.
>>> cdf = norm.cdf(x)
>>> x = linspace(0, 1, 100)
# Calculate Percent Point Function
>>> ppf = norm.ppf(x)
DISTRIBUTIONS
322
Distribution objects
Every distribution can be modified by loc and scale keywords
(many distributions also have required shape arguments to select from a family)
LOCATION (loc) --- shift left (<0) or right (>0) the distribution
SCALE (scale) --- stretch (>1) or compress (<1) the distribution
323
Example distributions
NORM (norm): N(mu, sigma)
Only location and scale arguments:
    location <-> mean (mu)
    scale    <-> standard deviation (sigma)
LOG NORMAL (lognorm): S is lognormal if log(S) is N(mu, sigma)
One shape parameter!
    location <-> offset from zero (rarely used)
    scale    <-> e**mu
    shape    <-> sigma
324
Setting location and Scale
>>> from scipy.stats import norm
# Normal dist with mean=10 and std=2
>>> dist = norm(loc=10, scale=2)
>>> x = linspace(-5, 15, 100)
# Calculate probability dist.
>>> pdf = dist.pdf(x)
# Calculate cumulative dist.
>>> cdf = dist.cdf(x)
# Get 100 random samples from dist.
>>> samp = dist.rvs(size=100)
# Estimate parameters from data
>>> mu, sigma = norm.fit(samp)
>>> print "%4.2f, %4.2f" % (mu, sigma)
10.07, 1.95
NORMAL DISTRIBUTION
.fit returns best
shape + (loc, scale)
that explains the data
325
Statistics
scipy.stats Discrete Distributions
10 standard
discrete
distributions
(plus any
finite RV)
pmf
cdf
rvs
ppf
stats
sf
isf
METHODS
moment
entropy
freeze
326
Using stats objects
# Create loaded dice.
>>> from scipy.stats import rv_discrete
>>> xk = [1, 2, 3, 4, 5, 6]
>>> pk = [0.3, 0.35, 0.25, 0.05,
0.025, 0.025]
>>> new = rv_discrete(name='loaded',
values=(xk,pk))
# Calculate histogram
>>> samples = new.rvs(size=1000)
>>> bins = linspace(0.5, 6.5, 7)
>>> subplot(211)
>>> hist(samples, bins=bins, normed=True)
# Calculate pmf
>>> x = range(0, 8)
>>> subplot(212)
>>> stem(x,new.pmf(x))
CREATING NEW DISCRETE DISTRIBUTIONS
327
Statistics
# Sample two normal distributions
# and create a bimodal distribution
>>> rv1 = stats.norm()
>>> rv2 = stats.norm(2.0, 0.8)
>>> samples = hstack([rv1.rvs(size=100),
rv2.rvs(size=100)])
# Use a Gaussian kernel density to
# estimate the PDF for the samples.
>>> from scipy.stats.kde import gaussian_kde
>>> approximate_pdf = gaussian_kde(samples)
>>> x = linspace(-3, 6, 200)
# Compare the histogram of the samples to
# the PDF approximation.
>>> hist(samples, bins=25, normed=True)
>>> plot(x, approximate_pdf(x), 'r')
CONTINUOUS DISTRIBUTION ESTIMATION USING GAUSSIAN KERNELS
328
Linear Algebra
scipy.linalg FAST LINEAR ALGEBRA
Uses optimized BLAS and MKL if available, making it very fast
Low-level access to BLAS and LAPACK routines in
modules linalg.blas, and linalg.lapack (FORTRAN order)
High level matrix routines
Linear Algebra Basics: inv, pinv, solve, det, norm,
lstsq, ...
Decompositions: eig, lu, svd, orth, cholesky, qr,
schur, ...
Matrix Functions: expm, logm, sqrtm, cosm, coshm, funm
(general matrix functions)
329
Solving A*X = B
SOLVE Ax=b LEAST SQUARES Ax=b
>>> from scipy import linalg
>>> a = array([[1.0, 3.0, 2.0],
... [4.0, 0.0, 1.0],
... [2.0, 2.0, 2.0]])
>>> b = array([2.0, 1.0, 0.0])
>>> x = linalg.solve(a, b)
>>> x
array([ 1.5, 3.5, -5. ])
# Check:
>>> dot(a, x)
array([ 2., 1., 0.])
# Over-specified problem:
>>> a = array([[1.0, 4.0],
... [3.0, 0.0],
... [2.0, 1.0]])
>>> b = array([2.0, 1.0, 0.0])
>>> x, res, rnk, svals = \
... linalg.lstsq(a, b)
# Least squares solution:
>>> x
array([ 0.1832, 0.4059])
# Sum of squared residuals:
>>> res
0.83663366336633671
# Effective rank of `a`:
>>> rnk
2
# Singular values of `a`:
>>> svals
array([ 4.6567, 3.0521])
330
To solve A*X = B for many different right-hand sides B, factor A = L*U,
where L is lower triangular and U is upper triangular.
>>> from scipy import linalg
>>> a = array([[1,3,5],
... [2,5,1],
... [2,3,6]])
# time consuming factorization
>>> lu, piv = linalg.lu_factor(a)
# fast solve for 1 or more
# right hand sides.
>>> b = array([10,8,3])
>>> linalg.lu_solve((lu, piv), b)
array([-7.82608696, 4.56521739,
0.82608696])
# check
>>> dot(a, _)
array([ 10., 8., 3.])
LU FACTORIZATION
>>> a = array([[12, -51, 4],
... [ 6, 167, -68],
... [-4, 24, -41]])
>>> q, r = linalg.qr(a)
>>> q
array([[-0.8571,  0.3943,  0.3314],
       [-0.4286, -0.9029, -0.0343],
       [ 0.2857, -0.1714,  0.9429]])
>>> r
array([[ -14., -21., 14.],
[ 0., -175., 70.],
[ -0., 0., -35.]])
QR FACTORIZATION
To solve Ax=b, factor A = QR, where Q is
orthogonal and R is upper triangular.
See the qr docstring for more advanced
options (e.g. pivoting for rank-revealing QR
decomposition).
Factorizations
331
Eigenvalues, eigenvectors
SCHUR DECOMPOSITION
>>> t, z = linalg.schur(a)
>>> t
array(
[[ 156.137, 64.737, 85.651],
[ 0. , 16.06 , 15.814],
[ 0. , 0. , -34.197]])
>>> z
array([[ 0.328, -0.94 , 0.098],
[-0.937, -0.337, -0.093],
[-0.121, 0.061, 0.991]])
# Diagonals in t are eigenvalues:
>>> linalg.eigvals(a)
array([ 156.137+0.j, 16.060+0.j,
-34.197+0.j])
Factor A = Z T Z^-1, where Z is unitary and T is
upper triangular.
EIGENVALUES AND VECTORS
>>> a = array([[1,3,5],
... [2,5,1],
... [2,3,6]])
# compute eigenvalues/vectors
>>> vals, vecs = linalg.eig(a)
# print eigenvalues
>>> vals
array([ 9.39895873+0.j,
-0.73379338+0.j,
3.33483465+0.j])
# eigenvectors are in columns
# print first eigenvector
>>> vecs[:,0]
array([-0.57028326,
-0.41979215,
-0.70608183])
# norm of vector should be 1.0
>>> linalg.norm(vecs[:,0])
1.0
332
Linear Algebra
>>> a = array([[2, 1],
... [2, 0]])
>>> u, s, vt = linalg.svd(a)
>>> u
array([[-0.75 , -0.662],
[-0.662, 0.75 ]])
# s holds the singular values.
>>> s
array([ 2.921, 0.685])
>>> vt
array([[-0.966, -0.257],
[ 0.257, -0.966]])
# Use to inverse A
>>> inv_a = dot(vt.T,
dot(diag(1/s), u.T))
>>> dot(a, inv_a)
array([[ 1.0, -4.44089210e-16],
[ -2.22044605e-16, 1.0]])
SINGULAR VALUE DECOMPOSITION
# Nonsquare example: an
# underspecified set of equations
>>> a = array([[2, 1, -4],
[2, 3, 1]])
>>> u, s, vt = linalg.svd(a)
>>> u
array([[-0.938, -0.347],
[-0.347, 0.938]])
>>> s
array([ 4.702, 3.59 ])
>>> vt
array([[-0.546, -0.421, 0.724],
[ 0.329, 0.687, 0.648],
[ 0.77 , -0.592, 0.237]])
Factor A = U S V*, where U and V are unitary, and S is diagonal. This generalizes diagonalization.
# Non-square mat: Pseudo-inverse
# with least-square method
>>> b = array([[2, 1, -4],
[2, 3, 1]])
>>> linalg.pinv(b)
array([[ 0.07719298,
0.12631579],
[ 0.01754386,
0.21052632],
[-0.20701754,
0.11578947]])
>>> linalg.det(a)
-2.0
#If LU decomposition available:
>>> lu, _ = linalg.lu_factor(a)
>>> np.prod(lu.diagonal())
-2.0
Inverse, and determinants
333
# Square matrix
>>> from scipy import linalg
>>> a = array([[2, 1],
... [2, 0]])
>>> a_inv = linalg.inv(a)
>>> a_inv
array([[ 0. , 0.5],
[ 1. , -1. ]])
# Check
>>> dot(a, a_inv)
array([[ 1., 0.],
[ 0., 1.]])
# Alternatively (slower)
>>> linalg.solve(a, identity(2))
array([[ 0. , 0.5],
[ 1. , -1. ]])
MATRIX INVERSION
MATRIX DETERMINANT
334
Matrix Objects
SOLVE
>>> from numpy import mat
>>> a = mat('[1,3,5;2,5,1;2,3,6]')
>>> a
matrix([[1, 3, 5],
[2, 5, 1],
[2, 3, 6]])
STRING CONSTRUCTION
INVERTED ATTRIBUTE
TRANSPOSE ATTRIBUTE
DIAGONAL
>>> a.T
matrix([[1, 2, 2],
[3, 5, 3],
[5, 1, 6]])
>>> a.I
matrix([[-1.1739,  0.1304,  0.9565],
        [ 0.4348,  0.1739, -0.3913],
        [ 0.1739, -0.1304,  0.0435]])
>>> a.diagonal()
matrix([[1, 5, 6]])
>>> a.diagonal(-1)
matrix([[2, 3]])
>>> b = mat('10;8;3')
>>> a.I*b
matrix([[-7.82608696],
[ 4.56521739],
[ 0.82608696]])
>>> from scipy import linalg
>>> linalg.solve(a,b)
matrix([[-7.82608696],
[ 4.56521739],
[ 0.82608696]])
335
Interpolation
scipy.interpolate General purpose Interpolation
1D Interpolating Class
Constructs callable function from data points and
desired spline interpolation order.
Function takes vector of inputs and returns
interpolated value using the spline.
Radial basis functions
Simple but effective N-dimensional interpolation
Works with scattered data
336
1D Spline Interpolation
interp1d(x, y, kind='linear', axis=-1, copy=True,
bounds_error=True, fill_value=numpy.nan)
Returns a function that uses interpolation to find the
value of new points.
x 1d array of increasing real values which
cannot contain duplicates
y Nd array of real values whose length along the
interpolation axis must be len(x)
kind kind of interpolation (e.g. 'linear',
'nearest', 'quadratic', 'cubic'). Can also be an
integer n>1 which returns interpolating spline
(with minimum sum-of-squares discontinuity in nth
derivative).
axis axis of y along which to interpolate
copy make internal copies of x and y
bounds_error raise error for out-of-bounds
fill_value if bounds_error is False, then use
this value to fill in out-of-bounds.
>>> from scipy.interpolate import interp1d
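A tiny sketch of the bounds handling described above, with made-up sample data:
>>> f = interp1d([0., 1., 2.], [0., 10., 20.],
...              bounds_error=False, fill_value=-1.0)
# 2.5 is out of bounds, so it gets the fill value.
>>> f([0.5, 2.5])
array([ 5., -1.])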
337
1D Spline Interpolation
# demo/interpolate/spline.py
from scipy.interpolate import interp1d
from pylab import plot, axis, legend
from numpy import linspace, pi, sin
# sample values
x = linspace(0,2*pi,6)
y = sin(x)
# Create a spline class for interpolation.
# kind='nearest' -> zeroth order hold.
# kind='linear' -> linear interpolation
# kind=n -> use an nth order spline
spline_fit = interp1d(x,y,kind=5)
xx = linspace(0,2*pi, 50)
yy = spline_fit(xx)
# display the results.
plot(xx, sin(xx), 'r-', x, y, 'ro', xx, yy, 'b--', linewidth=2)
axis('tight')
legend(['actual sin', 'original samples', 'interpolated curve'])
340
func = Rbf(x, y, …, function='multiquadric',
epsilon=None, smooth=0, norm=euclidean_norm)
Returns a function that uses interpolation to find the
value of new points.
x array of real values
y array of real values the same shape as x
- additional array arguments can be provided for N-d
interpolation
function basis function used to interpolate; one of
'multiquadric', 'inverse multiquadric', 'gaussian',
'linear', 'cubic', 'quintic', 'thin-plate'
epsilon adjustable constant for the 'gaussian' or
'multiquadric' functions. Defaults to approximate
average distance between nodes.
smooth Greater than zero increases smoothness of
approximation. Default of 0 is interpolation.
norm function (vectorized) that returns the distance
between two points
>>> from scipy.interpolate import Rbf
Radial Basis Functions
341
Radial Basis Functions
The function is approximated by a sum of radial functions centered at the data points,
with weights n_j chosen so the sum matches the data:
s(x) = \sum_j n_j \phi(\lVert x - x_j \rVert)
Typical radial functions are listed below.
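For reference (reproduced from memory of the standard Rbf conventions; r is the distance
to a data point and epsilon is the parameter described above):
multiquadric:          \phi(r) = \sqrt{(r/\epsilon)^2 + 1}
inverse multiquadric:  \phi(r) = 1 / \sqrt{(r/\epsilon)^2 + 1}
gaussian:              \phi(r) = \exp(-(r/\epsilon)^2)
linear:                \phi(r) = r
cubic:                 \phi(r) = r^3
quintic:               \phi(r) = r^5
thin plate:            \phi(r) = r^2 \ln r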
342
1D RBF Interpolation
# demo/interpolate/spline.py
from scipy.interpolate import Rbf
from pylab import plot, axis, legend
from numpy import linspace, pi, sin
# sample values
x = linspace(0,2*pi,6)
y = sin(x)
# Create a new function
fit = Rbf(x,y, function='gaussian')
xx = linspace(0,2*pi, 50)
yy = fit(xx)
# display the results.
plot(xx, sin(xx), 'r--', x, y, 'ro', xx, yy, 'b-', linewidth=2)
axis('tight')
legend(['actual sin', 'original samples', 'interpolated curve'])
343
2D RBF Interpolation
EXAMPLE
>>> from scipy.interpolate import \
... Rbf
>>> from numpy import hypot, mgrid
>>> from scipy.special import j0
>>> x, y = mgrid[-5:6,-5:6]
>>> z = j0(hypot(x,y))
>>> newfunc = Rbf(x, y, z)
>>> xx, yy = mgrid[-5:5:100j,
-5:5:100j]
# xx and yy are both 2-d
# result is evaluated
# element-by-element
>>> zz = newfunc(xx, yy)
>>> from mayavi import mlab
>>> mlab.surf(x, y, z*5)
>>> mlab.figure()
>>> mlab.surf(xx, yy, zz*5)
>>> mlab.points3d(x,y,z*5,
scale_factor=0.5)
344
Optimization
Univariate Function Minimization
minimize_scalar - minimize a scalar-valued function of a single variable:
available methods include brent, bounded, golden
Multivariate Function Minimization (constrained and unconstrained)
minimize minimize scalar-valued function of one or more variables: available
methods include BFGS, Nelder-Mead simplex, Newton conjugate gradient, COBYLA,
SLSQP, anneal, brute
Least Squares Fitting
leastsq - wrapper to the Fortran leastsq function from MINPACK
Curve Fitting
curve_fit powerful, general, tool to fit functional parameters.
Root Finding
brentq, brenth, ridder, bisect, newton find root of a scalar function
fixed_point find the fixed point of a function
root multidimensional root-finding: methods include hybr and lm for small problems
and krylov, broyden2, and anderson for large problems.
scipy.optimize Minimization and Root Finding
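As a small sketch of the scalar root finders listed above (the bracket [2, 3] is chosen
because the Bessel function J0 changes sign there):
>>> from scipy.optimize import brentq
>>> from scipy.special import j0
# First zero of J0, approximately 2.4048.
>>> brentq(j0, 2, 3)
2.4048255576...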
345
Optimization: 1-D Minimization
# minimize 1st order Bessel
# function between 4 and 7
>>> from scipy.special import j1
>>> from scipy.optimize import \
... minimize_scalar
>>> x = np.linspace(2, 7, 200)
>>> j1x = j1(x)
>>> plot(x, j1x)
>>> result = minimize_scalar(j1,
... method="bounded",
... bounds=[4, 7])
>>> j1_min = j1(result.x)
>>> plot(result.x, j1_min,'ro')
EXAMPLE: MINIMIZE BESSEL FUNCTION
Result Object
346
Output of minimize, minimize_scalar, and root is a Result object
with attributes:
x : solution to the optimization
success : True/False did the optimization succeed?
status : integer termination status
message : termination status message
fun, jac, hess : values of the function, Jacobian,
Hessian, (if available) at the
optimized value
nfev, njev, nhev : number of evaluations of the
function, Jacobian, or Hessian (if available)
... : other attributes are added by
specific methods
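A minimal sketch of inspecting such a result object, reusing the bounded Bessel
minimization from the previous page (the values in the comment are approximate):
>>> result = minimize_scalar(j1, method="bounded", bounds=[4, 7])
>>> result.success
True
# Location and value of the minimum: roughly 5.33 and -0.346.
>>> print result.x, result.fun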
Note: older legacy functions (fmin, fmin_bfgs, fminbound, fsolve, etc.) do not have a
unified API and are replaced by the above.
347
Optimization: Using Derivatives
Rosenbrock function
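For reference, the standard N-dimensional Rosenbrock test function used below is
f(x) = \sum_{i=1}^{N-1} [ 100 (x_{i+1} - x_i^2)^2 + (1 - x_i)^2 ],
whose global minimum is at x_i = 1 for every i -- which is why both runs converge
to a vector of ones.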
>>> from scipy.optimize import rosen_der
>>> result = minimize(rosen, x0,
... jac=rosen_der)
>>> print result.x
[ 1. 1. 1. 1. 1.]
>>> print result.nfev, result.njev
66 66
minimize: USING DERIVATIVE
minimize: WITHOUT DERIVATIVE
>>> from scipy.optimize import rosen, minimize
>>> x0 = [1.3, 1.6, -0.5, -1.8, 0.8]
>>> result = minimize(rosen, x0)
>>> print result.x
[ 1. 1. 1. 1. 1.]
>>> print result.nfev
462
348
Optimization: Solving Nonlinear Equations
>>> def equations(x,a,b,c):
... x0, x1, x2 = x
... eqs = \
... [3 * x0 - cos(x1*x2) + a,
... x0**2 - 81*(x1+0.1)**2 + sin(x2) + b,
... exp(-x0*x1) + 20*x2 + c]
... return eqs
>>> from scipy.optimize import root
# coefficients
>>> a = -0.5 ; b = 1.06
>>> c = (10 * pi - 3.0) / 3
# Optimization start location.
>>> initial_guess = [0.1, 0.1, -0.1]
# Solve the system of non-linear equations.
>>> result = root(equations, initial_guess, args=(a, b, c))
>>> print "root:", result.x
root: [ 0.5 0. -0.52]
>>> print "solution at root:", result.fun
solution at root: [ 0. -0. 0.]
SYSTEM OF EQUATIONS ROOT
349
Optimization: Data Fitting
>>> from scipy.optimize import curve_fit
>>> from scipy.stats import norm
# Define the function to fit.
>>> def function(x, a, b, f, phi):
... result = a * exp(-b * sin(f * x + phi))
... return result
# Create a noisy data set.
>>> actual_params = [3, 2, 1, pi/4]
>>> x = linspace(0,2*pi,25)
>>> exact = function(x, *actual_params)
>>> noisy = exact + 0.3*norm.rvs(size=len(x))
# Use curve_fit to estimate the function parameters from the noisy data.
>>> initial_guess = [1,1,1,1]
>>> estimated_params, err_est = curve_fit(function, x, noisy, p0=initial_guess)
>>> estimated_params
array([3.1705, 1.9501, 1.0206, 0.7034])
# err_est is an estimate of the covariance matrix of the estimates
# (i.e. how good the fit is)
>>> err_est.diagonal()
array([ 0.0572, 0.006362, 0.0005834, 0.008979])
NONLINEAR LEAST SQUARES CURVE FITTING
350
Fitting Polynomials (NumPy)
>>> from numpy import polyfit, poly1d
>>> from scipy.stats import norm
# Create clean data.
>>> x = linspace(0, 4.0, 100)
>>> y = 1.5 * exp(-0.2 * x) + 0.3
# Add a bit of noise.
>>> noise = 0.1 * norm.rvs(size=100)
>>> noisy_y = y + noise
# Fit noisy data with a linear model.
>>> linear_coef = polyfit(x, noisy_y, 1)
>>> linear_poly = poly1d(linear_coef)
>>> linear_y = linear_poly(x)
# Fit noisy data with a quadratic model.
>>> quad_coef = polyfit(x, noisy_y, 2)
>>> quad_poly = poly1d(quad_coef)
>>> quad_y = quad_poly(x)
POLYFIT(X, Y, DEGREE)
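A rough way to compare the two fits is the sum of squared residuals (a sketch reusing
the arrays defined above):
>>> linear_sse = ((noisy_y - linear_y)**2).sum()
>>> quad_sse = ((noisy_y - quad_y)**2).sum()
# The quadratic model can follow the curvature of the exponential decay,
# so quad_sse is typically the smaller of the two.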
351
Data Analysis with
Pandas
352
Pandas (version 0.12.0, Jul 2013) is a library to make analysis of structured datasets easy.
Author: Wes McKinney, http://pandas.pydata.org/
License: BSD
New array-based data-structures with axis labeling + nice representation in ipython,
Date/time management built-in, including timezones,
Data alignment, data merging, data aggregation (group by), missing data management,
Statistical tools (describe, moving stats, covariance, correlation, panel regression),
Easy visualization (line plot, bar chart, boxplot, scatter plot matrix, ...) with Matplotlib.
Built on top of: NumPy, pytables, matplotlib.
Replaces scikits.timeseries. Is now an optional dependency of statsmodels.
Offers similar features to larry, datarray (labeled arrays).
RELATED PROJECTS
GOALS
New Data Structures
353
PANDAs = PANel DAta = multi-dimensional data in stats & econometrics.
Data-structures:
A Series is a 1D data-structure that derives from np.ndarray.
A DataFrame is a 2D data-structure that can be viewed as a dictionary of series. It is NOT
a subclass of a numpy array.
A Panel is a 3D data-structure that can be viewed as a dict of DataFrames.
[Figure: the three pandas data structures illustrated with a small sample roster --
a Series (1D array with an index), a DataFrame (2D, with an index and columns such
as Rank, First Name, Last Name, DoB, Gender, Grade, and offset), and a Panel
(3D, a dict of DataFrames keyed by items such as UT and TAMU).]
354
Creating (Time)Series
FROM LIST AND DICT
# Data and corresponding indices can
# be stored in lists.
>>> index = ['a', 'b', 'c', 'd']
>>> Series(range(4), index=index,
name='first series')
a 0
b 1
c 2
d 3
Name: first series
# data + indices in a dict
>>> d = {'a':0,'b':1,'c':2,'d':1}
>>> s = Series(d, name='first series')
>>> print s.name
'first series'
>>> s.name = ''
>>> print s.index
Index([a, b, c, d], dtype=object)
>>> print s.values, type(s.values)
array([0, 1, 2, 1], dtype=int64)
numpy.ndarray
>>> s.dtype
dtype('int64')
>>> from numpy.random import randn
>>> Series(randn(4), index=index)
a -1.062984
b -0.961625
c -0.720323
d 1.100681
FROM A NUMPY ARRAY
ACCESS ELEMENTS
# Access elements like an array
>>> s[2]
2
>>> s[:2]
a 0
b 1
# Access elements like a dictionary
>>> print s['c'], 'c' in s
2 True
# Slicing works on indices!
>>> s['a':'c']
a 0
b 1
c 2
355
Creating DataFrames
FROM A DICT OF SERIES
# DF from a dict of series: keys are
# column names.
>>> s2=Series(randn(3), name='B',
index=index[1:])
>>> d = {'A':s, 'B':s2}
>>> df = DataFrame(d)
A B
a 0 NaN
b 1 -0.961625
c 2 -0.720323
d 1 1.100681
>>> df.index, df.columns
Index([a, b, c, d], dtype=object)
Index([A, B], dtype=object)
>>> df.shape, df.dtypes
(4,2)
A int64
B float64
>>> df.values
array([[ 0. , nan],
[ 1. , -0.96162549],
[ 2. , -0.72032263],
[ 1. , 1.100680533]])
>>> DataFrame(randn(4,4), index=index,
columns=['A','B','C','D'])
A B C D
a 0.28164 -0.36826 0.04011 1.25030
b -0.71049 -1.23956 -0.08504 -0.08336
c -1.29446 0.70709 1.39642 0.49035
d 0.74632 -0.03512 -0.69237 0.81488
# A list of series would be interpreted as rows; concat joins them as columns
>>> df2 = concat([s,s2], axis=1,
keys=['A', 'B'])
FROM A NUMPY ARRAY
ACCESS OR ADD ELEMENTS
>>> col1 = df['A']
>>> col2 = df.B
>>> df['flag'] = df['B'] > 0
>>> df.ix['c']
A 2
B -0.7203226
flag False
Name: c
>>> df.ix['c','B']
-0.7203226322119064
356
Creating Panels
FROM A DICT OF DATAFRAMES
# Panel from a dict of dataframes:
# keys used for third 'items' axis.
>>> df2=df.ix[::2,:]
>>> data = {'item1': df,'item2': df2}
>>> wp = Panel(data)
>>> wp.items, wp.major_axis,
wp.minor_axis
Index([item1, item2], dtype=object)
Index([a, b, c, d], dtype=object)
Index([A, B, flag], dtype=object)
>>> wp.shape
(2, 4, 3)
>>> wp.values
array([[[0, nan, False],
[1, -0.96162549, False],
[2, -0.72032263, False],
[1, 1.100680533, True]],
[[0.0, nan, False],
[nan, nan, nan],
[2.0, -0.72032263, False],
[nan, nan, nan]]],
dtype=object)
>>> items = ['foo','bar','baz','biz']
>>> Panel(randn(4,4,4), items=items)
<class 'pandas.core.panel.Panel'>
Dimensions: 4 (items) x 4 (major) x 4
(minor)
Items: foo to biz
Major axis: 0 to 3
Minor axis: 0 to 3
FROM A NUMPY ARRAY
ACCESS OR ADD ELEMENTS
>>> df1 = wp['item1']
A B flag
a 0 NaN False
b 1 -0.9616255 False
c 2 -0.7203226 False
d 1 1.100681 True
>>> df2 = wp.item2
>>> del wp['item1']
>>> wp.ix['item2','c','A']
2.0
357
Pandas IO
Format Method, Function, Class
txt, csv read_table, read_csv
clipboard read_clipboard
pickle read_pickle
HDF5 read_hdf, HDFStore
Excel read_excel, ExcelFile
R pandas.rpy.common.load_data
READING FUNCTIONS EXAMPLES
# Pickle is supported
>>> df.to_pickle('foo.pickle')
>>> read_pickle('foo.pickle')
# To load a table from an ascii file
df = read_table('foo.txt', sep=',',
header=1, skiprows=2, index_col=0,
names=['Paris', 'NY', 'Tokyo',
'London'], parse_dates=True,
na_values = [-9999])
# Long term storage: HDF5
>>> st = HDFStore('foo.h5')
>>> st['data1'] = df
>>> st['ser1'] = s
>>> s2 = st['ser1']
>>> st.close()
Format Method, Function, Class
txt, csv to_string, to_csv
html to_html
pickle to_pickle
HDF5 to_hdf, HDFStore
Excel to_excel, ExcelWriter
WRITING FUNCTIONS
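A small sketch of round-tripping a DataFrame through csv with the functions in the
tables above (the file name is made up for illustration):
>>> df.to_csv('frame.csv')
>>> df_back = read_csv('frame.csv', index_col=0)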
Indexes and data alignment
358
ALIGNMENT FROM OPERATIONS
# Operations automatically align on
# the index (different from ndarray)
>>> s[1:] + s[:-2]
a NaN
b 1
c 2
d NaN
e NaN
RE-INDEXING
# reindex shuffles the existing index
>>> s.reindex(['c', 'b', 'a', 'e'])
c 2
b 1
a 0
e NaN
# The index attr can be overwritten
>>> s.index = ['p', 'q', 'r', 's']
# ‘rename’ modifies index label
>>> s.rename(lambda a: a.upper())
P 0
Q 1
R 2
S 3
# Sort by values
>>> s.values[:] = [5, 3, 0, 2]
>>> s.order()
r 0
s 2
q 3
p 5
# Explicit alignment of BOTH series
>>> s, s2 = s.align(s2)
# Sort a DF by a (list of) column(s)
>>> df.sort('A')
A B flags
a 0 NaN False
b 1 -0.961625 False
d 1 1.100681 True
c 2 -0.720323 False
More on indexing
359
INDEX TO/FROM A COLUMN
# Any dataframe column can become the
# index
>>> df
A B
a 0 NaN
b 1 -0.961625
c 2 -0.720323
d 3 1.100681
>>> df2 = df.set_index('A')
B
A
0 NaN
1 -0.961625
2 -0.720323
3 1.100681
# The opposite operation converts the
# index to a column
>>> df2.reset_index()
A B
0 0 NaN
1 1 -0.961625
2 2 -0.720323
3 3 1.100681
REDUNDANT INDEXES
# Although dangerous, it is possible
# to have redundant indices
>>> index = ['a', 'b', 'c', 'c']
>>> s=Series(range(4), index=index)
a 0
b 1
c 2
c 3
>>> s['c']
c 2
c 3
>>> type(s['c'])
pandas.core.series.Series
Multi-indexed pandas
360
MULTI-INDEXING SERIES MULTI-INDEXING DATAFRAMES
>>> df = DataFrame(randn(4,2),
index=ind)
0 1
bar one 0.24 0.81
two -0.96 0.49
baz one 1.23 0.29
two -0.97 1.18
>>> df.ix['bar']
0 1
one 0.24 0.81
two -0.96 0.49
>>> df.ix['bar', 'one']
0 0.24
1 0.81
Name: one
# Partial slicing works!
>>> df.ix[('bar','two'):('baz','two')]
0 1
bar two -0.96 0.49
baz one 1.23 0.29
two -0.97 1.18
>>> ind1=['bar', 'bar', 'baz', 'baz']
>>> ind2=['one', 'two', 'one', 'two']
>>> ind = [ind1,ind2]
>>> s = Series(randn(4), index=ind)
bar one 0.03
two 0.80
baz one 0.56
two -2.08
>>> s.index
MultiIndex
[('bar', 'one') ('bar', 'two') ('baz',
'one') ('baz', 'two')]
>>> s['bar']
one 0.03
two 0.80
>>> s.reindex([('baz', 'two'),
('bar', 'one')])
baz two -2.08
bar one 0.03
Multi-indexed pandas II
361
MANIPULATING INDEXES
# By default each index is identified
# by a level 0 (outer), 1 (inner)
>>> s.swaplevel(0,1)
one bar 0.03
two bar 0.80
one baz 0.56
two baz -2.08
# reorder_levels generalizes that
>>> s.reorder_levels([1,0])
(...)
>>> s.index.names=['first','second']
>>> s
first second
bar one 0.03
two 0.80
baz one 0.56
two -2.08
>>> df.index.names=['first','second']
>>> df.xs("one", level="second")
0 1
first
bar 0.24 0.81
baz 1.23 0.29
SLICING
>>> s.sortlevel(level='second')
bar one 0.03
baz one 0.56
bar two 0.80
baz two -2.08
SORTING
Computation with pandas
362
SERIES
# Series are subclasses of ndarrays:
# computations are element by element
>>> i = ['a', 'b', 'c', 'd']
>>> s = Series(range(4), index=i)
>>> s + 1
a 1
b 2
c 3
d 4
# All Numpy operators can be applied
>>> s2 = np.exp(s)
>>> s = s.astype(np.float32)
# including fancy indexing with masks
>>> s[s > 1.5] = 1.5
>>> df
A B flags
a 0 NaN False
b 1 -0.9 False
c 2 -0.7 False
d 1 1.1 True
# Dataframe computations are applied
# columns by columns
>>> df2 = df + 1
# Adding a series or re-scaling
>>> row = df.ix[1]
>>> df - row
A B flag
a -1 NaN False
b 0 0 False
c 1 0.2 False
d 0 2.0 True
# This would have been equivalent
>>> rescaled = df.sub(row, axis=0)
# Adding another dataframe can be done
# with a filler.
>>> df1.add(df2, fill_value=0)
# ‘apply’ a custom function to columns
>>> f = lambda x: x.max() - x.min()
>>> df.apply(f, axis = 0)
A 2
B 2
flag True
DATAFRAMES
Statistical Analysis
363
DESCRIPTIVE STATS
# Descriptive stats available:
# count, sum, mean, median, min, max,
# abs, prod, std, var, skew, kurt,
# quantile, cumsum, cumprod, cummax
# Stats on DF are column per column
>>> print df.mean(), df.mean(axis=1)
A       1.000000
B      -0.227542
flag    0.250000
a   -0.354328
b    0.500000
c    0.426559
d    1.033560
# Stats func accept a level as kw arg
# applicable for multi-index pandas
>>> df.describe()
A B
count 4.000000 3.000000
mean 1.000000 -0.227542
std 0.816497 1.162964
min 0.000000 -1.062984
25% 0.750000 -0.891653
50% 1.000000 -0.720323
75% 1.250000 0.190179
max 2.000000 1.100681
>>> pandas.rolling_<TAB>
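For example, a 2-period moving average of the series s defined earlier (a sketch using
the pandas 0.12-era rolling_* functions, which later pandas versions replace with the
.rolling() method):
>>> from pandas import rolling_mean
# NaN for the first element, then pairwise means.
>>> rolling_mean(s, 2)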
CORRELATIONS & REGRESSION
>>> cov = df.cov()
>>> ts.corr(ts2, method='kendall')
0.066666666666666693
>>> ols(y = ts2, x = ts)
Formula: Y ~ <x> + <intercept>
Number of Observations: 20
Number of Degrees of Freedom: 2
R-squared: 0.9936
Adj R-squared: 0.9932
Rmse: 0.0582
F-stat(1,18): 2772.7496, p-value: 0.0
Degrees of Freedom: model 1, resid 18
(...)
For more stats, see statsmodels.
Dealing with missing data
364
FIND & REMOVE/REPLACE NaN
# Combining overlapping datasets with
# a priority
>>> df1 = DataFrame({
'A' : [1.,nan,3.,5.,nan],
'B' : [nan,2.,3.,nan,6.]})
>>> df2 = DataFrame({
'A' : [5.,2.,4.,nan,3.,7.],
'B' : [nan,nan,3.,4.,6.,8.]})
>>> df1.combine_first(df2)
A B
0 1 NaN
1 2 2
2 3 3
3 5 4
4 3 6
5 7 8
# More custom merging to be
# implemented using combine and a
# custom function
# A boolean mask with ‘null’ values:
# None, np.NaN, np.inf, -np.inf
>>> np.all(isnull(s) | s.notnull())
True
# Replace missing values manually
>>> s[isnull(s)] = 0.
# Automatic version with forward fill
>>> s.fillna(method='ffill')
# Inverse operation
>>> s[s == -9999] = np.nan
# Remove all entries w/ missing values
>>> df.dropna(how='all')
# Interpolation also removes NaN
>>> s.interpolate()
# If mostly missing data, convert to
# SparseSeries, SparseDataFrame, ...
# Only stores non-null values & loc
>>> print len(s), s.count(), s.nbytes
1000, 10, 8000
MERGING DATASETS
SPARSE DATASETS
>>> sp_s = s.to_sparse()
>>> print sp_s.nbytes
80
Pivot tables and reshaping
365
PIVOTING (UN-)STACKING
# Stacking a DF creates a
# series with multiple indexes made
# from the column names.
>>> s
a b
one 1 2
two 3 4
>>> stacked = s.stack()
one a 1
b 2
two a 3
b 4
# Opposite operation unstacks the
# last level by default. If more cplx
# DF, can specify a level to unstack
>>> stacked.unstack()
a b
one 1 2
two 3 4
>>> stacked.unstack(level=0)
one two
a 1 3
b 2 4
# Repeating columns can be viewed as
# an additional axis
>>> df
date variable value
0 2000-01-03 A 0.469112
1 2000-01-04 A -0.282863
2 2000-01-05 A -1.509059
3 2000-01-03 B -1.135632
4 2000-01-04 B 1.212112
5 2000-01-05 B -0.173215
>>> df.pivot(index='date',
columns='variable', values='value')
variable A B
date
2000-01-03 0.469112 -1.135632
2000-01-04 -0.282863 1.212112
2000-01-05 -1.509059 -0.173215
Pivot tables and reshaping II
366
PIVOT_TABLE
# Another way to reshape a DF and
# aggregate at the same time
>>> df
A B C D
0 foo one small 1
1 foo one large 2
2 foo one large 2
3 foo two small 3
4 foo two small 3
5 bar one large 4
6 bar one small 5
7 bar two small 6
8 bar two large 7
>>> table= df.pivot_table(values='D',
rows=['A', 'B'], cols=['C'],
aggfunc=np.sum)
>>> table
small large
foo one 1 4
two 6 NaN
bar one 5 4
two 6 7
Data aggregation
367
SPLIT WITH groupby
# Group data by one column’s value
>>> gps = df.groupby('flags')
# gps = iterator of tuples with
# group name and sub part of df
>>> gps.groups
{False: ['a', 'b', 'c'], True:
['d']}
# groupby can also take a custom
# function that acts on index labels
>>> df = df.reset_index()
>>> myfunc = lambda x: x%2 == 0
>>> gps2 = df.groupby(myfunc)
# Series can also be grouped by
>>> df['A'].groupby(myfunc)
# which is identical to
>>> gps2['A']
Aggregation in a dataset is done in 3 steps: Split > Apply > Combine
APPLY WITH aggregate()
# Values in groups can be aggregated
>>> gps.sum()
A B
flags
False 3 -1.6
True 1 1.1
# Version more flexible but slower
>>> gps.aggregate(np.sum)
# agg is even more flexible
>>> gps.agg([np.mean, np.std])
>>> gps.agg({'A':'sum', 'B':'std'})
A B
flags
False 3 0.141421
True 1 NaN
# Values in groups can be
# transformed (length conserved)
>>> f = lambda x: x - x.mean()
>>> gps.transform(f)
A B
a -1 NaN
b 0 -0.1
c 1 0.1
d 0 0.0
# Computations from values in
# groups can be turned into a DF
# of calcs
>>> desc = lambda x: x.describe()
Data aggregation II
368
APPLY WITH apply()
>>> gps['A'].apply(desc).unstack()
count mean std min 25% 50% 75% max
flags
False 3 1 1 0 0.5 1 1.5 2
True 1 1 NaN 1 1.0 1 1.0 1
>>> def f(group):
return DataFrame({'original':group,
'demeaned': group - group.mean()})
>>> gps['A'].apply(f)
demeaned original
index
a -1 0
b 0 1
c 1 2
d 0 1
APPLY WITH transform()
Dealing with date & time
369
CREATING DATE/TIME INDEXES UP-/DOWN-SAMPLING
>>> ts.resample('T')
2000-01-01 00:00:00 0
2000-01-01 00:01:00 NaN
2000-01-01 00:02:00 NaN
2000-01-01 00:03:00 1
2000-01-01 00:04:00 NaN
2000-01-01 00:05:00 NaN
2000-01-01 00:06:00 2
2000-01-01 00:07:00 NaN
2000-01-01 00:08:00 NaN
2000-01-01 00:09:00 3
Freq: T
# Group hourly data into daily
>>> ts2 = Series(randn(72), index=r)
>>> ts2.resample('D', how='mean',
closed='left', label='left')
01-Jan-2000 0.397501
02-Jan-2000 0.186568
03-Jan-2000 0.327240
Freq: D
>>> my_avg = lambda x: x[1:-1].mean()
>>> ts3=ts2.resample('D', how=my_avg)
# The index can be a list of
# dates+times locations that can be
# automatically generated
>>> date_range('1/1/2000',periods=4)
<class
'pandas.tseries.index.DatetimeIndex'>
[2000-01-01 00:00:00, ..., 2000-01-04
00:00:00]
Length: 4, Freq: D, Timezone: None
# Specify frequency: us,ms,S,T,H,D,B,
# W,M,3min, 2h20min, 2W,...
>>> r=date_range('1/1/2000',periods=72,
freq='H')
>>> i=date_range('1/1/2000',periods=4,
freq=datetools.YearEnd())
>>> i=date_range('1/1/2000',periods=4,
freq='3min')
>>> ts=Series(range(4), index=i)
2000-01-01 00:00:00 0
2000-01-01 00:03:00 1
2000-01-01 00:06:00 2
2000-01-01 00:09:00 3
Freq: 3T
Dealing with date & time II
370
RANGES OF PERIODS CONVERT TIMESTAMP<->PERIOD
>>> ts = ps.to_timestamp(how='end')
2000-01-31 0
2000-02-29 1
2000-03-31 2
2000-04-30 3
Freq: M
>>> all(ts.to_period() == ps)
True
# The index can be a list of
# dates/times intervals
>>> PeriodIndex(['2011-1', '2011-2',
'2011-3', '2011-4'], freq='M')
<class 'tseries.period.PeriodIndex'>
freq: M
[Jan-2011, ..., Apr-2011]
length: 4
# They can be automatically generated
>>> pr = period_range('1/1/2000',
periods=4, freq='M')
freq: M
[Jan-2000, ..., Apr-2000]
length: 4
>>> ps=Series(range(4), index=pr)
2000-01 0
2000-02 1
2000-03 2
2000-04 3
Freq: M, dtype: int64
TIMEZONES
# Timestamps/pandas can be tz aware
>>> ts_utc = ts.tz_localize('UTC')
>>>
ts_us=ts_utc.tz_convert('US/Eastern')
2012-01-30 19:00:00-05:00 -0.425792
2012-02-28 19:00:00-05:00 0.788903
2012-03-30 20:00:00-04:00 0.009502
2012-04-29 20:00:00-04:00 1.775263
2012-05-30 20:00:00-04:00 -1.656727
Freq: M
>>> ts.index == ts_us.index
True
Visualizing (Time)series
371
>>> ts.plot()
See also
tools.plotting.lag_plot,
tools.plotting.autocorrelation_plot
>>> ts2.plot(secondary_y=True,
style='g')
372
>>> df.boxplot()
See also:
plot with kind='bar', 'barh', or 'kde' keywords
hist method
tools.plotting.scatter_matrix,
andrews_curves, lag_plot
Visualizing DataFrames
>>> plt.figure()
>>> df.plot(subplots=True)
>>> plt.legend(loc='best')
373
Software Craftsmanship
in Python
374
Software Engineering Quotes
Programs should be written for people to read, and
only incidentally for machines to execute.
Structure and Interpretation of Computer Programs
Harold Abelson and Gerald Sussman
375
Software Engineering Quotes
You need to have empathy not just for your users, but
for your readers. It's in your interest, because you'll be
one of them. Many a hacker has written a program
only to find on returning to it six months later that he
has no idea how it works.
Hackers and Painters
Paul Graham
376
Software Engineering Quotes
Debugging is twice as hard as writing the code
in the first place. Therefore, if you write the
code as cleverly as possible, you are, by
definition, not smart enough to debug it.
Brian Kernighan
co-author of “The C Programming Language”
377
Useful References
378
Useful References
379
Software Carpentry
Software Carpentry
http://software-carpentry.org
A Python-based course
on software design and
engineering for
scientists.
380
Coding Modes
Interactive Mode: Quick Iteration
Interactive Prompt and exploratory development.
Production Mode: Building for the Ages
Creating code that will be re-used by you or
others.
The mode (context) you’re in changes
the way you code.
381
Naming Variables
Goal: Clarity for future readers of
the code
382
Typical Scientific Naming Convention
SUBROUTINE CFFTB1 (N,C,CH,WA,IFAC)
DIMENSION CH(*) ,C(*) ,WA(*) ,IFAC(*)
NF = IFAC(2)
NA = 0
L1 = 1
IW = 1
DO 116 K1=1,NF
IP = IFAC(K1+2)
L2 = IP*L1
IDO = N/L2
IDOT = IDO+IDO
IDL1 = IDOT*L1
IF (IP .NE. 4) GO TO 103
IX2 = IW+IDOT
IX3 = IX2+IDOT
IF (NA .NE. 0) GO TO 101
CALL PASSB4 (IDOT,L1,C,CH,WA(IW),WA(IX2),WA(IX3))
GO TO 102
<and on and on for 368 lines...>
STRAIGHT FROM FFTPACK
383
Primary Naming Consideration
POOR NAME CHOICES
# Update Cash Balance after stock trade.
c1 = n * ip
c2 = c1 + compute_tc(ins, n)
b -= c2
Priority 1: A variable name should fully and
accurately describe the entity the variable
represents.
DESCRIPTIVE NAME CHOICES
# Update Cash Balance after stock trade.
instrument_cost = instrument_quantity * instrument_price
trade_cost = instrument_cost + transaction_cost(instrument_name,
instrument_quantity)
cash_balance -= trade_cost
384
Descriptive Name Choices
Entity Represented Variable Name
The number of trading
days in the current month
number_of_trading_days_in_current_month,
NumberOfTradingDaysPerMonth,
trading_days_in_current_month
Length of a moving
average window
length_of_moving_average_window,
lengthOfMovingAverageWindow
Risk Free Interest Rate risk_free_interest_rate
risk_free_rate
RiskFreeRate
These names are descriptive without much room for ambiguity, and
they are trivially decoded.
385
But the Names are Looooong…
Yes they are… Here are examples from the VTK library, a 3D
visualization tool set with 1800+ classes. (www.vtk.org)
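For instance, a few real VTK class names (listed here from memory) give a feel for the
length: vtkWindowedSincPolyDataFilter, vtkUnstructuredGridVolumeRayCastMapper,
vtkStructuredGridGeometryFilter, vtkRenderWindowInteractor.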
386
The Downside to Long Names
# Long names can make simple formulas difficult to read,
# reducing clarity.
historic_daily_percent_price_change_per_day = \
(historic_daily_adjusted_close_price[1:] - \
historic_daily_adjusted_close_price[:-1]) \
/ historic_daily_adjusted_close_price[:-1]
DETAILED NAMES AND OBFUSCATED EQUATIONS
# Variable table:
# adj_close Daily adjusted closing price for an instrument.
# change_percentage Daily percent change in adjusted closing price.
change_percentage = (adj_close[1:] - adj_close[:-1]) / adj_close[:-1]
SHORTER NAMES IMPROVE EQUATION READABILITY
387
Optimal Name Lengths
Studies on debugging COBOL indicate*
10-16 character names are the optimal length.
8-20 characters are almost as good.
* Gorla, N., A. C. Benander, and B. A. Benander. 1990. “Debugging Effort Estimation Using Software
Metrics.” IEEE Transactions on Software Engineering SE-16, no. 2 (February): 223-231
TOO LONG:   risk_free_interest_rate;
            length_of_moving_average_window
TOO SHORT:  r, rfir, rate;  N, nw
JUST RIGHT: risk_free_rate, rsk_fr_rate;
            win_len, window_length, Nwindow, mov_avg_win_len
388
Strategies for Shortening Names
Use Standard Abbreviations
administrator_information -> admin_info
Truncate consistently after 2nd or 3rd letter of each word
display_window -> dis_win
Remove Non-leading Vowels
risk_free_interest_rate -> rsk_fr_intrst_rt
Use only significant words in the variable name
start_of_next_paragraph -> next_paragraph_start
Remove useless suffixes
checking_account -> check_account
Iterate until variables are between 8 and 20 characters.
389
Using
extremely
short names
# This is ok.
for i in range(10):
scores[i] = 0
# But this is better.
for event in range(10):
decathlon_scores[event] = 0
LOOP INDICES I, J, and K
390
Using
extremely
short names
# Quick, what does each variable stand for?
y = a * sin(w*t + phi)
INDUSTRY STANDARD VARIABLES IN “SMALL” CONTEXT
391
Using
extremely
short names
def sin_wave(t, a=1, w=2*pi, phi=0):
"""
Return a sin wave form for time t.
Inputs
------
t: time in seconds
a: amplitude scale factor
w: frequency in radians/second
phi: phase shift in radians
Returns
-------
y: sin wave output
"""
y = a * sin(w*t + phi)
return y
INDUSTRY STANDARD VARIABLES IN “SMALL” CONTEXT
392
Domain vs. Computing Names
Variable names should relate to the problem space, not
generic programming terminology.
# Not so good… Algorithm-centric names
sum = 0
for record in record_list:
sum += will_retire(record.name)
# Better. Domain specific names.
gold_watch_order = 0
for employee in employee_list:
gold_watch_order += will_retire(employee.name)
393
Qualified Names
Use the “most important” part of the name as
the first word for “grouped” variables.
Usually, this is a noun.
MOST IMPORTANT WORD FIRST
# The _avg suffix on the price
# variable indicates it is an
# average price.
price_avg = mean(prices)
price_std = std(prices)
volume_avg = mean(volumes)
volume_std = std(volumes)
# But... Placing avg as a prefix
# read more like english
# (adjective first). So,
# arguments can be made for this
# approach as well.
avg_price = mean(prices)
avg_volume = mean(volumes)
std_price = std(prices)
std_volume = std(volumes)
ADJECTIVE FIRST AS IN ENGLISH
394
Qualified Names
Both prefixes and suffixes are found in “real” code, and
are quite readable.
Stick with one approach for a given variable group.
EXAMPLE WITH MIXED PREFIX AND SUFFIX VARIABLES
# Find the total profit made from group of trades.
profit_total = 0.0
current_trade_index = 0
while current_trade_index < last_trade_index:
current_trade = trades[current_trade_index]
profit, open_positions = update_positions(current_trade,
open_positions)
profit_total += profit
current_trade_index += 1
395
Global “Constants”
ENUMERATED TYPE PREFIX
# Type prefixes (like JOIN) often
# indicate a group of enumerated
# constants.
>>> constants.JOIN_BEVEL
1
>>> constants.JOIN_MITER
2
# wxPython is a GUI library
# wrapped around the C++
# wxWidgets library.
>>> import wx
# In C++, the constant is
# wxALIGN_CENTER
>>> wx.ALIGN_CENTER
2304
CONSTANTS IN MODULES GLOBALS
# "Constant" Global variables
# are often capitalized.
>>> from enthought.kiva import \
... constants
>>> constants.ITALIC
2
# But not always.
# [This is a global type.]
>>> import numpy
>>> numpy.float32
<type 'numpy.float32'>
The prefix used to differentiate constants
in other languages is often removed, and the
constants are grouped in a single module in Python.
Note that these global values
are not truly constants as they
are writable variables.
396
Boolean Variables
# These common names are more
# informative.
done = True
found = True
success = True
valid = True
error = True
ok = True
# Using the prefix is_ can
# help identify booleans.
is_done = True
GENERIC NAMES
STANDARD NAMES
# Try not to use generic names
# as they don’t provide much
# information
flag = True
status = True
# Positive values are easy to
# interpret
if found:
employee_of_month = employee
# Negative Booleans take more
# mental gymnastics to read
if not not_found:
employee_of_month = employee
POSITIVE VALUES
397
Naming Containers
When naming lists (or sequences) of elements, use the plural
of the word, or a suffix that indicates it is a container.
# Using the _list suffix is descriptive and is more
# easily differentiated from a single item.
for name in name_list:
print name
VARIABLE SUFFIX FOR CONTAINERS
# Using the plural version of the variable is also a common paradigm.
for name in names:
print name
# For long variable names, the singular and plural approach isn't
# as good. It is too hard to differentiate between the two.
close_price = {}
for equity_symbol in equity_symbols:
close_price[equity_symbol] = lookup_daily_close(equity_symbol)
# And this is obviously problematic...
for moose in moose:
moose.bellow()
PLURAL VARIABLE NAME FOR CONTAINERS
398
Naming Variables Review
Priority 1: A variable's name should fully and
accurately describe the entity the variable
represents.
Names should refer to the real-world problem
rather than to computer programming terms.
A name should be long enough so that you
don’t have to puzzle out its meaning.
Short names are OK for industry standard
names, indices, and in “small” contexts.
Computed-value qualifiers should be at the end
of the name.
399
Documenting Code
400
Documentation Layers
API Documentation
Doc-strings for functions,
classes, and modules
Developer Documentation
Architecture Description,
Concepts, Developer Tutorials
Inline Code Comments
Describe intent of code
or algorithm design
Self Documenting Code
Variable and Function Names,
Good Design, Clear Layout
Application Documentation
End User Manuals,
Domain specific Tutorials, etc.
# Snippet out of a draw() method for drawing
# “bulls eye” cross hairs at the mouse position
# on a plot.
plot = self.component
if plot is None:
return
# sx, sy: mouse position in screen space.
sx, sy = plot.map_screen(self.current_position)
if plot.orientation == "h" and sx is not None:
self._draw_vertical_line(gc, sx)
elif sy is not None:
self._draw_horizontal_line(gc, sy)
if self.show_marker:
self._draw_marker(gc, sx, sy)
SELF DOCUMENTING CODE
401
Documentation Layers
API Documentation
Doc-strings for functions,
classes, and modules
Developer Documentation
Architecture Description,
Concepts, Developer Tutorials
Inline Code Comments
Describe intent of code
or algorithm design
Self Documenting Code
Variable and Function Names,
Good Design, Clear Layout
Application Documentation
End User Manuals,
Domain specific Tutorials, etc.
# Code snippet out of an Enthought Envisage
# library written by Martin Chilvers.
def unregister_services(self):
""" Unregister any service offered by the
plugin.
"""
# Services must be unregistered in reverse
# order that they were registered in.
service_ids = self._service_ids[:]
service_ids.reverse()
app = self.application
for service_id in service_ids:
app.unregister_service(service_id)
# Reset services in case the plugin is
# started again.
self._service_ids = []
INLINE CODE COMMENTS
402
Documentation Layers
API Documentation
Doc-strings for functions,
classes, and modules
Developer Documentation
Architecture Description,
Concepts, Developer Tutorials
Inline Code Comments
Describe intent of code
or algorithm design
Self Documenting Code
Variable and Function Names,
Good Design, Clear Layout
Application Documentation
End User Manuals,
Domain specific Tutorials, etc.
def interp(x, xp, fp, left=None, right=None):
"""
One-dimensional linear interpolation.
Returns the one-dimensional piecewise
linear interpolant to a function with
given values at discrete data-points.
Parameters
----------
x : array_like
The x-coordinates of the
interpolated values.
xp : 1-D sequence of floats
The x-coordinates of the data
points, must be increasing.
<snip>
Returns
-------
y : {float, ndarray}
Interpolated values, same shape as `x`.
<snip>
"""
API DOCUMENTATION [FROM NUMPY]
403
Documentation Layers
API Documentation
Doc-strings for functions,
classes, and modules
Developer Documentation
Architecture Description,
Concepts, Developer Tutorials
Inline Code Comments
Describe intent of code
or algorithm design
Self Documenting Code
Variable and Function Names,
Good Design, Clear Layout
Application Documentation
End User Manuals,
Domain specific Tutorials, etc.
DEVELOPER DOCUMENTATION
Provides a high level explanation of
concepts and use of a library.
404
Documentation Layers
API Documentation
Doc-strings for functions,
classes, and modules
Developer Documentation
Architecture Description,
Concepts, Developer Tutorials
Inline Code Comments
Describe intent of code
or algorithm design
Self Documenting Code
Variable and Function Names,
Good Design, Clear Layout
Application Documentation
End User Manuals,
Domain specific Tutorials, etc.
APPLICATION DOCUMENTATION
405
Documentation Layers
API Documentation
Doc-strings for functions,
classes, and modules
Developer Documentation
Architecture Description,
Concepts, Developer Tutorials
Inline Code Comments
Describe intent of code
or algorithm design
Self Documenting Code
Variable and Function Names,
Good Design, Clear Layout
Application Documentation
End User Manuals,
Domain specific Tutorials, etc.
OUR FOCUS
Writing useful inline code comments.
Creating function, class, and
module level documentation.
406
Bad Code Comments
This comment is not even accurate. It
likely got out of sync with the code when
the bank implemented a minimum balance
policy and the comment wasn’t updated.
# Flag withdrawals that cause
# customer balance to become
# negative.
new_balance = balance - withdrawal
if new_balance < min_allowed_balance:
success = False
REPEATING THE CODE INCORRECT COMMENTS
Comments that just parrot code are
pretty much useless.
# Check if the printer is ready.
if printer_status == 'ready':
document.print()
407
Better Code Comments
Summarizing a few lines with a description
of the code’s intent is useful.
# Solve the dense linear system
# ZI=V for the currents I, given
# the impedance matrix Z and the
# driving voltage V.
lu_matrix, pivot = lu_factor(Z)
I = lu_solve((lu_matrix, pivot), V)
SUMMARY COMMENTS
Flag design decisions and trade-offs that
others should be aware of when editing
code in the future.
# FIXME: Sales tax hard coded to
# 8.25%. This should be passed in
# or looked up with a function
# call.
price_total = price * (1.0825)
FIXME COMMENTS
408
Function Docstring Format
""" Short docstring for the module goes here.
A longer description for the module goes here. It
is typically multiple lines long.
"""
class Foo(object):
""" Short docstring for Foo class.
Longer docstring describing the Foo class.
"""
def some_method(self):
""" Short description for some_method.
Longer description for some_method...
"""
pass
def bar():
""" Short docstring for bar function.
And, not surprisingly, the long description for the function.
"""
pass
TYPICAL DOCUMENTATION FORMAT FOO.PY
See the foo.py example in
the demo/docstrings
directory.
409
Command line help
In [12]: import foo
In [13]: foo?
Type: module
Base Class: <type 'module'>
String Form: <module 'foo' from 'foo.py'>
Namespace: Interactive
File: c:\temp\foo.py
Docstring:
Short docstring for the module goes here.
A longer description for the module goes here. It
is typically multiple lines long.
IPYTHON HELP PRINTS DOCSTRINGS
411
Styles of Documentation
Use simple docstrings for simple functions.
# from the Python standard library: locale.py
def str(val):
"""Convert float to integer, taking the locale into account."""
return format("%.12g", val)
SIMPLE IS AS SIMPLE DOES
412
Styles of Documentation
FORMAL DOCUMENTATION SPECIFICATION
def interp(x, xp, fp, left=None, right=None):
"""
One-dimensional linear interpolation.
Returns the one-dimensional piecewise linear
interpolant to a function with given values
at discrete data-points.
Notes
=====
Does not check that the x-coordinate sequence
C{xp} is increasing. If C{xp} is not increasing,
the results are nonsense. A simple check for
increasingness is::
np.all(np.diff(xp) > 0)
Examples
========
>>> xp = [1, 2, 3]
>>> fp = [3, 2, 0]
>>> np.interp(2.5, xp, fp)
1.0
>>> np.interp([0, 1, 1.5, 2.72, 3.14], xp, fp)
array([ 3. , 3. , 2.5, 0.56, 0. ])
>>> UNDEF = -99.0
>>> np.interp(3.14, xp, fp, right=UNDEF)
-99.0
@param x : array_like.
The x-coordinates of the interpolated values.
@param xp : 1-D sequence of floats.
The x-coordinates of the data points, must be
increasing.
@param fp : 1-D sequence of floats.
The y-coordinates of the data points, same length
as C{xp}.
@param left : float, optional.
Value to return for C{x < xp[0]}, default is
C{fp[0]}.
@param right : float, optional.
Value to return for C{x > xp[-1]}, defaults is
C{fp[-1]}.
@return: float or ndarray.
The interpolated values, same shape as C{x}.
@raise ValueError: if C{xp} and C{fp} have different
lengths
"""
413
Formal Specifications
BENEFITS
Formalism helps developers know
what to document.
Uniform look to documentation.
Tools like Sphinx and Epydoc can
generate nice HTML and PDF docs
automatically.
DRAWBACKS
Can add (significant) overhead for
creating new functions.
Overhead can inhibit developers
from re-factoring and creating new
functions when they should.
417
Restructured Text Docstring Format
http://epydoc.sourceforge.net/othermarkup.html#restructuredtext
http://docutils.sourceforge.net/docs/user/rst/quickstart.html
__docformat__ = "restructuredtext"
def interp(x, xp, fp, left=None, right=None):
"""
One-dimensional linear interpolation.
Returns the one-dimensional piecewise linear
interpolant to a function with given values at
discrete data-points.
:Parameters:
x : array_like.
The x-coordinates of the interpolated
values.
xp : 1-D sequence of floats.
The x-coordinates of the data points, must
be increasing.
fp : 1-D sequence of floats.
The y-coordinates of the data points, same
length as C{xp}.
left : float, optional.
Value to return for C{x < xp[0]}, default
is C{fp[0]}.
right : float, optional.
Value to return for C{x > xp[-1]}, defaults
is C{fp[-1]}.
:return: float or ndarray.
The interpolated values, same shape as C{x}.
:raises ValueError: if C{xp} and C{fp} have different
lengths
Notes
=====
Does not check that the x-coordinate sequence C{xp} is
increasing. If C{xp} is not increasing, the results
are nonsense. A simple check for increasingness is::
np.all(np.diff(xp) > 0)
Examples
========
>>> xp = [1, 2, 3]
>>> fp = [3, 2, 0]
>>> np.interp(2.5, xp, fp)
1.0
>>> np.interp([0, 1, 1.5, 2.72, 3.14], xp, fp)
array([ 3. , 3. , 2.5, 0.56, 0. ])
>>> UNDEF = -99.0
>>> np.interp(3.14, xp, fp, right=UNDEF)
-99.0
"""
418
Tips for Restructured Text
EXAMPLE CODE
Precede by '::' and a blank line.
""" Decorator to turn a code-
block inside of a function
into a string::
@func2str
def code():
c = a + b
d = a - b
"""
Result:
Decorator to turn a code-block inside of a
function into a string.
@func2str
def code():
c = a + b
d = a - b
PARAMETER DESCRIPTIONS
Parameter lists are “definition lists”, so every
item must have a separate description line.
x_ctrl : float
X-value of the control point.
y_ctrl : float
Y-value of the control point.
Not this ("unexpected indentation" error):
x_ctrl : float
y_ctrl : float
The control point.
419
Tips for Restructured Text
This example looks nice in the source file,
but it is mixing syntax for bullet lists and
definition lists. It gets an indentation error.
- object: The value is treated
as a normal Python
object and is not
modified.
- file: The value is a file,
and supports various
types of conversions.
This can be made into valid ReST as a
bullet list or as a definition list.
- object: The value is treated
as a normal Python object and
is not modified.
- file: The value is a file,
and supports various types of
conversions.
MIXING LIST SYNTAX BULLET LISTS
object
The value is treated as a
normal Python object and is
not modified.
file
The value is a file, and
supports various types of
conversions.
OBJECT LISTS
420
Result with Sphinx
Numbers for several Python projects using cloc: http://cloc.sourceforge.net/
Result: SciPy: 32%, NumPy: 39%, ETS: 38%
421
Line Count Metrics for Documentation
$ ./cloc-1.56.pl numpy_trunk/
1023 text files.
1006 unique files.
385 files ignored.
http://cloc.sourceforge.net v 1.56 T=7.0 s (96.4 files/s, 53083.3 lines/s)
--------------------------------------------------------------------------------
Language files blank comment code
--------------------------------------------------------------------------------
C 81 22565 41042 120573
Python 424 24915 54760 83183
C/C++ Header 93 3064 3082 10007
Cython 6 940 2422 1523
C++ 8 52 116 643
make 9 100 74 457
HTML 5 47 15 403
CSS 3 134 46 369
Fortran 77 22 11 65 298
Fortran 90 15 56 20 200
sed 1 0 12 140
Bourne Shell 7 22 23 92
Bourne Again Shell 1 13 12 87
--------------------------------------------------------------------------------
SUM: 675 51919 101689 217975
--------------------------------------------------------------------------------
~39 % Comments
PORTION OF CODE THAT SHOULD BE COMMENTS
422
Coding Standards
423
Foolish Consistencies…
A foolish consistency is the hobgoblin of little minds, adored by little
statesmen and philosophers and divines. With consistency a great soul has
simply nothing to do. He may as well concern himself with his shadow on the
wall. Out upon your guarded lips! Sew them up with packthread, do. Else if you
would be a man speak what you think to-day in words as hard as cannon balls,
and to-morrow speak what to-morrow thinks in hard words again, though it
contradict every thing you said to-day. Ah, then, exclaim the aged ladies, you
shall be sure to be misunderstood! Misunderstood! It is a right fool’s word. Is it
so bad then to be misunderstood? Pythagoras was misunderstood, and Socrates,
and Jesus, and Luther, and Copernicus, and Galileo, and Newton, and every
pure and wise spirit that ever took flesh. To be great is to be misunderstood.
From the essay “Self Reliance,” by Ralph Waldo Emerson
Quoted in "PEP 8: Style Guide for Python Code,"
by Guido van Rossum and Barry Warsaw
424
Foolish Consistencies…
True. But this statement is too often
used to get rid of good consistencies.
Show your creativity and personal style in your
algorithms, not your code layout. Code consistencies
make sharing and debugging much more efficient.
425
Foolish Consistencies…
Ralph doesn't differentiate between good and foolish
consistencies, but Guido does. From the Python Coding
Standard*:
[There are] two good reasons to break a particular [coding standard]
rule:
(1) When applying the rule would make the code less readable, even for
someone who is used to reading code that follows the rules.
[editorial addition: Think about others, not you.]
(2) To be consistent with surrounding code that also breaks it (maybe
for historic reasons) -- although this is also an opportunity to clean
up someone else's mess (in true XP style).
* http://www.python.org/dev/peps/pep-0008/
426
Python Coding Standard
The Python Coding Standard is defined in
Python Enhancement Proposal 8* (PEP-8).
* http://www.python.org/dev/peps/pep-0008/
427
Indentation and Spaces
Never mix tabs and spaces in a file.
The "Tab Nanny" tool shipped with Python can
check this.
# mixed.py
# This file contains tabs and spaces
def foo(a):
print 'tabs'
print 'spaces'
C:>c:\python25\lib\tabnanny.py mixed.py
mixed.py 5 " print 'spaces'\n"
INDENTATION TABS OR SPACES?
Always use 4 spaces to indent code
blocks.
def sum(a):
total = 0
for value in a:
total += value
return total
4 spaces each
Most editors (emacs, vim, etc.)
have a python mode that will
insert 4 spaces when you type
<tab>.
428
Line Lengths and Wrapping
80 CHARACTER LINES
Limit line lengths to 80 characters.
Emacs and many other editors wrap at 80 characters by default.
# Use '\' at the end of lines to continue on the next line.
from numpy import array, sin, cos, allclose, zeros, ones, \
uint32, float32
# Code enclosed in () or [] or {} does not need the '\'.
if (instrument_volume > VOLUME_THRESHOLD and
instrument_price < PRICE_THRESHOLD):
<do something>
# Align function arguments with beginning of function args.
a, b = some_long_function_call_name(with_long_var1,
and_long_var2)
429
Line Spacing
CLASS METHODS TOP LEVEL FUNCTIONS/CLASSES
# Separate Top level functions
# and classes with two blank
# lines.
def add(a, b):
total = a + b
return total
class Person(object):
def __init__(self, first, last):
self.first = first
self.last = last
class City(object):
    def __init__(self, name, state):
        self.name = name
        self.state = state
2 blank lines
# Separate class methods by a
# single line
class Person(object):
    def __init__(self, first, last):
        self.first = first
        self.last = last

    def full_name(self):
        return self.first + \
               self.last

    def __repr__(self):
        return "Person(%s)" % \
               self.full_name()
1 blank line
430
Line Spacing
ONE LINER FUNCTIONS GROUPING FUNCTIONS
# Extra blank lines may be used
# (sparingly) to separate groups
# of related functions.
#---------------------------------
# Math functions
#---------------------------------
def add(a, b):
total = a + b
return total
def subtract(a, b):
difference = a - b
return difference
#---------------------------------
# Name functions
#---------------------------------
def full_name(first, last):
full = first + last
return full
# Lines can be omitted between
# one-liner implementations (such
# as dummy functions).
def stub1():
pass
def stub2():
pass
def stub3():
pass
extra space
431
Line Spacing
FUNCTION BODIES
# Use blank lines (sparingly) in
# functions to indicate logical
# sections.
def blend_color_channel(c1, c2, alpha):
# Validation
if not 0 <= alpha <= 1.0:
raise ValueError, "bad alpha"
# Channel blending algorithm.
new_c = c1 * alpha + \
c2 * (1 - alpha)
return new_c
extra space
432
Imports
SEPARATE LINES
# Imports should usually be on
# separate lines.
# Like this.
import os
import sys
# NOT like this.
import os, sys
IMPORT FROM A MODULE
# It is OK (and encouraged) to have
# have items imported from a module
# on the same line.
from numpy import array, float32
AVOID IMPORT *
# Do not use import * except at
# the command prompt.
# Only at the command prompt...
from numpy import *
433
Imports
LOCATION IN FILE
# Imports should happen at the top
# of a file so that others can
# quickly see the dependencies for
# a module.
# foo.py
import os
import sys
def some_function():
pass
...
LOCAL IMPORTS
# It is OCCASIONALLY useful to break
# this rule to have “optional”
# dependencies in a module. Be
# careful about this.
#----------------------------------
# Statistical Functions
#----------------------------------
def mean(a):
...
def std(a):
...
#----------------------------------
# Statistical Plot Functions
#----------------------------------
def histogram_plot(a):
from pylab import plot
...
434
Imports
GROUPING IMPORTS
# PEP-8
# Group imports in the following
# order:
# 1. Standard library imports
# 2. Related 3rd party imports
# 3. Application specific imports
import os
import sys
import wx
import pricing_models
LOCAL IMPORTS
# At Enthought, we expand the number
# of groupings and label them.
# 1. Standard library imports
# 2. Math/science libraries
# 3. Related 3rd party imports
# 4. Enthought specific libraries
# 5. Application specific imports
# Standard Imports
import os
import sys
# Math Imports
from numpy import array
# 3rd Party Imports
import wx
# Enthought Imports
from enthought.traits.api import Int
# Application Imports
import pricing_models
435
Absolute Imports
USE ABSOLUTE IMPORTS
Example directory structure for Python
libraries.
C:/
python_library/
package/
__init__.py
some_module.py
another_module.py
tests/
test_some_module.py
test_another_module.py
# PEP-8
# Relative imports of intra-package
# imports are highly discouraged.
# Use absolute package path for
# all imports.
# File: another_module.py
# Recommended
from package.some_module import func
# Discouraged
from some_module import func
...
436
Avoid Extraneous White Space
INSIDE (), [], OR {}
# Yes
spam(ham[1], {eggs: 2})
# No
spam( ham[ 1 ], { eggs: 2 } )
BEFORE COMMAS AND COLONS
# Yes
if x == 4:
print x, y
# No
if x == 4 :
print x , y
FUNCTION CALLS
# Yes
spam(1)
# No
spam (1)
SLICING AND INDEXING
# Yes
dict['key'] = list[index]
# No
dict ['key'] = list [index]
AROUND ASSIGNMENT
# PEP-8 encourages the first
# Yes
# No extra space
x = 1
y = 2
long_variable = 3
# No
# Align '=' character
x = 1
y = 2
long_variable = 3
437
Whitespace
AROUND BINARY OPERATORS
# Binary operators are surrounded by
# a space on either side.
# Yes
if a > 1:
b = 2 * (a + 1)
# No
if a> 1:
b= 2 * (a+1)
FUNCTION ARGUMENTS
# Don’t put spaces around ‘=‘ in
# keyword arguments.
# Yes
def quad(x, a=0, b=1, c=0):
y = a * x**2 + b * x + c
return y
# No
def quad(x, a = 0, b = 1, c = 0):
y = a * x**2 + b * x + c
return y
438
Compound Statements
COMPOUND STATEMENTS
# Compound statements are discouraged
# Yes
if foo == 'blah':
do_something()
func1()
func2()
func3()
# No
if foo == 'blah': do_something()
func1(); func2(); func3();
439
Naming Conventions
VARIABLES GLOBAL VARIABLES
# Use lower_case notation without
# capital letters for variables in
# your functions, methods, and scripts.
# Yes
lower_case = 3
# No
CamelCase = 3
mixedCase = 3
Mixed_Case = 3
# Use UPPER_CASE notation for
# module level “constants” or
# “static” class variables.
# File: foo.py
# Globals
# Yes
UPPER_CASE = 3
# No
CamelCase = 3
mixedCase = 3
Mixed_Case = 3
def some_function(a):
result = a * UPPER_CASE
return result
440
Naming Conventions
FUNCTIONS AND METHODS
# function and methods should use
# lower_case names with underscores.
# Do not use the Java convention of
# mixedCase.
# Yes
def some_function(a):
pass
# No
def SomeFunction(a):
pass
def someFunction(a):
pass
CLASSES
# Class names are CamelCase.
# If there is one class in a file and
# class is named FooBar, the file is
# named foo_bar.
# file: foo_bar.py
# Yes
class FooBar(object):
def some_method(self):
pass
# No
class foo_bar(object):
def some_method(self):
pass
441
Naming Conventions
PRIVATE CLASSES AND FUNCTIONS
# Names of ‘private’ classes, methods
# and functions start with a single
# underscore.
def _private_function(a):
pass
class _PrivateClass(object):
pass
442
Naming Packages and Modules
PACKAGE NAMES
# Package directories should be all
# lower case alpha-numeric characters.
# Avoid underscores unless absolutely
# necessary.
# Yes
packagename
# No
PackageName
package_name
MODULE NAMES
# Module names should be lower case
# with underscores.
# Yes
foo_bar.py
# No
FooBar.py
443
Using Future
division
In "old" Python, 1/2 would become 0 because it used "integer"
division. In Python 3.0, 1/2 will be 0.5.
with_statement
Enable the with statement found in Python 2.5 and beyond.
absolute_import
Distinguish between absolute and relative module imports.
import foo imports from the top level of the module namespace.
from . import foo imports from the same package as this module.
Python can turn on planned features for future versions of Python that are
backward incompatible. Here are several commonly used future features:
from __future__ import division, absolute_import, with_statement
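A minimal sketch of what the division feature changes at the Python 2 prompt:
>>> 1 / 2
0
>>> from __future__ import division
>>> 1 / 2
0.5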
444
Using Future
All future_statements must appear near the top of the module. The
only lines that can appear before a future_statement are:
""" This is a module docstring.
"""
# This is a comment, preceded by a blank line and followed by
# a future_statement.
from __future__ import with_statement
from math import sin
from __future__ import division # compile-time error!
# ERROR because it is preceded by a non-future_statement.
The module docstring (if any).
Comments.
Blank lines.
Other future_statements.
FUTURE EXAMPLE
445
Common Directory Structure
PACKAGE/MODULE/TESTS LAYOUT
Example directory structure for Python libraries.
C:/
python_library/
yourpackage/
__init__.py
some_module.py
another_module.py
tests/
test_some_module.py
test_another_module.py
The tests for a module
are named similarly (test_
prefix) but live "one level
below" in a tests directory.
446
Functions and
Refactoring Code
448
Evolution of a script
EXPLORATORY SCRIPT
# Read data files into some structure.
for file in files:
...
# Check for errors in data.
if data > bad_value:
raise ValueError
...
# Execute one or more algorithms on the data.
important_number = data * 2 + blah...
...
# Create a report about the results.
print important_number
...
Useful software often starts its life as a script.
449
To A Function
ONE MEGA-FUNCTION TO BIND THEM ALL
def display_data_report(files):
# Read data files into some structure.
for file in files:
...
# Check for errors in data.
if data > bad_value:
raise ValueError
...
# Execute one or more algorithms on the data.
important_number = data * 2 + blah
...
# Create a report about the results.
print important_number
...
Evolves to a function…
450
Evolution Stops
MEGA-FUNCTION BENEFITS
And Stops…
There are some short term benefits to this.
Easy (quick) to create from original script.
Easy to read and modify during construction.
All the code is “in one place.”
Access to all variables at any time.
(Global namespaces are nice that way.)
Achieves some re-use.
451
Evolution to a library
Long-term benefits come from continuing to refactor the function until
there is “one idea per function.”
def data_from_files(files):
# Read data files into a structure.
for file in files:
...
def check_for_errors(data):
# Check for errors in data.
if data > bad_value:
raise ValueError
...
def calc_important_number(data):
# Execute one or more algorithms.
important_number = data * 2
...
def create_report(data, calc_data):
# Create a report about results.
print calc_data
...
LOW LEVEL FUNCTION LIBRARY
DRIVER FUNCTIONS
def display_data_report(files):
"""
"Driver" function that calls
the low level library functions.
"""
data = data_from_files(files)
check_for_errors(data)
res = calc_important_number(data)
create_report(data, res)
452
“One Idea Per Function” Benefits
Smaller, less complex snippets of code that are
easier for others (and you) to read in the
future.
Builds up a set of re-usable functions.
Easier for others to modify behavior without
modifying original code.
Better Potential for Testing.
453
Avoid Duplicate Code
DUPLICATED CODE
instrument1_prices = lookup_price(instrument1, start_date, stop_date)
instrument1_price_avg = mean(instrument1_prices)
print_summary(instrument1, instrument1_prices, instrument1_price_avg)
instrument2_prices = lookup_price(instrument2, start_date, stop_date)
instrument2_price_avg = mean(instrument2_prices)
print_summary(instrument2, instrument2_prices, instrument2_price_avg)
REFACTOR TO USE A FUNCTION
# refactor code so duplicated lines are in a function.
def summarize_price_info(instrument, start_date, stop_date):
instrument_prices = lookup_price(instrument, start_date, stop_date)
instrument_price_avg = mean(instrument_prices)
print_summary(instrument, instrument_prices, instrument_price_avg)
# Now call the function for the two different instruments.
summarize_price_info(instrument1, start_date, stop_date)
summarize_price_info(instrument2, start_date, stop_date)
454
Simplify Complex Conditionals
COMPLEX CONDITIONAL
if ( isinstance(value, float)
and sqrt(value**2+other_value**2) > 21
and value < 5):
do_something_important(value)
REFACTOR TO USE A FUNCTION
# Refactor code so the complex expression is in a function.
def is_important(value, other_value):
important = isinstance(value, float) \
and sqrt(value**2+other_value**2) > 21 \
and value < 5
return important
if is_important(value, other_value):
do_something_important(value)
455
Useful Reference
456
Testing Code
458
Test Driven Development
Overarching Concept:
Write Tests First.
459
Useful Reference
460
Test Driven Development
PROBLEM DEFINITION
Given two points pt1 (x1, y1) and pt2 (x2, y2), compute the slope: slope = dy/dx.
461
Test Driven Development
TEST
# test_fancy_math.py
from nose.tools import assert_almost_equal
from fancy_math import slope
def test_slope():
pt1 = [0.0, 0.0]
pt2 = [1.0, 2.0]
s = slope(pt1, pt2)
assert_almost_equal(s, 2)
462
Test Driven Development
TEST
# test_fancy_math.py
from nose.tools import assert_almost_equal
from fancy_math import slope
def test_slope():
pt1 = [0.0, 0.0]
pt2 = [1.0, 2.0]
s = slope(pt1, pt2)
assert_almost_equal(s, 2)
FANCY_MATH
# Simplest function that passes the tests…
def slope(pt1, pt2):
return 2
463
Test Driven Development
TEST
# test_fancy_math.py
from nose.tools import assert_almost_equal
from fancy_math import slope
def test_slope():
pt1 = [0.0, 0.0]
pt2 = [1.0, 2.0]
s = slope(pt1, pt2)
assert_almost_equal(s, 2)
FANCY_MATH
# Simplest function that passes the tests…
def slope(pt1, pt2):
return 2
Smart Alec...
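The next TDD iteration, not shown on the original slides, would replace the placeholder with a real implementation that still passes the test (a sketch):
# fancy_math.py
def slope(pt1, pt2):
    dx = pt2[0] - pt1[0]
    dy = pt2[1] - pt1[1]
    return dy / dx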
464
TDD Rationale
Build only the features you need.
(YAGNI principle: you ain’t gonna need it)
All the features are tested.
You are the first consumer of your API.
(Helps in design process)
465
Running Nosetests
In [27]: ls
Volume in drive W is Shared Folders
Volume Serial Number is 0000-0064
Directory of w:\...\slope
09/29/2008 11:41 PM <DIR> .
09/29/2008 11:30 PM <DIR> ..
09/29/2008 11:37 PM 123 fancy_math.py
09/29/2008 11:41 PM 195 test_fancy_math.py
2 File(s)
2 Dir(s) 48,067,092,480 bytes free
In [28]: !nosetests -v
test_fancy_math.test_slope ... ok
-------------------------------------------------------------------
Ran 1 test in 0.000s
OK
FROM IPYTHON
466
Using Unittest Framework
# Test cases:
# - each method is a unit test
# - found by nosetests as well
from unittest import TestCase
from fancy_math import *
class TestModule(TestCase):
def test_slope(self):
pt1 = [0.0, 0.0]
pt2 = [1.0, 2.0]
s = slope(pt1, pt2)
self.assertAlmostEqual(s, 2)
TESTS EMBEDDED IN CLASSES
467
Doctests
def factorial(n):
"""
Return the product of all positive integers less than
or equal to n.
Example:
>>> factorial(3)
6
>>> factorial(4)
24
"""
if n <= 1:
result = 1
else:
result = n * factorial(n-1)
return result
if __name__ == "__main__":
import doctest
doctest.testmod()
TESTS EMBEDDED IN DOCSTRINGS
(The __main__ block above is optional.)
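Doctests can also be run without it, directly from the command line (a sketch; factorial.py is a hypothetical file name for the module above):
$ python -m doctest -v factorial.py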
468
Monitoring execution
The logging module
Key part of good software craftsmanship
Part of the standard library
Structure of a logging system:
1. Create a logger instance
2. Add 1 or more handlers (console, file, http, …)
3. Set a level for these handlers or the logger
4. [OPTIONAL] Control the formatting
5. Add logger calls throughout the code
469
Simple Example
from logging import getLogger, FileHandler, StreamHandler
logger = getLogger() # Root logger
console_handler = StreamHandler()
logger.addHandler(console_handler)
file_handler = FileHandler("log.txt")
logger.addHandler(file_handler)
logger.debug("Debugging level message") # Hidden
logger.info("Informative level message") # Hidden
logger.warning("Warning level message")
Warning level message
logger.error("Error level message")
Error level message
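The first two messages are hidden because the root logger defaults to the WARNING level; lowering the level reveals them (a short follow-up sketch to the example above):
from logging import INFO
logger.setLevel(INFO)
logger.info("Informative level message")  # now shown on the console and in log.txt
Informative level message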
470
Logging levels range from the more numerous (DEBUG, INFO) to the more critical (WARNING, ERROR).
More complex example
from logging import getLogger, StreamHandler, Formatter, DEBUG, INFO, WARNING, ERROR
from logging.handlers import RotatingFileHandler
logger = getLogger(__name__) # local file’s logger
console_handler = StreamHandler()
logger.addHandler(console_handler)
# File handler that is a rotating file
file_handler = RotatingFileHandler("log.txt", maxBytes=1024, backupCount=3)
formatter = Formatter("%(levelname)s - %(asctime)s - %(name)s "
"- %(message)s")
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)
logger.setLevel(DEBUG) # highly sensitive logger
logger.debug("Debugging level message")
DEBUG - 2012-10-16 23:08:03,592 - package.analysis_module - Debugging level message
471
Timing and Profiling Code
Ways to time execution
Timing inside the code (Good)
the time module from std lib
Timing in ipython (Better):
%timeit “magic command”
-t option of ‘run’ (optionally -N also)
Profiling the code (Best):
cProfile or line_profiler package
475
Timing in python
476
USE TIME PACKAGE
import time
from numpy.random import randn
from numpy import linspace, pi, exp, sin
from scipy.optimize import leastsq
def func(x,A,a,f,phi):
return A*exp(-a*sin(f*x+phi))
def errfunc(params, x, data):
return func(x, *params) - data
start = time.time()
params0 = [1,1,1,1]
x = linspace(0,2*pi,25)
ptrue = [3,2,1,pi/4]
true = func(x, *ptrue)
noisy = true + 0.3*randn(len(x))
pmin,ierr = leastsq(errfunc, params0,
args=(x, noisy))
print "Total: %f s" % (time.time()-start)
USE IPYTHON TOOLS
# For a script
>>> run -t [-N10] test.py
IPython CPU timings (estimated):
User : 1.10 s.
System : 0.00 s.
Wall time: 1.11 s.
# For a function call
>>> import random as rand
>>> import numpy.random as \
nprand
>>> %timeit rand.randint(0,10)
100000 loops, best of 3: 2.02 us
per loop
>>> %timeit nprand.randint(10)
10000000 loops, best of 3: 178 ns
per loop
477
Profiling with cProfile
cProfile and its pure Python version, profile, are profiling tools in the standard
library.
The cProfile workflow has two main steps:
1. Run the code to be profiled via cProfile's run() (or runctx())
function or using the kernprof.py script. This counts and times function
calls, and generates a profiling dataset.
2. Process and display the profile data. In the simplest case (e.g.
cProfile.run('foo()')), a predefined report is generated and printed.
For finer control, you can save the raw data to a file and process it using the
pstats module.
WORKFLOW
Profiling with kernprof.py
478
From the command line (if the line_profiler package is installed):
$ kernprof.py datafit.py
Wrote profile results to datafit.py.prof
$ python -m pstats
Welcome to the profile statistics browser.
% read datafit.py.prof
datafit.py.prof% strip
datafit.py.prof% sort time
datafit.py.prof% stats 20
(...)
ncalls tottime percall cumtime percall
filename:lineno(function)
1 0.299 0.299 0.363 0.363 _core.py:4(<module>)
1 0.130 0.130 0.635 0.635 mpl.py:1(<module>)
1 0.115 0.115 0.482 0.482 __init__.py:14(<module>)
12 0.114 0.009 0.415 0.035 __init__.py:1(<module>)
(...)
479
Profiling from iPython
>>> import time
>>> def func(n):
... if n < 0:
... return
... time.sleep(0.1*n)
... func(n-1)
... return
...
>>> import cProfile
>>> cProfile.run('func(3)')
11 function calls (7 primitive calls) in 0.601 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
5/1 0.000 0.000 0.601 0.601 <stdin>:1(func)
1 0.000 0.000 0.601 0.601 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of
4 0.600 0.150 0.600 0.150 {time.sleep}
EXAMPLE
480
Profiling with cProfile
OUTPUT TABLE COLUMNS
ncalls The number of calls. Counts of the form N/M indicate N actual calls
(including recursive calls), and M 'primitive' (nonrecursive) calls.
tottime The total time spent in the given function (excluding time spent in
calls to sub-functions).
percall The quotient of tottime divided by ncalls.
cumtime The total time spent in this and all subfunctions (from invocation until
exit). This figure is accurate even for recursive functions.
percall The quotient of cumtime divided by primitive calls.
filename:lineno(function)
Provides the respective data of each function.
481
Profiling with cProfile
USE PSTATS TO CONTROL THE TABLE
# Save the raw profile data to a file.
>>> cProfile.run('func(3)', filename='func.prof')
# Use pstats.Stats to manipulate the data.
>>> import pstats
>>> stats = pstats.Stats('func.prof')
# Clean up the filenames.
>>> stats.strip_dirs()
<pstats.Stats instance at 0xdbda0>
# Sort by the cumulative time.
>>> stats.sort_stats('cumulative')
<pstats.Stats instance at 0xdbda0>
# Print the table.
>>> stats.print_stats()
482
Profiling with cProfile
USE PSTATS TO CONTROL THE TABLE - CONTINUED
Wed May 4 14:56:13 2011 func.prof
11 function calls (7 primitive calls) in 0.600 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall
filename:lineno(function)
1 0.000 0.000 0.600 0.600 <string>:1(<module>)
5/1 0.000 0.000 0.600 0.600 <stdin>:1(func)
4 0.600 0.150 0.600 0.150 {time.sleep}
1 0.000 0.000 0.000 0.000 {method 'disable' of …}
<pstats.Stats instance at 0xdbda0>
483
Profiling with cProfile
MORE OUTPUT CONTROL
Arguments to Stats.sort_stats(…)
Valid Argument    Meaning
'calls'           call count
'cumulative'      cumulative time
'file'            file name
'module'          file name
'pcalls'          primitive call count
'line'            line number
'name'            function name
'nfl'             name/file/line
'stdname'         standard name
'time'            internal time
484
Profiling with cProfile
MORE OUTPUT CONTROL
Arguments to Stats.print_stats([restriction, …]) allow you to filter the
output (see the example below):
Each restriction is either:
an integer to select the number of lines to output;
a decimal fraction between 0.0 and 1.0 to select a percentage of lines; or
a regular expression to match the filename.
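For example (a sketch reusing the stats object from the earlier slides):
# Keep only lines whose filename matches 'func', then cap the output at 5 entries.
>>> stats.print_stats('func', 5)
# Show the top 25% of lines.
>>> stats.print_stats(.25)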
485
Profiling with cProfile
A FEW USEFUL OUTPUT CASES
# Find out what is taking the most time:
>>> stats.sort_stats('cumulative').print_stats(10)
# Find functions in which the program is spending the
# the most time (not including calls to other functions):
>>> stats.sort_stats('time').print_stats(10)
# Check how often __init__ is called in all the modules
# (sorted by name):
>>> stats.sort_stats('file').print_stats('__init__')
486
Profiling with cProfile
Python standard library documentation:
http://docs.python.org/library/profile.html
Doug Hellmann's "Python Module of the Week" blog:
http://blog.doughellmann.com/2008/08/pymotw-profile-cprofile-pstats.html
MORE DOCUMENTATION
487
line_profiler and kernprof
line_profiler and kernprof are profiling tools developed by Robert Kern.
line_profiler is a module for doing line-by-line profiling of functions.
kernprof is a convenient script for running either line_profiler or the standard
library's cProfile module.
INSTALLATION
$ easy_install line_profiler
TYPICAL WORKFLOW
1. Decorate the functions to be profiled with @profile.
2. Run your script using kernprof.py with the -l option. For example,
$ kernprof.py -l script_to_profile.py
3. Run the line_profiler module to display the results. For example,
$ python -m line_profiler script_to_profile.py.lprof
4. Adjust your code, and repeat steps 2-4.
5. Remove the @profile decorators.
488
line_profiler example
import re
PATTERN = r"https?:\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-
\.,@?^=%&amp;:/~\+#]*[\w\-\@?^=%&amp;/~\+#])?"
@profile
def scan_for_http(f):
addresses = []
for line in f:
result = re.search(PATTERN, line)
if result:
addresses.append(result.group(0))
return addresses
if __name__ == "__main__":
import sys
f = open(sys.argv[1], 'r')
addresses = scan_for_http(f)
for address in addresses:
print address
http_search.py
See demo/profiling directory
for code.
489
line_profiler example
$ kernprof.py -l http_search.py sample.html
http://sphinx.pocoo.org/
Wrote profile results to http_search.py.lprof
$ python -m line_profiler http_search.py.lprof
Run kernprof.py and line_profiler
Timer unit: 1e-06 s
File: http_search.py
Function: scan_for_http at line 6
Total time: 0.016079 s
Line # Hits Time Per Hit % Time Line Contents
==============================================================
6 @profile
7 def scan_for_http(f):
8 1 3 3.0 0.0 addresses = []
9 1350 2080 1.5 12.9 for line in f:
10 1349 12417 9.2 77.2 result = re.search(PATTERN, line)
11 1349 1513 1.1 9.4 if result:
12 39 65 1.7 0.4 addresses.append(result.group(0))
13 1 1 1.0 0.0 return addresses
490
line_profiler example
import re
PATTERN = r"https?:\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-
\.,@?^=%&amp;:/~\+#]*[\w\-\@?^=%&amp;/~\+#])?"
@profile
def scan_for_http(f):
addresses = []
pat = re.compile(PATTERN)
for line in f:
result = pat.search(line)
if result:
addresses.append(result.group(0))
return addresses
if __name__ == "__main__":
import sys
f = open(sys.argv[1], 'r')
addresses = scan_for_http(f)
for address in addresses:
print address
http_search2.py
491
line_profiler example
$ kernprof.py -l http_search2.py sample.html
Wrote profile results to http_search2.py.lprof
$ python -m line_profiler http_search2.py.lprof
Run kernprof.py and line_profiler on the modified file
Timer unit: 1e-06 s
File: http_search2.py
Function: scan_for_http at line 6
Total time: 0.00911 s
Line # Hits Time Per Hit % Time Line Contents
==============================================================
6 @profile
7 def scan_for_http(f):
8 1 3 3.0 0.0 addresses = []
9 1 3117 3117.0 34.2 pat = re.compile(PATTERN)
10 1350 1995 1.5 21.9 for line in f:
11 1349 2507 1.9 27.5 result = pat.search(line)
12 1349 1415 1.0 15.5 if result:
13 39 72 1.8 0.8 addresses.append(result.group(0))
14 1 1 1.0 0.0 return addresses
492
pdb
The Python debugger
493
What is pdb?
pdb is part of the standard library
pdb, like Python, is interactive and interpreted,
allowing for the execution of arbitrary Python
code in the context of any stack frame
pdb can debug a “post-mortem” condition, and
can also be called under program control
ipdb (not in std lib) is similar but includes tab
completion
494
Starting pdb
Run the script from the command line under debugger control
C:\> python -m pdb script.py [arg ...]
During an IPython session, here are some of the common pdb functions
used for debugging:
>>> pdb.run( statement )
Execute the statement (given as a string) under debugger control
>>> pdb.runcall( function[, argument, ...] )
Call the function (not a string) with the given arguments under debugger control
>>> pdb.pm()
Start the debugger at the point of the last exception
Inside a script or a module, hard-code a breakpoint with:
import pdb ; pdb.set_trace()
Useful for hard-coding a breakpoint in a program even if the code is not being
debugged (analyzing a failed assertion, etc.)
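A minimal sketch of a hard-coded breakpoint (buggy_mean and its body are hypothetical):
def buggy_mean(values):
    total = 0
    for v in values:
        total += v
    import pdb ; pdb.set_trace()   # pause here to inspect total and values
    return total / len(values)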
495
pdb commands
pdb runs as an interactive session having a specific set of commands.
Some of the more common pdb commands:
(Pdb) h(elp) [command]
One of the most important! Lists all the commands available, or help on a
specific command
(Pdb) u(p) / d(own)
Pop up or push down the execution stack
(Pdb) b(reak)[[filename:]lineno | function[, condition]]
Set a breakpoint at a specific file/line or function and optionally if a specific
condition is met. If no args are given, list all the breakpoints & their numbers
(Pdb) s(tep) / n(ext)
Execute the current line only. step will push into a function call and next will
execute the function call and move to the next statement in the current function
(Pdb) a(rgs)
Print the args for the current function
496
pdb commands
(Pdb) l(ist) [first[, last]]
List the source code at the point of execution. Args first and last set a range
for the number of lines printed. No args prints 11 lines around the current
line, if only first, prints 11 lines around that line.
(Pdb) j(ump) lineno
Jump to a line in the bottom-most frame only and execute from there. Not all
jumps are possible!
(Pdb) p / pp [expression]
Print or pretty print expression in the context of the current frame
(Pdb) alias [name [command]]
Create an alias named name for a command, or list all aliases.
Here are two useful aliases (especially when placed in a .pdbrc file):
#print all instance variables (usage "pi classInst")
alias pi for k in %1.__dict__.keys(): print "%1.",k,"=",%1.__dict__[k]
#print instance variables in self
alias ps pi self
497
iPython and pdb
# ipython can call pdb automatically upon error
In [1]: pdb
Automatic pdb calling has been turned ON
In [2]: import middle
In [3]: middle.run()
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Z:\projects\Training\pdb\<console>
Z:\projects\Training\pdb\middle.py in run()
31 """
32 for i in range( 1, 11 ) :
33 l = make_list( i )
---> 34 print "The middle item(s) in %s\n\tis/are %s\n" % (l, get_middle( l ))
35
Z:\projects\Training\pdb\middle.py in get_middle(item_list)
9
10 if( num_items % 2 ) :
---> 11 return item_list[half]
12
13 return item_list[(half - 1):(half + 1)]
IndexError: list index out of range
> z:\projects\training\pdb\middle.py(11)get_middle()
-> return item_list[half]
498
ipdb> l
6 """
7 num_items = len( item_list )
8 half = num_items * 2
9
10 if( num_items % 2 ) :
11 -> return item_list[half]
12
13 return item_list[(half - 1):(half + 1)]
14
15
16 def make_list( size=0 ) :
ipdb> item_list
['0']
ipdb> half
2
ipdb> c
In [4]:
Example session
DEBUGGING FROM ORIGIN OF EXCEPTION
Print the contents of the list
Print the value of the index
ah-ha!
Print the source code around
the line which raised the
exception
Test this with middle.py in Demos/pdb
Return to ipython
499
>>> import pdb
>>> import middle
>>> pdb.runcall( middle.run )
> z:\projects\pgtraining\pdb\middle.py(32)run()
-> for i in range( 1, 11 ) :
(Pdb) s
> z:\projects\pgtraining\pdb\middle.py(33)run()
-> l = make_list( i )
(Pdb) n
> z:\projects\pgtraining\pdb\middle.py(34)run()
-> print "The middle item(s) in %s\n\tis/are %s\n" % (l, get_middle( l ))
(Pdb) s
--Call--
> z:\projects\pgtraining\pdb\middle.py(2)get_middle()
-> def get_middle( item_list ) :
(Pdb) s
...
> z:\projects\pgtraining\pdb\middle.py(11)get_middle()
-> return item_list[half]
(Pdb) item_list
['0']
(Pdb) half
2
Example session
DEBUGGING MIDDLE.PY FROM THE START
Print the contents of the list
Print the value of the index
ah-ha!
We know make_list is ok, so
skip over it with “next”
Continue to execute lines
until we see something
suspicious
500
Debugging exercise
In the Demos/ directory, file_sum.py contains a
class which computes the sum of every
number in a file.
file_sum.py has 2 bugs.
Once fixed, the script runs like this:
C:\> python file_sum.py dir.dat
The sum of all numbers in dir.dat is 4355
501
Other debugging tools
ipdb (not in the std lib) offers the same functionality as pdb
(set_trace allows you to march through execution) but
allows more interactive exploration thanks to tab
completion, as in IPython. However, it still only allows one-line
evaluations.
To do more exploration at a given point in an
application, IPython can be invoked with its embed
function:
from IPython import embed ; embed()
It starts a normal IPython session with the namespace
populated from the namespace of your application at the
breakpoint. To exit, press Ctrl-D.
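A minimal sketch (process and its body are hypothetical):
def process(data):
    result = [d * 2 for d in data]
    from IPython import embed ; embed()   # inspect data and result, then Ctrl-D to resume
    return result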
502
Python and Other Languages:
An Overview
503
Why use another language with Python?
Best of both worlds: Tested and optimized legacy code +
flexibility of Python
Python as glue: Combine several components into one
larger program
Speedup Python: Speed up performance critical
components with a faster language
Division of labor: Let Fortran be an array processing
DSL, let Python and its ecosystem handle automatic
testing, file IO, text processing, data munging, GUI
integration, HTTP serving, etc.
504
External language toolkit
Wrapping legacy code and external libraries:
Wrap with hand-written extension module
Cython wrap C/C++ with custom interface
SWIG Automatically wrap large C/C++ libraries
f2py Automatically wrap Fortran libraries
ctypes Easy to manually wrap external libraries (see the sketch after this list)
Speed up Python code:
Hand-written extension module
Cython Add type information to Python
Weave Speed up NumPy array expressions
Shedskin and others not covered here
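For instance, ctypes can wrap a shared library without writing any extension code; a minimal sketch, assuming a Unix-like system where the C math library can be located:
import ctypes
import ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m"))   # load the C math library
libm.cos.restype = ctypes.c_double                  # declare the return type
libm.cos.argtypes = [ctypes.c_double]               # declare the argument types
print libm.cos(0.0)   # 1.0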
505
Wrapping and Levels of Granularity
Coarse-grained:
Python communicates with external program via config files
and command line interface using subprocess
Medium-grained:
In external program, create a handful of functions that are
designed to be wrapped with Python
Wrap a select few pieces of external program and
customize the interface
Fine-grained:
Wrap everything using an automatic tool like SWIG or f2py.
Expose the entire library and its API to Python.
506
Hand Wrapping C/C++
507
The Wrapper Function
fact.h
#ifndef FACT_H
#define FACT_H
int fact(int n);
#endif
fact.c
#include "fact.h"
int fact(int n)
{
if (n <=1) return 1;
else return n*fact(n-1);
}
WRAPPER FUNCTION
static PyObject* wrap_fact(PyObject *self, PyObject *args)
{
/* Python->C data conversion */
int n, result;
if (!PyArg_ParseTuple(args, "i", &n))
return NULL;
/* C Function Call */
result = fact(n);
/* C->Python data conversion */
return Py_BuildValue("i", result);
}
508
Complete Example
/* Must include Python.h before any standard headers! */
#include "Python.h"
#include "fact.h"
/* Define the wrapper functions exposed to Python (must be static) */
static PyObject* wrap_fact(PyObject *self, PyObject *args) {
int n, result;
/* Python->C Conversion */
if (!PyArg_ParseTuple(args, "i", &n))
return NULL;
/* Call our function */
result = fact(n);
/* C->Python Conversion */
return Py_BuildValue("i", result);
}
/* Method table declaring the names of functions exposed to Python */
static PyMethodDef ExampleMethods[] = {
{"fact", wrap_fact, METH_VARARGS, "Calculate the factorial of n"},
{NULL, NULL, 0, NULL} /* Sentinel */
};
/* Module initialization function called at "import example" */
PyMODINIT_FUNC initexample(void) {
(void) Py_InitModule("example", ExampleMethods); }
509
Compiling/using your extension
C:\...\> gcc -c fact.c fact_wrap.c -I. -Ic:\python27\include
C:\...\> gcc -shared fact.o fact_wrap.o -Lc:\python27\libs
-lpython27 -o example.pyd
BUILD DYNAMIC LIB BY HAND (EPD ON WINDOWS)
~/...$ gcc -c -fpic fact.c fact_wrap.c -I. -I/usr/include/python2.7
~\...\> gcc -shared fact.o fact_wrap.o -L/usr/lib/python2.7/config
-lpython2.7 -o example.so
BUILD BY HAND (UNIX)
Note that the include and library directories are platform specific. Also, be
sure to link against the appropriate version of Python (2.7 in these examples).
USING THE MODULE IN PYTHON
In [1]: import example
In [2]: dir(example)
Out[2]: ['__doc__', '__file__', '__name__', '__package__', 'fact']
510
Building using setup.py
# setup.py files are the Python equivalent of Make
from distutils.core import setup, Extension
ext = Extension(name='example', sources=['fact_wrap.c', 'fact.c'])
setup(name="example", ext_modules=[ext])
SETUP.PY
# Build and install the module.
$ setup.py build
$ setup.py install
# Or build the module "in-place" to ease testing.
$ setup.py build_ext --inplace --compiler=mingw32
* The --compiler=mingw32 option is needed only on Windows.
CALLING SETUP SCRIPTS
More details
511
More details on the wrapping functions (C <-> Python):
http://docs.python.org/extending
More details on writing a setup.py file to incorporate
extension modules:
http://docs.python.org/extending/building.html
Using NumPy arrays in hand-wrapped C code:
http://docs.scipy.org/doc/numpy/reference/c-api.html
... and a simple example: demo/hand_wrap_ndarray/
512
Cython
513
What is Cython?
Cython is a Python-like language for writing extension modules. It allows
one to mix Python with C or C++ and significantly lowers the barrier to
speeding up Python code.
The cython command generates a C or C++ source file from a Cython
source file; the C/C++ source is then compiled into a heavily optimized
extension module.
Cython has built-in support for working with NumPy arrays and Python
buffers, making numerical programming nearly as fast as C and (nearly) as
easy as Python.
http://www.cython.org/
514
A really simple example
# Cython has its own "extension builder" module that knows how
# to build cython files into python modules.
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
ext = Extension("pi", sources=["pi.pyx"])
setup(ext_modules=[ext],
cmdclass={'build_ext': build_ext})
See demo/cython for this example.
Build it using build.bat.
PI.PYX
# Define a function. Include type information for the argument.
def multiply_by_pi(int num):
return num * 3.14159265359
SETUP_PI.PY
515
A simple Cython example
$ python setup_pi.py build_ext --inplace -c mingw32
>>> import pi
>>> pi.multiply_by_pi()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: function takes exactly 1 argument (0 given)
>>> pi.multiply_by_pi("dsa")
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: an integer is required
>>> pi.multiply_by_pi(3)
9.4247779607700011
CALLING MULTIPLY_BY_PI FROM PYTHON
516
(some of) the generated code
static PyObject *__pyx_f_2pi_multiply_by_pi(PyObject *__pyx_self,
PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/
static PyObject *__pyx_f_2pi_multiply_by_pi(PyObject *__pyx_self,
PyObject *__pyx_args, PyObject *__pyx_kwds) {
int __pyx_v_num;
PyObject *__pyx_r;
PyObject *__pyx_1 = 0;
static char *__pyx_argnames[] = {"num",0};
if (!PyArg_ParseTupleAndKeywords(__pyx_args, __pyx_kwds, "i",
__pyx_argnames,
&__pyx_v_num)) return 0;
/* "C:\pi.pyx":2 */
__pyx_1 = PyFloat_FromDouble((__pyx_v_num * 3.14159265359));
if (!__pyx_1) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2; goto __pyx_L1;}
__pyx_r = __pyx_1;
__pyx_1 = 0;
C CODE GENERATED BY CYTHON
517
Def vs. CDef
# Python callable function.
def inc(int num, int offset):
return num + offset
# Call inc for values in sequence.
def inc_seq(seq, offset):
result = []
for val in seq:
res = inc(val, offset)
result.append(res)
return result
DEF PYTHON FUNCTIONS
INC FROM PYTHON
# inc is callable from Python.
>>> inc.inc(1,3)
4
>>> a = range(4)
>>> inc.inc_seq(a, 3)
[3,4,5,6]
CDEF C FUNCTIONS
# cdef becomes a C function call.
cdef int fast_inc(int num,
int offset):
return num + offset
# fast_inc for a sequence
def fast_inc_seq(seq, offset):
result = []
for val in seq:
res = fast_inc(val, offset)
result.append(res)
return result
FAST_INC FROM PYTHON
# fast_inc not callable in Python
>>> inc.fast_inc(1,3)
Traceback: ... no 'fast_inc'
# But fast_inc_seq is 2x faster
# for large arrays.
>>> inc.fast_inc_seq(a, 3)
[3,4,5,6]
518
CPdef: combines def + cdef
CPDEF C AND PYTHON FUNCTIONS
# cdef becomes a C function call.
cpdef fast_inc(int num, int offset):
return num + offset
# Calls compiled version inside Cython file
def inc_seq(seq, offset):
result = []
for val in seq:
res = fast_inc(val, offset)
result.append(res)
return result
FAST_INC FROM PYTHON
# fast_inc is now callable in Python via Python wrapper
>>> inc.fast_inc(1,3)
4
# No speed degradation here
>>> inc.inc_seq(a, 3)
[3,4,5,6]
519
Functions from C Libraries
# len_extern.pyx
# First, "include" the header file you need.
cdef extern from "string.h":
# Describe the interface for the functions used.
int strlen(char *c)
def get_len(char *message):
# strlen can now be used from Cython code (but not Python)…
return strlen(message)
EXTERNAL C FUNCTIONS
>>> import len_extern
>>> len_extern.strlen
Traceback (most recent call last):
AttributeError: 'module' object has no attribute 'strlen'
>>> len_extern.get_len("woohoo!")
7
CALL FROM PYTHON
520
Structures from C Libraries
cdef extern from "time.h":
# Describe the structure you are using.
struct tm:
...
int tm_mday # Day of the month: 1-31
int tm_mon # Months *since* january: 0-11
int tm_year # Years since 1900
...
ctypedef long time_t
tm* localtime(time_t *timer)
time_t time(time_t *tloc)
def get_date():
""" Return a tuple with the current day, month, and year."""
cdef time_t t
cdef tm* ts
t = time(NULL)
ts = localtime(&t)
return ts.tm_mday, ts.tm_mon + 1, ts.tm_year
TIME_EXTERN.PYX
>>> time_extern.get_date()
(8, 4, 2011)
CALLING FROM PYTHON
521
Classes
cdef class Shrubbery:
# Class level variables
cdef int width, height
def __init__(self, w, h):
self.width = w
self.height = h
def describe(self):
print "This shrubbery is",self.width,"by ",self.height," cubits."
SHRUBBERY.PYX
>>> import shrubbery
>>> x = shrubbery.Shrubbery(1, 2)
>>> x.describe()
This shrubbery is 1 by 2 cubits.
>>> print x.width
AttributeError: 'shrubbery.Shrubbery' object has no attribute 'width'
CALLING FROM PYTHON
522
class Rectangle {
public:
int x0, y0, x1, y1;
Rectangle(int x0, int y0, int x1, int y1);
~Rectangle();
int getLength();
int getHeight();
int getArea();
void move(int dx, int dy);};
rectangle_extern.h
Classes from C++ libraries
The implementation of the class and methods is done inside rectangle_extern.cpp.
See demo/cython for this example.
523
rectangle.pyx
cdef extern from "rectangle_extern.h":
cdef cppclass Rectangle:
Rectangle(int, int, int, int)
int x0, y0, x1, y1
int getLength()
int getHeight()
int getArea()
void move(int, int)
cdef class PyRectangle:
cdef Rectangle *thisptr # hold a C++ instance which we're wrapping
def __cinit__(self, int x0, int y0, int x1, int y1):
self.thisptr = new Rectangle(x0, y0, x1, y1)
def __dealloc__(self):
del self.thisptr
def getLength(self):
return self.thisptr.getLength()
Classes from C++ libraries
524
CALLING FROM PYTHON
In [1]: import rectangle
In [2]: r = rectangle.PyRectangle(1,1,2,2) # calls __cinit__
In [3]: r.getLength()
Out[3]: 1
In [4]: r.getHeight()
AttributeError:rectangle.PyRectangle object has no attribute getHeight
In [5]: del r # calls __dealloc__
from distutils.core import setup
from Cython.Distutils import build_ext
from distutils.extension import Extension
setup(
ext_modules=[Extension("rectangle", sources = ["rectangle.pyx",
"rectangle_extern.cpp"],
language = "c++")],
cmdclass = {'build_ext': build_ext})
SETUP.PY
Classes from C++ libraries
525
Using NumPy with Cython
#cython: boundscheck=False
# Import the numpy cython module shipped with Cython.
cimport numpy as np
ctypedef np.float64_t DOUBLE
def sum(np.ndarray[DOUBLE] ary):
# How long is the array in the first dimension?
cdef int n = ary.shape[0]
# Define local variables used in calculations.
cdef unsigned int i
cdef double sum
# Sum algorithm implementation.
sum = 0.0
for i in range(0, n):
sum = sum + ary[i]
return sum
C:\demo\cython>test_sum.py
elements: 1,000,000
python sum(approx sec, result): 7.030
numpy sum(sec, result): 0.047
cython sum(sec, result): 0.047
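A build script for this example follows the same pattern as the earlier Cython setup files, with NumPy's headers added via include_dirs (a sketch; the file names cython_sum.pyx and setup_sum.py are assumptions):
# setup_sum.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy

ext = Extension("cython_sum", sources=["cython_sum.pyx"],
                include_dirs=[numpy.get_include()])
setup(ext_modules=[ext], cmdclass={'build_ext': build_ext})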
526
Problem: Make This Fast!
from numpy import empty
def mandelbrot_escape(x, y, n):
z_x = x
z_y = y
for i in range(n):
z_x, z_y = z_x**2 - z_y**2 + x, 2*z_x*z_y + y
if z_x**2 + z_y**2 >= 4.0:
break
else:
i = -1
return i
def generate_mandelbrot(xs, ys, n):
d = empty(shape=(len(ys), len(xs)))
for j in range(len(ys)):
for i in range(len(xs)):
d[j,i] = mandelbrot_escape(xs[i], ys[j], n)
return d
527
Step 1: Add Type Information
Type information can be added to function signatures:
def mandelbrot_escape(double x, double y, int n):
...
def generate_mandelbrot(xs, ys, int n):
...
Variables can be declared to have a type using 'cdef':
def generate_mandelbrot(xs, ys, int n):
cdef int i,j
cdef int N = len(xs)
cdef int M = len(ys)
...
528
Step 2: Use Cython C Functions
In Cython you can declare functions to be C functions
using 'cdef' instead of 'def':
cdef int mandelbrot_escape(float x, float y, int n):
...
This makes the functions:
Generate actual C functions, so they are much faster.
Not visible to Python, but freely usable in your Cython module.
Arbitrary Python objects can still be passed in and out
of C functions using the 'object' type.
529
Solution 2: This is Fast!
cdef int mandelbrot_escape(double x, double y, int n):
cdef double z_x = x
cdef double z_y = y
cdef int i
for i in range(n):
z_x, z_y = z_x**2 - z_y**2 + x, 2*z_x*z_y + y
if z_x**2 + z_y**2 >= 4.0:
break
else:
i = -1
return i
def generate_mandelbrot(xs, ys, int n):
cdef int i,j
cdef int N = len(xs)
cdef int M = len(ys)
d = empty(dtype=int, shape=(N, M))
for j in range(M):
for i in range(N):
d[j,i] = mandelbrot_escape(xs[i], ys[j], n)
return d
530
Step 3: Use NumPy in Cython
In our example, we are still using Python-level numpy
calls to do our array indexing:
d[j,i] = mandelbrot_escape(xs[i], ys[j], n)
If we use the Cython interface to NumPy, we can
declare our arrays to be C-level numpy extension
types, and gain even more speed.
NumPy arrays are declared using a special buffer
notation:
cimport numpy as np
...
cdef np.ndarray[int, ndim=2] my_array
You must declare both the type of the array, and the
number of dimensions. All the standard numpy types
are declared in the numpy cython declarations.
531
Solution 3: This is Really Fast!
mandel.pyx
cimport numpy as np
...
def generate_mandelbrot(np.ndarray[double, ndim=1] xs,
np.ndarray[double, ndim=1] ys, int n):
cdef int i,j
cdef int N = len(xs)
cdef int M = len(ys)
cdef np.ndarray[int, ndim=2] d = empty(dtype=int, shape=(N, M))
for j in range(M):
for i in range(N):
d[j,i] = mandelbrot_escape(xs[i], ys[j], n)
return d
setup.py
...
import numpy
...
ext = Extension("mandel", ["mandel.pyx"],
include_dirs = [numpy.get_include()])
532
Step 4: Parallelization using OpenMP
Cython supports native parallelism. To use this kind of parallelism,
the Global Interpreter Lock (GIL) must be released. It currently
supports OpenMP (more backends might be supported in the
future).
1) Release the GIL before a block of code:
with nogil:
# This block of code is executed after releasing the GIL
2) Declare that a cdef function can be called safely without the GIL:
cdef int mandelbrot_escape(double x, double y, int n) nogil:
3) Parallelize for-loops with prange:
from cython.parallel import prange
...
for j in prange(M):
for i in prange(N):
d[j,i] = mandelbrot_escape(xs[i], ys[j], n)
533
Solution 4: Even faster!
mandel.pyx
from cython.parallel import prange
...
cdef int mandelbrot_escape(double x, double y, int n) nogil:
...
def generate_mandelbrot(np.ndarray[double, ndim=1] xs,
np.ndarray[double, ndim=1] ys, int n):
""" Generate a mandelbrot set """
cdef ...
with nogil:
for j in prange(M):
for i in prange(N):
d[j,i] = mandelbrot_escape(xs[i], ys[j], n)
return d
setup.py
ext = Extension("mandel", ["mandelbrot.pyx"],
include_dirs = [numpy.get_include()],
extra_compile_args=['-fopenmp'],
extra_link_args=['-fopenmp’])
534
Conclusion
Solution Time Speed-up
Pure Python 630.72 s x 1
Cython (Step 1) 2.7776 s x 227
Cython (Step 2) 1.9608 s x 322
Cython+Numpy (Step 3) 0.4012 s x 1572
Cython+Numpy+prange (Step 4) 0.2449 s x 2575
Timing performed on a 2.3 GHz Intel
Core i7 MacBook Pro with 8GB RAM
using a 2000x2000 array and an escape
time of n=100.
[July 20, 2012]