{{ variablename }} represents a Django variable.
{% block title %} represents a block. The contents of a block are evaluated by Django and displayed. Blocks can be replaced by child templates.
The magazine Issue Tracker in action – list of issues
Now we need to create the issue_list.html
template. This template is responsible for
displaying all the issues available in the system.
ludIssueTracker/ludIssueTracker/templates/
ludissues/issue_list.html
{% extends 'base.html' %}
{% block title %}View Issues - {% endblock %}
{% block content %}
{% endblock %}
Here we are inheriting the base.html file that we created earlier. The {% for issue in object_list %} loop runs on the object_list sent by urls.py, and for each issue in it we display issue.id and issue.name.
Now we will create issue_detail.html. This
template is responsible for displaying the detail
view of a case.
ludIssueTracker/ludIssueTracker/templates/
ludissues/issue_detail.html
{% extends 'base.html' %}
{% block title %}Issue #{{ object.id }} - {% endblock %}
{% block content %}
Issue #{{ object.id }} {{ object.status }}
Information
Opened {{ object.opened_on }} ago
Last modified {{ object.modified_on }} ago
Owner
{{ object.owner }}
Summary
{{ object.summary }}
{% endblock %}
And that’s everything! The issue tracker app is
now complete and ready to use. You can now
point your browser at localhost:8000 to start
using the app.
The Python Book 65
Python essentials
50 Python tips
Python is a programming language that lets you work more quickly and
integrate your systems more effectively. Today, Python is one of the most
popular programming languages in the open source space. Look around
and you will find it running everywhere, from various configuration tools
to XML parsing. Here is the collection of 50 gems to make your Python
experience worthwhile…
Basics
1. Running Python scripts
On most UNIX systems, you can run Python scripts from the command line.
$ python mypyprog.py
2. Running Python programs from Python interpreter
The Python interactive interpreter makes it easy to try your first steps in programming and using all Python commands. You just issue each command at the command prompt (>>>), one by one, and the answer is immediate.
The Python interpreter can be started by issuing the command:
$ python
kunal@ubuntu:~$ python
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
In this article, all the code starting at the >>> symbol is meant to be given at the Python prompt.
It is also important to remember that Python takes tabs very seriously – so if you are receiving any error that mentions tabs, correct the tab spacing.
3. Dynamic typing
In Java, C++, and other statically typed
languages, you must specify the data type of
the function return value and each function
argument. On the other hand, Python is
a dynamically typed language. In Python
you never have to explicitly specify the data
type of anything. Based on what value you
assign, Python will keep track of the data
type internally.
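A minimal sketch of what this means in practice, written so that it also runs on Python 3 (an assumption; the tips in this article target Python 2). The variable names are purely illustrative.

```python
# The same name can be rebound to values of different types;
# Python tracks the type of the value, not of the name.
value = 42
first_type = type(value).__name__      # 'int'

value = "forty-two"
second_type = type(value).__name__     # 'str'

value = [4, 2]
third_type = type(value).__name__      # 'list'

print(first_type, second_type, third_type)
```

No declaration is needed at any point: each assignment simply binds the name to a new object.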
4. Python statements
Python uses carriage returns to separate statements, and a colon and indentation to separate code blocks. Most of the compiled programming languages, such as C and C++, use semicolons to separate statements and curly brackets to separate code blocks.
5. == and = operators
Python uses ‘==’ for comparison and ‘=’ for assignment. Python does not support inline assignment, so there’s no chance of accidentally assigning the value when you actually want to compare it.
6. Concatenating strings
You can use ‘+’ to concatenate strings.
>>> print 'kun'+'al'
kunal
7. The __init__ method
The __init__ method is run as soon as an object of a class is instantiated. The method is useful to do any initialization you want to do with your object. The __init__ method is analogous to a constructor in C++, C# or Java.
Example:
class Person:
    def __init__(self, name):
        self.name = name
    def sayHi(self):
        print 'Hello, my name is', self.name
p = Person('Kunal')
p.sayHi()
Output:
[~/src/python $:] python initmethod.py
Hello, my name is Kunal
8. Modules
To keep your programs manageable as they grow in size, you may want to break them up into several files. Python allows you to put multiple function definitions into a file and use them as a module that can be imported into other scripts and programs. These files must have a .py extension.
Example:
# file my_function.py
def minmax(a,b):
    if a <= b:
        min, max = a, b
    else:
        min, max = b, a
    return min, max
Module Usage
import my_function
x,y = my_function.minmax(25, 6.3)
9. Module defined names
The built-in function ‘dir()’ can be used to find out which names a module defines. It returns a sorted list of strings.
Example:
>>> import time
>>> dir(time)
['__doc__', '__file__', '__name__', '__package__', 'accept2dyear', 'altzone', 'asctime', 'clock', 'ctime', 'daylight', 'gmtime', 'localtime', 'mktime', 'sleep', 'strftime', 'strptime', 'struct_time', 'time', 'timezone', 'tzname', 'tzset']
10. Module internal documentation
You can see the internal documentation (if available) of a module name by looking at .__doc__.
Example:
>>> import time
>>> print time.clock.__doc__
clock() -> floating point number
This example returns the CPU time or real time since the start of the process or since the first call to clock(). This has as much precision as the system records.
11. Passing arguments to a Python script
Python lets you access whatever you have passed to a script while calling it. The ‘command line’ content is stored in the sys.argv list.
import sys
print sys.argv
12. Loading modules or commands at startup
You can load predefined modules or commands at the startup of any Python script by using the environment variable $PYTHONSTARTUP. You can set the environment variable $PYTHONSTARTUP to a file which contains the instructions to load the necessary modules or commands.
13. Converting a string to date object
You can use the function ‘DateTime’ to convert a string to a date object.
Example:
from DateTime import DateTime
dateobj = DateTime(string)
14. Converting a list to a string for display
You can convert a list to string in either of the following ways.
1st method:
>>> mylist = ['spam', 'ham', 'eggs']
>>> print ', '.join(mylist)
spam, ham, eggs
2nd method:
>>> print '\n'.join(mylist)
spam
ham
eggs
15. Tab completion in Python interpreter
You can achieve auto completion inside the Python interpreter by adding these lines to your .pythonrc file (or your file for Python to read on startup):
import rlcompleter, readline
readline.parse_and_bind('tab: complete')
This will make Python complete partially typed function, method and variable names when you press the Tab key.
16. Python documentation tool
You can pop up a graphical interface for searching the Python documentation using the command:
$ pydoc -g
You will need the python-tk package for this to work.
17. Python
documentation server
You can start an HTTP server on the given port on
the local machine. This will give you a nice-looking
access to all Python documentation, including
third-party module documentation.
$ pydoc -p
18. Python development
software
There are plenty of tools to help with Python
development. Here are a few important ones:
IDLE: The Python built-in IDE, with
autocompletion, function signature popup help,
and file editing.
IPython: Another enhanced Python shell with
tab-completion and other features.
Eric3: A GUI Python IDE with autocompletion,
class browser, built-in shell and debugger.
WingIDE: Commercial Python IDE with
free licence available to open-source
developers everywhere.
Built-in
modules
19. Executing functions
at the time of Python
interpreter termination
You can use ‘atexit’ module to execute functions
at the time of Python interpreter termination.
Example:
def sum():
    print(4+5)
def message():
    print("Executing Now")
import atexit
atexit.register(sum)
atexit.register(message)
Output:
Executing Now
9
20. Converting from integer
to binary, hexadecimal
and octal
Python provides easy-to-use functions – bin(), hex() and oct() – to convert from integer to binary, hexadecimal and octal format respectively.
Example:
>>> bin(24)
'0b11000'
>>> hex(24)
'0x18'
>>> oct(24)
'030'
21. Converting any
charset to UTF-8
You can use the following function to convert any
charset to UTF-8.
data.decode("input_charset_here").encode('utf-8')
22. Removing
duplicates from lists
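If you want to remove duplicates from a list, just put every element into a dict as a key (for example with None as the value) and then check dict.keys().
from operator import setitem
def distinct(l):
    d = {}
    # In Python 2, map() pads the shorter (empty) list with None,
    # so every element of l becomes a key of d with the value None
    map(setitem, (d,)*len(l), l, [])
    return d.keys()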
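The same dict-as-a-set idea can be sketched more directly with dict.fromkeys(). This version is an illustration rather than the original tip's code, and it assumes Python 3.7 or later, where dicts keep insertion order.

```python
def distinct(items):
    """Return the unique items, keeping the first occurrence of each."""
    # dict.fromkeys() makes every item a key (with the value None);
    # duplicate keys collapse, and insertion order is preserved.
    return list(dict.fromkeys(items))

unique = distinct(['spam', 'ham', 'spam', 'eggs', 'ham'])
print(unique)
```

If the order of the result does not matter, list(set(items)) achieves the same thing.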
23. Do-while loops
Since Python has no do-while or do-until loop constructs (yet), you can use the following method to achieve similar results:
while True:
    do_something()
    if condition():
        break
24. Detecting system platform
To execute platform-specific functions, it is very useful to detect the platform on which the Python interpreter is running. You can use ‘sys.platform’ to find out the current platform.
Example:
On Ubuntu Linux
>>> import sys
>>> sys.platform
'linux2'
On Mac OS X Snow Leopard
>>> import sys
>>> sys.platform
'darwin'
25. Disabling and enabling garbage collection
Sometimes you may want to enable or disable the garbage collector at runtime. You can use the ‘gc’ module to enable or disable the garbage collection.
Example:
>>> import gc
>>> gc.enable()
>>> gc.disable()
26. Using C-based modules for better performance
Many Python modules ship with counterpart C modules. Using these C modules will give a significant performance boost in complex applications.
Example:
cPickle instead of Pickle, cStringIO instead of StringIO.
27. Calculating maximum, minimum and sum out of any list or iterable
You can use the following built-in functions.
max: Returns the largest element in the list.
min: Returns the smallest element in the list.
sum: This function returns the sum of all elements in the list. It accepts an optional second argument: the value to start with when summing (defaults to 0).
28. Representing fractional numbers
A Fraction instance can be created using the following constructor:
Fraction([numerator [,denominator]])
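As a short illustration of the Fraction constructor from the standard ‘fractions’ module (the variable names below are only for the example):

```python
from fractions import Fraction

half = Fraction(1, 2)           # numerator, denominator
third = Fraction(1, 3)
total = half + third            # exact arithmetic, no rounding error

from_string = Fraction("3/4")   # a string form is accepted too
as_float = float(Fraction(1, 4))

print(total, from_string, as_float)
```

Unlike floats, fractions stay exact through additions and multiplications, which is useful when rounding errors must not accumulate.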
29. Performing
math operations
The ‘math’ module provides a plethora of
mathematical functions. These work on integer
and float numbers, except complex numbers.
For complex numbers, a separate module is
used, called ‘cmath’.
For example:
math.acos(x): Return arc cosine of
x.
math.cos(x): Returns cosine of x.
math.factorial(x) : Returns x
factorial.
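A quick sketch of the functions listed above, plus one call into ‘cmath’ to show where complex numbers come in. The sample values are arbitrary.

```python
import cmath
import math

five_factorial = math.factorial(5)   # 5 * 4 * 3 * 2 * 1
cos_zero = math.cos(0.0)             # cosine of 0 radians
acos_one = math.acos(1.0)            # arc cosine of 1

# math.sqrt(-1) would raise ValueError; cmath returns a complex result
imaginary_unit = cmath.sqrt(-1)

print(five_factorial, cos_zero, acos_one, imaginary_unit)
```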
30. Working with arrays
The ‘array’ module provides an efficient way to
use arrays in your programs. The ‘array’ module
defines the following type:
array(typecode [, initializer])
Once you have created an array object, say
myarray, you can apply a bunch of methods to it.
Here are a few important ones:
myarray.count(x): Returns the number of occurrences of x in the array.
myarray.extend(x): Appends the items of the iterable x to the end of the array.
myarray.reverse(): Reverse the
order of the array.
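The methods above can be tried out as follows. This is only a sketch: the typecode 'i' (signed integer) and the sample values are assumptions chosen for the example.

```python
from array import array

# 'i' is the typecode for signed integers
myarray = array('i', [1, 2, 2, 3])

twos = myarray.count(2)      # how many times 2 occurs
myarray.extend([4, 5])       # append every item of the iterable
myarray.reverse()            # reverse in place

as_list = myarray.tolist()
print(twos, as_list)
```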
31. Sorting items
The ‘bisect’ module makes it very easy to keep lists in sorted order without re-sorting them after each insertion. You can use the following functions to insert items into an already sorted list.
bisect.insort(list, item [, low [,
high]])
Inserts item into list in sorted order. If item is
already in the list, the new entry is inserted to
the right of any existing entries.
bisect.insort_left(list, item [, low
[, high]])
Inserts item into list in sorted order. If item is
already in the list, the new entry is inserted to
the left of any existing entries.
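To make the left/right difference visible, the sketch below inserts the float 2.0 into lists that already contain the equal integer 2. The list names are illustrative only.

```python
import bisect

right_biased = [1, 2, 3]
bisect.insort(right_biased, 2.0)       # lands to the right of the existing 2

left_biased = [1, 2, 3]
bisect.insort_left(left_biased, 2.0)   # lands to the left of the existing 2

right_types = [type(x).__name__ for x in right_biased]
left_types = [type(x).__name__ for x in left_biased]
print(right_biased, left_biased)
```

Both lists stay sorted; only the position among equal items differs.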
32. Using regular
expression-based
search
The ‘re’ module makes it very easy to use regexp-based searches. You can use the function ‘re.search()’ with a regexp-based expression. Check out the example below.
Example:
>>> import re
>>> s = "Kunal is a bad boy"
>>> if re.search("K", s): print "Match!" # char literal
...
Match!
>>> if re.search("[@A-Z]", s): print "Match!" # char class
... # match either at-sign or capital letter
Match!
>>> if re.search("\d", s): print "Match!" # digits class
...
33. Working with bzip2 (.bz2)
compression format
You can use the module ‘bz2’ to read and write
data using the bzip2 compression algorithm.
bz2.compress() : For bz2
compression
bz2.decompress() : For bz2
decompression
Example:
# File: bz2-example.py
import bz2
MESSAGE = "Kunal is a bad boy"
compressed_message = bz2.compress(MESSAGE)
decompressed_message = bz2.decompress(compressed_message)
print "original:", repr(MESSAGE)
print "compressed message:", repr(compressed_message)
print "decompressed message:", repr(decompressed_message)
Output:
[~/src/python $:] python bz2-example.py
original: ‘Kunal is a bad boy’
compressed message: ‘BZh91AY&SY\xc4\
x0fG\x98\x00\x00\x02\x15\x80@\x00\
x00\x084%\x8a \x00”\x00\x0c\x84\r\
x03C\xa2\xb0\xd6s\xa5\xb3\x19\x00\
xf8\xbb\x92)\xc2\x84\x86 z<\xc0’
decompressed message: ‘Kunal is a
bad boy’
34. Using SQLite database
with Python
SQLite is fast becoming a very popular embedded
database because of its zero configuration
needed, and superior levels of performance. You
can use the module ‘sqlite3’ in order to work with
SQLite databases.
Example:
>>> import sqlite3
>>> connection = sqlite3.connect('test.db')
>>> curs = connection.cursor()
>>> curs.execute('''create table item
... (id integer primary key, itemno text unique,
... scancode text, descr text, price real)''')
35. Working with zip files
You can use the module ‘zipfile’ to work with
zip files.
zipfile.ZipFile(filename [, mode [, compression [,allowZip64]]])
Open a zip file, where the file can be either a path to a file (a string) or a file-like object.
ZipFile.close()
Close the archive file. You must call ‘close()’ before exiting your program or essential records will not be written.
ZipFile.extract(member[, path[, pwd]])
Extract a member from the archive to the current working directory; ‘member’ must be its full name or a ZipInfo object. Its file information is extracted as accurately as possible. ‘path’ specifies a different directory to extract to.
‘pwd’ is the password used for encrypted files.
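Putting these calls together, here is a minimal round trip: write one member, then list and extract it. The file names and the temporary directory are assumptions made for the example, and the ‘with’ statement is used so that close() is called automatically.

```python
import os
import shutil
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, 'example.zip')

# Create an archive containing a single text member
with zipfile.ZipFile(archive_path, 'w') as archive:
    archive.writestr('notes.txt', 'hello from the archive')

# Re-open it, list the members and extract one into workdir
with zipfile.ZipFile(archive_path) as archive:
    names = archive.namelist()
    extracted_path = archive.extract('notes.txt', path=workdir)

with open(extracted_path) as handle:
    contents = handle.read()

shutil.rmtree(workdir)   # clean up the temporary directory
print(names, contents)
```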
36. Using UNIX-style
wildcards to search
for filenames
You can use the module ‘glob’ to find all the
pathnames matching a pattern according to the
rules used by the UNIX shell. *, ?, and character
ranges expressed with [ ] will be matched.
Example:
>>> import glob
>>> glob.glob(‘./[0-9].*’)
[‘./1.gif’, ‘./2.txt’]
>>> glob.glob(‘*.gif’)
[‘1.gif’, ‘card.gif’]
>>> glob.glob(‘?.gif’)
[‘1.gif’]
37. Performing basic file
operations (copy, delete
and rename)
You can use the module ‘shutil’ to perform basic
file operation at a high level. This module works
with your regular files and so will not work with
special files like named pipes, block devices, and
so on.
shutil.copy(src,dst)
Copies the file src to the file or directory dst.
shutil.copymode(src,dst)
Copies the file permissions from src to dst.
shutil.move(src,dst)
Moves a file or directory to dst.
shutil.copytree(src, dst [, symlinks [, ignore]])
Recursively copy an entire directory at src.
shutil.rmtree(path [, ignore_errors
[, onerror]])
Deletes an entire directory.
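A small sketch of copy, move (which also serves as rename) and rmtree working together. It runs inside a throwaway temporary directory, and all file names are made up for the example; it assumes Python 3, where copy() and move() return the destination path.

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'report.txt')
with open(src, 'w') as handle:
    handle.write('draft')

# Copy the file, then rename the copy by moving it
copy_path = shutil.copy(src, os.path.join(workdir, 'report_copy.txt'))
final_path = shutil.move(copy_path, os.path.join(workdir, 'report_final.txt'))

remaining = sorted(os.listdir(workdir))

# Delete the whole directory tree, files included
shutil.rmtree(workdir)
print(remaining)
```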
38. Executing UNIX
commands from Python
You can use module commands to execute UNIX
commands. This is not available in Python 3 –
instead you need to use the module ‘subprocess’.
Example:
>>> import commands
>>> commands.getoutput(‘ls’)
‘bz2-example.py\ntest.py’
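For Python 3, a rough equivalent using ‘subprocess’ might look like the sketch below. To stay portable it runs the Python interpreter itself rather than ‘ls’; that choice is an assumption for the example, and on a UNIX system you could pass ['ls'] instead.

```python
import subprocess
import sys

# Run a child process and capture its standard output as text.
# The command is given as a list: program first, then its arguments.
output = subprocess.check_output(
    [sys.executable, '-c', "print('hello')"],
    universal_newlines=True,
)
print(output.strip())
```

check_output() raises CalledProcessError if the command exits with a non-zero status, so failures do not pass silently.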
39. Reading environment
variables
You can use the module ‘os’ to gather operating-system-specific information:
Example:
>>> import os
>>> os.path
>>> os.environ
{'LANG': 'en_IN', 'TERM': 'xterm-color', 'SHELL': '/bin/bash', 'LESSCLOSE': '/usr/bin/lesspipe %s %s',
'XDG_SESSION_COOKIE': '925c4644597c791c704656354adf56d61257673132.347986-1177792325',
'SHLVL': '1', 'SSH_TTY': '/dev/pts/2', 'PWD': '/home/kunal',
‘LESSOPEN’: ‘| /usr/bin
lesspipe
......}
>>> os.name
‘posix’
>>> os.linesep
‘\n’
40. Sending email
You can use the module ‘smtplib’ to send email
using an SMTP (Simple Mail Transfer Protocol)
client interface.
smtplib.SMTP([host [, port]])
Example (send an email using Google Mail
SMTP server):
import smtplib
# Use your own to and from email address
fromaddr = 'kunaldeo@gmail.com'
toaddrs = 'toemail@gmail.com'
msg = 'I am a Python geek. Here is the proof.!'
# Credentials
# Use your own Google Mail credentials while running the program
username = 'kunaldeo@gmail.com'
password = 'xxxxxxxx'
# The actual mail send
server = smtplib.SMTP('smtp.gmail.com:587')
# Google Mail uses secure connection for SMTP connections
server.starttls()
server.login(username,password)
server.sendmail(fromaddr, toaddrs, msg)
server.quit()
41. Accessing
FTP server
‘ftplib’ is a fully fledged client FTP module for
Python. To establish an FTP connection, you
can use the following function:
ftplib.FTP([host [, user [, passwd
[, acct [, timeout]]]]])
Example:
host = "ftp.redhat.com"
username = "anonymous"
password = "kunaldeo@gmail.com"
import ftplib
import urllib2
ftp_serv = ftplib.FTP(host,username,password)
# Download the file
u = urllib2.urlopen("ftp://ftp.redhat.com/pub/redhat/linux/README")
# Print the file contents
print (u.read())
Output:
[~/src/python $:] python ftpclient.py
Older versions of Red Hat Linux have been moved to the following location: ftp://archive.download.redhat.com/pub/redhat/linux/
42. Launching a webpage
with the default web
browser
The ‘webbrowser’ module provides a convenient
way to launch webpages using the default
web browser.
Example (launch google.co.uk with system’s default web browser):
>>> import webbrowser
>>> webbrowser.open('http://google.co.uk')
True
43. Creating secure hashes
The ‘hashlib’ module supports a plethora of
secure hash algorithms including SHA1, SHA224,
SHA256, SHA384, SHA512 and MD5.
Example (create hex digest of the given text):
>>> import hashlib
# sha1 Digest
>>> hashlib.sha1("MI6 Classified Information 007").hexdigest()
'e224b1543f229cc0cb935a1eb959318ba1b20c85'
# sha224 Digest
>>> hashlib.sha224("MI6 Classified Information 007").hexdigest()
'3d01e2f741000b0224084482f905e9b7b977a59b480990ea8355e2c0'
# sha256 Digest
>>> hashlib.sha256("MI6 Classified Information 007").hexdigest()
'2fdde5733f5d47b672fcb39725991c89b2550707cbf4c6403efdb33b1c19825e'
# sha384 Digest
>>> hashlib.sha384("MI6 Classified Information 007").hexdigest()
'5c4914160f03dfbd19e14d3ec1e74bd8b99dc192edc138aaf7682800982488daaf540be9e0e50fc3d3a65c8b6353572d'
# sha512 Digest
>>> hashlib.sha512("MI6 Classified Information 007").hexdigest()
'a704ac3dbef6e8234578482a31d5ad29d252c822d1f4973f49b850222edcc0a29bb890778aea807a0a48ee4ff8bb18566140667fbaf73a1dc1ff192febc713d2'
# MD5 Digest
>>> hashlib.md5("MI6 Classified Information 007").hexdigest()
'8e2f1c52ac146f1a999a670c826f7126'
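The calls above pass a Python 2 string straight to the hash constructor. On Python 3 the hash functions only accept bytes, so a sketch of the same idea would encode the text first; the choice of UTF-8 here is an assumption.

```python
import hashlib

text = 'MI6 Classified Information 007'

# Python 3 hash objects require bytes, so encode the string first
data = text.encode('utf-8')

sha256_digest = hashlib.sha256(data).hexdigest()
md5_digest = hashlib.md5(data).hexdigest()

# A hex digest has two characters per byte of hash output
print(len(sha256_digest), len(md5_digest))
```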
44. Seeding random
numbers
You can use the module ‘random’ to generate
a wide variety of random numbers. The most
used one is ‘random.seed([x])’. It initialises
the basic random number generator. If x is
omitted or None, current system time is used;
current system time is also used to initialise the
generator when the module is first imported.
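Seeding with a fixed value makes the sequence repeatable, which is handy for tests and simulations. A brief sketch, with the seed 42 chosen arbitrarily:

```python
import random

random.seed(42)
first_run = [random.randint(1, 100) for _ in range(5)]

# Re-seeding with the same value replays the same sequence
random.seed(42)
second_run = [random.randint(1, 100) for _ in range(5)]

print(first_run == second_run)
```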
45. Working with CSV
(comma-separated
values) files
CSV files are very popular for data exchange over
the web. Using the module ‘csv’, you can read and
write CSV files.
Example:
import csv
# write stocks data as comma-separated values
writer = csv.writer(open('stocks.csv', 'wb', buffering=0))
writer.writerows([
    ('GOOG', 'Google, Inc.', 505.24, 0.47, 0.09),
    ('YHOO', 'Yahoo! Inc.', 27.38, 0.33, 1.22),
    ('CNET', 'CNET Networks, Inc.', 8.62, -0.13, -1.49)
])
# read stocks data, print status messages
stocks = csv.reader(open('stocks.csv', 'rb'))
status_labels = {-1: 'down', 0: 'unchanged', 1: 'up'}
for ticker, name, price, change, pct in stocks:
    status = status_labels[cmp(float(change), 0.0)]
    print '%s is %s (%s%%)' % (name, status, pct)
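The example above relies on Python 2 features (the cmp() built-in and binary-mode files). A hedged Python 3 sketch of the same round trip follows; it writes to an in-memory buffer instead of stocks.csv, and the helper status_for() is a hypothetical stand-in for the cmp()-based lookup.

```python
import csv
import io

# An in-memory text buffer stands in for the stocks.csv file
buffer = io.StringIO(newline='')
writer = csv.writer(buffer)
writer.writerows([
    ('GOOG', 'Google, Inc.', 505.24, 0.47, 0.09),
    ('YHOO', 'Yahoo! Inc.', 27.38, 0.33, 1.22),
    ('CNET', 'CNET Networks, Inc.', 8.62, -0.13, -1.49),
])

def status_for(change):
    """Replace the Python 2 cmp() lookup with explicit comparisons."""
    if change > 0:
        return 'up'
    if change < 0:
        return 'down'
    return 'unchanged'

buffer.seek(0)
messages = []
for ticker, name, price, change, pct in csv.reader(buffer):
    # csv.reader returns every field as a string
    messages.append('%s is %s (%s%%)' % (name, status_for(float(change)), pct))

print('\n'.join(messages))
```

Note that names containing commas are quoted on writing and restored intact on reading.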
46. Logging to system log
You can use the module ‘syslog’ to write to system log. ‘syslog’ acts as an interface to UNIX syslog library routines.
Example:
import syslog
syslog.syslog('mygeekapp: started logging')
for a in ['a', 'b', 'c']:
    b = 'mygeekapp: I found letter '+a
    syslog.syslog(b)
syslog.syslog('mygeekapp: the script goes to sleep now, bye,bye!')
Output:
$ python mylog.py
$ tail -f /var/log/messages
Nov 8 17:14:34 ubuntu -- MARK --
Nov 8 17:22:34 ubuntu python: mygeekapp: started logging
Nov 8 17:22:34 ubuntu python: mygeekapp: I found letter a
Nov 8 17:22:34 ubuntu python: mygeekapp: I found letter b
Nov 8 17:22:34 ubuntu python: mygeekapp: I found letter c
Nov 8 17:22:34 ubuntu python: mygeekapp: the script goes to sleep now, bye,bye!
Third-party modules
47. Installing third-party modules using setuptools
‘setuptools’ is a Python package which lets you download, build, install, upgrade and uninstall packages very easily.
To use ‘setuptools’ you will need to install it from your distribution’s package manager. After installation you can use the command ‘easy_install’ to perform Python package management tasks.
Example (installing simplejson using setuptools):
kunal@ubuntu:~$ sudo easy_install simplejson
Searching for simplejson
Reading http://pypi.python.org/simple/simplejson/
Reading http://undefined.org/python/#simplejson
Best match: simplejson 2.0.9
Downloading http://pypi.python.org/packages/source/s/simplejson/simplejson-2.0.9.tar.gz#md5=af5e67a39ca3408563411d357e6d5e47
Processing simplejson-2.0.9.tar.gz
Running simplejson-2.0.9/setup.py -q bdist_egg --dist-dir /tmp/easy_install-FiyfNL/simplejson-2.0.9/egg-dist-tmp-3YwsGV
Adding simplejson 2.0.9 to easy-install.pth file
Installed /usr/local/lib/python2.6/dist-packages/simplejson-2.0.9-py2.6-linux-i686.egg
Processing dependencies for simplejson
Finished processing dependencies for simplejson
48. Generating PDF
documents
‘ReportLab’ is a very popular module for PDF
generation from Python.
Perform the following steps to install ReportLab
$ wget http://www.reportlab.org/ftp/ReportLab_2_3.tar.gz
$ tar xvfz ReportLab_2_3.tar.gz
$ cd ReportLab_2_3
$ sudo python setup.py install
For a successful installation, you should see a
similar message:
############SUMMARY INFO###########
###################################
#Attempting install of _rl_accel, sgmlop
& pyHnj
#extensions from ‘/home/kunal/python/
ReportLab_2_3/src/rl_addons/rl_accel’
###################################
#Attempting install of _renderPM
#extensions from ‘/home/kunal/python/
ReportLab_2_3/src/rl_addons/renderPM’
# installing with freetype version 21
###################################
Example:
>>> from reportlab.pdfgen.canvas import Canvas
# Select the canvas of letter page size
>>> from reportlab.lib.pagesizes import letter
>>> pdf = Canvas("bond.pdf", pagesize = letter)
# import units
>>> from reportlab.lib.units import cm, mm, inch, pica
>>> pdf.setFont("Courier", 60)
>>> pdf.setFillColorRGB(1, 0, 0)
>>> pdf.drawCentredString(letter[0] / 2, inch * 6, "MI6 CLASSIFIED")
>>> pdf.setFont("Courier", 40)
>>> pdf.drawCentredString(letter[0] / 2, inch * 5, "For 007's Eyes Only")
# Close the drawing for current page
>>> pdf.showPage()
# Save the pdf page
>>> pdf.save()
Output:
[Image: PDF Output (pdf.png)]
49. Using Twitter API
You can connect to Twitter using the ‘Python-Twitter’ module.
Perform the following steps to install Python-Twitter:
$ wget http://python-twitter.googlecode.com/files/python-twitter-0.6.tar.gz
$ tar xvfz python-twitter*
$ cd python-twitter*
$ sudo python setup.py install
Example (fetching followers list):
>>> import twitter
# Use your own twitter account here
>>> mytwi = twitter.Api(username='kunaldeo',password='xxxxxx')
>>> friends = mytwi.GetFriends()
>>> print [u.name for u in friends]
[u'Matt Legend Gemmell', u'jono wells', u'The MDN Big Blog', u'Manish Mandal', u'iH8sn0w', u'IndianVideoGamer.com', u'FakeAaron Hillegass', u'ChaosCode', u'nileshp', u'Frank Jennings', ...]
50. Doing Yahoo! news search
You can use the Yahoo! search SDK to access Yahoo! search APIs from Python.
Perform the following steps to install it:
$ wget http://developer.yahoo.com/download/files/yws-2.12.zip
$ unzip yws*
$ cd yws*/Python/pYsearch*/
$ sudo python setup.py install
Example:
# Importing news search API
>>> from yahoo.search.news import NewsSearch
>>> srch = NewsSearch('YahooDemo', query='London')
# Fetch Results
>>> info = srch.parse_results()
>>> info.total_results_available
41640
>>> info.total_results_returned
10
>>> for result in info.results:
...     print "'%s', from %s" % (result['Title'], result['NewsSource'])
...
'Afghan Handover to Be Planned at London Conference, Brown Says', from Bloomberg
.................
Work with Python
Create dynamic templates: Use Jinja, Flask and more
Extensions for XBMC: Enhance XBMC with this tutorial
Scientific computing: Get to grips with NumPy
Instant messaging: Get chatting using Python
Replace your shell: Use Python for your primary shell
Python for system admins: How Python helps system administration
Scrape Wikipedia: Use Beautiful Soup to read offline
“Enhance your systems with Python’s powerful capabilities”
Creating dynamic templates
with Flask, Jinja2 and Twitter
Create a dynamic webpage with Twitter and Flask’s rendering
engine, Jinja2
Resources
Python 2.7+
Flask 0.10.0: flask.pocoo.org
Flask GitHub:
github.com/mitsuhiko/flask
A Twitter account
Your favourite text editor
Code downloaded from FileSilo
The template uses a loop to generate a list of Twitter tweets
Python and Flask are a great combination when you’re looking to handle the Twitter OAuth process and build requests to obtain tokens. We’ve used Twitter here because of the large amount of easily digestible data available on it. However, since Twitter adheres to the standards set out by OAuth 1.0, the code we’ve used to sign and build requests can be modified to work with any third-party API using the same standard without a great deal of work. For years PHP has been a mainstay of template generation, but now with well-documented frameworks such as Flask, Sinatra and Handlebars, the ability to use powerful scripting languages greatly improves our ability to make great web services. Here, we’re going to use Python, Flask and its templating engine to display tweets. Flask comes with the super-nifty Jinja2 templating engine. If you’re familiar with Node.js or front-end JavaScript, the syntax will look very similar to the Handlebars rendering engine. But, before we dive into that, we need to organise some of the example code that we’re using for this.
01
Rearranging our code
Server code can get messy and unmaintainable quickly, so the first thing we’re going to do is move our helper functions to another file and import them into our project, much like you would a module. This way, it will be clear which functions are our server logic and endpoints and which are generic Python functions. Open the TwitterAuthentication file downloaded from FileSilo (stored under Twitter OAuth files) and locate the getParameters, sign_request and create_oauth_headers functions. Cut and paste them into a new file called helpers.py in the root of your project folder. At the top of this file we want to import some libraries.
import urllib, collections, hmac, binascii, time, random, string
Now we can head back over to server.py and import the helper functions back into our project. We do this by simply calling import helpers. Because Python is smart, it will look in the current directory for a helpers.py file before it looks for a system module. Now every function included in helpers.py is accessible to our project. All we need to do to call them is prepend the methods we called before with helpers.function_name and it will execute. For sign_request, we’ll need to pass our oauth_secret and consumer_secret for each request rather than accessing it from the session. Adjust the function declaration like so:
def sign_request(parameters, method, baseURL, consumer_secret, oauth_secret):
02
server.py modules
With a lot of the modules needed in this project having been moved to helpers.py, we can now remove most of them from server.py. If we amend our first import statement to be…
from hashlib import sha1
import urllib2, time, random, json
…our project will continue to function as it did before. Note the addition of the json module: we’ll be using that later as we start handling Twitter data.
Having Flask use a rendering engine is super-simple. Flask comes packaged with the Jinja2 template rendering engine, so we’ve nothing to install – we just need to import the package into the project. We can do this by adding render_template to the end of our from flask import […] statement.
03
Our first template
Now that we have a rendering engine,
we need to create some templates for it to
use. In the root of our project’s folder, create
a new folder called templates. Whenever
we try to render a template, Flask will look in
this folder for the template specified. To get
to grips with templating, we’ll rewrite some
of our authentication logic to use a template,
rather than manually requesting endpoints. In
templates, create an index.html file. You can
treat this HTML file like any other – included in
the resources for this tutorial is an index.html
that includes all of the necessary head tags and
declarations for this file.
Fig 01
{% if session['oauth_token'] != "" %}
{% endif %}
The BSD-licensed Flask is easy to set up and use – check out the website for more info
04
Rendering our template
In server.py, let’s create a route for ‘/’ to
handle the authorisation process.
@app.route('/')
def home():
    if not 'oauth_token' in session:
        session.clear()
        session['oauth_secret'] = ''
        session['oauth_token'] = ''
    return render_template('index.html')
It’s a simple function: all we want to do is check
whether or not we have an oauth_token already
and create those properties in the Flask session
so we don’t throw an error if we try to access
it erroneously. In order to send our generated
template in response to the request, we return
render_template(‘index.html’).
05
Template variables
We can choose to send variables to our template with render_template('index.html', variableOne=value, variableTwo=value), but in this instance we don’t need to, as each template has access to the request and session variables.
Open index.html. All code executed in a Flask template is contained within {% %}. As this is our homepage, we want to direct users accordingly, so let’s check if we’ve got an access token (Fig 01).
Between the if, else and endif tags of the template is standard HTML. If we want to include some data – for example, the access token – we can just add {{ session['oauth_token'] }} in the HTML and it will be rendered in the page. Previously, in our /authorised endpoint, we would display the OAuth token that we received from Twitter; however, now that we have a template, we can redirect our users back to our root URL and have a page rendered for us that explains the progress we’ve made.
06
Getting lost
(and then found again)
With every server, some things get misplaced or
people get lost. So how do we handle this? Rather
than defining a route, we can define a handler
that deals with getting lost.
@app.errorhandler(404)
def fourOhFour(error):
    return render_template('fourohfour.html')
If a page or endpoint is requested and triggers a
404, then the fourOhFour function will be fired. In
this case, we’ll generate a template that tells the
user, but we could also redirect to another page or
dump the error message.
07
Let’s get some tweets
So now we know how to build templates, let’s grab some tweets to display. In server.py define a new route, get-tweets, like so:
@app.route('/get-tweets')
@app.route('/get-tweets/<count>')
def getTweets(count=0):
You’ll notice that unlike our other authentication endpoints, we’ve made two declarations. The first is a standard route definition: it will intercept and handle the path get-tweets. The second lets us define a parameter that we can use as a value in our getTweets function. By including count=0 in our function declaration, we ensure that there will always be a default value when the function is executed; this way we don’t have to check the value is present before we access it. If a value is included in the URL, it will override the value in the function. The string inside the angle brackets (<count>) determines the name of the variable. If you want the variable passed to the function to have a specific type, you can include a converter with the variable name. For example, if we wanted to make sure that <count> was always an integer instead of a float or string, we’d define our route like so:
@app.route('/get-tweets/<int:count>')
09
Checking our session and building our request
Before we start grabbing tweets, we want to run a quick check to make sure we have the necessary credentials and, if not, redirect the user back to the authorisation flow. We can do this by having Flask respond to the request with a redirection header, like so:
10
Signing and sending our request
We’ve built our parameters, so let’s sign our request and then add the signature to the parameters (Fig 03).
Before we create the authorisation headers, we need to remove the count and user_id values from the tweetRequestParams dictionary, otherwise the signature we just created won’t be valid for the request. We can achieve this with the del keyword. Unlike our token requests, this request is a GET request, so instead of including the parameters in the request body, we define them as query parameters.
?count=tweetRequestParams['count']&user_id=tweetRequestParams['user_id']
Static files
Pretty much every webpage uses
JavaScript, CSS and images, but where do we
keep them? With Flask we can define a folder
for use with static content. For Flask, we create
a static folder in the root of our project and
access files by calling /static/css/styles.css or
/static/js/core.js. The tutorial resources include a
CSS file for styling this project.
08
Now we know how to build templates,
let’s grab some tweets to display
if session[‘oauth_token’] == “” or
session[‘oauth_secret’] == “”:
return redirect(rootURL)
Assuming we have all we need, we can start to
build the parameters for our request (Fig 02).
You’ll notice that the nonce value is different
from that in our previous requests. Where the
nonce value in our authenticate and authorise
requests can be any random arrangement of
characters that uniquely identify the request,
for all subsequent requests the nonce needs
to be a 32-character hexadecimal string using
only the characters a-f. If we add the following
function to our helpers.py file, we can quickly
build one for each request.
def nonce(size=32, chars=”abcdef” +
string.digits):
return ‘’.join(random.choice
(chars) for x in range(size))
11
Handling Twitter’s response
Now we’re ready to fire off the request
and we should get a JSON response back
from Twitter. This is where we’ll use the json
module we imported earlier. By using the
json.loads function, we can parse the JSON
into a dictionary that we can access and we’ll
pass through to our tweets.html template.
tweetResponse = json.loads(httpResponse.read())
return render_template('tweets.html', data=tweetResponse)
Whereas before, we accessed the session
to get data into our template, this time
we’re explicitly passing a value through to
our template.
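To get a feel for what json.loads hands over to the template, here is a minimal sketch that runs without Twitter. The sample string is made up for illustration and only mimics the general shape of a timeline response (a list of tweets, each with some text and a nested user object); it is not real API output.

```python
import json

# Hypothetical payload: the field names mimic a timeline response,
# but the values are invented for this example.
sample = (
    '[{"text": "Hello Flask", '
    '"user": {"screen_name": "lud", '
    '"profile_image_url": "http://example.com/a.png"}}]'
)

# A JSON array becomes a Python list; each JSON object becomes a dict.
tweets = json.loads(sample)

for tweet in tweets:
    print(tweet["user"]["screen_name"] + ": " + tweet["text"])
```

Note that a JSON array is parsed into a list of dictionaries rather than a single dictionary, which is why a loop is the natural way to walk through the result.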
12
Displaying our tweets
Let’s create that template now, exactly
the same as index.html but this time, instead of
using a conditional, we’re going to create a loop
to generate a list of tweets we’ve received.
First, we check that we actually received
some data from our request to Twitter. If we’ve
got something to render, we’re ready to work
through it, otherwise we’ll just print that we
didn’t get anything.
Once again, any template logic that we want
to use to generate our page is included between
{% %}. This time we’re creating a loop; inside the
loop we’ll be able to access any property we have
of that object and print it out. In this template
we’re going to create an
element for each
tweet we received and display the user’s profile
picture and text of the tweet (Fig 04).
In our template we can access properties
using either dot notation (.) or with square
brackets ([]). They behave largely the same;
the . notation will check for an attribute on
the dictionary or object first and then for an
item with the same name, whereas the []
notation will look for an item first and then
for an attribute. If neither lookup finds the name
specified, it will return undefined. If this occurs,
the template will not throw an error, it will simply
print an empty string. Keep this in mind if your
template does not render the expected data:
you’ve probably just mis-defined the property
you’re trying to access.
Unlike traditional Python, we need to
tell the template where the for loop and if/
else statements end, so we do that with
{% endfor %} and {% endif %}.
13
Flask filters
Sometimes, when parsing from JSON,
Python can generate erroneous characters
that don’t render particularly well in HTML.
You may notice that after tweet['text'] there
is |forceescape. This is an example of a Flask
filter; it allows us to affect the input before we
render – in this case it’s escaping our values
for us. There are many different built-in filters
that come included with Flask, courtesy of its
Jinja2 template engine. Your
advisor recommends a full reading of all the
potential options.
14
Wrapping up
That’s pretty much it for templating
with Flask. As we’ve seen, it’s insanely quick
and easy to build and deploy dynamic sites.
Flask is a great tool for any Python developer
looking to run a web service. Although we’ve
used Twitter to demonstrate Flask’s power,
all of the techniques described can be used
with any third-party service or database
resource. Flask can work with other rendering
engines, such as Handlebars (which is
superb), but Jinja2 still needs to be present
to run Flask and conflicts can occur between
the two engines. With such great integration
between Flask and Jinja2, it makes little
sense to use another engine outside of very
specific circumstances.
tweetRequestParams = {
    "oauth_consumer_key" : consumer_key,
    "oauth_nonce" : helpers.nonce(32),
    "oauth_signature_method" : "HMAC-SHA1",
    "oauth_timestamp" : int(time.time()),
    "oauth_version" : "1.0",
    "oauth_token" : session['oauth_token'],
    "user_id" : session['user_id'],
    "count" : str(count)
}
Fig 02
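Fig 02 relies on helpers.nonce(32). The helper can be tried on its own, as in the sketch below, which simply repeats the function from the tutorial and checks its output. One caveat worth stating as an assumption about your needs: the random module is not designed for security-sensitive values, so treat this as a demonstration rather than a hardened implementation.

```python
import random
import string


def nonce(size=32, chars="abcdef" + string.digits):
    # Pick `size` characters at random from the hexadecimal alphabet.
    return "".join(random.choice(chars) for _ in range(size))


value = nonce()
print(len(value))                                    # length of the nonce
print(all(c in "0123456789abcdef" for c in value))   # only hex characters?
```

Because the value is random, the string itself changes on every call; only its length and alphabet are fixed.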
tweetRequest = helpers.sign_request(tweetRequestParams, "GET",
    "https://api.twitter.com/1.1/statuses/user_timeline.json", consumer_secret,
    session['oauth_secret'])
tweetRequestParams["oauth_signature"] = tweetRequest
makeRequest = urllib2.Request(
    "https://api.twitter.com/1.1/statuses/user_timeline.json?count="
    + tweetRequestParams['count'] + "&user_id=" + tweetRequestParams['user_id'])
# count and user_id travel in the query string, so drop them before building the header
del tweetRequestParams['user_id'], tweetRequestParams['count']
makeRequest.add_header("Authorization",
    helpers.create_oauth_headers(tweetRequestParams))
try:
    httpResponse = urllib2.urlopen(makeRequest)
except urllib2.HTTPError as e:
    return e.read()
Fig 03
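Fig 03 glues the query string together by hand. As an alternative sketch (not the approach used in the tutorial code), the standard library's urlencode can build the same string and takes care of escaping. The count and user_id values below are placeholders standing in for the entries of tweetRequestParams.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2, as used in the tutorial

base = "https://api.twitter.com/1.1/statuses/user_timeline.json"

# Placeholder values; in the tutorial these come from tweetRequestParams.
query = [("count", "5"), ("user_id", "12345")]

# A list of pairs keeps the parameters in a predictable order.
url = base + "?" + urlencode(query)
print(url)
```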
{% if data %}
Fig 04
{% endif %}
Make extensions for
XBMC with Python
[Screenshot callouts: Rating (only available for hosted plug-ins); Current media selection; List of installed plug-ins; Configure launcher; Localised description string; Opens changelog for the plug-in]
Python is the world’s most popular easy-to-use open source
language. Learn how to use it to build your own features for
XBMC, the world’s favourite FOSS media centre
Resources
XBMC: www.xbmc.org/download
Python 2.7.x
Python IDE (optional)
Code on FileSilo
XBMC is perhaps the most important thing that
has ever happened in the open source media
centre space. It started its life on the original
Xbox videogames console and since then it has
become the de facto software for multimedia
aficionados. It also has been forked into many
other successful media centre applications such
as Boxee and Plex. XBMC has ultimately grown
into a very powerful open source application with
a solid community behind it. It supports almost
all major platforms, including different hardware
architectures. It is available for Linux, Windows,
Mac OS X, Android, iOS and Raspberry Pi.
In these pages we will learn to build extensions
for XBMC. Extensions are a way of adding
features to XBMC without having to learn the
core of XBMC or alter that core in any way. One
additional advantage is that XBMC uses Python
as its scripting language, and this can be also
used to build the extensions. This really helps
new developers get involved in the project since
Python is easy to learn compared to languages
like C/C++ (from which the core of XBMC is made).
XBMC supports various types of extensions (or
Add-ons): Plugins, Programs and Skins. Plugins
add features to XBMC. Depending on the type
of feature, a plug-in will appear in the relevant
media section of XBMC. For example, a YouTube
plug-in would appear in the Videos section.
Scripts/Programs are like mini-applications for
XBMC. They appear in the Programs section.
Skins are important since XBMC is a completely
customisable application – you can change
the look and feel of just about every facet of
the package.
Depending upon which category your
extension fits, you will have to create the
extension directory accordingly. For example…
Plug-ins:
plugin.audio.ludaudi: An audio plug-in
plugin.video.ludvidi: A video plug-in
script.xxx.xxx: A program
In this tutorial we will build an XBMC plug-in
called LUD Entertainer. This plug-in will provide a
nice way to watch videos from Reddit from within
XBMC. Our plug-in will show various content such
as trailers and documentaries from Reddit. We’ll
also allow our users to add their own Subreddit.
Each video can then be categorised as Hot, New,
Top, Controversial etc. With this plug-in we will
demonstrate how easy it is to hook into XBMC’s
built-in methods to achieve a very high-quality
user experience.
Due to space limitations, we aren’t able to print
the full code here. We recommend downloading
the complete code from FileSilo.
01
Preparing the directory structure
As we have mentioned previously, each
XBMC extension type follows a certain directory
naming convention. In this case we are building
a video plug-in, so the plug-in directory name
would be plugin.video.ludent. But that’s just the
root directory name – we will need several other
folders and files as well.
The following describes the directory structure of
LUD Linux Entertainer:
plugin.video.ludent – Root Plugin directory
|-- addon.xml
|-- changelog.txt
|-- default.py
|-- icon.png
|-- LICENSE.txt
|-- README
`-- resources
    |-- lib
    `-- settings.xml
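If you would rather script the layout than create it by hand, the following sketch builds the same tree with empty placeholder files. The create_skeleton helper is our own invention for illustration and is not part of XBMC; it writes into a temporary directory here so nothing in your real .xbmc folder is touched.

```python
import os
import tempfile

# Files and folders taken from the directory listing above.
FILES = [
    "addon.xml", "changelog.txt", "default.py", "icon.png",
    "LICENSE.txt", "README", os.path.join("resources", "settings.xml"),
]
DIRS = [os.path.join("resources", "lib")]


def create_skeleton(parent, name="plugin.video.ludent"):
    """Create an empty plug-in tree under `parent` and return its root."""
    root = os.path.join(parent, name)
    for directory in DIRS:
        # makedirs also creates the root and resources folders on the way.
        os.makedirs(os.path.join(root, directory))
    for filename in FILES:
        # Empty placeholders only; icon.png still needs a real image later.
        open(os.path.join(root, filename), "a").close()
    return root


root = create_skeleton(tempfile.mkdtemp())
print(sorted(os.listdir(root)))
```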
02
Creating addon.xml
An addon.xml file needs to be created in
the root of the extension directory. The addon.xml
file contains the primary metadata about an XBMC
extension. It contains overview, credits, version
information and dependencies information about
the extension.
The root element of addon.xml is the <addon>
element. It is defined as:
<addon id="plugin.video.ludent" name="…" version="…">
rest of the content is placed here
</addon>
Here, id is the identifier for the plug-in, so
it should be unique among all the XBMC
extensions, and id is also used for the directory
name; version tells XBMC the extension
version number, which helps in its ability to
deliver automatic updates – XBMC follows the
Major.Minor.Patch versioning convention; name is
the English title of the plug-in.
Note: Steps 3 to 5 cover entries that need to be
added within the addon.xml file.
03
Adding dependency information
Dependency inside an extension is
managed using the <requires> element.
<requires>
<import addon="xbmc.python" version="2.1.0"/>
</requires>
In the above code we have added a dependency
to a library called xbmc.python version
2.1. Currently it is added as a mandatory
dependency. To make the dependency
optional you will need to add optional="true";
eg <import addon="…" version="…" optional="true"/>
In the above example we have added core
dependency xbmc.python to 2.1.0 because it’s
the version shipped with XBMC version Frodo
12.0 and 12.1. If you were to add xbmc.python
to 2.0 then it would only work in XBMC Eden 11.0
and not in the latest version.
For the current version of XBMC 12.1, the
following versions of core XBMC components
are shipped:
xbmc.python 2.1.0
xbmc.gui 4.0.0
xbmc.json 6.0.0
xbmc.metadata 2.1.0
xbmc.addon 12.0.0
In addition to xbmc.python we are also adding
some third-party plug-ins as dependencies,
such as plugin.video.youtube. These plug-ins
will be installed automatically when we install
plugin.video.ludent.
04
Setting up the provider and
entry point
Our extension is supposed to provide the video
content for XBMC. In order to convey that, we
have to set up the following element:
<extension point="xbmc.python.pluginsource" library="default.py">
<provides>video</provides>
</extension>
Here, the library attribute sets up the plug-in
entry point. In this example default.py will be
executed when the user activates the plug-in.
The <provides> element sets up the media
type it provides. This also gets reflected in the
placement of the plug-in. Since ours is a video
plug-in, it will show up in the Videos section
of XBMC.
05
Setting up plug-in metadata
Metadata about the plug-in is provided in
<extension point="xbmc.addon.metadata">. The
following are the important elements…
<platform>: Most of the time, XBMC extensions
are cross-platform compatible. However, if you
depend on a native platform library that is only
available on certain platforms then you will need
to set the supported platforms here. Accepted
values for the platform are: all, linux, osx, osx32,
osx64, ios (Apple iOS), windx (Windows DirectX),
wingl (Windows OpenGL) and android.
<summary lang="en">: This gives a brief
description of the plug-in. Our example sets the
language attribute as English, but you can use
other languages too.
<description>: A detailed description of the
plug-in.
<website>: Webpage where the plug-in is hosted.
<source>: Source code repository URL. If you are
hosting your plug-in on GitHub, you can mention
the repository URL here.
<forum>: Discussion forum URL for your plug-in.
<email>: Author email. You can type the email
address directly or obfuscate it against harvesting
bots, like max at domain dot com.
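To see how steps 2 to 5 fit together, here is a rough sketch that assembles an addon.xml with Python's ElementTree module (written for Python 3). Treat the values as assumptions: the version number, provider name and summary text are placeholders we made up, and only the id, the xbmc.python dependency, the default.py entry point and the video media type come from the tutorial text.

```python
import xml.etree.ElementTree as ET

# Root element: id and name follow the tutorial; version and
# provider-name are hypothetical placeholders.
addon = ET.Element("addon", {
    "id": "plugin.video.ludent",
    "name": "LUD Entertainer",
    "version": "0.0.1",
    "provider-name": "example",
})

# Step 3: dependencies.
requires = ET.SubElement(addon, "requires")
ET.SubElement(requires, "import", {"addon": "xbmc.python", "version": "2.1.0"})

# Step 4: entry point and the media type provided.
source = ET.SubElement(addon, "extension", {
    "point": "xbmc.python.pluginsource",
    "library": "default.py",
})
ET.SubElement(source, "provides").text = "video"

# Step 5: metadata (summary text is a placeholder).
meta = ET.SubElement(addon, "extension", {"point": "xbmc.addon.metadata"})
ET.SubElement(meta, "platform").text = "all"
ET.SubElement(meta, "summary", {"lang": "en"}).text = "Watch Reddit videos"

xml_text = ET.tostring(addon, encoding="unicode")
print(xml_text)
```

Generating the file this way is optional; writing addon.xml by hand in a text editor works just as well.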
06
Setting changelog, icon, fanart
and licence
We need a few additional files in the plug-in
directory…
changelog.txt: You should list the changes made
to your plug-in between releases. The changelog
is visible from the XBMC UI.
An example changelog:
0.0.1
- Initial Release
0.0.2
- Fixed Video Buffering Issue
icon.png: This will represent the plug-in in the
XBMC UI. It needs to be a non-transparent PNG
file of size 256x256.
fanart.jpg (optional): The fanart.jpg is rendered
in the background if a user selects the plug-in
in XBMC. The art needs to be rendered in HDTV
formats, so its size can range from 1280x720
(720p) up to the maximum 1920x1080 (1080p).
LICENSE.txt: This file contains the licence of
the distributed plug-in. The XBMC project
recommends the use of the Creative Commons
Attribution-ShareAlike 3.0 licence for skins,
and GPL 2.0 for add-ons. However, most of the
copyleft licences can be used.
Note: For the purpose of packaging, extensions/
add-ons/themes/plug-ins are the same.
07
Providing settings for the plug-in
Settings can be provided by the file
resources/settings.xml. These are great for
user-configurable options.
Partial: resources/settings.xml
Here, label defines the language id string which
will then be used to display the label. id defines
the name which will be used for programmatic
access. type defines the data type you want
to collect; it also affects the UI which will be
displayed for the element. default defines the
default value for the setting. You should always
use a default value wherever possible to provide a
better user experience.
The following are a few important settings
types that you can use…
text: Used for basic string inputs.
ipaddress: Used to collect internet addresses.
number: Allows you to enter a number. XBMC will
also provide an on-screen numeric keyboard for
the input.
slider: This provides an elegant way to collect
integer, float and percentage values. You can get
the slider setting in the following format:
In the above example we are creating a slider with
min range 1, max range 10 and step as 1. In the
option field we are stating the data type we are
interested in – we can also set option to "float"
or "percent".
bool: Provides bool selection in the form of on
or off.
file: Provides a way to input file paths. XBMC will
provide a file browser to make the selection of file.
If you are looking to make selection for a specific
type of file you can use audio, video, image or
executable instead of file.
folder: Provides a way to browse for a folder…
Example:
Here, source sets the start location for the
folder, while option sets the write parameter for
the application.
sep & lsep: sep is used to draw a horizontal line
in the setting dialog; lsep is used for drawing
a horizontal line with text. They do not collect
any input but are there for building better user
interface elements…
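The setting types above can be sketched as a small settings.xml, generated here with ElementTree (Python 3). Several details are assumptions rather than facts from the tutorial: the label ids 30011 and 30012 are invented, the defaults are arbitrary, and the range attribute is written as "min,step,max" on the understanding that this is how XBMC expects a slider range. The setting ids filter and filterRating match the names read later in default.py.

```python
import xml.etree.ElementTree as ET

settings = ET.Element("settings")

# bool: an on/off switch (label id 30011 is hypothetical).
ET.SubElement(settings, "setting", {
    "id": "filter", "type": "bool", "label": "30011", "default": "false",
})

# sep: a plain horizontal line, collects no input.
ET.SubElement(settings, "setting", {"type": "sep"})

# slider: min 1, step 1, max 10, integer values
# (assumed "min,step,max" ordering; label id 30012 is hypothetical).
ET.SubElement(settings, "setting", {
    "id": "filterRating", "type": "slider", "label": "30012",
    "range": "1,1,10", "option": "int", "default": "5",
})

print(ET.tostring(settings, encoding="unicode"))
```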
08
Language support
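Language support is provided in
the form of the strings.xml file located in
resources/language/[language name]. This
approach is very similar to many large software
projects, including Android, where static strings
are never used.
resources/language/English/strings.xml
example:
<strings>
<string id="30001">Add subreddit</string>
<string id="30002">Hot</string>
<string id="30003">New</string>
<string id="30004">Top</string>
<string id="30005">Controversial</string>
<string id="30006">Hour</string>
<string id="30007">Day</string>
<string id="30008">Week</string>
<string id="30009">Month</string>
<string id="30010">Year</string>
</strings>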
As you may have seen in the settings.xml
example, all the labels are referring to string
ids. You can have many other languages as
well. Depending upon the language XBMC is
running in, the correct language file will be
loaded automatically.
Post XBMC Frodo (12.1), strings.xml will be
deprecated. Post Frodo, XBMC will be moved
to a GNU gettext-based translation system;
gettext uses PO files. You can use a tool called
xbmc-xml2po to convert strings.xml into
equivalent PO files.
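To make the id-to-text mapping concrete, this sketch loads a strings.xml fragment into a dictionary using the standard library. The load_strings helper is our own name, not an XBMC function, and the three ids shown are the ones that the tutorial's calls to translation() rely on.

```python
import xml.etree.ElementTree as ET

# A cut-down strings.xml; ids match those used by translation() in default.py.
STRINGS_XML = """<strings>
    <string id="30001">Add subreddit</string>
    <string id="30002">Hot</string>
    <string id="30003">New</string>
</strings>"""


def load_strings(xml_text):
    """Return a dict mapping numeric string ids to their text."""
    root = ET.fromstring(xml_text)
    return {int(node.get("id")): node.text for node in root.findall("string")}


strings = load_strings(STRINGS_XML)
print(strings[30002])
```

Inside a plug-in you would not parse the file yourself; XBMC does the lookup for you and picks the right language automatically.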
09
Building default.py
Since our plug-in is small, it will all be
contained inside default.py. If you are developing
a more complex add-on then you can create
supporting files in the same directory. If your
library depends upon third-party libraries, you
have two ways to go about it. You can either place
the third-party libraries into the resources/lib
folder; or bundle the library itself into a plug-in,
then add that plug-in as the dependency in the
addon.xml file.
Our plug-in works with reddit.tv. This is the
website from Reddit which contains trending
videos shared by its readers. Videos posted on
Reddit are actually sourced from YouTube, Vimeo
and Dailymotion.
We will be starting off default.py using the
following imports:
import urllib
import urllib2
…
import xbmcplugin
import xbmcgui
import xbmcaddon
Apart from xbmcplugin, xbmcgui and
xbmcaddon, the rest are all modules from the
standard Python library, so there is nothing to
fetch from PyPI (Python Package Index). You will
not need to install any library yourself since the
Python runtime for XBMC has all the components built in.
urllib and urllib2 help in HTTP communication.
socket is used for network I/O; re is used
for regular expression matching; sqlite3 is
the Python module for accessing an SQLite
embedded database; xbmcplugin, xbmcgui and
xbmcaddon contain the XBMC-specific routines.
10
Initialising
During the initialisation process, we will
be reading various settings from settings.xml.
Settings can be read in the following way:
addon = xbmcaddon.Addon()
filterRating = int(addon.getSetting("filterRating"))
filterVoteThreshold = int(addon.getSetting("filterVoteThreshold"))
In order to read settings of type bool you will need
to do something like:
filter = addon.getSetting("filter") == "true"
We are also setting the main URL, plug-in handle
and the user agent for it:
pluginhandle = int(sys.argv[1])
urlMain = "http://www.reddit.com"
userAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0"
opener = urllib2.build_opener()
opener.addheaders = [('User-Agent', userAgent)]
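The conversions above hint that getSetting hands back text, which is why the code wraps it in int() or compares it with "true". A small sketch of defensive conversion helpers follows; as_bool and as_int are hypothetical names of our own, and the raw_settings dictionary merely stands in for values that would normally come from addon.getSetting().

```python
def as_bool(raw):
    # Settings arrive as text, so compare against the literal "true".
    return raw == "true"


def as_int(raw, default=0):
    # Fall back to a default if the setting is empty or not a number.
    try:
        return int(raw)
    except (TypeError, ValueError):
        return default


# Stand-in for addon.getSetting(...); the values are made up.
raw_settings = {"filter": "true", "filterRating": "50", "filterVoteThreshold": ""}

print(as_bool(raw_settings["filter"]))
print(as_int(raw_settings["filterRating"]))
print(as_int(raw_settings["filterVoteThreshold"]))
```

The fallback matters because int("") raises a ValueError, which would otherwise stop the plug-in before it draws anything.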
11
Reading localised strings
As mentioned, XBMC uses strings.xml to
serve up the text. In order to read those strings,
you will need to use getLocalizedString.
translation = addon.getLocalizedString
translation(30002)
In this example, translation(30002) will
return the string "Hot" when it is running in an
English environment.
idFile  idPath  strFilename       playCount  lastPlayed        dateAdded
1       1       plugin://plugin.             2013-08-06 23:47
2       2       plugin://plugin.  1          2013-08-07 22:42
3       2       plugin://plugin.  1          2013-08-08 00:09
4       2       plugin://plugin.  1          2013-08-08 00:55
5       2       plugin://plugin.  1          2013-08-08 00:58
12
Building helper functions
In this step we will look at some of the
important helper functions.
getDbPath(): This returns the location of the
SQLite database file for videos. XBMC stores
library and playback information in SQLite DB
files. There are separate databases for videos
and music, located inside the .xbmc/userdata/
Database folder. We are concerned with the
videos DB. It is prefixed with ‘MyVideos’…
def getDbPath():
    path = xbmc.translatePath("special://userdata/Database")
    files = os.listdir(path)
    latest = ""
    for file in files:
        # Keep the highest-sorting MyVideos*.db file
        if file[:8] == 'MyVideos' and file[-3:] == '.db':
            if file > latest:
                latest = file
    return os.path.join(path, latest)
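getDbPath() picks the newest database by comparing filenames as strings. That works while the version numbers have the same number of digits, but a plain string comparison would rank a single-digit version above a two-digit one. The sketch below is a variation of our own (not the tutorial's code) that compares the numeric part instead; the filenames are examples only.

```python
import re


def latest_videos_db(filenames):
    """Return the MyVideos<N>.db name with the highest numeric N."""
    best_name, best_version = "", -1
    for name in filenames:
        match = re.match(r"MyVideos(\d+)\.db$", name)
        if match and int(match.group(1)) > best_version:
            best_name, best_version = name, int(match.group(1))
    return best_name


# Example directory listing (hypothetical filenames).
names = ["MyMusic32.db", "MyVideos75.db", "MyVideos9.db", "Textures6.db"]
print(latest_videos_db(names))
```

In practice the version numbers in a given Database folder usually have the same width, so the simpler comparison in getDbPath() is normally good enough.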
getPlayCount(url): Once we have the database
location, we can get the play count using a
simple SQL query. The MyVideos database
contains a table called files, which keeps a
record of all the video files played in XBMC by
filename. In this case it will be the URL.
dbPath = getDbPath()
conn = sqlite3.connect(dbPath)
c = conn.cursor()
def getPlayCount(url):
    c.execute('SELECT playCount FROM files WHERE strFilename=?', [url])
    result = c.fetchone()
    if result:
        result = result[0]
        if result:
            return int(result)
        return 0   # listed, but never played to the end
    return -1      # not in the table at all
The above table is an example of a files table.
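You can experiment with the play-count query without touching XBMC's real database. This sketch creates an in-memory SQLite table with the same column names as the files table and runs the same SELECT against it; the rows and plugin:// URLs are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# Same column names as the files table; the data is made up.
c.execute(
    "CREATE TABLE files (idFile INTEGER PRIMARY KEY, idPath INTEGER, "
    "strFilename TEXT, playCount INTEGER, lastPlayed TEXT, dateAdded TEXT)"
)
c.executemany(
    "INSERT INTO files (idPath, strFilename, playCount) VALUES (?, ?, ?)",
    [(1, "plugin://example/a", None), (2, "plugin://example/b", 1)],
)


def get_play_count(url):
    c.execute("SELECT playCount FROM files WHERE strFilename=?", [url])
    row = c.fetchone()
    if row is None:
        return -1                      # URL is not in the table
    return int(row[0]) if row[0] else 0  # NULL play count counts as 0


print(get_play_count("plugin://example/b"))
print(get_play_count("plugin://example/a"))
print(get_play_count("plugin://example/missing"))
```

The ? placeholder lets SQLite handle quoting, so URLs containing quotes or ampersands cannot break the query.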
addSubreddit(): Our plug-in allows users to add
their own Subreddit. This function takes the
Subreddit input from the user, then saves it in
the subreddits file inside the addon data folder.
The following sets the subreddits file location:
subredditsFile = xbmc.translatePath("special://profile/addon_data/"+addonID+"/subreddits")
# this translates into .xbmc/userdata/addon_data/plugin.video.ludent/subreddits
def addSubreddit():
    keyboard = xbmc.Keyboard('', translation(30001))
    keyboard.doModal()
    if keyboard.isConfirmed() and keyboard.getText():
        subreddit = keyboard.getText()
        fh = open(subredditsFile, 'a')
        fh.write(subreddit+'\n')
        fh.close()
This function also demonstrates how to take
a text input from the user. Here we are calling
the Keyboard function with a text title. Once the
user confirms a non-empty input, it is appended to the
subreddits file followed by a newline character.
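The file handling in addSubreddit() can be tested away from XBMC. In this sketch, add_subreddit and read_subreddits are our own stand-in helpers and a temporary directory replaces the addon data folder; the on-screen keyboard is left out entirely.

```python
import os
import tempfile


def add_subreddit(path, name):
    # Append one subreddit per line, as the plug-in does.
    with open(path, "a") as fh:
        fh.write(name + "\n")


def read_subreddits(path):
    # A missing file simply means nothing has been added yet.
    if not os.path.exists(path):
        return []
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


path = os.path.join(tempfile.mkdtemp(), "subreddits")
add_subreddit(path, "linux")
add_subreddit(path, "python")
print(read_subreddits(path))
```

Using a with block closes the file even if the write fails, which the open/close pair in the tutorial code does not guarantee.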
getYoutubeUrl(id): When we locate a YouTube
URL to play, we pass it on to the YouTube plug-in
(plugin.video.youtube) to handle the playback. To
do so, we need to call it in a certain format…
def getYoutubeUrl(id):
    url = "plugin://plugin.video.youtube/?path=/root/video&action=play_video&videoid=" + id
    return url
Similarly for Vimeo:
def getVimeoUrl(id):
    url = "plugin://plugin.video.vimeo/?path=/root/video&action=play_video&videoid=" + id
    return url
And for Dailymotion:
def getDailyMotionUrl(id):
    url = "plugin://plugin.video.dailymotion_com/?url=" + id + "&mode=playVideo"
    return url
Once we have the video URL resolved into the
respective plug-in, playing it is very simple:
def playVideo(url):
    listitem = xbmcgui.ListItem(path=url)
    xbmcplugin.setResolvedUrl(pluginhandle, True, listitem)
13
Populating plug-in content listing
xbmcplugin contains various routines
for handling the content listing inside the
plug-in’s UI. The first step is to create directory
entries which can be selected from the XBMC
UI. For this we will use a function called
xbmcplugin.addDirectoryItem.
For our convenience we will be abstracting
addDirectoryItem to suit it to our purpose, so
that we can set name, URL, mode, icon image
and type easily.
def addDir(name, url, mode, iconimage, type=""):
    u = sys.argv[0]+"?url="+urllib.quote_plus(url)+"&mode="+str(mode)+"&type="+str(type)
    ok = True
    liz = xbmcgui.ListItem(name, iconImage="DefaultFolder.png",
                           thumbnailImage=iconimage)
    liz.setInfo(type="Video", infoLabels={"Title": name})
    ok = xbmcplugin.addDirectoryItem(handle=int(sys.argv[1]), url=u,
                                     listitem=liz, isFolder=True)
    return ok
On the same lines, we can build a function to
place links as well…
def addLink(name, url, mode, iconimage, description, date):
    u = sys.argv[0]+"?url="+urllib.quote_plus(url)+"&mode="+str(mode)
    ok = True
    liz = xbmcgui.ListItem(name, iconImage="DefaultVideo.png",
                           thumbnailImage=iconimage)
    liz.setInfo(type="Video", infoLabels={"Title": name, "Plot": description,
                                          "Aired": date})
    liz.setProperty('IsPlayable', 'true')
    ok = xbmcplugin.addDirectoryItem(handle=int(sys.argv[1]), url=u,
                                     listitem=liz)
    return ok
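Both helpers pack their arguments into a plugin:// URL. The sketch below shows that round trip on its own: building the URL the way addDir does and turning the query string back into a dictionary. build_plugin_url and parse_params are our own names, and the assumption (not stated in the text above) is that XBMC hands the query string back to the script when the user selects the entry.

```python
try:
    from urllib.parse import quote_plus, parse_qsl   # Python 3
except ImportError:
    from urllib import quote_plus                    # Python 2
    from urlparse import parse_qsl


def build_plugin_url(base, url, mode, kind=""):
    # Same layout as the `u` string assembled in addDir().
    return base + "?url=" + quote_plus(url) + "&mode=" + str(mode) + "&type=" + str(kind)


def parse_params(query):
    # Strip the leading "?" and decode the key/value pairs.
    return dict(parse_qsl(query.lstrip("?")))


u = build_plugin_url("plugin://plugin.video.ludent/", "r/videos", "listSorting")
print(u)

params = parse_params(u.split("?", 1)[1])
print(params["url"], params["mode"])
```

quote_plus escapes the slash in "r/videos", which is what keeps the nested URL from being confused with the plug-in's own path.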
Based on the abstractions we have just created,
we can create the base functions which will
populate the content. But before we do that,
let’s first understand how Reddit works. Most of
the Reddit content filters are provided through
something called Subreddits. This allows you to
view discussions related to a particular topic. In
our plug-in we are interested in showing videos;
we also want to show trailers, documentaries
etc. We access these using Subreddits. For
example, for trailers it would be reddit.com/r/
trailers. For domains we can use /domain; for
example, to get all the YouTube videos posted
on Reddit, we will call reddit.com/domain/
youtube.com. Now you may ask what is the
guarantee that this Subreddit will only list
videos? The answer is that it may not. For that
reason we scrape the site ourselves to find
videos. More on this in the next step.
The first base function we’ll define is index().
This is called when the user starts the plug-in.
def index():
    defaultEntries = ["videos", "trailers", "documentaries", "music"]
    entries = defaultEntries[:]
    # Add any Subreddits the user has saved
    if os.path.exists(subredditsFile):
        fh = open(subredditsFile, 'r')
        content = fh.read()
        fh.close()
        spl = content.split('\n')
        for i in range(0, len(spl), 1):
            if spl[i]:
                subreddit = spl[i]
                entries.append(subreddit)
    entries.sort()
    for entry in entries:
        if entry in defaultEntries:
            addDir(entry.title(), "r/"+entry, 'listSorting', "")
        else:
            addDirR(entry.title(), "r/"+entry, 'listSorting', "")
    addDir("[ Vimeo.com ]", "domain/vimeo.com", 'listSorting', "")
    addDir("[ Youtu.be ]", "domain/youtu.be", 'listSorting', "")
    addDir("[ Youtube.com ]", "domain/youtube.com", 'listSorting', "")
    addDir("[ Dailymotion.com ]", "domain/dailymotion.com", 'listSorting', "")
    addDir("[B]"+translation(30001)+" -[/B]", "", 'addSubreddit', "")
    xbmcplugin.endOfDirectory(pluginhandle)
Here, the penultimate entry makes a call to
addSubreddit. listSorting takes care of sorting
out the data based on criteria such as Hot,
New etc. It also calls in Reddit’s JSON function,
which returns nice easy-to-parse JSON data.
We have created a settings entry for all the
sorting criteria. Based on what is set, we go
ahead and build out the sorted list.
def listSorting(subreddit):
    if cat_hot:
        addDir(translation(30002),
               urlMain+"/"+subreddit+"/hot/.json?limit=100", 'listVideos', "")
    if cat_new:
        addDir(translation(30003),
               urlMain+"/"+subreddit+"/new/.json?limit=100", 'listVideos', "")
    if cat_top_d:
        addDir(translation(30004)+": "+translation(30007),
               urlMain+"/"+subreddit+"/top/.json?limit=100&t=day", 'listVideos', "")
    xbmcplugin.endOfDirectory(pluginhandle)
def listVideos(url):
    currentUrl = url
    xbmcplugin.setContent(pluginhandle, "episodes")
    content = opener.open(url).read()
    spl = content.split('"content"')
    for i in range(1, len(spl), 1):
        entry = spl[i]
        try:
            match = re.compile('"title": "(.+?)"', re.DOTALL).findall(entry)
            title = match[0].replace("&amp;", "&")
            match = re.compile('"description": "(.+?)"', re.DOTALL).findall(entry)
            description = match[0]
            match = re.compile('"created_utc": (.+?),', re.DOTALL).findall(entry)
            # The lines that set dateTime, date and ups are not printed in this
            # excerpt; see the complete code on FileSilo
            downs = int(match[0].replace("}", ""))
            rating = int(ups*100/(ups+downs))
            if filter and (ups+downs) > filterVoteThreshold and rating < filterRating:
                continue
            title = title+" ("+str(rating)+"%)"
            match = re.compile('"num_comments": (.+?),', re.DOTALL).findall(entry)
            comments = match[0]
            description = dateTime+" | "+str(ups+downs)+" votes: "+str(rating)+"% Up | "+comments+" comments\n"+description
            match = re.compile('"thumbnail_url": "(.+?)"', re.DOTALL).findall(entry)
            thumb = match[0]
            matchYoutube = re.compile('"url": "http://www.youtube.com/watch\\?v=(.+?)"', re.DOTALL).findall(entry)
            matchVimeo = re.compile('"url": "http://vimeo.com/(.+?)"', re.DOTALL).findall(entry)
            url = ""
            if matchYoutube:
                url = getYoutubeUrl(matchYoutube[0])
            elif matchVimeo:
                url = getVimeoUrl(matchVimeo[0].replace("#", ""))
            if url:
                addLink(title, url, 'playVideo', thumb, description, date)
        except:
            # Skip any entry that cannot be parsed
            pass
    match = re.compile('"after": "(.+?)"', re.DOTALL).findall(entry)
    xbmcplugin.endOfDirectory(pluginhandle)
    if forceViewMode:
        xbmc.executebuiltin('Container.SetViewMode('+viewMode+')')
14
Populating the episode view (listing videos)
At this point we have the URL in hand, which returns JSON data; now we need to extract the
data out of it which will make sense to us.
By looking at the JSON data, you can see there’s a lot of interesting information present here. For
example, url is set to youtube.com/watch?v=n4rTztvVx8E; title is set to ‘The Counselor – Official
Trailer’. There are also many other bits of data that we will use, such as ups, downs, num_comments,
thumbnail_url and so on. In order to filter out the data that we need, we will use regular expressions.
There is one more thing to note: since we are not presenting directories any more but are ready to
place content, we have to set the xbmcplugin.setContent to episodes mode.
In the code listed to the left here, we are
opening the URL, then – based on regular
expression matches – we are extracting
the title, description, date, ups,
downs and rating. We are also locating
video thumbnails and then passing them on
to XBMC.
Later in the code, we also try to match the
URL to a video provider. With our plug-in we are
supporting YouTube, Vimeo and Dailymotion.
If this is detected successfully, we call the
helper functions to locate the XBMC
plug-in-based playback URL. During this whole
parsing process, if any exception is raised, the
current item is skipped and the next JSON item
is parsed.
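The extraction step can be tried on a small sample without a network connection. The sketch below applies the tutorial's regular-expression approach to a made-up fragment and, for comparison, reads the same values through the json module, which is an alternative we are suggesting rather than what the plug-in does. The sample is hypothetical: it borrows the title and URL quoted above, while the vote counts and nesting are invented for illustration.

```python
import json
import re

# Hypothetical listing fragment; the numbers are invented.
SAMPLE = (
    '{"data": {"children": [{"data": {'
    '"title": "The Counselor - Official Trailer", '
    '"url": "http://www.youtube.com/watch?v=n4rTztvVx8E", '
    '"ups": 30, "downs": 10, "num_comments": 4}}]}}'
)

# Regular-expression approach, as used in listVideos().
video_id = re.findall(
    r'"url": "http://www\.youtube\.com/watch\?v=(.+?)"', SAMPLE
)[0]
print(video_id)

# The same data via the json module.
post = json.loads(SAMPLE)["data"]["children"][0]["data"]
rating = int(post["ups"] * 100 / (post["ups"] + post["downs"]))
print(post["title"], rating)
```

Parsing the JSON properly avoids surprises when a title contains escaped quotes, at the cost of depending on the exact nesting of the response.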
15
Installing & running the add-on
You can install the add-on using one of
the following two methods:
• You can copy the plug-in directory to
.xbmc/addons.
• You can install the plug-in from the zip file. To
do so, compress the add-on folder into a zip file
using the command:
$ zip -r plugin.video.ludent.zip
plugin.video.ludent
To install the plug-in from the zip file, open
XBMC, go to System then Add-ons, then click
‘Install from zip file’. The benefit of installing
from a zip file is that XBMC will automatically
try to install all the dependent plug-ins as well.
Once you have the plug-in installed, you can
run it by going to the Videos Add-ons section of
XBMC, selecting Get More… and then clicking
on LUD Reddit Viewer.
You can access the settings dialog of the
plug-in by right-clicking the LUD Reddit Viewer,
then selecting ‘Add-on settings’.
So, you have seen how robust and powerful
XBMC’s extension system is. In this example,
we were able to leverage the full power of
Python (including those magical regular
expression matches) from within XBMC.
XBMC itself also offers a robust UI framework,
which provides a very professional look for
our add-on.
As powerful as it may seem, we have only
built a video plug-in. XBMC’s extension system
also provides a framework for building fully
fledged programs (called Programs). We will
cover this in a later issue.
Scientific computing
with NumPy
Powerful calculations with
NumPy, SciPy and Matplotlib
[Screenshot callouts: Matplotlib generated output; A simple Python program for polynomial fitting; Finding help is easy; A Python script that uses SciPy to process an image]
Resources
NumPy: www.numpy.org
SciPy: www.scipy.org
Matplotlib: www.matplotlib.org
NumPy is the primary Python package for
performing scientific computing. It has a
powerful N-dimensional array object, tools
for integrating C/C++ and Fortran code, linear
algebra, Fourier transform, and random
number capabilities, among other things.
NumPy also supports broadcasting, which is
a clever way for universal functions to deal in
a meaningful way with inputs that do not have
exactly the same form.
Apart from its capabilities, the other
advantage of NumPy is that it can be integrated
into Python programs. In other words, you may
get your data from a database, the output of
another program, an external file or an HTML
page and then process it using NumPy.
This article will show you how to install
NumPy, make calculations, plot data, read and
write external files, and it will introduce you to
some Matplotlib and SciPy packages that work
well with NumPy.
NumPy also works with Pygame, a Python
package for creating games, though explaining
its use is beyond the scope of this article.
It is considered good practice to try the
various NumPy commands inside the Python
shell before putting them into Python programs.
The examples in this article use either the
Python shell or IPython.
01
Installing NumPy
Most Linux distributions have a
ready-to-install package you can use. After
installation, you can find out the NumPy version
you are using by executing the following:
$ python
Python 2.7.3 (default, Mar 13 2014, 11:03:55)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or
"license" for more information.
>>> numpy.version.version
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'numpy' is not defined
>>> import numpy
>>> numpy.version.version
'1.6.2'
>>>
Not only have you found the NumPy version but
you also know that NumPy is properly installed.
02
About NumPy
Despite its simple name, NumPy is
a powerful Python package that is mainly for
working with arrays and matrices.
There are many ways to create an array but
the simplest is by using the array() function:
>>> from numpy import array
>>> oneD = array([1,2,3,4])
The aforementioned command creates a
one-dimensional array. If you want to create a
two-dimensional array, you can use the array()
function as follows:
>>> twoD = array([ [1,2,3],
...                [3,3,3],
...                [-1,-0.5,4],
...                [0,1,0]] )
You can also create arrays with more dimensions.
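As a small sketch of the above, the following creates arrays with one, two and three dimensions and inspects them. The variable names are illustrative, and the np alias is an assumption borrowed from the later examples in this article.

```python
import numpy as np

# A one-dimensional and a two-dimensional array, as in the text.
one_d = np.array([1, 2, 3, 4])
two_d = np.array([[1, 2, 3],
                  [3, 3, 3],
                  [-1, -0.5, 4],
                  [0, 1, 0]])

# A three-dimensional array: 2 blocks, each with 3 rows and 4 columns.
three_d = np.zeros((2, 3, 4))

for name, arr in (("one_d", one_d), ("two_d", two_d), ("three_d", three_d)):
    print(name, "dimensions:", arr.ndim, "shape:", arr.shape)
```

Because one element of two_d is -0.5, NumPy stores the whole array as floating-point numbers.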
03
Making simple calculations
using NumPy
Given an array named myArray, you can find
the minimum and maximum values in it by
executing the following commands:
>>> myArray.min()
>>> myArray.max()
Should you wish to find the mean value of all
array elements, run the next command:
>>> myArray.mean()
Similarly, you can find the median of the array
by running the following command:
>>> numpy.median(myArray)
The median of a data set is the value that
divides the sorted data into two subsets (left
and right subsets) with the same number of
elements. If the data set has an odd number of
elements, then the median is part of the data
set. On the other hand, if the data set has an
even number of elements, then the median is
the mean value of the two centre elements of
the sorted data set.
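The two median cases can be checked with a short sketch. The sample values are made up for illustration.

```python
import numpy as np

odd = np.array([7, 1, 5, 3, 9])    # sorted: 1 3 5 7 9
even = np.array([7, 1, 5, 3])      # sorted: 1 3 5 7

smallest, largest, mean = odd.min(), odd.max(), odd.mean()
median_odd = np.median(odd)     # the middle element of the sorted data
median_even = np.median(even)   # the mean of the two centre elements

print(smallest, largest, mean)
print(median_odd, median_even)
```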
04
Using arrays with NumPy
NumPy not only embraces the indexing
methods used in typical Python for strings and
lists but also extends them. If you want to select
a given element from an array, you can use the
following notation:
>>> twoD[1,2]
You can also select a part of an array (a slice)
using the following notation:
>>> twoD[:1,1:3]
Finally, you can convert an array into a Python
list using the tolist() function.
05
Reading files
Imagine that you have just extracted
information from an Apache log file using AWK and
you want to process the text file using NumPy.
The following AWK code finds out the total
number of requests per hour:
$ cat access.log | cut -d[ -f2 | cut -d] -f1 | awk -F: '{print $2}' | sort -n | uniq -c | awk '{print $2, $1}' > timeN.txt
The format of the text file (timeN.txt) with the
data is the following:
00 191
01 225
02 121
03 104
Reading the timeN.txt file and assigning it to a
new array variable can be done as follows:
aa = np.loadtxt("timeN.txt")
06
Writing to files
Writing variables to a file is largely
similar to reading a file. If you have an array
variable named aa1, you can easily save its
contents into a file called aa1.txt by using the
following command:
In [17]: np.savetxt("aa1.txt", aa1)
As you can easily imagine, you can read
the contents of aa1.txt later by using the
loadtxt() function.
07
Common functions
NumPy supports many numerical and
statistical functions. When you apply a function
to an array, the function is automatically applied
to all array elements.
08
Working with matrices
A special subtype of a two-dimensional
NumPy array is a matrix. A matrix is like an
array except that matrix multiplication replaces
element-by-element multiplication. Matrices
are generated using the matrix (or mat) function
as follows:
In [2]: AA = np.mat('0 1 1; 1 1 1; 1 1 1')
You can add two matrices named AA and BB by
typing AA + BB. Similarly, you can multiply them
by typing AA * BB.
When working with matrices, you can find the
inverse of a matrix AA by typing “AA.I”. You can
also find its eigenvalues by typing
“np.linalg.eigvals(AA)”, or both the eigenvalues
and the eigenvectors of a matrix BB by typing
“np.linalg.eig(BB)”.
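The matrix operations described in this step can also be sketched with plain two-dimensional arrays, which the NumPy documentation now recommends over the matrix class. This is an alternative under that assumption, not the article's own listing: the sample values are made up, and the @ operator requires Python 3.5 or later.

```python
import numpy as np

AA = np.array([[2.0, 0.0],
               [0.0, 4.0]])
BB = np.array([[1.0, 2.0],
               [3.0, 4.0]])

total = AA + BB          # element-by-element addition
product = AA @ BB        # matrix multiplication
elementwise = AA * BB    # for plain arrays, * multiplies element by element

inverse = np.linalg.inv(AA)               # plays the role of AA.I
eigenvalues = np.linalg.eigvals(AA)       # eigenvalues only
values, vectors = np.linalg.eig(BB)       # eigenvalues and eigenvectors

print(product)
print(inverse)
print(eigenvalues)
```

Each column of vectors is an eigenvector, so BB @ vectors equals vectors scaled column by column by values.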
09
Plotting with Matplotlib
The first move you should make is to
install Matplotlib. As you can see, Matplotlib has
many dependencies that you should also install.
The first thing you will learn is how to
plot a polynomial function. The necessary
commands for plotting the 3x^2-x+1
polynomial are the following:
import numpy as np
import matplotlib.pyplot as plt
myPoly = np.poly1d(np.array([3, -1, 1]).astype(float))
x = np.linspace(-5, 5, 100)
y = myPoly(x)
plt.xlabel('x values')
plt.ylabel('f(x) values')
# One tick per unit on the x axis (a step of 10 would leave a single tick at -5)
xticks = np.arange(-5, 6, 1)
yticks = np.arange(0, 100, 10)
plt.xticks(xticks)
plt.yticks(yticks)
plt.grid(True)
plt.plot(x,y)
The variable that holds the polynomial
is myPoly. The range of values that will
be plotted for x is defined using “x =
np.linspace(-5, 5, 100)”. The other important
variable is y, which calculates and holds the
values of f(x) for each x value.
It is important that you start ipython using
the “ipython --pylab=qt” parameters in order
to see the output on your screen. If you are
interested in plotting polynomial functions,
you should experiment more, as NumPy can
also calculate the derivatives of a function and
plot multiple functions in the same output.
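As a brief sketch of the derivative calculation mentioned above (leaving the plotting aside), poly1d objects provide a deriv() method. The variable names here are illustrative.

```python
import numpy as np

my_poly = np.poly1d([3.0, -1.0, 1.0])   # 3x^2 - x + 1
derivative = my_poly.deriv()            # 6x - 1

value_at_2 = my_poly(2.0)       # 3*4 - 2 + 1
slope_at_2 = derivative(2.0)    # 6*2 - 1

print(derivative.coeffs)
print(value_at_2, slope_at_2)
```

The derivative is itself a poly1d object, so it can be evaluated over the same x values and passed to plt.plot() alongside the original polynomial.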
In [36]: from scipy.stats import poisson, lognorm
In [37]: mySh = 10;
In [38]: myMu = 10;
In [39]: ln = lognorm(mySh)
In [40]: p = poisson(myMu)
In [41]: ln.rvs((10,))
Out[41]:
array([  9.29393114e-02,   1.15957068e+01,   9.78411983e+01,
         8.26370734e-07,   5.64451441e-03,   4.61744055e-09,
         4.98471222e-06,   1.45947948e+02,   9.25502852e-06,
         5.87353720e-02])
In [42]: p.rvs((10,))
Out[42]: array([12, 11,  9,  9,  9, 10,  9,  4, 13,  8])
In [43]: ln.pdf(3)
Out[43]: 0.013218067177522842
Fig 01
10
About SciPy
SciPy is built on top of NumPy and
is more advanced than NumPy. It supports
numerical integration, optimisations, signal
processing, image and audio processing,
and statistics. The example in Fig. 01 (above)
uses a small part of the scipy.stats package,
which deals with statistics.
The example uses two statistical distributions
(log-normal and Poisson) and may be difficult
to understand even if you know mathematics,
but it is presented in order to give you a
better taste of SciPy commands.
11
Using SciPy for image processing
Now we will show you how to process
and transform a PNG image using SciPy.
The most important part of the code is the
following line:
image = np.array(Image.open('SA.png').convert('L'))
This line allows you to read an ordinary PNG
file and convert it into a NumPy array for
additional processing. The program will
also separate the output into four parts and
display a different image in each of these
four parts.
12
Other useful functions
It is very useful to be able to find out
the data type of the elements in an array; it
can be done using the dtype attribute.
Similarly, the ndim attribute returns the
number of dimensions of an array.
When reading data from external files, you
can save their data columns into separate
variables as follows:
In [10]: aa1,aa2 = np.loadtxt("timeN.txt", usecols=(0,1), unpack=True)
The aforementioned command saves the first
column into variable aa1 and the second column
into variable aa2. The “unpack=True” allows the
data to be assigned to two different variables.
Please note that the numbering of columns in
usecols starts with 0.
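The round trip of writing two columns and reading them back into separate variables can be sketched as follows. A temporary directory stands in for the working directory, and the four sample rows mirror the timeN.txt excerpt shown earlier; both are assumptions made so the sketch runs on its own.

```python
import os
import tempfile

import numpy as np

hours = np.array([0, 1, 2, 3])
requests = np.array([191, 225, 121, 104])

# Write the two columns to a file, one "hour count" pair per line.
path = os.path.join(tempfile.mkdtemp(), "timeN.txt")
np.savetxt(path, np.column_stack((hours, requests)), fmt="%d")

# Read both columns back; unpack=True returns one array per column.
aa1, aa2 = np.loadtxt(path, usecols=(0, 1), unpack=True)

print(aa1.dtype, aa1.ndim)
print(aa1)
print(aa2)
```

Note that loadtxt() returns floating-point values by default, even though the file contains integers.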
13
Fitting to polynomials
The NumPy polyfit() function tries to fit
a set of data points to a polynomial. The data
comes from the timeN.txt file, created
earlier in this article.
The Python script uses a fifth degree
polynomial, but if you want to use a different
degree instead then you only have to change
the following line:
coefficients = np.polyfit(aa1, aa2, 5)
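To see what polyfit() returns, here is a minimal sketch that uses synthetic data instead of timeN.txt: the points are generated from 3x^2 - x + 1, so a degree-two fit should recover those coefficients. The data and the degree are assumptions chosen to make the result easy to check.

```python
import numpy as np

# Synthetic data points that lie exactly on 3x^2 - x + 1.
x = np.linspace(-5, 5, 21)
y = 3 * x**2 - x + 1

# Fit a second-degree polynomial; coefficients come highest power first.
coefficients = np.polyfit(x, y, 2)
fitted = np.poly1d(coefficients)

print(coefficients)
print(fitted(2.0))
```

Wrapping the coefficients in poly1d() gives a callable that can be evaluated or plotted like any other polynomial.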
14
Array broadcasting in NumPy
To close, we will talk more about
array broadcasting because it is a very
useful characteristic. First, you should know
that array broadcasting has a rule: in order
for two arrays to be considered for array
broadcasting, “the size of the trailing axes for
both arrays in an operation must either be the
same size or one of them must be one.”
Put simply, array broadcasting allows
NumPy to “change” the dimensions of an array
by filling it with data in order to be able to do
calculations with another array. Nevertheless,
an axis can only be stretched in this way when
its size is one; two axes whose sizes differ and
are both larger than one cannot be broadcast.
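The rule is easier to see with a small sketch. The arrays below are made up for illustration: a row and a column are each stretched to match a 2x3 table, while a mismatched array is rejected.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])        # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,): trailing axis matches
column = np.array([[100], [200]])   # shape (2, 1): trailing axis is one

plus_row = table + row          # the row is repeated for each table row
plus_column = table + column    # the column is repeated for each table column

mismatch_rejected = False
try:
    table + np.array([1, 2])    # shape (2,): trailing sizes 3 and 2 differ
except ValueError as error:
    mismatch_rejected = True
    print("cannot broadcast:", error)

print(plus_row)
print(plus_column)
```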
[Screenshot annotations: the server notifies all clients when a new client joins; each message has a time stamp prefixed to it; similarly, the server notifies all clients when a client leaves; a client can detect when the server exits without crashing or hanging]
Instant messaging with Python
How to program both the client, complete with a GUI, and
server of a simple instant messenger in Python
Resources
A computer – running your favourite Linux distribution
Internet connection – to access documentation
Python 2.x, PyGTK and GObject – packages installed
Here we’ll be implementing an instant
messenger in Python with a client-server
architecture. This means each client connects
to the server, which relays any message that
one client sends to all other clients. The server
will also notify the other clients when someone
joins or leaves the server. The instant messenger
can work anywhere a TCP socket can: on the
same computer with the loopback interface,
across various computers on a LAN, or even
over the internet if you were to configure your
router correctly. However, our messages aren’t
encrypted, so we wouldn’t recommend that.
Writing an instant messenger is an interesting
technical problem that covers a bunch of
areas that you may not have come across while
programming before:
• We’ll be employing sockets, which are used
to transmit data across networks.
• We’ll also be using threading, which allows a
program to do multiple things at once.
• We’ll cover the basics of writing a simple
graphical user interface with GTK, as well as
how to interact with that from a different thread.
• Finally, we’ll be touching on the use of
regular expressions to easily analyse and extract
data from strings.
Before getting started, you’ll need to have
a Python 2.x interpreter installed, as well as
the PyGTK bindings and the Python 2 GObject
bindings. The chances are that if you have a
system with a fair amount of software on it,
you will already have these packages, so it may
be easier to wait and see if you’re missing any
libraries when you attempt to import them. All of
the above packages are commonly used, so you
should be able to install them using your distro’s
package manager.
01
The server
The server will do the following jobs:
• Listen for new clients
• Notify all clients when a new client joins
• Notify all clients when a client leaves
• Receive and deliver messages to all clients
We’re going to write the server side of the
instant messenger first, as the client requires
it. There will be two code files, so it’s a good
idea to make a folder to keep them inside. You
can create an empty file with the command
touch [filename], and mark that file as
executable using chmod +x [filename]. This
file is now ready to edit in your favourite editor.
[liam@liam-laptop Python]$ mkdir Python-IM
[liam@liam-laptop Python]$ cd Python-IM/
[liam@liam-laptop Python-IM]$ touch IM-Server.py
[liam@liam-laptop Python-IM]$ chmod +x IM-Server.py
02
Starting off
As usual, we need to start off with the
line that tells the program loader what it needs
to interpret the rest of the file with. In your
advisor’s case, that line is:
#!/usr/bin/env python2
On your system, it may need to be changed to
#!/usr/bin/env python2.6 or #!/usr/bin/env python2.7
After that, we’ve written a short comment about
what the application does, and imported the
required libraries. We’ve already mentioned
what the threading and socket libraries are
for. The re library is used for searching strings
with regular expressions. The signal library is
used for dealing with signals that will kill the
program, such as SIGINT. SIGINT is sent when
Ctrl+C is pressed. We handle these signals so
that the program can tell the clients that it’s
exiting rather than dying unexpectedly. The sys
library is used to exit the program. Finally, the
time library is used to put a sensible limit on how
frequently the body of while loops execute.
#!/usr/bin/env python2
# The server side of an instant messaging application. Written as
# part of a Linux User & Developer tutorial by Liam Fraser in 2013.
import threading
import socket
import re
import signal
import sys
import time
03
The Server class
The Server class is the main class of our
instant messenger server. The initialiser of this
class accepts a port number to start listening
for clients on. It then creates a socket, binds the
socket to the specified port on all interfaces,
and then starts to listen on that port. You can
optionally include an IP address in the tuple that
contains the port. Passing in a blank string like
we have done causes it to listen on all interfaces.
The value of 1 passed to the listen function
specifies the maximum number of queued
connections we can accept. This shouldn’t be
a problem as we’re not expecting a bunch of
clients to connect at exactly the same time.
Now that we have a socket, we’ll create an
empty list that will be later used to store a
collection of client sockets that we can echo
messages to. The final part is to tell the signal
library to run the self.signal_handler function,
which we have yet to write, when a SIGINT or
SIGTERM is sent to the application so that we
can tidy up nicely.
class Server():
    def __init__(self, port):
        # Create a socket and bind it to a port
        self.listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.listener.bind(('', port))
        self.listener.listen(1)
        print "Listening on port {0}".format(port)
        # Used to store all of the client sockets we have, for echoing
        # to them
        self.client_sockets = []
        # Run the function self.signal_handler when Ctrl+C is pressed
        signal.signal(signal.SIGINT, self.signal_handler)
        signal.signal(signal.SIGTERM, self.signal_handler)
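The create, bind and listen sequence can be tried on its own with the short sketch below. It is written for Python 3 so that it runs on current interpreters, whereas the article's listings target Python 2. Binding to 127.0.0.1 and to port 0 (which asks the operating system for any free port) are assumptions made so the sketch runs anywhere; the article listens on all interfaces and uses a fixed port.

```python
import socket

# Create a TCP socket, bind it to the loopback interface and start listening.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

address, port = listener.getsockname()
print("Listening on {0}:{1}".format(address, port))

# A client on the same machine can now connect through the loopback interface.
client = socket.create_connection((address, port))
connection, client_address = listener.accept()
print("Accepted connection from {0}".format(client_address))

for sock in (client, connection, listener):
    sock.close()
```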
Useful documentation
Threading: docs.python.org/2/library/threading.html
Sockets: docs.python.org/2/library/socket.html
Regular expressions: docs.python.org/2/library/re.html
The signal handler: docs.python.org/2/library/signal.html
PyGTK: www.pygtk.org/pygtk2reference
GObject: www.pygtk.org/pygtk2reference/gobject-functions.html
04
The server’s main loop
The server’s main loop essentially
accepts new connections from clients,
adds that client’s socket to the collection of
sockets and then starts an instance of the
ClientListener class, which we have yet to
write, in a new thread. Sometimes, defining
interfaces you are going to call before you’ve
written them is good, because it can give an
overview of how the program will work without
worrying about the details.
Note that we’re printing information as we go
along, to make debugging easier should we need
to do it. Sleeping at the end of the loop is useful
to make sure the while loop can’t run quickly
enough to hang the machine. However, this is
unlikely to happen as the line that accepts new
connections is blocking, which means that the
program waits for a connection before moving
on from that line. For this reason, we need to
enclose the line in a try block, so that we can
catch the socket error and exit when we can no
longer accept connections. This will usually be
when we’ve closed the socket during the process
of quitting the program.
    def run(self):
        while True:
            # Listen for clients, and create a ClientListener thread for
            # each new client
            print "Listening for more clients"
            try:
                (client_socket, client_address) = self.listener.accept()
            except socket.error:
                sys.exit("Could not accept any more connections")
            self.client_sockets.append(client_socket)
            print "Starting client thread for {0}".format(client_address)
            client_thread = ClientListener(self, client_socket, client_address)
            client_thread.start()
            time.sleep(0.1)
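The accept-then-hand-off pattern of the main loop can be sketched in a self-contained form. This version is written for Python 3 and is not the article's server: the handle_client() and serve() helpers are hypothetical, the loop stops after a fixed number of clients, and a timeout is set so the sketch cannot block forever.

```python
import socket
import threading


def handle_client(connection, received):
    """Read one message from a client, record it, and close the socket."""
    with connection:
        received.append(connection.recv(1024))


def serve(listener, expected_clients, received):
    """Accept a fixed number of clients, starting one thread per client."""
    workers = []
    for _ in range(expected_clients):
        try:
            connection, _address = listener.accept()
        except OSError:
            break  # the listener timed out or was closed
        worker = threading.Thread(target=handle_client,
                                  args=(connection, received))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
listener.listen(5)
listener.settimeout(5)
port = listener.getsockname()[1]

received = []
server_thread = threading.Thread(target=serve, args=(listener, 2, received))
server_thread.start()

# Two short-lived clients, each sending a single message.
for text in (b"hello", b"world"):
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(text)

server_thread.join()
listener.close()
print(sorted(received))
```

Because accept() blocks until a client arrives, the accepting loop spends almost all of its time waiting rather than spinning.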
05
The echo function
We need a function that can be called
from a client’s thread to echo a message to each
client. This function is pretty simple. The most
important part is that sending data to sockets is
in a try block, which means that we can handle
the exception if the operation fails, rather than
having the program crash.
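    def echo(self, data):
        # Send a message to each socket in self.client_sockets
        print "echoing: {0}".format(data)
        # The loop variable must not be called socket, or it would hide
        # the socket module that the except clause relies on
        for client_socket in self.client_sockets:
            # Try and echo to all clients
            try:
                client_socket.sendall(data)
            except socket.error:
                print "Unable to send message"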
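The value of the try block can be demonstrated with a Python 3 sketch that uses socket.socketpair() to stand in for real client connections. The stand-alone echo() function and its return value are assumptions for illustration; the article's method only prints a message on failure.

```python
import socket


def echo(data, client_sockets):
    """Send data to every socket and return the sockets that failed."""
    failed = []
    for client_socket in client_sockets:
        try:
            client_socket.sendall(data)
        except OSError:  # socket.error is an alias of OSError in Python 3
            failed.append(client_socket)
    return failed


# socketpair() returns two connected sockets: a server end and a client end.
server_a, client_a = socket.socketpair()
server_b, client_b = socket.socketpair()
server_b.close()  # simulate a connection that has already gone away

failed = echo(b"hi\n", [server_a, server_b])
received = client_a.recv(1024)
print(received, len(failed))

for sock in (server_a, client_a, client_b):
    sock.close()
```

One broken connection does not prevent the message from reaching the remaining clients.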
06
Finishing the Server class
The remainder of the Server class is
taken up with a couple of simple functions;
one to remove a socket from the collection of
sockets, which doesn’t need an explanation,
and the signal_handler function that we talked
about in the initialiser of the class. This function
stops listening for new connections, and
unbinds the socket from the port it was listening
on. Finally, we send a message to each client to
let them know that we are exiting. The signal will
continue to close the program as expected once
the signal_handler function has ended.
    def remove_socket(self, socket):
        # Remove the specified socket from the client_sockets list
        self.client_sockets.remove(socket)

    def signal_handler(self, signal, frame):
        # Run when Ctrl+C is pressed
        print "Tidying up"
        # Stop listening for new connections
        self.listener.close()
        # Let each client know we are quitting
        self.echo("QUIT")
07
The client thread
The class that is used to deal with each
client inherits the Thread class. This means
that the class can be created, then started with
client_thread.start(). At this point, the
code in the run function of the class will be run in
the background and the main loop of the Server
class will continue to accept new connections.
We have to start by initialising the Thread base
class, using the built-in super() function. You may have
noticed that when we created a new instance of
the ClientListener class in the server’s main loop,
we passed through the server’s self variable. We
do this because it’s better for each instance of the
ClientListener class to have its own reference to
the server, rather than using the global one that
we’ll create later to actually start the application.
class ClientListener(threading.Thread):
    def __init__(self, server, socket, address):
        # Initialise the Thread base class
        super(ClientListener, self).__init__()
        # Store the values that have been passed to the constructor
        self.server = server
        self.address = address
        self.socket = socket
        self.listening = True
        self.username = "No Username"
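Subclassing Thread can be tried in isolation with the Python 3 sketch below. The Worker class and its attributes are hypothetical; the point is only that start() runs the body of run() in the background and join() waits for it to finish.

```python
import threading


class Worker(threading.Thread):
    def __init__(self, worker_name, results):
        # Initialise the Thread base class first
        super().__init__()
        self.worker_name = worker_name
        self.results = results

    def run(self):
        # This code runs in the new thread once start() is called
        self.results.append("hello from {0}".format(self.worker_name))


results = []
workers = [Worker("client1", results), Worker("client2", results)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(sorted(results))
```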
08
The client thread’s loop
The loop that runs in the client thread
is pretty similar to the one in the server. It keeps
listening for data while self.listening is true,
and passes any data it gets to a handle_msg
function that we will write shortly. The value
passed to the socket.recv function is the size of
the buffer to use while receiving data.
    def run(self):
        # The thread's loop to receive and process messages
        while self.listening:
            data = ""
            try:
                data = self.socket.recv(1024)
            except socket.error:
                print "Unable to receive data"
            self.handle_msg(data)
            time.sleep(0.1)
        # The while loop has ended
        print "Ending client thread for {0}".format(self.address)
09
Tidying up
We need to have a function to tidy up
the thread. We’ll call this either when the client
sends us a blank string (indicating that it’s
stopped listening on the socket) or sends us the
string “QUIT”. When this happens, we’ll echo to
every client that the user has quit.
    def quit(self):
        # Tidy up and end the thread
        self.listening = False
        self.socket.close()
        self.server.remove_socket(self.socket)
        self.server.echo("{0} has quit.\n".format(self.username))
10
Handling messages
There are three possible messages our
clients can send:
• QUIT
• USERNAME user
• Arbitrary string to be echoed to all clients
The client will also send a bunch of empty
messages if the socket has been closed, so we
will end their thread if that happens. The code
should be pretty self-explanatory apart from
the regular expression part. If someone sends
the USERNAME message, then the server tells
every client that a new user has joined. This is
tested with a regular expression. ^ indicates the
start of the string, $ indicates the end, and the
brackets containing .* extract whatever comes
after “USERNAME ”.
    def handle_msg(self, data):
        # Print and then process the message we've just received
        print "{0} sent: {1}".format(self.address, data)
        # Use regular expressions to test for a message like "USERNAME liam"
        username_result = re.search('^USERNAME (.*)$', data)
        if username_result:
            self.username = username_result.group(1)
            self.server.echo("{0} has joined.\n".format(self.username))
        elif data == "QUIT":
            # If the client has sent quit then close this thread
            self.quit()
        elif data == "":
            # The socket at the other end is probably closed
            self.quit()
        else:
            # It's a normal message so echo it to everyone
            self.server.echo(data)
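The message handling can be exercised without any sockets by separating the classification from the side effects. The classify() helper below is a hypothetical sketch in Python 3 that reuses the article's regular expression.

```python
import re

USERNAME_PATTERN = re.compile(r"^USERNAME (.*)$")


def classify(data):
    """Return a (kind, payload) pair describing a message from a client."""
    match = USERNAME_PATTERN.search(data)
    if match:
        # group(1) is whatever the brackets captured after "USERNAME "
        return ("username", match.group(1))
    if data == "QUIT" or data == "":
        # An explicit quit, or an empty read from a closed socket
        return ("quit", None)
    return ("message", data)


for sample in ("USERNAME liam", "QUIT", "", "liam says: Hi\n"):
    print(repr(sample), "->", classify(sample))
```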
11
Starting the server
The code that actually starts the Server
class is as follows. Note that you are probably
best picking a high-numbered port, as you need
to be root to open ports below 1024.
if __name__ == "__main__":
    # Start a server on port 59091
    server = Server(59091)
    server.run()
12
The client
Create a new file for the client as we did
for the server and open it in your favourite editor.
The client requires the same imports as the
server, as well as the gtk, gobject and datetime
libraries. One important thing we need to do is to
tell GObject that we’ll be using threading, so we
can call functions from other threads and have
the main window, which is running in the main
GTK thread, update.
#!/usr/bin/env python2
# The client side of an instant messaging application. Written as
# part of a Linux User & Developer tutorial by Liam Fraser in 2013.
import threading
import gtk
import gobject
import socket
import re
import time
import datetime
# Tell gobject to expect calls from multiple threads
gobject.threads_init()
13
The client graphical user interface
The user interface of the client isn’t
the main focus of the tutorial, and won’t be
explained in as much detail as the rest of
the code. However, the code should be fairly
straightforward to read and we have provided
links to documentation that will help.
Our MainWindow class inherits the gtk
Window class, so we need to start by initialising
that using the built-in super() function. Then we create
the controls that will go on the window, connect
any events they have to functions, and finally
lay out the controls how we want. The destroy
event is raised when the program is closed, and
the other events should be obvious.
GTK uses a packing layout, in which you use
Vboxes and Hboxes to lay out the controls. V
and H stand for vertical and horizontal. These
controls essentially let you split a window
up almost like a table, and will automatically
decide the size of the controls depending on the
size of the application.
GTK doesn’t come with a control to enter
basic information, such as the server’s IP
address, port and your chosen username, so
we’ve made a function called ask_for_info,
which creates a message box, adds a text
box to it and then retrieves the results. We’ve
done this because it’s simpler and uses less
code than creating a new window to accept
the information.
class MainWindow(gtk.Window):
    def __init__(self):
        # Initialise base gtk window class
        super(MainWindow, self).__init__()
        # Create controls
        self.set_title("IM Client")
        vbox = gtk.VBox()
        hbox = gtk.HBox()
        self.username_label = gtk.Label()
        self.text_entry = gtk.Entry()
        send_button = gtk.Button("Send")
        self.text_buffer = gtk.TextBuffer()
        text_view = gtk.TextView(self.text_buffer)
        # Connect events
        self.connect("destroy", self.graceful_quit)
        send_button.connect("clicked", self.send_message)
        # Activate event when user presses Enter
        self.text_entry.connect("activate", self.send_message)
        # Do layout
        vbox.pack_start(text_view)
        hbox.pack_start(self.username_label, expand=False)
        hbox.pack_start(self.text_entry)
        hbox.pack_end(send_button, expand=False)
        vbox.pack_end(hbox, expand=False)
        # Show ourselves
        self.add(vbox)
        self.show_all()
        # Go through the configuration process
        self.configure()

    def ask_for_info(self, question):
        # Shows a message box with a text entry and returns the response
        dialog = gtk.MessageDialog(parent=self, type=gtk.MESSAGE_QUESTION,
            flags=gtk.DIALOG_MODAL | gtk.DIALOG_DESTROY_WITH_PARENT,
            buttons=gtk.BUTTONS_OK_CANCEL,
            message_format=question)
        entry = gtk.Entry()
        entry.show()
        dialog.vbox.pack_end(entry)
        response = dialog.run()
        response_text = entry.get_text()
        dialog.destroy()
        if response == gtk.RESPONSE_OK:
            return response_text
        else:
            return None
14
Configuring the client
This code is run after we’ve added the
controls to the main window, and asks the user
for input. Currently, the application will exit if the
user enters an incorrect server address or port;
but this isn’t a production system, so that’s fine.
    def configure(self):
        # Performs the steps to connect to the server
        # Show a dialog box asking for server address followed by a port
        server = self.ask_for_info("server_address:port")
        # Regex that crudely matches an IP address and a port number
        regex = re.search('^(\d+\.\d+\.\d+\.\d+):(\d+)$', server)
        address = regex.group(1).strip()
        port = regex.group(2).strip()
        # Ask for a username
        self.username = self.ask_for_info("username")
        self.username_label.set_text(self.username)
        # Attempt to connect to the server and then start listening
        self.network = Networking(self, self.username, address, int(port))
        self.network.listen()
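One way the address parsing could be made more forgiving is sketched below in Python 3. The parse_server() helper is hypothetical and is not part of the article's client: it returns None for a cancelled dialog or malformed input instead of raising an exception, so the caller could ask again.

```python
import re

SERVER_PATTERN = re.compile(r"^(\d+\.\d+\.\d+\.\d+):(\d+)$")


def parse_server(text):
    """Return (address, port), or None if text is missing or malformed."""
    if text is None:  # for example, the user pressed Cancel
        return None
    match = SERVER_PATTERN.search(text.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))


for sample in ("127.0.0.1:59091", " 192.168.0.5:8000 ", "localhost:80", None):
    print(repr(sample), "->", parse_server(sample))
```

Like the original expression, this only matches dotted numeric addresses, so a host name such as localhost is rejected.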
15
The remainder of MainWindow
The rest of the MainWindow class has
plenty of comments to explain itself, as follows.
One thing to note is that when a client sends a
message, it doesn’t display it in the text view
straight away. The server is going to echo the
message to each client, so the client simply
displays its own message when the server
echoes it back. This means that you can tell if
the server is not receiving your messages when
you don’t see a message that you send.
    def add_text(self, new_text):
        # Add text to the text view
        text_with_timestamp = "{0} {1}".format(datetime.datetime.now(), new_text)
        # Get the position of the end of the text buffer, so we know where to
        # insert new text
        end_itr = self.text_buffer.get_end_iter()
        # Add new text at the end of the buffer
        self.text_buffer.insert(end_itr, text_with_timestamp)

    def send_message(self, widget):
        # Clear the text entry and send the message to the server
        # We don't need to display it as it will be echoed back to each client,
        # including us.
        new_text = self.text_entry.get_text()
        self.text_entry.set_text("")
        message = "{0} says: {1}\n".format(self.username, new_text)
        self.network.send(message)

    def graceful_quit(self, widget):
        # When the application is closed, tell GTK to quit, then tell the
        # server we are quitting and tidy up the network
        gtk.main_quit()
        self.network.send("QUIT")
        self.network.tidy_up()
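The timestamp prefix used by add_text can be tried on its own. In this Python 3 sketch the with_timestamp() helper and its optional now argument are hypothetical; passing a fixed time makes the result predictable.

```python
import datetime


def with_timestamp(text, now=None):
    """Prefix text with a timestamp, defaulting to the current time."""
    if now is None:
        now = datetime.datetime.now()
    return "{0} {1}".format(now, text)


fixed = datetime.datetime(2013, 5, 1, 12, 30, 0)
line = with_timestamp("client1 says: Hi\n", fixed)
short = fixed.strftime("%H:%M:%S")   # a more compact alternative format

print(line, end="")
print(short)
```

Formatting a datetime with str() gives the full date and time; strftime() lets you keep only the parts you want to display.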
16
The client’s Networking class
Much of the client’s Networking class is
similar to the server’s code. One difference is
that the class doesn’t inherit the Thread class –
we just start one of its functions as a thread.
class Networking():
    def __init__(self, window, username, server, port):
        # Set up the networking class
        self.window = window
        self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.socket.connect((server, port))
        self.listening = True
        # Tell the server that a new user has joined
        self.send("USERNAME {0}".format(username))

    def listener(self):
        # A function run as a thread that listens for new messages
        while self.listening:
            data = ""
            try:
                data = self.socket.recv(1024)
            except socket.error:
                print "Unable to receive data"
            self.handle_msg(data)
            # Don't need the while loop to be ridiculously fast
            time.sleep(0.1)
17
Running a function as a thread
The listener function above will be run
as a thread. This is trivial to do. Enabling the
daemon option on the thread means that it will
die if the main thread unexpectedly ends.
    def listen(self):
        # Start the listening thread
        self.listen_thread = threading.Thread(target=self.listener)
        # Stop the child thread from keeping the application open
        self.listen_thread.daemon = True
        self.listen_thread.start()
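Running an ordinary function as a daemon thread can be sketched as follows in Python 3. The listener() function here is a hypothetical stand-in that records timestamps instead of reading from a socket, and a threading.Event replaces the self.listening flag.

```python
import threading
import time


def listener(stop_event, ticks):
    """Stand-in for a receive loop: record a timestamp until told to stop."""
    while not stop_event.is_set():
        ticks.append(time.time())
        time.sleep(0.01)


stop_event = threading.Event()
ticks = []

thread = threading.Thread(target=listener, args=(stop_event, ticks))
thread.daemon = True   # do not keep the program alive just for this thread
thread.start()

time.sleep(0.05)       # the main thread carries on with other work
stop_event.set()       # ask the loop to finish
thread.join(timeout=1)

print("thread alive:", thread.is_alive(), "iterations:", len(ticks))
```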
18
Finishing the Networking class
Again, most of this code is similar to
the code in the server. One
difference is that we want to add some things to
the text view of our window. We do this by using
the idle_add function of GObject. This allows
us to call a function that will update the window
running in the main thread when it is not busy.
    def send(self, message):
        # Send a message to the server
        print "Sending: {0}".format(message)
        try:
            self.socket.sendall(message)
        except socket.error:
            print "Unable to send message"

    def tidy_up(self):
        # We'll be tidying up if either we are quitting or the server is quitting
        self.listening = False
        self.socket.close()
        # We won't see this if it's us that's quitting as the window will
        # be gone shortly
        gobject.idle_add(self.window.add_text, "Server has quit.\n")

    def handle_msg(self, data):
        if data == "QUIT":
            # Server is quitting
            self.tidy_up()
        elif data == "":
            # Server has probably closed unexpectedly
            self.tidy_up()
        else:
            # Tell the GTK thread to add some text when it's ready
            gobject.idle_add(self.window.add_text, data)
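The idea behind idle_add, handing work from a background thread to the thread that owns the user interface, can be illustrated without GTK. The Python 3 sketch below is a toolkit-independent analogue that uses queue.Queue; it is not how GObject is implemented, and all the names in it are hypothetical.

```python
import queue
import threading

updates = queue.Queue()
displayed = []   # stands in for the text view owned by the main thread


def network_thread():
    """Background thread: queue text instead of touching the display."""
    for text in ("client1 has joined.\n", "client1 says: Hi\n"):
        updates.put(text)
    updates.put(None)   # sentinel: no more messages


worker = threading.Thread(target=network_thread)
worker.start()

# The "main loop": only this thread ever modifies the display.
while True:
    text = updates.get()
    if text is None:
        break
    displayed.append(text)

worker.join()
print(displayed)
```

Keeping all display updates in one thread avoids two threads changing the same widget at once, which is the problem idle_add solves for GTK.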
19
Starting the client
The main window is started by initialising
an instance of the class. Notice that we don’t
need to store anything that is returned. We then
start the GTK main loop by calling gtk.main().
if __name__ == "__main__":
    # Create an instance of the main window and start the gtk main loop
    MainWindow()
    gtk.main()
20
Trying it out
You’ll want a few terminals: one to
start the server, and some to run clients. Once
you’ve started the server, open an instance of
the client and enter 127.0.0.1:port, where
‘port’ is the port you decided to use. The server
will print the port it’s listening on to make this
easy. Then enter a username and click OK.
You can use the client over a network
by replacing 127.0.0.1 with the IP address of the
server. You may have to let the port through your
computer’s firewall if it’s not working. Here
is an example output from the server with two
clients.
[liam@liam-laptop Python]$ ./IM-Server.py
Listening on port 59091
Listening for more clients
Starting client thread for ('127.0.0.1', 38726)
('127.0.0.1', 38726) sent: USERNAME client1
echoing: client1 has joined.
Listening for more clients
Starting client thread for ('127.0.0.1', 38739)
('127.0.0.1', 38739) sent: USERNAME client2
echoing: client2 has joined.
Listening for more clients
('127.0.0.1', 38739) sent: client2 says: Hi
echoing: client2 says: Hi
('127.0.0.1', 38726) sent: client1 says: Hi
echoing: client1 says: Hi
('127.0.0.1', 38726) sent: QUIT
echoing: client1 has quit.
Ending client thread for ('127.0.0.1', 38726)
^CTidying up
echoing: QUIT
Could not accept any more connections
('127.0.0.1', 38739) sent:
echoing: client2 has quit.
Ending client thread for ('127.0.0.1', 38739)
21
That’s it!
So, it’s not perfect and could be a little
more robust in terms of error handling, but we
have a working instant messenger server that
can accept multiple clients and relay messages
between them. More importantly, we have
learned a bunch of new concepts and methods
of working.
Work with Python
Replace your shell
with Python
Python is a great programming language, but did you know that
it is even capable of replacing your primary shell (command-line
interface)? Here, we explain all…
Resources
You will require a version of Python installed on
your system. The good news is you don’t have to
do anything to get it installed. Most of the Linux
distributions already ship with either Python 2.6 or
Python 2.7
We all use a shell on a daily basis. For most
of us, the shell is the gateway into our Linux
system. For years, and even today, Bash has
been the default shell for Linux. But it is getting a
bit long in the tooth.
No need to be offended: we still believe Bash
is the best shell out there when compared to
some other UNIX shells such as Korn Shell
(KSH), C Shell (CSH) or even TCSH.
This tutorial is not about Bash being
incapable, but it is about how to breathe
completely new life into the shell to do old
things conveniently and new things which were
previously not possible, even by a long shot. So,
without further delay, let’s jump in.
While the Python programming language
may require you to write longer commands to
accomplish a task (due to the way Python’s
modules are organised), this is not something
to be particularly concerned about. You can
easily write aliases to the equivalent of the Bash
command that you intend to replace. Most of
the time there will be more than one way to do
a thing, but you will need to decide which way
works best for you.
Python provides support for executing
system commands directly (via the os or
subprocess module), but where possible we will
focus on Python-native implementations, as
this allows us to develop portable code.
SECTION 1: Completing basic shell
tasks in Python
1. File management
The Python module shutil provides support for
file and directory operations. It provides support
for file attributes, directory copying, archiving
etc. Let’s look at some of its important functions.
shutil module
copy(src, dst): Copies the src file to the destination file or directory.
In this mode permission bits are copied but other metadata is not.
copy2(src, dst): Same as copy() but also copies the metadata.
copytree(src, dst[, symlinks=False[, ignore=None]]): This is similar to
‘cp -r’; it allows you to copy an entire directory.
ignore_patterns(*patterns): ignore_patterns is an interesting function that
can be used as a callable for copytree(); it allows you to ignore files and
directories specified by the glob-style patterns.
rmtree(path[, ignore_errors[, onerror]]): rmtree() is used to delete an
entire directory.
move(src, dst): Similar to the mv command, it allows you to recursively move
a file or directory to a new location.
Example:
>>> from shutil import copytree, ignore_patterns
>>> copytree(source, destination, ignore=ignore_patterns('*.pyc', 'tmp*'))
make_archive(base_name, format[, root_dir[, base_dir[, verbose[, dry_run[,
owner[, group[, logger]]]]]]]): Think of this as a replacement for tar, zip,
bzip etc. make_archive() creates an archive file in the given format, such
as zip, bztar, tar or gztar. Archive support can be extended via Python
modules.
Example:
>>> from shutil import make_archive
>>> import os
>>> archive_name = os.path.expanduser(os.path.join('~', 'ludarchive'))
>>> root_dir = os.path.expanduser(os.path.join('~', '.ssh'))
>>> make_archive(archive_name, 'gztar', root_dir)
'/Users/kunal/ludarchive.tar.gz'
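The shutil functions above can be tried out safely in a scratch directory. The following is a minimal sketch, not part of the original listings: the directory and file names (project, backup, tmp_cache and so on) are made up for illustration, and everything happens inside a temporary folder that is deleted at the end.

```python
import os
import shutil
import tempfile

# Work inside a throwaway directory so nothing real is touched.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "project")          # hypothetical source tree
os.makedirs(os.path.join(src, "tmp_cache"))
for name in ("main.py", "main.pyc", "notes.txt"):
    with open(os.path.join(src, name), "w") as handle:
        handle.write("placeholder\n")

# Rough equivalent of 'cp -r', skipping compiled files and tmp* entries.
backup = os.path.join(workdir, "backup")
shutil.copytree(src, backup, ignore=shutil.ignore_patterns("*.pyc", "tmp*"))
copied = sorted(os.listdir(backup))

# Rough equivalent of 'tar czf': archive the contents of the backup directory.
archive = shutil.make_archive(os.path.join(workdir, "backup"), "gztar", backup)
archive_created = os.path.isfile(archive)

# Rough equivalent of 'mv'.
moved_to = os.path.join(workdir, "backup_old")
shutil.move(backup, moved_to)
moved_exists = os.path.isdir(moved_to)

# Rough equivalent of 'rm -r': remove the whole scratch directory.
shutil.rmtree(workdir)

print(copied)
print(os.path.basename(archive))
```

Note that the archive is written next to the directory being archived rather than inside it, which avoids the archive trying to include itself.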
2. Interfacing operating system &
subprocesses
Python provides two modules to interface
with the OS and to manage processes, called
os and subprocess. These modules allow you
to interact with the core operating system
shell and let you work with the environment,
processes, users and file descriptors.
The subprocess module was introduced to
support better management of subprocesses
(some of this functionality already exists in the os
module) and is intended to replace
os.system, os.spawn*, os.popen, popen2.* and
commands.*.
os module
environ: A mapping object that represents the
environment variables as strings.
Example:
>>> import os
>>> os.environ
{‘VERSIONER_PYTHON_PREFER_32_BIT’:
‘no’, ‘LC_CTYPE’: ‘UTF-8’, ‘TERM_
PROGRAM_VERSION’: ‘297’, ‘LOGNAME’:
‘kunaldeo’, ‘USER’: ‘kunaldeo’, ‘PATH’:
‘/System/Library/Frameworks/Python.
framework/Versions/2.7/bin:/Users/
kunaldeo/narwhal/bin:/opt/local/sbin:/
usr/local/bin:/usr/bin:/bin:/usr/sbin:/
sbin:/usr/local/bin:/usr/X11/bin:/opt/
local/bin:/Applications/MOTODEV_Studio_
For_Android_2.0.0_x86/android_sdk/
tools:/Applications/MOTODEV_Studio_For_
Android_2.0.0_x86/android_sdk/platform-tools:/Volumes/CyanogenModWorkspace/
bin’, ‘HOME’: ‘/Users/kunaldeo’,
‘PS1’: ‘\\[\\e[0;32m\\]\\u\\[\\e[m\\]
\\[\\e[1;34m\\]\\w\\[\\e[m\\] \\
[\\e[1;32m\\]\\$\\[\\e[m\\] \\
[\\e[1;37m\\]’, ‘NARWHAL_ENGINE’:
‘jsc’, ‘DISPLAY’: ‘/tmp/launch-s2LUfa/
org.x:0’, ‘TERM_PROGRAM’: ‘Apple_
Terminal’, ‘TERM’: ‘xterm-color’,
‘Apple_PubSub_Socket_Render’: ‘/tmp/
launch-kDul5P/Render’, ‘VERSIONER_
PYTHON_VERSION’: ‘2.7’, ‘SHLVL’:
‘1’, ‘SECURITYSESSIONID’: ‘186a5’,
‘ANDROID_SDK’: ‘/Applications/MOTODEV_
Studio_For_Android_2.0.0_x86/android_
sdk’,’_’: ‘/System/Library/Frameworks/
Python.framework/Versions/2.7/bin/
python’, ‘TERM_SESSION_ID’: ‘ACFE2492-BB5C-418E-8D4F-84E9CF63B506’, ‘SSH_
AUTH_SOCK’: ‘/tmp/launch-dj6Mk4/
Listeners’, ‘SHELL’: ‘/bin/bash’,
‘TMPDIR’: ‘/var/folders/6s/pgknm8b118
737mb8psz8x4z80000gn/T/’, ‘LSCOLORS’:
‘ExFxCxDxBxegedabagacad’, ‘CLICOLOR’:
‘1’, ‘__CF_USER_TEXT_ENCODING’:
‘0x1F5:0:0’, ‘PWD’: ‘/Users/kunaldeo’,
‘COMMAND_MODE’: ‘unix2003’}
You can also find out the value of a single
environment variable:
>>> os.environ['HOME']
'/Users/kunaldeo'
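Because os.environ behaves like a dictionary, the usual mapping operations apply to it. Here is a small sketch; the variable name LUD_DEMO_VAR is invented for the example. Assigning through os.environ is generally the more convenient route, since changes made with os.putenv() are not reflected back in os.environ.

```python
import os

# Look up a variable with a fallback instead of risking a KeyError.
home = os.environ.get("HOME", "(not set)")

# Assigning through os.environ updates the mapping and the process
# environment, so child processes started afterwards inherit the value.
os.environ["LUD_DEMO_VAR"] = "hello"    # hypothetical variable name
value = os.environ["LUD_DEMO_VAR"]

# Deleting the key unsets the variable again.
del os.environ["LUD_DEMO_VAR"]
still_set = "LUD_DEMO_VAR" in os.environ

print(home)
print(value)
print(still_set)
```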
putenv(varname, value) : Adds or sets an environment variable with the given
variable name and value.
getuid() : Returns the current process’s user id.
getlogin() : Returns the username of the currently logged-in user.
getpid() : Returns the current process id. To get the process group id of a
given pid, use getpgid(pid).
getcwd() : Returns the path of the current working directory.
chdir(path) : Changes the current working directory to the given path.
listdir(path) : Similar to ls, returns a list with the contents (directories
and files) of the given path.
Example:
>>> os.listdir("/home/homer")
['.gnome2', '.pulse', '.gconf', '.gconfd', '.beagle', '.gnome2_private',
'.gksu.lock', 'Public', '.ICEauthority', '.bash_history', '.compiz', '.gvfs',
'.update-notifier', '.cache', 'Desktop', 'Videos', '.profile', '.config',
'.esd_auth', '.viminfo', '.sudo_as_admin_successful', 'mbox',
'.xsession-errors', '.bashrc', 'Music', '.dbus', '.local', '.gstreamer-0.10',
'Documents', '.gtk-bookmarks', 'Downloads', 'Pictures', '.pulse-cookie',
'.nautilus', 'examples.desktop', 'Templates', '.bash_logout']
mkdir(path[, mode]) : Creates a directory with the given path with the
numeric code mode. The default mode is 0777.
makedirs(path[, mode]) : Creates the given path (inclusive of all its
directories) recursively. The default mode is 0777.
Example:
>>> import os
>>> path = "/home/kunal/greatdir"
>>> os.makedirs(path, 0755)
rename(old, new) : The file or directory “old” is renamed to “new”. If “new”
is a directory, an error will be raised. On Unix and Linux, if “new” exists
and is a file, it will be replaced silently if the user has permission to
do so.
renames(old, new) : Similar to rename but also creates any intermediate
directories recursively if necessary.
rmdir(path) : Removes the directory at the given path. If the directory
still contains files you will need to use shutil.rmtree() instead.
subprocess:
call(*popenargs, **kwargs) : Runs the command with arguments. On process
completion it returns the returncode attribute.
Example:
>>> import subprocess
>>> print subprocess.call(["ls", "-l"])
total 3684688
drwx------+   5 kunaldeo  staff         170 Aug 19 01:37 Desktop
drwx------+  10 kunaldeo  staff         340 Jul 26 08:30 Documents
drwx------+  50 kunaldeo  staff        1700 Aug 19 12:50 Downloads
drwx------@ 127 kunaldeo  staff        4318 Aug 19 01:43 Dropbox
drwx------@  42 kunaldeo  staff        1428 Aug 12 15:17 Library
drwx------@   3 kunaldeo  staff         102 Jul  3 23:23 Movies
drwx------+   4 kunaldeo  staff         136 Jul  6 08:32 Music
drwx------+   5 kunaldeo  staff         170 Aug 12 11:26 Pictures
drwxr-xr-x+   5 kunaldeo  staff         170 Jul  3 23:23 Public
-rwxr-xr-x    1 kunaldeo  staff  1886555648 Aug 16 21:02 androidsdk.tar
drwxr-xr-x    5 kunaldeo  staff         170 Aug 16 21:05 sdk
drwxr-xr-x   19 kunaldeo  staff         646 Aug 19 01:47 src
-rw-r--r--    1 root      staff         367 Aug 16 20:36 umbrella0.log
STD_INPUT_HANDLE: The standard input
device. Initially, this is the console input buffer.
STD_OUTPUT_HANDLE: The standard output
device. Initially, this is the active console
screen buffer.
STD_ERROR_HANDLE: The standard error
device. Initially, this is the active console
screen buffer.
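To see how call() reports the result of a command, here is a short sketch that is not taken from the original article. It runs the current Python interpreter as the child process (an assumption made purely so the example does not depend on ls or any other external program), and also shows check_output(), which is available from Python 2.7 and captures the command’s output instead of letting it print to the terminal.

```python
import subprocess
import sys

# Use the running interpreter as the child program so the example does not
# depend on any particular shell command being installed.
child = [sys.executable, "-c", "print('hello from the child')"]

# call() waits for the command to finish and returns its exit status.
status = subprocess.call(child)

# check_output() returns what the command wrote to standard output and
# raises CalledProcessError if the exit status is non-zero.
output = subprocess.check_output(child).decode("utf-8").strip()

# A failing command: call() simply reports the non-zero status.
failed = subprocess.call([sys.executable, "-c", "import sys; sys.exit(3)"])

print(status)
print(output)
print(failed)
```

Passing the command as a list, as here, avoids having the arguments re-interpreted by a shell.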
SECTION 2: IPython: a ready-made
Python system shell replacement
In section 1 we have introduced you to the
Python modules which allow you to do system
shell-related tasks very easily using vanilla
Python. Using the same features, you can build
a fully featured shell and remove a lot of Python
boilerplate code along the way. However, if
you are the kind of person who wants everything
ready-made, you are in luck. IPython provides a
powerful and interactive Python shell which you
can use as your primary shell. IPython supports
Python 2.6 to 2.7 and 3.1 to 3.2. It supports
two types of Python shell: terminal-based and
Qt-based.
Just to reiterate, IPython is purely implemented
in Python and provides a 100% Python-compliant shell interface, so
everything you have learnt in section 1 can be run inside
IPython without any problems.
IPython is already available in most Linux
distributions. Search your distro’s repositories to
look for it. In case you are not able to find it, you
can also install it using easy_install or PyPI.
IPython provides a lot of interesting features
which make it a great shell replacement…
Tab completion: Tab completion provides an
excellent way to explore any Python object that
you are working with. It also helps you to avoid
making typos.
Example:
In [3]: import o {hit tab}
objc        opcode      operator    optparse    os          os2emxpath
In [3]: import os
In [4]: os.p {hit tab}
os.pardir          os.pathconf_names  os.popen           os.popen3
os.path            os.pathsep         os.popen2          os.popen4
os.pathconf        os.pipe            os.putenv
Built In Object Explorer: You can add ‘?’ after any Python object to view
its details such as Type, Base Class, String Form, Namespace, File and
Docstring.
Example:
In [28]: os.path?
Type: module
Base Class:
String Form:
Namespace: Interactive
File: /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/posixpath.py
Docstring:
Common operations on POSIX pathnames.
Instead of importing this module directly, import os and refer to this
module as os.path. The 'os.path' name is an alias for this module on POSIX
systems; on other systems (eg Mac, Windows), os.path provides the same
operations in a manner specific to that platform, and is an alias to another
module (eg macpath, ntpath).
Some of this can actually be useful on non-POSIX systems too, eg for
manipulation of the pathname component of URLs.
You can also use double question marks (??) to view the source code for the
relevant object.
You can start the Qt console with:
$ ipython qtconsole
Magic functions: IPython comes with a set of
predefined ‘magic functions’ that you can call
with a command-line-style syntax. IPython
‘magic’ commands are conventionally prefaced
by %, but if the flag %automagic is set to on,
then you can call magic commands without the
preceding %.
To view a list of available magic functions,
you can use the magic function %lsmagic. Magic
functions include functions that work with code
such as %run, %edit, %macro, %recall etc;
functions that affect shell such as %colors,
%xmode, %autoindent etc; and other functions
such as %reset, %timeit, %paste etc. Most of
the cool features of IPython are powered using
magic functions.
Example:
In [45]: %lsmagic
Available magic functions:
%alias %autocall %autoindent %automagic %bookmark %cd %colors %cpaste
%debug %dhist %dirs %doctest_mode %ed %edit %env %gui %hist %history
%install_default_config %install_profiles %load_ext %loadpy %logoff %logon
%logstart %logstate %logstop %lsmagic %macro %magic %page %paste %pastebin
%pdb %pdef %pdoc %pfile %pinfo %pinfo2 %popd %pprint %precision %profile
%prun %psearch %psource %pushd %pwd %pycat %pylab %quickref %recall
%rehashx %reload_ext %rep %rerun %reset %reset_selective %run %save %sc
%sx %tb %time %timeit %unalias %unload_ext %who %who_ls %whos %xdel %xmode
Automagic is OFF, % prefix IS needed for magic functions
If you get errors related to missing modules, make sure that you have
installed the dependent packages, such as PyQt, pygments, pexpect and ZeroMQ.
To view help on any Magic Function, call
‘%somemagic?’ to read its docstring.
Python script execution and runtime code
editing: You can use %run to run any Python
script. You can also run the script under the
pdb debugger using -d, or under the
profiler using -p. You can also edit a Python
script using the %edit command. %edit will
open the given Python script in the editor
defined by the $EDITOR environment variable.
Shell command support: If you are in the mood
to just run a shell command, you can do it very
easily by prefixing the command with ! .
Example:
In [5]: !ps
  PID TTY           TIME CMD
 4508 ttys000    0:00.07 -bash
84275 ttys001    0:00.03 -bash
17958 ttys002    0:00.18 -bash
In [8]: !clang prog.c -o prog
prog.c:2:1: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
main()
^~~~
1 warning generated.
Qt console: IPython also comes with its own Qt-based console. This provides
a number of features that are only available in a GUI, such as inline
figures, multiline editing with syntax highlighting, and graphical calltips.
[Figure: IPython Qt console with GUI capabilities]
As you can see, it’s easy to tailor Python
for all your shell environment needs.
Python modules like os, subprocess
and shutil are available at your
disposal to do just about everything
you need using Python. IPython turns
this whole experience into an even
more complete package. You get to do
everything a standard Python shell
does and with much more convenient
features. IPython’s magic functions
really do provide a magical Python shell
experience. So next time you open a
Bash session, think again: why settle for
gold when platinum is a step away?
Python for system
administrators
Learn how Python can help in system administration as it dares to
replace the usual shell scripting…
Resources
Python-devel Python development
libraries, required for compiling
third-party Python module
setuptools setuptools allows you to
download, build, install, upgrade,
and uninstall Python packages
with ease
System administration is an important part of
our computing environment. It does not matter
whether you are managing systems at work
or at home. Linux, being a UNIX-based operating
system, already has everything a system
administrator needs, such as the world-class
shells (not just one but many, including Bash, csh,
zsh etc), handy tools, and many other features
which make the Linux system an administrator’s
dream. So why do we need Python when Linux
already has everything built-in? Being a dynamic
scripting language, Python is very easy to read
and learn. That’s not just us saying so:
many Linux distributions actually use Python
in their core administrative tools. For example, Red
Hat (and Fedora) system setup tool Anaconda
is written in Python (read this line again, got the
snake connection?). Also, tools like GNU Mailman,
CompizConfig Settings Manager (CCSM) and
hundreds of tiny GUI and non-GUI configuration
tools are written using Python. Python does not
limit you on the choice of user interface to follow
– you can build command-line, GUI and web apps
using Python. This way, it covers almost
all the possible interfaces.
Here we will look into executing sysadmin-related tasks using Python.
Parsing configuration files
Configuration files provide a way for applications
to store various settings. In order to write a
script that allows you to modify settings of a
particular application, you should be able to
parse the configuration file of the application.
In this section we learn how to parse INI-style
configuration files. Although old, the INI file
format is very popular with much modern open
source software, such as PHP and MySQL.
Excerpt from the php.ini configuration file:
[PHP]
engine = On
zend.ze1_compatibility_mode = Off
short_open_tag = On
asp_tags = Off
precision = 14
y2k_compliance = On
output_buffering = 4096
;output_handler =
zlib.output_compression = Off
[MySQL]
; Allow or prevent persistent links.
mysql.allow_persistent = On
mysql.max_persistent = 20
mysql.max_links = -1
mysql.default_port = 3306
mysql.default_socket =
mysql.default_host = localhost
mysql.connect_timeout = 60
mysql.trace_mode = Off
Python provides a built-in module called
ConfigParser (known as configparser in Python
3.0). You can use this module to parse and create
configuration files.
@code: writeconfig.py
@description: The following demonstrates adding a MySQL section to the
php.ini file.
@warning: Do not use this script with the actual php.ini file, as it’s not
designed to handle all aspects of a complete php.ini file.
import ConfigParser

config = ConfigParser.RawConfigParser()
config.add_section('MySQL')
config.set('MySQL', 'mysql.trace_mode', 'Off')
config.set('MySQL', 'mysql.connect_timeout', '60')
config.set('MySQL', 'mysql.default_host', 'localhost')
config.set('MySQL', 'mysql.default_port', '3306')
config.set('MySQL', 'mysql.allow_persistent', 'On')
config.set('MySQL', 'mysql.max_persistent', '20')

# Open in append mode so the new section is added to the end of the file
with open('php.ini', 'a') as configfile:
    config.write(configfile)
Output:php.ini
[MySQL]
mysql.max_persistent = 20
mysql.allow_persistent = On
mysql.default_port = 3306
mysql.default_host = localhost
mysql.trace_mode = Off
mysql.connect_timeout = 60
@code: parseconfig.py
@description: Parsing and updating the config file
import ConfigParser

config = ConfigParser.ConfigParser()
config.read('php.ini')

# Print config values
print config.get('MySQL', 'mysql.default_host')
print config.get('MySQL', 'mysql.default_port')

config.remove_option('MySQL', 'mysql.trace_mode')
with open('php.ini', 'wb') as configfile:
    config.write(configfile)
Note
This is written for the Python 2.X series, as it is still the most popular
and default Python distribution across all the platforms (including all
Linux distros, BSDs and Mac OS X).
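The write-then-read cycle performed by the two scripts can also be exercised against a scratch file rather than a real php.ini. The sketch below is an illustration only: the file name demo.ini is made up, and the try/except import is an assumption added so that the same code runs under both the Python 2 module name (ConfigParser) and the Python 3 one (configparser). It also shows getint() and getboolean(), which convert values on the way out.

```python
import os
import tempfile

try:
    import configparser                      # Python 3 name
except ImportError:
    import ConfigParser as configparser      # Python 2 name

# Hypothetical scratch file, so no real configuration is modified.
path = os.path.join(tempfile.mkdtemp(), "demo.ini")

# Write a small [MySQL] section, in the spirit of writeconfig.py.
config = configparser.RawConfigParser()
config.add_section("MySQL")
config.set("MySQL", "mysql.default_host", "localhost")
config.set("MySQL", "mysql.default_port", "3306")
config.set("MySQL", "mysql.trace_mode", "Off")
with open(path, "w") as configfile:
    config.write(configfile)

# Read it back, convert values, and drop an option, as parseconfig.py does.
config = configparser.RawConfigParser()
config.read(path)
host = config.get("MySQL", "mysql.default_host")
port = config.getint("MySQL", "mysql.default_port")        # string -> int
tracing = config.getboolean("MySQL", "mysql.trace_mode")   # 'Off' -> False
config.remove_option("MySQL", "mysql.trace_mode")
remaining = sorted(config.options("MySQL"))

print(host)
print(port)
print(tracing)
print(remaining)
```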
Parsing JSON data
JSON (JavaScript Object Notation) is a lightweight, modern
data-interchange format. It is based on a subset of the JavaScript
language defined in ECMA-262. It is a text format and is completely
language-independent. JSON is also used as the configuration file format
for modern applications such as Mozilla Firefox and Google Chrome. JSON
is also very popular with modern web services such as Facebook, Twitter,
Amazon EC2 etc. In this section we will use the Python module
‘simplejson’ to access Yahoo Search (using the Yahoo Web Services API),
which outputs JSON data.
To use this section, you should have the
following:
1. Python module: simplejson.
Note: You can install Python modules using the
command ‘easy_install’ followed by the module name. This
command assumes that you have a working
internet connection.
2. Yahoo App ID: The Yahoo App ID can be
created from https://developer.apps.yahoo.
com/dashboard/createKey.html. The Yahoo
App ID will be generated on the next page. See
the screenshot below for details.
[Figure: Generating the Yahoo App ID]
simplejson is very easy to use. In the following
example we will use the capability of mapping
JSON data structures directly to Python data
types. This gives us direct access to the JSON
data without developing any XML parsing code.
JSON PYTHON DATA MAPPING
JSON             Python
object           dict
array            list
string           unicode
number (int)     int, long
number (real)    float
true             True
false            False
null             None
For this section we will use the simplejson.
load function, which allows us to deserialise a
JSON object into a Python object.
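The mapping between JSON and Python types can be checked without any network access. The following sketch uses the standard library’s json module, which offers the same load()/loads() interface as simplejson; that substitution, and the sample document itself, are assumptions made for illustration. Here loads() parses a string, whereas load() reads from a file-like object such as the one returned by urlopen().

```python
import json

# A made-up JSON document covering each kind of value.
text = ('{"title": "Linux User", "hits": 20, "score": 4.5, '
        '"tags": ["linux", "python"], "cached": true, "next": null}')

data = json.loads(text)

# Each JSON value arrives as the corresponding Python type.
is_dict = isinstance(data, dict)             # object -> dict
is_list = isinstance(data["tags"], list)     # array  -> list
is_int = isinstance(data["hits"], int)       # number (int)  -> int
is_float = isinstance(data["score"], float)  # number (real) -> float
cached = data["cached"]                      # true   -> True
nothing = data["next"]                       # null   -> None

# Going the other way, dumps() serialises Python objects back to JSON text.
round_trip = json.loads(json.dumps(data))

print(is_dict, is_list, is_int, is_float)
print(cached, nothing)
```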
@code: LUDYSearch.py
import simplejson, urllib

APP_ID = 'xxxxxxxx' # Change this to your APP ID
SEARCH_BASE = 'http://search.yahooapis.com/WebSearchService/V1/webSearch'

class YahooSearchError(Exception):
    pass

def search(query, results=20, start=1, **kwargs):
    # Merge the required API parameters into any extra keyword arguments
    kwargs.update({
        'appid': APP_ID,
        'query': query,
        'results': results,
        'start': start,
        'output': 'json'
    })
    url = SEARCH_BASE + '?' + urllib.urlencode(kwargs)
    result = simplejson.load(urllib.urlopen(url))
    if 'Error' in result:
        # An error occurred; raise an exception
        raise YahooSearchError(result['Error'])
    return result['ResultSet']
Let’s use the above code from the Python shell
to see how it works. Change to the directory
where you have saved the LUDYSearch.py and
open a Python shell.
@code: Python Shell Output. Lines starting with ‘>>>’ indicate input
>>> execfile("LUDYSearch.py")
>>> results = search('Linux User and Developer')
>>> results['totalResultsAvailable']
123000000
>>> results['totalResultsReturned']
20
>>> items = results['Result']
>>> for Result in items:
...     print Result['Title'], Result['Url']
...
Linux User http://www.linuxuser.co.uk/
Linux User and Developer - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/Linux_User_and_Developer
Linux User & Developer | Linux User http://www.linuxuser.co.uk/tag/linux-user-developer/
Gathering system
information
One of the important jobs of a system
administrator is gathering system information.
In this section we will use the SIGAR (System
Information Gatherer And Reporter) API to
demonstrate how we can gather system
information using Python. SIGAR is a very
complete API and it can provide a lot of
information, including the following:
1. System memory, swap, CPU, load average,
uptime, logins.
2. Per-process memory, CPU, credential info,
state, arguments, environment, open files.
3. File system detection and metrics.
4. Network interface detection, configuration
info and metrics.
5. TCP and UDP connection tables.
6. Network route table.
Installing SIGAR
The first step is to build and install SIGAR. SIGAR
is hosted at GitHub, so make sure that you have
Git installed in your system. Then perform
the following steps to install SIGAR and its
Python bindings:
$ git clone git://github.com/hyperic/sigar.git sigar.git
$ cd sigar.git/bindings/python
$ sudo python setup.py install
At the end you should see output similar to the following:
Writing /usr/local/lib/python2.6/dist-packages/pysigar-0.1.egg-info
SIGAR is a very easy-to-use library and can be
used to get information on almost every aspect of
a system. The next example shows you how.
The following code shows the memory and the
file system information
@code: PySysInfo.py
import os
import sigar

sg = sigar.open()
mem = sg.mem()
swap = sg.swap()
fslist = sg.file_system_list()

print "==========Memory Information=============="
print "\tTotal\tUsed\tFree"
print "Mem:\t",\
    (mem.total() / 1024), \
    (mem.used() / 1024), \
    (mem.free() / 1024)
print "Swap:\t", \
    (swap.total() / 1024), \
    (swap.used() / 1024), \
    (swap.free() / 1024)
print "RAM:\t", mem.ram(), "MB"

print "==========File System Information==============="

def format_size(size):
    return sigar.format_size(size * 1024)

print 'Filesystem\tSize\tUsed\tAvail\tUse%\tMounted on\tType\n'
for fs in fslist:
    dir_name = fs.dir_name()
    usage = sg.file_system_usage(dir_name)
    total = usage.total()
    used = total - usage.free()
    avail = usage.avail()
    pct = usage.use_percent() * 100
    if pct == 0.0:
        pct = '-'
    print fs.dev_name(), format_size(total), format_size(used), format_size(avail),\
        pct, dir_name, fs.sys_type_name(), '/', fs.type_name()
@Output
==========Memory Information==============
        Total   Used    Free
Mem:    8388608 6061884 2326724
Swap:   131072 16048 115024
RAM:    8192 MB
==========File System Information===============
Filesystem      Size    Used    Avail   Use%    Mounted on      Type
/dev/disk0s2 300G 175G 124G 59.0 / hfs / local
devfs 191K 191K 0 - /dev devfs / none
Accessing Secure Shell
(SSH) services
SSH (Secure Shell) is a modern replacement for an
old remote shell system called Telnet. It allows data
to be exchanged using a secure channel between
two networked devices. System administrators
frequently use SSH to administer networked
systems. In addition to providing a remote shell, SSH
is also used for secure file transfer (using SSH File
Transfer Protocol, or SFTP) and remote X server
forwarding (allows you to use SSH clients as X
server). In this section we will learn how to use the
SSH protocol from Python using a Python module
called paramiko, which implements the SSH2
protocol for Python.
paramiko can be installed using the following
steps:
$ git clone https://github.com/robey/paramiko.git
$ cd paramiko
$ sudo python setup.py install
At the core of paramiko is the SSHClient class. This class wraps Transport,
Channel and SFTPClient to handle most of the aspects of SSH. You can use
SSHClient as:
client = SSHClient()
client.load_system_host_keys()
client.connect('some.host.com')
stdin, stdout, stderr = client.exec_command('dir')
The following example demonstrates a full SSH
client written using the paramiko module.
@code: PySSHClient.py
import base64, getpass, os, socket, sys, traceback
import paramiko
import interactive

# setup logging
paramiko.util.log_to_file('demo_simple.log')

# get hostname
username = ''
if len(sys.argv) > 1:
    hostname = sys.argv[1]
    if hostname.find('@') >= 0:
        username, hostname = hostname.split('@')
else:
    hostname = raw_input('Hostname: ')
if len(hostname) == 0:
    print '*** Hostname required.'
    sys.exit(1)
port = 22
if hostname.find(':') >= 0:
    hostname, portstr = hostname.split(':')
    port = int(portstr)

# get username
if username == '':
    default_username = getpass.getuser()
    username = raw_input('Username [%s]: ' % default_username)
    if len(username) == 0:
        username = default_username
password = getpass.getpass('Password for %s@%s: ' % (username, hostname))

# now, connect and use paramiko Client to negotiate SSH2 across the connection
try:
    client = paramiko.SSHClient()
    client.load_system_host_keys()
    client.set_missing_host_key_policy(paramiko.WarningPolicy)
    print '*** Connecting...'
    client.connect(hostname, port, username, password)
    chan = client.invoke_shell()
    print repr(client.get_transport())
    print '*** SSH Server Connected! ***'
    print
    interactive.interactive_shell(chan)
    chan.close()
    client.close()
except Exception, e:
    print '*** Caught exception: %s: %s' % (e.__class__, e)
    traceback.print_exc()
    try:
        client.close()
    except:
        pass
    sys.exit(1)
To run this code you will also need a custom Python module, interactive.py,
which implements the interactive shell for the SSH session. Look for this
file on FileSilo and copy it into the same folder where you have created
PySSHClient.py.
Note
If you are confused with the tab spacing of the code, look for the code
files on FileSilo.
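The first part of PySSHClient.py works out the username, hostname and port from a single argument. That logic can be pulled into a small function and tried on its own, without paramiko or a server. The sketch below is an illustration rather than part of the original client: parse_target is a hypothetical helper name, and it only handles hostnames and IPv4 addresses (an IPv6 address contains colons and would need different handling).

```python
def parse_target(target, default_port=22):
    """Split '[user@]host[:port]' into (username, hostname, port).

    Hypothetical helper mirroring the argument handling in PySSHClient.py.
    """
    username = ""
    hostname = target
    if "@" in hostname:
        # Everything before the first '@' is the username.
        username, hostname = hostname.split("@", 1)
    port = default_port
    if ":" in hostname:
        # Everything after the last ':' is the port number.
        hostname, portstr = hostname.rsplit(":", 1)
        port = int(portstr)
    if not hostname:
        raise ValueError("hostname required")
    return username, hostname, port


full = parse_target("luduser@192.168.1.2:2222")
bare = parse_target("192.168.1.2")
print(full)
print(bare)
```

An empty username in the result corresponds to the case where the original script goes on to prompt for one.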
@code_Output
kunal@ubuntu-vm-kdeo:~/src/paramiko/demos$ python demo_simple.py
Hostname: 192.168.1.2
Username [kunal]: luduser
Password for luduser@192.168.1.2:
*** Connecting...
*** SSH Server Connected! ***
Last login: Thu Jan 13 02:01:06 2011 from 192.168.1.9
[~ $:]
If the host key for the SSH server is not added
to your $HOME/.ssh/known_hosts file, the
client will throw the following error:
*** Caught exception: : unbound
method missing_host_key() must be
called with WarningPolicy instance
as first argument (got SSHClient
instance instead)
This means that the client cannot verify the
authenticity of the server you are connecting
to. To add the host key to known_hosts, you
can use the ssh command. It is important
to remember that this is not the ideal way to
add the host key; instead you should use ssh-keygen.
But for simplicity’s sake we are using
the ssh client.
kunal@ubuntu-vm-kdeo:~/.ssh$ ssh luduser@192.168.1.2
The authenticity of host '192.168.1.2 (192.168.1.2)' can't be established.
RSA key fingerprint is be:01:76:6a:b9:bb:69:64:e3:dc:37:00:a4:36:33:d1.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.1.2' (RSA) to the list of known hosts.
So now you’ve seen how easy it can be to
carry out complex sysadmin tasks using
the versatile Python language.
As is the case with all Python coding, the
code that is presented here can fairly easily
be adapted for your GUI application (using
software such as PyGTK or PyQt) or a web
application (using a framework such as Django
or Grok).
Writing a user interface using Python
Learn how to create a user-friendly interface using Python
Administrators are comfortable with running raw
scripts by hand, but end-users are not. So if you
are writing a script that is supposed to be used by
ordinary users, it is a good idea to create a
user-friendly interface on top of the script. This way
end-users can run the scripts just like any other
application. To demonstrate this, we will create
a simple GRUB configuration tool which allows
users to select the default boot entry and the timeout.
We will be creating a TUI (text user interface)
application and will use the Python module
‘snack’ to facilitate this (not to be confused with
the Python audio library, tksnack).
This app consists of two files…
grub.py: GRUB Config File (grub.conf) Parser
(available on FileSilo). It implements two main
functions, readBootDB() and writeBootFile(),
which are responsible for reading and writing the
GRUB configuration file.
grub_tui.py: Text user interface file for
manipulating the GRUB configuration file using
the functions available in grub.py.
@code: grub_tui.py
import sys
from snack import *
from grub import (readBootDB, writeBootFile)

def main(entry_value='1', kernels=[]):
    try:
        (default_value, entry_value, kernels) = readBootDB()
    except:
        print >> sys.stderr, ("Error reading /boot/grub/grub.conf.")
        sys.exit(10)

    screen = SnackScreen()

    while True:
        g = GridForm(screen, ("Boot configuration"), 1, 5)
        if len(kernels) > 0:
            li = Listbox(height=len(kernels), width=20, returnExit=1)
            for i, x in enumerate(kernels):
                li.append(x, i)
            g.add(li, 0, 0)
            li.setCurrent(default_value)

        bb = ButtonBar(screen, ((("Ok"), "ok"), (("Cancel"), "cancel")))

        e = Entry(3, str(entry_value))
        l = Label(("Timeout (in seconds):"))
        gg = Grid(2, 1)
        gg.setField(l, 0, 0)
        gg.setField(e, 1, 0)

        g.add(Label(''), 0, 1)
        g.add(gg, 0, 2)
        g.add(Label(''), 0, 3)
        g.add(bb, 0, 4, growx=1)
        result = g.runOnce()
        if bb.buttonPressed(result) == 'cancel':
            screen.finish()
            sys.exit(0)
        else:
            entry_value = e.value()
            try:
                # Only accept the form once the timeout is a valid integer
                c = int(entry_value)
                break
            except ValueError:
                continue

    writeBootFile(c, li.current())
    screen.finish()

if __name__ == '__main__':
    main()
Start the tool using the sudo
command (as it reads the grub.conf file):
$ sudo grub_tui.py
Work with Python
Full code listing
import os, sys, urllib2, argparse, datetime, atexit
from bs4 import BeautifulSoup
addresses = []
deepestAddresses = []
maxLevel = 1
storeFolder = "Wikistore " + str(datetime.datetime.now().strftime("%Y-%m-%d %H:%M"))
1 Import libraries: These are the libraries we are going to be using for this program
2 Set up variables: These are some variables we’ll use to keep track of the script’s progress
undesirables = [
    {"element" : "table", "attr" : {'class' : 'infobox'}},
    {"element" : "table", "attr" : {'class' : 'vertical-navbox'}},
    {"element" : "span", "attr" : {'class' : 'mw-editsection'}},
    {"element" : "div", "attr" : {'class' : 'thumb'}},
    {"element" : "sup", "attr" : {'class' : 'reference'}},
    {"element" : "div", "attr" : {'class' : 'reflist'}},
    {"element" : "table", "attr" : {'class' : 'nowraplinks'}},
    {"element" : "table", "attr" : {'class' : 'ambox-Refimprove'}},
    {"element" : "img", "attr" : None},
    {"element" : "script", "attr" : None},
    {"element" : "table", "attr" : {'class' : 'mbox-small'}},
    {"element" : "span", "attr" : {"id" : "coordinates"}},
    {"element" : "table", "attr" : {"class" : "ambox-Orphan"}},
    {"element" : "div", "attr" : {"class" : "mainarticle"}},
    {"element" : None, "attr" : {"id" : "References"}}
]
3 Initialisation: This is the initialising function that we will use to handle the input from the user
def init():
    parser = argparse.ArgumentParser(description='Handle the starting page and number of levels we\'re going to scrape')
    parser.add_argument('-URL', dest='link', action='store', help='The Wikipedia page from which we will start scraping')
    parser.add_argument('-levels', dest="levels", action='store', help='How many levels deep should the scraping go')
    args = parser.parse_args()

    if(args.levels != None):
        global maxLevel
        maxLevel = int(args.levels)

    if(args.link == None):
        print("You need to pass a link with the -URL flag")
        sys.exit(0)
    else:
        if not os.path.exists(storeFolder):
            os.makedirs(storeFolder)
        grabPage(args.link, 0, args.link.split("/wiki/")[1].strip().replace("_", " "))

    atexit.register(cleanUp)

def isValidLink(link):
Scrape Wikipedia with Beautiful Soup
Use the Beautiful Soup Python library to parse Wikipedia’s HTML and store it for offline reading
Resources
Beautiful Soup:
www.crummy.com/software/BeautifulSoup/
HTML5Lib:
https://github.com/html5lib/html5lib-python
Python 2.6+ & WikiParser.zip
Six: https://pypi.python.org/pypi/six/
In this tutorial we’ll use the popular Python
library Beautiful Soup to scrape Wikipedia for
links to articles and then save those pages for
offline reading. This is ideal for when travelling
or in a location with a poor internet connection.
The plan is simple: using Beautiful Soup
with the HTML5Lib Parser, we’re going to load
a Wikipedia page, remove all of the GUI and
unrelated content, search the content for links
to other Wikipedia articles and then, after a tiny
bit of modification, write them to a file.
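The link-searching step of this plan comes down to deciding which links point at ordinary articles. As a minimal sketch, the check could be written as a list of unwanted substrings plus any(). The name is_article_link() and the sample links are made up for illustration; the substrings are the ones tested in the isValidLink() function of the full code listing.

```python
# Substrings that mark a link as something other than a plain article link.
BLOCKED = (":", "#", "http://", "wikibooks", "wikiquote", "wiktionary",
           "wikiversity", "wikivoyage", "wikisource", "wikinews", "wikidata")


def is_article_link(href):
    """Return True for /wiki/ links that contain none of the blocked parts."""
    if "/wiki/" not in href:
        return False
    return not any(fragment in href for fragment in BLOCKED)


# Made-up sample links of the kinds found on a Wikipedia page.
links = [
    "/wiki/Python_(programming_language)",
    "/wiki/File:Logo.png",
    "/wiki/Guido_van_Rossum#Early_life",
    "http://en.wikibooks.org/wiki/Python",
    "/w/index.php?title=Python",
]
articles = [href for href in links if is_article_link(href)]
```

Only the first sample link survives the filter: the others are a file page, an in-page anchor, a link to a sister project and a non-article URL.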
Even though it’s now the de facto knowledge
base of the world, Wikipedia isn’t great when it
comes to DOM consistency – that is, IDs and
classes are sometimes quite loose in their
usage. Because of this, we will also cover how
to handle all of the excess bits and bobs of the
Wikipedia GUI that we don’t need, as well as the
various erroneous links that won’t be of much
use to us. You can find the CSS stylesheet
and a Python script pertaining to this tutorial at
http://bit.ly/19MibBv.
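To picture how a rule list such as the undesirables variable in the code listing can describe the parts of a page to throw away, here is a simplified, library-free sketch. It is not how Beautiful Soup matches elements internally and it is not taken from the tutorial's script; the function names matches_rule() and is_undesirable() are hypothetical, and each element is represented simply as a tag name plus a dictionary of attribute strings.

```python
def matches_rule(tag_name, attrs, rule):
    """Return True when an element (tag name plus attribute dict) fits a rule."""
    # A rule with element None applies to any tag.
    if rule["element"] is not None and rule["element"] != tag_name:
        return False
    wanted = rule["attr"]
    if wanted is None:
        return True  # no attribute condition, the tag name alone decides
    for key, value in wanted.items():
        # An attribute such as class can hold several space-separated values.
        if value not in attrs.get(key, "").split():
            return False
    return True


def is_undesirable(tag_name, attrs, rules):
    """Return True when at least one rule matches the element."""
    return any(matches_rule(tag_name, attrs, rule) for rule in rules)


# Three rules in the same shape as the entries in the undesirables list.
rules = [
    {"element": "table", "attr": {"class": "infobox"}},
    {"element": "img", "attr": None},
    {"element": None, "attr": {"id": "References"}},
]

drop_infobox = is_undesirable("table", {"class": "infobox vcard"}, rules)
keep_paragraph = not is_undesirable("p", {}, rules)
```

Because Wikipedia's IDs and classes are used loosely, a flat list of rules like this is easy to extend whenever a new kind of unwanted element turns up.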
Full code listing continued
    if ("/wiki/" in link and ":" not in link and "http://" not in link
            and "wikibooks" not in link and "#" not in link
            and "wikiquote" not in link and "wiktionary" not in link
            and "wikiversity" not in link and "wikivoyage" not in link
            and "wikisource" not in link and "wikinews" not in link
            and "wikidata" not in link):
        return True
    else:
        return False

4 Get the page: Here we grab the page we want to store and remove the bits of the document we don’t need
fileName = str(name)
if level == maxLevel:
    deepestAddresses.append(fileName.replace('/', '_') + ".html")
doctype = ""
head = "" + fileName + ""
def grabPage(URL, level, name):
5 Check links
page = req.read()
f = open(storeFolder + "/" + fileName.replace('/',
'_') + ".html", 'w')
f.write(doctype + "" + head +
"