Bitstring Manual

A Python module to help you manage your bits
by Scott Griffiths
version 3.0.0
November 20, 2011
python-bitstring.googlecode.com
CONTENTS
I User Manual
1 Walkthrough
1.1 A Brief Introduction
1.1.1 Prerequisites
1.1.2 Getting started
1.1.3 Modifying bitstrings
1.1.4 Finding and Replacing
1.1.5 Constructing a bitstring
1.1.6 Parsing bitstreams
1.2 Worked examples
1.2.1 Hamming distance
1.2.2 Sieve of Eratosthenes
2 Introduction
2.1 Getting Started
3 Creation
3.1 The bitstring classes
3.2 Using the constructor
3.2.1 From a hexadecimal string
3.2.2 From a binary string
3.2.3 From an octal string
3.2.4 From an integer
3.2.5 Big and little-endian integers
3.2.6 From a floating point number
3.2.7 Exponential-Golomb codes
3.2.8 From raw byte data
3.2.9 From a file
3.3 The auto initialiser
4 Packing
4.1 Compact format strings
5 Interpreting Bitstrings
5.1 bin
5.2 hex
5.3 oct
5.4 uint / uintbe / uintle / uintne
5.5 int / intbe / intle / intne
5.6 float / floatbe / floatle / floatne
5.7 bytes
5.8 ue
5.9 se
5.10 uie / sie
6 Slicing, Dicing and Splicing
6.1 Slicing
6.1.1 Stepping in slices
6.2 Joining
6.3 Truncating, inserting, deleting and overwriting
6.3.1 Deleting and truncating
6.3.2 insert
6.3.3 overwrite
6.4 The bitstring as a list
6.5 Splitting
6.5.1 split
6.5.2 cut
7 Reading, Parsing and Unpacking
7.1 Reading and parsing
7.1.1 read / readlist
7.1.2 Reading using format strings
7.1.3 Peeking
7.2 Unpacking
7.3 Seeking
7.4 Finding and replacing
7.4.1 find / rfind
7.4.2 findall
7.4.3 replace
7.5 Working with byte aligned data
8 Miscellany
8.1 Other Functions
8.1.1 bytealign
8.1.2 reverse
8.1.3 tobytes
8.1.4 tofile
8.1.5 startswith / endswith
8.1.6 ror / rol
8.2 Special Methods
8.2.1 __len__
8.2.2 __str__ / __repr__
8.2.3 __eq__ / __ne__
8.2.4 __invert__
8.2.5 __lshift__ / __rshift__ / __ilshift__ / __irshift__
8.2.6 __mul__ / __imul__ / __rmul__
8.2.7 __copy__
8.2.8 __and__ / __or__ / __xor__ / __iand__ / __ior__ / __ixor__
II Reference
9 Quick Reference
9.1 Bits
9.1.1 Methods
9.1.2 Special methods
9.1.3 Properties
9.2 BitArray
9.2.1 Additional methods
9.2.2 Additional special methods
9.2.3 Attributes
9.3 ConstBitStream
9.3.1 Additional methods
9.3.2 Additional attributes
9.4 BitStream
10 The bitstring module
10.1 The auto initialiser
10.2 Compact format strings
10.3 Class properties
11 The Bits class
12 The BitArray class
13 The ConstBitStream class
14 The BitStream class
15 Functions
16 Exceptions
III Appendices
17 Examples
17.1 Creation
17.2 Manipulation
17.3 Parsing
17.4 Sieve of Eratosthenes
18 Exponential-Golomb Codes
18.1 Interleaved exponential-Golomb codes
19 Optimisation Techniques
19.1 Use combined read and interpretation
19.2 Choose the simplest class you can
19.3 Use dedicated functions for bit setting and checking
20 Release Notes
20.1 Full Version History
20.1.1 November 21st 2011: version 3.0.0 released
20.1.2 Backwardly incompatible changes
20.1.3 New features
20.1.4 June 18th 2011: version 2.2.0 released
20.1.5 February 23rd 2011: version 2.1.1 released
20.1.6 January 23rd 2011: version 2.1.0 released
20.1.7 New class hierarchy introduced with simpler classes
20.1.8 July 26th 2010: version 2.0.3 released
20.1.9 July 25th 2010: version 2.0.2 released
20.1.10 The backwardly incompatible changes are:
20.1.11 The new features in this release are:
20.1.12 March 18th 2010: version 1.3.0 for Python 2.6 and 3.x released
20.1.13 New features
20.1.14 January 19th 2010: version 1.2.0 for Python 2.6 and 3.x released
20.1.15 New ‘Bits’ class
20.1.16 December 22nd 2009: version 1.1.3 for Python 2.6 and 3.x released
20.1.17 December 18th 2009: version 1.1.2 for Python 2.6 and 3.x released
20.1.18 November 24th 2009: version 1.1.0 for Python 2.6 and 3.x released
20.1.19 New features
20.1.20 October 10th 2009: version 1.0.1 for Python 3.x released
20.1.21 October 9th 2009: version 1.0.0 for Python 2.x released
20.1.22 API Changes
20.1.23 New features
20.1.24 September 11th 2009: version 0.5.2 for Python 2.x released
20.1.25 August 26th 2009: version 0.5.1 for Python 2.x released
20.1.26 July 19th 2009: version 0.5.0 for Python 2.x released
20.1.27 June 15th 2009: version 0.4.3 for Python 2.x released
20.1.28 May 25th 2009: version 0.4.2 for Python 2.x released
20.1.29 April 23rd 2009: Python 3 only version 0.4.1 released
20.1.30 April 11th 2009: version 0.4.0 released
20.1.31 March 11th 2009: version 0.3.2 released
20.1.32 February 26th 2009: version 0.3.1 released
20.1.33 February 15th 2009: version 0.3.0 released
20.1.34 January 13th 2009: version 0.2.0 released
20.1.35 December 29th 2008: version 0.1.0 released
Part I
User Manual
CHAPTER
ONE
WALKTHROUGH
1.1 A Brief Introduction
The aim of the bitstring module is to make dealing with binary data in Python as easy as
possible. In this section I will take you through some of the features of the module to help you
get started using it.
Only a few of the module’s features will be covered in this walkthrough; the User Manual and
Reference provide a more thorough guide. The whole of this section can be safely skipped or
skimmed over if you prefer to start with the manual. If, however, you’d like a gentler introduction
then you might like to follow along with the examples in a Python interpreter.
1.1.1 Prerequisites
* Python 2.6, 2.7 or 3.x. (If you’re using Python 2.4 or 2.5 then you can use bitstring
  version 1.0, but that isn’t covered here.)
* An installed bitstring module.
* A rudimentary knowledge of binary concepts.
* A little free time.
If you haven’t yet downloaded and installed bitstring then please do so (it might
be as easy as typing “sudo easy_install bitstring” or “sudo pip install
bitstring” depending on your system). I’ll be going through some examples using the
interactive Python interpreter, so feel free to start up a Python session and follow along.
1.1.2 Getting started
>>> from bitstring import BitArray, BitStream
First things first, we’re going to be typing ‘bitstring’ a lot, so importing directly saves us a lot of
bitstring.BitStream nonsense. The classes we have imported are BitArray which
is just a container for our binary data and BitStream which adds a bit position and reading
bitstring Documentation, Release 3.0.0
methods to treat the data as a stream. There are also immutable versions of both these classes
that we won’t be using here.
We can now create a couple of bitstrings:
>>> a = BitArray('0xff01')
>>> b = BitArray('0b110')
The first of these we made from the hexadecimal string 0xff01 - the 0x prefix makes it
hexadecimal just as 0b means binary and 0o means octal. Each hex digit represents four bits,
so we have a bitstring of length 16 bits.
The second was created from a binary string. In this case it is just three bits long. Don’t worry
about it not being a whole number of bytes long, that’s all been taken care of internally.
Note: Be sure to remember the quotes around the hex and binary strings. If you forget
them you would just have an ordinary Python integer, which would instead create a bitstring
of that many ‘0’ bits. For example 0xff01 is the same as the base-10 number 65281, so
BitArray(0xff01) would consist of 65281 zero bits!
There are lots of things we can do with our new bitstrings, the simplest of which is just to print
them:
>>> print(a)
0xff01
>>> print(b)
0b110
Now you would be forgiven for thinking that the strings that we used to create the two bitstrings
had just been stored to be given back when printed, but that’s not the case. Every bitstring
should be considered just as a sequence of bits. As we’ll see there are lots of ways to create and
manipulate them, but they have no memory of how they were created. When they are printed
they just pick the simplest hex or binary representation of themselves. If you prefer you can
pick the representation that you want:
>>> a.bin
'1111111100000001'
>>> b.oct
'6'
>>> b.int
-2
>>> a.bytes
'\xff\x01'
There are a few things to note here:
* To get the different interpretations of the binary data we use properties such as bin, hex,
  oct, int and bytes. You can probably guess what these all mean, but you don’t need
  to know quite yet. The properties are calculated when you ask for them rather than being
  stored as part of the object itself.
* The bytes property returns a bytes object. This is slightly different in Python 2 to
  Python 3 - in Python 3 you would get b'\xff\x01' returned instead.
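To make it clearer that each property is just a different reading of the same bits, here is a rough sketch that reproduces the values above using only the Python 3 standard library. It is not how bitstring works internally, and the helper name signed_value is made up for this illustration.

```python
def signed_value(bits):
    """Interpret a string of '0'/'1' characters as a two's complement integer."""
    value = int(bits, 2)              # unsigned reading of the bits
    if bits[0] == '1':                # a set top bit means the value is negative
        value -= 1 << len(bits)
    return value

a_bits = '1111111100000001'   # the 16 bits of 0xff01
b_bits = '110'                # the 3 bits of 0b110

b_oct = format(int(b_bits, 2), 'o')            # one octal digit per 3 bits
b_int = signed_value(b_bits)                   # 3-bit two's complement
a_bytes = int(a_bits, 2).to_bytes(2, 'big')    # 16 bits make 2 bytes
```

The three-bit pattern 110 reads as 6 when unsigned, but as -2 when the top bit is treated as a sign bit, which is why b.int is negative.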
Great - let’s try some more:
>>> b.hex
bitstring.InterpretError: Cannot convert to hex unambiguously - not multiple of 4 bits.
Oh dear. The problem we have here is that b is 3 bits long, whereas each hex digit represents
4 bits. This means that there is no unambiguous way to represent it in hexadecimal. There are
similar restrictions on other interpretations (octal must be a multiple of 3 bits, bytes a multiple
of 8 bits etc.)
An exception is raised rather than trying to guess the best hex representation as there are a
multitude of ways to convert to hex. I occasionally get asked why it doesn’t just do the ‘obvious’
conversion, which is invariably what that person expects from their own field of work. This could
be truncating bits at the start or end, or padding at the start or end with either zeros or ones.
Rather than try to guess what is meant we just raise an exception - if you want a particular
behaviour then write it explicitly:
>>> (b + [0]).hex
'c'
>>> ([0] + b).hex
'6'
Here we’ve added a zero bit first to the end and then to the start. Don’t worry too much about
how it all works, but just to give you a taster the zero bit [0] could also have been written
as BitArray([0]), BitArray('0b0'), BitArray(bin='0'), '0b0' or just 1 (this final method
isn’t a typo, it means construct a bitstring of length one, with all the bits initialised to
zero - it does look a bit confusing though, which is why I prefer [0] and [1] to represent
single bits). Take a look at The auto initialiser for more details.
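The two explicit paddings above can be sketched with plain strings of '0' and '1' characters. This is only an illustration of why the choice matters, using a hypothetical helper hex_of rather than anything from the bitstring API.

```python
def hex_of(bits):
    """Return the hex digits for a bit string whose length is a multiple of 4."""
    if len(bits) % 4 != 0:
        raise ValueError('not a multiple of 4 bits')
    return format(int(bits, 2), '0{}x'.format(len(bits) // 4))

b_bits = '110'
padded_at_end = hex_of(b_bits + '0')     # 1100
padded_at_start = hex_of('0' + b_bits)   # 0110
```

The same three bits give two different hex digits depending on where the padding goes, so refusing to guess seems the safer default.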
1.1.3 Modifying bitstrings
A BitArray can be treated just like a list of bits. You can slice it, delete sections, insert new
bits and more using standard index notation:
>>> print(a[3:9])
0b111110
>>> del a[-6:]
>>> print(a)
0b1111111100
The slicing works just as it does for other containers, so the deletion above removes the final
six bits.
If you ask for a single item, rather than a slice, a boolean is returned. Naturally enough 1 bits
are True whereas 0 bits are False.
>>> a[0]
True
>>> a[-1]
False
To join together bitstrings you can use a variety of methods, including append, prepend,
insert, and plain + or += operations:
>>> a.prepend('0b01')
>>> a.append('0o7')
>>> a += '0x06'
Here we first put two bits at the start of a, then three bits on the end (a single octal digit) and
finally another byte (two hex digits) on the end.
Note how we are just using ordinary strings to specify the new bitstrings we are adding. These
get converted automatically to the right sequence of bits.
Note: The length in bits of bitstrings specified with strings depends on the number of
characters, including leading zeros. So each hex character is four bits, each octal character
three bits and each binary character one bit.
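As a rough check of the rule in the note, the following standard-library sketch counts the bits contributed by each of the three strings joined onto a above. The names BITS_PER_CHAR and literal_length are invented for the example, and it assumes the strings contain no whitespace.

```python
BITS_PER_CHAR = {'0x': 4, '0o': 3, '0b': 1}

def literal_length(text):
    """Length in bits of a prefixed hex, octal or binary literal string."""
    prefix, digits = text[:2].lower(), text[2:]
    return BITS_PER_CHAR[prefix] * len(digits)

# a held 10 bits after the deletion earlier in this section
total = 10 + literal_length('0b01') + literal_length('0o7') + literal_length('0x06')
```

Note that '0x06' counts as a full eight bits even though its leading digit is zero.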
1.1.4 Finding and Replacing
A find method is provided to search for bit patterns within a bitstring. You can choose whether to
search only on byte boundaries or at any bit position:
>>> a = BitArray('0xa9f')
>>> a.find('0x4f')
(3,)
Here we have found the 0x4f byte in our bitstring, though it wasn’t obvious from the
hexadecimal as it was at bit position 3. To see this more clearly consider this equality:
>>> a == '0b101, 0x4f, 0b1'
True
in which we’ve broken the bitstring into three parts to show the found byte. This also illustrates
using commas to join bitstring sections.
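The difference between searching at any bit position and searching only on byte boundaries can be sketched with ordinary Python strings. This is an illustration only, with a made-up helper find_aligned, and says nothing about how bitstring performs the search.

```python
haystack = format(0xa9f, '012b')   # the 12 bits of 0xa9f
needle = format(0x4f, '08b')       # the 8 bits of 0x4f

# Search at every bit position.
position = haystack.find(needle)

def find_aligned(haystack, needle):
    """Search only at positions that are a multiple of 8 bits."""
    for start in range(0, len(haystack) - len(needle) + 1, 8):
        if haystack.startswith(needle, start):
            return start
    return -1

aligned_position = find_aligned(haystack, needle)
```

Here the unaligned search finds the pattern at bit 3, while the byte-aligned search finds nothing because the only candidate position is bit 0.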
1.1.5 Constructing a bitstring
Let’s say you have a specification for a binary file type (or maybe a packet specification etc.)
and you want to create a bitstring quickly and easily in Python. For this example I’m going
to use a header from the MPEG-2 video standard. Here’s how the header is described in the
standard:
sequence_header()                          No. of bits   Mnemonic
sequence_header_code                       32            bslbf
horizontal_size_value                      12            uimsbf
vertical_size_value                        12            uimsbf
aspect_ratio_information                   4             uimsbf
frame_rate_code                            4             uimsbf
bit_rate_value                             18            uimsbf
marker_bit                                 1             bslbf
vbv_buffer_size_value                      10            uimsbf
constrained_parameters_flag                1             bslbf
load_intra_quantiser_matrix                1             uimsbf
if (load_intra_quantiser_matrix)
    { intra_quantiser_matrix[64] }         8*64          uimsbf
load_non_intra_quantiser_matrix            1             uimsbf
if (load_non_intra_quantiser_matrix)
    { non_intra_quantiser_matrix[64] }     8*64          uimsbf
next_start_code()
The mnemonics mean things like uimsbf = ‘Unsigned integer, most significant bit first’.
So to create a sequence_header for your particular stream with width of 352 and height of 288
you could start like this:
s = BitArray()
s.append('0x000001b3')   # the sequence_header_code
s.append('uint:12=352')  # 12 bit unsigned integer
s.append('uint:12=288')
...
which is fine, but if you wanted to be a bit more concise you could just write
s = BitArray('0x000001b3, uint:12=352, uint:12=288')
This is better, but it might not be a good idea to have the width and height hard-wired in like
that. We can make it more flexible by using a format string and the pack function:
import bitstring
width, height = 352, 288
s = bitstring.pack('0x000001b3, 2*uint:12', width, height)
where we have also used 2*uint:12 as shorthand for uint:12, uint:12.
The pack function can also take a dictionary as a parameter which can replace the tokens in
the format string. For example:
fmt = """sequence_header_code,
         uint:12=horizontal_size_value,
         uint:12=vertical_size_value,
         uint:4=aspect_ratio_information,
         ...
         """
d = {'sequence_header_code': '0x000001b3',
     'horizontal_size_value': 352,
     'vertical_size_value': 288,
     'aspect_ratio_information': 1,
     ...
     }
s = bitstring.pack(fmt, **d)
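To see what packing amounts to at the bit level, here is a small standard-library sketch that concatenates fixed-width unsigned fields. The function pack_fields is hypothetical and far less capable than the module's pack function; it only covers the first four header fields listed in the table.

```python
def pack_fields(fields):
    """Concatenate (value, width) pairs into one string of '0'/'1' characters."""
    parts = []
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError('{} does not fit in {} bits'.format(value, width))
        parts.append(format(value, '0{}b'.format(width)))
    return ''.join(parts)

header_bits = pack_fields([
    (0x000001b3, 32),  # sequence_header_code
    (352, 12),         # horizontal_size_value
    (288, 12),         # vertical_size_value
    (1, 4),            # aspect_ratio_information
])
header_hex = format(int(header_bits, 2), '0{}x'.format(len(header_bits) // 4))
```

The four fields add up to 60 bits, which is 15 hex digits.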
1.1.6 Parsing bitstreams
You might have noticed that pack returned a BitStream rather than a BitArray. This
isn’t a problem as the BitStream class just adds a few stream-like qualities to BitArray
which we’ll take a quick look at here.
First, let’s look at the stream we’ve just created:
>>> s
BitStream('0x000001b31601201')
The stream-ness of this object is via its bit position, and various reading and peeking methods.
First let’s try a read or two, and see how this affects the bit position:
>>> s.pos
0
>>> s.read(24)
BitStream('0x000001')
>>> s.pos
24
>>> s.read('hex:8')
'b3'
>>> s.pos
32
First we read 24 bits, which returned a new BitStream object, then we used a format string
to read 8 bits interpreted as a hexadecimal string. We know that the next two sets of 12 bits
were created from integers, so to read them back we can say
>>> s.readlist('2*uint:12')
[352, 288]
If you don’t want to use a bitstream then you can always use unpack. This takes much the
same form as readlist except it just unpacks from the start of the bitstring. For example:
>>> s.unpack('bytes:4, 2*uint:12, uint:4')
['\x00\x00\x01\xb3', 352, 288, 1]
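The idea of a bit position that advances on each read, and stays put when peeking, can be sketched in a few lines of plain Python. The BitReader class below is hypothetical and exists only for this illustration; it is not part of bitstring and handles unsigned integers only.

```python
class BitReader:
    """Hypothetical minimal reader over a string of '0'/'1' characters."""

    def __init__(self, bits):
        self.bits = bits
        self.pos = 0          # current bit position

    def read_uint(self, width):
        """Read width bits as an unsigned integer and advance the position."""
        if self.pos + width > len(self.bits):
            raise ValueError('not enough bits left')
        chunk = self.bits[self.pos:self.pos + width]
        self.pos += width
        return int(chunk, 2)

    def peek_uint(self, width):
        """As read_uint, but leave the position unchanged."""
        saved = self.pos
        try:
            return self.read_uint(width)
        finally:
            self.pos = saved

reader = BitReader(format(0x000001b31601201, '060b'))
start_code = reader.read_uint(32)
width = reader.peek_uint(12)
width_again, height = reader.read_uint(12), reader.read_uint(12)
```

After these calls the position is 56, leaving the final four-bit field unread.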
1.2 Worked examples
Below are a few examples of using the bitstring module, as I always find that a good example
can help more than a lengthy reference manual.
1.2.1 Hamming distance
The Hamming distance between two bitstrings is the number of bit positions in which the two
bitstrings differ. So for example the distance between 0b00110 and 0b01100 is 2 as the second
and fourth bits are different.
Write a function that calculates the Hamming distance between two bitstrings.
def hamming_distance(a, b):
    return (a ^ b).count(True)
Er, that’s it. The ^ is a bit-wise exclusive or, which means that the bits in a ^ b are only set if
they differ in a and b. The count method just counts the number of 1 (or True) bits.
>>> from bitstring import Bits
>>> a = Bits('0b00110')
>>> hamming_distance(a, '0b01100')
2
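For comparison, the same exclusive-or idea works on ordinary Python integers. This is a standard-library sketch rather than anything from the module, and unlike bitstrings, plain integers carry no length, so leading zero bits are simply ignored.

```python
def hamming_distance(x, y):
    """Number of bit positions in which two non-negative integers differ."""
    return bin(x ^ y).count('1')

distance = hamming_distance(0b00110, 0b01100)
```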
1.2.2 Sieve of Eratosthenes
The sieve of Eratosthenes is an ancient (and very inefficient) method of finding prime numbers.
The algorithm starts with the number 2 (which is prime) and marks all of its multiples as not
prime. It then continues with the next unmarked integer (which will also be prime) and marks
all of its multiples as not prime, and so on.
So to print all primes under a million you could write:
from bitstring import BitArray
# create a BitArray with a million zero bits.
# The bits will be set to indicate that the bit position isn't prime.
has_factors = BitArray(1000000)
for i in range(2, 1000000):
    if not has_factors[i]:
        print(i)
        # Set all multiples of our prime to 1.
        has_factors.set(True, range(i*2, 1000000, i))
I’ll leave optimising the algorithm as an exercise for the reader, but it illustrates both bit
checking and setting. One reason you might want to use a bitstring for this purpose (instead of a
plain list for example) is that the million bits only take up a million bits in memory, whereas
for a list of integers it would be much more. Try asking for a billion elements in a list - unless
you’ve got some really nice hardware it will fail, whereas a billion element bitstring only takes
125MB.
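The same marking scheme can be sketched with a standard-library bytearray, which spends a whole byte on each flag where a bitstring would spend a single bit. The function name primes_below is invented for the example, and the last line just checks the arithmetic behind the 125MB figure.

```python
def primes_below(limit):
    """Sieve of Eratosthenes using one byte per flag."""
    has_factors = bytearray(limit)           # all flags start at zero
    primes = []
    for i in range(2, limit):
        if not has_factors[i]:
            primes.append(i)
            # Mark every multiple of this prime as having a factor.
            for multiple in range(i * 2, limit, i):
                has_factors[multiple] = 1
    return primes

# Memory needed for a billion one-bit flags, in megabytes.
megabytes = 10**9 / 8 / 10**6
```

With one byte per flag the same billion entries would need about a gigabyte, eight times as much.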
CHAPTER
TWO
INTRODUCTION
While it is not difficult to manipulate binary data in Python, for example using the struct
and array modules, it can be quite fiddly and time consuming even for quite small tasks,
especially if you are not dealing only with whole-byte data.
The bitstring module provides four classes, BitStream, BitArray, ConstBitStream
and Bits, instances of which can be constructed from integers, floats, hex, octal, binary,
strings or files, but they all just represent a string of binary digits. I shall use the general
term ‘bitstring’ when referring generically to any of the classes, and use the class names for
parts that apply to only one or another.
BitArray objects can be sliced, joined, reversed, inserted into, overwritten, packed,
unpacked etc. with simple functions or slice notation. BitStream objects can also be read
from, searched in, and navigated in, similar to a file or stream.
Bitstrings are designed to be as lightweight as possible and can be considered to be just a list
of binary digits. They are however stored efficiently - although there are a variety of ways of
creating and viewing the binary data, the bitstring itself just stores the byte data, and all views
are calculated as needed, and are not stored as part of the object.
The different views or interpretations on the data are accessed through properties such as hex,
bin and int, and an extensive set of functions is supplied for modifying, navigating and
analysing the binary data.
A complete reference for the module is given in the Reference section, while the rest of this
manual acts more like a tutorial or guided tour. Below are just a few examples to whet your
appetite; everything here will be covered in greater detail in the rest of this manual.
from bitstring import BitArray
Just some of the ways to create bitstrings:
# from a binary string
a = BitArray('0b001')
# from a hexadecimal string
b = BitArray('0xff470001')
# straight from a file
c = BitArray(filename='somefile.ext')
# from an integer
d = BitArray(int=540, length=11)
# using a format string
d = BitArray('int:11=540')
Easily construct new bitstrings:
# 5 copies of 'a' followed by two new bytes
e = 5*a + '0xcdcd'
# put a single bit on the front
e.prepend('0b1')
# take a slice of the first 7 bits
f = e[:7]
# replace 3 bits with 9 bits from octal string
f[1:4] = '0o775'
# find and replace 2 bit string with 16 bit string
f.replace('0b01', '0xee34')
Interpret the bitstring however you want:
>>> print(e.hex)
9249cdcd
>>> print(e.int)
-1840656947
>>> print(e.uint)
2454310349
2.1 Getting Started
The easiest way to install bitstring is to use easy_install via:
sudo easy_install bitstring
or similar.
If you want an earlier version, or need other files in the full package, you can download and
extract the contents of the .zip provided on the project’s website.
First download the latest release (see the Downloads tab on the project’s homepage). Note that
this manual covers only the Python 2.6 and later version. Version 1.0 is available for Python
2.4 / 2.5, which can be found on the project’s homepage.
If you then extract the contents of the zip file you should find files organised in these directories:
bitstring/ : The bitstring module files.
test/ : Unit tests for the module, plus some example files for testing purposes.
doc/ : This manual as a PDF and as HTML.
If you downloaded the source and want to install, run:
python setup.py install
You might need to add a ‘sudo’ to the start of that command, depending on your system. This
will copy the source files to your Python installation’s site-packages directory.
The module comes with comprehensive unit tests. To run them yourself use your favourite unit
test running method; mine is:
nosetests -w test
which should run all the tests (over 400) and say OK. If tests fail then either your version of
Python isn’t supported (you need Python 2.6, 2.7 or 3.x, though earlier versions supported 2.4
and 2.5) or something unexpected has happened - in which case please tell me about it.
CHAPTER
THREE
CREATION
You can create bitstrings in a variety of ways. Internally they are stored as byte arrays, which
means that no space is wasted, and a bitstring containing 10MB of binary data will only take
up 10MB of memory.
3.1 The bitstring classes
Four classes are provided by the bitstring module: BitStream and BitArray together with
their immutable versions ConstBitStream and Bits:
* Bits (object): This is the most basic class. It is immutable and so its contents can’t
  be changed after creation.
* BitArray (Bits): This adds mutating methods to its base class.
* ConstBitStream (Bits): This adds methods and properties to allow the bits to be
  treated as a stream of bits, with a bit position and reading/parsing methods.
* BitStream (BitArray, ConstBitStream): This is the most versatile class,
  having both the bitstream methods and the mutating methods.
Before version 3.0 Bits was known as ConstBitArray. The old name is still available for
backward compatibility.
The term ‘bitstring’ is used in this manual to refer generically to any of these classes.
Most of the examples in this manual use the BitArray class, with BitStream used when
necessary. For most uses the non-const classes are more versatile and so probably your best
choice when starting to use the module.
To summarise when to use each class:
* If you need to change the contents of the bitstring then you must use BitArray or
  BitStream. Truncating, replacing, inserting, appending etc. are not available for the
  const classes.
* If you need to use a bitstring as the key in a dictionary or as a member of a set then you
  must use Bits or a ConstBitStream. As BitArray and BitStream objects are
  mutable they do not support hashing and so cannot be used in these ways.
* If you are creating directly from a file then a BitArray or BitStream will read the
  file into memory whereas a Bits or ConstBitStream will not, so using the const
  classes allows extremely large files to be examined.
* If you don’t need the extra functionality of a particular class then the simpler ones might
  be faster and more memory efficient. The fastest and most memory efficient class is
  Bits.
The Bits class is the base class of the other three classes. This means that isinstance(s,
Bits) will be true if s is an instance of any of the four classes.
3.2 Using the constructor
When initialising a bitstring you need to specify at most one initialiser. These will be explained
in full below, but briefly they are:
* auto: Either a specially formatted string, a list or tuple, a file object, integer, bytearray,
  bytes or another bitstring.
* bytes: A bytes object (a str in Python 2.6), for example read from a binary file.
* hex, oct, bin: Hexadecimal, octal or binary strings.
* int, uint: Signed or unsigned bit-wise big-endian binary integers.
* intle, uintle: Signed or unsigned byte-wise little-endian binary integers.
* intbe, uintbe: Signed or unsigned byte-wise big-endian binary integers.
* intne, uintne: Signed or unsigned byte-wise native-endian binary integers.
* float / floatbe, floatle, floatne: Big, little and native endian floating point
  numbers.
* se, ue: Signed or unsigned exponential-Golomb coded integers.
* sie, uie: Signed or unsigned interleaved exponential-Golomb coded integers.
* bool: A boolean (i.e. True or False).
* filename: Directly from a file, without reading into memory.
3.2.1 From a hexadecimal string
>>> c = BitArray(hex='0x000001b3')
>>> c.hex
'000001b3'
The initial 0x or 0X is optional. Whitespace is also allowed and is ignored. Note that the
leading zeros are significant, so the length of c will be 32.
If you include the initial 0x then you can use the auto initialiser instead. As it is the first
parameter in __init__ this will work equally well:
c = BitArray('0x000001b3')
3.2.2 From a binary string
>>> d = BitArray(bin='0011 00')
>>> d.bin
'001100'
An initial 0b or 0B is optional and whitespace will be ignored.
As with hex, the auto initialiser will work if the binary string is prefixed by 0b:
>>> d = BitArray('0b001100')
3.2.3 From an octal string
>>> o = BitArray(oct='34100')
>>> o.oct
'34100'
An initial 0o or 0O is optional, but 0o (a zero and lower-case ‘o’) is preferred as it is slightly
more readable.
As with hex and bin, the auto initialiser will work if the octal string is prefixed by 0o:
>>> o = BitArray('0o34100')
3.2.4 From an integer
>>> e = BitArray(uint=45, length=12)
>>> f = BitArray(int=-1, length=7)
>>> e.bin
'000000101101'
>>> f.bin
'1111111'
For initialisation with signed and unsigned binary integers (int and uint respectively) the
length parameter is mandatory, and must be large enough to contain the integer. So for
example if length is 8 then uint can be in the range 0 to 255, while int can range from
-128 to 127. Two’s complement is used to represent negative numbers.
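The ranges quoted above follow directly from the number of bits, as this standard-library sketch shows. The helpers int_range and to_bits are illustrative names, not part of the module, and the output is a plain string of '0' and '1' characters.

```python
def int_range(length, signed):
    """Smallest and largest integers that fit in the given number of bits."""
    if signed:
        return -(1 << (length - 1)), (1 << (length - 1)) - 1
    return 0, (1 << length) - 1

def to_bits(value, length, signed=False):
    """Encode an integer as a fixed-width bit string, using two's complement."""
    low, high = int_range(length, signed)
    if not low <= value <= high:
        raise ValueError('{} does not fit in {} bits'.format(value, length))
    # Masking maps a negative value onto its two's complement bit pattern.
    return format(value & ((1 << length) - 1), '0{}b'.format(length))

e_bits = to_bits(45, 12)
f_bits = to_bits(-1, 7, signed=True)
```

In two's complement -1 is always a run of 1 bits, whatever the length, which matches the seven 1s shown for f.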
The auto initialiser can be used by giving a colon and the length in bits immediately after the
int or uint token, followed by an equals sign then the value:
>>> e = BitArray('uint:12=45')
>>> f = BitArray('int:7=-1')
The plain int and uint initialisers are bit-wise big-endian. That is to say that the most
significant bit comes first and the least significant bit comes last, so the unsigned number one
will have a 1 as its final bit with all other bits set to 0. These can be any number of bits long.
For whole-byte bitstring objects there are more options available with different endiannesses.
3.2.5 Big and little-endian integers
>>> big_endian = BitArray(uintbe=1, length=16)
>>> little_endian = BitArray(uintle=1, length=16)
>>> native_endian = BitArray(uintne=1, length=16)
There are unsigned and signed versions of three additional ‘endian’ types. The unsigned
versions are used above to create three bitstrings.
The first of these, big_endian, is equivalent to just using the plain bit-wise big-endian uint
initialiser, except that all intbe or uintbe interpretations must be of whole-byte bitstrings,
otherwise a ValueError is raised.
The second, little_endian, is interpreted as least significant byte first, i.e. it is a byte
reversal of big_endian. So we have:
>>> big_endian.hex
’0001’
>>> little_endian.hex
’0100’
Finally we have native_endian, which will equal either big_endian or little_endian,
depending on whether you are running on a big or little-endian machine (if you really need
to check then use import sys; sys.byteorder).
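The relationship between the three byte orders can be seen without bitstring at all. This is a small sketch using Python 3’s int.to_bytes, shown only to illustrate the byte reversal; it is not how the module is implemented:

```python
import sys

value = 1
big = value.to_bytes(2, "big")             # most significant byte first
little = value.to_bytes(2, "little")       # least significant byte first
native = value.to_bytes(2, sys.byteorder)  # whatever this machine uses

print(big.hex(), little.hex())  # 0001 0100
print(little == big[::-1])      # True: little-endian is a byte reversal
print(native == (big if sys.byteorder == "big" else little))  # True
```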
3.2.6 From a floating point number
>>> f1 = BitArray(float=10.3, length=32)
>>> f2 = BitArray('float:64=5.4e31')
Floating point numbers can be used for initialisation provided that the bitstring is 32 or 64 bits
long. Standard Python floating point numbers are 64 bits long, so if you use 32 bits then some
accuracy could be lost.
Note that the exact bits used to represent the floating point number could be platform dependent.
Most PCs will conform to the IEEE 754 standard, and presently other floating point represen-
tations are not supported (although they should work on a single platform - it just might get
confusing if you try to interpret a generated bitstring on another platform).
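To see the loss of accuracy at 32 bits, the standard struct module can round-trip a value through both sizes. This is a sketch assuming an IEEE 754 platform; it uses struct rather than bitstring:

```python
import struct

# Round-trip 10.3 through 32-bit and 64-bit big-endian IEEE 754 encodings.
as32 = struct.unpack(">f", struct.pack(">f", 10.3))[0]
as64 = struct.unpack(">d", struct.pack(">d", 10.3))[0]

print(as64 == 10.3)             # True: 64 bits preserves a Python float exactly
print(as32 == 10.3)             # False: 32 bits cannot hold the same precision
print(abs(as32 - 10.3) < 1e-6)  # True: the error is small but not zero

# The little-endian encoding is the byte reversal of the big-endian one.
print(struct.pack("<f", 10.3) == struct.pack(">f", 10.3)[::-1])  # True
```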
Similar to the situation with integers there are big and little endian versions. The plain float
is big endian and so floatbe is just an alias.
As with other initialisers you can also auto initialise, as demonstrated with the second example
below:
>>> little_endian = BitArray(floatle=0.0, length=64)
>>> native_endian = BitArray('floatne:32=-6.3')
3.2.7 Exponential-Golomb codes
Initialisation with integers represented by exponential-Golomb codes is also possible. ue is
an unsigned code while se is a signed code. Interleaved exponential-Golomb codes are also
supported via uie and sie:
>>> g = BitArray(ue=12)
>>> h = BitArray(se=-402)
>>> g.bin
'0001101'
>>> h.bin
'0000000001100100101'
For these initialisers the length of the bitstring is fixed by the value it is initialised with, so
the length parameter must not be supplied and it is an error to do so. If you don’t know what
exponential-Golomb codes are then you are in good company, but they are quite interesting, so
I’ve included a section on them (see Exponential-Golomb Codes).
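For the curious, the standard (non-interleaved) codes are simple enough to compute by hand. The sketch below uses the usual definition of exponential-Golomb codes and plain Python strings; the function names ue_bin and se_bin are hypothetical and are not part of bitstring:

```python
def ue_bin(n):
    """Unsigned exponential-Golomb code of n >= 0, as a binary string."""
    if n < 0:
        raise ValueError("ue codes are only defined for non-negative integers")
    body = format(n + 1, "b")            # n + 1 written in binary
    return "0" * (len(body) - 1) + body  # preceded by len(body) - 1 zero bits


def se_bin(n):
    """Signed code: order the integers as 0, 1, -1, 2, -2, ... and encode the index."""
    return ue_bin(2 * n - 1 if n > 0 else -2 * n)


print(ue_bin(12))    # 0001101
print(se_bin(-402))  # 0000000001100100101
```

The number of leading zeros tells a reader how many bits follow the first 1, which is why the length is fixed by the value and cannot be supplied separately.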
The auto initialiser may also be used by giving an equals sign and the value immediately after
a ue or se token:
>>> g = BitArray('ue=12')
>>> h = BitArray('se=-402')
You may wonder why you would bother with auto in this case as the syntax is slightly longer.
Hopefully all will become clear in the next section.
3.2.8 From raw byte data
Using the length and offset parameters to specify the length in bits and an offset at the start to
be ignored is particularly useful when initialising from raw data or from a file.
a = BitArray(bytes=b'\x00\x01\x02\xff', length=28, offset=1)
b = BitArray(bytes=open("somefile", 'rb').read())
The length parameter is optional; it defaults to the length of the data in bits (and so will be a
multiple of 8). You can use it to truncate some bits from the end of the bitstring. The offset
parameter is also optional and is used to truncate bits at the start of the data.
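One way to picture offset and length is as a slice over the bits of the raw data. The following Python 3 sketch works on a string of '0' and '1' characters, and assumes that length counts the bits kept after the offset has been skipped; it is an illustration only, not bitstring’s implementation:

```python
data = b"\x00\x01\x02\xff"
bits = "".join(format(byte, "08b") for byte in data)  # all 32 bits of the data

offset, length = 1, 28
selected = bits[offset:offset + length]  # skip 1 bit, then keep 28 bits

print(len(bits), len(selected))  # 32 28
print(selected)
```

Here one bit is dropped from the start and the final three bits of the last byte are dropped from the end.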
You can also use a bytearray object, either explicitly with a bytes=some_bytearray
keyword or via the auto initialiser:
c=BitArray(a_bytearray_object)
If you are using Python 3.x you can use this trick with bytes objects too. This should be
used with caution as in Python 2 it will instead be interpreted as a string (it’s not possible
to distinguish between str and bytes in Python 2) and so your code won’t work the same
between Python versions.
d = BitArray(b'\x23g$5')  # Use with caution! Only works correctly in Python 3.
3.2.9 From a file
Using the filename initialiser allows a file to be analysed without the need to read it all into
memory. The way to create a file-based bitstring is:
p=Bits(filename="my2GBfile")
This will open the file in binary read-only mode. The file will only be read as and when other
operations require it, and the contents of the file will not be changed by any operations. If only
a portion of the file is needed then the offset and length parameters (specified in bits) can
be used.
Note that we created a Bits here rather than a BitArray, as they have quite different be-
haviour in this case. The immutable Bits will never read the file into memory (except as
needed by other operations), whereas if we had created a BitArray then the whole of the file
would immediately have been read into memory. This is because in creating a BitArray you
are implicitly saying that you want to modify it, and so it needs to be in memory.
It’s also possible to use the auto initialiser for file objects. It’s as simple as:
f = open('my2GBfile', 'rb')
p = Bits(f)
3.3 The auto initialiser
The auto parameter is the first parameter in the __init__ function and so the auto= can
be omitted when using it. It accepts either a string, an iterable, another bitstring, an integer, a
bytearray or a file object.
Strings starting with 0x or hex: are interpreted as hexadecimal, 0o or oct: implies octal,
and strings starting with 0b or bin: are interpreted as binary. You can also initialise with the
various integer initialisers as described above. If given another bitstring it will create a copy of
it; (non-string) iterables are interpreted as boolean arrays, and file objects act as a source of
binary data. Finally you can use an integer to create a zeroed bitstring of that number of bits.
>>> fromhex = BitArray('0x01ffc9')
>>> frombin = BitArray('0b01')
>>> fromoct = BitArray('0o7550')
>>> fromint = BitArray('int:32=10')
>>> fromfloat = BitArray('float:64=0.2')
>>> acopy = BitArray(fromoct)
>>> fromlist = BitArray([1, 0, 0])
>>> f = open('somefile', 'rb')
>>> fromfile = BitArray(f)
>>> zeroed = BitArray(1000)
>>> frombytes = BitArray(bytearray(b'xyz'))
It can also be used to convert between the BitArray and Bits classes:
>>> immutable = Bits('0xabc')
>>> mutable = BitArray(immutable)
>>> mutable += '0xdef'
>>> immutable = Bits(mutable)
As always the bitstring doesn’t know how it was created; initialising with octal or hex might
be more convenient or natural for a particular example but it is exactly equivalent to initialising
with the corresponding binary string.
>>> fromoct.oct
'7550'
>>> fromoct.hex
'f68'
>>> fromoct.bin
'111101101000'
>>> fromoct.uint
3944
>>> fromoct.int
-152
>>> BitArray('0o7777') == '0xfff'
True
>>> BitArray('0xf') == '0b1111'
True
>>> frombin[::-1] + '0b0' == fromlist
True
Note how in the final examples above only one half of the == needs to be a bitstring, the other
half gets auto initialised before the comparison is made. This is in common with many other
functions and operators.
You can also chain together string initialisers with commas, which causes the individual bit-
strings to be concatenated.
>>> s = BitArray('0x12, 0b1, uint:5=2, ue=5, se=-1, se=4')
>>> s.find('uint:5=2, ue=5')
(9,)
>>> s.insert('0o332, 0b11, int:23=300', 4)
Again, note how the format used in the auto initialiser can be used in many other places where
a bitstring is needed.
4 Packing
Another method of creating BitStream objects is to use the pack function. This takes a
format specifier which is a string with comma separated tokens, and a number of items to pack
according to it. Its signature is bitstring.pack(format, *values, **kwargs).
For example using just the *values arguments we can say:
s = bitstring.pack('hex:32, uint:12, uint:12', '0x000001b3', 352, 288)
which is equivalent to initialising as:
s = BitStream('0x000001b3, uint:12=352, uint:12=288')
The advantage of the pack function becomes clear when you want to write more general code
for creation.
def foo(a, b, c, d):
    return bitstring.pack('uint:8, 0b110, int:6, bin, bits', a, b, c, d)

s1 = foo(12, 5, '0b00000', '')
s2 = foo(101, 3, '0b11011', s1)
Note how you can use some tokens without sizes (such as bin and bits in the above ex-
ample), and use values of any length to fill them. If the size had been specified then a
ValueError would be raised if the parameter given was the wrong length. Note also how
bitstring literals can be used (the 0b110 in the bitstring returned by foo) and these don’t
consume any of the items in *values.
You can also include keyword, value pairs (or an equivalent dictionary) as the final parame-
ter(s). The values are then packed according to the positions of the keywords in the format
string. This is most easily explained with some examples. Firstly the format string needs to
contain parameter names:
format = 'hex:32=start_code, uint:12=width, uint:12=height'
Then we can make a dictionary with these parameters as keys and pass it to pack:
d = {'start_code': '0x000001b3', 'width': 352, 'height': 288}
s = bitstring.pack(format, **d)
Another method is to pass the same information as keywords at the end of pack’s parameter
list:
s = bitstring.pack(format, width=352, height=288, start_code='0x000001b3')
The tokens in the format string that you must provide values for are:
int:n        n bits as a signed integer.
uint:n       n bits as an unsigned integer.
intbe:n      n bits as a big-endian whole byte signed integer.
uintbe:n     n bits as a big-endian whole byte unsigned integer.
intle:n      n bits as a little-endian whole byte signed integer.
uintle:n     n bits as a little-endian whole byte unsigned integer.
intne:n      n bits as a native-endian whole byte signed integer.
uintne:n     n bits as a native-endian whole byte unsigned integer.
float:n      n bits as a big-endian floating point number (same as floatbe).
floatbe:n    n bits as a big-endian floating point number (same as float).
floatle:n    n bits as a little-endian floating point number.
floatne:n    n bits as a native-endian floating point number.
hex[:n]      [n bits as] a hexadecimal string.
oct[:n]      [n bits as] an octal string.
bin[:n]      [n bits as] a binary string.
bits[:n]     [n bits as] a new bitstring.
bool         single bit as a boolean (True or False).
ue           an unsigned integer as an exponential-Golomb code.
se           a signed integer as an exponential-Golomb code.
uie          an unsigned integer as an interleaved exponential-Golomb code.
sie          a signed integer as an interleaved exponential-Golomb code.
and you can also include constant bitstring tokens constructed from any of the following:
0b...        binary literal.
0o...        octal literal.
0x...        hexadecimal literal.
int:n=m      signed integer m in n bits.
uint:n=m     unsigned integer m in n bits.
intbe:n=m    big-endian whole byte signed integer m in n bits.
uintbe:n=m   big-endian whole byte unsigned integer m in n bits.
intle:n=m    little-endian whole byte signed integer m in n bits.
uintle:n=m   little-endian whole byte unsigned integer m in n bits.
intne:n=m    native-endian whole byte signed integer m in n bits.
uintne:n=m   native-endian whole byte unsigned integer m in n bits.
float:n=f    big-endian floating point number f in n bits.
floatbe:n=f  big-endian floating point number f in n bits.
floatle:n=f  little-endian floating point number f in n bits.
floatne:n=f  native-endian floating point number f in n bits.
ue=m         exponential-Golomb code for unsigned integer m.
se=m         exponential-Golomb code for signed integer m.
uie=m        interleaved exponential-Golomb code for unsigned integer m.
sie=m        interleaved exponential-Golomb code for signed integer m.
bool=b       a single bit, either True or False.
You can also use a keyword for the length specifier in the token, for example:
s = bitstring.pack('int:n=-1', n=100)
And finally it is also possible just to use a keyword as a token:
s = bitstring.pack('hello, world', world='0x123', hello='0b110')
As you would expect, there is also an unpack function that takes a bitstring and unpacks it
according to a very similar format string. This is covered later in more detail, but a quick
example is:
>>> s = bitstring.pack('ue, oct:3, hex:8, uint:14', 3, '0o7', '0xff', 90)
>>> s.unpack('ue, oct:3, hex:8, uint:14')
[3, '7', 'ff', 90]
4.1 Compact format strings
Another option when using pack, as well as other methods such as read and byteswap,
is to use a format specifier similar to those used in the struct and array modules. These
consist of a character to give the endianness, followed by more single characters to give the
format.
The endianness character must start the format string and unlike in the struct module it is not
optional (except when used with byteswap):
>    Big-endian
<    Little-endian
@    Native-endian
For ‘network’ endianness use > as network and big-endian are equivalent. This is followed by
at least one of these format characters:
b    8 bit signed integer
B    8 bit unsigned integer
h    16 bit signed integer
H    16 bit unsigned integer
l    32 bit signed integer
L    32 bit unsigned integer
q    64 bit signed integer
Q    64 bit unsigned integer
f    32 bit floating point number
d    64 bit floating point number
The exact type is determined by combining the endianness character with the format character,
but rather than give an exhaustive list a single example should explain:
>h    Big-endian 16 bit signed integer       intbe:16
<h    Little-endian 16 bit signed integer    intle:16
@h    Native-endian 16 bit signed integer    intne:16
As you can see all three are signed integers in 16 bits, the only difference is the endianness.
The native-endian @h will equal the big-endian >h on big-endian systems, and equal the
little-endian <h on little-endian systems. For the single byte codes b and B the endianness
doesn’t make any difference, but you still need to specify one so that the format string can be
parsed correctly.
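The same behaviour can be observed with the struct module that these format strings are modelled on. This is a sketch using struct only; note that it uses struct’s = prefix for native byte order, because in struct (unlike here) @ also applies native alignment and sizes:

```python
import struct
import sys

big = struct.pack(">h", 1)     # b'\x00\x01'
little = struct.pack("<h", 1)  # b'\x01\x00'
native = struct.pack("=h", 1)  # native byte order, standard 2-byte size

# The native result matches whichever order this machine uses.
print(native == (big if sys.byteorder == "big" else little))  # True

# For single-byte codes the endianness character changes nothing.
print(struct.pack(">b", -3) == struct.pack("<b", -3))  # True
```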
An example:
s = bitstring.pack('>qqqq', 10, 11, 12, 13)
is equivalent to
s = bitstring.pack('intbe:64, intbe:64, intbe:64, intbe:64', 10, 11, 12, 13)
Just as in the struct module you can also give a multiplicative factor before the format character,
so the previous example could be written even more concisely as
s = bitstring.pack('>4q', 10, 11, 12, 13)
You can of course combine these format strings with other initialisers, even mixing
endiannesses (although I’m not sure why you’d want to):
s = bitstring.pack('>6h3b, 0b1, <9L', *range(18))
This rather contrived example takes the numbers 0 to 17 and packs the first 6 as signed big-
endian 2-byte integers, the next 3 as single bytes, then inserts a single 1 bit, before packing the
remaining 9 as little-endian 4-byte unsigned integers.
5 Interpreting Bitstrings
Bitstrings don’t know or care how they were created; they are just collections of bits. This
means that you are quite free to interpret them in any way that makes sense.
Several Python properties are used to create interpretations for the bitstring. These properties
call private functions which will calculate and return the appropriate interpretation. These don’t
change the bitstring in any way and it remains just a collection of bits. If you use the property
again then the calculation will be repeated.
Note that these properties can potentially be very expensive in terms of both computation and
memory requirements. For example if you have initialised a bitstring from a 10 GB file object
and ask for its binary string representation then that string will be around 80 GB in size!
For the properties described below we will use these:
>>> a = BitArray('0x123')
>>> b = BitArray('0b111')
5.1 bin
The most fundamental interpretation is perhaps as a binary string (a ‘bitstring’). The bin
property returns a string of the binary representation of the bitstring. All bitstrings can use this
property and it is used to test equality between bitstrings.
>>> a.bin
'000100100011'
>>> b.bin
'111'
Note that the initial zeros are significant; for bitstrings the zeros are just as important as the
ones!
5.2 hex
For whole-byte bitstrings the most natural interpretation is often as hexadecimal, with each
byte represented by two hex digits.
If the bitstring does not have a length that is a multiple of four bits then an InterpretError
exception will be raised. This is done in preference to truncating or padding the value, which
could hide errors in user code.
>>> a.hex
'123'
>>> b.hex
InterpretError: Cannot convert to hex unambiguously - not multiple of 4 bits.
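The reason for refusing such lengths is that each hex digit stands for exactly four bits, so a 3-bit value like 111 could equally be padded to 0111 or 1110. A minimal sketch of the same rule on plain binary strings (the helper bin_to_hex is hypothetical and raises a plain ValueError):

```python
def bin_to_hex(bits):
    """Convert a string of '0'/'1' characters to hex, refusing ambiguous lengths."""
    if len(bits) % 4:
        raise ValueError("Cannot convert to hex unambiguously - not multiple of 4 bits.")
    # Each group of four bits maps to exactly one hex digit.
    return "".join(format(int(bits[i:i + 4], 2), "x") for i in range(0, len(bits), 4))


print(bin_to_hex("000100100011"))  # 123
try:
    bin_to_hex("111")
except ValueError as err:
    print(err)
```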
5.3 oct
For an octal interpretation use the oct property.
If the bitstring does not have a length that is a multiple of three then an InterpretError
exception will be raised.
>>> a.oct
'0443'
>>> b.oct
'7'
>>> (b + '0b0').oct
InterpretError: Cannot convert to octal unambiguously - not multiple of 3 bits.
5.4 uint / uintbe / uintle / uintne
To interpret the bitstring as a binary (base-2) bit-wise big-endian unsigned integer (i.e. a non-
negative integer) use the uint property.
>>> a.uint
291
>>> b.uint
7
For byte-wise big-endian, little-endian and native-endian interpretations use uintbe,
uintle and uintne respectively. These will raise a ValueError if the bitstring is not
a whole number of bytes long.
>>> s = BitArray('0x000001')
>>> s.uint    # bit-wise big-endian
1
>>> s.uintbe  # byte-wise big-endian
1
>>> s.uintle  # byte-wise little-endian
65536
>>> s.uintne  # byte-wise native-endian (will be 1 on a big-endian platform!)
65536
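The byte-wise interpretations correspond to what Python 3’s int.from_bytes does with the same three bytes. This sketch uses the standard library only and is given as a cross-check:

```python
import sys

raw = bytes.fromhex("000001")

print(int.from_bytes(raw, "big"))          # 1
print(int.from_bytes(raw, "little"))       # 65536
print(int.from_bytes(raw, sys.byteorder))  # 65536 on little-endian machines, 1 on big-endian
```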
5.5 int / intbe / intle / intne
For a two’s complement interpretation as a base-2 signed integer use the int property. If the
first bit of the bitstring is zero then the int and uint interpretations will be equal, otherwise
the int will represent a negative number.
>>> a.int
291
>>> b.int
-1
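The rule about the first bit can be written out directly. In this sketch (plain Python on a string of bits; the name bin_to_int is hypothetical) a leading 1 means that 2 to the power of the length is subtracted from the unsigned value:

```python
def bin_to_int(bits):
    """Two's complement interpretation of a non-empty string of '0'/'1' characters."""
    value = int(bits, 2)         # the unsigned interpretation
    if bits[0] == "1":           # first bit set: the number is negative
        value -= 1 << len(bits)
    return value


print(bin_to_int("0111"))      # 7, same as the unsigned value
print(bin_to_int("111"))       # -1
print(bin_to_int("10000000"))  # -128
```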
For byte-wise big, little and native endian signed integer interpretations use intbe, intle
and intne respectively. These work in the same manner as their unsigned counterparts
described above.
5.6 float / floatbe / floatle / floatne
For a floating point interpretation use the float property. This uses your machine’s underlying
floating point representation and will only work if the bitstring is 32 or 64 bits long.
Different endiannesses are provided via floatle and floatne. Note that as floating point
interpretations are only valid on whole-byte bitstrings there is no difference between the bit-
wise big-endian float and the byte-wise big-endian floatbe.
Note also that standard floating point numbers in Python are stored in 64 bits, so use this size
if you wish to avoid rounding errors.
5.7 bytes
A common need is to retrieve the raw bytes from a bitstring for further processing or for writ-
ing to a file. For this use the bytes interpretation, which returns a bytes object (which is
equivalent to an ordinary str in Python 2.6/2.7).
If the length of the bitstring isn’t a multiple of eight then a ValueError will be raised. This is
because there isn’t an unequivocal representation as bytes. You may prefer to use the method
tobytes as this will pad with between one and seven zero bits up to a byte boundary if
necessary.
>>> open('somefile', 'wb').write(a.tobytes())
>>> open('anotherfile', 'wb').write(('0x0' + a).bytes)
>>> a1 = BitArray(filename='somefile')
>>> a1.hex
'1230'
>>> a2 = BitArray(filename='anotherfile')
>>> a2.hex
'0123'
Note that the tobytes method automatically padded with four zero bits at the end, whereas
for the other example we explicitly padded at the start to byte align before using the bytes
property.
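The two padding choices can be reproduced on a plain string of bits. This Python 3 sketch (the helper to_bytes_padded is hypothetical) pads at the end in the way described for tobytes:

```python
def to_bytes_padded(bits):
    """Pad a string of '0'/'1' characters with zero bits at the end, then pack into bytes."""
    padded = bits + "0" * (-len(bits) % 8)  # 0 to 7 extra zero bits
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


a_bits = "000100100011"                         # the 12 bits of 0x123
print(to_bytes_padded(a_bits).hex())            # 1230: padded at the end
print(to_bytes_padded("0000" + a_bits).hex())   # 0123: padded by hand at the start
```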
5.8 ue
The ue property interprets the bitstring as a single unsigned exponential-Golomb code and
returns an integer. If the bitstring is not exactly one code then an InterpretError is raised
instead. If you instead wish to read the next bits in the stream and interpret them as a code
use the read function with a ue format string. See Exponential-Golomb Codes for a short
explanation of this type of integer representation.
>>> s = BitStream(ue=12)
>>> s.bin
'0001101'
>>> s.append(BitStream(ue=3))
>>> print(s.readlist('2*ue'))
[12, 3]
5.9 se
The se property does much the same as ue and the provisos there all apply. The obvious
difference is that it interprets the bitstring as a signed exponential-Golomb rather than unsigned
- see Exponential-Golomb Codes for more information.
>>> s = BitStream('0x164b')
>>> s.se
InterpretError: BitArray, is not a single exponential-Golomb code.
>>> while s.pos < s.length:
...     print(s.read('se'))
-5
2
0
-1
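The four values above can be reproduced by decoding the 16 bits of 0x164b by hand. The following is a sketch of a standard exponential-Golomb decoder working on a string of bits; read_ue and read_se are hypothetical helpers, not bitstring functions:

```python
def read_ue(bits, pos):
    """Decode one unsigned code starting at pos; return (value, new position)."""
    zeros = 0
    while pos + zeros < len(bits) and bits[pos + zeros] == "0":
        zeros += 1                  # count the leading zero bits
    end = pos + 2 * zeros + 1       # a code with k leading zeros is 2k + 1 bits long
    if end > len(bits):
        raise ValueError("ran out of bits while reading a code")
    return int(bits[pos + zeros:end], 2) - 1, end


def read_se(bits, pos):
    """Decode one signed code: indices 0, 1, 2, 3, 4 mean 0, 1, -1, 2, -2."""
    index, pos = read_ue(bits, pos)
    value = (index + 1) // 2 if index % 2 else -(index // 2)
    return value, pos


bits = format(0x164b, "016b")
pos, values = 0, []
while pos < len(bits):
    value, pos = read_se(bits, pos)
    values.append(value)
print(values)  # [-5, 2, 0, -1]
```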
5.10 uie / sie
A slightly different type, interleaved exponential-Golomb codes are also supported. The prin-
ciples are the same as with ue and se - see Exponential-Golomb Codes for detail of the differ-
ences.
6 Slicing, Dicing and Splicing
Manipulating binary data can be a bit of a challenge in Python. One of its strengths is that you
don’t have to worry about the low level data, but this can make life difficult when what you
care about is precisely the thing that is safely hidden by high level abstractions.
In this section some more methods are described that treat data as a series of bits, rather than
bytes.
6.1 Slicing
Slicing takes three arguments: the first position you want, one past the last position you want
and a multiplicative factor which defaults to 1.
The third argument (the ‘step’) will be described shortly, but most of the time you’ll probably
just need the bit-wise slice, where for example a[10:12] will return a 2-bit bitstring of the
10th and 11th bits in a, and a[32] will return just the 32nd bit.
>>> a = BitArray('0b00011110')
>>> b = a[3:7]
>>> print(a, b)
0x1e 0xf
For single bit indices (as opposed to slices) a boolean is returned; that is True for ‘1’ bits and
False for ‘0’ bits:
>>> a[0]
False
>>> a[4]
True
If you want a single bit as a new bitstring then use a one-bit slice instead:
>>> a[0:1]
BitArray('0b0')
Indexing also works for missing and negative arguments, just as it does for other containers.
>>> a = BitArray('0b00011110')
>>> print(a[:5])     # first 5 bits
0b00011
>>> print(a[3:])     # everything except first 3 bits
0b11110
>>> print(a[-4:])    # final 4 bits
0xe
>>> print(a[:-1])    # everything except last bit
0b0001111
>>> print(a[-6:-4])  # from 6 from the end to 4 from the end
0b01
6.1.1 Stepping in slices
The step parameter (also known as the stride) can be used in slices and has the same meaning
as in the built-in containers:
>>> s = BitArray(16)
>>> s[::2] = [1] * 8
>>> s.bin
'1010101010101010'
>>> del s[8::2]
>>> s.bin
'101010100000'
>>> s[::3].bin
'1010'
Negative steps are also allowed, and should do what you’d expect. So for example s[::-1]
returns a bit-reversed copy of s (which is similar to using s.reverse(), which does the
same operation on s in-place).
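Because the step has the same meaning as for built-in containers, the same sequence of operations can be tried on an ordinary Python list of 0s and 1s. This is purely an analogy using a list, not a bitstring:

```python
s = [0] * 16
s[::2] = [1] * 8                  # set every other element, starting with the first
print("".join(map(str, s)))       # 1010101010101010

del s[8::2]                       # delete every other element from index 8 onwards
print("".join(map(str, s)))       # 101010100000
print("".join(map(str, s[::3])))  # 1010

print(s[::-1] == list(reversed(s)))  # True: a negative step reverses
```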
6.2 Joining
To join together a couple of bitstring objects use the + or += operators, or the append and
prepend methods.
# Six ways of creating the same BitArray:
a1 = BitArray(bin='000') + BitArray(hex='f')
a2 = BitArray('0b000') + BitArray('0xf')
a3 = BitArray('0b000') + '0xf'
a4 = BitArray('0b000')
a4.append('0xf')
a5 = BitArray('0xf')
a5.prepend('0b000')
a6 = BitArray('0b000')
a6 += '0xf'
Note that the final three methods all modify a bitstring, and so will only work with BitArray
objects, not the immutable Bits objects.
If you want to join a large number of bitstrings then the method join can be used to improve
efficiency and readability. It works like the ordinary string join function in that it uses the
bitstring that it is called on as a separator when joining the list of bitstring objects it is given. If
you don’t want a separator then it can be called on an empty bitstring.
bslist = [BitArray(uint=n, length=12) for n in range(1000)]
s = BitArray('0b1111').join(bslist)
6.3 Truncating, inserting, deleting and overwriting
The functions in this section all modify the bitstring that they operate on and so are not available
for Bits objects.
6.3.1 Deleting and truncating
To delete bits just use del as you would with any other container:
>>> a = BitArray('0b00011000')
>>> del a[3:5]  # remove 2 bits at pos 3
>>> a.bin
'000000'
>>> b = BitArray('0x112233445566')
>>> del b[24:40]
>>> b.hex
'11223366'
You can of course use this to truncate the start or end bits just as easily:
>>> a = BitArray('0x001122')
>>> del a[-8:]  # remove last 8 bits
>>> del a[:8]   # remove first 8 bits
>>> a == '0x11'
True
6.3.2 insert
As you might expect, insert takes one BitArray and inserts it into another. A bit position
must be specified for BitArray objects, but for BitStream objects the current pos is used if
no position is given.
>>> a = BitArray('0x00112233')
>>> a.insert('0xffff', 16)
>>> a.hex
'0011ffff2233'
6.3.3 overwrite
overwrite does much the same as insert, but predictably the BitArray object’s data is
overwritten by the new data.
>>> a = BitStream('0x00112233')
>>> a.pos = 4
>>> a.overwrite('0b1111')  # Uses current pos as default
>>> a.hex
'0f112233'
6.4 The bitstring as a list
If you treat a bitstring object as a list whose elements are all either ‘1’ or ‘0’ then you won’t
go far wrong. The table below gives some of the equivalent ways of using methods and the
standard slice notation.
Using functions          Using slices
s.insert(bs, pos)        s[pos:pos] = bs
s.overwrite(bs, pos)     s[pos:pos + bs.len] = bs
s.append(bs)             s[s.len:s.len] = bs
s.prepend(bs)            s[0:0] = bs
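The slice forms in the table behave exactly like slice assignment on an ordinary Python list, which makes them easy to experiment with. A small sketch using lists of 0s and 1s in place of bitstrings (so len() stands in for the len property):

```python
bs = [1, 1]
pos = 1

inserted = [0, 0, 0, 0]
inserted[pos:pos] = bs                 # like s.insert(bs, pos)
print(inserted)                        # [0, 1, 1, 0, 0, 0]

overwritten = [0, 0, 0, 0]
overwritten[pos:pos + len(bs)] = bs    # like s.overwrite(bs, pos)
print(overwritten)                     # [0, 1, 1, 0]

appended = [0, 0, 0, 0]
appended[len(appended):len(appended)] = bs  # like s.append(bs)
print(appended)                        # [0, 0, 0, 0, 1, 1]

prepended = [0, 0, 0, 0]
prepended[0:0] = bs                    # like s.prepend(bs)
print(prepended)                       # [1, 1, 0, 0, 0, 0]
```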
6.5 Splitting
6.5.1 split
Sometimes it can be very useful to use a delimiter to split a bitstring into sections. The split
method returns a generator for the sections.
>>> a = BitArray('0x4700004711472222')
>>> for s in a.split('0x47', bytealigned=True):
...     print(s.hex)
470000
4711
472222
Note that the first item returned is always the bitstring before the first occurrence of the
delimiter, even if it is empty.
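The shape of the result (an initial section before the first delimiter, then sections that each start with the delimiter) can be sketched on a bytes object. The generator below is a hypothetical byte-aligned analogue written for illustration; it is not bitstring’s implementation:

```python
def split_keep_delimiter(data, delimiter):
    """Yield sections of data, each later section starting with the delimiter."""
    start = 0
    pos = data.find(delimiter)
    while pos != -1:
        yield data[start:pos]  # the first item yielded may be empty
        start = pos
        pos = data.find(delimiter, pos + len(delimiter))
    yield data[start:]


data = bytes.fromhex("4700004711472222")
sections = [section.hex() for section in split_keep_delimiter(data, b"\x47")]
print(sections)  # ['', '470000', '4711', '472222']
```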
6.5.2 cut
If you just want to split into equal parts then use the cut method. This takes a number of bits
as its first argument and returns a generator for chunks of that size.
>>> a = BitArray('0x47001243')
>>> for byte in a.cut(8):
...     print(byte.hex)
47
00
12
43
7 Reading, Parsing and Unpacking
7.1 Reading and parsing
The BitStream and ConstBitStream classes contain a number of methods for reading the
bitstring as if it were a file or stream. Depending on how it was constructed the bitstream might
actually be contained in a file rather than stored in memory, but these methods work for either
case.
In order to behave like a file or stream, every bitstream has a property pos which is the current
position from which reads occur. pos can range from zero (its value on construction) to the
length of the bitstream, a position from which all reads will fail as it is past the last bit. Note
that the pos property isn’t considered a part of the bitstream’s identity; this allows it to vary
for immutable ConstBitStream objects and means that it doesn’t affect equality or hash
values.
The property bytepos is also available, and is useful if you are only dealing with byte
data and don’t want to always have to divide the bit position by eight. Note that if you try
to use bytepos and the bitstring isn’t byte aligned (i.e. pos isn’t a multiple of 8) then a
ByteAlignError exception will be raised.
7.1.1 read / readlist
For simple reading of a number of bits you can use read with an integer argument. A new
bitstring object gets returned, which can be interpreted using one of its properties or used for
further reads. The following example does some simple parsing of an MPEG-1 video stream
(the stream is provided in the test directory if you downloaded the source archive).
>>> s = ConstBitStream(filename='test/test.m1v')
>>> print(s.pos)
0
>>> start_code = s.read(32).hex
>>> width = s.read(12).uint
>>> height = s.read(12).uint
>>> print(start_code, width, height, s.pos)
000001b3 352 288 56
>>> s.pos += 37
>>> flags = s.read(2)
>>> constrained_parameters_flag = flags.read(1)
>>> load_intra_quantiser_matrix = flags.read(1)
>>> print(s.pos, flags.pos)
95 2
If you want to read multiple items in one go you can use readlist. This can take an iterable
of bit lengths and return a list of bitstring objects. So for example instead of writing:
a=s.read(32)
b=s.read(8)
c=s.read(24)
you can equivalently use just:
a,b,c=s.readlist([32,8,24])
7.1.2 Reading using format strings
The read / readlist methods can also take a format string similar to that used in the auto
initialiser. Only one token should be provided to read and a single value is returned. To read
multiple tokens use readlist, which unsurprisingly returns a list.
The format string consists of comma separated tokens that describe how to interpret the next
bits in the bitstring. The tokens are:
int:n n bits as a signed integer.
uint:n n bits as an unsigned integer.
intbe:n n bits as a byte-wise big-endian signed integer.
uintbe:n n bits as a byte-wise big-endian unsigned integer.
intle:n n bits as a byte-wise little-endian signed integer.
uintle:n n bits as a byte-wise little-endian unsigned integer.
intne:n n bits as a byte-wise native-endian signed integer.
uintne:n n bits as a byte-wise native-endian unsigned integer.
float:n n bits as a big-endian floating point number (same as floatbe).
floatbe:n n bits as a big-endian floating point number (same as float).
floatle:n n bits as a little-endian floating point number.
floatne:n n bits as a native-endian floating point number.
hex:n n bits as a hexadecimal string.
oct:n n bits as an octal string.
bin:n n bits as a binary string.
bits:n n bits as a new bitstring.
bytes:n n bytes as a bytes object.
ue next bits as an unsigned exponential-Golomb code.
se next bits as a signed exponential-Golomb code.
uie next bits as an interleaved unsigned exponential-Golomb code.
sie next bits as an interleaved signed exponential-Golomb code.
bool next bit as a boolean (True or False).
So in the earlier example we could have written:
start_code = s.read('hex:32')
width = s.read('uint:12')
height = s.read('uint:12')
and we also could have combined the three reads as:
start_code, width, height = s.readlist('hex:32, 12, 12')
where here we are also taking advantage of the default uint interpretation for the second and
third tokens.
You are allowed to use one ‘stretchy’ token in a readlist. This is a token without a length
specified which will stretch to encompass as many bits as possible. This is often useful
when you just want to assign something to ‘the rest’ of the bitstring:
a, b, everything_else = s.readlist('intle:16, intle:24, bits')
In this example the bits token will consist of everything left after the first two tokens are read,
and could be empty.
It is an error to use more than one stretchy token, or to use a ue, se, uie or sie token after
a stretchy token (the reason you can’t use exponential-Golomb codes after a stretchy token is
that the codes can only be read forwards; that is you can’t ask “if this code ends here, where
did it begin?” as there could be many possible answers).
7.1.3 Peeking
In addition to the read methods there are matching peek methods. These are identical to the
read except that they do not advance the position in the bitstring to after the read elements.
s = ConstBitStream('0x4732aa34')
if s.peek(8) == '0x47':
    t = s.read(16)  # t is first 2 bytes '0x4732'
else:
    s.find('0x47')
7.2 Unpacking
The unpack method works in a very similar way to readlist. The major difference is that
it interprets the whole bitstring from the start, and takes no account of the current pos. It’s a
natural complement of the pack function.
s = pack('uint:10, hex, int:13, 0b11', 130, '3d', -23)
a, b, c, d = s.unpack('uint:10, hex, int:13, bin:2')
7.3 Seeking
The properties pos and bytepos are available for getting and setting the position, which is
zero on creation of the bitstring.
Note that you can only use bytepos if the position is byte aligned, i.e. the bit position is a
multiple of 8. Otherwise a ByteAlignError exception is raised.
For example:
>>> s = BitStream('0x123456')
>>> s.pos
0
>>> s.bytepos += 2
>>> s.pos  # note pos versus bytepos
16
>>> s.pos += 4
>>> print(s.read('bin:4'))  # the final nibble '0x6'
0110
7.4 Finding and replacing
7.4.1 find / rfind
To search for a sub-string use the find method. If the find succeeds it will set the position
to the start of the next occurrence of the searched for string and return a tuple containing that
position, otherwise it will return an empty tuple. By default the sub-string will be found at any
bit position - to allow it to only be found on byte boundaries set bytealigned=True.
>>> s = ConstBitStream('0x00123400001234')
>>> found = s.find('0x1234', bytealigned=True)
>>> print(found, s.bytepos)
(8,) 1
>>> found = s.find('0xff', bytealigned=True)
>>> print(found, s.bytepos)
() 1
The reason for returning the bit position in a tuple is so that the return value is True in a
boolean sense if the sub-string is found, and False if it is not (if just the bit position were
returned there would be a problem with finding at position 0). The effect is that you can use if
s.find(...): and have it behave as you’d expect.
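The point about position 0 is easy to check in plain Python. The sketch below uses a hypothetical find_bits helper on ordinary strings rather than the module's own find; it only illustrates why a one-element tuple is a safer return value than a bare integer.

```python
def find_bits(haystack, needle):
    """Return (position,) if needle occurs in haystack, else ()."""
    index = haystack.find(needle)     # str.find returns -1 when absent
    return (index,) if index != -1 else ()


hit_at_zero = find_bits('1100', '11')   # found at position 0
miss = find_bits('1100', '111')         # not found

# A bare position of 0 is falsy, but the tuple (0,) is truthy.
print(bool(0), bool(hit_at_zero), bool(miss))
```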
rfind does much the same as find, except that it will find the last occurrence, rather than
the first.
>>> t = BitArray('0x0f231443e8')
>>> found = t.rfind('0xf')  # Search all bit positions in reverse
>>> print(found)
(31,)  # Found within the 0x3e near the end
For all of these finding functions you can optionally specify a start and / or end to narrow
the search range. Note though that because it’s searching backwards rfind will start at end
and end at start (so you always need start < end).
7.4.2 findall
To find all occurrences of a bitstring inside another (even overlapping ones), use findall.
This returns a generator for the bit positions of the found strings.
>>> r = BitArray('0b011101011001')
>>> ones = r.findall([1])
>>> print(list(ones))
[1, 2, 3, 5, 7, 8, 11]
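What ‘even overlapping ones’ means can be sketched with an ordinary generator over a '0'/'1' string. The helper name findall_bits is hypothetical and this is not how the module implements findall; it only shows the search resuming one bit after each match.

```python
def findall_bits(haystack, needle):
    """Yield every index at which needle occurs, including overlaps."""
    start = haystack.find(needle)
    while start != -1:
        yield start
        # Resume one bit later (not len(needle) later) so overlaps are found.
        start = haystack.find(needle, start + 1)


single_ones = list(findall_bits('011101011001', '1'))
overlapping = list(findall_bits('1111', '11'))
print(single_ones)
print(overlapping)
```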
7.4.3 replace
To replace all occurrences of one BitArray with another use replace. The replacements
are done in-place, and the number of replacements made is returned. This method changes the
contents of the bitstring and so isn’t available for the Bits or ConstBitStream classes.
>>> s = BitArray('0b110000110110')
>>> s.replace('0b110', '0b1111')
3  # The number of replacements made
>>> s.bin
'111100011111111'
7.5 Working with byte aligned data
The emphasis with the bitstring module is always towards not worrying if things are a whole
number of bytes long or are aligned on byte boundaries. Internally the module has to worry
about this quite a lot, but the user shouldn’t have to care. To this end methods such as find,
findall, split and replace by default aren’t concerned with looking for things only on
byte boundaries and provide a parameter bytealigned which can be set to True to change
this behaviour.
This works fine, but it’s not uncommon to be working only with whole-byte data and all the
bytealigned=True can get a bit repetitive. To solve this it is possible to change the default
throughout the module by setting bitstring.bytealigned. For example:
>>> s = BitArray('0xabbb')
>>> s.find('0xbb')  # look for the byte 0xbb
(4,)  # found, but not on byte boundary
>>> s.find('0xbb', bytealigned=True)  # try again...
(8,)  # found on a byte boundary this time
>>> bitstring.bytealigned = True  # change the default behaviour
>>> s.find('0xbb')
(8,)  # now only finds byte aligned
CHAPTER
EIGHT
MISCELLANY
8.1 Other Functions
8.1.1 bytealign
bytealign advances between zero and seven bits to make the pos a multiple of eight. It
returns the number of bits advanced.
>>> a = BitStream('0x11223344')
>>> a.pos = 1
>>> skipped = a.bytealign()
>>> print(skipped, a.pos)
7 8
>>> skipped = a.bytealign()
>>> print(skipped, a.pos)
0 8
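The number of bits skipped is simple modular arithmetic. A minimal sketch, with bits_to_next_byte as a hypothetical helper name:

```python
def bits_to_next_byte(pos):
    """Number of bits (0 to 7) to skip so that pos becomes a multiple of 8."""
    return -pos % 8


for pos in (0, 1, 8, 13):
    skipped = bits_to_next_byte(pos)
    print(pos, skipped, pos + skipped)
```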
8.1.2 reverse
This simply reverses the bits of the BitArray in place. You can optionally specify a range of
bits to reverse.
>>> a = BitArray('0b000001101')
>>> a.reverse()
>>> a.bin
'101100000'
>>> a.reverse(0, 4)
>>> a.bin
'110100000'
8.1.3 tobytes
Returns the byte data contained in the bitstring as a bytes object (equivalent to a str if you’re
using Python 2.6). This differs from using the plain bytes property in that if the bitstring isn’t
a whole number of bytes long then it will be made so by appending up to seven zero bits.
>>> BitArray('0b1').tobytes()
'\x80'
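The padding rule can be sketched in plain Python on a '0'/'1' string. The helper bits_to_bytes is hypothetical and is not the module's implementation; it just rounds the length up to a whole number of bytes and fills the gap with zero bits.

```python
def bits_to_bytes(bits):
    """Pack a '0'/'1' string into bytes, zero-padding the final byte."""
    padded_len = -(-len(bits) // 8) * 8       # round up to a multiple of 8
    padded = bits.ljust(padded_len, '0')      # append up to seven zero bits
    return bytes(int(padded[i:i + 8], 2) for i in range(0, padded_len, 8))


one_bit = bits_to_bytes('1')
ten_bits = bits_to_bytes('0110100001')
print(one_bit, ten_bits)
```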
8.1.4 tofile
Writes the byte data contained in the bitstring to a file. The file should have been opened in a
binary write mode, for example:
>>> f = open('newfile', 'wb')
>>> BitArray('0xffee3241fed').tofile(f)
In exactly the same manner as with tobytes, up to seven zero bits will be appended to make
the file a whole number of bytes long.
8.1.5 startswith / endswith
These act like the same named functions on strings, that is they return True if the bitstring
starts or ends with the parameter given. Optionally you can specify a range of bits to use.
>>> s = BitArray('0xef133')
>>> s.startswith('0b111011')
True
>>> s.endswith('0x4')
False
8.1.6 ror / rol
To rotate the bits in a BitArray use ror and rol for right and left rotations respectively.
The changes are done in-place.
>>> s = BitArray('0x00001')
>>> s.rol(6)
>>> s.hex
'00040'
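A rotation moves bits off one end and back on at the other, so nothing is lost. The following plain-Python sketch models this on a non-empty '0'/'1' string; rol and ror here are hypothetical free functions that return new strings, unlike the in-place methods described above.

```python
def rol(bits, n):
    """Rotate a non-empty '0'/'1' string left by n places."""
    n %= len(bits)
    return bits[n:] + bits[:n]


def ror(bits, n):
    """Rotate a non-empty '0'/'1' string right by n places."""
    return rol(bits, -n)


original = '0' * 19 + '1'                  # the 20 bits of 0x00001
rotated = rol(original, 6)
print(format(int(rotated, 2), '05x'))      # hex digits of the rotated bits
print(ror(rotated, 6) == original)         # rotating back restores the input
```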
8.2 Special Methods
A few of the special methods have already been covered, for example __add__ and
__iadd__ (the + and += operators) and __getitem__ and __setitem__ (reading and
setting slices via []). Here are some more:
8.2.1 __len__
This implements the len function and returns the length of the bitstring in bits.
It’s recommended that you use the len property instead of the function as a limitation of
Python means that the function will raise an OverflowError if the bitstring has more than
sys.maxsize elements (that’s typically 256MB of data with 32-bit Python).
There’s not much more to say really, except to emphasise that it is always in bits and never
bytes.
>>> len(BitArray('0x00'))
8
8.2.2 __str__ / __repr__
These get called when you try to print a bitstring. As bitstrings have no preferred interpretation
the form printed might not be what you want - if not then use the hex, bin, int etc. properties.
The main use here is in interactive sessions when you just want a quick look at the bitstring.
The __repr__ tries to give a code fragment which if evaluated would give an equal bitstring.
The form used for the bitstring is generally the one which gives it the shortest representation.
If the resulting string is too long then it will be truncated with ... - this prevents very long
bitstrings from tying up your interactive session while they print themselves.
>>> a = BitArray('0b1111 111')
>>> print(a)
0b1111111
>>> a
BitArray('0b1111111')
>>> a += '0b1'
>>> print(a)
0xff
>>> print(a.bin)
11111111
8.2.3 __eq__ / __ne__
The equality of two bitstring objects is determined by their binary representations being equal.
If you have a different criterion you wish to use then code it explicitly, for example a.int ==
b.int could be true even if a == b wasn’t (as they could be different lengths).
>>> BitArray('0b0010') == '0x2'
True
>>> BitArray('0x2') != '0o2'
True
8.2.4 __invert__
To get a bit-inverted copy of a bitstring use the ~ operator:
>>> a = BitArray('0b0001100111')
>>> print(a)
0b0001100111
>>> print(~a)
0b1110011000
>>> ~~a == a
True
8.2.5 __lshift__ / __rshift__ / __ilshift__ / __irshift__
Bitwise shifts can be achieved using <<, >>, <<= and >>=. Bits shifted off the left or right
are replaced with zero bits. If you need special behaviour, such as keeping the sign of two’s
complement integers then do the shift on the property instead, for example use a.int >>= 2.
>>> a = BitArray('0b10011001')
>>> b = a << 2
>>> print(b)
0b01100100
>>> a >>= 2
>>> print(a)
0b00100110
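The difference between a zero-filling shift and a sign-preserving shift can be sketched in plain Python. Both helpers below are hypothetical and work on non-empty '0'/'1' strings; the second relies on the fact that Python's >> on a negative int keeps the sign.

```python
def logical_shift_right(bits, n):
    """Shift right, filling with zeros on the left."""
    n = min(n, len(bits))
    return '0' * n + bits[:len(bits) - n]


def arithmetic_shift_right(bits, n):
    """Shift the two's complement value right, which preserves the sign."""
    width = len(bits)
    value = int(bits, 2)
    if bits[0] == '1':                # negative in two's complement
        value -= 1 << width
    value >>= n                       # Python's >> on int keeps the sign
    return format(value & ((1 << width) - 1), '0{}b'.format(width))


zero_filled = logical_shift_right('10011001', 2)
sign_kept = arithmetic_shift_right('10011001', 2)
print(zero_filled, sign_kept)
```

Here '10011001' is -103 as a signed 8-bit integer, so the sign-preserving shift gives -26 rather than a small positive number.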
8.2.6 __mul__ / __imul__ / __rmul__
Multiplication of a bitstring by an integer means the same as it does for ordinary strings:
concatenation of multiple copies of the bitstring.
>>> a = BitArray('0b10') * 8
>>> print(a.bin)
1010101010101010
8.2.7 __copy__
This allows the bitstring to be copied via the copy module.
>>> import copy
>>> a = Bits('0x4223fbddec2231')
>>> b = copy.copy(a)
>>> b == a
True
>>> b is a
False
It’s not terribly exciting, and isn’t the only method of making a copy. Using b =
BitArray(a) is another option, but b = a[:] may be more familiar to some.
8.2.8 __and__ / __or__ / __xor__ / __iand__ / __ior__ / __ixor__
Bit-wise AND, OR and XOR are provided for bitstring objects of equal length only (otherwise
a ValueError is raised).
>>> a = BitArray('0b00001111')
>>> b = BitArray('0b01010101')
>>> print((a & b).bin)
00000101
>>> print((a | b).bin)
01011111
>>> print((a ^ b).bin)
01011010
>>> b &= '0x1f'
>>> print(b.bin)
00010101
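The equal-length rule and the three operations can be modelled with Python integers. This is a sketch only: bitwise is a hypothetical helper on non-empty '0'/'1' strings, not the module's implementation.

```python
import operator


def bitwise(op, x, y):
    """Apply op (e.g. operator.and_) to two equal-length '0'/'1' strings."""
    if len(x) != len(y):
        raise ValueError('bitstrings must have the same length')
    result = op(int(x, 2), int(y, 2))
    return format(result, '0{}b'.format(len(x)))   # keep leading zeros


a, b = '00001111', '01010101'
for name, op in (('and', operator.and_), ('or', operator.or_), ('xor', operator.xor)):
    print(name, bitwise(op, a, b))
```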
Part II
Reference
CHAPTER
NINE
QUICK REFERENCE
This section lists the bitstring module’s classes together with all their methods and attributes.
The next section goes into full detail with examples.
9.1 Bits
Bits(object)
A Bits is the most basic class. It is immutable, so once created its value cannot change. It is
a base class for all the other classes in the bitstring module.
9.1.1 Methods
all – Check if all specified bits are set to 1 or 0.
any – Check if any of specified bits are set to 1 or 0.
count – Count the number of bits set to 1 or 0.
cut – Create generator of constant sized chunks.
endswith – Return whether the bitstring ends with a sub-bitstring.
find – Find a sub-bitstring in the current bitstring.
findall – Find all occurrences of a sub-bitstring in the current bitstring.
join – Join bitstrings together using current bitstring.
rfind – Seek backwards to find a sub-bitstring.
split – Create generator of chunks split by a delimiter.
startswith – Return whether the bitstring starts with a sub-bitstring.
tobytes – Return bitstring as bytes, padding if needed.
tofile – Write bitstring to file, padding if needed.
unpack – Interpret bits using format string.
9.1.2 Special methods
Also available are the operators [], ==, !=, +, *, ~, <<, >>, &, | and ^.
9.1.3 Properties
bin – The bitstring as a binary string.
bool – For single bit bitstrings, interpret as True or False.
bytes – The bitstring as a bytes object.
float – Interpret as a floating point number.
floatbe – Interpret as a big-endian floating point number.
floatle – Interpret as a little-endian floating point number.
floatne – Interpret as a native-endian floating point number.
hex – The bitstring as a hexadecimal string.
int – Interpret as a two’s complement signed integer.
intbe – Interpret as a big-endian signed integer.
intle – Interpret as a little-endian signed integer.
intne – Interpret as a native-endian signed integer.
len – Length of the bitstring in bits.
oct – The bitstring as an octal string.
se – Interpret as a signed exponential-Golomb code.
ue – Interpret as an unsigned exponential-Golomb code.
sie – Interpret as a signed interleaved exponential-Golomb code.
uie – Interpret as an unsigned interleaved exponential-Golomb code.
uint – Interpret as an unsigned base-2 integer.
uintbe – Interpret as a big-endian unsigned integer.
uintle – Interpret as a little-endian unsigned integer.
uintne – Interpret as a native-endian unsigned integer.
9.2 BitArray
BitArray(Bits)
This class adds mutating methods to Bits.
9.2.1 Additional methods
append – Append a bitstring.
byteswap – Change byte endianness in-place.
insert – Insert a bitstring.
invert – Flip bit(s) between one and zero.
overwrite – Overwrite a section with a new bitstring.
prepend – Prepend a bitstring.
replace – Replace occurrences of one bitstring with another.
reverse – Reverse bits in-place.
rol – Rotate bits to the left.
ror – Rotate bits to the right.
set – Set bit(s) to 1 or 0.
9.2.2 Additional special methods
Mutating operators are available: [], <<=, >>=, *=, &=, |= and ^=.
9.2.3 Attributes
The same as Bits, except that they are all (with the exception of len) writable as
well as readable.
9.3 ConstBitStream
ConstBitStream(Bits)
This class, previously known as just Bits (which is an alias for backward-compatibility), adds
a bit position and methods to read and navigate in the bitstream.
9.3.1 Additional methods
bytealign – Align to next byte boundary.
peek – Peek at and interpret next bits as a single item.
peeklist – Peek at and interpret next bits as a list of items.
read – Read and interpret next bits as a single item.
readlist – Read and interpret next bits as a list of items.
readto – Read up to and including next occurrence of a bitstring.
9.3.2 Additional attributes
bytepos – The current byte position in the bitstring.
pos – The current bit position in the bitstring.
9.4 BitStream
BitStream(BitArray, ConstBitStream)
This class, also known as BitString, contains all of the ‘stream’ elements of
ConstBitStream and adds all of the mutating methods of BitArray.
CHAPTER
TEN
THE BITSTRING MODULE
The bitstring module provides four classes, Bits, BitArray, ConstBitStream and
BitStream. Bits is the simplest, and represents an immutable sequence of bits, while
BitArray adds various methods that modify the contents (these classes are intended to
loosely mirror bytes and bytearray in Python 3). The ‘Stream’ classes have additional
methods to treat the bits as a file or stream.
If you need to change the contents of a bitstring after creation then you must use either the
BitArray or BitStream classes. If you need to use bitstrings as keys in a dictionary or
members of a set then you must use either a Bits or a ConstBitStream. In this section
the generic term ‘bitstring’ is used to refer to an object of any of these classes.
Note that for the bitstream classes the bit position within the bitstream (the position from which
reads occur) can change without affecting the equality operation. This means that the pos and
bytepos properties can change even for a ConstBitStream object.
The public methods, special methods and properties of both classes are detailed in this section.
10.1 The auto initialiser
Note that in places where a bitstring can be used as a parameter, any other valid input to the
auto initialiser can also be used. This means that the parameter can also be a format string
which consists of tokens:
Starting with hex=, or simply starting with 0x implies hexadecimal. e.g. 0x013ff,
hex=013ff
Starting with oct=, or simply starting with 0o implies octal. e.g. 0o755, oct=755
Starting with bin=, or simply starting with 0b implies binary. e.g. 0b0011010,
bin=0011010
Starting with int: or uint: followed by a length in bits and = gives base-2 integers.
e.g. uint:8=255, int:4=-7
To get big, little and native-endian whole-byte integers append be, le or ne respectively
to the uint or int identifier. e.g. uintle:32=1, intne:16=-23
For floating point numbers use float: followed by the length in bits and = and the
number. The default is big-endian, but you can also append be, le or ne as with integers.
e.g. float:64=0.2, floatle:32=-0.3e12
Starting with ue=, uie=, se= or sie= implies an exponential-Golomb coded integer.
e.g. ue=12, sie=-4
Multiple tokens can be joined by separating them with commas, so for example se=4, 0b1,
se=-1 represents the concatenation of three elements.
Parentheses and multiplicative factors can also be used, for example 2*(0b10, 0xf) is
equivalent to 0b10, 0xf, 0b10, 0xf. The multiplying factor must come before the thing
it is being used to repeat.
The auto parameter also accepts other types:
A list or tuple, whose elements will be evaluated as booleans (imagine calling bool()
on each item) and the bits set to 1 for True items and 0 for False items.
A positive integer, used to create a bitstring of that many zero bits.
A file object, presumably opened in read-binary mode, from which the bitstring will be
formed.
A bytearray object.
In Python 3 only, a bytes object. Note this won’t work for Python 2 as bytes is just a
synonym for str.
10.2 Compact format strings
For the read, unpack, peek methods and pack function you can use compact format strings
similar to those used in the struct and array modules. These start with an endian identifier:
> for big-endian, < for little-endian or @ for native-endian. This must be followed by at least
one of these codes:
Code  Interpretation
b     8 bit signed integer
B     8 bit unsigned integer
h     16 bit signed integer
H     16 bit unsigned integer
l     32 bit signed integer
L     32 bit unsigned integer
q     64 bit signed integer
Q     64 bit unsigned integer
f     32 bit floating point number
d     64 bit floating point number
For more detail see Compact format strings.
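Because the codes are borrowed from the struct module, the standard library can be used to cross-check the widths in the table. The sketch below uses only struct and only the > prefix, for which struct uses fixed standard sizes; it says nothing about how the bitstring module handles @.

```python
import struct

codes = 'bBhHlLqQfd'
# With an explicit byte-order prefix, struct uses its standard sizes.
widths = {code: struct.calcsize('>' + code) * 8 for code in codes}
print(widths)

# The same prefix-plus-codes layout, packed by struct: a big-endian
# signed 16-bit integer followed by an unsigned 8-bit integer.
packed = struct.pack('>hB', -2, 255)
print(packed)
```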
10.3 Class properties
Bitstrings use a wide range of properties for getting and setting different interpretations on the
binary data, as well as accessing bit lengths and positions. For the mutable BitStream and
BitArray objects the properties are all read and write (with the exception of the length),
whereas for immutable objects the only write enabled properties are for the position in the
bitstream (pos/bitpos and bytepos).
CHAPTER
ELEVEN
THE BITS CLASS
class bitstring.Bits([auto, length, offset, **kwargs])
Creates a new bitstring. You must specify either no initialiser, just an auto value,
or one of the keyword arguments bytes, bin, hex, oct, uint, int, uintbe,
intbe, uintle, intle, uintne, intne, se, ue, sie, uie, float, floatbe,
floatle, floatne, bool or filename. If no initialiser is given then a zeroed
bitstring of length bits is created.
The initialiser for the Bits class is precisely the same as for BitArray, BitStream
and ConstBitStream.
offset is available when using the bytes or filename initialisers. It gives a number
of bits to ignore at the start of the bitstring.
Specifying length is mandatory when using the various integer initialisers. It must be
large enough that a bitstring can contain the integer in length bits. It must also be
specified for the float initialisers (the only valid values are 32 and 64). It is optional for
the bytes and filename initialisers and can be used to truncate data from the end of
the input value.
>>> s1 = Bits(hex='0x934')
>>> s2 = Bits(oct='0o4464')
>>> s3 = Bits(bin='0b100100110100')
>>> s4 = Bits(int=-1740, length=12)
>>> s5 = Bits(uint=2356, length=12)
>>> s6 = Bits(bytes=b'\x93@', length=12)
>>> s1 == s2 == s3 == s4 == s5 == s6
True
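Why hex 934, octal 4464, the signed integer -1740, the unsigned integer 2356 and the first 12 bits of b'\x93@' all describe the same bit pattern can be cross-checked with ordinary Python integers. This sketch does not use the bitstring module; to_bits is a hypothetical helper.

```python
WIDTH = 12


def to_bits(value, width=WIDTH):
    """Binary digits of value reduced modulo 2**width (two's complement)."""
    return format(value & ((1 << width) - 1), '0{}b'.format(width))


from_hex = to_bits(int('934', 16))
from_oct = to_bits(int('4464', 8))
from_uint = to_bits(2356)
from_int = to_bits(-1740)                 # 4096 - 1740 == 2356
# First 12 bits of the two bytes 0x93 0x40, i.e. b'\x93@'.
from_bytes = format(int.from_bytes(b'\x93@', 'big'), '016b')[:WIDTH]

print(from_hex, from_oct, from_uint, from_int, from_bytes)
```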
For information on the use of auto see The auto initialiser.
>>> s = Bits('uint:12=32, 0b110')
>>> t = Bits('0o755, ue=12, int:3=-1')
all(value[, pos])
Returns True if all of the specified bits are set to value, otherwise returns
False.
If value is True then 1 bits are checked for, otherwise 0 bits are checked for.
pos should be an iterable of bit positions. Negative numbers are treated in the same
way as slice indices and it will raise an IndexError if pos < -s.len or pos
> s.len. It defaults to the whole bitstring.
>>> s = Bits('int:15=-1')
>>> s.all(True, [3, 4, 12, 13])
True
>>> s.all(1)
True
any(value[, pos])
Returns True if any of the specified bits are set to value, otherwise returns False.
If value is True then 1 bits are checked for, otherwise 0 bits are checked for.
pos should be an iterable of bit positions. Negative numbers are treated in the same
way as slice indices and it will raise an IndexError if pos < -s.len or pos
> s.len. It defaults to the whole bitstring.
>>> s = Bits('0b11011100')
>>> s.any(False, range(6))
True
>>> s.any(1)
True
count(value)
Returns the number of bits set to value.
value can be True or False or anything that can be cast to a bool, so you could
equally use 1 or 0.
>>> s=BitString(1000000)
>>> s.set(1, [4,44,444444])
>>> s.count(1)
3
>>> s.count(False)
999997
cut(bits[, start, end, count])
Returns a generator for slices of the bitstring of length bits.
At most count items are returned and the range is given by the slice [start:end],
which defaults to the whole bitstring.
>>> s = BitString('0x1234')
>>> for nibble in s.cut(4):
...     s.prepend(nibble)
>>> print(s)
0x43211234
endswith(bs[, start, end])
Returns True if the bitstring ends with the sub-string bs, otherwise returns False.
A slice can be given using the start and end bit positions and defaults to the whole
bitstring.
>>> s = Bits('0x35e22')
>>> s.endswith('0b10, 0x22')
True
>>> s.endswith('0x22', start=13)
False
find(bs[, start, end, bytealigned])
Searches for bs in the current bitstring and sets pos to the start of bs and returns it
in a tuple if found, otherwise it returns an empty tuple.
The reason for returning the bit position in a tuple is so that it evaluates as True even
if the bit position is zero. This allows constructs such as if s.find(’0xb3’):
to work as expected.
If bytealigned is True then it will look for bs only at byte aligned positions (which
is generally much faster than searching for it in every possible bit position). start
and end give the search range and default to the whole bitstring.
>>> s = Bits('0x0023122')
>>> s.find('0b000100', bytealigned=True)
(16,)
findall(bs[, start, end, count, bytealigned])
Searches for all occurrences of bs (even overlapping ones) and returns a generator
of their bit positions.
If bytealigned is True then bs will only be looked for at byte aligned positions.
start and end optionally define a search range and default to the whole bitstring.
The count parameter limits the number of items that will be found - the default is to
find all occurrences.
>>> s = Bits('0xab220101') * 5
>>> list(s.findall('0x22', bytealigned=True))
[8, 40, 72, 104, 136]
join(sequence)
Returns the concatenation of the bitstrings in the iterable sequence joined with
self as a separator.
>>> s = Bits().join(['0x0001ee', 'uint:24=13', '0b0111'])
>>> print(s)
0x0001ee00000d7
>>> s = Bits('0b1').join(['0b0'] * 5)
>>> print(s.bin)
010101010
rfind(bs[, start, end, bytealigned])
Searches backwards for bs in the current bitstring and sets pos to the start of bs
and returns it in a tuple if found, otherwise it returns an empty tuple.
The reason for returning the bit position in a tuple is so that it evaluates
as True even if the bit position is zero. This allows constructs such as if
s.rfind('0xb3'): to work as expected.
If bytealigned is True then it will look for bs only at byte aligned positions. start
and end give the search range and default to 0 and len respectively.
Note that as it’s a reverse search it will start at end and finish at start.
>>> s = Bits('0o031544')
>>> s.rfind('0b100')
(15,)
>>> s.rfind('0b100', end=17)
(12,)
split(delimiter[, start, end, count, bytealigned])
Splits the bitstring into sections that start with delimiter. Returns a generator for
bitstring objects.
The first item generated is always the bits before the first occurrence of delimiter
(even if empty). A slice can be optionally specified with start and end, while count
specifies the maximum number of items generated.
If bytealigned is True then the delimiter will only be found if it starts at a byte
aligned position.
>>> s = Bits('0x42423')
>>> [bs.bin for bs in s.split('0x4')]
['', '01000', '01001000', '0100011']
startswith(bs[, start, end])
Returns True if the bitstring starts with the sub-string bs, otherwise returns False.
A slice can be given using the start and end bit positions and defaults to the whole
bitstring.
tobytes()
Returns the bitstring as a bytes object (equivalent to a str in Python 2.6/2.7).
The returned value will be padded at the end with between zero and seven 0 bits to
make it byte aligned.
This method can also be used to output your bitstring to a file - just open a file in
binary write mode and write the function’s output.
>>> s = Bits(bytes=b'hello')
>>> s += '0b01'
>>> s.tobytes()
b'hello@'
tofile(f)
Writes the bitstring to the file object f, which should have been opened in binary
write mode.
The data written will be padded at the end with between zero and seven 0 bits to
make it byte aligned.
>>> f = open('newfile', 'wb')
>>> Bits('0x1234').tofile(f)
unpack(fmt, **kwargs)
Interprets the whole bitstring according to the fmt string or iterable and returns a list
of bitstring objects.
A dictionary or keyword arguments can also be provided. These will replace length
identifiers in the format string.
fmt is an iterable or a string with comma separated tokens that describe how to
interpret the next bits in the bitstring. See the entry for read for details.
>>> s = Bits('int:4=-1, 0b1110')
>>> i, b = s.unpack('int:4, bin')
If a token doesn’t supply a length (as with bin above) then it will try to consume
the rest of the bitstring. Only one such token is allowed.
bin
Property for the representation of the bitstring as a binary string.
bool
Property for representing the bitstring as a boolean (True or False).
If the bitstring is not a single bit then the getter will raise an InterpretError.
bytes
Property representing the underlying byte data that contains the bitstring.
When used as a getter the bitstring must be a whole number of bytes long or an
InterpretError will be raised.
An alternative is to use the tobytes method, which will pad with between zero
and seven 0 bits to make it byte aligned if needed.
>>> s = Bits('0x12345678')
>>> s.bytes
b'\x124Vx'
hex
Property representing the hexadecimal value of the bitstring.
If the bitstring is not a multiple of four bits long then getting its hex value will raise
an InterpretError.
>>> s = Bits(bin='1111 0000')
>>> s.hex
'f0'
int
Property for the signed two’s complement integer representation of the bitstring.
intbe
Property for the byte-wise big-endian signed two’s complement integer representation
of the bitstring.
Only valid for whole-byte bitstrings, in which case it is equal to s.int, otherwise
an InterpretError is raised.
intle
Property for the byte-wise little-endian signed two’s complement integer representation
of the bitstring.
Only valid for whole-byte bitstrings, in which case it is equal to s[::-8].int,
i.e. the integer representation of the byte-reversed bitstring.
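The statement that the little-endian value equals the big-endian value of the byte-reversed data can be checked with the standard library alone. This sketch uses int.from_bytes on a hypothetical two-byte input rather than the bitstring properties themselves.

```python
data = b'\x01\xff'

big = int.from_bytes(data, 'big', signed=True)          # reads 0x01ff
little = int.from_bytes(data, 'little', signed=True)    # reads 0xff01
# Reverse the bytes, then read them big-endian.
reversed_big = int.from_bytes(data[::-1], 'big', signed=True)

print(big, little, reversed_big)
```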
intne
Property for the byte-wise native-endian signed two’s complement integer representation
of the bitstring.
Only valid for whole-byte bitstrings, and will equal either the big-endian or the
little-endian integer representation depending on the platform being used.
float
floatbe
Property for the floating point representation of the bitstring.
The bitstring must be either 32 or 64 bits long to support the floating point interpretations,
otherwise an InterpretError will be raised.
If the underlying floating point methods on your machine are not IEEE 754 compliant
then using the float interpretations is undefined (this is unlikely unless you’re
on some very unusual hardware).
The float property is bit-wise big-endian, which as all floats must be whole-byte
is exactly equivalent to the byte-wise big-endian floatbe.
floatle
Property for the byte-wise little-endian floating point representation of the bitstring.
floatne
Property for the byte-wise native-endian floating point representation of the
bitstring.
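The relationship between the big-endian and little-endian float layouts can be illustrated with the struct module. This is a standard-library sketch under the assumption of IEEE 754 floats, and it does not call the bitstring properties; the value 0.15625 is chosen because it is exactly representable in 32 bits.

```python
import struct

big = struct.pack('>f', 0.15625)      # byte-wise big-endian, 32 bits
little = struct.pack('<f', 0.15625)   # byte-wise little-endian, 32 bits

print(big.hex(), little.hex())
print(little == big[::-1])            # same bytes in the opposite order
print(struct.unpack('<f', little)[0])
```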
len
length
Read-only property that gives the length of the bitstring in bits (len and length
are equivalent).
This is almost equivalent to using the len() built-in function, except that for large
bitstrings len() may fail with an OverflowError, whereas the len property
continues to work.
oct
Property for the octal representation of the bitstring.
If the bitstring is not a multiple of three bits long then getting its octal value will
raise an InterpretError.
>>> s = BitString('0b111101101')
>>> s.oct
'755'
>>> s.oct = '01234567'
>>> s.oct
'01234567'
se
Property for the signed exponential-Golomb code representation of the bitstring.
When used as a getter an InterpretError will be raised if the bitstring is not a
single code.
>>> s = BitString(se=-40)
>>> s.bin
'0000001010001'
>>> s += '0b1'
>>> s.se
Error: BitString is not a single exponential-Golomb code.
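The bit pattern shown for se=-40 can be reproduced with a short plain-Python encoder. The sketch assumes the usual mapping in which positive values take the odd unsigned codes and zero and negative values the even ones; encode_ue and encode_se are hypothetical helper names, not functions of the bitstring module.

```python
def encode_ue(n):
    """Unsigned exponential-Golomb code for n >= 0, as a '0'/'1' string."""
    if n < 0:
        raise ValueError('n must be non-negative')
    body = format(n + 1, 'b')               # n + 1 in binary, starts with 1
    return '0' * (len(body) - 1) + body     # prefix with len(body) - 1 zeros


def encode_se(n):
    """Signed code: map n to an unsigned value, then encode that."""
    mapped = 2 * n - 1 if n > 0 else -2 * n
    return encode_ue(mapped)


print(encode_se(-40))
print([encode_se(n) for n in (0, 1, -1)])
```

Appending any further bit to such a string leaves a complete code followed by leftover data, which is why the getter refuses to interpret it as a single code.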
ue
Property for the unsigned exponential-Golomb code representation of the bitstring.
When used as a getter an InterpretError will be raised if the bitstring is not a
single code.
sie
Property for the signed interleaved exponential-Golomb code representation of the
bitstring.
When used as a getter an InterpretError will be raised if the bitstring is not a
single code.
uie
Property for the unsigned interleaved exponential-Golomb code representation of
the bitstring.
When used as a getter an InterpretError will be raised if the bitstring is not a
single code.
uint
Property for the unsigned base-2 integer representation of the bitstring.
uintbe
Property for the byte-wise big-endian unsigned base-2 integer representation of the
bitstring.
uintle
Property for the byte-wise little-endian unsigned base-2 integer representation of
the bitstring.
uintne
Property for the byte-wise native-endian unsigned base-2 integer representation of
the bitstring.
__add__(bs)
__radd__(bs)
s1 + s2
Concatenate two bitstring objects and return the result. Either bitstring can be ‘auto’
initialised.
s = Bits(ue=132) + '0xff'
s2 = '0b101' + s
__and__(bs)
__rand__(bs)
s1 & s2
Returns the bit-wise AND between two bitstrings, which must have the same length
otherwise a ValueError is raised.
>>> print(Bits('0x33') & '0x0f')
0x03
__bool__()
if s:
Returns True if at least one bit is set to 1, otherwise returns False.
This special method is used in Python 3 only; for Python 2 the equivalent is called
__nonzero__, but the details are exactly the same.
>>> bool(Bits())
False
>>> bool(Bits('0b0000010000'))
True
>>> bool(Bits('0b0000000000'))
False
__contains__(bs)
bs in s
Returns True if bs can be found in the bitstring, otherwise returns False.
Similar to using find, except that you are only told if it is found, and not where it
was found.
>>> '0b11' in Bits('0x06')
True
>>> '0b111' in Bits('0x06')
False
__copy__()
s2 = copy.copy(s1)
This allows the copy module to correctly copy bitstrings. Other equivalent methods
are to initialise a new bitstring with the old one or to take a complete slice.
>>> import copy
>>> s = Bits('0o775')
>>> s_copy1 = copy.copy(s)
>>> s_copy2 = Bits(s)
>>> s_copy3 = s[:]
>>> s == s_copy1 == s_copy2 == s_copy3
True
__eq__(bs)
s1 == s2
Compares two bitstring objects for equality, returning True if they have the same
binary representation, otherwise returning False.
>>> Bits('0o7777') == '0xfff'
True
>>> a = Bits(uint=13, length=8)
>>> b = Bits(uint=13, length=10)
>>> a == b
False
__getitem__(key)
s[start:end:step]
Returns a slice of the bitstring.
The usual slice behaviour applies except that the step parameter gives a multiplicative
factor for start and end (i.e. the bits ‘stepped over’ are included in the
slice).
>>> s = Bits('0x0123456')
>>> s[0:4]
Bits('0x0')
>>> s[0:3:8]
Bits('0x012345')
If a single element is asked for then either True or False will be returned.
>>> s[0]
False
>>> s[-2]
True
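The multiplicative step described above can be modelled on a plain '0'/'1' string. This is a sketch of the rule as stated in this entry, for positive steps only; stepped_slice is a hypothetical helper and negative steps are not modelled.

```python
def stepped_slice(bits, start, end, step=1):
    """Model of s[start:end:step] as described above, for a positive step."""
    return bits[start * step:end * step]


bits = format(0x0123456, '028b')              # the 28 bits of 0x0123456
first_nibble = stepped_slice(bits, 0, 4)      # bits 0 to 3
three_bytes = stepped_slice(bits, 0, 3, 8)    # bits 0 to 23
print(first_nibble)
print(format(int(three_bytes, 2), '06x'))
```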
__hash__()
hash(s)
Returns an integer hash of the Bits.
This method is not available for the BitArray or BitStream classes, as only
immutable objects should be hashed. You typically won’t need to call it directly,
instead it is used for dictionary keys and in sets.
__invert__()
~s
Returns the bitstring with every bit inverted, that is all zeros replaced with ones, and
all ones replaced with zeros.
If the bitstring is empty then an Error will be raised.
>>> s = ConstBitStream('0b1110010')
>>> print(~s)
0b0001101
>>> print(~s & s)
0b0000000
__len__()
len(s)
Returns the length of the bitstring in bits if it is less than sys.maxsize, otherwise
raises OverflowError.
It's recommended that you use the len property rather than the len function
because of the function's behaviour for large bitstring objects, although calling the
special function directly will always work.
>>> s=Bits(filename=’11GB.mkv’)
>>> s.len
93944160032
>>> len(s)
OverflowError: long int too large to convert to int
>>> s.__len__()
93944160032
__lshift__(n)
s << n
Returns the bitstring with its bits shifted n places to the left. The n right-most bits
will become zeros.
>>> s=Bits(’0xff’)
>>> s<< 4
Bits(’0xf0’)
__mul__(n)
__rmul__(n)
s * n / n * s
Returns a bitstring consisting of n concatenations of another.
>>> a=Bits(’0x34’)
>>> b=a*5
>>> print(b)
0x3434343434
__ne__(bs)
s1 != s2
Compares two bitstring objects for inequality, returning False if they have the
same binary representation, otherwise returning True.
__nonzero__()
See __bool__.
__or__(bs)
__ror__(bs)
s1 | s2
Returns the bit-wise OR between two bitstrings, which must have the same length
otherwise a ValueError is raised.
>>> print(Bits(’0x33’)|’0x0f’)
0x3f
__repr__()
repr(s)
A representation of the bitstring that could be used to create it (which will often not
be the form used to create it).
If the result is too long then it will be truncated with ... and the length of the
whole will be given.
>>> Bits('0b11100011')
Bits('0xe3')
__rshift__(n)
s >> n
Returns the bitstring with its bits shifted n places to the right. The n left-most bits
will become zeros.
>>> s = Bits('0xff')
>>> s >> 4
Bits('0x0f')
__str__()
print(s)
Used to print a representation of the bitstring, trying to be as brief as possible.
If the bitstring is a multiple of 4 bits long then hex will be used, otherwise either
binary or a mix of hex and binary will be used. Very long strings will be truncated
with ....
>>> s=Bits(’0b1’)*7
>>> print(s)
0b1111111
>>> print(s +’0b1’)
0xff
__xor__(bs)
__rxor__(bs)
s1 ^ s2
Returns the bit-wise XOR between two bitstrings, which must have the same length
otherwise a ValueError is raised.
>>> print(Bits(’0x33’)^’0x0f’)
0x3c
CHAPTER
TWELVE
THE BITARRAY CLASS
class bitstring.BitArray([auto,length,offset,**kwargs ])
The Bits class is the base class for BitArray and so (with the exception of
__hash__) all of its methods are also available for BitArray objects. The initialiser
is also the same as for Bits and so won’t be repeated here.
A BitArray is a mutable Bits, and so the one thing all of the methods listed here
have in common is that they can modify the contents of the bitstring.
append(bs)
Join a BitArray to the end of the current BitArray.
>>> s=BitArray(’0xbad’)
>>> s.append(’0xf00d’)
>>> s
BitArray(’0xbadf00d’)
byteswap([fmt,start,end,repeat=True ])
Change the endianness of the BitArray in-place according to fmt. Return the
number of swaps done.
The fmt can be an integer, an iterable of integers or a compact format string similar
to those used in pack (described in Compact format strings). It defaults to 0, which
means reverse as many bytes as possible. The fmt gives a pattern of byte sizes to use
to swap the endianness of the BitArray. Note that if you use a compact format
string then the endianness identifier (<, > or @) is not needed, and if present it will
be ignored.
start and end optionally give a slice to apply the transformation to (it defaults to the
whole BitArray). If repeat is True then the byte swapping pattern given by the
fmt is repeated in its entirety as many times as possible.
>>> s=BitArray(’0x00112233445566’)
>>> s.byteswap(2)
3
>>> s
BitArray(’0x11003322554466’)
>>> s.byteswap(’h’)
3
>>> s
BitArray(’0x00112233445566’)
>>> s.byteswap([2,5])
1
>>> s
BitArray(’0x11006655443322’)
It can also be used to swap the endianness of the whole BitArray.
>>> s=BitArray(’uintle:32=1234’)
>>> s.byteswap()
>>> print(s.uintbe)
1234
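To make the fmt pattern and the returned swap count concrete, here is a minimal pure-Python sketch that mimics the behaviour described above on an ordinary bytes object. It doesn't use bitstring at all; the function name and signature are just illustrative, and it ignores the start and end parameters and compact format strings.

```python
def byteswap(data, fmt=0, repeat=True):
    """Return (swapped bytes, number of pattern repeats) for a bytes object.

    fmt is 0 (reverse all the bytes), a positive int, or an iterable of
    positive ints giving the byte sizes of each group to reverse.
    """
    if isinstance(fmt, int):
        sizes = [fmt if fmt else len(data)]
    else:
        sizes = list(fmt)
    pattern_len = sum(sizes)
    out = bytearray(data)
    pos = 0
    swaps = 0
    # Apply the whole pattern only while it still fits in the remaining bytes.
    while pattern_len > 0 and pos + pattern_len <= len(out):
        for size in sizes:
            out[pos:pos + size] = out[pos:pos + size][::-1]
            pos += size
        swaps += 1
        if not repeat:
            break
    return bytes(out), swaps


original = bytes.fromhex('00112233445566')
pairs, n_pairs = byteswap(original, 2)        # 3 swaps, last byte untouched
mixed, n_mixed = byteswap(original, [2, 5])   # one pass of a 2+5 byte pattern
whole, n_whole = byteswap((1234).to_bytes(4, 'little'))
```

The results line up with the doctest above: swapping in pairs leaves the odd seventh byte alone, and the [2, 5] pattern fits only once into seven bytes.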
insert(bs,pos)
Inserts bs at pos.
When used with the BitStream class the pos is optional, and if not present the
current bit position will be used. After insertion the property pos will be
immediately after the inserted bitstring.
>>> s = BitStream('0xccee')
>>> s.insert('0xd', 8)
>>> s
BitStream('0xccdee')
>>> s.insert('0x00')
>>> s
BitStream('0xccd00ee')
invert([pos ])
Inverts one or many bits from 1 to 0 or vice versa.
pos can be either a single bit position or an iterable of bit positions.
Negative numbers are treated in the same way as slice indices and it will raise
IndexError if pos < -s.len or pos > s.len. The default is to invert
the entire BitArray.
>>> s = BitArray('0b111001')
>>> s.invert(0)
>>> s.bin
'011001'
>>> s.invert([-2, -1])
>>> s.bin
'011010'
>>> s.invert()
>>> s.bin
'100101'
overwrite(bs,pos)
Replaces the contents of the current BitArray with bs at pos.
When used with the BitStream class the pos is optional, and if not present the
current bit position will be used. After overwriting the property pos will be
immediately after the overwritten bitstring.
>>> s = BitStream(length=10)
>>> s.overwrite('0b111', 3)
>>> s
BitStream('0b0001110000')
>>> s.pos
6
prepend(bs)
Inserts bs at the beginning of the current BitArray.
>>> s=BitArray(’0b0’)
>>> s.prepend(’0xf’)
>>> s
BitArray(’0b11110’)
replace(old,new[,start,end,count,bytealigned ])
Finds occurrences of old and replaces them with new. Returns the number of
replacements made.
If bytealigned is True then replacements will only be made on byte boundaries.
start and end give the search range and default to 0 and len respectively. If count
is specified then no more than this many replacements will be made.
>>> s=BitArray(’0b0011001’)
>>> s.replace(’0b1’,’0xf’)
3
>>> print(s.bin)
0011111111001111
>>> s.replace(’0b1’,’’, count=6)
6
>>> print(s.bin)
0011001111
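The example above can be cross-checked with ordinary Python strings of '0' and '1' characters. This is only a sketch: replace_bits is a made-up helper built on str.replace, so it handles non-overlapping matches only and doesn't model the start, end or bytealigned parameters.

```python
def replace_bits(bits, old, new, count=None):
    """Replace occurrences of `old` in a '0'/'1' string.

    Returns (new string, number of replacements made), mirroring the
    return value described for BitArray.replace.
    """
    found = bits.count(old)                      # non-overlapping matches
    made = found if count is None else min(found, count)
    return bits.replace(old, new, made), made


step1, n1 = replace_bits('0011001', '1', '1111')       # '0b1' -> '0xf'
step2, n2 = replace_bits(step1, '1', '', count=6)      # delete six '1' bits
```

Here step1 and step2 reproduce the two s.bin values printed in the doctest, with 3 and 6 replacements respectively.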
reverse([start,end ])
Reverses bits in the BitArray in-place.
start and end give the range and default to 0 and len respectively.
>>> a=BitArray(’0b10111’)
>>> a.reverse()
>>> a.bin
’11101’
rol(bits[,start,end ])
Rotates the contents of the BitArray in-place by bits bits to the left.
start and end define the slice to use and default to 0 and len respectively.
Raises ValueError if bits < 0.
>>> s=BitArray(’0b01000001’)
>>> s.rol(2)
>>> s.bin
’00000101’
ror(bits[,start,end ])
Rotates the contents of the BitArray in-place by bits bits to the right.
start and end define the slice to use and default to 0 and len respectively.
Raises ValueError if bits < 0.
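Since ror has no example of its own, here is a small sketch of both rotations on plain '0'/'1' strings. The helper names are hypothetical and the start and end parameters are left out; returning an empty string unchanged is an assumption of this sketch rather than documented behaviour.

```python
def rol(bits, n):
    """Rotate a '0'/'1' string n places to the left."""
    if n < 0:
        raise ValueError("Cannot rotate by a negative amount.")
    if not bits:
        return bits
    n %= len(bits)
    return bits[n:] + bits[:n]


def ror(bits, n):
    """Rotate a '0'/'1' string n places to the right."""
    if n < 0:
        raise ValueError("Cannot rotate by a negative amount.")
    if not bits:
        return bits
    split = len(bits) - (n % len(bits))
    return bits[split:] + bits[:split]


left = rol('01000001', 2)     # matches the rol doctest above
right = ror('01000001', 2)    # the two right-most bits wrap round to the front
```

Unlike a shift, no bits are lost: rotating left and then right by the same amount gives back the original.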
set(value[,pos ])
Sets one or many bits to either 1 (if value is True) or 0 (if value isn't True). pos
can be either a single bit position or an iterable of bit positions. Negative numbers
are treated in the same way as slice indices and it will raise IndexError if
pos < -s.len or pos > s.len. The default is to set every bit in the BitArray.
Using s.set(True, x) can be more efficient than other equivalent methods
such as s[x] = 1, s[x] = "0b1" or s.overwrite('0b1', x), especially
if many bits are being set.
>>> s=BitArray(’0x0000’)
>>> s.set(True,-1)
>>> print(s)
0x0001
>>> s.set(1, (0,4,5,7,9))
>>> s.bin
’1000110101000001’
>>> s.set(0)
>>> s.bin
’0000000000000000’
bin
Writable version of Bits.bin.
bool
Writable version of Bits.bool.
bytes
Writable version of Bits.bytes.
hex
Writable version of Bits.hex.
int
Writable version of Bits.int.
When used as a setter the value must fit into the current length of the BitArray,
else a ValueError will be raised.
>>> s=BitArray(’0xf3’)
>>> s.int
-13
>>> s.int =1232
ValueError: int 1232 is too large for a BitArray of length 8.
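The numbers in this example follow from ordinary two's complement arithmetic, which can be sketched without bitstring. The helper names below are illustrative only.

```python
def signed_range(length):
    """Smallest and largest values a two's complement int of `length` bits can hold."""
    return -(1 << (length - 1)), (1 << (length - 1)) - 1


def to_signed(value, length):
    """Interpret the unsigned `value` as a two's complement int of `length` bits."""
    if value >= 1 << (length - 1):
        value -= 1 << length
    return value


as_int = to_signed(0xf3, 8)          # the 8 bits of '0xf3' read as a signed int
low, high = signed_range(8)
fits = low <= 1232 <= high           # False, hence the ValueError above
```

So an 8-bit BitArray can hold -128 to 127 through its int property, and 1232 would need at least 12 bits.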
intbe
Writable version of Bits.intbe.
When used as a setter the value must fit into the current length of the BitArray,
else a ValueError will be raised.
intle
Writable version of Bits.intle.
When used as a setter the value must fit into the current length of the BitArray,
else a ValueError will be raised.
intne
Writable version of Bits.intne.
When used as a setter the value must fit into the current length of the BitArray,
else a ValueError will be raised.
float
floatbe
Writable version of Bits.float.
floatle
Writable version of Bits.floatle.
floatne
Writable version of Bits.floatne.
oct
Writable version of Bits.oct.
se
Writable version of Bits.se.
ue
Writable version of Bits.ue.
sie
Writable version of Bits.sie.
uie
Writable version of Bits.uie.
uint
Writable version of Bits.uint.
When used as a setter the value must fit into the current length of the BitArray,
else a ValueError will be raised.
uintbe
Writable version of Bits.uintbe.
When used as a setter the value must fit into the current length of the BitArray,
else a ValueError will be raised.
uintle
Writable version of Bits.uintle.
When used as a setter the value must fit into the current length of the BitArray,
else a ValueError will be raised.
uintne
Writable version of Bits.uintne.
When used as a setter the value must fit into the current length of the BitArray,
else a ValueError will be raised.
__delitem__(key)
del s[start:end:step]
Deletes the slice specified.
__iadd__(bs)
s1 += s2
Appends bs to the current bitstring.
Note that for BitArray objects this will be an in-place change, whereas for Bits
objects using += will not call this method - instead a new object will be created (it
is equivalent to a copy and an __add__).
>>> s = BitStream(ue=423)
>>> s += BitStream(ue=12)
>>> s.read(’ue’)
423
>>> s.read(’ue’)
12
__iand__(bs)
s &= bs
In-place bit-wise AND between two bitstrings. If the two bitstrings are not the same
length then a ValueError is raised.
__ilshift__(n)
s <<= n
Shifts the bits in-place n bits to the left. The n right-most bits will become zeros
and bits shifted off the left will be lost.
__imul__(n)
s *= n
In-place concatenation of n copies of the current bitstring.
>>> s=BitArray(’0xbad’)
>>> s*=3
>>> s.hex
’badbadbad’
__ior__(bs)
s |= bs
In-place bit-wise OR between two bitstrings. If the two bitstrings are not the same
length then a ValueError is raised.
__irshift__(n)
s >>= n
Shifts the bits in-place n bits to the right. The n left-most bits will become zeros
and bits shifted off the right will be lost.
__ixor__(bs)
s ^= bs
In-place bit-wise XOR between two bitstrings. If the two bitstrings are not the same
length then a ValueError is raised.
__setitem__(key,value)
s1[start:end:step] = s2
Replaces the slice specified with a new value.
>>> s = BitArray('0x00112233')
>>> s[8:16] = '0xfff'
>>> print(s)
0x00fff2233
>>> s[-12:] = '0xc'
>>> print(s)
0x00fff2c
CHAPTER
THIRTEEN
THE CONSTBITSTREAM CLASS
class bitstring.ConstBitStream([auto,length,offset,**kwargs ])
The Bits class is the base class for ConstBitStream and so all of its methods are
also available for ConstBitStream objects. The initialiser is also the same as for
Bits and so won’t be repeated here.
A ConstBitStream is a Bits with added methods and properties that allow it to be
parsed as a stream of bits.
bytealign()
Aligns to the start of the next byte (so that pos is a multiple of 8) and returns the
number of bits skipped.
If the current position is already byte aligned then it is unchanged.
>>> s=ConstBitStream(’0xabcdef’)
>>> s.pos += 3
>>> s.bytealign()
5
>>> s.pos
8
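The number of bits skipped is just the distance to the next multiple of 8, which is easy to check by hand. A minimal sketch, using a hypothetical helper that works on a bare position rather than a real stream:

```python
def bytealign(pos):
    """Return (new position, bits skipped) after aligning pos to a byte boundary."""
    skipped = (-pos) % 8      # 0 when pos is already a multiple of 8
    return pos + skipped, skipped


aligned_from_3 = bytealign(3)     # as in the doctest: skip 5 bits to reach 8
aligned_from_8 = bytealign(8)     # already aligned, nothing skipped
```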
peek(fmt)
Reads from the current bit position pos in the bitstring according to the fmt string
or integer and returns the result.
The bit position is unchanged.
For information on the format string see the entry for the read method.
>>> s=ConstBitStream(’0x123456’)
>>> s.peek(16)
ConstBitStream(’0x1234’)
>>> s.peek(’hex:8’)
’12’
peeklist(fmt,**kwargs)
Reads from current bit position pos in the bitstring according to the fmt string or
iterable and returns a list of results.
A dictionary or keyword arguments can also be provided. These will replace length
identifiers in the format string. The position is not advanced to after the read items.
See the entries for read and readlist for more information.
read(fmt)
Reads from current bit position pos in the bitstring according to the format string
and returns a single result. If not enough bits are available then a ReadError is
raised.
fmt is either a token string that describes how to interpret the next bits in the bitstring
or an integer. If it’s an integer then that number of bits will be read, and returned as
a new bitstring. Otherwise the tokens are:
int:n       n bits as a signed integer.
uint:n      n bits as an unsigned integer.
float:n     n bits as a floating point number.
intbe:n     n bits as a big-endian signed integer.
uintbe:n    n bits as a big-endian unsigned integer.
floatbe:n   n bits as a big-endian float.
intle:n     n bits as a little-endian signed int.
uintle:n    n bits as a little-endian unsigned int.
floatle:n   n bits as a little-endian float.
intne:n     n bits as a native-endian signed int.
uintne:n    n bits as a native-endian unsigned int.
floatne:n   n bits as a native-endian float.
hex:n       n bits as a hexadecimal string.
oct:n       n bits as an octal string.
bin:n       n bits as a binary string.
ue          next bits as an unsigned exp-Golomb.
se          next bits as a signed exp-Golomb.
uie         next bits as an interleaved unsigned exp-Golomb.
sie         next bits as an interleaved signed exp-Golomb.
bits:n      n bits as a new bitstring.
bytes:n     n bytes as a bytes object.
bool        next bit as a boolean (True or False).
For example:
>>> s=ConstBitStream(’0x23ef55302’)
>>> s.read(’hex:12’)
’23e’
>>> s.read(’bin:4’)
’1111’
>>> s.read(’uint:5’)
10
>>> s.read(’bits:4’)
ConstBitStream(’0xa’)
The read method is useful for reading exponential-Golomb codes.
>>> s=ConstBitStream(’se=-9, ue=4’)
>>> s.read(’se’)
-9
>>> s.read(’ue’)
4
readlist(fmt,**kwargs)
Reads from current bit position pos in the bitstring according to the fmt string
or iterable and returns a list of results. If not enough bits are available then a
ReadError is raised.
A dictionary or keyword arguments can also be provided. These will replace length
identifiers in the format string. The position is advanced to after the read items.
See the entry for read for information on the format strings.
For multiple items you can separate the tokens using commas or give a list of tokens:
>>> s=ConstBitStream(’0x43fe01ff21’)
>>> s.readlist(’hex:8, uint:6’)
[’43’, 63]
>>> s.readlist([’bin:3’,’intle:16’])
[’100’, -509]
>>> s.pos =0
>>> s.readlist(’hex:b, uint:d’, b=8, d=6)
[’43’, 63]
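The -509 in the second readlist call is worth checking by hand, because the 16 bits being read don't start on a byte boundary. The following sketch redoes that read using only built-in integers; bits_of and read_intle are made-up helper names, not part of bitstring.

```python
def bits_of(data):
    """Expand a bytes object into a string of '0' and '1' characters."""
    return ''.join(format(byte, '08b') for byte in data)


def read_intle(bits, pos, length):
    """Read `length` bits (a multiple of 8) at `pos` as a little-endian signed int.

    Returns (value, new position).
    """
    raw = int(bits[pos:pos + length], 2).to_bytes(length // 8, 'big')
    return int.from_bytes(raw, 'little', signed=True), pos + length


bits = bits_of(bytes.fromhex('43fe01ff21'))
pos = 8 + 6                                  # position after 'hex:8, uint:6'
flags, pos = bits[pos:pos + 3], pos + 3      # the 'bin:3' token
value, pos = read_intle(bits, pos, 16)       # the 'intle:16' token
```

The 16 bits read are 0x03fe; reversing the two bytes gives 0xfe03, which is -509 as a signed 16-bit integer.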
readto(bs,bytealigned)
Reads up to and including the next occurrence of the bitstring bs and returns the
results. If bytealigned is True it will look for the bitstring starting only at
whole-byte positions.
Raises a ReadError if bs is not found, and ValueError if bs is empty.
>>> s = ConstBitStream('0x47000102034704050647')
>>> s.readto('0x47', bytealigned=True)
ConstBitStream('0x47')
>>> s.readto('0x47', bytealigned=True)
ConstBitStream('0x0001020347')
>>> s.readto('0x47', bytealigned=True)
ConstBitStream('0x04050647')
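For the byte-aligned case the same "find, then read up to and including" idea can be sketched on a plain bytes object with bytes.find. The function below is hypothetical and works in whole bytes only; it raises IndexError where the real method raises ReadError (which derives from IndexError).

```python
def readto_bytes(data, pos, pattern):
    """Return (chunk, new position) reading from pos up to and including pattern."""
    if not pattern:
        raise ValueError("Cannot search for an empty pattern.")
    found = data.find(pattern, pos)
    if found == -1:
        raise IndexError("Pattern not found.")
    end = found + len(pattern)
    return data[pos:end], end


data = bytes.fromhex('47000102034704050647')
first, pos = readto_bytes(data, 0, b'\x47')
second, pos = readto_bytes(data, pos, b'\x47')
third, pos = readto_bytes(data, pos, b'\x47')
```

Each call picks up where the previous one stopped, so the three chunks together account for the whole input.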
bytepos
Property for setting and getting the current byte position in the bitstring.
When used as a getter will raise a ByteAlignError if the current position is not
byte aligned.
pos
bitpos
Read and write property for setting and getting the current bit position in the
bitstring. Can be set to any value from 0 to len.
The pos and bitpos properties are exactly equivalent - you can use whichever
you prefer.
if s.pos < 100:
    s.pos += 10
CHAPTER
FOURTEEN
THE BITSTREAM CLASS
class bitstring.BitStream([auto,length,offset,**kwargs ])
Both the BitArray and the ConstBitStream classes are base classes for
BitStream and so all of their methods are also available for BitStream objects.
The initialiser is also the same as for Bits and so won’t be repeated here.
A BitStream is a mutable container of bits with methods and properties that allow it
to be parsed as a stream of bits. There are no additional methods or properties in this
class - see its base classes (Bits, BitArray and ConstBitStream) for details.
CHAPTER
FIFTEEN
FUNCTIONS
bitstring.pack(format[,*values,**kwargs ])
Packs the values and keyword arguments according to the format string and returns a new
BitStream.
Parameters
format – string with comma separated tokens
values – extra values used to construct the BitStream
kwargs – a dictionary of token replacements
Return type BitStream
The format string consists of comma separated tokens of the form name:length=value.
See the entry for read for more details.
The tokens can be ‘literals’, like 0xef,0b110,uint:8=55, etc. which just represent a set
sequence of bits.
They can also have the value missing, in which case the values contained in *values will be
used.
>>> a=pack(’bin:3, hex:4’,’001’,’f’)
>>> b=pack(’uint:10’,33)
A dictionary or keyword arguments can also be provided. These will replace items in the format
string.
>>> c=pack(’int:a=b’, a=10, b=20)
>>> d=pack(’int:8=a, bin=b, int:4=a’, a=7, b=’0b110’)
Plain names can also be used as follows:
>>> e=pack(’a, b, b, a’, a=’0b11’, b=’0o2’)
Tokens starting with an endianness identifier (<, > or @) imply a struct-like compact format
string (see Compact format strings). For example this packs three little-endian 16-bit integers:
>>> f=pack(’<3h’,12,3,108)
And of course you can combine the different methods in a single pack.
A ValueError will be raised if the *values are not all used up by the format string, or if
a value provided doesn't match the length specified by a token.
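The compact format strings are described as struct-like, so the standard library's struct module is a handy reference for what a format such as '<3h' means. The snippet below uses only struct (not bitstring) to show the byte layout of three little-endian 16-bit integers; whether pack('<3h', 12, 3, 108).bytes is identical is not checked here.

```python
import struct

# '<' selects little-endian, '3h' means three signed 16-bit integers.
packed = struct.pack('<3h', 12, 3, 108)
layout = packed.hex()

# Unpacking with the same format recovers the original values.
values = struct.unpack('<3h', packed)
```

Each value takes two bytes with the least significant byte first, so 12 is stored as 0c 00 and 108 as 6c 00.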
CHAPTER
SIXTEEN
EXCEPTIONS
exception bitstring.Error(Exception)
Base class for all module exceptions.
exception bitstring.InterpretError(Error,ValueError)
Inappropriate interpretation of binary data. For example using the ‘bytes’ property on a
bitstring that isn’t a whole number of bytes long.
exception bitstring.ByteAlignError(Error)
Whole-byte position or length needed.
exception bitstring.CreationError(Error,ValueError)
Inappropriate argument during bitstring creation.
exception bitstring.ReadError(Error,IndexError)
Reading or peeking past the end of a bitstring.
Part III
Appendices
Gathered together here are a few odds and ends that didn’t fit well into either the user manual or
the reference section. The only unifying theme is that none of them provide any vital knowledge
about bitstring, and so they can all be safely ignored.
CHAPTER
SEVENTEEN
EXAMPLES
17.1 Creation
There are lots of ways of creating new bitstrings. The most flexible is via the auto parameter,
which is used in this example.
# Multiple parts can be joined with a single expression...
s = BitStream('0x000001b3, uint:12=352, uint:12=288, 0x1, 0x3')
# and extended just as easily
s += 'uint:18=48000, 0b1, uint:10=4000, 0b100'
# To convert to an ordinary string use the bytes property
open('video.m2v', 'wb').write(s.bytes)
# The information can be read back with a similar syntax
start_code, width, height = s.readlist('hex:32, uint:12, uint:12')
aspect_ratio, frame_rate = s.readlist('2*bin:4')
17.2 Manipulation
s = BitArray('0x0123456789abcdef')
del s[4:8]                      # deletes the '1'
s.insert('0xcc', 12)            # inserts 'cc' between the '3' and '4'
s.overwrite('0b01', 30)         # changes the '6' to a '5'
# This replaces every '1' bit with a 5 byte ASCII string!
s.replace('0b1', BitArray(bytes=b'hello'))
del s[-1001:]                   # deletes final 1001 bits
s.reverse()                     # reverses whole bitstring
s.prepend('uint:12=44')         # prepend a 12 bit integer
17.3 Parsing
This example creates a class that parses a structure that is part of the H.264 video standard.
class seq_parameter_set_data(object):
    def __init__(self, s):
        """Interpret next bits in BitStream s as an SPS."""
        # Read and interpret bits in a single expression:
        self.profile_idc = s.read('uint:8')
        # Multiple reads in one go returns a list:
        self.constraint_flags = s.readlist('4*uint:1')
        self.reserved_zero_4bits = s.read('bin:4')
        self.level_idc = s.read('uint:8')
        self.seq_parameter_set_id = s.read('ue')
        if self.profile_idc in [100, 110, 122, 244, 44, 83, 86]:
            self.chroma_format_idc = s.read('ue')
            if self.chroma_format_idc == 3:
                self.separate_colour_plane_flag = s.read('uint:1')
            self.bit_depth_luma_minus8 = s.read('ue')
            self.bit_depth_chroma_minus8 = s.read('ue')
            # etc.
>>> s=BitStream(’0x6410281bc0’)
>>> sps =seq_parameter_set_data(s)
>>> print(sps.profile_idc)
100
>>> print(sps.level_idc)
40
>>> print(sps.reserved_zero_4bits)
0000
>>> print(sps.constraint_flags)
[0, 0, 0, 1]
17.4 Sieve of Eratosthenes
This classic (though inefficient) method of calculating prime numbers uses a bitstring to store
whether each bit position represents a prime number. This takes much less memory than an
ordinary array.
def prime_sieve(top=1000000):
    b = BitArray(top)  # bitstring of '0' bits
    for i in xrange(2, top):
        if not b[i]:
            yield i
            # i is prime, so set all its multiples to '1'.
            b.set(True, xrange(i*i, top, i))
CHAPTER
EIGHTEEN
EXPONENTIAL-GOLOMB CODES
As this type of representation of integers isn't as well known as the standard base-2
representation I thought that a short explanation of them might be welcome. This section
can be safely skipped if you're not interested.
Exponential-Golomb codes represent integers using bit patterns that get longer for larger
numbers. For unsigned and signed numbers (the bitstring properties ue and se respectively)
the patterns start like this:
Bit pattern   Unsigned   Signed
1             0          0
010           1          1
011           2          -1
00100         3          2
00101         4          -2
00110         5          3
00111         6          -3
0001000       7          4
0001001       8          -4
0001010       9          5
0001011       10         -5
0001100       11         6
...           ...        ...
They consist of a sequence of n ‘0’ bits, followed by a ‘1’ bit, followed by n more bits. The
bits after the first ‘1’ bit count upwards as ordinary base-2 binary numbers until they run out of
space and an extra ‘0’ bit needs to get included at the start.
The advantage of this method of representing integers over many other methods is that it can be
quite efficient at representing small numbers without imposing a limit on the maximum number
that can be represented.
Exercise: Using the table above decode this sequence of unsigned Exponential Golomb codes:
001001101101101011000100100101
The answer is that it decodes to 3, 0, 0, 2, 2, 1, 0, 0, 8, 4. Note how you don’t need to know
how many bits are used for each code in advance - there’s only one way to decode it. To create
this bitstring you could have written something like:
a = BitStream().join([BitArray(ue=i) for i in [3, 0, 0, 2, 2, 1, 0, 0, 8, 4]])
and to read it back:
while a.pos != a.len:
    print(a.read('ue'))
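If you'd like to see the rule from the table spelt out, here is a small pure-Python sketch that works on strings of '0' and '1' characters instead of bitstrings. The function names are my own, and the decoder assumes well-formed input.

```python
def ue_encode(n):
    """Unsigned exp-Golomb code for n >= 0, as a '0'/'1' string."""
    if n < 0:
        raise ValueError("Unsigned codes can't represent negative numbers.")
    body = format(n + 1, 'b')              # always starts with a '1'
    return '0' * (len(body) - 1) + body    # n zeros, then n + 1 bits


def ue_decode_all(bits):
    """Decode a concatenation of unsigned exp-Golomb codes into a list."""
    values = []
    pos = 0
    while pos < len(bits):
        zeros = 0
        while bits[pos] == '0':            # count the leading zeros
            zeros += 1
            pos += 1
        values.append(int(bits[pos:pos + zeros + 1], 2) - 1)
        pos += zeros + 1
    return values


def se_from_ue(k):
    """Map an unsigned code number to the signed value in the same table row."""
    return (k + 1) // 2 if k % 2 else -(k // 2)


numbers = [3, 0, 0, 2, 2, 1, 0, 0, 8, 4]
encoded = ''.join(ue_encode(i) for i in numbers)
decoded = ue_decode_all(encoded)
```

Here encoded is exactly the 30-bit sequence from the exercise and decoded gets the original list back, which shows why the code lengths never need to be known in advance.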
The notation ue and se for the exponential-Golomb code properties comes from the H.264
video standard, which uses these types of code a lot. There are other ways to map the bitstrings
to integers:
18.1 Interleaved exponential-Golomb codes
This type of code is used in the Dirac video standard, and is represented by the attributes uie
and sie. For the interleaved codes the pattern is very similar to before for the unsigned case:
Bit pattern   Unsigned
1             0
001           1
011           2
00001         3
00011         4
01001         5
01011         6
0000001       7
0000011       8
0001001       9
...           ...
For the signed code it looks a little different:
Bit pattern   Signed
1             0
0010          1
0011          -1
0110          2
0111          -2
000010        3
000011        -3
000110        4
000111        -4
010010        5
010011        -5
...           ...
I’m sure you can work out the pattern yourself from here!
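In case it helps, this is one way of writing the pattern down, inferred from the two tables above rather than taken from the Dirac specification: each data bit is preceded by a '0', a lone '1' ends the code, and the signed form appends a sign bit for non-zero values. The function names are hypothetical.

```python
def uie_encode(n):
    """Interleaved unsigned exp-Golomb code for n >= 0, as a '0'/'1' string."""
    if n < 0:
        raise ValueError("Unsigned codes can't represent negative numbers.")
    data = format(n + 1, 'b')[1:]                  # drop the leading '1'
    return ''.join('0' + bit for bit in data) + '1'


def sie_encode(n):
    """Interleaved signed code: the unsigned code of abs(n) plus a sign bit."""
    if n == 0:
        return '1'
    return uie_encode(abs(n)) + ('1' if n < 0 else '0')


def uie_decode(bits):
    """Decode a single interleaved unsigned code from the start of bits."""
    value = 1
    pos = 0
    while bits[pos] == '0':                        # '0' means a data bit follows
        value = (value << 1) | int(bits[pos + 1])
        pos += 2
    return value - 1


unsigned_table = [uie_encode(i) for i in range(10)]
signed_table = [sie_encode(i) for i in (0, 1, -1, 2, -2, 3, -3)]
```

The two lists reproduce the rows of the tables above, which is the only evidence offered here that the rule is right.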
CHAPTER
NINETEEN
OPTIMISATION TECHNIQUES
The bitstring module aims to be as fast as reasonably possible, and although there is more
work to be done optimising some operations it is currently quite well optimised without
resorting to C extensions.
There are however some pointers you should follow to make your code efficient, so if you need
things to run faster then this is the section for you.
19.1 Use combined read and interpretation
When parsing a bitstring one way to write code is in the following style:
width =s.read(12).uint
height =s.read(12).uint
flags =s.read(4).bin
This works fine, but is not very quick. The problem is that the call to read constructs and
returns a new bitstring, which then has to be interpreted. The new bitstring isn’t used for
anything else and so creating it is wasted effort. Instead it is better to use a string parameter
that does the read and interpretation together:
width =s.read(’uint:12’)
height =s.read(’uint:12’)
flags =s.read(’bin:4’)
This is much faster, although probably not as fast as the combined call:
width, height, flags =s.readlist(’uint:12, uint:12, bin:4’)
19.2 Choose the simplest class you can
If you don’t need to modify your bitstring after creation then prefer the immutable Bits over
the mutable BitArray. This is typically the case when parsing, or when creating directly
from files.
The speed difference between the classes is noticeable, and there are also memory usage
optimisations that are made if objects are known to be immutable.
You should also prefer ConstBitStream to BitStream if you won’t need to modify any
bits.
One anti-pattern to watch out for is using += on a Bits object. For example, don’t do this:
s = Bits()
for i in range(1000):
    s += '0xab'
Now this is inefficient for a few reasons, but the one I'm highlighting is that as the immutable
bitstring doesn't have an __iadd__ special method the ordinary __add__ gets used instead.
In other words s += '0xab' gets converted to s = s + '0xab', which creates a new
Bits from the old on every iteration. This isn't what you'd want or possibly expect. If s had
been a BitArray then the addition would have been done in-place, and have been much more
efficient.
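The same distinction exists between Python's built-in bytes and bytearray types, which Bits and BitArray loosely correspond to, so it can be demonstrated without bitstring. This is an analogy only, and append_bytes is an illustrative helper.

```python
def append_bytes(buffer, count):
    """Append 0xab to buffer `count` times using += and return the result."""
    for _ in range(count):
        buffer += b'\xab'
    return buffer


immutable = b''
mutable = bytearray()

grown_immutable = append_bytes(immutable, 1000)
grown_mutable = append_bytes(mutable, 1000)

# With bytes, += falls back to + and rebinds the name to a new object each time.
rebuilt = grown_immutable is not immutable
# With bytearray, += extends the one existing object in place.
in_place = grown_mutable is mutable
```

The original bytes object is still empty afterwards, whereas the original bytearray is the 1000-byte result.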
19.3 Use dedicated functions for bit setting and checking
If you need to set or check individual bits then there are special functions for this. For example
one way to set bits would be:
s = BitArray(1000)
for p in [14, 34, 501]:
    s[p] = '0b1'
This creates a 1000 bit bitstring and sets three of the bits to ‘1’. Unfortunately the crucial
line spends most of its time creating a new bitstring from the ‘0b1’ string. You could make it
slightly quicker by using s[p] = True, but it is much faster (and I mean at least an order of
magnitude) to use the set method:
s=BitArray(1000)
s.set(True, [14,34,501])
As well as set and invert there are also checking methods all and any. So rather than
using
if s[100] and s[200]:
    do_something()
it’s better to say
if s.all(True, (100, 200)):
    do_something()
CHAPTER
TWENTY
RELEASE NOTES
20.1 Full Version History
20.1.1 November 21st 2011: version 3.0.0 released
This is a major release which breaks backward compatibility in a few places.
20.1.2 Backwardly incompatible changes
Hex, oct and bin properties don’t have leading 0x, 0o and 0b
If you ask for the hex, octal or binary representations of a bitstring then they will no longer
be prefixed with 0x,0o or 0b. This was done as it was noticed that the first thing a lot of
user code does after getting these representations was to cut off the first two characters before
further processing.
>>> a=BitArray(’0x123’)
>>> a.hex, a.oct, a.bin
(’123’, ’0443’, ’000100100011’)
Previously this would have returned (’0x123’, ’0o0443’, ’0b000100100011’)
This change might require some recoding, but it should all be simplifications.
ConstBitArray renamed to Bits
Previously Bits was an alias for ConstBitStream (for backward compatibility). This has now
changed so that Bits and BitArray loosely correspond to the built-in types bytes and bytearray.
If you were using streaming/reading methods on a Bits object then you will have to change it
to a ConstBitStream.
The ConstBitArray name is kept as an alias for Bits.
Stepping in slices has conventional meaning
The step parameter in __getitem__, __setitem__ and __delitem__ used to act as a
multiplier for the start and stop parameters. No one seemed to use it though and so it has now
reverted to the conventional meaning for containers.
If you are using step then recoding is simple: s[a:b:c] becomes s[a*c:b*c].
Some examples of the new usage:
>>> s = BitArray('0x0000')
>>> s[::4] = [1, 1, 1, 1]
>>> s.hex
'8888'
>>> del s[8::2]
>>> s.hex
'880'
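Because the step now has its conventional meaning, the example above can be reproduced with an ordinary Python list of 0s and 1s. The sketch below does that, and also shows the s[a*c:b*c] recoding rule as a slice object; to_hex is just a helper for display.

```python
def to_hex(bits):
    """Render a list of 0/1 values (length a multiple of 4) as a hex string."""
    width = len(bits) // 4
    return format(int(''.join(str(b) for b in bits), 2), '0{}x'.format(width))


bits = [0] * 16
bits[::4] = [1, 1, 1, 1]      # every fourth bit, starting at bit 0
after_set = to_hex(bits)
del bits[8::2]                # drop bits 8, 10, 12 and 14
after_del = to_hex(bits)

# The pre-3.0.0 meaning of s[a:b:c] is now spelt s[a*c:b*c].
a, b, c = 0, 3, 8
old_meaning = slice(a * c, b * c)
```

So what used to be written s[0:3:8] is now s[0:24], and after_set and after_del match the two s.hex values shown above.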
20.1.3 New features
New readto method
This method is a mix between a find and a read - it searches for a bitstring and then reads up to
and including it. For example:
>>> s = ConstBitStream('0x47000102034704050647')
>>> s.readto('0x47', bytealigned=True)
ConstBitStream('0x47')
>>> s.readto('0x47', bytealigned=True)
ConstBitStream('0x0001020347')
>>> s.readto('0x47', bytealigned=True)
ConstBitStream('0x04050647')
pack function accepts an iterable as its format
Previously only a string was accepted as the format in the pack function. This was an oversight
as it broke the symmetry between pack and unpack. Now you can use formats like this:
fmt =[’hex:8’,’bin:3’]
a=pack(fmt, ’47’,’001’)
a.unpack(fmt)
20.1.4 June 18th 2011: version 2.2.0 released
This is a minor upgrade with a couple of new features.
New interleaved exponential-Golomb interpretations
New bit interpretations for interleaved exponential-Golomb (as used in the Dirac video codec)
are supplied via uie and sie:
>>> s=BitArray(uie=41)
>>> s.uie
41
>>> s.bin
’0b00010001001’
These are pretty similar to the non-interleaved versions - see the manual for more details. Credit
goes to Paul Sargent for the patch.
New package-level bytealigned variable
A number of methods take a bytealigned parameter to indicate that they should only work
on byte boundaries (e.g. find,replace,split). Previously this parameter defaulted to
False. Instead it now defaults to bitstring.bytealigned, which itself defaults to
False, but can be changed to modify the default behaviour of the methods. For example:
>>> a=BitArray(’0x00 ff 0f ff’)
>>> a.find(’0x0f’)
(4,) # found first not on a byte boundary
>>> a.find(’0x0f’, bytealigned=True)
(16,) # forced looking only on byte boundaries
>>> bitstring.bytealigned =True # Change default behaviour
>>> a.find(’0x0f’)
(16,)
>>> a.find(’0x0f’, bytealigned=False)
(4,)
If you’re only working with bytes then this can help avoid some errors and save some typing!
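The difference between the two results, 4 and 16, is easy to see on a string of '0' and '1' characters. The following sketch of a bit-level search with an optional byte-aligned mode uses a made-up find_bits helper and, like find, returns a tuple that is empty when nothing is found.

```python
def find_bits(bits, pattern, bytealigned=False):
    """Return (position,) of the first match of pattern in bits, or ()."""
    step = 8 if bytealigned else 1
    for pos in range(0, len(bits) - len(pattern) + 1, step):
        if bits.startswith(pattern, pos):
            return (pos,)
    return ()


bits = format(0x00ff0fff, '032b')       # the bits of '0x00 ff 0f ff'
pattern = format(0x0f, '08b')           # '00001111'

anywhere = find_bits(bits, pattern)                      # straddles bytes 0 and 1
aligned = find_bits(bits, pattern, bytealigned=True)     # the real 0x0f byte
```

The unaligned match is made of the last four bits of 0x00 and the first four of 0xff, which is rarely what you want when working with byte-oriented data.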
Other changes
Fix for Python 3.2, correcting for a change to the binascii module.
Fix for bool initialisation from 0 or 1.
Efficiency improvements, including interning strategy.
20.1.5 February 23rd 2011: version 2.1.1 released
This is a release to fix a couple of bugs that were introduced in 2.1.0.
Bug fix: Reading using the ‘bytes’ token had been broken (Issue 102).
Fixed problem using some methods on ConstBitArray objects.
Better exception handling for tokens missing values.
Some performance improvements.
20.1.6 January 23rd 2011: version 2.1.0 released
20.1.7 New class hierarchy introduced with simpler classes
Previously there were just two classes, the immutable Bits which was the base class for the
mutable BitString class. Both of these classes have the concept of a bit position, from
which reads etc. take place so that the bitstring could be treated as if it were a file or stream.
Two simpler classes have now been added which are purely bit containers and don’t have a bit
position. These are called ConstBitArray and BitArray. As you can guess the former
is an immutable version of the latter.
The other classes have also been renamed to better reflect their capabilities. Instead
of BitString you should use BitStream, and instead of Bits you can use
ConstBitStream. The old names are kept as aliases for backward compatibility.
The class hierarchy is:
        ConstBitArray
         /        \
        /          \
   BitArray    ConstBitStream (formerly Bits)
        \          /
         \        /
   BitStream (formerly BitString)
Other changes
A lot of internal reorganisation has taken place since the previous version, most of which won’t
be noticed by the end user. Some things you might see are:
New package structure. Previous versions have been a single file for the module and
another for the unit tests. The module is now split into many more files so it can’t be
used just by copying bitstring.py any more.
To run the unit tests there is now a script called runtests.py in the test directory.
File-based bitstrings are now implemented in terms of an mmap. This should be just an
implementation detail, but unfortunately for 32-bit versions of Python this creates a limit
of 4GB on the files that can be used. The workaround is either to get a 64-bit Python, or
just stick with version 2.0.
The ConstBitArray and ConstBitStream classes no longer copy byte data when
a slice or a read takes place, they just take a reference. This is mostly a very nice
optimisation, but there are occasions where it could have an adverse effect. For example, if a
very large bitstring is created, a small slice taken and the original deleted, the byte data
from the large bitstring would still be retained in memory.
Optimisations. Once again this version should be faster than the last. The module is still
pure Python but some of the reorganisation was to make it more feasible to put some of
the code into Cython or similar, so hopefully more speed will be on the way.
20.1.8 July 26th 2010: version 2.0.3 released
1. Bug fix: Using peek and read for a single bit now returns a new bitstring as was intended,
rather than the old behaviour of returning a bool.
2. Removed HTML docs from source archive - better to use the online version.
20.1.9 July 25th 2010: version 2.0.2 released
This is a major release, with a number of backwardly incompatible changes. The main change
is the removal of many methods, all of which have simple alternatives. Other changes are quite
minor but may need some recoding.
There are a few new features, most of which have been made to help the streamlining of the
API. As always there are performance improvements and some API changes were made purely
with future performance in mind.
20.1.10 The backwardly incompatible changes are:
Methods removed
About half of the class methods have been removed from the API. They all have simple alter-
natives, so what remains is more powerful and easier to remember. The removed methods are
listed here on the left, with their equivalent replacements on the right:
s.advancebit() -> s.pos += 1
s.advancebits(bits) -> s.pos += bits
s.advancebyte() -> s.pos += 8
s.advancebytes(bytes) -> s.pos += 8*bytes
s.allunset([a, b]) -> s.all(False, [a, b])
s.anyunset([a, b]) -> s.any(False, [a, b])
s.delete(bits, pos) -> del s[pos:pos+bits]
s.peekbit() -> s.peek(1)
s.peekbitlist(a, b) -> s.peeklist([a, b])
s.peekbits(bits) -> s.peek(bits)
s.peekbyte() -> s.peek(8)
s.peekbytelist(a, b) -> s.peeklist([8*a, 8*b])
s.peekbytes(bytes) -> s.peek(8*bytes)
s.readbit() -> s.read(1)
s.readbitlist(a, b) -> s.readlist([a, b])
s.readbits(bits) -> s.read(bits)
s.readbyte() -> s.read(8)
s.readbytelist(a, b) -> s.readlist([8*a, 8*b])
s.readbytes(bytes) -> s.read(8*bytes)
s.retreatbit() -> s.pos -= 1
s.retreatbits(bits) -> s.pos -= bits
s.retreatbyte() -> s.pos -= 8
s.retreatbytes(bytes) -> s.pos -= 8*bytes
s.reversebytes(start, end) -> s.byteswap(0, start, end)
s.seek(pos) -> s.pos = pos
s.seekbyte(bytepos) -> s.bytepos = bytepos
s.slice(start, end, step) -> s[start:end:step]
s.tell() -> s.pos
s.tellbyte() -> s.bytepos
s.truncateend(bits) -> del s[-bits:]
s.truncatestart(bits) -> del s[:bits]
s.unset([a, b]) -> s.set(False, [a, b])
Many of these methods have been deprecated for the last few releases, but there are some new
removals too. Any recoding needed should be quite straightforward, so while I apologise for
the hassle, I had to take the opportunity to streamline and rationalise what was becoming a bit
of an overblown API.
set / unset methods combined
The set/unset methods have been combined in a single method, which now takes a boolean
as its first argument:
s.set([a, b]) -> s.set(1, [a, b])
s.unset([a, b]) -> s.set(0, [a, b])
s.allset([a, b]) -> s.all(1, [a, b])
s.allunset([a, b]) -> s.all(0, [a, b])
s.anyset([a, b]) -> s.any(1, [a, b])
s.anyunset([a, b]) -> s.any(0, [a, b])
all / any only accept iterables
The all and any methods (previously called allset, allunset, anyset and
anyunset) no longer accept a single bit position. The recommended way of testing a
single bit is just to index it, for example instead of:
>>> if s.all(True, i):
just use
>>> if s[i]:
If you really want to you can of course use an iterable with a single element, such as
s.any(False, [i]), but it’s clearer just to write not s[i].
Exception raised on reading off end of bitstring
If a read or peek goes beyond the end of the bitstring then a ReadError will be raised. The
previous behaviour was that the rest of the bitstring would be returned and no exception raised.
BitStringError renamed to Error
The base class for errors in the bitstring module is now just Error, so it will
likely appear in your code as bitstring.Error instead of the rather repetitive
bitstring.BitStringError.
Single bit slices return a bool
A single index slice (such as s[5]) will now return a bool (i.e. True or False) rather than a
single bit bitstring. This is partly to reflect the style of the bytearray type, which returns an
integer for single items, but mostly to avoid common errors like:
>>> if s[0]:
... do_something()
While the intent of this code snippet is quite clear (i.e. do_something if the first bit of s is set)
under the old rules s[0] would be true as long as s wasn’t empty. That’s because any one-bit
bitstring was true as it was a non-empty container. Under the new rule s[0] is True if s starts
with a 1 bit and False if s starts with a 0 bit.
The change does not affect reads and peeks, so s.peek(1) will still return a single bit
bitstring, which leads on to the next item...
Empty bitstrings or bitstrings with only zero bits are considered False
Previously a bitstring was False if it had no elements, otherwise it was True. This is standard
behaviour for containers, but wasn’t very useful for a container of just 0s and 1s. The new
behaviour means that the bitstring is False if it has no 1 bits. This means that code like this:
>>> if s.peek(1):
... do_something()
should work as you’d expect. It also means that Bits(1000), Bits(0x00) and
Bits(’uint:12=0’) are all also False. If you need to check for the emptiness of a
bitstring then instead check the len property:
if s -> if s.len
if not s -> if not s.len
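The new truth rule is easy to model. The class below is a hypothetical stand-in (BitsSketch is not the module's Bits class) that stores its bits in a list and defines truthiness as "has at least one 1 bit", which is why emptiness now has to be tested through the length instead.

```python
class BitsSketch:
    """Hypothetical stand-in for a bitstring, holding a list of 0/1 ints."""

    def __init__(self, bits):
        self.bits = list(bits)

    def __len__(self):
        return len(self.bits)

    def __bool__(self):
        # New rule: true only if at least one bit is set.
        return any(self.bits)

empty = BitsSketch([])
zeros = BitsSketch([0] * 12)
mixed = BitsSketch([0, 0, 1])

print(bool(empty), bool(zeros), bool(mixed))  # False False True
print(len(zeros) > 0)                         # True: not empty, just all zero
```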
Length and offset disallowed for some initialisers
Previously you could create a bitstring using expressions like:
>>> s=Bits(hex=’0xabcde’, offset=4, length=13)
This has now been disallowed, and the offset and length parameters may only be used when
initialising with bytes or a file. To replace the old behaviour you could instead use
>>> s=Bits(hex=’0xabcde’)[4:17]
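The slice bounds follow directly from the old parameters: the start is the offset and the end is the offset plus the length, so 4 and 13 become [4:17]. A small sketch of that arithmetic on a plain bit string (not using the module itself):

```python
bits = format(0xABCDE, "020b")   # five hex digits written out as 20 bits
offset, length = 4, 13

window = bits[offset:offset + length]   # same bits as the slice [4:17]
print(window)        # 1011110011011
print(len(window))   # 13
```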
Renamed format parameter fmt
Methods with a format parameter have had it renamed to fmt, to prevent hiding the built-in
format. Affects methods unpack, read, peek, readlist, peeklist and byteswap
and the pack function.
Iterables instead of *format accepted for some methods
This means that for the affected methods (unpack, readlist and peeklist) you will
need to use an iterable to specify multiple items. This is easier to show than to describe, so
instead of
>>> a, b, c = s.readlist('uint:12', 'hex:4', 'bin:7')
you would instead write
>>> a, b, c = s.readlist(['uint:12', 'hex:4', 'bin:7'])
Note that you could still use the single string ’uint:12, hex:4, bin:7’ if you
preferred.
Bool auto-initialisation removed
You can no longer use True and False to initialise single bit bitstrings. The reasoning
behind this is that as bool is a subclass of int, it really is bad practice to have Bits(False)
be different to Bits(0) and to have Bits(True) different to Bits(1).
If you have used bool auto-initialisation then you will have to be careful to replace it as the
bools will now be interpreted as ints, so Bits(False) will be empty (a bitstring of length
0), and Bits(True) will be a single zero bit (a bitstring of length 1). Sorry for the confusion,
but I think this will prevent bigger problems in the future.
There are a few alternatives for creating a single bit bitstring. My favourite is to use a list with
a single item:
Bits(False) -> Bits([0])
Bits(True) -> Bits([1])
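Python's built-in bytes type shows the same effect, because it also treats an integer argument as a length. This is only an analogy using the standard library, not bitstring code, but it shows why a bool passed where an int is expected ends up meaning "zero items" or "one zero item".

```python
# bool is a subclass of int, so True and False behave as 1 and 0.
print(isinstance(True, int))   # True
print(True == 1, False == 0)   # True True

# bytes(n) creates n zero bytes, so the bools are read as lengths.
print(bytes(False))   # b''
print(bytes(True))    # b'\x00'
```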
New creation from file strategy
Previously if you created a bitstring from a file, either by auto-initialising with a file object or
using the filename parameter, the file would not be read into memory unless you tried to modify
it, at which point the whole file would be read.
The new behaviour depends on whether you create a Bits or a BitString from the file. If you
create a Bits (which is immutable) then the file will never be read into memory. This allows
very large files to be opened for examination even if they could never fit in memory.
If however you create a BitString, the whole of the referenced file will be read to store in
memory. If the file is very big this could take a long time, or fail, but the idea is that in saying
you want the mutable BitString you are implicitly saying that you want to make changes and
so (for now) we need to load it into memory.
The new strategy is a bit more predictable in terms of performance than the old. The main point
to remember is that if you want to open a file and don’t plan to alter the bitstring then use the
Bits class rather than BitString.
Just to be clear, in neither case will the contents of the file ever be changed - if you want to
output the modified BitString then use the tofile method, for example.
find and rfind return a tuple instead of a bool
If a find is unsuccessful then an empty tuple is returned (which is False in a boolean sense)
otherwise a single item tuple with the bit position is returned (which is True in a boolean
sense). You shouldn’t need to recode unless you explicitly compared the result of a find to True
or False, for example this snippet doesn’t need to be altered:
>>> if s.find(’0x23’):
... print(s.bitpos)
but you could now instead use
>>> found = s.find('0x23')
>>> if found:
... print(found[0])
The reason for returning the bit position in a tuple is so that finding at position zero can still be
True - it’s the tuple (0,) - whereas not found can be False - the empty tuple ().
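The tuple convention can be tried out with an ordinary string search. The wrapper below is a hypothetical sketch (find_sketch is not a bitstring function) that turns the usual -1 sentinel of str.find into the empty-tuple or one-item-tuple result described above.

```python
def find_sketch(text, sub):
    """Return (pos,) if sub occurs in text, otherwise an empty tuple."""
    pos = text.find(sub)
    return (pos,) if pos != -1 else ()

at_start = find_sketch("110010", "11")   # match at position zero
missing = find_sketch("110010", "111")   # no match

print(at_start, bool(at_start))   # (0,) True
print(missing, bool(missing))     # () False
```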
20.1.11 The new features in this release are:
New count method
This method just counts the number of 1 or 0 bits in the bitstring.
>>> s=Bits(’0x31fff4’)
>>> s.count(1)
16
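As a cross-check of the figure above, the same count can be reproduced with plain integers. This uses only built-in string formatting, not the count method itself.

```python
value = 0x31FFF4
bits = format(value, "024b")   # six hex digits are 24 bits

print(bits.count("1"))   # 16
print(bits.count("0"))   # 8
```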
read and peek methods accept integers
The read, readlist, peek and peeklist methods now accept integers as parameters to
mean “read this many bits and return a bitstring”. This has allowed a number of methods to be
removed from this release, so for example instead of:
>>> a,b,c=s.readbits(5,6,7)
>>> if s.peekbit():
... do_something()
you should write:
>>> a,b,c=s.readlist([5,6,7])
>>> if s.peek(1):
... do_something()
byteswap used to reverse all bytes
The byteswap method now allows a format specifier of 0 (the default) to signify that all
of the whole bytes should be reversed. This means that calling just byteswap() is almost
equivalent to the now removed reversebytes() method (a small difference is that byteswap
won’t raise an exception if the bitstring isn’t a whole number of bytes long).
Auto initialise with bytearray or (for Python 3 only) bytes
So rather than writing:
>>> a=Bits(bytes=some_bytearray)
you can just write
>>> a=Bits(some_bytearray)
This also works for the bytes type, but only if you’re using Python 3. For Python 2 it’s not
possible to distinguish between a bytes object and a str. For this reason this method should be
used with some caution as it will make your code behave differently with the different major
Python versions.
>>> b = Bits(b'abcd\x23\x00')  # Only Python 3!
set,invert,all and any default to whole bitstring
This means that you can for example write:
>>> a = BitString(100)  # 100 zero bits
>>> a.set(1)            # set all bits to 1
>>> a.all(1)            # are all bits set to 1?
True
>>> a.any(0)            # are any set to 0?
False
>>> a.invert()          # invert every bit
New exception types
As well as renaming BitStringError to just Error there are also new exceptions which
use Error as a base class.
These can be caught in preference to Error if you need finer control. The new exceptions
sometimes also derive from built-in exceptions:
1. ByteAlignError(Error) - whole byte position or length needed.
2. ReadError(Error, IndexError) - reading or peeking off the end of the bitstring.
3. CreationError(Error, ValueError) - inappropriate argument during bitstring creation.
4. InterpretError(Error, ValueError) - inappropriate interpretation of binary data.
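What deriving from a built-in exception buys you can be shown with a stand-alone sketch. The classes below only mirror the bases listed above and are defined locally for illustration; they are not imported from the module, and read_past_end is a hypothetical helper.

```python
class Error(Exception):
    """Local stand-in for the module's base Error class."""

class ByteAlignError(Error):
    pass

class ReadError(Error, IndexError):
    pass

class CreationError(Error, ValueError):
    pass

class InterpretError(Error, ValueError):
    pass

def read_past_end():
    raise ReadError("reading off the end of the data")

# Code that already catches IndexError keeps working unchanged.
try:
    read_past_end()
except IndexError as exc:
    caught = type(exc).__name__

print(caught)   # ReadError
```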
20.1.12 March 18th 2010: version 1.3.0 for Python 2.6 and 3.x released
20.1.13 New features
byteswap method for changing endianness
Changes the endianness in-place according to a format string or integer(s) giving the byte
pattern. See the manual for details.
>>> s=BitString(’0x00112233445566’)
>>> s.byteswap(2)
3
>>> s
BitString(’0x11003322554466’)
>>> s.byteswap(’h’)
3
>>> s
BitString(’0x00112233445566’)
>>> s.byteswap([2,5])
1
>>> s
BitString(’0x11006655443322’)
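The integer forms of the pattern can be mimicked on a bytearray. This is a rough sketch under stated assumptions: byteswap_sketch is a hypothetical helper, it ignores format strings such as 'h' and the special value 0, and it simply reverses each group of bytes for as many complete repeats of the pattern as fit, returning the number of repeats.

```python
def byteswap_sketch(data, pattern):
    """Reverse bytes in place, group by group; return the number of repeats."""
    if isinstance(pattern, int):
        pattern = [pattern]
    group = sum(pattern)
    repeats = len(data) // group   # leftover bytes at the end are untouched
    pos = 0
    for _ in range(repeats):
        for size in pattern:
            data[pos:pos + size] = data[pos:pos + size][::-1]
            pos += size
    return repeats

data = bytearray.fromhex("00112233445566")
print(byteswap_sketch(data, 2), data.hex())        # 3 11003322554466
print(byteswap_sketch(data, 2), data.hex())        # 3 00112233445566
print(byteswap_sketch(data, [2, 5]), data.hex())   # 1 11006655443322
```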
Multiplicative factors in bitstring creation and reading
For example:
>>> s=Bits(’100*0x123’)
Token grouping using parentheses
For example:
>>> s=Bits(’3*(uint:6=3, 0b1)’)
Negative slice indices allowed
The start and end parameters of many methods may now be negative, with the same meaning
as for negative slice indices. Affects all methods with these parameters.
Sequence ABCs used
The Bits class now derives from collections.Sequence, while the BitString class
derives from collections.MutableSequence.
Keywords allowed in readlist, peeklist and unpack
Keywords for token lengths are now permitted when reading. So for example, you can write
>>> s=bitstring.pack(’4*(uint:n)’,2,3,4,5, n=7)
>>> s.unpack(’4*(uint:n)’, n=7)
[2, 3, 4, 5]
start and end parameters added to rol and ror
join function accepts other iterables
Also its parameter has changed from ‘bitstringlist’ to ‘sequence’. This is technically a backward
incompatibility in the unlikely event that you are referring to the parameter by name.
__init__ method accepts keywords
Rather than a long list of initialisers the __init__ methods now use a **kwargs dictionary
for all initialisers except ‘auto’. This should have no effect, except that this is a small backward
incompatibility if you use positional arguments when initialising with anything other than auto
(which would be rather unusual).
More optimisations
A number of methods have been speeded up.
Bug fixed in replace method
(it could fail if start != 0).
20.1.14 January 19th 2010: version 1.2.0 for Python 2.6 and 3.x released
20.1.15 New ‘Bits’ class
Introducing a brand new class, Bits, representing an immutable sequence of bits.
The Bits class is the base class for the mutable BitString. The differences between Bits and
BitStrings are:
Bits are immutable, so once they have been created their value cannot change. This of
course means that mutating methods (append, replace, del etc.) are not available for Bits.
Bits are hashable, so they can be used in sets and as keys in dictionaries.
Bits are potentially more efficient than BitStrings, both in terms of computation and
memory. The current implementation is only marginally more efficient though - this
should improve in future versions.
You can switch from Bits to a BitString or vice versa by constructing a new object from the
old.
>>> s=Bits(’0xabcd’)
>>> t=BitString(s)
>>> t.append(’0xe’)
>>> u=Bits(t)
The relationship between Bits and BitString is supposed to loosely mirror that between bytes
and bytearray in Python 3.
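The bytes and bytearray pair that this relationship is modelled on can be used to see the same trade-off. The snippet below uses only the built-in types, so it illustrates the analogy rather than the Bits and BitString classes themselves.

```python
frozen = bytes.fromhex("abcd")   # immutable, so it can be hashed
seen = {frozen}                  # usable in sets and as a dictionary key

mutable = bytearray(frozen)      # switch by constructing from the old object
mutable.append(0xE0)             # mutation is allowed here

try:
    hash(mutable)
    hashable = True
except TypeError:
    hashable = False             # mutable containers refuse to be hashed

print(frozen in seen, hashable)  # True False
```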
Deprecation messages turned on
A number of methods have been flagged for removal in version 2. Deprecation warnings will
now be given, which include an alternative way to do the same thing. All of the deprecated
methods have simpler equivalent alternatives.
>>> t=s.slice(0,2)
__main__:1: DeprecationWarning: Call to deprecated function slice.
Instead of ’s.slice(a, b, c)’ use ’s[a:b:c]’.
The deprecated methods are: advancebit, advancebits, advancebyte,
advancebytes, retreatbit, retreatbits, retreatbyte, retreatbytes,
tell, seek, slice, delete, tellbyte, seekbyte, truncatestart and
truncateend.
Initialise from bool
Booleans have been added to the list of types that can ‘auto’ initialise a bitstring.
>>> zerobit = BitString(False)
>>> onebit = BitString(True)
Improved efficiency
More methods have been speeded up, in particular some deletions and insertions.
Bug fixes
A rare problem with truncating the start of bitstrings was fixed.
A possible problem outputting the final byte in tofile() was fixed.
20.1.16 December 22nd 2009: version 1.1.3 for Python 2.6 and 3.x released
This version hopefully fixes an installation problem for platforms with case-sensitive file
systems. There are no new features or other bug fixes.
20.1.17 December 18th 2009: version 1.1.2 for Python 2.6 and 3.x released
This is a minor update with (almost) no new features.
Improved efficiency
The speed of many typical operations has been increased, some substantially.
Initialise from integer
A BitString of ‘0’ bits can be created using just an integer to give the length in bits. So instead
of
>>> s=BitString(length=100)
you can write just
>>> s=BitString(100)
This matches the behaviour of bytearrays and (in Python 3) bytes.
A defect related to using the set / unset functions on BitStrings initialised from a file has
been fixed.
20.1.18 November 24th 2009: version 1.1.0 for Python 2.6 and 3.x released
Note that this version will not work for Python 2.4 or 2.5. There may be an update for these
Python versions some time next year, but it’s not a priority quite yet. Also note that only one
version is now provided, which works for Python 2.6 and 3.x (done with the minimum of
hackery!)
20.1.19 New features
Improved efficiency
A fair number of functions have improved efficiency, some quite dramatically.
New bit setting and checking functions
Although these functions don’t do anything that couldn’t be done before, they do make some
common use cases much more efficient. If you need to set or check single bits then these are
the functions you need.
set / unset : Set bit(s) to 1 or 0 respectively.
allset / allunset : Check if all bits are 1 or all 0.
anyset / anyunset : Check if any bits are 1 or any 0.
>>> s=BitString(length=1000)
>>> s.set((10,100,44,12,1))
>>> s.allunset((2,22,222))
True
>>> s.anyset(range(7,77))
True
New rotate functions
ror / rol : Rotate bits to the right or left respectively.
>>> s=BitString(’0b100000000’)
>>> s.ror(2)
>>> s.bin
’0b001000000’
>>> s.rol(5)
>>> s.bin
’0b000000100’
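Rotation is just slicing and rejoining, as this sketch on plain bit strings shows. Note the assumptions: these hypothetical ror and rol helpers return new strings, whereas the methods above change the BitString in place, and they expect a non-empty string.

```python
def ror(bits, n):
    """Rotate a non-empty bit string right by n places."""
    n %= len(bits)
    return bits[-n:] + bits[:-n] if n else bits

def rol(bits, n):
    """Rotate a non-empty bit string left by n places."""
    n %= len(bits)
    return bits[n:] + bits[:n]

s = "100000000"
s = ror(s, 2)
print(s)   # 001000000
s = rol(s, 5)
print(s)   # 000000100
```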
Floating point interpretations
New float initialisations and interpretations are available. These only work for BitStrings of
length 32 or 64 bits.
>>> s=BitString(float=0.2, length=64)
>>> s.float
0.20000000000000001
>>> t=bitstring.pack(’<3f’,-0.4,1e34,17.0)
>>> t.hex
’0xcdccccbedf84f67700008841’
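The hex string above can be cross-checked with the standard library's struct module, which accepts the same '<3f' format code. This is an independent check using struct rather than bitstring; the big-endian layout used for the 64-bit round trip is an assumption made for the sketch.

```python
import struct

# Three little-endian 32-bit floats, as in the pack example above.
packed = struct.pack("<3f", -0.4, 1e34, 17.0)
print(packed.hex())   # cdccccbedf84f67700008841

# A 64-bit float survives a round trip through its eight bytes.
double_bytes = struct.pack(">d", 0.2)
print(len(double_bytes) * 8)                      # 64
print(struct.unpack(">d", double_bytes)[0] == 0.2)   # True
```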
‘bytes’ token reintroduced
This token returns a bytes object (equivalent to a str in Python 2.6).
>>> s=BitString(’0x010203’)
>>> s.unpack(’bytes:2, bytes:1’)
[’\x01\x02’, ’\x03’]
‘uint’ is now the default token type
So for example these are equivalent:
a,b=s.readlist(’uint:12, uint:12’)
a,b=s.readlist(’12, 12’)
20.1.20 October 10th 2009: version 1.0.1 for Python 3.x released
This is a straight port of version 1.0.0 to Python 3.
For changes since the last Python 3 release read all the way down in this document to version
0.4.3.
This version will also work for Python 2.6, but there’s no advantage to using it over the 1.0.0
release. It won’t work for anything before 2.6.
20.1.21 October 9th 2009: version 1.0.0 for Python 2.x released
Version 1 is here!
This is the first release not to carry the ‘beta’ tag. It contains a couple of minor new features
but is principally a release to fix the API. If you’ve been using an older version then you almost
certainly will have to recode a bit. If you’re not ready to do that then you may wish to delay
updating.
So the bad news is that there are lots of small changes to the API. The good news is that all the
changes are pretty trivial, the new API is cleaner and more ‘Pythonic’, and that by making it
version 1.0 I’m promising not to tweak it again for some time.
20.1.22 API Changes
New read / peek functions for returning multiple items
The functions read, readbits, readbytes, peek, peekbits and peekbytes now only ever return a
single item, never a list.
The new functions readlist, readbitlist, readbytelist, peeklist, peekbitlist and peekbytelist can
be used to read multiple items and will always return a list.
So a line like:
>>> a,b=s.read(’uint:12, hex:32’)
becomes
>>> a,b=s.readlist(’uint:12, hex:32’)
Renaming / removing functions
Functions have been renamed as follows:
seekbit -> seek
tellbit -> tell
reversebits -> reverse
deletebits -> delete
tostring -> tobytes
and a couple have been removed altogether:
deletebytes - use delete instead.
empty - use not s rather than s.empty().
Renaming parameters
The parameters ‘startbit’ and ‘endbit’ have been renamed ‘start’ and ‘end’. This affects the
methods slice, find, findall, rfind, reverse, cut and split.
The parameter ‘bitpos’ has been renamed to ‘pos’. This affects the methods seek, tell,
insert, overwrite and delete.
Mutating methods return None rather than self
This means that you can’t chain functions together, so
>>> s.append('0x00').prepend('0xff')
>>> t = s.reverse()
needs to be rewritten as
>>> s.append(’0x00’)
>>> s.prepend(’0xff’)
>>> s.reverse()
>>> t=s
Affects truncatestart, truncateend, insert, overwrite, delete, append,
prepend, reverse and reversebytes.
Properties renamed
The ‘data’ property has been renamed to ‘bytes’. Also if the BitString is not a whole number
of bytes then a ValueError exception will be raised when using ‘bytes’ as a ‘getter’.
Properties ‘len’ and ‘pos’ have been added to replace ‘length’ and ‘bitpos’, although the longer
names have not been removed so you can continue to use them if you prefer.
Other changes
The unpack method now always returns a list, never a single item.
BitStrings are now ‘unhashable’, so calling hash on one or making a set will fail.
The colon separating the token name from its length is now mandatory. So for example
BitString(’uint12=100’) becomes BitString(’uint:12=100’).
Removed support for the ‘bytes’ token in format strings. Instead of
s.read(’bytes:4’) use s.read(’bits:32’).
20.1.23 New features
Added endswith and startswith functions
These do much as you’d expect; they return True or False depending on whether the BitString
starts or ends with the parameter.
>>> BitString(’0xef342’).startswith(’0b11101’)
True
20.1.24 September 11th 2009: version 0.5.2 for Python 2.x released
Finally some tools for dealing with endianness!
New interpretations are now available for whole-byte BitStrings that treat them as big, little, or
native-endian:
>>> big = BitString(intbe=1, length=16)  # or BitString('intbe:16=1') if you prefer.
>>> little = BitString(intle=1, length=16)
>>> print big.hex, little.hex
0x0001 0x0100
>>> print big.intbe, little.intle
1 1
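The same byte orders can be seen with Python 3's built-in integer conversions. This is a standard-library illustration of the two layouts printed above, not code that uses the intbe and intle interpretations.

```python
big = (1).to_bytes(2, "big")        # most significant byte first
little = (1).to_bytes(2, "little")  # least significant byte first

print(big.hex(), little.hex())   # 0001 0100

# Reading each back with its own byte order recovers the value.
print(int.from_bytes(big, "big"), int.from_bytes(little, "little"))   # 1 1

# Reading with the wrong order gives a different number.
print(int.from_bytes(little, "big"))   # 256
```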
‘Struct’-like compact format codes
To save some typing when using pack, unpack, read and peek, compact format codes
based on those used in the struct and array modules have been added. These must start with
a character indicating the endianness (>, < or @ for big, little and native-endian), followed by
characters giving the format:
b 1-byte signed int
B 1-byte unsigned int
h 2-byte signed int
H 2-byte unsigned int
l 4-byte signed int
L 4-byte unsigned int
q 8-byte signed int
Q 8-byte unsigned int
For example:
>>> s=bitstring.pack(’<4h’,0,1,2,3)
creates a BitString with four little-endian 2-byte integers. While
>>> x,y,z=s.read(’>hhl’)
reads them back as two big-endian two-byte integers and one four-byte big endian integer.
Of course you can combine this new format with the old ones however you like:
>>> s.unpack(’<h, intle:24, uint:5, bin’)
[0, 131073, 0, ’0b0000000001100000000’]
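Because the compact codes are borrowed from the struct module, the examples above can be followed with struct itself. The sketch below packs the same four little-endian shorts, reads them back as big-endian values, and confirms the 131073 seen in the unpack output by decoding bytes 2 to 4 as a 24-bit little-endian integer. It uses only the standard library, so it is a cross-check rather than bitstring code.

```python
import struct

data = struct.pack("<4h", 0, 1, 2, 3)   # bytes 00 00 01 00 02 00 03 00
print(data.hex())                       # 0000010002000300

# Same eight bytes read as two big-endian shorts and one big-endian long.
x, y, z = struct.unpack(">hhl", data)
print(x, y, z)                          # 0 256 33555200

# The 'intle:24' token after '<h' covers bytes 2, 3 and 4.
print(int.from_bytes(data[2:5], "little"))   # 131073
```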
20.1.25 August 26th 2009: version 0.5.1 for Python 2.x released
This update introduces pack and unpack functions for creating and disassembling BitStrings.
New pack() and unpack() functions
The pack function provides a flexible new method for creating BitStrings. Tokens for BitString
‘literals’ can be used in the same way as in the constructor.
>>> from bitstring import BitString, pack
>>> a=pack(’0b11, 0xff, 0o77, int:5=-1, se=33’)
You can also leave placeholders in the format, which will be filled in by the values provided.
>>> b=pack(’uint:10, hex:4’,33,’f’)
Finally you can use a dictionary or keywords.
>>> c=pack(’bin=a, hex=b, bin=a’, a=’010’, b=’ef’)
The unpack method is similar to the read method except that it always unpacks from the
start of the BitString.
>>> x,y=b.unpack(’uint:10, hex’)
If a token is given without a length (as above) then it will expand to fill the remaining bits in
the BitString. This also now works with read and peek.
New tostring() and tofile() methods
The tostring method just returns the data as a string, with up to seven zero bits appended
to byte align. The tofile method does the same except writes to a file object.
>>> f=open(’myfile’,’wb’)
>>> BitString(’0x1234ff’).tofile(f)
Other changes
The use of = is now mandatory in ‘auto’ initialisers. Tokens like uint12 100 will no longer
work. Also the use of a : before the length is encouraged, but not yet mandated. So the
previous example should be written as uint:12=100.
The ‘auto’ initialiser will now take a file object.
>>> f=open(’myfile’,’rb’)
>>> s=BitString(f)
20.1.26 July 19th 2009: version 0.5.0 for Python 2.x released
This update breaks backward compatibility in a couple of areas. The only one you probably
need to be concerned about is the change to the default for bytealigned in find, replace,
split, etc.
See the user manual for more details on each of these items.
Expanded abilities of ‘auto’ initialiser
More types can be initialised through the ‘auto’ initialiser. For example instead of
>>> a=BitString(uint=44, length=16)
you can write
>>> a=BitString(’uint16=44’)
Also, different comma-separated tokens will be joined together, e.g.
>>> b=BitString(’0xff’)+’int8=-5’
can be written
>>> b=BitString(’0xff, int8=-5’)
New formatted read and peek methods
These take a format string similar to that used in the auto initialiser. If only one token is
provided then a single value is returned, otherwise a list of values is returned.
>>> start_code, width, height = s.read('hex32, uint12, uint12')
is equivalent to
>>> start_code = s.readbits(32).hex
>>> width = s.readbits(12).uint
>>> height = s.readbits(12).uint
The tokens are:
int n : n bits as a signed integer.
uint n : n bits as an unsigned integer.
hex n : n bits as a hexadecimal string.
oct n : n bits as an octal string.
bin n : n bits as a binary string.
ue : next bits as an unsigned exp-Golomb.
se : next bits as a signed exp-Golomb.
bits n : n bits as a new BitString.
bytes n : n bytes as a new BitString.
See the user manual for more details.
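To make the fixed-length tokens concrete, here is a rough sketch of a reader that keeps a bit position and interprets each chunk. ReaderSketch is a hypothetical class working on a string of '0' and '1' characters; it covers only the bin, uint, int and hex tokens, assumes two's complement for int, and is not how the module is implemented.

```python
class ReaderSketch:
    """Hypothetical positional reader over a string of '0'/'1' characters."""

    def __init__(self, bits):
        self.bits = bits
        self.pos = 0

    def read(self, kind, n):
        if n <= 0:
            raise ValueError("length must be positive")
        chunk = self.bits[self.pos:self.pos + n]
        if len(chunk) < n:
            raise IndexError("read past the end")
        self.pos += n
        if kind == "bin":
            return chunk
        value = int(chunk, 2)
        if kind == "uint":
            return value
        if kind == "int":
            # Two's complement: a leading 1 bit means a negative number.
            return value - (1 << n) if chunk[0] == "1" else value
        if kind == "hex":
            if n % 4:
                raise ValueError("hex needs a multiple of 4 bits")
            return format(value, "0{}x".format(n // 4))
        raise ValueError("unknown token: " + kind)

stream = format(0x000001B3, "032b") + format(352, "012b") + format(288, "012b")
r = ReaderSketch(stream)
print(r.read("hex", 32), r.read("uint", 12), r.read("uint", 12))
# 000001b3 352 288
print(ReaderSketch("11111").read("int", 5), ReaderSketch("11111").read("uint", 5))
# -1 31
```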
hex and oct methods removed
The special methods for hex and oct have been removed. Please use the hex and oct
properties instead.
>>> hex(s)
becomes
>>> s.hex
join made a method
The join function must now be called on a BitString object, which will be used to join the list
together. You may need to recode slightly:
>>> s=bitstring.join(’0x34’,’0b1001’,’0b1’)
becomes
>>> s=BitString().join(’0x34’,’0b1001’,’0b1’)
More than one value allowed in readbits, readbytes, peekbits and peekbytes
If you specify more than one bit or byte length then a list of BitStrings will be returned.
>>> a,b,c=s.readbits(10,5,5)
is equivalent to
>>> a = s.readbits(10)
>>> b = s.readbits(5)
>>> c = s.readbits(5)
bytealigned defaults to False, and is at the end of the parameter list
Functions that have a bytealigned parameter have changed so that it now defaults to False rather
than True. Also its position in the parameter list has changed to be at the end. You may need to
recode slightly (sorry!).
readue and readse methods have been removed
Instead you should use the new read function with a ‘ue’ or ‘se’ token:
>>> i=s.readue()
becomes
>>> i=s.read(’ue’)
This is more flexible as you can read multiple items in one go, plus you can now also use the
peek method with ue and se.
Minor bugs fixed
See the issue tracker for more details.
20.1.27 June 15th 2009: version 0.4.3 for Python 2.x released
This is a minor update. This release is the first to bundle the bitstring manual. This is a PDF
and you can find it in the docs directory.
New ‘cut’ method
This method returns a generator for constant sized chunks of a BitString.
>>> for byte in s.cut(8):
... do_something_with(byte)
You can also specify a startbit and endbit, as well as a count, which limits the number of items
generated:
>>> first100TSPackets =list(s.cut(188*8, count=100))
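The idea behind cut can be sketched as a small generator. This version is a hypothetical illustration on plain strings: it assumes only complete chunks are produced, and it leaves out the startbit and endbit parameters.

```python
def cut(bits, size, count=None):
    """Yield consecutive chunks of exactly size bits, at most count of them."""
    produced = 0
    for start in range(0, len(bits) - size + 1, size):
        if count is not None and produced >= count:
            return
        yield bits[start:start + size]
        produced += 1

bits = "00000000" + "11111111" + "01"   # two whole bytes plus two spare bits
print(list(cut(bits, 8)))           # ['00000000', '11111111']
print(list(cut(bits, 8, count=1)))  # ['00000000']
```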
‘slice’ method now equivalent to __getitem__
This means that a step can also be given to the slice method so that the following are now the
same thing, and it’s just a personal preference which to use:
>>> s1 = s[a:b:c]
>>> s2 = s.slice(a, b, c)
findall gets a ‘count’ parameter
So now
>>> list(a.findall(s, count=n))
is equivalent to
>>> list(a.findall(s))[:n]
except that it won’t need to generate the whole list and so is much more efficient.
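The efficiency point comes from laziness: a generator that is stopped early never looks at the rest of the data. The sketch below shows this with a hypothetical findall over plain strings and itertools.islice standing in for the count parameter.

```python
from itertools import islice

def findall(text, sub):
    """Lazily yield every position at which sub occurs in text."""
    pos = text.find(sub)
    while pos != -1:
        yield pos
        pos = text.find(sub, pos + 1)

text = "0110" * 1000
n = 3

lazy = list(islice(findall(text, "11"), n))   # stops after three matches
eager = list(findall(text, "11"))[:n]         # finds all 1000 first

print(lazy)           # [1, 5, 9]
print(lazy == eager)  # True
```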
Changes to ‘split’
The split method now has a ‘count’ parameter rather than ‘maxsplit’. This makes the interface
closer to that for cut, replace and findall. The final item generated is now no longer the whole
of the rest of the BitString.
A couple of minor bugs were fixed. See the issue tracker for details.
20.1.28 May 25th 2009: version 0.4.2 for Python 2.x released
This is a minor update that almost preserves compatibility with version 0.4.0; the one
slight exception is that findall() now returns a generator, as detailed below.
Stepping in slices
The use of the step parameter (also known as the stride) in slices has been added. Its use is
a little non-standard as it effectively gives a multiplicative factor to apply to the start and stop
parameters, rather than skipping over bits.
For example this makes it much more convenient if you want to give slices in terms of bytes
instead of bits. Instead of writing s[a*8:b*8] you can use s[a:b:8].
When using a step the BitString is effectively truncated to a multiple of the step, so s[::8]
is equal to s if s is an integer number of bytes, otherwise it is truncated by up to 7 bits. So the
final seven complete 16-bit words could be written as s[-7::16].
Negative slices are also allowed, and should do what you’d expect. So for example s[::-1]
returns a bit-reversed copy of s (which is similar to s.reversebits(), which does the
same operation on s in-place). As another example, to get the first 10 bytes in reverse byte
order you could use s_bytereversed = s[0:10:-8].
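The multiplicative step can be modelled in a few lines of plain Python. This is a sketch of the behaviour described above for positive steps only, acting on a string of '0' and '1' characters; step_slice is a hypothetical name and negative steps are deliberately not handled.

```python
def step_slice(bits, start=None, stop=None, step=1):
    """Model s[start:stop:step] where step scales start and stop.

    Positive steps only; an illustration, not the library's code.
    """
    if step < 1:
        raise ValueError("this sketch only handles positive steps")
    units = len(bits) // step           # whole step-sized units available
    truncated = bits[:units * step]     # any trailing partial unit is dropped
    # Let Python resolve None and negative indices, counted in units.
    lo, hi, _ = slice(start, stop).indices(units)
    return truncated[lo * step:max(hi, lo) * step]


bits = '00000000' + '11111111' + '01010101' + '111'   # 3 bytes plus 3 stray bits
second_byte = step_slice(bits, 1, 2, 8)     # same bits as bits[8:16]
whole_bytes = step_slice(bits, step=8)      # the 3 stray bits are truncated
```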
Removed restrictions on offset
You can now specify an offset of greater than 7 bits when creating a BitString, and the use of
offset is also now permitted when using the filename initialiser. This is useful when you want
to create a BitString from the middle of a file without having to read the file into memory.
>>> f = BitString(filename='reallybigfile', offset=8000000, length=32)
Integers can be assigned to slices
You can now assign an integer to a slice of a BitString. If the integer doesn’t fit in the size of
slice given then a ValueError exception is raised. So this is now allowed and works as expected:
>>> s[8:16] = 106
and is equivalent to
>>> s[8:16] = BitString(uint=106, length=8)
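The size check can be sketched in plain Python on a string of '0' and '1' characters. The helper assign_uint is hypothetical, and it assumes the integer is interpreted as unsigned, as the uint= form above suggests.

```python
def assign_uint(bits, start, stop, value):
    """Return a copy of bits with bits[start:stop] set to an unsigned integer.

    Hypothetical helper; raises ValueError if value does not fit in the slice.
    """
    if not 0 <= start < stop <= len(bits):
        raise ValueError("slice must be non-empty and within range")
    width = stop - start
    if value < 0 or value >= (1 << width):
        raise ValueError("%d does not fit in %d bits" % (value, width))
    return bits[:start] + format(value, '0{}b'.format(width)) + bits[stop:]


s = assign_uint('0' * 24, 8, 16, 106)   # middle byte becomes 01101010
```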
Fewer exceptions raised
Some changes have been made to slicing so that fewer exceptions are raised, bringing the
interface closer to that for lists. So for example trying to delete past the end of the BitString will
now just delete to the end, rather than raising a ValueError.
Initialisation from lists and tuples
A new option for the auto initialiser is to pass it a list or tuple. The items in the list or tuple are
evaluated as booleans and the bits in the BitString are set to 1 for True items and 0 for False
items. This can be used anywhere the auto initialiser can currently be used. For example:
>>> a = BitString([True, 7, False, 0, ()])  # 0b11000
>>> b = a + ['Yes', '']  # Adds '0b10'
>>> (True, True, False) in a
True
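The mapping from items to bits follows Python's ordinary truth testing, which this small sketch reproduces without using bitstring; bits_from_items is a hypothetical name.

```python
def bits_from_items(items):
    """Map each item to '1' if it is truthy and '0' otherwise."""
    return ''.join('1' if item else '0' for item in items)


a = bits_from_items([True, 7, False, 0, ()])   # '11000'
b = a + bits_from_items(['Yes', ''])           # appends '10'
```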
Miscellany
reversebits now has optional startbit and endbit parameters.
As an optimisation findall will return a generator, rather than a list. If you still want
the whole list then of course you can just call list() on the generator.
Improved efficiency of rfind.
A couple of minor bugs were fixed. See the issue tracker for details.
20.1.29 April 23rd 2009: Python 3 only version 0.4.1 released
This version is just a port of version 0.4.0 to Python 3. All the unit tests pass, but beyond
that only limited ad hoc testing has been done and so it should be considered an experimental
release. That said, the unit test coverage is very good - I’m just not sure if anyone even wants
a Python 3 version!
20.1.30 April 11th 2009: version 0.4.0 released
New methods
Added rfind, findall and replace. These do pretty much what you’d expect - see the
docstrings or the wiki for more information.
More special methods
Some missing methods were added: __repr__, __contains__, __rand__, __ror__,
__rxor__ and __delitem__.
Miscellany
A couple of small bugs were fixed (see the issue tracker).
There are some small backward incompatibilities relative to version 0.3.2:
Combined find and findbytealigned
findbytealigned has been removed, and becomes part of find. The default start position
has changed on both find and split to be the start of the BitString. You may need to recode:
>>> s1.find(bs)
>>> s2.findbytealigned(bs)
>>> s2.split(bs)
becomes
>>> s1.find(bs, bytealigned=False, startbit=s1.bitpos)
>>> s2.find(bs, startbit=s2.bitpos)  # bytealigned defaults to True
>>> s2.split(bs, startbit=s2.bitpos)
Reading off end of BitString no longer raises exception
Previously a read or peek function that encountered the end of the BitString would raise a
ValueError. It will now instead return the remainder of the BitString, which could be an empty
BitString. This is closer to the file object interface.
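For comparison, this is how a standard Python file-like object behaves when a read runs past the end of the data. The example uses io.BytesIO from the standard library rather than bitstring.

```python
import io

stream = io.BytesIO(b'\x00\x01\x02')

first = stream.read(2)    # two bytes are available, so two are returned
rest = stream.read(10)    # only one byte is left; it is returned without error
empty = stream.read(10)   # nothing is left; an empty bytes object is returned
```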
Removed visibility of offset
The offset property was previously read-only, and has now been removed from public view
altogether. As it is used internally for efficiency reasons you shouldn’t really have needed to
use it. If you do then use the _offset parameter instead (with caution).
20.1.31 March 11th 2009: version 0.3.2 released
Better performance
A number of methods (especially find and findbytealigned) have been sped up
considerably.
Bit-wise operations
Added support for bit-wise AND (&), OR (|) and XOR (^). For example:
>>> a = BitString('0b00111')
>>> print a & '0b10101'
0b00101
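The result above can be checked with ordinary Python integers. The helper below is a hypothetical sketch that operates on strings of '0' and '1' characters; requiring both operands to have the same length is an assumption of the sketch.

```python
import operator

OPS = {'&': operator.and_, '|': operator.or_, '^': operator.xor}


def bitwise(op, a, b):
    """Apply a bit-wise operator to two equal-length '0'/'1' strings."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    result = OPS[op](int(a, 2), int(b, 2))
    # Pad back to the original width so leading zeros are kept.
    return format(result, '0{}b'.format(len(a)))


anded = bitwise('&', '00111', '10101')   # '00101', matching the output above
```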
Miscellany
Added seekbit and seekbyte methods. These complement the ‘advance’ and ‘retreat’
functions, although you can still just use bitpos and bytepos properties directly.
>>> a.seekbit(100)  # Equivalent to a.bitpos = 100
Allowed comparisons between BitString objects and strings. For example this will now work:
>>> a = BitString('0b00001111')
>>> a == '0x0f'
True
20.1.32 February 26th 2009: version 0.3.1 released
This version only adds features and fixes bugs relative to 0.3.0, and doesn’t break backwards
compatibility.
Octal interpretation and initialisation
The oct property now joins bin and hex. Just prefix octal numbers with ‘0o’:
>>> a = BitString('0o755')
>>> print a.bin
0b111101101
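Each octal digit stands for exactly three bits, which is why '0o755' gives nine bits. A plain Python sketch of the conversion (oct_to_bits is a hypothetical helper, not part of bitstring):

```python
def oct_to_bits(text):
    """Convert a '0o'-prefixed octal string to a string of '0'/'1' characters."""
    if not text.startswith('0o'):
        raise ValueError("expected a '0o' prefix")
    # int(digit, 8) rejects anything that is not a valid octal digit.
    return ''.join(format(int(digit, 8), '03b') for digit in text[2:])


bits = oct_to_bits('0o755')   # '111' + '101' + '101'
```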
Simpler copying
Rather than using b = copy.copy(a) to create a copy of a BitString, now you can just use
b = BitString(a).
More special methods
Lots of new special methods added, for example bit-shifting via << and >>, equality testing
via == and !=, bit inversion (~) and concatenation using *.
Also __setitem__ is now supported so BitString objects can be modified using standard
index notation.
Proper installer
Finally got round to writing the distutils script. To install, just run python setup.py
install.
20.1.33 February 15th 2009: version 0.3.0 released
Simpler initialisation from binary and hexadecimal
The first argument in the BitString constructor is now called ‘auto’ and will attempt to interpret
the type of a string. Prefix binary numbers with ‘0b’ and hexadecimals with ‘0x’:
>>> a = BitString('0b0')  # single zero bit
>>> b = BitString('0xffff')  # two bytes
Previously the first argument was ‘data’, so if you relied on this then you will need to recode:
>>> a = BitString('\x00\x00\x01\xb3')  # Don't do this any more!
becomes
>>> a = BitString(data='\x00\x00\x01\xb3')
or just
>>> a = BitString('0x000001b3')
This new notation can also be used in functions that take a BitString as an argument. For
example:
>>> a = BitString('0x0011') + '0xff'
>>> a.insert('0b001', 6)
>>> a.find('0b1111')
BitString made more mutable
The methods append, deletebits, insert, overwrite, truncatestart and
truncateend now modify the BitString that they act upon. This allows for cleaner and more
efficient code, but you may need to rewrite slightly if you depended upon the old behaviour:
>>> a = BitString(hex='0xffff')
>>> a = a.append(BitString(hex='0x00'))
>>> b = a.deletebits(10, 10)
becomes
>>> a = BitString('0xffff')
>>> a.append('0x00')
>>> b = copy.copy(a)
>>> b.deletebits(10, 10)
Thanks to Frank Aune for suggestions in this and other areas.
Changes to printing
The binary interpretation of a BitString is now prepended with ‘0b’. This is in keeping with the
Python 2.6 (and 3.0) bin function. The prefix is optional when initialising using bin=.
Also, if you just print a BitString with no interpretation it will pick something appropriate - hex
if it is an integer number of bytes, otherwise binary. If the BitString representation is very long
it will be truncated with ‘...’ so it is only an approximate interpretation.
>>> a = BitString('0b0011111')
>>> print a
0b0011111
>>> a += '0b0'
>>> print a
0x3e
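The choice between hex and binary can be sketched as a small plain Python function. It is an illustration of the rule described above, not the library's code: default_repr is a hypothetical name, the truncation of very long values is left out, and treating an empty input as binary is an assumption.

```python
def default_repr(bits):
    """Pick hex for a whole number of bytes, otherwise binary.

    `bits` is a string of '0'/'1' characters; truncation with '...' is omitted.
    """
    if bits and len(bits) % 8 == 0:
        # Four bits per hex digit, padded so leading zeros survive.
        return '0x' + format(int(bits, 2), '0{}x'.format(len(bits) // 4))
    return '0b' + bits


seven = default_repr('0011111')         # 7 bits, so binary is used
eight = default_repr('0011111' + '0')   # 8 bits, so hex is used
```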
More convenience functions
Some missing methods such as advancebit and deletebytes have been added. Also a
number of ‘peek’ methods make an appearance as have prepend and reversebits. See
the Tutorial for more details.
20.1.34 January 13th 2009: version 0.2.0 released
Some fairly minor updates, not really deserving of a whole version point update.
20.1.35 December 29th 2008: version 0.1.0 released
First release!