The Algorithm Design Manual

AlgorithmDesignManual

AlgorithmDesignManual

The%20Algorithm%20Design%20Manual

The%20Algorithm%20Design%20Manual

The%20Algorithm%20Design%20Manual%20Recommended%20by%20Google

User Manual: Pdf

Open the PDF directly: View PDF PDF.
Page Count: 739 [warning: Documents this large are best viewed by clicking the View PDF Link!]

The Algorithm Design Manual
Second Edition
Steven S. Skiena
The Algorithm Design Manual
Second Edition
123
Steven S. Skiena
Department of Computer Science
State University of New York
at Stony Brook
New York, USA
skiena@cs.sunysb.edu
ISBN: 978-1-84800-069-8 e-ISBN: 978-1-84800-070-4
DOI: 10.1007/978-1-84800-070-4
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Control Number: 2008931136
c
Springer-Verlag London Limited 2008
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted
under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or trans-
mitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of
reprographic reproduction in accordance with the terms of licenses issued by the Copyright Licensing Agency.
Enquiries concerning reproduction outside those terms should be sent to the publishers.
The use of registered names, trademarks, etc., in this publication does not imply, even in the absence of a
specific statement, that such names are exempt from the relevant laws and regulations and therefore free for
general use.
The publisher makes no representation, express or implied, with regard to the accuracy of the information
contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that
may be made.
Printed on acid-free paper
Springer Science+Business Media
springer.com
Preface
Most professional programmers that I’ve encountered are not well prepared to
tackle algorithm design problems. This is a pity, because the techniques of algorithm
design form one of the core practical technologies of computer science. Designing
correct, efficient, and implementable algorithms for real-world problems requires
access to two distinct bodies of knowledge:
Techniques – Good algorithm designers understand several fundamental al-
gorithm design techniques, including data structures, dynamic programming,
depth-first search, backtracking, and heuristics. Perhaps the single most im-
portant design technique is modeling, the art of abstracting a messy real-world
application into a clean problem suitable for algorithmic attack.
Resources – Good algorithm designers stand on the shoulders of giants.
Rather than laboring from scratch to produce a new algorithm for every task,
they can figure out what is known about a particular problem. Rather than
re-implementing popular algorithms from scratch, they seek existing imple-
mentations to serve as a starting point. They are familiar with many classic
algorithmic problems, which provide sufficient source material to model most
any application.
This book is intended as a manual on algorithm design, providing access to
combinatorial algorithm technology for both students and computer professionals.
It is divided into two parts: Techniques and Resources. The former is a general
guide to techniques for the design and analysis of computer algorithms. The Re-
sources section is intended for browsing and reference, and comprises the catalog
of algorithmic resources, implementations, and an extensive bibliography.
vi PREFACE
To the Reader
I have been gratified by the warm reception the first edition of The Algorithm De-
sign Manual has received since its initial publication in 1997. It has been recognized
as a unique guide to using algorithmic techniques to solve problems that often arise
in practice. But much has changed in the world since the The Algorithm Design
Manual was first published over ten years ago. Indeed, if we date the origins of
modern algorithm design and analysis to about 1970, then roughly 30% of modern
algorithmic history has happened since the first coming of The Algorithm Design
Manual.
Three aspects of The Algorithm Design Manual have been particularly beloved:
(1) the catalog of algorithmic problems, (2) the war stories, and (3) the electronic
component of the book. These features have been preserved and strengthened in
this edition:
The Catalog of Algorithmic Problems – Since finding out what is known about
an algorithmic problem can be a difficult task, I provide a catalog of the
75 most important problems arising in practice. By browsing through this
catalog, the student or practitioner can quickly identify what their problem is
called, what is known about it, and how they should proceed to solve it. To aid
in problem identification, we include a pair of “before” and “after” pictures for
each problem, illustrating the required input and output specifications. One
perceptive reviewer called my book “The Hitchhiker’s Guide to Algorithms”
on the strength of this catalog.
The catalog is the most important part of this book. To update the catalog
for this edition, I have solicited feedback from the world’s leading experts on
each associated problem. Particular attention has been paid to updating the
discussion of available software implementations for each problem.
War Stories – In practice, algorithm problems do not arise at the beginning of
a large project. Rather, they typically arise as subproblems when it becomes
clear that the programmer does not know how to proceed or that the current
solution is inadequate.
To provide a better perspective on how algorithm problems arise in the real
world, we include a collection of “war stories,” or tales from our experience
with real problems. The moral of these stories is that algorithm design and
analysis is not just theory, but an important tool to be pulled out and used
as needed.
This edition retains all the original war stories (with updates as appropriate)
plus additional new war stories covering external sorting, graph algorithms,
simulated annealing, and other topics.
Electronic Component – Since the practical person is usually looking for a
program more than an algorithm, we provide pointers to solid implementa-
tions whenever they are available. We have collected these implementations
PREFACE vii
at one central website site (http://www.cs.sunysb.edu/algorith) for easy re-
trieval. We have been the number one “Algorithm” site on Google pretty
much since the initial publication of the book.
Further, we provide recommendations to make it easier to identify the correct
code for the job. With these implementations available, the critical issue
in algorithm design becomes properly modeling your application, more so
than becoming intimate with the details of the actual algorithm. This focus
permeates the entire book.
Equally important is what we do not do in this book. We do not stress the
mathematical analysis of algorithms, leaving most of the analysis as informal ar-
guments. You will not find a single theorem anywhere in this book. When more
details are needed, the reader should study the cited programs or references. The
goal of this manual is to get you going in the right direction as quickly as possible.
To the Instructor
This book covers enough material for a standard Introduction to Algorithms course.
We assume the reader has completed the equivalent of a second programming
course, typically titled Data Structures or Computer Science II.
A full set of lecture slides for teaching this course is available online at
http://www.algorist.com . Further, I make available online audio and video lectures
using these slides to teach a full-semester algorithm course. Let me help teach your
course, by the magic of the Internet!
This book stresses design over analysis. It is suitable for both traditional lecture
courses and the new “active learning” method, where the professor does not lecture
but instead guides student groups to solve real problems. The “war stories” provide
an appropriate introduction to the active learning method.
I have made several pedagogical improvements throughout the book. Textbook-
oriented features include:
More Leisurely Discussion – The tutorial material in the first part of the book
has been doubled over the previous edition. The pages have been devoted to
more thorough and careful exposition of fundamental material, instead of
adding more specialized topics.
False Starts – Algorithms textbooks generally present important algorithms
as a fait accompli, obscuring the ideas involved in designing them and the
subtle reasons why other approaches fail. The war stories illustrate such de-
velopment on certain applied problems, but I have expanded such coverage
into classical algorithm design material as well.
Stop and Think – Here I illustrate my thought process as I solve a topic-
specific homework problem—false starts and all. I have interspersed such
viii PREFACE
problem blocks throughout the text to increase the problem-solving activity
of my readers. Answers appear immediately following each problem.
More and Improved Homework Problems – This edition of The Algorithm
Design Manual has twice as many homework exercises as the previous one.
Exercises that proved confusing or ambiguous have been improved or re-
placed. Degree of difficulty ratings (from 1 to 10) have been assigned to all
problems.
Self-Motivating Exam Design – In my algorithms courses, I promise the stu-
dents that all midterm and final exam questions will be taken directly from
homework problems in this book. This provides a “student-motivated exam,”
so students know exactly how to study to do well on the exam. I have carefully
picked the quantity, variety, and difficulty of homework exercises to make this
work; ensuring there are neither too few or too many candidate problems.
Take-Home Lessons – Highlighted “take-home” lesson boxes scattered
throughout the text emphasize the big-picture concepts to be gained from
the chapter.
Links to Programming Challenge Problems – Each chapter’s exercises will
contain links to 3-5 relevant “Programming Challenge” problems from
http://www.programming-challenges.com. These can be used to add a pro-
gramming component to paper-and-pencil algorithms courses.
More Code, Less Pseudo-code – More algorithms in this book appear as code
(written in C) instead of pseudo-code. I believe the concreteness and relia-
bility of actual tested implementations provides a big win over less formal
presentations for simple algorithms. Full implementations are available for
study at http://www.algorist.com .
Chapter Notes – Each tutorial chapter concludes with a brief notes section,
pointing readers to primary sources and additional references.
Acknowledgments
Updating a book dedication after ten years focuses attention on the effects of time.
Since the first edition, Renee has become my wife and then the mother of our
two children, Bonnie and Abby. My father has left this world, but Mom and my
brothers Len and Rob remain a vital presence in my life. I dedicate this book to
my family, new and old, here and departed.
I would like to thank several people for their concrete contributions to this
new edition. Andrew Gaun and Betson Thomas helped in many ways, particularly
in building the infrastructure for the new http://www.cs.sunysb.edu/algorithand
dealing with a variety of manuscript preparation issues. David Gries offered valu-
able feedback well beyond the call of duty. Himanshu Gupta and Bin Tang bravely
PREFACE ix
taught courses using a manuscript version of this edition. Thanks also to my
Springer-Verlag editors, Wayne Wheeler and Allan Wylde.
A select group of algorithmic sages reviewed sections of the Hitchhiker’s guide,
sharing their wisdom and alerting me to new developments. Thanks to:
Ami Amir, Herve Bronnimann, Bernard Chazelle, Chris Chu, Scott
Cotton, Yefim Dinitz, Komei Fukuda, Michael Goodrich, Lenny Heath,
Cihat Imamoglu, Tao Jiang, David Karger, Giuseppe Liotta, Albert
Mao, Silvano Martello, Catherine McGeoch, Kurt Mehlhorn, Scott
A. Mitchell, Naceur Meskini, Gene Myers, Gonzalo Navarro, Stephen
North, Joe O’Rourke, Mike Paterson, Theo Pavlidis, Seth Pettie, Michel
Pocchiola, Bart Preneel, Tomasz Radzik, Edward Reingold, Frank
Ruskey, Peter Sanders, Joao Setubal, Jonathan Shewchuk, Robert
Skeel, Jens Stoye, Torsten Suel, Bruce Watson, and Uri Zwick.
Several exercises were originated by colleagues or inspired by other texts. Re-
constructing the original sources years later can be challenging, but credits for each
problem (to the best of my recollection) appear on the website.
It would be rude not to thank important contributors to the original edition.
Ricky Bradley and Dario Vlah built up the substantial infrastructure required for
the WWW site in a logical and extensible manner. Zhong Li drew most of the
catalog figures using xfig. Richard Crandall, Ron Danielson, Takis Metaxas, Dave
Miller, Giri Narasimhan, and Joe Zachary all reviewed preliminary versions of the
first edition; their thoughtful feedback helped to shape what you see here.
Much of what I know about algorithms I learned along with my graduate
students. Several of them (Yaw-Ling Lin, Sundaram Gopalakrishnan, Ting Chen,
Francine Evans, Harald Rau, Ricky Bradley, and Dimitris Margaritis) are the real
heroes of the war stories related within. My Stony Brook friends and algorithm
colleagues Estie Arkin, Michael Bender, Jie Gao, and Joe Mitchell have always
been a pleasure to work and be with. Finally, thanks to Michael Brochstein and
the rest of the city contingent for revealing a proper life well beyond Stony Brook.
Caveat
It is traditional for the author to magnanimously accept the blame for whatever
deficiencies remain. I don’t. Any errors, deficiencies, or problems in this book are
somebody else’s fault, but I would appreciate knowing about them so as to deter-
mine who is to blame.
Steven S. Skiena
Department of Computer Science
Stony Brook University
Stony Brook, NY 11794-4400
http://www.cs.sunysb.edu/skiena
April 2008
Contents
I Practical Algorithm Design 1
1 Introduction to Algorithm Design 3
1.1 Robot Tour Optimization . ..................... 5
1.2 SelectingtheRightJobs ....................... 9
1.3 ReasoningaboutCorrectness .................... 11
1.4 ModelingtheProblem ........................ 19
1.5 AbouttheWarStories ........................ 22
1.6 WarStory:PsychicModeling .................... 23
1.7 Exercises................................ 27
2 Algorithm Analysis 31
2.1 The RAM Model of Computation . . ................ 31
2.2 The Big Oh Notation ......................... 34
2.3 Growth Rates and Dominance Relations . . ............ 37
2.4 Working with the Big Oh . ..................... 40
2.5 Reasoning About Efficiency ..................... 41
2.6 Logarithms and Their Applications . ................ 46
2.7 Properties of Logarithms . . ..................... 50
2.8 War Story: Mystery of the Pyramids ................ 51
2.9 Advanced Analysis (*) . . . ..................... 54
2.10 Exercises . . .............................. 57
3 Data Structures 65
3.1 Contiguous vs. Linked Data Structures . . . ............ 66
xii CONTENTS
3.2 Stacks and Queues . . ........................ 71
3.3 Dictionaries . . ............................ 72
3.4 BinarySearchTrees.......................... 77
3.5 Priority Queues ............................ 83
3.6 War Story: Stripping Triangulations . ............... 85
3.7 HashingandStrings ......................... 89
3.8 Specialized Data Structures . . ................... 93
3.9 WarStory:StringemUp ...................... 94
3.10 Exercises ................................ 98
4 Sorting and Searching 103
4.1 Applications of Sorting ........................ 104
4.2 Pragmatics of Sorting . ........................ 107
4.3 Heapsort:FastSortingviaDataStructures ............ 108
4.4 WarStory:GivemeaTicketonanAirplane ........... 118
4.5 Mergesort: Sorting by Divide-and-Conquer . . .......... 120
4.6 Quicksort: Sorting by Randomization ............... 123
4.7 Distribution Sort: Sorting via Bucketing .............. 129
4.8 WarStory:SkienafortheDefense ................. 131
4.9 Binary Search and Related Algorithms ............... 132
4.10 Divide-and-Conquer . . ........................ 135
4.11 Exercises ................................ 139
5 Graph Traversal 145
5.1 FlavorsofGraphs........................... 146
5.2 DataStructuresforGraphs ..................... 151
5.3 WarStory:IwasaVictimofMooresLaw ............ 155
5.4 War Story: Getting the Graph . ................... 158
5.5 TraversingaGraph.......................... 161
5.6 Breadth-FirstSearch ......................... 162
5.7 Applications of Breadth-First Search . ............... 166
5.8 Depth-FirstSearch .......................... 169
5.9 Applications of Depth-First Search . . ............... 172
5.10 Depth-First Search on Directed Graphs .............. 178
5.11 Exercises ................................ 184
6 Weighted Graph Algorithms 191
6.1 MinimumSpanningTrees ...................... 192
6.2 WarStory:NothingbutNets .................... 202
6.3 ShortestPaths............................. 205
6.4 War Story: Dialing for Documents . . ............... 212
6.5 Network Flows and Bipartite Matching .............. 217
6.6 Design Graphs, Not Algorithms ................... 222
6.7 Exercises................................ 225
CONTENTS xiii
7 Combinatorial Search and Heuristic Methods 230
7.1 Backtracking.............................. 231
7.2 SearchPruning ............................ 238
7.3 Sudoku . . . .............................. 239
7.4 WarStory:CoveringChessboards.................. 244
7.5 HeuristicSearchMethods ...................... 247
7.6 WarStory:OnlyitisNotaRadio ................. 260
7.7 War Story: Annealing Arrays .................... 263
7.8 OtherHeuristicSearchMethods .................. 266
7.9 Parallel Algorithms . ......................... 267
7.10 War Story: Going Nowhere Fast . . . ................ 268
7.11 Exercises . . .............................. 270
8 Dynamic Programming 273
8.1 Caching vs. Computation . ..................... 274
8.2 Approximate String Matching .................... 280
8.3 Longest Increasing Sequence ..................... 289
8.4 War Story: Evolution of the Lobster ................ 291
8.5 The Partition Problem . . . ..................... 294
8.6 Parsing Context-Free Grammars . . ................ 298
8.7 Limitations of Dynamic Programming: TSP ............ 301
8.8 War Story: What’s Past is Prolog . . ................ 304
8.9 War Story: Text Compression for Bar Codes ........... 307
8.10 Exercises . . .............................. 310
9 Intractable Problems and Approximation Algorithms 316
9.1 Problems and Reductions . ..................... 317
9.2 Reductions for Algorithms . ..................... 319
9.3 Elementary Hardness Reductions . . ................ 323
9.4 Satisfiability .............................. 328
9.5 Creative Reductions ......................... 330
9.6 The Art of Proving Hardness .................... 334
9.7 War Story: Hard Against the Clock . ................ 337
9.8 War Story: And Then I Failed .................... 339
9.9 Pvs.NP ................................ 341
9.10 Dealing with NP-complete Problems ................ 344
9.11 Exercises . . .............................. 350
10 How to Design Algorithms 356
II The Hitchhiker’s Guide to Algorithms 361
11 A Catalog of Algorithmic Problems 363
xiv CONTENTS
12 Data Structures 366
12.1 Dictionaries . . ............................ 367
12.2 Priority Queues ............................ 373
12.3 Suffix Trees and Arrays ........................ 377
12.4 Graph Data Structures ........................ 381
12.5 Set Data Structures . . ........................ 385
12.6 Kd-Trees ................................ 389
13 Numerical Problems 393
13.1 Solving Linear Equations ....................... 395
13.2 Bandwidth Reduction ........................ 398
13.3 Matrix Multiplication . ........................ 401
13.4 Determinants and Permanents . ................... 404
13.5 Constrained and Unconstrained Optimization . .......... 407
13.6 Linear Programming . ........................ 411
13.7 Random Number Generation . ................... 415
13.8 Factoring and Primality Testing ................... 420
13.9 Arbitrary-Precision Arithmetic ................... 423
13.10 Knapsack Problem . . ........................ 427
13.11 Discrete Fourier Transform . . ................... 431
14 Combinatorial Problems 434
14.1 Sorting ................................. 436
14.2 Searching ................................ 441
14.3 Median and Selection . ........................ 445
14.4 Generating Permutations ....................... 448
14.5 Generating Subsets . . ........................ 452
14.6 Generating Partitions . ........................ 456
14.7 Generating Graphs . . ........................ 460
14.8 Calendrical Calculations ....................... 465
14.9 Job Scheduling . ............................ 468
14.10 Satisfiability . . ............................ 472
15 Graph Problems: Polynomial-Time 475
15.1 Connected Components ....................... 477
15.2 Topological Sorting . . ........................ 481
15.3 Minimum Spanning Tree ....................... 484
15.4 Shortest Path . ............................ 489
15.5 Transitive Closure and Reduction . . . ............... 495
15.6 Matching ................................ 498
15.7 Eulerian Cycle/Chinese Postman . . . ............... 502
15.8 Edge and Vertex Connectivity . ................... 505
15.9 Network Flow . ............................ 509
15.10 Drawing Graphs Nicely ........................ 513
CONTENTS xv
15.11 Drawing Trees ............................. 517
15.12 Planarity Detection and Embedding ................ 520
16 Graph Problems: Hard Problems 523
16.1 Clique .................................. 525
16.2 Independent Set . . . ......................... 528
16.3 Vertex Cover .............................. 530
16.4 Traveling Salesman Problem ..................... 533
16.5 Hamiltonian Cycle . ......................... 538
16.6 Graph Partition . . . ......................... 541
16.7 Vertex Coloring . . . ......................... 544
16.8 Edge Coloring ............................. 548
16.9 Graph Isomorphism . ......................... 550
16.10 Steiner Tree .............................. 555
16.11 Feedback Edge/Vertex Set . ..................... 559
17 Computational Geometry 562
17.1 Robust Geometric Primitives .................... 564
17.2 Convex Hull .............................. 568
17.3 Triangulation ............................. 572
17.4 Voronoi Diagrams . . ......................... 576
17.5 Nearest Neighbor Search . . ..................... 580
17.6 Range Search ............................. 584
17.7 Point Location ............................. 587
17.8 Intersection Detection . . . ..................... 591
17.9 Bin Packing .............................. 595
17.10 Medial-Axis Transform . . . ..................... 598
17.11 Polygon Partitioning ......................... 601
17.12 Simplifying Polygons ......................... 604
17.13 Shape Similarity . . . ......................... 607
17.14 Motion Planning . . ......................... 610
17.15 Maintaining Line Arrangements . . . ................ 614
17.16 Minkowski Sum . . . ......................... 617
18 Set and String Problems 620
18.1 Set Cover . . .............................. 621
18.2 Set Packing .............................. 625
18.3 String Matching . . . ......................... 628
18.4 Approximate String Matching .................... 631
18.5 Text Compression . . ......................... 637
18.6 Cryptography ............................. 641
18.7 Finite State Machine Minimization . ................ 646
18.8 Longest Common Substring/Subsequence . ............ 650
18.9 Shortest Common Superstring .................... 654
xvi CONTENTS
19 Algorithmic Resources 657
19.1 Software Systems . . . ........................ 657
19.2 Data Sources . . ............................ 663
19.3 Online Bibliographic Resources ................... 663
19.4 Professional Consulting Services ................... 664
Bibliography 665
Index 709
1
Introduction to Algorithm Design
What is an algorithm? An algorithm is a procedure to accomplish a specific task.
An algorithm is the idea behind any reasonable computer program.
To be interesting, an algorithm must solve a general, well-specified problem.An
algorithmic problem is specified by describing the complete set of instances it must
work on and of its output after running on one of these instances. This distinction,
between a problem and an instance of a problem, is fundamental. For example, the
algorithmic problem known as sorting is defined as follows:
Problem: Sorting
Input: A sequence of nkeys a1,...,a
n.
Output: The permutation (reordering) of the input sequence such that a
1a
2
···a
n1a
n.
An instance of sorting might be an array of names, like {Mike, Bob, Sally, Jill,
Jan}, or a list of numbers like {154, 245, 568, 324, 654, 324}. Determining that
you are dealing with a general problem is your first step towards solving it.
An algorithm is a procedure that takes any of the possible input instances
and transforms it to the desired output. There are many different algorithms for
solving the problem of sorting. For example, insertion sort is a method for sorting
that starts with a single element (thus forming a trivially sorted list) and then
incrementally inserts the remaining elements so that the list stays sorted. This
algorithm, implemented in C, is described below:
S.S. Skiena, The Algorithm Design Manual, 2nd ed., DOI: 10.1007/978-1-84800-070-4 1,
c
Springer-Verlag London Limited 2008
41. INTRODUCTION TO ALGORITHM DESIGN
I N S E R T I O N S O R TI N S E R T I O N S O R T
I N S E R T I O N S O R T
I N S E R T I O N S O R T
E I N S R T I O N S O R TE I N S R T I O N S O R T
E I N R S T I O N S O R T
E I N R S T I O N S O R TE I N R S T I O N S O R T
E I I N R S T O N S O R T
E I I N O R S T N S O R TE I I N O R S T N S O R T
E I I N N O R S T S O R TE I I N N O R S T S O R T
E I I N N O R S S T O R TE I I N N O R S S T O R T
E I I
N
N
O
O
R R
S
S
T T
E I I N N O O R S S T R T
E I I N N O O R R S S T T
Figure 1.1: Animation of insertion sort in action (time flows down)
insertion_sort(item s[], int n)
{
int i,j; /* counters */
for (i=1; i<n; i++) {
j=i;
while ((j>0) && (s[j] < s[j-1])) {
swap(&s[j],&s[j-1]);
j = j-1;
}
}
}
An animation of the logical flow of this algorithm on a particular instance (the
letters in the word “INSERTIONSORT”) is given in Figure 1.1
Note the generality of this algorithm. It works just as well on names as it does
on numbers, given the appropriate comparison operation (<) to test which of the
two keys should appear first in sorted order. It can be readily verified that this
algorithm correctly orders every possible input instance according to our definition
of the sorting problem.
There are three desirable properties for a good algorithm. We seek algorithms
that are correct and efficient, while being easy to implement. These goals may not
be simultaneously achievable. In industrial settings, any program that seems to
give good enough answers without slowing the application down is often acceptable,
regardless of whether a better algorithm exists. The issue of finding the best possible
answer or achieving maximum efficiency usually arises in industry only after serious
performance or legal troubles.
In this chapter, we will focus on the issues of algorithm correctness, and defer a
discussion of efficiency concerns to Chapter 2. It is seldom obvious whether a given
1.1 ROBOT TOUR OPTIMIZATION 5
0 0
1
2
3
4
5
6
7
8
Figure 1.2: A good instance for the nearest-neighbor heuristic
algorithm correctly solves a given problem. Correct algorithms usually come with
a proof of correctness, which is an explanation of why we know that the algorithm
must take every instance of the problem to the desired result. However, before we go
further we demonstrate why “it’s obvious” never suffices as a proof of correctness,
and is usually flat-out wrong.
1.1 Robot Tour Optimization
Let’s consider a problem that arises often in manufacturing, transportation, and
testing applications. Suppose we are given a robot arm equipped with a tool, say a
soldering iron. In manufacturing circuit boards, all the chips and other components
must be fastened onto the substrate. More specifically, each chip has a set of contact
points (or wires) that must be soldered to the board. To program the robot arm
for this job, we must first construct an ordering of the contact points so the robot
visits (and solders) the first contact point, then the second point, third, and so
forth until the job is done. The robot arm then proceeds back to the first contact
point to prepare for the next board, thus turning the tool-path into a closed tour,
or cycle.
Robots are expensive devices, so we want the tour that minimizes the time it
takes to assemble the circuit board. A reasonable assumption is that the robot arm
moves with fixed speed, so the time to travel between two points is proportional
to their distance. In short, we must solve the following algorithm problem:
Problem: Robot Tour Optimization
Input: AsetSof npoints in the plane.
Output: What is the shortest cycle tour that visits each point in the set S?
You are given the job of programming the robot arm. Stop right now and think
up an algorithm to solve this problem. I’ll be happy to wait until you find one. . .
61. INTRODUCTION TO ALGORITHM DESIGN
Several algorithms might come to mind to solve this problem. Perhaps the most
popular idea is the nearest-neighbor heuristic. Starting from some point p0, we walk
first to its nearest neighbor p1.Fromp1, we walk to its nearest unvisited neighbor,
thus excluding only p0as a candidate. We now repeat this process until we run
out of unvisited points, after which we return to p0to close off the tour. Written
in pseudo-code, the nearest-neighbor heuristic looks like this:
NearestNeighbor(P)
Pick and visit an initial point p0from P
p=p0
i=0
While there are still unvisited points
i=i+1
Select pito be the closest unvisited point to pi1
Visit pi
Return to p0from pn1
This algorithm has a lot to recommend it. It is simple to understand and imple-
ment. It makes sense to visit nearby points before we visit faraway points to reduce
the total travel time. The algorithm works perfectly on the example in Figure 1.2.
The nearest-neighbor rule is reasonably efficient, for it looks at each pair of points
(pi,p
j) at most twice: once when adding pito the tour, the other when adding pj.
Against all these positives there is only one problem. This algorithm is completely
wrong.
Wrong? How can it be wrong? The algorithm always finds a tour, but it doesn’t
necessarily find the shortest possible tour. It doesn’t necessarily even come close.
Consider the set of points in Figure 1.3, all of which lie spaced along a line. The
numbers describe the distance that each point lies to the left or right of the point
labeled ‘0’. When we start from the point ‘0’ and repeatedly walk to the nearest
unvisited neighbor, we might keep jumping left-right-left-right over ‘0’ as the algo-
rithm offers no advice on how to break ties. A much better (indeed optimal) tour
for these points starts from the leftmost point and visits each point as we walk
right before returning at the rightmost point.
Try now to imagine your boss’s delight as she watches a demo of your robot
arm hopscotching left-right-left-right during the assembly of such a simple board.
“But wait,” you might be saying. “The problem was in starting at point ‘0’.
Instead, why don’t we start the nearest-neighbor rule using the leftmost point
as the initial point p0? By doing this, we will find the optimal solution on this
instance.”
That is 100% true, at least until we rotate our example 90 degrees. Now all
points are equally leftmost. If the point ‘0’ were moved just slightly to the left, it
would be picked as the starting point. Now the robot arm will hopscotch up-down-
up-down instead of left-right-left-right, but the travel time will be just as bad as
before. No matter what you do to pick the first point, the nearest-neighbor rule is
doomed to work incorrectly on certain point sets.
1.1 ROBOT TOUR OPTIMIZATION 7
-1 0 1 3 11
-21 -5
-1 0 1 3 11
-21 -5
Figure 1.3: A bad instance for the nearest-neighbor heuristic, with the optimal solution
Maybe what we need is a different approach. Always walking to the closest
point is too restrictive, since it seems to trap us into making moves we didn’t
want. A different idea might be to repeatedly connect the closest pair of endpoints
whose connection will not create a problem, such as premature termination of the
cycle. Each vertex begins as its own single vertex chain. After merging everything
together, we will end up with a single chain containing all the points in it. Con-
necting the final two endpoints gives us a cycle. At any step during the execution
of this closest-pair heuristic, we will have a set of single vertices and vertex-disjoint
chains available to merge. In pseudocode:
ClosestPair(P)
Let nbe the number of points in set P.
For i=1ton1do
d=
For each pair of endpoints (s, t) from distinct vertex chains
if dist(s, t)dthen sm=s,tm=t,andd=dist(s, t)
Connect (sm,t
m) by an edge
Connect the two endpoints by an edge
This closest-pair rule does the right thing in the example in Figure 1.3. It starts
by connecting ‘0’ to its immediate neighbors, the points 1 and 1. Subsequently,
the next closest pair will alternate left-right, growing the central path by one link at
a time. The closest-pair heuristic is somewhat more complicated and less efficient
than the previous one, but at least it gives the right answer in this example.
But this is not true in all examples. Consider what this algorithm does on the
point set in Figure 1.4(l). It consists of two rows of equally spaced points, with
the rows slightly closer together (distance 1 e) than the neighboring points are
spaced within each row (distance 1 + e). Thus the closest pairs of points stretch
across the gap, not around the boundary. After we pair off these points, the closest
81. INTRODUCTION TO ALGORITHM DESIGN
1−e
1+e
1+e
1−e
(l) 1+e
1−e
1+e
1−e
(r)
Figure 1.4: A bad instance for the closest-pair heuristic, with the optimal solution
remaining pairs will connect these pairs alternately around the boundary. The total
path length of the closest-pair tour is 3(1 e) + 2(1 + e)+(1 e)2+(2+2e)2.
Compared to the tour shown in Figure 1.4(r), we travel over 20% farther than
necessary when e0. Examples exist where the penalty is considerably worse
than this.
Thus this second algorithm is also wrong. Which one of these algorithms per-
forms better? You can’t tell just by looking at them. Clearly, both heuristics can
end up with very bad tours on very innocent-looking input.
At this point, you might wonder what a correct algorithm for our problem looks
like. Well, we could try enumerating all possible orderings of the set of points, and
then select the ordering that minimizes the total length:
OptimalTSP(P)
d=
For each of the n! permutations Piof point set P
If (cost(Pi)d) then d=cost(Pi)andPmin =Pi
Return Pmin
Since all possible orderings are considered, we are guaranteed to end up with
the shortest possible tour. This algorithm is correct, since we pick the best of all
the possibilities. But it is also extremely slow. The fastest computer in the world
couldn’t hope to enumerate all the 20! =2,432,902,008,176,640,000 orderings of 20
points within a day. For real circuit boards, where n1,000, forget about it.
All of the world’s computers working full time wouldn’t come close to finishing
the problem before the end of the universe, at which point it presumably becomes
moot.
The quest for an efficient algorithm to solve this problem, called the traveling
salesman problem (TSP), will take us through much of this book. If you need to
know how the story ends, check out the catalog entry for the traveling salesman
problem in Section 16.4 (page 533).
1.2 SELECTING THE RIGHT JOBS 9
The President’s Algorist
Halting State
"Discrete" Mathematics Calculated Bets
Programming Challenges
Steiner’s Tree Process Terminated
Tarjan of the Jungle The Four Volume Problem
Figure 1.5: An instance of the non-overlapping movie scheduling problem
Take-Home Lesson: There is a fundamental difference between algorithms,
which always produce a correct result, and heuristics, which may usually do a
good job but without providing any guarantee.
1.2 Selecting the Right Jobs
Now consider the following scheduling problem. Imagine you are a highly-in-
demand actor, who has been presented with offers to star in ndifferent movie
projects under development. Each offer comes specified with the first and last day
of filming. To take the job, you must commit to being available throughout this
entire period. Thus you cannot simultaneously accept two jobs whose intervals
overlap.
For an artist such as yourself, the criteria for job acceptance is clear: you want
to make as much money as possible. Because each of these films pays the same fee
per film, this implies you seek the largest possible set of jobs (intervals) such that
no two of them conflict with each other.
For example, consider the available projects in Figure 1.5. We can star in at most
four films, namely “Discrete” Mathematics,Programming Challenges,Calculated
Bets, and one of either Halting State or Steiner’s Tree.
You (or your agent) must solve the following algorithmic scheduling problem:
Problem: Movie Scheduling Problem
Input: AsetIof nintervals on the line.
Output: What is the largest subset of mutually non-overlapping intervals which can
be selected from I?
You are given the job of developing a scheduling algorithm for this task. Stop
right now and try to find one. Again, I’ll be happy to wait.
There are several ideas that may come to mind. One is based on the notion
that it is best to work whenever work is available. This implies that you should
start with the job with the earliest start date – after all, there is no other job you
can work on, then at least during the begining of this period.
10 1. INTRODUCTION TO ALGORITHM DESIGN
(l) (r)
War and Peace
Figure 1.6: Bad instances for the (l) earliest job first and (r) shortest job first heuristics.
EarliestJobFirst(I)
Accept the earlest starting job jfrom Iwhich does not overlap any
previously accepted job, and repeat until no more such jobs remain.
This idea makes sense, at least until we realize that accepting the earliest job
might block us from taking many other jobs if that first job is long. Check out
Figure 1.6(l), where the epic “War and Peace” is both the first job available and
long enough to kill off all other prospects.
This bad example naturally suggests another idea. The problem with “War and
Peace” is that it is too long. Perhaps we should start by taking the shortest job,
and keep seeking the shortest available job at every turn. Maximizing the number
of jobs we do in a given period is clearly connected to banging them out as quickly
as possible. This yields the heuristic:
ShortestJobFirst(I)
While (I=)do
Accept the shortest possible job jfrom I.
Delete j, and any interval which intersects jfrom I.
Again this idea makes sense, at least until we realize that accepting the shortest
job might block us from taking two other jobs, as shown in Figure 1.6(r). While the
potential loss here seems smaller than with the previous heuristic, it can readily
limit us to half the optimal payoff.
At this point, an algorithm where we try all possibilities may start to look good,
because we can be certain it is correct. If we ignore the details of testing whether
a set of intervals are in fact disjoint, it looks something like this:
ExhaustiveScheduling(I)
j=0
Smax =
For each of the 2nsubsets Siof intervals I
If (Siis mutally non-overlapping) and (size(Si)>j)
then j=size(Si)andSmax =Si.
Return Smax
But how slow is it? The key limitation is enumerating the 2nsubsets of n
things. The good news is that this is much better than enumerating all n! orders
1.3 REASONING ABOUT CORRECTNESS 11
of nthings, as proposed for the robot tour optimization problem. There are only
about one million subsets when n= 20, which could be exhaustively counted
within seconds on a decent computer. However, when fed n= 100 movies, 2100 is
much much greater than the 20! which made our robot cry “uncle” in the previous
problem.
The difference between our scheduling and robotics problems are that there is an
algorithm which solves movie scheduling both correctly and efficiently. Think about
the first job to terminate—i.e. the interval xwhich contains the rightmost point
which is leftmost among all intervals. This role is played by “Discrete” Mathematics
in Figure 1.5. Other jobs may well have started before x, but all of these must at
least partially overlap each other, so we can select at most one from the group. The
first of these jobs to terminate is x, so any of the overlapping jobs potentially block
out other opportunities to the right of it. Clearly we can never lose by picking x.
This suggests the following correct, efficient algorithm:
OptimalScheduling(I)
While (I=)do
Accept the job jfrom Iwith the earliest completion date.
Delete j, and any interval which intersects jfrom I.
Ensuring the optimal answer over all possible inputs is a difficult but often
achievable goal. Seeking counterexamples that break pretender algorithms is an
important part of the algorithm design process. Efficient algorithms are often lurk-
ing out there; this book seeks to develop your skills to help you find them.
Take-Home Lesson: Reasonable-looking algorithms can easily be incorrect. Al-
gorithm correctness is a property that must be carefully demonstrated.
1.3 Reasoning about Correctness
Hopefully, the previous examples have opened your eyes to the subtleties of algo-
rithm correctness. We need tools to distinguish correct algorithms from incorrect
ones, the primary one of which is called a proof.
A proper mathematical proof consists of several parts. First, there is a clear,
precise statement of what you are trying to prove. Second, there is a set of assump-
tions of things which are taken to be true and hence used as part of the proof.
Third, there is a chain of reasoning which takes you from these assumptions to the
statement you are trying to prove. Finally, there is a little square ( )orQED at the
bottom to denote that you have finished, representing the Latin phrase for “thus
it is demonstrated.”
This book is not going to emphasize formal proofs of correctness, because they
are very difficult to do right and quite misleading when you do them wrong. A
proof is indeed a demonstration. Proofs are useful only when they are honest; crisp
arguments explaining why an algorithm satisfies a nontrivial correctness property.
12 1. INTRODUCTION TO ALGORITHM DESIGN
Correct algorithms require careful exposition, and efforts to show both cor-
rectness and not incorrectness. We develop tools for doing so in the subsections
below.
1.3.1 Expressing Algorithms
Reasoning about an algorithm is impossible without a careful description of the
sequence of steps to be performed. The three most common forms of algorithmic
notation are (1) English, (2) pseudocode, or (3) a real programming language.
We will use all three in this book. Pseudocode is perhaps the most mysterious of
the bunch, but it is best defined as a programming language that never complains
about syntax errors. All three methods are useful because there is a natural tradeoff
between greater ease of expression and precision. English is the most natural but
least precise programming language, while Java and C/C++ are precise but diffi-
cult to write and understand. Pseudocode is generally useful because it represents
a happy medium.
The choice of which notation is best depends upon which method you are most
comfortable with. I usually prefer to describe the ideas of an algorithm in English,
moving to a more formal, programming-language-like pseudocode or even real code
to clarify sufficiently tricky details.
A common mistake my students make is to use pseudocode to dress up an ill-
defined idea so that it looks more formal. Clarity should be the goal. For example,
the ExhaustiveScheduling algorithm on page 10 could have been better written
in English as:
ExhaustiveScheduling(I)
Test all 2nsubsets of intervals from I, and return the largest subset
consisting of mutually non-overlapping intervals.
Take-Home Lesson: The heart of any algorithm is an idea. If your idea is
not clearly revealed when you express an algorithm, then you are using too
low-level a notation to describe it.
1.3.2 Problems and Properties
We need more than just an algorithm description in order to demonstrate cor-
rectness. We also need a careful description of the problem that it is intended to
solve.
Problem specifications have two parts: (1) the set of allowed input instances,
and (2) the required properties of the algorithm’s output. It is impossible to prove
the correctness of an algorithm for a fuzzily-stated problem. Put another way, ask
the wrong problem and you will get the wrong answer.
Some problem specifications allow too broad a class of input instances. Suppose
we had allowed film projects in our movie scheduling problem to have gaps in
1.3 REASONING ABOUT CORRECTNESS 13
production (i.e. , filming in September and November but a hiatus in October).
Then the schedule associated with any particular film would consist of a given set
of intervals. Our star would be free to take on two interleaving but not overlapping
projects (such as the film above nested with one filming in August and October).
The earliest completion algorithm would not work for such a generalized scheduling
problem. Indeed, no efficient algorithm exists for this generalized problem.
Take-Home Lesson: An important and honorable technique in algorithm de-
sign is to narrow the set of allowable instances until there is a correct and
efficient algorithm. For example, we can restrict a graph problem from general
graphs down to trees, or a geometric problem from two dimensions down to
one.
There are two common traps in specifying the output requirements of a problem.
One is asking an ill-defined question. Asking for the best route between two places
on a map is a silly question unless you define what best means. Do you mean the
shortest route in total distance, or the fastest route, or the one minimizing the
number of turns?
The second trap is creating compound goals. The three path-planning criteria
mentioned above are all well-defined goals that lead to correct, efficient optimiza-
tion algorithms. However, you must pick a single criteria. A goal like Find the
shortest path from ato bthat doesn’t use more than twice as many turns as neces-
sary is perfectly well defined, but complicated to reason and solve.
I encourage you to check out the problem statements for each of the 75 catalog
problems in the second part of this book. Finding the right formulation for your
problem is an important part of solving it. And studying the definition of all these
classic algorithm problems will help you recognize when someone else has thought
about similar problems before you.
1.3.3 Demonstrating Incorrectness
The best way to prove that an algorithm is incorrect is to produce an instance in
which it yields an incorrect answer. Such instances are called counter-examples.
No rational person will ever leap to the defense of an algorithm after a counter-
example has been identified. Very simple instances can instantly kill reasonable-
looking heuristics with a quick touch´e. Good counter-examples have two important
properties:
Verifiability – To demonstrate that a particular instance is a counter-example
to a particular algorithm, you must be able to (1) calculate what answer your
algorithm will give in this instance, and (2) display a better answer so as to
prove the algorithm didn’t find it.
Since you must hold the given instance in your head to reason about it, an
important part of verifiability is. . .
14 1. INTRODUCTION TO ALGORITHM DESIGN
Simplicity – Good counter-examples have all unnecessary details boiled away.
They make clear exactly why the proposed algorithm fails. Once a counter-
example has been found, it is worth simplifying it down to its essence. For
example, the counter-example of Figure 1.6(l) could be made simpler and
better by reducing the number of overlapped segments from four to two.
Hunting for counter-examples is a skill worth developing. It bears some simi-
larity to the task of developing test sets for computer programs, but relies more on
inspiration than exhaustion. Here are some techniques to aid your quest:
Think small – Note that the robot tour counter-examples I presented boiled
down to six points or less, and the scheduling counter-examples to only three
intervals. This is indicative of the fact that when algorithms fail, there is
usually a very simple example on which they fail. Amateur algorists tend
to draw a big messy instance and then stare at it helplessly. The pros look
carefully at several small examples, because they are easier to verify and
reason about.
Think exhaustively – There are only a small number of possibilities for the
smallest nontrivial value of n. For example, there are only three interesting
ways two intervals on the line can occur: (1) as disjoint intervals, (2) as
overlapping intervals, and (3) as properly nesting intervals, one within the
other. All cases of three intervals (including counter-examples to both movie
heuristics) can be systematically constructed by adding a third segment in
each possible way to these three instances.
Hunt for the weakness – If a proposed algorithm is of the form “always take
the biggest” (better known as the greedy algorithm), think about why that
might prove to be the wrong thing to do. In particular, . . .
Go for a tie – A devious way to break a greedy heuristic is to provide instances
where everything is the same size. Suddenly the heuristic has nothing to base
its decision on, and perhaps has the freedom to return something suboptimal
as the answer.
Seek extremes Many counter-examples are mixtures of huge and tiny, left
and right, few and many, near and far. It is usually easier to verify or rea-
son about extreme examples than more muddled ones. Consider two tightly
bunched clouds of points separated by a much larger distance d. The optimal
TSP tour will be essentially 2dregardless of the number of points, because
what happens within each cloud doesn’t really matter.
Take-Home Lesson: Searching for counterexamples is the best way to disprove
the correctness of a heuristic.
1.3 REASONING ABOUT CORRECTNESS 15
1.3.4 Induction and Recursion
Failure to find a counterexample to a given algorithm does not mean “it is obvious”
that the algorithm is correct. A proof or demonstration of correctness is needed.
Often mathematical induction is the method of choice.
When I first learned about mathematical induction it seemed like complete
magic. You proved a formula like n
i=1 i=n(n+1)/2 for some basis case like 1
or 2, then assumed it was true all the way to n1 before proving it was true for
general nusing the assumption. That was a proof? Ridiculous!
When I first learned the programming technique of recursion it also seemed like
complete magic. The program tested whether the input argument was some basis
case like 1 or 2. If not, you solved the bigger case by breaking it into pieces and
calling the subprogram itself to solve these pieces. That was a program? Ridiculous!
The reason both seemed like magic is because recursion is mathematical induc-
tion. In both, we have general and boundary conditions, with the general condition
breaking the problem into smaller and smaller pieces. The initial or boundary con-
dition terminates the recursion. Once you understand either recursion or induction,
you should be able to see why the other one also works.
I’ve heard it said that a computer scientist is a mathematician who only knows
how to prove things by induction. This is partially true because computer scientists
are lousy at proving things, but primarily because so many of the algorithms we
study are either recursive or incremental.
Consider the correctness of insertion sort, which we introduced at the beginning
of this chapter. The reason it is correct can be shown inductively:
The basis case consists of a single element, and by definition a one-element
array is completely sorted.
In general, we can assume that the first n1 elements of array Aare com-
pletely sorted after n1 iterations of insertion sort.
To insert one last element xto A, we find where it goes, namely the unique
spot between the biggest element less than or equal to xand the smallest
element greater than x. This is done by moving all the greater elements back
by one position, creating room for xin the desired location.
One must be suspicious of inductive proofs, however, because very subtle rea-
soning errors can creep in. The first are boundary errors. For example, our insertion
sort correctness proof above boldly stated that there was a unique place to insert
xbetween two elements, when our basis case was a single-element array. Greater
care is needed to properly deal with the special cases of inserting the minimum or
maximum elements.
The second and more common class of inductive proof errors concerns cavallier
extension claims. Adding one extra item to a given problem instance might cause
the entire optimal solution to change. This was the case in our scheduling problem
(see Figure 1.7). The optimal schedule after inserting a new segment may contain
16 1. INTRODUCTION TO ALGORITHM DESIGN
Figure 1.7: Large-scale changes in the optimal solution (boxes) after inserting a single interval
(dashed) into the instance
none of the segments of any particular optimal solution prior to insertion. Boldly
ignoring such difficulties can lead to very convincing inductive proofs of incorrect
algorithms.
Take-Home Lesson: Mathematical induction is usually the right way to verify
the correctness of a recursive or incremental insertion algorithm.
Stop and Think: Incremental Correctness
Problem: Prove the correctness of the following recursive algorithm for increment-
ing natural numbers, i.e. yy+1:
Increment(y)
if y=0then return(1) else
if (ymod 2) = 1 then
return(2 ·Increment(y/2))
else return(y+1)
Solution: The correctness of this algorithm is certainly not obvious to me. But as
it is recursive and I am a computer scientist, my natural instinct is to try to prove
it by induction.
The basis case of y= 0 is obviously correctly handled. Clearly the value 1 is
returned, and 0 + 1 = 1.
Now assume the function works correctly for the general case of y=n1. Given
this, we must demonstrate the truth for the case of y=n. Half of the cases are
easy, namely the even numbers (For which (ymod 2) = 0), since y+ 1 is explicitly
returned.
For the odd numbers, the answer depends upon what is returned by
Increment(y/2). Here we want to use our inductive assumption, but it isn’t
quite right. We have assumed that increment worked correctly for y=n1, but
not for a value which is about half of it. We can fix this problem by strengthening
our assumption to declare that the general case holds for all yn1. This costs us
nothing in principle, but is necessary to establish the correctness of the algorithm.
1.3 REASONING ABOUT CORRECTNESS 17
Now, the case of odd y(i.e. y=2m+ 1 for some integer m) can be dealt with
as:
2·Increment((2m+1)/2)=2·Increment(m+1/2)
=2·Increment(m)
=2(m+1)
=2m+2=y+1
and the general case is resolved.
1.3.5 Summations
Mathematical summation formulae arise often in algorithm analysis, which we will
study in Chapter 2. Further, proving the correctness of summation formulae is a
classic application of induction. Several exercises on inductive proofs of summations
appear as exercises at the end this chapter. To make these more accessible, I review
the basics of summations here.
Summation formula are concise expressions describing the addition of an arbi-
trarily large set of numbers, in particular the formula
n
i=1
f(i)=f(1) + f(2) + ...+f(n)
There are simple closed forms for summations of many algebraic functions. For
example, since nones is n,
n
i=1
1=n
The sum of the first nintegers can be seen by pairing up the ith and (ni+ 1)th
integers:
n
i=1
i=
n/2
i=1
(i+(ni+ 1)) = n(n+1)/2
Recognizing two basic classes of summation formulae will get you a long way
in algorithm analysis:
Arithmetic progressions – We already encountered arithmetic progressions
when we saw S(n)=n
ii=n(n+1)/2 in the analysis of selection sort.
From the big picture perspective, the important thing is that the sum is
quadratic, not that the constant is 1/2. In general,
S(n, p)=
n
i
ip(np+1)
18 1. INTRODUCTION TO ALGORITHM DESIGN
for p1. Thus the sum of squares is cubic, and the sum of cubes is quartic
(if you use such a word). The “big Theta” notation (Θ(x)) will be properly
explained in Section 2.2.
For p<1, this sum always converges to a constant, even as n→∞.The
interesting case is between results in ...
Geometric series – In geometric progressions, the index of the loop effects
the exponent, i.e.
G(n, a)=
n
i=0
ai=a(an+1 1)/(a1)
How we interpret this sum depends upon the base of the progression, i.e. a.
When a<1, this converges to a constant even as n→∞.
This series convergence proves to be the great “free lunch” of algorithm anal-
ysis. It means that the sum of a linear number of things can be constant, not
linear. For example, 1 + 1/2+1/4+1/8+...2 no matter how many terms
we add up.
When a>1, the sum grows rapidly with each new term, as in 1 + 2 + 4 +
8 + 16 + 32 = 63. Indeed, G(n, a)=Θ(an+1)fora>1.
Stop and Think: Factorial Formulae
Problem: Prove that n
i=1 i×i!=(n+ 1)! 1 by induction.
Solution: The inductive paradigm is straightforward. First verify the basis case
(here we do n= 1, although n= 0 would be even more general):
1
i=1
i×i! = 1 = (1 + 1)! 1=21=1
Now assume the statement is true up to n. To prove the general case of n+1,
observe that rolling out the largest term
n+1
i=1
i×i!=(n+1)×(n+ 1)! +
n
i=1
i×i!
reveals the left side of our inductive assumption. Substituting the right side gives
us
n+1
i=1
i×i!=(n+1)×(n+ 1)! + (n+ 1)! 1
1.4 MODELING THE PROBLEM 19
=(n+ 1)! ×((n+1)+1)1
=(n+ 2)! 1
This general trick of separating out the largest term from the summation to
reveal an instance of the inductive assumption lies at the heart of all such proofs.
1.4 Modeling the Problem
Modeling is the art of formulating your application in terms of precisely described,
well-understood problems. Proper modeling is the key to applying algorithmic de-
sign techniques to real-world problems. Indeed, proper modeling can eliminate the
need to design or even implement algorithms, by relating your application to what
has been done before. Proper modeling is the key to effectively using the “Hitch-
hiker’s Guide” in Part II of this book.
Real-world applications involve real-world objects. You might be working on a
system to route traffic in a network, to find the best way to schedule classrooms
in a university, or to search for patterns in a corporate database. Most algorithms,
however, are designed to work on rigorously defined abstract structures such as
permutations, graphs, and sets. To exploit the algorithms literature, you must
learn to describe your problem abstractly, in terms of procedures on fundamental
structures.
1.4.1 Combinatorial Objects
Odds are very good that others have stumbled upon your algorithmic problem
before you, perhaps in substantially different contexts. But to find out what is
known about your particular “widget optimization problem,” you can’t hope to
look in a book under widget. You must formulate widget optimization in terms of
computing properties of common structures such as:
Permutations – which are arrangements, or orderings, of items. For example,
{1,4,3,2}and {4,3,2,1}are two distinct permutations of the same set of four
integers. We have already seen permutations in the robot optimization prob-
lem, and in sorting. Permutations are likely the object in question whenever
your problem seeks an “arrangement,” “tour,” “ordering,” or “sequence.”
Subsets – which represent selections from a set of items. For example, {1,3,4}
and {2}are two distinct subsets of the first four integers. Order does not
matter in subsets the way it does with permutations, so the subsets {1,3,4}
and {4,3,1}would be considered identical. We saw subsets arise in the movie
scheduling problem. Subsets are likely the object in question whenever your
problem seeks a “cluster,” “collection,” “committee,” “group,” “packaging,”
or “selection.”
20 1. INTRODUCTION TO ALGORITHM DESIGN
Sol
Steve Len Jim Lisa JeffRichardRob Laurie
Morris Eve Sid
Stony Brook
Orient Point
Montauk
Shelter Island
Sag Harbor
Riverhead
Islip
Greenport
Figure 1.8: Modeling real-world structures with trees and graphs
Trees – which represent hierarchical relationships between items. Figure
1.8(a) shows part of the family tree of the Skiena clan. Trees are likely the
object in question whenever your problem seeks a “hierarchy,” “dominance
relationship,” “ancestor/descendant relationship,” or “taxonomy.”
Graphs – which represent relationships between arbitrary pairs of objects.
Figure 1.8(b) models a network of roads as a graph, where the vertices are
cities and the edges are roads connecting pairs of cities. Graphs are likely
the object in question whenever you seek a “network,” “circuit,” “web,” or
“relationship.”
Points – which represent locations in some geometric space. For example,
the locations of McDonald’s restaurants can be described by points on a
map/plane. Points are likely the object in question whenever your problems
work on “sites,” “positions,” “data records,” or “locations.”
Polygons – which represent regions in some geometric spaces. For example,
the borders of a country can be described by a polygon on a map/plane.
Polygons and polyhedra are likely the object in question whenever you are
working on “shapes,” “regions,” “configurations,” or “boundaries.”
Strings – which represent sequences of characters or patterns. For example,
the names of students in a class can be represented by strings. Strings are
likely the object in question whenever you are dealing with “text,” “charac-
ters,” “patterns,” or “labels.”
These fundamental structures all have associated algorithm problems, which are
presented in the catalog of Part II. Familiarity with these problems is important,
because they provide the language we use to model applications. To become fluent
in this vocabulary, browse through the catalog and study the input and output pic-
tures for each problem. Understanding these problems, even at a cartoon/definition
level, will enable you to know where to look later when the problem arises in your
application.
1.4 MODELING THE PROBLEM 21
A L G O R I T H M
4+{1,4,2,3}
9+{1,2,7}{1,2,7,9}
{4,1,5,2,3}
L G O R I T H MA
Figure 1.9: Recursive decompositions of combinatorial objects. (left column) Permutations,
subsets, trees, and graphs. (right column) Point sets, polygons, and strings
Examples of successful application modeling will be presented in the war stories
spaced throughout this book. However, some words of caution are in order. The act
of modeling reduces your application to one of a small number of existing problems
and structures. Such a process is inherently constraining, and certain details might
not fit easily into the given target problem. Also, certain problems can be modeled
in several different ways, some much better than others.
Modeling is only the first step in designing an algorithm for a problem. Be alert
for how the details of your applications differ from a candidate model, but don’t
be too quick to say that your problem is unique and special. Temporarily ignoring
details that don’t fit can free the mind to ask whether they really were fundamental
in the first place.
Take-Home Lesson: Modeling your application in terms of well-defined struc-
tures and algorithms is the most important single step towards a solution.
1.4.2 Recursive Objects
Learning to think recursively is learning to look for big things that are made from
smaller things of exactly the same type as the big thing. If you think of houses as
sets of rooms, then adding or deleting a room still leaves a house behind.
Recursive structures occur everywhere in the algorithmic world. Indeed, each
of the abstract structures described above can be thought about recursively. You
just have to see how you can break them down, as shown in Figure 1.9:
Permutations – Delete the first element of a permutation of {1,...,n}things
and you get a permutation of the remaining n1 things. Permutations are
recursive objects.
22 1. INTRODUCTION TO ALGORITHM DESIGN
Subsets – Every subset of the elements {1,...,n}contains a subset of
{1,...,n 1}made visible by deleting element nif it is present. Subsets
are recursive objects.
Trees – Delete the root of a tree and what do you get? A collection of smaller
trees. Delete any leaf of a tree and what do you get? A slightly smaller tree.
Trees are recursive objects.
Graphs – Delete any vertex from a graph, and you get a smaller graph. Now
divide the vertices of a graph into two groups, left and right. Cut through
all edges which span from left to right, and what do you get? Two smaller
graphs, and a bunch of broken edges. Graphs are recursive objects.
Points – Take a cloud of points, and separate them into two groups by drawing
a line. Now you have two smaller clouds of points. Point sets are recursive
objects.
Polygons – Inserting any internal chord between two nonadjacent vertices of
a simple polygon on nvertices cuts it into two smaller polygons. Polygons
are recursive objects.
Strings – Delete the first character from a string, and what do you get? A
shorter string. Strings are recursive objects.
Recursive descriptions of objects require both decomposition rules and basis
cases, namely the specification of the smallest and simplest objects where the de-
composition stops. These basis cases are usually easily defined. Permutations and
subsets of zero things presumably look like {}. The smallest interesting tree or
graph consists of a single vertex, while the smallest interesting point cloud consists
of a single point. Polygons are a little trickier; the smallest genuine simple polygon
is a triangle. Finally, the empty string has zero characters in it. The decision of
whether the basis case contains zero or one element is more a question of taste and
convenience than any fundamental principle.
Such recursive decompositions will come to define many of the algorithms we
will see in this book. Keep your eyes open for them.
1.5 About the War Stories
The best way to learn how careful algorithm design can have a huge impact on per-
formance is to look at real-world case studies. By carefully studying other people’s
experiences, we learn how they might apply to our work.
Scattered throughout this text are several of my own algorithmic war stories,
presenting our successful (and occasionally unsuccessful) algorithm design efforts
on real applications. I hope that you will be able to internalize these experiences
so that they will serve as models for your own attacks on problems.
1.6 WAR STORY: PSYCHIC MODELING 23
Every one of the war stories is true. Of course, the stories improve somewhat in
the retelling, and the dialogue has been punched up to make them more interesting
to read. However, I have tried to honestly trace the process of going from a raw
problem to a solution, so you can watch how this process unfolded.
The Oxford English Dictionary defines an algorist as “one skillful in reckonings
or figuring.” In these stories, I have tried to capture some of the mindset of the
algorist in action as they attack a problem.
The various war stories usually involve at least one, and often several, problems
from the problem catalog in Part II. I reference the appropriate section of the
catalog when such a problem occurs. This emphasizes the benefits of modeling
your application in terms of standard algorithm problems. By using the catalog,
you will be able to pull out what is known about any given problem whenever it is
needed.
1.6 War Story: Psychic Modeling
The call came for me out of the blue as I sat in my office.
“Professor Skiena, I hope you can help me. I’m the President of Lotto Systems
Group Inc., and we need an algorithm for a problem arising in our latest product.”
“Sure,” I replied. After all, the dean of my engineering school is always encour-
aging our faculty to interact more with industry.
“At Lotto Systems Group, we market a program designed to improve our cus-
tomers’ psychic ability to predict winning lottery numbers.1In a standard lottery,
each ticket consists of six numbers selected from, say, 1 to 44. Thus, any given
ticket has only a very small chance of winning. However, after proper training, our
clients can visualize, say, 15 numbers out of the 44 and be certain that at least four
of them will be on the winning ticket. Are you with me so far?”
“Probably not,” I replied. But then I recalled how my dean encourages us to
interact with industry.
“Our problem is this. After the psychic has narrowed the choices down to 15
numbers and is certain that at least 4 of them will be on the winning ticket, we
must find the most efficient way to exploit this information. Suppose a cash prize
is awarded whenever you pick at least three of the correct numbers on your ticket.
We need an algorithm to construct the smallest set of tickets that we must buy in
order to guarantee that we win at least one prize.”
“Assuming the psychic is correct?”
“Yes, assuming the psychic is correct. We need a program that prints out a list
of all the tickets that the psychic should buy in order to minimize their investment.
Can you help us?”
Maybe they did have psychic ability, for they had come to the right place. Iden-
tifying the best subset of tickets to buy was very much a combinatorial algorithm
1Yes, this is a true story.
24 1. INTRODUCTION TO ALGORITHM DESIGN
35
34
25
24
23
15
14
13
12
45
Figure 1.10: Covering all pairs of {1,2,3,4,5}with tickets {1,2,3},{1,4,5},{2,4,5},{3,4,5}
problem. It was going to be some type of covering problem, where each ticket we
buy was going to “cover” some of the possible 4-element subsets of the psychic’s
set. Finding the absolute smallest set of tickets to cover everything was a special
instance of the NP-complete problem set cover (discussed in Section 18.1 (page
621)), and presumably computationally intractable.
It was indeed a special instance of set cover, completely specified by only four
numbers: the size nof the candidate set S(typically n15), the number of slots
kfor numbers on each ticket (typically k6), the number of psychically-promised
correct numbers jfrom S(say j= 4), and finally, the number of matching numbers
lnecessary to win a prize (say l= 3). Figure 1.10 illustrates a covering of a smaller
instance, where n=5,j=k=3,andl=2.
“Although it will be hard to find the exact minimum set of tickets to buy, with
heuristics I should be able to get you pretty close to the cheapest covering ticket
set,” I told him. “Will that be good enough?”
“So long as it generates better ticket sets than my competitor’s program, that
will be fine. His system doesn’t always guarantee a win. I really appreciate your
help on this, Professor Skiena.”
“One last thing. If your program can train people to pick lottery winners, why
don’t you use it to win the lottery yourself?”
“I look forward to talking to you again real soon, Professor Skiena. Thanks for
the help.”
I hung up the phone and got back to thinking. It seemed like the perfect project
to give to a bright undergraduate. After modeling it in terms of sets and subsets,
the basic components of a solution seemed fairly straightforward:
1.6 WAR STORY: PSYCHIC MODELING 25
We needed the ability to generate all subsets of knumbers from the candidate
set S. Algorithms for generating and ranking/unranking subsets of sets are
presented in Section 14.5 (page 452).
We needed the right formulation of what it meant to have a covering set of
purchased tickets. The obvious criteria would be to pick a small set of tickets
such that we have purchased at least one ticket containing each of the n
l
l-subsets of Sthat might pay off with the prize.
We needed to keep track of which prize combinations we have thus far cov-
ered. We seek tickets to cover as many thus-far-uncovered prize combinations
as possible. The currently covered combinations are a subset of all possible
combinations. Data structures for subsets are discussed in Section 12.5 (page
385). The best candidate seemed to be a bit vector, which would answer in
constant time “is this combination already covered?”
We needed a search mechanism to decide which ticket to buy next. For small
enough set sizes, we could do an exhaustive search over all possible sub-
sets of tickets and pick the smallest one. For larger problems, a randomized
search process like simulated annealing (see Section 7.5.3 (page 254)) would
select tickets-to-buy to cover as many uncovered combinations as possible.
By repeating this randomized procedure several times and picking the best
solution, we would be likely to come up with a good set of tickets.
Excluding the details of the search mechanism, the pseudocode for the book-
keeping looked something like this:
LottoTicketSet(n, k, l)
Initialize the n
l-element bit-vector Vto all false
While there exists a false entry in V
Select a k-subset Tof {1,...,n}as the next ticket to buy
For each of the l-subsets Tiof T,V[rank(Ti)] = true
Report the set of tickets bought
The bright undergraduate, Fayyaz Younas, rose to the challenge. Based on
this framework, he implemented a brute-force search algorithm and found optimal
solutions for problems with n5 in a reasonable time. He implemented a random
search procedure to solve larger problems, tweaking it for a while before settling on
the best variant. Finally, the day arrived when we could call Lotto Systems Group
and announce that we had solved the problem.
“Our program found an optimal solution for n= 15, k=6,j=6,l= 3 meant
buying 28 tickets.”
“Twenty-eight tickets!” complained the president. “You must have a bug. Look,
these five tickets will suffice to cover everything twice over: {2,4,8,10,13,14},
{4,5,7,8,12,15},{1,2,3,6,11,13},{3,5,6,9,10,15},{1,7,9,11,12,14}.”
26 1. INTRODUCTION TO ALGORITHM DESIGN
35
34
25
24
23
15
14
13
12
45
Figure 1.11: Guaranteeing a winning pair from {1,2,3,4,5}using only tickets {1,2,3}and
{1,4,5}
We fiddled with this example for a while before admitting that he was right. We
hadn’t modeled the problem correctly! In fact, we didn’t need to explicitly cover all
possible winning combinations. Figure 1.11 illustrates the principle by giving a two-
ticket solution to our previous four-ticket example. Such unpromising outcomes as
{2,3,4}and {3,4,5}each agree in one matching pair with tickets from Figure 1.11.
We were trying to cover too many combinations, and the penny-pinching psychics
were unwilling to pay for such extravagance.
Fortunately, this story has a happy ending. The general outline of our search-
based solution still holds for the real problem. All we must fix is which subsets
we get credit for covering with a given set of tickets. After this modification, we
obtained the kind of results they were hoping for. Lotto Systems Group gratefully
accepted our program to incorporate into their product, and hopefully hit the
jackpot with it.
The moral of this story is to make sure that you model the problem correctly
before trying to solve it. In our case, we came up with a reasonable model, but
didn’t work hard enough to validate it before we started to program. Our misin-
terpretation would have become obvious had we worked out a small example by
hand and bounced it off our sponsor before beginning work. Our success in recov-
ering from this error is a tribute to the basic correctness of our initial formulation,
and our use of well-defined abstractions for such tasks as (1) ranking/unranking
k-subsets, (2) the set data structure, and (3) combinatorial search.
1.7 EXERCISES 27
Chapter Notes
Every decent algorithm book reflects the design philosophy of its author. For stu-
dents seeking alternative presentations and viewpoints, we particularly recommend
the books of Corman, et. al [CLRS01], Kleinberg/Tardos [KT06], and Manber
[Man89].
Formal proofs of algorithm correctness are important, and deserve a fuller dis-
cussion than we are able to provide in this chapter. See Gries [Gri89] for a thorough
introduction to the techniques of program verification.
The movie scheduling problem represents a very special case of the general inde-
pendent set problem, which is discussed in Section 16.2 (page 528). The restriction
limits the allowable input instances to interval graphs, where the vertices of the
graph Gcan be represented by intervals on the line and (i, j) is an edge of Giff
the intervals overlap. Golumbic [Gol04] provides a full treatment of this interesting
and important class of graphs.
Jon Bentley’s Programming Pearls columns are probably the best known col-
lection of algorithmic “war stories.” Originally published in the Communications
of the ACM, they have been collected in two books [Ben90, Ben99]. Brooks’s The
Mythical Man Month [Bro95] is another wonderful collection of war stories, focused
more on software engineering than algorithm design, but they remain a source of
considerable wisdom. Every programmer should read all these books, for pleasure
as well as insight.
Our solution to the lotto ticket set covering problem is presented in more detail
in [YS96].
1.7 Exercises
Finding Counterexamples
1-1. [3] Show that a+bcanbelessthanmin(a, b).
1-2. [3] Show that a×bcanbelessthanmin(a, b).
1-3. [5] Design/draw a road network with two points aand bsuch that the fastest route
between aand bis not the shortest route.
1-4. [5] Design/draw a road network with two points aand bsuch that the shortest
route between aand bis not the route with the fewest turns.
1-5. [4] The knapsack problem is as follows: given a set of integers S={s1,s
2,...,s
n},
and a target number T, find a subset of Swhich adds up exactly to T. For example,
there exists a subset within S={1,2,5,9,10}that adds up to T= 22 but not
T= 23.
Find counterexamples to each of the following algorithms for the knapsack problem.
That is, giving an Sand Tsuch that the subset is selected using the algorithm does
not leave the knapsack completely full, even though such a solution exists.
28 1. INTRODUCTION TO ALGORITHM DESIGN
(a) Put the elements of Sin the knapsack in left to right order if they fit, i.e. the
first-fit algorithm.
(b) Put the elements of Sin the knapsack from smallest to largest, i.e. the best-fit
algorithm.
(c) Put the elements of Sin the knapsack from largest to smallest.
1-6. [5] The set cover problem is as follows: given a set of subsets S1, ..., Smof the
universal set U={1, ..., n}, find the smallest subset of subsets TSsuch that
tiTti=U. For example, there are the following subsets, S1={1,3,5},S2=
{2,4},S3={1,4},andS4={2,5}The set cover would then be S1and S2.
Find a counterexample for the following algorithm: Select the largest subset for the
cover, and then delete all its elements from the universal set. Repeat by adding the
subset containing the largest number of uncovered elements until all are covered.
Proofs of Correctness
1-7. [3] Prove the correctness of the following recursive algorithm to multiply two
natural numbers, for all integer constants c2.
function multiply(y, z)
comment Return the product yz.
1. if z=0then return(0) else
2. return(multiply(cy, z/c)+y·(zmod c))
1-8. [3] Prove the correctness of the following algorithm for evaluating a polynomial.
P(x) = anxn+an1xn1+...+a1x+a0
function horner(A, x)
p=An
for ifrom n1to0
p=px+Ai
return p
1-9. [3] Prove the correctness of the following sorting algorithm.
function bubblesort (A: list[1 ...n])
var int i, j
for ifrom nto 1
for jfrom 1 to i1
if (A[j]>A[j+1])
swap the values of A[j]andA[j+1]
Induction
1-10. [3] Prove that n
i=1 i=n(n+1)/2forn0, by induction.
1-11. [3] Prove that n
i=1 i2=n(n+ 1)(2n+1)/6forn0, by induction.
1-12. [3] Prove that n
i=1 i3=n2(n+1)
2/4forn0, by induction.
1-13. [3] Prove that
n
i=1
i(i+1)(i+2)=n(n+1)(n+2)(n+3)/4
1.7 EXERCISES 29
1-14. [5] Prove by induction on n1 that for every a=1,
n
i=0
ai=an+1 1
a1
1-15. [3] Prove by induction that for n1,
n
i=1
1
i(i+1) =n
n+1
1-16. [3] Prove by induction that n3+2nis divisible by 3 for all n0.
1-17. [3] Prove by induction that a tree with nvertices has exactly n1edges.
1-18. [3] Prove by mathematical induction that the sum of the cubes of the first n
positive integers is equal to the square of the sum of these integers, i.e.
n
i=1
i3=(
n
i=1
i)2
Estimation
1-19. [3] Do all the books you own total at least one million pages? How many total
pages are stored in your school library?
1-20. [3] How many words are there in this textbook?
1-21. [3] How many hours are one million seconds? How many days? Answer these
questions by doing all arithmetic in your head.
1-22. [3] Estimate how many cities and towns there are in the United States.
1-23. [3] Estimate how many cubic miles of water flow out of the mouth of the Mississippi
River each day. Do not look up any supplemental facts. Describe all assumptions
you made in arriving at your answer.
1-24. [3] Is disk drive access time normally measured in milliseconds (thousandths of a
second) or microseconds (millionths of a second)? Does your RAM memory access
a word in more or less than a microsecond? How many instructions can your CPU
execute in one year if the machine is left running all the time?
1-25. [4] A sorting algorithm takes 1 second to sort 1,000 items on your local machine.
How long will it take to sort 10,000 items. . .
(a) if you believe that the algorithm takes time proportional to n2,and
(b) if you believe that the algorithm takes time roughly proportional to nlog n?
Implementation Projects
1-26. [5] Implement the two TSP heuristics of Section 1.1 (page 5). Which of them gives
better-quality solutions in practice? Can you devise a heuristic that works better
than both of them?
1-27. [5] Describe how to test whether a given set of tickets establishes sufficient coverage
in the Lotto problem of Section 1.6 (page 23). Write a program to find good ticket
sets.
30 1. INTRODUCTION TO ALGORITHM DESIGN
Interview Problems
1-28. [5] Write a function to perform integer division without using either the /or *
operators. Find a fast way to do it.
1-29. [5] There are 25 horses. At most, 5 horses can race together at a time. You must
determine the fastest, second fastest, and third fastest horses. Find the minimum
number of races in which this can be done.
1-30. [3] How many piano tuners are there in the entire world?
1-31. [3] How many gas stations are there in the United States?
1-32. [3] How much does the ice in a hockey rink weigh?
1-33. [3] How many miles of road are there in the United States?
1-34. [3] On average, how many times would you have to flip open the Manhattan phone
book at random in order to find a specific name?
Programming Challenges
These programming challenge problems with robot judging are available at
http://www.programming-challenges.com or http://online-judge.uva.es.
1-1. “The 3n+ 1 Problem” – Programming Challenges 110101, UVA Judge 100.
1-2. “The Trip” – Programming Challenges 110103, UVA Judge 10137.
1-3. “Australian Voting” – Programming Challenges 110108, UVA Judge 10142.
2
Algorithm Analysis
Algorithms are the most important and durable part of computer science because
they can be studied in a language- and machine-independent way. This means that
we need techniques that enable us to compare the efficiency of algorithms without
implementing them. Our two most important tools are (1) the RAM model of
computation and (2) the asymptotic analysis of worst-case complexity.
Assessing algorithmic performance makes use of the “big Oh” notation that,
proves essential to compare algorithms and design more efficient ones. While the
hopelessly practical person may blanch at the notion of theoretical analysis, we
present the material because it really is useful in thinking about algorithms.
This method of keeping score will be the most mathematically demanding part
of this book. But once you understand the intuition behind these ideas, the for-
malism becomes a lot easier to deal with.
2.1 The RAM Model of Computation
Machine-independent algorithm design depends upon a hypothetical computer
called the Random Access Machine or RAM. Under this model of computation,
we are confronted with a computer where:
Each simple operation (+, *, –, =, if, call) takes exactly one time step.
Loops and subroutines are not considered simple operations. Instead, they
are the composition of many single-step operations. It makes no sense for
sort to be a single-step operation, since sorting 1,000,000 items will certainly
take much longer than sorting 10 items. The time it takes to run through a
loop or execute a subprogram depends upon the number of loop iterations or
the specific nature of the subprogram.
S.S. Skiena, The Algorithm Design Manual, 2nd ed., DOI: 10.1007/978-1-84800-070-4 2,
c
Springer-Verlag London Limited 2008
32 2. ALGORITHM ANALYSIS
Each memory access takes exactly one time step. Further, we have as much
memory as we need. The RAM model takes no notice of whether an item is
in cache or on the disk.
Under the RAM model, we measure run time by counting up the number of
steps an algorithm takes on a given problem instance. If we assume that our RAM
executes a given number of steps per second, this operation count converts naturally
to the actual running time.
The RAM is a simple model of how computers perform. Perhaps it sounds too
simple. After all, multiplying two numbers takes more time than adding two num-
bers on most processors, which violates the first assumption of the model. Fancy
compiler loop unrolling and hyperthreading may well violate the second assump-
tion. And certainly memory access times differ greatly depending on whether data
sits in cache or on the disk. This makes us zero for three on the truth of our basic
assumptions.
And yet, despite these complaints, the RAM proves an excellent model for
understanding how an algorithm will perform on a real computer. It strikes a fine
balance by capturing the essential behavior of computers while being simple to
work with. We use the RAM model because it is useful in practice.
Every model has a size range over which it is useful. Take, for example, the
model that the Earth is flat. You might argue that this is a bad model, since it
has been fairly well established that the Earth is in fact round. But, when laying
the foundation of a house, the flat Earth model is sufficiently accurate that it can
be reliably used. It is so much easier to manipulate a flat-Earth model that it is
inconceivable that you would try to think spherically when you don’t have to.1
The same situation is true with the RAM model of computation. We make
an abstraction that is generally very useful. It is quite difficult to design an algo-
rithm such that the RAM model gives you substantially misleading results. The
robustness of the RAM enables us to analyze algorithms in a machine-independent
way.
Take-Home Lesson: Algorithms can be understood and studied in a language-
and machine-independent manner.
2.1.1 Best, Worst, and Average-Case Complexity
Using the RAM model of computation, we can count how many steps our algorithm
takes on any given input instance by executing it. However, to understand how good
or bad an algorithm is in general, we must know how it works over all instances.
To understand the notions of the best, worst, and average-case complexity,
think about running an algorithm over all possible instances of data that can be
1The Earth is not completely spherical either, but a spherical Earth provides a useful model for such things
as longitude and latitude.
2.1 THE RAM MODEL OF COMPUTATION 33
1 2 3 4 . . . . . . N
.
.
Problem Size
Best Case
Average Case
Worst Caseof Steps
Number
Figure 2.1: Best, worst, and average-case complexity
fed to it. For the problem of sorting, the set of possible input instances consists of
all possible arrangements of nkeys, over all possible values of n. We can represent
each input instance as a point on a graph (shown in Figure 2.1) where the x-axis
represents the size of the input problem (for sorting, the number of items to sort),
and the y-axis denotes the number of steps taken by the algorithm in this instance.
These points naturally align themselves into columns, because only integers
represent possible input size (e.g., it makes no sense to sort 10.57 items). We can
define three interesting functions over the plot of these points:
The worst-case complexity of the algorithm is the function defined by the
maximum number of steps taken in any instance of size n. This represents
the curve passing through the highest point in each column.
The best-case complexity of the algorithm is the function defined by the min-
imum number of steps taken in any instance of size n. This represents the
curve passing through the lowest point of each column.
The average-case complexity of the algorithm, which is the function defined
by the average number of steps over all instances of size n.
The worst-case complexity proves to be most useful of these three measures in
practice. Many people find this counterintuitive. To illustrate why, try to project
what will happen if you bring ndollars into a casino to gamble. The best case,
that you walk out owning the place, is possible but so unlikely that you should not
even think about it. The worst case, that you lose all ndollars, is easy to calculate
and distressingly likely to happen. The average case, that the typical bettor loses
87.32% of the money that he brings to the casino, is difficult to establish and its
meaning subject to debate. What exactly does average mean? Stupid people lose
34 2. ALGORITHM ANALYSIS
more than smart people, so are you smarter or stupider than the average person,
and by how much? Card counters at blackjack do better on average than customers
who accept three or more free drinks. We avoid all these complexities and obtain
a very useful result by just considering the worst case.
The important thing to realize is that each of these time complexities define a
numerical function, representing time versus problem size. These functions are as
well defined as any other numerical function, be it y=x22x+ 1 or the price
of Google stock as a function of time. But time complexities are such complicated
functions that we must simplify them to work with them. For this, we need the
“Big Oh” notation.
2.2 The Big Oh Notation
The best, worst, and average-case time complexities for any given algorithm are
numerical functions over the size of possible problem instances. However, it is very
difficult to work precisely with these functions, because they tend to:
Have too many bumps – An algorithm such as binary search typically runs
a bit faster for arrays of size exactly n=2
k1 (where kis an integer),
because the array partitions work out nicely. This detail is not particularly
significant, but it warns us that the exact time complexity function for any
algorithm is liable to be very complicated, with little up and down bumps as
shown in Figure 2.2.
Require too much detail to specify precisely – Counting the exact number
of RAM instructions executed in the worst case requires the algorithm be
specified to the detail of a complete computer program. Further, the precise
answer depends upon uninteresting coding details (e.g., did he use a case
statement or nested ifs?). Performing a precise worst-case analysis like
T(n) = 12754n2+ 4353n+ 834 lg2n+ 13546
would clearly be very difficult work, but provides us little extra information
than the observation that “the time grows quadratically with n.”
It proves to be much easier to talk in terms of simple upper and lower bounds
of time-complexity functions using the Big Oh notation. The Big Oh simplifies
our analysis by ignoring levels of detail that do not impact our comparison of
algorithms.
The Big Oh notation ignores the difference between multiplicative constants.
The functions f(n)=2nand g(n)=nare identical in Big Oh analysis. This
makes sense given our application. Suppose a given algorithm in (say) C language
ran twice as fast as one with the same algorithm written in Java. This multiplicative
2.2 THE BIG OH NOTATION 35
0
n
f(n)
n
.......
4321
lower bound
upper bound
Figure 2.2: Upper and lower bounds valid for n>n
0smooth out the behavior of complex
functions
factor of two tells us nothing about the algorithm itself, since both programs imple-
ment exactly the same algorithm. We ignore such constant factors when comparing
two algorithms.
The formal definitions associated with the Big Oh notation are as follows:
f(n)=O(g(n)) means c·g(n)isanupper bound on f(n). Thus there exists
some constant csuch that f(n) is always c·g(n), for large enough n(i.e. ,
nn0for some constant n0).
f(n)=Ω(g(n)) means c·g(n)isalower bound on f(n). Thus there exists
some constant csuch that f(n) is always c·g(n), for all nn0.
f(n)=Θ(g(n)) means c1·g(n) is an upper bound on f(n)andc2·g(n)is
a lower bound on f(n), for all nn0. Thus there exist constants c1and c2
such that f(n)c1·g(n)andf(n)c2·g(n). This means that g(n) provides
a nice, tight bound on f(n).
Got it? These definitions are illustrated in Figure 2.3. Each of these definitions
assumes a constant n0beyond which they are always satisfied. We are not concerned
about small values of n(i.e. , anything to the left of n0). After all, we don’t really
care whether one sorting algorithm sorts six items faster than another, but seek
which algorithm proves faster when sorting 10,000 or 1,000,000 items. The Big Oh
notation enables us to ignore details and focus on the big picture.
Take-Home Lesson: The Big Oh notation and worst-case analysis are tools
that greatly simplify our ability to compare the efficiency of algorithms.
Make sure you understand this notation by working through the following ex-
amples. We choose certain constants (cand n0) in the explanations below because
36 2. ALGORITHM ANALYSIS
(c)
f(n)
c2*g(n)
n
n0
c1*g(n)
c*g(n)
f(n)
n
n0
f(n)
c*g(n)
n
n0
(b)
(a)
Figure 2.3: Illustrating the big (a) O, (b) Ω, and (c) Θ notations
they work and make a point, but other pairs of constants will do exactly the same
job. You are free to choose any constants that maintain the same inequality—ideally
constants that make it obvious that the inequality holds:
3n2100n+6=O(n2), because I choose c=3and3n2>3n2100n+6;
3n2100n+6=O(n3), because I choose c=1andn3>3n2100n+6whenn>3;
3n2100n+6=O(n), because for any cIchoosec×n<3n2when n>c;
3n2100n+6=Ω(n2), because I choose c=2and2n2<3n2100n+6when n>100;
3n2100n+6(n3), because I choose c=3and3n2100n+6<n
3when n>3;
3n2100n+6=Ω(n), because for any cIchoosecn < 3n2100n+6 whenn>100c;
3n2100n+6=Θ(n2), because both Oand Ω apply;
3n2100n+6(n3), because only Oapplies;
3n2100n+6(n), because only Ω applies.
The Big Oh notation provides for a rough notion of equality when comparing
functions. It is somewhat jarring to see an expression like n2=O(n3), but its
meaning can always be resolved by going back to the definitions in terms of upper
and lower bounds. It is perhaps most instructive to read the “=” here as meaning
one of the functions that are. Clearly, n2is one of functions that are O(n3).
2.3 GROWTH RATES AND DOMINANCE RELATIONS 37
Stop and Think: Back to the Definition
Problem: Is 2n+1 (2
n)?
Solution: Designing novel algorithms requires cleverness and inspiration. However,
applying the Big Oh notation is best done by swallowing any creative instincts
you may have. All Big Oh problems can be correctly solved by going back to the
definition and working with that.
Is 2n+1 =O(2n)?Well, f(n)=O(g(n)) iff (if and only if) there exists a
constant csuch that for all sufficiently large nf(n)c·g(n). Is there? The
key observation is that 2n+1 =2·2n,so2·2nc·2nfor any c2.
Is 2n+1 = Ω(2n)?Go back to the definition. f(n)=Ω(g(n)) iff there exists
a constant c>0 such that for all sufficiently large nf(n)c·g(n). This
would be satisfied for any 0 <c2. Together the Big Oh and Ω bounds
imply 2n+1 (2
n)
Stop and Think: Hip to the Squares?
Problem: Is (x+y)2=O(x2+y2).
Solution: Working with the Big Oh means going back to the definition at the
slightest sign of confusion. By definition, this expression is valid iff we can find
some csuch that (x+y)2c(x2+y2).
My first move would be to expand the left side of the equation, i.e. (x+y)2=
x2+2xy+y2. If the middle 2xy term wasn’t there, the inequality would clearly hold
for any c>1. But it is there, so we need to relate the 2xy to x2+y2. What if xy?
Then 2xy 2y22(x2+y2). What if xy? Then 2xy 2x22(x2+y2). Either
way, we now can bound this middle term by two times the right-side function. This
means that (x+y)23(x2+y2), and so the result holds.
2.3 Growth Rates and Dominance Relations
With the Big Oh notation, we cavalierly discard the multiplicative constants. Thus,
the functions f(n)=0.001n2and g(n) = 1000n2are treated identically, even
though g(n) is a million times larger than f(n) for all values of n.
38 2. ALGORITHM ANALYSIS
nf(n)lg n n n lg n n22nn!
10 0.003 μs0.01 μs0.033 μs0.1 μs 1 μs3.63 ms
20 0.004 μs0.02 μs0.086 μs0.4 μs1ms 77.1 years
30 0.005 μs0.03 μs0.147 μs0.9 μs1sec 8.4×1015 yrs
40 0.005 μs0.04 μs0.213 μs1.6 μs18.3 min
50 0.006 μs0.05 μs0.282 μs2.5 μs13 days
100 0.007 μs0.1 μs0.644 μs10 μs 4 ×1013 yrs
1,000 0.010 μs1.00 μs9.966 μs1ms
10,000 0.013 μs10 μs130 μs100 ms
100,000 0.017 μs0.10 ms 1.67 ms 10 sec
1,000,000 0.020 μs1ms 19.93 ms 16.7 min
10,000,000 0.023 μs0.01 sec 0.23 sec 1.16 days
100,000,000 0.027 μs0.10 sec 2.66 sec 115.7 days
1,000,000,000 0.030 μs1sec 29.90 sec 31.7 years
Figure 2.4: Growth rates of common functions measured in nanoseconds
The reason why we are content with coarse Big Oh analysis is provided by
Figure 2.4, which shows the growth rate of several common time analysis functions.
In particular, it shows how long algorithms that use f(n) operations take to run
on a fast computer, where each operation takes one nanosecond (109seconds).
The following conclusions can be drawn from this table:
All such algorithms take roughly the same time for n= 10.
Any algorithm with n! running time becomes useless for n20.
Algorithms whose running time is 2nhave a greater operating range, but
become impractical for n>40.
Quadratic-time algorithms whose running time is n2remain usable up to
about n=10,000, but quickly deteriorate with larger inputs. They are likely
to be hopeless for n>1,000,000.
Linear-time and nlg nalgorithms remain practical on inputs of one billion
items.
An O(lg n) algorithm hardly breaks a sweat for any imaginable value of n.
The bottom line is that even ignoring constant factors, we get an excellent idea
of whether a given algorithm is appropriate for a problem of a given size. An algo-
rithm whose running time is f(n)=n3seconds will beat one whose running time is
g(n) = 1,000,000 ·n2seconds only when n<1,000,000. Such enormous differences
in constant factors between algorithms occur far less frequently in practice than
large problems do.
2.3 GROWTH RATES AND DOMINANCE RELATIONS 39
2.3.1 Dominance Relations
The Big Oh notation groups functions into a set of classes, such that all the func-
tions in a particular class are equivalent with respect to the Big Oh. Functions
f(n)=0.34nand g(n) = 234,234nbelong in the same class, namely those that are
order Θ(n). Further, when two functions fand gbelong to different classes, they are
different with respect to our notation. Either f(n)=O(g(n)) or g(n)=O(f(n)),
but not both.
We say that a faster-growing function dominates a slower-growing one, just as
a faster-growing country eventually comes to dominate the laggard. When fand
gbelong to different classes (i.e., f(n)(g(n))), we say gdominates fwhen
f(n)=O(g(n)), sometimes written gf.
The good news is that only a few function classes tend to occur in the course
of basic algorithm analysis. These suffice to cover almost all the algorithms we will
discuss in this text, and are listed in order of increasing dominance:
Constant functions, f(n) = 1 – Such functions might measure the cost of
adding two numbers, printing out “The Star Spangled Banner,” or the growth
realized by functions such as f(n) = min(n, 100). In the big picture, there is
no dependence on the parameter n.
Logarithmic functions, f(n) = log n– Logarithmic time-complexity shows up
in algorithms such as binary search. Such functions grow quite slowly as n
gets big, but faster than the constant function (which is standing still, after
all). Logarithms will be discussed in more detail in Section 2.6 (page 46)
Linear functions, f(n)=n– Such functions measure the cost of looking at
each item once (or twice, or ten times) in an n-element array, say to identify
the biggest item, the smallest item, or compute the average value.
Superlinear functions, f(n)=nlg n– This important class of functions arises
in such algorithms as Quicksort and Mergesort. They grow just a little faster
than linear (see Figure 2.4), just enough to be a different dominance class.
Quadratic functions, f(n)=n2– Such functions measure the cost of looking
at most or all pairs of items in an n-element universe. This arises in algorithms
such as insertion sort and selection sort.
Cubic functions, f(n)=n3– Such functions enumerate through all triples of
items in an n-element universe. These also arise in certain dynamic program-
ming algorithms developed in Chapter 8.
Exponential functions, f(n)=cnforagivenconstantc>1 – Functions like
2narise when enumerating all subsets of nitems. As we have seen, exponential
algorithms become useless fast, but not as fast as. . .
Factorial functions, f(n)=n! – Functions like n! arise when generating all
permutations or orderings of nitems.
40 2. ALGORITHM ANALYSIS
The intricacies of dominance relations will be futher discussed in Section 2.9.2
(page 56). However, all you really need to understand is that:
n!2nn3n2nlog nnlog n1
Take-Home Lesson: Although esoteric functions arise in advanced algorithm
analysis, a small variety of time complexities suffice and account for most
algorithms that are widely used in practice.
2.4 Working with the Big Oh
You learned how to do simplifications of algebraic expressions back in high school.
Working with the Big Oh requires dusting off these tools. Most of what you learned
there still holds in working with the Big Oh, but not everything.
2.4.1 Adding Functions
The sum of two functions is governed by the dominant one, namely:
O(f(n)) + O(g(n)) O(max(f(n),g(n)))
Ω(f(n)) + Ω(g(n)) Ω(max(f(n),g(n)))
Θ(f(n)) + Θ(g(n)) Θ(max(f(n),g(n)))
This is very useful in simplifying expressions, since it implies that n3+n2+
n+1=O(n3). Everything is small potatoes besides the dominant term.
The intuition is as follows. At least half the bulk of f(n)+g(n) must come
from the larger value. The dominant function will, by definition, provide the larger
value as n→∞. Thus, dropping the smaller function from consideration reduces
the value by at most a factor of 1/2, which is just a multiplicative constant. Suppose
f(n)=O(n2)andg(n)=O(n2). This implies that f(n)+g(n)=O(n2) as well.
2.4.2 Multiplying Functions
Multiplication is like repeated addition. Consider multiplication by any constant
c>0, be it 1.02 or 1,000,000. Multiplying a function by a constant can not affect
its asymptotic behavior, because we can multiply the bounding constants in the
Big Oh analysis of c·f(n)by1/c to give appropriate constants for the Big Oh
analysis of f(n). Thus:
O(c·f(n)) O(f(n))
Ω(c·f(n)) Ω(f(n))
2.5 REASONING ABOUT EFFICIENCY 41
Θ(c·f(n)) Θ(f(n))
Of course, cmust be strictly positive (i.e., c>0) to avoid any funny business,
since we can wipe out even the fastest growing function by multiplying it by zero.
On the other hand, when two functions in a product are increasing, both are
important. The function O(n! log n) dominates n!justasmuchaslogndominates
1. In general,
O(f(n)) O(g(n)) O(f(n)g(n))
Ω(f(n)) Ω(g(n)) Ω(f(n)g(n))
Θ(f(n)) Θ(g(n)) Θ(f(n)g(n))
Stop and Think: Transitive Experience
Problem: Show that Big Oh relationships are transitive. That is, if f(n)=O(g(n))
and g(n)=O(h(n)), then f(n)=O(h(n)).
Solution: We always go back to the definition when working with the Big Oh. What
we need to show here is that f(n)c3h(n)forn>n
3given that f(n)c1g(n)and
g(n)c2h(n), for n>n
1and n>n
2, respectively. Cascading these inequalities,
we get that
f(n)c1g(n)c1c2h(n)
for n>n
3= max(n1,n
2).
2.5 Reasoning About Efficiency
Gross reasoning about an algorithm’s running time of is usually easy given a precise
written description of the algorithm. In this section, I will work through several
examples, perhaps in greater detail than necessary.
2.5.1 Selection Sort
Here we analyze the selection sort algorithm, which repeatedly identifies the small-
est remaining unsorted element and puts it at the end of the sorted portion of the
array. An animation of selection sort in action appears in Figure 2.5, and the code
is shown below:
42 2. ALGORITHM ANALYSIS
S E L E C T I O N S O R T
C E L E S T I O N S O R TC E L E S T I O N S O R T
C E L E S T I O N S O R T
C E E L S T I O N S O R TC E E L S T I O N S O R T
C E E I S T L O N S O R T
C E E I L T S O N S O R T
C E E I L N S O T S O R TC E E I L N S O T S O R T
C E E I L N O S T S O R TC E E I L N O S T S O R T
C E E I L N O O T S S R T
C E E I L N O O R S S T TC E E I L N O O R S S T T
C E E I L N O O R S S T T
C E E I L N O O R S S T TC E E I L N O O R S S T T
C E E I L N O O R S S T TC E E I L N O O R S S T T
C
E E I L
N
O
O
R
S
S
T T
Figure 2.5: Animation of selection sort in action
selection_sort(int s[], int n)
{
int i,j; /* counters */
int min; /* index of minimum */
for (i=0; i<n; i++) {
min=i;
for (j=i+1; j<n; j++)
if (s[j] < s[min]) min=j;
swap(&s[i],&s[min]);
}
}
The outer loop goes around ntimes. The nested inner loop goes around ni1
times, where iis the index of the outer loop. The exact number of times the if
statement is executed is given by:
S(n)=
n1
i=0
n1
j=i+1
1=
n1
i=0
ni1
What this sum is doing is adding up the integers in decreasing order starting from
n1, i.e.
S(n)=(n1) + (n2) + (n3) + ...+2+1
How can we reason about such a formula? We must solve the summation formula
using the techniques of Section 1.3.5 (page 17) to get an exact value. But, with
the Big Oh we are only interested in the order of the expression. One way to think
about it is that we are adding up n1 terms, whose average value is about n/2.
This yields S(n)n(n1)/2.
2.5 REASONING ABOUT EFFICIENCY 43
Another way to think about it is in terms of upper and lower bounds. We have
nterms at most, each of which is at most n1. Thus, S(n)n(n1) = O(n2).
We have n/2 terms each that are bigger than n/2. Thus S(n)=(n/2) ×(n/2) =
Ω(n2). Together, this tells us that the running time is Θ(n2), meaning that selection
sort is quadratic.
2.5.2 Insertion Sort
A basic rule of thumb in Big Oh analysis is that worst-case running time follows
from multiplying the largest number of times each nested loop can iterate. Consider
the insertion sort algorithm presented on page 4, whose inner loops are repeated
here:
for (i=1; i<n; i++) {
j=i;
while ((j>0) && (s[j] < s[j-1])) {
swap(&s[j],&s[j-1]);
j = j-1;
}
}
How often does the inner while loop iterate? This is tricky because there are
two different stopping conditions: one to prevent us from running off the bounds
of the array (j>0) and the other to mark when the element finds its proper place
in sorted order (s[j]<s[j1]). Since worst-case analysis seeks an upper bound
on the running time, we ignore the early termination and assume that this loop
always goes around itimes. In fact, we can assume it always goes around ntimes
since i<n. Since the outer loop goes around ntimes, insertion sort must be a
quadratic-time algorithm, i.e. O(n2).
This crude “round it up” analysis always does the job, in that the Big Oh
running time bound you get will always be correct. Occasionally, it might be too
generous, meaning the actual worst case time might be of a lower order than implied
by such analysis. Still, I strongly encourage this kind of reasoning as a basis for
simple algorithm analysis.
2.5.3 String Pattern Matching
Pattern matching is the most fundamental algorithmic operation on text strings.
This algorithm implements the find command available in any web browser or text
editor:
Problem: Substring Pattern Matching
Input: A text string tand a pattern string p.
Output: Does tcontain the pattern pas a substring, and if so where?
44 2. ALGORITHM ANALYSIS
a
a
b
a
b
b
a
a
a b b a
a b b
a b
Figure 2.6: Searching for the substring abba in the text aababba.
Perhaps you are interested finding where “Skiena” appears in a given news
article (well, I would be interested in such a thing). This is an instance of string
pattern matching with tas the news article and p=“Skiena.”
There is a fairly straightforward algorithm for string pattern matching that
considers the possibility that pmay start at each possible position in tand then
testsifthisisso.
int findmatch(char *p, char *t)
{
int i,j; /* counters */
int m, n; /* string lengths */
m = strlen(p);
n = strlen(t);
for (i=0; i<=(n-m); i=i+1) {
j=0;
while ((j<m) && (t[i+j]==p[j]))
j = j+1;
if (j == m) return(i);
}
return(-1);
}
What is the worst-case running time of these two nested loops? The inner while
loop goes around at most mtimes, and potentially far less when the pattern match
fails. This, plus two other statements, lies within the outer for loop. The outer loop
goes around at most nmtimes, since no complete alignment is possible once we
get too far to the right of the text. The time complexity of nested loops multiplies,
so this gives a worst-case running time of O((nm)(m+ 2)).
We did not count the time it takes to compute the length of the strings using
the function strlen. Since the implementation of strlen is not given, we must guess
how long it should take. If we explicitly count the number of characters until we
2.5 REASONING ABOUT EFFICIENCY 45
hit the end of the string; this would take time linear in the length of the string.
This suggests that the running time should be O(n+m+(nm)(m+ 2)).
Let’s use our knowledge of the Big Oh to simplify things. Since m+2=Θ(m),
the “+2” isn’t interesting, so we are left with O(n+m+(nm)m). Multiplying
this out yields O(n+m+nm m2), which still seems kind of ugly.
However, in any interesting problem we know that nm, since it is impossible
to have pas a substring of tfor any pattern longer than the text itself. One
consequence of this is that n+m2n(n). Thus our worst-case running time
simplifies further to O(n+nm m2).
Two more observations and we are done. First, note that nnm, since m1
in any interesting pattern. Thus n+nm (nm), and we can drop the additive
n, simplifying our analysis to O(nm m2).
Finally, observe that the m2term is negative, and thus only serves to lower
the value within. Since the Big Oh gives an upper bound, we can drop any negative
term without invalidating the upper bound. That nmimplies that mn m2,
so the negative term is not big enough to cancel any other term which is left. Thus
we can simply express the worst-case running time of this algorithm as O(nm).
After you get enough experience, you will be able to do such an algorithm
analysis in your head without even writing the algorithm down. After all, algorithm
design for a given task involves mentally rifling through different possibilities and
selecting the best approach. This kind of fluency comes with practice, but if you
are confused about why a given algorithm runs in O(f(n)) time, start by writing
it out carefully and then employ the reasoning we used in this section.
2.5.4 Matrix Multiplication
Nested summations often arise in the analysis of algorithms with nested loops.
Consider the problem of matrix multiplication:
Problem: Matrix Multiplication
Input: Two matrices, A(of dimension x×y)andB(dimension y×z).
Output: An x×zmatrix Cwhere C[i][j] is the dot product of the ith row of A
and the jth column of B.
Matrix multiplication is a fundamental operation in linear algebra, presented
with an example in catalog in Section 13.3 (page 401). That said, the elementary
algorithm for matrix multiplication is implemented as a tight product of three
nested loops:
for (i=1; i<=x; i++)
for (j=1; j<=y; j++) {
C[i][j] = 0;
for (k=1; k<=z; k++)
C[i][j] += A[i][k] * B[k][j];
}
46 2. ALGORITHM ANALYSIS
How can we analyze the time complexity of this algorithm? The number of
multiplications M(x, y, z) is given by the following summation:
M(x, y, z)=
x
i=1
y
j=1
z
k=1
1
Sums get evaluated from the right inward. The sum of zones is z,so
M(x, y, z)=
x
i=1
y
j=1
z
The sum of yzs is just as simple, yz,so
M(x, y, z)=
x
i=1
yz
Finally, the sum of xyzsisxyz.
Thus the running of this matrix multiplication algorithm is O(xyz). If we con-
sider the common case where all three dimensions are the same, this becomes
O(n3)—i.e. , a cubic algorithm.
2.6 Logarithms and Their Applications
Logarithm is an anagram of algorithm, but that’s not why we need to know what
logarithms are. You’ve seen the button on your calculator but may have forgotten
why it is there. A logarithm is simply an inverse exponential function. Saying bx=y
is equivalent to saying that x= logby. Further, this definition is the same as saying
blogby=y.
Exponential functions grow at a distressingly fast rate, as anyone who has
ever tried to pay off a credit card balance understands. Thus, inverse exponen-
tial functions—i.e. logarithms—grow refreshingly slowly. Logarithms arise in any
process where things are repeatedly halved. We now look at several examples.
2.6.1 Logarithms and Binary Search
Binary search is a good example of an O(log n) algorithm. To locate a particular
person pin a telephone book containing nnames, you start by comparing pagainst
the middle, or (n/2)nd name, say Monroe, Marilyn. Regardless of whether pbelongs
before this middle name (Dean, James) or after it (Presley, Elvis), after only one
comparison you can discard one half of all the names in the book. The number of
steps the algorithm takes equals the number of times we can halve nuntil only one
name is left. By definition, this is exactly log2n. Thus, twenty comparisons suffice
to find any name in the million-name Manhattan phone book!
Binary search is one of the most powerful ideas in algorithm design. This power
becomes apparent if we imagine being forced to live in a world with only unsorted
2.6 LOGARITHMS AND THEIR APPLICATIONS 47
Figure 2.7: A height htree with dchildren per node as dhleaves. Here h=2andd=3
telephone books. Figure 2.4 shows that O(log n) algorithms are fast enough to be
used on problem instances of essentially unlimited size.
2.6.2 Logarithms and Trees
A binary tree of height 1 can have up to 2 leaf nodes, while a tree of height two
can have up to four leaves. What is the height hof a rooted binary tree with nleaf
nodes? Note that the number of leaves doubles every time we increase the height
by one. To account for nleaves, n=2
hwhich implies that h= log2n.
What if we generalize to trees that have dchildren, where d= 2 for the case
of binary trees? A tree of height 1 can have up to dleaf nodes, while one of height
two can have up to d2leaves. The number of possible leaves multiplies by devery
time we increase the height by one, so to account for nleaves, n=dhwhich implies
that h= logdn, as shown in Figure 2.7.
The punch line is that very short trees can have very many leaves, which is
the main reason why binary trees prove fundamental to the design of fast data
structures.
2.6.3 Logarithms and Bits
There are two bit patterns of length 1 (0 and 1) and four of length 2 (00, 01, 10, and
11). How many bits wdo we need to represent any one of ndifferent possibilities,
be it one of nitems or the integers from 1 to n?
The key observation is that there must be at least ndifferent bit patterns of
length w. Since the number of different bit patterns doubles as you add each bit,
we need at least wbits where 2w=n—i.e., we need w= log2nbits.
2.6.4 Logarithms and Multiplication
Logarithms were particularly important in the days before pocket calculators. They
provided the easiest way to multiply big numbers by hand, either implicitly using
a slide rule or explicitly by using a book of logarithms.
48 2. ALGORITHM ANALYSIS
Logarithms are still useful for multiplication, particularly for exponentiation.
Recall that loga(xy) = loga(x) + loga(y); i.e. , the log of a product is the sum of
the logs. A direct consequence of this is
loganb=b·logan
So how can we compute abfor any aand busing the exp(x) and ln(x) functions
on your calculator, where exp(x)= exand ln(x) = loge(x)? We know
ab= exp(ln(ab)) = exp(bln a)
so the problem is reduced to one multiplication plus one call to each of these
functions.
2.6.5 Fast Exponentiation
Suppose that we need to exactly compute the value of anfor some reasonably
large n. Such problems occur in primality testing for cryptography, as discussed in
Section 13.8 (page 420). Issues of numerical precision prevent us from applying the
formula above.
The simplest algorithm performs n1 multiplications, by computing a×a×
...×a. However, we can do better by observing that n=n/2+n/2.Ifnis
even, then an=(an/2)2.Ifnis odd, then an=a(an/2)2. In either case, we have
halved the size of our exponent at the cost of, at most, two multiplications, so
O(lg n) multiplications suffice to compute the final value.
function power(a, n)
if (n= 0) return(1)
x= power(a, n/2)
if (nis even) then return(x2)
else return(a×x2)
This simple algorithm illustrates an important principle of divide and conquer.
It always pays to divide a job as evenly as possible. This principle applies to real
life as well. When nis not a power of two, the problem cannot always be divided
perfectly evenly, but a difference of one element between the two sides cannot cause
any serious imbalance.
2.6.6 Logarithms and Summations
The Harmonic numbers arise as a special case of arithmetic progression, namely
H(n)=S(n, 1). They reflect the sum of the progression of simple reciprocals,
namely,
H(n)=
n
i=1
1/i ln n
2.6 LOGARITHMS AND THEIR APPLICATIONS 49
Loss (apply the greatest) Increase in level
(A) $2,000 or less no increase
(B) More than $2,000 add 1
(C) More than $5,000 add 2
(D) More than $10,000 add 3
(E) More than $20,000 add 4
(F) More than $40,000 add 5
(G) More than $70,000 add 6
(H) More than $120,000 add 7
(I) More than $200,000 add 8
(J) More than $350,000 add 9
(K) More than $500,000 add 10
(L) More than $800,000 add 11
(M) More than $1,500,000 add 12
(N) More than $2,500,000 add 13
(O) More than $5,000,000 add 14
(P) More than $10,000,000 add 15
(Q) More than $20,000,000 add 16
(R) More than $40,000,000 add 17
(Q) More than $80,000,000 add 18
Figure 2.8: The Federal Sentencing Guidelines for fraud
The Harmonic numbers prove important because they usually explain “where
the log comes from” when one magically pops out from algebraic manipulation. For
example, the key to analyzing the average case complexity of Quicksort is the sum-
mation S(n)=nn
i=1 1/i. Employing the Harmonic number identity immediately
reduces this to Θ(nlog n).
2.6.7 Logarithms and Criminal Justice
Figure 2.8 will be our final example of logarithms in action. This table appears in
the Federal Sentencing Guidelines, used by courts throughout the United States.
These guidelines are an attempt to standardize criminal sentences, so that a felon
convicted of a crime before one judge receives the same sentence that they would
before a different judge. To accomplish this, the judges have prepared an intricate
point function to score the depravity of each crime and map it to time-to-serve.
Figure 2.8 gives the actual point function for fraud—a table mapping dollars
stolen to points. Notice that the punishment increases by one level each time the
amount of money stolen roughly doubles. That means that the level of punishment
(which maps roughly linearly to the amount of time served) grows logarithmically
with the amount of money stolen.
50 2. ALGORITHM ANALYSIS
Think for a moment about the consequences of this. Many a corrupt CEO
certainly has. It means that your total sentence grows extremely slowly with the
amount of money you steal. Knocking off five liquor stores for $10,000 each will
get you more time than embezzling $1,000,000 once. The corresponding benefit of
stealing really large amounts of money is even greater. The moral of logarithmic
growth is clear: “If you are gonna do the crime, make it worth the time!”
Take-Home Lesson: Logarithms arise whenever things are repeatedly halved
or doubled.
2.7 Properties of Logarithms
As we have seen, stating bx=yis equivalent to saying that x= logby.Thebterm
is known as the base of the logarithm. Three bases are of particular importance for
mathematical and historical reasons:
Base b=2Thebinary logarithm, usually denoted lg x, is a base 2 logarithm.
We have seen how this base arises whenever repeated halving (i.e., binary
search) or doubling (i.e., nodes in trees) occurs. Most algorithmic applications
of logarithms imply binary logarithms.
Base b=e–Thenatural log, usually denoted ln x,isabasee=2.71828 ...
logarithm. The inverse of ln xis the exponential function exp(x)=exon your
calculator. Thus, composing these functions gives us
exp(ln x)=x
Base b= 10 – Less common today is the base-10 or common logarithm,
usually denoted as log x. This base was employed in slide rules and logarithm
books in the days before pocket calculators.
We have already seen one important property of logarithms, namely that
loga(xy) = loga(x) + loga(y)
The other important fact to remember is that it is easy to convert a logarithm
from one base to another. This is a consequence of the formula:
logab=logcb
logca
Thus, changing the base of log bfrom base-ato base-csimply involves dividing by
logca. It is easy to convert a common log function to a natural log function, and
vice versa.
Two implications of these properties of logarithms are important to appreciate
from an algorithmic perspective:
2.8 WAR STORY: MYSTERY OF THE PYRAMIDS 51
The base of the logarithm has no real impact on the growth rate- Compare
the following three values: log2(1,000,000) = 19.9316, log3(1,000,000) =
12.5754, and log100(1,000,000) = 3. A big change in the base of the logarithm
produces little difference in the value of the log. Changing the base of the
log from ato cinvolves dividing by logca. This conversion factor is lost to
the Big Oh notation whenever aand care constants. Thus we are usually
justified in ignoring the base of the logarithm when analyzing algorithms.
Logarithms cut any function down to size- The growth rate of the logarithm
of any polynomial function is O(lg n). This follows because
loganb=b·logan
The power of binary search on a wide range of problems is a consequence
of this observation. Note that doing a binary search on a sorted array of
n2things requires only twice as many comparisons as a binary search on n
things.
Logarithms efficiently cut any function down to size. It is hard to do arith-
metic on factorials except for logarithms, since
n!=Π
n
i=1ilog n!=
n
i=1
log i(nlog n)
provides another way for logarithms to pop up in algorithm analysis.
StopandThink:ImportanceofanEvenSplit
Problem: How many queries does binary search take on the million-name Manhat-
tan phone book if each split was 1/3 to 2/3 instead of 1/2 to 1/2?
Solution: When performing binary searches in a telephone book, how important
is it that each query split the book exactly in half? Not much. For the Manhattan
telephone book, we now use log3/2(1,000,000) 35 queries in the worst case, not
a significant change from log2(1,000,000) 20. The power of binary search comes
from its logarithmic complexity, not the base of the log.
2.8 War Story: Mystery of the Pyramids
That look in his eyes should have warned me even before he started talking.
“We want to use a parallel supercomputer for a numerical calculation up to
1,000,000,000, but we need a faster algorithm to do it.”
52 2. ALGORITHM ANALYSIS
I’d seen that distant look before. Eyes dulled from too much exposure to the raw
horsepower of supercomputers—machines so fast that brute force seemed to elim-
inate the need for clever algorithms; at least until the problems got hard enough.
“I am working with a Nobel prize winner to use a computer on a famous problem
in number theory. Are you familiar with Waring’s problem?”
I knew some number theory. “Sure. Waring’s problem asks whether every integer
can be expressed at least one way as the sum of at most four integer squares. For
example, 78 = 82+3
2+2
2+1
2=7
2+5
2+2
2. I remember proving that four
squares suffice to represent any integer in my undergraduate number theory class.
Yes, it’s a famous problem but one that got solved about 200 years ago.”
“No, we are interested in a different version of Waring’s problem. A pyramidal
number is a number of the form (m3m)/6, for m2. Thus the first several
pyramidal numbers are 1, 4, 10, 20, 35, 56, 84, 120, and 165. The conjecture since
1928 is that every integer can be represented by the sum of at most five such
pyramidal numbers. We want to use a supercomputer to prove this conjecture on
all numbers from 1 to 1,000,000,000.”
“Doing a billion of anything will take a substantial amount of time,” I warned.
“The time you spend to compute the minimum representation of each number will
be critical, because you are going to do it one billion times. Have you thought
about what kind of an algorithm you are going to use?”
“We have already written our program and run it on a parallel supercomputer.
It works very fast on smaller numbers. Still, it takes much too much time as soon
as we get to 100,000 or so.”
Terrific, I thought. Our supercomputer junkie had discovered asymptotic
growth. No doubt his algorithm ran in something like quadratic time, and he got
burned as soon as ngot large.
“We need a faster program in order to get to one billion. Can you help us? Of
course, we can run it on our parallel supercomputer when you are ready.”
I am a sucker for this kind of challenge, finding fast algorithms to speed up
programs. I agreed to think about it and got down to work.
I started by looking at the program that the other guy had written. He had
built an array of all the Θ(n1/3) pyramidal numbers from 1 to ninclusive.2To
test each number kin this range, he did a brute force test to establish whether it
was the sum of two pyramidal numbers. If not, the program tested whether it was
the sum of three of them, then four, and finally five, until it first got an answer.
About 45% of the integers are expressible as the sum of three pyramidal numbers.
Most of the remaining 55% require the sum of four, and usually each of these can
be represented in many different ways. Only 241 integers are known to require the
sum of five pyramidal numbers, the largest being 343,867. For about half of the n
numbers, this algorithm presumably went through all of the three-tests and at least
2Why n1/3? Recall that pyramidal numbers are of the form (m3m)/6. The largest msuch that the
resulting number is at most nis roughly 3
6n,sotherear(n1/3) such numbers.
2.8 WAR STORY: MYSTERY OF THE PYRAMIDS 53
some of the four-tests before terminating. Thus, the total time for this algorithm
would be at least O(n×(n1/3)3)=O(n2) time, where n= 1,000,000,000. No
wonder his program cried “Uncle.”
Anything that was going to do significantly better on a problem this large had to
avoid explicitly testing all triples. For each value of k, we were seeking the smallest
set of pyramidal numbers that add up to exactly to k. This problem is called the
knapsack problem, and is discussed in Section 13.10 (page 427). In our case, the
weights are the set of pyramidal numbers no greater than n, with an additional
constraint that the knapsack holds exactly kitems.
A standard approach to solving knapsack precomputes the sum of smaller sub-
sets of the items for use in computing larger subsets. If we have a table of all sums
of two numbers and want to know whether kis expressible as the sum of three
numbers, we can ask whether kis expressible as the sum of a single number plus
a number in this two-table.
Therefore I needed a table of all integers less than nthat can be expressed as
the sum of two of the 1,818 pyramidal numbers less than 1,000,000,000. There can
be at most 1,8182= 3,305,124 of them. Actually, there are only about half this
many after we eliminate duplicates and any sum bigger than our target. Building
a sorted array storing these numbers would be no big deal. Let’s call this sorted
data structure of all pair-sums the two-table.
To find the minimum decomposition for a given k, I would first check whether
it was one of the 1,818 pyramidal numbers. If not, I would then check whether
kwas in the sorted table of the sums of two pyramidal numbers. To see whether
kwas expressible as the sum of three such numbers, all I had to do was check
whether kp[i] was in the two-table for 1 i1,818. This could be done
quickly using binary search. To see whether kwas expressible as the sum of four
pyramidal numbers, I had to check whether ktwo[i] was in the two-table for any
1i≤|two|. However, since almost every kwas expressible in many ways as the
sum of four pyramidal numbers, this test would terminate quickly, and the total
time taken would be dominated by the cost of the threes. Testing whether kwas
the sum of three pyramidal numbers would take O(n1/3lg n). Running this on each
of the nintegers gives an O(n4/3lg n) algorithm for the complete job. Comparing
this to his O(n2) algorithm for n= 1,000,000,000 suggested that my algorithm was
acool30,000 times faster than his original!
My first attempt to code this solved up to n=1,000,000 on my ancient Sparc
ELC in about 20 minutes. From here, I experimented with different data structures
to represent the sets of numbers and different algorithms to search these tables. I
tried using hash tables and bit vectors instead of sorted arrays, and experimented
with variants of binary search such as interpolation search (see Section 14.2 (page
441)). My reward for this work was solving up to n=1,000,000 in under three
minutes, a factor of six improvement over my original program.
With the real thinking done, I worked to tweak a little more performance out of
the program. I avoided doing a sum-of-four computation on any kwhen k1was
54 2. ALGORITHM ANALYSIS
the sum-of-three, since 1 is a pyramidal number, saving about 10% of the total run
time using this trick alone. Finally, I got out my profiler and tried some low-level
tricks to squeeze a little more performance out of the code. For example, I saved
another 10% by replacing a single procedure call with in line code.
At this point, I turned the code over to the supercomputer guy. What he did
with it is a depressing tale, which is reported in Section 7.10 (page 268).
In writing up this war story, I went back to rerun my program more than
ten years later. On my desktop SunBlade 150, getting to 1,000,000 now took 27.0
seconds using the gcc compiler without turning on any compiler optimization. With
Level 4 optimization, the job ran in just 14.0 seconds—quite a tribute to the quality
of the optimizer. The run time on my desktop machine improved by a factor of
about three over the four-year period prior to my first edition of this book, with
an additional 5.3 times over the last 11 years. These speedups are probably typical
for most desktops.
The primary lesson of this war story is to show the enormous potential for
algorithmic speedups, as opposed to the fairly limited speedup obtainable via more
expensive hardware. I sped his program up by about 30,000 times. His million-
dollar computer had 16 processors, each reportedly five times faster on integer
computations than the $3,000 machine on my desk. That gave a maximum potential
speedup of less than 100 times. Clearly, the algorithmic improvement was the big
winner here, as it is certain to be in any sufficiently large computation.
2.9 Advanced Analysis (*)
Ideally, each of us would be fluent in working with the mathematical techniques
of asymptotic analysis. And ideally, each of us would be rich and good looking as
well.
In this section I will survey the major techniques and functions employed in
advanced algorithm analysis. I consider this optional material—it will not be used
elsewhere in the textbook section of this book. That said, it will make some of the
complexity functions reported in the Hitchhiker’s Guide far less mysterious.
2.9.1 Esoteric Functions
The bread-and-butter classes of complexity functions were presented in Section
2.3.1 (page 39). More esoteric functions also make appearances in advanced algo-
rithm analysis. Although we will not see them much in this book, it is still good
business to know what they mean and where they come from:
Inverse Ackerman’s function f(n)=α(n) – This function arises in the de-
tailed analysis of several algorithms, most notably the Union-Find data struc-
ture discussed in Section 6.1.3 (page 198).
The exact definition of this function and why it arises will not be discussed
further. It is sufficient to think of it as geek talk for the slowest-growing
2.9 ADVANCED ANALYSIS (*) 55
complexity function. Unlike the constant function f(n) = 1, it eventually
gets to infinity as n→∞, but it certainly takes its time about it. The value
of α(n)<5 for any value of nthat can be written in this physical universe.
f(n) = log log n– The “log log” function is just that—the logarithm of the
logarithm of n. One natural example of how it might arise is doing a binary
search on a sorted array of only lg nitems.
f(n) = log n/ log log n– This function grows a little slower than log nbecause
it is divided by an even slower growing function.
To see where this arises, consider an n-leaf rooted tree of degree d. For binary
trees, i.e. when d= 2, the height his given
n=2
hh=lgn
by taking the logarithm of both sides of the equation. Now consider the height
of such a tree when the degree d= log n. Then
n= (log n)hh= log n/ log log n
f(n) = log2n– This is the product of log functions—i.e., (log n)×(log n). It
might arise if we wanted to count the bits looked at in doing a binary search
on nitems, each of which was an integer from 1 to (say) n2. Each such integer
requires a lg(n2)=2lgnbit representation, and we look at lg nof them, for
a total of 2 lg2nbits.
The “log squared” function typically arises in the design of intricate nested
data structures, where each node in (say) a binary tree represents another
data structure, perhaps ordered on a different key.
f(n)=n– The square root is not so esoteric, but represents the class of
“sublinear polynomials” since n=n1/2. Such functions arise in building
d-dimensional grids that contain npoints. A n×nsquare has area n,
and an n1/3×n1/3×n1/3cube has volume n. In general, a d-dimensional
hypercube of length n1/d on each side has volume d.
f(n)=n(1+)– Epsilon () is the mathematical symbol to denote a constant
that can be made arbitrarily small but never quite goes away.
It arises in the following way. Suppose I design an algorithm that runs in
2cn(1+1/c)time, and I get to pick whichever cI want. For c=2,thisis4n3/2
or O(n3/2). For c=3,thisis8n4/3or O(n4/3), which is better. Indeed, the
exponent keeps getting better the larger I make c.
The problem is that I cannot make carbitrarily large before the 2cterm
begins to dominate. Instead, we report this algorithm as running in O(n1+),
and leave the best value of to the beholder.
56 2. ALGORITHM ANALYSIS
2.9.2 Limits and Dominance Relations
The dominance relation between functions is a consequence of the theory of lim-
its, which you may recall from Calculus. We say that f(n)dominates g(n)if
limn→∞ g(n)/f(n)=0.
Let’s see this definition in action. Suppose f(n)=2n2and g(n)=n2. Clearly
f(n)>g(n) for all n, but it does not dominate since
lim
n→∞ g(n)/f(n) = lim
n→∞ n2/2n2= lim
n→∞ 1/2=0
This is to be expected because both functions are in the class Θ(n2). What about
f(n)=n3and g(n)=n2? Since
lim
n→∞ g(n)/f(n) = lim
n→∞ n2/n3= lim
n→∞ 1/n =0
the higher-degree polynomial dominates. This is true for any two polynomials,
namely that nadominates nbif a>bsince
lim
n→∞ nb/na= lim
n→∞ nba0
Thus n1.2dominates n1.1999999.
Now consider two exponential functions, say f(n)=3
nand g(n)=2
n. Since
lim
n→∞ g(n)/f(n)=2
n/3n= lim
n→∞(2/3)n=0
the exponential with the higher base dominates.
Our ability to prove dominance relations from scratch depends upon our ability
to prove limits. Let’s look at one important pair of functions. Any polynomial (say
f(n)=n) dominates logarithmic functions (say g(n)=lgn). Since n=2
lg n,
f(n)=(2
lg n)=2
lg n
Now consider
lim
n→∞ g(n)/f(n)=lgn/2lg n
In fact, this does go to 0 as n→∞.
Take-Home Lesson: By interleaving the functions here with those of Section
2.3.1 (page 39), we see where everything fits into the dominance pecking order:
n!cnn3n2n1+nlog nnn
log2nlog nlog n/ log log nlog log nα(n)1
Chapter Notes
Most other algorithm texts devote considerably more efforts to the formal analysis
of algorithms than we have here, and so we refer the theoretically-inclined reader
2.10 EXERCISES 57
elsewhere for more depth. Algorithm texts more heavily stressing analysis include
[CLRS01, KT06].
The book Concrete Mathematics by Knuth, Graham, and Patashnik [GKP89]
offers an interesting and thorough presentation of mathematics for the analysis of
algorithms. Niven and Zuckerman [NZ80] is an nice introduction to number theory,
including Waring’s problem, discussed in the war story.
The notion of dominance also gives rise to the “Little Oh” notation. We say that
f(n)=o(g(n)) iff g(n) dominates f(n). Among other things, the Little Oh proves
useful for asking questions. Asking for an o(n2) algorithm means you want one
that is better than quadratic in the worst case—and means you would be willing
to settle for O(n1.999 log2n).
2.10 Exercises
Program Analysis
2-1. [3] What value is returned by the following function? Express your answer as a
function of n. Give the worst-case running time using the Big Oh notation.
function mystery(n)
r:= 0
for i:= 1 to n1do
for j:= i+1to ndo
for k:= 1 to jdo
r:= r+1
return(r)
2-2. [3] What value is returned by the following function? Express your answer as a
function of n. Give the worst-case running time using Big Oh notation.
function pesky(n)
r:= 0
for i:= 1 to ndo
for j:= 1 to ido
for k:= jto i+jdo
r:= r+1
return(r)
2-3. [5] What value is returned by the following function? Express your answer as a
function of n. Give the worst-case running time using Big Oh notation.
function prestiferous(n)
r:= 0
for i:= 1 to ndo
for j:= 1 to ido
for k:= jto i+jdo
for l:= 1 to i+jkdo
58 2. ALGORITHM ANALYSIS
r:= r+1
return(r)
2-4. [8] What value is returned by the following function? Express your answer as a
function of n. Give the worst-case running time using Big Oh notation.
function conundrum(n)
r:= 0
for i:= 1 to ndo
for j:= i+1to ndo
for k:= i+j1to ndo
r:= r+1
return(r)
2-5. [5] Suppose the following algorithm is used to evaluate the polynomial
p(x)=anxn+an1xn1+...+a1x+a0
p:= a0;
xpower := 1;
for i:= 1 to ndo
xpower := xxpower;
p:= p+aixpower
end
(a) How many multiplications are done in the worst-case? How many additions?
(b) How many multiplications are done on the average?
(c) Can you improve this algorithm?
2-6. [3] Prove that the following algorithm for computing the maximum value in an
array A[1..n] is correct.
function max(A)
m:= A[1]
for i:= 2 to ndo
if A[i]>mthen m:= A[i]
return (m)
Big Oh
2-7. [3] True or False?
(a) Is 2n+1 =O(2n)?
(b) Is 22n=O(2n)?
2-8. [3] For each of the following pairs of functions, either f(n)isinO(g(n)), f(n)is
in Ω(g(n)), or f(n)=Θ(g(n)). Determine which relationship is correct and briefly
explain why.
(a) f(n)=logn2;g(n)=logn+5
2.10 EXERCISES 59
(b) f(n)=n;g(n)=logn2
(c) f(n)=log
2n;g(n)=logn
(d) f(n)=n;g(n)=log
2n
(e) f(n)=nlog n+n;g(n)=logn
(f) f(n) = 10; g(n)=log10
(g) f(n)=2
n;g(n)=10n2
(h) f(n)=2
n;g(n)=3
n
2-9. [3] For each of the following pairs of functions f(n)andg(n), determine whether
f(n)=O(g(n)), g(n)=O(f(n)), or both.
(a) f(n)=(n2n)/2, g(n)=6n
(b) f(n)=n+2
n,g(n)=n2
(c) f(n)=nlog n,g(n)=nn/2
(d) f(n)=n+logn,g(n)=n
(e) f(n) = 2(log n)2,g(n)=logn+1
(f) f(n)=4nlog n+n,g(n)=(n2n)/2
2-10. [3] Prove that n33n2n+1=Θ(n3).
2-11. [3] Prove that n2=O(2n).
2-12. [3] For each of the following pairs of functions f(n)andg(n), give an appropriate
positive constant csuch that f(n)c·g(n) for all n>1.
(a) f(n)=n2+n+1, g(n)=2n3
(b) f(n)=nn+n2,g(n)=n2
(c) f(n)=n2n+1, g(n)=n2/2
2-13. [3] Prove that if f1(n)=O(g1(n)) and f2(n)=O(g2(n)), then f1(n)+f2(n)=
O(g1(n)+g2(n)).
2-14. [3] Prove that if f1(N)=Ω(g1(n)) and f2(n)=Ω(g2(n)), then f1(n)+f2(n)=
Ω(g1(n)+g2(n)).
2-15. [3] Prove that if f1(n)=O(g1(n)) and f2(n)=O(g2(n)), then f1(n)·f2(n)=
O(g1(n)·g2(n))
2-16. [5] Prove for all k1 and all sets of constants {ak,a
k1,...,a
1,a
0}∈R,
aknk+ak1nk1+.... +a1n+a0=O(nk)
2-17. [5] Show that for any real constants aand b,b>0
(n+a)b(nb)
2-18. [5] List the functions below from the lowest to the highest order. If any two or
more are of the same order, indicate which.
60 2. ALGORITHM ANALYSIS
n2nnlg nln n
nn3+7n5lg nne
n
n2+lgnn
22n1lg lg n
n3(lg n)2n!n1+εwhere 0 <ε<1
2-19. [5] List the functions below from the lowest to the highest order. If any two or
more are of the same order, indicate which.
nn 2n
nlog nnn3+7n5n2+logn
n2n3log n
n1
3+logn(log n)2n!
ln nn
log nlog log n
(1/3)n(3/2)n6
2-20. [5] Find two functions f(n)andg(n) that satisfy the following relationship. If no
such fand gexist, write “None.”
(a) f(n)=o(g(n)) and f(n)(g(n))
(b) f(n)=Θ(g(n)) and f(n)=o(g(n))
(c) f(n)=Θ(g(n)) and f(n)=O(g(n))
(d) f(n)=Ω(g(n)) and f(n)=O(g(n))
2-21. [5] True or False?
(a) 2n2+1=O(n2)
(b) n=O(log n)
(c) log n=O(n)
(d) n2(1 + n)=O(n2log n)
(e) 3n2+n=O(n2)
(f) nlog n=O(n)
(g) log n=O(n1/2)
2-22. [5] For each of the following pairs of functions f(n)andg(n), state whether f(n)=
O(g(n)), f(n)=Ω(g(n)), f(n)=Θ(g(n)), or none of the above.
(a) f(n)=n2+3n+4,g(n)=6n+7
(b) f(n)=nn,g(n)=n2n
(c) f(n)=2
nn2,g(n)=n4+n2
2-23. [3] For each of these questions, briefly explain your answer.
(a) If I prove that an algorithm takes O(n2) worst-case time, is it possible that it
takes O(n) on some inputs?
(b) If I prove that an algorithm takes O(n2) worst-case time, is it possible that it
takes O(n) on all inputs?
(c) If I prove that an algorithm takes Θ(n2) worst-case time, is it possible that it
takes O(n) on some inputs?
2.10 EXERCISES 61
(d) If I prove that an algorithm takes Θ(n2) worst-case time, is it possible that it
takes O(n) on all inputs?
(e) Is the function f(n)=Θ(n2), where f(n) = 100n2for even nand f(n)=
20n2nlog2nfor odd n?
2-24. [3] For each of the following, answer yes,no,orcan’t tell. Explain your reasoning.
(a) Is 3n=O(2n)?
(b) Is log 3n=O(log 2n)?
(c) Is 3n(2
n)?
(d) Is log 3n= Ω(log 2n)?
2-25. [5] For each of the following expressions f(n) find a simple g(n) such that f(n)=
Θ(g(n)).
(a) f(n)=n
i=1
1
i.
(b) f(n)=n
i=11
i.
(c) f(n)=n
i=1 log i.
(d) f(n)=log(n!).
2-26. [5] Place the following functions into increasing asymptotic order.
f1(n)=n2log2n,f2(n)=n(log2n)2,f3(n)=n
i=0 2i,f4(n)=log
2(n
i=0 2i).
2-27. [5] Place the following functions into increasing asymptotic order. If two or more
of the functions are of the same asymptotic order then indicate this.
f1(n)=n
i=1 i,f2(n)=(
n)logn,f3(n)=nlog n,f4(n)=12n3
2+4n,
2-28. [5] For each of the following expressions f(n) find a simple g(n) such that
f(n)=Θ(g(n)). (You should be able to prove your result by exhibiting the rel-
evant parameters, but this is not required for the homework.)
(a) f(n)=n
i=1 3i4+2i319i+ 20.
(b) f(n)=n
i=1 3(4i) + 2(3i)i19 + 20.
(c) f(n)=n
i=1 5i+3
2i.
2-29. [5] Which of the following are true?
(a) n
i=1 3i(3
n1).
(b) n
i=1 3i(3
n).
(c) n
i=1 3i(3
n+1).
2-30. [5] For each of the following functions ffind a simple function gsuch that f(n)=
Θ(g(n)).
(a) f1(n) = (1000)2n+4
n.
(b) f2(n)=n+nlog n+n.
(c) f3(n)=log(n20)+(logn)10.
(d) f4(n)=(0.99)n+n100.
62 2. ALGORITHM ANALYSIS
2-31. [5] For each pair of expressions (A, B) below, indicate whether Ais O,o,ω,or
ΘofB. Note that zero, one or more of these relations may hold for a given pair;
list all correct ones.
AB
(a)n100 2n
(b)(lgn)12 n
(c)nn
cos(πn/8)
(d)10
n100n
(e)nlg n(lg n)n
(f)lg(n!) nlg n
Summations
2-32. [5] Prove that:
1222+3
242+...+(1)k1k2=(1)k1k(k+1)/2
2-33. [5] Find an expression for the sum of the ith row of the following triangle, and
prove its correctness. Each entry is the sum of the three entries directly above it.
All non existing entries are considered 0.
1
111
12321
1367631
1 4 10 16 19 16 10 4 1
2-34. [3] Assume that Christmas has ndays. Exactly how many presents did my “true
love” send me? (Do some research if you do not understand this question.)
2-35. [5] Consider the following code fragment.
for i=1 to n do
for j=i to 2*i do
output ‘‘foobar’’
Let T(n) denote the number of times ‘foobar’ is printed as a function of n.
a. Express T(n) as a summation (actually two nested summations).
b. Simplify the summation. Show your work.
2-36. [5] Consider the following code fragment.
for i=1 to n/2 do
for j=i to n-i do
for k=1 to j do
output ‘‘foobar’’
Assume nis even. Let T(n) denote the number of times ‘foobar’ is printed as a
function of n.
(a) Express T(n) as three nested summations.
(b) Simplify the summation. Show your work.
2.10 EXERCISES 63
2-37. [6] When you first learned to multiply numbers, you were told that x×ymeans
adding xa total of ytimes, so 5×4 = 5+5+5+5 = 20. What is the time complexity
of multiplying two n-digit numbers in base b(people work in base 10, of course,
while computers work in base 2) using the repeated addition method, as a function
of nand b. Assume that single-digit by single-digit addition or multiplication takes
O(1) time. (Hint: how big can ybe as a function of nand b?)
2-38. [6] In grade school, you learned to multiply long numbers on a digit-by-digit basis,
so that 127 ×211 = 127 ×1 + 127 ×10 + 127 ×200 = 26,397. Analyze the time
complexity of multiplying two n-digit numbers with this method as a function of
n(assume constant base size). Assume that single-digit by single-digit addition or
multiplication takes O(1) time.
Logarithms
2-39. [5] Prove the following identities on logarithms:
(a) Prove that loga(xy)=log
ax+log
ay
(b) Prove that logaxy=ylogax
(c) Prove that logax=logbx
logba
(d) Prove that xlogby=ylogbx
2-40. [3] Show that lg(n+1)=lg n+1
2-41. [3] Prove that that the binary representation of n1haslg2n+1bits.
2-42. [5] In one of my research papers I give a comparison-based sorting algorithm that
runs in O(nlog(n)). Given the existence of an Ω(nlog n) lower bound for sorting,
how can this be possible?
Interview Problems
2-43. [5] You are given a set Sof nnumbers. You must pick a subset Sof knumbers from
Ssuch that the probability of each element of Soccurring in Sis equal (i.e., each
is selected with probability k/n). You may make only one pass over the numbers.
What if nis unknown?
2-44. [5] We have 1,000 data items to store on 1,000 nodes. Each node can store copies
of exactly three different items. Propose a replication scheme to minimize data loss
as nodes fail. What is the expected number of data entries that get lost when three
random nodes fail?
2-45. [5] Consider the following algorithm to find the minimum element in an array
of numbers A[0,...,n]. One extra variable tmp is allocated to hold the current
minimum value. Start from A[0]; ”tmp” is compared against A[1], A[2], ..., A[N]
in order. When A[i]<tmp,tmp =A[i]. What is the expected number of times that
the assignment operation tmp =A[i] is performed?
2-46. [5] You have a 100-story building and a couple of marbles. You must identify the
lowest floor for which a marble will break if you drop it from this floor. How fast
can you find this floor if you are given an infinite supply of marbles? What if you
have only two marbles?
64 2. ALGORITHM ANALYSIS
2-47. [5] You are given 10 bags of gold coins. Nine bags contain coins that each weigh 10
grams. One bag contains all false coins that weigh one gram less. You must identify
this bag in just one weighing. You have a digital balance that reports the weight of
what is placed on it.
2-48. [5] You have eight balls all of the same size. Seven of them weigh the same, and one
of them weighs slightly more. How can you find the ball that is heavier by using a
balance and only two weighings?
2-49. [5] Suppose we start with ncompanies that eventually merge into one big company.
How many different ways are there for them to merge?
2-50. [5] ARamanujam number can be written two different ways as the sum of two
cubes—i.e. , there exist distinct a,b,c,anddsuch that a3+b3=c3+d3. Generate
all Ramanujam numbers where a, b, c, d < n.
2-51. [7] Six pirates must divide $300 dollars among themselves. The division is to proceed
as follows. The senior pirate proposes a way to divide the money. Then the pirates
vote. If the senior pirate gets at least half the votes he wins, and that division
remains. If he doesn’t, he is killed and then the next senior-most pirate gets a
chance to do the division. Now you have to tell what will happen and why (i.e.
, how many pirates survive and how the division is done)? All the pirates are
intelligent and the first priority is to stay alive and the next priority is to get as
much money as possible.
2-52. [7] Reconsider the pirate problem above, where only one indivisible dollar is to be
divided. Who gets the dollar and how many are killed?
Programming Challenges
These programming challenge problems with robot judging are available at
http://www.programming-challenges.com or http://online-judge.uva.es.
2-1. “Primary Arithmetic” – Programming Challenges 110501, UVA Judge 10035.
2-2. “A Multiplication Game” – Programming Challenges 110505, UVA Judge 847.
2-3. “Light, More Light” – Programming Challenges 110701, UVA Judge 10110.
3
Data Structures
Changing a data structure in a slow program can work the same way an organ
transplant does in a sick patient. Important classes of abstract data types such as
containers, dictionaries, and priority queues, have many different but functionally
equivalent data structures that implement them. Changing the data structure does
not change the correctness of the program, since we presumably replace a correct
implementation with a different correct implementation. However, the new imple-
mentation of the data type realizes different tradeoffs in the time to execute various
operations, so the total performance can improve dramatically. Like a patient in
need of a transplant, only one part might need to be replaced in order to fix the
problem.
But it is better to be born with a good heart than have to wait for a replace-
ment. The maximum benefit from good data structures results from designing your
program around them in the first place. We assume that the reader has had some
previous exposure to elementary data structures and pointer manipulation. Still,
data structure (CS II) courses these days focus more on data abstraction and ob-
ject orientation than the nitty-gritty of how structures should be represented in
memory. We will review this material to make sure you have it down.
In data structures, as with most subjects, it is more important to really un-
derstand the basic material than have exposure to more advanced concepts. We
will focus on each of the three fundamental abstract data types (containers, dic-
tionaries, and priority queues) and see how they can be implemented with arrays
and lists. Detailed discussion of the tradeoffs between more sophisticated imple-
mentations is deferred to the relevant catalog entry for each of these data types.
S.S. Skiena, The Algorithm Design Manual, 2nd ed., DOI: 10.1007/978-1-84800-070-4 3,
c
Springer-Verlag London Limited 2008
66 3. DATA STRUCTURES
3.1 Contiguous vs. Linked Data Structures
Data structures can be neatly classified as either contiguous or linked, depending
upon whether they are based on arrays or pointers:
Contiguously-allocated structures are composed of single slabs of memory, and
include arrays, matrices, heaps, and hash tables.
Linked data structures are composed of distinct chunks of memory bound
together by pointers, and include lists, trees, and graph adjacency lists.
In this section, we review the relative advantages of contiguous and linked data
structures. These tradeoffs are more subtle than they appear at first glance, so I
encourage readers to stick with me here even if you may be familiar with both
types of structures.
3.1.1 Arrays
The array is the fundamental contiguously-allocated data structure. Arrays are
structures of fixed-size data records such that each element can be efficiently located
by its index or (equivalently) address.
A good analogy likens an array to a street full of houses, where each array
element is equivalent to a house, and the index is equivalent to the house number.
Assuming all the houses are equal size and numbered sequentially from 1 to n,we
can compute the exact position of each house immediately from its address.1
Advantages of contiguously-allocated arrays include:
Constant-time access given the index – Because the index of each element
maps directly to a particular memory address, we can access arbitrary data
items instantly provided we know the index.
Space efficiency Arrays consist purely of data, so no space is wasted with
links or other formatting information. Further, end-of-record information is
not needed because arrays are built from fixed-size records.
Memory locality – A common programming idiom involves iterating through
all the elements of a data structure. Arrays are good for this because they
exhibit excellent memory locality. Physical continuity between successive data
accesses helps exploit the high-speed cache memory on modern computer
architectures.
The downside of arrays is that we cannot adjust their size in the middle of
a program’s execution. Our program will fail soon as we try to add the (n+
1Houses in Japanese cities are traditionally numbered in the order they were built, not by their physical
location. This makes it extremely difficult to locate a Japanese address without a detailed map.
3.1 CONTIGUOUS VS. LINKED DATA STRUCTURES 67
1)st customer, if we only allocate room for nrecords. We can compensate by
allocating extremely large arrays, but this can waste space, again restricting what
our programs can do.
Actually, we can efficiently enlarge arrays as we need them, through the miracle
of dynamic arrays. Suppose we start with an array of size 1, and double its size from
mto 2meach time we run out of space. This doubling process involves allocating a
new contiguous array of size 2m, copying the contents of the old array to the lower
half of the new one, and returning the space used by the old array to the storage
allocation system.
The apparent waste in this procedure involves the recopying of the old contents
on each expansion. How many times might an element have to be recopied after a
total of ninsertions? Well, the first inserted element will have been recopied when
the array expands after the first, second, fourth, eighth, ...insertions. It will take
log2ndoublings until the array gets to have npositions. However, most elements
do not suffer much upheaval. Indeed, the (n/2 + 1)st through nth elements will
move at most once and might never have to move at all.
If half the elements move once, a quarter of the elements twice, and so on, the
total number of movements Mis given by
M=
lg n
i=1
i·n/2i=n
lg n
i=1
i/2in
i=1
i/2i=2n
Thus, each of the nelements move only two times on average, and the total work
of managing the dynamic array is the same O(n) as it would have been if a single
array of sufficient size had been allocated in advance!
The primary thing lost using dynamic arrays is the guarantee that each array
access takes constant time in the worst case. Now all the queries will be fast, except
for those relatively few queries triggering array doubling. What we get instead is a
promise that the nth array access will be completed quickly enough that the total
effort expended so far will still be O(n). Such amortized guarantees arise frequently
in the analysis of data structures.
3.1.2 Pointers and Linked Structures
Pointers are the connections that hold the pieces of linked structures together.
Pointers represent the address of a location in memory. A variable storing a pointer
to a given data item can provide more freedom than storing a copy of the item
itself. A cell-phone number can be thought of as a pointer to its owner as they
move about the planet.
Pointer syntax and power differ significantly across programming languages, so
we begin with a quick review of pointers in C language. A pointer pis assumed to
68 3. DATA STRUCTURES
ClintonJeffersonLincoln NIL
Figure 3.1: Linked list example showing data and pointer fields
give the address in memory where a particular chunk of data is located.2Pointers
in C have types declared at compiler time, denoting the data type of the items
they can point to. We use *p to denote the item that is pointed to by pointer p,
and &x to denote the address (i.e. , pointer) of a particular variable x. A special
NULL pointer value is used to denote structure-terminating or unassigned pointers.
All linked data structures share certain properties, as revealed by the following
linked list type declaration:
typedef struct list {
item_type item; /* data item */
struct list *next; /* point to successor */
} list;
In particular:
Each node in our data structure (here list) contains one or more data fields
(here item) that retain the data that we need to store.
Each node contains a pointer field to at least one other node (here next).
This means that much of the space used in linked data structures has to be
devoted to pointers, not data.
Finally, we need a pointer to the head of the structure, so we know where to
access it.
The list is the simplest linked structure. The three basic operations supported
by lists are searching, insertion, and deletion. In doubly-linked lists, each node points
both to its predecessor and its successor element. This simplifies certain operations
at a cost of an extra pointer field per node.
Searching a List
Searching for item xin a linked list can be done iteratively or recursively. We opt
for recursively in the implementation below. If xis in the list, it is either the first
element or located in the smaller rest of the list. Eventually, we reduce the problem
to searching in an empty list, which clearly cannot contain x.
2C permits direct manipulation of memory addresses in ways which may horrify Java programmers, but we
will avoid doing any such tricks.
3.1 CONTIGUOUS VS. LINKED DATA STRUCTURES 69
list *search_list(list *l, item_type x)
{
if (l == NULL) return(NULL);
if (l->item == x)
return(l);
else
return( search_list(l->next, x) );
}
Insertion into a List
Insertion into a singly-linked list is a nice exercise in pointer manipulation, as shown
below. Since we have no need to maintain the list in any particular order, we might
as well insert each new item in the simplest place. Insertion at the beginning of the
list avoids any need to traverse the list, but does require us to update the pointer
(denoted l) to the head of the data structure.
void insert_list(list **l, item_type x)
{
list *p; /* temporary pointer */
p = malloc( sizeof(list) );
p->item = x;
p->next = *l;
*l = p;
}
Two C-isms to note. First, the malloc function allocates a chunk of memory
of sufficient size for a new node to contain x. Second, the funny double star (**l)
denotes that lis a pointer to a pointer to a list node. Thus the last line, *l=p;
copies pto the place pointed to l, which is the external variable maintaining access
to the head of the list.
Deletion From a List
Deletion from a linked list is somewhat more complicated. First, we must find a
pointer to the predecessor of the item to be deleted. We do this recursively:
70 3. DATA STRUCTURES
list *predecessor_list(list *l, item_type x)
{
if ((l == NULL) || (l->next == NULL)) {
printf("Error: predecessor sought on null list.\n");
return(NULL);
}
if ((l->next)->item == x)
return(l);
else
return( predecessor_list(l->next, x) );
}
The predecessor is needed because it points to the doomed node, so its next
pointer must be changed. The actual deletion operation is simple, once ruling out
the case that the to-be-deleted element does not exist. Special care must be taken
to reset the pointer to the head of the list (l) when the first element is deleted:
delete_list(list **l, item_type x)
{
list *p; /* item pointer */
list *pred; /* predecessor pointer */
list *search_list(), *predecessor_list();
p = search_list(*l,x);
if (p != NULL) {
pred = predecessor_list(*l,x);
if (pred == NULL) /* splice out out list */
*l = p->next;
else
pred->next = p->next;
free(p); /* free memory used by node */
}
}
C language requires explicit deallocation of memory, so we must free the
deleted node after we are finished with it to return the memory to the system.
3.1.3 Comparison
The relative advantages of linked lists over static arrays include:
Overflow on linked structures can never occur unless the memory is actually
full.
3.2 STACKS AND QUEUES 71
Insertions and deletions are simpler than for contiguous (array) lists.
With large records, moving pointers is easier and faster than moving the
items themselves.
while the relative advantages of arrays include:
Linked structures require extra space for storing pointer fields.
Linked lists do not allow efficient random access to items.
Arrays allow better memory locality and cache performance than random
pointer jumping.
Take-Home Lesson: Dynamic memory allocation provides us with flexibility
on how and where we use our limited storage resources.
One final thought about these fundamental structures is that they can be
thought of as recursive objects:
Lists – Chopping the first element off a linked list leaves a smaller linked list.
This same argument works for strings, since removing characters from string
leaves a string. Lists are recursive objects.
Arrays – Splitting the first kelements off of an nelement array gives two
smaller arrays, of size kand nk, respectively. Arrays are recursive objects.
This insight leads to simpler list processing, and efficient divide-and-conquer
algorithms such as quicksort and binary search.
3.2 Stacks and Queues
We use the term container to denote a data structure that permits storage and
retrieval of data items independent of content. By contrast, dictionaries are abstract
data types that retrieve based on key values or content, and will be discussed in
Section 3.3 (page 72).
Containers are distinguished by the particular retrieval order they support. In
the two most important types of containers, this retrieval order depends on the
insertion order:
Stacks – Support retrieval by last-in, first-out (LIFO) order. Stacks are simple
to implement and very efficient. For this reason, stacks are probably the
right container to use when retrieval order doesn’t matter at all, such as
when processing batch jobs. The put and get operations for stacks are usually
called push and pop:
72 3. DATA STRUCTURES
Push(x,s): Insert item xat the top of stack s.
Pop(s): Return (and remove) the top item of stack s.
LIFO order arises in many real-world contexts. People crammed into a subway
car exit in LIFO order. Food inserted into my refrigerator usually exits the
same way, despite the incentive of expiration dates. Algorithmically, LIFO
tends to happen in the course of executing recursive algorithms.
Queues – Support retrieval in first in, first out (FIFO) order. This is surely
the fairest way to control waiting times for services. You want the container
holding jobs to be processed in FIFO order to minimize the maximum time
spent waiting. Note that the average waiting time will be the same regardless
of whether FIFO or LIFO is used. Many computing applications involve data
items with infinite patience, which renders the question of maximum waiting
time moot.
Queues are somewhat trickier to implement than stacks and thus are most
appropriate for applications (like certain simulations) where the order is im-
portant. The put and get operations for queues are usually called enqueue
and dequeue.
Enqueue(x,q): Insert item xat the back of queue q.
Dequeue(q): Return (and remove) the front item from queue q.
We will see queues later as the fundamental data structure controlling
breadth-first searches in graphs.
Stacks and queues can be effectively implemented using either arrays or linked
lists. The key issue is whether an upper bound on the size of the container is known
in advance, thus permitting the use of a statically-allocated array.
3.3 Dictionaries
The dictionary data type permits access to data items by content. You stick an
item into a dictionary so you can find it when you need it.
The primary operations of dictionary support are:
Search(D,k) – Given a search key k, return a pointer to the element in dic-
tionary Dwhose key value is k, if one exists.
Insert(D,x) – Given a data item x, add it to the set in the dictionary D.
Delete(D,x) – Given a pointer to a given data item xin the dictionary D,
remove it from D.
3.3 DICTIONARIES 73
Certain dictionary data structures also efficiently support other useful opera-
tions:
Max(D) or Min(D) – Retrieve the item with the largest (or smallest) key from
D. This enables the dictionary to serve as a priority queue, to be discussed
in Section 3.5 (page 83).
Predecessor(D,k) or Successor(D,k) – Retrieve the item from Dwhose key is
immediately before (or after) kin sorted order. These enable us to iterate
through the elements of the data structure.
Many common data processing tasks can be handled using these dictionary
operations. For example, suppose we want to remove all duplicate names from a
mailing list, and print the results in sorted order. Initialize an empty dictionary
D, whose search key will be the record name. Now read through the mailing list,
and for each record search to see if the name is already in D. If not, insert it
into D. Once finished, we must extract the remaining names out of the dictionary.
By starting from the first item Min(D) and repeatedly calling Successor until we
obtain Max(D), we traverse all elements in sorted order.
By defining such problems in terms of abstract dictionary operations, we avoid
the details of the data structure’s representation and focus on the task at hand.
In the rest of this section, we will carefully investigate simple dictionary imple-
mentations based on arrays and linked lists. More powerful dictionary implemen-
tations such as binary search trees (see Section 3.4 (page 77)) and hash tables (see
Section 3.7 (page 89)) are also attractive options in practice. A complete discussion
of different dictionary data structures is presented in the catalog in Section 12.1
(page 367). We encourage the reader to browse through the data structures section
of the catalog to better learn what your options are.
Stop and Think: Comparing Dictionary Implementations (I)
Problem: What are the asymptotic worst-case running times for each of the seven
fundamental dictionary operations (search, insert, delete, successor, predecessor,
minimum, and maximum) when the data structure is implemented as:
An unsorted array.
A sorted array.
Solution: This problem (and the one following it) reveal some of the inherent trade-
offs of data structure design. A given data representation may permit efficient im-
plementation of certain operations at the cost that other operations are expensive.
74 3. DATA STRUCTURES
In addition to the array in question, we will assume access to a few extra
variables such as n—the number of elements currently in the array. Note that we
must maintain the value of these variables in the operations where they change
(e.g., insert and delete), and charge these operations the cost of this maintenance.
The basic dictionary operations can be implemented with the following costs
on unsorted and sorted arrays, respectively:
Unsorted Sorted
Dictionary operation array array
Search(L,k)O(n)O(log n)
Insert(L,x)O(1) O(n)
Delete(L,x)O(1)O(n)
Successor(L,x)O(n)O(1)
Predecessor(L,x)O(n)O(1)
Minimum(L)O(n)O(1)
Maximum(L)O(n)O(1)
We must understand the implementation of each operation to see why. First,
we discuss the operations when maintaining an unsorted array A.
Search is implemented by testing the search key kagainst (potentially) each
element of an unsorted array. Thus, search takes linear time in the worst case,
which is when key kis not found in A.
Insertion is implemented by incrementing nand then copying item xto
the nth cell in the array, A[n]. The bulk of the array is untouched, so this
operation takes constant time.
Deletion is somewhat trickier, hence the superscript() in the table. The
definition states that we are given a pointer xto the element to delete, so
we need not spend any time searching for the element. But removing the xth
element from the array Aleaves a hole that must be filled. We could fill the
hole by moving each of the elements A[x+1] to A[n] up one position, but
this requires Θ(n) time when the first element is deleted. The following idea
is better: just write over A[x] with A[n], and decrement n. This only takes
constant time.
The definition of the traversal operations, Predecessor and Successor, refer
to the item appearing before/after xin sorted order. Thus, the answer is
not simply A[x1] (or A[x+ 1]), because in an unsorted array an element’s
physical predecessor (successor) is not necessarily its logical predecessor (suc-
cessor). Instead, the predecessor of A[x] is the biggest element smaller than
A[x]. Similarly, the successor of A[x] is the smallest element larger than A[x].
Both require a sweep through all nelements of Ato determine the winner.
Minimum and Maximum are similarly defined with respect to sorted order,
and so require linear sweeps to identify in an unsorted array.
3.3 DICTIONARIES 75
Implementing a dictionary using a sorted array completely reverses our notions
of what is easy and what is hard. Searches can now be done in O(log n) time, using
binary search, because we know the median element sits in A[n/2]. Since the upper
and lower portions of the array are also sorted, the search can continue recursively
on the appropriate portion. The number of halvings of nuntil we get to a single
element is lg n.
The sorted order also benefits us with respect to the other dictionary retrieval
operations. The minimum and maximum elements sit in A[1] and A[n], while the
predecessor and successor to A[x]areA[x1] and A[x+ 1], respectively.
Insertion and deletion become more expensive, however, because making room
for a new item or filling a hole may require moving many items arbitrarily. Thus
both become linear-time operations.
Take-Home Lesson: Data structure design must balance all the different op-
erations it supports. The fastest data structure to support both operations A
and Bmay well not be the fastest structure to support either operation Aor
B.
Stop and Think: Comparing Dictionary Implementations (II)
Problem: What is the asymptotic worst-case running times for each of the seven
fundamental dictionary operations when the data structure is implemented as
A singly-linked unsorted list.
A doubly-linked unsorted list.
A singly-linked sorted list.
A doubly-linked sorted list.
Solution: Two different issues must be considered in evaluating these implementa-
tions: singly- vs. doubly-linked lists and sorted vs. unsorted order. Subtle operations
are denoted with a superscript:
Singly Double Singly Doubly
Dictionary operation unsorted unsorted sorted sorted
Search(L,k)O(n)O(n)O(n)O(n)
Insert(L,x)O(1) O(1) O(n)O(n)
Delete(L,x)O(n)O(1) O(n)O(1)
Successor(L,x)O(n)O(n)O(1) O(1)
Predecessor(L,x)O(n)O(n)O(n)O(1)
Minimum(L)O(n)O(n)O(1) O(1)
Maximum(L)O(n)O(n)O(1)O(1)
76 3. DATA STRUCTURES
As with unsorted arrays, search operations are destined to be slow while main-
tenance operations are fast.
Insertion/Deletion – The complication here is deletion from a singly-linked
list. The definition of the Delete operation states we are given a pointer xto
the item to be deleted. But what we really need is a pointer to the element
pointing to xin the list, because that is the node that needs to be changed.
We can do nothing without this list predecessor, and so must spend linear
time searching for it on a singly-linked list. Doubly-linked lists avoid this
problem, since we can immediately retrieve the list predecessor of x.
Deletion is faster for sorted doubly-linked lists than sorted arrays, because
splicing out the deleted element from the list is more efficient than filling
the hole by moving array elements. The predecessor pointer problem again
complicates deletion from singly-linked sorted lists.
Search – Sorting provides less benefit for linked lists than it did for arrays. Bi-
nary search is no longer possible, because we can’t access the median element
without traversing all the elements before it. What sorted lists do provide is
quick termination of unsuccessful searches, for if we have not found Abbott by
the time we hit Costello we can deduce that he doesn’t exist. Still, searching
takes linear time in the worst case.
Traversal operations – The predecessor pointer problem again complicates
implementing Predecessor. The logical successor is equivalent to the node
successor for both types of sorted lists, and hence can be implemented in
constant time.
Maximum – The maximum element sits at the tail of the list, which would
normally require Θ(n) time to reach in either singly- or doubly-linked lists.
However, we can maintain a separate pointer to the list tail, provided we
pay the maintenance costs for this pointer on every insertion and deletion.
The tail pointer can be updated in constant time on doubly-linked lists: on
insertion check whether last->next still equals NULL, and on deletion set
last to point to the list predecessor of last if the last element is deleted.
We have no efficient way to find this predecessor for singly-linked lists. So
why can we implement maximum in Θ(1) on singly-linked lists? The trick is
to charge the cost to each deletion, which already took linear time. Adding
an extra linear sweep to update the pointer does not harm the asymptotic
complexity of Delete, while gaining us Maximum in constant time as a reward
for clear thinking.
3.4 BINARY SEARCH TREES 77
1
2
13
13
1
2
3
2
2
3
1
2
3
Figure 3.2: The five distinct binary search trees on three nodes
3.4 Binary Search Trees
We have seen data structures that allow fast search or flexible update, but not fast
search and flexible update. Unsorted, doubly-linked lists supported insertion and
deletion in O(1) time but search took linear time in the worse case. Sorted arrays
support binary search and logarithmic query times, but at the cost of linear-time
update.
Binary search requires that we have fast access to two elements—specifically
the median elements above and below the given node. To combine these ideas, we
need a “linked list” with two pointers per node. This is the basic idea behind binary
search trees.
Arooted binary tree is recursively defined as either being (1) empty, or (2)
consisting of a node called the root, together with two rooted binary trees called
the left and right subtrees, respectively. The order among “brother” nodes matters
in rooted trees, so left is different from right. Figure 3.2 gives the shapes of the five
distinct binary trees that can be formed on three nodes.
A binary search tree labels each node in a binary tree with a single key such
that for any node labeled x, all nodes in the left subtree of xhave keys <xwhile
all nodes in the right subtree of xhave keys >x. This search tree labeling scheme
is very special. For any binary tree on nnodes, and any set of nkeys, there is
exactly one labeling that makes it a binary search tree. The allowable labelings for
three-node trees are given in Figure 3.2.
3.4.1 Implementing Binary Search Trees
Binary tree nodes have left and right pointer fields, an (optional) parent pointer,
and a data field. These relationships are shown in Figure 3.3; a type declaration
for the tree structure is given below:
78 3. DATA STRUCTURES
left right
parent
Figure 3.3: Relationships in a binary search tree (left). Finding the minimum (center) and
maximum (right) elements in a binary search tree
typedef struct tree {
item_type item; /* data item */
struct tree *parent; /* pointer to parent */
struct tree *left; /* pointer to left child */
struct tree *right; /* pointer to right child */
} tree;
The basic operations supported by binary trees are searching, traversal, inser-
tion, and deletion.
Searching in a Tree
The binary search tree labeling uniquely identities where each key is located. Start
at the root. Unless it contains the query key x, proceed either left or right depending
upon whether xoccurs before or after the root key. This algorithm works because
both the left and right subtrees of a binary search tree are themselves binary search
trees. This recursive structure yields the recursive search algorithm below:
tree *search_tree(tree *l, item_type x)
{
if (l == NULL) return(NULL);
if (l->item == x) return(l);
if (x < l->item)
return( search_tree(l->left, x) );
else
return( search_tree(l->right, x) );
}
3.4 BINARY SEARCH TREES 79
This search algorithm runs in O(h) time, where hdenotes the height of the
tree.
Finding Minimum and Maximum Elements in a Tree
Implementing the find-minimum operation requires knowing where the minimum
element is in the tree. By definition, the smallest key must reside in the left subtree
of the root, since all keys in the left subtree have values less than that of the root.
Therefore, as shown in Figure 3.3, the minimum element must be the leftmost
descendent of the root. Similarly, the maximum element must be the rightmost
descendent of the root.
tree *find_minimum(tree *t)
{
tree *min; /* pointer to minimum */
if (t == NULL) return(NULL);
min = t;
while (min->left != NULL)
min = min->left;
return(min);
}
Traversal in a Tree
Visiting all the nodes in a rooted binary tree proves to be an important component
of many algorithms. It is a special case of traversing all the nodes and edges in a
graph, which will be the foundation of Chapter 5.
A prime application of tree traversal is listing the labels of the tree nodes.
Binary search trees make it easy to report the labels in sorted order. By definition,
all the keys smaller than the root must lie in the left subtree of the root, and all
keys bigger than the root in the right subtree. Thus, visiting the nodes recursively
in accord with such a policy produces an in-order traversal of the search tree:
void traverse_tree(tree *l)
{
if (l != NULL) {
traverse_tree(l->left);
process_item(l->item);
traverse_tree(l->right);
}
}
80 3. DATA STRUCTURES
Each item is processed once during the course of traversal, which runs in O(n)
time, where ndenotes the number of nodes in the tree.
Alternate traversal orders come from changing the position of process item
relative to the traversals of the left and right subtrees. Processing the item first
yields a pre-order traversal, while processing it last gives a post-order traversal.
These make relatively little sense with search trees, but prove useful when the
rooted tree represents arithmetic or logical expressions.
Insertion in a Tree
There is only one place to insert an item xinto a binary search tree Twhere we
know we can find it again. We must replace the NULL pointer found in Tafter an
unsuccessful query for the key k.
This implementation uses recursion to combine the search and node insertion
stages of key insertion. The three arguments to insert tree are (1) a pointer lto
the pointer linking the search subtree to the rest of the tree, (2) the key xto be
inserted, and (3) a parent pointer to the parent node containing l. The node is
allocated and linked in on hitting the NULL pointer. Note that we pass the pointer to
the appropriate left/right pointer in the node during the search, so the assignment
*l = p; links the new node into the tree:
insert_tree(tree **l, item_type x, tree *parent)
{
tree *p; /* temporary pointer */
if (*l == NULL) {
p = malloc(sizeof(tree)); /* allocate new node */
p->item = x;
p->left = p->right = NULL;
p->parent = parent;
*l = p; /* link into parent’s record */
return;
}
if (x < (*l)->item)
insert_tree(&((*l)->left), x, *l);
else
insert_tree(&((*l)->right), x, *l);
}
Allocating the node and linking it in to the tree is a constant-time operation
after the search has been performed in O(h) time.
3.4 BINARY SEARCH TREES 81
initial tree delete node with zero children (3)
5
5
2
6
8
7
3
1
2
8
7
4
3
1
2
5
6
8
7
4
1
delete node with 2 children (4)delete node with 1 child (6)
6
8
7
4
3
1
5
2
Figure 3.4: Deleting tree nodes with 0, 1, and 2 children
Deletion from a Tree
Deletion is somewhat trickier than insertion, because removing a node means ap-
propriately linking its two descendant subtrees back into the tree somewhere else.
There are three cases, illustrated in Figure 3.4. Leaf nodes have no children, and
so may be deleted by simply clearing the pointer to the given node.
The case of the doomed node having one child is also straightforward. There
is one parent and one grandchild, and we can link the grandchild directly to the
parent without violating the in-order labeling property of the tree.
But what of a to-be-deleted node with two children? Our solution is to relabel
this node with the key of its immediate successor in sorted order. This successor
must be the smallest value in the right subtree, specifically the leftmost descendant
in the right subtree (p). Moving this to the point of deletion results in a properly-
labeled binary search tree, and reduces our deletion problem to physically removing
a node with at most one child—a case that has been resolved above.
The full implementation has been omitted here because it looks a little ghastly,
but the code follows logically from the description above.
The worst-case complexity analysis is as follows. Every deletion requires the
cost of at most two search operations, each taking O(h) time where his the height
of the tree, plus a constant amount of pointer manipulation.
3.4.2 How Good Are Binary Search Trees?
When implemented using binary search trees, all three dictionary operations take
O(h) time, where his the height of the tree. The smallest height we can hope for
occurs when the tree is perfectly balanced, where h=log n. This is very good,
but the tree must be perfectly balanced.
82 3. DATA STRUCTURES
Our insertion algorithm puts each new item at a leaf node where it should have
been found. This makes the shape (and more importantly height) of the tree a
function of the order in which we insert the keys.
Unfortunately, bad things can happen when building trees through insertion.
The data structure has no control over the order of insertion. Consider what hap-
pens if the user inserts the keys in sorted order. The operations insert(a), followed
by insert(b),insert(c),insert(d), . . . will produce a skinny linear height tree
where only right pointers are used.
Thus binary trees can have heights ranging from lg nto n. But how tall are
they on average? The average case analysis of algorithms can be tricky because we
must carefully specify what we mean by average. The question is well defined if we
consider each of the n! possible insertion orderings equally likely and average over
those. If so, we are in luck, because with high probability the resulting tree will
have O(log n) height. This will be shown in Section 4.6 (page 123).
This argument is an important example of the power of randomization.Wecan
often develop simple algorithms that offer good performance with high probabil-
ity. We will see that a similar idea underlies the fastest known sorting algorithm,
quicksort.
3.4.3 Balanced Search Trees
Random search trees are usually good. But if we get unlucky with our order of
insertion, we can end up with a linear-height tree in the worst case. This worst
case is outside of our direct control, since we must build the tree in response to the
requests given by our potentially nasty user.
What would be better is an insertion/deletion procedure which adjusts the tree a
little after each insertion, keeping it close enough to be balanced so the maximum
height is logarithmic. Sophisticated balanced binary search tree data structures
have been developed that guarantee the height of the tree always to be O(log n).
Therefore, all dictionary operations (insert, delete, query) take O(log n) time each.
Implementations of balanced tree data structures such as red-black trees and splay
trees are discussed in Section 12.1 (page 367).
From an algorithm design viewpoint, it is important to know that these trees
exist and that they can be used as black boxes to provide an efficient dictionary
implementation. When figuring the costs of dictionary operations for algorithm
analysis, we can assume the worst-case complexities of balanced binary trees to be
a fair measure.
Take-Home Lesson: Picking the wrong data structure for the job can be
disastrous in terms of performance. Identifying the very best data structure
is usually not as critical, because there can be several choices that perform
similarly.
3.5 PRIORITY QUEUES 83
Stop and Think: Exploiting Balanced Search Trees
Problem: You are given the task of reading nnumbers and then printing them
out in sorted order. Suppose you have access to a balanced dictionary data struc-
ture, which supports the operations search, insert, delete, minimum, maximum,
successor, and predecessor each in O(log n) time.
1. How c