# The Algorithm Design Manual

### AlgorithmDesignManual

### AlgorithmDesignManual

### The%20Algorithm%20Design%20Manual

### The%20Algorithm%20Design%20Manual

### The%20Algorithm%20Design%20Manual%20Recommended%20by%20Google

User Manual: Pdf

Open the PDF directly: View PDF .

Page Count: 739 [warning: Documents this large are best viewed by clicking the View PDF Link!]

- Preface
- Contents
- 1.Introduction to Algorithm Design
- 2.Algorithm Analysis
- 3.Data Structures
- 4.Sorting and Searching
- Applications of Sorting
- Pragmatics of Sorting
- Heapsort: Fast Sorting via Data Structures
- War Story: Give me a Ticket on an Airplane
- Mergesort: Sorting by Divide-and-Conquer
- Quicksort: Sorting by Randomization
- Distribution Sort: Sorting via Bucketing
- War Story: Skiena for the Defense
- Binary Search and Related Algorithms
- Divide-and-Conquer
- Exercises

- 5.Graph Traversal
- 6.Weighted Graph Algorithms
- 7.Combinatorial Search and Heuristic Methods
- 8.Dynamic Programming
- 9.Intractable Problems and Approximation Algorithms
- 10.How to Design Algorithms
- 11.A Catalog of Algorithmic Problems
- 12.Data Structures
- 13.Numerical Problems
- 14.Combinatorial Problems
- 15.Graph Problems: Polynomial-Time
- 16.Graph Problems: Hard Problems
- 17.Computational Geometry
- 18.Set and String Problems
- 19.Algorithmic Resources
- Bibliography
- Index

The Algorithm Design Manual

Second Edition

Steven S. Skiena

The Algorithm Design Manual

Second Edition

123

Steven S. Skiena

Department of Computer Science

State University of New York

at Stony Brook

New York, USA

skiena@cs.sunysb.edu

ISBN: 978-1-84800-069-8 e-ISBN: 978-1-84800-070-4

DOI: 10.1007/978-1-84800-070-4

British Library Cataloguing in Publication Data

A catalogue record for this book is available from the British Library

Library of Congress Control Number: 2008931136

c

Springer-Verlag London Limited 2008

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted

under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or trans-

mitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of

reprographic reproduction in accordance with the terms of licenses issued by the Copyright Licensing Agency.

Enquiries concerning reproduction outside those terms should be sent to the publishers.

The use of registered names, trademarks, etc., in this publication does not imply, even in the absence of a

speciﬁc statement, that such names are exempt from the relevant laws and regulations and therefore free for

general use.

The publisher makes no representation, express or implied, with regard to the accuracy of the information

contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that

may be made.

Printed on acid-free paper

Springer Science+Business Media

springer.com

Preface

Most professional programmers that I’ve encountered are not well prepared to

tackle algorithm design problems. This is a pity, because the techniques of algorithm

design form one of the core practical technologies of computer science. Designing

correct, eﬃcient, and implementable algorithms for real-world problems requires

access to two distinct bodies of knowledge:

•Techniques – Good algorithm designers understand several fundamental al-

gorithm design techniques, including data structures, dynamic programming,

depth-ﬁrst search, backtracking, and heuristics. Perhaps the single most im-

portant design technique is modeling, the art of abstracting a messy real-world

application into a clean problem suitable for algorithmic attack.

•Resources – Good algorithm designers stand on the shoulders of giants.

Rather than laboring from scratch to produce a new algorithm for every task,

they can ﬁgure out what is known about a particular problem. Rather than

re-implementing popular algorithms from scratch, they seek existing imple-

mentations to serve as a starting point. They are familiar with many classic

algorithmic problems, which provide suﬃcient source material to model most

any application.

This book is intended as a manual on algorithm design, providing access to

combinatorial algorithm technology for both students and computer professionals.

It is divided into two parts: Techniques and Resources. The former is a general

guide to techniques for the design and analysis of computer algorithms. The Re-

sources section is intended for browsing and reference, and comprises the catalog

of algorithmic resources, implementations, and an extensive bibliography.

vi PREFACE

To the Reader

I have been gratiﬁed by the warm reception the ﬁrst edition of The Algorithm De-

sign Manual has received since its initial publication in 1997. It has been recognized

as a unique guide to using algorithmic techniques to solve problems that often arise

in practice. But much has changed in the world since the The Algorithm Design

Manual was ﬁrst published over ten years ago. Indeed, if we date the origins of

modern algorithm design and analysis to about 1970, then roughly 30% of modern

algorithmic history has happened since the ﬁrst coming of The Algorithm Design

Manual.

Three aspects of The Algorithm Design Manual have been particularly beloved:

(1) the catalog of algorithmic problems, (2) the war stories, and (3) the electronic

component of the book. These features have been preserved and strengthened in

this edition:

•The Catalog of Algorithmic Problems – Since ﬁnding out what is known about

an algorithmic problem can be a diﬃcult task, I provide a catalog of the

75 most important problems arising in practice. By browsing through this

catalog, the student or practitioner can quickly identify what their problem is

called, what is known about it, and how they should proceed to solve it. To aid

in problem identiﬁcation, we include a pair of “before” and “after” pictures for

each problem, illustrating the required input and output speciﬁcations. One

perceptive reviewer called my book “The Hitchhiker’s Guide to Algorithms”

on the strength of this catalog.

The catalog is the most important part of this book. To update the catalog

for this edition, I have solicited feedback from the world’s leading experts on

each associated problem. Particular attention has been paid to updating the

discussion of available software implementations for each problem.

•War Stories – In practice, algorithm problems do not arise at the beginning of

a large project. Rather, they typically arise as subproblems when it becomes

clear that the programmer does not know how to proceed or that the current

solution is inadequate.

To provide a better perspective on how algorithm problems arise in the real

world, we include a collection of “war stories,” or tales from our experience

with real problems. The moral of these stories is that algorithm design and

analysis is not just theory, but an important tool to be pulled out and used

as needed.

This edition retains all the original war stories (with updates as appropriate)

plus additional new war stories covering external sorting, graph algorithms,

simulated annealing, and other topics.

•Electronic Component – Since the practical person is usually looking for a

program more than an algorithm, we provide pointers to solid implementa-

tions whenever they are available. We have collected these implementations

PREFACE vii

at one central website site (http://www.cs.sunysb.edu/∼algorith) for easy re-

trieval. We have been the number one “Algorithm” site on Google pretty

much since the initial publication of the book.

Further, we provide recommendations to make it easier to identify the correct

code for the job. With these implementations available, the critical issue

in algorithm design becomes properly modeling your application, more so

than becoming intimate with the details of the actual algorithm. This focus

permeates the entire book.

Equally important is what we do not do in this book. We do not stress the

mathematical analysis of algorithms, leaving most of the analysis as informal ar-

guments. You will not ﬁnd a single theorem anywhere in this book. When more

details are needed, the reader should study the cited programs or references. The

goal of this manual is to get you going in the right direction as quickly as possible.

To the Instructor

This book covers enough material for a standard Introduction to Algorithms course.

We assume the reader has completed the equivalent of a second programming

course, typically titled Data Structures or Computer Science II.

A full set of lecture slides for teaching this course is available online at

http://www.algorist.com . Further, I make available online audio and video lectures

using these slides to teach a full-semester algorithm course. Let me help teach your

course, by the magic of the Internet!

This book stresses design over analysis. It is suitable for both traditional lecture

courses and the new “active learning” method, where the professor does not lecture

but instead guides student groups to solve real problems. The “war stories” provide

an appropriate introduction to the active learning method.

I have made several pedagogical improvements throughout the book. Textbook-

oriented features include:

•More Leisurely Discussion – The tutorial material in the ﬁrst part of the book

has been doubled over the previous edition. The pages have been devoted to

more thorough and careful exposition of fundamental material, instead of

adding more specialized topics.

•False Starts – Algorithms textbooks generally present important algorithms

as a fait accompli, obscuring the ideas involved in designing them and the

subtle reasons why other approaches fail. The war stories illustrate such de-

velopment on certain applied problems, but I have expanded such coverage

into classical algorithm design material as well.

•Stop and Think – Here I illustrate my thought process as I solve a topic-

speciﬁc homework problem—false starts and all. I have interspersed such

viii PREFACE

problem blocks throughout the text to increase the problem-solving activity

of my readers. Answers appear immediately following each problem.

•More and Improved Homework Problems – This edition of The Algorithm

Design Manual has twice as many homework exercises as the previous one.

Exercises that proved confusing or ambiguous have been improved or re-

placed. Degree of diﬃculty ratings (from 1 to 10) have been assigned to all

problems.

•Self-Motivating Exam Design – In my algorithms courses, I promise the stu-

dents that all midterm and ﬁnal exam questions will be taken directly from

homework problems in this book. This provides a “student-motivated exam,”

so students know exactly how to study to do well on the exam. I have carefully

picked the quantity, variety, and diﬃculty of homework exercises to make this

work; ensuring there are neither too few or too many candidate problems.

•Take-Home Lessons – Highlighted “take-home” lesson boxes scattered

throughout the text emphasize the big-picture concepts to be gained from

the chapter.

•Links to Programming Challenge Problems – Each chapter’s exercises will

contain links to 3-5 relevant “Programming Challenge” problems from

http://www.programming-challenges.com. These can be used to add a pro-

gramming component to paper-and-pencil algorithms courses.

•More Code, Less Pseudo-code – More algorithms in this book appear as code

(written in C) instead of pseudo-code. I believe the concreteness and relia-

bility of actual tested implementations provides a big win over less formal

presentations for simple algorithms. Full implementations are available for

study at http://www.algorist.com .

•Chapter Notes – Each tutorial chapter concludes with a brief notes section,

pointing readers to primary sources and additional references.

Acknowledgments

Updating a book dedication after ten years focuses attention on the eﬀects of time.

Since the ﬁrst edition, Renee has become my wife and then the mother of our

two children, Bonnie and Abby. My father has left this world, but Mom and my

brothers Len and Rob remain a vital presence in my life. I dedicate this book to

my family, new and old, here and departed.

I would like to thank several people for their concrete contributions to this

new edition. Andrew Gaun and Betson Thomas helped in many ways, particularly

in building the infrastructure for the new http://www.cs.sunysb.edu/∼algorithand

dealing with a variety of manuscript preparation issues. David Gries oﬀered valu-

able feedback well beyond the call of duty. Himanshu Gupta and Bin Tang bravely

PREFACE ix

taught courses using a manuscript version of this edition. Thanks also to my

Springer-Verlag editors, Wayne Wheeler and Allan Wylde.

A select group of algorithmic sages reviewed sections of the Hitchhiker’s guide,

sharing their wisdom and alerting me to new developments. Thanks to:

Ami Amir, Herve Bronnimann, Bernard Chazelle, Chris Chu, Scott

Cotton, Yeﬁm Dinitz, Komei Fukuda, Michael Goodrich, Lenny Heath,

Cihat Imamoglu, Tao Jiang, David Karger, Giuseppe Liotta, Albert

Mao, Silvano Martello, Catherine McGeoch, Kurt Mehlhorn, Scott

A. Mitchell, Naceur Meskini, Gene Myers, Gonzalo Navarro, Stephen

North, Joe O’Rourke, Mike Paterson, Theo Pavlidis, Seth Pettie, Michel

Pocchiola, Bart Preneel, Tomasz Radzik, Edward Reingold, Frank

Ruskey, Peter Sanders, Joao Setubal, Jonathan Shewchuk, Robert

Skeel, Jens Stoye, Torsten Suel, Bruce Watson, and Uri Zwick.

Several exercises were originated by colleagues or inspired by other texts. Re-

constructing the original sources years later can be challenging, but credits for each

problem (to the best of my recollection) appear on the website.

It would be rude not to thank important contributors to the original edition.

Ricky Bradley and Dario Vlah built up the substantial infrastructure required for

the WWW site in a logical and extensible manner. Zhong Li drew most of the

catalog ﬁgures using xﬁg. Richard Crandall, Ron Danielson, Takis Metaxas, Dave

Miller, Giri Narasimhan, and Joe Zachary all reviewed preliminary versions of the

ﬁrst edition; their thoughtful feedback helped to shape what you see here.

Much of what I know about algorithms I learned along with my graduate

students. Several of them (Yaw-Ling Lin, Sundaram Gopalakrishnan, Ting Chen,

Francine Evans, Harald Rau, Ricky Bradley, and Dimitris Margaritis) are the real

heroes of the war stories related within. My Stony Brook friends and algorithm

colleagues Estie Arkin, Michael Bender, Jie Gao, and Joe Mitchell have always

been a pleasure to work and be with. Finally, thanks to Michael Brochstein and

the rest of the city contingent for revealing a proper life well beyond Stony Brook.

Caveat

It is traditional for the author to magnanimously accept the blame for whatever

deﬁciencies remain. I don’t. Any errors, deﬁciencies, or problems in this book are

somebody else’s fault, but I would appreciate knowing about them so as to deter-

mine who is to blame.

Steven S. Skiena

Department of Computer Science

Stony Brook University

Stony Brook, NY 11794-4400

http://www.cs.sunysb.edu/∼skiena

April 2008

Contents

I Practical Algorithm Design 1

1 Introduction to Algorithm Design 3

1.1 Robot Tour Optimization . ..................... 5

1.2 SelectingtheRightJobs ....................... 9

1.3 ReasoningaboutCorrectness .................... 11

1.4 ModelingtheProblem ........................ 19

1.5 AbouttheWarStories ........................ 22

1.6 WarStory:PsychicModeling .................... 23

1.7 Exercises................................ 27

2 Algorithm Analysis 31

2.1 The RAM Model of Computation . . ................ 31

2.2 The Big Oh Notation ......................... 34

2.3 Growth Rates and Dominance Relations . . ............ 37

2.4 Working with the Big Oh . ..................... 40

2.5 Reasoning About Eﬃciency ..................... 41

2.6 Logarithms and Their Applications . ................ 46

2.7 Properties of Logarithms . . ..................... 50

2.8 War Story: Mystery of the Pyramids ................ 51

2.9 Advanced Analysis (*) . . . ..................... 54

2.10 Exercises . . .............................. 57

3 Data Structures 65

3.1 Contiguous vs. Linked Data Structures . . . ............ 66

xii CONTENTS

3.2 Stacks and Queues . . ........................ 71

3.3 Dictionaries . . ............................ 72

3.4 BinarySearchTrees.......................... 77

3.5 Priority Queues ............................ 83

3.6 War Story: Stripping Triangulations . ............... 85

3.7 HashingandStrings ......................... 89

3.8 Specialized Data Structures . . ................... 93

3.9 WarStory:String’emUp ...................... 94

3.10 Exercises ................................ 98

4 Sorting and Searching 103

4.1 Applications of Sorting ........................ 104

4.2 Pragmatics of Sorting . ........................ 107

4.3 Heapsort:FastSortingviaDataStructures ............ 108

4.4 WarStory:GivemeaTicketonanAirplane ........... 118

4.5 Mergesort: Sorting by Divide-and-Conquer . . .......... 120

4.6 Quicksort: Sorting by Randomization ............... 123

4.7 Distribution Sort: Sorting via Bucketing .............. 129

4.8 WarStory:SkienafortheDefense ................. 131

4.9 Binary Search and Related Algorithms ............... 132

4.10 Divide-and-Conquer . . ........................ 135

4.11 Exercises ................................ 139

5 Graph Traversal 145

5.1 FlavorsofGraphs........................... 146

5.2 DataStructuresforGraphs ..................... 151

5.3 WarStory:IwasaVictimofMoore’sLaw ............ 155

5.4 War Story: Getting the Graph . ................... 158

5.5 TraversingaGraph.......................... 161

5.6 Breadth-FirstSearch ......................... 162

5.7 Applications of Breadth-First Search . ............... 166

5.8 Depth-FirstSearch .......................... 169

5.9 Applications of Depth-First Search . . ............... 172

5.10 Depth-First Search on Directed Graphs .............. 178

5.11 Exercises ................................ 184

6 Weighted Graph Algorithms 191

6.1 MinimumSpanningTrees ...................... 192

6.2 WarStory:NothingbutNets .................... 202

6.3 ShortestPaths............................. 205

6.4 War Story: Dialing for Documents . . ............... 212

6.5 Network Flows and Bipartite Matching .............. 217

6.6 Design Graphs, Not Algorithms ................... 222

6.7 Exercises................................ 225

CONTENTS xiii

7 Combinatorial Search and Heuristic Methods 230

7.1 Backtracking.............................. 231

7.2 SearchPruning ............................ 238

7.3 Sudoku . . . .............................. 239

7.4 WarStory:CoveringChessboards.................. 244

7.5 HeuristicSearchMethods ...................... 247

7.6 WarStory:OnlyitisNotaRadio ................. 260

7.7 War Story: Annealing Arrays .................... 263

7.8 OtherHeuristicSearchMethods .................. 266

7.9 Parallel Algorithms . ......................... 267

7.10 War Story: Going Nowhere Fast . . . ................ 268

7.11 Exercises . . .............................. 270

8 Dynamic Programming 273

8.1 Caching vs. Computation . ..................... 274

8.2 Approximate String Matching .................... 280

8.3 Longest Increasing Sequence ..................... 289

8.4 War Story: Evolution of the Lobster ................ 291

8.5 The Partition Problem . . . ..................... 294

8.6 Parsing Context-Free Grammars . . ................ 298

8.7 Limitations of Dynamic Programming: TSP ............ 301

8.8 War Story: What’s Past is Prolog . . ................ 304

8.9 War Story: Text Compression for Bar Codes ........... 307

8.10 Exercises . . .............................. 310

9 Intractable Problems and Approximation Algorithms 316

9.1 Problems and Reductions . ..................... 317

9.2 Reductions for Algorithms . ..................... 319

9.3 Elementary Hardness Reductions . . ................ 323

9.4 Satisﬁability .............................. 328

9.5 Creative Reductions ......................... 330

9.6 The Art of Proving Hardness .................... 334

9.7 War Story: Hard Against the Clock . ................ 337

9.8 War Story: And Then I Failed .................... 339

9.9 Pvs.NP ................................ 341

9.10 Dealing with NP-complete Problems ................ 344

9.11 Exercises . . .............................. 350

10 How to Design Algorithms 356

II The Hitchhiker’s Guide to Algorithms 361

11 A Catalog of Algorithmic Problems 363

xiv CONTENTS

12 Data Structures 366

12.1 Dictionaries . . ............................ 367

12.2 Priority Queues ............................ 373

12.3 Suﬃx Trees and Arrays ........................ 377

12.4 Graph Data Structures ........................ 381

12.5 Set Data Structures . . ........................ 385

12.6 Kd-Trees ................................ 389

13 Numerical Problems 393

13.1 Solving Linear Equations ....................... 395

13.2 Bandwidth Reduction ........................ 398

13.3 Matrix Multiplication . ........................ 401

13.4 Determinants and Permanents . ................... 404

13.5 Constrained and Unconstrained Optimization . .......... 407

13.6 Linear Programming . ........................ 411

13.7 Random Number Generation . ................... 415

13.8 Factoring and Primality Testing ................... 420

13.9 Arbitrary-Precision Arithmetic ................... 423

13.10 Knapsack Problem . . ........................ 427

13.11 Discrete Fourier Transform . . ................... 431

14 Combinatorial Problems 434

14.1 Sorting ................................. 436

14.2 Searching ................................ 441

14.3 Median and Selection . ........................ 445

14.4 Generating Permutations ....................... 448

14.5 Generating Subsets . . ........................ 452

14.6 Generating Partitions . ........................ 456

14.7 Generating Graphs . . ........................ 460

14.8 Calendrical Calculations ....................... 465

14.9 Job Scheduling . ............................ 468

14.10 Satisﬁability . . ............................ 472

15 Graph Problems: Polynomial-Time 475

15.1 Connected Components ....................... 477

15.2 Topological Sorting . . ........................ 481

15.3 Minimum Spanning Tree ....................... 484

15.4 Shortest Path . ............................ 489

15.5 Transitive Closure and Reduction . . . ............... 495

15.6 Matching ................................ 498

15.7 Eulerian Cycle/Chinese Postman . . . ............... 502

15.8 Edge and Vertex Connectivity . ................... 505

15.9 Network Flow . ............................ 509

15.10 Drawing Graphs Nicely ........................ 513

CONTENTS xv

15.11 Drawing Trees ............................. 517

15.12 Planarity Detection and Embedding ................ 520

16 Graph Problems: Hard Problems 523

16.1 Clique .................................. 525

16.2 Independent Set . . . ......................... 528

16.3 Vertex Cover .............................. 530

16.4 Traveling Salesman Problem ..................... 533

16.5 Hamiltonian Cycle . ......................... 538

16.6 Graph Partition . . . ......................... 541

16.7 Vertex Coloring . . . ......................... 544

16.8 Edge Coloring ............................. 548

16.9 Graph Isomorphism . ......................... 550

16.10 Steiner Tree .............................. 555

16.11 Feedback Edge/Vertex Set . ..................... 559

17 Computational Geometry 562

17.1 Robust Geometric Primitives .................... 564

17.2 Convex Hull .............................. 568

17.3 Triangulation ............................. 572

17.4 Voronoi Diagrams . . ......................... 576

17.5 Nearest Neighbor Search . . ..................... 580

17.6 Range Search ............................. 584

17.7 Point Location ............................. 587

17.8 Intersection Detection . . . ..................... 591

17.9 Bin Packing .............................. 595

17.10 Medial-Axis Transform . . . ..................... 598

17.11 Polygon Partitioning ......................... 601

17.12 Simplifying Polygons ......................... 604

17.13 Shape Similarity . . . ......................... 607

17.14 Motion Planning . . ......................... 610

17.15 Maintaining Line Arrangements . . . ................ 614

17.16 Minkowski Sum . . . ......................... 617

18 Set and String Problems 620

18.1 Set Cover . . .............................. 621

18.2 Set Packing .............................. 625

18.3 String Matching . . . ......................... 628

18.4 Approximate String Matching .................... 631

18.5 Text Compression . . ......................... 637

18.6 Cryptography ............................. 641

18.7 Finite State Machine Minimization . ................ 646

18.8 Longest Common Substring/Subsequence . ............ 650

18.9 Shortest Common Superstring .................... 654

xvi CONTENTS

19 Algorithmic Resources 657

19.1 Software Systems . . . ........................ 657

19.2 Data Sources . . ............................ 663

19.3 Online Bibliographic Resources ................... 663

19.4 Professional Consulting Services ................... 664

Bibliography 665

Index 709

1

Introduction to Algorithm Design

What is an algorithm? An algorithm is a procedure to accomplish a speciﬁc task.

An algorithm is the idea behind any reasonable computer program.

To be interesting, an algorithm must solve a general, well-speciﬁed problem.An

algorithmic problem is speciﬁed by describing the complete set of instances it must

work on and of its output after running on one of these instances. This distinction,

between a problem and an instance of a problem, is fundamental. For example, the

algorithmic problem known as sorting is deﬁned as follows:

Problem: Sorting

Input: A sequence of nkeys a1,...,a

n.

Output: The permutation (reordering) of the input sequence such that a

1≤a

2≤

···≤a

n−1≤a

n.

An instance of sorting might be an array of names, like {Mike, Bob, Sally, Jill,

Jan}, or a list of numbers like {154, 245, 568, 324, 654, 324}. Determining that

you are dealing with a general problem is your ﬁrst step towards solving it.

An algorithm is a procedure that takes any of the possible input instances

and transforms it to the desired output. There are many diﬀerent algorithms for

solving the problem of sorting. For example, insertion sort is a method for sorting

that starts with a single element (thus forming a trivially sorted list) and then

incrementally inserts the remaining elements so that the list stays sorted. This

algorithm, implemented in C, is described below:

S.S. Skiena, The Algorithm Design Manual, 2nd ed., DOI: 10.1007/978-1-84800-070-4 1,

c

Springer-Verlag London Limited 2008

41. INTRODUCTION TO ALGORITHM DESIGN

I N S E R T I O N S O R TI N S E R T I O N S O R T

I N S E R T I O N S O R T

I N S E R T I O N S O R T

E I N S R T I O N S O R TE I N S R T I O N S O R T

E I N R S T I O N S O R T

E I N R S T I O N S O R TE I N R S T I O N S O R T

E I I N R S T O N S O R T

E I I N O R S T N S O R TE I I N O R S T N S O R T

E I I N N O R S T S O R TE I I N N O R S T S O R T

E I I N N O R S S T O R TE I I N N O R S S T O R T

E I I

N

N

O

O

R R

S

S

T T

E I I N N O O R S S T R T

E I I N N O O R R S S T T

Figure 1.1: Animation of insertion sort in action (time ﬂows down)

insertion_sort(item s[], int n)

{

int i,j; /* counters */

for (i=1; i<n; i++) {

j=i;

while ((j>0) && (s[j] < s[j-1])) {

swap(&s[j],&s[j-1]);

j = j-1;

}

}

}

An animation of the logical ﬂow of this algorithm on a particular instance (the

letters in the word “INSERTIONSORT”) is given in Figure 1.1

Note the generality of this algorithm. It works just as well on names as it does

on numbers, given the appropriate comparison operation (<) to test which of the

two keys should appear ﬁrst in sorted order. It can be readily veriﬁed that this

algorithm correctly orders every possible input instance according to our deﬁnition

of the sorting problem.

There are three desirable properties for a good algorithm. We seek algorithms

that are correct and eﬃcient, while being easy to implement. These goals may not

be simultaneously achievable. In industrial settings, any program that seems to

give good enough answers without slowing the application down is often acceptable,

regardless of whether a better algorithm exists. The issue of ﬁnding the best possible

answer or achieving maximum eﬃciency usually arises in industry only after serious

performance or legal troubles.

In this chapter, we will focus on the issues of algorithm correctness, and defer a

discussion of eﬃciency concerns to Chapter 2. It is seldom obvious whether a given

1.1 ROBOT TOUR OPTIMIZATION 5

0 0

1

2

3

4

5

6

7

8

Figure 1.2: A good instance for the nearest-neighbor heuristic

algorithm correctly solves a given problem. Correct algorithms usually come with

a proof of correctness, which is an explanation of why we know that the algorithm

must take every instance of the problem to the desired result. However, before we go

further we demonstrate why “it’s obvious” never suﬃces as a proof of correctness,

and is usually ﬂat-out wrong.

1.1 Robot Tour Optimization

Let’s consider a problem that arises often in manufacturing, transportation, and

testing applications. Suppose we are given a robot arm equipped with a tool, say a

soldering iron. In manufacturing circuit boards, all the chips and other components

must be fastened onto the substrate. More speciﬁcally, each chip has a set of contact

points (or wires) that must be soldered to the board. To program the robot arm

for this job, we must ﬁrst construct an ordering of the contact points so the robot

visits (and solders) the ﬁrst contact point, then the second point, third, and so

forth until the job is done. The robot arm then proceeds back to the ﬁrst contact

point to prepare for the next board, thus turning the tool-path into a closed tour,

or cycle.

Robots are expensive devices, so we want the tour that minimizes the time it

takes to assemble the circuit board. A reasonable assumption is that the robot arm

moves with ﬁxed speed, so the time to travel between two points is proportional

to their distance. In short, we must solve the following algorithm problem:

Problem: Robot Tour Optimization

Input: AsetSof npoints in the plane.

Output: What is the shortest cycle tour that visits each point in the set S?

You are given the job of programming the robot arm. Stop right now and think

up an algorithm to solve this problem. I’ll be happy to wait until you ﬁnd one. . .

61. INTRODUCTION TO ALGORITHM DESIGN

Several algorithms might come to mind to solve this problem. Perhaps the most

popular idea is the nearest-neighbor heuristic. Starting from some point p0, we walk

ﬁrst to its nearest neighbor p1.Fromp1, we walk to its nearest unvisited neighbor,

thus excluding only p0as a candidate. We now repeat this process until we run

out of unvisited points, after which we return to p0to close oﬀ the tour. Written

in pseudo-code, the nearest-neighbor heuristic looks like this:

NearestNeighbor(P)

Pick and visit an initial point p0from P

p=p0

i=0

While there are still unvisited points

i=i+1

Select pito be the closest unvisited point to pi−1

Visit pi

Return to p0from pn−1

This algorithm has a lot to recommend it. It is simple to understand and imple-

ment. It makes sense to visit nearby points before we visit faraway points to reduce

the total travel time. The algorithm works perfectly on the example in Figure 1.2.

The nearest-neighbor rule is reasonably eﬃcient, for it looks at each pair of points

(pi,p

j) at most twice: once when adding pito the tour, the other when adding pj.

Against all these positives there is only one problem. This algorithm is completely

wrong.

Wrong? How can it be wrong? The algorithm always ﬁnds a tour, but it doesn’t

necessarily ﬁnd the shortest possible tour. It doesn’t necessarily even come close.

Consider the set of points in Figure 1.3, all of which lie spaced along a line. The

numbers describe the distance that each point lies to the left or right of the point

labeled ‘0’. When we start from the point ‘0’ and repeatedly walk to the nearest

unvisited neighbor, we might keep jumping left-right-left-right over ‘0’ as the algo-

rithm oﬀers no advice on how to break ties. A much better (indeed optimal) tour

for these points starts from the leftmost point and visits each point as we walk

right before returning at the rightmost point.

Try now to imagine your boss’s delight as she watches a demo of your robot

arm hopscotching left-right-left-right during the assembly of such a simple board.

“But wait,” you might be saying. “The problem was in starting at point ‘0’.

Instead, why don’t we start the nearest-neighbor rule using the leftmost point

as the initial point p0? By doing this, we will ﬁnd the optimal solution on this

instance.”

That is 100% true, at least until we rotate our example 90 degrees. Now all

points are equally leftmost. If the point ‘0’ were moved just slightly to the left, it

would be picked as the starting point. Now the robot arm will hopscotch up-down-

up-down instead of left-right-left-right, but the travel time will be just as bad as

before. No matter what you do to pick the ﬁrst point, the nearest-neighbor rule is

doomed to work incorrectly on certain point sets.

1.1 ROBOT TOUR OPTIMIZATION 7

-1 0 1 3 11

-21 -5

-1 0 1 3 11

-21 -5

Figure 1.3: A bad instance for the nearest-neighbor heuristic, with the optimal solution

Maybe what we need is a diﬀerent approach. Always walking to the closest

point is too restrictive, since it seems to trap us into making moves we didn’t

want. A diﬀerent idea might be to repeatedly connect the closest pair of endpoints

whose connection will not create a problem, such as premature termination of the

cycle. Each vertex begins as its own single vertex chain. After merging everything

together, we will end up with a single chain containing all the points in it. Con-

necting the ﬁnal two endpoints gives us a cycle. At any step during the execution

of this closest-pair heuristic, we will have a set of single vertices and vertex-disjoint

chains available to merge. In pseudocode:

ClosestPair(P)

Let nbe the number of points in set P.

For i=1ton−1do

d=∞

For each pair of endpoints (s, t) from distinct vertex chains

if dist(s, t)≤dthen sm=s,tm=t,andd=dist(s, t)

Connect (sm,t

m) by an edge

Connect the two endpoints by an edge

This closest-pair rule does the right thing in the example in Figure 1.3. It starts

by connecting ‘0’ to its immediate neighbors, the points 1 and −1. Subsequently,

the next closest pair will alternate left-right, growing the central path by one link at

a time. The closest-pair heuristic is somewhat more complicated and less eﬃcient

than the previous one, but at least it gives the right answer in this example.

But this is not true in all examples. Consider what this algorithm does on the

point set in Figure 1.4(l). It consists of two rows of equally spaced points, with

the rows slightly closer together (distance 1 −e) than the neighboring points are

spaced within each row (distance 1 + e). Thus the closest pairs of points stretch

across the gap, not around the boundary. After we pair oﬀ these points, the closest

81. INTRODUCTION TO ALGORITHM DESIGN

1−e

1+e

1+e

1−e

(l) 1+e

1−e

1+e

1−e

(r)

Figure 1.4: A bad instance for the closest-pair heuristic, with the optimal solution

remaining pairs will connect these pairs alternately around the boundary. The total

path length of the closest-pair tour is 3(1 −e) + 2(1 + e)+(1 −e)2+(2+2e)2.

Compared to the tour shown in Figure 1.4(r), we travel over 20% farther than

necessary when e≈0. Examples exist where the penalty is considerably worse

than this.

Thus this second algorithm is also wrong. Which one of these algorithms per-

forms better? You can’t tell just by looking at them. Clearly, both heuristics can

end up with very bad tours on very innocent-looking input.

At this point, you might wonder what a correct algorithm for our problem looks

like. Well, we could try enumerating all possible orderings of the set of points, and

then select the ordering that minimizes the total length:

OptimalTSP(P)

d=∞

For each of the n! permutations Piof point set P

If (cost(Pi)≤d) then d=cost(Pi)andPmin =Pi

Return Pmin

Since all possible orderings are considered, we are guaranteed to end up with

the shortest possible tour. This algorithm is correct, since we pick the best of all

the possibilities. But it is also extremely slow. The fastest computer in the world

couldn’t hope to enumerate all the 20! =2,432,902,008,176,640,000 orderings of 20

points within a day. For real circuit boards, where n≈1,000, forget about it.

All of the world’s computers working full time wouldn’t come close to ﬁnishing

the problem before the end of the universe, at which point it presumably becomes

moot.

The quest for an eﬃcient algorithm to solve this problem, called the traveling

salesman problem (TSP), will take us through much of this book. If you need to

know how the story ends, check out the catalog entry for the traveling salesman

problem in Section 16.4 (page 533).

1.2 SELECTING THE RIGHT JOBS 9

The President’s Algorist

Halting State

"Discrete" Mathematics Calculated Bets

Programming Challenges

Steiner’s Tree Process Terminated

Tarjan of the Jungle The Four Volume Problem

Figure 1.5: An instance of the non-overlapping movie scheduling problem

Take-Home Lesson: There is a fundamental diﬀerence between algorithms,

which always produce a correct result, and heuristics, which may usually do a

good job but without providing any guarantee.

1.2 Selecting the Right Jobs

Now consider the following scheduling problem. Imagine you are a highly-in-

demand actor, who has been presented with oﬀers to star in ndiﬀerent movie

projects under development. Each oﬀer comes speciﬁed with the ﬁrst and last day

of ﬁlming. To take the job, you must commit to being available throughout this

entire period. Thus you cannot simultaneously accept two jobs whose intervals

overlap.

For an artist such as yourself, the criteria for job acceptance is clear: you want

to make as much money as possible. Because each of these ﬁlms pays the same fee

per ﬁlm, this implies you seek the largest possible set of jobs (intervals) such that

no two of them conﬂict with each other.

For example, consider the available projects in Figure 1.5. We can star in at most

four ﬁlms, namely “Discrete” Mathematics,Programming Challenges,Calculated

Bets, and one of either Halting State or Steiner’s Tree.

You (or your agent) must solve the following algorithmic scheduling problem:

Problem: Movie Scheduling Problem

Input: AsetIof nintervals on the line.

Output: What is the largest subset of mutually non-overlapping intervals which can

be selected from I?

You are given the job of developing a scheduling algorithm for this task. Stop

right now and try to ﬁnd one. Again, I’ll be happy to wait.

There are several ideas that may come to mind. One is based on the notion

that it is best to work whenever work is available. This implies that you should

start with the job with the earliest start date – after all, there is no other job you

can work on, then at least during the begining of this period.

10 1. INTRODUCTION TO ALGORITHM DESIGN

(l) (r)

War and Peace

Figure 1.6: Bad instances for the (l) earliest job ﬁrst and (r) shortest job ﬁrst heuristics.

EarliestJobFirst(I)

Accept the earlest starting job jfrom Iwhich does not overlap any

previously accepted job, and repeat until no more such jobs remain.

This idea makes sense, at least until we realize that accepting the earliest job

might block us from taking many other jobs if that ﬁrst job is long. Check out

Figure 1.6(l), where the epic “War and Peace” is both the ﬁrst job available and

long enough to kill oﬀ all other prospects.

This bad example naturally suggests another idea. The problem with “War and

Peace” is that it is too long. Perhaps we should start by taking the shortest job,

and keep seeking the shortest available job at every turn. Maximizing the number

of jobs we do in a given period is clearly connected to banging them out as quickly

as possible. This yields the heuristic:

ShortestJobFirst(I)

While (I=∅)do

Accept the shortest possible job jfrom I.

Delete j, and any interval which intersects jfrom I.

Again this idea makes sense, at least until we realize that accepting the shortest

job might block us from taking two other jobs, as shown in Figure 1.6(r). While the

potential loss here seems smaller than with the previous heuristic, it can readily

limit us to half the optimal payoﬀ.

At this point, an algorithm where we try all possibilities may start to look good,

because we can be certain it is correct. If we ignore the details of testing whether

a set of intervals are in fact disjoint, it looks something like this:

ExhaustiveScheduling(I)

j=0

Smax =∅

For each of the 2nsubsets Siof intervals I

If (Siis mutally non-overlapping) and (size(Si)>j)

then j=size(Si)andSmax =Si.

Return Smax

But how slow is it? The key limitation is enumerating the 2nsubsets of n

things. The good news is that this is much better than enumerating all n! orders

1.3 REASONING ABOUT CORRECTNESS 11

of nthings, as proposed for the robot tour optimization problem. There are only

about one million subsets when n= 20, which could be exhaustively counted

within seconds on a decent computer. However, when fed n= 100 movies, 2100 is

much much greater than the 20! which made our robot cry “uncle” in the previous

problem.

The diﬀerence between our scheduling and robotics problems are that there is an

algorithm which solves movie scheduling both correctly and eﬃciently. Think about

the ﬁrst job to terminate—i.e. the interval xwhich contains the rightmost point

which is leftmost among all intervals. This role is played by “Discrete” Mathematics

in Figure 1.5. Other jobs may well have started before x, but all of these must at

least partially overlap each other, so we can select at most one from the group. The

ﬁrst of these jobs to terminate is x, so any of the overlapping jobs potentially block

out other opportunities to the right of it. Clearly we can never lose by picking x.

This suggests the following correct, eﬃcient algorithm:

OptimalScheduling(I)

While (I=∅)do

Accept the job jfrom Iwith the earliest completion date.

Delete j, and any interval which intersects jfrom I.

Ensuring the optimal answer over all possible inputs is a diﬃcult but often

achievable goal. Seeking counterexamples that break pretender algorithms is an

important part of the algorithm design process. Eﬃcient algorithms are often lurk-

ing out there; this book seeks to develop your skills to help you ﬁnd them.

Take-Home Lesson: Reasonable-looking algorithms can easily be incorrect. Al-

gorithm correctness is a property that must be carefully demonstrated.

1.3 Reasoning about Correctness

Hopefully, the previous examples have opened your eyes to the subtleties of algo-

rithm correctness. We need tools to distinguish correct algorithms from incorrect

ones, the primary one of which is called a proof.

A proper mathematical proof consists of several parts. First, there is a clear,

precise statement of what you are trying to prove. Second, there is a set of assump-

tions of things which are taken to be true and hence used as part of the proof.

Third, there is a chain of reasoning which takes you from these assumptions to the

statement you are trying to prove. Finally, there is a little square ( )orQED at the

bottom to denote that you have ﬁnished, representing the Latin phrase for “thus

it is demonstrated.”

This book is not going to emphasize formal proofs of correctness, because they

are very diﬃcult to do right and quite misleading when you do them wrong. A

proof is indeed a demonstration. Proofs are useful only when they are honest; crisp

arguments explaining why an algorithm satisﬁes a nontrivial correctness property.

12 1. INTRODUCTION TO ALGORITHM DESIGN

Correct algorithms require careful exposition, and eﬀorts to show both cor-

rectness and not incorrectness. We develop tools for doing so in the subsections

below.

1.3.1 Expressing Algorithms

Reasoning about an algorithm is impossible without a careful description of the

sequence of steps to be performed. The three most common forms of algorithmic

notation are (1) English, (2) pseudocode, or (3) a real programming language.

We will use all three in this book. Pseudocode is perhaps the most mysterious of

the bunch, but it is best deﬁned as a programming language that never complains

about syntax errors. All three methods are useful because there is a natural tradeoﬀ

between greater ease of expression and precision. English is the most natural but

least precise programming language, while Java and C/C++ are precise but diﬃ-

cult to write and understand. Pseudocode is generally useful because it represents

a happy medium.

The choice of which notation is best depends upon which method you are most

comfortable with. I usually prefer to describe the ideas of an algorithm in English,

moving to a more formal, programming-language-like pseudocode or even real code

to clarify suﬃciently tricky details.

A common mistake my students make is to use pseudocode to dress up an ill-

deﬁned idea so that it looks more formal. Clarity should be the goal. For example,

the ExhaustiveScheduling algorithm on page 10 could have been better written

in English as:

ExhaustiveScheduling(I)

Test all 2nsubsets of intervals from I, and return the largest subset

consisting of mutually non-overlapping intervals.

Take-Home Lesson: The heart of any algorithm is an idea. If your idea is

not clearly revealed when you express an algorithm, then you are using too

low-level a notation to describe it.

1.3.2 Problems and Properties

We need more than just an algorithm description in order to demonstrate cor-

rectness. We also need a careful description of the problem that it is intended to

solve.

Problem speciﬁcations have two parts: (1) the set of allowed input instances,

and (2) the required properties of the algorithm’s output. It is impossible to prove

the correctness of an algorithm for a fuzzily-stated problem. Put another way, ask

the wrong problem and you will get the wrong answer.

Some problem speciﬁcations allow too broad a class of input instances. Suppose

we had allowed ﬁlm projects in our movie scheduling problem to have gaps in

1.3 REASONING ABOUT CORRECTNESS 13

production (i.e. , ﬁlming in September and November but a hiatus in October).

Then the schedule associated with any particular ﬁlm would consist of a given set

of intervals. Our star would be free to take on two interleaving but not overlapping

projects (such as the ﬁlm above nested with one ﬁlming in August and October).

The earliest completion algorithm would not work for such a generalized scheduling

problem. Indeed, no eﬃcient algorithm exists for this generalized problem.

Take-Home Lesson: An important and honorable technique in algorithm de-

sign is to narrow the set of allowable instances until there is a correct and

eﬃcient algorithm. For example, we can restrict a graph problem from general

graphs down to trees, or a geometric problem from two dimensions down to

one.

There are two common traps in specifying the output requirements of a problem.

One is asking an ill-deﬁned question. Asking for the best route between two places

on a map is a silly question unless you deﬁne what best means. Do you mean the

shortest route in total distance, or the fastest route, or the one minimizing the

number of turns?

The second trap is creating compound goals. The three path-planning criteria

mentioned above are all well-deﬁned goals that lead to correct, eﬃcient optimiza-

tion algorithms. However, you must pick a single criteria. A goal like Find the

shortest path from ato bthat doesn’t use more than twice as many turns as neces-

sary is perfectly well deﬁned, but complicated to reason and solve.

I encourage you to check out the problem statements for each of the 75 catalog

problems in the second part of this book. Finding the right formulation for your

problem is an important part of solving it. And studying the deﬁnition of all these

classic algorithm problems will help you recognize when someone else has thought

about similar problems before you.

1.3.3 Demonstrating Incorrectness

The best way to prove that an algorithm is incorrect is to produce an instance in

which it yields an incorrect answer. Such instances are called counter-examples.

No rational person will ever leap to the defense of an algorithm after a counter-

example has been identiﬁed. Very simple instances can instantly kill reasonable-

looking heuristics with a quick touch´e. Good counter-examples have two important

properties:

•Veriﬁability – To demonstrate that a particular instance is a counter-example

to a particular algorithm, you must be able to (1) calculate what answer your

algorithm will give in this instance, and (2) display a better answer so as to

prove the algorithm didn’t ﬁnd it.

Since you must hold the given instance in your head to reason about it, an

important part of veriﬁability is. . .

14 1. INTRODUCTION TO ALGORITHM DESIGN

•Simplicity – Good counter-examples have all unnecessary details boiled away.

They make clear exactly why the proposed algorithm fails. Once a counter-

example has been found, it is worth simplifying it down to its essence. For

example, the counter-example of Figure 1.6(l) could be made simpler and

better by reducing the number of overlapped segments from four to two.

Hunting for counter-examples is a skill worth developing. It bears some simi-

larity to the task of developing test sets for computer programs, but relies more on

inspiration than exhaustion. Here are some techniques to aid your quest:

•Think small – Note that the robot tour counter-examples I presented boiled

down to six points or less, and the scheduling counter-examples to only three

intervals. This is indicative of the fact that when algorithms fail, there is

usually a very simple example on which they fail. Amateur algorists tend

to draw a big messy instance and then stare at it helplessly. The pros look

carefully at several small examples, because they are easier to verify and

reason about.

•Think exhaustively – There are only a small number of possibilities for the

smallest nontrivial value of n. For example, there are only three interesting

ways two intervals on the line can occur: (1) as disjoint intervals, (2) as

overlapping intervals, and (3) as properly nesting intervals, one within the

other. All cases of three intervals (including counter-examples to both movie

heuristics) can be systematically constructed by adding a third segment in

each possible way to these three instances.

•Hunt for the weakness – If a proposed algorithm is of the form “always take

the biggest” (better known as the greedy algorithm), think about why that

might prove to be the wrong thing to do. In particular, . . .

•Go for a tie – A devious way to break a greedy heuristic is to provide instances

where everything is the same size. Suddenly the heuristic has nothing to base

its decision on, and perhaps has the freedom to return something suboptimal

as the answer.

•Seek extremes – Many counter-examples are mixtures of huge and tiny, left

and right, few and many, near and far. It is usually easier to verify or rea-

son about extreme examples than more muddled ones. Consider two tightly

bunched clouds of points separated by a much larger distance d. The optimal

TSP tour will be essentially 2dregardless of the number of points, because

what happens within each cloud doesn’t really matter.

Take-Home Lesson: Searching for counterexamples is the best way to disprove

the correctness of a heuristic.

1.3 REASONING ABOUT CORRECTNESS 15

1.3.4 Induction and Recursion

Failure to ﬁnd a counterexample to a given algorithm does not mean “it is obvious”

that the algorithm is correct. A proof or demonstration of correctness is needed.

Often mathematical induction is the method of choice.

When I ﬁrst learned about mathematical induction it seemed like complete

magic. You proved a formula like n

i=1 i=n(n+1)/2 for some basis case like 1

or 2, then assumed it was true all the way to n−1 before proving it was true for

general nusing the assumption. That was a proof? Ridiculous!

When I ﬁrst learned the programming technique of recursion it also seemed like

complete magic. The program tested whether the input argument was some basis

case like 1 or 2. If not, you solved the bigger case by breaking it into pieces and

calling the subprogram itself to solve these pieces. That was a program? Ridiculous!

The reason both seemed like magic is because recursion is mathematical induc-

tion. In both, we have general and boundary conditions, with the general condition

breaking the problem into smaller and smaller pieces. The initial or boundary con-

dition terminates the recursion. Once you understand either recursion or induction,

you should be able to see why the other one also works.

I’ve heard it said that a computer scientist is a mathematician who only knows

how to prove things by induction. This is partially true because computer scientists

are lousy at proving things, but primarily because so many of the algorithms we

study are either recursive or incremental.

Consider the correctness of insertion sort, which we introduced at the beginning

of this chapter. The reason it is correct can be shown inductively:

•The basis case consists of a single element, and by deﬁnition a one-element

array is completely sorted.

•In general, we can assume that the ﬁrst n−1 elements of array Aare com-

pletely sorted after n−1 iterations of insertion sort.

•To insert one last element xto A, we ﬁnd where it goes, namely the unique

spot between the biggest element less than or equal to xand the smallest

element greater than x. This is done by moving all the greater elements back

by one position, creating room for xin the desired location.

One must be suspicious of inductive proofs, however, because very subtle rea-

soning errors can creep in. The ﬁrst are boundary errors. For example, our insertion

sort correctness proof above boldly stated that there was a unique place to insert

xbetween two elements, when our basis case was a single-element array. Greater

care is needed to properly deal with the special cases of inserting the minimum or

maximum elements.

The second and more common class of inductive proof errors concerns cavallier

extension claims. Adding one extra item to a given problem instance might cause

the entire optimal solution to change. This was the case in our scheduling problem

(see Figure 1.7). The optimal schedule after inserting a new segment may contain

16 1. INTRODUCTION TO ALGORITHM DESIGN

Figure 1.7: Large-scale changes in the optimal solution (boxes) after inserting a single interval

(dashed) into the instance

none of the segments of any particular optimal solution prior to insertion. Boldly

ignoring such diﬃculties can lead to very convincing inductive proofs of incorrect

algorithms.

Take-Home Lesson: Mathematical induction is usually the right way to verify

the correctness of a recursive or incremental insertion algorithm.

Stop and Think: Incremental Correctness

Problem: Prove the correctness of the following recursive algorithm for increment-

ing natural numbers, i.e. y→y+1:

Increment(y)

if y=0then return(1) else

if (ymod 2) = 1 then

return(2 ·Increment(y/2))

else return(y+1)

Solution: The correctness of this algorithm is certainly not obvious to me. But as

it is recursive and I am a computer scientist, my natural instinct is to try to prove

it by induction.

The basis case of y= 0 is obviously correctly handled. Clearly the value 1 is

returned, and 0 + 1 = 1.

Now assume the function works correctly for the general case of y=n−1. Given

this, we must demonstrate the truth for the case of y=n. Half of the cases are

easy, namely the even numbers (For which (ymod 2) = 0), since y+ 1 is explicitly

returned.

For the odd numbers, the answer depends upon what is returned by

Increment(y/2). Here we want to use our inductive assumption, but it isn’t

quite right. We have assumed that increment worked correctly for y=n−1, but

not for a value which is about half of it. We can ﬁx this problem by strengthening

our assumption to declare that the general case holds for all y≤n−1. This costs us

nothing in principle, but is necessary to establish the correctness of the algorithm.

1.3 REASONING ABOUT CORRECTNESS 17

Now, the case of odd y(i.e. y=2m+ 1 for some integer m) can be dealt with

as:

2·Increment((2m+1)/2)=2·Increment(m+1/2)

=2·Increment(m)

=2(m+1)

=2m+2=y+1

and the general case is resolved.

1.3.5 Summations

Mathematical summation formulae arise often in algorithm analysis, which we will

study in Chapter 2. Further, proving the correctness of summation formulae is a

classic application of induction. Several exercises on inductive proofs of summations

appear as exercises at the end this chapter. To make these more accessible, I review

the basics of summations here.

Summation formula are concise expressions describing the addition of an arbi-

trarily large set of numbers, in particular the formula

n

i=1

f(i)=f(1) + f(2) + ...+f(n)

There are simple closed forms for summations of many algebraic functions. For

example, since nones is n,

n

i=1

1=n

The sum of the ﬁrst nintegers can be seen by pairing up the ith and (n−i+ 1)th

integers:

n

i=1

i=

n/2

i=1

(i+(n−i+ 1)) = n(n+1)/2

Recognizing two basic classes of summation formulae will get you a long way

in algorithm analysis:

•Arithmetic progressions – We already encountered arithmetic progressions

when we saw S(n)=n

ii=n(n+1)/2 in the analysis of selection sort.

From the big picture perspective, the important thing is that the sum is

quadratic, not that the constant is 1/2. In general,

S(n, p)=

n

i

ip=Θ(np+1)

18 1. INTRODUCTION TO ALGORITHM DESIGN

for p≥1. Thus the sum of squares is cubic, and the sum of cubes is quartic

(if you use such a word). The “big Theta” notation (Θ(x)) will be properly

explained in Section 2.2.

For p<−1, this sum always converges to a constant, even as n→∞.The

interesting case is between results in ...

•Geometric series – In geometric progressions, the index of the loop eﬀects

the exponent, i.e.

G(n, a)=

n

i=0

ai=a(an+1 −1)/(a−1)

How we interpret this sum depends upon the base of the progression, i.e. a.

When a<1, this converges to a constant even as n→∞.

This series convergence proves to be the great “free lunch” of algorithm anal-

ysis. It means that the sum of a linear number of things can be constant, not

linear. For example, 1 + 1/2+1/4+1/8+...≤2 no matter how many terms

we add up.

When a>1, the sum grows rapidly with each new term, as in 1 + 2 + 4 +

8 + 16 + 32 = 63. Indeed, G(n, a)=Θ(an+1)fora>1.

Stop and Think: Factorial Formulae

Problem: Prove that n

i=1 i×i!=(n+ 1)! −1 by induction.

Solution: The inductive paradigm is straightforward. First verify the basis case

(here we do n= 1, although n= 0 would be even more general):

1

i=1

i×i! = 1 = (1 + 1)! −1=2−1=1

Now assume the statement is true up to n. To prove the general case of n+1,

observe that rolling out the largest term

n+1

i=1

i×i!=(n+1)×(n+ 1)! +

n

i=1

i×i!

reveals the left side of our inductive assumption. Substituting the right side gives

us

n+1

i=1

i×i!=(n+1)×(n+ 1)! + (n+ 1)! −1

1.4 MODELING THE PROBLEM 19

=(n+ 1)! ×((n+1)+1)−1

=(n+ 2)! −1

This general trick of separating out the largest term from the summation to

reveal an instance of the inductive assumption lies at the heart of all such proofs.

1.4 Modeling the Problem

Modeling is the art of formulating your application in terms of precisely described,

well-understood problems. Proper modeling is the key to applying algorithmic de-

sign techniques to real-world problems. Indeed, proper modeling can eliminate the

need to design or even implement algorithms, by relating your application to what

has been done before. Proper modeling is the key to eﬀectively using the “Hitch-

hiker’s Guide” in Part II of this book.

Real-world applications involve real-world objects. You might be working on a

system to route traﬃc in a network, to ﬁnd the best way to schedule classrooms

in a university, or to search for patterns in a corporate database. Most algorithms,

however, are designed to work on rigorously deﬁned abstract structures such as

permutations, graphs, and sets. To exploit the algorithms literature, you must

learn to describe your problem abstractly, in terms of procedures on fundamental

structures.

1.4.1 Combinatorial Objects

Odds are very good that others have stumbled upon your algorithmic problem

before you, perhaps in substantially diﬀerent contexts. But to ﬁnd out what is

known about your particular “widget optimization problem,” you can’t hope to

look in a book under widget. You must formulate widget optimization in terms of

computing properties of common structures such as:

•Permutations – which are arrangements, or orderings, of items. For example,

{1,4,3,2}and {4,3,2,1}are two distinct permutations of the same set of four

integers. We have already seen permutations in the robot optimization prob-

lem, and in sorting. Permutations are likely the object in question whenever

your problem seeks an “arrangement,” “tour,” “ordering,” or “sequence.”

•Subsets – which represent selections from a set of items. For example, {1,3,4}

and {2}are two distinct subsets of the ﬁrst four integers. Order does not

matter in subsets the way it does with permutations, so the subsets {1,3,4}

and {4,3,1}would be considered identical. We saw subsets arise in the movie

scheduling problem. Subsets are likely the object in question whenever your

problem seeks a “cluster,” “collection,” “committee,” “group,” “packaging,”

or “selection.”

20 1. INTRODUCTION TO ALGORITHM DESIGN

Sol

Steve Len Jim Lisa JeffRichardRob Laurie

Morris Eve Sid

Stony Brook

Orient Point

Montauk

Shelter Island

Sag Harbor

Riverhead

Islip

Greenport

Figure 1.8: Modeling real-world structures with trees and graphs

•Trees – which represent hierarchical relationships between items. Figure

1.8(a) shows part of the family tree of the Skiena clan. Trees are likely the

object in question whenever your problem seeks a “hierarchy,” “dominance

relationship,” “ancestor/descendant relationship,” or “taxonomy.”

•Graphs – which represent relationships between arbitrary pairs of objects.

Figure 1.8(b) models a network of roads as a graph, where the vertices are

cities and the edges are roads connecting pairs of cities. Graphs are likely

the object in question whenever you seek a “network,” “circuit,” “web,” or

“relationship.”

•Points – which represent locations in some geometric space. For example,

the locations of McDonald’s restaurants can be described by points on a

map/plane. Points are likely the object in question whenever your problems

work on “sites,” “positions,” “data records,” or “locations.”

•Polygons – which represent regions in some geometric spaces. For example,

the borders of a country can be described by a polygon on a map/plane.

Polygons and polyhedra are likely the object in question whenever you are

working on “shapes,” “regions,” “conﬁgurations,” or “boundaries.”

•Strings – which represent sequences of characters or patterns. For example,

the names of students in a class can be represented by strings. Strings are

likely the object in question whenever you are dealing with “text,” “charac-

ters,” “patterns,” or “labels.”

These fundamental structures all have associated algorithm problems, which are

presented in the catalog of Part II. Familiarity with these problems is important,

because they provide the language we use to model applications. To become ﬂuent

in this vocabulary, browse through the catalog and study the input and output pic-

tures for each problem. Understanding these problems, even at a cartoon/deﬁnition

level, will enable you to know where to look later when the problem arises in your

application.

1.4 MODELING THE PROBLEM 21

A L G O R I T H M

4+{1,4,2,3}

9+{1,2,7}{1,2,7,9}

{4,1,5,2,3}

L G O R I T H MA

Figure 1.9: Recursive decompositions of combinatorial objects. (left column) Permutations,

subsets, trees, and graphs. (right column) Point sets, polygons, and strings

Examples of successful application modeling will be presented in the war stories

spaced throughout this book. However, some words of caution are in order. The act

of modeling reduces your application to one of a small number of existing problems

and structures. Such a process is inherently constraining, and certain details might

not ﬁt easily into the given target problem. Also, certain problems can be modeled

in several diﬀerent ways, some much better than others.

Modeling is only the ﬁrst step in designing an algorithm for a problem. Be alert

for how the details of your applications diﬀer from a candidate model, but don’t

be too quick to say that your problem is unique and special. Temporarily ignoring

details that don’t ﬁt can free the mind to ask whether they really were fundamental

in the ﬁrst place.

Take-Home Lesson: Modeling your application in terms of well-deﬁned struc-

tures and algorithms is the most important single step towards a solution.

1.4.2 Recursive Objects

Learning to think recursively is learning to look for big things that are made from

smaller things of exactly the same type as the big thing. If you think of houses as

sets of rooms, then adding or deleting a room still leaves a house behind.

Recursive structures occur everywhere in the algorithmic world. Indeed, each

of the abstract structures described above can be thought about recursively. You

just have to see how you can break them down, as shown in Figure 1.9:

•Permutations – Delete the ﬁrst element of a permutation of {1,...,n}things

and you get a permutation of the remaining n−1 things. Permutations are

recursive objects.

22 1. INTRODUCTION TO ALGORITHM DESIGN

•Subsets – Every subset of the elements {1,...,n}contains a subset of

{1,...,n −1}made visible by deleting element nif it is present. Subsets

are recursive objects.

•Trees – Delete the root of a tree and what do you get? A collection of smaller

trees. Delete any leaf of a tree and what do you get? A slightly smaller tree.

Trees are recursive objects.

•Graphs – Delete any vertex from a graph, and you get a smaller graph. Now

divide the vertices of a graph into two groups, left and right. Cut through

all edges which span from left to right, and what do you get? Two smaller

graphs, and a bunch of broken edges. Graphs are recursive objects.

•Points – Take a cloud of points, and separate them into two groups by drawing

a line. Now you have two smaller clouds of points. Point sets are recursive

objects.

•Polygons – Inserting any internal chord between two nonadjacent vertices of

a simple polygon on nvertices cuts it into two smaller polygons. Polygons

are recursive objects.

•Strings – Delete the ﬁrst character from a string, and what do you get? A

shorter string. Strings are recursive objects.

Recursive descriptions of objects require both decomposition rules and basis

cases, namely the speciﬁcation of the smallest and simplest objects where the de-

composition stops. These basis cases are usually easily deﬁned. Permutations and

subsets of zero things presumably look like {}. The smallest interesting tree or

graph consists of a single vertex, while the smallest interesting point cloud consists

of a single point. Polygons are a little trickier; the smallest genuine simple polygon

is a triangle. Finally, the empty string has zero characters in it. The decision of

whether the basis case contains zero or one element is more a question of taste and

convenience than any fundamental principle.

Such recursive decompositions will come to deﬁne many of the algorithms we

will see in this book. Keep your eyes open for them.

1.5 About the War Stories

The best way to learn how careful algorithm design can have a huge impact on per-

formance is to look at real-world case studies. By carefully studying other people’s

experiences, we learn how they might apply to our work.

Scattered throughout this text are several of my own algorithmic war stories,

presenting our successful (and occasionally unsuccessful) algorithm design eﬀorts

on real applications. I hope that you will be able to internalize these experiences

so that they will serve as models for your own attacks on problems.

1.6 WAR STORY: PSYCHIC MODELING 23

Every one of the war stories is true. Of course, the stories improve somewhat in

the retelling, and the dialogue has been punched up to make them more interesting

to read. However, I have tried to honestly trace the process of going from a raw

problem to a solution, so you can watch how this process unfolded.

The Oxford English Dictionary deﬁnes an algorist as “one skillful in reckonings

or ﬁguring.” In these stories, I have tried to capture some of the mindset of the

algorist in action as they attack a problem.

The various war stories usually involve at least one, and often several, problems

from the problem catalog in Part II. I reference the appropriate section of the

catalog when such a problem occurs. This emphasizes the beneﬁts of modeling

your application in terms of standard algorithm problems. By using the catalog,

you will be able to pull out what is known about any given problem whenever it is

needed.

1.6 War Story: Psychic Modeling

The call came for me out of the blue as I sat in my oﬃce.

“Professor Skiena, I hope you can help me. I’m the President of Lotto Systems

Group Inc., and we need an algorithm for a problem arising in our latest product.”

“Sure,” I replied. After all, the dean of my engineering school is always encour-

aging our faculty to interact more with industry.

“At Lotto Systems Group, we market a program designed to improve our cus-

tomers’ psychic ability to predict winning lottery numbers.1In a standard lottery,

each ticket consists of six numbers selected from, say, 1 to 44. Thus, any given

ticket has only a very small chance of winning. However, after proper training, our

clients can visualize, say, 15 numbers out of the 44 and be certain that at least four

of them will be on the winning ticket. Are you with me so far?”

“Probably not,” I replied. But then I recalled how my dean encourages us to

interact with industry.

“Our problem is this. After the psychic has narrowed the choices down to 15

numbers and is certain that at least 4 of them will be on the winning ticket, we

must ﬁnd the most eﬃcient way to exploit this information. Suppose a cash prize

is awarded whenever you pick at least three of the correct numbers on your ticket.

We need an algorithm to construct the smallest set of tickets that we must buy in

order to guarantee that we win at least one prize.”

“Assuming the psychic is correct?”

“Yes, assuming the psychic is correct. We need a program that prints out a list

of all the tickets that the psychic should buy in order to minimize their investment.

Can you help us?”

Maybe they did have psychic ability, for they had come to the right place. Iden-

tifying the best subset of tickets to buy was very much a combinatorial algorithm

1Yes, this is a true story.

24 1. INTRODUCTION TO ALGORITHM DESIGN

35

34

25

24

23

15

14

13

12

45

Figure 1.10: Covering all pairs of {1,2,3,4,5}with tickets {1,2,3},{1,4,5},{2,4,5},{3,4,5}

problem. It was going to be some type of covering problem, where each ticket we

buy was going to “cover” some of the possible 4-element subsets of the psychic’s

set. Finding the absolute smallest set of tickets to cover everything was a special

instance of the NP-complete problem set cover (discussed in Section 18.1 (page

621)), and presumably computationally intractable.

It was indeed a special instance of set cover, completely speciﬁed by only four

numbers: the size nof the candidate set S(typically n≈15), the number of slots

kfor numbers on each ticket (typically k≈6), the number of psychically-promised

correct numbers jfrom S(say j= 4), and ﬁnally, the number of matching numbers

lnecessary to win a prize (say l= 3). Figure 1.10 illustrates a covering of a smaller

instance, where n=5,j=k=3,andl=2.

“Although it will be hard to ﬁnd the exact minimum set of tickets to buy, with

heuristics I should be able to get you pretty close to the cheapest covering ticket

set,” I told him. “Will that be good enough?”

“So long as it generates better ticket sets than my competitor’s program, that

will be ﬁne. His system doesn’t always guarantee a win. I really appreciate your

help on this, Professor Skiena.”

“One last thing. If your program can train people to pick lottery winners, why

don’t you use it to win the lottery yourself?”

“I look forward to talking to you again real soon, Professor Skiena. Thanks for

the help.”

I hung up the phone and got back to thinking. It seemed like the perfect project

to give to a bright undergraduate. After modeling it in terms of sets and subsets,

the basic components of a solution seemed fairly straightforward:

1.6 WAR STORY: PSYCHIC MODELING 25

•We needed the ability to generate all subsets of knumbers from the candidate

set S. Algorithms for generating and ranking/unranking subsets of sets are

presented in Section 14.5 (page 452).

•We needed the right formulation of what it meant to have a covering set of

purchased tickets. The obvious criteria would be to pick a small set of tickets

such that we have purchased at least one ticket containing each of the n

l

l-subsets of Sthat might pay oﬀ with the prize.

•We needed to keep track of which prize combinations we have thus far cov-

ered. We seek tickets to cover as many thus-far-uncovered prize combinations

as possible. The currently covered combinations are a subset of all possible

combinations. Data structures for subsets are discussed in Section 12.5 (page

385). The best candidate seemed to be a bit vector, which would answer in

constant time “is this combination already covered?”

•We needed a search mechanism to decide which ticket to buy next. For small

enough set sizes, we could do an exhaustive search over all possible sub-

sets of tickets and pick the smallest one. For larger problems, a randomized

search process like simulated annealing (see Section 7.5.3 (page 254)) would

select tickets-to-buy to cover as many uncovered combinations as possible.

By repeating this randomized procedure several times and picking the best

solution, we would be likely to come up with a good set of tickets.

Excluding the details of the search mechanism, the pseudocode for the book-

keeping looked something like this:

LottoTicketSet(n, k, l)

Initialize the n

l-element bit-vector Vto all false

While there exists a false entry in V

Select a k-subset Tof {1,...,n}as the next ticket to buy

For each of the l-subsets Tiof T,V[rank(Ti)] = true

Report the set of tickets bought

The bright undergraduate, Fayyaz Younas, rose to the challenge. Based on

this framework, he implemented a brute-force search algorithm and found optimal

solutions for problems with n≤5 in a reasonable time. He implemented a random

search procedure to solve larger problems, tweaking it for a while before settling on

the best variant. Finally, the day arrived when we could call Lotto Systems Group

and announce that we had solved the problem.

“Our program found an optimal solution for n= 15, k=6,j=6,l= 3 meant

buying 28 tickets.”

“Twenty-eight tickets!” complained the president. “You must have a bug. Look,

these ﬁve tickets will suﬃce to cover everything twice over: {2,4,8,10,13,14},

{4,5,7,8,12,15},{1,2,3,6,11,13},{3,5,6,9,10,15},{1,7,9,11,12,14}.”

26 1. INTRODUCTION TO ALGORITHM DESIGN

35

34

25

24

23

15

14

13

12

45

Figure 1.11: Guaranteeing a winning pair from {1,2,3,4,5}using only tickets {1,2,3}and

{1,4,5}

We ﬁddled with this example for a while before admitting that he was right. We

hadn’t modeled the problem correctly! In fact, we didn’t need to explicitly cover all

possible winning combinations. Figure 1.11 illustrates the principle by giving a two-

ticket solution to our previous four-ticket example. Such unpromising outcomes as

{2,3,4}and {3,4,5}each agree in one matching pair with tickets from Figure 1.11.

We were trying to cover too many combinations, and the penny-pinching psychics

were unwilling to pay for such extravagance.

Fortunately, this story has a happy ending. The general outline of our search-

based solution still holds for the real problem. All we must ﬁx is which subsets

we get credit for covering with a given set of tickets. After this modiﬁcation, we

obtained the kind of results they were hoping for. Lotto Systems Group gratefully

accepted our program to incorporate into their product, and hopefully hit the

jackpot with it.

The moral of this story is to make sure that you model the problem correctly

before trying to solve it. In our case, we came up with a reasonable model, but

didn’t work hard enough to validate it before we started to program. Our misin-

terpretation would have become obvious had we worked out a small example by

hand and bounced it oﬀ our sponsor before beginning work. Our success in recov-

ering from this error is a tribute to the basic correctness of our initial formulation,

and our use of well-deﬁned abstractions for such tasks as (1) ranking/unranking

k-subsets, (2) the set data structure, and (3) combinatorial search.

1.7 EXERCISES 27

Chapter Notes

Every decent algorithm book reﬂects the design philosophy of its author. For stu-

dents seeking alternative presentations and viewpoints, we particularly recommend

the books of Corman, et. al [CLRS01], Kleinberg/Tardos [KT06], and Manber

[Man89].

Formal proofs of algorithm correctness are important, and deserve a fuller dis-

cussion than we are able to provide in this chapter. See Gries [Gri89] for a thorough

introduction to the techniques of program veriﬁcation.

The movie scheduling problem represents a very special case of the general inde-

pendent set problem, which is discussed in Section 16.2 (page 528). The restriction

limits the allowable input instances to interval graphs, where the vertices of the

graph Gcan be represented by intervals on the line and (i, j) is an edge of Giﬀ

the intervals overlap. Golumbic [Gol04] provides a full treatment of this interesting

and important class of graphs.

Jon Bentley’s Programming Pearls columns are probably the best known col-

lection of algorithmic “war stories.” Originally published in the Communications

of the ACM, they have been collected in two books [Ben90, Ben99]. Brooks’s The

Mythical Man Month [Bro95] is another wonderful collection of war stories, focused

more on software engineering than algorithm design, but they remain a source of

considerable wisdom. Every programmer should read all these books, for pleasure

as well as insight.

Our solution to the lotto ticket set covering problem is presented in more detail

in [YS96].

1.7 Exercises

Finding Counterexamples

1-1. [3] Show that a+bcanbelessthanmin(a, b).

1-2. [3] Show that a×bcanbelessthanmin(a, b).

1-3. [5] Design/draw a road network with two points aand bsuch that the fastest route

between aand bis not the shortest route.

1-4. [5] Design/draw a road network with two points aand bsuch that the shortest

route between aand bis not the route with the fewest turns.

1-5. [4] The knapsack problem is as follows: given a set of integers S={s1,s

2,...,s

n},

and a target number T, ﬁnd a subset of Swhich adds up exactly to T. For example,

there exists a subset within S={1,2,5,9,10}that adds up to T= 22 but not

T= 23.

Find counterexamples to each of the following algorithms for the knapsack problem.

That is, giving an Sand Tsuch that the subset is selected using the algorithm does

not leave the knapsack completely full, even though such a solution exists.

28 1. INTRODUCTION TO ALGORITHM DESIGN

(a) Put the elements of Sin the knapsack in left to right order if they ﬁt, i.e. the

ﬁrst-ﬁt algorithm.

(b) Put the elements of Sin the knapsack from smallest to largest, i.e. the best-ﬁt

algorithm.

(c) Put the elements of Sin the knapsack from largest to smallest.

1-6. [5] The set cover problem is as follows: given a set of subsets S1, ..., Smof the

universal set U={1, ..., n}, ﬁnd the smallest subset of subsets T⊂Ssuch that

∪ti∈Tti=U. For example, there are the following subsets, S1={1,3,5},S2=

{2,4},S3={1,4},andS4={2,5}The set cover would then be S1and S2.

Find a counterexample for the following algorithm: Select the largest subset for the

cover, and then delete all its elements from the universal set. Repeat by adding the

subset containing the largest number of uncovered elements until all are covered.

Proofs of Correctness

1-7. [3] Prove the correctness of the following recursive algorithm to multiply two

natural numbers, for all integer constants c≥2.

function multiply(y, z)

comment Return the product yz.

1. if z=0then return(0) else

2. return(multiply(cy, z/c)+y·(zmod c))

1-8. [3] Prove the correctness of the following algorithm for evaluating a polynomial.

P(x) = anxn+an−1xn−1+...+a1x+a0

function horner(A, x)

p=An

for ifrom n−1to0

p=p∗x+Ai

return p

1-9. [3] Prove the correctness of the following sorting algorithm.

function bubblesort (A: list[1 ...n])

var int i, j

for ifrom nto 1

for jfrom 1 to i−1

if (A[j]>A[j+1])

swap the values of A[j]andA[j+1]

Induction

1-10. [3] Prove that n

i=1 i=n(n+1)/2forn≥0, by induction.

1-11. [3] Prove that n

i=1 i2=n(n+ 1)(2n+1)/6forn≥0, by induction.

1-12. [3] Prove that n

i=1 i3=n2(n+1)

2/4forn≥0, by induction.

1-13. [3] Prove that

n

i=1

i(i+1)(i+2)=n(n+1)(n+2)(n+3)/4

1.7 EXERCISES 29

1-14. [5] Prove by induction on n≥1 that for every a=1,

n

i=0

ai=an+1 −1

a−1

1-15. [3] Prove by induction that for n≥1,

n

i=1

1

i(i+1) =n

n+1

1-16. [3] Prove by induction that n3+2nis divisible by 3 for all n≥0.

1-17. [3] Prove by induction that a tree with nvertices has exactly n−1edges.

1-18. [3] Prove by mathematical induction that the sum of the cubes of the ﬁrst n

positive integers is equal to the square of the sum of these integers, i.e.

n

i=1

i3=(

n

i=1

i)2

Estimation

1-19. [3] Do all the books you own total at least one million pages? How many total

pages are stored in your school library?

1-20. [3] How many words are there in this textbook?

1-21. [3] How many hours are one million seconds? How many days? Answer these

questions by doing all arithmetic in your head.

1-22. [3] Estimate how many cities and towns there are in the United States.

1-23. [3] Estimate how many cubic miles of water ﬂow out of the mouth of the Mississippi

River each day. Do not look up any supplemental facts. Describe all assumptions

you made in arriving at your answer.

1-24. [3] Is disk drive access time normally measured in milliseconds (thousandths of a

second) or microseconds (millionths of a second)? Does your RAM memory access

a word in more or less than a microsecond? How many instructions can your CPU

execute in one year if the machine is left running all the time?

1-25. [4] A sorting algorithm takes 1 second to sort 1,000 items on your local machine.

How long will it take to sort 10,000 items. . .

(a) if you believe that the algorithm takes time proportional to n2,and

(b) if you believe that the algorithm takes time roughly proportional to nlog n?

Implementation Projects

1-26. [5] Implement the two TSP heuristics of Section 1.1 (page 5). Which of them gives

better-quality solutions in practice? Can you devise a heuristic that works better

than both of them?

1-27. [5] Describe how to test whether a given set of tickets establishes suﬃcient coverage

in the Lotto problem of Section 1.6 (page 23). Write a program to ﬁnd good ticket

sets.

30 1. INTRODUCTION TO ALGORITHM DESIGN

Interview Problems

1-28. [5] Write a function to perform integer division without using either the /or *

operators. Find a fast way to do it.

1-29. [5] There are 25 horses. At most, 5 horses can race together at a time. You must

determine the fastest, second fastest, and third fastest horses. Find the minimum

number of races in which this can be done.

1-30. [3] How many piano tuners are there in the entire world?

1-31. [3] How many gas stations are there in the United States?

1-32. [3] How much does the ice in a hockey rink weigh?

1-33. [3] How many miles of road are there in the United States?

1-34. [3] On average, how many times would you have to ﬂip open the Manhattan phone

book at random in order to ﬁnd a speciﬁc name?

Programming Challenges

These programming challenge problems with robot judging are available at

http://www.programming-challenges.com or http://online-judge.uva.es.

1-1. “The 3n+ 1 Problem” – Programming Challenges 110101, UVA Judge 100.

1-2. “The Trip” – Programming Challenges 110103, UVA Judge 10137.

1-3. “Australian Voting” – Programming Challenges 110108, UVA Judge 10142.

2

Algorithm Analysis

Algorithms are the most important and durable part of computer science because

they can be studied in a language- and machine-independent way. This means that

we need techniques that enable us to compare the eﬃciency of algorithms without

implementing them. Our two most important tools are (1) the RAM model of

computation and (2) the asymptotic analysis of worst-case complexity.

Assessing algorithmic performance makes use of the “big Oh” notation that,

proves essential to compare algorithms and design more eﬃcient ones. While the

hopelessly practical person may blanch at the notion of theoretical analysis, we

present the material because it really is useful in thinking about algorithms.

This method of keeping score will be the most mathematically demanding part

of this book. But once you understand the intuition behind these ideas, the for-

malism becomes a lot easier to deal with.

2.1 The RAM Model of Computation

Machine-independent algorithm design depends upon a hypothetical computer

called the Random Access Machine or RAM. Under this model of computation,

we are confronted with a computer where:

•Each simple operation (+, *, –, =, if, call) takes exactly one time step.

•Loops and subroutines are not considered simple operations. Instead, they

are the composition of many single-step operations. It makes no sense for

sort to be a single-step operation, since sorting 1,000,000 items will certainly

take much longer than sorting 10 items. The time it takes to run through a

loop or execute a subprogram depends upon the number of loop iterations or

the speciﬁc nature of the subprogram.

S.S. Skiena, The Algorithm Design Manual, 2nd ed., DOI: 10.1007/978-1-84800-070-4 2,

c

Springer-Verlag London Limited 2008

32 2. ALGORITHM ANALYSIS

•Each memory access takes exactly one time step. Further, we have as much

memory as we need. The RAM model takes no notice of whether an item is

in cache or on the disk.

Under the RAM model, we measure run time by counting up the number of

steps an algorithm takes on a given problem instance. If we assume that our RAM

executes a given number of steps per second, this operation count converts naturally

to the actual running time.

The RAM is a simple model of how computers perform. Perhaps it sounds too

simple. After all, multiplying two numbers takes more time than adding two num-

bers on most processors, which violates the ﬁrst assumption of the model. Fancy

compiler loop unrolling and hyperthreading may well violate the second assump-

tion. And certainly memory access times diﬀer greatly depending on whether data

sits in cache or on the disk. This makes us zero for three on the truth of our basic

assumptions.

And yet, despite these complaints, the RAM proves an excellent model for

understanding how an algorithm will perform on a real computer. It strikes a ﬁne

balance by capturing the essential behavior of computers while being simple to

work with. We use the RAM model because it is useful in practice.

Every model has a size range over which it is useful. Take, for example, the

model that the Earth is ﬂat. You might argue that this is a bad model, since it

has been fairly well established that the Earth is in fact round. But, when laying

the foundation of a house, the ﬂat Earth model is suﬃciently accurate that it can

be reliably used. It is so much easier to manipulate a ﬂat-Earth model that it is

inconceivable that you would try to think spherically when you don’t have to.1

The same situation is true with the RAM model of computation. We make

an abstraction that is generally very useful. It is quite diﬃcult to design an algo-

rithm such that the RAM model gives you substantially misleading results. The

robustness of the RAM enables us to analyze algorithms in a machine-independent

way.

Take-Home Lesson: Algorithms can be understood and studied in a language-

and machine-independent manner.

2.1.1 Best, Worst, and Average-Case Complexity

Using the RAM model of computation, we can count how many steps our algorithm

takes on any given input instance by executing it. However, to understand how good

or bad an algorithm is in general, we must know how it works over all instances.

To understand the notions of the best, worst, and average-case complexity,

think about running an algorithm over all possible instances of data that can be

1The Earth is not completely spherical either, but a spherical Earth provides a useful model for such things

as longitude and latitude.

2.1 THE RAM MODEL OF COMPUTATION 33

1 2 3 4 . . . . . . N

.

.

Problem Size

Best Case

Average Case

Worst Caseof Steps

Number

Figure 2.1: Best, worst, and average-case complexity

fed to it. For the problem of sorting, the set of possible input instances consists of

all possible arrangements of nkeys, over all possible values of n. We can represent

each input instance as a point on a graph (shown in Figure 2.1) where the x-axis

represents the size of the input problem (for sorting, the number of items to sort),

and the y-axis denotes the number of steps taken by the algorithm in this instance.

These points naturally align themselves into columns, because only integers

represent possible input size (e.g., it makes no sense to sort 10.57 items). We can

deﬁne three interesting functions over the plot of these points:

•The worst-case complexity of the algorithm is the function deﬁned by the

maximum number of steps taken in any instance of size n. This represents

the curve passing through the highest point in each column.

•The best-case complexity of the algorithm is the function deﬁned by the min-

imum number of steps taken in any instance of size n. This represents the

curve passing through the lowest point of each column.

•The average-case complexity of the algorithm, which is the function deﬁned

by the average number of steps over all instances of size n.

The worst-case complexity proves to be most useful of these three measures in

practice. Many people ﬁnd this counterintuitive. To illustrate why, try to project

what will happen if you bring ndollars into a casino to gamble. The best case,

that you walk out owning the place, is possible but so unlikely that you should not

even think about it. The worst case, that you lose all ndollars, is easy to calculate

and distressingly likely to happen. The average case, that the typical bettor loses

87.32% of the money that he brings to the casino, is diﬃcult to establish and its

meaning subject to debate. What exactly does average mean? Stupid people lose

34 2. ALGORITHM ANALYSIS

more than smart people, so are you smarter or stupider than the average person,

and by how much? Card counters at blackjack do better on average than customers

who accept three or more free drinks. We avoid all these complexities and obtain

a very useful result by just considering the worst case.

The important thing to realize is that each of these time complexities deﬁne a

numerical function, representing time versus problem size. These functions are as

well deﬁned as any other numerical function, be it y=x2−2x+ 1 or the price

of Google stock as a function of time. But time complexities are such complicated

functions that we must simplify them to work with them. For this, we need the

“Big Oh” notation.

2.2 The Big Oh Notation

The best, worst, and average-case time complexities for any given algorithm are

numerical functions over the size of possible problem instances. However, it is very

diﬃcult to work precisely with these functions, because they tend to:

•Have too many bumps – An algorithm such as binary search typically runs

a bit faster for arrays of size exactly n=2

k−1 (where kis an integer),

because the array partitions work out nicely. This detail is not particularly

signiﬁcant, but it warns us that the exact time complexity function for any

algorithm is liable to be very complicated, with little up and down bumps as

shown in Figure 2.2.

•Require too much detail to specify precisely – Counting the exact number

of RAM instructions executed in the worst case requires the algorithm be

speciﬁed to the detail of a complete computer program. Further, the precise

answer depends upon uninteresting coding details (e.g., did he use a case

statement or nested ifs?). Performing a precise worst-case analysis like

T(n) = 12754n2+ 4353n+ 834 lg2n+ 13546

would clearly be very diﬃcult work, but provides us little extra information

than the observation that “the time grows quadratically with n.”

It proves to be much easier to talk in terms of simple upper and lower bounds

of time-complexity functions using the Big Oh notation. The Big Oh simpliﬁes

our analysis by ignoring levels of detail that do not impact our comparison of

algorithms.

The Big Oh notation ignores the diﬀerence between multiplicative constants.

The functions f(n)=2nand g(n)=nare identical in Big Oh analysis. This

makes sense given our application. Suppose a given algorithm in (say) C language

ran twice as fast as one with the same algorithm written in Java. This multiplicative

2.2 THE BIG OH NOTATION 35

0

n

f(n)

n

.......

4321

lower bound

upper bound

Figure 2.2: Upper and lower bounds valid for n>n

0smooth out the behavior of complex

functions

factor of two tells us nothing about the algorithm itself, since both programs imple-

ment exactly the same algorithm. We ignore such constant factors when comparing

two algorithms.

The formal deﬁnitions associated with the Big Oh notation are as follows:

•f(n)=O(g(n)) means c·g(n)isanupper bound on f(n). Thus there exists

some constant csuch that f(n) is always ≤c·g(n), for large enough n(i.e. ,

n≥n0for some constant n0).

•f(n)=Ω(g(n)) means c·g(n)isalower bound on f(n). Thus there exists

some constant csuch that f(n) is always ≥c·g(n), for all n≥n0.

•f(n)=Θ(g(n)) means c1·g(n) is an upper bound on f(n)andc2·g(n)is

a lower bound on f(n), for all n≥n0. Thus there exist constants c1and c2

such that f(n)≤c1·g(n)andf(n)≥c2·g(n). This means that g(n) provides

a nice, tight bound on f(n).

Got it? These deﬁnitions are illustrated in Figure 2.3. Each of these deﬁnitions

assumes a constant n0beyond which they are always satisﬁed. We are not concerned

about small values of n(i.e. , anything to the left of n0). After all, we don’t really

care whether one sorting algorithm sorts six items faster than another, but seek

which algorithm proves faster when sorting 10,000 or 1,000,000 items. The Big Oh

notation enables us to ignore details and focus on the big picture.

Take-Home Lesson: The Big Oh notation and worst-case analysis are tools

that greatly simplify our ability to compare the eﬃciency of algorithms.

Make sure you understand this notation by working through the following ex-

amples. We choose certain constants (cand n0) in the explanations below because

36 2. ALGORITHM ANALYSIS

(c)

f(n)

c2*g(n)

n

n0

c1*g(n)

c*g(n)

f(n)

n

n0

f(n)

c*g(n)

n

n0

(b)

(a)

Figure 2.3: Illustrating the big (a) O, (b) Ω, and (c) Θ notations

they work and make a point, but other pairs of constants will do exactly the same

job. You are free to choose any constants that maintain the same inequality—ideally

constants that make it obvious that the inequality holds:

3n2−100n+6=O(n2), because I choose c=3and3n2>3n2−100n+6;

3n2−100n+6=O(n3), because I choose c=1andn3>3n2−100n+6whenn>3;

3n2−100n+6=O(n), because for any cIchoosec×n<3n2when n>c;

3n2−100n+6=Ω(n2), because I choose c=2and2n2<3n2−100n+6when n>100;

3n2−100n+6=Ω(n3), because I choose c=3and3n2−100n+6<n

3when n>3;

3n2−100n+6=Ω(n), because for any cIchoosecn < 3n2−100n+6 whenn>100c;

3n2−100n+6=Θ(n2), because both Oand Ω apply;

3n2−100n+6=Θ(n3), because only Oapplies;

3n2−100n+6=Θ(n), because only Ω applies.

The Big Oh notation provides for a rough notion of equality when comparing

functions. It is somewhat jarring to see an expression like n2=O(n3), but its

meaning can always be resolved by going back to the deﬁnitions in terms of upper

and lower bounds. It is perhaps most instructive to read the “=” here as meaning

one of the functions that are. Clearly, n2is one of functions that are O(n3).

2.3 GROWTH RATES AND DOMINANCE RELATIONS 37

Stop and Think: Back to the Deﬁnition

Problem: Is 2n+1 =Θ(2

n)?

Solution: Designing novel algorithms requires cleverness and inspiration. However,

applying the Big Oh notation is best done by swallowing any creative instincts

you may have. All Big Oh problems can be correctly solved by going back to the

deﬁnition and working with that.

•Is 2n+1 =O(2n)?Well, f(n)=O(g(n)) iﬀ (if and only if) there exists a

constant csuch that for all suﬃciently large nf(n)≤c·g(n). Is there? The

key observation is that 2n+1 =2·2n,so2·2n≤c·2nfor any c≥2.

•Is 2n+1 = Ω(2n)?Go back to the deﬁnition. f(n)=Ω(g(n)) iﬀ there exists

a constant c>0 such that for all suﬃciently large nf(n)≥c·g(n). This

would be satisﬁed for any 0 <c≤2. Together the Big Oh and Ω bounds

imply 2n+1 =Θ(2

n)

Stop and Think: Hip to the Squares?

Problem: Is (x+y)2=O(x2+y2).

Solution: Working with the Big Oh means going back to the deﬁnition at the

slightest sign of confusion. By deﬁnition, this expression is valid iﬀ we can ﬁnd

some csuch that (x+y)2≤c(x2+y2).

My ﬁrst move would be to expand the left side of the equation, i.e. (x+y)2=

x2+2xy+y2. If the middle 2xy term wasn’t there, the inequality would clearly hold

for any c>1. But it is there, so we need to relate the 2xy to x2+y2. What if x≤y?

Then 2xy ≤2y2≤2(x2+y2). What if x≥y? Then 2xy ≤2x2≤2(x2+y2). Either

way, we now can bound this middle term by two times the right-side function. This

means that (x+y)2≤3(x2+y2), and so the result holds.

2.3 Growth Rates and Dominance Relations

With the Big Oh notation, we cavalierly discard the multiplicative constants. Thus,

the functions f(n)=0.001n2and g(n) = 1000n2are treated identically, even

though g(n) is a million times larger than f(n) for all values of n.

38 2. ALGORITHM ANALYSIS

nf(n)lg n n n lg n n22nn!

10 0.003 μs0.01 μs0.033 μs0.1 μs 1 μs3.63 ms

20 0.004 μs0.02 μs0.086 μs0.4 μs1ms 77.1 years

30 0.005 μs0.03 μs0.147 μs0.9 μs1sec 8.4×1015 yrs

40 0.005 μs0.04 μs0.213 μs1.6 μs18.3 min

50 0.006 μs0.05 μs0.282 μs2.5 μs13 days

100 0.007 μs0.1 μs0.644 μs10 μs 4 ×1013 yrs

1,000 0.010 μs1.00 μs9.966 μs1ms

10,000 0.013 μs10 μs130 μs100 ms

100,000 0.017 μs0.10 ms 1.67 ms 10 sec

1,000,000 0.020 μs1ms 19.93 ms 16.7 min

10,000,000 0.023 μs0.01 sec 0.23 sec 1.16 days

100,000,000 0.027 μs0.10 sec 2.66 sec 115.7 days

1,000,000,000 0.030 μs1sec 29.90 sec 31.7 years

Figure 2.4: Growth rates of common functions measured in nanoseconds

The reason why we are content with coarse Big Oh analysis is provided by

Figure 2.4, which shows the growth rate of several common time analysis functions.

In particular, it shows how long algorithms that use f(n) operations take to run

on a fast computer, where each operation takes one nanosecond (10−9seconds).

The following conclusions can be drawn from this table:

•All such algorithms take roughly the same time for n= 10.

•Any algorithm with n! running time becomes useless for n≥20.

•Algorithms whose running time is 2nhave a greater operating range, but

become impractical for n>40.

•Quadratic-time algorithms whose running time is n2remain usable up to

about n=10,000, but quickly deteriorate with larger inputs. They are likely

to be hopeless for n>1,000,000.

•Linear-time and nlg nalgorithms remain practical on inputs of one billion

items.

•An O(lg n) algorithm hardly breaks a sweat for any imaginable value of n.

The bottom line is that even ignoring constant factors, we get an excellent idea

of whether a given algorithm is appropriate for a problem of a given size. An algo-

rithm whose running time is f(n)=n3seconds will beat one whose running time is

g(n) = 1,000,000 ·n2seconds only when n<1,000,000. Such enormous diﬀerences

in constant factors between algorithms occur far less frequently in practice than

large problems do.

2.3 GROWTH RATES AND DOMINANCE RELATIONS 39

2.3.1 Dominance Relations

The Big Oh notation groups functions into a set of classes, such that all the func-

tions in a particular class are equivalent with respect to the Big Oh. Functions

f(n)=0.34nand g(n) = 234,234nbelong in the same class, namely those that are

order Θ(n). Further, when two functions fand gbelong to diﬀerent classes, they are

diﬀerent with respect to our notation. Either f(n)=O(g(n)) or g(n)=O(f(n)),

but not both.

We say that a faster-growing function dominates a slower-growing one, just as

a faster-growing country eventually comes to dominate the laggard. When fand

gbelong to diﬀerent classes (i.e., f(n)=Θ(g(n))), we say gdominates fwhen

f(n)=O(g(n)), sometimes written gf.

The good news is that only a few function classes tend to occur in the course

of basic algorithm analysis. These suﬃce to cover almost all the algorithms we will

discuss in this text, and are listed in order of increasing dominance:

•Constant functions, f(n) = 1 – Such functions might measure the cost of

adding two numbers, printing out “The Star Spangled Banner,” or the growth

realized by functions such as f(n) = min(n, 100). In the big picture, there is

no dependence on the parameter n.

•Logarithmic functions, f(n) = log n– Logarithmic time-complexity shows up

in algorithms such as binary search. Such functions grow quite slowly as n

gets big, but faster than the constant function (which is standing still, after

all). Logarithms will be discussed in more detail in Section 2.6 (page 46)

•Linear functions, f(n)=n– Such functions measure the cost of looking at

each item once (or twice, or ten times) in an n-element array, say to identify

the biggest item, the smallest item, or compute the average value.

•Superlinear functions, f(n)=nlg n– This important class of functions arises

in such algorithms as Quicksort and Mergesort. They grow just a little faster

than linear (see Figure 2.4), just enough to be a diﬀerent dominance class.

•Quadratic functions, f(n)=n2– Such functions measure the cost of looking

at most or all pairs of items in an n-element universe. This arises in algorithms

such as insertion sort and selection sort.

•Cubic functions, f(n)=n3– Such functions enumerate through all triples of

items in an n-element universe. These also arise in certain dynamic program-

ming algorithms developed in Chapter 8.

•Exponential functions, f(n)=cnforagivenconstantc>1 – Functions like

2narise when enumerating all subsets of nitems. As we have seen, exponential

algorithms become useless fast, but not as fast as. . .

•Factorial functions, f(n)=n! – Functions like n! arise when generating all

permutations or orderings of nitems.

40 2. ALGORITHM ANALYSIS

The intricacies of dominance relations will be futher discussed in Section 2.9.2

(page 56). However, all you really need to understand is that:

n!2nn3n2nlog nnlog n1

Take-Home Lesson: Although esoteric functions arise in advanced algorithm

analysis, a small variety of time complexities suﬃce and account for most

algorithms that are widely used in practice.

2.4 Working with the Big Oh

You learned how to do simpliﬁcations of algebraic expressions back in high school.

Working with the Big Oh requires dusting oﬀ these tools. Most of what you learned

there still holds in working with the Big Oh, but not everything.

2.4.1 Adding Functions

The sum of two functions is governed by the dominant one, namely:

O(f(n)) + O(g(n)) →O(max(f(n),g(n)))

Ω(f(n)) + Ω(g(n)) →Ω(max(f(n),g(n)))

Θ(f(n)) + Θ(g(n)) →Θ(max(f(n),g(n)))

This is very useful in simplifying expressions, since it implies that n3+n2+

n+1=O(n3). Everything is small potatoes besides the dominant term.

The intuition is as follows. At least half the bulk of f(n)+g(n) must come

from the larger value. The dominant function will, by deﬁnition, provide the larger

value as n→∞. Thus, dropping the smaller function from consideration reduces

the value by at most a factor of 1/2, which is just a multiplicative constant. Suppose

f(n)=O(n2)andg(n)=O(n2). This implies that f(n)+g(n)=O(n2) as well.

2.4.2 Multiplying Functions

Multiplication is like repeated addition. Consider multiplication by any constant

c>0, be it 1.02 or 1,000,000. Multiplying a function by a constant can not aﬀect

its asymptotic behavior, because we can multiply the bounding constants in the

Big Oh analysis of c·f(n)by1/c to give appropriate constants for the Big Oh

analysis of f(n). Thus:

O(c·f(n)) →O(f(n))

Ω(c·f(n)) →Ω(f(n))

2.5 REASONING ABOUT EFFICIENCY 41

Θ(c·f(n)) →Θ(f(n))

Of course, cmust be strictly positive (i.e., c>0) to avoid any funny business,

since we can wipe out even the fastest growing function by multiplying it by zero.

On the other hand, when two functions in a product are increasing, both are

important. The function O(n! log n) dominates n!justasmuchaslogndominates

1. In general,

O(f(n)) ∗O(g(n)) →O(f(n)∗g(n))

Ω(f(n)) ∗Ω(g(n)) →Ω(f(n)∗g(n))

Θ(f(n)) ∗Θ(g(n)) →Θ(f(n)∗g(n))

Stop and Think: Transitive Experience

Problem: Show that Big Oh relationships are transitive. That is, if f(n)=O(g(n))

and g(n)=O(h(n)), then f(n)=O(h(n)).

Solution: We always go back to the deﬁnition when working with the Big Oh. What

we need to show here is that f(n)≤c3h(n)forn>n

3given that f(n)≤c1g(n)and

g(n)≤c2h(n), for n>n

1and n>n

2, respectively. Cascading these inequalities,

we get that

f(n)≤c1g(n)≤c1c2h(n)

for n>n

3= max(n1,n

2).

2.5 Reasoning About Eﬃciency

Gross reasoning about an algorithm’s running time of is usually easy given a precise

written description of the algorithm. In this section, I will work through several

examples, perhaps in greater detail than necessary.

2.5.1 Selection Sort

Here we analyze the selection sort algorithm, which repeatedly identiﬁes the small-

est remaining unsorted element and puts it at the end of the sorted portion of the

array. An animation of selection sort in action appears in Figure 2.5, and the code

is shown below:

42 2. ALGORITHM ANALYSIS

S E L E C T I O N S O R T

C E L E S T I O N S O R TC E L E S T I O N S O R T

C E L E S T I O N S O R T

C E E L S T I O N S O R TC E E L S T I O N S O R T

C E E I S T L O N S O R T

C E E I L T S O N S O R T

C E E I L N S O T S O R TC E E I L N S O T S O R T

C E E I L N O S T S O R TC E E I L N O S T S O R T

C E E I L N O O T S S R T

C E E I L N O O R S S T TC E E I L N O O R S S T T

C E E I L N O O R S S T T

C E E I L N O O R S S T TC E E I L N O O R S S T T

C E E I L N O O R S S T TC E E I L N O O R S S T T

C

E E I L

N

O

O

R

S

S

T T

Figure 2.5: Animation of selection sort in action

selection_sort(int s[], int n)

{

int i,j; /* counters */

int min; /* index of minimum */

for (i=0; i<n; i++) {

min=i;

for (j=i+1; j<n; j++)

if (s[j] < s[min]) min=j;

swap(&s[i],&s[min]);

}

}

The outer loop goes around ntimes. The nested inner loop goes around n−i−1

times, where iis the index of the outer loop. The exact number of times the if

statement is executed is given by:

S(n)=

n−1

i=0

n−1

j=i+1

1=

n−1

i=0

n−i−1

What this sum is doing is adding up the integers in decreasing order starting from

n−1, i.e.

S(n)=(n−1) + (n−2) + (n−3) + ...+2+1

How can we reason about such a formula? We must solve the summation formula

using the techniques of Section 1.3.5 (page 17) to get an exact value. But, with

the Big Oh we are only interested in the order of the expression. One way to think

about it is that we are adding up n−1 terms, whose average value is about n/2.

This yields S(n)≈n(n−1)/2.

2.5 REASONING ABOUT EFFICIENCY 43

Another way to think about it is in terms of upper and lower bounds. We have

nterms at most, each of which is at most n−1. Thus, S(n)≤n(n−1) = O(n2).

We have n/2 terms each that are bigger than n/2. Thus S(n)=≥(n/2) ×(n/2) =

Ω(n2). Together, this tells us that the running time is Θ(n2), meaning that selection

sort is quadratic.

2.5.2 Insertion Sort

A basic rule of thumb in Big Oh analysis is that worst-case running time follows

from multiplying the largest number of times each nested loop can iterate. Consider

the insertion sort algorithm presented on page 4, whose inner loops are repeated

here:

for (i=1; i<n; i++) {

j=i;

while ((j>0) && (s[j] < s[j-1])) {

swap(&s[j],&s[j-1]);

j = j-1;

}

}

How often does the inner while loop iterate? This is tricky because there are

two diﬀerent stopping conditions: one to prevent us from running oﬀ the bounds

of the array (j>0) and the other to mark when the element ﬁnds its proper place

in sorted order (s[j]<s[j−1]). Since worst-case analysis seeks an upper bound

on the running time, we ignore the early termination and assume that this loop

always goes around itimes. In fact, we can assume it always goes around ntimes

since i<n. Since the outer loop goes around ntimes, insertion sort must be a

quadratic-time algorithm, i.e. O(n2).

This crude “round it up” analysis always does the job, in that the Big Oh

running time bound you get will always be correct. Occasionally, it might be too

generous, meaning the actual worst case time might be of a lower order than implied

by such analysis. Still, I strongly encourage this kind of reasoning as a basis for

simple algorithm analysis.

2.5.3 String Pattern Matching

Pattern matching is the most fundamental algorithmic operation on text strings.

This algorithm implements the ﬁnd command available in any web browser or text

editor:

Problem: Substring Pattern Matching

Input: A text string tand a pattern string p.

Output: Does tcontain the pattern pas a substring, and if so where?

44 2. ALGORITHM ANALYSIS

a

a

b

a

b

b

a

a

a b b a

a b b

a b

Figure 2.6: Searching for the substring abba in the text aababba.

Perhaps you are interested ﬁnding where “Skiena” appears in a given news

article (well, I would be interested in such a thing). This is an instance of string

pattern matching with tas the news article and p=“Skiena.”

There is a fairly straightforward algorithm for string pattern matching that

considers the possibility that pmay start at each possible position in tand then

testsifthisisso.

int findmatch(char *p, char *t)

{

int i,j; /* counters */

int m, n; /* string lengths */

m = strlen(p);

n = strlen(t);

for (i=0; i<=(n-m); i=i+1) {

j=0;

while ((j<m) && (t[i+j]==p[j]))

j = j+1;

if (j == m) return(i);

}

return(-1);

}

What is the worst-case running time of these two nested loops? The inner while

loop goes around at most mtimes, and potentially far less when the pattern match

fails. This, plus two other statements, lies within the outer for loop. The outer loop

goes around at most n−mtimes, since no complete alignment is possible once we

get too far to the right of the text. The time complexity of nested loops multiplies,

so this gives a worst-case running time of O((n−m)(m+ 2)).

We did not count the time it takes to compute the length of the strings using

the function strlen. Since the implementation of strlen is not given, we must guess

how long it should take. If we explicitly count the number of characters until we

2.5 REASONING ABOUT EFFICIENCY 45

hit the end of the string; this would take time linear in the length of the string.

This suggests that the running time should be O(n+m+(n−m)(m+ 2)).

Let’s use our knowledge of the Big Oh to simplify things. Since m+2=Θ(m),

the “+2” isn’t interesting, so we are left with O(n+m+(n−m)m). Multiplying

this out yields O(n+m+nm −m2), which still seems kind of ugly.

However, in any interesting problem we know that n≥m, since it is impossible

to have pas a substring of tfor any pattern longer than the text itself. One

consequence of this is that n+m≤2n=Θ(n). Thus our worst-case running time

simpliﬁes further to O(n+nm −m2).

Two more observations and we are done. First, note that n≤nm, since m≥1

in any interesting pattern. Thus n+nm =Θ(nm), and we can drop the additive

n, simplifying our analysis to O(nm −m2).

Finally, observe that the −m2term is negative, and thus only serves to lower

the value within. Since the Big Oh gives an upper bound, we can drop any negative

term without invalidating the upper bound. That n≥mimplies that mn ≥m2,

so the negative term is not big enough to cancel any other term which is left. Thus

we can simply express the worst-case running time of this algorithm as O(nm).

After you get enough experience, you will be able to do such an algorithm

analysis in your head without even writing the algorithm down. After all, algorithm

design for a given task involves mentally riﬂing through diﬀerent possibilities and

selecting the best approach. This kind of ﬂuency comes with practice, but if you

are confused about why a given algorithm runs in O(f(n)) time, start by writing

it out carefully and then employ the reasoning we used in this section.

2.5.4 Matrix Multiplication

Nested summations often arise in the analysis of algorithms with nested loops.

Consider the problem of matrix multiplication:

Problem: Matrix Multiplication

Input: Two matrices, A(of dimension x×y)andB(dimension y×z).

Output: An x×zmatrix Cwhere C[i][j] is the dot product of the ith row of A

and the jth column of B.

Matrix multiplication is a fundamental operation in linear algebra, presented

with an example in catalog in Section 13.3 (page 401). That said, the elementary

algorithm for matrix multiplication is implemented as a tight product of three

nested loops:

for (i=1; i<=x; i++)

for (j=1; j<=y; j++) {

C[i][j] = 0;

for (k=1; k<=z; k++)

C[i][j] += A[i][k] * B[k][j];

}

46 2. ALGORITHM ANALYSIS

How can we analyze the time complexity of this algorithm? The number of

multiplications M(x, y, z) is given by the following summation:

M(x, y, z)=

x

i=1

y

j=1

z

k=1

1

Sums get evaluated from the right inward. The sum of zones is z,so

M(x, y, z)=

x

i=1

y

j=1

z

The sum of yzs is just as simple, yz,so

M(x, y, z)=

x

i=1

yz

Finally, the sum of xyzsisxyz.

Thus the running of this matrix multiplication algorithm is O(xyz). If we con-

sider the common case where all three dimensions are the same, this becomes

O(n3)—i.e. , a cubic algorithm.

2.6 Logarithms and Their Applications

Logarithm is an anagram of algorithm, but that’s not why we need to know what

logarithms are. You’ve seen the button on your calculator but may have forgotten

why it is there. A logarithm is simply an inverse exponential function. Saying bx=y

is equivalent to saying that x= logby. Further, this deﬁnition is the same as saying

blogby=y.

Exponential functions grow at a distressingly fast rate, as anyone who has

ever tried to pay oﬀ a credit card balance understands. Thus, inverse exponen-

tial functions—i.e. logarithms—grow refreshingly slowly. Logarithms arise in any

process where things are repeatedly halved. We now look at several examples.

2.6.1 Logarithms and Binary Search

Binary search is a good example of an O(log n) algorithm. To locate a particular

person pin a telephone book containing nnames, you start by comparing pagainst

the middle, or (n/2)nd name, say Monroe, Marilyn. Regardless of whether pbelongs

before this middle name (Dean, James) or after it (Presley, Elvis), after only one

comparison you can discard one half of all the names in the book. The number of

steps the algorithm takes equals the number of times we can halve nuntil only one

name is left. By deﬁnition, this is exactly log2n. Thus, twenty comparisons suﬃce

to ﬁnd any name in the million-name Manhattan phone book!

Binary search is one of the most powerful ideas in algorithm design. This power

becomes apparent if we imagine being forced to live in a world with only unsorted

2.6 LOGARITHMS AND THEIR APPLICATIONS 47

Figure 2.7: A height htree with dchildren per node as dhleaves. Here h=2andd=3

telephone books. Figure 2.4 shows that O(log n) algorithms are fast enough to be

used on problem instances of essentially unlimited size.

2.6.2 Logarithms and Trees

A binary tree of height 1 can have up to 2 leaf nodes, while a tree of height two

can have up to four leaves. What is the height hof a rooted binary tree with nleaf

nodes? Note that the number of leaves doubles every time we increase the height

by one. To account for nleaves, n=2

hwhich implies that h= log2n.

What if we generalize to trees that have dchildren, where d= 2 for the case

of binary trees? A tree of height 1 can have up to dleaf nodes, while one of height

two can have up to d2leaves. The number of possible leaves multiplies by devery

time we increase the height by one, so to account for nleaves, n=dhwhich implies

that h= logdn, as shown in Figure 2.7.

The punch line is that very short trees can have very many leaves, which is

the main reason why binary trees prove fundamental to the design of fast data

structures.

2.6.3 Logarithms and Bits

There are two bit patterns of length 1 (0 and 1) and four of length 2 (00, 01, 10, and

11). How many bits wdo we need to represent any one of ndiﬀerent possibilities,

be it one of nitems or the integers from 1 to n?

The key observation is that there must be at least ndiﬀerent bit patterns of

length w. Since the number of diﬀerent bit patterns doubles as you add each bit,

we need at least wbits where 2w=n—i.e., we need w= log2nbits.

2.6.4 Logarithms and Multiplication

Logarithms were particularly important in the days before pocket calculators. They

provided the easiest way to multiply big numbers by hand, either implicitly using

a slide rule or explicitly by using a book of logarithms.

48 2. ALGORITHM ANALYSIS

Logarithms are still useful for multiplication, particularly for exponentiation.

Recall that loga(xy) = loga(x) + loga(y); i.e. , the log of a product is the sum of

the logs. A direct consequence of this is

loganb=b·logan

So how can we compute abfor any aand busing the exp(x) and ln(x) functions

on your calculator, where exp(x)= exand ln(x) = loge(x)? We know

ab= exp(ln(ab)) = exp(bln a)

so the problem is reduced to one multiplication plus one call to each of these

functions.

2.6.5 Fast Exponentiation

Suppose that we need to exactly compute the value of anfor some reasonably

large n. Such problems occur in primality testing for cryptography, as discussed in

Section 13.8 (page 420). Issues of numerical precision prevent us from applying the

formula above.

The simplest algorithm performs n−1 multiplications, by computing a×a×

...×a. However, we can do better by observing that n=n/2+n/2.Ifnis

even, then an=(an/2)2.Ifnis odd, then an=a(an/2)2. In either case, we have

halved the size of our exponent at the cost of, at most, two multiplications, so

O(lg n) multiplications suﬃce to compute the ﬁnal value.

function power(a, n)

if (n= 0) return(1)

x= power(a, n/2)

if (nis even) then return(x2)

else return(a×x2)

This simple algorithm illustrates an important principle of divide and conquer.

It always pays to divide a job as evenly as possible. This principle applies to real

life as well. When nis not a power of two, the problem cannot always be divided

perfectly evenly, but a diﬀerence of one element between the two sides cannot cause

any serious imbalance.

2.6.6 Logarithms and Summations

The Harmonic numbers arise as a special case of arithmetic progression, namely

H(n)=S(n, −1). They reﬂect the sum of the progression of simple reciprocals,

namely,

H(n)=

n

i=1

1/i ∼ln n

2.6 LOGARITHMS AND THEIR APPLICATIONS 49

Loss (apply the greatest) Increase in level

(A) $2,000 or less no increase

(B) More than $2,000 add 1

(C) More than $5,000 add 2

(D) More than $10,000 add 3

(E) More than $20,000 add 4

(F) More than $40,000 add 5

(G) More than $70,000 add 6

(H) More than $120,000 add 7

(I) More than $200,000 add 8

(J) More than $350,000 add 9

(K) More than $500,000 add 10

(L) More than $800,000 add 11

(M) More than $1,500,000 add 12

(N) More than $2,500,000 add 13

(O) More than $5,000,000 add 14

(P) More than $10,000,000 add 15

(Q) More than $20,000,000 add 16

(R) More than $40,000,000 add 17

(Q) More than $80,000,000 add 18

Figure 2.8: The Federal Sentencing Guidelines for fraud

The Harmonic numbers prove important because they usually explain “where

the log comes from” when one magically pops out from algebraic manipulation. For

example, the key to analyzing the average case complexity of Quicksort is the sum-

mation S(n)=nn

i=1 1/i. Employing the Harmonic number identity immediately

reduces this to Θ(nlog n).

2.6.7 Logarithms and Criminal Justice

Figure 2.8 will be our ﬁnal example of logarithms in action. This table appears in

the Federal Sentencing Guidelines, used by courts throughout the United States.

These guidelines are an attempt to standardize criminal sentences, so that a felon

convicted of a crime before one judge receives the same sentence that they would

before a diﬀerent judge. To accomplish this, the judges have prepared an intricate

point function to score the depravity of each crime and map it to time-to-serve.

Figure 2.8 gives the actual point function for fraud—a table mapping dollars

stolen to points. Notice that the punishment increases by one level each time the

amount of money stolen roughly doubles. That means that the level of punishment

(which maps roughly linearly to the amount of time served) grows logarithmically

with the amount of money stolen.

50 2. ALGORITHM ANALYSIS

Think for a moment about the consequences of this. Many a corrupt CEO

certainly has. It means that your total sentence grows extremely slowly with the

amount of money you steal. Knocking oﬀ ﬁve liquor stores for $10,000 each will

get you more time than embezzling $1,000,000 once. The corresponding beneﬁt of

stealing really large amounts of money is even greater. The moral of logarithmic

growth is clear: “If you are gonna do the crime, make it worth the time!”

Take-Home Lesson: Logarithms arise whenever things are repeatedly halved

or doubled.

2.7 Properties of Logarithms

As we have seen, stating bx=yis equivalent to saying that x= logby.Thebterm

is known as the base of the logarithm. Three bases are of particular importance for

mathematical and historical reasons:

•Base b=2–Thebinary logarithm, usually denoted lg x, is a base 2 logarithm.

We have seen how this base arises whenever repeated halving (i.e., binary

search) or doubling (i.e., nodes in trees) occurs. Most algorithmic applications

of logarithms imply binary logarithms.

•Base b=e–Thenatural log, usually denoted ln x,isabasee=2.71828 ...

logarithm. The inverse of ln xis the exponential function exp(x)=exon your

calculator. Thus, composing these functions gives us

exp(ln x)=x

•Base b= 10 – Less common today is the base-10 or common logarithm,

usually denoted as log x. This base was employed in slide rules and logarithm

books in the days before pocket calculators.

We have already seen one important property of logarithms, namely that

loga(xy) = loga(x) + loga(y)

The other important fact to remember is that it is easy to convert a logarithm

from one base to another. This is a consequence of the formula:

logab=logcb

logca

Thus, changing the base of log bfrom base-ato base-csimply involves dividing by

logca. It is easy to convert a common log function to a natural log function, and

vice versa.

Two implications of these properties of logarithms are important to appreciate

from an algorithmic perspective:

2.8 WAR STORY: MYSTERY OF THE PYRAMIDS 51

•The base of the logarithm has no real impact on the growth rate- Compare

the following three values: log2(1,000,000) = 19.9316, log3(1,000,000) =

12.5754, and log100(1,000,000) = 3. A big change in the base of the logarithm

produces little diﬀerence in the value of the log. Changing the base of the

log from ato cinvolves dividing by logca. This conversion factor is lost to

the Big Oh notation whenever aand care constants. Thus we are usually

justiﬁed in ignoring the base of the logarithm when analyzing algorithms.

•Logarithms cut any function down to size- The growth rate of the logarithm

of any polynomial function is O(lg n). This follows because

loganb=b·logan

The power of binary search on a wide range of problems is a consequence

of this observation. Note that doing a binary search on a sorted array of

n2things requires only twice as many comparisons as a binary search on n

things.

Logarithms eﬃciently cut any function down to size. It is hard to do arith-

metic on factorials except for logarithms, since

n!=Π

n

i=1i→log n!=

n

i=1

log i=Θ(nlog n)

provides another way for logarithms to pop up in algorithm analysis.

StopandThink:ImportanceofanEvenSplit

Problem: How many queries does binary search take on the million-name Manhat-

tan phone book if each split was 1/3 to 2/3 instead of 1/2 to 1/2?

Solution: When performing binary searches in a telephone book, how important

is it that each query split the book exactly in half? Not much. For the Manhattan

telephone book, we now use log3/2(1,000,000) ≈35 queries in the worst case, not

a signiﬁcant change from log2(1,000,000) ≈20. The power of binary search comes

from its logarithmic complexity, not the base of the log.

2.8 War Story: Mystery of the Pyramids

That look in his eyes should have warned me even before he started talking.

“We want to use a parallel supercomputer for a numerical calculation up to

1,000,000,000, but we need a faster algorithm to do it.”

52 2. ALGORITHM ANALYSIS

I’d seen that distant look before. Eyes dulled from too much exposure to the raw

horsepower of supercomputers—machines so fast that brute force seemed to elim-

inate the need for clever algorithms; at least until the problems got hard enough.

“I am working with a Nobel prize winner to use a computer on a famous problem

in number theory. Are you familiar with Waring’s problem?”

I knew some number theory. “Sure. Waring’s problem asks whether every integer

can be expressed at least one way as the sum of at most four integer squares. For

example, 78 = 82+3

2+2

2+1

2=7

2+5

2+2

2. I remember proving that four

squares suﬃce to represent any integer in my undergraduate number theory class.

Yes, it’s a famous problem but one that got solved about 200 years ago.”

“No, we are interested in a diﬀerent version of Waring’s problem. A pyramidal

number is a number of the form (m3−m)/6, for m≥2. Thus the ﬁrst several

pyramidal numbers are 1, 4, 10, 20, 35, 56, 84, 120, and 165. The conjecture since

1928 is that every integer can be represented by the sum of at most ﬁve such

pyramidal numbers. We want to use a supercomputer to prove this conjecture on

all numbers from 1 to 1,000,000,000.”

“Doing a billion of anything will take a substantial amount of time,” I warned.

“The time you spend to compute the minimum representation of each number will

be critical, because you are going to do it one billion times. Have you thought

about what kind of an algorithm you are going to use?”

“We have already written our program and run it on a parallel supercomputer.

It works very fast on smaller numbers. Still, it takes much too much time as soon

as we get to 100,000 or so.”

Terriﬁc, I thought. Our supercomputer junkie had discovered asymptotic

growth. No doubt his algorithm ran in something like quadratic time, and he got

burned as soon as ngot large.

“We need a faster program in order to get to one billion. Can you help us? Of

course, we can run it on our parallel supercomputer when you are ready.”

I am a sucker for this kind of challenge, ﬁnding fast algorithms to speed up

programs. I agreed to think about it and got down to work.

I started by looking at the program that the other guy had written. He had

built an array of all the Θ(n1/3) pyramidal numbers from 1 to ninclusive.2To

test each number kin this range, he did a brute force test to establish whether it

was the sum of two pyramidal numbers. If not, the program tested whether it was

the sum of three of them, then four, and ﬁnally ﬁve, until it ﬁrst got an answer.

About 45% of the integers are expressible as the sum of three pyramidal numbers.

Most of the remaining 55% require the sum of four, and usually each of these can

be represented in many diﬀerent ways. Only 241 integers are known to require the

sum of ﬁve pyramidal numbers, the largest being 343,867. For about half of the n

numbers, this algorithm presumably went through all of the three-tests and at least

2Why n1/3? Recall that pyramidal numbers are of the form (m3−m)/6. The largest msuch that the

resulting number is at most nis roughly 3

√6n,sothereareΘ(n1/3) such numbers.

2.8 WAR STORY: MYSTERY OF THE PYRAMIDS 53

some of the four-tests before terminating. Thus, the total time for this algorithm

would be at least O(n×(n1/3)3)=O(n2) time, where n= 1,000,000,000. No

wonder his program cried “Uncle.”

Anything that was going to do signiﬁcantly better on a problem this large had to

avoid explicitly testing all triples. For each value of k, we were seeking the smallest

set of pyramidal numbers that add up to exactly to k. This problem is called the

knapsack problem, and is discussed in Section 13.10 (page 427). In our case, the

weights are the set of pyramidal numbers no greater than n, with an additional

constraint that the knapsack holds exactly kitems.

A standard approach to solving knapsack precomputes the sum of smaller sub-

sets of the items for use in computing larger subsets. If we have a table of all sums

of two numbers and want to know whether kis expressible as the sum of three

numbers, we can ask whether kis expressible as the sum of a single number plus

a number in this two-table.

Therefore I needed a table of all integers less than nthat can be expressed as

the sum of two of the 1,818 pyramidal numbers less than 1,000,000,000. There can

be at most 1,8182= 3,305,124 of them. Actually, there are only about half this

many after we eliminate duplicates and any sum bigger than our target. Building

a sorted array storing these numbers would be no big deal. Let’s call this sorted

data structure of all pair-sums the two-table.

To ﬁnd the minimum decomposition for a given k, I would ﬁrst check whether

it was one of the 1,818 pyramidal numbers. If not, I would then check whether

kwas in the sorted table of the sums of two pyramidal numbers. To see whether

kwas expressible as the sum of three such numbers, all I had to do was check

whether k−p[i] was in the two-table for 1 ≤i≤1,818. This could be done

quickly using binary search. To see whether kwas expressible as the sum of four

pyramidal numbers, I had to check whether k−two[i] was in the two-table for any

1≤i≤|two|. However, since almost every kwas expressible in many ways as the

sum of four pyramidal numbers, this test would terminate quickly, and the total

time taken would be dominated by the cost of the threes. Testing whether kwas

the sum of three pyramidal numbers would take O(n1/3lg n). Running this on each

of the nintegers gives an O(n4/3lg n) algorithm for the complete job. Comparing

this to his O(n2) algorithm for n= 1,000,000,000 suggested that my algorithm was

acool30,000 times faster than his original!

My ﬁrst attempt to code this solved up to n=1,000,000 on my ancient Sparc

ELC in about 20 minutes. From here, I experimented with diﬀerent data structures

to represent the sets of numbers and diﬀerent algorithms to search these tables. I

tried using hash tables and bit vectors instead of sorted arrays, and experimented

with variants of binary search such as interpolation search (see Section 14.2 (page

441)). My reward for this work was solving up to n=1,000,000 in under three

minutes, a factor of six improvement over my original program.

With the real thinking done, I worked to tweak a little more performance out of

the program. I avoided doing a sum-of-four computation on any kwhen k−1was

54 2. ALGORITHM ANALYSIS

the sum-of-three, since 1 is a pyramidal number, saving about 10% of the total run

time using this trick alone. Finally, I got out my proﬁler and tried some low-level

tricks to squeeze a little more performance out of the code. For example, I saved

another 10% by replacing a single procedure call with in line code.

At this point, I turned the code over to the supercomputer guy. What he did

with it is a depressing tale, which is reported in Section 7.10 (page 268).

In writing up this war story, I went back to rerun my program more than

ten years later. On my desktop SunBlade 150, getting to 1,000,000 now took 27.0

seconds using the gcc compiler without turning on any compiler optimization. With

Level 4 optimization, the job ran in just 14.0 seconds—quite a tribute to the quality

of the optimizer. The run time on my desktop machine improved by a factor of

about three over the four-year period prior to my ﬁrst edition of this book, with

an additional 5.3 times over the last 11 years. These speedups are probably typical

for most desktops.

The primary lesson of this war story is to show the enormous potential for

algorithmic speedups, as opposed to the fairly limited speedup obtainable via more

expensive hardware. I sped his program up by about 30,000 times. His million-

dollar computer had 16 processors, each reportedly ﬁve times faster on integer

computations than the $3,000 machine on my desk. That gave a maximum potential

speedup of less than 100 times. Clearly, the algorithmic improvement was the big

winner here, as it is certain to be in any suﬃciently large computation.

2.9 Advanced Analysis (*)

Ideally, each of us would be ﬂuent in working with the mathematical techniques

of asymptotic analysis. And ideally, each of us would be rich and good looking as

well.

In this section I will survey the major techniques and functions employed in

advanced algorithm analysis. I consider this optional material—it will not be used

elsewhere in the textbook section of this book. That said, it will make some of the

complexity functions reported in the Hitchhiker’s Guide far less mysterious.

2.9.1 Esoteric Functions

The bread-and-butter classes of complexity functions were presented in Section

2.3.1 (page 39). More esoteric functions also make appearances in advanced algo-

rithm analysis. Although we will not see them much in this book, it is still good

business to know what they mean and where they come from:

•Inverse Ackerman’s function f(n)=α(n) – This function arises in the de-

tailed analysis of several algorithms, most notably the Union-Find data struc-

ture discussed in Section 6.1.3 (page 198).

The exact deﬁnition of this function and why it arises will not be discussed

further. It is suﬃcient to think of it as geek talk for the slowest-growing

2.9 ADVANCED ANALYSIS (*) 55

complexity function. Unlike the constant function f(n) = 1, it eventually

gets to inﬁnity as n→∞, but it certainly takes its time about it. The value

of α(n)<5 for any value of nthat can be written in this physical universe.

•f(n) = log log n– The “log log” function is just that—the logarithm of the

logarithm of n. One natural example of how it might arise is doing a binary

search on a sorted array of only lg nitems.

•f(n) = log n/ log log n– This function grows a little slower than log nbecause

it is divided by an even slower growing function.

To see where this arises, consider an n-leaf rooted tree of degree d. For binary

trees, i.e. when d= 2, the height his given

n=2

h→h=lgn

by taking the logarithm of both sides of the equation. Now consider the height

of such a tree when the degree d= log n. Then

n= (log n)h→h= log n/ log log n

•f(n) = log2n– This is the product of log functions—i.e., (log n)×(log n). It

might arise if we wanted to count the bits looked at in doing a binary search

on nitems, each of which was an integer from 1 to (say) n2. Each such integer

requires a lg(n2)=2lgnbit representation, and we look at lg nof them, for

a total of 2 lg2nbits.

The “log squared” function typically arises in the design of intricate nested

data structures, where each node in (say) a binary tree represents another

data structure, perhaps ordered on a diﬀerent key.

•f(n)=√n– The square root is not so esoteric, but represents the class of

“sublinear polynomials” since √n=n1/2. Such functions arise in building

d-dimensional grids that contain npoints. A √n×√nsquare has area n,

and an n1/3×n1/3×n1/3cube has volume n. In general, a d-dimensional

hypercube of length n1/d on each side has volume d.

•f(n)=n(1+)– Epsilon () is the mathematical symbol to denote a constant

that can be made arbitrarily small but never quite goes away.

It arises in the following way. Suppose I design an algorithm that runs in

2cn(1+1/c)time, and I get to pick whichever cI want. For c=2,thisis4n3/2

or O(n3/2). For c=3,thisis8n4/3or O(n4/3), which is better. Indeed, the

exponent keeps getting better the larger I make c.

The problem is that I cannot make carbitrarily large before the 2cterm

begins to dominate. Instead, we report this algorithm as running in O(n1+),

and leave the best value of to the beholder.

56 2. ALGORITHM ANALYSIS

2.9.2 Limits and Dominance Relations

The dominance relation between functions is a consequence of the theory of lim-

its, which you may recall from Calculus. We say that f(n)dominates g(n)if

limn→∞ g(n)/f(n)=0.

Let’s see this deﬁnition in action. Suppose f(n)=2n2and g(n)=n2. Clearly

f(n)>g(n) for all n, but it does not dominate since

lim

n→∞ g(n)/f(n) = lim

n→∞ n2/2n2= lim

n→∞ 1/2=0

This is to be expected because both functions are in the class Θ(n2). What about

f(n)=n3and g(n)=n2? Since

lim

n→∞ g(n)/f(n) = lim

n→∞ n2/n3= lim

n→∞ 1/n =0

the higher-degree polynomial dominates. This is true for any two polynomials,

namely that nadominates nbif a>bsince

lim

n→∞ nb/na= lim

n→∞ nb−a→0

Thus n1.2dominates n1.1999999.

Now consider two exponential functions, say f(n)=3

nand g(n)=2

n. Since

lim

n→∞ g(n)/f(n)=2

n/3n= lim

n→∞(2/3)n=0

the exponential with the higher base dominates.

Our ability to prove dominance relations from scratch depends upon our ability

to prove limits. Let’s look at one important pair of functions. Any polynomial (say

f(n)=n) dominates logarithmic functions (say g(n)=lgn). Since n=2

lg n,

f(n)=(2

lg n)=2

lg n

Now consider

lim

n→∞ g(n)/f(n)=lgn/2lg n

In fact, this does go to 0 as n→∞.

Take-Home Lesson: By interleaving the functions here with those of Section

2.3.1 (page 39), we see where everything ﬁts into the dominance pecking order:

n!cnn3n2n1+nlog nn√n

log2nlog nlog n/ log log nlog log nα(n)1

Chapter Notes

Most other algorithm texts devote considerably more eﬀorts to the formal analysis

of algorithms than we have here, and so we refer the theoretically-inclined reader

2.10 EXERCISES 57

elsewhere for more depth. Algorithm texts more heavily stressing analysis include

[CLRS01, KT06].

The book Concrete Mathematics by Knuth, Graham, and Patashnik [GKP89]

oﬀers an interesting and thorough presentation of mathematics for the analysis of

algorithms. Niven and Zuckerman [NZ80] is an nice introduction to number theory,

including Waring’s problem, discussed in the war story.

The notion of dominance also gives rise to the “Little Oh” notation. We say that

f(n)=o(g(n)) iﬀ g(n) dominates f(n). Among other things, the Little Oh proves

useful for asking questions. Asking for an o(n2) algorithm means you want one

that is better than quadratic in the worst case—and means you would be willing

to settle for O(n1.999 log2n).

2.10 Exercises

Program Analysis

2-1. [3] What value is returned by the following function? Express your answer as a

function of n. Give the worst-case running time using the Big Oh notation.

function mystery(n)

r:= 0

for i:= 1 to n−1do

for j:= i+1to ndo

for k:= 1 to jdo

r:= r+1

return(r)

2-2. [3] What value is returned by the following function? Express your answer as a

function of n. Give the worst-case running time using Big Oh notation.

function pesky(n)

r:= 0

for i:= 1 to ndo

for j:= 1 to ido

for k:= jto i+jdo

r:= r+1

return(r)

2-3. [5] What value is returned by the following function? Express your answer as a

function of n. Give the worst-case running time using Big Oh notation.

function prestiferous(n)

r:= 0

for i:= 1 to ndo

for j:= 1 to ido

for k:= jto i+jdo

for l:= 1 to i+j−kdo

58 2. ALGORITHM ANALYSIS

r:= r+1

return(r)

2-4. [8] What value is returned by the following function? Express your answer as a

function of n. Give the worst-case running time using Big Oh notation.

function conundrum(n)

r:= 0

for i:= 1 to ndo

for j:= i+1to ndo

for k:= i+j−1to ndo

r:= r+1

return(r)

2-5. [5] Suppose the following algorithm is used to evaluate the polynomial

p(x)=anxn+an−1xn−1+...+a1x+a0

p:= a0;

xpower := 1;

for i:= 1 to ndo

xpower := x∗xpower;

p:= p+ai∗xpower

end

(a) How many multiplications are done in the worst-case? How many additions?

(b) How many multiplications are done on the average?

(c) Can you improve this algorithm?

2-6. [3] Prove that the following algorithm for computing the maximum value in an

array A[1..n] is correct.

function max(A)

m:= A[1]

for i:= 2 to ndo

if A[i]>mthen m:= A[i]

return (m)

Big Oh

2-7. [3] True or False?

(a) Is 2n+1 =O(2n)?

(b) Is 22n=O(2n)?

2-8. [3] For each of the following pairs of functions, either f(n)isinO(g(n)), f(n)is

in Ω(g(n)), or f(n)=Θ(g(n)). Determine which relationship is correct and brieﬂy

explain why.

(a) f(n)=logn2;g(n)=logn+5

2.10 EXERCISES 59

(b) f(n)=√n;g(n)=logn2

(c) f(n)=log

2n;g(n)=logn

(d) f(n)=n;g(n)=log

2n

(e) f(n)=nlog n+n;g(n)=logn

(f) f(n) = 10; g(n)=log10

(g) f(n)=2

n;g(n)=10n2

(h) f(n)=2

n;g(n)=3

n

2-9. [3] For each of the following pairs of functions f(n)andg(n), determine whether

f(n)=O(g(n)), g(n)=O(f(n)), or both.

(a) f(n)=(n2−n)/2, g(n)=6n

(b) f(n)=n+2

√n,g(n)=n2

(c) f(n)=nlog n,g(n)=n√n/2

(d) f(n)=n+logn,g(n)=√n

(e) f(n) = 2(log n)2,g(n)=logn+1

(f) f(n)=4nlog n+n,g(n)=(n2−n)/2

2-10. [3] Prove that n3−3n2−n+1=Θ(n3).

2-11. [3] Prove that n2=O(2n).

2-12. [3] For each of the following pairs of functions f(n)andg(n), give an appropriate

positive constant csuch that f(n)≤c·g(n) for all n>1.

(a) f(n)=n2+n+1, g(n)=2n3

(b) f(n)=n√n+n2,g(n)=n2

(c) f(n)=n2−n+1, g(n)=n2/2

2-13. [3] Prove that if f1(n)=O(g1(n)) and f2(n)=O(g2(n)), then f1(n)+f2(n)=

O(g1(n)+g2(n)).

2-14. [3] Prove that if f1(N)=Ω(g1(n)) and f2(n)=Ω(g2(n)), then f1(n)+f2(n)=

Ω(g1(n)+g2(n)).

2-15. [3] Prove that if f1(n)=O(g1(n)) and f2(n)=O(g2(n)), then f1(n)·f2(n)=

O(g1(n)·g2(n))

2-16. [5] Prove for all k≥1 and all sets of constants {ak,a

k−1,...,a

1,a

0}∈R,

aknk+ak−1nk−1+.... +a1n+a0=O(nk)

2-17. [5] Show that for any real constants aand b,b>0

(n+a)b=Θ(nb)

2-18. [5] List the functions below from the lowest to the highest order. If any two or

more are of the same order, indicate which.

60 2. ALGORITHM ANALYSIS

n2nnlg nln n

n−n3+7n5lg n√ne

n

n2+lgnn

22n−1lg lg n

n3(lg n)2n!n1+εwhere 0 <ε<1

2-19. [5] List the functions below from the lowest to the highest order. If any two or

more are of the same order, indicate which.

√nn 2n

nlog nn−n3+7n5n2+logn

n2n3log n

n1

3+logn(log n)2n!

ln nn

log nlog log n

(1/3)n(3/2)n6

2-20. [5] Find two functions f(n)andg(n) that satisfy the following relationship. If no

such fand gexist, write “None.”

(a) f(n)=o(g(n)) and f(n)=Θ(g(n))

(b) f(n)=Θ(g(n)) and f(n)=o(g(n))

(c) f(n)=Θ(g(n)) and f(n)=O(g(n))

(d) f(n)=Ω(g(n)) and f(n)=O(g(n))

2-21. [5] True or False?

(a) 2n2+1=O(n2)

(b) √n=O(log n)

(c) log n=O(√n)

(d) n2(1 + √n)=O(n2log n)

(e) 3n2+√n=O(n2)

(f) √nlog n=O(n)

(g) log n=O(n−1/2)

2-22. [5] For each of the following pairs of functions f(n)andg(n), state whether f(n)=

O(g(n)), f(n)=Ω(g(n)), f(n)=Θ(g(n)), or none of the above.

(a) f(n)=n2+3n+4,g(n)=6n+7

(b) f(n)=n√n,g(n)=n2−n

(c) f(n)=2

n−n2,g(n)=n4+n2

2-23. [3] For each of these questions, brieﬂy explain your answer.

(a) If I prove that an algorithm takes O(n2) worst-case time, is it possible that it

takes O(n) on some inputs?

(b) If I prove that an algorithm takes O(n2) worst-case time, is it possible that it

takes O(n) on all inputs?

(c) If I prove that an algorithm takes Θ(n2) worst-case time, is it possible that it

takes O(n) on some inputs?

2.10 EXERCISES 61

(d) If I prove that an algorithm takes Θ(n2) worst-case time, is it possible that it

takes O(n) on all inputs?

(e) Is the function f(n)=Θ(n2), where f(n) = 100n2for even nand f(n)=

20n2−nlog2nfor odd n?

2-24. [3] For each of the following, answer yes,no,orcan’t tell. Explain your reasoning.

(a) Is 3n=O(2n)?

(b) Is log 3n=O(log 2n)?

(c) Is 3n=Ω(2

n)?

(d) Is log 3n= Ω(log 2n)?

2-25. [5] For each of the following expressions f(n) ﬁnd a simple g(n) such that f(n)=

Θ(g(n)).

(a) f(n)=n

i=1

1

i.

(b) f(n)=n

i=11

i.

(c) f(n)=n

i=1 log i.

(d) f(n)=log(n!).

2-26. [5] Place the following functions into increasing asymptotic order.

f1(n)=n2log2n,f2(n)=n(log2n)2,f3(n)=n

i=0 2i,f4(n)=log

2(n

i=0 2i).

2-27. [5] Place the following functions into increasing asymptotic order. If two or more

of the functions are of the same asymptotic order then indicate this.

f1(n)=n

i=1 √i,f2(n)=(

√n)logn,f3(n)=n√log n,f4(n)=12n3

2+4n,

2-28. [5] For each of the following expressions f(n) ﬁnd a simple g(n) such that

f(n)=Θ(g(n)). (You should be able to prove your result by exhibiting the rel-

evant parameters, but this is not required for the homework.)

(a) f(n)=n

i=1 3i4+2i3−19i+ 20.

(b) f(n)=n

i=1 3(4i) + 2(3i)−i19 + 20.

(c) f(n)=n

i=1 5i+3

2i.

2-29. [5] Which of the following are true?

(a) n

i=1 3i=Θ(3

n−1).

(b) n

i=1 3i=Θ(3

n).

(c) n

i=1 3i=Θ(3

n+1).

2-30. [5] For each of the following functions fﬁnd a simple function gsuch that f(n)=

Θ(g(n)).

(a) f1(n) = (1000)2n+4

n.

(b) f2(n)=n+nlog n+√n.

(c) f3(n)=log(n20)+(logn)10.

(d) f4(n)=(0.99)n+n100.

62 2. ALGORITHM ANALYSIS

2-31. [5] For each pair of expressions (A, B) below, indicate whether Ais O,o,Ω,ω,or

ΘofB. Note that zero, one or more of these relations may hold for a given pair;

list all correct ones.

AB

(a)n100 2n

(b)(lgn)12 √n

(c)√nn

cos(πn/8)

(d)10

n100n

(e)nlg n(lg n)n

(f)lg(n!) nlg n

Summations

2-32. [5] Prove that:

12−22+3

2−42+...+(−1)k−1k2=(−1)k−1k(k+1)/2

2-33. [5] Find an expression for the sum of the ith row of the following triangle, and

prove its correctness. Each entry is the sum of the three entries directly above it.

All non existing entries are considered 0.

1

111

12321

1367631

1 4 10 16 19 16 10 4 1

2-34. [3] Assume that Christmas has ndays. Exactly how many presents did my “true

love” send me? (Do some research if you do not understand this question.)

2-35. [5] Consider the following code fragment.

for i=1 to n do

for j=i to 2*i do

output ‘‘foobar’’

Let T(n) denote the number of times ‘foobar’ is printed as a function of n.

a. Express T(n) as a summation (actually two nested summations).

b. Simplify the summation. Show your work.

2-36. [5] Consider the following code fragment.

for i=1 to n/2 do

for j=i to n-i do

for k=1 to j do

output ‘‘foobar’’

Assume nis even. Let T(n) denote the number of times ‘foobar’ is printed as a

function of n.

(a) Express T(n) as three nested summations.

(b) Simplify the summation. Show your work.

2.10 EXERCISES 63

2-37. [6] When you ﬁrst learned to multiply numbers, you were told that x×ymeans

adding xa total of ytimes, so 5×4 = 5+5+5+5 = 20. What is the time complexity

of multiplying two n-digit numbers in base b(people work in base 10, of course,

while computers work in base 2) using the repeated addition method, as a function

of nand b. Assume that single-digit by single-digit addition or multiplication takes

O(1) time. (Hint: how big can ybe as a function of nand b?)

2-38. [6] In grade school, you learned to multiply long numbers on a digit-by-digit basis,

so that 127 ×211 = 127 ×1 + 127 ×10 + 127 ×200 = 26,397. Analyze the time

complexity of multiplying two n-digit numbers with this method as a function of

n(assume constant base size). Assume that single-digit by single-digit addition or

multiplication takes O(1) time.

Logarithms

2-39. [5] Prove the following identities on logarithms:

(a) Prove that loga(xy)=log

ax+log

ay

(b) Prove that logaxy=ylogax

(c) Prove that logax=logbx

logba

(d) Prove that xlogby=ylogbx

2-40. [3] Show that lg(n+1)=lg n+1

2-41. [3] Prove that that the binary representation of n≥1haslg2n+1bits.

2-42. [5] In one of my research papers I give a comparison-based sorting algorithm that

runs in O(nlog(√n)). Given the existence of an Ω(nlog n) lower bound for sorting,

how can this be possible?

Interview Problems

2-43. [5] You are given a set Sof nnumbers. You must pick a subset Sof knumbers from

Ssuch that the probability of each element of Soccurring in Sis equal (i.e., each

is selected with probability k/n). You may make only one pass over the numbers.

What if nis unknown?

2-44. [5] We have 1,000 data items to store on 1,000 nodes. Each node can store copies

of exactly three diﬀerent items. Propose a replication scheme to minimize data loss

as nodes fail. What is the expected number of data entries that get lost when three

random nodes fail?

2-45. [5] Consider the following algorithm to ﬁnd the minimum element in an array

of numbers A[0,...,n]. One extra variable tmp is allocated to hold the current

minimum value. Start from A[0]; ”tmp” is compared against A[1], A[2], ..., A[N]

in order. When A[i]<tmp,tmp =A[i]. What is the expected number of times that

the assignment operation tmp =A[i] is performed?

2-46. [5] You have a 100-story building and a couple of marbles. You must identify the

lowest ﬂoor for which a marble will break if you drop it from this ﬂoor. How fast

can you ﬁnd this ﬂoor if you are given an inﬁnite supply of marbles? What if you

have only two marbles?

64 2. ALGORITHM ANALYSIS

2-47. [5] You are given 10 bags of gold coins. Nine bags contain coins that each weigh 10

grams. One bag contains all false coins that weigh one gram less. You must identify

this bag in just one weighing. You have a digital balance that reports the weight of

what is placed on it.

2-48. [5] You have eight balls all of the same size. Seven of them weigh the same, and one

of them weighs slightly more. How can you ﬁnd the ball that is heavier by using a

balance and only two weighings?

2-49. [5] Suppose we start with ncompanies that eventually merge into one big company.

How many diﬀerent ways are there for them to merge?

2-50. [5] ARamanujam number can be written two diﬀerent ways as the sum of two

cubes—i.e. , there exist distinct a,b,c,anddsuch that a3+b3=c3+d3. Generate

all Ramanujam numbers where a, b, c, d < n.

2-51. [7] Six pirates must divide $300 dollars among themselves. The division is to proceed

as follows. The senior pirate proposes a way to divide the money. Then the pirates

vote. If the senior pirate gets at least half the votes he wins, and that division

remains. If he doesn’t, he is killed and then the next senior-most pirate gets a

chance to do the division. Now you have to tell what will happen and why (i.e.

, how many pirates survive and how the division is done)? All the pirates are

intelligent and the ﬁrst priority is to stay alive and the next priority is to get as

much money as possible.

2-52. [7] Reconsider the pirate problem above, where only one indivisible dollar is to be

divided. Who gets the dollar and how many are killed?

Programming Challenges

These programming challenge problems with robot judging are available at

http://www.programming-challenges.com or http://online-judge.uva.es.

2-1. “Primary Arithmetic” – Programming Challenges 110501, UVA Judge 10035.

2-2. “A Multiplication Game” – Programming Challenges 110505, UVA Judge 847.

2-3. “Light, More Light” – Programming Challenges 110701, UVA Judge 10110.

3

Data Structures

Changing a data structure in a slow program can work the same way an organ

transplant does in a sick patient. Important classes of abstract data types such as

containers, dictionaries, and priority queues, have many diﬀerent but functionally

equivalent data structures that implement them. Changing the data structure does

not change the correctness of the program, since we presumably replace a correct

implementation with a diﬀerent correct implementation. However, the new imple-

mentation of the data type realizes diﬀerent tradeoﬀs in the time to execute various

operations, so the total performance can improve dramatically. Like a patient in

need of a transplant, only one part might need to be replaced in order to ﬁx the

problem.

But it is better to be born with a good heart than have to wait for a replace-

ment. The maximum beneﬁt from good data structures results from designing your

program around them in the ﬁrst place. We assume that the reader has had some

previous exposure to elementary data structures and pointer manipulation. Still,

data structure (CS II) courses these days focus more on data abstraction and ob-

ject orientation than the nitty-gritty of how structures should be represented in

memory. We will review this material to make sure you have it down.

In data structures, as with most subjects, it is more important to really un-

derstand the basic material than have exposure to more advanced concepts. We

will focus on each of the three fundamental abstract data types (containers, dic-

tionaries, and priority queues) and see how they can be implemented with arrays

and lists. Detailed discussion of the tradeoﬀs between more sophisticated imple-

mentations is deferred to the relevant catalog entry for each of these data types.

S.S. Skiena, The Algorithm Design Manual, 2nd ed., DOI: 10.1007/978-1-84800-070-4 3,

c

Springer-Verlag London Limited 2008

66 3. DATA STRUCTURES

3.1 Contiguous vs. Linked Data Structures

Data structures can be neatly classiﬁed as either contiguous or linked, depending

upon whether they are based on arrays or pointers:

•Contiguously-allocated structures are composed of single slabs of memory, and

include arrays, matrices, heaps, and hash tables.

•Linked data structures are composed of distinct chunks of memory bound

together by pointers, and include lists, trees, and graph adjacency lists.

In this section, we review the relative advantages of contiguous and linked data

structures. These tradeoﬀs are more subtle than they appear at ﬁrst glance, so I

encourage readers to stick with me here even if you may be familiar with both

types of structures.

3.1.1 Arrays

The array is the fundamental contiguously-allocated data structure. Arrays are

structures of ﬁxed-size data records such that each element can be eﬃciently located

by its index or (equivalently) address.

A good analogy likens an array to a street full of houses, where each array

element is equivalent to a house, and the index is equivalent to the house number.

Assuming all the houses are equal size and numbered sequentially from 1 to n,we

can compute the exact position of each house immediately from its address.1

Advantages of contiguously-allocated arrays include:

•Constant-time access given the index – Because the index of each element

maps directly to a particular memory address, we can access arbitrary data

items instantly provided we know the index.

•Space eﬃciency – Arrays consist purely of data, so no space is wasted with

links or other formatting information. Further, end-of-record information is

not needed because arrays are built from ﬁxed-size records.

•Memory locality – A common programming idiom involves iterating through

all the elements of a data structure. Arrays are good for this because they

exhibit excellent memory locality. Physical continuity between successive data

accesses helps exploit the high-speed cache memory on modern computer

architectures.

The downside of arrays is that we cannot adjust their size in the middle of

a program’s execution. Our program will fail soon as we try to add the (n+

1Houses in Japanese cities are traditionally numbered in the order they were built, not by their physical

location. This makes it extremely diﬃcult to locate a Japanese address without a detailed map.

3.1 CONTIGUOUS VS. LINKED DATA STRUCTURES 67

1)st customer, if we only allocate room for nrecords. We can compensate by

allocating extremely large arrays, but this can waste space, again restricting what

our programs can do.

Actually, we can eﬃciently enlarge arrays as we need them, through the miracle

of dynamic arrays. Suppose we start with an array of size 1, and double its size from

mto 2meach time we run out of space. This doubling process involves allocating a

new contiguous array of size 2m, copying the contents of the old array to the lower

half of the new one, and returning the space used by the old array to the storage

allocation system.

The apparent waste in this procedure involves the recopying of the old contents

on each expansion. How many times might an element have to be recopied after a

total of ninsertions? Well, the ﬁrst inserted element will have been recopied when

the array expands after the ﬁrst, second, fourth, eighth, ...insertions. It will take

log2ndoublings until the array gets to have npositions. However, most elements

do not suﬀer much upheaval. Indeed, the (n/2 + 1)st through nth elements will

move at most once and might never have to move at all.

If half the elements move once, a quarter of the elements twice, and so on, the

total number of movements Mis given by

M=

lg n

i=1

i·n/2i=n

lg n

i=1

i/2i≤n∞

i=1

i/2i=2n

Thus, each of the nelements move only two times on average, and the total work

of managing the dynamic array is the same O(n) as it would have been if a single

array of suﬃcient size had been allocated in advance!

The primary thing lost using dynamic arrays is the guarantee that each array

access takes constant time in the worst case. Now all the queries will be fast, except

for those relatively few queries triggering array doubling. What we get instead is a

promise that the nth array access will be completed quickly enough that the total

eﬀort expended so far will still be O(n). Such amortized guarantees arise frequently

in the analysis of data structures.

3.1.2 Pointers and Linked Structures

Pointers are the connections that hold the pieces of linked structures together.

Pointers represent the address of a location in memory. A variable storing a pointer

to a given data item can provide more freedom than storing a copy of the item

itself. A cell-phone number can be thought of as a pointer to its owner as they

move about the planet.

Pointer syntax and power diﬀer signiﬁcantly across programming languages, so

we begin with a quick review of pointers in C language. A pointer pis assumed to

68 3. DATA STRUCTURES

ClintonJeffersonLincoln NIL

Figure 3.1: Linked list example showing data and pointer ﬁelds

give the address in memory where a particular chunk of data is located.2Pointers

in C have types declared at compiler time, denoting the data type of the items

they can point to. We use *p to denote the item that is pointed to by pointer p,

and &x to denote the address (i.e. , pointer) of a particular variable x. A special

NULL pointer value is used to denote structure-terminating or unassigned pointers.

All linked data structures share certain properties, as revealed by the following

linked list type declaration:

typedef struct list {

item_type item; /* data item */

struct list *next; /* point to successor */

} list;

In particular:

•Each node in our data structure (here list) contains one or more data ﬁelds

(here item) that retain the data that we need to store.

•Each node contains a pointer ﬁeld to at least one other node (here next).

This means that much of the space used in linked data structures has to be

devoted to pointers, not data.

•Finally, we need a pointer to the head of the structure, so we know where to

access it.

The list is the simplest linked structure. The three basic operations supported

by lists are searching, insertion, and deletion. In doubly-linked lists, each node points

both to its predecessor and its successor element. This simpliﬁes certain operations

at a cost of an extra pointer ﬁeld per node.

Searching a List

Searching for item xin a linked list can be done iteratively or recursively. We opt

for recursively in the implementation below. If xis in the list, it is either the ﬁrst

element or located in the smaller rest of the list. Eventually, we reduce the problem

to searching in an empty list, which clearly cannot contain x.

2C permits direct manipulation of memory addresses in ways which may horrify Java programmers, but we

will avoid doing any such tricks.

3.1 CONTIGUOUS VS. LINKED DATA STRUCTURES 69

list *search_list(list *l, item_type x)

{

if (l == NULL) return(NULL);

if (l->item == x)

return(l);

else

return( search_list(l->next, x) );

}

Insertion into a List

Insertion into a singly-linked list is a nice exercise in pointer manipulation, as shown

below. Since we have no need to maintain the list in any particular order, we might

as well insert each new item in the simplest place. Insertion at the beginning of the

list avoids any need to traverse the list, but does require us to update the pointer

(denoted l) to the head of the data structure.

void insert_list(list **l, item_type x)

{

list *p; /* temporary pointer */

p = malloc( sizeof(list) );

p->item = x;

p->next = *l;

*l = p;

}

Two C-isms to note. First, the malloc function allocates a chunk of memory

of suﬃcient size for a new node to contain x. Second, the funny double star (**l)

denotes that lis a pointer to a pointer to a list node. Thus the last line, *l=p;

copies pto the place pointed to l, which is the external variable maintaining access

to the head of the list.

Deletion From a List

Deletion from a linked list is somewhat more complicated. First, we must ﬁnd a

pointer to the predecessor of the item to be deleted. We do this recursively:

70 3. DATA STRUCTURES

list *predecessor_list(list *l, item_type x)

{

if ((l == NULL) || (l->next == NULL)) {

printf("Error: predecessor sought on null list.\n");

return(NULL);

}

if ((l->next)->item == x)

return(l);

else

return( predecessor_list(l->next, x) );

}

The predecessor is needed because it points to the doomed node, so its next

pointer must be changed. The actual deletion operation is simple, once ruling out

the case that the to-be-deleted element does not exist. Special care must be taken

to reset the pointer to the head of the list (l) when the ﬁrst element is deleted:

delete_list(list **l, item_type x)

{

list *p; /* item pointer */

list *pred; /* predecessor pointer */

list *search_list(), *predecessor_list();

p = search_list(*l,x);

if (p != NULL) {

pred = predecessor_list(*l,x);

if (pred == NULL) /* splice out out list */

*l = p->next;

else

pred->next = p->next;

free(p); /* free memory used by node */

}

}

C language requires explicit deallocation of memory, so we must free the

deleted node after we are ﬁnished with it to return the memory to the system.

3.1.3 Comparison

The relative advantages of linked lists over static arrays include:

•Overﬂow on linked structures can never occur unless the memory is actually

full.

3.2 STACKS AND QUEUES 71

•Insertions and deletions are simpler than for contiguous (array) lists.

•With large records, moving pointers is easier and faster than moving the

items themselves.

while the relative advantages of arrays include:

•Linked structures require extra space for storing pointer ﬁelds.

•Linked lists do not allow eﬃcient random access to items.

•Arrays allow better memory locality and cache performance than random

pointer jumping.

Take-Home Lesson: Dynamic memory allocation provides us with ﬂexibility

on how and where we use our limited storage resources.

One ﬁnal thought about these fundamental structures is that they can be

thought of as recursive objects:

•Lists – Chopping the ﬁrst element oﬀ a linked list leaves a smaller linked list.

This same argument works for strings, since removing characters from string

leaves a string. Lists are recursive objects.

•Arrays – Splitting the ﬁrst kelements oﬀ of an nelement array gives two

smaller arrays, of size kand n−k, respectively. Arrays are recursive objects.

This insight leads to simpler list processing, and eﬃcient divide-and-conquer

algorithms such as quicksort and binary search.

3.2 Stacks and Queues

We use the term container to denote a data structure that permits storage and

retrieval of data items independent of content. By contrast, dictionaries are abstract

data types that retrieve based on key values or content, and will be discussed in

Section 3.3 (page 72).

Containers are distinguished by the particular retrieval order they support. In

the two most important types of containers, this retrieval order depends on the

insertion order:

•Stacks – Support retrieval by last-in, ﬁrst-out (LIFO) order. Stacks are simple

to implement and very eﬃcient. For this reason, stacks are probably the

right container to use when retrieval order doesn’t matter at all, such as

when processing batch jobs. The put and get operations for stacks are usually

called push and pop:

72 3. DATA STRUCTURES

–Push(x,s): Insert item xat the top of stack s.

–Pop(s): Return (and remove) the top item of stack s.

LIFO order arises in many real-world contexts. People crammed into a subway

car exit in LIFO order. Food inserted into my refrigerator usually exits the

same way, despite the incentive of expiration dates. Algorithmically, LIFO

tends to happen in the course of executing recursive algorithms.

•Queues – Support retrieval in ﬁrst in, ﬁrst out (FIFO) order. This is surely

the fairest way to control waiting times for services. You want the container

holding jobs to be processed in FIFO order to minimize the maximum time

spent waiting. Note that the average waiting time will be the same regardless

of whether FIFO or LIFO is used. Many computing applications involve data

items with inﬁnite patience, which renders the question of maximum waiting

time moot.

Queues are somewhat trickier to implement than stacks and thus are most

appropriate for applications (like certain simulations) where the order is im-

portant. The put and get operations for queues are usually called enqueue

and dequeue.

–Enqueue(x,q): Insert item xat the back of queue q.

–Dequeue(q): Return (and remove) the front item from queue q.

We will see queues later as the fundamental data structure controlling

breadth-ﬁrst searches in graphs.

Stacks and queues can be eﬀectively implemented using either arrays or linked

lists. The key issue is whether an upper bound on the size of the container is known

in advance, thus permitting the use of a statically-allocated array.

3.3 Dictionaries

The dictionary data type permits access to data items by content. You stick an

item into a dictionary so you can ﬁnd it when you need it.

The primary operations of dictionary support are:

•Search(D,k) – Given a search key k, return a pointer to the element in dic-

tionary Dwhose key value is k, if one exists.

•Insert(D,x) – Given a data item x, add it to the set in the dictionary D.

•Delete(D,x) – Given a pointer to a given data item xin the dictionary D,

remove it from D.

3.3 DICTIONARIES 73

Certain dictionary data structures also eﬃciently support other useful opera-

tions:

•Max(D) or Min(D) – Retrieve the item with the largest (or smallest) key from

D. This enables the dictionary to serve as a priority queue, to be discussed

in Section 3.5 (page 83).

•Predecessor(D,k) or Successor(D,k) – Retrieve the item from Dwhose key is

immediately before (or after) kin sorted order. These enable us to iterate

through the elements of the data structure.

Many common data processing tasks can be handled using these dictionary

operations. For example, suppose we want to remove all duplicate names from a

mailing list, and print the results in sorted order. Initialize an empty dictionary

D, whose search key will be the record name. Now read through the mailing list,

and for each record search to see if the name is already in D. If not, insert it

into D. Once ﬁnished, we must extract the remaining names out of the dictionary.

By starting from the ﬁrst item Min(D) and repeatedly calling Successor until we

obtain Max(D), we traverse all elements in sorted order.

By deﬁning such problems in terms of abstract dictionary operations, we avoid

the details of the data structure’s representation and focus on the task at hand.

In the rest of this section, we will carefully investigate simple dictionary imple-

mentations based on arrays and linked lists. More powerful dictionary implemen-

tations such as binary search trees (see Section 3.4 (page 77)) and hash tables (see

Section 3.7 (page 89)) are also attractive options in practice. A complete discussion

of diﬀerent dictionary data structures is presented in the catalog in Section 12.1

(page 367). We encourage the reader to browse through the data structures section

of the catalog to better learn what your options are.

Stop and Think: Comparing Dictionary Implementations (I)

Problem: What are the asymptotic worst-case running times for each of the seven

fundamental dictionary operations (search, insert, delete, successor, predecessor,

minimum, and maximum) when the data structure is implemented as:

•An unsorted array.

•A sorted array.

Solution: This problem (and the one following it) reveal some of the inherent trade-

oﬀs of data structure design. A given data representation may permit eﬃcient im-

plementation of certain operations at the cost that other operations are expensive.

74 3. DATA STRUCTURES

In addition to the array in question, we will assume access to a few extra

variables such as n—the number of elements currently in the array. Note that we

must maintain the value of these variables in the operations where they change

(e.g., insert and delete), and charge these operations the cost of this maintenance.

The basic dictionary operations can be implemented with the following costs

on unsorted and sorted arrays, respectively:

Unsorted Sorted

Dictionary operation array array

Search(L,k)O(n)O(log n)

Insert(L,x)O(1) O(n)

Delete(L,x)O(1)∗O(n)

Successor(L,x)O(n)O(1)

Predecessor(L,x)O(n)O(1)

Minimum(L)O(n)O(1)

Maximum(L)O(n)O(1)

We must understand the implementation of each operation to see why. First,

we discuss the operations when maintaining an unsorted array A.

•Search is implemented by testing the search key kagainst (potentially) each

element of an unsorted array. Thus, search takes linear time in the worst case,

which is when key kis not found in A.

•Insertion is implemented by incrementing nand then copying item xto

the nth cell in the array, A[n]. The bulk of the array is untouched, so this

operation takes constant time.

•Deletion is somewhat trickier, hence the superscript(∗) in the table. The

deﬁnition states that we are given a pointer xto the element to delete, so

we need not spend any time searching for the element. But removing the xth

element from the array Aleaves a hole that must be ﬁlled. We could ﬁll the

hole by moving each of the elements A[x+1] to A[n] up one position, but

this requires Θ(n) time when the ﬁrst element is deleted. The following idea

is better: just write over A[x] with A[n], and decrement n. This only takes

constant time.

•The deﬁnition of the traversal operations, Predecessor and Successor, refer

to the item appearing before/after xin sorted order. Thus, the answer is

not simply A[x−1] (or A[x+ 1]), because in an unsorted array an element’s

physical predecessor (successor) is not necessarily its logical predecessor (suc-

cessor). Instead, the predecessor of A[x] is the biggest element smaller than

A[x]. Similarly, the successor of A[x] is the smallest element larger than A[x].

Both require a sweep through all nelements of Ato determine the winner.

•Minimum and Maximum are similarly deﬁned with respect to sorted order,

and so require linear sweeps to identify in an unsorted array.

3.3 DICTIONARIES 75

Implementing a dictionary using a sorted array completely reverses our notions

of what is easy and what is hard. Searches can now be done in O(log n) time, using

binary search, because we know the median element sits in A[n/2]. Since the upper

and lower portions of the array are also sorted, the search can continue recursively

on the appropriate portion. The number of halvings of nuntil we get to a single

element is lg n.

The sorted order also beneﬁts us with respect to the other dictionary retrieval

operations. The minimum and maximum elements sit in A[1] and A[n], while the

predecessor and successor to A[x]areA[x−1] and A[x+ 1], respectively.

Insertion and deletion become more expensive, however, because making room

for a new item or ﬁlling a hole may require moving many items arbitrarily. Thus

both become linear-time operations.

Take-Home Lesson: Data structure design must balance all the diﬀerent op-

erations it supports. The fastest data structure to support both operations A

and Bmay well not be the fastest structure to support either operation Aor

B.

Stop and Think: Comparing Dictionary Implementations (II)

Problem: What is the asymptotic worst-case running times for each of the seven

fundamental dictionary operations when the data structure is implemented as

•A singly-linked unsorted list.

•A doubly-linked unsorted list.

•A singly-linked sorted list.

•A doubly-linked sorted list.

Solution: Two diﬀerent issues must be considered in evaluating these implementa-

tions: singly- vs. doubly-linked lists and sorted vs. unsorted order. Subtle operations

are denoted with a superscript:

Singly Double Singly Doubly

Dictionary operation unsorted unsorted sorted sorted

Search(L,k)O(n)O(n)O(n)O(n)

Insert(L,x)O(1) O(1) O(n)O(n)

Delete(L,x)O(n)∗O(1) O(n)∗O(1)

Successor(L,x)O(n)O(n)O(1) O(1)

Predecessor(L,x)O(n)O(n)O(n)∗O(1)

Minimum(L)O(n)O(n)O(1) O(1)

Maximum(L)O(n)O(n)O(1)∗O(1)

76 3. DATA STRUCTURES

As with unsorted arrays, search operations are destined to be slow while main-

tenance operations are fast.

•Insertion/Deletion – The complication here is deletion from a singly-linked

list. The deﬁnition of the Delete operation states we are given a pointer xto

the item to be deleted. But what we really need is a pointer to the element

pointing to xin the list, because that is the node that needs to be changed.

We can do nothing without this list predecessor, and so must spend linear

time searching for it on a singly-linked list. Doubly-linked lists avoid this

problem, since we can immediately retrieve the list predecessor of x.

Deletion is faster for sorted doubly-linked lists than sorted arrays, because

splicing out the deleted element from the list is more eﬃcient than ﬁlling

the hole by moving array elements. The predecessor pointer problem again

complicates deletion from singly-linked sorted lists.

•Search – Sorting provides less beneﬁt for linked lists than it did for arrays. Bi-

nary search is no longer possible, because we can’t access the median element

without traversing all the elements before it. What sorted lists do provide is

quick termination of unsuccessful searches, for if we have not found Abbott by

the time we hit Costello we can deduce that he doesn’t exist. Still, searching

takes linear time in the worst case.

•Traversal operations – The predecessor pointer problem again complicates

implementing Predecessor. The logical successor is equivalent to the node

successor for both types of sorted lists, and hence can be implemented in

constant time.

•Maximum – The maximum element sits at the tail of the list, which would

normally require Θ(n) time to reach in either singly- or doubly-linked lists.

However, we can maintain a separate pointer to the list tail, provided we

pay the maintenance costs for this pointer on every insertion and deletion.

The tail pointer can be updated in constant time on doubly-linked lists: on

insertion check whether last->next still equals NULL, and on deletion set

last to point to the list predecessor of last if the last element is deleted.

We have no eﬃcient way to ﬁnd this predecessor for singly-linked lists. So

why can we implement maximum in Θ(1) on singly-linked lists? The trick is

to charge the cost to each deletion, which already took linear time. Adding

an extra linear sweep to update the pointer does not harm the asymptotic

complexity of Delete, while gaining us Maximum in constant time as a reward

for clear thinking.

3.4 BINARY SEARCH TREES 77

1

2

13

13

1

2

3

2

2

3

1

2

3

Figure 3.2: The ﬁve distinct binary search trees on three nodes

3.4 Binary Search Trees

We have seen data structures that allow fast search or ﬂexible update, but not fast

search and ﬂexible update. Unsorted, doubly-linked lists supported insertion and

deletion in O(1) time but search took linear time in the worse case. Sorted arrays

support binary search and logarithmic query times, but at the cost of linear-time

update.

Binary search requires that we have fast access to two elements—speciﬁcally

the median elements above and below the given node. To combine these ideas, we

need a “linked list” with two pointers per node. This is the basic idea behind binary

search trees.

Arooted binary tree is recursively deﬁned as either being (1) empty, or (2)

consisting of a node called the root, together with two rooted binary trees called

the left and right subtrees, respectively. The order among “brother” nodes matters

in rooted trees, so left is diﬀerent from right. Figure 3.2 gives the shapes of the ﬁve

distinct binary trees that can be formed on three nodes.

A binary search tree labels each node in a binary tree with a single key such

that for any node labeled x, all nodes in the left subtree of xhave keys <xwhile

all nodes in the right subtree of xhave keys >x. This search tree labeling scheme

is very special. For any binary tree on nnodes, and any set of nkeys, there is

exactly one labeling that makes it a binary search tree. The allowable labelings for

three-node trees are given in Figure 3.2.

3.4.1 Implementing Binary Search Trees

Binary tree nodes have left and right pointer ﬁelds, an (optional) parent pointer,

and a data ﬁeld. These relationships are shown in Figure 3.3; a type declaration

for the tree structure is given below:

78 3. DATA STRUCTURES

left right

parent

Figure 3.3: Relationships in a binary search tree (left). Finding the minimum (center) and

maximum (right) elements in a binary search tree

typedef struct tree {

item_type item; /* data item */

struct tree *parent; /* pointer to parent */

struct tree *left; /* pointer to left child */

struct tree *right; /* pointer to right child */

} tree;

The basic operations supported by binary trees are searching, traversal, inser-

tion, and deletion.

Searching in a Tree

The binary search tree labeling uniquely identities where each key is located. Start

at the root. Unless it contains the query key x, proceed either left or right depending

upon whether xoccurs before or after the root key. This algorithm works because

both the left and right subtrees of a binary search tree are themselves binary search

trees. This recursive structure yields the recursive search algorithm below:

tree *search_tree(tree *l, item_type x)

{

if (l == NULL) return(NULL);

if (l->item == x) return(l);

if (x < l->item)

return( search_tree(l->left, x) );

else

return( search_tree(l->right, x) );

}

3.4 BINARY SEARCH TREES 79

This search algorithm runs in O(h) time, where hdenotes the height of the

tree.

Finding Minimum and Maximum Elements in a Tree

Implementing the ﬁnd-minimum operation requires knowing where the minimum

element is in the tree. By deﬁnition, the smallest key must reside in the left subtree

of the root, since all keys in the left subtree have values less than that of the root.

Therefore, as shown in Figure 3.3, the minimum element must be the leftmost

descendent of the root. Similarly, the maximum element must be the rightmost

descendent of the root.

tree *find_minimum(tree *t)

{

tree *min; /* pointer to minimum */

if (t == NULL) return(NULL);

min = t;

while (min->left != NULL)

min = min->left;

return(min);

}

Traversal in a Tree

Visiting all the nodes in a rooted binary tree proves to be an important component

of many algorithms. It is a special case of traversing all the nodes and edges in a

graph, which will be the foundation of Chapter 5.

A prime application of tree traversal is listing the labels of the tree nodes.

Binary search trees make it easy to report the labels in sorted order. By deﬁnition,

all the keys smaller than the root must lie in the left subtree of the root, and all

keys bigger than the root in the right subtree. Thus, visiting the nodes recursively

in accord with such a policy produces an in-order traversal of the search tree:

void traverse_tree(tree *l)

{

if (l != NULL) {

traverse_tree(l->left);

process_item(l->item);

traverse_tree(l->right);

}

}

80 3. DATA STRUCTURES

Each item is processed once during the course of traversal, which runs in O(n)

time, where ndenotes the number of nodes in the tree.

Alternate traversal orders come from changing the position of process item

relative to the traversals of the left and right subtrees. Processing the item ﬁrst

yields a pre-order traversal, while processing it last gives a post-order traversal.

These make relatively little sense with search trees, but prove useful when the

rooted tree represents arithmetic or logical expressions.

Insertion in a Tree

There is only one place to insert an item xinto a binary search tree Twhere we

know we can ﬁnd it again. We must replace the NULL pointer found in Tafter an

unsuccessful query for the key k.

This implementation uses recursion to combine the search and node insertion

stages of key insertion. The three arguments to insert tree are (1) a pointer lto

the pointer linking the search subtree to the rest of the tree, (2) the key xto be

inserted, and (3) a parent pointer to the parent node containing l. The node is

allocated and linked in on hitting the NULL pointer. Note that we pass the pointer to

the appropriate left/right pointer in the node during the search, so the assignment

*l = p; links the new node into the tree:

insert_tree(tree **l, item_type x, tree *parent)

{

tree *p; /* temporary pointer */

if (*l == NULL) {

p = malloc(sizeof(tree)); /* allocate new node */

p->item = x;

p->left = p->right = NULL;

p->parent = parent;

*l = p; /* link into parent’s record */

return;

}

if (x < (*l)->item)

insert_tree(&((*l)->left), x, *l);

else

insert_tree(&((*l)->right), x, *l);

}

Allocating the node and linking it in to the tree is a constant-time operation

after the search has been performed in O(h) time.

3.4 BINARY SEARCH TREES 81

initial tree delete node with zero children (3)

5

5

2

6

8

7

3

1

2

8

7

4

3

1

2

5

6

8

7

4

1

delete node with 2 children (4)delete node with 1 child (6)

6

8

7

4

3

1

5

2

Figure 3.4: Deleting tree nodes with 0, 1, and 2 children

Deletion from a Tree

Deletion is somewhat trickier than insertion, because removing a node means ap-

propriately linking its two descendant subtrees back into the tree somewhere else.

There are three cases, illustrated in Figure 3.4. Leaf nodes have no children, and

so may be deleted by simply clearing the pointer to the given node.

The case of the doomed node having one child is also straightforward. There

is one parent and one grandchild, and we can link the grandchild directly to the

parent without violating the in-order labeling property of the tree.

But what of a to-be-deleted node with two children? Our solution is to relabel

this node with the key of its immediate successor in sorted order. This successor

must be the smallest value in the right subtree, speciﬁcally the leftmost descendant

in the right subtree (p). Moving this to the point of deletion results in a properly-

labeled binary search tree, and reduces our deletion problem to physically removing

a node with at most one child—a case that has been resolved above.

The full implementation has been omitted here because it looks a little ghastly,

but the code follows logically from the description above.

The worst-case complexity analysis is as follows. Every deletion requires the

cost of at most two search operations, each taking O(h) time where his the height

of the tree, plus a constant amount of pointer manipulation.

3.4.2 How Good Are Binary Search Trees?

When implemented using binary search trees, all three dictionary operations take

O(h) time, where his the height of the tree. The smallest height we can hope for

occurs when the tree is perfectly balanced, where h=log n. This is very good,

but the tree must be perfectly balanced.

82 3. DATA STRUCTURES

Our insertion algorithm puts each new item at a leaf node where it should have

been found. This makes the shape (and more importantly height) of the tree a

function of the order in which we insert the keys.

Unfortunately, bad things can happen when building trees through insertion.

The data structure has no control over the order of insertion. Consider what hap-

pens if the user inserts the keys in sorted order. The operations insert(a), followed

by insert(b),insert(c),insert(d), . . . will produce a skinny linear height tree

where only right pointers are used.

Thus binary trees can have heights ranging from lg nto n. But how tall are

they on average? The average case analysis of algorithms can be tricky because we

must carefully specify what we mean by average. The question is well deﬁned if we

consider each of the n! possible insertion orderings equally likely and average over

those. If so, we are in luck, because with high probability the resulting tree will

have O(log n) height. This will be shown in Section 4.6 (page 123).

This argument is an important example of the power of randomization.Wecan

often develop simple algorithms that oﬀer good performance with high probabil-

ity. We will see that a similar idea underlies the fastest known sorting algorithm,

quicksort.

3.4.3 Balanced Search Trees

Random search trees are usually good. But if we get unlucky with our order of

insertion, we can end up with a linear-height tree in the worst case. This worst

case is outside of our direct control, since we must build the tree in response to the

requests given by our potentially nasty user.

What would be better is an insertion/deletion procedure which adjusts the tree a

little after each insertion, keeping it close enough to be balanced so the maximum

height is logarithmic. Sophisticated balanced binary search tree data structures

have been developed that guarantee the height of the tree always to be O(log n).

Therefore, all dictionary operations (insert, delete, query) take O(log n) time each.

Implementations of balanced tree data structures such as red-black trees and splay

trees are discussed in Section 12.1 (page 367).

From an algorithm design viewpoint, it is important to know that these trees

exist and that they can be used as black boxes to provide an eﬃcient dictionary

implementation. When ﬁguring the costs of dictionary operations for algorithm

analysis, we can assume the worst-case complexities of balanced binary trees to be

a fair measure.

Take-Home Lesson: Picking the wrong data structure for the job can be

disastrous in terms of performance. Identifying the very best data structure

is usually not as critical, because there can be several choices that perform

similarly.

3.5 PRIORITY QUEUES 83

Stop and Think: Exploiting Balanced Search Trees

Problem: You are given the task of reading nnumbers and then printing them

out in sorted order. Suppose you have access to a balanced dictionary data struc-

ture, which supports the operations search, insert, delete, minimum, maximum,

successor, and predecessor each in O(log n) time.