# A Practical Guide to Designing Phase II Trials in Oncology (Statistics in Practice)

Sarah R. Brown, Walter M. Gregory, Christopher J. Twelves, Julia Brown




STATISTICS IN PRACTICE

Series Advisors

Human and Biological Sciences

Stephen Senn

CRP-Santé, Luxembourg

Earth and Environmental Sciences

Marian Scott

University of Glasgow, UK

Industry, Commerce and Finance

Wolfgang Jank

University of Maryland, USA

Founding Editor

Vic Barnett

Nottingham Trent University, UK

Statistics in Practice is an important international series of texts which provide detailed coverage of statistical concepts, methods and worked case studies in specific fields of investigation and study.

With sound motivation and many worked practical examples, the books show in down-to-earth terms how to select and use an appropriate range of statistical techniques in a particular practical field within each title’s special topic area.

The books provide statistical support for professionals and research workers across a range of employment fields and research environments. Subject areas covered include medicine and pharmaceutics; industry, finance and commerce; public services; the earth and environmental sciences; and so on.

The books also provide support to students studying statistical courses applied to the above areas. The demand for graduates to be equipped for the work environment has led to such courses becoming increasingly prevalent at universities and colleges.

It is our aim to present judiciously chosen and well-written workbooks to meet everyday practical needs. Feedback of views from readers will be most valuable to monitor the success of this aim.

A complete list of titles in this series appears at the end of the volume.

A Practical Guide to Designing Phase II Trials in Oncology

Sarah R. Brown

University of Leeds, UK

Walter M. Gregory

University of Leeds, UK

Chris Twelves

St James’s University Hospital, Leeds, UK

Julia Brown

University of Leeds, UK

This edition first published 2014

© 2014 John Wiley & Sons, Ltd

Registered office

John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data

A practical guide to designing phase II trials in oncology / [edited by] Sarah R. Brown, Walter M. Gregory, Christopher Twelves, Julia Brown.

p. ; cm.

Includes bibliographical references and index.

ISBN 978-1-118-57090-6 (hardback)

I. Brown, Sarah R., editor of compilation. II. Gregory, Walter M., editor of compilation. III. Twelves, Chris, editor of compilation. IV. Brown, Julia (Julia M.), editor of compilation.

[DNLM: 1. Clinical Trials, Phase II as Topic. 2. Antineoplastic Agents–therapeutic use. 3. Drug Evaluation–methods. 4. Neoplasms–drug therapy. QV 771.4]

RC271.C5

616.99′4061–dc23

2013041156

A catalogue record for this book is available from the British Library.

ISBN: 978-1-118-57090-6

Set in 10/12pt Times by Aptara Inc., New Delhi, India

1 2014

To Austin, from Sarah, for your continued support and encouragement.

And to the many patients and their carers who take part in clinical trials, often at the most difficult of times, helping in the development of new and better treatments for people with cancer now and in the future.

Contents

Contributors xv

Foreword I xvii

Elizabeth A. Eisenhauer

Foreword II xix

Roger A’Hern

Preface xxi

1 Introduction 1

Sarah Brown, Julia Brown, Walter Gregory and Chris Twelves

1.1 The role of phase II trials in cancer 3

1.2 The importance of appropriate phase II trial design 5

1.3 Current use of phase II designs 6

1.4 Identifying appropriate phase II trial designs 7

1.5 Potential trial designs 9

1.6 Using the guidance to design your trial 10

2 Key points for consideration 12

Sarah Brown, Julia Brown, Marc Buyse, Walter Gregory, Mahesh Parmar and

Chris Twelves

2.1 Stage 1 – Trial questions 14

2.1.1 Therapeutic considerations 14

2.1.2 Primary intention of trial 16

2.1.3 Number of experimental treatment arms 17

2.1.4 Primary outcome of interest 18

2.2 Stage 2 – Design components 18

2.2.1 Outcome measure and distribution 18

2.2.2 Randomisation 21

2.2.3 Design category 26

2.3 Stage 3 – Practicalities 33

2.3.1 Practical considerations 33

2.4 Summary 35


3 Designs for single experimental therapies with a single arm 36

Sarah Brown

3.1 One-stage designs 36

3.1.1 Binary outcome measure 36

3.1.2 Continuous outcome measure 38

3.1.3 Multinomial outcome measure 39

3.1.4 Time-to-event outcome measure 40

3.1.5 Ratio of times to progression 40

3.2 Two-stage designs 41

3.2.1 Binary outcome measure 41

3.2.2 Continuous outcome measure 50

3.2.3 Multinomial outcome measure 50

3.2.4 Time-to-event outcome measure 53

3.2.5 Ratio of times to progression 54

3.3 Multi-stage designs 55

3.3.1 Binary outcome measure 55

3.3.2 Continuous outcome measure 59

3.3.3 Multinomial outcome measure 59

3.3.4 Time-to-event outcome measure 60

3.3.5 Ratio of times to progression 60

3.4 Continuous monitoring designs 60

3.4.1 Binary outcome measure 60

3.4.2 Continuous outcome measure 63

3.4.3 Multinomial outcome measure 63

3.4.4 Time-to-event outcome measure 63

3.4.5 Ratio of times to progression 64

3.5 Decision-theoretic designs 64

3.5.1 Binary outcome measure 64

3.5.2 Continuous outcome measure 65

3.5.3 Multinomial outcome measure 65

3.5.4 Time-to-event outcome measure 65

3.5.5 Ratio of times to progression 65

3.6 Three-outcome designs 65

3.6.1 Binary outcome measure 65

3.6.2 Continuous outcome measure 66

3.6.3 Multinomial outcome measure 66

3.6.4 Time-to-event outcome measure 66

3.6.5 Ratio of times to progression 67

3.7 Phase II/III designs 67

4 Designs for single experimental therapies including randomisation 68

Sarah Brown

4.1 One-stage designs 68

4.1.1 Binary outcome measure 68

4.1.2 Continuous outcome measure 70


4.1.3 Multinomial outcome measure 70

4.1.4 Time-to-event outcome measure 70

4.1.5 Ratio of times to progression 72

4.2 Two-stage designs 72

4.2.1 Binary outcome measure 72

4.2.2 Continuous outcome measure 73

4.2.3 Multinomial outcome measure 74

4.2.4 Time-to-event outcome measure 75

4.2.5 Ratio of times to progression 75

4.3 Multi-stage designs 75

4.3.1 Binary outcome measure 75

4.3.2 Continuous outcome measure 75

4.3.3 Multinomial outcome measure 75

4.3.4 Time-to-event outcome measure 76

4.3.5 Ratio of times to progression 76

4.4 Continuous monitoring designs 76

4.4.1 Binary outcome measure 76

4.4.2 Continuous outcome measure 76

4.4.3 Multinomial outcome measure 76

4.4.4 Time-to-event outcome measure 76

4.4.5 Ratio of times to progression 76

4.5 Three-outcome designs 77

4.5.1 Binary outcome measure 77

4.5.2 Continuous outcome measure 77

4.5.3 Multinomial outcome measure 77

4.5.4 Time-to-event outcome measure 77

4.5.5 Ratio of times to progression 77

4.6 Phase II/III designs 77

4.6.1 Binary outcome measure 77

4.6.2 Continuous outcome measure 79

4.6.3 Multinomial outcome measure 80

4.6.4 Time-to-event outcome measure 81

4.6.5 Ratio of times to progression 81

4.7 Randomised discontinuation designs 82

4.7.1 Binary outcome measure 82

4.7.2 Continuous outcome measure 82

4.7.3 Multinomial outcome measure 82

4.7.4 Time-to-event outcome measure 82

4.7.5 Ratio of times to progression 82

5 Treatment selection designs 83

Sarah Brown

5.1 Including a control arm 84

5.1.1 One-stage designs 84

5.1.2 Two-stage designs 84


5.1.3 Multi-stage designs 88

5.1.4 Continuous monitoring designs 89

5.1.5 Decision-theoretic designs 89

5.1.6 Three-outcome designs 89

5.1.7 Phase II/III designs – same primary outcome measure at phase II and phase III 89

5.1.8 Phase II/III designs – different primary outcome measures at phase II and phase III 99

5.1.9 Randomised discontinuation designs 102

5.2 Not including a control arm 103

5.2.1 One-stage designs 103

5.2.2 Two-stage designs 106

5.2.3 Multi-stage designs 108

5.2.4 Continuous monitoring designs 109

5.2.5 Decision-theoretic designs 110

5.2.6 Three-outcome designs 110

5.2.7 Phase II/III designs – same primary outcome measure at phase II and phase III 110

5.2.8 Randomised discontinuation designs 111

6 Designs incorporating toxicity as a primary outcome 112

Sarah Brown

6.1 Including a control arm 112

6.1.1 One-stage designs 112

6.1.2 Two-stage designs 114

6.1.3 Multi-stage designs 115

6.2 Not including a control arm 117

6.2.1 One-stage designs 117

6.2.2 Two-stage designs 118

6.2.3 Multi-stage designs 122

6.2.4 Continuous monitoring designs 125

6.3 Toxicity alone 126

6.3.1 One stage 126

6.3.2 Continuous monitoring 127

6.4 Treatment selection based on activity and toxicity 128

6.4.1 Two-stage designs 128

6.4.2 Multi-stage designs 129

6.4.3 Continuous monitoring designs 129

7 Designs evaluating targeted subgroups 131

Sarah Brown

7.1 One-stage designs 131

7.1.1 Binary outcome measure 131


7.2 Two-stage designs 132

7.2.1 Binary outcome measure 132

7.3 Multi-stage designs 135

7.3.1 Binary outcome measure 135

7.3.2 Time-to-event outcome measure 137

7.4 Continuous monitoring designs 138

7.4.1 Binary outcome measure 138

7.4.2 Time-to-event outcome measure 139

8 ‘Chemo-radio-sensitisation’ in head and neck cancer 141

John Chester and Sarah Brown

Stage 1 – Trial questions 141

Therapeutic considerations 141

Primary intention of trial 142

Number of experimental treatment arms 142

Primary outcome of interest 142

Stage 2 – Design components 142

Outcome measure and distribution 142

Randomisation 143

Design category 143

Possible designs 144

Stage 3 – Practicalities 146

Practical considerations for selecting between designs 146

Proposed trial design 148

Summary 150

9 Combination chemotherapy in second-line treatment of non-small cell lung cancer 151

Ornella Belvedere and Sarah Brown

Stage 1 – Trial questions 152

Therapeutic considerations 152

Primary intention of trial 152

Number of experimental treatment arms 152

Primary outcome of interest 152

Stage 2 – Design components 153

Outcome measure and distribution 153

Randomisation 153

Design category 153

Possible designs 154

Stage 3 – Practicalities 155

Practical considerations for selecting between designs 155

Proposed trial design 158

Summary 162


10 Selection by biomarker in prostate cancer 163

Rick Kaplan and Sarah Brown

Stage 1 – Trial questions 164

Therapeutic considerations 164

Primary intention of trial 164

Number of experimental treatment arms 164

Primary outcome of interest 164

Stage 2 – Design components 165

Outcome measure and distribution 165

Randomisation 165

Design category 166

Possible designs 167

Stage 3 – Practicalities 168

Practical considerations for selecting between designs 168

Proposed trial design 170

Summary 171

11 Dose selection in advanced multiple myeloma 174

Sarah Brown and Steve Schey

Stage 1 – Trial questions 174

Therapeutic considerations 174

Primary intention of trial 175

Number of experimental arms 175

Primary outcome of interest 175

Stage 2 – Design components 176

Outcome measure and distribution 176

Randomisation 176

Design category 177

Possible designs 177

Stage 3 – Practicalities 178

Practical considerations for selecting between designs 178

Proposed trial design 181

Summary 182

12 Targeted therapy for advanced colorectal cancer 185

Matthew Seymour and Sarah Brown

Stage 1 – Trial questions 185

Therapeutic considerations 185

Primary intention of trial 186

Number of experimental treatment arms 186

Primary outcome of interest 186

Stage 2 – Design components 187

Outcome measure and distribution 187

Randomisation 187


Design category 188

Possible designs 189

Stage 3 – Practicalities 190

Practical considerations for selecting between designs 190

Proposed trial design 191

Summary 194

13 Phase II oncology trials: Perspective from industry 195

Anthony Rossini, Steven Green and William Mietlowski

13.1 Introduction 195

13.2 Commercial challenges, drivers and considerations 196

13.3 Selecting designs by strategy 197

13.3.1 Basic strategies addressed by phase II studies 198

13.3.2 Potential registration 198

13.3.3 Exploratory activity 203

13.3.4 Regimen selection 204

13.3.5 Phase II to support predicting success in phase III 206

13.3.6 Phase II safety trials 208

13.3.7 Prospective identification of target populations 209

13.4 Discussion 210

References 213

Index 227

Contributors

Sarah Brown, Clinical Trials Research Unit, Leeds Institute of Clinical Trials Research, University of Leeds, UK.

This book was collectively written by Sarah Brown with contributions from:

Ornella Belvedere, Department of Oncology, York Hospital, York, UK.

Julia Brown, Clinical Trials Research Unit, Leeds Institute of Clinical Trials Research, University of Leeds, UK.

Marc Buyse, International Drug Development Institute, Louvain-la-Neuve, Belgium.

John Chester, Institute of Cancer and Genetics, School of Medicine, Cardiff University, and Honorary Consultant, Velindre Cancer Centre, Cardiff, UK.

Steven Green, Novartis Pharma AG, Basel, Switzerland.

Walter Gregory, Clinical Trials Research Unit, Leeds Institute of Clinical Trials Research, University of Leeds, UK.

Rick Kaplan, Medical Research Council Clinical Trials Unit at University College London, University College London Hospital, and NIHR Cancer Research Network Coordinating Centre, UK.

William Mietlowski, Novartis Pharma AG, Basel, Switzerland.

Mahesh Parmar, Medical Research Council Clinical Trials Unit at University College London, and NIHR Cancer Research Network Coordinating Centre, UK.

Anthony Rossini, Novartis Pharma AG, Basel, Switzerland.

Steve Schey, King’s College, London, and Lead Myeloma Clinician, King’s College Hospital, London, UK.

Matthew Seymour, Leeds Institute of Cancer and Pathology, University of Leeds, and NIHR Cancer Research Network, Leeds and National Cancer Research Institute, London, UK.

Chris Twelves, Leeds Institute of Cancer and Pathology, University of Leeds, and St James’s University Hospital, Leeds, UK.

Foreword I

The past two decades have seen an unprecedented expansion in the knowledge about the biological, immunological and molecular phenomena that drive malignancy. This knowledge has subsequently been translated into a large number of potential anticancer therapeutics and potential predictive or prognostic molecular markers that are under evaluation in clinical trials.

A key component of the oncology clinical trials development process is the bridge that must be crossed between the end of phase I evaluation of a drug, at which time information on its recommended dose, schedule, pharmacokinetic and pharmacodynamic effects in a small group of individuals is available, and the definitive randomised efficacy trial of that drug in the appropriately defined population of cancer patients.

This ‘bridge’ is provided by the phase II trial. Historically, phase II oncology studies sought evidence of sufficient drug efficacy (based on objective tumour response in a specific cancer type) that large confirmatory phase III trials would be justified. Those not meeting the efficacy bar would not be pursued in further studies in that tumour type. In today’s highly competitive environment, the phase II study has come under scrutiny – some have expressed the concern that too many ‘promising’ drugs emerging from phase II studies yield negative phase III results, that clinical trial endpoints traditionally deployed in phase II may not be specific or sensitive enough for today’s molecular-based agents to appropriately direct subsequent drug development decisions, that efficiency is lost if discrete phase II and phase III trials are designed and that much more should be learned about predictive or selection biomarkers before and during phase II to optimally guide phase III design.

Numerous papers and opinion pieces on these and other phase II–related topics have been published in the past decade. Thus this new book by Brown and colleagues, A Practical Guide to Designing Phase II Trials in Oncology, is a welcome addition to the literature. This comprehensive and well-written guide takes a logical and step-by-step approach by reviewing and making recommendations on the key variables that must be considered in phase II oncology trials. Some of these include tailoring design components to the specific trial question, the approach to studies of single- and combination-agent trials, when and how randomised and adaptive designs might be deployed, patient selection and phase II trial endpoints. In addition, the book drills into issues that may be unique to designs in several specific malignancies such as non-small cell lung cancer, prostate cancer and myeloma. Throughout, examples are utilised as a means of providing context and guiding the reader.

What is clear is that the phase II oncology trial is not a singular or simple construct. There is no formula for its design that meets all potential needs. These trials are the ‘shape-shifters’ of the cancer trial spectrum – how they are designed, the endpoints that are utilised, and the population enrolled depend on the agent and its associated biology, the type of cancer, the question the trial is intended to address and how those results are intended to guide future decisions. This comprehensive text provides much-needed practical information in this important area of clinical cancer research.

Elizabeth A. Eisenhauer, MD, FRCPC

Head, Department of Oncology

Queen’s University

Kingston, ON, Canada

Foreword II

Twenty years ago, in the early 1990s, the term ‘phase II trial design’ was practically synonymous with the Simon optimal and MINIMAX two-stage trials (1989) – designs which have stood the test of time with their pragmatic trade-off between the need to stop a trial early for inefficacy if response rates were low and the likely overshoot of interim analysis points in small trials. The Gehan design was also widely used, but many statisticians were wary of designs which focussed on estimation but did not have distinct success/failure rules which allowed error rates to be tightly specified.
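The trade-off between the optimal and MINIMAX criteria can be made concrete with a brute-force search. The Python below is a minimal sketch under stated assumptions, not Simon's published algorithm or software: it enumerates two-stage designs (r1, n1, r, n) for a binary response endpoint, keeps those meeting the one-sided type I error and power constraints, and reports the design with the smallest expected sample size under the null response rate p0 ("optimal") and the design with the smallest maximum sample size, ties broken by expected size ("MINIMAX"). The function names, the search cap `n_max` and the example rates (p0 = 0.05, p1 = 0.25, α = 0.05, β = 0.20) are hypothetical choices for illustration.

```python
from math import comb


def binomial_tables(n_max, p):
    """Return (pmf, surv) for X ~ Bin(m, p), m = 0..n_max.

    pmf[m][k] = P(X = k); surv[m][k] = P(X >= k) for k = 0..m + 1.
    """
    pmf, surv = [], []
    for m in range(n_max + 1):
        row = [comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(m + 1)]
        tail = [0.0] * (m + 2)
        for k in range(m, -1, -1):
            tail[k] = tail[k + 1] + row[k]
        pmf.append(row)
        surv.append(tail)
    return pmf, surv


def simon_search(p0, p1, alpha, beta, n_max=25):
    """Brute-force search for Simon-style optimal and MINIMAX two-stage designs.

    Design (r1, n1, r, n): stop after n1 patients if responses <= r1;
    otherwise treat n - n1 more and declare the drug active if total
    responses > r. Only n <= n_max is searched (a hypothetical cap), so
    results are optimal within that range only. Returns (optimal, minimax),
    each a tuple (r1, n1, r, n, expected N under p0), or (None, None).
    """
    pmf0, surv0 = binomial_tables(n_max, p0)
    pmf1, surv1 = binomial_tables(n_max, p1)
    feasible = []
    for n in range(2, n_max + 1):
        for n1 in range(1, n):
            n2 = n - n1
            for r1 in range(n1):
                pet0 = 1.0 - surv0[n1][r1 + 1]  # P(stop early | p0)
                for r in range(r1, n):
                    type1 = power = 0.0
                    for x1 in range(r1 + 1, n1 + 1):
                        # Need at least r + 1 - x1 responses in stage 2.
                        k = min(max(r + 1 - x1, 0), n2 + 1)
                        type1 += pmf0[n1][x1] * surv0[n2][k]
                        power += pmf1[n1][x1] * surv1[n2][k]
                    if type1 <= alpha and power >= 1 - beta:
                        feasible.append((r1, n1, r, n, n1 + (1 - pet0) * n2))
    if not feasible:
        return None, None
    optimal = min(feasible, key=lambda d: d[4])
    minimax = min(feasible, key=lambda d: (d[3], d[4]))
    return optimal, minimax


if __name__ == "__main__":
    # Illustrative, hypothetical design parameters.
    optimal, minimax = simon_search(p0=0.05, p1=0.25, alpha=0.05, beta=0.20)
    print("optimal (r1, n1, r, n, E[N | p0]):", optimal)
    print("minimax (r1, n1, r, n, E[N | p0]):", minimax)
```

By construction the MINIMAX design never has a larger maximum sample size than the optimal design, and the optimal design never has a larger expected sample size under p0; widening `n_max` may be needed for smaller target differences.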

The field of phase II trial design has expanded rapidly since these early days, particularly in oncology. Phase I trial design has also been extended over the years to go beyond mere dose finding and frequently includes an expansion phase at the chosen dose level which provides initial information on efficacy and pharmacodynamic predictors of response. Ideally this should enhance the relevance of the subsequent phase II trials.

This book presents a much-needed guide to contemporary phase II clinical trial design. Over the years trial endpoints have diversified to include the greater use of endpoints such as progression-free survival that cater for treatments that may not cause tumour shrinkage and are thought to act by halting cancer cell growth rather than killing the cell (cytostatic rather than cytotoxic). Recognition of the inaccuracies inherent in designing trials on the basis of the expected response gleaned from historical data has also seen more focus on the use of randomisation and the incorporation of a control group. The increasing emphasis on stratified medicine, recognising the need to tailor treatments more closely to the biological characteristics of the individual patient’s disease, has also led to phase II trials designed to address this need.

The recognition of the division between phase IIa trials designed to investigate efficacy and phase IIb trials, which focus on determining whether a phase III trial is worth undertaking, has also been welcome. The latter have increased in size and complexity in an effort to forestall the possibility of a negative phase III trial. It has been suggested that as many as two out of every three phase III oncology trials are negative – a situation which is of real concern, given that drug development is increasing in expense and comparatively few drugs gain regulatory approval. It is reassuring to note the number of phase II/III designs that have been developed to closely link the development of phase II and phase III, but in some situations this is not possible.

The Simon Optimal Design (Simon 1989) is perhaps the seminal phase II single-arm design, and it is salutary to see how frequently this design is used and has acted as a springboard for the development of other designs. It is frequently possible to add judiciously placed interim analyses to trials without increasing the number of patients or having an adverse effect on the error rates – a manoeuvre which is worth bearing in mind. For example, the two-stage Simon MINIMAX design, which minimises the maximum number of patients needed to assess a binary endpoint, is frequently the same size as the one-stage exact design – on occasion, the MINIMAX design is even marginally smaller than the single-stage design! The MINIMAX design illustrates the point that an optional futility interim analysis can be built into a planned one-stage trial of a binary endpoint without increasing the number of patients or adversely affecting the error rates. Alternatively, note that a one-stage design can frequently be converted into a two-stage design by including a futility interim analysis at N/2 (here N is the fixed-sample single-stage trial size, or could be the number of events for a time-to-event endpoint). The trial would be stopped on the grounds of futility if the primary endpoint parameter did not exceed the value under the null hypothesis. This approach is seen in the design mentioned by Whitehead (2009, Section 4.2.1).
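The effect of adding a futility look at N/2 to a one-stage binary design can be checked numerically. The Python below is a minimal sketch under stated assumptions, not a method taken from this book: it computes exact binomial operating characteristics (probability of early termination, probability of declaring the drug active, expected sample size) for a two-stage single-arm design, then compares a hypothetical one-stage rule (at least five responses among 41 patients) with the same rule plus an interim look after 20 patients that stops if the observed response rate does not exceed p0. The null and target response rates (p0 = 0.05, p1 = 0.20) and all function names are hypothetical.

```python
from math import comb, floor


def binom_pmf(k, n, p):
    """Exact binomial probability P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p); 0 when k < 0."""
    return sum(binom_pmf(i, n, p) for i in range(min(k, n) + 1))


def two_stage_oc(r1, n1, r, n, p):
    """Operating characteristics of a two-stage single-arm design.

    Stage 1 treats n1 patients and stops for futility if responses <= r1
    (r1 = -1 means never stop). Stage 2 treats n - n1 more; the drug is
    declared active if total responses > r. Returns (probability of early
    termination, probability of declaring activity, expected sample size)
    when the true response rate is p.
    """
    n2 = n - n1
    pet = binom_cdf(r1, n1, p)
    p_active = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        # Continue to stage 2 and need more than r - x1 further responses.
        p_active += binom_pmf(x1, n1, p) * (1.0 - binom_cdf(r - x1, n2, p))
    expected_n = n1 + (1.0 - pet) * n2
    return pet, p_active, expected_n


if __name__ == "__main__":
    p0, p1 = 0.05, 0.20  # hypothetical null and target response rates
    n, r = 41, 4         # one-stage rule: declare activity if >= 5 responses
    # One-stage design written as a two-stage design with no interim (n1 = 0).
    _, alpha_one, _ = two_stage_oc(-1, 0, r, n, p0)
    _, power_one, _ = two_stage_oc(-1, 0, r, n, p1)
    # Futility look at N/2: stop if the observed rate x1 / n1 does not exceed p0.
    n1 = n // 2
    r1 = floor(p0 * n1 + 1e-9)  # small tolerance guards against rounding error
    pet0, alpha_two, en0 = two_stage_oc(r1, n1, r, n, p0)
    _, power_two, _ = two_stage_oc(r1, n1, r, n, p1)
    print(f"one-stage: alpha={alpha_one:.4f} power={power_one:.4f} N={n}")
    print(f"two-stage: alpha={alpha_two:.4f} power={power_two:.4f} "
          f"E[N|p0]={en0:.1f} P(stop early|p0)={pet0:.2f}")
```

A futility stop can only remove outcome paths that would otherwise lead to declaring activity, so the type I error cannot increase; the cost is some loss of power, which the printed values let you inspect for any assumed p0 and p1.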

A general boundary rule that I have also used is the p ≤ 0.001 rule (Peto–Haybittle), and related to this are common-sense considerations that should not be overlooked. For example, if five or more responses in a 41-patient trial are needed to demonstrate efficacy, then as soon as five responses have been observed the efficacy threshold for the trial has been passed, and it is clear a phase III trial will be recommended. If the toxicity profile is acceptable, the fact that the efficacy criteria have been met should be disseminated so that planning for the follow-on phase III trial can commence.
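The two stopping ideas in the paragraph above, a p ≤ 0.001 interim boundary and stopping once the required number of responses has been reached, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the interim test uses an exact one-sided binomial p-value against a hypothetical null response rate p0, and the function names, the 20-patient interim look and the response sequence are invented for illustration.

```python
from math import comb


def upper_tail_p(x, n, p0):
    """Exact one-sided binomial p-value P(X >= x) for X ~ Bin(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(x, n + 1))


def peto_haybittle_stop(x, n, p0, threshold=0.001):
    """True if an interim look with x responses in n patients has p <= threshold."""
    return upper_tail_p(x, n, p0) <= threshold


def efficacy_met_at(outcomes, responses_needed):
    """1-based index of the patient whose response meets the efficacy threshold.

    outcomes holds 1 (response) or 0 (no response) in order of assessment;
    returns None if the threshold is never reached.
    """
    responses = 0
    for i, outcome in enumerate(outcomes, start=1):
        responses += outcome
        if responses >= responses_needed:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical interim look: 20 patients assessed, null response rate 5%.
    for x in (5, 6):
        print(x, "responses:", round(upper_tail_p(x, 20, 0.05), 5),
              "stop" if peto_haybittle_stop(x, 20, 0.05) else "continue")
    # Hypothetical response sequence; the 41-patient rule needs 5 responses.
    sequence = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0]
    print("efficacy threshold met at patient", efficacy_met_at(sequence, 5))
```

The Haybittle–Peto boundary is conventionally applied only at interim analyses, leaving the final analysis close to the nominal level; using an exact binomial test for a small single-arm trial is one assumed choice among several.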

This book will act as a valuable reference source in addition to giving sound practical guidance. The authors identify a number of areas that have not been explored; for example, no references were identified for randomised trials with a multinomial outcome measure (Section 4.1.3). Statisticians who read this book could perhaps ask themselves which neglected areas they think deserve the highest priority. As regards phase IIb designs, I would like to see a three-outcome version of the randomised Simon (2001) design (Section 4.1.4) based on progression-free survival.

Roger A’Hern

Senior Statistician

Clinical Trials and Statistics Unit

Institute of Cancer Research

Sutton, United Kingdom

Preface

Phase II trials are a key element of the drug development process in cancer, representing a transition from initial evaluation in relatively small phase I studies, not only focused on safety but also increasingly incorporating translational studies, to definitive assessment of efficacy, often in large randomised phase III trials. Efficient design of these early phase trials is crucial to informed decision-making regarding the future of a drug’s development. There are a number of textbooks available that discuss statistical issues in early phase clinical trials. These cover pharmacokinetic and pharmacodynamic studies through to late phase II trials, and discuss issues around sample size calculation and methods of analysis. There are few, however, which focus specifically on phase II trials in cancer and the many elements involved in their design. Given the large number and variety of phase II trial designs, often conceptually innovative and involving multiple components, the purpose of this book is to provide practical guidance to researchers on appropriate phase II trial design in cancer.

This book provides clinical trial researchers with an overview of the steps involved in designing a phase II trial, from the initial discussions regarding the trial idea itself through to identification of an appropriate phase II design. It is written as an aid to facilitate ongoing interaction between clinicians and statisticians throughout the design process, enabling informed decision-making and providing insight as to how information provided by clinicians feeds into the statistical design of a trial. The book acts both as a comprehensive summary resource of traditional and novel phase II trial designs and as a step-by-step approach to identifying suitable designs.

We wanted to provide a practical and structured approach to identifying appropriate statistical designs for trial-specific design criteria, considering both academic and industry perspectives. A comprehensive library of available phase II trial designs is included, and practical examples of how to use the book as a resource to design phase II trials in cancer are given. We have purposely omitted methodological detail associated with statistical designs for phase II trials, as well as discussion of analysis, that can be found elsewhere, including in the references for each of the designs listed in the library of designs.

The book begins with an introduction to phase II trials in cancer and their role

within the drug development process. A structured thought process addressing the

key elements associated with identifying appropriate phase II trial designs is intro-

duced in Chapter 2, including therapeutic considerations, outcome measures and

xxii PREFACE

randomisation. Each of these elements is discussed in detail, describing the different

stages of the thought process around which the guidance is centred. The purpose of

this detailed information is to allow readers to narrow down the number of designs that

are relevant to their trial-specic design criteria. A comprehensive library of phase II

designs is presented in Chapters 3–7, categorised according to design criteria, and a

brief summary of each trial design available is included.

Chapters 8–12 outline a series of practical examples of designing phase II trials in

cancer, providing practical illustration from trial concept to using the library to select

an appropriate trial design. The examples give a avour of how one might apply the

process described within the book, highlighting that there is no ‘one size ts all’

approach to trial design and that there are often many design solutions available to

any one scenario. We hope the book will help researchers to shortlist their options

in order to select an appropriate design to their specic setting, acknowledging other

options that may be considered.

This book has been written predominantly by academic clinical trialists, involving

both clinicians and statisticians. Many of the issues and considerations described

from an academic point of view are, however, also relevant to trials sponsored by

the pharmaceutical industry. The nal chapter of this book describes the design of

phase II trials in cancer from the industry perspective. The commercial perspective

is described in detail, outlining the design processes for phase II trials according to

specic strategic goals. This highlights both the similarities and differences in the

approach to phase II trial design between academia and industry. In the academic

setting there may be more focus on the phase II trial itself and less on the overall

development programme of the drug, compared to industry where the trial is designed

as part of a programme-oriented clinical development plan.

The book is written for both clinicians and statisticians involved in the design of phase II trials in cancer. Although some elements are written primarily with statisticians in mind, the discussion around key concepts of phase II trial design, as well as the practical examples, is accessible to scientists and clinicians involved in clinical trial design. For those new to early phase trial design, the book provides an introduction to the concepts behind informed decision-making in phase II trials, offering a unique and practical learning tool. We hope that readers already familiar with phase II trial design will benefit from exposure to new, less familiar designs that provide alternatives to those they may have used previously. The book may also be used by postgraduate students enrolled on statistics courses that include a clinical trial or medical module, providing a useful learning tool with core information on phase II trial design.

We hope that readers will benefit from the step-by-step approach described, as well as from the library of designs presented, enabling informed decision-making throughout the design process and focused guidance on designs that fit researchers’ pre-specified criteria.

Finally, we would like to thank all our colleagues who have contributed to this book for their advice and support.

1 Introduction

Sarah Brown, Julia Brown, Walter Gregory and Chris Twelves

Traditionally, cancer drug development can be defined by four clinical testing phases (Figure 1.1):

∙ Phase I is the first clinical test of a new drug after pre-clinical laboratory studies and is designed to assess the safety, toxicity and pharmacology of differing doses of a new drug. Typically such studies involve a limited number of patients and ask the question ‘Is this drug safe?’
∙ Phase II studies are designed to answer the question ‘Is this drug active, and is it worthy of further large-scale study?’ They predominantly address the short-term activity of a new drug, as well as assessing further safety and toxicity. Typically sample sizes for phase II studies range from tens to low hundreds of patients.
∙ Phase III trials are often large-scale trials of hundreds, even thousands, of patients and are usually designed to formally evaluate whether a new drug is better than current treatments in terms of efficacy or toxicity. Here the focus is generally on long-term efficacy, with the aim of identifying practice-changing new drugs.
∙ Finally, phase IV studies are carried out once a drug is licensed or approved for a specific indication. Within the pharmaceutical industry setting, phase IV studies may be designed to collect long-term safety information; in the academic setting, phase IV trials may investigate the efficacy of a drug outside of its licensed indication.

A Practical Guide to Designing Phase II Trials in Oncology, First Edition.

Sarah R. Brown, Walter M. Gregory, Chris Twelves and Julia Brown.

© 2014 John Wiley & Sons, Ltd. Published 2014 by John Wiley & Sons, Ltd.

2 A PRACTICAL GUIDE TO DESIGNING PHASE II TRIALS IN ONCOLOGY

Figure 1.1 Four clinical phases of drug development. Phase I: determine dose and preliminary toxicity; sample size low tens. Phase II: establish intermediate activity; gain further toxicity information; sample size high tens to hundreds. Phase III: validate efficacy and obtain further toxicity information; sample size hundreds to thousands. Phase IV: post-marketing surveillance.

Presented in this way, drug development may appear to be a straight-line pathway, but this is often not the case in practice, with much more time and money invested in large phase III trials than in other stages of development. Likewise, the boundaries between the different stages of drug development are increasingly blurred. For example, many phase I trials treat an expanded cohort of patients at the recommended phase II dose, often at least in part to demonstrate proof of principle or seek evidence of activity.

In recent years a wide range of new ‘targeted’ cancer therapies have emerged with well-defined mechanisms of action directed at specific molecular pathways relevant to tumour growth, often anticipated to be used in combination with other standard treatments. This contrasts with cytotoxic chemotherapy, from which the traditional four phases of cancer drug development emerged. Nevertheless, phase II cancer trials retain their pivotal position between initial clinical testing and costly, time-consuming definitive efficacy studies.

The process from pre-clinical development to new drug approval typically takes up to 10 years and is estimated to cost hundreds of millions of dollars, although there is some uncertainty over the true costs (Collier 2009). Cytotoxic therapies, which lack a specific target and mechanism of action, often have a low therapeutic index and historically have high rates of failure during drug development due to lack of efficacy and/or toxicity (Walker and Newell 2009). Although attrition rates for targeted cancer therapies appear lower than those of cytotoxic drugs, more drugs progress to expensive late stages of development before being abandoned in cancer than in other therapeutic areas (DiMasi and Grabowski 2007). These worrying statistics have led to increased attention on clinical trial design, aiming to reduce the attrition rate and improve the efficiency of cancer drug development.

This book focuses on the high-risk transition between phase II and III clinical trials and provides a practical guide for researchers designing phase II clinical trials in cancer. There is a clear need for phase II trials that more accurately identify potentially effective therapies that should move rapidly to phase III trials; perhaps even more pressing is the need for earlier rejection of ineffective therapies before they enter phase III testing. On this basis we aim to provide researchers with a detailed background of the key elements associated with designing phase II trials in patients with cancer, a thought process for identifying appropriate statistical designs and a library of available phase II trial designs. The book is not intended to be prescriptive or didactic, but instead aims to facilitate and encourage an interactive approach between the clinical researcher and the statistician, leading to a more informed approach to designing phase II oncology trials.

1.1 The role of phase II trials in cancer

Phase II trials in cancer are primarily designed to assess the short-term activity of new treatments and the potential to move these treatments forward for evaluation of longer-term efficacy in large phase III studies. In this respect, the term ‘activity’ is used to describe the ability of an investigational treatment to produce an impact on a short-term or intermediate clinical outcome measure. We distinguish this from the term ‘efficacy’, which we use to describe the ability of an investigational treatment to produce a significant impact on a longer-term clinical outcome measure, such as overall survival, in a definitive phase III trial. Cancer phase II trials are therefore invariably conducted in the metastatic or neo-adjuvant settings, where measurable short-term assessments of activity are more easily obtained than in the adjuvant setting. We focus on phase II trials in cancer, where assessments of ‘activity’ are usually not immediate and cure is not achievable. Nevertheless, many of the statistical designs available for phase II cancer trials, and the concepts discussed, may be applied to other disease areas.

Phase II trials act as a screening tool to assess the potential efficacy of a new treatment. That broad description covers many different types of phase II trial, assessing not only traditional evidence of tumour response but also proof of concept of biological activity, selecting between potential doses for further development, choosing between potential treatments for subsequent phase III testing and demonstrating that the addition of a new agent to an established treatment appears to increase the activity of that treatment.

In 1982 Fleming stated that ‘Commonly the central objective of phase II clinical trials is the assessment of the antitumor “therapeutic efficacy” of a specific treatment regimen’ (Fleming 1982). More recently the objective of a phase II trial in an idealised pathway has been described as being to ‘establish clinical activity and to roughly estimate clinical response rate in patients’ (Machin and Campbell 2005). Others have gone a step further, claiming that ‘The objective of a phase II trial should not just be to demonstrate that a new therapy is active, but that it is sufficiently active to believe that it is likely to be successful in pivotal trials’ (Stone et al. 2007a). A common feature of these descriptions is that the aim of a phase II trial is not primarily to provide definitive evidence of treatment efficacy, as in a phase III study; rather, phase II trials aim to show that a treatment has sufficient activity to warrant further investigation.

The International Conference on Harmonisation (ICH) Guideline E8: General Considerations for Clinical Trials prefers to classify trials by study objectives rather than by specific trial phases, since trials of different phases may incorporate similar objectives (ICH Expert Working Group 1997). The objectives associated with phase II trials in the ICH guidance are predominantly to explore the use of the treatment for its targeted indication; to estimate or confirm dosage for subsequent studies; and to provide a basis for confirmatory study design, endpoints and methodologies.


Additionally, however, ICH notes that phase II studies may on some occasions incorporate human pharmacology (assessing tolerance; defining or describing pharmacokinetics/pharmacodynamics; exploring drug metabolism and interactions; assessing activity) or therapeutic confirmation (demonstrating/confirming efficacy; establishing a safety profile; providing an adequate basis for assessing the benefit/risk relationship for licensing; establishing a dose/response relationship).

These definitions have in common that oncology phase II trials act as an intermediate step between phase I testing on a limited number of patients to establish the safety of a new treatment and definitive phase III trials aiming to confirm the efficacy of a new treatment in a large number of patients. The specific aims of a phase II trial may, however, differ depending on the mechanism of action of the drug in question, the amount of information currently available on the drug and the setting in which it is being investigated (e.g. pharmaceutical industry vs. academia). Phase II trials can be broadly grouped into phase IIa and phase IIb trials. A phase IIa trial may be seen as seeking proof of concept, in the sense of assessing the activity of an investigational drug that has completed phase I development, or may investigate multiple doses of a drug to determine the dose–response relationship. Phase IIa trials may be considered learning trials and may be followed by a decision-making ‘go/no-go’ phase IIb trial to determine whether or not to proceed to phase III; phase IIb trials may include selection of a single treatment or dose from many and may include randomisation to a control arm.

Dose–response can be evaluated throughout the early stages of drug development, including phase II, but this book does not specifically address studies where this is the primary aim. Many designs are available to assess the dose–response relationship, perhaps the simplest and most common being the randomised parallel dose–response design incorporating a control arm and at least two differing dose levels. Cytotoxics are usually given at the highest feasible dose, but investigating dose–response relationships may be important with targeted agents that are not necessarily best given at the maximum possible dose. Such trials serve a number of objectives, including the confirmation of efficacy; the estimation of an appropriate dose; the identification of optimal strategies for individual dose adjustments; the investigation of the shape and location of the dose–response curve; and the determination of a maximal dose beyond which additional benefit would be unlikely to occur.

Considerations around choice of starting dose, study design and regulatory issues in obtaining dose–response information are provided in the ICH Guideline E4: Dose Response Information to Support Drug Registration (ICH Expert Working Group 1994). Such considerations are, however, outwith the remit of this book, which focuses on phase II trials designed to assess the activity of single-agent or combination therapies, or to select the most active of multiple therapies. We do, however, discuss phase II selection designs to identify the most active dose from a number of pre-specified doses, rather than specific issues around evaluating dose–response relationships.

There are often significant differences between trials conducted within the pharmaceutical industry and those conducted within academia. Such differences are predominantly associated with the approach to designing phase II trials within a portfolio of research, and with decision-making around the future development of a compound or drug. Consequently, the way in which clinical trials are designed, particularly in the early phase setting, will likely differ between the two environments. For example, in the academic setting, regardless of the specific aim of the phase II trial (e.g. proof of concept, go/no-go), decision criteria are pre-specified to correspond with the primary aim of the trial and form the criteria on which decision-making and conclusions of the trial are based. Within the pharmaceutical industry the same pre-defined study aims and objectives apply; however, decision-making may be complicated by additional factors external to the phase II trial itself, such as the presence of competitor compounds, patent life or company strategy.

There is inherent pressure within the pharmaceutical industry to achieve timely regulatory approval and a licensed indication for a new drug. This does not apply in the same way within the academic setting where, by the time a drug reaches phase II testing, it may have been through considerable testing within the pharmaceutical setting and perhaps already be licensed in alternative disease areas or in differing combinations or schedules. There are, however, initiatives to facilitate increased academic/pharmaceutical collaboration in the early stages of drug development. Thus, more academic phase II trials may be conducted using novel agents with only limited clinical data available, so thorough discussion of the aims and design of these trials becomes even more pertinent. A detailed insight into the industry approach to the design of phase II trials within a developing drug portfolio is provided in Chapter 13. By contrast, the remainder of this book, including terminology and practical examples of designing phase II trials, draws its focus from the academic setting.

1.2 The importance of appropriate phase II trial design

Design of phase II trials is a key aspect of the drug development process. Poor design may increase the probability of a false-positive phase II trial, resulting in unnecessary investment in an unsuccessful phase III trial, or of a false-negative phase II trial, resulting in the rejection of a potentially effective treatment. There is a pressing need for phase II trials to more accurately identify those cancer therapies that will ultimately be successful in phase III studies and to allow earlier rejection of ineffective therapies before costly and time-consuming phase III trials are undertaken.
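The false-positive and false-negative probabilities described above correspond to the type I error and to one minus the power of a design. As a minimal sketch of how these two error rates drive the sample size, assuming a binary response outcome, the code below searches for the smallest single-arm, one-stage design (treat n patients; consider the treatment worthy of further study if at least r respond) using exact binomial probabilities. The uninteresting response rate `p0 = 0.10`, the target rate `p1 = 0.30`, `alpha = 0.05` and `beta = 0.20` are illustrative assumptions, and the function names are hypothetical.

```python
from math import comb


def upper_tail(n: int, r: int, p: float) -> float:
    """P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))


def single_stage_design(p0: float, p1: float, alpha: float = 0.05,
                        beta: float = 0.20, n_max: int = 200):
    """Smallest one-stage design (n, r): declare activity if >= r of n respond.

    False-positive rate P(X >= r | p0) must be <= alpha.
    Power P(X >= r | p1) must be >= 1 - beta (false-negative rate <= beta).
    Returns (n, r, type I error, power), or None if no design up to n_max.
    """
    for n in range(1, n_max + 1):
        for r in range(0, n + 1):
            type1 = upper_tail(n, r, p0)
            if type1 <= alpha:
                # Smallest cut-off controlling the false-positive rate;
                # any larger r would only reduce the power further.
                power = upper_tail(n, r, p1)
                if power >= 1 - beta:
                    return n, r, type1, power
                break
    return None


n, r, type1, power = single_stage_design(p0=0.10, p1=0.30)
print(f"n={n}, r={r}, type I error={type1:.3f}, power={power:.3f}")
```

With these inputs the search returns n = 25 and r = 6 (type I error about 0.033, power about 0.81). Tightening either error rate, or moving p1 closer to p0, increases n; the brute-force search is chosen for transparency rather than speed.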

As the development of new cancer drugs moves further away from conventional cytotoxics and towards targeted therapies, the challenges and opportunities in phase II trial design are ever greater. The choice of phase II design involves not only statistical considerations, but also decisions regarding the aims of the trial, whether or not to include randomisation, the choice of endpoints and the size of treatment effects to be targeted. Each of these elements is critical to ensure that the phase II trial is designed and conducted efficiently and that its results may be used to make robust, informed decisions regarding future research.


Some researchers have suggested moving directly from phase I to phase III in the drug development process, on the basis that survival benefit in phase III trials may be observed in the absence of improved response rates, therefore rendering phase II irrelevant (Booth et al. 2008). The potential perils of this approach are demonstrated by the INTACT1 and INTACT2 trials of gefitinib in chemotherapy-naïve advanced non-small cell lung cancer (NSCLC) patients (Giaccone et al. 2004; Herbst et al. 2004). Phase I trials of gefitinib in combination with chemotherapy had shown acceptable tolerability, and gefitinib as monotherapy was active in phase II NSCLC trials; however, phase II trials of gefitinib in combination with chemotherapy were not performed. Subsequently, these two phase III NSCLC trials in over 2000 patients failed to show improved efficacy with the addition of gefitinib to cisplatin-based chemotherapy (Giaccone et al. 2004; Herbst et al. 2004). A conventional, single-arm NSCLC trial of gefitinib in combination with chemotherapy might have avoided the subsequent negative phase III trials. This experience highlights the importance of conducting appropriately designed, and potentially novel, phase II trials prior to embarking on large-scale phase III trials.

1.3 Current use of phase II designs

Several systematic reviews have considered current use of designs in published phase II trials in cancer (Lee and Feng 2005; Mariani and Marubini 2000; Perrone et al. 2003). Common approaches to trial design included single-arm studies with objective response as the primary efficacy endpoint, utilising Simon’s two-stage hypothesis testing methods (Simon 1989), and randomised trials based on single-arm designs embedded in a randomised setting (Lee and Feng 2005). All of the reviews highlighted a distinct lack of detail regarding an identifiable statistical design, and design characteristics, as a marked weakness of many published phase II studies, raising the possibility that low quality may bias study findings. Also striking is the consistent use of a limited number of the same phase II study designs, emphasising the need for better understanding of alternative statistical designs. A key recommendation from these reviews is better communication between statisticians and clinical trialists to increase the use of newer statistical designs. Likewise, the need for ‘the development of practical designs with good statistical properties and easily accessible computing tools with friendly user interface’ (Lee and Feng 2005) is recognised as essential so that statisticians can implement these new designs.
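To make Simon’s two-stage design more concrete, the sketch below computes the usual operating characteristics of a given two-stage design from exact binomial probabilities: the probability of early termination (PET), the expected sample size and the probability of declaring the treatment active. It evaluates a design rather than searching for one; Simon (1989) chooses the boundaries that minimise the expected sample size under the null response rate (the ‘optimal’ design) or the maximum sample size (the ‘minimax’ design) subject to error constraints. The boundaries used here (stop if at most 1 of 10 patients respond; declare activity if more than 5 of 29 respond) are, to our understanding, the commonly quoted optimal design for p0 = 0.10 versus p1 = 0.30 with alpha = 0.05 and power 0.80, and should be checked against the original paper before use. Function names are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def simon_operating_characteristics(r1: int, n1: int, r: int, n: int, p: float):
    """Operating characteristics of a two-stage design at true response rate p.

    Stage 1: treat n1 patients; stop for futility if <= r1 respond.
    Stage 2: treat n - n1 more; the treatment is declared worthy of further
    study only if the total number of responses exceeds r.
    Returns (probability of early termination, expected sample size,
    probability of declaring the treatment active).
    """
    n2 = n - n1
    pet = sum(binom_pmf(x1, n1, p) for x1 in range(r1 + 1))
    expected_n = n1 + (1 - pet) * n2
    p_active = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        # Need at least r - x1 + 1 responses among the n2 stage-2 patients.
        need = max(r - x1 + 1, 0)
        tail = sum(binom_pmf(x2, n2, p) for x2 in range(need, n2 + 1))
        p_active += binom_pmf(x1, n1, p) * tail
    return pet, expected_n, p_active


pet0, en0, alpha = simon_operating_characteristics(1, 10, 5, 29, p=0.10)
_, _, power = simon_operating_characteristics(1, 10, 5, 29, p=0.30)
print(f"PET(p0)={pet0:.2f}, E[N | p0]={en0:.1f}, alpha={alpha:.3f}, power={power:.3f}")
```

For this design the calculation gives a probability of early termination of about 0.74 and an expected sample size of about 15 patients when the true response rate is 0.10, compared with 29 patients if the trial runs to completion.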

In 2009 the Journal of Clinical Oncology (JCO) published an editorial making recommendations for the types of phase II trials that it would consider for publication (Cannistra 2009). The differing aims of phase II trials according to the nature of the treatment under investigation were identified, with discussion as to the likely priority given to each trial design. The specific categories and outcomes of phase II trials were:

∙ single-arm phase II studies that represent the first evidence of activity of a new drug class;
∙ phase II studies of novel agents that not only confirm a class effect, but also provide evidence of extraordinary and unanticipated activity compared to prior agents in the same class;
∙ phase II studies of an agent or regimen with prior promise (based on previous reports of clinical activity), but that are convincingly negative when studied more rigorously;
∙ phase II studies of a single agent or combination that convincingly demonstrate a new, serious and unanticipated toxicity signal, despite being a rational and potentially active regimen;
∙ phase II studies with biomarker correlates that validate mechanism of action, provide convincing insight into novel predictive markers or permit enrichment of patients most likely to benefit from a novel agent;
∙ randomised phase II studies such as randomised selection, randomised comparison and randomised discontinuation designs.

The consistent use of single-arm, two-stage, response-driven designs, as depicted in the systematic reviews described previously, would not optimally cover the majority of these trial scenarios. The categories listed above were intended to provide authors with guidance as to the types of phase II trials most relevant to informing the design of subsequent phase III trials. Such recommendations highlight the need for awareness of the many components contributing to the design of phase II trials and the importance of making informed decisions to achieve the objectives of a trial and ensure the results are robust and interpretable.

1.4 Identifying appropriate phase II trial designs

This book aims to provide guidance to both the clinical researcher and the statistician on each of the key elements of phase II trial design, enabling an understanding of how they inform the overall design process. Recommendations published by the Clinical Trial Design Task Force of the National Cancer Institute Investigational Drug Steering Committee (Seymour et al. 2010) and by the Methodology for the Development of Innovative Cancer Therapies (MDICT) Task Force (Booth et al. 2008) provide guidance on current best practice for individual aspects of early clinical trial design. General discussion of the choice of endpoints and use of randomisation is given for the differing settings of monotherapy and combination therapy trials (Seymour et al. 2010), as well as in the specific context of targeted therapies (Booth et al. 2008), and discussion on reporting of phase II trials is also provided. Neither set of recommendations, however, provides detailed guidance on the statistical design categories available for phase II trials. Here we aim to guide researchers in a step-by-step manner through the thought process associated with each element of phase II design, from initial trial concept to the identification of an appropriate statistical design. Through detailed discussion of each of the elements, we aim to provide researchers with a thorough understanding of the overall process and each of the stages involved, and therefore a more informed approach.

Central to this approach is an overall thought process, presented in detail in Chapter 2 and outlined briefly below. The approach consists of three stages, highlighting eight key elements associated with identifying an appropriate phase II trial design:

∙ Stage 1 – Trial questions:
  ◦ Therapeutic considerations
  ◦ Primary intention of trial
  ◦ Number of experimental treatment arms
  ◦ Primary outcome of interest
∙ Stage 2 – Design components:
  ◦ Outcome measure and distribution
  ◦ Randomisation
  ◦ Design category
∙ Stage 3 – Practicalities:
  ◦ Practical considerations

Each of these elements is discussed in detail in Chapter 2, and practical examples of using this approach to design phase II cancer trials are provided in Chapters 8–12. These elements were identified as being essential to the design of phase II trials in cancer through a comprehensive literature review of available statistical methodology for phase II trials (Brown et al. 2011). The thought process itself is iterative, such that information obtained during discussion of each element may feed into and inform later elements of the design. The starting point of any trial design should, however, be a discussion between the clinical researcher and the statistician that primarily concerns clinical factors relating to the specific treatment(s) under investigation (Stage 1). Continued interaction between the clinician and the statistician is essential throughout the design process.

Using the detail provided in Chapter 2, each of the elements is addressed in turn and iteratively. Decisions made throughout the process enable the statistician to narrow down the specific statistical designs appropriate to the pre-specified criteria. These statistical designs are provided in Chapters 3–7, a library resource of statistical designs introduced here. Each design is categorised to enable efficient navigation and identification of appropriate designs. Designs are laid out taking into account:

∙ The use of randomisation, including
  ◦ Single-arm designs, arranged by design category and outcome measure – Chapter 3
  ◦ Randomised designs, arranged by design category and outcome measure – Chapter 4
  ◦ Treatment selection designs, arranged by inclusion of a control arm, design category and outcome measure – Chapter 5
∙ The focus on both activity and toxicity, or toxicity alone, as the primary outcome of interest – Chapter 6
∙ The evaluation of treatment activity in targeted subgroups – Chapter 7

Within each of Chapters 3–5, where no literature has been identified for a specific combination of design category and outcome measure, this is highlighted within the relevant subsection. For example, no references were identified discussing single-arm trial designs specifically focused on continuous outcome measures; this subsection is therefore included to highlight this to the reader. For Chapters 6 and 7, only those specific combinations of design category and outcome measure for which references have been identified are listed, since there are generally fewer designs focused on activity and toxicity, and on targeted subgroups.

In the majority of cases there will be more than one statistical design that suits the pre-specified trial parameters determined via the thought process. In such cases, the final stage in the thought process, that of practical considerations, may allow a choice to be made between the alternatives. Alternatively, that choice may be based on previous experience or on assessment of various trial scenarios by mathematical modelling or simulation. Further detail on choosing between multiple designs is provided in Chapter 2.
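As an illustration of assessing trial scenarios by simulation, the sketch below uses Monte Carlo simulation to estimate, at an assumed true response rate, how often a two-stage design declares the treatment active and how many patients it treats on average. Running each candidate design under the same uninteresting and target response rates gives a like-for-like comparison. The two candidate designs, the response rates of 0.10 and 0.30, the number of simulated trials and the seed are all illustrative assumptions, and the function name is hypothetical. Exact calculations exist for simple designs such as these; simulation becomes more valuable for complex designs without closed-form properties.

```python
import random


def simulate_two_stage(r1, n1, r, n, p, n_trials=20000, seed=2014):
    """Monte Carlo operating characteristics of a two-stage design.

    Stop after stage 1 if <= r1 of n1 patients respond; otherwise treat
    n - n1 further patients and declare the treatment active if more than
    r of all n patients respond. Returns (proportion of simulated trials
    declaring activity, average number of patients treated).
    """
    rng = random.Random(seed)
    n_active = 0
    patients = 0
    for _ in range(n_trials):
        x1 = sum(rng.random() < p for _ in range(n1))  # stage-1 responders
        if x1 <= r1:
            patients += n1  # trial stopped early for futility
            continue
        x2 = sum(rng.random() < p for _ in range(n - n1))  # stage-2 responders
        patients += n
        if x1 + x2 > r:
            n_active += 1
    return n_active / n_trials, patients / n_trials


# Hypothetical candidate designs, each given as (r1, n1, r, n).
candidates = {"A": (1, 10, 5, 29), "B": (1, 15, 5, 25)}
for name, design in candidates.items():
    false_pos, avg_n0 = simulate_two_stage(*design, p=0.10)
    power, _ = simulate_two_stage(*design, p=0.30)
    print(f"design {name}: P(active | 0.10)={false_pos:.3f}, "
          f"P(active | 0.30)={power:.3f}, mean N under 0.10={avg_n0:.1f}")
```

A design with a larger first stage tends to stop less often under the uninteresting response rate but may need fewer patients in total; simulation makes such trade-offs visible before practical considerations settle the choice.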

1.5 Potential trial designs

The statistical designs summarised in Chapters 3–7 were identified from a comprehensive literature review of phase II statistical design methodology conducted in January 2008 and updated in January 2010 (Brown et al. 2011). Individual designs were specifically assessed to determine their ease of implementation. Designs were defined as not easy to implement if any of the following applied:

∙ the data required to enable implementation were not likely to be available;
∙ there was no sample size justification, rendering the design difficult or impossible to interpret;
∙ criteria were not specified for the study being positive or negative, as this makes the trial of little if any use in taking a new treatment forward;
∙ each patient needed to be assessed prior to the next patient being recruited, as this will usually be prohibitively restrictive in a phase II cancer trial; or
∙ the necessary statistical software was not detailed as being available and/or insufficient detail was provided to enable implementation.


While this assessment of ease of implementation is inherently subjective, these criteria reflect the practicalities of design implementation.

Applying the above criteria, those designs classed as being easy to implement are included in Chapters 3–7. This amounts to over 100 statistical designs, ranging from Gehan’s original two-stage design published in 1961 (Gehan 1961) to complex multi-arm, multi-stage designs of more recent years. Mariani and Marubini highlighted researchers’ preferences for single-arm, two-stage designs (Mariani and Marubini 2000); there is, however, a wealth of alternative designs available, ranging from adaptations of Simon’s original two-stage design that incorporate adjustments for over-/under-recruitment, to randomised trials with formal hypothesis testing between experimental and control arms. The intention of this book is to present researchers with the designs available to them for their specific trial, rather than to recommend one design over another. In doing so we incorporate the well-established designs of Gehan (1961), Fleming (1982) and Simon (1989), as well as bringing lesser-known designs to the attention of researchers, allowing the user to make informed choices regarding trial design. A brief overview of each design identified is presented; however, the technical detail of each design is omitted and may be further evaluated by consulting the complete references, as appropriate.
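Gehan’s two-stage design offers a simple worked example of how such designs are sized. Its first stage enrols enough patients that, if the true response rate were at least some target p1, observing no responses at all would be unlikely (probability at most beta, commonly 0.05); if no responses are seen, the drug is rejected. The sketch below covers only that first-stage calculation, under those assumptions; the second-stage size, which Gehan chose to give a specified precision for the estimated response rate, is not reproduced, and the function name is hypothetical.

```python
def gehan_stage1_size(p1: float, beta: float = 0.05) -> int:
    """Smallest n with (1 - p1) ** n <= beta: if the true response rate is
    p1, the chance of seeing zero responses in n patients is at most beta."""
    if not 0 < p1 < 1:
        raise ValueError("p1 must lie strictly between 0 and 1")
    n = 1
    while (1 - p1) ** n > beta:
        n += 1
    return n


for target in (0.10, 0.15, 0.20, 0.25):
    print(f"target response rate {target:.2f}: "
          f"stage 1 needs {gehan_stage1_size(target)} patients")
```

With beta = 0.05 this gives 29, 19, 14 and 11 patients for target response rates of 10%, 15%, 20% and 25%; the figure of 14 patients for a 20% target is the one most often associated with Gehan’s design. A simple loop is used rather than a logarithm formula to avoid rounding surprises at exact boundaries.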

With the continued development of targeted therapies in cancer, and a drive towards personalised medicine, the role of biomarkers within phase II trials is an important area for discussion. Where known biomarkers are available to identify selected patient populations most likely to benefit from an intervention, phase II trials may be designed as enrichment trials, whereby only biomarker-positive patients are included. In these cases, any of the statistical designs listed within Chapters 3–6 may be appropriate, focusing solely on the target population. Alternatively, when selected populations are less well validated, biomarker-stratified designs may be considered. Here both biomarker-positive and biomarker-negative subgroups are explored within a trial, ensuring adequate numbers of patients within each cohort to potentially detect differing treatment effect sizes. Such designs are listed within Chapter 7. A more detailed discussion of the incorporation of biomarkers within phase II trials in cancer is provided in Chapter 2. There have, however, been a number of recently published articles in this area that may not be included in the library of available statistical designs, since they post-date the updated systematic review on which the library is based. Where the incorporation of biomarkers is of particular relevance to a trial design, the researcher may use the thought process described within this book and should consider not only any appropriate designs identified in Chapters 3–7, but also additional, more recent designs specifically intended for trials incorporating biomarkers.

1.6 Using the guidance to design your trial

We present a thought process for the design of phase II trials in cancer, introduced briefly in Section 1.4, addressing the key elements associated with identifying an appropriate trial design; each of these elements is discussed in detail in Chapter 2.


The information in Chapter 2 will allow researchers to narrow down the number of appropriate designs for their trial and then navigate to the relevant designs in Chapters 3–7, where a brief summary of each trial design is provided. The statistical theory underpinning the designs detailed is published elsewhere (Mariani and Marubini 1996; Machin et al. 2008; Machin and Campbell 2005), as well as in the individual papers referenced.

This process is illustrated in Chapters 8–12 by a series of practical, real-life examples of designing phase II trials in cancer following the thought process and library of statistical designs. The examples are intended merely as pragmatic illustrations of how one might apply the process described within the book; they should not be taken as the sole solutions to trial design in the particular settings presented. It is acknowledged that there may be a number of appropriate designs available, and exploration of various possibilities is encouraged. Examples are presented in the setting of head and neck cancer, lung cancer, prostate cancer, myeloma and colorectal cancer. Each example gives differing trial design scenarios highlighting various common issues encountered when designing phase II trials in cancer. These examples demonstrate the types of discussions expected between statisticians and clinicians in order to extract the necessary information to design a phase II trial. They also provide practical advice on how a choice of design may be made when several designs fit the trial-specific requirements.

2

Key points for consideration

Sarah Brown, Julia Brown, Marc Buyse, Walter

Gregory, Mahesh Parmar and Chris Twelves

Designing a phase II trial requires ongoing discussion between the clinician, statistician and other members of the trial team, so that the design can evolve on the basis of information specific to each trial. Central to the approach of identifying an optimal phase II trial design is the thought process introduced in Chapter 1 and presented diagrammatically in Figure 2.1. The process provides an overview of the key stages and elements to consider during phase II trial design. Each of these elements should be worked through in turn, iteratively, because information derived at earlier stages feeds into design choices and decisions at later stages and into the consideration of alternative designs.

The thought process is made up of three stages:

∙ Stage 1 – Trial questions. This stage elicits information relating predominantly to the trial itself: the treatment under investigation, the primary intention of the trial, the number of arms and the primary outcome of interest.

∙ Stage 2 – Design components. The information from the first stage feeds into discussions of the design components (the outcome measure, randomisation or not, and the category of design), enabling attention to be focused on the specific statistical designs relevant to the trial.

∙ Stage 3 – Practicalities. Finally, practical considerations may inform which of a number of candidate trial designs is best suited to a particular situation.

A Practical Guide to Designing Phase II Trials in Oncology, First Edition.

Sarah R. Brown, Walter M. Gregory, Chris Twelves and Julia Brown.

© 2014 John Wiley & Sons, Ltd. Published 2014 by John Wiley & Sons, Ltd.

[Figure 2.1: flowchart. Stage 1 – trial questions: therapeutic considerations, primary intention of trial, number of experimental treatment arms, primary outcome of interest. Stage 2 – design components: outcome measure and distribution, randomisation, design category. Stage 3 – practicalities: practical considerations, operating characteristics.]

Figure 2.1 Thought process for identifying phase II trial designs.

14 A PRACTICAL GUIDE TO DESIGNING PHASE II TRIALS IN ONCOLOGY

This chapter works through each of the stages and components of Figure 2.1.

2.1 Stage 1 – Trial questions

2.1.1 Therapeutic considerations

The choice of trial design depends not only on statistical considerations but, more importantly, on clinical factors relating to the treatment(s) and/or disease under investigation. Discussion of these therapeutic considerations is essential to inform decisions made later in the thought process. At the first meeting between the clinician and statistician, discussion of the following points will provide an overview of the setting of the trial and the specific therapeutic issues to be incorporated into the trial design.

2.1.1.1 Mechanism of action

An important question to ask when beginning the trial design process is ‘how does this treatment work?’ The term ‘cytotoxic’ may be used to describe chemotherapeutic agents, for which tumour shrinkage or response is widely accepted as reflecting anti-cancer activity. Many new cancer therapies are, however, targeted at specific molecular pathways relevant to tumour growth, apoptosis (programmed cell death) or angiogenesis (new blood vessel formation). Such ‘targeted therapies’, including tyrosine kinase inhibitors, monoclonal antibody therapies and immunotherapeutic agents, may be ‘cytostatic’. For these agents a change in tumour volume may not be the expected outcome; tumour stabilisation or a delay in tumour progression may be more likely.

The mechanism of action of the agent under investigation will inform many

subsequent decisions, including the choice of outcome measure and whether or not

the trial should be randomised.

2.1.1.2 Aim of treatment

The aim of the treatment under investigation should be considered in the context of both its mechanism of action and the specific population of patients in whom the treatment is being considered.

It is important to consider the ultimate aim of treatment, which would inform the outcome measures in future phase III studies, and how this relates to shorter term aims that can be incorporated into phase II trials. For example, in a population of patients with a relatively long median progression-free survival (PFS) and overall survival (OS), the aim of a phase III trial may be to further prolong PFS and/or OS. These would, however, be unrealistic short-term outcomes for a phase II trial; tumour shrinkage (response), which may reflect PFS or OS, can instead be an appropriate phase II aim. By contrast, where the prognosis is poorer, PFS may provide a realistic short-term outcome in phase II.

KEY POINTS FOR CONSIDERATION 15

It is essential to consider how the longer term and shorter term aims of treatment are related, to ensure that the intermediate outcome measure chosen in phase II provides a robust indication of potential efficacy in subsequent phase III trials.

2.1.1.3 Single or combination therapy

It is important to ascertain whether the treatment under investigation will be given

as a single agent or in combination with another novel or established intervention.

This distinction can inform the decision as to whether or not randomisation should

be incorporated. Where an investigational agent, be it a conventional cytotoxic or a targeted agent, is used in combination with another active treatment, it can be very difficult to distinguish the effect of the investigational agent from that of the standard partner therapy; incorporating randomisation makes this distinction easier (see Section 2.2.2 for further discussion).

The assessment of toxicity for combination treatments should also be addressed. Where the addition of an investigational therapy is expected to increase both activity and toxicity to a potentially significant degree, dual primary endpoints may be considered to assess the ‘trade-off’ between greater activity and increased toxicity (see Section 2.1.4 for further discussion).

2.1.1.4 Biomarker dependent

Biomarkers are an increasingly important part of clinical trials. They can be defined as ‘a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention’ (Atkinson et al. 2001).

Biomarkers may be considered in the design of phase II trials in two ways.

First, a biomarker may serve as an outcome measure. The biomarker may be an intermediate (primary) endpoint in a phase II trial provided it reflects the activity of a treatment and is associated with efficacy; this may form the basis for a stop/go decision regarding a subsequent phase III trial. Decisions about using a biomarker as the primary outcome measure will feed into the decision on randomisation, taking into account whether any historical data exist for the biomarker with the standard treatment and how reliable such data are. Where a change in a biomarker reflects the biological activity of an agent but is not predictive of the natural history of the disease, it may still be an appropriate endpoint on its own for a proof of concept phase II trial; in such cases a second, go/no-go phase IIb trial may be required to assess the impact of the treatment on the cancer before deciding whether to proceed to a phase III trial.

The use of biomarkers as outcome measures is discussed further in Section 2.2.1.

Second, in the era of targeted therapies, a molecular characteristic of the tumour that is relevant to the mechanism of action of the treatment under investigation may serve as a biomarker to define a specific subgroup of patients in whom an intervention is anticipated to be effective. This has been done especially successfully in studies of small molecules and monoclonal antibodies targeting HER-2 and related cell

surface receptors (Piccart-Gebhart et al. 2005; Slamon et al. 2001). The potential


for a biomarker to identify a subpopulation of patients may, however, only become apparent after phase III investigation, as in the case of the monoclonal antibody cetuximab in colon cancer, where efficacy is limited to patients without a mutation in the KRAS oncogene (Bokemeyer et al. 2009; Tol et al. 2009; Van Cutsem et al. 2009).

Where available, using a biomarker to enrich the population of a phase II trial in this way can increase the likelihood of identifying anti-tumour activity, and thus speed up drug development. By definition, when a biomarker is used for population enrichment, the resulting phase II population is not representative of the general population. Interpreting outcomes in the enriched population may, therefore, be more challenging because historical control data may be unreliable; randomisation incorporating a control arm should be considered in such situations.

There are, however, potential risks in over-reliance on biomarkers in phase II trials. If the mode of action of a novel therapy has been incorrectly characterised, the biomarker chosen for enrichment may be inappropriate and could lead to a false-negative phase II trial because the wrong patient population has been treated. Likewise, if a biomarker used to demonstrate proof of principle of biological activity does not accurately reflect the clinically relevant mode of action, the outcome of a phase II trial may be misleading. When a biomarker is the primary endpoint of a trial or is used to enrich the patient population, it is vital that the biomarker be adequately validated. Where there is insufficient evidence that a biomarker reliably reflects biological activity or identifies an optimal patient group, measuring the biomarker in an unselected phase II trial population may be appropriate as a hypothesis-generating exercise for future studies.

Approaches to trial design that incorporate biomarker stratification are discussed further in Section 2.2.3.

2.1.2 Primary intention of trial

In this context, we define the ‘intention’ of a trial not as the specific research question but, in a wider sense, as a means of classifying trials into two categories:

∙ proof of concept, be that biological or therapeutic, or phase IIa;

∙ go/no-go decision for further evaluation in a phase III trial, or phase IIb.

A proof of concept, or phase IIa, trial may be undertaken after completing a phase I trial to screen the investigational treatment for initial evidence of activity. This may then be followed by a go/no-go phase IIb trial to determine whether a phase III trial is justified. Running two sequential phase II trials may, in some cases, be inefficient. The Clinical Trial Design Task Force of the National Cancer Institute Investigational Drug Steering Committee proposed that, where appropriate, proof of concept may be embedded in a single go/no-go trial (Seymour et al. 2010).

A model that is increasingly relevant to the development of targeted anti-

cancer agents is to incorporate proof of concept translational imaging and/or

molecular/biomarker studies within the expanded cohort of patients treated at the

recommended phase II dose in a phase I trial. Where clear proof of concept can


be demonstrated in this way, there is a blurring of the conventional divide between

phase I and IIa studies but the need remains for a subsequent phase IIb trial with the

intention of making a formal decision regarding further evaluation in a phase III trial.

While this specific point for consideration is not used to group the trial designs given in Chapters 3–7, it is important when considering issues such as primary outcome measures and the use of randomisation. Where a trial is designed solely as a proof of concept study, it may be appropriate to conduct a single-arm trial to estimate the potential activity of a treatment to an acceptable degree of precision. Short-term clinical or biomarker outcomes may be appropriate to give a preliminary assessment of activity before embarking on a larger scale phase IIb study. Where the aim of the phase II trial is to determine whether or not to continue evaluating a treatment in a large-scale phase III trial, the ability to make formal comparisons between experimental and standard treatments may be more appropriate, giving greater confidence in the decision whether or not to proceed. Similarly, in phase IIb trials, outcome measures known to be strongly associated with the primary phase III outcome measure are desirable for robust decision-making. Further discussion on the choice of outcome measures and the use of randomisation is given in Sections 2.2.1 and 2.2.2, respectively.
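As a rough illustration of sizing a single-arm proof of concept trial by precision rather than by a formal hypothesis test, the Python sketch below uses the normal (Wald) approximation n = z² p(1 − p)/d², where p is the anticipated response rate and d the desired half-width of the confidence interval. The function name and the example values are hypothetical, and for the small samples typical of phase II an exact binomial interval would usually be preferred.

```python
import math
from statistics import NormalDist


def n_for_precision(p_expected: float, half_width: float, confidence: float = 0.95) -> int:
    """Approximate number of patients so that a Wald confidence interval for a
    response rate has (roughly) the requested half-width.

    Hypothetical helper for illustration; normal approximation only.
    """
    if not 0 < p_expected < 1 or half_width <= 0:
        raise ValueError("need 0 < p_expected < 1 and half_width > 0")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil(z ** 2 * p_expected * (1 - p_expected) / half_width ** 2)


# Anticipated response rate of 30%, estimated to within +/- 15 or +/- 10 points
print(n_for_precision(0.30, 0.15))  # 36
print(n_for_precision(0.30, 0.10))  # 81
```

Halving the half-width roughly quadruples the sample size, which is why precision targets for phase IIa estimates are usually modest.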

2.1.3 Number of experimental treatment arms

Whereas historically phase II cancer trials invariably had a single arm, an increasing

number now comprise multiple arms, one of which is often a ‘control’ standard

treatment arm. The most common randomised phase II cancer trial designs have a

single experimental arm with a control arm so the activity seen in the experimental

arm can be compared formally or informally with that seen in the control arm.

Randomisation may be appropriate where historical data on the outcome measure are

unreliable or when a novel agent is being added to an effective standard therapy (see

Section 2.2.2 for discussion).

Where multiple experimental treatments are available, or a single treatment that

may be effective using different doses or schedules, a phase II trial may be designed to

select which, if any, of these options should be taken forward for phase III evaluation.

Randomisation can also be used to evaluate multiple treatment strategies, such as the sequence of first- and second-line treatments. In these settings, the activity of each individual novel treatment can be assessed against pre-specified minimal levels of activity using treatment selection designs, which are described in Chapter 5.

Where multiple treatments are being investigated in a single phase II trial,

with each single treatment in a different subgroup of patients (e.g. treatment A

in biomarker-X-positive patients, and treatment B in biomarker-X-negative patients),

this should not be considered a treatment selection trial, since only one experimental treatment is being investigated within each subgroup. For the purposes of

trial design, such trials fall under the ‘single experimental arm’ category. Further

discussion regarding trials of subgroups of patients is provided in Section 2.2.3.


2.1.4 Primary outcome of interest

The primary outcome of interest will depend on the existing evidence base and/or

stage of development of the treatment under investigation, its mechanism of action

and potential toxicity. Thus, information obtained from discussion of the therapeutic

considerations of the treatment is important in deciding the primary focus of the trial,

as well as incorporating data from previous studies of the same, or similar, treatments.

At this stage, for the purpose of categorising trial designs, the primary outcome of

interest is categorised as being either activity alone, or both activity and toxicity.

Designs are also available that address a third option, of considering toxicity alone as

the primary outcome measure in a phase II trial. These designs are incorporated with

those assessing both activity and toxicity and are described in Chapter 6. Discussion of the specific primary clinical outcome measure is given in Section 2.2.1.

2.1.4.1 Activity

Where the toxicity of the investigational treatment is believed to be modest in the

context of phase II decision-making or the toxicity of agents in the same class is

well known, the primary phase II trial outcome measure will usually be anti-tumour

activity, with toxicity included amongst the secondary outcome measures.

2.1.4.2 Activity and toxicity (or toxicity alone)

If the toxicity profile of the investigational treatment, be it a single-agent or combination therapy, is of particular concern, the activity and toxicity of the treatment may be considered as joint primary outcome measures, such that the investigational treatment must show both promising activity and an acceptable level of toxicity to warrant further evaluation. Such designs allow incorporation of trade-offs between pre-specified levels of increased activity and increased toxicity, to determine whether a new treatment is acceptable for further evaluation in a phase III trial.
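The joint requirement described above can be sketched as a simple decision rule on observed proportions. This is only an illustration under assumed, hypothetical thresholds (a minimum response rate of 30% and a maximum toxicity rate of 25%); unlike the designs described in Chapter 6, it applies no formal test and no trade-off between the two criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    min_response_rate: float  # hypothetical minimum acceptable activity
    max_toxicity_rate: float  # hypothetical maximum acceptable toxicity


def go_decision(responses: int, toxicities: int, n: int, t: Thresholds) -> bool:
    """Recommend further evaluation only if BOTH criteria are met:
    observed activity is high enough and observed toxicity is low enough."""
    response_rate = responses / n
    toxicity_rate = toxicities / n
    return response_rate >= t.min_response_rate and toxicity_rate <= t.max_toxicity_rate


rule = Thresholds(min_response_rate=0.30, max_toxicity_rate=0.25)
print(go_decision(responses=12, toxicities=8, n=40, t=rule))   # True: 30% activity, 20% toxicity
print(go_decision(responses=16, toxicities=14, n=40, t=rule))  # False: 40% activity, but 35% toxicity
```

The second call shows the point of joint endpoints: a treatment with the higher response rate is still rejected because its toxicity exceeds the acceptable level.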

2.2 Stage 2 – Design components

2.2.1 Outcome measure and distribution

Emerging cancer treatments have many differing modes of action, which should be reflected in the choice of outcome measures used to assess their activity. While tumour response according to Response Evaluation Criteria in Solid Tumours (RECIST) (Eisenhauer et al. 2009) has historically been the most widely used primary outcome measure, non-binary definitions or volumetric measures of response, measures of time to an event such as disease progression, or continuous markers such as biomarkers may be more relevant when evaluating the activity of targeted or cytostatic agents (Adjei et al. 2009; Booth et al. 2008; Dhani et al. 2009; Karrison et al. 2007; McShane et al. 2009).

When choosing between the many possible primary outcome measures for a

phase II trial the key points to consider include the expected mechanism of action of


the intervention under evaluation, the aim of treatment in the current population of

patients, whether there are any biomarker outcome measures available, the stage of

assessment in the drug development pathway (i.e. phase IIa or IIb) and the strength

of the association between the proposed phase II outcome measure and the primary

outcome measure that would be used in future phase III trials. The chosen outcome

measure should also be robust with respect to external factors such as investigator

bias and patient and/or data availability.

The primary outcome measure of a phase II trial should be chosen on the basis that, if a treatment effect is observed, this provides sufficient evidence that a treatment effect on the phase III primary outcome is likely to be seen. The use of surrogate endpoints has been investigated in a number of disease areas, including breast (Burzykowski et al. 2008), colorectal (Piedbois and Buyse 2008) and head and neck cancer (Michiels et al. 2009). While the outcome measures used in phase II trials do not need to fulfil formal surrogacy criteria (Buyse et al. 2000), evidence of correlation between the phase II and phase III outcome measures is important to ensure reliable decision-making at the end of a phase II trial.

The choice of primary outcome measure for a phase II trial determines the distribution of the outcome. This section outlines the various options used to categorise phase II trial designs within Chapters 3–7 according to the distribution of the chosen primary outcome measure (as described in Chapter 1).

2.2.1.1 Binary

Response is usually evaluated via a continuous outcome measure, that is, the percentage change in tumour size. This is, however, typically dichotomised as ‘response’ versus ‘no response’ following RECIST criteria (Eisenhauer et al. 2009). Such binary outcomes, categorised as ‘yes’ or ‘no’, may be used for any measure that can be reduced to a dichotomous outcome, including toxicity or change in a biomarker. Other outcome measures that may be expressed as continuous, such as time to disease progression, are frequently dichotomised to reflect an event rate, such as progression at a fixed time point.
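To illustrate how a continuous measurement is reduced to a binary outcome, the sketch below classifies the sum of target-lesion diameters using the RECIST 1.1 thresholds for target lesions (disappearance for complete response; a decrease of at least 30% from baseline for partial response; an increase of at least 20%, and of at least 5 mm, over the smallest sum recorded for progression) and then dichotomises the category. It is deliberately simplified: non-target lesions, new lesions, lymph-node rules and confirmation of response are ignored, and the function names are hypothetical.

```python
def target_lesion_category(baseline_mm: float, nadir_mm: float, current_mm: float) -> str:
    """Simplified RECIST 1.1-style category from sums of target-lesion diameters.

    nadir_mm is the smallest sum recorded so far (including baseline).
    Ignores non-target lesions, new lesions, lymph-node rules and confirmation.
    """
    if current_mm == 0:
        return "CR"
    increase = current_mm - nadir_mm
    if increase >= 0.20 * nadir_mm and increase >= 5:
        return "PD"
    if baseline_mm - current_mm >= 0.30 * baseline_mm:
        return "PR"
    return "SD"


def is_responder(category: str) -> bool:
    """Binary outcome: 'response' (CR or PR) versus 'no response'."""
    return category in ("CR", "PR")


# Hypothetical patients: (baseline, nadir, current) sums of diameters in mm
patients = [(100, 100, 65), (100, 60, 75), (100, 90, 95), (50, 50, 0)]
categories = [target_lesion_category(*p) for p in patients]
print(categories)  # ['PR', 'PD', 'SD', 'CR']
responders = sum(is_responder(c) for c in categories)
print(f"response rate: {responders}/{len(categories)}")  # response rate: 2/4
```

Note how much information the dichotomy discards: the second patient is still 25% below baseline but is classed as progressing, and stable disease and progression both become ‘no response’.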

In phase II studies of cytotoxic chemotherapy the biological rationale for response

as an indicator of anti-cancer activity is based in part on the natural history of

untreated cancers which grow, spread and ultimately cause death. Administration

of each cycle or dose of treatment kills a substantial proportion of tumour cells

(Norton and Simon 1977) and as such is linked to delaying death (Norton 2001).

These principles may be applicable to chemotherapeutic agents which target tumour

cell kill, and therefore the endpoint of response may be a relevant indicator of anti-

tumour activity.

There are inherent issues in the assessment of tumour response, associated with investigator bias, inter-observer reliability and variation in observed response rates across trials (Therasse 2002). These may, to some degree, be alleviated by incorporating independent central review of response assessments or, when historical response data are unreliable, a randomised control arm.

The use of classical response criteria for trials of drugs that may not cause tumour


shrinkage is likely to be inappropriate and raises questions over the design of phase II trials and the endpoints being used (Twombly 2006). Measures of time to an event such as disease progression, or novel endpoints such as biomarkers, may be more relevant when evaluating the activity of newer targeted therapies. Nevertheless, because most targeted or biological therapies are selected for clinical development on the basis of pre-clinical data demonstrating at least some degree of tumour regression, tumour response may remain an appropriate outcome measure for novel agents, as acknowledged by two Task Force publications (Booth et al. 2008; Seymour et al. 2010).

2.2.1.2 Continuous

Continuous outcome measures such as tumour volume or biomarker response may

be appropriate and relevant outcome measures for consideration in studies of novel

agents (Adjei et al. 2009; Karrison et al. 2007; McShane et al. 2009). The use of

biomarkers in clinical trials is becoming increasingly common in the development

of targeted treatments with novel mechanisms of action. Only when a biomarker has

been validated as an outcome measure of activity, that is, when a clear relationship

has been established with a more conventional clinically relevant outcome measure,

should a biomarker be used as the primary outcome measure of a phase II trial.

The difficulties in identifying validated biomarkers have been highlighted (McShane et al. 2009), in addition to the need for technical validation and quality assurance of the relevant assays. As discussed above, biomarkers may be dichotomised to produce a binary outcome; statistical designs can, however, incorporate biomarkers as a continuous outcome, which can often lead to a more efficient trial design.
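The efficiency gain from analysing a biomarker on its continuous scale can be illustrated with standard normal-approximation sample size formulae for a randomised comparison. The sketch below assumes a normally distributed marker with a common standard deviation, a shift of 0.5 standard deviations, two-sided α = 0.05 and 80% power (all hypothetical choices), and compares the number of patients per arm for a comparison of means with the number needed when the same marker is dichotomised at the control-arm median.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def n_per_arm_continuous(effect_sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm n for comparing two means; effect expressed in SD units."""
    z_a = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_b = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 / effect_sd ** 2)


def n_per_arm_dichotomised(effect_sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm n when the marker is split at the control median, so the
    'response' probability is 0.5 in the control arm and Phi(effect) in the
    experimental arm; compared with a two-proportion z-test."""
    p0, p1 = 0.5, STD_NORMAL.cdf(effect_sd)
    p_bar = (p0 + p1) / 2
    z_a = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_b = STD_NORMAL.inv_cdf(power)
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


print(n_per_arm_continuous(0.5))    # 63
print(n_per_arm_dichotomised(0.5))  # 102
```

Under these assumptions dichotomising requires roughly 60% more patients for the same power; the exact penalty depends on the cut-point and the distribution of the marker.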

2.2.1.3 Multinomial

Multinomial outcome measures may offer an alternative to binary outcomes when

multiple levels of a clinical outcome may be of importance. For targeted or cytostatic

therapies, an alternative to binary tumour response (i.e. response vs. no response) that

remains objective may be the ordered categories of tumour response such as complete

response plus partial response versus stable disease versus progressive disease (Booth

et al. 2008; Dhani et al. 2009). Alternatively, activity of an experimental therapy may be evidenced by either a sufficiently high response rate or a sufficiently low early progressive disease rate (Sun et al. 2009).
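A multinomial summary keeps the ordered categories separate rather than collapsing them. The sketch below tabulates hypothetical best-response categories and applies an ‘either/or’ rule of the kind described above, judging a treatment promising if the response rate is high enough or the early progressive disease rate is low enough. The thresholds are hypothetical and the rule compares observed proportions only; it is not the formal design of Sun et al. (2009).

```python
from collections import Counter


def multinomial_summary(categories: list[str]) -> dict[str, float]:
    """Proportions in the ordered categories CR+PR vs SD vs PD."""
    counts = Counter(categories)  # missing categories count as 0
    n = len(categories)
    return {
        "response": (counts["CR"] + counts["PR"]) / n,
        "stable": counts["SD"] / n,
        "early_pd": counts["PD"] / n,
    }


def promising(summary: dict[str, float], min_response: float, max_early_pd: float) -> bool:
    """Activity shown by EITHER a high response rate OR a low early-PD rate."""
    return summary["response"] >= min_response or summary["early_pd"] <= max_early_pd


# Hypothetical trial of 20 patients: 2 PR, 12 SD, 6 PD
observed = ["PR"] * 2 + ["SD"] * 12 + ["PD"] * 6
summary = multinomial_summary(observed)
print(summary)  # {'response': 0.1, 'stable': 0.6, 'early_pd': 0.3}
print(promising(summary, min_response=0.20, max_early_pd=0.35))  # True, via the low early-PD rate
```

A purely binary response endpoint would have judged this cytostatic-looking profile (few responses, mostly stable disease) unpromising.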

2.2.1.4 Time to event

Time to progression (TTP), time-to-treatment failure (TTF) or PFS may be considered

as appropriate outcome measures to assess the activity of treatments in phase II clinical

trials (Pazdur 2008).

∙ TTP may be defined as the time from registration or randomisation into a clinical trial to the time of progressive disease;


∙ TTF may be defined as the time from registration/randomisation to treatment discontinuation for any reason, including disease progression, treatment toxicity, patient preference or death;

∙ PFS may be defined as the time from registration/randomisation to objective tumour progression or death.
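The three definitions differ only in which events stop the clock, which the sketch below makes explicit by deriving (time, event) pairs from dates. The record fields and the censoring convention (censoring at the last date the patient was known to be event-free) are assumptions for illustration; a trial protocol would need to specify these rules precisely.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PatientDates:
    registered: date
    last_followup: date                  # last date known to be event-free
    progression: Optional[date] = None
    death: Optional[date] = None
    discontinued: Optional[date] = None  # treatment stopped for any reason


def _days(start: date, end: date) -> int:
    return (end - start).days


def ttp(p: PatientDates) -> tuple[int, bool]:
    """Progression is the only event; death without progression is censored."""
    if p.progression is not None:
        return _days(p.registered, p.progression), True
    return _days(p.registered, p.last_followup), False


def pfs(p: PatientDates) -> tuple[int, bool]:
    """Progression or death, whichever comes first."""
    events = [d for d in (p.progression, p.death) if d is not None]
    if events:
        return _days(p.registered, min(events)), True
    return _days(p.registered, p.last_followup), False


def ttf(p: PatientDates) -> tuple[int, bool]:
    """Treatment failure for any reason (discontinuation, progression or death)."""
    events = [d for d in (p.discontinued, p.progression, p.death) if d is not None]
    if events:
        return _days(p.registered, min(events)), True
    return _days(p.registered, p.last_followup), False


# Hypothetical patient who stopped treatment and then died without documented progression
patient = PatientDates(registered=date(2024, 1, 1), last_followup=date(2024, 2, 1),
                       death=date(2024, 2, 15), discontinued=date(2024, 2, 10))
print(ttp(patient))  # (31, False)
print(pfs(patient))  # (45, True)
print(ttf(patient))  # (40, True)
```

The same patient contributes a censored observation to TTP but an event to PFS and TTF, which is one reason the three endpoints can give different pictures of activity.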

The use of these endpoints has increased in recent years as a means of assessing the activity of targeted or cytostatic treatments, including cancer vaccines. While TTP and PFS may better capture the activity of such agents, they present their own challenges. Trials incorporating TTP or PFS as the primary outcome measure may be constrained by a lack of accurate historical time-to-event population data with which to make comparisons. This limitation may be overcome by randomised, comparative designs, but these inherently require larger sample sizes. Irrespective of randomisation, TTP and PFS may be influenced by assessment bias arising from the frequency of assessment, highlighting the need to consider the schedule of follow-up assessments carefully; to avoid such biases, assessments are increasingly recommended at fixed time points rather than in relation to the number of cycles of treatment received.
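Why the assessment schedule matters can be seen in a small simulation. Progression is detected only at the first scheduled assessment on or after it actually occurs, so less frequent assessment pushes observed progression times later even when the underlying disease course is identical. The sketch assumes exponentially distributed true progression times with a median of 27 weeks, no censoring, and assessments every 6 or every 12 weeks; all values are hypothetical.

```python
import math
import random
import statistics


def detected_week(true_week: float, interval_weeks: int) -> int:
    """Week of the first scheduled assessment on or after true progression."""
    return math.ceil(true_week / interval_weeks) * interval_weeks


def simulate_medians(interval_weeks: int, n: int = 4000, true_median: float = 27.0,
                     seed: int = 1) -> tuple[float, float]:
    """Return (true median, observed median) progression time in weeks.

    The same seed gives the same true progression times for every schedule,
    so any difference in observed medians comes from the schedule alone."""
    rng = random.Random(seed)
    rate = math.log(2) / true_median  # exponential rate with the given median
    true_times = [rng.expovariate(rate) for _ in range(n)]
    observed = [detected_week(t, interval_weeks) for t in true_times]
    return statistics.median(true_times), statistics.median(observed)


for interval in (6, 12):
    true_med, obs_med = simulate_medians(interval)
    print(f"every {interval:2d} weeks: true median {true_med:.1f}, observed median {obs_med:.1f}")
```

With identical true progression times, the schedule with 12-weekly assessments shows a longer observed median. If the two arms of a trial were assessed on different schedules (for example, tied to treatment cycles of different lengths), this artefact alone could masquerade as a treatment effect.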

Additional time-to-event outcome measures may also be considered, for example time to developing a serious adverse event (SAE) in trials primarily concerned with safety assessment, or time to a clinical event such as bone fracture in trials of drugs acting specifically against bone metastases.

2.2.1.5 Ratio of times to progression

One way to overcome the limitations of TTP and PFS as outcome measures with

regard to the challenges of unreliable historical data, and to avoid the need for

additional patient numbers in a randomised study, may be to use each patient as their

own control. The ratio of times to progression or ‘growth modulation index’ has been

proposed for trials in patients who have had at least one previous line of treatment

(Mick et al. 2000; Von Hoff 1998).

The growth modulation index (GMI) represents the ratio of the TTP on the current

investigational treatment relative to that on the previous line of ‘standard’ treatment,

that is, sequentially measured paired failure times for each patient. Although originally proposed in the 1990s, this outcome measure may be considered exploratory: it has not been widely used in phase II trials to date, and it relies on TTP data from the previous line of treatment, the accuracy of which may be uncertain because that treatment will usually have been administered outside a clinical trial, where assessments are less structured.

A GMI of 1.33 has been proposed as clinically relevant, but this threshold has not

been validated (Von Hoff 1998). Time-to-event ratios may, however, be worthy of

consideration as a phase II outcome measure where randomisation is not appropriate.
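The growth modulation index for each patient is simply a ratio of paired times, and a trial summary might report the proportion of patients whose GMI reaches the proposed (but unvalidated) 1.33 threshold. The sketch below uses hypothetical paired TTPs in months, treats ‘reaching’ the threshold as GMI ≥ 1.33, and ignores censoring of the current-line TTP, which a real analysis would need to handle.

```python
def growth_modulation_index(ttp_current: float, ttp_previous: float) -> float:
    """GMI = TTP on the investigational treatment / TTP on the previous line."""
    if ttp_previous <= 0:
        raise ValueError("previous-line TTP must be positive")
    return ttp_current / ttp_previous


def proportion_reaching(gmis: list[float], threshold: float = 1.33) -> float:
    """Proportion of patients whose GMI is at least the threshold."""
    return sum(g >= threshold for g in gmis) / len(gmis)


# Hypothetical (current TTP, previous-line TTP) pairs, in months
pairs = [(8, 5), (4, 6), (9, 6), (3, 3), (10, 7)]
gmis = [growth_modulation_index(cur, prev) for cur, prev in pairs]
print([round(g, 2) for g in gmis])  # [1.6, 0.67, 1.5, 1.0, 1.43]
print(proportion_reaching(gmis))    # 0.6
```

Because each patient acts as their own control, no historical or randomised comparator is needed, but the result is only as reliable as the previous-line TTP data.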

2.2.2 Randomisation

The use of randomisation in phase II trials is widely debated (Buyse 2000; Redman

and Crowley 2007; Yothers et al. 2006). Randomisation protects against selection bias,


balances treatment groups for prognostic factors and contributes towards ensuring a

valid comparison of the treatments under investigation, such that any treatment effect

observed can reasonably be attributed to the treatment under investigation and not

external confounding factors.

Although randomised phase III clinical trials provide the mainstay of evidence-based clinical research, the use of randomisation within phase II is not so straightforward. Those opposed to randomisation in phase II trials argue that it can be unacceptably restrictive from a resource perspective, as it inevitably requires at least twice the number of patients (assuming 1:1 randomisation), increasing both the cost and duration of the trial (Yothers et al. 2006). A further criticism is that where the main purpose of randomisation is to balance potential prognostic factors (of which there may be many), this is unlikely to be achieved in randomised phase II trials, which are generally only modest in size (Redman and Crowley 2007). On the other hand, those making the case for randomised phase II trials stress the inherent problems of selection bias in uncontrolled trials (Buyse 2000). Therapeutic benefits are generally smaller than potential differences in outcome due to baseline patient and disease characteristics; patient selection bias can, therefore, seriously confound the interpretation of non-randomised phase II trials, and thus the decision to take a treatment forward to phase III. This may not be a problem in a phase IIa trial of a new cytotoxic that is simply screening to establish whether it has a pre-specified, and often low, level of activity; bias is more of a challenge in a phase IIb trial, where the key question is whether a new treatment has a sufficiently high level of activity to warrant a large phase III trial.
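The resource argument can be made concrete with standard normal-approximation sample size formulae for a binary response. The sketch compares a single-arm trial tested against a fixed historical rate with a 1:1 randomised trial that formally compares the two arms, under hypothetical values (control response 20%, target 35%, one-sided α = 0.05, 80% power). Because the randomised comparison must also estimate the control rate, its total size is typically well over twice that of the single-arm trial.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def n_single_arm(p0: float, p1: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """One-sample test of a response rate against a fixed historical p0 (one-sided)."""
    z_a, z_b = Z.inv_cdf(1 - alpha), Z.inv_cdf(power)
    num = (z_a * math.sqrt(p0 * (1 - p0)) + z_b * math.sqrt(p1 * (1 - p1))) ** 2
    return math.ceil(num / (p1 - p0) ** 2)


def n_per_arm_randomised(p0: float, p1: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Two-arm 1:1 comparison of response rates (one-sided), patients per arm."""
    z_a, z_b = Z.inv_cdf(1 - alpha), Z.inv_cdf(power)
    p_bar = (p0 + p1) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return math.ceil(num / (p1 - p0) ** 2)


single = n_single_arm(0.20, 0.35)
per_arm = n_per_arm_randomised(0.20, 0.35)
print(single, per_arm, 2 * per_arm)  # 50 109 218
```

Exact binomial calculations give somewhat different numbers, but the contrast (about 50 patients versus over 200) is of this order, which is the cost that critics of randomised phase II trials have in mind.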

For an increasing number of phase II studies, especially those of cytostatic or

targeted agents, where ‘traditional’ endpoints such as response rate are not likely

to be the most appropriate outcome measures, historical controls are problematic as

data for alternative endpoints such as PFS may not be available. Where such data do

exist, the population of patients on which the data are available must be considered

since patients entering phase II clinical trials will not be representative of the broader

patient population treated in routine practice from which historical outcome data

may be derived. It is, therefore, important that the patients from whom the historical

outcome data are derived are matched as closely as possible to the phase II population

in terms of baseline characteristics and disease biomarkers if used for enrichment.

If this is not possible, there is a strong argument to include randomisation against a

control arm within the phase II trial.

In the context of randomisation, another important point is whether the experimental therapy under investigation is to be delivered as a single agent or in combination. Where an experimental therapy is given in combination with the current standard treatment, it is very difficult to identify any additional activity of the experimental agent over and above that of the standard partner therapy unless a comparative control arm is incorporated into the trial. Even if historical activity data do exist for the standard therapy, patient selection and evolving patterns of patient care may often make such data difficult to interpret. This should be considered in detail when deciding whether or not to incorporate a randomised control arm.


Although randomisation is increasingly being incorporated into phase II trial design, it can take various forms. The fact that randomisation between experimental and control treatments is incorporated into a phase II trial does not automatically imply that the two arms are formally compared with sufficient statistical power; the reasons for randomisation should, therefore, be critically evaluated.

The statistical implications of conducting a single-arm or a randomised phase II study have been evaluated in simulation studies. One study compared the results of multi-centre single-arm and randomised phase II trials of the same sample size, where the decision as to whether or not the experimental treatment was deemed successful was based solely on its showing a higher response rate than the historical control population or the randomised control population, that is, no formally powered statistical comparison was employed (Taylor et al. 2006). Where little variability in response rates between centres was expected, and both the variability and the uncertainty in the response rate for the control population were small, single-arm studies were found to be adequate in terms of correct decision-making. However, with increased variability and uncertainty in response rates for either the experimental or the control population, randomised studies were more likely to make the correct recommendation regarding proceeding to phase III, and should be considered as a possible option. A further study compared error rates between single-arm and randomised comparative phase II trials in a setting that more realistically reflected the characteristics of a phase II trial (Tang et al. 2010). Although sample sizes for the randomised trials were at least double those of the single-arm trials, the false-positive error rates (type I error) in single-arm trials were two to four times those projected when the characteristics of the study patients differed from those of the historical controls; by contrast, randomised trials remained close to the planned type I error. Statistical power (one minus the type II error) remained stable for both designs despite differences in the patient populations.
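The kind of comparison made in these simulation studies can be illustrated with a small Monte Carlo sketch. It applies the simple rule described above (the experimental treatment is declared ‘successful’ if its observed response rate is higher than the comparator rate, with no formal test), once against a fixed historical rate and once against a concurrently randomised control arm. The function name, sample sizes and response rates are hypothetical, and this is not a reproduction of the Taylor et al. (2006) or Tang et al. (2010) simulations.

```python
import random


def prob_declared_success(true_exp, true_ctrl, hist_rate, n_single, n_per_arm,
                          n_sims=5000, seed=2024):
    """Monte Carlo estimate of P(treatment declared successful).

    Single-arm design: success if the observed experimental response rate
    exceeds the fixed historical control rate `hist_rate`.
    Randomised design: success if the observed experimental response rate
    exceeds the observed rate in a concurrent control arm.
    No formal hypothesis test is applied in either case.
    """
    rng = random.Random(seed)

    def n_responders(n, p):
        # Each patient responds independently with probability p.
        return sum(rng.random() < p for _ in range(n))

    single_wins = randomised_wins = 0
    for _ in range(n_sims):
        if n_responders(n_single, true_exp) / n_single > hist_rate:
            single_wins += 1
        exp_rate = n_responders(n_per_arm, true_exp) / n_per_arm
        ctrl_rate = n_responders(n_per_arm, true_ctrl) / n_per_arm
        if exp_rate > ctrl_rate:
            randomised_wins += 1
    return single_wins / n_sims, randomised_wins / n_sims


# An inactive treatment (no better than control) in a trial population whose
# true control rate is 30%, with a historical rate that matches (30%) or
# underestimates (20%) it.
for hist in (0.30, 0.20):
    single, randomised = prob_declared_success(
        true_exp=0.30, true_ctrl=0.30, hist_rate=hist, n_single=80, n_per_arm=40)
    print(f"historical rate {hist:.2f}: single-arm {single:.2f}, randomised {randomised:.2f}")
```

When the historical rate matches the trial population, both rules declare the inactive treatment successful at a similar rate; when the historical rate is too low, the single-arm rate rises sharply while the randomised rate is unaffected, because the concurrent control arm reflects the trial population. The randomised rule's own false-positive rate is high here precisely because no formal test is used.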

The impact of misspecification of the control data should be considered in detail for either approach: for example, the impact of specifying a control response rate of, say, 60% when in fact it may be as low as 50% or as high as 70%. In the single-arm setting, the impact of such misspecification, potentially leading to increased false-negative or false-positive results, is much greater than in the randomised setting, since there is no concurrent control arm against which to verify the initial control assumptions. Thus, where there is uncertainty in the control data, the inclusion of a control arm may be appropriate.
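The effect of misspecifying the control rate in a single-arm design can be computed exactly. The sketch below fixes a one-sided exact binomial test designed around an assumed control response rate of 60% (the example above) and then evaluates its rejection probability when the true control rate is 50%, 60% or 70%. The sample size of 50, the one-sided 5% level and the 15-percentage-point improvement used for the power calculation are hypothetical choices.

```python
from math import comb


def upper_tail(r, n, p):
    """P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r, n + 1))


def critical_value(n, p0, alpha):
    """Smallest r such that P(X >= r | p0) <= alpha (one-sided exact test)."""
    for r in range(n + 2):
        if upper_tail(r, n, p0) <= alpha:
            return r


n, assumed_ctrl, alpha, improvement = 50, 0.60, 0.05, 0.15
r = critical_value(n, assumed_ctrl, alpha)
print(f"declare activity if at least {r} of {n} patients respond")

for true_ctrl in (0.50, 0.60, 0.70):
    # Treatment adds nothing: any rejection is a false positive.
    false_pos = upper_tail(r, n, true_ctrl)
    # Treatment adds `improvement` to the true control rate: rejection is correct.
    power = upper_tail(r, n, true_ctrl + improvement)
    print(f"true control {true_ctrl:.2f}: false-positive {false_pos:.3f}, power {power:.3f}")
```

If the true control rate is 70%, an inactive treatment passes far more often than the nominal 5%; if it is 50%, a genuinely active treatment (65%) is frequently missed.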

There is no one-size-fits-all approach to phase II trial design, and the theoretical and practical implications of randomisation must be considered on a trial-by-trial basis. Below we discuss the various randomisation options for phase II trial design and provide examples of when each may be appropriate. Within the thought process, randomisation is categorised as

i. no randomisation (single-arm phase II trial);
ii. randomisation incorporating a control arm, no formally powered statistical comparison intended;
iii. randomisation incorporating a control arm, formal comparison intended; and
iv. randomisation to multiple experimental treatments.

The use of randomised discontinuation designs is addressed separately in Section 2.2.3.

2.2.2.1 No randomisation

Chapter 3 outlines those designs that incorporate only a single experimental arm. The results of most single-arm phase II trials are interpreted in the context of historical control data. The reliability, or otherwise, of these historical data is one of the main issues driving discussion about randomisation in phase II studies (Rubinstein et al. 2009; Vickers et al. 2007). Single-arm phase II designs have been reported that utilise historical data but incorporate an estimate of the potential variability arising from the number of patients or trials from which those historical data have been derived; these are presented in Chapter 3.

A single-arm study may be considered appropriate where
∙ comparison with a control group is not relevant, for example, a phase IIa trial designed to show proof of concept, where the intention is to obtain an initial estimate of treatment activity to inform the design of a randomised phase IIb trial;
∙ the historical data for the primary outcome measure are sufficiently robust to allow a reliable comparison, for example, a study of a single-agent cytotoxic treatment with response rate as the primary outcome measure, conducted in a broad population of patients with a common cancer refractory to standard therapy.

2.2.2.2 Randomisation including a control arm

Randomisation including a control arm can be considered in two ways: randomisation with no formal comparison between experimental and control arms, and randomisation with a formal comparison between experimental and control arms. Further discussion of each of these is given below. Phase II trial designs incorporating randomisation between a single experimental therapy (or combination therapy) and a control arm are presented in Chapter 4.

With no formal comparison

Those designs that incorporate a control arm with no formal comparison intended as the primary decision-making assessment are highlighted in Chapter 4, as the study is not designed to have sufficient power to detect statistically significant differences between treatment arms. This does not imply that outcomes may not be compared between the arms; rather, such comparisons should be made in the knowledge that statistical power is reduced, and they therefore provide exploratory information only. This approach may be appropriate if it is sufficient simply to show that the experimental treatment has activity within a certain range. Randomisation provides a level of reassurance that the study population is representative and guards against patient selection bias; this approach is more acceptable when at least some historical data exist to further aid interpretation of the activity of the investigational agent.

In the absence of formal comparison between treatment arms, the sample size may simply be doubled compared to a single-arm study, with decision-making at the end of the trial based primarily on the results of the experimental arm, albeit in the context of outcomes in the control arm. Data from the patients randomised to the control arm can also be incorporated more formally. For example, response rates in the control arm may be compared to the historical control rates to determine whether they reflect the assumptions made when designing the trial (Buyse 2000; Herson and Carter 1986).
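One simple way to check whether the randomised control arm behaves as assumed is to compute an exact (Clopper-Pearson) confidence interval for its response rate and see whether the historical rate used in the design lies inside it. The sketch below is an illustration under that assumption, not the specific method of Buyse (2000) or Herson and Carter (1986); the counts and the 25% historical rate are hypothetical.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))


def _bisect_decreasing(f, lo=0.0, hi=1.0, tol=1e-10):
    """Root of a decreasing function that is positive at lo and negative at hi."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, level=0.95):
    """Exact two-sided confidence interval for a binomial proportion x / n."""
    a = (1 - level) / 2
    # Lower limit solves P(X >= x | p) = a; P(X >= x | p) increases with p.
    lower = 0.0 if x == 0 else _bisect_decreasing(
        lambda p: a - (1 - binom_cdf(x - 1, n, p)))
    # Upper limit solves P(X <= x | p) = a; P(X <= x | p) decreases with p.
    upper = 1.0 if x == n else _bisect_decreasing(
        lambda p: binom_cdf(x, n, p) - a)
    return lower, upper


responders, n_control, historical_rate = 3, 40, 0.25
lower, upper = clopper_pearson(responders, n_control)
print(f"control arm {responders}/{n_control}: 95% CI {lower:.3f} to {upper:.3f}")
if not lower <= historical_rate <= upper:
    print("historical rate lies outside the interval: design assumptions questionable")
```

In this hypothetical example the historical 25% lies above the upper limit, suggesting that the control arm performed worse than the design assumed; with small control arms the interval will often be wide enough to be uninformative.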

It has been suggested that the use of a control arm as a reference arm only should be avoided, particularly in trials of targeted or cytostatic agents, since it may be difficult to interpret the results when unexpected outcomes are observed in the control arm and the sample size is not sufficient to permit direct, formally powered comparisons (Rubinstein et al. 2009). For example, if positive results were observed in the experimental arm on the basis of pre-defined criteria, but higher than expected activity was also observed in the control arm, does this call into question the positive trial outcome? On the other hand, if the outcome of the experimental arm is negative and the control arm also has a lower level of activity than expected, should the apparent low activity of the experimental treatment be questioned? These uncertainties may be mitigated by looking at both study arms in relation to appropriate historical data, where available.

With formal comparison

When a phase II trial aims to determine more than whether the investigational agent has activity within a broad range, or there are serious doubts about the accuracy of historical control data, formal comparison between the control and experimental treatment arms is preferred. The trade-off for increased reliability is inevitably a larger sample size.

The level of statistical significance within a comparative, randomised phase II trial should be considered carefully, as this will further affect the sample size. It is acceptable to increase the type I error in a phase II trial compared to the typical 5% level used within phase III trials, and error rates of up to 20% have been used. This may enable more realistic sample sizes, and the error of incorrectly declaring an inactive treatment worthy of further investigation in phase III (i.e. the type I error) may be deemed more acceptable than that of incorrectly rejecting a treatment that is active (i.e. the type II error). It is, therefore, important to maintain the power associated with the design of the trial.
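The effect of relaxing the type I error on sample size can be seen with the standard normal-approximation formula for comparing two proportions. The sketch below assumes a one-sided test, 80% power and hypothetical response rates of 20% (control) and 35% (experimental); exact or design-specific methods will give somewhat different numbers.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_arm(p_ctrl, p_exp, alpha, power):
    """Approximate patients per arm for a one-sided comparison of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_ctrl + p_exp) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_ctrl * (1 - p_ctrl) + p_exp * (1 - p_exp)))
    return ceil(numerator**2 / (p_exp - p_ctrl) ** 2)


for alpha in (0.05, 0.10, 0.20):
    print(f"one-sided alpha {alpha:.2f}: {n_per_arm(0.20, 0.35, alpha, power=0.80)} per arm")
```

In this example, moving from a one-sided 5% to a 20% level roughly halves the number of patients per arm while holding power at 80%.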

Another consideration in the use of formally comparative phase II designs is the feasibility of achieving the treatment difference that is being specified. While it may be appropriate to target large treatment effects in some circumstances, this may not be the case in others. The size of the clinically relevant treatment effect should, therefore, be considered carefully to ensure that the outcome assumptions are realistic and not simply used as a method of reducing sample size.

It must also be stressed that, with the use of formally comparative phase II designs, a statistically significant result at phase II would not usually obviate the need for a subsequent phase III trial. In contrast to the short-term endpoints usually selected in phase II trials, longer-term endpoints such as PFS and OS are typically selected in large-scale phase III trials. Additionally, in a relatively small randomised phase II study only a limited number of patients will have received the study drug, so not all clinically relevant toxicities may have been identified, and these should therefore be studied further. Information gained from the phase II trial may also influence patient selection for the definitive phase III study. Subsequent confirmatory phase III trials are, therefore, usually required even after a statistically positive randomised phase II trial.

2.2.2.3 Randomisation to experimental arms (selection)

Where the aim of a phase II trial is to select which of several candidate investigational treatments to take forward for further evaluation, patients may be randomised between several experimental treatments. Whether historical control data are available, or whether a control is relevant at all, will influence the decision as to whether or not a control arm is also incorporated, as discussed above.

2.2.3 Design category

Phase II statistical designs can be broadly separated into nine statistical design categories:
∙ one-stage;
∙ two-stage;
∙ multi-stage (or group sequential);
∙ continuous monitoring;
∙ decision-theoretic;
∙ three-outcome;
∙ phase II/III;
∙ randomised discontinuation; and
∙ targeted subgroups.

These categories are not mutually exclusive. For example, a one-stage trial may incorporate a three-outcome design, or analysis based on a decision-theoretic approach. Where this is the case, designs have been listed according to their primary design categorisation.


A brief description of these nine categories is provided below, focusing on the practical implementation of each design. Previous reviews have used alternative categories for phase II designs, focusing either on single-arm versus randomised studies (Seymour et al. 2010) or on specific designs, such as randomised designs, enrichment designs and adaptive Bayesian designs, for trials of molecularly targeted agents only (Booth et al. 2008). Mariani and Marubini also previously conducted a review of the statistical methods available for phase II trials, categorising designs as one-sample or controlled and according to the number of stages of assessment, and focusing on the framework for trial analysis, that is, frequentist, Bayesian or decision-theoretic (Mariani and Marubini 1996). The grouping of trial designs within this book adopts a design categorisation similar to that of Mariani and Marubini (Mariani and Marubini 1996), but the thought process incorporates points for discussion prior to making the specific design decision. Additional categories of design that Mariani and Marubini (Mariani and Marubini 1996) did not consider are also included.

2.2.3.1 One-stage

A one-stage design utilises a fixed sample of patients, recruited until the required sample size is obtained. After the necessary follow-up of patients, the analysis is performed and a decision is made regarding proof of concept, whether or not to move to phase III, or which treatment(s) to take forward to phase III. One-stage designs are relatively straightforward, avoiding the recruitment complexities that arise when interim analyses are undertaken. They do not, however, allow for adaptations such as early trial termination due to low levels of activity. Where the safety of a treatment is well known, and data are already available to suggest activity, either for a similar treatment or for the same treatment in an alternative population of patients, a one-stage design may be appropriate, since an interim ‘check’ may be deemed unnecessary.

Where the experimental therapy is highly active, over and above the current standard therapy, fewer patients may be required under a one-stage design than under two- or multi-stage designs that incorporate early termination for lack of activity only.
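For a binary outcome such as response, a one-stage design is fully specified by the sample size n and a cut-off r: the treatment is declared worthy of further study if at least r responses are observed. A minimal sketch of an exact binomial search is given below; it returns the smallest n for which some cut-off keeps the type I error at or below α and the power at or above 1 − β. The response rates (20% uninteresting, 40% target) and error rates are hypothetical.

```python
from math import comb


def upper_tail(r, n, p):
    """P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r, n + 1))


def one_stage_design(p0, p1, alpha=0.05, beta=0.20, n_max=200):
    """Smallest exact one-stage design (n, r): declare activity if responses >= r.

    Requires P(X >= r | p0) <= alpha and P(X >= r | p1) >= 1 - beta.
    Returns None if no design is found with n <= n_max.
    """
    for n in range(1, n_max + 1):
        for r in range(n + 1):
            if upper_tail(r, n, p0) <= alpha:
                # Smallest cut-off controlling the type I error at this n;
                # larger cut-offs can only reduce power, so stop searching r.
                if upper_tail(r, n, p1) >= 1 - beta:
                    return n, r
                break
    return None


n, r = one_stage_design(p0=0.20, p1=0.40)
print(f"recruit {n} patients; declare activity if at least {r} respond")
```

Because exact binomial probabilities are discrete, a design found this way is not guaranteed to remain valid at the next sample size up, so it is worth checking the operating characteristics at neighbouring values of n as well.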

2.2.3.2 Two-stage

Under a two-stage design, patients are recruited to the trial in two stages: at the end of the first stage an interim analysis is performed and the trial may be stopped for a number of reasons, including lack of activity, early evidence of activity or unacceptable toxicity; otherwise, the trial continues to a second stage. Alternatively, the interim analysis may be used to select which of several experimental treatments to take forward to the second stage. Additional adaptations may be incorporated at the end of the first stage according to the specific trial design, for example, sample size re-estimation.

Stopping rules are developed for each stage of the study to determine whether to stop or continue, based on pre-specified operating characteristics relevant to the specific trial and design. At the end of the study, data from both stages are typically used in deciding how to proceed.

Two-stage designs are beneficial in that the analysis at the end of the first stage may act as a ‘check’ on the treatment(s) under investigation, potentially exposing fewer patients to an inactive treatment than would be exposed using a one-stage design (i.e. under the null hypothesis of no treatment activity, two-stage designs may be more efficient). There are, however, issues around patient recruitment while data from the first stage of the trial are being analysed. This is a particular issue if the outcome of interest requires a substantial period of follow-up or observation, for example, PFS requiring a specific number of events to be observed. During this time, either patients continue to be recruited, contributing to the second-stage sample size without the results of the first stage being known, or recruitment is suspended. Continuing recruitment avoids the trial losing momentum and may be acceptable where recruitment is slow. If a trial is recruiting rapidly, however, the total required number of patients may be entered into the second stage of the trial before the first-stage analysis is complete, defeating the purpose of the two-stage design.

Careful thought must be given to these points when a two-stage design is being considered. A compromise may be considered that does not require a break in recruitment but takes into account, if required, data from patients recruited during the follow-up and analysis period of the first-stage patients (Herndon 1998). If the first-stage analysis suggests stopping due to lack of activity, recruitment may be suspended at this stage and an additional assessment performed, incorporating data on all patients recruited during the follow-up and analysis period, to determine whether to stop the trial permanently or to resume recruitment (see Chapter 3). Additionally, other two-stage designs can be adapted when the attained sample size differs from the planned sample size, especially if this results in over-recruitment; the decision-making criteria may then be updated in line with the actual number of patients recruited (Chen and Ng 1998; Green and Dahlberg 1992).

Due to the nature of two-stage designs, the total sample size is not fixed, so a maximum sample size and an average sample number (ASN), that is, the expected sample size, are generally specified to account for possible early termination of the trial.
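The operating characteristics of a two-stage design with early stopping for lack of activity can be computed directly from binomial probabilities. The sketch below evaluates the probability of early termination (PET), the probability of declaring the treatment active and the ASN for boundaries r1/n1 (stop after stage 1 if at most r1 responses among n1 patients) and r/n (declare activity if more than r responses among n patients overall). The boundaries 1/10 and 5/29 are used as an illustrative Simon-type design for response rates of 10% versus 30%; treat them as an assumption to be checked against published tables rather than as a recommendation.

```python
from math import comb


def pmf(k, n, p):
    """Binomial probability P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def two_stage_oc(r1, n1, r, n, p):
    """Return (PET, P(declare active), ASN) for a two-stage futility design."""
    n2 = n - n1
    # Probability of stopping after stage 1 (at most r1 responses).
    pet = sum(pmf(x1, n1, p) for x1 in range(r1 + 1))
    declare_active = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        # Continue to stage 2; more than r responses are needed in total.
        need = max(r - x1 + 1, 0)
        tail = sum(pmf(x2, n2, p) for x2 in range(need, n2 + 1))
        declare_active += pmf(x1, n1, p) * tail
    asn = n1 + (1 - pet) * n2
    return pet, declare_active, asn


for label, p in (("null, p = 0.10", 0.10), ("alternative, p = 0.30", 0.30)):
    pet, active, asn = two_stage_oc(r1=1, n1=10, r=5, n=29, p=p)
    print(f"{label}: PET {pet:.3f}, P(declare active) {active:.3f}, ASN {asn:.1f}")
```

Under the null hypothesis the expected sample size (about 15 here) is well below the maximum of 29, which is the efficiency gain described above; the maximum is still needed for planning resources.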

2.2.3.3 Multi-stage

Multi-stage designs, also known as group-sequential designs, are similar to the two-stage designs described above, but with additional analyses throughout the course of the trial. This allows more opportunities to terminate the trial, exposing fewer patients to inactive and/or toxic treatments, or to accelerate the start of the phase III trial through early termination of the phase II trial if sufficient evidence of activity is seen. Additional adaptations may be incorporated at each stage, and again stopping rules are developed for each stage of the study based on pre-specified operating characteristics relevant to the specific trial. As with two-stage designs, multi-stage designs require consideration of whether to continue recruitment whilst interim assessments are underway. Again, due to the nature of multi-stage designs, a maximum sample size and an ASN are generally specified.
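The operating characteristics of a multi-stage design with futility stopping can be computed by carrying forward the distribution of the cumulative number of responses among trials that have not yet stopped. The sketch below handles any number of stages with futility stopping only; the function name and the three-stage boundaries in the usage example are hypothetical, and efficacy stopping or other adaptations are not included.

```python
from math import comb


def pmf(k, n, p):
    """Binomial probability P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def multistage_oc(stage_sizes, futility, final_cut, p):
    """Operating characteristics of a K-stage binomial design with futility stops.

    stage_sizes[k]: patients added at stage k.
    futility[k]:    after stage k (k < K - 1), stop if cumulative responses <= futility[k].
    final_cut:      at the end, declare activity if cumulative responses > final_cut.
    Returns (P(declare active), expected sample size).
    """
    dist = {0: 1.0}   # cumulative responses -> probability, for trials still running
    enrolled, asn = 0, 0.0
    for k, m in enumerate(stage_sizes):
        new = {}
        for s, prob in dist.items():
            for x in range(m + 1):
                new[s + x] = new.get(s + x, 0.0) + prob * pmf(x, m, p)
        dist, enrolled = new, enrolled + m
        if k < len(stage_sizes) - 1:
            # Trials stopping here contribute `enrolled` patients to the ASN.
            asn += enrolled * sum(q for s, q in dist.items() if s <= futility[k])
            dist = {s: q for s, q in dist.items() if s > futility[k]}
    asn += enrolled * sum(dist.values())   # trials that reach the final analysis
    p_active = sum(q for s, q in dist.items() if s > final_cut)
    return p_active, asn


# Two-stage special case (boundaries 1/10 and 5/29) and a hypothetical three-stage design.
print(multistage_oc([10, 19], [1], 5, p=0.10))
print(multistage_oc([10, 10, 10], [0, 3], 6, p=0.10))
```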


Two-stage or multi-stage designs are generally chosen because of their ability to terminate a trial earlier than a fixed-sample one-stage design, and may be seen to allow a more cautious approach. This is useful when the activity of a treatment is not known and/or toxicity is likely to be considerable. In this case a two-stage or multi-stage design may be appropriate irrespective of the implications of continuing or suspending recruitment. Where such caution is not necessary, a two-stage or multi-stage design may not be appropriate, especially when either suspending or continuing recruitment is problematic; in these cases, a one-stage design may be an alternative.

2.2.3.4 Continuous monitoring

With continuous monitoring designs, the outcome of interest is assessed after each individual patient’s primary outcome has been observed. The rationale behind this design is generally to allow early termination of the trial in case of lack of activity. It therefore provides a more cautious approach to trial design, allowing termination as soon as possible rather than waiting for a pre-defined number of patients to be recruited. Again, pre-defined stopping rules are required and are determined via the specific design operating characteristics. Recruitment may continue while the outcomes are observed, but real-time reporting of outcomes is fundamental to this design. This is not possible if the primary outcome requires a prolonged period of observation, as with PFS or even best response, both of which may not be available for some months following the start of treatment. In such cases there is little to be gained from continuous monitoring over a multi-stage design, since many additional patients may be recruited before data from the ‘last’ patient can be analysed. This may be less problematic where, for example, acute treatment toxicity is the key outcome measure, as this can generally be assessed more quickly than activity.
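A continuous monitoring rule needs a stopping boundary for every possible number of patients. One common way to construct such boundaries, used here purely as an illustration, is Bayesian: with a beta prior on the response rate, stop for lack of activity whenever the posterior probability that the rate exceeds an uninteresting level p0 falls below a threshold. The uniform Beta(1, 1) prior, p0 = 20% and the 5% threshold are hypothetical, and the operating characteristics of any such rule would still need to be evaluated before use.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))


def posterior_prob_above(p0, responses, n, a=1, b=1):
    """P(response rate > p0 | data) under a Beta(a, b) prior with integer a, b.

    The posterior is Beta(A, B) with A = a + responses, B = b + n - responses.
    For positive integers A and B, P(Beta(A, B) > p0) equals
    P(Binomial(A + B - 1, p0) <= A - 1).
    """
    A, B = a + responses, b + n - responses
    return binom_cdf(A - 1, A + B - 1, p0)


def futility_boundaries(p0, n_max, threshold=0.05, a=1, b=1):
    """For each n, the largest response count at which the trial would stop
    (None if no response count leads to stopping at that n)."""
    bounds = {}
    for n in range(1, n_max + 1):
        stop = [x for x in range(n + 1)
                if posterior_prob_above(p0, x, n, a, b) < threshold]
        bounds[n] = max(stop) if stop else None
    return bounds


for n, bound in futility_boundaries(p0=0.20, n_max=30).items():
    if bound is not None:
        print(f"after {n} patients: stop if responses <= {bound}")
```

Because the rule is evaluated after every patient, it is only useful if each patient's outcome is known before many more patients are recruited, which is the practical limitation noted above.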

2.2.3.5 Decision-theoretic

Decision-theoretic designs consider the costs and gains associated with making incorrect decisions at the end of the phase II trial and incorporate utility functions associated with these costs and gains. They require variables such as the total patient ‘horizon’, that is, the likely number of patients who would be treated with an effective new drug as standard therapy after completion of a successful phase III trial, before the next, more effective, new drug becomes available (Sylvester 1988). If that patient population is small, and especially if the likely cost of the new treatment is high, there may be a financial imperative that the magnitude of the treatment effect sought in the phase II study be large. Decisions are generally made at the end of the trial after a fixed number of patients have been recruited, as in a one-stage design, although multi-stage designs may also be used. These designs allow decisions to be made based on an all-round assessment of gain, as opposed to concentrating solely on clinical activity (Sylvester and Staquet 1980).

Although the ability to incorporate information regarding the costs and gains associated with decisions made throughout a trial is clearly potentially useful, decision-theoretic designs are rarely used in phase II oncology trials. This may reflect the difficulty of accurately identifying the cost and patient horizon information that is often required for their design, and the difficulty of formulating realistic and interpretable models.
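The way a patient horizon enters a decision-theoretic design can be shown with a deliberately simple toy model. Assume a discrete prior for the true response rate, a one-stage trial of n patients and the rule ‘go to phase III if at least r responses’; going yields a gain equal to the horizon multiplied by the improvement over standard therapy, minus a fixed phase III cost expressed in the same units (extra responders). The prior, the costs and the linear utility are all hypothetical simplifications, not the utility functions of Sylvester (1988) or Sylvester and Staquet (1980).

```python
from math import comb


def upper_tail(r, n, p):
    """P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r, n + 1))


def expected_utility(r, n, prior, p_std, horizon, phase3_cost):
    """Expected gain of the rule 'go to phase III if responses >= r'.

    Toy utility: if the trial 'goes', gain = horizon * (p - p_std) - phase3_cost;
    if it stops, gain = 0.
    """
    total = 0.0
    for p, weight in prior.items():
        gain_if_go = horizon * (p - p_std) - phase3_cost
        total += weight * upper_tail(r, n, p) * gain_if_go
    return total


def best_cutoff(n, prior, p_std, horizon, phase3_cost):
    """Cut-off r (r = n + 1 means 'never go') maximising expected utility."""
    return max(range(n + 2),
               key=lambda r: expected_utility(r, n, prior, p_std, horizon, phase3_cost))


prior = {0.20: 0.5, 0.30: 0.3, 0.40: 0.2}   # hypothetical prior on the true rate
for horizon in (5000, 1000):
    r = best_cutoff(30, prior, p_std=0.20, horizon=horizon, phase3_cost=100)
    print(f"patient horizon {horizon}: go to phase III if at least {r}/30 respond")
```

With the smaller horizon the optimal cut-off rises, so a larger observed effect is needed before phase III is worthwhile, which mirrors the point made above about small patient populations.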

2.2.3.6 Three-outcome

The three-outcome design may be seen as a sub-design of the one-, two- or multi-stage designs (Storer 1992). The main characteristic of this design is that instead of there being two possible outcomes at the end of the phase II trial, that is, reject the null hypothesis or reject the alternative hypothesis, a third outcome is incorporated, where the trial is inconclusive on the basis of primary endpoint data. This approach may be used when there is a region of uncertainty between, for example, a response rate above which further investigation in phase III is warranted and a response rate below which it is not. If the primary outcome measure data of such a trial are inconclusive, the decision to move to phase III or not may be based on alternative outcome measures such as safety or cost.

Three-outcome designs may be single arm or randomised. Upper and lower stopping boundaries are developed for each stage of the study to determine whether to stop, continue or declare the trial inconclusive. To calculate these boundaries, in addition to the conventional type I and type II errors (α and β, respectively), two further probabilities must be considered: the probability of correctly rejecting the null hypothesis when the alternative is true (π) and the probability of correctly rejecting the alternative hypothesis when the null is true (η). With these four probabilities specified, one can then determine the probability of incorrectly declaring an inconclusive result when in fact the alternative hypothesis is true (δ = 1 − β − π) and the probability of incorrectly declaring an inconclusive result when in fact the null hypothesis is true (λ = 1 − α − η). These differing error rates are shown graphically in Figure 2.2, assuming a binary outcome measure of success or failure. Here, nL and nU represent the lower and upper stopping boundaries, respectively, for the number of successes observed; N is the sample size; and p0 and p1 represent the success rates associated with the null and alternative hypotheses, respectively. Alternatively, the probabilities of concluding in favour of either the null or the alternative hypothesis when in fact the true response rate lies within the region of uncertainty may be specified (Storer 1992).

Three-outcome designs generally require fewer patients than a typical two-outcome design using the same design criteria. They may also be seen as better reflecting clinical reality than the typical two-outcome design, where the decision between accepting and rejecting the null hypothesis may be determined by a single success or failure (i.e. a single stopping boundary).

[Figure: distributions of the number of successes under the null and alternative hypotheses, marked Np0 and Np1, with lower and upper boundaries nL and nU; regions labelled α, β, η, π, δ and λ]

Figure 2.2 Probabilities associated with the three-outcome design.
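For a one-stage three-outcome design with a binary outcome, all six probabilities follow from the binomial distribution once the boundaries are fixed. The sketch below assumes the decision rule ‘reject the alternative if at most nL successes, reject the null if at least nU successes, otherwise declare the trial inconclusive’; this inequality convention, the boundaries and the response rates in the example are assumptions for illustration.

```python
from math import comb


def pmf(k, n, p):
    """Binomial probability P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def three_outcome_probs(n_lower, n_upper, N, p0, p1):
    """Decision probabilities for a one-stage three-outcome design (assumed rule above)."""
    if not 0 <= n_lower < n_upper <= N:
        raise ValueError("need 0 <= n_lower < n_upper <= N")

    def split(p):
        low = sum(pmf(k, N, p) for k in range(n_lower + 1))        # reject H1
        high = sum(pmf(k, N, p) for k in range(n_upper, N + 1))    # reject H0
        return low, high, 1 - low - high                           # inconclusive

    eta, alpha, lam = split(p0)   # under H0: correct, type I error, inconclusive
    beta, pi, delta = split(p1)   # under H1: type II error, correct, inconclusive
    return {"alpha": alpha, "beta": beta, "eta": eta,
            "pi": pi, "delta": delta, "lambda": lam}


probs = three_outcome_probs(n_lower=5, n_upper=10, N=25, p0=0.20, p1=0.40)
print({name: round(value, 3) for name, value in probs.items()})
```

Widening the gap between nL and nU lowers both α and β at the cost of more inconclusive results (larger δ and λ), which is the trade-off the design makes explicit.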

2.2.3.7 Phase II/III

Phase II/III designs are used when the transition from phase II to III is required to be seamless. Such designs typically allow data generated from the phase II component of the trial to be incorporated in the final phase III analysis. These trials are, therefore, usually randomised, incorporating a control arm. Randomisation may be used to select the ‘best’ of several treatments in the phase II component to be carried forward into phase III, or to decide whether or not to continue an individual experimental treatment (single-agent or combination therapy) into the phase III component.

One of the main benefits of these designs is that they reduce the total time required for the study to progress through phases II and III compared to running separate trials. Since data from the phase II component may also be incorporated in the phase III analysis, patient resources are also reduced; this is a major benefit in rarer cancers or disease sub-types where the patient population is small. Where a limited number of patients are available for trial recruitment, the optimal use of patient data is even more crucial than usual.

Alternatively, where separate phase II and phase III trials are to be carried out, a phase II trial allowing early termination for evidence of activity may be considered appropriate, bringing forward the phase III trial and saving patient resources. Where patient numbers are limited, various trial design scenarios should be investigated to identify the design that is most efficient in terms of patient numbers whilst providing sufficiently robust results.

As with multi-stage designs, the issue of whether to continue or suspend recruitment during the analysis of the phase II component arises. The trial risks losing momentum if recruitment is suspended, but rapid recruitment during this period may result in a substantial number of patients being entered into the phase III element, defeating the aim of the phase II/III design to reduce the number of patients exposed prior to embarking on phase III. For these trials to be carried out successfully, the funding body, be it academic or industry, must commit to the full phase II/III package in the knowledge that the trial may terminate after the phase II component. Often many more centres will participate in the phase III component than in the phase II component. Since the trial may terminate for lack of activity after the phase II component, the early preparation of centre set-up to enable a smooth transition to phase III must be weighed against this possibility of early termination.

Specific phase II/III designs are outlined in Chapters 3–7. An alternative approach to designing a phase II/III trial is to use conventional stand-alone phase II designs to decide whether or not to continue to phase III, and to incorporate these into phase III seamlessly (Storer 1990). Typically in this case, the primary outcome measure under investigation during phase II is different from the primary outcome measure under investigation during the phase III component, to avoid the need to adjust the type I error rate in the phase III component. Otherwise, when the same outcome measure is used in both phases II and III, with a formal comparison between control and experimental treatments at the end of phase II, the design essentially becomes a phase III trial with at least one interim analysis. This approach, using the same endpoints, is not generally recommended, since the phase III endpoint will usually be a long-term outcome such as OS; a long follow-up period would, therefore, be required for that endpoint to be assessed in the interim/phase II analysis. Multi-stage approaches may, however, be based on the phase II outcome measure, with subsequent interim analyses within the phase III component based on the phase III endpoints.

Phase II/III designs are generally associated with patient and resource efficiencies, accelerating the transition between the two trial phases and usually allowing patients recruited to phase II to be incorporated in the phase III analysis. However, by performing separate phase II and phase III trials, the results of the phase II trial, and lessons learned during its set-up and conduct, may be incorporated into the design of the phase III trial. Changes to eligibility criteria or follow-up schedules, for example, may be required for the phase III trial, and separate phase II and III trials enable these alterations to be made. Such an approach to the planning of current and future trials may be beneficial where experience with an experimental treatment is minimal, or where data for the control population in the disease area in question are minimal, enabling additional learning between stages of the development pathway.

2.2.3.8 Randomised discontinuation

Randomised discontinuation, or enrichment, trial designs (Kopec et al. 1993; Rosner et al. 2002; Stadler 2007) involve all study patients initially being treated with the experimental treatment for a pre-defined period of time and then being assessed for response to treatment. Typically, those with progressive disease come off study whilst patients who are responding continue to receive the experimental agent; those with stable disease are randomised either to continue the experimental treatment or to discontinue it and then either remain off treatment or receive standard treatment, depending on the question being asked in the trial.

Such an approach may be appropriate when the specific population of patients in which the experimental treatment is expected to be effective is unknown. For example, when evaluating a targeted agent where the level of expression of the relevant target required for potential activity is not known, a randomised discontinuation design may allow de facto enrichment of the patient population. Against this, only a limited proportion of the population recruited to the trial actually contributes to the randomised part of the study. An overview of the randomised discontinuation design is presented by Stadler, providing an example of where the design has been used successfully, as well as a summary of the advantages and disadvantages of the design (Stadler 2007).

The role of the randomised discontinuation design has been reviewed in detail

(Booth et al. 2008; Capra 2004; Freidlin and Simon 2005; Kopec et al. 1993; Rosner et

al. 2002; Rubinstein et al. 2009). The Methodology for the Development of Innovative

Cancer Therapies (MDICT) Task Force (Booth et al. 2008) considered the design as

being exploratory in nature due to lack of clarity on its role in oncology. One study

comparing the randomised discontinuation design with a comparative randomised

design showed that the randomised discontinuation design may be underpowered

in comparison to the traditional design due to the number of patients who start the

KEY POINTS FOR CONSIDERATION 33

investigational treatment who are not then randomised (Capra 2004). An accurate

estimate of the proportion of patients likely to enter randomisation is, therefore,

essential in planning the study sample size. By contrast, a second study concluded

that, although the randomised discontinuation design may be less efcient than the

classical randomised design in many settings, it can be useful in the early development

of targeted agents where a reliable assay to select patients expressing the target is not

available (Freidlin and Simon 2005). The randomised discontinuation design may

be especially appropriate when treatment benefit is restricted to a select group of

patients who are not identifiable at the start of the trial.
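Only patients who remain eligible after the open-label run-in are randomised, so the number who must start treatment depends directly on that proportion. The Python sketch below shows the arithmetic. The function name and the proportions are hypothetical planning assumptions, not values from the cited studies.

```python
import math
from fractions import Fraction


def patients_to_register(n_randomised: int, prop_randomised: float) -> int:
    """Patients who must start open-label treatment so that, on average,
    n_randomised of them enter the randomised discontinuation phase.

    prop_randomised is an assumed planning value (e.g. the proportion with
    stable disease at the end of the run-in), not an estimate from any study.
    """
    if not 0 < prop_randomised <= 1:
        raise ValueError("prop_randomised must lie in (0, 1]")
    # Fraction(str(...)) keeps the division exact, avoiding float round-off.
    return math.ceil(n_randomised / Fraction(str(prop_randomised)))


# Same randomised sample size, three assumptions about the run-in.
for p in (0.6, 0.4, 0.3):
    print(p, patients_to_register(60, p))  # 100, 150 and 200 patients
```

If the proportion entering randomisation is overestimated at the planning stage, the randomised phase can fall well short of its target.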

2.2.3.9 Targeted subgroups

In the era of targeted therapies it may be appropriate to investigate the activity of a

treatment in a specific subgroup of patients in whom the intervention is anticipated to

be effective. Alternatively, where the specific subgroup of patients is not determined,

or there is uncertainty about whether a biomarker accurately identifies a ‘sensitive’

patient population, it may be appropriate to assess activity simultaneously in several

subgroups of patients according to biomarker characterisation. Population enrichment

for a specific biomarker in phase II trials was discussed in Section 2.1.1.4, highlighting

the risks associated with incorrect characterisation and the possibilities of false-

negative results, as well as issues surrounding the use of historical data.

Designs that incorporate subgroup stratification may be used to enable the

inclusion of separate cohorts of patients defined by the biomarker in question, or

populations defined by other disease sub-types or patient characteristics, ensuring that

adequate numbers are recruited into each cohort. Approaches incorporating stratification

range from separate phase II trials within each stratification level, to hierarchical

Bayesian designs (Thall et al. 2003) and tandem two-stage methods where the experimental

treatment is first tested in an unselected patient population, and if there is

insufficient activity in this overall group, the trial continues in a select population

(e.g. marker-positive patients) only (Pusztai et al. 2007). Trials may also be partially

enriched to include a larger proportion of, for example, biomarker-positive patients,

providing additional power to detect treatment effects in this targeted subgroup of

patients.

These designs are discussed in Chapter 7; however, as noted in Chapter 1, a

number of recent papers have been published on biomarker stratification (An et al.

2012; Buyse et al. 2011; Freidlin et al. 2012; Freidlin and Korn 2013; Jenkins et al.

2011; Lai et al. 2012; Mandrekar et al. 2013; Roberts and Ramakrishnan 2011;

Tournoux-Facon et al. 2011), and we encourage readers to consult this additional

literature for the alternative designs it describes.

2.3 Stage 3 – Practicalities

2.3.1 Practical considerations

At this stage of the design process, when faced with a number of statistical designs

from which to choose, deciding which particular design is most appropriate to your

particular setting can be difficult. Although all the designs considered are deemed

straightforward to implement, practical aspects can still favour one design over another;

this section focuses on these key practical aspects.

2.3.1.1 Programming requirements

For each statistical design described in Chapters 3–7, the programming require-

ments have been considered. It is important that the statistical methodology can be

implemented easily and efficiently, allowing the statistician to consider various trial

scenarios during the design process. Only those designs that detail availability of

programs, or for which sufficient information is provided to allow the design to be

implemented, have been incorporated in this book. Nevertheless, some designs may

still be easier to implement than others depending on the resources available.

2.3.1.2 Availability/robustness of prior data

It is essential to consider the design parameters that must be defined in order to

implement each design and the variability associated with each of these parameters.

There may be, for example, a paucity of data on the primary outcome measure for

patients receiving the current standard treatment in the particular trial setting, so what

is the impact of those historical data being unreliable? For example, if the response

rate with standard therapy is estimated to be 20%, a phase II study may aim for a

response rate of 30% with the experimental arm. If a randomised phase II trial is

powered on such a basis but the patients in the control arm have better outcomes than

expected, the study may be underpowered. On the other hand, if a single-arm study is

undertaken and the historical response rate is overestimated, an active treatment may

be inappropriately discarded. The implications of misspecifying study parameters

may be investigated by simulation or may already be addressed within the specific

design publication. If a trial design is not robust to misspecification, it

may be more appropriate either to consider a design that allows an estimate of the

variance of the parameter to be incorporated, or to select a different

outcome measure that is robust to misspecification. Additionally, there may

be a specific design parameter for which no reliable data are available, for example, an

estimate of the correlation between a change in a biomarker and a clinically relevant

outcome measure. Here it may be possible either to consider a design that does not

require the parameter in question or to simulate the performance of the design under

differing parameter assumptions.
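As a minimal illustration of investigating misspecification by simulation, the Python sketch below takes a hypothetical one-stage single-arm design: 50 patients, with activity declared at 15 or more responses, planned on the assumption that the standard response rate is 20%. It estimates how often that design declares activity if the true standard rate is actually 15% and the new treatment adds the intended 10 percentage points. The design and the rates are illustrative assumptions. An exact binomial calculation is printed alongside to check the simulation.

```python
import math
import random


def prob_declare_active(n: int, r: int, p: float) -> float:
    """Exact P(X >= r), X ~ Binomial(n, p): the probability that a one-stage
    single-arm design declares activity when the true response rate is p."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(r, n + 1))


def simulate_declare_active(n: int, r: int, p: float,
                            reps: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        responses = sum(rng.random() < p for _ in range(n))
        hits += responses >= r
    return hits / reps


# Hypothetical design: 50 patients, declare activity with >= 15 responses,
# planned assuming a standard-therapy response rate of 20% (target 30%).
# If the standard rate is really 15%, a treatment adding the same 10
# percentage points has a true rate of 25%, not the planned 30%.
for true_rate in (0.25, 0.30):
    print(true_rate,
          round(prob_declare_active(50, 15, true_rate), 3),
          round(simulate_declare_active(50, 15, true_rate), 3))
```

For a plain binomial outcome the exact calculation is enough. Simulation becomes necessary once the design involves features such as correlated outcomes or adaptive rules.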

2.3.1.3 Early termination

Typically in phase II trials, early termination of a trial is incorporated to ensure

the safety and appropriate treatment of patients, usually in the context of lack of

activity or unacceptable toxicity. Early termination when evidence of activity has

been demonstrated may not be deemed necessary given the desire to obtain as much

information on the treatment as possible to provide a more robust estimate of the

treatment’s activity to inform the design of subsequent trials. On the other hand,

this may delay opening of subsequent phase III trials and there are designs that do

incorporate early termination for activity.

2.3.1.4 Operating characteristics

In phase II trials, a larger type I error than typically used in phase III trials (e.g. 5%

two-sided) is generally accepted, reflecting the exploratory nature of phase II trials. Type I errors

of up to 20% have been used where the consequences of incorrectly rejecting an

active treatment are deemed less acceptable than those of inappropriately continuing

to develop a treatment that will ultimately not be active. In such circumstances,

subsequent larger phase III studies would be expected to identify the treatment as

inactive, whereas if a treatment is rejected the situation may well not be remedied as a

phase III trial is unlikely. The selection of an appropriate type I error rate is, therefore,

crucial to the reliability of the trial results and the efficient development of new

treatments. While larger type I error rates allow smaller sample sizes, investigators

need to consider carefully whether it is appropriate to conduct a small phase II study

with a high risk of a false-positive result and a subsequent ‘negative’ phase III trial;

the alternative is a larger phase II study with a lower chance of leading to an

ultimately ‘negative’ phase III trial.

Since the primary aim of many phase II trials is to determine whether a treatment

has a pre-specified level of activity, the power of phase II studies should generally

remain high; in practice, this means a power of at least 80%.
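For a fixed sample size, the type I error and the power of a one-stage binary design are tied together through the choice of cut-off. Raising the cut-off lowers both, so accepting a larger type I error buys power or a smaller trial. The Python sketch below tabulates this trade-off exactly for a hypothetical single-arm trial with n = 40, p0 = 0.20 and p1 = 0.35. The function names are illustrative.

```python
import math


def upper_tail(n: int, r: int, p: float) -> float:
    """P(X >= r) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(r, n + 1))


def operating_table(n: int, p0: float, p1: float):
    """(cut-off r, type I error, power) for the rule 'declare activity if
    at least r of n patients respond', for every possible r."""
    return [(r, upper_tail(n, r, p0), upper_tail(n, r, p1)) for r in range(n + 1)]


# Hypothetical single-arm trial: n = 40, p0 = 0.20, p1 = 0.35.
for r, alpha, power in operating_table(40, 0.20, 0.35):
    if 0.01 <= alpha <= 0.25:
        print(f"r = {r:2d}: type I error {alpha:.3f}, power {power:.3f}")
```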

2.4 Summary

This chapter provides guidance on decision-making when identifying a trial design

for a phase II trial (Figure 2.1). Clinical researchers and statisticians should consider

carefully each of the three stages of the thought process; additional resources should

be consulted where necessary, and discussion maintained between the clinician and

statistician. The guidance we offer is not intended to be exhaustive or prescriptive.

Further reading and discussion around specific areas relevant to each phase

II design element should always be encouraged.

Examples of using the thought process in practice are presented in Chapters 8–12

for various trial design scenarios. These are intended as practical examples of how

the thought process may be implemented.

3 Designs for single experimental therapies with a single arm

Sarah Brown

3.1 One-stage designs

3.1.1 Binary outcome measure

Fleming (1982)

∙One-stage, binary outcome

∙Standard software available

Fleming proposes one-stage, two-stage and multi-stage designs requiring specification

of response rates under the null and alternative hypotheses and type I and

II error rates. Decision criteria are based on rejecting the null hypothesis that

the response rate of the experimental treatment is no greater than some pre-specified

response rate, typically defined as the expected response rate of the current historical

control treatment. Sample size is based on the normal approximation to the binomial

distribution. This is a widely used design and programs are readily available (e.g.

Machin et al. 2008).
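For orientation, the Python sketch below applies the textbook one-sided normal-approximation sample size formula for testing H0: p ≤ p0 against H1: p ≥ p1. We assume it matches the form underlying Fleming's one-stage design. Fleming's paper also defines the rejection cut-off, which is not reproduced here, and published software remains the reference. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def normal_approx_sample_size(p0: float, p1: float,
                              alpha: float = 0.05, beta: float = 0.20) -> int:
    """One-sided, single-stage sample size from the normal approximation
    to the binomial for H0: p <= p0 versus H1: p >= p1."""
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(1 - beta)
    numerator = z_a * math.sqrt(p0 * (1 - p0)) + z_b * math.sqrt(p1 * (1 - p1))
    return math.ceil((numerator / (p1 - p0)) ** 2)


print(normal_approx_sample_size(0.20, 0.40))  # 29
```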

Fazzari et al. (2000)

∙One-stage, binary outcome

∙Requires programming

A Practical Guide to Designing Phase II Trials in Oncology, First Edition.

Sarah R. Brown, Walter M. Gregory, Chris Twelves and Julia Brown.

© 2014 John Wiley & Sons, Ltd. Published 2014 by John Wiley & Sons, Ltd.

Fazzari and colleagues propose modifications to previously published phase II

designs. The modifications include: incorporating a patient population that is more

representative of the intended phase III trial population, by reducing the eligibility

restrictions and increasing the number of centres; increasing the sample size to

allow more accurate estimates of the treatment activity; using an outcome measure

that is more representative of that to be used in phase III, recommending a k-year

progression-free survival (PFS) or overall survival (binary) outcome measure for

advanced-stage disease populations; and taking the upper limit of the 75% confidence

interval of the activity of previous treatments as the minimum activity required to be

observed to warrant moving to phase III. The sample size is chosen so that, with, say,

80% probability, an x% confidence interval around the estimate of treatment activity

lies entirely above this minimum required activity. Sample size is

generated using Monte Carlo simulation, which will require programming.
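The Python sketch below illustrates the logic of this approach under several stated assumptions. It swaps in three things of our own:

- exact enumeration over binomial outcomes in place of Monte Carlo simulation;
- the Wilson score interval in place of whichever interval the authors use;
- hypothetical inputs: 10 responses in 50 historical patients, a true rate of 40% for the new treatment, a 90% interval and 80% probability.

The threshold is the upper limit of a 75% interval around the historical rate. The search returns the first sample size at which the new trial's interval lies above that threshold with the target probability. All function names are illustrative.

```python
import math
from statistics import NormalDist


def wilson_interval(x: int, n: int, level: float):
    """Two-sided Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    phat = x / n
    denom = 1 + z**2 / n
    centre = (phat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def prob_ci_above(n: int, p_true: float, threshold: float, level: float) -> float:
    """Exact probability that the lower limit of the new trial's interval
    lies above the threshold when the true response rate is p_true."""
    return sum(math.comb(n, x) * p_true**x * (1 - p_true) ** (n - x)
               for x in range(n + 1)
               if wilson_interval(x, n, level)[0] > threshold)


def fazzari_style_n(hist_x, hist_n, p_true, level=0.90, target=0.80, n_max=500):
    # Minimum activity of interest: upper limit of a 75% interval
    # around the historical response rate.
    threshold = wilson_interval(hist_x, hist_n, 0.75)[1]
    for n in range(5, n_max + 1):
        if prob_ci_above(n, p_true, threshold, level) >= target:
            return n, threshold
    raise ValueError("no sample size up to n_max reaches the target")


n, thr = fazzari_style_n(hist_x=10, hist_n=50, p_true=0.40)
print(n, round(thr, 3))
```

Because binomial probabilities oscillate with n, the first qualifying n may be followed by slightly larger values that just miss the target. This is one reason simulation-based searches usually report the whole curve.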

A’Hern (2001)

∙One-stage, binary outcome

∙Standard software available

A’Hern presents an adaptation of Fleming’s design (Fleming 1982). Calculation

of sample sizes and cut-offs is based on the exact binomial distribution rather than

the normal approximation. Trials based on exact distributions are typically larger than

those using the normal approximation; however, they avoid the possibility that confidence

intervals around the estimate of activity at the end of the trial will incorrectly

contain the lower rejection proportion as a result of the normal approximation. As with

Fleming’s design, this design is widely used and programs are readily available

for its implementation. The choice between Fleming and A’Hern should be based on

the resulting sample sizes and the requirement for exact testing.
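The exact approach can be sketched as a brute-force search. The Python code below looks for the smallest n for which some cut-off r gives P(X ≥ r | p0) ≤ α and P(X ≥ r | p1) ≥ 1 − β, with both probabilities computed exactly from the binomial distribution. This is our reading of the constraints; A’Hern's published tables and software remain the reference, and the function names are illustrative.

```python
import math


def upper_tail(n: int, r: int, p: float) -> float:
    """P(X >= r) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(r, n + 1))


def exact_single_stage(p0, p1, alpha=0.05, beta=0.20, n_max=300):
    """Smallest n, with cut-off r, such that declaring activity when X >= r
    has exact type I error <= alpha and exact power >= 1 - beta."""
    for n in range(1, n_max + 1):
        for r in range(n + 1):
            if upper_tail(n, r, p0) <= alpha:
                # Smallest r meeting the type I constraint gives the most power.
                if upper_tail(n, r, p1) >= 1 - beta:
                    return n, r
                break
    raise ValueError("no design found up to n_max")


print(exact_single_stage(0.20, 0.40))  # (n, r): activity if >= r responses in n
```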

Chang et al. (2004)

∙One-stage, binary outcome

∙Pascal programs noted as being available from authors

Chang and colleagues propose a design whereby the sample size, and thus the

test statistic, is calculated using exact unconditional methods. This design may be

used when the historical control data are based on only a few patients (say up to

120). The number of patients on which the historical data are based must be

known, as analyses take into account the pooled variance of the historical control and

experimental data. Tables and software are available to calculate sample sizes.

Mayo and Gajewski (2004)

∙One-stage, binary outcome

∙Requires programming

Mayo and Gajewski propose sample size calculations for a single-arm single-stage

trial with binary outcome, using Bayesian informative priors (pessimistic/optimistic).

This is an extension of the two-stage designs proposed by Tan and Machin (2002).

Prior information regarding expected response rate and level of uncertainty in this

value is required to determine sample sizes using either the mode, median or mean

approach. Programming is required for the median and mean approaches, for example in

Matlab. Sample sizes will vary depending on the approach used.

Gajewski and Mayo (2006)

∙One-stage, binary outcome

∙Requires programming

Gajewski and Mayo describe Bayesian sample size calculations where conflicting

opinions on prior information can be incorporated. Information required to design

the trial includes the prior distributions, a cut-off for the posterior probability that the true

response rate is greater than some pre-specified value and an error term relating to a

small increase in the target response rate. Sample size calculation is iterative; therefore

some computation is required to identify the design characteristics, for which no

software is detailed but for which formulae are given to enable implementation.

This design differs from the earlier design proposed by Mayo and Gajewski (2004)

as it allows incorporation of pessimistic and optimistic priors, as opposed to one

informative prior.

Vickers (2009)

∙One-stage, binary outcome

∙Stata programs given in appendix to manuscript

Vickers proposes a design using historical control data to generate a statistical

prediction model for a phase II trial. The observed trial data for the experimental

arm are then compared to the predicted results to give an indication of whether

patients treated with the experimental agent are doing better than expected, based

on the prediction model. Vickers notes that the methodology hinges on high-quality

historical control data relevant to the patient population under study. Step-by-step

methodology is presented which incorporates bootstrapping on both the historical

data set and the observed data set and a comparison of the predicted and actual

outcomes. Example Stata code is given in the appendix to the manuscript to allow

implementation of the statistical analysis, as well as assessment of power.
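The core comparison can be sketched in Python as observed minus expected, with a bootstrap interval. The sketch assumes each trial patient already has a predicted probability of the outcome under standard care, taken from a model fitted to historical controls; fitting that model is not shown. Unlike Vickers' procedure, it resamples only the trial patients, not the historical data set, so it understates uncertainty in the prediction model. The function name and the data are hypothetical.

```python
import random


def observed_minus_expected(outcomes, predicted, n_boot=2000, seed=2009):
    """Difference between the observed event rate in the trial and the mean
    probability predicted by a historical-control model, with a 95%
    percentile bootstrap interval obtained by resampling trial patients."""
    n = len(outcomes)
    if n == 0 or n != len(predicted):
        raise ValueError("need one prediction per trial patient")
    estimate = (sum(outcomes) - sum(predicted)) / n
    rng = random.Random(seed)
    boots = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        boots.append(sum(outcomes[i] - predicted[i] for i in idx) / n)
    boots.sort()
    lower = boots[int(0.025 * n_boot)]
    upper = boots[int(0.975 * n_boot) - 1]
    return estimate, lower, upper


# Hypothetical data: 1 = response, with model-predicted response probabilities.
outcomes = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
predicted = [0.3, 0.2, 0.4, 0.5, 0.1, 0.3, 0.2, 0.4, 0.3, 0.3]
print(observed_minus_expected(outcomes, predicted))
```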

3.1.2 Continuous outcome measure

No references identified.

3.1.3 Multinomial outcome measure

Zee et al. (1999)

∙One-stage, multinomial outcome

∙Requires programming

Zee and colleagues propose single-stage and multi-stage single-arm designs con-

sidering a multinomial outcome, in the context of incorporating progressive disease

as well as response into the primary outcome measure. Analysis is based on the num-

ber of responses and progressions observed, compared with predetermined stopping

criteria. A computer program written in SAS identifies the operating characteristics

of the designs. This is not noted as being available in the paper; however, detail is

given to allow implementation.

Lu et al. (2005)

∙One-stage, multinomial outcome

∙Programs may be available from authors

Lu and colleagues propose a design (one-stage or two-stage) to look at both

complete response (CR) and total response (or other such outcome measures whereby

observing one outcome implies the other outcome is also observed). The design

recommends a treatment for further investigation if either of the alternative hypotheses

is met (i.e. for CR or for total response) and rejects the treatment if neither is met.

The designs follow the general approach of Fleming’s single-stage (Fleming 1982)

or Simon’s two-stage (Simon 1989) approach whereby the number of CRs and total

responses are compared to identified stopping boundaries. Tables are provided for

some combinations of null and alternative hypotheses; however, formulae are given

and at the time of manuscript publication programs were in development. The design

differs from others in this section in that one outcome measure is a sub-outcome

measure of the other, whereas other designs consider discrete outcome measures

such as partial response (PR) versus CR.

Chang et al. (2007)

∙One-stage, multinomial outcome

∙Programs noted as being available from authors

Chang and colleagues propose a single-stage and a two-stage design for window

studies which aim to assess the potential activity of a new treatment in newly diag-

nosed patients. Treatment is given to patients for a short period of time before

standard chemotherapy, and each patient is assessed for response or early pro-

gression (both binary outcome measures). The alternative hypothesis is based on

both the response rate being above a pre-specified rate and the early progressive

disease rate being below a pre-specified rate. The outcomes follow a multinomial

distribution. A SAS program is noted as being available from the authors to

identify designs.
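The trinomial structure behind such a decision rule can be sketched directly. Each patient responds, progresses early, or does neither. The Python code below computes the probability that a one-stage rule of the form "at least r_min responses and at most pd_max early progressions" is met. Chang and colleagues derive their own boundaries through a design search. The rule, the cut-offs and the rates below are hypothetical and serve only to show the calculation.

```python
import math


def prob_promising(n, r_min, pd_max, p_resp, p_pd):
    """P(responses >= r_min and early progressions <= pd_max) when each of
    n patients independently responds (p_resp), progresses early (p_pd)
    or does neither, so the three counts are multinomial."""
    p_other = 1.0 - p_resp - p_pd
    total = 0.0
    for r in range(r_min, n + 1):
        for d in range(min(pd_max, n - r) + 1):
            ways = math.comb(n, r) * math.comb(n - r, d)
            total += ways * p_resp**r * p_pd**d * p_other ** (n - r - d)
    return total


# Hypothetical rule: 30 patients; promising if >= 8 respond and <= 4 progress early.
print(round(prob_promising(30, 8, 4, p_resp=0.20, p_pd=0.25), 3))  # null scenario
print(round(prob_promising(30, 8, 4, p_resp=0.40, p_pd=0.10), 3))  # alternative scenario
```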

Stallard and Cockey (2008)

∙One-stage, multinomial outcome

∙Programs noted as being available from author

Stallard and Cockey propose single-arm, one- and two-stage designs for ordered

categorical data, where the rejection region for the null hypothesis is defined based

on the likelihood ratio test. The null region over which the type I error is controlled

considers a weighting of the proportion of patients in each response category, in a

similar manner to that of Lin and Chen (2000). The focus of the paper is on response

with three levels; however, the design may be extended to more than three levels.

Programs are noted as being available from the first author to allow identification of

designs.

3.1.4 Time-to-event outcome measure

No references identified.

3.1.5 Ratio of times to progression

Mick et al. (2000)

∙One-stage, ratio of times to progression

∙Requires programming

Mick and colleagues propose a design based on the growth modulation index

(ratio of time to progression of experimental treatment relative to that on the patients’

previous course of anti-cancer treatment). The outcome measure is novel and the

authors justify its use for trials of cytostatic treatments where outcome measures such

as tumour response may not be appropriate. Various values of the growth modulation

index for null and alternative hypotheses should be considered to explore design

parameters, as appropriate for the setting of the study. Each patient acts as their

own control. Information is required for each patient on their time to progression on

previous treatment, and an estimate of the correlation between the two times is needed.

The design is identified via simulation, which allows investigation of the effect of the

correlation estimate on the overall design. Although software is not detailed as being

available, the design has been implemented in S-PLUS, and detail is provided to allow design

implementation.
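The correlation estimate matters because, on the log scale, the growth modulation index (GMI) is the difference of two correlated times. If both log times have standard deviation σ and correlation ρ, then var(log GMI) = 2σ²(1 − ρ). The Python sketch below illustrates this by simulating paired times to progression under an assumed bivariate log-normal model. It is not Mick and colleagues' test statistic or simulation model. The parameter values (median 4 months on previous therapy, true GMI 1.33, σ = 0.8) are hypothetical.

```python
import math
import random
import statistics


def simulate_log_gmi(n, median_ttp_prev, gmi_true, sigma, rho, seed=2000):
    """Simulate log(GMI) = log(TTP on new treatment / TTP on previous
    treatment) for n patients under a bivariate log-normal model with a
    common log-scale SD (sigma) and correlation rho between the two times."""
    rng = random.Random(seed)
    mu_prev = math.log(median_ttp_prev)
    mu_new = mu_prev + math.log(gmi_true)
    log_gmi = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Second standard normal with correlation rho to z1.
        z2 = rho * z1 + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
        log_gmi.append((mu_new + sigma * z2) - (mu_prev + sigma * z1))
    return log_gmi


for rho in (0.0, 0.5, 0.9):
    sims = simulate_log_gmi(5000, median_ttp_prev=4.0, gmi_true=1.33, sigma=0.8, rho=rho)
    sd = statistics.stdev(sims)
    longer = sum(v > 0 for v in sims) / len(sims)  # proportion with GMI > 1
    print(f"rho={rho}: SD of log GMI {sd:.3f} "
          f"(theory {0.8 * math.sqrt(2 * (1 - rho)):.3f}), P(GMI > 1) {longer:.2f}")
```

Higher correlation makes the paired comparison more precise, so a misspecified correlation feeds directly into the apparent power of the design.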

3.2 Two-stage designs

3.2.1 Binary outcome measure

Gehan (1961)

∙Two-stage, binary outcome

∙Standard software available

∙Early termination for lack of activity

Gehan proposes one of the earliest designs to assess experimental treatments

in phase II trials. The methodology is based on the double sampling method and

considers a phase II trial composed of a ‘preliminary’ stage and a ‘follow-up’

stage. The preliminary stage assesses whether the treatment under investigation is

likely to be worth further investigation, using a confidence interval approach to

exclude treatments with response rates below those of interest from further investigation.

The follow-up stage assesses the activity of the treatment with pre-specified

precision. The number of patients to be included in the follow-up stage is determined

according to the number of responses observed during the first stage. The

proposed design is intended to reject inactive treatments quickly, such

that if the response rate of interest is excluded from the confidence interval at the

end of the first stage, the trial is terminated early. Otherwise the trial continues.

In the second stage the activity of the treatment is estimated with given preci-

sion, rather than providing decision criteria for continuing to a further trial. On

this basis, this design may be seen as an estimation procedure for initial proof

of concept trials rather than trials to determine whether or not to proceed to

phase III.
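In the usual description of Gehan's first stage, the trial stops if none of the first n1 patients responds. n1 is chosen so that this outcome has probability at most β when the true response rate is the smallest rate of interest. The Python sketch below computes n1 under that reading. The sizing of the follow-up stage, which depends on the number of responses observed and the required precision, is not shown. The function name is illustrative.

```python
def gehan_first_stage(p_min: float, beta: float = 0.05) -> int:
    """Smallest n1 such that, if the true response rate were p_min,
    the chance of observing zero responses in n1 patients is at most beta."""
    n1 = 1
    while (1 - p_min) ** n1 > beta:
        n1 += 1
    return n1


print([gehan_first_stage(p) for p in (0.10, 0.20, 0.30)])  # [29, 14, 9]
```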

Fleming (1982)

∙Two-stage, binary outcome

∙Standard software available for overall sample size

∙Early termination for activity or lack of activity

Fleming proposes one-stage, two-stage and multi-stage designs. The multi-stage

design addresses multiple testing considerations to allow early termination in case

of extreme results, employing the standard single-stage test procedure at the last

test. Tables are presented for specific design scenarios using the exact underlying

binomial probabilities rather than the normal approximation to these probabilities.

Programs are readily available to calculate the overall sample size for a one-stage

design (e.g. Machin et al. 2008), with sample sizes at each stage chosen to be approx-

imately equal. Termination at the end of each stage is permitted for activity or lack

of activity.

Simon (1987)

∙Two-stage, binary outcome

∙Requires programming

∙Early termination for lack of activity

Simon introduces a two-stage design that is single arm with a binary outcome

whereby the sample size is minimised under a pre-specified expected response rate,

not necessarily the null or alternative response rate. Where this expected response rate

corresponds with the null hypothesis response rate, this design is equivalent to the

optimal design proposed in the subsequent paper summarised below (Simon 1989).

The current design is optimised by keeping the size of the first stage small, making

the probability of rejecting an inactive drug high, and not allowing too high a sample

size in the second stage. Early termination is permitted at the end of stage 1 only

for lack of activity. A table is provided with limited design scenarios; however, the

designs detailed below (Simon’s optimal and minimax) are more widely used and

may be considered ahead of this earlier design.

Simon (1989)

∙Two-stage, binary outcome

∙Standard software available

∙Early termination for lack of activity

Simon proposes a single-arm two-stage design based on minimising the expected

number of patients under the null hypothesis (optimal), as well as an additional

design that minimises the maximum sample size (minimax). This is a well-known

and widely used two-stage design, based on null and alternative response rates, power

and significance level, and the observed number of responses at the end of each stage

is used to assess stopping rules. The outcome of interest is binary and the trial may

only be terminated at the end of the first stage for lack of activity. Extensive tables are

provided for different design scenarios and software is readily available (e.g. Machin

et al. 2008).
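Given a candidate design (r1/n1, r/n), its operating characteristics follow exactly from the binomial distribution:

- the probability of early termination (PET);
- the probability of declaring the treatment active;
- the expected sample size.

The Python sketch below computes these. The example is the design often quoted from Simon's tables as optimal for p0 = 0.10 and p1 = 0.30 with α = 0.05 and β = 0.20: stop after 10 patients if at most 1 responds, otherwise continue to 29 patients and declare activity if more than 5 respond. Finding the optimal or minimax design requires a search over all such candidates, which is omitted here.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def simon_characteristics(r1, n1, r, n, p):
    """Operating characteristics of a two-stage design: stop after stage 1
    if responses <= r1; otherwise treat n - n1 more patients and declare
    the treatment active only if total responses > r."""
    n2 = n - n1
    pet = sum(binom_pmf(x1, n1, p) for x1 in range(r1 + 1))
    p_active = sum(
        binom_pmf(x1, n1, p)
        * sum(binom_pmf(x2, n2, p) for x2 in range(max(0, r - x1 + 1), n2 + 1))
        for x1 in range(r1 + 1, n1 + 1)
    )
    expected_n = n1 + (1 - pet) * n2
    return pet, p_active, expected_n


for label, p in (("p0 = 0.10", 0.10), ("p1 = 0.30", 0.30)):
    pet, p_active, en = simon_characteristics(1, 10, 5, 29, p)
    print(f"{label}: PET {pet:.3f}, P(declare active) {p_active:.3f}, E[N] {en:.1f}")
```

Under p0, the probability of declaring activity is the exact type I error. Under p1, it is the exact power.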

Green and Dahlberg (1992)

∙Two-stage, binary outcome

∙Requires programming

∙Early termination for lack of activity

The design described by Green and Dahlberg permits early termination for lack

of activity at the end of stage 1 when the alternative hypothesis is rejected at the 0.02

significance level. At the end of the second stage a significance level of 0.055 is used to

reject the null hypothesis and declare sufficient activity for further investigation. Some

detail is given regarding stopping boundary and sample size calculation, although this

would need to be programmed and solved iteratively to find the most suitable design.

This paper also discusses adaptations to the designs of Gehan (1961), Fleming (1982),

and Simon (1989), in the cases where the final attained trial sample size differs from

the original planned design.
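For given stage sizes, the two decision boundaries described above can be computed from one-sided exact binomial tails. The stage-1 futility bound is the largest response count at which the alternative rate is rejected at the 0.02 level. The final success bound is the smallest total at which the null rate is rejected at the 0.055 level. The Python sketch below is our reading of these rules, not Green and Dahlberg's full procedure, which also covers choosing stage sizes and handling attained sample sizes. The stage sizes and rates in the example are hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); equals 0 for k < 0."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))


def green_dahlberg_bounds(n1, n, p0, p1, alpha1=0.02, alpha2=0.055):
    """Return (stop_at, win_at) for given stage sizes.

    stop_at: stop after stage 1 if responses <= stop_at, the largest count at
             which H1 (rate p1) is rejected at one-sided level alpha1
             (-1 if no count qualifies).
    win_at:  declare activity if total responses >= win_at, the smallest
             count at which H0 (rate p0) is rejected at one-sided level alpha2.
    """
    stop_at = -1
    for x in range(n1 + 1):
        if binom_cdf(x, n1, p1) <= alpha1:
            stop_at = x
        else:
            break
    win_at = next((x for x in range(n + 1)
                   if 1 - binom_cdf(x - 1, n, p0) <= alpha2), None)
    return stop_at, win_at


# Hypothetical stage sizes: 20 patients in stage 1, 40 in total.
print(green_dahlberg_bounds(20, 40, p0=0.10, p1=0.30))
```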

Heitjan (1997)

∙Two-stage, binary outcome

∙Programs noted as being available from the author

∙Early termination for activity or lack of activity

Heitjan proposes a design whereby decision-making is based on the ability to

persuade someone with extreme prior beliefs that the treatment under investigation

is either active or not. This requires specification of extreme priors. For a sceptic,

the probability that the experimental treatment is better than the standard treatment

must be at least some pre-specified value (e.g. 70%) for the treatment to be declared

active (known as the ‘persuade the pessimist probability’, PPP), and for an enthusiast,

the probability that the experimental treatment is worse than the standard treatment

must be at least some pre-specified value (e.g. 70%) for the treatment to be declared

inactive (known as the ‘persuade the optimist probability’, POP). Timing of interim

analyses can be based either on numbers of patients or on time during the trial. Sample

size is justified by assessing the operating characteristics and calculating PPPs and

POPs of the design under various scenarios. Programs are noted as being available

upon request from the author. Early termination is permitted for activity or lack of

activity.
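To make PPP and POP concrete, the Python sketch below computes both from beta posteriors after observing x responses in n patients. It makes two simplifications of our own. First, the standard treatment's response rate is treated as a fixed known value p_std; Heitjan's formulation compares the two treatments more generally. Second, the sceptical and enthusiastic priors are hypothetical Beta distributions with integer parameters, so the posterior tail can be computed exactly through the identity P(Beta(a, b) > t) = P(Binomial(a + b − 1, t) ≤ a − 1). The 70% decision thresholds come from the example above.

```python
import math


def prob_beta_exceeds(a: int, b: int, t: float) -> float:
    """P(p > t) for p ~ Beta(a, b) with positive integer a, b, via
    P(p > t) = P(Binomial(a + b - 1, t) <= a - 1)."""
    m = a + b - 1
    return sum(math.comb(m, k) * t**k * (1 - t) ** (m - k) for k in range(a))


def ppp_pop(x, n, p_std, sceptic=(2, 18), enthusiast=(8, 12)):
    """Posterior probabilities after x responses in n patients.
    PPP: P(p > p_std) under the (hypothetical) sceptical Beta prior.
    POP: P(p < p_std) under the (hypothetical) enthusiastic Beta prior."""
    a_s, b_s = sceptic
    a_e, b_e = enthusiast
    ppp = prob_beta_exceeds(a_s + x, b_s + n - x, p_std)
    pop = 1 - prob_beta_exceeds(a_e + x, b_e + n - x, p_std)
    return ppp, pop


ppp, pop = ppp_pop(x=7, n=20, p_std=0.20)
if ppp >= 0.70:
    print(f"declare active (PPP = {ppp:.2f})")
elif pop >= 0.70:
    print(f"declare inactive (POP = {pop:.2f})")
else:
    print(f"continue (PPP = {ppp:.2f}, POP = {pop:.2f})")
```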

Herndon (1998)

∙Two-stage, binary outcome

∙Requires programming

∙Early termination for lack of activity

Herndon proposes a hybrid two-stage design that allows continuation of recruitment

while the results of the first stage are being analysed. If the results of the first

stage indicate the treatment is inactive, accrual is suspended and data are re-analysed

including data from all patients recruited to that time point. Otherwise, the design

continues to target recruitment for the second stage. The sample sizes for the first

and second stages are chosen for practicality rather than via Simon’s optimal method,

with overall sample size calculated to maintain pre-specified type I and II errors for

study-specific null and alternative hypotheses. Critical values for suspending recruitment,

reinitiating or terminating recruitment and for declaring the treatment worthy

of further investigation at the end of stage 2 are calculated. To identify the critical

values a numerical search is required, for which formulae are provided. If the stage 1

results indicate re-analysis using all patients to that time point, analysis follows

similar methodology to that proposed by Green and Dahlberg (1992), detailed above,

as does the analysis of stage 2.

Chen and Ng (1998)

∙Two-stage, binary outcome

∙Programs noted as being available from authors

∙Early termination for lack of activity

Chen and Ng propose a flexible design that operates in the same manner as

Simon’s two-stage design (Simon 1989), but here the number of patients at the rst

and second stages can vary by up to eight patients to allow a period of grace in

halting recruitment (in a similar manner to that described by Green and Dahlberg

1992, detailed above). A FORTRAN program is noted as being available from the

authors to enable implementation, and tables are given for some scenarios.

Chang et al. (1999)

∙Two-stage, binary outcome

∙Requires programming

∙Early termination for activity or lack of activity

Chang and colleagues outline a design for continuous or binary outcomes that

takes into account the number of patients on whom historical control data are based.

This reflects the fact that the variances of the historical control data and the experimental

data will differ. The trial may be terminated at the end of the first stage for

either activity or lack of activity. Algorithms are used to determine critical values for

stopping, and sample size is calculated by multiplying the single-stage sample size

(formulae provided) by between 1.02 and 1.05.

Hanfelt et al. (1999)

∙Two-stage, binary outcome

∙Programs noted as being available from authors

∙Early termination for lack of activity

Hanfelt and colleagues propose a modification to Simon’s two-stage design

(Simon 1989) that minimises the median number of patients required under the

null hypothesis, as opposed to the expected number of patients. A program is noted

as being available from the authors that performs the design search. The design differs

very little from that of Simon, other than when the response rate of the treatment is much

less than the null hypothesis rate. Termination at the end of the first stage is for lack

of activity only.
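Under the null hypothesis, a two-stage design treats either n1 patients (with probability PET) or n patients. Its median sample size is therefore n1 whenever PET ≥ 0.5, which is why minimising the median can favour different designs from minimising the expected value. The Python sketch below computes both quantities for designs commonly quoted from Simon's tables (p0 = 0.10, p1 = 0.30). It illustrates the criterion only and does not reproduce Hanfelt and colleagues' design search.

```python
import math


def pet_and_sizes(r1: int, n1: int, n: int, p: float):
    """Probability of early termination (stage-1 responses <= r1) and the
    expected and median number of patients treated."""
    pet = sum(math.comb(n1, x) * p**x * (1 - p) ** (n1 - x) for x in range(r1 + 1))
    expected = n1 + (1 - pet) * (n - n1)
    # The sample size is n1 with probability pet and n otherwise,
    # so its median is n1 whenever pet >= 0.5.
    median = n1 if pet >= 0.5 else n
    return pet, expected, median


print(pet_and_sizes(1, 10, 29, 0.10))  # Simon's optimal design
print(pet_and_sizes(1, 15, 25, 0.10))  # Simon's minimax design
```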

Shuster (2002)

∙Two-stage, binary outcome

∙Requires programming

∙Early termination for activity or lack of activity

The minimax design proposed by Shuster follows the same format as, for example,

Simon’s design (Simon 1989), although it allows early termination for activity at the

end of the first stage, as well as for lack of activity. Sample sizes and cut-offs are

calculated based on exact type I and II errors, and the smallest expected maximum

sample size is calculated. The author shows that the proposed design generates the

smallest sample sizes under the null, alternative and maximum scenarios, compared

to Chang et al. (1987) and Fleming (1982). The author advises use of the proposed

minimax design when early termination for activity is beneficial (giving as an example

the setting of paediatric cancer). A table of specic design scenarios is presented;

otherwise the design will require programming.

Tan and Machin (2002)

∙Two-stage, binary outcome

∙Standard software available

∙Early termination for lack of activity

Tan and Machin propose two Bayesian designs: the single threshold design (STD)

and the dual threshold design (DTD). The designs are intended to be user-friendly and

easily interpreted by those familiar with frequentist phase II designs. They provide

an alternative approach to the design, analysis and interpretation of phase II trial

data, allowing incorporation of relevant prior information and summarising results in

terms of the probability that a response proportion falls within a pre-specified region

of interest. The following design parameters are required: target response rate for

a new treatment; prior distribution for the experimental treatment being tested; the

minimum probability of the true response rate being at least the target response rate

at the end of stage 1 (for the STD, 𝜆1) and at the end of the study (𝜆2). For the DTD,

the lower response rate of no further interest is also required, and here 𝜆1 represents

the probability that the true response rate is lower than the rate of no further interest

at the end of stage 1.

The STD focuses on ensuring, at the end of the first stage, that the final response

rate of the drug has a reasonable probability of passing the target response rate at the

end of the trial. The DTD, however, focuses on ensuring, at the end of the first stage,

that the final response rate at the end of the trial is not below the response rate of no

further interest. Tables are given for a number of design scenarios and the designs

are compared with the frequentist approach of Simon (1989). Programs have been

developed and are available in Machin et al. (2008).
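The single threshold idea reduces to finding the fewest responses x out of n for which the posterior probability that the true rate is at least the target reaches λ. The Python sketch below computes that threshold for a single look. It assumes a Beta prior with integer parameters (uniform by default, a hypothetical choice), so the posterior tail can be evaluated exactly with P(Beta(a, b) > t) = P(Binomial(a + b − 1, t) ≤ a − 1). Tan and Machin's designs also choose the stage sizes and apply λ1 and λ2 at different looks, which is not reproduced here.

```python
import math


def prob_beta_exceeds(a: int, b: int, t: float) -> float:
    """P(p > t) for p ~ Beta(a, b) with positive integer a, b."""
    m = a + b - 1
    return sum(math.comb(m, k) * t**k * (1 - t) ** (m - k) for k in range(a))


def min_responses(n: int, target: float, lam: float, prior=(1, 1)):
    """Smallest number of responses x out of n for which the posterior
    probability P(p >= target | x) under a Beta(prior) prior is >= lam."""
    a, b = prior
    for x in range(n + 1):
        if prob_beta_exceeds(a + x, b + n - x, target) >= lam:
            return x
    return None  # even n responses out of n would not be convincing enough


# Hypothetical: 30 patients, target response rate 0.30, lambda = 0.80.
print(min_responses(30, target=0.30, lam=0.80))
```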

Case and Morgan (2003)

∙Two-stage, binary outcome

∙Standard software available and programs noted as being available from authors

∙Early termination for lack of activity

Case and Morgan outline a design with survival outcomes which are dichotomised

to give survival probabilities at pre-specified time points of interest, incorporating all

available information. The design aims to avoid the drawbacks of extended follow-up

periods and breaks in recruitment during follow-up between stages. The design

does not require a halt in recruitment between stages as Nelson–Aalen estimates

of survival are used to incorporate all survival information up to the time point of

interest, at the time of interim analysis. Early termination is permitted only for lack

of activity. FORTRAN programs are noted as being available upon request from the

authors, to identify the optimal design, and the proposed design is also available in

Machin et al. (2008).
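The Nelson–Aalen step is what allows patients who are censored but still in follow-up to contribute at the interim analysis. The sketch below is a minimal illustration. It assumes event and censoring times on a common scale and uses the usual transformation S(t) = exp(−H(t)) of the estimated cumulative hazard. The function name and the example data are hypothetical. Case and Morgan's interim test statistic and its variance are not reproduced here.

```python
from math import exp


def nelson_aalen_survival(times, events, t):
    """Estimate S(t) = exp(-H(t)) from the Nelson-Aalen cumulative hazard.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    t      : time point of interest
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    cum_hazard = 0.0
    i = 0
    while i < len(data) and data[i][0] <= t:
        current = data[i][0]
        deaths = removed = 0
        # Group tied times: d_j events among the n_j patients still at risk.
        while i < len(data) and data[i][0] == current:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            cum_hazard += deaths / at_risk
        at_risk -= removed  # censored patients leave the risk set after this time
    return exp(-cum_hazard)


# Hypothetical interim data (months): 1 = event, 0 = still in follow-up.
print(nelson_aalen_survival([2, 3, 4, 6, 7, 9], [1, 0, 1, 0, 0, 1], t=6))
```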

Jung et al. (2004)

∙Two-stage, binary outcome

∙Programs noted as being available from authors

∙Early termination for activity or lack of activity

Jung and colleagues propose a searching algorithm to identify admissible two-stage designs based on Bayesian decision theory, incorporating a loss function which is a weighted function of the expected number of patients and the maximum number of patients required. A computer program, developed in Java and noted as being available upon request from the authors, searches admissible designs (comparing the expected loss to the Bayes risk) using information provided on the response rates under the null and alternative hypotheses, the type I and II error rates and the maximum number of patients available. Stopping rules are generated based on a minimum number of responses required to be observed.

Lin and Shih (2004)

∙Two-stage, binary outcome

∙Programs noted as being available from authors

∙Early termination for lack of activity

Lin and Shih propose an adaptive design which allows the sample size to be adjusted at the end of the first stage, to account for uncertainty in the response rate under the alternative hypothesis. Two potential response rates are pre-specified at the design stage, and the adjustment is made based on these. Tables are provided, and software is noted as being available from the authors to compute sample sizes and cut-offs that are not displayed.

Wang et al. (2005)

∙Two-stage, binary outcome

∙Requires programming

∙Early termination for lack of activity

Wang and colleagues propose a Bayesian version of Simon’s two-stage design (Simon 1989), controlling frequentist type I and II error rates as well as Bayesian error rates measured using posterior distributions. The design therefore incorporates the error rates commonly controlled by, and familiar to, frequentists, while also enabling calculation of posterior probabilities regarding treatment activity. Stopping at the end of stage 1 is permitted for lack of activity only. Sample sizes and stopping boundaries for each stage are provided in tables for specific design scenarios, and the design is compared with that of Simon (1989) and with the STD and DTD of Tan and Machin (2002). The design requires programming to enable implementation.

Banerjee and Tsiatis (2006)

∙Two-stage, binary outcome

∙Programs noted as being available from authors

∙Early termination for activity or lack of activity

Banerjee and Tsiatis propose an adaptive design that is similar to Simon’s optimal design (Simon 1989); however, the sample size and decision criteria of the second stage depend on the outcome of the first stage, and the trial may terminate at the end of the first stage for either activity or lack of activity. The sample size and decision criteria of the second stage are computed using Bayesian decision theory, minimising the average sample size under the null hypothesis. The design offers a small reduction in sample size over Simon’s optimal design (3–5%); however, the authors note potential difficulties in planning a trial where the total sample size is unknown at the outset. Tables are given for various design scenarios, and software is noted as being available on request.

Ye and Shyr (2007)

∙Two-stage, binary outcome

∙Programs available on website

∙Early termination for lack of activity

The design proposed by Ye and Shyr follows that of Simon (1989) but is designed to balance the number of patients investigated in each of the stages. Attention is focused on a binary response outcome measure, although the design may be extended to multiple correlated outcome measures (where more than one outcome can occur for one patient). Tables are provided for various design scenarios, and software is available at www.vicc.org/biostatistics/ts/freqapp.php (last accessed August 2013).

The authors note that when there are few patients available, Simon’s minimax design would be preferable. If the optimal and minimax designs have dramatically imbalanced sample sizes between the two stages, then the proposed design may be preferable; otherwise Simon’s optimal design can be used, as this minimises the expected sample size under the null hypothesis. Termination at the end of the first stage is for lack of activity only.
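Comparisons between optimal, minimax and balanced designs rest on a few operating characteristics of a two-stage rule:

- the probability of early termination (PET);
- the expected sample size;
- the probability of declaring the treatment active.

The sketch below computes these by exact binomial enumeration for a given (r1, n1, r, n). It does not search for optimal, minimax or balanced designs, and the function names are illustrative only. The example uses the widely tabulated Simon optimal design for p0 = 0.10 versus p1 = 0.30 with α = 0.05 and β = 0.20, that is, r1/n1 = 1/10 and r/n = 5/29.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); 0 when k < 0, 1 when k >= n."""
    return sum(binom_pmf(i, n, p) for i in range(0, min(k, n) + 1))


def two_stage_oc(r1: int, n1: int, r: int, n: int, p: float):
    """Operating characteristics of a Simon-type two-stage design.

    Stop after stage 1 if responses X1 <= r1; otherwise recruit n - n1 more
    patients and declare the treatment active if X1 + X2 > r.
    Returns (PET, expected sample size, P(declare active)).
    """
    n2 = n - n1
    pet = binom_cdf(r1, n1, p)
    expected_n = n1 + (1 - pet) * n2
    p_active = sum(
        binom_pmf(x1, n1, p) * (1 - binom_cdf(r - x1, n2, p))
        for x1 in range(r1 + 1, n1 + 1)
    )
    return pet, expected_n, p_active


for p in (0.10, 0.30):
    pet, en, pa = two_stage_oc(1, 10, 5, 29, p)
    print(f"p={p:.2f}: PET={pet:.3f}, E(N)={en:.1f}, P(active)={pa:.3f}")
```

Under p0 the expected sample size is about 15 and the PET about 0.74, which matches Simon's tabulated values for this design.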

Litwin et al. (2007)

∙Two-stage, binary outcome

∙Programs noted as being available from authors

∙Early termination for activity or lack of activity

Litwin and colleagues describe a design based on the outcome measure of PFS at two set time points. In the second stage of the design the success rate of a binary outcome measure at a set time point t2 is considered, which is dependent upon the success rate of a, possibly different, binary outcome measure at an earlier set time point t1 (assessed at the end of stage 1), for example, the progression-free rate at time t2, dependent upon the progression-free rate at time t1. The design incorporates the possibility of stopping for either activity or lack of activity at the end of the first stage and proceeds as follows:

1. n1 patients are recruited to the study and followed to time t1 for PFS.

2. If there are too few patients who are progression-free at time t1 then the trial

is stopped early for lack of activity.

3. If there are sufficient patients who are progression-free at time t1, then accrual continues to the second stage until a total of n2 patients are recruited. Patients in the initial cohort who are progression-free at t1 continue in the study.

4. At the end of the second stage (t2), all n2 − n1 patients from the second stage and all those patients progression-free at t1 are evaluated at time t2.

Programs are noted as being available upon request from the authors.
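The four steps of Litwin and colleagues' design can be expressed as a small decision routine. This is a sketch only. The stage-1 cut-offs (`lower1`, `upper1`) and the final cut-off (`r2`) are hypothetical inputs. In the actual design they are derived from the error-rate requirements, and this code does not attempt that derivation.

```python
from typing import List, Tuple


def stage1_decision(n_pf_t1: int, lower1: int, upper1: int) -> str:
    """Steps 1-3: decide once the n1 stage-1 patients have been followed to t1."""
    if n_pf_t1 <= lower1:
        return "stop: lack of activity"
    if n_pf_t1 >= upper1:
        return "stop: activity"
    return "continue to stage 2"


def final_decision(stage1: List[Tuple[bool, bool]], stage2: List[bool], r2: int) -> str:
    """Step 4: evaluate progression-free status at t2.

    stage1 -- (pf_at_t1, pf_at_t2) for each stage-1 patient
    stage2 -- pf_at_t2 for each of the n2 - n1 stage-2 patients
    """
    # Stage-1 patients who progressed before t1 cannot count as successes at t2.
    successes = sum(1 for pf1, pf2 in stage1 if pf1 and pf2)
    successes += sum(1 for pf2 in stage2 if pf2)
    return "active" if successes > r2 else "not active"


# Hypothetical data: n1 = 15, n2 - n1 = 20, illustrative cut-offs.
stage1 = [(True, True)] * 6 + [(True, False)] * 4 + [(False, False)] * 5
print(stage1_decision(sum(pf1 for pf1, _ in stage1), lower1=5, upper1=14))
stage2 = [True] * 9 + [False] * 11
print(final_decision(stage1, stage2, r2=12))
```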

Wu and Shih (2008)

∙Two-stage, binary outcome

∙Requires programming for adaptations

∙Early termination for lack of activity


Wu and Shih propose approaches to handling data that deviate from a pre-specified Simon two-stage design (Simon 1989). The following scenarios are considered:

∙Simon’s design ‘interrupted’, such that there is additional evaluation at the following times:

a. before completion of the first stage;

b. after the first stage but before completion of the second stage;

c. before completion of the first stage and again before completion of the second stage.

∙Simon’s design ‘abandoned’, that is, the first unscheduled assessment leads to abandoning the original design and an adapted assessment schedule is developed.

Adaptations to stopping rules are presented, as well as details on adjusting the p-value associated with decision-making under the deviated scenario. Adaptations are based on the conditional probability of passing the first stage and the conditional power of rejecting the null hypothesis, assuming the study continues to its final stage. No software is detailed; however, sufficient detail is given to allow the design to be programmed for implementation.

Koyama and Chen (2008)

∙Two-stage, binary outcome

∙Programs available on website

∙Early termination for lack of activity

Koyama and Chen detail an adaptation to Simon’s two-stage design (Simon 1989) to allow proper inference when the actual sample size at stage 2 deviates from the planned sample size. The methodology allows computation of updated critical values for the second stage, based on the number of responses observed in the first stage, together with adapted p-values, point estimates and confidence intervals, incorporating conditional power. Software is available at http://biostat.mc.vanderbilt.edu/wiki/Main/TwoStageInference (last accessed August 2013).

Chi and Chen (2008)

∙Two-stage, binary outcome

∙Standard programs available as per Simon’s two-stage design (Simon 1989)

∙Early termination for activity or lack of activity

Chi and Chen propose a curtailed sampling adaptation to Simon’s two-stage design (Simon 1989). The design allows earlier termination of the trial in the event that the treatment is either very active or very inactive. Details of the proposed adaptations are presented, and the design is easily implemented using standard software to identify a design via Simon’s methodology (Simon 1989). The design can offer substantial savings in sample size when compared to continuing recruitment to the predetermined number of patients under Simon’s design.
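Curtailed sampling rests on one observation. Take a rule of the form "declare the treatment active if more than r of n patients respond". Recruitment can stop as soon as the final decision under this rule can no longer change.

The sketch below shows only this deterministic curtailment for a single stage, using hypothetical function names. It does not reproduce Chi and Chen's own rules for combining curtailment with Simon's two stages, or their evaluation of the resulting savings. The example reuses the final-stage rule r = 5, n = 29 purely as an illustration.

```python
from typing import Sequence, Tuple


def curtailed_check(responses: int, treated: int, r: int, n: int) -> str:
    """Decision after `treated` of `n` patients for the rule 'active if > r responses'."""
    remaining = n - treated
    if responses > r:
        return "stop: activity"          # rule already met; further patients cannot undo it
    if responses + remaining <= r:
        return "stop: lack of activity"  # cannot exceed r even if all remaining respond
    return "continue"


def run_curtailed(outcomes: Sequence[int], r: int, n: int) -> Tuple[str, int]:
    """Walk through 0/1 outcomes in recruitment order; return (decision, patients used)."""
    if len(outcomes) < n:
        raise ValueError("need at least n outcomes")
    responses = 0
    for treated, y in enumerate(outcomes[:n], start=1):
        responses += y
        decision = curtailed_check(responses, treated, r, n)
        if decision != "continue":
            return decision, treated
    raise AssertionError("unreachable: a decision is always made when treated == n")


print(run_curtailed([1] * 6 + [0] * 23, r=5, n=29))
print(run_curtailed([0] * 29, r=5, n=29))
```

In the first call the rule is met after 6 patients. In the second call, after 24 non-responders even 5 further responses could not exceed r, so recruitment stops early.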

Sambucini (2008)

∙Two-stage, binary outcome

∙Programs noted as being available

∙Early termination for lack of activity

Sambucini proposes a Bayesian design which represents a predictive version of the STD proposed by Tan and Machin (2002), taking into account the uncertainty about the data that have not yet been observed, to identify optimal two-stage sample sizes and cut-off values. A ‘design’ prior and an ‘analysis’ prior must be specified, to compute prior predictive distributions and posterior probabilities of treatment activity, respectively. A program written in R is available to determine optimal two-stage designs.

3.2.2 Continuous outcome measure

Chang et al. (1999)

∙Two-stage, continuous outcome

∙Requires programming

∙Early termination for activity or lack of activity

Chang and colleagues outline a design for continuous or binary outcomes that takes into account the number of patients on whom historical control data are based. This reflects the fact that the variances of the historical control data and the experimental data will differ. The trial may be terminated at the end of the first stage for either activity or lack of activity. Algorithms are used to determine critical values for stopping, and the sample size is calculated by multiplying the single-stage sample size (formulae provided) by a factor of between 1.02 and 1.05.
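The size of the historical control series matters because the historical mean is itself estimated with error, and a simple normal approximation shows the effect. The sketch below is a generic one-sided z-test calculation that stands in for the formulae of Chang and colleagues, which are not reproduced here. The function names, the known-variance assumption and the example values are all hypothetical. The final step applies the 1.02–1.05 inflation factor described in the text.

```python
from math import ceil
from statistics import NormalDist


def single_stage_n(delta: float, sd_exp: float, sd_hist: float, n_hist: int,
                   alpha: float = 0.05, beta: float = 0.20) -> int:
    """Experimental-arm size for a one-sided z-test of mean(exp) - mean(hist) = delta,
    treating the historical mean as estimated from n_hist patients."""
    z = NormalDist().inv_cdf(1 - alpha) + NormalDist().inv_cdf(1 - beta)
    # Variance allowed for the difference, minus the share already used up
    # by uncertainty in the historical mean (sd_hist^2 / n_hist).
    budget = (delta / z) ** 2 - sd_hist**2 / n_hist
    if budget <= 0:
        raise ValueError("historical data too imprecise to detect delta")
    return ceil(sd_exp**2 / budget)


def two_stage_n(n_single: int, inflation: float = 1.05) -> int:
    """Inflate the single-stage size by a factor between 1.02 and 1.05."""
    if not 1.02 <= inflation <= 1.05:
        raise ValueError("inflation factor outside the 1.02-1.05 range")
    return ceil(n_single * inflation)


n1 = single_stage_n(delta=0.5, sd_exp=1.0, sd_hist=1.0, n_hist=100)
print(n1, two_stage_n(n1, 1.02), two_stage_n(n1, 1.05))
```

In this example, treating the 100 historical patients as fixed rather than known exactly raises the single-stage requirement from 25 to 33 patients.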

3.2.3 Multinomial outcome measure

Zee et al. (1999)

∙Two-stage, multinomial outcome

∙Requires programming

∙Early termination for activity or lack of activity


Zee and colleagues propose single-stage and multi-stage single-arm designs considering a multinomial outcome, in the context of incorporating progressive disease as well as response into the primary outcome measure. Analysis is based on the number of responses and progressions observed, compared with predetermined stopping criteria. A computer program written in SAS identifies the operating characteristics of the designs. This program is not noted as being available in the paper; however, sufficient detail is given to allow implementation.

Lin and Chen (2000)

∙Two-stage, multinomial outcome

∙Programs noted as being available from authors

∙Early termination for activity or lack of activity

Lin and Chen detail a design that considers both CRs and PRs in a trinomial outcome, weighting CR as the more desirable outcome. Investigators must specify overall response rates under the null and alternative hypotheses, and the proportion that is attributable to CR. A weighted score is calculated at the end of each stage and compared with predetermined cut-off boundaries, as in Simon’s optimal and minimax designs (Simon 1989), of which this design may be viewed as an extension. Tables are given for specific scenarios, and programs are noted as being available upon request from the authors.
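For a fixed number of patients, the distribution of a weighted CR/PR score can be obtained by enumerating all trinomial outcomes. The sketch below assumes the score is `weight_cr × #CR + #PR`. The weight, cut-off and rates are hypothetical. Lin and Chen's exact scoring and their two-stage boundary search may differ from this illustration.

```python
from math import factorial


def trinomial_pmf(n_cr: int, n_pr: int, n: int, p_cr: float, p_pr: float) -> float:
    """Probability of exactly n_cr CRs and n_pr PRs among n patients."""
    n_nr = n - n_cr - n_pr
    coef = factorial(n) // (factorial(n_cr) * factorial(n_pr) * factorial(n_nr))
    return coef * p_cr**n_cr * p_pr**n_pr * (1 - p_cr - p_pr) ** n_nr


def prob_score_exceeds(n: int, p_cr: float, p_pr: float,
                       weight_cr: float, cutoff: float) -> float:
    """P(weight_cr * #CR + #PR > cutoff) under trinomial outcomes."""
    total = 0.0
    for n_cr in range(n + 1):
        for n_pr in range(n - n_cr + 1):
            if weight_cr * n_cr + n_pr > cutoff:
                total += trinomial_pmf(n_cr, n_pr, n, p_cr, p_pr)
    return total


# Hypothetical: 25 patients, CR weighted twice as heavily as PR.
alpha_hat = prob_score_exceeds(25, p_cr=0.05, p_pr=0.15, weight_cr=2, cutoff=9)
power_hat = prob_score_exceeds(25, p_cr=0.15, p_pr=0.25, weight_cr=2, cutoff=9)
print(f"P(score > 9): {alpha_hat:.3f} under H0, {power_hat:.3f} under H1")
```

With a weight of 1, the score is simply the total number of responses, and the calculation reduces to a binomial tail probability.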

Panageas et al. (2002)

∙Two-stage, multinomial outcome

∙Programs noted as being available from authors

∙Early termination for lack of activity

Panageas and colleagues propose a single-arm two-stage design based on Simon’s optimal design (Simon 1989), but with a trinomial outcome (e.g. CR vs. PR vs. non-response). The design requires null and alternative response rates to be specified for both CR and PR, that is, improvements in both categories are required. The optimal design is identified iteratively, to minimise the expected sample size while satisfying the type I and II error rates. A computer program is noted as being available from the authors, with specific design scenarios presented in tables. There is a marginal saving in sample size over Simon’s design (Simon 1989). The design differs from that of Zee et al. (1999), detailed above, since early termination is permitted for lack of activity only and it does not incorporate weighting of the different outcomes.

Lu et al. (2005)

∙Two-stage, multinomial outcome

∙Programs may be available from authors

∙Early termination for lack of activity


Lu and colleagues propose a design (one-stage or two-stage) to look at both CR and total response (or other such outcome measures whereby observing one outcome implies that the other outcome is also observed). The design recommends a treatment for further investigation if either of the alternative hypotheses is met (i.e. for CR or for total response) and rejects the treatment if neither is met. The designs follow the general approach of Fleming’s single-stage design (Fleming 1982) or Simon’s two-stage design (Simon 1989), whereby the numbers of CRs and total responses are compared to identified stopping boundaries. Tables are provided for some combinations of null and alternative hypotheses; however, formulae are given, and at the time of publication of the manuscript programs were in development. The design differs from others in this section in that one outcome measure is a sub-outcome measure of the other, whereas other designs consider discrete outcome measures such as PR versus CR.

Chang et al. (2007)

∙Two-stage, multinomial outcome

∙Programs noted as being available from authors

∙Early termination for activity or lack of activity

Chang and colleagues propose a single-stage and a two-stage design for window studies, which aim to assess the potential activity of a new treatment in newly diagnosed patients. Treatment is given to patients for a short period of time before standard chemotherapy, and each patient is assessed for response or early progression (both binary outcome measures). The alternative hypothesis is based on both the response rate being above a pre-specified rate and the early progressive disease rate being below a pre-specified rate. The outcomes follow a multinomial distribution. A SAS program is noted as being available from the authors to identify designs.

Goffin and Tu (2008)

∙Two-stage, multinomial outcome

∙Programs noted as being available from authors

∙Early termination for lack of activity

Goffin and Tu outline an adaptation to the design proposed by Zee et al. (1999), based on a simulation approach to determine the design. The authors note that the earlier design of Zee and colleagues was found to have lower power than intended (Freidlin et al. 2002; Zee et al. 1999). In the proposed two-stage design, decision criteria are based on the proportion of patients with response and the proportion of patients with early progressive disease, in an advanced disease setting. The alternative hypothesis is that the response rate is sufficiently high or the early progressive disease rate is sufficiently low. Simulation is used to determine the stopping boundaries required to satisfy pre-specified design criteria. Programs are noted as being available upon request from the authors. Early termination is permitted for lack of activity only.
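Simulation-based calibration of this kind can be sketched as follows. Each simulated patient is a responder, an early progressor or neither. A candidate two-stage rule is applied to each simulated trial, and the proportion of trials declaring activity estimates the rejection probability.

The rule used here is hypothetical, as are all thresholds and rates:

- after stage 1, continue only if there are at least r1 responses or at most e1 early progressions;
- at the end, declare activity if there are at least r responses or at most e early progressions.

In practice the thresholds would be searched over until the pre-specified error rates are met.

```python
import random


def simulate_declare_active(p_resp: float, p_pd: float, n1: int, n: int,
                            r1: int, e1: int, r: int, e: int,
                            n_sims: int = 10000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(declare activity) for a two-stage rule on
    response and early progressive disease (mutually exclusive outcomes)."""
    rng = random.Random(seed)
    declared = 0
    for _ in range(n_sims):
        resp = pd = 0
        for i in range(1, n + 1):
            u = rng.random()
            if u < p_resp:
                resp += 1
            elif u < p_resp + p_pd:
                pd += 1
            # Interim look after n1 patients: stop for lack of activity only.
            if i == n1 and resp < r1 and pd > e1:
                break
        else:  # loop completed without an early stop
            if resp >= r or pd <= e:
                declared += 1
    return declared / n_sims


# Hypothetical rule: 20 + 20 patients.
rule = dict(n1=20, n=40, r1=3, e1=10, r=9, e=10)
h0 = simulate_declare_active(0.15, 0.40, **rule)
h1 = simulate_declare_active(0.30, 0.25, **rule)
print(f"P(declare active): {h0:.3f} under H0, {h1:.3f} under H1")
```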


Kocherginsky et al. (2009)

∙Two-stage, multinomial outcome

∙Programs available from website

∙Early termination for lack of activity

Kocherginsky and colleagues outline a design to consider the proportion of patients achieving response and the proportion of patients not progressing early. The alternative hypothesis being tested is that the response rate is sufficiently high or the non-progression rate is sufficiently high. Sample size is calculated via numerical searching, with the initial sample size estimate calculated following Simon’s two-stage design (Simon 1989) based on the response rate limits. A numerical search is then performed over all combinations of design parameters to determine stopping rules, evaluated by assessing the probability of early termination and the probability of rejecting the null hypothesis. The design incorporates a thorough assessment of the operating characteristics over a range of response and progression rates, to guard against unexpectedly high false-positive rates under certain parameters. Programs written in R to implement the numerical search are noted as being available from http://health.bsd.uchicago.edu/filestore/biostatlab/ (last accessed July 2013). Early termination is permitted for lack of activity only.

Stallard and Cockey (2008)

∙Two-stage, multinomial outcome

∙Programs noted as being available from author

∙Early termination for lack of activity

Stallard and Cockey propose single-arm, one- and two-stage designs for ordered categorical data, where the rejection region for the null hypothesis is defined based on the likelihood ratio test. The null region over which the type I error is controlled considers a weighting of the proportion of patients in each response category, in a similar manner to that of Lin and Chen (2000). The focus of the paper is on response with three levels; however, the design may be extended to more than three levels. Programs are noted as being available from the first author to allow identification of designs.

3.2.4 Time-to-event outcome measure

Case and Morgan (2003)

∙Two-stage, time-to-event outcome

∙Standard software available and programs noted as being available from authors

∙Early termination for lack of activity


Case and Morgan outline a design with survival outcomes that are dichotomised to give survival probabilities at pre-specified time points of interest, incorporating all available information. The design aims to avoid the drawbacks of extended follow-up periods and of breaks in recruitment during follow-up between stages. Recruitment need not be halted between stages, because Nelson–Aalen estimates of survival are used to incorporate all survival information up to the time point of interest at the time of the interim analysis. Early termination is permitted only for lack of activity. FORTRAN programs to identify the optimal design are noted as being available upon request from the authors, and the proposed design is also available in Machin et al. (2008).

Litwin et al. (2007)

∙Two-stage, time-to-event outcome

∙Programs noted as being available from authors

∙Early termination for activity or lack of activity

Litwin and colleagues describe a design based on the outcome measure of progression-free survival at two set time points, that is, a binary outcome. In the second stage of the design the success rate of a binary outcome measure at a set time point t2 is considered, which is dependent upon the success rate of a, possibly different, binary outcome measure at an earlier set time point t1 (assessed at the end of stage 1), for example, the progression-free rate at time t2, dependent upon the progression-free rate at time t1. The design incorporates the possibility of stopping for either activity or lack of activity at the end of the first stage and proceeds as follows:

1. n1 patients are recruited to the study and followed to time t1 for PFS.

2. If there are too few patients who are progression-free at time t1 then the trial

is stopped early for lack of activity.

3. If there are sufficient patients who are progression-free at time t1, then accrual continues to the second stage until a total of n2 patients are recruited. Patients in the initial cohort who are progression-free at t1 continue in the study.

4. At the end of the second stage (t2), all n2 − n1 patients from the second stage and all those patients progression-free at t1 are evaluated at time t2.

Programs are noted as being available upon request from the authors.

3.2.5 Ratio of times to progression

No references identified.


3.3 Multi-stage designs

3.3.1 Binary outcome measure

Herson (1979)

∙Multi-stage, binary outcome

∙Programs noted as being available from author

∙Early termination for lack of activity

Herson describes a Bayesian multi-stage design with early stopping rules based on the predictive probability that a treatment will not be successful at the end of the phase II trial. Early termination is therefore only permitted for lack of activity. The design incorporates investigators’ prior information on the response rate of the experimental treatment and their confidence in this prior information (via a coefficient of variation). Early termination boundaries are calculated based on pre-specified sample sizes ranging from 20 to 30 patients, and consideration is also given to the expected sample size of a subsequent phase III trial. Programs are noted as being available from the author.

Fleming (1982)

∙Multi-stage, binary outcome

∙Standard software available for overall sample size

∙Early termination for activity or lack of activity

Fleming proposes one-stage, two-stage and multi-stage designs. The multi-stage design addresses multiple testing considerations to allow early termination in the case of extreme results, employing the standard single-stage test procedure at the last test. Tables are presented for specific design scenarios using the exact underlying binomial probabilities rather than the normal approximation to these probabilities. Programs are readily available to calculate the overall sample size for a one-stage design (e.g. Machin et al. 2008), with sample sizes at each stage chosen to be approximately equal. Termination at the end of each stage is permitted for activity or lack of activity.
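The single-stage test at the heart of Fleming's procedure uses exact binomial probabilities. The sketch below finds the smallest sample size n and critical number of responses k (declare activity if at least k patients respond) that satisfy one-sided type I and II error constraints. It does not reproduce Fleming's multi-stage boundaries, and the function names are illustrative only. For p0 = 0.10 versus p1 = 0.30 with α = 0.05 and β = 0.20, it returns n = 25 and k = 6.

```python
from math import comb
from typing import Optional, Tuple


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def exact_single_stage(p0: float, p1: float, alpha: float = 0.05,
                       beta: float = 0.20, n_max: int = 200) -> Optional[Tuple[int, int]]:
    """Smallest n, with its critical value k, such that
    P(X >= k | p0) <= alpha and P(X >= k | p1) >= 1 - beta."""
    for n in range(1, n_max + 1):
        # The smallest k that controls the type I error gives the most power for this n.
        k = next(k for k in range(n + 2) if upper_tail(k, n, p0) <= alpha)
        if upper_tail(k, n, p1) >= 1 - beta:
            return n, k
    return None


print(exact_single_stage(0.10, 0.30))
```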

Bellissant et al. (1990)

∙Multi-stage, binary outcome

∙Requires programming

∙Early termination for activity or lack of activity

Bellissant and colleagues apply the triangular test (TT) and the sequential probability ratio test (SPRT), previously used in phase III trials, to single-arm group-sequential phase II trials with a binary outcome. An efficient score, Z, and Fisher’s information, V, are derived from the likelihood function. The log odds ratio statistic is used as the measure of the difference between the actual success rate and the null hypothesis rate. Formulae are given for the calculation of Z and V as well as for the calculation of the stopping boundaries, whereby Z is seen as the difference between the observed and expected numbers of responses under the null hypothesis and V as the variance of Z under the null hypothesis. Early termination is permitted for either activity or lack of activity. Sample size is justified via the operating characteristics of the TT and SPRT, and group sizes and number of stages are arbitrary, ranging from 5 to 15 in the examples. The design requires programming to enable implementation.
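For a single arm with null response rate p0, the score and information described above take a simple form: Z = S − n·p0 and V = n·p0(1 − p0), where S is the number of responses among n patients. The sketch below pairs these with triangular test boundaries. The upper boundary is Z = a + cV and the lower boundary is Z = −a + 3cV, with a = 2 ln(1/(2α))/θR and c = θR/4.

This is our understanding of Whitehead's standard form for α = β under continuous monitoring. The sketch omits any correction for discrete group-sequential looks. It also assumes that the reference effect θR is the log odds ratio between p1 and p0. The parameter values and function names are hypothetical.

```python
from math import log


def score_and_information(responses: int, n: int, p0: float):
    """Efficient score Z and Fisher information V for the log odds ratio
    theta = logit(p) - logit(p0), evaluated at theta = 0."""
    z = responses - n * p0     # observed minus expected responses under H0
    v = n * p0 * (1 - p0)      # null variance of Z
    return z, v


def triangular_boundaries(p0: float, p1: float, alpha: float):
    """Triangular test constants a and c (alpha == beta, continuous monitoring)."""
    theta_r = log(p1 / (1 - p1)) - log(p0 / (1 - p0))
    a = 2 * log(1 / (2 * alpha)) / theta_r
    c = theta_r / 4
    return a, c


def tt_decision(responses: int, n: int, p0: float, p1: float, alpha: float = 0.05) -> str:
    z, v = score_and_information(responses, n, p0)
    a, c = triangular_boundaries(p0, p1, alpha)
    if z >= a + c * v:
        return "stop: activity"
    if z <= -a + 3 * c * v:
        return "stop: lack of activity"
    return "continue"


# Hypothetical looks with p0 = 0.20, p1 = 0.40.
for s, n in [(2, 10), (8, 10), (0, 15)]:
    print(s, n, tt_decision(s, n, 0.20, 0.40))
```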

Chen et al. (1994)

∙Multi-stage, binary outcome

∙Requires programming

∙Early termination for lack of activity

Chen and colleagues propose a multi-stage design that is an extension of Gehan’s two-stage design (Gehan 1961), where the chance of stopping early is increased if the observed response rate is smaller than that of interest. It is noted that this design is suitable for phase II trials that have high expected response rates, in contrast to the design of Gehan, where the chance of stopping a trial early is low if the response rate of interest is above 0.3. Only limited tables of designs are presented, so additional designs will require programming. Early termination is permitted for lack of activity only.

Ensign et al. (1994)

∙Multi-stage, binary outcome

∙Requires programming

∙Early termination for lack of activity

Ensign and colleagues propose a single-arm three-stage design that is an extension to the two-stage design of Simon (1989). At the end of the first stage, the trial is terminated if no responses are observed (i.e. for lack of activity). If at least one response is observed, stages 2 and 3 are carried out as per Simon’s stages 1 and 2. The sample sizes and cut-offs for stages 2 and 3 are determined to minimise the expected sample size under the null hypothesis. A restriction is made that the first stage must include at least five patients. Extensive tables are provided for designs under differing scenarios; however, the design will need programming to enable implementation outwith those provided.


Thall and Simon (1994a)

∙Multi-stage, binary outcome

∙Programs noted as being available from authors

∙Early termination for lack of activity

Thall and Simon present sample size calculations for their original Bayesian continuous monitoring design (Thall and Simon 1994b). Adaptations to this design are also provided. The impact of group-sequential monitoring, as opposed to continuous monitoring, is assessed, and it is found that assessment after every two, three or four patients has little impact on results; however, reducing the frequency of assessments much further can increase the likelihood of inconclusive results. The first adaptation considers early stopping boundaries for inconclusive results. The second adaptation considers early termination for lack of activity, using only lower stopping boundaries. Software is noted as being available upon request to compute and implement each of these designs, including the original continuous monitoring design (Thall and Simon 1994b).

Tan and Xiong (1996)

∙Multi-stage, binary outcome

∙Programs available on website

∙Early termination for activity or lack of activity

Tan and Xiong propose a group-sequential (or continuous monitoring) design for the assessment of a binary outcome in a single-arm trial, based on the sequential conditional probability ratio test (SCPRT). The design is based around comparison to a reference fixed sample size test (RFSST), such as that proposed by Fleming (1982), and the results that this test would achieve, since it is desirable to preserve the power of this test while incorporating additional opportunities to terminate the trial early. The proposed design provides similar power to the fixed sample size test but allows more opportunity to terminate the trial early (for activity or lack of activity). A FORTRAN program is available via the website http://lib.stat.cmu.edu/designs/scprtbin (last accessed July 2013) to compute the design characteristics.

Chen (1997)

∙Multi-stage, binary outcome

∙Program noted as being available from author

∙Early termination for lack of activity

Chen proposes an extension to Simon’s minimax and optimal two-stage designs (Simon 1989), simply incorporating an additional stage. Tables are provided with designs under various scenarios, and a FORTRAN program is noted as being available from the author for other scenarios. When compared to Simon’s design, the three-stage design sometimes has a smaller expected sample size; however, this is not consistent. Compared to Ensign’s three-stage design (Ensign et al. 1994), the proposed design does not place restrictions on the size and cut-off for the first stage.

Murray et al. (2004)

∙Multi-stage, binary outcome

∙Requires programming

∙Early termination for activity or lack of activity

Murray and colleagues detail calculation of stopping rules based on confidence interval estimation of the response rate at each stage. A table of specific design scenarios is presented; however, the design requires programming to identify optimal decision criteria for scenarios outwith the tables. The design is based on a pre-specified fixed sample size (i.e. no sample size calculation is performed) and a fixed number of stages (with a fixed sample size at each stage), with type I and II errors evaluated for the resulting design. Early termination is permitted for either activity or lack of activity. The design may be used when only a small number of patients are available for study (30 patients are considered in the motivating example), and exact binomial calculations are employed.

Ayanlowo and Redden (2007)

∙Multi-stage, binary outcome

∙Requires programming

∙Early termination for lack of activity

Ayanlowo and Redden propose a stochastic curtailment design which is based on the simple binomial test and considers the conditional probability of declaring a treatment active at the end of the trial, given the responses observed to date and assuming that the alternative hypothesis is true. The design requires programming to identify the points at which to conduct interim assessments. Sample size determination is based on a binomial test. Stochastic curtailment adaptations to Simon’s minimax and optimal designs are also proposed (Simon 1989). While the proposed designs provide more opportunity to stop a trial early due to an inactive treatment, the authors suggest their use only when Simon’s minimax design is already being considered, the trial is expected to recruit slowly and the outcome may be observed relatively quickly.
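The conditional probability that drives stochastic curtailment is a binomial tail over the patients still to be treated. The sketch below is a minimal illustration. It assumes the final rule "declare active if more than r of n patients respond", evaluated under the alternative response rate p1. The futility threshold γ = 0.10 and the function names are hypothetical.

```python
from math import comb


def conditional_prob_active(responses: int, treated: int, n: int, r: int, p1: float) -> float:
    """P(total responses > r at the end | responses so far), if the true rate is p1."""
    remaining = n - treated
    needed = r + 1 - responses  # further responses still required
    if needed <= 0:
        return 1.0
    if needed > remaining:
        return 0.0
    return sum(comb(remaining, k) * p1**k * (1 - p1) ** (remaining - k)
               for k in range(needed, remaining + 1))


def stochastically_curtail(responses: int, treated: int, n: int, r: int,
                           p1: float, gamma: float = 0.10) -> bool:
    """True if the trial would stop for lack of activity at this look."""
    return conditional_prob_active(responses, treated, n, r, p1) < gamma


# Hypothetical look: rule 'active if > 5 of 29 respond', p1 = 0.30, 1 response in 20 patients.
cp = conditional_prob_active(1, 20, 29, 5, 0.30)
print(f"conditional probability = {cp:.3f}; stop: {stochastically_curtail(1, 20, 29, 5, 0.30)}")
```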

Chen and Shan (2008)

∙Multi-stage, binary outcome

∙Programs noted as being available from authors

∙Early termination for activity or lack of activity


Chen and Shan outline a three-stage design, extending previous designs to allow early termination for either activity or lack of activity (Chen 1997; Ensign et al. 1994; Simon 1989). Tables are given for optimal and minimax designs where the difference between the null and alternative hypothesis rates is 0.20 or 0.15, for a number of scenarios. A C program is noted as being available from the authors to search for designs under alternative scenarios. Comparing the proposed optimal and minimax designs with those of Chen (1997), the designs presented in the current paper require a larger maximal sample size under the optimal design and a similar maximal sample size under the minimax design, but have a smaller average sample number in most cases. Due to the ability to terminate early for either activity or lack of activity, the probability of early termination at the first stage and overall is higher for the current designs compared to those of Chen (1997).

Lee and Liu (2008)

∙Multi-stage, binary outcome

∙Programs available from website

∙Early termination for lack of activity or activity

Lee and Liu outline a Bayesian group-sequential/continuous monitoring design based on a binary outcome and the use of predictive probabilities (the probability of a positive result should the trial run to conclusion, given the interim data observed). The design incorporates early termination for lack of activity as well as for activity. The continuous monitoring design is compared to Simon’s two-stage design (Simon 1989). Under the proposed approach the probability of stopping the trial early is higher and, in general, the expected sample size under the null hypothesis is smaller. When the design is assessed for robustness to deviation from continuous monitoring, it generally remains robust, although the type I error rate is inflated (usually less than 10%). The authors provide further considerations of robustness to early termination, estimation bias and comparison to posterior probability designs. Software is available from https://biostatistics.mdanderson.org/SoftwareDownload/SingleSoftware.aspx?Software_Id=84 (last accessed July 2013) to allow implementation.
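With a beta prior, the predictive probability used by Lee and Liu can be computed exactly, because the number of future responses then follows a beta-binomial distribution. The sketch below makes several assumptions. The prior has positive integer parameters (default Beta(1, 1)). The final success criterion is P(p > p0 | all data) > θT. All values are hypothetical. It is not the MD Anderson software referred to above, and the stopping thresholds applied to the predictive probability are left to the user.

```python
from math import comb, exp, lgamma


def beta_tail(a: int, b: int, x: float) -> float:
    """P(p > x) for p ~ Beta(a, b) with positive integer a, b (binomial identity)."""
    m = a + b - 1
    return sum(comb(m, k) * x**k * (1 - x) ** (m - k) for k in range(a))


def log_beta_fn(a: float, b: float) -> float:
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(i: int, m: int, a: float, b: float) -> float:
    """P(i responses among m future patients) when p ~ Beta(a, b)."""
    return comb(m, i) * exp(log_beta_fn(a + i, b + m - i) - log_beta_fn(a, b))


def predictive_probability(x: int, n: int, n_max: int, p0: float,
                           theta_t: float, a0: int = 1, b0: int = 1) -> float:
    """P(trial ends declaring activity | x responses in n patients so far)."""
    m = n_max - n
    a, b = a0 + x, b0 + n - x  # current posterior Beta(a, b)
    pp = 0.0
    for i in range(m + 1):
        # Would the final posterior P(p > p0) exceed theta_t with i more responses?
        if beta_tail(a + i, b + m - i, p0) > theta_t:
            pp += beta_binomial_pmf(i, m, a, b)
    return pp


# Hypothetical interim: 7 responses in 20 patients, n_max = 40, p0 = 0.20, theta_T = 0.90.
print(round(predictive_probability(7, 20, 40, 0.20, 0.90), 3))
```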

3.3.2 Continuous outcome measure

No references identified.

3.3.3 Multinomial outcome measure

Zee et al. (1999)

∙Multi-stage, multinomial outcome

∙Requires programming

∙Early termination for activity or lack of activity

Zee and colleagues propose single-stage and multi-stage single-arm designs with a multinomial outcome, incorporating progressive disease as well as response into the primary outcome measure. Analysis is based on the numbers of responses and progressions observed, compared with predetermined stopping criteria. A computer program written in SAS identifies the operating characteristics of the designs. The program is not noted as being available in the paper; however, detail is given to allow implementation.
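Operating characteristics of count-based rules like these can be obtained by enumerating every possible split of patients into responses, early progressions and other outcomes. The Python sketch below assumes a three-category multinomial outcome and a purely hypothetical single-stage rule: the treatment is considered promising if there are at least 5 responses or at most 3 early progressions among 25 patients. It is not Zee and colleagues' SAS program, and their actual decision criteria should be taken from the paper.

```python
from math import factorial
from typing import Callable


def multinomial_pmf(counts: tuple[int, ...], probs: tuple[float, ...]) -> float:
    """Probability of the given category counts under a multinomial model."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # exact integer multinomial coefficient
    prob = float(coef)
    for c, p in zip(counts, probs):
        prob *= p**c
    return prob


def prob_promising(n: int, p_resp: float, p_prog: float,
                   is_promising: Callable[[int, int], bool]) -> float:
    """P(the rule declares the treatment promising) with n patients.

    Each patient independently responds (p_resp), progresses early (p_prog)
    or has another outcome (1 - p_resp - p_prog).
    """
    p_other = 1.0 - p_resp - p_prog
    total = 0.0
    for r in range(n + 1):              # number of responses
        for d in range(n - r + 1):      # number of early progressions
            if is_promising(r, d):
                total += multinomial_pmf((r, d, n - r - d), (p_resp, p_prog, p_other))
    return total


def hypothetical_rule(responses: int, progressions: int) -> bool:
    """Illustrative rule only: >= 5 responses or <= 3 early progressions."""
    return responses >= 5 or progressions <= 3


# Uninteresting rates (10% response, 40% early PD) vs interesting ones (20%, 20%)
print(prob_promising(25, 0.10, 0.40, hypothetical_rule))
print(prob_promising(25, 0.20, 0.20, hypothetical_rule))
```

The first probability plays the role of a type I error and the second the role of power. In practice the cut-offs would be searched over until both meet their targets.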

3.3.4 Time-to-event outcome measure

Cheung and Thall (2002)

∙Multi-stage, time-to-event outcome

∙Programs noted as being available from authors

∙Early termination for activity or lack of activity

Cheung and Thall propose a Bayesian sequential-adaptive procedure for continuous monitoring, which may be extended to assessment after cohorts of more than one patient, that is, to a multi-stage design. The outcome measure of interest is a binary indicator of a composite time-to-event outcome, utilising all the censored and uncensored observations at each interim assessment. Continuous monitoring based on the approximate posterior (CMAP) is used, following Thall and Simon (1994b). The design can incorporate multiple competing and non-competing outcomes. Early termination is permitted for activity or lack of activity. R programs are noted as being available from the authors to allow implementation of the design. Data from all patients can be incorporated at each interim assessment before their follow-up is complete. The design may therefore be used when follow-up of each patient extends over a non-trivial period of time.

3.3.5 Ratio of times to progression

No references identified.

3.4 Continuous monitoring designs

3.4.1 Binary outcome measure

Thall and Simon (1994b)

∙Continuous monitoring, binary outcome

∙Programs noted as being available from authors

∙Early termination for activity or lack of activity

Thall and Simon propose a Bayesian continuous monitoring design to assess the binary outcome of response in a single-arm trial. The information required includes:

∙prior information on the standard treatment

∙the required improvement due to the experimental treatment

∙minimum and maximum boundaries on sample size.

A flat prior is assumed for the experimental treatment. Also required is a concentration parameter for the experimental treatment, representing the amount of dispersion about its mean. After the response outcome is observed for each patient, the trial may be terminated for lack of activity, terminated for activity or continued to the next patient (although this assessment is not required before the next patient can be recruited). If the maximum sample size is reached and neither the stopping boundary for activity nor that for lack of activity has been crossed, the trial is declared inconclusive. Stopping boundaries are expressed as upper and lower posterior probability limits, calculated by numerical integration. Designs should be assessed by simulation to investigate their operating characteristics. Details regarding software and implementation are presented elsewhere (Thall and Simon 1994a).
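The posterior probabilities that drive these boundaries can be sketched in Python as follows. The sketch makes several assumptions:

∙a Beta(a_s, b_s) prior for the standard-treatment response rate, built from historical data (hypothetically Beta(20, 80), mean 20%)

∙a flat Beta(1, 1) prior for the experimental rate

∙integer Beta parameters, so that the inner probability is an exact binomial sum.

The outer integral over the standard rate uses the midpoint rule. The decision function has the general form described above. It stops for activity when the experimental rate is very likely to exceed the standard rate, and for lack of activity when it is unlikely to exceed it by the targeted improvement delta. The thresholds are hypothetical and would in practice be tuned by simulation.

```python
from math import comb, exp, lgamma, log


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at 0 < x < 1."""
    log_norm = lgamma(a) + lgamma(b) - lgamma(a + b)
    return exp((a - 1) * log(x) + (b - 1) * log(1 - x) - log_norm)


def prob_exceeds(x: int, n: int, delta: float, a_s: int, b_s: int,
                 a_e: int = 1, b_e: int = 1, grid: int = 2000) -> float:
    """P(theta_E > theta_S + delta | x responses in n patients on E).

    theta_S ~ Beta(a_s, b_s) is the historical prior and is not updated.
    theta_E has prior Beta(a_e, b_e), so its posterior is Beta(a_e + x, b_e + n - x).
    """
    a_post, b_post = a_e + x, b_e + n - x
    upper = 1.0 - delta            # the event is impossible if theta_S >= 1 - delta
    h = upper / grid
    total = 0.0
    for i in range(grid):
        s = (i + 0.5) * h          # midpoint of the i-th sub-interval
        # exact P(Beta(a_post, b_post) > s + delta) via the binomial identity
        tail = binom_cdf(a_post - 1, a_post + b_post - 1, s + delta)
        total += beta_pdf(s, a_s, b_s) * tail
    return total * h


def monitor(x: int, n: int, delta: float = 0.15, p_lower: float = 0.05,
            p_upper: float = 0.95, a_s: int = 20, b_s: int = 80) -> str:
    """Hypothetical continuous-monitoring decision after n patients."""
    if prob_exceeds(x, n, 0.0, a_s, b_s) > p_upper:
        return "stop: activity"
    if prob_exceeds(x, n, delta, a_s, b_s) < p_lower:
        return "stop: lack of activity"
    return "continue"


for responses in (0, 3, 6):
    print(responses, "of 15:", monitor(responses, 15))
```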

Thall and Simon (1994a)

∙Continuous monitoring, binary outcome

∙Programs noted as being available from authors

∙Early termination for lack of activity

Thall and Simon present sample size calculations for their original Bayesian continuous monitoring design (Thall and Simon 1994b), outlined above. Adaptations to this design are also provided. The first adaptation considers early stopping boundaries for inconclusive results. The second adaptation permits early termination for lack of activity only, and therefore uses only lower stopping boundaries. Software is noted as being available upon request to compute and implement the designs, including the original continuous monitoring design (Thall and Simon 1994b).

Tan and Xiong (1996)

∙Continuous monitoring, binary outcome

∙Programs available on website

∙Early termination for activity or lack of activity

Tan and Xiong propose a group-sequential (or continuous monitoring) design for the assessment of a binary outcome in a single-arm trial, based on the SCPRT. The design is built around comparison with an RFSST, such as that proposed by Fleming (1982), and the results that this test would achieve. The aim is to preserve the power of the RFSST while adding opportunities to terminate the trial early. The proposed design provides similar power to the fixed sample size test but allows more opportunity to terminate the trial early (for activity or lack of activity). A FORTRAN program is available via the website (http://lib.stat.cmu.edu/designs/scprtbin (last accessed July 2013)) to compute the design characteristics.
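The SCPRT itself is not reproduced here, but the kind of fixed sample size reference test it is designed to match can be sketched. The Python function below searches for the smallest single-stage exact binomial design that rejects the null hypothesis when at least r responses are seen among n patients. It requires a type I error of at most alpha at the uninteresting rate p0 and power of at least 1 - beta at the target rate p1. This is one common construction of a single-stage test, assumed here for illustration; it is not taken from Tan and Xiong's FORTRAN program.

```python
from math import comb


def binom_sf(r: int, n: int, p: float) -> float:
    """P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(r, n + 1))


def single_stage_design(p0: float, p1: float, alpha: float, beta: float,
                        n_max: int = 200):
    """Smallest (n, r) such that 'reject H0 if X >= r' has
    P(X >= r | p0) <= alpha and P(X >= r | p1) >= 1 - beta."""
    for n in range(1, n_max + 1):
        for r in range(n + 1):
            if binom_sf(r, n, p0) <= alpha:
                # smallest cut-off controlling the type I error at this n;
                # a larger r could only lower the power
                if binom_sf(r, n, p1) >= 1 - beta:
                    return n, r
                break
    return None


n, r = single_stage_design(p0=0.05, p1=0.20, alpha=0.05, beta=0.20)
print(f"n = {n}, reject H0 if at least {r} responses")
```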

Chen and Chaloner (2006)

∙Continuous monitoring, binary outcome

∙Programs noted as being available from authors

∙Early termination for lack of activity

Chen and Chaloner propose a stopping rule for a Bayesian continuous monitoring design. Stopping rules are based on two posterior probabilities: that the failure rate is unacceptably high, and that it is acceptably low. These high and low values are derived from historical data. Patients are recruited until either the stopping rules are met or the maximum sample size has been reached. Programs in R are noted as being available by contacting the authors. They compute the stopping boundaries and operating characteristics from the following inputs:

∙the maximum sample size available

∙prior information on the experimental treatment

∙the null and alternative hypothesis rates

∙the upper and lower posterior probability bounds.

Early termination is permitted only for lack of activity.
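A minimal Python sketch of how such a futility boundary can be tabulated is shown below. It assumes a Beta(a, b) prior on the failure rate with integer parameters (a uniform Beta(1, 1) by default), so that posterior tail probabilities are exact binomial sums. The way the two posterior probabilities are combined is hypothetical: stop when the probability of an unacceptably high failure rate is at least c_high and the probability of an acceptably low failure rate is at most c_low. The rates and thresholds are also hypothetical; Chen and Chaloner's own rule and R programs should be used in practice.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def posterior_tails(failures: int, n: int, theta_low: float, theta_high: float,
                    a: int = 1, b: int = 1) -> tuple[float, float]:
    """Return (P(theta > theta_high | data), P(theta < theta_low | data)).

    The posterior is Beta(a + failures, b + n - failures); for integer parameters
    P(Beta(A, B) > x) = P(Binomial(A + B - 1, x) <= A - 1).
    """
    A, B = a + failures, b + n - failures
    p_high = binom_cdf(A - 1, A + B - 1, theta_high)
    p_low = 1.0 - binom_cdf(A - 1, A + B - 1, theta_low)
    return p_high, p_low


def should_stop(failures: int, n: int, theta_low: float = 0.2, theta_high: float = 0.4,
                c_high: float = 0.8, c_low: float = 0.1) -> bool:
    """Hypothetical rule for stopping for lack of activity."""
    p_high, p_low = posterior_tails(failures, n, theta_low, theta_high)
    return p_high >= c_high and p_low <= c_low


def stopping_boundary(n_max: int) -> dict[int, int]:
    """Smallest number of failures that triggers stopping, for each n."""
    bounds = {}
    for n in range(1, n_max + 1):
        for f in range(n + 1):
            if should_stop(f, n):
                bounds[n] = f
                break
    return bounds


for n, f in stopping_boundary(30).items():
    print(f"after {n:2d} patients: stop if failures >= {f}")
```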

Lee and Liu (2008)

∙Continuous monitoring, binary outcome

∙Programs available from website

∙Early termination for lack of activity or activity

Lee and Liu outline a Bayesian group-sequential/continuous monitoring design based on a binary outcome and the use of predictive probabilities (the probability of a positive result should the trial run to conclusion, given the interim data observed). The design incorporates early termination for lack of activity as well as for activity. The continuous monitoring design is compared with Simon’s two-stage design (Simon 1989). Under the proposed approach the probability of stopping the trial early is higher and, in general, the expected sample size under the null hypothesis is smaller. When the design is assessed for robustness to deviation from continuous monitoring, the type I error rate is inflated (usually to less than 10%), but the design generally remains robust. The authors provide further considerations of robustness to early termination, estimation bias and comparison with posterior probability designs. Software is available from https://biostatistics.mdanderson.org/SoftwareDownload/SingleSoftware.aspx?Software_Id=84 (last accessed July 2013) to allow implementation.

Johnson and Cook (2009)

∙Continuous monitoring, binary outcome

∙Programs available on website

∙Early termination for lack of activity or activity

Johnson and Cook propose a Bayesian continuous monitoring design based on formal hypothesis tests. They argue that, in contrast to Bayesian designs based on posterior credible intervals, any misspecification of prior densities associated with the alternative hypothesis cannot bias the trial results in favour of the null hypothesis when the proposed formal hypothesis test approach is used. Analysis is performed after data are available for each patient, and the trial may be terminated early for activity or lack of activity. Software is available from https://biostatistics.mdanderson.org/SoftwareDownload/SingleSoftware.aspx?Software_Id=94 (last accessed July 2013) which allows the trial to be designed according to user-specified priors.

3.4.2 Continuous outcome measure

No references identified.

3.4.3 Multinomial outcome measure

No references identified.

3.4.4 Time-to-event outcome measure

Cheung and Thall (2002)

∙Continuous monitoring, time-to-event outcome

∙Programs noted as being available from authors

∙Early termination for activity or lack of activity

Cheung and Thall propose a Bayesian sequential-adaptive procedure for continuous monitoring. The outcome measure of interest is a binary indicator of a composite time-to-event outcome, utilising all the censored and uncensored observations at each interim assessment. Continuous monitoring based on the approximate posterior (CMAP) is used, following Thall and Simon (1994b). The design can incorporate multiple competing and non-competing outcomes. Early termination is permitted for activity or lack of activity. R programs are noted as being available from the authors to allow implementation of the design. Data from all patients can be incorporated at each assessment before their follow-up is complete. The design may therefore be used when follow-up of each patient extends over a non-trivial period of time.

Thall et al. (2005)

∙Continuous monitoring, time-to-event outcome

∙Programs noted as being available from authors

∙Early termination for activity or lack of activity

Thall and colleagues propose Bayesian continuous monitoring designs that incorporate three time-to-event outcomes: death, disease progression and serious adverse event (SAE). Various amendments to the design are proposed:

∙randomisation

∙frequent interval monitoring

∙alternative distributional assumptions

∙incorporation of interval censoring for disease progression.

The trial may be stopped early for lack of activity or for activity. Simulations are performed to establish the operating characteristics of the designs. Programs are noted as being available from the authors upon request.

Johnson and Cook (2009)

∙Continuous monitoring, time-to-event outcome

∙Programs available on website

∙Early termination for lack of activity or activity

Johnson and Cook propose a Bayesian continuous monitoring design based on formal hypothesis tests. They argue that, in contrast to Bayesian designs based on posterior credible intervals, any misspecification of prior densities associated with the alternative hypothesis cannot bias the trial results in favour of the null hypothesis when the proposed formal hypothesis test approach is used. Analysis is performed after data are available for each patient, and the trial may be terminated early for activity or lack of activity. Software is available from https://biostatistics.mdanderson.org/SoftwareDownload/SingleSoftware.aspx?Software_Id=94 (last accessed July 2013) which allows the trial to be designed according to user-specified priors.

3.4.5 Ratio of times to progression

No references identified.

3.5 Decision-theoretic designs

3.5.1 Binary outcome measure

Sylvester and Staquet (1980)

∙Decision-theoretic, binary outcome

∙Requires programming

∙Early termination for activity or lack of activity

Sylvester and Staquet outline a decision-theoretic design in which the sample size and decision cut-offs for the phase II trial depend on two inputs. The first is the number of patients who would be expected to receive the experimental treatment in a subsequent phase III trial. The second is the set of prior probabilities for the response proportion of the experimental treatment in the phase II trial. Examples of specific design scenarios are given; however, the design would require programming to enable implementation. Decision criteria are based on observing a given number of responses. The design allows incorporation of interim assessments, at which the trial may be terminated early for either activity or lack of activity.

3.5.2 Continuous outcome measure

No references identified.

3.5.3 Multinomial outcome measure

No references identified.

3.5.4 Time-to-event outcome measure

No references identified.

3.5.5 Ratio of times to progression

No references identified.

3.6 Three-outcome designs

3.6.1 Binary outcome measure

Lee et al. (1979)

∙Three-outcome design, binary outcome

∙Requires programming

∙Early termination for activity or lack of activity

Lee and colleagues present a two-stage, three-outcome design. The available sample size is pre-specified on non-statistical grounds such as patient availability, and the optimal design is identified within the given constraints. The design is based on determining whether the true response rate is above or below a single pre-specified response rate, incorporating the possibility of declaring an inconclusive result. Tables are presented for a target response rate of 20% only, with upper and lower limits of 30% and 10%, respectively, for determining activity, lack of activity or an inconclusive result. The paper is therefore of limited practical use beyond this specific setting without further work to implement other scenarios. The design may be seen to complement the confidence interval approach to estimating a response rate with given precision.

Storer (1992)

∙Three-outcome design, binary outcome

∙Programs noted as being available from author

∙Early termination for activity or lack of activity

Storer proposes a three-outcome design that adapts single-, two- and multi-stage designs such as those described by Fleming (1982). The event rate of uncertainty is taken to be around the midpoint between the event rate of no interest and the event rate of interest. As described in Chapter 2, various error rates are required to be specified. Here the following must be specified: the probabilities of concluding in favour of the null hypothesis, or of the alternative, when the true response rate lies within the region of uncertainty. Under this design these two error rates are set to be equal. Programs to identify the design are noted as being available from the author upon request. Early termination is permitted for activity or lack of activity in the two- and multi-stage designs.

Sargent et al. (2001)

∙Three-outcome design, binary outcome

∙Requires programming; programs may be available upon request

∙Early termination for lack of activity

Sargent and colleagues propose a single-stage (and two-stage) design with three possible outcomes. As described in Chapter 2, various error rates are required to be specified, corresponding to differing regions of the distribution curves presented in Figure 2.2. Here specific probabilities for concluding uncertainty are specified under both the null and alternative hypotheses (𝜆 and 𝛿, respectively, in Figure 2.2), and these may differ. Tables and formulae are provided for sample size and stopping rule calculation. The design requires programming; however, programs may be available upon request from the authors. Under the two-stage design, early termination is for lack of activity only.
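For a single-stage three-outcome design, each of these error rates is a binomial tail or interval probability. The Python sketch below assumes the following decision rule for X responses among n patients:

∙lack of activity if X ≤ a

∙activity if X ≥ b

∙inconclusive otherwise.

The values n = 25, a = 3, b = 7, p0 = 0.10 and p1 = 0.30 are hypothetical. Sargent and colleagues' tables, rather than this code, should be used to choose a and b.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def three_outcome_probs(n: int, a: int, b: int, p: float) -> tuple[float, float, float]:
    """(P(lack of activity), P(inconclusive), P(activity)) at response rate p,
    for the rule X <= a -> lack of activity, X >= b -> activity."""
    if not (0 <= a and a + 1 < b <= n):
        raise ValueError("need 0 <= a < b - 1 and b <= n")
    low = sum(binom_pmf(k, n, p) for k in range(a + 1))
    high = sum(binom_pmf(k, n, p) for k in range(b, n + 1))
    return low, 1.0 - low - high, high


n, a, b = 25, 3, 7          # hypothetical design
p0, p1 = 0.10, 0.30         # hypothetical uninteresting and target response rates

_, lam, alpha = three_outcome_probs(n, a, b, p0)      # lambda: inconclusive under H0
beta_, delta, power = three_outcome_probs(n, a, b, p1)  # delta: inconclusive under H1
print(f"alpha = {alpha:.3f}, lambda = {lam:.3f}")
print(f"power = {power:.3f}, delta = {delta:.3f}, beta = {beta_:.3f}")
```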

3.6.2 Continuous outcome measure

No references identified.

3.6.3 Multinomial outcome measure

No references identified.

3.6.4 Time-to-event outcome measure

No references identified.

3.6.5 Ratio of times to progression

No references identified.

3.7 Phase II/III designs

There are no phase II/III designs listed in this chapter since these designs require a

control arm to be incorporated in the phase II trial, to enable a seamless transition to

phase III.

4 Designs for single experimental therapies including randomisation

Sarah Brown

This chapter covers two kinds of design. The first incorporates randomisation to a control arm with the intention of a formally powered statistical comparison between the experimental and control arms. The second incorporates randomisation primarily to provide a calibration arm, with no formally powered statistical comparison. The approach taken is indicated for each design listed.

4.1 One-stage designs

4.1.1 Binary outcome measure

Herson and Carter (1986)

∙One-stage, binary outcome

∙No formally powered statistical comparison between arms

∙Requires programming

Herson and Carter consider the inclusion of a randomised calibration group in single-stage phase II trials with a binary endpoint, in order to reduce the risk of false-negative decision-making. Patients are randomised between the current standard treatment (calibration group) and the treatment under investigation. Results in the calibration group are intended largely to assess the credibility of the outcome in the experimental arm, that is, not for formal comparative purposes. Decision criteria are based primarily on the experimental arm results; however, outcomes in the calibration arm are also considered, in order to assess the initial assumptions made regarding the current standard treatment. Thus the trial essentially constitutes two separate designs, one for the experimental arm and one for the calibration arm. Because the control arm results are also assessed, the overall sample size of the trial may be between three and five times that of a non-calibrated design. An example is provided; however, the design will require programming.

A Practical Guide to Designing Phase II Trials in Oncology, First Edition.

Sarah R. Brown, Walter M. Gregory, Chris Twelves and Julia Brown.

© 2014 John Wiley & Sons, Ltd. Published 2014 by John Wiley & Sons, Ltd.

Thall and Simon (1990)

∙One-stage, binary outcome

∙No formally powered statistical comparison between arms

∙Requires programming

Thall and Simon outline a design that incorporates historical data, including their variability, into the design of the trial. Following formulae provided, the proportion of patients randomised to a control arm depends on:

∙the amount of historical control data available

∙the degree of both inter-study and intra-study variability

∙the overall sample size of the phase II study being planned.

Including a sample of patients randomised to a control arm maximises the precision, at the end of the trial, of the experimental-arm response rate estimate relative to the control. Sample size is determined iteratively, and the design would need to be programmed to allow implementation.

Stone et al. (2007b)

∙One-stage, binary outcome

∙Formally powered statistical comparison between arms

∙Standard software available

Stone et al. discuss the use of the progressive disease rate at a given time point (as well as overall progression-free survival) as an outcome measure in randomised phase II trials of cytostatic agents. Formal comparison between the experimental treatment and the control treatment is performed for superiority. However, larger type I error rates than would be used in phase III are incorporated, and large treatment effects are targeted. Together, relaxed type I error rates and large targeted treatment effects reduce sample sizes compared with phase III trials, so such designs may be deemed more realistic for phase II trials.
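The effect of relaxing the type I error and targeting a large difference can be seen in the usual normal-approximation sample size formula for comparing two proportions. The Python sketch below uses that standard formula, with pooled variance under the null, unpooled variance under the alternative, and a one-sided test. The progressive disease rates of 60% and 40% and the error rates are hypothetical. They were chosen only to contrast a phase II-style setting with a phase III-style one and are not figures from Stone et al.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_arm(p_control: float, p_exp: float, alpha_one_sided: float, power: float) -> int:
    """Patients per arm for a one-sided comparison of two proportions
    (normal approximation, 1:1 allocation)."""
    z_a = NormalDist().inv_cdf(1 - alpha_one_sided)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_exp) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p_control * (1 - p_control) + p_exp * (1 - p_exp))) ** 2
    return ceil(numerator / (p_control - p_exp) ** 2)


# Hypothetical progressive disease rates at a fixed time point: 60% control vs 40% experimental
print("phase II-style (alpha 0.20, power 0.80):", n_per_arm(0.60, 0.40, 0.20, 0.80))
print("phase III-style (alpha 0.025, power 0.90):", n_per_arm(0.60, 0.40, 0.025, 0.90))
```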

4.1.2 Continuous outcome measure

Thall and Simon (1990)

∙One-stage, continuous outcome

∙No formally powered statistical comparison between arms

∙Requires programming

Thall and Simon outline a design that incorporates historical data, including their variability, into the design of the trial. Following formulae provided, the proportion of patients randomised to a control arm depends on:

∙the amount of historical control data available

∙the degree of both inter-study and intra-study variability

∙the overall sample size of the phase II study being planned.

Including a sample of patients randomised to a control arm maximises the precision, at the end of the trial, of the experimental-arm outcome estimate relative to the control. Sample size is determined iteratively, and the design would need to be programmed to allow implementation.

Chen and Beckman (2009)

∙One-stage, continuous outcome

∙Formally powered statistical comparison between arms

∙Programming code provided

Chen and Beckman describe an approach to randomised phase II trial design that incorporates optimal error rates. Optimal type I and II error rates for the design are identified by means of an efficiency score function. This function is based on initially proposed error rates and the ratio of sample sizes between phases II and III. The sample size is then calculated with standard phase III-type approaches, using the optimal type I and II error rates identified. Formal comparison with the control arm is incorporated. The design considers the cost efficiency of the phase II and III trials, on the basis of the ratio of sample sizes between phases II and III and the a priori probability of success of the investigational treatment. An R program is provided in the appendix of the manuscript to identify optimal designs.

4.1.3 Multinomial outcome measure

No references identified.

4.1.4 Time-to-event outcome measure

Simon et al. (2001)

∙One-stage, time-to-event outcome

∙Formally powered statistical comparison between arms

∙Standard software available

Simon and colleagues propose what is termed a randomised ‘phase 2.5’ trial design, incorporating intermediate outcome measures such as progression-free survival. The design takes the approach of a phase III trial design, with a formally powered statistical comparison with the control arm for superiority. It incorporates a relaxed significance level, large targeted treatment effects and intermediate outcome measures, resulting in more pragmatic and feasible sample sizes than would be required in a phase III trial. The design is straightforward, following the methodology of phase III trials. However, it should be used only where large treatment differences are realistic, and should not be seen as a way to eliminate phase III testing.
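Because the design follows phase III methodology, the number of events needed for a time-to-event comparison can be approximated with a standard formula. One such formula is Schoenfeld's, d = 4(z_{1-α} + z_{1-β})^2 / (ln HR)^2, for 1:1 allocation and a one-sided test. The Python sketch below applies that formula. The hazard ratios and error rates are hypothetical and simply contrast a relaxed phase II-style setting with a conventional phase III-style one.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hazard_ratio: float, alpha_one_sided: float, power: float) -> int:
    """Schoenfeld approximation to the number of events for a 1:1 log-rank comparison."""
    z_a = NormalDist().inv_cdf(1 - alpha_one_sided)
    z_b = NormalDist().inv_cdf(power)
    return ceil(4 * (z_a + z_b) ** 2 / log(hazard_ratio) ** 2)


# Hypothetical settings
print("phase 2.5-style, HR 0.5, alpha 0.10, power 0.80:", required_events(0.5, 0.10, 0.80))
print("phase III-style, HR 0.75, alpha 0.025, power 0.90:", required_events(0.75, 0.025, 0.90))
```

The required number of events would still need to be converted into a number of patients using assumptions about accrual and follow-up, which are omitted here.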

Stone et al. (2007b)

∙One-stage, time-to-event outcome

∙Formally powered statistical comparison between arms

∙Standard software available

Stone et al. discuss the use of the progressive disease rate at a given time point, as well as overall progression-free survival, as an outcome measure in randomised phase II trials of cytostatic agents. Formal comparison between the experimental treatment and the control treatment is performed for superiority. However, larger type I error rates than would be used in phase III are incorporated, and large treatment effects are targeted. Together, relaxed type I error rates and large targeted treatment effects reduce sample sizes compared with phase III trials, so such designs may be deemed more realistic for phase II trials. This mirrors, for time-to-event outcomes, the designs of Simon et al. described above, which those authors call ‘phase 2.5’ designs (Simon et al. 2001).

Chen and Beckman (2009)

∙One-stage, time-to-event outcome

∙Formally powered statistical comparison between arms

∙Programming code provided

Chen and Beckman describe an approach to randomised phase II trial design that incorporates optimal error rates. Optimal type I and II error rates for the design are identified by means of an efficiency score function. This function is based on initially proposed error rates and the ratio of sample sizes between phases II and III. The sample size is then calculated with standard phase III-type approaches, using the optimal type I and II error rates identified. Formal comparison with the control arm is incorporated. The design considers the cost efficiency of the phase II and III trials, on the basis of the ratio of sample sizes between phases II and III and the a priori probability of success of the investigational treatment. An R program is provided in the appendix of the manuscript to identify optimal designs.

4.1.5 Ratio of times to progression

No references identified.

4.2 Two-stage designs

4.2.1 Binary outcome measure

Whitehead et al. (2009)

∙Two-stage, binary outcome

∙Formally powered statistical comparison between arms

∙Requires programming

∙Early termination for activity or lack of activity

Whitehead and colleagues outline a randomised controlled two-stage design for normally distributed outcome measures that may be extended to the setting of binary and ordinal outcomes. The design allows early termination for activity or lack of activity, and incorporates formal comparison between the experimental and control arms. At the interim assessment, which takes place after approximately half the total number of patients have been recruited, sample size re-estimation may be incorporated if necessary. The methodology employs normal approximations, since sample sizes are generally large enough for these to be adequate. No software is detailed as being available to identify designs; however, programming is noted as being possible in SAS, and detail is provided to allow implementation. Simulation is also required to evaluate potential designs.

Jung (2008)

∙Two-stage, binary outcome

∙Formally powered statistical comparison between arms

∙Programs noted as being available from author

∙Early termination for lack of activity

Jung proposes a randomised controlled extension to Simon’s optimal and minimax designs (Simon 1989) in the context of a binary outcome measure (e.g. response). The experimental arm is formally compared with the control arm and declared worthy of further investigation only if there are sufficiently more responders in the experimental arm. Extensive tables are provided, and programs to identify designs not included in the tables are noted as being available upon request from the author. Extensions to the design include unequal allocation, strict type I and II error control and randomisation to more than one experimental arm.
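The central quantity in a comparison of this kind is the probability that the experimental arm has at least d more responders than the control arm. The Python sketch below computes this exactly for a single stage with n patients per arm. It is a deliberate simplification: Jung's design is two-stage, and its exact error calculations should be taken from the paper. The values of n, d and the response rates are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_difference_at_least(d: int, n: int, p_exp: float, p_ctrl: float) -> float:
    """P(X_E - X_C >= d) for independent X_E ~ Bin(n, p_exp), X_C ~ Bin(n, p_ctrl)."""
    total = 0.0
    for x_c in range(n + 1):
        start = max(x_c + d, 0)   # smallest X_E giving a difference of at least d
        tail = sum(binom_pmf(x_e, n, p_exp) for x_e in range(start, n + 1))
        total += binom_pmf(x_c, n, p_ctrl) * tail
    return total


def smallest_cutoff(n: int, p0: float, alpha: float) -> int:
    """Smallest d whose type I error at p_exp = p_ctrl = p0 is at most alpha."""
    for d in range(-n, n + 2):
        if prob_difference_at_least(d, n, p0, p0) <= alpha:
            return d
    raise ValueError("no cut-off found")


n, p0, p1 = 40, 0.20, 0.40  # hypothetical per-arm size and response rates
d = smallest_cutoff(n, p0, alpha=0.10)
print(f"declare promising if X_E - X_C >= {d}")
print("type I error:", prob_difference_at_least(d, n, p0, p0))
print("power:", prob_difference_at_least(d, n, p1, p0))
```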

Jung and George (2009)

∙Two-stage, binary outcome

∙Formally powered statistical comparison between arms

∙Requires minimal programming

∙Early termination for lack of efficacy

Jung and George propose methods of comparing treatment arms in a randomised phase II trial, where the intention is either to determine whether a single treatment is worthy of evaluation compared with a control or to select one treatment from many for further evaluation. For a single experimental treatment versus control, the design first evaluates the control and experimental arms independently, following Simon’s two-stage design (Simon 1989) or similar. The experimental treatment must first be accepted via this evaluation, that is, compared with historical control rates. It is then formally compared with the concurrent control arm. The experimental treatment is deemed worthy of further evaluation if the treatment difference between the two arms is above some pre-defined value. No software is detailed; however, detail is given that should allow implementation, and sufficient examples are also provided. The initial two-stage design can be calculated using standard software available for Simon’s two-stage design.
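The first step relies on Simon's two-stage design, so it is useful to be able to reproduce that design's operating characteristics. For a given design (r1/n1, r/n) and true response rate p, the Python sketch below computes three quantities:

∙the probability of early termination (PET)

∙the probability of declaring the treatment promising

∙the expected sample size.

As a check, it is applied to Simon's optimal design for p0 = 0.10 and p1 = 0.30 with alpha = 0.05 and beta = 0.20 (r1/n1 = 1/10, r/n = 5/29). The function names are illustrative and are not those of any particular package.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def simon_oc(r1: int, n1: int, r: int, n: int, p: float) -> tuple[float, float, float]:
    """Return (PET, P(declare promising), expected sample size) at response rate p.

    Stop after stage 1 if X1 <= r1; otherwise treat n - n1 more patients and
    declare the treatment promising if the total number of responses exceeds r.
    """
    n2 = n - n1
    pet = sum(binom_pmf(x1, n1, p) for x1 in range(r1 + 1))
    p_promising = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        need = max(r - x1 + 1, 0)  # stage-2 responses needed for a total > r
        tail = sum(binom_pmf(x2, n2, p) for x2 in range(need, n2 + 1))
        p_promising += binom_pmf(x1, n1, p) * tail
    expected_n = n1 + (1 - pet) * n2
    return pet, p_promising, expected_n


pet0, alpha, en0 = simon_oc(1, 10, 5, 29, p=0.10)
_, power, _ = simon_oc(1, 10, 5, 29, p=0.30)
print(f"PET = {pet0:.3f}, EN = {en0:.1f}, alpha = {alpha:.3f}, power = {power:.3f}")
```

Under p0 the printed PET (about 0.74) and expected sample size (about 15.0) should agree with the values tabulated by Simon for this design.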

4.2.2 Continuous outcome measure

Whitehead et al. (2009)

∙Two-stage, continuous outcome

∙Formally powered statistical comparison between arms

∙Requires programming

∙Early termination for activity or lack of activity

Whitehead and colleagues outline a randomised controlled two-stage design for normally distributed outcome measures. The design allows early termination for activity or lack of activity, and incorporates formal comparison between the experimental and control arms. At the interim assessment, which takes place after approximately half the total number of patients have been recruited, sample size re-estimation may be incorporated if necessary. The methodology employs normal approximations, since sample sizes are generally large enough for these to be adequate. No software is detailed as being available to identify designs; however, programming is noted as being possible in SAS, and detail is provided to allow implementation. Simulation is also required to evaluate potential designs. The authors note that the design may be extended to binary and ordinal outcome measures.
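Potential designs must be evaluated by simulation, so a minimal Python simulation of a two-stage comparison of a normally distributed outcome is sketched below. It assumes a known common standard deviation and an interim analysis after half the patients in each arm. It also assumes hypothetical z-statistic boundaries:

∙stop for lack of activity if z1 ≤ 0

∙stop for activity if z1 ≥ 2.5

∙declare activity at the end if z ≥ 1.3.

The sketch does not implement Whitehead and colleagues' boundary calculations or their sample size re-estimation. Stage-wise sample means are drawn directly from their normal sampling distributions. This is equivalent to simulating individual patients but faster.

```python
import random
from math import sqrt


def simulate(delta: float, sigma: float = 1.0, n_per_arm: int = 60,
             futility: float = 0.0, efficacy_interim: float = 2.5,
             efficacy_final: float = 1.3, n_sims: int = 20000, seed: int = 2024):
    """Return (P(declare activity), expected total sample size) by simulation."""
    rng = random.Random(seed)
    n1 = n_per_arm // 2            # patients per arm at the interim analysis
    n2 = n_per_arm - n1            # patients per arm in stage 2
    wins, patients = 0, 0
    for _ in range(n_sims):
        # stage-1 sample means for the experimental (E) and control (C) arms
        e1 = rng.gauss(delta, sigma / sqrt(n1))
        c1 = rng.gauss(0.0, sigma / sqrt(n1))
        z1 = (e1 - c1) / (sigma * sqrt(2 / n1))
        if z1 >= efficacy_interim or z1 <= futility:
            wins += int(z1 >= efficacy_interim)
            patients += 2 * n1
            continue
        # stage-2 sample means, then cumulative means over both stages
        e2 = rng.gauss(delta, sigma / sqrt(n2))
        c2 = rng.gauss(0.0, sigma / sqrt(n2))
        e = (n1 * e1 + n2 * e2) / n_per_arm
        c = (n1 * c1 + n2 * c2) / n_per_arm
        z = (e - c) / (sigma * sqrt(2 / n_per_arm))
        wins += int(z >= efficacy_final)
        patients += 2 * n_per_arm
    return wins / n_sims, patients / n_sims


print("no difference:    ", simulate(delta=0.0))
print("difference 0.5 SD:", simulate(delta=0.5))
```

Running the function over a grid of boundaries and true differences gives the type I error, power and expected sample size needed to compare candidate designs.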

4.2.3 Multinomial outcome measure

Whitehead et al. (2009)

∙Two-stage, multinomial outcome

∙Formally powered statistical comparison between arms

∙Requires programming

∙Early termination for activity or lack of activity

Whitehead and colleagues outline a randomised controlled two-stage design for normally distributed outcome measures, which may be extended to binary and ordinal outcome measures. The design allows early termination for activity or lack of activity, and incorporates formal comparison between the experimental and control arms. At the interim assessment, which takes place after approximately half the total number of patients have been recruited, sample size re-estimation may be incorporated if necessary. The methodology employs normal approximations, since sample sizes are generally large enough for these to be adequate. No software is detailed as being available to identify designs; however, programming is noted as being possible in SAS, and detail is provided to allow implementation. Simulation is also required to evaluate potential designs.

Sun et al. (2009)

∙Two-stage, multinomial outcome

∙Formally powered statistical comparison between arms

∙Software noted as being available from author

∙Early termination for lack of activity

Sun and colleagues propose a randomised two-stage design based on Zee’s single-arm multi-stage design with a multinomial outcome measure (Zee et al. 1999). They adjust the rules so that either a sufficiently high response rate or a sufficiently low early progressive disease rate warrants further investigation of the treatment. Optimal and minimax designs are proposed following the methodology of Simon (1989). Differences in response and progressive disease rates between the control and experimental arms are compared. The authors note that the intention of the phase II trial is to screen for potential efficacy rather than to identify statistically significant differences. An extension to the multi-arm selection setting is also proposed. Detail is given regarding how to implement the designs in practice, and software to allow identification of designs is noted as being available by contacting the first author. Early termination is permitted for lack of activity only. The authors also note that the design may be extended to studies monitoring safety and efficacy simultaneously.

4.2.4 Time-to-event outcome measure

No references identified.

4.2.5 Ratio of times to progression

No references identified.

4.3 Multi-stage designs

4.3.1 Binary outcome measure

No references identified.

4.3.2 Continuous outcome measure

Cronin et al. (1999)

∙Multi-stage, continuous outcome

∙Formally powered statistical comparison between arms

∙Standard software available for sample size

∙Early termination for activity or lack of activity

Cronin and colleagues propose a Bayesian design for monitoring phase II trials. The design incorporates both sceptical and indifferent priors at each of the interim analyses, according to the hypothesis being tested. Early termination is permitted for activity or lack of activity, and as such, the priors differ between the interim and final analyses. Posterior distributions are updated at each analysis. When compared with frequentist group-sequential methods, the proposed Bayesian methods performed at least as well for the main purpose of detecting ineffective treatments early, although the Bayesian method was the slowest to stop when the treatment had clear biological activity. The authors note that the Bayesian method provides flexibility to make changes to outcome measures, analyses and original trial plans at interim analyses without introducing theoretical statistical complications. Standard software is available for sample size calculation.
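A minimal sketch of how sceptical and indifferent priors might be combined at an interim analysis, assuming a normally distributed treatment difference δ with a known standard error and conjugate normal priors. Here a flat prior stands in for the "indifferent" prior, and the stopping thresholds (0.95 and 0.05), the clinically relevant difference delta_min and the function names are hypothetical rather than taken from Cronin et al.

```python
from statistics import NormalDist


def posterior_normal(est, se, prior_mean=0.0, prior_sd=None):
    """Conjugate normal update for a treatment difference delta.

    est, se: observed difference (experimental minus control) and its standard error.
    prior_sd=None stands in for a flat ('indifferent') prior; a finite prior_sd
    centred on prior_mean=0 gives a sceptical prior.
    """
    if prior_sd is None:
        return est, se
    w_prior, w_data = 1.0 / prior_sd**2, 1.0 / se**2
    post_var = 1.0 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * est)
    return post_mean, post_var**0.5


def prob_delta_above(threshold, est, se, prior_mean=0.0, prior_sd=None):
    """Posterior probability that delta exceeds threshold."""
    mean, sd = posterior_normal(est, se, prior_mean, prior_sd)
    return 1.0 - NormalDist(mean, sd).cdf(threshold)


def interim_decision(est, se, delta_min, sceptical_sd,
                     stop_active=0.95, stop_inactive=0.05):
    """Hypothetical interim rule: declare activity only if a sceptic is convinced
    that delta > 0; declare lack of activity if, under a flat prior, delta is
    unlikely to reach the clinically relevant difference delta_min."""
    if prob_delta_above(0.0, est, se, 0.0, sceptical_sd) >= stop_active:
        return "stop: activity"
    if prob_delta_above(delta_min, est, se) <= stop_inactive:
        return "stop: lack of activity"
    return "continue"
```

Using the sceptical prior for the activity decision makes early stopping for activity deliberately harder, which is consistent with the slower stopping for clearly active treatments reported above.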

4.3.3 Multinomial outcome measure

No references identified.

76 A PRACTICAL GUIDE TO DESIGNING PHASE II TRIALS IN ONCOLOGY

4.3.4 Time-to-event outcome measure

No references identified.

4.3.5 Ratio of times to progression

No references identified.

4.4 Continuous monitoring designs

4.4.1 Binary outcome measure

No references identified.

4.4.2 Continuous outcome measure

No references identified.

4.4.3 Multinomial outcome measure

No references identified.

4.4.4 Time-to-event outcome measure

Thall et al. (2005)

∙Continuous monitoring, time-to-event outcome

∙Formally powered statistical comparison between arms

∙Programs available from authors

∙Early termination for activity or lack of activity

Thall and colleagues propose Bayesian continuous monitoring designs that incorporate three time-to-event outcomes (death, disease progression and serious adverse event). Various amendments are proposed to the initial single-arm continuous monitoring design, which assumes an exponential distribution (Cheung and Thall 2002), including randomisation, frequent interval monitoring, alternative distributional assumptions and incorporation of interval censoring for disease progression. The trial may be stopped early for lack of activity or for activity. Simulations are performed to assess the performance of the design. Programs are noted as being available from the authors upon request.
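As a simplified illustration of Bayesian monitoring of a time-to-event outcome, the sketch below assumes exponential event times with conjugate gamma priors on each arm's hazard, so each posterior is again gamma, and it monitors a single outcome, whereas Thall and colleagues monitor three. The prior parameters, stopping thresholds and function names are hypothetical.

```python
import random


def prob_exp_hazard_lower(events_e, time_e, events_c, time_c,
                          a=1.0, b=1.0, draws=20000, seed=1):
    """Monte Carlo estimate of P(lambda_E < lambda_C | data).

    Assumes exponential event times in each arm with independent
    Gamma(shape=a, rate=b) priors on the hazards, so each posterior is
    Gamma(shape=a + events, rate=b + total follow-up time).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # random.gammavariate takes (shape, scale); scale = 1 / rate
        lam_e = rng.gammavariate(a + events_e, 1.0 / (b + time_e))
        lam_c = rng.gammavariate(a + events_c, 1.0 / (b + time_c))
        wins += lam_e < lam_c
    return wins / draws


def monitor(events_e, time_e, events_c, time_c, lower=0.05, upper=0.99):
    """Hypothetical stopping rule, re-applied each time the data are updated."""
    p = prob_exp_hazard_lower(events_e, time_e, events_c, time_c)
    if p < lower:
        return "stop: lack of activity"
    if p > upper:
        return "stop: activity"
    return "continue"
```

In a continuous monitoring design, a rule of this kind would be applied after each new patient or event, and its operating characteristics would be checked by simulation, as the authors do.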

4.4.5 Ratio of times to progression

No references identified.


4.5 Three-outcome designs

4.5.1 Binary outcome measure

Hong and Wang (2007)

∙Three-outcome design, binary outcome measure

∙Formally powered statistical comparison between arms

∙Programs noted as being available from authors

∙Early termination for lack of activity

Hong and Wang detail both a single-stage and a two-stage three-outcome design, which extend that of Sargent et al. (2001) (Chapter 3) to a randomised comparative design. The region of uncertainty lies between the null hypothesis that the difference in response rates between the arms is zero and the alternative hypothesis that the difference is δ. In the two-stage design, the trial may be terminated at the end of the first stage for lack of activity only. A SAS program to identify the design is noted as being available on request from the authors.
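The three-outcome idea can be sketched for the single-stage case as a rule with two critical values on a comparison of response rates: at or below the lower value the experimental treatment is rejected, at or above the upper value it is taken forward, and in between the result falls in the region of uncertainty. The sketch assumes a pooled normal-approximation z-statistic, which may not be the statistic Hong and Wang use, and the critical values are hypothetical inputs.

```python
from math import sqrt


def three_outcome_decision(resp_e, n_e, resp_c, n_c, lower, upper):
    """Single-stage three-outcome rule on the difference in response rates.

    Uses a pooled normal-approximation z-statistic; 'lower' and 'upper' are
    hypothetical critical values with lower < upper.
    """
    p_e, p_c = resp_e / n_e, resp_c / n_c
    p_pool = (resp_e + resp_c) / (n_e + n_c)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_e + 1 / n_c))
    if se == 0:
        raise ValueError("pooled response rate is 0 or 1; z undefined")
    z = (p_e - p_c) / se
    if z <= lower:
        return "reject experimental treatment"
    if z >= upper:
        return "take forward"
    return "region of uncertainty"


# Hypothetical: 40 patients per arm, critical values 0.5 and 1.96
print(three_outcome_decision(14, 40, 10, 40, 0.5, 1.96))
```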

4.5.2 Continuous outcome measure

No references identified.

4.5.3 Multinomial outcome measure

No references identified.

4.5.4 Time-to-event outcome measure

No references identified.

4.5.5 Ratio of times to progression

No references identified.

4.6 Phase II/III designs

4.6.1 Binary outcome measure

Storer (1990)

∙Phase II/III, binary outcome

∙No formal comparison with control in phase II

∙Standard software available for phase II, phase III requires programming

∙No early termination during phase II


Storer proposes a phase II/III design with the same binary outcome at both stages. This corresponds to a single-arm phase II design (e.g. A’Hern 2001) embedded in a randomised phase III trial (i.e. randomisation takes place in phase II, but the design and primary decision-making are based on a single-arm design). The phase II decision criteria are based on the results of the experimental arm only, rather than on a comparison of activity between the experimental and control arms. Sample size calculations for the phase II component may be performed using standard software for one-stage designs, based on a numerical search to satisfy given type I and II error rates and null and alternative hypothesis response rates. Standard approaches to phase III sample size calculation are used, with formulae provided to incorporate an adjustment for the phase II/III design. This design may be used as a basis for phase II/III designs in which any single-arm phase II design is embedded in a phase III trial, including where the outcome measure at phase III differs from that at phase II.

The design described above uses the same outcome measure at phase II as at phase III. Although this may be seen as a seamless phase II/III approach, in effect it reflects a phase III trial with an early interim analysis on the primary outcome measure (albeit based on a single-arm design). In this setting, consideration should be given to the most appropriate outcome measure to use for both the phase II and phase III primary outcome. It is rare that efficacy in the phase III setting could be claimed on the basis of a binary outcome; rather, a time-to-event outcome is usually required in phase III trials.
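The numerical search for a one-stage single-arm phase II component can be sketched with exact binomial probabilities, assuming activity is declared when at least r responses are seen among n patients. This is the generic exact approach; that standard software performs exactly this search is an assumption, and one_stage_design is a hypothetical name.

```python
from math import comb


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


def one_stage_design(p0, p1, alpha=0.05, beta=0.20, n_max=200):
    """Smallest n (and minimum number of responses r) for an exact one-stage
    single-arm design: declare activity if at least r responses are observed."""
    for n in range(1, n_max + 1):
        for r in range(n + 1):
            if binom_sf(r, n, p0) <= alpha:
                # the smallest r controlling the type I error gives the most power at this n
                if binom_sf(r, n, p1) >= 1 - beta:
                    return n, r
                break
    return None


print(one_stage_design(0.2, 0.4))
```

For example, one_stage_design(0.2, 0.4) returns the smallest n, with its r, giving a one-sided type I error of at most 0.05 and power of at least 0.80 for response rates of 0.2 under the null and 0.4 under the alternative hypothesis.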

Lachin and Younes (2007)

∙Phase II/III, binary outcome

∙Formally powered statistical comparison between arms

∙Requires programming

∙No early termination during phase II

Lachin and Younes outline a phase II/III design that incorporates different outcome measures at phases II and III (with the phase II outcome being a shorter-term measure). Joint distributions for the phase II and III outcomes are calculated, and the design operating characteristics and sample sizes are obtained via iteration and numerical integration. An estimate of the correlation between the two outcome measures is required. The design preserves the type I and II error rates, and patients randomised during phase II are included in the phase III analysis. Analysis of the phase II outcome measure considers a formal comparison for lack of activity (or excessive toxicity) only. Software is not detailed as being available; therefore, this design would require programming to allow implementation.
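The role of the correlation can be illustrated by simulation. The sketch assumes that the phase II and phase III test statistics are bivariate normal with unit variances and correlation rho, and estimates the probability of passing a phase II futility boundary c2 and then rejecting at phase III with critical value c3. Lachin and Younes use iteration and numerical integration rather than simulation, and all names and values here are hypothetical.

```python
import random
from math import sqrt


def joint_pass_prob(mu2, mu3, rho, c2, c3, draws=100000, seed=7):
    """Monte Carlo estimate of P(Z2 > c2 and Z3 > c3).

    (Z2, Z3) are the phase II and phase III z-statistics, assumed bivariate
    normal with unit variances, means mu2 and mu3, and correlation rho.
    c2 is a phase II futility boundary and c3 the final critical value.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z2 = mu2 + e1
        z3 = mu3 + rho * e1 + sqrt(1.0 - rho**2) * e2  # gives corr(Z2, Z3) = rho
        hits += (z2 > c2) and (z3 > c3)
    return hits / draws
```

Under the null hypothesis (mu2 = mu3 = 0) the result can never exceed P(Z3 > c3), so a phase II futility stop can only reduce the overall type I error, and how much it does so depends on rho. In practice the correlation between the statistics would be derived from the correlation between the two outcome measures.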

Chow and Tu (2008)

∙Phase II/III, binary outcome

∙Formally powered statistical comparison between arms


∙Requires programming

∙No early termination during phase II

Chow and Tu present sample size formulae for seamless adaptive phase II/III designs in which the outcome measures differ between phases but have the same distribution (e.g. a binary outcome in phase II and a binary outcome in phase III). The design is based on two separate studies, with differing endpoints and durations, which are then combined. Rather than continuing to follow phase II patients to observe the phase III endpoint, their phase II data are used to predict the phase III endpoint for those patients, and these predictions are then combined with the data from the phase III trial. Because the design relies on this prediction, the relationship between the outcome measures at each phase must be known and well established. The design will require programming to enable implementation.

4.6.2 Continuous outcome measure

Liu and Pledger (2005)

∙Phase II/III, continuous outcome

∙Formally powered statistical comparison between arms

∙Requires programming

∙No early termination during phase II

Liu and Pledger detail a phase II/III design for a single experimental treatment compared with a control, and also outline a design for the dose-finding context. In the single experimental treatment setting, the experimental treatment is compared with the control treatment at the end of the phase II trial, based on the short-term continuous outcome measure associated with the phase II trial. Recruitment is not paused during this analysis, and the sample size for the phase III trial may be modified to allow estimation of the standard deviation of the phase III outcome measure. Different outcome measures are used at phases II and III. At the end of the trial, the test statistics from the first and second stages (i.e. phases II and III) are combined. The treatment effect to be detected is the same for both the short- and long-term outcome measures and must be pre-specified, along with prior information on the probability of success and the standard deviation for each outcome measure; this information is used to generate the operating characteristics of the design. Formulae are given, which would need to be implemented to identify the design. The design offers flexibility in that the second-stage (phase III) sample size may be calculated based on updated data from the first stage (phase II), and adaptation rules do not need to be specified in advance.
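One standard way of combining stage-wise test statistics while allowing the second-stage sample size to be recalculated is the weighted inverse-normal combination, sketched below. Whether this matches Liu and Pledger's exact formulation is an assumption. The weights are fixed in advance from the planned stage sizes, which is what keeps the combined statistic valid when the phase III sample size changes.

```python
from math import sqrt
from statistics import NormalDist


def inverse_normal_combination(z1, z2, n1_planned, n2_planned):
    """Weighted inverse-normal combination of stage-wise z-statistics.

    The weights are fixed in advance from the *planned* stage sizes, so the
    combined statistic stays standard normal under the null hypothesis even if
    the stage 2 sample size is later changed (z2 uses stage 2 data only).
    """
    total = n1_planned + n2_planned
    w1, w2 = sqrt(n1_planned / total), sqrt(n2_planned / total)
    z = w1 * z1 + w2 * z2
    return z, 1.0 - NormalDist().cdf(z)  # combined z and one-sided p-value
```

Because w1² + w2² = 1, the combined statistic is standard normal under the null hypothesis, whatever second-stage sample size is eventually used.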


Lachin and Younes (2007)

∙Phase II/III, continuous outcome

∙Formally powered statistical comparison between arms

∙Requires programming

∙No early termination during phase II

Lachin and Younes outline a phase II/III design that incorporates different outcome measures at phases II and III (with the phase II outcome being a shorter-term measure). Joint distributions for the phase II and III outcomes are calculated, and the design operating characteristics and sample sizes are obtained via iteration and numerical integration. An estimate of the correlation between the two outcome measures is required. The design preserves the type I and II error rates, and patients randomised during phase II are included in the phase III analysis. Analysis of the phase II outcome measure considers a formal comparison for lack of activity (or excessive toxicity) only. Software is not detailed as being available; therefore, this design would require programming to allow implementation. Detail is provided for binary and continuous phase II outcome measures, and extensions to time-to-event outcomes are discussed.

Chow and Tu (2008)

∙Phase II/III, continuous outcome

∙Formally powered statistical comparison between arms

∙Requires programming

∙No early termination in phase II

Chow and Tu present sample size formulae for seamless adaptive phase II/III designs in which the outcome measures differ between phases but have the same distribution (e.g. a binary outcome in phase II and a binary outcome in phase III). The design is based on two separate studies, with differing endpoints and durations, which are then combined. Rather than continuing to follow phase II patients to observe the phase III endpoint, their phase II data are used to predict the phase III endpoint for those patients, and these predictions are then combined with the data from the phase III trial. Because the design relies on this prediction, the relationship between the outcome measures at each phase must be known and well established. The design will require programming to enable implementation.

4.6.3 Multinomial outcome measure

No references identified.


4.6.4 Time-to-event outcome measure

Lachin and Younes (2007)

∙Phase II/III, time-to-event outcome

∙Formally powered statistical comparison between arms

∙Requires programming

∙No early termination during phase II

Lachin and Younes outline a phase II/III design that incorporates different outcome measures at phases II and III (with the phase II outcome being a shorter-term measure). Joint distributions for the phase II and III outcomes are calculated, and the design operating characteristics and sample sizes are obtained via iteration and numerical integration. An estimate of the correlation between the two outcome measures is required. The design preserves the type I and II error rates, and patients randomised during phase II are included in the phase III analysis. Analysis of the phase II outcome measure considers a formal comparison for lack of activity (or excessive toxicity) only. Software is not detailed as being available; therefore, this design would require programming to allow implementation.

Chow and Tu (2008)

∙Phase II/III, time-to-event outcome

∙Formally powered statistical comparison between arms

∙Requires programming

∙No early termination in phase II

Chow and Tu present sample size formulae for seamless adaptive phase II/III designs in which the outcome measures differ between phases but have the same distribution (e.g. a binary outcome in phase II and a binary outcome in phase III). The design is based on two separate studies, with differing endpoints and durations, which are then combined. Rather than continuing to follow phase II patients to observe the phase III endpoint, their phase II data are used to predict the phase III endpoint for those patients, and these predictions are then combined with the data from the phase III trial. Because the design relies on this prediction, the relationship between the outcome measures at each phase must be known and well established. The design will require programming to enable implementation.

4.6.5 Ratio of times to progression

No references identified.


4.7 Randomised discontinuation designs

4.7.1 Binary outcome measure

Kopec et al. (1993)

∙Randomised discontinuation, binary outcome

∙Formally powered statistical comparison between arms

∙Requires programming (can incorporate standard software)

∙No early termination

Kopec et al. introduce the randomised discontinuation design. All eligible patients are initially treated with the investigational treatment for a pre-defined period, after which all patients are assessed for response to treatment. Treatment ‘responders’ are randomised either to continue with the investigational treatment or to discontinue it (and instead receive a placebo or current standard treatment). A formal comparison is made between the experimental and control arms at the end of the second stage (i.e. after randomisation). Formulae for calculating the response proportions are provided, based on the sample size needed for the randomised phase to assess relative activity; the design would therefore need programming. The analysis may also be adapted to incorporate data from patients in the first stage, or to change the response requirement for randomisation, for example, to include patients with stable disease or better, as detailed by Rosner et al. (2002). Alternatively, patients achieving a response may continue with treatment, those with progressive disease discontinue treatment and those with stable disease are randomised (Stadler 2007). The original design, which randomises patients who are responding to treatment, may be more applicable to other disease areas, where the life-threatening consequences of discontinuing treatment may be less immediate and the potential ethical implications fewer.
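A rough sizing calculation for a randomised discontinuation trial can be sketched by sizing the randomised phase with the standard normal-approximation formula for comparing two proportions, then inflating for the run-in, since only patients who qualify are randomised. This is not the formula given by Kopec et al.; the parameter names, and the assumption that the outcome is the proportion still responding at the end of the randomised phase, are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def rdd_sample_size(p_continue, p_discontinue, run_in_rate, alpha=0.05, power=0.80):
    """Approximate sizes for a randomised discontinuation trial.

    p_continue / p_discontinue: probability of still responding at the end of
    the randomised phase in each arm (hypothetical outcome definition).
    run_in_rate: proportion of enrolled patients expected to qualify for
    randomisation after the run-in. Uses the standard two-sided
    normal-approximation formula for comparing two proportions.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p_continue + p_discontinue) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p_continue * (1 - p_continue)
                              + p_discontinue * (1 - p_discontinue)))
    n_per_arm = ceil(numerator**2 / (p_continue - p_discontinue) ** 2)
    n_randomised = 2 * n_per_arm
    # every randomised patient must first pass through the run-in
    n_enrolled = ceil(n_randomised / run_in_rate)
    return n_per_arm, n_randomised, n_enrolled
```

For instance, rdd_sample_size(0.6, 0.3, 0.5) returns (42, 84, 168): 42 patients per arm in the randomised phase, and about 168 enrolled if half of all patients qualify for randomisation after the run-in.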

4.7.2 Continuous outcome measure

No references identified.

4.7.3 Multinomial outcome measure

No references identified.

4.7.4 Time-to-event outcome measure

No references identified.

4.7.5 Ratio of times to progression

No references identified.

5 Treatment selection designs

Sarah Brown

The designs described in this chapter specifically address the question of treatment selection, that is, they incorporate randomisation to multiple experimental treatment arms. It is, however, also possible to consider treatment selection using the single-arm or randomised phase II designs described in Chapters 3 and 4. In this case, the aim is to show that each experimental treatment has sufficient activity (and tolerability, if appropriate) before treatment selection is performed. A treatment may then be selected from those experimental arms found to be sufficiently active (and tolerable, if appropriate), for example, using selection designs such as those described by Sargent and Goldberg (2001) or Simon et al. (1985) (see Section 5.2.1 for further details). These designs select the most active treatment with a pre-specified probability of correct selection, according to the difference in activity observed between the experimental arms. Combining these selection designs with other phase II designs in this way ensures that the treatments considered for selection have already passed pre-specified minimum activity criteria (and possibly tolerability criteria) prior to selection. Steinberg and Venzon provide an example of such an approach, as described in Section 5.2.2 (Steinberg and Venzon 2002). The efficiency of such an approach, compared with the alternative treatment selection designs described in this chapter, should be considered in further detail on a trial-specific basis.

The designs within this chapter are organised as follows. First, designs including

a control arm are described in Section 5.1, organised by design category and by

outcome measure distribution. Second, in Section 5.2, designs that do not include

a control arm are presented, again by design category and by outcome measure.

Treatment selection designs that incorporate both activity and toxicity are presented

separately in Section 6.4.

A Practical Guide to Designing Phase II Trials in Oncology, First Edition.

Sarah R. Brown, Walter M. Gregory, Chris Twelves and Julia Brown.

© 2014 John Wiley & Sons, Ltd. Published 2014 by John Wiley & Sons, Ltd.


5.1 Including a control arm

5.1.1 One-stage designs

5.1.1.1 Binary outcome measure

No references identified.

5.1.1.2 Continuous outcome measure

No references identified.

5.1.1.3 Multinomial outcome measure

Whitehead and Jaki (2009)

∙One-stage, multinomial outcome, control arm

∙Formal comparison with control for selection

∙Programs noted as being available from authors

∙No early termination

Whitehead and Jaki propose one- and two-stage designs for phase II trials based on ordered categorical outcomes, where the aim of the trial is to select a single treatment to take forward to phase III evaluation. The design is randomised to incorporate a formal comparison with a control arm, and hypothesis testing is based on the Mann–Whitney statistic. The treatment with the smallest p-value for the comparison with control is selected as the treatment to take forward for further investigation. Details of sample size and critical value calculation are provided, and R code is noted as being available from the authors to allow implementation. Both the worthwhile treatment effect and the small positive treatment effect that is not worth further investigation must be specified.
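The selection step can be sketched as a one-sided Mann–Whitney comparison of each experimental arm with control on the ordered categories, followed by choosing the arm with the smallest p-value. The sketch uses the large-sample normal approximation with the usual tie correction and assumes higher categories are better; it does not reproduce Whitehead and Jaki's sample size or critical value calculations, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_ordinal(exp_counts, ctrl_counts):
    """One-sided Mann-Whitney test that the experimental arm tends to fall in
    better (higher) ordered categories than control.

    Counts are given per category, ordered from worst to best. Uses the normal
    approximation with the usual correction for ties.
    """
    n_e, n_c = sum(exp_counts), sum(ctrl_counts)
    u = 0.0
    for i, a in enumerate(exp_counts):
        for j, b in enumerate(ctrl_counts):
            if i > j:
                u += a * b        # experimental patient in a better category
            elif i == j:
                u += 0.5 * a * b  # a tie counts one half
    n = n_e + n_c
    tie_term = sum((a + b) ** 3 - (a + b) for a, b in zip(exp_counts, ctrl_counts))
    var_u = n_e * n_c / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n_e * n_c / 2.0) / sqrt(var_u)
    return u, 1.0 - NormalDist().cdf(z)


def select_treatment(arms, control):
    """Select the experimental arm with the smallest one-sided p-value vs control."""
    pvals = {name: mann_whitney_ordinal(counts, control)[1]
             for name, counts in arms.items()}
    return min(pvals, key=pvals.get), pvals


# Hypothetical counts in three ordered categories (worst to best)
print(select_treatment({"A": [2, 5, 8], "B": [8, 5, 2]}, [5, 5, 5])[0])
```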

5.1.1.4 Time-to-event outcome measure

No references identified.

5.1.1.5 Ratio of times to progression

No references identified.

5.1.2 Two-stage designs

5.1.2.1 Binary outcome measure

Jung (2008)

∙Two-stage, binary outcome, control arm

∙Formal comparison with control for selection

TREATMENT SELECTION DESIGNS 85

∙Programs noted as being available from author

∙Early termination for lack of activity

Jung proposes a randomised controlled extension to Simon’s optimal and minimax designs (Simon 1989), considering a binary outcome measure and incorporating early termination for lack of activity. The experimental arms are compared with the control arm at the end of stage 1, and treatments may be dropped for lack of activity; more than one experimental arm may therefore be taken forward to stage 2. If no treatment shows improved activity over the control arm at the end of stage 1, the trial may be terminated for lack of activity. At the end of stage 2, all arms that pass the stage 2 cut-off boundaries compared with control are deemed worthy of further investigation. The selection design is an extension of the design comparing a single experimental arm with a control. In the selection design, the family-wise error rate, that is, the probability of erroneously accepting at least one inactive treatment, is controlled. Programs to identify designs are available upon request from the author.
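The two-stage decision logic with a shared control can be sketched as follows, assuming equal numbers of patients per arm within each stage, so that decisions can be based on the excess number of responders over control. The boundaries a1 and b are hypothetical inputs; Jung's method for choosing them to control the family-wise error rate is not reproduced.

```python
def two_stage_vs_control(ctrl1, exp1, ctrl2, exp2, a1, b):
    """Decision logic for a two-stage, multi-arm comparison with a shared control.

    ctrl1/ctrl2: responders on control in stages 1 and 2.
    exp1/exp2: dicts mapping arm name -> responders in stage 1 / stage 2
    (exp2 only needs entries for arms that continue).
    a1, b: hypothetical boundaries on the excess number of responders over
    control, assuming equal numbers of patients per arm within each stage.
    Returns (arms continuing to stage 2, arms accepted at the end of stage 2).
    """
    continuing = [arm for arm, x in exp1.items() if x - ctrl1 > a1]
    if not continuing:
        return [], []  # no arm beats control: stop for lack of activity
    accepted = [arm for arm in continuing
                if (exp1[arm] + exp2[arm]) - (ctrl1 + ctrl2) > b]
    return continuing, accepted


# Hypothetical data: arm B is dropped at stage 1, arm A is accepted at stage 2
print(two_stage_vs_control(3, {"A": 6, "B": 3}, 4, {"A": 7}, a1=1, b=4))
```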

Jung and George (2009)

∙Two-stage, binary outcome, control arm

∙Formal comparison with control for selection

∙Requires minimal programming

∙Early termination for lack of activity

Jung and George propose methods of comparing treatment arms in a randomised phase II trial, where the intention is either to select one treatment from many for further evaluation or to determine whether a single treatment is worthy of evaluation compared with a control. The phase II design is based on a k-armed trial (with or without a control arm for selection), with each arm designed for independent evaluation following Simon’s two-stage design (Simon 1989), or similar, based on historical control data; that is, no comparison is made with the control arm at this stage. Different designs (i.e. the same two-stage design but with different operating characteristics) may be used for different arms in the independent evaluation if deemed necessary. A treatment must be accepted via the independent evaluation before it can be considered for selection, at which point comparisons may be made with the control arm. p-Values are calculated to represent the probability that the difference between the arms being compared is at least some pre-defined minimal accepted difference, given the actual difference observed. The outcome measure used to select the better treatment is the same as that used to evaluate each arm independently, for example, tumour response. No software is detailed; however, sufficient detail and examples are given to allow implementation. The initial two-stage design can be calculated using software available for Simon’s two-stage design.
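Because each arm is first evaluated with a Simon two-stage design, the operating characteristics of that design are a natural building block. The sketch computes, for a true response rate p, the probability of early termination, the expected sample size and the probability of declaring the arm active. The function names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); 0 for k < 0."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(i, n, p) for i in range(min(k, n) + 1))


def simon_operating_characteristics(r1, n1, r, n, p):
    """Operating characteristics of a Simon two-stage design at true rate p.

    Stop after stage 1 if <= r1 responses in n1 patients; otherwise treat n - n1
    more and declare the arm active if > r responses in total.
    Returns (P(early termination), expected sample size, P(declared active)).
    """
    n2 = n - n1
    pet = binom_cdf(r1, n1, p)
    # need more than r responses overall, i.e. x2 > r - x1 in stage 2
    p_active = sum(binom_pmf(x1, n1, p) * (1.0 - binom_cdf(r - x1, n2, p))
                   for x1 in range(r1 + 1, n1 + 1))
    expected_n = n1 + (1.0 - pet) * n2
    return pet, expected_n, p_active
```

With r1/n1 = 1/10 and r/n = 5/29 (reported by Simon (1989) as the optimal design for p0 = 0.10, p1 = 0.30, α = 0.05, β = 0.20), the probability of early termination at p = 0.10 is about 0.74 and the expected sample size about 15.0.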


5.1.2.2 Continuous outcome measure

Levy et al. (2006)

∙Two-stage, continuous outcome, control arm

∙No formal comparison with control for selection

∙Requires programming

∙No early termination; treatment selection at the end of stage 1

Levy et al. propose a randomised two-stage futility design incorporating treatment selection at the end of the first stage. At that point, the ‘best’ treatment is selected as the one with the highest (or lowest, as appropriate) mean outcome; that is, no comparison with control is made. The first-stage sample size is calculated to give at least 80% probability of correct selection. Patients then continue to be randomised between control and the selected treatment, and data from the first stage are incorporated into the second-stage futility analysis, with a bias correction. The null hypothesis is that the selected treatment reduces the mean response by at least x% compared with control; the alternative hypothesis is that it reduces the mean response by less than x% (reflecting a futility design). Sample size and power calculation details are provided in appendices.
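The requirement of at least 80% probability of correct selection can be checked by simulation. The sketch assumes normally distributed outcomes with a common, known standard deviation, higher values being better, and a configuration in which one arm is better than the others by delta; it simulates each arm's sample mean directly and searches for the smallest per-arm first-stage size. Levy et al.'s own calculation may differ, and the names are hypothetical.

```python
import random
from math import sqrt


def prob_correct_selection(k, n, delta, sd, sims=20000, seed=3):
    """Simulated probability that the truly best of k arms has the highest
    sample mean after n patients per arm.

    Assumes normal outcomes with common sd, one arm better than the other
    k - 1 (equal) arms by delta, and higher values being better.
    """
    rng = random.Random(seed)
    se = sd / sqrt(n)  # standard error of each arm's sample mean
    correct = 0
    for _ in range(sims):
        means = [rng.gauss(delta if arm == 0 else 0.0, se) for arm in range(k)]
        correct += means.index(max(means)) == 0  # arm 0 is the truly best arm
    return correct / sims


def stage1_size(k, delta, sd, target=0.80, n_max=500):
    """Smallest per-arm stage 1 size whose simulated PCS reaches the target."""
    for n in range(2, n_max + 1):
        if prob_correct_selection(k, n, delta, sd) >= target:
            return n
    return None


# Hypothetical: three arms, best arm better by half a standard deviation
print(stage1_size(3, 0.5, 1.0))
```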

Shun et al. (2008)

∙Two-stage, continuous outcome, control arm

∙No formal comparison with control for treatment selection

∙Requires programming

∙No early termination at the end of stage 1

Shun et al. propose a phase II/III or two-stage treatment selection design in which a single treatment is selected from two at the end of the first stage. Randomisation incorporates a control arm, with formal comparison intended at the end of the second stage only; that is, there is no formal comparison for treatment selection. Treatment selection is based on the experimental treatment with the ‘best’ (highest or lowest, as appropriate) mean outcome. A normal approximation approach is proposed to avoid complex numerical integration. The design assumes that the treatment effects of the experimental treatments are not the same. The practical approach to timing the interim analysis balances the need to perform it early, to avoid type I error inflation, against the need to perform it late enough that there is a high probability of correctly selecting the better treatment. No software is noted as being available; however, detail is provided to allow implementation, and a detailed example is given. The authors note that this design can be extended to binary and time-to-event outcome measures if the correlation between the final and interim test statistics is known.
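For two experimental arms with normally distributed outcomes, the probability of correctly selecting the better arm at the interim analysis has a simple closed form. The correlation between the interim and final statistics, which the authors note is needed to extend the design to other outcome types, equals the square root of the information fraction when the final analysis includes the interim patients. The sketch assumes a known common standard deviation; the function names and the values in the loop are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def pcs_two_arms(n1_per_arm, delta, sd):
    """P(the truly better of two arms has the larger interim sample mean),
    for normal outcomes with known common sd and n1_per_arm patients per arm."""
    return NormalDist().cdf(delta / (sd * sqrt(2.0 / n1_per_arm)))


def interim_final_correlation(n_interim, n_final):
    """Correlation between interim and final z-statistics when the final
    analysis includes the interim patients (nested data)."""
    return sqrt(n_interim / n_final)


# Hypothetical: 100 patients per arm in total, arms differing by 0.3 sd
for n1 in (20, 40, 60):
    print(n1, round(pcs_two_arms(n1, 0.3, 1.0), 3),
          round(interim_final_correlation(n1, 100), 3))
```

Tabulating both quantities over candidate interim sizes shows how a later interim analysis buys a higher probability of correct selection.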


5.1.2.3 Multinomial outcome measure

Sun et al. (2009)

∙Two-stage, multinomial outcome, control arm

∙Formal comparison with control

∙Software noted as being available from author

∙Early termination for lack of activity; early treatment selection

Sun and colleagues propose a randomised two-stage design based on Zee’s single-arm multi-stage design with a multinomial outcome measure (Zee et al. 1999), adjusting the rules such that either a sufficiently high response rate or a sufficiently low early progressive disease rate warrants further investigation of a treatment. Optimal and minimax designs are proposed following the methodology of Simon (1989), incorporating comparison with a control arm. Differences in response and progressive disease rates between the control and experimental arms are compared. The authors note that the intention of the phase II trial is to screen for potential efficacy rather than to identify statistically significant differences compared with control. Patients are randomised between multiple experimental treatments and a control arm. At the end of the first stage, only those treatments that pass the stopping boundaries for both response and progressive disease continue to the second stage. If there is clear evidence that one treatment is better than the other, selection may take place at the end of the first stage. If, at the end of the second stage, there is no clear evidence that one experimental treatment is better than the other, both arms may be considered for further evaluation. Detail is given on how to implement the designs in practice, and software to identify designs is noted as being available by contacting the first author. The authors also note that the design may be extended to studies monitoring safety and efficacy simultaneously.

Whitehead and Jaki (2009)

∙Two-stage, multinomial outcome, control arm

∙Formal comparison with control for selection

∙Programs noted as being available from authors

∙No early termination

Whitehead and Jaki propose one- and two-stage designs for phase II trials based on ordered categorical outcomes, where the aim of the trial is to select a single treatment to take forward to phase III evaluation. The design is randomised to incorporate a formal comparison with a control arm, and hypothesis testing is based on the Mann–Whitney statistic. In the two-stage design, treatment selection takes place at the end of stage 1, whereby the treatment with the smallest p-value for the comparison with control is selected to take forward to stage 2. In stage 2, patients are randomised between the selected treatment and control only. The final analysis at the end of stage 2 is based on all data available on patients in the control arm and the selected treatment arm. Details of sample size and critical value calculation are provided, and R code is noted as being available from the authors to allow implementation. Both the worthwhile treatment effect and the small positive treatment effect that is not worth further investigation must be specified.

5.1.2.4 Time-to-event outcome measure

No references identified.

5.1.2.5 Ratio of times to progression

No references identified.

5.1.3 Multi-stage designs

5.1.3.1 Binary outcome measure

No references identified.

5.1.3.2 Continuous outcome measure

Cheung (2009)

∙Multi-stage, continuous outcome, control arm

∙Formal comparison with control for treatment selection

∙Requires programming

∙Early treatment selection and early termination for lack of activity

Cheung describes an adaptive multi-arm, multi-stage selection design incorporating a control arm and considering a normally distributed outcome measure. Two multi-stage procedures are proposed: an extension of the sequential probability ratio test (SPRT) with a maximum sample size, and a truncated sequential elimination procedure (ELIM). The SPRT method allows early selection of a treatment when there is evidence of increased activity compared with control, whereas the ELIM procedure also allows early termination of arms for lack of activity. The proposed procedures are compared with single-arm trials, and the ELIM procedure, incorporating sample size reassessment at interim analyses, is recommended over these. Cohort sizes between interim assessments may range from 1 to 10 with little impact on the design’s operating characteristics. Sample size formulae are provided, which will need to be implemented to identify the trial design.
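A generic multi-arm elimination procedure against a shared control can be sketched as below: at each look, an arm is selected early if its cumulative z-statistic crosses an upper boundary, and arms falling below a lower boundary are dropped. The boundaries are hypothetical inputs; Cheung's SPRT and ELIM boundaries and the sample size reassessment are not reproduced.

```python
def sequential_elimination(z_paths, lower, upper):
    """Generic multi-arm elimination rule against a shared control.

    z_paths: dict arm -> list of cumulative z-statistics versus control, one per look.
    lower, upper: per-look boundaries (hypothetical); choosing them equal at the
    final look forces a decision (truncation).
    Returns (selected arm or None, look index at which the trial stopped).
    """
    active = set(z_paths)
    n_looks = len(lower)
    for look in range(n_looks):
        z_now = {arm: z_paths[arm][look] for arm in active}
        crossed = [arm for arm, z in z_now.items() if z >= upper[look]]
        if crossed:
            # early selection of the arm with the strongest evidence
            return max(crossed, key=z_now.get), look
        # drop arms that fall below the lower (futility) boundary
        active = {arm for arm in active if z_now[arm] > lower[look]}
        if not active:
            return None, look
    return None, n_looks - 1


# Hypothetical z-paths over three looks
paths = {"A": [0.5, 1.5, 2.6], "B": [-1.2, -1.0, -0.5]}
print(sequential_elimination(paths, [-1.0, 0.0, 2.0], [3.0, 2.8, 2.0]))
```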

5.1.3.3 Multinomial outcome measure

No references identified.


5.1.3.4 Time-to-event outcome measure

No references identified.

5.1.3.5 Ratio of times to progression

No references identified.

5.1.4 Continuous monitoring designs

No references identified.

5.1.5 Decision-theoretic designs

No references identified.

5.1.6 Three-outcome designs

No references identified.

5.1.7 Phase II/III designs – same primary outcome measure at phase II and phase III

The designs outlined within this section incorporate the same primary outcome measure for phase II assessment as that used for phase III. Although this may be seen as a seamless phase II/III approach, in effect it reflects a phase III trial with an early interim analysis on the primary outcome measure. In this setting, consideration should be given to the most appropriate outcome measure to use for both the phase II and phase III primary outcome. It is rare that efficacy in the phase III setting could be claimed on the basis of, for example, a binary outcome; rather, a time-to-event outcome is usually required in phase III trials.

5.1.7.1 Binary outcome measure

Bauer et al. (1998)

∙Phase II/III, binary outcome, control arm

∙Formal comparison with control for treatment selection

∙Programs noted as being available from authors

∙Early termination for efficacy at the end of phase II

Bauer and colleagues outline a simulation program for an adaptive two-stage design with application to phase II/III and dose finding. Two outcomes may be considered: one primary variable on which formal hypothesis testing is performed, and another on which adaptations at the end of the first stage may be based. The outcomes may be binary, continuous or a combination of the two. The same primary outcome measure is used at each analysis. Simulation is required to identify the best design according to various operating characteristics and to compare the performance of different designs. A program to allow implementation is detailed (this is the focus of the manuscript) and is noted as being available on request from the authors. At the end of the first stage the stage 1 hypothesis is tested, generating a p-value p1. At the end of the second stage the stage 2 hypothesis is tested using only data obtained from patients in stage 2, generating a p-value p2. The overall hypothesis is then tested by combining p1 and p2 using Fisher’s combination test (Fisher 1932). Application is given to phase II/III, with treatment selection at the end of stage 1: if the p-value testing whether at least one of the treatments is superior is significant, the treatment with the ‘best’ outcome is taken forward to phase III. The trial may also terminate early for efficacy at the end of stage 1 if the p-value is significant at the stage 2 significance level.
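Fisher’s combination test, as used above, can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ program: it assumes two independent stage-wise p-values, uses the closed form of the chi-squared upper tail with 4 degrees of freedom, and ignores the stage 1 stopping boundaries, which in a full design change the critical value applied at the end of stage 2. The function names are hypothetical.

```python
import math


def fisher_combined_p(p1: float, p2: float) -> float:
    """Combined p-value for Fisher's product test of two independent p-values.

    Under the null hypothesis, -2 * ln(p1 * p2) has a chi-squared distribution
    with 4 degrees of freedom, whose upper tail is exp(-x / 2) * (1 + x / 2).
    Substituting x = -2 * ln(c), with c = p1 * p2, gives c * (1 - ln(c)).
    """
    if not (0.0 < p1 <= 1.0 and 0.0 < p2 <= 1.0):
        raise ValueError("p-values must lie in (0, 1]")
    c = p1 * p2
    return c * (1.0 - math.log(c))


def fisher_critical_value(alpha: float, tol: float = 1e-12) -> float:
    """Product threshold c_alpha: reject the overall null when p1 * p2 <= c_alpha.

    c * (1 - ln(c)) is increasing on (0, 1], so bisection finds the root of
    c * (1 - ln(c)) = alpha.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * (1.0 - math.log(mid)) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical example: one-sided level 0.025, stage-wise p-values 0.04 and 0.06.
c_alpha = fisher_critical_value(0.025)
print(f"c_alpha = {c_alpha:.5f}")
print("reject overall null:", 0.04 * 0.06 <= c_alpha)
```

Working with the product threshold c_alpha and with the combined p-value are equivalent: p1 * p2 <= c_alpha exactly when the combined p-value is at most alpha.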

Bauer and Kieser (1999)

∙Phase II/III, binary outcome, control arm

∙Formal comparison with control for treatment selection

∙Programs noted as being available from author

∙Early termination for efficacy at the end of phase II

Bauer and Kieser detail a design that incorporates formal comparison of each of the experimental arms with the control at the end of phase II (as well as testing whether any of the treatments are superior to control). The same primary outcome measure is used in both phases II and III. A fixed sample size is used for phase II; however, the phase III sample size can be updated adaptively at the end of phase II. Stopping at the end of phase II is permissible for either lack of efficacy or early evidence of efficacy. The design also allows more than one treatment to be taken forward to phase III. At the end of phase II the sample size may be re-estimated, and the test statistics to be used at phase III are determined according to the number of treatments taken forward and the hypotheses to be tested. The decision criterion at the end of phase III is based on Fisher’s combination test (Fisher 1932), whereby the p-values from both phases are combined (as opposed to combining data from all patients). Simulation is required, as detailed in Bauer et al. (1998) above. Examples are given in the dose-finding setting, and the authors note that the main advantages of this design are its flexibility and its control of the family-wise error rate. The design is similar to that of Bauer et al. (1998) above, except that this paper gives more detail on multiple comparisons between the experimental treatments and the control arm. When considering either of these two designs, it is advisable to read both papers together, since the software detailed in Bauer et al. (1998) is required to identify the design proposed here.
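The closed testing logic behind a formal comparison of several experimental arms with control can be sketched as follows for the case where a single arm is carried forward. This is an illustration under stated assumptions rather than Bauer and Kieser’s procedure: intersection hypotheses are tested at stage 1 with a Bonferroni adjustment (Dunnett or Simes tests are possible alternatives), the stage 2 p-value of the selected arm is used for every intersection containing it, early stopping is ignored, and all names are hypothetical.

```python
import math
from itertools import combinations


def fisher_combined_p(p1, p2):
    """Fisher's product test for two independent p-values in (0, 1]."""
    c = p1 * p2
    return c * (1.0 - math.log(c))


def closed_test_selected(p1_by_arm, selected, p2_selected, alpha=0.025):
    """Return True if H0 for the selected arm is rejected by the closed test.

    p1_by_arm   -- stage 1 p-values, one per experimental arm versus control
    selected    -- label of the single arm carried forward
    p2_selected -- stage 2 p-value of the selected arm versus control,
                   computed from stage 2 patients only
    """
    others = [arm for arm in p1_by_arm if arm != selected]
    # Every intersection hypothesis that contains H0 for the selected arm
    # must itself be rejected at level alpha.
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            subset = (selected,) + extra
            # Bonferroni-adjusted stage 1 p-value for the intersection.
            p1_subset = min(1.0, len(subset) * min(p1_by_arm[a] for a in subset))
            # Only the selected arm continues, so its stage 2 p-value is
            # used for every intersection that contains it.
            if fisher_combined_p(p1_subset, p2_selected) > alpha:
                return False
    return True


# Hypothetical stage 1 results for three experimental arms.
stage1 = {"A": 0.01, "B": 0.30, "C": 0.45}
print(closed_test_selected(stage1, "A", p2_selected=0.02))
```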

Stallard and Todd (2003)

∙Phase II/III, binary outcome, control arm

∙Formal comparison with control for treatment selection

∙Programs noted as being available from authors

∙Early termination for efficacy at the end of phase II

Stallard and Todd propose a design whereby patients from phase II are incorporated in the phase III analysis, and treatment selection at the end of phase II is based on the treatment with the largest test statistic, using efficient scores and Fisher’s information. A formal comparison is made between the selected treatment and control, and the trial may be terminated early for lack of efficacy or for superiority at this stage. The type I error in the final phase III analysis is adjusted for the treatment selection in phase II. The overall sample size and the phase II sample size are computed according to group-sequential phase III designs such as those described by Whitehead (1997). A computer program is noted as being available from the authors to calculate power for stopping boundaries, according to pre-specified group sizes. The authors note that the design is useful when one treatment is likely to be much better than the others at phase II, rather than when multiple treatments are to be taken forward to phase III. Consideration should also be given to the timing of the first interim analysis (i.e. the phase II assessment): too early, and there is too little information; too late, and too many patients have been enrolled, potentially wasting resources.
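For a binary outcome, the efficient score Z and Fisher’s information V for the log-odds ratio have simple closed forms, which makes selection of the arm with the largest standardised statistic easy to illustrate. The sketch below uses the usual score-test formulae; the choice to select on Z / sqrt(V), the data and the function names are assumptions for illustration, and the selection-adjusted stopping boundaries of Stallard and Todd are not reproduced.

```python
import math


def binary_score_and_information(succ_e, n_e, succ_c, n_c):
    """Efficient score Z and Fisher's information V for the log-odds ratio
    (experimental versus control), from numbers of successes and patients."""
    n = n_e + n_c
    s = succ_e + succ_c          # total successes
    f = n - s                    # total failures
    z = (n_c * succ_e - n_e * succ_c) / n
    v = n_e * n_c * s * f / n ** 3
    return z, v


def select_best_arm(control, arms):
    """Return the arm with the largest standardised statistic Z / sqrt(V),
    together with the standardised statistic for every arm.

    control -- (successes, patients) on control
    arms    -- {label: (successes, patients)} for the experimental arms
    """
    succ_c, n_c = control
    stats = {}
    for label, (succ_e, n_e) in arms.items():
        z, v = binary_score_and_information(succ_e, n_e, succ_c, n_c)
        stats[label] = z / math.sqrt(v)
    best = max(stats, key=stats.get)
    return best, stats


# Hypothetical phase II data: 40 patients per arm.
print(select_best_arm((10, 40), {"A": (18, 40), "B": (14, 40)}))
```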

Kelly et al. (2005)

∙Phase II/III, binary outcome, control arm

∙Formal comparison with control for treatment selection

∙Requires programming

∙Early termination for efficacy and lack of efficacy during phase II

Kelly and colleagues propose an adaptation of the design proposed by Stallard and Todd (detailed above), such that more than one treatment may be selected at multiple stages within the phase II part of the trial. Treatments are evaluated for selection using Fisher’s information and an efficient score statistic, which may be applied to continuous, binary and failure time data. p-Values are calculated at each stage for comparison of the best treatment with control. Only treatments within a pre-specified margin of the efficient score statistic of the best treatment continue to the next stage, and all other treatments are dropped. At each stage, patients are randomised between control and each of the treatments still under investigation. The trial may stop for efficacy or lack of efficacy at each stage. The example given is based on the triangular test described by Whitehead (1997), which uses expected Fisher’s information to calculate operating characteristics.
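The two decision rules described for this design, dropping arms outside a margin of the best efficient score statistic and checking the best arm against triangular test boundaries, might be sketched as below. The boundary constants a and c, the margin and the example values are hypothetical inputs; the sketch does not derive a and c from error rates, nor does it adjust the boundaries for multiple arms as a full design would.

```python
def arms_to_continue(scores, margin):
    """Arms whose efficient score statistic lies within `margin` of the best."""
    best = max(scores.values())
    return sorted(arm for arm, z in scores.items() if z >= best - margin)


def triangular_decision(z, v, a, c):
    """Compare (V, Z) for the best arm with a one-sided triangular test whose
    continuation region is -a + 3cV < Z < a + cV."""
    if z >= a + c * v:
        return "stop for efficacy"
    if z <= -a + 3.0 * c * v:
        return "stop for lack of efficacy"
    return "continue"


# Hypothetical interim values: efficient scores (Z) and information (V) per arm.
scores = {"A": 4.0, "B": 2.0, "C": 3.5}
information = {"A": 4.0, "B": 4.0, "C": 4.0}
best = max(scores, key=scores.get)
print(arms_to_continue(scores, margin=1.0))
print(triangular_decision(scores[best], information[best], a=5.0, c=0.5))
```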

Wang and Cui (2007)

∙Phase II/III, binary outcome, control arm

∙No formal comparison with control in phase II

∙Requires programming

∙No early termination during phase II

Wang and Cui outline a design whereby patients are randomised to each of the experimental treatments under investigation and a control arm, using response-adaptive randomisation (the paper is written in the context of dose selection but could be applied to treatment selection). The allocation ratios are calculated based on distance conditional powers (i.e. the probability that the event rate for the treatment under investigation is larger than some pre-specified fixed rate, based on the observed data and allowing for the fact that some patients will not yet have had their outcome observed). The treatment to which most patients have been randomised by the end of the recruitment period is deemed the most efficacious. This selected treatment is then formally compared with the control treatment, forming the phase III comparison. This design uses binary outcome measures such as treatment response for both the phase II treatment selection and the phase III formal comparison, although it is noted that continuous outcomes may be used. Simulation is required to investigate the design parameters, with sample size calculated based on the phase III comparison. The design may be implemented by developing programs based on the formulae provided.
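To give a feel for response-adaptive allocation driven by the probability that each arm’s event rate exceeds a fixed rate p0, the sketch below uses a simple normal approximation. It is a simplified stand-in, not Wang and Cui’s distance conditional power: it ignores patients whose outcomes are still pending, keeps the control share fixed, adjusts the estimated rate to avoid a zero standard error, and all names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def exceedance_probability(successes, n, p0):
    """Approximate probability that an arm's true event rate exceeds p0.

    Uses a normal approximation around the adjusted estimate
    (successes + 0.5) / (n + 1), which avoids a zero standard error
    when all or none of the patients respond.
    """
    p_hat = (successes + 0.5) / (n + 1)
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return 1.0 - NormalDist().cdf((p0 - p_hat) / se)


def allocation_ratios(arms, p0, control_share):
    """Fixed share for control; the remainder is split in proportion to each
    experimental arm's exceedance probability."""
    probs = {arm: exceedance_probability(s, n, p0) for arm, (s, n) in arms.items()}
    total = sum(probs.values())
    if total == 0.0:
        # Fall back to equal allocation if every probability underflows to 0.
        probs = {arm: 1.0 for arm in probs}
        total = float(len(probs))
    ratios = {"control": control_share}
    for arm, prob in probs.items():
        ratios[arm] = (1.0 - control_share) * prob / total
    return ratios


# Hypothetical interim data: (responders, patients with observed outcome).
interim = {"A": (12, 30), "B": (6, 30)}
print(allocation_ratios(interim, p0=0.2, control_share=0.25))
```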

5.1.7.2 Continuous outcome measure

Bretz et al. (2006)

∙Phase II/III, continuous outcome, control arm

∙Formal comparison with control for treatment selection

∙Minimal programming required

∙Early termination for efficacy or lack of efficacy at the end of phase II

Bretz and colleagues outline a phase II/III design which allows treatment selection at the interim assessment (i.e. at the end of phase II). The design allows data from the first stage to be incorporated into the final analysis. Formal comparisons between control and experimental treatments are performed at the end of each stage. Early termination is permitted at the end of the first stage (i.e. at the end of phase II) for lack of efficacy or for early evidence of efficacy. Also at this time, if the study is to continue to phase III, adaptations to the design of the trial may be made, such as sample size reassessment based on the data observed to date. The final analysis includes data from both stages, with decision criteria based on a combination of test results (i.e. using methods such as Fisher’s product test or the conditional error function). The closure principle is incorporated, such that a hypothesis is only rejected if it and all associated intersection hypotheses are rejected. Sample size formulae are given to allow calculation. The design may be extended to multiple stages, in which case early termination during the phase II part may be incorporated.
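A closed testing procedure with a combination test, of the kind described above, can be sketched for the case where one or more arms are carried forward. Here the inverse normal combination function with pre-specified weights is used in place of Fisher’s product test, and intersection hypotheses are tested with the Simes test; both are assumptions chosen for illustration, as are the function names, and neither early stopping nor sample size reassessment is modelled.

```python
import math
from itertools import combinations
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def _clamp(p, eps=1e-12):
    """Keep p strictly inside (0, 1) so that inv_cdf is defined."""
    return min(max(p, eps), 1.0 - eps)


def inverse_normal_p(p1, p2, w1):
    """Inverse normal combination of two independent stage-wise p-values,
    with weights w1 and sqrt(1 - w1**2) fixed in advance."""
    w2 = math.sqrt(1.0 - w1 ** 2)
    z = (w1 * _STD_NORMAL.inv_cdf(1.0 - _clamp(p1))
         + w2 * _STD_NORMAL.inv_cdf(1.0 - _clamp(p2)))
    return 1.0 - _STD_NORMAL.cdf(z)


def simes_p(pvalues):
    """Simes p-value for an intersection hypothesis."""
    ordered = sorted(pvalues)
    m = len(ordered)
    return min(1.0, min(m * p / (i + 1) for i, p in enumerate(ordered)))


def closed_rejections(p1, p2, alpha=0.025, w1=math.sqrt(0.5)):
    """Arms whose null hypothesis is rejected by the closed test.

    p1 -- stage 1 p-values for every experimental arm versus control
    p2 -- stage 2 p-values for the arms carried forward only
    """
    arms = list(p1)
    rejected = []
    for arm in p2:
        all_rejected = True
        for r in range(1, len(arms) + 1):
            for subset in combinations(arms, r):
                if arm not in subset:
                    continue
                stage1 = simes_p([p1[a] for a in subset])
                # At stage 2 only the selected arms in the subset contribute.
                stage2 = simes_p([p2[a] for a in subset if a in p2])
                if inverse_normal_p(stage1, stage2, w1) > alpha:
                    all_rejected = False
        if all_rejected:
            rejected.append(arm)
    return rejected


# Hypothetical example: arm A is selected and continues to stage 2.
print(closed_rejections({"A": 0.01, "B": 0.20, "C": 0.50}, {"A": 0.01}))
```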

Bauer et al. (1998)

∙Phase II/III, continuous outcome, control arm

∙Formal comparison with control for treatment selection

∙Programs noted as being available from authors

∙Early termination for efficacy at the end of phase II

Bauer and colleagues outline a simulation program for an adaptive two-stage design with application to phase II/III and dose finding. Two outcomes may be considered: one primary variable on which formal hypothesis testing is performed, and another on which adaptations at the end of the first stage may be based. The outcomes may be binary, continuous or a combination of the two. The same primary outcome measure is used at each analysis. Simulation is required to identify the best design according to various operating characteristics and to compare the performance of different designs. A program to allow implementation is detailed (this is the focus of the manuscript) and is noted as being available on request from the authors. At the end of the first stage the stage 1 hypothesis is tested, generating a p-value p1. At the end of the second stage the stage 2 hypothesis is tested using only data obtained from patients in stage 2, generating a p-value p2. The overall hypothesis is then tested by combining p1 and p2 using Fisher’s combination test (Fisher 1932). Application is given to phase II/III, with treatment selection at the end of stage 1: if the p-value testing whether at least one of the treatments is superior is significant, the treatment with the ‘best’ outcome is taken forward to phase III. The trial may also terminate early for efficacy at the end of stage 1 if the p-value is significant at the stage 2 significance level.

Bauer and Kieser (1999)

∙Phase II/III, continuous outcome, control arm

∙Formal comparison with control for treatment selection

∙Programs noted as being available from author

∙Early termination for efficacy at the end of phase II

Bauer and Kieser detail a design that incorporates formal comparison of each of the experimental arms with the control at the end of phase II (as well as testing whether any of the treatments are superior to control). The same primary outcome measure is used in both phases II and III. A fixed sample size is used for phase II; however, the phase III sample size can be updated adaptively at the end of phase II. Stopping at the end of phase II is permissible for either lack of efficacy or early evidence of efficacy and is based on p-value calculation. The design also allows more than one treatment to be taken forward to phase III. At the end of phase II the sample size may be re-estimated, and the test statistics to be used at phase III are determined according to the number of treatments taken forward and the hypotheses to be tested. The decision criterion at the end of phase III is based on Fisher’s combination test (Fisher 1932), whereby the p-values from both phases are combined (as opposed to combining data from all patients). Simulation is required, as detailed in Bauer et al. (1998) above. Examples are given in the dose-finding setting, and the authors note that the main advantages of this design are its flexibility and its control of the family-wise error rate. The design is similar to that of Bauer et al. (1998) above, except that this paper gives more detail on the multiple comparisons between the experimental treatments and the control arm. When considering either of these two designs, it is advisable to read both papers together, since the software detailed in Bauer et al. (1998) is required to identify the design proposed here.

Stallard and Todd (2003)

∙Phase II/III, continuous outcome, control arm

∙Formal comparison with control for treatment selection

∙Programs noted as being available from authors

∙Early termination for efficacy at the end of phase II

Stallard and Todd propose a design whereby patients from phase II are incorporated in the phase III analysis, and treatment selection at the end of phase II is based on the treatment with the largest test statistic, using efficient scores and Fisher’s information. A formal comparison is made between the selected treatment and control, and the trial may be terminated early for lack of efficacy or for superiority at this stage. The type I error in the final phase III analysis is adjusted for the treatment selection in phase II. The overall sample size and the phase II sample size are computed according to group-sequential phase III designs such as those described by Whitehead (1997). A computer program is noted as being available from the authors to calculate power for stopping boundaries, according to pre-specified group sizes. The authors note that the design is useful when one treatment is likely to be much better than the others at phase II, rather than when multiple treatments are to be taken forward to phase III. Consideration should also be given to the timing of the first interim analysis (i.e. the phase II assessment): too early, and there is too little information; too late, and too many patients have been enrolled, potentially wasting resources.
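For a normally distributed outcome with a common standard deviation treated as known (an assumption made here for simplicity), the efficient score and Fisher’s information for the difference in means take the forms below, and Z / sqrt(V) reduces to the familiar two-sample z statistic. The function names and data are hypothetical, and no adjustment for selection is included.

```python
import math


def normal_score_and_information(mean_e, n_e, mean_c, n_c, sigma):
    """Efficient score Z and Fisher's information V for delta = mu_E - mu_C,
    assuming normal outcomes with known common standard deviation sigma."""
    v = n_e * n_c / ((n_e + n_c) * sigma ** 2)
    z = v * (mean_e - mean_c)
    return z, v


def select_largest(control, arms, sigma):
    """Arm with the largest standardised statistic Z / sqrt(V).

    control -- (mean, patients) on control
    arms    -- {label: (mean, patients)} for the experimental arms
    """
    mean_c, n_c = control
    stats = {}
    for arm, (mean_e, n_e) in arms.items():
        z, v = normal_score_and_information(mean_e, n_e, mean_c, n_c, sigma)
        stats[arm] = z / math.sqrt(v)
    return max(stats, key=stats.get), stats


# Hypothetical phase II means with 20 patients per arm.
print(select_largest((1.0, 20), {"A": (1.5, 20), "B": (1.2, 20)}, sigma=2.0))
```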

Kelly et al. (2005)

∙Phase II/III, continuous outcome, control arm

∙Formal comparison with control for treatment selection

∙Requires programming

∙Early termination for efficacy and lack of efficacy during phase II

Kelly and colleagues propose an adaptation of the design proposed by Stallard and Todd (detailed above), such that more than one treatment may be selected at multiple stages within the phase II part of the trial. Treatments are evaluated for selection using Fisher’s information and an efficient score statistic, which may be applied to continuous, binary and failure time data. p-Values are calculated at each stage for comparison of the best treatment with control. Only treatments within a pre-specified margin of the efficient score statistic of the best treatment continue to the next stage, and all other treatments are dropped. At each stage, patients are randomised between control and each of the treatments still under investigation. The trial may stop for efficacy or lack of efficacy at each stage. The example given is based on the triangular test described by Whitehead (1997), which uses expected Fisher’s information to calculate operating characteristics.

Wang (2006)

∙Phase II/III, continuous outcome, control arm

∙Formal comparison with control for treatment selection

∙Requires programming

∙Early termination for efficacy at the end of phase II

Wang proposes an adaptive design with treatment selection at the end of phase II. Patients are randomised between control and each of the experimental treatments under investigation in phase II. The design controls the overall type I error and allows the conditional error function of the phase III trial to depend on the data observed during phase II. Maximum sample sizes must be specified, and simulations performed to evaluate the expected sample size. Identification of the optimal design requires detailed numerical integration. At the end of the first stage the treatment with the largest test statistic is selected to take forward to phase III; however, the trial could also be stopped at this point (i.e. at the end of phase II) for efficacy or lack of efficacy. There is formal comparison between each of the experimental arms and the control arm at the end of phase II, and as long as at least one experimental treatment has sufficient activity, a treatment is selected for further testing in phase III (or declared superior if the result is sufficiently significant). The design has been implemented in R, and formulae are given to allow implementation in other software; the design would therefore need programming.
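One ingredient of any selection-adjusted test is the null distribution of the largest of k statistics that share a control arm. With equal allocation these statistics are equicorrelated with correlation 0.5, and the critical value for the maximum can be found by one-dimensional numerical integration, as sketched below. This is a Dunnett-type calculation shown for illustration only; it is not Wang’s conditional error method, and the integration grid and function names are assumptions.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def prob_max_below(c, k, rho=0.5, lo=-8.0, hi=8.0, steps=2000):
    """P(max of k equicorrelated standard normals <= c), for 0 <= rho < 1.

    Writing Z_i = sqrt(rho) * X + sqrt(1 - rho) * E_i with a shared X, the
    probability is the integral of
    phi(x) * Phi((c - sqrt(rho) * x) / sqrt(1 - rho)) ** k,
    evaluated here with the trapezoidal rule on [lo, hi].
    """
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * _STD_NORMAL.pdf(x) * _STD_NORMAL.cdf((c - a * x) / b) ** k
    return total * h


def selection_critical_value(k, alpha=0.025, rho=0.5):
    """Critical value for the largest of k statistics at one-sided level alpha."""
    lo, hi = 0.0, 6.0
    for _ in range(40):  # bisection: the tail probability decreases in c
        mid = 0.5 * (lo + hi)
        if 1.0 - prob_max_below(mid, k, rho) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for k in (1, 2, 3):
    print(k, round(selection_critical_value(k), 3))
```

With k = 1 the calculation reduces to the usual one-sided normal critical value, which is a convenient check on the integration.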

Wang and Cui (2007)

∙Phase II/III, continuous outcome, control arm

∙No formal comparison with control in phase II

∙Requires programming

∙No early termination during phase II

Wang and Cui outline a design whereby patients are randomised to each of the experimental treatments under investigation and a control arm, using response-adaptive randomisation (the paper is written in the context of dose selection but could be applied to treatment selection). The allocation ratios are calculated based on distance conditional powers (i.e. the probability that the treatment effect for the treatment under investigation is larger than some pre-specified fixed value, based on the observed data and allowing for the fact that some patients will not yet have had their outcome observed). The treatment to which most patients have been randomised by the end of the recruitment period is deemed the most efficacious. This selected treatment is then formally compared with the control treatment, forming the phase III comparison. As detailed, this design uses binary outcome measures such as treatment response for both the phase II treatment selection and the phase III formal comparison, although it is noted that continuous outcomes may be used. Simulation is required to investigate the design parameters, with sample size calculated based on the phase III comparison. The design may be implemented by developing programs based on the formulae provided.

Shun et al. (2008)

∙Phase II/III, continuous outcome, control arm

∙No formal comparison with control for treatment selection

∙Requires programming

∙No early termination at the end of phase II

Shun et al. propose a phase II/III or two-stage treatment selection design in which a single treatment is selected from two at the end of the first stage. Randomisation incorporates a control arm, with formal comparison intended at the end of the second stage only; that is, there is no formal comparison for treatment selection. Treatment selection is based on the experimental treatment with the highest/lowest (‘best’) mean outcome. A normal approximation approach is proposed to avoid complex numerical integration. The design assumes that the treatment effects of the experimental treatments are not the same. The practical approach to the timing of the interim analysis balances the need to perform it early, in order to avoid type I error inflation, against the need to perform it late enough that there is a high probability of correctly selecting the better treatment. No software is noted as being available; however, detail is provided to allow implementation, and a detailed example is given. The authors note that this design can be extended to binary and time-to-event outcome measures if the correlation between the final and interim test statistics is known.
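The ‘late enough’ side of this trade-off can be quantified with a simple calculation. Assuming normally distributed outcomes with common standard deviation sigma, a true difference delta between the two experimental arms and n1 patients per arm at the interim analysis, the probability that the better arm has the larger sample mean is Φ(delta / (sigma · sqrt(2 / n1))). The sketch tabulates this for a few hypothetical interim sizes; it covers only the probability of correct selection, not the type I error side of the trade-off, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def prob_correct_selection(delta, sigma, n1):
    """Probability that the truly better of two arms has the larger sample
    mean after n1 patients per arm (normal outcomes, known common sigma)."""
    return NormalDist().cdf(delta / (sigma * math.sqrt(2.0 / n1)))


# Hypothetical standardised difference of 0.3 between the two arms.
for n1 in (10, 25, 50, 100):
    print(n1, round(prob_correct_selection(delta=0.3, sigma=1.0, n1=n1), 3))
```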

5.1.7.3 Multinomial outcome measure

Whitehead and Jaki (2009)

∙Phase II/III, multinomial outcome, control arm

∙Formal comparison with control for selection

∙Programs noted as being available from authors

∙No early termination during phase II

Whitehead and Jaki propose one- and two-stage designs for phase II trials based on ordered category outcomes, when the aim of the trial is to select a single treatment to take forward to phase III evaluation. The authors note that the two-stage design may be applied to the phase II/III setting, although refinements to the design may be required, including the use of different outcome measures for treatment selection and final analysis. The design is randomised to incorporate a formal comparison with a control arm, and hypothesis testing is based on the Mann–Whitney statistic. In the phase II/III setting, treatment selection takes place at the end of stage 1 (phase II), whereby the treatment with the smallest p-value for a treatment effect is selected to take forward to stage 2 (phase III). Early termination for lack of activity is permitted at the end of phase II. During phase III, patients are randomised between the selected treatment and control only. The final analysis at the end of phase III is based on all data available on patients in the control arm and the selected treatment arm. Details of sample size and critical value calculation are provided, and R code is noted as being available from the authors to allow implementation. Both the worthwhile treatment effect and the small positive treatment effect that is not worth further investigation must be specified.
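For ordered categorical outcomes, the quantity estimated by the Mann–Whitney statistic is θ = P(X_E > X_C) + ½P(X_E = X_C), where θ = 0.5 corresponds to no treatment effect. It can be computed directly from the category counts, as in the sketch below; the counts and function name are hypothetical, and the sketch gives the point estimate only, not Whitehead and Jaki’s test statistic or critical values.

```python
def mann_whitney_theta(counts_e, counts_c):
    """Estimate theta = P(X_E > X_C) + 0.5 * P(X_E = X_C) from category counts.

    counts_e, counts_c -- patients per category, ordered from worst to best.
    """
    if len(counts_e) != len(counts_c):
        raise ValueError("both arms need the same categories")
    n_e, n_c = sum(counts_e), sum(counts_c)
    wins = 0.0
    worse_c = 0  # control patients in strictly worse categories so far
    for ce, cc in zip(counts_e, counts_c):
        # Each experimental patient here beats all control patients in worse
        # categories and ties with control patients in the same category.
        wins += ce * (worse_c + 0.5 * cc)
        worse_c += cc
    return wins / (n_e * n_c)


# Hypothetical counts for four ordered categories (worst to best).
print(mann_whitney_theta([8, 12, 7, 3], [14, 10, 5, 1]))
```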

5.1.7.4 Time-to-event outcome measure

Bauer and Kieser (1999)

∙Phase II/III, time-to-event outcome, control arm

∙Formal comparison with control for treatment selection

∙Programs noted as being available from author

∙Early termination for efficacy at the end of phase II

Bauer and Kieser detail a design that incorporates formal comparison of each of the experimental arms with the control at the end of phase II (as well as testing whether any of the treatments are superior to control). The same primary outcome measure is used in both phases II and III. A fixed sample size is used for phase II; however, the phase III sample size can be updated adaptively at the end of phase II. Stopping at the end of phase II is permissible for either lack of efficacy or early evidence of efficacy and is based on p-value calculation. The design also allows more than one treatment to be taken forward to phase III. At the end of phase II the sample size may be re-estimated, and the test statistics to be used at phase III are determined according to the number of treatments taken forward and the hypotheses to be tested. The decision criterion at the end of phase III is based on Fisher’s combination test (Fisher 1932), whereby the p-values from both phases are combined (as opposed to combining data from all patients). Simulation is required, as detailed in Bauer et al. (1998). Examples are given in the dose-finding setting, and the authors note that the main advantages of this design are its flexibility and its control of the family-wise error rate. When considering this design, it is advisable also to review Bauer et al. (1998), since that paper outlines the software required to identify the design proposed here.

Stallard and Todd (2003)

∙Phase II/III, time-to-event outcome, control arm

∙Formal comparison with control for treatment selection

∙Programs noted as being available from authors

∙Early termination for efficacy at the end of phase II

Stallard and Todd propose a design whereby patients from phase II are incorporated in the phase III analysis, and treatment selection at the end of phase II is based on the treatment with the largest test statistic, using efficient scores and Fisher’s information. A formal comparison is made between the selected treatment and control, and the trial may be terminated early for lack of efficacy or for superiority at this stage. The type I error in the final phase III analysis is adjusted for the treatment selection in phase II. The overall sample size and the phase II sample size are computed according to group-sequential phase III designs such as those described by Whitehead (1997). A computer program is noted as being available from the authors to calculate power for stopping boundaries, according to pre-specified group sizes. The authors note that the design is useful when one treatment is likely to be much better than the others at phase II, rather than when multiple treatments are to be taken forward to phase III. Consideration should also be given to the timing of the first interim analysis (i.e. the phase II assessment): too early, and there is too little information; too late, and too many patients have been enrolled, potentially wasting resources.
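For a time-to-event outcome, the efficient score and Fisher’s information are usually taken from the log-rank test: the score is the difference between the expected and observed numbers of events on the experimental arm, and the information is approximately the null variance of that difference. The sketch below computes both from individual patient data; the sign convention (positive values favour the experimental arm), the hypergeometric handling of tied event times, the data and the function name are choices made for illustration.

```python
def logrank_score_and_variance(times, events, groups):
    """Log-rank score (expected minus observed events on the experimental
    arm) and its null variance.

    times  -- follow-up time per patient
    events -- 1 if the event was observed, 0 if censored
    groups -- 1 for experimental, 0 for control
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e})
    observed = expected = variance = 0.0
    for t in event_times:
        at_risk = [g for tt, _, g in data if tt >= t]
        n = len(at_risk)
        n_e = sum(1 for g in at_risk if g == 1)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        d_e = sum(1 for tt, e, g in data if tt == t and e and g == 1)
        observed += d_e
        expected += d * n_e / n
        if n > 1:
            # Hypergeometric variance of the experimental-arm events at time t.
            variance += d * (n_e / n) * (1 - n_e / n) * (n - d) / (n - 1)
    return expected - observed, variance


# Hypothetical data for eight patients (0 in `events` denotes censoring).
print(logrank_score_and_variance(
    [2, 3, 3, 5, 6, 8, 9, 11],
    [1, 1, 0, 1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
))
```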

Kelly et al. (2005)

∙Phase II/III, time-to-event outcome, control arm

∙Formal comparison with control for treatment selection

∙Requires programming

∙Early termination for efficacy and lack of efficacy during phase II

Kelly and colleagues propose an adaptation of the design proposed by Stallard and Todd (detailed above), such that more than one treatment may be selected at multiple stages within the phase II part of the trial. Treatments are evaluated for selection using Fisher’s information and an efficient score statistic, which may be applied to continuous, binary and failure time data. p-Values are calculated at each stage for comparison of the best treatment with control. Only treatments within a pre-specified margin of the efficient score statistic of the best treatment continue to the next stage, and all other treatments are dropped. At each stage, patients are randomised between control and each of the treatments still under investigation. The trial may stop for efficacy or lack of efficacy at each stage. The example given is based on the triangular test described by Whitehead (1997), which uses expected Fisher’s information to calculate operating characteristics.

5.1.7.5 Ratio of times to progression

No references identified.

5.1.8 Phase II/III designs – different primary outcome measures at phase II and phase III

The literature described within this section considers designs whereby different primary outcome measures are used for phase II and for phase III. Here the phase II primary outcome measure should be selected based on the discussions provided in Chapter 2, as this is not intended to be used for phase III decision-making.

5.1.8.1 Binary outcome measure

Todd and Stallard (2005)

∙Phase II/III, binary outcome, control arm

∙Formal comparison with control for treatment selection

∙Programs noted as being available from the authors

∙No early termination during phase II

Todd and Stallard describe a design where treatment selection occurs at the first interim assessment (phase II), based on comparison of a short-term outcome measure for each of the treatments versus control. Patients are randomised to each of the experimental treatments and control during phase II, and then to the selected treatment and control during phase III. Phase III is carried out in a group-sequential manner, with the experimental treatment compared to control in terms of a longer-term outcome measure. Selection at phase II is based on the treatment with the largest test statistic; that is, there is formal comparison with control, but the study may only be terminated for lack of activity at this stage. The trial protocol remains the same throughout the study; therefore, patients in phase II can be incorporated in phase III. The required (clinically significant) treatment effect, the treatment effect that is still desirable but not clinically significant, and the expected correlation between the phase II and III outcome measures must all be specified to identify the complete phase II/III design. Formulae are given, and programs are noted as being available from the authors to calculate stopping boundaries.
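Why the correlation between the phase II and phase III outcome measures must be specified can be seen from a small simulation under the global null hypothesis. Arms are selected on a short-term outcome, and the long-term outcome of the selected arm is then compared with control using a naive, unadjusted z test that includes the phase II patients. When the two outcomes are uncorrelated, selection does not bias the long-term comparison; as the correlation grows, the naive test exceeds its nominal one-sided 2.5% level. This illustrates the problem that Todd and Stallard’s boundaries are designed to handle; it does not implement their method, and the sample sizes, seed and names are hypothetical.

```python
import math
import random


def naive_type1_error(rho, k_arms=3, n1=50, n2=100, n_sims=20000, seed=2024):
    """Monte Carlo type I error of a naive final z test after selecting, on a
    short-term outcome, the best of k_arms experimental arms (global null).

    rho -- correlation between short- and long-term outcomes within a patient
    n1  -- patients per arm in phase II
    n2  -- further patients on the selected arm and on control in phase III
    Outcomes are standardised to unit variance.
    """
    rng = random.Random(seed)
    z_crit = 1.959963984540054  # one-sided 2.5% critical value
    se1, se2 = 1.0 / math.sqrt(n1), 1.0 / math.sqrt(n2)
    n = n1 + n2

    def stage1_means():
        # Correlated (short-term, long-term) sample means for one arm.
        short = rng.gauss(0.0, 1.0)
        long_ = rho * short + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        return short * se1, long_ * se1

    rejections = 0
    for _ in range(n_sims):
        # The control short-term mean is common to every comparison, so it
        # does not affect which arm is selected.
        _, control_long = stage1_means()
        arms = [stage1_means() for _ in range(k_arms)]
        # Select on the short-term outcome; keep that arm's long-term mean.
        selected_long = max(arms, key=lambda pair: pair[0])[1]
        control_long2 = rng.gauss(0.0, se2)
        selected_long2 = rng.gauss(0.0, se2)
        diff = (n1 * (selected_long - control_long)
                + n2 * (selected_long2 - control_long2)) / n
        if diff / math.sqrt(2.0 / n) > z_crit:
            rejections += 1
    return rejections / n_sims


for rho in (0.0, 0.5, 1.0):
    print(rho, naive_type1_error(rho))
```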

5.1.8.2 Continuous outcome measure

Todd and Stallard (2005)

∙Phase II/III, continuous outcome, control arm

∙Formal comparison with control for treatment selection

∙Programs noted as being available from authors

∙No early termination during phase II

Todd and Stallard describe a design where treatment selection occurs at the first interim assessment (phase II), based on comparison of a short-term outcome measure for each of the treatments versus control. Patients are randomised to each of the experimental treatments and control during phase II, and then to the selected treatment and control during phase III. Phase III is carried out in a group-sequential manner, with the experimental treatment compared to control in terms of a longer-term outcome measure. Selection at phase II is based on the treatment with the largest test statistic; that is, there is formal comparison with control, but the study may only be terminated for lack of activity at this stage. The trial protocol remains the same throughout the study; therefore, patients in phase II can be incorporated in phase III. The required (clinically significant) treatment effect, the treatment effect that is still desirable but not clinically significant, and the expected correlation between the phase II and III outcome measures must all be specified to identify the complete phase II/III design. Formulae are given, and programs are noted as being available from the authors to calculate stopping boundaries.

Liu and Pledger (2005)

∙Phase II/III, continuous outcome, control arm

∙Formal comparison with control for treatment selection

∙Requires programming

∙No early termination during phase II

Liu and Pledger detail a phase II/III design in the dose-finding context, where patients are randomised to different doses and a placebo control with the intention of dose selection, as well as an adaptive two-stage phase II/III design for a single experimental treatment, where the intention of phase II is to determine whether or not to continue to phase III. Short-term continuous outcome measures are used at the end of phase II to ‘prune’ the doses and to perform sample size adjustment for the second stage (phase III), and long-term continuous outcome measures are used to estimate the dose–response curve and to calculate trend statistics for the phase III analysis (and also at the end of phase II). Patients continue to be randomised to all doses for a short period of phase III while the first analysis for dose selection at the end of phase II is carried out (i.e. there is no break in recruitment for the phase II analysis), at which point more than one dose may be carried forward. The treatment effect to be detected is the same for both short- and long-term outcome measures and needs to be pre-specified, along with prior information on the probability of success for each dose, the standard deviation for each outcome measure, the time period between enrolment of the first patient and the first analysis, and the likely recruitment in this period. This information is used to generate the operating characteristics of the design. Formulae are given which would need to be implemented in order to identify the design. The design offers flexibility in that the second stage (phase III) sample size may be calculated based on updated data from the first stage (phase II), and adaptation rules do not need to be specified in advance.
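A simple trend statistic across doses, here a linear contrast of the dose-group means with centred dose scores and a known common standard deviation, illustrates the kind of calculation involved in the long-term analysis. Liu and Pledger estimate a dose–response curve, so their statistic may differ; the contrast, the dose scores, the data and the function name below are assumptions for illustration.

```python
import math


def linear_trend_statistic(doses, means, ns, sigma):
    """Standardised linear contrast of dose-group means.

    Contrast coefficients are the dose scores centred at their unweighted
    mean, so they sum to zero and the statistic has mean 0 when all dose
    groups share the same true mean.
    """
    centre = sum(doses) / len(doses)
    coef = [d - centre for d in doses]
    estimate = sum(c * m for c, m in zip(coef, means))
    se = sigma * math.sqrt(sum(c ** 2 / n for c, n in zip(coef, ns)))
    return estimate / se


# Placebo coded as dose 0; hypothetical long-term means, 50 patients per group.
print(linear_trend_statistic([0, 1, 2, 3], [0.0, 0.1, 0.2, 0.3], [50] * 4, sigma=1.0))
```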

Shun et al. (2008)

∙Phase II/III, continuous outcome, control arm

∙No formal comparison with control for treatment selection

∙Requires programming

∙No early termination at the end of phase II

Shun et al. propose a phase II/III, or two-stage, treatment selection design in which a single treatment is selected from two at the end of the first stage (i.e. phase II). Randomisation incorporates a control arm, with the intention of formal comparison at the end of the second stage only; that is, there is no formal comparison with control for treatment selection. The experimental treatment with the highest (or lowest, as appropriate) mean outcome, i.e. the ‘best’, is selected. A normal approximation approach is proposed to avoid complex numerical integration. Where a different outcome measure is used during phase II for treatment selection, the correlation between the phase II and III outcome measures must be specified. The design assumes that the treatment effects of the experimental treatments are not the same. The practical approach proposed for timing the interim analysis balances the need to perform it early, to avoid type I error inflation, against the need to perform it late enough that there is a high probability of correctly selecting the better treatment. No software is noted as being available; however, detail is provided to allow implementation, and a detailed example is given. The authors note that this design can be extended to binary and time-to-event outcome measures if the correlation between the final and interim test statistics is known.
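The timing trade-off can be made concrete with the normal approximation. If the two experimental arms have true means differing by δ > 0, a common known standard deviation σ and n patients each at the interim, the probability that the better arm shows the larger observed mean is Φ(δ / (σ√(2/n))). The sketch below assumes equal allocation and known σ. It ignores the control arm and the type I error considerations that Shun et al. also address, and the parameter values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_correct_selection(delta: float, sigma: float, n_per_arm: int) -> float:
    """Normal-approximation probability that the truly better of two arms
    has the larger observed mean at the interim (selection) analysis.

    Assumes a known common SD `sigma`, a true mean difference `delta` > 0
    between the two experimental arms, and `n_per_arm` evaluable patients
    per arm at the interim.
    """
    se_diff = sigma * sqrt(2.0 / n_per_arm)  # SE of the difference in means
    return NormalDist().cdf(delta / se_diff)


# Later interims (larger n_per_arm) make correct selection more likely.
for n in (10, 20, 40, 80):
    print(n, round(prob_correct_selection(delta=0.3, sigma=1.0, n_per_arm=n), 3))
```

The later the interim, the higher this probability, which is the side of the trade-off that pushes the selection analysis back.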

5.1.8.3 Multinomial outcome measure

No references identified.

5.1.8.4 Time-to-event outcome measure

Royston et al. (2003)

∙Phase II/III, time-to-event outcome, control arm

∙Formal comparison with control for treatment selection

∙Some programming required before using standard software

∙No early termination during phase II

Royston and colleagues outline a multi-arm, two-stage design aimed at identifying treatments worthy of further consideration at the end of the first stage by

102 A PRACTICAL GUIDE TO DESIGNING PHASE II TRIALS IN ONCOLOGY

comparing each treatment with a control arm using an intermediate outcome measure of treatment activity. Only those treatments showing sufficient improvement in activity over control continue to the second stage, at the end of which each is compared with control using the outcome measure of primary interest (i.e. different from that used at the end of the first stage). Data from both stages of the trial are incorporated in the final analysis at the end of stage 2. The design may be seen as a seamless phase II/III design with treatment selection at the end of phase II, allowing more than one treatment to continue into phase III. Evaluating the operating characteristics of the design requires an estimate of the correlation between the treatment effects on the intermediate and final outcome measures. The authors propose an empirical approach to estimating this correlation using bootstrap resampling of previous data sets; implementation therefore requires data of this type to be available.
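A bootstrap estimate of the correlation between treatment-effect estimates can be sketched as follows. For simplicity this assumes each outcome is summarised as a binary event indicator and measures the treatment effect as a difference in event proportions, whereas Royston et al. work with time-to-event outcomes. The "previous trial" is simulated, hypothetical data, not data from the paper.

```python
import random


def effect(data, outcome):
    """Treatment effect as a difference in event proportions (arm 1 minus arm 0)."""
    exp = [row[outcome] for row in data if row["arm"] == 1]
    ctl = [row[outcome] for row in data if row["arm"] == 0]
    return sum(exp) / len(exp) - sum(ctl) / len(ctl)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_effect_correlation(data, n_boot=500, seed=1):
    """Correlation between intermediate and final treatment-effect estimates
    across bootstrap resamples of a previous trial's patient-level data."""
    rng = random.Random(seed)
    by_arm = [[row for row in data if row["arm"] == a] for a in (0, 1)]
    inter, final = [], []
    for _ in range(n_boot):
        # Resample patients with replacement within each arm (arm sizes kept).
        sample = [rng.choice(rows) for rows in by_arm for _row in rows]
        inter.append(effect(sample, "intermediate"))
        final.append(effect(sample, "final"))
    return pearson(inter, final)


def simulate_previous_trial(n_per_arm=150, seed=2):
    """Hypothetical historical data: progression raises the risk of death."""
    rng = random.Random(seed)
    data = []
    for arm, p_prog in ((0, 0.5), (1, 0.4)):
        for _ in range(n_per_arm):
            progressed = int(rng.random() < p_prog)
            died = int(rng.random() < (0.6 if progressed else 0.2))
            data.append({"arm": arm, "intermediate": progressed, "final": died})
    return data


rho = bootstrap_effect_correlation(simulate_previous_trial())
print(f"bootstrap correlation of effect estimates: {rho:.2f}")
```

Resampling within arm keeps the randomised group sizes fixed, so the spread of the replicates reflects sampling variability of each arm's estimates rather than chance imbalance in allocation.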

Todd and Stallard (2005)

∙Phase II/III, time-to-event outcome, control arm

∙Formal comparison with control for treatment selection

∙Programs noted as being available from authors

∙No early termination during phase II

Todd and Stallard describe a design in which treatment selection occurs at the first interim assessment (phase II), based on comparison of a short-term outcome measure for each of the treatments versus control. Patients are randomised to each of the experimental treatments and control during phase II, and then to the selected treatment and control during phase III. Phase III is carried out in a group-sequential manner, with the experimental treatment compared to control in terms of a longer-term outcome measure. The treatment with the largest test statistic is selected at phase II; that is, there is formal comparison with control, but the study may only be terminated for lack of activity at this stage. The trial protocol remains the same throughout the study, so patients in phase II can be incorporated in phase III. The clinically significant treatment effect, a treatment effect that is still desirable but not clinically significant, and the expected correlation between the phase II and III outcome measures are all required to identify the complete phase II/III design. Formulae are given, and programs to calculate stopping boundaries are noted as being available from the authors.
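The phase II selection rule (take forward the experimental arm with the largest test statistic versus control, or stop for lack of activity) can be sketched as below. This is a minimal illustration assuming a binary short-term outcome and an ordinary pooled two-sample z statistic. The paper's own test statistics, stopping boundaries and group-sequential phase III analysis are not reproduced, and `futility_bound` and the interim counts are hypothetical.

```python
from math import sqrt


def z_stat(x_exp, n_exp, x_ctl, n_ctl):
    """Pooled two-sample z statistic for a difference in response proportions."""
    p_pool = (x_exp + x_ctl) / (n_exp + n_ctl)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_exp + 1 / n_ctl))
    return (x_exp / n_exp - x_ctl / n_ctl) / se


def select_at_interim(exp_arms, control, futility_bound=0.0):
    """Pick the experimental arm with the largest z statistic versus control.

    exp_arms: {name: (responses, patients)}; control: (responses, patients).
    Returns (selected name, or None if the best z is below futility_bound,
    and the dict of z statistics).
    """
    x_ctl, n_ctl = control
    z = {name: z_stat(x, n, x_ctl, n_ctl) for name, (x, n) in exp_arms.items()}
    best = max(z, key=z.get)
    if z[best] < futility_bound:
        return None, z  # stop for lack of activity
    return best, z


arms = {"A": (15, 40), "B": (20, 40), "C": (10, 40)}
selected, z = select_at_interim(arms, control=(12, 40))
print(selected, {k: round(v, 2) for k, v in z.items()})
```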

5.1.8.5 Ratio of times to progression

No references identified.

5.1.9 Randomised discontinuation designs

No references identified.


5.2 Not including a control arm

5.2.1 One-stage designs

5.2.1.1 Binary outcome measure

Whitehead (1985)

∙One-stage, binary outcome, no control arm

∙Requires programming

∙No early termination

Whitehead discusses a phase II selection design for settings in which a number of treatments are available for study, either currently or expected in the near future, and a fixed number of patients is available over a period of time. Patients are randomised to the treatments currently available, and new treatments may be entered as they become available. A given number of patients is recruited to each treatment, and analysis takes place when all treatments have been considered. The design, for which no software is detailed but for which formulae are given to allow implementation, identifies the optimal number of treatments (t) and patients per treatment (n) such that nt = total number of patients available. The examples given consider trials including around 10 treatments with 6 patients per treatment, that is, 60 patients in total. Analysis takes the form of an appropriate statistical model, fitting treatment as a covariate and incorporating other prognostic variables as necessary. The treatment with the largest estimated beneficial effect is then selected for further investigation in phase III. The design can be modified so that more than one treatment is taken forward and so that cut-off boundaries are incorporated to ensure a pre-specified level of success. Any outcome measure distribution may be considered. No control patients are incorporated, and the risk of a false-negative result is not assessed. It is noted that this design may be less appropriate when only a few treatments are to be tested and the number of available patients is plentiful.
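The search over (t, n) with nt equal to the total number of patients can be sketched by simulation. Whitehead's own optimality criterion and formulae are not reproduced here. As a stand-in, this sketch assumes true response rates drawn from a hypothetical Beta(2, 6) prior, selects the arm with the most responses (ties broken at random, an assumption), and scores each design by the expected true response rate of the selected treatment.

```python
import random


def expected_selected_rate(t, n, a=2.0, b=6.0, n_sim=2000, seed=0):
    """Monte Carlo estimate of the mean true response rate of the treatment
    selected when t treatments each get n patients and the arm with the most
    responses is chosen (ties broken at random).

    True response rates are drawn from an assumed Beta(a, b) prior.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        rates = [rng.betavariate(a, b) for _ in range(t)]
        responses = [sum(rng.random() < p for _ in range(n)) for p in rates]
        top = max(responses)
        chosen = rng.choice([i for i, r in enumerate(responses) if r == top])
        total += rates[chosen]
    return total / n_sim


def candidate_designs(total_patients, min_per_arm=2):
    """All (t, n) with n * t == total_patients, t >= 2 and n >= min_per_arm."""
    return [(t, total_patients // t) for t in range(2, total_patients + 1)
            if total_patients % t == 0 and total_patients // t >= min_per_arm]


N = 60
results = {tn: expected_selected_rate(*tn) for tn in candidate_designs(N)}
for (t, n), value in sorted(results.items()):
    print(f"t={t:2d}, n={n:2d}: expected true rate of selected arm = {value:.3f}")
print("best (t, n):", max(results, key=results.get))
```

Testing more treatments gives more chances to include a good one, while fewer patients per treatment makes picking it less reliable; the simulated criterion simply makes that balance visible for a given prior.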

Simon et al. (1985)

∙One-stage, binary outcome, no control arm

∙Software available

∙No early termination

Simon et al. detail a selection procedure based on correctly selecting the treatment with the higher event rate when the difference in event rates is at least d, some pre-specified amount. The proposed design will always select a treatment, even if the differences are < d, but will do so with less assurance that the correct treatment is being selected. The design does not include a pre-specified minimum level of activity; however, it may be applied as an addition to another trial design establishing minimum levels of activity prior to treatment selection (as described in the introduction to this chapter). The design is easily implemented in statistical programming software such as SAS and is available in Machin et al. (2008).
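For two arms, the probability of correct selection can be computed exactly from binomial distributions. This sketch assumes equal numbers of patients per arm and that ties are broken by a fair coin toss. It covers only the two-arm case, and the values p = 0.20, d = 0.15 and the 90% target are illustrative.

```python
from math import comb


def binom_pmf(n, p):
    """Binomial(n, p) probabilities for 0..n responses."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def prob_correct_selection(n, p_low, d):
    """Two arms of n patients; the better arm has true rate p_low + d.
    Returns P(better arm has more responses) + 0.5 * P(tie)."""
    f_low, f_high = binom_pmf(n, p_low), binom_pmf(n, p_low + d)
    win = sum(f_high[j] * f_low[i] for j in range(n + 1) for i in range(j))
    tie = sum(f_high[k] * f_low[k] for k in range(n + 1))
    return win + 0.5 * tie


def smallest_n(p_low, d, target=0.90, n_max=300):
    """First n per arm (scanning upwards) with P(correct selection) >= target."""
    for n in range(1, n_max + 1):
        if prob_correct_selection(n, p_low, d) >= target:
            return n
    return None


n = smallest_n(p_low=0.20, d=0.15)
print(n, round(prob_correct_selection(n, 0.20, 0.15), 3))
```

When d = 0 the function returns exactly 0.5, reflecting that a treatment is always selected even when the arms do not differ.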

Sargent and Goldberg (2001)

∙One-stage, binary outcome, no control arm

∙Requires programming

∙No early termination

Sargent and Goldberg propose a treatment selection design similar to that of Simon et al., described above. The treatment with the higher event rate is selected when the difference between treatments in the event rate is at least d (required to be pre-specified). If the difference is less than d, other criteria for selection can be used. Sample size is chosen by considering the probability that the better treatment is correctly selected. Treatments do not have to pass given boundaries for minimum activity; it is simply necessary for one treatment to be better than the other by at least d. Sample size can be reduced by allowing for the better treatment to be picked correctly when the result is ambiguous: that is, when the difference between treatments is less than d, one assumes that, for example, the better treatment would be correctly chosen 50% of the time based on other criteria. As with the design proposed by Simon et al., this design does not need to operate alone and can be used in conjunction with other trial designs to ensure minimum levels of activity and to generate sample size. The design is easily implemented in statistical programming software such as SAS.
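The probability of correct selection used for sample size here combines a clear win (observed difference of at least d in favour of the better arm) with the ambiguous region (observed difference smaller than d in absolute value), in which the better treatment is assumed to be chosen with probability ρ (0.5 in the example above). A minimal exact-binomial sketch, assuming equal arm sizes; the response rates, d and the 90% target below are hypothetical inputs, not values from the paper.

```python
from math import comb


def binom_pmf(n, p):
    """Binomial(n, p) probabilities for 0..n responses."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def prob_correct(n, p_worse, p_better, d, rho=0.5):
    """P(correct selection) with n patients per arm.

    Clear win: observed rate of the better arm exceeds the other by >= d.
    Ambiguous: |observed difference| < d, resolved correctly with prob rho.
    """
    eps = 1e-12  # guards the >= d comparison against floating-point error
    f_w, f_b = binom_pmf(n, p_worse), binom_pmf(n, p_better)
    clear = ambiguous = 0.0
    for i, pw in enumerate(f_w):
        for j, pb in enumerate(f_b):
            diff = (j - i) / n  # observed difference in response rates
            if diff >= d - eps:
                clear += pw * pb
            elif abs(diff) < d - eps:
                ambiguous += pw * pb
    return clear + rho * ambiguous


def sample_size(p_worse, p_better, d, target=0.90, rho=0.5, n_max=300):
    """Smallest n per arm (scanning upwards) reaching the target P(correct)."""
    for n in range(2, n_max + 1):
        if prob_correct(n, p_worse, p_better, d, rho) >= target:
            return n
    return None


n_needed = sample_size(0.20, 0.35, d=0.05)
print(n_needed, round(prob_correct(n_needed, 0.20, 0.35, 0.05), 3))
```

Setting `rho=0` removes the credit for ambiguous results and so gives the larger sample size the text describes as being avoided.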

5.2.1.2 Continuous outcome measure

Whitehead (1985)

∙One-stage, continuous outcome, no control arm

∙Requires programming

∙No early termination

Whitehead discusses a phase II selection design for settings in which a number of treatments are available for study, either currently or expected in the near future, and a fixed number of patients is available over a period of time. Patients are randomised to the treatments currently available, and new treatments may be entered as they become available. A given number of patients is recruited to each treatment, and analysis takes place when all treatments have been considered. The design, for which no software is detailed but for which formulae are given to allow implementation, identifies the optimal number of treatments (t) and patients per treatment (n) such that nt = total number of patients available. The examples given consider trials including around 10 treatments with 6 patients per treatment, that is, 60 patients in total. Analysis takes the form of an appropriate statistical model, fitting treatment as a covariate and incorporating other prognostic variables as necessary. The treatment with the largest estimated beneficial effect is then selected for further investigation in phase III. The design can be modified so that more than one treatment is taken forward and so that cut-off boundaries are incorporated to ensure a pre-specified level of success. Any outcome measure distribution may be considered. No control patients are incorporated, and the risk of a false-negative result is not assessed. It is noted that this design may be less appropriate when only a few treatments are to be tested and the number of available patients is plentiful.

5.2.1.3 Multinomial outcome measure

No references identified.

5.2.1.4 Time-to-event outcome measure

Whitehead (1985)

∙One-stage, time-to-event outcome, no control arm

∙Requires programming

∙No early termination

Whitehead discusses a phase II selection design for settings in which a number of treatments are available for study, either currently or expected in the near future, and a fixed number of patients is available over a period of time. Patients are randomised to the treatments currently available, and new treatments may be entered as they become available. A given number of patients is recruited to each treatment, and analysis takes place when all treatments have been considered. The design, for which no software is detailed but for which formulae are given to allow implementation, identifies the optimal number of treatments (t) and patients per treatment (n) such that nt = total number of patients available. The examples given consider trials including around 10 treatments with 6 patients per treatment, that is, 60 patients in total. Analysis takes the form of an appropriate statistical model, fitting treatment as a covariate and incorporating other prognostic variables as necessary. The treatment with the largest estimated beneficial effect is then selected for further investigation in phase III. The design can be modified so that more than one treatment is taken forward and so that cut-off boundaries are incorporated to ensure a pre-specified level of success. Any outcome measure distribution may be considered. No control patients are incorporated, and the risk of a false-negative result is not assessed. It is noted that this design may be less appropriate when only a few treatments are to be tested and the number of available patients is plentiful.

5.2.1.5 Ratio of times to progression

No references identified.


5.2.2 Two-stage designs

5.2.2.1 Binary outcome measure

Weiss and Hokanson (1984)

∙Two-stage, binary outcome, no control arm

∙Standard software available

∙Early termination for lack of activity

Weiss and Hokanson discuss the concept of integrated phase II trials to minimise the enrolment of an excessive number of patients into trials of potentially ineffective drugs. It is assumed that a number of treatments are available for investigation at the same time; these are ranked to determine which treatment should be assessed first in a sequence of integrated trials. The first cohort of n1 patients is recruited to treatment 1 and followed up for response, during which time the next cohort of n1 patients is recruited to treatment 2, and so on. If a pre-specified number of responses is observed in any of the treatment arms among the first n1 patients in that cohort, recruitment to further treatment arms is halted and a further n2 patients are recruited to the treatment on which the responses were observed. This essentially follows an integrated scheme of multiple trials based on Gehan’s design (Gehan 1961). The aim of the process is to assess each treatment individually for inclusion in a phase III trial, as opposed to selecting a single treatment to investigate further. As such, at the end of the integrated process, a number of treatments may be deemed worthy of further investigation.
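A minimal sketch of the integrated scheme, assuming Gehan's usual first-stage rule: n1 is the smallest cohort size for which zero responses would have probability at most β if the true response rate were p1, and a treatment qualifies for its further n2 patients if at least `r_min` responses are seen. The sequential walk through ranked treatments ignores the overlap in recruitment and follow-up timing, and `first_promising` and the response rates used are hypothetical.

```python
import random
from math import ceil, log


def gehan_n1(p1, beta):
    """Smallest n1 with (1 - p1)**n1 <= beta, so that observing no responses
    in n1 patients has probability <= beta if the true response rate is p1."""
    return ceil(log(beta) / log(1 - p1))


def first_promising(true_rates, n1, r_min=1, seed=None):
    """Treat ranked treatments in turn with cohorts of n1 patients; return the
    index of the first with >= r_min responses (it would then receive n2 more
    patients), or None if none qualifies."""
    rng = random.Random(seed)
    for idx, p in enumerate(true_rates):
        responses = sum(rng.random() < p for _ in range(n1))
        if responses >= r_min:
            return idx
    return None


n1 = gehan_n1(p1=0.20, beta=0.05)
print("stage-1 cohort size:", n1)
print("first treatment to expand:", first_promising([0.05, 0.10, 0.30], n1, seed=3))
```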

Steinberg and Venzon (2002)

∙Two-stage, binary outcome, no control arm

∙Requires programming if design not in tables

∙Early treatment selection permitted

Steinberg and Venzon propose an early selection design which may be used in conjunction with a single-arm phase II trial design (to generate sample size; the authors use Simon’s two-stage design as an example). The proposed design incorporates assessment of two experimental treatments at the end of stage 1, selecting as superior the treatment with an event rate at least x% higher than the other treatment, with probability of correct selection z (x and z require pre-specifying). Tables are given to identify the required difference in the number of events observed between the two arms to select the superior treatment early. If no treatment is selected at the end of stage 1, the trial continues to randomise between the two treatments, with selection at the end; otherwise, if a treatment is selected, the trial continues recruiting patients to that arm up to the number required by the underlying design. When used in combination with, for example, Simon’s two-stage design, each treatment arm must independently pass the stopping boundaries at each stage to be evaluable for selection. This design may be considered in preference to single-stage selection designs to enable treatment selection as early as possible.
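Steinberg and Venzon's tables give the required difference in the number of events for early selection. Their exact criterion is not reproduced here. The sketch below only shows how any candidate threshold k (an arm is selected early if it leads by at least k responses at the end of stage 1) can be screened, by computing the probabilities of correct and incorrect early selection under assumed true rates. The stage-1 size of 19 per arm and the rates are hypothetical.

```python
from math import comb


def binom_pmf(n, p):
    """Binomial(n, p) probabilities for 0..n responses."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def early_selection_probs(n1, p_worse, x, k):
    """With n1 patients per arm at the end of stage 1, an arm is selected early
    if it leads the other by at least k >= 1 responses. The better arm's true
    rate is p_worse + x. Returns (P(correct early), P(wrong early))."""
    f_w, f_b = binom_pmf(n1, p_worse), binom_pmf(n1, p_worse + x)
    correct = wrong = 0.0
    for i, pw in enumerate(f_w):      # responses on the worse arm
        for j, pb in enumerate(f_b):  # responses on the better arm
            if j - i >= k:
                correct += pw * pb
            elif i - j >= k:
                wrong += pw * pb
    return correct, wrong


# Screen thresholds for a hypothetical stage 1 of 19 patients per arm.
for k in range(1, 8):
    c, w = early_selection_probs(n1=19, p_worse=0.20, x=0.15, k=k)
    print(f"k={k}: P(correct early)={c:.3f}, P(wrong early)={w:.3f}")
```

Raising k makes a wrong early pick rarer but also makes any early pick less likely, so more trials continue randomising to the end of the study.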

Logan (2005)

∙Two-stage, binary outcome, no control arm

∙Requires programming if design not in tables

∙Early termination for lack of activity

Logan proposes a two-stage selection design based on an adaptation of Simon’s two-stage design (Simon 1989), as applied to a randomised selection trial. Treatments that do not pass the stopping criteria at the end of the first stage are dropped, and the sample size for the second stage is adapted based on the number of treatments remaining for stage 2, up to a maximum sample size. Since the second-stage sample size is adaptive, the cut-off boundaries for the second stage also depend on the number of treatments continuing. The intention is that, if more than one treatment passes the final stopping criteria at the end of the second stage, additional selection criteria may be applied to choose between them. The proposed design offers a saving in the total number of patients compared with Simon’s designs when it is anticipated that not all treatments will be highly active. Tables of sample sizes and stopping boundaries are presented for various scenarios; however, additional designs require computation.

Jung and George (2009)

∙Two-stage, binary outcome, no control arm

∙Requires minimal programming

∙Early termination for lack of activity

Jung and George propose methods of comparing treatment arms in a randomised phase II trial, where the intention is either to select one treatment from many for further evaluation or to determine whether a single treatment is worthy of evaluation compared to a control. The phase II design is based on a k-armed trial (with or without a control arm for selection), with each arm designed for independent evaluation against historical control data following Simon’s two-stage design (Simon 1989) or similar. Different designs (i.e. the same two-stage design but with different operating characteristics) may be used for different arms in the independent evaluation if deemed necessary. A treatment must be accepted via the independent evaluation before it can be considered for selection, at which point between-arm comparisons are made. p-Values are calculated to represent the probability that the difference between the arms being compared is at least some pre-defined minimal accepted difference, given the actual difference observed. The outcome measure used to select the better treatment is the same as that used to evaluate each arm independently, for example, tumour response. No software is detailed; however, detail is given which should allow implementation, and sufficient examples are also provided. The initial two-stage design can be calculated using software available for Simon’s two-stage design.
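The independent evaluation of each arm rests on a Simon-type two-stage design, whose operating characteristics can be computed directly. The sketch evaluates the design 1/10, 5/29 (stop if at most 1 response in the first 10 patients; declare the arm active if more than 5 responses in 29), which we understand to be Simon's optimal design for p0 = 0.10, p1 = 0.30, α = 0.05 and β = 0.20; check it against published tables or software before relying on it. Jung and George's between-arm p-values are not covered.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); 0 for k < 0 and 1 for k >= n."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(i, n, p) for i in range(min(k, n) + 1))


def simon_two_stage_oc(r1, n1, r, n, p):
    """Operating characteristics of a Simon-type two-stage design at response
    rate p: stop after stage 1 if <= r1 responses in n1 patients; otherwise
    treat n - n1 more and call the arm active if > r responses in total.
    Returns (P(early termination), P(declared active), expected sample size)."""
    n2 = n - n1
    pet = binom_cdf(r1, n1, p)
    p_active = sum(
        binom_pmf(x1, n1, p) * (1 - binom_cdf(r - x1, n2, p))
        for x1 in range(r1 + 1, n1 + 1)
    )
    expected_n = n1 + (1 - pet) * n2
    return pet, p_active, expected_n


# Design 1/10, 5/29 evaluated at p0 = 0.10 and p1 = 0.30
for p in (0.10, 0.30):
    pet, p_active, en = simon_two_stage_oc(r1=1, n1=10, r=5, n=29, p=p)
    print(f"p={p:.2f}: PET={pet:.3f}, P(active)={p_active:.3f}, E[N]={en:.1f}")
```

At p0 the probability of declaring the arm active is the type I error, and at p1 it is the power.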

5.2.2.2 Continuous outcome measure

No references identified.

5.2.2.3 Multinomial outcome measure

No references identified.

5.2.2.4 Time-to-event outcome measure

No references identified.

5.2.2.5 Ratio of times to progression

No references identified.

5.2.3 Multi-stage designs

5.2.3.1 Binary outcome measure

Thall et al. (2000)

∙Multi-stage, binary outcome, no control arm

∙Requires programming (simulation programs noted as being available from

author)

∙No early termination (multi-stage randomisation)

Thall et al. outline a design for treatment strategy selection that incorporates response-adaptive randomisation within each patient; that is, future treatment strategies for each patient depend on the treatments they have already received and their responses. This design is multi-stage in nature since patients are randomised at various stages throughout their treatment schedule. Sample size requires investigation via simulation, and the authors note that sample size should be determined empirically rather than using simpler methods. Response is categorised as success or failure, the criteria for which can be different for different stages of treatment. The ‘best’ treatment strategy is selected as that with the largest estimated success probability, which can be assessed by considering responses to multiple treatment strategies for each patient.


5.2.3.2 Continuous outcome measure

No references identified.

5.2.3.3 Multinomial outcome measure

No references identified.

5.2.3.4 Time-to-event outcome measure

Cheung and Thall (2002)

∙Multi-stage, time-to-event outcome, no control arm

∙Programs noted as being available from authors

∙Early termination for activity or lack of activity

Cheung and Thall propose a Bayesian sequential-adaptive procedure for continuous monitoring, which may be extended to assessment after cohorts of more than

1 patient, that is, multi-stage. The outcome measure of interest is a binary indicator of

a composite time-to-event outcome, utilising all the censored and uncensored obser-