East 6 User Manual

East 6.4
© Cytel Inc. Copyright 2016

Preface

Acknowledgements

Welcome to East, a software platform for the statistical design, simulation and
monitoring of clinical trials.
The current release of East (version 6.4) was developed by a team comprising (in
alphabetical order): Gordhan Bagri, Dhaval Bapat, Priyanka Bhosle, Jim Bolognese,
Sudipta Basu, Jaydeep Bhattacharyya, Swechhya Bista, Apurva Bodas, Pushkar
Borkar, V. P. Chandran, Soorma Das, Pratiksha Deoghare, Aniruddha Deshmukh,
Namita Deshmukh, Yogesh Dhanwate, Suraj Ghadge, Pranab Ghosh, Karen Han,
Aarati Hasabnis, Pravin Holkar, Munshi Imran Hossain, Abhijit Jadhav, Yogesh
Jadhav, Prachi Jagtap, Paridhi Jain, Yannis Jemiai, Ashwini Joshi, Nilesh Kakade,
Janhavi Kale, Aditya Kamble, Anthiyur Kannappan, Parikshit Katikar, Uday
Khadilkar, Kapildev Koli, Yogita Kotkar, Hrishikesh Kulkarni, Mandar Kulkarni,
Mangesh Kulkarni, Shailesh Kulthe, Charles Liu, Lingyun Liu, Shashank Maratkar,
Cyrus Mehta, Pradoshkumar Mohanta, Manashree More, Tejal Motkar, Ankur
Mukherjee, Nabeela Muzammil, Neelam Nakadi, Vijay Nerkar, Sandhya Paranjape,
Gaurangi Patil, Vidyadhar Phadke, Anup Pillai, Shital Pokharkar, Vidyagouri Prayag,
Achala Sabane, Sharad Sapre, Rohan Sathe, Pralay Senchaudhuri, Rhiannon Sheaparé,
Pradnya Shinde, Priyadarshan Shinde, Sumit Singh, Sheetal Solanki, Chitra Tirodkar,
Janhavi Vaidya, Shruti Verma, Pantelis Vlachos, Suchita Wageshwari, Kiran Wadje,
Ritika Yadav.
Other contributors to this release include Asmita Ghatnekar, Sam Hsiao, Brent Rine,
Ajay Sathe, Chinny Swamy, Nitin Patel, Yogesh Gajjar, Shilpa Desai.
Other contributors who worked on previous releases of East: Gayatri Bartake, Ujwala
Bamishte, Apurva Bhingare, Bristi Bose, Chandrashekhar Budhwant, Krisnaiah
Byagari, Vibhavari Deo, Rupali Desai, Namrata Deshpande, Yogesh Deshpande,
Monika Ghatage, Ketan Godse, Vishal Gujar, Shashikiran Halvagal, Niranjan
Kshirsagar, Kaushal Kulkarni, Nilesh Lanke, Manisha Lohokare, Jaydip
Mukhopadhyay, Abdulla Mulla, Seema Nair, Atul Paranjape, Rashmi Pardeshi, Sanket
Patekar, Nabarun Saha, Makarand Salvi, Abhijit Shelar, Amrut Vaze, Suryakant
Walunj, Sanhita Yeolekar.
We thank all our beta testers for their input and obvious enthusiasm for the East
software. They are acknowledged by name in Appendix Z.
We owe a debt of gratitude to Marvin Zelen and to Swami Sarvagatananda, special
people whose wisdom, encouragement and generosity have inspired Cytel for over two
decades.
Finally, we dedicate this software package to our families and to the memory of our
dearly departed Stephen Lagakos and Aneesh Patel.

Our Philosophy

We would like to share with you what drives and inspires us during the research and
development stages of the East software.
Empower, do not Frustrate
We believe in making simple, easy-to-use software that empowers people.
We believe that statisticians have a strategic role to play within their organization and
that by using professionally developed trial design software they will utilize their time
better than if they write their own computer programs in SAS or R to create and
explore complex trial designs. With the help of such software they can rapidly generate
many alternative design options that accurately address the questions at hand and the
goals of the project team, freeing time for strategic discussions about the choice of
endpoints, population, and treatment regimens.
We believe that software should not frustrate the user’s attempt to answer a question.
The user experience ought to engage the statistician and inspire exploration,
innovation, and the quest for the best design. To that end, we believe in the following
set of principles:
Fewer, but Important and Useful Features: It is better to implement fewer, but important and useful features, in an elegant and simple-to-use manner, than to provide a host of options that confuse more than they clarify. As Steve Jobs put it: "Innovation is not about saying 'Yes' to everything. It's about saying 'No' to all but the most crucial features."
Just because we Can, doesn't mean we Should: Just because we can provide functionality in the software, doesn't mean we should.
Simplify, Simplify, Simplify: Find and offer simple solutions, even for the most complex trial design problems.
Don't Hurry, but continually Improve: Release new solutions when they are ready to use, and continually improve the commercial releases with new features, bug fixes, and better documentation.
Provide the best Documentation and Support: Our manuals are written like textbooks, to educate, clarify, and elevate the statistical knowledge of the user.
Our support is provided by highly competent statisticians and software engineers who focus on resolving the customer's issue while staying mindful of speed and quality requirements. We believe that delivering delightful customer support is our company's lifeblood.
Finally, we listen to our customers constantly and proactively through countless
informal and formal interactions, software trainings, and user group meetings. This
allows us to follow all the principles laid out above in the most effective manner.
Assess
It is essential to be able to assess the benefits and flaws of various design options and
to work one’s way through a sensitivity analysis to evaluate the robustness of design
choices. East can very flexibly generate multiple fixed sample size, group sequential,
and other adaptive designs at the click of a button. The wealth of design data generated
in this manner requires new tools to preview, sort, and filter through in order to make
informed decisions.
Share
Devising the most innovative and clever designs is of no use if the statistician is unable
to communicate in a clear and convincing manner what the advantages and
characteristics of the design are for the clinical trial at hand. We believe statistical
design software tools should also be communication tools to share the merits of
various trial design options with the project team and encourage dialog in the process.
The many graphs, tables, simulation output, and other flexible reporting capabilities of
East have been carefully thought out to provide clear and concise communication of
trial design options in real time with the project team.
Trust
East has been fully validated and intensely tested. In addition, the East software
package has been in use and relied upon for almost 20 years. East has helped design
and support countless actual studies at all the major pharmaceutical and biotech
companies, academic research centers, and government institutions.
We use and rely on our software every day in our consulting activities to collaborate
with our customers, helping them optimize and defend their clinical trial designs. This
also helps us quickly identify things that are frustrating or unclear, and improve them
fast - for our own sake and that of our customers.


What’s New in East 6.4

Version 6.4 of East introduces some important new features:
1. Multi-arm multi-stage designs East now offers the ability to design multi-arm
multi-stage studies with options for early stopping, dose selection, and sample
size re-estimation. The group sequential procedures (Gao et al., 2014) have been implemented for normal endpoints, whereas the p-value combination approaches (Posch et al., 2005) have been implemented for both normal and binomial endpoints. See Chapters 17, 18 and 29 for more details. (A sketch of the p-value combination rule appears at the end of this section.)
2. Multiple endpoints designs for binomial endpoints Gatekeeping procedures to
control family-wise type-1 error when testing multiple families of binomially
distributed endpoints are now available in East for fixed sample (1-look) designs.
East will also use the intersection-union test when testing a single family of
endpoints. See Chapters 16 and 28 for more details.
3. Multi-arm designs for survival endpoints Designs for pairwise comparisons of
treatment arms to control have been added for survival endpoints. See
Chapter 51 for more details.
4. Enrollment and event prediction East now includes options to predict
enrollment and events based on accumulating blinded data and summary
statistics. Prediction based on unblinded data was already implemented in the previous version, so the current version provides both options: Unblinded as well as Blinded. See Chapter 68 for more details.
5. Dual agent dose-escalation designs This version of East adds methods to the
Escalate module for dual-agent dose-escalation designs, including the Bayesian
logistic regression model (BLRM; Neuenschwander et al., 2014), and the
Product of Independent beta Probabilities dose Escalation (PIPE; Mander et al.,
2015). Numerous feature enhancements have also been made to the existing
single-agent dose escalation designs. See Chapter 32 for more details.
6. Bayesian probability of success (assurance) and predictive power for
survival designs
East 6.4 will now calculate assurance (O’Hagan et al., 2005), or Bayesian
probability of success, and predictive power for survival endpoints. See
Chapter 48 for more details.
7. Interim monitoring using the Müller and Schäfer method East 6.4 now provides the capability of monitoring clinical trials adaptively, using the Müller and Schäfer method. Currently, this feature is available for Survival Endpoint tests only. See Chapter 56 for more details.
8. General usability enhancements Numerous enhancements have been made to
the software to improve the user experience and workflow.
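To make the p-value combination idea in item 1 concrete, below is a minimal R sketch of the weighted inverse-normal combination rule commonly used in two-stage combination designs of this kind. The equal weights and the example p-values are illustrative assumptions, not East defaults.

    # Weighted inverse-normal combination of two stage-wise one-sided p-values;
    # the prespecified weights must satisfy w1^2 + w2^2 = 1.
    combine_p <- function(p1, p2, w1 = sqrt(0.5), w2 = sqrt(0.5)) {
      1 - pnorm(w1 * qnorm(1 - p1) + w2 * qnorm(1 - p2))
    }

    combine_p(0.10, 0.02)           # combined p-value, about 0.009
    combine_p(0.10, 0.02) < 0.025   # reject the overall null at one-sided 2.5%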

What’s New in East 6.3

Version 6.3 of East introduces some important new features:
1. Updates to Promising Zone designs: Ratio of Proportions designs; Müller
and Schäfer type-1 error control method; Estimation
East 6.3 introduces Promising Zone designs for the ratio of proportions. East 6.3
also implements the method of Müller and Schäfer (2001) to control type-1 error
for adaptive unblinded sample size re-estimation designs. This is available for
simulation and interim monitoring. Also, estimation using Repeated Confidence Intervals (RCI) and Backward Image Confidence Intervals (BWCI; Gao, Liu & Mehta, 2013) is available in Müller and Schäfer simulations. See Chapter 52
for more details.
2. Multiple endpoint designs
Parallel gatekeeping procedures to control family-wise type-1 error when testing
multiple families of normally distributed endpoints are now available in East for
fixed sample (1-look) designs. East will also use the intersection-union test
when testing a single family of endpoints. See Chapter 16 for more details.
3. Exact designs for binomial endpoints
East now includes the ability to use the exact distribution when computing power and sample size for binomial endpoints. This applies to all binomial tests in the case of fixed designs. In addition, group sequential exact designs are available for the single proportion case, and Simon's two-stage optimal and minimax designs (Simon, 1989) have been implemented, which allow for early futility stopping while optimizing the expected sample size and the maximum sample size, respectively. See Chapter 33 for more details. (A sketch of the operating characteristics of a Simon two-stage design appears at the end of this section.)
4. Dose escalation designs
East 6.3 now includes a module for the design, simulation, and monitoring of
modern dose-escalation clinical trials. Model-based dose-escalation methods in
this module include the Continual Reassessment Method (mCRM; Goodman et
al., 1995), the Bayesian logistic regression model (BLRM; Neuenschwander et
al., 2008), and the modified Toxicity Probability Interval (mTPI; Ji et al., 2010).
See Chapter 32 for more details.
5. Predictive interval plots, conditional simulations, and enrollment/events prediction
East 6.3 now includes a module that offers the ability to simulate and forecast
the future course of the trial based on current data. This includes conditional
simulations to assess expected treatment effects and associated repeated
confidence intervals at future looks (also called Predicted Interval Plots or PIP;
Li et al. 2009), as well as the probability of finishing with a successful trial
(conditional power). You can also plan and simulate clinical trials with greater
precision using different accrual patterns and response information for different
regions/sites. East allows you to make probabilistic statements about accruals,
events, and study duration using Bayesian models and accumulating data. See
Chapters 65, 66 and 67 for more details.
6. Sample size and information calculators
Sample size and information calculators have been added back into East to allow
easy calculation of the two quantities. See Chapter 59 for more details.
7. Exporting/Importing between East and East Procs
East 6.3 designs can now be exported to work with the newly released East
Procs. The output from East Procs can be imported back into East 6.3 for use in
the East Interim Monitoring dashboard and to conduct conditional inference and
simulations. See Chapter 69 for more details.
8. Changes to East input
Many changes have been implemented in East to enhance the user experience in
providing input for their designs. These changes include the ability to specify
multiple values of input parameters for survival designs (most notably the
Hazard Ratio), the ability to directly convert many fixed sample designs into
group sequential designs with the use of the Sample Size based design option,
and the ability to convert an ANOVA design into a Multiple Comparison to
Control design.
9. Changes to East output
Display of East output has been changed in many ways, including color coding
of input and output, ability to collapse and expand individual tables, greater
decimal display control, and more exporting options for results (e.g., the ability to export graphs directly into Microsoft PowerPoint).
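To see the mechanics behind the Simon designs in item 3, the following R sketch computes the operating characteristics of a given two-stage design. The design constants in the example (r1 = 1, n1 = 10, r = 5, n = 29) are illustrative inputs rather than East output.

    # Operating characteristics of a Simon two-stage design (r1, n1, r, n):
    # stop for futility after stage 1 if responses <= r1; declare the drug
    # active at the end only if total responses exceed r.
    simon_oc <- function(p, r1, n1, r, n) {
      pet <- pbinom(r1, n1, p)              # probability of early termination
      x1  <- (r1 + 1):n1                    # stage-1 outcomes that continue
      p_active <- sum(dbinom(x1, n1, p) * (1 - pbinom(r - x1, n - n1, p)))
      c(PET = pet, EN = n1 + (1 - pet) * (n - n1), P_declare_active = p_active)
    }

    rbind(null = simon_oc(0.10, 1, 10, 5, 29),   # P_declare_active = type-1 error
          alt  = simon_oc(0.30, 1, 10, 5, 29))   # P_declare_active = power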

What’s New in East 6.2

Version 6.2 of East introduces some important new features:
1. Promising Zone Designs using CHW and CDL type-1 error control methods
East 6.2 introduces Promising Zone Designs from East 5.4 for differences of
means, proportions, and the log-rank test. The methods of Cui, Hung, and Wang
(1999) and Chen, DeMets, and Lan (2003) are implemented for adaptive
unblinded sample size re-estimation designs and available for simulation and
interim monitoring.
2. Multiple endpoint designs Serial gatekeeping procedures to control
family-wise type-1 error when testing multiple families of normally-distributed
endpoints are now available in East for fixed sample (1-look) designs.
3. Power and sample size calculations for count data East now offers power
analysis and sample size calculations for count data in fixed sample (1-look)
designs. Specifically, East provides design capabilities for:
(a) Test of a single Poisson rate
(b) Test for a ratio of Poisson rates
(c) Test for a ratio of Negative Binomial rates
4. Precision-based sample size calculations Sample size calculations are now available based on specification of a confidence interval for most tests provided in East. (A minimal sketch appears below.)
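As a flavor of item 4, here is a minimal R sketch of a precision-based sample size in the simplest setting: a single mean with known standard deviation, choosing n so that the two-sided confidence interval has a desired half-width w. The numbers are illustrative assumptions; East covers many more tests.

    # Smallest n such that the CI  mean +/- z * sigma / sqrt(n)  has
    # half-width at most w (known-sigma normal case).
    precision_n <- function(sigma, w, conf = 0.95) {
      z <- qnorm(1 - (1 - conf) / 2)
      ceiling((z * sigma / w)^2)
    }

    precision_n(sigma = 10, w = 2)   # returns 97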

What’s New in East 6.1

Version 6.1 of East introduces some important new features:
1. Bayesian probability of success (assurance) and predictive power
For one-sample and two-sample continuous and binomial endpoints, East 6.1
will now compute Assurance (O’Hagan et al., 2005) or Bayesian probability of
success, a Bayesian version of power, which integrates power over a prior
distribution of the treatment effect, giving an unconditional probability that the
trial will yield a significant result. When monitoring such a design using the
Interim Monitoring dashboard, East 6.1 will also compute Bayesian predictive
power using the pre-specified prior distribution on the treatment effect. This
computation will be displayed in addition to the fiducial version of predictive
power, which uses the estimated treatment effect and standard error to define a Gaussian prior distribution. (A minimal sketch of the assurance computation appears at the end of this section.)
2. Stratification in simulation of survival endpoints
When simulating a trial design with a time-to-event endpoint, East 6.1
accommodates data generation in a stratified manner, accounting for up to 3
stratification variables and up to 25 individual strata. The fraction of subject data
generated in each stratum, and the survival response generation mechanism for
each stratum, can be flexibly adjusted. In addition, stratified versions of the
logrank statistic and other test statistics available for analysis of the simulated
data are provided.

3. Integration of R code into simulations
East 6.1 simulations now include the option to use custom R code to define
specific elements of the simulation runs. R code can be used to modify the way
the subjects are accrued, how they are randomized, how their response data are
generated, and how the test statistic is computed.
4. Reading East 5.4 workbooks
East 5.4 workbooks can be read into East 6.1 after conversion using the utility
provided in the program menu. Go to the start menu and select:
Programs > East Architect > File Conversion > East5 to East6
5. Floating point display of sample size
East 6.1 now has a setting to choose whether to round sample sizes (at interim
and final looks) up to the nearest integer, or whether to display them as a floating
point number, as in East 5.
6. Enhancement to the Events vs. Time plot
This useful graphic for survival designs has been updated to allow the user to
edit study parameters and create a new plot directly from a previous one,
providing the benefit of quickly assessing the overall impact of input values on a
design prior to simulation.
7. Interim monitoring (IM) dashboard
The capability to save snapshots of the interim monitoring (IM) dashboard is
now supported in East 6.1. At each interim look of a trial, updated information
can be saved and previous looks can be easily revisited. Alternatively, prior to
employing actual data this functionality could be used to compare multiple
possible scenarios, providing the user a sense of how a future trial could unfold.
8. Enhancement to the Logrank test
For trials with survival endpoints, East 6.1 allows the user to simultaneously create multiple designs by specifying a range of values for key parameters in the Logrank test.
9. Enhancement to binomial designs
For studies with discrete outcomes, East 6.1 allows the user to simultaneously
create multiple designs by specifying a range of values for key parameters.
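The assurance computation in item 1 is easy to sketch by Monte Carlo: average the usual power function over the prior on the treatment effect. The R fragment below does this for a one-sided two-sample z-test with known sigma; the prior and design values are assumptions chosen only for illustration.

    # Assurance = E[ power(delta) ], with delta drawn from the prior.
    set.seed(1)
    alpha <- 0.025; n_per_arm <- 100; sigma <- 1

    power_fn <- function(delta)    # power of the one-sided two-sample z-test
      1 - pnorm(qnorm(1 - alpha) - delta / (sigma * sqrt(2 / n_per_arm)))

    delta_prior <- rnorm(1e5, mean = 0.4, sd = 0.2)   # assumed N(0.4, 0.2^2) prior
    c(power_at_prior_mean = power_fn(0.4),            # about 0.81
      assurance           = mean(power_fn(delta_prior)))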

What's New in East 6.0 on the Architect Platform

East Architect is version 6.0 of the East package and builds upon earlier versions of the software. The transition of East to the next-generation Architect platform has removed all prior dependencies on Microsoft Excel. As a result, the user interface is very different, leading to a new user experience and workflow. Although you might find that there is a learning curve to getting comfortable with the software, we trust that you will find that the new platform provides a superior user experience and improved workflow.
The Architect platform also adds data management and analysis capabilities similar to
those found in Cytel Studio, StatXact, and LogXact, as well as a powerful reporting
tool we call Canvas, which provides flexible and customizable reports based on design
and simulation information.
Version 6.0 of East introduces some important new features in addition to the new
platform environment. Here is a selection:
1. New designs A large number of fixed sample designs have been added for
various endpoints and trial types. These were present in the SiZ software and
have now been fully integrated into East.
2. Multi-arm designs Designs for pairwise comparisons of treatment arms to
control have been added for differences of means and differences of proportions.
These designs are mostly simulation-based and provide operating characteristics
for fixed sample studies using multiplicity adjusting procedures such as
Dunnett’s, Bonferroni, Sidak, Hochberg, Fallback, and others.
3. Creation of multiple designs or simulations at once:
East Architect provides the ability to create multiple designs or to run multiple
simulation scenarios at once, by specifying lists or sequences of values for
specific parameters rather than single scalars. This capability allows the user to
explore a greater space of possibilities or to easily perform sensitivity analysis.
Accompanying tools to preview, sort, and filter are provided to easily parse the
large output generated by East.
4. Response lag, accrual, and dropouts for continuous and discrete endpoints:
Designs created for continuous and discrete endpoints now have the option for
the user to specify a response lag (between randomization and observation of the
endpoint), as well as an accrual rate and dropout rate for the study population.
As a result, some terminology has been introduced to distinguish between the
number of subjects who need to be enrolled in the study (Sample Size) and the
number of subjects whose endpoint must be observed in order to properly power
the study (Completers).
5. Flexibility in setting up boundaries The efficacy and futility rules of a design no longer need to be present at each and every look. The user can specify whether a look includes the efficacy stopping rule, the futility rule, or both. Therefore, a design can be set up where at the first look only futility stopping is possible, whereas at later looks both efficacy and futility stopping, or only efficacy stopping, is allowed. In addition, the futility rule can now be
specified on two new scales, which are the standardized treatment scale and the
conditional power scale.
6. Predictive power Predictive power is now provided as an alternative to
conditional power in the interim monitoring sheet of the software. Further details about how this is implemented can be found in Appendix C.
7. Comparing designs One can compare multiple designs either graphically or in
tabular format simply by selecting them and choosing a plot or table output
button.
8. Improvements in algorithms Many improvements have been made to the way
computations are performed, both to improve accuracy and speed, but also to
provide more intuitive results. For example, older versions of East used an
approximation to conditional power based on ignoring all future looks but the
final one. This approximation has been dropped in favor of computing the exact
value of conditional power. Many other changes have been made that might
result in different values being computed and displayed in East Architect as
compared to earlier versions of the software. For greater details about the
changes made, please refer to the "Read Me" notes that accompany the software release. (A standard form of the single-look conditional power approximation mentioned above appears at the end of this section.)
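For reference, a standard textbook form of the single-future-look approximation mentioned in item 8 (shown here only to make the dropped approximation concrete; the notation is generic, not East's) is

$CP_k(\theta) \approx 1 - \Phi\!\left(\frac{z_{1-\alpha}\sqrt{I_{\max}} - z_k\sqrt{I_k} - \theta\,(I_{\max}-I_k)}{\sqrt{I_{\max}-I_k}}\right),$

where $z_k$ is the current Wald statistic at information $I_k$, $I_{\max}$ is the maximum information, and $\theta$ is the assumed treatment effect. East Architect instead computes the exact value, accounting for all intermediate looks and boundaries.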

What’s New in East 5

After East 5 (version 5.0) was released, several upgrades were issued. The details are:
1. In the current release of version 5.4, the module EastSurvAdapt has been added.
2. In the previous version 5.3, the module EastAdapt was substantially revised.
3. In the earlier version 5.2, the module EastExact was released.
4. In the still earlier version 5.1, several improvements were introduced in the EastSurv module.
The details of these modules can be found in the respective chapters of the user manual.
East 5 upgraded the East system in several important ways in direct response to customer feedback. Six important extensions were developed in East 5:
1. Designs using t-tests:
In previous versions of East, the single look design was treated as a special case
of a group sequential design. Thus the same large sample theory was used to
power and size these traditional types of designs. Recognizing that this solution was not entirely satisfactory for small-sample trials, we have implemented single-look t-test designs for continuous data in East 5. (Sections 8.1.4, 8.2.4, 9.1.3, and 11.1.3)
2. New boundaries:
East 5 provides two new procedures for specifying group sequential boundaries.
Generalized Haybittle-Peto boundaries allow the user to specify unequal
p-values at each interim look for a group sequential plan. East will
recalculate the final p-value in order to preserve the type I error (Section 38.1; a sketch of this recalculation appears after this list).
The cells for entering the cumulative alpha values of an interpolated
spending function can be automatically populated with the cumulative
alpha values of any of the published spending functions available to East,
and subsequently edited to suit user requirements. For example, a 4-look
Lan and DeMets O’Brien-Fleming spending function can be modified so
that the critical value at the first look is less conservative than usual.
(Section 38.3.1)
3. Interim monitoring and simulation for single-look designs:
Interim monitoring and simulation sheets have been provided for all single look
designs in East 5.
4. Improvement to Charts:
Many improvements to existing charts in East have been implemented in this
version.
– Scaling in the Duration vs. Accrual chart has been corrected to provide a better tool for the user.
– The use of semi-log scaling has enabled us to represent many charts on the natural scale of the treatment effect. This concerns mostly ratio metrics such as the relative risk, the hazard ratio, and the odds ratio. Boundaries on the relative risk scale, for example, are now available in East 5.
– Boundaries can also be visualized on the score scale.
– Charts can be summarized in tabular form. The user is given the option to generate tables of power vs. sample size, power vs. treatment effect, events vs. time, and so on. These tables can easily be copied and pasted into external applications like Microsoft Word and Excel. (Section 4.5)
5. Improved usability:
Much attention in East 5 was spent on improving the user's experience within the environment.
– A graph sheet allows the user to compare up to 16 charts side by side. Charts for any number of plans within a workbook can be exported to the graph sheet. (Section 5.3)
– The scratch sheet is a full-fledged Microsoft Excel sheet that can be brought up within the East application. (Section 4.4)
– The split view option enables the user to see two sheets of the same workbook simultaneously. This can be useful if one window pane contains a scratch sheet where side calculations may be done based on numbers in
the other window pane. Another use can be to have two or more plans show up on one pane and their graph sheet containing boundaries or other charts show up on another pane for easy comparison. (Section 4.8)
– Messages in the help menu, pop-up help, and context-sensitive help have been revised and rendered more informative to the user.
– The default appearance of charts can be specified by the user through the preferences settings menu item. (Section 4.7)
6. Installation validation:
East 5 includes an installation validation procedure that will easily check that the
software has been properly installed on the user’s system. (Section 2.3)
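The recalculation of the final critical value for a generalized Haybittle-Peto boundary (item 2 above) can be sketched in a few lines of R with the mvtnorm package. This is a hedged illustration under the standard group sequential correlation structure, not East's internal algorithm; the interim p-values and information fractions below are made-up inputs.

    library(mvtnorm)   # for pmvnorm

    # Solve for the final critical value cK so that the overall one-sided
    # type-1 error equals alpha, given fixed interim critical values.
    final_crit <- function(interim_p, info_frac, alpha = 0.025) {
      c_int <- qnorm(1 - interim_p)          # interim boundaries on the z scale
      # corr(Z_i, Z_j) = sqrt(t_i / t_j) for information fractions t_i <= t_j
      Sig <- outer(info_frac, info_frac,
                   function(a, b) sqrt(pmin(a, b) / pmax(a, b)))
      f <- function(cK)   # P(no boundary crossed under H0) - (1 - alpha)
        pmvnorm(upper = c(c_int, cK), sigma = Sig)[1] - (1 - alpha)
      uniroot(f, c(0, 10))$root
    }

    # Three equally spaced looks, nominal interim p-values 0.001 and 0.005:
    final_crit(interim_p = c(0.001, 0.005), info_frac = c(1/3, 2/3, 1))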
Finally, there has been an important reorganization of the East manual, which now
comprises seven volumes organized as follows: (1) The East System (2) Continuous
Endpoints (3) Binomial and Categorical Endpoints (4) Time-to-Event Endpoints (5)
Adaptive Designs (6) Special Topics (7) Appendices. Page numbers are continuous
through volumes 1-7. Each volume contains a full table of contents and index to the
whole manual set.

Preface to East 4

East 4 was a very large undertaking involving over 20 developers, documenters, testers
and helpers over a two-year period. Our goal was to produce one single powerful
design and monitoring tool with a simple, intuitive, point and click, menu driven user
interface, that could cover the full range of designs commonly encountered in a clinical
trial setting, for either fixed sample or group sequential designs. The resulting product,
East 4, extends the East system for flexible design and interim monitoring in four
major ways as listed below.
1. Broad Coverage:
Previous versions of East dealt primarily with the design of two-arm group
sequential trials to detect a difference of means for normal and binomial
endpoints and a hazard ratio for survival endpoints. East 4 extends these
capabilities to other settings.
– Easily design and monitor up to 34 different clinical trial settings including one-, two- and K-sample tests; linear, logistic and Cox regression; longitudinal designs; non-inferiority and bioequivalence designs; cross-over and matched-pair designs; nonparametric tests for continuous and ordered categorical outcomes.
– Comparisons between treatment and control groups can be in terms of differences, ratios or odds ratios.
– Non-inferiority trials can be designed to achieve the desired power at superiority alternatives.
2. New Stopping Boundaries and Confidence Intervals:
– Non-binding futility boundaries. Previously, futility boundaries could not be overruled without inflating the type-1 error. New non-binding futility boundaries preserve power and type-1 error and yet can be overruled if desired.
– Asymmetric two-sided efficacy boundaries. You can allocate the type-1 error asymmetrically between the upper and lower stopping boundaries, and can spend it at different rates with different error spending functions. This provides added flexibility for aggressive early stopping if the treatment is harmful and conservative early stopping if the treatment is beneficial.
– Futility boundaries can be represented in terms of conditional power. This brings greater objectivity to conditional power criteria for early stopping.
– Two-sided repeated confidence intervals are now available for one-sided tests with efficacy and futility boundaries. Previously, only one-sided confidence bounds were available.
– Interactive repeated confidence intervals are provided at the design stage to aid in sample size determination and selection of stopping boundaries.
3. New Analytical and Simulation Tools for Survival Studies:
EastSurv is an optional new module, fully integrated into the East system, that
extends East’s design capabilities to survival studies with non-uniform accrual,
piecewise exponential distributions, drop outs, and fixed length of follow-up for
each subject. Designs can be simulated under general settings including
non-proportional hazard alternatives.
4. Design and Simulation of Adaptive Trials:
EastAdapt is an optional new module, fully integrated into the East system, that
permits data-dependent changes to sample size, spending functions, number and
spacing of interim looks, study objectives, and endpoints using a variety of
published flexible approaches.
In addition to these substantial statistical capabilities, East 4 has added numerous
improvements to the user interface including clearer labeling of tables and graphs,
context sensitive help, charts of power versus sample size and power versus number of
events, convenient tools for calculating the test statistics to be entered into the interim
monitoring worksheet for binomial endpoints, and the ability to type arithmetic
expressions into dialog boxes and into design, interim monitoring and simulation
worksheets.

Preface to East 3

East 3 is a major upgrade of the East-2000 software package for design and interim
monitoring of group sequential clinical trials. It has evolved over a three-year period
with regular input from our East-2000 customers. The main improvements that East 3
offers relative to East-2000 are greater flexibility in study design, better tracking of
interim results, and more powerful simulation capabilities. Many of our East-2000
customers expressed the desire to create group sequential designs that are
ultra-conservative in terms of stopping early for efficacy, but which can be quickly
terminated for futility. The extremely wide selection of spending functions and
stopping boundaries in East 3, combined with its interactive Excel-based spreadsheet
user interface for comparing multiple designs quickly and effortlessly, make such
designs possible. The interim monitoring module of East 3 has been completely
revised, with a “dashboard” user interface that can track the test statistic, error spent,
conditional power, post-hoc power and repeated confidence intervals on a single
worksheet, over successive interim monitoring time points, for superior trial
management and decision making by a data monitoring committee. Finally, we have
enhanced the simulation capabilities of East 3 so that it is now possible to evaluate the
operating characteristics not only of traditional group sequential designs, but also of
adaptive designs that permit mid-course alterations in the sample size based on interim
estimates of variance or treatment effect. A list of the substantial new features in East 3
relative to East-2000 is given below. (The items on this list beginning with ‘(*)’ are
optional extras.)
New Design Features
1. Design of non-inferiority trials.
2. Design of trials with unequally spaced looks.
3. Use of Lan and DeMets (1983) error spending functions to derive stopping boundaries.
4. (*) Flexible stopping boundaries derived from the gamma spending function family (Hwang, Shih and DeCani, 1990) and the rho spending function family (Kim and DeMets, 1987). (A sketch of these spending functions appears after this list.)
5. Haybittle-Peto stopping boundaries (Haybittle, 1971).
6. (*) Boundaries derived from user-specified spending functions with
interpolation.
7. Boundaries for early stopping for futility only.
8. Graphical and numerical representation of stopping boundaries on other scales
besides the standard normal scale; e.g., boundaries expressed on the p-value
scale, effect size scale, and conditional power scale.
9. Computing power for a fixed sample size.
10. Chart displaying the number of events as a function of time (for survival studies).
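As a quick illustration of items 3 and 4, the R sketch below evaluates three published error spending functions at a grid of information fractions; the alpha and family parameters are arbitrary choices for illustration.

    # Cumulative type-1 error spent, alpha(t), at information fraction t.
    alpha <- 0.025
    t <- seq(0.25, 1, by = 0.25)

    ld_obf <- 2 * (1 - pnorm(qnorm(1 - alpha / 2) / sqrt(t)))  # Lan-DeMets (O'Brien-Fleming type)
    kd_rho <- function(t, rho) alpha * t^rho                   # Kim-DeMets rho family
    hsd    <- function(t, g) alpha * (1 - exp(-g * t)) / (1 - exp(-g))  # Hwang-Shih-DeCani gamma family

    round(cbind(t, ld_obf, rho2 = kd_rho(t, 2), gamma_neg4 = hsd(t, -4)), 5)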
New Interim Monitoring Features
1. Detailed worksheet for keeping track of interim monitoring data and providing
input to the data monitoring committee.
2. Simultaneous view of up to four thumbnail charts on the interim monitoring
worksheet. Currently one may select any four charts from the stopping
boundary chart, the error spending chart, the conditional power chart, the
post-hoc power chart, and the repeated confidence intervals chart. You can also
expand each thumbnail into a full-sized chart by a mouse click.
3. Computation of repeated confidence interval (Jennison and Turnbull, 2000) at
each interim look.
New Simulation Features
1. (*) Simulation of actual data generated from the underlying normal or binomial
model instead of simulating the large sample distribution of the test statistic.
2. (*) Simulation on either the maximum sample size scale, or the maximum
information scale.
3. (*) Simulation of the adaptive design due to Cui, Hung and Wang (1999).
New User Interface Features
1. Full integration into the Microsoft Excel spreadsheet for easy generation and
display of multiple designs, interim monitoring or simulation worksheets, and
production of reports.
2. Save design details and interim monitoring results in Excel worksheets for easy
electronic transmission to regulatory reviewers or to end-users.
3. Create custom calculators in Excel and save them with the East study workbook.

Preface to East-2000

For completeness we repeat below the preface that we wrote for the East-2000
software when it was released in April 2000.
Background to the East-2000 Development: The precursor to East-2000 was East-DOS, an MS-DOS program with design and interim monitoring capabilities for normal, binomial and survival end points. When East-DOS was released in 1991, its user interface and statistical features were adequate to the needs of its customer base. MS-DOS was still the industry-standard operating system for desktop computers. Group sequential designs were not as popular then as they are now. The role of data and safety monitoring boards (DSMBs) in interim monitoring was just beginning to emerge. FDA and industry guidelines on the conduct of group sequential studies were in the early draft stage. Today the situation is very different. Since the publication of

the ICH-E9 guidance on clinical trials by the FDA and regulatory bodies in Europe and
Japan, industry sponsors of phase-III clinical trials are more favorably inclined to the
group sequential approach. For long-term mortality studies especially, interim
monitoring by an independent DSMB is almost mandatory. As the popularity of group
sequential studies has increased so has the demand for good software to design and
monitor such studies. For several years now we have been flooded with requests from
our old East-DOS customers to move away from the obsolete MS-DOS platform to
Microsoft Windows and to expand the statistical capabilities of the software. We have
responded by developing East-2000, a completely re-designed Windows package with
unparalleled design, simulation and interim monitoring capabilities.
What's New in East-2000: The East-2000 software adds considerable functionality to
its MS-DOS predecessor through a superior user interface and through the addition of
new statistical methods.
New User Interface: East-2000 is developed on the Microsoft Windows platform. It
supports a highly interactive user interface with ready access to stopping
boundary charts, error spending function charts, power charts and the ability to
present the results as reports in Microsoft Office.
1. Interactivity Designing a group sequential study is much more complex
than designing a fixed sample study. The patient resources needed in a
group sequential setting depend not only on the desired power and
significance level, but also on how you will monitor the data.
How many interim looks are you planning to take? What stopping
boundary will you use at each interim look? Does the stopping
boundary conform to how you’d like to spend the type-1 error at
each look? Do you intend to stop early only for benefit, only for
futility, or for both futility and benefit? In a survival study, how
long are you prepared to follow the patients?
These design and monitoring decisions have profound implications for the
maximum sample size you must commit up-front to the study, the expected
sample size under the null and alternative hypotheses, and the penalty you
will have to pay in terms of the nominal p-value needed for declaring
significance at the final look. To take full advantage of the group sequential
methodology and consider the implications of potential decisions you must
have highly interactive software available, both at the study design stage
and at the interim monitoring stage. East-2000 is expressly developed with
this interactivity in mind. Its intuitive form-fill-in graphical user interface
can be an invaluable tool for visualizing how these design and monitoring
decisions will affect the operating characteristics of the study.
2. Charts By clicking the appropriate icon on the East toolbar you can view
stopping boundary charts, study duration charts, error spending function
charts, conditional and post-hoc power charts, and exit probability tables.
The ease with which these charts can be turned on and off ensures that they
will be well utilized both at the design and interim monitoring phases of
the study.
3. Reports All worksheets, tables and charts produced by East-2000 can be
copied and pasted into Microsoft Word, Excel and PowerPoint pages thus
facilitating the creation of annotated reports describing the study design
and interim monitoring schedule.
New Statistical Methods: East-2000 has greatly expanded the design and interim
monitoring capabilities previously available in East-DOS. In addition East-2000
provides a simulation module for investigating how the power of a sequential
design is affected by different assumptions about the magnitude of the treatment
difference. Some highlights from these new capabilities are listed below.
1. Design Whereas East-DOS only provided design capabilities for normal, binomial and survival end points, East-2000 makes it possible to design more general studies as well. This is achieved through the use of an
inflation factor. The inflation factor determines the amount by which the
sample size of a fixed sample study should be inflated so as to preserve its
type-1 error in the presence of repeated hypothesis tests. It is thus possible
to use any external software package to determine the fixed sample size of
the study, input this fixed sample size into the design module of East-2000
and have the sample size inflated appropriately. These general capabilities are discussed in Chapter 8. (A sketch of this workflow appears after this list.)
2. Interim Monitoring A major new feature in the interim monitoring module
of East-2000 is the computation of adjusted p-values, confidence intervals
and unbiased parameter estimates at the end of the sequential study.
Another important feature is the ability to monitor the study on the Fisher
information scale and thereby perform sample-size re-estimation if initial
assumptions about the data generating process were incorrect. Chapter 9
provides an example of sample-size re-estimation for a binomial study in
which the initial estimate of the response rate of the control drug was
incorrect.
3. Simulation East-2000 can simulate an on-going clinical trial and keep track
of the frequency with which a stopping boundary is crossed at each interim
monitoring time-point. These simulations can be performed under the null
hypothesis, the alternative hypothesis or any intermediate hypothesis thus
permitting us to evaluate how the various early stopping probabilities are
affected by misspecifications in the magnitude of the treatment effect.
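A small R sketch of the inflation-factor workflow in item 1: compute a fixed sample size in any external package, then multiply by the inflation factor reported by East-2000 for the chosen boundary. All numeric values here, including the inflation factor itself, are placeholders for illustration.

    # Fixed sample size per arm for a one-sided two-sample z-test.
    alpha <- 0.025; power <- 0.90; delta <- 0.5; sigma <- 1
    n_fixed <- 2 * ((qnorm(1 - alpha) + qnorm(power)) * sigma / delta)^2

    IF <- 1.03   # hypothetical inflation factor read off from East
    c(n_fixed = ceiling(n_fixed), n_group_sequential = ceiling(IF * n_fixed))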
Continuous Development of East: East-2000 will undergo continuous development
with major new releases expected on an annual basis and smaller improvements
regularly posted on the Cytel web site. We will augment the software and implement
new techniques based on the recommendations of the East Advisory Committee, and
as the demand for them is expressed by our customers. The following items are already
on the list:
Easy links to fixed-sample design packages so as to extend the general methods
in Chapter 8;
Analytical and simulation tools to convert Fisher information into sample size
and thereby facilitate the information based design and interim monitoring
methods of Chapter 9, especially for sample-size re-estimation.
We will build a forum for discussing East related issues on the Cytel web site,
www.cytel.com. Interesting case studies, frequently asked questions, product news and
other related matters will be posted regularly on this site.
Roster of East Consultants: Cytel offers consulting services to customers requiring
assistance with study design, interim monitoring or representation on independent data
and safety monitoring boards. Call us at 617-661-2011, or email sales@cytel.com, for
further information on this service.

Table of Contents

Preface  ii

Volume 1: The East System  1
1  Introduction to Volume 1  2
2  Installing East 6  3
3  Getting Started  7
4  Data Editor  55

Volume 2: Continuous Endpoints  71
5  Introduction to Volume 2  73
6  Tutorial: Normal Endpoint  79
7  Normal Superiority One-Sample  91
8  Normal Noninferiority Paired-Sample  113
9  Normal Equivalence Paired-Sample  128
10  Normal Superiority Two-Sample  141
11  Nonparametric Superiority Two Sample  179
12  Normal Non-inferiority Two-Sample  185
13  Normal Equivalence Two-Sample  211
14  Normal: Many Means  232
15  Multiple Comparison Procedures for Continuous Data  240
16  Multiple Endpoints-Gatekeeping Procedures  265
17  Continuous Endpoint: Multi-arm Multi-stage (MaMs) Designs  285
18  Two-Stage Multi-arm Designs using p-value combination  309
19  Normal Superiority Regression  332

Volume 3: Binomial and Categorical Endpoints  342
20  Introduction to Volume 3  344
21  Tutorial: Binomial Endpoint  350
22  Binomial Superiority One-Sample  363
23  Binomial Superiority Two-Sample  394
24  Binomial Non-Inferiority Two-Sample  474
25  Binomial Equivalence Two-Sample  535
26  Binomial Superiority n-Sample  549
27  Multiple Comparison Procedures for Discrete Data  577
28  Multiple Endpoints-Gatekeeping Procedures for Discrete Data  601
29  Two-Stage Multi-arm Designs using p-value combination  621
30  Binomial Superiority Regression  644
31  Agreement  649
32  Dose Escalation  658

Volume 4: Exact Binomial Designs  708
33  Introduction to Volume 8  709
34  Binomial Superiority One-Sample – Exact  714
35  Binomial Superiority Two-Sample – Exact  736
36  Binomial Non-Inferiority Two-Sample – Exact  751
37  Binomial Equivalence Two-Sample – Exact  767
38  Binomial Simon's Two-Stage Design  774

Volume 5: Poisson and Negative Binomial Endpoints  784
39  Introduction to Volume 4  785
40  Count Data One-Sample  790
41  Count Data Two-Samples  799

Volume 6: Time to Event Endpoints  819
42  Introduction to Volume 6  820
43  Tutorial: Survival Endpoint  826
44  Superiority Trials with Variable Follow-Up  865
45  Superiority Trials with Fixed Follow-Up  908
46  Non-Inferiority Trials Given Accrual Duration and Accrual Rates  934
47  Non-Inferiority Trials with Fixed Follow-Up  950
48  Superiority Trials Given Accrual Duration and Study Duration  966
49  Non Inferiority Trials Given Accrual Duration and Study Duration  984
50  A Note on Specifying Dropout parameters in Survival Studies  994
51  Multiple Comparison Procedures for Survival Data  999

Volume 7: Adaptive Designs  1019
52  Introduction To Adaptive Features  1020
53  The Motivation for Adaptive Sample Size Changes  1027
54  The Cui, Hung and Wang Method  1055
55  The Chen, DeMets and Lan Method  1160
56  Muller and Schafer Method  1221
57  Conditional Power for Decision Making  1350

Volume 8: Special Topics  1387
58  Introduction to Volume 8  1388
59  Design and Monitoring of Maximum Information Studies  1393
60  Design and Interim Monitoring with General Endpoints  1423
61  Early Stopping for Futility  1434
62  Flexible Stopping Boundaries in East  1460
63  Confidence Interval Based Design  1493
64  Simulation in East  1552
65  Predictive Interval Plots  1575
66  Enrollment/Events Prediction - At Design Stage (By Simulation)  1609
67  Conditional Simulation  1658
68  Enrollment/Events Prediction - Analysis  1675
69  Interfacing with East PROCs  1787

Volume 9: Analysis  1795
70  Introduction to Volume 9  1798
71  Tutorial: Analysis  1806
72  Analysis-Descriptive Statistics  1827
73  Analysis-Analytics  1837
74  Analysis-Plots  1854
75  Analysis-Normal Superiority One-Sample  1890
76  Analysis-Normal Noninferiority Paired-Sample  1901
77  Analysis-Normal Equivalence Paired-Sample  1907
78  Analysis-Normal Superiority Two-Sample  1913
79  Analysis-Normal Noninferiority Two-Sample  1926
80  Analysis-Normal Equivalence Two-Sample  1941
81  Analysis-Nonparametric Two-Sample  1956
82  Analysis-ANOVA  1976
83  Analysis-Regression Procedures  1987
84  Analysis-Multiple Comparison Procedures for Continuous Data  2024
85  Analysis-Multiple Endpoints for Continuous Data  2055
86  Analysis-Binomial Superiority One-Sample  2060
87  Analysis-Binomial Superiority Two-Sample  2069
88  Analysis-Binomial Noninferiority Two-Sample  2088
89  Analysis-Binomial Equivalence Two-Samples  2106
90  Analysis-Discrete: Many Proportions  2111
91  Analysis-Binary Regression Analysis  2131
92  Analysis-Multiple Comparison Procedures for Binary Data  2180
93  Analysis-Comparison of Multiple Comparison Procedures for Continuous Data  2207
94  Analysis-Multiple Endpoints for Binary Data  2211
95  Analysis-Agreement  2216
96  Analysis-Survival Data  2219
97  Analysis-Multiple Comparison Procedures for Survival Data  2240

Volume 10: Appendices  2267
A  Introduction to Volume 10  2269
B  Group Sequential Design in East 6  2271
C  Interim Monitoring in East 6  2313
D  Computing the Expected Number of Events  2334
E  Generating Survival Simulations in EastSurv  2345
F  Spending Functions Derived from Power Boundaries  2347
G  The Recursive Integration Algorithm  2352
H  Theory - Multiple Comparison Procedures  2353
I  Theory - Multiple Endpoint Procedures  2368
J  Theory - Multi-arm Multi-stage Group Sequential Design  2374
K  Theory - MultiArm Two Stage Designs Combining p-values  2394
L  Technical Details - Predicted Interval Plots  2404
M  Enrollment/Events Prediction - Theory  2409
N  Dose Escalation - Theory  2412
O  R Functions  2427
P  East 5.x to East 6.4 Import Utility  2478
Q  Technical Reference and Formulas: Single Look Designs  2484
R  Technical Reference and Formulas: Analysis  2542
S  Theory - Design - Binomial One-Sample Exact Test  2605
T  Theory - Design - Binomial Paired-Sample Exact Test  2611
U  Theory - Design - Simon's Two-Stage Design  2614
V  Theory - Design - Binomial Two-Sample Exact Tests  2617
W  Classification Table  2638
X  Glossary  2639
Y  On validating the East Software  2657
Z  List of East Beta Testers  2686

References  2695
Index  2719

Volume 1: The East System

1  Introduction to Volume 1  2
2  Installing East 6  3
3  Getting Started  7
4  Data Editor  55

1  Introduction to Volume 1

This volume contains chapters that introduce you to the East software system.
Chapter 2 explains the hardware and operating system requirements and the
installation procedures. It also explains the installation validation procedure.
Chapter 3 is a tutorial that introduces you to the East software quickly. You will learn the basic steps involved in getting in and out of the software, selecting various test options under any of the endpoints, designing a study, creating and comparing multiple designs, simulating and monitoring a study, invoking the graphics, saving your work in files, retrieving previously saved studies, obtaining on-line help and printing reports. It also describes the menu structure of East, which is a menu-driven system; almost all features are accessed by making selections from the menus.
Chapter 4 discusses the Data Editor menu of East 6, which allows you to create and manipulate the contents of your Case Data and Crossover Data. This menu is used while working with the Analysis menu as well as with some other features like PIP or Conditional Simulations.
These features are illustrated with the help of a simple worked example of a binary
endpoint trial.

2  Installing East 6

2.1  System Requirements to run East 6
The minimum hardware/operating system/software requirements for East 6 (the standalone version of the software, or the East client in the case of the concurrent version) are listed below:
For the standalone version, and for East clients in the concurrent version, the following operating systems are supported:
– Windows 7 (32-bit / 64-bit)
– Windows 8 (32-bit / 64-bit)
– Windows 8.1 (32-bit / 64-bit)
– Windows 10 (32-bit / 64-bit)
– All of above for computers with English, European and Japanese versions
of Windows.
For the concurrent-user version, the following server operating systems are supported:
– Windows 7 (32-bit / 64-bit)
– Windows 8 (32-bit / 64-bit)
– Windows 8.1 (32-bit / 64-bit)
– Windows 10 (32-bit / 64-bit)
– All of above for computers with English, European and Japanese versions
of Windows
– Windows Server 2008 (32-bit / 64-bit)
– Windows Server 2012
– Citrix
∗ XenApp 6.0 on Windows 2008
∗ XenApp 6.5 on Windows 2008
∗ XenApp 7.6 on Windows 2008
∗ XenApp 7.6 on Windows 2012

Further, East has the following hardware/software requirements:
– CPU: 1 GHz or faster x86 (32-bit) or x64 (64-bit) processor
– Memory: Minimum 1 GB of RAM
– Hard Drive: Minimum 5 GB of free hard disk space
– Display: 1024 x 768 or higher resolution
– Microsoft .Net Framework 4.0 Full (this will be installed as a part of
prerequisites if your computer does not have it)
– Microsoft Visual C++ 2010 SP1 (this will be installed as a part of prerequisites if your computer does not have it)
– Windows Installer 4.5
– Internet Explorer 9.0 or above
– A stable internet connection is required during installation so that prerequisites like the ones listed above can be downloaded if they are missing.
– East is compatible with and supported for R versions 2.9.0 through 3.2.3. East may or may not work well with later versions of R. If R is not installed, the ability to include custom R functions to modify specific simulation steps will not be available. The R integration feature is an add-on to East and is required only to integrate custom R functions with East; it does not affect any of the core functionalities of East.

2.2  Other Requirements

Users with Windows 7 or above: East uses the font Verdana. Generally Verdana is a
part of the default fonts installed by Windows. However, sometimes this font may not
be available on some computers, especially if a language other than English has been
selected. In such cases, the default fonts need to be restored. To restore fonts, go to
Control Panel → Fonts → Font settings. Click the button “Restore default font
settings”. This will restore all default fonts including Verdana. Note that this must be
done before the first use of East.
Users with Windows 8.1: On some computers with Windows 8.1, problems may be
observed while uninstalling East, especially if the user has upgraded from the previous
version using a patch. This is because of a security update KB2962872 (MS14-037)
released by Microsoft for Internet Explorer versions 6, 7, 8, 9, 10 and 11. Microsoft
has fixed this issue and released another security update KB2976627 (MS14-051) for
Internet Explorer which replaces the old problematic update. So it is recommended
that users who are affected by this issue install security update KB2976627
(MS14-051) on their computers.

2.3  Installation

IMPORTANT: Please follow the steps below if you are installing a standalone/single-user version of East. If you are installing a concurrent version, please refer to the document "Cytel License Manager Setup.pdf" for detailed installation instructions.
1. Uninstalling Previous Versions: If any version (including a beta or demo) of
East 6 is currently installed on your PC, please uninstall it completely or else the

installation of the current version will not proceed correctly. To uninstall the
earlier version of East 6, go to the Start Menu and select:
All Programs → Cytel Architect → East 6.x → Uninstall
or
All Programs → East Architect → Uninstall East Architect
depending upon the version installed on your computer.
2. Installing Current Version You will need to be an administrator of your
computer in order to perform the following steps. If you do not have
administrator privileges on your computer, please contact your system
administrator / IT.
In order to install East, please follow these steps:
(a) If you received an email containing a link for downloading the setup,
please follow the link and download the setup. This will be a zipped folder.
Unzip this folder completely.
(b) In the setup folder, locate the program setup.exe and double-click on it.
Follow the instructions on the subsequent windows.

2.4

Installation
Qualification
and Operational
Qualification

To perform the installation and operational qualification of East 6, go to the Start
Menu and select
All Programs → Cytel Architect → East 6.4 → Installation Qualification (IQ).
You will be presented with the following dialog box.

It will take a few minutes to complete. At the end of the process, the status of the
installation qualification will appear. Press Enter (or any other key) to open the
validation log. Similarly, one can run the Operational Qualification (OQ). If the
validation is successful, the log file will contain a detailed list of all files installed by
East on your computer and other details related to IQ and OQ.
If the validation fails, the validation log file will contain detailed error messages.
Please contact your system administrator with the log file.
IQ (Installation Qualification) script: This script verifies that the software
is completely and correctly installed on the system. It does this by
checking whether all the software components, XML and DLL files are in place.
OQ (Operational Qualification) script: This script runs some representative
test cases covering all the major modules/features of East and compares the
runtime results to the benchmarks (benchmarks are validated results stored
internally in the OQ program). It ensures the quality and consistency of the
results in the new version.
Manual Examples: In addition to IQ/OQ, if more testing is to be done, refer to
the user manual and reproduce the results for some representative
examples/modules. The flow of examples is easy to follow. Some examples in
the manual require additional files (datasets) which are available to you in the
Samples folder.
Validation Chapter: There is a chapter in this manual dedicated to describing
how every feature was validated within Cytel. Refer to the appendix chapter Y
on "Validating East Software". This covers validation strategies for all the
features available in East 6.

3 Getting Started

East has evolved over the past several years with MS Excel as the user interface.
East on MS Excel did not integrate directly with any other Cytel products.
Under the Architect platform, East is expected to coexist and integrate seamlessly
with other Cytel products such as SiZ and Compass. Architect is a common platform
designed to support various Cytel products. It provides a user-friendly,
Windows-standard graphical environment, consisting of tabs, icons, and dialog boxes,
with which you can design, simulate and analyze. Throughout the user manual, this
product is referred to as East 6.
One major advantage of East 6 is the facility for creating multiple designs. This is
achieved by giving multiple inputs for the parameters, either comma separated or as a
range (a:b:c), with a as the initial value, b as the last value and c as the step
size. If you give multiple values for more than one parameter, East creates all possible
combinations of the input parameters. This is an immense advancement over earlier
versions of East, where you had to create one design at a time. Furthermore, one could
not compare different types of designs (e.g., superiority vs. noninferiority designs).
Similarly, graphical comparison of designs with different numbers of looks was
difficult with earlier versions of East. All such comparisons are readily available in
East 6.
Another new feature is the option to add assumptions for accruals and dropouts at the
design stage. Previously, this was available only for survival endpoint trials, but has
been extended to continuous and discrete endpoints in East 6. Information about
accrual rates, response lag, and dropouts can be given when designing or simulating
a trial. This makes more realistic, end-to-end design and simulation of a trial possible.
Section 3.6 discusses all the above features under the Design menu with the help of a
case study, CAPTURE.
Simulations help to develop better insight into the operating characteristics of a design.
In East 6, the simulation module has now been enhanced to allow fixed or random
allocation to treatment and control, and different sample sizes. Such options were not
possible with earlier versions of East. Section 3.7 briefly describes the Simulations in
East 6.
Section 3.8 discusses the capability to flexibly monitor a group sequential trial using
the Interim Monitoring feature of East 6.
We have also provided powerful data editors to create, view, and modify data. A wide
variety of statistical tests are now a part of East 6, which enables you to conduct
statistical analysis of interim data for continuous, discrete and time to event endpoints.
Sections 3.4 and 3.5 briefly describe the Data Editor and Analysis menus in East 6.
The purpose of this chapter is to familiarize you with the East 6 user interface.

3.1 Workflow in East

In this section, the architecture of East 6 is explained, along with the logical
workflow in which the different parts of the user interface coordinate with each other.
The basic structure of the user interface is depicted in the following diagram.

Besides the top Ribbon, there are four main windows in East 6, namely (starting from
the left): the Library pane, the Input / Output window, the Output Preview window and
the Help pane. Note that both the Library and the Help pane can be auto-hidden
temporarily or throughout the session, allowing the other windows to occupy a larger
area on the screen for display.
Initially, the Library shows only the Root node. As you work with East, several nodes
corresponding to designs, simulation scenarios, data sets and related analyses can be
managed using this panel. Various nodes for outputs and plots are created in the
Library, facilitating work on multiple scenarios at a time. The width of the Library
window can be adjusted for better readability.
The central part of the user interface, the Input / Output window, is the main work
area where you can:
Enter input parameters for design computation, create and compare multiple
designs, and view plots
Simulate a design under different scenarios
Perform interim analysis on a group sequential design look by look and view the
results, receive decisions such as stopping or continuing during the execution of
a trial
Open a dataset on which you want to perform analysis, enter new data, view
outputs, prepare a report, etc.
This is the area where the user interacts with the product most frequently.
The Output Preview window compiles several outputs together in a grid-like structure
where each row is either a design or a simulation run. This area is in use only when
working with Designs or Simulations.
When the Compute or Simulate button is clicked, all requested design or simulation
results are computed and are listed row wise in the Output Preview window:

By clicking different rows of interest while simultaneously holding the Ctrl key, either
a single or multiple designs can be displayed in the Output Summary in a vertical
manner, or a side-by-side comparison can be done.

Note that the active window and the Output Preview can be minimized, maximized,
or resized. If you want to focus on the Output Summary, click the icon in the
top-right corner of the main window. The Output will be maximized as shown below:

Any of the designs/simulations in the Output Preview window can be saved in the
Library, as depicted in the following workflow diagram.


Double click any of these nodes and the detailed output of the design will be displayed.
This will include all relevant input and output information. Right clicking any design
node in the Library will allow you to perform various operations on the design such as
interim monitoring and simulation.
The Help pane displays the context sensitive help for the control currently under
focus. This help is available for all the controls in the Input / Output window. This
pane also displays the design specific help, which discusses the purpose of the selected
test, the published literature referred to while developing it, and the chapter/section
numbers of this user manual to quickly look up for more details. This pane can be
hidden or locked by clicking the pin in its corner.
All the windows and features mentioned above are described in detail with the help of
an illustration in the subsequent sections of this chapter.

3.2 A Quick Overview of User Interface

Almost all the functionalities of East 6 are invoked by selecting appropriate menu
items and icons from the Ribbon. The interface consists of four windows as described
in the previous section, and four major menu items. These menu items are:

Home. This menu contains typical file-related Windows sub-menus. The Help
sub-menu provides access to this manual.
Data Editor. This menu becomes available once a data set is open, providing
several sub-menus used to create, manage and transform data.
Design. This menu provides a sub-menu for each of the study designs which can
be created using East 6. The study designs are grouped according to the nature
of the response. Tasks like Simulations and Interim Monitoring are available for
almost all the study designs under this menu.
Analysis. This menu provides a sub-menu for each of the analysis procedures
that can be carried out in East 6. The tests are grouped according to the nature
of the response. There are also options for basic statistics and plots.

3.3 Home Menu
3.3.1 File
3.3.2 Importing workbooks from East 5.4
3.3.3 Settings
3.3.4 View
3.3.5 Window
3.3.6 Help

The Home menu contains icons that are logically grouped under File, Settings, View,
Window and Help. These icons can be used for specific tasks.

3.3.1 File

Click this icon to create new case data or crossover data. A new workbook or
log can also be created.
Click this icon to open a saved data set, workbook, or log file.
Click this icon to import external files created by other programs.
Click this icon to export files in various formats.
Click this icon to save the current files or workbooks.
Click this icon to save a file or workbook with a different name.

3.3.2 Importing workbooks from East 5.4

East allows workbooks previously created in East 5.4 (and above) to be converted and
imported into East 6 for further development. In order to open a workbook with the
.es5 extension given by previous versions of East, it must first be converted to a file
with the .cywx extension that will be recognized by East 6. This is easily accomplished
through the Convert Old Workbook utility. Click the icon under the Home menu
to see the location of this utility.

From the Start Menu, select:
All Programs → Cytel Architect → East 6.x → Convert Old Workbook

We can see the following window, which accepts an East 5.4 workbook as input and
outputs an East 6 workbook. Click the Browse buttons to choose the East 5.4 file to
be converted and the file to be saved with the .cywx extension of East 6.

To start the conversion, click Convert Workbook. Once complete, the file can be
opened as a workbook in East 6 as shown below:

In order to convert files from East 5.3 or older versions, open the file in East 5.4, save
it with a new name (say, with a suffix East5.4) and then convert this 5.4 file to 6.x as
explained above. To get East 5.4 or any help regarding file conversion, contact Cytel at
support@cytel.com.


3.3.3 Settings

Click the icon in the Home menu to adjust default values in East 6.

The options provided in the Display Precision tab are used to set the decimal places of
numerical quantities. The settings indicated here will be applicable to all tests in East 6
under the Design and Analysis menus.
All these numerical quantities are grouped in different categories depending upon their
usage. For example, all the average and expected sample sizes computed at the
simulation or design stage are grouped together under the category "Expected Sample
Size". To view any of these quantities with greater or lesser precision, select the
corresponding category and change the decimal places to any value between 0 and 9.
The General tab lets you adjust the paths for storing workbooks, files, and temporary
files. These paths will persist through the current and future sessions, even after East
is closed. This is also the place to specify the installation directory of the R software
in order to use the R Integration feature in East 6.

The Design Defaults tab is where the user can change the settings for trial design:

Under the Common tab, default values can be set for input design parameters.
You can set up the default choices for the design type, computation type, test type and
the default values for type-I error, power, sample size and allocation ratio. When a new
design is invoked, the input window will show these default choices.
Time Limit for Exact Computation
This time limit is applicable only to exact designs and charts. Exact methods are
computationally intensive and can easily consume several hours of computation
time if the likely sample sizes are very large. You can set the maximum time
available for any exact test in terms of minutes. If the time limit is reached, the
test is terminated and no exact results are provided. The minimum and default
value is 5 minutes.
Type I Error for MCP
If the user has selected a 2-sided test as the default in the global settings, then
any MCP will use half of the alpha from the settings as its default, since an MCP
is always a 1-sided test.
Sample Size Rounding
Notice that by default, East displays the integer sample size (events) by rounding
up the actual number computed by the East algorithm. In this case, the
look-by-look sample size is rounded off to the nearest integer. One can also see
the original floating point sample size by selecting the option "Do not round
sample size/events".
Under the Group Sequential tab, defaults are set for boundary information. When a
new design is invoked, input fields will contain these specified defaults. We can also
set the option to view the Boundary Crossing Probabilities in the detailed output. It can
be either Incremental or Cumulative.

The Simulation Defaults tab is where we can change the settings for simulation:

If the checkbox "Save summary statistics for every simulation" is checked, then
East simulations will by default save the per-simulation summary data for all the
simulations in the form of case data.
If the checkbox "Suppress All Intermediate Output" is checked, the intermediate
simulation output window will always be suppressed and you will be directed to the
Output Preview area.
The Chart Settings tab allows defaults to be set for the following quantities on East 6
charts:

We suggest that you do not alter the defaults until you are quite familiar with the
software.

3.3.4 View

The View submenu allows enabling or disabling the Help and Library panes by
(un)checking the respective check boxes.

3.3.5 Window

The Window submenu contains Arrange and Switch options. These provide the
ability to view different standard arrangements of the available windows (Design Input
Output, Log, Details, charts and plots) and to switch the focus from one window to
another.

3.3.6 Help

The Help group provides the following ways to access the East 6 documentation:

User Manual: Invoke the current East 6 user manual.
Tutorial: Invoke the available East 6 tutorials.
About East 6: Displays the current version and license information for the
installed software.
Update License: Use this utility to update the license file which you receive
from Cytel.

3.4 Data Editor Menu

All submenus under the Data Editor menu are enabled once a new or existing data set
is open. The Open command under the Home menu shows the list of items that can be
opened:

Suppose East 6 is installed in the directory C:/Program Files (x86)/Cytel/Cytel
Architect/East 6.4 on your machine. You can find sample datasets in the Samples
folder under this directory.

Suppose we open the file named Toxic from the Samples folder. The data is displayed
in the main window under the Data Editor menu as shown:

Here the columns represent the variables and the rows are the different records. Placing
the cursor on a cell containing data will enable all submenus under the Data Editor
menu. The submenus are grouped into three sections: Variable, Data and Edit. Here
we can modify and transform variables, perform operations on case data, and edit a
case or variable in the data.
The icons in the Variable group are:
Creates a new variable at the current column position.

Renames the current variable.
Modifies the currently selected variable.
Transforms the currently selected variable.
Numerous algebraic and statistical functions are available which can be used to
transform the variable. This feature can also be used to generate data randomly from
distributions such as Normal, Uniform, Chi-Square, etc.
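For readers used to R, these operations are analogous to the following sketch
(illustrative only; in East they are applied through the Transform Variable
dialog rather than typed as code):

    # Generating random data and transforming a variable, as the dialog allows
    x  <- rnorm(100)            # draws from a Normal distribution
    u  <- runif(100)            # draws from a Uniform distribution
    c2 <- rchisq(100, df = 4)   # Chi-Square draws with 4 degrees of freedom
    lx <- log(abs(x) + 1)       # an algebraic transformation of a variable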
The following functions are available in the Data group:
Sorts case data in ascending or descending order.
Filter cases from the case data as per specified criteria.
Converts case data to crossover data.
Converts crossover data to case data.
Displays case data contents to the log window.

For the Edit group the following options are available:
Selects a case or variable.
Inserts a case or variable.
Deletes a case or variable.
Navigates to a specified case.

3.5 Analysis Menu

The Analysis menu allows access to analytical tests which can be performed in East 6.

3.5.1 Basic Plots
3.5.2 Crossover Plots

The tests available in the Analysis menu are grouped according to the nature of the
response variable. Click an icon to select a test from the drop-down menu.
Basic Statistics - This part contains tests to compute basic statistics and
frequency distribution from a dataset.
Continuous - This part groups analysis tests for continuous response.
Discrete - This part groups all analysis tests for discrete response.
Events - This group contains tests for time to event outcomes.
Predict - This group contains different procedures to predict the future course of
the trial given the current subject level data or summary data. Refer to
chapter 68 for more details.

3.5.1 Basic Plots
Bar and pie charts for categorical data.
Plots such as area, bubble, scatter plot and normality plots for continuous data.

Plots related to frequency distributions such as histogram, stem and leaf plots,
cumulative plots.

3.5.2 Crossover Plots

This menu provides plots applicable to 2x2 crossover data.
Subject plots.
Summary plots.
Diagnostic plots.

All the tests under the Analysis menu are discussed in detail in Volume 8 of this
manual.


3.6 Design Menu
3.6.1 Design Input-Output Window
3.6.2 Creating Multiple Designs
3.6.3 Filter Designs
3.6.4 What is a Workbook?
3.6.5 Group Sequential Design for the CAPTURE Trial
3.6.6 Adding a Futility Boundary
3.6.7 Accrual Dropout Information
3.6.8 Output Details

This section discusses, with the help of the CAPTURE trial, the various East features
mentioned so far in this chapter. CAPTURE was a randomized clinical trial of placebo
versus Abciximab for patients with refractory unstable angina. Results from this trial
were presented at a workshop on clinical trial data monitoring committees and
published as "Randomised placebo-controlled trial of abciximab before and during
coronary intervention in refractory unstable angina: the CAPTURE study", The Lancet,
Vol 349, May 17, 1997.
Let us design, simulate and monitor the CAPTURE trial using East 6. The goal of this
study is to test the null hypothesis, H0, that the Abciximab and placebo arms both have
an event rate of 15%, versus the alternative hypothesis, H1, that Abciximab reduces
the event rate by 5%, from 15% to 10%. It is desired to have a 2-sided test with three
looks at the data, a type-1 error α of 0.05 and a power (1 − β) of 0.8.
We shall start with a fixed sample design and then extend it to a group sequential
design. In this process, we demonstrate the useful features of Architect one by one.
To begin, click the Design menu, then Two Samples in the Discrete group, and then
click Difference of Proportions.

Below the top ribbon, there are three windows: the Input/Output, the Library, and
the Help. All these windows are explained in section 3.1 on the Workflow of East. Both
the Library and the Help can be hidden temporarily or throughout the session. The
input window for the Difference of Proportions test appears as shown below:

The design specific help can be accessed by clicking the icon after invoking a
design. This help is available for all the designs in East 6.

3.6.1 Design Input-Output Window

This window is used to enter the various design specific input parameters in the input
fields and drop-down options available. Let us enter the following inputs for the
CAPTURE trial and create a fixed sample design: Test Type as 2-Sided, Type I Error
as 0.05, Power as 0.8, πc as 0.15 and πt as 0.1. On clicking the Compute button, a new
row for this design gets added in the Output Preview window. Select this row and
click the icon. Rename this design as CAPT-FSD to indicate that it is a fixed
sample design for the CAPTURE trial.
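As a rough cross-check of such a fixed sample computation outside East, one can use
the R function power.prop.test from the stats package. This is only a sketch under
the normal approximation; East's own algorithm and rounding rules may give slightly
different numbers:

    # Per-group sample size for detecting 15% vs 10% event rates,
    # 2-sided alpha = 0.05, power = 0.8 (normal approximation)
    power.prop.test(p1 = 0.15, p2 = 0.10, sig.level = 0.05, power = 0.8)
    # The reported n is per group; the total is roughly 2 * n.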


3.6.2 Creating Multiple Designs

Before finalizing any particular study design, statisticians might want to assess
the operating characteristics of the trial under different conditions and over a range of
parameter values. For example, when working on time-to-event trials, we may want
to see the effect of different values of the hazard ratio on the overall power and
duration of the study.
East makes it easy to rapidly generate and assess multiple options, to perform
sensitivity analysis, and select the optimal plan. We can enter multiple values for one
or more input parameters and East creates designs for all possible combinations. These
designs can then be compared in a tabular as well as graphical manner.
The following are the three ways in which we can enter multiple values:
Comma-separated values: (0.8, 0.9, 0.95)
Colon-separated range of values: (0.8 to 0.9 in steps of 0.05 can be entered as
0.8:0.9:0.05)
Combined values: (0.7, 0.8, 0.85:0.95:0.01)
Multiple values can be entered only in the cells with a pink background color.
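The combinatorial expansion East performs can be pictured with a small R sketch
(illustrative only; the alpha values are hypothetical, and the other values mirror the
example that follows):

    # Every combination of the entered values becomes one design: 2 x 3 x 4 = 24
    designs <- expand.grid(alpha = c(0.025, 0.05),
                           power = c(0.8, 0.9, 0.95),
                           pi_t  = c(0.1, seq(0.2, 0.3, by = 0.05)))
    nrow(designs)  # 24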
Now suppose we want to create designs for two values of Type I Error, three values
of Power and four values of πt: 0.1, 0.2:0.3:0.05. Without changing other
parameters, let us enter these ranges for the three parameters as shown below:

On clicking the Compute button, East will create 2 × 3 × 4 = 24 designs for the
CAPTURE trial. To view all the designs in the Output Preview window, maximize it
from the right-hand top.

3.6.3 Filter Designs

If we are interested in designs with some specific input/output values, we can
set up a criterion using the Filter functionality, by clicking the icon available at
the top right corner of the Output Preview window.
For example, suppose we want to see designs with Sample Size less than 1000 and
Type I Error equal to 0.05.

The qualified designs appear in the Output Preview window as shown below:

The filter criteria can be edited or cleared by clicking the Filter icon again. On clearing
the above criterion, all 24 designs are displayed again. Before we proceed, let us
first delete these recently created 24 designs, leaving behind CAPT-FSD, and then
minimize the Output Preview window from the right-hand top.
One or more rows in the Output Preview can be deleted by selecting them and clicking
the icon.
Use the Ctrl key and mouse click to select specific rows.
Use the Shift key and mouse click to select all the rows in a range.
Use the combination Ctrl + A to select all the rows.
The resulting Output Preview is shown below:

It is advisable to save this design, or any work which you would like to refer to in the
future, in an East Workbook. The next subsection briefly discusses the use of
workbooks.
3.6.4 What is a Workbook?

A Workbook is a storage construct managed by East for holding different types of
generated outputs. The user designs a trial, simulates it, monitors it at several interim
looks, conducts certain analyses, draws plots, etc. All of these outputs can be kept
together in a workbook, which can be saved and retrieved for further development
when required. Note that a single workbook can also contain outputs from more than
one design. Select the design CAPT-FSD in the Output Preview window and click the
icon. When a design is saved to the library for the first time, East automatically
creates a workbook named Wbk1, which can be renamed by right-clicking the node.

Let us name it CAPTURE. Note that this is still temporary storage, which means that
if we exit East without saving it permanently, the workbook will not be available in
the future.
Note that Workbooks are not saved automatically on your computer; they are to
be saved by either right-clicking the node in the Library and selecting Save or
clicking the icon.

In addition, the user will be prompted to save contents of the Library while closing
East 6.
Many times, we wish to add some specific comments to a design or any other output
window. These comments are useful for future reference. One can do that by
attaching a Note to any node, by selecting it and clicking on the icon. A small
window will pop up where comments can be stored.

Once saved, a yellow icon against the design node will indicate the presence of a note.
If you want to view or remove the note, right click the design node, select Note, and
clear the contents.
The tabs available on the status bar at the bottom left of the screen can be used to
navigate between the active windows of East.


For example, if you wish to return to the design inputs, click the Input button, which
will take you to the latest Input window you worked with. As we proceed further,
more such tabs will appear, enabling us to navigate from one screen of East to another.

3.6.5 Group Sequential Design for the CAPTURE Trial

Select the design CAPT-FSD and click the icon in the Library to modify the
design. On clicking this icon, the following message will pop up. Click "Yes" to
continue.

Let us extend this fixed sample design to a group sequential design by changing the
Number of Looks from 1 to 3. It means that we are planning to take 2 interim looks and
one final look at the data while monitoring the study.

An additional tab named Boundary is added, which allows us to enter inputs related to
the boundary family, look spacing and error spending functions.

Let the boundary family be Spending Functions and the alpha spending function
Lan-DeMets, with the parameter OF. Click Compute to create the three-look
design and rename it CAPT-GSD.
As you go on creating multiple designs in East, the Output Preview area can become
too busy to manage. Thus, you can select the designs you are interested in, save them
in the workbook and then rename them appropriately. The Output Preview window
now looks as shown below:

Notice that CAPT-GSD requires 18 more subjects than CAPT-FSD to achieve 80%
power. This view gives us a horizontal comparison of the two designs. Save the design
CAPT-GSD in the workbook.
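For readers who want to reproduce a comparable boundary computation outside East,
the R package gsDesign (assumed installed; not part of East) offers a similar
Lan-DeMets O'Brien-Fleming spending design. A sketch, whose numbers need not
match East exactly:

    library(gsDesign)  # assumed installed; not part of East

    # 3-look design with Lan-DeMets O'Brien-Fleming type alpha spending.
    # gsDesign's alpha is one-sided, so 0.025 per side corresponds to an
    # overall 2-sided 0.05 as used for CAPT-GSD.
    gs <- gsDesign(k = 3, test.type = 2, alpha = 0.025, beta = 0.2,
                   sfu = sfLDOF)
    gs$upper$bound  # efficacy boundaries on the Z scale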
One can also compare these designs in a vertical manner. Select the two designs by
clicking on one of them, pressing Ctrl and then clicking on the other one. Next, click
the icon.

This is the Output Summary window of East which compares the two designs
vertically. We can easily copy this display from East to MS Excel and modify/save it
further in any other format. To do that, right click anywhere in the Output Summary
window, select Copy All option and paste the copied data in an Excel workbook. The
table gets pasted as two formatted columns.
Let us go back to the input window of CAPT-GSD (select the design and click the
icon) and activate the Boundary tab. By default, the boundary values in the table at the
bottom of this tab are displayed on the Z Scale. We can also view these boundaries on
other scales, such as the Score Scale, δ Scale and p-value Scale.

Let us view the efficacy boundaries for CAPT-GSD on a p-value scale.


The final p-value required to attain statistical significance at level 0.05 is 0.0463. This
is sometimes regarded as the penalty for taking two interim looks at the data.
Also observe that, although the maximum sample size for this design is 1384, the
expected sample size under the alternative that δ = -0.05 is much less: 1183. However,
there is very little saving under the null hypothesis that δ = 0. The expected sample
size in this case is 1378.
Therefore, it might be beneficial to consider replacing the lower efficacy boundary by a
futility boundary. Also, sometimes we might wish to stop a trial early because the
effect size observed at an interim analysis is too small to warrant continuation. This
can be achieved by using β-spending function and introducing a futility boundary at
the design stage.

3.6.6 Adding a Futility Boundary

Select the design CAPT-GSD and click the icon to edit it. Change the Test Type
from 2-Sided to 1-Sided and the Type I Error from 0.05 to 0.025. Go to the
Boundary tab and add the futility boundaries using the γ (-2) spending function.
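An analogous sketch with the gsDesign package (assumed installed; illustrative only)
uses its Hwang-Shih-DeCani spending family, which is a γ-type spending function, here
with parameter -2. East's exact boundaries may differ:

    library(gsDesign)  # assumed installed; illustrative only

    # 1-sided, 3-look design: LD-OF efficacy spending plus a
    # gamma(-2)-style (Hwang-Shih-DeCani) beta-spending futility boundary
    gsf <- gsDesign(k = 3, test.type = 4, alpha = 0.025, beta = 0.2,
                    sfu = sfLDOF, sfl = sfHSD, sflpar = -2)
    gsf$upper$bound  # efficacy boundary
    gsf$lower$bound  # futility boundary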

Before we create this design, we can see the error spending chart and the boundaries
chart for the CAPTURE trial with efficacy as well as futility boundaries. This gives us
a way to explore different boundary families and error spending functions and to
decide upon the desired combination before even creating a design. Click the icon to
view the Error Spending Chart.

Click the icon to view the Boundaries Chart.

The shaded region in light pink corresponds to the critical region for futility and the
one in light blue corresponds to the critical region for efficacy.
We can also view the boundaries on the conditional power scale in the presence of a
futility boundary. Select the entry named cp deltahat Scale from the dropdown
Boundary Scale. The chart is updated and the boundaries are displayed on the CP
scale.

Zooming the Charts

To zoom into any area of the chart, click and drag the mouse over that area. After
clicking the Zoom button, click on the plot at the top left corner of the area you want to
magnify, keep the mouse button pressed and drag the mouse over the desired area. This
draws a rectangle around that area. Now release the mouse button and East magnifies
the selected area. You can keep doing this to zoom in further.

The magnified chart appears as below:

Note that after zooming, the Zoom button changes to Reset. When you click it, the plot
is reset back to its original shape.
Let us compute the third design for the CAPTURE trial and rename it
CAPT-GSD-EffFut. Save it in the workbook.

Click the icon to compare all three designs side-by-side, as explained above.

Along with the side-by-side comparison, let us compare the two group sequential
designs graphically. Press Ctrl and click on CAPT-FSD. Notice that the remaining
two designs are still highlighted, which means they are selected and CAPT-FSD is
unselected. Now click the icon and select Stopping Boundaries to view the
graphical comparison of the boundaries of the two designs.

As we can see, the design CAPT-GSD uses an upper efficacy boundary whereas
CAPT-GSD-EffFut uses an upper futility boundary. We can turn ON and OFF the
boundaries by checking the boxes available in the legends.
Before we proceed, let us save this third design in the workbook. We can also create
several workbooks in the Library and then compare multiple designs across the
workbooks. This is an advantage of working with workbooks in East 6.

3.6.7 Accrual / Dropout option for Continuous and Discrete Endpoints

In earlier versions of East, the option to incorporate accrual and dropout
information was available only for tests under the time-to-event/survival endpoint.
East 6 now provides this option for almost all tests under the Continuous and Discrete
endpoints as well. Let us see its use in the CAPTURE trial. Select the design
CAPT-GSD-EffFut from the Library and edit it to add the accrual-dropout
information. From the Design Parameters tab, add the option Accrual/Dropout Info
by clicking the Include Options button.

Let the accrual rate be 12 subjects/week. Suppose we expect the response to be
observed 4 weeks after recruitment. Let us create a design by first assuming
that there will not be any dropouts during the course of the trial. We will then
introduce some dropouts and compare the two designs. After entering the above
inputs, click the icon to see how the subjects will accrue and complete the study.

Close the chart, create the design by clicking the Compute button, save it in the
workbook CAPTURE and rename it CAPT-GSD-NoDrp to indicate that there are
no dropouts in this design. Notice that in this design, the maximum sample size and
the maximum number of completers are the same, as there are no dropouts.
Let us now introduce dropouts. Suppose there is a 5% chance of a subject dropping out
of the trial.

Notice that the two lines are not parallel anymore because of the presence of dropouts.
Click the Compute button to create this design. Save the design in the workbook
CAPTURE and rename it CAPT-GSD-Drp. Compare this design with
CAPT-GSD-NoDrp by selecting the two designs and clicking the icon.

Notice the inflation in sample size for CAPT-GSD-Drp. This design will require an
additional 80 subjects in order to obtain data on 1455 subjects (1455 completers).
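The inflation follows from simple arithmetic: with a dropout probability d, enrollment
must be roughly completers/(1 − d). A quick check of the order of magnitude (East's
exact figure also reflects look-by-look rounding and the response lag):

    completers <- 1455
    dropout    <- 0.05
    ceiling(completers / (1 - dropout))  # about 1532 subjects enrolled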
Let us now compare all five designs saved in the workbook. Select them all
together and click the icon.

The resulting screen will look as shown below:

We can see additional quantities in the design CAPT-GSD-Drp. These correspond to
the information on the total number of completers and the study duration, which are
computed by taking into account the non-zero response lag and the possibility of
dropouts.
Also notice the trend in Maximum Sample Size across all these designs. We can see
that it increases as more constraints are added to the study. But if we look at the
values of Expected Sample Size under the null and alternative, there is a significant
potential saving.
You can also save this output summary window comparing the designs in the library
by clicking the icon.

3.6.8 Output Details

In the earlier part of this chapter, we have seen the design output at two different
places: Output Preview (horizontal view) and Output Summary (vertical view). The
final step in the East 6 design workflow is to see the detailed output in the form of an
HTML file.
Select the design CAPT-GSD-Drp from the Library and click the icon.
Alternatively, one can also double-click on any of the nodes in the Library to see the
details.

The output details are broadly divided into two panels. The left panel consists of all the
input parameters and the right panel consists of all the design output quantities in
tabular format. These tables will be explained in detail in subsequent chapters of this
manual.
Click the Save icon to save all the work done so far. This is the end of the introduction
to the Design menu. The next section discusses another very useful feature:
Simulations.

3.7 Simulations in East 6

A simulation is a very useful way to perform sensitivity analysis of the design
assumptions. For instance: what happens to the power of the study when the δ value
is not the same as specified at the design stage?
We will now simulate the design CAPT-GSD-Drp. Select this design from the Library
and click the icon. Alternatively, you can right-click this design in the Library and
select Simulate.

The default view of the input window for simulations is as shown below:

Notice that the value of δ on the Response Generation tab is -0.05. This corresponds
to the difference in proportions under the alternative hypothesis. You may either keep
this default value for the simulation or change it if you wish to simulate the study with
a different value of δ. Let us run some simulations by changing the value of δ. We will
run simulations over a range of values for πt, say 0.1, 0.125 and 0.14. Enter the values
as shown below:

Before running simulations, let us have a quick look at the Simulation Control tab,
where we can change the number of simulations, save the simulation data in East
format or in CSV format, and more. You can manipulate the simulations with the
following actions:
Enter the number of simulations you wish to run in the "Number of Simulations"
field. The default is 10000 simulations.
Increase/decrease the "Refresh Frequency" field to speed up or slow down the
simulations. The default is to refresh the screen after every 1000 simulations.
Set the Random Number Seed to Clock or Fixed. The default is Clock.
Select the checkbox "Suppress All Intermediate Output" to suppress the
intermediate output.
To see the intermediate results after a specific number of simulations, select the
checkbox "Pause after Refresh" and enter the refresh frequency accordingly.
The checkbox "Stop At End" is selected by default; the summary results are displayed
at the end of all the simulations and a corresponding item gets created in the Output
Preview window. One can uncheck this box to save the simulation node directly in the
Output Preview window.
One can also save the summary statistics for each simulation run, and the subject-level
simulated data, in the form of Case Data or a Comma Separated (CSV) file. Select the
checkboxes accordingly and provide the file names and paths when using the CSV
option. If you are saving the data as Case Data, the corresponding data file will be
associated with the simulation node. It can be accessed by saving the simulation node
from the Output Preview to the workbook in the Library.
For now, let us keep the Simulation Control tab as shown below:

Click the Simulate button at the bottom right to run the simulations. The three
scenarios corresponding to the three values of πt are simulated one after the other and,
in the end, the following output window appears. This is the Simulation Intermediate
Output window, which shows the results from the last simulated scenario. The two
plots on this window are useful for seeing how the study performed over the 10000
simulations.

Click the Close button on this intermediate window, which takes us to the Output
Preview window. Save these three simulation rows in the workbook CAPTURE.
Since we simulated the design CAPT-GSD-Drp, the three simulation nodes get saved
as child nodes of this design. This is the hierarchy followed throughout the East 6
software.

A full picture of the CAPTURE trial design with accrual/dropout information and its
simulations can be viewed easily. Select the three simulation nodes and the parent
design node in the Library and click the icon.

Note the drop in simulated power as the difference between the two arms decreased.
This is because the sample size of 1532 was insufficient to detect the δ values -0.025
and -0.01. It shows the effect of mis-specifying the alternative hypothesis. The design
did achieve the power of 80% for the first case, with δ equal to -0.05, which was
actually the δ at the design stage. This is called simulating the design under the
Alternative. We can also simulate a design under the Null by entering πt equal to 0.15,
the same as πc, and verify that the type I error is preserved.
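The idea behind such simulations can be illustrated with a bare-bones R sketch for a
fixed-sample comparison of two proportions. This is only a toy version: the per-arm
size and πt below are hypothetical, and East's engine additionally handles the group
sequential boundaries, accrual and dropouts:

    set.seed(42)
    n  <- 766                  # hypothetical per-arm sample size
    pc <- 0.15; pt <- 0.125    # control rate and a mis-specified treatment rate
    reject <- replicate(10000, {
      xc <- rbinom(1, n, pc)
      xt <- rbinom(1, n, pt)
      prop.test(c(xt, xc), c(n, n), correct = FALSE)$p.value < 0.05
    })
    mean(reject)               # simulated power under this scenario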
The column width in the comparison mode is fixed and the heading appears in the
format workbook name:design name:Sim. If this string is longer than the fixed width,
you may not be able to see the complete heading. In that case, you can hover the
mouse over the column heading cell to see the complete heading.

Thus, simulations in East 6 are one of the very powerful tools which help us verify
the operating characteristics of a design.
The next section introduces another key feature of East 6: Interim Monitoring.
Let us monitor the CAPTURE trial using this feature.

3.8 Interim Monitoring

Interim monitoring is critical for the management of group sequential trials, and there
are many reasons why flexibility in both design and monitoring is necessary.
Administrative schedules may call for the recalculation of statistical information and
unplanned analyses at arbitrary time points, while both the type-1 error and the power
of the study must be preserved. East provides the capability to flexibly monitor a
group sequential trial using the Interim Monitoring feature.
The IM dashboard provides a coherent visual display of many output values based on
interim information. In addition to important statistical information, included are
tables and graphs for stopping boundaries, conditional power, error spending and
confidence intervals for each interim look. All of this information is useful in tracking
the progress of a trial for decision making purposes, as well as allowing for
improvements to a study design adaptively.
Consider the monitoring of CAPT-GSD-Drp of the CAPTURE trial. Select this
design from the Library and click the icon. The adaptive version of the IM
dashboard can be invoked by clicking the icon; for this example, we will use the
regular IM dashboard.
A node named Interim Monitoring gets associated with the design in the Library and a
blank IM dashboard is opened up as shown below:

Suppose we have to take the first look at the data based on 485 completers.
The interim data on these subjects is to be entered in the Test Statistic Calculator,
which can be opened by clicking the button. Open this calculator and click OK with
the default parameters.


If we have run any analysis procedure on the interim data, the test statistic
calculator can read in the information from the Analysis node. Select the appropriate
workbook and node, and hit Recalc to see the interim inputs.
For binomial endpoint trials, we can enter the interim data in terms of the
number of responses on each arm, and East computes the difference in proportions and
its standard error. Alternatively, we can directly enter the estimated difference and its
standard error, which can be the output of some external computation. The inputs on
the test statistic calculator depend upon the type of trial you are monitoring.
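For a binomial endpoint, the quantities involved are straightforward to reproduce. A
sketch with hypothetical look-1 counts (East's internal formulas may differ in detail):

    nc <- 242; nt <- 243    # hypothetical subjects per arm at the look
    rc <- 36;  rt <- 24     # hypothetical responses (events) per arm
    pc_hat <- rc / nc
    pt_hat <- rt / nt
    delta_hat <- pt_hat - pc_hat                   # estimated difference
    se_hat <- sqrt(pc_hat * (1 - pc_hat) / nc +
                   pt_hat * (1 - pt_hat) / nt)     # its standard error
    z <- delta_hat / se_hat                        # test statistic, Z scale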
The resulting screen is as shown below:

The output quantities for the first look are computed in that row, and all four charts
are updated based on the look 1 data.
There are some more advanced features, like the Conditional Power calculator, the
Predicted Intervals Plot and Conditional Simulations, available from the IM
dashboard. These are explained in later sections of this manual.
Let us take the second look at 970 subjects. Open the test statistic calculator and,
leaving all other parameters at their defaults, change the number of responses on the
Treatment arm to 30. Click the OK button. The screen will look as shown below:

East tells us that the null hypothesis is rejected at the second look and provides an
option to stop the trial and conclude efficacy of the drug over the control arm. It
computes the final inference at the end. At this stage, it also provides another option,
to continue entering data for future looks; but the final inference is computed only
once.
In the last part of this chapter, we shall see how to capture a snapshot of any ongoing
interim monitoring of a trial.
The IM dashboard can also be used as a tool at design time, where we can construct
and analyze multiple possible trial scenarios before actual data are collected. The
feature to save a snapshot of information at interim looks can be employed to give a
user the benefit of quickly comparing multiple scenarios under a variety of
assumptions. This option increases the flexibility of both the design and interim
monitoring processes. At each interim look, a snapshot of the updated information in
the dashboard can be saved for the current design in the workbook.
Click the icon located at the top of the IM Dashboard window to save the current
contents of the dashboard:
A new snapshot node is added under the Interim Monitoring node in the library. The
Interim Monitoring window is an input window which cannot be printed, whereas the
snapshot is in HTML format, which can be printed and shared.

To illustrate the benefit of the snapshot feature: it is often the case that actual trial data
are not available at design time. Determining a reasonable estimate of nuisance
parameters, such as the variance, rather than making strong assumptions about its
certainty, may be desired. The ability to quickly compare potential results under a
variety of different estimates of the variance, by easily looking at multiple interim
snapshots of a study, can be a powerful tool.
Other examples could include sample size re-estimation where initial design
assumptions may be incorrect or using hypothetical interim data to compare relevant
treatment differences.
With this, we come to the end of the chapter on getting started with East 6. The
subsequent chapters of this manual discuss all the features available in the software in
detail, with the help of case studies. The theory behind all the design and analysis
procedures is explained in Appendix A of this manual.

4 Data Editor

The Data Editor allows you to manipulate the contents of your data. East caters to
Case Data and Crossover Data. Depending on the type of data, a corresponding set of
menu items becomes available in the Data Editor menu.

4.1 Case Data
4.1.1 Data Editor Capabilities for Case Data
4.1.2 Creating Variables
4.1.3 Variable Type Setting
4.1.4 Editing Data
4.1.5 Filter Cases

The Data Editor window for case data is a spreadsheet-like facility for creating or
editing case data files. A case data file is organized as a sequence of records, called
cases, one below the other. Each record is subdivided into a fixed number of fields,
called variables. The name assigned to a field is referred to as the variable
name. Each such name identifies a specific variable across all the cases. Each cell
holds the value of a variable for a case. The top line of the Data Editor holds the
variable names. Case data is the most common format to enter and store data. If
you plan to share data with any other package, you need to use the case data editor.

4.1.1 Data Editor Capabilities for Case Data

The Data Editor is used to create a new Case Data file or to edit one that was
previously saved. You can:
Create new variables
Change names and attributes of existing variables
Alter the column width
Alter the row height
Type in new case data records
Edit existing case data records
Insert new variables into the data set
Remove variables from the data set
Select or reject subsets of the data
Transform variables
List data in the log window
Calculate summary measures from the variables

4.1.2 Creating Variables

To create a new Case Data set, invoke the Home menu, click on the icon and
select Case Data. When you create a new case data set, all the columns are labeled
var, indicating that new variables may be created in any of the columns. To create a
new variable, simply start entering data in a blank column. The column is given a
default name Var1, Var2, etc. Alternatively, select any unused column, right-click and
select Create Variable from the menu that appears. The data editor will create all
the variables with default names up to the column you are working on. To create a new
variable in the first unused column and to select its attributes, choose the Data
Editor menu and click on the icon.
You will be presented with the dialog box shown below, in which you can select the
variable name, variable type, alignment, format, value labels and missing values.

4.1.3 Variable Type Setting

You can change the default variable name and its type in this dialog box and click on
the OK button. East will automatically add this new variable to the case data file.
New variables are added immediately adjacent to the last existing variable in the case
data set.
The Variable Type Setting dialog box contains five tabs: Detail, Alignment,
Format, Value Label, and Missing Value(s).
Detail
The Detail tab allows you to change the default variable name, add a
description of the variable and select the type (Numeric, String,
Date, Binary, Categorical or Integer). Note that depending on
the type of the variable, different tabs and options become available in
the Variable Type Settings. For example, the tab Category Details
and the option Base Level become available only if you select the
Variable Type as Categorical.
Value Label
The Value label tab is displayed below. Here, you can add labels for
particular data values, change a selected label, remove a selected label, or
remove all value labels for the current variable.

Missing Value(s)
The Missing Value(s) tab is used for specifying which values are to
be treated as missing. You have three choices: Not Defined, which
means that no values will be treated as missing values; Discrete
value(s), which allows you to add particular values to the list of
missing values; or Range, which lets you specify an entire range of
numbers as missing values.

4.1.4 Editing Data

Besides changing the actual cell entries of a case data set, you can:
Add new Cases and Variables
Insert or delete Cases and Variables
Sort Cases

4.1.5 Filter Cases

We illustrate the ability of East to filter cases with the help of the following example:
Step 1: Open the Data set
Open the data set leukemia.cyd by clicking on the Home menu. Click on the icon and
select Data. The data is stored in the Samples folder of the installation directory of
East.
Step 2: Invoke the Filter Cases menu
Invoke the menu item Data Editor and click on the icon Filter Cases.
East will present you with a dialog box that allows you to use subsets of data in the
Case Data editor. The dialog box will allow you to select All cases, those satisfying
an If condition, falling in a Range, or using a Filter Variable as shown
below.

Step 3: Filter Variable option
Select the Filter Variable option. Select Status from the variable list and
click on the black triangle, which will remove the variable Status from the variable
list and add it to the empty box on the other side.
Suppose we want to filter the cases for which the Status variable has the value 1.
Insert 1 in the empty box next to Event code.

Step 4: Output
Click on OK. As shown in the following screenshot, East will grey out all the cases
that have a Status variable value of 1. Now any analysis carried out on the data set
uses only the filtered cases. In this way, you can carry out subgroup analyses if the
subgroups are identified by the values of a variable in the data set.
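The same kind of filtering, expressed in R for comparison (this assumes the East
sample data were read into a data frame named leukemia; the name is illustrative):

    # Keep only the cases whose Status equals 1; analyses then use this subset
    leukemia_sub <- subset(leukemia, Status == 1)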


4.2 Crossover Data
4.2.1 Data Editor Capabilities for Crossover Data
4.2.2 Creating a New Crossover Data Set

The Data Editor allows you to enter data for a 2 × 2 crossover trial with one
record for each patient. You can use this crossover data editor to input individual
patients’ responses in 2 × 2 crossover trials. The response could be continuous (such
as systolic blood pressure) or binary (such as the development of a tumor after
injecting a carcinogenic agent). Only the continuous response type is currently
supported in East.

4.2.1 Data Editor Capabilities for Crossover Data

The Data Editor is used to create a new 2 × 2 Crossover Data file or to edit one that
was previously saved. You can:
Create and edit data with continuous response of individual patients.
Edit period labels.
Assign treatments to different groups and periods.
Convert to case data.
Convert case data into crossover data.
List data to the log

4.2.2 Creating a New Crossover Data Set

To create a new crossover data set, invoke the Home menu. Click on the icon and,
from the drop-down menu, choose Crossover Data.
You will be presented with a dialog box as shown below:

In the above dialog box, you see a 2 × 2 grid called Treatment Assignment
Table. This grid is provided to assign the treatments to different groups and periods.
In this version of the software, you can analyze data for 2 × 2 crossover trials. Hence the number of groups and the number of periods are always two. The rows specify the two groups, labeled G1 and G2. The columns represent the two periods of the crossover data, labeled "P1" and "P2". If you'd like to change these labels, click inside the table cells and type the treatment names associated with the corresponding group and period. Having entered the treatments, the crossover data editor settings dialog box will look as follows:

Rules for editing these fields: The row names G1 and G2 can be changed using a string consisting of a maximum of 8 characters from the set A-Z, 0-9, '.', '_' (underscore), starting with either a letter or a digit; blank spaces are not accepted as part of a name. The column names P1 and P2 can be changed in the same way. Also note that the Group names as well as the Period names must be distinct. The letters are not case sensitive. Once you have assigned all the treatments, click on the button OK.

This will open up the Patients’ crossover data editor.

This editor resembles the case data editor. Like the case data editor, this is a spreadsheet into which you can enter data directly. There are four pre-defined fields in this editor. The PatientId column must contain the patients' identification numbers. The GroupId column contains the group identification to which the patient belongs; the entry in this column should be one of the labels that you entered as row names in the 2 × 2 grid earlier. The inputs in the next two columns are numeric and contain the responses of the patient in the two periods, respectively. The titles of these two columns are created by appending the word "Resp" to the period identifiers that you entered previously. For example, since we entered P1 and P2 as period identifiers in the settings dialog, the two response columns are labeled P1 Resp and P2 Resp. However, if the period values start with digits, such as 1 and 2, then the period ids are prefixed by the letter P, and the headings of the two columns would again be P1 Resp and P2 Resp.
The variable names PatientId and GroupId are fixed and cannot be edited in the data editor. If you use Transform Variable on GroupId and the result is either "G1" or "G2", then the value is displayed; otherwise, the value is shown as missing. You can also add covariates such as age and sex. All variable settings of the case data editor are applicable to these covariates. The Settings button allows you to edit the GroupId, PeriodId or the treatment labels that you entered earlier. If you make any changes, these changes will automatically be made in the data editor.

4.3 Data Transformation

You can transform an existing variable with the data transformation facility available in the Data Editor of East.
To transform any variable:
1. Select the Data Editor menu and click the Transform Variable icon. You will be presented with the expression builder dialog box. Here you can transform the values of the current variable using a combination of statistical, arithmetic, and logical operations.
The current variable name is the target variable on the left hand side of an equation with the form:
VAR =
where VAR is the variable name of the current variable. In order to create a new variable, type the variable name in the target variable field.
2. Complete the right hand side of the equation with any combination of allowable functions. To select a function, double-click on it. If the function that you select needs any extra parameters (typically variable names), this will be indicated by a ? for each required parameter. Replace the ? character with the desired parameter.
3. Select the OK button to fill in values for the current variable, computed according to the expression that you have constructed.
The statistical, arithmetical, and logical functions that are available in the
Transform Variable dialog box are given below:

4.4 Mathematical and Statistical Functions

The following is a list of the mathematical and statistical functions available in East for variable transformation.
ABS(X) Returns the absolute value of X.
Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.
ACOS(X) Returns the arccosine of X.
Argument Range: −1 ≤ X ≤ 1.
ASIN(X) Returns the arcsine of X.
Argument Range: −1 ≤ X ≤ 1.
ATAN(X) Returns the arctangent of X.
Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.
AVG(X1, X2, ...) Returns the mean of (X1, X2, ...).
Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.
CEIL(X) Returns the ceiling, or smallest integer greater than or equal to X.
Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.
CHIDIST(X,df) Returns the probability in the tail area to the left of X from the chi-squared distribution with df > 0 degrees of freedom.
Argument Range: 0 ≤ X ≤ 1 × 10^25.
CHIINV(X,df) Returns the Xth percentile value of the chi-squared distribution with df > 0 degrees of freedom, i.e., returns z such that Pr(Z ≤ z) = X.
Argument Range: 0.0001 ≤ X ≤ 0.9999.
COS(X) Returns the cosine of X, where X is expressed in radians.
Argument Range: −2.14 × 10^9 ≤ X ≤ 2.14 × 10^9.
COSH(X) Returns the hyperbolic cosine of X.
Argument Range: −87 ≤ X ≤ 87.
CUMULATIVE(X) Given a column of X values, this function returns a new column in which the entry in row j is the sum of the entries in the first j rows of the original column.
EXP(X) Returns the exponential function evaluated at X.
Argument Range: −87 ≤ X ≤ 87.
FLOOR(X) Returns the floor, or largest integer less than or equal to X.
Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.
INT(X) Returns the integer part of X.
Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.
ISNA(X) Returns a value of 1 if X is a missing value, 0 otherwise.
Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25. This function is useful, for example, to set missing observations to average values:
X1 = IF(ISNA(X)=1, COLMEAN(X), X)
Another extremely useful task performed by the ISNA() function is to eliminate records from the data set in which there are missing values.
REJECTIF(ISNA(X)=1)
SELECTIF(ISNA(V1)+ISNA(V2)+ISNA(V3)=0)
LOG(X) Returns the logarithm of X to base 10.
Argument Range: 1 × 10^−25 ≤ X ≤ 1 × 10^25.
LN(X) Returns the logarithm of X to base e.
Argument Range: 1 × 10^−25 ≤ X ≤ 1 × 10^25.
MAX(X1, X2, ...) Returns the maximum value of (X1, X2, ...).
MIN(X1, X2, ...) Returns the minimum value of (X1, X2, ...).
MOD(X,Y) Returns the remainder of X divided by Y. The sign of this remainder is the same as that of X.
Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.
NORMDIST(X) Returns the probability in the tail area to the left of X from the standardized normal distribution.
Argument Range: −10 ≤ X ≤ 10.
NORMINV(X) Returns the Xth percentile value of the standard normal distribution, i.e., returns z such that Pr(Z ≤ z) = X.
Argument Range: 0.001 ≤ X ≤ 0.999.
ROUND(X,d) Returns a floating point number obtained by rounding X to d decimal digits. If d=0, X is rounded to the nearest integer.
Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.
SIN(X) Returns the sine of X, where X is expressed in radians.
Argument Range: −2.14 × 10^9 ≤ X ≤ 2.14 × 10^9.
SINH(X) Returns the hyperbolic sine of X.
Argument Range: −87 ≤ X ≤ 87.
SQRT(X) Returns the square root of X.
Argument Range: 0 ≤ X ≤ 1 × 10^25.
TAN(X) Returns the tangent of X, where X is expressed in radians.
Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25; X ≠ (2n + 1)π/2, n an integer.
TANH(X) Returns the hyperbolic tangent of X.
Argument Range: −87 ≤ X ≤ 87.

4.4.1 The IF Function

This function tests an arithmetic or logical condition and returns one value if it is true and another value if it is false. The syntax is
IF(CONDITION, X, Y)
The function returns the value X if CONDITION is "true" and Y if CONDITION is "false". For example, consider the following equation:
HIVPOS = IF(CD4>1,1,-1)
The above equation defines a variable HIVPOS that assumes the value 1 if the variable CD4 exceeds 1, and the value -1 otherwise. Usually CONDITION is made up of two arithmetic expressions separated by a "comparison operator", e.g., CD4>CD8, CD4+CD8=15*BLOOD, etc. The following comparison operators are allowed:
=, >, <, >=, <=, <>
More generally, CONDITION can be constructed by combining two or more individual conditions with the AND, OR, or NOT operators. For example, consider the following expression:
HIVPOS = IF((CD4>1) !AND! (CD8>1), 1, -1)
The above expression means that HIVPOS will take on the value 1 if both CD4>1 and CD8>1, and -1 otherwise. On the other hand, consider the following expression:
HIVANY = IF((CD4>1) !OR! (CD8>1), 1, -1)
The above expression means that HIVANY will take on the value 1 if either CD4>1 or CD8>1, and -1 otherwise.
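As the CD4+CD8=15*BLOOD example suggests, either side of a comparison operator may itself be an arithmetic expression. The following sketch (the variable names and the cutoff of 2 are purely illustrative) flags records in which the CD4 count is at least twice the CD8 count:
HIGHRATIO = IF(CD4/CD8>=2, 1, 0)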

4.4.2 The SELECTIF Function

This function provides a powerful way of selecting only those records that satisfy a
specific arithmetic or logical condition. All other records are deleted from the current
data set. The syntax is:
SELECTIF(CONDITION)
This function selects only those records for which CONDITION is "true" and excludes all other records from the current data set. For example, consider the following equation:
HIVPOS = SELECTIF(CD4>1)
The above condition retains records for which CD4 exceeds 1. The same rules governing CONDITION for the IF function are applicable here as well.
Note that the column location of the cursor when Transform Variable was
selected plays no role in the execution of this function.

4.4.3 The RECODE Function

This function recodes different ranges of a variable. It is extremely useful for creating
a new variable consisting of discrete categories at pre-specified cut-points of the
original variable. The syntax for RECODE has two forms — one for recoding a
categorical variable and one for recoding a continuous variable. In both cases, the
variable being recoded must assume numerical values.
Recoding a Categorical Variable: the syntax is
RECODE(X, S1=c1, S2=c2, ..., Sn=cn, [else])
where X is the categorical variable (or arithmetic expression) being recoded, Sj
represents a set of numbers in X, all being recoded to cj , and the optional
argument [else] is a default number to which all the numbers belonging to X,
but excluded from the sets S1 , S2 , . . . Sn , are recoded. If [else] is not
specified as an argument of RECODE, then all the numbers excluded from the
sets S1 , S2 , . . . , Sn are unchanged.
Notice that the argument Sj = cj in the RECODE function consists of a set of
numbers Sj being recoded to a single number cj . The usual mathematical
convention is adopted of specifying a set of numbers within braces. Thus if set
Sj consisted of m distinct numbers s1j , s2j , . . . , smj , it would be represented in
the RECODE argument list as {s1j , s2j , . . . , smj }. For example
Y = RECODE(X, {1,2,3}=1, {7,9}=2, {10}=3)
will recode the categorical variable X into another categorical variable Y that
assumes the value 1 for X ∈ {1, 2, 3}, 2 for X ∈ {7, 9}, and 3 for X = 10.
Other values of X, if any, remain unchanged. If you want those other values of X to be recoded to, e.g., -1, simply augment the argument list by including -1 at the end of the recode statement:
Y = RECODE(X, {1,2,3}=1, {7,9}=2, {10}=3, -1)
Recoding a Continuous Variable: the syntax is
RECODE(X, I1=c1, I2=c2, ..., In=cn, [else])
where X is the continuous variable (or arithmetic expression) being recoded, Ij
represents an interval of numbers all being recoded to cj , and the optional
argument [else] is a default number to which all the numbers belonging to X,
but excluded from the intervals I1 , I2 , . . . In , are recoded. If [else] is not
specified as an argument of RECODE, then all the numbers excluded from the
intervals I1 , I2 , . . . , In are unchanged. Notice that the arguments of RECODE
are intervals being recoded to individual numbers. The usual mathematical
convention for specifying an interval Ij as open, semi-open, and closed is
adopted.
Thus:
An interval Ij of the form (u, v) is open and includes all numbers between
u and v, but not the end points.
An interval Ij of the form [u, v] is closed and includes all numbers between
u and v inclusive of the end points.
An interval of the form (u, v] is open on the left but closed on the right. It
excludes u, includes v, and includes all the numbers in between.
An interval of the form [u, v) is closed on the left but open on the right. It
includes u, excludes v, and includes all the numbers in between.
For example
Y = RECODE(X, (2.5,5.7]=1, (5.7,10.4]=2)
will recode the continuous variable X so that all numbers 2.5 < X ≤ 5.7 are
replaced by 1, all numbers 5.7 < X ≤ 10.4 are replaced by 2, and all other
values of X are unchanged. If you want all other values of X to also be recoded to, say, -1, append -1 as the last argument of the equation:
Y = RECODE(X, (2.5,5.7]=1, (5.7,10.4]=2, -1)

4.4.4 Column Functions

Column functions operate on an entire column of numbers and return a scalar quantity. The returned value is often used in arithmetic expressions. The following column functions are available; all of them are prefixed by the letters COL. The argument X of these column functions must be a variable in the worksheet; arithmetic expressions are not permitted. This may require you to create an intermediate column of computed expressions before using a column function. Also note that missing values are ignored in computing these column functions.
COLMEAN(X) Returns the sample mean of X.
COLVAR(X) Returns the sample variance of X.
COLSTD(X) Returns the sample standard deviation of X.
COLSUM(X) Returns the sum of all the numbers in X.
COLMAX(X) Returns the maximum value of X.
COLMIN(X) Returns the minimum value of X.
COLRANGE(X) Returns the value of COLMAX(X)-COLMIN(X).
COLCOUNT(X) Returns the number of elements in X.
You can use the values returned by these column functions in arithmetic expressions
and as arguments of other functions. To do this, it is not necessary to know the actual
value returned by the column function. However, if you want to know the value
returned by any column function, you must define a new variable in the worksheet and
fill its entire column with the value of the column function.
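For instance, the value returned by a column function can be used directly inside a transformation. The following sketch (assuming a numeric worksheet variable X) standardizes X to have mean 0 and standard deviation 1:
ZX = (X-COLMEAN(X))/COLSTD(X)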

4.4.5 Random Numbers

You can fill an entire column of a worksheet with random numbers and constants.
Suppose the cursor is in a cell of a variable named RANDNUM.
The expression
RANDNUM = #RAND
will result in the variable RANDNUM being filled with a column of uniform random
numbers in the range (0, 1).
Three random number functions or generators are available to you with the editors:
#RAND Generates uniform random numbers in the range (0, 1).
#NORMRAND Generates random numbers from the standard Normal Distribution.
#CHIRAND(X) Generates random numbers from the chi-squared distribution with X
degrees of freedom.
You may of course use these three random number generators to generate random
numbers from other distributions. For example, the equation
Y = 3+2*#NORMRAND
will generate random numbers from the normal distribution with mean 3 and standard
deviation 2, in variable Y. Again, the equation
Z = #CHIRAND(5)
will generate random numbers from the chi-squared distribution with 5 degrees of
freedom.
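More generally, a uniform random variable on an arbitrary interval (a, b) can be obtained by rescaling #RAND. As an illustrative sketch, the equation
U = 10+5*#RAND
fills the variable U with uniform random numbers in the range (10, 15).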

4.4.6 Special functions

The following special functions are available for use in arithmetic expressions:
#PI This is the value of π.
#NA This is the missing value code. It can be used to detect if a value is missing, or to
force a value to be treated as missing.
#SQNO This is the value of the current sequence number (SQNO) in the current data
set.
#SQEND This is the largest value of the sequence number (SQNO) in the current data
set.
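These special values can be combined with the transformation functions described earlier. As an illustrative sketch, the expression
FIRSTHALF = IF(#SQNO<=#SQEND/2, 1, 0)
creates an indicator variable equal to 1 for the first half of the records in the current data set and 0 for the remainder.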


Volume 2
Continuous Endpoints

5 Introduction to Volume 2  73
6 Tutorial: Normal Endpoint  79
7 Normal Superiority One-Sample  91
8 Normal Noninferiority Paired-Sample  113
9 Normal Equivalence Paired-Sample  128
10 Normal Superiority Two-Sample  141
11 Nonparametric Superiority Two Sample  179
12 Normal Non-inferiority Two-Sample  185
13 Normal Equivalence Two-Sample  211
14 Normal: Many Means  232
15 Multiple Comparison Procedures for Continuous Data  240
16 Multiple Endpoints-Gatekeeping Procedures  265
17 Continuous Endpoint: Multi-arm Multi-stage (MaMs) Designs  285
18 Two-Stage Multi-arm Designs using p-value combination  309
19 Normal Superiority Regression  332


5 Introduction to Volume 2

This volume describes the procedures for continuous (normal) endpoints applicable to one-sample, two-sample, many-sample and regression situations. All three types of design (superiority, non-inferiority and equivalence) are discussed in detail.
Chapter 6 introduces you to East on the Architect platform, using an example clinical trial to test a difference of means.
Chapters 7, 8 and 9 detail the design and interim monitoring in the one-sample situation, where it may be required to compare a new treatment to a well-established control using a single sample. These chapters respectively cover superiority, non-inferiority and equivalence trials.
Chapter 10 details the design and interim monitoring in the superiority two-sample situation, where the superiority of a new treatment over the control treatment is tested by comparing the group-dependent means of the outcome variables.
Chapter 11 details the design of the Wilcoxon-Mann-Whitney nonparametric test, which is commonly used to compare two distributions when the observations cannot be assumed to come from normal distributions. It applies when the distributions differ only in a location parameter and is especially useful when the distributions are not symmetric. For the Wilcoxon-Mann-Whitney test, East supports single-look superiority designs only.
Chapter 12 provides an account of the design and interim monitoring in non-inferiority
two-sample situation where the goal is to establish that an experimental treatment is no
worse than the standard treatment, rather than attempting to establish that it is superior.
Non-inferiority trials are designed by specifying a non-inferiority margin. The amount
by which the mean response on the experimental arm is worse than the mean response
on the control arm must fall within this margin in order for the claim of non-inferiority
to be sustained.
Chapter 13 describes the design and interim monitoring in the equivalence two-sample situation, where the goal is neither to establish superiority nor non-inferiority, but equivalence. When the goal is to show that two treatments are similar, it is necessary to develop procedures with the goal of establishing equivalence in mind. In Section 13.1, the problem of establishing equivalence with respect to the difference of the means of two normal distributions using a parallel-group design is presented. The corresponding problem of establishing equivalence with respect to
the log ratio of means is presented in Section 13.2. For the crossover design, the
problem of establishing the equivalence with respect to the difference of the means is
presented in Section 13.3 and with respect to the log ratio of means in Section 13.4.
Chapter 14 details the various tests available in East for comparing more than two continuous means. Sections 14.1, 14.2 and 14.3 discuss One Way ANOVA, One Way Repeated Measures ANOVA and Two Way ANOVA, respectively.
Chapter 15 details the Multiple Comparison Procedures (MCP) for continuous data. It is often the case that multiple objectives are to be addressed in a single trial. These objectives are formulated into a family of hypotheses, and multiple comparison (MC) procedures provide a guard against inflation of the type I error while testing these multiple hypotheses. East supports several parametric and p-value based MC procedures. This chapter explains how to design a study using a chosen MC procedure that strongly maintains the FWER.
Chapter 16 covers clinical trials that are designed to assess the benefits of a new treatment compared to a control treatment with respect to multiple clinical endpoints which are divided into hierarchically ordered families. It discusses two methods: Section 16.2 discusses Serial Gatekeeping, whereas Section 16.3 discusses Parallel Gatekeeping.
Chapter 19 elaborates on the design and interim monitoring in the superiority regression setting, where linear regression models are used to examine the relationship between
a response variable and one or more explanatory variables. This chapter discusses the
design and interim monitoring of three types of linear regression models. Section 19.1
examines the problem of testing a single slope in a simple linear regression model
involving one continuous covariate. Section 19.2 examines the problem of testing the
equality of two slopes in a linear regression model with only one observation per
subject. Finally Section 19.3 examines the problem of testing the equality of two
slopes in a linear regression repeated measures model, applied to a longitudinal setting.


5.1 Settings

Click the Settings icon in the Home menu to adjust default values in East 6.

The options provided in the Display Precision tab are used to set the decimal places of numerical quantities. The settings indicated here will be applicable to all tests in East 6 under the Design and Analysis menus.
All these numerical quantities are grouped into different categories depending upon their usage. For example, all the average and expected sample sizes computed at the simulation or design stage are grouped together under the category "Expected Sample Size". To view any of these quantities with greater or lesser precision, select the corresponding category and change the decimal places to any value between 0 and 9.
The General tab has the provision for adjusting the paths for storing workbooks, files, and temporary files. These paths will persist through the current and future sessions, even after East is closed. This is also where you specify the installation directory of the R software in order to use the R Integration feature in East 6.

The Design Defaults section is where the user can change the settings for trial design:

Under the Common tab, default values can be set for input design parameters.
You can set up the default choices for the design type, computation type, test type and
the default values for type-I error, power, sample size and allocation ratio. When a new
design is invoked, the input window will show these default choices.
Time Limit for Exact Computation
This time limit is applicable only to exact designs and charts. Exact methods are
computationally intensive and can easily consume several hours of computation
time if the likely sample sizes are very large. You can set the maximum time
available for any exact test in terms of minutes. If the time limit is reached, the
test is terminated and no exact results are provided. Minimum and default value
is 5 minutes.
Type I Error for MCP
If the user has selected a 2-sided test as the default in the global settings, then any MCP will use half of the alpha from the settings as its default, since an MCP is always a 1-sided test.
Sample Size Rounding
Notice that by default, East displays the integer sample size (events) by rounding up the actual number computed by the East algorithm. In this case, the look-by-look sample size is rounded off to the nearest integer. One can also see the original floating point sample size by selecting the option "Do not round sample size/events".
Under the Group Sequential tab, defaults are set for boundary information. When a
new design is invoked, input fields will contain these specified defaults. We can also
set the option to view the Boundary Crossing Probabilities in the detailed output. It can
be either Incremental or Cumulative.

Simulation Defaults is where we can change the settings for simulation:

If the checkbox for "Save summary statistics for every simulation" is checked, then East simulations will by default save the per-simulation summary data for all the
simulations in the form of case data.
If the checkbox for "Suppress All Intermediate Output" is checked, the intermediate simulation output window will always be suppressed and you will be directed to the Output Preview area.
The Chart Settings tab allows defaults to be set for the following quantities on East 6 charts:

We suggest that you do not alter the defaults until you are quite familiar with the
software.


6 Tutorial: Normal Endpoint

This tutorial introduces you to East on the Architect platform, using an example
clinical trial to test difference of means.

6.1 Fixed Sample Design

When you open East, by default, the Design tab in the ribbon will be active.
The items on this tab are grouped under the following categories of endpoints:
Continuous, Discrete, Count, Survival, and General. Click Continuous: Two
Samples, and then Parallel Design: Difference of Means.

The following input window will appear.

By default, the radio button for Sample Size (n) is selected, indicating that it is the variable to be computed. The default values shown for Type I Error and Power are 0.025 and 0.9; keep these for this design. Since the default inputs provide all of
the necessary input information, you are ready to compute sample size by clicking the
Compute button. The calculated result will appear in the Output Preview pane, as
shown below.

This single row of output contains relevant details of the inputs and the computed result of total sample size (and total completers) of 467. Select this row, and click the summary icon to display a summary of the design details in the upper pane (known as the Output Summary).

The discussion so far gives you a quick feel for the software by computing the sample size for a single-look design. We will describe further features through an example of a group sequential design in the next section.

6.2 Group Sequential Design for a Normal Superiority Trial

6.2.1 Study Background

Drug X is a newly developed lipase inhibitor for obesity management that acts by
inhibiting the absorption of dietary fats. The performance of this drug needs to be
compared with an already marketed drug Y for the same condition. In a randomized,
double-blind trial comparing the efficacy and safety of 1 year of treatment with X versus Y (each at 120 mg three times a day), obese adults are to be randomized to receive either X or Y combined with dietary intervention for a period of one year. The endpoint is weight loss (in pounds). You are to design a trial having 90% power to detect a mean difference of 9 lbs between X and Y, assuming 15 lbs and 6 lbs weight loss in the two treatment arms, respectively, and a common standard deviation of 32 lbs. The design is required to be a 2-sided test at the 5% significance level.
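As a rough check before building the design, the fixed-sample benchmark implied by these inputs can be obtained from the usual two-sample formula (a sketch, using a 1-sided α of 0.025, i.e., half the 2-sided 0.05, and power 0.9): n = 4σ²(z_{0.975} + z_{0.90})²/δ² = 4 × 32² × (1.960 + 1.282)²/9² ≈ 532 subjects in total. The group sequential design obtained below requires a slightly larger maximum sample size (544) as the price of taking five looks.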
From the Design menu choose Continuous: Two Samples, and then Parallel Design: Difference of Means. Select 2-Sided for Test Type, and enter 0.05 for Type I Error. Specify the Mean Control to be 6, the Mean Treatment to be 15, and the common Std. Deviation to be 32. Next, change the Number of Looks to 5. You will see a new tab, Boundary, added to the input dialog box.

Click the Boundary tab, and you will see the following screen. On this tab, you can choose whether to specify stopping boundaries for efficacy, futility, or both. For this trial, choose efficacy boundaries only, and leave all other values at their defaults. We will implement the Lan-DeMets (O'Brien-Fleming) spending function, with equally spaced
looks.

On the Boundary tab, near the Efficacy drop-down box, click either of the two chart icons to generate the following charts.

Click Compute. East will show the results in the Output Preview.
The maximum combined sample size required under this design is 544. The expected
sample sizes under H0 and H1 are 540 and 403, respectively. Click the save icon in the Output Preview toolbar to save this design to Wbk1 in the Library. Double-click Des1 to generate the following output.

Once you have finished examining the output, close this window, and re-start East
before continuing.

6.2.2 Creating multiple designs easily

In East, it is easy to create multiple designs by inputting multiple parameter values. In
the trial described above, suppose we want to generate designs for all combinations of
the following parameter values: Power = 0.8, 0.9, and Difference in Means =
8.5, 9, 9.5, 10. The number of such combinations is 2 × 4 = 8.
East can create all 8 designs by a single specification in the input dialog box. Enter the
following values as shown below. Remember that the common Std. Deviation is 32.
From the Input Method, select the Difference of Means option. The values of
Power have been entered as a list of comma-separated values, while Difference in

Means has been entered as a colon-separated range of values: 8.5 to 10 in steps of 0.5.

Now click Compute. East computes all 8 designs, and displays them in the Output Preview as shown below. Click the maximize icon to maximize the Output Preview.

Select the first 3 rows using the Ctrl key, and click the summary icon to display a summary of the design details in the upper pane, known as the Output Summary.

Select Des1 in the Output Preview, and click the save icon in the toolbar to save this design in the Library. We will use this design for simulation and interim monitoring, as described below. Now that you have saved Des1, delete all designs from the Output Preview before continuing, by selecting all designs with the Shift key, and clicking the delete icon in the toolbar.

6.2.3 Simulation

Right-click Des1 in the Library, and select Simulate. Alternatively, you can select Des1 and click the simulate icon.

We will carry out a simulation of Des1 to check whether it preserves the specified
power. Click Simulate. East will execute by default 10000 simulations with the
specified inputs. Close the intermediate window after examining the results. A row
labeled as Sim1 will be added in the Output Preview.
Click the save icon to save this simulation to the Library. A simulation sub-node will be added under the Des1 node. Double-clicking the Sim1 node will display the

detailed simulation output in the work area.

In 80.23% of the simulated trials, the null hypothesis was rejected. This value is very close to the specified power of 80%. Note that your results may differ from those displayed here, as the simulations will have been run with a different seed. The next section will explore interim monitoring with this design.

6.2.4 Interim Monitoring

Right-click Des1 in the Library and select Interim Monitoring. Click Enter Interim Data to open the Test Statistic Calculator. Suppose that after 91 subjects, at the first look, you have observed a mean difference of 8.5, with a standard error of 6.709.
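As an informal check, the test statistic implied by these inputs is Z1 = 8.5/6.709 ≈ 1.267, which is well inside the first-look efficacy boundary, so the trial continues.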

Click OK to update the IM Dashboard.

The Stopping Boundaries and Error Spending Function charts on the left:


The Conditional Power and Confidence Intervals charts on the right:

Suppose that after 182 subjects, at the second look, you have observed a mean
difference of 16, with a standard error of 4.744. Click Recalc, and then OK to update
the IM Dashboard. In this case, a boundary has been crossed, and the following
window appears.
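As an informal check, the test statistic at this look is Z2 = 16/4.744 ≈ 3.373, which exceeds the second-look efficacy boundary.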

Click Stop to complete the trial. The IM Dashboard will be updated accordingly, and a
table for Final Inference will be displayed as shown below.


7 Normal Superiority One-Sample

To compare a new process or treatment to a well-established control, a single-sample study may suffice to provide preliminary information prior to a full-scale investigation. This single sample may consist either of a random sample of observations from a single treatment, when the mean is to be compared to a specified constant, or of a random sample of paired differences or ratios between two treatments. The former is presented in Section (7.1) and the latter is discussed in Section (7.2) and Section (7.3).

7.1 Single Mean

The problem considered here is that of comparing the mean of the distribution of observations from a single random sample to a specified constant. For example, when developing a new drug for treatment of a disease, there should be evidence of efficacy. For this single-sample problem, it is desired to compare the unknown mean µ to a fixed value µ0. The null hypothesis H0: µ = µ0 is tested against the two-sided alternative hypothesis H1: µ ≠ µ0 or a one-sided alternative hypothesis H1: µ < µ0 or H1: µ > µ0. The power of the test is computed at a specified value µ = µ1 and standard deviation σ.
Let µ̂_j denote the estimate of µ based on n_j observations, up to and including the j-th look, j = 1, ..., K, with a maximum of K looks. The test statistic at the j-th look is based on the value specified by the null hypothesis, namely

Z_j = n_j^{1/2} (µ̂_j − µ0)/σ̂_j ,    (7.1)

where σ̂_j² is the sample variance based on n_j observations.

7.1.1 Trial Design

Consider the situation where treatment for a certain infectious disorder is expected to result in a decrease in the length of hospital stay. Suppose that hospital records were reviewed and it was determined that, based on this historical data, the average hospital stay is approximately 7 days. It is hoped that the new treatment can decrease this to less than 6 days. It is assumed that the standard deviation is σ = 2.5 days. The null hypothesis H0: µ = 7(= µ0) is tested against the alternative hypothesis H1: µ < 7.
First, click Continuous: One Sample on the Design tab and then click Single Arm Design: Single Mean. This will launch a new input window.
Single-Look Design
We want to determine the sample size required to have power of 90% when
µ = 6(= µ1 ), using a test with a one-sided type-1 error rate of 0.05. Choose Test Type
as 1-Sided. Specify Mean Response under Null (µ0 ) as 7, Mean Response under
Alt. (µ1 ) as 6 and Std. Deviation (σ) as 2.5. The upper pane should appear as below:

Click Compute. This will calculate the sample size for this design and the output is
shown as a row in the Output Preview. The computed sample size is 54 subjects.
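This agrees with the usual fixed-sample formula for a one-sided, one-sample Z test (a sketch of the check): n = σ²(z_{0.95} + z_{0.90})²/(µ0 − µ1)² = 2.5² × (1.645 + 1.282)²/1² ≈ 53.5, which rounds up to 54.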

This design has the default name Des 1. Select this design by clicking anywhere along the row and click the summary icon in the Output Preview toolbar. Some of the design details will be displayed in the upper pane, labeled as Output Summary.

In the Output Preview toolbar select Des 1, and click the save icon to save this design to Wbk1 in the Library.

Five-Look Design
To allow the opportunity to stop early and proceed with a full-scale plan, five
equally-spaced analyses are planned, using the Lan-DeMets (O’Brien-Fleming)
stopping boundary. Create a new design by right-clicking Des 1 in the Library, and
selecting Edit Design. In the Input, change the Number of Looks from 1 to 5, to
generate a study with four interim looks and a final analysis. A new tab for Boundary
Info should appear. Click this tab to reveal the stopping boundary parameters. By
default, the Spacing of Looks is set to Equal, which means that the interim analyses
will be equally spaced in terms of the number of patients accrued between looks. The
left side contains details for the Efficacy boundary, and the right side contains details
for the Futility boundary. By default, there is an efficacy boundary (to reject H0 )
selected, but no futility boundary (to reject H1 ). The Boundary Family specified is of
the Spending Functions type. The default Spending Function is the
Lan-DeMets (Lan & DeMets, 1983), with Parameter as OF (O’Brien-Fleming),
which generates boundaries that are very similar, though not identical, to the classical
stopping boundaries of O’Brien and Fleming (1979). For a detailed description of the
different spending functions and stopping boundaries available in East refer to
Chapter 62. The cumulative alpha spent and the boundary values are displayed below.


Click Compute. The maximum and expected sample sizes are highlighted in yellow in the Output Preview. Save this design in the current workbook by selecting the corresponding row in the Output Preview and clicking the save icon on the Output Preview toolbar. To compare Des 1 and Des 2, select both rows in the Output Preview using the Ctrl key and click the summary icon in the Output Preview toolbar. This will display both designs in the Output Summary pane.

Des 2 results in a maximum of 56 subjects in order to attain 90% power, with an
expected sample size of 40 under the alternative hypothesis. In order to see the
stopping probabilities, double-click Des 2 in the Library.

The clear advantage of this sequential design resides in the relatively high cumulative
probability of stopping by the third look if the alternative is true, with a sample size of
34 patients, which is well below the requirements for a fixed sample study (54
patients). Close the Output window before continuing.
Examining stopping boundaries and spending functions
You can plot the boundary values of Des 2 by clicking the plot icon on the Library toolbar,
and then clicking Stopping Boundaries. The following chart will appear:

You can choose different boundary scales from the drop-down box located on the right hand side. The available boundary scales are Z scale, Score Scale, µ/σ Scale and p-value scale. To plot the error spending function for Des 2, select Des 2 in the Library, click the plot icon in the toolbar, and then click Error Spending. The following chart will appear:

The above spending function is according to Lan and DeMets (1983) with the O'Brien-Fleming flavor and, for one-sided tests, has the following functional form:

α(t) = 2 − 2Φ(z_{α/2}/√t)

Observe that very little of the total type-1 error is spent early on, but more is spent rapidly as the information fraction increases, reaching 0.05 at an information fraction of 1. Feel free to try other plots by clicking the plot icon in the Library toolbar. Close all charts before continuing.

7.1.2 Simulation

Suppose we want to see the advantages of performing the interim analyses as they relate to the chance of stopping prior to the final analysis. This examination can be conducted using simulation. Select Des 2 in the Library, and click the simulate icon in the toolbar. Alternatively, right-click on Des 2 and select Simulate. A new Simulation window will appear. Suppose you wish to determine how quickly this trial could be
terminated if the treatment difference was much greater than expected. For example,
under the alternative hypothesis, µ = 4.5. Click on the Response Generation Info
tab, and specify: Mean Response(µ) = 4.5 and Std. Deviation (σ) = 2.5.

Click Simulate to start the simulation. Once the simulation run has completed, East
will add an additional row to the Output Preview labeled as Sim 1.
Select Sim 1 in the Output Preview and click the save icon. Now double-click Sim 1 in the Library. The simulation output details will be displayed in the upper pane.

Observe that 100% of the simulated trials rejected the null hypothesis, and about 26% of these simulations were able to reject the null at the first look, after enrolling only 11 subjects. Your numbers might differ slightly due to a different starting seed.


7.1.3 Interim Monitoring

Suppose that the trial has commenced and Des 2 was implemented. Right-click Des 2
in the Library, and select Interim Monitoring.
Although we specified that there will be five equally spaced interim looks, the
Lan-DeMets methodology implemented in East allows you to alter the number and
spacing of these looks. Accordingly, suppose that an interim look was taken after
enrolling 20 subjects and the sample mean, based on these 20 subjects, was 5.1 with a
standard error of 0.592. Since µ0 = 7, based on equation (7.1) the value of the test
statistic at the first look would be Z1 = (5.1 − 7)/0.592 or -3.209.
Click Enter Interim Data on the toolbar. In the Test Statistic Calculator, enter the following values, and click Recalc and then OK.

Since the stopping boundary is crossed, the following dialog box appears.


Click Stop to take you back to the interim monitoring dashboard. For final inference,
East will display the following summary information on the dashboard.

7.1.4 Trial Design Using a t-Test (Single Look)

The sample size obtained to correctly power Des 1 in Section (7.1.1) relied on using a Wald-type statistic for the hypothesis test, given by equation (7.1). Due to the assumption of a normal distribution for the test statistic, we ignored the fact that the standard deviation σ is estimated from the sample. For large sample sizes this approximation is acceptable. However, in small samples with unknown standard deviation the test statistic

Z = n^{1/2} (µ̂ − µ0)/σ̂,    (7.2)

follows Student's t distribution with (n − 1) degrees of freedom. Here, σ̂² denotes the sample variance based on n observations.
Consider the example in Section 7.1.1, where we would like to test the null hypothesis that the average hospital stay is 7 days, H0: µ = 7(= µ0), against the alternative hypothesis that it is less than 7 days, H1: µ < 7. We will now design the same trial in a different manner, using the t distribution for the test statistic.
Right-click Des 1 in the Library, and select Edit Design. In the input window, change
the Test Stat. from Z to t. The entries for the other fields need not be changed.
Click Compute. East will add an additional row to the Output Preview labeled as Des 3. The required sample size is 55. Select the rows corresponding to Des 1 and Des 3 and click the summary icon. This will display Des 1 and Des 3 in the Output Summary.

Des 3, which uses the t distribution, requires that we commit a combined total of 55 patients to the study, just one more compared to Des 1, which uses the normal distribution. The extra patient is needed to compensate for the extra variability introduced by estimating the variance.

7.2 Mean of Paired Differences

The paired t-test is used to compare the means of two normal distributions when each observation in the random sample from one distribution is matched with a unique observation from the other distribution. Let µc and µt denote the two means to be compared and let σ² denote the variance of the differences.
The null hypothesis H0: µc = µt is tested against the two-sided alternative hypothesis H1: µc ≠ µt or a one-sided alternative hypothesis H1: µc < µt or H1: µc > µt. Let δ = µt − µc. The null hypothesis can then be expressed as H0: δ = 0 and the alternative as H1: δ ≠ 0, H1: δ > 0, or H1: δ < 0. The power of the test is computed at specified values of µc, µt, and σ.
Let µ̂_cj and µ̂_tj denote the estimates of µc and µt based on n_j observations, up to and including the j-th look, j = 1, ..., K, where a maximum of K looks are to be made. The estimate of the difference at the j-th look is

δ̂_j = µ̂_tj − µ̂_cj
and the test statistic at the j-th look is

Z_j = n_j^{1/2} δ̂_j/σ̂_j ,    (7.3)

where σ̂_j² is the sample variance of the n_j paired differences.

7.2.1 Trial Design

Consider the situation where subjects are treated once with placebo after pain is experimentally induced, and later treated with a new analgesic after pain is induced a second time. Pain is reported by the subjects using a 10 cm visual analog scale (0 = "no pain", ..., 10 = "extreme pain"). After treatment with placebo, the average is expected to be 6 cm. After treatment with the analgesic, the average is expected to be 4 cm. It is assumed that the common standard deviation is σ = 5 cm. The null hypothesis H0: δ = 0 is tested against the alternative hypothesis H1: δ < 0.
Start East afresh. First, click Continuous: One Sample on the Design tab, and then click Paired Design: Mean of Paired Differences. This will launch a new input window.
Single-Look Design
We want to determine the sample size required to have power of 90% when µc = 6 and
µt = 4, using a test with a one-sided type-1 error rate of 0.05. Select Test Type as
1-Sided, Individual Means for Input Method, and specify the Mean Control
(µc ) as 6 and Mean Treatment (µt ) as 4. Enter Std. Dev. of Paired Difference (σ0 )
as 5. The upper pane should appear as below:

Click Compute. This will calculate the sample size for this design and the output is
shown as a row in the Output Preview. The computed sample size is 54 subjects.
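As with the single-mean design, this can be checked against the fixed-sample formula applied to the paired differences (a sketch): n = σ²(z_{0.95} + z_{0.90})²/δ² = 5² × (1.645 + 1.282)²/2² ≈ 53.5, which rounds up to 54 pairs.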

This design has the default name Des 1. Select this design by clicking anywhere along the row in the Output Preview and click the summary icon. Some of the design details will be displayed in the upper pane, labeled as Output Summary.

In the Output Preview toolbar select Des 1, and click the save icon to save this design to Wbk1 in the Library.

Three-Look Design
For the above study, suppose we wish to take up to two equally spaced interim looks and one final look as we accrue data, using the Lan-DeMets (O'Brien-Fleming) stopping boundary. Create a new design by right-clicking Des 1 in the Library, and selecting Edit Design. In the Input, change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis.
Click Compute. The maximum and expected sample sizes are highlighted in yellow in the Output Preview. Save this design in the current workbook by selecting the corresponding row in the Output Preview and clicking the save icon on the Output Preview toolbar. To compare Des 1 and Des 2, select both rows in the Output Preview using the Ctrl key and click the summary icon. Both designs will be displayed in the Output Summary pane.

Des 2 results in a maximum of 55 subjects in order to attain 90% power, with an expected sample size of 43 under the alternative hypothesis. In the Output Preview toolbar select Des 2, and click the save icon to save this design to Wbk1 in the Library. In order to see the stopping probabilities, double-click Des 2 in the Library.

The clear advantage of this sequential design resides in the high cumulative probability
of stopping by the third look if the alternative is true, with a sample size of 37 patients,
which is well below the requirements for a fixed sample study (54 patients). Close the
Output window before continuing.
Select Des 2 and click the plot icon on the Library toolbar. You can select one of many plots, including one for Stopping Boundaries:

Close this chart before continuing.

7.2.2 Simulation

Select Des 2 in the Library, and click the simulate icon in the toolbar. Click on the Response Generation Info tab, and make sure Mean Treatment (µt) = 4, Mean Control (µc) = 6 and Std. Deviation (σ) = 5. Click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 1.
Select Sim 1 in the Output Preview and click the save icon. Now double-click Sim 1 in
the Library. The simulation output details will be displayed.

Overall, close to 90% of simulations have rejected H0 . The numbers on your screen
might differ slightly due to a different seed.

7.2.3 Interim Monitoring

For an ongoing study we evaluate the test statistic at an interim stage to see whether we
have enough evidence to reject H0 . Right-click Des 2 in the Library, and select
Interim Monitoring.
Although the design specified that there be three equally spaced interim looks, the
Lan-DeMets methodology implemented in East allows you to alter the number and
spacing of these looks. Suppose that an interim look was taken after enrolling 18
subjects and the sample mean, based on these subjects, was -2.2 with a standard error
of 1.4. Then based on equation (7.3), the value of the test statistic at first look would be
Z1 = (−2.2)/1.4 or -1.571.
Click Enter Interim Data on the toolbar. In the Test Statistic Calculator, enter the

following values, and click Recalc and then OK.

The dashboard will be updated accordingly.

As the observed value -1.571 has not crossed the critical boundary value of -3.233, the
trial continues. Now, 18 additional subjects are enrolled, and a second interim analysis
with 36 subjects is conducted. Suppose that the observed difference is -2.3 with
standard error of 0.8. Select the Look 2 row and click Enter Interim Data. Enter these values, and click Recalc, and then OK.
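As an informal check, the test statistic at this look is Z2 = −2.3/0.8 = −2.875, which crosses the second-look efficacy boundary.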

Since the stopping boundary is crossed, the following dialog box appears. Click on
Stop.

For final inference, East will display the following summary information on the
dashboard.


7.2.4 Trial Design Using a t-Test (Single Look)

The sample size obtained to correctly power the trial in Section (7.2.1) relied on using a Wald-type statistic for the hypothesis test, given by equation (7.3). However, we neglected the fact that the standard deviation σ is estimated, by assuming that the test statistic follows a standard normal distribution. For large sample sizes, asymptotic theory supports this approximation. In a single-look design, this test statistic is calculated as

Z = n^{1/2} δ̂/σ̂,    (7.4)

where σ̂² is the sample variance based on the n observed paired differences. In the following calculations we take into consideration that Z follows a Student's t-distribution with (n − 1) degrees of freedom.
Consider the example in Section 7.2.1 where we would like to test the null hypothesis
that the analgesic does not reduce pain, H0 : δ = 0, against the alternative hypothesis
that the new analgesic works to reduce pain, H1 : δ < 0. We will design this same trial
using the t distribution for the test statistic.
Right-click Des 1 in the Library, and select Edit Design. Change the Test Stat.
from Z to t. The entries for the other fields need not be changed, and click Compute.
East will add an additional row to the Output Preview labeled as Des 3. Select the
rows corresponding to Des 1 and Des 3. This will display Des 1 and Des 3 in the

Output Summary.

Using the t distribution, we need one extra subject to compensate for the extra
variability due to estimation of the var[δ̂].

7.3 Ratio of Paired Means

The test for the ratio of paired means is used to compare the means of two lognormal distributions when each observation in the random sample from one distribution is matched with a unique observation from the other distribution. Let µc and µt denote the two means to be compared, and let σc² and σt² be the respective variances.
The null hypothesis H0: µc/µt = 1 is tested against the two-sided alternative hypothesis H1: µc/µt ≠ 1 or a one-sided alternative hypothesis H1: µc/µt < 1 or H1: µc/µt > 1. Let ρ = µt/µc. Then the null hypothesis can be expressed as H0: ρ = 1 and the alternative as H1: ρ ≠ 1, H1: ρ > 1, or H1: ρ < 1. The power of the test is computed at specified values of µc, µt, and σ. We assume that σt/µt = σc/µc, i.e., the coefficient of variation (CV) is the same under both control and treatment.

7.3.1 Trial Design

Start East afresh. Click Continuous: One Sample on the Design tab, and then click

Paired Design: Mean of Paired Ratios as shown below.

This will launch a new window. The upper pane of this window displays several fields
with default values. Select Test Type as 1-Sided, and Individual Means for
Input Method. Specify the Mean Control (µc ) as 4 and Mean Treatment (µt ) as 3.5.
Enter Std. Dev. of Log ratio as 0.5. The upper pane should appear as below:

Click Compute. This will calculate the sample size for this design and the output is
shown as a row in the Output Preview. The computed sample size is 121 subjects (or
pairs of observations).
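As a rough check, the log ratio at the alternative is ln(3.5/4) ≈ −0.1335, and the fixed-sample formula gives (a sketch): n = σ²(z_{0.95} + z_{0.90})²/(ln ρ)² = 0.5² × (1.645 + 1.282)²/0.1335² ≈ 120.2, which rounds up to 121.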
This design has the default name Des 1. In the Output Preview toolbar select Des 1, and click the save icon to save this design to Wbk1 in the Library.

7.3.2 Trial Design Using a t-test

Right-click Des 1 in the Library and select Edit Design. In the input window, change
the Test Stat. from Z to t.
Click Compute. East will add an additional row to the Output Preview labeled as
Des 2. Select the rows corresponding to Des 1 and Des 2 using the Ctrl key and click the summary icon. This will display Des 1 and Des 2 in the Output Summary.

Des 2 uses the t distribution and requires that we commit a combined total of 122
patients to the study, one more compared to Des 1, which uses a normal distribution.


8 Normal Noninferiority Paired-Sample

Two common applications of the paired sample design include: (1) comparison of two
treatments where patients are matched on demographic and baseline characteristics,
and (2) two observations made from the same patient under different experimental
conditions. The endpoint for a paired noninferiority design may be a difference
of means or a ratio of means. The former is presented in Section 8.1 and the latter is
discussed in Section 8.2. For paired sample noninferiority trials, East can be used only
when no interim look is planned.

8.1 Mean of Paired Differences

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of an outcome variable, X, with means µt and µc, respectively, and with a standard deviation of the paired difference σD. Here, the null hypothesis H0 : µt − µc ≤ δ0 is tested against the one-sided alternative hypothesis H1 : µt − µc > δ0. Here δ0 denotes the noninferiority margin and δ0 < 0. Let δ = µt − µc. Then the null hypothesis can be expressed as H0 : δ ≤ δ0 and the alternative can be expressed as H1 : δ > δ0.
Here we assume that each paired observation on X from T and C is distributed according to a bivariate normal distribution with means (µt, µc), variances (σt², σc²) and correlation coefficient ρ. Suppose we have N such paired observations from T and C, and let µ̂c and µ̂t denote the estimates of µc and µt based on these N pairs.
Therefore, the estimate of the difference is δ̂ = µ̂t − µ̂c . Denoting the standard error of
δ̂ by se(δ̂), the test statistic can be defined as
Z = (δ̂ − δ0)/se(δ̂)   (8.1)

The test statistic Z is distributed as a t distribution with (N − 1) degrees of freedom.
For large samples, the t-distribution can be approximated by the standard normal
distribution. The power of the test is computed at specified values of µc , µt , and σD .
East allows you to analyze using both normal and t distribution.
The advantage of the paired sample noninferiority design compared to the two independent sample noninferiority design lies in the smaller se(δ̂) in the former case. The paired sample design is more powerful than the two independent sample design: to achieve the same level of power, the paired sample design requires fewer subjects.
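Under the normal approximation, the sample size is driven by the distance between the design alternative δ = µt − µc and the margin δ0. A minimal sketch (our code, not East's):

    import math
    from scipy import stats

    def n_noninf_paired(mu_c, mu_t, sigma_d, margin, alpha=0.025, power=0.9):
        shift = (mu_t - mu_c) - margin  # distance from the boundary of H0
        z = stats.norm.ppf(1 - alpha) + stats.norm.ppf(power)
        return math.ceil((z * sigma_d / shift) ** 2)

    print(n_noninf_paired(3.67, 3.12, 0.683, -1.0))  # -> 25, as in Section 8.1.1

The same helper reproduces the margin sweep carried out below: margins of −0.6, −0.7, −0.8, −0.9 and −1 give 1961, 218, 79, 41 and 25 pairs, respectively.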

8.1.1 Trial Design

Iezzi et al. (2011) investigated the possibility of reducing radiation dose exposure while maintaining the image quality in a prospective, single center, intra-individual
study. In this study, patients underwent two consecutive multidetector computed
tomography angiography (MDCTA) scans 6 months apart, one with a standard
acquisition protocol (C) and another using a low dose protocol (T). Image quality was
rated as an ordinal number using a rating scale ranging from 1 to 5. Let µc and µt
denote the average rating of image quality for standard acquisition and low dose
protocol, respectively, and δ = µt − µc be the difference between two means. Based
on the 30 samples included in the study, µc and µt were estimated as 3.67 and 3.12,
respectively. The noninferiority margin for image quality considered was −1.
Accordingly, we will design the study to test
H0 : δ ≤ −1   against   H1 : δ > −1

The standard deviation of paired difference was estimated as 0.683. We want to design a study with 90% power at µc = 3.67 and µt = 3.12 that maintains an overall one-sided type I error of 0.025.
First, click Continuous: One Sample on the Design tab and then click Paired
Design: Mean of Paired Differences as shown below.

This will launch a new window. Select Noninferiority for Design Type, and
Individual Means for Input Method. Specify the Mean Control (µc ) as 3.67,
Mean Treatment (µt ) as 3.12, and the Std. Dev. of Paired Difference (σD ) as 0.683.
Finally, enter −1 for the Noninferiority Margin (δ0). Leave all other entries with their default values. The upper pane should appear as below:

Click Compute. This will calculate the sample size for this design and the output is
shown as a row in the Output Preview located in the lower pane of this window. The
computed sample size (25 subjects) is highlighted.

This design has default name Des 1. You can select this design by clicking anywhere along the row in the Output Preview. Select this design and click the summary icon in the Output Preview toolbar. Some of the design details will be displayed in the upper pane, labeled as Output Summary.

A total of 25 subjects must be enrolled in order to achieve the desired 90% power
under the alternative hypothesis. In the Output Preview select Des 1 and click
in the toolbar to save this design to Wbk1 in the Library.
The noninferiority margin of −1 considered above is the minimal margin. Since the observed difference is only a little less than −0.5, we would like to calculate sample size for a range of noninferiority margins, say, −0.6, −0.7, −0.8, −0.9 and −1. This can be done easily in East. First select Des 1 in the Library, and click the edit icon on the Library toolbar. In the Input, change the Noninferiority Margin (δ0) to −0.6 : −1 : −0.1.

Click Compute to generate sample sizes for different noninferiority margins. This will
add 5 new rows to the Output Preview. There will be a single row for each of the noninferiority margins.

The computed sample sizes are 1961, 218, 79, 41 and 25 with noninferiority margins
−0.60, −0.7, −0.8, −0.9 and −1, respectively. To compare all 5 designs, select the last 5 rows in the Output Preview, and click the summary icon. The 5 designs will be displayed in the Output Summary pane.

Suppose we have decided to go with Des 3 to test the noninferiority hypothesis with a noninferiority margin of −0.7. This requires a total sample size of 218 to achieve 90% power. Select Des 3 in the Output Preview and click the save icon in the toolbar to save this design to Wbk1 in the Library. Before we proceed, we would like to delete all designs from the Output Preview. Select all rows and then either click the delete icon in the toolbar, or right-click and select Delete. To delete designs from a workbook in the Library, select the corresponding designs individually (one at a time), right-click, and select Delete. You can try deleting Des 1 from the Library.
Plotting
With Des 3 selected in the Library, click the plot icon on the Library toolbar, and then click Power vs Sample Size. The resulting power curve for this design will appear.

You can move the vertical bar along the X axis. To find out the power at any sample size, move the vertical bar to that sample size; the numerical values of sample size and power will be displayed on the right of the plot. You can export this chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... Close this chart before continuing. In a similar fashion, one can see the power vs. delta plot by clicking the plot icon and then Power vs Treatment Effect. You can obtain the tables associated with these plots by clicking the table icon, and then clicking the appropriate table. Close the plots before continuing.

8.1.2 Trial Design Using a t-Test (Single Look)

The sample size obtained to correctly power Des 3 relied on a Wald-type statistic for the hypothesis test. Due to the assumption of a normal distribution for the test statistic, we have ignored the fact that σ is estimated from the sample. For large sample sizes, this approximation is acceptable. However, in small samples with unknown standard deviation, the test statistic
Z = (δ̂ − δ0)/se(δ̂)
is distributed as Student's t distribution with (n − 1) degrees of freedom, where n is the
number of paired observations.
Select Des 3 from the Library, and click the edit icon. This will take you to the input window. Now change the Test Statistic from Z to t. The entries for the other fields need not be changed.
Click Compute. East will add an additional row to the Output Preview. The required
sample size is 220. This design uses the t distribution and it requires us to commit a
combined total of 220 patients to the study, two more compared to Des 3 which uses
the normal distribution. The extra couple of patients are needed to compensate for the extra variability due to the estimation of var[δ̂].

8.1.3 Simulation

Select Des 3 in the Library, and click the simulate icon in the toolbar. Alternatively, right-click on Des 3 and select Simulate. A new Simulation window will appear. Click on the
Response Generation Info tab, and specify: Mean control = 3.67, Mean Treatment
= 3.12, and Std. Deviation of Paired Difference (σD )= 0.683.

Leave all default values, and click Simulate. Once the simulation run has completed,
East will add an additional row to the Output Preview labeled as Sim 1.
Select Sim 1 in the Output Preview and click the save icon. Double-click Sim 1 in the Library, and the simulation output details will be displayed in the right pane under the
Simulation tab.

Notice that the percentage of rejections out of 10000 simulated trials is consistent with
the design power of 90%. The exact result of the simulations may differ slightly,
depending on the seed.
Now we wish to simulate from a point that belongs to H0 to check whether the chosen design maintains the type I error of 2.5%. Right-click Sim 1 in the Library and select Edit
Simulation. Go to the Response Generation Info tab in the upper pane and specify:
Mean control = 3.67, Mean Treatment = 2.97, and Std. Deviation of Paired
Difference (σD ) = 0.683.

Click Simulate. Once the simulation run has completed, East will add an additional
row to the Output Preview labeled as Sim 2. Select Sim 2 in the Output Preview and
click the save icon. Now double-click on Sim 2 in the Library. The simulation output
details will be displayed.

The upper efficacy stopping boundary was crossed at a rate close to the specified type I error of 2.5%. The exact result of the simulations may differ slightly, depending on the seed.

8.2 Ratio of Paired Means

Consider a randomized clinical trial comparing an experimental treatment, T, to a
control treatment, C, on the basis of outcome variable, X, with means µt and µc ,
respectively, and let σt2 and σc2 denote the respective variances. The null hypothesis
H0 : µt /µc ≤ ρ0 is tested against the one-sided alternative hypothesis
H1 : µt /µc > ρ0 . Here, ρ0 denotes the noninferiority margin and ρ0 < 1. Let
ρ = µt /µc . Then the null hypothesis can be expressed as H0 : ρ ≤ ρ0 and the
alternative can be expressed as H1 : ρ > ρ0 .
Suppose we have N such paired observations from T and C, and let (Xit, Xic) denote the ith pair of observations (i = 1, · · · , N). Then log Xit − log Xic = log(Xit/Xic) denotes the logarithm of the ratio for the ith subject. We assume that the paired log-transformed observations on X from T and C, (log Xit, log Xic), are bivariate normally distributed with common parameters. In other words, (Xit, Xic) is distributed as a bivariate log-normal distribution.
Denote log Xit by yit , log Xic by yic , and the corresponding difference by
δyi = yit − yic . Assume that δ̂y denotes the sample mean for these paired differences
with estimated standard error se(δ̂y ). The test statistic can be defined as
Z = (δ̂y − log ρ0)/se(δ̂y),   (8.2)

The test statistic Z is distributed as a t distribution with (N − 1) degrees of freedom.
For large samples, the t-distribution can be approximated by the standard normal
distribution. East allows you to analyze using both normal and t distribution. The
power of the test is computed at specified values of µc , µt , and σ.
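Since the test works on the log scale, the sketch from Section 8.1 carries over with log ρ0 as the margin (again a normal-approximation illustration of our own, not East's internal code):

    import math
    from scipy import stats

    def n_noninf_ratio(mu_c, mu_t, sd_log_ratio, rho0, alpha=0.025, power=0.9):
        shift = math.log(mu_t / mu_c) - math.log(rho0)  # distance from H0, log scale
        z = stats.norm.ppf(1 - alpha) + stats.norm.ppf(power)
        return math.ceil((z * sd_log_ratio / shift) ** 2)

    print(n_noninf_ratio(3.67, 3.12, 0.20, 0.81))  # -> 180, as in Section 8.2.1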

8.2.1 Trial Design

We will use the same example cited in the previous section, but will transform the
difference hypothesis into the ratio hypothesis. Let µc and µt denote the average rating
of image quality for standard acquisition and low dose protocol, estimated as 3.67 and
3.12, respectively. Let ρ = µt/µc be the ratio between the two means. Considering a noninferiority margin of −0.7 for the test of difference, we can rewrite the hypotheses mentioned in the previous section as
H0 : ρ ≤ 0.81   against   H1 : ρ > 0.81

We are considering a noninferiority margin of 0.81 (= ρ0). For illustration we will assume the standard deviation of log ratio to be 0.20. As before, we want to design a study with 90% power at µc = 3.67 and µt = 3.12 that maintains an overall one-sided type I error of 0.025.
Start East afresh. Click Continuous: One Sample on the Design tab and then click
Paired Design: Mean of Paired Ratios.
This will launch a new window. The upper pane of this window displays several fields
with default values. Select Noninferiority for Design Type, and Individual
Means for Input Method. Specify the Mean Control (µc ) as 3.67, Mean Treatment
(µt ) as 3.12, and Noninferiority margin (ρ0 ) as 0.81. Enter 0.20 for Std. Dev. of Log
Ratio, and 0.025 for Type I Error (α). The upper pane now should appear as below:

Click Compute. This will calculate the sample size for this design and the output is
shown as a row in the Output Preview located in the lower pane of this window. The
computed sample size (180 subjects) is highlighted in yellow.

This design has default name Des 1. You can select this design by clicking anywhere along the row in the Output Preview. Select this design and click the summary icon in the Output Preview toolbar. Some of the design details will be displayed in the upper pane, labeled as Output Summary.

A total of 180 subjects must be enrolled in order to achieve the desired 90% power
under the alternative hypothesis. In the Output Preview select Des 1 and click
in the toolbar to save this design to Wbk1 in the Library.
Suppose you think enrolling 180 subjects is too much for your organization and you can go up to only 130 subjects. You want to evaluate the power of your study at a sample size of 130, with the other design parameters unaltered. In order to compute power with 130 subjects, first select Des 1 in the Library, and click the edit icon on the Library toolbar. In the Input dialog box, first select the radio button for Power, and then enter 130 for Sample Size.

Now click Compute. This will add another row labeled as Des 2 in the Output Preview with the computed power highlighted in yellow. The design attains a power of 78.7%. Now select both rows in the Output Preview by pressing the Ctrl key, and click the summary icon
in the Output Preview toolbar to see a summary of both designs in the Output
Summary.

In the Output Preview select Des 2 and click the save icon in the toolbar to save this design to Wbk1 in the Library.

Plotting
With Des 2 selected in the Library, click the plot icon on the Library toolbar, and then click Power vs Sample Size. The resulting power curve for this design will appear.

You can move the vertical bar along the X axis.

Suppose you would like to explore the relationship between power and standard deviation. In order to visualize this relationship, select Des 2 in the Library, click the plot icon on the Library toolbar, and then click General (User Defined Plot). Select Std Dev of Log Ratio for X-Axis. This will display the power vs. standard deviation plot.

Close the plot window before you continue.

8.2.2 Simulation

Select Des 2 in the Library, and click the simulate icon in the toolbar. Alternatively, right-click on Des 2 and select Simulate. A new Simulation window will appear. Click on the
Response Generation Info tab, and specify: Mean control = 3.67, Mean Treatment
= 3.12, and Std Dev of Log Ratio= 0.2.

Click Simulate. Once the simulation run has completed, East will add an additional
row to the Output Preview labeled as Sim 1.


Select Sim 1 in the Output Preview and click the save icon. Now double-click on Sim 1 in the Library. The simulation output details will be displayed.


9 Normal Equivalence Paired-Sample

Two common applications of the paired sample designs include: (1) comparison of two
treatments where patients are matched on demographic and baseline characteristics,
and (2) two observations made from the same patient under different experimental
conditions. The type of endpoint for paired equivalence design may be a difference of
means or ratio of means. The former is presented in Section 9.1 and the latter is
discussed in Section 9.2.

9.1 Mean of Paired Differences

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of an outcome variable, X, with means µt and µc, respectively, and with a standard deviation of the paired difference σD. Here, the null hypothesis H0 : µt − µc < δL or µt − µc > δU is tested against the two-sided alternative hypothesis H1 : δL ≤ µt − µc ≤ δU. Here, δL and δU denote the equivalence limits. The two one-sided tests (TOST) procedure of Schuirmann (1987) is commonly used for this analysis.
Let δ = µt − µc denote the true difference in the means. The null hypothesis H0 : δ ≤ δL or δ ≥ δU is tested against the two-sided alternative hypothesis H1 : δL < δ < δU at level α, using the TOST procedure. Here, we perform the following
two tests together:
Test1: H0L : δ ≤ δL against H1L : δ > δL at level α
Test2: H0U : δ ≥ δU against H1U : δ < δU at level α
H0 is rejected in favor of H1 at level α if and only if both H0L and H0U are rejected.
Note that this is the same as rejecting H0 in favor of H1 at level α if the (1 − 2α) 100%
confidence interval for δ is completely contained within the interval (δL , δU ).
Here we assume that each paired observation on X from T and C is bivariate normally distributed with parameters µt, µc, σt², σc² and ρ. Suppose we have N such paired observations from T and C, and let µ̂c and µ̂t denote the estimates of µc and µt based
on these N pairs. The estimate of the difference is δ̂ = µ̂t − µ̂c . Denoting the standard
error of δ̂ by se(δ̂), test statistics for Test1 and Test2 are defined as:
TL = (δ̂ − δL)/se(δ̂)   and   TU = (δ̂ − δU)/se(δ̂)

TL and TU are assumed to follow Student’s t-distribution with (N − 1) degrees of
freedom under H0L and H0U , respectively. H0L is rejected if TL ≥ t1−α,(N −1) , and
H0U is rejected if TU ≤ tα,(N −1) .
The null hypothesis H0 is rejected in favor of H1 if TL ≥ t1−α,(N −1) and
TU ≤ tα,(N −1) , or in terms of confidence intervals: Reject H0 in favor of H1 at level α
if
δL + t1−α,(N−1) se(δ̂) < δ̂ < δU + tα,(N−1) se(δ̂)   (9.1)
We see that decision rule (9.1) is the same as rejecting H0 in favor of H1 if the
(1 − 2 α) 100% confidence interval for δ is entirely contained within interval (δL , δU ).
The power or sample size of such a trial design is determined for a specified value of δ,
say δ1 , for a single-look study only. The choice δ1 = 0, i.e. µt = µc , is common. For a
specified value of δ1 , the power is given by
Pr(Reject H0) = τν(−tα,ν |Ω2) − τν(tα,ν |Ω1)   (9.2)

where ν = N − 1 and Ω1 and Ω2 are non-centrality parameters given by
Ω1 = (δ1 − δL )/se(δ̂) and Ω2 = (δ1 − δU )/se(δ̂), respectively. tα,ν denotes the upper
α × 100% percentile from a Student’s t distribution with ν degrees of freedom.
τν (x|Ω) denotes the distribution function of a non-central t distribution with ν degrees
of freedom and non-centrality parameter Ω, evaluated at x.
Since the sample size N is not known ahead of time, we cannot characterize the bivariate t-distribution in advance. Thus, solving for sample size must be performed iteratively, by equating formula (9.2) to the power 1 − β.
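The iteration is easy to sketch with scipy's noncentral t. The helper below evaluates formula (9.2) and searches upward for the smallest adequate N; treating α as the level of each one-sided test is our assumption about the parameterization:

    import math
    from scipy import stats

    def tost_power(n, delta1, delta_l, delta_u, sigma_d, alpha=0.05):
        nu = n - 1
        se = sigma_d / math.sqrt(n)
        t_crit = stats.t.ppf(1 - alpha, nu)   # upper alpha percentile
        omega1 = (delta1 - delta_l) / se      # noncentrality for Test1
        omega2 = (delta1 - delta_u) / se      # noncentrality for Test2
        return stats.nct.cdf(-t_crit, nu, omega2) - stats.nct.cdf(t_crit, nu, omega1)

    def tost_n(delta1, delta_l, delta_u, sigma_d, alpha=0.05, power=0.9):
        # Simple upward search for the smallest n reaching the target power
        n = 2
        while tost_power(n, delta1, delta_l, delta_u, sigma_d, alpha) < power:
            n += 1
        return n

    print(tost_n(-4.3, -10, 10, 8.18))  # -> 20, matching Des 1 of Section 9.1.1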
The advantage of the paired sample equivalence design compared to the two sample equivalence design lies in the smaller se(δ̂) in the former case. The paired sample equivalence design is more powerful than the two sample equivalence design: to achieve the same level of power, the paired sample equivalence design requires fewer subjects.

9.1.1 Trial Design

To ensure that comparable results can be achieved between two laboratories or
methods, it is important to conduct cross-validation or comparability studies to
establish statistical equivalence between the two laboratories or methods. Often, to
establish equivalence between two laboratories, a paired sample design is employed.
Feng et al. (2006) reported the data on 12 quality control (QC) samples. Each sample
was analyzed first by Lab1 and then by Lab2. In this example we will consider Lab1 as the standard laboratory (C) and Lab2 as the one to be validated (T). Denote the mean
concentrations from Lab1 and Lab2 by µc and µt , respectively. Considering an
equivalence limit of (−10, 10) we can state our hypotheses as:
H0 : µt − µc < −10 or µt − µc > 10   against   H1 : −10 ≤ µt − µc ≤ 10
Based on the reported data, µc and µt are estimated as 94.2 pg/mL and 89.9 pg/mL, respectively. The standard deviation of paired difference was estimated as 8.18.
We want to design a study with 90% power at µc = 94.2 and µt = 89.9. We want to
reject H0 with type I error not exceeding 0.025.
First, click Continuous: One Sample on the Design tab, and then click Paired
Design: Mean of Paired Differences as shown below.

This will launch a new window.
Since we are interested in testing an equivalence hypothesis, select Equivalence for Trial Type, with a Type I Error of 0.025, and Power of 0.9. Select Individual
Means for Input Method. Enter −10 for Lower Equivalence Limit (δL ) and 10 for
Upper Equivalence Limit (δU ). Specify the Mean Control (µc ) as 94.2, Mean
Treatment (µt ) as 89.9, and Std. Dev. of Paired Difference (σD ) as 8.18. The upper
pane should appear as below:

Click Compute. This will calculate the sample size for this design and the output is
shown as a row in the Output Preview located in the lower pane of this window. The
computed sample size (20 samples) is highlighted in yellow.

This design has default name Des 1, and you can select this design by clicking anywhere along the row in the Output Preview and then clicking the summary icon in the Output Preview toolbar. Some of the design details will be displayed in the upper pane, labeled as Output Summary.

A total of 20 samples is required to achieve the desired 90% power under the alternative hypothesis. In the Output Preview select Des 1 and click the save icon in the toolbar to save this design to Wbk1 in the Library.

The equivalence limits of (−10, 10) might be too narrow and therefore a wider
equivalence interval (−12.5, 12.5) could be considered. Select Des 1 in the Library,
and click
on the Library toolbar. In the Design Parameters tab, change the
entry for Lower Equivalence Limit (δL ) and Upper Equivalence Limit (δU ) to
−12.5 and 12.5, respectively, and click Compute.
This will add a new row in the Output Preview labeled as Des 2. In the Output
Preview select Des 2 and click
in the toolbar to save this design to Wbk1 in
the Library. To compare the two designs, select both rows in Output Preview using
the Ctrl key and click
in the Output Preview toolbar. This will display the
two designs side by side in the Output Summary pane.

As we widen the equivalence limit from (−10, 10) to (−12.5, 12.5), the required
sample size is reduced from 20 to 11.
Plotting
We would like to explore how power is related to the required sample size. Select
Des 2 in the Library, click the plot icon on the Library toolbar, and then click Power vs Sample Size. The resulting power curve for this design will appear.

You can move the vertical bar along the X axis. To find out power at any sample size
simply move the vertical bar to that sample size and the numerical value of sample size
and power will be displayed on the right of the plot. You can export this chart in one of
several image formats (e.g., Bitmap or JPEG) by clicking Save As.... Close this chart
before continuing.
In a similar fashion one can see the power vs. delta plot by clicking the plot icon and then
Power vs Treatment Effect.

To produce tables associated with these plots, first click the table icon in the toolbar and then select the appropriate table.

9.1.2 Simulation

Now we wish to simulate from Des 2 to verify whether the study truly maintains the power and type I error. Select Des 2 in the Library, and click the simulate icon in the toolbar. Alternatively, right-click on Des 2 and select Simulate. Click on the Response
Generation Info tab, and specify: Mean control = 94.2, Mean Treatment = 89.9,
and Std. Dev. of Paired Difference (σD ) = 8.18.

Click Simulate. Once the simulation run has completed, East will add an additional
row to the Output Preview labeled as Sim 1.

Select Sim 1 in the Output Preview and click the save icon. Now double-click on Sim 1 in the Library. The simulation output details will be displayed.

Notice that the simulated power is close to the attained power of 92.6% for Des 2. The
exact result of the simulations may differ slightly, depending on the seed.
Now we wish to simulate from a point that belongs to H0 to check whether the chosen design maintains the type I error of 5%. For this we consider µc = 94.2 and µt = 81.7. Since in this case δ = 81.7 − 94.2 = −12.5, the point (µt, µc) = (81.7, 94.2) belongs to H0. Right-click on Sim 1 in the Library and select Edit Simulation.
Go to the Response Generation Info tab in the upper pane and specify: Mean control
= 94.2, Mean Treatment = 81.7, and Std. Dev. of Paired Difference (σD ) = 8.18.

Click Simulate to start the simulation. Once the simulation run has completed, East
will add an additional row to the Output Preview labeled as Sim 2. Select Sim 2 in the Output Preview and click the save icon. Now double-click on Sim 2 in the Library.

The simulation output details will be displayed in the right pane under the Simulation tab.

Notice that the simulated power here is close to the pre-set type I error of 5%. The
exact result of the simulations may differ slightly, depending on the seed.

9.2 Ratio of Paired Means

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of an outcome variable, X, with means µt and µc, respectively, and let σt² and σc² denote the respective variances. Here, the null hypothesis H0 : µt/µc ≤ ρL or µt/µc ≥ ρU is tested against the alternative hypothesis H1 : ρL < µt/µc < ρU. Let ρ = µt/µc denote the ratio of the two means. Then the null hypothesis can be expressed as H0 : ρ ≤ ρL or ρ ≥ ρU and the alternative can be expressed as H1 : ρL < ρ < ρU. In practice, ρL and ρU are often chosen such that ρL = 1/ρU. The two one-sided tests (TOST) procedure of Schuirmann (1987) is commonly used for this analysis, and is employed in this section.
Suppose we have N such paired observations from T and C, and let (Xit, Xic) denote the ith pair of observations (i = 1, · · · , N). Then log Xit − log Xic = log(Xit/Xic) denotes the logarithm of the ratio for the ith subject. Here we assume that the paired log-transformed observations on X from T and C, (log Xit, log Xic), are bivariate normally distributed with common parameters. In other words, (Xit, Xic) is distributed as a bivariate log-normal distribution.
Since we have translated the ratio hypothesis into a difference hypothesis using the log
transformation, we can perform the test for difference as discussed in section 9.1. Note
that we need the standard deviation of log of ratios. Sometimes, we are provided with
information on coefficient of variation (CV) of ratios instead, and the standard

deviation of log ratios can be obtained using sd = √(ln(1 + CV²)).

This is a test for the comparison of geometric means, since we are taking the mean of the logarithms of the ratios.
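For example (a trivial helper of our own):

    import math

    def sd_log_ratio(cv):
        # Convert a coefficient of variation of ratios to a Std. Dev. of Log Ratio
        return math.sqrt(math.log(1 + cv ** 2))

    print(sd_log_ratio(0.2))  # an illustrative CV of 0.2 gives sd of about 0.198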

9.2.1 Trial Design

Here we will use the same example reported by Feng et al. (2006). Denote the mean concentrations from Lab1 and Lab2 by µc and µt, and let ρ = µt/µc be the ratio between the two means. Considering an equivalence limit of (0.85, 1.15), we can state our hypotheses as
H0 : µt/µc < 0.85 or µt/µc > 1.15   against   H1 : 0.85 ≤ µt/µc ≤ 1.15
Based on the reported data, µc and µt are estimated as 94.2 pg/mL and 89.9 pg/mL, respectively. Assume that the standard deviation of log ratio is estimated as 0.086. As before, we want to design a study with 90% power at µc = 94.2 and µt = 89.9. We want to reject H0 with type I error not exceeding 0.025.
Start East afresh. First, click Continuous: One Sample on the Design tab and then
click Paired Design: Mean of Paired Ratios as shown below.

This will launch a new window.
Select Equivalence for Trial Type, and enter 0.025 for Type I Error, and 0.9 for
Power. Then select Individual Means for Input Method, and enter the Mean
Control (µc ) as 94.2, Mean Treatment (µt ) as 89.9, and Std. Dev. of Log Ratio as
0.086. Enter 0.85 for Lower Equiv. Limit (ρL ) and 1.15 for Upper Equiv. Limit
(ρU ). The upper pane should appear as below:

Click Compute. This will calculate the sample size for this design and the output is
shown as a row in the Output Preview located in the lower pane of this window. The
computed sample size (8 samples) is highlighted in yellow.

This design has default name Des 1. Select this design and click the summary icon in the Output Preview toolbar. Some of the design details will be displayed in the upper pane,
labeled as Output Summary.


In the Output Preview select Des 1 and click the save icon in the toolbar to save this design to Wbk1 in the Library.

Plotting
Suppose you want to see how the standard deviation influences the sample size. In order to visualize this relationship, select Des 1 in the Library, click the plot icon on the Library toolbar, and then click General (User Defined Plot). Select Std Dev of Log Ratio for X-Axis on the right of the plot. This will display the sample size vs. standard deviation plot.

Close this plot before continuing.

9.2.2 Simulation

Now we want to check by simulation whether the sample size of 8 provides at least 90% power. Select Des 1 in the Library, and click the simulate icon in the toolbar. Click on the
Response Generation Info tab, and specify: Mean control = 94.2, Mean Treatment
= 89.9, and Std Dev. of Log Ratio= 0.086.

Click Simulate to start the simulation. Once the simulation run has completed, East
will add an additional row to the Output Preview labeled as Sim 1. Notice that the
simulated power is very close to the design power.


10 Normal Superiority Two-Sample

To demonstrate the superiority of a new treatment over the control, it is often necessary
to randomize subjects to the control and treatment arms, and contrast the
group-dependent means of the outcome variables. In this chapter, we show how East
supports the design and interim monitoring of such experiments.

10.1 Difference of Means

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of a normally distributed outcome variable, X, with means µt and µc, respectively, and with a common variance σ². We intend to monitor the data up to K times after accruing n1, n2, . . . , nK ≡ nmax patients. The information fraction at the jth look is given by tj = nj/nmax. Let r denote the fraction randomized to treatment T.
Define the treatment difference to be
δ = µt − µc .
The null hypothesis of interest is
H0 : δ = 0 .
We wish to construct a K-look group sequential level α test of H0 having 1 − β power
at the alternative hypothesis
H1 : δ = δ1 .
Let X̄t(tj) and X̄c(tj) be the mean responses of the experimental and control groups, respectively, at time tj. Then

δ̂(tj) = X̄t(tj) − X̄c(tj)   (10.1)

and

var[δ̂(tj)] = σ² / (nj r(1 − r)).   (10.2)

Therefore, by the Scharfstein, Tsiatis and Robins (1997) and Jennison and Turnbull (1997) theorem, the stochastic process

W(tj) = √tj · (X̄t(tj) − X̄c(tj)) / √(σ² / (nj r(1 − r))), j = 1, 2, . . . , K,   (10.3)

is N(ηtj, tj) with independent increments, where η = 0 under H0 and η = δ1√Imax under H1. We refer to η as the drift parameter.
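For orientation, here is a sketch of the single-look (K = 1) version of this calculation, with allocation fraction r to treatment (our illustration; the group sequential case additionally involves the stopping boundaries discussed below):

    import math
    from scipy import stats

    def n_two_sample(delta, sigma, r=0.5, alpha=0.05, power=0.9):
        # Uses var(delta_hat) = sigma^2 / (n r (1 - r)) for total sample size n
        z = stats.norm.ppf(1 - alpha) + stats.norm.ppf(power)
        return math.ceil((z * sigma / delta) ** 2 / (r * (1 - r)))

    print(n_two_sample(3, 8, r=0.75))  # -> 325, the single-look Orlistat design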
10.1.1 Trial Design (Weight Control Trial of Orlistat)

Eighteen U.S. research centers participated in this trial, where obese adults were
randomized to receive either Orlistat or placebo, combined with a dietary intervention
for a period of two years (Davidson et al., 1999). Orlistat is an inhibitor of fat absorption, and the trial was intended to study its effectiveness in promoting weight loss and reducing cardiovascular risk factors. The study began in October 1992. More
than one outcome measure was of interest, but we shall consider only body weight
changes between baseline and the end of the first year intervention. We shall consider a
group sequential design even though the original study was not intended as such. The
published report does not give details concerning the treatment effect of interest or the
desired significance level and power of the test. It does say, however, that 75% of
subjects had been randomized to the Orlistat arm, probably to maximize the number of
subjects receiving the active treatment.
Single-Look Design
Suppose that the expected mean body weight change after one
year of treatment was 9 kg in the Orlistat arm and 6 kg in the control arm. Assume also
that the common standard deviation of the observations (weight change) was 8 kg. The
standardized difference of interest would therefore be (9 − 6)/8 = 0.375. We shall
consider a one sided test with 5% significance level and 90% power, and an allocation
ratio (treatment:control) of 3:1; that is, 75% of the patients are randomized to the
Treatment (Orlistat) arm.
First, click Continuous: Two Samples on the Design tab, and then click Parallel
Design: Difference of Means.
In the upper pane of this window is the Input dialog box, which displays default input
values. The effect size can be specified in one of three ways, selected from Input
Method: (1) individual means and common standard deviation, (2) difference of
means and common standard deviation, or (3) standardized difference of means. We
will use the Individual Means method. Enter the appropriate design parameters
so that the dialog box appears as shown. Remember to set the Allocation Ratio to 3.

Then click Compute.

The design is shown as a row in the Output Preview, located in the lower pane of this
window. The computed sample size is 325 subjects.

You can select this design by clicking anywhere along the row in the Output Preview. On the Output Preview toolbar, click the summary icon to display a summary of the design details in the upper pane. Then, in the Output Preview toolbar, click the save icon to save this design to Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.
With Des1 selected in the Library, click the plot icon on the Library toolbar, and then click Power vs Treatment Effect (δ). The resulting power curve for this design is shown. You can save this chart to the Library by clicking Save in Workbook. You can also export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.

Three-Look Design
Create a new design by selecting Des1 in the Library, and clicking the edit icon on the Library toolbar, or by right-clicking and selecting Edit Design. In the Input, change the Number of Looks from 1 to 3, to generate a study
with two interim looks and a final analysis. A new tab for Boundary Info should
appear. Click this tab to reveal the stopping boundary parameters. By default, the
Spacing of Looks is set to Equal, which means that the interim analyses will be
equally spaced in terms of the number of patients accrued between looks. The left side
contains details for the Efficacy boundary, and the right side contains details for the
Futility boundary. By default, there is an efficacy boundary (to reject H0) selected,
but no futility boundary (to reject H1). The Boundary Family specified is of the
Spending Functions type. The default Spending Function is the
Lan-DeMets (Lan & DeMets, 1983), with Parameter OF (O’Brien-Fleming), which
generates boundaries that are very similar, though not identical, to the classical
stopping boundaries of O’Brien and Fleming (1979).

The cumulative alpha spent, and the boundary values, are displayed in the table below.

Expected sample size and stopping probabilities
Click Compute to generate output for Des2. Select both Des1 and Des2 in the Output Preview and click the summary icon. The maximum and expected sample sizes are highlighted in yellow.

The price to be paid for multiple looks is the commitment of a higher maximum
sample size (331 patients) compared to that of a single-look design (325 patients).
However, if the alternative hypothesis H1 holds, the study has a chance of stopping at
one of the two interim analyses and saving patient accrual: on average, Des2 will stop
with 257 patients if the alternative is true. The expected sample size under the null is
329, less than the maximum since there is a small probability of stopping before the
last look and, wrongly, rejecting the null.
With Des2 selected in the Output Preview, click
to save Des2 to the Library.
In order to see the stopping probabilities, as well as other characteristics, double-click
Des2 in the Library. The clear advantage of this sequential design resides in the high
probability of stopping by the second look, if the alternative is true, with a sample size
of 221 patients, which is well below the requirements for a fixed sample study (325
patients). Even under the null, however, there is a small chance for the test statistic to
cross the boundary for its early rejection (type-1 error probability) at the first or second
look. Close the Details window before continuing.

Examining stopping boundaries and spending functions
Plot the boundary values of Des2 by clicking the plot icon on the Library toolbar, and then selecting Stopping Boundaries. The following chart will appear:

The three solid dots correspond to the actual boundary values to be used at the three
planned analyses. Although the three looks are assumed to be equally spaced at design
time, this assumption need not hold at analysis time. Values of the test-statistic (z-test)
greater than the upper boundary values would warrant early stopping in favor of H1,
that Orlistat is better than placebo. The horizontal axis expresses the total number of
patients at each of the three analysis time-points. The study is designed so that the last
analysis time point coincides with the maximum sample size required for the chosen
design, namely 331 patients. By moving the vertical line cursor from left to right, one
can observe the actual values of the stopping boundaries at each interim analysis
time-point. The boundaries are rather conservative: for example, you would need the
standardized test statistic to exceed 2.139 in order to stop the trial at the second look.
It is sometimes convenient to display the stopping boundaries on the p-value scale.
Under Boundary Scale, select the p-value Scale. The chart now displays the
cumulative number of patients on the X-axis and the nominal p-value (1-sided) that we
would need in order to stop the trial at that interim look. To change the scale of this
chart, click Settings... and in the Chart Settings dialog box, change the Maximum to 0.05, and the Divisions: Major to 0.01, and click OK.

The following chart will be displayed.

For example, at the second look, after 221 subjects have been observed, we require a
p-value smaller than 0.016 in order to stop the study. Notice that the p-value at the 3rd
and final look needs to be smaller than 0.045, rather than the usual 0.05 that one would
require for a single-look study. This is the penalty we pay for the privilege of taking
three looks at the data instead of one. You may like to display the boundaries in the
delta scale. In this scale, the boundaries are expressed in units of the effect size, or the
difference in means. We need to observe a difference in average weight loss of 2.658
kg or more, in order to cross the boundary at the second look.
Close these charts, and click the plot icon and then Error Spending. The following chart will appear.

This spending function was proposed by Lan and DeMets (1983), and for one-sided
tests has the following functional form:


α(t) = 2 − 2Φ(Zα/2 / √t).   (10.4)
Observe that very little of the total type-1 error is spent early on, but more is spent
rapidly as the information fraction increases, and reaches 0.05 at an information
fraction of 1. A recursive method for generating stopping boundaries from spending functions is described in Appendix G. Close this chart before continuing.
Lan and DeMets (1983) also provided a function for spending the type-1 error more
aggressively. This spending function is denoted by PK, signifying that it is the
Lan-DeMets spending function for generating stopping boundaries that closely
resemble the classical Pocock (1977) stopping boundaries. It has the functional form:
α(t) = α ln[1 + (e − 1)t]   (10.5)
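Both spending functions are simple to evaluate. The sketch below (function names ours) tabulates the cumulative α spent at three equally spaced looks for a test at level α = 0.05:

    import math
    from scipy import stats

    def alpha_spent_of(t, alpha=0.05):
        # Lan-DeMets O'Brien-Fleming-type spending, equation (10.4)
        return 2 - 2 * stats.norm.cdf(stats.norm.ppf(1 - alpha / 2) / math.sqrt(t))

    def alpha_spent_pk(t, alpha=0.05):
        # Lan-DeMets Pocock-type spending, equation (10.5)
        return alpha * math.log(1 + (math.e - 1) * t)

    for t in (1 / 3, 2 / 3, 1.0):
        print(round(t, 2), round(alpha_spent_of(t), 4), round(alpha_spent_pk(t), 4))

Note how the OF-type function spends almost nothing at the first look, while the PK-type function spends nearly half of the total α.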

Select Des2 in the Library, and click the edit icon on the Library toolbar. On the Boundary Info tab, change the Parameter from OF to PK, and click Compute. With Des3 selected in the Output Preview, click the save icon. In the Library, select both Des2 and Des3, by holding the Ctrl key, and then click the summary icon. The upper pane will display the details of the two designs side-by-side:

In the Output Summary toolbar, click
to compare the two designs according
to Stopping Boundaries. Notice that the stopping boundaries for Des3 (PK) are
relatively flat; almost the same critical point is used at all looks to declare significance.

Close the chart before continuing.

Click the plot icon and select Error Spending. Des3 (PK) spends the type-1 error
probability at a much faster rate than Des2 (OF). Close the chart before continuing.

Wang and Tsiatis Power Boundaries
The stopping boundaries generated by the Lan-DeMets OF and PK functions closely resemble the classical O'Brien-Fleming and Pocock stopping boundaries, respectively. These classical boundaries are a special case of a family of power boundaries proposed by Wang and Tsiatis (1987). For a two-sided level-α test, using K equally spaced looks, the power boundaries for the standardized test statistic Zj at the j-th look are of the form

Zj ≥ C(∆, α, K) / (j/K)^(0.5−∆)   (10.6)
The normalizing constant C(∆, α, K) is evaluated by recursive integration so as to
ensure that the K-look group sequential test has type-1 error equal to α (see
Appendix G for details), and ∆ is a parameter characterizing the shape of the stopping
boundary. For example, if ∆ = 0.5, the boundaries are constant at each of the K looks.
These are the classical Pocock stopping boundaries (Pocock, 1977). If ∆ = 0, the
width of the boundaries is inversely proportional to the square root of the information
fraction j/K at the j-th look. These are the classical O’Brien-Fleming stopping
boundaries (O’Brien and Fleming, 1979). Other choices produce boundaries of
different shapes. Notice from equation (10.6) that power boundaries have a specific
functional form, and can be evaluated directly, or tabulated, once the normalizing
constant C(∆, α, K) has been worked out for various combinations of α and K. In
contrast, spending function boundaries are evaluated indirectly by inverting a
pre-specified spending function as shown in Appendix F.
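The effect of ∆ on the boundary shape is easy to see numerically. In the sketch below, the constant C is only a placeholder (the true value must come from the recursive integration mentioned above), so the output shows relative shape only:

    def wang_tsiatis_bounds(C, delta, K):
        # Power-family boundary values at looks j = 1..K, equation (10.6)
        return [C / ((j / K) ** (0.5 - delta)) for j in range(1, K + 1)]

    print(wang_tsiatis_bounds(2.0, 0.5, 3))  # Delta = 0.5: flat, Pocock-like
    print(wang_tsiatis_bounds(2.0, 0.0, 3))  # Delta = 0: O'Brien-Fleming-like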
Right-click Des3 in the Library and select Edit Design. On the Boundary Info tab,
change the Boundary Family from Spending Functions to Wang-Tsiatis.
Leave the default value of ∆ as 0, and click Compute. With Des4 selected in the
Output Preview, click the save icon.

In the Library, select both Des2 and Des4 by holding the Ctrl key. Click the plot icon and select Stopping Boundaries. As expected from our discussion above, the boundary values for Des2 (Lan-DeMets, OF) and for Des4 (Wang-Tsiatis, ∆ = 0) are very
similar. Close the chart before continuing.

More charts
Select Des3 in the Library, click the plot icon, and then click Power vs. Treatment effect (δ). Click the radio button for Standardized under X-Axis Scale. By scrolling
from left to right with the vertical line cursor, one can observe the power for various values of the effect size.

Close this chart, and with Des3 selected, click the plot icon again. Then click Expected
Sample Size. Click the radio button for Standardized under X-Axis Scale. The
following chart appears:

By scrolling from left to right with the vertical line cursor we can observe how the expected sample size decreases as the effect size increases. Close this chart before continuing.
Unequally spaced analysis time points
In the above designs, we have assumed that analyses were equally spaced. This
assumption can be relaxed if you know when interim analyses are likely to be
performed (e.g., for administrative reasons). In either case, departures from this
assumption are allowed during the actual interim monitoring of the study, but sample
size requirements will be more accurate if allowance is made for this knowledge.
With Des3 selected in the Library, right-click and select Edit Design. Under Spacing of Looks in the Boundary Info tab, click the Unequal radio button.
The column titled Info. Fraction can be edited to modify the relative spacing of the
analyses. The information fraction refers to the proportion of the maximum (yet
unknown) sample size. By default, this table displays equal spacing, but suppose that
the two interim analyses will be performed with 0.25 and 0.5 of the maximum sample
size. Click Recalc to recompute the cumulative alpha spent and the efficacy boundary
values.

After entering these new information fraction values, click Compute. Select Des5 in
the Output Preview and click

to save it in the Library for now.

Arbitrary amounts of error probability to be spent at each analysis
Another feature of East is the possibility to specify arbitrary amounts of cumulative
error probability to be used at each look. This option can be combined with the option
of unequal spacing of the analyses. With Des5 selected in the Library, click
on the Library toolbar. Under the Boundary Info tab, select Interpolated for the
Spending Function. In the column titled Cum. α Spent, enter 0.005 for the first look and 0.03 for the second look, click Recalc, and then Compute.

Select Des6 in the Output Preview and click the save icon. From the Library, select Des5 and Des6 by holding the Ctrl key. Click the plot icon, and select Stopping Boundaries. The following chart will be displayed.

The advantage of Des6 over Des5 is the more conservative boundary (less type-1 error
probability spent) at the first look. Close these charts before continuing.
Computing power for a given sample size
East can compute the achieved power, given the other design parameters such as
sample size. Select Des6 in the Library, right-click, and select Edit Design. On the Design Parameters tab, click the radio button for Power. You will notice that the field for
power will contain the word “Computed”. You may now enter a value for the sample
size: Enter 250, and click Compute. As expected, the achieved power is less than 0.9,
namely 0.781.

To delete this design, click Des7 in the Output Preview, and click
in the
toolbar. East will display a warning to make sure that you want to delete the selected
row. Click Yes to continue.
Spending function boundaries for early stopping in favor of H0 or H1
So far we have considered only efficacy boundaries, which allow for early stopping in
favor of the alternative. It may be of interest, in addition, to consider futility
boundaries, which allow for early stopping when there is lack of evidence against the
null hypothesis. Select Des2 in the Library and click the edit icon. On the Boundary Info
tab, you can select from one of several types of futility boundaries, such as from a
spending function, or by conditional power. Note that some of these options are
available for one-sided tests only.

Select Spending Functions under Boundary Family. Select PK for the
Parameter, and leave all other default settings. See the updated values of the
stopping boundaries populated in the table below.

On the Boundary Info tab, you may also like to click the corresponding icons to
view plots of the error spending functions, or stopping boundaries, respectively.

Click Compute, and with Des7 selected in the Output Preview, click the save icon. To
view the design details, double-click Des7 in the Library. Because not all the type-2
error is spent at the final look, this trial has a chance of ending early if the null
hypothesis is true. This is demonstrated by the low expected sample size under the null
(209 patients), compared to those of the other designs considered so far. Close the
Output window before continuing.
Before continuing to the next section, we will save the current workbook, and open a
new workbook. Select Wbk1 in the Library and right-click, then click Save.
Next, click the
button, click New, and then Workbook. A new workbook,
Wbk2, should appear in the Library. Delete all designs from the Output Preview
before continuing.
Creating multiple designs
To create more than one design from the Input, one simply enters multiple values in
any of the highlighted input fields. Multiple values can be entered in two ways. First,
one can enter a comma-separated list (e.g., “0.8, 0.9”). Second, one can use colon
notation (e.g., “7:9:0.5”) to specify a range of values, where “a:b:c” is read as from ‘a’
to ‘b’ in step size ‘c’.
Suppose that we wished to explore multiple variations of Des7. With Des7 selected in
the Library, right-click and select Edit Design. In the Design Parameters tab of the
Input, enter multiple values for the Power(1-β) (0.8, 0.9) and Std.Deviation(σ)
(7 : 9 : 0.5) and click Compute:

We have specified 10 designs here, from the combination of 2 distinct values of the
power and 5 distinct values of the standard deviation. To view all 10 designs on the screen, click the maximize icon to maximize the Output Preview. The designs within the Output Preview can be sorted in ascending or descending order, according to one of the
column variables. For example, if you click once on the column titled Sample Size, the
designs will be sorted (from top to bottom) in ascending order of the total sample size.
In addition, you may wish to filter and select designs that meet certain criteria. Click
on the Output Preview toolbar, and in the filter criterion box, select only those
designs for which the maximum sample size is less than or equal to 400, as follows:

From the remaining designs, select Des8 in the Output Preview, and click the save icon.
You will be asked to nominate the workbook in which this design should be saved.
Select Wbk2 and click OK.

Accrual and dropout information
More realistic assumptions regarding the patient accrual process – namely, accrual
rate, response lag, and probability of dropout – can be incorporated into the design
stage. First, the accrual of patients may be estimated to occur at some known rate.
Second, because the primary outcome measure is change in body weight from baseline
to end of first year, the response lag is known to be 1 year. Finally, due to the long-term
nature of the study, it is estimated that a small proportion of patients is likely to drop
out over the course of the study.

With Des8 selected in the Library, click the edit icon. Click Include Options in the top right hand corner of the Input, and then click Accrual/Dropout Info. A new tab should appear to the right of Design Parameters and Boundary Info. Click on this Accrual/Dropout tab, and enter the following information as shown below: The accrual rate is 100 patients per year, the response lag is 1 year, and the probability that a patient drops out before completing the study is 0.1.

A plot of the predicted accruals and completers over time can be generated by clicking the corresponding chart icon.

Click Compute to generate the design. Select Des18 in the Output Preview, and click the save icon. Select Wbk2 and click OK. Double-click Des18 in the Library. The output
details reveal that in order to ensure that data can be observed for 153 completers by
the second look, one needs to have accrued 255 subjects. Close this Output window
before continuing.

Select individual looks
With Des8 selected in Wbk2, click the edit icon. In the look details table of the Boundary Info tab, notice that there are ticked checkboxes under the columns Stop for Efficacy
and Stop for Futility. East gives you the flexibility to remove one of the stopping
boundaries at certain looks. For example, untick the checkbox in the first look under
the Stop for Futility column, and click Recalc.

Click the stopping boundaries chart icon to view the new boundaries. Notice that the futility boundary does not
begin until the second look.

Simulation of the Orlistat trial
Suppose you now wish to simulate Des4 in Wbk1. Select Des4 in the Library, and
click the simulate icon on the Library toolbar. Alternatively, right-click on Des4 and
select Simulate. A new Simulation worksheet will appear. Click on the Response
Generation Info tab, and input the following values: Mean control = 6; Mean
Treatment = 6; (Common) Std. Deviation = 8. In other words, we are simulating
from a population in which there is no true difference between the control and
treatment means. This simulation will allow us to check the type-1 error rate when
using Des4.

Click Simulate to start the simulation. Once the simulation run has completed, East
will add an additional row to the Output Preview labeled Sim1.

With Sim1 selected in the Output Preview, click the save icon, then double-click Sim1 in
the Library. The simulation output details will be displayed in the upper pane. In the
Overall Simulation Result table, notice that the percentage of times the upper efficacy
stopping boundary was crossed is largely consistent with a type-1 error of 5%. The
exact values of your simulations may differ, depending on your seed.

Right-click Sim1 in the Library and click Edit Simulation. In the Response
Generation Info tab, enter 9 for Mean Treatment. Leave all other values unchanged, and click Simulate. With Sim2 selected in the Output Preview, click the save icon, then double-click
Sim2 in the Library. Notice that the percentage of times the efficacy stopping
boundary was crossed is largely consistent with 90% power for the original design.
Feel free to experiment further with other simulation options before continuing.

10.1.2 Interim monitoring of the Orlistat trial

Suppose we decided to adopt Des2. Select Des2 in the Library, and click the interim monitoring icon on
the Library toolbar. Alternatively, right-click on Des2 and select Interim
Monitoring. The interim monitoring dashboard contains various controls for
monitoring the trial, and is divided into two sections. The top section contains several
columns for displaying output values based on the interim inputs.

The bottom section contains four charts, each with a corresponding table to its right.
These charts provide graphical and numerical descriptions of the progress of the
clinical trial and are useful tools for decision making by a data monitoring committee.

Making Entries in the Interim Monitoring Dashboard
Although the study has been designed assuming three equally spaced analyses,
departures from this strategy are permissible using the spending function methodology
of Lan and DeMets (1983) and its extension to boundaries for early stopping in favor
of H0 proposed by Pampallona, Tsiatis and Kim (2001). At each interim analysis time
point, East will determine the amount of type-1 error probability and type-2 error
probability that it is permitted to spend based on the chosen spending functions
specified in the design. East will then re-compute the corresponding stopping
boundaries. This strategy ensures that the overall type-1 error will not exceed the
nominal significance level α. We shall also see how East proceeds so as to control the
type-2 error probability.
Open the Test Statistic Calculator by clicking on the Enter Interim Data button.
Assume that we take the first look after 110 patients (Sample Size (Overall)), with an Estimate of δ of 3, and a Standard Error of Estimate of δ of 1.762. Click OK to
continue.

East will update the charts and tables in the dashboard accordingly. For example, the
Stopping Boundaries Chart displays recomputed stopping boundaries and the path
traced out by the test statistic. The Error Spending Function Chart displays the
cumulative error spent at each interim look. The Conditional Power (CP) Chart shows
the probability of crossing the upper stopping boundary, given the most recent
information. Finally, the RCI (Repeated Confidence Interval) Chart displays repeated
confidence intervals (Jennison & Turnbull, 2000).

Repeat the input procedure from above with the second look after 221 patients (Sample Size (Overall)), with an Estimate of δ of 2, and a Standard Error of Estimate of δ of 1. Click Recalc and OK to continue.
For the final look, make sure to tick the box Set Current Look as Last. Input the following estimates: 331 patients (Sample Size (Overall)), with an Estimate of δ of 3, and a Standard Error of Estimate of δ of 1. Click Recalc and OK to continue.
The upper boundary has been crossed. The dashboard will be updated, and the Final
Inference table shows the final outputs. For example, the adjusted p-value is 0.017,
consistent with the rejection of the null.

10.1.3 Trial Design Using a t-Test (Single Look)

In Section 10.1.1 the sample size obtained to correctly power the trial relied on an asymptotic approximation for the distribution of a Wald-type statistic. In the single-look setting this statistic is
Z = δ̂ / √(var[δ̂]),   (10.7)

with

var[δ̂] = σ̂² / (n r(1 − r)).   (10.8)

In a small single-look trial a more accurate representation of the distribution of Z is
obtained by using Student’s t-distribution with (n − 1) degrees of freedom.
Consider the Orlistat trial described in Section 10.1.1 where we would like to test the
null hypothesis that treatment does not lead to weight loss, H0 : δ = 0, against the
alternative hypothesis that the treatment does result in a loss of weight, H1 : δ > 0. We
will now design this same trial in a different manner, using the t-distribution for the test
statistic. Start East afresh. Click Continuous: Two Samples on the Design tab, and
then click Parallel Design: Difference of Means. Enter the following design
parameters so that the dialog box appears as shown. Remember to select 1-Sided for Trial Type, and enter an Allocation Ratio of 3. These values are the same as those
from Des1, except that under Dist. of Test Stat., select t. Then click Compute.

We observe that the required sample size for this study is 327 patients. Contrast this to
the 325 patients obtained using the normal distribution in Section 10.1.1.
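As an informal cross-check of these two numbers, the following Python sketch applies the normal-approximation formulas (10.7)–(10.8) and then searches for the smallest sample size whose noncentral-t power reaches 90%, using the (n − 1) degrees of freedom stated above. This is only an approximation of East's internal algorithm:

    # One-sided alpha = 0.05, power = 0.9, delta = 3, sigma = 8,
    # allocation ratio 3:1, so r = n_t/n = 0.75.
    import math
    from scipy.stats import norm, t as tdist, nct

    alpha, beta = 0.05, 0.10
    delta, sigma, r = 3.0, 8.0, 0.75

    # Normal approximation: n = (z_a + z_b)^2 * sigma^2 / (delta^2 * r(1 - r))
    z = norm.ppf(1 - alpha) + norm.ppf(1 - beta)
    n_normal = z ** 2 * sigma ** 2 / (delta ** 2 * r * (1 - r))
    print(math.ceil(n_normal))            # 325

    # t-based design: smallest n whose noncentral-t power reaches 90%.
    def power_t(n):
        df = n - 1                        # degrees of freedom as stated above
        ncp = delta * math.sqrt(n * r * (1 - r)) / sigma
        return 1 - nct.cdf(tdist.ppf(1 - alpha, df), df, ncp)

    n = math.ceil(n_normal)
    while power_t(n) < 1 - beta:
        n += 1
    print(n)                              # should land at or near the 327 above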


10.2 Ratio of Means for Independent Data (Superiority)

Let σt and σc denote the standard deviations of the treatment and control group responses, respectively. It is assumed that the coefficient of variation (CV), defined as the ratio of the standard deviation to the mean, is the same for both groups: σt/µt = σc/µc. Finally, let ρ = µt/µc. For a Superiority trial, the null hypothesis H0 : ρ = ρ0 is tested against the two-sided alternative hypothesis H1 : ρ ≠ ρ0 or a one-sided alternative hypothesis H1 : ρ < ρ0 or H1 : ρ > ρ0.
First, click Continuous: Two Samples on the Design tab, and then click Parallel
Design: Ratio of Means.

Suppose that we wish to determine the sample size required for a one-sided test to
achieve a type-1 error of .05, and power of 90%, to detect a ratio of means of 1.25. We
also need to specify the CV = 0.25. Enter the appropriate design parameters so that the
input dialog box appears as below, and click Compute.


The computed sample size (42 subjects) is highlighted in yellow.
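This figure can be checked approximately by working on the log scale, using the relation sd = √(ln(1 + CV²)) between the CV and the standard deviation of log-transformed data described in Chapter 12. A minimal Python sketch, under those assumptions:

    # Approximate check: test ln(rho) = 0 vs ln(rho1) on log-transformed data,
    # one-sided alpha = 0.05, power = 0.9, equal allocation.
    import math
    from scipy.stats import norm

    rho1, cv = 1.25, 0.25
    delta = math.log(rho1)                     # effect size on the log scale
    sd_log = math.sqrt(math.log(1 + cv ** 2))  # SD of log-transformed data
    z = norm.ppf(0.95) + norm.ppf(0.90)
    n = z ** 2 * sd_log ** 2 / (delta ** 2 * 0.25)  # r(1 - r) = 1/4
    print(math.ceil(n))                        # 42, matching the output above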


10.3 Difference of Means for Crossover Data (Superiority)

In a crossover trial, each experimental subject receives two or more different
treatments. The order in which each subject receives the treatments depends on the
particular design chosen for the trial. The simplest design is a 2×2 crossover trial,
where each subject receives two treatments, say A and B. Half of the subjects receive
A first and then, after a suitably chosen period of time, crossover to B. The other half
receive B first and then crossover to A.
The null and alternative hypotheses are the same as for a two-sample test for difference of means for independent data. However, a key advantage of the crossover design is that each subject serves as his/her own control. The test statistic must account not only for treatment effects, but also for period and carryover effects.
We will demonstrate this design for a Superiority trial. First, click Continuous: Two
Samples on the Design tab, and then click Crossover Design: Difference of Means.

Suppose that we wish to determine the sample size required to achieve a type-1 error
of .05, and power of 90%, to detect a difference of means of 75 with standard deviation
of the difference of 150. Enter the appropriate design parameters so that the input

dialog box appears as below, and click Compute.

The computed sample size (45 subjects) is highlighted in yellow.


10.4 Ratio of Means for Crossover Data (Superiority)

We will demonstrate this design for a Superiority trial. The null hypothesis H0 : ρ = ρ0 is tested against the two-sided alternative hypothesis H1 : ρ ≠ ρ0 or a one-sided alternative hypothesis H1 : ρ < ρ0 or H1 : ρ > ρ0. First, click Continuous:
Two Samples on the Design tab, and then click Crossover Design: Ratio of Means.

Suppose that we wish to determine the sample size required for a one-sided test to
achieve a type-1 error of .05, and power of 80%, to detect a ratio of means of 1.25 with
square root of MSE of 0.3. Enter the appropriate design parameters so that the input
dialog box appears as below, and click Compute.

The computed sample size (24 subjects) is highlighted in yellow.
10.5 Assurance (Probability of Success)

Assurance, or probability of success, is a Bayesian version of power, which
corresponds to the (unconditional) probability that the trial will yield a statistically
significant result. Specifically, it is the prior expectation of the power, averaged over a
prior distribution for the unknown treatment effect (see O’Hagan et al., 2005). For a
given design, East allows you to specify a prior distribution, for which the assurance or
probability of success will be computed.
Select Des2 in the Library, and click the edit icon on the Library toolbar. Alternatively,
recompute this design with the following inputs: A 3-look design with
Lan-Demets(OF) efficacy only boundary, Superiority Trial, 1-sided, 0.05 type-1 error,
90% power, allocation ratio = 3, mean control = 6, mean treatment = 9, and standard
deviation = 8.

Select the Assurance checkbox in the Input window.
Suppose that we wish to specify a Normal prior distribution for the treatment effect δ,
with a mean of 3, and standard deviation of 2. Thus, rather than assuming δ = 3 with
certainty, we use this prior distribution to reflect the uncertainty about the true
treatment effect.
In the Distribution list, click Normal, and in the Input Method list, click E(δ) and
SD(δ).
Type 3 in the E(δ) box, and type 2 in the SD(δ) box, and then click Compute.
The computed probability of success (0.72) is shown below. Note that for this prior,
assurance is less than the specified power (0.9); incorporating the uncertainty about δ
has yielded a less optimistic estimate of power.
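For intuition, the sketch below reproduces this value using a simplified calculation: the fixed-sample normal approximation (N = 325, as in Des1) in place of the full three-look design, with the power curve averaged over the N(3, 2²) prior in closed form. It approximates, but does not replicate, East's computation:

    # Assurance = E[power(delta)] for delta ~ N(3, 2^2), fixed-sample sketch.
    import math
    from scipy.stats import norm

    N, r, sigma, alpha = 325, 0.75, 8.0, 0.05
    m, s = 3.0, 2.0                           # prior mean and SD for delta

    a = math.sqrt(N * r * (1 - r)) / sigma    # power(delta) = Phi(a*delta - z)
    z = norm.ppf(1 - alpha)

    # E[Phi(a*delta - z)] over a normal prior has a closed form:
    assurance = norm.cdf((a * m - z) / math.sqrt(1 + (a * s) ** 2))
    print(round(assurance, 2))                # about 0.72

    # Shrinking the prior SD toward 0 recovers ordinary power at delta = 3:
    print(round(norm.cdf(a * m - z), 2))      # about 0.90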

In the Output Preview, right-click the row corresponding to this design, rename the design ID as Bayes1, and save it to the Library.
Return to the input window. Type 0.001 in the SD(δ) box, and click Compute. Such a
prior approximates the non-Bayesian power calculation, where one specifies a fixed
treatment effect.
As shown below, such a prior yields a probability of success that is similar to the
specified power.

East also allows you to specify an arbitrary prior distribution through a CSV file. In the
Distribution list, click User Specified, and then click Browse... to select the CSV file
where you have constructed a prior.


The CSV file should contain two columns, where the first column lists the grid points
for the parameter of interest (in this case, δ), and the second column lists the prior
probability assigned to each grid point. For example, we consider a 5-point prior with
probability = 0.2 at each point. The prior probabilities can be entered as weights that
do not sum to one, in which case East will re-normalize for you.
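For instance, a 5-point prior placing weight 0.2 on each of five grid points could be stored in a CSV file laid out as follows (the grid values shown are purely illustrative):

    1,0.2
    2,0.2
    3,0.2
    4,0.2
    5,0.2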

Once the CSV filename and path have been specified, click Compute to calculate the assurance, which will be displayed in the box below:

As stated in O’Hagan et al. (2005, p.190): “Assurance is a key input to
decision-making during drug development and provides a reality check on other
methods of trial design.” Indeed, it is not uncommon for assurance to be much lower
than the specified power. The interested reader is encouraged to refer to O’Hagan et al.
for further applications and discussions on this important concept.

10.6 Predictive Power and Bayesian Predictive Power

Similar Bayesian ideas can be applied to conditional power for interim monitoring.
Rather than calculating conditional power for a single assumed value of the treatment
effect, δ, such as at δ̂, we may account for the uncertainty about δ by taking a weighted
average of conditional powers, weighted by the posterior distribution for δ. For normal
endpoints, East assumes a posterior distribution for δ that results from a diffuse prior
distribution, which produces an average power called the predictive power (Lan, Hu,
& Proschan, 2009). In addition, if the user specified a normal prior distribution at the
design stage to calculate assurance, then East will also calculate the average power,
called Bayesian predictive power, for the corresponding posterior. We will
demonstrate these calculations for the design renamed as Bayes1 earlier.
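To make the averaging concrete, here is a minimal Monte Carlo sketch of these quantities for a simplified design with a single final analysis. The inputs (n1 = 110, an estimate of δ of 1, a standard error of 0.7, a maximum sample size of 331, and α = 0.05) mirror the example below; because the sketch ignores the remaining interim boundaries, its numbers only approximate what East reports:

    # Posterior-averaged conditional power (predictive power), single final look.
    import math
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    n1, N = 110, 331
    delta_hat, se1, alpha = 1.0, 0.7, 0.05

    t = n1 / N                            # information fraction at the interim
    se_final = se1 * math.sqrt(t)         # standard error shrinks like 1/sqrt(n)
    z1 = delta_hat / se1                  # interim test statistic
    z_crit = norm.ppf(1 - alpha)

    def cond_power(delta):
        # P(final Z crosses z_crit | interim data), Brownian-motion model
        drift = delta / se_final
        num = z_crit - z1 * math.sqrt(t) - drift * (1 - t)
        return 1 - norm.cdf(num / math.sqrt(1 - t))

    print(round(float(cond_power(delta_hat)), 3))      # CP at the estimate

    # Predictive power: average CP over the diffuse-prior posterior N(1, 0.7^2).
    draws = rng.normal(delta_hat, se1, 200_000)
    print(round(float(np.mean(cond_power(draws))), 3))

    # Bayesian predictive power: average over the posterior formed from the
    # design-stage N(3, 2^2) prior combined with the interim estimate.
    m0, s0 = 3.0, 2.0
    v = 1 / (1 / s0 ** 2 + 1 / se1 ** 2)
    m = v * (m0 / s0 ** 2 + delta_hat / se1 ** 2)
    draws = rng.normal(m, math.sqrt(v), 200_000)
    print(round(float(np.mean(cond_power(draws))), 3))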
In the Library, right-click Bayes1 and click Interim Monitoring, then click the Show/Hide Columns icon in the toolbar of the IM Dashboard.

In the Show/Hide Columns window, make sure to show the columns for: CP
(Conditional Power), Predictive Power, Bayes Predictive Power, Posterior Distribution
of δ Mean, and Posterior Distribution of δ SD, and click OK. The following columns
will be displayed in the main grid of the IM Dashboard.

Assume that we observed interim data after 110 patients, with an estimate of δ = 1,
and a standard error of the estimate = 0.7. Enter these values in the Test Statistic
Calculator by clicking Enter Interim Data, and click OK.
The IM Dashboard will be updated. In particular, notice the differing values for CP
and the Bayesian measures of power.

11 Nonparametric Superiority Two Sample

The Wilcoxon-Mann-Whitney nonparametric test is a commonly used test for the comparison of two distributions when the observations cannot be assumed to come from normal distributions. It is used when the distributions differ only in a location parameter and is especially useful when the distributions are not symmetric. For the Wilcoxon-Mann-Whitney test, East supports single-look superiority designs only.

11.1 Wilcoxon-Mann-Whitney Test

Let X1 , . . . , Xnt be the nt observations from the treatment (T ) with distribution
function Ft and Y1 , . . . , Ync be the nc observations from the control (C) with
distribution function Fc . Ft and Fc are assumed to be continuous with corresponding
densities ft and fc , respectively. The primary objective in Wilcoxon-Mann-Whitney
test is to investigate whether there is a shift of location, which indicates the presence of
the treatment effect. Let θ represent the treatment effect. Then we test the null hypothesis H0 : θ = 0 against the two-sided alternative H1 : θ ≠ 0 or a one-sided alternative hypothesis H1 : θ < 0 or H1 : θ > 0. Let U denote the number of pairs (Xi, Yj) such that Xi < Yj, so U = Σ_{i=1}^{nt} Σ_{j=1}^{nc} I(Xi, Yj), where I(a, b) = 1 if a < b and I(a, b) = 0 if a ≥ b. Then U/(nc nt) is a consistent estimator of

p = P(X < Y) = ∫_{−∞}^{∞} Ft(y) fc(y) dy = ∫_0^1 Ft[Fc^{−1}(u)] du.   (11.1)

The power is approximated using the asymptotic normality of U and depends on the
value of p, and thus depends on Fc and Ft . In order to find the power for a given
sample size or to find the sample size for a given power, we must specify p. However,
this is often a difficult task. If we are willing to specify Fc and Ft , then p can be
computed. East computes p assuming that Fc and Ft are normal distributions with
means µc and µt and a common standard deviation σ, by specifying the values of the
difference in the means and the standard deviation. With this assumption,
equation (11.1) results in


p = Φ( (µt − µc) / (√2 σ) )   (11.2)
Using the results of Noether (1987), with nt = rN , the total sample size for an α level
two-sided test to have power 1 − β for a specified value of p is approximated by
N = (zα/2 + zβ)² / [12 r(1 − r)(p − 0.5)²].

11.2 Example: Designing a single look superiority study

Based on a pilot study of an anti-seizure medication, we want to design a 12-month
placebo-controlled study of a treatment for epilepsy in children. The primary efficacy
variable is the percent change from baseline in the number of seizures in a 28-day
period. The mean percent decrease was 2 for the control and 8 for the new treatment,
with an estimated standard deviation of 25. We plan to design the study to test the null
hypothesis H0 : θ = 0 against H1 : θ ≠ 0. We want to design a study that would have 90% power at µc = 2 and µt = 8 under H1 and maintain the type I error at 5%.

11.2.1 Designing the study

Click Continuous: Two Samples on the Design tab and then click Parallel Design:
Wilcoxon-Mann-Whitney.

This will launch a new window. The upper pane of this window displays several fields
with default values. Select 2-Sided for Test Type and enter 0.05 for Type I Error.
Select Individual Means for Input Method and then specify Mean Control
(µc ) as 2 and Mean Treatment (µt ) as 8. Specify Std. Deviation as 25. Click
Compute. The upper pane should now appear as below:


The required sample size for this design is shown as a row in the Output Preview,
located in the lower pane of this window. The computed total sample size (772
subjects) is highlighted in yellow.

This design has default name Des 1 and results in a total sample size of 772 subjects in order to achieve 90% power. The probability displayed in the row is 0.567, which indicates the approximate probability P[X < Y] assuming X ∼ N(8, 25²) and Y ∼ N(2, 25²). This is in accordance with equation (11.2).
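Both of these values can be verified directly from equation (11.2) and the Noether formula above; a quick Python check:

    # Check p (equation 11.2) and the Noether sample size for this design.
    import math
    from scipy.stats import norm

    alpha, beta = 0.05, 0.10              # two-sided test, 90% power
    mu_c, mu_t, sigma, r = 2.0, 8.0, 25.0, 0.5

    p = norm.cdf((mu_t - mu_c) / (math.sqrt(2) * sigma))
    z = norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)
    N = z ** 2 / (12 * r * (1 - r) * (p - 0.5) ** 2)
    print(round(p, 3), math.ceil(N))      # 0.567 and 772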
Select this design by clicking anywhere along the row in the Output Preview and click the Output Summary icon in the Output Preview toolbar. Some of the design details will be
displayed in the upper pane, labeled as Output Summary.

In the Output Preview toolbar, click the save icon to save this design to Wbk1 in the
Library. Double-click Des 1 in the Library to see the details of the design.

According to this summary, the study needs a total of 772 subjects. Of these 772
subjects, 386 will be allocated to the treatment group and remaining 386 will be
allocated to the control group.
Since the sample size is inversely proportional to (p − 0.5)², it is sensitive to mis-specification of p (see equation (11.1)). The results of the pilot study included
several subjects who worsened over the baseline and thus the difference in the means
might not be an appropriate approach to determining p. To obtain a more appropriate
value of p, we have several alternative approaches. We can further examine the results
of the pilot study after exclusion of some of the extreme values, which will decrease the standard deviation and provide a difference in the means that may be a more reasonable measure of the difference between the distributions. Alternatively, the difference in the medians may be a more reasonable measure, especially when used with a decreased standard deviation.
The median percent decrease was 10 for the control and 18 for the new treatment, with
an estimated standard deviation of 25. Create a new design by selecting Des 1 in the
Library, and clicking the edit icon on the Library toolbar. In the Input, change the Mean
Control (µc ) and Mean Treatment (µt ) to 10 and 18, respectively.

Click Compute to generate output for Des 2. To compare Des 1 and Des 2, select both
rows in Output Preview using the Ctrl key, and click the Output Summary icon in the Output Preview toolbar. Both designs will be displayed in the Output Summary pane.

The sample size required for Des 2 is only 438 subjects as compared to 772 subjects in
Des 1. Now we consider decreasing the standard deviation to 20 to lessen the impact of
the extreme values. Select Des 2 in the Output Preview, and click the edit icon in the
toolbar. In the Input, change the Std. Deviation to 20. Click Compute to generate
output for this design. Select all the rows in Output Preview and click the Output Summary icon in the Output Preview toolbar to see them in the Output Summary pane. This design
results in a total sample size of 283 subjects in order to attain 90% power.

12 Normal Non-inferiority Two-Sample

In a noninferiority trial, the goal is to establish that an experimental treatment is no
worse than the standard treatment, rather than attempting to establish that it is superior.
A therapy that is demonstrated to be non-inferior to the current standard therapy for a
particular indication might be an acceptable alternative if, for instance, it is easier to
administer, cheaper, or less toxic. Non-inferiority trials are designed by specifying a
non-inferiority margin. The amount by which the mean response on the experimental
arm is worse than the mean response on the control arm must fall within this margin in
order for the claim of non-inferiority to be sustained. In this chapter, we show how
East supports the design and interim monitoring of such experiments, with a normal
endpoint.

12.1 Difference of Means

12.1.1 Trial design
12.1.2 Three-Look Design
12.1.3 Simulation
12.1.4 Interim Monitoring
12.1.5 Trial Design Using a t-Test (Single Look)

12.1.1 Trial design

Consider the design of an antihypertension study comparing an ACE inhibitor to a
new AII inhibitor. Let µc be the mean value of a decrease in systolic blood pressure
level (in mmHg) for patients in the ACE inhibitor (control) group and µt be the mean
value of a decrease in blood pressure level for patients in the AII inhibitor (treatment)
group. Let δ = µt − µc be the treatment difference. We want to demonstrate that the
AII inhibitor is non-inferior to the ACE inhibitor. For this example, we will consider a
non-inferiority margin equal to one-third of the mean response in the control group. From
historical data, µc = 9 mmHg and therefore the non-inferiority margin is 3 mmHg.
Accordingly we will design the study to test the null hypothesis of inferiority H0 : δ ≤ −3 against the one-sided non-inferiority alternative H1 : δ > −3. The test is to be conducted at a significance level (α) of 0.025 and is required to have 90% power
at δ = 0. We assume that σ², the variance of the patient response, is the same for both
groups and is equal to 100.
Start East afresh. Click Continuous: Two Samples on the Design tab and then click
Parallel Design: Difference of Means.
Single-look design
In the input window, select Noninferiority for Design Type. The effect size can
be specified in one of three ways by selecting different options for Input Method: (1)
individual means and common standard deviation, (2) difference of means and
common standard deviation, or (3) standardized difference of means. We will use the
Individual Means method. Select Individual Means for Input Method, specify
the Mean Control (µc ) as 9 and Noninferiority margin (δ0 ) as −3 and specify the
Std. Deviation (σ) as 10. Specify 0 for Difference in Means (δ1 ). The upper pane
should appear as below:

Click Compute. This will calculate the sample size for this design, and the output is
shown as a row in the Output Preview located in the lower pane of this window. The
computed sample size (467 subjects) is highlighted in yellow.

This design has default name Des 1. Select this design by clicking anywhere along the row in the Output Preview and click the Output Summary icon. In the Output Preview toolbar, click the save icon to save this design to Wbk1 in the Library. If you hover the cursor over Des 1
in the Library, a tooltip will appear that summarizes the input parameters of the
design.
With Des 1 selected in the Library, click the plot icon on the Library toolbar, and then
click Power vs Treatment Effect (δ). The resulting power curve for this design will

appear.

You can save this chart to the Library by clicking Save in Workbook. In addition, you
can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking
Save As.... For now, you may close the chart before continuing.

12.1.2 Three-Look Design

Create a new design by selecting Des 1 in the Library, and clicking the edit icon on the Library toolbar. In the Input, change the Number of Looks from 1 to 3, to generate a
study with two interim looks and a final analysis. A new tab for Boundary Info should
appear. Click this tab to reveal the stopping boundary parameters. By default, the
Spacing of Looks is set to Equal, which means that the interim analyses will be
equally spaced in terms of the number of patients accrued between looks. The left side
contains details for the Efficacy boundary, and the right side contains details for the
Futility boundary. By default, there is an efficacy boundary (to reject H0 ) selected, but
no futility boundary (to reject H1 ). The Boundary Family specified is of the
Spending Functions type. The default Spending function is the Lan-DeMets
(Lan & DeMets, 1983), with Parameter OF (O’Brien-Fleming), which generates
boundaries that are very similar, though not identical, to the classical stopping

boundaries of O’Brien and Fleming (1979).

Click Compute to generate output for Des 2. Save this design in the current workbook by selecting the corresponding row in Output Preview and clicking the save icon. To compare Des 1 and Des 2, select both rows in the Output Preview using the Ctrl key and click the Output Summary icon. Both designs will be displayed in the Output Summary.

The maximum sample size with Des 2 is 473, which is only a slight increase over the
fixed sample size in Des 1. However, the expected sample size with Des 2 is 379
patients under H1 , a saving of almost 100 patients. In order to see the stopping
probabilities, double-click Des 2 in the Library.

The clear advantage of this sequential design resides in the high probability of
stopping by the second look, if the alternative is true, with a sample size of 315
patients, which is well below the requirements for a fixed sample study (467 patients).
Close the Output window before continuing.
Examining stopping boundaries and spending functions
You can plot the boundary values of Des 2 by clicking the plot icon on the Library toolbar,
and then clicking Stopping Boundaries. The following chart will appear:

You can choose a different Boundary Scale from the corresponding drop down box.
The available boundary scales include: Z scale, Score Scale, δ Scale, δ/σ Scale and
p-value scale. To plot the error spending function for Des 2, select Des 2 in the Library, click the plot icon in the toolbar, and then click Error Spending. The following chart will appear:

The above spending function is according to Lan and DeMets (1983) with
O’Brien-Fleming flavor, and for one-sided tests has the following functional form:


α(t) = 2 − 2Φ( zα/2 / √t )

Observe that very little of the total type-1 error is spent early on, but more is spent
rapidly as the information fraction increases, and reaches 0.025 at an information
fraction of 1.
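This behavior is easy to tabulate. The sketch below evaluates the cumulative spend at three equally spaced looks for the one-sided α = 0.025 used in this design:

    # Cumulative alpha spent by the Lan-DeMets O'Brien-Fleming spending function.
    from scipy.stats import norm

    alpha = 0.025
    z = norm.ppf(1 - alpha / 2)
    for t in (1 / 3, 2 / 3, 1.0):
        spent = 2 - 2 * norm.cdf(z / t ** 0.5)
        print(f"t = {t:.3f}: alpha(t) = {spent:.5f}")
    # Roughly 0.0001, 0.006, and 0.025: almost nothing is spent at the first look.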
Feel free to explore other plots by clicking the plot icon in the Library toolbar. Close all charts before continuing. To obtain the tables used to generate these plots, click the table icon.
Select Des 2 in the Library, and click the edit icon on the Library toolbar. In the
Boundary Info tab, change the Boundary Family from Spending Functions to
Wang-Tsiatis. The Wang-Tsiatis (1989) power boundaries are of the form

c(tj) = C(∆, α, K) tj^∆

for j = 1, 2, . . . , K, where ∆ is a shape parameter that characterizes the boundary shape and C(∆, α, K) is a positive constant. The choice ∆ = 0 will yield the classic O'Brien-Fleming stopping boundary, whereas ∆ = 0.5 will yield the classic Pocock stopping boundary. Other choices of the parameter in the range −0.5 to 0.5 are also permitted. Accept the default parameter 0 and click Compute to obtain the
sample size.

A new row will be added to the Output Preview with design name Des 3. Select all three rows in Output Preview using the Ctrl key and click the Output Summary icon. All three designs will be displayed in the Output Summary.

Note that the total sample size and the expected sample size under H1 for Des 3 are
close to those for Des 2. This is expected because the Wang-Tsiatis power family with
shape parameter 0 yields the classic O’Brien-Fleming stopping boundaries. Save this
design in the current workbook by selecting the corresponding row in Output Preview
and clicking the save icon on the Output Preview toolbar.

Select Des 2 in the Library, and click the edit icon on the Library toolbar. In the
Boundary Info tab, change the Spending Function from Lan-DeMets to Rho
Family. The Rho spending function was first published by Kim and DeMets (1987)
and was generalized by Jennison and Turnbull (2000). It has the following functional form:

α(t) = α·t^ρ,   ρ > 0

When ρ = 1, the corresponding stopping boundaries resemble the Pocock stopping boundaries. When ρ = 3, the boundaries resemble the O'Brien-Fleming boundaries. Larger values of ρ yield increasingly conservative boundaries.
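For the value ρ = 2 used below, the cumulative spend at three equally spaced looks is simple to tabulate (assuming α = 0.025, as in this design):

    # Rho-family error spending, alpha(t) = alpha * t^rho, with rho = 2.
    alpha, rho = 0.025, 2
    for t in (1 / 3, 2 / 3, 1.0):
        print(f"t = {t:.3f}: alpha(t) = {alpha * t ** rho:.5f}")
    # 0.00278, 0.01111, 0.02500: spends earlier than the O'Brien-Fleming flavor.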
Specify the parameter (ρ) as 2, and click Compute.

A new row will be added to the Output Preview with design name Des 4. Select all four rows in Output Preview using the Ctrl key and click the Output Summary icon. All the designs will be displayed in the Output Summary.

Observe that Des 4 requires a total sample size 14 subjects larger than that of Des 2. The expected sample size under H1 for Des 4 is only 351 patients, compared to 379 patients for Des 2 and 467 patients for Des 1. Save Des 4 to the Library by selecting the corresponding row in the Output Preview and clicking the save icon.

12.1.3 Simulation

Select Des 4 in the Library, and click the simulate icon in the toolbar. Alternatively, right-click
on Des 4 and select Simulate. A new window for simulation will appear. Click on the
Response Generation Info tab, and specify: Mean control = 9; Mean Treatment =
9; SD Control = 10.
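Under the hood, this is just repeated sampling and testing. The sketch below runs the same kind of check for the fixed-sample analogue of this design (Des 1: 467 subjects, margin −3, one-sided α = 0.025) rather than the three-look Des 4 that East simulates, so its rejection rate should likewise land near 90%:

    # Monte Carlo check of non-inferiority power at delta = 0 (H1).
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(42)
    sigma, margin, alpha = 10.0, -3.0, 0.025
    n_per_arm = 233                        # about half of N = 467
    crit = norm.ppf(1 - alpha)

    n_sims, rejections = 10_000, 0
    for _ in range(n_sims):
        x_c = rng.normal(9.0, sigma, n_per_arm)   # control mean 9
        x_t = rng.normal(9.0, sigma, n_per_arm)   # treatment mean 9 (delta = 0)
        delta_hat = x_t.mean() - x_c.mean()
        se = sigma * (2 / n_per_arm) ** 0.5       # known-variance Z test
        if (delta_hat - margin) / se > crit:
            rejections += 1
    print(rejections / n_sims)                    # roughly 0.90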

Click Simulate. Once the simulation run has completed, East will add an additional
row to the Output Preview labeled as Sim 1.
Select Sim 1 in the Output Preview and click the save icon. Double-click Sim 1 in the
Library. The simulation output details will be displayed.

The upper efficacy stopping boundary was crossed around 90% of the time, out of 10,000 simulated trials, which is consistent with the power of 90%. The exact result of the simulations may differ slightly, depending on the seed.

12.1.4 Interim Monitoring

Select Des 4 in the Library, and click the interim monitoring icon on the Library toolbar. Alternatively,
right-click on Des 4 and select Interim Monitoring.

The interim monitoring dashboard contains various controls for monitoring the trial,
and is divided into two sections. The top section contains several columns for
displaying output values based on the interim inputs. The bottom section contains four
charts, each with a corresponding table to its right. These charts provide graphical and
numerical descriptions of the progress of the clinical trial and are useful tools for
decision making by a data monitoring committee.
Although the study has been designed assuming three equally spaced analyses,
departures from this strategy are permissible using the spending function methodology
of Lan and DeMets (1983) and its extension to boundaries for early stopping in favor
of H0 proposed by Pampallona, Tsiatis and Kim (2001). At each interim analysis time
point, East will determine the amount of type-1 error probability and type-2 error
probability that it is permitted to spend based on the chosen spending functions
specified in the design. East will then re-compute the corresponding stopping
boundaries. This strategy ensures that the overall type-1 error does not exceed the
nominal significance level α.
Let us take the first look after accruing 200 subjects. The test statistic at look j for
testing non-inferiority is given by
Zj = (δ̂j − δ0) / SE(δ̂j)

where δ̂j and δ0 indicate the estimated treatment difference and the non-inferiority margin, respectively, and SE denotes the standard error. Suppose we have observed δ̂j = 2.3033 and SE(δ̂j) = 2.12132. With δ0 = −3, the value of the test statistic at the first look would be Z1 = (2.3033 + 3)/2.12132 = 2.5.
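The arithmetic is worth making explicit, since the calculator simply standardizes the distance between the estimate and the margin:

    # Interim test statistic for non-inferiority at look 1.
    delta_hat, margin, se = 2.3033, -3.0, 2.12132
    print(round((delta_hat - margin) / se, 3))    # 2.5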
To pass these values to East, click Enter Interim Data to open the Test Statistic
Calculator. Enter the following values: 200 for Cumulative Sample Size, 2.3033 as
Estimate of δ and 2.12132 as Standard Error of Estimate of δ. Click Recalc, and then OK.

The value of the test statistic is 2.498, which is very close to the stopping boundary of 2.634. The lower bound of the 97.5% repeated confidence interval (RCI) for δ is −3.29.


Click the icon in the Conditional Power chart located in the lower part of the dashboard. The conditional power at the current effect size of 2.303 is over 99.3%.

Suppose we take the next interim look after accruing 350 subjects. Enter 350 for
Cumulative Sample Size, 2.3033 for Estimate of δ and 1.71047 for Standard Error
of Estimate of δ. Click Recalc and OK to update the charts and tables in the
dashboard.

Now the stopping boundary is crossed and the following dialog box appears.

Click Stop. The dashboard will now include the following table.

The adjusted confidence interval and p-value are calculated according to the approach
proposed by Tsiatis, Rosner and Mehta (1984) and its later extension by Kim and DeMets (1987). The basic idea here is to search for the confidence bounds at which the observed result just becomes statistically significant.

12.1.5 Trial Design Using a t-Test (Single Look)

In Section 12.1 the sample size was obtained based on an asymptotic approximation of the distribution of the test statistic

(δ̂ − δ0) / √(var[δ̂])

If the study under consideration is small, the above asymptotic approximation of the distribution may be poor. Using Student's t-distribution with (n − 1) degrees of freedom, we may better size the trial to have appropriate power to reject H0. In East, this can be done by specifying the distribution of the test statistic as t. We shall
illustrate this by designing the study described in Section 12.1 that aims to
demonstrate that the AII inhibitor is non-inferior to the ACE inhibitor.

Select Des 1 from the Library. Click the edit icon on the toolbar. Change the Test Statistic from Z to t. The entries for the other fields need not be changed. Click Compute. East will add an additional row to the Output Preview labeled as Des 5. The required sample size is 469. Select the rows corresponding to Des 1 and Des 5 and click the Output Summary icon. This will display both designs in the Output Summary.

Des 5, which used the t distribution, requires us to commit a combined total of 469
patients to the study, up from 467 in Des 1, which used the normal distribution. The
extra patients are needed to compensate for the extra variability due to estimation of
var[δ̂].

12.2 Ratio of Means

12.2.1 Trial design
12.2.2 Designing the study
12.2.3 Simulation

Let µt and µc denote the means of the observations from the experimental treatment
(T) and the control treatment (C), respectively, and let σt² and σc² denote the corresponding variances of the observations. It is assumed that σt/µt = σc/µc, i.e. the
coefficient of variation CV = σ/µ is the same for t and c. Finally, let ρ = µt /µc .
For a non-inferiority trial with ratio of means we define the null hypothesis as
H0 : ρ ≤ ρ0 if ρ0 < 1
H0 : ρ ≥ ρ0 if ρ0 > 1
where ρ0 denotes the noninferiority margin. Consider the case when ρ0 < 1. Now
define δ = ln(ρ) = ln(µt ) − ln(µc ), so the null hypothesis becomes H0 : δ ≤ δ0 where
δ0 = ln(ρ0 ).
Since we can translate the ratio hypothesis into a difference hypothesis, we can
perform the test for difference as discussed in section 12.1 on log-transformed data.
Here, we need the standard deviation of the log-transformed data. If we are provided with the coefficient of variation (CV) instead, the standard deviation of the log-transformed data can be obtained using the relation sd = √(ln(1 + CV²)).

12.2.1 Trial design

For illustration, we consider the example cited by Laster and Johnson (2003): A
randomized clinical study of a new anti-hypertensive therapy known to produce fewer
side-effects than a standard therapy but expected to be almost 95% effective
(ρ1 = 0.95). To accept the new therapy, clinicians want a high degree of assurance that
it is at least 80% as effective in lowering blood pressure as the standard agent.
Accordingly we plan to design the study to test:

H0 : µt /µc ≤ 0.8
against
H1 : µt /µc > 0.8
Reductions in seated diastolic blood pressure are expected to average 10 mmHg (= µc) with the standard therapy, with a standard deviation of 7.5 mmHg (= σc). Therefore, the CV in the standard therapy is 7.5/10 = 0.75. We also assume that the CVs in both therapies are equal. We need to design a study that would have 90% power at ρ1 = 0.95 under H1 and maintain the one-sided type I error at 5%.

12.2.2 Designing the study

Start East afresh. Click Continuous: Two Samples under the Design tab, and then
click Parallel Design: Ratio of Means.
In the input window, select Noninferiority for Design Type. Select
Individual Means for Input Method and then specify the Mean Control (µc ) as
10, Noninferiority Margin (ρ0) as 0.8 and Ratio of Means (ρ1) as 0.95. Specify 0.75 as the value for Coeff. Var. The upper pane should appear as below:

Click Compute. This will calculate the sample size for this design, and the output is
shown as a row in the Output Preview located in the lower pane of this window. The
computed total sample size (636 subjects) is highlighted in yellow.

This design has default name Des 1. Select this design by clicking anywhere along the row in the Output Preview and click the Output Summary icon. Some of the design details will be
displayed in the Output Summary.

In the Output Preview toolbar, click the save icon to save this design to Wbk1 in the
Library. Double-click on Des 1 in the Library to see the details of the design.

Unequal allocation ratio
Since the profile of the standard therapy is well established and comparatively little is known about the new therapy, you may want to put more subjects on the new therapy. You can do this by specifying an allocation ratio greater than 1. Suppose you want 50% more subjects on the new therapy compared to the standard one. Then we need to specify the allocation ratio (nt/nc) as 1.5.
Create a new design by selecting Des 1 in the Output Preview, and clicking the edit icon on the Output Preview toolbar. In the Input, change the Allocation Ratio from 1 to 1.5. Click Compute to obtain the sample size for this design. A new row will be added labeled as Des 2. Save this design in the current workbook by selecting the corresponding row in Output Preview and clicking the save icon on the Output Preview toolbar. Select both rows in Output Preview using the Ctrl key and click the Output Summary icon.

t distribution test statistic
Create a new design by selecting Des 2 in the Output Preview, and clicking the edit icon on the Output Preview toolbar. In the Input, change the Test Statistic from Z to t. Click Compute to obtain the sample size for this design. A new row will be added labeled as Des 3.

A sample size of 664 will be needed, which is very close to the sample size of 662 obtained in Des 2 under the normal distribution.
Plotting
With Des 2 selected in the Library, click the plot icon on the Library toolbar, and then click Power vs Sample Size. The resulting power curve for this design will appear.

You can export this chart in one of several image formats (e.g., Bitmap or JPEG) by
clicking Save As.... Feel free to explore other plots as well. Once you have finished,
close all charts before continuing.


12.2.3 Simulation

Select Des 2 in the Library, and click the simulate icon in the toolbar. Alternatively, right-click
on Des 2 and select Simulate. A new Simulation window will appear. Click on the
Response Generation Info tab, and specify: Mean control = 10; Mean Treatment =
9.5; CV of Data Control = 0.75.

Click Simulate. Once the simulation run has completed, East will add an additional
row to the Output Preview labeled as Sim 1.
Select Sim 1 in the Output Preview and click the save icon. Double-click on Sim 1 in the
Library. The simulation output details will be displayed.

Out of 10,000 simulations, close to 90% rejected the null hypothesis in favor of non-inferiority. Therefore, the simulation result verifies that the design attains 90% power. The exact result might vary depending on the starting seed value chosen.

12.3 Difference of Means in Crossover Designs

12.3.1 Trial Design

In a 2 × 2 crossover design each subject is randomized to one of two sequence groups.
Subjects in sequence group 1 receive the test drug formulation (T) in the first period, have their outcome variable X recorded, wait out a washout period to ensure that the
drug is cleared from their system, then receive the control drug formulation (C) in
period 2 and finally have the measurement on X again. In sequence group 2, the order
in which the T and C are assigned is reversed. The table below summarizes this type of
trial design.
Group    Period 1    Washout    Period 2
1(TC)    Test        —          Control
2(CT)    Control     —          Test

The resulting data are commonly analyzed using a linear model. The response yijk in period j on subject k in sequence group i, where i = 1, 2, j = 1, 2, and k = 1, . . . , ni, is modeled as a linear function of an overall mean response µ, formulation effects τt and τc, period effects π1 and π2, and sequence effects γ1 and γ2. The fixed effects model can be displayed as:

Group    Period 1           Washout    Period 2
1(TC)    µ + τt + π1 + γ1   —          µ + τc + π2 + γ1
2(CT)    µ + τc + π1 + γ2   —          µ + τt + π2 + γ2

Let µt = µ + τt and µc = µ + τc denote the means of the observations from the test and control formulations, respectively, and let MSE denote the mean-squared error. In a noninferiority trial, we test H0 : δ ≤ δ0 against H1 : δ > δ0 if δ0 < 0, or H0 : δ ≥ δ0 against H1 : δ < δ0 if δ0 > 0, where δ0 indicates the noninferiority margin. East uses the following test statistic to test the above null hypothesis:

TL = [ (ȳ11 − ȳ12 − ȳ21 + ȳ22)/2 − δ0 ] / √( (σ̂²/2)(1/n1 + 1/n2) )

where ȳij is the mean of the observations from group i and period j, and σ̂² is the estimate of the error variance. TL follows Student's t distribution with (n1 + n2 − 2) degrees of freedom.

12.3.1 Trial Design

Consider a 2 × 2 crossover trial between a Test drug (T) and a Reference drug (C), where noninferiority needs to be established in terms of a selected treatment response. Let µt and µc denote the means of the Test and Reference drugs, respectively. Let δ = µt − µc be the difference in the averages. The noninferiority margin was set at −3. Accordingly we plan to design the study to test:

H0 : µt − µc ≤ −3
against
H1 : µt − µc > −3

For this study, we consider µc = 21.62 ng.h/mL and µt = 23.19 ng.h/mL under H1. Further, we assume the square root of the mean squared error, Sqrt(MSE), to be 2.5. We want to design a study that would have 90% power at δ1 = 23.19 − 21.62 = 1.57 under H1. We want to perform this test at a one-sided 0.025 level of significance.
Start East afresh. First, click Continuous: Two Samples on the Design tab, and then click Crossover Design: Difference of Means.
In the input window, select Noninferiority for Design Type. Select
Individual Means for Input Method and then specify the Mean Control (µc ) as
21.62 and Mean Treatment (µt ) as 23.19. Enter the Type I Error (α) as 0.025.
Select Sqrt(MSE) from the drop-down list and enter 2.5. Finally, enter the Noninferiority Margin (δ0) as −3. The upper pane should appear as below:

Click Compute. The sample size required for this design is highlighted in yellow.
Save this design in the current workbook by selecting the corresponding row in
Output Preview and clicking the save icon on the Output Preview toolbar. Double-click
Des 1 in the Library. This will display the design details. The total sample size required for Des 1 is only 9 subjects to establish non-inferiority with 90% power.

12.4 Ratio of Means in Crossover Designs

12.4.1 Trial Design

We consider the same anti-hypertensive therapy example discussed in section 12.2, but
this time we will assume that the data have come from a crossover design. We wish to
test the following hypotheses:
H0 : µt /µc ≤ 0.8
against
H1 : µt /µc > 0.8
We want the study to have at least 90% power at ρ1 = 0.95 and maintain a one-sided type I error of 5%. As before, we will consider CV = 0.75 for both treatment arms.
Start East afresh. First, click Continuous: Two Samples under the Design tab, and
then click Crossover Design: Ratio of Means.
In the input window, select Noninferiority for Design Type. Select
Individual Means for Input Method and then specify the Noninferiority
Margin (ρ0 ) as 0.8, Mean Control (µc ) as 10, and Mean Treatment (µt ) as 9.5.
Using the relationship between CV (= 0.75) and the standard deviation of log-transformed data mentioned in section 12.2, we have the standard deviation for the log-transformed data as 0.45. Specify 0.45 for Sqrt. of MSE Log. The upper pane should appear as below:

Click Compute. The sample size required for this design is highlighted in yellow in the Output Preview pane. Save this design in the current workbook by selecting the corresponding row in Output Preview and clicking the save icon on the Output Preview toolbar. Select Des 1 in the Library and click the details icon. This will display the design details.

In general, a crossover design requires fewer subjects compared to its parallel design
counterpart, and may be preferred whenever it is feasible.

13 Normal Equivalence Two-Sample

In many cases, the goal of a clinical trial is neither superiority nor non-inferiority, but
equivalence. In Section 13.1, the problem of establishing the equivalence with respect
to the difference of the means of two normal distributions using a parallel-group design
is presented. The corresponding problem of establishing equivalence with respect to
the log ratio of means is presented in Section 13.2. For the crossover design, the
problem of establishing equivalence with respect to the difference of the means is
presented in Section 13.3, and with respect to the log ratio of means in Section 13.4.

13.1 Difference in Means

13.1.1 Trial design
13.1.2 Simulation

In some experimental situations, we want to show that the means of two normal
distributions are “close”. For example, a test formulation of a drug (T) and the control
(or reference) formulation of the same drug (C) are considered to be bioequivalent if
the rate and extent of absorption are similar. Let µt and µc denote the means of the
observations from the test and reference formulations, respectively, and let σ 2 denote
the common variance of the observations. The goal is to establish that
δL < µt − µc < δU , where δL and δU are a-priori specified values used to define
equivalence. The two one-sided tests (TOST) procedure of Schuirmann (1987) is
commonly used for this analysis, and is employed in this section for a parallel-group
study.
Let δ = µt − µc denote the true difference in the means. The null hypothesis
H0 : δ ≤ δL or δ ≥ δU is tested against the two-sided alternative hypothesis
H1 : δL < δ < δU at level α, using TOST. Here we perform the following two tests
together:
Test1: H0L : δ ≤ δL against H1L : δ > δL at level α
Test2: H0U : δ ≥ δU against H1U : δ < δU at level α
H0 is rejected in favor of H1 at level α if and only if both H0L and H0U are rejected.
Note that this is the same as rejecting H0 in favor of H1 at level α if the (1 − 2α) 100%
confidence interval for δ is completely contained within the interval (δL , δU ).
Let N be the total sample size and µ̂t and µ̂c denote the estimates of the means of T and
C, respectively. Let δ̂ = µ̂t − µ̂c denote the estimated difference with standard error
se(δ̂). We use the following two test statistics to apply Test1 and Test2, respectively:

TL = (δ̂ − δL) / se(δ̂),   TU = (δ̂ − δU) / se(δ̂)


TL and TU are assumed to follow Student’s t-distribution with (N − 2) degrees of
freedom under H0L and H0U , respectively. H0L is rejected if TL ≥ t1−α,(N −2) , and
H0U is rejected if TU ≤ tα,(N −2) .
The null hypothesis H0 is rejected in favor of H1 if TL ≥ t1−α,(N −2) and
TU ≤ tα,(N −2) , or in terms of confidence intervals: Reject H0 in favor of H1 at level α
if

δL + 2 t1−α,(N−2) σ̂/√N < δ̂ < δU + 2 tα,(N−2) σ̂/√N.   (13.1)
We see that decision rule (13.1) is the same as rejecting H0 in favor of H1 if the
(1 − 2 α) 100% confidence interval for δ is entirely contained within interval (δL , δU ).
The above inequality (13.1) cannot hold if 4 t1−α,(N−2) σ̂/√N ≥ (δU − δL), in which case H0 is not rejected in favor of H1. Thus, we assume that 4 t1−α,(N−2) σ̂/√N < (δU − δL). The impact of this assumption was examined by Bristol (1993a).
The power or sample size of such a trial design is determined for a specified value of δ,
denoted δ1 , for a single-look study only. The choice δ1 = 0, i.e. µt = µc , is common.
For a specified value of δ1 , the power is given by
Pr(Reject H0) = 1 − τν(tα,ν | ∆1) + τν(−tα,ν | ∆2)   (13.2)
where ν = N − 2 and ∆1 and ∆2 are non-centrality parameters given by
∆1 = (δ1 − δL )/se(δ̂) and ∆2 = (δ1 − δU )/se(δ̂), respectively. tα,ν denotes the
upper α × 100% percentile from a Student’s t distribution with ν degrees of freedom.
τν (x|∆) denotes the distribution function of a non-central t distribution with ν degrees
of freedom and non-centrality parameter ∆, evaluated at x.
Since we don’t know the sample size N ahead of time, we cannot characterize the
bivariate t-distribution. Thus solving for sample size must be performed iteratively by
equating the formula (13.2) to the power 1 − β.
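As a concrete illustration of this iteration, the following is a minimal sketch (our own, not East code) that evaluates formula (13.2) with SciPy's non-central t distribution and searches over balanced, even total sample sizes. The function names tost_power and tost_sample_size and the simple linear search are assumptions made for this example.

    from scipy.stats import nct, t as t_dist

    def tost_power(N, delta1, deltaL, deltaU, sigma, alpha=0.05):
        """Power of Schuirmann's TOST via formula (13.2), balanced design."""
        nu = N - 2                            # degrees of freedom
        se = 2.0 * sigma / N ** 0.5           # se(delta_hat) with N/2 per arm
        t_crit = t_dist.ppf(1 - alpha, nu)    # upper alpha percentile
        ncp1 = (delta1 - deltaL) / se         # Delta_1
        ncp2 = (delta1 - deltaU) / se         # Delta_2
        return max(nct.cdf(-t_crit, nu, ncp2) - nct.cdf(t_crit, nu, ncp1), 0.0)

    def tost_sample_size(delta1, deltaL, deltaU, sigma, alpha=0.05, beta=0.10):
        """Smallest balanced (even) total N with power at least 1 - beta."""
        N = 4
        while tost_power(N, delta1, deltaL, deltaU, sigma, alpha) < 1 - beta:
            N += 2
        return N

    # Example of Section 13.1.1: sigma = 2.2, limits -1.3/1.3, delta1 = 0
    print(tost_sample_size(0.0, -1.3, 1.3, 2.2))   # expected near East's 126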

13.1.1 Trial design

Consider the situation where we need to establish equivalence between a test
formulation of capsules (T) and the marketed capsules (C). The response variable is
the change from baseline in total symptom score. Based on the studies conducted
during the development program, it is assumed that µc = 6.5. Based on this value, the
equivalence limits were set as −δL = δU = 1.3 (i.e., 20% of µc). We assume that the
common standard deviation is σ = 2.2. We want to have 90% power at µt = µc.
Start East afresh. Click Continuous: Two Samples on the Design tab and then click
Parallel Design: Difference of Means.

This will launch a new window. The upper pane of this window displays several fields
with default values. Select Equivalence for Design Type, and Individual
Means for Input Method. Enter 0.05 for Type I Error. Specify both Mean Control
(µc ) and Mean Treatment (µt ) as 6.5. We have assumed σ = 2.2. Enter this value for
Std. Deviation(σ). Also enter −1.3 for Lower Equivalence Limit (δL ) and 1.3 for
Upper Equivalence Limit (δU ). The upper pane should appear as below:

Click Compute. The output is shown as a row in the Output Preview located in the
lower pane of this window. The computed sample size (126 subjects) is highlighted in
yellow.

This design has default name Des 1. Select this design and click [icon] in the Output
Preview toolbar. Some of the design details will be displayed in the upper pane,
labeled as Output Summary.

A total of 126 subjects must be enrolled in order to achieve the desired 90% power
under the alternative hypothesis. Of these 126 subjects, 63 will be randomized to the
test formulation, and the remaining 63 to the marketed formulation. In the Output
Preview toolbar, select Des 1 and click [icon] to save this design to Wbk1 in the
Library.

Suppose that this sample size is not economically feasible and we want to examine the
power for a total sample size of 100. Select Des 1 in the Library, and click [icon] on
the Library toolbar. In the Input, click the radio button for Power, and enter Sample
Size (n) as 100.

Click Compute. This will add a new row to the Output Preview and the calculated
power is highlighted in yellow. We see that a power of 80.3% can be achieved with
100 subjects.

Suppose we want to see how the design parameters such as power, sample size and
treatment effect are interrelated. To visualize any particular relationship for Des 1, first
select Des 1 in the Library and then click [icon] in the toolbar. You will see a list of
available options. To plot power against sample size, click Power vs Sample Size.

Feel free to explore other plots and options available with them. Close the charts
before continuing.

13.1.2 Simulation

We wish to make sure that Des 1 has the desired power of 90% and maintains the
type I error of 5%. This examination can be conducted using simulation. Select Des 1
in the Library, and click [icon] in the toolbar. Alternatively, right-click Des 1 and
select Simulate. A new Simulation window will appear. Click on the Response
Generation Info tab. We will first simulate under H1. Leave the default values as
below, and click Simulate.

Once the simulation run has completed, East will add an additional row to the Output
Preview labeled as Sim 1.
Select Sim 1 in the Output Preview and click [icon]. Now double-click on Sim 1 in
the Library. The simulation output details, including the table below, will be
displayed.

Observe that out of the 10,000 simulated trials, the null hypothesis was rejected
around 90% of the time. (Note: The numbers on your screen might differ slightly
because you might be using a different starting seed for your simulations.)
Next we will simulate from a point that belongs to the null hypothesis. Consider
µc = 6.5 and µt = 7.8. Select Sim 1 in the Library and click the [icon] icon. Go to the
Response Generation Info tab in the upper pane and specify: Mean Control (µc) =
6.5 and Mean Treatment (µt) = 7.8.
Click Simulate to start the simulation. Once the simulation run has completed, East
will add an additional row to the Output Preview labeled as Sim 2. Select Sim 2 in the
Output Preview and click [icon]. Now double-click on Sim 2 in the Library. You
can see that when H0 is true, the simulated rejection rate is close to the specified
type I error of 5%.
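For readers who want to reproduce this kind of check outside East, here is a rough Monte Carlo sketch of the same TOST decision rule, assuming numpy and scipy are available; the function name, seed, and simulation loop are our own illustration, not East's implementation.

    import numpy as np
    from scipy.stats import t as t_dist

    def simulate_tost(mu_t, mu_c, sigma, N, deltaL, deltaU,
                      alpha=0.05, n_sims=10_000, seed=12345):
        """Monte Carlo estimate of the TOST rejection rate (balanced arms)."""
        rng = np.random.default_rng(seed)     # seed is an arbitrary choice
        n = N // 2                            # subjects per arm
        t_crit = t_dist.ppf(1 - alpha, 2 * n - 2)
        rejections = 0
        for _ in range(n_sims):
            x_t = rng.normal(mu_t, sigma, n)
            x_c = rng.normal(mu_c, sigma, n)
            sp2 = (x_t.var(ddof=1) + x_c.var(ddof=1)) / 2   # pooled variance
            se = np.sqrt(sp2 * 2 / n)
            d_hat = x_t.mean() - x_c.mean()
            if (d_hat - deltaL) / se >= t_crit and (d_hat - deltaU) / se <= -t_crit:
                rejections += 1
        return rejections / n_sims

    # Under H1 (mu_t = mu_c = 6.5): rejection rate should be near 0.90
    print(simulate_tost(6.5, 6.5, 2.2, 126, -1.3, 1.3))
    # At the boundary of H0 (mu_t = 7.8): rate should be near alpha = 0.05
    print(simulate_tost(7.8, 6.5, 2.2, 126, -1.3, 1.3))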

13.2 Ratio of Means

For some pharmacokinetic parameters, the ratio of the means is a more appropriate
measure of the distance between the treatments. Let µt and µc denote the means of the
observations from the test formulation (T) and the reference (C), respectively, and let
σt2 and σc2 denote the corresponding variances of the observations. It is assumed that
σt /µt = σc /µc , i.e. the coefficient of variation CV = σ/µ is the same for T and C.
Finally, let ρ = µt /µc .
The goal is to establish that ρL < ρ < ρU , where ρL and ρU are specified values used
to define equivalence. In practice, ρL and ρU are often chosen such that ρL = 1/ρU .
The two one-sided tests procedure of Schuirmann (1987) is commonly used for this
analysis, and is employed here for a parallel-group study.
The null hypothesis H0 : ρ ≤ ρL or ρ ≥ ρU is tested against the two-sided alternative
hypothesis H1 : ρL < ρ < ρU at level α, using two one-sided tests. Schuirmann (1987)
proposed working this problem on the natural logarithm scale. Thus, we are interested
in the parameter δ = ln(ρ) = ln(µt ) − ln(µc ) and the null hypothesis H0 : δ ≤ δL or
δ ≥ δU is tested against the two-sided alternative hypothesis H1 : δL < δ < δU at level
α, using two one-sided t-tests. Here δL = ln(ρL ) and δU = ln(ρU ).
Since we have translated the ratio hypothesis into a difference hypothesis, we can
perform the test for difference as discussed in section 13.1. Note that we need the
standard deviation for log-transformed data. However, if we are provided with
information on CV instead, the standard deviation of log-transformed data can be
obtained using the relation sd = √(ln(1 + CV²)).

13.2.1 Trial design

Suppose that the logarithm of area under the curve (AUC), a pharmacokinetic
parameter related to the efficacy of a drug, is to be analyzed to compare the two
formulations of a drug. We want to show that the two formulations are bioequivalent
by showing that the ratio of the means satisfies 0.8 < µt /µc < 1.25. Thus ρL = 0.8
and ρU = 1.25. Also, based on previous studies, it is assumed that the coefficient of
variation is CV = 0.25.
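Before walking through the East steps, note that this design can be cross-checked on the log scale with the same non-central t machinery used in Section 13.1. A hedged sketch follows; tost_power is repeated here so the snippet stands alone, and the even-N search is our own simplification (the East walkthrough below computes 55 subjects, so the search should land nearby).

    import math
    from scipy.stats import nct, t as t_dist

    def tost_power(N, d1, dL, dU, sigma, alpha=0.05):
        nu, se = N - 2, 2.0 * sigma / N ** 0.5
        tc = t_dist.ppf(1 - alpha, nu)
        return max(nct.cdf(-tc, nu, (d1 - dU) / se)
                   - nct.cdf(tc, nu, (d1 - dL) / se), 0.0)

    sd_log = math.sqrt(math.log(1 + 0.25 ** 2))     # sd of log data from CV
    dL, dU = math.log(0.8), math.log(1.25)          # log equivalence limits
    d1 = math.log(1.0)                              # assumed true ratio rho1 = 1

    N = 4
    while tost_power(N, d1, dL, dU, sd_log) < 0.90:  # search over even N
        N += 2
    print(N)   # should land near East's computed 55 subjects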
Start East afresh. Click Continuous: Two Samples on the Design tab and then click
Parallel Design: Ratio of Means.
This will launch a new window. The upper pane of this window displays several fields
with default values. Select Equivalence for Trial Type, and enter 0.05 for the Type
I Error. For the Input Method, specify Ratio of Means. Enter 1 for Ratio of
Means (ρ1 ), 0.8 for Lower Equivalence Limit (ρL ) and 1.25 for Upper Equivalence
Limit (ρU ). Specify 0.25 for Coeff. Var. The upper pane should appear as below:

Click Compute. The output is shown as a row in the Output Preview located in the
lower pane of this window. The computed total sample size (55 subjects) is highlighted
in yellow.

In the Output Preview toolbar, click [icon] to save this design to Wbk1 in the
Library. Double-click Des 1 in the Library to see the details of the design. Close
this output window before continuing.

Plotting
With Des 1 selected in the Library, click [icon] on the Library toolbar, and then
click Power vs Sample Size. The resulting power curve for this design will appear.

You can export this chart in one of several image formats (e.g., Bitmap or JPEG) by
clicking Save As.... Feel free to explore the other charts. Close all charts before
continuing.

13.2.2 Simulation

Suppose you suspect that CV will be smaller than 0.25; e.g., 0.2. Select Des 1 in the
Library, and click [icon] in the toolbar. Click on the Response Generation Info tab
and change C.V. of Data Control to 0.20.

Click Simulate. Once the simulation run has completed, East will add an additional
row to the Output Preview labeled as Sim 1. Select Sim 1 in the Output Preview and
click [icon]. Now double-click on Sim 1 in the Library. The simulation output
details will be displayed in the upper pane.

Observe that out of 10,000 simulated trials, the null hypothesis was rejected over 98%
of the time. (Note: The numbers on your screen might differ slightly depending on the
starting seed.)


13.3 Difference of Means in Crossover Designs

Crossover trials are widely used in clinical and medical research. The crossover
design is often preferred over a parallel design because, in the former, each subject
receives all the treatments and thus each subject acts as their own control. This leads to
the requirement of fewer subjects in a crossover design. In this section, we show how
East supports the design and simulation of such experiments with the difference of
means as the endpoint.
In a 2 × 2 crossover design each subject is randomized to one of two sequence groups
(or, treatment sequences). Subjects in sequence group 1 receive the test drug (T)
formulation in the first period, have their outcome variable X recorded, wait out a
washout period to ensure that the drug is cleared from their system, then receive the
control drug formulation (C) in period 2 and finally have the measurement on X again.
In sequence group 2, the order in which the T and C are assigned is reversed. The table
below summarizes this type of trial design.

Group    Period 1    Washout    Period 2
1 (TC)   Test        —          Control
2 (CT)   Control     —          Test

The resulting data are commonly analyzed using a linear model. The response yijk of
the kth subject in period j of sequence group i, where i = 1, 2, j = 1, 2, and
k = 1, . . . , ni, is modeled as a linear function of an overall mean response µ,
formulation effects τt and τc, period effects π1 and π2, and sequence effects γ1 and γ2.
The fixed effects model can be displayed as:

Group    Period 1            Washout    Period 2
1 (TC)   µ + τt + π1 + γ1    —          µ + τc + π2 + γ1
2 (CT)   µ + τc + π1 + γ2    —          µ + τt + π2 + γ2

Let µt = µ + τt and µc = µ + τc denote the means of the observations from the test
and control formulations, respectively, and let MSE denote the mean-squared error
obtained from fitting the model. This is simply the MSE from a crossover ANOVA
model for the 2 × 2 design (2 periods and 2 sequences).

In an equivalence trial, the goal is to establish δL < µt − µc < δU , where δL and δU
are specified values used to define equivalence. In practice, δL and δU are often chosen
such that δL = −δU. The two one-sided tests (TOST) procedure of Schuirmann (1987)
is commonly used for this analysis, and is employed here for a crossover study.
Let δ = µt − µc denote the true difference in the means. The null hypothesis
H0 : δ ≤ δL or δ ≥ δU is tested against the two-sided alternative hypothesis
H1 : δL < δ < δU at level α, using TOST. Here we perform the following two tests
together:
Test1: H0L : δ ≤ δL against H1L : δ > δL at level α
Test2: H0U : δ ≥ δU against H1U : δ < δU at level α
H0 is rejected in favor of H1 at level α if and only if both H0L and H0U are rejected.
Note that this is the same as rejecting H0 in favor of H1 at level α if the (1 − 2α) 100%
confidence interval for δ is completely contained within the interval (δL , δU ).
East uses the following test statistics to test the above two null hypotheses:

TL = [ (ȳ11 − ȳ12 − ȳ21 + ȳ22)/2 − δL ] / √[ (MSE/2)(1/n1 + 1/n2) ]

and

TU = [ (ȳ11 − ȳ12 − ȳ21 + ȳ22)/2 − δU ] / √[ (MSE/2)(1/n1 + 1/n2) ]

where ȳij is the mean of the observations from group i and period j. Both TL and TU
are distributed as Student's t with (n1 + n2 − 2) degrees of freedom.
The power of the test (i.e., the probability of declaring equivalence) depends on the true
value of µt − µc. The sample size (or power) is determined at a specified value of this
difference, denoted δ1. The choice δ1 = 0, i.e., µt = µc, is common. Note that the
power and the sample size depend only on δL, δU, δ1, and √MSE.
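These formulas can be evaluated directly; the following is a minimal sketch (not East code) of the crossover TOST power for a balanced design, in which se(δ̂) = √(2·MSE/N). The function name and the balanced-design assumption are our own.

    import math
    from scipy.stats import nct, t as t_dist

    def crossover_tost_power(N, d1, dL, dU, mse, alpha=0.05):
        """TOST power for a balanced 2x2 crossover: se = sqrt(2*MSE/N)."""
        nu = N - 2                          # n1 + n2 - 2 with n1 = n2 = N/2
        se = math.sqrt(2.0 * mse / N)
        tc = t_dist.ppf(1 - alpha, nu)
        return max(nct.cdf(-tc, nu, (d1 - dU) / se)
                   - nct.cdf(tc, nu, (d1 - dL) / se), 0.0)

    # Section 13.3.1 example: Sqrt(MSE) = 2.5, limits -3/3, delta1 = 1.57
    print(crossover_tost_power(54, 23.19 - 21.62, -3.0, 3.0, 2.5 ** 2))  # ~0.90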

13.3.1 Trial design

Standard 2 × 2 crossover designs are often recommended in regulatory guidelines to
establish bioequivalence of a generic drug with an off-patent brand-name drug. Consider a
2 × 2 bioequivalence trial between a Test drug (T) and a Reference drug (C), where
equivalence needs to be established in terms of the pharmacokinetic parameter Area
Under the Curve (AUC). Let µt and µc denote the average AUC for the Test and
Reference drugs, respectively. Let δ = µt − µc be the difference. To establish average
bioequivalence, the calculated 90% confidence interval of δ should fall within a
pre-specified bioequivalence limit. The bioequivalence limits are set at -3 and 3.
Accordingly we plan to design the study to test:
H0 : µt − µc ≤ −3 or µt − µc ≥ 3
against
H1 : −3 < µt − µc < 3
For this study, we consider µc = 21.62 ng.h/mL and µt = 23.19 ng.h/mL under H1.
Further, we assume that the mean squared error (MSE) from ANOVA would be 2.5.
We wish to design a study that would have 90% power at δ1 = 23.19 − 21.62 = 1.57
under H1 .
Start East afresh. Click Continuous: Two Samples on the Design tab and then click
Crossover Design: Difference of Means.
This will launch a new window. The upper pane displays several fields with default
values. Select Equivalence for Design Type, and Individual Means for
Input Method. Enter 0.05 for Type I Error. Specify the Mean Control (µc ) as 21.62
and Mean Treatment (µt ) as 23.19. Select Sqrt(MSE) from the drop-down list and
specify it as 2.5. Also specify the Lower Equiv. Limit (δL ) and Upper Equiv. Limit
(δU ) as -3 and 3, respectively. The upper pane should appear as below:

Click Compute. The output is shown as a row in the Output Preview located in the
lower pane of this window. The computed sample size (54 subjects) is highlighted in
yellow.

This design has default name Des 1. Select this design and click [icon] in the Output
Preview toolbar. Some of the design details will be displayed in the upper pane,
labeled as Output Summary.

In the Output Preview toolbar, click [icon] to save this design to Wbk1 in the
Library. Double-click Des 1 in the Library to see the details of the design. Close the
output window before continuing.

13.3.2 Simulation

Select Des 1 in the Library, and click [icon] in the toolbar. Alternatively, right-click
Des 1 and select Simulate. A new Simulation window will appear. Click on the
Response Generation Info tab, and specify: Mean Control = 21.62; Mean
Treatment = 23.19; Sqrt(MSE) = 2.5.

Leave the other default values and click Simulate. Once the simulation run has
completed, East will add an additional row to the Output Preview labeled as Sim 1.
Select Sim 1 in the Output Preview and click [icon]. Now double-click on Sim 1 in
the Library. The simulation output details will be displayed.

Notice that the number of rejections was close to 90% of the 10,000 simulated trials.
The exact result of the simulations may differ slightly, depending on the seed.
The simulation we have just done was under H1. We now wish to simulate from a point
that belongs to H0. Right-click Sim 1 in the Library and select Edit Simulation. Go to
the Response Generation Info tab in the upper pane and specify: Mean Control =
21.62; Mean Treatment = 24.62; Sqrt(MSE) = 2.5.

Click Simulate. Once the simulation run has completed, East will add an additional
row to the Output Preview labeled as Sim 2. Select Sim 2 in the Output Preview and
click [icon]. Now double-click on Sim 2 in the Library. The simulation output
details will be displayed.

Notice that the upper efficacy stopping boundary was crossed in very close to 5% of
the 10,000 simulated trials. The exact result of the simulations may differ slightly,
depending on the seed.

13.4 Ratio of Means in Crossover Designs

Often in crossover designs, an equivalence hypothesis is tested in terms of the ratio of
means. These types of trials are very popular for establishing bioavailability or
bioequivalence between two formulations in terms of pharmacokinetic parameters
(FDA guideline on BA/BE studies for orally administered drug products, 2003). In
particular, the FDA considers two products bioequivalent if the 90% confidence
interval of the ratio of the two means lies within (0.8, 1.25). In this section, we show
how East supports the design and simulation of such experiments with the ratio of
means as the endpoint.
In a 2 × 2 crossover design, each subject is randomized to one of two sequence groups.
We have already discussed the 2 × 2 crossover design in section 13.3. However, unlike
section 13.3, we are interested in the ratio of means. Let µt and µc denote the means of
the observations from the experimental treatment (T) and the control treatment (C),
respectively. In an equivalence trial with the ratio of means as the endpoint, the goal is
to establish ρL < ρ < ρU, where ρL and ρU are specified values used to define
equivalence. In practice, ρL and ρU are often chosen such that ρL = 1/ρU.
The null hypothesis H0 : ρ ≤ ρL or ρ ≥ ρU is tested against the two-sided alternative
hypothesis H1 : ρL < ρ < ρU at level α, using two one-sided tests. Schuirmann (1987)
proposed working this problem on the natural logarithm scale. Thus, we are interested
in the parameter δ = ln(ρ) = ln(µt ) − ln(µc ) and the null hypothesis H0 : δ ≤ δL or
δ ≥ δU is tested against the two-sided alternative hypothesis H1 : δL < δ < δU at level
α, using two one-sided t-tests. Here δL = ln(ρL ) and δU = ln(ρU ).
Since we have translated the ratio hypothesis into a difference hypothesis, we can
perform the test for difference as discussed in section 13.1. Note that we need the
standard deviation for log-transformed data. However, if we are provided with
information on CV instead, the standard deviation of log-transformed data can be
obtained using the relation sd = √(ln(1 + CV²)).

13.4.1 Trial design

Standard 2 × 2 crossover designs are often recommended in regulatory guidelines to
establish bioequivalence of a generic drug with an off-patent brand-name drug. Consider a
2 × 2 bioequivalence trial between a Test drug (T) and a Reference drug (C), where
equivalence needs to be established in terms of the pharmacokinetic parameter Area
Under the Curve (AUC). Let µt and µc denote the average AUC for the Test and
Reference drugs, respectively. Let ρ = µt /µc be the ratio of averages. To establish
average bioequivalence, the calculated 90% confidence interval of ρ should fall within
pre-specified bioequivalence limits. The bioequivalence limits are set at 0.8 and 1.25.
Accordingly, we plan to design the study to test:

H0 : µt /µc ≤ 0.8 or µt /µc ≥ 1.25
against
H1 : 0.8 < µt /µc < 1.25

For this study, we consider µc = 21.62 ng.h/mL and µt = 23.19 ng.h/mL under H1.
Further, we assume that the coefficient of variation (CV), or intrasubject variability, is
17%. For a lognormal population, the mean squared error (MSE) from the ANOVA of
log-transformed data and the CV are related by MSE = ln(1 + CV²). Thus, in this
case, MSE is 0.0285 and its square root is 0.169. We wish to design a study that would
have 90% power at ρ1 = 23.19/21.62 = 1.073 under H1.
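This calculation can be cross-checked on the log scale; a minimal sketch follows, assuming the balanced-design standard error √(2·MSE/N) as in Section 13.3 (the function name is our own and this is an illustration, not East's implementation).

    import math
    from scipy.stats import nct, t as t_dist

    def crossover_tost_power(N, d1, dL, dU, mse, alpha=0.05):
        nu, se = N - 2, math.sqrt(2.0 * mse / N)
        tc = t_dist.ppf(1 - alpha, nu)
        return max(nct.cdf(-tc, nu, (d1 - dU) / se)
                   - nct.cdf(tc, nu, (d1 - dL) / se), 0.0)

    CV = 0.17
    mse_log = math.log(1 + CV ** 2)          # 0.0285, as computed above
    d1 = math.log(23.19 / 21.62)             # ln(rho1) = ln(1.073)
    print(crossover_tost_power(23, d1, math.log(0.8), math.log(1.25), mse_log))
    # expected near 0.90, consistent with East's 23 subjects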
Start East afresh. Click Continuous: Two Samples on the Design tab and then click
Crossover Design: Ratio of Means.
This will launch a new window. The upper pane displays several fields with default
values. Select Equivalence for Design Type, and Individual Means for
Input Method. Enter 0.05 for Type I Error. Then specify the Mean Control (µc ) as
21.62 and Mean Treatment (µt ) as 23.19. Specify 0.169 for Sqrt. of MSE Log. Also
specify the Lower Equiv. Limit (ρL ) and Upper Equiv. Limit (ρU ) as 0.8 and 1.25,
respectively. The upper pane should appear as below:

Click Compute. The output is shown as a row in the Output Preview located in the
lower pane of this window. The computed sample size (23 subjects) is highlighted in
yellow.

This design has default name Des 1. Select this design and click [icon] in the Output
Preview toolbar. Some of the design details will be displayed in the upper pane,
labeled as Output Summary.

In the Output Preview toolbar, click [icon] to save this design to Wbk1 in the
Library. Double-click Des 1 in the Library to see the details of the design.

13.4.2 Simulation

Select Des 1 in the Library, and click [icon] in the toolbar. Alternatively, right-click
Des 1 and select Simulate. A new Simulation window will appear. Click on the
Response Generation Info tab, and specify: Mean Control = 21.62; Mean
Treatment = 23.19; Sqrt. of MSE Log = 0.169.

Click Simulate. Once the simulation run has completed, East will add an additional
row to the Output Preview labeled as Sim 1.
Select Sim 1 in the Output Preview and click [icon]. Now double-click on Sim 1 in
the Library. The simulation output details will be displayed.

Notice that the number of rejections was close to 90% of the 10,000 simulated trials.
The exact result of the simulations may differ slightly, depending on the seed.


14 Normal: Many Means

In this chapter, we will illustrate the various tests available in East for comparing more
than two continuous means.

14.1 One Way ANOVA

In a one-way ANOVA test, we wish to test the equality of means across R
independent groups. The two sample difference of means test for independent data is a
one-way ANOVA test for 2 groups. The null hypothesis H0 : µ1 = µ2 = . . . = µR is
tested against the alternative hypothesis H1 : µi ≠ µj for at least one pair (i, j), where
i, j = 1, 2, . . . , R.
Suppose n patients have been allocated randomly to R treatments. We assume that the
data of the R treatment groups comes from R normally distributed populations with
the same variance σ 2 , and with population means µ1 , µ2 , . . . , µR .
To design a one-way ANOVA study in East, first click Continuous: Many Samples
on the Design tab, and then click Factorial Design: One Way ANOVA.

In the upper pane of this window is the input dialog box. Consider a clinical trial with
four groups. Enter 4 in Number of Groups(R). The trial is comparing three different
doses of a drug against placebo in patients with Alzheimer’s disease. The primary
objective of the study is to evaluate the efficacy of these three doses, where efficacy is
assessed by difference from placebo in cognitive performance measured on a 13-item
cognitive subscale. On the basis of pilot data, the expected mean responses are 0, 1.5,
2.5, and 2, for Groups 1 to 4, respectively. The common standard deviation within each
group is σ = 3.5. We wish to compute the required sample size to achieve 90% power
with a type-1 error of 0.05. Enter these values into the dialog box as shown below.

Then, click Compute.

The design is shown as a row in the Output Preview, located in the lower pane of the
window. The computed sample size (203) is highlighted in yellow.

Select this row, then click [icon] in the Output Preview toolbar to save this design to
Workbook1 in the Library. With Des1 selected in the Library, click [icon] to
display the following output.

The output indicates that 51 patients per group are necessary to achieve the desired
power. Close this output window before continuing.
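For readers who wish to verify this computation outside East, a minimal sketch of balanced one-way ANOVA power via the non-central F distribution follows; the function name anova_power is our own, and this is an illustration rather than East's implementation.

    from scipy.stats import f as f_dist, ncf

    def anova_power(n_per_group, means, sigma, alpha=0.05):
        """Power of a balanced one-way ANOVA via the non-central F."""
        R = len(means)
        grand = sum(means) / R
        lam = n_per_group * sum((m - grand) ** 2 for m in means) / sigma ** 2
        df1, df2 = R - 1, R * n_per_group - R
        f_crit = f_dist.ppf(1 - alpha, df1, df2)
        return ncf.sf(f_crit, df1, df2, lam)     # P(F' > f_crit)

    # Alzheimer's example of Section 14.1: 51 per group, sigma = 3.5
    print(anova_power(51, [0, 1.5, 2.5, 2], 3.5))   # ~0.90, matching East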

14.1.1 One Way Contrast

A contrast of the population means is a linear combination of the µi’s. Let ci denote
the coefficient for population mean µi in the linear contrast. For a single contrast test
of many means in a one-way ANOVA, the null hypothesis is

H0 : Σ ci µi = 0

versus a two-sided alternative H1 : Σ ci µi ≠ 0, or a one-sided alternative
H1 : Σ ci µi < 0 or H1 : Σ ci µi > 0.
With Des1 selected in the Library, click [icon]. In the input dialog box, click the
checkbox titled Use Contrast, and select a two-sided test. Ensure that the means for
each group are the same as those from Des1 (0, 1.5, 2.5, and 2). In addition, we wish
to test the following contrast: −3, 1, 1, 1, which compares the placebo group with the
average of the three treatment groups. Finally, we may enter unequal allocation ratios
such as 1, 2, 2, 2, which implies that twice as many patients will be assigned to each
treatment group as in the placebo group. Click Compute.

The following row will be added to the Output Preview.

Given the above contrast and allocation ratios, this study would require a total of 265
patients to achieve 90% power.
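This contrast computation can be approximated by hand; the following is a hedged sketch using a normal approximation to the t distribution, so the helper names and the approximation itself are ours, and the result is close to, but not exactly, East's.

    from math import sqrt
    from scipy.stats import norm

    means = [0, 1.5, 2.5, 2]
    c = [-3, 1, 1, 1]              # contrast coefficients
    alloc = [1, 2, 2, 2]           # allocation ratios
    sigma, N, alpha = 3.5, 265, 0.05

    n = [N * a / sum(alloc) for a in alloc]          # per-group sizes
    effect = sum(ci * mi for ci, mi in zip(c, means))
    se = sigma * sqrt(sum(ci ** 2 / ni for ci, ni in zip(c, n)))
    z = norm.ppf(1 - alpha / 2)                      # two-sided critical value
    power = norm.sf(z - effect / se) + norm.cdf(-z - effect / se)
    print(power)                   # ~0.90, consistent with East's N = 265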

14.2 One Way Repeated Measures (Constant Correlation) ANOVA

As with the one-way ANOVA discussed in subsection 14.1, the repeated measures
ANOVA also tests for equality of means. However, in a repeated measures setting, all
patients are measured under all levels of the treatment. As the sample is exposed to
each condition in turn, the measurement of the dependent variable is repeated. Thus,
there is some correlation between observations from the same patient, which needs to
be accounted for. The constant correlation assumption means that the correlation
between any two observations from the same patient is the same constant. The
correlation parameter (ρ) is an additional parameter that needs to be specified in the
one way repeated measures study design.
Start East afresh. To design a repeated measures ANOVA study, click Continuous:
Many Samples, and click Factorial Design: One Way Repeated Measures
(Constant Correlation) ANOVA.
A specific type of repeated measures design is a longitudinal study in which patients
are followed over a series of time points. As an illustration, we will consider a
hypothetical study that investigated the effect of a dietary intervention on weight loss.
The endpoint is decrease in weight (in kilograms) from baseline, measured at four time
points: baseline, 4 weeks, 8 weeks, and 12 weeks. For Number of Levels, enter 4. We
wish to compute the required sample size to achieve 90% power with a type-1 error of
0.05. The means at each of the four levels are: 0, 1.5, 2.5, 2 for Levels 1, 2, 3, and 4,
respectively. Finally, enter σ = 5 and ρ = 0.2, and click Compute.

The design is shown as a row in the Output Preview, located in the lower pane of the
window. The computed sample size (330) is highlighted in yellow.

Select this row, then click [icon] in the Output Preview toolbar to save this design to
Workbook1 in the Library. With Des1 selected in the Library, click [icon] to
display the following output.

The output indicates that 83 patients per group are necessary to achieve the desired
power.
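One common approximation to this computation uses a non-central F test with the non-centrality inflated by 1/(1 − ρ); the sketch below is our own illustration under that assumption and may differ slightly from East's exact method.

    from scipy.stats import f as f_dist, ncf

    def rm_anova_power(n_subjects, means, sigma, rho, alpha=0.05):
        """Approximate power for one-way repeated measures ANOVA
        under the constant correlation (compound symmetry) assumption."""
        k = len(means)
        grand = sum(means) / k
        lam = (n_subjects * sum((m - grand) ** 2 for m in means)
               / (sigma ** 2 * (1 - rho)))
        df1, df2 = k - 1, (n_subjects - 1) * (k - 1)
        f_crit = f_dist.ppf(1 - alpha, df1, df2)
        return ncf.sf(f_crit, df1, df2, lam)

    # Section 14.2 example: 4 levels, sigma = 5, rho = 0.2
    print(rm_anova_power(83, [0, 1.5, 2.5, 2], 5, 0.2))   # ~0.90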

14.3 Two Way ANOVA

In a two-way ANOVA, there are two factors to consider, say A and B. We can design a
study to test equality of means across factor A, factor B, or the interaction between
A and B. In addition to the common standard deviation σ, you also need to specify the
cell means.
For example, consider a study to determine the combined effects of sodium restriction
and alcohol restriction on lowering of systolic blood pressure in hypertensive men
(Parker et al., 1999). Let Factor A be sodium restriction and Factor B be alcohol
restriction. There are two levels of each factor (restricted vs usual sodium intake, and
restricted vs usual alcohol intake), producing four groups. Each patient is randomly
assigned to one of these four groups.
Start East afresh. Click Continuous: Many Samples, and click Factorial Design:
Two-Way ANOVA.
Enter a type-1 error of 0.05. Then enter the following values in the input dialog box as
shown below: Number of Factor A Levels as 2, Number of Factor B Levels as 2,
Common Std. Dev. as 2, A1/B1 as 0.5, A1/B2 as 4.7, A2/B1 as 0.4, and A2/B2 as
6.9. We will first select Power for A, then click Compute.

Leaving the same input values, click Compute after selecting Power for B in the input
window. Similarly, click Compute after selecting Power for AB. The Output
Preview should now have three rows, as shown below.

In order to achieve at least 90% power to detect a difference across means in factor A,
factor B, as well as the interaction, a sample size of 156 patients is necessary (i.e.,
Des1). Select Des1 in the Output Preview, then click [icon] in the toolbar to save it to
Workbook1 in the Library. With Des1 selected in the Library, click [icon] to
display the following output.

The output indicates that 39 patients per group are necessary to achieve 90% power to
test the main effect of A.
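For a rough cross-check of these three tests, the 2 × 2 cell-mean contrasts can be evaluated with a normal approximation; the sketch below is our own illustration, not East's computation.

    from math import sqrt
    from scipy.stats import norm

    mu = {("A1", "B1"): 0.5, ("A1", "B2"): 4.7,
          ("A2", "B1"): 0.4, ("A2", "B2"): 6.9}
    sigma, n_per_cell, alpha = 2.0, 39, 0.05

    # main effects are averages of cell means; AB is the cell-mean interaction
    effA = (mu["A2", "B1"] + mu["A2", "B2"] - mu["A1", "B1"] - mu["A1", "B2"]) / 2
    effB = (mu["A1", "B2"] + mu["A2", "B2"] - mu["A1", "B1"] - mu["A2", "B1"]) / 2
    effAB = mu["A1", "B1"] - mu["A1", "B2"] - mu["A2", "B1"] + mu["A2", "B2"]

    z = norm.ppf(1 - alpha / 2)
    for name, eff, se in [("A", effA, sigma * sqrt(1.0 / n_per_cell)),
                          ("B", effB, sigma * sqrt(1.0 / n_per_cell)),
                          ("AB", effAB, sigma * sqrt(4.0 / n_per_cell))]:
        print(name, norm.sf(z - abs(eff) / se))   # ~0.90 for A; higher for B, AB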

15 Multiple Comparison Procedures for Continuous Data

It is often the case that multiple objectives are to be addressed in one single trial. These
objectives are formulated into a family of hypotheses. Formal statistical hypothesis
tests can be performed to see if there is strong evidence to support clinical claims.
The type I error is inflated when one considers the inferences together as a family, and
failure to compensate for such multiplicity can have adverse consequences. For
example, a drug could be approved when actually it is not better than placebo. Multiple
comparison (MC) procedures provide a guard against inflation of the type I error due to
multiple testing. The probability of making at least one type I error is known as the
familywise error rate (FWER). East supports several parametric and p-value based MC
procedures. In this chapter we explain how to design a study using a chosen MC
procedure that strongly maintains the FWER.
In East, one can calculate the power from the simulated data under different MC
procedures. With this information on power, one can choose the MC procedure that
provides maximum power while strongly maintaining the FWER. The MC procedures
included in East strongly control the FWER. Strong control of the FWER refers to
controlling the probability of incorrectly rejecting at least one true null hypothesis,
whatever the configuration of true and false null hypotheses. In contrast, weak control
of the FWER controls this probability only under the assumption that all null
hypotheses are true. East supports the following MC procedures based on a continuous
endpoint.
Category        Procedure                Reference
Parametric      Dunnett's Single Step    Dunnett CW (1955)
                Dunnett's Step Down      Dunnett CW and Tamhane AC (1991)
                Dunnett's Step Up        Dunnett CW and Tamhane AC (1992)
P-value Based   Bonferroni               Bonferroni CE (1935, 1936)
                Sidak                    Sidak Z (1967)
                Weighted Bonferroni      Benjamini Y and Hochberg Y (1997)
                Holm's Step Down         Holm S (1979)
                Hochberg's Step Up       Hochberg Y (1988)
                Hommel's Step Up         Hommel G (1988)
                Fixed Sequence           Westfall PH, Krishen A (2001)
                Fallback                 Wiens B, Dimitrienko A (2005)


15.1 Parametric Procedures

Assume that there are k arms including the placebo arm. Let ni be the number of
subjects in the i-th treatment arm (i = 0, 1, · · · , k − 1), and let N = Σi ni be the total
sample size, where arm 0 refers to placebo. Let Yij be the response from subject j in
treatment arm i and yij be the observed value of Yij (i = 0, 1, · · · , k − 1,
j = 1, 2, · · · , ni). Suppose that

Yij = µi + eij        (15.1)

where eij ∼ N(0, σ²). We are interested in the following hypotheses:
For the right tailed test: Hi : µi − µ0 ≤ 0 vs Ki : µi − µ0 > 0
For the left tailed test: Hi : µi − µ0 ≥ 0 vs Ki : µi − µ0 < 0
For the global null hypothesis at least one of the Hi is rejected in favor of Ki after
controlling for FWER. Here Hi and Ki refer to null and alternative hypotheses,
respectively, for comparison of i-th arm with the placebo arm.
East supports three parametric MC procedures - single step Dunnett test (Dunnett,
1955), step-down Dunnett test and step-up Dunnett test. These procedures make two
parametric assumptions - normality and homoscedasticity. Let ȳi be the sample mean
for treatment arm i and s² be the pooled sample variance for all arms. The test statistic
for comparing the treatment effect of arm i with placebo can be defined as

Ti = (ȳi − ȳ0) / [ s √(1/ni + 1/n0) ]        (15.2)

Let ti be the observed value of Ti; these observed values for the k − 1 treatment arms
can be ordered as t(1) ≥ t(2) ≥ · · · ≥ t(k−1).
Detailed formulae to obtain the critical boundaries for the single step and step-down
Dunnett tests are discussed in Appendix H.
In the single step Dunnett test, the critical boundary remains the same for all k − 1
individual tests. Let cα be the critical boundary that maintains a FWER of α and p̃i be
the adjusted p-value associated with the comparison of the i-th arm with the placebo
arm. Then for a right tailed test, Hi is rejected if ti > cα, and for a left tailed test, Hi is
rejected if ti < cα.
Unlike the single step Dunnett test, the step-down Dunnett test does not use the same
critical boundary for all k − 1 individual tests. Let ci be the critical boundary and p̃i be
the adjusted p-value associated with the comparison of the i-th arm with the placebo
arm. For a right tailed test, H(i) is rejected if t(i) > ci and H(1), · · · , H(i−1) have
already been rejected. For a left tailed test, H(i) is rejected if t(i) < ck−i and
H(i+1), · · · , H(k−1) have already been rejected.
Unlike the step-down test, the step-up Dunnett procedure starts with the least
significant test statistic, i.e., t(k−1). Let ci be the critical boundary and p̃i be the
adjusted p-value associated with the comparison of the i-th arm with the placebo arm.
The i-th ordered test statistic, t(i), is tested if and only if none of H(i+1), · · · , H(k−1)
are rejected. If H(i) is rejected, then stop and reject all of H(i), · · · , H(1). For a right
tailed test, H(i) is rejected if t(i) > c(i), and for a left tailed test, H(i) is rejected if
t(i) < c(i).
For both the single step and step-down Dunnett tests, the global null hypothesis is
rejected in favor of at least one right tailed alternative if H(1) is rejected, and in favor
of at least one left tailed alternative if H(k−1) is rejected.
Single step Dunnett test and step-down Dunnett test can be seen as the parametric
version of Bonferroni procedure and Holm procedure, respectively. Parametric tests
are uniformly more powerful than the corresponding p-value based tests when the
parametric assumption holds or at least approximately holds, especially when there are
a large number of hypotheses. Parametric procedures may not control FWER if the
standard deviations are different.
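For readers who want to cross-check single step Dunnett comparisons outside East, recent versions of SciPy expose such a test; a minimal sketch follows, assuming scipy.stats.dunnett is available (SciPy 1.11 or later) and using simulated data rather than East's trial example.

    import numpy as np
    from scipy.stats import dunnett

    rng = np.random.default_rng(1)
    placebo = rng.normal(0.0, 5.0, 50)                    # control arm
    doses = [rng.normal(m, 5.0, 50) for m in (0.45, 1.5, 3.0)]

    # right-tailed comparisons of each dose against the control arm
    res = dunnett(*doses, control=placebo, alternative="greater")
    print(res.pvalue)   # Dunnett-adjusted p-values for the three comparisons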

15.1.1 Dunnett's single step

Dunnett’s Single Step procedure is described below with an example.
Example: Alzheimer’s Disease Clinical Trial
In this section, we will use an example to illustrate how to design a study using the
MCP module in East. This is a randomized, double-blind, placebo-controlled, parallel
study to assess three different doses (0.3 mg, 1 mg and 2 mg) of a drug against placebo
in patients with mild to moderate probable Alzheimer’s disease. The primary objective
of this study is to evaluate the safety and efficacy of the three doses. The drugs are
administered daily for 24 weeks to subjects with Alzheimer’s disease who are either
receiving concomitant treatment or not receiving any co-medication. The efficacy is
assessed by cognitive performance based on the Alzheimer’s disease assessment
scale-13-item cognitive sub-scale. From previous studies, it is estimated that the
common standard deviation of the efficacy measure is 5. It is expected that the
dose-response relationship follows a straight line within the dose range of interest.
We would like to calculate the power for a total sample size of 200. This will be a
balanced study with a one-sided 0.025 significance level to detect at least one dose
with significant difference from placebo. We will show how to simulate the power of
such a study using the multiple comparison procedures listed above.
Designing the study
First, click [icon] (Continuous: Many Samples) on the Design tab and then click
Multi-Arm Design: Pairwise Comparisons to Control - Difference of Means. This
will launch a new window.
There is a box at the top with the label Number of Arms. For our example, we have 3
treatment groups plus a placebo. So enter 4 for Number of Arms. Under the Design
Parameters tab, there are several fields which we will fill in. First, there is a box with
the label Side. Here you need to specify whether you want a one-sided or two-sided
test. Currently, only one-sided tests are available. Under it you will see the box with
label Sample Size (n). For now skip this box and move to the next dropdown box with
the label Rejection Region. If left tail is selected, the critical value for the test is
located in the left tail of the distribution of the test statistic. Likewise, if right tail is
selected the critical value for the test is located in the right tail of the distribution of the
test statistic. For our example, we will select Right Tail. Under that, there is a box
with the label Type - 1 Error (α). This is where you need to specify the FWER. For
our example, enter 0.025. Now go to the box with the label Total Sample Size. Here
we input the total number of subjects, including those in the placebo arm. For this
example, enter 200.
To the right, there will be a heading with the title Multiple Comparison Procedures.
In the parametric grouping, check the box next to Dunnett’s single step, as
this is the multiple comparison procedure we are illustrating in this subsection. After
entering these parameters your screen should now look like this:

Now click on Response Generation Info tab. You will see a table titled Table of
Proportions. In this table we can specify the labels for treatment arms. Also you have
to specify the dose level if you want to generate means through dose-response curve.
Since we are comparing placebo and 3 dose groups, enter Placebo, Dose1, Dose2
and Dose3 in the 4 cells in first column labeled as Arm.
The table contains the default mean and standard deviation for each arm which we will
change later. There are two check boxes in this tab above the table. The first is labeled
Generate Means through DR Curve. There are two ways to specify the mean
response for each arm: 1) generate means for each arm through a dose-response curve
or 2) Specify the mean directly in the Table of Proportions. To specify the mean
directly just enter the mean value for each arm in the table in Mean column. However,
in this example, we will generate means through dose response curve. In order to do
this, check Generate Means through DR Curve box. Once you check this box you
will notice two things. First, an additional column with label Dose will appear in the
table. Here you need to enter the dose levels for each arm. For this example, enter 0,
0.3, 1 and 2 for Placebo, Dose1, Dose2 and Dose3 arms, respectively. Secondly, you
will notice an additional section will appear to the right which provides the option to
generate the mean response from four families of parametric curves which are Four
Parameter Logistic, Emax, Linear and Quadratic. The technical details about each
curve can be found in the Appendix H.
Here you need to choose the appropriate parametric curve from the drop-down list
under Dose Response Curve and then you have to specify the parameters associated
with these curves. For the Alzheimer’s disease example, suppose the dose response
follows a linear curve with intercept 0 and slope 1.5. To do this, we would need to
select ”Linear” from the dropdown list. To the right of this dropdown box, specify the
parameter values of the selected curve family by inputting 0 for Intercept(E0) and 1.5
for Slope(δ). After specifying this, the mean values in the table will be changed
accordingly. Here we are generating the means using the following linear
dose-response curve:
E(Y | Dose) = E0 + δ × Dose        (15.3)
For placebo, the mean can be obtained by specifying Dose as 0 in the above equation.
This gives the mean for placebo arm as 0. For arm Dose1, mean would be
0 + 1.5 × 0.3, or 0.45. Similarly, the means for the arms Dose2 and Dose3 are
obtained as 1.5 and 3. You can verify that the values in the Mean column are changed
to 0, 0.45, 1.5 and 3 for the four arms, respectively.

Now click Plot DR Curve to see the plot of means against the dose levels.

You will see the linear dose response curve that intersects the Y-axis at 0. Now close
this window. The dose response curve generates means, but still we have to specify the
standard deviation. Standard deviation for each arm could be either equal or different.
To specify the common standard deviation, check the box with label Common
Standard Deviation and specify the common standard deviation in the field next to it.
When standard deviations for different arms are not all equal, the standard deviations
need to be directly specified in the table in column labeled with Std. Dev.. In this
example, we are considering a common standard deviation of 5. So check the box for
Common Standard Deviation and specify 5 in the field next to it. Now the column
Std.Dev. will be updated with 5 for all the four arms. As we have finished specifying
all the fields in the Response Generation Info tab, this should appear as below.

Click on the Include Options button located in the upper-right corner of the
Simulation window and check Randomized Info. This will add an additional tab,
Randomization Info. Now click on the Randomization Info tab. The second column of
the Table of Allocation displays the allocation ratio of each treatment arm to that
of the control arm. The cell for the control arm is always one and is not editable. Only
the cells for treatment arms other than control need to be filled in. The default value for
each treatment arm is one, which represents a balanced design. For the Alzheimer’s
disease example, we consider a balanced design and leave the default values for the
allocation ratios unchanged. Your screen should now look like this:

The last tab is Simulation Control Info. Specify 10000 as Number of Simulations
and 1000 as Refresh Frequency in this tab. The box labeled Random Number
Generator is where you can set the seed for the random number generator. You can
either use the clock as the seed or choose a fixed seed (in order to replicate past
simulations). The default is the clock and we will use that. The box on the right hand
side is labeled Output Options. This is where you can choose to save summary
statistics for each simulation run and/or to save subject level data for a specific number
of simulation runs. To save the output for each simulation, check the box with label
Save summary statistics for every simulation run. Now click Simulate to start the
simulation. Once the simulation run has completed, East will add an additional row to
the Output Preview labeled as Sim 1.

Note that a simulation node Sim 1 is created in the library. Also note that another node
is appended to the simulation node with label SummaryStat which contains detailed
simulation summary statistics for each simulation run. Select Sim 1 in the Output
Preview and click the [icon] icon to save the simulation in the library. Now
double-click on Sim 1 in the Library. The simulation output details will be displayed
in the right pane.

The first section in the output is the Hypothesis section. In our situation, we are testing
3 hypotheses. We are comparing the mean score on the Alzheimer’s disease
assessment scale (13-item cognitive sub-scale) for each dose with that of placebo. That
is, we are testing the 3 hypotheses:
H1 : µ1 = µ0  vs  K1 : µ1 > µ0
H2 : µ2 = µ0  vs  K2 : µ2 > µ0
H3 : µ3 = µ0  vs  K3 : µ3 > µ0

Here, µ0, µ1, µ2 and µ3 represent the population mean scores on the Alzheimer’s
disease assessment scale for the placebo, 0.3 mg, 1 mg and 2 mg dose groups,
respectively. Also, Hi and Ki are the null and alternative hypotheses, respectively, for
the i-th test.
The Input Parameters section provides the design parameters that we specified
earlier. The next section Overall Power gives us estimated power based on the
simulation. The second line gives us the global power, which is about 75%. Global
power indicates the power to reject the global null H0 : µ1 = µ2 = µ3 = µ0. Thus, the
global power indicates that the global null will be rejected about 75% of the time; in
other words, at least one of H1, H2 and H3 is rejected on about 75% of occasions.
Global power is useful to show the existence of a dose-response relationship, and
dose-response may be claimed if any of the doses in the study is significantly different
from placebo.
The next line displays the conjunctive power. Conjunctive power indicates the
proportion of cases in the simulation where all the Hi’s that are truly false were
rejected. In this example, all the Hi’s are false; therefore, conjunctive power is the
proportion of cases where all of H1, H2 and H3 were rejected. For this simulation the
conjunctive power is only about 2.0%, which means that in only 2.0% of the
simulations were all of H1, H2 and H3 rejected.
Disjunctive power indicates the proportion of simulations rejecting at least one of
those Hi’s that are truly false. The main distinction between global and disjunctive
power is that the former counts any rejection, whereas the latter looks for rejections
only among those Hi’s which are false. Since here all of H1, H2 and H3 are false, the
global and disjunctive power ought to be the same.
The next section gives us the marginal power for each hypothesis. Marginal power is
the proportion of times a particular hypothesis is rejected after applying the
multiplicity adjustment. Based on the simulation results, H1 is rejected about 3% of
the time, H2 about 20% of the time, and H3 a little more than 70% of the time.
Recall that we asked East to save the simulation results for each simulation run.
Open this file by clicking on SummaryStat in the library and you will see that it
contains 10,000 rows; each row represents the results of a single simulation. Find the 3
columns labeled Rej Flag 1, Rej Flag 2 and Rej Flag 3, respectively.
These columns represent the rejection status of H1, H2 and H3, respectively. A value
of 1 indicates rejection in that particular simulation; otherwise the null is not rejected.
The proportion of 1’s in Rej Flag 1 is the marginal power to reject H1; similarly,
the marginal powers for H2 and H3 can be found from Rej Flag 2 and
Rej Flag 3, respectively. To obtain the global and disjunctive power, count the total
number of cases where at least one of H1, H2 and H3 has been rejected and then
divide by the total number of simulations, 10,000. Similarly, to obtain the conjunctive
power, count the total number of cases where all of H1, H2 and H3 have been rejected
and then divide by the total number of simulations, 10,000.
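These bookkeeping steps are easy to script; the sketch below is our own illustration of computing marginal, global, disjunctive, and conjunctive power from a matrix of rejection flags (here random stand-in data, not East's SummaryStat file).

    import numpy as np

    rng = np.random.default_rng(0)
    flags = rng.integers(0, 2, size=(10_000, 3))     # stand-in for Rej Flag 1..3
    truly_false = np.array([True, True, True])       # which H_i are actually false

    marginal = flags.mean(axis=0)                    # marginal power per hypothesis
    global_power = flags.any(axis=1).mean()          # at least one H_i rejected
    disjunctive = flags[:, truly_false].any(axis=1).mean()
    conjunctive = flags[:, truly_false].all(axis=1).mean()
    print(marginal, global_power, disjunctive, conjunctive)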

Next we will consider an example to show how global and disjunctive power differ
from each other. Select Sim 1 in the Library and click [icon]. Now go to the
Response Generation Info tab and uncheck the Generate Means Through DR
Curve box. The table will now have only three columns. Specify Placebo, Dose1,
Dose2 and Dose3 in the 4 cells of the first column labeled Arm, and enter 0, 0, 1 and
1.2 in the 4 cells of the second column labeled Mean.

Here we are generating the response for placebo from the distribution N(0, 5²), for
Dose1 from N(0, 5²), for Dose2 from N(1, 5²), and for Dose3 from N(1.2, 5²). Now
click Simulate to start the simulation. Once the simulation run has completed, East
will add an additional row to the Output Preview labeled as Sim 2.

For Sim 2, the global power and disjunctive power are 17.9% and 17.6%, respectively.
To understand why, we need to open the saved simulation data for Sim 2. The total
number of cases where at least one of H1, H2 and H3 is rejected is 1790, and dividing
this by the total number of simulations, 10,000, gives the global power of 17.9%. In
contrast, the total number of cases where at least one of H2 and H3 (the truly false
hypotheses) is rejected is 1760, and dividing this by 10,000 gives the disjunctive power
of 17.6%. The exact result of the simulations may differ slightly, depending on the seed.
15.1.2 Dunnett's step-down and step-up procedures

Dunnett’s Step-Down procedure is described below using the same Alzheimer’s
Disease example from the previous section 15.1.1 on Dunnett’s Single Step.
Since the design specifications remain the same except that we are using Dunnett’s
step-down (and step-up) in place of the single step Dunnett test, we can set up the
simulation in this section with little effort. Select Sim 1 in the Library and click
[icon]. Now go to the Design Parameters tab. There, in the Multiple Comparison
Procedures box, uncheck the Dunnett’s single step box and check the Dunnett’s
step-down and Dunnett’s step-up boxes.

Now click Simulate to obtain power. Once the simulation run has completed, East will
add two additional rows to the Output Preview labeled as Sim 3 and Sim 4.
The Dunnett step-down and step-up procedures have global and disjunctive power of
close to 75% and conjunctive power of close to 4%. To see the marginal power for
each test, select Sim 3 and Sim 4 in the Output Preview and click the [icon] icon.
Now, double-click on Sim 3 in the Library. The simulation output details for the
Dunnett step-down procedure will be displayed in the right pane.

The marginal powers for the comparisons of Dose1, Dose2 and Dose3 using the
Dunnett step-down procedure are close to 5%, 23% and 74%, respectively. Similarly,
one can find the marginal power for the individual tests in the Dunnett step-up
procedure.

15.2 p-value based Procedures

p-value based procedures strongly control the FWER regardless of the joint
distribution of the raw p-values, as long as the individual raw p-values are legitimate
p-values. Assume that there are k arms including the placebo arm. Let ni be the
number of subjects in the i-th treatment arm (i = 0, 1, · · · , k − 1), and let N = Σi ni
be the total sample size, where arm 0 refers to placebo. Let Yij be the response from
subject j in treatment arm i and yij be the observed value of Yij
(i = 0, 1, · · · , k − 1, j = 1, 2, · · · , ni). Suppose that

Yij = µi + eij        (15.4)

where eij ∼ N(0, σi²). We are interested in the following hypotheses:
For the right tailed test: Hi : µi − µ0 ≤ 0 vs Ki : µi − µ0 > 0
For the left tailed test: Hi : µi − µ0 ≥ 0 vs Ki : µi − µ0 < 0
For the global null hypothesis at least one of the Hi is rejected in favor of Ki after
controlling for FWER. Here Hi and Ki refer to null and alternative hypotheses,
respectively, for comparison of i-th arm with the placebo arm.
Let ȳi be the sample mean for treatment arm i, s²i be the sample variance for the i-th
arm, and s² be the pooled sample variance for all arms. For the unequal variance case,
the test statistic for comparing the treatment effect of arm i with placebo can be
defined as

Ti = (ȳi − ȳ0) / √( s²i/ni + s²0/n0 )        (15.5)

For the equal variance case, one needs to replace s²i and s²0 by the pooled sample
variance s². In both cases, Ti follows Student’s t distribution; however, the degrees of
freedom differ between the equal variance and unequal variance cases. For the equal
variance case, the degrees of freedom are N − k. For the unequal variance case, the
degrees of freedom are subject to the Satterthwaite correction.
Let ti be the observed value of Ti; these observed values for the k − 1 treatment arms
can be ordered as t(1) ≥ t(2) ≥ · · · ≥ t(k−1). For the right tailed test, the marginal
p-value for comparing the i-th arm with placebo is calculated as pi = P(T > ti), and
for the left tailed test, pi = P(T < ti), where T is distributed as Student’s t. Let
p(1) ≤ p(2) ≤ · · · ≤ p(k−1) be the ordered p-values.

15.2.1 Single step MC procedures

East supports three p-value based single step MC procedures: the Bonferroni procedure, the Sidak procedure and the weighted Bonferroni procedure. For the Bonferroni procedure, $H_i$ is rejected if $p_i < \alpha/(k-1)$ and the adjusted p-value is given as $\min(1, (k-1)p_i)$. For the Sidak procedure, $H_i$ is rejected if $p_i < 1 - (1-\alpha)^{1/(k-1)}$ and the adjusted p-value is given as $1 - (1-p_i)^{k-1}$. For the weighted Bonferroni procedure, $H_i$ is rejected if $p_i < w_i \alpha$ and the adjusted p-value is given as $\min(1, p_i/w_i)$. Here $w_i$ denotes the proportion of $\alpha$ allocated to $H_i$, such that $\sum_{i=1}^{k-1} w_i = 1$. Note that if $w_i = 1/(k-1)$, the weighted Bonferroni procedure reduces to the regular Bonferroni procedure.
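These single step adjustments are easy to verify outside East. The following R sketch (an illustration with made-up p-values, not East code) computes the Bonferroni, Sidak and weighted Bonferroni adjusted p-values for $k-1 = 3$ treatment-versus-placebo comparisons:

```r
# Hypothetical raw marginal p-values for 3 treatment-vs-placebo comparisons
p <- c(0.004, 0.019, 0.041)
m <- length(p)                        # m = k - 1 comparisons
alpha <- 0.025

p_bonf  <- pmin(1, m * p)             # Bonferroni: min(1, (k-1) * p_i)
p_sidak <- 1 - (1 - p)^m              # Sidak: 1 - (1 - p_i)^(k-1)

w <- c(0.5, 0.3, 0.2)                 # weights, must sum to 1
p_wbonf <- pmin(1, p / w)             # weighted Bonferroni: min(1, p_i / w_i)

# Reject H_i at FWER level alpha when the adjusted p-value is below alpha
cbind(p, p_bonf, p_sidak, p_wbonf, reject_bonf = p_bonf < alpha)
```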
Bonferroni and Sidak procedures
The Bonferroni and Sidak procedures are described below using the same Alzheimer's Disease example from section 15.1.1 on Dunnett's Single Step. Since the other design specifications remain the same, except that we are using Bonferroni and Sidak in place of the single step Dunnett's test, we can set up the simulations in this section with little effort. Select Sim 1 in the Library and click the edit icon. Now go to the Design Parameters tab. In the Multiple Comparison Procedures box, uncheck the Dunnett's single step box and check the Bonferroni and Sidak boxes.
Now click Simulate to obtain power. Once the simulation run has completed, East will
add two additional rows to the Output Preview. Bonferroni and Sidak procedures
have disjunctive and global powers of close to 73% and conjunctive power of about
1.8%. Now select Sim 5 and Sim 6 in the Output Preview using the Ctrl key and click the save icon. This will save Sim 5 and Sim 6 in Wbk1 in the Library.
Weighted Bonferroni procedure
As before, we will use the same Alzheimer's Disease example to illustrate the weighted Bonferroni procedure. Select Sim 1 in the Library and click the edit icon. Now go to the Design Parameters tab. There, in the Multiple Comparison Procedures box, uncheck the Dunnett's single step box and check the Weighted Bonferroni box.

Next click on the Response Generation Info tab and look at the Table of Proportions. You will see that an additional column labeled Proportion of Alpha has been added. Here you have to specify the proportion of the total alpha you want to spend on each test. Ideally, the values in this column should add up to 1; if not, East will normalize them to sum to 1. By default, East distributes the total alpha equally among all tests. Here we have 3 tests in total, so each test has a proportion of alpha of 1/3, or 0.333. You can specify other proportions as well. For this example, keep the equal proportion of alpha for each test. Now click Simulate to obtain power. Once the
simulation run has completed, East will add an additional row to the Output Preview
labeled as Sim 7. The weighted Bonferroni MC procedure has global and disjunctive
power of 73.7% and conjunctive power of 1.6%. Note that the powers for the weighted Bonferroni procedure are quite close to those of the Bonferroni procedure. This is because the weighted Bonferroni procedure with equal proportions is equivalent to the simple Bonferroni procedure. The exact results of the simulations may differ slightly, depending on the seed. Now select Sim 7 in the Output Preview and click the save icon. This will save Sim 7 in Wbk1 in the Library.

15.2.2 Data-driven step-down MC procedure

In the single step MC procedures, the decision to reject any hypothesis does not depend on the decisions for the other hypotheses. In the stepwise procedures, on the other hand, the decision for one hypothesis test can influence the decisions for the other tests. There are two types of stepwise procedures: one type proceeds in a data-driven order, while the other proceeds in a fixed order set a priori. Stepwise tests in a data-driven order can proceed in a step-down or step-up manner. East supports the Holm step-down MC procedure, which starts with the most significant comparison and continues as long as the tests are significant, stopping the first time a non-significant comparison occurs; all remaining hypotheses are then retained. In the $i$-th step, $H_{(k-i)}$ is rejected if $p_{(k-i)} \le \alpha_i$, in which case testing proceeds to the next step.
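The Holm decisions can be cross-checked with base R's p.adjust function (a quick sketch with hypothetical p-values; not part of East):

```r
# Holm's step-down: the most significant comparison is tested first, and
# testing stops at the first non-significant comparison
p <- c(0.021, 0.004, 0.012)              # hypothetical raw p-values, k - 1 = 3
alpha <- 0.025
p.adjust(p, method = "holm") < alpha     # TRUE where H_i is rejected
```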
Holm’s step-down
As before, we will use the same Alzheimer's Disease example to illustrate Holm's step-down procedure. Select Sim 1 in the Library and click the edit icon. Now go to the Design Parameters tab. In the Multiple Comparison Procedures box, uncheck the Dunnett's single step box and check the Holm's Step-down box.

Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 8.

Holm's step-down procedure has global and disjunctive power of 74% and conjunctive power of 4.5%. The exact results of the simulations may differ slightly, depending on the seed. Now select Sim 8 in the Output Preview and click the save icon. This will save Sim 8 in Wbk1 in the Library.

15.2.3 Data-driven step-up MC procedures

Step-up tests start with the least significant comparison and continue as long as the tests are not significant, until the first time a significant comparison occurs, at which point all remaining hypotheses are rejected. East supports two such MC procedures: the Hochberg step-up and Hommel step-up procedures. In the Hochberg step-up procedure, in the $i$-th step $H_{(k-i)}$ is retained if $p_{(k-i)} > \alpha_i$. In the Hommel step-up procedure, in the $i$-th step $H_{(k-i)}$ is retained if $p_{(k-j)} > \frac{i-j+1}{i}\,\alpha$ for $j = 1, \ldots, i$. The fixed sequence and fallback tests, which proceed in a pre-specified order, are discussed in section 15.2.4.
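Both step-up procedures are also available through base R's p.adjust, which can serve as an independent check (hypothetical p-values, not East output):

```r
# Hochberg and Hommel step-up adjusted p-values
p <- c(0.021, 0.004, 0.012)                # hypothetical raw p-values
alpha <- 0.025
p.adjust(p, method = "hochberg") < alpha   # Hochberg decisions
p.adjust(p, method = "hommel")   < alpha   # Hommel is at least as powerful
```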
Hochberg’s and Hommel’s step-up procedures
Hochberg’s and Hommel’s step-up procedures are described below using the same
Alzheimer’s Disease example from the section 15.1.1 on Dunnett’s Single Step.

Since the other design specification remains same except that we are using Hocheberg
and Hommel step-up procedures in place of single step Dunnett’s test we can design
simulation in this section with only little effort. Select Sim 1 in Library and click
. Now go to the Design Parameters tab. There in the Multiple Comparison
Procedures box, uncheck the Dunnett’s single step box and check the Hochberg’s
step-up and Hommel’s step-up boxes.

Now click Simulate to obtain power. Once the simulation run has completed, East will
add two additional rows to the Output Preview labeled as Sim 9 and Sim 10.

The Hochberg and Hommel procedures have disjunctive and global powers of close to 74%.

15.2.4 Fixed-sequence stepwise MC procedures

In data-driven stepwise procedures, we have no control over the order in which the hypotheses are tested. However, sometimes, based on preference or prior knowledge, we might want to fix the order of the tests a priori. The fixed sequence test and the fallback test are tests that proceed in a pre-specified order; East supports both of these procedures.

Assume that $H_1, H_2, \ldots, H_{k-1}$ are ordered hypotheses and the order is pre-specified, so that $H_1$ is tested first, followed by $H_2$, and so on. Let $p_1, p_2, \ldots, p_{k-1}$ be the associated raw marginal p-values. In the fixed sequence testing procedure, in the $i$-th step ($i = 1, \ldots, k-1$), if $p_i < \alpha$, reject $H_i$ and go to the next step; otherwise retain $H_i, \ldots, H_{k-1}$ and stop.

The fixed sequence testing strategy is optimal when the early tests in the sequence have the largest treatment effects, and it performs poorly when the early hypotheses have small treatment effects or are nearly true (Westfall and Krishen, 2001). The drawback of the fixed sequence test is that once a hypothesis is not rejected, no further testing is permitted. This leads to lower power for hypotheses tested later in the sequence.

The fallback test alleviates this undesirable feature of the fixed sequence test. Let $w_i$ be the proportion of $\alpha$ for testing $H_i$, such that $\sum_{i=1}^{k-1} w_i = 1$. In the fallback procedure, in the $i$-th step ($i = 1, \ldots, k-1$), test $H_i$ at level $\alpha_i = \alpha_{i-1} + \alpha w_i$ if $H_{i-1}$ is rejected and at $\alpha_i = \alpha w_i$ if $H_{i-1}$ is retained. If $p_i < \alpha_i$, reject $H_i$; otherwise retain it. Unlike the fixed sequence approach, the fallback procedure can continue testing even after a non-significant outcome is encountered: if a hypothesis in the sequence is retained, the next hypothesis is tested at the level that would have been used by the weighted Bonferroni procedure. With $w_1 = 1$ and $w_2 = \cdots = w_{k-1} = 0$, the fallback procedure simplifies to the fixed sequence procedure.
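Both procedures are straightforward to code. The sketch below (an illustration of the rules just described, not East's implementation) applies the fixed sequence and fallback tests to p-values supplied in the pre-specified testing order:

```r
# Fixed sequence test: test H_1, H_2, ... in order, each at full alpha;
# stop at the first non-rejection
fixed_sequence <- function(p, alpha = 0.025) {
  rej <- logical(length(p))
  for (i in seq_along(p)) {
    if (p[i] < alpha) rej[i] <- TRUE else break
  }
  rej
}

# Fallback test: H_i is tested at alpha * w_i plus whatever level was
# accumulated through rejections earlier in the sequence
fallback <- function(p, w, alpha = 0.025) {
  stopifnot(abs(sum(w) - 1) < 1e-8)
  rej <- logical(length(p))
  a <- 0
  for (i in seq_along(p)) {
    a <- a + alpha * w[i]            # level available for H_i
    if (p[i] < a) rej[i] <- TRUE     # rejected: carry the level forward
    else a <- 0                      # retained: next test gets only its own w
  }
  rej
}

p <- c(0.030, 0.004, 0.011)          # hypothetical p-values in testing order
fixed_sequence(p)                    # stops at H1; nothing is rejected
fallback(p, w = rep(1/3, 3))         # can still reject later hypotheses
```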
Fixed sequence testing procedure
As before, we will use the same Alzheimer's Disease example to illustrate the fixed sequence testing procedure. Select Sim 1 in the Library and click the edit icon. Now go to the Design Parameters tab. There, in the Multiple Comparison Procedures box,

uncheck the Dunnett’s single step box and check the Fixed Sequence box.

Next click on the Response Generation Info tab and look at the Table of Proportions. You will see that an additional column labeled Test Sequence has been added. Here you have to specify the order in which the hypotheses will be tested: specify 1 for the test that will be carried out first, 2 for the test that will be carried out next, and so on. By default, East assigns 1 to the first test, 2 to the second test, and so on. For now we will keep the default, which means that H1 will be tested first, followed by H2, and finally H3.

Now click Simulate to obtain power. Once the simulation run has completed, East will
add an additional row to the Output Preview labeled as Sim 11.

The fixed sequence procedure with the specified sequence has global and disjunctive power of less than 7% and conjunctive power of 5%. The reason for the small global and disjunctive power is that the hypothesis with the smallest treatment effect is tested first, while the magnitude of the treatment effect increases gradually for the remaining tests. For optimal power in the fixed sequence procedure, the early tests in the sequence should have the larger treatment effects. In our case, Dose3 has the largest treatment effect, followed by Dose2 and Dose1. Therefore, to obtain optimal power, H3 should be tested first, followed by H2 and H1.
Select Sim 11 in the Output Preview and click the save icon. Select Sim 11 in the Library, click the edit icon and go to the Response Generation Info tab. In the Test Sequence column in the table, specify 3 for Dose1, 2 for Dose2 and 1 for Dose3.

Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 12.

Now the fixed sequence procedure with the pre-specified sequence (H3, H2, H1) has global and disjunctive power close to 85% and conjunctive power close to 5%. This example illustrates that the fixed sequence procedure is powerful provided the hypotheses are tested in a sequence of descending treatment effects. The fixed sequence procedure controls the FWER because each hypothesis is tested conditional on rejecting all hypotheses earlier in the sequence. The exact results of the simulations may differ slightly, depending on the seed. Select Sim 12 in the Output Preview and click the save icon to save it in the Library.
Fallback procedure
Again we will use the same Alzheimer's Disease example, now to illustrate the fallback procedure. Select Sim 1 in the Library and click the edit icon. There, in the Multiple Comparison Procedures box, uncheck the Dunnett's single step box and check the Fallback box.

Next click on the Response Generation Info tab and look at the Table of Proportions. You will see two additional columns labeled Test Sequence and Proportion of Alpha. In the Test Sequence column, you have to specify the order in which the hypotheses will be tested: specify 1 for the test that will be carried out first, 2 for the test that will be carried out next, and so on. By default, East assigns 1 to the first test, 2 to the second test, and so on. For now we will keep the default, which means that H1 will be tested first, followed by H2, and finally H3.

In the Proportion of Alpha column, you have to specify the proportion of the total alpha you want to spend on each test. Ideally, the values in this column should add up to 1; if not, East will normalize them to sum to 1. By default, East distributes the total alpha equally among all tests. Here we have 3 tests in total, so each test has a proportion of alpha of 1/3, or 0.333. You can specify other proportions as well. For this example, keep the equal proportion of alpha for each test.

Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 13.

Now we will consider a sequence where H3 is tested first, followed by H2 and H1, because in our case Dose3 has the largest treatment effect, followed by Dose2 and Dose1.

Select Sim 13 in the Output Preview and click the save icon. Select Sim 13 in the Library, click the edit icon and go to the Response Generation Info tab. In the Test Sequence column in the table, specify 3 for Dose1, 2 for Dose2 and 1 for Dose3.

Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 14.

Note that the fallback test is more robust to misspecification of the test sequence, whereas the fixed sequence test is very sensitive to it: if the test order is misspecified, the fixed sequence test performs very poorly.

15.3 Comparison of MC procedures

We have obtained the power (based on simulation) for the different MC procedures for the Alzheimer's Disease example from section 15.1.1. The obvious question now is which MC procedure to choose. To compare all the MC procedures, we will run simulations for each of them under the following scenario:
Treatment arms: placebo, dose1 (dose = 0.3 mg), dose2 (dose = 1 mg) and dose3 (dose = 2 mg), with group means 0, 0.45, 1.5 and 3, respectively
Common standard deviation: 5
Type I Error: 0.025 (right-tailed)
Number of Simulations: 10000
Total Sample Size: 200
Allocation ratio: 1:1:1:1
For comparability of the simulation results, we have used the same seed for the simulations under all MC procedures. The following output displays the powers under the different MC procedures.
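Readers without access to East can approximate such a comparison with a small Monte Carlo study. The sketch below estimates the disjunctive and conjunctive power of Holm's procedure under the above scenario (an illustration only; Dunnett's procedures would additionally require a package such as multcomp):

```r
# Simulated power of Holm's procedure under the Section 15.3 scenario
set.seed(123)
nsim  <- 10000
mu    <- c(0, 0.45, 1.5, 3)       # placebo, dose1, dose2, dose3
n     <- 50                       # per-arm size (total 200, 1:1:1:1)
alpha <- 0.025

rej <- replicate(nsim, {
  y <- lapply(mu, function(m) rnorm(n, m, 5))
  # right-tailed pooled-variance t-tests of each dose vs placebo
  p <- sapply(2:4, function(i)
    t.test(y[[i]], y[[1]], alternative = "greater", var.equal = TRUE)$p.value)
  p.adjust(p, method = "holm") < alpha
})

c(disjunctive = mean(apply(rej, 2, any)),   # reject at least one H_i
  conjunctive = mean(apply(rej, 2, all)))   # reject all three H_i
```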

Here we have used equal proportions for the weighted Bonferroni and fallback procedures. For the two fixed-order testing procedures (fixed sequence and fallback), two sequences have been used: (H1, H2, H3) and (H3, H2, H1). As expected, the Bonferroni and weighted Bonferroni procedures provide similar powers. The fixed sequence procedure with the pre-specified sequence (H3, H2, H1) provides power close to 85%, which is the maximum among all the procedures. However, the fixed sequence procedure with the pre-specified sequence (H1, H2, H3) provides power of less than 7%. The power of the fixed sequence procedure therefore depends heavily on the specified testing sequence, and a misspecification can result in a huge drop in power. For this reason, the fixed sequence procedure may not be an appropriate MC procedure to go with.

Dunnett's single step, step-down and step-up procedures are next in order after the fixed sequence procedure with the pre-specified sequence (H3, H2, H1). All three procedures attain close to 75% disjunctive power. However, all three procedures assume that all the treatment arms have equal variance. Therefore, if homogeneity of variance across the treatment arms is a reasonable assumption, Dunnett's step-down or single step procedure would be the best option based on these simulation results. When the assumption of equal variance is not met, however, Dunnett's procedures may not be appropriate, as the type I error might not be strongly controlled.

Next on the list are the fallback procedures; both provide a little more than 73% power, which is very close to the power attained by Dunnett's procedures. Therefore, unlike the fixed sequence procedure, the fallback procedure does not depend much on the order in which the hypotheses are tested. Moreover, it does not require the assumption of equal variance among the treatment arms. For all these reasons, the fallback procedure seems to be the most appropriate MC procedure for the design we are interested in.
Now we will perform the comparison again, but this time with unequal variances across the treatment arms. Specifically, we simulate data under the following scenario to examine the type I error rate control of the different procedures:
Treatment arms: placebo, dose1 (dose = 0.3 mg), dose2 (dose = 1 mg) and dose3 (dose = 2 mg), each with mean 0
Standard deviation: 5 for placebo, dose1 and dose2; 10 for dose3
Type I Error: 0.025 (right-tailed)
Number of Simulations: 1000000
Total Sample Size: 200
Allocation ratio: 1:1:1:1
The following output displays the type I error rate under the different MC procedures for the unequal variance case.

Note that the Dunnett tests slightly inflate the type I error rate, whereas all the other procedures control the type I error rate below the nominal level of 0.025.
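This type I error behavior can likewise be checked by simulation. The sketch below estimates the FWER of Holm's procedure, applied with Welch t-tests, under the unequal-variance global null described above (an illustration, not East output):

```r
# FWER under the unequal-variance global null (all true means are 0)
set.seed(42)
nsim   <- 20000
n      <- 50
sd_arm <- c(5, 5, 5, 10)          # placebo, dose1, dose2, dose3
alpha  <- 0.025

fwer <- mean(replicate(nsim, {
  y <- lapply(sd_arm, function(s) rnorm(n, 0, s))
  p <- sapply(2:4, function(i)    # Welch tests: no equal-variance assumption
    t.test(y[[i]], y[[1]], alternative = "greater")$p.value)
  any(p.adjust(p, method = "holm") < alpha)
}))
fwer                              # should be at or below 0.025
```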

16 Multiple Endpoints-Gatekeeping Procedures

16.1 Introduction

Clinical trials are often designed to assess the benefits of a new treatment compared to a control treatment with respect to multiple clinical endpoints, which are divided into hierarchically ordered families. Typically, the primary family of endpoints defines the overall outcome of the trial, provides the basis for the regulatory claim and is included in the product label. The secondary families of endpoints play a supportive role and provide additional information for physicians, patients and payers, and hence are useful for enhancing the product label. Gatekeeping procedures are specifically designed to address this type of multiplicity problem by explicitly taking into account the hierarchical structure of the multiple objectives. The term gatekeeping reflects the hierarchical decision structure, in which the higher ranked families serve as gatekeepers for the lower ranked families: the lower ranked families are not tested unless the higher ranked families are passed. Two types of gatekeeping procedures are described in this chapter: the serial gatekeeping procedure and the parallel gatekeeping procedure. In the next few sections, specific examples illustrate how to design trials with each type of gatekeeping procedure. For more information about applications of gatekeeping procedures in a clinical trial setting and a literature review on this topic, please refer to Dmitrienko and Tamhane (2007).

16.2 Simulate Serial Gatekeeping Design

Serial gatekeeping procedures were studied by Maurer, Hothorn and Lehmacher (1995), Bauer et al. (1998) and Westfall and Krishen (2001). Serial gatekeepers are encountered in trials where the endpoints are ordered from most important to least important. Reisberg et al. (2003) reported a study designed to investigate memantine, an N-methyl-D-aspartate (NMDA) antagonist, for the treatment of Alzheimer's disease, in which patients with moderate-to-severe Alzheimer's disease were randomly assigned to receive placebo or 20 mg of memantine daily for 28 weeks. The two primary efficacy variables were: (1) the Clinician's Interview-Based Impression of Change Plus Caregiver Input (CIBIC-Plus) global score at 28 weeks, and (2) the change from baseline to week 28 in the Alzheimer's Disease Cooperative Study Activities of Daily Living Inventory modified for severe dementia (ADCS-ADLsev). The CIBIC-Plus measures overall global change relative to baseline and is scored on a seven-point scale ranging from 1 (markedly improved) to 7 (markedly worse). For illustration purposes, we redefine the primary endpoint of the clinician's global assessment score as 7 minus the CIBIC-Plus score, so that a larger value indicates improvement (0 markedly worse and 6 markedly improved). The secondary efficacy endpoints included the Severe Impairment Battery and other measures of cognition, function, and behavior. Suppose that the trial is declared successful only if the treatment effect is demonstrated on both primary endpoints. If the trial is successful, it is of interest to assess the two secondary endpoints: (1) the Severe Impairment Battery (SIB), and (2) the Mini-Mental State Examination (MMSE). The SIB was designed to evaluate cognitive performance in advanced Alzheimer's disease. A 51-item scale, it assesses social interaction, memory, language, visuospatial ability, attention, praxis, and construction. The scores range from 0 (greatest impairment) to 100. The MMSE is a 30-point scale that measures cognitive function. The means of the endpoints for subjects in the control group and the experimental group and the common covariance matrix are as follows:

               Mean Treatment   Mean Control
CIBIC-Plus          2.6             2.3
ADCS-ADLsev        -2.5            -4.5
SIB                -6.5            -10
MMSE               -0.4            -1.2

               CIBIC-Plus   ADCS-ADLsev    SIB    MMSE
CIBIC-Plus         1.2          3.6         6.8    1.6
ADCS-ADLsev        3.6         42          38      9.3
SIB                6.8         38         145     17
MMSE               1.6          9.3        17      8

Typically there is no analytical way to compute the power for gatekeeping procedures; simulation is used instead to assess the operating characteristics of different designs. For example, one could simulate the power for a given sample size. To start the simulations, click Two Samples in the Design tab and select Multiple Comparisons-Multiple Endpoints to see the following input window.


At the top of this input window, one needs to specify the total number of endpoints and other input parameters such as the Rejection Region, Type I Error and Sample Size. One also needs to select the multiple comparison procedure that will be used to test the last family of endpoints. The type I error specified on this screen is the nominal level of the familywise error rate, which is defined as the probability of falsely declaring the efficacy of the new treatment compared to control with respect to any endpoint. For the Alzheimer's disease example, CIBIC-Plus and ADCS-ADLsev form the primary family, and the other endpoints, SIB and MMSE, form the secondary family. Suppose that we would like to see the power for a sample size of 250 at a nominal type I error rate of 0.025, using the Bonferroni test for the secondary family; the input window then looks as follows.


Behind the Simulation Parameters window there is another tab labeled Response Generation Info. The Response Generation Info tab, shown below, allows one to specify the underlying joint distribution of the multiple endpoints for the control arm and for the experimental arm. The joint distribution of the endpoints is assumed to be multivariate normal with a common covariance matrix. One also needs to specify which family each endpoint belongs to, in the column labeled Family Rank. One can also customize the label for each endpoint. For the Alzheimer's disease example, the inputs for this window should be specified as follows.

One can specify the number of simulations to be performed in the window labeled Simulation Control Info. By default, 10000 simulations will be performed. One can also save the summary statistics for each simulated trial, or save subject-level data, by checking the appropriate box in the output options area. To simulate this design, click the Simulate button at the bottom right of the screen to see the preliminary output displayed in the output preview area, as seen in the following screen. All the results displayed in the yellow cells are summary outputs generated from the simulations: for example, the attained FWER, the number of families, the conjunctive power for the primary family, and the conjunctive and disjunctive power for the last family.

To view the detailed output, first save the simulation into a workbook in the library by clicking on the save tool button; you will notice that a simulation node appears in the library, as shown in the following screen.

Now double-click on the simulation node Sim1 to see the detailed output, as shown in the following screen. The detailed output summarizes all the main input parameters, such as the multiple comparison procedure used for the last family of endpoints, the nominal type I error level, the total sample size, and the mean values for each endpoint in the control arm and in the experimental arm. It also displays the attained overall FWER, conjunctive power and disjunctive power; the FWER and conjunctive power for each gatekeeper family; and the FWER, conjunctive power and disjunctive power for the last family. The definitions of the different types of power are as follows:
Overall Power and FWER:
Global: probability of declaring significance on any of the endpoints
Conjunctive: probability of declaring significance on all of the endpoints for which the treatment arm is truly better than the control arm
Disjunctive: probability of declaring significance on any of the endpoints for which the treatment arm is truly better than the control arm
FWER: probability of making at least one type I error among all the endpoints

Power and FWER for Each Gatekeeper Family except the Last Family:
Conjunctive Power: probability of declaring significance on all of the endpoints in the particular gatekeeper family for which the treatment arm is truly better than the control arm
FWER: probability of making at least one type I error when testing the endpoints in the particular gatekeeper family

Power and FWER for the Last Family:
Conjunctive Power: probability of declaring significance on all of the endpoints in the last family for which the treatment arm is truly better than the control arm
Disjunctive Power: probability of declaring significance on any of the endpoints in the last family for which the treatment arm is truly better than the control arm
FWER: probability of making at least one type I error when testing the endpoints in the last family
Marginal Power: probability of declaring significance on the particular endpoint
For the Alzheimer’s disease example, the conjunctive power, which characterizes the
power for the study, is 46.9% for a total sample size of 250. Using Bonferroni test for
the last family, the design has 40.5% probability (disjunctive power for the last family)
to detect the benefit of memantine with respect to at least one of the two secondary
endpoints, SIB and MMSE. It has 25.1% chance (conjunctive power for the last family)

270

16.2 Simulate Serial Gatekeeping Design

<<< Contents

* Index >>>
R

East 6.4
c Cytel Inc. Copyright 2016
to declare the benefit of memantine with respect to both of the secondary endpoints.
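These numbers can be approximately reproduced outside East with a short R simulation. The sketch below (an illustration, not East's internal engine) assumes 125 subjects per arm, one-sided Welch t-tests with each primary endpoint tested at the full α = 0.025 as the serial gate, and a Bonferroni split for the secondary family:

```r
library(MASS)                      # mvrnorm for correlated endpoints

set.seed(1)
nsim <- 10000; alpha <- 0.025; n <- 125     # 250 subjects in total
mu_t <- c(2.6, -2.5, -6.5, -0.4)            # CIBIC-Plus, ADCS-ADLsev, SIB, MMSE
mu_c <- c(2.3, -4.5, -10, -1.2)
Sigma <- matrix(c(1.2, 3.6,   6.8, 1.6,
                  3.6, 42,   38,   9.3,
                  6.8, 38,  145,  17,
                  1.6, 9.3,  17,   8), 4, 4)

res <- replicate(nsim, {
  yt <- mvrnorm(n, mu_t, Sigma)
  yc <- mvrnorm(n, mu_c, Sigma)
  p  <- sapply(1:4, function(j)
    t.test(yt[, j], yc[, j], alternative = "greater")$p.value)
  gate <- all(p[1:2] < alpha)               # serial gate: both primaries
  c(gate, gate & (p[3:4] < alpha / 2))      # Bonferroni on SIB and MMSE
})

rowMeans(res)   # approx. conjunctive primary power, then marginal SIB, MMSE
```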

One can find the sample size needed to achieve a target power by simulating multiple designs in batch mode. For example, one could simulate a batch of designs with the sample size ranging from 250 to 500 in steps of 50, as shown in the following window.

Note that a total sample size somewhere between 450 and 500 provides 80% power to detect the mean differences for both primary endpoints, CIBIC-Plus and ADCS-ADLsev, as seen in the following window.

To get a more precise sample size to achieve 80% power, one could simulate a batch of designs with the sample size ranging from 450 to 500 in steps of 10. One will notice that a sample size of 480 provides over 80% power to claim significant differences with respect to both primary endpoints.


One can compare the multiple designs side by side by clicking on the appropriate tool button in the output preview area, as follows:

There is a special case where all the endpoints belong to one single family. The software handles this special case differently: an Intersection-Union test is applied to the single family of endpoints, and the MCP selected for the last family in the Simulation Parameters tab is not applicable. For the Alzheimer's disease example, if we are only interested in testing the two endpoints CIBIC-Plus and ADCS-ADLsev as co-primary endpoints, as indicated by the family rank in the Response Generation Info window, then the Intersection-Union test is applied to the two endpoints, so that each endpoint is tested at nominal level α. The detailed output window is slightly different in the case of a single family of endpoints, as seen in the following window.

16.3 Simulate Parallel Gatekeeping Design
Parallel gatekeeping procedures are often used in clinical trials with several primary objectives, where each individual objective can characterize a successful trial outcome. In other words, the trial can be declared successful if at least one primary objective is met. Consider a randomized, double-blind, parallel group clinical trial comparing two vaccines against the human papilloma virus. Denote by vaccine T the new vaccine and by vaccine C the comparator. The primary objective of this study is to demonstrate that vaccine T is superior to vaccine C for antigen type 16 or 18, which together account for 70% of cervical cancer cases globally. If the new vaccine shows superiority over the comparator with respect to either antigen type 16 or 18, it is of interest to test the superiority of vaccine T to vaccine C for antigen type 31 or 45. The two vaccines are compared on the immunological response, i.e. the number of T-cells in the blood, seven months after vaccination. Assume that the log-transformed data are normally distributed with means $\mu_{iT}$ and $\mu_{iC}$ ($i = 1, 2, 3, 4$), where the indices 1, 2, 3 and 4 represent the four antigen types, respectively. The null

274

16.3 Simulate Parallel Gatekeeping Design

<<< Contents

* Index >>>
R

East 6.4
c Cytel Inc. Copyright 2016

Table 16.1: Mean Response and Standard Deviation

Endpoints    Mean for Vaccine C    Mean for Vaccine T    Standard Deviation
Type 16             4                    4.57                  0.5
Type 18             3.35                 4.22                  0.5
Type 31             2                    2.34                  0.6
Type 45             1.42                 2                     0.3

hypotheses and alternative hypotheses can be formulated as
$$H_{i0} : \mu_{iT} - \mu_{iC} \le 0 \quad \text{vs} \quad H_{i1} : \mu_{iT} - \mu_{iC} > 0$$
The parallel gatekeeping strategy is suitable for this example. The two null hypotheses $H_{10}$ and $H_{20}$, for antigen types 16 and 18, constitute the primary family, which serves as the gatekeeper for the second family of hypotheses, containing $H_{30}$ and $H_{40}$. Assume that the means and the standard deviations for all four antigen types are as shown in Table 16.1.
Assume that the total sample size is 20 and the one-sided significance level is 0.025. To assess the operating characteristics of the parallel gatekeeping procedures, we first need to open the simulation window for multiple endpoints. To this end, click on the Design menu, choose Two Samples for a continuous endpoint and then select Multiple Endpoints from the drop-down list; the following screen will show up.

At the top of the above screen, one needs to specify the total number of endpoints. The
Multiple Endpoints-Gatekeeping Procedures
lower part of the screen is the Simulation Parameters tab, which allows one to specify the important design parameters, including the nominal type I error rate, the total sample size and the multiple comparison procedures. Now select Parallel Gatekeeping and choose Bonferroni for the parallel gatekeeping method. For the last family, select Bonferroni as the multiple testing procedure. Next to the Simulation Parameters tab are two additional tabs: Response Generation Info and Simulation Control Info. We need to specify the mean response for each endpoint for both the treatment and control arms, as well as the covariance structure among the endpoints. In addition, we need to specify which family each endpoint belongs to, in the column labeled Family Rank, in the same table used for specifying the mean responses. There are two ways of specifying the covariance structure: as a Covariance Matrix or as a Correlation Matrix. If the Correlation Matrix option is selected, one needs to input the standard deviation for each endpoint in the same table used for specifying the mean responses. There is a simpler way to input the standard deviations if all the endpoints share a common standard deviation: check the box for Common Standard Deviation and specify its value in the box to the right. One also needs to specify the correlations among the endpoints in the table to the right. Similarly, if all the endpoints have a common correlation, one can simply check the box for Common Correlation and specify its value in the box to the right. For the vaccine example, assume the endpoints share a common mild correlation of 0.3. The window with the completed inputs for generating data then looks like the following screen.

In the Simulation Control Info window, we can specify the total number of simulations, the refresh frequency and the type of random number seed. We can also choose to save the simulation data for more advanced analyses. After specifying all the input parameter values, click on the Simulate button at the bottom right of the window to run the simulations. The progress window will report how many simulations have been completed, as seen in the following screen.

When all the requested simulations have been completed, click on the Close button at the bottom right of the progress report screen; the preliminary simulation summary will show up in the output preview window, where one can see the overall power summary and the power summary for the primary family, as well as the attained overall FWER.

To see the detailed output, we need to save the simulation in the workbook by clicking on the save icon at the top of the output preview window. A simulation node will be appended to the corresponding workbook in the library, as seen in the following window.

Next, double-click on the simulation node in the library and the detailed outputs will be displayed.

When testing multiple endpoints, the definition of power is not unique. East provides the overall power summary and the power summary for each specific family. In the overall power summary table, the following types of power are provided along with the overall FWER: global power, conjunctive power and disjunctive power, which capture the overall performance of the gatekeeping procedure. The definitions of the powers are given below:

Overall Power and FWER:
Global: probability of declaring significance on any of the endpoints
Conjunctive: probability of declaring significance on all of the endpoints for which the treatment arm is truly better than the control arm
Disjunctive: probability of declaring significance on any of the endpoints for which the treatment arm is truly better than the control arm
FWER: probability of making at least one type I error among all the endpoints

Power and FWER for Individual Gatekeeper Families except the Last Family:
Conjunctive Power: probability of declaring significance on all of the endpoints in the particular gatekeeper family for which the treatment arm is truly better than the control arm
Disjunctive Power: probability of declaring significance on any of the endpoints in the particular gatekeeper family for which the treatment arm is truly better than the control arm
FWER: probability of making at least one type I error when testing the endpoints in the particular gatekeeper family

Power and FWER for the Last Gatekeeper Family:
Conjunctive Power: probability of declaring significance on all of the endpoints in the last family for which the treatment arm is truly better than the control arm
Disjunctive Power: probability of declaring significance on any of the endpoints in the last family for which the treatment arm is truly better than the control arm
FWER: probability of making at least one type I error when testing the endpoints in the last family
Marginal Power: probability of declaring significance on the particular endpoint
For the vaccine example, we see that the gatekeeping procedure using the Bonferroni test for both the primary family and the secondary family provides 94.49% power to detect the difference in at least one of the two antigen types 16 and 18, and 52.19% power to detect the differences in both antigen types. Also note that this gatekeeping procedure provides only 89.55% power to detect the response difference in either of the other two antigen types, 31 or 45, and only 12.53% power to detect both antigen types 31 and 45. The marginal power table displays the probabilities of declaring significance on the
Multiple Endpoints-Gatekeeping Procedures
Table 16.2: Power Comparisons under Different Correlation Assumptions

              Primary Family         Secondary Family        Overall Power
Correlation   Disjunct.  Conjunct.   Disjunct.  Conjunct.    Disjunct.  Conjunct.
0.3           0.9449     0.5219      0.8955     0.1253       0.9449     0.1012
0.5           0.9324     0.5344      0.8867     0.1327       0.9324     0.1192
0.8           0.9174     0.5497      0.8855     0.1413       0.9174     0.1402

particular endpoint after multiplicity adjustment. For example, the power of detecting
antigen type 16 is 55.22%.
If it is of interest to assess the robustness of this procedure with respect to the correlation among the endpoints, we can go back to the input window, change the correlations and run the simulations again. To this end, right-click on the Sim1 node in the library and select Edit Simulation from the drop-down list. Next click on the Response Generation Info tab, change the common correlation to 0.5 and click the Simulate button. We can repeat this for a common correlation of 0.8. Table 16.2 summarizes the power comparisons under the different correlation assumptions. Note that the disjunctive power decreases as the correlation increases, while the conjunctive power increases as the correlation increases.
There are three available parallel gatekeeping methods: Bonferroni, Truncated Holm and Truncated Hochberg. The multiple comparison procedures applied to the gatekeeper families need to satisfy the so-called separability condition: a multiple comparison procedure is separable if its type I error rate under a partial null configuration is strictly less than the nominal level α. Bonferroni is a separable procedure. However, the regular Holm and Hochberg procedures are not separable and cannot be applied directly to the gatekeeper families. The truncated versions, obtained by taking convex combinations of the critical constants of the regular Holm/Hochberg procedures and the Bonferroni procedure, are separable and more powerful than the Bonferroni test. The truncation constant controls the degree of conservativeness: the larger the truncation constant, the more powerful the procedure. If the truncation constant is set to 1, the procedure reduces to the regular Holm or Hochberg test. To see this, let us simulate the design using the truncated Holm procedure for the primary family and the Bonferroni test for the second family, for the vaccine example with common correlation 0.3. Table 16.3 compares the conjunctive and disjunctive power for each family, and the overall powers, for different truncation parameter values. As the truncation parameter increases, the conjunctive power for the primary family increases while its disjunctive power remains unchanged. Both the conjunctive power

Table 16.3: Impact of Truncation Constant in Truncated Holm Procedure on Overall Power and Power for Each Family

Truncation    Primary Family         Secondary Family        Overall Power
Constant      Conjunct.  Disjunct.   Conjunct.  Disjunct.    Conjunct.  Disjunct.
0             0.5219     0.9449      0.1253     0.8955       0.1012     0.9449
0.25          0.5647     0.9449      0.1229     0.8872       0.1065     0.9449
0.5           0.5988     0.9449      0.1212     0.8747       0.1108     0.9449
0.8           0.6327     0.9449      0.1188     0.84         0.115      0.9449

Table 16.4: Impact of Truncation Constant in Truncated Holm Procedure on Marginal Power

Truncation    Primary Family       Secondary Family
Constant      Type 16   Type 18    Type 31   Type 45
0             0.5522    0.9146     0.127     0.8938
0.25          0.5886    0.921      0.1246    0.8855
0.5           0.6183    0.9254     0.1227    0.8731
0.8           0.6483    0.9293     0.1203    0.8385

and the disjunctive power for the secondary family decrease as we increase the truncation parameter. The overall conjunctive power also increases, but the overall disjunctive power remains the same as the truncation parameter increases. Table 16.4 shows the marginal powers of this design for different truncation parameter values. The marginal powers for the two endpoints in the primary family increase, while the marginal powers for the two endpoints in the secondary family decrease.
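For reference, the truncated Holm test is easy to express in code. The sketch below assumes the usual convex-combination critical values $c_i = [\gamma/(m-i+1) + (1-\gamma)/m]\,\alpha$, where γ is the truncation constant; γ = 1 recovers the regular Holm test and γ = 0 the Bonferroni test:

```r
# Truncated Holm step-down test (separable for gamma < 1)
truncated_holm <- function(p, alpha = 0.025, gamma = 0.5) {
  m   <- length(p)
  ord <- order(p)                 # most significant comparison first
  rej <- logical(m)
  for (i in seq_len(m)) {
    crit <- (gamma / (m - i + 1) + (1 - gamma) / m) * alpha
    if (p[ord[i]] <= crit) rej[ord[i]] <- TRUE else break
  }
  rej
}

truncated_holm(c(0.004, 0.015), gamma = 0.5)   # a two-endpoint primary family
```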
Tables 16.5 and 16.6 display the operating characteristics of the truncated Hochberg test for different truncation constant values. Note that both the conjunctive and disjunctive powers for the primary family increase as the truncation parameter increases, whereas the powers for the secondary family decrease with larger truncation parameter values. The marginal powers for the primary family and for the secondary family behave similarly. The overall conjunctive and disjunctive powers also increase as the truncation parameter increases.

Table 16.5: Impact of Truncation Constant in Truncated Hochberg Procedure on Overall Power and Power for Each Family

Truncation    Primary Family         Secondary Family        Overall Power
Constant      Conjunct.  Disjunct.   Conjunct.  Disjunct.    Conjunct.  Disjunct.
0             0.5219     0.9449      0.1253     0.8955       0.1012     0.9449
0.25          0.5652     0.9455      0.1229     0.8877       0.1065     0.9455
0.5           0.6007     0.9468      0.1213     0.8764       0.1109     0.9468
0.8           0.6369     0.9491      0.119      0.8439       0.1152     0.9491

Table 16.6: Impact of Truncation Constant in Truncated Hochberg Procedure on Marginal Power

Truncation    Primary Family       Secondary Family
Constant      Type 16   Type 18    Type 31   Type 45
0             0.5522    0.9146     0.127     0.8938
0.25          0.5892    0.9215     0.1246    0.886
0.5           0.6203    0.9273     0.1228    0.8749
0.8           0.6525    0.9335     0.1205    0.8424

If all the endpoints belong to one single family, the selected multiple testing procedure for the last family (Bonferroni, Sidak, Weighted Bonferroni, Holm's step-down, Hochberg's step-up, Hommel's step-up, Fixed Sequence or Fallback) will be applied for multiplicity adjustment. For example, if all four antigen types in the vaccine example are treated as primary endpoints, as indicated by the family rank in the Response Generation Info window, and Hochberg's step-up test is selected for the last family in the Simulation Parameters window, then the regular Hochberg test will be applied to the four endpoints for multiplicity adjustment. The detailed output window is slightly different in the case of a single family of endpoints, as seen in the following window.

17 Continuous Endpoint: Multi-arm Multi-stage (MaMs) Designs

17.1 Design
Consider designing a placebo-controlled, double-blind, randomized trial to evaluate the efficacy, pharmacokinetics, safety and tolerability of a new therapy given as multiple weekly infusions in subjects with a recent acute coronary syndrome. There are four dose regimens to be investigated. The treatment effect is assessed through the change in PAV (percent atheroma volume) from baseline to Day 36 post-randomization, as determined by IVUS (intravascular ultrasound). The expected changes in PAV for the placebo group and the four dose regimens are 0, 1, 1.1, 1.2 and 1.3, and the common standard deviation is 3. The objective of the study is to find the optimal dose regimen based on the totality of the evidence, including benefit-risk assessment and cost considerations.

To design such a study in East, we first need to invoke the design dialog window. To this end, click on the Design menu at the top of the East window, select Many Samples for a continuous type of response and then select Multiple Looks-Group Sequential in the drop-down list, as shown in the following screenshot.

After selecting the design, we will see a dialog window where the user specifies the main design parameters. At the top of the window, we need to specify the number of arms, including the control arm, and the number of looks. We also need to specify the nominal significance level, the power or sample size, the mean response for each arm, the standard deviation for each arm and the allocation ratio of each arm to the control arm. Suppose we would like to compute the sample size to achieve 90% power at a one-sided 0.025 significance level. After filling in all the inputs, the design dialog window looks as follows:

Now click on the Compute button at the bottom right of the window to see the total sample size. Note that we need 519 subjects. Here the power is the probability of successfully detecting a significant difference for at least one active treatment group compared to the control arm.
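As a rough cross-check of this fixed sample design, one can estimate the disjunctive power by simulation, assuming equal allocation (about 104 per arm) and a Dunnett-type adjustment for the four comparisons. The sketch below uses the multcomp package and should come out near the 90% target; it is an illustration under these assumptions, not East's algorithm:

```r
library(multcomp)                 # glht() with Dunnett contrasts

set.seed(3)
mu <- c(0, 1, 1.1, 1.2, 1.3)      # placebo and four dose regimens
n  <- 104                         # approx. 519 / 5 per arm (assumed equal)

sim_once <- function() {
  g <- factor(rep(1:5, each = n))
  y <- rnorm(5 * n, mean = mu[as.integer(g)], sd = 3)
  fit <- glht(aov(y ~ g), linfct = mcp(g = "Dunnett"),
              alternative = "greater")
  any(summary(fit)$test$pvalues < 0.025)   # any dose significant vs placebo
}

mean(replicate(1000, sim_once()))          # estimated disjunctive power
```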

Suppose that we would now like a group sequential design with interim looks, so that the trial can be terminated early if one or more of the treatment groups demonstrate overwhelming efficacy. To do this, we change the number of looks to 3. Note that another tab appears beside the Test Parameters tab. This new tab, labeled Boundary, is used to specify the efficacy boundary, the futility boundary and the spacing of the looks. Suppose we want to take two interim looks, equally spaced, using the O'Brien-Fleming spending function of Lan-DeMets (1983). The input window looks like the following.

One can view the boundaries on other scales, including the score, δ and p-value scales, by clicking the drop-down box for the boundary scale. For example, the δ scale boundary for this study is 2.904, 1.486 and 1.026.

Now click on the Compute button at the bottom right of the window to create the design. Note that the total sample size to achieve 90% power is now 525, compared to 519 for the fixed sample design created earlier. The power here is defined as the probability of successfully detecting, at any look, at least one active treatment group that is significantly different from the control group.

To view the detailed design output, save the design in the library and double-click the design node.

The first table shows the sample size information, including the maximum sample size if the trial goes all the way to the end, and the sample size per arm. It also shows the expected sample size under the global null, where none of the active treatment groups differs from the control group, and the expected sample size under the design alternative specified by the user. The second table displays the look-by-look information, including the sample size, cumulative type I error, boundaries, and the boundary crossing probabilities under the global null and under the user-specified design alternative. The boundary crossing probability at each look is the likelihood that at least one active treatment group crosses the boundary at that particular look. The third table shows the Z scale boundary.
One can also add a futility boundary to the design by clicking on the drop-down box for the futility boundary family. There are three families of futility boundaries, Spending Function, p-value and δ, as can be seen in the following screen.

Now click on the Recalc button to see the cumulative α, the efficacy boundary, the cumulative β and the futility boundary displayed in the boundary table. The futility boundary is non-binding, and the details of its computation are provided in Section J.2. The futility boundary is computed such that the probability of the best-performing arm (compared to the control arm) crossing the futility boundary at any look equals the incremental β. For example, the probability of the best-performing treatment arm crossing 0.178 is 0.005 under the design alternative. The probability of the trial staying in the continuation region at the first look but crossing the futility boundary of 1.647 at the second look is 0.04, which is the incremental β spent.

Now click on Compute to see the required sample size to achieve 90% power. Note that we need a larger sample size of 560 to achieve the same target power with the futility boundary, compared to the design without a futility boundary. However, the expected sample size under H0 with the futility boundary is much smaller than for the design without futility.

One can also build a futility boundary based on δ. For example, one might want to terminate the study if a negative δ is observed. Such a futility boundary is more conservative than one constructed from the O'Brien-Fleming spending function, in the sense that it terminates the trial early for futility with smaller probability.

17.2 Simulation

A multi-arm multi-stage design is a complex study design with pros and cons. One of the pros is that it saves subjects compared to conducting separate studies to compare each treatment to control; it may also be advantageous in terms of enrollment. One of the cons is that the hurdle for demonstrating statistical significance is higher due to multiplicity. One needs to evaluate the operating characteristics of such designs through intensive simulation to assess the pros and cons of using such a design. To simulate a MAMS design, select the design node in the library and click on the simulation icon located at the top of the library window.

This will open the simulation dialog window. There are four tabs for inputting the simulation parameters: Test Parameters, Boundary, Response Generation and Simulation Controls. The Test Parameters tab provides the total sample size, the test statistic and the variance type to be used in the simulations. The Boundary tab has inputs similar to those for the design; the default boundary inputs are carried over from the design. One can modify the boundary in simulation mode without having to go back to the design, and one can even add a futility boundary. The next tab is Response Generation, where one specifies the underlying mean, standard deviation and allocation ratio for the different treatment arms. The last tab, Simulation Controls, allows one to specify the total number of simulations to run and to save the intermediate simulation data for further analysis. For example, we can run simulations under the

design alternative, where the mean differences from control are 1, 1.1, 1.2 and 1.3.

After filling in all the inputs, click on the Simulate button at the bottom right of the window. After the simulation is completed, it will show up in the output preview area. To view the detailed simulation output, we can save it into the library and double-click the simulation node.

The first table in the detailed output shows the overall power, including the global power, conjunctive power, disjunctive power and FWER. The definitions of the different powers are as follows:
Global Power: probability of demonstrating statistical significance on one or more treatment groups
Conjunctive Power: probability of demonstrating statistical significance on all treatment groups which are truly effective
Disjunctive Power: probability of demonstrating statistical significance on at least one treatment group which is truly effective
FWER: probability of incorrectly demonstrating statistical significance on at least one treatment group which is truly ineffective

For this example, the global power is about 90%, which confirms the design power. The conjunctive power is about 8%.
The second table, on the probability of trial termination at each look, displays the average sample size, information fraction, cumulative α spent, boundary information and probability of trial termination at each look. For this example, the chance of terminating the trial at the very first look is less than 3%, and the trial has about a 55% chance of stopping early by the second look. The average sample size for the trial is about 424, shown in the last entry of the average sample size column.
In a MAMS design, when the trial stops for efficacy, one or more treatments may have crossed the efficacy boundary. Such information is valuable in some situations. For example, when multiple dose options are desired for patients with different demographic characteristics, it might be beneficial to approve multiple doses on the product label, giving physicians the option to prescribe the appropriate dose for a specific patient. In this case, we are interested not only in the overall power of the study but also in the power of claiming efficacy on more than one dose group. Such information is summarized in the third table, which shows the probability of demonstrating significance on a specific number of treatments at each look and across all looks. For example, the trial has about 90% overall power. Out of this 90%, it successfully shows significance on only one treatment with 39% probability, on two treatments with 26% probability, on three treatments with 17% probability and on all four treatments with about 8.5% probability. The table also shows this breakdown look by look.
The fourth table summarizes the marginal power for each treatment group look by look
and across all looks. For example, the trial has a marginal power of 29% successfully
demonstrating efficacy for Treatment 1, 38% for Treatment 2, 49% for Treatment 3 and
60% for Treatment 4. The detailed efficacy outcome table as seen in the following
screen provides further efficacy details pertinent to treatment identities. For example,
the trial has about a 3.77% probability of demonstrating efficacy only on Treatment 1,
1.34% on both Treatments 1 and 2, and 1.7% on Treatments 1, 2 and 3. It has an 8.5%
probability of showing significance on all four treatments.

17.2.1 Futility Stopping and Dropping the Losers

In the simulation mode, the futility boundary can be utilized in two different ways. It
can be used to terminate the trial early if the best performing treatment isn't doing
well, or it can be used to drop futile arms along the way and continue only with the
treatments that are performing well. The two options can be accessed through the two
radio buttons below the boundary table, as seen in the
following screen.
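Schematically, the two uses differ only in what is done when arms fall below the futility boundary. The following R fragment is an illustration of that contrast only, not East's simulation engine; delta_hat holds hypothetical interim effect estimates and fut_bound is the futility threshold.

    futility_action <- function(delta_hat, fut_bound, drop_losers = FALSE) {
      futile <- delta_hat <= fut_bound
      if (all(futile)) return("stop trial for futility")   # even the best arm is futile
      if (!drop_losers) return("continue with all arms")   # trial-level stopping rule only
      paste("continue with arms:", paste(which(!futile), collapse = ", "))
    }

    futility_action(c(-0.2, -0.1, 0.9, 1.1), fut_bound = 0, drop_losers = TRUE)
    # "continue with arms: 3, 4"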

Suppose that we would like to incorporate a conservative futility boundary so that we
will terminate the trial if all δs are negative at any interim look. We would specify the
futility boundary as in the following screen.

Suppose we want to see how often the trial will be terminated early for futility if none
of the treatments is effective. Click on the Simulate button at the bottom right of the
window to start the simulation. The detailed output is shown below. Note that the trial
has about a 20% probability of stopping early for futility at the very first look and a
little more than a 9% chance of stopping for futility at the second look. The average
sample size is about 437, compared to 523 for the design without a futility boundary.

Under the design alternative, there is a very small probability (less than 0.5%) of
terminating the trial early for futility, as seen in the following table.

For big companies, a more aggressive futility boundary might be desirable so that
trials for treatments with small effects can be terminated early and resources can be
deployed to other programs. Suppose that a futility boundary based on δ = 0.5 is to be
used. Under the global null hypothesis, there is almost a 70% chance for the trial to stop
early for futility. The average sample size for the study is about 316 compared to 437
for the design with a futility boundary based on δ = 0.

The other use of the futility boundary is to drop arms which are ineffective along
the way. Such a design is more efficient when strong heterogeneity is anticipated
among the treatment arms. Suppose that two of the four treatment regimens have
relatively smaller treatment effects. For example, the mean differences from
control might be 0.1, 0.1, 1.2, 1.3. Without applying any futility rule, the trial has about
85% power and an average sample size of 437. If we drop the doses which cross the futility
boundary based on δ = 0.5, the trial has about 82% power and an average sample size of
328. From the table for probability of trial termination at each look, we can see that the
trial has about an 8% chance of stopping early at the first interim look, of which a little
more than 2% is for efficacy and about 5% is for futility. The trial has a 46% chance of
stopping early at the second look, with about 45% for efficacy and less than 2% for
futility. From the table for additional details of probability of trial termination at each
look, we can see that the trial has a 2.78% chance of stopping for efficacy at the first look,
of which with 2.55% probability the trial demonstrates significance on only one treatment.
At the second look, the trial has about a 45% probability of stopping early for efficacy, of
which with 29% probability it demonstrates significance on one treatment, with 15% probability
on two treatments and with less than 1% probability on three or four treatments. This design
has a marginal power of about 50% to detect significance on Treatment 3 and more than
60% on Treatment 4. Treatment 1 and Treatment 2 each have a 70% chance of
being terminated at look 1 for futility. The marginal probability of futility stopping for
each treatment counts those simulated trials in which the particular treatment crosses
the futility boundary; it does not count those trials in which the particular treatment
falls into the continuation region.

The second table in the above screen shows the probability of demonstrating
significance on a specific number of treatments. However, it does not provide information
on the likelihood of showing efficacy on specific treatment combinations. Such
information is provided in the table for detailed efficacy outcomes. For example, the
trial has about a 20% probability of success with Treatment 3 only, 32% with Treatment
4 only, 30% with both Treatment 3 and Treatment 4.

17.2.2 Interim Treatment Selection

It might be desirable to select promising dose/treatment groups and drop the
ineffective or unsafe groups after reviewing the interim data. In general, there is no
analytical approach to evaluate such a complex design. East provides the option to
evaluate such adaptive designs through intensive simulation. The treatment selection
option can be incorporated by clicking on the icon located on the top bar of
the main simulation dialog window. The treatment selection window looks as
follows. It takes several inputs from the user. The first input is a drop-down box for
the user to specify the look position for performing treatment selection. The next input
is a drop-down box for the treatment effect scale. A list of treatment effect scales is
available, as seen in the following screen, including Wald Statistic, Estimated Mean,
Estimated δ etc.

East provides three different dose/treatment selection rules: (1) Select best r
treatments, (2) Select treatments within ε of the best treatment, (3) Select treatments
greater than threshold ζ, where r, ε and ζ accept inputs from the user. For the same
example, suppose we select the two best treatments at the second interim look. The inputs
are as follows:
Now click on the Simulate button to run the simulations. When the simulation is done, save
it into the Library and view the detailed output as in the following screen. We can see
that the trial has about 85% overall power to detect significance on at least one
treatment group, with an average sample size of 400 (Overall Powers). It has about a
50% probability of stopping early by the second look (Probability of Trial Termination
at Each Look). From the third table (Additional Details of Probability of Trial
Termination at Each Look), it can be seen that the trial has about 52% power to show
significance on only one treatment, 33% probability on two treatments, and less than
1% probability on three or four treatments. Marginally, Treatment 3 has a 53% chance of
success and Treatment 4 has 66% chance of success.

When we select the two best treatments, the sample size for the two selected treatments
remains the same as designed. However, we can reallocate the remaining
sample size from the dropped groups to the selected arms to gain more power. If the
sample size for the dropped arms is reallocated to the selected arms, the efficacy
stopping boundary for the remaining looks has to be recomputed in order to
preserve the type I error. This can be achieved by checking the box for Reallocating
remaining sample size to selected arm on the Treatment Selection tab, as seen in the
following window.

The simulation output is shown in the following screen. Note that the power of the
study is almost 92%, in exchange for a higher average sample size of 436, compared to the
design without sample size reallocation (85% power and 400 average sample size).
Also, with sample size reallocation the study has a higher power of 43% for demonstrating
significance on both Treatment 3 and Treatment 4, compared to 33% for the design
without sample size reallocation.
18 Two-Stage Multi-arm Designs using p-value combination

18.1 Introduction

In the drug development process, identification of promising therapies and inference
on selected treatments are usually performed in two or more stages. The procedure we
will be discussing here is an adaptive two-stage design that can be used for the
situation of multiple treatments to be compared with a control. This allows
integration of both stages within a single confirmatory trial while controlling the
multiple-level type-I error. After the interim analysis in the first stage, the trial may be
terminated early or continued to a second stage, where the set of treatments may be
reduced due to lack of efficacy or the presence of safety problems with some of the
treatments. This procedure in East is highly flexible with respect to stopping rules and
selection criteria and also allows re-estimation of the sample size for the second stage.
Simulations show that the method may be substantially more powerful than classical
one-stage multiple treatment designs with the same total sample size, because the second
stage sample size is focused on evaluating only the promising treatments identified in
the first stage. This procedure is available for continuous as well as discrete endpoint
studies. The current chapter deals with the continuous endpoint studies only; discrete
endpoint studies are handled similarly.

18.2 Study Design

This section will explore different design options available in East with the help of an
example.


18.2.1 Introduction to the Study

Consider designing a placebo controlled, double blind, randomized trial to evaluate
the efficacy, pharmacokinetics, safety and tolerability of a New Chemical Entity (NCE)
given as multiple weekly infusions in subjects with a recent acute coronary syndrome.
There are four dose regimens to be investigated. The treatment effect is assessed
through the change in PAV (percent atheroma volume) from baseline to Day 36
post-randomization, as determined by IVUS (intravascular ultrasound). The expected
changes in PAV for the placebo group and the four dose regimens are 0, 1, 1.1, 1.2, 1.3,
and the common standard deviation is 3. The objective of the study is to find the optimal
dose regimen based on the totality of the evidence including benefit-risk assessment
and cost considerations.

18.2.2 Methodology

This is a randomized, double-blind, placebo-controlled study conducted in two parts
using a 2-stage adaptive design. In Stage 1, approximately 250 eligible subjects will be
randomized equally to one of four treatment arms (NCE doses: 1, 2.5, 5 or 10 mg)
or matching placebo (that is, 50 subjects per dose group). After all subjects in Stage 1
have completed the treatment period or discontinued earlier, an interim analysis will be
conducted to
1. compare the means of each dose group,
2. assess safety within each dose group, and
3. drop the less effective doses.
Based on the interim analysis, Stage 2 of the study will either continue with additional
subjects enrolling into the placebo arm and 1/2/3 favorable, active doses, or the
study will be halted completely if unacceptable toxicity has been observed.
In this example, we will follow this workflow to cover the different options
available in East:
1. Start with five arms (4 doses + placebo).
2. Evaluate the four doses at the interim analysis and, based on the Treatment
Selection Rules, carry forward some of the doses to the next stage.
3. While selecting the doses, also increase the sample size of the trial by using the
Sample Size Re-estimation (SSR) tool to improve conditional power if
necessary. (In a real trial, both of the above actions, dose selection as well as
sample size re-estimation, would be performed after observing the interim data.)
4. See the final design output in terms of the different powers and the probabilities
of selecting particular dose combinations.
5. See the early stopping boundaries for efficacy and futility on the adjusted
p-value scale.
6. Monitor the actual trial using the Interim Monitoring tool in East.
Start East. Click Design tab, then click Many Samples in the Continuous category,
and then click Multiple Looks- Combining p-values test.
This will bring up the input window of the design with some default values. Enter the
inputs as discussed below.

18.2.3 Study Design Inputs

The four doses of the treatment (1 mg, 2.5 mg, 5 mg and 10 mg) will be compared with the
placebo arm based on their treatment means. Preliminary sample size estimates are
provided to achieve an overall study power of at least 90% at an overall, adequately
adjusted 1-sided type-1 error (alpha) level of 2.5%, after taking into account all interim and
final hypothesis tests. Note that we always use a 1-sided alpha since dose-selection rules
are usually 1-sided.
In Stage 1, 250 subjects are initially planned for enrollment (5 arms with 50 subjects
each). Following an interim analysis conducted after all subjects in Stage 1 have
completed the treatment period or discontinued earlier, an additional 225 subjects will be
enrolled into three arms for Stage 2 (placebo and two active doses). So we start with
a total of 250 + 225 = 475 subjects.
The multiplicity adjustment methods available in East to compute the adjusted p-value
(the p-value corresponding to the global null) are Bonferroni, Sidak and Simes. (For a discrete
endpoint test, Dunnett Single Step is not available since the Z-statistic is used.)
Let us use the Bonferroni method for this example. The p-values obtained from both
stages can be combined by using the “Inverse Normal” method. In the “Inverse
Normal” method, East first computes the weights as follows:

\[ w^{(1)} = \sqrt{\frac{n^{(1)}}{n}} \tag{18.1} \]

and

\[ w^{(2)} = \sqrt{\frac{n^{(2)}}{n}} \tag{18.2} \]

where n(1) and n(2) are the total sample sizes corresponding to Stage 1 and Stage 2,
respectively, and n is the total sample size. East displays these weights by default, but
they are editable and the user can specify any other weights as long as

\[ \big(w^{(1)}\big)^2 + \big(w^{(2)}\big)^2 = 1. \tag{18.3} \]

The final p-value is given by

\[ p = 1 - \Phi\left[\, w^{(1)}\,\Phi^{-1}\big(1 - p^{(1)}\big) + w^{(2)}\,\Phi^{-1}\big(1 - p^{(2)}\big) \,\right]. \tag{18.4} \]
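As a quick numerical illustration of equations (18.1)-(18.4), the following R lines compute the weights for this example's planned stage sizes and combine two made-up stage-wise adjusted p-values (the p-values are purely hypothetical, not East output):

    n1 <- 250; n2 <- 225; n <- n1 + n2
    w1 <- sqrt(n1 / n)                                   # eq. (18.1), about 0.7255
    w2 <- sqrt(n2 / n)                                   # eq. (18.2), about 0.6882
    stopifnot(isTRUE(all.equal(w1^2 + w2^2, 1)))         # eq. (18.3)
    p1 <- 0.10; p2 <- 0.01                               # hypothetical stage-wise adjusted p-values
    1 - pnorm(w1 * qnorm(1 - p1) + w2 * qnorm(1 - p2))   # eq. (18.4), about 0.0057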

The weights specified on this tab will be used for p-value computation: w(1) will be
used for data before the interim look and w(2) for data after the interim look.
Thus, according to the sample sizes planned for the two stages in this example, the
weights are calculated as \(\sqrt{250/475}\) and \(\sqrt{225/475}\). Note: these weights are
updated by East once we specify the first look position as 250/475 in the Boundary
tab, so leave them at their default values for now. Set the Number of Arms as 5 and enter
the rest of the inputs as shown below:

We can certainly have early stopping boundaries for efficacy and/or futility. But
generally, in designs like this, the objective is to select the best dose(s), not to stop
early. So for now, select the Boundary tab and set both boundary families to
“None”. Also, set the timing of the interim analysis as 0.526, which will be after
observing the data on 250 subjects out of 475. Enter 250/475 as shown below. Notice
the updated weights on the Test Parameters tab.

The next tab is Response Generation, which is used to specify the true underlying
means of the individual dose groups and the initial allocation from which to generate
the simulated data.

One can also generate the mean response for all the arms using a dose-response curve
such as 4PL, Emax, Linear or Quadratic. This can be done by checking the box for
Generate Means through DR Curve and entering appropriate parameters for the DR
model selected.

For this example, we will use the given means and standard deviation and not generate
them using a DR curve. Make sure the means are 0, 1, 1.1, 1.2, 1.3 and the SD is 3.
Before we update the Treatment Selection tab, go to the Simulation Control
Parameters tab, where we can specify the number of simulations to run and the random
number seed, and choose whether to save the intermediate simulation data. For now, enter the
inputs as shown below and keep all other inputs at their defaults.

Click on the Treatment Selection tab. This tab is used to select the scale on which to
compute the treatment-wise effects. The treatment effect scale is required for selecting
treatments for the second stage; the control treatment is not considered for selection
and is always carried into the second stage. The list under Treatment
Effect Scale allows you to set the selection rules on different scales. Select
Estimated δ from this list. This means that all the selection rules we specify on this tab
will be in terms of the estimated value of the treatment effect δ, i.e., the difference from
placebo. Here is a list of all available treatment effect scales:
Estimated Mean, Estimated δ, Estimated δ/σ, Test Statistic, Conditional Power,
Isotonic Mean, Isotonic δ, Isotonic δ/σ.
For more details on these scales, refer to the Appendix K chapter on this method.
The next step is to set the treatment selection rules for the second stage.
Select Best r Treatments: The best treatment is defined as the treatment having the
highest or lowest mean effect. The decision is based on the rejection region: if it
is “Right-Tail”, the highest is taken as best; if it is “Left-Tail”, the lowest is taken
as best. Note that the rejection region does not affect the choice of treatment
based on conditional power.
Select treatments within ε of Best Treatment: Suppose the treatment effect scale is
Estimated δ. If the best treatment has a treatment effect of δb and ε is specified
as 0.1, then all the treatments which have a δ of δb − 0.1 or more are chosen for
Stage 2.
Select treatments greater than threshold ζ: The treatments whose value on the
treatment effect scale is greater or less than the threshold ζ specified by the user,
according to the rejection region, are selected. If the treatment effect scale is chosen as
conditional power, then the comparison is always “greater than”.
Use R for Treatment Selection: If you wish to define customized treatment
selection rules, this can be done by writing an R function for those rules to be used
within East. This is possible due to the R Integration feature in East. Refer to the
appendix chapter on R Functions for more details on the syntax and use of this
feature; a template file for defining treatment selection rules is also available in
the subfolder RSamples under your East installation directory. For more details on
using R to define treatment selection rules, refer to Section O.10.
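To give a flavor of what such a rule can look like, here is a small illustrative R function. The function name and argument are hypothetical; the exact signature East expects is documented in the R Functions appendix and the RSamples template. It keeps every arm whose estimated δ is within ε of the best arm:

    select_within_eps <- function(delta, eps = 0.1) {
      # delta: vector of estimated treatment effects (difference from control)
      # return 1 to carry an arm into Stage 2, 0 to drop it
      as.integer(delta >= max(delta) - eps)
    }

    select_within_eps(c(0.4, 0.9, 1.1, 1.2), eps = 0.25)  # keeps arms 3 and 4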
Selecting multiple doses (arms) for Stage 2 would be more effective than selecting
just the best one. For this example, select the first rule, Select Best r
treatments, and set r = 2, which indicates that East will select the best two of the
four doses for Stage 2. We will leave the Allocation Ratio after Selection as 1 to yield
equal allocation between the control and the selected doses in Stage 2.
Click the Simulate button to run the simulations. When the simulations are over, a row
gets added in the Output Preview area. Save this row to the Library by clicking the
icon in the toolbar. Rename this scenario as Best2. Double-click it to see the
detailed output.

The first table in the detailed output shows the overall power including global power,
conjunctive power, disjunctive power and FWER. The definitions for different powers
are as follows:
Global Power: probability of demonstrating statistical significance on one or
more treatment groups
Conjunctive Power: probability of demonstrating statistical significance on all
treatment groups which are truly effective
Disjunctive Power: probability of demonstrating statistical significance on at
least one treatment group which is truly effective
FWER: probability of incorrectly demonstrating statistical significance on at
least one treatment group which is truly ineffective
For our example, there is 88% global power, which is the probability that this design
rejects at least one null hypothesis, where each null hypothesis states that the true mean
response at a given dose equals that of control. Also shown are the conjunctive and
disjunctive power, as well as the Family Wise Error Rate (FWER).
The Lookwise Summary table summarizes the number of simulated trials that ended
with a conclusion of efficacy, i.e., rejected any null hypothesis, at each look. In this
example, no simulated trial stopped at the interim analysis with an efficacy conclusion
since there were no stopping boundaries, but 8845 simulations yielded an efficacy
conclusion via the selected dose after Stage 2. This is consistent with the global power.
The table Detailed Efficacy Outcomes for all 10000 Simulations summarizes the
number of simulations for which each individual dose group or pairs of doses were
selected for Stage 2 and yielded an efficacy conclusion. For example, the pair
(2.5mg, 10mg only) was observed to be efficacious in approximately 16% of the trials
(1576/10000).
The next table, Marginal Probabilities of Selection and Efficacy, summarizes the
number and percent of simulations in which each dose was selected for Stage 2,
regardless of whether it was found significant at the end of Stage 2 or not, as well as the
number and percent of simulations in which each dose was selected and found
significant. Average sample size is also shown. This tells us how frequently a dose
(either alone or with some other dose) was selected and efficacious. For example, dose
10mg was selected in approximately 65% of trials and was efficacious in approximately
56% of trials (which is the sum of the 631, 1144, 1576 and 2254 simulations from the
previous table).
The advantage of a 2-stage “treatment selection” or “drop-the-loser” design is
that it allows dropping the poorly performing or futile arms based on the interim data
while still preserving the type-1 error and achieving the desired power.
In the Best2 scenario, we dropped two doses (r = 2). Suppose we had decided to
proceed to Stage 2 without dropping any doses. In this case, the power would have
dropped significantly. To verify this in East, click the button on the
bottom left corner of the screen. This will take us back to the input window of the last
simulation scenario. Go to the Treatment Selection tab, set r = 4, run the simulations,
and save the result to the Library. Rename this scenario as All4. Double-click it to see
the detailed output. We can observe that the power drops from 88% to 78%. That is
because the sample size of 225 is being shared among five arms as against three arms
in the Best2 case.
Now go back to the Treatment Selection tab and set r = 2 as before. Select one more rule,
Select Treatments within ε of Best Treatment, and set the ε value as 0.05. The tab
should look as shown below.

Also set the Starting Seed on the Simulation Controls tab to 100. Note that since we
have selected two treatment selection rules, East will simulate two different
scenarios, one for each rule. As we want to compare the results from these two
scenarios, we use the same starting seed. That ensures the same random number
generation, so the only difference in the results will be the effect of the two rules.
Save these two scenarios in the Library as r=2 and epsilon=0.05, select them and
click the icon in the toolbar to see them side-by-side.


Notice the powers for the two scenarios. The scenario with the rule of δb − 0.05 yields
more power than the Best2 scenario. Note that δb is the highest among the
simulated δ values for the four doses at the interim look.
You can also view the Output Details of these two scenarios. Select the two nodes as
before, but this time click the icon in the toolbar.

Notice from this comparison that, with the more general ε-based rule, we can select
multiple doses, not just two. At the same time, the marginal probability of selection
as well as efficacy for each dose drops significantly.

18.2.4 Simulating under Different Alternatives

Since this is a simulation-based design, we can perform sensitivity analyses by
changing some of the inputs and observing the effects on the overall power and other
output. Let us first make sure that this design preserves the total type-1 error. This can be
done by running the simulations under the null hypothesis.
Select the last design created, which would be epsilon=0.05, in the Library and click
the icon. This will take you to the input window of that design. Go to the Response
Generation tab and enter the inputs as shown below. Notice that all the means are 0,
which means the simulations will be run under the null assumption.

Run the simulations and go to the detailed output by saving the row from the Output
Preview to the Library. Notice that the global power and the simulated FWER are less than
0.025, which means the overall type-1 error is preserved.

18.3 Sample Size Re-estimation

As seen in the previous scenario, the desired power of approximately 92% is achieved
with the sample size of 475 if the initial assumptions (µc = 0, µ1mg = 1,
µ2.5mg = 1.1, µ5mg = 1.2 and µ10mg = 1.3) hold true. But if they do not, then the
original sample size of 475 may be insufficient to achieve 92% power. Adaptive
sample size re-estimation is suited to this purpose. In this approach we start out with a
sample size of 475 subjects, but take an interim look after data are available on 250
subjects. The purpose of the interim look is not to stop the trial early but rather to
examine the interim data and continue enrolling past the planned 475 subjects if the
interim results are promising enough to warrant the additional investment of sample
size. This strategy has the advantage that the sample size is finalized only after a
thorough examination of data from the actual study rather than through making a large
up-front sample size commitment before any data are available. Furthermore, since the
sample size may only be increased, never decreased, from the originally planned
475 subjects, there is no loss of efficiency due to overruns. Suppose the mean
responses on the five doses are as shown below. Update the Response Generation tab
accordingly and also set the seed as 100 in the Simulation Controls tab.

Run 10000 simulations and save the simulation row to the Library by clicking the
icon in the toolbar. See the details.

Notice that the global power has dropped from 92% to 78%. Let us re-estimate the
sample size to achieve the desired power. Add the Sample Size Re-estimation tab by
clicking the button. A new tab gets added as shown below.

SSR At: For a K-look group sequential design, one can decide the time at which
the conditions for adaptation are to be checked and the actual adaptation is to be
carried out. This can be done either at some intermediate look or after some
specified information fraction. The possible values of this parameter depend
upon the user's choice. The default choice for this design is always Look #,
and it is fixed to 1 since this is always a 2-look design.
Target CP for Re-estimating Sample Size: The primary driver for increasing the
sample size at the interim look is the desired (or target) conditional power: the
probability of obtaining a positive outcome at the end of the trial, given the data
already observed. For this example we have set the target conditional power at the end
of the trial to 92%. East then computes the sample size that would be required
to achieve this conditional power.
Maximum Sample Size if Adapt (multiplier; total): As just stated, a new sample
size is computed at the interim analysis on the basis of the observed data so as to
achieve some target conditional power. However, the sample size so obtained
will be overruled unless it falls between pre-specified minimum and maximum
values. For this example, let us use a multiplier of 2, indicating that we intend
to double the original sample size if the results are promising. The range of
allowable sample sizes is [475, 950]. If the newly computed sample size falls
outside this range, it will be reset to the appropriate boundary of the range. For
example, if the sample size needed to achieve the desired 90% conditional power
is less than 475, the new sample size will be reset to 475. In other words, we will
not decrease the sample size from what was specified initially. On the other
hand, the upper bound of 950 subjects demonstrates that the sponsor is prepared
to double the sample size in order to achieve the desired 90% conditional power.
But if 90% conditional power requires more than 950 subjects, the sample size
will be reset to 950, the maximum allowed.
Promising Zone Scale: One can define the promising zone as an interval based on
conditional power, test statistic, or estimated δ/σ. The input fields change
according to this choice. The decision of altering the sample size is taken based
on whether the interim value of conditional power / test statistic / δ/σ lies in this
interval or not. Let us keep the default scale which is Conditional Power.
Promising Zone: Minimum/Maximum Conditional Power (CP): The sample size
will only be altered if the estimate of CP at the interim analysis lies in a
pre-specified range, referred to as the “Promising Zone”. Here the promising
zone is 0.30 − 0.90. The idea is to invest in the trial in stages. Prior to the
interim analysis the sponsor is only committed to a sample size of 475 subjects.
If, however, the results at the interim analysis appear reasonably promising, the
sponsor would be willing to make a larger investment in the trial and thereby
improve the chances of success. Here we have somewhat arbitrarily set the lower
bound for a promising interim outcome to be CP = 0.30. An estimate
CP < 0.30 at the interim analysis is not considered promising enough to
warrant a sample size increase. It might sometimes be desirable to also specify
an upper bound beyond which no sample size change will be made. Here we
have set that upper bound of the promising zone at CP = 0.90. In effect we
have partitioned the range of possible values for conditional power at the interim
analysis into three zones: unfavorable (CP < 0.3), promising
(0.3 ≤ CP < 0.9), and favorable (CP ≥ 0.9). Sample size adaptations are
made only if the interim CP falls in the promising zone at the interim analysis.
A promising zone defined on the Test Statistic scale or the Estimated δ/σ scale
works similarly.
SSR Function in Promising Zone: The behavior in the promising zone can be defined
either by a continuous function or by a step function. The default is continuous,
where East accepts the two quantities (Multiplier, Target CP) and re-estimates
the sample size depending upon the interim value of CP / test statistic / effect size.
The SSR function can also be defined as a step function, with a single piece or
with multiple pieces. For each piece, define the step function in terms of:
the interval of CP / test statistic / δ/σ (this depends upon the choice of
promising zone scale);
the value of the re-estimated sample size in that interval;
for a single piece, just the total re-estimated sample size is required as an
input.
If the interim value of CP / test statistic / δ/σ lies in the promising zone, then the
re-estimation will be done using this step function.
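To make the zone logic concrete, here is a schematic R sketch of promising-zone re-estimation on the conditional power scale, using this example's inputs (planned n = 475, multiplier 2, zone 0.30-0.90). It illustrates the rule only and is not East's algorithm; cp_interim and n_for_target_cp stand in for quantities East would compute from the interim data.

    reestimate_n <- function(cp_interim, n_for_target_cp,
                             n_planned = 475, multiplier = 2,
                             zone = c(0.30, 0.90)) {
      n_max <- multiplier * n_planned
      # outside the promising zone: keep the planned sample size
      if (cp_interim < zone[1] || cp_interim >= zone[2]) return(n_planned)
      # promising: use the re-estimated n, clamped to [n_planned, n_max]
      min(max(n_for_target_cp, n_planned), n_max)
    }

    reestimate_n(cp_interim = 0.55, n_for_target_cp = 700)    # adapts to 700
    reestimate_n(cp_interim = 0.20, n_for_target_cp = 1200)   # unfavorable: stays 475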
Let us set the inputs on the Sample Size Re-estimation tab as shown below. For
comparison purposes, also run the simulations without adaptation. Both scenarios
can also be run together by entering the two values 1, 2 in the cell for Multiplier.

Run 10000 simulations and see the Details.
With Sample Size Re-estimation


Without Sample Size Re-estimation

We observe from the table that the power of the adaptive implementation is approximately
85%, which is almost an 8% improvement over the non-adaptive design. This increase in
power has come at an average cost of 540 − 475 = 65 additional subjects. Next we
observe from the Zone-wise Averages table that 1610 of 10000 trials (16%)
underwent sample size re-estimation (the Total Simulation Count in the “Promising
Zone”) and that, of those 1610 trials, 89% were able to reject the global null hypothesis.
The average sample size, conditional on adaptation, is 882.

18.4 Adding Early Stopping Boundaries

One can also incorporate stopping boundaries to stop early at the interim look for efficacy
or futility. The efficacy boundary can be defined on the Adjusted p-value scale,
whereas the futility boundary can be on the Adjusted p-value or δ/σ scale.
Click the button on the bottom left corner of the screen. This will take
you back to the input window of the last simulation scenario. Go to the Boundary tab and
set the Efficacy and Futility boundaries to “Adjusted p-value”. These boundaries are for
early stopping at look 1. As the note on this tab says:
If any one adjusted p-value is ≤ the efficacy p-value boundary, then stop the trial for
efficacy.
If all the adjusted p-values are > the futility p-value boundary, then stop the trial for
futility. Otherwise, carry forward all the treatments to the next step of treatment
selection.
Stopping early for efficacy or futility is a step which is carried out before applying the
treatment selection rules. The simulation output has the same interpretation as above,
except that the Lookwise Summary table may have some trials stopped at the first look
due to efficacy or futility.

18.5 Interim Monitoring with Treatment Selection

Select the simulation node with the SSR implementation and click the icon. It will
invoke the Interim Monitoring dashboard. Click the icon to
open the Test Statistic Calculator. The “Sample Size” column is filled out according
to the originally planned design (50/arm). Enter the data as shown below:

Click Recalc to calculate the test statistic as well as the raw p-values. Notice that the
p-values for 1mg and 2.5mg are 0.069 and 0.030 respectively, which are greater than
0.025. We will drop these doses in the second stage. On clicking OK, the dashboard is
updated. The overall adjusted p-value is 0.067.

Open the test statistic calculator for the second look, enter the following
information, and drop the two doses 1mg and 2.5mg using the “Action” dropdown.
Click Recalc to calculate the test statistic as well as the raw p-values.

On clicking OK, the dashboard is updated. Observe that the adjusted p-value for 10mg
crosses the efficacy boundary. This can also be observed in the Stopping Boundaries
chart.

The final p-value adjusted for multiple treatments is 0.00353.
19 Normal Superiority Regression

Linear regression models are used to examine the relationship between a response
variable and one or more explanatory variables assuming that the relationship is linear.
In this chapter, we discuss the design of three types of linear regression models. In
Section 19.1, we examine the problem of testing a single slope in a simple linear
regression model involving one continuous covariate. In Section 19.2, we examine the
problem of testing the equality of two slopes in a linear regression model with only one
observation per subject. Finally, in Section 19.3, we examine the problem of testing
the equality of two slopes in a linear regression repeated measures model, applied to a
longitudinal setting.

19.1 Linear Regression, Single Slope

We assume that the observed value of a response variable Y is a linear function of an
explanatory variable X plus random noise. For each of the i = 1, ..., n subjects in a
study

\[ Y_i = \gamma + \theta X_i + \epsilon_i \]

Here the εi are independent normal random variables with E(εi) = 0 and
Var(εi) = σε². We follow Dupont et al. (1998) and emphasize a slight distinction
between observational and experimental studies. In an observational study, the values
Xi are attributes of randomly chosen subjects and their possible values are not known
to the investigator at the time of a study design. In an experimental study, a subject is
randomly assigned (with possibly different probabilities) to one of the predefined
experimental conditions. Each of these conditions is characterized by a certain value of
explanatory variable X that is completely defined at the time of the study design. In
both cases the value Xi characterizing either an attribute or experimental exposure of
subject i is a random variable with a variance σx².
We are interested in testing that the slope θ is equal to a specified value θ0. Thus we
test the null hypothesis H0: θ = θ0 against the two-sided alternative H1: θ ≠ θ0 or a
one-sided alternative hypothesis H1: θ < θ0 or H1: θ > θ0.
Let θ̂ denote the estimate of θ, and let σ̂ε² and σ̂x² denote the estimates of σε² and σx²
based on n observations. The variance of θ̂ is

\[ \sigma_{\hat\theta}^2 = \frac{\sigma_\epsilon^2}{n\,\sigma_x^2}. \tag{19.1} \]

The test statistic is defined as

\[ Z = \frac{\hat\theta - \theta_0}{\hat\sigma_{\hat\theta}}, \tag{19.2} \]

where

\[ \hat\sigma_{\hat\theta}^2 = \frac{\hat\sigma_\epsilon^2}{n\,\hat\sigma_x^2} \]

is the estimate of the variance of θ̂ based on n observations. Notice that the test
statistic is centered so as to have a mean of zero under the null hypothesis.

We want to design the study so that the power is attained when θ = θ1. The power depends
on θ0, θ1, σx, and σε through θ0 − θ1 and σx/σε.

19.1.1 Trial Design

During the development of medications, we often want to model the dose-response
relationship, which may be done by estimating the slope of the regression, where Y is
the appropriate response variable and the explanatory variable X is a set of specified
doses. Consider a clinical trial involving four doses of a medication. The doses and
the randomization of subjects across the doses have been chosen so that the standard
deviation is σx = 9. Based on prior studies, it is assumed that σε = 15. If there is no
dose response, the slope is equal to 0. Thus we will test the null hypothesis H0: θ = 0
against the two-sided alternative H1: θ ≠ 0. The study is to be designed to have 90%
power at the alternative θ1 = 0.5 with a type-1 error rate of 5%.
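As a rough plausibility check on the sample size East computes below, the crude normal-approximation formula n = [(z(1−α/2) + z(1−β)) σε / (θ1 σx)]² can be evaluated in R; it lands close to, but slightly under, East's answer of 119 (East uses a more refined computation):

    alpha <- 0.05; power <- 0.9
    theta1 <- 0.5; sigma_x <- 9; sigma_eps <- 15
    z <- qnorm(1 - alpha/2) + qnorm(power)
    ceiling((z * sigma_eps / (theta1 * sigma_x))^2)   # 117, vs East's 119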
Start East afresh. Click Continuous: Regression on the Design tab and then click
Single-Arm Design: Linear Regression - Single Slope.

This will launch a new input window. Select 2-Sided for Test Type. Enter 0.05
and 0.9 for Type I Error (α) and Power, respectively. Enter the values of θ0 = 0,
θ1 = 0.5, σx = 9, and σε = 15.

Click Compute. This will calculate the sample size for this design and the output is
shown as a row in the Output Preview located in the lower pane of this window. The
computed sample size (119 subjects) is highlighted in yellow.

Des 1 requires 119 subjects in order to attain 90% power. Select this design by clicking
anywhere along the row in the Output Preview and click the icon. Some of the design
details will be displayed in the upper pane, labeled as Output Summary.

In the Output Preview toolbar, click
to save this design to Wbk1 in the
Library. Now double-click on Des 1 in Library. You will see a summary of the
design.

19.2 Linear Regression for Comparing Two Slopes

In some experimental situations, we are interested in comparing the slopes of two
regression lines. The regression model relates the response variable Y to the
explanatory variable X using the model

\[ Y_{il} = \gamma + \theta_i X_{il} + \epsilon_{il}, \]

where the error εil has a normal distribution with mean zero and an unknown variance
σε², for Subject l in Treatment i, i = c, t and l = 1, ..., ni. Let σxc² and σxt² denote the
variances of the explanatory variable X for control (c) and treatment (t), respectively.
We are interested in testing the equality of the slopes θc and θt. Thus we test the null hypothesis
H0: θc = θt against the two-sided alternative H1: θc ≠ θt or a one-sided alternative
hypothesis H1: θc < θt or H1: θc > θt.
Let θ̂c and θ̂t denote the estimates of θc and θt, and let σ̂ε², σ̂xc², and σ̂xt² denote the
estimates of σε², σxc², and σxt², based on nc and nt observations, respectively. The
variance of θ̂i is

\[ \sigma_i^2 = \frac{\sigma_\epsilon^2}{n_i\,\sigma_{xi}^2}. \]

Let n = nc + nt and let r = nt/n. Then, the test statistic is

\[ Z = \frac{n^{1/2}\,(\hat\theta_t - \hat\theta_c)}{\hat\sigma_\epsilon \left[ \dfrac{1}{(1-r)\,\hat\sigma_{xc}^2} + \dfrac{1}{r\,\hat\sigma_{xt}^2} \right]^{1/2}}. \tag{19.3} \]

19.2.1 Trial Design

We want to design the study so that the power is attained for specified values of θc and θt.
The power depends on θt, θc, σxc, σxt, and σε through θt − θc, σxc/σε, and σxt/σε.
Suppose that a medication was found to have a response that depends on the level of a
certain laboratory parameter. It was decided to develop a new formulation for which
this interaction is decreased. The explanatory variable is the baseline value of the
laboratory parameter. The study is designed with θt = 0.5, θc = 1, σxc = σxt = 6, and
σε = 10. We examine the slopes of the two regressions by testing the null hypothesis
H0: θt = θc. Although we hope to decrease the slope, we test the null hypothesis
against the two-sided alternative H1: θt ≠ θc.
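Again as a rough check only (a crude normal approximation assuming equal allocation, not East's algorithm, which yields 469 below; the approximation lands within a couple of subjects):

    alpha <- 0.05; power <- 0.9; r <- 0.5        # equal allocation assumed
    theta_c <- 1; theta_t <- 0.5
    sigma_xc <- 6; sigma_xt <- 6; sigma_eps <- 10
    z <- qnorm(1 - alpha/2) + qnorm(power)
    var_term <- 1/((1 - r) * sigma_xc^2) + 1/(r * sigma_xt^2)
    ceiling(z^2 * sigma_eps^2 * var_term / (theta_t - theta_c)^2)   # 468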
Start East afresh. Click Continuous: Regression on the Design tab and then click
Parallel Design: Linear Regression - Difference of Slopes.
This will launch a new input window. Select 2-Sided for Test Type. Enter 0.05 and
0.9 for Type I Error (α) and Power, respectively. Select Individual Slopes for
Input Method, and enter the values of θc = 1, θt = 0.5, σxc = 6, σxt = 6, and
σε = 10.

Click Compute. This will calculate the sample size for this design and the output is
shown as a row in the Output Preview located in the lower pane of this window. The
computed sample size (469) is highlighted in yellow.

Des 1 requires 469 subjects in order to attain 90% power. Select this design by clicking
anywhere along the row in the Output Preview and click the icon. Some of the design
details will be displayed in the upper pane, labeled as Output Summary.

19.3 Repeated Measures for Comparing Two Slopes

In many clinical trials, each subject is randomized to one of two groups, and responses
are collected at various timepoints on the same individual over the course of the trial.
In these “longitudinal” trials, we are interested in testing the equality of slopes, or
mean response changes per unit time, between the treatment group (t) and the control
group (c). A major difficulty associated with designing such studies is the fact that the
data are independent across individuals, but the repeated measurements on the same
individual are correlated. The sample size computations then depend on within – and
between – subject variance components that are often unknown at the design stage.
One way to tackle this problem is to use prior estimates of these variance components
(also known as nuisance parameters) from other studies, or from pilot data.
Suppose each patient is randomized to either group c or group t. The data consist of a
series of repeated measurements on the response variable for each patient over time.
Let M denote the total number of measurements, inclusive of the initial baseline
measurement, intended to be taken on each subject. These M measurements will be
taken at times vm , m = 1, 2, . . . M , relative to the time of randomization, where
v1 = 0. A linear mixed effects model is usually adopted for analyzing such data. Let
Yilm denote the response of subject l, belonging to group i, at time point vm . Then the
model asserts that
\[ Y_{clm} = \gamma_c + \theta_c v_m + a_l + b_l v_m + e_{lm} \tag{19.4} \]

for the control group, and

\[ Y_{tlm} = \gamma_t + \theta_t v_m + a_l + b_l v_m + e_{lm} \tag{19.5} \]

for the treatment group, where the random effect (al, bl)' is multivariate normal with
mean (0, 0)' and variance-covariance matrix

\[ G = \begin{pmatrix} \sigma_a^2 & \sigma_{ab} \\ \sigma_{ab} & \sigma_b^2 \end{pmatrix}, \]

and the elm's are all iid N(0, σw²). In this model, σw² denotes the “within-subject”
variability, attributable to repeated measurements on the same subject, while G denotes
the “between-subjects” variability, attributable to the heterogeneity of the population
being studied.
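To make the variance components concrete, the following R fragment simulates one control subject's measurement vector under model (19.4). The values of σa and σab are purely illustrative (the design example below specifies only σw and σb); the other values match that example. It uses the MASS package shipped with R.

    set.seed(1)
    v <- 0:6                                   # visit times v_m (M = 7)
    sigma_w <- 4; sigma_b <- 6
    sigma_a <- 2; sigma_ab <- 0                # illustrative values, not from the example
    gamma_c <- 0; theta_c <- 2
    G  <- matrix(c(sigma_a^2, sigma_ab, sigma_ab, sigma_b^2), 2, 2)
    ab <- MASS::mvrnorm(1, mu = c(0, 0), Sigma = G)   # random intercept and slope (a_l, b_l)
    y  <- gamma_c + theta_c * v + ab[1] + ab[2] * v + rnorm(length(v), 0, sigma_w)
    round(y, 2)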

Define

\[ \delta = \theta_t - \theta_c. \]

We are interested in testing H0: δ = 0 against the two-sided alternative H1: δ ≠ 0, or
against one-sided alternative hypotheses of the form H1: δ > 0 or H1: δ < 0.

Let (θ̂C, θ̂T) be the maximum likelihood estimates of (θC, θT), based on an enrollment
of (nC, nT), respectively. The estimate of the difference of slopes is

\[ \hat\delta = \hat\theta_T - \hat\theta_C \tag{19.6} \]

and its standard error is denoted by se(δ̂). The test statistic is the familiar Wald statistic

\[ Z = \frac{\hat\delta}{se(\hat\delta)}. \tag{19.7} \]

19.3.1 Trial Design

Consider a trial to compare an analgesic to placebo in the treatment of chronic pain
using a 10 cm visual analogue scale (VAS). Measurements are taken on each subject at
baseline and once a month for six months. Thus M = 7 and S = 6. It is assumed from
past data that σw = 4 and σb = 6. We wish to test the null hypothesis H0 : θt = θc
with a two-sided level-0.05 test having 90% power to detect a 1 cm/month decline in
slope, with θc = 2 and θt = 1 under H1 .
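A back-of-the-envelope check is possible here, assuming equal allocation and equally spaced monthly visits (both true in this example; East's internal algorithm is more general): each subject's estimated slope has variance σw²/Σm(vm − v̄)² + σb², and the usual two-sample normal formula then gives the total number of completers.

    alpha <- 0.05; power <- 0.9
    sigma_w <- 4; sigma_b <- 6
    delta <- 1                                  # |theta_t - theta_c| = |1 - 2|
    v <- 0:6                                    # monthly visits, M = 7
    var_slope <- sigma_w^2 / sum((v - mean(v))^2) + sigma_b^2
    z <- qnorm(1 - alpha/2) + qnorm(power)
    ceiling(4 * z^2 * var_slope / delta^2)      # 1538, matching the design below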
Start East afresh. Click Continuous: Regression on the Design tab, and then click
Parallel Design: Repeated Measures - Difference of Slopes.

This will launch a new input window. Select 2-Sided for Test Type. Enter 0.05 and
0.9 for Type I Error (α) and Power, respectively. Select Individual Slopes for
Input Method. Enter the values of θc = 2, θt = 1, Duration of Follow up
(S) = 6, Number of Measurements (M) = 7, σw = 4, and σb = 6.

Click Compute. This will calculate the sample size for this design and the output is
shown as a row in the Output Preview located in the lower pane of this window. The
computed sample size (1538) is highlighted in yellow.

Des 1 requires 1538 completers in order to attain 90% power. Select this design by
clicking anywhere along the row in the Output Preview and click the icon. Some of
the design details will be displayed in the upper pane, labeled as Output Summary.


Volume 3: Binomial and Categorical Endpoints

20 Introduction to Volume 3
21 Tutorial: Binomial Endpoint
22 Binomial Superiority One-Sample
23 Binomial Superiority Two-Sample
24 Binomial Non-Inferiority Two-Sample
25 Binomial Equivalence Two-Sample
26 Binomial Superiority n-Sample
27 Multiple Comparison Procedures for Discrete Data
28 Multiple Endpoints-Gatekeeping Procedures for Discrete Data
29 Two-Stage Multi-arm Designs using p-value combination
30 Binomial Superiority Regression
31 Agreement
32 Dose Escalation

20 Introduction to Volume 3

This volume describes the procedures for discrete (binomial) endpoints applicable to
one-sample, two-sample, many-sample, regression and agreement situations. All
three types of designs (superiority, non-inferiority and equivalence) are discussed in
detail.
Chapter 21 introduces you to East on the Architect platform, using an example clinical
trial to test difference of proportions.
Chapter 22 deals with the design and interim monitoring of two types of tests involving
binomial response rates that can be described as the superiority one-sample situation.
Section 22.1 discusses designs in which an observed binomial response rate is
compared to a fixed response rate, possibly derived from historical data. Section 22.2
deals with McNemar’s test for comparing matched pairs of binomial responses.
Chapter 38 discusses Simon’s two-stage design in detail.
Chapter 23 discusses the superiority two-sample situation, where the aim is to compare
independent samples from two populations in terms of the proportion of sampling
units presenting a given trait. East supports the design and interim monitoring of
clinical trials in which this comparison is based on the difference of proportions, the
ratio of proportions, the odds ratio, or the common odds ratio of the two populations.
The four cases are discussed in Sections 23.1, 23.2, 23.3 and 23.4,
respectively. Section 23.5 discusses Fisher’s exact test for single look designs.
Chapter 24 presents an account of designing and monitoring non-inferiority trials in
which the non-inferiority margin is expressed as either a difference, a ratio, or an odds
ratio of two binomial proportions. The difference is examined in Section 24.1. This is
followed by two formulations for the ratio: the Wald formulation in Section 24.2 and
the Farrington-Manning formulation in Section 24.3. The odds ratio formulation is
presented in Section 24.4.
Chapter 25 narrates the details of the design and interim monitoring in equivalence
two-sample situation where the goal is neither establishing superiority nor
non-inferiority, but equivalence. Examples of this include showing that an aggressive
therapy yields a similar rate of a specified adverse event to the established control,
such as the bleeding rates associated with thrombolytic therapy or cardiac outcomes
with a new stent.
Chapter 26 details the design and interim monitoring of superiority k-sample
experimental situations where there are several binomial distributions indexed by an
ordinal variable and where it is required to examine changes in the probabilities of
success as the levels of the indexing variable changes. Examples of this include the
examination of a dose-related presence of a response or a particular side effect,
dose-related tumorgenicity, or presence of fetal malformations relative to levels of
maternal exposure to a particular toxin, such as alcohol, tobacco, or environmental
factors.
Chapter 27 details the Multiple Comparison Procedures (MCP) for discrete data. It is
often the case that multiple objectives are to be addressed in one single trial. These
objectives are formulated into a family of hypotheses. Multiple comparison (MC)
procedures provide a guard against inflation of type I error while testing these multiple
hypotheses. East supports several parametric and p-value based MC procedures. This
chapter explains how to design a study using a chosen MC procedure that strongly
maintains FWER.
Chapter 30 describes how East may be used to design and monitor two-arm
randomized clinical trials with a binomial endpoint, while adjusting for the effects of
covariates through the logistic regression model. These methods are limited to binary
and categorical covariates only. A more general approach, not limited to categorical
covariates, is to base the design on statistical information rather than sample size. This
approach is further explained in Chapter 59.
Chapter 31 discusses the tests available to check the inter-rater reliability. In some
experimental situations, to check inter-rater reliability, independent sets of
measurements are taken by more than one rater and the responses are checked for
agreement. For a binary response, Cohen’s Kappa test for binary ratings can be used to
check inter-rater reliability.
Chapter 32 deals with the design, simulation, and interim monitoring of Phase 1 dose
escalation trials. One of the primary goals of Phase I trials in oncology is to find the
maximum tolerated dose (MTD). Sections 32.1, 32.2, 32.3 and 32.4 discuss the four
commonly used dose escalation methods: 3+3, Continual Reassessment Method
(CRM), modified Toxicity Probability Interval (mTPI) and Bayesian Logistic
Regression Model (BLRM).

20.1 Settings

Click the icon in the Home menu to adjust default values in East 6.

The options provided in the Display Precision tab are used to set the decimal places of
numerical quantities. The settings indicated here will be applicable to all tests in East 6
under the Design and Analysis menus.
All these numerical quantities are grouped in different categories depending upon their
usage. For example, all the average and expected sample sizes computed at simulation
or design stage are grouped together under the category ”Expected Sample Size”. So to
view any of these quantities with greater or lesser precision, select the corresponding
category and change the decimal places to any value between 0 to 9.
The General tab provides for adjusting the paths for storing workbooks, files, and
temporary files. These paths persist across current and future sessions, even after East
is closed. This is also where you specify the installation directory of the R software in
order to use the R Integration feature of East 6.

Design Defaults is where the user can change the settings for trial design:

Under the Common tab, default values can be set for input design parameters.
You can set up the default choices for the design type, computation type, test type and
the default values for type-I error, power, sample size and allocation ratio. When a new
design is invoked, the input window will show these default choices.
Time Limit for Exact Computation
This time limit is applicable only to exact designs and charts. Exact methods are
computationally intensive and can easily consume several hours of computation
time if the likely sample sizes are very large. You can set the maximum time
available for any exact test in terms of minutes. If the time limit is reached, the
test is terminated and no exact results are provided. The minimum and default
value is 5 minutes.
Type I Error for MCP
If the user has selected a 2-sided test as the default in the global settings, then
any MCP will use half of the alpha from the settings as its default, since an
MCP is always a 1-sided test.
Sample Size Rounding
Notice that by default, East displays the integer sample size (events) by rounding
up the actual number computed by the East algorithm. In this case, the
look-by-look sample size is rounded off to the nearest integer. One can also see
the original floating-point sample size by selecting the option “Do not round
sample size/events”.
Under the Group Sequential tab, defaults are set for boundary information. When a
new design is invoked, input fields will contain these specified defaults. We can also
set the option to view the Boundary Crossing Probabilities in the detailed output, as
either Incremental or Cumulative.

Simulation Defaults is where we can change the settings for simulation:

If the checkbox “Save summary statistics for every simulation” is checked, then East
simulations will by default save the per-simulation summary data for all the
simulations in the form of case data.
If the checkbox “Suppress All Intermediate Output” is checked, the intermediate
simulation output window will always be suppressed and you will be directed to the
Output Preview area.
The Chart Settings tab allows defaults to be set for the following quantities on East 6
charts:

We suggest that you do not alter the defaults until you are quite familiar with the
software.

21 Tutorial: Binomial Endpoint

This tutorial introduces you to East on the Architect platform, using an example
clinical trial to test a difference of proportions.

21.1 Fixed Sample Design

When you open East, you will see the screen below.

By default, the Design tab in the ribbon will be active. The items on this tab are
grouped under the following categories of endpoints: Continuous, Discrete, Count,
Survival, and General. Click Discrete: Two Samples, and then Parallel Design:
Difference of Proportions.

The following input window will appear.

By default, the radio button for Sample Size (n) is selected, indicating that it is the
variable to be computed. The default values shown for Type I Error and Power are
0.025 and 0.9; keep these for this design. Since the default inputs provide all of the
necessary information, you are ready to compute the sample size by clicking the
Compute button. The calculated result will appear in the Output Preview pane, as
shown below.

This single row of output contains the relevant details of the inputs and the computed
result: a total sample size (and total completers) of 45. Select this row and save it in
the Library under a workbook by clicking the save icon. Select this node in the
Library, and click the details icon to display a summary of the design details in the
upper pane (known as the Output Summary).

The discussion so far gives you a quick feel for the software for computing sample size
for a single-look design. We will describe further features in an example for a group
sequential design in the next section.

21.2 Group Sequential Design for a Binomial Superiority Trial

21.2.1 Study Background

Design objectives and interim results from CAPTURE, a prospective randomized trial
of placebo versus Abciximab for patients with refractory unstable angina were
presented at a workshop on clinical trial data monitoring committees (Anderson,
2002). The primary endpoint was reduction in death or MI within 30 days of entering
the study. The study was designed for 80% power to detect a reduction in the event rate
from 15% on the placebo arm to 10% on the Abciximab arm. A two-sided test with a
type-1 error of 5% was used. We will illustrate various design, simulation and interim
monitoring features of East for studies with binomial endpoints with the help of this
example.
Let us modify Des1 to enter the above inputs and create a group sequential design for
the CAPTURE trial. Select the node for Des1 in the Library and click the edit icon.
This will take you back to the input window of Des1. Alternatively, you can click the
button at the bottom left of the East screen to go to the latest input window.
Select 2-Sided for Test Type, enter 0.05 for Type I Error and 0.8 for Power, and
specify the Prop. under Control to be 0.15 and the Prop. under Treatment to be 0.1.
Next, change the Number of Looks to 3. You will see a new tab, Boundary Info, added
to the input dialog box.

Click the Boundary Info tab, and you will see the following screen. On this tab, you
can choose whether to specify stopping boundaries for efficacy, or futility, or both. For
this trial, choose efficacy boundaries only, and leave all other default values. We will
implement the Lan-Demets (O’Brien-Fleming) spending function, with equally spaced
looks.

On the Boundary Info tab, click on the chart icons to generate the following charts.
You can also view these boundaries on different scales, such as the δ scale or the
p-value scale. Select the desired scale from the dropdown. Let us see the boundaries
on the δ scale.

Click Compute. This will add another row, for Des2, in the Output Preview area.
The maximum sample size required under this design is 1384. The expected sample
sizes under H0 and H1 are 1378 and 1183, respectively. Click the save icon in the
Output Preview toolbar to save this design to Wbk1 in the Library. Double-click on
Des2 to generate the following output.

21.2.2 Creating multiple designs easily
In East, it is easy to create multiple designs by inputting multiple parameter values. In
the trial described above, suppose we want to generate designs for all combinations of
the following parameter values: Power = 0.8, 0.9, and Difference in Proportions =
−0.04, −0.03, −0.02, −0.01. The number of such combinations is 2 × 4 = 8.
East can create all 8 designs from a single specification in the input dialog box. Select
Des2 and click the edit icon. Enter the above values in the Test Parameters tab as
shown below. The values of Power have been entered as a list of comma-separated
values, while Difference in Proportions has been entered as a colon-separated range
of values: -0.04 to -0.01 in steps of 0.01.

Now click Compute. East computes all 8 designs, Des3-Des10, and displays them in
the Output Preview as shown below. Click the maximize icon to maximize the Output
Preview.

Select Des2 to Des4 using the Ctrl key, and click the summary icon to display a
summary of the design details in the upper pane, known as the Output Summary.

Des2 is already saved in the workbook. We will use this design for simulation and
interim monitoring, as described below. Now that you have saved Des2, delete all
designs from the Output Preview before continuing, by selecting them with the Shift
key and clicking the delete icon in the toolbar.

21.2.3 Simulation

Right-click Des2 in the Library, and select Simulate. Alternatively, you can select
Des2 and click the simulate icon.

We will carry out a simulation of Des2 to check whether it preserves the specified
power. Click Simulate. East will execute, by default, 10,000 simulations with the
specified inputs. Close the intermediate window after examining the results. A row
labeled Sim1 will be added in the Output Preview.

Click the save icon to save this simulation to the Library. A simulation sub-node,
Sim1, will be added under the Des2 node. Double-clicking on this node will display
the detailed simulation output in the work area.

In 80.46% of the simulated trials, the null hypothesis was rejected. This tells us that
the design power of 80% is achieved. Simulation is a tool that can be used to assess
the study design under various scenarios. The next section will explore interim
monitoring with this design.
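The same kind of check can be sketched outside East. The following R fragment is a
minimal Monte Carlo power check, assuming a single-look version of this trial
(two-sided α = 0.05, 80% power, roughly 683 subjects per arm; see Section 23.1.1)
and the unpooled-variance statistic; it ignores the group sequential boundaries, so it is
an illustration rather than a reproduction of East's simulator.

    # Monte Carlo power for the fixed-sample CAPTURE design (not East's code):
    # pi_c = 0.15, pi_t = 0.10, 683 subjects per arm, two-sided alpha = 0.05.
    set.seed(2016)
    n <- 683
    reject <- replicate(10000, {
      xc <- rbinom(1, n, 0.15); xt <- rbinom(1, n, 0.10)
      pc <- xc / n; pt <- xt / n
      z <- (pt - pc) / sqrt(pt * (1 - pt) / n + pc * (1 - pc) / n)
      abs(z) > qnorm(0.975)
    })
    mean(reject)  # close to 0.80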

21.2.4 Interim Monitoring

Right-click Des2 in the Library and select Interim Monitoring. Click the calculator
icon to open the Test Statistic Calculator. Suppose that after 461 subjects, at the first
look, you have observed 34 out of 230 responding on the Control arm and 23 out of
231 responding on the Treatment arm. The calculator computes the difference in
proportions as −0.048 with a standard error of 0.031.

Click OK to update the IM Dashboard.

The Stopping Boundaries and Error Spending Function charts appear on the left, and
the Conditional Power and Confidence Intervals charts appear on the right.

Suppose that after 923 subjects, at the second look, you have observed 69 out of 461
responding on the Control arm and 23 out of 462 responding on the Treatment arm.
The calculator computes the difference in proportions as −0.1 with a standard error of
0.019.

Click Recalc, and then OK to update the IM Dashboard. In this case, a boundary has
been crossed, and the following window appears.

Click Stop to complete the trial. The IM Dashboard will be updated accordingly, and a
table for Final Inference will be displayed as shown below.

22 Binomial Superiority One-Sample

This chapter deals with the design, simulation, and interim monitoring of two types of
tests involving binomial response rates. In Section 22.1, we discuss group sequential
designs in which an observed binomial response rate is compared to a fixed response
rate, possibly derived from historical data. Section 22.2 deals with McNemar’s test for
comparing matched pairs of binomial responses in a group sequential setting.

22.1 Binomial One Sample

In experimental situations where the variable of interest has a binomial distribution, it
may be of interest to determine whether the response rate π differs from a fixed value
π0. Specifically, we wish to test the null hypothesis H0: π = π0 against the two-sided
alternative hypothesis H1: π ≠ π0, or against one-sided alternatives of the form
H1: π > π0 or H1: π < π0. The sample size, or power, is determined for a specified
value of π which is consistent with the alternative hypothesis, denoted π1.

22.1.1 Trial Design

Consider the design of a single-arm oncology trial in which we wish to determine if
the tumor response rate of a new cytotoxic agent is at least 15%. Thus, it is desired to
test the null hypothesis H0: π = 0.15 against the one-sided alternative hypothesis
H1: π > 0.15. We will design this trial to achieve 80% power at π = π1 = 0.25 with a
one-sided, level-0.05 test.

Single-Look Design
To begin, click the Design tab, then Single Sample under the Discrete group, and then
click Single Proportion.

In the ensuing dialog box, choose the test parameters as shown below. We first
consider a single-look design, so leave the default value for Number of Looks at 1. In
the drop-down menu next to Test Type, select 1-Sided. Enter 0.8 for Power. Enter
0.15 in the box next to Prop. Response under Null (π0) and 0.25 in the box next to
Prop. Response under Alt (π1). This dialog box also asks us to specify whether we
wish to standardize the test statistic (for performing the hypothesis test of the null
hypothesis H0: π = 0.15) with the null or the empirical variance. We will discuss the
test statistic and the method of standardization in the next subsection. For the present,
select the default radio button Under Null Hypothesis.

Now click Compute. The design is shown as a row in the Output Preview located in
the lower pane of this window. The sample size required in order to achieve the desired
80% power is 91 subjects.
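This figure can be roughly cross-checked with the usual normal-approximation sample
size formula for a one-sample binomial test; the R sketch below is an approximation
and not East's exact algorithm, so small discrepancies are possible.

    # Approximate fixed-sample size for H0: pi = 0.15 vs H1: pi = 0.25,
    # one-sided alpha = 0.05, power = 0.80, null-variance standardization.
    pi0 <- 0.15; pi1 <- 0.25
    n <- ((qnorm(0.95) * sqrt(pi0 * (1 - pi0)) +
           qnorm(0.80) * sqrt(pi1 * (1 - pi1))) / (pi1 - pi0))^2
    ceiling(n)  # 91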

You can select this design by clicking anywhere on the row in the Output Preview.
Click the summary icon to get the design output summary displayed in the upper pane.
In the Output Preview toolbar, click the save icon to save this design, Des1, to
workbook Wbk1 in the Library. If you hover the cursor over the node Des1 in the
Library, a tooltip will appear that summarizes the input parameters of the design.

With the design Des1 selected in the Library, click the charts icon on the Library
toolbar, and then click Power vs. Treatment Effect (δ). The power curve for this design
will be displayed. You can save this chart to the Library by clicking Save in
Workbook. Alternatively, you can export the chart in one of several image formats
(e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart
before continuing.

Three-Look Design
In order to reach an early decision and enter into comparative trials, let us plan to
conduct this single-arm study as a group sequential trial with a maximum of 3 looks.
Create a new design by selecting Des1 in the Library, and clicking the edit icon on the
Library toolbar. Change the Number of Looks from 1 to 3, to generate a study with
two interim looks and a final analysis. A new tab, Boundary, will appear. Clicking on
this tab will reveal the stopping boundary parameters. By default, the Spacing of
Looks is set to Equal, which means that the interim analyses will be equally spaced in
terms of the number of patients accrued between looks. The left side contains details
for the Efficacy boundary, and the right side for the Futility boundary. By default, an
efficacy boundary (to reject H0) is selected, but no futility boundary (to reject H1).
The Boundary Family specified is of the Spending Functions type. The default
Spending Function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF
(O’Brien-Fleming), which generates boundaries that are very similar, though not
identical, to the classical stopping boundaries of O’Brien and Fleming (1979).
Technical details of these stopping boundaries are available in Appendix F.

Return to the test parameters by clicking the Test Parameters tab. The dialog box
requires us to make a selection in the section labeled Variance of Standardized Test
Statistic. We are being asked to specify to East how we intend to standardize the test
statistic when we actually perform the hypothesis tests at the various monitoring time
points. There are two options: Under Null Hypothesis and Empirical Estimate. To
understand the difference between these two options, let π̂j denote the estimate of π
based on nj observations, up to and including the j-th monitoring time point.

Under Null Hypothesis: The test statistic to be used for the interim monitoring is

$$Z_j^{(N)} = \frac{\hat{\pi}_j - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n_j}}. \tag{22.1}$$

Empirical: The test statistic to be used for the interim monitoring is

$$Z_j^{(E)} = \frac{\hat{\pi}_j - \pi_0}{\sqrt{\hat{\pi}_j (1 - \hat{\pi}_j)/n_j}}. \tag{22.2}$$

The choice of variance should not make much of a difference to the type-1 error or
power for studies in which the sample size is large. In the present case, however, it
might matter. We shall therefore examine both options. First, we select the Under
Null Hypothesis radio button.
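As a quick illustration outside East, both standardizations can be computed directly
from interim data; the R sketch below uses hypothetical first-look data (14 responders
out of 40 subjects, matching the interim example of Section 22.1.3) and is not East's
internal code.

    # Equations (22.1) and (22.2): the two standardizations of the
    # one-sample binomial test statistic at look j.
    z_one_sample <- function(responders, n, pi0) {
      pi_hat <- responders / n
      c(null      = (pi_hat - pi0) / sqrt(pi0 * (1 - pi0) / n),
        empirical = (pi_hat - pi0) / sqrt(pi_hat * (1 - pi_hat) / n))
    }
    z_one_sample(responders = 14, n = 40, pi0 = 0.15)  # empirical: 2.652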
Click the Compute button to generate output for design Des2. With Des2 selected in
the Output Preview, click the save icon to save Des2 to the Library. In order to see the
stopping probabilities, as well as other characteristics, select Des2 in the Library, and
click the details icon. The cumulative boundary stopping probabilities are shown in the
Stopping Boundaries table. We see that for Des2 the maximum sample size is 91
subjects, with 90 expected under the null hypothesis H0: π = 0.15 and 73 expected
when the true value is π = 0.25.

Close the Output window before continuing. The stopping boundary can be displayed
by clicking the charts icon on the Library toolbar, and then clicking Stopping
Boundaries. The following chart will appear.

To examine the error spending function, click the charts icon on the Library toolbar,
and then click Error Spending. The following chart will appear.

To examine the impact of using the empirical variance to standardize the test statistic,
select Des2 in the Library, and click the edit icon on the Library toolbar. In the
Variance of Standardized Test Statistic box, now select Empirical Estimate.

Next, click Compute. With Des3 selected in the Output Preview, click the save icon.
In the Library, select the nodes Des2 and Des3 by holding the Ctrl key, and then click
the summary icon. The upper pane will display the summary details of the two designs
side-by-side:

The maximum sample size needed for 80% power is 119, and the expected sample size
is 99 under the alternative hypothesis H1 with π1 = 0.25, if we intend to standardize
the test statistic with the empirical variance. The corresponding maximum and
expected sample sizes if the null variance is to be used for the standardization are 91
and 73, respectively. Thus, for this configuration of design parameters, it would appear
preferable to specify in advance that the test statistic will be standardized by the null
variance; evidently, this is the option with the smaller maximum and expected sample
sizes. These results, however, are based on the large-sample theory developed in
Appendix B. Since the sample sizes in both Des2 and Des3 are fairly small, it would
be advisable to verify that the power and type-1 error of both plans are preserved by
simulating these designs. We show how to simulate these plans in Section 22.1.2.
In some situations, the sample size is subject to external constraints. Then, the power
can be computed for a specified maximum sample size. Suppose that in the above
situation, using the observed estimates for the computation of the variance, the total
sample size is constrained to be at most 80 subjects. Select Des3 in the Library and
click the edit icon on the Library toolbar. Change the selections in the ensuing dialog
box so that the trial is now designed to compute power for a maximum sample size of
80 subjects, as shown below.

Click the Compute button to generate the output for design Des4. With Des4 selected
in the Output Preview, click the save icon. In the Library, select the nodes for Des2,
Des3, and Des4 by holding the Ctrl key, and then click the summary icon. The upper
pane will display the summary details of the three designs side-by-side.

From this, we can see that Des4 has only 65.5% power.

22.1.2 Trial Simulation

In Section 22.1.1, we created group sequential designs with two different assumptions
about the manner in which the test statistic would be standardized at the interim
monitoring stage. Under Des2, we assumed that the null variance, and hence the test
statistic (22.1), would be used for the interim monitoring. This plan required a
maximum sample size of 91 subjects. Under Des3, we assumed that the empirical
variance, and hence the test statistic (22.2), would be used for the interim monitoring.
This plan required a maximum sample size of 119 subjects. Since the sample sizes for
both plans are fairly small and the calculations involved the use of large-sample
theory, it would be wise to verify the operating characteristics of these two plans by
simulation.
Select Des2 in the Library, and click the simulate icon on the Library toolbar.
Alternatively, right-click on the Des2 node and select Simulate. A new Simulation
worksheet will appear.

Click Simulate to start the simulation. Once the simulation run has completed, East
will add an additional row to the Output Preview labeled Sim1. Select the Sim1 row in
the Output Preview and click the details icon. Note that some of the simulation output
details will be displayed in the upper pane. Click the save icon to save it to the
Library. Double-click on the Sim1 node in the Library. The detailed simulation output
will be displayed.

Upon running 10,000 simulations with π = 0.25, we obtain slightly over 80% power,
as shown above.
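A comparable check can be run outside East for the single-look version of this design;
the R sketch below assumes the fixed-sample design Des1 (n = 91, one-sided
α = 0.05) and the null-variance statistic (22.1), and it does not reproduce the group
sequential boundaries of Des2.

    # Monte Carlo power and type-1 error for the single-look design.
    set.seed(42)
    mc_reject <- function(pi_true, n = 91, pi0 = 0.15, nsim = 10000) {
      x <- rbinom(nsim, n, pi_true)
      z <- (x / n - pi0) / sqrt(pi0 * (1 - pi0) / n)
      mean(z > qnorm(0.95))
    }
    mc_reject(0.25)  # power: close to 0.80
    mc_reject(0.15)  # type-1 error: close to 0.05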
Next, we run 10,000 simulations under H0 by setting π = 0.15 in the choice of
simulation parameters. Select Des2 in the Library, and click the simulate icon on the
Library toolbar. Under the Response Generation tab, change the Proportion Response
to 0.15. Click Simulate to start the simulation. Once the simulation run has completed,
East will add an additional row to the Output Preview labeled Sim2. Select Sim2 in
the Output Preview. Click the save icon to save it to the Library. Double-click on
Sim2 in the Library. The simulation output details will be displayed.

We observe that about 7% of these simulations reject the null hypothesis, thereby
suggesting that these boundaries approximately preserve the type-1 error (up to Monte
Carlo accuracy).
Finally, we repeat the same set of simulations for Des3. Select Des3 in the Library,
and click the simulate icon on the Library toolbar. Upon running 10,000 simulations
with π = 0.25, we obtain 82% power.

However, when we run the simulations under H0: π = 0.15, we obtain a type-1 error
of about 3% instead of the specified 5%, as shown below. While this ensures that the
type-1 error is preserved, it also suggests that the use of the empirical variance rather
than the null variance to standardize the test statistic might be problematic with small
sample sizes.

Let us now investigate whether the problem disappears with larger studies. Select
Des3 in the Library and click the edit icon on the Library toolbar. Change the value of
Prop. Response under Alt (π1) from 0.25 to 0.18.

Click Compute to generate the output for Des5. In the Output Preview, we see that
Des5 requires a sample size of 1035 subjects. To verify whether the use of the
empirical variance will indeed produce the correct type-1 error for this large trial,
select Des5 in the Output Preview and click the save icon. In the Library, select Des5
and click the simulate icon on the Library toolbar. First, run 10,000 trials with
π = 0.15. On the Response Generation tab, change the Proportion Response from
0.18 to 0.15. Next, click Simulate. Observe that the type-1 error obtained by
simulating Des5 is about 4.4%, an improvement over the corresponding type-1 error
obtained by simulating Des3.

Next, verify that a sample size of 1035 suffices for producing 80% power by running
10,000 simulations with π = 0.18.

This example has demonstrated the importance of simulating a design to verify that it
does indeed possess the operating characteristics that are claimed for it. Since these
operating characteristics were derived by large-sample theory, they might not hold for
small sample sizes, in which case the sample size or type-1 error might have to be
adjusted appropriately.

22.1.3 Interim Monitoring

Consider interim monitoring of Des3, the design that has 80% power when the
empirical estimate of variance is used to standardize the test statistic. Select Des3 in
the Library, and click the interim monitoring icon on the Library toolbar.
Alternatively, right-click on Des3 and select Interim Monitoring. The interim
monitoring dashboard contains various controls for monitoring the trial, and is divided
into two sections. The top section contains several columns for displaying output
values based on the interim inputs. The bottom section contains four charts, each with
a corresponding table to its right. These charts provide graphical and numerical
descriptions of the progress of the clinical trial and are useful tools for decision
making by a data monitoring committee.

At the first interim look, when 40 subjects have enrolled, suppose that the observed
response rate is 0.35. Click the calculator icon to invoke the Test Statistic Calculator.
In the box next to Cumulative Sample Size, enter 40. Enter 0.35 in the box next to
Estimate of π. In the box next to Standard Error of Estimate of π, enter 0.07542.
Next, click Recalc.

Observe that upon pressing the Recalc button, the test statistic calculator automatically
computes the value of the test statistic as 2.652.

Clicking OK results in the following output.

Since our test statistic, 2.652, is smaller than the stopping boundary, 3.185, the trial
continues.

At the second interim monitoring time point, after 80 subjects have enrolled, suppose
that the estimate π̂ based on all data up to that point is 0.30. Click on the second row
in the table in the upper section. Then click the calculator icon. In the box next to
Cumulative Sample Size, enter 80. Enter 0.30 in the box next to Estimate of π. In the
box next to Standard Error of Estimate of π, enter 0.05123. Next, click Recalc.
Upon clicking OK, we observe that the stopping boundary is crossed, and the
following message is displayed.

We can conclude that π > 0.15 and terminate the trial. Clicking Stop yields the
following output.

22.2 McNemar’s Test

McNemar’s Test is used in experimental situations where paired comparisons are
observed. In a typical application, two binary response measurements are made on
each subject – perhaps from two different treatments, or from two different time
points. For example, in a comparative clinical trial, subjects are matched on baseline
demographics and disease characteristics and then randomized with one subject in the
pair receiving the experimental treatment and the other subject receiving the control.
Another example is the crossover clinical trial in which each subject receives both
treatments. By random assignment, some subjects receive the experimental treatment
followed by the control, while others receive the control followed by the experimental
treatment. Let πc and πt denote the response probabilities for the control and
experimental treatments, respectively. The probability parameters for McNemar’s test
are displayed in Table 22.1.
Table 22.1: A 2 × 2 Table of Probabilities for McNemar’s Test

                          Experimental
  Control                 No Response    Response    Total Probability
  No Response             π00            π01         1 − πc
  Response                π10            π11         πc
  Total Probability       1 − πt         πt          1

The null hypothesis H0: πc = πt is tested against the alternative hypothesis
H1: πc ≠ πt for the two-sided testing problem, or against the alternative hypothesis
H1: πc > πt (or H1: πc < πt) for the one-sided testing problem. Since πt = πc if and
only if π01 = π10, the null hypothesis can also be expressed as H0: π01 = π10, and is
tested against the corresponding one- and two-sided alternatives. The power of this
test depends on two quantities:
1. The difference between the two discordant probabilities (which is also the
difference between the response rates of the two treatments),
$$\delta = \pi_{01} - \pi_{10} = \pi_t - \pi_c;$$

2. The sum of the two discordant probabilities,
$$\xi = \pi_{10} + \pi_{01}.$$

East accepts these two parameters as inputs at the design stage.
We next specify the test statistic to be used during the interim monitoring stage.
Suppose we intend to execute McNemar’s test a maximum of K times in a group
sequential setting. Let the cumulative data up to and including the j-th interim look
consist of N(j) matched pairs arranged in the form of the following 2 × 2 contingency
table of counts:
Table 22.2: 2 × 2 Contingency Table of Counts of Matched Pairs at Look j

                          Experimental
  Control                 No Response    Response    Total
  No Response             n00(j)         n01(j)      r0(j)
  Response                n10(j)         n11(j)      r1(j)
  Total                   c0(j)          c1(j)       N(j)

For a = 0, 1 and b = 0, 1, define

$$\hat{\pi}_{ab}(j) = \frac{n_{ab}(j)}{N(j)}. \tag{22.3}$$

Then the sequentially computed McNemar test statistic at look j is

$$Z_j = \frac{\hat{\delta}_j}{se(\hat{\delta}_j)} \tag{22.4}$$

where

$$\hat{\delta}_j = \hat{\pi}_{01}(j) - \hat{\pi}_{10}(j) \tag{22.5}$$

and

$$se(\hat{\delta}_j) = \frac{\sqrt{n_{01}(j) + n_{10}(j)}}{N(j)}. \tag{22.6}$$
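As an illustration (this is not East's code), the statistic (22.4) depends only on the
discordant counts and the total number of pairs; a minimal R sketch, evaluated at the
first-look data of Section 22.2.2:

    # McNemar test statistic of equation (22.4) at look j.
    mcnemar_z <- function(n01, n10, N) {
      delta_hat <- (n01 - n10) / N       # equation (22.5)
      se        <- sqrt(n01 + n10) / N   # equation (22.6)
      delta_hat / se
    }
    mcnemar_z(n01 = 4, n10 = 1, N = 32)  # 1.342, as in Section 22.2.2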

We now show how to use East to design and monitor a clinical trial based on
McNemar’s test.

22.2.1 Trial Design

Consider a trial in which we wish to determine whether a transdermal delivery system
(TDS) can be improved with a new adhesive. Subjects are to wear the old TDS
(control) and new TDS (experimental) in the same area of the body for one week each.
A response is said to occur if the TDS remains on for the entire one week observation
period. From historical data, it is known that control has a response rate of 85%
(πc = 0.85). It is hoped that the new adhesive will increase this to 95% (πt = 0.95).
Furthermore, of the 15% of the subjects who did not respond on the control, it is hoped
that 87% will respond on the experimental system. That is, π01 = 0.87 × 0.15 ≈ 0.13.
Based on these data, we can fill in all the entries of Table 22.1, as displayed in
Table 22.3.
Table 22.3: McNemar Probabilities for the TDS Trial

                          Experimental
  Control                 No Response    Response    Total Probability
  No Response             0.02           0.13        0.15
  Response                0.03           0.82        0.85
  Total Probability       0.05           0.95        1
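The two design inputs East will ask for below, δ1 and ξ, follow directly from these
assumptions; a small R sketch of the arithmetic (for illustration only):

    # McNemar design inputs derived from the TDS assumptions above.
    pi_c  <- 0.85; pi_t <- 0.95
    pi_01 <- 0.13                    # 0.87 * (1 - pi_c), rounded as in Table 22.3
    pi_10 <- pi_01 - (pi_t - pi_c)   # 0.03, since pi_01 - pi_10 = pi_t - pi_c
    delta <- pi_01 - pi_10           # 0.10
    xi    <- pi_01 + pi_10           # 0.16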

Although it is expected that the new adhesive will increase the adherence rate, the
comparison is posed as a two-sided testing problem, testing H0: πc = πt against
H1: πc ≠ πt at the 0.05 level. We wish to determine the sample size to have 90%
power for the values displayed in Table 22.3. To design this trial, click the Design tab,
then Single Sample on the Discrete group, and then click McNemar’s Test for
Matched Pairs.

Single-Look Design
First, consider a study with no interim analyses and 90% power for a two-sided test at
α = 0.05. Choose the design parameters as shown below. We first consider a
single-look design, so leave the default value for Number of Looks at 1. Enter 0.9 for
Power. As shown in Table 22.3, we must specify δ1 = πt − πc = 0.1 and
ξ = π01 + π10 = 0.16.

Click Compute. The design Des1 is shown as a row in the Output Preview located in
the lower pane of this window. A total of 158 subjects is required to have 90% power.

You can select this design by clicking anywhere on the row in the Output Preview.
Click the summary icon to get the output summary displayed in the upper pane. In the
Output Preview toolbar, click the save icon to save this design, Des1, to workbook
Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will
appear that summarizes the input parameters of the design.

Five-Look Design
Now consider the same design with a maximum of 5 looks, using the default
Lan-DeMets (O’Brien-Fleming) spending function. Create a new design by selecting
Des1 in the Library, and clicking the edit icon on the Library toolbar. Change the
Number of Looks from 1 to 5, to generate a study with four interim looks and a final
analysis. A new tab, Boundary, will appear. Clicking on this tab will reveal the
stopping boundary parameters. By default, the Spacing of Looks is set to Equal, which
means that the interim analyses will be equally spaced in terms of the number of
patients accrued between looks. The left side contains details for the Efficacy
boundary, and the right side for the Futility boundary. By default, an efficacy
boundary (to reject H0) is selected, but no futility boundary (to reject H1). The
Boundary Family specified is of the Spending Functions type. The default Spending
Function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF
(O’Brien-Fleming), which generates boundaries that are very similar, though not
identical, to the classical stopping boundaries of O’Brien and Fleming (1979).
Technical details of these stopping boundaries are available in Appendix F.

Click Compute to generate output for Des2. With Des2 selected in the Output
Preview, click the save icon to save Des2 to the Library. In the Library, select the
nodes for both Des1 and Des2 by holding the Ctrl key, and then click the summary
icon. The upper pane will display the output summary of the two designs side-by-side:

There has been a slight inflation in the maximum sample size, from 158 to 162.
However, the expected sample size is 120 subjects if the alternative hypothesis of
δ1 = 0.10 and ξ = 0.16 holds. The stopping boundary, spending function, and Power
vs. Sample Size charts can all be displayed by clicking on the appropriate icons from
the Library toolbar.

22.2.2 Interim Monitoring

Consider interim monitoring of Des2. Select Des2 in the Library, and click the interim
monitoring icon on the Library toolbar. Alternatively, right-click on Des2 and select
Interim Monitoring. A new IM worksheet will appear.

Suppose that the results are to be analyzed after results are available for every 32
subjects. After the first 32 subjects were enrolled, one subject responded on the
control arm and did not respond on the treatment arm; four subjects responded on the
treatment arm but did not respond on the control arm; 10 subjects did not respond on
either treatment; and 17 subjects responded on both arms. This information is
sufficient to complete all the entries in Table 22.2 and hence to evaluate the test
statistic value. Click the calculator icon to invoke the Test Statistic Calculator. In the
box next to Cumulative Sample Size, enter 32. Enter the values in the table as shown
below and click Recalc.

Clicking OK results in the following entry in the first look row.

As you can see, the value of the test statistic, 1.342, is within the stopping boundaries
(±4.909). Thus, the trial continues.
The second interim analysis was performed after data were available for 64 subjects.
A total of two subjects responded on the control arm and failed to respond on the
treatment arm; seven subjects responded on the treatment arm and failed to respond on
the control arm; 20 subjects responded on neither arm; and 35 subjects responded on
both arms.
Click on the second row in the table in the upper section. Then click the calculator
icon. Enter the appropriate values in the table as shown below and click Recalc.

Then click OK. This results in the following screen.

At the third interim analysis, after 96 subjects were enrolled, a total of two subjects
responded on the control arm and failed to respond on the treatment arm; 13 subjects
responded on the treatment arm and failed to respond on the control arm; 32 subjects
did not respond on either arm; and 49 subjects responded on both arms.
Click on the third row in the table in the upper section. Then click the calculator icon.
Enter the appropriate values in the table as shown below and click Recalc.

Then click OK. This results in the following message box.

Clicking on Stop yields the following Interim Monitoring output.

We reject the null hypothesis that δ = 0, based on these data.

22.2.3 Simulation

Des2 can be simulated to examine its properties for different values of the parameters.
First, we verify the results under the alternative hypothesis at which the power is to be
controlled, namely δ1 = 0.10 and ξ = 0.16.
Select Des2 in the Library, and click the simulate icon on the Library toolbar.
Alternatively, right-click on Des2 and select Simulate. A new Simulation worksheet
will appear.

Click Simulate to start the simulation. Once the simulation run has completed, East
will add an additional row to the Output Preview labeled Sim1. Select Sim1 in the
Output Preview. If you click the details icon, you will see some of the simulation
output details displayed in the upper pane. Click the save icon to save it to the Library.
Double-click on Sim1 in the Library. The simulation output details will be displayed
as shown below. The results confirm that the power is at about 90%.
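A rough external check is possible with a few lines of R. The sketch below simulates
the single-look version of the design (N = 158 pairs, two-sided α = 0.05) by drawing
the four cell counts of Table 22.3 from a multinomial distribution; since it uses the
fixed-sample critical value rather than East's five-look boundaries, and a normal
approximation, its power estimate will only be in the neighborhood of the 90% design
target.

    # Monte Carlo power for the single-look McNemar design.
    set.seed(7)
    N <- 158
    reject <- replicate(10000, {
      counts <- rmultinom(1, N, c(0.02, 0.13, 0.03, 0.82))  # (n00, n01, n10, n11)
      n01 <- counts[2]; n10 <- counts[3]
      z <- (n01 - n10) / sqrt(n01 + n10)  # same as equation (22.4)
      abs(z) > qnorm(0.975)
    })
    mean(reject)  # near the 90% design target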

To confirm the results under the null hypothesis, set δ1 = 0 in the Response
Generation tab in the simulation worksheet and then click Simulate. The results,
which confirm that the type-1 error rate is approximately 5%, are given below.

While it is often difficult to specify the absolute difference of the discordant
probabilities, δ1, it is even more difficult to specify the sum of the discordant
probabilities, ξ. Simulation can be used to examine the effects of misspecification of
ξ. Run the simulations again, now with δ1 = 0.10 and ξ = 0.2. The results are given
below.

Notice that this provides a power of approximately 81%. Larger values of ξ would
further decrease the power. However, values of ξ > 0.2 with δ1 = 0.1 would be
inconsistent with the initial assumptions of πc = 0.85 and πt = 0.95. Additional
simulations for various values of δ and ξ can provide information regarding the
consequences of misspecification of the input parameters.

23 Binomial Superiority Two-Sample

In experiments based on binomial data, the aim is to compare independent samples
from two populations in terms of the proportion of sampling units presenting a given
trait. In medical research, outcomes such as the proportion of patients responding to a
therapy, developing a certain side effect, or requiring specialized care, would satisfy
this definition. East supports the design, simulation, and interim monitoring of clinical
trials in which this comparison is based on the difference of proportions, the ratio of
proportions, or the odds ratio of the two populations. The three cases are discussed in
the following sections.

23.1 Difference of Two Binomial Proportions

Let πc and πt denote the binomial probabilities for the control and treatment arms,
respectively, and let δ = πt − πc. Interest lies in testing the null hypothesis that δ = 0
against one- and two-sided alternatives. A special characteristic of binomial designs is
the dependence of the variance of a binomial random variable on its mean. Because of
this dependence, even if we keep all other test parameters the same, the maximum
sample size required to achieve a specified power will be affected by how we intend to
standardize the difference of binomial response rates when computing the test statistic
at the interim monitoring stage. There are two options for computing the test statistic:
use either the unpooled or the pooled estimate of variance for standardizing the
observed treatment difference. Suppose, for instance, that at the j-th interim look the
observed response rate on the treatment arm is π̂tj, and the observed response rate on
the control arm is π̂cj. Let ntj and ncj be the numbers of patients on the treatment and
control arms, respectively. Then the test statistic based on the unpooled variance is
$$Z_j^{(u)} = \frac{\hat{\pi}_{tj} - \hat{\pi}_{cj}}{\sqrt{\dfrac{\hat{\pi}_{tj}(1-\hat{\pi}_{tj})}{n_{tj}} + \dfrac{\hat{\pi}_{cj}(1-\hat{\pi}_{cj})}{n_{cj}}}}. \tag{23.1}$$

In contrast, the test statistic based on the pooled variance is

$$Z_j^{(p)} = \frac{\hat{\pi}_{tj} - \hat{\pi}_{cj}}{\sqrt{\hat{\pi}_j(1-\hat{\pi}_j)\left[\dfrac{1}{n_{tj}} + \dfrac{1}{n_{cj}}\right]}}, \tag{23.2}$$

where

$$\hat{\pi}_j = \frac{n_{tj}\hat{\pi}_{tj} + n_{cj}\hat{\pi}_{cj}}{n_{tj} + n_{cj}}. \tag{23.3}$$

It can be shown that $[Z_j^{(p)}]^2$ is the familiar Pearson chi-square statistic
computed from all the data accumulated by the j-th look.
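Both statistics are straightforward to compute from interim counts; the R sketch below
(for illustration, not East's code) evaluates them at the first-look CAPTURE data used
in Section 21.2.4 (34/230 on control, 23/231 on treatment).

    # Unpooled (23.1) and pooled (23.2) statistics for the difference of
    # two binomial proportions at an interim look.
    two_sample_z <- function(xt, nt, xc, nc) {
      pt <- xt / nt; pc <- xc / nc
      pbar <- (xt + xc) / (nt + nc)                       # equation (23.3)
      c(unpooled = (pt - pc) / sqrt(pt * (1 - pt) / nt + pc * (1 - pc) / nc),
        pooled   = (pt - pc) / sqrt(pbar * (1 - pbar) * (1 / nt + 1 / nc)))
    }
    two_sample_z(xt = 23, nt = 231, xc = 34, nc = 230)  # difference of about -0.048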
The maximum sample size required to achieve a given power depends on whether, at
the interim monitoring stage, we intend to use the unpooled statistic (23.1) or the
pooled statistic (23.2) to determine statistical significance. The technical details of the
sample size computations for these two options are given in Appendix B,
Section B.2.5. The CAPTURE clinical trial is designed in Section 23.1.1 and
monitored in Section 23.1.2 under the assumption that the unpooled statistic will be
used for interim monitoring. In Section 23.1.3, however, the same trial is re-designed
on the basis of the pooled variance. It is seen that the difference in sample size due to
the two design assumptions is almost negligible. This is because the CAPTURE trial
utilized balanced randomization. We show further in Section 23.1.3 that if the
randomization is unbalanced, the difference in sample size based on the two design
assumptions can be substantial.

23.1.1 Trial Design

Design objectives and interim results from CAPTURE, a prospective randomized trial
of placebo versus Abciximab for patients with refractory unstable angina were
presented at a workshop on clinical trial data monitoring committees (Anderson,
2002). The primary endpoint was reduction in death or MI within 30 days of entering
the study. The study was designed for 80% power to detect a reduction in the event rate
from 15% on the placebo arm to 10% on the Abciximab arm. A two-sided test with a
type-1 error of 5% was used. We will illustrate various design and interim monitoring
features of East for studies with binomial endpoints with the help of this example.
This example can thereby serve as a model for designing and monitoring your own
binomial studies.

Single-Look Design
To begin, click the Design tab, then Two Samples on the Discrete group, and then
click Difference of Proportions.
The goal of this study is to test the null hypothesis, H0, that the Abciximab and
placebo arms both have an event rate of 15%, versus the alternative hypothesis, H1,
that Abciximab reduces the event rate by 5%, from 15% to 10%. It is desired to have a
two-sided test with three looks at the data, a type-1 error of α = 0.05, and a power of
(1 − β) = 0.8.

Choose the test parameters as shown below. We first consider a single-look design, so
leave the default value for Number of Looks at 1. Enter 0.8 for the Power. To specify
the appropriate effect size, enter 0.15 for the Prop. Under Control and 0.10 for the
Prop. Under Treatment. Notice that you have the option to select the manner in
which the test statistic will be standardized at the hypothesis testing stage. If you
choose Unpooled Estimate, the standardization will be according to equation (23.1).
If you choose Pooled Estimate, the standardization will be according to equation
(23.2). For the present, choose the Unpooled Estimate option. The other choice in this
dialog box is whether or not to use the Casagrande-Pike-Smith (1978) correction for
small sample sizes. This is not usually necessary as can be verified by the simulation
options in East. The dialog box containing the test parameters will now look as shown
below.

Next, click the Compute button. The design is shown as a row in the Output Preview
located in the lower pane of this window. The computed sample size (1366 subjects) is
highlighted in yellow.
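This total can be roughly reproduced with the standard normal-approximation formula
for the unpooled-variance test; the R sketch below is an approximation, not East's
exact algorithm.

    # Per-arm sample size for pi_c = 0.15 vs pi_t = 0.10, two-sided
    # alpha = 0.05, power = 0.80, unpooled-variance statistic.
    pi_c <- 0.15; pi_t <- 0.10
    n_arm <- (qnorm(0.975) + qnorm(0.80))^2 *
             (pi_c * (1 - pi_c) + pi_t * (1 - pi_t)) / (pi_t - pi_c)^2
    2 * ceiling(n_arm)  # 1366 in total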

You can select this design, Des1, by clicking anywhere on the row in the Output
Preview. Now you can click the summary icon to see the output summary displayed
in the upper pane. In the Output Preview toolbar, click the save icon to save this
design, Des1, to Workbook Wbk1 in the Library. If you hover the cursor over Des1 in
the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the charts icon on the Library toolbar, and
then click Power vs Treatment Effect (δ). The resulting power curve for this design is
shown. You can save this chart to the Library by clicking Keep. Alternatively, you can
export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking
Save As.... For now, you may close the chart before continuing.

Group Sequential Design
Create a new design by selecting Des1 in the Library, and clicking the edit icon on the
Library toolbar. Change the Number of Looks from
1 to 3, to generate a study with two interim looks and a final analysis. A new tab
Boundary will appear. Clicking on this tab will reveal the stopping boundary
parameters. By default, the Spacing of Looks is set to Equal, which means that the
interim analyses will be equally spaced in terms of the number of patients accrued
between looks. The left side contains details for the Efficacy boundary, and the right
side for the Futility boundary. By default, there is an efficacy boundary (to reject H0)
selected, but no futility boundary (to reject H1). The Boundary Family specified is of
the Spending Functions type. The default Spending function is the Lan-DeMets
(Lan & DeMets, 1983), with Parameter OF (O’Brien-Fleming), which generates
boundaries that are very similar, though not identical, to the classical stopping
boundaries of O’Brien and Fleming (1979). Technical details of these stopping
boundaries are available in Appendix F.
Click the Boundary tab to see the details of the cumulative alpha spent and the
boundary values in the Look Details table.

Click Compute to generate output for a new design, Des2. The 3-look group
sequential design displayed in Des2 requires an upfront commitment of up to a
maximum of 1384 patients, 18 patients more than the fixed sample design displayed in
Des1. Notice, however, that under the alternative hypothesis of a 5% drop in the event
rate, the expected sample size is only 1183 patients, a saving of 201 patients relative to
the fixed sample design. This is because the test statistic could cross a stopping
boundary at one of the interim looks.
With Des2 selected in the Output Preview, click the save icon to save Des2 to the
Library. In order to see the stopping probabilities, as well as other characteristics,
select Des2 in the Library, and click the details icon. The cumulative boundary
stopping probabilities are shown in the Stopping Boundaries table.

Close the Output window before continuing. The stopping boundary chart can be
brought up by clicking the charts icon on the Library toolbar, and then clicking
Stopping Boundaries. The following chart will appear.

Lan-DeMets Spending Function: O’Brien-Fleming Version
Close this chart, click the charts icon on the Library toolbar, and then click Error
Spending. The following chart will appear.

This spending function was proposed by Lan and DeMets (1983), and for two-sided
tests has the following functional form:

$$\alpha(t) = 4 - 4\,\Phi\!\left(\frac{z_{\alpha/4}}{\sqrt{t}}\right). \tag{23.4}$$

Notice that hardly any type-1 error is spent in the early stages of the trial but the rate of
error spending increases rapidly as the trial progresses. This is reflected in the
corresponding stopping boundaries. The upper and lower boundary values are rather
wide apart initially (±3.712 standard deviations) but come closer together with each
succeeding interim look until at the last look the standardized test statistic crosses the
boundary at ±1.993 standard deviations. This is not too far off from the corresponding
boundary values, ±1.96, required to declare statistical significance at the 0.05 level for
a fixed sample design. For this reason this spending function is often adopted in
preference to other spending functions that spend the type-1 error more aggressively
and thereby reduce the expected sample size under H1 by a greater amount.
Lan-DeMets Spending Function: Pocock Version
A more aggressive spending function, also proposed by Lan and DeMets (1983), is
PK, which refers to Pocock. This spending function captures the spirit of the Pocock
(1977) stopping boundary belonging to the Wang and Tsiatis (1987) power family, and
has the functional form

$$\alpha(t) = \alpha \log\left(1 + (e - 1)t\right). \tag{23.5}$$
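Both spending functions are easy to evaluate directly. The R sketch below computes
the cumulative two-sided type-1 error spent by equations (23.4) and (23.5) at the three
equally spaced information fractions of this design; note that the PK value at the first
look (about 0.023) matches the two-sided p-value implied by the ±2.28 first-look
boundary quoted below.

    # Cumulative alpha spent at information fraction t.
    alpha_of <- function(t, alpha = 0.05)   # Lan-DeMets O'Brien-Fleming (23.4)
      4 - 4 * pnorm(qnorm(1 - alpha / 4) / sqrt(t))
    alpha_pk <- function(t, alpha = 0.05)   # Lan-DeMets Pocock (23.5)
      alpha * log(1 + (exp(1) - 1) * t)
    t <- c(1/3, 2/3, 1)
    round(rbind(OF = alpha_of(t), PK = alpha_pk(t)), 4)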

Select Des2 in the Library, and click the edit icon on the Library toolbar. On the
Boundary tab, change the Parameter from OF to PK, and click Compute to create
design Des3. With Des3 selected in the Output Preview, click the save icon. In the
Library, select the nodes for both Des2 and Des3 by holding the Ctrl key, and then
click the summary icon. The upper pane will display the details of the two designs
side-by-side:

Under Des3, you must make an up-front commitment of up to 1599 patients,
considerably more than you would need for a fixed sample design. However, because
the type-1 error is spent more aggressively in the early stages, the expected sample size
is only 1119 patients.
For now, close this output window, and click the charts icon on the Library toolbar to
compare the two designs according to Power vs. Sample Size.

Using the same icon, select Stopping Boundaries. Notice, by moving the cursor
from right to left in the stopping boundary charts, that the stopping boundary derived
from the PK spending function is approximately flat, requiring ±2.28 standard
deviations at the first look, ±2.29 standard deviations at the second look, and ±2.30 at
the third look. In contrast, the stopping boundary derived from the OF spending
function requires ±3.71 standard deviations at the first look, ±2.51 standard
deviations at the second look, and ±1.99 standard deviations at the third look. This
translates into a smaller expected sample size under H1 for Des3 than for Des2. This
advantage is, however, offset by at least two drawbacks of the stopping boundary
derived from the PK spending function: the large up-front commitment of 1599
patients, and the large standardized test statistic of 2.295 (corresponding to a two-sided
p-value of 0.0217) required at the last look in order to declare statistical significance.

Using the same icon, select Error Spending to compare the two designs
graphically in terms of error spending functions. Des3 (PK) spends the type-1 error
probability at a much faster rate than Des2 (OF). Close the chart before continuing.

Wang and Tsiatis Power Boundaries
The stopping boundaries generated by the Lan-DeMets OF and PK functions closely
resemble the classical O’Brien-Fleming and Pocock stopping boundaries, respectively.
These classical boundaries are a special case of a family of power boundaries
proposed by Wang and Tsiatis (1987). For a two-sided α-level test, using K equally
spaced looks, the power boundaries for the standardized test statistic Zj at the j-th
look are of the form

$$Z_j \ge \frac{C(\Delta, \alpha, K)}{(j/K)^{0.5 - \Delta}}. \tag{23.6}$$

The normalizing constant C(∆, α, K) is evaluated by recursive integration so as to
ensure that the K-look group sequential test has type-1 error equal to α.
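Once C(∆, α, K) is known, the boundary shape is immediate; a minimal R sketch,
using C ≈ 2.004 (the classical O'Brien-Fleming constant for K = 3 and two-sided
α = 0.05, which is an assumption here rather than a value quoted in this manual):

    # Wang-Tsiatis power boundary (23.6) at looks j = 1, ..., K.
    wt_boundary <- function(C, Delta, K) C / ((1:K) / K)^(0.5 - Delta)
    wt_boundary(C = 2.004, Delta = 0, K = 3)  # approx. 3.47, 2.45, 2.00

For ∆ = 0.5 the exponent vanishes and the boundary is flat across looks
(Pocock-type), with its own normalizing constant.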
Select Des3 in the Library and click the edit icon on the Library toolbar. On the
Boundary tab, change the Boundary Family from Spending Functions to
Wang-Tsiatis. Leave the default value of ∆ as 0 and click Compute to create design
Des4.

With Des4 selected in the Output Preview, click the save icon. In the Library, select
both Des2 and Des4 by holding the Ctrl key. Click the charts icon, and under Select
Chart on the right, select Stopping Boundaries. As expected, the boundary values for
Des2 (Lan-DeMets, OF) and Des4 (Wang-Tsiatis, ∆ = 0) are very similar.

Close the chart before continuing.

The Power Chart and the ASN Chart
East provides some additional tools for evaluating study designs. Select Des3 in the
Library, click the icon, and then click Power vs. Treatment effect (δ). By
scrolling from left to right with the vertical line cursor, one can observe the power for
various values of the effect size.

Close this chart, and with Des3 selected, click the icon again. Then click
Expected Sample Size. The following chart appears:

This chart displays the Expected Sample Size as a function of the effect size and
confirms that for Des3 the average sample size is 1566 under H0 (effect size, zero) and
1120 under H1 (effect size, -0.05).
Unequally spaced analysis time points
In the above designs, we have assumed that analyses were equally spaced. This
assumption can be relaxed if you know when interim analyses are likely to be
performed (e.g. for administrative reasons). In either case, departures from this
assumption are allowed during the actual interim monitoring of the study, but sample
size requirements will be more accurate if allowance is made for this knowledge.
With Des3 selected in the Library, click the icon. Under Spacing of Looks in
the Boundary tab, click the Unequal radio button. The column titled Info. Fraction
in the Look Details table can be edited to modify the relative spacing of the analyses.
The information fraction refers to the proportion of the maximum (as yet unknown)
sample size. By default, this table displays equal spacing, but suppose that the two
interim analyses will be performed with 0.25 and 0.5 (instead of 0.333 and 0.667) of
the maximum sample size. Enter these new information fraction values and click
Compute to create design Des5. Select Des5 in the Output Preview and click the
icon to save it in the Library for now.
Arbitrary amounts of error probability to be spent at each analysis
Another feature of East is the ability to specify arbitrary amounts of cumulative
error probability to be spent at each look. This option can be combined with the option
of unequal spacing of the analyses. With Des5 selected in the Library, click the
icon on the Library toolbar. Under the Boundary tab, select Interpolated for the
Spending Function. In the column titled Cum. α Spent, enter 0.005 for the first look
and 0.03 for the second look, and click Compute to create design Des6.

Select Des6 in the Output Preview and click the icon. From the Library, select
Des5 and Des6 by holding the Ctrl key. Click the icon, and under Select Chart on
the right, select Stopping Boundaries. The following chart will be displayed.

Computing power for a given sample size
When sample size is a given design constraint, East can compute the achieved power,
given the other test parameters. Select Des6 in the Library and click the icon. On
the Test Parameters tab, click the radio button for Power (1 − β). You will notice that
the field for power will contain the word Computed. You may now enter a value for
the sample size, 1250, and click Compute.

The following output will appear in the Des7 row of the Output Preview where, as
expected, the achieved power is less than 0.9, namely 0.714.
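For intuition, the single-look analogue of this computation can be done by hand with the normal approximation. The sketch below is only a fixed-sample approximation — East's 0.714 accounts for the multi-look boundaries of Des6 — but it shows why 1250 patients fall short of 90% power:

    from math import sqrt
    from scipy.stats import norm

    pi_c, pi_t, alpha = 0.15, 0.10, 0.05
    n = 1250 / 2                              # equal allocation per arm
    se = sqrt(pi_c * (1 - pi_c) / n + pi_t * (1 - pi_t) / n)
    z_crit = norm.ppf(1 - alpha / 2)          # two-sided test
    power = norm.cdf(abs(pi_t - pi_c) / se - z_crit)
    print(f"single-look approximation: {power:.3f}")   # about 0.76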

To delete this design, click Des7 in the Output Preview, and click the icon in the
Output Preview toolbar. East will display a warning to make sure that you want to
delete the selected row. Click Yes to continue.

Stopping Boundaries for Early Rejection of H0 or H1
Although both Des2 and
Des3 reduce the expected sample size substantially by rejecting H0 when H1 is true,
they are unable to do so if H0 is true. It is, however, often desirable to terminate a
study early if H0 is true since that would imply that the new treatment is no different
than the standard treatment. East can produce stopping boundaries that result in early
than the standard treatment. East can produce stopping boundaries that result in early
termination under either H0 or H1. Stopping boundaries for early termination if
H1 is true are known as efficacy boundaries. They are obtained by choosing
an appropriate α-spending function. These boundaries ensure that the type 1 error does
not exceed the pre-specified significance level α. East can also construct stopping
boundaries for rejecting H1 and terminating early if H0 is true. These stopping
boundaries are known as futility boundaries. They are obtained by choosing
an appropriate β spending function. These boundaries ensure that the type 2 error does
not exceed β and thereby ensure that the power of the study is preserved at 1 − β
despite the possibility of early termination for futility. Pampallona and Tsiatis (1994)
have extended the error spending function methodology of Lan and DeMets (1983) so
as to spend both α, the type-1 error, and β, the type-2 error, and thereby obtain efficacy
and futility boundaries simultaneously. East provides you with an entire catalog of
published spending functions from which you can take your pick for generating both
the H0 and H1 boundaries.
For various reasons, investigators usually prefer to be very conservative about early
stopping for efficacy but are likely to be more aggressive about cutting their losses and
stopping early for futility. Suppose then that you wish to use the conservative
Lan-DeMets (OF) spending function for early termination to reject H0 in favor of H1 ,
but use a more aggressive spending function for early termination to reject H1 in favor
of H0 . Possible choices for spending functions to reject H1 that are more aggressive
than Lan-DeMets(OF) but not as aggressive as Lan-DeMets(PK) are members of the
Rho family (Jennison and Turnbull, 2000) and the Gamma family (Hwang, Shih and
DeCani, 1990). For illustrative purposes we will use the Gamma(−1) spending
function from the Gamma family.
Select Des2 in the Library and click the icon. For the futility boundary on the
Boundary tab, select Spending Functions and then select Gamma Family.
Set the Parameter to −1. Also, click on the Binding option to the right. The screen
will look like this:

On the Boundary tab, you may click the corresponding icons to view plots of the
error spending functions or the stopping boundaries, respectively.

Observe that the β-spending function (upper, in red) spends the type-2 error
substantially faster than the α-spending function (lower, in blue).

These stopping boundaries are known as inner-wedge stopping boundaries. They
divide the sample space into three zones corresponding to three possible decisions. If
the test statistic enters the lower blue zone, we terminate the trial, reject H0 , and
conclude that the new treatment (Abciximab) is beneficial relative to the placebo. If
the test statistic enters the upper blue zone, we terminate the trial, reject H0 , and
conclude that the new treatment is harmful relative to the placebo. If the test statistic
enters the center (pink) zone, we terminate the trial, reject H1 , and conclude that
Abciximab offers no benefit relative to placebo. Assuming that the event rate is 0.15
for the placebo arm, this strategy has a 2.5% chance of declaring benefit and a 2.5%
chance of declaring harm when the event rate for the Abciximab arm is also 0.15.
Furthermore, this strategy has a 20% chance of entering the pink zone and declaring no
benefit when there actually is a substantial benefit with Abciximab, resulting in a drop
in the event rate from 0.15 to 0.1. In other words, Des7 has a two-sided type-1 error of
5% and 80% power.
Click Compute and, with Des7 selected in the Output Preview, click the icon.

To view the design details, click the icon. Des7 requires an up-front
commitment of 1468 patients, but the expected sample size is 1028 patients under H0,
and 1164 patients under H1. You may wish to save this output (e.g., in HTML format)
by clicking on the icon, or to print by clicking on the icon. Close the
output window before continuing.
Boundaries with Early Stopping for Benefit or Futility
Next suppose you are interested in designing the clinical trial in such a way that you
can reach only two conclusions, not three. You wish to demonstrate either that
Abciximab is beneficial relative to placebo or that it offers no benefit relative to
placebo, but there is no interest in demonstrating that Abciximab is harmful relative to
placebo. To design this two-decision trial, select Des7 in the Library and click the
icon. Change the entry in the Test Type cell from 2-Sided to 1-Sided. Check to
ensure that the other specifications are the same as in Des7. Click Compute to
generate the design.

The error spending functions are the same, but this time the stopping boundaries divide
the sample space into only two zones, as shown below.

If the test statistic enters the lower (blue) zone, the null hypothesis is rejected in favor
of concluding that Abciximab is beneficial relative to placebo. The probability of this
event under H0 is 0.05. If the test statistic enters the upper (pink) zone the alternative
hypothesis is rejected in favor of concluding that Abciximab offers no benefit relative
to placebo. The probability of this event under H1 is 0.2. In other words, Des8 has
a one-sided type-1 error rate of 5% and 80% power. Since Des8 precludes the
possibility of demonstrating that Abciximab is harmful relative to placebo, it requires
far fewer patients: an up-front commitment of only 1156 patients, with an expected
sample size of 681 if H0 is true and 892 if H1 is true.
Before continuing to the next section, we will save the current workbook and open a
new one. Select the workbook node in the Library, click the button in the top
left-hand corner, and click Save. Alternatively, select Workbook1 in the Library,
right-click, and then click Save. This saves all the work done so far to your directory.
Next, click the button, click New, and then Workbook. A new workbook,
Wbk2, should appear in the Library. Next, close the window to clear all designs from
the Output Preview.

Multiple designs for discrete outcomes
East allows the user to easily create
multiple designs by specifying a range of values for certain parameters in the design
window. In studies with discrete outcomes, East supports the input of multiple key
parameters at once to simultaneously create a number of different designs. For
example, suppose in a multi-look study the user wants to generate designs for all
combinations of the following parameter values in a two sample Difference of
Proportions test: Power = 0.8 and 0.9, and Alternative Hypothesis - Prop. under
Treatment = 0.4, 0.5 and 0.6. The number of combinations is 2 x 3 = 6. East creates
all combinations from a single specification under the Test Parameters tab in the
design window. As shown below, the values for Power are entered as a
comma-separated list, while the Prop. under Treatment values for the alternative
hypothesis are entered as a colon-separated range, 0.4 to 0.6 in steps of 0.1.
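The expansion East performs is a simple Cartesian product of the entered values; a minimal sketch of the same enumeration:

    from itertools import product

    powers = [0.8, 0.9]             # the comma-separated list
    props = [0.4, 0.5, 0.6]         # the range 0.4 : 0.6 in steps of 0.1
    for i, (power, prop) in enumerate(product(powers, props), start=1):
        print(f"design {i}: power = {power}, prop. under treatment = {prop}")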

East computes all 6 designs and displays them in the Output Preview window:

East provides the capability to analyze multiple designs in ways that make comparisons
between the designs visually simple and efficient. To illustrate this, a selection of a few
of the above designs can be viewed simultaneously in both the Output Summary
section as well as in the various tables and plots. The following is a subset of the
designs computed from the above example, with differing values for number of looks,
power, and proportion under treatment. Designs are displayed side by side, allowing
details to be easily compared. Save these designs in the newly created workbook.

In addition, East allows multiple designs to be viewed simultaneously, either
graphically or in tabular format: Stopping Boundaries (table)

Error Spending (table)

Stopping Boundaries (plot)

Power vs. Treatment Effect (plot)

This capability allows the user to explore a greater space of possibilities when
determining the best choice of study design.
Select individual looks
With Des8 selected in Wbk1, click the icon. In the Spacing of Looks table of the
Boundary tab, notice that there are ticked checkboxes under the columns Stop for
Efficacy and Stop for Futility. East gives you the flexibility to remove one of the
stopping boundaries at certain looks, subject to the following constraints: (1) both
boundaries must be included at the final two looks; (2) at least one boundary, either
efficacy or futility, must be present at each look; (3) once a boundary has been selected,
all subsequent looks must include this boundary as well; and (4) the efficacy boundary
cannot be absent at the penultimate look.
Untick the checkbox in the first look under the Stop for Futility column.

Click Recalc, and click the icon to view the new boundaries. Notice that the futility
boundary does not begin until the second look.

Simulation Tool
Let us verify the operating characteristics of Des8 from Wbk1 through simulations.
Select Des8 in the Library, and click the icon on the Library toolbar. Alternatively,
right-click on Des8 and select Simulate. A new Simulation worksheet will appear.
Let us first verify, by running 10,000 simulated clinical trials, that the type-1 error is
indeed 5%. That is, we must verify that if the event rate for both the placebo and
treatment (Abciximab) arms is 0.15, only about 500 of these simulations will reject
H0. Click on the Response Generation tab, and change the entry in the cell labeled
Prop. Under Treatment from 0.1 to 0.15.

Next, click Simulate to start the simulation. Once the simulation run has completed,
East will add an additional row to the Output Preview labeled Sim1.
Select Sim1 in the Output Preview, and click the icon to save it to the Library.
Double-click on Sim1 in the Library. The simulation output details will be displayed.
In the Details output, notice that 487 of the 10,000 simulations rejected H0. (This
number might vary, depending on the starting seed used for the simulations.) This
confirms that the type-1 error is preserved (up to Monte Carlo accuracy) by these
stopping boundaries.
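The same kind of check can be scripted outside East. The sketch below simulates only the fixed-sample analogue of this trial (it does not reproduce Des8's group sequential boundaries, and the per-arm size of 538 is an illustrative value from the standard unpooled formula), confirming that the one-sided unpooled test has roughly 5% size and 80% power at the design alternative:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    alpha, n_sims = 0.05, 10_000
    n = 538                       # per arm; illustrative fixed-sample size
    z_crit = norm.ppf(1 - alpha)  # one-sided test, as in Des8

    def rejection_rate(pi_c, pi_t):
        pc = rng.binomial(n, pi_c, n_sims) / n
        pt = rng.binomial(n, pi_t, n_sims) / n
        se = np.sqrt(pc * (1 - pc) / n + pt * (1 - pt) / n)
        z = (pt - pc) / se                    # negative values favor treatment
        return np.mean(z <= -z_crit)

    print("type-1 error:", rejection_rate(0.15, 0.15))   # near 0.05
    print("power:      ", rejection_rate(0.15, 0.10))    # near 0.80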

Next, run 10,000 simulations under the alternative hypothesis H1 that the event rate for
placebo is 0.15 but the event rate for Abciximab is 0.1. Right-click Sim1 in the Library
and click Edit Simulation. In the Response Generation tab, enter 0.10 for Prop.
Under Treatment. Leave all other values as they are, and click Simulate to create
output Sim2. Select Sim2 in the Output Preview and save it to Workbook Wbk1. In
the Overall Simulation Result table, notice that the lower efficacy stopping boundary
was crossed in 7996 out of 10000 simulated trials, which is consistent with 80% power
(up to Monte Carlo accuracy) for the original design. Moreover, 393 of these
simulations were able to reject the null hypothesis at the very first look. Feel free to
experiment further with other simulation options before continuing.

23.1.2 Interim Monitoring

The spending functions discussed above were for illustrative purposes only. They were
not used in the actual CAPTURE trial. Instead, the investigators created their own
spending function which is closely approximated by the Gamma spending function of
Hwang, Shih and DeCani (1990) with parameter −4.5. The investigators then used this
spending function to generate two-sided boundaries for early stopping only to reject
H0 . Moreover since it was felt that the trial would enroll patients rapidly, the study
was designed for three unequally spaced looks; one interim analysis after 25%
enrollment, a second interim analysis after 50% enrollment, and a final analysis after
all the patients had enrolled.
To design this trial, select Des2 in the Library and click the icon. In the Boundary
tab, in the Efficacy box, set Spending Function to Gamma Family and change the
Parameter (γ) to −4.5. In the Futility box, make sure Boundary Family is set to
None. Click the radio button for Unequal in the Spacing of Looks box. In the Looks
Details table, change the Info. Fraction to 0.25 and 0.50 for Looks 1 and 2,
respectively.

Click Compute. In the Output Preview toolbar, click the icon to save this design
to Wbk1 in the Library. Select Des9 in the Library, and click the icon on the
Library toolbar. Alternatively, right-click on Des9 and select Interim Monitoring.
The interim monitoring dashboard contains various controls for monitoring
the trial, and is divided into two sections. The top section contains several columns for
displaying output values based on the interim inputs. The bottom section contains four
charts, each with a corresponding table to its right. These charts provide graphical and
numerical descriptions of the progress of the clinical trial and are useful tools for
decision making by a data monitoring committee.
Click on the icon to invoke the Test Statistic Calculator. The
first interim look was taken after accruing a total of 350 patients, 175 per treatment
arm. There were 30 events on the placebo arm and 14 on the Abciximab arm. Based
on these data, the event rate for placebo is 30/175 = 0.17143 and the event rate for
Abciximab is 14/175 = 0.08. Hence the estimate of
δ̂ = 0.08 − 0.17143 = −0.09143. The unpooled estimate of the SE of δ̂ is
$$\sqrt{\frac{(14/175)(161/175)}{175} + \frac{(30/175)(145/175)}{175}} = 0.035103. \qquad (23.7)$$
So the value of the test statistic is
$$\frac{\hat{\delta}}{SE} = \frac{-0.09143}{0.035103} = -2.60457. \qquad (23.8)$$

We will specify these values of δ̂ and SE in the test statistic calculator.
The test statistic calculator will then compute the test statistic value and post it into the
interim monitoring sheet. This process will ensure that the RCI and final adjusted
estimates will be computed using the estimates of δ and SE obtained from the observed
data.
Click on the Estimate of δ and Std. Error of δ radio button. Type in
(14/175) − (30/175) for Estimate of δ. The Estimate of δ is computed as
−0.091429. We can then enter the expression given by (23.7) for the Std. Error of
Estimate of δ. Click on Recalc to get the Test Statistic value, then OK to continue.

The top panel of the interim monitoring worksheet displays upper and lower stopping
boundaries and upper and lower 95% repeated confidence intervals. The lower
stopping boundary for rejecting H0 is -3.239. Since the current value of the test
statistic is −2.605, the trial continues. The repeated confidence interval is
(−0.205, 0.022). We thus conclude, with 95% confidence, that the Abciximab arm is
unlikely to increase the event rate by more than 2.2% relative to placebo and might
actually reduce the event rate by as much as 20.5%.
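These computations are easy to verify independently; a minimal sketch, taking the boundary value 3.239 from the output above:

    from math import sqrt

    n_c = n_t = 175
    x_c, x_t = 30, 14                  # events on placebo and Abciximab
    p_c, p_t = x_c / n_c, x_t / n_t
    delta_hat = p_t - p_c              # -0.091429
    se = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)   # 0.035103
    z = delta_hat / se                 # -2.60457, as in equation (23.8)
    c1 = 3.239                         # first-look stopping boundary
    rci = (delta_hat - c1 * se, delta_hat + c1 * se)
    print(f"z = {z:.5f}, 95% RCI = ({rci[0]:.3f}, {rci[1]:.3f})")  # (-0.205, 0.022)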

Now click on the second row in the table in the upper section. Then click the
icon. A second interim look was taken after accruing a total of
700 patients, 353 on placebo and 347 on Abciximab. By this time point there were a
total of 55 events on the placebo arm and 37 events on the Abciximab arm.
Based on these data, the event rate for placebo is 55/353 = 0.15581 and the event rate
for Abciximab is 37/347 = 0.10663. Hence the estimate of
δ̂ = 0.10663 − 0.15581 = −0.04918. The unpooled estimate of the SE of δ̂ is
$$\sqrt{\frac{(37/347)(310/347)}{347} + \frac{(55/353)(298/353)}{353}} = 0.02544. \qquad (23.9)$$
So the value of the test statistic is
$$\frac{\hat{\delta}}{SE} = \frac{-0.04918}{0.02544} = -1.9332. \qquad (23.10)$$

We will now enter the above values of δ̂ and SE in the test statistic calculator for
posting the test statistic value into the interim monitoring sheet. Enter the appropriate
values for Cumulative SS and Cumulative Response. Click the Recalc button. The
calculator updates the fields: total sample size, δ, and SE.

The updated sheet is displayed below.

At this interim look, the stopping boundary for early rejection of H0 is ±2.868 and the
95% repeated confidence interval is still unable to exclude a difference of zero for the
two event rates. Thus the study continues. The Stopping Boundaries chart of the
dashboard displays the path traced out by the test statistic in relation to the upper and
lower stopping boundaries at the first two interim looks. To expand this chart to full
size, click the icon located at the top right of the chart.

This full-sized chart displays stopping boundaries that have been recomputed on the
basis of the error spent at each look, as shown on the Error Spending chart located at
the bottom left of the dashboard. To display this full-sized chart, close the current chart

and click the icon on the Error Spending chart.

By moving the vertical cursor from left to right on this chart we observe that 0.0012 of
the total error was spent by the first interim look and 0.005 of it was spent by the
second interim look. Close this chart before continuing.
Although this study was designed for two interim looks and one final look, the data
monitoring committee decided to take a third unplanned look after accruing 1050
patients, 532 on placebo and 518 on Abciximab. The error spending function
methodology permits this flexibility. Both the timing and number of interim looks may
be modified from what was proposed at the design stage. East will recompute the new
stopping boundaries on the basis of the error actually spent at each look rather than the
error that was proposed to be spent. There were 84 events on the placebo arm and 55
events on the Abciximab arm.
Hence the estimate of δ is 0.1062 − 0.1579 = −0.05171, the unpooled estimate of
the SE of δ̂ is 0.02081, and the value of the test statistic is −2.4849. Click the third row
of the table in the top portion and then click the icon. Upon entering
this summary information, through the test statistic calculator, into the interim
monitoring sheet, we observe that the stopping boundary is crossed.

Press the Stop button and observe the results in the interim monitoring worksheet.

The 95% repeated confidence interval is (−0.103, −0.011); it excludes 0, thus
confirming that the null hypothesis should be rejected. Once the study is terminated,
East computes a final p-value, confidence interval, and median unbiased point estimate,
all adjusted for the multiple looks, using a stagewise ordering of the sample space as
proposed by Tsiatis, Rosner and Mehta (1984). The adjusted p-value is 0.016. The
adjusted confidence interval for the difference in event rates is (−0.092, −0.010) and
the median unbiased estimate of the difference in event rates is −0.051. In general, the
adjusted confidence interval produced at the end of the study is narrower than the final
repeated confidence interval, although both intervals provide valid coverage of the
unknown effect size.

23.1.3 Pooled versus Unpooled Designs

The manner in which the data will be analyzed at the interim monitoring stage should
be reflected in the study design. We stated at the beginning of this chapter that the test
statistic used to track the progress of a binomial endpoint study could be computed by
using either the unpooled variance or the pooled variance to standardize the difference
of binomial proportions. The design of the CAPTURE trial in Section 23.1.1 and its
interim monitoring in Section 23.1.2 were both performed on the basis of the unpooled
statistic. In this section we examine how the design would change if we intended to
use the pooled statistic for the interim monitoring. It is seen that the change in sample
size is negligible if the randomization is balanced. If, however, an unbalanced
randomization rule is adopted, there can be substantial sample size differences between
the unpooled and pooled designs.
Consider once more the design of the CAPTURE trial with a maximum of K = 3
looks, stopping boundaries generated by the Gamma(-4.5) Gamma family spending
function, and 80% power to detect a drop in the event rate from 0.15 on the placebo
arm to 0.1 on the Abciximab arm, using a two-sided, level-0.05 test. We now consider
the design of this trial on the basis of the pooled statistic.
Select Des9 in the Library and click the icon. Then, under the Test Parameters
tab, in the Specify Variance box, select the radio button for Pooled Estimate.

Click the Compute button to create Des10. Save Des10 to Wbk1. In the Library,
select Des9 and Des10 by holding the Ctrl key, and then click on the icon.

It is instructive to compare Des9 with Des10. It is important to remember that Des9
utilized the unpooled design while Des10 utilized the pooled design.
When we compare Des9 and Des10 side by side we discover that there is not much
difference in terms of either the maximum or expected sample sizes. This is usually the
case for balanced designs. If, however, we were to change the value of the Allocation
Ratio parameter from 1 to 0.333 (which corresponds to assigning 25% of the patients
to treatment and 75% to control), then we would find a substantial difference in the
sample sizes of the two plans. In the picture below, Des11 utilizes the unpooled design

while Des12 utilizes the pooled design.

Notice that because of the unbalanced randomization the unpooled design is able to
achieve 80% power with 229 fewer patients than the pooled design. Specifically, if we
decide to monitor the study with the test statistic (23.2) we need to commit a maximum
of 1908 patients (Des12), whereas if we decide to monitor the study with the test
statistic (23.1) we need to commit a maximum of only 1679 patients (Des11). We can
verify, by simulation, that both Des11 and Des12 produce 80% power under the
alternative hypothesis.
After saving Des11 and Des12 in Workbook1, select Des11 in the Library and click
the icon. Next, click the Simulate button. The results are displayed below and
demonstrate that the null hypothesis was rejected 7710 times in 10,000 trials (77.10%),
very close to the desired 80% power.

Next, repeat the procedure for Des12. Observe that, once again, the desired power
was almost achieved. This time the null hypothesis was rejected 7916 times in 10,000
trials (79.77%), just slightly under the desired 80% power.

The power advantage of the unpooled design over the pooled design gets reversed if
the proportion of patients randomized to the treatment arm is 75% instead of 25%. Edit

Des11 and Des12, and change the Allocation Ratio parameter to 3.

Now the pooled design (Des14) requires a maximum of 1770 patients, whereas the
unpooled design (Des13) requires a maximum of 1995 patients. This shows that when
planning a binomial study with unbalanced randomization, it is important to try both
the pooled and unpooled designs and choose the one that produces the same power
with fewer patients. The correct choice will depend on the response rates of the control
and treatment arms as well as on the value of the fraction assigned to the treatment arm.
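The mechanism is visible in the single-look sample-size formulas. The sketch below is only a fixed-sample approximation under standard normal-theory assumptions — East's 1679/1908 include the group sequential inflation — but it reproduces the pattern for 25% allocation to treatment:

    from math import ceil, sqrt
    from scipy.stats import norm

    pi_c, pi_t, delta = 0.15, 0.10, -0.05
    alpha, power, f_t = 0.05, 0.80, 0.25     # 25% allocated to treatment
    f_c = 1 - f_t
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)

    v1 = pi_t * (1 - pi_t) / f_t + pi_c * (1 - pi_c) / f_c   # unpooled (H1)
    pbar = f_t * pi_t + f_c * pi_c                           # pooled rate (H0)
    v0 = pbar * (1 - pbar) * (1 / f_t + 1 / f_c)

    n_unpooled = ceil((z_a + z_b) ** 2 * v1 / delta ** 2)
    n_pooled = ceil((z_a * sqrt(v0) + z_b * sqrt(v1)) ** 2 / delta ** 2)
    print(n_unpooled, n_pooled)   # about 1664 versus 1887 total

Because the pooled null variance v0 exceeds the unpooled alternative variance v1 under this unbalanced allocation, the pooled design needs substantially more patients, mirroring the Des11 versus Des12 comparison.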

23.2 Ratio of Proportions

23.2.1 Trial Design
23.2.2 Trial Simulation
23.2.3 Interim Monitoring

Let πc and πt denote the binomial probabilities for the control and treatment arms,
respectively, and let ρ = πt /πc . We want to test the null hypothesis that ρ = 1 against
one or two-sided alternatives. It is mathematically more convenient to express this
hypothesis testing problem in terms of the difference of the (natural) logarithms. Thus
we define δ = ln(πt ) − ln(πc ). On this metric, we are interested in testing H0 : δ = 0
against one or two-sided alternative hypotheses. Let π̂ij denote the estimate of πi
based on nij observations from Treatment i, up to and including the j th look,
j = 1, . . . K, i = t, c , where a maximum of K looks are to be taken. Then the
estimate of δ at the j-th look is
$$\hat\delta_j = \ln(\hat\pi_{tj}) - \ln(\hat\pi_{cj}), \qquad (23.11)$$

with estimated standard error

$$\hat{se}_j = \left\{ \frac{1-\hat\pi_{tj}}{n_{tj}\hat\pi_{tj}} + \frac{1-\hat\pi_{cj}}{n_{cj}\hat\pi_{cj}} \right\}^{1/2} \qquad (23.12)$$

if we use an unpooled estimate for the variance of δ̂j, and estimated standard error

$$\hat{se}_j = \left\{ \frac{1-\hat\pi_j}{\hat\pi_j}\,\bigl(n_{tj}^{-1} + n_{cj}^{-1}\bigr) \right\}^{1/2}, \quad \text{where } \hat\pi_j = \frac{n_{tj}\hat\pi_{tj} + n_{cj}\hat\pi_{cj}}{n_{tj} + n_{cj}}, \qquad (23.13)$$

if we use a pooled estimate for the variance of δ̂j.
In general, for any twice-differentiable function h(·) with derivative h′(·), h(π̂ij) is
approximately normal with mean h(πi) and variance [h′(πi)]²πi(1 − πi)/nij for large
values of nij. Using this asymptotic approximation, the test statistic at the j-th look is

$$Z_j^{(u)} = \frac{\ln(\hat\pi_{tj}) - \ln(\hat\pi_{cj})}{\left\{ \frac{1-\hat\pi_{tj}}{n_{tj}\hat\pi_{tj}} + \frac{1-\hat\pi_{cj}}{n_{cj}\hat\pi_{cj}} \right\}^{1/2}}, \qquad (23.14)$$

i.e., the ratio of (23.11) and (23.12), if we use an unpooled estimate for the variance of
ln(π̂tj) − ln(π̂cj), and

$$Z_j^{(p)} = \frac{\ln(\hat\pi_{tj}) - \ln(\hat\pi_{cj})}{\left\{ \frac{1-\hat\pi_j}{\hat\pi_j}\,\bigl(n_{tj}^{-1} + n_{cj}^{-1}\bigr) \right\}^{1/2}}, \qquad (23.15)$$

i.e., the ratio of (23.11) and (23.13), if we use a pooled estimate for the variance of
ln(π̂tj) − ln(π̂cj).
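The unpooled standard error (23.12) follows directly from the delta-method statement above with h(π) = ln(π):

$$\text{With } h(\pi) = \ln \pi,\ h'(\pi) = \frac{1}{\pi}, \qquad \operatorname{Var}\{\ln(\hat\pi_{ij})\} \approx \left[\frac{1}{\pi_i}\right]^2 \frac{\pi_i(1-\pi_i)}{n_{ij}} = \frac{1-\pi_i}{n_{ij}\pi_i}.$$

Adding the contributions of the two independent arms, with πi replaced by its estimate π̂ij, yields (23.12); the same argument with h(π) = ln{π/(1 − π)} yields the log-odds standard error used in Section 23.3.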

23.2.1 Trial Design

Design objectives and interim results were presented from PRISM, a prospective
randomized trial of Heparin alone (control arm), Tirofiban alone (monotherapy arm),
and Heparin plus Tirofiban (combination therapy arm), at a DIA Workshop on Flexible
Trial Design (Snappin, 2003). The composite endpoint was refractory ischemia,
myocardial infarction, or death within seven days of randomization. The investigators were
interested in comparing the two Tirofiban arms to the control arm with each test being
conducted at the 0.025 level of significance (two sided). It was assumed that the
control arm has a 30% event rate. Thus, πt = πc = 0.3 under H0 . The investigators
wished to determine the sample size to have power of 80% if there was a 25% decline
in the event rate, i.e. πt /πc = 0.75. It is important to note that the power of the test
depends on πc and πt , not just the ratio, so different values of the pair (πc , πt ) with the
same ratio will have different solutions.
We will now design a two-arm study that compares the control arm, Heparin, to the
combination therapy arm, Heparin plus Tirofiban. First click the Design tab, then Two
Samples in the Discrete group, and then click Ratio of Proportions.

We want to determine the sample size required to have power of 80% when πc =0.3 and
ρ = πt /πc =0.75, using a two-sided test with a type 1 error rate of 0.025.
Single-Look Design - Unpooled Estimate of Variance
First consider a study with
only one look and equal sample sizes in the two groups. Select the input parameters as
displayed below.

We will use the test statistic (23.14) with the unpooled estimate of the variance. Click
the Compute button. The design Des1 is shown as a row in the Output Preview,
located in the lower pane of this window. This single-look design requires a combined
total of 1328 subjects from both treatments in order to attain 80% power.
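This sample size can be checked against the usual normal-approximation formula on the log-ratio scale; a minimal sketch, assuming the unpooled delta-method variance behind (23.14):

    from math import ceil, log
    from scipy.stats import norm

    pi_c, rho, alpha, power = 0.3, 0.75, 0.025, 0.80   # two-sided 0.025 test
    pi_t = rho * pi_c
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    var1 = (1 - pi_t) / pi_t + (1 - pi_c) / pi_c   # delta-method variance terms
    n_arm = (z_a + z_b) ** 2 * var1 / log(rho) ** 2
    print(2 * ceil(n_arm))                         # 1328, matching Des1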

You can select this design by clicking anywhere on the row in the Output Preview. If
you click the icon, some of the design details will be displayed in the upper pane.
In the Output Preview toolbar, click the icon to save this design to Workbook
Wbk1 in the Library.

Three-Look Design - Unpooled Estimate of Variance
For the above study, suppose we wish to take up to two equally spaced interim looks
and one final look at the accruing data, using the Lan-DeMets (O’Brien-Fleming)
stopping boundary. Create a new design by selecting Des1 in the Library, and clicking
the icon. In the input, change the Number of Looks from 1 to 3, to generate a
study with two interim looks and a final analysis. A new tab for Boundary will appear.
Click this tab to reveal the stopping boundary parameters. By default, the Lan-DeMets
(O’Brien-Fleming) stopping boundary and equal spacing of looks are selected.

Click Compute to create design Des2. The results of Des2 are shown in the Output
Preview window. With Des2 selected in the Output Preview, click the icon. In the
Library, select the nodes for both Des1 and Des2 by holding the Ctrl key, and then
click the icon. The upper pane will display the details of the two designs
side-by-side:

Although the maximum sample size has increased from 1328 to 1339, using three
planned looks may result in a smaller sample size than that required for the single-look
design, with an expected sample size of 1168 subjects under the alternative hypothesis
(πc = 0.3, ρ = 0.75), and still ensures that the power is 80%.
Additional information can also be obtained from Des2. The Lan-DeMets spending
function corresponding to the O’Brien-Fleming boundary can be viewed by selecting
Des2 in the Library, clicking on the icon, and selecting Stopping Boundaries.
The following chart will appear:

The alpha-spending function can be viewed by selecting Des2 in the Library, clicking

on the icon, and selecting Error Spending.

In order to see the stopping probabilities, as well as other characteristics, select Des2 in
the Library, and click the icon. The cumulative boundary stopping
probabilities are shown in the Stopping Boundaries table.

Close this window before continuing.
Three-Look Design - Pooled Estimate of Variance
We now consider this design
using the statistic (23.15) with the pooled estimate of the variance. Create a new
design by selecting Des2 in the Library, and clicking the icon. Under the Test
Parameters tab, select the radio button for Pooled Estimate in the Variance of
Standardized Test Statistic box. Leave everything else unchanged. Click the
Compute button to generate the output for Des3. Save Des3 by selecting it in the
Output Preview and clicking the icon. In the Library, select the nodes for
Des1, Des2, and Des3 by holding the Ctrl key, and then click the icon. The
upper pane will display the details of the three designs side-by-side:

For this problem, the test statistic (23.14) with the unpooled estimate of the variance
requires a smaller sample size than the test statistic (23.15) with the pooled estimate of
the variance. Close this window before continuing.

23.2.2 Trial Simulation

Suppose we want to see the impact of πt on the behavior of the test statistic (23.14)
with the unpooled estimate of the variance. First we consider πt = 0.225 as specified
by the alternative hypothesis. With Des2 selected in the Library, click the icon.
Click on the Simulate button. The results of the simulation will appear under Sim1 in
the Output Preview. Select Sim1 in the Output Preview and click the icon to save
it to Wbk1. Double-click on Sim1 in the Library to display the results of the
simulation. Although the actual values may differ, we see that the power is
approximately 80% and the probability of stopping early is about 0.37.

Now we consider πt = 0.25, which will show the impact of being too optimistic
about the treatment effect. Select Sim1 in the Library and click the icon.
Under the Response Generation tab, enter the value 0.25 next to Prop.
Under Treatment (πt1). Click the Simulate button. Although the actual values may
differ, we see that the power is approximately 41%.

23.2.3 Interim Monitoring

Consider interim monitoring of Des2. Select Des2 in the Library, and click the
icon on the Library toolbar. The interim monitoring dashboard contains various
controls for monitoring the trial, and is divided into two sections. The top section
contains several columns for displaying output values based on the interim inputs. The
bottom section contains four charts, each with a corresponding table to its right.
Suppose that the results are to be analyzed after results are available for every 450
subjects. Click on the icon in the upper left to invoke the Test Statistic
Calculator. Select the radio button to enter δ̂ and its standard error. Enter
450 in the box next to Cumulative Sample Size. Suppose that after the data were
available for the first 450 subjects, 230 subjects had been randomized to the control
arm (c) and 220 subjects to the treatment arm (t). Of the 230 subjects in the
control arm, there were 65 events; of the 220 subjects in the treatment arm, there were
45 events. In the box next to Estimate of δ enter ln((45/220)/(65/230)) and then hit
Enter. East will compute the estimate of δ. Enter 0.169451 in the box next to Std.
Error of δ. Next click Recalc. You should now see the following:

Next, click OK. The following table will appear in the top section of the IM
Dashboard.

Note: Click on the icon to hide or unhide the columns of interest, such as the RCI for
δ. Keeping all four boxes checked displays the RCI on both scales.
The boundary was not crossed, as the value of the test statistic is −1.911,
which is within the boundaries (−4.153, 4.153), so the trial continues. After data were
available for an additional 450 subjects, the second analysis is performed. Suppose that
among the 900 subjects, 448 were randomized to control (c) and 452 were randomized
to (t). Of the 448 subjects in the control arm, there were 132 events; of the 452
subjects in the treatment arm, there were 90 events.
Click on the second row in the table in the upper section. Then click the
icon. Enter 900 in the box next to Sample Size (Overall). Then, in the
box next to Estimate of δ, enter ln((90/452)/(132/448)). Next hit Enter, then enter
0.119341 in the box next to Std. Error of δ. Click Recalc, then OK.
The value of the test statistic is -3.284, which is less than -2.833, the value of the lower
boundary, so the following dialog box appears.

Click on Stop to stop any further analyses. The Final Inference Table shows that the
adjusted point estimate of ln(ρ) is -0.392 (p = 0.001) and the final adjusted 97.5%
confidence interval for ln(ρ) is (-0.659, -0.124).

23.3 Odds Ratio of Proportions

23.3.1 Trial Design
23.3.2 Trial Simulation
23.3.3 Interim Monitoring

Let πt and πc denote the two binomial probabilities associated with the treatment and
the control, respectively. Furthermore, let the odds ratio be

$$\psi = \frac{\pi_t/(1-\pi_t)}{\pi_c/(1-\pi_c)} = \frac{\pi_t(1-\pi_c)}{\pi_c(1-\pi_t)}. \qquad (23.16)$$

We are interested in testing H0 : ψ = 1 against the two-sided alternative H1 : ψ ≠ 1, or
against a one-sided alternative H1 : ψ < 1 or H1 : ψ > 1. It is convenient to express
this hypothesis testing problem in terms of the (natural) logarithm of ψ. Let π̂tj and
π̂cj denote the estimates of πt and πc based on ntj and ncj observations from the
treatment and the control, respectively, up to and including the j th look, j = 1, . . . , K,
where a maximum of K looks are to be made.
The difference between treatments at the j-th look is assessed using

$$\hat\delta_j = \ln\!\left(\frac{\hat\pi_{tj}}{1-\hat\pi_{tj}}\right) - \ln\!\left(\frac{\hat\pi_{cj}}{1-\hat\pi_{cj}}\right). \qquad (23.17)$$

Using the asymptotic approximation presented in Section 23.2, the estimate of the
standard error of δ̂j at the j-th look is

$$\hat{se}_j = \left\{ \frac{1}{n_{tj}\hat\pi_{tj}(1-\hat\pi_{tj})} + \frac{1}{n_{cj}\hat\pi_{cj}(1-\hat\pi_{cj})} \right\}^{1/2}, \qquad (23.18)$$

and the test statistic at the j-th look is the ratio of δ̂j, given by (23.17), to the estimate
of its standard error, given by (23.18), namely

$$Z_j = \frac{\ln\{\hat\pi_{tj}/(1-\hat\pi_{tj})\} - \ln\{\hat\pi_{cj}/(1-\hat\pi_{cj})\}}{\left\{ 1/[n_{tj}\hat\pi_{tj}(1-\hat\pi_{tj})] + 1/[n_{cj}\hat\pi_{cj}(1-\hat\pi_{cj})] \right\}^{1/2}}. \qquad (23.19)$$

23.3.1 Trial Design

Suppose that the response rate for the control treatment is 10% and we hope that the
experimental treatment can triple the odds ratio; that is, we desire to increase the
response rate to 25%. Although we hope to increase the odds ratio, we solve this
problem using a two-sided testing formulation. The null hypothesis H0 : ψ = 1 is
tested against the two-sided alternative H1 : ψ ≠ 1. The power of the test is computed
at specified values of πc and ψ. Note that the power of the test depends on πc and ψ, or
equivalently πc and πt, not just the odds ratio. Thus, different values of πc with the
same odds ratio will have different solutions.
First, click the Design tab, then click Two Samples in the Discrete group, and then click
Odds Ratio of Proportions.

Suppose we want to determine the sample size required to have power of 80% when
πc = 0.1 and ψ1 = 3 using a two-sided test with a type-1 error rate of 0.05.
Single-Look Design
First consider a study with only one look and equal sample
sizes in the two groups. Enter the appropriate design parameters so that the dialog box
appears as shown. Then click Compute.

The design Des1 is shown as a row in the Output Preview, located in the lower pane
of this window. This single-look design requires a combined total of 214 subjects from

both treatments in order to attain 80% power.
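As before, the normal-approximation formula, now on the log-odds-ratio scale of (23.17)–(23.19), recovers this sample size; a minimal sketch:

    from math import ceil, log
    from scipy.stats import norm

    pi_c, psi, alpha, power = 0.1, 3.0, 0.05, 0.80
    pi_t = psi * pi_c / (1 - pi_c + psi * pi_c)    # 0.25, from the odds ratio
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    # Delta-method variance of a sample logit: 1 / (n * pi * (1 - pi))
    var1 = 1 / (pi_t * (1 - pi_t)) + 1 / (pi_c * (1 - pi_c))
    n_arm = (z_a + z_b) ** 2 * var1 / log(psi) ** 2
    print(2 * ceil(n_arm))                         # 214, matching Des1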

You can select this design by clicking anywhere on the row in the Output Preview. If
you click the icon, some of the design details will be displayed in the upper pane.
In the Output Preview toolbar, click the icon to save this design to Wbk1 in the
Library.

Three-Look Design
For the above study, suppose we wish to take up to two equally
spaced interim looks and one final look at the accruing data, using the Lan-DeMets
(O’Brien-Fleming) stopping boundary. Create a new design by selecting Des1 in the
Library, and clicking the icon. In the input, change the Number of Looks from 1
to 3, to generate a study with two interim looks and a final analysis. A new tab for
Boundary will appear. Click this tab to reveal the stopping boundary parameters. By
default, the Lan-DeMets (O’Brien-Fleming) stopping boundary and equal spacing of
looks are selected.

Click the Compute button to create design Des2. The results of Des2 are shown in the
Output Preview window. With Des2 selected in the Output Preview, click the icon.
In the Library, select the rows for both Des1 and Des2 by holding the Ctrl key, and
then click the icon. The upper pane will display the details of the two designs
side-by-side:

Using three planned looks may result in a smaller sample size than that required for the
single-look design, with an expected sample size of 186 subjects under the alternative
hypothesis (πc = 0.1, ψ = 3), and still ensures the power is 80%.
Additional information can also be obtained from Des2. The Lan-DeMets spending
function corresponding to the O’Brien-Fleming boundary can be viewed by selecting
Des2 in the Library, clicking on the icon, and selecting Stopping Boundaries.

The following chart will appear:

The alpha-spending function can be viewed by selecting Des2 in the Library, clicking

on the icon, and selecting Error Spending.

In order to see the stopping probabilities, as well as other characteristics, select Des2 in
the Library, and click the icon. The cumulative boundary stopping
probabilities are shown in the Stopping Boundaries table. East displays the stopping
boundary, the type-1 error spent, and the boundary crossing probabilities under
H0 : πc = 0.1, ψ = 1 and the alternative hypothesis H1 : πc = 0.1, ψ = 3.

Close this window before continuing.

23.3.2 Trial Simulation

Suppose we want to see the impact of πt on the behavior of the test statistic (23.19).
First we consider πt = 0.25, as specified by the alternative hypothesis. With Des2
selected in the Library, click the icon. Next, click the Simulate button. The results
of the simulation will appear under Sim1 in the Output Preview. Highlight Sim1 in the
Output Preview and click the icon to save it to workbook Wbk1. Double-click on
Sim1 in the Library to display the results of the simulation. Although your results
may differ slightly, we see that the power is approximately 83% and the probability of
stopping early is about 0.39.

Now we consider πt = 0.225, which will show the impact of being too optimistic
about the treatment effect. Select Sim1 in the Library and click the icon.
Under the Response Generation tab, enter the value 0.225 next to Prop. Under
Treatment (πt). Click Simulate. Although the actual values may differ, we see that
the power is approximately 68% and the probability of stopping early is about 0.26.

23.3.3 Interim Monitoring

Consider interim monitoring of Des2. Select Des2 in the Library, and click the
icon on the Library toolbar. The interim monitoring dashboard contains various
controls for monitoring the trial, and is divided into two sections. The top section
contains several columns for displaying output values based on the interim inputs. The
bottom section contains four charts, each with a corresponding table to its right.
Suppose that the results are to be analyzed after results are available for every 70
subjects. Click on the icon in the upper left to invoke the Test
Statistic Calculator. Select the second radio button on the calculator to enter values
of δ̂ and its standard error. Before that, enter 70 in the box next to Cumulative Sample
Size. Suppose that, after the data were available for the first 70 subjects, 35 subjects
had been randomized to the control arm (c), of whom 5 experienced a response, and 35
subjects to the treatment arm (t), of whom 9 experienced a response.
In the box next to Estimate of δ enter 0.730888, and in the box next to Std. Error of δ
enter 0.618794. Next click Recalc. You should now see the following:

Click OK and the following entry will appear in the top section of the IM Dashboard.

Note: Click on the icon to hide or unhide the columns of interest.

The boundary was not crossed, as the value of the test statistic (1.181) is within the
boundaries (−3.777, 3.777), so the trial continues. After data were available for an
additional 70 subjects, the second analysis was performed. Suppose that among the
140 subjects, 71 were randomized to c and 69 were randomized to t.
Click on the second row in the table in the upper section. Then click the
icon. Enter 140 in the box next to Cumulative Sample Size.
Then, in the box next to Estimate of δ, enter 1.067841, and in the box next to Std.
Error of δ enter 0.414083. Next, click on Recalc, then OK.
The test statistic, 2.579, exceeds the upper boundary (2.56), so the following screen
appears.

Click Stop to halt any further analyses. The Final Inference Table shows that the
adjusted point estimate of ln(ψ) is 1.068 (p = 0.01) and the adjusted 95% confidence

interval for ln(ψ) is (0.256, 1.879).

23.4 Common Odds Ratio of Stratified Tables

23.4.1 Trial Design
23.4.2 Interim Monitoring

Some experiments are performed with several disjoint groups (strata) within each
treatment group. For example, multicenter clinical trials are conducted using several
investigator sites. Other situations include descriptive subsets, such as baseline and
demographic characteristics. Let πtg and πcg denote the two binomial probabilities in
Group g, g = 1, . . . , G, for the treatment and control, respectively. It is assumed that
the odds ratio

$$\psi = \frac{\pi_{tg}/(1-\pi_{tg})}{\pi_{cg}/(1-\pi_{cg})} = \frac{\pi_{tg}(1-\pi_{cg})}{\pi_{cg}(1-\pi_{tg})} \qquad (23.20)$$

is the same for each group (stratum). The Cochran-Mantel-Haenszel test is used for
testing H0 : ψ = 1 against the two-sided alternative H1 : ψ ≠ 1, or against a one-sided
alternative H1 : ψ > 1 or H1 : ψ < 1.
Let π̂tjg and π̂cjg denote the estimates of πtg and πcg based on ntjg and ncjg
observations in Group g from the treatment (t) and the control (c), respectively, up to
and including the j th look, j = 1, . . . K, where a maximum of K looks are to be
taken.
Then the estimate of δ = ln(ψ) from the g-th group at the j-th look is

$$\hat\delta_{jg} = \ln\!\left(\frac{\hat\pi_{tjg}}{1-\hat\pi_{tjg}}\right) - \ln\!\left(\frac{\hat\pi_{cjg}}{1-\hat\pi_{cjg}}\right),$$

and the estimate of δ = ln(ψ) at the j-th look is the average of the δ̂jg, g = 1, . . . , G;
namely,

$$\hat\delta_j = \frac{1}{G}\sum_{g=1}^{G} \hat\delta_{jg} = \frac{1}{G}\sum_{g=1}^{G} \left[ \ln\!\left(\frac{\hat\pi_{tjg}}{1-\hat\pi_{tjg}}\right) - \ln\!\left(\frac{\hat\pi_{cjg}}{1-\hat\pi_{cjg}}\right) \right]. \qquad (23.21)$$

The estimate of the standard error of δ̂jg at the j-th look is

$$\hat{se}_{jg} = \left\{ \frac{1}{n_{tjg}\hat\pi_{tjg}(1-\hat\pi_{tjg})} + \frac{1}{n_{cjg}\hat\pi_{cjg}(1-\hat\pi_{cjg})} \right\}^{1/2}.$$

The estimated variance of δ̂j at the j-th look is based on the average of the variances of
the δ̂jg, g = 1, . . . , G. Thus,

$$\hat{se}_j = \left\{ \frac{\sum_{g=1}^{G} \hat{se}_{jg}^{\,2}}{G} \right\}^{1/2}. \qquad (23.22)$$

The test statistic used at the j-th look is

$$Z_j = \frac{\hat\delta_j}{\hat{se}_j}. \qquad (23.23)$$

23.4.1

Trial Design

First consider a simple example with two strata, such as males and females, with an
equal number of subjects in each stratum and the same response rate of 60% for the
control in each stratum. We hope that the experimental treatment can triple the odds
ratio. Although we hope to increase the odds ratio, we solve this problem using a
two-sided testing formulation. The null hypothesis H0 : ψ = 1 is tested against the
two-sided alternative H1 : ψ ≠ 1. The power of the test is computed at specified values
of πcg , g = 1, . . . , G, and ψ.
To begin, click the Design tab, then click Two Samples in the Discrete group, and then

click Common Odds Ratio for Stratified 2 x 2 Tables.

Suppose that we want to determine the sample size required to have power of 80%
when πc1 = πc2 = 0.6 and ψ = 3 using a two-sided test with a type-1 error rate of
0.05.
Single-Look Design - Equal Response Rates
First consider a study with only one
look and equal sample sizes in the two groups. Enter the appropriate test parameters so
that the dialog box appears as shown. Then click Compute.

The design is shown as a row in the Output Preview, located in the lower pane of this
window. This single-look design requires a combined total of 142 subjects from both
treatments in order to attain 80% power.

You can select this design by clicking anywhere on the row in the Output Preview. If
you click the icon, some of the design details will be displayed in the upper pane.
In the Output Preview toolbar, click the icon to save this design to workbook
Wbk1 in the Library.

Single-Look Design - Unequal Response Rates
Now, we consider a more realistic
clinical trial. Suppose that males and females respond differently, so that the response
rate for males is πc1 = 0.6 and the response rate for females is πc2 = 0.3. First, we
consider a study without any interim analyses.
Create a new design by selecting Des1 in the Library, and clicking the icon.
Change πc2 in the Stratum Specific Input table to 0.3, as shown below.

Click Compute to create design Des2. The results of Des2 are shown in the Output
Preview window. With Des2 selected in the Output Preview, click the icon. In the
Library, select the rows for both Des1 and Des2 by holding the Ctrl key, and then
click the icon. The upper pane will display the details of the two designs
side-by-side:

This single-look design requires a combined total of 127 subjects from both treatments
in order to attain 80% power.
Three-Look Design - Unequal Response Rates
For the above study, suppose we
wish to take up to two equally spaced interim looks and one final look at the accruing
data, using the Lan-DeMets (O’Brien-Fleming) stopping boundary. Create a new
design by selecting Des2 in the Library, and clicking the icon. In the input,
change the Number of Looks from 1 to 3, to generate a study with two interim looks
and a final analysis. A new tab for Boundary will appear. Click this tab to reveal the
stopping boundary parameters. By default, the Lan-DeMets (O’Brien-Fleming)
stopping boundary and equal spacing of looks are selected.

Click the Compute button to generate output for Des3. The results of Des3 are shown
in the Output Preview window. With Des3 selected in the Output Preview, click the
save icon. In the Library, select the nodes for Des1, Des2, and Des3 by holding the Ctrl
key, and then click the compare icon. The upper pane will display the details of the three
designs side-by-side:

Using three planned looks requires an up-front commitment of 129 subjects, a slight
increase over the single-look design, which required 127 subjects. However, the
three-look design may result in a smaller sample size than that required for the
single-look design, with an expected sample size of 111 subjects under the alternative
hypothesis (πc1 = 0.6, πc2 = 0.3, ψ = 3), and still ensures that the power is 80%.
By selecting only Des3 in the Library and clicking the details icon, East displays the
stopping boundary, the type-1 error spent and the boundary crossing probabilities
under H0 : πc1 = 0.6, πc2 = 0.3, ψ = 1 and the alternative hypothesis

H1 : πc1 = 0.6, πc2 = 0.3, ψ = 3.

Close this window before continuing.
Three-Look Design - Unequal Response Rates - Unequal Strata Sizes
Some
disorders have different prevalence rates across various strata. Consider the above
example, but with the expectation that 30% of the subjects will be males and 70% of
the subjects will be females. Create a new design by selecting Des3 in the Library,
and clicking the edit icon. Under the Test Parameters tab in the Stratum Specific
Input box select the radio button Unequal. You can now edit the Stratum Fraction
column for Stratum 1. Change this value from 0.5 to 0.3 as shown below.

Click the Compute button to generate output for Des4. The results of Des4 are shown
in the Output Preview window. With Des4 selected in the Output Preview, click the
save icon. In the Library, select the rows for Des1, Des2, Des3, and Des4 by holding
the Ctrl key, and then click the compare icon. The upper pane will display the details
of the four designs side-by-side:

Note that, for this example, unequal sample sizes for the two strata result in a smaller
total sample size than that required for equal sample sizes for the two strata.

23.4.2  Interim Monitoring

Consider interim monitoring of Des4. Select Des4 in the Library, and click the
interim monitoring icon from the Library toolbar. The interim monitoring dashboard
contains various controls for monitoring the trial, and is divided into two sections. The
top section contains several columns for displaying output values based on the interim
inputs. The bottom section contains four charts, each with a corresponding table to its right.
Suppose that the data are to be analyzed after results are available for every 40
subjects. Click on the calculator icon in the upper left to invoke the Test
Statistic Calculator. Enter 40 in the box next to Cumulative Sample Size. Suppose
that δ̂1 = 0.58 and ŝe1 = 0.23. Enter these values and click on Recalc. You should

now see the following:

Click OK and the following table will appear in the top section of the IM Dashboard.

The boundary was not crossed, as the value of the test statistic (2.522) is within the
boundaries (-3.777, 3.777), so the trial continues. Click on the second row in the table
in the upper section. Then click the calculator icon. Enter 80 in the box
next to Cumulative Sample Size. Suppose that δ̂2 = 0.60 and ŝe2 = 0.21. Enter these

values and click Recalc. You should now see the following:

Click the OK button. The test statistic 2.857 exceeds the upper boundary (2.56), so the
following dialog box appears.

Click Stop to stop any further analyses. The Final Inference Table shows the adjusted
point estimate of ln(ψ) is 0.600 (p = 0.004) and the adjusted 95% confidence interval
for ln(ψ) is (0.188, 1.011).

23.5  Fisher’s Exact Test (Single Look)

In some experimental situations, the normal approximation to the binomial
distribution may not be appropriate, such as when the probabilities of interest are
large or small. This may lead to incorrect p-values, and thus to incorrect conclusions.
For this reason, Fisher’s exact test may be used. Let πt and πc denote the two response
probabilities for the treatment and the control, respectively. Interest lies in testing
H0 : πt = πc against the two-sided alternative H1 : πt ≠ πc . Results are presented here
only for the situation where there is a single analysis, that is, no interim analysis, for
the two-sided test with equal sample sizes for the two treatments.
Let π̂t and π̂c denote the estimates of πt and πc , respectively, based on
nt = nc = 0.5N observations from the treatment (t) and the control (c). The
estimate of the standard error used in the proposed test statistic makes use of the
pooled estimate of the common value of πt and πc under H0 . The parameter of interest
is δ = πt − πc , which is estimated by δ̂ = π̂t − π̂c , and the standard error is given by
           ŝe = 2{π̂(1 − π̂)/N}^{1/2} ,                                 (23.24)

where π̂ = 0.5(π̂t + π̂c ).
Incorporating a continuity correction factor, the test statistic is

           Z = (|δ̂| − 2/N) / ŝe .                                     (23.25)
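
As a check on equations (23.24) and (23.25), here is a minimal Python sketch (our own
helper, not part of East), assuming equal group sizes nt = nc = 0.5N as stated above:

    from math import sqrt

    def continuity_corrected_z(pi_t_hat, pi_c_hat, N):
        """Sketch of equations (23.24)-(23.25) for equal group sizes."""
        delta_hat = pi_t_hat - pi_c_hat
        pi_bar = 0.5 * (pi_t_hat + pi_c_hat)              # pooled estimate under H0
        se_hat = 2.0 * sqrt(pi_bar * (1.0 - pi_bar) / N)  # equation (23.24)
        return (abs(delta_hat) - 2.0 / N) / se_hat        # equation (23.25)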

23.5.1  Trial Design

Consider the example where the probability of a response for the control is 5% and it is
hoped that the experimental treatment can increase this rate to 25%. First, in the
Discrete area, click Two Samples on the Design tab, and then click Fisher Exact Test.

Suppose we want to determine the sample size required to have power of 90% when
πc = 0.05 and πt = 0.25 using a two-sided test with a type-1 error rate of 0.05. Enter
the appropriate test parameters so that the dialog box appears as shown. Then click
Compute.

The design is shown as a row in the Output Preview, located in the lower pane of this
window. This single-look design requires a combined total of 136 subjects from both
treatments in order to attain 90% power.

You can select this design by clicking anywhere along the row in the Output Preview.
If you click the details icon, some of the design details will be displayed in the upper pane.

In the Output Preview toolbar, click the save icon to save this design to Workbook1
in the Library.

Suppose that this sample size is larger than economically feasible and it is desired to
evaluate the power when a total of 100 subjects are enrolled. Create a new design by
selecting Des1 in the Library, and clicking the edit icon. In the input, select the
radio button in the box next to Power. The box next to Power will now say
Computed, since we wish to compute power. In the box next to Sample Size (n)
enter 100.

Click Compute to create design Des2. The results of Des2 are shown in the Output
Preview window. With Des2 selected in the Output Preview, click the save icon. In
the Library, select the rows for both Des1 and Des2 by holding the Ctrl key, and then
click the compare icon. The upper pane will display the details of the two designs
side-by-side:

Des2 yields a power of approximately 75% as shown. Noting that 100 subjects is
economically feasible and yields reasonable power, the question arises as to the sample
size required to have 80% power, which might still be economically feasible. This can be
accomplished by selecting Des1 in the Library, and clicking the edit icon. In the
input, change the Power from 0.9 to 0.8. Click Compute to generate the output for
Des3. The results of Des3 are shown in the Output Preview window. With Des3
selected in the Output Preview, click the save icon. In the Library, select the rows
for Des1, Des2, and Des3 by holding the Ctrl key, and then click the compare icon.
The upper pane will display the details of the three designs side-by-side:
Entering 0.8 for the power yields a required sample size of 110 subjects.

23.6  Assurance (Probability of Success)

Assurance, or probability of success, is a Bayesian version of power, which
corresponds to the (unconditional) probability that the trial will yield a statistically
significant result. Specifically, it is the prior expectation of the power, averaged over a
prior distribution for the unknown treatment effect (see O’Hagan et al., 2005). For a
given design, East allows you to specify a prior distribution, for which the assurance or
probability of success will be computed. First, enter the following values in the Input
window: a 3-look design for testing the difference in proportions of two distinct
populations with a Lan-DeMets (OF) efficacy-only boundary, Superiority Trial, 1-sided
test, 0.025 type-1 error, 80% power, πc = 0.15, and πt = 0.1.

Select the Assurance checkbox in the Input window. The following options will
appear, as shown below.

To address our uncertainty about the treatment proportion, we specify a prior
distribution for πt . In the Distribution list, click Beta, and in the Input Method list,
click Beta Parameters (a and b). Enter the values of a = 11 and b = 91. Recall that
the mode of the Beta distribution is (a − 1)/(a + b − 2). Thus, these parameter values
generate a Beta distribution that is peaked at 0.1, which matches the assumed treatment
proportion. Click Compute.

The computed probability of success (0.597) is shown above. Note that for this prior,
the assurance is considerably less than the specified power (0.8); incorporating the
uncertainty about πt has yielded a much less optimistic estimate of power. Save this
design in the Library and rename it as Bayes1.
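
The assurance computation is easy to approximate by Monte Carlo. The Python sketch
below is ours, not East code: it substitutes a fixed-sample design of roughly 683
subjects per arm for the actual 3-look design and averages normal-approximation power
over the Beta(11, 91) prior. Despite these simplifications, the answer lands near the
0.597 computed by East.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)

    # Example design: one-sided alpha = 0.025, pi_c = 0.15, 80% power at pi_t = 0.1.
    alpha, pi_c, n_per_arm = 0.025, 0.15, 683   # n_per_arm is our approximation
    z_alpha = norm.ppf(1 - alpha)

    def power(pi_t):
        """Approximate one-sided power to detect pi_c - pi_t > 0."""
        se = np.sqrt((pi_c * (1 - pi_c) + pi_t * (1 - pi_t)) / n_per_arm)
        return norm.cdf((pi_c - pi_t) / se - z_alpha)

    # Assurance: the expectation of power over the Beta(11, 91) prior on pi_t.
    pi_t_draws = rng.beta(11, 91, size=200_000)
    print(power(pi_t_draws).mean())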


East also allows you to specify an arbitrary prior distribution through a CSV file. In the
Distribution list, click User Specified, and then click Browse... to select the CSV file
where you have constructed a prior.

If you are specifying a prior for one parameter only (either πc or πt , but not both), then
the CSV file should contain two columns, where the first column lists the grid points
for the parameter of interest, and the second column lists the prior probability assigned
to each grid point. If you are specifying priors for both πc and πt , the CSV file should
contain four columns (from left to right): values of πc , probabilities for πc , values of
πt , and probabilities for πt . The number of points for πc and number of points for πt
may differ. For example, we consider a 5-point prior for πt only, with probability = 0.2
at each point.
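
With a grid prior of this kind, the assurance is simply a weighted average of power
over the grid points. The sketch below is a hypothetical helper, not East code; it
assumes the two-column CSV format just described and accepts any function mapping πt
to the design's power, such as the one sketched in the previous subsection:

    import csv

    def assurance_from_grid(csv_path, power):
        """Assurance for a user-specified prior on pi_t alone.

        csv_path : two-column CSV of grid points and prior probabilities
        power    : any function mapping pi_t to the design's power
        """
        with open(csv_path, newline="") as f:
            rows = [(float(x), float(p)) for x, p in csv.reader(f)]
        total = sum(p for _, p in rows)  # normalize in case weights do not sum to 1
        return sum(p * power(x) for x, p in rows) / total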

Once the CSV filename and path have been specified, click Compute to calculate the
assurance, which will be displayed in the box below:

As stated in O’Hagan et al. (2005, p.190): “Assurance is a key input to
decision-making during drug development and provides a reality check on other
methods of trial design.” Indeed, it is not uncommon for assurance to be much lower
than the specified power. The interested reader is encouraged to refer to O’Hagan et al.
for further applications and discussions on this important concept.

23.7  Predictive Power and Bayesian Predictive Power

Similar Bayesian ideas can be applied to conditional power for interim monitoring.
Rather than calculating conditional power for a single assumed value of the treatment
effect, δ, such as at δ̂, we may account for the uncertainty about δ by taking a weighted
average of conditional powers, weighted by the posterior distribution for δ. East
calculates an average power, called the predictive power (Lan, Hu, & Proschan,
2009), assuming a diffuse prior for the drift parameter, η. In addition, if the user
specified a beta prior distribution at the design stage to calculate assurance, then East
will also calculate the average power, called Bayesian predictive power, for the
corresponding posterior. We will demonstrate these calculations for the design
renamed as Bayes1 earlier.
In the Library, right-click Bayes1 and click Interim Monitoring, then click the
Show/Hide Columns icon in the toolbar of the IM Dashboard.

In the Show/Hide Columns window, make sure to show the columns for: CP
(Conditional Power), Predictive Power, Bayes Predictive Power, Posterior Distribution
of πt (a), and Posterior Distribution of πt (b), and click OK. The following columns will
be added to the main grid of the IM Dashboard.

In the toolbar of the IM Dashboard, open the Test Statistic Calculator by clicking
the calculator icon. In order to appropriately update the posterior distribution, you
will need to use the Test Statistic Calculator to enter the sample size and number of
responses for each arm. Enter 34 events out of 230 patients in the control arm, and 23

out of 231 patients in the treatment arm, then click OK.

The main grid of the IM Dashboard will be updated as follows. In particular, notice the
differing values for CP and the Bayesian measures of power.

24  Binomial Non-Inferiority Two-Sample

In a binomial non-inferiority trial the goal is to establish that the response rate of an
experimental treatment is no worse than that of an active control, rather than
attempting to establish that it is superior. A therapy that is demonstrated to be
non-inferior to the current standard therapy for a particular indication might be an
acceptable alternative if, for instance, it is easier to administer, cheaper, or less toxic.
Non-inferiority trials are designed by specifying a non-inferiority margin. The amount
by which the response rate on the experimental arm is worse than the response rate on
the control arm must fall within this margin in order for the claim of non-inferiority to
be sustained. In this chapter, we shall design and monitor non-inferiority trials in
which the non-inferiority margin is expressed as either a difference, a ratio, or an odds
ratio of two binomial proportions. The difference is examined in Section 24.1. This is
followed by two formulations for the ratio: the Wald formulation in Section 24.2 and
the Farrington-Manning formulation in Section 24.3. The odds ratio formulation is
presented in Section 24.4.

24.1  Difference of Proportions

Let πc and πt denote the response rates for the control and experimental treatments,
respectively. Let δ = πt − πc . The null hypothesis is specified as

           H0 : δ = δ0

and is tested against one-sided alternative hypotheses. If the occurrence of a response
denotes patient benefit rather than harm, then δ0 < 0 and the alternative hypothesis is

           H1 : δ > δ0

or equivalently as

           H1 : πt > πc + δ0 .

Conversely, if the occurrence of a response denotes patient harm rather than benefit,
then δ0 > 0 and the alternative hypothesis is

           H1 : δ < δ0

or equivalently as

           H1 : πt < πc + δ0 .
For any given πc , the sample size is determined by the desired power at a specified
value of δ = δ1 . A common choice is δ1 = 0 (or equivalently πt = πc ) but East
permits you to power the study at any value of δ1 which is consistent with the choice of
H1 .

Let π̂tj and π̂cj denote the estimates of πt and πc based on ntj and ncj observations
from the experimental and control treatments, respectively, up to and including the j-th
look, j = 1, . . . , K, where a maximum of K looks are to be made. The test statistic at
the j-th look is

           Zj = (δ̂j − δ0 ) / se(δ̂j ) ,                               (24.1)

where

           δ̂j = π̂tj − π̂cj                                            (24.2)

and

           se(δ̂j ) = {π̂cj (1 − π̂cj )/ncj + π̂tj (1 − π̂tj )/ntj }^{1/2} .      (24.3)
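
The statistic is straightforward to reproduce from the observed counts. The Python
sketch below (our helper, not East code) implements equations (24.1) through (24.3);
the usage line reproduces the first interim look analyzed in Section 24.1.3.

    from math import sqrt

    def diff_noninferiority_z(x_t, n_t, x_c, n_c, delta0):
        """Sketch of equations (24.1)-(24.3)."""
        pt, pc = x_t / n_t, x_c / n_c
        delta_hat = pt - pc                                   # equation (24.2)
        se = sqrt(pc * (1 - pc) / n_c + pt * (1 - pt) / n_t)  # equation (24.3)
        return (delta_hat - delta0) / se                      # equation (24.1)

    print(round(diff_noninferiority_z(395, 500, 400, 500, -0.05), 3))   # 1.567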

24.1.1  Trial Design

The 24-week disease-free rate with a standard therapy for HIV is 80%. Suppose that
the claim of non-inferiority for an experimental therapy can be sustained if its response
rate is greater than 75%; i.e., the non-inferiority margin is δ0 = −0.05. For studies of
this type, we specify inferiority as the null hypothesis, non-inferiority as the alternative
hypothesis, and attempt to reject the null hypothesis using a one-sided test. We will
specify to East that, under the null hypothesis H0 , πc = 0.8 and πt = 0.75. We will
test this hypothesis with a one-sided level 0.025 test. Suppose we require 90% power at
the alternative hypothesis, H1 , that both response rates are equal to the null response
rate of the control arm, i.e. δ1 = 0. Thus, under H1 , we have πc = πt = 0.8.
To begin click Two Samples on the Design tab in the Discrete group, and then click
Difference of Proportions.

Single-Look Design Powered at δ = 0 To begin with, suppose we will design a
single-look study for rejection of H0 only, with 90% power at a 0.025 significance
level. Enter the relevant parameters into the dialog box as shown below. In the drop
down box next to Trial, be sure to select Noninferiority.

Now click Compute. The design is shown as a row in the Output Preview located in
the lower pane of this window. The single-look design requires a combined total of
2690 patients on both arms in order to attain 90% power. We can, however, reduce the
expected sample size without any loss of power if we use a group sequential design.
This is considered next.

Before continuing we will save Design1 to the Library. You can select this design by
clicking anywhere along the row in the Output Preview. Some of the design details
will be displayed in the upper pane, labeled Compare Designs. In the Output
Preview toolbar, click the save icon to save this design to Workbook1 in the
Library. If you hover the cursor over Design1 in the Library, a tooltip will appear that

summarizes the input parameters of the design.

Three-Look Design Powered at δ = 0 For the above study, suppose we wish to take
up to two interim looks and one final look at the accruing data. Create a new design by
selecting Design1 in the Library, and clicking the edit icon on the Library toolbar.
Change the Number of Looks from 1 to 3, to generate a study with two interim looks
and a final analysis. A new tab Boundary Info will appear. Clicking on this tab will
reveal the stopping boundary parameters. By default, the Spacing of Looks is set to
Equal, which means that the interim analyses will be equally spaced in terms of the
number of patients accrued between looks. The left side contains details for the
Efficacy boundary, and the right side contains details for the Futility boundary. By
default, there is an efficacy boundary (to reject H0) selected, but no futility boundary
(to reject H1). The Boundary Family specified is of the Spending Functions
type. The default Spending function is the Lan-DeMets (Lan & DeMets, 1983), with
Parameter OF (O’Brien-Fleming), which generates boundaries that are very similar,
though not identical, to the classical stopping boundaries of O’Brien and Fleming
(1979).
Now suppose, in our example, that the three looks are unequally spaced, with the first
look being taken after 50% of the committed accrual, and the second look being taken
after 75% of the committed accrual. Under Spacing of Looks in the Boundary
Info tab, click the Unequal radio button. The column titled Info. Fraction in the
Look Details table can be edited to modify the relative spacing of the analyses. The
information fraction refers to the proportion of the maximum (yet unknown) sample
size. By default, this table displays equal spacing. Enter the new information fraction
values as shown below and click Recalc to see the updated values of the stopping

boundaries populated in the Look Details table.

On the Boundary Info tab, you may also click the spending function or boundary icons
to view plots of the error spending functions, or stopping boundaries, respectively.

Click the Compute button to generate output for Design2. With Design2 selected in
the Output Preview, click the save icon to save Design2 to the Library. In
the Library, select the rows for Design1 and Design2 by holding the Ctrl key, and then
click the compare icon. The upper pane will display the details of the two designs
side-by-side:

Let us examine the design output from Design2. The maximum number of subjects
that we must commit to this study in order to achieve 90% power is 2740. That is 50
patients more than are needed for Design1. However, since Design1 is a single-look
design, there is no prospect of saving resources if indeed H1 is true and the two
treatments have the same response rates. In contrast, Design2 permits the trial to stop
early if the test statistic crosses the stopping boundary. For this reason, the expected
sample size under H1 is 2094, a saving of 596 patients relative to Design1. If H0 is
true, the expected sample size is 2732 and there is no saving of patient resources. In
order to see the stopping probabilities, as well as other characteristics, select Design2
in the Library, and click the details icon. The cumulative boundary stopping
probabilities are shown in the Stopping Boundaries table.

To display a chart of average sample number (ASN) versus the effect size, πt − πc ,
select Design2 in the Library, click on the charts icon, and select Average Sample
Number (ASN). To display a chart of power versus treatment effect, select Design2 in
the Library, click on the charts icon, and select Power vs. Treatment Effect (δ).
In Design2, we utilized the Lan-DeMets (Lan & DeMets, 1983) spending function, with
Parameter OF (O’Brien-Fleming), to generate the stopping boundary for early stopping
under H1 . One drawback of Design2 is the large expected sample size if H0 is true.
We can guard against this eventuality by introducing a futility boundary which will
allow us to stop early if H0 is true. A popular approach to stopping early for futility is
to compute the conditional power at each interim monitoring time point and stop the
study if this quantity is too low. This approach is somewhat arbitrary since there is no
guidance as to what constitutes low conditional power. In East, we compute futility
boundaries that protect β, the type-2 error, so that the power of the study will not
deteriorate. This is achieved by using a β-spending function to generate the futility
boundary. Thereby the type-2 error will not exceed β and the power of the study will
be preserved. This approach was published by Pampallona and Tsiatis (1994).
Suppose we now wish to include a futility boundary. To design this trial, select Design2
in the Library and click the edit icon. In the Boundary Info tab, in the Futility
box, set Boundary Family to Spending Function. Change the Spending
Function to Gamma Family and change the Parameter (Γ) to −8. This family is
parameterized by the single parameter γ which can take all possible non-zero values.

Its functional form is

           β(t) = β(1 − e^{−γt}) / (1 − e^{−γ}) .                      (24.4)
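
Equation (24.4) is simple to tabulate. The short Python sketch below (ours, not East
code) evaluates the cumulative β spent at a few information fractions for Γ = −8,
taking β = 0.10 since this design has 90% power:

    from math import exp

    def gamma_spending(t, beta, gamma):
        """Equation (24.4): cumulative type-2 error spent at information fraction t."""
        return beta * (1.0 - exp(-gamma * t)) / (1.0 - exp(-gamma))

    for t in (0.25, 0.5, 0.75, 1.0):
        print(t, round(gamma_spending(t, beta=0.10, gamma=-8.0), 5))
    # With gamma = -8, almost nothing is spent before 60% information; see below.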

Next click Refresh Boundary. Your screen should now look like the following:

On the Boundary Info tab, you may also click the spending function or boundary icons
to view plots of the error spending functions, or stopping boundaries, respectively.

Notice how conservative the β-spending function is compared to the α-spending
function. Its rate of error spending is almost negligible until about 60% of the
information has accrued.

One can view the stopping boundaries on various alternative scales by selecting the
appropriate scale from the drop-down list of boundary scales to the right of the chart. It
is instructive to view the stopping boundaries on the p-value scale.

By moving the vertical scroll bar from left to right in the above chart, one can observe
the p-values required for early stopping at each look. The p-values needed to stop the
study and declare non-inferiority at the first, second and third looks are, respectively,
0.0015, 0.0092 and 0.022. The p-values needed to stop the study for futility at the first
and second looks are, respectively, 0.7244 and 0.2708.
Other useful scales for displaying the futility boundary are the conditional power
scales. They are the cp delta1 scale and the cp deltahat scale. Here
‘cp’ refers to conditional power. The suffix ‘delta1’ implies that we will represent the
futility boundary in terms of conditional power evaluated at the value of δ = δ1
specified at the design stage under the alternative hypothesis. The suffix ‘deltahat’
implies that we will represent the futility boundary in terms of conditional power
evaluated at the value of δ̂ at which the test statistic Z = δ̂/se(δ̂) would just hit the
futility boundary. The screenshot below represents the first two values of the futility
boundary on the cp delta1 Scale.

For example, the stopping boundary at the first look is cp delta1=0.1137. This is to
be interpreted in the following way: if at the first look the value of the test statistic Z
just falls on the futility boundary, then the conditional power, as defined by Section C.3
of Appendix C with δ = δ1 = 0, will be 0.1137. This gives us a way to express the
futility boundary in terms of conditional power.
The cp delta1 Scale might not give one an accurate picture of futility. This is
because, on this scale, the conditional power is evaluated at the value of δ = δ1
specified at the design stage. However, if the test statistic has actually fallen on the
futility boundary, the data are more suggestive of the null than the alternative
hypothesis and it is not very likely that δ = δ1 . Thus it might be more reasonable to
evaluate conditional power at the observed value δ = δ̂. The screenshot below
represents the futility boundary on the cp deltahat Scale.

For example, the stopping boundary at the second look is cp deltahat=0.0044.
This is to be interpreted in the following way: if at the second look, the value of test
statistic Z just falls on the futility boundary, then the conditional power, as defined by
Section C.3 of Appendix C with δ = δ̂ = Z × se(δ̂), will be 0.0044. It is important to
realize that the futility boundary has not changed. It is merely being expressed on a
different scale. On the whole, it is probably more realistic to express the futility
boundary on the cp deltahat scale than on the cp delta1 scale since it is
highly unlikely that the true value of δ is equal to δ1 if Z has hit the futility boundary.
Close this chart before continuing. Click the Compute button to generate output for
Design3. With Design3 selected in the Output Preview, click the save icon. In the
Library, select the rows for Design1, Design2, and Design3 by holding the Ctrl key,
and then click the compare icon. The upper pane will display the details of the three
designs side-by-side:

Observe that Design3 will stop with a smaller expected sample size under either H0 or
H1 compared to Design2.
Three-Look Design Powered at δ ≠ 0 The previous designs were all powered to
detect the alternative hypothesis that the new treatment and the active control have the
same response rate (δ1 = 0). As is usually the case with non-inferiority trials, the
distance between the non-inferiority margin δ0 = −0.05 and the alternative hypothesis
δ1 = 0 is rather small, thereby resulting in a very large sample size commitment to this
trial. Sometimes a new treatment is actually believed to have a superior response rate
to the active control. However, the anticipated treatment benefit might be too small
make it feasible to run a superiority trial. Suppose, for example, that it is anticipated
that the treatment arm could improve upon the 80% response rate of the active control
by about 2.5%. A single-look superiority trial designed for 90% power to detect this
small of a difference would require over 12000 subjects. In this situation, the sponsor
might prefer to settle for a non-inferiority claim. A non-inferiority trial in which the
active control has a response probability of πc = 0.8, the non-inferiority margin is
δ0 = −0.05, and the alternative hypothesis is δ1 = πt − πc = 0.025 can be designed
as follows.
Create a new design by selecting Design3 in the Library, and clicking the edit icon
on the Library toolbar. In the ensuing dialog box, choose the design parameters as
shown below.

Click the Compute button to generate output for Design4. Notice that this design
requires only 1161 subjects. This is 1585 fewer subjects than under Design3.

24.1.2  Trial Simulation

You can simulate Design3 by selecting Design3 in the Library, and clicking the simulate
icon from the Library toolbar. Alternatively, right-click on Design3 and select Simulate.

A new Simulation worksheet will appear.

Try different choices for the simulation parameters to verify the operating
characteristics of the study. For instance, under the Response Generation Info tab, set
Prop. Under Control to 0.8 and Prop. Under Treatment to 0.75. You will be
simulating under the null hypothesis and should achieve a rejection rate of 2.5%. Now,
click on the Simulate button.
Once the simulation run has completed, East will add an additional row to the Output
Preview labeled Simulation1. Select Simulation1 in the Output Preview. Note that
some of the design details will be displayed in the upper pane, labeled Compare
Designs. Click the save icon to save it to the Library. Double-click on Simulation1
in the Library. The simulation output details will be displayed.

We see above that we achieved a rejection rate of 2.5%.
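
East simulates the full three-look design; its null rejection rate can be
sanity-checked with a fixed-sample Monte Carlo run. The sketch below (ours, not East
code) simulates the single-look analogue with Design1's 1345 subjects per arm under H0
and confirms a rejection rate near 2.5%:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)

    n, delta0, reps = 1345, -0.05, 100_000   # per-arm size from Design1
    x_c = rng.binomial(n, 0.80, size=reps)   # control responses under H0
    x_t = rng.binomial(n, 0.75, size=reps)   # treatment responses under H0
    pc, pt = x_c / n, x_t / n
    se = np.sqrt(pc * (1 - pc) / n + pt * (1 - pt) / n)
    z = (pt - pc - delta0) / se              # equation (24.1)
    print((z > norm.ppf(1 - 0.025)).mean())  # close to 0.025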
Now suppose that the new treatment is actually slightly superior to the control
treatment. For example, πc = 0.8 and πt = 0.81. Since this study is designed for 90%
power when πc = πt = 0.8, we would expect the simulations to reveal power in excess
of 90%.
Select the Sim1 node in the Library, and click the simulate icon from the Library toolbar.
Under the Response Generation Info tab change the Prop. Under Treatment to
0.81. Click Simulate to start the simulation. Once the simulation run has completed,
East will add an additional row to the Output Preview labeled Simulation2. Select
Simulation2 in the Output Preview. Click the save icon to save it to the Library.
Double-click on Simulation2 in the Library. The simulation output details will be

displayed.

These results show that the power exceeds 97%.
The power of the study will deteriorate if the response rate of the control arm is less
than 0.8, even if πc = πt . To see this, let us simulate with πc = πt = 0.7. The results

are shown below.

Notice that the power has dropped from 90% to 80% even though the new treatment
and the control treatment have the same response rates. This is because the lower
response rates for πc and πt induce greater variability into the distribution of the test
statistic. In order to preserve power, the sample size must be increased. This can be
achieved without compromising the type-1 error within the group sequential
framework by designing the study for a maximum amount of (Fisher) information
instead of a maximum sample size. We discuss maximum information studies later, in
Chapter 59.

24.1.3  Interim Monitoring

Consider interim monitoring of Design3. Select Design3 in the Library, and click the
interim monitoring icon from the Library toolbar. Alternatively, right-click on Design3 and select
Interim Monitoring. The interim monitoring dashboard contains various controls for
monitoring the trial, and is divided into two sections. The top section contains several
columns for displaying output values based on the interim inputs. The bottom section
contains four charts, each with a corresponding table to its right. These charts provide
graphical and numerical descriptions of the progress of the clinical trial and are useful
tools for decision making by a data monitoring committee.
Suppose that the trial is first monitored after accruing 500 subjects on each treatment
arm, with 395 responses on the treatment arm and 400 responses on the control arm.
Click on the calculator icon to invoke the Test Statistic Calculator. In the
box next to Cumulative Sample Size enter 1000. Enter −0.01 in the box next to
Estimate of δ. In the box next to Std. Error of δ enter 0.02553. Next click Recalc.

Note that the test statistic is computed to be 1.567.
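
This value can be checked directly from equation (24.1), using the design margin
δ0 = −0.05:

    z = (-0.01 - (-0.05)) / 0.02553   # equation (24.1)
    print(round(z, 3))                # 1.567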
Upon clicking the OK button, East will produce the interim monitoring report shown
below.

The stopping boundary for declaring non-inferiority is 3.535 whereas the value of the
test statistic is only 1.567. Thus the trial should continue.

Suppose that the next interim look occurs after accruing 1250 patients on each arm
with 1000 responses on the control arm and 990 responses on the treatment arm. Click
on the second row in the table in the upper section. Then click the calculator
icon. The estimate of δ is -0.008 and the standard error is 0.016118. Enter the
appropriate values as shown below and click Recalc.

Note that the value of the test statistic is now 2.606. Now click the OK button. This
time the stopping boundary for declaring non-inferiority is crossed. The following

message box appears.

Click the Stop button to stop the study. The analysis results are shown below.

The lower bound on the 97.5% repeated confidence interval is -0.042, comfortably
within the non-inferiority margin of -0.05 specified at the design stage.
East also provides a p-value, confidence interval and median unbiased point estimate
for πt − πc using stage-wise ordering of the sample space as described in Jennison and
Turnbull (2000, page 179). These are shown in the Adjusted Inference Table, located in
the lower section of the IM Worksheet. In the present example, the lower confidence
bound is -0.040, slightly greater than the corresponding bound from the repeated
confidence interval.

24.2  Ratio of Proportions: Wald Formulation

Let πc and πt denote the response rates for the control and the experimental
treatments, respectively. Let the difference between the two arms be captured by the
ratio

           ρ = πt / πc .

The null hypothesis is specified as
H0 : ρ = ρ0
and is tested against one-sided alternative hypotheses. If the occurrence of a response
denotes patient benefit rather than harm, then ρ0 < 1 and the alternative hypothesis is
H1 : ρ > ρ0
or equivalently as
H1 : πt > ρ0 πc .
Conversely, if the occurrence of a response denotes patient harm rather than benefit,
then ρ0 > 1 and the alternative hypothesis is
H1 : ρ < ρ0
or equivalently as
H1 : πt < ρ0 πc .
For any given πc , the sample size is determined by the desired power at a specified
value of ρ = ρ1 . A common choice is ρ1 = 1 (or equivalently πt = πc ), but East
permits you to power the study at any value of ρ1 which is consistent with the choice
of H1 .
Let π̂tj and π̂cj denote the estimates of πt and πc based on ntj and ncj observations
from the experimental and control treatments, respectively, up to and including the j-th
look, j = 1, . . . , K, where a maximum of K looks are to be made. It is convenient to
express the treatment effect on the logarithm scale as

           δ = ln ρ = ln πt − ln πc .                                  (24.5)

The test statistic at the j-th look is then defined as

           Zj = (δ̂j − δ0 ) / se(δ̂j ) ,                                (24.6)

where

           δ̂j = ln(π̂tj / π̂cj ) ,                                     (24.7)

           δ0 = ln(ρ0 ) ,                                              (24.8)

and

           se(δ̂j ) = {(1 − π̂cj )/(ncj π̂cj ) + (1 − π̂tj )/(ntj π̂tj )}^{1/2} .      (24.9)
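
A minimal Python sketch of equations (24.6) through (24.9) (our helper, not East
code):

    from math import log, sqrt

    def ratio_wald_z(x_t, n_t, x_c, n_c, rho0):
        """Sketch of the Wald statistic for the log ratio, equations (24.6)-(24.9)."""
        pt, pc = x_t / n_t, x_c / n_c
        delta_hat = log(pt / pc)                                  # equation (24.7)
        se = sqrt((1 - pc) / (n_c * pc) + (1 - pt) / (n_t * pt))  # equation (24.9)
        return (delta_hat - log(rho0)) / se                       # equations (24.6), (24.8)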

24.2.1  Trial Design

The Coronary Artery Revascularization in Diabetes (CARDia) trial (Kapur et al.,
2005) was designed to compare coronary bypass graft surgery (CABG) and
percutaneous coronary intervention (PCI) as strategies for revascularization, with the
goal of showing that PCI is noninferior to CABG. We use various aspects of that study
to exemplify the methodology to test for non-inferiority. The endpoint is the one-year event
rate, where an event is defined as the occurrence of death, nonfatal myocardial
infarction, or cerebrovascular accident.
Suppose that the event rate for the CABG is πc = 0.125 and that the claim of
non-inferiority for PCI can be sustained if one can demonstrate statistically that the
ratio ρ = πt /πc is at most 1.3. In other words, PCI is considered to be non-inferior to
CABG as long as πt < 0.1625. Thus the null hypothesis H0 : ρ = 1.3 is tested against
the one-sided alternative hypothesis H1 : ρ < 1.3. We want to determine the sample
size required to have power of 80% when ρ = 1 using a one-sided test with a type-1
error rate of 0.05.
Single Look Design Powered at ρ = 1 First we consider a study with only one look
and equal sample sizes in the two groups. To begin click Two Proportions on the
Design tab under Discrete, and then click Ratio of Proportions.

In the ensuing dialog box, next to Trial, select Noninferiority from the
drop-down menu. Choose the remaining design parameters as shown below.

Make sure to select the radio button for Wald in the Test Statistic box. We will
discuss the Score (Farrington-Manning) test statistic in the next section.
Now click Compute. The design is shown as a row in the Output Preview located in
the lower pane of this window. This single-look design requires a combined total of
2515 subjects from both treatments in order to attain 80% power.

You can select this design by clicking anywhere along the row in the Output Preview.
Some of the design details will be displayed in the upper pane, labeled Compare
Designs. In the Output Preview toolbar, click the save icon to save this design to
Workbook1 in the Library. If you hover the cursor over Design1 in the Library, a

tooltip will appear that summarizes the input parameters of the design.

Three-Look Design Powered at ρ = 1 For the above study, suppose we wish to take
up to two equally spaced interim looks and one final look at the accruing data, using
the Lan-DeMets (O’Brien-Fleming) stopping boundary. Create a new design by
selecting Design1 in the Library, and clicking the edit icon on the Library toolbar.
Change the Number of Looks from 1 to 3, to generate a study with two interim looks
and a final analysis. A new tab Boundary Info will appear. Clicking on this tab will
reveal the stopping boundary parameters. By default, the Spacing of Looks is set to
Equal, which means that the interim analyses will be equally spaced in terms of the
number of patients accrued between looks. The left side contains details for the
Efficacy boundary, and the right side contains details for the Futility boundary. By
default, there is an efficacy boundary (to reject H0) selected, but no futility boundary
(to reject H1). The Boundary Family specified is of the Spending Functions
type. The default Spending function is the Lan-DeMets (Lan & DeMets, 1983), with
Parameter OF (O’Brien-Fleming), which generates boundaries that are very similar,
though not identical, to the classical stopping boundaries of O’Brien and Fleming

(1979). Technical details of these stopping boundaries are available in Appendix F.

Click the Compute button to generate output for Design2. With Design2 selected in
the Output Preview, click the save icon to save Design2 to the Library. In the
Library, select the rows for Design1 and Design2 by holding the Ctrl key, and then
click the compare icon. The upper pane will display the details of the two designs
side-by-side:

Using three planned looks requires an up-front commitment of 2566 subjects, a slight
inflation over the single-look design which required 2515 subjects. However, the
three-look design may result in a smaller sample size than that required for the
single-look design, with an expected sample size of 2134 subjects under the alternative
hypothesis (πc = 0.125, ρ = 1), and still ensures that the power is 80%.
By selecting Design2 in the Library and clicking the details icon, East
displays the cumulative accrual, the stopping boundary, the type-1 error spent and the
boundary crossing probabilities under the null hypothesis H0 : ρ = 1.3, and the
alternative hypothesis H1 : ρ = 1.

Single-Look Design Powered at ρ ≠ 1 Sample sizes for non-inferiority trials
powered at ρ = 1 are generally rather large, because regulatory requirements usually
impose small non-inferiority margins (see, for example, Wang et al., 2001). Observe
that both Design1 and Design2 were powered at ρ = 1 and required sample sizes in
excess of 2500 subjects. However, based on Kapur et al. (2005), it is reasonable to
expect πt < πc . We now consider the same design as in Design1, but we will power at
the alternative hypothesis ρ1 = 0.72. That is, we will design this study to have 80%
power to claim non-inferiority if πc = 0.125 and πt = 0.72 × 0.125 = 0.09.
Create a new design by selecting Design1 in the Library, and clicking the edit icon
on the Library toolbar. In the ensuing dialog box, change the design parameters as

shown below.

Click the Compute button to generate output for Design3. With Design3 selected in
the Output Preview, click the save icon. In the Library, select the rows for
Design1, Design2, and Design3 by holding the Ctrl key, and then click the compare
icon. The upper pane will display the details of the three designs side-by-side:

This single-look design requires a combined total of 607 subjects from both treatments
in order to attain 80% power. This is a considerable decrease from the 2515 subjects
required to attain 80% power using Design1 with ρ1 = 1.
Three-Look Design Powered at ρ ≠ 1 We now consider the impact of multiple
looks on Design3. Suppose we wish to take up to two equally spaced interim looks and
one final look at the accruing data, using the Lan-DeMets (O’Brien-Fleming) stopping
boundary.
Create a new design by selecting Design3 in the Library, and clicking the edit icon
on the Library toolbar. In the ensuing dialog box, change the Number of Looks to 3.

Click the Compute button to generate output for Design4.

Using three planned looks inflates the maximum sample size slightly, from 607 to 619
subjects. However, it results in a smaller expected sample size under H1 . Observe that
the expected sample size is only 515 subjects under the alternative hypothesis
(πc = 0.125, ρ = 0.72), and still ensures the power is 80%.


24.2.2  Trial Simulation

You can simulate Design4 by selecting it from the Library and clicking on the simulate
icon. Try different choices for the simulation parameters to verify the operating
characteristics of the study. For instance, under the Response Generation Info tab set
Prop. Under Control to 0.125 and Prop. Under Treatment to 0.72 × 0.125 = 0.09.

Click the Simulate button. Once the simulation run has completed, East will add an
additional row to the Output Preview labeled Simulation1. Select Simulation1 in the
Output Preview. Note that some of the design details will be displayed in the upper
pane, labeled Compare Designs. Click the save icon to save it to the Library.
Double-click on Simulation1 in the Library. The simulation output details will be
displayed.

We simulated the data under the alternative hypothesis and should achieve a rejection
rate of 80%. This is confirmed above (up to Monte Carlo accuracy).
Next, to simulate under the null hypothesis, under the Response Generation Info tab
set Prop. Under Treatment to 1.3 × 0.125 = 0.1625. Click the Simulate button.

This time the rejection rate is only 5% (up to Monte Carlo accuracy), as we would
expect under the null hypothesis. You may experiment in this manner with different
values of πc and πt and observe the rejection rates look by look as well as averaged
over all looks.

24.2.3  Interim Monitoring

Select Design4 in the Library, and click the interim monitoring icon from the Library toolbar.
Alternatively, right-click on Design4 and select Create IM Dashboard. The interim
monitoring dashboard contains various controls for monitoring the trial, and is divided
into two sections. The top section contains several columns for displaying output
values based on the interim inputs. The bottom section contains four charts, each with
a corresponding table to its right. These charts provide graphical and numerical
descriptions of the progress of the clinical trial and are useful tools for decision making
by a data monitoring committee.
Suppose that the trial is first monitored after accruing 125 subjects on each treatment
arm, with 15 responses on the control arm and 13 responses on the treatment arm.
Click on the calculator icon to invoke the Test Statistic Calculator. In the
box next to Cumulative Sample Size enter 250. Enter −0.143101 in the box next to
Estimate of δ. In the box next to Std. Error of δ enter 0.357197. Next click Recalc.
Notice that the test statistic is computed to be -1.135. This value for the test statistic
was obtained by substituting the observed sample sizes and responses into
equations (24.6) through (24.9).
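
The same value can be obtained from the raw counts with the ratio_wald_z sketch given
after equation (24.9):

    print(round(ratio_wald_z(13, 125, 15, 125, 1.3), 3))   # -1.135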

Upon clicking the OK button, East will produce the interim monitoring report shown
below.

Note: Click on the Show/Hide Columns icon to hide or unhide the columns of interest.

The stopping boundary for declaring non-inferiority is -2.872 whereas the value of the
test statistic is only -1.135. Thus the trial should continue.
This conclusion is supported by the value of the 95.0% upper confidence bound of the
repeated confidence interval for δ = ln(ρ). The non-inferiority claim could be
sustained only if this bound were less than ln(1.3) = 0.262. At the current interim
look, however, the upper bound on δ is 0.883, indicating that the non-inferiority claim
is not supported by the data.
Suppose that the next interim look occurs after accruing 250 patients on each arm with
31 responses on the control arm and 22 responses on the treatment arm. Click on the
second row in the table in the upper section. Then click the calculator icon.
In the box next to Cumulative Sample Size enter 500. Enter −0.342945 in the box
next to Estimate of δ. In the box next to Std. Error of δ enter 0.264031. Next click

Recalc. Notice that the test statistic is computed to be -2.293.

Click the OK button. This time the stopping boundary for declaring non-inferiority is
crossed. The following message box appears.

Click the Stop button to stop the study. The analysis results are shown below.

The upper bound on the 95.0% repeated confidence interval for δ is 0.159. Thus the
upper confidence bound on ρ is exp(0.159) = 1.172, comfortably within the
non-inferiority margin ρ0 = 1.3 specified at the design stage.
In the Final Inference Table in the bottom portion of the IM worksheet, East also
provides a p-value, confidence interval and median unbiased point estimate for δ using
stage-wise ordering of the sample space as described in Jennison and Turnbull (2000).
This approach often yields narrower confidence intervals than the repeated confidence
intervals approach although both approaches have the desired 95.0% coverage. In the
present example, the upper confidence bound is 0.098, slightly less than the
corresponding bound from the repeated confidence interval.

24.3  Ratio of Proportions: Farrington-Manning Formulation


An alternative approach to establishing non-inferiority of an experimental treatment to
the control treatment with respect to the ratio of probabilities was proposed by
Farrington and Manning (1990). Let πc and πt denote the response rates for the control
and the experimental treatments, respectively. Let the difference between the two arms
be expressed by the ratio

           ρ = πt / πc .
The null hypothesis is specified as
H0 : ρ = ρ0 ,
or equivalently
H0 : πt = ρ0 πc ,
which is tested against one-sided alternative hypotheses. If the occurrence of a
response denotes patient benefit rather than harm, then ρ0 < 1 and the alternative
hypothesis is
H1 : ρ > ρ0
or equivalently as
H1 : πt > ρ0 πc .
Conversely, if the occurrence of a response denotes patient harm rather than benefit,
then ρ0 > 1 and the alternative hypothesis is
H1 : ρ < ρ0
or equivalently as
H1 : πt < ρ0 πc .
For any given πc , the sample size is determined by the desired power at a specified
value of ρ = ρ1 . A common choice is ρ1 = 1 (or equivalently πt = πc ), but East
permits you to power the study at any value of ρ1 which is consistent with the choice
of H1 .
Let π̂tj and π̂cj denote the estimates of πt and πc based on ntj and ncj observations
from the experimental and control treatments, respectively, up to and including the j-th
look, j = 1, . . . , K, where a maximum of K looks are to be made. The test statistic at
the j-th look is defined as
           Zj = (π̂tj − ρ0 π̂cj ) / {π̂tj (1 − π̂tj )/ntj + ρ0² π̂cj (1 − π̂cj )/ncj }^{1/2} .      (24.10)
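
A minimal Python sketch of equation (24.10), with the same caveat that the helper is
ours rather than East code:

    from math import sqrt

    def ratio_fm_z(x_t, n_t, x_c, n_c, rho0):
        """Sketch of the Farrington-Manning statistic, equation (24.10)."""
        pt, pc = x_t / n_t, x_c / n_c
        se = sqrt(pt * (1 - pt) / n_t + rho0**2 * pc * (1 - pc) / n_c)
        return (pt - rho0 * pc) / se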

The choice of test statistic is the primary distinguishing feature between the above
Farrington-Manning formulation and the Wald formulation of the non-inferiority test
discussed in Section 24.2. The Wald statistic (24.6) measures the standardized
difference between the observed ratio of proportions and the non-inferiority margin on
the natural logarithm scale. The corresponding repeated one-sided confidence bounds
displayed in the interim monitoring worksheet estimate ln(πt /πc ) and may be
converted to estimates of the ratio of proportions by exponentiation. On the other hand,
the Farrington-Manning formulation focuses on the expression of the null hypothesis
as
           H0 : πt − ρ0 πc = 0.

Thus, we consider

           δ = πt − ρ0 πc                                              (24.11)

as the parameter of interest. The test statistic (24.10) is the standardized estimate of
this difference obtained at the j-th look. A large difference in the direction of the
alternative hypothesis is indicative of non-inferiority. The corresponding repeated
one-sided confidence bounds displayed in the interim monitoring worksheet provide
estimates of δ rather than directly estimating ρ or ln(ρ). The Farrington-Manning and
Wald procedures are equally applicable for hypothesis testing since the null hypothesis
δ = 0 is rejected if and only if the corresponding null hypothesis ρ = ρ0 is rejected.

24.3.1  Trial Design

We again consider the Coronary Artery Revascularization in Diabetes (CARDia) trial
(Kapur et al., 2005), presented in Section 24.2, which compared coronary bypass graft
surgery (CABG) and percutaneous coronary intervention (PCI) as strategies for
revascularization, with the goal of showing that PCI is noninferior to CABG. We use
various aspects of that study to exemplify the use of the methodology to test for
non-inferiority with respect to the one-year event rate, where an "event" is the
occurrence of death, nonfatal myocardial infarction, or cerebrovascular accident, using
the Farrington-Manning formulation.
Suppose that the event rate for the CABG is πc = 0.125 and that the claim of
non-inferiority for PCI can be sustained if the ratio ρ is at most 1.3; that is, the event
rate for the PCI (πt ) is at most 0.1625. The null hypothesis H0 : ρ = 1.3 is tested
against the alternative hypothesis H1 : ρ < 1.3. We want to determine the sample size
required to have power of 80% when ρ = 1 using a one-sided test with a type-1 error
rate of 0.05.
Single Look Design Powered at ρ = 1 First we consider a study with only one look
and equal sample sizes in the two groups. To begin click Two Proportions on the

Design tab, and then click Ratio of Proportions.

In the ensuing dialog box, next to Trial, select Noninferiority from the
drop-down menu. Choose the remaining design parameters as shown below.

Now click Compute. The design is shown as a row in the Output Preview located in
the lower pane of this window. This single-look design requires a combined total of
2588 subjects from both treatments in order to attain 80% power.
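As a cross-check outside East, this total can be approximated with the usual normal-theory sample size formula for the Farrington-Manning statistic (24.10), with the variance evaluated at the design alternative. The sketch below is our own illustration, not East's implementation; the function name and the unpooled-variance convention are assumptions, but it reproduces the total above, and the 628 figure for Design3 later in this section.

```python
from math import ceil
from scipy.stats import norm

def fm_ratio_total_n(pi_c, rho0, rho1, alpha, power):
    """Approximate total sample size (equal allocation) for the one-sided
    Farrington-Manning ratio test of H0: pi_t = rho0*pi_c, powered at
    rho = rho1, using the unpooled variance at the alternative (assumption)."""
    pi_t = rho1 * pi_c                                   # event rate under H1
    delta1 = pi_t - rho0 * pi_c                          # mean of the numerator of (24.10)
    var1 = pi_t*(1 - pi_t) + rho0**2 * pi_c*(1 - pi_c)   # n x Var, per arm of size n
    z = norm.ppf(1 - alpha) + norm.ppf(power)
    return 2 * ceil(z**2 * var1 / delta1**2)

print(fm_ratio_total_n(0.125, 1.3, 1.0, 0.05, 0.80))    # 2588
print(fm_ratio_total_n(0.125, 1.3, 0.72, 0.05, 0.80))   # 628 (Design3 below)
```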

You can select this design by clicking anywhere along the row in the Output Preview.
Some of the design details will be displayed in the upper pane, labeled Compare
Designs. In the Output Preview toolbar, click the
icon to save this design to
Workbook1 in the Library. If you hover the cursor over Design1 in the Library, a
tooltip will appear that summarizes the input parameters of the design.

Three-Look Design Powered at ρ = 1 For the above study, suppose we wish to take
up to two equally spaced interim looks and one final look at the accruing data, using
the Lan-DeMets (O'Brien-Fleming) stopping boundary. Create a new design by
selecting Design1 in the Library, and clicking the
icon on the Library toolbar.
Change the Number of Looks from 1 to 3, to generate a study with two interim looks
and a final analysis. A new tab Boundary Info will appear. Clicking on this tab will
reveal the stopping boundary parameters. By default, the Spacing of Looks is set to
Equal, which means that the interim analyses will be equally spaced in terms of the
number of patients accrued between looks. The left side contains details for the
Efficacy boundary, and the right side contains details for the Futility boundary. By
default, there is an efficacy boundary (to reject H0) selected, but no futility boundary
(to reject H1). The Boundary Family specified is of the Spending Functions
type. The default Spending function is the Lan-DeMets (Lan & DeMets, 1983), with
Parameter OF (O’Brien-Fleming), which generates boundaries that are very similar,
though not identical, to the classical stopping boundaries of O’Brien and Fleming

(1979).

Click the Compute button to generate output for Design2. With Design2 selected in
the Output Preview, click the
icon to save Design2 to the Library. In the
Library, select the rows for Design1 and Design2, by holding the Ctrl key, and then
click the

icon. The upper pane will display the details of the two designs

side-by-side:

Using three planned looks requires an up-front commitment of 2640 subjects, a slight
inflation over the single-look design which required only 2588 subjects. However, the
three-look design may result in a smaller sample size than that required for the
single-look design, with an expected sample size of 2195 subjects under the alternative
hypothesis (πc = 0.125, ρ = 1), and still ensures that the power is 80%.
By selecting Design2 in the Library and clicking the icon, East displays the cumulative accrual, the stopping boundary, the type-1 error spent, and the boundary crossing probabilities under the null hypothesis H0 : ρ = 1.3 and the alternative hypothesis H1 : ρ = 1.

Single-Look Design Powered at ρ ≠ 1  Sample sizes for non-inferiority trials
powered at ρ = 1 are generally rather large because regulatory requirements usually
impose small non-inferiority margins. Observe that both Design1 and Design2 were
powered at ρ = 1 and required sample sizes in excess of 2500 subjects. However,
based on Kapur et al (2005), it is reasonable to expect πt < πc . We now consider the
same design as in Design1, but we will power at the alternative hypothesis ρ1 = 0.72.
That is, we will design this study to have 80% power to claim non-inferiority if
πc = 0.125 and πt = 0.72 × 0.125 = 0.09.
Create a new design by selecting Design1 in the Library, and clicking the
icon
on the Library toolbar. In the ensuing dialog box, change the design parameters as
shown below.

Click the Compute button to generate output for Design3. With Design3 selected in
the Output Preview, click the

icon. In the Library, select the rows for

Design1, Design2, and Design3, by holding the Ctrl key, and then click the

icon. The upper pane will display the details of the three designs side-by-side:

This single-look design requires a combined total of 628 subjects from both treatments
in order to attain 80% power. This is a considerable decrease from the 2588 subjects
required to attain 80% power using Design1, i.e. with ρ1 = 1.
Three-Look Design Powered at ρ ≠ 1  We now consider the impact of multiple looks on Design3. Suppose we wish to take up to two equally spaced interim looks and one final look at the accruing data, using the Lan-DeMets (O'Brien-Fleming) stopping
boundary.
Create a new design by selecting Design3 in the Library, and clicking the
icon
on the Library toolbar. In the ensuing dialog box, change the Number of Looks to 3.

Click the Compute button to generate output for Design4.

Using three planned looks inflates the maximum sample size slightly, from 628 to 641
subjects. However it results in a smaller expected sample size under H1 . Observe that
the expected sample size is only 533 subjects under the alternative hypothesis
(πc = 0.125, ρ = 0.72), and still ensures the power is 80%.

24.3.2 Trial Simulation

You can simulate Design4 by selecting Design4 in the Library and clicking on the
icon. Try different choices for the simulation parameters to verify the operating
characteristics of the study. For instance, under the Response Generation Info tab set
Prop. Under Control to 0.125 and Prop. Under Treatment to 0.72 × 0.125 = 0.09.

Click the Simulate button. Once the simulation run has completed, East will add an
additional row to the Output Preview labeled Simulation1. Select Simulation1 in the
Output Preview. Note that some of the design details will be displayed in the upper
pane, labeled Compare Designs. Click the
icon to save it to the Library.
Double-click on Simulation1 in the Library. The simulation output details will be

displayed.

We simulated the data under the alternative hypothesis and should achieve a rejection
rate of 80%. This is confirmed above (up to Monte Carlo accuracy).
Next, we simulate under the null hypothesis. Edit the Sim1 node by clicking the
icon and under the Response Generation Info tab, set Prop. Under Treatment to

1.3 × 0.125 = 0.1625. Click the Simulate button.

This time the rejection rate is only 5% (up to Monte Carlo accuracy), as we would
expect under the null hypothesis. You may experiment in this manner with different
values of πc and πt and observe the rejection rates look by look as well as averaged
over all looks.

24.3.3 Interim Monitoring

Select Design4 in the Library, and click the icon from the Library toolbar.
Alternatively, right-click on Design4 and select Interim Monitoring. The interim
monitoring dashboard contains various controls for monitoring the trial, and is divided
into two sections. The top section contains several columns for displaying output
values based on the interim inputs. The bottom section contains four charts, each with
a corresponding table to its right. These charts provide graphical and numerical
descriptions of the progress of the clinical trial and are useful tools for decision making
by a data monitoring committee.
Suppose that the trial is first monitored after accruing 125 subjects on each treatment
arm, with 15 responses on the control arm and 13 responses on the treatment arm.
Click on the
icon to invoke the Test Statistic Calculator. In the
box next to Cumulative Sample Size enter 250. Enter −0.052 in the box next to
Estimate of δ. In the box next to Std. Error of δ enter 0.046617. Next click Recalc.
The test statistic is computed to be -1.115. This value for the test statistic was obtained
by substituting the observed sample sizes and responses into equation (24.10).
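For readers who wish to verify these inputs directly, the short sketch below (our own illustration; the variable names are not East's) evaluates (24.10) from the observed counts and reproduces the estimate, standard error, and test statistic quoted above.

```python
from math import sqrt

# First interim look: 125 subjects per arm, 15 control and 13 treatment responses
n_c, x_c, n_t, x_t, rho0 = 125, 15, 125, 13, 1.3
pi_c, pi_t = x_c/n_c, x_t/n_t                       # 0.120 and 0.104
delta_hat = pi_t - rho0*pi_c                        # Estimate of delta: -0.052
se = sqrt(pi_t*(1 - pi_t)/n_t + rho0**2 * pi_c*(1 - pi_c)/n_c)
print(round(delta_hat, 4), round(se, 6), round(delta_hat/se, 3))
# -0.052 0.046617 -1.115
```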

Upon clicking the OK button, East will produce the interim monitoring report shown
below.

The stopping boundary for declaring non-inferiority is -2.929 whereas the value of the
test statistic is only -1.115. Thus the trial should continue. This conclusion is also
supported by the upper confidence bound on δ = πt − ρ0 πc , which at present equals 0.085. A necessary and sufficient condition for the stopping
boundary to be crossed, and non-inferiority demonstrated thereby, is for this upper
confidence bound to be less than zero.
Suppose that the next interim look occurs after accruing 250 patients on each arm with
31 responses on the control arm and 22 responses on the treatment arm. Click on the
second row in the table in the upper section. Then click the
icon.
In the box next to Cumulative Sample Size enter 500. Enter −0.0732 in the box next
to Estimate of δ. In the box next to Std. Error of δ enter 0.032486. Next click
Recalc. Notice that the test statistic is computed to be -2.253.

Click the OK button. This time the stopping boundary for declaring non-inferiority is
crossed. The following message box appears.

Click the Stop button to stop the study. The analysis results are shown below. Notice
that the upper confidence bound of the repeated confidence interval for δ excludes zero.

In the Final Inference Table in the bottom portion of the IM worksheet, East also
provides a p-value, confidence interval and median unbiased point estimate for δ using
stage-wise ordering of the sample space as described in Jennison and Turnbull (2000,
page 179). The upper confidence bound for δ based on the stage-wise method likewise
excludes zero.

24.4 Odds Ratio Test

Let πt and πc denote the two binomial probabilities associated with the treatment (t)
and the control (c). Let the difference between the two treatment arms be captured by
the odds ratio

ψ = [πt /(1 − πt)] / [πc /(1 − πc)] = πt(1 − πc) / [πc(1 − πt)] .
The null hypothesis is specified as

H0 : ψ = ψ0

and is tested against one-sided alternative hypotheses. If the occurrence of a response denotes patient benefit rather than harm, then ψ0 < 1 and the alternative hypothesis is

H1 : ψ > ψ0 .

Conversely, if the occurrence of a response denotes patient harm rather than benefit, then ψ0 > 1 and the alternative hypothesis is

H1 : ψ < ψ0 .
For any given πc , the sample size is determined by the desired power at a specified
value ψ = ψ1 . A common choice is ψ1 = 1 (or equivalently πt = πc ), but East permits
you to power the study at any value of ψ1 which is consistent with the choice of H1 .
Let π̂tj and π̂cj denote the estimates of πt and πc based on ntj and ncj observations
from the experimental and control treatments, respectively, up to and including the j-th look, j = 1, . . . , K, where a maximum of K looks are to be made. It is convenient to express the treatment effect on the logarithmic scale as

δ = ln ψ .    (24.12)

The test statistic at the j-th look is then defined as

Zj = (δ̂j − δ0) / se(δ̂j) = (ln(ψ̂j) − ln(ψ0)) / √[ 1/(ntj π̂tj(1 − π̂tj)) + 1/(ncj π̂cj(1 − π̂cj)) ] .    (24.13)
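As an illustration (not East's internal code), the sketch below evaluates δ̂ = ln(ψ̂) and its standard error per (24.13) from raw counts; the helper name is ours. Applied to the first interim look of Section 24.4.3, it agrees with the worksheet there up to the rounding of the hand-entered estimate of δ.

```python
from math import log, sqrt

def or_delta_se(x_t, n_t, x_c, n_c):
    """delta_hat = ln(psi_hat) and its standard error per (24.13)."""
    p_t, p_c = x_t/n_t, x_c/n_c
    delta_hat = log(p_t/(1 - p_t)) - log(p_c/(1 - p_c))
    se = sqrt(1/(n_t*p_t*(1 - p_t)) + 1/(n_c*p_c*(1 - p_c)))
    return delta_hat, se

# First interim look of Section 24.4.3: 60 per arm, 52 vs. 50 responses
d, se = or_delta_se(52, 60, 50, 60)
print(round(d, 4), round(se, 6), round((d - log(4/9))/se, 3))
# 0.2624 0.514034 2.088  (the worksheet reports 2.092 from its rounded entries)
```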

24.4.1 Trial Design

Suppose that the response rate for the control treatment is 90%, where higher response
rates imply patient benefit. Assume that a claim of non-inferiority can be sustained if
we can demonstrate statistically that the experimental treatment has a response rate of
at least 80%. In other words the non-inferiority margin is
ψ0 = [0.8(1 − 0.9)] / [0.9(1 − 0.8)] = 0.444 .

The null hypothesis H0 : ψ = 0.444 is to be tested against the one-sided alternative
H1 : ψ > 0.444. Suppose that we want to determine the sample size required to have
power of 90% when πc = 0.9 and ψ1 = 1, i.e. πc = πt , using a test with a type-1 error
rate of 0.05.
Single-Look Design Powered at ψ = 1 First we consider a study with only one
look and equal sample sizes in the two groups. To begin click Two Proportions on the
Design tab, and then click Odds Ratio of Proportions.

In the ensuing dialog box, next to Trial, select Noninferiority from the drop
down menu. Choose the remaining design parameters as shown below.

Now click Compute. The design is shown as a row in the Output Preview located in
the lower pane of this window. This single-look design requires a combined total of
579 subjects from both treatments in order to attain 90% power.
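This total can be cross-checked against the normal approximation to ln(ψ̂) with the variance in (24.13). The sketch below is our own illustration under that assumption (ψ0 is entered as the exact fraction 4/9); it reproduces the 579 total, and the 358 total for the design powered at ψ1 = 1.333 later in this section.

```python
from math import ceil, log
from scipy.stats import norm

def or_total_n(pi_c, psi0, psi1, alpha, power):
    """Approximate total sample size (equal allocation) for the one-sided test
    of H0: psi = psi0 powered at psi = psi1, using the normal approximation
    to ln(psi_hat) with the variance in (24.13) (our assumption)."""
    odds_t = psi1 * pi_c / (1 - pi_c)                  # treatment odds under H1
    pi_t = odds_t / (1 + odds_t)
    var1 = 1/(pi_t*(1 - pi_t)) + 1/(pi_c*(1 - pi_c))   # n x Var(ln psi_hat), n per arm
    z = norm.ppf(1 - alpha) + norm.ppf(power)
    return ceil(2 * z**2 * var1 / (log(psi1) - log(psi0))**2)

print(or_total_n(0.9, 4/9, 1.0, 0.05, 0.90))   # 579
print(or_total_n(0.9, 4/9, 4/3, 0.05, 0.90))   # 358 (see Design3 below)
```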

You can select this design by clicking anywhere along the row in the Output Preview.
Some of the design details will be displayed in the upper pane, labeled Compare
Designs. In the Output Preview toolbar, click the
icon to save this design to
Workbook1 in the Library. If you hover the cursor over Design1 in the Library, a
tooltip will appear that summarizes the input parameters of the design.

Three-Look Design Powered at ψ = 1 For the above study, suppose we wish to
take up to two equally spaced interim looks and one final look at the accruing data,
using the default Lan-DeMets (O'Brien-Fleming) stopping boundary. Create a new
design by selecting Design1 in the Library, and clicking the
icon on the
Library toolbar. Change the Number of Looks from 1 to 3, to generate a study with
two interim looks and a final analysis. A new tab Boundary Info will appear. Clicking
on this tab will reveal the stopping boundary parameters. By default, the Spacing of
Looks is set to Equal, which means that the interim analyses will be equally spaced in
terms of the number of patients accrued between looks. The left side contains details
for the Efficacy boundary, and the right side contains details for the Futility boundary.
By default, there is an efficacy boundary (to reject H0) selected, but no futility
boundary (to reject H1). The Boundary Family specified is of the Spending
Functions type. The default Spending function is the Lan-DeMets (Lan &
DeMets, 1983), with Parameter OF (O’Brien-Fleming), which generates boundaries
that are very similar, though not identical, to the classical stopping boundaries of
O’Brien and Fleming (1979). Technical details of these stopping boundaries are
available in Appendix F.

Click the Compute button to generate output for Design2. With Design2 selected in
the Output Preview, click the
icon to save Design2 to the Library. In the
Library, select the rows for Design1 and Design2, by holding the Ctrl key, and then
click the icon. The upper pane will display the details of the two designs side-by-side:

Using three planned looks requires an up-front commitment of 590 subjects, a slight
inflation over the single-look design which required 579 subjects. However, the
three-look design may result in a smaller sample size than that required for the
single-look design, with an expected sample size of 457 subjects under the alternative
hypothesis (πc = 0.9, ψ = 1), and still ensures that the power is 90%.
Single-Look Design Powered at ψ ≠ 1  Suppose the new treatment is expected to be somewhat better than the control, but a superiority test is unnecessary and unrealistic. We determine the required sample size for ψ1 = 1.333, i.e. πt = 0.92308. Create a new design by selecting Design1 in the
Library, and clicking the
icon on the Library toolbar. In the ensuing dialog
box, change the design parameters as shown below.

Click the Compute button to generate output for Design3. With Design3 selected in
the Output Preview, click the

icon. In the Library, select the rows for

Design1, Design2, and Design3, by holding the Ctrl key, and then click the

icon. The upper pane will display the details of the three designs side-by-side:

We observe that a single-look design powered at ψ1 = 1.333 reduces the sample size considerably relative to the single-look design powered at ψ1 = 1. The reduction in maximum sample size is approximately 38% (= (579 − 358)/579). However, Design3 should be implemented only after careful consideration, since its favorable operating characteristics apply only in the optimistic situation where ψ = 1.333. If the true ψ is less than 1.333, the power under Design3 decreases and may be too small to establish noninferiority, even if the true ψ exceeds 1.
Three-Look Design Powered at ψ ≠ 1  For the above study (Design3), suppose we wish to take up to two equally spaced interim looks and one final look at the accruing data, using the default Lan-DeMets (O'Brien-Fleming) stopping boundary. Create a
new design by selecting Design3 in the Library, and clicking the
icon on the
Library toolbar. In the ensuing dialog box, change the Number of Looks to 3. Click
the Compute button to generate output for Design4.

Using three planned looks requires an up-front commitment of 365 subjects, a small
inflation over the single-look design which required 358 subjects. However, the
three-look design may result in a smaller sample size than that required for the
single-look design, with an expected sample size of 283 subjects under the alternative
hypothesis (πc = 0.9, ψ = 1.333), and still ensures that the power is 90%.

24.4.2 Trial Simulation

You can simulate Design4 by selecting Design4 in the Library and clicking on the
icon. Try different choices for the simulation parameters to verify the operating
characteristics of the study. First, we verify the results under the alternative hypothesis
at which the power is to be controlled, namely πc = 0.9 and πt = 0.92308. Under the
Response Generation Info tab set Prop. Under Control to 0.9 and Prop. Under
Treatment to 0.92308.

Click the Simulate button. Once the simulation run has completed, East will add an
additional row to the Output Preview labeled Simulation1. Select Simulation1 in the
Output Preview. Note that some of the design details will be displayed in the upper
pane, labeled Compare Designs. Click the
icon to save it to the Library.
Double-click on Simulation1 in the Library. The simulation output details will be

displayed.

We see here that the power is approximately 90%.
Now let's consider the impact if the sample size was determined assuming πc = 0.9 and ψ1 = 1.333 when the true values are πc = 0.9 and ψ = 1. Under the
Response Generation Info tab, set Prop. Under Treatment to 0.9. Click the Simulate button.

This results in a power of approximately 74%. Thus, if the optimistic assumption ψ1 = 1.333 is incorrect, the power to establish noninferiority drops to a possibly unacceptable level.

24.4.3 Interim Monitoring

Select Design4 in the Library, and click the
icon from the Library toolbar.
Alternatively, right-click on Design4 and select Interim Monitoring. The interim
monitoring dashboard contains various controls for monitoring the trial, and is divided
into two sections. The top section contains several columns for displaying output
values based on the interim inputs. The bottom section contains four charts, each with
a corresponding table to its right. These charts provide graphical and numerical
descriptions of the progress of the clinical trial and are useful tools for decision making
by a data monitoring committee.
Suppose that the trial is first monitored after accruing 60 subjects on each treatment
arm, with 50 responses on the control arm and 52 responses on the treatment arm.
Click on the
icon to invoke the Test Statistic Calculator. In the
box next to Cumulative Sample Size enter 120. Enter 0.264231 in the box next to
Estimate of δ. In the box next to Std. Error of δ enter 0.514034. Next click Recalc.
Notice that the test statistic is computed to be 2.092. This value for the test statistic was
obtained by substituting the observed sample sizes and responses into equation (24.13).

Upon clicking the OK button, East will produce the interim monitoring report shown
below.

Note: Click on the icon to hide or unhide the columns of interest.

The critical value is 3.22, and since the observed value of the test statistic (24.13) is
less than this value, the null hypothesis cannot be rejected. Therefore, noninferiority
cannot as yet be concluded.
Suppose that the second look is made after accruing 120 subjects on each treatment
arm, with 112 responses on the control arm and 115 responses on the treatment arm.
Click on the second row in the table in the upper section. Then click the
icon. In the box next to Cumulative Sample Size enter 240.
Enter 1.43848 in the box next to Estimate of δ. In the box next to Std. Error of δ
enter 0.801501. Next click Recalc. Notice that the test statistic is computed to be
2.808. This value for the test statistic was obtained by substituting the observed sample
sizes and responses into equation (24.13).

Click the OK button. This time the stopping boundary for declaring non-inferiority is

crossed. The following message box appears.

Click the Stop button to stop the study. The analysis results are shown below.

The null hypothesis is rejected and we conclude that the treatment is noninferior to the
control. In the Final Inference Table in the bottom portion of the IM worksheet, East
also provides a stage-wise adjusted p-value, median unbiased point estimate and
confidence interval for ψ as described in Jennison and Turnbull (2000) and in
Appendix C of the East user manual. In the present example the adjusted p-value is
0.003, the point estimate for ψ is exp(1.427) = 4.166 and the upper 95% confidence
bound for ψ is exp(0.098) = 1.103.

25 Binomial Equivalence Two-Sample

25.1 Equivalence Test

In some experimental situations, it is desired to show that the response rates for the
control and the experimental treatments are "close", where "close" is defined prior to
the collection of any data. Examples of this include showing that an aggressive therapy
yields a similar rate of a specified adverse event to the established control, such as the
bleeding rates associated with thrombolytic therapy or cardiac outcomes with a new
stent. Let πc and πt denote the response rates for the control and the experimental
treatments, respectively, and let π̂t and π̂c denote the estimates of πt and πc based on
nt and nc observations from the experimental and control treatments. Furthermore, let
δ = πt − πc ,    (25.1)

which is estimated by

δ̂ = π̂t − π̂c .    (25.2)
Finally, let the variance of δ̂ be

σ² = πc(1 − πc)/nc + πt(1 − πt)/nt ,    (25.3)

which is estimated by

σ̂² = π̂c(1 − π̂c)/nc + π̂t(1 − π̂t)/nt .    (25.4)

The null hypothesis H0 : |πt − πc| = δ0 is tested against the two-sided alternative hypothesis H1 : |πt − πc| < δ0 , where δ0 (> 0) is specified to define equivalence. Following Machin and Campbell (1987), we present the solution to this problem as a one-sided α-level test. The decision rule is to declare equivalence if

−δ0 + zα σ̂ ≤ π̂t − π̂c ≤ δ0 − zα σ̂ .    (25.5)

We see that decision rule (25.5) is the same as declaring equivalence if the (1 − 2α)100% confidence interval for πt − πc is entirely contained within the interval (−δ0 , δ0 ). The power or sample size is determined for a single-look study only; the extension to multiple looks is given in the next section. The sample size, or power, is determined at a specified difference πt − πc , denoted δ1 , where −δ0 < δ1 < δ0 . The probability of declaring equivalence depends on the true values of πc and πt . Based on the results of Machin and Campbell (1987), the required total sample size N is, for nt = rN and nc = (1 − r)N ,


N = [(zα + zβ)² / (δ0 − δ1)²] [ πc(1 − πc)/(1 − r) + (πc + δ1)(1 − (πc + δ1))/r ] .    (25.6)
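Decision rule (25.5) is easy to check by simulation. The following Monte Carlo sketch is our own illustration, assuming equal allocation; applied to the 1203-subject design derived below (πc = πt = 0.20, δ0 = 0.075, α = 0.025), it returns a probability of declaring equivalence of approximately 0.80.

```python
import numpy as np
from scipy.stats import norm

def equiv_power(n_per_arm, pi_c, pi_t, delta0, alpha, reps=200_000, seed=1):
    """Monte Carlo estimate of the probability of declaring equivalence
    under decision rule (25.5), assuming equal allocation."""
    rng = np.random.default_rng(seed)
    z = norm.ppf(1 - alpha)
    pc = rng.binomial(n_per_arm, pi_c, reps) / n_per_arm
    pt = rng.binomial(n_per_arm, pi_t, reps) / n_per_arm
    d = pt - pc
    se = np.sqrt(pc*(1 - pc)/n_per_arm + pt*(1 - pt)/n_per_arm)
    return np.mean((-delta0 + z*se <= d) & (d <= delta0 - z*se))

print(equiv_power(602, 0.20, 0.20, 0.075, 0.025))   # ~0.80
```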

25.1.1 Trial Design

Consider the development of a new stent which is to be compared to the standard stent
with respect to target vessel failure (acute failure, target vessel revascularization,
myocardial infarction, or death) after one year. The standard stent has an assumed
target vessel failure rate of 20%. Equivalence is defined as δ0 = 0.075. The sample
size is to be determined with α = 0.025 (one-sided) and power, i.e. probability of
declaring equivalence, of 1 − β = 0.80.
To begin click Two Samples on the Design tab, and then click Difference of
Proportions.

Suppose that we want to determine the sample size required to have power of 80%
when δ1 = 0. Enter the relevant parameters into the dialog box as shown below. In the
drop down box next to Trial Type be sure to select Equivalence.

Click on the Compute button. The design is shown as a row in the Output Preview
located in the lower pane of this window. The sample size required in order to achieve
the desired 80% power is 1203 subjects.

You can select this design by clicking anywhere along the row in the Output Preview.
If you double click anywhere along the row in the Output Preview some of the design
details will be displayed in the upper pane, labeled Output Summary.

In the Output Preview toolbar, click the
icon to save this design to Workbook1
in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear
that summarizes the input parameters of the design.
If the assumed difference δ1 is not zero, it is more difficult to establish equivalence, in
the sense that the power is lower and thus the required sample size is larger. Consider
δ1 = 0.025, so that the new stent increases the rate to 22.5%. Create a new design
Des2 by selecting Des1 in the Library, and clicking the
icon on the Library
toolbar. Change the value of Expected Diff. from 0 to 0.025 as shown below.

Click on the Compute button. The design is shown as a row in the Output Preview located in the lower pane of this window. With Des2 selected in the Output Preview, click the icon. In the Library, select the rows for Des1 and Des2 by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

This single-look design requires a combined total of 2120 subjects from both
treatments in order to attain 80% power.
Consider δ1 = −0.025, so that the new stent decreases the rate to 17.5%. Create a new
design, as above, and change the value of Expected Diff. to −0.025. Click the
Compute button to generate the output for Des3. With Des3 selected in the Output Preview, click the icon. In the Library, select the nodes for Des1, Des2, and Des3 by holding the Ctrl key, and then click the icon. The upper pane will display the details of the three designs side-by-side:

Des3 yields a required total sample size of 1940 subjects. This asymmetry is due to the
fact that the variance is smaller for values of πc + δ1 further from 0.5.

25.1.2 Extension to Multiple Looks

Although the details presented in the previous section are related to a single-look
design only, these results can be used to extend the solution to allow for multiple
equally-spaced looks. We can use the General Design Module to generalize the solution to this problem to a study design with multiple looks. Details are given in Chapters 59 and 60.
Let π̂tj and π̂cj denote the estimates of πt and πc based on ntj and ncj observations
from the experimental and control treatments, respectively, up to and including the j-th
look, j = 1, . . . , K, where a maximum of K looks are to be used. Let nj = ncj + ntj and

δ̂j = π̂tj − π̂cj    (25.7)
denote the estimate of δ, given by (25.1), and let
σ̂j² = π̂cj(1 − π̂cj)/ncj + π̂tj(1 − π̂tj)/ntj    (25.8)
denote the estimate of σ², given by (25.3), using the data available at the j-th look.
At the j-th look, the inference is based on
Zj = δ̂j / σ̂j .    (25.9)

Let

η = δ √Imax ,

where Imax is described in Chapter 59. Let tj = nj /nmax , j = 1, . . . , K. Then, using the multivariate normal approximation to the distribution of Z1 , . . . , ZK , with the expected value of Zj equal to √tj η and the variance of Zj equal to 1, the (1 − α)100% repeated confidence intervals for η are

( (Zj + CLj)/√tj , (Zj + CUj)/√tj ) ,    (25.10)

where CLj and CUj are the values specified by the stopping boundary. The corresponding (1 − α)100% repeated confidence intervals for δ are

( δj + CLj , δj + CUj ) .    (25.11)

Using the General Design Module, East provides these repeated confidence intervals
for η. By considering decision rule (25.5) as declaring equivalence if the (1 − 2α)100% confidence interval for πt − πc is entirely contained within the interval (−δ0 , δ0 ), we generalize the decision rule to a multiple-look design by concluding equivalence and stopping the study the first time one of the repeated (1 − 2α)100% confidence intervals for η is entirely contained within the interval (−η0j , η0j ), where

η0j = δ0 / (√tj σ̂j) .
Consider Des1 (i.e. πc = 0.20, δ0 = 0.075, and δ1 = 0). As we saw above, a total of 1203 subjects is required for decision rule (25.5) to have an 80% probability of declaring equivalence, using a 95% confidence interval.

To begin, click Other Designs on the Design tab, and then click Sample Size-Based.

Enter the parameters as shown below. For the Sample Size for Fixed-Sample Study enter 1203, the value obtained from Des1. Also, be sure to set the Number of Looks to 5. Recall that the type-1 error specified here is twice the (one-sided) value specified for the single-look design. The General Design Module is designed for testing the null hypothesis H0′ : η = 0. Thus, the specified power of the test pertains to testing H0′ and is not directly related to the procedure using the confidence interval. The expected sample sizes under H0 and H1 depend on the specified value of the power and pertain to the null hypothesis H0′ and the corresponding alternative hypothesis H1′ : η ≠ 0, or a corresponding one-sided alternative. These expected sample sizes are not directly applicable to the equivalence problem of testing H0 against H1 .

Next click on the Boundary Info tab. The repeated confidence intervals for η depend
on the choice of spending function boundaries. The sample size for this group
sequential study also depends on the choice of the spending function, as well as the
choice of the power. Although the boundaries themselves are not used in the decision rule, the width of the repeated confidence intervals for η is determined by the choice of the spending function. Here we will use the Lan-DeMets (O'Brien-Fleming)

stopping boundary, with the looks spaced equally apart, as shown below.

Click Compute. With Des4 selected in the Output Preview, click the icon. In the Library, select the rows for Des1 and Des4, by holding the Ctrl key, and then click the icon. The upper pane will display the summary details of the two designs side-by-side:

We see that the extension of Des1 to a five-look design requires a commitment of 1233
subjects, a small inflation over the sample size of 1203 subjects required for Des1.

Select Des4 in the Library, and click the icon from the Library toolbar. Alternatively, right-click on Des4 and select Create IM Dashboard. This will
invoke the interim monitoring worksheet, from which the repeated 95% confidence
intervals will be provided.

The interim monitoring dashboard contains various controls for monitoring the trial,
and is divided into two sections. The top section contains several columns for
displaying output values based on the interim inputs. The bottom section contains four
charts, each with a corresponding table to its right. These charts provide graphical and
numerical descriptions of the progress of the clinical trial and are useful tools for
decision making by a data monitoring committee.
We want to perform up to five looks, as data become available for every 200 subjects. Suppose that, after 200 subjects, π̂cj = 18/100 = 0.18 and π̂tj = 20/100 = 0.2. Then, from (25.2) and (25.4), the estimates of δ and the standard error of δ̂ are 0.02 and 0.0555. Click on the icon to invoke the Test Statistic Calculator. Enter the appropriate values as shown below and click Recalc. Notice that the test statistic is

computed to be 0.357.

Next click OK. The following screen is shown.

The first repeated 95% confidence interval for η is (-12.628, 14.402). Since this
confidence interval is not contained in the interval (-3.357, 3.357), where
η01 = δ0 / (√t1 σ̂1) = 0.075 / (√0.162 × 0.0555) = 3.357,

we take a second look after 400 subjects. Click on the second row in the table in the
upper section. Then click the
icon to invoke the Test Statistic Calculator.
Suppose that π̂cj = 36/200 = 0.18 and π̂tj = 38/200 = 0.19. Then, from (25.2) and
(25.4), the estimates of δ and the standard error of δ̂ are 0.01 and 0.0388. Enter these

values as shown below and click on the Recalc button.

Click on the OK button and the following values are presented in the interim
monitoring worksheet.

The second repeated 95% confidence interval for η, (-6.159, 7.064), is not contained in the interval (-3.396, 3.396), where

η02 = δ0 / (√t2 σ̂2) = 0.075 / (√0.324 × 0.0388) = 3.396,

so we cannot conclude equivalence. The study continues, and we take a third look after
600 subjects. Click on the third row in the table in the upper section. Then click the
icon to invoke the Test Statistic Calculator. Suppose that
π̂cj = 51/300 = 0.17 and π̂tj = 60/300 = 0.2. Then, from (25.2) and (25.4), the
estimates of δ and the standard error of δ̂ are 0.03 and 0.0317. Enter these values as
shown below and click on the Recalc button. The following screen is shown.

Click on the OK button and the following values are presented in the interim
monitoring worksheet.

The third repeated 95% confidence interval for η, (-2.965, 5.679), is not contained in the interval (-3.390, 3.390), where

η03 = δ0 / (√t3 σ̂3) = 0.075 / (√0.487 × 0.0317) = 3.390,

so we cannot conclude equivalence. The study continues, and we take a fourth look after
850 subjects. Click on the fourth row in the table in the upper section. Then click the
icon to invoke the Test Statistic Calculator. Suppose that
π̂cj = 91/450 = 0.2022 and π̂tj = 88/450 = 0.1956. Then, from (25.2) and (25.4),
the estimates of δ and the standard error of δ̂ are -0.007 and 0.027. Enter these
values as shown below and click on the Recalc button. The following screen is shown.

Click on the OK button and the following values are presented in the interim
monitoring worksheet.

The fourth repeated 95% confidence interval, (-3.302, 2.678), is entirely contained in the interval (-3.346, 3.346), where

η04 = δ0 / (√t4 σ̂4) = 0.075 / (√0.689 × 0.027) = 3.346,

and thus we conclude that the two treatments are equivalent. To express the results in terms of δ, the final confidence interval for η can be transformed to a confidence interval for δ by multiplying the confidence limits by

√t4 σ̂4 = √0.689 × 0.027 = 0.0224,
resulting in a confidence interval for δ of (-0.074, 0.060), which is entirely contained
within the interval (-0.075, 0.075).
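The interim arithmetic above is easy to reproduce. The sketch below is our own illustration; it takes the per-look estimates σ̂j as given and recomputes the thresholds η0j and the final confidence interval for δ. Small differences from the text reflect the three-decimal rounding of the information fractions tj used there.

```python
from math import sqrt

n_max = 1233                                   # maximum sample size of the design
looks = [(200, 0.0555), (400, 0.0388),         # (cumulative n, sigma_hat_j)
         (600, 0.0317), (850, 0.027)]
delta0 = 0.075

for j, (n_j, sig) in enumerate(looks, 1):
    t_j = n_j / n_max                          # information fraction at look j
    print(f"eta0{j} = {delta0/(sqrt(t_j)*sig):.3f}")
# 3.355, 3.394, 3.392, 3.346 vs. the text's 3.357, 3.396, 3.390, 3.346

# transform the final repeated CI for eta to the delta scale
scale = sqrt(850/n_max) * 0.027
print(f"({-3.302*scale:.3f}, {2.678*scale:.3f})")  # (-0.074, 0.060)
```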

26 Binomial Superiority n-Sample

26.1 Chi-Square for Specified Proportions in C Categories

Let π0i and π1i , i = 1, 2, ..., C, denote the response proportions under the null and alternative hypotheses, respectively, where C denotes the number of categories. The null hypothesis states that the observed frequencies follow a multinomial distribution with the null proportions as probabilities. The test is performed against a two-sided alternative only. The sample size, or power, is determined for a specified value of the proportions consistent with the alternative hypothesis, denoted by π1i .

Table 26.1: Contingency Table

Categories \ Response    Cured    Not Cured
Age Group A              n11      n21
Age Group B              n12      n22
Age Group C              n13      n23
Marginal                 n1.      n2.

The null hypothesis is H0 : πi = π0i , i = 1, 2, ..., C, and is tested against a two-sided alternative. The test statistic is given as

χ² = Σ_{i} (n1i − µi)² / µi ,    (26.1)

where µi = n1. π0i . Let χ²0 be the observed value of χ². For large samples, χ² has an approximately chi-squared distribution with C − 1 degrees of freedom. The p-value is approximated by P(χ²_{C−1} ≥ χ²0), where χ²_{C−1} denotes a chi-squared random variable with C − 1 degrees of freedom.

26.1.1 Trial Design

Consider the design of a single-arm trial with a binary response, Cured or Not Cured. The proportions cured in three categories, Age Group A, Age Group B and Age Group C, are of interest. We wish to determine whether the proportions cured in the three age groups are 0.25, 0.25, and 0.50, respectively. Thus it is desired to test H0 : πA = 0.25, πB = 0.25, πC = 0.50. We wish to design the trial with a two-sided
test that achieves 90% power at H1 : πA = 0.3, πB = 0.4, πC = 0.3 at level of
significance 0.05.
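By hand, this design can be sized with the noncentral chi-square approximation; we assume, without access to East's internals, that a comparable approximation is used, since the sketch below (our own illustration) reproduces the total of 71 subjects reported next.

```python
from math import ceil
from scipy.optimize import brentq
from scipy.stats import chi2, ncx2

def gof_total_n(p0, p1, alpha, power):
    """Sample size for the chi-square test (26.1) via the noncentral
    chi-square approximation (our assumption about the method)."""
    df = len(p0) - 1
    lam1 = sum((b - a)**2 / a for a, b in zip(p0, p1))  # noncentrality per subject
    crit = chi2.ppf(1 - alpha, df)
    return ceil(brentq(lambda n: ncx2.sf(crit, df, n*lam1) - power, 1, 1e6))

print(gof_total_n([0.25, 0.25, 0.50], [0.30, 0.40, 0.30], 0.05, 0.90))   # 71
```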
Start East. Click the Design tab, then click Many Samples in the Discrete group, and then click Chi-Square Test of Specified Proportions in C Categories.
In the upper pane of this window is the Input dialog box, which displays default input
values.
Enter the Number of Categories (C) as 3. Under Table of Proportion of Response,
enter the values of proportions under Null Hypothesis and Alternative
Hypothesis for each category except the last one such that the sum of values in a
row equals to 1. Enter the inputs as shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed sample
size highlighted in yellow. A total of 71 subjects must be enrolled in order to achieve
90% power under the alternative hypothesis. Besides sample size, one can also
compute the power and the level of significance for this Chi-Square Test of Specified
Proportions in C Categories study design.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to workbook Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear
that summarizes the input parameters of the design.

With Des1 selected in the Library, click
icon on the Library toolbar, and then
click Power vs. Sample Size. The resulting power curve for this design is shown. You
can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking
Save As.... For now, you may close the chart before continuing.

26.2 Two-Group Chi-square for Proportions in C Categories

Let π1j and π2j denote the response proportions of group 1 and group 2 respectively
for the j-th category, where j = 1, 2, ..., C.
The null hypothesis H0 : π1j = π2j ∀j = 1, 2, ..., C is tested against the alternative
hypothesis that for at least one j, π1j differs from π2j .

Table 26.2: Contingency Table

Categories \ Groups    Group 1    Group 2    Marginal
A                      n11        n21        n01
B                      n12        n22        n02
C                      n13        n23        n03
Marginal               n10        n20        n

The test statistic is given as

χ² = Σ_{ij} (nij − µij)² / µij ,    (26.2)

where µij = n0j ni0 / n , j = 1, 2, ..., C and i = 1, 2. Let χ²0 be the observed value of χ². For large samples, χ² has an approximately chi-squared distribution with C − 1 degrees of freedom. The p-value is approximated by P(χ²_{C−1} ≥ χ²0), where χ²_{C−1} denotes a chi-squared random variable with C − 1 degrees of freedom.

26.2.1 Trial Design

Suppose researchers want to investigate the relationship between different dose levels (level 1, level 2 and level 3) of a drug and the type of adverse event (serious or not serious). The proportions of patients at the different dose levels will be compared using a Chi-square test. Suppose the expected proportions of patients at the three dose levels are 0.30, 0.35 and 0.35 among patients with no serious adverse events, and 0.20, 0.30 and 0.50 among patients with serious adverse events. We wish to design the trial with a two-sided test that achieves 90% power at level of significance 0.05.
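As with the one-sample test of Section 26.1, this design can be cross-checked with the noncentral chi-square approximation. The sketch below is our own illustration, assuming equal allocation to the two groups; it reproduces the total of 503 subjects reported next.

```python
from math import ceil
from scipy.optimize import brentq
from scipy.stats import chi2, ncx2

def two_group_chisq_n(p1, p2, alpha, power, r=0.5):
    """Total sample size for the two-group chi-square test (26.2) via the
    noncentral chi-square approximation; r is the fraction allocated to
    group 1 (equal allocation by default; our assumption)."""
    df = len(p1) - 1
    pbar = [r*a + (1 - r)*b for a, b in zip(p1, p2)]      # pooled cell rates
    lam1 = sum(r*(a - m)**2/m + (1 - r)*(b - m)**2/m      # noncentrality per subject
               for a, b, m in zip(p1, p2, pbar))
    crit = chi2.ppf(1 - alpha, df)
    return ceil(brentq(lambda n: ncx2.sf(crit, df, n*lam1) - power, 1, 1e6))

print(two_group_chisq_n([0.30, 0.35, 0.35], [0.20, 0.30, 0.50], 0.05, 0.90))  # 503
```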
Start East. Click the Design tab, then click Many Samples in the Discrete group, and then click Two-Group Chi-square for Proportions in C Categories.
The Input dialog box, with default input values will appear in the upper pane.
Enter the Number of Categories (C) as 3. Under Table of Proportion of Response,
enter the values of proportions under Control and Treatment for each category
except the last one, such that the sum of values in a row equals 1. Enter the inputs as
shown below and click Compute.


The design output will be displayed in the Output Preview, with the computed sample
size highlighted in yellow. A total of 503 subjects must be enrolled in order to achieve
90% power under the alternative hypothesis. Besides sample size, one can also
compute the power and the level of significance for this Two-Group Chi-square for Proportions in C Categories study design.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.


With Des1 selected in the Library, click
icon on the Library toolbar, and then
click Power vs. Sample Size. The resulting power curve for this design is shown. You
can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking
Save As.... For now, you may close the chart before continuing.

26.3 Nonparametric: Wilcoxon Rank Sum for Ordered Categorical Data
When we compare two treatments with respect to signs and symptoms associated with
a disease, we may base the comparison on a variable that assesses degree of response
or the degree of severity, using an ordinal categorical variable. For example,
investigators may report the severity of an adverse event, or other abnormality, using a specified grading system or a simple scale, such as "none", "mild", "moderate", or "severe". The latter rating scale might be used in an analgesia study to report the severity of pain. Although this four-point scale is often used and intuitively appealing, additional categories, such as "very mild" and "very severe", may be added.
situations, the efficacy of the treatment is best assessed by the subject reporting
response to therapy using a similar scale. The Wilcoxon test for ordered categories is a
nonparametric test for use in such situations. East provides the power for a specified
sample size for a single-look design using the constant proportional odds ratio model.
Let πcj and πtj denote the probabilities for category j, j = 1, 2, ..., J, for the control c and the treatment t, respectively. Let γci = Σ_{j=1}^{i} πcj and γti = Σ_{j=1}^{i} πtj . We assume that

γci /(1 − γci) = e^ψ γti /(1 − γti) ,    i = 1, 2, ..., J − 1,

or, equivalently,

ψ = ln(γci) − ln(1 − γci) − [ln(γti) − ln(1 − γti)] .    (26.3)

We compare the two distributions by focusing on the parameter ψ. Thus we test the null hypothesis H0 : ψ = 0 against the two-sided alternative H1 : ψ ≠ 0 or a one-sided alternative hypothesis H1 : ψ > 0. East requires the specified value of ψ to be positive. Technical details can be found in Rabbee et al. (2003).

26.3.1 Trial Design

We consider here a placebo-controlled parallel-group study where subjects report the response to treatment as "none", "slight", "considerable", or "total". We expect that most of the subjects in the placebo group will report no response.
Start East. Click the Design tab, then click Many Samples in the Discrete group, and then click Nonparametric: Wilcoxon Rank Sum for Ordered Categorical Data.
The Input dialog box, with default input values will appear in the upper pane.
We want to determine the power, using a two-sided test with a type-1 error rate of 0.05, with a total of 100 subjects and equal sample sizes for the two groups. Enter Number of Categories as 4. We will use User Specified for the Pop 1 probabilities and the Proportional Odds Model for the Pop 2 probabilities. Click the Proportional Odds Model radio button. A new field for Shift will appear. Enter 1.5 in this field. Based on the results of a pilot study, the values of 0.55, 0.3, 0.1, and 0.05 are used as Pop 1 probabilities. Enter the inputs as shown below and click Compute.
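Under our reading of the Shift field as ψ in (26.3), the population 2 probabilities implied by these inputs can be derived directly, as in the sketch below (the function name is ours).

```python
from math import exp, log

def pop2_probs(p1, shift):
    """Population 2 category probabilities implied by the proportional
    odds model (26.3), reading the Shift field as psi (assumption)."""
    expit = lambda x: 1.0 / (1.0 + exp(-x))
    logit = lambda p: log(p / (1.0 - p))
    cum1, prev, out = 0.0, 0.0, []
    for p in p1[:-1]:
        cum1 += p                            # cumulative control probability
        cum2 = expit(logit(cum1) - shift)    # shifted cumulative probability
        out.append(cum2 - prev)
        prev = cum2
    out.append(1.0 - prev)                   # last category takes the remainder
    return out

print([round(p, 3) for p in pop2_probs([0.55, 0.30, 0.10, 0.05], 1.5)])
# [0.214, 0.344, 0.251, 0.191]
```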

The design output will be displayed in the Output Preview, with the computed power
highlighted in yellow. This design results in a power of approximately 98% for a total
sample size of 100 subjects.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to workbook Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click
icon, on the Library toolbar, and then
click Power vs. Sample Size. The resulting power curve for this design is shown. You
can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking
Save As.... For now, you may close the chart before continuing.

With such high power, a total sample size of 100 subjects may be an inefficient use of resources; we may be willing to accept a lower power in exchange for a smaller sample size. Change the maximum sample size to 50 in the previous design. Leave all other values as defaults, and click Compute. This design results in approximately 80% power using a total sample size of 50 subjects.

26.4 Trend in R Ordered Binomial Proportions
In some experimental situations, there are several binomial distributions indexed by an
ordinal variable and we want to examine changes in the probabilities of success as the
levels of the indexing variable changes. Examples of this include the examination of a
dose-related presence of a response or a particular side effect, dose-related
tumorigenicity, or the presence of fetal malformations relative to levels of maternal
exposure to a particular toxin, such as alcohol, tobacco, or environmental factors.
The test for trend in R ordered proportions is based on the Cochran-Armitage trend test. Let πj denote the probability of interest for the j-th category of the ordinal variable, j = 1, 2, ..., R, and let the scores be denoted by ω1 , ω2 , ..., ωR . It is assumed that the odds ratio relating the j-th category to the (j − 1)-th category satisfies

πj /(1 − πj) = ψ^(ωj − ωj−1) πj−1 /(1 − πj−1)    (26.4)

or, equivalently,

ln[πj /(1 − πj)] = (ωj − ωj−1) ln(ψ) + ln[πj−1 /(1 − πj−1)] .    (26.5)

This assumption can also be expressed equivalently as a relationship between the odds ratio for the j-th category and that of the first category; namely,

πj /(1 − πj) = ψ^(ωj − ω1) π1 /(1 − π1)    (26.6)

or, equivalently,

ln[πj /(1 − πj)] = (ωj − ω1) ln(ψ) + ln[π1 /(1 − π1)] .    (26.7)

It is assumed that π1 < ... < πR with ψ > 1 or π1 > ... > πR with ψ < 1.
We want to test the null hypothesis H0 : ψ = 1 against the two-sided alternative H1 : ψ ≠ 1 or against a one-sided alternative H1 : ψ > 1 or H1 : ψ < 1. The sample size required to achieve a specified power, or the power for a specified sample size, is determined for a single-look design with the specified parameters. The sample size calculation is conducted using the methodology presented below, which is similar to that described in Nam (1987).
Let nj = rj N denote the sample size for the j-th category, where rj is the j-th sample fraction and N is the total sample size. The determination of the sample size required to control the power of the test of H0 is based on

W = Σ_{j=1}^{R} rj (ωj − ω̄) π̂j ,    (26.8)

with ω̄ = Σ_{j=1}^{R} rj ωj .

The expected value of W is

E(W) = Σ_{j=1}^{R} rj (ωj − ω̄) πj    (26.9)

and the variance of W is

V(W) = Σ_{j=1}^{R} rj (ωj − ω̄)² πj (1 − πj) .    (26.10)
The expected value of W under H0 is

E0(W) = π Σ_{j=1}^{R} rj (ωj − ω̄)    (26.11)

and the variance of W under H0 is

V0(W) = π(1 − π) Σ_{j=1}^{R} rj (ωj − ω̄)² ,    (26.12)

where

π = Σ_{j=1}^{R} rj πj .    (26.13)

The test statistic used to determine the sample size is

Z = [W − E0(W)] / √V0(W) .    (26.14)

The total sample size required for a two-sided test with a type-1 error rate of α to have power 1 − β when ψ = ψ1 is

N = [zα/2 √V0(W) + zβ √V(W)]² / E(W)² .    (26.15)

The total sample size required for a one-sided test with a type-1 error rate of α to have power 1 − β when ψ = ψ1 is determined from (26.15) with α/2 replaced by α.

26.4.1 Trial Design

Consider the problem of comparing three durations of therapy for a specific disorder.
We want to have sufficiently large power when 10% of subjects with shorter duration,
25% of subjects with intermediate duration and 50% of subjects with extensive
duration will respond by the end of therapy. These parameters result in an odds ratio of
ψ = 3, or equivalently ln(ψ) = 1.1. We would like to determine the sample size to
achieve 90% power when ln(ψ) = 1.1 based on a two-sided test at significance level
0.05.
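Formulas (26.8) through (26.15) are straightforward to implement. The sketch below is our own illustration, assuming equal sample fractions rj = 1/R and model-based probabilities from (26.6); it reproduces the total of 75 subjects reported next, and the score-rescaling point discussed later in this section.

```python
from math import ceil, exp, sqrt
from scipy.stats import norm

def trend_total_n(pi1, log_psi, scores, alpha, power):
    """Total sample size from (26.8)-(26.15) for the trend test, assuming
    equal sample fractions r_j = 1/R."""
    R = len(scores)
    r = 1.0 / R
    odds = [pi1/(1 - pi1) * exp(log_psi*(w - scores[0])) for w in scores]
    pis = [o/(1 + o) for o in odds]                  # pi_j via (26.6)
    wbar = r * sum(scores)
    EW = r * sum((w - wbar)*p for w, p in zip(scores, pis))
    VW = r * sum((w - wbar)**2 * p*(1 - p) for w, p in zip(scores, pis))
    pbar = r * sum(pis)
    V0 = pbar*(1 - pbar) * r * sum((w - wbar)**2 for w in scores)
    za, zb = norm.ppf(1 - alpha/2), norm.ppf(power)
    return ceil((za*sqrt(V0) + zb*sqrt(VW))**2 / EW**2)

print(trend_total_n(0.1, 1.1, [1, 2, 3], 0.05, 0.90))      # 75
print(trend_total_n(0.1, 0.11, [10, 20, 30], 0.05, 0.90))  # 75 (rescaled scores)
```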
Start East. Click the Design tab, then click Many Samples in the Discrete group, and then click Trend in R Ordered Binomial Proportions.
The Input dialog box, with default input values will appear in the upper pane.
Response probabilities can be specified in one of two ways under Response Probabilities: (1) User Specified Probabilities or (2) Model Based Probabilities. You can specify the probability for each population by choosing User Specified Probabilities, whereas Model Based Probabilities are based on the logit
transformation. We will use Model Based Probabilities here. Under Response Probabilities, click the Model Based Probabilities radio button. A new field for log of Common odds Ratio will appear. Enter 1.1 in this field. Enter 0.1 in the Prop. of Response field. One can also specify the Scores (W(i)), in monotonically increasing order. We will use Equally Spaced here. Enter the inputs as shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed sample
size highlighted in yellow. A total of 75 subjects must be enrolled in order to achieve
90% power under the alternative hypothesis. Besides sample size, one can also
compute the power and the level of significance for this design.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.


With Des1 selected in the Library, click
icon on the Library toolbar, and then
click Power vs. Sample Size. The resulting power curve for this design is
shown. You can export the chart in one of several image formats (e.g., Bitmap or
JPEG) by clicking Save As.... For now, you may close the chart before continuing.


The default specification of equally spaced scores is useful when the categories are ordinal but not numerical. If the categories are numerical, such as doses of a therapy, then the numerical values are more appropriate. Consider three doses of 10, 20, and 30. One must exhibit care in the specification of log(ψ) when the differences between scores for adjacent categories are equal but this common difference is not equal to one. Although the differences are equal, user-defined scores must be used. If the common difference is equal to a positive value A, then setting log(ψ) to 1/A of the value used for the default equally spaced scores, which have a common difference of one, will provide identical results. With three doses (Scores W(i)) of 10, 20, and 30 and log of Common odds Ratio = 0.11, the results are the same as those shown above. This is shown in the following screenshot.

The design output will be displayed in the Output Preview, with the computed sample
size highlighted in yellow. A total of 75 subjects must be enrolled in order to achieve
90% power under the alternative hypothesis when log(ψ) = 0.11 and π1 = 0.1.

Similarly, if the differences between scores for adjacent categories are not equal, user
defined scores must be used. Consider three doses of 10, 20, and 50, with log of
Common odds Ratio = 0.11. Change the scores (Scores W(i)) to 10, 20, and 50 in
the previous design. This is shown in the following screenshot.

The design output will be displayed in the Output Preview, with the computed sample
size highlighted in yellow. A total of 16 subjects must be enrolled in order to achieve
90% power under the alternative hypothesis when log(ψ) = 0.11 and π1 = 0.1.

Although a small sample size is usually desirable, here it results from a value of
π3 (= 0.90) that may be too large to be clinically meaningful. In that case, the power
should be controlled at a smaller value of log(ψ). Consider log(ψ) = 0.07. Change the
log of Common odds Ratio value to 0.07. This is shown in the following screenshot.


The design output will be displayed in the Output Preview, with the computed sample
size highlighted in yellow. A total of 37 subjects must be enrolled in order to achieve
90% power under the alternative hypothesis when log(ψ) = 0.07 and π1 = 0.1.

The trend test is particularly useful in situations where there are several categories.
Consider now an example of a dose-ranging study to examine the safety of a therapy
with respect to the occurrence of a specified adverse event (AE), such as a
dose-limiting toxicity (DLT). Six doses (1, 2, 4, 8, 12, 16) have been selected. It is
expected that approximately 5% of subjects on the lowest dose will experience the AE.
The study is to be designed to have power of 90% if approximately 20% on the highest
dose experience the AE. This suggests that the study should be designed with log(ψ)
approximately (log(0.20) − log(0.05))/15 = 0.092. Enter log of Common odds
Ratio as 0.1, Prop. of Response as 0.05 and Number of Populations
as 6. Enter the Scores W(i) as 1, 2, 4, 8, 12, and 16. Leave all other values at their
defaults, and click Compute.
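As a quick sanity check, the short sketch below (in Python, assuming the logit model
used by Model Based Probabilities, logit(πj) = logit(π1) + log(ψ)(wj − w1)) shows
the AE rate that log(ψ) = 0.1 implies at each dose.

    import math

    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    pi1, log_psi = 0.05, 0.1
    scores = [1, 2, 4, 8, 12, 16]
    base = math.log(pi1 / (1 - pi1))          # logit of the lowest-dose rate
    for w in scores:
        print(w, round(logistic(base + log_psi * (w - scores[0])), 3))

The implied rate at the highest dose comes out to roughly 0.19, consistent with the
20% AE rate the study is powered against.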


The design output will be displayed in the Output Preview, with the computed sample
size highlighted in yellow. A total of 405 subjects must be enrolled in order to achieve
90% power under the alternative hypothesis when log(ψ) = 0.1 and π1 = 0.05.

This sample size may not be economically feasible, so we instead select the sample
size to achieve a power of 80%. Setting Power (1-β) to 0.8 yields the result shown
in the following screenshot. This design requires a combined total of 298 subjects
from all groups to attain 80% power when log(ψ) = 0.1 and π1 = 0.05.


26.5 Chi-Square for R Unordered Binomial Proportions

Let πij denote the proportion of response in the i-th group and j-th category, with
i = 1, 2, ..., R and j = 1, 2, where R denotes the number of groups. The null
hypothesis of equality of proportions in all groups for every category is tested against
the alternative that at least one proportion is different.
The null hypothesis is defined as
H0 : πi1 = π0 for all i
and the alternative as
H1 : πi1 ≠ π0 for at least one i = 1, 2, ..., R

Table 26.3: R × 2 Contingency Table

Rows        Col 1   Col 2   Row Total
Row 1       n11     n12     m1
Row 2       n21     n22     m2
·           ·       ·       ·
·           ·       ·       ·
Row R       nR1     nR2     mR
Col Total   n1      n2      N

The test statistic is given as
$$\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{2} \frac{\left( n_{ij} - \frac{m_i n_j}{N} \right)^2}{\frac{m_i n_j}{N}} \qquad (26.16)$$

Let χ₀² be the observed value of χ². For large samples, χ² has approximately a
chi-squared distribution with d.f. R − 1. The p-value is approximated by
P(χ²_{R−1} ≥ χ₀²), where χ²_{R−1} denotes a chi-squared random variable with
d.f. = R − 1.

26.5.1 Trial Design

Consider a 3-arm trial with treatments A, B and C. The response is the reduction in
blood pressure (BP). From historical data it is known that the response rates of
treatment A, B and C are 37.5%, 59% and 40% respectively. That is, out of 40
individuals under treatment A, 15 had a reduction in BP, out of 68 individuals under
treatment B, 40 had a reduction in BP and out of 30 individuals under treatment C, 12
had a reduction in BP. Based on these data we can fill the entries in the table of
proportions.
Table 26.4: Proportion of Response

Groups\Categories   Reduction in BP   No Reduction   Marginal
Treatment A         0.375             0.625          1
Treatment B         0.59              0.41           1
Treatment C         0.4               0.6            1

This can be posed as a two-sided testing problem for testing
H0 : πA = πB = πC (= π0 , say) against H1 : πi ≠ π0 (for at least one i = A, B, C)
at the 0.05 level. We wish to determine the sample size to have 90% power for the
values displayed in the above table.
Start East. Click Design tab, then click Many Samples in the Discrete group, and then
click Chi-Square Test for Unordered Binomial Proportions.
The Input dialog box, with default input values, will appear in the upper pane.
Enter the value of Response Proportion for each group and the Alloc. Ratio
ri = ni/n1, the allocation weight of each group relative to the first group.
Enter the inputs as shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed sample
size highlighted in yellow. A total of 301 subjects must be enrolled in order to achieve
90% power under the alternative hypothesis. Besides sample size, one can also
compute the power and the level of significance for this Chi-Square test for R × 2
Table study design.
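The following rough check (in Python, using the usual noncentral chi-squared
approximation; East's internal algorithm and rounding may differ, so the result will
not match its 301 exactly) shows how a sample size of this magnitude arises.

    import numpy as np
    from scipy.stats import ncx2, chi2

    alpha, power_target = 0.05, 0.90
    pi = np.array([0.375, 0.59, 0.40])    # response rates for A, B, C
    r = np.array([1.0, 1.7, 0.75])        # allocation ratios n_i / n_1
    rho = r / r.sum()                     # fraction of N in each group

    pi_bar = np.sum(rho * pi)             # pooled response rate
    lam_per_n = np.sum(rho * (pi - pi_bar)**2) / (pi_bar * (1 - pi_bar))
    crit = chi2.ppf(1 - alpha, df=2)      # R - 1 = 2 degrees of freedom

    N = 2
    while ncx2.sf(crit, df=2, nc=N * lam_per_n) < power_target:
        N += 1
    print(N)                              # roughly 290-300, near East's 301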

You can select this design by clicking anywhere on the row in the Output Preview. If
you click the icon, some of the design details will be displayed in the upper pane. In
the Output Preview toolbar, click the icon to save this design to Wbk1 in the Library.
If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes
the input parameters of the design.

With Des1 selected in the Library, click the icon on the Library toolbar, and then
click Power vs Sample Size. The resulting power curve for this design is shown. You
can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking
Save As.... For now, you may close the chart before continuing.

26.6 Chi-Square for R Unordered Multinomial Proportions

Let πij denote the response proportion in the i-th group and j-th category. The null
hypothesis H0 : π1j = π2j = ... = πRj for all j = 1, 2, ..., C is tested against the
alternative hypothesis that, for at least one category, the response proportions in all
groups are not the same.
The test statistic is given as
$$\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{\left( n_{ij} - \frac{m_i n_j}{N} \right)^2}{\frac{m_i n_j}{N}} \qquad (26.17)$$
where the cell counts and margins are laid out as in the following table.

Table 26.5: R × C Contingency Table

Rows        Col 1   Col 2   ···   Col C   Row Total
Row 1       n11     n12     ···   n1C     m1
Row 2       n21     n22     ···   n2C     m2
·           ·       ·             ·       ·
·           ·       ·             ·       ·
Row R       nR1     nR2     ···   nRC     mR
Col Total   n1      n2      ···   nC      N

Let χ₀² be the observed value of χ². For large samples, χ² has approximately a
chi-squared distribution with d.f. (R − 1)(C − 1). The p-value is approximated by
P(χ²_{(R−1)(C−1)} ≥ χ₀²), where χ²_{(R−1)(C−1)} denotes a chi-squared random
variable with d.f. = (R − 1)(C − 1).

26.6.1 Trial Design

Consider a 3-arm oncology trial with treatments A, B and C. The responses in 4
categories - CR (complete response), PR (partial response), SD (stable disease) and PD
(disease progression) - are of interest. We wish to determine whether the response
proportion in each of the 4 categories is the same for the three treatments. From
historical data we get the following proportions for each category for the three
treatments. Out of 100 patients, 30 were treated with treatment A, 35 were treated with
treatment B and 35 were treated with treatment C. The response proportion
information for each treatment is given below. Assuming equal allocation to each
treatment arm, we wish to design a two-sided test which achieves 90% power at
significance level 0.05.

Table 26.6: Proportion of Response

Categories \ Treatment   Treatment A   Treatment B   Treatment C
CR                       0.019         0.158         0.128
PR                       0.001         0.145         0.006
SD                       0.328         0.154         0.003
PD                       0.652         0.543         0.863
Marginal                 1             1             1

Start East. Click Design tab, then click Many Samples in the Discrete group, and then
click Chi-Square R Unordered Multinomial Proportions.
The Input dialog box with default input values will appear in the upper pane of this
window.
Enter Number of Categories (C) as 4. Enter the values of Proportion of Response
and ri = ni/n1, the allocation weight of each group relative to the first group.
Enter the inputs as shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed sample
size highlighted in yellow. A total of 69 subjects must be enrolled in order to achieve
90% power under the alternative hypothesis. Besides sample size, one can also
compute the power and the level of significance for this Chi-Square Test of
Comparing Proportions in R by C Table study design.
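The same noncentral chi-squared approximation extends to the R × C case; the rough
Python check below (again, not necessarily East's exact algorithm) reproduces a
sample size close to 69 for the oncology table above.

    import numpy as np
    from scipy.stats import ncx2, chi2

    # rows: treatments A, B, C; columns: CR, PR, SD, PD
    P = np.array([[0.019, 0.001, 0.328, 0.652],
                  [0.158, 0.145, 0.154, 0.543],
                  [0.128, 0.006, 0.003, 0.863]])
    rho = np.array([1/3, 1/3, 1/3])          # equal allocation fractions

    p_bar = rho @ P                           # pooled category proportions
    lam_per_n = np.sum(rho[:, None] * (P - p_bar)**2 / p_bar)
    df = (P.shape[0] - 1) * (P.shape[1] - 1)  # (R - 1)(C - 1) = 6
    crit = chi2.ppf(0.95, df)

    N = 2
    while ncx2.sf(crit, df, N * lam_per_n) < 0.90:
        N += 1
    print(N)                                  # about 68, in line with East's 69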

You can select this design by clicking anywhere on the row in the Output Preview. If
you click the icon, some of the design details will be displayed in the upper pane. In
the Output Preview toolbar, click the icon to save this design to Wbk1 in the
Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that
summarizes the input parameters of the design.


With Des1 selected in the Library, click the icon on the Library toolbar, and then
click Power vs Sample Size. The resulting power curve for this design is shown. You
can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking
Save As.... For now, you may close the chart before continuing.


27 Multiple Comparison Procedures for Discrete Data

Sometimes multiple treatment arms are compared with a placebo or control arm in a
single trial on the basis of a primary endpoint that is binary. These objectives are
formulated as a family of hypotheses. Formal statistical hypothesis tests can be
performed to see if there is strong evidence to support clinical claims. The type I
error is inflated when one considers the inferences together as a family. Failure to
compensate for multiplicity can have adverse consequences. For example, a drug
could be approved when actually it is not better than placebo. Multiple comparison
(MC) procedures provide a guard against inflation of the type I error due to multiple
testing. The probability of making at least one type I error is known as the familywise
error rate (FWER). East supports the following MC procedures for binary endpoints.
Procedure               Reference
Bonferroni              Bonferroni CE (1935, 1936)
Sidak                   Sidak Z (1967)
Weighted Bonferroni     Benjamini Y and Hochberg Y (1997)
Holm's Step Down        Holm S (1979)
Hochberg's Step Up      Hochberg Y (1988)
Hommel's Step Up        Hommel G (1988)
Fixed Sequence          Westfall PH and Krishen A (2001)
Fallback                Wiens B, Dmitrienko A (2005)

In this chapter we explain how to design a study using an MC procedure.
In East, one can calculate the power from the simulated data under different MC
procedures. With this information on power, one can choose the MC procedure
that provides maximum power while strongly maintaining the FWER. The MC
procedures included in East strongly control the FWER. Strong control of the FWER
refers to preserving the probability of incorrectly rejecting at least one true null
hypothesis. To contrast strong control with weak control of the FWER, the latter
controls the FWER only under the assumption that all null hypotheses are true.

27.1 Bonferroni Procedure

The Bonferroni procedure is described below with an example.
Assume that there are k arms, including the control, where the treatment arms will be
compared with placebo on the basis of a binary response variable X. Let ni be the

number of subjects in the i-th arm (i = 0, 1, ..., k − 1), where arm 0 refers to the
control, and let N = Σ_{i=0}^{k−1} ni be the total sample size. Also, let πi be the
response probability in the i-th arm. We are interested in the following hypotheses:
For the right-tailed test: Hi : πi − π0 ≤ 0 vs Ki : πi − π0 > 0
For the left-tailed test: Hi : πi − π0 ≥ 0 vs Ki : πi − π0 < 0
The global null hypothesis is rejected if at least one of the Hi is rejected in favor of
Ki after controlling the FWER. Here Hi and Ki refer to the null and alternative
hypotheses, respectively, for the comparison of the i-th arm with the control arm.
Let π̂i be the sample proportion for treatment arm i and π̂0 be the sample proportion
for the control arm. For the unpooled variance case, the test statistic for comparing
the i-th arm with control (i.e., Hi vs Ki) is defined as
$$T_i = \frac{\hat{\pi}_i - \hat{\pi}_0}{\sqrt{\frac{1}{n_i}\,\hat{\pi}_i(1 - \hat{\pi}_i) + \frac{1}{n_0}\,\hat{\pi}_0(1 - \hat{\pi}_0)}}, \qquad i = 1, 2, \ldots, k-1 \qquad (27.1)$$
For the pooled variance case, one needs to replace π̂i and π̂0 by the pooled sample
proportion π̂, defined as
$$\hat{\pi} = \frac{n_i \hat{\pi}_i + n_0 \hat{\pi}_0}{n_i + n_0}, \qquad i = 1, 2, \ldots, k-1 \qquad (27.2)$$

Let ti be the observed value of Ti; the observed values for the k − 1 treatment arms
can be ordered as t(1) ≥ t(2) ≥ · · · ≥ t(k−1). For the right-tailed test, the marginal
p-value for comparing the i-th arm with placebo is calculated as
pi = P(Z > ti) = Φ(−ti), and for the left-tailed test pi = P(Z < ti) = Φ(ti), where Z is
distributed as standard normal and Φ(·) is the cumulative distribution function of a
standard normal variable. Let p(1) ≤ p(2) ≤ · · · ≤ p(k−1) be the ordered p-values.
East supports three single-step MC procedures for comparing proportions: the
Bonferroni procedure, the Sidak procedure and the weighted Bonferroni procedure.
For the Bonferroni procedure, Hi is rejected if pi < α/(k − 1), and the adjusted
p-value is given as min(1, (k − 1)pi).

27.1.1 Example: HIV Study

This is a randomized, double-blind, parallel-group, placebo-controlled, multi-center
study to assess the efficacy and safety of 125mg, 250 mg, and 500 mg orally twice
daily of a new drug for the treatment of HIV-associated diarrhea. The primary efficacy
endpoint is clinical response, defined as two or fewer watery bowel movements per
week, during at least two of the four weeks of the 4-week efficacy assessment period.
The efficacy will be evaluated by comparing the proportion of responders in the
placebo group to the proportion of responders in the three treatment groups at a
one-sided alpha of 0.025. The estimated response rate in placebo group is 35%. The
response rates in the treatment groups are expected to be 40% for 125mg, 45% for
250mg and 55% for 500 mg.
Dose (mg)   Estimated proportion
Placebo     0.35
125         0.40
250         0.45
500         0.55

With the above underlying scenario, we would like to calculate the power for a total
sample size of 500. This will be a balanced study with a one-sided 0.025 significance
level to detect at least one dose with significant difference from placebo. We will show
how to simulate the power of such a study using the multiple comparison procedures
listed above.
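Before walking through the interface, here is a minimal Monte Carlo sketch (in
Python; this is not East's simulation engine) of the Bonferroni calculation for this
scenario: four balanced arms of 125 subjects, right-tailed tests with unpooled
variance as in (27.1), and FWER 0.025 split equally over the 3 comparisons.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(5643)
    n = 125
    p = np.array([0.35, 0.40, 0.45, 0.55])   # placebo, 125 mg, 250 mg, 500 mg
    alpha, k1 = 0.025, 3                     # FWER and number of comparisons

    n_sim, reject_any, reject_all = 20000, 0, 0
    for _ in range(n_sim):
        x = rng.binomial(n, p)               # responders in each arm
        ph = x / n
        se = np.sqrt(ph[1:] * (1 - ph[1:]) / n + ph[0] * (1 - ph[0]) / n)
        z = (ph[1:] - ph[0]) / se            # unpooled test statistics
        pvals = norm.sf(z)                   # right-tailed marginal p-values
        rej = pvals < alpha / k1             # Bonferroni rule
        reject_any += rej.any()
        reject_all += rej.all()
    print(reject_any / n_sim, reject_all / n_sim)   # ~0.80 and ~0.04

The first number approximates the global/disjunctive power and the second the
conjunctive power discussed below.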
Designing the Study
Start East. Click Design tab, then click Many Samples in the Discrete group, and then
click Single Look under Multiple Pairwise Comparisons to Control - Differences
of Proportions.
This will launch a new window which asks the user to specify the values of a few
design parameters including the number of arms, overall type I error, total sample size
and multiple comparison procedure. For our example, we have 3 treatment groups plus
a placebo. So enter 4 for Number of Arms. Under the Test Parameters tab, there are
several fields which we will fill in. First, there is a box with the label Test Type. Here
you need to specify whether you want a one-sided or two-sided test. Currently, only
one-sided tests are available. The next dropdown box has the label Rejection Region.
If left tail is selected, the critical value for the test is located in the left tail of the
distribution of the test statistic. Likewise, if right tail is selected the critical value for
the test is located in the right tail of the distribution of the test statistic. For our
example, we will select Right Tail. Under that, there is a box with the label Type 1 Error (α). This is where you need to specify the FWER. For our example, enter
0.025. Now go to the box with the label Sample Size (n). Here we input the total
number of subjects, including those in the placebo arm. For this example, enter 500.
To the right, there will be a heading with the title Multiple Comparison Procedures.
Check the box next to Bonferroni, as this is the multiple comparison procedure we
are illustrating in this subsection. After entering these parameters your screen should
now look like this:

Now click on the Response Generation tab. You will see a table titled Table of
Proportions. In this table we can specify the labels for the treatment arms. You also
have to specify the dose levels if you want to generate proportions through a
dose-response curve.
There are two fields in this tab above the table. The first is labeled Variance and
has a drop-down list with two options - Pooled and Unpooled. Here you select
whether pooled or unpooled variance is used in the calculation of the test statistic
for each test. For this example, select Unpooled for Variance.

Next to Variance there is a check box labeled Generate Proportions Through DR
Curve. If you want to generate the response rate for each arm according to a dose-response
curve, you need to check this box. Check the box Generate Proportions Through
DR Curve. Once you check this box you will notice two things. First, an additional
column with label Dose will appear in the table. Here you need to enter the dose
levels for each arm. For this example, enter 0, 125, 250 and 500 for the Placebo,
Dose1, Dose2 and Dose3 arms, respectively. Secondly, you will notice that an
additional section will appear to the right, which provides the option to generate the
response rates from four families of parametric curves: Four Parameter Logistic,
Emax, Linear and Quadratic. The technical details about each curve can be found in
Appendix H. Here you need to choose the appropriate parametric curve from the
drop-down list under Dose Response Curve and then specify the parameters
associated with that curve. Suppose the response rate follows the four-parameter
logistic curve
$$E(\pi|D) = \beta + \frac{\delta}{1 + \exp\left(\frac{\theta - D}{\tau}\right)} \qquad (27.3)$$
where D indicates dose. The parameters of the logistic dose-response curve should be
chosen with care. We want to parameterize the above logistic model such that the
proportions from the logistic model agree as closely as possible with the estimated
proportions stated at the beginning of the example. We will consider a situation where
the response rate at dose 0 is very close to the parameter β. In other words, β
indicates the placebo effect. For this to hold, δ/(1 + exp((θ − D)/τ)) should be very
close to 0 at D = 0.
For now, assume that this holds; we will return to it later. We have assumed a 35%
response rate in the placebo arm. Therefore, we specify β as 0.35. The quantity β + δ
indicates the maximum response rate. Since the response rate cannot exceed 1, δ
should be chosen in such a way that β + δ ≤ 1. In situations where a 100% response
rate can never be achieved, δ would be even less. For this example, the response rate
for the highest dose of 500 mg is 55%. Therefore, we assume that the maximum
response rate achievable with the new drug is only 60%, and we specify δ as
0.60 − 0.35 = 0.25. The parameter θ indicates the median dose, which produces 50%
of the maximum improvement in response rate, i.e., a response equal to β + δ/2. With
β = 0.35 and δ = 0.25, β + δ/2 is 0.475. Note that we have assumed the dose 250 mg
provides a response rate of 45%. Therefore, we assume θ as 300. τ needs to be
selected in such a way that δ/(1 + exp((θ − D)/τ)) is very close to 0 at D = 0. We can
assure this condition by choosing any small value of τ. However, a very small τ
indicates a sharp improvement in response rate around the median dose and negligible
improvement at almost all other doses. In the HIV example, the estimated response
rates indicate improvement at all dose levels. With τ = 75, δ/(1 + exp(θ/τ)) is 0.0045
and the proportions from the logistic curve are close to the estimated proportions for
the chosen doses. Therefore, β = 0.35, δ = 0.25, θ = 300 and τ = 75 seem
reasonable for our example.
Select Four Parameter Logistic from the drop-down list under Dose Response
Curve. To the right of this drop-down box, we now need to specify the 4 parameter
values in the Parameters box. Enter 0.35 for β, 0.25 for δ, 250 for θ and 75 for τ. You
can verify that the values in the Response Rate column change to 0.359, 0.39, 0.475
and 0.591 for the four arms, respectively. These proportions are very close to the
estimated proportions stated at the beginning of the example.
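A quick evaluation of (27.3) at the entered parameter values confirms the displayed
rates; this small Python check assumes nothing beyond the formula above.

    import math

    beta, delta, theta, tau = 0.35, 0.25, 250.0, 75.0
    for D in [0, 125, 250, 500]:
        rate = beta + delta / (1 + math.exp((theta - D) / tau))
        print(D, round(rate, 3))    # 0.359, 0.39, 0.475, 0.591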

Now click Plot DR Curve located below the parameters to see the dose-response
curve.

You will see the logistic dose-response curve, which intersects the Y-axis at 0.359.
Close this plot. The response rates from the logistic curve are close, but not exactly
equal, to the estimated proportions stated at the beginning of the example. Therefore,
we will specify the estimated response rates directly in the Table of Proportions. In
order to do this, first uncheck Generate Proportions Through DR Curve. You will
notice two things. First, the column with label Dose will disappear from the table.
Second, the section on the right will disappear as well. Now enter the estimated
proportions in the Response Rate column: enter 0.35, 0.40, 0.45 and 0.55 in this
column. The Response Generation tab should now appear as below.

Click on the Include Options button located in the upper-right corner of the
Simulation window and check Randomized. This will add the Randomization tab.
Now click on the Randomization tab. The second column of the Table of Allocation
displays the allocation ratio of each treatment arm relative to the control arm. The
cell for the control arm is always one and is not editable. Only the cells for treatment
arms other than control need to be filled in. The default value for each treatment arm
is one, which represents a balanced design. For the HIV study example, we consider a
balanced design and leave the default values for the allocation ratios unchanged. Your
screen should now look like this:

The last tab is Simulation Control. Specify 10000 as Number of Simulations and
1000 as Refresh Frequency in this tab. The box labeled Random Number Seed is
where you can set the seed for the random number generator. You can either use the
clock as the seed or choose a fixed seed (in order to replicate past simulations). The
default is the clock and we will use that. The box beside that is labeled Output
Options. This is where you can choose to save summary statistics for each simulation
run and/or to save the subject-level data for a specific number of simulation runs. To
save the output for each simulation, check the box with label Save summary statistics
for every simulation run.

Click Simulate to start the simulation. Once the simulation run has completed, East
will add an additional row to the Output Preview labeled as Sim1.

Select Sim1 in the Output Preview and click the icon. Now double-click on Sim1 in
the Library. The simulation output details will be displayed in the right pane.

The first section of the output is the Hypothesis section. In our situation, we are
testing 3 hypotheses, comparing the estimated response rate of each dose group with
that of placebo. That is, we are testing:
H1 : π1 = π0 vs K1 : π1 > π0
H2 : π2 = π0 vs K2 : π2 > π0
H3 : π3 = π0 vs K3 : π3 > π0
Here, π0, π1, π2 and π3 represent the population response rates for the placebo, 125
mg, 250 mg and 500 mg dose groups, respectively. Also, Hi and Ki are the null and
alternative hypotheses, respectively, for the i-th test.
The Input Parameters section provides the design parameters that we specified
earlier. The next section, Overall Power, gives the estimated power based on the
simulation. The second line gives the global power, which is 0.807. Global power
indicates the power to reject the global null H0 : π1 = π2 = π3 = π0. Thus, the global
power of 0.807 indicates that 80.7% of the time the global null will be rejected. In
other words, at least one of H1, H2 and H3 is rejected on 80.7% of occasions. Global
power is useful to show the existence of dose-response relationship and the
dose-response may be claimed if any of the doses in the study is significantly different
from placebo.
The next line displays the conjunctive power. Conjunctive power indicates the
proportion of cases in the simulation where all the Hi's that are truly false were
rejected. In this example, all the Hi's are false. Therefore, for this example,
conjunctive power is the proportion of cases where all of H1, H2 and H3 were
rejected. For this simulation the conjunctive power is only 0.035, which means that
only 3.5% of the time were all of H1, H2 and H3 rejected.
Disjunctive power indicates the proportion of cases in which at least one of the truly
false Hi's is rejected. The main distinction between global and disjunctive power is
that the former counts any rejection, whereas the latter looks for rejections only
among those Hi's which are false. Since here all of H1, H2 and H3 are false, the
global and disjunctive power ought to be the same.
The next section gives the marginal power for each hypothesis. Marginal power is
the proportion of times a particular hypothesis is rejected. Based on the simulation
results, H1 is rejected about 6% of the time, H2 about 22% of the time and H3 about
80% of the time.
Recall that we have asked East to save the simulation results for each simulation run.
Open this file by clicking on SummaryStat in the library and you will see that it
contains 10,000 rows - each row represents the results of a single simulation. Find the
3 columns with labels Rej Flag 1, Rej Flag 2 and Rej Flag 3, respectively.
These columns represent the rejection status for H1, H2 and H3, respectively. A
value of 1 indicates rejection in that particular simulation; otherwise the null was not
rejected. The proportion of 1's in Rej Flag 1 is the marginal power to reject H1.
Similarly, we can find the marginal power for H2 and H3 from Rej Flag 2 and
Rej Flag 3, respectively. To obtain the global and disjunctive power, count the total
number of cases where at least one of H1, H2 and H3 has been rejected and then
divide by the total number of simulations, 10,000. Similarly, to obtain the conjunctive
power, count the total number of cases where all of H1, H2 and H3 have been
rejected and then divide by the total number of simulations, 10,000.
Next we will consider an example to show how global and disjunctive power can
differ from each other. Select Sim1 in the Library and click the icon. Now go to the
Response Generation tab and enter 0.35, 0.35, 0.38 and 0.42 in the 4 cells of the
second column, labeled Response Rate.

Here we are generating the response for placebo from the distribution Bin(125, 0.35),
for Dose1 from Bin(125, 0.35), for Dose2 from Bin(125, 0.38) and for Dose3 from
Bin(125, 0.42). Click Simulate to start the simulation. Once the simulation run has
completed, East will add an additional row to the Output Preview labeled as Sim 2.
For Sim 2, the global power and disjunctive power are close to 12%. To understand
why, click on SummaryStat in the library for Sim 2. The total number of cases where
at least one of H1, H2 and H3 is rejected is about 1270, and dividing this by the total
number of simulations, 10,000, gives the global power of 12.7%. Again, the total
number of cases where at least one of H2 and H3 is rejected is close to 1230, and
dividing this by the total number of simulations, 10,000, gives the disjunctive power
of 12.3%. The exact results of the simulations may differ slightly, depending on the
seed.
Now, delete Sim 2 from the Output Preview, because we modified the design of the
HIV example only to explain the difference between global power and disjunctive
power. In order to do this, select the row corresponding to Sim 2 in the Output
Preview and click the icon in the toolbar.
27.2 Weighted Bonferroni Procedure

In this section we will cover the weighted Bonferroni procedure with the same HIV
example.
For the weighted Bonferroni procedure, Hi is rejected if pi < wi α, and the adjusted
p-value is given as min(1, pi / wi). Here wi denotes the proportion of α allocated to
Hi, such that Σ_{i=1}^{k−1} wi = 1. Note that if wi = 1/(k − 1), the weighted
Bonferroni procedure reduces to the regular Bonferroni procedure.
Since the other design specifications remain the same, except that we are using the
weighted Bonferroni procedure in place of the Bonferroni procedure, we can set up the
simulation in this section with little effort. Select Sim1 in the Library and click the
icon. Now go to the Test Parameters tab. In the Multiple Comparison Procedures
box, uncheck the Bonferroni box and check the Weighted Bonferroni box.
Next click on the Response Generation tab and look at the Table of Proportions. You
will see that an additional column with label Proportion of Alpha has been added.
Here you specify the proportion of the total alpha you want to spend on each test.
Ideally, the values in this column should add up to 1; if not, East will normalize them
to add up to 1. By default, East distributes the total alpha equally among all tests.
Here we have 3 tests in total, therefore each test has a proportion of alpha of 1/3, or
0.333. You can specify other proportions as well. For this example, keep the equal

proportion of alpha for each test.

Now click Simulate to obtain the power. Once the simulation run has completed, East
will add an additional row to the Output Preview labeled as Sim 2.
The weighted Bonferroni MC procedure has global and disjunctive power of 81% and
conjunctive power of 3.4%. Note that the powers for the weighted Bonferroni
procedure are quite close to those for the Bonferroni procedure. This is because the
weighted Bonferroni procedure with equal proportions is equivalent to the simple
Bonferroni procedure; the difference in power between the Bonferroni test in the
previous section and the weighted Bonferroni test in this section is attributable to
simulation error. The exact results of the simulations may differ slightly, depending
on the seed. Now select Sim2 in the Output Preview and click the icon. This will
save Sim2 in Wbk1 in the Library.

27.3 Sidak Procedure

The Sidak procedure is described below using the same HIV example from
section 27.1. For the Sidak procedure, Hi is rejected if pi < 1 − (1 − α)^{1/(k−1)},
and the adjusted p-value is given as 1 − (1 − pi)^{k−1}.
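For illustration, the sketch below (in Python, with purely hypothetical raw p-values)
computes the adjusted p-values for the three single-step procedures exactly as defined
above, for k − 1 = 3 comparisons.

    import numpy as np

    p = np.array([0.030, 0.011, 0.004])   # hypothetical raw marginal p-values
    k1 = len(p)                           # k - 1 = 3 comparisons
    bonferroni = np.minimum(1, k1 * p)    # min(1, (k-1) p_i)
    sidak = 1 - (1 - p)**k1               # 1 - (1 - p_i)^(k-1)
    w = np.array([0.5, 0.3, 0.2])         # example weights summing to 1
    weighted = np.minimum(1, p / w)       # weighted Bonferroni: min(1, p_i / w_i)
    print(bonferroni, sidak, weighted)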
Select Sim1 in the Library and click the icon. Now go to the Test Parameters tab. In
the Multiple Comparison Procedures box, uncheck the Bonferroni box and check
the Sidak box.

Now click Simulate to obtain the power. Once the simulation run has completed, East
will add an additional row to the Output Preview labeled as Sim3.
The Sidak procedure has disjunctive and global power of 81% and conjunctive power
of 3.8%. The exact results of the simulations may differ slightly, depending on the
seed. Now select Sim3 in the Output Preview and click the icon. This will save Sim3
in Wbk1 in the Library.

27.4 Holm's Step-Down Procedure

In single-step MC procedures, the decision to reject any hypothesis does not depend
on the decision to reject other hypotheses. In stepwise procedures, on the other hand,
the decision for one hypothesis test can influence the decisions for the other tests.
There are two types of stepwise procedures: one type proceeds in a data-driven order,
the other in a fixed order set a priori. Stepwise tests in a data-driven order can
proceed in a step-down or step-up manner. East supports Holm's step-down MC
procedure, which starts with the most significant comparison and continues as long as
tests are significant, until the test for some hypothesis fails. The testing procedure
stops the first time a non-significant comparison occurs, and all remaining hypotheses
are retained. In the i-th step, H(i) is rejected if p(i) ≤ α/(k − i), and testing proceeds
to the next step.
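The rule is easy to state in code; the sketch below (Python, hypothetical p-values,
k = 4 arms so k − 1 = 3 hypotheses) applies Holm's thresholds α/(k − i) in order of
significance.

    import numpy as np

    p = np.array([0.030, 0.011, 0.004])    # hypothetical raw p-values
    alpha, k = 0.025, 4
    order = np.argsort(p)                  # most significant hypothesis first
    reject = np.zeros(len(p), dtype=bool)
    for step, idx in enumerate(order, start=1):
        if p[idx] <= alpha / (k - step):   # thresholds alpha/3, alpha/2, alpha
            reject[idx] = True
        else:
            break                          # stop; retain all remaining hypotheses
    print(reject)                          # here: the two smallest p-values are rejected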
As before, we will use the same HIV example to illustrate Holm's step-down
procedure. Select Sim1 in the Library and click the icon. Now go to the Test
Parameters tab. In the Multiple Comparison Procedures box, uncheck the
Bonferroni box and check the Holm's Step down box.

Now click Simulate to obtain the power. Once the simulation run has completed, East
will add an additional row to the Output Preview labeled as Sim4.
Holm's step-down procedure has global and disjunctive power close to 81% and
conjunctive power close to 9%. The exact results of the simulations may differ
slightly, depending on the seed. Now select Sim4 in the Output Preview and click the
icon. This will save Sim4 in Wbk1 in the Library.

27.5 Hochberg and Hommel Procedures

Step-up tests start with the least significant comparison and continue as long as tests
are not significant, until the first time a significant comparison occurs, at which point
all remaining hypotheses are rejected. East supports two such MC procedures - the
Hochberg step-up and Hommel step-up procedures. In the Hochberg step-up
procedure, in the i-th step, H(k−i) is retained if p(k−i) > α/i. In the Hommel step-up
procedure, in the i-th step, H(k−i) is retained if p(k−j) > ((i − j + 1)/i) α for
j = 1, · · · , i.
Fixed sequence tests and fallback tests are the types of tests which proceed in a
prespecified order.
Hochberg's and Hommel's step-up procedures are described below using the same
HIV example from section 27.1 on the Bonferroni procedure.
Since the other design specifications remain the same, except that we are using the
Hochberg and Hommel step-up procedures in place of the Bonferroni procedure, we
can set up the simulation in this section with little effort. Select Sim1 in the Library
and click the icon. Now go to the Test Parameters tab. In the Multiple Comparison
Procedures box, uncheck the Bonferroni box and check the Hochberg's step up and
Hommel's step up boxes.

Now click Simulate to obtain the power. Once the simulation run has completed, East
will add two additional rows to the Output Preview labeled as Sim 5 and Sim 6.
The Hochberg and Hommel procedures have disjunctive and global powers of 81.2%
and 81.4%, respectively, and conjunctive powers close to 10%. The exact results of
the simulations may differ slightly, depending on the seed. Now select Sim5 and Sim6
in the Output Preview using the Ctrl key and click the icon. This will save Sim5 and
Sim6 in Wbk1 in the Library.

27.6 Fixed-Sequence Testing Procedure

In data-driven stepwise procedures, we do not have any control over the order in
which the hypotheses are tested. However, sometimes, based on preference or prior
knowledge, we might want to fix the order of the tests a priori. The fixed sequence
test and the fallback test are the types of tests which proceed in a pre-specified order.
East supports both of these procedures.
Assume that H1, H2, · · · , Hk−1 are ordered hypotheses and the order is prespecified,
so that H1 is tested first, followed by H2 and so on. Let p1, p2, · · · , pk−1 be the
associated raw marginal p-values. In the fixed sequence testing procedure, for
i = 1, · · · , k − 1, in the i-th step, if pi < α, reject Hi and go to the next step;
otherwise retain Hi, · · · , Hk−1 and stop.
The fixed sequence testing strategy is optimal when the early tests in the sequence
have the largest treatment effects, and it performs poorly when the early hypotheses
have small treatment effects or are nearly true (Westfall and Krishen, 2001). The
drawback of the fixed sequence test is that once a hypothesis is not rejected, no
further testing is permitted. This leads to lower power to reject hypotheses tested
later in the sequence.
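In code the rule is a short loop; this sketch (Python, hypothetical p-values listed in
the prespecified testing order) stops at the first non-rejection.

    def fixed_sequence(pvals_in_order, alpha=0.025):
        # Test at full alpha in the fixed order; stop at the first retention.
        rejected = []
        for p in pvals_in_order:
            if p < alpha:
                rejected.append(True)
            else:
                break                      # retain this and all later hypotheses
        return rejected + [False] * (len(pvals_in_order) - len(rejected))

    print(fixed_sequence([0.004, 0.011, 0.030]))   # [True, True, False]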
As before, we will use the same HIV example to illustrate the fixed sequence testing
procedure. Select Sim1 in the Library and click the icon. Now go to the Test
Parameters tab. In the Multiple Comparison Procedures box, uncheck the
Bonferroni box and check the Fixed Sequence box.

Next click on the Response Generation tab and look at the Table of Proportions. You
will see that an additional column with label Test Sequence has been added. Here you
specify the order in which the hypotheses will be tested: specify 1 for the test that
will be tested first, 2 for the test that will be tested next, and so on. By default, East
assigns 1 to the first test, 2 to the second test and so on. For now we will keep the
default, which means that H1 will be tested first, followed by H2, and finally H3.

Now click Simulate to obtain the power. Once the simulation run has completed, East
will add an additional row to the Output Preview labeled as Sim7.
The fixed sequence procedure with the specified sequence has global and disjunctive
power close to 13% and conjunctive power close to 10%. The global and disjunctive
power are small because the smallest treatment effect is tested first and the magnitude
of the treatment effect increases gradually over the remaining tests. For optimal
power in the fixed sequence procedure, the early tests in the sequence should have the
larger treatment effects. In our case, Dose3 has the largest treatment effect, followed
by Dose2 and Dose1. Therefore, to obtain optimal power, H3 should be tested first,
followed by H2 and then H1.
Select Sim7 in the Output Preview and click the icon. Now, select Sim7 in the
Library, click the icon and go to the Response Generation tab. In the Test Sequence
column of the table, specify 3 for Dose1, 2 for Dose2 and 1 for Dose3.

Now click Simulate to obtain the power. Once the simulation run has completed, East
will add an additional row to the Output Preview labeled as Sim8.
Now the fixed sequence procedure with the pre-specified sequence (H3, H2, H1) has
global and disjunctive power close to 89% and conjunctive power of 9.7%. This
example illustrates that the fixed sequence procedure is powerful provided the
hypotheses are tested in a sequence of descending treatment effects. The fixed
sequence procedure controls the FWER because, for each hypothesis, testing is
conditional upon rejecting all hypotheses earlier in the sequence. The exact results of
the simulations may differ slightly, depending on the seed. Select Sim8 in the Output
Preview and click the icon to save it in the Library.

27.7 Fallback Procedure

The fallback test alleviates the above undesirable feature of the fixed sequence test.
Let wi be the proportion of α for testing Hi, such that Σ_{i=1}^{k−1} wi = 1. In the
fallback procedure, in the i-th step (i = 1, · · · , k − 1), test Hi at αi = αi−1 + αwi if
Hi−1 is rejected, and at αi = αwi if Hi−1 is retained. If pi < αi, reject Hi; otherwise
retain it. Unlike the fixed sequence testing approach, the fallback procedure can
continue testing even if a non-significant outcome is encountered. If a hypothesis in
the sequence is retained, the next hypothesis in the sequence is tested at the level that
would have been used by the weighted Bonferroni procedure. With w1 = 1 and
w2 = · · · = wk−1 = 0, the fallback procedure simplifies to the fixed sequence
procedure.
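The fallback rule is sketched below (Python, hypothetical p-values in the prespecified
order): each test gets its Bonferroni share wi α, and unspent alpha is carried forward
only after a rejection.

    def fallback(pvals_in_order, weights, alpha=0.025):
        carry, decisions = 0.0, []
        for p, w in zip(pvals_in_order, weights):
            level = carry + alpha * w    # alpha_i = alpha_{i-1} + alpha w_i if carried
            if p < level:
                decisions.append(True)
                carry = level            # pass the full level forward
            else:
                decisions.append(False)
                carry = 0.0              # next test falls back to alpha w_i
        return decisions

    print(fallback([0.030, 0.011, 0.004], [1/3, 1/3, 1/3]))   # [False, False, True]

Note that with these p-values the fixed sequence procedure would have stopped at the
first test and rejected nothing, which illustrates the fallback procedure's advantage.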
Again we will use the same HIV example to illustrate the fallback procedure. Select
Sim1 in the Library and click the icon. Now go to the Test Parameters tab. In the
Multiple Comparison Procedures box, uncheck the Bonferroni box and check the
Fallback box.

Next click on the Response Generation tab and look at the Table of Proportions. You
will see two additional columns with labels Test Sequence and Proportion of Alpha.
In the Test Sequence column, you specify the order in which the hypotheses will be
tested: 1 for the test that will be tested first, 2 for the test that will be tested next, and
so on. By default East assigns 1 to the first test, 2 to the second test and so on. For
now we will keep the default, which means that H1 will be tested first, followed by
H2, and finally H3.
In the Proportion of Alpha column, you specify the proportion of the total alpha you
want to spend on each test. Ideally, the values in this column should add up to 1; if
not, East will normalize them to add up to 1. By default East distributes the total
alpha equally among all tests. Here we have 3 tests in total, therefore each test has a
proportion of alpha of 1/3, or 0.333. You can specify other proportions as

well. For this example, keep the equal proportion of alpha for each test.

Now click Simulate to obtain the power. Once the simulation run has completed, East
will add an additional row to the Output Preview labeled as Sim9.
Recall that the fixed sequence procedure with this sequence had global and disjunctive
power close to 13% and conjunctive power of 9%. With the same pre-specified order
for testing the hypotheses, the fallback procedure has superior power compared to the
fixed sequence procedure. This is because the fallback procedure can continue testing
even if a non-significant outcome is encountered, whereas the fixed sequence
procedure has to stop when a hypothesis in the sequence is not rejected. Now we will
consider a sequence where H3 is tested first, followed by H2 and H1, because in our
case Dose3 has the largest treatment effect, followed by Dose2 and Dose1.
Select Sim9 in the Output Preview and click the icon. Now, select Sim9 in the
Library, click the icon and go to the Response Generation tab. In the Test Sequence
column of the table, specify 3 for Dose1, 2 for Dose2 and 1 for Dose3.

Now click Simulate to obtain the power. Once the simulation run has completed, East
will add an additional row to the Output Preview labeled as Sim 10.
Now the fallback procedure with the pre-specified sequence (H3, H2, H1) has global
and disjunctive power of 89% and conjunctive power of 9.7%. The obtained power is
very close to that of Sim 9. Therefore, for the fallback procedure, the specification of
the sequence in descending order of treatment effect does not make much difference
in terms of power. The exact results of the simulations may differ slightly, depending
on the seed. Select Sim10 in the Output Preview and click the icon to save it in the
Library.

27.8 Comparison of MC Procedures

We have obtained the power (based on simulations) for the different MC procedures
for the HIV example in the previous sections. Now the obvious question is which MC
procedure to choose. To compare all the MC procedures, we will perform simulations
for all the MC procedures under the following scenario:
Treatment arms: placebo, dose1 (125 mg), dose2 (250 mg) and dose3 (500 mg), with
proportions of 0.35, 0.4, 0.45 and 0.55, respectively.
Variance: Unpooled
Proportion of Alpha: Equal (0.333)
Type I Error: 0.025 (right-tailed)
Number of Simulations: 10000
Total Sample Size: 500
Allocation ratio: 1 : 1 : 1 : 1
For comparability of the simulation results, we have used the same seed for the
simulations under all MC procedures (we have used the seed 5643). The following
output displays the powers under the different MC procedures. Clean up the Output
Preview area, select all the checkboxes corresponding to the procedures and hit
Simulate.

Here we have used equal proportions for the weighted Bonferroni and Fallback
procedures. For the two fixed-order testing procedures (fixed sequence and fallback),
two sequences have been used - (H1, H2, H3) and (H3, H2, H1). As expected, the
Bonferroni and weighted Bonferroni procedures provide similar powers. It appears
that the fixed sequence procedure with the pre-specified sequence (H3, H2, H1)
provides a power of 89.5%, which is the maximum among all the procedures.
However, the fixed sequence procedure with the pre-specified sequence (H1, H2, H3)
provides a power of 13.6%. Therefore, power in the fixed sequence procedure is
largely

dependent on the specification of the testing sequence, and a mis-specification can
result in a huge drop in power.
All the remaining procedures have almost equal global and disjunctive powers -
about 82%. In terms of conjunctive power, Hochberg's step-up and Hommel's step-up
procedures have the highest conjunctive power, 9.9%. Therefore, we can choose
either Hochberg's step-up or Hommel's step-up procedure for a prospective HIV
study as discussed in section 27.1.

28 Multiple Endpoints-Gatekeeping Procedures for Discrete Data

Clinical trials are often designed to assess benefits of a new treatment compared to a
control treatment with respect to multiple clinical endpoints which are divided into
hierarchically ordered families. Typically, the primary family of endpoints defines the
overall outcome of the trial, provides the basis for regulatory claim and is included in
the product label. The secondary families of endpoints play a supportive role and
provide additional information for physicians, patients, payers and are useful for
enhancing the product label. Gatekeeping procedures address multiplicity problems by
explicitly taking into account the hierarchical structure of the multiple objectives. The
term "gatekeeping" indicates the hierarchical decision structure where the higher
ranked families serve as "gatekeepers" for the lower ranked families. Lower ranked
families won't be tested if the higher ranked families have not passed their
requirements.
Two types of gatekeeping procedures for discrete outcomes, parallel and serial, are
described in this chapter. For more information about applications of gatekeeping
procedures in a clinical trial setting and literature review on this topic, please refer to
Dmitrienko and Tamhane (2007).
East uses simulations to assess the operating characteristics of different designs using
gatekeeping procedures. For example, one could simulate the power for a variety of
sample sizes in a simple batch procedure. It is important to note that when determining
the sample size for a clinical trial with multiple co-primary endpoints, if the correlation
among the endpoints is not taken into consideration, the sample size may be
overestimated (Souza, et al 2010). East uses information about the correlation among
the multiple endpoints in order to determine a more feasible sample size.

28.1 MK-0974 (telcagepant) for Acute Migraine

Consider the randomized, placebo-controlled, double-blind, parallel-treatment
clinical trial designed to compare two treatments for migraine, a common disease and
leading
trial designed to compare two treatments for migraine, a common disease and leading
cause of disability. Standard treatment includes the use of Triptans, which although
generally well tolerated, have a vasoconstrictor effect, which can be problematic. This
leaves a certain population of patients with underlying cardiovascular disease,
uncontrolled hypertension or certain subtypes of migraine unable to access this
treatment. In addition, for some patients this treatment has no or low beneficial effect
and is associated with some undesirable side effects resulting in the discontinuation of
the drug (Ho et al, 2008). In this study, multiple doses of the drug Telcagepant (300
mg, 150 mg), an antagonist of the CGRP receptor associated with migraine, and
zolmitriptan (5mg), the standard treatment against migraine, are compared against a
placebo. The five co-primary endpoints include pain freedom, pain relief, absence of
photophobia (sensitivity to light), absence of phonophobia (sensitivity to sound), and
absence of nausea two hours post treatment. Three co-secondary endpoints included
more sustained measurements of pain freedom, pain relief, and total migraine freedom
for up to a 24 hour period. The study employed a full analysis set where the
multiplicity of endpoints was addressed using a step-down closed testing procedure.
Due to the negative aspects of zolmitriptan, investigators were primarily interested in
determining the efficacy of Telcagepant for the acute treatment of migraine with the
hope of an alternative treatment with fewer associated side effects. This study will be
used to illustrate the two gatekeeping procedures East provides for multiple discrete
endpoints.

28.2 Serial Gatekeeping Design - Simulation for Discrete Outcomes

Serial gatekeeping procedures were studied by Maurer, Hothorn and Lehmacher
(1995), Bauer et al. (1998) and Westfall and Krishen (2001). Serial gatekeepers are
encountered in trials where endpoints are ordered from most important to least
important. Suppose that a trial is declared successful only if a treatment effect is
demonstrated on both primary and secondary endpoints. Only if the endpoints in the
primary family are successful is it of interest to assess the secondary endpoints.
Correlation coefficients between the endpoints are bounded, and East computes the
valid range of acceptable values. As the number of endpoints increases, the restriction
imposed on the valid range of correlation values grows. Therefore, for illustration
purposes, the above trial is simplified to consider three primary endpoints: pain
freedom (PF), absence of phonophobia (phono) and absence of photophobia (photo)
at two hours post treatment. Only one endpoint from the secondary family, sustained
pain freedom (SPF), will be included in the example. Additionally, where the original
trial studied multiple doses and treatments, this example will use only two groups, to
focus the comparison on the higher dose of Telcagepant, 300mg, and placebo. The
example includes correlation values intended to represent zero, mild and moderate
correlation, to examine the effect of correlation on power.
The efficacy, or response rate, of the endpoints for subjects in the treatment group and
placebo group and a sample correlation matrix follow:

Endpoint   Response Telcagepant 300mg   Response Placebo
PF         0.269                        0.096
phono      0.578                        0.368
photo      0.51                         0.289
SPF        0.202                        0.05


         ρ12   ρ13   ρ23   ρ14   ρ24   ρ34
Sim 1    0     0     0     0     0     0
Sim 2    0     0     0.3   0     0     0
Sim 3    0     0     0.5   0     0     0
Sim 4    0     0     0.8   0     0     0
Sim 5    0.3   0.3   0.3   0     0     0
Sim 6    0.3   0.3   0.5   0     0     0
Sim 7    0.3   0.3   0.8   0     0     0
Sim 8    0     0     0.3   0.3   0     0
Sim 9    0     0     0.3   0.5   0     0
Sim 10   0     0     0.3   0.7   0     0
Sim 11   0     0     0.8   0.3   0     0
Sim 12   0     0     0.8   0.5   0     0
Sim 13   0     0     0.8   0.7   0     0
Sim 14   0     0     0.8   0.7   0     0

To construct the above simulations, in the Design tab on the Discrete group, click
Two Samples and select Multiple Comparisons-Multiple Endpoints.


At the top of this input window, the user must specify the total number of endpoints in
the trial. Other input parameters such as Test Type, Type I Error (α), Sample Size (n),
and whether or not a Common Rejection Region is to be used for the endpoints. If a
different rejection region is desired for different endpoints, this information should be
specified in the Endpoint Information box. Here the user can change the label, select
the family rank for each endpoint and choose the rejection region (either right or left
tailed).
As discussed above, there are typically two types of gatekeeping procedures: serial and
parallel. Parallel gatekeeping requires the rejection of at least one hypothesis test; that
is, only one endpoint in a gatekeeper family must be significant, no matter the rank, for
testing to proceed. Serial gatekeeping uses the fact that the families are hierarchically
ordered, and subsequent families are only tested if all hypotheses in the previously
ranked families are significant. Once the Gatekeeping Procedure is selected, the user
must then select the multiple comparison procedure which will be used to test the last
family of endpoints. These tests are discussed in Chapter 27. If Parallel Gatekeeping is
selected, the user must also specify a test for the Gatekeeper Families, specifically
Bonferroni, Truncated Holm or Truncated Hochberg; these are discussed further in the
parallel example which follows. The type I error specified on this screen is the nominal
level of the family-wise error rate, which is defined as the probability of falsely
declaring the efficacy of the new treatment compared to control with respect to any
endpoint.
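To make the serial logic concrete, here is a minimal sketch in R of the decision rule
just described. It assumes, purely for illustration, that each primary endpoint must be
individually significant at level α before the gate opens and that the last family is
tested with the Bonferroni procedure (as in the example below); East's actual
implementation is configured through the dialogs described in this section.

```r
# Illustrative serial gatekeeping rule (not East's internal code).
# The secondary family is tested only if every primary endpoint is significant.
serial_gatekeep <- function(p_primary, p_secondary, alpha = 0.025) {
  if (!all(p_primary <= alpha)) {
    # Gate is closed: no secondary endpoint can be declared significant.
    return(list(primary = FALSE, secondary = rep(FALSE, length(p_secondary))))
  }
  # Last family tested at full level alpha with the Bonferroni procedure.
  list(primary = TRUE,
       secondary = p_secondary <= alpha / length(p_secondary))
}

# Hypothetical raw one-sided p-values for PF, phono, photo, then SPF:
serial_gatekeep(c(0.004, 0.012, 0.009), 0.018)
```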

For the migraine example, PF, phono, and photo form the primary family, and SPF is
the only outcome in the secondary family. Suppose that we would like to see the power
for a sample size of 200 at a nominal type I error rate of 0.025, using the Bonferroni
test for the secondary family. The input window will look as follows:

In addition to the Test Parameters tab, there is a tab labeled Response Generation.
This is where the user specifies the underlying joint distribution among the multiple
endpoints for the control arm and for the treatment arm. This is assumed to be
multivariate binary with a specified correlation matrix. For the first simulation, the

Common Correlation box can be checked with its default value of 0.
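For intuition about what this response generation does, here is a minimal sketch in R
of one standard way to draw correlated multivariate binary data, via a Gaussian
copula. The function and variable names are ours, the MASS package is assumed to be
available, and East's exact algorithm (including its mapping between the latent and
binary correlations) may differ.

```r
# Sketch: correlated binary endpoints via a Gaussian copula (illustrative only).
library(MASS)

sim_endpoints <- function(n, p, R) {
  # n: subjects; p: vector of response rates; R: latent correlation matrix
  z <- mvrnorm(n, mu = rep(0, length(p)), Sigma = R)
  # A subject responds on endpoint j when its latent normal exceeds qnorm(1 - p[j]).
  sweep(z, 2, qnorm(1 - p), FUN = ">") * 1
}

p_trt <- c(PF = 0.269, phono = 0.578, photo = 0.51, SPF = 0.202)
y <- sim_endpoints(100, p_trt, diag(4))  # Sim 1: all correlations 0
colMeans(y)                              # approximates the response rates
```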

The number of simulations to be performed and other simulation parameters can be
specified in the Simulation Controls window. By default, 10000 simulations will be
performed. The summary statistics for each simulated trial and subject-level data can
be saved by checking the appropriate boxes in the Output Options area. Once all
design parameters are specified, click the Simulate button at the bottom right of the
screen. Preliminary output is displayed in the output preview area, and all results

displayed in the yellow cells are summary outputs generated from simulations.

To view the detailed output, first save the simulation into a workbook in the library by
selecting the simulation in the Output Preview window and clicking the corresponding
icon. A simulation node will appear in the library under the current workbook.

Double click the simulation node Sim1 in the Library to see the detailed output, which
summarizes all the main input parameters, including the multiple comparison
procedure used for the last family of endpoints, the nominal type I error level, the total
sample size, and the mean values for each endpoint in the control arm and in the
experimental arm. It also displays a comprehensive list of different types of power,
defined as follows:
Overall Power and FWER:
Global: probability of declaring significance on any of the endpoints
Conjunctive: probability of declaring significance on all of the endpoints for which the
treatment arm is truly better than the control arm
Disjunctive: probability of declaring significance on any of the endpoints for which the
treatment arm is truly better than the control arm
FWER: probability of making at least one type I error among all the endpoints
Power and FWER for Individual Gatekeeper Family except the Last Family:
Conjunctive Power: probability of declaring significance on all of the endpoints in the
particular gatekeeper family for which the treatment arm is truly better than the control
arm
FWER: probability of making at least one type I error when testing the endpoints in the
particular gatekeeper family
Power and FWER for the Last Family:
Conjunctive Power: probability of declaring significance on all of the endpoints in the
last family for which the treatment arm is truly better than the control arm
Disjunctive Power: probability of declaring significance on any of the endpoints in the
last family for which the treatment arm is truly better than the control arm
FWER: probability of making at least one type I error when testing the endpoints in the
last family

Marginal Power: probability of declaring significance on the particular endpoint
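All of these quantities are simple event frequencies over the simulated trials. As a
hedged illustration (our own helper, not East code), they could be estimated from a
matrix of per-trial rejection indicators like this:

```r
# Sketch: estimating the power types defined above from simulation output.
# rej: hypothetical nsim x K matrix of 0/1 rejection indicators per endpoint;
# truly_better: logical vector flagging endpoints with a true treatment effect.
power_summary <- function(rej, truly_better) {
  eff  <- rej[,  truly_better, drop = FALSE]  # endpoints with a true effect
  null <- rej[, !truly_better, drop = FALSE]  # endpoints with no true effect
  list(
    global      = mean(rowSums(rej) > 0),
    conjunctive = mean(rowSums(eff) == ncol(eff)),
    disjunctive = mean(rowSums(eff) > 0),
    fwer        = if (ncol(null)) mean(rowSums(null) > 0) else 0,
    marginal    = colMeans(rej)
  )
}
```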

For the migraine example, the conjunctive power, which characterizes the power for
the study, is 0.701 for a total sample size of 200. Using the Bonferroni test for the last
family, the design has 0.651 probability (disjunctive power for the last family) of
detecting the benefit of Telcagepant 300 mg with respect to the secondary endpoint,
and likewise a 0.651 chance (conjunctive power for the last family) of declaring that
benefit; the two coincide here because the secondary family contains a single endpoint,
SPF. For a sample size of 200 this relatively low power is typically undesirable. One
can find the sample size needed to achieve a target power by simulating multiple
designs in batch mode. For example, the simulation of a batch of designs for a range of
sample sizes from 200 to 300 in steps of 20 is shown below.

Multiple designs can be viewed side by side for easy comparison by selecting the
simulations and clicking the corresponding icon in the output preview area:

For this example, to obtain a conjunctive power between 80% and 90%, the study
would need to be constructed with somewhere between 250 and 300 subjects. For the
remainder of this example, we will use a sample size of 250 subjects under the
correlation assumptions in the above table.

28.3 Parallel Gatekeeping Design - Simulation for Discrete Outcomes

A common concern in clinical trials with multiple primary endpoints is whether or not
statistical significance should be achieved on all endpoints. As the number of
endpoints increases, this generally becomes more difficult. Parallel gatekeeping
procedures are often used in clinical trials with multiple primary objectives where each
individual objective can characterize a successful overall trial outcome. In other words,
the trial can be declared successful if at least one primary objective is met.
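As a flavor of the parallel logic, here is a minimal sketch in R of the standard
Bonferroni-based parallel gatekeeping construction, in which the fraction of
gatekeeper hypotheses rejected determines the significance level carried forward to the
last family. This is our own illustrative helper; East's truncated Holm/Hochberg
options propagate the error rate differently.

```r
# Illustrative parallel gatekeeping with Bonferroni in both families
# (not East's internal code).
parallel_gatekeep <- function(p_primary, p_secondary, alpha = 0.025) {
  m <- length(p_primary)
  rej1 <- p_primary <= alpha / m
  # Only the alpha "earned" by rejected gatekeeper hypotheses moves forward.
  alpha2 <- alpha * sum(rej1) / m
  list(primary = rej1,
       secondary = p_secondary <= alpha2 / length(p_secondary))
}

# Two of three primary endpoints significant: the last family is tested
# at level 0.025 * 2/3.
parallel_gatekeep(c(0.004, 0.03, 0.006), 0.01)
```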
Again, consider the same randomized, placebo-controlled, double-blind, parallel
treatment clinical trial designed to compare two treatments for migraine presented in
the serial gatekeeping example. For the purpose of this example the trial is again
simplified to study only three primary family endpoints: pain freedom (PF), absence of
phonophobia (phono) and absence of photophobia (photo) at two hours post treatment.
The single endpoint in the secondary family is sustained pain freedom (SPF), and will
be included in the example, where, using East, power estimates will be computed via
simulation. The example correlation values are intended to represent a common and
moderate association among the endpoints. In general, serial gatekeeping designs
require a larger sample size than parallel designs; therefore this example will use a
total sample size of 125, at a one-sided significance level of α = 0.025.
The efficacy, or response rate, of the endpoints for subjects in the treatment group and
placebo group and a sample correlation matrix are as follows:
Endpoint    Response Telcagepant 300mg    Response Placebo
PF          0.269                         0.096
phono       0.578                         0.368
photo       0.51                          0.289
SPF         0.202                         0.05


         ρ12    ρ13    ρ23    ρ14    ρ24    ρ34
Sim 1    0.3    0.3    0.3    0.3    0.3    0.3
Sim 2    0      0      0.8    0.3    0      0
Sim 3    0.3    0.3    0.8    0.3    0      0

We now construct a new set of simulations to assess the operating characteristics of the
study using a Parallel Gatekeeping design for the above response generation
information. In the Design tab on the Discrete group, click Two Samples and select
Multiple Comparisons-Multiple Endpoints.

In the Gatekeeping Procedure box, keep the default of Parallel and Bonferroni for
the Test for Gatekeeper Families. For the Test for Last Family, also ensure that
Bonferroni is selected as the multiple testing procedure. In the Endpoint
Information box, specify which family each specific endpoint belongs to using the

column with the label Family Rank.

In the Response Generation window the Variance can be specified to be either Pooled
or Un-pooled. In the Endpoint Information box, the Response Rates for treatment
and control for each endpoint are specified. If the endpoints share a common
correlation, select the Common Correlation checkbox and enter the correlation value
to the right. East will only allow a value within the Valid Range. If the endpoints do
not share a common correlation, input the specific correlation for each pair of
endpoints in the Correlation Matrix.

In the Simulation Controls window, the user can specify the total number of
simulations, refresh frequency, and random number seed. Simulation data can be saved
for more advanced analyses. After all the input parameter values have been specified,
click the Simulate button on the bottom right of the window to begin the simulation.

The progress window will report how many simulations have been completed.

When complete, close the progress report screen and the preliminary simulation
summary will be displayed in the output preview window. Here, one can see the
overall power summary.

To see the detailed output, save the simulation in the current workbook by clicking the
icon. A simulation node will be appended to the corresponding workbook in the
library. Double click the simulation node in the library to display the detailed outputs.

As with serial gatekeeping, East provides the following types of power:
Overall Power and FWER:
Global: probability of declaring significance on any of the endpoints.
Conjunctive: probability of declaring significance on all of the endpoints for which the
treatment arm is truly better than the control arm.
Disjunctive: probability of declaring significance on any of the endpoints for which the
treatment arm is truly better than the control arm.
FWER: probability of making at least one type I error among all the endpoints.
Power and FWER for Individual Gatekeeper Families except the Last Family:
Conjunctive Power: probability of declaring significance on all of the endpoints in the
particular gatekeeper family for which the treatment arm is truly better than the control
arm.
Disjunctive Power: probability of declaring significance on any of the endpoints in the
particular gatekeeper family for which the treatment arm is truly better than the control
arm.
FWER: probability of making at least one type I error when testing the endpoints in the
particular gatekeeper family.
Power and FWER for the Last Gatekeeper Family:
Conjunctive Power: probability of declaring significance on all of the endpoints in the
last family for which the treatment arm is truly better than the control arm.
Disjunctive Power: probability of declaring significance on any of the endpoints in the
last family for which the treatment arm is truly better than the control arm.
FWER: probability of making at least one type I error when testing the endpoints in the
last family.
Marginal Power: probability of declaring significance on the particular endpoint.
For the migraine example under the lower common correlation assumption, we see that
the gatekeeping procedure using the Bonferroni test for both the primary family and
the secondary family provides 84.4% power to detect the difference in at least one of
the three primary measures of migraine relief. It only provides 24.1% power to detect
the differences in all types of relief. The marginal power table displays the
probabilities of declaring significance on the particular endpoint after multiplicity
adjustment. For example, the power to detect sustained pain relief beyond 2 hours for a
dose of 300 mg of Telcagepant is 60.3%.
To assess the robustness of this procedure with respect to the correlation among the
different endpoints, the simulation can be run again with different combinations of
correlations. Right click on the simulation node in the Library and select Edit
Simulation from the dropdown list. Next click on the Response Generation tab,
update the correlation matrix, and click Simulate. This can be repeated for all desired
correlation combinations and the results compared in an output summary.

Table 28.1 summarizes the power comparisons under the different correlation
assumptions. Note that the disjunctive power decreases as the correlation increases and
the conjunctive power increases as the correlation increases.

Table 28.1: Power Comparisons under Different Correlation Assumptions

               Primary Family          Secondary Family        Overall Power
Correlation    Disjunct.   Conjunct.   Disjunct.   Conjunct.   Disjunct.   Conjunct.
Sim 1          0.839       0.242       0.599       0.599       0.839       0.218
Sim 2          0.838       0.244       0.579       0.579       0.838       0.202
Sim 3          0.787       0.286       0.554       0.554       0.787       0.234
There are three available parallel gatekeeping methods: Bonferroni, Truncated Holm
and Truncated Hochberg. The multiple comparison procedures applied to the
gatekeeper families need to satisfy the so-called separability condition. A multiple
comparison procedure is separable if its type I error rate under a partial null
configuration is strictly less than the nominal level α. Bonferroni is a separable
procedure; however, the regular Holm and Hochberg procedures are not separable and
cannot be applied directly to the gatekeeper families. The truncated versions, obtained
by taking convex combinations of the critical constants of the regular Holm/Hochberg
procedures and the Bonferroni procedure, are separable and more powerful than the
Bonferroni test. The truncation constant controls the degree of conservativeness: a
larger truncation constant results in a more powerful procedure. If the truncation
constant is set to 1, the procedure reduces to the regular Holm or Hochberg test. A
sketch of the truncated Holm critical constants is given below.

To see this, simulate the design using the truncated Holm procedure for the primary
family and the Bonferroni test for the secondary family for the migraine example with
common correlation 0.3. Table 28.2 compares the conjunctive power and disjunctive
power for each family, and overall, for different truncation parameter values. As the
value of the truncation parameter increases, the conjunctive power for the primary
family increases and the disjunctive power remains unchanged. Both the conjunctive
and disjunctive power for the secondary family decrease as we increase the truncation
parameter. The overall conjunctive power also increases, but the overall disjunctive
power remains the same.

Table 28.2: Impact of Truncation Constant on Power in the Truncated Holm Procedure

Truncation    Primary Family          Secondary Family        Overall Power
Constant      Conjunct.   Disjunct.   Conjunct.   Disjunct.   Conjunct.   Disjunct.
0             0.234       0.84        0.59        0.59        0.21        0.84
0.25          0.28        0.833       0.569       0.569       0.248       0.833
0.5           0.315       0.836       0.542       0.542       0.275       0.836
0.8           0.383       0.838       0.488       0.488       0.334       0.838
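To make the convex-combination idea concrete, here is a minimal R sketch of the
truncated Holm step-down rule with truncation constant γ. This is our own
transcription of the published construction; East's implementation details are not
shown in this manual.

```r
# Truncated Holm step-down procedure (illustrative sketch).
# gamma = 0 gives the Bonferroni test; gamma = 1 the regular Holm test.
truncated_holm <- function(p, alpha = 0.025, gamma = 0.5) {
  m <- length(p)
  ord <- order(p)
  # Convex combination of Holm and Bonferroni critical constants.
  crit <- (gamma / (m - seq_len(m) + 1) + (1 - gamma) / m) * alpha
  below <- p[ord] <= crit
  k <- if (all(below)) m else which.min(below) - 1  # step-down stopping index
  reject <- rep(FALSE, m)
  if (k > 0) reject[ord[seq_len(k)]] <- TRUE
  reject
}

truncated_holm(c(0.004, 0.011, 0.018), alpha = 0.025, gamma = 0.5)
```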
The next table shows the marginal powers of this design for different truncation
parameter values. The marginal powers for the endpoints in the primary family
increase. On the other hand, the marginal power for the endpoint in the secondary
family decreases.

Table 28.3: Impact of Truncation Constant on Marginal Power in the Truncated Holm
Procedure

Truncation    Primary Family              Secondary Family
Constant      PF       Phono    Photo     SPF
0             0.54     0.512    0.568     0.59
0.25          0.582    0.512    0.58      0.569
0.5           0.591    0.541    0.596     0.542
0.8           0.625    0.568    0.631     0.488

The last two tables display the operating characteristics for the Hochberg test with
different truncation constant values. Note that both the conjunctive and disjunctive
powers for the primary family increase as the truncation parameter increases.
However, the power for the secondary family decreases with larger truncation
parameter values. The marginal powers for the primary family and for the secondary
family behave similarly. The overall conjunctive and disjunctive powers also increase
as we increase the truncation parameter.

Table 28.4: Impact of Truncation Constant on Power in the Truncated Hochberg
Procedure

Truncation    Primary Family          Secondary Family        Overall Power
Constant      Conjunct.   Disjunct.   Conjunct.   Disjunct.   Conjunct.   Disjunct.
0             0.234       0.844       0.595       0.595       0.208       0.844
0.25          0.303       0.838       0.578       0.578       0.268       0.838
0.5           0.322       0.841       0.544       0.544       0.281       0.841
0.8           0.407       0.847       0.494       0.494       0.351       0.847


Table 28.5: Impact of Truncation Constant in Truncated Hochberg Procedure on
Marginal Power

Truncation    Primary Family              Secondary Family
Constant      PF       Photo    Phono     SPF
0             0.552    0.52     0.564     0.595
0.25          0.595    0.529    0.603     0.578
0.5           0.603    0.54     0.598     0.544
0.8           0.642    0.592    0.647     0.494

29 Two-Stage Multi-arm Designs using p-value combination

29.1 Introduction

In the drug development process, identification of promising therapies and inference
on selected treatments are usually performed in two or more stages. The procedure we
discuss here is an adaptive two-stage design for the situation where multiple treatments
are to be compared with a control. It allows the integration of both stages within a
single confirmatory trial while controlling the multiple-level type-I error. After the
interim analysis in the first stage, the trial may be terminated early or continued to a
second stage, where the set of treatments may be reduced due to lack of efficacy or the
presence of safety problems with some of the treatments. This procedure in East is
highly flexible with respect to stopping rules and selection criteria and also allows
re-estimation of the sample size for the second stage. Simulations show that the
method may be substantially more powerful than classical one-stage multiple treatment
designs with the same total sample size, because the second stage sample size is
focused on evaluating only the promising treatments identified in the first stage. This
procedure is available for continuous as well as discrete endpoint studies. The current
chapter deals with discrete endpoint studies only; continuous endpoint studies are
handled similarly.

29.2 Study Design

This section will explore different design options available in East with the help of an
example.

29.2.1 Introduction to the Study

A new chemical entity (NCE) is being developed for the treatment of reward
deficiency syndrome, specifically alcohol dependence and binge eating disorder.
Compared with other orally available treatments, NCE was designed to exhibit
enhanced oral bioavailability, thereby providing improved efficacy for the treatment of
alcohol dependence.
Primary Objective: To evaluate the safety and efficacy of NCE compared with
placebo when administered daily for 12 weeks to adults with alcohol
dependence.
Secondary Objective: To determine the optimal dose or doses of NCE.
The primary endpoint is defined as the percent of subjects abstinent from heavy
drinking during Weeks 5 through 12 of treatment based on self-report of drinking
activity. A heavy drinking day is defined as 4 or more standard alcoholic drinks in 1
day for females and 5 or more standard alcoholic drinks in 1 day for males. The
endpoint is based on the patient-reported number of standard alcoholic drinks per day,
transformed into a binary outcome measure, abstinence from heavy drinking.
29.2.2 Methodology

This is a multicenter, randomized, double-blind, placebo-controlled study conducted in
two parts using a 2-stage adaptive design. In Stage 1, approximately 400 eligible
subjects will be randomized equally among four treatment arms (NCE at doses of 1,
2.5, or 10 mg, and matching placebo). After all subjects in Stage 1 have completed the
12-week treatment period or discontinued earlier, an interim analysis will be conducted
to
1. compare the proportion of subjects in each dose group who have achieved
abstinence from heavy drinking during Weeks 5 through 12,
2. assess safety within each dose group, and
3. drop the less efficacious doses.
Based on the interim analysis, Stage 2 of the study will either continue with additional
subjects enrolling into 2 or 3 arms (placebo and 1 or 2 favorable, active doses) or the
study will be halted completely if unacceptable toxicity has been observed.
In this example, we will use the following workflow to cover the different options
available in East:
1. Start with four arms (3 doses + placebo).
2. Evaluate the three doses at the interim analysis and, based on the Treatment
Selection Rules, carry forward one or two of the doses to the next stage.
3. While selecting the doses, also increase the sample size of the trial by using the
Sample Size Re-estimation (SSR) tool to improve conditional power if necessary.
In a real trial, both of the above actions (treatment selection as well as sample size
re-estimation) would be performed after observing the interim data.
4. See the final design output in terms of the different powers and the probabilities
of selecting particular dose combinations.
5. See the early stopping boundaries for efficacy and futility on the adjusted p-value
scale.
6. Monitor the actual trial using the Interim Monitoring tool in East.
Start East. Click the Design tab, then click Many Samples in the Discrete category,
and then click Multiple Looks - Combining p-values test.


This will bring up the input window of the design with some default values. Enter the
inputs as discussed below.

29.2.3 Study Design Inputs

Let us assume that three doses of the treatment 1mg, 2.5mg, 10mg are compared with
the Placebo arm. Preliminary sample size estimates are provided to achieve an overall
study power of at least 80% at an overall, adequately adjusted 1-sided type-1 or alpha
level of 2.5%, after taking into account all interim and final hypothesis tests. Note that
we always use 1-sided alpha since dose-selection rules are usually 1-sided.
In Stage 1, 400 subjects are initially planned for enrollment (4 arms with 100 subjects
each). Following an interim analysis conducted after all subjects in Stage 1 have
completed 12 weeks of treatment or discontinued earlier, an additional 200 subjects
will be enrolled into 2 doses for Stage 2 (placebo and one active dose). So we start
with the total of 400+200 = 600 subjects.
The multiplicity adjustment methods available in East to compute the adjusted p-value
(the p-value corresponding to the global null) are Bonferroni, Sidak, and Simes. For
discrete endpoint tests, Dunnett Single Step is not available since the Z-statistic is used.
Let us use the Bonferroni method for this example. The p-values obtained from the
two stages can be combined by using the "Inverse Normal" method. In the "Inverse
Normal" method, East first computes the weights as follows:

w^{(1)} = \sqrt{\frac{n^{(1)}}{n}}    (29.1)

and

w^{(2)} = \sqrt{\frac{n^{(2)}}{n}}    (29.2)

where n^{(1)} and n^{(2)} are the total sample sizes corresponding to Stage 1 and
Stage 2, respectively, and n is the total sample size. East displays these weights by
default, but the values are editable and the user can specify any other weights as long
as

\left(w^{(1)}\right)^2 + \left(w^{(2)}\right)^2 = 1    (29.3)

The final p-value is given by

p = 1 - \Phi\left[w^{(1)}\,\Phi^{-1}\left(1 - p^{(1)}\right) + w^{(2)}\,\Phi^{-1}\left(1 - p^{(2)}\right)\right]    (29.4)

The weights specified on this tab will be used for the p-value computation: w^{(1)} is
applied to the data before the interim look and w^{(2)} to the data after the interim
look. Thus, according to the sample sizes planned for the two stages in this example,
the weights are \sqrt{400/600} and \sqrt{200/600}. Note: these weights are updated
by East once we specify the first look position as 400/600 in the Boundary tab, so
leave them at their default values for now. Set the Number of Arms to 4 and enter the
rest of the inputs as shown below:
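As a quick numerical illustration of equation (29.4), here is a small R sketch (our own
helper; the stage-wise p-values shown are hypothetical):

```r
# Inverse normal combination of stage-wise p-values, equations (29.1)-(29.4).
inverse_normal_p <- function(p1, p2, n1, n2) {
  n  <- n1 + n2
  w1 <- sqrt(n1 / n)  # weight for Stage 1, equation (29.1)
  w2 <- sqrt(n2 / n)  # weight for Stage 2, equation (29.2)
  1 - pnorm(w1 * qnorm(1 - p1) + w2 * qnorm(1 - p2))
}

# Weights sqrt(400/600) and sqrt(200/600), as in this example:
inverse_normal_p(p1 = 0.03, p2 = 0.02, n1 = 400, n2 = 200)
```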

We can certainly have early stopping boundaries for efficacy and/or futility. But
generally, in designs like this, the objective is to select the best dose(s) and not to stop
early. So for now, select the Boundary tab and set both the boundary families to
"None". Also, set the timing of the interim analysis as 0.667, which will be after
observing the data on 400 subjects out of 600. Enter 400/600 as shown below. Notice
the updated weights on the Test Parameters tab.

The next tab is Response Generation which is used to specify the true underlying
proportion of response on the individual dose groups and the initial allocation from
which to generate the simulated data.

Before we update the Treatment Selection tab, go to the Simulation Control
Parameters tab, where we can specify the number of simulations to run, the random
number seed, and whether to save the intermediate simulation data. For now, enter the

inputs as shown below and keep all other inputs as default.

Click on the Treatment Selection tab. This tab is used to select the scale on which to
compute the treatment-wise effects. The treatment effect scale is required for selecting
treatments for the second stage; the control arm is not considered for selection, as it is
always carried into the second stage. The list under Treatment Effect Scale allows
you to set the selection rules on different scales. Select Estimated δ from this list. It
means that all the selection rules we specify on this tab will be in terms of the
estimated value of the treatment effect, δ, i.e., the difference from placebo.
Here is a list of all available treatment effect scales:
Estimated Proportion, Estimated δ, Test Statistic, Conditional Power, Isotonic
Proportion, Isotonic δ.
For more details on these scales, refer to the Appendix K chapter on this method.
The next step is to set the treatment selection rules for the second stage.
Select Best r Treatments: The best treatment is defined as the treatment having the
highest or lowest mean effect. The decision is based on the rejection region: if it
is "Right-Tail" then the highest is taken as best; if it is "Left-Tail" then the
lowest is taken as best. Note that the rejection region does not affect the choice
of treatment based on conditional power.
Select treatments within ε of Best Treatment: Suppose the treatment effect scale is
Estimated δ. If the best treatment has a treatment effect of δb and ε is specified
as 0.1, then all the treatments which have a δ of δb − 0.1 or more are chosen for
Stage 2.
Select treatments greater than threshold ζ: Treatments whose value on the
treatment effect scale is greater than or less than the threshold ζ specified by the
user, according to the rejection region, are selected. If the treatment effect scale
is chosen as the conditional power, the comparison is always "greater than".
Use R for Treatment Selection: If you wish to define any customized treatment
selection rules, this can be done by writing an R function for those rules to be
used within East. This is possible due to the R Integration feature in East. Refer
to the appendix chapter on R Functions for more details on the syntax and use of
this feature. A template file for defining treatment selection rules is also
available in the subfolder RSamples under your East installation directory.
For more details on using R to define treatment selection rules, refer to
section O.10.
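For flavor, here is a minimal sketch of a "select best r treatments" rule written as a
plain R function. The argument names and return value are hypothetical, since East's
required function signature is documented in the appendix on R Functions.

```r
# Illustrative 'select best r treatments' rule on the Estimated delta scale.
select_best_r <- function(delta, r = 1) {
  # delta: named vector of estimated treatment effects vs. placebo
  names(sort(delta, decreasing = TRUE))[seq_len(r)]
}

select_best_r(c("1mg" = 0.03, "2.5mg" = 0.08, "10mg" = 0.12), r = 2)
# returns "10mg" "2.5mg"
```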
For this example, select the first rule Select Best r treatments and set r = 1, which
indicates that East will select the best dose for Stage 2 out of the three doses. We will
leave the default allocation ratio selections to yield equal allocation between the
control and the selected best dose in Stage 2.

Click the Simulate button to run the simulations. When the simulations are over, a row
gets added in the Output Preview area. Save this row to the Library by clicking the
icon in the toolbar. Rename this scenario as Best1. Double click it to see the detailed
output.

The first table in the detailed output shows the overall power including global power,
conjunctive power, disjunctive power and FWER. The definitions for different powers
are as follows:
Global Power: probability of demonstrating statistical significance on one or
more treatment groups
Conjunctive Power: probability of demonstrating statistical significance on all
treatment groups which are truly effective
Disjunctive Power: probability of demonstrating statistical significance on at
least one treatment group which is truly effective
FWER: probability of incorrectly demonstrating statistical significance on at
least one treatment group which is truly ineffective
For our example, there is 0.8 global power, i.e., the probability that this design rejects
any null hypothesis, where the set of null hypotheses states that the TRUE proportion
of responders at each dose equals that of control. Also shown are the conjunctive and
disjunctive power, as well as the Family Wise Error Rate (FWER).
The Lookwise Summary table summarizes the number of simulated trials that ended
with a conclusion of efficacy, i.e., rejected any null hypothesis, at each look. In this
example, no simulated trial stopped at the interim analysis with an efficacy conclusion
since there were no stopping boundaries, but 8083 simulations yielded an efficacy
conclusion via the selected dose after Stage 2. This is consistent with the global power.
The next table, Detailed Efficacy Outcomes for all 10000 Simulations, summarizes
the number of simulations for which each dose was selected for Stage 2 and yielded an
efficacy conclusion. For example, the dose 10mg was observed to be efficacious in
63% of simulated trials whereas none of the three doses were efficacious in 19% of
trials.
The last output table, Marginal Probabilities of Selection and Efficacy, summarizes
the number and percent of simulations in which each dose was selected for Stage 2,
regardless of whether it was found significant at the end of Stage 2 or not, as well as
the number and percent of simulations in which each dose was selected and found
significant. The average sample size is also shown. Note that since this design only
selected the single best dose, this table gives almost the same information as the one
above.
Selecting multiple doses (arms) for Stage 2 may be more effective than selecting just
the best one.

Click the button on the bottom left corner of the screen. This will take us back to the
input window of the last simulation scenario. Go to the Treatment Selection tab and
set r = 2. This means that we are interested in carrying forward the two best doses out
of the three. Run the simulations, keeping the sample size fixed at 600. The simulated
power drops to approximately 73%. Note the loss of power for this two-best-doses
scenario in comparison to the previous example, which chose only the best dose. This
is because of the smaller sample sizes per dose in Stage 2: the Stage 2 sample size is
split among 2 doses and control instead of between only 1 dose and control as in the
best-dose scenario.
Now go to the Test Parameters tab and change the sample size to 700, assuming that
each of the two doses and placebo will get 100 subjects in Stage 2. Accordingly,
update the look position on the Boundaries tab to 400/700 as well. Click the Simulate
button to run the simulations. When the simulations are over, a row gets added in the
Output Preview area. Save this row to the Library by clicking the icon in the toolbar.
Rename this scenario as Best2. Double click it to see the detailed output.

The interpretation of the first two tables is the same as described above. The increased
sample size restores the power to 80%, and the output also gives us the design details
when two of the three doses were selected.

The table Detailed Efficacy Outcomes for all 10000 Simulations summarizes the
number of simulations for which each individual dose group or pairs of doses were
selected for Stage 2 and yielded an efficacy conclusion. For example, the pair
(2.5mg, 10mg only) was observed to be efficacious in 41% of the trials (4076/10000).
The next table, Marginal Probabilities of Selection and Efficacy, summarizes the
number and percent of simulations in which each dose was selected for Stage 2,
regardless of whether it was found significant at the end of Stage 2 or not, as well as
the number and percent of simulations in which each dose was selected and found
significant. The average sample size is also shown. It tells us how frequently each dose
(either alone or with some other dose) was selected and efficacious. For example, dose
1mg was selected in approximately 25% of trials and was efficacious in approximately
7% of trials (the sum of the 10, 130 and 555 simulations from the previous table).
The advantage of the 2-stage "treatment selection" or "drop-the-loser" design is that it
allows dropping the poorly performing or futile arms based on the interim data while
still preserving the type-1 error and achieving the desired power.
In the Best1 scenario, we dropped two doses (r = 1) and in the Best2 scenario, we
dropped one dose (r = 2). Suppose we had decided to proceed to Stage 2 without
dropping any doses. In this case, the power would have dropped significantly. To
verify this in East, run the above scenario with r = 3 and save it to the Library.
Rename this scenario as All3. Double click it to see the detailed output. We can
observe that the power drops from 80% to 72%.
The three scenarios created so far can also be compared in tabular form. Select the
three nodes in the Library, click the icon in the toolbar, and select "Power" from the
dropdown. A table as shown below will be created by East.

29.2.4 Simulating under Different Alternatives

Since this is a simulation-based design, we can perform sensitivity analyses by
changing some of the inputs and observing the effects on the overall power and other
output. Let us first make sure that this design preserves the total type-1 error. This can
be done by running the simulations under the "Null" hypothesis.
Click the button on the bottom left corner of the screen. Go to the Response
Generation tab and enter the inputs as shown below:

Also set r = 2 in the Treatment Selection tab. Run the simulations and go to the
detailed output by saving the row from the Output Preview to the Library. Notice that
the global power and simulated FWER are less than the design type-I error, which
means the overall type-1 error is preserved.

29.3 Sample Size Re-estimation

As we have seen above, the desired power of 80% is achieved with the sample size of
700 if the initial assumptions (πc = 0.1, π1mg = 0.14, π2.5mg = 0.18, π10mg = 0.22)
hold true. But if they do not, then the original sample size of 700 may be insufficient to
achieve 80% power. The adaptive sample size re-estimation is suited to this purpose.
In this approach we start out with a sample size of 700 subjects, but take an interim
look after data are available on 400 subjects. The purpose of the interim look is not to
stop the trial early but rather to examine the interim data and continue enrolling past
the planned 700 subjects if the interim results are promising enough to warrant the
additional investment of sample size. This strategy has the advantage that the sample
size is finalized only after a thorough examination of data from the actual study rather
than through making a large up-front sample size commitment before any data are
available. Furthermore, if the sample size may only be increased but never decreased
from the originally planned 700 subjects, there is no loss of efficiency due to overruns.
Suppose the proportions of response on the four arms are as shown below. Update the
Response Generation tab accordingly and also set the seed as 100 in the Simulation
Controls tab.

Run 10000 simulations and save the simulation row to the Library by clicking the icon
in the toolbar.

Notice that the global power has dropped from 80% to 67%. Let us re-estimate the
sample size to achieve the desired power. Add the Sample Size Re-estimation tab by
clicking the corresponding button. A new tab is added as shown below.

SSR At: For a K-look group sequential design, one can decide the time at which the
conditions for adaptation are to be checked and the actual adaptation is to be
carried out. This can be done either at some intermediate look or after some
specified information fraction. The possible values of this parameter depend
upon the user's choice. The default choice for this design is always Look #, and
it is fixed to 1 since this is always a 2-look design.
Target CP for Re-estimating Sample Size: The primary driver for increasing the
sample size at the interim look is the desired (or target) conditional power or
probability of obtaining a positive outcome at the end of the trial, given the data
already observed. For this example we have set the conditional power at the end
of the trial to be 80%. East then computes the sample size that would be required
to achieve this desired conditional power.
Maximum Sample Size if Adapt (multiplier; total): As just stated, a new sample
size is computed at the interim analysis on the basis of the observed data so as to
achieve some target conditional power. However the sample size so obtained
will be overruled unless it falls between pre-specified minimum and maximum
values. For this example, the range of allowable sample sizes is [700, 1400]. If
the newly computed sample size falls outside this range, it will be reset to the
appropriate boundary of the range. For example, if the sample size needed to
achieve the desired 80% conditional power is less than 700, the new sample size
will be reset to 700. In other words we will not decrease the sample size from
what was specified initially. On the other hand, the upper bound of 1400 subjects
demonstrates that the sponsor is prepared to increase the sample size up to
double the initial investment in order to achieve the desired 80% conditional
power. But if 80% conditional power requires more than 1400 subjects, the
sample size will be reset to 1400, the maximum allowed.
Promising Zone Scale: One can define the promising zone as an interval based on
conditional power, test statistic, or estimated δ. The input fields change
according to this choice. The decision of altering the sample size is taken based
on whether the interim value of conditional power / test statistic / δ lies in this
interval or not. Let us keep the default scale which is Conditional Power.
Promising Zone: Minimum/Maximum Conditional Power (CP): The sample size
will only be altered if the estimate of CP at the interim analysis lies in a
pre-specified range, referred to as the “Promising Zone”. Here the promising
zone is 0.30 − 0.80. The idea is to invest in the trial in stages. Prior to the
interim analysis the sponsor is only committed to a sample size of 700 subjects.
If, however, the results at the interim analysis appear reasonably promising, the
sponsor would be willing to make a larger investment in the trial and thereby
improve the chances of success. Here we have somewhat arbitrarily set the lower
bound for a promising interim outcome to be CP = 0.30. An estimate

CP < 0.30 at the interim analysis is not considered promising enough to
warrant a sample size increase. It might sometimes be desirable to also specify
an upper bound beyond which no sample size change will be made. Here we
have set that upper bound of the promising zone at CP = 0.80. In effect we
have partitioned the range of possible values for conditional power at the interim
analysis into three zones; unfavorable (CP < 0.3), promising
(0.3 ≤ CP < 0.8), and favorable (CP ≥ 0.8). Sample size adaptations are
made only if the interim CP falls in the promising zone at the interim analysis.
The promising zone defined on the Test Statistic scale or the Estimated δ scale
works similarly.
SSR Function in Promising Zone: The behavior in the promising zone can either be
defined by a continuous function or a step function. The default is continuous,
where East accepts the two quantities (Multiplier, Target CP) and re-estimates
the sample size depending upon the interim value of the CP/test statistic/effect
size. The SSR function can be defined as a step function as well. This can be
done with a single piece or with multiple pieces. For each piece, define the step
function in terms of:
- the interval of CP/test statistic/effect size (this depends upon the choice of
promising zone scale), and
- the value of the re-estimated sample size in that interval.
For a single piece, just the total re-estimated sample size is required as input.
If the interim value of the CP/test statistic/effect size lies in the promising zone,
then the re-estimation will be done using this step function.
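To summarize the zone logic in code, here is a minimal R sketch of the
promising-zone rule on the conditional power scale. This is our own helper, not East's
internal code; n_target_cp stands in for the sample size East computes from the
interim data to achieve the target conditional power.

```r
# Promising-zone sample size re-estimation rule (illustrative sketch).
promising_zone_ssr <- function(cp, n_target_cp, n_planned = 700,
                               multiplier = 2, cp_min = 0.30, cp_max = 0.80) {
  if (cp < cp_min || cp >= cp_max) {
    return(n_planned)                 # unfavorable or favorable zone: no change
  }
  min(max(n_target_cp, n_planned),    # never decrease below the planned size
      multiplier * n_planned)         # cap at the allowed maximum (here 1400)
}

promising_zone_ssr(cp = 0.55, n_target_cp = 1100)  # promising zone: returns 1100
promising_zone_ssr(cp = 0.20, n_target_cp = 2000)  # unfavorable zone: returns 700
```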
Let us set the inputs on the Sample Size Re-estimation tab as shown below:

Run 10000 simulations and see the Details. Just for comparison purposes, re-run the
simulations, but this time set the multiplier in the Sample Size Re-estimation tab to 1,
which means we are not interested in sample size re-estimation. Both scenarios can
also be run by entering the two values 1, 2 in the cell for Multiplier.

With Sample Size Re-estimation

Without Sample Size Re-estimation

We observe from the table that the power of the adaptive implementation is
approximately 75%, which is almost an 8% improvement over the non-adaptive design.
This increase in power has come at an average cost of 805 − 700 = 105 additional
subjects. Next, we observe from the Zone-wise Averages table that 1563 of 10000
trials (16%) underwent sample size re-estimation, and of those 1563 trials, 84% were
able to reject the global null hypothesis. The average sample size, conditional on
adaptation, is 1376.

29.4 Adding Early Stopping Boundaries

One can also incorporate stopping boundaries to stop early at the interim analysis for
efficacy or futility. The efficacy boundary can be defined on the Adjusted p-value
scale, whereas the futility boundary can be on the Adjusted p-value or δ scale.
Click the button on the bottom left corner of the screen. This will take you back to the
input window of the last simulation scenario. Go to the Boundary tab and set the
Efficacy and Futility boundaries to "Adjusted p-value". These boundaries are for early
stopping at look 1. As the note on this tab says:
- If any one adjusted p-value is ≤ the efficacy p-value boundary, then stop the trial
for efficacy.
- If all the adjusted p-values are > the futility p-value boundary, then stop the trial
for futility. Else carry forward all the treatments to the next step of treatment
selection.
Stopping early for efficacy or futility is a step which is carried out before the treatment
selection rules are applied. The simulation output has the same interpretation as above,
except that the Lookwise Summary table may have some trials stopped at the first
look due to efficacy or futility.

29.5 Monitoring this trial

Select the simulation node with the SSR implementation and click the icon. It will
invoke the Interim Monitoring dashboard. Click the icon to open the Test Statistic
Calculator. Enter the data as shown below:

Click Recalc to calculate the test statistic as well as the raw p-values. Notice that the
p-value for 1mg is 0.095, which is greater than 0.025. We will drop this dose in the
second stage. On clicking OK, the dashboard is updated.

Open the test statistic calculator for the second look, enter the following information,
and drop the dose 1mg. Click Recalc to calculate the test statistic as well as the raw
p-values.

On clicking OK, the dashboard is updated. Observe that the adjusted p-value for 10mg
crosses the efficacy boundary. This can also be observed in the Stopping Boundaries
chart.


30 Binomial Superiority Regression

30.1 Logistic Regression with Single Normal Covariate

Logistic regression is widely used for modeling the probability of a binary response in
the presence of covariates. In this section we will show how East may be used to
design clinical trials with binomial endpoints, while adjusting for the effects of
covariates through the logistic regression model. The sample size calculations for the
logistic regression models discussed here and implemented in East are based on the
methods of Hsieh et al., 1997. We note, however, that these methods are limited to
continuous covariates only. When the covariate is normal, the log odds value β1 is zero
if and only if the group means between the two response categories are the same
assuming equal variances.
Suppose in a logistic regression model, Y is a binary response variable and X_1 is a
covariate related to Y. The model is given by

\log\left(\frac{P}{1 - P}\right) = \beta_0 + \beta_1 X_1    (30.1)

where P = P(Y = 1). The null hypothesis that the coefficient of the covariate, β1, is
zero is tested against the two-sided alternative hypothesis that β1 is not equal to zero.
The slope coefficient β1 is the change in log odds for every one unit increase in X_1.
The sample size required for a two-sided test with type-I error rate α to have power
1 − β is

n = \frac{\left(Z_{1-\alpha/2} + Z_{1-\beta}\right)^2}{P_1\left(1 - P_1\right)\beta^{*2}}    (30.2)

where β* is the effect size to be tested, P_1 is the event rate at the mean of X, and Z_u
is the u-th percentile of the standard normal distribution.

30.1.1 Trial Design

We use a Department of Veterans Affairs Cooperative Study entitled 'A
Psychophysiological Study of Chronic Post-Traumatic Stress Disorder' to illustrate the
preceding sample size calculation for logistic regression with continuous covariates.
The study developed and validated a logistic regression model to explore the use of
certain psychophysiological measurements for the prognosis of combat-related

post-traumatic stress disorder (PTSD). In the study, patients' four psychophysiological
measurements (heart rate, blood pressure, EMG and skin conductance) were recorded
while patients were exposed to video tapes containing combat and neutral scenes.
Among the psychophysiological variables, the difference of the heart rates obtained
while viewing the combat and the neutral tapes (DCNHR) is considered a good
predictor of the diagnosis of PTSD. The prevalence rate of PTSD among the Vietnam
veterans was assumed to be 20 per cent. Therefore, we assumed a four to one sample
size ratio for the non-PTSD versus PTSD groups. The effect size of DCNHR is
approximately 0.3, which is the difference of the group means divided by the standard
deviation. We would like to determine the sample size to achieve 90% power based on
a two-sided test at significance level 0.05 (Hsieh et al., 1998).
Start East. Click the Design tab, then click Regression in the Discrete group, and then
click Logistic Regression - Odds Ratio.
The input dialog box, with default input values, will appear in the upper pane of this
window. Enter 0.2 in the Proportion Success at X = µ (P0) field and 1.349 in the
Odds Ratio P1(1 − P0)/P0(1 − P1) field.
Enter the rest of the inputs as shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed sample
size highlighted in yellow. A total of 733 subjects must be enrolled in order to achieve
90% power under the alternative hypothesis. Besides sample size, one can also
compute the power and the level of significance for this design.
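As a rough cross-check of this figure, equation (30.2) can be evaluated directly. Here
is a minimal R sketch, under the assumption that the effect size β* is taken as the log
of the specified odds ratio (log 1.349 ≈ 0.3, the effect size quoted above):

```r
# Sample size from equation (30.2) for logistic regression with a single
# normal covariate (illustrative transcription of the printed formula).
n_logistic <- function(p1, beta_star, alpha = 0.05, power = 0.9) {
  ceiling((qnorm(1 - alpha / 2) + qnorm(power))^2 /
          (p1 * (1 - p1) * beta_star^2))
}

# Event rate 0.2 at the covariate mean, odds ratio 1.349:
n_logistic(p1 = 0.2, beta_star = log(1.349))  # gives 733
```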

You can select this design by clicking anywhere on the row in the Output Preview. If
you click on the icon, some of the design details will be displayed in the upper pane.

In the Output Preview toolbar, click the icon to save this design to workbook Wbk1 in
the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear
that summarizes the input parameters of the design.
With Des1 selected in the Library, click the icon to see the detailed output as shown
below:


Observe that this output also includes a summary of the design.
With Des1 selected in the Library, click the icon on the Library toolbar, and then
click Power vs. Sample Size. The resulting power curve for this design is shown. You
can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking
Save As.... For now, you may close the chart before continuing.


31 Agreement

31.1 Cohen's Kappa

In some experimental situations, to check inter-rater reliability, independent sets of
measurements are taken by more than one rater and the responses are checked for
agreement. For a binary response, Cohen's Kappa test for binary ratings can be used to
check inter-rater reliability. Conventionally, the kappa coefficient is used to express the
degree of agreement between two raters when the same two raters rate each of a
sample of n subjects independently, with the ratings being on a categorical scale
consisting of k categories (Fleiss, 1981). A simple example is given in the table below,
where two tests, Test 1 and Test 2 (k = 2), were performed. In the table, πij denotes the
true population proportion in the i-th row and the j-th column category.

Table 31.1: Table of proportions of two raters

Test 1 \ Test 2        Test 2(+)   Test 2(-)   Marginal Probability
Test 1(+)              π11         π12         π1.
Test 1(-)              π21         π22         π2.
Marginal Probability   π.1         π.2         1

The Kappa coefficient (κ) is defined by

\kappa = \frac{\pi_0 - \pi_e}{1 - \pi_e}    (31.1)

where \pi_0 = \sum_{i=1}^{2} \pi_{ii} and \pi_e = \sum_{i=1}^{2} \pi_{i.}\,\pi_{.i}.

We want to test the null hypothesis H0: κ ≤ κ0 against H1: κ > κ0, where κ0 > 0.
The total sample size required for a test with type-I error rate α to have power 1 − β is

n = \frac{(z_\alpha + z_\beta)^2 \, (E + F - G)}{\left[(1 - \pi_e)^2 (\kappa - \kappa_0)\right]^2}    (31.2)

where

E = \sum_{i=1}^{2} \pi_{ii}\left[(1 - \pi_e) - (\pi_{.i} + \pi_{i.})(1 - \pi_0)\right]^2    (31.3)

F = (1 - \pi_0)^2 \sum_{i=1}^{2} \sum_{j \neq i} \pi_{ij}\,(\pi_{.i} + \pi_{j.})^2    (31.4)

and

G = \left[\pi_0 (1 + \pi_e) - 2\pi_e\right]^2    (31.5)
You can calculate the power, sample size, or level of significance for Cohen's Kappa
test with two ratings.
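As a direct transcription of equations (31.2)-(31.5), here is a minimal R sketch. How
East evaluates the variance terms internally (for example, under the null versus the
alternative, as in equation (31.8) of the next section) is not spelled out here, so treat
this as illustrative rather than a reproduction of East's output.

```r
# Sample size for Cohen's kappa with two binary ratings, equations (31.2)-(31.5).
# tab: 2 x 2 table of cell proportions (rows: rater 1, columns: rater 2).
kappa_n <- function(tab, kappa0, alpha = 0.05, power = 0.8) {
  row_m <- rowSums(tab); col_m <- colSums(tab)
  pi0 <- sum(diag(tab))               # observed agreement
  pie <- sum(row_m * col_m)           # chance agreement
  kappa <- (pi0 - pie) / (1 - pie)    # equation (31.1)
  E <- sum(diag(tab) * ((1 - pie) - (col_m + row_m) * (1 - pi0))^2)
  F <- (1 - pi0)^2 * (tab[1, 2] * (col_m[1] + row_m[2])^2 +
                      tab[2, 1] * (col_m[2] + row_m[1])^2)
  G <- (pi0 * (1 + pie) - 2 * pie)^2
  za <- qnorm(1 - alpha)              # one-sided test, as in H1: kappa > kappa0
  zb <- qnorm(power)
  ceiling((za + zb)^2 * (E + F - G) / ((1 - pie)^2 * (kappa - kappa0))^2)
}

# Cell proportions mirroring the CT-scan example below (8% positive per
# rater, 5% jointly positive):
kappa_n(matrix(c(0.05, 0.03, 0.03, 0.89), 2, 2, byrow = TRUE), kappa0 = 0.9)
```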

31.1.1 Trial Design

Consider responses from two raters. The example is based on a study to develop and
validate a set of clinical criteria to identify patients with minor head injury who do not
need to undergo a CT scan (Haydel et al., 2000). In the study, each CT scan was first
reviewed by a staff neuroradiologist. An independent staff radiologist then reviewed
50 randomly selected CT scans, and the two sets of responses were checked for
agreement. Let κ denote the level of agreement. The null hypothesis is H0: κ = 0.9
versus the one-sided alternative hypothesis H1: κ < 0.9. We wish to compute the
power of the test at the alternative value κ1 = 0.6. We expect each rater to identify 8%
of the CT scans as positive, and we expect 5% of the CT scans to be rated positive by
both raters.
Start East. Click the Design tab, then click Agreement in the Discrete group, and
then click Cohen's Kappa (Two Binary Ratings).


The input dialog box, with default input values, will appear in the upper pane of this
window. Enter 0.9 in the Null Agreement (κ0) field. Specify α = 0.05, the sample
size, and the kappa parameter values as shown below. Enter the rest of the inputs as
shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed power
highlighted in yellow. The power of the test is 64.9% given a sample size of 50 scans
to establish agreement of ratings by the two radiologists. Besides power, one can also
compute the sample size for this study design.

You can select this design by clicking anywhere on the row in the Output Preview. If
you click the icon, some of the design details will be displayed in the upper pane. In
the Output Preview toolbar, click the icon to save this design to workbook Wbk1 in
the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear
that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon on the Library toolbar, and then
click Power vs. Sample Size. The resulting power curve for this design is shown. You
can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking
Save As.... For now, you may close the chart before continuing.


31.2  Cohen's Kappa (C Ratings)

Let κ denote the measure of agreement between two raters who each classify n
objects into C mutually exclusive ratings (categories). Here the null hypothesis
H0 : κ = κ0 is tested against the two-sided alternative H1 : κ ≠ κ0 or a one-sided
alternative H1 : κ > κ0 or H1 : κ < κ0 . The total sample size required for a test with
type-I error rate α to have power 1 − β when κ = κ1 is

    n ≥ [ (Z1−α max τ(κ̂ | κ = κ0) + Z1−β max τ(κ̂ | κ = κ1)) / (κ1 − κ0) ]²   (31.6)

where

    τ(κ̂) = (Q1 + Q2 − 2Q3 − Q4)^(1/2) / (1 − πe)²                             (31.7)

and

    Q1 = π0 (1 − πe)²,
    Q2 = (1 − π0)² Σ_{i=1}^{C} Σ_{j=1}^{C} πij (πi. + π.j)²,
    Q3 = 2(1 − π0)(1 − πe) Σ_{i=1}^{C} πii (πi. + π.i),
    Q4 = (π0 πe − 2πe + π0)².
πij is the proportion of subjects that Rater 1 places in category i but Rater 2 places in
category j, π0 is the proportion of agreement and πe is the expected proportion of
agreement.
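The following sketch assembles τ(κ̂) from the Q terms for a given C × C table of cell
proportions. Note that East maximizes τ over tables consistent with the hypothesized
κ values, which this illustration does not attempt; the function name is ours.

    import numpy as np

    def tau_hat(p):
        """tau(kappa-hat) from equation (31.7) for a C x C table p of
        true cell proportions pi_ij."""
        p = np.asarray(p, dtype=float)
        row, col = p.sum(axis=1), p.sum(axis=0)        # pi_i. and pi_.j
        p0, pe = np.trace(p), float(row @ col)
        Q1 = p0 * (1 - pe) ** 2
        Q2 = (1 - p0) ** 2 * np.sum(p * np.add.outer(row, col) ** 2)
        Q3 = 2 * (1 - p0) * (1 - pe) * np.sum(np.diag(p) * (row + col))
        Q4 = (p0 * pe - 2 * pe + p0) ** 2
        return np.sqrt(Q1 + Q2 - 2 * Q3 - Q4) / (1 - pe) ** 2

    # With tau0 = max tau under kappa0 and tau1 = max tau under kappa1,
    # equation (31.6) gives n >= ((z_a*tau0 + z_b*tau1) / (k1 - k0))**2.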
The power of the test is given by

    Power = Φ[ (√n (κ1 − κ0) − Z1−α max τ(κ̂ | κ = κ0)) / max τ(κ̂ | κ = κ1) ]   (31.8)

31.2.1  Trial Design

Consider a hypothetical problem of physical health ratings from two different
raters: a health instructor and the subject's general practitioner. A total of 360 subjects
were randomly selected, and the two sets of responses were checked for agreement. Let κ
denote the level of agreement. The null hypothesis is H0 : κ = 0.6 versus the one-sided
alternative hypothesis H1 : κ < 0.6. We wish to compute the power of the test at the
alternative value κ1 = 0.5.

Table 31.2: Contingency Table

General Practitioner \ Health Instructor   Poor   Fair   Good   Excellent   Total
Poor                                          2     12      8           0      22
Fair                                          9     35     43           7      94
Good                                          4     36    103          40     183
Excellent                                     1      8     30          22      61
Total                                        16     91    184          69     360

Start East. Click the Design tab, then click Agreement in the Discrete group, and then
click Cohen's Kappa (Two Categorical Ratings).


The input dialog box, with default input values, will appear in the upper pane of this
window. Enter the Number of Ratings (C) as 4. Enter 0.6 in the Null Agreement (κ0)
field and 0.5 in the Alternative Agreement (κ1) field. Click Marginal Probabilities and
specify the marginal probabilities calculated from the above table. Specify the sample
size. Leave all other values as defaults, and click Compute.

The design output will be displayed in the Output Preview, with the computed power
highlighted in yellow. The power of the test is 73.3% given a sample size of 360
subjects to establish agreement of ratings by the two raters. Besides power, one can
also compute the sample size for this study design.

You can select this design by clicking anywhere on the row in the Output Preview. If
you click the icon, some of the design details will be displayed in the upper pane. In
the Output Preview toolbar, click the icon to save this design to workbook Wbk1 in
the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear
that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon on the Library toolbar, and then
click Power vs. Sample Size. The resulting power curve for this design is shown. You
can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking
Save As.... For now, you may close the chart before continuing.

32  Dose Escalation

This chapter deals with the design, simulation, and interim monitoring of Phase 1 dose
escalation trials. A brief overview of the designs is given below; more technical details
are available in Appendix N.
One of the primary goals of Phase I trials in oncology is to find the maximum tolerated
dose (MTD). Currently, the vast majority of such trials employ traditional dose
escalation methods such as the 3+3 design. The 3+3 design starts by allocating three
patients, typically at the lowest dose level, and then adaptively moves up and down in
subsequent cohorts until either the MTD is obtained, or the trial is stopped for
excessive toxicity. In addition to the 3+3, East also provides the Continual
Reassessment Method (CRM), the modified Toxicity Probability Interval (mTPI)
method, and the Bayesian logistic regression model (BLRM) for single agent designs.
Compared to the 3+3, these modern methods may offer a number of advantages, which
can be explored systematically via simulation and interim monitoring.
The CRM (Goodman et al., 1995; O’Quigley et al., 1990) is a Bayesian model-based
method that uses all available information from all doses to guide dose assignment.
One first specifies a target toxicity, a one-parameter dose response curve and
corresponding prior distribution. The posterior mean and predictions for the
probability of toxicity at each dose are updated as the trial progresses. The next
recommended dose is the one whose toxicity probability is closest to the target toxicity.
The mTPI method (Ji et al., 2010) is Bayesian like the CRM, but rule-based like the
3+3. In this way, the mTPI represents a useful compromise between the other methods.
An independent beta distribution is assumed for the probability of toxicity at each
dose. A set of decision intervals are specified, and subsequent dosing decisions (up,
down, or stay) are determined by computing the normalized posterior probability in
each interval at the current dose. The normalized probability for each interval is known
as the unit probability mass (UPM).
A more advanced version of the CRM is the BLRM (Neuenschwander et al., 2008;
Sweeting et al., 2013), which assumes a two-parameter logistic dose response curve.
In addition to a target toxicity, one specifies a set of decision intervals, and optional
associated losses, for guiding dosing decisions.
For dual-agent combination designs, East provides a combination version of the
BLRM (Neuenschwander et al., 2014), as well as the PIPE (product of independent
beta probabilities escalation) method (Mander & Sweeting, 2015).

32.1  3+3

32.1.1 Simulation
32.1.2 Interim Monitoring

32.1.1  Simulation

Click Discrete: Dose Escalation on the Design tab, and then click Single Agent
Design: 3+3.

This window is the Input dialog box, which is separated into three tabs: Design
Parameters, Response Generation, and Simulation Control. First, you may specify
the Max. Number of Doses as 7.
In the Design Parameters tab, enter 30 as the Max. Sample Size. For the 3+3 design,
the Cohort Size is fixed at 3.
There are three variants of 3+3 offered: L, H, and L (modified). The key differences
between these variants can be seen in the respective Decision Rules table. Select 3+3
L.

You also have the option of starting with an Accelerated Titration design (Simon et al.,
1997), which escalates with single-patient cohorts until the first DLT is observed, after
which the cohort is expanded at the current dose level with two more patients.
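For intuition, here is a minimal Python sketch of the conventional 3+3 rule (roughly
the "L" flavor). East's actual decision table, sample-size cap, and tie-breaking details
may differ, and the toxicity profile at the end is illustrative only.

    import random

    def three_plus_three(true_tox, seed=None):
        """Conventional 3+3: 0/3 DLTs -> escalate; 1/3 -> expand to 6;
        <=1/6 -> escalate; otherwise stop with MTD = next lower dose.
        Returns the MTD index, or None if the lowest dose is too toxic.
        (The tutorial's 30-subject cap is omitted for brevity.)"""
        rng = random.Random(seed)

        def cohort(d):                      # DLTs among 3 subjects at dose d
            return sum(rng.random() < true_tox[d] for _ in range(3))

        d = 0
        while True:
            dlt = cohort(d)
            if dlt == 1:
                dlt += cohort(d)            # expand to 6 subjects
            if dlt <= 1:
                if d == len(true_tox) - 1:
                    return d                # highest dose tolerated
                d += 1                      # escalate
            else:
                return d - 1 if d > 0 else None

    tox = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45]   # illustrative profile
    mtds = [three_plus_three(tox, seed=s) for s in range(10_000)]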
In the Response Generation tab, you can specify a set of true dose response curves
from which to simulate. For the Starting Dose, select the lowest dose (5). In the row
titled Dose, you can specify the dose levels (e.g., in mg). In the row titled GC1, you
can edit the true probabilities of toxicity at each dose. You can also rename the profile
by directly editing that cell. For now, leave all entries at their default values.
You can add a new profile generated from a parametric curve family. For example,
click on the menu Curve Family and select Emax. You may construct a

four-parameter Emax curve by adjusting its parameters, then click Add Profile.

Click Plot Profiles to plot the two dose toxicity curves in this grid.

In the Simulation Control tab, check the boxes corresponding to Save summary
statistics and Save subject-level data. These options will provide access to several
charts derived from these more detailed levels of simulated data. If you wish to display
subject-level plots for more than one simulation, you can increase the number. For
now, leave this at 1 to save computation time.

You may also like to examine the Local Options button of the input window toolbar.
This gives you different options for computing average allocations for each dose.

Click Simulate. East will simulate data generated from the two profiles you specified,
and apply the 3+3 design to each simulation data set. Once completed, the two
simulations will appear as two rows in the Output Preview. Select both rows in the
Output Preview and click the icon in the toolbar. The two simulations will be
displayed side by side in the Output Summary.

In the Output Preview toolbar, click the icon to save both simulations to the
Library. Double-click Sim1 in the Library to display the simulation output details.

With Sim1 selected in the Library, click the Plots icon to access a wide range of
available plots. Below is an example of the MTD plot, showing the percentage of
simulations in which each dose level was selected as the MTD. The "true" MTD is
displayed as the second dose level. This is the dose whose true probability of DLT
(0.1) was closest to, and below, the target probability (1/6).

Another useful plot shows the distribution of possible sample sizes, as percentages
over all simulations.

Close each plot after viewing, or save them by clicking Save in Workbook.
Finally, to save the workbook to disk, right-click Wbk1 in the Library and then click
Save As....

32.1.2  Interim Monitoring

Right-click one of the Simulation nodes in the Library, and select Interim
Monitoring. This will open an empty interim monitoring dashboard.

Click Enter Interim Data to open a window in which to enter data for the first cohort:
in particular, the Dose Assigned and the DLTs Observed. Click OK to continue.

The dashboard will be updated accordingly, and the next Recommended Dose is 10.

Click Enter Interim Data again, with 10 selected as Dose Assigned, enter 2 for DLTs
Observed, and click OK.


East now recommends de-escalation to 5.

Click Enter Interim Data, with 5 selected as Dose Assigned, enter 1 for DLTs
Observed, and click OK.
East recommends that you stop the trial.

Click Stop Trial to generate a table for final inference.

32.2  Continual Reassessment Method (CRM)

32.2.1 Simulation
32.2.2 Interim Monitoring

32.2.1  Simulation

Click Discrete: Dose Escalation on the Design tab, and then click Single Agent
Design: Continual Reassessment Method.

This window is the Input dialog box, which is separated into four tabs: Design
Parameters, Stopping Rules, Response Generation, and Simulation Control.
In the Design Parameters tab, enter 30 as the Maximum Sample Size, and 3 for
Cohort Size. If you were to check the box Start With, then you would be simulating
from the 3+3 or Accelerated Titration design first, before switching to the CRM. For
this tutorial, however, leave the box unchecked.
Enter 0.25 for the Target Probability of Toxicity, and 0.3 for the Target Probability
Upper Limit. This will ensure that the next dose assignment is that whose posterior
mean toxicity probability is closest to 0.25, and below 0.3.

Click the Posterior Sampling... button. By default, CRM requires the posterior mean
only. If instead you wish to sample from the posterior distribution (using a
Metropolis-Hastings algorithm), you will be able to compute and plot the posterior
probabilities of being the MTD for each dose. Note that this option will increase the
simulation time.

Click the Dose Skipping... button. As was recommended in later variations of CRM,
in the interests of promoting safety, leave the default options: No untried doses will be
skipped while escalating, and no dose escalation will occur when the most recent
subject experienced a DLT.

For Model Type, select Power, with a Gamma(α = 1,β = 1) prior for θ. Other model
types available include the Logistic and the Hyperbolic Tangent. Finally, for
the prior probabilities of toxicity of all doses (known as the skeleton), enter: 0.05, 0.1,

0.2, 0.3, 0.35, 0.4, and 0.45.
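To see what is happening behind the scenes, here is a sketch of the posterior mean
calculation, assuming the power model πd(θ) = p_d^θ with a Gamma(1, 1) (i.e.,
exponential) prior on θ; East's internal parameterization and numerics may differ,
and the interim data below are invented.

    import numpy as np

    def crm_posterior_means(skeleton, n, y, theta=np.linspace(1e-4, 8, 2001)):
        """Posterior mean P(DLT) at each dose for a power-model CRM,
        pi_d(theta) = p_d**theta, Gamma(1,1) prior on theta.
        n, y: subjects and DLTs observed so far at each dose."""
        p = np.asarray(skeleton, dtype=float)
        n, y = np.asarray(n, dtype=float), np.asarray(y, dtype=float)
        probs = p[None, :] ** theta[:, None]       # pi_d(theta) on a grid
        like = np.prod(probs ** y * (1 - probs) ** (n - y), axis=1)
        w = like * np.exp(-theta)                  # likelihood x prior
        w /= w.sum()                               # uniform grid, dx cancels
        return w @ probs                           # E[pi_d | data]

    skeleton = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.45]
    post = crm_posterior_means(skeleton, n=[3, 3, 0, 0, 0, 0, 0],
                               y=[0, 1, 0, 0, 0, 0, 0])
    # Next dose: posterior mean closest to 0.25, subject to the 0.3 upper limit.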

Click the icon to generate a chart of the 95% prior intervals at each dose for the
probability of DLT.

In the Stopping Rules tab, you may specify various rules for stopping the trial. Enter
the inputs as shown below.

The early stopping rules are divided into two types: Those where the MTD is not
determined, and those where the MTD is determined. The former case may arise when
the MTD is estimated to be below the lowest dose or above the highest dose. Thus, if
the posterior probability of overdosing (toxicity at the lowest dose is greater than target
toxicity) exceeds 0.8, then the trial will be stopped. Similarly, if the posterior
probability of underdosing (toxicity at the highest dose is lower than target toxicity)
exceeds 0.9, then the trial will be stopped. A minimum of 6 subjects will need to be
observed on a dose before either of these two rules is activated. A further stopping rule
is based on the Allocation Rule: If the number of subjects already allocated to the
current MTD is at least 9, the trial will be stopped.
In the Response Generation tab, you can specify a set of true dose response curves
from which to simulate. For the Starting Dose, select the lowest dose (5). Leave the
default profile as shown below. If you wish to edit or add additional profiles (dose

response curves), see the corresponding section for the 3+3 design.

In the Simulation Control tab, check the boxes corresponding to Save summary
statistics and Save subject-level data. These options will provide
access to several charts derived from these more detailed levels of simulated data. If
you wish to display subject-level plots for more than one simulation, you can increase
the number. For now, leave this at 1 to save computation time.
Click Simulate to simulate the CRM design. In the Output Preview toolbar, click the
icon to save the simulation to the Library. Double-click the simulation node in
the Library to display the simulation output details. Click the Plots icon in the
Library to access a wide range of available plots.
Below is an example of the MTD plot, showing the percentage of simulations in which
each dose level was selected as the MTD. The true MTD is displayed as the third dose level
(15). This is the dose whose true probability of DLT (0.2) was closest to and below the

target probability (0.25).

32.2.2  Interim Monitoring

Right-click the Simulation node in the Library, and select Interim Monitoring. This
will open an empty interim monitoring dashboard.
Click Enter Interim Data to open a window in which to enter data for the first cohort:
in particular, the Dose Assigned and the DLTs Observed. Click OK to continue.
Continue in this manner by clicking Enter Interim Data, entering the following
doses, and the corresponding number of DLTs.

If you click Display by Dose, you will see the data grouped by dose level. You may
click Display by Cohort to return to the original view.

After each cohort, East will update the Interim Monitoring Dashboard. You may
replace the IM dashboard plots with any other plots or corresponding tables, by
clicking on the associated icons at the top left of each panel.

At this point, East recommends that you stop the trial.

Click Stop Trial to generate a table for final inference.


32.3  modified Toxicity Probability Interval (mTPI)

32.3.1 Simulation
32.3.2 Interim Monitoring

32.3.1  Simulation

Click Discrete: Dose Escalation on the Design tab, and then click Single Agent
Design: Modified Toxicity Probability Interval.

This window is the Input dialog box, which is separated into five tabs: Design
Parameters, Stopping Rules, Trial Monitoring Table, Response Generation, and
Simulation Control.
In the Design Parameters tab, enter 30 as the Maximum Sample Size, and 3 for
Cohort Size. If you were to check the box Start With, then you would be simulating
from the 3+3 or Accelerated Titration design first, before switching to the mTPI. For
this tutorial, however, leave the box unchecked.
Enter 0.25 for the Target Probability of Toxicity, 0.2 for the upper limit of the
Underdosing interval, and 0.3 for the upper limit of the Proper dosing interval.

These entries imply that toxicity probabilities within this interval [0.2 to 0.3] can be
regarded as equivalent to the target toxicity (0.25) as far as dosing decisions are
concerned. Finally, we will assume a uniform Beta(a = 1, b = 1) prior distribution for
all doses.

In the Stopping Rules tab, enter the following inputs as below.

For the mTPI design, the stopping rule is based on dose exclusion rules. This states
that if there is greater than a 0.95 posterior probability that toxicity for a given dose is
greater than the target toxicity, that dose and all higher doses will be excluded in
subsequent cohorts. When this dose exclusion rule applies to the lowest dose, then all
doses are excluded, and hence the trial will be stopped for excessive toxicity.
Furthermore, the dose exclusion rule is not activated until at least 3 subjects are
observed on a dose. A similar idea can be applied to the highest dose: If there is a
greater than 95% posterior probability that the toxicity at the highest dose is less than
the target toxicity, then stop the trial early.
The remaining stopping rules allow one to stop the trial early with MTD determined.
The Allocation Rule requires a certain number of subjects already allocated to the
next recommended dose. The CI Rule requires that the credible interval for
probability of DLT at the MTD is within some range. The Target Rule requires that
the posterior probability of being in the target toxicity, or proper dosing interval,
exceeds some threshold. Finally, any of these rules can be combined with Minimum
Ss Observed in the Trial.
In the Trial Monitoring Table tab, you can view the decision table corresponding to
the inputs entered in the previous tabs.

East also provides the option of creating and simulating from a customized trial
monitoring table. If you click Edit Trial Monitoring Table, you can click on any cell

in the grid to edit and change the dose assignment rule for that cell.

In the Response Generation tab, you can specify a set of true dose response curves
from which to simulate. For the Starting Dose, select the lowest dose (5). Leave the
default profile as shown below. If you wish to edit or add additional profiles (dose
response curves), see the corresponding section for the 3+3 design.
In the Simulation Control tab, check the boxes corresponding to Save summary
statistics and Save subject-level data. These options will provide access to several
charts derived from these more detailed levels of simulated data. If you wish to display
subject-level plots for more than one simulation, you can increase the number. For
now, leave this at 1 to save computation time.
Click the Local Options button at the top left corner of the input window toolbar. This
gives you different options for computing average allocations for each dose, and for
computing isotonic estimates. Select the following options and click OK.

Click Simulate to simulate the mTPI design. In the Output Preview toolbar, click the
icon to save the simulation to the Library. Double-click the simulation node in
the Library to display the simulation output details. For example, the true MTD was
D3 (15), and this dose was selected as the MTD most often (43% of the time).

Click the Plots icon in the Library to access a wide range of available plots.

32.3.2  Interim Monitoring

Right-click one of the Simulation nodes in the Library, and select Interim
Monitoring. This will open an empty interim monitoring dashboard.
In the interim monitoring toolbar, click the chart icon, and Trial Monitoring Table to
generate a table to guide dosing decisions for this trial.

Click Enter Interim Data to open a window in which to enter data for the first cohort:

in particular, the Dose Assigned and the DLTs Observed. Click OK to continue.

The dashboard will be updated accordingly. The decision for the next cohort is based
on the highest Unit Probability Mass (UPM): the posterior probability for each toxicity
interval divided by the length of the interval. The underdosing interval corresponds to
an E (Escalate) decision, the proper dosing interval corresponds to an S (Stay)
decision, and the overdosing interval corresponds to a D (De-escalate) decision. In this
case, the UPM for underdosing is highest.

Thus, the recommendation is to escalate to dose 10.
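The UPM calculation itself is simple to reproduce. The sketch below assumes the
tutorial's settings (target 0.25, proper dosing interval 0.2 to 0.3, Beta(1, 1) prior)
and an invented first cohort of 3 subjects with 0 DLTs, consistent with the escalation
shown; the function name is ours.

    from scipy.stats import beta

    def upm(x, n, a=1.0, b=1.0, lo=0.2, hi=0.3):
        """Unit probability mass per interval under the Beta(a+x, b+n-x)
        posterior, for the underdosing (<lo), proper (lo to hi), and
        overdosing (>hi) intervals."""
        post = beta(a + x, b + n - x)
        return {
            "E (escalate)":    post.cdf(lo) / lo,
            "S (stay)":        (post.cdf(hi) - post.cdf(lo)) / (hi - lo),
            "D (de-escalate)": (1 - post.cdf(hi)) / (1 - hi),
        }

    print(upm(x=0, n=3))   # UPM for underdosing is highest, so escalate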
Continue in this manner by entering data for each subsequent cohort, and observe how
the interim monitoring dashboard updates.
One example is given below.

After each cohort, East will update the Interim Monitoring Dashboard. You may
replace the IM dashboard plots with any other plots or corresponding tables, by
clicking on the associated icons at the top left of each panel.

Suppose we wished to end the study after 8 cohorts (24 patients). Click Stop Trial to
end the study and generate a table of final inference.

32.4  Bayesian logistic regression model (BLRM)

32.4.1 Simulation
32.4.2 Interim Monitoring

32.4.1  Simulation

Click Discrete: Dose Escalation on the Design tab, and then click Single Agent
Design: Bayesian Logistic Regression Model.

This window is the Input dialog box, which is separated into four tabs: Design
Parameters, Stopping Rules, Response Generation, and Simulation Control.
In the Design Parameters tab, enter 30 as the Maximum Sample Size, and 3 for
Cohort Size. If you were to check the box Start With, then you would be simulating
from the 3+3 or Accelerated Titration design first, before switching to the BLRM. For
this tutorial, however, leave the box unchecked.
The next step is to choose a Dose Selection Method: either by Bayes Risk or by
Max Target Toxicity. For the next cohort, the Bayes Risk method selects the
dose that minimizes the posterior expected loss (the Bayes risk). In contrast, the Max
Target Toxicity method selects the dose that maximizes the posterior probability of
targeted toxicity. For both methods, the dose selected must satisfy the EWOC
(Escalation With Overdose Control) criterion: the posterior probability of overdosing
(either excessive or unacceptable toxicity) must be less than the threshold (e.g., 0.25).

Recall that the BLRM method applies the following model:

    logit(πd) = log(α) + β log(d/d*)                                    (32.1)

The Reference Dose (D*) is the dose at which the odds of observing a DLT are α.
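A quick way to get a feel for model (32.1) is to push draws from a bivariate normal
prior on (log(α), log(β)) through the dose-toxicity curve. The prior mean, covariance,
dose grid, and reference dose below are placeholders, not East's defaults.

    import numpy as np
    from scipy.special import expit   # the inverse-logit function

    rng = np.random.default_rng(1)
    mean = [np.log(0.1), 0.0]                  # placeholder prior means
    cov = [[1.0, 0.0], [0.0, 0.5]]             # placeholder prior covariance
    doses, d_star = np.array([5.0, 10, 15, 25, 40]), 25.0

    log_a, log_b = rng.multivariate_normal(mean, cov, size=100_000).T
    p = expit(log_a[:, None] + np.exp(log_b)[:, None] * np.log(doses / d_star))
    print(np.percentile(p, [2.5, 50, 97.5], axis=0))   # prior intervals per dose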
Click the Dose Skipping button, and select Allow skipping any doses / No Restrictions.
You can specify the prior directly in terms of a bivariate normal distribution for log(α)

and log(β).

Alternatively, if you click Prior Calculator, a calculator will appear allowing you to
specify a prior indirectly by one of three methods: (1) lowest dose and reference dose,
(2) lowest dose and highest dose, or (3) lowest dose and MTD. Click Recalc to convert
the prior inputs into matching bivariate normal parameter values, and click OK to
paste these values into the input window. Appendix N of the manual and Appendix A
of Neuenschwander et al. (2008) describe some of these methods.

Click the icon to generate a chart of the 95% prior intervals at each dose for the
probability of DLT.

Click Posterior Sampling Methods to select from one of two methods: Metropolis-Hastings,
or direct Monte Carlo. For this tutorial, click OK to select Direct.

In the Stopping Rules tab, you can specify multiple rules for stopping the trial. The
trial is stopped early and MTD not determined if there is evidence of underdosing.
This rule is identical to that from mTPI: If there is a greater than some threshold
posterior probability that the toxicity at the highest dose is less than the target toxicity,
then stop the trial early.
The remaining stopping rules allow one to stop the trial early with MTD determined.
The Allocation Rule requires a certain number of subjects already allocated to the
next recommended dose. The CI Rule requires that the credible interval for
probability of DLT at the MTD is within some range. The Target Rule requires that
the posterior probability of being in the target toxicity exceeds some threshold. Finally,
any of these rules can be combined with Minimum Ss Observed in the Trial. Check
the appropriate boxes and enter values as below.

In the Response Generation tab, you can specify a set of true dose response curves
from which to simulate. For the Starting Dose, select the lowest dose (5). Leave the
default profile as shown below. If you wish to edit or add additional profiles (dose
response curves), see the corresponding section for the 3+3 design.
In the Simulation Control tab, check the boxes corresponding to Save summary
statistics, Save subject-level data, and Save final posterior samples. These options
will provide access to several charts derived from these more detailed levels of
simulated data. If you wish to display subject-level plots, or posterior distribution
plots, for more than one simulation, you can increase the number. For now, leave both
of these at 1 to save computation time.

Click Simulate to simulate the BLRM design. In the Output Preview toolbar, click the
icon to save the simulation to the Library. Double-click the simulation
node in the Library to display the simulation output details. Click the Plots icon in the
Library to access a wide range of available plots.

32.4.2  Interim Monitoring

Right-click the Simulation node in the Library, and select Interim Monitoring. This
will open an empty interim monitoring dashboard. Click Enter Interim Data to open
a window in which to enter data for the first cohort: in particular, the Dose Assigned
and the DLTs Observed. Click OK to continue.

The dashboard will be updated accordingly.
The acceptable dose range is on a continuous scale between the minimum and
maximum doses. The upper limit of the acceptable dose range is the largest dose
whose probability of overdosing is less than the EWOC threshold. The lower limit of
the acceptable range is the dose whose DLT rate is equal to the lower limit of the
targeted toxicity interval. When the computed lower limit exceeds the recommended

dose, it is set to the recommended dose.

In the IM toolbar, click the Plots icon, then Interval Probabilities by Dose and Panel.

Notice that for all doses greater than or equal to 25, the posterior probability of
overdosing exceeds the EWOC threshold (0.25). Of the remaining doses, dose 15
maximizes the probability of targeted toxicity, and is therefore the next recommended

dose.

In the IM toolbar, click the Plots icon, then Predictive Distribution of Number of
DLTs. You can enter a planned cohort size and select a next dose, to plot the posterior
predictive probability of the number of DLTs to be observed in the next cohort.

After each cohort, East will update the Interim Monitoring Dashboard. You may
replace the IM dashboard plots with any other plots or corresponding tables, by
clicking on the associated icons at the top left of each panel.

Continue entering data for each subsequent cohort, and observe how the interim
monitoring dashboard updates. One example is given below.

Click Stop Trial to generate the final inference table.

32.5  Bayesian logistic regression model for dual-combination (comb2BLRM)

32.5.1 Simulation
32.5.2 Interim Monitoring


32.5.1  Simulation

Click Discrete: Dose Escalation on the Design tab, and then click Two Agents

Design: Bayesian Logistic Regression Model for Dual-Combination.

Set the Max. Number of Doses as 4 for both Agent 1 and Agent 2, the Max. Sample
Size as 60, and the Cohort Size as 3.
Set the target toxicity interval to 16-35%, with an EWOC criterion of 0.25. Set the
reference doses to 290 and 20 for Agents 1 and 2, respectively.

Click the button for Dose Skipping. These options imply that the dose of only one
compound can be increased for the next cohort (no diagonal escalation), with a
maximum increment of 100%.

The prior distribution is an extension of that for the single-agent BLRM, but includes a
normal prior for the interaction term. As with the single-agent BLRM, you can use the
calculator to transform prior information on particular dose levels to a bivariate normal
for either Agent 1 or Agent 2. In this tutorial, we will simply enter the following values
adapted from Neuenschwander et al. (2015).

In the Stopping Rules tab, you may specify various rules for stopping the trial. The
logical operators (And/Or) follow left-to-right precedence, beginning with the top-most
rule in the table. The order of the stopping rules is determined by the order of selection.
Enter the following inputs as below. Select the Minimum Ss rule first, followed by the
Target Rule, followed by the Allocation Rule. Be sure to select the appropriate
logical operators. This combination of rules implies the MTD dose combination
declared will meet the following conditions: (1) At least 6 patients have already been
allocated to this combination, and (2) This dose satisfies one of the following: (i) The
probability of targeted toxicity at this combination exceeds 0.5, or (ii) A minimum of
15 subjects have already been observed in the trial.

In the Response Generation tab, enter the following inputs. Make sure that the
starting dose combination is the lowest dose level for each agent.

In the Simulation Control tab, select the following options. In this tutorial, we will
run only 1000 simulations. Click Simulate.

In the Output Preview toolbar, click the icon to save Sim1 to the Library.
Double-click Sim1 in the Library to display the simulation output details.
With Sim1 selected in the Library, click the Plots icon to access a wide range of
available plots. Below is an example of the MTD plot, showing the percentage of
simulations in which each dose combination was selected as the MTD. The combinations
whose true DLT rates were below, within, and above the target toxicity interval
(0.16 − 0.35) are colored blue, green, and red, respectively.

32.5.2  Interim Monitoring

Right-click the Simulation node in the Library, and select Interim Monitoring. This
will open an empty interim monitoring dashboard. Click Enter Interim Data to open a
window in which to enter data for the first cohort: in particular, the Dose Assigned for

each agent, the Subjects Allocated and the DLTs Observed. Click OK to continue.

The next recommended dose is 100 mg for Agent 1 and 20 mg for Agent 2.

Recall that the dose skipping constraints are that the dose increment cannot exceed
100% of the current dose, and that only one compound can be increased. Of the
eligible dose combinations, the recommended one has the highest probability of

targeted toxicity.

You may replace the IM dashboard plots with any other plots or corresponding tables,
by clicking on the associated icons at the top left of each panel. For example, change
the left-hand plot to Dose Limiting Toxicity to view the number of subjects and DLTs

observed at each dose combination.

Continue in this manner by clicking Enter Interim Data, entering the following
doses, and the corresponding number of DLTs.

The recommended MTD combination is 200 mg for Agent 1 and 30 mg for Agent 2.

32.6  Product of Independent beta Probabilities dose Escalation (PIPE)

32.6.1  Simulation

One of the core concepts underlying the PIPE method is the maximum tolerated
contour (MTC), a line partitioning the dose combination space into toxicity
probabilities either less than or greater than the target. The recommended dose
combination at the end of the trial is the dose combination closest from below to the
MTC. The following figures from Mander and Sweeting (2015) illustrate the MTC,
and the related concepts of admissible dose combinations (adjacent or closest) and
dose skipping options (neighborhood vs non-neighborhood constraint).
The figure below shows six monotonic MTCs for two agents, each with two dose
levels.

After each cohort, the most likely contour is selected before applying a dose selection
strategy. The next dose combination is chosen from a set of admissible doses, which
are either closest to the most likely contour, or adjacent. In the figure below, all the (X)
and (+) symbols are considered adjacent. Of these, the (X) symbols represent the
closest doses.


Of the admissible doses, the next dose combination chosen is that with the minimum
sample size, where sample size is defined as the prior and trial sample size
combined. The weighted randomization method selects one of the
admissible doses at random, with selection probabilities weighted by the inverse of
their sample size.
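Both selection strategies are easy to state in code. In this sketch, the admissible set
and its prior and trial sample sizes are invented numbers, purely to show the two rules:

    import numpy as np

    rng = np.random.default_rng(7)

    prior_ss = np.array([0.0625, 0.0625, 0.0625])   # prior sample size per combo
    trial_ss = np.array([3.0, 6.0, 0.0])            # subjects treated so far
    total = prior_ss + trial_ss                     # combined sample size

    chosen_min = int(np.argmin(total))              # minimum sample size rule
    weights = (1 / total) / (1 / total).sum()       # inverse sample size weights
    chosen_rand = rng.choice(len(total), p=weights) # weighted randomization rule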
For dose skipping options, one can choose between a neighborhood constraint, or a
non-neighborhood constraint. The neighborhood constraint restricts the set
of admissible doses to those a single dose level higher or lower than the current dose
for both agents, while the non-neighborhood constraint restricts the set of
admissible doses to a single dose level higher or lower than any previously
administered dose combination.
The figure below illustrates the neighborhood constraint, at two different cohorts.
Only those combinations within the dashed box are admissible. The asterisk symbol
on the left represents the admissible dose combination closest to the MTC.


The figure below illustrates the non-neighborhood constraint. The set of admissible
doses is now larger because all previously administered doses are included.

Finally, there is a safety constraint threshold to avoid overdosing. Averaging over the
posterior distribution of all monotonic contours, the expected probability of being
above the MTC is calculated for all dose combinations. Those dose combinations
whose expected probabilities exceed the safety threshold are excluded from the
admissible set.
Click Discrete: Dose Escalation on the Design tab, and then click Two Agents
Design: Product of Independent Beta Probabilities Dose Escalation.

In the Design Parameters tab, select the following options.

In addition to the Closest and Adjacent options for Admissible Dose
Combinations, there is also an Interval option. This allows you to specify a
margin ε around the target toxicity level to define the admissible dose set, rather than
relying on the MTC. Dose combinations whose posterior mean toxicity risk lies in the
specified target interval (PT ± ε) are considered admissible.
For the prior specification, enter the following values. When entering the same prior
sample size for each dose combination, a value of 1 is considered a strong prior, whereas
a value of 1 divided by the number of combinations can be considered a weak prior
(Mander & Sweeting, 2015).

In the Stopping Rules tab, there are a number of options similar to those from other
designs. However, for this tutorial, leave these options unchecked.

Similarly, leave the default options in the Response Generation tab. In this tutorial,
the true probabilities of toxicity will be in agreement with the prior medians specified
above.

In the Simulation Control tab, you can run 1000 simulations; the PIPE
method runs relatively quickly.

In the Output Preview toolbar, click the
icon to save the simulation to the
Library. Double-click the simulation node in the Library to display the simulation
output details.
In the MTD Analysis table, you can see that the (Agent 1, Agent 2) dose combinations
selected most often as MTD were: (300, 10) at 22.1% and (300, 20) at 20.8%. The true

probabilities of toxicity at these combinations were 0.24 and 0.28, respectively.

32.6.2  Interim Monitoring

Right-click the Simulation node in the Library, and select Interim Monitoring. This
will open an empty interim monitoring dashboard. Click Enter Interim Data to open a
window in which to enter data for the first cohort: in particular, the Dose Assigned for

each agent, the Subjects Allocated and the DLTs Observed. Click OK to continue.

The next recommended dose is 200 mg for Agent 1 and 20 mg for Agent 2.

Recall that the dose skipping constraints allow for diagonal escalation (that is,
escalation on both agents at the same time), but the neighborhood constraint
restricts the set of admissible doses to a single dose level higher or lower than the
current dose. Given these constraints, the dose combination (200, 10) is the only
combination closest to the most probable MTC.
The MTC plot allows you to view the most probable MTC, the current dose, the

recommended dose, and all tried doses.

You may replace the IM dashboard plots with any other plots or corresponding tables,
by clicking on the associated icons at the top left of each panel.
Continue in this manner by clicking Enter Interim Data, entering the following
doses, and the corresponding number of DLTs.

Click Stop Trial. The recommended MTD combination is 200 mg for Agent 1 and 10
mg for Agent 2. The recommended MTD combination must meet three criteria: (i) it is
closest to the MTC from below, (ii) it has been experimented on, and (iii) it is below the
safety threshold. If there is no dose combination satisfying all three criteria, the MTD
will be undetermined.


Volume 4

Exact Binomial Designs

33  Introduction to Volume 8                                         709
34  Binomial Superiority One-Sample – Exact                          714
35  Binomial Superiority Two-Sample – Exact                          736
36  Binomial Non-Inferiority Two-Sample – Exact                      751
37  Binomial Equivalence Two-Sample – Exact                          767
38  Binomial Simon's Two-Stage Design                                774

33  Introduction to Volume 8

This volume describes various cases of clinical trials using binomial endpoints where
the asymptotic normal approximation to the test statistic may fail. This is often the
case when the trial sample size is too small; in such situations, testing and analysis
based on the exact binomial distribution still provide valid results. Asymptotic tests
may also fail when proportions are very close to the boundary, namely zero or one.
These exact methods can also be applied in situations where the normal approximation
is adequate, in which case the exact and asymptotic solutions converge to the same
result.
Using exact computations, Chapter 34 deals with the design and interim monitoring of
a one-sample test of superiority for a proportion. The first section discusses a fixed and
group sequential design in which an observed binomial response rate is compared to a
fixed response rate. The following section illustrates how, for a fixed sample,
McNemar’s conditional test can be used to compare matched pairs of binomial
responses.
Chapters 35 through 37 illustrate how to use East to design two-sample exact tests of
superiority, non-inferiority, and equivalence, including examples for both the difference
and ratio of proportions.
Chapter 38 describes Simon's two-stage design in an exact setting, which computes the
expected minimal sample size of a trial that may be stopped for futility or continue
to a second stage to further study efficacy and safety.
It is important to note that all exact tests work with only integer values for sample size,
and will override the Design Defaults - Common: Do not round off sample
size/events flag in the Options menu. Whenever the Perform Exact Computations
check box is selected in the Design Input Output dialog box, resulting sample sizes
will be converted to an integer value for all computations, including power and
chart/table values.


33.1  Settings

Click the icon in the Home menu to adjust default values in East 6.

The options provided in the Display Precision tab are used to set the decimal places of
numerical quantities. The settings indicated here will be applicable to all tests in East 6
under the Design and Analysis menus.
All these numerical quantities are grouped in different categories depending upon their
usage. For example, all the average and expected sample sizes computed at simulation
or design stage are grouped together under the category "Expected Sample Size". To
view any of these quantities with greater or lesser precision, select the corresponding
category and change the decimal places to any value between 0 and 9.
The General tab provides options for adjusting the paths for storing workbooks, files,
and temporary files. These paths persist across current and future sessions, even after
East is closed. This is also where you specify the installation directory of the R
software in order to use the R Integration feature in East 6.

The Design Defaults tab is where the user can change the settings for trial design:

Under the Common tab, default values can be set for input design parameters.
You can set up the default choices for the design type, computation type, test type and
the default values for type-I error, power, sample size and allocation ratio. When a new
design is invoked, the input window will show these default choices.
Time Limit for Exact Computation
This time limit is applicable only to exact designs and charts. Exact methods are
computationally intensive and can easily consume several hours of computation
time if the likely sample sizes are very large. You can set the maximum time
available for any exact test in terms of minutes. If the time limit is reached, the
test is terminated and no exact results are provided. The minimum and default value
is 5 minutes.
Type I Error for MCP
If the user has selected a 2-sided test as the default in global settings, then any MCP
will use half of the alpha from the settings as its default, since an MCP is always a
1-sided test.
Sample Size Rounding
Notice that by default, East displays the integer sample size (events) by rounding
up the actual number computed by the East algorithm. In this case, the
look-by-look sample size is rounded off to the nearest integer. One can also see
the original floating-point sample size by selecting the option "Do not round
sample size/events".
Under the Group Sequential tab, defaults are set for boundary information. When a
new design is invoked, input fields will contain these specified defaults. We can also
set the option to view the Boundary Crossing Probabilities in the detailed output. It can
be either Incremental or Cumulative.

Simulation Defaults is where we can change the settings for simulation:

If the checkbox for "Save summary statistics for every simulation" is checked, then
East simulations will by default save the per-simulation summary data for all the
simulations in the form of case data.
If the checkbox for "Suppress All Intermediate Output" is checked, the intermediate
simulation output window will always be suppressed and you will be directed to the
Output Preview area.
The Chart Settings tab allows defaults to be set for the following quantities on East 6
charts:

We suggest that you do not alter the defaults until you are quite familiar with the
software.


34  Binomial Superiority One-Sample – Exact

This chapter deals with the design and interim monitoring of tests involving binomial
response rates using exact computations. Section 34.1 discusses a fixed sample and
group sequential design in which an observed binomial response rate is compared to a
fixed response rate. In Section 34.2, McNemar’s conditional test for comparing
matched pairs of binomial responses for a fixed sample is discussed.

34.1  Binomial One-Sample

34.1.1 Trial Design
34.1.2 Interim Monitoring

In experimental situations where the variable of interest has a binomial distribution, it
may be of interest to determine whether the response rate π differs from a fixed value
π0 . Specifically, we wish to test the null hypothesis H0 : π = π0 against one-sided
alternatives of the form H1 : π > π0 or H1 : π < π0 . Either the sample size or power is
determined for a specified value of π which is consistent with the alternative
hypothesis, denoted as π1 .

34.1.1  Trial Design

Consider a single-arm oncology trial designed to determine if the tumor response rate
for a new cytotoxic agent is at least 15%. Thus it is desired to test the null hypothesis
H0 : π = 0.15 against the one-sided alternative hypothesis H1 : π > 0.15. The trial will
be designed using a one-sided test that achieves 80% power at π = π1 = 0.25 with a
level α = 0.05 test.
Single-Look Design
To illustrate this example, in East under the Design ribbon for
Discrete data, click One Sample and then choose Single Arm Design: Single
Proportion:

This will launch the following input window:

Leave the default values of Design Type: Superiority and Number of Looks: 1. In
the Design Parameters dialog box, select the Perform Exact Computations
checkbox and enter the following parameters:
Test Type: 1 sided
Type 1 Error (α): 0.05
Power: 0.8
Sample Size (n): Computed (select radio button)
Prop. Response under Null (π0 ): 0.15
Prop. Response under Alt (π1 ): 0.25

Click Compute. The sample size for this design is calculated and the results are shown
as a row in the Output Preview window:

The sample size required in order to achieve 80% power is 110 subjects. Note that
because of the discreteness involved in performing exact computations, the attained
type-1 error is 0.035, less than the specified value of 0.05. Similarly, the attained
power is 0.81, slightly larger than the specified value of 0.80.
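These exact quantities are straightforward to verify with the binomial distribution.
The following sketch (using scipy, outside of East) reproduces the critical value and
the attained error rates reported above:

    from scipy.stats import binom

    n, p0, p1, alpha = 110, 0.15, 0.25, 0.05
    # Smallest rejection threshold c with exact type-1 error <= alpha:
    c = next(k for k in range(n + 1) if binom.sf(k - 1, n, p0) <= alpha)
    print(c)                           # critical number of responders
    print(binom.sf(c - 1, n, p0))      # attained type-1 error (about 0.035)
    print(binom.sf(c - 1, n, p1))      # attained power (about 0.81)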
As is standard in East, this design has the default name Des 1. To see a summary of
the output of this design, click anywhere in the row and then click the
icon in the
Output Preview toolbar. The design details will be displayed in the upper pane,
labeled Output Summary. This can be saved to the Library by selecting Des 1 and

clicking the icon.

The design details can be displayed by clicking the icon.

The critical point, or the boundary set for the rejection of H0, is 24 (on the # response
scale). Therefore, out of 110 subjects, if the observed number of patients responding to
the new treatment reaches 24, the null hypothesis will be rejected in favor of declaring
the new treatment to be superior. This can also be seen, on both the response scale and
the proportion scale, in either the Stopping Boundaries chart or table, available in the
Library.

Three-Look Design
In order to reach an early decision and enter into comparative
trials, conduct this single-arm study as a group sequential trial with a maximum of 3

looks. Create a new design by selecting Des1 in the Library, and clicking the
icon on the Library toolbar. To generate a study with two interim looks and a final
analysis, change the Number of Looks from 1 to 3. A Boundary Info tab will appear,
which allows the specification of parameters for the Efficacy and Futility boundary
families. By default, an efficacy boundary to reject H0 is selected, however there is no
futility boundary to reject H1 . The Boundary Family specified is of the Spending
Functions type and the default Spending Function is the Lan-DeMets (Lan &
DeMets, 1983), with Parameter OF (O’Brien-Fleming). The default Spacing of
Looks is Equal, therefore the interim analyses will be equally spaced by the number
of patients accrued between looks.

Return to the Design Parameters dialog box. The binomial parameters π0 = 0.15

and π1 = 0.25 are already specified. Click Compute to generate this exact design:

The maximum sample size is again 110 subjects with 110 also expected under the null
hypothesis H0 : π = 0.15 and 91 expected when the true value is π=0.25. Save this
design to the Library.


The details for Des2 can be displayed by clicking the icon.

Here we can see the cumulative sample size and cumulative type 1 error (α) spent at
each of the three looks. The boundaries set for the rejection of H0 at each look are
14, 19, and 24 (on the # response scale). For example, at the second look, with a
cumulative 73 subjects, if the observed number of patients responding to the new
treatment reaches 19, the null hypothesis would be rejected in favor of declaring the
new treatment to be superior. In addition, the incremental boundary crossing
probabilities under both the null and alternative are displayed for each look.
The cumulative boundary stopping probabilities can also be seen in the Stopping
Boundaries chart and table. Select Des 2 in the Library, click the icon and
choose Stopping Boundaries.

The default scale is # Response Scale. The Proportion Scale can also be

chosen from the drop-down list Boundary Scale in the chart.

To examine the Error Spending function, click the icon in the Library and
choose Error Spending.

When the sample size for a study is subject to external constraints, power can be
computed for a specified maximum sample size. Suppose for the previous design the
total sample size is constrained to be at most 80 subjects. Create a new design by
editing Des2 in the Library. Change the parameters so that the trial is now designed to
compute power for a maximum sample size of 80 subjects, as shown below.

The trial now attains only 73.9% power.

Power vs Sample Size: Sawtooth paradigm
Generate the Power vs. Sample Size graph for Des 2. You will get the following power chart, which is commonly described in the literature as a sawtooth chart.

This chart illustrates that it is possible to have designs with different sample sizes but
all with the same power. What is not apparent is that for designs with the same power,
the attained significance level may vary. Upon examination, the sample sizes of 43 and
55 seem to have the same power of about 0.525. The data can also be displayed in a
chart form by selecting the
icon in the Library, and can be printed from here as
well. Compute the power for two new designs based on Des 2 with sample sizes of 43
and 55 respectively.

Although sample sizes of 43 and 55 attain nearly the same power, the attained significance levels are different, at 0.049 and 0.031 respectively. Though both are less than the design specification of 0.05, the plan with the lower sample size of 43 pays a higher penalty in terms of type-1 error than the plan with the larger sample size of 55.
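
The mechanism is easy to see even in a single-look exact test. The sketch below is a simplified fixed-sample analogue (not the three-look computation East performs for Des 2): for each n it finds the smallest critical count that keeps the exact type-1 error at or below 0.05, and the attained alpha and power both jump around as n grows.

    # Sketch: attained alpha and power of a fixed-sample exact binomial
    # test of H0: pi = 0.15 vs H1: pi = 0.25, as a function of n.
    # A single-look analogue of the sawtooth effect shown above.
    from scipy.stats import binom

    pi0, pi1, alpha = 0.15, 0.25, 0.05
    for n in range(40, 61):
        # smallest critical count c with P(X >= c | pi0) <= alpha
        c = next(c for c in range(n + 1) if binom.sf(c - 1, n, pi0) <= alpha)
        print(n, c, round(binom.sf(c - 1, n, pi0), 4),
              round(binom.sf(c - 1, n, pi1), 4))
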

34.1.2 Interim Monitoring

Consider the interim monitoring of Des 2, which has 80% power. Select this design from the Library and click the icon.

Suppose at the first interim look, when 40 subjects have enrolled, the observed
cumulative response is 12. Click the Enter Interim Data button at the top left of the
Interim Monitoring window. Enter 40 for the Cumulative Sample Size and 12 for the Cumulative Response in the Test Statistic Calculator window.

At the second interim monitoring time point, when 80 subjects have enrolled, suppose the cumulative response increases to 20. Again click the Enter Interim Data button
at the top left of the
Interim Monitoring window. Enter 80 for the Cumulative
Sample Size and 20 for the Cumulative Response in the Test Statistic Calculator
window. This will result in the following message:

It can be concluded that π > 0.15 and the trial should be terminated. Clicking on
Stop results in the final analysis.

34.2 McNemar's Conditional Exact Test

McNemar’s conditional test is used in experimental situations where paired
comparisons are observed. In a typical application, two binary response measurements
are made on each subject – perhaps from two different treatments, or from two
different time points. For example, in a comparative clinical trial, subjects are matched
on baseline demographics and disease characteristics and then randomized with one
subject in the pair receiving the experimental treatment and the other subject receiving
the control. Another example is the crossover clinical trial in which each subject
receives both treatments. By random assignment, some subjects receive the
experimental treatment followed by the control while others receive the control
followed by the experimental treatment. Let πc and πt denote the response
probabilities for the control and experimental treatments, respectively. The probability
parameters for this test are displayed in Table 34.1.
Table 34.1: A 2 x 2 Table of Probabilities for McNemar's Conditional Exact Test

                                Experimental
Control               No Response     Response     Total Probability
No Response           π00             π01          1 − πc
Response              π10             π11          πc
Total Probability     1 − πt          πt           1

The null hypothesis
H0 : πc = πt
is tested against the alternative hypothesis
H1 : πc > πt
(or H1 : πc < πt ) for the one-sided testing problem. Since πt = πc if and only if π01 = π10 , the null hypothesis can also be expressed as
H0 : π01 = π10
and is tested against the corresponding one-sided alternative. The power of this test depends on two quantities:
1. The difference between the two discordant probabilities (which is also the difference between the response rates of the two treatments), δ = π01 − π10 = πt − πc ;
2. The sum of the two discordant probabilities, ξ = π10 + π01 .

East accepts these two parameters as inputs at the design stage.

34.2.1 Trial Design

Consider a trial in which we wish to determine whether a transdermal delivery system
(TDS) can be improved with a new adhesive. Subjects are to wear the old TDS
(control) and new TDS (experimental) in the same area of the body for one week each.
A response is said to occur if the TDS remains on for the entire one-week observation
period. From historical data, it is known that the control has a response rate of 85%
(πc = 0.85). It is hoped that the new adhesive will increase this to 95% (πt = 0.95).
Furthermore, of the 15% of the subjects who did not respond on the control, it is hoped
that 87% will respond on the experimental system. That is, π01 = 0.87 × 0.15 = 0.13.
Based on these data, we can fill in all the entries of Table 34.1 as displayed in
Table 34.2.
Table 34.2: McNemar Probabilities for the TDS Trial

                                Experimental
Control               No Response     Response     Total Probability
No Response           0.02            0.13         0.15
Response              0.03            0.82         0.85
Total Probability     0.05            0.95         1

As it is expected that the new adhesive will increase the adherence rate, the comparison
is posed as a one-sided testing problem, testing H0 : πc = πt against H1 : πc < πt at
the 0.05 level. We wish to determine the sample size to have 90% power for the values
displayed in Table 34.2.
To illustrate this example, in East under the Design ribbon for Discrete data, click
One Sample and then choose Paired Design: McNemar’s:

This will launch the following input window:

Leave the default values of Design Type: Superiority and Number of Looks: 1. In
the Design Parameters dialog box, select the Perform Exact Computations
checkbox and enter the following parameters:
Test Type: 1 sided
Type 1 Error (α): 0.05
Power: 0.9
Sample Size (n): Computed (select radio button)
Difference in Probabilities (δ1 = πt − πc ): 0.1
Prop. of Discordant Pairs (ξ = π01 + π10 ): 0.16

Click Compute. The sample size for this design is calculated and the results are shown
as a row in the Output Preview window:

The sample size required in order to achieve 90% power is 139 subjects.
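
As a rough cross-check, the familiar normal-approximation sample size for McNemar's test, n = [z1−α √ξ + z1−β √(ξ − δ²)]² / δ², can be evaluated directly. The sketch below (an approximation, not East's exact algorithm) gives about 134 pairs, a few short of the exact requirement of 139:

    # Sketch: asymptotic (normal approximation) sample size for McNemar's
    # test from the two design inputs delta and xi. East's exact design
    # above requires slightly more pairs (139).
    from math import ceil, sqrt
    from scipy.stats import norm

    delta, xi = 0.10, 0.16          # difference and sum of discordant probs
    alpha, power = 0.05, 0.90       # one-sided test

    za, zb = norm.ppf(1 - alpha), norm.ppf(power)
    n = (za * sqrt(xi) + zb * sqrt(xi - delta ** 2)) ** 2 / delta ** 2
    print(ceil(n))                  # -> 134
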
As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed in the upper pane, labeled Output Summary. This can be saved to the Library by selecting Des 1 and clicking the icon.

The design details can be displayed by clicking the icon.

The critical point, or the boundary set for the rejection of H0 , is 1.645.
It is important to note that in this exact computation the displayed sample size may not
be unique due to the discreteness of the distribution. This can be seen in the Power Vs
Sample Size graph, which is a useful tool along with its corresponding table, and can
be used to find all other sample sizes that guarantee the desired power. These visual tools are available in the Library under the Plots and Tables menus.

35 Binomial Superiority Two-Sample – Exact

In many experiments based on binomial data, the aim is to compare independent
samples from two populations in terms of the proportion of sampling units presenting a
given trait. In medical research, outcomes such as the proportion of patients
responding to a therapy, developing a certain side effect, or requiring specialized care,
would satisfy this definition. East exact tests support the design and monitoring of
clinical trials in which this comparison is based on either the difference of proportions
or ratio of proportions of the two populations. These two cases are discussed in
Sections 35.1 and 35.2, respectively.
Caution: The methods presented in this chapter are computationally intensive and
could consume several hours of computer time if the exact sample sizes are very large.
Here are some guidelines:
1. Estimate the likely sample size under the Exact method by first determining the asymptotic sample size, as in the sketch below.
2. If the exact sample size is likely to be larger than 1000, computing power is preferable to computing the sample size.
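
For guideline 1, the asymptotic sample size can be sketched with the standard normal-approximation formula for comparing two proportions. The version below (an estimate only, not East's exact computation) uses the design values from Section 35.1.1 and lands close to the exact answer of 68:

    # Sketch: asymptotic total sample size for a one-sided test of the
    # difference of two proportions (guideline 1 above). A first estimate
    # only; East's exact design in Section 35.1.1 reports 68.
    from math import ceil, sqrt
    from scipy.stats import norm

    pc, pt = 0.25, 0.60
    alpha, power = 0.05, 0.90
    delta, pbar = pt - pc, (pc + pt) / 2

    za, zb = norm.ppf(1 - alpha), norm.ppf(power)
    n_arm = (za * sqrt(2 * pbar * (1 - pbar))
             + zb * sqrt(pc * (1 - pc) + pt * (1 - pt))) ** 2 / delta ** 2
    print(2 * ceil(n_arm))          # -> 66, in the same range as 68
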

35.1 Difference of Two Binomial Proportions

Let πc and πt denote the binomial probabilities for the control and treatment arms, respectively, and let δ = πt − πc . Interest lies in testing the null hypothesis that δ = 0 against one- and two-sided alternatives.

The technical details of the sample size computations for this option are given in Appendix V.

35.1.1 Trial Design

In a clinical study, an experimental drug coded Y73 is to be compared with a control
drug coded X39 to treat chronic hepatitis B infected adult patients. The primary end point is histological improvement as determined by Knodell Scores at week 48 of the treatment period. It is estimated that the proportion of patients likely to show histological improvement is 25% under treatment X39 and as much as 60% under treatment Y73. A one-sided fixed sample study is to be designed with α = 0.05 and 90% power.
Single Look Design
To illustrate this example, in East under the Design ribbon for
Discrete data, click Two Samples and then choose Parallel Design: Difference of Proportions:

This will launch the following input window:

The goal of this study is to test the null hypothesis, H0 , that the X39 and Y73 arms
both have an event rate of 25%, versus the alternative hypothesis, H1 , that Y73
increases the event rate by 35%, from 25% to 60%. This will be a one-sided test with a
single fixed look at the data, a type-1 error of α = 0.05 and a power of (1 − β) = 0.9.
Leave the default values of Design Type: Superiority and Number of Looks: 1. In
the Design Parameters dialog box, select the Perform Exact Computations
checkbox and enter the following parameters:
Test Type: 1 sided (required)
Type 1 Error (α): 0.05
Power: 0.9
Sample Size (n): Computed (select radio button)
Prop. under Control (πc ): 0.25
Prop. under Treatment (πt ): 0.6
Diff. in Prop. (δ1 = πt − πc ): (will be calculated)

Click Compute. The sample size for this design is calculated and the results are shown
as a row in the Output Preview window:

The sample size required in order to achieve 90% power is 68 subjects. Note that
because of the discreteness involved in performing exact computations, the attained
type-1 error is 0.049, slightly less than the specified value of 0.05. Similarly, the
attained power is 0.905, slightly larger than the specified value of 0.90.
As is standard in East, this design has the default name Des 1. To see a summary of
the output of this design, click anywhere in the row and then click the
icon in the
Output Preview toolbar. The design details will be displayed in the upper pane,
labeled Output Summary. This can be saved to the Library by selecting Des 1 and clicking the icon.

The design details can be displayed by clicking the icon.

It is important to note that in this exact computation the displayed sample size may not
be unique due to the discreteness of the distribution. This can be seen in the Power Vs
Sample Size graph, which is a useful tool along with its corresponding table, and can
be used to find all other sample sizes that guarantee the desired power. These visual tools are available in the Library under the Plots and Tables menus.

In tabular form:

The critical point, or the boundary set for the rejection of H0 , is 1.715 (on the Z scale), attained at πU = 0.371, and 0.176 (on the δ scale). If the observed test statistic exceeds this boundary, the null will be rejected in favor of declaring the new treatment
to be superior. This can also be seen in the Stopping Boundaries chart and table,
available in the Library.

The Power vs. Treatment Effect chart dynamically generates power under this design
for all values of treatment effect δ = πt − πc . Here it is easy to see how as treatment
effect size increases (H1 : alternative treatment is superior) the power of the study
reaches the desired 90%. This is available in tabular form as well.

35.2 Ratio of Two Binomial Proportions

Let πc and πt denote the binomial probabilities for the control and treatment arms, respectively, and let ρ = πt /πc . It is of interest to test the null hypothesis that ρ = 1 against a one-sided alternative.
The technical details of the sample size computations for this option are given in Appendix V.

35.2.1 Trial Design

In a clinical study, an experimental drug coded Y73 is to be compared with a control
drug coded X39 to treat chronic hepatitis B infected adult patients. The primary end
point is histological improvement as determined by Knodell Scores at week 48 of
the treatment period. It is estimated that the proportion of patients likely to show histological improvement is 25% under the treatment coded X39 and as much as 60% under the treatment coded Y73, that is, 2.4 times the rate for X39. A single look,
one-sided fixed sample study is to be designed with α = 0.05 and 90% power.
Single Look Design
To illustrate this example, in East under the Design ribbon for Discrete data, click
Two Samples and then choose Parallel Design: Ratio of Proportions:

This will launch the following input window:

Leave the default values of Design Type: Superiority and Number of Looks: 1. In
the Design Parameters dialog box, select the Perform Exact Computations
checkbox and enter the following parameters:
Test Type: 1 sided (required)
Type 1 Error (α): 0.05
Power: 0.9
Sample Size (n): Computed (select radio button)
Prop. under Control (πc ): 0.25
Prop. under Treatment (πt ): (will be calculated to be 0.6)
Ratio of Proportions (ρ1 ): 2.4

Click Compute. The sample size for this design is calculated and the results are shown
as a row in the Output Preview window:

The sample size required in order to achieve 90% power is 72 subjects. Note that
because of the discreteness involved in performing exact computations, the attained
type-1 error is 0.046, less than the specified value of 0.05. Similarly, the attained
power is 0.903, slightly larger than the specified value of 0.90.
As is standard in East, this design has the default name Des 1. To see a summary of
the output of this design, click anywhere in the row and then click the
icon in the
Output Preview toolbar. The design details will be displayed in the upper pane,
labeled Output Summary. This can be saved to the Library by selecting Des 1 and clicking the icon.

Design details can be displayed by clicking the icon.

It is important to note that in this exact computation the displayed sample size may not
be unique due to the discreteness of the distribution. This can be seen in the Power Vs
Sample Size graph, which is a useful tool along with its corresponding table, and can
be used to find all other sample sizes that guarantee the desired power. These visual tools are available in the Library under the Plots and Tables menus.

In tabular form:

The critical point, or the boundary set for the rejection of H0 , is 1.813 (on the Z scale). If the observed test statistic exceeds this boundary, the null will be rejected in favor of
declaring the new treatment to be superior. This boundary can be seen in terms of the
observed ratio (0.916 on the ln(ρ) scale and 2.5 on the ρ scale) in the Stopping

748

35.2 Ratio of Two Binomial Proportions – 35.2.1 Trial Design

<<< Contents

* Index >>>
R

East 6.4
c Cytel Inc. Copyright 2016
Boundaries chart and table, available in the Library.

The Power vs. Treatment Effect chart dynamically generates power under this design
for all values of treatment effect (ratio or ln(ratio) scale). Here it is easy to see how as
the ratio (treatment effect size) increases (H1 : the new treatment is superior) the power of the study reaches the desired 90%. This is available in tabular form as well.

36 Binomial Non-Inferiority Two-Sample – Exact

In a non-inferiority trial, the goal is to establish that the response rate of an
experimental treatment is no worse than that of an established control. A therapy that
is demonstrated to be non-inferior to the current standard therapy might be an
acceptable alternative if, for instance, it is easier to administer, cheaper, or less toxic.
Non-inferiority trials are designed by specifying a non-inferiority margin, which is the
acceptable amount by which the response rate on the experimental arm can be less than
the response rate on the control arm. If the experimental response rate falls within this
margin, the new treatment can claim to be non-inferior. This chapter presents the
design of non-inferiority trials in which this margin is expressed as either the difference
between or the ratio of two binomial proportions. The difference is examined in
Section 36.1 and is followed by two formulations for the ratio in Section 36.2.
Caution: The methods presented in this chapter are computationally intensive and
could consume several hours of computer time if the exact sample sizes are very large.
Here are some guidelines:
1. Estimate the likely sample size under the Exact method by first determining the asymptotic sample size.
2. If the exact sample size is likely to be larger than 1000, computing power is preferable to computing the sample size.

36.1 Difference of Proportions

Let πc and πt denote the response rates for the control and experimental treatments, respectively. Let δ = πt − πc . The null hypothesis is specified as
H0 : δ = δ0
and is tested against one-sided alternative hypotheses. If the occurrence of a response denotes patient harm rather than benefit, then δ0 > 0 and the alternative hypothesis is
H1 : δ < δ0
or equivalently
H1 : πc > πt − δ0 .
Conversely, if the occurrence of a response denotes patient benefit rather than harm, then δ0 < 0 and the alternative hypothesis is
H1 : δ > δ0
or equivalently
H1 : πc < πt − δ0 .
For any given πc , the sample size is determined by the desired power at a specified
value of δ = δ1 . A common choice is δ1 = 0 (or equivalently πt = πc ) but East allows
the study to be powered at any value of δ1 which is consistent with the choice of H1 .
Let π̂t and π̂c denote the estimates of πt and πc based on nt and nc observations from
the experimental and control treatments, respectively. The test statistic is
Z = (δ̂ − δ0 ) / se(δ̂)    (36.1)

where

δ̂ = π̂t − π̂c    (36.2)

and

se(δ̂) = √[ π̃t (1 − π̃t )/nt + π̃c (1 − π̃c )/nc ] .    (36.3)

Here π̃t and π̃c are the restricted maximum likelihood estimates of πt and πc . For more
details refer to Appendix V.
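
The statistic in (36.1)-(36.3) can be sketched directly. In the illustration below the restricted MLEs are found by a brute-force grid search over the constraint π̃t = π̃c + δ0 (East and Appendix V use a closed-form restricted MLE; the grid search is only for transparency), and the counts in the usage line are hypothetical:

    # Sketch: Wald-type Z of equations (36.1)-(36.3), with the standard
    # error evaluated at H0-restricted MLEs found by grid search over the
    # constraint pi_t = pi_c + delta0. Illustration only.
    import numpy as np

    def z_noninferiority(xc, nc, xt, nt, delta0):
        pc = np.linspace(1e-6, 1 - 1e-6, 100001)   # candidate control rates
        pt = pc + delta0                            # H0 constraint
        ok = (pt > 0) & (pt < 1)
        pc, pt = pc[ok], pt[ok]
        loglik = (xc * np.log(pc) + (nc - xc) * np.log(1 - pc)
                  + xt * np.log(pt) + (nt - xt) * np.log(1 - pt))
        i = int(np.argmax(loglik))                  # restricted MLEs
        se = np.sqrt(pt[i] * (1 - pt[i]) / nt + pc[i] * (1 - pc[i]) / nc)
        return ((xt / nt - xc / nc) - delta0) / se

    # Hypothetical counts, purely for illustration:
    print(z_noninferiority(xc=69, nc=86, xt=66, nt=86, delta0=-0.2))
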

36.1.1 Trial Design

To evaluate the efficacy and safety of drug A vs. drug B in antiretroviral naive HIV-infected individuals, a phase 3, 52-week double-blind randomized study is conducted. The primary response measure is the proportion of patients with HIV-RNA levels < 50 copies/mL. The study is designed as a non-inferiority trial where a standard drug A is expected to have a response rate of 80% and a new experimental drug B is to be compared under a non-inferiority margin of 20% (δ0 = 0.20). For these studies, inferiority is assumed as the null hypothesis and is to be tested against the alternative of non-inferiority using a one-sided test. Therefore, under the null hypothesis H0 : πc = 0.8 and πt = 0.60. We will test this hypothesis against H1 , under which both response rates are equal to the response rate of the control arm, i.e. δ1 = 0. Thus, under H1 , we have πc = πt = 0.8. East will be used to conduct a one-sided α = 0.025 level test with 90% power.
Single Look Design
To illustrate this example, in East under the Design ribbon for
Discrete data, click Two Samples and then choose Parallel Design: Difference of Proportions:

This will launch the following input window:

Change Design Type: Noninferiority and keep Number of Looks: 1. In the Design
Parameters dialog box, select the Perform Exact Computations checkbox and enter
the following parameters:
Test Type: 1 sided (required)
Type 1 Error (α): 0.025
Power: 0.9
Sample Size (n): Computed (select radio button)
Specify Proportion Response
Prop. under Control (πc ): 0.8
Specify Null Hypothesis
Prop. under Treatment (πt0 ): 0.6
Noninferiority margin (δ0 ): -0.2 (will be calculated)
Specify Alternative Hypothesis
Prop. under Treatment (πt1 ): 0.8
Diff. in Prop. (δ1 = πt1 − πc ): 0

Click Compute. The sample size for this design is calculated and the results are shown
as a row in the Output Preview window:

This single look design requires a combined total of 172 patients in order to achieve
90% power.
As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed in the upper pane, labeled Output Summary. This can be saved to the Library by selecting Des 1 and clicking the icon.

The design details can be displayed by clicking the icon.

It is important to note that in this exact computation the displayed sample size may not
be unique due to the discreteness of the distribution. This can be seen in the Power Vs
Sample Size graph, which is a useful tool along with its corresponding table, and can
be used to find all other sample sizes that guarantee the desired power. In this example,
sample sizes ranging from approximately 168-175 result in power close to the required
0.9. These visual tools are available in the Library under the Plots and Tables menus.

The critical point, or the efficacy boundary set for the rejection of H0 , is 1.991 (on the Z scale) and −0.056 (on the δ scale). If the magnitude of the observed test statistic exceeds this boundary, the null will be rejected in favor of declaring the new treatment
to be non-inferior. This can also be seen in the Stopping Boundaries chart and table,
available in the Library.

The Power vs. Treatment Effect chart dynamically generates power under this design
for all values of treatment effect δ = πt − πc . Here it is easy to see how as treatment
effect size approaches zero (H1 : no difference between the two treatments) the power
of the study reaches the desired 90%. This is available in tabular form as well.

36.2 Ratio of Proportions

Let πc and πt denote the response rates for the control and the experimental treatments, respectively. Let the difference between the two arms be captured by the ratio
ρ = πt /πc .
The null hypothesis is specified as
H0 : ρ = ρ0
and is tested against one-sided alternative hypotheses. If the occurrence of a response
denotes patient benefit rather than harm, then ρ0 < 1 and the alternative hypothesis is
H1 : ρ > ρ0
or equivalently as
H1 : πt > ρ0 πc .
Conversely, if the occurrence of a response denotes patient harm rather than benefit,
then ρ0 > 1 and the alternative hypothesis is
H1 : ρ < ρ0
or equivalently as
H1 : πt < ρ0 πc .
For any given πc , the sample size is determined by the desired power at a specified
value of ρ = ρ1 . A common choice is ρ1 = 1 (or equivalently πt = πc ), but East
permits you to power the study at any value of ρ1 which is consistent with the choice
of H1 .

36.2.1 Trial Design

Suppose that, for a rare disease condition, the cure rate with an expensive treatment A is estimated to be 90%. The claim of non-inferiority for an inexpensive new treatment B can be made if it can be statistically proven that the ratio ρ = πt /πc is at least 0.833. In
other words, B is considered to be non-inferior to A as long as πt > 0.75. Thus the
null hypothesis H0 : ρ = 0.833 is tested against the one-sided alternative hypothesis
H1 : ρ > 0.833. We want to determine the sample size required to have power of 80%
when ρ = 1 using a one-sided test with a type-1 error rate of 0.05.
Single Look Design Powered at ρ = 1
Consider a one look study with equal sample
sizes in the two groups. In East under the Design ribbon for Discrete data, click Two
Samples and then choose Parallel Design: Ratio of Proportions:

This will launch the following input window:

Change Design Type: Noninferiority and keep Number of Looks: 1. In the Design
Parameters dialog box, select the Perform Exact Computations checkbox and keep the Test Statistic set to Wald. Enter the following parameters:
Test Type: 1 sided (required)
Type 1 Error (α): 0.05
Power: 0.8
Sample Size (n): Computed (select radio button)
Specify Proportion
Prop. under Control (πc ): 0.9
Specify Null Hypothesis
Prop. under Treatment (πt0 ): 0.75
Noninferiority margin (ρ0 ): 0.833 (will be calculated)
Specify Alternative Hypothesis
Prop. under Treatment (πt1 ): 0.9
Ratio of Proportions (ρ1 = πt1 /πc ): 1


Click Compute. The sample size for this design is calculated and the results are shown
as a row in the Output Preview window:

The sample size required in order to achieve 80% power is 120 subjects. Note that
because of the discreteness involved in performing exact computations, the attained
power is 0.823, slightly larger than the specified value of 0.80.
As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed in the upper pane, labeled Output Summary. This can be saved to the Library by selecting Des 1 and clicking the icon.

Design details can be displayed by clicking the icon.

It is important to note that in this exact computation the displayed sample size may not
be unique due to the discreteness of the distribution. This can be seen in the Power Vs
Sample Size graph, which is a useful tool along with its corresponding table, and can
be used to find all other sample sizes that guarantee the desired power. These visual
tools are available in the Library under the Plots and Tables menus.


The critical point, or the boundary set for the rejection of H0 , is 1.961 (on the Z scale), 0.076 (on the ln(ρ) scale) and 1.079 (on the ρ scale). If the observed test statistic exceeds this boundary, the null will be rejected in favor of declaring the new treatment to be non-inferior. This can also be seen in the Stopping Boundaries chart and table, available in the Library.

The Power vs. Treatment Effect chart dynamically generates power under this
design for all values of treatment effect (ratio or ln(ratio) scale). Here it is easy to see
how as treatment effect size approaches zero (H1 : no difference between the two
treatments) the power of the study reaches the desired 80%. This is available in
tabular form as well.


37 Binomial Equivalence Two-Sample – Exact

37.1 Equivalence Test

In some experimental situations, it is desired to show that the response rates for the control and the experimental treatments are "close", where "close" is defined prior to the collection of any data. It may be of interest to show that the rate of an adverse
event associated with an aggressive therapy is similar to that of the established control.
For example, the bleeding rate associated with thrombolytic therapy or cardiac
outcomes with a new stent. Let πc and πt denote the response rates for the control and
the experimental treatments, respectively, and let
δ = πt − πc .    (37.1)

The null hypothesis H0 : |πt − πc | = δ0 is tested against the two-sided alternative
H1 : |πt − πc | < δ0 , where δ0 (> 0) defines equivalence. The theory is presented in
Section V.4 of Appendix V.
Caution: The methods presented in this chapter are computationally intensive and
could consume several hours of computer time if the exact sample sizes are very large.
Here are some guidelines:
1. Estimate the likely sample size under the Exact method by first determining the asymptotic sample size.
2. If the exact sample size is likely to be larger than 1000, computing power is preferable to computing the sample size.

37.1.1 Trial Design

Burgess et al. (2005) describe a randomized controlled equivalence trial, in which the
objective is to evaluate the efficacy and safety of a 4% dimeticone lotion for treatment
of head lice infestation, relative to a standard treatment. The success rate of the
standard treatment is estimated to be about 77.5%. Equivalence is defined as
δ0 = 0.20. The sample size is to be determined with α = 0.025 (two-sided) and power,
i.e. probability of declaring equivalence, of 1 − β = 0.90.
To illustrate this example, in East under the Design ribbon for Discrete data, click

Two Samples and then choose Parallel Design: Difference of Proportions:

This will launch the following input window:

Change Design Type: Equivalence and in the Design Parameters dialog box, select
the Perform Exact Computations checkbox. Enter the following parameters:
Test Type: 2 sided (required)
Type 1 Error (α): 0.025
Power: 0.9
Sample Size (n): Computed (select radio button)
Specify Proportion Response
Prop. under Control (πc ): 0.775
Prop. under Treatment (πt0 ): 0.775 (will be calculated)
Expected Diff. (δ1 = πt − πc ): 0
Equivalence Margin (δ0 ): 0.2

Click Compute. The sample size for this design is calculated and the results are shown
as a row in the Output Preview window:

This single look design requires a combined total of 228 patients in order to achieve
90% power.
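
A quick asymptotic cross-check is possible here. When the expected difference δ1 is zero, the standard TOST (two one-sided tests) approximation for the per-arm sample size is (z1−α + z1−β/2 )² · 2π(1 − π)/δ0². The sketch below (an approximation, not East's exact method) reproduces the same total:

    # Sketch: asymptotic TOST sample size for binomial equivalence with
    # expected difference delta1 = 0. z_{1-beta/2} appears because both
    # one-sided tests must reject. Approximation only, not East's method.
    from math import ceil
    from scipy.stats import norm

    pi, delta0 = 0.775, 0.20
    alpha, beta = 0.025, 0.10

    n_arm = ((norm.ppf(1 - alpha) + norm.ppf(1 - beta / 2)) ** 2
             * 2 * pi * (1 - pi)) / delta0 ** 2
    print(2 * ceil(n_arm))          # -> 228, matching the exact design
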
As is standard in East, this design has the default name Des 1. To see a summary of
the output of this design, click anywhere in the row and then click the
icon in the
Output Preview toolbar. The design details will be displayed in the upper pane,
labeled Output Summary. This can be saved to the Library by selecting Des 1 and clicking the icon.

The design details, which include critical points, or the boundaries set for the rejection of H0 , can be displayed by clicking the icon.

It is important to note that in this exact computation the displayed sample size may not
be unique due to the discreteness of the distribution. This can be seen in the Power Vs
Sample Size graph, which is a useful tool along with its corresponding table, and can
be used to find all other sample sizes that guarantee the desired power. These visual tools are available in the Library under the Plots and Tables menus.

In tabular form:


Suppose the expected value of the difference in treatment proportions δ1 is 0.05 or
0.10. A recalculation of the design shows the required sample size will increase to 300 and 606, respectively:


38 Binomial Simon's Two-Stage Design

The purpose of a phase II trial is to determine if a new drug has sufficient efficacy
against a specific disease or condition to either warrant further development within
Phase II, or to advance onto a Phase III study. In a two-stage design, a fixed number
of patients are recruited and treated initially, and if the protocol is considered effective
the second stage will continue to enroll additional patients for further study regarding
efficacy and safety.
This chapter presents an example for the widely used two-stage optimal and minimax
designs developed by Simon (1989). In addition, East supports the framework of an
admissible two-stage design, a graphical method geared to search for an alternative
with more favorable features (Jung, et al. 2004). The underlying theory is examined in
Appendix U.

38.1 An Example

During a Phase II study of an experimental drug, a company determined that a
response rate of 10% or less is to be considered poor, whereas a response rate of 40% or more is to be considered promising or good. Requirements call for a two-stage
study with the following hypotheses:
H0 : π ≤ 0.10

H1 : π ≥ 0.40

and design parameters α = 0.05 and 1 − β = 0.90.

38.1.1 Trial Design

To illustrate this example, in East under the Design ribbon for Discrete data, click
One Sample and then choose Single Arm Design: Simon’s Two Stage Design:

This will launch the following input window:

Choose Design Type: Optimal and enter the following parameters in the Design
Parameters dialog box:
Test Type: 1 sided (required)
Type 1 Error (α): 0.05
Power: 0.9
Upper Limit for Sample Size: 100
Prop. Response under Null (π0 ): 0.1
Prop. Response under Alternative (π1 ): 0.4

Click Compute. The design is calculated and the results are shown as a row in the
Output Preview window:

As is standard in East, this design has the default name Des 1. To see a summary of
the output of this design, click anywhere in the row and then click the
icon. The
design details will be displayed in the upper pane, labeled Output Summary. Note
that because of the discreteness involved in performing exact computations, the
attained type-1 error is less than the specified value of 0.05. Similarly, the attained
power is slightly larger than the specified value. Save this design to the Library by
selecting Des 1 and clicking the icon.

Under the optimal design, the combined maximum sample size for both stages is
computed to be 20. The boundary parameter for futility at the first look is 1, and at the
second look it is 4. What this means can be further explained using the Stopping Boundaries chart available under the Plots menu.
The scale of the stopping boundaries can be displayed using either number of
responses (# Resp. Scale) or Proportion Scale. The above graph uses the number of
responses, which tells us that at the first look, when the cumulative sample size is 9,
the trial could be stopped for futility if no more than one patient shows a favorable
response to treatment. At the second stage, when all 20 patients are enrolled, the
boundary response to reject H1 is 4 or less. The Stopping Boundaries table under the
Tables menu also tells us that the probability of crossing the stopping
boundary, thus warranting early termination, is 0.775.
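
This 0.775 is ordinary binomial arithmetic: with n1 = 9 and futility boundary 1, the probability of early termination under H0 is P(X1 ≤ 1) for X1 ~ Binomial(9, 0.1). A sketch (plain probability calculations, not East code):

    # Sketch: probability of early termination (PET) and expected sample
    # size under H0 for the optimal two-stage design above (n1 = 9,
    # maximum n = 20, stage-1 futility boundary 1). Not East code.
    from scipy.stats import binom

    n1, n_max, r1, pi0 = 9, 20, 1, 0.10

    pet = binom.cdf(r1, n1, pi0)                # stop early if X1 <= 1
    expected_n = n1 + (1 - pet) * (n_max - n1)  # E[N | H0]
    print(round(pet, 4), round(expected_n, 3))  # PET ~ 0.7748, E[N] ~ 11.5
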

Results can be further analyzed using the Expected Sample Size (under Null) vs. Sample Size graph, which is also available in tabular form:

To generate a more sophisticated analysis of the design, select the
icon in the
Library. In addition to details pertaining to the required optimal design, East also
generates results for both minimax and admissible designs with regard to sample
size, power and probability, and weights used.


For the optimal design the expected sample size under the null, which assumes the
drug performs poorly, is 11.447, which can also be seen in the Admissible Designs
table, available under the Tables
menu:

To regenerate the study using a minimax design, select the Edit Design
icon.
Select Design Type: Minimax, leave all design parameters the same and click
Compute. The cumulative maximum sample size for both stages using this design is
18. As with the optimal design, the first stage boundary response to reject H1 is 1 or

780

38.1 An Example – 38.1.1 Trial Design

<<< Contents

* Index >>>
R

East 6.4
c Cytel Inc. Copyright 2016
less and the second stage boundary response to reject H1 is 4 or less.

Save this design to the Library by selecting Des 2 and clicking the
icon. Design
details, graphs, and tables can be obtained as with the optimal design described above.
East provides the capability to visually compare stopping boundaries for both methods
simultaneously using a compare plots graph. From the Library select both designs,
click the
icon, and select Stopping Boundaries.

38.1 An Example – 38.1.1 Trial Design

781

<<< Contents

38

* Index >>>

Binomial Simon’s Two-Stage Design
These stopping boundaries can be compared in tabular format as well:

Although the two futility boundaries are the same for both designs, the cumulative sample sizes at both stages differ. We also see that the probability of early stopping for
futility is higher under the optimal design (0.775) than with the minimax design
(0.659). However the cumulative sample size at stage one for the optimal design is
only 9 whereas the minimax design requires 12 subjects for the first stage. Referring to
the design details generated for the optimal design above, we see that an admissible
design (labeled Design # 2) requires a total sample size of 19. Here, the cumulative
number of subjects required at the end of stage one is only 6 and offers a probability of
early stopping of 0.531, less than both the optimal and minimax designs. It is also worth noting that for the admissible design, the boundary parameter for futility at the
first look is 0. This means that only one patient has to show a promising result for the
study to proceed to a second stage, whereas at least two successes are required for both
the optimal and minimax designs to warrant a second stage.


Volume 5

Poisson and Negative Binomial Endpoints

39 Introduction to Volume 5
40 Count Data One-Sample
41 Count Data Two-Samples

39 Introduction to Volume 5

This volume describes various cases of clinical trials involving count data. Count data arise often in medical research because they model events counted in terms of whole numbers, particularly events that may be considered rare. Typically, interest
lies in the rate of occurrence of a particular event during a specific time interval or
other unit of space.
Chapter 40 describes the design of tests involving count or Poisson response rates in
which an observed response rate is compared to a fixed response rate, possibly derived
from historical data.
Chapter 41 deals with the comparison of independent samples from two populations in
terms of the rate of occurrence of a particular outcome. East supports the design of
clinical trials in which this comparison is based on the ratio of rates, assuming a
Poisson or Negative Binomial distribution.

39.1 Settings

Click the icon in the Home menu to adjust default values in East 6.

The options provided in the Display Precision tab are used to set the decimal places of
numerical quantities. The settings indicated here will be applicable to all tests in East 6
under the Design and Analysis menus.
All these numerical quantities are grouped in different categories depending upon their
usage. For example, all the average and expected sample sizes computed at simulation
or design stage are grouped together under the category "Expected Sample Size". To view any of these quantities with greater or lesser precision, select the corresponding category and change the decimal places to any value between 0 and 9.
The General tab allows you to adjust the paths for storing workbooks, files, and temporary files. These paths persist across the current and future sessions, even after East is closed. This is also where you specify the installation directory of the R software in order to use the R Integration feature in East 6.

Design Defaults is where the user can change the settings for trial design:

Under the Common tab, default values can be set for input design parameters.
You can set up the default choices for the design type, computation type, test type and
the default values for type-I error, power, sample size and allocation ratio. When a new
design is invoked, the input window will show these default choices.
Time Limit for Exact Computation
This time limit is applicable only to exact designs and charts. Exact methods are
computationally intensive and can easily consume several hours of computation
time if the likely sample sizes are very large. You can set the maximum time
available for any exact test in terms of minutes. If the time limit is reached, the
test is terminated and no exact results are provided. The minimum and default value
is 5 minutes.
Type I Error for MCP
If the user has selected a 2-sided test as the default in the global settings, then any MCP will use half of the alpha from the settings as its default, since an MCP is always a 1-sided test.
Sample Size Rounding
Notice that by default, East displays the integer sample size (events) by rounding
up the actual number computed by the East algorithm. In this case, the
look-by-look sample size is rounded off to the nearest integer. One can also see
the original floating point sample size by selecting the option "Do not round sample size/events".
Under the Group Sequential tab, defaults are set for boundary information. When a
new design is invoked, input fields will contain these specified defaults. We can also
set the option to view the Boundary Crossing Probabilities in the detailed output. It can
be either Incremental or Cumulative.

Simulation Defaults is where we can change the settings for simulation:

If the checkbox for "Save summary statistics for every simulation" is checked, then East simulations will by default save the per-simulation summary data for all the simulations in the form of case data.
If the checkbox for "Suppress All Intermediate Output" is checked, the intermediate simulation output window will always be suppressed and you will be directed to the Output Preview area.
The Chart Settings allows defaults to be set for the following quantities on East 6 charts:

We suggest that you do not alter the defaults until you are quite familiar with the
software.


40 Count Data One-Sample

This chapter deals with the design of tests involving count or Poisson response rates.
Here, independent outcomes or events under examination can be counted in terms of
whole numbers, and typically are considered rare. In other words, a basic assumption
of the Poisson distribution is that the probability of an event occurring is proportional
to the length of time under consideration. The longer the time interval, the more likely
the event will occur. Therefore, in this context interest lies in the rate of occurrence of
a particular event during a specified period. Section 40.1 focuses on designs in which
an observed Poisson response rate is compared to a fixed response rate, possibly
derived from historical data.

40.1 Single Poisson Rate

Data following a Poisson distribution are non-negative integers, and the probability that an outcome occurs exactly k times can be calculated as:
P(k) = e^(−λ) λ^k / k! ,   k = 0, 1, 2, . . . , where λ is the average rate of occurrence.

When comparing a new protocol or treatment to a well-established control, a
preliminary single-sample study may result in valuable information prior to a full-scale
investigation. In experimental situations it may be of interest to determine whether the
response rate λ differs from a fixed value λ0 . Specifically, we wish to test the null hypothesis H0 : λ = λ0 against the two-sided alternative hypothesis H1 : λ ≠ λ0 or against one-sided alternatives of the form H1 : λ > λ0 or H1 : λ < λ0 . The sample
size, or power, is determined for a specified value of λ which is consistent with the
alternative hypothesis, denoted λ1 .

40.1.1 Trial Design

Consider the design of a single-arm clinical trial in which we wish to determine if the
positive response rate of a new acute pain therapy is at least 30% per single treatment
cycle. Thus, it is desired to test the null hypothesis H0 : λ = 0.2 against the one-sided
alternative hypothesis H1 : λ ≥ 0.3. The trial will be designed such that a one sided
α = 0.05 test achieves 80% power at λ = λ1 = 0.3.
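
Under these assumptions the exact calculation is simple enough to sketch: with n subjects each followed for time D = 1, the total event count is Poisson with mean nλD, and one can search for the smallest n whose exact one-sided test attains 80% power. The sketch below illustrates the idea only; because power is not monotone in n for discrete data, the first n it reports may differ slightly from East's 155.

    # Sketch: exact sample size search for H0: lambda = 0.2 vs
    # H1: lambda = 0.3 with follow-up D = 1, treating the total event
    # count as Poisson(n * lambda * D). Illustration of the idea only.
    from scipy.stats import poisson

    lam0, lam1, D, alpha = 0.2, 0.3, 1.0, 0.05

    for n in range(10, 301):
        mu0, mu1 = n * lam0 * D, n * lam1 * D
        c = int(poisson.isf(alpha, mu0)) + 1   # smallest c: P(X>=c|mu0)<=alpha
        while poisson.sf(c - 1, mu0) > alpha:
            c += 1
        if poisson.sf(c - 1, mu1) >= 0.80:     # exact power at lambda1
            print(n, c, round(poisson.sf(c - 1, mu0), 4),
                  round(poisson.sf(c - 1, mu1), 4))
            break
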
In the Design tab under the Count group choose One Sample and then Single Poisson Rate.

This will launch the following input window:

Enter the following design parameters:
Test Type: 1 sided
Type 1 Error (α): 0.05
Power: 0.8
Sample Size (n): Computed (select radio button)
Rate under Null (λ0 ): 0.2
Rate under Alt. (λ1 ): 0.3
Follow-up Time (D): 1


Click Compute. The design is shown as a row in the Output Preview window:

The sample size required in order to achieve the desired 80% power is 155 subjects. As
is standard in East, this design has the default name Des 1. To see a summary of the
output of this design, click anywhere in the row and then click the
icon in the
Output Preview toolbar. The design details are displayed, labeled Output Summary.

In the Output Preview toolbar, click the icon to save this design Des1 to workbook
Wbk1 in the Library. An alternative method to view design details is to hover the
cursor over the node Des1 in the Library. A tooltip will appear that summarizes the
input parameters of the design.

Click the icon on the Library toolbar, and then click Power vs. Sample Size. The power curve for this design will be displayed. You can save this chart to the Library by clicking Save in Workbook. Alternatively, you can export the chart in one of several
image formats (e.g., Bitmap or JPEG) by clicking Save As... or Export into a PowerPoint presentation.

Close the Power vs. Sample Size chart. To view a summary of all characteristics of this design, select Des1 in the Library, and click the icon.

In addition to the Power vs. Sample size chart and table, East also provides the
efficacy boundary in the Stopping Boundaries chart and table.
Alternatively, East allows the computation of either the Type-1 error (α) or Power for
a given sample size. Using the Design Input/Output window as described above,
simply enter the desired sample size and click Compute to calculate the resulting
power of the test.
Power vs Sample Size: Sawtooth paradigm
Consider the following design which
uses East to compute power assuming a one sample, single Poisson rate.
Test Type: 1 sided
Type 1 Error (α): 0.025
Power: Computed
Sample Size (n): 525
Rate under Null (λ0 ): 0.049
Rate under Alt. (λ1 ): 0.012
Follow-up Time (D): 0.5

Save the design to a workbook, and then generate the Power vs. Sample Size graph to
obtain the power chart. The resulting curve is commonly described in the literature as a

40.1 Single Poisson Rate – 40.1.1 Trial Design

795

<<< Contents

40

* Index >>>

Count Data One-Sample
sawtooth chart.

This chart illustrates that it is possible to have a design where different sample sizes
could obtain the same power. As with the binomial distribution, the Poisson
distribution is discrete. For power and sample size computations with discrete data, the so-called "sawtooth" phenomenon occurs.
The data can also be displayed in a chart form by selecting the icon in the Library, and can be printed or saved as case data.

It is important to note that for designs with the same power, the attained significance
level may vary. For example, the sample sizes of 565 and 580 seem to have a similar
power of about 0.94. Upon computing two new designs based on the above design
with sample sizes of 565 and 580 respectively, it is apparent that the attained
significance levels are different. The design with a lower sample size of 565 pays a
higher penalty in terms of type-1 error (α = 0.03) than the plan with a larger sample size of 580 (α = 0.016).


41 Count Data Two-Samples

Often in experiments based on count data, the aim is to compare independent samples
from two populations in terms of the rate of occurrence of a particular outcome. In
medical research, outcomes such as the number of times a patient responds to a
therapy, develops a certain side effect, or requires specialized care, are of interest. Or
perhaps a therapy is being evaluated to determine the number of times it must be
applied until an acceptable response rate is observed. East supports the design of
clinical trials in which this comparison is based on the ratio of rates, assuming a
Poisson or Negative Binomial distribution. These two cases are presented in
Sections 41.1 and 41.2, respectively.

41.1 Poisson - Ratio of Rates

Let λc and λt denote the Poisson rates for the control and treatment arms, respectively, and let ρ1 = λt /λc . We want to test the null hypothesis that ρ1 = 1 against one- or two-sided alternatives. The sample size, or power, is determined to be consistent with the alternative hypothesis, that is H1 : λt ≠ λc , H1 : λt > λc , or H1 : λt < λc .

41.1.1 Trial Design

Suppose investigators are preparing design objectives for a prospective randomized
trial of a standard treatment (control arm) vs. a new combination of medications
(therapy arm) to present at a clinical trials workshop. The endpoint of interest is the
number of abnormal ECGs (electrocardiogram) within seven days. The investigators
were interested in comparing the therapy arm to the control arm with a two-sided test conducted at the 0.05 level of significance. It can be assumed that the rate of
abnormal ECGs in the control arm is 30%, thus λt = λc = 0.3 under H0 . The
investigators wish to determine the sample size to attain power of 80% if there is a
25% decline in the event rate, that is λt /λc = 0.75. It is important to note that the
power of the test depends on λc and λt , not just the ratio, so different values of the pair
(λc , λt ) with the same ratio will yield different solutions.
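
A rough asymptotic calculation on the log scale, using Var(ln ρ̂) ≈ 1/(n λc Dc ) + 1/(n λt Dt ), suggests the magnitude before East is run. The sketch below assumes equal allocation and equal follow-up and is an approximation only, not East's algorithm:

    # Sketch: normal-approximation total sample size for the ratio of two
    # Poisson rates, working on the log scale with equal allocation and
    # equal follow-up. Approximation only; East reports 211 below.
    from math import ceil, log
    from scipy.stats import norm

    lam_c, lam_t, D = 0.30, 0.225, 7.0
    alpha, power = 0.05, 0.80          # two-sided alpha, as entered below

    za, zb = norm.ppf(1 - alpha / 2), norm.ppf(power)
    var_unit = 1 / (lam_c * D) + 1 / (lam_t * D)
    n_arm = (za + zb) ** 2 * var_unit / log(lam_t / lam_c) ** 2
    print(2 * ceil(n_arm))             # -> 212, close to the 211 East reports
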
We will now design a study that compares the control arm to the combination therapy
arm. In the Design tab under the Count group choose Two Samples and then Parallel Design - Ratio of Poisson Rates.

This will launch the following input window:

Enter the following design parameters:
Test Type: 2-sided
Type 1 Error (α): 0.05
Power: 0.8
Sample Size (n): Computed (select radio button)
Allocation Ratio (nt /nc ): 1
Rate for Control (λc ): 0.3
Rate for Treatment (λt ): 0.225 (will be automatically calculated)
Ratio of Rates ρ1 = (λt /λc ): 0.75
Follow-up Control (Dc ): 7
Follow-up Treatment (Dt ): 7


The Allocation Ratio (nt : nc ) describes the ratio of patients to each arm. For example,
an allocation ratio of 3:1 indicates that 75% of the patients are randomized to the
treatment arm as opposed to 25% to the control. Here we assume the same number of
patients in both arms. Click Compute. The design is shown as a row in the Output
Preview window:

The sample size required in order to achieve the desired 80% power is 211 subjects.
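Before moving on, it is worth seeing roughly where this number comes from. The
sketch below is our own illustration of the standard normal-approximation (Wald)
calculation on the log ratio of rates; the function name poisson.ratio.n is hypothetical
and this is not East's internal routine, although for these inputs it happens to return
exactly 211.

# Approximate total sample size for a two-sample test of the ratio of
# Poisson rates, based on a Wald test of log(lambda.t / lambda.c).
# Illustrative sketch only; not East's algorithm.
poisson.ratio.n <- function(alpha = 0.05, sided = 2, power = 0.8,
                            lambda.c = 0.3, lambda.t = 0.225,
                            Dc = 7, Dt = 7, r = 1) {      # r = nt / nc
  z <- qnorm(1 - alpha / sided) + qnorm(power)
  # Variance of the estimated log rate-ratio, per control-arm subject:
  # 1 / (expected treatment events) + 1 / (expected control events)
  v <- 1 / (r * lambda.t * Dt) + 1 / (lambda.c * Dc)
  nc <- z^2 * v / log(lambda.t / lambda.c)^2              # control-arm size
  ceiling(nc * (1 + r))                                   # total sample size
}
poisson.ratio.n()   # 211 for the design inputs above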
As is standard in East, this design has the default name Des 1. To see a summary of the
output of this design, click anywhere in the row and then click the Output Summary
icon in the Output Preview toolbar. The design details are displayed, labeled Output
Summary.

In the Output Preview toolbar, click the Save in Workbook icon to save this design
Des1 to workbook Wbk1 in the Library. An alternative method to view design details
is to hover the cursor over the node Des1 in the Library. A tooltip will appear that
summarizes the input parameters of the design.

With the design Des1 selected in the Library, click the Plot icon on the Library
toolbar, and then click Power vs. Sample Size. The power curve for this design will be
displayed. You can save this chart to the Library by clicking Save in Workbook.
Alternatively, you can export the chart in one of several image formats (e.g., Bitmap or
JPEG) by clicking Save As..., or export it into a PowerPoint presentation.

Close the Power vs. Sample Size chart. To view all computed characteristics of this
design, select Des1 in the Library, and click the Details icon.

In addition to the Power vs. Sample Size chart and table, East also provides the
efficacy boundary in the Stopping Boundaries chart and table.
Alternatively, East allows the computation of either the Type-1 error (α) or the power
for a given sample size. Using the Design Input Output window as described above,
simply enter the desired sample size and click Compute to calculate the resulting
power of the test.

41.1.2 Example - Coronary Heart Disease

The following example is presented in the paper by Gu et al. (2008), which references
a prospective study reported by Stampfer and Willett (1985) examining the
relationship between post-menopausal hormone use and coronary heart disease (CHD).
Researchers were interested in whether the group using hormone replacement therapy
exhibited less coronary heart disease. The study did show strong evidence that the
incidence rate of CHD in the group who did not use hormonal therapy was higher than
that in the group who did use post-menopausal hormones. The authors then determined
the sample size necessary for the two groups when what they referred to as the ratio of
sampling frames is 2, known as the allocation ratio in East. The study assumed an
observation time of 2 years, and that the incidence rate of CHD for those using the
hormone therapy is 0.0005. The following excerpt from the paper presents the required
sample sizes for the participants using hormone therapy in order to achieve 90% power
at α = 0.05, for multiple different test procedures:

It is first necessary to relate the notation of the referenced paper to that used by East:

Gu et al. (2008)     East
γ1                   λt
γ0                   λc
R′ = 4               1/ρ1 = 4 (i.e., ρ1 = 0.25)
D = 2                Allocation Ratio = 2

Once again, in the Design tab under the Count group, choose Two Samples and then
Parallel Design - Ratio of Poisson Rates. Enter the following design parameters:
Test Type: 1-sided
Type 1 Error (α): 0.05
Power: 0.9
Sample Size (n): Computed (select radio button)
Allocation Ratio (nt /nc ): 2
Rate for Control (λc ): 0.002
Rate for Treatment (λt ): 0.0005
Ratio of Rates ρ1 = (λt /λc ): 0.25
Follow-up Control (Dc ): 2
Follow-up Treatment (Dt ): 2

The Allocation Ratio (nt : nc ) describes the ratio of patients to each arm. For example,
an allocation ratio of 2:1 indicates that two-thirds of the patients are randomized to the
treatment arm as opposed to one-third to the control. Compute the design to produce
the following output:

Table 6 in the referenced paper shows the number of subjects required for the treatment
group. The East results show that the total number of subjects required for the entire
study is 10027. Given that the allocation ratio is 2, the number of subjects required for
the control group is 10027/3 = 3342, and the treatment group therefore requires 6685.
This falls within the range of the sample sizes presented in the referenced paper (and is
close to the minimum of 6655), which calculates these sizes using a number of
different methods.
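As a cross-check, the hedged normal-approximation sketch poisson.ratio.n from
Section 41.1.1 comes within one subject of this result when given the CHD inputs:

poisson.ratio.n(alpha = 0.05, sided = 1, power = 0.9,
                lambda.c = 0.002, lambda.t = 0.0005,
                Dc = 2, Dt = 2, r = 2)   # 10026, vs. East's exact 10027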

41.2 Negative Binomial Ratio of Rates

In experiments where the data follow a binomial distribution, the number of successful
outcomes for a fixed number of trials is of importance when determining the sample
size to adequately power a study. Suppose instead that it is of interest to observe a
fixed number of successful outcomes (or failures), but the overall number of trials
necessary to achieve this is unknown. In this case, the data are said to follow a
negative binomial distribution. There are two underlying parameters of interest. As
with the Poisson distribution, λ denotes the average rate of response for a given
outcome. In addition, a shape parameter γ governs the dispersion of the counts around
this rate. As with the Poisson distribution, the negative binomial distribution can be
useful when designing a trial where one must wait for a particular event; here, we are
waiting for a specific number of successful outcomes to occur. A Poisson regression
analysis assumes a common rate of events for all subjects within a stratum, as well as
equal mean and variance (equidispersion). With overdispersed count data, estimates of
standard error from these models can be invalid, leading to difficulties in planning a
clinical trial. Increased variability resulting from overdispersed data requires a larger
sample size in order to maintain power. To address this issue by allowing variability
between patients, East provides valid sample size and power calculations for count
data using a negative binomial model, resulting in a better evaluation of study design
and an increased likelihood of trial success.

41.2.1 Trial Design

Suppose that a hypothetical manufacturer of robotic prostheses, ones that require
several components to fully function, has an order to produce a large quantity of
artificial limbs. According to historical data, about 20% of the current limbs fail the
rigorous quality control test and therefore cannot be shipped to patients. For each
order, the manufacturer must produce more than requested; in fact, they must continue
to produce limbs until the desired quantity passes quality control. Given the high cost
of producing these prosthetic limbs, it is of great interest to reduce the number that fail
the test.
The company plans to introduce a new feature to the current model, the goal being to
reduce the probability of failure to 10%. It is safe to assume that the enhancement will
not cause a decline in the original success rate. In this scenario, we wish to test the null
hypothesis H0 : λc = λt = 0.2 against the one-sided alternative H1 : λc > λt. Quality
control investigators wish to conduct a one-sided test at the α = 0.05 significance level
and to determine the sample size required to obtain 90% power to detect a 50% decline
in the event rate, i.e., λt/λc = 0.5. It is important to note that the power of the test
depends on λc and λt, not just their ratio, so different values of the pair (λc, λt) with
the same ratio will have different solutions. The same holds true for the shape
parameter: different values of (γc, γt) will result in different sample sizes or power
calculations. East allows user-specified shape parameters for the treatment and control
groups; however, for this example, assume that the shape parameter for both groups
is 10.

The following illustrates the design of a two-arm study comparing the control arm,
which is the current model of the prosthesis, to the treatment arm, which is the
enhanced model. In the Design tab, under the Count group, choose Two Samples and
then Parallel Design - Ratio of Negative Binomial Rates.

This will launch the following input window:

Enter the following design parameters:
Test Type: 1-sided
Type 1 Error (α): 0.05
Power: 0.9
Sample Size (n): Computed (select radio button)
Allocation Ratio (nt /nc ): 1
Rate for Control (λc ): 0.2
Rate for Treatment (λt ): 0.1
Ratio of Rates ρ = (λt /λc ): 0.5
Follow-up Time (D): 1
Shape Control (γc ): 10
Shape Treatment (γt ): 10


The Allocation Ratio (nt : nc ) describes the ratio of patients to each arm. For example,
an allocation ratio of 3:1 indicates that 75% of the patients are randomized to the
treatment arm as opposed to 25% to the control. Here we assume the same number of
patients in both arms. Click Compute. The design is shown as a row in the Output
Preview window:

The sample size required in order to achieve the desired 90% power is 1248 subjects.
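This figure can be sanity-checked by extending the earlier Poisson sketch so that each
arm's contribution to the variance of the log rate-ratio picks up its overdispersion. The
helper below is again a hypothetical illustration, written under the assumption
(consistent with the variance formula λ + kλ² quoted at the end of this section, and
with the result East reports here) that the shape parameter enters the variance
additively; it is not East's own algorithm.

# Approximate total sample size for the ratio of negative binomial rates.
# Illustrative sketch; assumes Var(Y) = lambda + shape * lambda^2.
negbin.ratio.n <- function(alpha = 0.05, sided = 1, power = 0.9,
                           lambda.c = 0.2, lambda.t = 0.1, D = 1,
                           shape.c = 10, shape.t = 10, r = 1) { # r = nt / nc
  z <- qnorm(1 - alpha / sided) + qnorm(power)
  # Per-arm terms in Var(log rate-ratio): 1 / (lambda * D) + shape
  v <- (1 / (lambda.c * D) + shape.c) + (1 / (lambda.t * D) + shape.t) / r
  nc <- z^2 * v / log(lambda.t / lambda.c)^2
  ceiling(nc * (1 + r))
}
negbin.ratio.n()   # 1248 for the design inputs above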
As is standard in East, this design has the default name Des 1. To see a summary of the
output of this design, click anywhere in the row and then click the Output Summary
icon in the Output Preview toolbar. The design details will be displayed, labeled
Output Summary.

In the Output Preview toolbar, click the Save in Workbook icon to save this design
Des1 to workbook Wbk1 in the Library. An alternative method to view design details
is to hover the cursor over the node Des1 in the Library. A tooltip will appear that
summarizes the input parameters of the design.

With the design Des1 selected in the Library, click the Plot icon on the Library
toolbar, and then click Power vs. Sample Size. The power curve for this design will be
displayed. You can save this chart to the Library by clicking Save in Workbook.
Alternatively, you can export the chart in one of several image formats (e.g., Bitmap or
JPEG) by clicking Save As..., or export it into a PowerPoint presentation.

Close the Power vs. Sample Size chart. To view all computed characteristics of this
design, select Des1 in the Library, and click the Details icon.

In addition to the Power vs. Sample Size chart and table, East also provides the
efficacy boundary in the Stopping Boundaries chart and table.
For a given sample size, East allows the computation of either the Type-1 error (α) or
the power of the test. Using the Design Input Output window and the methods
described above, simply enter the desired sample size and click Compute to calculate
the resulting power of the test.
In addition to this example, consider the following illustration of the benefit of using
the negative binomial model in clinical trials. In real-life settings, the variance of count
data observed between patients is typically higher than the observed mean. The
negative binomial model accommodates between-subject heterogeneity according to a
Gamma distribution. For example:
Poisson: Y ∼ Poisson(λ)
Negative Binomial: Yi ∼ Poisson(λki), where ki ∼ Gamma with mean 1 and variance k
In the case of no overdispersion (k = 0), the negative binomial model reduces to the
Poisson model. In the figure below, the Poisson and negative binomial models are
displayed under various values of the dispersion parameter.

Assuming the above parameterization, the variance of the negative binomial model is
λ + kλ². The variance is thus inflated relative to the Poisson variance λ by the factor
1 + kλ, which depends on the mean. Depending on the distributional assumption used
and its impact on the variance, sample size and power can vary widely.
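A short simulation makes the mixture representation above concrete. This is a minimal
sketch: the Gamma frailty is given mean 1 and variance k (shape 1/k, scale k), so the
marginal counts have mean λ and variance λ + kλ².

set.seed(1)
lambda <- 1.5                                      # mean event rate
k      <- 0.66                                     # overdispersion
frailty <- rgamma(1e5, shape = 1 / k, scale = k)   # mean 1, variance k
y <- rpois(1e5, lambda * frailty)                  # negative binomial counts
c(mean(y), var(y))          # approx. 1.5 and 1.5 + 0.66 * 1.5^2 = 2.985
var(rpois(1e5, lambda))     # Poisson comparison: approx. 1.5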
In multiple sclerosis (MS) patients, magnetic resonance imaging (MRI) is used as a
marker of efficacy by means of serial counts of lesions appearing on the brain.
Exacerbation rates are frequently used as a primary endpoint in MS, as well as in
chronic obstructive pulmonary disease (COPD) and asthma (Keene et al. 2007).
Poisson regression could be considered; however, this model would not address
variability between patients, which results in overdispersion. The negative binomial
model offers an alternative approach.
TRISTAN (Keene et al. 2007) was a double-blind, randomized study for COPD
comparing the effects of the salmeterol/fluticasone propionate combination product
(SFC) to salmeterol alone, fluticasone propionate alone and placebo. Although the
primary endpoint was pre-bronchodilator FEV1, the number of exacerbations was an
important secondary endpoint.
Suppose we are to design a new trial to be observed over a period of 1 to 2 years. The
primary objective is the reduction of the rate of exacerbations, defined as a worsening
of COPD symptoms that requires treatment with antibiotics, cortisone or both, with the
combination product SFC versus placebo. Based on the TRISTAN results, we aim to
reduce the incidence of events by 33%. Suppose the exacerbation rate is 1.5 per year,
and we expect to detect a rate of 1.0 in the combination group. Assume a 2-sided test
with a 5% significance level and 90% power. Using a Poisson model, a total of 214
patients would need to be enrolled in the study.
For the TRISTAN data, the estimate of the overdispersion parameter was 0.46 (95%
CI: 0.34-0.60). Using a negative binomial model with overdispersion of 0.33, 0.66, 1
and 2, the required sample size increases, ranging from 298 to 725, respectively.

Exacerbation rates are calculated as the number of exacerbations divided by the length
of time in treatment in years. East can be used to illustrate the impact of a one- versus
two-year study by changing the follow-up duration.
For 382 patients and a shape parameter of 0.66, power is increased from 90% to 97%
when follow-up time is doubled:

The number of patients required for a two-year study powered at 90% is 277, whereas
382 patients would be required to achieve the same power for a study period of one
year.
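These figures can be approximated with the same hedged negbin.ratio.n sketch from
Section 41.2.1; the agreement is exact for the Poisson and one-year cases and within
one subject for the two-year case:

negbin.ratio.n(sided = 2, power = 0.9, lambda.c = 1.5, lambda.t = 1.0,
               D = 1, shape.c = 0,    shape.t = 0)     # 214: Poisson model
negbin.ratio.n(sided = 2, power = 0.9, lambda.c = 1.5, lambda.t = 1.0,
               D = 1, shape.c = 0.66, shape.t = 0.66)  # 382: one-year study
negbin.ratio.n(sided = 2, power = 0.9, lambda.c = 1.5, lambda.t = 1.0,
               D = 2, shape.c = 0.66, shape.t = 0.66)  # 276, vs. East's 277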

Negative binomial models are increasingly popular in medical research, and East, as
an industry standard for trial design, continues to evolve by incorporating sample size
methods for count data. These models allow the counts to vary around the means of
groups of patients instead of a single population mean. Increased variability does,
however, lead to a larger test population; consequently, the balance between power,
sample size and duration of observation needs to be evaluated.


Volume 6

Time to Event Endpoints

42 Introduction to Volume 6
43 Tutorial: Survival Endpoint
44 Superiority Trials with Variable Follow-Up
45 Superiority Trials with Fixed Follow-Up
46 Non-Inferiority Trials Given Accrual Duration and Accrual Rates
47 Non-Inferiority Trials with Fixed Follow-Up
48 Superiority Trials Given Accrual Duration and Study Duration
49 Non-Inferiority Trials Given Accrual Duration and Study Duration
50 A Note on Specifying Dropout Parameters in Survival Studies
51 Multiple Comparison Procedures for Survival Data

42 Introduction to Volume 6

The chapters in this volume deal with clinical trials where the endpoint of interest is
the time from entry into the study until a specific event occurs (for example, death,
tumour recurrence, or heart attack). Such trials are also referred to as survival trials,
time-to-event trials, or time-to-failure trials. Long-term mortality trials in oncology,
cardiology or HIV usually select time-to-event as the primary endpoint. The group
sequential methodology is particularly appropriate for such trials because of the
potential to shorten the study duration and thereby bring beneficial new therapies to
patients sooner than would be possible with a conventional single-look design. In
contrast to studies involving normal and binomial endpoints, the statistical power of a
time-to-event study is determined not by the number of individuals accrued, but rather
by the number of events observed. Accruing only as many individuals as the number
of events required to satisfy power considerations implies having to wait until all
individuals have reached the event, which would probably make the trial extend over
an unacceptably long period of time. Therefore one usually accrues a larger number of
patients than the number of events required, so that the study may be completed within
a reasonable amount of time. East allows the user a high degree of flexibility in this
respect.
This volume contains Chapters 42 through 51. Chapter 42 is the present chapter. It
describes the contents of the remaining chapters of Volume 6.
Chapter 43 introduces you to East on the Architect platform, using an example clinical
trial to compare survival in two groups.
In Chapter 44 we discuss the Randomized Aldactone Evaluation Study (RALES) for
decreasing mortality in patients with severe heart failure (Pitt et al., 1999). The chapter
illustrates how East may be used to design and monitor a group sequential two-sample
superiority trial with a time-to-event endpoint. We begin with the simple case of a
constant enrollment rate, exponential survival and no drop-outs. The example is then
extended to cover non-uniform enrollment, non-constant hazard rates for survival, and
differential drop-out rates between the treatment and control arms. The role of
simulation in providing additional insights is discussed, and simulations in the
presence of non-proportional hazard rates and stratification variables are explained.
The trial was designed so that every subject who had not dropped out or reached the
stated endpoint would be followed until the trial was terminated. This is an example of
a variable follow-up design, because subjects who are enrolled at the beginning of the
enrollment phase are followed for a longer time than subjects who are enrolled later.
In contrast to Chapter 44, Chapter 45 deals with the fixed follow-up design. Here we
design a trial in which each subject can only be followed for a maximum of one year
and then goes off study. We use East to design such a trial, basing the design
parameters on the PASSION and TYPHOON trials, two recently published studies of
drug-eluting stents (Spaulding et al., 2006; Laarman et al., 2006). The impact of
variable accrual patterns and drop-outs is also taken into account.
Chapter 46 shows how to use East to design a non-inferiority trial with a time-to-event
endpoint. The setting is a clinical trial to demonstrate the non-inferiority of Xeloda to
5-FU+LV in patients with metastatic colorectal cancer (Rothman et al., 2003). Part of
the discussion in this chapter concerns the choice of the non-inferiority margin.
Chapter 47 illustrates through a worked example how to design, monitor and simulate
a two-sample non-inferiority trial with a time-to-event endpoint in which each subject
who has not dropped out or experienced the event is followed for a fixed duration only.
This implies that each subject who does not drop out or experience the event within a
given time interval, as measured from the time of randomization, will be
administratively censored at the end of that interval. In East we refer to such designs
as fixed follow-up designs.
Chapters 48 and 49 handle the trade-off between patient accruals and study duration in
a different way from the previous chapters. In contrast to publicly funded trials, which
usually lack the resources to exert control over the accrual rate of a trial, industry trials
are often run with a fixed timeframe as the constraint. Industry sponsors would rather
adjust the patient recruitment rate by opening and closing investigator sites than delay
the end of a study and therefore their entire drug development program, time to
market, and revenue. Chapters 48 and 49 illustrate how to design superiority and
non-inferiority trials in East given a fixed accrual period and fixed study duration.
Additionally, these design options provide the users with many useful graphs that chart
the relationship between power, sample size, number of events, accrual duration, and
study duration.
Also note that Chapter 44 contains a section that guides the user through the powerful
survival simulation tool available in East.
Chapter 50 is a note giving details on specifying dropout parameters for survival
studies in East, with the help of an example.
A unified formula for calculating the expected number of events d(l) in a time-to-event
trial can be found in Appendix D.
42.1 Settings

Click the settings icon in the Home menu to adjust default values in East 6.

The options provided in the Display Precision tab are used to set the decimal places of
numerical quantities. The settings indicated here will be applicable to all tests in East 6
under the Design and Analysis menus.
All these numerical quantities are grouped into different categories depending upon
their usage. For example, all the average and expected sample sizes computed at the
simulation or design stage are grouped together under the category "Expected Sample
Size". To view any of these quantities with greater or lesser precision, select the
corresponding category and change the decimal places to any value between 0 and 9.
The General tab provides for adjusting the paths for storing workbooks, files, and
temporary files. These paths will persist through the current and future sessions, even
after East is closed. This is also where you specify the installation directory of the R
software in order to use the R Integration feature of East 6.

Design Defaults is where the user can change the settings for trial design:

Under the Common tab, default values can be set for input design parameters.
You can set up the default choices for the design type, computation type, test type, and
the default values for type-I error, power, sample size and allocation ratio. When a new
design is invoked, the input window will show these default choices.
Time Limit for Exact Computation
This time limit is applicable only to exact designs and charts. Exact methods are
computationally intensive and can easily consume several hours of computation
time if the likely sample sizes are very large. You can set the maximum time
available for any exact test in terms of minutes. If the time limit is reached, the
test is terminated and no exact results are provided. The minimum and default
value is 5 minutes.
Type I Error for MCP
If the user has selected a 2-sided test as the default in the global settings, then any
MCP will use half of the alpha from the settings as its default, since an MCP is
always a 1-sided test.
Sample Size Rounding
Notice that, by default, East displays the integer sample size (events) by rounding
up the actual number computed by the East algorithm. In this case, the
look-by-look sample size is rounded off to the nearest integer. One can also see
the original floating-point sample size by selecting the option "Do not round
sample size/events".
Under the Group Sequential tab, defaults are set for boundary information. When a
new design is invoked, input fields will contain these specified defaults. We can also
set the option to view the Boundary Crossing Probabilities in the detailed output. It can
be either Incremental or Cumulative.

Simulation Defaults is where we can change the settings for simulation:

If the checkbox "Save summary statistics for every simulation" is checked, East
simulations will by default save the per-simulation summary data for all the
simulations in the form of case data.
If the checkbox "Suppress All Intermediate Output" is checked, the intermediate
simulation output window will always be suppressed and you will be directed to the
Output Preview area.
The Chart Settings tab allows defaults to be set for the following quantities on East 6
charts:

We suggest that you do not alter the defaults until you are quite familiar with the
software.

43 Tutorial: Survival Endpoint

This tutorial introduces you to East 6, using examples for designing a clinical trial to
compare survival in two groups. It is suggested that you go through the tutorial while
you are at the computer, with East 6 running on it.

43.1 A Quick Feel of the Software

When you open East 6, the screen will look as shown below.

In the tabs bar at the top of the ribbon, the Design tab is already selected. Each tab has
its own ribbon. All the command buttons under the Design tab are displayed in its
ribbon, with suggestive icons. These commands are grouped under the categories
Continuous, Discrete, Count, Survival and General. For this tutorial, let us explore the
command Two Samples under the Survival category. In East, we use the terms 'time to
event' and 'survival' interchangeably. Click on Two Samples. You will see a list of
action items, which are dialog box launchers.

Click on Logrank Test Given Accrual Duration and Study Duration. You will get
the following dialog box in the work area.

This dialog box is for computing Sample Size (n) and Number of Events. All the
default input specifications under the tab Design Parameters are on display: Design
Type=Superiority, Number of Looks=1, Test Type=1-Sided, Type-1 Error (α)=0.025,
Power (1-β)=0.9, Allocation Ratio (nt /nc )=1, # of Hazard Pieces=1, Input
Method=Hazard Rates, Hazard Ratio (λt /λc )=0.5, Log Hazard Ratio
ln(λt /λc )=-0.693, Hazard Rate (Control)=0.0347, Hazard Rate (Treatment)=0.0173,
and Variance of Log-Hazard Ratio=Null. There are two radio buttons in this dialog
box, one beside the Power (1-β) box and the second beside the combined boxes for
Sample Size (n) and Number of Events. By default, the latter radio button is selected,
indicating that the items against this radio button are to be computed using all other
inputs. Similarly, if the first radio button is selected, then Power will be computed
using all other inputs.
Now click on the tab Accrual/Dropout and you will see the following dialog box.

The default specifications in this dialog box are: Subjects are followed=Until End of
Study, Accrual Duration=22, Study Duration=38, # of Accrual Periods=1, and no
Dropouts. Accept all the default specifications that are displayed for this single-look
design, and be ready to compute the Sample Size (n) and the Number of Events for the
design. Click Compute.
At the end of the computation, you will see the results appearing at the bottom of the
screen, in the Output Preview pane, as shown below.

This single row of output preview contains relevant details of all the inputs and the
computed results for events and accruals. The maximum value for events is 88 and the
committed accrual is 182 subjects. Since this is a fixed-look design, the expected
events are the same as the maximum required.
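This number of events can be checked against a well-known back-of-the-envelope
formula (Schoenfeld's approximation for the logrank test). The sketch below is our
own illustration rather than East's documented algorithm, although it agrees exactly
here:

# Schoenfeld-type approximation to the number of events needed for a
# 1-sided logrank test; hypothetical helper, not East's routine.
d.schoenfeld <- function(alpha = 0.025, power = 0.9, hr = 0.5, r = 1) {
  z <- qnorm(1 - alpha) + qnorm(power)    # r is the allocation ratio nt/nc
  ceiling((1 + r)^2 / r * z^2 / log(hr)^2)
}
d.schoenfeld()   # 88 events for the default inputs above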
Click anywhere in this row, and then click on the Output Summary icon to get a
detailed display in the upper pane of the screen as shown below.

The contents of this output, displayed in the upper pane, are the same as what is
contained in the output preview row for Design1 shown in the lower pane, but the
upper pane display is easier to read and comprehend. The title of the upper pane
display is Output Summary. This is because you can choose more than one design in
the Output Preview pane, and the display in the upper pane will show the details of all
the selected designs in juxtaposed columns.

The discussion so far gives you a quick feel of the software for computing the required
events and sample size for a single-look survival design. We have not discussed all the
icons in the output preview pane, the library pane, or the hidden Help pane on the
screen. We will describe them using an example of a group sequential design in the
next section.

43.2 Group Sequential Design for a Survival Superiority Trial
43.2.1 Background Information on the study
43.2.2 Creating the design in East
43.2.3 Design Outputs
43.2.4 East icons explained
43.2.5 Saving created designs
43.2.6 Displaying Detailed Output
43.2.7 Comparing Multiple Designs
43.2.8 Events vs. Time plot
43.2.9 Simulation
43.2.10 Interim Monitoring

43.2.1 Background Information on the study

The Randomized Aldactone Evaluation Study (RALES) was a double-blind
multicenter clinical trial of an aldosterone-receptor blocker vs. placebo, published in
the New England Journal of Medicine (vol. 341, no. 10, pages 709-717, 1999). This
trial was open to patients with severe heart failure due to systolic left ventricular
dysfunction. The primary endpoint was all-cause mortality. The anticipated accrual
rate was 960 patients/year. The mortality rate for the placebo group was 38%. The
investigators wanted 90% power to detect a 17% reduction in the mortality hazard rate
for the Aldactone group (from 0.38 to 0.3154) with α = 0.05, 2-sided test. Six DMC
meetings were planned. The dropout rate in both groups is expected to be 5% each
year. The patient accrual period is planned to be 1.7 years and the total study duration
to be 6 years.

43.2.2 Creating the design in East

For our purpose, let us create our own design from the basic details of this study. Start
East afresh. On the Design tab, click on Two Samples under the Survival category.
You will see a list of action items, which are dialog box launchers.

Click on the second option, Logrank Test Given Accrual Duration and Study
Duration. You will get the following dialog box in the work area.

All the specifications you see in this dialog box are default values, which you will have
to modify for the study under consideration.
Now, let the Design Type be Superiority.

Next, enter 6 in the Number of Looks box. You can see that the range of choices for
the number of looks is from 1 to 20.

Immediately after this selection, you will see a new tab, Boundary Info, added to the
input dialog box. We will look into this tab after you complete the current tab, Design
Parameters.
Next, choose 2-Sided in the Test Type box.

Next, enter 0.05 in the Type-1 Error (α) box, and 0.9 in the Power box.

Next enter the specifications for survival parameters. Keep # of Hazard Pieces as 1.
Click on the check box against Hazard Ratio and choose Hazard Rates as the Input
Method. Enter 0.83 as the Hazard Ratio and 0.38 as the Hazard Rate (Control). East
computes and displays the Hazard Rate (Treatment) as 0.3154. Keep the default choice
of Null for Variance of Log-Hazard Ratio. Now the dialog box will look as shown
below.

Next, click the tab Accrual/Dropout. Keep the specification 'Until End of Study' for
Subjects are followed. Enter 1.7 as Accrual Duration and 6 as Study Duration.
Keep # of Accrual Periods as 1. Change the # of Pieces for dropouts to 1. Choose
'Prob. of Dropout' as the Input Method for entering information on dropouts. Enter
0.05 as the probability of dropout at the end of 1 year for both groups. Now the dialog
box will appear as shown below.

Now click on the Boundary tab. In the dialog box of this tab, you can specify stopping
boundaries for efficacy, futility, or both. For this trial, let us consider Efficacy
boundaries only. Choose 'Spending Functions' as the Efficacy Boundary Family.

Choose 'Lan-DeMets' in the Spending Function box.
Choose 'OF' in the Parameter box.
Next, click the radio button near 'Equal' for Spacing of Looks.
Choose 'Z Scale' in the Efficacy Boundary Scale box.

In the table of look-wise details below, the columns Info Fraction, Cumulative Alpha
Spent, and the upper and lower efficacy boundaries are computed and displayed as
shown here. Scroll a little to see the details of the sixth look.

The two icons shown represent buttons for the Error Spending Function chart and the
Stopping Boundaries chart, respectively. Click these two buttons one by one to see the
following charts.

43.2.3 Design Outputs

Now you have completed specifying all the inputs required for a group sequential trial
design, and you are ready to compute the required events and sample size or accruals
for the trial. Click on the Compute button. After the computation is over, East will
show the following results in the Output Preview pane:

This single row of output preview contains relevant details of all the inputs and the
computed results for events and accruals. The maximum required Events is computed
as 1243 and the Committed Accrual as 1646 subjects. The expected Events under H0
and H1 are estimated to be 1234 and 904, respectively. The expected Study Duration
under H0 and H1 is 5.359 and 3.729 years, respectively.
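As a rough cross-check, the hedged Schoenfeld-type sketch from Section 43.1 gives
the single-look benchmark for these design parameters; the 6-look maximum of 1243
sits about 2.7% above it, the usual price of repeated looks with a Lan-DeMets
(O'Brien-Fleming) boundary:

d.schoenfeld(alpha = 0.05 / 2, power = 0.9, hr = 0.83)   # 1211 events
# 1243 / 1211 = 1.026: the inflation due to taking six looks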
Click anywhere in this Output Preview row and then click on the Output Summary
icon to get a summary in the upper pane of the screen as shown below.

43.2.4 East icons explained

In the 'Output Preview' pane, you see the following icons in the upper row.

The functions of these icons are as indicated below; the tooltips will also indicate
their functions.
Tutorial: Survival Endpoint
Output Summary (The output summary of selected design(s) will appear in the
upper pane)
Edit Design (The input dialog box of a selected design will appear in the upper pane)
Save in Workbook (Save one or more selected designs in a workbook)
Delete (Delete one or more selected designs)
Rename (Rename design names)
Print (Print selected designs)
Display Precision (Local settings)
Filter (Filter and select designs according to specified conditions)
Show/Hide Columns (Show/hide columns of the designs in the Output Preview panel)

The following icons can be seen at the right end of the Output Preview pane and the
Output Summary or Input/Output window, respectively. Their functions are:
Maximize Output Preview Pane
Minimize Output Preview Pane

You may also notice a row of icons at the top of the Output Summary window, as
shown below.

The first icon is for Plot (Plots of a selected design will appear in a pop-up window).

The second icon is for Show Tables (The data for different plots can be displayed in
tabular form in pop-up windows).

If you have multiple designs in the Output Summary window, the third icon becomes
active and can be used to reorder the design columns in the Output Summary.

The fourth icon is to print the Output Summary window.
As an example, if you click Power vs. Sample Size under the Plot icon, you will get
the following chart.

If you want to see the data underlying the above chart, click the Show Tables icon and
click Power vs. Sample Size. You will see the following table in a pop-up window.

You can customize the format of the above table and also save it as case data in a
workbook. You may experiment with all the above icons/buttons to understand their
functions.

43.2.5 Saving created Designs in the library and hard disk

In the Output Preview pane, select one or more design rows and click the Save in
Workbook icon. The selected design(s) will then get added as node(s) in the current
workbook, as shown below.

The above action only adds the design to the workbook node in the library; it is not
yet saved to the hard disk. To save to the hard disk, you may either save the entire
workbook or only the design, by right-clicking on the desired item and choosing the
save or save as option.
Here in the library also, you see rows of icons.

Some of these icons you have already seen. The functions of other icons are:
Details (Details of a selected design will appear on the upper pane in the work
area)
Output Settings (Output Settings can be changed here)
Simulate (Start the simulation process for any selected design node)
Interim Monitoring (Start the Interim Monitoring process for any selected
design)

43.2.6 Displaying Detailed Output

Select the design from the Library and click the Details icon, or right-click on the
Des1 node in the library and click Details. You will see the detailed output of the
design displayed in the work area.

43.2.7 Comparing Multiple Designs

Click on the Des1 row and then click the Edit Design icon. You will get the input
dialog box in the upper pane. Change the Power value to 0.8 and then click Compute.
You will now see that Des2 has been created and a row added to the Output Preview
pane, as shown below.

Click on the Des1 row and then, keeping the Ctrl key pressed, click on the Des2 row.
Now both rows will be selected. Next, click the Output Summary icon. You will now
see the output details of these two designs displayed in the upper pane, Compare
Designs, in juxtaposed columns, as shown below.

In a similar way, East allows the user to easily create multiple designs by specifying a
range of values for certain parameters in the design window. In a survival trial, for
example, the Logrank Test Given Accrual Duration and Study Duration design
allows the input of multiple key parameters at once to simultaneously create a number
of different designs. Suppose in a multi-look study the user wants to generate designs
for all combinations of the following parameter values: Power = 0.8 and 0.9, and
Hazard Ratio - Alternative = 0.6, 0.7, 0.8 and 0.9. The number of combinations is
2 x 4 = 8. East creates all permutations using only a single specification under the
Design Parameters tab in the design window. As shown below, the values for Power
are entered as a list of comma-separated values, while the alternative hazard ratios are
entered as a colon-separated range of values, 0.6 to 0.9 in steps of 0.1.

East computes all 8 designs and displays them in the Output Preview window:

East provides the capability to analyze multiple designs in ways that make
comparisons between the designs visually simple and efficient. To illustrate this, a
selection of a few of the above designs can be viewed simultaneously, both in the
Output Summary section and in the various tables and plots. The following is a subset
of the designs computed in the above example, with differing values for number of
looks, power and hazard ratio. Designs are displayed side by side, allowing details to
be easily compared:

In addition, East allows multiple designs to be viewed simultaneously, either
graphically or in tabular format. Notice that all four designs in the Output Summary
window are selected. The following figures compare these four designs in different
formats.

Stopping Boundaries (table)

Expected Sample Size (table)

Power vs. Sample Size (plot)

Total Sample Size / Events vs. Time (plot)

This capability allows the user to explore a greater space of possibilities when
determining the best choice of study design.

43.2.8 Events vs. Time plot

For survival studies, East provides a variety of charts and plots to visually validate and
analyze the design. For example, the Sample Size / Events vs. Time plot allows the
user to see the rate of increase in the number of events (control and treatment) over
time (accrual duration, study duration). An additional feature of this particular chart is
that a user can easily update key input parameters to determine how multiple different
scenarios can directly impact a study. This provides significant benefits during the
design phase, as the user can quickly examine how a variety of input values affect a
study before the potentially lengthy task of simulation is employed.
To illustrate this feature, what follows is the example from the RALES study. For
study details, refer to the subsection Background Information on the study in this
tutorial. Currently there are ten designs in the Output Preview area. Select Des1 from
them and save it to the current workbook. You may delete the remaining ones at this
point. To view the Sample Size / Events vs. Time plot, select the corresponding node
in the Library and under the Charts icon choose Sample Size / Events vs. Time:

Survival parameters for this design can be edited directly through this chart by clicking
the Modify button. The Modify Survival Design window is then displayed for the
user to update design parameters:

To illustrate the benefit of the modification feature, suppose at design time there is
potential flexibility in the accrual and duration times for the study. To see how this
may affect the number of subsequent events, modify the design to change the Accrual
Duration to 3 and the Study Duration to 4. Click OK and re-create the plot to view the
effect of these new values on the shape and magnitude of the curves:

Similar steps can be taken to observe the effect of changing other parameter values on
the number of events necessary to adequately power a study.

43.2.9 Simulation

In the library, right-click on the node Des1 and click Simulate. You will be presented
with the following Simulation sheet.

This sheet has four tabs: Test Parameters, Response Generation, Accrual/Dropout,
and Simulation Controls. Additionally, you can click Include Options and add some
more tabs, such as Site, Randomization, User Defined R Function and Stratification.
The first three tabs essentially contain the details of the parameters of the design. In
the Simulation Controls tab, you can specify the number of simulations to carry out
and specify the file for storing simulation data. Let us first carry out 1000 simulations
to check whether the design can reach the specified power of 90%. The Response
Generation tab, by default, shows the hazard rates for control and treatment. We will
use these values in our simulation.

In the Simulation Controls tab, specify the number of simulations as 1000. Use the
Random Number Seed as Fixed 12345.

Let us keep the values in the other tabs as they are and click Simulate. The progress of
the simulation process will appear in a temporary window as shown below.

This is the intermediate window showing the complete picture of the simulations.
Close this window after viewing it. You can see the complete simulation output in the
details view. A new row, with the ID Sim1, will be added in the Output Preview.

Click on the Sim1 row and click the Output Summary icon. You will see the
Simulation Output summary appearing in the upper pane. It shows the simulated
power as 0.892, indicating that in 892 out of 1000 simulations the boundary was
crossed.

You can save Sim1 as a node in the workbook. If you right-click on this node and then
click Details, you will see the complete details of the simulation appearing in the work
area. Here is a part of it.

43.2.10 Interim Monitoring

Click the Des1 node under workbook Wbk1 and click the Interim Monitoring icon.
Alternatively, you can right-click the Des1 node and select the item Interim
Monitoring. In either case, you will see the IM dashboard appearing as shown below.

In the top row, you see a few icons. For now, we will discuss only the first icon, which
represents the Test Statistic Calculator. Using this calculator, you will enter the details
of the interim look data analysis results into the IM dashboard.
Suppose we have the following data used by the Data Monitoring Committee during
the first 5 looks of interim monitoring.

Date      Total Deaths    δ̂         SE(δ̂)    Z-Statistic
Aug 96    125             -0.283     0.179     -1.581
Mar 97    299             -0.195     0.116     -1.681
Aug 97    423             -0.248     0.097     -2.557
Mar 98    545             -0.259     0.086     -3.012
Aug 98    670             -0.290     0.077     -3.766
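The Z-statistics in this table are simply the estimates divided by their standard errors,
as a quick R check confirms:

delta.hat <- c(-0.283, -0.195, -0.248, -0.259, -0.290)
se        <- c( 0.179,  0.116,  0.097,  0.086,  0.077)
round(delta.hat / se, 3)   # -1.581 -1.681 -2.557 -3.012 -3.766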

The first look was taken at 125 events, and the analysis of the data showed the value
of δ̂ = -0.283 and SE(δ̂) = 0.179. First, click the blank row in the IM Dashboard and
then click the Test Statistic Calculator icon. Now you can enter the first analysis
results into the TS calculator and click Recalc. The Test Statistic value will be
computed and the TS calculator will appear as shown below.

Now click the OK button to bring the first look details into the IM Dashboard. A
message will appear indicating that some required computations are being carried out.

After the computations are over, the output for the first look will appear in the IM
Dashboard as shown below.

For the first look, at a total of 125 events, the Information Fraction works out to be
0.101. The efficacy boundaries for this information fraction are newly computed. The
Repeated 95% Confidence Interval limits and the Repeated p-value are computed and
displayed. You may also see that the charts at the bottom of the IM Dashboard have
been updated, with relevant details appearing on the side.

In a similar way, enter the interim analysis results for the next 4 looks in the IM
Dashboard. At the fifth look, the boundary is crossed. A message window appears as
shown below.

Click Stop and you will see the details of all the looks in the IM Dashboard as shown
below.

The final Adjusted Inference output also appears as displayed below.

One important point to note here is that this study ended almost 2 years ahead of the
planned schedule, because of the very favorable interim analysis results.
This completes the Interim Monitoring exercise for this trial.

43.3 User Defined R Function
East allows you to customize simulations by inserting user-defined R functions for one
or more of the following tasks: generate response, compute test statistic, randomize
subjects, generate arrival times, and generate dropout information. The R functionality
for arrivals and dropouts will be available only if you have entered such information at
the design stage. Although the R functions are also available for all normal and
binomial endpoints, we will illustrate this functionality for a time-to-event endpoint.
Specifically, we will use an R function to generate Weibull survival responses.
Start East afresh. On the Design tab, click Survival: Two Samples and then Logrank
Test Given Accrual Duration and Study Duration.

Choose the design parameters as shown below. In particular, select a one-sided test
with a type-1 error of α = 0.025.

Click Compute and save this design (Des1) to the Library. Right-click Des1 in the
Library and click Simulate. In the Simulation Controls tab, check the box for
Suppress All Intermediate Output. Type 10000 for Number of Simulations and
select Clock for Random Number Seed.

In the top right-hand corner of the input window, click Include Options, and then
click User Defined R Function.

Go to the User Defined R Function tab. For now, leave the box Initialize R
simulation (optional) unchecked. This optional task can be used to load required
libraries, set seeds for simulations, and initialize global variables.
Select the row for Generate Response, click Browse..., and navigate to the folder
containing your R file. Select the file and click Open. The path should now be
displayed under File Name.

Click View to open a notepad application to view your R file. In this example, we are
generating survival responses for both the control and treatment arms from a Weibull
distribution with a decreasing hazard (shape parameter less than 1), with the same
hazard rate in both arms. This sample file is available in the folder named R Samples
under the installation directory of East 6.
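For readers without the installed sample file at hand, here is a minimal sketch of what
such a generate-response function could look like. The argument names (NumSub,
HazardRate, Shape) and the return convention are illustrative assumptions only, not
East's documented interface; consult the shipped R Samples file for the exact signature
East expects.

# Hypothetical sketch of a Weibull response generator for East's
# Generate Response hook; argument names are assumptions for illustration.
GenWeibull <- function(NumSub, HazardRate, Shape = 0.5) {
  # Shape < 1 gives a decreasing hazard; Shape = 1 is the exponential.
  # The scale 1/HazardRate reproduces rate = HazardRate when Shape = 1.
  rweibull(NumSub, shape = Shape, scale = 1 / HazardRate)
}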

Copy the function name (in this case GenWeibull) and paste it into the cell for
Function Name. Save and close the R file, and click Simulate.

Return to the tab for User Defined R Function, select the Generate Response row,
and click View. In the R function, change the shape parameter to 1, to generate
responses from the exponential distribution (constant hazard). Save and close the R
file, and click Simulate. You may have to save this file in some other location.

Select both simulations (Sim1 and Sim2) from the Output Preview, and on the
toolbar, click the Output Summary icon to display them in the Output Summary.

Notice that the type-1 error appears to be controlled in both cases. When we simulated
from the exponential (Sim2), the average study duration (30.7 months) was close to
what was calculated at Des1 for the expected study duration under the null. However,
when we simulated from the Weibull with decreasing hazards (Sim1), the average
study duration increased to 34.6 months.
The ability to use custom R functions for many simulation tasks allows considerable
flexibility in performing sensitivity analyses and assessing key operating
characteristics.


44 Superiority Trials with Variable Follow-Up

This chapter will illustrate through a worked example how to design, monitor and
simulate a two-sample superiority trial with a time-to-event endpoint. Each subject
who has not dropped out or experienced the event is followed until the trial ends. This
implies that a subject who is enrolled earlier could potentially be followed for a longer
time than a subject who is enrolled later in the trial. In East we refer to such designs as
variable follow-up designs.

44.1 The RALES Clinical Trial: Initial Design

The RALES trial (Pitt et al., 1999) was a double-blind study of the aldosterone-receptor
blocker spironolactone at a daily dose of 25 mg in combination with standard doses of
an ACE inhibitor (treatment arm) versus standard therapy of an ACE inhibitor (control
arm) in patients who had severe heart failure as a result of systolic left ventricular
dysfunction. The primary endpoint was death from any cause. Six equally-spaced
looks at the data using the Lan-DeMets-O'Brien-Fleming spending function were
planned. The trial was designed to detect a hazard ratio of 0.83 with 90% power at a
two-sided 0.05 level of significance. The hazard rate of the control arm was estimated
to be 0.38/year. The trial was expected to enroll 960 patients/year.

We begin by using East to design RALES under these basic assumptions. Open East,
click the Design tab and then the Two Samples button in the Survival group. You will
see the following screen.

Note that there are two choices available in the above list; Logrank Test Given
Accrual Duration and Accrual Rates and Logrank Test Given Accrual Duration
and Study Duration. The option Logrank Test Given Accrual Duration and Study
Duration is explained later in Chapter 48. Now click Logrank Test Given Accrual

Duration and Accrual Rates and you will get the following input dialog box.

In the above dialog box, enter 6 for Number of Looks, keep the default choice of Design Type: Superiority, change the Test Type to 2-Sided and the Type I Error (α) to 0.05, and keep the Power at 0.9 and the Allocation Ratio at 1.
Further, keep the default choices of # of Hazard Pieces as 1 and the Input Method as Hazard Rates. Click the check box against Hazard Ratio and enter the Hazard
Ratio as 0.83. Enter Hazard Rate (Control) as 0.38. You will see the Hazard
Rate (Treatment:Alt) computed as 0.3154. Also, keep the Variance of Log
Hazard Ratio to be used as under Null. Now the Test Parameters tab of the input

dialog will appear as shown below.

Now click on the Boundary tab. You will see the following input dialog box.

Keep all the default specifications for the boundaries to be used in the design. You can look at the Error Spending Chart by clicking on the error spending chart icon. Close this chart.
If you click on the boundary chart icon, you will see the boundary chart as displayed below.
Close this chart.
Now click the Accrual/Dropouts tab. Keep the default choice Until End of Study for the input Subjects are followed. Keep the # of Accrual Periods as 1 and enter 960/year as the accrual rate. For this example, assume no dropouts. The dialog box

will look as shown below.

Under the Accrual section, in the column titled Comtd. (committed), you see two radio buttons, Durations and Subjects, with the latter selected by default. The selected item will appear as the x-axis of the Study Duration vs. Accrual chart, which you can obtain by clicking on the chart icon displayed at the side. Against Durations and Subjects you see two rows of three cells each: the first and third cells show the minimum and maximum values for the row item, and the middle cell shows a value midway between them. From these results, you see that any sample size in the range 1243 to 3111 will suffice to attain the desired 90% power; East selects 2177, the mid-point of the allowable range, as the default sample size. Depending on the needs of the study, you may wish to use a different sample size within the allowable range. The choice of sample size generally depends on how long you wish the study to last: the larger the patient accrual, the shorter the total study duration, which consists of accrual time plus follow-up time. To understand the essence of this trade-off, bring up the Study Duration vs. Accrual chart by clicking on its icon.
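The trade-off that this chart displays can be approximated outside East as well. The following is a minimal sketch, assuming exponential survival, uniform accrual at 960/year, no dropouts, and roughly 1243 required events; it is not East's algorithm:

    # Expected events by calendar time T (>= accrual duration a) when n subjects
    # accrue uniformly over a = n/960 years, with exponential hazard rates
    # 0.38 (control) and 0.3154 (treatment)
    expected_events <- function(T, n, rate = 960, lam = c(0.38, 0.3154)) {
      a <- n / rate
      sum((n / 2) * (1 - (exp(-lam * (T - a)) - exp(-lam * T)) / (lam * a)))
    }
    # Calendar time by which ~1243 events are expected when n = 1660:
    uniroot(function(T) expected_events(T, 1660) - 1243, c(1.8, 30))$root  # ~4.9 years

Increasing n shortens the accrual period and pulls this root in, which is exactly the shape of the curve in the chart.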

Based on this chart, a sample size of 1660 subjects is selected. Close the chart and enter 1660 for Committed Accrual (Subjects). Click on Compute and see the results in the new design created under Output Preview. Click the output summary icon to see the design summary. This sample size ensures that the maximum study duration will be slightly more than 4.9 years. Additionally, under the alternative hypothesis, the

expected study duration will be only about 3.3 years.

44.2 Incorporating Drop-Outs

The investigators expect 5% of the patients in both groups to drop out each year. To incorporate this drop-out rate into the design, in the Piecewise Constant Dropout Rates tab, select 1 for the number of pieces and change the Input Method from Hazard Rates to Prob. of Dropout. Then enter 0.05 as the probability of dropout at 1 year for both groups.
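The two input methods are related by a simple conversion when dropout is assumed to follow an exponential distribution. The following one-liner (our own arithmetic, shown for clarity) gives the dropout hazard rate implied by a 5% one-year dropout probability:

    # Under exponential dropout, P(dropout by time t) = 1 - exp(-lambda * t),
    # so a 5% dropout probability at 1 year corresponds to
    -log(1 - 0.05) / 1    # lambda of about 0.0513 per year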

To make Des1 and Des2 comparable, change the sample size of Des2 to 1660 by
typing this value into the Committed Accrual (Subjects) cell. Click on
Compute and see the results in the new design created under Output Preview.
Select the two designs and click on the comparison icon to see them side-by-side.

A comparison of the two designs reveals that, because of the drop-outs, the maximum
study duration will be prolonged from 4.9 years under Des1 to 5.9 years under Des2.
The expected study duration will likewise be prolonged from 3.3 years to 3.7 years
under the alternative hypothesis, and from 4.5 years to 5.3 years under the null
hypothesis.

44.3 Incorporating Non-Constant Accrual Rates

In many clinical trials, the enrollment rate is low in the beginning and reaches its
maximum expected level a few months later when all the sites enrolling patients have
been recruited. Suppose that patients are expected to enroll at an average rate of
400/year for the first six months and at an average rate of 960/year thereafter. Click on the icon at the bottom of your screen to go back to the input window of Des2. Now, in the Accrual section, specify that there are two accrual periods and enter the accrual rate for each period in the dialog box as shown below.
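The enrollment duration implied by these two accrual rates is easy to verify by hand:

    # 400/year for the first half-year, then 960/year until 1660 are enrolled
    0.5 + (1660 - 400 * 0.5) / 960    # about 2.02 years

This matches the roughly 2-year enrollment period East reports below.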

Once again let the sample size be 1660 to make Des3 comparable to the other two
designs. Click on Compute to complete the design. Select all three designs in the Output Preview area and click on the comparison icon to see them side-by-side.

Notice that the enrollment period has increased from 1.7 years to 2 years. Likewise, the
maximum study duration and the expected study durations under H0 and H1 have also
increased relative to Designs 1 and 2. Now the maximum study duration is 6.15 years.

44.4 Incorporating Piecewise Constant Hazards

Prior studies had suggested that the survival curves might not follow an exponential
distribution. Suppose it is believed that the hazard rate for failure on the control arm
decreases after the first 12 months from 0.38 to 0.35. We will assume that the hazard
ratio is still 0.83. We can enter the appropriate piecewise hazard rates into East as
follows. Click on the icon at the bottom of your screen to go back to the input window and go to the Test Parameters tab.
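Because the hazard ratio is assumed to remain constant at 0.83, the treatment-arm hazard rates for the two pieces follow directly from the control-arm rates:

    0.83 * c(0.38, 0.35)    # 0.3154 before month 12, 0.2905 thereafter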

Change the sample size to 1660 on the Accrual/Dropouts tab for comparability with the previous designs. Click on Compute and see the results of the design in the Output Summary mode.

We observe that the impact of changing from a constant hazard rate to a piecewise constant hazard rate is substantial. The maximum study duration has increased from 6.15 years for Des3 to 6.56 years for Des4. Before proceeding further, save all four designs in the workbook.
44.5 Simulating a Trial with Proportional Hazards

It would be useful to verify the operating characteristics of the various designs created
in the previous section by simulation. The new survival simulation capabilities in East
permit this. Let us use these capabilities to simulate Des4. Save this design in the
workbook. Right-click on this design node and select the menu item Simulate. You’ll
see the following Survival Simulation worksheet.

44.5.1 Components of the Simulation Worksheet

This simulation worksheet consists of four tabs: Test Parameters, Response
Generation, Accrual/Dropouts, and Simulation Controls. The Test Parameters tab
displays all the parameters of the simulation. If desired, you may modify one or more
of these parameter values before carrying out simulation. The second tab Response
Generation will appear as shown below.

In this tab, you may modify values of response parameters before carrying out
simulation. The third tab Accrual/Dropouts will display information relating to
accrual and dropouts.

As in the case of other tabs, you may modify one or more values appearing in this tab
before simulation is carried out.
In the Simulation Controls tab, you may specify simulation parameters such as the number of simulations required and the simulation seed.

Optionally, you may bring out one more tab, Randomization, by clicking on the Include Options button at the top right corner. In the Randomization tab, you may alter the allocation ratio of the design before carrying out simulation. The other tabs under Include Options will be discussed elsewhere in this manual.

Keeping all the parameter values in the different tabs at their defaults, click Simulate. You can see the progress of the simulation process summarized as shown in the following screen shot. The complete summary of simulations will be displayed

at the end of simulations.

Close this window. The simulation results appear in a row in the Output Preview
as shown below.

The output summary can be seen by clicking on the output summary icon after selecting the simulation row in the Output Preview.

Now save the simulation results to the workbook by selecting the simulation results row and then clicking on the save icon. On this newly added workbook node for simulation, right-click and select Details. You will see the complete simulation details

appearing on the output pane. The core part is shown below.

44.5.2 Simulating Under H1
Notice that in the above simulations, we did not change anything on the Response Generation tab, which means that we executed 10000 simulations under the design assumptions, in other words, under the alternative hypothesis.
Let us examine these 10000 simulations more closely. The actual values may differ
from the manual, depending on the starting seed used.
The column labeled Events in the second table displays the number of events after which each interim look was taken. The column labeled Avg. Look Time in the first table displays the average calendar time at which each interim look was taken. Thus, the first interim look (taken after observing 207 events) occurred after an average elapsed time of about 1.5 years; the second interim look (taken after observing 414 events) occurred after an average elapsed time of about 2.1 years; and so on. The remaining columns
of the simulation output are self-explanatory. The columns labeled Stopping For
show that 8966 of the 10000 simulations crossed the lower stopping boundary, thus
confirming (up to Monte Carlo accuracy) that this design has 90% power. The detailed
output tables also show how the events, drop-outs, accruals, and average follow-up
times were observed at each interim analysis.
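As a quick plausibility check on the look schedule (assuming a maximum of about 1243 events, consistent with the 207-event spacing noted above), the six equally spaced looks fall at:

    round(1243 * (1:6) / 6)    # 207 414 622 829 1036 1243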

44.5.3 Simulating Under H0
To simulate under the null hypothesis, we must go back to the input window of Sim1 and then to the Response Generation tab. In this pane, change the hazard rate for the treatment arm to 0.38 for the first piece and to 0.35 for the second piece of the hazard function.

This change implies that we will be simulating under the null hypothesis. Click on the Simulate button. A new row will be added in the Output Preview. Select this row and add it to the library node. Double-click on this node to see the detailed simulation output, displayed below.

Out of 1000 simulated trials, only 27 crossed the upper stopping boundary and 25 crossed the lower stopping boundary, together about 5% of the trials, thus confirming (up to Monte Carlo accuracy) that the type-1 error is preserved for this design.

44.6 Simulating a Trial with Non-Proportional Hazards
A new agent is to be tested against placebo in a large cardiovascular study with the
endpoint being time to stroke, MI or death. The control arm has a 12-month event-free
rate of 97%. We wish to design the study to detect a hazard ratio of 0.75 with 90%
power, using a two-sided test conducted at the 0.05 level. An important design
consideration is that treatment differences are expected to emerge only after one year
of therapy. Subjects will enroll at the rate of 1000/month and be followed to the end of
the study. The dropout rate is expected to be 10% per year for both treatment arms. Finally, the study should be designed for a maximum study duration of 50 months.
The usual design options in East are not directly applicable to this trial because they
require the hazard ratio to be constant under the alternative hypothesis. Here, however,
we are required to power the trial to detect a hazard ratio of 0.75 that only emerges
after patients have been on the study for 12 months. The simulation capabilities of East
can help us with the design.
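Before turning to simulation, it is worth noting where the control-arm hazard rate used below comes from. Assuming exponential survival, a 12-month event-free rate of 97% converts to a monthly hazard rate as follows:

    -log(0.97) / 12    # about 0.00254 per month, entered as 0.0025 in what follows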

44.6.1 Single-Look Design with Proportional Hazards

We begin by creating a single-look design powered to detect a hazard ratio of 0.75, ignoring the fact that the two survival curves separate out only after 12 months. Open a new survival design worksheet by clicking on Design > Survival > Logrank Test Given Accrual Duration and Accrual Rates. In the resulting Test Parameters tab, enter the parameter values as shown below.

Click on the Accrual/Dropouts tab and enter the values as shown below,
excluding the Accrual tab.

East informs you in the Accrual tab that any sample size in the range 2524 to 22260
will suffice to attain the desired 90% power. However, the study will end sooner if we
enroll more patients. Recall that we wish the trial to last no more than 50 months,
inclusive of accrual and follow-up. The Accrual-Duration chart can provide
guidance on sample size selection. This chart reveals that if 6400 subjects are enrolled,
the expected maximum duration of a trial is close to 50 months.

Now change the Comtd. number of subjects to 6400 and click on Compute to complete the design. A new row is added for this design in the Output Preview. Select this row and add it to a library node under a workbook. If you now double-click on this node, you will see the detailed output. A section of it is shown below:

We can verify the operating characteristics of Des1 by simulation. With the cursor on the Des1 node, click on the Simulation icon on the library menu bar. You'll be taken to the survival simulation worksheet. In the Simulation Control tab, specify the number of simulations to be 1000. Now click on the Simulate button. This will
generate 1000 simulations from the survival curves specified in the design. Each
simulation will consist of survival data on 6400 subjects entering the trial uniformly at
the rate of 1000/month. Events (failures) will be tracked and the simulated trial will be
terminated when the total number of events equals 508. Subjects surviving past this
termination time point will have their survival times censored. The resulting survival
data will be summarized in terms of the logrank test statistic. Each simulation records
two important quantities:
the calendar time at which the last of the specified 508 events arrived;
whether or not the logrank test statistic rejected the null hypothesis.
We would expect that, on average, the 508 events will occur in about 48.7 months and
about 90% of the simulations will reject the null hypothesis. The simulation summary

is shown in the following screen shot.

Indeed we observe that the average study duration for this set of 1000 simulations was 48.691 months, and that 913 of the 1000 simulated trials crossed the critical value and rejected H0; hence the attained power is 0.913. This serves as an independent verification of the operating characteristics of Des1, up to Monte Carlo accuracy.

44.6.2 Single-Look Design with Non-Proportional Hazards

Were it not for the fact that the hazard ratio of 0.75 only emerges after 12 months of
therapy, Des1 would meet the goals of this study. However, the impact of the late
separation of the survival curves must be taken into consideration. This is
accomplished, once again, by simulation. Click the Edit Simulation icon while the cursor is on the last simulation node. In the resulting simulation sheet, click on the Response Generation tab. In this tab, specify that the hazard rates for the control and treatment arms are identical and equal to 0.0025 per month for the first 12 months, and that the hazard ratio is 0.75 thereafter. This is done by making the appropriate entries in this tab, as shown below.
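To make the delayed-effect model concrete, here is a minimal sketch (our own code, not East's implementation) of drawing survival times from a two-piece exponential hazard by the inverse-CDF method:

    # Two-piece exponential: hazard lam1 up to tchange, lam2 afterwards
    rpwexp <- function(n, lam1, lam2, tchange) {
      h <- rexp(n)                       # unit-rate draws = target cumulative hazard
      t <- h / lam1                      # correct while the first piece applies
      late <- h > lam1 * tchange         # draws that survive past the change point
      t[late] <- tchange + (h[late] - lam1 * tchange) / lam2
      t
    }
    ctrl <- rpwexp(3200, 0.0025, 0.0025,        12)  # control: constant hazard
    trt  <- rpwexp(3200, 0.0025, 0.75 * 0.0025, 12)  # treatment: HR 0.75 after month 12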

Click on the Simulate button. This will generate 1000 simulations from the survival curves specified in the Survival Parameters Pane. As before, each simulation will consist of survival data on 6400 subjects entering the trial uniformly at the rate of 1000/month. Events (failures) will be tracked, and the simulated trial will be terminated when the total number of events equals 508. The summary output of this simulation run is shown below.

This time only 522 of the 1000 trials were able to reject H0. The drop in power is of
course due to the fact that the two survival curves do not separate out until 12 months
have elapsed. Thus events that arise within the first 12 months arrive at the same rate
for both arms and are not very informative about treatment differences.
We need to increase the power of the study to 90%. This can be accomplished in one
of two ways:
1. Prolonging the study duration until a sufficient number of events are obtained to
achieve 90% power.
2. Increasing the sample size.
The first approach cannot be used because the study duration is not permitted to exceed
50 months. The simulations have shown that the study duration is already almost 50 months, and that only 52.2% power was achieved. Thus we must resort to increasing the
sample size.
Now if we increase the sample size while keeping the total number of events fixed at
508, the average study duration will drop. The power, however, may not increase. In
fact it might even decrease since a larger fraction of the 508 events will arise in the first
12 months, before the two survival curves have separated. To see this, increase the
sample size from 6400 to 10000 in the Accrual/Dropouts tab. Then click on
Simulate button. From this simulation run, you will get the output summary as
shown below.

Notice that the average study duration has dropped to 29.7 months. But the power has dropped also. This time only 261 of the 1000 simulations could reject the null hypothesis.
To increase power we must increase sample size while keeping the study duration fixed
at about 50 months. This is accomplished by selecting the Look Time option from
the drop-down box in the Fix at Each Look section of the Survival
Parameters Pane and choosing a 50 month Total Study Durn., while keeping the

890

44.6 Simulating a Trial with Non-Prop.Hazards – 44.6.2 Single-Look Design

<<< Contents

* Index >>>
R

East 6.4
c Cytel Inc. Copyright 2016
sample size increase from 6400 to 10000.

We will now run 1000 simulations in each of which 10000 subjects are enrolled at the rate of 1000/month. Each simulated trial will be terminated at the end of 50 months of calendar time and a logrank test statistic will be derived from the data. Click on the Simulate button. Add the simulation run output to the library node and see the following output summary.

For more details, you can click the details icon after selecting the saved simulation node.

Now you can see that the power of the study has increased to 73.5%. On average, 811 events occurred during the 50 months that the study remained open. Since we require 90% power, the sample size must be increased even further. This can be done by trial
and error over several simulation experiments. Eventually we discover that a sample
size of 18000 patients will provide about 90% power with an average of 1358 events.

It is evident from these simulations that the proportional hazards assumption is simply
not appropriate if the survival curves separate out late. In the present example the
proportional hazards assumption would have led to a sample size of 6400 whereas the
sample size actually needed was 18000.

44.6.3 Group Sequential Design with Non-Proportional Hazards

The single-look design discussed in the previous section required a sample size of 18000 subjects. A group sequential design, monitored by an independent data monitoring committee, is usually more efficient for large studies of this type. Such a trial can be designed with efficacy stopping boundaries or with efficacy and futility stopping boundaries. Consider first a design with five equally spaced looks and efficacy boundaries. Go back to the library, click on the Des1 node, and then click on the edit design icon. In the resulting design input dialog window, change the entry in the Number of Looks cell from 1 to 5. Click on the Compute button and save the plan as Des2 in the library. Select the Des1 and Des2 nodes and then click on the comparison icon to see the following details for both the designs.

Des2 reveals that a group sequential design, with five equally spaced looks, taken after
observing 104, 208, 312, 416 and 520 events, respectively, utilizing the default
Lan-DeMets-O’Brien-Fleming (LD(OF)) spending function, achieves 90% power
with a maximum sample size of 12555 and a maximum study duration of 27.232
months. The expected study duration under H1 is 21.451 months. However, these
operating characteristics are based on the assumption that the hazard ratio is constant
and equals 0.75. Since in fact the hazard ratio is 0.75 only after 12 months of
treatment, the actual power of this design is unlikely to be 90%. We can use simulation
to determine the actual power. With the cursor on the Des2 node, select the Simulate icon from the menu bar. You will be taken to the simulation worksheet. In the Response
Generation tab, make the changes in the hazard rates as shown below.

After changing the number of simulations to 1000 in the Simulation Control, click on
the Simulate button to run 1000 simulations of Des2 with data being generated from
the survival distributions that were specified in the Response Generation tab.

The results of this simulation run are as shown below.

Only 187 of the 1000 simulated trials were able to reject the null hypothesis, indicating
that the study is grossly underpowered. We can improve on this performance by
extending the total study duration so that additional events may be observed. To
increase study duration, go to the Simulation Parameters tab and select the
Look Time option under Fix at Each Look. We had specified at the outset that
the total study duration should not exceed 50 months. Let us therefore fix the total
study duration at 50 months and space each interim look 10 months apart by editing

the Study Duration.

We are now ready to simulate a 5-look group sequential trial in which the LD(OF)
stopping boundaries are applied and the looks are spaced 10 months apart. Each
simulated trial will enroll 12555 subjects at the rate of 1000/month. The simulation
data will be generated from survival distributions in which the hazard rates of both
arms are 0.0025 for the first 12 months and the hazard ratio is 0.75 thereafter. To
generate 1000 simulations of this design, click on the Simulate button. These simulations do indeed show a substantial increase in power, from 18.7% previously to 79.9%.

The design specifications stated, however, that the trial should have 90% power. In
order to achieve this amount of power we will have to increase the sample size. By
trial and error, upon increasing the sample size to 18200 on the Simulation Parameters tab, we observe that the power has increased to 90% (up to Monte Carlo accuracy).

44.7 Simulating a Trial with Stratification Variables

The data presented in Appendix I of Kalbfleisch and Prentice (1980) on lung cancer
patients were used as a basis for this example. We will design a trial to compare two
treatments (Standard and Test) in a target patient group where patients had some prior
therapy. The response variable is the survival time in days of lung cancer patients.
First, we will create a design for 3 looks, to compare the two treatment groups. Next,
using this design, we will carry out simulation with stratification variables. Three
covariates in the data are used here as stratum variables: a) type of cancer cell (small, adeno, large, squamous), b) age in years (≤ 50, > 50), and c) performance status score (≤ 50, > 50 and ≤ 70, > 70).
The input data for the base design are as follows: trial type: superiority; test type: 2-sided; type I error: 0.05; power: 0.90; allocation ratio: 1; hazard rate (control): 0.009211; hazard rate (treatment): 0.004114; number of looks: 3; boundary family: spending functions; spending function: Lan-DeMets (OF); subjects are followed: until end of study; subject accrual rate: 12 per day.
The input data for the stratified simulation are given below. The number of stratum variables is 3 (cell type; age group; performance status score).
Table 44.1: Input data for stratified simulation

Cell type                        Proportion   Hazard ratio
small                            0.28         Baseline
adeno                            0.13         2.127
large                            0.25         0.528
squamous                         0.34         0.413

Age group                        Proportion   Hazard ratio
≤ 50 years                       0.28         Baseline
> 50 years                       0.72         0.438

Performance status score group   Proportion   Hazard ratio
≤ 50                             0.43         Baseline
> 50 and ≤ 70                    0.37         0.164
> 70                             0.20         0.159

44.7.1 Creating the design

First we will create a design using the input data. Open East, click Design tab and then
Two Samples button in Survival group. Now click Logrank Test Given Accrual Duration and Accrual Rates. In the resulting screen, enter the input data in the dialog
boxes under the different tabs. Finally, click on the Compute button. Now the dialog boxes under the different tabs will appear as shown below.
The Test Parameters tab is shown below, where you can see the computed value of No. of Events.

The Boundary tab will appear as shown below, where all the input data can be seen.

The Accrual/Dropouts tab containing the input data will be as shown below.

After the design is completed and saved in a workbook, select the design node and

click on the output summary icon to see the following output display.

44.7.2 Running Stratified Simulation

After selecting the design node, click on the Simulate icon. You will see the simulation screen with dialog boxes under different tabs. Click on Include Options and select Stratification.
The dialog box under Test Parameters will be as shown below. Keep the default test
statistic LogRank and the default choice of Use Stratified Statistic.
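As a reminder of what the stratified statistic does, the sketch below uses R's survival package on toy data; the data frame and variable names are hypothetical, and this is not East's code. A stratified logrank compares the arms only within each stratum and then pools the stratum-level comparisons:

    library(survival)
    set.seed(1)
    n <- 400
    toy <- data.frame(
      arm  = rbinom(n, 1, 0.5),
      cell = sample(c("small", "adeno", "large", "squamous"), n,
                    replace = TRUE, prob = c(0.28, 0.13, 0.25, 0.34))
    )
    # Toy event times with cell-type effects as in Table 44.1 and a treatment effect
    rate <- 0.009211 *
      c(small = 1, adeno = 2.127, large = 0.528, squamous = 0.413)[toy$cell]
    toy$time   <- rexp(n, rate * ifelse(toy$arm == 1, 0.45, 1))
    toy$status <- 1    # no censoring in this toy example
    survdiff(Surv(time, status) ~ arm + strata(cell), data = toy)  # stratified logrank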

After entering the stratification input information, the dialog box under Stratification
will appear as shown below.

After entering the response-related input information, the dialog box under

Response Generation will display details as shown in the following screen shots.

The Accrual/Dropout dialog box will appear as shown below.

In the Simulation Control tab, specify the number of simulations as 1000 and select the choices under the output options to save the simulation data. The dialog box will appear as shown below.

After clicking on the Simulate button, the results will appear in a row in the Output Preview. Click on it and save it in the workbook. Select this simulation node and click on the Output Summary icon to see the following stratified simulation output summary.

The stratified simulation results show that the attained power, 0.856, is somewhat less than the design-specified power of 0.90.

45 Superiority Trials with Fixed Follow-Up

This chapter will illustrate through a worked example how to design, monitor and simulate a two-sample superiority trial with a time-to-event endpoint in which each subject who has not dropped out or experienced the event is followed for a fixed duration only. This implies that each subject who does not drop out or experience the event within a given time interval, as measured from the time of randomization, will be administratively censored at the end of that interval. In East we refer to such designs as fixed follow-up designs.

45.1 Clinical Trial of Drug-Eluting Stents

Drug-eluting coronary-artery stents were shown to decrease the risks of death from
cardiac causes, myocardial infarction and target-vessel revascularization as compared
to uncoated stents in patients undergoing primary percutaneous coronary intervention
(PCI) in two randomized clinical trials published in the September 14, 2006 issue of
the New England Journal of Medicine. In the Paclitaxel-Eluting Stent versus
Conventional Stent in Myocardial Infarction with ST-Segment Elevation (PASSION)
trial, Laarman et al. (2006) randomly assigned 619 patients to receive either a
paclitaxel-eluting stent or an uncoated stent. The primary endpoint was the percentage
of cardiac deaths, recurrent myocardial infarctions or target-lesion revascularizations at
12 months. A marginally lower 12-month failure rate was observed in the
paclitaxel-stent group compared with the uncoated-stent group (8.8% versus 12.8%, p
= 0.09). The Trial to Assess the Use of the Cypher Stent in Acute Myocardial Infarction Treated with Balloon Angioplasty (TYPHOON) (Spaulding et al., 2006) showed even more promising results. In this trial of 712 patients the sirolimus-eluting stents had a significantly lower target-vessel failure rate at 12 months than the uncoated stents (7.3% versus 14.3%, p = 0.004). Based on these results, an editorial by Van de Werf (2006) appeared in the same issue of the New England Journal of Medicine as the TYPHOON and PASSION trials, recommending that studies with a
larger sample size and a hard clinical endpoint be conducted so that drug-eluting stents
might be routinely implanted in patients undergoing PCI. In this chapter we will use
East to design and monitor a possible successor to the PASSION trial using a
time-to-event endpoint with one year of fixed follow-up for each subject.

45.2 Single-Look Design

The primary endpoint for the trial is the time to target-vessel failure, with a failure
being defined as target-vessel related death, recurrent myocardial infarction, or
target-vessel revascularization. Each subject will be followed for 12 months. Based on
the PASSION data we expect that 87.2% of subjects randomized to the uncoated stents
will be event-free at 12 months. We will design the trial for 90% power to detect an
increase to 91.2% in the paclitaxel-stents group, using a two-sided level-0.05 test.
Enrollment is expected to be at the rate of 30 subjects per month.


45.2.1 Initial Design

We begin by opening a new East Workbook and selecting Logrank Test Given
Accrual Duration and Accrual Rates.
This will open the input window for the design as shown below. Select 2-Sided for Test Type, and enter 0.05 for Type I Error.
The right-hand panel of this input window is used for entering the relevant time-to-event information.

The default values in the above dialog box must be changed to reflect the time-to-event
parameters specified for the design. Select % Cumulative Survival for the Input

Method and enter the relevant 12-month event-free percentages.

Change the Input Method to Hazard Rates. You will see the information you entered converted as shown below. Note that you may need to change the decimal display options for hazard rates, using the decimals icon, to see these numbers with more decimal places.
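The conversion East performs here assumes exponential survival within each arm and can be reproduced directly:

    # Monthly hazard rates implied by 12-month event-free rates of 87.2% and 91.2%
    lam_c <- -log(0.872) / 12    # about 0.01141 (uncoated stents)
    lam_t <- -log(0.912) / 12    # about 0.00768 (paclitaxel-eluting stents)
    lam_t / lam_c                # hazard ratio of about 0.6725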

Another parameter to be decided is the Variance, which specifies whether the calculation of the required number of events is to be based on the variance estimate of the log hazard ratio under the null hypothesis or the alternative hypothesis. The default choice in East is Null. Most textbooks recommend this choice as well (see, for example, Collett, 1994, equation (2.21) specialized to no ties). It will usually not be necessary to change this default. For a technical discussion of this issue, refer to Appendix B, Section B.5.3.
The second tab, labeled Accrual/Dropouts, is used to enter the patient accrual rate and, for fixed follow-up designs, the duration of patient follow-up and the dropout information. In this example the clinical endpoint is freedom from target-vessel failure for 12 months. Patients who are still on study at month 12 and who have not experienced the endpoint will be treated as censored. Therefore, in the first of the two panels, we select the entry from the dropdown indicating that subjects are followed For Fixed Period, and enter the number 12 in the corresponding edit box. Suppose that the anticipated rate of enrollment is 30 patients per month. This number is also entered into the dialog box as shown below. Let the committed accrual of subjects be 2474.

The second panel, labeled Piecewise Constant Dropout Rates, is used to
enter the rate at which we expect patients to drop out of the study. For the present we
will assume that there are no drop-outs.

An initial design, titled Des1, is created in the Output Preview pane upon clicking the Compute button. Click on the save icon to save the design in a workbook, or on the output summary icon to see the output summary of this design.

East reveals that 268 events are required in order to obtain 90% power. If each patient can only be followed for a maximum of 12 months, we must commit to enrolling a total of 2474 patients over a period of 82.5 months. With this commitment we expect to see the required 268 events within 12 months of the last patient being enrolled, so the total study duration is expected to be 82.5 + 12 = 94.5 months.
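These figures can be sanity-checked with the same kind of back-of-envelope arithmetic used earlier, assuming exponential survival and exactly 12 months of follow-up per subject (a rough check, not East's algorithm):

    # Schoenfeld events for a hazard ratio of log(0.912)/log(0.872), about 0.6725
    d <- 4 * (qnorm(0.975) + qnorm(0.90))^2 / log(log(0.912) / log(0.872))^2
    ceiling(d)                      # about 268 events
    p <- 1 - (0.872 + 0.912) / 2    # chance a randomized subject fails by month 12
    ceiling(d) / p                  # about 2481 subjects, close to East's 2474
    2474 / 30                       # about 82.5 months of accrual at 30/month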
To see how the events are expected to arrive over time, invoke a plot of Sample Size/Events vs. Time by clicking the Plots icon from the toolbar.

Uncheck the Sample Size box to see the events graph on a larger scale, as shown below.

45.3 Shortening the Study Duration

Under Des1 the trial will last for 94.5 months, with 82.5 months of patient enrollment (i.e., a sample size of 2474 subjects). This is not considered satisfactory by the trial sponsor. There are three possible ways in which the study duration might be shortened: by increasing the sample size, by increasing the duration of patient follow-up, or by increasing the rate of patient enrollment.

45.3.1 Increasing the