East 6 User Manual
Page Count: 2767
East® 6.4
© Cytel Inc. Copyright 2016

Preface

Acknowledgements

Welcome to East, a software platform for the statistical design, simulation and monitoring of clinical trials. The current release of East (version 6.4) was developed by a team comprising (in alphabetical order): Gordhan Bagri, Dhaval Bapat, Priyanka Bhosle, Jim Bolognese, Sudipta Basu, Jaydeep Bhattacharyya, Swechhya Bista, Apurva Bodas, Pushkar Borkar, V. P. Chandran, Soorma Das, Pratiksha Deoghare, Aniruddha Deshmukh, Namita Deshmukh, Yogesh Dhanwate, Suraj Ghadge, Pranab Ghosh, Karen Han, Aarati Hasabnis, Pravin Holkar, Munshi Imran Hossain, Abhijit Jadhav, Yogesh Jadhav, Prachi Jagtap, Paridhi Jain, Yannis Jemiai, Ashwini Joshi, Nilesh Kakade, Janhavi Kale, Aditya Kamble, Anthiyur Kannappan, Parikshit Katikar, Uday Khadilkar, Kapildev Koli, Yogita Kotkar, Hrishikesh Kulkarni, Mandar Kulkarni, Mangesh Kulkarni, Shailesh Kulthe, Charles Liu, Lingyun Liu, Shashank Maratkar, Cyrus Mehta, Pradoshkumar Mohanta, Manashree More, Tejal Motkar, Ankur Mukherjee, Nabeela Muzammil, Neelam Nakadi, Vijay Nerkar, Sandhya Paranjape, Gaurangi Patil, Vidyadhar Phadke, Anup Pillai, Shital Pokharkar, Vidyagouri Prayag, Achala Sabane, Sharad Sapre, Rohan Sathe, Pralay Senchaudhuri, Rhiannon Sheaparé, Pradnya Shinde, Priyadarshan Shinde, Sumit Singh, Sheetal Solanki, Chitra Tirodkar, Janhavi Vaidya, Shruti Verma, Pantelis Vlachos, Suchita Wageshwari, Kiran Wadje, Ritika Yadav.

Other contributors to this release include Asmita Ghatnekar, Sam Hsiao, Brent Rine, Ajay Sathe, Chinny Swamy, Nitin Patel, Yogesh Gajjar, Shilpa Desai.
Other contributors who worked on previous releases of East: Gayatri Bartake, Ujwala Bamishte, Apurva Bhingare, Bristi Bose, Chandrashekhar Budhwant, Krisnaiah Byagari, Vibhavari Deo, Rupali Desai, Namrata Deshpande, Yogesh Deshpande, Monika Ghatage, Ketan Godse, Vishal Gujar, Shashikiran Halvagal, Niranjan Kshirsagar, Kaushal Kulkarni, Nilesh Lanke, Manisha Lohokare, Jaydip Mukhopadhyay, Abdulla Mulla, Seema Nair, Atul Paranjape, Rashmi Pardeshi, Sanket Patekar, Nabarun Saha, Makarand Salvi, Abhijit Shelar, Amrut Vaze, Suryakant Walunj, Sanhita Yeolekar.

We thank all our beta testers for their input and obvious enthusiasm for the East software. They are acknowledged by name in Appendix Z.

We owe a debt of gratitude to Marvin Zelen and to Swami Sarvagatananda, special people whose wisdom, encouragement and generosity have inspired Cytel for over two decades.

Finally, we dedicate this software package to our families and to the memory of our dearly departed Stephen Lagakos and Aneesh Patel.

Our Philosophy

We would like to share with you what drives and inspires us during the research and development stages of the East software.

Empower, do not Frustrate

We believe in making simple, easy-to-use software that empowers people. We believe that statisticians have a strategic role to play within their organization and that by using professionally developed trial design software they will utilize their time better than if they write their own computer programs in SAS or R to create and explore complex trial designs. With the help of such software they can rapidly generate many alternative design options that accurately address the questions at hand and the goals of the project team, freeing time for strategic discussions about the choice of endpoints, population, and treatment regimens. We believe that software should not frustrate the user’s attempt to answer a question.
The user experience ought to engage the statistician and inspire exploration, innovation, and the quest for the best design. To that end, we believe in the following set of principles:

Fewer, but Important and Useful Features
It is better to implement fewer, but important and useful features, in an elegant and simple-to-use manner, than to provide a host of options that confuse more than they clarify. As Steve Jobs put it: ‘Innovation is not about saying “Yes” to everything. It’s about saying “No” to all but the most crucial features.’

Just because we Can, doesn’t mean we Should
Just because we can provide functionality in the software doesn’t mean we should.

Simplify, Simplify, Simplify
Find and offer simple solutions, even for the most complex trial design problems.

Don’t Hurry, but continually Improve
Release new solutions when they are ready to use and continually improve the commercial releases with new features, bug fixes, and better documentation.

Provide the best Documentation and Support
Our manuals are written like textbooks, to educate, clarify, and elevate the statistical knowledge of the user. Our support is provided by highly competent statisticians and software engineers, focusing on resolving the customer’s issue, and being mindful of the speed and quality requirements. We believe that delivering delightful customer support is essential to our company’s lifeblood.

Finally, we listen to our customers constantly and proactively through countless informal and formal interactions, software trainings, and user group meetings. This allows us to follow all the principles laid out above in the most effective manner.

Assess

It is essential to be able to assess the benefits and flaws of various design options and to work one’s way through a sensitivity analysis to evaluate the robustness of design choices.
East can very flexibly generate multiple fixed sample size, group sequential, and other adaptive designs at the click of a button. The wealth of design data generated in this manner requires new tools to preview, sort, and filter through in order to make informed decisions.

Share

Devising the most innovative and clever designs is of no use if the statistician is unable to communicate in a clear and convincing manner what the advantages and characteristics of the design are for the clinical trial at hand. We believe statistical design software tools should also be communication tools to share the merits of various trial design options with the project team and encourage dialog in the process. The many graphs, tables, simulation output, and other flexible reporting capabilities of East have been carefully thought out to provide clear and concise communication of trial design options in real time with the project team.

Trust

East has been fully validated and intensely tested. In addition, the East software package has been in use and relied upon for almost 20 years. East has helped design and support countless actual studies at all the major pharmaceutical and biotech companies, academic research centers, and government institutions. We use and rely on our software every day in our consulting activities to collaborate with our customers, helping them optimize and defend their clinical trial designs. This also helps us quickly identify things that are frustrating or unclear, and improve them fast, for our own sake and that of our customers.

What’s New in East 6.4

Version 6.4 of East introduces some important new features:

1. Multi-arm multi-stage designs: East now offers the ability to design multi-arm multi-stage studies with options for early stopping, dose selection, and sample size re-estimation.
The group sequential procedures (Gao et al., 2014) have been implemented for normal endpoints, whereas the p-value combination approaches (Posch et al., 2005) have been implemented for both normal and binomial endpoints. See Chapters 17, 18 and 29 for more details.

2. Multiple endpoints designs for binomial endpoints: Gatekeeping procedures to control family-wise type-1 error when testing multiple families of binomially distributed endpoints are now available in East for fixed sample (1-look) designs. East will also use the intersection-union test when testing a single family of endpoints. See Chapters 16 and 28 for more details.

3. Multi-arm designs for survival endpoints: Designs for pairwise comparisons of treatment arms to control have been added for survival endpoints. See Chapter 51 for more details.

4. Enrollment and event prediction: East now includes options to predict enrollment and events based on accumulating blinded data and summary statistics. Prediction based on unblinded data was already implemented in the previous version, so the current version provides both options, Unblinded as well as Blinded. See Chapter 68 for more details.

5. Dual agent dose-escalation designs: This version of East adds methods to the Escalate module for dual-agent dose-escalation designs, including the Bayesian logistic regression model (BLRM; Neuenschwander et al., 2014), and the Product of Independent beta Probabilities dose Escalation (PIPE; Mander et al., 2015). Numerous feature enhancements have also been made to the existing single-agent dose escalation designs. See Chapter 32 for more details.

6. Bayesian probability of success (assurance) and predictive power for survival designs: East 6.4 will now calculate assurance (O’Hagan et al., 2005), or Bayesian probability of success, and predictive power for survival endpoints. See Chapter 48 for more details.

7. Interim monitoring using the Müller and Schäfer method: East 6.4 now provides the capability to monitor clinical trials adaptively, using the Müller and Schäfer method. Currently, this feature is available for survival endpoint tests only. See Chapter 56 for more details.

8. General usability enhancements: Numerous enhancements have been made to the software to improve the user experience and workflow.

What’s New in East 6.3

Version 6.3 of East introduces some important new features:

1. Updates to Promising Zone designs (Ratio of Proportions designs; Müller and Schäfer type-1 error control method; Estimation): East 6.3 introduces Promising Zone designs for the ratio of proportions. East 6.3 also implements the method of Müller and Schäfer (2001) to control type-1 error for adaptive unblinded sample size re-estimation designs. This is available for simulation and interim monitoring. In addition, estimation using Repeated Confidence Intervals (RCI) and Backward Image Confidence Intervals (BWCI) (Gao, Liu & Mehta, 2013) is available in Müller and Schäfer simulations. See Chapter 52 for more details.

2. Multiple endpoint designs: Parallel gatekeeping procedures to control family-wise type-1 error when testing multiple families of normally distributed endpoints are now available in East for fixed sample (1-look) designs. East will also use the intersection-union test when testing a single family of endpoints. See Chapter 16 for more details.

3. Exact designs for binomial endpoints: East now includes the ability to use the exact distribution when computing power and sample size for binomial endpoints. This applies to all binomial tests in the case of fixed designs.
In addition, group sequential exact designs are available for the single proportion case, and Simon’s two-stage optimal and minimax designs (Simon, 1989) have been implemented; these allow for early futility stopping while optimizing the expected sample size and the maximum sample size, respectively. See Chapter 33 for more details.

4. Dose escalation designs: East 6.3 now includes a module for the design, simulation, and monitoring of modern dose-escalation clinical trials. Model-based dose-escalation methods in this module include the modified Continual Reassessment Method (mCRM; Goodman et al., 1995), the Bayesian logistic regression model (BLRM; Neuenschwander et al., 2008), and the modified Toxicity Probability Interval (mTPI; Ji et al., 2010). See Chapter 32 for more details.

5. Predictive interval plots, conditional simulations, and enrollment/events prediction: East 6.3 now includes a module that offers the ability to simulate and forecast the future course of the trial based on current data. This includes conditional simulations to assess expected treatment effects and associated repeated confidence intervals at future looks (also called Predicted Interval Plots or PIP; Li et al., 2009), as well as the probability of finishing with a successful trial (conditional power). You can also plan and simulate clinical trials with greater precision using different accrual patterns and response information for different regions/sites. East allows you to make probabilistic statements about accruals, events, and study duration using Bayesian models and accumulating data. See Chapters 65, 66 and 67 for more details.

6. Sample size and information calculators: Sample size and information calculators have been added back into East to allow easy calculation of the two quantities. See Chapter 59 for more details.

7. Exporting/Importing between East and East Procs: East 6.3 designs can now be exported to work with the newly released East Procs. The output from East Procs can be imported back into East 6.3 for use in the East Interim Monitoring dashboard and to conduct conditional inference and simulations. See Chapter 69 for more details.

8. Changes to East input: Many changes have been implemented in East to enhance the user experience in providing input for their designs. These changes include the ability to specify multiple values of input parameters for survival designs (most notably the Hazard Ratio), the ability to directly convert many fixed sample designs into group sequential designs with the use of the Sample Size based design option, and the ability to convert an ANOVA design into a Multiple Comparison to Control design.

9. Changes to East output: Display of East output has been changed in many ways, including color coding of input and output, ability to collapse and expand individual tables, greater decimal display control, and more exporting options for results (e.g. ability to export graphs directly into Microsoft PowerPoint).

What’s New in East 6.2

Version 6.2 of East introduces some important new features:

1. Promising Zone Designs using CHW and CDL type-1 error control methods: East 6.2 introduces Promising Zone Designs from East 5.4 for differences of means, proportions, and the log-rank test. The methods of Cui, Hung, and Wang (1999) and Chen, DeMets, and Lan (2003) are implemented for adaptive unblinded sample size re-estimation designs and available for simulation and interim monitoring.

2. Multiple endpoint designs: Serial gatekeeping procedures to control family-wise type-1 error when testing multiple families of normally distributed endpoints are now available in East for fixed sample (1-look) designs.

3. Power and sample size calculations for count data: East now offers power analysis and sample size calculations for count data in fixed sample (1-look) designs. Specifically, East provides design capabilities for:
(a) Test of a single Poisson rate
(b) Test for a ratio of Poisson rates
(c) Test for a ratio of Negative Binomial rates

4. Precision-based sample size calculations: Sample size calculations are now available based on specification of a confidence interval for most tests provided in East.

What’s New in East 6.1

Version 6.1 of East introduces some important new features:

1. Bayesian probability of success (assurance) and predictive power: For one-sample and two-sample continuous and binomial endpoints, East 6.1 will now compute Assurance (O’Hagan et al., 2005), or Bayesian probability of success. This is a Bayesian version of power that integrates power over a prior distribution of the treatment effect, giving an unconditional probability that the trial will yield a significant result. When monitoring such a design using the Interim Monitoring dashboard, East 6.1 will also compute Bayesian predictive power using the pre-specified prior distribution on the treatment effect. This computation will be displayed in addition to the fiducial version of predictive power, which uses the estimated treatment effect and standard error to define a Gaussian prior distribution.

2. Stratification in simulation of survival endpoints: When simulating a trial design with a time-to-event endpoint, East 6.1 accommodates data generation in a stratified manner, accounting for up to 3 stratification variables and up to 25 individual strata. The fraction of subject data generated in each stratum, and the survival response generation mechanism for each stratum, can be flexibly adjusted. In addition, stratified versions of the logrank statistic and other test statistics available for analysis of the simulated data are provided.

3. Integration of R code into simulations: East 6.1 simulations now include the option to use custom R code to define specific elements of the simulation runs. R code can be used to modify the way the subjects are accrued, how they are randomized, how their response data are generated, and how the test statistic is computed.

4. Reading East 5.4 workbooks: East 5.4 workbooks can be read into East 6.1 after conversion using the utility provided in the program menu. Go to the start menu and select: Programs > East Architect > File Conversion > East5 to East6

5. Floating point display of sample size: East 6.1 now has a setting to choose whether to round sample sizes (at interim and final looks) up to the nearest integer, or whether to display them as a floating point number, as in East 5. (See

6. Enhancement to the Events vs. Time plot: This useful graphic for survival designs has been updated to allow the user to edit study parameters and create a new plot directly from a previous one, providing the benefit of quickly assessing the overall impact of input values on a design prior to simulation. (See

7. Interim monitoring (IM) dashboard: The capability to save snapshots of the interim monitoring (IM) dashboard is now supported in East 6.1. At each interim look of a trial, updated information can be saved and previous looks can be easily revisited. Alternatively, prior to employing actual data, this functionality could be used to compare multiple possible scenarios, giving the user a sense of how a future trial could unfold.

8. Enhancement to the Logrank test: For trials with survival endpoints, East 6.1 allows the user to simultaneously create multiple designs by specifying a range of values for key parameters in the Logrank test. (See Subsection

9. Enhancement to binomial designs: For studies with discrete outcomes, East 6.1 allows the user to simultaneously create multiple designs by specifying a range of values for key parameters.
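The East 6.1 notes above describe assurance as power integrated over a prior distribution of the treatment effect. The following Python sketch illustrates that idea only; it is not East's implementation. It assumes a one-sided two-sample z-test with known common standard deviation, equal allocation, and a normal prior on the mean difference. The function names (power, assurance, assurance_numeric) and all parameter names are hypothetical and chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def power(delta, n_per_arm, sigma, alpha=0.025):
    """Power of a one-sided two-sample z-test at a fixed true effect delta."""
    se = sigma * sqrt(2.0 / n_per_arm)  # standard error of the mean difference
    return _STD.cdf(delta / se - _STD.inv_cdf(1.0 - alpha))


def assurance(prior_mean, prior_sd, n_per_arm, sigma, alpha=0.025):
    """Power averaged over a N(prior_mean, prior_sd^2) prior (closed form).

    Under the prior, the z-statistic is marginally normal with mean
    prior_mean / se and variance 1 + (prior_sd / se)^2.
    """
    se = sigma * sqrt(2.0 / n_per_arm)
    z_alpha = _STD.inv_cdf(1.0 - alpha)
    return _STD.cdf((prior_mean / se - z_alpha) / sqrt(1.0 + (prior_sd / se) ** 2))


def assurance_numeric(prior_mean, prior_sd, n_per_arm, sigma,
                      alpha=0.025, steps=4000):
    """Same quantity by trapezoidal integration of power times prior density."""
    lo = prior_mean - 8.0 * prior_sd    # integrate over +/- 8 prior SDs
    h = 16.0 * prior_sd / steps
    total = 0.0
    for i in range(steps + 1):
        d = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        density = _STD.pdf((d - prior_mean) / prior_sd) / prior_sd
        total += weight * power(d, n_per_arm, sigma, alpha) * density
    return total * h
```

For example, with 64 subjects per arm, sigma = 1 and a prior centered at 0.5 with standard deviation 0.2, assurance(0.5, 0.2, 64, 1.0) is lower than power(0.5, 64, 1.0), because the prior places weight on effects smaller than 0.5. As the prior standard deviation shrinks toward zero, assurance approaches the conventional power at the prior mean.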
What’s New in East 6.0 on the Architect Platform

East Architect is version 6.0 of the East package and builds upon earlier versions of the software. The transition of East to the next-generation Architect platform has removed all prior dependencies on Microsoft Excel. As a result, the user interface is very different, leading to a new user experience and workflow. Although you might find that there is a learning curve to getting comfortable with the software, we trust that you will find that the new platform provides for a superior user experience and improved workflow.

The Architect platform also adds data management and analysis capabilities similar to those found in Cytel Studio, StatXact, and LogXact, as well as a powerful reporting tool we call Canvas, which provides flexible and customizable reports based on design and simulation information.

Version 6.0 of East introduces some important new features in addition to the new platform environment. Here is a selection:

1. New designs: A large number of fixed sample designs have been added for various endpoints and trial types. These were present in the SiZ software and have now been fully integrated into East.

2. Multi-arm designs: Designs for pairwise comparisons of treatment arms to control have been added for differences of means and differences of proportions. These designs are mostly simulation-based and provide operating characteristics for fixed sample studies using multiplicity adjusting procedures such as Dunnett’s, Bonferroni, Sidak, Hochberg, Fallback, and others.

3. Creation of multiple designs or simulations at once: East Architect provides the ability to create multiple designs or to run multiple simulation scenarios at once, by specifying lists or sequences of values for specific parameters rather than single scalars. This capability allows the user to explore a greater space of possibilities or to easily perform sensitivity analysis.
Accompanying tools to preview, sort, and filter are provided to easily parse the large output generated by East.

4. Response lag, accrual, and dropouts for continuous and discrete endpoints: Designs created for continuous and discrete endpoints now have the option for the user to specify a response lag (between randomization and observation of the endpoint), as well as an accrual rate and dropout rate for the study population. As a result, some terminology has been introduced to distinguish between the number of subjects who need to be enrolled in the study (Sample Size) and the number of subjects whose endpoint must be observed in order to properly power the study (Completers).

5. Flexibility in setting up boundaries: The efficacy and futility rules of a design no longer need to be present at each and every look. The user can specify whether a look includes the efficacy stopping rule, the futility rule, or both. Therefore, a design can be set up where at the first look only futility stopping is possible, whereas at later looks both efficacy and futility or maybe only efficacy stopping is allowed. In addition, the futility rule can now be specified on two new scales, which are the standardized treatment scale and the conditional power scale.

6. Predictive power: Predictive power is now provided as an alternative to conditional power in the interim monitoring sheet of the software. Further details about how this is implemented can be found in Appendix C.

7. Comparing designs: One can compare multiple designs either graphically or in tabular format simply by selecting them and choosing a plot or table output button.

8. Improvements in algorithms: Many improvements have been made to the way computations are performed, both to improve accuracy and speed, but also to provide more intuitive results.
For example, older versions of East used an approximation to conditional power based on ignoring all future looks but the final one. This approximation has been dropped in favor of computing the exact value of conditional power. Many other changes have been made that might result in different values being computed and displayed in East Architect as compared to earlier versions of the software. For greater details about the changes made, please refer to the “Read Me” notes that accompany the software release.

What’s New in East 5

After East 5 (version 5.0) was released, a few upgrades have been issued. The details are:

1. In the current release of version 5.4, the module EastSurvAdapt has been added.
2. In the previous version 5.3, the module EastAdapt was substantially revised.
3. In the earlier version 5.2, the module EastExact was released.
4. In the still earlier version 5.1, several improvements were introduced in the EastSurv module.

The details of these modules can be found in the respective chapters of the user manual.

East 5 upgraded the East system in several important ways in direct response to customer feedback. Six important extensions were developed in East 5:

1. Designs using t-tests: In previous versions of East, the single look design was treated as a special case of a group sequential design. Thus the same large sample theory was used to power and size these traditional types of designs. Recognizing that this solution is not entirely satisfactory for small sample trials, in East 5 we have implemented single-look t-test designs for continuous data. (Sections 8.1.4, 8.2.4, 9.1.3, and 11.1.3)

2. New boundaries: East 5 provides two new procedures for specifying group sequential boundaries. Generalized Haybittle-Peto boundaries allow the user to specify unequal p-values at each interim look for a group sequential plan. East will recalculate the final p-value in order to preserve the type I error.
(Section 38.1) The cells for entering the cumulative alpha values of an interpolated spending function can be automatically populated with the cumulative alpha values of any of the published spending functions available to East, and subsequently edited to suit user requirements. For example, a 4-look Lan and DeMets O’Brien-Fleming spending function can be modified so that the critical value at the first look is less conservative than usual. (Section 38.3.1)

3. Interim monitoring and simulation for single-look designs: Interim monitoring and simulation sheets have been provided for all single look designs in East 5.

4. Improvement to Charts: Many improvements to existing charts in East have been implemented in this version. Scaling in the Duration vs. Accrual chart has been corrected to provide a better tool for the user. The use of semi-log scaling has enabled us to represent many charts on the natural scale of the treatment effect. This mostly concerns ratio metrics such as the relative risk, the hazard ratio, and the odds ratio. Boundaries on the relative risk scale, for example, are now available in East 5. Boundaries can also be visualized on the score scale. Charts can be summarized in tabular form. The user has the option to generate tables of power vs. sample size, power vs. treatment effect, events vs. time, and so on. These tables can easily be copied and pasted into external applications like Microsoft Word and Excel. (Section 4.5)

5. Improved usability: Much attention in East 5 was spent on improving the user’s experience within the environment. A graph sheet allows the user to compare up to 16 charts side by side. Charts for any number of plans within a workbook can be exported to the graph sheet. (Section 5.3) The scratch sheet is a full-fledged Microsoft Excel sheet that can be brought up within the East application. (Section 4.4) The split view option enables the user to see two sheets of the same workbook simultaneously.
This can be useful if one window pane contains a scratch sheet where side calculations may be done based on numbers in the other window pane. Another use is to show two or more plans on one pane and their graph sheet, containing boundaries or other charts, on another pane for easy comparison. (Section 4.8) Messages in the help menu, pop-up help, and context sensitive help have been revised and rendered more informative to the user. The default appearance of charts can be specified by the user through the preferences settings menu item. (Section 4.7)

6. Installation validation: East 5 includes an installation validation procedure that will easily check that the software has been properly installed on the user’s system. (Section 2.3)

Finally, there has been an important reorganization of the East manual, which now comprises seven volumes organized as follows: (1) The East System (2) Continuous Endpoints (3) Binomial and Categorical Endpoints (4) Time-to-Event Endpoints (5) Adaptive Designs (6) Special Topics (7) Appendices. Page numbers are continuous through volumes 1-7. Each volume contains a full table of contents and index to the whole manual set.

Preface to East 4

East 4 was a very large undertaking involving over 20 developers, documenters, testers and helpers over a two-year period. Our goal was to produce one single powerful design and monitoring tool with a simple, intuitive, point and click, menu driven user interface, that could cover the full range of designs commonly encountered in a clinical trial setting, for either fixed sample or group sequential designs. The resulting product, East 4, extends the East system for flexible design and interim monitoring in four major ways as listed below.

1. Broad Coverage: Previous versions of East dealt primarily with the design of two-arm group sequential trials to detect a difference of means for normal and binomial endpoints and a hazard ratio for survival endpoints. East 4 extends these capabilities to other settings.
Easily design and monitor up to 34 different clinical trial settings including one-, two- and K-sample tests; linear, logistic and Cox regression; longitudinal designs; non-inferiority and bioequivalence designs; cross-over and matched-pair designs; nonparametric tests for continuous and ordered categorical outcomes.
Comparisons between treatment and control groups can be in terms of differences, ratios or odds ratios.
Non-inferiority trials can be designed to achieve the desired power at superiority alternatives.

2. New Stopping Boundaries and Confidence Intervals:
Non-binding futility boundaries. Previously futility boundaries could not be overruled without inflating the type-1 error. New non-binding futility boundaries preserve power and type-1 error and yet can be overruled if desired.
Asymmetric two-sided efficacy boundaries. You can allocate the type-1 error asymmetrically between the upper and lower stopping boundaries, and can spend it at different rates with different error spending functions. This will provide added flexibility for aggressive early stopping if the treatment is harmful and conservative early stopping if the treatment is beneficial.
Futility boundaries can be represented in terms of conditional power. This brings greater objectivity to conditional power criteria for early stopping.
Two-sided repeated confidence intervals are now available for one-sided tests with efficacy and futility boundaries. Previously only one-sided confidence bounds were available.
Interactive repeated confidence intervals are provided at the design stage to aid in sample size determination and selection of stopping boundaries.

3. New Analytical and Simulation Tools for Survival Studies: EastSurv is an optional new module, fully integrated into the East system, that extends East’s design capabilities to survival studies with non-uniform accrual, piecewise exponential distributions, drop outs, and fixed length of follow-up for each subject. Designs can be simulated under general settings including non-proportional hazard alternatives.

4. Design and Simulation of Adaptive Trials: EastAdapt is an optional new module, fully integrated into the East system, that permits data-dependent changes to sample size, spending functions, number and spacing of interim looks, study objectives, and endpoints using a variety of published flexible approaches.

In addition to these substantial statistical capabilities, East 4 has added numerous improvements to the user interface including clearer labeling of tables and graphs, context sensitive help, charts of power versus sample size and power versus number of events, convenient tools for calculating the test statistics to be entered into the interim monitoring worksheet for binomial endpoints, and the ability to type arithmetic expressions into dialog boxes and into design, interim monitoring and simulation worksheets.

Preface to East 3

East 3 is a major upgrade of the East-2000 software package for design and interim monitoring of group sequential clinical trials. It has evolved over a three-year period with regular input from our East-2000 customers. The main improvements that East 3 offers relative to East-2000 are greater flexibility in study design, better tracking of interim results, and more powerful simulation capabilities. Many of our East-2000 customers expressed the desire to create group sequential designs that are ultra-conservative in terms of stopping early for efficacy, but which can be quickly terminated for futility.
The extremely wide selection of spending functions and stopping boundaries in East 3, combined with its interactive Excel-based spreadsheet user interface for comparing multiple designs quickly and effortlessly, makes such designs possible. The interim monitoring module of East 3 has been completely revised, with a “dashboard” user interface that can track the test statistic, error spent, conditional power, post-hoc power and repeated confidence intervals on a single worksheet, over successive interim monitoring time points, for superior trial management and decision making by a data monitoring committee. Finally, we have enhanced the simulation capabilities of East 3 so that it is now possible to evaluate the operating characteristics not only of traditional group sequential designs, but also of adaptive designs that permit mid-course alterations in the sample size based on interim estimates of variance or treatment effect.

A list of the substantial new features in East 3 relative to East-2000 is given below. (The items on this list beginning with ‘(*)’ are optional extras.)

New Design Features
1. Design of non-inferiority trials.
2. Design of trials with unequally spaced looks.
3. Use of Lan and DeMets (1983) error spending functions to derive stopping boundaries.
4. (*) Flexible stopping boundaries derived from the gamma spending function family (Hwang, Shih and DeCani, 1990) and the rho spending function family (Kim and DeMets, 1987).
5. Haybittle-Peto stopping boundaries (Haybittle, 1971).
6. (*) Boundaries derived from user-specified spending functions with interpolation.
7. Boundaries for early stopping for futility only.
8. Graphical and numerical representation of stopping boundaries on scales other than the standard normal scale; e.g., boundaries expressed on the p-value scale, effect size scale, and conditional power scale.
9. Computing power for a fixed sample size.
10. Chart displaying the number of events as a function of time (for survival studies).

New Interim Monitoring Features
1. Detailed worksheet for keeping track of interim monitoring data and providing input to the data monitoring committee.
2. Simultaneous view of up to four thumbnail charts on the interim monitoring worksheet. Currently one may select any four of the following charts: the stopping boundary chart, the error spending chart, the conditional power chart, the post-hoc power chart, and the repeated confidence intervals chart. You can also expand each thumbnail into a full-sized chart with a mouse click.
3. Computation of repeated confidence intervals (Jennison and Turnbull, 2000) at each interim look.

New Simulation Features
1. (*) Simulation of actual data generated from the underlying normal or binomial model instead of simulating the large sample distribution of the test statistic.
2. (*) Simulation on either the maximum sample size scale or the maximum information scale.
3. (*) Simulation of the adaptive design due to Cui, Hung and Wang (1999).

New User Interface Features
1. Full integration into the Microsoft Excel spreadsheet for easy generation and display of multiple designs, interim monitoring or simulation worksheets, and production of reports.
2. Save design details and interim monitoring results in Excel worksheets for easy electronic transmission to regulatory reviewers or to end-users.
3. Create custom calculators in Excel and save them with the East study workbook.

Preface to East-2000

For completeness we repeat below the preface that we wrote for the East-2000 software when it was released in April, 2000.

Background to the East-2000 Development

The precursor to East-2000 was East-DOS, an MS-DOS program with design and interim monitoring capabilities for normal, binomial and survival end points.
When East-DOS was released in 1991 its user interface and statistical features were adequate to the needs of its customer base. MS-DOS was still the industry standard operating system for desktop computers. Group sequential designs were not as popular then as they are now. The role of data and safety monitoring boards (DSMBs) in interim monitoring was just beginning to emerge. FDA and industry guidelines on the conduct of group sequential studies were in the early draft stage. Today the situation is very different. Since the publication of the ICH-E9 guidance on clinical trials by the FDA and regulatory bodies in Europe and Japan, industry sponsors of phase-III clinical trials are more favorably inclined to the group sequential approach. For long-term mortality studies especially, interim monitoring by an independent DSMB is almost mandatory. As the popularity of group sequential studies has increased, so has the demand for good software to design and monitor such studies. For several years now we have been flooded with requests from our old East-DOS customers to move away from the obsolete MS-DOS platform to Microsoft Windows and to expand the statistical capabilities of the software. We have responded by developing East-2000, a completely re-designed Windows package with unparalleled design, simulation and interim monitoring capabilities.

What’s New in East-2000

The East-2000 software adds considerable functionality to its MS-DOS predecessor through a superior user interface and through the addition of new statistical methods.

New User Interface

East-2000 is developed on the Microsoft Windows platform. It supports a highly interactive user interface with ready access to stopping boundary charts, error spending function charts, power charts and the ability to present the results as reports in Microsoft Office.

1. Interactivity
Designing a group sequential study is much more complex than designing a fixed sample study. The patient resources needed in a group sequential setting depend not only on the desired power and significance level, but also on how you will monitor the data. How many interim looks are you planning to take? What stopping boundary will you use at each interim look? Does the stopping boundary conform to how you’d like to spend the type-1 error at each look? Do you intend to stop early only for benefit, only for futility, or for both futility and benefit? In a survival study, how long are you prepared to follow the patients? These design and monitoring decisions have profound implications for the maximum sample size you must commit up-front to the study, the expected sample size under the null and alternative hypotheses, and the penalty you will have to pay in terms of the nominal p-value needed for declaring significance at the final look. To take full advantage of the group sequential methodology and consider the implications of potential decisions you must have highly interactive software available, both at the study design stage and at the interim monitoring stage. East-2000 is expressly developed with this interactivity in mind. Its intuitive form-fill-in graphical user interface can be an invaluable tool for visualizing how these design and monitoring decisions will affect the operating characteristics of the study.

2. Charts
By clicking the appropriate icon on the East toolbar you can view stopping boundary charts, study duration charts, error spending function charts, conditional and post-hoc power charts, and exit probability tables. The ease with which these charts can be turned on and off ensures that they will be well utilized both at the design and interim monitoring phases of the study.

3. Reports
All worksheets, tables and charts produced by East-2000 can be copied and pasted into Microsoft Word, Excel and PowerPoint pages, thus facilitating the creation of annotated reports describing the study design and interim monitoring schedule.

New Statistical Methods

East-2000 has greatly expanded the design and interim monitoring capabilities previously available in East-DOS. In addition, East-2000 provides a simulation module for investigating how the power of a sequential design is affected by different assumptions about the magnitude of the treatment difference. Some highlights from these new capabilities are listed below.

1. Design
Whereas East-DOS only provided design capabilities for normal, binomial and survival end points, East-2000 makes it possible to design more general studies as well. This is achieved through the use of an inflation factor. The inflation factor determines the amount by which the sample size of a fixed sample study should be inflated so as to preserve its type-1 error in the presence of repeated hypothesis tests. It is thus possible to use any external software package to determine the fixed sample size of the study, input this fixed sample size into the design module of East-2000 and have the sample size inflated appropriately. These general capabilities are discussed in Chapter 8.

2. Interim Monitoring
A major new feature in the interim monitoring module of East-2000 is the computation of adjusted p-values, confidence intervals and unbiased parameter estimates at the end of the sequential study. Another important feature is the ability to monitor the study on the Fisher information scale and thereby perform sample-size re-estimation if initial assumptions about the data generating process were incorrect. Chapter 9 provides an example of sample-size re-estimation for a binomial study in which the initial estimate of the response rate of the control drug was incorrect.

3. Simulation
East-2000 can simulate an on-going clinical trial and keep track of the frequency with which a stopping boundary is crossed at each interim monitoring time-point. These simulations can be performed under the null hypothesis, the alternative hypothesis or any intermediate hypothesis, thus permitting us to evaluate how the various early stopping probabilities are affected by misspecifications in the magnitude of the treatment effect.

Continuous Development of East

East-2000 will undergo continuous development with major new releases expected on an annual basis and smaller improvements regularly posted on the Cytel web site. We will augment the software and implement new techniques based on the recommendations of the East Advisory Committee, and as the demand for them is expressed by our customers. The following items are already on the list:
– Easy links to fixed-sample design packages so as to extend the general methods in Chapter 8;
– Analytical and simulation tools to convert Fisher information into sample size and thereby facilitate the information based design and interim monitoring methods of Chapter 9, especially for sample-size re-estimation.

We will build a forum for discussing East-related issues on the Cytel web site, www.cytel.com. Interesting case studies, frequently asked questions, product news and other related matters will be posted regularly on this site.

Roster of East Consultants

Cytel offers consulting services to customers requiring assistance with study design, interim monitoring or representation on independent data and safety monitoring boards. Call us at 617-661-2011, or email sales@cytel.com, for further information on this service.
Table of Contents

Preface

Volume 1 The East System
1 Introduction to Volume 1
2 Installing East 6
3 Getting Started
4 Data Editor

Volume 2 Continuous Endpoints
5 Introduction to Volume 2
6 Tutorial: Normal Endpoint
7 Normal Superiority One-Sample
8 Normal Noninferiority Paired-Sample
9 Normal Equivalence Paired-Sample
10 Normal Superiority Two-Sample
11 Nonparametric Superiority Two Sample
12 Normal Non-inferiority Two-Sample
13 Normal Equivalence Two-Sample
14 Normal: Many Means
15 Multiple Comparison Procedures for Continuous Data
16 Multiple Endpoints-Gatekeeping Procedures
17 Continuous Endpoint: Multi-arm Multi-stage (MaMs) Designs
18 Two-Stage Multi-arm Designs using p-value combination
19 Normal Superiority Regression

Volume 3 Binomial and Categorical Endpoints
20 Introduction to Volume 3
21 Tutorial: Binomial Endpoint
22 Binomial Superiority One-Sample
23 Binomial Superiority Two-Sample
24 Binomial Non-Inferiority Two-Sample
25 Binomial Equivalence Two-Sample
26 Binomial Superiority n-Sample
27 Multiple Comparison Procedures for Discrete Data
28 Multiple Endpoints-Gatekeeping Procedures for Discrete Data
29 Two-Stage Multi-arm Designs using p-value combination
30 Binomial Superiority Regression
31 Agreement
32 Dose Escalation

Volume 4 Exact Binomial Designs
33 Introduction to Volume 8
34 Binomial Superiority One-Sample – Exact
35 Binomial Superiority Two-Sample – Exact
36 Binomial Non-Inferiority Two-Sample – Exact
37 Binomial Equivalence Two-Sample – Exact
38 Binomial Simon’s Two-Stage Design

Volume 5 Poisson and Negative Binomial Endpoints
39 Introduction to Volume 4
40 Count Data One-Sample
41 Count Data Two-Samples

Volume 6 Time to Event Endpoints
42 Introduction to Volume 6
43 Tutorial: Survival Endpoint
44 Superiority Trials with Variable Follow-Up
45 Superiority Trials with Fixed Follow-Up
46 Non-Inferiority Trials Given Accrual Duration and Accrual Rates
47 Non-Inferiority Trials with Fixed Follow-Up
48 Superiority Trials Given Accrual Duration and Study Duration
49 Non Inferiority Trials Given Accrual Duration and Study Duration
50 A Note on Specifying Dropout parameters in Survival Studies
51 Multiple Comparison Procedures for Survival Data

Volume 7 Adaptive Designs
52 Introduction To Adaptive Features
53 The Motivation for Adaptive Sample Size Changes
54 The Cui, Hung and Wang Method
55 The Chen, DeMets and Lan Method
56 Muller and Schafer Method
57 Conditional Power for Decision Making

Volume 8 Special Topics
58 Introduction to Volume 8
59 Design and Monitoring of Maximum Information Studies
60 Design and Interim Monitoring with General Endpoints
61 Early Stopping for Futility
62 Flexible Stopping Boundaries in East
63 Confidence Interval Based Design
64 Simulation in East
65 Predictive Interval Plots
66 Enrollment/Events Prediction - At Design Stage (By Simulation)
67 Conditional Simulation
68 Enrollment/Events Prediction - Analysis
69 Interfacing with East PROCs

Volume 9 Analysis
70 Introduction to Volume 9
71 Tutorial: Analysis
72 Analysis-Descriptive Statistics
73 Analysis-Analytics
74 Analysis-Plots
75 Analysis-Normal Superiority One-Sample
76 Analysis-Normal Noninferiority Paired-Sample
77 Analysis-Normal Equivalence Paired-Sample
78 Analysis-Normal Superiority Two-Sample
79 Analysis-Normal Noninferiority Two-Sample
80 Analysis-Normal Equivalence Two-Sample
81 Analysis-Nonparametric Two-Sample
82 Analysis-ANOVA
83 Analysis-Regression Procedures
84 Analysis-Multiple Comparison Procedures for Continuous Data
85 Analysis-Multiple Endpoints for Continuous Data
86 Analysis-Binomial Superiority One-Sample
87 Analysis-Binomial Superiority Two-Sample
88 Analysis-Binomial Noninferiority Two-Sample
89 Analysis-Binomial Equivalence Two-Samples
90 Analysis-Discrete: Many Proportions
91 Analysis-Binary Regression Analysis
92 Analysis- Multiple Comparison Procedures for Binary Data
93 Analysis-Comparison of Multiple Comparison Procedures for Continuous Data- Analysis
94 Analysis-Multiple Endpoints for Binary Data
95 Analysis-Agreement
96 Analysis-Survival Data
97 Analysis-Multiple Comparison Procedures for Survival Data

Volume 10 Appendices
A Introduction to Volume 10
B Group Sequential Design in East 6
C Interim Monitoring in East 6
D Computing the Expected Number of Events
E Generating Survival Simulations in EastSurv
F Spending Functions Derived from Power Boundaries
G The Recursive Integration Algorithm
H Theory - Multiple Comparison Procedures
I Theory - Multiple Endpoint Procedures
J Theory-Multi-arm Multi-stage Group Sequential Design
K Theory - MultiArm Two Stage Designs Combining p-values
L Technical Details - Predicted Interval Plots
M Enrollment/Events Prediction - Theory
N Dose Escalation - Theory
O R Functions
P East 5.x to East 6.4 Import Utility
Q Technical Reference and Formulas: Single Look Designs
R Technical Reference and Formulas: Analysis
S Theory - Design - Binomial One-Sample Exact Test
T Theory - Design - Binomial Paired-Sample Exact Test
U Theory - Design - Simon’s Two-Stage Design
V Theory-Design - Binomial Two-Sample Exact Tests
W Classification Table
X Glossary
Y On validating the East Software
Z List of East Beta Testers

References

Index

Volume 1 The East System

1 Introduction to Volume 1

This volume contains chapters which introduce you to the East software system.

Chapter 2 explains the hardware and operating system requirements and the installation procedures. It also explains the installation validation procedure.

Chapter 3 is a tutorial that introduces you to the East software quickly. You will learn the basic steps involved in getting in and out of the software, selecting various test options under any of the endpoints, designing a study, creating and comparing multiple designs, simulating and monitoring a study, invoking the graphics, saving your work in files, retrieving previously saved studies, obtaining on-line help and printing reports. It describes the menu structure and the menus available in East, which is a menu-driven system. Almost all features are accessed by making selections from the menus.
Chapter 4 discusses the Data Editor menu of East 6, which allows you to create and manipulate the contents of your Case Data and Crossover Data. This menu is in use while working with the Analysis menu as well as with some other features like PIP or Conditional Simulations. These features are illustrated with the help of a simple worked example of a binary endpoint trial.

2 Installing East 6

2.1 System Requirements to run East 6

The minimum hardware/operating system/software requirements for East 6 (the standalone version of the software, or the East client in the case of the concurrent version) are listed below.

For the standalone version, and for East clients of the concurrent version, the following operating systems are supported:
– Windows 7 (32-bit / 64-bit)
– Windows 8 (32-bit / 64-bit)
– Windows 8.1 (32-bit / 64-bit)
– Windows 10 (32-bit / 64-bit)
– All of the above for computers with English, European and Japanese versions of Windows.

For the concurrent user version, the following server operating systems are supported:
– Windows 7 (32-bit / 64-bit)
– Windows 8 (32-bit / 64-bit)
– Windows 8.1 (32-bit / 64-bit)
– Windows 10 (32-bit / 64-bit)
– All of the above for computers with English, European and Japanese versions of Windows
– Windows Server 2008 (32-bit / 64-bit)
– Windows Server 2012
– Citrix
  ∗ XenApp 6.0 on Windows 2008
  ∗ XenApp 6.5 on Windows 2008
  ∗ XenApp 7.6 on Windows 2008
  ∗ XenApp 7.6 on Windows 2012

Further, East has the following hardware/software requirements:
– CPU - 1 GHz or faster x86 (32-bit) or x64 (64-bit) processor
– Memory - Minimum 1 GB of RAM
– Hard Drive - Minimum 5 GB of free hard disk space
– Display - 1024 x 768 or higher resolution
– Microsoft .Net Framework 4.0 Full (this will be installed as a part of prerequisites if your computer does not have it)
– Microsoft Visual C++ 2010 SP1 (this will be installed as a part of prerequisites if your computer does not have it)
– Installer 4.5
– Internet Explorer 9.0 or above
– A stable internet connection is required during installation so that prerequisites like the
– East is compatible with, and supported on, R versions 2.9.0 through 3.2.3. East may or may not work well with later versions of R. If R is not installed, the ability to include custom R functions to modify specific simulation steps will not be available. The R integration feature is an add-on to East and is required only to integrate custom R functions with East; it does not affect any of the core functionality of East.

2.2 Other Requirements

Users with Windows 7 or above: East uses the font Verdana. Generally Verdana is among the default fonts installed by Windows. However, this font may not be available on some computers, especially if a language other than English has been selected. In such cases, the default fonts need to be restored. To restore fonts, go to Control Panel → Fonts → Font settings and click the button “Restore default font settings”. This will restore all default fonts, including Verdana. Note that this must be done before the first use of East.

Users with Windows 8.1: On some computers with Windows 8.1, problems may be observed while uninstalling East, especially if the user has upgraded from the previous version using a patch. This is because of a security update, KB2962872 (MS14-037), released by Microsoft for Internet Explorer versions 6, 7, 8, 9, 10 and 11. Microsoft has fixed this issue and released another security update, KB2976627 (MS14-051), for Internet Explorer, which replaces the old problematic update. Users who are affected by this issue are therefore advised to install security update KB2976627 (MS14-051) on their computers.

2.3 Installation

IMPORTANT: Please follow the steps below if you are installing a standalone/single user version of East.
If you are installing a concurrent version, please refer to the document “Cytel License Manager Setup.pdf” for detailed installation instructions.

1. Uninstalling Previous Versions
If any version (including a beta or demo) of East 6 is currently installed on your PC, please uninstall it completely, or else the installation of the current version will not proceed correctly. To uninstall the earlier version of East 6, go to the Start Menu and select:
All Programs → Cytel Architect → East 6.x → Uninstall
or
All Programs → East Architect → Uninstall East Architect
depending upon the version installed on your computer.

2. Installing Current Version
You will need to be an administrator of your computer in order to perform the following steps. If you do not have administrator privileges on your computer, please contact your system administrator / IT. In order to install East, please follow these steps:
(a) If you received an email containing a link for downloading the setup, please follow the link and download the setup. This will be a zipped folder. Unzip this folder completely.
(b) In the setup folder, locate the program setup.exe and double-click on it. Follow the instructions on the subsequent windows.

2.4 Installation Qualification and Operational Qualification

To perform the installation and operational qualification of East 6, go to the Start Menu and select All Programs → Cytel Architect → East 6.4 → Installation Qualification (IQ). You will be presented with the following dialog box. It will take a few minutes to complete. At the end of the process, the status of the installation qualification will appear. Press Enter (or any other key) to open the validation log. Similarly, one can run the Operational Qualification (OQ).
If the validation is successful, the log file will contain a detailed list of all files installed by East on your computer and other details related to IQ and OQ. If the validation fails, the validation log file will contain detailed error messages. Please contact your system administrator with the log file.

IQ (Installation Qualification) script: This script verifies whether the software is completely and correctly installed on the system. It does this by checking whether all the software components, XML and DLL files are in place.

OQ (Operational Qualification) script: This script runs some representative test cases covering all the major modules/features of East and compares the runtime results to the benchmarks (benchmarks are validated results stored internally in the OQ program). It ensures the quality and consistency of the results in the new version.

Manual Examples: In addition to IQ/OQ, if more testing is to be done, refer to the user manual and reproduce the results for some representative examples/modules. The flow of examples is easy to follow. Some examples in the manual require additional files (datasets), which are available to you in the Samples folder.

Validation Chapter: There is a chapter in this manual dedicated to describing how every feature was validated within Cytel. Refer to Appendix Y, “On validating the East Software”. This covers validation strategies for all the features available in East 6.

3 Getting Started

East has evolved over the past several years with MS Excel® as the user interface. East on MS Excel® did not integrate directly with any other Cytel products. Under the Architect platform, East is expected to coexist and integrate seamlessly with other Cytel products such as SiZ and Compass. Architect is a common platform designed to support various Cytel products.
It provides a user-friendly, Windows-standard graphical environment, consisting of tabs, icons, and dialog boxes, with which you can design, simulate and analyze. Throughout the user manual, this product is referred to as East 6.

One major advantage of East 6 is the facility for creating multiple designs. This is achieved by giving multiple inputs for a parameter, either as comma-separated values or as a range of the form (a:b:c), with a as the initial value, b as the last value and c as the step size. If you give multiple values for more than one parameter, East creates all possible combinations of the input parameters. This is an immense advancement over earlier versions of East, where you had to create one design at a time. Furthermore, in earlier versions one could not compare different types of designs (e.g., superiority vs. noninferiority designs). Similarly, graphical comparison of designs with different numbers of looks was difficult with earlier versions of East. All such comparisons are readily available in East 6.

Another new feature is the option to add assumptions for accruals and dropouts at the design stage. Previously, this was available only for survival endpoint trials, but it has been extended to continuous and discrete endpoints in East 6. Information about accrual rates, response lag, and dropouts can be given whether designing or simulating a trial. This makes more realistic, end-to-end design and simulation of a trial possible. Section 3.6 discusses all the above features under the Design menu with the help of a case study, CAPTURE.

Simulations help to develop better insight into the operating characteristics of a design. In East 6, the simulation module has been enhanced to allow fixed or random allocation to treatment and control, and different sample sizes. Such options were not possible with earlier versions of East. Section 3.7 briefly describes the Simulations in East 6.
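The multiple-input convention described above (comma-separated values, or a range a:b:c, with every combination generated when several parameters take multiple values) can be illustrated with a short Python sketch. This is only an illustration of the idea, not East's own parser: the function names `expand_values` and `all_designs` and the parameter names `power` and `sample_size` are hypothetical, and it is an assumption that the last value b is included whenever it falls on the step grid.

```python
from itertools import product
from math import floor


def expand_values(spec: str) -> list[float]:
    """Expand one input field into a list of numeric values.

    Accepts comma-separated values ("0.8, 0.9") or a range written as
    a:b:c (initial value a, last value b, step size c), optionally
    wrapped in parentheses.
    """
    spec = spec.strip().strip("()")
    if ":" in spec:
        start, last, step = (float(part) for part in spec.split(":"))
        if step <= 0:
            raise ValueError("step size must be positive")
        # Small tolerance so that floating-point error does not drop b.
        count = floor((last - start) / step + 1e-9) + 1
        return [round(start + i * step, 10) for i in range(count)]
    return [float(part) for part in spec.split(",")]


def all_designs(inputs: dict[str, str]) -> list[dict[str, float]]:
    """Return one dict per design: every combination of the expanded inputs."""
    names = list(inputs)
    grids = [expand_values(inputs[name]) for name in names]
    return [dict(zip(names, combo)) for combo in product(*grids)]


# Hypothetical parameter names, used only for illustration.
designs = all_designs({"power": "0.8, 0.9", "sample_size": "100:200:50"})
for design in designs:
    print(design)
```

Two values for one parameter and three for the other give 2 × 3 = 6 candidate designs, which is the "all possible combinations" behaviour the text describes.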
Section 3.8 discusses the capability to flexibly monitor a group sequential trial using the Interim Monitoring feature of East 6.

We have also provided powerful data editors to create, view, and modify data. A wide variety of statistical tests are now a part of East 6, which enables you to conduct statistical analysis of interim data for continuous, discrete and time to event endpoints. Sections 3.4 and 3.5 briefly describe the Data Editor and Analysis menus in East 6.

The purpose of this chapter is to familiarize you with the East 6 user interface.

3.1 Workflow in East

In this section, the architecture of East 6 is explained. The logical workflow in which the different parts of the user interface coordinate with each other is discussed. The basic structure of the user interface is depicted in the following diagram.

Besides the top Ribbon, there are four main windows in East 6, namely (starting from the left) the Library pane, the Input / Output window, the Output Preview window and the Help pane. Note that both the Library and the Help pane can be auto-hidden temporarily or throughout the session, allowing the other windows to occupy a larger area of the screen.

Initially, the Library shows only the Root node. As you work with East, several nodes corresponding to designs, simulation scenarios, data sets and related analyses can be managed using this panel. Various nodes for outputs and plots are created in the Library, facilitating work on multiple scenarios at a time. The width of the Library window can be adjusted for better readability.

The central part of the user interface, the Input / Output window, is the main work area where you can:
– Enter input parameters for design computation, create and compare multiple designs, and view plots
– Simulate a design under different scenarios
– Perform interim analysis on a group sequential design look by look, view the results, and receive decisions such as stopping or continuing during the execution of a trial
– Open a data set on which you want to perform analysis, enter new data, view outputs, prepare a report, etc.

This is the area where the user interacts with the product most frequently.

The Output Preview window compiles several outputs together in a grid-like structure where each row is either a design or a simulation run. This area is in use only when working with Design or Simulations. When the Compute or Simulate button is clicked, all requested design or simulation results are computed and are listed row-wise in the Output Preview window:

By clicking different rows of interest while holding the Ctrl key, you can display a single design or multiple designs in the Output Summary, either vertically or side by side for comparison.

Note that the active window and the Output Preview can be minimized, maximized, or resized. If you want to focus on the Output Summary, click the icon in the top-right corner of the main window. The Output will be maximized as shown below:

Any of the designs/simulations in the Output Preview window can be saved in the Library, as depicted in the following workflow diagram.

Double-click any of these nodes and the detailed output of the design will be displayed. This will include all relevant input and output information. Right-clicking any design node in the Library will allow you to perform various operations on the design, such as interim monitoring and simulation.
The Help pane displays context-sensitive help for the control currently in focus. This help is available for all the controls in the Input / Output window. The pane also displays design-specific help, which discusses the purpose of the selected test, the published literature consulted while developing it, and the chapter/section numbers of this user manual where more details can be found. This pane can be hidden or locked by clicking the pin in its corner.

All the windows and features mentioned above are described in detail with the help of an illustration in the subsequent sections of this chapter.

3.2 A Quick Overview of User Interface

Almost all the functionalities of East 6 are invoked by selecting appropriate menu items and icons from the Ribbon. The interface consists of four windows, as described in the previous section, and four major menu items. These menu items are:

Home. This menu contains typical file-related Windows sub-menus. The Help sub-menu provides access to this manual.

Data Editor. This menu is available once a data set is open, and provides several sub-menus used to create, manage and transform data.

Design. This menu provides a sub-menu for each of the study designs that can be created using East 6. The study designs are grouped according to the nature of the response. Tasks like Simulations and Interim Monitoring are available for almost all the study designs under this menu.

Analysis. This menu provides a sub-menu for each of the analysis procedures that can be carried out in East 6. The tests are grouped according to the nature of the response. There are also options for basic statistics and plots.

3.3 Home Menu

3.3.1 File
3.3.2 Importing workbooks from East5.4
3.3.3 Settings
3.3.4 View
3.3.5 Window
3.3.6 Help

The Home menu contains icons that are logically grouped under File, Settings, View, Window and Help.
These icons can be used for specific tasks.

3.3.1 File

Click this icon to create new case data or crossover data. A new workbook or log can also be created.
Click this icon to open a saved data set, workbook, or log file.
Click this icon to import external files created by other programs.
Click this icon to export files in various formats.
Click this icon to save the current files or workbooks.
Click this icon to save a file or workbook with a different name.

3.3.2 Importing workbooks from East 5.4

East allows workbooks previously created in East 5.4 (and above) to be converted and imported into East 6 for further development. In order to open a workbook with the .es5 extension given by previous versions of East, it must first be converted to a file with the .cywx extension that will be recognized by East 6. This is easily accomplished through the Convert Old Workbook utility. Click the icon under the Home menu to see the location of this utility.

From the Start Menu, select:

All Programs → Cytel Architect → East 6.x → Convert Old Workbook

The following window appears, which accepts an East 5.4 workbook as input and outputs an East 6 workbook. Click the Browse buttons to choose the East 5.4 file to be converted and the file to be saved with the .cywx extension of East 6.

To start the conversion, click Convert Workbook. Once complete, the file can be opened as a workbook in East 6 as shown below:

In order to convert files from East 5.3 or older versions, open the file in East 5.4, save it with a new name (say, with the suffix East5.4) and then convert this 5.4 file to 6.x as explained above. To get East 5.4 or any help regarding file conversion, contact Cytel at support@cytel.com.
3.3.3 Settings

Click the icon in the Home menu to adjust default values in East 6.

The options provided in the Display Precision tab are used to set the decimal places of numerical quantities. The settings indicated here will be applicable to all tests in East 6 under the Design and Analysis menus.

All these numerical quantities are grouped in different categories depending upon their usage. For example, all the average and expected sample sizes computed at the simulation or design stage are grouped together under the category "Expected Sample Size". So, to view any of these quantities with greater or lesser precision, select the corresponding category and change the decimal places to any value between 0 and 9.

The General tab lets you adjust the paths for storing workbooks, files, and temporary files. These paths persist through the current and future sessions, even after East is closed. This is also where the installation directory of the R software must be specified in order to use the R Integration feature of East 6.

The Design Defaults tab is where the user can change the settings for trial design:

Under the Common tab, default values can be set for input design parameters. You can set up the default choices for the design type, computation type, test type and the default values for type-I error, power, sample size and allocation ratio. When a new design is invoked, the input window will show these default choices.

Time Limit for Exact Computation

This time limit is applicable only to exact designs and charts. Exact methods are computationally intensive and can easily consume several hours of computation time if the likely sample sizes are very large. You can set the maximum time available for any exact test in terms of minutes.
If the time limit is reached, the test is terminated and no exact results are provided. The minimum and default value is 5 minutes.

Type I Error for MCP

If the user has selected a 2-sided test as the default in the global settings, then any MCP will use half of the alpha from the settings as its default, since MCP is always a 1-sided test.

Sample Size Rounding

Notice that by default, East displays the integer sample size (events) by rounding up the actual number computed by the East algorithm. In this case, the look-by-look sample size is rounded off to the nearest integer. One can also see the original floating point sample size by selecting the option "Do not round sample size/events".

Under the Group Sequential tab, defaults are set for boundary information. When a new design is invoked, input fields will contain these specified defaults. We can also set the option to view the Boundary Crossing Probabilities in the detailed output. It can be either Incremental or Cumulative.

Simulation Defaults is where we can change the settings for simulation:

If the checkbox for "Save summary statistics for every simulation" is checked, then East simulations will by default save the per-simulation summary data for all the simulations in the form of case data.

If the checkbox for "Suppress All Intermediate Output" is checked, the intermediate simulation output window will always be suppressed and you will be directed to the Output Preview area.

The Chart Settings tab allows defaults to be set for the following quantities on East 6 charts:

We suggest that you do not alter the defaults until you are quite familiar with the software.

3.3.4 View

The View submenu enables or disables the Help and Library panes by (un)checking the respective check boxes.

3.3.5 Window

The Window submenu contains an Arrange and a Switch option.
This provides the ability to view different standard arrangements of available windows (Design Input Output, Log, Details, charts and plots) and to switch the focus from one window to another.

3.3.6 Help

The Help group provides the following ways to access the East 6 documentation:

User Manual: Invoke the current East 6 user manual.
Tutorial: Invoke the available East 6 tutorials.
About East 6: Displays the current version and license information for the installed software.
Update License: Use this utility to update the license file which you will receive from Cytel.

3.4 Data Editor Menu

All submenus under the Data Editor menu are enabled once a new or existing data set is open. The Open command under the Home menu shows the list of items that can be opened:

Suppose East 6 is installed in the directory C:/Program Files (x86)/Cytel/Cytel Architect/East 6.4 on your machine. You can find sample datasets in the Samples folder under this directory.

Suppose we open the file named Toxic from the Samples folder. The data is displayed in the main window under the Data Editor menu as shown:

Here the columns represent the variables and the rows are the different records. Placing the cursor on a cell containing data will enable all submenus under the Data Editor menu. The submenus are grouped into three sections: Variable, Data and Edit. Here we can modify and transform variables, perform operations on case data, and edit a case or variable in the data.

The icons in the Variable group are:

Creates a new variable at the current column position.
Renames the current variable.
Modifies the currently selected variable.
Transforms the currently selected variable.
Numerous algebraic and statistical functions are available which can be used to transform the variable. This feature can also be used to generate data randomly from distributions such as Normal, Uniform, Chi-Square, etc.

The following functions are available in the Data group:

Sorts case data in ascending or descending order.
Filters cases from the case data as per specified criteria.
Converts case data to crossover data.
Converts crossover data to case data.
Displays case data contents in the log window.

For the Edit group the following options are available:

Selects a case or variable.
Inserts a case or variable.
Deletes a case or variable.
Navigates to a specified case.

3.5 Analysis Menu

3.5.1 Basic Plots
3.5.2 Crossover Plots

The Analysis menu allows access to analytical tests which can be performed in East 6. The tests available in the Analysis menu are grouped according to the nature of the response variable. Click an icon to select a test from its drop-down menu.

Basic Statistics - This part contains tests to compute basic statistics and frequency distributions from a dataset.
Continuous - This part groups analysis tests for continuous response.
Discrete - This part groups all analysis tests for discrete response.
Events - This group contains tests for time-to-event outcomes.
Predict - This group contains different procedures to predict the future course of the trial given the current subject-level data or summary data. Refer to chapter 68 for more details.

3.5.1 Basic Plots

Bar and pie charts for categorical data.
Plots such as area, bubble, scatter and normality plots for continuous data.
Plots related to frequency distributions, such as histograms, stem and leaf plots, and cumulative plots.

3.5.2 Crossover Plots

This menu provides plots applicable to 2x2 crossover data.

Subject plots.
Summary plots.
Diagnostic plots.
All the tests under the Analysis menu are discussed in detail in Volume 8 of this manual.

3.6 Design Menu

3.6.1 Design Input-Output Window
3.6.2 Creating Multiple Designs
3.6.3 Filter Designs
3.6.4 What is a Workbook?
3.6.5 Group Sequential Design for the CAPTURE Trial
3.6.6 Adding a Futility Boundary
3.6.7 Accrual Dropout Information
3.6.8 Output Details

This section uses the CAPTURE trial to illustrate the various East features mentioned so far in this chapter. This was a randomized clinical trial of placebo versus Abciximab for patients with refractory unstable angina. Results from this trial were presented at a workshop on clinical trial data monitoring committees (Randomised placebo-controlled trial of abciximab before and during coronary intervention in refractory unstable angina: the CAPTURE study, The Lancet, Vol 349, May 17, 1997). Let us design, simulate and monitor the CAPTURE trial using East 6.

The goal of this study is to test the null hypothesis, H0, that the Abciximab and placebo arms both have an event rate of 15%, versus the alternative hypothesis, H1, that Abciximab reduces the event rate by 5 percentage points, from 15% to 10%. It is desired to have a 2-sided test with three looks at the data, a type-1 error, α, of 0.05 and a power, (1 − β), of 0.8.

We shall start with a fixed sample design and then extend it to a group sequential design. In this process, we demonstrate the useful features of Architect one by one.

To begin, click the Design menu, then Two Samples in the Discrete group, and then click Difference of Proportions.

Below the top ribbon, there are three windows: the Input/Output, the Library, and the Help. All these windows are explained in section 3.1 on Workflow in East. Both the Library and the Help can be hidden temporarily or throughout the session.
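As a rough cross-check on the design goal stated above, the total sample size for a fixed sample, 2-sided test of two proportions can be approximated by hand. The sketch below assumes 1:1 allocation and the textbook unpooled-variance normal approximation; the function name and this choice of formula are illustrative assumptions and are not claimed to be the algorithm East uses.

```python
import math
from statistics import NormalDist


def total_sample_size(pi_c, pi_t, alpha=0.05, power=0.8, two_sided=True):
    """Approximate total sample size (both arms, 1:1 allocation) for a test
    of the difference of two proportions.

    Uses the unpooled-variance normal approximation. This is an assumed
    textbook formula, not necessarily East's internal computation.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    delta = pi_t - pi_c
    variance = pi_c * (1 - pi_c) + pi_t * (1 - pi_t)
    n_per_arm = (z_alpha + z_beta) ** 2 * variance / delta ** 2
    # Round the total up to the next integer.
    return math.ceil(2 * n_per_arm)


# CAPTURE inputs from this section: 15% vs 10%, alpha 0.05 (2-sided), power 0.8.
capture_fsd = total_sample_size(pi_c=0.15, pi_t=0.10)
print(capture_fsd)  # 1366
```

For the CAPTURE inputs this approximation gives 1366 subjects, which is consistent with the figures quoted later in this chapter (CAPT-GSD needs 1384 subjects, 18 more than CAPT-FSD).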
The input window for the Difference of Proportions test appears as shown below:

The design-specific help can be accessed by clicking the icon after invoking a design. This help is available for all the designs in East 6.

3.6.1 Design Input-Output Window

This window is used to enter various design-specific input parameters in the input fields and drop-down options available. Let us enter the following inputs for the CAPTURE trial and create a fixed sample design: Test Type as 2-Sided, Type I Error as 0.05, Power as 0.8, πc as 0.15 and πt as 0.1. On clicking the Compute button, a new row for this design is added to the Output Preview window. Select this row and click the icon. Rename this design as CAPT-FSD to indicate that it is a fixed sample design for the CAPTURE trial.

3.6.2 Creating Multiple Designs

Before finalizing any particular study design, statisticians might want to assess the operating characteristics of the trial under different conditions and over a range of parameter values. For example, when working on time-to-event trials, we may want to see the effect of different values of the hazard ratio on the overall power and duration of the study.

East makes it easy to rapidly generate and assess multiple options, to perform sensitivity analysis, and to select the optimal plan. We can enter multiple values for one or more input parameters, and East creates designs for all possible combinations. These designs can then be compared in a tabular as well as a graphical manner.
Multiple values can be entered in the following three ways:

Comma-separated values: (0.8, 0.9, 0.95)
Colon-separated range of values: (0.8 to 0.9 in steps of 0.05 can be entered as 0.8:0.9:0.05)
Combined values: (0.7, 0.8, 0.85: 0.95: 0.01)

Multiple values can be entered only in the cells with a pink background color. Now suppose we want to create designs for two values of Type I Error, three values of Power and four values of πt: 0.1, 0.2 : 0.3 : 0.05. Without changing other parameters, let us enter these ranges for the three parameters as shown below:

On clicking the Compute button, East will create 2 × 3 × 4 = 24 designs for the CAPTURE trial. To view all the designs in the Output Preview window, maximize it using the control at its top-right corner.

3.6.3 Filter Designs

Suppose we are interested in designs with some specific input/output values. We can set up a criterion using the Filter functionality by clicking the icon available in the top-right corner of the Output Preview window.

For example, suppose we want to see designs with Sample Size less than 1000 and Type I Error equal to 0.05.

The qualifying designs appear in the Output Preview window as shown below:

The filter criteria can be edited or cleared by clicking the Filter icon again. On clearing the above criterion, all 24 designs are displayed again. Before we proceed, let us first delete these recently created 24 designs, leaving behind CAPT-FSD, and then minimize the Output Preview window using the control at its top-right corner.

One or more rows in the Output Preview can be deleted by selecting them and clicking the icon. Use the Ctrl key and mouse click to select specific rows. Use the Shift key and mouse click to select all the rows in a range. Use the combination Ctrl + A to select all the rows. The resulting Output Preview is shown below:
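The multiple-value syntax described in Section 3.6.2 (comma-separated values, colon-separated ranges, and their combination) can be illustrated with a short sketch. The parser below is a hypothetical re-implementation written only to show how such entries expand and how the number of designs multiplies; it is not East code. The two Type I Error values and three Power values are assumed for illustration, while the πt entry is the one given in the text.

```python
from decimal import Decimal
from itertools import product


def expand_values(spec):
    """Expand an entry such as "0.1, 0.2:0.3:0.05" into a list of values.

    Comma-separated items are taken as they are; an item of the form
    start:stop:step is expanded into an inclusive range. Decimal is used
    so that steps such as 0.05 accumulate without floating point drift.
    """
    values = []
    for item in spec.split(","):
        parts = [Decimal(part.strip()) for part in item.split(":")]
        if len(parts) == 1:
            values.append(parts[0])
        elif len(parts) == 3:
            start, stop, step = parts
            if step <= 0:
                raise ValueError("step must be positive")
            current = start
            while current <= stop:
                values.append(current)
                current += step
        else:
            raise ValueError(f"cannot parse {item!r}")
    return values


alphas = expand_values("0.025, 0.05")       # assumed: two Type I Error values
powers = expand_values("0.8, 0.9, 0.95")    # assumed: three Power values
pi_ts = expand_values("0.1, 0.2:0.3:0.05")  # from the text: four values of pi_t

# One design per combination of the entered values.
designs = list(product(alphas, powers, pi_ts))
print(len(designs))  # 24
```

The count of 24 is simply the product of the list lengths (2 × 3 × 4), which is why adding a range to even one more parameter can quickly produce a large Output Preview.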
It is advisable to save this design, or any work which you would like to refer to in future, in an East Workbook. The next subsection briefly discusses the use of workbooks.

3.6.4 What is a Workbook?

A Workbook is a storage construct managed by East for holding different types of generated outputs. The user designs a trial, simulates it, monitors it at several interim looks, conducts certain analyses, draws plots, etc. All of these outputs can be kept together in a workbook which can be saved and retrieved for further development when required. Note that a single workbook can also contain outputs from more than one design.

Select the design CAPT-FSD in the Output Preview window and click the icon. When a design is saved to the Library for the first time, East automatically creates a workbook named Wbk1, which can be renamed by right-clicking the node. Let us name it CAPTURE.

This is still temporary storage: if we exit East without saving it permanently, the workbook will not be available in future. Note that workbooks are not saved automatically on your computer; they must be saved either by right-clicking the node in the Library and selecting Save, or by clicking the icon. In addition, the user will be prompted to save the contents of the Library while closing East 6.

Often we wish to add specific comments to a design or any other output window. These comments are useful for future reference. One can do that by attaching a Note to any node by selecting it and clicking the icon. A small window will pop up where comments can be stored.

Once saved, a yellow icon against the design node will indicate the presence of a note. If you want to view or remove the note, right-click the design node, select Note, and clear the contents.
The tabs available on the status bar at the bottom left of the screen can be used to navigate between the active windows of East.

For example, if you wish to return to the design inputs, click the Input button, which will take you to the latest Input window you worked with. As we proceed further, more such tabs will appear, enabling us to navigate from one screen of East to another.

3.6.5 Group Sequential Design for the CAPTURE Trial

Select the design CAPT-FSD and click the icon in the Library to modify the design. On clicking this icon, the following message will pop up. Click "Yes" to continue.

Let us extend this fixed sample design to a group sequential design by changing the Number of Looks from 1 to 3. This means that we are planning to take two interim looks and one final look at the data while monitoring the study.

An additional tab named Boundary is added, which allows us to enter inputs related to the boundary family, look spacing and error spending functions.

Let the boundary family be Spending Functions and the alpha spending function be Lan-DeMets with the parameter OF. Click Compute to create the three-look design and rename it as CAPT-GSD.

As you go on creating multiple designs in East, the Output Preview area can become too busy to manage. Thus, you can also select the designs you are interested in, save them in the workbook and then rename them appropriately.

The Output Preview window now looks as shown below:

Notice that CAPT-GSD requires 18 subjects more than CAPT-FSD to achieve 80% power. This view gives us a horizontal comparison of the two designs. Save the design CAPT-GSD in the workbook. One can also compare these designs in a vertical manner.
Select the two designs by clicking on one of them, pressing Ctrl and then clicking on the other one. Next, click the icon.

This is the Output Summary window of East, which compares the two designs vertically. We can easily copy this display from East to MS Excel and modify or save it further in any other format. To do that, right-click anywhere in the Output Summary window, select the Copy All option and paste the copied data into an Excel workbook. The table is pasted as two formatted columns.

Let us go back to the input window of CAPT-GSD (select the design and click the icon) and activate the Boundary tab. By default, the boundary values in the table at the bottom of this tab are displayed on the Z Scale. We can also view these boundaries on other scales such as the Score Scale, δ Scale and p-value Scale. Let us view the efficacy boundaries for CAPT-GSD on the p-value scale.

The final p-value required to attain statistical significance at level 0.05 is 0.0463. This is sometimes regarded as the penalty for taking two interim looks at the data. Also observe that, although the maximum sample size for this design is 1384, the expected sample size under the alternative that δ = -0.05 is much less, 1183. However, there is very little saving under the null hypothesis that δ = 0; the expected sample size in this case is 1378. Therefore, it might be beneficial to consider replacing the lower efficacy boundary by a futility boundary. Also, sometimes we might wish to stop a trial early because the effect size observed at an interim analysis is too small to warrant continuation. This can be achieved by using a β-spending function and introducing a futility boundary at the design stage.
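To give a feel for how error spending functions allocate error across looks, the sketch below evaluates two standard published forms: the Lan-DeMets O'Brien-Fleming-type function (the "Lan-DeMets with parameter OF" choice above) and the gamma family, which is assumed here to correspond to the γ spending function used for futility in the next subsection. The one-sided α of 0.025, the β of 0.2 and the equally spaced looks are assumptions for illustration. The output is cumulative error spent at each look, not East's stopping boundaries, and East's exact parameterization (in particular for 2-sided designs) may differ.

```python
import math
from statistics import NormalDist


def lan_demets_of(t, alpha):
    """Cumulative alpha spent at information fraction t (0 < t <= 1) by the
    Lan-DeMets O'Brien-Fleming-type function:
        alpha(t) = 2 - 2 * Phi(z_{alpha/2} / sqrt(t))
    """
    z = NormalDist()
    return 2.0 * (1.0 - z.cdf(z.inv_cdf(1.0 - alpha / 2.0) / math.sqrt(t)))


def gamma_family(t, error, gamma):
    """Cumulative error spent at information fraction t by the gamma family:
        error * (1 - exp(-gamma * t)) / (1 - exp(-gamma)),
    which reduces to error * t when gamma is 0.
    """
    if gamma == 0:
        return error * t
    return error * (1.0 - math.exp(-gamma * t)) / (1.0 - math.exp(-gamma))


looks = [1 / 3, 2 / 3, 1.0]  # three equally spaced looks (assumed)
alpha_spent = [lan_demets_of(t, alpha=0.025) for t in looks]
beta_spent = [gamma_family(t, error=0.2, gamma=-2.0) for t in looks]

for t, a, b in zip(looks, alpha_spent, beta_spent):
    print(f"t={t:.3f}  alpha spent={a:.6f}  beta spent={b:.6f}")
```

Both functions spend very little error at the first look and reach the full α or β only at the final look, which is why the final-look penalty for interim monitoring stays small with these choices.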
3.6.6 Adding a Futility Boundary

Select the design CAPT-GSD and click the icon to edit it. Change the Test Type from 2-Sided to 1-Sided and the Type I Error from 0.05 to 0.025. Go to the Boundary tab and add the futility boundaries by using the γ (-2) spending function.

Before we create this design, we can see the error spending chart and the boundaries chart for the CAPTURE trial with efficacy as well as futility boundaries. This gives us a way to explore different boundary families and error spending functions and to decide upon the desired combination before even creating a design. Click the icon to view the Error Spending Chart.

Click the icon to view the Boundaries Chart.

The region shaded in light pink corresponds to the critical region for futility, and the one in light blue corresponds to the critical region for efficacy. We can also view the boundaries on the conditional power scale in the presence of a futility boundary. Select the entry named cp deltahat Scale from the Boundary Scale dropdown. The chart is updated and the boundaries are displayed on the CP scale.

Zooming the Charts

To zoom into any area of the chart, click and drag the mouse over that area. After clicking the Zoom button, click on the plot at the top-left corner of the area you want to magnify, keep the mouse button pressed and drag the mouse over the desired area. This draws a rectangle around that area. Now release the mouse button and East magnifies the selected area. You can keep doing this to zoom in further.
The magnified chart appears as below:

Note that after zooming, the Zoom button changes to Reset. When you click it, the plot is reset to its original shape.

Let us compute the third design for the CAPTURE trial and rename it as CAPT-GSD-EffFut. Save it in the workbook. Click the icon to compare all three designs side by side, as explained above.

Along with the side-by-side comparison, let us compare the two group sequential designs graphically. Press Ctrl and click on CAPT-FSD. Notice that the remaining two designs are still highlighted, which means they are selected and CAPT-FSD is unselected. Now click the icon and select Stopping Boundaries to view the graphical comparison of the boundaries of the two designs.

As we can see, the design CAPT-GSD uses an upper efficacy boundary whereas CAPT-GSD-EffFut uses an upper futility boundary. We can turn the boundaries on and off by checking the boxes available in the legends.

Before we proceed, let us save this third design in the workbook. We can also create several workbooks in the Library and then compare multiple designs across the workbooks. This is an advantage of working with workbooks in East 6.

3.6.7 Accrual / Dropout option for Continuous and Discrete Endpoints

In earlier versions of East, the option to incorporate accrual and dropout information was available only for tests under time-to-event/survival endpoints. East 6 now provides this option for almost all the tests under Continuous and Discrete endpoints as well. Let us see its use in the CAPTURE trial.

Select the design CAPT-GSD-EffFut from the Library and edit it to add the accrual-dropout information. From the Design Parameters tab, add the option Accrual/Dropout Info by clicking the Include Options button.
Let the accrual rate be 12 subjects/week. Suppose we expect the response to be observed 4 weeks after recruitment. Let us create a design by first assuming that there will not be any dropouts during the course of the trial. We will then introduce some dropouts and compare the two designs. After entering the above inputs, click the icon to see how the subjects will accrue and complete the study.

Close the chart, create the design by clicking the Compute button, save it in the workbook CAPTURE and rename it as CAPT-GSD-NoDrp to indicate that there are no dropouts in this design. Notice that in this design, the maximum sample size and the maximum number of completers are the same, as there is no dropout.

Let us now introduce dropouts. Suppose there is a 5% chance of a subject dropping out of the trial.

Notice that the two lines are no longer parallel because of the presence of dropouts. Click the Compute button to create this design. Save the design in the workbook CAPTURE and rename it as CAPT-GSD-Drp. Compare this design with CAPT-GSD-NoDrp by selecting the two designs and clicking the icon.

Notice the inflation in sample size for CAPT-GSD-Drp. This design will require an additional 80 subjects to obtain data on 1455 subjects (1455 completers).

Let us now compare all the five designs saved in the workbook. Select them all together and click the icon. The resulting screen will look as shown below:

We can see additional quantities in the design CAPT-GSD-Drp. These correspond to the information on the total number of completers and the study duration, which are computed by taking into account the non-zero response lag and the possibility of dropouts.

Also notice the trend in Maximum Sample Size across all these designs. We can see that it increases as more constraints are added to the study. But if we look at the values of Expected Sample Size under the null and the alternative, there is a significant potential saving.

You can also save this output summary window comparing three designs in the Library by clicking the icon.

3.6.8 Output Details

In the earlier part of this chapter, we have seen the design output at two different places: Output Preview (horizontal view) and Output Summary (vertical view). The final step in the East 6 design workflow is to see the detailed output in the form of an HTML file.

Select the design CAPT-GSD-Drp from the Library and click the icon. Alternatively, one can also double-click any of the nodes in the Library to see the details.

The output details are broadly divided into two panels. The left panel consists of all the input parameters, and the right panel consists of all the design output quantities in tabular format. These tables will be explained in detail in subsequent chapters of this manual.

Click the Save icon to save all the work done so far. This is the end of the introduction to the Design menu. The next section discusses another very useful feature called Simulations.

3.7 Simulations in East 6

Simulation is a very useful way to perform sensitivity analysis of the design assumptions. For instance, what happens to the power of the study when the δ value is not the same as specified at the design stage? We will now simulate the design CAPT-GSD-Drp. Select this design from the Library and click the icon. Alternatively, you can right-click this design in the Library and select Simulate.
The default view of the input window for simulations is as shown below:

Notice that the value of δ on the Response Generation tab is -0.05. This corresponds to the difference in proportions under the alternative hypothesis. You may either keep this default value for the simulation or change it if you wish to simulate the study with a different value of δ.

Let us run some simulations by changing the value of δ. We will run simulations over a range of values for πt, say, 0.1, 0.125 and 0.14. Enter the values as shown below:

Before running simulations, let us have a quick look at the Simulation Control tab, where we can change the number of simulations, save the simulation data in East format or in CSV format, and set some other useful options. You can control the simulations with the following actions:

Enter the number of simulations you wish to run in the "Number of Simulations" field. The default is 10000 simulations.
Increase or decrease the "Refresh Frequency" field to speed up or slow down the simulations. The default is to refresh the screen after every 1000 simulations.
Set the Random Number Seed to Clock or Fixed. The default is Clock.
Select the "Suppress All Intermediate Output" checkbox to suppress the intermediate output.
To see the intermediate results after a specific number of simulations, select the "Pause after Refresh" checkbox and enter the refresh frequency accordingly.
The "Stop At End" checkbox is selected by default to display the summary results at the end of all the simulations; a corresponding item is created in the Output Preview window. One can uncheck this box and save the simulation node directly in the Output Preview window.
One can also save the summary statistics for each simulation run and the subject-level simulated data in the form of Case Data or a Comma Separated File. Select the checkboxes accordingly and provide the file names and paths when using the CSV option. If you are saving the data as Case Data, the corresponding data file will be associated with the simulation node. It can be accessed by saving the simulation node from the Output Preview to the workbook in the Library.

For now, let us keep the Simulation Control tab as shown below:

Click the Simulate button at the bottom right to run the simulations. Three scenarios corresponding to the three values of πt are simulated one after the other and, in the end, the following output window appears. This is the Simulation Intermediate Output window, which shows the results from the last simulated scenario. The two plots on this window are useful to see how the study performed over 10000 simulations.

Click the Close button on this intermediate window, which takes us to the Output Preview window. Save these three simulation rows in the workbook CAPTURE. Since we simulated the design CAPT-GSD-Drp, the three simulation nodes are saved as child nodes of this design. This hierarchy is followed throughout the East 6 software.

A full picture of the CAPTURE trial design with accrual/dropout information and its simulations can be viewed easily. Select the three simulation nodes and the parent design node in the Library and click the icon.

Note the drop in simulated power as the difference between the two arms decreases. This is because the sample size of 1532 was insufficient to detect the δ values of -0.025 and -0.01. It shows the effect of mis-specifying the alternative hypothesis.
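The effect of a mis-specified alternative can be reproduced qualitatively with a small Monte Carlo sketch. The code below is a deliberately simplified stand-in for the East simulation: it assumes a fixed sample, 2-sided unpooled z-test with no interim looks and no dropouts, uses 683 subjects per arm (half of the 1366 implied for CAPT-FSD) and only 1000 simulations per scenario to keep the run short. The function name, seed and these settings are illustrative assumptions, so the estimates will not match the East output for CAPT-GSD-Drp.

```python
import random
from statistics import NormalDist


def simulate_power(pi_c, pi_t, n_per_arm, n_sims=1000, alpha=0.05, seed=12345):
    """Estimate by simulation how often a 2-sided, fixed sample z-test for
    the difference of two proportions rejects the null hypothesis."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        # Draw the number of events on each arm as a sum of Bernoulli trials.
        x_c = sum(rng.random() < pi_c for _ in range(n_per_arm))
        x_t = sum(rng.random() < pi_t for _ in range(n_per_arm))
        p_c, p_t = x_c / n_per_arm, x_t / n_per_arm
        se = ((p_c * (1 - p_c) + p_t * (1 - p_t)) / n_per_arm) ** 0.5
        if se > 0 and abs(p_t - p_c) / se >= z_crit:
            rejections += 1
    return rejections / n_sims


# Scenarios from the text: pi_c = 0.15 with pi_t = 0.1, 0.125 and 0.14.
scenarios = [0.10, 0.125, 0.14]
estimated_power = [simulate_power(0.15, pi_t, n_per_arm=683) for pi_t in scenarios]
for pi_t, power in zip(scenarios, estimated_power):
    print(f"pi_t={pi_t}: estimated power={power:.3f}")
```

Setting pi_t equal to pi_c (0.15) in the same function gives a rough check of the type I error, in the spirit of simulating under the null hypothesis.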
The design did achieve a power of 80% in the first case, with δ equal to -0.05, which was the value of δ at the design stage. This is called simulating the design under the alternative. We can also simulate a design under the null by entering πt equal to 0.15, the same as πc, and verify that the type I error is preserved.

The column width in the comparison mode is fixed and the heading appears in the format workbook name:design name:Sim. If this string is longer than the fixed width, you may not be able to see the complete heading. In that case, hover the mouse over the column heading cell to see the complete heading.

Thus, simulations in East6 are a powerful tool for verifying the operating characteristics of a design. The next section introduces another key feature of East6, Interim Monitoring. Let us monitor the CAPTURE trial using this feature.

3.8 Interim Monitoring

Interim monitoring is critical for the management of group sequential trials, and there are many reasons why flexibility in both design and monitoring is necessary. Administrative schedules may call for the recalculation of statistical information and unplanned analyses at arbitrary time points, while both the type-1 error and the power of the study must be preserved. East provides the capability to flexibly monitor a group sequential trial using the Interim Monitoring (IM) dashboard. The IM dashboard provides a coherent visual display of many output values based on interim information. In addition to important statistical information, it includes tables and graphs for stopping boundaries, conditional power, error spending and confidence intervals for each interim look. All of this information is useful in tracking the progress of a trial for decision-making purposes, and it also allows a study design to be improved adaptively.
Consider the monitoring of CAPT-GSD-Drp of the CAPTURE trial. Select this design from the Library and click the icon. The adaptive version of the IM dashboard can be invoked by clicking the icon. For this example, however, we will use the regular IM dashboard. A node named Interim Monitoring gets associated with the design in the Library and a blank IM dashboard opens as shown below:

Suppose we have to take the first look at the data based on 485 completers. The interim data on these subjects is to be entered in the Test Statistic Calculator, which can be opened by clicking the button. Open this calculator and click OK with default parameters.

If we have run any analysis procedure on the interim data, then the test statistic calculator can read in the information from the Analysis node. Select the appropriate workbook and node and hit Recalc to see the interim inputs. Alternatively, for binomial endpoint trials, we can enter the interim data in terms of the number of responses on each arm, and East computes the difference in proportions and its standard error. As a further alternative, we can directly enter the estimate of the difference and its standard error, which can be the output of some external computation. The inputs on the test statistic calculator depend upon the type of trial you are monitoring.

The resulting screen is as shown below:

The output quantities for the first look are computed in that row and all four charts are updated based on the look 1 data. There are some more advanced features, such as the Conditional Power calculator, the Predicted Intervals Plot and Conditional Simulations, available from the IM dashboard. These are explained in later sections of this manual.

Let us take the second look at 970 subjects.
Open the test statistic calculator and, leaving all other parameters at their defaults, change the number of responses on the Treatment arm to 30. Click the OK button. The screen will look as shown below:

East tells us that the null hypothesis is rejected at the second look and provides an option to stop the trial and conclude efficacy of the drug over the control arm. It computes the final inference at the end. At this stage, it also provides another option to continue entering data for future looks, but the final inference is computed only once.

In the last part of this chapter, we shall see how to capture a snapshot of any ongoing interim monitoring of a trial.

The IM dashboard can also be used as a tool at design time, where we can construct and analyze multiple possible trial scenarios before actual data is collected. The feature to save a snapshot of information at interim looks allows a user to quickly compare multiple scenarios under a variety of assumptions. This option increases the flexibility of both the design and the interim monitoring process.

At each interim look, a snapshot of the updated information in the dashboard can be saved for the current design in the workbook. Click the icon located at the top of the IM Dashboard window to save the current contents of the dashboard:

A new snapshot node is added under the Interim Monitoring node in the Library. The Interim Monitoring window is an input window, which cannot be printed, whereas the snapshot is in HTML format, which can be printed and shared.

To illustrate the benefit of the snapshot feature, consider that actual trial data is often not available at design time. It may then be desirable to work with a reasonable estimate of nuisance parameters, such as the variance, rather than making strong assumptions about their values.
The ability to quickly compare potential results under a variety of different estimates of the variance, by looking at multiple interim snapshots of a study, can be a powerful tool. Other examples include sample size re-estimation, where initial design assumptions may be incorrect, or using hypothetical interim data to compare relevant treatment differences.

With this, we come to the end of the chapter on getting started with East6. The subsequent chapters in this manual discuss in detail, with the help of case studies, all the features available in the software. The theory behind all the design and analysis procedures is explained in Appendix A of this manual.

4 Data Editor

Data Editor allows you to manipulate the contents of your data. East caters to Case Data and Crossover Data. Depending on the type of data, a corresponding set of menu items becomes available in the Data Editor menu.

4.1 Case Data

The Data Editor window for case data is a spreadsheet-like facility for creating or editing case data files. A case data file is organized as a sequence of records, called cases, one below the other. Each record is subdivided into a fixed number of fields, called variables. The name assigned to a field is referred to as the variable name. Each such name identifies a specific variable across all the cases. Each cell holds the value of a variable for a case. The top line of the Data Editor holds the variable names. Case data is the most common format in which to enter and store data. If you plan to share data with any other package, you need to use the case data editor.

4.1.1 Data Editor Capabilities for Case Data

The Data Editor is used to create a new Case Data file or to edit one that was previously saved.
You can:

- Create new variables
- Change names and attributes of existing variables
- Alter the column width
- Alter the row height
- Type in new case data records
- Edit existing case data records
- Insert new variables into the data set
- Remove variables from the data set
- Select or reject subsets of the data
- Transform variables
- List data in the log window
- Calculate summary measures from the variables

4.1.2 Creating Variables

To create a new Case Data set, invoke the menu Home. Click on the icon and select Case Data. When you create a new case data set, all the columns are labeled var, indicating that new variables may be created in any of the columns. To create a new variable, simply start entering data in a blank column. The column is given a default name Var1, Var2, etc. Alternatively, select any unused column, right click and select Create Variable from the menu that appears. The data editor will create all the variables with default names up to the column you are working on. To create a new variable in the first unused column and to select its attributes, choose the menu Data Editor and click on the icon. You will be presented with the dialog box shown below, in which you can select the variable name, variable type, alignment, format, value labels and missing values.

4.1.3 Variable Type Setting

You can change the default variable name and its type in this dialog box and click on the OK button. East will automatically add this new variable to the case data file. New variables are added immediately adjacent to the last existing variable in the case data set. The Variable Type Setting dialog box contains five tabs: Detail, Alignment, Format, Value Label, and Missing Value(s).

Detail

The Detail tab allows you to change the default variable name, add a description of the variable and select the type (Numeric, String, Date, Binary, Categorical or Integer).
Note that, depending on the type of the variable, different tabs and options become available in the Variable Type Setting dialog box. For example, the tab Category Details and the option Base Level become available only if you select Variable Type as Categorical.

Value Label

The Value Label tab is displayed below. Here, you can add labels for particular data values, change a selected label, remove a selected label, or remove all value labels for the current variable.

Missing Value(s)

The Missing Value(s) tab is used for specifying which values are to be treated as missing. You have three choices: Not Defined, which means that no values will be treated as missing values; Discrete value(s), which allows you to add particular values to the list of missing values; or Range, which lets you specify an entire range of numbers as missing values.

4.1.4 Editing Data

Besides changing the actual cell entries of a case data set, you can:

- Add new Cases and Variables
- Insert or delete Cases and Variables
- Sort Cases

4.1.5 Filter Cases

We illustrate the ability of East to filter cases with the help of the following example:

Step 1: Open the Data set
Open the data set leukemia.cyd by invoking the menu Home. Click on the icon and select Data. The data is stored in the Samples folder of the installation directory of East.

Step 2: Invoke the Filter Cases menu
Invoke the menu item Data Editor. Click on the icon and select Filter cases. East will present you with a dialog box that allows you to use subsets of data in the Case Data editor. The dialog box will allow you to select All cases, those satisfying an If condition, those falling in a Range, or those identified by a Filter Variable, as shown below.

Step 3: Filter Variable option
Select the Filter Variable option.
Select Status from the variable list and click on the black triangle, which will remove the variable Status from the variable list and add it to the empty box on the other side. Suppose we want to filter the cases for which the Status variable has value 1. Insert 1 in the empty box next to Event code.

Step 4: Output
Click on OK. As shown in the following screenshot, East will grey out all the cases that have Status variable value 1. Now any analysis carried out on the data set uses only the filtered cases. In this way, you can carry out subgroup analyses if the subgroups are identified by the values of a variable in the data set.

4.2 Crossover Data

The Data Editor allows you to enter data for a 2 × 2 crossover trial with one record for each patient. You can use this crossover data editor to input individual patients' responses in 2 × 2 crossover trials. The response could be continuous (such as systolic blood pressure) or binary (such as the development of a tumor after injecting a carcinogenic agent). Only the continuous response type is currently supported in East.

4.2.1 Data Editor Capabilities for Crossover Data

The Data Editor is used to create a new 2 × 2 Crossover Data file or to edit one that was previously saved. You can:

- Create and edit data with continuous response of individual patients.
- Edit period labels.
- Assign treatments to different groups and periods.
- Convert to case data.
- Convert case data into crossover data.
- List data to the log.

4.2.2 Creating a New Crossover Data Set

To create a new crossover data set, invoke the menu Home.
Click on the icon and, from the drop down menu, choose Crossover data. You will be presented with a dialog box as shown below:

In the above dialog box, you see a 2 × 2 grid called the Treatment Assignment Table. This grid is provided to assign the treatments to different groups and periods.

In this version of the software, you can analyze data for 2 × 2 crossover trials. Hence the number of groups and the number of periods are always two. The rows specify the two groups, labeled "G1" and "G2". The columns represent the two periods of the crossover data, labeled "P1" and "P2". If you'd like to change these labels, click inside the table cells. Type the treatment names associated with the corresponding group and period. Having entered the treatments, the crossover data editor settings dialog box will look as follows:

Rules for editing these fields

The row names G1 and G2 can be changed using a string consisting of a maximum of 8 characters from the set A-Z, 0-9, '.', '_' (underscore), starting with either a letter or a digit; blank spaces are not accepted as part of a name. The column names P1 and P2 can be changed the same way. Also note that the Group names as well as the Period names must be distinct. The letters are not case sensitive.

Once you have assigned all the treatments, click on the OK button.

This will open up the Patients' crossover data editor. This editor resembles the case data editor. Like the case data editor, this is a spreadsheet into which you can enter data directly. There are four pre-defined fields in this editor. The PatientId column must contain the patient's identification number. The GroupId column will contain the identification of the group to which the patient belongs.
The entry in this column should be one of the labels that you have entered as row names in the 2 × 2 grid earlier. The inputs in the next two columns are numeric and contain the responses of the patient in the two periods respectively. The title of each of these two columns is created by concatenating the word "Resp" to the period identifications that you have entered previously. For example, here in the settings dialog we have entered P1 and P2 as period identifiers, and these two response columns are labeled as P1 Resp and P2 Resp. However, if the period values start with digits, such as 1 and 2, then the period ids are prefixed by the letter P, and the headings of these two columns would again be P1 Resp and P2 Resp.

The variable names PatientId and GroupId are fixed and cannot be edited in the data editor. If you use Transform Variable on GroupId and the result is either "G1" or "G2", then the value is displayed; otherwise, the value is shown as missing.

You can also add covariates such as age and sex. All variable settings of the case data editor are applicable to these covariates.

The Settings button allows you to edit the GroupId, PeriodId or the treatment labels that you have entered earlier. If you make any changes, these changes will automatically be made in the data editor.

4.3 Data Transformation

You can transform an existing variable with the data transformation facility available in the Data Editor of East.

To transform any variable:

1. Select the menu Data Editor and click on the icon. You will be presented with the expression builder dialog box. Here you can transform the values of the current variable using a combination of statistical, arithmetic, and logical operations. The current variable name is the target variable on the left hand side of an equation with the form:

   VAR =

   where VAR is the variable name of the current variable.
   In order to create a new variable, type the variable name in the target variable field.

2. Complete the right hand side of the equation with any combination of allowable functions. To select a function, double-click on it. If the function that you select needs any extra parameters (typically variable names), this will be indicated by a ? for each required parameter. Replace the ? character with the desired parameter.

3. Select the OK button to fill in values for the current variable computed according to the expression that you have constructed.

The statistical, arithmetical, and logical functions that are available in the Transform Variable dialog box are given below:

4.4 Mathematical and Statistical Functions

The following is a list of mathematical and statistical functions available in East for variable transformation.

ABS(X) Returns the absolute value of X.
Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.

ACOS(X) Returns the arccosine of X.
Argument Range: −1 ≤ X ≤ 1.

ASIN(X) Returns the arcsine of X.
Argument Range: −1 ≤ X ≤ 1.

ATAN(X) Returns the arctangent of X.
Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.

AVG(X1, X2, ...) Returns the mean of (X1, X2, ...).
Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.

CEIL(X) Returns the ceiling, or smallest integer greater than or equal to X.
Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.

CHIDIST(X,df) Returns the probability in the tail area to the left of X from the chi-squared distribution with df > 0 degrees of freedom.
Argument Range: 0 ≤ X ≤ 1 × 10^25.

CHIINV(X,df) Returns the Xth percentile value of the chi-squared distribution with df > 0 degrees of freedom, i.e., returns z such that Pr(Z ≤ z) = X.
Argument Range: 0.0001 ≤ X ≤ 0.9999.

COS(X) Returns the cosine of X, where X is expressed in radians.
Argument Range: −2.14 × 10^9 ≤ X ≤ 2.14 × 10^9.

COSH(X) Returns the hyperbolic cosine of X.
Argument Range: −87 ≤ X ≤ 87.

CUMULATIVE(X) Given a column of X values, this function returns a new column in which the entry in row j is the sum of the entries in the first j rows of the original column.

EXP(X) Returns the exponential function evaluated at X.
Argument Range: −87 ≤ X ≤ 87.

FLOOR(X) Returns the floor, or largest integer less than or equal to X.
Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.

INT(X) Returns the integer part of X.
Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.

ISNA(X) Returns a value of 1 if X is a missing value, 0 otherwise.
Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.
This function is useful, for example, for setting missing observations to average values:

   X1 = IF(ISNA(X)=1, COLMEAN(X), X)

Another extremely useful task performed by the ISNA() function is to eliminate records from the data set in which there are missing values:

   REJECTIF(ISNA(X)=1) <Enter>
   SELECTIF(ISNA(V1)+ISNA(V2)+ISNA(V3)=0) <Enter>

LOG(X) Returns the logarithm of X to base 10.
Argument Range: 1 × 10^−25 ≤ X ≤ 1 × 10^25.

LN(X) Returns the logarithm of X to base e.
Argument Range: 1 × 10^−25 ≤ X ≤ 1 × 10^25.

MAX(X1, X2, ...) Returns the maximum value of (X1, X2, ...).

MIN(X1, X2, ...) Returns the minimum value of (X1, X2, ...).

MOD(X,Y) Returns the remainder of X divided by Y. The sign of this remainder is the same as that of X.
Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.

NORMDIST(X) Returns the probability in the tail area to the left of X from the standardized normal distribution.
Argument Range: −10 ≤ X ≤ 10.

NORMINV(X) Returns the Xth percentile value of the standard normal distribution, i.e., returns z such that Pr(Z ≤ z) = X.
Argument Range: 0.001 ≤ X ≤ 0.999.

ROUND(X,d) Returns a floating point number obtained by rounding X to d decimal digits. If d=0, X is rounded to the nearest integer.
Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.

SIN(X) Returns the sine of X, where X is expressed in radians.
Argument Range: −2.14 × 10^9 ≤ X ≤ 2.14 × 10^9.

SINH(X) Returns the hyperbolic sine of X.
Argument Range: −87 ≤ X ≤ 87.

SQRT(X) Returns the square root of X.
Argument Range: 0 ≤ X ≤ 1 × 10^25.

TAN(X) Returns the tangent of X, where X is expressed in radians.
Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25; X ≠ (2n + 1)π/2, n an integer.

TANH(X) Returns the hyperbolic tangent of X.
Argument Range: −87 ≤ X ≤ 87.

4.4.1 The IF Function

This function tests an arithmetic or logical condition and returns one value if it is true and another value if it is false. The syntax is

   IF(CONDITION, X, Y)

The function returns the value X if CONDITION is "true" and Y if CONDITION is "false". For example, consider the following equation:

   HIVPOS = IF(CD4>1,1,-1)

The above equation defines a variable HIVPOS that assumes the value 1 if the variable CD4 exceeds 1 and assumes the value -1 otherwise. Usually CONDITION is made up of two arithmetic expressions separated by a "comparison operator", e.g., CD4>CD8, CD4+CD8=15*BLOOD, etc. The following comparison operators are allowed:

   =, >, <, >=, <=, <>

More generally, CONDITION can be constructed by combining two or more individual conditions with AND, OR, or NOT operators. For example, consider the following expression:

   HIVPOS = IF((CD4>1) !AND! (CD8>1), 1,-1)

The above expression means that HIVPOS will take on the value 1 if both CD4>1 and CD8>1, and -1 otherwise. On the other hand, consider the following expression:

   HIVANY = IF((CD4>1) !OR! (CD8>1),1,-1)

The above expression means that HIVANY will take on the value 1 if either CD4>1 or CD8>1, and -1 otherwise.

4.4.2 The SELECTIF Function

This function provides a powerful way of selecting only those records that satisfy a specific arithmetic or logical condition. All other records are deleted from the current data set.
The syntax is:

   SELECTIF(CONDITION)

This function selects only those records for which CONDITION is "true" and excludes all other records from the current data set. For example, consider the following equation:

   HIVPOS = SELECTIF(CD4>1)

The above condition retains the records for which CD4 exceeds 1. The same rules governing CONDITION for the IF function are applicable here as well. Note that the column location of the cursor when Transform Variable was selected plays no role in the execution of this function.

4.4.3 The RECODE Function

This function recodes different ranges of a variable. It is extremely useful for creating a new variable consisting of discrete categories at pre-specified cut-points of the original variable. The syntax for RECODE has two forms: one for recoding a categorical variable and one for recoding a continuous variable. In both cases, the variable being recoded must assume numerical values.

Recoding a Categorical Variable

The syntax is:

   RECODE(X, S1 = c1, S2 = c2, ..., Sn = cn, [else])

where X is the categorical variable (or arithmetic expression) being recoded, Sj represents a set of numbers in X, all being recoded to cj, and the optional argument [else] is a default number to which all the numbers belonging to X, but excluded from the sets S1, S2, ..., Sn, are recoded. If [else] is not specified as an argument of RECODE, then all the numbers excluded from the sets S1, S2, ..., Sn are unchanged. Notice that the argument Sj = cj in the RECODE function consists of a set of numbers Sj being recoded to a single number cj. The usual mathematical convention of specifying a set of numbers within braces is adopted. Thus, if the set Sj consisted of m distinct numbers s1j, s2j, ..., smj, it would be represented in the RECODE argument list as {s1j, s2j, ..., smj}.
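The set-based form of RECODE described above can be mimicked in a few lines of Python. This is only an illustrative sketch of the semantics as stated in the text (values in a listed set are replaced, other values are kept unless an else value is supplied); the function `recode_sets` and its argument layout are hypothetical and are not part of East.

```python
def recode_sets(values, rules, default=None):
    """Recode values by set membership.

    rules   -- list of (set_of_numbers, new_code) pairs, checked in order
    default -- if given, replaces every value not covered by any set;
               if omitted, uncovered values are left unchanged
    """
    recoded = []
    for value in values:
        for members, code in rules:
            if value in members:
                recoded.append(code)
                break
        else:
            # No set contained this value.
            recoded.append(value if default is None else default)
    return recoded


# Hypothetical data: collapse codes {0, 1} to 0 and {2, 3, 4} to 1.
x = [0, 1, 2, 4, 8]
rules = [({0, 1}, 0), ({2, 3, 4}, 1)]
print(recode_sets(x, rules))              # 8 is not covered and stays as is
print(recode_sets(x, rules, default=-1))  # 8 is not covered and becomes -1
```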
For example,

   Y = RECODE(X, {1,2,3}=1, {7,9}=2, {10}=3)

will recode the categorical variable X into another categorical variable Y that assumes the value 1 for X ∈ {1, 2, 3}, 2 for X ∈ {7, 9}, and 3 for X = 10. Other values of X, if any, remain unchanged. If you want those other values of X to be recoded to, e.g., -1, simply augment the argument list by including -1 at the end of the recode statement:

   Y = RECODE(X, {1,2,3}=1, {7,9}=2, {10}=3, -1)

Recoding a Continuous Variable

The syntax is:

   RECODE(X, I1 = c1, I2 = c2, ..., In = cn, [else])

where X is the continuous variable (or arithmetic expression) being recoded, Ij represents an interval of numbers all being recoded to cj, and the optional argument [else] is a default number to which all the numbers belonging to X, but excluded from the intervals I1, I2, ..., In, are recoded. If [else] is not specified as an argument of RECODE, then all the numbers excluded from the intervals I1, I2, ..., In are unchanged. Notice that the arguments of RECODE are intervals being recoded to individual numbers. The usual mathematical convention for specifying an interval Ij as open, semi-open, or closed is adopted. Thus:

- An interval Ij of the form (u, v) is open and includes all numbers between u and v, but not the end points.
- An interval Ij of the form [u, v] is closed and includes all numbers between u and v inclusive of the end points.
- An interval of the form (u, v] is open on the left but closed on the right. It excludes u, includes v, and includes all the numbers in between.
- An interval of the form [u, v) is closed on the left but open on the right. It includes u, excludes v, and includes all the numbers in between.
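The four interval conventions listed above can be made concrete with a small Python sketch. It is an illustration of the open/closed end point rules only, under the assumption that rules are checked in the order given; the `Interval` class and `recode_intervals` function are hypothetical names and not East functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """An interval from low to high; each end point is open unless flagged closed."""
    low: float
    high: float
    closed_low: bool = False
    closed_high: bool = False

    def __contains__(self, x):
        above = x >= self.low if self.closed_low else x > self.low
        below = x <= self.high if self.closed_high else x < self.high
        return above and below


def recode_intervals(values, rules, default=None):
    """Replace each value by the code of the first interval containing it.

    Values outside every interval are kept, or set to `default` if it is given.
    """
    recoded = []
    for value in values:
        for interval, code in rules:
            if value in interval:
                recoded.append(code)
                break
        else:
            recoded.append(value if default is None else default)
    return recoded


# Hypothetical age bands: [0, 18) -> 1 and [18, 65) -> 2.
bands = [
    (Interval(0, 18, closed_low=True), 1),
    (Interval(18, 65, closed_low=True), 2),
]
ages = [5, 18, 64.9, 65, 80]
print(recode_intervals(ages, bands))              # 65 and 80 are left unchanged
print(recode_intervals(ages, bands, default=-1))  # 65 and 80 become -1
```

The value 18 falls in the second band because the first interval excludes its right end point, and 65 falls in neither band for the same reason.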
For example,

   Y = RECODE(X, (2.5,5.7]=1, (5.7,10.4]=2)

will recode the continuous variable X so that all numbers 2.5 < X ≤ 5.7 are replaced by 1, all numbers 5.7 < X ≤ 10.4 are replaced by 2, and all other values of X are unchanged. If you want all other values of X to also be recoded to, say, -1, append -1 as the last argument of the equation:

   Y = RECODE(X, (2.5,5.7]=1, (5.7,10.4]=2, -1)

4.4.4 Column Functions

Column functions operate on an entire column of numbers and return a scalar quantity. The returned value is often used in arithmetic expressions. The following column functions are available. All of them are prefixed by the letters COL. The argument X of all these column functions must be a variable in the worksheet; arithmetic expressions are not permitted. This may require you to create an intermediate column of computed expressions before using a column function. Also note that missing values are ignored in computing these column functions.

COLMEAN(X) Returns the sample mean of X.
COLVAR(X) Returns the sample variance of X.
COLSTD(X) Returns the sample standard deviation of X.
COLSUM(X) Returns the sum of all the numbers in X.
COLMAX(X) Returns the maximum value of X.
COLMIN(X) Returns the minimum value of X.
COLRANGE(X) Returns the value of COLMAX(X)-COLMIN(X).
COLCOUNT(X) Returns the number of elements in X.

You can use the values returned by these column functions in arithmetic expressions and as arguments of other functions. To do this, it is not necessary to know the actual value returned by the column function. However, if you want to know the value returned by any column function, you must define a new variable in the worksheet and fill its entire column with the value of the column function.

4.4.5 Random Numbers

You can fill an entire column of a worksheet with random numbers and constants.
Suppose the cursor is in a cell of a variable named RANDNUM. The expression

   RANDNUM = #RAND

will result in the variable RANDNUM being filled with a column of uniform random numbers in the range (0, 1). Three random number functions or generators are available to you with the editors:

#RAND Generates uniform random numbers in the range (0, 1).
#NORMRAND Generates random numbers from the standard Normal Distribution.
#CHIRAND(X) Generates random numbers from the chi-squared distribution with X degrees of freedom.

You may of course use these three random number generators to generate random numbers from other distributions. For example, the equation

   Y = 3+2*#NORMRAND

will generate random numbers from the normal distribution with mean 3 and standard deviation 2, in variable Y. Again, the equation

   Z = #CHIRAND(5)

will generate random numbers from the chi-squared distribution with 5 degrees of freedom.

4.4.6 Special functions

The following special functions are available for use in arithmetic expressions:

#PI This is the value of π.
#NA This is the missing value code. It can be used to detect if a value is missing, or to force a value to be treated as missing.
#SQNO This is the value of the current sequence number (SQNO) in the current data set.
#SQEND This is the largest value of the sequence number (SQNO) in the current data set.
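The random number expressions of Section 4.4.5 have close analogues in Python's standard library, which can be handy for checking a transformed column outside East. The sketch below is an assumption-laden illustration rather than a description of East's generators: `normrand` and `chirand` are hypothetical helper names, the seed is arbitrary, and Python's `random()` draws from [0, 1) rather than the open interval (0, 1).

```python
import random
import statistics


def normrand(rng):
    """Standard normal draw, analogous in spirit to #NORMRAND."""
    return rng.gauss(0.0, 1.0)


def chirand(rng, df):
    """Chi-squared draw with df degrees of freedom, analogous in spirit to #CHIRAND(df).

    Uses the fact that a chi-squared(df) variable is Gamma(shape=df/2, scale=2).
    """
    return rng.gammavariate(df / 2.0, 2.0)


rng = random.Random(2016)  # arbitrary fixed seed, for reproducibility only
n_rows = 20000

randnum = [rng.random() for _ in range(n_rows)]       # like RANDNUM = #RAND
y = [3 + 2 * normrand(rng) for _ in range(n_rows)]    # like Y = 3+2*#NORMRAND
z = [chirand(rng, 5) for _ in range(n_rows)]          # like Z = #CHIRAND(5)

# The sample moments should be close to the theoretical ones:
# mean 3 and standard deviation 2 for y, mean 5 for z.
print(statistics.mean(y), statistics.stdev(y), statistics.mean(z))
```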
Volume 2 Continuous Endpoints

5 Introduction to Volume 2
6 Tutorial: Normal Endpoint
7 Normal Superiority One-Sample
8 Normal Noninferiority Paired-Sample
9 Normal Equivalence Paired-Sample
10 Normal Superiority Two-Sample
11 Nonparametric Superiority Two Sample
12 Normal Non-inferiority Two-Sample
13 Normal Equivalence Two-Sample
14 Normal: Many Means
15 Multiple Comparison Procedures for Continuous Data
16 Multiple Endpoints-Gatekeeping Procedures
17 Continuous Endpoint: Multi-arm Multi-stage (MaMs) Designs
18 Two-Stage Multi-arm Designs using p-value combination
19 Normal Superiority Regression

5 Introduction to Volume 2

This volume describes the procedures for continuous (normal) endpoints applicable to one-sample, two-sample, many-sample and regression situations. All three types of designs (superiority, non-inferiority and equivalence) are discussed in detail.

Chapter 6 introduces you to East on the Architect platform, using an example clinical trial to test a difference of means.

Chapters 7, 8 and 9 detail the design and interim monitoring in the one-sample situation, where it may be required to compare a new treatment to a well-established control using a single sample. These chapters cover superiority, non-inferiority and equivalence trials respectively.

Chapter 10 details the design and interim monitoring in the superiority two-sample situation, where the superiority of a new treatment over the control treatment is tested by comparing the group-dependent means of the outcome variables.

Chapter 11 details the design for the Wilcoxon-Mann-Whitney nonparametric test, which is a commonly used test for the comparison of two distributions when the observations cannot be assumed to come from normal distributions.
It is used when the distributions differ only in a location parameter and is especially useful when the distributions are not symmetric. For the Wilcoxon-Mann-Whitney test, East supports single-look superiority designs only.

Chapter 12 provides an account of the design and interim monitoring in the non-inferiority two-sample situation, where the goal is to establish that an experimental treatment is no worse than the standard treatment, rather than attempting to establish that it is superior. Non-inferiority trials are designed by specifying a non-inferiority margin. The amount by which the mean response on the experimental arm is worse than the mean response on the control arm must fall within this margin in order for the claim of non-inferiority to be sustained.

Chapter 13 describes the design and interim monitoring in the equivalence two-sample situation, where the goal is to establish neither superiority nor non-inferiority, but equivalence. When the goal is to show that two treatments are similar, it is necessary to develop procedures with the goal of establishing equivalence in mind. In Section 13.1, the problem of establishing equivalence with respect to the difference of the means of two normal distributions using a parallel-group design is presented. The corresponding problem of establishing equivalence with respect to the log ratio of means is presented in Section 13.2. For the crossover design, the problem of establishing equivalence with respect to the difference of the means is presented in Section 13.3 and with respect to the log ratio of means in Section 13.4.

Chapter 16 deals with clinical trials that are designed to assess the benefits of a new treatment compared to a control treatment with respect to multiple clinical endpoints that are divided into hierarchically ordered families.
It discusses two methods: Section 16.2 covers Serial Gatekeeping, whereas Section 16.3 covers Parallel Gatekeeping.

Chapter 14 details the various tests available in East for comparing more than two continuous means. Sections 14.1, 14.2 and 14.3 discuss One Way ANOVA, One Way Repeated Measures ANOVA and Two Way ANOVA respectively.

Chapter 15 details the Multiple Comparison Procedures (MCP) for continuous data. It is often the case that multiple objectives are to be addressed in one single trial. These objectives are formulated into a family of hypotheses. Multiple comparison (MC) procedures provide a guard against inflation of the type I error while testing these multiple hypotheses. East supports several parametric and p-value based MC procedures. This chapter explains how to design a study using a chosen MC procedure that strongly controls the familywise error rate (FWER).

Chapter 19 elaborates on the design and interim monitoring in the superiority regression situation, where linear regression models are used to examine the relationship between a response variable and one or more explanatory variables. This chapter discusses the design and interim monitoring of three types of linear regression models. Section 19.1 examines the problem of testing a single slope in a simple linear regression model involving one continuous covariate. Section 19.2 examines the problem of testing the equality of two slopes in a linear regression model with only one observation per subject. Finally, Section 19.3 examines the problem of testing the equality of two slopes in a linear regression repeated measures model, applied to a longitudinal setting.

5.1 Settings

Click the icon in the Home menu to adjust default values in East 6.

The options provided in the Display Precision tab are used to set the decimal places of numerical quantities. The settings indicated here will be applicable to all tests in East 6 under the Design and Analysis menus.
All these numerical quantities are grouped in different categories depending upon their usage. For example, all the average and expected sample sizes computed at the simulation or design stage are grouped together under the category "Expected Sample Size". To view any of these quantities with greater or lesser precision, select the corresponding category and change the decimal places to any value between 0 and 9.

The General tab allows you to adjust the paths for storing workbooks, files, and temporary files. These paths will persist through the current and future sessions, even after East is closed. This is also where you specify the installation directory of the R software in order to use the R Integration feature in East 6.

The Design Defaults is where the user can change the settings for trial design:

Under the Common tab, default values can be set for input design parameters. You can set up the default choices for the design type, computation type and test type, and the default values for type-I error, power, sample size and allocation ratio. When a new design is invoked, the input window will show these default choices.

Time Limit for Exact Computation
This time limit is applicable only to exact designs and charts. Exact methods are computationally intensive and can easily consume several hours of computation time if the likely sample sizes are very large. You can set the maximum time available for any exact test in minutes. If the time limit is reached, the test is terminated and no exact results are provided. The minimum and default value is 5 minutes.

Type I Error for MCP
If the user has selected a 2-sided test as the default in the global settings, then any MCP will use half of the alpha from the settings as its default, since an MCP is always a 1-sided test.
Sample Size Rounding
Notice that by default, East displays the integer sample size (events) by rounding up the actual number computed by the East algorithm. In this case, the look-by-look sample size is rounded off to the nearest integer. One can also see the original floating point sample size by selecting the option "Do not round sample size/events".

Under the Group Sequential tab, defaults are set for boundary information. When a new design is invoked, input fields will contain these specified defaults. We can also set the option to view the Boundary Crossing Probabilities in the detailed output. It can be either Incremental or Cumulative.

Simulation Defaults is where we can change the settings for simulation:

If the checkbox for "Save summary statistics for every simulation" is checked, then East simulations will by default save the per-simulation summary data for all the simulations in the form of case data.

If the checkbox for "Suppress All Intermediate Output" is checked, the intermediate simulation output window will always be suppressed and you will be directed to the Output Preview area.

The Chart Settings allows defaults to be set for the following quantities on East 6 charts:

We suggest that you do not alter the defaults until you are quite familiar with the software.

6 Tutorial: Normal Endpoint

This tutorial introduces you to East on the Architect platform, using an example clinical trial to test a difference of means.

6.1 Fixed Sample Design

When you open East, by default, the Design tab in the ribbon will be active. The items on this tab are grouped under the following categories of endpoints: Continuous, Discrete, Count, Survival, and General. Click Continuous: Two Samples, and then Parallel Design: Difference of Means. The following input window will appear.
By default, the radio button for Sample Size (n) is selected, indicating that it is the variable to be computed. The default values shown for Type I Error and Power are 0.025 and 0.9. Keep the same for this design. Since the default inputs provide all of the necessary input information, you are ready to compute the sample size by clicking the Compute button. The calculated result will appear in the Output Preview pane, as shown below.

This single row of output contains relevant details of the inputs and the computed result of a total sample size (and total completers) of 467. Select this row, and click the icon to display a summary of the design details in the upper pane (known as the Output Summary).

The discussion so far gives you a quick feel of the software for computing the sample size for a single-look design. We will describe further features in an example for a group sequential design in the next section.

6.2 Group Sequential Design for a Normal Superiority Trial

6.2.1 Study Background

Drug X is a newly developed lipase inhibitor for obesity management that acts by inhibiting the absorption of dietary fats. The performance of this drug needs to be compared with an already marketed drug Y for the same condition. In a randomized, double-blind trial comparing the efficacy and safety of 1 year of treatment with X to Y (each at 120 mg three times a day), obese adults are to be randomized to receive either X or Y combined with dietary intervention for a period of one year. The endpoint is weight loss (in pounds). You are to design a trial having 90% power to detect a mean difference of 9 lbs between X and Y, assuming 15 lbs and 6 lbs weight loss in each treatment arm, respectively, and a common standard deviation of 32 lbs.
The design is required to be a 2-sided test at the 5% significance level.

From the Design menu choose Continuous: Two Samples, and then Parallel Design: Difference of Means. Select 2-Sided for Test Type, and enter 0.05 for Type I Error. Specify the Mean Control to be 6, the Mean Treatment to be 15, and the common Std. Deviation to be 32. Next, change the Number of Looks to 5. You will see a new tab, Boundary, added to the input dialog box.

Click the Boundary tab, and you will see the following screen. On this tab, you can choose whether to specify stopping boundaries for efficacy, or futility, or both. For this trial, choose efficacy boundaries only, and leave all other default values. We will implement the Lan-DeMets (O'Brien-Fleming) spending function, with equally spaced looks.

On the Boundary tab near the Efficacy drop-down box, click on the icons to generate the following charts.

Click Compute. East will show the results in the Output Preview.

The maximum combined sample size required under this design is 544. The expected sample sizes under H0 and H1 are 540 and 403, respectively. Click the icon in the Output Preview toolbar to save this design to Wbk1 in the Library. Double-click on Des1 to generate the following output.

Once you have finished examining the output, close this window, and re-start East before continuing.

6.2.2 Creating multiple designs easily

In East, it is easy to create multiple designs by inputting multiple parameter values. In the trial described above, suppose we want to generate designs for all combinations of the following parameter values: Power = 0.8, 0.9, and Difference in Means = 8.5, 9, 9.5, 10.
The number of such combinations is 2 × 4 = 8. East can create all 8 designs from a single specification in the input dialog box. Enter the following values as shown below. Remember that the common Std. Deviation is 32. From the Input Method, select the Difference of Means option.

The values of Power have been entered as a list of comma-separated values, while Difference in Means has been entered as a colon-separated range of values: 8.5 to 10 in steps of 0.5.

Click Compute. East computes all 8 designs, and displays them in the Output Preview as shown below. Click the icon to maximize the Output Preview.

Select the first 3 rows using the Ctrl key, and click the icon to display a summary of the design details in the upper pane, known as the Output Summary.

Select Des1 in the Output Preview, and click the icon in the toolbar to save this design in the Library. We will use this design for simulation and interim monitoring, as described below. Now that you have saved Des1, delete all designs from the Output Preview before continuing, by selecting all designs with the Shift key, and clicking the icon in the toolbar.

6.2.3 Simulation

Right-click Des1 in the Library, and select Simulate. Alternatively, you can select Des1 and click the icon.

We will carry out a simulation of Des1 to check whether it preserves the specified power. Click Simulate. East will execute by default 10000 simulations with the specified inputs. Close the intermediate window after examining the results. A row labeled Sim1 will be added in the Output Preview.

Click the icon to save this simulation to the Library. A simulation sub-node will be added under the Des1 node.
Double-clicking on the Sim1 node will display the detailed simulation output in the work area.

In 80.23% of the simulated trials, the null hypothesis was rejected. This value is very close to the specified power of 80%. Note that your results may differ from the results displayed here, as your simulations would be run with a different seed.

The next section will explore interim monitoring with this design.

6.2.4 Interim Monitoring

Right-click Des1 in the Library and select Interim Monitoring. Click the icon to open the Test Statistic Calculator. Suppose that after 91 subjects, at the first look, you have observed a mean difference of 8.5, with a standard error of 6.709.

Click OK to update the IM Dashboard. The Stopping Boundaries and Error Spending Function charts on the left:

The Conditional Power and Confidence Intervals charts on the right:

Suppose that after 182 subjects, at the second look, you have observed a mean difference of 16, with a standard error of 4.744. Click Recalc, and then OK to update the IM Dashboard. In this case, a boundary has been crossed, and the following window appears.

Click Stop to complete the trial. The IM Dashboard will be updated accordingly, and a table for Final Inference will be displayed as shown below.

7 Normal Superiority One-Sample

To compare a new process or treatment to a well-established control, a single-sample study may suffice for preliminary information prior to a full-scale investigation.
This single sample may consist either of a random sample of observations from a single treatment, when the mean is to be compared to a specified constant, or of a random sample of paired differences or ratios between two treatments. The former is presented in Section (7.1) and the latter is discussed in Section (7.2) and Section (7.3).

7.1 Single Mean

7.1.1 Trial Design
7.1.2 Simulation
7.1.3 Interim Monitoring
7.1.4 Trial Design Using a t-Test (Single Look)

The problem of comparing the mean of the distribution of observations from a single random sample to a specified constant is considered. For example, when developing a new drug for treatment of a disease, there should be evidence of efficacy. For this single-sample problem, it is desired to compare the unknown mean µ to a fixed value µ0. The null hypothesis H0 : µ = µ0 is tested against the two-sided alternative hypothesis H1 : µ ≠ µ0 or a one-sided alternative hypothesis H1 : µ < µ0 or H1 : µ > µ0. The power of the test is computed at a specified value of µ = µ1 and standard deviation σ.

Let µ̂j denote the estimate of µ based on nj observations, up to and including the j-th look, j = 1, ..., K, with a maximum of K looks. The test statistic at the j-th look is based on the value specified by the null hypothesis, namely

$$Z_j = n_j^{1/2} (\hat{\mu}_j - \mu_0) / \hat{\sigma}_j, \tag{7.1}$$

where $\hat{\sigma}_j^2$ is the sample variance based on nj observations.

7.1.1 Trial Design

Consider the situation where treatment for a certain infectious disorder is expected to result in a decrease in the length of hospital stay. Suppose that hospital records were reviewed and it was determined that, based on this historical data, the average hospital stay is approximately 7 days. It is hoped that the new treatment can decrease this to less than 6 days. It is assumed that the standard deviation is σ = 2.5 days. The null hypothesis H0 : µ = 7 (= µ0) is tested against the alternative hypothesis H1 : µ < 7.
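To make the Wald-type statistic in equation (7.1) concrete, here is a minimal Python sketch that computes it from raw observations at a single look. The function name wald_z and the ten hospital-stay values are hypothetical and chosen only for illustration; the sketch does not reproduce East's internal calculations.

```python
import math
import statistics


def wald_z(observations, mu0):
    """Compute Z = sqrt(n) * (sample mean - mu0) / sample standard deviation.

    Mirrors equation (7.1) for one look; statistics.stdev uses the (n - 1)
    divisor, i.e. the usual sample variance.
    """
    n = len(observations)
    mean_hat = statistics.mean(observations)
    sigma_hat = statistics.stdev(observations)
    return math.sqrt(n) * (mean_hat - mu0) / sigma_hat


if __name__ == "__main__":
    # Hypothetical lengths of hospital stay (days) for ten treated patients.
    stays = [5, 6, 4, 7, 5, 6, 5, 4, 6, 5]
    # Null value mu0 = 7 days, as in the trial design example of this section.
    print("Z =", round(wald_z(stays, mu0=7), 3))
```

A strongly negative Z, as in this made-up sample, is evidence in the direction of the one-sided alternative H1 : µ < 7.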
First, click Continuous: One Sample on the Design tab and then click Single Arm Design: Single Mean. This will launch a new input window.

Single-Look Design

We want to determine the sample size required to have power of 90% when µ = 6 (= µ1), using a test with a one-sided type-1 error rate of 0.05. Choose Test Type as 1-Sided. Specify Mean Response under Null (µ0) as 7, Mean Response under Alt. (µ1) as 6 and Std. Deviation (σ) as 2.5. The upper pane should appear as below:

Click Compute. This will calculate the sample size for this design, and the output is shown as a row in the Output Preview. The computed sample size is 54 subjects.

This design has the default name Des 1. Select this design by clicking anywhere along the row, and click the icon in the Output Preview toolbar. Some of the design details will be displayed in the upper pane, labeled as Output Summary.

Select Des 1 in the Output Preview, and click the icon in the toolbar to save this design to Wbk1 in the Library.

Five-Look Design

To allow the opportunity to stop early and proceed with a full-scale plan, five equally spaced analyses are planned, using the Lan-DeMets (O'Brien-Fleming) stopping boundary. Create a new design by right-clicking Des 1 in the Library, and selecting Edit Design. In the Input, change the Number of Looks from 1 to 5, to generate a study with four interim looks and a final analysis. A new tab for Boundary Info should appear. Click this tab to reveal the stopping boundary parameters. By default, the Spacing of Looks is set to Equal, which means that the interim analyses will be equally spaced in terms of the number of patients accrued between looks. The left side contains details for the Efficacy boundary, and the right side contains details for the Futility boundary.
By default, there is an efficacy boundary (to reject H0) selected, but no futility boundary (to reject H1). The Boundary Family specified is of the Spending Functions type. The default Spending Function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter as OF (O'Brien-Fleming), which generates boundaries that are very similar, though not identical, to the classical stopping boundaries of O'Brien and Fleming (1979). For a detailed description of the different spending functions and stopping boundaries available in East, refer to Chapter 62. The cumulative alpha spent and the boundary values are displayed below.

Click Compute. The maximum and expected sample sizes are highlighted in yellow in the Output Preview. Save this design in the current workbook by selecting the corresponding row in the Output Preview and clicking the icon on the Output Preview toolbar. To compare Des 1 and Des 2, select both rows in the Output Preview using the Ctrl key and click the icon in the Output Preview toolbar. This will display both designs in the Output Summary pane.

Des 2 results in a maximum of 56 subjects in order to attain 90% power, with an expected sample size of 40 under the alternative hypothesis. In order to see the stopping probabilities, double-click Des 2 in the Library.

The clear advantage of this sequential design resides in the relatively high cumulative probability of stopping by the third look if the alternative is true, with a sample size of 34 patients, which is well below the requirements for a fixed sample study (54 patients). Close the Output window before continuing.
Examining stopping boundaries and spending functions

You can plot the boundary values of Des 2 by clicking the icon on the Library toolbar, and then clicking Stopping Boundaries. The following chart will appear:

You can choose different boundary scales from the drop-down box located on the right-hand side. The available boundary scales are Z scale, Score Scale, µ/σ Scale and p-value scale. To plot the error spending function for Des 2, select Des 2 in the Library, click the icon in the toolbar, and then click Error Spending. The following chart will appear:

The above spending function is according to Lan and DeMets (1983) with O'Brien-Fleming flavor and, for one-sided tests, has the following functional form:

$$\alpha(t) = 2 - 2\Phi\left(\frac{Z_{\alpha/2}}{\sqrt{t}}\right)$$

Observe that very little of the total type-1 error is spent early on, but more is spent rapidly as the information fraction increases, and the cumulative error spent reaches 0.05 at an information fraction of 1.

Feel free to try other plots by clicking the icon in the Library toolbar. Close all charts before continuing.

7.1.2 Simulation

Suppose we want to see the advantages of performing the interim analyses, as it relates to the chance of stopping prior to the final analysis. This examination can be conducted using simulation. Select Des 2 in the Library, and click the icon in the toolbar. Alternatively, right-click on Des 2 and select Simulate. A new Simulation window will appear.

For example, suppose you wish to determine how quickly this trial could be terminated if the treatment difference were much greater than expected, say µ = 4.5 under the alternative hypothesis. Click on the Response Generation Info tab, and specify: Mean Response (µ) = 4.5 and Std. Deviation (σ) = 2.5.
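As a side note on the Lan-DeMets (O'Brien-Fleming) spending function given earlier in this section, the following minimal Python sketch evaluates the one-sided form α(t) = 2 − 2Φ(Z_{α/2}/√t) at five equally spaced information fractions. The function name ld_of_alpha_spent is hypothetical, and the sketch only evaluates the formula; it does not derive the stopping boundaries that East computes from it.

```python
import math
from statistics import NormalDist


def ld_of_alpha_spent(t, alpha=0.05):
    """Cumulative type-1 error spent at information fraction t (0 < t <= 1).

    One-sided Lan-DeMets spending function with O'Brien-Fleming flavor:
    alpha(t) = 2 - 2 * Phi(z_{alpha/2} / sqrt(t)).
    """
    if not 0 < t <= 1:
        raise ValueError("information fraction t must lie in (0, 1]")
    std_normal = NormalDist()
    # Upper alpha/2 quantile of the standard normal distribution.
    z_half_alpha = std_normal.inv_cdf(1 - alpha / 2)
    return 2 - 2 * std_normal.cdf(z_half_alpha / math.sqrt(t))


if __name__ == "__main__":
    # Five equally spaced looks, as in Des 2.
    for t in (0.2, 0.4, 0.6, 0.8, 1.0):
        print(f"t = {t:.1f}  cumulative alpha spent = {ld_of_alpha_spent(t):.6f}")
```

The printed values increase slowly at first and then rapidly, ending at the full α = 0.05 when t = 1, which matches the shape described for the Error Spending chart.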
Click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 1.

Select Sim 1 in the Output Preview and click the icon to save it to the Library. Now double-click on Sim 1 in the Library. The simulation output details will be displayed in the upper pane.

Observe that 100% of the simulated trials rejected the null hypothesis, and about 26% of these simulations were able to reject the null at the first look after enrolling only 11 subjects. Your numbers might differ slightly due to a different starting seed.

7.1.3 Interim Monitoring

Suppose that the trial has commenced and Des 2 was implemented. Right-click Des 2 in the Library, and select Interim Monitoring.

Although we specified that there will be five equally spaced looks, the Lan-DeMets methodology implemented in East allows you to alter the number and spacing of these looks. Accordingly, suppose that an interim look was taken after enrolling 20 subjects and the sample mean, based on these 20 subjects, was 5.1 with a standard error of 0.592. Since µ0 = 7, based on equation (7.1) the value of the test statistic at the first look would be Z1 = (5.1 − 7)/0.592, or -3.209.

Click Enter Interim Data on the toolbar. In the Test Statistic Calculator, enter the following values, and click Recalc and then OK.

Since the stopping boundary is crossed, the following dialog box appears.

Click Stop to return to the interim monitoring dashboard. For final inference, East will display the following summary information on the dashboard.

7.1.4 Trial Design Using a t-Test (Single Look)

The sample size obtained to correctly power Des 1 in Section (7.1.1) relied on using a Wald-type statistic for the hypothesis test, given by equation (7.1).
By assuming a normal distribution for the test statistic, we have ignored the fact that the variance σ² is estimated from the sample. For large sample sizes this approximation is acceptable. However, in small samples with unknown standard deviation the test statistic

$$Z = n^{1/2} (\hat{\mu} - \mu_0) / \hat{\sigma}, \tag{7.2}$$

follows Student's t distribution with (n − 1) degrees of freedom. Here, $\hat{\sigma}^2$ denotes the sample variance based on n observations.

Consider the example in Section 7.1.1 where we would like to test the null hypothesis that the average hospital stay is 7 days, H0 : µ = 7 (= µ0), against the alternative hypothesis that it is less than 7 days, H1 : µ < 7. We will now design the same trial in a different manner, using the t distribution for the test statistic.

Right-click Des 1 in the Library, and select Edit Design. In the input window, change the Test Stat. from Z to t. The entries for the other fields need not be changed.

Click Compute. East will add an additional row to the Output Preview labeled Des 3. The required sample size is 55. Select the rows corresponding to Des 1 and Des 3 and click the icon. This will display Des 1 and Des 3 in the Output Summary.

Des 3, which uses the t distribution, requires that we commit a total of 55 patients to the study, just one more than Des 1, which uses the normal distribution. The extra patient is needed to compensate for the extra variability due to estimation of the var[δ̂].

7.2 Mean of Paired Differences

7.2.1 Trial Design
7.2.2 Simulation
7.2.3 Interim Monitoring
7.2.4 Trial Design Using a t-Test (Single Look)

The paired t-test is used to compare the means of two normal distributions when each observation in the random sample from one distribution is matched with a unique observation from the other distribution.
Let µc and µt denote the two means to be compared and let σ² denote the variance of the differences.

The null hypothesis H0 : µc = µt is tested against the two-sided alternative hypothesis H1 : µc ≠ µt or a one-sided alternative hypothesis H1 : µc < µt or H1 : µc > µt. Let δ = µt − µc. The null hypothesis can be expressed as H0 : δ = 0 and the alternative can be expressed as H1 : δ ≠ 0, H1 : δ > 0, or H1 : δ < 0. The power of the test is computed at specified values of µc, µt, and σ. Let µ̂cj and µ̂tj denote the estimates of µc and µt based on nj observations, up to and including the j-th look, j = 1, . . . , K, where a maximum of K looks are to be made. The estimate of the difference at the j-th look is

$$\hat{\delta}_j = \hat{\mu}_{tj} - \hat{\mu}_{cj}$$

and the test statistic at the j-th look is

$$Z_j = n_j^{1/2} \hat{\delta}_j / \hat{\sigma}_j, \tag{7.3}$$

where $\hat{\sigma}_j^2$ is the sample variance of nj paired differences.

7.2.1 Trial Design

Consider the situation where subjects are treated once with placebo after pain is experimentally induced, and later treated with a new analgesic after pain is induced a second time. Pain is reported by the subjects using a 10 cm visual analog scale (0 = "no pain", . . . , 10 = "extreme pain"). After treatment with placebo, the average is expected to be 6 cm. After treatment with the analgesic, the average is expected to be 4 cm. It is assumed that the common standard deviation is σ = 5 cm. The null hypothesis H0 : δ = 0 is tested against the alternative hypothesis H1 : δ < 0.

Start East afresh. First, click Continuous: One Sample on the Design tab, and then click Paired Design: Mean of Paired Differences. This will launch a new input window.

Single-Look Design

We want to determine the sample size required to have power of 90% when µc = 6 and µt = 4, using a test with a one-sided type-1 error rate of 0.05.
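As a rough cross-check on this single-look design, the sample size can be sketched with the textbook normal-approximation formula n = ((z_{1−α} + z_{1−β}) σ / δ)², rounded up to the next integer. This is an assumption about the approximation involved, not a description of East's algorithm, and the function name fixed_sample_size is hypothetical.

```python
import math
from statistics import NormalDist


def fixed_sample_size(delta, sigma, alpha=0.05, power=0.90):
    """One-sided, single-look sample size under the normal approximation.

    delta : difference to detect (for paired data, mu_t - mu_c)
    sigma : standard deviation of the observations (or of the paired differences)
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)  # critical value for the one-sided test
    z_beta = std_normal.inv_cdf(power)       # quantile matching the target power
    n_exact = ((z_alpha + z_beta) * sigma / abs(delta)) ** 2
    return math.ceil(n_exact)                # round up to a whole subject


if __name__ == "__main__":
    # Paired-differences example of this section: mu_c = 6, mu_t = 4, sigma = 5.
    print(fixed_sample_size(delta=4 - 6, sigma=5))  # prints 54
```

With these inputs the unrounded value is about 53.5, so the sketch returns 54 subjects, which agrees with the sample size East reports for this design. The same call with delta = 6 − 7 and sigma = 2.5 gives 54 as well, matching the single-mean example of Section 7.1.1.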
Select Test Type as 1-Sided and Individual Means for Input Method, and specify the Mean Control (µc) as 6 and Mean Treatment (µt) as 4. Enter Std. Dev. of Paired Difference (σ0) as 5. The upper pane should appear as below:

Click Compute. This will calculate the sample size for this design, and the output is shown as a row in the Output Preview. The computed sample size is 54 subjects.

This design has the default name Des 1. Select this design by clicking anywhere along the row in the Output Preview and click the icon. Some of the design details will be displayed in the upper pane, labeled as Output Summary.

Select Des 1 in the Output Preview, and click the icon in the toolbar to save this design to Wbk1 in the Library.

Three-Look Design

For the above study, suppose we wish to take up to two equally spaced interim looks and one final look as we accrue data, using the Lan-DeMets (O'Brien-Fleming) stopping boundary. Create a new design by right-clicking Des 1 in the Library, and selecting Edit Design. In the Input, change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis.

Click Compute. The maximum and expected sample sizes are highlighted in yellow in the Output Preview. Save this design in the current workbook by selecting the corresponding row in the Output Preview and clicking the icon on the Output Preview toolbar. To compare Des 1 and Des 2, select both rows in the Output Preview using the Ctrl key and click the icon. Both designs will be displayed in the Output Summary pane.

Des 2 results in a maximum of 55 subjects in order to attain 90% power, with an expected sample size of 43 under the alternative hypothesis. Select Des 2 in the Output Preview, and click the icon in the toolbar to save this design to Wbk1 in the Library.
In order to see the stopping probabilities, double-click Des 2 in the Library.

The clear advantage of this sequential design resides in the high cumulative probability of stopping by the second look if the alternative is true, with a sample size of 37 patients, which is well below the requirements for a fixed sample study (54 patients). Close the Output window before continuing.

Select Des 2 and click the icon on the Library toolbar. You can select one of many plots, including one for Stopping Boundaries:

Close this chart before continuing.

7.2.2 Simulation

Select Des 2 in the Library, and click the icon in the toolbar. Click on the Response Generation Info tab, and make sure Mean Treatment (µt) = 4, Mean Control (µc) = 6 and Std. Deviation (σ) = 5. Click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 1.

Select Sim 1 in the Output Preview and click the icon to save it to the Library. Now double-click on Sim 1 in the Library. The simulation output details will be displayed.

Overall, close to 90% of simulations have rejected H0. The numbers on your screen might differ slightly due to a different seed.

7.2.3 Interim Monitoring

For an ongoing study we evaluate the test statistic at an interim stage to see whether we have enough evidence to reject H0. Right-click Des 2 in the Library, and select Interim Monitoring.

Although the design specified three equally spaced looks, the Lan-DeMets methodology implemented in East allows you to alter the number and spacing of these looks. Suppose that an interim look was taken after enrolling 18 subjects and the sample mean, based on these subjects, was -2.2 with a standard error of 1.4.
Then, based on equation (7.3), the value of the test statistic at the first look would be Z1 = (−2.2)/1.4, or −1.571. Click Enter Interim Data on the toolbar. In the Test Statistic Calculator, enter the following values, and click Recalc and then OK.

The dashboard will be updated accordingly.

As the observed value −1.571 has not crossed the critical boundary value of −3.233, the trial continues. Now, 18 additional subjects are enrolled, and a second interim analysis with 36 subjects is conducted. Suppose that the observed difference is −2.3 with a standard error of 0.8. Select the Look 2 row and click Enter Interim Data. Enter these values, and click Recalc and then OK.

Since the stopping boundary is crossed, the following dialog box appears.

Click on Stop. For final inference, East will display the following summary information on the dashboard.

7.2.4 Trial Design Using a t-Test (Single Look)

The sample size obtained to correctly power the trial in Section 7.2.1 relied on using a Wald-type statistic for the hypothesis test, given by equation (7.3). However, by assuming that the test statistic follows a standard normal distribution, we neglected the fact that the variance σ² is estimated from the sample. For large sample sizes, asymptotic theory supports this approximation. In a single-look design, this test statistic is calculated as

$Z = n^{1/2}\,\hat{\delta}/\hat{\sigma}$,   (7.4)

where $\hat{\sigma}^2$ is the sample variance based on the n observed paired differences. In the following calculations we take into consideration that Z follows a Student's t-distribution with (n − 1) degrees of freedom.
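The interim statistics entered in Section 7.2.3 are plain Wald ratios (estimate divided by its standard error). As a small sketch, with a hypothetical helper name, the two looks can be recomputed as follows. Only the first-look boundary (−3.233) is quoted in the text, so only that comparison is made here; the second-look boundary is not reproduced.

```python
def wald_z(estimate, std_error):
    """Wald-type test statistic: estimate divided by its standard error."""
    return estimate / std_error


z_look1 = wald_z(-2.2, 1.4)   # first interim look, 18 subjects
z_look2 = wald_z(-2.3, 0.8)   # second interim look, 36 subjects

# Lower efficacy boundary at look 1, as quoted in the text.
boundary_look1 = -3.233
crossed_at_look1 = z_look1 <= boundary_look1

print(round(z_look1, 3), round(z_look2, 3), crossed_at_look1)
```

The first value agrees with the −1.571 quoted above and does not cross the boundary, which is why the trial continues to the second look.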
Consider the example in Section 7.2.1 where we would like to test the null hypothesis that the analgesic does not reduce pain, H0: δ = 0, against the alternative hypothesis that the new analgesic works to reduce pain, H1: δ < 0. We will design this same trial using the t distribution for the test statistic. Right-click Des 1 in the Library, and select Edit Design. Change the Test Stat. from Z to t. The entries for the other fields need not be changed. Click Compute. East will add an additional row to the Output Preview labeled Des 3. Select the rows corresponding to Des 1 and Des 3. This will display Des 1 and Des 3 in the Output Summary.

Using the t distribution, we need one extra subject to compensate for the extra variability due to estimation of var[δ̂].

7.3 Ratio of Paired Means

The test for the ratio of paired means is used to compare the means of two log-normal distributions when each observation in the random sample from one distribution is matched with a unique observation from the other distribution. Let µc and µt denote the two means to be compared, and let σc² and σt² denote the respective variances. The null hypothesis H0: µc/µt = 1 is tested against the two-sided alternative hypothesis H1: µc/µt ≠ 1 or a one-sided alternative hypothesis H1: µc/µt < 1 or H1: µc/µt > 1. Let ρ = µt/µc. Then the null hypothesis can be expressed as H0: ρ = 1 and the alternative can be expressed as H1: ρ ≠ 1, H1: ρ > 1, or H1: ρ < 1. The power of the test is computed at specified values of µc, µt, and σ. We assume that σt/µt = σc/µc, i.e., the coefficient of variation (CV) is the same under both control and treatment.

7.3.1 Trial Design

Start East afresh.
Click Continuous: One Sample on the Design tab, and then click Paired Design: Mean of Paired Ratios as shown below.

This will launch a new window. The upper pane of this window displays several fields with default values. Select Test Type as 1-Sided, and Individual Means for Input Method. Specify the Mean Control (µc) as 4 and the Mean Treatment (µt) as 3.5. Enter the Std. Dev. of Log Ratio as 0.5. The upper pane should appear as below:

Click Compute. This calculates the sample size for the design, and the output is shown as a row in the Output Preview. The computed sample size is 121 subjects (or pairs of observations). This design has the default name Des 1. In the Output Preview toolbar, select Des 1 and click the save icon to save this design to Wbk1 in the Library.

7.3.2 Trial Design Using a t-test

Right-click Des 1 in the Library and select Edit Design. In the input window, change the Test Stat. from Z to t. Click Compute. East will add an additional row to the Output Preview labeled Des 2. Select the rows corresponding to Des 1 and Des 2 using the Ctrl key and click the Output Summary icon. This will display Des 1 and Des 2 in the Output Summary. Des 2 uses the t distribution and requires that we commit a combined total of 122 patients to the study, one more than Des 1, which uses a normal distribution.

8 Normal Noninferiority Paired-Sample

Two common applications of the paired sample design include: (1) comparison of two treatments where patients are matched on demographic and baseline characteristics, and (2) two observations made from the same patient under different experimental conditions. The type of endpoint for a paired noninferiority design could be a difference of means or a ratio of means.
The former is presented in Section 8.1 and the latter is discussed in Section 8.2. For paired sample noninferiority trials, East can be used only when no interim look is planned.

8.1 Mean of Paired Differences

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of an outcome variable, X, with means µt and µc, respectively, and with a standard deviation of paired difference σD. Here, the null hypothesis H0: µt − µc ≤ δ0 is tested against the one-sided alternative hypothesis H1: µt − µc > δ0. Here δ0 denotes the noninferiority margin and δ0 < 0. Let δ = µt − µc. Then the null hypothesis can be expressed as H0: δ ≤ δ0 and the alternative can be expressed as H1: δ > δ0.

Here we assume that each paired observation on X from T and C is distributed according to a bivariate normal distribution with means (µt, µc), variances (σt², σc²) and correlation coefficient ρ. Suppose we have N such paired observations from T and C, and let µ̂c and µ̂t denote the estimates of µc and µt based on these N pairs. The estimate of the difference is then δ̂ = µ̂t − µ̂c. Denoting the standard error of δ̂ by se(δ̂), the test statistic can be defined as

$Z = \dfrac{\hat{\delta} - \delta_0}{\mathrm{se}(\hat{\delta})}$   (8.1)

The test statistic Z is distributed as a t distribution with (N − 1) degrees of freedom. For large samples, the t-distribution can be approximated by the standard normal distribution. The power of the test is computed at specified values of µc, µt, and σD. East allows you to analyze using either the normal or the t distribution.

The advantage of the paired sample noninferiority design compared to the two independent sample noninferiority design lies in the smaller se(δ̂) in the former case. The paired sample design is more powerful than the two independent sample design: to achieve the same level of power, the paired sample design requires fewer subjects.
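The remark about the smaller standard error can be made concrete with the variance of a difference of correlated variables, Var(Xt − Xc) = σt² + σc² − 2ρσtσc. The sketch below compares N pairs with N subjects per arm in an independent two-sample design; the function names and the input values (unit standard deviations, 25 pairs) are hypothetical and chosen only for illustration. The paired standard error is smaller whenever the within-pair correlation ρ is positive.

```python
from math import sqrt


def se_paired(sd_t, sd_c, rho, n_pairs):
    """Standard error of the mean paired difference with correlation rho."""
    var_diff = sd_t ** 2 + sd_c ** 2 - 2 * rho * sd_t * sd_c
    return sqrt(var_diff / n_pairs)


def se_independent(sd_t, sd_c, n_per_arm):
    """Standard error of the difference of two independent sample means."""
    return sqrt((sd_t ** 2 + sd_c ** 2) / n_per_arm)


# Hypothetical inputs: unit standard deviations, 25 pairs (or 25 per arm).
for rho in (0.0, 0.5, 0.8):
    print(rho, round(se_paired(1.0, 1.0, rho, 25), 4),
          round(se_independent(1.0, 1.0, 25), 4))
```

With ρ = 0 the two standard errors coincide; as ρ grows, the paired standard error shrinks while the independent one is unchanged.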
8.1.1 Trial Design

Iezzi et al. (2011) investigated the possibility of reducing radiation dose exposure while maintaining the image quality in a prospective, single center, intra-individual study. In this study, patients underwent two consecutive multidetector computed tomography angiography (MDCTA) scans 6 months apart, one with a standard acquisition protocol (C) and another using a low dose protocol (T). Image quality was rated as an ordinal number using a rating scale ranging from 1 to 5. Let µc and µt denote the average rating of image quality for the standard acquisition and low dose protocols, respectively, and let δ = µt − µc be the difference between the two means. Based on the 30 samples included in the study, µc and µt were estimated as 3.67 and 3.12, respectively. The noninferiority margin considered for image quality was −1. Accordingly, we will design the study to test

H0: δ ≤ −1 against H1: δ > −1

The standard deviation of the paired difference was estimated as 0.683. We want to design a study with 90% power at µc = 3.67 and µt = 3.12 that maintains an overall one-sided type I error of 0.025.

First, click Continuous: One Sample on the Design tab and then click Paired Design: Mean of Paired Differences as shown below.

This will launch a new window. Select Noninferiority for Design Type, and Individual Means for Input Method. Specify the Mean Control (µc) as 3.67, the Mean Treatment (µt) as 3.12, and the Std. Dev. of Paired Difference (σD) as 0.683. Finally, enter −1 for the Noninferiority Margin (δ0). Leave all other entries at their default values. The upper pane should appear as below:

Click Compute.
This calculates the sample size for the design, and the output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (25 subjects) is highlighted. This design has the default name Des 1. You can select this design by clicking anywhere along its row in the Output Preview. Select this design and click the Output Summary icon in the Output Preview toolbar. Some of the design details will be displayed in the upper pane, labeled Output Summary.

A total of 25 subjects must be enrolled in order to achieve the desired 90% power under the alternative hypothesis. In the Output Preview, select Des 1 and click the save icon in the toolbar to save this design to Wbk1 in the Library.

The noninferiority margin of −1 considered above is the minimal margin. Since the observed difference is only a little less than −0.5, we would like to calculate the sample size for a range of noninferiority margins, say, −0.6, −0.7, −0.8, −0.9 and −1. This can be done easily in East. First select Des 1 in the Library, and click the edit-design icon on the Library toolbar. In the Input, change the Noninferiority Margin (δ0) to −0.6 : −1 : −0.1. Click Compute to generate sample sizes for the different noninferiority margins.

This will add 5 new rows to the Output Preview, one row for each of the noninferiority margins. The computed sample sizes are 1961, 218, 79, 41 and 25 for noninferiority margins −0.6, −0.7, −0.8, −0.9 and −1, respectively. To compare all 5 designs, select the last 5 rows in the Output Preview, and click the Output Summary icon. The 5 designs will be displayed in the Output Summary pane.

Suppose we have decided to go with Des 3 to test the noninferiority hypothesis with a noninferiority margin of −0.7.
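The five sample sizes listed above can be cross-checked with the normal-approximation formula n = ((z(1−α) + z(power)) · σD / (δ − δ0))², using the design inputs of this section (α = 0.025 one-sided, 90% power, σD = 0.683, δ = 3.12 − 3.67). This is a minimal sketch with a hypothetical function name, not East's own algorithm.

```python
from math import ceil
from statistics import NormalDist


def ni_paired_sample_size(mean_control, mean_treatment, sd_diff, margin,
                          alpha=0.025, power=0.90):
    """Sample size for a one-sided noninferiority z-test on paired differences."""
    z = NormalDist()
    effect = (mean_treatment - mean_control) - margin  # distance from the margin
    if effect <= 0:
        raise ValueError("the assumed difference must lie above the margin")
    n = ((z.inv_cdf(1 - alpha) + z.inv_cdf(power)) * sd_diff / effect) ** 2
    return ceil(n)


margins = [-0.6, -0.7, -0.8, -0.9, -1.0]
sizes = [ni_paired_sample_size(3.67, 3.12, 0.683, m) for m in margins]
for margin, n in zip(margins, sizes):
    print(margin, n)
```

The sketch returns 1961, 218, 79, 41 and 25, matching the values in the Output Preview. The steep growth near −0.6 reflects the fact that the assumed difference (−0.55) is then only 0.05 away from the margin.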
This requires a total sample size of 218 to achieve 90% power. Select Des 3 in the Output Preview and click the save icon in the toolbar to save this design to Wbk1 in the Library. Before we proceed, we would like to delete all designs from the Output Preview. Select all rows and then either click the delete icon in the toolbar, or right-click and select Delete. To delete designs from the workbook in the Library, select the corresponding designs individually (one at a time), then right-click and select Delete. You can try deleting Des 1 from the Library.

Plotting

With Des 3 selected in the Library, click the plots icon on the Library toolbar, and then click Power vs Sample Size. The resulting power curve for this design will appear.

You can move the vertical bar along the X axis. To find the power at any sample size, move the vertical bar to that sample size; the numerical values of sample size and power will be displayed on the right of the plot. You can export this chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... Close this chart before continuing. In a similar fashion, one can see the power vs delta plot by clicking the plots icon and then Power vs Treatment Effect.

You can obtain the tables associated with these plots by clicking the tables icon, and then clicking the appropriate table. Close the plots before continuing.

8.1.2 Trial Design Using a t-Test (Single Look)

The sample size obtained to correctly power Des 3 relied on using a Wald-type statistic for the hypothesis test. Due to the assumption of a normal distribution for the test statistic, we have ignored the fact that the variance σ² is estimated from the sample. For large sample sizes, this approximation is acceptable.
However, in small samples with unknown standard deviation, the test statistic

$Z = (\hat{\delta} - \delta_0)/\mathrm{se}(\hat{\delta})$

is distributed as Student's t distribution with (n − 1) degrees of freedom, where n is the number of paired observations.

Select Des 3 from the Library, and click the edit-design icon. This will take you to the input window. Now change the Test Statistic from Z to t. The entries for the other fields need not be changed. Click Compute. East will add an additional row to the Output Preview. The required sample size is 220. This design uses the t distribution and requires us to commit a combined total of 220 patients to the study, two more than Des 3, which uses the normal distribution. The extra couple of patients are needed to compensate for the extra variability due to estimation of var[δ̂].

8.1.3 Simulation

Select Des 3 in the Library, and click the simulation icon in the toolbar. Alternatively, right-click Des 3 and select Simulate. A new Simulation window will appear. Click on the Response Generation Info tab, and specify: Mean Control = 3.67, Mean Treatment = 3.12, and Std. Deviation of Paired Difference (σD) = 0.683. Leave all other values at their defaults, and click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 1.

Select Sim 1 in the Output Preview and click the save icon. Double-click Sim 1 in the Library, and the simulation output details will be displayed in the right pane under the Simulation tab.

Notice that the percentage of rejections out of 10000 simulated trials is consistent with the design power of 90%. The exact result of the simulations may differ slightly, depending on the seed.
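The idea behind this kind of simulation can be sketched outside East with a small Monte Carlo loop: draw 218 paired differences per trial, compute the noninferiority statistic, and count rejections. This is an illustrative sketch only; the function name, the seed, the reduced number of simulated trials (2000 rather than 10000), and the choice to use the sample standard deviation with a normal critical value are all assumptions, not a description of East's simulation engine.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev


def simulate_rejection_rate(mean_diff, sd_diff, margin, n_pairs,
                            alpha=0.025, n_sims=2000, seed=12345):
    """Fraction of simulated trials in which H0: delta <= margin is rejected."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(n_sims):
        diffs = [rng.gauss(mean_diff, sd_diff) for _ in range(n_pairs)]
        z = (fmean(diffs) - margin) / (stdev(diffs) / sqrt(n_pairs))
        if z >= z_crit:
            rejections += 1
    return rejections / n_sims


# Under the design alternative: mu_t - mu_c = 3.12 - 3.67, margin -0.7.
power_estimate = simulate_rejection_rate(3.12 - 3.67, 0.683, -0.7, 218)
# On the H0 boundary: mu_t = 2.97 gives mu_t - mu_c = -0.7, equal to the margin.
type1_estimate = simulate_rejection_rate(2.97 - 3.67, 0.683, -0.7, 218)
print(power_estimate, type1_estimate)
```

The first estimate should land near the 90% design power and the second near the one-sided type I error of 2.5%, up to Monte Carlo noise.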
Now we wish to simulate from a point that belongs to H0 to check whether the chosen design maintains the type I error of 2.5%. Right-click Sim 1 in the Library and select Edit Simulation. Go to the Response Generation Info tab in the upper pane and specify: Mean Control = 3.67, Mean Treatment = 2.97, and Std. Deviation of Paired Difference (σD) = 0.683.

Click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 2. Select Sim 2 in the Output Preview and click the save icon. Now double-click Sim 2 in the Library. The simulation output details will be displayed.

The upper efficacy stopping boundary was crossed in a proportion of simulations close to the specified type I error of 2.5%. The exact result of the simulations may differ slightly, depending on the seed.

8.2 Ratio of Paired Means

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of an outcome variable, X, with means µt and µc, respectively, and let σt² and σc² denote the respective variances. The null hypothesis H0: µt/µc ≤ ρ0 is tested against the one-sided alternative hypothesis H1: µt/µc > ρ0. Here, ρ0 denotes the noninferiority margin and ρ0 < 1. Let ρ = µt/µc. Then the null hypothesis can be expressed as H0: ρ ≤ ρ0 and the alternative can be expressed as H1: ρ > ρ0.

Suppose we have N such paired observations from T and C, and let (Xit, Xic) denote the ith pair of observations (i = 1, ..., N). Then log Xit − log Xic = log(Xit/Xic) denotes the logarithm of the ratio for the ith subject. We assume that the paired log-transformed observations on X from T and C, (log Xit, log Xic), are bivariate normally distributed with common parameters. In other words, (Xit, Xic) follows a bivariate log-normal distribution.
Denote log Xit by yit, log Xic by yic, and the corresponding difference by δyi = yit − yic. Let δ̂y denote the sample mean of these paired differences, with estimated standard error se(δ̂y). The test statistic can be defined as

$Z = \dfrac{\hat{\delta}_y - \log \rho_0}{\mathrm{se}(\hat{\delta}_y)}$   (8.2)

The test statistic Z is distributed as a t distribution with (N − 1) degrees of freedom. For large samples, the t-distribution can be approximated by the standard normal distribution. East allows you to analyze using either the normal or the t distribution. The power of the test is computed at specified values of µc, µt, and σ.

8.2.1 Trial Design

We will use the same example cited in the previous section, but will transform the difference hypothesis into the ratio hypothesis. Let µc and µt denote the average rating of image quality for the standard acquisition and low dose protocols, estimated as 3.67 and 3.12, respectively. Let ρ = µt/µc be the ratio between the two means. Considering a noninferiority margin of −0.7 for the test of difference, we can rewrite the hypothesis mentioned in the previous section as

H0: ρ ≤ 0.81 against H1: ρ > 0.81

We are considering a noninferiority margin of 0.81 (= ρ0). For illustration we will assume the standard deviation of the log ratio to be 0.20. As before, we want to design a study with 90% power at µc = 3.67 and µt = 3.12 that maintains an overall one-sided type I error of 0.025.

Start East afresh. Click Continuous: One Sample on the Design tab and then click Paired Design: Mean of Paired Ratios.

This will launch a new window. The upper pane of this window displays several fields with default values. Select Noninferiority for Design Type, and Individual Means for Input Method. Specify the Mean Control (µc) as 3.67, the Mean Treatment (µt) as 3.12, and the Noninferiority margin (ρ0) as 0.81. Enter 0.20 for Std. Dev. of Log Ratio, and 0.025 for Type I Error (α).
The upper pane should now appear as below:

Click Compute. This calculates the sample size for the design, and the output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (180 subjects) is highlighted in yellow.

This design has the default name Des 1. You can select this design by clicking anywhere along its row in the Output Preview. Select this design and click the Output Summary icon in the Output Preview toolbar. Some of the design details will be displayed in the upper pane, labeled Output Summary. A total of 180 subjects must be enrolled in order to achieve the desired 90% power under the alternative hypothesis. In the Output Preview, select Des 1 and click the save icon in the toolbar to save this design to Wbk1 in the Library.

Suppose you think enrolling 180 subjects is too much for your organization and you can go up to only 130 subjects. You want to evaluate the power of your study at a sample size of 130 with the other design parameters unaltered. In order to compute the power with 130 subjects, first select Des 1 in the Library, and click the edit-design icon on the Library toolbar. In the Input dialog box, first select the radio button for Power, and then enter 130 for Sample Size.

Now click Compute. This will add another row labeled Des 2 in the Output Preview, with the computed power highlighted in yellow. The design attains a power of 78.7%. Now select both rows in the Output Preview by pressing the Ctrl key, and click the Output Summary icon in the Output Preview toolbar to see a summary of both designs in the Output Summary.

In the Output Preview, select Des 2 and click the save icon in the toolbar to save this design to Wbk1 in the Library.

Plotting

With Des 2 selected in the Library, click the plots icon on the Library toolbar, and then click Power vs Sample Size.
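The figures in this subsection (ρ0 ≈ 0.81, 180 subjects for 90% power, 78.7% power at 130 subjects) and the general shape of a power-versus-sample-size curve can be approximated on the log scale with the normal approximation. The sketch below is an illustration under that approximation, with hypothetical function names; it is not East's implementation. The last line applies the same formula to the superiority design of Section 7.3.1, where the one-sided α = 0.05 is an assumption chosen because it reproduces the 121 pairs reported there.

```python
from math import ceil, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def log_effect(mean_control, mean_treatment, rho0):
    """Distance between log(mu_t / mu_c) and the log margin log(rho0)."""
    return log(mean_treatment / mean_control) - log(rho0)


def sample_size(effect, sd_log_ratio, alpha, power):
    """Number of pairs for a one-sided z-test on the mean log ratio."""
    z_sum = STD_NORMAL.inv_cdf(1 - alpha) + STD_NORMAL.inv_cdf(power)
    return ceil((z_sum * sd_log_ratio / abs(effect)) ** 2)


def power_at(n_pairs, effect, sd_log_ratio, alpha):
    """Approximate power of the one-sided z-test with n_pairs pairs."""
    shift = abs(effect) * sqrt(n_pairs) / sd_log_ratio
    return STD_NORMAL.cdf(shift - STD_NORMAL.inv_cdf(1 - alpha))


# Difference margin of -0.7 expressed as a ratio margin: (3.67 - 0.7) / 3.67.
ratio_margin = (3.67 - 0.7) / 3.67
effect = log_effect(3.67, 3.12, 0.81)
n_des1 = sample_size(effect, 0.20, alpha=0.025, power=0.90)
power_130 = power_at(130, effect, 0.20, alpha=0.025)
print(round(ratio_margin, 2), n_des1, round(power_130, 3))

# A few points on the power curve.
for n in (100, 130, 160, 180, 200):
    print(n, round(power_at(n, effect, 0.20, alpha=0.025), 3))

# Section 7.3.1 (superiority, rho0 = 1); alpha = 0.05 is an assumption.
n_section_7_3 = sample_size(log_effect(4, 3.5, 1.0), 0.5, alpha=0.05, power=0.90)
print(n_section_7_3)
```

Under these assumptions the sketch gives 180 pairs and a power of about 0.787 at 130 pairs, in line with Des 1 and Des 2.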
The resulting power curve for this design will appear.

You can move the vertical bar along the X axis.

Suppose you would like to explore the relationship between power and standard deviation. In order to visualize this relationship, select Des 2 in the Library, click the plots icon on the Library toolbar, and then click General (User Defined Plot). Select Std Dev of Log Ratio for X-Axis. This will display the power vs. standard deviation plot.

Close the plot window before you continue.

8.2.2 Simulation

Select Des 2 in the Library, and click the simulation icon in the toolbar. Alternatively, right-click Des 2 and select Simulate. A new Simulation window will appear. Click on the Response Generation Info tab, and specify: Mean Control = 3.67, Mean Treatment = 3.12, and Std Dev of Log Ratio = 0.2. Click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 1.

Select Sim 1 in the Output Preview and click the save icon. Now double-click Sim 1 in the Library. The simulation output details will be displayed.

9 Normal Equivalence Paired-Sample

Two common applications of the paired sample design include: (1) comparison of two treatments where patients are matched on demographic and baseline characteristics, and (2) two observations made from the same patient under different experimental conditions. The type of endpoint for a paired equivalence design may be a difference of means or a ratio of means. The former is presented in Section 9.1 and the latter is discussed in Section 9.2.
9.1 Mean of Paired Differences

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of an outcome variable, X, with means µt and µc, respectively, and with a standard deviation of paired difference σD. Here, the null hypothesis H0: µt − µc < δL or µt − µc > δU is tested against the two-sided alternative hypothesis H1: δL ≤ µt − µc ≤ δU. Here, δL and δU denote the equivalence limits. The two one-sided tests (TOST) procedure of Schuirmann (1987) is commonly used for this analysis.

Let δ = µt − µc denote the true difference in the means. The null hypothesis H0: δ ≤ δL or δ ≥ δU is tested against the two-sided alternative hypothesis H1: δL < δ < δU at level α, using the TOST procedure. Here, we perform the following two tests together:

Test1: H0L: δ ≤ δL against H1L: δ > δL at level α
Test2: H0U: δ ≥ δU against H1U: δ < δU at level α

H0 is rejected in favor of H1 at level α if and only if both H0L and H0U are rejected. Note that this is the same as rejecting H0 in favor of H1 at level α if the (1 − 2α) 100% confidence interval for δ is completely contained within the interval (δL, δU).

Here we assume that each paired observation on X from T and C is bivariate normally distributed with parameters µt, µc, σt², σc² and ρ. Suppose we have N such paired observations from T and C, and let µ̂c and µ̂t denote the estimates of µc and µt based on these N pairs. The estimate of the difference is δ̂ = µ̂t − µ̂c. Denoting the standard error of δ̂ by se(δ̂), the test statistics for Test1 and Test2 are defined as:

$T_L = \dfrac{\hat{\delta} - \delta_L}{\mathrm{se}(\hat{\delta})}$ and $T_U = \dfrac{\hat{\delta} - \delta_U}{\mathrm{se}(\hat{\delta})}$

TL and TU are assumed to follow Student's t-distribution with (N − 1) degrees of freedom under H0L and H0U, respectively. H0L is rejected if TL ≥ t1−α,(N−1), and H0U is rejected if TU ≤ tα,(N−1).

The null hypothesis H0 is rejected in favor of H1 if TL ≥ t1−α,(N−1) and TU ≤ tα,(N−1), or in terms of confidence intervals: reject H0 in favor of H1 at level α if

$\delta_L + t_{1-\alpha,(N-1)}\,\mathrm{se}(\hat{\delta}) < \hat{\delta} < \delta_U + t_{\alpha,(N-1)}\,\mathrm{se}(\hat{\delta})$   (9.1)

We see that decision rule (9.1) is the same as rejecting H0 in favor of H1 if the (1 − 2α) 100% confidence interval for δ is entirely contained within the interval (δL, δU).

The power or sample size of such a trial design is determined for a specified value of δ, say δ1, for a single-look study only. The choice δ1 = 0, i.e. µt = µc, is common. For a specified value of δ1, the power is given by

$\Pr(\text{Reject } H_0) = 1 - \tau_\nu(t_{\alpha,\nu} \mid \Omega_1) + \tau_\nu(-t_{\alpha,\nu} \mid \Omega_2)$   (9.2)

where ν = N − 1 and Ω1 and Ω2 are non-centrality parameters given by Ω1 = (δ1 − δL)/se(δ̂) and Ω2 = (δ1 − δU)/se(δ̂), respectively. tα,ν denotes the upper α × 100% percentile of a Student's t distribution with ν degrees of freedom. τν(x|Ω) denotes the distribution function of a non-central t distribution with ν degrees of freedom and non-centrality parameter Ω, evaluated at x.

Since the sample size N is not known ahead of time, we cannot characterize the bivariate t-distribution. Thus, solving for the sample size must be performed iteratively by equating formula (9.2) to the power 1 − β.

The advantage of the paired sample equivalence design compared to the two sample equivalence design lies in the smaller se(δ̂) in the former case. The paired sample equivalence design is more powerful than the two sample equivalence design: to achieve the same level of power, the paired sample equivalence design requires fewer subjects.

9.1.1 Trial Design

To ensure that comparable results can be achieved between two laboratories or methods, it is important to conduct cross-validation or comparability studies to establish statistical equivalence between the two laboratories or methods. Often, to establish equivalence between two laboratories, a paired sample design is employed. Feng et al.
(2006) reported the data on 12 quality control (QC) samples. Each sample was analyzed first by Lab1 and then by Lab2. In this example we will consider Lab1 as the standard laboratory (C) and Lab2 as the one to be validated (T). Denote the mean concentrations from Lab1 and Lab2 by µc and µt, respectively. Considering equivalence limits of (−10, 10), we can state our hypotheses as:

H0: µt − µc < −10 or µt − µc > 10 against H1: −10 ≤ µt − µc ≤ 10

Based on the reported data, µc and µt are estimated as 94.2 pg mL⁻¹ and 89.9 pg mL⁻¹, respectively. The standard deviation of the paired difference was estimated as 8.18. We want to design a study with 90% power at µc = 94.2 and µt = 89.9. We want to reject H0 with type I error not exceeding 0.025.

First, click Continuous: One Sample on the Design tab, and then click Paired Design: Mean of Paired Differences as shown below.

This will launch a new window. Since we are interested in testing an equivalence hypothesis, select Equivalence for Trial Type, with a Type I Error of 0.025 and a Power of 0.9. Select Individual Means for Input Method. Enter −10 for Lower Equivalence Limit (δL) and 10 for Upper Equivalence Limit (δU). Specify the Mean Control (µc) as 94.2, the Mean Treatment (µt) as 89.9, and the Std. Dev. of Paired Difference (σD) as 8.18. The upper pane should appear as below:

Click Compute. This calculates the sample size for the design, and the output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (20 samples) is highlighted in yellow.

This design has the default name Des 1, and you can select it by clicking anywhere along its row in the Output Preview and then clicking the Output Summary icon in the Output Preview toolbar.
Some of the design details will be displayed in the upper pane, labeled Output Summary. A total of 20 samples is required to achieve the desired 90% power under the alternative hypothesis. In the Output Preview, select Des 1 and click the save icon in the toolbar to save this design to Wbk1 in the Library.

The equivalence limits of (−10, 10) might be too narrow, and therefore a wider equivalence interval (−12.5, 12.5) could be considered. Select Des 1 in the Library, and click the edit-design icon on the Library toolbar. In the Design Parameters tab, change the entries for Lower Equivalence Limit (δL) and Upper Equivalence Limit (δU) to −12.5 and 12.5, respectively, and click Compute.

This will add a new row in the Output Preview labeled Des 2. In the Output Preview, select Des 2 and click the save icon in the toolbar to save this design to Wbk1 in the Library. To compare the two designs, select both rows in the Output Preview using the Ctrl key and click the Output Summary icon in the Output Preview toolbar. This will display the two designs side by side in the Output Summary pane.

As we widen the equivalence limits from (−10, 10) to (−12.5, 12.5), the required sample size is reduced from 20 to 11.

Plotting

We would like to explore how power is related to the required sample size. Select Des 2 in the Library, click the plots icon on the Library toolbar, and then click Power vs Sample Size. The resulting power curve for this design will appear.

You can move the vertical bar along the X axis. To find the power at any sample size, simply move the vertical bar to that sample size; the numerical values of sample size and power will be displayed on the right of the plot. You can export this chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... Close this chart before continuing.
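The TOST decision rule (9.1) described at the start of Section 9.1 can be sketched as a small function that takes the estimate, its standard error, the equivalence limits, and the critical value t(1−α, N−1). The function name is hypothetical, and the critical value is supplied by the caller because the Python standard library has no Student's t quantile function. The usage lines are purely illustrative: they plug in the summary values quoted above (difference 89.9 − 94.2, standard deviation 8.18, 12 samples) together with 2.201, the tabulated 0.975 quantile of the t distribution with 11 degrees of freedom, and are not a reanalysis of the original data.

```python
from math import sqrt


def tost_decision(delta_hat, se, lower, upper, t_crit):
    """Two one-sided tests for equivalence on a paired difference.

    t_crit is the critical value t_{1-alpha, N-1}, supplied by the caller.
    Returns (equivalence_declared, confidence_interval), where the interval
    is the (1 - 2*alpha) 100% confidence interval for delta.
    """
    t_lower = (delta_hat - lower) / se   # T_L, tests H0L: delta <= lower
    t_upper = (delta_hat - upper) / se   # T_U, tests H0U: delta >= upper
    reject_lower = t_lower >= t_crit
    reject_upper = t_upper <= -t_crit
    interval = (delta_hat - t_crit * se, delta_hat + t_crit * se)
    return reject_lower and reject_upper, interval


n_samples = 12
se_hat = 8.18 / sqrt(n_samples)
equivalent, interval = tost_decision(89.9 - 94.2, se_hat, -10, 10, t_crit=2.201)
print(equivalent, tuple(round(x, 2) for x in interval))
```

Equivalence is declared exactly when the returned interval lies inside the equivalence limits, which is the confidence-interval form of rule (9.1).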
In a similar fashion, one can see the power vs delta plot by clicking the plots icon and then Power vs Treatment Effect.

To produce tables associated with these plots, first click the tables icon in the toolbar and then select the appropriate table.

9.1.2 Simulation

Now we wish to simulate from Des 2 to verify whether the study truly maintains the power and type I error. Select Des 2 in the Library, and click the simulation icon in the toolbar. Alternatively, right-click Des 2 and select Simulate. Click on the Response Generation Info tab, and specify: Mean Control = 94.2, Mean Treatment = 89.9, and Std. Dev. of Paired Difference (σD) = 8.18.

Click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 1.

Select Sim 1 in the Output Preview and click the save icon. Now double-click Sim 1 in the Library. The simulation output details will be displayed.

Notice that the simulated power is close to the attained power of 92.6% for Des 2. The exact result of the simulations may differ slightly, depending on the seed.

Now we wish to simulate from a point that belongs to H0 to check whether the chosen design maintains the type I error of 5%. For this we consider µc = 94.2 and µt = 81.7. Since in this case δ = 81.7 − 94.2 = −12.5, the point (µt, µc) = (81.7, 94.2) belongs to H0. Right-click Sim 1 in the Library and select Edit Simulation. Go to the Response Generation Info tab in the upper pane and specify: Mean Control = 94.2, Mean Treatment = 81.7, and Std. Dev. of Paired Difference (σD) = 8.18.

Click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 2. Select Sim 2 in the Output Preview and click the save icon.
Now double-click on Sim 2 in the Library. The simulation output details will be displayed in the right pane under the Simulation tab. Notice that the simulated power here is close to the pre-set type I error of 5%. The exact result of the simulations may differ slightly, depending on the seed.

9.2 Ratio of Paired Means

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of an outcome variable, X, with means µt and µc, respectively, and let σt² and σc² denote the respective variances. Here, the null hypothesis H0: µt/µc ≤ ρL or µt/µc ≥ ρU is tested against the alternative hypothesis H1: ρL < µt/µc < ρU. Let ρ = µt/µc denote the ratio of the two means. Then the null hypothesis can be expressed as H0: ρ ≤ ρL or ρ ≥ ρU and the alternative can be expressed as H1: ρL < ρ < ρU. In practice, ρL and ρU are often chosen such that ρL = 1/ρU. The two one-sided tests (TOST) procedure of Schuirmann (1987) is commonly used for this analysis, and is employed in this section for a paired-sample study.

Suppose we have N paired observations from T and C, and let (Xit, Xic) denote the ith pair of observations (i = 1, ..., N). Then log Xit − log Xic = log(Xit/Xic) is the logarithm of the ratio of the paired observations for the ith subject. Here we assume that the paired log-transformed observations on X from T and C, (log Xit, log Xic), are bivariate normally distributed with common parameters. In other words, (Xit, Xic) follows a bivariate log-normal distribution.

Since we have translated the ratio hypothesis into a difference hypothesis using the log transformation, we can perform the test for difference as discussed in section 9.1. Note that we need the standard deviation of the log of ratios.
Sometimes, we are provided with information on the coefficient of variation (CV) of ratios instead, and the standard deviation of log ratios can be obtained using:

$$\mathrm{sd} = \sqrt{\ln\left(1 + \mathrm{CV}^2\right)}.$$

This is a test for the comparison of geometric means of ratio, as we are taking the mean of the logarithms of ratios.

9.2.1 Trial Design

Here we will use the same example reported by Feng et al (2006). Denote the mean concentrations from Lab1 and Lab2 by µc and µt, and let ρ = µt/µc be the ratio between the two means. Considering equivalence limits of (0.85, 1.15), we can state our hypotheses as

H0: µt/µc ≤ 0.85 or µt/µc ≥ 1.15 against H1: 0.85 < µt/µc < 1.15

Based on the reported data, µc and µt are estimated as 94.2 pg mL⁻¹ and 89.9 pg mL⁻¹, respectively. Assume that the standard deviation of the log ratio is estimated as 0.086. As before, we want to design a study with 90% power at µc = 94.2 and µt = 89.9. We want to reject H0 with type I error not exceeding 0.025.

Start East afresh. First, click Continuous: One Sample on the Design tab and then click Paired Design: Mean of Paired Ratios as shown below. This will launch a new window. Select Equivalence for Trial Type, and enter 0.025 for Type I Error, and 0.9 for Power. Then select Individual Means for Input Method, and enter the Mean Control (µc) as 94.2, Mean Treatment (µt) as 89.9, and Std. Dev. of Log Ratio as 0.086. Enter 0.85 for Lower Equiv. Limit (ρL) and 1.15 for Upper Equiv. Limit (ρU). The upper pane should appear as below:

Click Compute. This will calculate the sample size for this design and the output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (8 samples) is highlighted in yellow. This design has default name Des 1.
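The log transformation behind this design, and the CV conversion given at the start of this section, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `cv_to_log_sd` and `ratio_to_log_scale` are hypothetical, and the CV value passed in the last line is an arbitrary illustrative input rather than a figure from Feng et al.

```python
from math import log, sqrt


def cv_to_log_sd(cv):
    """Standard deviation of log ratios implied by a coefficient of variation."""
    return sqrt(log(1.0 + cv ** 2))


def ratio_to_log_scale(mean_control, mean_treatment, rho_lower, rho_upper):
    """Re-express a ratio-of-means equivalence problem as a difference on the log scale."""
    return {
        "log_ratio": log(mean_treatment / mean_control),  # assumed true difference
        "lower_limit": log(rho_lower),                    # log of lower equivalence limit
        "upper_limit": log(rho_upper),                    # log of upper equivalence limit
    }


# Design values from the example in this section.
print(ratio_to_log_scale(94.2, 89.9, 0.85, 1.15))

# Illustrative CV only (assumption); for small CV the log-scale sd is close to the CV.
print(cv_to_log_sd(0.10))
```

Once the limits and the assumed ratio are on the log scale, the problem has the same form as the paired-difference equivalence design of section 9.1.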
Select this design and click the summary icon in the Output Preview toolbar. Some of the design details will be displayed in the upper pane, labeled as Output Summary. In the Output Preview select Des 1 and click the save icon in the toolbar to save this design to Wbk1 in the Library.

Plotting

Suppose you want to see how the standard deviation influences the sample size. In order to visualize this relationship, select Des 1 in the Library, click the plot icon on the Library toolbar, and then click General (User Defined Plot). Select Std Dev of Log Ratio for X-Axis on the right of the plot. This will display the sample size vs. standard deviation plot. Close this plot before continuing.

9.2.2 Simulation

Now we want to check by simulation whether the sample size of 8 provides at least 90% power. Select Des 1 in the Library, and click the simulate icon in the toolbar. Click on the Response Generation Info tab, and specify: Mean control = 94.2, Mean Treatment = 89.9, and Std. Dev. of Log Ratio = 0.086. Click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 1. Notice that the simulated power is very close to the design power.

10 Normal Superiority Two-Sample

To demonstrate the superiority of a new treatment over the control, it is often necessary to randomize subjects to the control and treatment arms, and contrast the group-dependent means of the outcome variables. In this chapter, we show how East supports the design and interim monitoring of such experiments.
10.1 Difference of Means

10.1.1 Trial Design (Weight Control Trial of Orlistat)
10.1.2 IM of the Orlistat trial
10.1.3 t-Test Design

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of a normally distributed outcome variable, X, with means µt and µc, respectively, and with a common variance σ². We intend to monitor the data up to K times after accruing n1, n2, ..., nK ≡ nmax patients. The information fraction at the jth look is given by tj = nj/nmax. Let r denote the fraction randomized to treatment T. Define the treatment difference to be δ = µt − µc. The null hypothesis of interest is H0: δ = 0. We wish to construct a K-look group sequential level-α test of H0 having 1 − β power at the alternative hypothesis H1: δ = δ1. Let X̄t(tj) and X̄c(tj) be the mean responses of the experimental and control groups, respectively, at time tj. Then

$$\hat{\delta}(t_j) = \bar{X}_t(t_j) - \bar{X}_c(t_j) \tag{10.1}$$

and

$$\mathrm{var}[\hat{\delta}(t_j)] = \frac{\sigma^2}{n_j\, r(1-r)} . \tag{10.2}$$

Therefore, by the theorem of Scharfstein, Tsiatis and Robins (1997) and Jennison and Turnbull (1997), the stochastic process

$$W(t_j) = \sqrt{t_j}\; \frac{\bar{X}_t(t_j) - \bar{X}_c(t_j)}{\sqrt{\sigma^2 / [\,n_j\, r(1-r)\,]}}, \qquad j = 1, 2, \ldots, K, \tag{10.3}$$

is N(ηtj, tj) with independent increments, where η = 0 under H0 and η = δ1√Imax under H1. Here Imax = nmax r(1 − r)/σ² is the maximum information, that is, the reciprocal of the variance (10.2) at the final look. We refer to η as the drift parameter.

10.1.1 Trial Design (Weight Control Trial of Orlistat)

Eighteen U.S. research centers participated in this trial, where obese adults were randomized to receive either Orlistat or placebo, combined with a dietary intervention for a period of two years (Davidson et al, 1999). Orlistat is an inhibitor of fat absorption, and the trial was intended to study its effectiveness in promoting weight loss and reducing cardiovascular risk factors. The study began in October 1992.
More than one outcome measure was of interest, but we shall consider only body weight changes between baseline and the end of the first year of intervention. We shall consider a group sequential design even though the original study was not intended as such. The published report does not give details concerning the treatment effect of interest or the desired significance level and power of the test. It does say, however, that 75% of subjects had been randomized to the Orlistat arm, probably to maximize the number of subjects receiving the active treatment.

Single-Look Design

Suppose that the expected mean body weight change after one year of treatment was 9 kg in the Orlistat arm and 6 kg in the control arm. Assume also that the common standard deviation of the observations (weight change) was 8 kg. The standardized difference of interest would therefore be (9 − 6)/8 = 0.375. We shall consider a one-sided test with 5% significance level and 90% power, and an allocation ratio (treatment:control) of 3:1; that is, 75% of the patients are randomized to the Treatment (Orlistat) arm.

First, click Continuous: Two Samples on the Design tab, and then click Parallel Design: Difference of Means. In the upper pane of this window is the Input dialog box, which displays default input values. The effect size can be specified in one of three ways, selected from Input Method: (1) individual means and common standard deviation, (2) difference of means and common standard deviation, or (3) standardized difference of means. We will use the Individual Means method. Enter the appropriate design parameters so that the dialog box appears as shown. Remember to set the Allocation Ratio to 3. Then click Compute. The design is shown as a row in the Output Preview, located in the lower pane of this window. The computed sample size is 325 subjects.
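As a cross-check on this single-look calculation, the standard large-sample formula n = (z_α + z_β)² σ² / [δ² r(1 − r)], which follows from the variance in equation (10.2), can be evaluated directly. The Python sketch below is an independent illustration, not East's internal routine; the function name `fixed_sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def fixed_sample_size(delta, sigma, r, alpha, power):
    """Total sample size for a one-sided, single-look z-test of a difference of means.

    r is the fraction of subjects randomized to the treatment arm.
    Uses the normal approximation n = (z_alpha + z_beta)^2 sigma^2 / (delta^2 r (1 - r)).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha)  # one-sided critical value
    z_beta = z.inv_cdf(power)         # quantile corresponding to the target power
    return (z_alpha + z_beta) ** 2 * sigma ** 2 / (delta ** 2 * r * (1.0 - r))


# Orlistat example: means 9 kg vs 6 kg, common SD 8 kg, 3:1 allocation (r = 0.75).
n = fixed_sample_size(delta=9 - 6, sigma=8, r=0.75, alpha=0.05, power=0.9)
print(f"unrounded n = {n:.2f}, rounded up = {ceil(n)}")
```

Rounding the result up to the next whole subject gives 325, in agreement with the sample size reported in the Output Preview.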
You can select this design by clicking anywhere along the row in the Output Preview. On the Output Preview toolbar, click the summary icon to display a summary of the design details in the upper pane. Then, in the Output Preview toolbar, click the save icon to save this design to Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the plot icon on the Library toolbar, and then click Power vs Treatment Effect (δ). The resulting power curve for this design is shown. You can save this chart to the Library by clicking Save in Workbook. You can also export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.

Three-Look Design

Create a new design by selecting Des1 in the Library and clicking the edit icon on the Library toolbar, or by right-clicking and selecting Edit Design. In the Input, change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab for Boundary Info should appear. Click this tab to reveal the stopping boundary parameters. By default, the Spacing of Looks is set to Equal, which means that the interim analyses will be equally spaced in terms of the number of patients accrued between looks. The left side contains details for the Efficacy boundary, and the right side contains details for the Futility boundary. By default, there is an efficacy boundary (to reject H0) selected, but no futility boundary (to reject H1). The Boundary Family specified is of the Spending Functions type.
The default Spending Function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF (O’Brien-Fleming), which generates boundaries that are very similar, though not identical, to the classical stopping boundaries of O’Brien and Fleming (1979). The cumulative alpha spent, and the boundary values, are displayed in the table below.

Expected sample size and stopping probabilities

Click Compute to generate output for Des2. Select both Des1 and Des2 in the Output Preview and click the summary icon. The maximum and expected sample sizes are highlighted in yellow.

The price to be paid for multiple looks is the commitment of a higher maximum sample size (331 patients) compared to that of a single-look design (325 patients). However, if the alternative hypothesis H1 holds, the study has a chance of stopping at one of the two interim analyses and saving patient accrual: on average, Des2 will stop with 257 patients if the alternative is true. The expected sample size under the null is 329, less than the maximum since there is a small probability of stopping before the last look and, wrongly, rejecting the null.

With Des2 selected in the Output Preview, click the save icon to save Des2 to the Library. In order to see the stopping probabilities, as well as other characteristics, double-click Des2 in the Library. The clear advantage of this sequential design resides in the high probability of stopping by the second look, if the alternative is true, with a sample size of 221 patients, which is well below the requirements for a fixed sample study (325 patients).
Even under the null, however, there is a small chance for the test statistic to cross the boundary for its early rejection (type-1 error probability) at the first or second look. Close the Details window before continuing.

Examining stopping boundaries and spending functions

Plot the boundary values of Des2 by clicking the plot icon on the Library toolbar, and then selecting Stopping Boundaries. The following chart will appear:

The three solid dots correspond to the actual boundary values to be used at the three planned analyses. Although the three looks are assumed to be equally spaced at design time, this assumption need not hold at analysis time. Values of the test statistic (z-test) greater than the upper boundary values would warrant early stopping in favor of H1, that Orlistat is better than placebo. The horizontal axis expresses the total number of patients at each of the three analysis time-points. The study is designed so that the last analysis time point coincides with the maximum sample size required for the chosen design, namely 331 patients. By moving the vertical line cursor from left to right, one can observe the actual values of the stopping boundaries at each interim analysis time-point. The boundaries are rather conservative: for example, you would need the standardized test statistic to exceed 2.139 in order to stop the trial at the second look.

It is sometimes convenient to display the stopping boundaries on the p-value scale. Under Boundary Scale, select the p-value Scale. The chart now displays the cumulative number of patients on the X-axis and the nominal p-value (1-sided) that we would need in order to stop the trial at that interim look. To change the scale of this chart, click Settings...
and in the Chart Settings dialog box, change the Maximum to 0.05, and the Divisions: Major to 0.01, and click OK. The following chart will be displayed.

For example, at the second look, after 221 subjects have been observed, we require a p-value smaller than 0.016 in order to stop the study. Notice that the p-value at the 3rd and final look needs to be smaller than 0.045, rather than the usual 0.05 that one would require for a single-look study. This is the penalty we pay for the privilege of taking three looks at the data instead of one.

You may like to display the boundaries in the delta scale. In this scale, the boundaries are expressed in units of the effect size, or the difference in means. We need to observe a difference in average weight loss of 2.658 kg or more, in order to cross the boundary at the second look.

Close these charts, and click the plot icon and then Error Spending. The following chart will appear.

This spending function was proposed by Lan and DeMets (1983), and for one-sided tests has the following functional form:

$$\alpha(t) = 2 - 2\Phi\left(\frac{Z_{\alpha/2}}{\sqrt{t}}\right). \tag{10.4}$$

Observe that very little of the total type-1 error is spent early on, but more is spent rapidly as the information fraction increases, and reaches 0.05 at an information fraction of 1. A recursive method for generating stopping boundaries from spending functions is described in Appendix G. Close this chart before continuing.

Lan and DeMets (1983) also provided a function for spending the type-1 error more aggressively. This spending function is denoted by PK, signifying that it is the Lan-DeMets spending function for generating stopping boundaries that closely resemble the classical Pocock (1977) stopping boundaries.
It has the functional form:

$$\alpha(t) = \alpha \ln[1 + (e - 1)t] \tag{10.5}$$

Select Des2 in the Library, and click the edit icon on the Library toolbar. On the Boundary Info tab, change the Parameter from OF to PK, and click Compute. With Des3 selected in the Output Preview, click the save icon. In the Library, select both Des2 and Des3, by holding the Ctrl key, and then click the summary icon. The upper pane will display the details of the two designs side-by-side:

In the Output Summary toolbar, click the plot icon to compare the two designs according to Stopping Boundaries. Notice that the stopping boundaries for Des3 (PK) are relatively flat; almost the same critical point is used at all looks to declare significance. Close the chart before continuing. Click the plot icon and select Error Spending. Des3 (PK) spends the type-1 error probability at a much faster rate than Des2 (OF). Close the chart before continuing.

Wang and Tsiatis Power Boundaries

The stopping boundaries generated by the Lan-DeMets OF and PK functions closely resemble the classical O’Brien-Fleming and Pocock stopping boundaries, respectively. These classical boundaries are a special case of a family of power boundaries proposed by Wang and Tsiatis (1987).
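The two Lan-DeMets spending functions of equations (10.4) and (10.5), together with the boundary-scale conversions discussed earlier for Des2, can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, and the code evaluates the published formulas directly rather than reproducing how East computes or displays its boundaries.

```python
from math import e, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def ld_of_spending(t, alpha):
    """Lan-DeMets O'Brien-Fleming-type spending function, equation (10.4); 0 < t <= 1."""
    z_half = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 - 2.0 * STD_NORMAL.cdf(z_half / sqrt(t))


def ld_pk_spending(t, alpha):
    """Lan-DeMets Pocock-type spending function, equation (10.5); 0 <= t <= 1."""
    return alpha * log(1.0 + (e - 1.0) * t)


def boundary_to_p_value(z_boundary):
    """Nominal one-sided p-value corresponding to a z-scale boundary."""
    return 1.0 - STD_NORMAL.cdf(z_boundary)


def boundary_to_delta(z_boundary, sigma, n, r):
    """Delta-scale boundary: z times the standard error from equation (10.2)."""
    return z_boundary * sigma / sqrt(n * r * (1.0 - r))


# Cumulative type-1 error spent at three equally spaced looks, alpha = 0.05.
for t in (1 / 3, 2 / 3, 1.0):
    print(f"t={t:.3f}  OF={ld_of_spending(t, 0.05):.5f}  PK={ld_pk_spending(t, 0.05):.5f}")

# Second-look boundary of Des2 (z = 2.139 at 221 subjects) on the other two scales.
print(f"p-value scale: {boundary_to_p_value(2.139):.3f}")
print(f"delta scale:   {boundary_to_delta(2.139, sigma=8, n=221, r=0.75):.3f} kg")
```

Both spending functions reach the full 0.05 at an information fraction of 1, but the PK function has spent far more of it by the first look, which is why its boundaries are flatter. The last two lines recover the values quoted for the second look of Des2 on the p-value and delta scales.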
For a two-sided level-α test, using K equally spaced looks, the power boundaries for the standardized test statistic Zj at the j-th look are of the form

$$|Z_j| \ge \frac{C(\Delta, \alpha, K)}{(j/K)^{0.5 - \Delta}} \tag{10.6}$$

The normalizing constant C(∆, α, K) is evaluated by recursive integration so as to ensure that the K-look group sequential test has type-1 error equal to α (see Appendix G for details), and ∆ is a parameter characterizing the shape of the stopping boundary. For example, if ∆ = 0.5, the boundaries are constant at each of the K looks. These are the classical Pocock stopping boundaries (Pocock, 1977). If ∆ = 0, the width of the boundaries is inversely proportional to the square root of the information fraction j/K at the j-th look. These are the classical O’Brien-Fleming stopping boundaries (O’Brien and Fleming, 1979). Other choices produce boundaries of different shapes. Notice from equation (10.6) that power boundaries have a specific functional form, and can be evaluated directly, or tabulated, once the normalizing constant C(∆, α, K) has been worked out for various combinations of α and K. In contrast, spending function boundaries are evaluated indirectly by inverting a pre-specified spending function as shown in Appendix F.

Right-click Des3 in the Library and select Edit Design. On the Boundary Info tab, change the Boundary Family from Spending Functions to Wang-Tsiatis. Leave the default value of ∆ as 0, and click Compute. With Des4 selected in the Output Preview, click the save icon. In the Library, select both Des2 and Des4 by holding the Ctrl key. Click the plot icon and select Stopping Boundaries. As expected from our discussion above, the boundary values for Des2 (Lan-DeMets, OF) and for Des4 (Wang-Tsiatis, ∆ = 0) are very similar. Close the chart before continuing.

More charts

Select Des3 in the Library, click the plot icon, and then click Power vs.
Treatment effect (δ). Click the radio button for Standardized under X-Axis Scale. By scrolling from left to right with the vertical line cursor, one can observe the power for various values of the effect size. Close this chart, and with Des3 selected, click the plot icon again. Then click Expected Sample Size. Click the radio button for Standardized under X-Axis Scale. The following chart appears:

By scrolling from left to right with the vertical line cursor we can observe how the expected sample size decreases as the effect size increases. Close this chart before continuing.

Unequally spaced analysis time points

In the above designs, we have assumed that analyses were equally spaced. This assumption can be relaxed if you know when interim analyses are likely to be performed (e.g., for administrative reasons). In either case, departures from this assumption are allowed during the actual interim monitoring of the study, but sample size requirements will be more accurate if allowance is made for this knowledge.

With Des3 selected in the Library, right-click and select Edit Design. Under Spacing of Looks in the Boundary Info tab, click the Unequal radio button. The column titled Info. Fraction can be edited to modify the relative spacing of the analyses. The information fraction refers to the proportion of the maximum (yet unknown) sample size. By default, this table displays equal spacing, but suppose that the two interim analyses will be performed with 0.25 and 0.5 of the maximum sample size. Click Recalc to recompute the cumulative alpha spent and the efficacy boundary values. After entering these new information fraction values, click Compute.
Select Des5 in the Output Preview and click the save icon to save it in the Library for now.

Arbitrary amounts of error probability to be spent at each analysis

Another feature of East is the possibility to specify arbitrary amounts of cumulative error probability to be used at each look. This option can be combined with the option of unequal spacing of the analyses. With Des5 selected in the Library, click the edit icon on the Library toolbar. Under the Boundary Info tab, select Interpolated for the Spending Function. In the column titled Cum. α Spent, enter 0.005 for the first look and 0.03 for the second look, click Recalc, and then Compute.

Select Des6 in the Output Preview and click the save icon. From the Library, select Des5 and Des6 by holding the Ctrl key. Click the plot icon, and select Stopping Boundaries. The following chart will be displayed.

The advantage of Des6 over Des5 is the more conservative boundary (less type-1 error probability spent) at the first look. Close these charts before continuing.

Computing power for a given sample size

East can compute the achieved power, given the other design parameters such as sample size. Select Des6 in the Library, right-click and select Edit Design. On the Design Parameters tab, click the radio button for Power. You will notice that the field for power will contain the word “Computed”. You may now enter a value for the sample size: Enter 250, and click Compute. As expected, the achieved power is less than 0.9, namely 0.781.

To delete this design, click Des7 in the Output Preview, and click the delete icon in the toolbar. East will display a warning to make sure that you want to delete the selected row. Click Yes to continue.
Spending function boundaries for early stopping in favor of H0 or H1

So far we have considered only efficacy boundaries, which allow for early stopping in favor of the alternative. It may be of interest, in addition, to consider futility boundaries, which allow for early stopping when there is lack of evidence against the null hypothesis. Select Des2 in the Library and click the edit icon. On the Boundary Info tab, you can select from one of several types of futility boundaries, such as from a spending function, or by conditional power. Note that some of these options are available for one-sided tests only. Select Spending Functions under Boundary Family. Select PK for the Parameter, and leave all other default settings. See the updated values of the stopping boundaries populated in the table below.

On the Boundary Info tab, you may also like to click the corresponding icons to view plots of the error spending functions, or stopping boundaries, respectively.

Click Compute, and with Des7 selected in the Output Preview, click the save icon. To view the design details, double-click Des7 in the Library. Because not all the type-2 error is spent at the final look, this trial has a chance of ending early if the null hypothesis is true. This is demonstrated by the low expected sample size under the null (209 patients), compared to those of the other designs considered so far. Close the Output window before continuing.

Before continuing to the next section, we will save the current workbook, and open a new workbook. Select Wbk1 in the Library and right-click, then click Save. Next, click the application menu button, click New, and then Workbook. A new workbook, Wbk2, should appear in the Library.
Delete all designs from the Output Preview before continuing.

Creating multiple designs

To create more than one design from the Input, one simply enters multiple values in any of the highlighted input fields. Multiple values can be entered in two ways. First, one can enter a comma-separated list (e.g., “0.8, 0.9”). Second, one can use colon notation (e.g., “7:9:0.5”) to specify a range of values, where “a:b:c” is read as from ‘a’ to ‘b’ in step size ‘c’. Suppose that we wished to explore multiple variations of Des7. With Des7 selected in the Library, right-click and select Edit Design. In the Design Parameters tab of the Input, enter multiple values for the Power (1-β) (0.8, 0.9) and Std. Deviation (σ) (7 : 9 : 0.5) and click Compute:

We have specified 10 designs here, from the combination of 2 distinct values of the power and 5 distinct values of the standard deviation. To view all 10 designs on the screen, click the maximize icon to maximize the Output Preview. The designs within the Output Preview can be sorted in ascending or descending order, according to one of the column variables. For example, if you click once on the column titled Sample Size, the designs will be sorted (from top to bottom) in ascending order of the total sample size.

In addition, you may wish to filter and select designs that meet certain criteria. Click the filter icon on the Output Preview toolbar, and in the filter criterion box, select only those designs for which the maximum sample size is less than or equal to 400, as follows:

From the remaining designs, select Des8 in the Output Preview, and click the save icon. You will be asked to nominate the workbook in which this design should be saved. Select Wbk2 and click OK.
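To see why the inputs above yield 10 designs, the comma and colon notations can be mimicked in a short Python sketch. The helper `expand` is hypothetical and only imitates the notation described in this section; it is not East's input parser, and it assumes a positive step size.

```python
from itertools import product


def expand(spec):
    """Expand 'a, b, c' or 'a:b:c' (from a to b in steps of c) into a list of floats."""
    spec = spec.strip()
    if ":" in spec:
        start, stop, step = (float(part) for part in spec.split(":"))
        # Small tolerance guards against floating-point error in the division.
        count = int((stop - start) / step + 1e-9) + 1
        return [start + i * step for i in range(count)]
    return [float(part) for part in spec.split(",")]


powers = expand("0.8, 0.9")
std_devs = expand("7 : 9 : 0.5")

# One design per combination of the entered values.
designs = list(product(powers, std_devs))

print(std_devs)
print(len(designs), "designs")
for power, sd in designs:
    print(f"power={power}, sd={sd}")
```

The colon range expands to five standard deviations, and crossing them with the two power values gives the 2 × 5 = 10 rows that appear in the Output Preview.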
Accrual and dropout information

More realistic assumptions regarding the patient accrual process (namely, accrual rate, response lag, and probability of dropout) can be incorporated into the design stage. First, the accrual of patients may be estimated to occur at some known rate. Second, because the primary outcome measure is change in body weight from baseline to end of first year, the response lag is known to be 1 year. Finally, due to the long-term nature of the study, it is estimated that a small proportion of patients is likely to drop out over the course of the study.

With Des8 selected in the Library, click the edit icon. Click Include Options in the top right hand corner of the Input, and then click Accrual/Dropout Info. A new tab should appear to the right of Design Parameters and Boundary Info. Click on this Accrual/Dropout tab, and enter the following information as shown below: The accrual rate is 100 patients per year, the response lag is 1 year, and the probability that a patient drops out before completing the study is 0.1. A plot of the predicted accruals and completers over time can be generated by clicking the plot icon.

Click Compute to generate the design. Select Des18 in the Output Preview, and click the save icon. Select Wbk2 and click OK. Double-click Des18 in the Library. The output details reveal that in order to ensure that data can be observed for 153 completers by the second look, one needs to have accrued 255 subjects. Close this Output window before continuing.

Select individual looks

With Des8 selected in Wbk2, click the edit icon. In the look details table of the Boundary Info tab, notice that there are ticked checkboxes under the columns Stop for Efficacy and Stop for Futility.
East gives you the flexibility to remove one of the stopping boundaries at certain looks. For example, untick the checkbox in the first look under the Stop for Futility column, and click Recalc. Click the plot icon to view the new boundaries. Notice that the futility boundary does not begin until the second look.

Simulation of the Orlistat trial

Suppose you now wish to simulate Des4 in Wbk1. Select Des4 in the Library, and click the simulate icon from the Library toolbar. Alternatively, right-click on Des4 and select Simulate. A new Simulation worksheet will appear. Click on the Response Generation Info tab, and input the following values: Mean control = 6; Mean Treatment = 6; (Common) Std. Deviation = 8. In other words, we are simulating from a population in which there is no true difference between the control and treatment means. This simulation will allow us to check the type-1 error rate when using Des4.

Click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim1. With Sim1 selected in the Output Preview, click the save icon, then double-click Sim1 in the Library. The simulation output details will be displayed in the upper pane. In the Overall Simulation Result table, notice that the percentage of times the upper efficacy stopping boundary was crossed is largely consistent with a type-1 error of 5%. The exact values of your simulations may differ, depending on your seed.

Right-click Sim1 in the Library and click Edit Simulation. In the Response Generation Info tab, enter 9 for Mean Treatment. Leave all other values, and click Simulate. With Sim2 selected in the Output Preview, click the save icon, then double-click Sim2 in the Library.
Notice that the percentage of times the efficacy stopping boundary was crossed is largely consistent with 90% power for the original design. Feel free to experiment further with other simulation options before continuing.

10.1.2 Interim monitoring of the Orlistat trial

Suppose we decided to adopt Des2. Select Des2 in the Library, and click the Interim Monitoring icon on the Library toolbar. Alternatively, right-click on Des2 and select Interim Monitoring. The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections. The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right. These charts provide graphical and numerical descriptions of the progress of the clinical trial and are useful tools for decision making by a data monitoring committee.

Making Entries in the Interim Monitoring Dashboard

Although the study has been designed assuming three equally spaced analyses, departures from this strategy are permissible using the spending function methodology of Lan and DeMets (1983) and its extension to boundaries for early stopping in favor of H0 proposed by Pampallona, Tsiatis and Kim (2001). At each interim analysis time point, East will determine the amount of type-1 error probability and type-2 error probability that it is permitted to spend based on the chosen spending functions specified in the design. East will then re-compute the corresponding stopping boundaries. This strategy ensures that the overall type-1 error will not exceed the nominal significance level α. We shall also see how East proceeds so as to control the type-2 error probability.

Open the Test Statistic Calculator by clicking on the Enter Interim Data button.
Assume that we take the first look after 110 patients (Sample Size (Overall)), with an Estimate of δ of 3 and a Standard Error of Estimate of δ of 1.762. Click OK to continue. East will update the charts and tables in the dashboard accordingly. For example, the Stopping Boundaries Chart displays recomputed stopping boundaries and the path traced out by the test statistic. The Error Spending Function Chart displays the cumulative error spent at each interim look. The Conditional Power (CP) Chart shows the probability of crossing the upper stopping boundary, given the most recent information. Finally, the RCI (Repeated Confidence Interval) Chart displays repeated confidence intervals (Jennison & Turnbull, 2000).

Repeat the input procedure from above with the second look after 221 patients (Sample Size (Overall)), an Estimate of δ of 2, and a Standard Error of Estimate of δ of 1. Click Recalc and OK to continue.

For the final look, make sure to tick the box Set Current Look as Last. Input the following estimates: 331 patients (Sample Size (Overall)), an Estimate of δ of 3, and a Standard Error of Estimate of δ of 1. Click Recalc and OK to continue. The upper boundary has been crossed. The dashboard will be updated, and the Final Inference table shows the final outputs. For example, the adjusted p-value is 0.017, consistent with the rejection of the null.

10.1.3 Trial Design Using a t-Test (Single Look)

In Section 10.1.1 the sample size obtained to correctly power the trial relied on an asymptotic approximation for the distribution of a Wald-type statistic. In the single look setting this statistic is

$$Z = \frac{\hat{\delta}}{\sqrt{\operatorname{var}[\hat{\delta}]}}, \tag{10.7}$$

with

$$\operatorname{var}[\hat{\delta}] = \frac{\hat{\sigma}^2}{n r (1 - r)}. \tag{10.8}$$

In a small single-look trial a more accurate representation of the distribution of Z is obtained by using Student's t-distribution with (n − 1) degrees of freedom.

Consider the Orlistat trial described in Section 10.1.1, where we would like to test the null hypothesis that treatment does not lead to weight loss, H0 : δ = 0, against the alternative hypothesis that the treatment does result in a loss of weight, H1 : δ > 0. We will now design this same trial in a different manner, using the t-distribution for the test statistic.

Start East afresh. Click Continuous: Two Samples on the Design tab, and then click Parallel Design: Difference of Means. Enter the following design parameters so that the dialog box appears as shown. Remember to select 1-Sided for Trial Type, and enter an Allocation Ratio of 3. These values are the same as those from Des1, except that under Dist. of Test Stat., select t. Then click Compute.

We observe that the required sample size for this study is 327 patients. Contrast this with the 325 patients obtained using the normal distribution in Section 10.1.1.

10.2 Ratio of Means for Independent Data (Superiority)

Let σt and σc denote the standard deviations of the treatment and control group responses respectively. It is assumed that the coefficient of variation (CV), defined as the ratio of the standard deviation to the mean, is the same for both groups:

$$\frac{\sigma_t}{\mu_t} = \frac{\sigma_c}{\mu_c}.$$

Finally let ρ = µt/µc. For a Superiority trial, the null hypothesis H0 : ρ = ρ0 is tested against the two-sided alternative hypothesis H1 : ρ ≠ ρ0 or a one-sided alternative hypothesis H1 : ρ < ρ0 or H1 : ρ > ρ0.

First, click Continuous: Two Samples on the Design tab, and then click Parallel Design: Ratio of Means.
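As a rough cross-check of what a ratio-of-means design computes, the sketch below sizes a one-sided, equally allocated test on the log scale. Several things here are assumptions rather than statements about East's internals: that the analysis is done on log-transformed data with standard deviation sqrt(ln(1 + CV^2)), that ρ0 = 1, that the total is rounded up, and the function name `ratio_of_means_sample_size` itself. The inputs are illustrative: a ratio of 1.25, CV of 0.25, type-1 error of 0.05 and 90% power.

```python
import math
from statistics import NormalDist


def ratio_of_means_sample_size(ratio, cv, alpha=0.05, power=0.90):
    """Total N for a one-sided, equal-allocation test of H0: rho = 1.

    Assumption (not taken from the manual): the test is carried out on the
    log scale, where the standard deviation is sqrt(ln(1 + CV^2)).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)      # one-sided critical value
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    sigma_log = math.sqrt(math.log(1 + cv ** 2))
    delta_log = math.log(ratio)         # effect size on the log scale
    # Two equal arms: var(delta-hat) = 4 * sigma^2 / N
    n_total = 4 * ((z_alpha + z_beta) * sigma_log / delta_log) ** 2
    return math.ceil(n_total)


n_ratio = ratio_of_means_sample_size(ratio=1.25, cv=0.25)
print(n_ratio)
```

Under these assumptions the sketch returns 42 subjects in total. Other rounding rules or a t-based refinement could shift the answer by a subject or two.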
Suppose that we wish to determine the sample size required for a one-sided test to achieve a type-1 error of 0.05, and power of 90%, to detect a ratio of means of 1.25. We also need to specify the CV = 0.25. Enter the appropriate design parameters so that the input dialog box appears as below, and click Compute. The computed sample size (42 subjects) is highlighted in yellow.

10.3 Difference of Means for Crossover Data (Superiority)

In a crossover trial, each experimental subject receives two or more different treatments. The order in which each subject receives the treatments depends on the particular design chosen for the trial. The simplest design is a 2×2 crossover trial, where each subject receives two treatments, say A and B. Half of the subjects receive A first and then, after a suitably chosen period of time, cross over to B. The other half receive B first and then cross over to A.

The null and alternative hypotheses are the same as for a two-sample test for difference of means for independent data. However, a key advantage of the crossover design is that each subject serves as his/her own control. The test statistic must account not only for treatment effects, but also for period and carryover effects.

We will demonstrate this design for a Superiority trial. First, click Continuous: Two Samples on the Design tab, and then click Crossover Design: Difference of Means. Suppose that we wish to determine the sample size required to achieve a type-1 error of 0.05, and power of 90%, to detect a difference of means of 75 with standard deviation of the difference of 150.
Enter the appropriate design parameters so that the input dialog box appears as below, and click Compute. The computed sample size (45 subjects) is highlighted in yellow.

10.4 Ratio of Means for Crossover Data (Superiority)

We will demonstrate this design for a Superiority trial. The null hypothesis H0 : ρ = ρ0 is tested against the two-sided alternative hypothesis H1 : ρ ≠ ρ0 or a one-sided alternative hypothesis H1 : ρ < ρ0 or H1 : ρ > ρ0.

First, click Continuous: Two Samples on the Design tab, and then click Crossover Design: Ratio of Means. Suppose that we wish to determine the sample size required for a one-sided test to achieve a type-1 error of 0.05, and power of 80%, to detect a ratio of means of 1.25 with square root of MSE of 0.3. Enter the appropriate design parameters so that the input dialog box appears as below, and click Compute. The computed sample size (24 subjects) is highlighted in yellow.

10.5 Assurance (Probability of Success)

Assurance, or probability of success, is a Bayesian version of power, which corresponds to the (unconditional) probability that the trial will yield a statistically significant result. Specifically, it is the prior expectation of the power, averaged over a prior distribution for the unknown treatment effect (see O'Hagan et al., 2005). For a given design, East allows you to specify a prior distribution, for which the assurance or probability of success will be computed.

Select Des2 in the Library, and click the edit icon on the Library toolbar.
Alternatively, recompute this design with the following inputs: a 3-look design with a Lan-DeMets (OF) efficacy-only boundary, Superiority Trial, 1-sided, 0.05 type-1 error, 90% power, allocation ratio = 3, mean control = 6, mean treatment = 9, and standard deviation = 8. Select the Assurance checkbox in the Input window.

Suppose that we wish to specify a Normal prior distribution for the treatment effect δ, with a mean of 3 and a standard deviation of 2. Thus, rather than assuming δ = 3 with certainty, we use this prior distribution to reflect the uncertainty about the true treatment effect. In the Distribution list, click Normal, and in the Input Method list, click E(δ) and SD(δ). Type 3 in the E(δ) box, and type 2 in the SD(δ) box, and then click Compute.

The computed probability of success (0.72) is shown below. Note that for this prior, assurance is less than the specified power (0.9); incorporating the uncertainty about δ has yielded a less optimistic estimate of power.

In the Output Preview, right-click the row corresponding to this design, rename the design ID as Bayes1, and save it to the Library.

Return to the input window. Type 0.001 in the SD(δ) box, and click Compute. Such a prior approximates the non-Bayesian power calculation, where one specifies a fixed treatment effect. As shown below, such a prior yields a probability of success that is similar to the specified power.

East also allows you to specify an arbitrary prior distribution through a CSV file. In the Distribution list, click User Specified, and then click Browse... to select the CSV file where you have constructed a prior.
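The normal-prior calculation can be approximated by hand if the three-look design is replaced by a single-look test with the same type-1 error and power. The following is a minimal sketch under that simplifying assumption, not East's group-sequential computation. For a single look, averaging the power Φ(δ/SE − z_α) over a N(m, τ²) prior gives the closed form Φ((m/SE − z_α) / sqrt(1 + τ²/SE²)). The function name `assurance_normal_prior` is illustrative.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def assurance_normal_prior(prior_mean, prior_sd, design_delta,
                           alpha=0.05, power=0.90):
    """Assurance of a one-sided single-look z-test under a normal prior.

    Assumption: a fixed-sample design stands in for the 3-look design.
    """
    z_alpha = Z.inv_cdf(1 - alpha)
    z_beta = Z.inv_cdf(power)
    # Standard error of delta-hat implied by the design's power at design_delta
    se = design_delta / (z_alpha + z_beta)
    # Prior uncertainty inflates the spread of the standardized effect
    spread = math.sqrt(1 + (prior_sd / se) ** 2)
    return Z.cdf((prior_mean / se - z_alpha) / spread)


assurance_wide = assurance_normal_prior(3, 2, design_delta=3)
assurance_tight = assurance_normal_prior(3, 0.001, design_delta=3)
print(round(assurance_wide, 2), round(assurance_tight, 2))
```

With a prior SD of 2 the approximation gives about 0.72, and with a prior SD of 0.001 it collapses to the design power of about 0.90, in line with the values discussed above.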
The CSV file should contain two columns, where the first column lists the grid points for the parameter of interest (in this case, δ), and the second column lists the prior probability assigned to each grid point. For example, we consider a 5-point prior with probability = 0.2 at each point. The prior probabilities can be entered as weights that do not sum to one, in which case East will re-normalize for you.

Once the CSV filename and path have been specified, click Compute to calculate the assurance, which will be displayed in the box below.

As stated in O'Hagan et al. (2005, p. 190): "Assurance is a key input to decision-making during drug development and provides a reality check on other methods of trial design." Indeed, it is not uncommon for assurance to be much lower than the specified power. The interested reader is encouraged to refer to O'Hagan et al. for further applications and discussions on this important concept.

10.6 Predictive Power and Bayesian Predictive Power

Similar Bayesian ideas can be applied to conditional power for interim monitoring. Rather than calculating conditional power for a single assumed value of the treatment effect, δ, such as at δ̂, we may account for the uncertainty about δ by taking a weighted average of conditional powers, weighted by the posterior distribution for δ. For normal endpoints, East assumes a posterior distribution for δ that results from a diffuse prior distribution, which produces an average power called the predictive power (Lan, Hu, & Proschan, 2009). In addition, if the user specified a normal prior distribution at the design stage to calculate assurance, then East will also calculate the average power, called Bayesian predictive power, for the corresponding posterior.
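The difference between conditional power at the current estimate and predictive power under a diffuse prior can be sketched for a simplified setting: a single remaining final analysis with a fixed critical value, ignoring any interim stopping boundaries. That simplification, the function names, and the illustrative inputs (an interim z-statistic of 1/0.7 at an assumed information fraction of 1/3, critical value z at 0.05) are assumptions for illustration and will not reproduce East's group-sequential values.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def conditional_power(z_t, t, crit):
    """CP with delta fixed at its interim estimate (the current trend)."""
    drift_hat = z_t / math.sqrt(t)      # projected mean of the final Z
    return Z.cdf((drift_hat - crit) / math.sqrt(1 - t))


def predictive_power(z_t, t, crit):
    """CP averaged over the posterior of delta under a flat (diffuse) prior."""
    drift_hat = z_t / math.sqrt(t)
    # Posterior uncertainty about delta widens the predictive spread
    # of the final statistic from (1 - t) to (1 - t) / t.
    return Z.cdf((drift_hat - crit) / math.sqrt((1 - t) / t))


z_interim = 1.0 / 0.7                   # estimate / standard error
crit_final = Z.inv_cdf(0.95)            # one-sided 0.05, unadjusted
cp = conditional_power(z_interim, 1 / 3, crit_final)
pp = predictive_power(z_interim, 1 / 3, crit_final)
print(round(cp, 3), round(pp, 3))
```

Because the predictive version averages over uncertainty in δ, it is pulled toward 0.5 relative to conditional power at the current trend. This is why the two measures generally differ on the dashboard.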
We will demonstrate these calculations for the design renamed as Bayes1 earlier. In the Library, right-click Bayes1 and click Interim Monitoring, then click the Show/Hide Columns icon in the toolbar of the IM Dashboard.

In the Show/Hide Columns window, make sure to show the columns for: CP (Conditional Power), Predictive Power, Bayes Predictive Power, Posterior Distribution of δ Mean, and Posterior Distribution of δ SD, and click OK. These columns will then be displayed in the main grid of the IM Dashboard.

Assume that we observed interim data after 110 patients, with an estimate of δ = 1, and a standard error of the estimate = 0.7. Enter these values in the Test Statistic Calculator by clicking Enter Interim Data, and click OK.

The IM Dashboard will be updated. In particular, notice the differing values for CP and the Bayesian measures of power.

11 Nonparametric Superiority Two Sample

The Wilcoxon-Mann-Whitney nonparametric test is a commonly used test for the comparison of two distributions when the observations cannot be assumed to come from normal distributions. It is used when the distributions differ only in a location parameter and is especially useful when the distributions are not symmetric. For the Wilcoxon-Mann-Whitney test, East supports single-look superiority designs only.

11.1 Wilcoxon-Mann-Whitney Test

Let X1, ..., Xnt be the nt observations from the treatment (T) with distribution function Ft and Y1, ..., Ync be the nc observations from the control (C) with distribution function Fc. Ft and Fc are assumed to be continuous with corresponding densities ft and fc, respectively. The primary objective in the Wilcoxon-Mann-Whitney test is to investigate whether there is a shift of location, which indicates the presence of the treatment effect.
Let θ represent the treatment effect. Then we test the null hypothesis H0 : θ = 0 against the two-sided alternative H1 : θ ≠ 0 or a one-sided alternative hypothesis H1 : θ < 0 or H1 : θ > 0. Let U denote the number of pairs (Xi, Yj) such that Xi < Yj, so

$$U = \sum_{i=1}^{n_t} \sum_{j=1}^{n_c} I(X_i, Y_j),$$

where I(a, b) = 1 if a < b and I(a, b) = 0 if a ≥ b. Then U/(nc nt) is a consistent estimator of

$$p = P(X < Y) = \int_{-\infty}^{\infty} F_t(y)\, f_c(y)\, dy = \int_0^1 F_t[F_c^{-1}(u)]\, du. \tag{11.1}$$

The power is approximated using the asymptotic normality of U and depends on the value of p, and thus depends on Fc and Ft. In order to find the power for a given sample size or to find the sample size for a given power, we must specify p. However, this is often a difficult task. If we are willing to specify Fc and Ft, then p can be computed. East computes p assuming that Fc and Ft are normal distributions with means µc and µt and a common standard deviation σ, by specifying the values of the difference in the means and the standard deviation. With this assumption, equation (11.1) results in

$$p = \Phi\left(\frac{\mu_t - \mu_c}{\sqrt{2}\,\sigma}\right). \tag{11.2}$$

Using the results of Noether (1987), with nt = rN, the total sample size for an α-level two-sided test to have power 1 − β for a specified value of p is approximated by

$$N = \frac{(z_{\alpha/2} + z_{\beta})^2}{12\, r (1 - r)(p - 0.5)^2}.$$

11.2 Example: Designing a single look superiority study

Based on a pilot study of an anti-seizure medication, we want to design a 12-month placebo-controlled study of a treatment for epilepsy in children. The primary efficacy variable is the percent change from baseline in the number of seizures in a 28-day period. The mean percent decrease was 2 for the control and 8 for the new treatment, with an estimated standard deviation of 25. We plan to design the study to test the null hypothesis H0 : θ = 0 against H1 : θ ≠ 0.
We want to design a study that would have 90% power at µc = 2 and µt = 8 under H1 and maintain the type I error at 5%.

11.2.1 Designing the study

Click Continuous: Two Samples on the Design tab and then click Parallel Design: Wilcoxon-Mann-Whitney.

This will launch a new window. The upper pane of this window displays several fields with default values. Select 2-Sided for Test Type and enter 0.05 for Type I Error. Select Individual Means for Input Method and then specify Mean Control (µc) as 2 and Mean Treatment (µt) as 8. Specify Std. Deviation as 25. Click Compute. The upper pane now should appear as below.

The required sample size for this design is shown as a row in the Output Preview, located in the lower pane of this window. The computed total sample size (772 subjects) is highlighted in yellow.

This design has default name Des 1 and results in a total sample size of 772 subjects in order to achieve 90% power. The probability displayed in the row is 0.567, which indicates the approximate probability P[X < Y] assuming X ∼ N(8, 25²) and Y ∼ N(2, 25²). This is in accordance with equation (11.2).

Select this design by clicking anywhere along the row in the Output Preview and click the summary icon in the Output Preview toolbar. Some of the design details will be displayed in the upper pane, labeled as Output Summary.

In the Output Preview toolbar, click the save icon to save this design to Wbk1 in the Library. Double-click Des 1 in the Library to see the details of the design.

According to this summary, the study needs a total of 772 subjects. Of these 772 subjects, 386 will be allocated to the treatment group and the remaining 386 will be allocated to the control group.
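The Des 1 figures can be checked by hand from equation (11.2) and the Noether approximation of Section 11.1. The sketch below is one way to do so. The function names are illustrative, and rounding the total up to the next integer is an assumption about how the final figure is reported.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def wmw_probability(mean_control, mean_treatment, sd):
    """p from equation (11.2): normal responses with a common SD."""
    return Z.cdf((mean_treatment - mean_control) / (math.sqrt(2) * sd))


def noether_total_n(p, alpha=0.05, power=0.90, r=0.5):
    """Noether (1987) total sample size for a two-sided level-alpha test.

    r is the fraction of subjects allocated to the treatment arm.
    """
    z_alpha2 = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    n = (z_alpha2 + z_beta) ** 2 / (12 * r * (1 - r) * (p - 0.5) ** 2)
    return math.ceil(n)                 # assumed rounding rule


p_des1 = wmw_probability(mean_control=2, mean_treatment=8, sd=25)
n_des1 = noether_total_n(p_des1)
print(round(p_des1, 3), n_des1)
```

For the inputs of this example the sketch gives p of about 0.567 and a total of 772 subjects. Because N varies with 1/(p − 0.5)², even small changes in the assumed means or standard deviation move the result substantially.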
Since the sample size is inversely proportional to (p − 0.5)², it is sensitive to mis-specification of p (see equation (11.1)). The results of the pilot study included several subjects who worsened over the baseline, and thus the difference in the means might not be an appropriate approach to determining p. To obtain a more appropriate value of p, we have several alternative approaches. We can further examine the results of the pilot study after exclusion of some of the extreme values, which will decrease the standard deviation and provide a difference in the means that may be a more reasonable measure of the difference between the distributions.

The difference in the medians may be a more reasonable measure of the difference between the distributions, especially when used with a decreased standard deviation. The median percent decrease was 10 for the control and 18 for the new treatment, with an estimated standard deviation of 25.

Create a new design by selecting Des 1 in the Library, and clicking the edit icon on the Library toolbar. In the Input, change the Mean Control (µc) and Mean Treatment (µt) to 10 and 18, respectively. Click Compute to generate output for Des 2. To compare Des 1 and Des 2, select both rows in the Output Preview using the Ctrl key, and click the summary icon in the Output Preview toolbar. Both designs will be displayed in the Output Summary pane.

The sample size required for Des 2 is only 438 subjects, as compared to 772 subjects in Des 1. Now we consider decreasing the standard deviation to 20 to lessen the impact of the extreme values. Select Des 2 in the Output Preview, and click the edit icon in the toolbar. In the Input, change the Std. Deviation to 20. Click Compute to generate output for this design.
Select all the rows in the Output Preview and click the summary icon in the Output Preview toolbar to see them in the Output Summary pane. This design results in a total sample size of 283 subjects in order to attain 90% power.

12 Normal Non-inferiority Two-Sample

In a non-inferiority trial, the goal is to establish that an experimental treatment is no worse than the standard treatment, rather than attempting to establish that it is superior. A therapy that is demonstrated to be non-inferior to the current standard therapy for a particular indication might be an acceptable alternative if, for instance, it is easier to administer, cheaper, or less toxic. Non-inferiority trials are designed by specifying a non-inferiority margin. The amount by which the mean response on the experimental arm is worse than the mean response on the control arm must fall within this margin in order for the claim of non-inferiority to be sustained. In this chapter, we show how East supports the design and interim monitoring of such experiments, with a normal endpoint.

12.1 Difference of Means

12.1.1 Trial design

Consider the design of an antihypertension study comparing an ACE inhibitor to a new AII inhibitor. Let µc be the mean value of a decrease in systolic blood pressure level (in mmHg) for patients in the ACE inhibitor (control) group and µt be the mean value of a decrease in blood pressure level for patients in the AII inhibitor (treatment) group. Let δ = µt − µc be the treatment difference. We want to demonstrate that the AII inhibitor is non-inferior to the ACE inhibitor. For this example, we will consider a non-inferiority margin equal to one-third of the mean response in the control group. From historical data, µc = 9 mmHg and therefore the non-inferiority margin is 3 mmHg.
Accordingly we will design the study to test the null hypothesis of inferiority H0 : δ ≤ −3, against the one-sided non-inferiority alternative H1 : δ > −3. The test is to be conducted at a significance level (α) of 0.025 and is required to have 90% power at δ = 0. We assume that σ², the variance of the patient response, is the same for both groups and is equal to 100.

Start East afresh. Click Continuous: Two Samples on the Design tab and then click Parallel Design: Difference of Means.

Single-look design

In the input window, select Noninferiority for Design Type. The effect size can be specified in one of three ways by selecting different options for Input Method: (1) individual means and common standard deviation, (2) difference of means and common standard deviation, or (3) standardized difference of means. We will use the Individual Means method. Select Individual Means for Input Method, specify the Mean Control (µc) as 9 and the Noninferiority margin (δ0) as −3, and specify the Std. Deviation (σ) as 10. Specify 0 for Difference in Means (δ1). The upper pane should appear as below.

Click Compute. This will calculate the sample size for this design, and the output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (467 subjects) is highlighted in yellow.

This design has default name Des 1. Select this design by clicking anywhere along the row in the Output Preview and click the summary icon. In the Output Preview toolbar, click the save icon to save this design to Wbk1 in the Library. If you hover the cursor over Des 1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des 1 selected in the Library, click the plots icon on the Library toolbar, and then click Power vs Treatment Effect (δ).
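The single-look sample size and the shape of the power-versus-δ curve can be approximated with the usual normal-theory formulas. This is a minimal sketch assuming equal allocation, a known common standard deviation, and rounding up. The function names `ni_total_n` and `ni_power` are illustrative and not part of East.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def ni_total_n(delta1, margin, sd, alpha=0.025, power=0.90):
    """Total N (two equal arms) to reject delta <= margin when delta = delta1."""
    z_sum = Z.inv_cdf(1 - alpha) + Z.inv_cdf(power)
    return math.ceil(4 * (z_sum * sd / (delta1 - margin)) ** 2)


def ni_power(delta, margin, sd, n_total, alpha=0.025):
    """Power of the one-sided non-inferiority z-test at a true difference delta."""
    se = sd * math.sqrt(4 / n_total)    # SE of delta-hat with n_total / 2 per arm
    return Z.cdf((delta - margin) / se - Z.inv_cdf(1 - alpha))


n_ni = ni_total_n(delta1=0.0, margin=-3.0, sd=10.0)
print(n_ni)
for delta in (-3.0, -2.0, -1.0, 0.0, 1.0):
    print(delta, round(ni_power(delta, -3.0, 10.0, n_ni), 3))
```

Under these assumptions the sketch gives 467 subjects. The power equals the significance level of 0.025 at the margin δ = −3 and rises to about 0.90 at δ = 0.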
The resulting power curve for this design will appear. You can save this chart to the Library by clicking Save in Workbook. In addition, you can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.

12.1.2 Three-Look Design

Create a new design by selecting Des 1 in the Library, and clicking the edit icon on the Library toolbar. In the Input, change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab for Boundary Info should appear. Click this tab to reveal the stopping boundary parameters. By default, the Spacing of Looks is set to Equal, which means that the interim analyses will be equally spaced in terms of the number of patients accrued between looks. The left side contains details for the Efficacy boundary, and the right side contains details for the Futility boundary. By default, there is an efficacy boundary (to reject H0) selected, but no futility boundary (to reject H1). The Boundary Family specified is of the Spending Functions type. The default Spending Function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF (O'Brien-Fleming), which generates boundaries that are very similar, though not identical, to the classical stopping boundaries of O'Brien and Fleming (1979).

Click Compute to generate output for Des 2. Save this design in the current workbook by selecting the corresponding row in the Output Preview and clicking the save icon. To compare Des 1 and Des 2, select both rows in the Output Preview using the Ctrl key and click the summary icon. Both designs will be displayed in the Output Summary.

The maximum sample size with Des 2 is 473, which is only a slight increase over the fixed sample size in Des 1. However, the expected sample size with Des 2 is 379 patients under H1, a saving of almost 100 patients. In order to see the stopping probabilities, double-click Des 2 in the Library.

The clear advantage of this sequential design resides in the high probability of stopping by the second look, if the alternative is true, with a sample size of 315 patients, which is well below the requirements for a fixed sample study (467 patients). Close the Output window before continuing.

Examining stopping boundaries and spending functions

You can plot the boundary values of Des 2 by clicking the plots icon on the Library toolbar, and then clicking Stopping Boundaries. The following chart will appear.

You can choose a different Boundary Scale from the corresponding drop-down box. The available boundary scales include: Z scale, Score scale, δ scale, δ/σ scale and p-value scale. To plot the error spending function for Des 2, select Des 2 in the Library, click the plots icon in the toolbar, and then click Error Spending. The following chart will appear.

The above spending function is according to Lan and DeMets (1983) with O'Brien-Fleming flavor, and for one-sided tests has the following functional form:

$$\alpha(t) = 2 - 2\Phi\left(\frac{z_{\alpha/2}}{\sqrt{t}}\right)$$

Observe that very little of the total type-1 error is spent early on, but more is spent rapidly as the information fraction increases, and the cumulative error reaches 0.025 at an information fraction of 1.

Feel free to explore other plots by clicking the plots icon in the Library toolbar. Close all charts before continuing. To obtain the tables used to generate these plots, click the tables icon.

Select Des 2 in the Library, and click the edit icon on the Library toolbar.
In the Boundary Info tab, change the Boundary Family from Spending Functions to Wang-Tsiatis. The Wang-Tsiatis (1989) power boundaries are of the form

$$c(t_j) = C(\Delta, \alpha, K)\, t_j^{\Delta}, \qquad j = 1, 2, \ldots, K,$$

where ∆ is a shape parameter that characterizes the boundary shape and C(∆, α, K) is a positive constant. The choice ∆ = 0 will yield the classic O'Brien-Fleming stopping boundary, whereas ∆ = 0.5 will yield the classic Pocock stopping boundary. Other choices of the parameter in the range −0.5 to 0.5 are also permitted. Accept the default parameter 0 and click Compute to obtain the sample size.

A new row will be added to the Output Preview with design name Des 3. Select all three rows in the Output Preview using the Ctrl key and click the summary icon. All three designs will be displayed in the Output Summary.

Note that the total sample size and the expected sample size under H1 for Des 3 are close to those for Des 2. This is expected because the Wang-Tsiatis power family with shape parameter 0 yields the classic O'Brien-Fleming stopping boundaries. Save this design in the current workbook by selecting the corresponding row in the Output Preview and clicking the save icon on the Output Preview toolbar.

Select Des 2 in the Library, and click the edit icon on the Library toolbar. In the Boundary Info tab, change the Spending Function from Lan-DeMets to Rho Family. The Rho spending function was first published by Kim and DeMets (1987) and was generalized by Jennison and Turnbull (2000). It has the following functional form:

$$\alpha(t) = \alpha\, t^{\rho}, \qquad \rho > 0.$$

When ρ = 1, the corresponding stopping boundaries resemble the Pocock stopping boundaries. When ρ = 3, the boundaries resemble the O'Brien-Fleming boundaries. Larger values of ρ yield increasingly conservative boundaries.
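To see how differently the two spending functions in this section allocate the type-1 error, the sketch below evaluates the Lan-DeMets O'Brien-Fleming form α(t) = 2 − 2Φ(z_{α/2}/√t) and the Rho family α(t) = αt^ρ at three equally spaced looks with α = 0.025. It only computes cumulative error spent. Deriving the stopping boundaries themselves requires recursive numerical integration, which is not attempted here. The function names are illustrative.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def ld_obrien_fleming(t, alpha):
    """Lan-DeMets spending function with O'Brien-Fleming flavor."""
    z_half = Z.inv_cdf(1 - alpha / 2)
    return 2 - 2 * Z.cdf(z_half / math.sqrt(t))


def rho_family(t, alpha, rho):
    """Rho-family spending function alpha * t**rho, with rho > 0."""
    return alpha * t ** rho


alpha = 0.025
looks = [1 / 3, 2 / 3, 1.0]             # information fractions
for t in looks:
    print(round(t, 3),
          round(ld_obrien_fleming(t, alpha), 5),
          round(rho_family(t, alpha, rho=2), 5))
```

Both functions spend the full 0.025 at t = 1, but the Rho family with ρ = 2 releases far more of it at the first look than the O'Brien-Fleming-type function does. This makes early stopping more likely, at the cost of a larger maximum sample size.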
Specify the parameter (ρ) as 2, and click Compute. A new row will be added to the Output Preview with design name Des 4. Select all four rows in the Output Preview using the Ctrl key and click the summary icon. All the designs will be displayed in the Output Summary.

Observe that Des 4 requires 14 more subjects in total than Des 2. The expected sample size under H1 for Des 4 is only 351 patients, compared to 379 patients for Des 2 and 467 patients for Des 1. Save Des 4 to the Library by selecting the corresponding row in the Output Preview and clicking the save icon.

12.1.3 Simulation

Select Des 4 in the Library, and click the Simulate icon in the toolbar. Alternatively, right-click on Des 4 and select Simulate. A new window for simulation will appear. Click on the Response Generation Info tab, and specify: Mean Control = 9; Mean Treatment = 9; SD Control = 10.

Click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 1.

Select Sim 1 in the Output Preview and click the save icon. Double-click Sim 1 in the Library. The simulation output details will be displayed.

The upper efficacy stopping boundary was crossed in around 90% of the 10,000 simulated trials, which is consistent with the power of 90%. The exact result of the simulations may differ slightly, depending on the seed.

12.1.4 Interim Monitoring

Select Des 4 in the Library, and click the Interim Monitoring icon on the Library toolbar. Alternatively, right-click on Des 4 and select Interim Monitoring. The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections.
The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right. These charts provide graphical and numerical descriptions of the progress of the clinical trial and are useful tools for decision making by a data monitoring committee.

Although the study has been designed assuming three equally spaced analyses, departures from this strategy are permissible using the spending function methodology of Lan and DeMets (1983) and its extension to boundaries for early stopping in favor of H0 proposed by Pampallona, Tsiatis and Kim (2001). At each interim analysis time point, East will determine the amount of type-1 error probability and type-2 error probability that it is permitted to spend based on the chosen spending functions specified in the design. East will then re-compute the corresponding stopping boundaries. This strategy ensures that the overall type-1 error does not exceed the nominal significance level α.

Let us take the first look after accruing 200 subjects. The test statistic at look j for testing non-inferiority is given by

$$Z_j = \frac{\hat{\delta}_j - \delta_0}{SE(\hat{\delta}_j)},$$

where δ̂j and δ0 denote the estimated treatment difference and the non-inferiority margin, respectively, and SE denotes the standard error. Suppose we have observed δ̂1 = 2.3033 and SE(δ̂1) = 2.12132. With δ0 = −3, the value of the test statistic at the first look would be Z1 = (2.3033 + 3)/2.12132, or approximately 2.5.

To pass these values to East, click Enter Interim Data to open the Test Statistic Calculator. Enter the following values: 200 for Cumulative Sample Size, 2.3033 as Estimate of δ and 2.12132 as Standard Error of Estimate of δ. Click Recalc, and then OK.

The value of the test statistic is 2.498, which is close to, but below, the stopping boundary of 2.634.
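The hand calculation of the first-look statistic can be sketched in a few lines of Python. The function name `noninferiority_z` is an assumption for illustration; the inputs and the boundary value 2.634 are taken from the text. The sketch reproduces only the arithmetic of Zj and does not recompute East's boundaries.

```python
def noninferiority_z(delta_hat, delta0, se):
    """Wald statistic Z_j = (delta_hat_j - delta0) / SE(delta_hat_j)."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return (delta_hat - delta0) / se


# Interim values quoted in the text for the first look (200 subjects).
z1 = noninferiority_z(delta_hat=2.3033, delta0=-3.0, se=2.12132)
boundary = 2.634  # first-look efficacy boundary as reported in the text

print(f"Z1 = {z1:.3f}; boundary crossed: {z1 >= boundary}")
```

Because the statistic stays below the boundary, the trial continues to the next look.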
The lower bound of the 97.5% repeated confidence interval (RCI) for δ is -3.29.

Click the icon in the Conditional Power chart located in the lower part of the dashboard. The conditional power at the current effect size 2.303 is over 99.3%.

Suppose we take the next interim look after accruing 350 subjects. Enter 350 for Cumulative Sample Size, 2.3033 for Estimate of δ and 1.71047 for Standard Error of Estimate of δ. Click Recalc and OK to update the charts and tables in the dashboard.

Now the stopping boundary is crossed and the following dialog box appears.

Click Stop. The dashboard will now include the following table.

The adjusted confidence interval and p-value are calculated according to the approach proposed by Tsiatis, Rosner and Mehta (1984) and its later extension by Kim and DeMets (1987). The basic idea here is to search for the confidence bounds such that the p-value under the alternative hypothesis just becomes statistically significant.

12.1.5 Trial Design Using a t-Test (Single Look)

In Section 12.1 the sample size is obtained based on an asymptotic approximation of the distribution of the test statistic

$$\frac{\hat{\delta} - \delta_0}{\sqrt{\operatorname{var}[\hat{\delta}]}}.$$

If the study under consideration is small, the above asymptotic approximation of the distribution may be poor. Using the Student’s t-distribution with (n − 1) degrees of freedom, we may better size the trial to have appropriate power to reject H0. In East, this can be done by specifying the distribution of the test statistic as t. We shall illustrate this by designing the study described in Section 12.1 that aims to demonstrate that the AII inhibitor is non-inferior to the ACE inhibitor.
Select Des 1 from the Library. Click the corresponding icon in the toolbar. Change the Test Statistic from Z to t. The entries for the other fields need not be changed.

Click Compute. East will add an additional row to the Output Preview labeled as Des 5. The required sample size is 469. Select the rows corresponding to Des 1 and Des 5 and click the corresponding toolbar icon. This will display both designs in the Output Summary.

Des 5, which used the t distribution, requires us to commit a combined total of 469 patients to the study, up from 467 in Des 1, which used the normal distribution. The extra patients are needed to compensate for the extra variability due to estimation of var[δ̂].

12.2 Ratio of Means

Let µt and µc denote the means of the observations from the experimental treatment (T) and the control treatment (C), respectively, and let σt² and σc² denote the corresponding variances of the observations. It is assumed that σt/µt = σc/µc, i.e. the coefficient of variation CV = σ/µ is the same for T and C. Finally, let ρ = µt/µc.

For a non-inferiority trial with ratio of means we define the null hypothesis as

H0 : ρ ≤ ρ0 if ρ0 < 1
H0 : ρ ≥ ρ0 if ρ0 > 1

where ρ0 denotes the noninferiority margin. Consider the case when ρ0 < 1. Now define δ = ln(ρ) = ln(µt) − ln(µc), so the null hypothesis becomes

H0 : δ ≤ δ0

where δ0 = ln(ρ0). Since we can translate the ratio hypothesis into a difference hypothesis, we can perform the test for difference as discussed in Section 12.1 on log-transformed data.

Here, we need the standard deviation of log transformed data.
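The translation to the log scale can be sketched as follows. The helper names `log_scale_margin` and `log_sd_from_cv` are assumptions for illustration, and the inputs (margin 0.8, ratio 0.95, CV 0.75) are those of the example later in this section. The standard deviation is derived from the coefficient of variation through the log-normal relation sd = sqrt(ln(1 + CV²)).

```python
import math


def log_scale_margin(rho0):
    """Translate a ratio margin rho0 into the log-scale margin delta0 = ln(rho0)."""
    if rho0 <= 0:
        raise ValueError("rho0 must be positive")
    return math.log(rho0)


def log_sd_from_cv(cv):
    """Standard deviation of log-transformed data: sqrt(ln(1 + CV^2))."""
    if cv < 0:
        raise ValueError("CV must be non-negative")
    return math.sqrt(math.log(1.0 + cv ** 2))


# Illustrative inputs (assumed here): margin 0.8, ratio 0.95, CV 0.75.
delta0 = log_scale_margin(0.8)
delta1 = math.log(0.95)
sd_log = log_sd_from_cv(0.75)

print(f"delta0 = {delta0:.4f}, delta1 = {delta1:.4f}, sd(log) = {sd_log:.4f}")
```

With these inputs the margin and the alternative are both negative on the log scale, and the distance between them, delta1 − delta0, is the effect that the difference-scale test has to detect.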
If we are provided with the coefficient of variation (CV) instead, the standard deviation of log transformed data can be obtained using the relation

$$sd = \sqrt{\ln(1 + CV^2)}.$$

12.2.1 Trial design

For illustration, we consider the example cited by Laster and Johnson (2003): A randomized clinical study of a new anti-hypertensive therapy known to produce fewer side-effects than a standard therapy but expected to be almost 95% effective (ρ1 = 0.95). To accept the new therapy, clinicians want a high degree of assurance that it is at least 80% as effective in lowering blood pressure as the standard agent. Accordingly we plan to design the study to test:

H0 : µt/µc ≤ 0.8 against H1 : µt/µc > 0.8

Reductions in seated diastolic blood pressure are expected to average 10 mmHg (= µc) with standard therapy, with a standard deviation of 7.5 mmHg (= σc). Therefore, the CV for the standard therapy is 7.5/10 = 0.75. We also assume that the CV is the same for both therapies. We need to design a study that has 90% power at ρ1 = 0.95 under H1 and maintains a one-sided type I error of 5%.

12.2.2 Designing the study

Start East afresh. Click Continuous: Two Samples, under the Design tab, and then click Parallel Design: Ratio of Means.

In the input window, select Noninferiority for Design Type. Select Individual Means for Input Method and then specify the Mean Control (µc) as 10, Noninferiority Margin (ρ0) as 0.8 and Ratio of Means (ρ1) as 0.95. Specify 0.75 for Coeff. Var. The upper pane should appear as below:

Click Compute. This will calculate the sample size for this design, and the output is shown as a row in the Output Preview located in the lower pane of this window. The computed total sample size (636 subjects) is highlighted in yellow.

This design has default name Des 1. Select this design by clicking anywhere along the row in the Output Preview and click the corresponding toolbar icon.
Some of the design details will be displayed in the Output Summary.

In the Output Preview toolbar, click the corresponding icon to save this design to Wbk1 in the Library. Double-click on Des 1 in the Library to see the details of the design.

Unequal allocation ratio

Since the profile of the standard therapy is well established and comparatively little is known about the new therapy, you may want to put more subjects on the new therapy. You can do this by specifying an allocation ratio greater than 1. Suppose you want 50% more subjects on the new therapy than on the standard one. Then you need to specify the allocation ratio (nt/nc) as 1.5.

Create a new design by selecting Des 1 in the Output Preview, and clicking the corresponding icon on the Output toolbar. In the Input, change the Allocation Ratio from 1 to 1.5. Click Compute to obtain the sample size for this design. A new row will be added labeled as Des 2.

Save this design in the current workbook by selecting the corresponding row in Output Preview and clicking the corresponding icon on the Output Preview toolbar. Select both rows in Output Preview using the Ctrl key and click the corresponding toolbar icon.

t distribution test statistic

Create a new design by selecting Des 2 in the Output, and clicking the corresponding icon on the Output toolbar. In the Input, change the Test Statistic from Z to t. Click Compute to obtain the sample size for this design. A new row will be added labeled as Des 3.

A sample size of 664 will be needed, which is very close to the sample size of 662 obtained in Des 2 under the normal distribution.

Plotting

With Des 2 selected in the Library, click the corresponding icon on the Library toolbar, and then click Power vs Sample Size. The resulting power curve for this design will appear.
You can export this chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... Feel free to explore other plots as well. Once you have finished, close all charts before continuing.

12.2.3 Simulation

Select Des 2 in the Library, and click the corresponding icon in the toolbar. Alternatively, right-click on Des 2 and select Simulate. A new Simulation window will appear. Click on the Response Generation Info tab, and specify: Mean Control = 10; Mean Treatment = 9.5; CV of Data Control = 0.75.

Click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 1.

Select Sim 1 in the Output Preview and click the corresponding toolbar icon. Double-click on Sim 1 in the Library. The simulation output details will be displayed.

Out of 10,000 simulated trials, the null hypothesis is rejected in favor of non-inferiority in close to 90%. Therefore, the simulation result verifies that the design attains 90% power. The simulation result might vary depending on the starting seed value chosen.

12.3 Difference of Means in Crossover Designs

In a 2 × 2 crossover design each subject is randomized to one of two sequence groups. Subjects in sequence group 1 receive the test drug (T) formulation in the first period, have their outcome variable X recorded, wait out a washout period to ensure that the drug is cleared from their system, then receive the control drug formulation (C) in period 2 and finally have X measured again. In sequence group 2, the order in which T and C are assigned is reversed. The table below summarizes this type of trial design.

Group     Period 1   Washout   Period 2
1 (TC)    Test       —         Control
2 (CT)    Control    —         Test

The resulting data are commonly analyzed using a linear model.
The response yijk in period j on subject k in sequence group i, where i = 1, 2, j = 1, 2, and k = 1, . . . , ni, is modeled as a linear function of an overall mean response µ, formulation effects τt and τc, period effects π1 and π2, and sequence effects γ1 and γ2. The fixed effects model can be displayed as:

Group     Period 1            Washout   Period 2
1 (TC)    µ + τt + π1 + γ1    —         µ + τc + π2 + γ1
2 (CT)    µ + τc + π1 + γ2    —         µ + τt + π2 + γ2

Let µt = µ + τt and µc = µ + τc denote the means of the observations from the test and control formulations, respectively, and let MSE denote the mean-squared error.

In a noninferiority trial, we test H0 : δ ≤ δ0 against H1 : δ > δ0 if δ0 < 0, or H0 : δ ≥ δ0 against H1 : δ < δ0 if δ0 > 0, where δ0 denotes the noninferiority margin.

East uses the following test statistic to test the above null hypothesis:

$$T_L = \frac{(\bar{y}_{11} - \bar{y}_{12} - \bar{y}_{21} + \bar{y}_{22})/2 - \delta_0}{\sqrt{\dfrac{\hat{\sigma}^2}{2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where ȳij is the mean of the observations from group i and period j and σ̂² is the estimate of the error variance. TL follows a Student’s t distribution with (n1 + n2 − 2) degrees of freedom.

12.3.1 Trial Design

Consider a 2 × 2 crossover trial between a Test drug (T) and a Reference drug (C) where noninferiority needs to be established in terms of some selected treatment response. Let µt and µc denote the means for the Test and Reference drugs, respectively. Let δ = µt − µc be the difference in means. The noninferiority margin is set at -3. Accordingly we plan to design the study to test:

H0 : µt − µc ≤ −3 against H1 : µt − µc > −3

For this study, we consider µc = 21.62 ng.h/mL and µt = 23.19 ng.h/mL under H1. Further, we assume that the square root of the mean squared error (MSE) would be 2.5, which is the value entered for Sqrt(MSE) in the design below. We want to design a study that would have 90% power at δ1 = 23.19 − 21.62 = 1.57 under H1. We want to perform this test at a one-sided 0.025 level of significance.
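The crossover statistic TL defined in this section can be computed from the four sequence-by-period means. The sketch below is a direct transcription of that formula. The function name and all numeric inputs are hypothetical values chosen for illustration and are not taken from the manual or from East output.

```python
import math


def crossover_t_statistic(ybar11, ybar12, ybar21, ybar22,
                          sigma2_hat, n1, n2, delta0):
    """Return (T_L, degrees of freedom) for a 2x2 crossover design.

    ybar_ij is the mean response in sequence group i and period j;
    sigma2_hat is the estimated error variance.
    """
    estimate = (ybar11 - ybar12 - ybar21 + ybar22) / 2.0
    std_err = math.sqrt(sigma2_hat / 2.0 * (1.0 / n1 + 1.0 / n2))
    return (estimate - delta0) / std_err, n1 + n2 - 2


# Hypothetical cell means and variance estimate, five subjects per sequence.
t_stat, df = crossover_t_statistic(
    ybar11=23.0, ybar12=21.5,   # group 1 (TC): test in period 1, control in period 2
    ybar21=21.7, ybar22=23.4,   # group 2 (CT): control in period 1, test in period 2
    sigma2_hat=6.25, n1=5, n2=5, delta0=-3.0,
)
print(f"T_L = {t_stat:.3f} on {df} degrees of freedom")
```

The statistic would then be compared with the upper α critical value of a Student's t distribution with n1 + n2 − 2 degrees of freedom.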
Start East afresh. First, click Continuous: Two Samples on the Design tab, and then click Crossover Design: Difference of Means.

In the input window, select Noninferiority for Design Type. Select Individual Means for Input Method and then specify the Mean Control (µc) as 21.62 and Mean Treatment (µt) as 23.19. Enter the Type I Error (α) as 0.025. Select Sqrt(MSE) from the drop-down list and enter 2.5. Finally, enter the Noninferiority Margin (δ0) as −3. The upper pane should appear as below:

Click Compute. The sample size required for this design is highlighted in yellow. Save this design in the current workbook by selecting the corresponding row in Output Preview and clicking the corresponding icon on the Output Preview toolbar. Double-click Des 1 in the Library. This will display the design details.

Des 1 requires a sample size of only 9 to establish non-inferiority with 90% power.

12.4 Ratio of Means in Crossover Designs

12.4.1 Trial Design

We consider the same anti-hypertensive therapy example discussed in Section 12.2, but this time we will assume that the data come from a crossover design. We wish to test the following hypotheses:

H0 : µt/µc ≤ 0.8 against H1 : µt/µc > 0.8

We want the study to have at least 90% power at ρ1 = 0.95 and to maintain a one-sided type I error of 5%. As before, we will consider CV = 0.75 for both treatment arms.

Start East afresh. First, click Continuous: Two Samples under the Design tab, and then click Crossover Design: Ratio of Means.

In the input window, select Noninferiority for Design Type. Select Individual Means for Input Method and then specify the Noninferiority Margin (ρ0) as 0.8, Mean Control (µc) as 10, and Mean Treatment (µt) as 9.5.
Using the relationship between CV (= 0.75) and the standard deviation of log-transformed data mentioned in Section 12.2, ln(1 + 0.75²) ≈ 0.446, whose square root is approximately 0.668. This example uses the value 0.45: specify 0.45 for Sqrt. of MSE Log. The upper pane should appear as below:

Click Compute. The sample size required for this design is highlighted in yellow in the Output Preview pane. Save this design in the current workbook by selecting the corresponding row in Output Preview and clicking the corresponding icon on the Output Preview toolbar. Select Des 1 in the Library and click the corresponding toolbar icon. This will display the design details.

In general, a crossover design requires fewer subjects than its parallel design counterpart, and may be preferred whenever it is feasible.

13 Normal Equivalence Two-Sample

In many cases, the goal of a clinical trial is neither superiority nor non-inferiority, but equivalence. In Section 13.1, the problem of establishing equivalence with respect to the difference of the means of two normal distributions using a parallel-group design is presented. The corresponding problem of establishing equivalence with respect to the log ratio of means is presented in Section 13.2. For the crossover design, the problem of establishing equivalence with respect to the difference of the means is presented in Section 13.3, and with respect to the log ratio of means in Section 13.4.

13.1 Difference in Means

In some experimental situations, we want to show that the means of two normal distributions are “close”. For example, a test formulation of a drug (T) and the control (or reference) formulation of the same drug (C) are considered to be bioequivalent if the rate and extent of absorption are similar.
Let µt and µc denote the means of the observations from the test and reference formulations, respectively, and let σ² denote the common variance of the observations. The goal is to establish that δL < µt − µc < δU, where δL and δU are a-priori specified values used to define equivalence.

The two one-sided tests (TOST) procedure of Schuirmann (1987) is commonly used for this analysis, and is employed in this section for a parallel-group study. Let δ = µt − µc denote the true difference in the means. The null hypothesis H0 : δ ≤ δL or δ ≥ δU is tested against the two-sided alternative hypothesis H1 : δL < δ < δU at level α, using TOST. Here we perform the following two tests together:

Test1: H0L : δ ≤ δL against H1L : δ > δL at level α
Test2: H0U : δ ≥ δU against H1U : δ < δU at level α

H0 is rejected in favor of H1 at level α if and only if both H0L and H0U are rejected. Note that this is the same as rejecting H0 in favor of H1 at level α if the (1 − 2α) 100% confidence interval for δ is completely contained within the interval (δL, δU).

Let N be the total sample size and let µ̂t and µ̂c denote the estimates of the means of T and C, respectively. Let δ̂ = µ̂t − µ̂c denote the estimated difference with standard error se(δ̂). We use the following two test statistics to apply Test1 and Test2, respectively:

$$T_L = \frac{\hat{\delta} - \delta_L}{se(\hat{\delta})}, \qquad T_U = \frac{\hat{\delta} - \delta_U}{se(\hat{\delta})}.$$

TL and TU are assumed to follow Student’s t-distribution with (N − 2) degrees of freedom under H0L and H0U, respectively. H0L is rejected if TL ≥ t1−α,(N−2), and H0U is rejected if TU ≤ tα,(N−2).

The null hypothesis H0 is rejected in favor of H1 if TL ≥ t1−α,(N−2) and TU ≤ tα,(N−2), or in terms of confidence intervals: Reject H0 in favor of H1 at level α if

$$\delta_L + t_{1-\alpha,(N-2)}\, 2\hat{\sigma}/\sqrt{N} \;<\; \hat{\delta} \;<\; \delta_U + t_{\alpha,(N-2)}\, 2\hat{\sigma}/\sqrt{N}. \tag{13.1}$$
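The two equivalent forms of the decision rule, the pair of one-sided tests and the confidence-interval check, can be compared in a short Python sketch. The function names are assumptions for illustration. The caller supplies the critical value t1−α,(N−2), and the numbers in the example (estimate, standard error, and a critical value of about 1.657) are hypothetical.

```python
def tost_decision(delta_hat, se, t_crit, delta_low, delta_up):
    """Return (T_L, T_U, reject_H0) for the two one-sided tests.

    t_crit is the upper-alpha critical value t_{1-alpha, N-2}; by symmetry
    of the t distribution the lower critical value is -t_crit.
    """
    t_low = (delta_hat - delta_low) / se
    t_up = (delta_hat - delta_up) / se
    reject = (t_low >= t_crit) and (t_up <= -t_crit)
    return t_low, t_up, reject


def ci_within_limits(delta_hat, se, t_crit, delta_low, delta_up):
    """Check whether the (1 - 2*alpha) CI lies inside (delta_low, delta_up)."""
    lower = delta_hat - t_crit * se
    upper = delta_hat + t_crit * se
    return delta_low <= lower and upper <= delta_up


# Hypothetical interim summary: estimate 0.2, standard error 0.392,
# equivalence limits -1.3 and 1.3, critical value approximately 1.657.
t_low, t_up, reject = tost_decision(0.2, 0.392, 1.657, -1.3, 1.3)
same = ci_within_limits(0.2, 0.392, 1.657, -1.3, 1.3)
print(f"T_L = {t_low:.3f}, T_U = {t_up:.3f}, reject H0: {reject}, CI inside limits: {same}")
```

Both functions reach the same conclusion for any input, because each one-sided rejection is an algebraic rearrangement of one end of the confidence interval lying inside the corresponding limit.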
We see that decision rule (13.1) is the same as rejecting H0 in favor of H1 if the (1 − 2α) 100% confidence interval for δ is entirely contained within the interval (δL, δU).

The above inequality (13.1) cannot hold if $4\, t_{1-\alpha,(N-2)}\, \hat{\sigma}/\sqrt{N} \ge (\delta_U - \delta_L)$, in which case H0 is not rejected in favor of H1. Thus, we assume that $4\, t_{1-\alpha,(N-2)}\, \hat{\sigma}/\sqrt{N} < (\delta_U - \delta_L)$. The impact of this assumption was examined by Bristol (1993a).

The power or sample size of such a trial design is determined for a specified value of δ, denoted δ1, for a single-look study only. The choice δ1 = 0, i.e. µt = µc, is common. For a specified value of δ1, the power is given by

$$\Pr(\text{Reject } H_0) = \tau_\nu(-t_{\alpha,\nu} \mid \Delta_2) - \tau_\nu(t_{\alpha,\nu} \mid \Delta_1) \tag{13.2}$$

where ν = N − 2 and ∆1 and ∆2 are non-centrality parameters given by ∆1 = (δ1 − δL)/se(δ̂) and ∆2 = (δ1 − δU)/se(δ̂), respectively. Here tα,ν denotes the upper α × 100% percentile of a Student’s t distribution with ν degrees of freedom, and τν(x|∆) denotes the distribution function of a non-central t distribution with ν degrees of freedom and non-centrality parameter ∆, evaluated at x. Under the assumption above, TL < tα,ν implies TU < −tα,ν, so the probability that both one-sided tests reject is Pr(TU ≤ −tα,ν) − Pr(TL < tα,ν), which is the right-hand side of (13.2).

Since we don’t know the sample size N ahead of time, we cannot characterize the bivariate t-distribution. Thus solving for sample size must be performed iteratively by equating the formula (13.2) to the power 1 − β.

13.1.1 Trial design

Consider the situation where we need to establish equivalence between a test formulation of capsules (T) and the marketed capsules (C). The response variable is the change from baseline in total symptom score. Based on the studies conducted during the development program, it is assumed that µc = 6.5. Based on this value, equivalence limits were set as −δL = δU = 1.3 (= 20% of µc). We assume that the common standard deviation is σ = 2.2. We want to have 90% power at µt = µc.

Start East afresh.
Click Continuous: Two Samples on the Design tab and then click Parallel Design: Difference of Means.

This will launch a new window. The upper pane of this window displays several fields with default values. Select Equivalence for Design Type, and Individual Means for Input Method. Enter 0.05 for Type I Error. Specify both Mean Control (µc) and Mean Treatment (µt) as 6.5. We have assumed σ = 2.2. Enter this value for Std. Deviation (σ). Also enter −1.3 for Lower Equivalence Limit (δL) and 1.3 for Upper Equivalence Limit (δU). The upper pane should appear as below:

Click Compute. The output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (126 subjects) is highlighted in yellow.

This design has default name Des 1. Select this design and click the corresponding icon in the Output Preview toolbar. Some of the design details will be displayed in the upper pane, labeled as Output Summary.

A total of 126 subjects must be enrolled in order to achieve the desired 90% power under the alternative hypothesis. Of these 126 subjects, 63 will be randomized to the test formulation and the remaining 63 to the marketed formulation.

In the Output Preview, select Des 1 and click the corresponding toolbar icon to save this design to Wbk1 in the Library.

Suppose that this sample size is not economically feasible and we want to examine power for a total sample size of 100. Select Des 1 in the Library, and click the corresponding icon on the Library toolbar. In the Input, click the radio button for Power, and enter Sample Size (n) as 100.

Click Compute. This will add a new row to the Output Preview and the calculated power is highlighted in yellow. We see that a power of 80.3% can be achieved with 100 subjects.
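As a rough cross-check of these two results, the sketch below replaces the non-central t distribution with a normal approximation that treats σ as known. That substitution is an assumption of the sketch and is not the algorithm East uses, so the values are expected to land close to, and slightly above, the 90% and 80.3% reported above. The function name `tost_power_normal_approx` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def tost_power_normal_approx(n_total, sigma, delta1, delta_low, delta_up, alpha):
    """Approximate TOST power for two equal-sized arms, with sigma treated as known."""
    nd = NormalDist()
    se = sigma * sqrt(4.0 / n_total)                     # n_total / 2 subjects per arm
    z = nd.inv_cdf(1.0 - alpha)                          # normal stand-in for the t quantile
    p_upper = nd.cdf((delta_up - delta1) / se - z)       # Pr(T_U <= -z)
    p_lower_fail = nd.cdf(z - (delta1 - delta_low) / se) # Pr(T_L < z)
    return max(0.0, p_upper - p_lower_fail)


for n in (126, 100):
    power = tost_power_normal_approx(
        n, sigma=2.2, delta1=0.0, delta_low=-1.3, delta_up=1.3, alpha=0.05
    )
    print(n, round(power, 3))
```

The approximation is useful for a quick sanity check of a design. The t-based calculation remains the reference, particularly for small sample sizes where the two diverge more.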
Suppose we want to see how design parameters such as power, sample size and treatment effect are interrelated. To visualize any particular relationship for Des 1, first select Des 1 in the Library and then click the corresponding icon in the toolbar. You will see a list of available options. To plot power against sample size, click Power vs Sample Size.

Feel free to explore other plots and the options available with them. Close the charts before continuing.

13.1.2 Simulation

We wish to make sure that Des 1 has the desired power of 90%, and maintains the type I error of 5%. This examination can be conducted using simulation. Select Des 1 in the Library, and click the corresponding icon in the toolbar. Alternatively, right-click Des 1 and select Simulate. A new Simulation window will appear. Click on the Response Generation Info tab. We will first simulate under H1. Leave the default values as below, and click Simulate.

Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 1. Select Sim 1 in the Output Preview and click the corresponding toolbar icon. Now double-click on Sim 1 in the Library. The simulation output details, including the table below, will be displayed.

Observe that out of the 10,000 simulated trials, the null hypothesis was rejected around 90% of the time. (Note: The numbers on your screen might differ slightly because you might be using a different starting seed for your simulations.)

Next we will simulate from a point that belongs to the null hypothesis. Consider µc = 6.5 and µt = 7.8. Select Sim 1 in the Library and click the corresponding icon. Go to the Response Generation Info tab in the upper pane and specify: Mean Control (µc) = 6.5 and Mean Treatment (µt) = 7.8.

Click Simulate to start the simulation.
Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 2. Select Sim 2 in the Output Preview and click the corresponding toolbar icon. Now double-click on Sim 2 in the Library. You can see that when H0 is true, the simulated power is close to the specified type I error of 5%.

13.2 Ratio of Means

For some pharmacokinetic parameters, the ratio of the means is a more appropriate measure of the distance between the treatments. Let µt and µc denote the means of the observations from the test formulation (T) and the reference (C), respectively, and let σt² and σc² denote the corresponding variances of the observations. It is assumed that σt/µt = σc/µc, i.e. the coefficient of variation CV = σ/µ is the same for T and C. Finally, let ρ = µt/µc.

The goal is to establish that ρL < ρ < ρU, where ρL and ρU are specified values used to define equivalence. In practice, ρL and ρU are often chosen such that ρL = 1/ρU.

The two one-sided tests procedure of Schuirmann (1987) is commonly used for this analysis, and is employed here for a parallel-group study. The null hypothesis H0 : ρ ≤ ρL or ρ ≥ ρU is tested against the two-sided alternative hypothesis H1 : ρL < ρ < ρU at level α, using two one-sided tests. Schuirmann (1987) proposed working this problem on the natural logarithm scale. Thus, we are interested in the parameter δ = ln(ρ) = ln(µt) − ln(µc), and the null hypothesis H0 : δ ≤ δL or δ ≥ δU is tested against the two-sided alternative hypothesis H1 : δL < δ < δU at level α, using two one-sided t-tests. Here δL = ln(ρL) and δU = ln(ρU).

Since we have translated the ratio hypothesis into a difference hypothesis, we can perform the test for difference as discussed in Section 13.1. Note that we need the standard deviation for log transformed data.
However, if we are provided with information on CV instead, the standard deviation of log transformed data can be obtained using the relation

$$sd = \sqrt{\ln(1 + CV^2)}.$$

13.2.1 Trial design

Suppose that the logarithm of area under the curve (AUC), a pharmacokinetic parameter related to the efficacy of a drug, is to be analyzed to compare the two formulations of a drug. We want to show that the two formulations are bioequivalent by showing that the ratio of the means satisfies 0.8 < µt/µc < 1.25. Thus ρL = 0.8 and ρU = 1.25. Also, based on previous studies, it is assumed that the coefficient of variation is CV = 0.25.

Start East afresh. Click Continuous: Two Samples on the Design tab and then click Parallel Design: Ratio of Means.

This will launch a new window. The upper pane of this window displays several fields with default values. Select Equivalence for Trial Type, and enter 0.05 for the Type I Error. For the Input Method, specify Ratio of Means. Enter 1 for Ratio of Means (ρ1), 0.8 for Lower Equivalence Limit (ρL) and 1.25 for Upper Equivalence Limit (ρU). Specify 0.25 for Coeff. Var. The upper pane should appear as below:

Click Compute. The output is shown as a row in the Output Preview located in the lower pane of this window. The computed total sample size (55 subjects) is highlighted in yellow.

In the Output Preview toolbar, click the corresponding icon to save this design to Wbk1 in the Library. Double-click Des 1 in the Library to see the details of the design. Close this output window before continuing.

Plotting

With Des 1 selected in the Library, click the corresponding icon on the Library toolbar, and then click Power vs Sample Size. The resulting power curve for this design will appear.
You can export this chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... Feel free to explore other charts. Close all charts before continuing.

13.2.2 Simulation

Suppose you suspect that the CV will be smaller than 0.25, e.g., 0.2. Select Des 1 in the Library, and click the corresponding icon in the toolbar. Click on the Response Generation Info tab and change C.V. of Data Control to 0.20.

Click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 1.

Select Sim 1 in the Output Preview and click the corresponding toolbar icon. Now double-click on Sim 1 in the Library. The simulation output details will be displayed in the upper pane. Observe that out of 10,000 simulated trials, the null hypothesis was rejected over 98% of the time. (Note: The numbers on your screen might differ slightly depending on the starting seed.)

13.3 Difference of Means in Crossover Designs

Crossover trials are widely used in clinical and medical research. The crossover design is often preferred over a parallel design because each subject receives all the treatments and thus acts as their own control. As a result, a crossover design requires fewer subjects. In this section, we show how East supports the design and simulation of such experiments with the difference of means as the endpoint.

In a 2 × 2 crossover design each subject is randomized to one of two sequence groups (or treatment sequences).
Subjects in sequence group 1 receive the test drug (T) formulation in the first period, have their outcome variable X recorded, wait out a washout period to ensure that the drug is cleared from their system, then receive the control drug formulation (C) in period 2 and finally have X measured again. In sequence group 2, the order in which T and C are assigned is reversed. The table below summarizes this type of trial design.

Group     Period 1   Washout   Period 2
1 (TC)    Test       —         Control
2 (CT)    Control    —         Test

The resulting data are commonly analyzed using a linear model. The response yijk on the kth subject in period j of sequence group i, where i = 1, 2, j = 1, 2, and k = 1, . . . , ni, is modeled as a linear function of an overall mean response µ, formulation effects τt and τc, period effects π1 and π2, and sequence effects γ1 and γ2. The fixed effects model can be displayed as:

Group     Period 1            Washout   Period 2
1 (TC)    µ + τt + π1 + γ1    —         µ + τc + π2 + γ1
2 (CT)    µ + τc + π1 + γ2    —         µ + τt + π2 + γ2

Let µt = µ + τt and µc = µ + τc denote the means of the observations from the test and control formulations, respectively, and let MSE denote the mean-squared error of the log data obtained from fitting the model. This is nothing other than the MSE from a crossover ANOVA model for the 2 × 2 design (2 periods and 2 sequences).

In an equivalence trial, the goal is to establish δL < µt − µc < δU, where δL and δU are specified values used to define equivalence. In practice, δL and δU are often chosen such that δL = −δU.

The two one-sided tests (TOST) procedure of Schuirmann (1987) is commonly used for this analysis, and is employed here for a crossover study. Let δ = µt − µc denote the true difference in the means. The null hypothesis H0 : δ ≤ δL or δ ≥ δU is tested against the two-sided alternative hypothesis H1 : δL < δ < δU at level α, using TOST.
Here we perform the following two tests together:

Test 1: H0L: δ ≤ δL against H1L: δ > δL at level α
Test 2: H0U: δ ≥ δU against H1U: δ < δU at level α

H0 is rejected in favor of H1 at level α if and only if both H0L and H0U are rejected. Note that this is the same as rejecting H0 in favor of H1 at level α if the (1 − 2α)100% confidence interval for δ is completely contained within the interval (δL, δU).

East uses the following test statistics to test the above two null hypotheses:

$$T_L = \frac{(\bar{y}_{11} - \bar{y}_{12} - \bar{y}_{21} + \bar{y}_{22})/2 - \delta_L}{\sqrt{\frac{MSE}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

and

$$T_U = \frac{(\bar{y}_{11} - \bar{y}_{12} - \bar{y}_{21} + \bar{y}_{22})/2 - \delta_U}{\sqrt{\frac{MSE}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where ȳ_ij is the mean of the observations from group i and period j. Both TL and TU follow Student's t distribution with (n1 + n2 − 2) degrees of freedom.

The power of the test (i.e., the probability of declaring equivalence) depends on the true value of µt − µc. The sample size (or power) is determined at a specified value of this difference, denoted δ1. The choice δ1 = 0, i.e., µt = µc, is common. Note that the power and the sample size depend only on δL, δU, δ1, and √MSE.

13.3.1 Trial design

Standard 2 × 2 crossover designs are often recommended in regulatory guidelines to establish bioequivalence of a generic drug with an off-patent brand-name drug. Consider a 2 × 2 bioequivalence trial between a Test drug (T) and a Reference drug (C) where equivalence needs to be established in terms of the pharmacokinetic parameter Area Under the Curve (AUC). Let µt and µc denote the average AUC for the Test and Reference drugs, respectively. Let δ = µt − µc be the difference. To establish average bioequivalence, the calculated 90% confidence interval of δ should fall within pre-specified bioequivalence limits. The bioequivalence limits are set at −3 and 3.
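As a rough illustration of the TOST rule based on the statistics TL and TU, the following Python sketch evaluates both statistics for one set of cell means and applies the decision rule with limits −3 and 3. The function name, the cell means, the per-sequence sample sizes, the value √MSE = 2.5, and the critical value 1.675 (an approximate upper 5% point of Student's t with 52 degrees of freedom, read from a t table) are all assumptions made for illustration; this is not East output or East's implementation.

```python
import math

def tost_crossover(ybar, mse, n1, n2, delta_l, delta_u, t_crit):
    """TOST for a 2 x 2 crossover with a difference-of-means endpoint.

    ybar[i][j] is the mean of sequence group i+1 in period j+1.
    t_crit is the upper-alpha quantile of Student's t with n1 + n2 - 2
    degrees of freedom, supplied by the caller (e.g. from a t table).
    """
    # Estimate of delta = mu_t - mu_c; period effects cancel out.
    estimate = (ybar[0][0] - ybar[0][1] - ybar[1][0] + ybar[1][1]) / 2.0
    std_err = math.sqrt(mse / 2.0 * (1.0 / n1 + 1.0 / n2))
    t_l = (estimate - delta_l) / std_err
    t_u = (estimate - delta_u) / std_err
    reject_h0l = t_l > t_crit    # Test 1: delta > delta_l
    reject_h0u = t_u < -t_crit   # Test 2: delta < delta_u
    return estimate, t_l, t_u, reject_h0l and reject_h0u

# Hypothetical cell means (group 1 = TC, group 2 = CT).
cell_means = [[23.3, 21.7], [21.5, 23.1]]
estimate, t_l, t_u, equivalent = tost_crossover(
    cell_means, mse=2.5 ** 2, n1=27, n2=27,
    delta_l=-3.0, delta_u=3.0, t_crit=1.675)
print(round(estimate, 3), round(t_l, 2), round(t_u, 2), equivalent)
```

Equivalence is declared only when both one-sided tests reject, which is why the function returns the conjunction of the two rejections. The critical value is passed in by the caller so that the sketch needs nothing beyond the standard library.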
Accordingly we plan to design the study to test:

H0: µt − µc ≤ −3 or µt − µc ≥ 3
against
H1: −3 < µt − µc < 3

For this study, we consider µc = 21.62 ng.h/mL and µt = 23.19 ng.h/mL under H1. Further, we assume that the square root of the mean squared error (√MSE) from ANOVA would be 2.5. We wish to design a study that would have 90% power at δ1 = 23.19 − 21.62 = 1.57 under H1.

Start East afresh. Click Continuous: Two Samples on the Design tab and then click Crossover Design: Difference of Means.

This will launch a new window. The upper pane displays several fields with default values. Select Equivalence for Design Type, and Individual Means for Input Method. Enter 0.05 for Type I Error. Specify the Mean Control (µc) as 21.62 and Mean Treatment (µt) as 23.19. Select Sqrt(MSE) from the drop-down list and specify it as 2.5. Also specify the Lower Equiv. Limit (δL) and Upper Equiv. Limit (δU) as −3 and 3, respectively. The upper pane should appear as below:

Click Compute. The output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (54 subjects) is highlighted in yellow.

This design has default name Des 1. Select this design and click [icon] in the Output Preview toolbar. Some of the design details will be displayed in the upper pane, labeled as Output Summary.

In the Output Preview toolbar, click [icon] to save this design to Wbk1 in the Library. Double-click Des 1 in the Library to see the details of the design. Close the output window before continuing.

13.3.2 Simulation

Select Des 1 in the Library, and click [icon] in the toolbar. Alternatively, right-click Des 1 and select Simulate. A new Simulation window will appear.
Click on the Response Generation Info tab, and specify: Mean control = 21.62; Mean Treatment = 23.19; Sqrt(MSE) = 2.5. Leave the other default values and click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 1.

Select Sim 1 in the Output Preview and click [icon]. Now double-click on Sim 1 in the Library. The simulation output details will be displayed. Notice that the null hypothesis was rejected in close to 90% of the 10,000 simulated trials. The exact result of the simulations may differ slightly, depending on the seed.

The simulation we have just done was under H1. We now wish to simulate from a point that belongs to H0. Right-click Sim 1 in the Library and select Edit Simulation. Go to the Response Generation Info tab in the upper pane and specify: Mean control = 21.62; Mean Treatment = 24.62; Sqrt(MSE) = 2.5. With these values the true difference is 24.62 − 21.62 = 3, which equals the upper equivalence limit and therefore lies on the boundary of H0.

Click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 2.

Select Sim 2 in the Output Preview and click [icon]. Now double-click on Sim 2 in the Library. The simulation output details will be displayed. Notice that the upper efficacy stopping boundary was crossed in very close to 5% of the 10,000 simulated trials. The exact result of the simulations may differ slightly, depending on the seed.

13.4 Ratio of Means in Crossover Designs

Often in crossover designs, an equivalence hypothesis is tested in terms of the ratio of means. These types of trials are very popular for establishing bioavailability or bioequivalence between two formulations in terms of pharmacokinetic parameters (FDA guideline on BA/BE studies for orally administered drug products, 2003).
In particular, the FDA considers two products bioequivalent if the 90% confidence interval of the ratio of the two means lies within (0.8, 1.25). In this section, we show how East supports the design and simulation of such experiments when the endpoint is a ratio of means.

In a 2 × 2 crossover design each subject is randomized to one of two sequence groups. We have already discussed the 2 × 2 crossover design in section 13.3. However, unlike section 13.3, we are now interested in the ratio of means.

Let µt and µc denote the means of the observations from the experimental treatment (T) and the control treatment (C), respectively, and let ρ = µt/µc denote their ratio. In an equivalence trial with the ratio of means as endpoint, the goal is to establish ρL < ρ < ρU, where ρL and ρU are specified values used to define equivalence. In practice, ρL and ρU are often chosen such that ρL = 1/ρU.

The null hypothesis H0: ρ ≤ ρL or ρ ≥ ρU is tested against the two-sided alternative hypothesis H1: ρL < ρ < ρU at level α, using two one-sided tests. Schuirmann (1987) proposed working this problem on the natural logarithm scale. Thus, we are interested in the parameter δ = ln(ρ) = ln(µt) − ln(µc), and the null hypothesis H0: δ ≤ δL or δ ≥ δU is tested against the two-sided alternative hypothesis H1: δL < δ < δU at level α, using two one-sided t-tests. Here δL = ln(ρL) and δU = ln(ρU).

Since we have translated the ratio hypothesis into a difference hypothesis, we can perform the test for difference as discussed in section 13.1. Note that we need the standard deviation of the log-transformed data. However, if we are provided with information on the CV instead, the standard deviation of the log-transformed data can be obtained using the relation $sd = \sqrt{\ln(1 + CV^2)}$.

13.4.1 Trial design

Standard 2 × 2 crossover designs are often recommended in regulatory guidelines to establish bioequivalence of a generic drug with an off-patent brand-name drug.
Consider a 2 × 2 bioequivalence trial between a Test drug (T) and a Reference drug (C) where equivalence needs to be established in terms of the pharmacokinetic parameter Area Under the Curve (AUC). Let µt and µc denote the average AUC for the Test and Reference drugs, respectively. Let ρ = µt/µc be the ratio of averages. To establish average bioequivalence, the calculated 90% confidence interval of ρ should fall within pre-specified bioequivalence limits. The bioequivalence limits are set at 0.8 and 1.25. Accordingly we plan to design the study to test:

H0: µt/µc ≤ 0.8 or µt/µc ≥ 1.25
against
H1: 0.8 < µt/µc < 1.25

For this study, we consider µc = 21.62 ng.h/mL and µt = 23.19 ng.h/mL under H1. Further, we assume that the coefficient of variation (CV), or intrasubject variability, is 17%. For a lognormal population, the mean squared error (MSE) from ANOVA of log-transformed data and the CV are related by MSE = ln(1 + CV²). Thus in this case MSE is 0.0285 and its square root is 0.169. We wish to design a study that would have 90% power at ρ1 = 23.19/21.62 = 1.073 under H1.

Start East afresh. Click Continuous: Two Samples on the Design tab and then click Crossover Design: Ratio of Means.

This will launch a new window. The upper pane displays several fields with default values. Select Equivalence for Design Type, and Individual Means for Input Method. Enter 0.05 for Type I Error. Then specify the Mean Control (µc) as 21.62 and Mean Treatment (µt) as 23.19. Specify 0.169 for Sqrt. of MSE Log. Also specify the Lower Equiv. Limit (ρL) and Upper Equiv. Limit (ρU) as 0.8 and 1.25, respectively. The upper pane should appear as below:

Click Compute. The output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (23 subjects) is highlighted in yellow.
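The conversion from CV to the square root of the MSE on the log scale, and the translation of the ratio limits to the log scale, can be checked numerically. The short Python sketch below is only an illustration of that arithmetic; the function name is hypothetical and the code is not part of East.

```python
import math

def sqrt_mse_log_from_cv(cv):
    """Standard deviation on the natural-log scale implied by a CV,
    using MSE = ln(1 + CV^2) for lognormal data."""
    return math.sqrt(math.log(1.0 + cv ** 2))

cv = 0.17
mse_log = math.log(1.0 + cv ** 2)
print(round(mse_log, 4))                   # 0.0285
print(round(sqrt_mse_log_from_cv(cv), 3))  # 0.169

# Equivalence limits and the design ratio expressed on the log scale.
delta_l, delta_u = math.log(0.8), math.log(1.25)
delta_1 = math.log(23.19 / 21.62)
print(round(delta_l, 4), round(delta_u, 4), round(delta_1, 4))
```

Because 0.8 = 1/1.25, the two log-scale limits are symmetric about zero, which is the ratio-scale counterpart of choosing δL = −δU.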
This design has default name Des 1. Select this design and click [icon] in the Output Preview toolbar. Some of the design details will be displayed in the upper pane, labeled as Output Summary.

In the Output Preview toolbar, click [icon] to save this design to Wbk1 in the Library. Double-click Des 1 in the Library to see the details of the design.

13.4.2 Simulation

Select Des 1 in the Library, and click [icon] in the toolbar. Alternatively, right-click Des 1 and select Simulate. A new Simulation window will appear. Click on the Response Generation Info tab, and specify: Mean control = 21.62; Mean Treatment = 23.19; Sqrt. of MSE Log = 0.169. Click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 1.

Select Sim 1 in the Output Preview and click [icon]. Now double-click on Sim 1 in the Library. The simulation output details will be displayed. Notice that the null hypothesis was rejected in close to 90% of the 10,000 simulated trials. The exact result of the simulations may differ slightly, depending on the seed.

14 Normal: Many Means

In this chapter, we illustrate the various tests available in East for comparing more than two continuous means.

14.1 One Way ANOVA

In a one-way ANOVA test, we wish to test the equality of means across R independent groups. The two-sample difference of means test for independent data is a one-way ANOVA test for 2 groups. The null hypothesis H0: µ1 = µ2 = ... = µR is tested against the alternative hypothesis H1: µi ≠ µj for at least one pair (i, j), where i, j = 1, 2, ..., R.

Suppose n patients have been allocated randomly to R treatments.
We assume that the data of the R treatment groups come from R normally distributed populations with the same variance σ² and with population means µ1, µ2, ..., µR.

To design a one-way ANOVA study in East, first click Continuous: Many Samples on the Design tab, and then click Factorial Design: One Way ANOVA.

In the upper pane of this window is the input dialog box. Consider a clinical trial with four groups. Enter 4 in Number of Groups(R). The trial compares three different doses of a drug against placebo in patients with Alzheimer's disease. The primary objective of the study is to evaluate the efficacy of these three doses, where efficacy is assessed by the difference from placebo in cognitive performance measured on a 13-item cognitive subscale. On the basis of pilot data, the expected mean responses are 0, 1.5, 2.5, and 2 for Groups 1 to 4, respectively. The common standard deviation within each group is σ = 3.5. We wish to compute the required sample size to achieve 90% power with a type-1 error of 0.05. Enter these values into the dialog box as shown below.

Then, click Compute. The design is shown as a row in the Output Preview, located in the lower pane of the window. The computed sample size (203) is highlighted in yellow. Select this row, then click [icon] in the Output Preview toolbar to save this design to Workbook1 in the Library. With Des1 selected in the Library, click [icon] to display the following output.

The output indicates that 51 patients per group are necessary to achieve the desired power. Close this output window before continuing.

14.1.1 One Way Contrast

A contrast of the population means is a linear combination of the µi's. Let ci denote the coefficient for population mean µi in the linear contrast.
For a single contrast test of many means in a one-way ANOVA, the null hypothesis is $H_0: \sum_i c_i \mu_i = 0$ versus a two-sided alternative $H_1: \sum_i c_i \mu_i \neq 0$, or a one-sided alternative $H_1: \sum_i c_i \mu_i < 0$ or $H_1: \sum_i c_i \mu_i > 0$.

With Des1 selected in the Library, click [icon]. In the input dialog box, click the checkbox titled Use Contrast, and select a two-sided test. Ensure that the means for each group are the same as those from Des1 (0, 1.5, 2.5, and 2). In addition, we wish to test the following contrast: −3, 1, 1, 1, which compares the placebo group with the average of the three treatment groups. Finally, we may enter unequal allocation ratios such as 1, 2, 2, 2, which implies that twice as many patients will be assigned to each treatment group as to the placebo group.

Click Compute. The following row will be added to the Output Preview.

Given the above contrast and allocation ratios, this study would require a total of 265 patients to achieve 90% power.

14.2 One Way Repeated Measures (Const. Correlation) ANOVA

As with the one-way ANOVA discussed in section 14.1, the repeated measures ANOVA also tests for equality of means. However, in a repeated measures setting, all patients are measured under all levels of the treatment. As the sample is exposed to each condition in turn, the measurement of the dependent variable is repeated. Thus, there is some correlation between observations from the same patient, which needs to be accounted for. The constant correlation assumption means that the correlation between observations from the same patient is assumed to be constant for all patients. The correlation parameter (ρ) is an additional parameter that needs to be specified in the one way repeated measures study design.

Start East afresh.
To design a repeated measures ANOVA study, click Continuous: Many Samples, and click Factorial Design: One Way Repeated Measures (Constant Correlation) ANOVA.

A specific type of repeated measures design is a longitudinal study in which patients are followed over a series of time points. As an illustration, we will consider a hypothetical study that investigates the effect of a dietary intervention on weight loss. The endpoint is the decrease in weight (in kilograms) from baseline, measured at four time points: baseline, 4 weeks, 8 weeks, and 12 weeks.

For Number of Levels, enter 4. We wish to compute the required sample size to achieve 90% power with a type-1 error of 0.05. The means at each of the four levels are 0, 1.5, 2.5, and 2 for Levels 1, 2, 3, and 4, respectively. Finally, enter σ = 5 and ρ = 0.2, and click Compute.

The design is shown as a row in the Output Preview, located in the lower pane of the window. The computed sample size (330) is highlighted in yellow. Select this row, then click [icon] in the Output Preview toolbar to save this design to Workbook1 in the Library. With Des1 selected in the Library, click [icon] to display the following output.

The output indicates that 83 patients per group are necessary to achieve the desired power.

14.3 Two Way ANOVA

In a two-way ANOVA, there are two factors to consider, say A and B. We can design a study to test equality of means across factor A, across factor B, or the interaction between A and B. In addition to the common standard deviation σ, you also need to specify the cell means.

For example, consider a study to determine the combined effects of sodium restriction and alcohol restriction on lowering of systolic blood pressure in hypertensive men (Parker et al., 1999). Let Factor A be sodium restriction and Factor B be alcohol restriction.
There are two levels of each factor (restricted vs usual sodium intake, and restricted vs usual alcohol intake), producing four groups. Each patient is randomly assigned to one of these four groups.

Start East afresh. Click Continuous: Many Samples, and click Factorial Design: Two-Way ANOVA.

Enter a type-1 error of 0.05. Then enter the following values in the input dialog box as shown below: Number of Factor A Levels as 2, Number of Factor B Levels as 2, Common Std. Dev. as 2, A1/B1 as 0.5, A1/B2 as 4.7, A2/B1 as 0.4, and A2/B2 as 6.9. First select Power for A, then click Compute.

Leaving the same input values, click Compute after selecting Power for B in the input window. Similarly, click Compute after selecting Power for AB. The Output Preview should now have three rows, as shown below.

In order to achieve at least 90% power to detect a difference across means in factor A, factor B, as well as the interaction, a sample size of 156 patients is necessary (i.e., Des1). Select Des1 in the Output Preview, then click [icon] in the toolbar to save it to Workbook1 in the Library. With Des1 selected in the Library, click [icon] to display the following output.

The output indicates that 39 patients per group are necessary to achieve 90% power to test the main effect of A.

15 Multiple Comparison Procedures for Continuous Data

It is often the case that multiple objectives are to be addressed in a single trial. These objectives are formulated into a family of hypotheses. Formal statistical hypothesis tests can be performed to see if there is strong evidence to support clinical claims. The type I error is inflated when one considers the inferences together as a family. Failure to compensate for multiplicities can have adverse consequences.
For example, a drug could be approved when actually it is not better than placebo. Multiple comparison (MC) procedures provide a guard against inflation of the type I error due to multiple testing. The probability of making at least one type I error is known as the familywise error rate (FWER). East supports several parametric and p-value based MC procedures. In this chapter we explain how to design a study using a chosen MC procedure that strongly maintains the FWER.

In East, one can calculate the power from the simulated data under different MC procedures. With the information on power, one can choose the MC procedure that provides maximum power yet strongly maintains the FWER.

MC procedures included in East strongly control the FWER. Strong control of the FWER refers to preserving the probability of incorrectly rejecting at least one true null hypothesis. In contrast, weak control of the FWER controls the FWER only under the assumption that all null hypotheses are true.

East supports the following MC procedures for continuous endpoints.

Category         Procedure                 Reference
Parametric       Dunnett's Single Step     Dunnett CW (1955)
                 Dunnett's Step Down       Dunnett CW and Tamhane AC (1991)
                 Dunnett's Step Up         Dunnett CW and Tamhane AC (1992)
P-value Based    Bonferroni                Bonferroni CE (1935, 1936)
                 Sidak                     Sidak Z (1967)
                 Weighted Bonferroni       Benjamini Y and Hochberg Y (1997)
                 Holm's Step Down          Holm S (1979)
                 Hochberg's Step Up        Hochberg Y (1988)
                 Hommel's Step Up          Hommel G (1988)
                 Fixed Sequence            Westfall PH, Krishen A (2001)
                 Fallback                  Wiens B, Dmitrienko A (2005)

15.1 Parametric Procedures

Assume that there are k arms including the placebo arm. Let n_i be the number of subjects in the i-th treatment arm (i = 0, 1, ..., k − 1), where arm 0 refers to placebo. Let $N = \sum_{i=0}^{k-1} n_i$ be the total sample size.
Let Yij be the response from subject j in treatment arm i and yij be the observed value of Yij (i = 0, 1, ..., k − 1; j = 1, 2, ..., ni). Suppose that

$$Y_{ij} = \mu_i + e_{ij} \qquad (15.1)$$

where $e_{ij} \sim N(0, \sigma^2)$. We are interested in the following hypotheses:

For the right tailed test: Hi: µi − µ0 ≤ 0 vs Ki: µi − µ0 > 0
For the left tailed test: Hi: µi − µ0 ≥ 0 vs Ki: µi − µ0 < 0

The global null hypothesis is rejected if at least one of the Hi is rejected in favor of Ki after controlling the FWER. Here Hi and Ki refer to the null and alternative hypotheses, respectively, for the comparison of the i-th arm with the placebo arm.

East supports three parametric MC procedures: the single step Dunnett test (Dunnett, 1955), the step-down Dunnett test, and the step-up Dunnett test. These procedures make two parametric assumptions: normality and homoscedasticity. Let ȳi be the sample mean for treatment arm i and s² be the pooled sample variance for all arms. The test statistic for comparing the treatment effect of arm i with placebo can be defined as

$$T_i = \frac{\bar{y}_i - \bar{y}_0}{s\sqrt{\frac{1}{n_i} + \frac{1}{n_0}}} \qquad (15.2)$$

Let ti be the observed value of Ti. These observed values for the k − 1 treatment arms can be ordered as t(1) ≥ t(2) ≥ ... ≥ t(k−1). Detailed formulas for the critical boundaries of the single step Dunnett and step-down Dunnett tests are discussed in Appendix H.

In the single step Dunnett test, the critical boundary remains the same for all the k − 1 individual tests. Let cα be the critical boundary that maintains an FWER of α and p̃i be the adjusted p-value associated with the comparison of the i-th arm and the placebo arm. Then for a right tailed test, Hi is rejected if ti > cα, and for a left tailed test, Hi is rejected if ti < cα.

Unlike in the single step Dunnett test, the critical boundary does not remain the same for all the k − 1 individual tests in the step-down Dunnett test. Let ci be the critical boundary and p̃i be the adjusted p-value associated with the comparison of the i-th arm and the placebo arm.
For a right tailed test, H(i) is rejected if t(i) > ci and H(1), ..., H(i−1) have already been rejected. For a left tailed test, H(i) is rejected if t(i) < ck−i and H(i+1), ..., H(k−1) have already been rejected.

Unlike the step-down test, the step-up Dunnett procedure starts with the least significant test statistic, i.e., t(k−1). Let ci be the critical boundary and p̃i be the adjusted p-value associated with the comparison of the i-th arm and the placebo arm. The i-th test statistic in order, i.e., t(i), will be tested if and only if none of H(i+1), ..., H(k−1) are rejected. If H(i) is rejected, then stop and reject all of H(i), ..., H(1). For a right tailed test, H(i) is rejected if t(i) > c(i), and for a left tailed test, H(i) is rejected if t(i) < c(i).

For both the single step Dunnett and step-down Dunnett tests, the global null hypothesis is rejected in favor of at least one right tailed alternative if H(1) is rejected, and in favor of at least one left tailed alternative if H(k−1) is rejected.

The single step Dunnett test and the step-down Dunnett test can be seen as the parametric versions of the Bonferroni procedure and the Holm procedure, respectively. Parametric tests are uniformly more powerful than the corresponding p-value based tests when the parametric assumption holds, or at least approximately holds, especially when there are a large number of hypotheses. Parametric procedures may not control the FWER if the standard deviations are different.

15.1.1 Dunnett's single step

Dunnett's Single Step procedure is described below with an example.

Example: Alzheimer's Disease Clinical Trial

In this section, we will use an example to illustrate how to design a study using the MCP module in East.
This is a randomized, double-blind, placebo-controlled, parallel study to assess three different doses (0.3 mg, 1 mg and 2 mg) of a drug against placebo in patients with mild to moderate probable Alzheimer's disease. The primary objective of this study is to evaluate the safety and efficacy of the three doses. The drug is administered daily for 24 weeks to subjects with Alzheimer's disease who are either receiving concomitant treatment or not receiving any co-medication. The efficacy is assessed by cognitive performance based on the Alzheimer's disease assessment scale 13-item cognitive sub-scale. From previous studies, it is estimated that the common standard deviation of the efficacy measure is 5. It is expected that the dose-response relationship follows a straight line within the dose range of interest. We would like to calculate the power for a total sample size of 200. This will be a balanced study with a one-sided 0.025 significance level to detect at least one dose with a significant difference from placebo. We will show how to simulate the power of such a study using the multiple comparison procedures listed above.

Designing the study

First, click [icon] (Continuous: Many Samples) on the Design tab and then click Multi-Arm Design: Pairwise Comparisons to Control - Difference of Means.

This will launch a new window. There is a box at the top with the label Number of Arms. For our example, we have 3 treatment groups plus a placebo, so enter 4 for Number of Arms. Under the Design Parameters tab, there are several fields which we will fill in. First, there is a box with the label Side. Here you need to specify whether you want a one-sided or two-sided test. Currently, only one-sided tests are available. Under it you will see the box with the label Sample Size (n).
For now skip this box and move to the next dropdown box with the label Rejection Region. If Left Tail is selected, the critical value for the test is located in the left tail of the distribution of the test statistic. Likewise, if Right Tail is selected, the critical value for the test is located in the right tail of the distribution of the test statistic. For our example, we will select Right Tail. Under that, there is a box with the label Type - 1 Error (α). This is where you need to specify the FWER. For our example, enter 0.025. Now go to the box with the label Total Sample Size. Here we input the total number of subjects, including those in the placebo arm. For this example, enter 200. To the right, there will be a heading with the title Multiple Comparison Procedures. In the parametric grouping, check the box next to Dunnett's single step, as this is the multiple comparison procedure we are illustrating in this subsection. After entering these parameters your screen should now look like this:

Now click on the Response Generation Info tab. You will see a table titled Table of Proportions. In this table we can specify the labels for the treatment arms. You also have to specify the dose level if you want to generate means through a dose-response curve. Since we are comparing placebo and 3 dose groups, enter Placebo, Dose1, Dose2 and Dose3 in the 4 cells in the first column, labeled Arm.

The table contains the default mean and standard deviation for each arm, which we will change later. There are two check boxes in this tab above the table. The first is labeled Generate Means through DR Curve. There are two ways to specify the mean response for each arm: 1) generate means for each arm through a dose-response curve or 2) specify the mean directly in the Table of Proportions.
To specify the mean directly, just enter the mean value for each arm in the Mean column of the table. However, in this example, we will generate means through a dose-response curve. In order to do this, check the Generate Means through DR Curve box. Once you check this box you will notice two things. First, an additional column with the label Dose will appear in the table. Here you need to enter the dose level for each arm. For this example, enter 0, 0.3, 1 and 2 for the Placebo, Dose1, Dose2 and Dose3 arms, respectively. Second, an additional section will appear to the right, which provides the option to generate the mean response from four families of parametric curves: Four Parameter Logistic, Emax, Linear and Quadratic. The technical details about each curve can be found in Appendix H. Here you need to choose the appropriate parametric curve from the drop-down list under Dose Response Curve and then specify the parameters associated with this curve.

For the Alzheimer's disease example, suppose the dose response follows a linear curve with intercept 0 and slope 1.5. To do this, select Linear from the dropdown list. To the right of this dropdown box, specify the parameter values of the selected curve family by inputting 0 for Intercept(E0) and 1.5 for Slope(δ). After specifying this, the mean values in the table will change accordingly.

Here we are generating the means using the following linear dose-response curve:

$$E(Y \mid \text{Dose}) = E_0 + \delta \times \text{Dose} \qquad (15.3)$$

For placebo, the mean can be obtained by specifying Dose as 0 in the above equation. This gives the mean for the placebo arm as 0. For arm Dose1, the mean is 0 + 1.5 × 0.3 = 0.45. Similarly, the means for arms Dose2 and Dose3 are 1.5 and 3. You can verify that the values in the Mean column have changed to 0, 0.45, 1.5 and 3 for the four arms, respectively.
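The arm means produced by the linear dose-response curve (15.3) can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function and variable names are hypothetical, and the code is not how East itself computes the table.

```python
def linear_dose_response(dose, e0, slope):
    """Mean response under the linear curve E(Y | Dose) = E0 + slope * Dose."""
    return e0 + slope * dose

# Dose levels entered in the Dose column for each arm.
doses = {"Placebo": 0.0, "Dose1": 0.3, "Dose2": 1.0, "Dose3": 2.0}

# Intercept E0 = 0 and slope = 1.5, as in the example.
means = {arm: linear_dose_response(d, e0=0.0, slope=1.5)
         for arm, d in doses.items()}

for arm, mean in means.items():
    print(f"{arm}: {mean:.2f}")
```

The printed values should match the Mean column after the dose-response curve is applied: 0, 0.45, 1.5 and 3.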
Now click Plot DR Curve to see the plot of means against the dose levels.

You will see the linear dose response curve that intersects the Y-axis at 0. Now close this window. The dose response curve generates the means, but we still have to specify the standard deviation. The standard deviations of the arms can be either equal or different. To specify a common standard deviation, check the box with the label Common Standard Deviation and specify the common standard deviation in the field next to it. When the standard deviations for the different arms are not all equal, they need to be specified directly in the table, in the column labeled Std. Dev. In this example, we are considering a common standard deviation of 5. So check the box for Common Standard Deviation and specify 5 in the field next to it. The Std. Dev. column will now be updated with 5 for all four arms.

As we have finished specifying all the fields in the Response Generation Info tab, this should appear as below.

Click on the Include Options button located in the upper-right corner of the Simulation window and check Randomized Info. This will add an additional tab, Randomization Info. Now click on the Randomization Info tab. The second column of the Table of Allocation displays the allocation ratio of each treatment arm to that of the control arm. The cell for the control arm is always one and is not editable. Only the cells for treatment arms other than control need to be filled in. The default value for each treatment arm is one, which represents a balanced design. For the Alzheimer's disease example, we consider a balanced design and leave the default values for the allocation ratios unchanged.
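To see what the allocation ratios imply for the per-arm sample sizes, the following Python sketch splits a total sample size in proportion to the ratios. The function name is hypothetical, and the sketch does not attempt to reproduce how East rounds sample sizes when the split is not a whole number.

```python
def arm_sizes(total_n, ratios):
    """Split total_n across arms in proportion to the allocation ratios.

    ratios[0] is the control arm (always 1); the others are the
    treatment-to-control allocation ratios.
    """
    total_ratio = sum(ratios)
    return [total_n * r / total_ratio for r in ratios]

# Balanced design from the example: 200 subjects over 4 arms.
print(arm_sizes(200, [1, 1, 1, 1]))  # [50.0, 50.0, 50.0, 50.0]
```

With the default ratios of one, the total of 200 subjects corresponds to 50 subjects in each of the four arms.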
Your screen should now look like this:

The last tab is Simulation Control Info. Specify 10000 as Number of Simulations and 1000 as Refresh Frequency in this tab. The box labeled Random Number Generator is where you can set the seed for the random number generator. You can either use the clock as the seed or choose a fixed seed (in order to replicate past simulations). The default is the clock and we will use that. The box on the right-hand side is labeled Output Options. This is where you can choose to save summary statistics for each simulation run and/or to save subject-level data for a specific number of simulation runs. To save the output for each simulation, check the box labeled Save summary statistics for every simulation run.

Now click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 1.

Note that a simulation node Sim 1 is created in the library. Also note that another node, labeled SummaryStat, is appended to the simulation node; it contains detailed summary statistics for each simulation run. Select Sim 1 in the Output Preview and click the icon to save the simulation in the library. Now double-click on Sim 1 in the Library. The simulation output details will be displayed in the right pane.

The first section in the output is the Hypothesis section. In our situation, we are testing 3 hypotheses. We are comparing the mean score on the Alzheimer’s disease assessment scale (13-item cognitive sub-scale) for each dose with that of placebo. That is, we are testing the 3 hypotheses:

H1: µ1 = µ0 vs K1: µ1 > µ0
H2: µ2 = µ0 vs K2: µ2 > µ0
H3: µ3 = µ0 vs K3: µ3 > µ0

Here, µ0, µ1, µ2 and µ3 represent the population mean score on the Alzheimer’s disease assessment scale for the placebo, 0.3 mg, 1 mg and 2 mg dose groups, respectively.
Also, Hi and Ki are the null and alternative hypotheses, respectively, for the i-th test.

The Input Parameters section provides the design parameters that we specified earlier. The next section, Overall Power, gives the estimated power based on the simulation. The second line gives the global power, which is about 75%. Global power is the power to reject the global null H0: µ1 = µ2 = µ3 = µ0. Thus, a global power of 75% indicates that the global null will be rejected about 75% of the time. In other words, at least one of H1, H2 and H3 is rejected on about 75% of occasions. Global power is useful for showing the existence of a dose-response relationship, since dose-response may be claimed if any of the doses in the study is significantly different from placebo.

The next line displays the conjunctive power. Conjunctive power is the proportion of simulations in which all the Hi’s that are truly false were rejected. In this example, all the Hi’s are false. Therefore, conjunctive power here is the proportion of simulations in which all of H1, H2 and H3 were rejected. For this simulation the conjunctive power is only about 2.0%, which means that all of H1, H2 and H3 were rejected in only about 2.0% of the simulations. Disjunctive power is the proportion of simulations in which at least one of the truly false Hi’s is rejected. The main distinction between global and disjunctive power is that the former counts any rejection whereas the latter counts rejections only among those Hi’s that are false. Since all of H1, H2 and H3 are false here, global and disjunctive power ought to be the same.

The next section gives the marginal power for each hypothesis. Marginal power is the proportion of simulations in which a particular hypothesis is rejected after applying the multiplicity adjustment.
Based on the simulation results, H1 is rejected about 3% of the time, H2 about 20% of the time and H3 a little more than 70% of the time.

Recall that we asked East to save the simulation results for each simulation run. Open this file by clicking on SummaryStat in the library and you will see that it contains 10,000 rows; each row represents the results of a single simulation. Find the 3 columns labeled Rej Flag 1, Rej Flag 2 and Rej Flag 3. These columns represent the rejection status for H1, H2 and H3, respectively. A value of 1 indicates rejection in that particular simulation; otherwise the null is not rejected. The proportion of 1’s in Rej Flag 1 is the marginal power to reject H1. Similarly, the marginal power for H2 and H3 can be found from Rej Flag 2 and Rej Flag 3, respectively. To obtain the global and disjunctive power, count the number of simulations in which at least one of H1, H2 and H3 has been rejected and divide by the total number of simulations, 10,000. Similarly, to obtain the conjunctive power, count the number of simulations in which all of H1, H2 and H3 have been rejected and divide by 10,000.

Next we will consider an example to show how global and disjunctive power differ from each other. Select Sim 1 in Library and click the icon. Now go to the Response Generation Info tab and uncheck the Generate Means Through DR Curve box. The table will now have only three columns. Specify Placebo, Dose1, Dose2 and Dose3 in the 4 cells of the first column, labeled Arm, and enter 0, 0, 1 and 1.2 in the 4 cells of the second column, labeled Mean. Here we are generating the response for placebo from the distribution $N(0, 5^2)$, for Dose1 from $N(0, 5^2)$, for Dose2 from $N(1, 5^2)$ and for Dose3 from $N(1.2, 5^2)$.
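The counting rules described above for the Rej Flag columns can be sketched in Python as follows. This is a minimal illustration, not East functionality: the function name power_summary, the list-of-rows layout and the five toy rows are assumptions made for this example. The truly_false argument marks which hypotheses are actually false, which is what separates disjunctive from global power when some null hypothesis is true (as for Dose1 in the second scenario above).

```python
def power_summary(rej_flags, truly_false):
    """Summarise rejection flags (one row per simulation, one 0/1 flag per hypothesis).

    truly_false[i] is True when hypothesis i is actually false.
    Assumes at least one hypothesis is truly false.
    """
    n_sims = len(rej_flags)
    n_hyp = len(truly_false)
    false_idx = [i for i in range(n_hyp) if truly_false[i]]

    # Marginal power: proportion of 1's in each flag column.
    marginal = [sum(row[i] for row in rej_flags) / n_sims for i in range(n_hyp)]
    # Global power: at least one hypothesis rejected, true or false.
    global_power = sum(1 for row in rej_flags if any(row)) / n_sims
    # Disjunctive power: at least one truly false hypothesis rejected.
    disjunctive = sum(1 for row in rej_flags if any(row[i] for i in false_idx)) / n_sims
    # Conjunctive power: all truly false hypotheses rejected.
    conjunctive = sum(1 for row in rej_flags if all(row[i] for i in false_idx)) / n_sims

    return {
        "marginal": marginal,
        "global": global_power,
        "disjunctive": disjunctive,
        "conjunctive": conjunctive,
    }


# Five made-up simulation rows with columns (Rej Flag 1, Rej Flag 2, Rej Flag 3).
toy_flags = [
    [1, 0, 0],  # only H1 rejected: counts for global power, not for disjunctive
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 0],
    [0, 0, 1],
]

# H1 is a true null (Dose1 mean equals placebo mean); H2 and H3 are false.
summary = power_summary(toy_flags, truly_false=[False, True, True])
print(summary)
```

In the toy data the first row rejects only the true null H1, so it raises the global power but not the disjunctive power.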
Now click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 2.

For Sim 2, the global power and disjunctive power are 17.9% and 17.6%, respectively. To understand why, we need to open the saved simulation data for Sim 2. The number of simulations in which at least one of H1, H2 and H3 is rejected is 1790, and dividing this by the total number of simulations, 10,000, gives the global power of 17.9%. The number of simulations in which at least one of H2 and H3 is rejected is 1760, and dividing this by 10,000 gives the disjunctive power of 17.6%. The exact result of the simulations may differ slightly, depending on the seed.

15.1.2 Dunnett’s step-down and step-up procedures

Dunnett’s step-down procedure is described below using the same Alzheimer’s disease example from the previous section 15.1.1 on Dunnett’s single step. Since the design specification remains the same except that we are using Dunnett’s step-down in place of the single step Dunnett’s test, we can set up the simulation in this section with little effort. Select Sim 1 in Library and click the icon. Now go to the Design Parameters tab. There, in the Multiple Comparison Procedures box, uncheck the Dunnett’s single step box and check the Dunnett’s step-down and Dunnett’s step-up boxes.

Now click Simulate to obtain power. Once the simulation run has completed, East will add two additional rows to the Output Preview labeled Sim 3 and Sim 4.

The Dunnett step-down and step-up procedures have global and disjunctive power of close to 75% and conjunctive power of close to 4%. To see the marginal power for each test, select Sim 3 and Sim 4 in the Output Preview and click the icon. Now, double-click on Sim 3 in the Library.
The simulation output details for the Dunnett step-down procedure will be displayed in the right pane. The marginal powers for the comparisons of Dose1, Dose2 and Dose3 using the Dunnett step-down procedure are close to 5%, 23% and 74%, respectively. Similarly, one can find the marginal power for the individual tests in the Dunnett step-up procedure.

15.2 p-value based Procedures

p-value based procedures strongly control the FWER regardless of the joint distribution of the raw p-values, as long as the individual raw p-values are legitimate p-values. Assume that there are k arms including the placebo arm. Let $n_i$ be the number of subjects in the i-th treatment arm ($i = 0, 1, \cdots, k-1$), where arm 0 refers to placebo, and let $N = \sum_{i=0}^{k-1} n_i$ be the total sample size. Let $Y_{ij}$ be the response from subject j in treatment arm i and $y_{ij}$ be the observed value of $Y_{ij}$ ($i = 0, 1, \cdots, k-1$; $j = 1, 2, \cdots, n_i$). Suppose that

$$Y_{ij} = \mu_i + e_{ij} \qquad (15.4)$$

where $e_{ij} \sim N(0, \sigma_i^2)$. We are interested in the following hypotheses:

For the right tailed test: $H_i: \mu_i - \mu_0 \le 0$ vs $K_i: \mu_i - \mu_0 > 0$
For the left tailed test: $H_i: \mu_i - \mu_0 \ge 0$ vs $K_i: \mu_i - \mu_0 < 0$

The global null hypothesis is rejected if at least one of the $H_i$ is rejected in favor of $K_i$ after controlling for FWER. Here $H_i$ and $K_i$ refer to the null and alternative hypotheses, respectively, for the comparison of the i-th arm with the placebo arm.

Let $\bar{y}_i$ be the sample mean for treatment arm i, $s_i^2$ be the sample variance from the i-th arm and $s^2$ be the pooled sample variance for all arms.
For the unequal variance case, the test statistic for comparing the treatment effect of arm i with placebo can be defined as

$$T_i = \frac{\bar{y}_i - \bar{y}_0}{\sqrt{\frac{1}{n_i} s_i^2 + \frac{1}{n_0} s_0^2}} \qquad (15.5)$$

For the equal variance case, one needs to replace $s_i^2$ and $s_0^2$ by the pooled sample variance $s^2$. In both cases, $T_i$ follows Student’s t distribution. However, the degrees of freedom differ between the equal variance and unequal variance cases. For the equal variance case the degrees of freedom are $N - k$. For the unequal variance case, the degrees of freedom are subject to the Satterthwaite correction.

Let $t_i$ be the observed value of $T_i$; these observed values for the $k - 1$ treatment arms can be ordered as $t_{(1)} \ge t_{(2)} \ge \cdots \ge t_{(k-1)}$. For the right tailed test the marginal p-value for comparing the i-th arm with placebo is calculated as $p_i = P(T > t_i)$ and for the left tailed test as $p_i = P(T < t_i)$, where T follows Student’s t distribution. Let $p_{(1)} \le p_{(2)} \le \cdots \le p_{(k-1)}$ be the ordered p-values.

15.2.1 Single step MC procedures

East supports three p-value based single step MC procedures: the Bonferroni procedure, the Sidak procedure and the weighted Bonferroni procedure. For the Bonferroni procedure, $H_i$ is rejected if $p_i < \frac{\alpha}{k-1}$ and the adjusted p-value is given as $\min(1, (k-1) p_i)$. For the Sidak procedure, $H_i$ is rejected if $p_i < 1 - (1 - \alpha)^{\frac{1}{k-1}}$ and the adjusted p-value is given as $1 - (1 - p_i)^{k-1}$. For the weighted Bonferroni procedure, $H_i$ is rejected if $p_i < w_i \alpha$ and the adjusted p-value is given as $\min(1, \frac{p_i}{w_i})$. Here $w_i$ denotes the proportion of $\alpha$ allocated to $H_i$, such that $\sum_{i=1}^{k-1} w_i = 1$. Note that if $w_i = \frac{1}{k-1}$, then the weighted Bonferroni procedure reduces to the regular Bonferroni procedure.

Bonferroni and Sidak procedures

Bonferroni and Sidak procedures are described below using the same Alzheimer’s disease example from section 15.1.1 on Dunnett’s single step.
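The adjusted p-value formulas for the three single step procedures defined above can be sketched as follows. This is an illustration under stated assumptions rather than East’s implementation: the function names and the three raw p-values are made up for this example, and the weights are assumed to be strictly positive. The normalization of the weights mirrors the behavior East is described as applying to the Proportion of Alpha column.

```python
def bonferroni_adjust(pvals):
    """Adjusted p-values min(1, (k - 1) * p_i) for the k - 1 comparisons."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def sidak_adjust(pvals):
    """Adjusted p-values 1 - (1 - p_i)^(k - 1)."""
    m = len(pvals)
    return [1.0 - (1.0 - p) ** m for p in pvals]


def weighted_bonferroni_adjust(pvals, weights):
    """Adjusted p-values min(1, p_i / w_i), with weights normalized to sum to 1.

    Assumes every weight is strictly positive.
    """
    total = sum(weights)
    return [min(1.0, p / (w / total)) for p, w in zip(pvals, weights)]


alpha = 0.025
raw_p = [0.20, 0.03, 0.004]  # hypothetical raw p-values for H1, H2, H3

bonf = bonferroni_adjust(raw_p)
sidak = sidak_adjust(raw_p)
wbonf = weighted_bonferroni_adjust(raw_p, weights=[1, 1, 1])

# A hypothesis is rejected when its adjusted p-value falls below alpha.
for name, adj in [("Bonferroni", bonf), ("Sidak", sidak), ("Weighted Bonferroni", wbonf)]:
    print(name, [round(a, 6) for a in adj], [a < alpha for a in adj])
```

With equal weights the weighted Bonferroni adjusted p-values coincide with the ordinary Bonferroni ones, up to floating-point rounding.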
Since the design specification remains the same except that we are using Bonferroni and Sidak in place of the single step Dunnett’s test, we can set up the simulation in this section with little effort. Select Sim 1 in Library and click the icon. Now go to the Design Parameters tab. In the Multiple Comparison Procedures box, uncheck the Dunnett’s single step box and check the Bonferroni and Sidak boxes.

Now click Simulate to obtain power. Once the simulation run has completed, East will add two additional rows to the Output Preview.

Bonferroni and Sidak procedures have disjunctive and global powers of close to 73% and conjunctive power of about 1.8%. Now select Sim 5 and Sim 6 in the Output Preview using the Ctrl key and click the icon. This will save Sim 5 and Sim 6 in Wbk1 in Library.

Weighted Bonferroni procedure

As before, we will use the same Alzheimer’s disease example to illustrate the weighted Bonferroni procedure. Select Sim 1 in Library and click the icon. Now go to the Design Parameters tab. There, in the Multiple Comparison Procedures box, uncheck the Dunnett’s single step box and check the Weighted Bonferroni box.

Next click on the Response Generation Info tab and look at the Table of Proportions. You will see that an additional column labeled Proportion of Alpha has been added. Here you have to specify the proportion of the total alpha you want to spend on each test. Ideally, the values in this column should add up to 1; if they do not, East will normalize them so that they add up to 1. By default, East distributes the total alpha equally among all tests. Here we have 3 tests in total, so each test has a proportion of alpha of 1/3, or 0.333. You can specify other proportions as well. For this example, keep the equal
proportion of alpha for each test.

Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 7.

The weighted Bonferroni MC procedure has global and disjunctive power of 73.7% and conjunctive power of 1.6%. Note that the powers of the weighted Bonferroni procedure are quite close to those of the Bonferroni procedure. This is because the weighted Bonferroni procedure with equal proportions is equivalent to the simple Bonferroni procedure. The exact result of the simulations may differ slightly, depending on the seed. Now select Sim 7 in the Output Preview and click the icon. This will save Sim 7 in Wbk1 in Library.

15.2.2 Data-driven step-down MC procedure

In the single step MC procedures, the decision to reject any hypothesis does not depend on the decision to reject other hypotheses. In the stepwise procedures, on the other hand, the decision on one hypothesis test can influence the decisions on the other tests. There are two types of stepwise procedures. One type proceeds in a data-driven order. The other type proceeds in a fixed order set a priori. Stepwise tests in a data-driven order can proceed in a step-down or step-up manner. East supports the Holm step-down MC procedure, which starts with the most significant comparison and continues as long as the tests are significant. The testing procedure stops the first time a non-significant comparison occurs, and all remaining hypotheses are retained. With the steps indexed by $i = k-1, k-2, \cdots, 1$, $H_{(k-i)}$ is rejected at step i if $p_{(k-i)} \le \frac{\alpha}{i}$, and the procedure moves on to the next step.

Holm’s step-down

As before, we will use the same Alzheimer’s disease example to illustrate Holm’s step-down procedure. Select Sim 1 in Library and click the icon. Now go to the Design Parameters tab.
In the Multiple Comparison Procedures box, uncheck the Dunnett’s single step box and check the Holm’s Step-down box.

Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 8.

Holm’s step-down procedure has global and disjunctive power of 74% and conjunctive power of 4.5%. The exact result of the simulations may differ slightly, depending on the seed. Now select Sim 8 in the Output Preview and click the icon. This will save Sim 8 in Wbk1 in Library.

15.2.3 Data-driven step-up MC procedures

Step-up tests start with the least significant comparison and continue as long as the tests are not significant. The first time a significant comparison occurs, testing stops and all remaining hypotheses are rejected. East supports two such MC procedures: the Hochberg step-up and Hommel step-up procedures. In the Hochberg step-up procedure, in the i-th step $H_{(k-i)}$ is retained if $p_{(k-i)} > \frac{\alpha}{i}$. In the Hommel step-up procedure, in the i-th step $H_{(k-i)}$ is retained if $p_{(k-j)} > \frac{i-j+1}{i} \alpha$ for $j = 1, \cdots, i$. Fixed sequence test and fallback test are the types of tests which proceed in a pre-specified order.

Hochberg’s and Hommel’s step-up procedures

Hochberg’s and Hommel’s step-up procedures are described below using the same Alzheimer’s disease example from section 15.1.1 on Dunnett’s single step. Since the design specification remains the same except that we are using the Hochberg and Hommel step-up procedures in place of the single step Dunnett’s test, we can set up the simulation in this section with little effort. Select Sim 1 in Library and click the icon. Now go to the Design Parameters tab.
There, in the Multiple Comparison Procedures box, uncheck the Dunnett’s single step box and check the Hochberg’s step-up and Hommel’s step-up boxes.

Now click Simulate to obtain power. Once the simulation run has completed, East will add two additional rows to the Output Preview labeled Sim 9 and Sim 10.

Hochberg and Hommel procedures have disjunctive and global powers of close to 74%.

15.2.4 Fixed-sequence stepwise MC procedures

In data-driven stepwise procedures, we do not have any control over the order in which the hypotheses are tested. However, sometimes, based on our preference or prior knowledge, we might want to fix the order of tests a priori. Fixed sequence test and fallback test are the types of tests which proceed in a pre-specified order. East supports both of these procedures.

Assume that $H_1, H_2, \cdots, H_{k-1}$ are ordered hypotheses and the order is pre-specified so that $H_1$ is tested first, followed by $H_2$, and so on. Let $p_1, p_2, \cdots, p_{k-1}$ be the associated raw marginal p-values. In the fixed sequence testing procedure, for $i = 1, \cdots, k-1$, in the i-th step, if $p_i < \alpha$, reject $H_i$ and go to the next step; otherwise retain $H_i, \cdots, H_{k-1}$ and stop.

The fixed sequence testing strategy is optimal when the early tests in the sequence have the largest treatment effects, and it performs poorly when the early hypotheses have small treatment effects or are nearly true (Westfall and Krishen 2001). The drawback of the fixed sequence test is that once a hypothesis is not rejected, no further testing is permitted. This leads to lower power to reject hypotheses tested later in the sequence.

The fallback test alleviates this undesirable feature of the fixed sequence test. Let $w_i$ be the proportion of $\alpha$ for testing $H_i$ such that $\sum_{i=1}^{k-1} w_i = 1$.
In the fallback procedure, in the i-th step ($i = 1, \cdots, k-1$), test $H_i$ at $\alpha_i = \alpha_{i-1} + \alpha w_i$ if $H_{i-1}$ is rejected and at $\alpha_i = \alpha w_i$ if $H_{i-1}$ is retained. If $p_i < \alpha_i$, reject $H_i$; otherwise retain it. Unlike the fixed sequence testing approach, the fallback procedure can continue testing even if a non-significant outcome is encountered. If a hypothesis in the sequence is retained, the next hypothesis in the sequence is tested at the level that would have been used by the weighted Bonferroni procedure. With $w_1 = 1$ and $w_2 = \cdots = w_{k-1} = 0$, the fallback procedure simplifies to the fixed sequence procedure.

Fixed sequence testing procedure

As before, we will use the same Alzheimer’s disease example to illustrate the fixed sequence testing procedure. Select Sim 1 in Library and click the icon. Now go to the Design Parameters tab. There, in the Multiple Comparison Procedures box, uncheck the Dunnett’s single step box and check the Fixed Sequence box.

Next click on the Response Generation Info tab and look at the Table of Proportions. You will see that an additional column labeled Test Sequence has been added. Here you have to specify the order in which the hypotheses will be tested. Specify 1 for the test that will be performed first, 2 for the test that will be performed next, and so on. By default East assigns 1 to the first test, 2 to the second test, and so on. For now we will keep the default, which means that H1 will be tested first, followed by H2 and finally H3.

Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 11.

The fixed sequence procedure with the specified sequence has global and disjunctive power of less than 7% and conjunctive power of 5%.
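To see why the testing order matters so much, the fixed sequence and fallback decision rules can be sketched in a few lines of Python. This is an illustration only, not East’s implementation: the function names and the three raw p-values are hypothetical, and the p-values are passed in the pre-specified testing order.

```python
def fixed_sequence(pvals, alpha):
    """Test hypotheses in the given order at level alpha; stop at the first retention."""
    rejected = []
    still_testing = True
    for p in pvals:
        if still_testing and p < alpha:
            rejected.append(True)
        else:
            still_testing = False  # this and all later hypotheses are retained
            rejected.append(False)
    return rejected


def fallback(pvals, weights, alpha):
    """Fallback test: alpha_i = alpha_{i-1} + alpha * w_i after a rejection,
    and alpha_i = alpha * w_i after a retention (weights normalized to sum to 1)."""
    total = sum(weights)
    rejected = []
    carried = 0.0  # level inherited from the preceding hypothesis if it was rejected
    for p, w in zip(pvals, weights):
        level = carried + alpha * w / total
        if p < level:
            rejected.append(True)
            carried = level
        else:
            rejected.append(False)
            carried = 0.0
    return rejected


alpha = 0.025
p_h1, p_h2, p_h3 = 0.20, 0.03, 0.004  # hypothetical raw p-values; H3 has the largest effect

print(fixed_sequence([p_h1, p_h2, p_h3], alpha))       # order (H1, H2, H3)
print(fixed_sequence([p_h3, p_h2, p_h1], alpha))       # order (H3, H2, H1)
print(fallback([p_h1, p_h2, p_h3], [1, 1, 1], alpha))  # order (H1, H2, H3), equal weights
```

In this toy case the fixed sequence test in the order (H1, H2, H3) stops at the first step and rejects nothing, whereas the fallback test in the same order can still reach and reject H3. Setting the weights to (1, 0, 0) makes the fallback function reproduce the fixed sequence decisions.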
The global and disjunctive power are small because the smallest treatment effect is tested first and the magnitude of the treatment effect increases gradually for the remaining tests. For optimal power in the fixed sequence procedure, the early tests in the sequence should have the larger treatment effects. In our case, Dose3 has the largest treatment effect, followed by Dose2 and Dose1. Therefore, to obtain optimal power, H3 should be tested first, followed by H2 and H1.

Select Sim 11 in the Output Preview and click the icon. Select Sim 11 in Library, click the icon and go to the Response Generation Info tab. In the Test Sequence column of the table, specify 3 for Dose1, 2 for Dose2 and 1 for Dose3.

Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 12.

Now the fixed sequence procedure with the pre-specified sequence (H3, H2, H1) has global and disjunctive power close to 85% and conjunctive power close to 5%. This example illustrates that the fixed sequence procedure is powerful provided the hypotheses are tested in a sequence of descending treatment effects. The fixed sequence procedure controls the FWER because, for each hypothesis, testing is conditional upon rejecting all hypotheses earlier in the sequence. The exact result of the simulations may differ slightly, depending on the seed. Select Sim 12 in the Output Preview and click the icon to save it in Library.

Fallback procedure

Again we will use the same Alzheimer’s disease example to illustrate the fallback procedure. Select Sim 1 in Library and click the icon.
Now go to the Design Parameters tab. There, in the Multiple Comparison Procedures box, uncheck the Dunnett’s single step box and check the Fallback box.

Next click on the Response Generation Info tab and look at the Table of Proportions. You will see two additional columns labeled Test Sequence and Proportion of Alpha. In the Test Sequence column, you have to specify the order in which the hypotheses will be tested. Specify 1 for the test that will be performed first, 2 for the test that will be performed next, and so on. By default East assigns 1 to the first test, 2 to the second test, and so on. For now we will keep the default, which means that H1 will be tested first, followed by H2 and finally H3.

In the Proportion of Alpha column, you have to specify the proportion of the total alpha you want to spend on each test. Ideally, the values in this column should add up to 1; if they do not, East will normalize them so that they add up to 1. By default East distributes the total alpha equally among all the tests. Here we have 3 tests in total, so each test has a proportion of alpha of 1/3, or 0.333. You can specify other proportions as well. For this example, keep the equal proportion of alpha for each test.

Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 13.

Now we will consider a sequence where H3 is tested first, followed by H2 and H1, because in our case Dose3 has the largest treatment effect, followed by Dose2 and Dose1. Select Sim 13 in the Output Preview and click the icon. Select Sim 13 in Library, click the icon and go to the Response Generation Info tab. In the Test Sequence column of the table, specify 3 for Dose1, 2 for Dose2 and 1 for Dose3.

Now click Simulate to obtain power.
Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 14.

Note that the fallback test is more robust to misspecification of the test sequence, whereas the fixed sequence test is very sensitive to the test sequence. If the test order is misspecified, the fixed sequence test performs very poorly.

15.3 Comparison of MC procedures

We have obtained the power (based on simulation) of different MC procedures for the Alzheimer’s disease example from section 15.1.1. The obvious question now is which MC procedure to choose. To compare the MC procedures, we will perform simulations for all of them under the following scenario.

Treatment arms: placebo, dose1 (dose=0.3 mg), dose2 (dose=1 mg) and dose3 (dose=2 mg), with group means of 0, 0.45, 1.5 and 3, respectively
Common standard deviation: 5
Type I Error: 0.025 (right-tailed)
Number of Simulations: 10000
Total Sample Size: 200
Allocation ratio: 1 : 1 : 1 : 1

For comparability of the simulation results, we have used the same seed for the simulations under all MC procedures. The following output displays the powers under the different MC procedures.

Here we have used equal proportions for the weighted Bonferroni and fallback procedures. For the two procedures that test in a fixed order (fixed sequence and fallback), two sequences have been used: (H1, H2, H3) and (H3, H2, H1). As expected, the Bonferroni and weighted Bonferroni procedures provide similar powers.

It appears that the fixed sequence procedure with the pre-specified sequence (H3, H2, H1) provides power of close to 85%, which is the maximum among all the procedures.
However, the fixed sequence procedure with the pre-specified sequence (H1, H2, H3) provides power of less than 7%. Therefore, power in the fixed sequence procedure depends largely on the specified testing sequence, and a misspecification might result in a huge drop in power. For this reason, the fixed sequence procedure may not be an appropriate MC procedure to use here.

Dunnett’s single step, step-down and step-up procedures are next in order after the fixed sequence procedure with the pre-specified sequence (H3, H2, H1). All three procedures attain close to 75% disjunctive power. However, all three procedures assume that all the treatment arms have equal variance. Therefore, if homogeneity of variance between the treatment arms is a reasonable assumption, Dunnett’s step-down or single step procedure should be the best option based on these simulation results. However, when the assumption of equal variance is not met, Dunnett’s procedure may not be appropriate, as the type I error might not be strongly controlled.

Next in the list are the fallback procedures. Both of them provide a little more than 73% power, which is very close to the power attained by Dunnett’s procedures. Therefore, unlike the fixed sequence procedure, the fallback procedure does not depend much on the order in which the hypotheses are tested. Moreover, it does not require the assumption of equal variance among the treatment arms to be met. For all these reasons, the fallback procedure seems to be the most appropriate MC procedure for the design we are interested in.

Now we will repeat the comparison, but this time with unequal variances between the treatment arms. Specifically, we simulate data under the following scenario to see how well the different procedures control the type I error rate.
Treatment arms: placebo, dose1 (dose=0.3 mg), dose2 (dose=1 mg) and dose3 (dose=2 mg), with group means of 0, 0, 0 and 0, respectively
Standard deviation: 5 for placebo, dose1 and dose2; 10 for dose3
Type I Error: 0.025 (right-tailed)
Number of Simulations: 1000000
Total Sample Size: 200
Allocation ratio: 1 : 1 : 1 : 1

The following output displays the type I error rate under the different MC procedures for the unequal variance case.

Note that the Dunnett tests slightly inflate the type I error rate, but all other procedures control the type I error rate below the nominal level of 0.025.

16 Multiple Endpoints-Gatekeeping Procedures

16.1 Introduction

Clinical trials are often designed to assess the benefits of a new treatment compared to a control treatment with respect to multiple clinical endpoints, which are divided into hierarchically ordered families. Typically, the primary family of endpoints defines the overall outcome of the trial, provides the basis for the regulatory claim and is included in the product label. The secondary families of endpoints play a supportive role and provide additional information for physicians, patients and payers, and hence are useful for enhancing the product label. Gatekeeping procedures are specifically designed to address this type of multiplicity problem by explicitly taking into account the hierarchical structure of the multiple objectives. The term gatekeeping refers to the hierarchical decision structure in which the higher ranked families serve as gatekeepers for the lower ranked families. The lower ranked families will not be tested if the higher ranked families are not passed. Two types of gatekeeping procedures are described in this chapter: the serial gatekeeping procedure and the parallel gatekeeping procedure.
In the next few sections, specific examples will be provided to illustrate how to design trials with each type of gatekeeping procedure. For more information about applications of gatekeeping procedures in a clinical trial setting and a literature review on this topic, please refer to Dmitrienko and Tamhane (2007).

16.2 Simulate Serial Gatekeeping Design

Serial gatekeeping procedures were studied by Maurer, Hothorn and Lehmacher (1995), Bauer et al. (1998) and Westfall and Krishen (2001). Serial gatekeepers are encountered in trials where endpoints are usually ordered from most important to least important. Reisberg et al. (2003) reported a study designed to investigate memantine, an N-methyl-D-aspartate (NMDA) antagonist, for the treatment of Alzheimer’s disease, in which patients with moderate-to-severe Alzheimer’s disease were randomly assigned to receive placebo or 20 mg of memantine daily for 28 weeks. The two primary efficacy variables were: (1) the Clinician’s Interview-Based Impression of Change Plus Caregiver Input (CIBIC-Plus) global score at 28 weeks, and (2) the change from baseline to week 28 in the Alzheimer’s Disease Cooperative Study Activities of Daily Living Inventory modified for severe dementia (ADCS-ADLsev). The CIBIC-Plus measures overall global change relative to baseline and is scored on a seven-point scale ranging from 1 (markedly improved) to 7 (markedly worse). For illustration purposes, we redefine the primary endpoint of clinician’s global assessment score as 7 minus the CIBIC-Plus score, so that a larger value indicates improvement (0 markedly worse and 6 markedly improved). The secondary efficacy endpoints included the Severe Impairment Battery and other measures of cognition, function, and behavior. Suppose that the trial is declared successful only if the treatment effect is demonstrated on both primary endpoints.
If the trial is successful, it is of interest to assess the two secondary endpoints: (1) Severe Impairment Battery (SIB), (2) Mini-Mental State Examination (MMSE). The SIB was designed to evaluate cognitive performance in advanced Alzheimer’s disease. A 51-item scale, it assesses social interaction, memory, language, visuospatial ability, attention, praxis, and construction. The scores range from 0 (greatest impairment) to 100. The MMSE is a 30-point scale that measures cognitive function. The means of the endpoints for subjects in the control group and experimental group and the common covariance matrix are as follows:

Endpoint       Mean Treatment   Mean Control
CIBIC-Plus          2.6              2.3
ADCS-ADLsev        -2.5             -4.5
SIB                -6.5            -10
MMSE               -0.4             -1.2

Common covariance matrix:

               CIBIC-Plus   ADCS-ADLsev   SIB     MMSE
CIBIC-Plus        1.2           3.6         6.8     1.6
ADCS-ADLsev       3.6          42          38       9.3
SIB               6.8          38         145      17
MMSE              1.6           9.3        17       8

Typically the power of gatekeeping procedures cannot be computed analytically. Simulations can be used to assess the operating characteristics of different designs. For example, one could simulate the power for given sample sizes. To start the simulations, click Two Samples in the Design tab and select Multiple Comparisons-Multiple Endpoints to see the following input windows.

On the top of this input window, one needs to specify the total number of endpoints and other input parameters such as Rejection Region, Type I Error and Sample Size. One also needs to select the multiple comparison procedure which will be used to test the last family of endpoints. The type I error specified on this screen is the nominal level of the familywise error rate, which is defined as the probability of falsely declaring the efficacy of the new treatment compared to control with respect to any endpoint.
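The common covariance matrix above can be easier to read once it is split into standard deviations and correlations. The following is a minimal sketch of that conversion in plain Python; the function name `cov_to_sd_corr` is hypothetical and is not part of East.

```python
import math

# Common covariance matrix from the Alzheimer's disease example
# (order: CIBIC-Plus, ADCS-ADLsev, SIB, MMSE).
labels = ["CIBIC-Plus", "ADCS-ADLsev", "SIB", "MMSE"]
cov = [
    [1.2, 3.6, 6.8, 1.6],
    [3.6, 42.0, 38.0, 9.3],
    [6.8, 38.0, 145.0, 17.0],
    [1.6, 9.3, 17.0, 8.0],
]


def cov_to_sd_corr(cov):
    """Split a covariance matrix into standard deviations and correlations."""
    k = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(k)]
    corr = [[cov[i][j] / (sd[i] * sd[j]) for j in range(k)] for i in range(k)]
    return sd, corr


sd, corr = cov_to_sd_corr(cov)
for name, s, row in zip(labels, sd, corr):
    print(f"{name:12s} sd={s:7.3f}  " + " ".join(f"{r:5.2f}" for r in row))
```

For this example all pairwise correlations come out close to 0.5, which is useful to keep in mind when interpreting the conjunctive power reported later.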
For the Alzheimer’s disease example, CIBIC-Plus and ADCS-ADLsev form the primary family, and the other endpoints, SIB and MMSE, form the secondary family. Suppose that we would like to see the power for a sample size of 250 at a nominal type I error rate of 0.025, using the Bonferroni test for the secondary family. Then the input window looks as follows.

Behind the window for Simulation Parameters, there is another tab labeled Response Generation Info. The Response Generation Info tab shown below allows one to specify the underlying joint distribution of the multiple endpoints for the control arm and for the experimental arm. The joint distribution of the endpoints is assumed to be multivariate normal with a common covariance matrix. One also needs to specify which family each endpoint belongs to in the column labeled Family Rank. One can also customize the label for each endpoint. For the Alzheimer’s disease example, the inputs for this window should be specified as follows.

One can specify the number of simulations to be performed on the tab labeled Simulation Control Info. By default, 10000 simulations will be performed. One can also save the summary statistics for each simulated trial or save subject-level data by checking the appropriate box in the output option area. To simulate this design, click the Simulate button at the bottom right of the screen to see the preliminary output displayed in the output preview area, as seen in the following screen. All the results displayed in the yellow cells are summary outputs generated from simulations, for example the attained FWER, the number of families, the conjunctive power for the primary family, and the conjunctive power and disjunctive power for the last family.
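The decision logic of a serial gatekeeper can be sketched for a single trial as follows. This is an illustration under stated assumptions, not East's documented algorithm: each endpoint in a gatekeeper family is tested at the full level α and the next family is opened only if all of them are rejected, while the last family is tested with a Bonferroni split. The function name and the p-values are hypothetical, and the single-family special case is not handled.

```python
def serial_gatekeeping(p_values, family_rank, alpha=0.025):
    """Return the set of endpoints rejected by a simple serial gatekeeper.

    Assumptions (illustrative only):
    - every family except the last is a gatekeeper whose endpoints are each
      tested at level alpha, and the next family is opened only if ALL of
      them are rejected;
    - the last family is tested with a Bonferroni split of alpha.
    """
    ranks = sorted(set(family_rank.values()))
    rejected = set()
    for rank in ranks:
        members = [e for e, r in family_rank.items() if r == rank]
        is_last = rank == ranks[-1]
        level = alpha / len(members) if is_last else alpha
        passed = {e for e in members if p_values[e] <= level}
        rejected |= passed
        if not is_last and len(passed) < len(members):
            break  # gate stays closed: lower ranked families are not tested
    return rejected


rank = {"CIBIC-Plus": 1, "ADCS-ADLsev": 1, "SIB": 2, "MMSE": 2}
# Hypothetical one-sided p-values for one simulated trial.
p_open = {"CIBIC-Plus": 0.010, "ADCS-ADLsev": 0.020, "SIB": 0.011, "MMSE": 0.030}
p_closed = {"CIBIC-Plus": 0.010, "ADCS-ADLsev": 0.040, "SIB": 0.001, "MMSE": 0.001}
print(sorted(serial_gatekeeping(p_open, rank)))
print(sorted(serial_gatekeeping(p_closed, rank)))
```

In the second call the secondary endpoints are never tested, however small their p-values, because ADCS-ADLsev fails at level α and the gate stays closed.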
To view the detailed output, first save the simulation into a workbook in the library by clicking on the tool button. A simulation node appears in the library as shown in the following screen. Now double click on the simulation node Sim1 to see the detailed output as shown in the following screen. The detailed output summarizes all the main input parameters, such as the multiple comparison procedure used for the last family of endpoints, the nominal type I error level, the total sample size and the mean values for each endpoint in the control arm and in the experimental arm. It also displays the attained overall FWER, conjunctive power and disjunctive power; the FWER and conjunctive power for each gatekeeper family; and the FWER, conjunctive power and disjunctive power for the last family. The definitions of the different types of power are as follows:

Overall Power and FWER:
  Global: probability of declaring significance on any of the endpoints
  Conjunctive: probability of declaring significance on all of the endpoints for which the treatment arm is truly better than the control arm
  Disjunctive: probability of declaring significance on any of the endpoints for which the treatment arm is truly better than the control arm
  FWER: probability of making at least one type I error among all the endpoints

Power and FWER for Individual Gatekeeper Family except the Last Family:
  Conjunctive Power: probability of declaring significance on all of the endpoints in the particular gatekeeper family for which the treatment arm is truly better than the control arm
  FWER: probability of making at least one type I error when testing the endpoints in the particular gatekeeper family

Power and FWER for the Last Family:
  Conjunctive Power: probability of declaring significance on all of the endpoints in the last family for which the treatment arm is truly better than the control arm
  Disjunctive Power: probability of declaring significance on any of the endpoints in the last family for which the treatment arm is truly better than the control arm
  FWER: probability of making at least one type I error when testing the endpoints in the last family

Marginal Power: probability of declaring significance on the particular endpoint

For the Alzheimer’s disease example, the conjunctive power for the primary family, which characterizes the power for the study, is 46.9% for a total sample size of 250. Using the Bonferroni test for the last family, the design has a 40.5% probability (disjunctive power for the last family) of detecting the benefit of memantine with respect to at least one of the two secondary endpoints, SIB and MMSE. It has a 25.1% chance (conjunctive power for the last family) of declaring the benefit of memantine with respect to both of the secondary endpoints.

One can find the sample size needed to achieve a target power by simulating multiple designs in batch mode. For example, one could simulate a batch of designs with the sample size ranging from 250 to 500 in steps of 50, as shown in the following window.

Note that a total sample size somewhere between 450 and 500 provides 80% power to detect the mean differences for both primary endpoints, CIBIC-Plus and ADCS-ADLsev, as seen in the following window.

To get a more precise sample size that achieves 80% power, one could simulate a batch of designs with the sample size ranging from 450 to 500 in steps of 10. One will notice that a sample size of 480 provides over 80% power to claim significant differences with respect to both primary endpoints.
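The batch search over sample sizes can be mimicked outside East with a small Monte Carlo sketch. It is only an approximation under assumptions that are not taken from the manual: the two primary endpoints are tested with large-sample z statistics, allocation is equal, and each endpoint is tested one-sided at 0.025. The function name `coprimary_power`, the seed and the number of simulations are hypothetical choices.

```python
import math
import random
from statistics import NormalDist


def coprimary_power(total_n, n_sims=20000, alpha=0.025, seed=1):
    """Monte Carlo power to reject BOTH primary endpoints (z approximation)."""
    # Mean differences and (co)variances from the Alzheimer's disease example.
    diff = (2.6 - 2.3, -2.5 - (-4.5))      # CIBIC-Plus, ADCS-ADLsev
    var = (1.2, 42.0)
    rho = 3.6 / math.sqrt(var[0] * var[1])
    # Equal allocation: Var(difference in means) = 4 * variance / total_n.
    drift = [d / math.sqrt(4.0 * v / total_n) for d, v in zip(diff, var)]
    crit = NormalDist().inv_cdf(1.0 - alpha)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        e1 = rng.gauss(0.0, 1.0)
        # Second noise term correlated with the first at level rho.
        e2 = rho * e1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if drift[0] + e1 > crit and drift[1] + e2 > crit:
            hits += 1
    return hits / n_sims


for n in range(250, 501, 50):
    print(n, round(coprimary_power(n), 3))
```

Because the same seed is reused for every sample size, the estimated power is non-decreasing in the sample size, which makes the coarse grid followed by a finer grid easy to read. The estimates are expected to be close to, but not identical to, the values East reports.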
One could compare the multiple designs side by side by clicking on the tool button in the output preview area as follows:

There is a special case where all the endpoints belong to one single family. The software handles this special case in a particular manner. The Intersection-Union test is applied to a single family of endpoints, and the MCP selected for the last family in the Simulation Parameters tab is not applicable in this special case. For the Alzheimer’s disease example, if we are only interested in testing the two endpoints (CIBIC-Plus and ADCS-ADLsev) as co-primary endpoints, as indicated by the family rank in the window for Response Generation Info, then the Intersection-Union test is applied to the two endpoints so that each endpoint is tested at the nominal level α. The detailed output window is slightly different in the case of a single family of endpoints, as seen in the following window.

16.3 Simulate Parallel Gatekeeping Design

Parallel gatekeeping procedures are often used in clinical trials with several primary objectives where each individual objective can characterize a successful trial outcome. In other words, the trial can be declared successful if at least one primary objective is met. Consider a randomized, double-blind, parallel-group clinical trial to compare two vaccines against the human papilloma virus. Denote by vaccine T the new vaccine and by vaccine C the comparator. The primary objective of this study is to demonstrate that vaccine T is superior to vaccine C for antigen type 16 or 18; these two types account for 70% of cervical cancer cases globally. If the new vaccine shows superiority over the comparator with respect to either antigen type 16 or 18, it is of interest to test the superiority of vaccine T to vaccine C for antigen type 31 or 45.
The two vaccines are compared based on the immunological response, i.e. the number of T-cells in the blood, seven months after vaccination. Assume that the log-transformed data are normally distributed with mean $\mu_{iT}$ or $\mu_{iC}$ ($i = 1, 2, 3, 4$), where the indices 1, 2, 3 and 4 represent the four antigen types, respectively. The null hypotheses and alternative hypotheses can be formulated as

$H_{i0}: \mu_{iT} - \mu_{iC} \le 0$ vs $H_{i1}: \mu_{iT} - \mu_{iC} > 0$

The parallel gatekeeping test strategy is suitable for this example. The two null hypotheses $H_{10}$ and $H_{20}$ for antigen types 16 and 18 constitute the primary family, which serves as the gatekeeper for the second family of hypotheses, which contains $H_{30}$ and $H_{40}$. Assume that the means and the standard deviations for all four antigen types are as follows:

Table 16.1: Mean response and Standard Deviation

Endpoints   Mean for Vaccine C   Mean for Vaccine T   Standard Deviation
Type 16          4                    4.57                 0.5
Type 18          3.35                 4.22                 0.5
Type 31          2                    2.34                 0.6
Type 45          1.42                 2                    0.3

Assume that the total sample size is 20 and the one-sided significance level is 0.025. To assess the operating characteristics of the parallel gatekeeping procedures, we first need to open the simulation window for multiple endpoints. To this end, click on the Design menu, choose Two Samples for continuous endpoints and then select Multiple Endpoints from the drop-down list. The following screen will show up.

At the top of the above screen, one needs to specify the total number of endpoints. The lower part of the above screen is the Simulation Parameters tab, which allows one to specify the important design parameters, including the nominal type I error rate, the total sample size and the multiple comparison procedures.
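Before running the simulations, it can help to see how strongly each hypothesis in Table 16.1 is supported by the assumed effect sizes. The sketch below computes the unadjusted power of a one-sided two-sample z-test for each antigen type. It assumes a large-sample normal approximation and equal allocation; with only 20 subjects in total, the figures are rough and may differ noticeably from East's simulated, multiplicity-adjusted values. The function name `approx_power` is hypothetical.

```python
import math
from statistics import NormalDist

# Table 16.1: (mean for vaccine C, mean for vaccine T, standard deviation).
endpoints = {
    "Type 16": (4.00, 4.57, 0.5),
    "Type 18": (3.35, 4.22, 0.5),
    "Type 31": (2.00, 2.34, 0.6),
    "Type 45": (1.42, 2.00, 0.3),
}


def approx_power(mean_c, mean_t, sd, total_n, alpha):
    """Approximate power of a one-sided z-test of H0: mu_T - mu_C <= 0."""
    se = sd * math.sqrt(4.0 / total_n)     # equal allocation, common SD
    drift = (mean_t - mean_c) / se
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    return 1.0 - NormalDist().cdf(z_crit - drift)


powers = {
    name: approx_power(mc, mt, sd, total_n=20, alpha=0.025)
    for name, (mc, mt, sd) in endpoints.items()
}
for name, pw in powers.items():
    print(name, round(pw, 3))
```

The ordering is the useful part: antigen type 31 has by far the smallest standardized effect, so a low power for detecting both secondary endpoints is to be expected whatever gatekeeping method is chosen.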
Now select Parallel Gatekeeping and choose Bonferroni as the parallel gatekeeping method. For the last family, select Bonferroni as the multiple testing procedure. Next to the Simulation Parameters tab are two additional tabs: Response Generation Info and Simulation Control Info. We need to specify the mean responses for each endpoint for both the treatment and control arms, as well as the covariance structure among the endpoints. In addition, we need to specify which family each endpoint belongs to in the column labeled Family Rank, in the same table used for specifying the mean responses. There are two ways of specifying the covariance structure: Covariance Matrix or Correlation Matrix. If the Correlation Matrix option is selected, one needs to input the standard deviation for each endpoint in the same table used for specifying the mean responses. There is a simpler way to input the standard deviations if all the endpoints share a common standard deviation: check the box for Common Standard Deviation and specify the value of the common standard deviation in the box to the right. One also needs to specify the correlations among the endpoints in the table on the right hand side. Similarly, if all the endpoints have a common correlation, we can just check the box for Common Correlation and specify the value of the common correlation in the box to the right. For the vaccine example, assume the endpoints share a common mild correlation of 0.3. Then the window with completed inputs for generating data looks like the following screen.

In the window for Simulation Control Info, we can specify the total number of simulations, the refresh frequency and the type of random number seed. We can also choose to save the simulation data for more advanced analyses. After specifying all the
input parameter values, click on the Simulate button at the bottom right of the window to run the simulations. The progress window will report how many simulations have been completed, as seen in the following screen.

When all the requested simulations have been completed, click on the Close button at the bottom right of the progress report screen. The preliminary simulation summary will show up in the output preview window, where one can see the overall power summary and the power summary for the primary family, as well as the attained overall FWER.

To see the detailed output, we need to save the simulation in the workbook by clicking on the icon at the top of the output preview window. A simulation node will be appended to the corresponding workbook in the library, as seen in the following window.

Next double click on the simulation node in the library, and the detailed outputs will be displayed.

When testing multiple endpoints, the definition of power is not unique. East provides the overall power summary and the power summary for each specific family. In the overall power summary table, the following types of power are provided along with the overall FWER: global power, conjunctive power and disjunctive power, which capture the overall performance of this gatekeeping procedure.
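The Correlation Matrix input described earlier in this section (per-endpoint standard deviations plus a common correlation) determines the covariance matrix used to generate the responses. A minimal sketch of that relationship, using the vaccine example values, is shown below; the function name `covariance_from_common_corr` is hypothetical and this is not East code.

```python
def covariance_from_common_corr(sds, rho):
    """Build a covariance matrix from per-endpoint SDs and one common correlation."""
    k = len(sds)
    return [
        [sds[i] * sds[j] * (1.0 if i == j else rho) for j in range(k)]
        for i in range(k)
    ]


# Vaccine example: SDs for antigen types 16, 18, 31, 45; common correlation 0.3.
cov_vaccine = covariance_from_common_corr([0.5, 0.5, 0.6, 0.3], rho=0.3)
for row in cov_vaccine:
    print(" ".join(f"{v:7.4f}" for v in row))
```

Changing `rho` to 0.5 or 0.8 reproduces the covariance inputs for the robustness runs discussed later in this section.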
The definitions of the powers are given below:

Overall Power and FWER:
  Global: probability of declaring significance on any of the endpoints
  Conjunctive: probability of declaring significance on all of the endpoints for which the treatment arm is truly better than the control arm
  Disjunctive: probability of declaring significance on any of the endpoints for which the treatment arm is truly better than the control arm
  FWER: probability of making at least one type I error among all the endpoints

Power and FWER for Individual Gatekeeper Families except the Last Family:
  Conjunctive Power: probability of declaring significance on all of the endpoints in the particular gatekeeper family for which the treatment arm is truly better than the control arm
  Disjunctive Power: probability of declaring significance on any of the endpoints in the particular gatekeeper family for which the treatment arm is truly better than the control arm
  FWER: probability of making at least one type I error when testing the endpoints in the particular gatekeeper family

Power and FWER for the Last Family:
  Conjunctive Power: probability of declaring significance on all of the endpoints in the last family for which the treatment arm is truly better than the control arm
  Disjunctive Power: probability of declaring significance on any of the endpoints in the last family for which the treatment arm is truly better than the control arm
  FWER: probability of making at least one type I error when testing the endpoints in the last family

Marginal Power: probability of declaring significance on the particular endpoint

For the vaccine example, we see that the gatekeeping procedure using the Bonferroni test for both the primary family and the secondary family provides 94.49% power to detect the difference in at least one of the two antigen types 16 and 18. It provides 52.19% power to detect the differences in both antigen types.
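The definitions above translate directly into counts over simulated trials. The sketch below shows one way to compute them from a table of rejection indicators. The function name `power_summary` and the four toy trials are hypothetical; East's own output is produced internally and is not reproduced here.

```python
def power_summary(rejections, truly_better):
    """Estimate global, conjunctive, disjunctive, marginal power and FWER.

    rejections: list of per-trial lists of booleans (one flag per endpoint).
    truly_better: booleans, True where the treatment is truly better.
    """
    n = len(rejections)
    alt = [i for i, t in enumerate(truly_better) if t]
    null = [i for i, t in enumerate(truly_better) if not t]
    glob = sum(any(r) for r in rejections) / n
    conj = sum(all(r[i] for i in alt) for r in rejections) / n if alt else float("nan")
    disj = sum(any(r[i] for i in alt) for r in rejections) / n
    fwer = sum(any(r[i] for i in null) for r in rejections) / n
    marginal = [sum(r[i] for r in rejections) / n for i in range(len(truly_better))]
    return {"global": glob, "conjunctive": conj, "disjunctive": disj,
            "fwer": fwer, "marginal": marginal}


# Toy data: four simulated trials, three endpoints, the third one truly null.
toy = [
    [True, True, False],
    [True, False, False],
    [False, False, True],
    [False, False, False],
]
summary = power_summary(toy, truly_better=[True, True, False])
print(summary)
```

Restricting the endpoint indices to one family gives the family-level versions of the same quantities.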
Also note that this gatekeeping procedure provides 89.55% power to detect the response difference in at least one of the other two antigen types, 31 and 45, but only 12.53% power to detect the difference in both antigen types 31 and 45. The marginal power table displays the probabilities of declaring significance on the particular endpoint after multiplicity adjustment. For example, the power of detecting antigen type 16 is 55.22%. If it is of interest to assess the robustness of this procedure with respect to the correlation among the different endpoints, we can go back to the input window, change the correlations and run the simulation again. To this end, right click on the Sim1 node in the library and select Edit Simulation from the dropdown list. Next click on the Response Generation Info tab, change the common correlation to 0.5 and click the Simulate button. We can repeat this for a common correlation of 0.8. The following table summarizes the power comparisons under different correlation assumptions. Note that the disjunctive power decreases as the correlation increases and the conjunctive power increases as the correlation increases.

Table 16.2: Power Comparisons under Different Correlation Assumptions

              Primary Family         Secondary Family       Overall Power
Correlation   Disjunct.  Conjunct.   Disjunct.  Conjunct.   Disjunct.  Conjunct.
0.3           0.9449     0.5219      0.8955     0.1253      0.9449     0.1012
0.5           0.9324     0.5344      0.8867     0.1327      0.9324     0.1192
0.8           0.9174     0.5497      0.8855     0.1413      0.9174     0.1402

There are three available parallel gatekeeping methods: Bonferroni, Truncated Holm and Truncated Hochberg. The multiple comparison procedures applied to the gatekeeper families need to satisfy the so-called separable condition. A multiple comparison procedure is separable if the type I error rate under any partial null configuration is strictly less than the nominal level α. Bonferroni is a separable procedure.
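To make the separable condition concrete, the sketch below implements a truncated Holm test for one gatekeeper family together with the amount of α that remains for the next family. It assumes the usual convex-combination form found in the gatekeeping literature, with a truncation constant γ between 0 and 1 (γ = 0 corresponds to Bonferroni, γ = 1 to the regular Holm test). East's exact implementation is not shown in this chapter, so the formulas, the function name and the p-values should be read as assumptions.

```python
def truncated_holm(p_values, alpha, gamma):
    """Truncated Holm test for one gatekeeper family.

    Returns (rejected flags, alpha left over for the next family).
    gamma = 0 gives Bonferroni; gamma = 1 gives the regular Holm test.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for step, idx in enumerate(order):           # step = 0, 1, ..., m-1
        # Convex combination of the Holm and Bonferroni critical values.
        crit = (gamma / (m - step) + (1.0 - gamma) / m) * alpha
        if p_values[idx] > crit:
            break                                # step-down: stop at first failure
        rejected[idx] = True
    n_accepted = m - sum(rejected)
    if n_accepted == 0:
        used = 0.0                               # all rejected: full alpha passes on
    else:
        used = (gamma + (1.0 - gamma) * n_accepted / m) * alpha
    return rejected, alpha - used


# Hypothetical p-values for a primary family with two endpoints.
for gamma in (0.0, 0.5, 0.8):
    print(gamma, truncated_holm([0.010, 0.020], alpha=0.025, gamma=gamma))
```

When only part of the family is rejected, the α left over shrinks as γ grows and reaches zero at γ = 1, which is why the regular Holm test cannot serve as a parallel gatekeeper.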
However, the regular Holm and Hochberg procedures are not separable and can’t be applied directly to the gatekeeper families. The truncated versions, obtained by taking convex combinations of the critical constants for the regular Holm/Hochberg procedure and the Bonferroni procedure, are separable and more powerful than the Bonferroni test. The truncation constant controls the degree of conservativeness: a larger value of the truncation constant results in a more powerful procedure. If the truncation constant is set to 1, the procedure reduces to the regular Holm or Hochberg test. To see this, let’s simulate the design using the truncated Holm procedure for the primary family and the Bonferroni test for the second family for the vaccine example with common correlation 0.3. Table 16.3 compares the conjunctive power and disjunctive power for each family, and the overall ones, for different truncation parameter values. As the value of the truncation parameter increases, the conjunctive power for the primary family increases and the disjunctive power remains unchanged. Both the conjunctive power and disjunctive power for the secondary family decrease as we increase the truncation parameter. The overall conjunctive power also increases, but the overall disjunctive power remains the same as the truncation parameter increases. Table 16.4 shows the marginal powers of this design for different truncation parameter values. The marginal powers for the two endpoints in the primary family increase. On the other hand, the marginal powers for the two endpoints in the secondary family decrease.

Table 16.3: Impact of Truncation Constant in Truncated Holm Procedure on Overall Power and Power for Each Family

Truncation   Primary Family         Secondary Family       Overall Power
Constant     Conjunct.  Disjunct.   Conjunct.  Disjunct.   Conjunct.  Disjunct.
0            0.5219     0.9449      0.1253     0.8955      0.1012     0.9449
0.25         0.5647     0.9449      0.1229     0.8872      0.1065     0.9449
0.5          0.5988     0.9449      0.1212     0.8747      0.1108     0.9449
0.8          0.6327     0.9449      0.1188     0.84        0.115      0.9449

Table 16.4: Impact of Truncation Constant in Truncated Holm Procedure on Marginal Power

Truncation   Primary Family        Secondary Family
Constant     Type 16   Type 18     Type 31   Type 45
0            0.5522    0.9146      0.127     0.8938
0.25         0.5886    0.921       0.1246    0.8855
0.5          0.6183    0.9254      0.1227    0.8731
0.8          0.6483    0.9293      0.1203    0.8385

Tables 16.5 and 16.6 display the operating characteristics for the truncated Hochberg test with different truncation constant values. Note that both the conjunctive and disjunctive powers for the primary family increase as the truncation parameter increases. However, the power for the secondary family decreases with larger truncation parameter values. The marginal powers for the primary family and for the secondary family behave similarly. The overall conjunctive and disjunctive powers also increase as we increase the truncation parameter.

Table 16.5: Impact of Truncation Constant in Truncated Hochberg Procedure on Overall Power and Power for Each Family

Truncation   Primary Family         Secondary Family       Overall Power
Constant     Conjunct.  Disjunct.   Conjunct.  Disjunct.   Conjunct.  Disjunct.
0            0.5219     0.9449      0.1253     0.8955      0.1012     0.9449
0.25         0.5652     0.9455      0.1229     0.8877      0.1065     0.9455
0.5          0.6007     0.9468      0.1213     0.8764      0.1109     0.9468
0.8          0.6369     0.9491      0.119      0.8439      0.1152     0.9491

Table 16.6: Impact of Truncation Constant in Truncated Hochberg Procedure on Marginal Power

Truncation   Primary Family        Secondary Family
Constant     Type 16   Type 18     Type 31   Type 45
0            0.5522    0.9146      0.127     0.8938
0.25         0.5892    0.9215      0.1246    0.886
0.5          0.6203    0.9273      0.1228    0.8749
0.8          0.6525    0.9335      0.1205    0.8424

If all the endpoints belong to one single family, the multiple testing procedure selected for the last family (Bonferroni, Sidak, Weighted Bonferroni, Holm’s step down, Hochberg’s step up, Hommel’s step up, Fixed Sequence or Fallback) will be applied for multiplicity adjustment. For example, if all four antigen types in the vaccine example are treated as primary endpoints, as indicated by the family rank in the window for Response Generation Info, and Hochberg’s step up test is selected for the last family in the window for Simulation Parameters, then the regular Hochberg test will be applied to the four endpoints for multiplicity adjustment. The detailed output window is slightly different in the case of a single family of endpoints, as seen in the following window.

17 Continuous Endpoint: Multi-arm Multi-stage (MaMs) Designs

17.1 Design

Consider designing a placebo-controlled, double-blind, randomized trial to evaluate the efficacy, pharmacokinetics, safety and tolerability of a new therapy given as multiple weekly infusions in subjects with a recent acute coronary syndrome. There are four dose regimens to be investigated. The treatment effect is assessed through the change in PAV (percent atheroma volume) from baseline to Day 36 post-randomization, as determined by IVUS (intravascular ultrasound).
The expected changes in PAV for the placebo group and the four dose regimens are 0, 1, 1.1, 1.2 and 1.3, and the common standard deviation is 3. The objective of the study is to find the optimal dose regimen based on the totality of the evidence, including benefit-risk assessment and cost considerations. To design such a study in East, we first need to invoke the design dialog window. To this end, click on the Design menu at the top of the East window, select Many Samples for continuous type of response and then select Multiple Looks-Group Sequential in the drop-down list, as shown in the following screen shot.

After selecting the design, we will see a dialog window for specifying the main design parameters. At the top of the window, we need to specify the number of arms, including the control arm, and the number of looks. We also need to specify the nominal significance level, the power or sample size, the mean response for each arm, the standard deviation for each arm and the allocation ratio of each arm to the control arm. Suppose we would like to compute the sample size needed to achieve 90% power at a one-sided 0.025 significance level. After filling in all the inputs, the design dialog window looks as follows:

Now click on the Compute button at the bottom right of the window to see the total sample size. Note that we need 519 subjects. Here the power is the probability of successfully detecting a significant difference for at least one active treatment group compared to the control arm. Suppose that we would now like to create a group sequential design with interim looks, so that the trial can be terminated earlier if one or more of the treatment groups demonstrate overwhelming efficacy. To do this, we change the number of looks to 3. Note that another tab shows up beside the Test Parameters tab.
This new tab, labeled Boundary, is used to specify the efficacy boundary, the futility boundary and the spacing of looks. Suppose we want to take two interim looks with equal spacing, using the O’Brien-Fleming spending function of Lan and DeMets (1984). The input window looks like the following.

One can view the boundaries on other scales, including the score, δ and p-value scales, by clicking the drop-down box for the boundary scale. For example, the δ scale boundary for this study is 2.904, 1.486 and 1.026. Now click on the Compute button at the bottom right of the window to create the design. Note that the total sample size needed to achieve 90% power is now 525, compared to 519 for the fixed sample design created earlier. The power here is defined as the probability of successfully detecting any active treatment group which is significantly different from the control group at any look. To view the detailed design output, keep the design in the library and double click the design node.

The first table shows the sample size information, including the maximum sample size if the trial goes all the way to the end and the sample size per arm. It also shows the expected sample size under the global null, where none of the active treatment groups is different from the control group, and the expected sample size under the design alternative specified by the user. The second table displays the look-by-look information, including sample size, cumulative type I error, boundaries, and boundary crossing probabilities under the global null and under the user-specified design alternative. The boundary crossing probability at each look shows the likelihood of at least one active treatment group crossing the boundary at that particular look.
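The cumulative type I error at each look follows from the chosen spending function. As a sketch, the Lan-DeMets O'Brien-Fleming-type spending function for a one-sided level α is commonly written as α(t) = 2 − 2Φ(z_{α/2}/√t), where t is the information fraction. The code below evaluates it at three equally spaced looks; the function name `obf_spending` is hypothetical, and the boundaries themselves, which require the multi-arm joint distribution, are not computed here.

```python
from statistics import NormalDist


def obf_spending(t, alpha=0.025):
    """Cumulative alpha spent at information fraction t (Lan-DeMets OBF type)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - NormalDist().cdf(z / t ** 0.5))


looks = [1 / 3, 2 / 3, 1.0]
cumulative = [obf_spending(t) for t in looks]
incremental = [c - p for c, p in zip(cumulative, [0.0] + cumulative[:-1])]
for t, c, inc in zip(looks, cumulative, incremental):
    print(f"t={t:.3f}  cumulative alpha={c:.5f}  incremental alpha={inc:.5f}")
```

Very little α is spent at the first look, which is consistent with the large first-look boundary on the δ scale quoted above.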
The third table shows the Z scale boundary.

One can also add a futility boundary to the design by clicking on the drop-down box for the futility boundary family. There are three families of futility boundaries: Spending Function, p value and δ, as seen in the following screen.

Now click on the Recalc button to see the cumulative α, efficacy boundary, cumulative β and futility boundary displayed in the boundary table. The futility boundary is non-binding, and the details of its computation are provided in Section J.2. The futility boundary is computed such that the probability for the best-performing arm (compared to the control arm) to cross the futility boundary at any look is equal to the incremental β. For example, the probability of the best-performing treatment arm crossing 0.178 is 0.005 under the design alternative. The probability for the trial to stay in the continuation region at the first look but cross the futility boundary 1.647 at the second look is 0.04, which is the incremental β spent. Now click on Compute to see the sample size required to achieve 90% power.

Note that we need a larger sample size of 560 to achieve the same target power with the futility boundary, compared to the design without the futility boundary. However, the expected sample size under H0 with the futility boundary is much smaller than for the design without futility. One can also build a futility boundary based on δ. For example, one might want to terminate the study if a negative δ is observed.

It can be seen that such a futility boundary is more conservative than the one constructed from the O’Brien-Fleming spending function, in the sense that it terminates the trial early for futility with smaller probability.
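As a rough cross-check of the fixed-sample design created at the start of this section (519 subjects, 90% power), one can simulate the maximum of the four treatment-versus-control z statistics. The sketch below rests on assumptions not stated in the manual: a known common standard deviation of 3, equal allocation across five arms, and a critical value estimated empirically under the global null rather than taken from East. The function name, seed and number of simulations are hypothetical.

```python
import math
import random


def simulate_max_z(drifts, n_sims, rng):
    """Draw the maximum of many-to-one z statistics sharing one control arm."""
    out = []
    for _ in range(n_sims):
        control = rng.gauss(0.0, 1.0)
        # (arm - control) / sqrt(2) has unit variance; pairs correlate at 0.5.
        zs = [d + (rng.gauss(0.0, 1.0) - control) / math.sqrt(2.0) for d in drifts]
        out.append(max(zs))
    return out


rng = random.Random(2016)
n_sims = 20000
n_per_arm = 519 / 5                       # equal allocation over five arms
se = 3.0 * math.sqrt(2.0 / n_per_arm)     # common standard deviation 3
drifts = [d / se for d in (1.0, 1.1, 1.2, 1.3)]

null_max = sorted(simulate_max_z([0.0] * 4, n_sims, rng))
crit = null_max[int(0.975 * n_sims)]      # empirical one-sided 0.025 critical value
alt_max = simulate_max_z(drifts, n_sims, rng)
power = sum(m > crit for m in alt_max) / n_sims
print(f"critical value ~ {crit:.2f}, power for at least one arm ~ {power:.3f}")
```

Under these assumptions the estimated power should fall in the neighbourhood of the 90% design power, up to Monte Carlo error.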
17.2 Simulation

A multi-arm multi-stage design is a complex study design with pros and cons. One of the pros is that it saves subjects compared to conducting separate studies to compare each treatment with control. It may also be advantageous in terms of enrolment. One of the cons is that the hurdle for demonstrating statistical significance is higher due to multiplicity. One needs to evaluate the operating characteristics of such designs through intensive simulations in order to assess the pros and cons of using them. To simulate a MAMS design, select the design node in the library and click on the simulation icon located at the top of the library window.

This will open the simulation dialog window. There are four tabs for entering the values of the simulation parameters: Test Parameters, Boundary, Response Generation and Simulation Controls. The Test Parameters tab provides the total sample size, test statistic and variance type to be used in the simulations. The Boundary tab has inputs similar to those for the design. The default inputs for the boundary are carried over from the design. One can modify the boundary in simulation mode without having to go back to the design. One can even add a futility boundary. The next tab is Response Generation, where one needs to specify the underlying mean, standard deviation and allocation ratio for each treatment arm. The last tab, Simulation Controls, allows one to specify the total number of simulations to be run and to save the intermediate simulation data for further analysis. For example, we can run simulations under the design alternative, where the mean differences from control are 1, 1.1, 1.2 and 1.3. After filling in all the inputs, click on the Simulate button at the bottom right of the window.
After the simulation is completed, it will show up in the Output Preview area. To view the detailed simulation output, we can save it into the library and double click the simulation node. The first table in the detailed output shows the overall power, including global power, conjunctive power, disjunctive power and FWER. The definitions of the different powers are as follows.

Global Power: probability of demonstrating statistical significance on one or more treatment groups

Conjunctive Power: probability of demonstrating statistical significance on all treatment groups which are truly effective

Disjunctive Power: probability of demonstrating statistical significance on at least one treatment group which is truly effective

FWER: probability of incorrectly demonstrating statistical significance on at least one treatment group which is truly ineffective

For this example, the global power is about 90%, which confirms the design power. The conjunctive power is about 8%.

The second table, for probability of trial termination at each look, displays the average sample size, information fraction, cumulative α spent, boundary information and probability of trial termination at each look. For this example, the chance of terminating the trial at the very first look is less than 3%. The trial has about a 55% chance of stopping early by the second look. The average sample size for the trial is about 424, which is shown in the last entry of the average sample size column.

In a MAMS design, when the trial stops for efficacy, there might be one or more treatments crossing the efficacy boundary. Such information is valuable in some situations.
For example, when multiple dose options are desired for patients with different demographic characteristics, it might be beneficial to approve multiple doses on the product label, which gives physicians the option to prescribe the appropriate dose for a specific patient. In this case, we are interested not only in the overall power of the study but also in the power of claiming efficacy on more than one dose group. Such information is summarized in the third table. This table shows the probability of demonstrating significance on a specific number of treatments at each look and across all looks. For example, the trial has about 90% overall power. Of this 90%, about 39% is the probability of showing significance on only one treatment, 26% on two treatments, 17% on three treatments and about 8.5% on all four treatments. The table also shows this breakdown look by look.

The fourth table summarizes the marginal power for each treatment group look by look and across all looks. For example, the trial has a marginal power of 29% for demonstrating efficacy on Treatment 1, 38% for Treatment 2, 49% for Treatment 3 and 60% for Treatment 4. The detailed efficacy outcome table, as seen in the following screen, provides further efficacy details pertinent to treatment identities. For example, the trial has about a 3.77% probability of demonstrating efficacy only on Treatment 1, 1.34% on both Treatments 1 and 2, and 1.7% on Treatments 1, 2 and 3. It has an 8.5% probability of showing significance on all four treatments.

17.2.1 Futility Stopping and Dropping the Losers

In the simulation mode, the futility boundary can be used in two different ways. The futility boundary can be used to terminate the trial early if the best-performing treatment isn't doing well.
It can also be used to drop arms which are futile along the way and continue only those treatments which are performing well. The two options can be accessed through the two radio buttons below the boundary table, as seen in the following screen.

Suppose that we would like to incorporate a conservative futility boundary so that we terminate the trial if all δs are negative at any interim look. We would specify the futility boundary as in the following screen.

Suppose we want to see how often the trial will be terminated early for futility if none of the treatments is effective. Click on the Simulate button at the bottom right of the window to start the simulation. The detailed output is shown below.

Note that the trial has about a 20% probability of stopping early for futility at the very first look and a little more than a 9% chance of stopping for futility at the second look. The average sample size is about 437, compared to 523 for the design without a futility boundary. Under the design alternative, there is a very small probability (less than 0.5%) of terminating the trial early for futility, as seen from the following table.

For large companies, a more aggressive futility boundary might be desirable so that trials for treatments with small effects can be terminated early and resources can be deployed to other programs. Suppose that a futility boundary based on δ = 0.5 is to be used. Under the global null hypothesis, there is almost a 70% chance for the trial to stop early for futility.
The average sample size for the study is about 316, compared to 437 for the design with a futility boundary based on δ of zero.

The other use of the futility boundary is to drop those arms which are ineffective along the way. Such a design would be more efficient if strong heterogeneity among the treatment arms is anticipated. Suppose that two of the four treatment regimens have relatively smaller treatment effects. For example, the mean differences from control might be 0.1, 0.1, 1.2 and 1.3. Without applying any futility boundary, the trial has about 85% power and an average sample size of 437. If we drop those doses which cross the futility boundary based on δ of 0.5, the trial has about 82% power and an average sample size of 328.

From the table for probability of trial termination at each look, we can see that the trial has about an 8% chance of stopping early at the first interim look, of which a little more than 2% is for efficacy and about 5% is for futility. The trial has a 46% chance of stopping early at the second look, with about 45% for efficacy and less than 2% for futility. From the table for additional details of probability of trial termination at each look, we can see that the trial has a 2.78% chance of stopping for efficacy at the first look, of which 2.55% is the probability that the trial demonstrates significance on only one treatment. At the second look, the trial has about a 45% probability of stopping early for efficacy, of which 29% is the probability of demonstrating significance on one treatment, 15% on two treatments and less than 1% on three or four treatments. This design has a marginal power of about 50% to detect significance on Treatment 3 and more than 60% on Treatment 4. Treatment 1 and Treatment 2 each have a 70% chance of being terminated at look 1 for futility.
The marginal probability of futility stopping for each treatment counts those simulated trials in which the particular treatment crosses the futility boundary, but it doesn't count those trials in which the particular treatment falls into the continuation region.

The second table in the above screen shows the probability of demonstrating significance on a specific number of treatments. However, it doesn't provide information on the likelihood of showing efficacy on specific treatment combinations. Such information is provided in the table for detailed efficacy outcomes. For example, the trial has about a 20% probability of success with Treatment 3 only, 32% with Treatment 4 only, and 30% with both Treatment 3 and Treatment 4.

17.2.2 Interim Treatment Selection

It might be desirable to select promising dose/treatment groups and drop ineffective or unsafe groups after reviewing the interim data. In general, there is no analytical approach to evaluate such a complex design. East provides the option to evaluate such adaptive designs through intensive simulations. The treatment selection option can be incorporated by clicking on the icon located on the top bar of the main simulation dialog window. The treatment selection window looks as follows. It takes several inputs from the user. The first input is the drop-down box for specifying the look position at which treatment selection is performed. The next input is the drop-down box for the treatment effect scale.
There is a list of available treatment effect scales, as seen in the following screen, including Wald Statistic, Estimated Mean, Estimated δ, etc. East provides three different dose/treatment selection rules: (1) Select best r treatments, (2) Select treatments within ε of the best treatment, (3) Select treatments greater than threshold ζ, where r, ε and ζ are inputs from the user.

For the same example, suppose we select the two best treatments at the second interim look. The inputs are as follows:

Now click on the Simulate button to run simulations. When the simulation is done, save it into the library and view the detailed output as in the following screen.

We can see that the trial has about 85% overall power to detect significance on at least one treatment group, with an average sample size of 400 (Overall Powers). It has about a 50% probability of stopping early by the second look (Probability of Trial Termination at Each Look). From the third table (Additional Details of Probability of Trial Termination at Each Look), it can be seen that the trial has about a 52% probability of showing significance on only one treatment, 33% on two treatments, and less than 1% on three or four treatments. Marginally, Treatment 3 has a 53% chance of success and Treatment 4 has a 66% chance of success. When we select the two best treatments, the sample size for the two selected treatments remains the same as designed.
However, we can reallocate the remaining sample size from the dropped groups to the selected arms to gain more power. If the sample size for the dropped arms is reallocated to the selected arms, the efficacy stopping boundary for the remaining looks has to be recomputed in order to preserve the type I error. This can be achieved by checking the box for Reallocating remaining sample size to selected arm on the Treatment Selection tab, as seen in the following window.

The simulation output is shown in the following screen. Note that the power of the study is almost 92%, in exchange for a higher average sample size of 436, compared to the design without sample size reallocation (85% power and an average sample size of 400). Also, with sample size reallocation, the study has a higher power (43%) of demonstrating significance on both Treatment 3 and Treatment 4, compared to 33% for the design without sample size reallocation.

18 Two-Stage Multi-arm Designs using p-value combination

18.1 Introduction

In the drug development process, identification of promising therapies and inference on selected treatments are usually performed in two or more stages. The procedure we will be discussing here is an adaptive two-stage design that can be used when multiple treatments are to be compared with a control. This allows integration of both stages within a single confirmatory trial while controlling the multiple type-I error level. After the interim analysis in the first stage, the trial may be terminated early or continued with a second stage, where the set of treatments may be reduced due to lack of efficacy or the presence of safety problems with some of the treatments.
This procedure in East is highly flexible with respect to stopping rules and selection criteria and also allows re-estimation of the sample size for the second stage. Simulations show that the method may be substantially more powerful than classical one-stage multiple treatment designs with the same total sample size, because the second-stage sample size is focused on evaluating only the promising treatments identified in the first stage. This procedure is available for continuous as well as discrete endpoint studies. The current chapter deals with continuous endpoint studies only; discrete endpoint studies are handled similarly.

18.2 Study Design

This section explores the different design options available in East with the help of an example.

18.2.1 Introduction to the Study
18.2.2 Methodology
18.2.3 Study Design Inputs
18.2.4 Simulating under Different Alternatives

18.2.1 Introduction to the Study

Consider designing a placebo-controlled, double-blind, randomized trial to evaluate the efficacy, pharmacokinetics, safety and tolerability of a New Chemical Entity (NCE) given as multiple weekly infusions in subjects with a recent acute coronary syndrome. There are four dose regimens to be investigated. The treatment effect is assessed through the change in PAV (percent atheroma volume) from baseline to Day 36 post-randomization, as determined by IVUS (intravascular ultrasound). The expected changes in PAV for the placebo group and the four dose regimens are 0, 1, 1.1, 1.2 and 1.3, and the common standard deviation is 3. The objective of the study is to find the optimal dose regimen based on the totality of the evidence, including benefit-risk assessment and cost considerations.

18.2.2 Methodology

This is a randomized, double-blind, placebo-controlled study conducted in two parts using a 2-stage adaptive design.
In Stage 1, approximately 250 eligible subjects will be randomized equally to one of four treatment arms (NCE [doses: 1, 2.5, 5 or 10 mg]) or matching placebo (that is, 50 subjects per group). After all subjects in Stage 1 have completed the treatment period or discontinued earlier, an interim analysis will be conducted to

1. compare the means of each dose group
2. assess safety within each dose group and
3. drop the less efficient doses

Based on the interim analysis, Stage 2 of the study will either continue with additional subjects enrolling into 1/2/3 arms (placebo and 1/2/3 favorable, active doses) or the study will be halted completely if unacceptable toxicity has been observed.

In this example, we will have the following workflow to cover the different options available in East:

1. Start with five arms (4 doses + Placebo)
2. Evaluate the four doses at the interim analysis and, based on the Treatment Selection Rules, carry forward some of the doses to the next stage
3. While we select the doses, also increase the sample size of the trial by using the Sample Size Re-estimation (SSR) tool to improve conditional power if necessary

In a real trial, both the above actions (early stopping as well as sample size re-estimation) will be performed after observing the interim data.

4. See the final design output in terms of different powers and probabilities of selecting particular dose combinations
5. See the early stopping boundaries for efficacy and futility on the adjusted p-value scale
6. Monitor the actual trial using the Interim Monitoring tool in East.

Start East. Click the Design tab, then click Many Samples in the Continuous category, and then click Multiple Looks- Combining p-values test.

This will bring up the input window of the design with some default values. Enter the inputs as discussed below.

18.2.3 Study Design Inputs

The four doses of the treatment (1mg, 2.5mg, 5mg and 10mg) will be compared with the Placebo arm based on their treatment means. Preliminary sample size estimates are provided to achieve an overall study power of at least 90% at an overall, adequately adjusted 1-sided type-1 or alpha level of 2.5%, after taking into account all interim and final hypothesis tests. Note that we always use a 1-sided alpha since dose-selection rules are usually 1-sided.

In Stage 1, 250 subjects are initially planned for enrollment (5 arms with 50 subjects each). Following an interim analysis conducted after all subjects in Stage 1 have completed the treatment period or discontinued earlier, an additional 225 subjects will be enrolled into three arms for Stage 2 (placebo and two active doses). So we start with a total of 250 + 225 = 475 subjects.

The multiplicity adjustment methods available in East to compute the adjusted p-value (the p-value corresponding to the global null hypothesis) are Bonferroni, Sidak and Simes. For the discrete endpoint test, Dunnett Single Step is not available since we will be using the Z-statistic. Let us use the Bonferroni method for this example. The p-values obtained from the two stages can be combined by using the "Inverse Normal" method. In the "Inverse Normal" method, East first computes the weights as follows:

$$ w^{(1)} = \sqrt{\frac{n^{(1)}}{n}} \tag{18.1} $$

and

$$ w^{(2)} = \sqrt{\frac{n^{(2)}}{n}} \tag{18.2} $$

where $n^{(1)}$ and $n^{(2)}$ are the total sample sizes corresponding to Stage 1 and Stage 2, respectively, and $n$ is the total sample size.
East displays these weights by default, but they are editable and the user can specify any other weights as long as

$$ \left(w^{(1)}\right)^2 + \left(w^{(2)}\right)^2 = 1 \tag{18.3} $$

The final p-value is given by

$$ p = 1 - \Phi\left( w^{(1)} \Phi^{-1}\left(1 - p^{(1)}\right) + w^{(2)} \Phi^{-1}\left(1 - p^{(2)}\right) \right) \tag{18.4} $$

The weights specified on this tab will be used for the p-value computation: $w^{(1)}$ is used for data before the interim look and $w^{(2)}$ for data after the interim look. Thus, according to the sample sizes planned for the two stages in this example, the weights are calculated as $\sqrt{250/475}$ and $\sqrt{225/475}$. Note: These weights are updated by East once we specify the first look position as 250/475 in the Boundary tab. So leave these as default values for now.

Set the Number of Arms as 5 and enter the rest of the inputs as shown below:

We can certainly have early stopping boundaries for efficacy and/or futility. But generally, in designs like this, the objective is to select the best dose(s) and not to stop early. So for now, select the Boundary tab and set both the boundary families to "None". Also, set the timing of the interim analysis as 0.526, which corresponds to observing the data on 250 subjects out of 475. Enter 250/475 as shown below. Notice the updated weights on the Test Parameters tab.

The next tab is Response Generation, which is used to specify the true underlying means of the individual dose groups and the initial allocation from which to generate the simulated data. One can also generate the mean response for all the arms using a dose-response (DR) curve such as 4PL, Emax, Linear or Quadratic. This can be done by checking the box for Generate Means through DR Curve and entering appropriate parameters for the DR model selected.
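As a minimal sketch of the inverse normal combination, equations (18.1) to (18.4) can be written in a few lines of Python. The function names `stage_weights` and `inverse_normal_p` and the stage-wise p-values used in the example are hypothetical and chosen only for illustration; this is not East's internal code. The stage-wise p-values are assumed to lie strictly between 0 and 1.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def stage_weights(n1: int, n2: int) -> tuple[float, float]:
    """Default weights from equations (18.1) and (18.2)."""
    n = n1 + n2
    return sqrt(n1 / n), sqrt(n2 / n)


def inverse_normal_p(p1: float, p2: float, w1: float, w2: float) -> float:
    """Combined p-value from equation (18.4); requires 0 < p1, p2 < 1."""
    # Equation (18.3): the squared weights must sum to one.
    if abs(w1 ** 2 + w2 ** 2 - 1.0) > 1e-9:
        raise ValueError("weights must satisfy w1^2 + w2^2 = 1")
    z = w1 * _STD_NORMAL.inv_cdf(1.0 - p1) + w2 * _STD_NORMAL.inv_cdf(1.0 - p2)
    return 1.0 - _STD_NORMAL.cdf(z)


# Planned stage sizes from the example: 250 subjects, then 225 more.
w1, w2 = stage_weights(250, 225)

# Hypothetical adjusted p-values for Stage 1 and Stage 2 (illustration only).
p_combined = inverse_normal_p(0.04, 0.03, w1, w2)
print(f"w1 = {w1:.4f}, w2 = {w2:.4f}, combined p = {p_combined:.5f}")
```

Because the weights are fixed from the planned stage sizes, the same `w1` and `w2` would be used even if the second-stage sample size is later changed.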
For this example, we will use the given means and standard deviation and not generate them using a DR curve. Make sure the means are 0, 1, 1.1, 1.2, 1.3 and the SD is 3.

Before we update the Treatment Selection tab, go to the Simulation Control Parameters tab, where we can specify the number of simulations to run and the random number seed, and choose to save the intermediate simulation data. For now, enter the inputs as shown below and keep all other inputs as default.

Click on the Treatment Selection tab. This tab is used to select the scale on which the treatment-wise effects are computed. For selecting treatments for the second stage, the treatment effect scale is required, but the control treatment is not considered for selection. It will always be present in the second stage. The list under Treatment Effect Scale allows you to set the selection rules on different scales. Select Estimated δ from this list. This means that all the selection rules we specify on this tab will be in terms of the estimated value of the treatment effect, δ, i.e., the difference from placebo.

Here is a list of all available treatment effect scales: Estimated Mean, Estimated δ, Estimated δ/σ, Test Statistic, Conditional Power, Isotonic Mean, Isotonic δ, Isotonic δ/σ. For more details on these scales, refer to the Appendix K chapter on this method.

The next step is to set the treatment selection rules for the second stage.

Select Best r Treatments: The best treatment is defined as the treatment having the highest or lowest mean effect. The decision is based on the rejection region. If it is "Right-Tail" then the highest is taken as best. If it is "Left-Tail" then the lowest is taken as best. Note that the rejection region does not affect the choice of treatment based on conditional power.

Select treatments within ε of Best Treatment: Suppose the treatment effect scale is Estimated δ.
If the best treatment has a treatment effect of δb and ε is specified as 0.1, then all the treatments which have a δ of δb − 0.1 or more are chosen for Stage 2.

Select treatments greater than threshold ζ: The treatments whose treatment effect is greater than or less than the user-specified threshold (ζ), according to the rejection region, are selected. But if the treatment effect scale is chosen as the conditional power, then treatments with values greater than the threshold are always selected.

Use R for Treatment Selection: If you wish to define customized treatment selection rules, you can do so by writing an R function for those rules to be used within East. This is possible due to the R Integration feature in East. Refer to the appendix chapter on R Functions for more details on the syntax and use of this feature. A template file for defining treatment selection rules is also available in the subfolder RSamples under your East installation directory. For more details on using R to define treatment selection rules, refer to section O.10.

Selecting multiple doses (arms) for Stage 2 would be more effective than selecting just the best one. For this example, select the first rule, Select Best r Treatments, and set r = 2, which indicates that East will select the best two doses for Stage 2 out of the four. We will leave the Allocation Ratio after Selection as 1 to yield equal allocation between the control and the selected doses in Stage 2.

Click the Simulate button to run the simulations. When the simulations are over, a row gets added in the Output Preview area. Save this row to the Library by clicking the icon in the toolbar. Rename this scenario as Best2. Double click it to see the detailed output.

The first table in the detailed output shows the overall power, including global power, conjunctive power, disjunctive power and FWER.
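The three built-in selection rules described earlier in this section (best r, within ε of the best, and greater than threshold ζ) can be sketched as follows. This is an illustrative reading of the rule descriptions, not East's implementation. The function names, the handling of the "Left-Tail" case for the ε rule, the tie-breaking in the best-r rule and the interim estimates in the example are all assumptions. The control arm is deliberately left out of the input, since it is never a candidate for selection.

```python
def select_best_r(effects: dict[str, float], r: int, right_tail: bool = True) -> list[str]:
    """Pick the r best arms: highest effects for Right-Tail, lowest for Left-Tail.
    Ties are broken by input order (an assumption of this sketch)."""
    ranked = sorted(effects, key=effects.get, reverse=right_tail)
    return ranked[:r]


def select_within_epsilon(effects: dict[str, float], epsilon: float,
                          right_tail: bool = True) -> list[str]:
    """Pick every arm whose effect is within epsilon of the best arm."""
    if right_tail:
        best = max(effects.values())
        return [arm for arm, e in effects.items() if e >= best - epsilon]
    # Left-Tail mirror image (assumed by symmetry, not stated in the text).
    best = min(effects.values())
    return [arm for arm, e in effects.items() if e <= best + epsilon]


def select_beyond_threshold(effects: dict[str, float], zeta: float,
                            right_tail: bool = True) -> list[str]:
    """Pick arms with effect greater than zeta (Right-Tail) or less than zeta (Left-Tail)."""
    if right_tail:
        return [arm for arm, e in effects.items() if e > zeta]
    return [arm for arm, e in effects.items() if e < zeta]


# Hypothetical interim estimates of delta for the four active doses.
interim_delta = {"1mg": 0.8, "2.5mg": 1.25, "5mg": 1.0, "10mg": 1.3}

print(select_best_r(interim_delta, r=2))
print(select_within_epsilon(interim_delta, epsilon=0.1))
print(select_beyond_threshold(interim_delta, zeta=0.9))
```

With the ε rule the number of arms carried forward depends on the interim data, whereas the best-r rule always carries forward exactly r arms.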
The definitions of the different powers are as follows:

Global Power: probability of demonstrating statistical significance on one or more treatment groups

Conjunctive Power: probability of demonstrating statistical significance on all treatment groups which are truly effective

Disjunctive Power: probability of demonstrating statistical significance on at least one treatment group which is truly effective

FWER: probability of incorrectly demonstrating statistical significance on at least one treatment group which is truly ineffective

For our example, there is 88% global power, which is the probability that this design rejects any null hypothesis, where each null hypothesis states that the true mean response at a dose equals that of control. Also shown are the conjunctive and disjunctive power, as well as the Family Wise Error Rate (FWER).

The Lookwise Summary table summarizes the number of simulated trials that ended with a conclusion of efficacy, i.e., rejected any null hypothesis, at each look. In this example, no simulated trial stopped at the interim analysis with an efficacy conclusion since there were no stopping boundaries, but 8845 simulations yielded an efficacy conclusion via the selected doses after Stage 2. This is consistent with the global power.

The table Detailed Efficacy Outcomes for all 10000 Simulations summarizes the number of simulations for which each individual dose group or pair of doses was selected for Stage 2 and yielded an efficacy conclusion. For example, the pair (2.5mg, 10mg only) was observed to be efficacious in approximately 16% of the trials (1576/10000).
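The four overall measures defined above can be computed from per-trial rejection indicators, as the following sketch shows. The function name `overall_powers`, the data layout (one list of booleans per simulated trial, one entry per active treatment arm) and the tiny hand-made data set are assumptions for illustration; East's output tables are produced by its own simulation engine.

```python
def overall_powers(rejections: list[list[bool]],
                   truly_effective: list[bool]) -> dict[str, float]:
    """Estimate global, conjunctive and disjunctive power and FWER.

    rejections[s][i] is True if simulated trial s declared arm i significant.
    truly_effective[i] is True if arm i has a real effect in the scenario.
    """
    n_trials = len(rejections)
    effective = [i for i, flag in enumerate(truly_effective) if flag]
    ineffective = [i for i, flag in enumerate(truly_effective) if not flag]

    global_power = sum(any(trial) for trial in rejections) / n_trials
    # With no truly effective arm, the two powers below are reported as 0
    # (a convention of this sketch).
    conjunctive = (sum(all(trial[i] for i in effective) for trial in rejections)
                   / n_trials) if effective else 0.0
    disjunctive = (sum(any(trial[i] for i in effective) for trial in rejections)
                   / n_trials) if effective else 0.0
    fwer = sum(any(trial[i] for i in ineffective) for trial in rejections) / n_trials

    return {"global": global_power, "conjunctive": conjunctive,
            "disjunctive": disjunctive, "fwer": fwer}


# Hand-made illustration: 4 simulated trials, 2 arms, only arm 0 truly effective.
toy_rejections = [[True, False], [False, False], [True, True], [False, True]]
toy_summary = overall_powers(toy_rejections, truly_effective=[True, False])
print(toy_summary)
```

When every arm is truly effective, as under the design alternative in this example, disjunctive power coincides with global power and the FWER is zero by definition.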
The next table, Marginal Probabilities of Selection and Efficacy, summarizes the number and percent of simulations in which each dose was selected for Stage 2, regardless of whether it was found significant at the end of Stage 2 or not, as well as the number and percent of simulations in which each dose was selected and found significant. The average sample size is also shown. The table tells us how frequently each dose (either alone or with some other dose) was selected and efficacious. For example, dose 10mg was selected in approximately 65% of trials and was efficacious in approximately 56% of trials. (This is the sum of the 631, 1144, 1576 and 2254 simulations from the previous table.)

The advantage of a 2-stage "treatment selection" or "drop-the-loser" design is that it allows one to drop the poorly performing or futile arms based on the interim data while still preserving the type-1 error and achieving the desired power.

In the Best2 scenario, we dropped two doses (r = 2). Suppose we had decided to proceed to Stage 2 without dropping any doses. In this case, the power would have dropped significantly. To verify this in East, click the button on the bottom left corner of the screen. This will take us back to the input window of the last simulation scenario. Go to the Treatment Selection tab, set r = 4 and save the result to the Library. Rename this scenario as All4. Double click it to see the detailed output.

We can observe that the power drops from 88% to 78%. That is because the Stage 2 sample size of 225 is being shared among five arms as against three arms in the Best2 case.

Now go back to the Treatment Selection tab and set r = 2 as before. Select one more rule, Select Treatments within ε of Best Treatment, and set the value of ε to 0.05. The tab should look as shown below. Also set the Starting Seed on the Simulation Controls tab to 100.
Note that since we have selected two treatment selection rules, East will simulate two different scenarios, one for each rule. As we want to compare the results from these two scenarios, we use the same starting seed. That ensures the same random number generation, so the only difference in the results will be the effect of the two rules. Save these two scenarios in the Library as r=2 and epsilon=0.05, select them and click the icon in the toolbar to see them side-by-side.

Notice the powers for the two scenarios. The scenario with the rule of δb − 0.05 yields more power than the Best2 scenario. Note that δb is the highest value among the simulated δ values for the four doses at the interim look.

You can also view the Output Details of these two scenarios. Select the two nodes as before, but this time click the icon in the toolbar.

Notice from this comparison that, due to the more general rule based on ε, we can select multiple doses and not just two. At the same time, the marginal probability of selection as well as of efficacy for each dose drops significantly.

18.2.4 Simulating under Different Alternatives

Since this is a simulation-based design, we can perform sensitivity analyses by changing some of the inputs and observing the effects on the overall power and other output. Let us first make sure that this design preserves the total type-1 error. This can be done by running the simulations under the "Null" hypothesis.

Select the last design created, which would be epsilon = 0.05, in the Library and click the icon. This will take you to the input window of that design. Go to the Response Generation tab and enter the inputs as shown below.
Notice that all the means are 0, which means the simulations will be run under the null assumption. Run the simulations and go to the detailed output by saving the row from Output Preview to the Library. Notice that the global power and the simulated FWER are less than 0.025, which means the overall type-1 error is preserved.

18.3 Sample Size Re-estimation

As seen in the previous scenario, the desired power of approximately 92% is achieved with the sample size of 475 if the initial assumptions (µc = 0, µ1mg = 1, µ2.5mg = 1.1, µ5mg = 1.2 and µ10mg = 1.3) hold true. But if they do not, then the original sample size of 475 may be insufficient to achieve 92% power. Adaptive sample size re-estimation is suited to this situation. In this approach we start out with a sample size of 475 subjects, but take an interim look after data are available on 250 subjects. The purpose of the interim look is not to stop the trial early but rather to examine the interim data and continue enrolling past the planned 475 subjects if the interim results are promising enough to warrant the additional investment of sample size. This strategy has the advantage that the sample size is finalized only after a thorough examination of data from the actual study, rather than through making a large up-front sample size commitment before any data are available. Furthermore, if the sample size may only be increased but never decreased from the originally planned 475 subjects, there is no loss of efficiency due to overruns.

Suppose the mean responses on the five arms are as shown below. Update the Response Generation tab accordingly and also set the seed as 100 in the Simulation Controls tab.
Run 10000 simulations and save the simulation row to the Library by clicking the icon in the toolbar. See the details. Notice that the global power has dropped from 92% to 78%. Let us re-estimate the sample size to achieve the desired power. Add the Sample Size Re-estimation tab by clicking the button. A new tab gets added as shown below.

SSR At: For a K-look group sequential design, one can decide the time at which conditions for adaptation are to be checked and the actual adaptation is to be carried out. This can be done either at some intermediate look or after some specified information fraction. The possible values of this parameter depend upon the user's choice. The default choice for this design is always Look #, and it is fixed to 1 since this is always a 2-look design.

Target CP for Re-estimating Sample Size: The primary driver for increasing the sample size at the interim look is the desired (or target) conditional power, that is, the probability of obtaining a positive outcome at the end of the trial given the data already observed. For this example we have set the conditional power at the end of the trial to be 92%. East then computes the sample size that would be required to achieve this conditional power.

Maximum Sample Size if Adapt (multiplier; total): As just stated, a new sample size is computed at the interim analysis on the basis of the observed data so as to achieve some target conditional power. However, the sample size so obtained will be overruled unless it falls between pre-specified minimum and maximum values. For this example, let us use a multiplier of 2, indicating that we are prepared to double the original sample size if the results are promising. The range of allowable sample sizes is [475, 950].
If the newly computed sample size falls outside this range, it will be reset to the appropriate boundary of the range. For example, if the sample size needed to achieve the target conditional power is less than 475, the new sample size will be reset to 475. In other words, we will not decrease the sample size from what was specified initially. On the other hand, the upper bound of 950 subjects shows that the sponsor is prepared to double the sample size in order to achieve the target conditional power. But if the target conditional power requires more than 950 subjects, the sample size will be reset to 950, the maximum allowed.

Promising Zone Scale: One can define the promising zone as an interval based on conditional power, test statistic, or estimated δ/σ. The input fields change according to this choice. The decision to alter the sample size is based on whether the interim value of conditional power, test statistic, or δ/σ lies in this interval. Let us keep the default scale, which is Conditional Power.

Promising Zone: Minimum/Maximum Conditional Power (CP): The sample size will only be altered if the estimate of CP at the interim analysis lies in a pre-specified range, referred to as the “Promising Zone”. Here the promising zone is 0.30 to 0.90. The idea is to invest in the trial in stages. Prior to the interim analysis the sponsor is only committed to a sample size of 475 subjects. If, however, the results at the interim analysis appear reasonably promising, the sponsor would be willing to make a larger investment in the trial and thereby improve the chances of success. Here we have somewhat arbitrarily set the lower bound for a promising interim outcome to be CP = 0.30. An estimate CP < 0.30 at the interim analysis is not considered promising enough to warrant a sample size increase.
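The decision logic described so far can be sketched in a few lines. This is only an illustration, not East's implementation: the function name and arguments are hypothetical, the sample size needed to reach the target CP is assumed to be supplied by some other calculation, and treating the promising zone as the half-open interval [0.30, 0.90) is an assumption.

```python
from math import ceil


def reestimated_sample_size(interim_cp, n_required, n_planned=475,
                            multiplier=2.0, zone=(0.30, 0.90)):
    """Total sample size after the interim look (illustrative sketch).

    interim_cp : conditional power estimated at the interim look
    n_required : sample size needed to reach the target CP, computed elsewhere
    """
    n_max = int(round(n_planned * multiplier))   # 950 for the example inputs
    zone_min, zone_max = zone
    # Outside the promising zone the planned sample size is kept unchanged.
    if not (zone_min <= interim_cp < zone_max):
        return n_planned
    # Inside the zone, clamp the required size to [n_planned, n_max].
    return min(max(ceil(n_required), n_planned), n_max)


print(reestimated_sample_size(0.50, 700))    # inside the zone, within range
print(reestimated_sample_size(0.50, 1200))   # inside the zone, capped at the maximum
print(reestimated_sample_size(0.20, 900))    # below the zone, no increase
```

The clamp is applied only inside the zone, so a trial with interim CP below 0.30 keeps the planned 475 subjects no matter how large the required sample size would be.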
It might sometimes be desirable to also specify an upper bound beyond which no sample size change will be made. Here we have set that upper bound of the promising zone at CP = 0.90. In effect we have partitioned the range of possible values for conditional power at the interim analysis into three zones: unfavorable (CP < 0.3), promising (0.3 ≤ CP < 0.9), and favorable (CP ≥ 0.9). Sample size adaptations are made only if the interim CP falls in the promising zone. The promising zone defined on the Test Statistic scale or the Estimated δ/σ scale works similarly.

SSR Function in Promising Zone: The behavior in the promising zone can be defined either by a continuous function or by a step function. The default is continuous, where East accepts the two quantities (Multiplier, Target CP) and re-estimates the sample size depending upon the interim value of CP, test statistic, or effect size. The SSR function can be defined as a step function as well, with a single piece or with multiple pieces. For each piece, define the step function in terms of (a) the interval of CP, test statistic, or δ/σ, depending upon the choice of promising zone scale, and (b) the value of the re-estimated sample size in that interval. For a single piece, just the total re-estimated sample size is required as an input. If the interim value of CP, test statistic, or δ/σ lies in the promising zone, then the re-estimation will be done using this step function.

Let us set the inputs on the Sample Size Re-estimation tab as shown below. For comparison purposes, also run the simulations without adaptation. Both scenarios can be run together by entering the two values 1, 2 in the cell for Multiplier. Run 10000 simulations and see the Details.

With Sample Size Re-estimation
Without Sample Size Re-estimation

We observe from the table that the power of the adaptive implementation is approximately 85%, which is almost 8% improvement over the non-adaptive design. This increase in power has come at an average cost of 540 − 475 = 65 additional subjects. Next we observe from the Zone-wise Averages table that 1610 of 10000 trials (16%) underwent sample size re-estimation (Total Simulation Count in the “Promising Zone”) and, of those 1610 trials, 89% were able to reject the global null hypothesis. The average sample size, conditional on adaptation, is 882.

18.4 Adding Early Stopping Boundaries

One can also incorporate stopping boundaries to stop early at the interim look for efficacy or futility. The efficacy boundary can be defined on the Adjusted p-value scale, whereas the futility boundary can be on the Adjusted p-value or δ/σ scale. Click the button on the bottom left corner of the screen. This will take you back to the input window of the last simulation scenario. Go to the Boundary tab and set the Efficacy and Futility boundaries to “Adjusted p-value”. These boundaries are for early stopping at look 1. As the note on this tab says: if any one adjusted p-value is ≤ the efficacy p-value boundary, stop the trial for efficacy. Only if all the adjusted p-values are > the futility p-value boundary, stop the trial for futility. Otherwise, carry forward all the treatments to the next step of treatment selection. Stopping early for efficacy or futility is a step that is carried out before applying the treatment selection rules.

The simulation output has the same explanation as above, except that the Lookwise Summary table may have some trials stopped at the first look due to efficacy or futility.

18.5 Interim Monitoring with Treatment Selection

Select the simulation node with SSR implementation and click the icon. It will invoke the Interim Monitoring dashboard.
Click the icon to open the Test Statistic Calculator. The “Sample Size” column is filled out according to the originally planned design (50/arm). Enter the data as shown below. Click Recalc to calculate the test statistic as well as the raw p-values. Notice that the p-values for 1mg and 2.5mg are 0.069 and 0.030 respectively, which are greater than 0.025. We will drop these doses in the second stage. On clicking OK, it updates the dashboard. The overall adjusted p-value is 0.067.

Open the test statistic calculator for the second look, enter the following information, and drop the two doses 1mg and 2.5mg using the “Action” dropdown. Click Recalc to calculate the test statistic as well as the raw p-values. On clicking OK, it updates the dashboard. Observe that the adjusted p-value for 10mg crosses the efficacy boundary. This can also be observed in the Stopping Boundaries chart.

The final p-value adjusted for multiple treatments is 0.00353.

19 Normal Superiority Regression

Linear regression models are used to examine the relationship between a response variable and one or more explanatory variables, assuming that the relationship is linear. In this chapter, we discuss the design of three types of linear regression models. In Section 19.1, we examine the problem of testing a single slope in a simple linear regression model involving one continuous covariate. In Section 19.2, we examine the problem of testing the equality of two slopes in a linear regression model with only one observation per subject.
Finally, in Section 19.3, we examine the problem of testing the equality of two slopes in a linear regression repeated measures model, applied to a longitudinal setting.

19.1 Linear Regression, Single Slope

We assume that the observed value of a response variable Y is a linear function of an explanatory variable X plus random noise. For each of the $i = 1, \ldots, n$ subjects in a study

$$Y_i = \gamma + \theta X_i + \epsilon_i .$$

Here the $\epsilon_i$ are independent normal random variables with $E(\epsilon_i) = 0$ and $\mathrm{Var}(\epsilon_i) = \sigma_\epsilon^2$. We follow Dupont et al. (1998) and emphasize a slight distinction between observational and experimental studies. In an observational study, the values $X_i$ are attributes of randomly chosen subjects and their possible values are not known to the investigator at the time of a study design. In an experimental study, a subject is randomly assigned (with possibly different probabilities) to one of the predefined experimental conditions. Each of these conditions is characterized by a certain value of explanatory variable X that is completely defined at the time of the study design. In both cases the value $X_i$ characterizing either an attribute or experimental exposure of subject i is a random variable with a variance $\sigma_x^2$.

We are interested in testing that the slope $\theta$ is equal to a specified value $\theta_0$. Thus we test the null hypothesis $H_0: \theta = \theta_0$ against the two-sided alternative $H_1: \theta \neq \theta_0$ or a one-sided alternative hypothesis $H_1: \theta < \theta_0$ or $H_1: \theta > \theta_0$.

Let $\hat\theta$ denote the estimate of $\theta$, and let $\hat\sigma_\epsilon^2$ and $\hat\sigma_x^2$ denote the estimates of $\sigma_\epsilon^2$ and $\sigma_x^2$ based on n observations. The variance of $\hat\theta$ is

$$\sigma^2 = \frac{\sigma_\epsilon^2}{n \sigma_x^2}. \qquad (19.1)$$

The test statistic is defined as

$$Z = (\hat\theta - \theta_0)/\hat\sigma, \qquad (19.2)$$

where

$$\hat\sigma^2 = \frac{\hat\sigma_\epsilon^2}{n \hat\sigma_x^2}$$

is the estimate of the variance of $\hat\theta$ based on n observations. Notice that the test statistic is centered so as to have a mean of zero under the null hypothesis.
We want to design the study so the power is attained when $\theta = \theta_1$. The power depends on $\theta_0$, $\theta_1$, $\sigma_x$, and $\sigma_\epsilon$ through $\theta_0 - \theta_1$ and $\sigma_x/\sigma_\epsilon$.

19.1.1 Trial Design

During the development of medications, we often want to model the dose-response relationship, which may be done by estimating the slope of the regression, where Y is the appropriate response variable and the explanatory variable X is a set of specified doses. Consider a clinical trial involving four doses of a medication. The doses and randomization of subjects across the doses have been chosen so that the standard deviation $\sigma_x = 9$. Based on prior studies, it is assumed that $\sigma_\epsilon = 15$. If there is no dose response, the slope is equal to 0. Thus we will test the null hypothesis $H_0: \theta = 0$ against a two-sided alternative $H_1: \theta \neq 0$. The study is to be designed to have 90% power at the alternative $\theta_1 = 0.5$ with a type-1 error rate of 5%.

Start East afresh. Click Continuous: Regression on the Design tab and then click Single-Arm Design: Linear Regression - Single Slope. This will launch a new input window. Select 2-Sided for Test Type. Enter 0.05 and 0.9 for Type I Error (α) and Power, respectively. Enter the values of $\theta_0 = 0$, $\theta_1 = 0.5$, $\sigma_x = 9$, and $\sigma_\epsilon = 15$. Click Compute. This will calculate the sample size for this design, and the output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (119 subjects) is highlighted in yellow.

Des 1 requires 119 subjects in order to attain 90% power. Select this design by clicking anywhere along the row in the Output Preview and click the icon. Some of the design details will be displayed in the upper pane, labeled as Output Summary.
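As a rough cross-check of this design, the sample size can be approximated from the variance of the slope estimate given in (19.1). The sketch below assumes a large-sample normal approximation to the test statistic; the function name is hypothetical, and East's own algorithm is not described in this section, so the two results need not agree exactly.

```python
from math import ceil
from statistics import NormalDist


def single_slope_sample_size(theta0, theta1, sigma_x, sigma_eps,
                             alpha=0.05, power=0.9, two_sided=True):
    """Normal-approximation sample size for testing a single slope.

    Uses Var(theta_hat) = sigma_eps**2 / (n * sigma_x**2), as in (19.1).
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    n = ((z_alpha + z_beta) * sigma_eps / (sigma_x * (theta1 - theta0))) ** 2
    return ceil(n)


# Inputs of the example: theta0 = 0, theta1 = 0.5, sigma_x = 9, sigma_eps = 15.
n_approx = single_slope_sample_size(0.0, 0.5, 9.0, 15.0)
print(n_approx)
```

Under this approximation the result is 117, a couple of subjects below the 119 reported by East, which is the kind of gap one would expect if the software applies a finite-sample adjustment.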
In the Output Preview toolbar, click the icon to save this design to Wbk1 in the Library. Now double-click on Des 1 in the Library. You will see a summary of the design.

19.2 Linear Regression for Comparing Two Slopes

In some experimental situations, we are interested in comparing the slopes of two regression lines. The regression model relates the response variable Y to the explanatory variable X using the model

$$Y_{il} = \gamma + \theta_i X_{il} + \epsilon_{il},$$

where the error $\epsilon_{il}$ has a normal distribution with mean zero and an unknown variance $\sigma_\epsilon^2$ for Subject l in Treatment i, $i = c, t$ and $l = 1, \ldots, n_i$. Let $\sigma_{xc}^2$ and $\sigma_{xt}^2$ denote the variance of the explanatory variable X for control (c) and treatment (t), respectively.

We are interested in testing the equality of the slopes $\theta_c$ and $\theta_t$. Thus we test the null hypothesis $H_0: \theta_c = \theta_t$ against the two-sided alternative $H_1: \theta_c \neq \theta_t$ or a one-sided alternative hypothesis $H_1: \theta_c < \theta_t$ or $H_1: \theta_c > \theta_t$.

Let $\hat\theta_c$ and $\hat\theta_t$ denote the estimates of $\theta_c$ and $\theta_t$, and let $\hat\sigma_\epsilon^2$, $\hat\sigma_{xc}^2$, and $\hat\sigma_{xt}^2$ denote the estimates of $\sigma_\epsilon^2$, $\sigma_{xc}^2$, and $\sigma_{xt}^2$, based on $n_c$ and $n_t$ observations, respectively. The variance of $\hat\theta_i$ is

$$\sigma_i^2 = \frac{\sigma_\epsilon^2}{n_i \sigma_{xi}^2}.$$

Let $n = n_c + n_t$ and let $r = n_t/n$. Then, the test statistic is

$$Z_j = \frac{n^{1/2}(\hat\theta_t - \hat\theta_c)}{\hat\sigma_\epsilon \left[ \dfrac{1}{(1-r)\hat\sigma_{xc}^2} + \dfrac{1}{r\hat\sigma_{xt}^2} \right]^{1/2}}. \qquad (19.3)$$

19.2.1 Trial Design

We want to design the study so the power is attained for specified values of $\theta_c$ and $\theta_t$. The power depends on $\theta_t$, $\theta_c$, $\sigma_{xc}$, $\sigma_{xt}$, and $\sigma_\epsilon$ through $\theta_t - \theta_c$, $\sigma_{xc}/\sigma_\epsilon$, and $\sigma_{xt}/\sigma_\epsilon$.

Suppose that a medication was found to have a response that depends on the level of a certain laboratory parameter. It was decided to develop a new formulation for which this interaction is decreased. The explanatory variable is the baseline value of the laboratory parameter. The study is designed with $\theta_t = 0.5$, $\theta_c = 1$, $\sigma_{xc} = \sigma_{xt} = 6$, and $\sigma_\epsilon = 10$.
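The dependence of power on these quantities can be made concrete with a normal-approximation sample size derived from the variance term in (19.3). This is a sketch under stated assumptions: the function name is hypothetical, equal allocation (r = 0.5) is assumed because no allocation ratio is given for this example, and East's exact algorithm may return a slightly different number.

```python
from math import ceil
from statistics import NormalDist


def two_slopes_sample_size(theta_c, theta_t, sigma_xc, sigma_xt, sigma_eps,
                           r=0.5, alpha=0.05, power=0.9):
    """Normal-approximation total sample size for comparing two slopes.

    r is the fraction of subjects on treatment (n_t / n); two-sided test.
    """
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)
    # n * Var(theta_t_hat - theta_c_hat), using the bracketed term of (19.3)
    unit_variance = sigma_eps ** 2 * (1 / ((1 - r) * sigma_xc ** 2)
                                      + 1 / (r * sigma_xt ** 2))
    return ceil(z_sum ** 2 * unit_variance / (theta_t - theta_c) ** 2)


# Inputs of the example: theta_c = 1, theta_t = 0.5, sigma_xc = sigma_xt = 6, sigma_eps = 10.
n_total = two_slopes_sample_size(1.0, 0.5, 6.0, 6.0, 10.0)
print(n_total)
```

With these inputs the approximation gives a total of 467 subjects. Because only the difference of slopes enters the formula, swapping the roles of the two arms leaves the result unchanged.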
We examine the slopes of the two regressions by testing the null hypothesis $H_0: \theta_t = \theta_c$. Although we hope to decrease the slope, we test the null hypothesis against the two-sided alternative $H_1: \theta_t \neq \theta_c$.

Start East afresh. Click Continuous: Regression on the Design tab and then click Parallel Design: Linear Regression - Difference of Slopes. This will launch a new input window. Select 2-Sided for Test Type. Enter 0.05 and 0.9 for Type I Error (α) and Power, respectively. Select Individual Slopes for Input Method, and enter the values of $\theta_c = 1$, $\theta_t = 0.5$, $\sigma_{xc} = 6$, $\sigma_{xt} = 6$, and $\sigma_\epsilon = 10$. Click Compute. This will calculate the sample size for this design, and the output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (469) is highlighted in yellow.

Des 1 requires 469 subjects in order to attain 90% power. Select this design by clicking anywhere along the row in the Output Preview and click the icon. Some of the design details will be displayed in the upper pane, labeled as Output Summary.

19.3 Repeated Measures for Comparing Two Slopes

In many clinical trials, each subject is randomized to one of two groups, and responses are collected at various timepoints on the same individual over the course of the trial. In these “longitudinal” trials, we are interested in testing the equality of slopes, or mean response changes per unit time, between the treatment group (t) and the control group (c). A major difficulty associated with designing such studies is the fact that the data are independent across individuals, but the repeated measurements on the same individual are correlated.
The sample size computations then depend on within-subject and between-subject variance components that are often unknown at the design stage. One way to tackle this problem is to use prior estimates of these variance components (also known as nuisance parameters) from other studies, or from pilot data.

Suppose each patient is randomized to either group c or group t. The data consist of a series of repeated measurements on the response variable for each patient over time. Let M denote the total number of measurements, inclusive of the initial baseline measurement, intended to be taken on each subject. These M measurements will be taken at times $v_m$, $m = 1, 2, \ldots, M$, relative to the time of randomization, where $v_1 = 0$. A linear mixed effects model is usually adopted for analyzing such data. Let $Y_{ilm}$ denote the response of subject l, belonging to group i, at time point $v_m$. Then the model asserts that

$$Y_{clm} = \gamma_c + \theta_c v_m + a_l + b_l v_m + e_{lm} \qquad (19.4)$$

for the control group, and

$$Y_{tlm} = \gamma_t + \theta_t v_m + a_l + b_l v_m + e_{lm} \qquad (19.5)$$

for the treatment group, where the random effect $(a_l, b_l)'$ is multivariate normal with mean $(0, 0)'$ and variance-covariance matrix

$$G = \begin{pmatrix} \sigma_a^2 & \sigma_{ab} \\ \sigma_{ab} & \sigma_b^2 \end{pmatrix},$$

and the $e_{lm}$'s are all iid $N(0, \sigma_w^2)$. In this model, $\sigma_w^2$ denotes the “within-subject” variability, attributable to repeated measurements on the same subject, while G denotes the “between-subjects” variability, attributable to the heterogeneity of the population being studied.

Define $\delta = \theta_t - \theta_c$. We are interested in testing $H_0: \delta = 0$ against the two-sided alternative $H_1: \delta \neq 0$ or against one-sided alternative hypotheses of the form $H_1: \delta > 0$ or $H_1: \delta < 0$.

Let $(\hat\theta_c, \hat\theta_t)$ be the maximum likelihood estimates of $(\theta_c, \theta_t)$, based on an enrollment of $(n_c, n_t)$, respectively. The estimate of the difference of slopes is

$$\hat\delta = \hat\theta_t - \hat\theta_c \qquad (19.6)$$

and its standard error is denoted by $se(\hat\delta)$.
The test statistic is the familiar Wald statistic

$$Z = \frac{\hat\delta}{se(\hat\delta)}. \qquad (19.7)$$

19.3.1 Trial Design

Consider a trial to compare an analgesic to placebo in the treatment of chronic pain using a 10 cm visual analogue scale (VAS). Measurements are taken on each subject at baseline and once a month for six months. Thus the number of measurements is M = 7 and the duration of follow-up is S = 6. It is assumed from past data that $\sigma_w = 4$ and $\sigma_b = 6$. We wish to test the null hypothesis $H_0: \theta_t = \theta_c$ with a two-sided level-0.05 test having 90% power to detect a 1 cm/month decline in slope, with $\theta_c = 2$ and $\theta_t = 1$ under $H_1$.

Start East afresh. Click Continuous: Regression on the Design tab, and then click Parallel Design: Repeated Measures - Difference of Slopes. This will launch a new input window. Select 2-Sided for Test Type. Enter 0.05 and 0.9 for Type I Error (α) and Power, respectively. Select Individual Slopes for Input Method. Enter the values of $\theta_c = 2$, $\theta_t = 1$, Duration of Follow up (S) = 6, Number of Measurements (M) = 7, $\sigma_w = 4$, and $\sigma_b = 6$. Click Compute. This will calculate the sample size for this design, and the output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (1538) is highlighted in yellow.

Des 1 requires 1538 completers in order to attain 90% power. Select this design by clicking anywhere along the row in the Output Preview and click the icon. Some of the design details will be displayed in the upper pane, labeled as Output Summary.
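The computed sample size can be sanity-checked with a textbook approximation for the random-slopes model. The sketch below rests on several assumptions that are not stated in this section: measurements are equally spaced over the follow-up period, allocation is equal, the variance of a single subject's fitted slope is taken as $\sigma_b^2 + \sigma_w^2 / \sum_m (v_m - \bar v)^2$, and the Wald statistic is treated as normal. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def repeated_measures_sample_size(delta, sigma_w, sigma_b, follow_up, n_measurements,
                                  alpha=0.05, power=0.9):
    """Approximate total completers for comparing two slopes (equal allocation)."""
    # Equally spaced measurement times v_1 = 0, ..., v_M = follow_up (assumption).
    step = follow_up / (n_measurements - 1)
    times = [m * step for m in range(n_measurements)]
    mean_time = sum(times) / n_measurements
    sxx = sum((v - mean_time) ** 2 for v in times)
    # Variance of one subject's fitted slope: random slope plus measurement noise.
    slope_variance = sigma_b ** 2 + sigma_w ** 2 / sxx
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)
    # With n/2 subjects per arm, Var(delta_hat) = 4 * slope_variance / n.
    return ceil(z_sum ** 2 * 4 * slope_variance / delta ** 2)


# Inputs of the example: delta = theta_t - theta_c = -1, sigma_w = 4, sigma_b = 6, S = 6, M = 7.
n_completers = repeated_measures_sample_size(-1.0, 4.0, 6.0, 6.0, 7)
print(n_completers)
```

Under these assumptions the sketch returns 1538, the same total that East reports for Des 1. The between-subject slope variance dominates here, so adding more measurements per subject would reduce the required sample size only marginally.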
Volume 3

Binomial and Categorical Endpoints

20 Introduction to Volume 3
21 Tutorial: Binomial Endpoint
22 Binomial Superiority One-Sample
23 Binomial Superiority Two-Sample
24 Binomial Non-Inferiority Two-Sample
25 Binomial Equivalence Two-Sample
26 Binomial Superiority n-Sample
27 Multiple Comparison Procedures for Discrete Data
28 Multiple Endpoints-Gatekeeping Procedures for Discrete Data
29 Two-Stage Multi-arm Designs using p-value combination
30 Binomial Superiority Regression
31 Agreement
32 Dose Escalation

20 Introduction to Volume 3

This volume describes the procedures for discrete (binomial) endpoints applicable to one-sample, two-sample, many-sample, regression and agreement situations. All three types of design (superiority, non-inferiority and equivalence) are discussed in detail.

Chapter 21 introduces you to East on the Architect platform, using an example clinical trial to test difference of proportions.

Chapter 22 deals with the design and interim monitoring of two types of tests involving binomial response rates that can be described as superiority one-sample situations. Section 22.1 discusses designs in which an observed binomial response rate is compared to a fixed response rate, possibly derived from historical data. Section 22.2 deals with McNemar’s test for comparing matched pairs of binomial responses. Chapter 38 discusses in detail Simon’s two-stage design.

Chapter 23 discusses the superiority two-sample situation, where the aim is to compare independent samples from two populations in terms of the proportion of sampling units presenting a given trait.
East supports the design and interim monitoring of clinical trials in which this comparison is based on the difference of proportions, the ratio of proportions, the odds ratio, or the common odds ratio of the two populations. The four cases are discussed in Sections 23.1, 23.2, 23.3 and 23.4, respectively. Section 23.5 discusses Fisher’s exact test for a single-look design.

Chapter 24 presents an account of designing and monitoring non-inferiority trials in which the non-inferiority margin is expressed as either a difference, a ratio, or an odds ratio of two binomial proportions. The difference is examined in Section 24.1. This is followed by two formulations for the ratio: the Wald formulation in Section 24.2 and the Farrington-Manning formulation in Section 24.3. The odds ratio formulation is presented in Section 24.4.

Chapter 25 describes the design and interim monitoring of equivalence two-sample trials, where the goal is to establish neither superiority nor non-inferiority, but equivalence. Examples of this include showing that an aggressive therapy yields a similar rate of a specified adverse event to the established control, such as the bleeding rates associated with thrombolytic therapy or cardiac outcomes with a new stent.

Chapter 26 details the design and interim monitoring of superiority k-sample experimental situations where there are several binomial distributions indexed by an ordinal variable and where it is required to examine changes in the probabilities of success as the level of the indexing variable changes. Examples of this include the examination of a dose-related presence of a response or a particular side effect, dose-related tumorigenicity, or presence of fetal malformations relative to levels of maternal exposure to a particular toxin, such as alcohol, tobacco, or environmental factors.
Chapter 27 details the Multiple Comparison Procedures (MCP) for discrete data. It is often the case that multiple objectives are to be addressed in one single trial. These objectives are formulated into a family of hypotheses. Multiple comparison (MC) procedures provide a guard against inflation of type I error while testing these multiple hypotheses. East supports several parametric and p-value based MC procedures. This chapter explains how to design a study using a chosen MC procedure that strongly maintains FWER.

Chapter 30 describes how East may be used to design and monitor two-arm randomized clinical trials with binomial endpoints, while adjusting for the effects of covariates through the logistic regression model. These methods are limited to binary and categorical covariates only. A more general approach, not limited to categorical covariates, is to base the design on statistical information rather than sample size. This approach is further explained in Chapter 59.

Chapter 31 discusses the tests available to check inter-rater reliability. In some experimental situations, to check inter-rater reliability, independent sets of measurements are taken by more than one rater and the responses are checked for agreement. For a binary response, Cohen’s Kappa test for binary ratings can be used to check inter-rater reliability.

Chapter 32 deals with the design, simulation, and interim monitoring of Phase 1 dose escalation trials. One of the primary goals of Phase 1 trials in oncology is to find the maximum tolerated dose (MTD). Sections 32.1, 32.2, 32.3 and 32.4 discuss the four commonly used dose escalation methods: 3+3, Continual Reassessment Method (CRM), modified Toxicity Probability Interval (mTPI) and Bayesian Logistic Regression Model (BLRM).

20.1 Settings

Click the icon in the Home menu to adjust default values in East 6.
The options provided in the Display Precision tab are used to set the decimal places of numerical quantities. The settings indicated here will be applicable to all tests in East 6 under the Design and Analysis menus. All these numerical quantities are grouped in different categories depending upon their usage. For example, all the average and expected sample sizes computed at the simulation or design stage are grouped together under the category “Expected Sample Size”. To view any of these quantities with greater or lesser precision, select the corresponding category and change the decimal places to any value between 0 and 9.

The General tab lets you adjust the paths for storing workbooks, files, and temporary files. These paths persist through the current and future sessions, even after East is closed. This is also where the installation directory of the R software must be specified in order to use the R Integration feature in East 6.

Design Defaults is where the user can change the settings for trial design. Under the Common tab, default values can be set for input design parameters. You can set up the default choices for the design type, computation type, and test type, and the default values for type-I error, power, sample size and allocation ratio. When a new design is invoked, the input window will show these default choices.

Time Limit for Exact Computation: This time limit is applicable only to exact designs and charts. Exact methods are computationally intensive and can easily consume several hours of computation time if the likely sample sizes are very large. You can set the maximum time available for any exact test in terms of minutes. If the time limit is reached, the test is terminated and no exact results are provided. The minimum and default value is 5 minutes.
Type I Error for MCP: If the user has selected a 2-sided test as the default in the global settings, then any MCP will use half of the alpha from the settings as its default, since an MCP is always a 1-sided test.

Sample Size Rounding: By default, East displays the integer sample size (events) by rounding up the actual number computed by the East algorithm. In this case, the look-by-look sample size is rounded off to the nearest integer. One can also see the original floating point sample size by selecting the option “Do not round sample size/events”.

Under the Group Sequential tab, defaults are set for boundary information. When a new design is invoked, input fields will contain these specified defaults. We can also set the option to view the Boundary Crossing Probabilities in the detailed output. It can be either Incremental or Cumulative.

Simulation Defaults is where we can change the settings for simulation. If the checkbox for “Save summary statistics for every simulation” is checked, then East simulations will by default save the per-simulation summary data for all the simulations in the form of case data. If the checkbox for “Suppress All Intermediate Output” is checked, the intermediate simulation output window will always be suppressed and you will be directed to the Output Preview area.

Chart Settings allows defaults to be set for the following quantities on East 6 charts:

We suggest that you do not alter the defaults until you are quite familiar with the software.

21 Tutorial: Binomial Endpoint

This tutorial introduces you to East on the Architect platform, using an example clinical trial to test difference of proportions.

21.1 Fixed Sample Design

When you open East, you will see the screen shown below. By default, the Design tab in the ribbon will be active.
The items on this tab are grouped under the following categories of endpoints: Continuous, Discrete, Count, Survival, and General. Click Discrete: Two Samples, and then Parallel Design: Difference of Proportions. The following input window will appear.

By default, the radio button for Sample Size (n) is selected, indicating that it is the variable to be computed. The default values shown for Type I Error and Power are 0.025 and 0.9. Keep the same for this design. Since the default inputs provide all of the necessary input information, you are ready to compute the sample size by clicking the Compute button. The calculated result will appear in the Output Preview pane, as shown below. This single row of output contains relevant details of the inputs and the computed result of total sample size (and total completers) of 45. Select this row and save it in the Library under a workbook by clicking the icon. Select this node in the Library, and click the icon to display a summary of the design details in the upper pane (known as Output Summary).

The discussion so far gives you a quick feel of the software for computing the sample size for a single-look design. We will describe further features in an example for a group sequential design in the next section.

21.2 Group Sequential Design for a Binomial Superiority Trial

21.2.1 Study Background

Design objectives and interim results from CAPTURE, a prospective randomized trial of placebo versus Abciximab for patients with refractory unstable angina, were presented at a workshop on clinical trial data monitoring committees (Anderson, 2002). The primary endpoint was reduction in death or MI within 30 days of entering the study. The study was designed for 80% power to detect a reduction in the event rate from 15% on the placebo arm to 10% on the Abciximab arm.
A two-sided test with a type-1 error of 5% was used. We will illustrate various design, simulation and interim monitoring features of East for studies with binomial endpoints with the help of this example.

Let us modify Des1 to enter the above inputs and create a group sequential design for the CAPTURE trial. Select the node for Des1 in the Library and click the icon. This will take you back to the input window of Des1. Alternatively, you can click the button at the bottom left of the East screen to go to the latest input window. Select 2-Sided for Test Type, enter 0.05 for Type I Error and 0.8 for Power, and specify the Prop. under Control to be 0.15 and the Prop. under Treatment to be 0.1.

Next, change the Number of Looks to 3. You will see a new tab, Boundary Info, added to the input dialog box. Click the Boundary Info tab, and you will see the following screen. On this tab, you can choose whether to specify stopping boundaries for efficacy, or futility, or both. For this trial, choose efficacy boundaries only, and leave all other default values. We will implement the Lan-DeMets (O'Brien-Fleming) spending function, with equally spaced looks.

On the Boundary Info tab, click on the icons to generate the following charts. You can also view these boundaries on different scales, such as the δ scale or the p-value scale. Select the desired scale from the dropdown. Let us see the boundaries on the δ scale.

Click Compute. This will add another row for Des2 in the Output Preview area. The maximum sample size required under this design is 1384. The expected sample sizes under H0 and H1 are 1378 and 1183, respectively.
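For readers who want to see the shape of the Lan-DeMets (O'Brien-Fleming) spending function selected above, its commonly cited closed form is α(t) = 2 − 2Φ(z_{α/2}/√t), where t is the information fraction. The Python sketch below evaluates it at three equally spaced looks. It is an illustration only: the function name ld_of_spend is hypothetical, and the way East apportions a 2-sided α between the upper and lower boundaries is not described in this section, so the call simply passes a total α of 0.05.

```python
from statistics import NormalDist


def ld_of_spend(t, alpha):
    """Cumulative alpha spent at information fraction t (0 < t <= 1).

    Commonly cited Lan-DeMets O'Brien-Fleming-type form:
    alpha(t) = 2 - 2 * Phi(z_{alpha/2} / sqrt(t)).
    Hypothetical helper for illustration, not an East function.
    """
    std = NormalDist()
    z = std.inv_cdf(1 - alpha / 2)
    return 2 * (1 - std.cdf(z / t ** 0.5))


# Three equally spaced looks, as in the tutorial design.
looks = [1 / 3, 2 / 3, 1.0]
cumulative = [ld_of_spend(t, 0.05) for t in looks]
incremental = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

for t, cum, inc in zip(looks, cumulative, incremental):
    print(f"t = {t:.3f}  cumulative = {cum:.5f}  incremental = {inc:.5f}")
```

Very little error is spent at the first look and the full α is reached only at t = 1, which is the conservative early behaviour associated with O'Brien-Fleming-type boundaries.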
Click the icon in the Output Preview toolbar to save this design to Wbk1 in the Library. Double-click on Des2 to generate the following output.

21.2.2 Creating multiple designs easily
In East, it is easy to create multiple designs by inputting multiple parameter values. In the trial described above, suppose we want to generate designs for all combinations of the following parameter values: Power = 0.8, 0.9, and Difference in Proportions = −0.04, −0.03, −0.02, −0.01. The number of such combinations is 2 × 4 = 8. East can create all 8 designs from a single specification in the input dialog box. Select Des2 and click the icon. Enter the above values in the Test Parameters tab as shown below. The values of Power have been entered as a list of comma-separated values, while Difference in Proportions has been entered as a colon-separated range of values: −0.04 to −0.01 in steps of 0.01.

Now click Compute. East computes all 8 designs, Des3 to Des10, and displays them in the Output Preview as shown below. Click the icon to maximize the Output Preview. Select Des2 to Des4 using the Ctrl key, and click the icon to display a summary of the design details in the upper pane, known as the Output Summary.

Des2 is already saved in the workbook. We will use this design for simulation and interim monitoring, as described below. Now that you have saved Des2, delete all designs from the Output Preview before continuing, by selecting all designs with the Shift key and clicking the icon in the toolbar.

21.2.3 Simulation
Right-click Des2 in the Library, and select Simulate. Alternatively, you can select Des2 and click the icon. We will carry out a simulation of Des2 to check whether it preserves the specified power. Click Simulate.
By default, East will execute 10000 simulations with the specified inputs. Close the intermediate window after examining the results. A row labeled Sim1 will be added in the Output Preview.

Click the icon to save this simulation to the Library. A simulation sub-node, Sim1, will be added under the Des2 node. Double-clicking on this node will display the detailed simulation output in the work area.

In 80.46% of the simulated trials, the null hypothesis was rejected. This tells us that the design power of 80% is achieved. Simulation is a tool that can be used to assess the study design under various scenarios. The next section will explore interim monitoring with this design.

21.2.4 Interim Monitoring
Right-click Des2 in the Library and select Interim Monitoring. Click the icon to open the Test Statistic Calculator. Suppose that after 461 subjects, at the first look, you have observed 34 out of 230 responding on the Control arm and 23 out of 231 responding on the Treatment arm. The calculator computes the difference in proportions as −0.048 and its standard error as 0.031. Click OK to update the IM Dashboard.

The Stopping Boundaries and Error Spending Function charts appear on the left of the dashboard, and the Conditional Power and Confidence Intervals charts on the right.

Suppose that after 923 subjects, at the second look, you have observed 69 out of 461 responding on the Control arm and 23 out of 462 responding on the Treatment arm. The calculator computes the difference in proportions as −0.1 and its standard error as 0.019.
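The calculator values quoted for the two looks can be checked by hand. The sketch below, in Python, computes the observed difference in proportions (Treatment minus Control) and its unpooled standard error. That the calculator uses the unpooled form is an assumption made here because it agrees with the quoted figures to three decimals; the function name diff_and_se is hypothetical.

```python
from math import sqrt


def diff_and_se(x_c, n_c, x_t, n_t):
    """Difference in proportions (treatment - control) and its unpooled SE.

    x_c, x_t: number of responders; n_c, n_t: number of subjects per arm.
    The unpooled standard error is an assumption, not a documented East formula.
    """
    p_c = x_c / n_c
    p_t = x_t / n_t
    diff = p_t - p_c
    se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    return diff, se


look1 = diff_and_se(34, 230, 23, 231)   # first look, 461 subjects
look2 = diff_and_se(69, 461, 23, 462)   # second look, 923 subjects

for label, (diff, se) in (("Look 1", look1), ("Look 2", look2)):
    print(f"{label}: difference = {diff:.3f}, SE = {se:.3f}, Z = {diff / se:.3f}")
```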
Click Recalc, and then OK to update the IM Dashboard. In this case, a boundary has been crossed, and the following window appears. Click Stop to complete the trial. The IM Dashboard will be updated accordingly, and a table for Final Inference will be displayed as shown below.

22 Binomial Superiority One-Sample
This chapter deals with the design, simulation, and interim monitoring of two types of tests involving binomial response rates. In Section 22.1, we discuss group sequential designs in which an observed binomial response rate is compared to a fixed response rate, possibly derived from historical data. Section 22.2 deals with McNemar's test for comparing matched pairs of binomial responses in a group sequential setting.

22.1 Binomial One Sample
In experimental situations where the variable of interest has a binomial distribution, it may be of interest to determine whether the response rate π differs from a fixed value π0. Specifically, we wish to test the null hypothesis H0: π = π0 against the two-sided alternative hypothesis H1: π ≠ π0, or against one-sided alternatives of the form H1: π > π0 or H1: π < π0. The sample size, or power, is determined for a specified value of π that is consistent with the alternative hypothesis, denoted π1.

22.1.1 Trial Design
Consider the design of a single-arm oncology trial in which we wish to determine if the tumor response rate of a new cytotoxic agent exceeds 15%. Thus, it is desired to test the null hypothesis H0: π = 0.15 against the one-sided alternative hypothesis H1: π > 0.15. We will design this trial as a one-sided level 0.05 test that achieves 80% power at π = π1 = 0.25.
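As a rough cross-check on these design inputs, the textbook normal-approximation sample size for a one-sided, one-sample test of a proportion can be computed directly. The Python sketch below is not East's algorithm, which is not described in this section, and the function name n_single_proportion is hypothetical. With the test standardized by the null variance it gives about 90.6, which rounds up to the 91 subjects reported for the single-look design in this section. The second call shows the same approximation when the variance under π1 is used for the standardization.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_single_proportion(pi0, pi1, alpha, power, null_variance=True):
    """Normal-approximation sample size for a one-sided one-sample test.

    null_variance=True  : test standardized with pi0 * (1 - pi0)
    null_variance=False : test standardized with the variance at pi1,
                          a stand-in for the empirical estimate
    Textbook approximation only; hypothetical helper, not an East function.
    """
    z = NormalDist().inv_cdf
    z_alpha, z_beta = z(1 - alpha), z(power)
    sd1 = sqrt(pi1 * (1 - pi1))
    sd0 = sqrt(pi0 * (1 - pi0)) if null_variance else sd1
    return ((z_alpha * sd0 + z_beta * sd1) / (pi1 - pi0)) ** 2


n_null = n_single_proportion(0.15, 0.25, alpha=0.05, power=0.80)
n_emp = n_single_proportion(0.15, 0.25, alpha=0.05, power=0.80, null_variance=False)

print(f"null variance:      n = {n_null:.2f}, rounded up to {ceil(n_null)}")
print(f"variance under pi1: n = {n_emp:.2f}, rounded up to {ceil(n_emp)}")
```

The second figure is larger because, for these inputs, the variance at π1 = 0.25 exceeds the variance at π0 = 0.15.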
Single-Look Design
To begin, click the Design tab, then Single Sample under the Discrete group, and then click Single Proportion. In the ensuing dialog box, choose the test parameters as shown below. We first consider a single-look design, so leave the Number of Looks at its default value of 1. In the drop-down menu next to Test Type, select 1-Sided. Enter 0.8 for Power. Enter 0.15 in the box next to Prop. Response under Null (π0) and 0.25 in the box next to Prop. Response under Alt (π1). This dialog box also asks us to specify whether we wish to standardize the test statistic (for performing the hypothesis test of the null hypothesis H0: π = 0.15) with the null or the empirical variance. We will discuss the test statistic and the method of standardization below. For the present, select the default radio button Under Null Hypothesis. Now click Compute.

The design is shown as a row in the Output Preview located in the lower pane of this window. The sample size required to achieve the desired 80% power is 91 subjects.

You can select this design by clicking anywhere on the row in the Output Preview. Click the icon to display the design output summary in the upper pane. In the Output Preview toolbar, click the icon to save this design, Des1, to workbook Wbk1 in the Library. If you hover the cursor over the node Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With the design Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs. Treatment Effect (δ). The power curve for this design will be displayed. You can save this chart to the Library by clicking Save in Workbook.
Alternatively, you can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.

Three-Look Design
In order to reach an early decision and enter into comparative trials, let us plan to conduct this single-arm study as a group sequential trial with a maximum of 3 looks. Create a new design by selecting Des1 in the Library and clicking the icon on the Library toolbar. Change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab, Boundary, will appear. Clicking on this tab will reveal the stopping boundary parameters. By default, the Spacing of Looks is set to Equal, which means that the interim analyses will be equally spaced in terms of the number of patients accrued between looks. The left side contains details for the Efficacy boundary, and the right side for the Futility boundary. By default, there is an efficacy boundary (to reject H0) selected, but no futility boundary (to reject H1). The Boundary Family specified is of the Spending Functions type. The default spending function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF (O'Brien-Fleming), which generates boundaries that are very similar, though not identical, to the classical stopping boundaries of O'Brien and Fleming (1979). Technical details of these stopping boundaries are available in Appendix F.

Return to the test parameters by clicking the Test Parameters tab. The dialog box requires us to make a selection in the section labeled Variance of Standardized Test Statistic. We are being asked to specify to East how we intend to standardize the test statistic when we actually perform the hypothesis tests at the various monitoring time points. There are two options: Under Null Hypothesis and Empirical Estimate.
To understand the difference between these two options, let π̂j denote the estimate of π based on nj observations, up to and including the j-th monitoring time point.

Under Null Hypothesis: The test statistic to be used for the interim monitoring is

\[ Z_j^{(N)} = \frac{\hat{\pi}_j - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n_j}} . \tag{22.1} \]

Empirical: The test statistic to be used for the interim monitoring is

\[ Z_j^{(E)} = \frac{\hat{\pi}_j - \pi_0}{\sqrt{\hat{\pi}_j (1 - \hat{\pi}_j)/n_j}} . \tag{22.2} \]

The choice of variance should not make much of a difference to the type 1 error or power for studies in which the sample size is large. In the present case, however, it might matter. We shall therefore examine both options.

First, select the Under Null Hypothesis radio button. Click the Compute button to generate output for design Des2. With Des2 selected in the Output Preview, click the icon to save Des2 to the Library. In order to see the stopping probabilities, as well as other characteristics, select Des2 in the Library and click the icon. The cumulative boundary stopping probabilities are shown in the Stopping Boundaries table. We see that for Des2 the maximum sample size is 91 subjects, with 90 expected under the null hypothesis H0: π = 0.15 and 73 expected when the true value is π = 0.25. Close the Output window before continuing.

The stopping boundary can be displayed by clicking on the icon on the Library toolbar, and then clicking Stopping Boundaries. The following chart will appear. To examine the error spending function, click the icon on the Library toolbar, and then click Error Spending. The following chart will appear.
To examine the impact of using the empirical variance to standardize the test statistic, select Des2 in the Library, and click the icon on the Library toolbar. In the Variance of Standardized Test Statistic box, now select Empirical Estimate. Next, click Compute. With Des3 selected in the Output Preview, click the icon. In the Library, select the nodes Des2 and Des3 by holding the Ctrl key, and then click the icon. The upper pane will display the summary details of the two designs side-by-side.

If we intend to standardize the test statistic with the empirical variance, the maximum sample size needed for 80% power is 119, and the expected sample size is 99 under the alternative hypothesis H1 with π1 = 0.25. The corresponding maximum and expected sample sizes if the null variance is to be used for the standardization are 91 and 73, respectively. Thus, for this configuration of design parameters, it would appear preferable to specify in advance that the test statistic will be standardized by the null variance, since this is the option with the smaller maximum and expected sample sizes. These results, however, are based on the large-sample theory developed in Appendix B. Since the sample sizes in both Des2 and Des3 are fairly small, it would be advisable to verify that the power and type 1 error of both plans are preserved by simulating these designs. We show how to simulate these plans in Section 22.1.2.

In some situations, the sample size is subject to external constraints. Then, the power can be computed for a specified maximum sample size. Suppose that in the above situation, using the empirical estimate of the variance, the total sample size is constrained to be at most 80 subjects. Select Des3 in the Library and click the icon on the Library toolbar.
Change the selections in the ensuing dialog box so that the trial is now designed to compute power for a maximum sample size of 80 subjects, as shown below. Click the Compute button to generate the output for design Des4. With Des4 selected in the Output Preview, click the icon. In the Library, select the nodes for Des2, Des3, and Des4 by holding the Ctrl key, and then click the icon. The upper pane will display the summary details of the three designs side-by-side. From this, we can see that Des4 has only 65.5% power.

22.1.2 Trial Simulation
In Section 22.1.1, we created group sequential designs under two different assumptions about how the test statistic would be standardized at the interim monitoring stage. Under Des2, we assumed that the null variance, and hence the test statistic (22.1), would be used for the interim monitoring. This plan required a maximum sample size of 91 subjects. Under Des3, we assumed that the empirical variance, and hence the test statistic (22.2), would be used for the interim monitoring. This plan required a maximum sample size of 119 subjects. Since the sample sizes for both plans are fairly small and the calculations involved the use of large-sample theory, it would be wise to verify the operating characteristics of these two plans by simulation.

Select Des2 in the Library, and click the icon from the Library toolbar. Alternatively, right-click on the Des2 node and select Simulate. A new Simulation worksheet will appear. Click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim1.

Select the Sim1 row in the Output Preview and click the icon. Note that some of the simulation output details will be displayed in the upper pane.
Click the icon to save it to the Library. Double-click on the Sim1 node in the Library. The simulation output details will be displayed. Upon running 10,000 simulations with π = 0.25, we obtain slightly over 80% power, as shown above.

Next, we run 10,000 simulations under H0 by setting π = 0.15 in the choice of simulation parameters. Select Des2 in the Library, and click the icon from the Library toolbar. Under the Response Generation tab, change the Proportion Response to 0.15. Click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim2.

Select Sim2 in the Output Preview. Click the icon to save it to the Library. Double-click on Sim2 in the Library. The simulation output details will be displayed. We observe that 7% of these simulations reject the null hypothesis thereby confirming that these boundaries do indeed preserve the type 1 error (up to Monte Carlo accuracy).

Finally, we repeat the same set of simulations for Des3. Select Des3 in the Library, and click the icon from the Library toolbar. Upon running 10,000 simulations with π = 0.25, we obtain 82% power. However, when we run the simulations under H0: π = 0.15, we obtain a type 1 error of about 3% instead of the specified 5%, as shown below. While this ensures that the type 1 error is preserved, it also suggests that the use of the empirical variance rather than the null variance to standardize the test statistic might be problematic with small sample sizes.

Let us now investigate whether the problem disappears with larger studies. Select Des3 in the Library and click the icon on the Library toolbar.
Change the value of Prop. Response under Alt (π1) from 0.25 to 0.18. Click Compute to generate the output for Des5. In the Output Preview, we see that Des5 requires a sample size of 1035 subjects. To verify whether the use of the empirical variance will indeed produce the correct type-1 error for this large trial, select Des5 in the Output Preview and click the icon. In the Library, select Des5 and click the icon from the Library toolbar. First, run 10,000 trials with π = 0.15. On the Response Generation tab, change Proportion Response from 0.18 to 0.15. Next, click Simulate. Observe that the type-1 error obtained by simulating Des5 is about 4.4%, an improvement over the corresponding type 1 error obtained by simulating Des3.

Next, verify that a sample size of 1035 suffices for producing 80% power by running 10,000 simulations with π = 0.18.

This example has demonstrated the importance of simulating a design to verify that it does indeed possess the operating characteristics that are claimed for it. Since these operating characteristics were derived by large-sample theory, they might not hold for small sample sizes, in which case the sample size or type-1 error might have to be adjusted appropriately.

22.1.3 Interim Monitoring
Consider interim monitoring of Des3, the design that has 80% power when the empirical estimate of variance is used to standardize the test statistic. Select Des3 in the Library, and click the icon from the Library toolbar. Alternatively, right-click on Des3 and select Interim Monitoring.

The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections. The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right.
These charts provide graphical and numerical descriptions of the progress of the clinical trial and are useful tools for decision making by a data monitoring committee.

At the first interim look, when 40 subjects have enrolled, suppose that the observed response rate is 0.35. Click the icon to invoke the Test Statistic Calculator. In the box next to Cumulative Sample Size, enter 40. Enter 0.35 in the box next to Estimate of π. In the box next to Standard Error of Estimate of π, enter 0.07542. Next, click Recalc. Observe that upon pressing the Recalc button, the test statistic calculator automatically computes the value of the test statistic as 2.652.

Clicking OK results in the following output. Since our test statistic, 2.652, is smaller than the stopping boundary, 3.185, the trial continues.

At the second interim monitoring time point, after 80 subjects have enrolled, suppose that the estimate π̂ of π based on all data up to that point is 0.30. Click on the second row in the table in the upper section. Then click the icon. In the box next to Cumulative Sample Size, enter 80. Enter 0.30 in the box next to Estimate of π. In the box next to Standard Error of Estimate of π, enter 0.05123. Next, click Recalc. Upon clicking OK, we observe that the stopping boundary is crossed and the following message is displayed. We can conclude that π > 0.15 and terminate the trial. Clicking Stop yields the following output.
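The standard errors and the first test statistic entered above follow from equation (22.2) with π0 = 0.15. The Python sketch below recomputes them and, for comparison, also evaluates the null-variance statistic of equation (22.1) on the same data. The function names are hypothetical, and the sketch only mirrors the two formulas; it does not reproduce East's boundary calculations.

```python
from math import sqrt


def z_null(pi_hat, n, pi0):
    """Equation (22.1): standardize with the variance under H0."""
    return (pi_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)


def z_empirical(pi_hat, n, pi0):
    """Equation (22.2): standardize with the empirical variance."""
    return (pi_hat - pi0) / sqrt(pi_hat * (1 - pi_hat) / n)


PI0 = 0.15
# (cumulative sample size, observed response rate) at the two interim looks
interim_data = [(40, 0.35), (80, 0.30)]

for n, pi_hat in interim_data:
    se = sqrt(pi_hat * (1 - pi_hat) / n)
    print(
        f"n = {n}: SE = {se:.5f}, "
        f"Z(E) = {z_empirical(pi_hat, n, PI0):.3f}, "
        f"Z(N) = {z_null(pi_hat, n, PI0):.3f}"
    )
```

On these data the null-variance statistic is the larger of the two at both looks, which illustrates why the choice of standardization can matter in small samples.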
22.2 McNemar's Test
McNemar's Test is used in experimental situations where paired comparisons are observed. In a typical application, two binary response measurements are made on each subject, perhaps from two different treatments, or from two different time points. For example, in a comparative clinical trial, subjects are matched on baseline demographics and disease characteristics and then randomized, with one subject in the pair receiving the experimental treatment and the other subject receiving the control. Another example is the crossover clinical trial in which each subject receives both treatments. By random assignment, some subjects receive the experimental treatment followed by the control while others receive the control followed by the experimental treatment. Let πc and πt denote the response probabilities for the control and experimental treatments, respectively. The probability parameters for McNemar's test are displayed in Table 22.1.

Table 22.1: A 2 x 2 Table of Probabilities for McNemar's Test

                                Experimental
Control              No Response    Response    Total Probability
No Response          π00            π01         1 − πc
Response             π10            π11         πc
Total Probability    1 − πt         πt          1

The null hypothesis H0: πc = πt is tested against the alternative hypothesis H1: πc ≠ πt for the two-sided testing problem, or the alternative hypothesis H1: πc > πt (or H1: πc < πt) for the one-sided testing problem. Since πt = πc if and only if π01 = π10, the null hypothesis is also expressed as H0: π01 = π10, and is tested against the corresponding one-sided and two-sided alternatives. The power of this test depends on two quantities:

1. The difference between the two discordant probabilities (which is also the difference between the response rates of the two treatments), δ = π01 − π10 = πt − πc;
2. The sum of the two discordant probabilities, ξ = π10 + π01.

East accepts these two parameters as inputs at the design stage.

We next specify the test statistic to be used during the interim monitoring stage. Suppose we intend to execute McNemar's test a maximum of K times in a group sequential setting. Let the cumulative data up to and including the j-th interim look consist of N(j) matched pairs arranged in the form of the following 2 × 2 contingency table of counts:

Table 22.2: 2 × 2 Contingency Table of Counts of Matched Pairs at Look j

                          Experimental
Control          No Response    Response    Total
No Response      n00(j)         n01(j)      r0(j)
Response         n10(j)         n11(j)      r1(j)
Total            c0(j)          c1(j)       N(j)

For a = 0, 1 and b = 0, 1 define

\[ \hat{\pi}_{ab}(j) = \frac{n_{ab}(j)}{N(j)} . \tag{22.3} \]

Then the sequentially computed McNemar test statistic at look j is

\[ Z_j = \frac{\hat{\delta}_j}{\mathrm{se}(\hat{\delta}_j)} , \tag{22.4} \]

where

\[ \hat{\delta}_j = \hat{\pi}_{01}(j) - \hat{\pi}_{10}(j) \tag{22.5} \]

and

\[ \mathrm{se}(\hat{\delta}_j) = \frac{\sqrt{n_{01}(j) + n_{10}(j)}}{N(j)} . \tag{22.6} \]

We now show how to use East to design and monitor a clinical trial based on McNemar's test.

22.2.1 Trial Design
Consider a trial in which we wish to determine whether a transdermal delivery system (TDS) can be improved with a new adhesive. Subjects are to wear the old TDS (control) and new TDS (experimental) in the same area of the body for one week each. A response is said to occur if the TDS remains on for the entire one-week observation period. From historical data, it is known that control has a response rate of 85% (πc = 0.85). It is hoped that the new adhesive will increase this to 95% (πt = 0.95). Furthermore, of the 15% of the subjects who did not respond on the control, it is hoped that 87% will respond on the experimental system. That is, π01 = 0.87 × 0.15 ≈ 0.13. Based on these data, we can fill in all the entries of Table 22.1 as displayed in Table 22.3.
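The remaining cell probabilities for the TDS trial follow from πc, πt and π01 by subtraction along the margins of Table 22.1. The Python sketch below carries out that arithmetic using the rounded value π01 = 0.13 from the text, and then forms the two design parameters δ and ξ. The function name mcnemar_table is hypothetical and is used only for illustration.

```python
def mcnemar_table(pi_c, pi_t, pi01):
    """Fill in the 2 x 2 probability table from its margins and pi01.

    Rows are control (no response, response); columns are experimental.
    Hypothetical helper for illustration only.
    """
    pi00 = (1 - pi_c) - pi01      # row "control: no response" sums to 1 - pi_c
    pi10 = (1 - pi_t) - pi00      # column "experimental: no response" sums to 1 - pi_t
    pi11 = pi_c - pi10            # row "control: response" sums to pi_c
    return pi00, pi01, pi10, pi11


pi00, pi01, pi10, pi11 = mcnemar_table(pi_c=0.85, pi_t=0.95, pi01=0.13)
delta = pi01 - pi10   # difference of the discordant probabilities
xi = pi01 + pi10      # sum of the discordant probabilities

print(f"pi00 = {pi00:.2f}, pi01 = {pi01:.2f}, pi10 = {pi10:.2f}, pi11 = {pi11:.2f}")
print(f"delta = {delta:.2f}, xi = {xi:.2f}")
```

The two derived quantities, δ = 0.10 and ξ = 0.16, are the values entered at the design stage.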
Table 22.3: McNemar Probabilities for the TDS Trial

                                Experimental
Control              No Response    Response    Total Probability
No Response          0.02           0.13        0.15
Response             0.03           0.82        0.85
Total Probability    0.05           0.95        1

Although it is expected that the new adhesive will increase the adherence rate, the comparison is posed as a two-sided testing problem, testing H0: πc = πt against H1: πc ≠ πt at the 0.05 level. We wish to determine the sample size to have 90% power for the values displayed in Table 22.3. To design this trial, click the Design tab, then Single Sample in the Discrete group, and then click McNemar's Test for Matched Pairs.

Single-Look Design
First, consider a study with no interim analyses, and 90% power for a two-sided test at α = 0.05. Choose the design parameters as shown below. We first consider a single-look design, so leave the Number of Looks at its default value of 1. Enter 0.9 for Power. As shown in Table 22.3, we must specify δ1 = πt − πc = 0.1 and ξ = π01 + π10 = 0.16. Click Compute. The design Des1 is shown as a row in the Output Preview located in the lower pane of this window. A total of 158 subjects is required to have 90% power.

You can select this design by clicking anywhere on the row in the Output Preview. Click the icon to display the output summary in the upper pane. In the Output Preview toolbar, click the icon to save this design, Des1, to workbook Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

Five-Look Design
Now consider the same design with a maximum of 5 looks, using the default Lan-DeMets (O'Brien-Fleming) spending function. Create a new design by selecting Des1 in the Library, and clicking the icon on the Library toolbar.
Change the Number of Looks from 1 to 5, to generate a study with four interim looks and a final analysis. A new tab, Boundary, will appear. Clicking on this tab will reveal the stopping boundary parameters. By default, the Spacing of Looks is set to Equal, which means that the interim analyses will be equally spaced in terms of the number of patients accrued between looks. The left side contains details for the Efficacy boundary, and the right side for the Futility boundary. By default, there is an efficacy boundary (to reject H0) selected, but no futility boundary (to reject H1). The Boundary Family specified is of the Spending Functions type. The default spending function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF (O'Brien-Fleming), which generates boundaries that are very similar, though not identical, to the classical stopping boundaries of O'Brien and Fleming (1979). Technical details of these stopping boundaries are available in Appendix F.

Click Compute to generate output for Des2. With Des2 selected in the Output Preview, click the icon to save Des2 to the Library. In the Library, select the nodes for both Des1 and Des2 by holding the Ctrl key, and then click the icon. The upper pane will display the output summary of the two designs side-by-side.

There has been a slight inflation in the maximum sample size, from 158 to 162. However, the expected sample size is 120 subjects if the alternative hypothesis of δ1 = 0.10 and ξ = 0.16 holds. The stopping boundary, spending function, and Power vs. Sample Size charts can all be displayed by clicking on the appropriate icons from the Library toolbar.

22.2.2 Interim Monitoring
Consider interim monitoring of Des2. Select Des2 in the Library, and click the icon from the Library toolbar.
Alternatively, right-click on Des2 and select Interim Monitoring. A new IM worksheet will appear.

Suppose that the results are to be analyzed each time data become available for a further 32 subjects. After the first 32 subjects were enrolled, one subject responded on the control arm and did not respond on the treatment arm; four subjects responded on the treatment arm but did not respond on the control arm; 10 subjects did not respond on either treatment; and 17 subjects responded on both arms. This information is sufficient to complete all the entries in Table 22.2 and hence to evaluate the test statistic. Click the icon to invoke the Test Statistic Calculator. In the box next to Cumulative Sample Size, enter 32. Enter the values in the table as shown below and click Recalc.

Clicking OK results in the following entry in the first look row. As you can see, the value of the test statistic, 1.342, is within the stopping boundaries (−4.909, 4.909). Thus, the trial continues.

The second interim analysis was performed after data were available for 64 subjects. A total of two subjects responded on the control arm and failed to respond on the treatment arm; seven subjects responded on the treatment arm and failed to respond on the control arm; 20 subjects responded on neither arm; and 35 subjects responded on both arms. Click on the second row in the table in the upper section. Then click the icon. Enter the appropriate values in the table as shown below and click Recalc. Then click OK. This results in the following screen.
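The test statistic at each look depends only on the two discordant counts and the number of pairs, as equations (22.4) to (22.6) show. The Python sketch below applies those equations to the first two looks described above; the function name mcnemar_z is hypothetical. Here n01 counts pairs that responded on the experimental treatment only, and n10 counts pairs that responded on the control only, following Table 22.2.

```python
from math import sqrt


def mcnemar_z(n01, n10, n_pairs):
    """Sequential McNemar statistic from equations (22.4)-(22.6).

    n01: pairs responding on experimental only
    n10: pairs responding on control only
    n_pairs: cumulative number of matched pairs, N(j)
    Returns (delta_hat, se, z). Hypothetical helper for illustration.
    """
    delta_hat = (n01 - n10) / n_pairs      # equation (22.5)
    se = sqrt(n01 + n10) / n_pairs         # equation (22.6)
    return delta_hat, se, delta_hat / se   # equation (22.4)


look1 = mcnemar_z(n01=4, n10=1, n_pairs=32)
look2 = mcnemar_z(n01=7, n10=2, n_pairs=64)

for label, (delta_hat, se, z) in (("Look 1", look1), ("Look 2", look2)):
    print(f"{label}: delta_hat = {delta_hat:.4f}, se = {se:.4f}, Z = {z:.3f}")
```

Because N(j) cancels in the ratio, the statistic reduces to (n01 − n10)/√(n01 + n10), so the concordant pairs affect only the estimate of δ and not the value of Z.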
At the third interim analysis, after 96 subjects were enrolled, a total of two subjects responded on the control arm and failed to respond on the treatment arm; 13 subjects responded on the treatment arm and failed to respond on the control arm; 32 subjects did not respond on either arm; 49 subjects responded on both the arms. Click on the third row in the table in the upper section. Then click the icon. Enter the appropriate values in the table as shown below and click Recalc. Then click OK. This results in the following message box. Clicking on Stop yields the following Interim Monitoring output. We reject the null hypothesis that δ = 0, based on these data.

22.2.3 Simulation

Des2 can be simulated to examine the properties for different values of the parameters. First, we verify the results under the alternative hypothesis at which the power is to be controlled, namely δ1 = 0.10 and ξ = 0.16. Select Des2 in the Library, and click the icon from the Library toolbar. Alternatively, right-click on Des2 and select Simulate. A new Simulation worksheet will appear. Click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim1.

Select Sim1 in the Output Preview. If you click the icon, you will see some of the simulation output details displayed in the upper pane. Click the icon to save it to the Library. Double-click on Sim1 in the Library. The simulation output details will be displayed as shown below. The results confirm that the power is at about 90%.
To confirm the results under the null hypothesis, set δ1 = 0 in the Response Generation tab in the simulation worksheet and then click Simulate. The results, which confirm that the type-1 error rate is approximately 5%, are given below.

While it is often difficult to specify the absolute difference of the discordant probabilities, δ1, it is even more difficult to specify the sum of the discordant probabilities, ξ. Simulation can be used to examine the effects of misspecification of ξ. Run the simulations again, now with δ1 = 0.10 and ξ = 0.2. The results are given below. Notice that this provides a power of approximately 81%. Larger values of ξ would further decrease the power. However, values of ξ > 0.2 with δ1 = 0.1 would be inconsistent with the initial assumption of πc = 0.85 and πt = 0.95. Additional simulations for various values of δ and ξ can provide information regarding the consequences of misspecification of the input parameters.

23 Binomial Superiority Two-Sample

In experiments based on binomial data, the aim is to compare independent samples from two populations in terms of the proportion of sampling units presenting a given trait. In medical research, outcomes such as the proportion of patients responding to a therapy, developing a certain side effect, or requiring specialized care, would satisfy this definition. East supports the design, simulation, and interim monitoring of clinical trials in which this comparison is based on the difference of proportions, the ratio of proportions, or the odds ratio of the two populations. The three cases are discussed in the following sections.
23.1 Difference of Two Binomial Proportions

23.1.1 Trial Design
23.1.2 Interim Monitoring
23.1.3 Pooled versus Unpooled Designs

Let πc and πt denote the binomial probabilities for the control and treatment arms, respectively, and let δ = πt − πc. Interest lies in testing the null hypothesis that δ = 0 against one- and two-sided alternatives.

A special characteristic of binomial designs is the dependence of the variance of a binomial random variable on its mean. Because of this dependence, even if we keep all other test parameters the same, the maximum sample size required to achieve a specified power will be affected by how we intend to standardize the difference of binomial response rates when computing the test statistic at the interim monitoring stage. There are two options for computing the test statistic: use either the unpooled or the pooled estimate of variance for standardizing the observed treatment difference.

Suppose, for instance, that at the jth interim look the observed response rate on the treatment arm is π̂tj, and the observed response rate on the control arm is π̂cj. Let ntj and ncj be the number of patients on the treatment and control arms, respectively. Then the test statistic based on the unpooled variance is

$$ Z_j^{(u)} = \frac{\hat{\pi}_{tj} - \hat{\pi}_{cj}}{\sqrt{\dfrac{\hat{\pi}_{tj}(1-\hat{\pi}_{tj})}{n_{tj}} + \dfrac{\hat{\pi}_{cj}(1-\hat{\pi}_{cj})}{n_{cj}}}} . \tag{23.1} $$

In contrast, the test statistic based on the pooled variance is

$$ Z_j^{(p)} = \frac{\hat{\pi}_{tj} - \hat{\pi}_{cj}}{\sqrt{\hat{\pi}_j(1-\hat{\pi}_j)\left[\dfrac{1}{n_{tj}} + \dfrac{1}{n_{cj}}\right]}} , \tag{23.2} $$

where

$$ \hat{\pi}_j = \frac{n_{tj}\hat{\pi}_{tj} + n_{cj}\hat{\pi}_{cj}}{n_{tj} + n_{cj}} . \tag{23.3} $$

It can be shown that $[Z_j^{(p)}]^2$ is the familiar Pearson chi-square statistic computed from all the data accumulated by the jth look.

The maximum sample size required to achieve a given power depends on whether, at the interim monitoring stage, we intend to use the unpooled statistic (23.1) or the pooled statistic (23.2) to determine statistical significance.
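To make the two standardizations concrete, the following sketch evaluates the unpooled statistic (23.1) and the pooled statistic (23.2) on one set of interim counts and checks that the square of the pooled statistic equals the Pearson chi-square statistic without continuity correction. The counts (30 events out of 300 on treatment, 45 out of 300 on control) are made up for illustration, and the function names are hypothetical rather than part of East.

```python
import math


def z_unpooled(x_t: int, n_t: int, x_c: int, n_c: int) -> float:
    """Equation (23.1): each arm contributes its own variance estimate."""
    p_t, p_c = x_t / n_t, x_c / n_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return (p_t - p_c) / se


def z_pooled(x_t: int, n_t: int, x_c: int, n_c: int) -> float:
    """Equation (23.2), using the pooled response rate of equation (23.3)."""
    p_t, p_c = x_t / n_t, x_c / n_c
    p = (x_t + x_c) / (n_t + n_c)
    se = math.sqrt(p * (1 - p) * (1 / n_t + 1 / n_c))
    return (p_t - p_c) / se


def pearson_chi_square(x_t: int, n_t: int, x_c: int, n_c: int) -> float:
    """Pearson chi-square for the 2x2 table, without continuity correction."""
    observed = [[x_t, n_t - x_t], [x_c, n_c - x_c]]
    rows = [n_t, n_c]
    cols = [x_t + x_c, (n_t - x_t) + (n_c - x_c)]
    total = n_t + n_c
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical interim data, not taken from the CAPTURE trial.
data = dict(x_t=30, n_t=300, x_c=45, n_c=300)
zu = z_unpooled(**data)
zp = z_pooled(**data)
chi2 = pearson_chi_square(**data)
print(f"unpooled Z = {zu:.4f}, pooled Z = {zp:.4f}")
print(f"pooled Z squared = {zp ** 2:.4f}, Pearson chi-square = {chi2:.4f}")
```

With equal arm sizes the two statistics are close; they diverge more when the allocation is unbalanced, which is the situation examined in Section 23.1.3.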
The technical details of the sample size computations for these two options are given in Appendix B, Section B.2.5. The CAPTURE clinical trial is designed in Section 23.1.1 and monitored in Section 23.1.2 under the assumption that the unpooled statistic will be used for interim monitoring. In Section 23.1.3, however, the same trial is re-designed on the basis of the pooled variance. It is seen that the difference in sample size due to the two design assumptions is almost negligible. This is because the CAPTURE trial utilized balanced randomization. We show further in Section 23.1.3 that if the randomization is unbalanced, the difference in sample size based on the two design assumptions can be substantial.

23.1.1 Trial Design

Design objectives and interim results from CAPTURE, a prospective randomized trial of placebo versus Abciximab for patients with refractory unstable angina, were presented at a workshop on clinical trial data monitoring committees (Anderson, 2002). The primary endpoint was reduction in death or MI within 30 days of entering the study. The study was designed for 80% power to detect a reduction in the event rate from 15% on the placebo arm to 10% on the Abciximab arm. A two-sided test with a type-1 error of 5% was used. We will illustrate various design and interim monitoring features of East for studies with binomial endpoints with the help of this example, so that it can serve as a model for designing and monitoring your own binomial studies.

Single Look Design

To begin, click the Design tab, then Two Samples on the Discrete group, and then click Difference of Proportions.

The goal of this study is to test the null hypothesis, H0, that the Abciximab and placebo arms both have an event rate of 15%, versus the alternative hypothesis, H1, that Abciximab reduces the event rate by 5%, from 15% to 10%. It is desired to have a two-sided test with three looks at the data, a type-1 error of α = 0.05 and a power of (1 − β) = 0.8.
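As a rough cross-check on the single-look version of this design, the sketch below applies the textbook normal-approximation sample size formula for the unpooled statistic with equal allocation. This is an assumption on our part and not necessarily the computation described in Appendix B; the function name fixed_sample_size is hypothetical. With the CAPTURE inputs it gives a total of 1366 subjects, which agrees with the sample size East reports for the single-look design Des1 in this section.

```python
import math
from statistics import NormalDist


def fixed_sample_size(p_control: float, p_treatment: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Total sample size, two-sided test, 1:1 allocation, unpooled variance.

    Assumed formula (per arm):
        n = (z_{alpha/2} + z_beta)^2 * [pc(1-pc) + pt(1-pt)] / (pt - pc)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    per_arm = (z_alpha + z_beta) ** 2 * variance_sum / (p_treatment - p_control) ** 2
    # Round the combined total up to the next whole subject.
    return math.ceil(2 * per_arm)


total = fixed_sample_size(p_control=0.15, p_treatment=0.10)
print(total)  # 1366
```

Group sequential designs inflate this figure slightly, because the type-1 error has to be shared across several looks.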
Choose the test parameters as shown below. We first consider a single-look design, so leave Number of Looks at its default value of 1. Enter 0.8 for the Power. To specify the appropriate effect size, enter 0.15 for the Prop. Under Control and 0.10 for the Prop. Under Treatment. Notice that you have the option to select the manner in which the test statistic will be standardized at the hypothesis testing stage. If you choose Unpooled Estimate, the standardization will be according to equation (23.1). If you choose Pooled Estimate, the standardization will be according to equation (23.2). For the present, choose the Unpooled Estimate option. The other choice in this dialog box is whether or not to use the Casagrande-Pike-Smith (1978) correction for small sample sizes. This is not usually necessary, as can be verified by the simulation options in East. The dialog box containing the test parameters will now look as shown below.

Next, click the Compute button. The design is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (1366 subjects) is highlighted in yellow.

You can select this design Des1 by clicking anywhere on the row in the Output Preview. Now you can click the icon to see the output summary displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design Des1 to Workbook Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs Treatment Effect (δ). The resulting power curve for this design is shown. You can save this chart to the Library by clicking Keep.
Alternatively, you can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As... For now, you may close the chart before continuing.

Group Sequential Design

Create a new design by selecting Des1 in the Library, and clicking the icon on the Library toolbar. Change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab, Boundary, will appear. Clicking on this tab will reveal the stopping boundary parameters. By default, the Spacing of Looks is set to Equal, which means that the interim analyses will be equally spaced in terms of the number of patients accrued between looks. The left side contains details for the Efficacy boundary, and the right side for the Futility boundary. By default, there is an efficacy boundary (to reject H0) selected, but no futility boundary (to reject H1). The Boundary Family specified is of the Spending Functions type. The default Spending function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF (O’Brien-Fleming), which generates boundaries that are very similar, though not identical, to the classical stopping boundaries of O’Brien and Fleming (1979). Technical details of these stopping boundaries are available in Appendix F.

Click the Boundary tab to see the details of cumulative alpha spent, and the boundary values, in the Look Details table. Click Compute to generate output for a new design Des2. The 3-look group sequential design displayed in Des2 requires an up-front commitment of up to a maximum of 1384 patients. That is 18 patients more than the fixed sample design displayed in Des1. Notice, however, that under the alternative hypothesis of a 5% drop in the event rate, the expected sample size is only 1183 patients: a saving of 201 patients relative to the maximum sample size, and of 183 patients relative to the fixed sample design.
This is because the test statistic could cross a stopping boundary at one of the interim looks.

With Des2 selected in the Output Preview, click the icon to save Des2 to the Library. In order to see the stopping probabilities, as well as other characteristics, select Des2 in the Library, and click the icon. The cumulative boundary stopping probabilities are shown in the Stopping Boundaries table. Close the Output window before continuing. The stopping boundary chart can be brought up by clicking the icon on the Library toolbar, and then clicking Stopping Boundaries. The following chart will appear.

Lan-DeMets Spending Function: O’Brien-Fleming Version

Close this chart, click the icon in the Library toolbar, and then click Error Spending. The following chart will appear. This spending function was proposed by Lan and DeMets (1983), and for two-sided tests has the following functional form:

$$ \alpha(t) = 4 - 4\Phi\left(\frac{z_{\alpha/4}}{\sqrt{t}}\right) . \tag{23.4} $$

Notice that hardly any type-1 error is spent in the early stages of the trial but the rate of error spending increases rapidly as the trial progresses. This is reflected in the corresponding stopping boundaries. The upper and lower boundary values are rather wide apart initially (±3.712 standard deviations) but come closer together with each succeeding interim look until at the last look the standardized test statistic crosses the boundary at ±1.993 standard deviations. This is not too far off from the corresponding boundary values, ±1.96, required to declare statistical significance at the 0.05 level for a fixed sample design.
For this reason this spending function is often adopted in preference to other spending functions that spend the type-1 error more aggressively and thereby reduce the expected sample size under H1 by a greater amount.

Lan-DeMets Spending Function: Pocock Version

A more aggressive spending function, also proposed by Lan and DeMets (1983), is PK, which refers to Pocock. This spending function captures the spirit of the Pocock (1977) stopping boundary belonging to the Wang and Tsiatis (1987) power family, and has the functional form

$$ \alpha(t) = \alpha \log\left(1 + (e - 1)t\right) . \tag{23.5} $$

Select Des2 in the Library, and click the icon on the Library toolbar. On the Boundary tab, change the Parameter from OF to PK, and click Compute to create design Des3. With Des3 selected in the Output Preview, click the icon. In the Library, select the nodes for both Des2 and Des3, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

Under Des3, you must make an up-front commitment of up to 1599 patients, considerably more than you would need for a fixed sample design. However, because the type-1 error is spent more aggressively in the early stages, the expected sample size is only 1119 patients.

For now, close this output window, and click the icon on the Library toolbar to compare the two designs according to Power vs. Sample Size. Using the same icon, select Stopping Boundaries. Notice, by moving the cursor from right to left in the stopping boundary charts, that the stopping boundary derived from the PK spending function is approximately flat, requiring ±2.28 standard deviations at the first look, ±2.29 standard deviations at the second look, and ±2.30 standard deviations at the third look.
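The two spending functions (23.4) and (23.5) are easy to evaluate directly. The sketch below does so for α = 0.05 at the equally spaced information fractions 1/3, 2/3 and 1, and converts the error spent at the first look into a two-sided critical value. Only the first-look boundary can be obtained this way, since no earlier look has to be accounted for; later boundaries require the recursive numerical integration that East performs and are not attempted here. The function names are hypothetical. The first-look values come out at about ±3.71 for OF and ±2.28 for PK, in line with the boundaries quoted in this section.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def spend_of(t: float, alpha: float = 0.05) -> float:
    """Lan-DeMets O'Brien-Fleming-type spending, two-sided form, eq. (23.4)."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 4)  # upper alpha/4 quantile
    return 4 - 4 * STD_NORMAL.cdf(z / math.sqrt(t))


def spend_pk(t: float, alpha: float = 0.05) -> float:
    """Lan-DeMets Pocock-type spending, eq. (23.5)."""
    return alpha * math.log(1 + (math.e - 1) * t)


def first_look_boundary(alpha_spent: float) -> float:
    """Two-sided critical value that uses up alpha_spent at the first look."""
    return STD_NORMAL.inv_cdf(1 - alpha_spent / 2)


for t in (1 / 3, 2 / 3, 1.0):
    print(f"t = {t:.3f}: OF spends {spend_of(t):.5f}, PK spends {spend_pk(t):.5f}")

of_bound = first_look_boundary(spend_of(1 / 3))
pk_bound = first_look_boundary(spend_pk(1 / 3))
print(f"first-look boundary: OF +/-{of_bound:.2f}, PK +/-{pk_bound:.2f}")
```

Both functions spend the full α at t = 1, but PK has already spent close to half of it by the first look, whereas OF has spent almost none.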
In contrast, the stopping boundary derived from the OF spending function requires ±3.71 standard deviations at the first look, ±2.51 standard deviations at the second look and ±1.99 standard deviations at the third look. This translates into a smaller expected sample size under H1 for Des3 than for Des2. This advantage is, however, offset by at least two drawbacks of the stopping boundary derived from the PK spending function: the large up-front commitment of 1599 patients, and the large standardized test statistic of 2.295 (corresponding to a two-sided p value of 0.0217) required at the last look in order to declare statistical significance.

Using the same icon, select Error Spending to compare the two designs graphically in terms of error spending functions. Des3 (PK) spends the type-1 error probability at a much faster rate than Des2 (OF). Close the chart before continuing.

Wang and Tsiatis Power Boundaries

The stopping boundaries generated by the Lan-DeMets OF and PK functions closely resemble the classical O’Brien-Fleming and Pocock stopping boundaries, respectively. These classical boundaries are a special case of a family of power boundaries proposed by Wang and Tsiatis (1987). For a two-sided α level test, using K equally spaced looks, the power boundaries for the standardized test statistic Zj at the j-th look are of the form

$$ |Z_j| \ge \frac{C(\Delta, \alpha, K)}{(j/K)^{0.5-\Delta}} . \tag{23.6} $$

The normalizing constant C(∆, α, K) is evaluated by recursive integration so as to ensure that the K-look group sequential test has type-1 error equal to α. Select Des3 in the Library and click the icon on the Library toolbar. On the Boundary tab, change the Boundary Family from Spending Functions to Wang-Tsiatis.
Leave the default value of ∆ as 0 and click Compute to create design Des4. With Des4 selected in the Output Preview, click the icon. In the Library, select both Des2 and Des4 by holding the Ctrl key. Click the icon, and under Select Chart on the right, select Stopping Boundaries. As expected, the boundary values for Des2 (Lan-DeMets, OF) and Des4 (Wang-Tsiatis, ∆ = 0) are very similar. Close the chart before continuing.

The Power Chart and the ASN Chart

East provides some additional tools for evaluating study designs. Select Des3 in the Library, click the icon, and then click Power vs. Treatment effect (δ). By scrolling from left to right with the vertical line cursor, one can observe the power for various values of the effect size. Close this chart, and with Des3 selected, click the icon again. Then click Expected Sample Size. The following chart appears:

This chart displays the Expected Sample Size as a function of the effect size and confirms that for Des3 the average sample size is 1566 under H0 (effect size, zero) and 1120 under H1 (effect size, -0.05).

Unequally spaced analysis time points

In the above designs, we have assumed that analyses were equally spaced. This assumption can be relaxed if you know when interim analyses are likely to be performed (e.g. for administrative reasons). In either case, departures from this assumption are allowed during the actual interim monitoring of the study, but sample size requirements will be more accurate if allowance is made for this knowledge. With Des3 selected in the Library, click the icon. Under Spacing of Looks in the Boundary tab, click the Unequal radio button. The column titled Info.
Fraction in the Look Details table can be edited to modify the relative spacing of the analyses. The information fraction refers to the proportion of the maximum (yet unknown) sample size. By default, this table displays equal spacing, but suppose that the two interim analyses will be performed with 0.25 and 0.5 (instead of 0.333 and 0.667) of the maximum sample size. Enter these new information fraction values and click Compute to create design Des5. Select Des5 in the Output Preview and click the icon to save it in the Library for now.

Arbitrary amounts of error probability to be spent at each analysis

Another feature of East is the possibility to specify arbitrary amounts of cumulative error probability to be used at each look. This option can be combined with the option of unequal spacing of the analyses. With Des5 selected in the Library, click the icon on the Library toolbar. Under the Boundary tab, select Interpolated for the Spending Function. In the column titled Cum. α Spent, enter 0.005 for the first look and 0.03 for the second look, and click Compute to create design Des6.

Select Des6 in the Output Preview and click the icon. From the Library, select Des5 and Des6 by holding the Ctrl key. Click the icon, and under Select Chart on the right, select Stopping Boundaries. The following chart will be displayed.

Computing power for a given sample size

When sample size is a given design constraint, East can compute the achieved power, given the other test parameters. Select Des6 in the Library and click the icon. On the Test Parameters tab, click the radio button for Power (1 − β). You will notice that the field for power will contain the word Computed. You may now enter a value for the sample size: 1250, and click Compute.
The following output will appear in the Output Preview in the Des7 row, where, as expected, the achieved power is less than 0.8, namely 0.714.

To delete this design, click Des7 in the Output Preview, and click the icon in the Output Preview toolbar. East will display a warning to make sure that you want to delete the selected row. Click Yes to continue.

Stopping Boundaries for Early Rejection of H0 or H1

Although both Des2 and Des3 reduce the expected sample size substantially by rejecting H0 when H1 is true, they are unable to do so if H0 is true. It is, however, often desirable to terminate a study early if H0 is true since that would imply that the new treatment is no different than the standard treatment. East can produce stopping boundaries that result in early termination under either H0 or H1. Stopping boundaries for early termination if H1 is true are known as efficacy boundaries. They are obtained by choosing an appropriate α-spending function. These boundaries ensure that the type-1 error does not exceed the pre-specified significance level α. East can also construct stopping boundaries for rejecting H1 and terminating early if H0 is true. These stopping boundaries are known as futility boundaries. They are obtained by choosing an appropriate β-spending function. These boundaries ensure that the type-2 error does not exceed β and thereby ensure that the power of the study is preserved at 1 − β despite the possibility of early termination for futility. Pampallona and Tsiatis (1994) have extended the error spending function methodology of Lan and DeMets (1983) so as to spend both α, the type-1 error, and β, the type-2 error, and thereby obtain efficacy and futility boundaries simultaneously.
East provides you with an entire catalog of published spending functions from which you can take your pick for generating both the H0 and H1 boundaries. For various reasons, investigators usually prefer to be very conservative about early stopping for efficacy but are likely to be more aggressive about cutting their losses and stopping early for futility. Suppose then that you wish to use the conservative Lan-DeMets (OF) spending function for early termination to reject H0 in favor of H1, but use a more aggressive spending function for early termination to reject H1 in favor of H0. Possible choices for spending functions to reject H1 that are more aggressive than Lan-DeMets (OF) but not as aggressive as Lan-DeMets (PK) are members of the Rho family (Jennison and Turnbull, 2000) and the Gamma family (Hwang, Shih and DeCani, 1990). For illustrative purposes we will use the Gamma(−1) spending function from the Gamma family.

Select Des2 in the Library and click the icon. For the futility boundary on the Boundary tab, select Spending Functions and then select Gamma Family. Set the Parameter to −1. Also, click on the Binding option to the right. The screen will look like this:

On the Boundary tab, you may click the corresponding icons to view plots of the error spending functions or the stopping boundaries.

Observe that the β-spending function (upper in red) spends the type-2 error substantially faster than the α-spending function (lower in blue). These stopping boundaries are known as inner-wedge stopping boundaries. They divide the sample space into three zones corresponding to three possible decisions.
If the test statistic enters the lower blue zone, we terminate the trial, reject H0, and conclude that the new treatment (Abciximab) is beneficial relative to the placebo. If the test statistic enters the upper blue zone, we terminate the trial, reject H0, and conclude that the new treatment is harmful relative to the placebo. If the test statistic enters the center (pink) zone, we terminate the trial, reject H1, and conclude that Abciximab offers no benefit relative to placebo. Assuming that the event rate is 0.15 for the placebo arm, this strategy has a 2.5% chance of declaring benefit and a 2.5% chance of declaring harm when the event rate for the Abciximab arm is also 0.15. Furthermore, this strategy has a 20% chance of entering the pink zone and declaring no benefit when there actually is a substantial benefit with Abciximab, resulting in a drop in the event rate from 0.15 to 0.1. In other words, Des7 has a two-sided type-1 error of 5% and 80% power.

Click Compute and, with Des7 selected in the Output Preview, click the icon. To view the design details, click the icon. Des7 requires an up-front commitment of 1468 patients, but the expected sample size is 1028 patients under H0, and 1164 patients under H1. You may wish to save this output (e.g., in HTML format) by clicking on the icon, or to print it by clicking on the icon. Close the output window before continuing.

Boundaries with Early Stopping for Benefit or Futility

Next suppose you are interested in designing the clinical trial in such a way that you can reach only two conclusions, not three. You wish to demonstrate either that Abciximab is beneficial relative to placebo or that it offers no benefit relative to placebo, but there is no interest in demonstrating that Abciximab is harmful relative to placebo. To design this two-decision trial, select Des7 in the Library and click the icon.
Change the entry in the Test Type cell from 2-Sided to 1-Sided. Check to ensure that the other specifications are the same as in Des7. Click Compute to generate the design, Des8. The error spending functions are the same but this time the stopping boundaries divide the sample space into two zones only, as shown below.

If the test statistic enters the lower (blue) zone, the null hypothesis is rejected in favor of concluding that Abciximab is beneficial relative to placebo. The probability of this event under H0 is 0.05. If the test statistic enters the upper (pink) zone, the alternative hypothesis is rejected in favor of concluding that Abciximab offers no benefit relative to placebo. The probability of this event under H1 is 0.2. In other words, Des8 has a one-sided type-1 error rate of 5% and 80% power. Since Des8 precludes the possibility of demonstrating that Abciximab is harmful relative to placebo, it requires far fewer patients. It only requires an up-front commitment of 1156 patients and the expected sample size is 681 if H0 is true and 892 if H1 is true.

Before continuing to the next section, we will save the current workbook, and open a new workbook. Select the workbook node in the Library, click the button in the top left-hand corner, and click Save. Alternatively, select Wbk1 in the Library and right-click, then click Save. This saves all the work done so far in your directory. Next, click the button, click New, and then Workbook. A new workbook, Wbk2, should appear in the Library. Next, close the window to clear all designs from the Output Preview.

Multiple designs for discrete outcomes

East allows the user to easily create multiple designs by specifying a range of values for certain parameters in the design window.
In studies with discrete outcomes, East supports the input of multiple key parameters at once to simultaneously create a number of different designs. For example, suppose in a multi-look study the user wants to generate designs for all combinations of the following parameter values in a two sample Difference of Proportions test: Power = 0.8 and 0.9, and Alternative Hypothesis - Prop. under Treatment = 0.4, 0.5 and 0.6. The number of combinations is 2 x 3 = 6. East creates all combinations using only a single specification under the Test Parameters tab in the design window. As shown below, the values for Power are entered as a list of comma separated values, while the values of Prop. under Treatment for the alternative hypothesis are entered as a colon separated range of values, 0.4 to 0.6 in steps of 0.1.

East computes all 6 designs and displays them in the Output Preview window:

East provides the capability to analyze multiple designs in ways that make comparisons between the designs visually simple and efficient. To illustrate this, a selection of a few of the above designs can be viewed simultaneously in both the Output Summary section as well as in the various tables and plots. The following is a subsection of the designs computed from the above example with differing values for number of looks, power and proportion under treatment. Designs are displayed side by side, allowing details to be easily compared. Save these designs in the newly created workbook.
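The way a comma separated list and a colon separated range expand into 2 x 3 = 6 designs can be sketched as follows. The exact text that East expects in the input fields is shown only in the screenshot, so the strings "0.8, 0.9" and "0.4:0.6:0.1" (read as start:stop:step) and the helper parse_values are assumptions made for illustration, not East's actual parser.

```python
from itertools import product


def parse_values(spec: str) -> list[float]:
    """Expand 'a, b, c' lists or 'start:stop:step' ranges (assumed syntax)."""
    spec = spec.strip()
    if ":" in spec:
        start, stop, step = (float(part) for part in spec.split(":"))
        count = int(round((stop - start) / step)) + 1
        # Round away floating-point noise such as 0.6000000000000001.
        return [round(start + i * step, 10) for i in range(count)]
    return [float(part) for part in spec.split(",")]


powers = parse_values("0.8, 0.9")
props_under_treatment = parse_values("0.4:0.6:0.1")

# Every (power, proportion) pair corresponds to one design row.
designs = list(product(powers, props_under_treatment))
for number, (power, prop) in enumerate(designs, start=1):
    print(f"design {number}: power = {power}, prop. under treatment = {prop}")
```

The count grows multiplicatively with each additional parameter that is given more than one value, which is worth keeping in mind before entering long ranges.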
In addition, East allows multiple designs to be viewed simultaneously either graphically or in tabular format: Stopping Boundaries (table), Error Spending (table), Stopping Boundaries (plot), and Power vs. Treatment Effect (plot).

This capability allows the user to explore a greater space of possibilities when determining the best choice of study design.

Select individual looks

With Des8 selected in Wbk1, click the icon. In the Spacing of Looks table of the Boundary tab, notice that there are ticked checkboxes under the columns Stop for Efficacy and Stop for Futility. East gives you the flexibility to remove one of the stopping boundaries at certain looks, subject to the following constraints: (1) both boundaries must be included at the final two looks, (2) at least one boundary, either efficacy or futility, must be present at each look, (3) once a boundary has been selected all subsequent looks must include this boundary as well and (4) the efficacy boundary for the penultimate look cannot be absent.

Untick the checkbox in the first look under the Stop for Futility column. Click Recalc, and click the icon to view the new boundaries. Notice that the futility boundary does not begin until the second look.

Simulation Tool

Let us verify the operating characteristics of Des8 from Wbk1 through simulations. Select Des8 in the Library, and click the icon from the Library toolbar. Alternatively, right-click on Des8 and select Simulate. A new Simulation worksheet will appear.

Let us first verify, by running 10,000 simulated clinical trials, that the type-1 error is indeed 5%.
That is, we must verify that if the event rate for both the placebo and treatment (Abciximab) arms is 0.15, only about 500 of these simulations will reject H0. Click on the Response Generation tab, and change the entry in the cell labeled Prop. Under Treatment from 0.1 to 0.15. Next, click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim1. Select Sim1 in the Output Preview. Click the icon to save it to the Library. Double-click on Sim1 in the Library. The simulation output details will be displayed. In the Details output, notice that 487 of the 10,000 simulations rejected H0. (This number might vary, depending on the starting seed used for the simulations.) This confirms that the type-1 error is preserved (up to Monte Carlo accuracy) by these stopping boundaries.

Next, run 10,000 simulations under the alternative hypothesis H1 that the event rate for placebo is 0.15 but the event rate for Abciximab is 0.1. Right-click Sim1 in the Library and click Edit Simulation. In the Response Generation tab, enter 0.10 for Prop. Under Treatment. Leave all other values as they are, and click Simulate to create output Sim2. Select Sim2 in the Output Preview and save it to Workbook Wbk1. In the Overall Simulation Result table, notice that the lower efficacy stopping boundary was crossed in 7996 out of 10,000 simulated trials, which is consistent with 80% power (up to Monte Carlo accuracy) for the original design. Moreover, 393 of these simulations were able to reject the null hypothesis at the very first look. Feel free to experiment further with other simulation options before continuing.
23.1.2 Interim Monitoring

The spending functions discussed above were for illustrative purposes only. They were not used in the actual CAPTURE trial. Instead, the investigators created their own spending function, which is closely approximated by the Gamma spending function of Hwang, Shih and DeCani (1990) with parameter −4.5. The investigators then used this spending function to generate two-sided boundaries for early stopping only to reject H0. Moreover, since it was felt that the trial would enroll patients rapidly, the study was designed for three unequally spaced looks: one interim analysis after 25% enrollment, a second interim analysis after 50% enrollment, and a final analysis after all the patients had enrolled.

To design this trial, select Des2 in the Library and click the icon. In the Boundary tab, in the Efficacy box, set Spending Function to Gamma Family and change the Parameter (γ) to −4.5. In the Futility box, make sure Boundary Family is set to None. Click the radio button for Unequal in the Spacing of Looks box. In the Looks Details table change the Info. Fraction to 0.25 and 0.50 for Looks 1 and 2, respectively. Click Compute. In the Output Preview toolbar, click the icon to save this design to Wbk1 in the Library. Select Des9 in the Library, and click the icon from the Library toolbar. Alternatively, right-click on Des9 and select Interim Monitoring dashboard.

The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections. The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right. These charts provide graphical and numerical descriptions of the progress of the clinical trial and are useful tools for decision making by a data monitoring committee.
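The Gamma(−4.5) spending function chosen for this design can be evaluated directly. The following is a minimal sketch, assuming the usual Hwang-Shih-DeCani form α(t) = α(1 − e^(−γt))/(1 − e^(−γ)) for γ ≠ 0; the function name `gamma_spending` is hypothetical and this is not East's internal code. It evaluates the cumulative two-sided error spent at the planned information fractions 0.25, 0.50 and 1.

```python
import math

def gamma_spending(t, alpha, gamma):
    """Cumulative type-1 error spent at information fraction t (0 <= t <= 1)
    under the Hwang-Shih-DeCani gamma family (assumed standard form)."""
    if gamma == 0:
        return alpha * t  # limiting case of the family: linear spending
    return alpha * (1 - math.exp(-gamma * t)) / (1 - math.exp(-gamma))

ALPHA = 0.05   # total two-sided type-1 error of the design
GAMMA = -4.5   # spending-function parameter used for the CAPTURE-like design

# Planned information fractions: two interim looks and the final look.
spent = {t: gamma_spending(t, ALPHA, GAMMA) for t in (0.25, 0.50, 1.00)}
for t, a in spent.items():
    print(f"info fraction {t:.2f}: cumulative alpha spent = {a:.4f}")
```

Under these assumptions the cumulative error spent at the two interim looks is roughly 0.0012 and 0.0048, in line with the values read off the Error Spending chart later in this section.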
Click on the icon to invoke the Test Statistic Calculator. The first interim look was taken after accruing a total of 350 patients, 175 per treatment arm. There were 30 events on the placebo arm and 14 on the Abciximab arm. Based on these data, the event rate for placebo is 30/175 = 0.17143 and the event rate for Abciximab is 14/175 = 0.08. Hence the estimate of δ is δ̂ = 0.08 − 0.17143 = −0.09143. The unpooled estimate of the SE of δ̂ is

\[ \sqrt{\frac{(14/175)(161/175)}{175} + \frac{(30/175)(145/175)}{175}} = 0.035103. \tag{23.7} \]

So the value of the test statistic is

\[ \frac{\hat{\delta}}{\mathrm{SE}} = \frac{-0.09143}{0.035103} = -2.60457. \tag{23.8} \]

We will use the test statistic calculator and specify the values of δ̂ and SE in it. The test statistic calculator will then compute the test statistic value and post it into the interim monitoring sheet. This process will ensure that the RCI and final adjusted estimates will be computed using the estimates of δ and SE obtained from the observed data. Click on the Estimate of δ and Std. Error of δ radio button. Type in (14/175) − (30/175) for Estimate of δ. The Estimate of δ is computed as −0.091429. We can then enter the expression given by (23.7) for the Std. Error of Estimate of δ. Click on Recalc to get the Test Statistic value, then OK to continue.

The top panel of the interim monitoring worksheet displays upper and lower stopping boundaries and upper and lower 95% repeated confidence intervals. The lower stopping boundary for rejecting H0 is −3.239. Since the current value of the test statistic is −2.605, the trial continues. The repeated confidence interval is (−0.205, 0.022). We thus conclude, with 95% confidence, that Abciximab is unlikely to increase the event rate by any more than 2.2% relative to placebo and might actually reduce the event rate by as much as 20.5%.
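The arithmetic behind (23.7) and (23.8) can be checked outside East. The sketch below is illustrative only: the helper name `unpooled_z` is hypothetical, and it simply applies the unpooled difference-of-proportions formulas to the first-look counts quoted above.

```python
import math

def unpooled_z(events_t, n_t, events_c, n_c):
    """Estimate of delta = pi_t - pi_c, its unpooled standard error,
    and the resulting standardized test statistic."""
    p_t = events_t / n_t
    p_c = events_c / n_c
    delta = p_t - p_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return delta, se, delta / se

# First interim look: 14/175 events on Abciximab, 30/175 on placebo.
delta1, se1, z1 = unpooled_z(14, 175, 30, 175)
print(f"delta = {delta1:.6f}, SE = {se1:.6f}, Z = {z1:.4f}")
```

The same helper can be reused at later looks by passing the cumulative event counts and sample sizes for each arm.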
Now click on the second row in the table in the upper section. Then click the icon. A second interim look was taken after accruing a total of 700 patients, 353 on placebo and 347 on Abciximab. By this time point there were a total of 55 events on the placebo arm and 37 events on the Abciximab arm. Based on these data, the event rate for placebo is 55/353 = 0.15581 and the event rate for Abciximab is 37/347 = 0.10663. Hence the estimate of δ is δ̂ = 0.10663 − 0.15581 = −0.04918. The unpooled estimate of the SE of δ̂ is

\[ \sqrt{\frac{(37/347)(310/347)}{347} + \frac{(55/353)(298/353)}{353}} = 0.02544. \tag{23.9} \]

So the value of the test statistic is

\[ \frac{\hat{\delta}}{\mathrm{SE}} = \frac{-0.04918}{0.02544} = -1.9332. \tag{23.10} \]

We will now enter the above values of δ̂ and SE in the test statistic calculator for posting the test statistic value into the interim monitoring sheet. Enter the appropriate values for Cumulative SS and Cumulative Response. Click the Recalc button. The calculator updates the fields for total sample size, δ and SE.

The updated sheet is displayed below. At this interim look, the stopping boundary for early rejection of H0 is ±2.868 and the 95% repeated confidence interval is still unable to exclude a difference of zero for the two event rates. Thus the study continues.

The Stopping Boundaries chart of the dashboard displays the path traced out by the test statistic in relation to the upper and lower stopping boundaries at the first two interim looks. To expand this chart to full size, click the icon located at the top right of the chart. This full-sized chart displays stopping boundaries that have been recomputed on the basis of the error spent at each look, as shown on the Error Spending chart located at the bottom left of the dashboard.
To display this full-sized chart, close the current chart and click the icon on the Error Spending chart. By moving the vertical cursor from left to right on this chart we observe that 0.0012 of the total error was spent by the first interim look and 0.005 of it was spent by the second interim look. Close this chart before continuing.

Although this study was designed for two interim looks and one final look, the data monitoring committee decided to take a third unplanned look after accruing 1050 patients, 532 on placebo and 518 on Abciximab. The error spending function methodology permits this flexibility. Both the timing and number of interim looks may be modified from what was proposed at the design stage. East will recompute the new stopping boundaries on the basis of the error actually spent at each look rather than the error that was proposed to be spent.

There were 84 events on the placebo arm and 55 events on the Abciximab arm. Hence the estimate of δ is δ̂ = 0.1062 − 0.1579 = −0.05171. The unpooled estimate of the SE of δ̂ is 0.02081. So the value of the test statistic is −2.4849. Click the third row of the table in the top portion and then click the icon. Upon entering this summary information, through the test statistic calculator, into the interim monitoring sheet we observe that the stopping boundary is crossed. Press the Stop button and observe the results in the interim monitoring worksheet. The 95% repeated confidence interval is (−0.103, −0.011); it excludes 0, thus confirming that the null hypothesis should be rejected.
Once the study is terminated, East computes a final p-value, confidence interval and median unbiased point estimate, all adjusted for the multiple looks, using a stage-wise ordering of the sample space as proposed by Tsiatis, Rosner and Mehta (1984). The adjusted p-value is 0.016. The adjusted confidence interval for the difference in event rates is (−0.092, −0.010) and the median unbiased estimate of the difference in event rates is −0.051. In general, the adjusted confidence interval produced at the end of the study is narrower than the final repeated confidence interval, although both intervals provide valid coverage of the unknown effect size.

23.1.3 Pooled versus Unpooled Designs

The manner in which the data will be analyzed at the interim monitoring stage should be reflected in the study design. We stated at the beginning of this chapter that the test statistic used to track the progress of a binomial endpoint study could be computed by using either the unpooled variance or the pooled variance to standardize the difference of binomial proportions. The design of the CAPTURE trial in Section 23.1.1 and its interim monitoring in Section 23.1.2 were both performed on the basis of the unpooled statistic. In this section we examine how the design would change if we intended to use the pooled statistic for the interim monitoring. It is seen that the change in sample size is negligible if the randomization is balanced. If, however, an unbalanced randomization rule is adopted, there can be substantial sample size differences between the unpooled and pooled designs.
Consider once more the design of the CAPTURE trial with a maximum of K = 3 looks, stopping boundaries generated by the Gamma(−4.5) spending function from the Gamma family, and 80% power to detect a drop in the event rate from 0.15 on the placebo arm to 0.1 on the Abciximab arm using a two-sided level 0.05 test. We now consider the design of this trial on the basis of the pooled statistic. Select Des9 in the Library and click the icon. Then under the Test Parameters tab, in the Specify Variance box, select the radio button for Pooled Estimate. Click the Compute button to create Des10. Save Des10 to Wbk1. In the Library select Des9 and Des10 by holding the Ctrl key, and then click on the icon.

It is instructive to compare Des9 with Des10. It is important to remember that Des9 utilized the unpooled design while Des10 utilized the pooled design. When we compare Des9 and Des10 side by side we discover that there is not much difference in terms of either the maximum or expected sample sizes. This is usually the case for balanced designs. If, however, we were to change the value of the Allocation Ratio parameter from 1 to 0.333 (which corresponds to assigning 25% of the patients to treatment and 75% to control), then we would find a substantial difference in the sample sizes of the two plans. In the picture below, Des11 utilizes the unpooled design while Des12 utilizes the pooled design. Notice that because of the unbalanced randomization the unpooled design is able to achieve 80% power with 229 fewer patients than the pooled design.
Specifically, if we decide to monitor the study with the test statistic (23.2) we need to commit a maximum of 1908 patients (Des12), whereas if we decide to monitor the study with the test statistic (23.1) we need to commit a maximum of only 1679 patients (Des11). We can verify by simulation that both Des11 and Des12 produce 80% power under the alternative hypothesis.

After saving Des11 and Des12 in Wbk1, select Des11 in the Library and click the icon. Next, click the Simulate button. The results are displayed below and demonstrate that the null hypothesis was rejected 7710 times in 10,000 trials (77.10%), very close to the desired 80% power.

Next, repeat the procedure for Des12. Observe that once again, the desired power was almost achieved. This time the null hypothesis was rejected 7916 times in 10,000 trials (79.77%), just slightly under the desired 80% power.

The power advantage of the unpooled design over the pooled design gets reversed if the proportion of patients randomized to the treatment arm is 75% instead of 25%. Edit Des11 and Des12, and change the Allocation Ratio parameter to 3. Now the pooled design (Des14) requires a maximum of 1770 patients whereas the unpooled design (Des13) requires a maximum of 1995 patients.

This shows that when planning a binomial study with unbalanced randomization, it is important to try both the pooled and unpooled designs and choose the one that produces the same power with fewer patients. The correct choice will depend on the response rates of the control and treatment arms as well as on the value of the fraction assigned to the treatment arm.
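The direction of the pooled versus unpooled difference can be explored with a rough single-look calculation. The sketch below uses standard textbook normal-approximation formulas for a fixed-sample two-sided test; it is an assumption-laden approximation, not East's group sequential computation, and the function name `fixed_sample_sizes` is hypothetical. East's three-look maxima quoted above (1679, 1908, 1995, 1770) are somewhat larger because of the interim looks.

```python
from statistics import NormalDist

def fixed_sample_sizes(pi_c, pi_t, frac_t, alpha=0.05, power=0.80):
    """Approximate total sample size for a single-look, two-sided test of
    pi_t - pi_c = 0, with a fraction frac_t of patients on treatment.
    Returns (N for the unpooled statistic, N for the pooled statistic)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    frac_c = 1 - frac_t
    delta = pi_t - pi_c
    # Variance of the estimated difference, scaled by N, under the alternative.
    var_h1 = pi_t * (1 - pi_t) / frac_t + pi_c * (1 - pi_c) / frac_c
    # Scaled variance when both arms are assigned the pooled proportion.
    pi_bar = frac_t * pi_t + frac_c * pi_c
    var_pooled = pi_bar * (1 - pi_bar) * (1 / frac_t + 1 / frac_c)
    n_unpooled = (z_a + z_b) ** 2 * var_h1 / delta ** 2
    n_pooled = (z_a * var_pooled ** 0.5 + z_b * var_h1 ** 0.5) ** 2 / delta ** 2
    return n_unpooled, n_pooled

# CAPTURE-like rates: 0.15 on placebo, 0.10 on Abciximab.
results = {f: fixed_sample_sizes(0.15, 0.10, f) for f in (0.25, 0.50, 0.75)}
for frac, (n_u, n_p) in results.items():
    print(f"{frac:.0%} on treatment: unpooled N ~ {n_u:.0f}, pooled N ~ {n_p:.0f}")
```

Under these assumptions the unpooled statistic needs fewer patients when 25% are randomized to treatment, the two are nearly equal under balanced randomization, and the pooled statistic needs fewer when 75% are randomized to treatment, which mirrors the pattern seen in Des11 through Des14.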
23.2 Ratio of Proportions

Let πc and πt denote the binomial probabilities for the control and treatment arms, respectively, and let ρ = πt/πc. We want to test the null hypothesis that ρ = 1 against one or two-sided alternatives. It is mathematically more convenient to express this hypothesis testing problem in terms of the difference of the (natural) logarithms. Thus we define δ = ln(πt) − ln(πc). On this metric, we are interested in testing H0: δ = 0 against one or two-sided alternative hypotheses. Let π̂ij denote the estimate of πi based on nij observations from Treatment i, up to and including the j-th look, j = 1, ..., K, i = t, c, where a maximum of K looks are to be taken. Then the estimate of δ at the j-th look is

\[ \hat{\delta}_j = \ln(\hat{\pi}_{tj}) - \ln(\hat{\pi}_{cj}) \tag{23.11} \]

with estimated standard error

\[ \widehat{se}_j = \left\{ \frac{1-\hat{\pi}_{tj}}{n_{tj}\,\hat{\pi}_{tj}} + \frac{1-\hat{\pi}_{cj}}{n_{cj}\,\hat{\pi}_{cj}} \right\}^{1/2} \tag{23.12} \]

if we use an unpooled estimate for the variance of δ̂, and estimated standard error

\[ \widehat{se}_j = \left\{ \frac{1-\hat{\pi}_j}{\hat{\pi}_j}\left(n_{tj}^{-1} + n_{cj}^{-1}\right) \right\}^{1/2}, \tag{23.13} \]

where

\[ \hat{\pi}_j = \frac{n_{tj}\,\hat{\pi}_{tj} + n_{cj}\,\hat{\pi}_{cj}}{n_{tj} + n_{cj}}, \]

if we use a pooled estimate for the variance of δ̂.

In general, for any twice-differentiable function h(·), with derivative h′(·), h(π̂ij) is approximately normal with mean h(πi) and variance [h′(πi)]² πi(1 − πi)/nij for large values of nij. Using this asymptotic approximation, the test statistic at the j-th look is

\[ Z_j^{(u)} = \frac{\ln(\hat{\pi}_{tj}) - \ln(\hat{\pi}_{cj})}{\left\{ \dfrac{1-\hat{\pi}_{tj}}{n_{tj}\,\hat{\pi}_{tj}} + \dfrac{1-\hat{\pi}_{cj}}{n_{cj}\,\hat{\pi}_{cj}} \right\}^{1/2}}, \tag{23.14} \]

i.e. the ratio of (23.11) and (23.12), if we use an unpooled estimate for the variance of ln(π̂tj) − ln(π̂cj), and

\[ Z_j^{(p)} = \frac{\ln(\hat{\pi}_{tj}) - \ln(\hat{\pi}_{cj})}{\left\{ \dfrac{1-\hat{\pi}_j}{\hat{\pi}_j}\left(n_{tj}^{-1} + n_{cj}^{-1}\right) \right\}^{1/2}}, \tag{23.15} \]

i.e. the ratio of (23.11) and (23.13), if we use a pooled estimate for the variance of ln(π̂tj) − ln(π̂cj).
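To see where the unpooled standard error (23.12) comes from, the asymptotic approximation above can be applied with h(π) = ln(π). The following short derivation is a sketch that assumes the two arms are independent, so that the variances of the two log-proportions add.

```latex
\[
h(\pi) = \ln \pi, \qquad h'(\pi) = \frac{1}{\pi}
\quad\Longrightarrow\quad
\operatorname{Var}\!\left[\ln \hat{\pi}_{ij}\right]
\approx \frac{1}{\pi_i^{2}} \cdot \frac{\pi_i (1-\pi_i)}{n_{ij}}
= \frac{1-\pi_i}{n_{ij}\,\pi_i},
\]
\[
\operatorname{Var}\!\left[\hat{\delta}_j\right]
\approx \frac{1-\pi_t}{n_{tj}\,\pi_t} + \frac{1-\pi_c}{n_{cj}\,\pi_c}.
\]
```

Replacing πt and πc by their estimates π̂tj and π̂cj and taking the square root gives (23.12).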
23.2.1 Trial Design

Design objectives and interim results were presented from PRISM, a prospective randomized trial of Heparin alone (control arm), Tirofiban alone (monotherapy arm), and Heparin plus Tirofiban (combination therapy arm), at a DIA Workshop on Flexible Trial Design (Snappin, 2003). The composite endpoint was refractory ischemia, myocardial infarction or death within seven days of randomization. The investigators were interested in comparing the two Tirofiban arms to the control arm with each test being conducted at the 0.025 level of significance (two-sided). It was assumed that the control arm has a 30% event rate. Thus, πt = πc = 0.3 under H0. The investigators wished to determine the sample size to have power of 80% if there was a 25% decline in the event rate, i.e. πt/πc = 0.75. It is important to note that the power of the test depends on πc and πt, not just the ratio, so different values of the pair (πc, πt) with the same ratio will have different solutions.

We will now design a two-arm study that compares the control arm, Heparin, to the combination therapy arm, Heparin plus Tirofiban. First click the Design tab, then Two Samples in the Discrete group, and then click Ratio of Proportions. We want to determine the sample size required to have power of 80% when πc = 0.3 and ρ = πt/πc = 0.75, using a two-sided test with a type-1 error rate of 0.025.

Single-Look Design - Unpooled Estimate of Variance

First consider a study with only one look and equal sample sizes in the two groups. Select the input parameters as displayed below. We will use the test statistic (23.14) with the unpooled estimate of the variance. Click the Compute button. The design Des1 is shown as a row in the Output Preview, located in the lower pane of this window.
This single-look design requires a combined total of 1328 subjects from both treatments in order to attain 80% power. You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to Workbook Wbk1 in the Library.

Three-Look Design - Unpooled Estimate of Variance

For the above study, suppose we wish to take up to two equally spaced interim looks and one final look at the accruing data, using the Lan-DeMets (O’Brien-Fleming) stopping boundary. Create a new design by selecting Des1 in the Library, and clicking the icon. In the input, change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab for Boundary will appear. Click this tab to reveal the stopping boundary parameters. By default, the Lan-DeMets (O’Brien-Fleming) stopping boundary and equal spacing of looks are selected. Click Compute to create design Des2. The results of Des2 are shown in the Output Preview window. With Des2 selected in the Output Preview, click the icon. In the Library, select the nodes for both Des1 and Des2, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

Although the maximum sample size has increased from 1328 to 1339, using three planned looks may result in a smaller sample size than that required for the single-look design, with an expected sample size of 1168 subjects under the alternative hypothesis (πc = 0.3, ρ = 0.75), and still ensures that the power is 80%.
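The single-look total of 1328 subjects for Des1 can be approximated by hand. The sketch below assumes the usual normal-approximation sample size formula N = (z_{α/2} + z_β)² × (N-scaled variance of δ̂) / δ², with the variance taken from (23.12) evaluated at the design values; the function name `n_log_ratio_unpooled` is hypothetical and East's exact algorithm may differ in detail.

```python
import math
from statistics import NormalDist

def n_log_ratio_unpooled(pi_c, rho, alpha, power, frac_t=0.5):
    """Approximate total sample size for a single-look, two-sided test of
    ln(pi_t) - ln(pi_c) = 0 using the unpooled variance of (23.12)."""
    pi_t = rho * pi_c
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    delta = math.log(rho)
    # Variance of delta-hat multiplied by the total sample size N.
    var = (1 - pi_t) / (frac_t * pi_t) + (1 - pi_c) / ((1 - frac_t) * pi_c)
    return (z_a + z_b) ** 2 * var / delta ** 2

# Design values from the text: pi_c = 0.3, rho = 0.75, two-sided alpha = 0.025.
n_total = n_log_ratio_unpooled(0.3, 0.75, alpha=0.025, power=0.80)
print(f"approximate total sample size: {n_total:.1f} (round up: {math.ceil(n_total)})")
```

Rounding the result up to the next whole subject gives a total consistent with the 1328 reported for Des1.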
Additional information can also be obtained from Des2. The Lan-DeMets spending function corresponding to the O’Brien-Fleming boundary can be viewed by selecting Des2 in the Library, clicking on the icon, and selecting Stopping Boundaries. The following chart will appear:

The alpha-spending function can be viewed by selecting Des2 in the Library, clicking on the icon, and selecting Error Spending.

In order to see the stopping probabilities, as well as other characteristics, select Des2 in the Library, and click the icon. The cumulative boundary stopping probabilities are shown in the Stopping Boundaries table. Close this window before continuing.

Three-Look Design - Pooled Estimate of Variance

We now consider this design using the statistic (23.15) with the pooled estimate of the variance. Create a new design by selecting Des2 in the Library, and clicking the icon. Under the Test Parameters tab, select the radio button for Pooled Estimate in the Variance of Standardized Test Statistic box. Leave everything else unchanged. Click the Compute button to generate the output for Des3. Save Des3 by selecting it in the Output Preview and clicking the icon. In the Library, select the nodes for Des1, Des2, and Des3, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the three designs side-by-side:

For this problem, the test statistic (23.14) with the unpooled estimate of the variance requires a smaller sample size than the test statistic (23.15) with the pooled estimate of the variance. Close this window before continuing.

23.2.2 Trial Simulation

Suppose we want to see the impact of πt on the behavior of the test statistic (23.14) with the unpooled estimate of the variance.
First we consider πt = 0.225 as specified by the alternative hypothesis. With Des2 selected in the Library, click the icon. Click on the Simulate button. The results of the simulation will appear under Sim1 in the Output Preview. Select Sim1 in the Output Preview and click the icon to save it to Wbk1. Double-click on Sim1 in the Library to display the results of the simulation. Although the actual values may differ, we see that the power is approximately 80% and the probability of stopping early is about 0.37.

Now we consider πt = 0.25, which will show us the impact if we were too optimistic about the treatment effect. Select Sim1 in the Library and click the icon. Under the Response Generation tab, enter the value of 0.25 next to Prop. Under Treatment (πt1). Click the Simulate button. Although the actual values may differ, we see that the power is approximately 41%.

23.2.3 Interim Monitoring

Consider interim monitoring of Des2. Select Des2 in the Library, and click the icon from the Library toolbar. The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections. The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right.

Suppose that the results are to be analyzed after results are available for every 450 subjects. Click on the icon in the upper left to invoke the Test Statistics Calculator. Select the radio button to enter δ̂ and its standard error. Enter 450 in the box next to Cumulative Sample Size.
Suppose that after the data were available for the first 450 subjects, 230 subjects were randomized to the control arm (c) and 220 subjects were randomized to the treatment arm (t). Of the 230 subjects in the control arm, there were 65 events; of the 220 subjects in the treatment arm, there were 45 events. In the box next to Estimate of δ enter: ln((45/220)/(65/230)) and then hit Enter. East will compute the estimate of δ. Enter 0.169451 in the box next to Std. Error of δ. Next click Recalc. You should now see the following:

Next, click OK. The following table will appear in the top section of the IM Dashboard.

Note - Click on the icon to hide or unhide the columns of your interest, for example RCI for δ. Keeping all four boxes checked displays the RCI on both scales.

The boundary was not crossed, as the value of the test statistic is −1.911, which is within the boundaries (−4.153, 4.153), so the trial continues.

After data were available for an additional 450 subjects, the second analysis is performed. Suppose that among the 900 subjects, 448 were randomized to control (c) and 452 were randomized to treatment (t). Of the 448 subjects in the control arm, there were 132 events; of the 452 subjects in the treatment arm, there were 90 events. Click on the second row in the table in the upper section. Then click the icon. Enter 900 in the box next to Sample Size (Overall). Then in the box next to Estimate of δ enter: ln((90/452)/(132/448)). Next hit Enter, then enter 0.119341 in the box next to Std. Error of δ. Click Recalc then OK.

The value of the test statistic is −3.284, which is less than −2.833, the value of the lower boundary, so the following dialog box appears. Click on Stop to stop any further analyses.
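The standard errors typed into the calculator above follow from (23.12). The sketch below, with the hypothetical helper name `log_ratio_unpooled`, recomputes δ̂, its unpooled standard error and the test statistic (23.14) from the event counts at both looks. It is an independent check of the arithmetic, not a reproduction of East's internals.

```python
import math

def log_ratio_unpooled(events_t, n_t, events_c, n_c):
    """Estimate of delta = ln(pi_t) - ln(pi_c) as in (23.11), its unpooled
    standard error as in (23.12), and the test statistic as in (23.14)."""
    p_t = events_t / n_t
    p_c = events_c / n_c
    delta = math.log(p_t) - math.log(p_c)
    se = math.sqrt((1 - p_t) / (n_t * p_t) + (1 - p_c) / (n_c * p_c))
    return delta, se, delta / se

# Look 1: 45/220 events on treatment, 65/230 on control.
delta1, se1, z1 = log_ratio_unpooled(45, 220, 65, 230)
# Look 2 (cumulative): 90/452 events on treatment, 132/448 on control.
delta2, se2, z2 = log_ratio_unpooled(90, 452, 132, 448)

print(f"look 1: delta = {delta1:.6f}, SE = {se1:.6f}, Z = {z1:.3f}")
print(f"look 2: delta = {delta2:.6f}, SE = {se2:.6f}, Z = {z2:.3f}")
```

Computed this way, the standard errors agree with the entered values 0.169451 and 0.119341, and the second-look statistic agrees with −3.284. The first-look statistic from this formula comes out near −1.908, close to but not identical to the −1.911 shown on the dashboard.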
The Final Inference Table shows that the adjusted point estimate of ln(ρ) is −0.392 (p = 0.001) and the final adjusted 97.5% confidence interval for ln(ρ) is (−0.659, −0.124).

23.3 Odds Ratio of Proportions

Let πt and πc denote the two binomial probabilities associated with the treatment and the control, respectively. Furthermore, let the odds ratio be

\[ \psi = \frac{\pi_t/(1-\pi_t)}{\pi_c/(1-\pi_c)} = \frac{\pi_t(1-\pi_c)}{\pi_c(1-\pi_t)}. \tag{23.16} \]

We are interested in testing H0: ψ = 1 against the two-sided alternative H1: ψ ≠ 1 or against a one-sided alternative H1: ψ < 1 or H1: ψ > 1. It is convenient to express this hypothesis testing problem in terms of the (natural) logarithm of ψ. Let π̂tj and π̂cj denote the estimates of πt and πc based on ntj and ncj observations from the treatment and the control, respectively, up to and including the j-th look, j = 1, ..., K, where a maximum of K looks are to be made. The difference between treatments at the j-th look is assessed using

\[ \hat{\delta}_j = \ln\!\left(\frac{\hat{\pi}_{tj}}{1-\hat{\pi}_{tj}}\right) - \ln\!\left(\frac{\hat{\pi}_{cj}}{1-\hat{\pi}_{cj}}\right). \tag{23.17} \]

Using the asymptotic approximation presented in Section 23.2, the estimate of the standard error of δ̂j at the j-th look is

\[ \widehat{se}_j = \left\{ \frac{1}{n_{tj}\,\hat{\pi}_{tj}(1-\hat{\pi}_{tj})} + \frac{1}{n_{cj}\,\hat{\pi}_{cj}(1-\hat{\pi}_{cj})} \right\}^{1/2}, \tag{23.18} \]

and the test statistic at the j-th look is the ratio of δ̂j, given by (23.17), and the estimate of the standard error of δ̂j, given by (23.18), namely,

\[ Z_j = \frac{\ln\!\left(\hat{\pi}_{tj}/(1-\hat{\pi}_{tj})\right) - \ln\!\left(\hat{\pi}_{cj}/(1-\hat{\pi}_{cj})\right)}{\left\{ \dfrac{1}{n_{tj}\,\hat{\pi}_{tj}(1-\hat{\pi}_{tj})} + \dfrac{1}{n_{cj}\,\hat{\pi}_{cj}(1-\hat{\pi}_{cj})} \right\}^{1/2}}. \tag{23.19} \]

23.3.1 Trial Design

Suppose that the response rate for the control treatment is 10% and we hope that the experimental treatment can triple the odds of response (an odds ratio of 3); that is, we desire to increase the response rate to 25%.
Although we hope to increase the odds ratio, we solve this problem using a two-sided testing formulation. The null hypothesis H0: ψ = 1 is tested against the two-sided alternative H1: ψ ≠ 1. The power of the test is computed at specified values of πc and ψ. Note that the power of the test depends on πc and ψ, or equivalently πc and πt, not just the odds ratio. Thus, different values of πc with the same odds ratio will have different solutions.

First, click the Design tab, then click Two Samples in the Discrete group, and then click Odds Ratio of Proportions. Suppose we want to determine the sample size required to have power of 80% when πc = 0.1 and ψ1 = 3 using a two-sided test with a type-1 error rate of 0.05.

Single-Look Design

First consider a study with only one look and equal sample sizes in the two groups. Enter the appropriate design parameters so that the dialog box appears as shown. Then click Compute. The design Des1 is shown as a row in the Output Preview, located in the lower pane of this window. This single-look design requires a combined total of 214 subjects from both treatments in order to attain 80% power. You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to Wbk1 in the Library.

Three-Look Design

For the above study, suppose we wish to take up to two equally spaced interim looks and one final look at the accruing data, using the Lan-DeMets (O’Brien-Fleming) stopping boundary. Create a new design by selecting Des1 in the Library, and clicking the icon.
In the input, change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab for Boundary will appear. Click this tab to reveal the stopping boundary parameters. By default, the Lan-DeMets (O’Brien-Fleming) stopping boundary and equal spacing of looks are selected. Click the Compute button to create design Des2. The results of Des2 are shown in the Output Preview window. With Des2 selected in the Output Preview, click the icon. In the Library, select the rows for both Des1 and Des2, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

Using three planned looks may result in a smaller sample size than that required for the single-look design, with an expected sample size of 186 subjects under the alternative hypothesis (πc = 0.1, ψ = 3), and still ensures the power is 80%.

Additional information can also be obtained from Des2. The Lan-DeMets spending function corresponding to the O’Brien-Fleming boundary can be viewed by selecting Des2 in the Library, clicking on the icon, and selecting Stopping Boundaries. The following chart will appear:

The alpha-spending function can be viewed by selecting Des2 in the Library, clicking on the icon, and selecting Error Spending.

In order to see the stopping probabilities, as well as other characteristics, select Des2 in the Library, and click the icon. The cumulative boundary stopping probabilities are shown in the Stopping Boundaries table.
East displays the stopping boundary, the type-1 error spent, and the boundary crossing probabilities under the null hypothesis H0 : πc = 0.1, ψ = 1 and the alternative hypothesis H1 : πc = 0.1, ψ = 3. Close this window before continuing.

23.3.2 Trial Simulation

Suppose we want to see the impact of πt on the behavior of the test statistic (23.19). First we consider πt = 0.25, as specified by the alternative hypothesis. With Des2 selected in the Library, click the icon. Next, click the Simulate button. The results of the simulation will appear under Sim1 in the Output Preview. Highlight Sim1 in the Output Preview and click the icon to save it to workbook Wbk1. Double-click on Sim1 in the Library to display the results of the simulation. Although your results may differ slightly, we see that the power is approximately 83% and the probability of stopping early is about 0.39.

Now we consider πt = 0.225, which shows the impact of having been too optimistic about the treatment effect. Select Sim1 in the Library and click the icon. Under the Response Generation tab, enter the value 0.225 next to Prop. Under Treatment (πt). Click Simulate. Although the actual values may differ, we see that the power is approximately 68% and the probability of stopping early is about 0.26.

23.3.3 Interim Monitoring

Consider interim monitoring of Des2. Select Des2 in the Library, and click the icon from the Library toolbar. The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections. The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right.
Suppose that the results are to be analyzed after results are available for every 70 subjects. Click on the icon in the upper left to invoke the Test Statistics Calculator. Enter 70 in the box next to Cumulative Sample Size, and then select the second radio button on the calculator to enter values of δ̂ and its standard error. Suppose that, after data were available for the first 70 subjects, 35 subjects had been randomized to the control arm (c), of whom 5 experienced a response, and 35 subjects had been randomized to the treatment arm (t), of whom 9 experienced a response. In the box next to Estimate of δ enter 0.730888, and in the box next to Std. Error of δ enter 0.618794. Next click Recalc. You should now see the following:

Click OK and the following entry will appear in the top section of the IM Dashboard.

Note: Click on the icon to hide or unhide the columns of your interest.

The boundary was not crossed, as the value of the test statistic (1.181) is within the boundaries (−3.777, 3.777), so the trial continues.

After data were available for an additional 70 subjects, the second analysis was performed. Suppose that among the 140 subjects, 71 were randomized to c and 69 were randomized to t. Click on the second row in the table in the upper section. Then click the icon. Enter 140 in the box next to Cumulative Sample Size. Then in the box next to Estimate of δ enter 1.067841, and in the box next to Std. Error of δ enter 0.414083. Next, click Recalc and then OK. The test statistic 2.579 exceeds the upper boundary (2.56), so the following screen appears.

Click Stop to halt any further analyses.
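The values entered in the calculator at the first look can be reproduced from the observed counts. The following is a minimal sketch, assuming the usual log odds ratio estimate and its large-sample standard error; the function name `log_odds_ratio` is illustrative and is not part of East.

```python
import math

def log_odds_ratio(resp_t, n_t, resp_c, n_c):
    """Return (delta_hat, se, z) for the log odds ratio, treatment vs. control."""
    p_t = resp_t / n_t
    p_c = resp_c / n_c
    # Estimate of delta = ln(psi): difference of the two log odds.
    delta_hat = math.log(p_t / (1 - p_t)) - math.log(p_c / (1 - p_c))
    # Large-sample standard error of the log odds ratio.
    se = math.sqrt(1 / (n_t * p_t * (1 - p_t)) + 1 / (n_c * p_c * (1 - p_c)))
    return delta_hat, se, delta_hat / se

# First look: 9 of 35 responders on treatment, 5 of 35 on control.
delta_hat, se, z = log_odds_ratio(resp_t=9, n_t=35, resp_c=5, n_c=35)
print(f"{delta_hat:.6f} {se:.6f} {z:.3f}")  # 0.730888 0.618794 1.181
```

The same ratio applied to the second-look inputs, 1.067841 / 0.414083, gives the test statistic 2.579 quoted above.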
The Final Inference Table shows that the adjusted point estimate of ln(ψ) is 1.068 (p = 0.01) and the adjusted 95% confidence interval for ln(ψ) is (0.256, 1.879).

23.4 Common Odds Ratio of Stratified Tables

Some experiments are performed with several disjoint groups (strata) within each treatment group. For example, multicenter clinical trials are conducted using several investigator sites. Other situations include descriptive subsets, such as baseline and demographic characteristics.

Let πtg and πcg denote the two binomial probabilities in Group g, g = 1, . . . , G, for the treatment and control, respectively. It is assumed that the odds ratio

\[
\psi = \frac{\pi_{tg}/(1-\pi_{tg})}{\pi_{cg}/(1-\pi_{cg})} = \frac{\pi_{tg}(1-\pi_{cg})}{\pi_{cg}(1-\pi_{tg})} \tag{23.20}
\]

is the same for each group (stratum). The Cochran-Mantel-Haenszel test is used for testing H0 : ψ = 1 against the two-sided alternative H1 : ψ ≠ 1 or against a one-sided alternative H1 : ψ > 1 or H1 : ψ < 1.

Let π̂tjg and π̂cjg denote the estimates of πtg and πcg based on ntjg and ncjg observations in Group g from the treatment (t) and the control (c), respectively, up to and including the j-th look, j = 1, . . . , K, where a maximum of K looks are to be taken. Then the estimate of δ = ln(ψ) from the g-th group at the j-th look is

\[
\hat{\delta}_{jg} = \ln\left(\frac{\hat{\pi}_{tjg}}{1-\hat{\pi}_{tjg}}\right) - \ln\left(\frac{\hat{\pi}_{cjg}}{1-\hat{\pi}_{cjg}}\right).
\]

Then the estimate of δ = ln(ψ) at the j-th look is the average of δ̂jg, g = 1, . . . , G; namely,

\[
\hat{\delta}_j = \frac{\sum_{g=1}^{G} \hat{\delta}_{jg}}{G}
\]

or, equivalently,

\[
\hat{\delta}_j = \frac{\sum_{g=1}^{G} \left\{ \ln\left(\frac{\hat{\pi}_{tjg}}{1-\hat{\pi}_{tjg}}\right) - \ln\left(\frac{\hat{\pi}_{cjg}}{1-\hat{\pi}_{cjg}}\right) \right\}}{G}. \tag{23.21}
\]

The estimate of the standard error of δ̂jg at the j-th look is

\[
\hat{se}_{jg} = \left\{ \frac{1}{n_{tjg}\,\hat{\pi}_{tjg}(1-\hat{\pi}_{tjg})} + \frac{1}{n_{cjg}\,\hat{\pi}_{cjg}(1-\hat{\pi}_{cjg})} \right\}^{1/2}.
\]

The estimated variance of δ̂ at the j-th look is the average of the variances of δ̂jg, g = 1, . . . , G.
Thus,

\[
\hat{se}_j = \left\{ \frac{\sum_{g=1}^{G} \hat{se}_{jg}^{2}}{G} \right\}^{1/2}. \tag{23.22}
\]

The test statistic used at the j-th look is

\[
Z_j = \frac{\hat{\delta}_j}{\hat{se}_j}. \tag{23.23}
\]

23.4.1 Trial Design

First consider a simple example with two strata, such as males and females, with an equal number of subjects in each stratum and the same response rate of 60% for the control in each stratum. We hope that the experimental treatment can triple the odds of a response, that is, ψ = 3. Although we hope to increase the odds ratio, we solve this problem using a two-sided testing formulation. The null hypothesis H0 : ψ = 1 is tested against the two-sided alternative H1 : ψ ≠ 1. The power of the test is computed at specified values of πcg, g = 1, . . . , G, and ψ.

To begin, click the Design tab, then click Two Samples in the Discrete group, and then click Common Odds Ratio for Stratified 2 x 2 Tables.

Suppose that we want to determine the sample size required to have power of 80% when πc1 = πc2 = 0.6 and ψ = 3 using a two-sided test with a type-1 error rate of 0.05.

Single-Look Design - Equal Response Rates. First consider a study with only one look and equal sample sizes in the two groups. Enter the appropriate test parameters so that the dialog box appears as shown. Then click Compute. The design is shown as a row in the Output Preview, located in the lower pane of this window. This single-look design requires a combined total of 142 subjects from both treatments in order to attain 80% power.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to workbook Wbk1 in the Library.

Single-Look Design - Unequal Response Rates. Now, we consider a more realistic clinical trial.
Suppose that males and females respond differently, so that the response rate for males is πc1 = 0.6 and the response rate for females is πc2 = 0.3. First, we consider a study without any interim analyses. Create a new design by selecting Des1 in the Library, and clicking the icon. Change πc2 in the Stratum Specific Input table to 0.3 as shown below.

Click Compute to create design Des2. The results of Des2 are shown in the Output Preview window. With Des2 selected in the Output Preview, click the icon. In the Library, select the rows for both Des1 and Des2 by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

This single-look design requires a combined total of 127 subjects from both treatments in order to attain 80% power.

Three-Look Design - Unequal Response Rates. For the above study, suppose we wish to take up to two equally spaced interim looks and one final look at the accruing data, using the Lan-DeMets (O’Brien-Fleming) stopping boundary. Create a new design by selecting Des2 in the Library, and clicking the icon. In the input, change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab for Boundary will appear. Click this tab to reveal the stopping boundary parameters. By default, the Lan-DeMets (O’Brien-Fleming) stopping boundary and equal spacing of looks are selected.

Click the Compute button to generate output for Des3. The results of Des3 are shown in the Output Preview window. With Des3 selected in the Output Preview, click the icon. In the Library, select the nodes for Des1, Des2, and Des3 by holding the Ctrl key, and then click the icon. The upper pane will display the details of the three designs side-by-side:

Using three planned looks requires an up-front commitment of 129 subjects, a slight increase over the single-look design, which required 127 subjects. However, the three-look design may result in a smaller sample size than that required for the single-look design, with an expected sample size of 111 subjects under the alternative hypothesis (πc1 = 0.6, πc2 = 0.3, ψ = 3), and still ensures that the power is 80%.

By selecting only Des3 in the Library and clicking the icon, East displays the stopping boundary, the type-1 error spent, and the boundary crossing probabilities under the null hypothesis H0 : πc1 = 0.6, πc2 = 0.3, ψ = 1 and the alternative hypothesis H1 : πc1 = 0.6, πc2 = 0.3, ψ = 3. Close this window before continuing.

Three-Look Design - Unequal Response Rates - Unequal Strata Sizes. Some disorders have different prevalence rates across various strata. Consider the above example, but with the expectation that 30% of the subjects will be males and 70% of the subjects will be females. Create a new design by selecting Des3 in the Library, and clicking the icon. Under the Test Parameters tab, in the Stratum Specific Input box, select the radio button Unequal. You can now edit the Stratum Fraction column for Stratum 1. Change this value from 0.5 to 0.3 as shown below.

Click the Compute button to generate output for Des4. The results of Des4 are shown in the Output Preview window. With Des4 selected in the Output Preview, click the icon. In the Library, select the rows for Des1, Des2, Des3, and Des4 by holding the Ctrl key, and then click the icon. The upper pane will display the details of the four designs side-by-side:

Note that, for this example, unequal sample sizes for the two strata result in a smaller total sample size than that required for equal sample sizes for the two strata.

23.4.2 Interim Monitoring

Consider interim monitoring of Des4. Select Des4 in the Library, and click the icon from the Library toolbar. The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections. The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right.

Suppose that the results are to be analyzed after results are available for every 40 subjects. Click on the icon in the upper left to invoke the Test Statistics Calculator. Enter 40 in the box next to Cumulative Sample Size. Suppose that δ̂1 = 0.58 and sê1 = 0.23. Enter these values and click Recalc. You should now see the following:

Click OK and the following table will appear in the top section of the IM Dashboard.

The boundary was not crossed, as the value of the test statistic (2.522) is within the boundaries (−3.777, 3.777), so the trial continues.

Click on the second row in the table in the upper section. Then click the icon. Enter 80 in the box next to Cumulative Sample Size. Suppose that δ̂2 = 0.60 and sê2 = 0.21. Enter these values and click Recalc. You should now see the following:

Click the OK button. The test statistic 2.857 exceeds the upper boundary (2.56), so the following dialog box appears.

Click Stop to stop any further analyses.
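The estimate and standard error entered at each look come from equations (23.21) to (23.23). The following is a minimal sketch of that calculation as this section states it, with a plain average over strata. The stratum counts are hypothetical, chosen only for illustration, and the function names are not part of East.

```python
import math

def stratum_log_or(resp_t, n_t, resp_c, n_c):
    """Log odds ratio and its estimated variance within one stratum."""
    p_t, p_c = resp_t / n_t, resp_c / n_c
    delta = math.log(p_t / (1 - p_t)) - math.log(p_c / (1 - p_c))
    var = 1 / (n_t * p_t * (1 - p_t)) + 1 / (n_c * p_c * (1 - p_c))
    return delta, var

def stratified_z(strata):
    """Return (delta_j, se_j, Z_j) using the stratum averages of this section."""
    G = len(strata)
    results = [stratum_log_or(*s) for s in strata]
    delta_j = sum(d for d, _ in results) / G           # (23.21)
    se_j = math.sqrt(sum(v for _, v in results) / G)   # (23.22)
    return delta_j, se_j, delta_j / se_j               # (23.23)

# Hypothetical data: (responders_t, n_t, responders_c, n_c) for each stratum.
strata = [(16, 20, 12, 20), (10, 20, 6, 20)]
delta_j, se_j, z_j = stratified_z(strata)
print(f"delta={delta_j:.4f} se={se_j:.4f} Z={z_j:.3f}")
```

For the values used in the walkthrough, the same ratio gives 0.58 / 0.23 = 2.522 at the first look and 0.60 / 0.21 = 2.857 at the second.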
The Final Inference Table shows that the adjusted point estimate of ln(ψ) is 0.600 (p = 0.004) and the adjusted 95% confidence interval for ln(ψ) is (0.188, 1.011).

23.5 Fisher’s Exact Test (Single Look)

In some experimental situations, the normal approximation to the binomial distribution may not be appropriate, for example when the probabilities of interest are very large or very small. This may lead to incorrect p-values, and thus to incorrect conclusions. For this reason, Fisher’s exact test may be used.

Let πt and πc denote the two response probabilities for the treatment and the control, respectively. Interest lies in testing H0 : πt = πc against the two-sided alternative H1 : πt ≠ πc. Results are presented here only for the situation where there is a single analysis (that is, no interim analysis) for the two-sided test with equal sample sizes for the two treatments.

Let π̂t and π̂c denote the estimates of πt and πc, respectively, based on nt = nc = 0.5N observations from the treatment (t) and the control (c). The parameter of interest is δ = πt − πc, which is estimated by δ̂ = π̂t − π̂c. The estimate of the standard error used in the proposed test statistic is based on the pooled estimate of the common value of πt and πc under H0, and is given by

\[
\hat{se} = \frac{2\{\hat{\pi}(1-\hat{\pi})\}^{1/2}}{\sqrt{N}}, \tag{23.24}
\]

where π̂ = 0.5(π̂t + π̂c). Incorporating a continuity correction factor, the test statistic is

\[
Z = \frac{|\hat{\delta}| - 2/N}{\hat{se}}. \tag{23.25}
\]

23.5.1 Trial Design

Consider the example where the probability of a response for the control is 5% and it is hoped that the experimental treatment can increase this rate to 25%. First, in the Discrete area, click Two Samples on the Design tab, and then click Fisher Exact Test.
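The continuity-corrected statistic (23.25) defined earlier in this section can be sketched as follows. This assumes the standard error in (23.24) is the pooled estimate divided by √N, which is the form implied by equal allocation under H0. The counts are hypothetical and the function name is illustrative only.

```python
import math

def continuity_corrected_z(resp_t, resp_c, n_total):
    """Continuity-corrected Z for two arms of equal size n_total / 2."""
    n_arm = n_total / 2
    p_t, p_c = resp_t / n_arm, resp_c / n_arm
    delta_hat = p_t - p_c
    p_pooled = 0.5 * (p_t + p_c)
    # Pooled standard error under H0, assumed form of (23.24).
    se = 2 * math.sqrt(p_pooled * (1 - p_pooled)) / math.sqrt(n_total)
    # Continuity correction 2/N subtracted from |delta_hat|, as in (23.25).
    return (abs(delta_hat) - 2 / n_total) / se

# Hypothetical data: 12 of 50 responders on treatment, 3 of 50 on control.
z = continuity_corrected_z(resp_t=12, resp_c=3, n_total=100)
print(f"Z = {z:.4f}")
```

This normal-approximation statistic is shown only to make the formula concrete. It does not perform Fisher's exact test itself.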
Suppose we want to determine the sample size required to have power of 90% when πc = 0.05 and πt = 0.25 using a two-sided test with a type-1 error rate of 0.05. Enter the appropriate test parameters so that the dialog box appears as shown. Then click Compute.

The design is shown as a row in the Output Preview, located in the lower pane of this window. This single-look design requires a combined total of 136 subjects from both treatments in order to attain 90% power.

You can select this design by clicking anywhere along the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to Workbook1 in the Library.

Suppose that this sample size is larger than economically feasible and it is desired to evaluate the power when a total of 100 subjects are enrolled. Create a new design by selecting Des1 in the Library, and clicking the icon. In the input, select the radio button in the box next to Power. The box next to Power will now say Computed, since we wish to compute power. In the box next to Sample Size (n) enter 100.

Click Compute to create design Des2. The results of Des2 are shown in the Output Preview window. With Des2 selected in the Output Preview, click the icon. In the Library, select the rows for both Des1 and Des2 by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

Des2 yields a power of approximately 75% as shown. Noting that 100 subjects is economically feasible and yields reasonable power, the question arises as to the sample size required to have 80% power, which might still be economically feasible.
This can be accomplished by selecting Des1 in the Library, and clicking the icon. In the input, change the Power from 0.9 to 0.8. Click Compute to generate the output for Des3. The results of Des3 are shown in the Output Preview window. With Des3 selected in the Output Preview, click the icon. In the Library, select the rows for Des1, Des2, and Des3 by holding the Ctrl key, and then click the icon. The upper pane will display the details of the three designs side-by-side:

Entering 0.8 for the power yields a required sample size of 110 subjects.

23.6 Assurance (Probability of Success)

Assurance, or probability of success, is a Bayesian version of power, which corresponds to the (unconditional) probability that the trial will yield a statistically significant result. Specifically, it is the prior expectation of the power, averaged over a prior distribution for the unknown treatment effect (see O’Hagan et al., 2005). For a given design, East allows you to specify a prior distribution, for which the assurance or probability of success will be computed.

First, enter the following values in the Input window: a 3-look design for testing the difference in proportions of two distinct populations with Lan-DeMets (OF) efficacy-only boundary, Superiority Trial, 1-sided test, 0.025 type-1 error, 80% power, πc = 0.15, and πt = 0.1.

Select the Assurance checkbox in the Input window. The following options will appear as below.

To address our uncertainty about the treatment proportion, we specify a prior distribution for πt. In the Distribution list, click Beta, and in the Input Method list, click Beta Parameters (a and b). Enter the values of a = 11 and b = 91. Recall that the mode of the Beta distribution is (a − 1)/(a + b − 2).
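The mode formula, and the idea of averaging power over the prior, can be checked numerically. The sketch below is only an approximation under stated assumptions: it uses a single-look design with an unpooled normal-approximation test and a midpoint-rule integral, not East's three-look calculation, so its result need not equal the value East reports. All function names are illustrative.

```python
import math
from statistics import NormalDist

norm = NormalDist()

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x in (0, 1)."""
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_const + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def power_fixed(pi_t, pi_c, n_arm, alpha):
    """Approximate one-sided power to show pi_t < pi_c (single look, assumed)."""
    se = math.sqrt(pi_c * (1 - pi_c) / n_arm + pi_t * (1 - pi_t) / n_arm)
    return norm.cdf((pi_c - pi_t) / se - norm.inv_cdf(1 - alpha))

a, b = 11, 91
mode = (a - 1) / (a + b - 2)   # peak of the Beta(11, 91) prior

# Per-arm sample size giving 80% power at the design values (fixed-sample formula).
pi_c, pi_t_design, alpha, target_power = 0.15, 0.10, 0.025, 0.80
var_sum = pi_c * (1 - pi_c) + pi_t_design * (1 - pi_t_design)
z_sum = norm.inv_cdf(1 - alpha) + norm.inv_cdf(target_power)
n_arm = math.ceil(z_sum ** 2 * var_sum / (pi_c - pi_t_design) ** 2)

# Assurance: prior expectation of power, by the midpoint rule on (0, 1).
steps = 2000
grid = [(i + 0.5) / steps for i in range(steps)]
assurance = sum(power_fixed(x, pi_c, n_arm, alpha) * beta_pdf(x, a, b) for x in grid) / steps

print(f"mode={mode:.3f}  n_arm={n_arm}  assurance={assurance:.3f}")
```

Because the prior places appreciable mass on values of πt near or above πc, where power is low, the averaged power falls below the nominal 80%.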
Thus, these parameter values generate a Beta distribution that is peaked at 0.1, which matches the assumed treatment proportion. Click Compute.

The computed probability of success (0.597) is shown above. Note that for this prior, assurance is considerably lower than the specified power (0.8); incorporating the uncertainty about πt has yielded a much less optimistic estimate of power. Save this design in the Library and rename it as Bayes1.

East also allows you to specify an arbitrary prior distribution through a CSV file. In the Distribution list, click User Specified, and then click Browse... to select the CSV file where you have constructed a prior.

If you are specifying a prior for one parameter only (either πc or πt, but not both), then the CSV file should contain two columns, where the first column lists the grid points for the parameter of interest, and the second column lists the prior probability assigned to each grid point. If you are specifying priors for both πc and πt, the CSV file should contain four columns (from left to right): values of πc, probabilities for πc, values of πt, and probabilities for πt. The number of points for πc and the number of points for πt may differ. For example, we consider a 5-point prior for πt only, with probability = 0.2 at each point.

Once the CSV filename and path have been specified, click Compute to calculate the assurance, which will be displayed in the box below:

As stated in O’Hagan et al.
(2005, p.190): “Assurance is a key input to decision-making during drug development and provides a reality check on other methods of trial design.” Indeed, it is not uncommon for assurance to be much lower than the specified power. The interested reader is encouraged to refer to O’Hagan et al. for further applications and discussions on this important concept.

23.7 Predictive Power and Bayesian Predictive Power

Similar Bayesian ideas can be applied to conditional power for interim monitoring. Rather than calculating conditional power for a single assumed value of the treatment effect, δ, such as at δ̂, we may account for the uncertainty about δ by taking a weighted average of conditional powers, weighted by the posterior distribution for δ. East calculates an average power, called the predictive power (Lan, Hu, & Proschan, 2009), assuming a diffuse prior for the drift parameter, η. In addition, if the user specified a beta prior distribution at the design stage to calculate assurance, then East will also calculate the average power, called Bayesian predictive power, for the corresponding posterior. We will demonstrate these calculations for the design renamed as Bayes1 earlier.

In the Library, right-click Bayes1 and click Interim Monitoring, then click the icon in the toolbar of the IM Dashboard.

In the Show/Hide Columns window, make sure to show the columns for: CP (Conditional Power), Predictive Power, Bayes Predictive Power, Posterior Distribution of πt a, and Posterior Distribution of πt b, and click OK. The following columns will be added to the main grid of the IM Dashboard.

In the toolbar of the IM Dashboard, open the Test Statistic Calculator by clicking the icon. In order to appropriately update the posterior distribution, you will need to use the Test Statistic Calculator to enter the sample size and number of responses for each arm.
Enter 34 events out of 230 patients in the control arm, and 23 out of 231 patients in the treatment arm, then click OK.

The main grid of the IM Dashboard will be updated as follows. In particular, notice the differing values for CP and the Bayesian measures of power.

24 Binomial Non-Inferiority Two-Sample

In a binomial non-inferiority trial the goal is to establish that the response rate of an experimental treatment is no worse than that of an active control, rather than attempting to establish that it is superior. A therapy that is demonstrated to be non-inferior to the current standard therapy for a particular indication might be an acceptable alternative if, for instance, it is easier to administer, cheaper, or less toxic. Non-inferiority trials are designed by specifying a non-inferiority margin. The amount by which the response rate on the experimental arm is worse than the response rate on the control arm must fall within this margin in order for the claim of non-inferiority to be sustained. In this chapter, we shall design and monitor non-inferiority trials in which the non-inferiority margin is expressed as either a difference, a ratio, or an odds ratio of two binomial proportions. The difference is examined in Section 24.1. This is followed by two formulations for the ratio: the Wald formulation in Section 24.2 and the Farrington-Manning formulation in Section 24.3. The odds ratio formulation is presented in Section 24.4.

24.1 Difference of Proportions

Let πc and πt denote the response rates for the control and experimental treatments, respectively. Let δ = πc − πt, the convention used by the hypotheses, the estimate (24.2), and the margin δ0 = 0.05 in the example that follows. The null hypothesis is specified as H0 : δ = δ0 and is tested against one-sided alternative hypotheses.
If the occurrence of a response denotes patient benefit rather than harm, then δ0 > 0 and the alternative hypothesis is H1 : δ < δ0, or equivalently H1 : πt > πc − δ0. Conversely, if the occurrence of a response denotes patient harm rather than benefit, then δ0 < 0 and the alternative hypothesis is H1 : δ > δ0, or equivalently H1 : πt < πc − δ0. For any given πc, the sample size is determined by the desired power at a specified value of δ = δ1. A common choice is δ1 = 0 (or equivalently πt = πc), but East permits you to power the study at any value of δ1 that is consistent with the choice of H1.

Let π̂tj and π̂cj denote the estimates of πt and πc based on ntj and ncj observations from the experimental and control treatments, respectively, up to and including the j-th look, j = 1, . . . , K, where a maximum of K looks are to be made. The test statistic at the j-th look is

\[
Z_j = \frac{\hat{\delta}_j - \delta_0}{se(\hat{\delta}_j)} \tag{24.1}
\]

where

\[
\hat{\delta}_j = \hat{\pi}_{cj} - \hat{\pi}_{tj} \tag{24.2}
\]

and

\[
se(\hat{\delta}_j) = \sqrt{\frac{\hat{\pi}_{cj}(1-\hat{\pi}_{cj})}{n_{cj}} + \frac{\hat{\pi}_{tj}(1-\hat{\pi}_{tj})}{n_{tj}}}. \tag{24.3}
\]

24.1.1 Trial Design

The 24-week disease-free rate with a standard therapy for HIV is 80%. Suppose that the claim of non-inferiority for an experimental therapy can be sustained if its response rate is greater than 75%; i.e., the non-inferiority margin is δ0 = 0.05. For studies of this type, we specify inferiority as the null hypothesis, non-inferiority as the alternative hypothesis, and attempt to reject the null hypothesis using a one-sided test. We will specify to East that, under the null hypothesis H0, πc = 0.8 and πt = 0.75. We will test this hypothesis with a one-sided level 0.025 test. Suppose we require 90% power at the alternative hypothesis, H1, that both response rates are equal to the null response rate of the control arm, i.e., δ1 = 0. Thus, under H1, we have πc = πt = 0.8.
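The test statistic (24.1) to (24.3) and the fixed-sample size implied by this setup can be sketched as follows. This is a minimal illustration, assuming δ̂ = π̂c − π̂t as in (24.2), a one-sided level of 0.025 (the level used for the single-look design in this section), and the standard normal-approximation sample size formula. The function names and the interim counts are hypothetical, and East's internal computation may differ in detail.

```python
import math
from statistics import NormalDist

def ni_z(resp_c, n_c, resp_t, n_t, delta0):
    """Non-inferiority Z statistic with delta_hat = p_c - p_t."""
    p_c, p_t = resp_c / n_c, resp_t / n_t
    delta_hat = p_c - p_t                                          # (24.2)
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)  # (24.3)
    return (delta_hat - delta0) / se                               # (24.1)

def ni_total_sample_size(pi_c, pi_t, delta0, alpha, power):
    """Total sample size (both arms, equal allocation) for a single-look design."""
    z = NormalDist().inv_cdf
    var_sum = pi_c * (1 - pi_c) + pi_t * (1 - pi_t)
    # Distance between the alternative (delta1 = pi_c - pi_t) and the margin delta0.
    n_arm = (z(1 - alpha) + z(power)) ** 2 * var_sum / ((pi_c - pi_t) - delta0) ** 2
    return 2 * math.ceil(n_arm)

total = ni_total_sample_size(pi_c=0.8, pi_t=0.8, delta0=0.05, alpha=0.025, power=0.90)
print(total)  # 2690

# Hypothetical final data: 80% response on both arms with 1345 patients per arm.
z_stat = ni_z(resp_c=1076, n_c=1345, resp_t=1076, n_t=1345, delta0=0.05)
print(f"Z = {z_stat:.3f}")
```

With δ0 > 0, non-inferiority is supported by large negative values of Z, so the hypothetical data above would reject H0 at the one-sided 0.025 level.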
To begin, click Two Samples on the Design tab in the Discrete group, and then click Difference of Proportions.

Single-Look Design Powered at δ = 0. To begin with, suppose we will design a single-look study for rejection of H0 only, with 90% power at a 0.025 significance level. Enter the relevant parameters into the dialog box as shown below. In the drop-down box next to Trial, be sure to select Noninferiority.

Now click Compute. The design is shown as a row in the Output Preview located in the lower pane of this window. The single-look design requires a combined total of 2690 patients on both arms in order to attain 90% power. We can, however, reduce the expected sample size without any loss of power if we use a group sequential design. This is considered next.

Before continuing we will save Design1 to the Library. You can select this design by clicking anywhere along the row in the Output Preview. Some of the design details will be displayed in the upper pane, labeled Compare Designs. In the Output Preview toolbar, click the icon to save this design to Workbook1 in the Library. If you hover the cursor over Design1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

Three-Look Design Powered at δ = 0. For the above study, suppose we wish to take up to two interim looks and one final look at the accruing data. Create a new design by selecting Design1 in the Library, and clicking the icon on the Library toolbar. Change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab Boundary Info will appear. Clicking on this tab will reveal the stopping boundary parameters.
By default, the Spacing of Looks is set to Equal, which means that the interim analyses will be equally spaced in terms of the number of patients accrued between looks. The left side contains details for the Efficacy boundary, and the right side contains details for the Futility boundary. By default, there is an efficacy boundary (to reject H0) selected, but no futility boundary (to reject H1). The Boundary Family specified is of the Spending Functions type. The default Spending Function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF (O’Brien-Fleming), which generates boundaries that are very similar, though not identical, to the classical stopping boundaries of O’Brien and Fleming (1979).

Now suppose, in our example, that the three looks are unequally spaced, with the first look being taken after 50% of the committed accrual, and the second look being taken after 75% of the committed accrual. Under Spacing of Looks in the Boundary Info tab, click the Unequal radio button. The column titled Info. Fraction in the Look Details table can be edited to modify the relative spacing of the analyses. The information fraction refers to the proportion of the maximum (yet unknown) sample size. By default, this table displays equal spacing. Enter the new information fraction values as shown below and click Recalc to see the updated values of the stopping boundaries populated in the Look Details table.

On the Boundary Info tab, you may also click the icons to view plots of the error spending functions, or stopping boundaries, respectively.

Click the Compute button to generate output for Design2. With Design2 selected in the Output Preview, click the icon to save Design2 to the Library.
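The cumulative type-1 error spent at each look under the Lan-DeMets (O’Brien-Fleming) function can be sketched as follows. This assumes the standard published one-sided form α(t) = 2{1 − Φ(z_{α/2}/√t)} and a one-sided α of 0.025; the function name is illustrative and is not an East API.

```python
from statistics import NormalDist

norm = NormalDist()

def ld_of_alpha_spent(t, alpha):
    """Cumulative one-sided alpha spent at information fraction t (0 < t <= 1)."""
    return 2 * (1 - norm.cdf(norm.inv_cdf(1 - alpha / 2) / t ** 0.5))

# Looks at 50%, 75%, and 100% of the maximum information.
for t in (0.5, 0.75, 1.0):
    print(f"t={t:.2f}  cumulative alpha spent={ld_of_alpha_spent(t, alpha=0.025):.4f}")
```

At the first look the nominal one-sided p-value needed to stop equals the error spent so far, about 0.0015 here. Boundaries at later looks also depend on the error already spent and require recursive numerical integration, which this sketch does not attempt.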
In the Library, select the rows for Design1 and Design2 by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

Let us examine the design output from Design2. The maximum number of subjects that we must commit to this study in order to achieve 90% power is 2740. That is 50 patients more than are needed for Design1. However, since Design1 is a single-look design, there is no prospect of saving resources if indeed H1 is true and the two treatments have the same response rates. In contrast, Design2 permits the trial to stop early if the test statistic crosses the stopping boundary. For this reason, the expected sample size under H1 is 2094, a saving of 596 patients relative to Design1. If H0 is true, the expected sample size is 2732 and there is no saving of patient resources.

In order to see the stopping probabilities, as well as other characteristics, select Design2 in the Library, and click the icon. The cumulative boundary stopping probabilities are shown in the Stopping Boundaries table.

To display a chart of average sample number (ASN) versus the effect size, πt − πc, select Design2 in the Library, click on the icon, and select Average Sample Number (ASN).

To display a chart of power versus treatment effect, select Design2 in the Library, click on the icon, and select Power vs. Treatment Effect (δ).

In Design2, we utilized the Lan-DeMets (Lan & DeMets, 1983) spending function, with Parameter OF (O’Brien-Fleming), to generate the stopping boundary for early stopping under H1. One drawback of Design2 is the large expected sample size if H0 is true.
We can guard against this eventuality by introducing a futility boundary which will allow us to stop early if H0 is true. A popular approach to stopping early for futility is to compute the conditional power at each interim monitoring time point and stop the study if this quantity is too low. This approach is somewhat arbitrary since there is no guidance as to what constitutes low conditional power. In East, we compute futility boundaries that protect β, the type-2 error, so that the power of the study will not deteriorate. This is achieved by using a β-spending function to generate the futility boundary. Thereby the type-2 error will not exceed β and the power of the study will be preserved. This approach was published by Pampallona and Tsiatis (1994).

Suppose we now wish to include a futility boundary. To design this trial, select Design2 in the Library and click the icon. In the Boundary Info tab, in the Futility box, set Boundary Family to Spending Function. Change the Spending Function to Gamma Family and change the Parameter (Γ) to −8. This family is parameterized by the single parameter γ, which can take all possible non-zero values. Its functional form is

$$\beta(t) = \frac{\beta\,\left(1 - e^{-\gamma t}\right)}{1 - e^{-\gamma}}. \tag{24.4}$$

Next click Refresh Boundary. Your screen should now look like the following:

On the Boundary Info tab, you may also click the respective icons to view plots of the error spending functions or the stopping boundaries.

Notice how conservative the β-spending function is compared to the α-spending function. Its rate of error spending is almost negligible until about 60% of the information has accrued.
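The shape of the Gamma Family function in equation (24.4) can be checked numerically. The sketch below evaluates it for γ = −8 and assumes β = 0.10 (that is, 90% power); the function name `gamma_beta_spent` is hypothetical and is not an East identifier.

```python
import math


def gamma_beta_spent(t, beta=0.10, gamma=-8.0):
    """Cumulative type-2 error spent at information fraction t, equation (24.4)."""
    if gamma == 0:
        raise ValueError("gamma must be non-zero")
    if not 0.0 <= t <= 1.0:
        raise ValueError("information fraction must lie in [0, 1]")
    return beta * (1.0 - math.exp(-gamma * t)) / (1.0 - math.exp(-gamma))


# Evaluate the spending function at a few information fractions.
for t in (0.25, 0.50, 0.60, 0.75, 1.00):
    print(f"t = {t:.2f}  cumulative beta spent = {gamma_beta_spent(t):.5f}")
```

With these inputs less than 5% of the total β has been spent by t = 0.6, which illustrates why the futility boundary is described as conservative at the early looks.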
One can view the stopping boundaries on various alternative scales by selecting the appropriate scale from the drop-down list of boundary scales to the right of the chart. It is instructive to view the stopping boundaries on the p-value scale.

By moving the vertical scroll bar from left to right in the above chart, one can observe the p-values required for early stopping at each look. The p-values needed to stop the study and declare non-inferiority at the first, second and third looks are, respectively, 0.0015, 0.0092 and 0.022. The p-values needed to stop the study for futility at the first and second looks are, respectively, 0.7244 and 0.2708.

Other useful scales for displaying the futility boundary are the conditional power scales. They are the cp delta1 Scale and the cp deltahat Scale. Here ‘cp’ refers to conditional power. The suffix ‘delta1’ implies that we will represent the futility boundary in terms of conditional power evaluated at the value of δ = δ1 specified at the design stage under the alternative hypothesis. The suffix ‘deltahat’ implies that we will represent the futility boundary in terms of conditional power evaluated at the value of δ̂ at which the test statistic Z = δ̂/se(δ̂) would just hit the futility boundary.

The screenshot below represents the first two values of the futility boundary on the cp delta1 Scale. For example, the stopping boundary at the first look is cp delta1 = 0.1137. This is to be interpreted in the following way: if at the first look the value of the test statistic Z just falls on the futility boundary, then the conditional power, as defined by Section C.3 of Appendix C with δ = δ1 = 0, will be 0.1137. This gives us a way to express the futility boundary in terms of conditional power. The cp delta1 Scale might not give one an accurate picture of futility.
This is because, on this scale, the conditional power is evaluated at the value of δ = δ1 specified at the design stage. However, if the test statistic has actually fallen on the futility boundary, the data are more suggestive of the null than the alternative hypothesis and it is not very likely that δ = δ1. Thus it might be more reasonable to evaluate conditional power at the observed value δ = δ̂.

The screenshot below represents the futility boundary on the cp deltahat Scale. For example, the stopping boundary at the second look is cp deltahat = 0.0044. This is to be interpreted in the following way: if at the second look, the value of the test statistic Z just falls on the futility boundary, then the conditional power, as defined by Section C.3 of Appendix C with δ = δ̂ = Z × se(δ̂), will be 0.0044.

It is important to realize that the futility boundary has not changed. It is merely being expressed on a different scale. On the whole, it is probably more realistic to express the futility boundary on the cp deltahat scale than on the cp delta1 scale since it is highly unlikely that the true value of δ is equal to δ1 if Z has hit the futility boundary. Close this chart before continuing.

Click the Compute button to generate output for Design3. With Design3 selected in the Output Preview, click the icon. In the Library, select the rows for Design1, Design2, and Design3, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the three designs side-by-side:

Observe that Design3 will stop with a smaller expected sample size under either H0 or H1 compared to Design2.
Three-Look Design Powered at δ ≠ 0

The previous designs were all powered to detect the alternative hypothesis that the new treatment and the active control have the same response rate (δ1 = 0). As is usually the case with non-inferiority trials, the distance between the non-inferiority margin δ0 = −0.05 and the alternative hypothesis δ1 = 0 is rather small, thereby resulting in a very large sample size commitment to this trial. Sometimes a new treatment is actually believed to have a superior response rate to the active control. However, the anticipated treatment benefit might be too small to make it feasible to run a superiority trial. Suppose, for example, that it is anticipated that the treatment arm could improve upon the 80% response rate of the active control by about 2.5%. A single-look superiority trial designed for 90% power to detect so small a difference would require over 12000 subjects. In this situation, the sponsor might prefer to settle for a non-inferiority claim. A non-inferiority trial in which the active control has a response probability of πc = 0.8, the non-inferiority margin is δ0 = −0.05, and the alternative hypothesis is δ1 = πt − πc = 0.025 can be designed as follows.

Create a new design by selecting Design3 in the Library, and clicking the icon on the Library toolbar. In the ensuing dialog box, choose the design parameters as shown below.

Click the Compute button to generate output for Design4. Notice that this design requires only 1161 subjects. This is 1585 fewer subjects than under Design3.

24.1.2 Trial Simulation

You can simulate Design3 by selecting Design3 in the Library, and clicking the icon from the Library toolbar. Alternatively, right-click on Design3 and select Simulate.
A new Simulation worksheet will appear.

Try different choices for the simulation parameters to verify the operating characteristics of the study. For instance, under the Response Generation Info tab, set Prop. Under Control to 0.8 and Prop. Under Treatment to 0.75. You will be simulating under the null hypothesis and should achieve a rejection rate of 2.5%. Now, click on the Simulate button.

Once the simulation run has completed, East will add an additional row to the Output Preview labeled Simulation1.

Select Simulation1 in the Output Preview. Note that some of the design details will be displayed in the upper pane, labeled Compare Designs. Click the icon to save it to the Library. Double-click on Simulation1 in the Library. The simulation output details will be displayed.

We see above that we achieved a rejection rate of 2.5%.

Now suppose that the new treatment is actually slightly superior to the control treatment. For example, πc = 0.8 and πt = 0.81. Since this study is designed for 90% power when πc = πt = 0.8, we would expect the simulations to reveal power in excess of 90%. Select the Sim1 node in the Library, and click the icon from the Library toolbar. Under the Response Generation Info tab change the Prop. Under Treatment to 0.81. Click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Simulation2. Select Simulation2 in the Output Preview. Click the icon to save it to the Library. Double-click on Simulation2 in the Library. The simulation output details will be displayed.
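As a rough cross-check of the rejection rates discussed in this subsection, the sketch below uses a normal approximation for a simplified single-look version of the design. It is not the three-look Design3 that East simulates: it assumes a fixed sample of 2690 subjects split equally (the Design1 size implied by the figures quoted earlier), a one-sided α of 0.025, δ = πt − πc with margin δ0 = −0.05, and an unpooled standard error. The function name `approx_power` is hypothetical.

```python
from statistics import NormalDist


def approx_power(pi_c, pi_t, n_total=2690, delta0=-0.05, alpha=0.025):
    """Normal-approximation probability of declaring non-inferiority
    in a single-look design (simplifying assumption, see lead-in)."""
    nd = NormalDist()
    n_arm = n_total / 2.0
    # Unpooled standard error of the estimated difference pi_t - pi_c.
    se = (pi_t * (1 - pi_t) / n_arm + pi_c * (1 - pi_c) / n_arm) ** 0.5
    z_alpha = nd.inv_cdf(1.0 - alpha)
    return nd.cdf((pi_t - pi_c - delta0) / se - z_alpha)


# Scenarios from the text: null hypothesis, design alternative, slight superiority.
for pi_c, pi_t in ((0.8, 0.75), (0.8, 0.80), (0.8, 0.81)):
    print(f"pi_c = {pi_c}, pi_t = {pi_t}: rejection probability = "
          f"{approx_power(pi_c, pi_t):.3f}")
```

Because the sketch ignores the interim looks and the futility boundary, its values should be read only as approximate counterparts of the simulated rates.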
The simulation results show that the power exceeds 97%.

The power of the study will deteriorate if the response rate of the control arm is less than 0.8, even if πc = πt. To see this, let us simulate with πc = πt = 0.7. The results are shown below.

Notice that the power has dropped from 90% to 80% even though the new treatment and the control treatment have the same response rates. This is because the lower response rates for πc and πt induce greater variability into the distribution of the test statistic. In order to preserve power, the sample size must be increased. This can be achieved without compromising the type-1 error within the group sequential framework by designing the study for a maximum amount of (Fisher) information instead of a maximum sample size. We discuss maximum information studies later, in Chapter 59.

24.1.3 Interim Monitoring

Consider interim monitoring of Design3. Select Design3 in the Library, and click the icon from the Library toolbar. Alternatively, right-click on Design3 and select Interim Monitoring. The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections. The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right. These charts provide graphical and numerical descriptions of the progress of the clinical trial and are useful tools for decision making by a data monitoring committee.

Suppose that the trial is first monitored after accruing 500 subjects on each treatment arm, with 395 responses on the treatment arm and 400 responses on the control arm. Click on the icon to invoke the Test Statistic Calculator.
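The calculator inputs for this first look can be reproduced by hand from the observed counts. The sketch below assumes δ = πt − πc, a margin of δ0 = −0.05, and an unpooled standard error; these assumptions are chosen because they reproduce the standard error of 0.02553 and the test statistic of 1.567 used in this example. The function name `ni_difference_statistic` is hypothetical.

```python
def ni_difference_statistic(x_t, n_t, x_c, n_c, delta0=-0.05):
    """Estimate, standard error and Wald-type statistic for
    non-inferiority on the difference-of-proportions scale."""
    p_t, p_c = x_t / n_t, x_c / n_c
    delta_hat = p_t - p_c
    # Unpooled standard error (an assumption that matches the quoted value).
    se = (p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c) ** 0.5
    z = (delta_hat - delta0) / se
    return delta_hat, se, z


# First look: 395/500 responses on treatment, 400/500 on control.
delta_hat, se, z = ni_difference_statistic(395, 500, 400, 500)
print(f"estimate = {delta_hat:.3f}, std. error = {se:.5f}, Z = {z:.3f}")
```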
In the box next to Cumulative Sample Size enter 1000. Enter −0.01 in the box next to Estimate of δ. In the box next to Std. Error of δ enter 0.02553. Next click Recalc. Note that the test statistic is computed to be 1.567.

Upon clicking the OK button, East will produce the interim monitoring report shown below.

The stopping boundary for declaring non-inferiority is 3.535 whereas the value of the test statistic is only 1.567. Thus the trial should continue.

Suppose that the next interim look occurs after accruing 1250 patients on each arm with 1000 responses on the control arm and 990 responses on the treatment arm. Click on the second row in the table in the upper section. Then click the icon. The estimate of δ is −0.008 and the standard error is 0.016118. Enter the appropriate values as shown below and click Recalc. Note that the value of the test statistic is now 2.606.

Now click the OK button. This time the stopping boundary for declaring non-inferiority is crossed. The following message box appears.

Click the Stop button to stop the study. The analysis results are shown below.

The lower bound on the 87.5% repeated confidence interval is −0.042, comfortably within the non-inferiority margin of −0.05 specified at the design stage.

East also provides a p-value, confidence interval and median unbiased point estimate for πt − πc using stage-wise ordering of the sample space as described in Jennison and Turnbull (2000, page 179). These appear in the Adjusted Inference Table in the lower section of the IM Worksheet. In the present example, the lower confidence
bound is −0.040, slightly greater than the corresponding bound from the repeated confidence interval.

24.2 Ratio of Proportions: Wald Formulation

Let πc and πt denote the response rates for the control and the experimental treatments, respectively. Let the difference between the two arms be captured by the ratio

$$\rho = \frac{\pi_t}{\pi_c}.$$

The null hypothesis is specified as H0: ρ = ρ0 and is tested against one-sided alternative hypotheses. If the occurrence of a response denotes patient benefit rather than harm, then ρ0 < 1 and the alternative hypothesis is H1: ρ > ρ0, or equivalently H1: πt > ρ0 πc. Conversely, if the occurrence of a response denotes patient harm rather than benefit, then ρ0 > 1 and the alternative hypothesis is H1: ρ < ρ0, or equivalently H1: πt < ρ0 πc.

For any given πc, the sample size is determined by the desired power at a specified value of ρ = ρ1. A common choice is ρ1 = 1 (or equivalently πt = πc), but East permits you to power the study at any value of ρ1 which is consistent with the choice of H1.

Let π̂tj and π̂cj denote the estimates of πt and πc based on ntj and ncj observations from the experimental and control treatments, respectively, up to and including the j-th look, j = 1, . . . , K, where a maximum of K looks are to be made. It is convenient to express the treatment effect on the logarithm scale as

$$\delta = \ln \rho = \ln \pi_t - \ln \pi_c. \tag{24.5}$$

The test statistic at the j-th look is then defined as

$$Z_j = \frac{\hat{\delta}_j - \delta_0}{\mathrm{se}(\hat{\delta}_j)} \tag{24.6}$$

where

$$\hat{\delta}_j = \ln\!\left(\frac{\hat{\pi}_{tj}}{\hat{\pi}_{cj}}\right), \tag{24.7}$$

$$\delta_0 = \ln(\rho_0) \tag{24.8}$$

and

$$\mathrm{se}(\hat{\delta}_j) = \sqrt{\frac{1 - \hat{\pi}_{cj}}{n_{cj}\,\hat{\pi}_{cj}} + \frac{1 - \hat{\pi}_{tj}}{n_{tj}\,\hat{\pi}_{tj}}}. \tag{24.9}$$

24.2.1 Trial Design

The Coronary Artery Revascularization in Diabetes (CARDia) trial (Kapur et
al., 2005) was designed to compare coronary bypass graft surgery (CABG) and percutaneous coronary intervention (PCI) as strategies for revascularization, with the goal of showing that PCI is noninferior to CABG. We use various aspects of that study to exemplify the methodology to test for non-inferiority. The endpoint is the one-year event rate, where an event is defined as the occurrence of death, nonfatal myocardial infarction, or cerebrovascular accident.

Suppose that the event rate for CABG is πc = 0.125 and that the claim of non-inferiority for PCI can be sustained if one can demonstrate statistically that the ratio ρ = πt/πc is at most 1.3. In other words, PCI is considered to be non-inferior to CABG as long as πt < 0.1625. Thus the null hypothesis H0: ρ = 1.3 is tested against the one-sided alternative hypothesis H1: ρ < 1.3. We want to determine the sample size required to have power of 80% when ρ = 1 using a one-sided test with a type-1 error rate of 0.05.

Single Look Design Powered at ρ = 1

First we consider a study with only one look and equal sample sizes in the two groups. To begin, click Two Proportions on the Design tab under Discrete, and then click Ratio of Proportions.

In the ensuing dialog box, next to Trial, select Noninferiority from the drop-down menu. Choose the remaining design parameters as shown below. Make sure to select the radio button for Wald in the Test Statistic box. We will discuss the Score (Farrington Manning) test statistic in the next section.

Now click Compute. The design is shown as a row in the Output Preview located in the lower pane of this window. This single-look design requires a combined total of 2515 subjects from both treatments in order to attain 80% power.

You can select this design by clicking anywhere along the row in the Output Preview.
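The order of magnitude of this sample size can be checked with a textbook normal-approximation formula based on the variance of the log ratio in equation (24.9). The sketch below is an assumption about how such a number may arise, not a description of East's internal algorithm; the function name `wald_ratio_total_n` is hypothetical.

```python
import math
from statistics import NormalDist


def wald_ratio_total_n(pi_c, rho0, rho1, alpha=0.05, power=0.80):
    """Approximate total sample size (equal allocation) for a one-sided
    Wald non-inferiority test on the log ratio of proportions."""
    nd = NormalDist()
    pi_t = rho1 * pi_c  # treatment event rate under the alternative
    z_sum = nd.inv_cdf(1.0 - alpha) + nd.inv_cdf(power)
    # Per-subject variance contribution of ln(pi_t_hat / pi_c_hat), as in (24.9).
    var_unit = (1.0 - pi_c) / pi_c + (1.0 - pi_t) / pi_t
    n_per_arm = z_sum ** 2 * var_unit / (math.log(rho1) - math.log(rho0)) ** 2
    return 2.0 * n_per_arm


total = wald_ratio_total_n(pi_c=0.125, rho0=1.3, rho1=1.0)
print(f"approximate total sample size: {math.ceil(total)}")
```

With πc = 0.125, ρ0 = 1.3 and ρ1 = 1, rounding the result up gives the same total of 2515 subjects reported for this design.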
Some of the design details will be displayed in the upper pane, labeled Compare Designs. In the Output Preview toolbar, click the icon to save this design to Workbook1 in the Library. If you hover the cursor over Design1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

Three-Look Design Powered at ρ = 1

For the above study, suppose we wish to take up to two equally spaced interim looks and one final look at the accruing data, using the Lan-DeMets (O’Brien-Fleming) stopping boundary. Create a new design by selecting Design1 in the Library, and clicking the icon on the Library toolbar. Change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab, Boundary Info, will appear. Clicking on this tab will reveal the stopping boundary parameters. By default, the Spacing of Looks is set to Equal, which means that the interim analyses will be equally spaced in terms of the number of patients accrued between looks. The left side contains details for the Efficacy boundary, and the right side contains details for the Futility boundary. By default, there is an efficacy boundary (to reject H0) selected, but no futility boundary (to reject H1). The Boundary Family specified is of the Spending Functions type. The default Spending Function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF (O’Brien-Fleming), which generates boundaries that are very similar, though not identical, to the classical stopping boundaries of O’Brien and Fleming (1979). Technical details of these stopping boundaries are available in Appendix F.

Click the Compute button to generate output for Design2.
With Design2 selected in the Output Preview, click the icon to save Design2 to the Library. In the Library, select the rows for Design1 and Design2, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

Using three planned looks requires an up-front commitment of 2566 subjects, a slight inflation over the single-look design which required 2515 subjects. However, the three-look design may result in a smaller sample size than that required for the single-look design, with an expected sample size of 2134 subjects under the alternative hypothesis (πc = 0.125, ρ = 1), and still ensures that the power is 80%.

By selecting Design2 in the Library and clicking on the icon, East displays the cumulative accrual, the stopping boundary, the type-1 error spent and the boundary crossing probabilities under the null hypothesis H0: ρ = 1.3 and the alternative hypothesis H1: ρ = 1.

Single-Look Design Powered at ρ ≠ 1

Sample sizes for non-inferiority trials powered at ρ = 1 are generally rather large, because regulatory requirements usually impose small non-inferiority margins (see, for example, Wang et al., 2001). Observe that both Design1 and Design2 were powered at ρ = 1 and required sample sizes in excess of 2500 subjects. However, based on Kapur et al. (2005), it is reasonable to expect πt < πc. We now consider the same design as in Design1, but we will power at the alternative hypothesis ρ1 = 0.72. That is, we will design this study to have 80% power to claim non-inferiority if πc = 0.125 and πt = 0.72 × 0.125 = 0.09.

Create a new design by selecting Design1 in the Library, and clicking the icon on the Library toolbar.
In the ensuing dialog box, change the design parameters as shown below.

Click the Compute button to generate output for Design3. With Design3 selected in the Output Preview, click the icon. In the Library, select the rows for Design1, Design2, and Design3, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the three designs side-by-side:

This single-look design requires a combined total of 607 subjects from both treatments in order to attain 80% power. This is a considerable decrease from the 2515 subjects required to attain 80% power using Design1 with ρ1 = 1.

Three-Look Design Powered at ρ ≠ 1

We now consider the impact of multiple looks on Design3. Suppose we wish to take up to two equally spaced interim looks and one final look at the accruing data, using the Lan-DeMets (O’Brien-Fleming) stopping boundary. Create a new design by selecting Design3 in the Library, and clicking the icon on the Library toolbar. In the ensuing dialog box, change the Number of Looks to 3. Click the Compute button to generate output for Design4.

Using three planned looks inflates the maximum sample size slightly, from 607 to 619 subjects. However, it results in a smaller expected sample size under H1. Observe that the expected sample size is only 515 subjects under the alternative hypothesis (πc = 0.125, ρ = 0.72), and still ensures the power is 80%.

24.2.2 Trial Simulation

You can simulate Design4 by selecting it from the Library and clicking on the icon.
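Outside East, the same kind of check can be sketched with a small Monte Carlo experiment. The example below deliberately simplifies: it simulates the single-look Design3 (607 subjects, here assumed to be split 304 to treatment and 303 to control) rather than the three-look Design4, and it applies the Wald statistic of equations (24.6) through (24.9) with a one-sided α of 0.05. The function name `simulate_rejection_rate`, the seed, and the number of simulated trials are all illustrative choices.

```python
import math
import random
from statistics import NormalDist


def simulate_rejection_rate(pi_c, pi_t, n_c=303, n_t=304, rho0=1.3,
                            alpha=0.05, n_sims=2000, seed=12345):
    """Fraction of simulated single-look trials that reject H0: rho = rho0
    in favour of H1: rho < rho0 using the Wald log-ratio statistic."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(alpha)  # lower-tail critical value (negative)
    delta0 = math.log(rho0)
    rejections = 0
    for _ in range(n_sims):
        x_c = sum(rng.random() < pi_c for _ in range(n_c))
        x_t = sum(rng.random() < pi_t for _ in range(n_t))
        if x_c == 0 or x_t == 0:
            continue  # log ratio undefined; counted as no rejection
        p_c, p_t = x_c / n_c, x_t / n_t
        se = math.sqrt((1 - p_c) / (n_c * p_c) + (1 - p_t) / (n_t * p_t))
        z = (math.log(p_t / p_c) - delta0) / se
        rejections += z <= z_crit
    return rejections / n_sims


power_h1 = simulate_rejection_rate(pi_c=0.125, pi_t=0.09)    # rho = 0.72
size_h0 = simulate_rejection_rate(pi_c=0.125, pi_t=0.1625)   # rho = 1.3
print(f"rejection rate under H1: {power_h1:.3f}")
print(f"rejection rate under H0: {size_h0:.3f}")
```

With only 2000 simulated trials per scenario the Monte Carlo error is roughly one percentage point, so the estimates should land near, not exactly on, the nominal 80% and 5%.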
Try different choices for the simulation parameters to verify the operating characteristics of the study. For instance, under the Response Generation Info tab set Prop. Under Control to 0.125 and Prop. Under Treatment to 0.72 × 0.125 = 0.09. Click the Simulate button. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Simulation1.

Select Simulation1 in the Output Preview. Note that some of the design details will be displayed in the upper pane, labeled Compare Designs. Click the icon to save it to the Library. Double-click on Simulation1 in the Library. The simulation output details will be displayed.

We simulated the data under the alternative hypothesis and should achieve a rejection rate of 80%. This is confirmed by the simulation output (up to Monte Carlo accuracy).

Next, to simulate under the null hypothesis, under the Response Generation Info tab set Prop. Under Treatment to 1.3 × 0.125 = 0.1625. Click the Simulate button. This time the rejection rate is only 5% (up to Monte Carlo accuracy), as we would expect under the null hypothesis.

You may experiment in this manner with different values of πc and πt and observe the rejection rates look by look as well as averaged over all looks.

24.2.3 Interim Monitoring

Select Design4 in the Library, and click the icon from the Library toolbar. Alternatively, right-click on Design4 and select Create IM Dashboard.

The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections. The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right.
These charts provide graphical and numerical descriptions of the progress of the clinical trial and are useful tools for decision making by a data monitoring committee.

Suppose that the trial is first monitored after accruing 125 subjects on each treatment arm, with 15 responses on the control arm and 13 responses on the treatment arm. Click on the icon to invoke the Test Statistic Calculator. In the box next to Cumulative Sample Size enter 250. Enter −0.143101 in the box next to Estimate of δ. In the box next to Std. Error of δ enter 0.357197. Next click Recalc. Notice that the test statistic is computed to be −1.135. This value for the test statistic was obtained by substituting the observed sample sizes and responses into equations (24.6) through (24.9). Upon clicking the OK button, East will produce the interim monitoring report shown below.

Note: Click on the icon to hide or unhide the columns of your interest.

The stopping boundary for declaring non-inferiority is −2.872 whereas the value of the test statistic is only −1.135. Thus the trial should continue. This conclusion is supported by the value of the 97.5% upper confidence bound of the repeated confidence interval for δ = ln(ρ). The non-inferiority claim could be sustained only if this bound were less than ln(1.3) = 0.262. At the current interim look, however, the upper bound on δ is 0.883, indicating that the non-inferiority claim is not supported by the data.

Suppose that the next interim look occurs after accruing 250 patients on each arm with 31 responses on the control arm and 22 responses on the treatment arm. Click on the second row in the table in the upper section. Then click the icon.
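The calculator inputs for both interim looks follow directly from the observed counts and equations (24.6) through (24.9). The sketch below is a plain transcription of those equations with ρ0 = 1.3; the function name `wald_ratio_statistic` is hypothetical.

```python
import math


def wald_ratio_statistic(x_t, n_t, x_c, n_c, rho0=1.3):
    """Log-ratio estimate (24.7), its standard error (24.9) and the
    Wald statistic (24.6) for the non-inferiority margin rho0."""
    p_t, p_c = x_t / n_t, x_c / n_c
    delta_hat = math.log(p_t / p_c)
    se = math.sqrt((1 - p_c) / (n_c * p_c) + (1 - p_t) / (n_t * p_t))
    z = (delta_hat - math.log(rho0)) / se
    return delta_hat, se, z


# Look 1: 13/125 events on treatment, 15/125 on control.
look1 = wald_ratio_statistic(13, 125, 15, 125)
# Look 2: 22/250 events on treatment, 31/250 on control.
look2 = wald_ratio_statistic(22, 250, 31, 250)

for label, (delta_hat, se, z) in (("look 1", look1), ("look 2", look2)):
    print(f"{label}: estimate = {delta_hat:.6f}, std. error = {se:.6f}, Z = {z:.3f}")
```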
In the box next to Cumulative Sample Size enter 500. Enter −0.342945 in the box next to Estimate of δ. In the box next to Std. Error of δ enter 0.264031. Next click Recalc. Notice that the test statistic is computed to be −2.293. Click the OK button. This time the stopping boundary for declaring non-inferiority is crossed. The following message box appears.

Click the Stop button to stop the study. The analysis results are shown below.

The upper bound on the 95.0% repeated confidence interval for δ is 0.159. Thus the upper confidence bound on ρ is exp(0.159) = 1.172, comfortably within the non-inferiority margin ρ0 = 1.3 specified at the design stage.

In the Final Inference Table in the bottom portion of the IM worksheet, East also provides a p-value, confidence interval and median unbiased point estimate for δ using stage-wise ordering of the sample space as described in Jennison and Turnbull (2000). This approach often yields narrower confidence intervals than the repeated confidence intervals approach although both approaches have the desired 95.0% coverage. In the present example, the upper confidence bound is 0.098, slightly less than the corresponding bound from the repeated confidence interval.

24.3 Ratio of Proportions: Farrington-Manning Formulation

An alternative approach to establishing non-inferiority of an experimental treatment to the control treatment with respect to the ratio of probabilities was proposed by Farrington and Manning (1990). Let πc and πt denote the response rates for the control and the experimental treatments, respectively.
Let the difference between the two arms be expressed by the ratio

$$\rho = \frac{\pi_t}{\pi_c}.$$

The null hypothesis is specified as H0: ρ = ρ0, or equivalently H0: πt = ρ0 πc, which is tested against one-sided alternative hypotheses. If the occurrence of a response denotes patient benefit rather than harm, then ρ0 < 1 and the alternative hypothesis is H1: ρ > ρ0, or equivalently H1: πt > ρ0 πc. Conversely, if the occurrence of a response denotes patient harm rather than benefit, then ρ0 > 1 and the alternative hypothesis is H1: ρ < ρ0, or equivalently H1: πt < ρ0 πc.

For any given πc, the sample size is determined by the desired power at a specified value of ρ = ρ1. A common choice is ρ1 = 1 (or equivalently πt = πc), but East permits you to power the study at any value of ρ1 which is consistent with the choice of H1.

Let π̂tj and π̂cj denote the estimates of πt and πc based on ntj and ncj observations from the experimental and control treatments, respectively, up to and including the j-th look, j = 1, . . . , K, where a maximum of K looks are to be made. The test statistic at the j-th look is defined as

$$Z_j = \frac{\hat{\pi}_{tj} - \rho_0\,\hat{\pi}_{cj}}{\sqrt{\dfrac{\hat{\pi}_{tj}\,(1 - \hat{\pi}_{tj})}{n_{tj}} + \rho_0^2\,\dfrac{\hat{\pi}_{cj}\,(1 - \hat{\pi}_{cj})}{n_{cj}}}}. \tag{24.10}$$

The choice of test statistic is the primary distinguishing feature between the above Farrington-Manning formulation and the Wald formulation of the non-inferiority test discussed in Section 24.2. The Wald statistic (24.6) measures the standardized difference between the observed ratio of proportions and the non-inferiority margin on the natural logarithm scale. The corresponding repeated one-sided confidence bounds displayed in the interim monitoring worksheet estimate ln(πt/πc) and may be converted to estimates of the ratio of proportions by exponentiation. On the other hand,
On the other hand, 24.3 Ratio of Proportions: Farrington-Manning 509 <<< Contents 24 * Index >>> Binomial Non-Inferiority Two-Sample the Farrington-Manning formulation focuses on the expression of the null hypothesis as H0 : πt − ρ0 πc = 0. Thus, we consider δ = πt − ρ0 πc (24.11) as the parameter of interest. The test statistic (24.10) is the standardized estimate of this difference obtained at the j-th look. A large difference in the direction of the alternative hypothesis is indicative of non-inferiority. The corresponding repeated one-sided confidence bounds displayed in the interim monitoring worksheet provide estimates of δ rather than directly estimating ρ or ln(ρ). The Farrington-Manning and Wald procedures are equally applicable for hypothesis testing since the null hypothesis δ = 0 is rejected if and only if the corresponding null hypothesis ρ = ρ0 is rejected. 24.3.1 Trial Design We consider the Coronary Artery Revascularization in Diabetes (CARDia) trial (Kapur et al, 2005) compared coronary bypass graft surgery (CABG) and percutaneous coronary intervention (PCI) as strategies for revascularization, with the goal of showing that PCI is noninferior to CABG, presented in Section 24.2. We use various aspects of that study to exemplify the use of the methodology to test for inferiority with respect to the one-year event rate where an ”event” is the occurrence of death, nonfatal myocardial infarction, or cerebrovascular accident, using the Farrington-Manning formulation. Suppose that the event rate for the CABG is πc = 0.125 and that the claim of non-inferiority for PCI can be sustained if the ratio ρ is at most 1.3; that is, the event rate for the PCI (πt ) is at most 0.1625. The null hypothesis H0 : ρ = 1.3 is tested against the alternative hypothesis H1 : ρ < 1.3. We want to determine the sample size required to have power of 80% when ρ = 1 using a one-sided test with a type-1 error rate of 0.05. 
Single-Look Design Powered at ρ = 1

First we consider a study with only one look and equal sample sizes in the two groups. To begin click Two Proportions on the Design tab, and then click Ratio of Proportions. In the ensuing dialog box, next to Trial, select Noninferiority from the drop down menu. Choose the remaining design parameters as shown below. Now click Compute. The design is shown as a row in the Output Preview located in the lower pane of this window. This single-look design requires a combined total of 2588 subjects from both treatments in order to attain 80% power.

You can select this design by clicking anywhere along the row in the Output Preview. Some of the design details will be displayed in the upper pane, labeled Compare Designs. In the Output Preview toolbar, click the icon to save this design to Workbook1 in the Library. If you hover the cursor over Design1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

Three-Look Design Powered at ρ = 1

For the above study, suppose we wish to take up to two equally spaced interim looks and one final look at the accruing data, using the Lan-DeMets (O’Brien-Fleming) stopping boundary. Create a new design by selecting Design1 in the Library, and clicking the icon on the Library toolbar. Change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab Boundary Info will appear. Clicking on this tab will reveal the stopping boundary parameters. By default, the Spacing of Looks is set to Equal, which means that the interim analyses will be equally spaced in terms of the number of patients accrued between looks.
The left side contains details for the Efficacy boundary, and the right side contains details for the Futility boundary. By default, there is an efficacy boundary (to reject H0) selected, but no futility boundary (to reject H1). The Boundary Family specified is of the Spending Functions type. The default Spending function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF (O’Brien-Fleming), which generates boundaries that are very similar, though not identical, to the classical stopping boundaries of O’Brien and Fleming (1979).

Click the Compute button to generate output for Design2. With Design2 selected in the Output Preview, click the icon to save Design2 to the Library. In the Library, select the rows for Design1 and Design2, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

Using three planned looks requires an up-front commitment of 2640 subjects, a slight inflation over the single-look design, which required only 2588 subjects. However, the three-look design may result in a smaller sample size than that required for the single-look design, with an expected sample size of 2195 subjects under the alternative hypothesis (πc = 0.125, ρ = 1), and still ensures that the power is 80%.

By selecting Design2 in the Library and clicking the icon, East displays the cumulative accrual, the stopping boundary, the type-1 error spent and the boundary crossing probabilities under the null hypothesis H0 : ρ = 1.3, and the alternative hypothesis H1 : ρ = 1.
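The type-1 error spent at each look is governed by the spending function. As a rough illustration, the sketch below evaluates the Lan-DeMets O'Brien-Fleming-type spending function in its commonly published one-sided form, α(t) = 2 − 2Φ(z_{α/2}/√t), where t is the information fraction. That East uses exactly this form is an assumption here, and the function names are hypothetical. Only the first-look boundary is derived, since later boundaries require the joint distribution of the sequential statistics.

```python
from statistics import NormalDist

_N = NormalDist()


def ld_of_alpha_spent(t, alpha=0.05):
    """Cumulative one-sided type-1 error spent at information fraction t (0 < t <= 1),
    using the commonly published Lan-DeMets O'Brien-Fleming-type form (assumed)."""
    z = _N.inv_cdf(1 - alpha / 2)
    return 2 * (1 - _N.cdf(z / t ** 0.5))


def first_look_boundary(t1, alpha=0.05):
    """Magnitude of the Z-scale efficacy boundary at the first look only.

    At the first look the boundary is simply the normal quantile that leaves
    the spent alpha in the tail; later looks are not attempted here.
    """
    return _N.inv_cdf(1 - ld_of_alpha_spent(t1, alpha))


# Three equally spaced looks, one-sided alpha = 0.05
for t in (1 / 3, 2 / 3, 1.0):
    print(f"t = {t:.3f}  cumulative alpha spent = {ld_of_alpha_spent(t):.5f}")
```

Very little error is spent early, which is why the three-look design needs only a slight inflation of the maximum sample size. As a usage example, `first_look_boundary(250 / 641)` gives a value of about 2.93 for a first look at 250 of a maximum 641 subjects.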
Single-Look Design Powered at ρ ≠ 1

Sample sizes for non-inferiority trials powered at ρ = 1 are generally rather large because regulatory requirements usually impose small non-inferiority margins. Observe that both Design1 and Design2 were powered at ρ = 1 and required sample sizes in excess of 2500 subjects. However, based on Kapur et al (2005), it is reasonable to expect πt < πc . We now consider the same design as in Design1, but we will power at the alternative hypothesis ρ1 = 0.72. That is, we will design this study to have 80% power to claim non-inferiority if πc = 0.125 and πt = 0.72 × 0.125 = 0.09.

Create a new design by selecting Design1 in the Library, and clicking the icon on the Library toolbar. In the ensuing dialog box, change the design parameters as shown below. Click the Compute button to generate output for Design3. With Design3 selected in the Output Preview, click the icon. In the Library, select the rows for Design1, Design2, and Design3, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the three designs side-by-side:

This single-look design requires a combined total of 628 subjects from both treatments in order to attain 80% power. This is a considerable decrease from the 2588 subjects required to attain 80% power using Design1, i.e. with ρ1 = 1.

Three-Look Design Powered at ρ ≠ 1

We now consider the impact of multiple looks on Design3. Suppose we wish to take up to two equally spaced interim looks and one final look at the accruing data, using the Lan-DeMets (O’Brien-Fleming) stopping boundary. Create a new design by selecting Design3 in the Library, and clicking the icon on the Library toolbar. In the ensuing dialog box, change the Number of Looks to 3.
Click the Compute button to generate output for Design4. Using three planned looks inflates the maximum sample size slightly, from 628 to 641 subjects. However, it results in a smaller expected sample size under H1 . Observe that the expected sample size is only 533 subjects under the alternative hypothesis (πc = 0.125, ρ = 0.72), and the design still ensures that the power is 80%.

24.3.2 Trial Simulation

You can simulate Design4 by selecting Design4 in the Library and clicking on the icon. Try different choices for the simulation parameters to verify the operating characteristics of the study. For instance, under the Response Generation Info tab set Prop. Under Control to 0.125 and Prop. Under Treatment to 0.72 × 0.125 = 0.09. Click the Simulate button. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Simulation1.

Select Simulation1 in the Output Preview. Note that some of the design details will be displayed in the upper pane, labeled Compare Designs. Click the icon to save it to the Library. Double-click on Simulation1 in the Library. The simulation output details will be displayed.

We simulated the data under the alternative hypothesis and should achieve a rejection rate of 80%. This is confirmed above (up to Monte Carlo accuracy). Next, to simulate under the null hypothesis, edit the Sim1 node by clicking the icon and, under the Response Generation Info tab, set Prop. Under Treatment to 1.3 × 0.125 = 0.1625. Click the Simulate button.
This time the rejection rate is only 5% (up to Monte Carlo accuracy), as we would expect under the null hypothesis. You may experiment in this manner with different values of πc and πt and observe the rejection rates look by look as well as averaged over all looks.

24.3.3 Interim Monitoring

Select Design4 in the Library, and click the icon from the Library toolbar. Alternatively, right-click on Design4 and select Interim Monitoring. The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections. The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right. These charts provide graphical and numerical descriptions of the progress of the clinical trial and are useful tools for decision making by a data monitoring committee.

Suppose that the trial is first monitored after accruing 125 subjects on each treatment arm, with 15 responses on the control arm and 13 responses on the treatment arm. Click on the icon to invoke the Test Statistic Calculator. In the box next to Cumulative Sample Size enter 250. Enter −0.052 in the box next to Estimate of δ. In the box next to Std. Error of δ enter 0.046617. Next click Recalc.

The test statistic is computed to be -1.115. This value for the test statistic was obtained by substituting the observed sample sizes and responses into equation (24.10). Upon clicking the OK button, East will produce the interim monitoring report shown below.

The stopping boundary for declaring non-inferiority is -2.929 whereas the value of the test statistic is only -1.115. Thus the trial should continue.
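The calculator inputs for the first look can be reproduced directly from the observed counts. The sketch below substitutes them into equation (24.10); the function name is hypothetical, and the boundary value is simply the one quoted in the text rather than something the code derives.

```python
import math


def fm_ratio_z(x_t, n_t, x_c, n_c, rho0):
    """Return (estimate of delta, its standard error, Z) following equation (24.10)."""
    p_t, p_c = x_t / n_t, x_c / n_c
    delta_hat = p_t - rho0 * p_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + rho0**2 * p_c * (1 - p_c) / n_c)
    return delta_hat, se, delta_hat / se


# First look: 13/125 responses on treatment, 15/125 on control, margin rho0 = 1.3
delta_hat, se, z = fm_ratio_z(x_t=13, n_t=125, x_c=15, n_c=125, rho0=1.3)
print(f"delta_hat = {delta_hat:.4f}, se = {se:.6f}, Z = {z:.3f}")

boundary = -2.929  # first-look boundary as reported in the text
print("boundary crossed" if z <= boundary else "continue the trial")
```

The estimate, standard error and Z value agree with the −0.052, 0.046617 and −1.115 used in the calculator, and the comparison against the quoted boundary leads to the same decision to continue.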
The decision to continue is also supported by the upper confidence bound on δ = πt − ρ0 πc , which at present equals 0.085. A necessary and sufficient condition for the stopping boundary to be crossed, and non-inferiority demonstrated thereby, is for this upper confidence bound to be less than zero.

Suppose that the next interim look occurs after accruing 250 patients on each arm, with 31 responses on the control arm and 22 responses on the treatment arm. Click on the second row in the table in the upper section. Then click the icon. In the box next to Cumulative Sample Size enter 500. Enter −0.0732 in the box next to Estimate of δ. In the box next to Std. Error of δ enter 0.032486. Next click Recalc. Notice that the test statistic is computed to be -2.253. Click the OK button. This time the stopping boundary for declaring non-inferiority is crossed. The following message box appears.

Click the Stop button to stop the study. The analysis results are shown below.

Notice that the upper confidence bound of the repeated confidence interval for δ excludes zero. In the Final Inference Table in the bottom portion of the IM worksheet, East also provides a p-value, confidence interval and median unbiased point estimate for δ using stage-wise ordering of the sample space as described in Jennison and Turnbull (2000, page 179). The upper confidence bound for δ based on the stage-wise method likewise excludes zero.

24.4 Odds Ratio Test

Let πt and πc denote the two binomial probabilities associated with the treatment (t) and the control (c).
Let the difference between the two treatment arms be captured by the odds ratio

$$\psi = \frac{\pi_t/(1-\pi_t)}{\pi_c/(1-\pi_c)} = \frac{\pi_t(1-\pi_c)}{\pi_c(1-\pi_t)}.$$

The null hypothesis is specified as H0 : ψ = ψ0 and is tested against one-sided alternative hypotheses. If the occurrence of a response denotes patient benefit rather than harm, then ψ0 < 1 and the alternative hypothesis is H1 : ψ > ψ0 . Conversely, if the occurrence of a response denotes patient harm rather than benefit, then ψ0 > 1 and the alternative hypothesis is H1 : ψ < ψ0 .

For any given πc , the sample size is determined by the desired power at a specified value ψ = ψ1 . A common choice is ψ1 = 1 (or equivalently πt = πc ), but East permits you to power the study at any value of ψ1 which is consistent with the choice of H1 .

Let π̂tj and π̂cj denote the estimates of πt and πc based on ntj and ncj observations from the experimental and control treatments, respectively, up to and including the j-th look, j = 1, . . . , K, where a maximum of K looks are to be made. It is convenient to express the treatment effect on the logarithmic scale as

$$\delta = \ln \psi. \tag{24.12}$$

The test statistic at the j-th look is then defined as

$$Z_j = \frac{\hat{\delta}_j - \delta_0}{\mathrm{se}(\hat{\delta}_j)} = \frac{\ln(\hat{\psi}_j) - \ln(\psi_0)}{\sqrt{\dfrac{1}{n_{tj}\hat{\pi}_{tj}(1-\hat{\pi}_{tj})} + \dfrac{1}{n_{cj}\hat{\pi}_{cj}(1-\hat{\pi}_{cj})}}}. \tag{24.13}$$

24.4.1 Trial Design

Suppose that the response rate for the control treatment is 90%, where higher response rates imply patient benefit. Assume that a claim of non-inferiority can be sustained if we can demonstrate statistically that the experimental treatment has a response rate of at least 80%. In other words the non-inferiority margin is

$$\psi_0 = \frac{0.8(1-0.9)}{0.9(1-0.8)} = 0.444.$$

The null hypothesis H0 : ψ = 0.444 is to be tested against the one-sided alternative H1 : ψ > 0.444. Suppose that we want to determine the sample size required to have power of 90% when πc = 0.9 and ψ1 = 1, i.e.
πc = πt , using a test with a type-1 error rate of 0.05.

Single-Look Design Powered at ψ = 1

First we consider a study with only one look and equal sample sizes in the two groups. To begin click Two Proportions on the Design tab, and then click Odds Ratio of Proportions. In the ensuing dialog box, next to Trial, select Noninferiority from the drop down menu. Choose the remaining design parameters as shown below. Now click Compute. The design is shown as a row in the Output Preview located in the lower pane of this window. This single-look design requires a combined total of 579 subjects from both treatments in order to attain 90% power.

You can select this design by clicking anywhere along the row in the Output Preview. Some of the design details will be displayed in the upper pane, labeled Compare Designs. In the Output Preview toolbar, click the icon to save this design to Workbook1 in the Library. If you hover the cursor over Design1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

Three-Look Design Powered at ψ = 1

For the above study, suppose we wish to take up to two equally spaced interim looks and one final look at the accruing data, using the default Lan-DeMets (O’Brien-Fleming) stopping boundary. Create a new design by selecting Design1 in the Library, and clicking the icon on the Library toolbar. Change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab Boundary Info will appear. Clicking on this tab will reveal the stopping boundary parameters. By default, the Spacing of Looks is set to Equal, which means that the interim analyses will be equally spaced in terms of the number of patients accrued between looks. The left side contains details for the Efficacy boundary, and the right side contains details for the Futility boundary.
By default, there is an efficacy boundary (to reject H0) selected, but no futility boundary (to reject H1). The Boundary Family specified is of the Spending Functions type. The default Spending function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF (O’Brien-Fleming), which generates boundaries that are very similar, though not identical, to the classical stopping boundaries of O’Brien and Fleming (1979). Technical details of these stopping boundaries are available in Appendix F.

Click the Compute button to generate output for Design2. With Design2 selected in the Output Preview, click the icon to save Design2 to the Library. In the Library, select the rows for Design1 and Design2, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

Using three planned looks requires an up-front commitment of 590 subjects, a slight inflation over the single-look design, which required 579 subjects. However, the three-look design may result in a smaller sample size than that required for the single-look design, with an expected sample size of 457 subjects under the alternative hypothesis (πc = 0.9, ψ = 1), and still ensures that the power is 90%.

Single-Look Design Powered at ψ ≠ 1

Suppose that it is expected that the new treatment is a bit better than the control, but it is unnecessary and unrealistic to perform a superiority test. We therefore determine the required sample size for ψ1 = 1.333, i.e. πt = 0.92308. Create a new design by selecting Design1 in the Library, and clicking the icon on the Library toolbar. In the ensuing dialog box, change the design parameters as shown below. Click the Compute button to generate output for Design3.
With Design3 selected in the Output Preview, click the icon. In the Library, select the rows for Design1, Design2, and Design3, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the three designs side-by-side:

We observe that a single-look design powered at ψ1 = 1.333 reduces the sample size considerably relative to the single-look design powered at ψ1 = 1. The reduction in maximum sample size for the single-look design is approximately 38% (=(579-358)/579). However, Design3 should be implemented after careful consideration, since its favorable operating characteristics are only applicable to the optimistic situation where ψ1 = 1.333. If the true odds ratio is less than 1.333, the power under Design3 decreases and may be too small to establish noninferiority, even if the true value is greater than 1.

Three-Look Design Powered at ψ ≠ 1

For the above study (Design3), suppose we wish to take up to two equally spaced interim looks and one final look at the accruing data, using the default Lan-DeMets (O’Brien-Fleming) stopping boundary. Create a new design by selecting Design3 in the Library, and clicking the icon on the Library toolbar. In the ensuing dialog box, change the Number of Looks to 3. Click the Compute button to generate output for Design4.

Using three planned looks requires an up-front commitment of 365 subjects, a small inflation over the single-look design, which required 358 subjects. However, the three-look design may result in a smaller sample size than that required for the single-look design, with an expected sample size of 283 subjects under the alternative hypothesis (πc = 0.9, ψ = 1.333), and still ensures that the power is 90%.
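The two single-look sample sizes quoted in this section can be cross-checked with a short calculation. This is a minimal sketch, assuming a normal approximation for the log odds ratio with the variance in (24.13) evaluated at the design values; it is not stated to be East's algorithm, and the function name and the allocation argument `r` are hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_sample_size(pi_c, psi0, psi1, alpha=0.05, power=0.90, r=0.5):
    """Approximate total sample size for a single-look non-inferiority test
    on the odds ratio (assumed normal approximation on the log scale).

    r is the fraction of subjects allocated to the experimental arm.
    """
    odds_t = psi1 * pi_c / (1 - pi_c)       # treatment odds implied by psi1
    pi_t = odds_t / (1 + odds_t)
    effect = math.log(psi1) - math.log(psi0)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    # Variance of the log odds ratio estimate, per unit of total sample size
    var_unit = 1 / (r * pi_t * (1 - pi_t)) + 1 / ((1 - r) * pi_c * (1 - pi_c))
    return math.ceil((z_alpha + z_beta) ** 2 * var_unit / effect**2)


psi0 = 0.8 * (1 - 0.9) / (0.9 * (1 - 0.8))   # non-inferiority margin, about 0.444
print(odds_ratio_sample_size(0.9, psi0, psi1=1.0))
print(odds_ratio_sample_size(0.9, psi0, psi1=4 / 3))
```

Under these assumptions the two calls evaluate to 579 and 358, in agreement with Design1 and Design3. The second call uses ψ1 = 4/3, which corresponds to πt = 12/13 = 0.92308.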
24.4.2 Trial Simulation

You can simulate Design4 by selecting Design4 in the Library and clicking on the icon. Try different choices for the simulation parameters to verify the operating characteristics of the study. First, we verify the results under the alternative hypothesis at which the power is to be controlled, namely πc = 0.9 and πt = 0.92308. Under the Response Generation Info tab set Prop. Under Control to 0.9 and Prop. Under Treatment to 0.92308. Click the Simulate button. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Simulation1.

Select Simulation1 in the Output Preview. Note that some of the design details will be displayed in the upper pane, labeled Compare Designs. Click the icon to save it to the Library. Double-click on Simulation1 in the Library. The simulation output details will be displayed.

We see here that the power is approximately 90%. Now let’s consider the impact if the sample size was determined assuming πc = 0.9, ψ1 = 1.333 when the true values are πc = 0.9 and ψ1 = 1. Under the Response Generation Info tab set Prop. Under Treatment to 0.9. Click the Simulate button. This results in a power of approximately 74%.

From this we see that if the optimistic choice is incorrect, then the power to establish noninferiority decreases to a possibly unacceptable value of 74%.

24.4.3 Interim Monitoring

Select Design4 in the Library, and click the icon from the Library toolbar. Alternatively, right-click on Design4 and select Interim Monitoring. The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections. The top section contains several columns for displaying output values based on the interim inputs.
The bottom section contains four charts, each with a corresponding table to its right. These charts provide graphical and numerical descriptions of the progress of the clinical trial and are useful tools for decision making by a data monitoring committee.

Suppose that the trial is first monitored after accruing 60 subjects on each treatment arm, with 50 responses on the control arm and 52 responses on the treatment arm. Click on the icon to invoke the Test Statistic Calculator. In the box next to Cumulative Sample Size enter 120. Enter 0.264231 in the box next to Estimate of δ. In the box next to Std. Error of δ enter 0.514034. Next click Recalc. Notice that the test statistic is computed to be 2.092. This value for the test statistic was obtained by substituting the observed sample sizes and responses into equation (24.13). Upon clicking the OK button, East will produce the interim monitoring report shown below.

Note: Click on the icon to hide or unhide the columns of interest.

The critical value is 3.22, and since the observed value of the test statistic (24.13) is less than this value, the null hypothesis cannot be rejected. Therefore, noninferiority cannot as yet be concluded.

Suppose that the second look is made after accruing 120 subjects on each treatment arm, with 112 responses on the control arm and 115 responses on the treatment arm. Click on the second row in the table in the upper section. Then click the icon. In the box next to Cumulative Sample Size enter 240. Enter 1.43848 in the box next to Estimate of δ. In the box next to Std. Error of δ enter 0.801501. Next click Recalc. Notice that the test statistic is computed to be 2.808.
This value for the test statistic was obtained by substituting the observed sample sizes and responses into equation (24.13). Click the OK button. This time the stopping boundary for declaring non-inferiority is crossed. The following message box appears.

Click the Stop button to stop the study. The analysis results are shown below.

The null hypothesis is rejected and we conclude that the treatment is noninferior to the control. In the Final Inference Table in the bottom portion of the IM worksheet, East also provides a stage-wise adjusted p-value, median unbiased point estimate and confidence interval for ψ as described in Jennison and Turnbull (2000) and in Appendix C of the East user manual. In the present example the adjusted p-value is 0.003, the point estimate for ψ is exp(1.427) = 4.166 and the upper 95% confidence bound for ψ is exp(0.098) = 1.103.

25 Binomial Equivalence Two-Sample

25.1 Equivalence Test

In some experimental situations, it is desired to show that the response rates for the control and the experimental treatments are "close", where "close" is defined prior to the collection of any data. Examples of this include showing that an aggressive therapy yields a similar rate of a specified adverse event to the established control, such as the bleeding rates associated with thrombolytic therapy or cardiac outcomes with a new stent.

Let πc and πt denote the response rates for the control and the experimental treatments, respectively, and let π̂t and π̂c denote the estimates of πt and πc based on nt and nc observations from the experimental and control treatments. Furthermore, let

$$\delta = \pi_t - \pi_c, \tag{25.1}$$

which is estimated by

$$\hat{\delta} = \hat{\pi}_t - \hat{\pi}_c. \tag{25.2}$$

Finally, let the variance of δ̂ be

$$\sigma^2 = \frac{\pi_c(1-\pi_c)}{n_c} + \frac{\pi_t(1-\pi_t)}{n_t}, \tag{25.3}$$

which is estimated by

$$\hat{\sigma}^2 = \frac{\hat{\pi}_c(1-\hat{\pi}_c)}{n_c} + \frac{\hat{\pi}_t(1-\hat{\pi}_t)}{n_t}. \tag{25.4}$$

The null hypothesis H0 : |πt − πc | = δ0 is tested against the two-sided alternative hypothesis H1 : |πt − πc | < δ0 , where δ0 (> 0) is specified to define equivalence. Following Machin and Campbell (1987), we present the solution to this problem as a one-sided α-level test. The decision rule is to declare equivalence if

$$-\delta_0 + z_\alpha \hat{\sigma} \le \hat{\pi}_t - \hat{\pi}_c \le \delta_0 - z_\alpha \hat{\sigma}. \tag{25.5}$$

We see that decision rule (25.5) is the same as declaring equivalence if the (1 − 2α) 100% confidence interval for πt − πc is entirely contained within the interval (−δ0 , δ0 ).

The power or sample size are determined for a single-look study only. The extension to multiple looks is given in the next section. The sample size, or power, is determined at a specified difference πt − πc , denoted δ1 , where −δ0 < δ1 < δ0 . The probability of declaring equivalence depends on the true values of πc and πt . Based on the results of Machin and Campbell (1987), the required total sample size (N) is, for nt = rN and nc = (1 − r)N ,

$$N = \frac{(z_\alpha + z_\beta)^2}{(\delta_0 - \delta_1)^2} \left[ \frac{\pi_c(1-\pi_c)}{1-r} + \frac{(\pi_c+\delta_1)(1-(\pi_c+\delta_1))}{r} \right]. \tag{25.6}$$

25.1.1 Trial Design

Consider the development of a new stent which is to be compared to the standard stent with respect to target vessel failure (acute failure, target vessel revascularization, myocardial infarction, or death) after one year. The standard stent has an assumed target vessel failure rate of 20%. Equivalence is defined as δ0 = 0.075. The sample size is to be determined with α = 0.025 (one-sided) and power, i.e. probability of declaring equivalence, of 1 − β = 0.80.

To begin click Two Samples on the Design tab, and then click Difference of Proportions.

Suppose that we want to determine the sample size required to have power of 80% when δ1 = 0. Enter the relevant parameters into the dialog box as shown below.
In the drop down box next to Trial Type be sure to select Equivalence.

Click on the Compute button. The design is shown as a row in the Output Preview located in the lower pane of this window. The sample size required in order to achieve the desired 80% power is 1203 subjects.

You can select this design by clicking anywhere along the row in the Output Preview. If you double click anywhere along the row in the Output Preview some of the design details will be displayed in the upper pane, labeled Output Summary.

In the Output Preview toolbar, click the icon to save this design to Workbook1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

If the assumed difference δ1 is not zero, it is more difficult to establish equivalence, in the sense that the power is lower and thus the required sample size is larger. Consider δ1 = 0.025, so that the new stent increases the rate to 22.5%. Create a new design Des2 by selecting Des1 in the Library, and clicking the icon on the Library toolbar. Change the value of Expected Diff. from 0 to 0.025 as shown below.

Click on the Compute button. The design is shown as a row in the Output Preview located in the lower pane of this window. With Des2 selected in the Output Preview, click the icon. In the Library, select the rows for Des1 and Des2, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

This single-look design requires a combined total of 2120 subjects from both treatments in order to attain 80% power.
Consider δ1 = −0.025, so that the new stent decreases the rate to 17.5%. Create a new design, as above, and change the value of Expected Diff. to −0.025. Click the Compute button to generate the output for Des3. With Des3 selected in the Output Preview, click the icon. In the Library, select the nodes for Des1, Des2, and Des3 by holding the Ctrl key, and then click the icon. The upper pane will display the details of the three designs side-by-side:

Des3 yields a required total sample size of 1940 subjects. This asymmetry is due to the fact that the variance is smaller for values of πc + δ1 further from 0.5.

25.1.2 Extension to Multiple Looks

Although the details presented in the previous section are related to a single-look design only, these results can be used to extend the solution to allow for multiple equally-spaced looks. We can use the General Design Module to generalize the solution to this problem to the study design with multiple looks. Details are given in Chapters 59 and 60.

Let π̂tj and π̂cj denote the estimates of πt and πc based on ntj and ncj observations from the experimental and control treatments, respectively, up to and including the j-th look, j = 1, . . . , K, where a maximum of K looks are to be used. Let nj = ncj + ntj and let

$$\hat{\delta}_j = \hat{\pi}_{tj} - \hat{\pi}_{cj} \tag{25.7}$$

denote the estimate of δ, given by (25.1), and let

$$\hat{\sigma}_j^2 = \frac{\hat{\pi}_{cj}(1-\hat{\pi}_{cj})}{n_{cj}} + \frac{\hat{\pi}_{tj}(1-\hat{\pi}_{tj})}{n_{tj}} \tag{25.8}$$

denote the estimate of σ², given by (25.3), using the data available at the j-th look. At the j-th look, the inference is based on

$$Z_j = \frac{\hat{\delta}_j}{\hat{\sigma}_j}. \tag{25.9}$$

Let

$$\eta = \delta \sqrt{I_{\max}},$$

where Imax is described in Chapter 59. Let tj = nj /nmax , j = 1, . . . , K. Then, using the multivariate normal approximation to the distribution of Z1 , . . . , ZK , with the expected value of Zj equal to $t_j^{1/2}\eta$ and the variance of Zj equal to 1, the (1 − α)100% repeated confidence intervals for η are

$$\left( \frac{Z_j + C_{Lj}}{t_j^{1/2}},\; \frac{Z_j + C_{Uj}}{t_j^{1/2}} \right), \tag{25.10}$$

where CLj and CUj are the values specified by the stopping boundary. The corresponding (1 − α)100% repeated confidence intervals for δ are

$$(\delta_j + C_{Lj},\; \delta_j + C_{Uj}). \tag{25.11}$$

Using the General Design Module, East provides these repeated confidence intervals for η.

By considering the decision rule (25.5) as declaring equivalence if the (1 − 2α) 100% confidence interval for πt − πc is entirely contained within the interval (−δ0 , δ0 ), we generalize the decision rule to a multiple-look design by concluding equivalence and stopping the study the first time one of the repeated (1 − 2α) 100% confidence intervals for η is entirely contained within the interval (−η0j , η0j ), where

$$\eta_{0j} = \frac{\delta_0}{t_j^{1/2}\,\hat{\sigma}_j}.$$

Consider Des1 (i.e. πc = 0.20, δ0 = 0.075, and δ1 = 0). As we saw above, a total of 1203 subjects are required for decision rule (25.5) to have power of 80% of declaring equivalence, using a 95% confidence interval.

To begin click on Other Designs on the Design tab and then click Sample Size-Based.

Enter the parameters as shown below. For the Sample Size for Fixed-Sample Study enter 1203, the value obtained from Des1. Also, be sure to set the Number of Looks to 5. Recall that the type-1 error specified here is twice the (one-sided) value specified for the single-look design.

The General Design Module is designed for testing the null hypothesis H0′ : η = 0. Thus, the specified power of the test pertains to testing H0′ and is not directly related to the procedure using the confidence interval.
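Decision rule (25.5) and its approximate operating characteristics can be sketched in a few lines. The first function applies the rule to observed counts; the second approximates the probability of declaring equivalence by treating σ in (25.3) as known, which is a simplifying assumption and not necessarily how East computes power. All function names and the example counts are hypothetical.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def declare_equivalence(x_t, n_t, x_c, n_c, delta0, alpha=0.025):
    """Apply decision rule (25.5) to observed response counts."""
    p_t, p_c = x_t / n_t, x_c / n_c
    sigma_hat = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)   # (25.4)
    z_alpha = _N.inv_cdf(1 - alpha)
    diff = p_t - p_c                                                      # (25.2)
    return -delta0 + z_alpha * sigma_hat <= diff <= delta0 - z_alpha * sigma_hat


def approx_equivalence_power(pi_c, delta1, delta0, n_total, alpha=0.025, r=0.5):
    """Approximate probability of declaring equivalence, sigma treated as known."""
    pi_t = pi_c + delta1
    n_t, n_c = r * n_total, (1 - r) * n_total
    sigma = math.sqrt(pi_c * (1 - pi_c) / n_c + pi_t * (1 - pi_t) / n_t)  # (25.3)
    z_alpha = _N.inv_cdf(1 - alpha)
    upper = (delta0 - delta1) / sigma - z_alpha
    lower = (-delta0 - delta1) / sigma + z_alpha
    return max(0.0, _N.cdf(upper) - _N.cdf(lower))


# Design values from Section 25.1.1: pi_c = 0.20, delta0 = 0.075, delta1 = 0, N = 1203
print(round(approx_equivalence_power(0.20, 0.0, 0.075, 1203), 3))

# Hypothetical observed counts, 600 subjects per arm
print(declare_equivalence(x_t=120, n_t=600, x_c=118, n_c=600, delta0=0.075))
```

With the design values of Section 25.1.1 the approximation returns a power close to 0.80 for N = 1203, and similarly for the designs with δ1 = ±0.025 (N = 2120 and N = 1940).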
The expected sample sizes under H0 and H1 depend on the specified value of the power and pertain to the null hypothesis $H_0'$ and the corresponding alternative hypothesis $H_1': \eta \neq 0$ or a corresponding one-sided alternative. These expected sample sizes are not directly applicable to the equivalence problem of testing H0 against H1.

Next click on the Boundary Info tab. The repeated confidence intervals for η depend on the choice of spending function boundaries. The sample size for this group sequential study also depends on the choice of the spending function, as well as the choice of the power. Although the boundaries themselves are not used in the decision rule, the width of the repeated confidence intervals for η is determined by the choice of the spending function. Here we will use the Lan-DeMets (O'Brien-Fleming) stopping boundary, with the looks spaced equally apart, as shown below.

Click Compute. With Des4 selected in the Output Preview, click the icon. In the Library, select the rows for Des1 and Des4 by holding the Ctrl key, and then click the icon. The upper pane will display the summary details of the two designs side-by-side:

We see that the extension of Des1 to a five-look design requires a commitment of 1233 subjects, a small inflation over the sample size of 1203 subjects required for Des1.

Select Des4 in the Library, and click the icon from the Library toolbar. Alternatively, right-click on Des4 and select Create IM Dashboard. This will invoke the interim monitoring worksheet, from which the repeated 95% confidence intervals will be provided.

The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections.
The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right. These charts provide graphical and numerical descriptions of the progress of the clinical trial and are useful tools for decision making by a data monitoring committee.

We want to perform up to five looks, as data become available for every 200 subjects. Suppose that, after 200 subjects, $\hat\pi_{cj} = 18/100 = 0.18$ and $\hat\pi_{tj} = 20/100 = 0.2$. Then, from (25.2) and (25.4), the estimates of $\delta$ and the standard error of $\hat\delta$ are 0.02 and 0.0555. Click on the icon to invoke the Test Statistic Calculator. Enter the appropriate values as shown below and click Recalc. Notice that the test statistic is computed to be 0.357. Next click OK. The following screen is shown.

The first repeated 95% confidence interval for η is (-12.628, 14.402). Since this confidence interval is not contained in the interval (-3.357, 3.357), where

$$\eta_{01} = \frac{\delta_0}{t_1^{1/2}\hat\sigma_1} = \frac{0.075}{(0.162)^{1/2}(0.0555)} = 3.357,$$

we take a second look after 400 subjects. Click on the second row in the table in the upper section. Then click the icon to invoke the Test Statistic Calculator. Suppose that $\hat\pi_{cj} = 36/200 = 0.18$ and $\hat\pi_{tj} = 38/200 = 0.19$. Then, from (25.2) and (25.4), the estimates of $\delta$ and the standard error of $\hat\delta$ are 0.01 and 0.0388. Enter these values as shown below and click on the Recalc button. Click on the OK button and the following values are presented in the interim monitoring worksheet.
The second repeated 95% confidence interval for η, (-6.159, 7.064), is not contained in the interval (-3.396, 3.396), where

$$\eta_{02} = \frac{\delta_0}{t_2^{1/2}\hat\sigma_2} = \frac{0.075}{(0.324)^{1/2}(0.0388)} = 3.396,$$

so we cannot conclude equivalence. We continue the study and take a third look after 600 subjects. Click on the third row in the table in the upper section. Then click the icon to invoke the Test Statistic Calculator. Suppose that $\hat\pi_{cj} = 51/300 = 0.17$ and $\hat\pi_{tj} = 60/300 = 0.2$. Then, from (25.2) and (25.4), the estimates of $\delta$ and the standard error of $\hat\delta$ are 0.03 and 0.0317. Enter these values as shown below and click on the Recalc button. The following screen is shown. Click on the OK button and the following values are presented in the interim monitoring worksheet.

The third repeated 95% confidence interval for η, (-2.965, 5.679), is not contained in the interval (-3.390, 3.390), where

$$\eta_{03} = \frac{\delta_0}{t_3^{1/2}\hat\sigma_3} = \frac{0.075}{(0.487)^{1/2}(0.0317)} = 3.390,$$

so we cannot conclude equivalence. We continue the study and take a fourth look after 850 subjects. Click on the fourth row in the table in the upper section. Then click the icon to invoke the Test Statistic Calculator. Suppose that $\hat\pi_{cj} = 91/450 = 0.2022$ and $\hat\pi_{tj} = 88/450 = 0.1956$. Then, from (25.2) and (25.4), the estimates of $\delta$ and the standard error of $\hat\delta$ are -0.007 and 0.027. Enter these values as shown below and click on the Recalc button. The following screen is shown. Click on the OK button and the following values are presented in the interim monitoring worksheet.

The fourth repeated confidence interval, (-3.302, 2.678), is entirely contained in the interval (-3.346, 3.346), where

$$\eta_{04} = \frac{\delta_0}{t_4^{1/2}\hat\sigma_4} = \frac{0.075}{(0.689)^{1/2}(0.027)} = 3.346,$$

and thus we conclude that the two treatments are equivalent.
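The decision rule applied at the four looks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the information fractions, standard errors, and repeated confidence limits are copied from the worked example (East computes the confidence limits from the stopping boundary, which is not reproduced here), and the function names `eta_margin` and `first_equivalence_look` are hypothetical, not part of East.

```python
import math

DELTA0 = 0.075  # equivalence margin delta_0 used in the example

# One tuple per look: (information fraction t_j, standard error of delta-hat,
# repeated 95% confidence interval for eta as reported by the worksheet).
LOOKS = [
    (0.162, 0.0555, (-12.628, 14.402)),
    (0.324, 0.0388, (-6.159, 7.064)),
    (0.487, 0.0317, (-2.965, 5.679)),
    (0.689, 0.0270, (-3.302, 2.678)),
]


def eta_margin(delta0, t, se):
    """Equivalence margin on the eta scale: delta0 / (sqrt(t) * se)."""
    return delta0 / (math.sqrt(t) * se)


def first_equivalence_look(delta0, looks):
    """Return the 1-based index of the first look whose repeated interval
    lies entirely inside (-eta0, eta0), or None if no look qualifies."""
    for j, (t, se, (lower, upper)) in enumerate(looks, start=1):
        eta0 = eta_margin(delta0, t, se)
        if -eta0 < lower and upper < eta0:
            return j
    return None


margins = [round(eta_margin(DELTA0, t, se), 3) for t, se, _ in LOOKS]
stop_look = first_equivalence_look(DELTA0, LOOKS)
print(margins)    # margins eta_0j for looks 1 to 4
print(stop_look)  # first look at which equivalence is concluded
```

With the rounded inputs shown, the margins agree with the values 3.357, 3.396, 3.390, and 3.346 quoted in the text, and the rule first succeeds at the fourth look.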
To express the results in terms of δ, the final confidence interval for η can be transformed to a confidence interval for δ by multiplying the confidence limits by

$$t_4^{1/2}\hat\sigma_4 = (0.689)^{1/2}(0.027) = 0.0224,$$

resulting in a confidence interval for δ of (-0.074, 0.060), which is entirely contained within the interval (-0.075, 0.075).

26 Binomial Superiority n-Sample

26.1 Chi-Square for Specified Proportions in C Categories

Let $\pi_{0i}$ and $\pi_{1i}$ for $i = 1, 2, \ldots, C$ denote the response proportions under the null and alternative hypotheses, respectively, where C denotes the number of categories. The null hypothesis states that the observed frequencies follow a multinomial distribution with the null proportions as probabilities. The test is performed only for a two-sided alternative. The sample size, or power, is determined for a specified value of the proportions that is consistent with the alternative hypothesis, denoted by $\pi_{1i}$.

Table 26.1: Contingency Table

| Categories \ Response | Cured | Not Cured |
| Age Group A | n11 | n21 |
| Age Group B | n12 | n22 |
| Age Group C | n13 | n23 |
| Marginal | n1. | n2. |

The null hypothesis is $H_0: \pi_i = \pi_{0i}$, $i = 1, 2, 3, \ldots, C$, and is tested against a two-sided alternative. The test statistic is given as

$$\chi^2 = \sum_{i} \frac{(n_{1i} - \mu_i)^2}{\mu_i} \tag{26.1}$$

where $\mu_i = n_{1\cdot}\,\pi_{0i}$.

Let $\chi^2_0$ be the observed value of $\chi^2$. For large samples, $\chi^2$ has approximately a Chi-squared distribution with $C - 1$ d.f. The p-value is approximated by $P(\chi^2_{C-1} \ge \chi^2_0)$, where $\chi^2_{C-1}$ denotes a Chi-squared random variable with d.f. $= C - 1$.

26.1.1 Trial Design

Consider the design of a single-arm trial with binary response: Cured and Not Cured. The responses for the Cured population in three categories are of interest: Age group A, Age group B and Age group C.
We wish to determine whether the proportions of cured in the three age groups are 0.25, 0.25, and 0.50, respectively. Thus it is desired to test H0: πA = 0.25, πB = 0.25, πC = 0.50. We wish to design the trial with a two-sided test that achieves 90% power at H1: πA = 0.3, πB = 0.4, πC = 0.3 at level of significance 0.05.

Start East. Click Design tab, then click Many Samples in the Discrete group, and then click Chi-Square Test of Specified Proportions in C Categories.

In the upper pane of this window is the Input dialog box, which displays default input values. Enter the Number of Categories (C) as 3. Under Table of Proportion of Response, enter the values of the proportions under Null Hypothesis and Alternative Hypothesis for each category except the last one, such that the sum of values in a row equals 1. Enter the inputs as shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed sample size highlighted in yellow. A total of 71 subjects must be enrolled in order to achieve 90% power under the alternative hypothesis. Besides sample size, one can also compute the power and the level of significance for this Chi-Square Test of Specified Proportions in C Categories study design.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to workbook Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs. Sample Size.
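The relationship between power and sample size for this design can be cross-checked with a short Python sketch. It assumes the usual large-sample approximation in which the statistic follows a noncentral Chi-squared distribution with C − 1 degrees of freedom and noncentrality N Σ (π1i − π0i)² / π0i; the manual does not state that East uses exactly this method, so treat it as an independent approximation. All function names are hypothetical, and only the Python standard library is used.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a central chi-square variable, using the
    series for the regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def chi2_critical(alpha, df):
    """Critical value c with P(X >= c) = alpha, found by bisection."""
    lo, hi = 0.0, 200.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def ncx2_sf(x, df, lam, terms=200):
    """Upper tail of a noncentral chi-square, written as a Poisson(lam/2)
    mixture of central chi-squares with df + 2k degrees of freedom."""
    mu = lam / 2.0
    weight = math.exp(-mu)
    total = 0.0
    for k in range(terms):
        total += weight * chi2_sf(x, df + 2 * k)
        weight *= mu / (k + 1)
    return total


def power(n, p0, p1, alpha=0.05):
    """Approximate power of the chi-square test with n subjects."""
    df = len(p0) - 1
    effect = sum((b - a) ** 2 / a for a, b in zip(p0, p1))
    return ncx2_sf(chi2_critical(alpha, df), df, n * effect)


def sample_size(p0, p1, alpha=0.05, target=0.90):
    """Smallest n whose approximate power reaches the target."""
    n = 1
    while power(n, p0, p1, alpha) < target:
        n += 1
    return n


P0 = [0.25, 0.25, 0.50]  # null proportions from the example
P1 = [0.30, 0.40, 0.30]  # alternative proportions from the example
print(sample_size(P0, P1))
```

Under this approximation the smallest sample size reaching 90% power is 71, in agreement with the value East reports for Des1; evaluating `power` over a range of n traces out a curve of the kind shown in the Power vs. Sample Size chart.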
The resulting power curve for this design is shown. You can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.

26.2 Two-Group Chi-square for Proportions in C Categories

Let $\pi_{1j}$ and $\pi_{2j}$ denote the response proportions of group 1 and group 2, respectively, for the j-th category, where $j = 1, 2, \ldots, C$. The null hypothesis $H_0: \pi_{1j} = \pi_{2j}\ \forall j = 1, 2, \ldots, C$ is tested against the alternative hypothesis that for at least one j, $\pi_{1j}$ differs from $\pi_{2j}$.

Table 26.2: Contingency Table

| Categories \ Groups | Group 1 | Group 2 | Marginal |
| A | n11 | n21 | n01 |
| B | n12 | n22 | n02 |
| C | n13 | n23 | n03 |
| Marginal | n10 | n20 | n |

The test statistic is given as

$$\chi^2 = \sum_{i}\sum_{j} \frac{(n_{ij} - \mu_{ij})^2}{\mu_{ij}} \tag{26.2}$$

where $\mu_{ij} = \dfrac{n_{0j}\,n_{i0}}{n}$, $j = 1, 2, \ldots, C$ and $i = 1, 2$.

Let $\chi^2_0$ be the observed value of $\chi^2$. For large samples, $\chi^2$ has approximately a Chi-squared distribution with $C - 1$ d.f. The p-value is approximated by $P(\chi^2_{C-1} \ge \chi^2_0)$, where $\chi^2_{C-1}$ denotes a Chi-squared random variable with d.f. $= C - 1$.

26.2.1 Trial Design

Suppose researchers want to investigate the relationship between different dose levels (level 1, level 2 and level 3) of a drug and the type of adverse events (serious or not serious). The proportions who were treated with different dose levels will be compared using a Chi-square test. Suppose the expected proportions of patients for the three dose levels are 0.30, 0.35 and 0.35 where patients had no serious adverse events, and 0.20, 0.30 and 0.50 where patients had serious adverse events. We wish to design the trial with a two-sided test that achieves 90% power at level of significance 0.05.

Start East.
Click Design tab, then click Many Samples in the Discrete group, and then click Two-Group Chi-square for Proportions in C Categories.

The Input dialog box, with default input values, will appear in the upper pane. Enter the Number of Categories (C) as 3. Under Table of Proportion of Response, enter the values of the proportions under Control and Treatment for each category except the last one, such that the sum of values in a row equals 1. Enter the inputs as shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed sample size highlighted in yellow. A total of 503 subjects must be enrolled in order to achieve 90% power under the alternative hypothesis. Besides sample size, one can also compute the power and the level of significance for this Two-Group Chi-square for Proportions in C Categories study design.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs. Sample Size. The resulting power curve for this design is shown. You can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.
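As an illustration of the statistic (26.2), the following Python sketch computes the Pearson Chi-square for a 2 × C table of counts and, for C = 3, the large-sample p-value. The counts are hypothetical (100 subjects per group, chosen to mirror the proportions in the example above), the function names are not part of East, and the p-value helper is deliberately restricted to 2 degrees of freedom, where the Chi-squared upper tail has the closed form exp(−x/2).

```python
import math


def two_group_chi_square(table):
    """Pearson chi-square for a 2 x C table of counts.

    `table` holds one row of category counts per group; the expected count
    in each cell is (group total * category total) / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df2(stat):
    """Upper-tail chi-square probability for exactly 2 degrees of freedom
    (that is, C = 3 categories): P(X >= x) = exp(-x / 2)."""
    return math.exp(-stat / 2.0)


# Hypothetical counts: group 1 then group 2, categories in the same order.
counts = [[30, 35, 35], [20, 30, 50]]
chi2 = two_group_chi_square(counts)
p = p_value_df2(chi2)
print(round(chi2, 4), round(p, 4))
```

For tables with a different number of categories the closed-form p-value no longer applies and a general Chi-squared tail function would be needed.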
26.3 Nonparametric: Wilcoxon Rank Sum for Ordered Categorical Data

When we compare two treatments with respect to signs and symptoms associated with a disease, we may base the comparison on a variable that assesses degree of response or the degree of severity, using an ordinal categorical variable. For example, investigators may report the severity of an adverse event, or other abnormality, using a specified grading system or using a simple scale, such as "none", "mild", "moderate", or "severe". The latter rating scale might be used in an analgesia study to report the severity of pain. Although this four-point scale is often used and intuitively appealing, additional categories, such as "very mild" and "very severe", may be added. In other situations, the efficacy of the treatment is best assessed by the subject reporting response to therapy using a similar scale. The Wilcoxon test for ordered categories is a nonparametric test for use in such situations. East provides the power for a specified sample size for a single-look design using the constant proportional odds ratio model.

Let $\pi_{cj}$ and $\pi_{tj}$ denote the probabilities for category j, $j = 1, 2, \ldots, J$, for the control c and the treatment t, respectively. Let $\gamma_{ci} = \sum_{j=1}^{i} \pi_{cj}$ and $\gamma_{ti} = \sum_{j=1}^{i} \pi_{tj}$. We assume that

$$\frac{\gamma_{ci}}{1 - \gamma_{ci}} = e^{\psi}\,\frac{\gamma_{ti}}{1 - \gamma_{ti}}, \quad i = 1, 2, \ldots, J - 1,$$

or, equivalently,

$$\psi = \ln(\gamma_{ci}) - \ln(1 - \gamma_{ci}) - \bigl(\ln(\gamma_{ti}) - \ln(1 - \gamma_{ti})\bigr). \tag{26.3}$$

We compare the two distributions by focusing on the parameter ψ. Thus we test the null hypothesis $H_0: \psi = 0$ against the two-sided alternative $H_1: \psi \neq 0$ or a one-sided alternative hypothesis $H_1: \psi > 0$. East requires the specified value of ψ to be positive. Technical details can be found in Rabbee et al., 2003.
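To make the proportional odds assumption concrete, the sketch below derives treatment category probabilities from a set of control probabilities and a shift ψ, following (26.3) as written: the cumulative log-odds of the treatment distribution equal those of the control distribution minus ψ. The control probabilities and the value ψ = 1.5 are illustrative inputs, the function name is hypothetical, and how East maps its "Pop 1" and "Pop 2" inputs onto control and treatment is an assumption not confirmed by the text.

```python
import math


def proportional_odds_shift(control_probs, psi):
    """Category probabilities for the treatment group implied by (26.3):
    logit(gamma_ti) = logit(gamma_ci) - psi for i = 1, ..., J - 1."""
    cumulative = 0.0
    gamma_t = []
    for p in control_probs[:-1]:
        cumulative += p
        odds_t = cumulative / (1.0 - cumulative) * math.exp(-psi)
        gamma_t.append(odds_t / (1.0 + odds_t))
    # Convert cumulative probabilities back to per-category probabilities.
    bounds = [0.0] + gamma_t + [1.0]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]


control = [0.55, 0.30, 0.10, 0.05]  # illustrative control distribution
treatment = proportional_odds_shift(control, 1.5)
print([round(p, 4) for p in treatment])
```

A positive ψ moves probability mass from the lower categories to the higher ones, which is why a single parameter suffices to describe the difference between the two distributions.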
26.3.1 Trial Design

We consider here a placebo-controlled parallel-group study where subjects report the response to treatment as "none", "slight", "considerable", or "total". We expect that most of the subjects in the placebo group will report no response.

Start East. Click Design tab, then click Many Samples in the Discrete group, and then click Non Parametric: Wilcoxon Rank Sum for Ordered Categorical Data.

The Input dialog box, with default input values, will appear in the upper pane. We want to determine the power, using a two-sided test with a type-1 error rate of 0.05, with a total of 100 subjects and equal sample sizes for the two groups. Enter Number of Categories as 4. We will use User Specified for Specify Pop 1 Probabilities and Proportional Odds Model for Pop 2 Probabilities here. Click the Proportional Odds Model radio button. A new field for Shift will appear. Enter 1.5 in this field. Based on the results of a pilot study, the values of 0.55, 0.3, 0.1, and 0.05 are used as Pop 1 probabilities. Enter the inputs as shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed power highlighted in yellow. This design results in a power of approximately 98% for a total sample size of 100 subjects.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to workbook Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs. Sample Size.
The resulting power curve for this design is shown. You can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.

With such high power, a total sample size of 100 subjects may be an inefficient use of resources. We are willing to use a smaller sample size to achieve a lower power. Change the maximum sample size to 50 in the previous design. Leave all other values as defaults, and click Compute. This design results in approximately 80% power using a total sample size of 50 subjects.

26.4 Trend in R Ordered Binomial Proportions

In some experimental situations, there are several binomial distributions indexed by an ordinal variable and we want to examine changes in the probabilities of success as the levels of the indexing variable change. Examples of this include the examination of a dose-related presence of a response or a particular side effect, dose-related tumorigenicity, or presence of fetal malformations relative to levels of maternal exposure to a particular toxin, such as alcohol, tobacco, or environmental factors.

The test for trend in R ordered proportions is based on the Cochran-Armitage trend test. Let $\pi_j$ denote the probability of interest for the j-th category of the ordinal variable, $j = 1, 2, \ldots, R$, and let the scores be denoted by $\omega_1, \omega_2, \ldots, \omega_R$.
It is assumed that the odds ratio relating the j-th category to the (j − 1)-th category satisfies

$$\frac{\pi_j}{1 - \pi_j} = \psi^{\,\omega_j - \omega_{j-1}}\,\frac{\pi_{j-1}}{1 - \pi_{j-1}} \tag{26.4}$$

or equivalently,

$$\ln\!\left(\frac{\pi_j}{1 - \pi_j}\right) = (\omega_j - \omega_{j-1})\ln(\psi) + \ln\!\left(\frac{\pi_{j-1}}{1 - \pi_{j-1}}\right). \tag{26.5}$$

This assumption can also be equivalently expressed as a relationship between the odds for the j-th category and those for the first category; namely,

$$\frac{\pi_j}{1 - \pi_j} = \psi^{\,\omega_j - \omega_1}\,\frac{\pi_1}{1 - \pi_1} \tag{26.6}$$

or equivalently,

$$\ln\!\left(\frac{\pi_j}{1 - \pi_j}\right) = (\omega_j - \omega_1)\ln(\psi) + \ln\!\left(\frac{\pi_1}{1 - \pi_1}\right). \tag{26.7}$$

It is assumed that $\pi_1 < \ldots < \pi_R$ with $\psi > 1$ or $\pi_1 > \ldots > \pi_R$ with $\psi < 1$.

We want to test the null hypothesis $H_0: \psi = 1$ against the two-sided alternative $H_1: \psi \neq 1$ or against a one-sided alternative $H_1: \psi > 1$ or $H_1: \psi < 1$.

The sample size required to achieve a specified power, or the power for a specified sample size, is determined for a single-look design with the specified parameters. The sample size calculation is conducted using the methodology presented below, which is similar to that described in Nam, 1987.

Let $n_j = r_j N$ denote the sample size for the j-th category, where $r_j$ is the j-th sample fraction and N is the total sample size.
The determination of the sample size required to control the power of the test of H0 is based on

$$W = \sum_{j=1}^{R} r_j(\omega_j - \bar\omega)\hat\pi_j \tag{26.8}$$

with $\bar\omega = \sum_{j=1}^{R} r_j\omega_j$.

The expected value of W is

$$E(W) = \sum_{j=1}^{R} r_j(\omega_j - \bar\omega)\pi_j \tag{26.9}$$

and the variance of W is

$$V(W) = \sum_{j=1}^{R} r_j(\omega_j - \bar\omega)^2\pi_j(1 - \pi_j). \tag{26.10}$$

The expected value of W under H0 is

$$E_0(W) = \pi\sum_{j=1}^{R} r_j(\omega_j - \bar\omega) \tag{26.11}$$

and the variance of W under H0 is

$$V_0(W) = \pi(1 - \pi)\sum_{j=1}^{R} r_j(\omega_j - \bar\omega)^2 \tag{26.12}$$

where

$$\pi = \sum_{j=1}^{R} r_j\pi_j. \tag{26.13}$$

The test statistic used to determine the sample size is

$$Z = \frac{W - E_0(W)}{V_0(W)^{1/2}}. \tag{26.14}$$

The total sample size required for a two-sided test with type-1 error rate of α to have power 1 − β when ψ = ψ1 is

$$N = \frac{\left[z_{\alpha/2}V_0(W)^{1/2} + z_{\beta}V(W)^{1/2}\right]^2}{E(W)^2}. \tag{26.15}$$

The total sample size required for a one-sided test with type-1 error rate of α to have power 1 − β when ψ = ψ1 is determined from (26.15) with α/2 replaced by α.

26.4.1 Trial Design

Consider the problem of comparing three durations of therapy for a specific disorder. We want to have sufficiently large power when 10% of subjects with shorter duration, 25% of subjects with intermediate duration and 50% of subjects with extensive duration will respond by the end of therapy. These parameters result in an odds ratio of ψ = 3 or, equivalently, ln(ψ) = 1.1. We would like to determine the sample size to achieve 90% power when ln(ψ) = 1.1 based on a two-sided test at significance level 0.05.

Start East. Click Design tab, then click Many Samples in the Discrete group, and then click Trend in R Ordered Binomial Proportions.

The Input dialog box, with default input values, will appear in the upper pane. Response probabilities can be specified in one of two ways, selected from Response Probabilities: (1) User Specified Probabilities or (2) Model Based Probabilities.
The user can specify probabilities for each population by choosing User Specified Probabilities, whereas Model Based Probabilities are based on the logit transformation. We will use Model Based Probabilities here. Under Response Probabilities, click the Model Based Probabilities radio button. A new field for log of Common odds Ratio will appear. Enter 1.1 in this field. Enter 0.1 in the Prop. of Response field. One can also specify the Scores (W(i)), in monotonically increasing order. We will use Equally Spaced here. Enter the inputs as shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed sample size highlighted in yellow. A total of 75 subjects must be enrolled in order to achieve 90% power under the alternative hypothesis. Besides sample size, one can also compute the power and the level of significance for this design.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs. Sample Size. The resulting power curve for this design is shown. You can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.
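The sample size of 75 can be reproduced directly from (26.6) and (26.8) to (26.15). The Python sketch below is a plain transcription of those formulas, assuming equal sample fractions when none are given and rounding the result up to the next integer; the function names are hypothetical and the code is not East's implementation.

```python
import math
from statistics import NormalDist


def model_based_probs(pi1, log_psi, scores):
    """Response probabilities from (26.6):
    odds_j = odds_1 * psi ** (w_j - w_1)."""
    odds1 = pi1 / (1.0 - pi1)
    probs = []
    for w in scores:
        odds = odds1 * math.exp(log_psi * (w - scores[0]))
        probs.append(odds / (1.0 + odds))
    return probs


def trend_sample_size(probs, scores, fractions=None, alpha=0.05, power=0.90):
    """Total sample size from (26.15) for a two-sided trend test."""
    k = len(probs)
    r = fractions if fractions is not None else [1.0 / k] * k
    w_bar = sum(rj * wj for rj, wj in zip(r, scores))
    pi_bar = sum(rj * pj for rj, pj in zip(r, probs))              # (26.13)
    e_w = sum(rj * (wj - w_bar) * pj
              for rj, wj, pj in zip(r, scores, probs))              # (26.9)
    v_w = sum(rj * (wj - w_bar) ** 2 * pj * (1.0 - pj)
              for rj, wj, pj in zip(r, scores, probs))              # (26.10)
    v0_w = pi_bar * (1.0 - pi_bar) * sum(
        rj * (wj - w_bar) ** 2 for rj, wj in zip(r, scores))        # (26.12)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    n = (z_alpha * math.sqrt(v0_w) + z_beta * math.sqrt(v_w)) ** 2 / e_w ** 2
    return math.ceil(n)


scores = [1, 2, 3]  # equally spaced scores
probs = model_based_probs(0.1, 1.1, scores)
n_total = trend_sample_size(probs, scores)
print([round(p, 3) for p in probs], n_total)
```

With π1 = 0.1 and ln(ψ) = 1.1 the model-based probabilities are close to 0.10, 0.25, and 0.50, and the formula gives about 74.5, which rounds up to the 75 subjects reported by East.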
The default specification of equally spaced scores is useful when the categories are ordinal, but not numerical. If the categories are numerical, such as doses of a therapy, then the numerical values will be more appropriate. Consider three doses of 10, 20, and 30. One must exercise care in the specification of log(ψ) when the differences between scores for adjacent categories are equal, but this common difference is not equal to one. Although the differences are equal, user defined scores must be used. If the common difference is equal to a positive value A, then setting log(ψ) to 1/A of the value used for the default of equally spaced scores, with a common difference of one, will provide identical results. With three doses (Scores W(i)) of 10, 20, and 30 and log of Common odds Ratio = 0.11, the results are the same as those shown above. This is shown in the following screenshot.

The design output will be displayed in the Output Preview, with the computed sample size highlighted in yellow. A total of 75 subjects must be enrolled in order to achieve 90% power under the alternative hypothesis when log(ψ) = 0.11 and π1 = 0.1.

Similarly, if the differences between scores for adjacent categories are not equal, user defined scores must be used. Consider three doses of 10, 20, and 50, with log of Common odds Ratio = 0.11. Change the scores (Scores W(i)) to 10, 20, and 50 in the previous design. This is shown in the following screenshot.

The design output will be displayed in the Output Preview, with the computed sample size highlighted in yellow. A total of 16 subjects must be enrolled in order to achieve 90% power under the alternative hypothesis when log(ψ) = 0.11 and π1 = 0.1.

Although a small sample size is usually desirable, here it may be due to a value of π3 (= 0.90) which may be too large to be meaningful.
Then the power should be controlled at a smaller value of log(ψ). Consider log(ψ) = 0.07. Change the log of Common odds Ratio value to 0.07. This is shown in the following screenshot.

The design output will be displayed in the Output Preview, with the computed sample size highlighted in yellow. A total of 37 subjects must be enrolled in order to achieve 90% power under the alternative hypothesis when log(ψ) = 0.07 and π1 = 0.1.

The trend test is particularly useful in situations where there are several categories. Consider now an example of a dose-ranging study to examine the safety of a therapy, with respect to the occurrence of a specified adverse event (AE), such as a dose-limiting toxicity (DLT). Six doses (1, 2, 4, 8, 12, 16) have been selected. It is expected that approximately 5% on the lowest dose will experience the AE. The study is to be designed to have power of 90% if approximately 20% on the highest dose experience the AE. This suggests that the study should be designed with log(ψ) approximately (log(0.20) − log(0.05))/15 = 0.092. Enter log of Common odds Ratio as 0.1, Prop. Of Response as 0.05 and Number of Populations as 6. Enter the Scores W(i) as 1, 2, 4, 8, 12, and 16. Leave all other values as defaults, and click Compute.

The design output will be displayed in the Output Preview, with the computed sample size highlighted in yellow. A total of 405 subjects must be enrolled in order to achieve 90% power under the alternative hypothesis when log(ψ) = 0.1 and π1 = 0.05. This sample size may not be economically feasible, so we instead select the sample size to achieve a power of 80%. Selecting Power(1-β) as 0.8 yields the result shown in the following screenshot.
This design requires a combined total of 298 subjects from all groups to attain 80% power when log(ψ) = 0.1 and π1 = 0.05.

26.5 Chi-Square for R Unordered Binomial Proportions

Let $\pi_{ij}$ denote the proportion of response in the i-th group and j-th category, with $i = 1, 2, \ldots, R$ and $j = 1, 2$, where R denotes the number of groups. The null hypothesis of equality of proportions in all groups for every category is tested against the alternative that at least one proportion differs across the groups for some category. The null hypothesis is defined as

$$H_0: \pi_{i1} = \pi_0 \quad \forall i.$$

The alternative is defined as

$$H_1: \pi_{i1} \neq \pi_0 \quad \text{for any } i = 1, 2, \ldots, R.$$

Table 26.3: R × 2 Contingency Table

| Rows | Col 1 | Col 2 | Row Total |
| Row 1 | n11 | n12 | m1 |
| Row 2 | n21 | n22 | m2 |
| · | · | · | · |
| · | · | · | · |
| Row R | nR1 | nR2 | mR |
| Col Total | n1 | n2 | N |

The test statistic is given as

$$\chi^2 = \sum_{i=1}^{R}\sum_{j=1}^{2} \frac{\left(n_{ij} - \frac{m_i n_j}{N}\right)^2}{\frac{m_i n_j}{N}}. \tag{26.16}$$

Let $\chi^2_0$ be the observed value of $\chi^2$. For large samples, $\chi^2$ has approximately a Chi-squared distribution with $R - 1$ d.f. The p-value is approximated by $P(\chi^2_{R-1} \ge \chi^2_0)$, where $\chi^2_{R-1}$ denotes a Chi-squared random variable with d.f. $= R - 1$.

26.5.1 Trial Design

Consider a 3-arm trial with treatments A, B and C. The response is the reduction in blood pressure (BP). From historical data it is known that the response rates of treatments A, B and C are 37.5%, 59% and 40%, respectively. That is, out of 40 individuals under treatment A, 15 had a reduction in BP; out of 68 individuals under treatment B, 40 had a reduction in BP; and out of 30 individuals under treatment C, 12 had a reduction in BP. Based on these data we can fill the entries in the table of proportions.
Table 26.4: Proportion of Response

| Groups \ Categories | Reduction in BP | No Reduction | Marginal |
| Treatment A | 0.375 | 0.625 | 1 |
| Treatment B | 0.59 | 0.41 | 1 |
| Treatment C | 0.4 | 0.6 | 1 |

This can be posed as a two-sided testing problem for testing $H_0: \pi_A = \pi_B = \pi_C$ ($= \pi_0$, say) against $H_1: \pi_i \neq \pi_0$ (for at least one $i = A, B, C$) at the 0.05 level. We wish to determine the sample size to have 90% power for the values displayed in the above table.

Start East. Click Design tab, then click Many Samples in the Discrete group, and then click Chi-Square Test for Unordered Binomial Proportions.

The Input dialog box, with default input values, will appear in the upper pane. Enter the values of Response Proportion in each group and Alloc.Ratio $r_i = n_i/n_1$, where the allocation ratio is the weight of the corresponding group relative to the first group. Enter the inputs as shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed sample size highlighted in yellow. A total of 301 subjects must be enrolled in order to achieve 90% power under the alternative hypothesis. Besides sample size, one can also compute the power and the level of significance for this Chi-Square test for R × 2 Table study design.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs Sample Size.
The resulting power curve for this design is shown. You can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.

26.6 Chi-Square for R Unordered Multinomial Proportions

Let $\pi_{ij}$ denote the response proportion in the i-th group and j-th category. The null hypothesis $H_0: \pi_{1j} = \pi_{2j} = \ldots = \pi_{Rj}\ \forall j = 1, 2, \ldots, C$ is tested against the alternative hypothesis that, for at least one category, the response proportions are not the same in all groups.

Table 26.5: Contingency Table

| Rows | Col 1 | Col 2 | · · | Col C | Row Total |
| Row 1 | n11 | n12 | · · | n1C | m1 |
| Row 2 | n21 | n22 | · · | n2C | m2 |
| · | · | · | · · | · | · |
| · | · | · | · · | · | · |
| Row R | nR1 | nR2 | · · | nRC | mR |
| Col Total | n1 | n2 | · · | nC | N |

The test statistic is given as

$$\chi^2 = \sum_{i=1}^{R}\sum_{j=1}^{C} \frac{\left(n_{ij} - \frac{m_i n_j}{N}\right)^2}{\frac{m_i n_j}{N}}. \tag{26.17}$$

Let $\chi^2_0$ be the observed value of $\chi^2$. For large samples, $\chi^2$ has approximately a Chi-squared distribution with $(R - 1)(C - 1)$ d.f. The p-value is approximated by $P(\chi^2_{(R-1)(C-1)} \ge \chi^2_0)$, where $\chi^2_{(R-1)(C-1)}$ denotes a Chi-squared random variable with d.f. $= (R - 1)(C - 1)$.

26.6.1 Trial Design

Consider a 3-arm oncology trial with treatments A, B and C. The responses in 4 categories, CR (complete response), PR (partial response), SD (stable disease) and PD (disease progression), are of interest. We wish to determine whether the response proportion in each of the 4 categories is the same for the three treatments. From historical data we get the following proportions for each category for the three treatments. Out of 100 patients, 30 were treated with treatment A, 35 were treated with treatment B and 35 were treated with treatment C. The response proportion information for each treatment is given below. Assuming equal allocation in each treatment arm, we wish to design a two-sided test which achieves 90% power at significance level 0.05.
Table 26.6: Contingency Table

Treatment \ Categories   CR      PR      SD      PD      Marginal
Treatment A              0.019   0.001   0.328   0.652   1
Treatment B              0.158   0.145   0.154   0.543   1
Treatment C              0.128   0.006   0.003   0.863   1

Start East. Click Design tab, then click Many Samples in the Discrete group, and then click Chi-Square R Unordered Multinomial Proportions.

The Input dialog box with default input values will appear in the upper pane of this window. Enter Number of Categories (C) as 4. Enter the values of Proportion of Response and ri = ni /n1 , where ri = ni /n1 is the weight of the i-th group relative to the first group. Enter the inputs as shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed sample size highlighted in yellow. A total of 69 subjects must be enrolled in order to achieve 90% power under the alternative hypothesis. Besides sample size, one can also compute the power and the level of significance for this Chi-Square Test of Comparing Proportions in R by C Table study design.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs Sample Size. The resulting power curve for this design is shown. You can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As....
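The test statistic (26.17) of Section 26.6 can be sketched in a few lines of Python. This is only an illustration of the formula, not East's implementation; the function name and the small table of counts are hypothetical and are not data from this manual.

```python
def chi_square_statistic(table):
    """Return (chi2, df) for an R x C table of observed counts n_ij."""
    row_totals = [sum(row) for row in table]          # m_i
    col_totals = [sum(col) for col in zip(*table)]    # n_j
    grand_total = sum(row_totals)                     # N
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)     # (R - 1)(C - 1)
    return chi2, df


# Hypothetical 2 x 3 table of counts, used only to exercise the formula.
example_table = [[10, 20, 30],
                 [20, 20, 20]]
chi2, df = chi_square_statistic(example_table)
print(round(chi2, 4), df)
```

The p-value would then be the upper tail probability of a Chi-squared distribution with df degrees of freedom evaluated at chi2.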
For now, you may close the chart before continuing.

27 Multiple Comparison Procedures for Discrete Data

Sometimes multiple treatment arms are compared with a placebo or control arm in a single trial on the basis of a primary endpoint that is binary. These objectives are formulated into a family of hypotheses. Formal statistical hypothesis tests can be performed to see if there is strong evidence to support clinical claims. Type I error is inflated when one considers the inferences together as a family. Failure to compensate for multiplicities can have adverse consequences. For example, a drug could be approved when actually it is not better than placebo. Multiple comparison (MC) procedures provide a guard against inflation of type I error due to multiple testing. The probability of making at least one type I error is known as the family wise error rate (FWER). East supports the following MC procedures based on a binary endpoint.

Procedure               Reference
Bonferroni              Bonferroni CE (1935, 1936)
Sidak                   Sidak Z (1967)
Weighted Bonferroni     Benjamini Y and Hochberg Y (1997)
Holm’s Step Down        Holm S (1979)
Hochberg’s Step Up      Hochberg Y (1988)
Hommel’s Step Up        Hommel G (1988)
Fixed Sequence          Westfall PH and Krishen A (2001)
Fallback                Wiens B, Dmitrienko A (2005)

In this chapter we explain how to design a study using an MC procedure. In East, one can calculate the power from the simulated data under different MC procedures. With the information on power, one can choose the MC procedure that provides maximum power while strongly controlling the FWER.

MC procedures included in East strongly control FWER. Strong control of FWER means that the probability of incorrectly rejecting at least one true null hypothesis is kept at or below the nominal level, regardless of which of the null hypotheses are true.
To contrast strong control with weak control of FWER, the latter controls the FWER only under the assumption that all null hypotheses are true.

27.1 Bonferroni Procedure

Bonferroni procedure is described below with an example. Assume that there are k arms including the control, where the treatment arms will be compared with placebo on the basis of a binary response variable X. Let ni be the number of subjects in the i-th arm (i = 0, 1, · · · , k − 1), where arm 0 refers to control, and let $N = \sum_{i=0}^{k-1} n_i$ be the total sample size. Also, let πi be the response probability in the i-th arm. We are interested in the following hypotheses:

For the right tailed test: Hi : πi − π0 ≤ 0 vs Ki : πi − π0 > 0
For the left tailed test: Hi : πi − π0 ≥ 0 vs Ki : πi − π0 < 0

The global null hypothesis is rejected if at least one of the Hi is rejected in favor of Ki after controlling for FWER. Here Hi and Ki refer to the null and alternative hypotheses, respectively, for comparison of the i-th arm with the control arm.

Let π̂i be the sample proportion for treatment arm i and π̂0 be the sample proportion for the control arm. For the unpooled variance case, the test statistic to compare the i-th arm with control (i.e., Hi vs Ki ) is defined as

$$T_i = \frac{\hat{\pi}_i - \hat{\pi}_0}{\sqrt{\frac{1}{n_i}\hat{\pi}_i(1 - \hat{\pi}_i) + \frac{1}{n_0}\hat{\pi}_0(1 - \hat{\pi}_0)}} \qquad (i = 1, 2, \cdots, k-1) \qquad (27.1)$$

For the pooled variance case, one needs to replace π̂i and π̂0 in the denominator by the pooled sample proportion π̂. The pooled sample proportion π̂ is defined as

$$\hat{\pi} = \frac{n_i \hat{\pi}_i + n_0 \hat{\pi}_0}{n_i + n_0} \qquad (i = 1, 2, \cdots, k-1) \qquad (27.2)$$

Let ti be the observed value of Ti . These observed values for the k − 1 treatment arms can be ordered as t(1) ≥ t(2) ≥ · · · ≥ t(k−1) .
For the right tailed test, the marginal p-value for comparing the i-th arm with placebo is calculated as pi = P (Z > ti ) = Φ(−ti ), and for the left tailed test as pi = P (Z < ti ) = Φ(ti ), where Z is distributed as standard normal and Φ(·) is the cumulative distribution function of a standard normal variable. Let p(1) ≤ p(2) ≤ · · · ≤ p(k−1) be the ordered p-values.

East supports three single step MC procedures for comparing proportions: the Bonferroni procedure, the Sidak procedure and the weighted Bonferroni procedure. For the Bonferroni procedure, Hi is rejected if $p_i < \frac{\alpha}{k-1}$ and the adjusted p-value is given as $\min(1, (k-1)p_i)$.

27.1.1 Example: HIV Study

This is a randomized, double-blind, parallel-group, placebo-controlled, multi-center study to assess the efficacy and safety of 125 mg, 250 mg, and 500 mg orally twice daily of a new drug for the treatment of HIV associated diarrhea. The primary efficacy endpoint is clinical response, defined as two or fewer watery bowel movements per week, during at least two of the four weeks of the 4-week efficacy assessment period. The efficacy will be evaluated by comparing the proportion of responders in the placebo group to the proportion of responders in the three treatment groups at a one-sided alpha of 0.025. The estimated response rate in the placebo group is 35%. The response rates in the treatment groups are expected to be 40% for 125 mg, 45% for 250 mg and 55% for 500 mg.

Dose (mg)   Estimated proportion
Placebo     0.35
125         0.40
250         0.45
500         0.55

With the above underlying scenario, we would like to calculate the power for a total sample size of 500. This will be a balanced study with a one-sided 0.025 significance level to detect at least one dose with a significant difference from placebo. We will show how to simulate the power of such a study using the multiple comparison procedures listed above.
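To make the mechanics of such a simulation concrete, the following is a minimal Python sketch of the Bonferroni procedure applied to the HIV scenario (125 subjects per arm, right-tailed tests, unpooled variance as in (27.1)). It is not East's simulation engine; the function names, the seed, the smaller number of simulations and the handling of a zero variance are all assumptions made for illustration.

```python
import random
from statistics import NormalDist


def unpooled_z(x_trt, n_trt, x_ctl, n_ctl):
    """Statistic (27.1): difference in proportions over its unpooled standard error."""
    p_trt, p_ctl = x_trt / n_trt, x_ctl / n_ctl
    var = p_trt * (1 - p_trt) / n_trt + p_ctl * (1 - p_ctl) / n_ctl
    if var == 0:
        return 0.0  # degenerate sample; treated as "no evidence" (an assumption)
    return (p_trt - p_ctl) / var ** 0.5


def simulate_bonferroni(rates, n_per_arm, alpha, n_sims, seed):
    """rates[0] is the control arm; returns simulated power summaries."""
    rng = random.Random(seed)
    norm = NormalDist()
    n_tests = len(rates) - 1
    any_rejected = 0
    all_rejected = 0
    marginal = [0] * n_tests
    for _ in range(n_sims):
        # Number of responders in each arm, drawn as a sum of Bernoulli trials.
        counts = [sum(rng.random() < p for _ in range(n_per_arm)) for p in rates]
        rejected = []
        for i in range(1, len(rates)):
            t = unpooled_z(counts[i], n_per_arm, counts[0], n_per_arm)
            p_value = norm.cdf(-t)                      # right tail: P(Z > t)
            rejected.append(p_value < alpha / n_tests)  # Bonferroni rule
        any_rejected += any(rejected)
        all_rejected += all(rejected)
        for i, flag in enumerate(rejected):
            marginal[i] += flag
    return {
        "global": any_rejected / n_sims,
        "conjunctive": all_rejected / n_sims,
        "marginal": [m / n_sims for m in marginal],
    }


result = simulate_bonferroni([0.35, 0.40, 0.45, 0.55], n_per_arm=125,
                             alpha=0.025, n_sims=2000, seed=1)
print(result)
```

Because every hypothesis is false in this scenario, the "global" entry also plays the role of the disjunctive power. The simulated values depend on the seed and on the number of simulations.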
Designing the Study

Start East. Click Design tab, then click Many Samples in the Discrete group, and then click Single Look under Multiple Pairwise Comparisons to Control - Differences of Proportions.

This will launch a new window which asks the user to specify the values of a few design parameters including the number of arms, overall type I error, total sample size and multiple comparison procedure. For our example, we have 3 treatment groups plus a placebo. So enter 4 for Number of Arms. Under the Test Parameters tab, there are several fields which we will fill in. First, there is a box with the label Test Type. Here you need to specify whether you want a one-sided or two-sided test. Currently, only one-sided tests are available. The next dropdown box has the label Rejection Region. If Left Tail is selected, the critical value for the test is located in the left tail of the distribution of the test statistic. Likewise, if Right Tail is selected, the critical value for the test is located in the right tail of the distribution of the test statistic. For our example, we will select Right Tail. Under that, there is a box with the label Type 1 Error (α). This is where you need to specify the FWER. For our example, enter 0.025. Now go to the box with the label Sample Size (n). Here we input the total number of subjects, including those in the placebo arm. For this example, enter 500.

To the right, there will be a heading with the title Multiple Comparison Procedures. Check the box next to Bonferroni, as this is the multiple comparison procedure we are illustrating in this subsection. After entering these parameters your screen should now look like this:

Now click on Response Generation tab. You will see a table titled Table of Proportions. In this table we can specify the labels for treatment arms.
Also, you have to specify the dose level if you want to generate proportions through a dose-response curve. There are two fields in this tab above the table. The first one is labeled Variance and has a drop-down list with two options, Pooled and Unpooled. Here you have to select whether you are considering pooled variance or unpooled variance for the calculation of the test statistic for each test. For this example, select Unpooled for Variance.

Next to Variance there is a check box labeled Generate Proportions Through DR Curve. If you want to generate the response rate for each arm according to a dose-response curve, you need to check this box. Check the box Generate Proportions Through DR Curve. Once you check this box you will notice two things. First, an additional column with label Dose will appear in the table. Here you need to enter the dose levels for each arm. For this example, enter 0, 125, 250 and 500 for Placebo, Dose1, Dose2 and Dose3 arms, respectively. Secondly, you will notice that an additional section appears to the right, which provides the option to generate the response rate from four families of parametric curves: Four Parameter Logistic, Emax, Linear and Quadratic. The technical details about each curve can be found in Appendix H. Here you need to choose the appropriate parametric curve from the drop-down list under Dose Response Curve and then specify the parameters associated with that curve.

Suppose the response rate follows the four parameter logistic curve

$$E(\pi \mid D) = \beta + \frac{\delta}{1 + \exp\left(\frac{\theta - D}{\tau}\right)} \qquad (27.3)$$

where D indicates dose. The parameters for the logistic dose-response curve should be chosen with care. We want to parameterize the above logistic model such that the proportions from the logistic model agree as closely as possible with the estimated proportions stated at the beginning of the example.
We will consider a situation where the response rate at dose 0 is very close to the parameter β. In other words, β indicates the placebo effect. For this to hold, $\frac{\delta}{1 + \exp\left(\frac{\theta - D}{\tau}\right)}$ should be very close to 0 at D = 0. For now, assume that it holds; we will return to this later. We have assumed a 35% response rate in the placebo arm. Therefore, we specify β as 0.35.

The parameter β + δ indicates the maximum response rate. Since the response rate cannot exceed 1, δ should be chosen in such a way that β + δ ≤ 1. In situations where a 100% response rate can never be achieved, δ would be even smaller. For this example, the response rate for the highest dose of 500 mg is 55%. Therefore, we assume that the maximum response rate achievable with the new drug is only 60%, and we specify δ as 0.60 − 0.35 or 0.25.

The parameter θ indicates the median dose, that is, the dose that produces 50% of the maximum improvement in response rate or a response equal to $\beta + \frac{\delta}{2}$. With β = 0.35 and δ = 0.25, $\beta + \frac{\delta}{2}$ is 0.475. Note that we have assumed the dose 250 mg can provide a response rate of 45%. Therefore, we assume θ as 300.

τ needs to be selected in such a way that $\frac{\delta}{1 + \exp\left(\frac{\theta - D}{\tau}\right)}$ is very close to 0 at D = 0. We can ensure this condition by choosing any small value of τ. However, a very small τ indicates a sharp improvement in response rate around the median dose and negligible improvement at almost all other doses. In the HIV example, the estimated response rates indicate improvement at all the dose levels. With τ as 75, $\frac{\delta}{1 + \exp\left(\frac{\theta - D}{\tau}\right)}$ is 0.0045 at D = 0, and the proportions from the logistic model are close to the estimated proportions for the chosen doses. Therefore, β = 0.35, δ = 0.25, θ = 300 and τ = 75 seem to be reasonable for our example.

Select Four Parameter Logistic from the drop-down list of Dose Response Curve.
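Curve (27.3) is easy to evaluate by hand or with a short script. The sketch below is only an illustration (the function name is hypothetical); it evaluates the curve at the four study doses with β = 0.35, δ = 0.25 and τ = 75, once with θ = 300 as in the derivation above and once with θ = 250, the value entered in the Parameters box in this example, so that the two sets of response rates can be compared.

```python
import math


def four_parameter_logistic(dose, beta, delta, theta, tau):
    """Expected response rate at a given dose under curve (27.3)."""
    return beta + delta / (1.0 + math.exp((theta - dose) / tau))


doses = [0, 125, 250, 500]
curves = {}
for theta in (300, 250):
    curves[theta] = [
        round(four_parameter_logistic(d, beta=0.35, delta=0.25, theta=theta, tau=75), 3)
        for d in doses
    ]
    print("theta =", theta, "response rates:", curves[theta])

# Size of the dose-dependent term at D = 0 for theta = 300.
term_at_zero = four_parameter_logistic(0, 0.35, 0.25, 300, 75) - 0.35
print("term at D = 0:", round(term_at_zero, 4))
```

With θ = 250 the rounded values agree with the Response Rate column reported in this example (0.359, 0.39, 0.475 and 0.591), and with θ = 300 the term at D = 0 is about 0.0045.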
To the right of this dropdown box, we need to specify the 4 parameter values in the Parameters box. Enter 0.35 for β, 0.25 for δ, 250 for θ and 75 for τ. You can verify that the values in the Response Rate column change to 0.359, 0.39, 0.475 and 0.591 for the four arms, respectively. These proportions are very close to the estimated proportions stated at the beginning of the example. Now click Plot DR Curve located below the parameters to see the dose-response curve.

You will see the logistic dose response curve that intersects the Y-axis at 0.359. Close this plot. The response rates from the logistic model are close but not exactly equal to the estimated proportions stated at the beginning of the example. Therefore, we will specify the estimated response rates directly in the Table of Proportions. In order to do this, first uncheck Generate Proportions Through DR Curve. You will notice two things. First, the column with label Dose will disappear from the table. Second, the section on the right will disappear as well. Now enter the estimated proportions in the Response Rate column. Enter 0.35, 0.40, 0.45 and 0.55 in this column. Now the Response Generation tab should appear as below.

Click on the Include Options button located in the upper-right corner of the Simulation window and check Randomized. This will add the Randomization tab. Now click on the Randomization tab. The second column of the Table of Allocation displays the allocation ratio of each treatment arm to that of the control arm. The cell for the control arm is always one and is not editable. Only the cells for treatment arms other than control need to be filled in. The default value for each treatment arm is one, which represents a balanced design. For the HIV study example, we consider a balanced design and leave the default values for the allocation ratios unchanged.
Your screen should now look like this:

The last tab is Simulation Control. Specify 10000 as Number of Simulations and 1000 as Refresh Frequency in this tab. The box labeled Random Number Seed is where you can set the seed for the random number generator. You can either use the clock as the seed or choose a fixed seed (in order to replicate past simulations). The default is the clock and we will use that. The box beside that is labeled Output Options. This is where you can choose to save summary statistics for each simulation run and/or to save the subject level data for a specific number of simulation runs. To save the output for each simulation, check the box with label Save summary statistics for every simulation run.

Click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim1.

Select Sim1 in the Output Preview and click the icon. Now double-click on Sim1 in the Library. The simulation output details will be displayed in the right pane.

The first section in the output is the Hypothesis section. In our situation, we are testing 3 hypotheses. We are comparing the estimated response rate of each dose group with that of placebo. That is, we are testing the 3 hypotheses:

H1 : π1 = π0 vs K1 : π1 > π0
H2 : π2 = π0 vs K2 : π2 > π0
H3 : π3 = π0 vs K3 : π3 > π0

Here, π0 , π1 , π2 and π3 represent the population response rate for the placebo, 125 mg, 250 mg and 500 mg dose groups, respectively. Also, Hi and Ki are the null and alternative hypotheses, respectively, for the i-th test.

The Input Parameters section provides the design parameters that we specified earlier.
The next section, Overall Power, gives us the estimated power based on the simulation. The second line gives us the global power, which is 0.807. Global power indicates the power to reject the global null H0 : π1 = π2 = π3 = π0 . Thus, the global power of 0.807 indicates that the global null will be rejected 80.7% of the time. In other words, at least one of H1 , H2 and H3 is rejected on 80.7% of occasions. Global power is useful to show the existence of a dose-response relationship, and the dose-response may be claimed if any of the doses in the study is significantly different from placebo.

The next line displays the conjunctive power. Conjunctive power indicates the proportion of cases in the simulation where all the Hi ’s which are truly false were rejected. In this example, all the Hi ’s are false. Therefore, for this example, conjunctive power is the proportion of cases where all of H1 , H2 and H3 were rejected. For this simulation the conjunctive power is only 0.035, which means that all of H1 , H2 and H3 were rejected only 3.5% of the time.

Disjunctive power indicates the proportion of cases in which at least one of the truly false Hi ’s is rejected. The main distinction between global and disjunctive power is that the former counts any rejection whereas the latter counts rejections only among those Hi ’s which are false. Since here all of H1 , H2 and H3 are false, global and disjunctive power ought to be the same.

The next section gives us the marginal power for each hypothesis. Marginal power is the proportion of times a particular hypothesis is rejected. Based on the simulation results, H1 is rejected about 6% of the time, H2 is rejected about 22% of the time and H3 is rejected about 80% of the time.

Recall that we have asked East to save the simulation results for each simulation run.
Open this file by clicking on SummaryStat in the library and you will see that it contains 10,000 rows, where each row represents the results for a single simulation. Find the 3 columns with labels Rej Flag 1, Rej Flag 2 and Rej Flag 3, respectively. These columns represent the rejection status for H1 , H2 and H3 , respectively. A value of 1 indicates rejection in that particular simulation; otherwise the null is not rejected. The proportion of 1’s in Rej Flag 1 indicates the marginal power to reject H1 . Similarly, we can find the marginal power for H2 and H3 from Rej Flag 2 and Rej Flag 3, respectively. To obtain the global and disjunctive power, count the total number of cases where at least one of H1 , H2 and H3 has been rejected and then divide by the total number of simulations, 10,000. Similarly, to obtain the conjunctive power, count the total number of cases where all of H1 , H2 and H3 have been rejected and then divide by the total number of simulations, 10,000.

Next we will consider an example to show how global and disjunctive power differ from each other. Select Sim 1 in Library and click the icon. Now go to the Response Generation tab and enter 0.35, 0.35, 0.38 and 0.42 in the 4 cells in the second column, labeled Response Rate.

Here we are generating the response for placebo from distribution Bin(125, 0.35), for Dose1 from distribution Bin(125, 0.35), for Dose2 from distribution Bin(125, 0.38) and for Dose3 from distribution Bin(125, 0.42). Click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 2.

For Sim 2, the global power and disjunctive power are close to 12%. To understand why, click on SummaryStat in the library for Sim 2.
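The counting rules described above for the rejection flags can be sketched as follows. This is a minimal illustration, assuming the flags have been read into a list of rows (one row per simulation, one 0/1 entry per hypothesis) and that at least one hypothesis is truly false; the function name and the five example rows are hypothetical and are not East output.

```python
def power_summary(rejection_flags, truly_false):
    """Summarize powers from per-simulation rejection flags.

    rejection_flags: list of rows, each a list of 0/1 flags (one per hypothesis).
    truly_false: list of booleans, True where the null hypothesis is actually false.
    """
    n_sims = len(rejection_flags)
    false_idx = [i for i, is_false in enumerate(truly_false) if is_false]
    marginal = [
        sum(row[i] for row in rejection_flags) / n_sims
        for i in range(len(truly_false))
    ]
    # Global power: any rejection at all.
    global_power = sum(any(row) for row in rejection_flags) / n_sims
    # Disjunctive power: at least one rejection among the truly false hypotheses.
    disjunctive = sum(
        any(row[i] for i in false_idx) for row in rejection_flags
    ) / n_sims
    # Conjunctive power: all truly false hypotheses rejected.
    conjunctive = sum(
        all(row[i] for i in false_idx) for row in rejection_flags
    ) / n_sims
    return {
        "marginal": marginal,
        "global": global_power,
        "disjunctive": disjunctive,
        "conjunctive": conjunctive,
    }


# Hypothetical flags for 5 simulations of 3 hypotheses; H1 is taken to be true.
flags = [[0, 0, 1], [1, 0, 0], [0, 1, 1], [0, 0, 0], [1, 1, 1]]
summary = power_summary(flags, truly_false=[False, True, True])
print(summary)
```

In the second row only the true hypothesis H1 is rejected, so that simulation counts toward the global power but not toward the disjunctive power. This is the source of the gap between the two measures.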
The total number of cases where at least one of H1 , H2 and H3 is rejected is about 1270, and dividing this by the total number of simulations, 10,000, gives the global power of 12.7%. In this scenario H1 is true, because Dose1 has the same response rate as placebo, so the disjunctive power counts rejections of H2 and H3 only. The total number of cases where at least one of H2 and H3 is rejected is close to 1230, and dividing this by the total number of simulations, 10,000, gives the disjunctive power of 12.3%. The exact result of the simulations may differ slightly, depending on the seed.

Now, delete Sim 2 from the Output Preview, because we modified the design in the HIV example only to explain the difference between global power and disjunctive power. In order to do this, select the row corresponding to Sim 2 in the Output Preview and click the icon in the toolbar.

27.2 Weighted Bonferroni procedure

In this section we will cover the weighted Bonferroni procedure with the same HIV example.

For the weighted Bonferroni procedure, Hi is rejected if $p_i < w_i \alpha$ and the adjusted p-value is given as $\min\left(1, \frac{p_i}{w_i}\right)$. Here wi denotes the proportion of α allocated to Hi such that $\sum_{i=1}^{k-1} w_i = 1$. Note that, if $w_i = \frac{1}{k-1}$, then the weighted Bonferroni procedure reduces to the regular Bonferroni procedure.

Since the other design specifications remain the same except that we are using the weighted Bonferroni procedure in place of the Bonferroni procedure, we can design the simulation in this section with little effort. Select Sim 1 in Library and click the icon. Now go to the Test Parameters tab. In the Multiple Comparison Procedures box, uncheck the Bonferroni box and check the Weighted Bonferroni box.

Next click on the Response Generation tab and look at the Table of Proportions. You will see that an additional column with label Proportion of Alpha has been added. Here you have to specify the proportion of total alpha you want to spend on each test.
Ideally, the values in this column should add up to 1; if not, East will normalize them so that they add up to 1. By default, East distributes the total alpha equally among all tests. Here we have 3 tests in total, therefore each test has a proportion of alpha of 1/3 or 0.333. You can specify other proportions as well. For this example, keep the equal proportion of alpha for each test. Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 2.

The weighted Bonferroni MC procedure has global and disjunctive power of 81% and conjunctive power of 3.4%. Note that the powers of the weighted Bonferroni procedure are quite close to those of the Bonferroni procedure. This is because the weighted Bonferroni procedure with equal proportions is equivalent to the simple Bonferroni procedure. The difference in power between the Bonferroni test in the previous section and the weighted Bonferroni test in this section is attributable to simulation error. The exact result of the simulations may differ slightly, depending on the seed. Now select Sim2 in the Output Preview and click the icon. This will save Sim2 in Wbk1 in the Library.

27.3 Sidak procedures

Sidak procedures are described below using the same HIV example from section 27.1. For the Sidak procedure, Hi is rejected if $p_i < 1 - (1 - \alpha)^{\frac{1}{k-1}}$ and the adjusted p-value is given as $1 - (1 - p_i)^{k-1}$.

Select Sim1 in Library and click the icon. Now go to the Test Parameters tab. In the Multiple Comparison Procedures box, uncheck the Bonferroni box and check the Sidak box.

Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim3.
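The adjusted p-values of the three single step procedures (Bonferroni, weighted Bonferroni and Sidak) can be compared with a short sketch. The function names and the three raw p-values below are hypothetical. The weight normalization mirrors the behaviour described for the Proportion of Alpha column, and treating a zero weight as "never rejected" is an assumption of this sketch.

```python
def bonferroni_adjusted(p_values):
    """Adjusted p-values min(1, m * p_i), where m = k - 1 is the number of tests."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def weighted_bonferroni_adjusted(p_values, weights):
    """Adjusted p-values min(1, p_i / w_i) after normalizing the weights to sum to 1."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    return [
        min(1.0, p / w) if w > 0 else 1.0   # zero weight: never rejected (assumption)
        for p, w in zip(p_values, normalized)
    ]


def sidak_adjusted(p_values):
    """Adjusted p-values 1 - (1 - p_i)^m, where m = k - 1 is the number of tests."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


raw = [0.20, 0.03, 0.004]          # hypothetical raw p-values for H1, H2, H3
bonf = bonferroni_adjusted(raw)
wbonf = weighted_bonferroni_adjusted(raw, weights=[1, 1, 1])
sidak = sidak_adjusted(raw)
print(bonf, wbonf, sidak)
```

A hypothesis is rejected when its adjusted p-value falls below α. With equal weights the weighted Bonferroni values coincide with the Bonferroni values, and the Sidak values are never larger than the Bonferroni values.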
The Sidak procedure has disjunctive and global powers of 81% and a conjunctive power of 3.8%. The exact result of the simulations may differ slightly, depending on the seed. Now select Sim 3 in the Output Preview using the Ctrl key and click the icon. This will save Sim 3 in Wbk1 in Library.

27.4 Holm’s step-down procedure

In the single step MC procedures, the decision to reject any hypothesis does not depend on the decision to reject other hypotheses. On the other hand, in the stepwise procedures the decision on one hypothesis test can influence the decisions on the other tests of hypotheses. There are two types of stepwise procedures. One type proceeds in a data-driven order. The other type proceeds in a fixed order set a priori. Stepwise tests in a data-driven order can proceed in a step-down or step-up manner. East supports the Holm step down MC procedure, which starts with the most significant comparison and continues as long as tests are significant, until the test for a certain hypothesis fails. The testing procedure stops the first time a non-significant comparison occurs, and all remaining hypotheses are retained. In the i-th step, H(i) is rejected if $p_{(i)} \le \frac{\alpha}{k-i}$, and the procedure goes to the next step.

Holm’s step down

As before, we will use the same HIV example to illustrate Holm’s step down procedure. Select Sim1 in Library and click the icon. Now go to the Test Parameters tab. In the Multiple Comparison Procedures box, uncheck the Bonferroni box and check the Holm’s Step down box.

Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim4.

Holm’s step down procedure has global and disjunctive power close to 81% and conjunctive power close to 9%. The exact result of the simulations may differ slightly, depending on the seed. Now select Sim4 in the Output Preview and click the icon.
This will save Sim4 in Wbk1 in Library.

27.5 Hochberg and Hommel procedures

Step-up tests start with the least significant comparison and continue as long as tests are not significant. The first time a significant comparison occurs, that hypothesis and all remaining hypotheses are rejected. East supports two such MC procedures, the Hochberg step-up and Hommel step-up procedures. In the Hochberg step-up procedure, in the i-th step H(k−i) is retained if $p_{(k-i)} > \frac{\alpha}{i}$. In the Hommel step-up procedure, in the i-th step H(k−i) is retained if $p_{(k-j)} > \frac{i-j+1}{i}\alpha$ for j = 1, · · · , i. Fixed sequence test and fallback test are the types of tests which proceed in a prespecified order.

Hochberg’s and Hommel’s step up procedures are described below using the same HIV example from section 27.1 on the Bonferroni procedure.

Since the other design specifications remain the same except that we are using Hochberg’s and Hommel’s step up procedures in place of the Bonferroni procedure, we can design the simulation in this section with little effort. Select Sim1 in Library and click the icon. Now go to the Test Parameters tab. In the Multiple Comparison Procedures box, uncheck the Bonferroni box and check the Hochberg’s step up and Hommel’s step up boxes.

Now click Simulate to obtain power. Once the simulation run has completed, East will add two additional rows to the Output Preview labeled as Sim 5 and Sim 6.

The Hochberg and Hommel procedures have disjunctive and global powers of 81.2% and 81.4%, respectively, and conjunctive powers close to 10%. The exact result of the simulations may differ slightly, depending on the seed. Now select Sim5 and Sim6 in the Output Preview using the Ctrl key and click the icon. This will save Sim5 and Sim6 in Wbk1 in Library.
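The data-driven stepwise rules of Sections 27.4 and 27.5 can be sketched in Python as follows. This covers Holm's step-down and Hochberg's step-up rules only (Hommel's procedure is omitted for brevity). The function names and example p-values are hypothetical, m stands for the number of hypotheses k − 1, and the code is an illustration of the decision rules rather than East's implementation.

```python
def holm_rejections(p_values, alpha):
    """Holm step-down: test from the smallest p-value; stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # most significant first
    rejected = [False] * m
    for step, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha / (m - step + 1):
            rejected[idx] = True
        else:
            break            # this and all remaining hypotheses are retained
    return rejected


def hochberg_rejections(p_values, alpha):
    """Hochberg step-up: test from the largest p-value; stop at the first success."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)  # least significant first
    rejected = [False] * m
    for step, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha / step:
            # Reject this hypothesis and every hypothesis with a smaller p-value.
            for j in order[step - 1:]:
                rejected[j] = True
            break
    return rejected


raw = [0.020, 0.015, 0.005]        # hypothetical raw p-values for H1, H2, H3
holm = holm_rejections(raw, alpha=0.025)
hochberg = hochberg_rejections(raw, alpha=0.025)
print(holm, hochberg)
```

In this example Holm's procedure rejects only H3, because the second smallest p-value 0.015 exceeds 0.025/2, whereas Hochberg's procedure rejects all three hypotheses, because the largest p-value 0.020 is already below 0.025.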
27.6 Fixed-sequence testing procedure

In data-driven stepwise procedures, we don’t have any control over the order of the hypotheses to be tested. However, sometimes based on our preference or prior knowledge we might want to fix the order of tests a priori. Fixed sequence test and fallback test are the types of tests which proceed in a pre-specified order. East supports both of these procedures.

Assume that H1 , H2 , · · · , Hk−1 are ordered hypotheses and the order is prespecified so that H1 is tested first, followed by H2 and so on. Let p1 , p2 , · · · , pk−1 be the associated raw marginal p-values. In the fixed sequence testing procedure, for i = 1, · · · , k − 1, in the i-th step, if pi < α, reject Hi and go to the next step; otherwise retain Hi , · · · , Hk−1 and stop.

The fixed sequence testing strategy is optimal when early tests in the sequence have the largest treatment effects and performs poorly when early hypotheses have small treatment effects or are nearly true (Westfall and Krishen 2001). The drawback of the fixed sequence test is that once a hypothesis is not rejected no further testing is permitted. This will lead to lower power to reject hypotheses tested later in the sequence.

As before, we will use the same HIV example to illustrate the fixed sequence testing procedure. Select Sim 1 in Library and click the icon. Now go to the Test Parameters tab. In the Multiple Comparison Procedures box, uncheck the Bonferroni box and check the Fixed Sequence box.

Next click on the Response Generation tab and look at the Table of Proportions. You will see that an additional column with label Test Sequence has been added. Here you have to specify the order in which the hypotheses will be tested. Specify 1 for the test that will be tested first, 2 for the test that will be tested next and so on. By default East specifies 1 for the first test, 2 for the second test and so on.
For now we will keep the default, which means that H1 will be tested first, followed by H2 , and finally H3 will be tested.

Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim7.

The fixed sequence procedure with the specified sequence has global and disjunctive power close to 13% and conjunctive power close to 10%. The global and disjunctive power are small because the smallest treatment effect is tested first and the magnitude of the treatment effect increases gradually for the remaining tests. For optimal power in the fixed sequence procedure, the early tests in the sequence should have larger treatment effects. In our case, Dose3 has the largest treatment effect, followed by Dose2 and Dose1. Therefore, to obtain optimal power, H3 should be tested first, followed by H2 and H1 .

Select Sim7 in the Output Preview and click the icon. Now, select Sim7 in Library, click the icon and go to the Response Generation tab. In the Test Sequence column in the table, specify 3 for Dose1, 2 for Dose2 and 1 for Dose3.

Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim8.

Now the fixed sequence procedure with the pre-specified sequence (H3 , H2 , H1 ) has global and disjunctive power close to 89% and conjunctive power of 9.7%. This example illustrates that the fixed sequence procedure is powerful provided the hypotheses are tested in a sequence of descending treatment effects. The fixed sequence procedure controls the FWER because, for each hypothesis, testing is conditional upon rejecting all hypotheses earlier in the sequence. The exact result of the simulations may differ slightly, depending on the seed.
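The fixed sequence rule, and the effect of the testing order, can be sketched as follows. The function name, the argument layout (a 1-based test position for each hypothesis, in the spirit of the Test Sequence column) and the example p-values are assumptions made for illustration.

```python
def fixed_sequence_rejections(p_values, alpha, sequence):
    """Fixed sequence test.

    sequence[i] is the 1-based position at which hypothesis i is tested.
    Each hypothesis is tested at the full level alpha; testing stops at the
    first hypothesis that is not rejected.
    """
    order = sorted(range(len(p_values)), key=lambda i: sequence[i])
    rejected = [False] * len(p_values)
    for idx in order:
        if p_values[idx] < alpha:
            rejected[idx] = True
        else:
            break            # retain this and all later hypotheses
    return rejected


raw = [0.20, 0.03, 0.004]          # hypothetical raw p-values for H1, H2, H3
default_order = fixed_sequence_rejections(raw, alpha=0.025, sequence=[1, 2, 3])
reversed_order = fixed_sequence_rejections(raw, alpha=0.025, sequence=[3, 2, 1])
print(default_order, reversed_order)
```

With the default order nothing is rejected, because H1 fails at the first step. Testing H3 first allows H3 to be rejected before the sequence stops at H2.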
Select Sim8 in the Output Preview and click the icon to save it in the Library.

27.7 Fallback procedure

The fallback test alleviates the above undesirable feature of the fixed sequence test. Let $w_i$ be the proportion of α allocated to testing $H_i$, such that $\sum_{i=1}^{k-1} w_i = 1$. In the fallback procedure, at the i-th step (i = 1, ..., k − 1), test $H_i$ at $\alpha_i = \alpha_{i-1} + \alpha w_i$ if $H_{i-1}$ is rejected and at $\alpha_i = \alpha w_i$ if $H_{i-1}$ is retained; at the first step, $\alpha_1 = \alpha w_1$. If $p_i < \alpha_i$, reject $H_i$; otherwise retain it. Unlike the fixed sequence testing approach, the fallback procedure can continue testing even if a non-significant outcome is encountered. If a hypothesis in the sequence is retained, the next hypothesis in the sequence is tested at the level that would have been used by the weighted Bonferroni procedure. With $w_1 = 1$ and $w_2 = \cdots = w_{k-1} = 0$, the fallback procedure simplifies to the fixed sequence procedure.

Again we will use the HIV example to illustrate the fallback procedure. Select Sim 1 in the Library and click the icon to edit the simulation. Now go to the Test Parameters tab. In the Multiple Comparison Procedures box, uncheck the Dunnett’s single step box and check the Fallback box. Next click on the Response Generation tab and look at the Table of Proportions. You will see two additional columns labeled Test Sequence and Proportion of Alpha. In the Test Sequence column, you have to specify the order in which the hypotheses will be tested. Specify 1 for the test that will be performed first, 2 for the test that will be performed next, and so on. By default East assigns 1 to the first test, 2 to the second test, and so on. For now we will keep the default, which means that H1 will be tested first, followed by H2, and finally H3. In the Proportion of Alpha column, you have to specify the proportion of the total alpha you want to spend on each test.
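The fallback rule defined above can be sketched as follows. This is a minimal illustration under stated assumptions, not East's code: the function name fallback and the p-values are hypothetical, and the weights are rescaled to sum to 1 before use.

```python
from typing import List, Sequence


def fallback(p_values: Sequence[float],
             weights: Sequence[float],
             alpha: float) -> List[bool]:
    """Fallback test for p-values and weights given in testing order.

    H_i is tested at alpha*w_i, plus the level used for H_(i-1) when
    H_(i-1) was rejected. Returns True for each rejected hypothesis.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    w = [x / total for x in weights]      # rescale so the weights sum to 1

    rejected: List[bool] = []
    prev_level, prev_rejected = 0.0, False
    for p, wi in zip(p_values, w):
        carried = prev_level if prev_rejected else 0.0
        level = carried + alpha * wi      # alpha_i for this step
        is_rejected = p < level
        rejected.append(is_rejected)
        prev_level, prev_rejected = level, is_rejected
    return rejected


# Hypothetical raw p-values for (H1, H2, H3), tested in that order.
p = [0.200, 0.010, 0.001]

print(fallback(p, [1, 1, 1], alpha=0.025))        # [False, False, True]
print(fallback(p[::-1], [1, 0, 0], alpha=0.025))  # [True, True, False]
```

In the first call testing continues past the non-significant H1 and H3 is still rejected at α/3. The second call puts all the weight on the first test, which reproduces the fixed sequence behaviour.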
Ideally, the values in this column should add up to 1; if they do not, East will normalize them so that they do. By default East distributes the total alpha equally among all the tests. Here we have 3 tests in total, so each test has a proportion of alpha of 1/3, or 0.333. You can specify other proportions as well. For this example, keep the equal proportion of alpha for each test. Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim9.

The fixed sequence procedure with the specified sequence had global and disjunctive power close to 13% and conjunctive power of 9%. With the same pre-specified order for testing hypotheses, the fallback procedure has superior power compared to the fixed sequence procedure. This is because the fallback procedure can continue testing even if a non-significant outcome is encountered, whereas the fixed sequence procedure has to stop when a hypothesis in the sequence is not rejected.

Now we will consider a sequence where H3 is tested first, followed by H2 and H1, because in our case Dose3 has the largest treatment effect, followed by Dose2 and Dose1. Select Sim9 in the Output Preview and click the icon to save it in the Library. Now select Sim9 in the Library, click the icon to edit the simulation, and go to the Response Generation tab. In the Test Sequence column of the table, specify 3 for Dose1, 2 for Dose2 and 1 for Dose3. Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim10.

Now the fixed sequence procedure with the pre-specified sequence (H3, H2, H1) had global and disjunctive power of 89% and conjunctive power of 9.7%. The obtained power is very close to Sim9.
Therefore, specifying the sequence in order of descending treatment effect does not make much difference in terms of power. The exact results of the simulations may differ slightly, depending on the seed. Select Sim10 in the Output Preview and click the icon to save it in the Library.

27.8 Comparison of MC procedures

We have obtained the power (based on simulations) for different MC procedures for the HIV example in the previous sections. Now the obvious question is which MC procedure to choose. To compare the MC procedures, we will perform simulations for all of them under the following scenario.

Treatment arms: placebo, dose1 (0.3 mg), dose2 (1 mg) and dose3 (2 mg), with response proportions of 0.35, 0.4, 0.45 and 0.55, respectively
Variance: Unpooled
Proportion of Alpha: Equal (0.333)
Type I Error: 0.025 (right-tailed)
Number of Simulations: 10000
Total Sample Size: 500
Allocation ratio: 1 : 1 : 1 : 1

For comparability of the simulation results, we have used the same seed (5643) for the simulations under all MC procedures. The following output displays the powers under the different MC procedures. Clean up the Output Preview area, select all the checkboxes corresponding to the procedures and hit Simulate.

Here we have used equal proportions for the weighted Bonferroni and fallback procedures. For the two procedures that test in a fixed order (fixed sequence and fallback), two sequences have been used: (H1, H2, H3) and (H3, H2, H1). As expected, the Bonferroni and weighted Bonferroni procedures provide similar powers. The fixed sequence procedure with the pre-specified sequence (H3, H2, H1) provides power of 89.5%, which is the maximum among all the procedures. However, the fixed sequence procedure with the pre-specified sequence (H1, H2, H3) provides power of only 13.6%.
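A rough hand calculation helps explain the two fixed sequence figures quoted above. In a fixed sequence procedure nothing can be rejected unless the first hypothesis is rejected at the full level α, so the global power equals the unadjusted power of the first test. The sketch below approximates that power with a normal approximation to the unpooled test of two proportions, assuming 125 subjects per arm (500 split 1 : 1 : 1 : 1). The function name approx_power is hypothetical and this is not how East computes power; East's values come from simulation.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_trt: float, p_ctl: float, n_per_arm: float,
                 alpha: float = 0.025) -> float:
    """Normal-approximation power of a one-sided, unpooled two-proportion test."""
    se = sqrt(p_trt * (1 - p_trt) / n_per_arm + p_ctl * (1 - p_ctl) / n_per_arm)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf((p_trt - p_ctl) / se - z_alpha)


n_per_arm = 500 / 4  # assumed from the 1 : 1 : 1 : 1 allocation

# Sequence (H1, H2, H3): the first test is dose1 (0.40) versus placebo (0.35).
power_h1_first = approx_power(0.40, 0.35, n_per_arm)

# Sequence (H3, H2, H1): the first test is dose3 (0.55) versus placebo (0.35).
power_h3_first = approx_power(0.55, 0.35, n_per_arm)

print(round(power_h1_first, 3), round(power_h3_first, 3))
```

Under these assumptions the two approximations come out near 13% and 90%, in line with the simulated 13.6% and 89.5%.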
Therefore, the power of the fixed sequence procedure depends heavily on the specified testing sequence, and a mis-specification can result in a huge drop in power. All the remaining procedures have almost equal global and disjunctive powers of about 82%. In terms of conjunctive power, Hochberg’s step-up and Hommel’s step-up procedures have the highest conjunctive power, 9.9%. Therefore, we can choose either Hochberg’s step-up or Hommel’s step-up procedure for the prospective HIV study discussed in Section 27.1.

28 Multiple Endpoints-Gatekeeping Procedures for Discrete Data

Clinical trials are often designed to assess the benefits of a new treatment compared to a control treatment with respect to multiple clinical endpoints, which are divided into hierarchically ordered families. Typically, the primary family of endpoints defines the overall outcome of the trial, provides the basis for the regulatory claim and is included in the product label. The secondary families of endpoints play a supportive role, provide additional information for physicians, patients and payers, and are useful for enhancing the product label. Gatekeeping procedures address multiplicity problems by explicitly taking into account the hierarchical structure of the multiple objectives. The term “gatekeeping” indicates the hierarchical decision structure in which the higher ranked families serve as “gatekeepers” for the lower ranked families. Lower ranked families will not be tested if the higher ranked families have not passed their requirements. Two types of gatekeeping procedures for discrete outcomes, parallel and serial, are described in this chapter.
For more information about applications of gatekeeping procedures in a clinical trial setting and a literature review on this topic, please refer to Dmitrienko and Tamhane (2007).

East uses simulations to assess the operating characteristics of different designs using gatekeeping procedures. For example, one could simulate the power for a variety of sample sizes in a simple batch procedure. It is important to note that when determining the sample size for a clinical trial with multiple co-primary endpoints, the sample size may be overestimated if the correlation among the endpoints is not taken into consideration (Souza et al. 2010). East uses information about the correlation among the multiple endpoints in order to determine a more feasible sample size.

28.1 MK-0974 (telcagepant) for Acute Migraine

Consider the randomized, placebo-controlled, double blind, parallel treatment clinical trial designed to compare two treatments for migraine, a common disease and a leading cause of disability. Standard treatment includes the use of Triptans, which, although generally well tolerated, have a vasoconstrictor effect that can be problematic. This leaves a certain population of patients with underlying cardiovascular disease, uncontrolled hypertension or certain subtypes of migraine unable to access this treatment. In addition, for some patients this treatment has little or no beneficial effect and is associated with undesirable side effects that result in discontinuation of the drug (Ho et al. 2008). In this study, multiple doses of the drug Telcagepant (300 mg, 150 mg), an antagonist of the CGRP receptor associated with migraine, and zolmitriptan (5 mg), the standard treatment against migraine, are compared against a placebo. The five co-primary endpoints are pain freedom, pain relief, absence of photophobia (sensitivity to light), absence of phonophobia (sensitivity to sound), and absence of nausea, all at two hours post treatment.
Three co-secondary endpoints included more sustained measurements of pain freedom, pain relief, and total migraine freedom for up to a 24 hour period. The study employed a full analysis set where the multiplicity of endpoints was addressed using a step-down closed testing procedure. Due to the negative aspects of zolmitriptan, investigators were primarily interested in determining the efficacy of Telcagepant for the acute treatment of migraine, with the hope of finding an alternative treatment with fewer associated side effects. This study will be used to illustrate the two gatekeeping procedures East provides for multiple discrete endpoints.

28.2 Serial Gatekeeping Design - Simulation for Discrete Outcomes

Serial gatekeeping procedures were studied by Maurer, Hothorn and Lehmacher (1995), Bauer et al. (1998) and Westfall and Krishen (2001). Serial gatekeepers are encountered in trials where endpoints are usually ordered from most important to least important. Suppose that a trial is declared successful only if the treatment effect is demonstrated on both primary and secondary endpoints. Only if the endpoints in the primary family are successful is it of interest to assess the secondary endpoints.

Correlation coefficients between the endpoints are bounded, and East computes the valid range of acceptable values. As the number of endpoints increases, the restriction imposed on the valid range of correlation values also becomes greater. Therefore, for illustration purposes, the above trial is simplified to consider three primary endpoints: pain freedom (PF), absence of phonophobia (phono) and absence of photophobia (photo) at two hours post treatment. Only one endpoint from the secondary family, sustained pain freedom (SPF), will be included in the example.
Additionally, where the original trial studied multiple doses and treatments, this example will use only two groups, to focus the comparison on the higher dose of Telcagepant (300 mg) and placebo. The example includes correlation values intended to represent zero, mild and moderate correlation, to examine the effect of correlation on power. The efficacy, or response rate, of the endpoints for subjects in the treatment group and the placebo group, and a set of sample correlation matrices, are as follows:

                              PF      phono   photo   SPF
Response Telcagepant 300mg    0.269   0.578   0.51    0.202
Response Placebo              0.096   0.368   0.289   0.05

          ρ12   ρ13   ρ23   ρ14   ρ24   ρ34
Sim 1     0     0     0     0     0     0
Sim 2     0     0     0.3   0     0     0
Sim 3     0     0     0.5   0     0     0
Sim 4     0     0     0.8   0     0     0
Sim 5     0.3   0.3   0.3   0     0     0
Sim 6     0.3   0.3   0.5   0     0     0
Sim 7     0.3   0.3   0.8   0     0     0
Sim 8     0     0     0.3   0.3   0     0
Sim 9     0     0     0.3   0.5   0     0
Sim 10    0     0     0.3   0.7   0     0
Sim 11    0     0     0.8   0.3   0     0
Sim 12    0     0     0.8   0.5   0     0
Sim 13    0     0     0.8   0.7   0     0
Sim 14    0     0     0.8   0.7   0     0

To construct the above simulations, in the Design tab on the Discrete group, click Two Samples and select Multiple Comparisons-Multiple Endpoints.

At the top of this input window, the user must specify the total number of endpoints in the trial. Other input parameters include Test Type, Type I Error (α), Sample Size (n), and whether or not a Common Rejection Region is to be used for the endpoints. If a different rejection region is desired for different endpoints, this information should be specified in the Endpoint Information box. Here the user can change the label, select the family rank for each endpoint and choose the rejection region (either right or left tailed).

As discussed above, there are typically two types of gatekeeping procedures: serial and parallel.
Parallel gatekeeping requires the rejection of at least one hypothesis in a gatekeeper family; that is, only one of the endpoints in the family must be significant for testing to proceed to the next family. Serial gatekeeping uses the fact that the families are hierarchically ordered, and subsequent families are only tested if the previously ranked families are significant. Once the Gatekeeping Procedure is selected, the user must then select the multiple comparison procedure that will be used to test the last family of endpoints. These tests are discussed in Chapter 27. If Parallel Gatekeeping is selected, the user must also specify a test for the Gatekeeper Families, specifically Bonferroni, Truncated Holm or Truncated Hochberg; this choice is discussed further in the Parallel example that follows. The type I error specified on this screen is the nominal level of the family-wise error rate, which is defined as the probability of falsely declaring the efficacy of the new treatment compared to control with respect to any endpoint.

For the migraine example, PF, phono, and photo form the primary family, and SPF is the only outcome in the secondary family. Suppose that we would like to see the power for a sample size of 200 at a nominal type I error rate of 0.025, using the Bonferroni test for the secondary family. The input window will look as follows:

In addition to the Test Parameters tab, there is a tab labeled Response Generation. This is where the user specifies the underlying joint distribution among the multiple endpoints for the control arm and for the treatment arm. This is assumed to be multivariate binary with a specified correlation matrix.
For the first simulation, the Common Correlation box can be checked with the default value of 0.

The number of simulations to be performed and other simulation parameters can be specified in the Simulation Controls window. By default, 10000 simulations will be performed. The summary statistics for each simulated trial and the subject-level data can be saved by checking the appropriate boxes in the Output Options area.

Once all design parameters are specified, click the Simulate button at the bottom right of the screen. Preliminary output is displayed in the Output Preview area, and all results displayed in the yellow cells are summary outputs generated from simulations.

To view the detailed output, first save the simulation into a workbook in the Library by selecting the simulation in the Output Preview window and clicking the icon. A simulation node will appear in the Library under the current workbook.

Double click the simulation node Sim1 in the Library to see the detailed output, which summarizes all the main input parameters, including the multiple comparison procedure used for the last family of endpoints, the nominal type I error level, the total sample size, and the mean values for each endpoint in the control arm and in the experimental arm.
It also displays a comprehensive list of different types of power:

These different types of power are defined as follows:

Overall Power and FWER:
  Global: probability of declaring significance on any of the endpoints
  Conjunctive: probability of declaring significance on all of the endpoints for which the treatment arm is truly better than the control arm
  Disjunctive: probability of declaring significance on any of the endpoints for which the treatment arm is truly better than the control arm
  FWER: probability of making at least one type I error among all the endpoints

Power and FWER for Individual Gatekeeper Family except the Last Family:
  Conjunctive Power: probability of declaring significance on all of the endpoints in the particular gatekeeper family for which the treatment arm is truly better than the control arm
  FWER: probability of making at least one type I error when testing the endpoints in the particular gatekeeper family

Power and FWER for the Last Family:
  Conjunctive Power: probability of declaring significance on all of the endpoints in the last family for which the treatment arm is truly better than the control arm
  Disjunctive Power: probability of declaring significance on any of the endpoints in the last family for which the treatment arm is truly better than the control arm
  FWER: probability of making at least one type I error when testing the endpoints in the last family

Marginal Power: probability of declaring significance on the particular endpoint

For the migraine example, the conjunctive power, which characterizes the power for the study, is 0.701 (70.1%) for a total sample size of 200.
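The reported figure can be cross-checked approximately by hand for the zero-correlation case (Sim 1). The sketch below rests on several assumptions that the text does not state explicitly: 1:1 allocation (100 subjects per arm at a total of 200), an unpooled normal-approximation test for each endpoint, independence of the three primary endpoints, and each primary endpoint tested at the full one-sided level 0.025, so that the power to pass the primary family is the product of the three marginal powers. The names RATES, marginal_power and primary_family_power are hypothetical. It is not East's algorithm, which works by simulation.

```python
from math import prod, sqrt
from statistics import NormalDist

# Response rates (Telcagepant 300 mg, placebo) for the primary family.
RATES = {"PF": (0.269, 0.096), "phono": (0.578, 0.368), "photo": (0.510, 0.289)}


def marginal_power(p_trt: float, p_ctl: float, n_per_arm: float,
                   alpha: float = 0.025) -> float:
    """Normal-approximation power of a one-sided, unpooled two-proportion test."""
    se = sqrt(p_trt * (1 - p_trt) / n_per_arm + p_ctl * (1 - p_ctl) / n_per_arm)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf((p_trt - p_ctl) / se - z_alpha)


def primary_family_power(total_n: int, alpha: float = 0.025) -> float:
    """Probability that all three independent primary endpoints are significant."""
    n_per_arm = total_n / 2  # assumes 1:1 allocation
    return prod(marginal_power(t, c, n_per_arm, alpha) for t, c in RATES.values())


# Batch-style sweep over total sample sizes, 200 to 300 in steps of 20.
for total_n in range(200, 301, 20):
    print(total_n, round(primary_family_power(total_n), 3))
```

Under these assumptions the approximation at a total sample size of 200 is about 0.70, which agrees with the simulated value and supports reading it as a proportion rather than a percentage.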
Using the Bonferroni test for the last family, the design has a probability of 0.651 (65.1%), the disjunctive power for the last family, of detecting the benefit of Telcagepant 300 mg with respect to at least one secondary endpoint. Because SPF is the only endpoint in the secondary family in this example, the conjunctive power for the last family, the probability of declaring the benefit of Telcagepant 300 mg on all of the secondary endpoints, is also 0.651. For a sample size of 200, this relatively low power is typically undesirable. One can find the sample size needed to achieve a target power by simulating multiple designs in batch mode. For example, the simulation of a batch of designs for sample sizes ranging from 200 to 300 in steps of 20 is shown by the following.

Multiple designs can be viewed side by side for easy comparison by selecting the simulations and clicking the icon in the Output Preview area.

For this example, to obtain a conjunctive power between 80% and 90%, the study would need somewhere between 250 and 300 subjects. For the remainder of this example, we will use a sample size of 250 subjects under the correlation assumptions in the above table.

28.3 Parallel Gatekeeping Design - Simulation for Discrete Outcomes

A common concern in clinical trials with multiple primary endpoints is whether or not statistical significance should be achieved on all endpoints. As the number of endpoints increases, this generally becomes more difficult. Parallel gatekeeping procedures are often used in clinical trials with multiple primary objectives where each individual objective can characterize a successful overall trial outcome. In other words, the trial can be declared successful if at least one primary objective is met.
Again, consider the same randomized, placebo-controlled, double blind, parallel treatment clinical trial designed to compare two treatments for migraine that was presented in the serial gatekeeping example. For the purpose of this example the trial is again simplified to study only three endpoints in the primary family: pain freedom (PF), absence of phonophobia (phono) and absence of photophobia (photo) at two hours post treatment. The single endpoint in the secondary family is sustained pain freedom (SPF). Power estimates will be computed via simulation in East. The example correlation values are intended to represent a common and moderate association among the endpoints. In general, serial gatekeeping designs require a larger sample size than parallel designs; therefore this example will use a total sample size of 125, at a one-sided significance level of α = 0.025. The efficacy, or response rate, of the endpoints for subjects in the treatment group and the placebo group, and a set of sample correlation matrices, are as follows:

                              PF      phono   photo   SPF
Response Telcagepant 300mg    0.269   0.578   0.51    0.202
Response Placebo              0.096   0.368   0.289   0.05

          ρ12   ρ13   ρ23   ρ14   ρ24   ρ34
Sim 1     0.3   0.3   0.3   0.3   0.3   0.3
Sim 2     0     0     0.8   0.3   0.0   0.0
Sim 3     0.3   0.3   0.8   0.3   0     0

We now construct a new set of simulations to assess the operating characteristics of the study using a Parallel Gatekeeping design for the above response generation information.

In the Design tab on the Discrete group, click Two Samples and select Multiple Comparisons-Multiple Endpoints.

In the Gatekeeping Procedure box, keep the default of Parallel, and keep Bonferroni for the Test for Gatekeeper Families. For the Test for Last Family, also ensure that Bonferroni is selected as the multiple testing procedure.
In the Endpoint Information box, specify which family each endpoint belongs to using the column labeled Family Rank.

In the Response Generation window, the Variance can be specified to be either Pooled or Unpooled. In the Endpoint Information box, the Response Rates for treatment and control for each endpoint are specified. If the endpoints share a common correlation, select the Common Correlation checkbox and enter the correlation value to the right. East will only allow a value within the Valid Range. Otherwise, input the specific correlation for each pair of endpoints in the Correlation Matrix.

In the Simulation Controls window, the user can specify the total number of simulations, the refresh frequency, and the random number seed. Simulation data can be saved for more advanced analyses. After all the input parameter values have been specified, click the Simulate button on the bottom right of the window to begin the simulation.

The progress window will report how many simulations have been completed. When the run is complete, close the progress report screen, and the preliminary simulation summary will be displayed in the Output Preview window. Here, one can see the overall power summary.

To see the detailed output, save the simulation in the current workbook by clicking the icon. A simulation node will be appended to the corresponding workbook in the Library. Double click the simulation node in the Library to display the detailed outputs.
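The reason a Valid Range exists can be illustrated with the standard pairwise bounds for two binary variables: the probability that both endpoints respond must lie between max(0, p1 + p2 − 1) and min(p1, p2), which bounds their correlation. The sketch below computes these bounds for one pair of endpoints and intersects them across the two arms. It is only a necessary pairwise check under that textbook result; how East derives its Valid Range is not described here, and East may apply further constraints. The function names corr_bounds and common_range are hypothetical.

```python
from math import sqrt
from typing import Iterable, Tuple


def corr_bounds(p1: float, p2: float) -> Tuple[float, float]:
    """Smallest and largest correlation attainable by two binary endpoints
    with response rates p1 and p2 (both strictly between 0 and 1)."""
    sd = sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    lowest_joint = max(0.0, p1 + p2 - 1.0)   # smallest possible P(both respond)
    highest_joint = min(p1, p2)              # largest possible P(both respond)
    return (lowest_joint - p1 * p2) / sd, (highest_joint - p1 * p2) / sd


def common_range(pairs: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Range of correlations feasible for every (p1, p2) pair supplied,
    for example the same two endpoints in the treatment and control arms."""
    bounds = [corr_bounds(p1, p2) for p1, p2 in pairs]
    return max(lo for lo, _ in bounds), min(hi for _, hi in bounds)


# PF and SPF response rates: (Telcagepant 300 mg arm), (placebo arm).
lo, hi = common_range([(0.269, 0.202), (0.096, 0.05)])
print(round(lo, 3), round(hi, 3))
```

For PF and SPF the placebo arm is the binding constraint, with an upper bound of roughly 0.70 under this pairwise calculation.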
As with serial gatekeeping, East provides the following types of power:

Overall Power and FWER:
  Global: probability of declaring significance on any of the endpoints.
  Conjunctive: probability of declaring significance on all of the endpoints for which the treatment arm is truly better than the control arm.
  Disjunctive: probability of declaring significance on any of the endpoints for which the treatment arm is truly better than the control arm.
  FWER: probability of making at least one type I error among all the endpoints.

Power and FWER for Individual Gatekeeper Families except the Last Family:
  Conjunctive Power: probability of declaring significance on all of the endpoints in the particular gatekeeper family for which the treatment arm is truly better than the control arm.
  Disjunctive Power: probability of declaring significance on any of the endpoints in the particular gatekeeper family for which the treatment arm is truly better than the control arm.
  FWER: probability of making at least one type I error when testing the endpoints in the particular gatekeeper family.

Power and FWER for the Last Family:
  Conjunctive Power: probability of declaring significance on all of the endpoints in the last family for which the treatment arm is truly better than the control arm.
  Disjunctive Power: probability of declaring significance on any of the endpoints in the last family for which the treatment arm is truly better than the control arm.
  FWER: probability of making at least one type I error when testing the endpoints in the last family.

Marginal Power: probability of declaring significance on the particular endpoint.
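The definitions above translate directly into proportions over simulated trials. The sketch below is a hedged illustration of that bookkeeping, not East's output routine: it assumes each simulated trial is summarized as one True/False rejection flag per endpoint, and the function name power_summary and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence


def power_summary(rejections: Sequence[Sequence[bool]],
                  truly_better: Sequence[bool]) -> Dict[str, object]:
    """Summarize simulated trials.

    rejections[t][j] is True if endpoint j was declared significant in trial t.
    truly_better[j] is True if the treatment arm is truly better on endpoint j.
    """
    n_trials = len(rejections)
    alt = [j for j, better in enumerate(truly_better) if better]
    null = [j for j, better in enumerate(truly_better) if not better]

    def frac(flags: List[bool]) -> float:
        return sum(flags) / n_trials

    return {
        "global": frac([any(r) for r in rejections]),
        "conjunctive": frac([bool(alt) and all(r[j] for j in alt) for r in rejections]),
        "disjunctive": frac([any(r[j] for j in alt) for r in rejections]),
        "fwer": frac([any(r[j] for j in null) for r in rejections]),
        "marginal": [frac([r[j] for r in rejections]) for j in range(len(truly_better))],
    }


# Toy data: 4 simulated trials, 3 endpoints; the third endpoint has no true effect.
toy = [
    [True, True, False],
    [True, False, False],
    [False, False, True],
    [False, False, False],
]
summary = power_summary(toy, truly_better=[True, True, False])
print(summary)
```

In the toy data the global power (0.75) exceeds the disjunctive power (0.5) because one trial rejects only the endpoint with no true effect; that same trial is what contributes to the FWER.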
For the migraine example under the lower common correlation assumption, we see that the gatekeeping procedure using the Bonferroni test for both the primary family and the secondary family provides 84.4% power to detect a difference in at least one of the three primary measures of migraine relief. It provides only 24.1% power to detect differences in all types of relief. The marginal power table displays the probabilities of declaring significance on each particular endpoint after multiplicity adjustment. For example, the power to detect sustained pain freedom beyond 2 hours for a dose of 300 mg of Telcagepant is 60.3%.

To assess the robustness of this procedure with respect to the correlation among the different endpoints, the simulation can be run again with different combinations of correlations. Right click on the simulation node in the Library and select Edit Simulation from the dropdown list. Next click on the Response Generation tab, update the correlation matrix, and click Simulate. This can be repeated for all desired correlation combinations, and the results can be compared in an output summary.

The following table summarizes the power comparisons under different correlation assumptions. Note that the disjunctive power decreases as the correlation increases, and the conjunctive power increases as the correlation increases.

Table 28.1: Power Comparisons under Different Correlation Assumptions

               Primary Family          Secondary Family        Overall Power
Correlation    Disjunct.   Conjunct.   Disjunct.   Conjunct.   Disjunct.   Conjunct.
Sim 1          0.839       0.242       0.599       0.99        0.839       0.218
Sim 2          0.838       0.244       0.579       0.579       0.838       0.202
Sim 3          0.787       0.286       0.554       0.554       0.787       0.234

There are three available parallel gatekeeping methods: Bonferroni, Truncated Holm and Truncated Hochberg.
The multiple comparison procedures applied to the gatekeeper families need to satisfy the so-called separable condition. A multiple comparison procedure is separable if its type I error rate under a partial null configuration is strictly less than the nominal level α. Bonferroni is a separable procedure; however, the regular Holm and Hochberg procedures are not separable and cannot be applied directly to the gatekeeper families. The truncated versions, obtained by taking convex combinations of the critical constants of the regular Holm/Hochberg procedure and the Bonferroni procedure, are separable and more powerful than the Bonferroni test. The truncation constant controls the degree of conservativeness: a larger value of the truncation constant results in a more powerful procedure. If the truncation constant is set to 1, the procedure reduces to the regular Holm or Hochberg test.

To see this, simulate the design using the truncated Holm procedure for the primary family and the Bonferroni test for the secondary family for the migraine example with common correlation 0.3. The table below compares the conjunctive power and disjunctive power for each family, and the overall powers, for different truncation parameter values.

Table 28.2: Impact of Truncation Constant on Power in the Truncated Holm Procedure

Truncation     Primary Family          Secondary Family        Overall Power
Constant       Conjunct.   Disjunct.   Conjunct.   Disjunct.   Conjunct.   Disjunct.
0              0.234       0.84        0.59        0.59        0.21        0.84
0.25           0.28        0.833       0.569       0.569       0.248       0.833
0.5            0.315       0.836       0.542       0.542       0.275       0.836
0.8            0.383       0.838       0.488       0.488       0.334       0.838

As the value of the truncation parameter increases, the conjunctive power for the primary family increases and the disjunctive power remains essentially unchanged.
Both the conjunctive power and the disjunctive power for the secondary family decrease as we increase the truncation parameter. The overall conjunctive power also increases, but the overall disjunctive power remains the same as the truncation parameter increases.

The next table shows the marginal powers of this design for different truncation parameter values. The marginal powers for the endpoints in the primary family increase. On the other hand, the marginal power for the endpoint in the secondary family decreases.

Table 28.3: Impact of Truncation Constant on Marginal Power in the Truncated Holm Procedure

Truncation     Primary Family               Secondary Family
Constant       PF      Phono   Photo        SPF
0              0.54    0.512   0.568        0.59
0.25           0.582   0.512   0.58         0.569
0.5            0.591   0.541   0.596        0.542
0.8            0.625   0.568   0.631        0.488

The last two tables display the operating characteristics for the Hochberg test with different truncation constant values. Note that both the conjunctive and disjunctive powers for the primary family increase as the truncation parameter increases. However, the power for the secondary family decreases with larger truncation parameter values. The marginal powers for the primary family and for the secondary family behave similarly. The overall conjunctive and disjunctive powers also increase as we increase the truncation parameter.

Table 28.4: Impact of Truncation Constant on Power in the Truncated Hochberg Procedure

Truncation     Primary Family          Secondary Family        Overall Power
Constant       Conjunct.   Disjunct.   Conjunct.   Disjunct.   Conjunct.   Disjunct.
0              0.234       0.844       0.595       0.595       0.208       0.844
0.25           0.303       0.838       0.578       0.578       0.268       0.838
0.5            0.322       0.841       0.544       0.544       0.281       0.841
0.8            0.407       0.847       0.494       0.494       0.351       0.847
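To make the role of the truncation constant concrete, the sketch below applies a truncated Holm test within a single family. It assumes the form commonly given in the gatekeeping literature, in which the critical value at step i for a family of m hypotheses is (γ/(m − i + 1) + (1 − γ)/m)·α, so that γ = 0 gives Bonferroni and γ = 1 gives regular Holm. This form is an assumption brought in for illustration; the manual does not spell out East's formula, and the sketch omits the step that passes unused α on to the next family. The function name truncated_holm and the p-values are hypothetical.

```python
from typing import List, Sequence


def truncated_holm(p_values: Sequence[float], alpha: float, gamma: float) -> List[bool]:
    """Step-down truncated Holm test within one family.

    gamma is the truncation constant in [0, 1]. Returns True for each
    rejected hypothesis, in the original order of p_values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])  # smallest p-value first
    rejected = [False] * m
    for step, j in enumerate(order, start=1):
        # Convex combination of the Holm and Bonferroni critical values.
        critical = (gamma / (m - step + 1) + (1 - gamma) / m) * alpha
        if p_values[j] <= critical:
            rejected[j] = True
        else:
            break  # step-down: stop at the first non-rejection
    return rejected


# Hypothetical raw p-values for three primary endpoints.
p = [0.007, 0.011, 0.030]

print(truncated_holm(p, alpha=0.025, gamma=0.0))  # [True, False, False]
print(truncated_holm(p, alpha=0.025, gamma=0.8))  # [True, True, False]
```

With γ = 0 every endpoint is compared with α/3 and only the first is rejected; with γ = 0.8 the second critical value rises enough to reject the second endpoint too, which is the within-family gain in power the tables describe.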
Table 28.5: Impact of Truncation Constant in Truncated Hochberg Procedure on Marginal Power

Truncation     Primary Family               Secondary Family
Constant       PF      Photo   Phono        SPF
0              0.552   0.52    0.564        0.595
0.25           0.595   0.529   0.603        0.578
0.5            0.603   0.54    0.598        0.544
0.8            0.642   0.592   0.647        0.494

29 Two-Stage Multi-arm Designs using p-value combination

29.1 Introduction

In the drug development process, identification of promising therapies and inference on selected treatments are usually performed in two or more stages. The procedure discussed here is an adaptive two-stage design that can be used when multiple treatments are to be compared with a control. It allows both stages to be integrated within a single confirmatory trial while controlling the multiple level type-I error. After the interim analysis in the first stage, the trial may be terminated early or continued with a second stage, where the set of treatments may be reduced due to lack of efficacy or the presence of safety problems with some of the treatments. This procedure in East is highly flexible with respect to stopping rules and selection criteria, and it also allows re-estimation of the sample size for the second stage. Simulations show that the method may be substantially more powerful than classical one-stage multiple treatment designs with the same total sample size, because the second stage sample size is focused on evaluating only the promising treatments identified in the first stage. This procedure is available for continuous as well as discrete endpoint studies. The current chapter deals with discrete endpoint studies only; continuous endpoint studies are handled similarly.
29.2 Study Design

This section will explore different design options available in East with the help of an example.

29.2.1 Introduction to the Study

A new chemical entity (NCE) is being developed for the treatment of reward deficiency syndrome, specifically alcohol dependence and binge eating disorder. Compared with other orally available treatments, NCE was designed to exhibit enhanced oral bioavailability, thereby providing improved efficacy for the treatment of alcohol dependence.

Primary Objective: To evaluate the safety and efficacy of NCE compared with placebo when administered daily for 12 weeks to adults with alcohol dependence.

Secondary Objective: To determine the optimal dose or doses of NCE.

The primary endpoint is defined as the percent of subjects abstinent from heavy drinking during Weeks 5 through 12 of treatment based on self-report of drinking activity. A heavy drinking day is defined as 4 or more standard alcoholic drinks in 1 day for females and 5 or more standard alcoholic drinks in 1 day for males. The endpoint is based on the patient-reported number of standard alcoholic drinks per day, transformed into a binary outcome measure, abstinence from heavy drinking.

29.2.2 Methodology

This is a multicenter, randomized, double-blind, placebo-controlled study conducted in two parts using a 2-stage adaptive design. In Stage 1, approximately 400 eligible subjects will be randomized equally among four treatment arms (NCE [doses: 1, 2.5, or 10 mg] and matching placebo). After all subjects in Stage 1 have completed the 12-week treatment period or discontinued earlier, an interim analysis will be conducted to

1. compare the proportion of subjects in each dose group who have achieved abstinence from heavy drinking during Weeks 5 through 12,
2. assess safety within each dose group, and
3. drop the less effective doses.

Based on the interim analysis, Stage 2 of the study will either continue with additional subjects enrolling into 2 or 3 arms (placebo and 1 or 2 favorable, active doses) or the study will be halted completely if unacceptable toxicity has been observed.

In this example, we will have the following workflow to cover different options available in East:

1. Start with four arms (3 doses + Placebo).
2. Evaluate the three doses at the interim analysis and, based on the Treatment Selection Rules, carry forward one or two of the doses to the next stage.
3. While we select the doses, also increase the sample size of the trial by using the Sample Size Re-estimation (SSR) tool to improve conditional power if necessary. In a real trial, both the above actions (early stopping as well as sample size re-estimation) will be performed after observing the interim data.
4. See the final design output in terms of different powers and probabilities of selecting particular dose combinations.
5. See the early stopping boundaries for efficacy and futility on the adjusted p-value scale.
6. Monitor the actual trial using the Interim Monitoring tool in East.

Start East. Click Design tab, then click Many Samples in the Discrete category, and then click Multiple Looks- Combining p-values test.

This will bring up the input window of the design with some default values. Enter the inputs as discussed below.

29.2.3 Study Design Inputs

Let us assume that three doses of the treatment (1mg, 2.5mg, 10mg) are compared with the Placebo arm.
Preliminary sample size estimates are provided to achieve an overall study power of at least 80% at an overall, adequately adjusted 1-sided type-1 or alpha level of 2.5%, after taking into account all interim and final hypothesis tests. Note that we always use 1-sided alpha since dose-selection rules are usually 1-sided.

In Stage 1, 400 subjects are initially planned for enrollment (4 arms with 100 subjects each). Following an interim analysis conducted after all subjects in Stage 1 have completed 12 weeks of treatment or discontinued earlier, an additional 200 subjects will be enrolled into 2 arms for Stage 2 (placebo and one active dose). So we start with the total of 400+200 = 600 subjects.

The multiplicity adjustment methods available in East to compute the adjusted p-value (p-value corresponding to global NULL) are Bonferroni, Sidak, Simes. For discrete endpoint test, Dunnett Single Step is not available since we will be using Z-statistic. Let us use the Bonferroni method for this example.

The p-values obtained from both the stages can be combined by using the “Inverse Normal” method. In the “Inverse Normal” method, East first computes the weights as follows:

$$ w^{(1)} = \sqrt{\frac{n^{(1)}}{n}} \tag{29.1} $$

and

$$ w^{(2)} = \sqrt{\frac{n^{(2)}}{n}} \tag{29.2} $$

where $n^{(1)}$ and $n^{(2)}$ are the total sample sizes corresponding to Stage 1 and Stage 2 respectively and $n$ is the total sample size. East displays these weights by default but these values are editable and the user can specify any other weights as long as

$$ \left(w^{(1)}\right)^2 + \left(w^{(2)}\right)^2 = 1 \tag{29.3} $$

The final p-value is given by

$$ p = 1 - \Phi\left( w^{(1)} \Phi^{-1}\left(1 - p^{(1)}\right) + w^{(2)} \Phi^{-1}\left(1 - p^{(2)}\right) \right) \tag{29.4} $$

The weights specified on this tab will be used for p-value computation. $w^{(1)}$ will be used for data before the interim look and $w^{(2)}$ will be used for data after the interim look.
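The inverse normal combination in equations (29.1) to (29.4) can be sketched in a few lines of Python. This is only an illustration of the formulas, not East's internal code; the function names and the stage-wise p-values used in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def default_weights(n1, n2):
    """Weights (29.1) and (29.2) from the stage-wise total sample sizes."""
    n = n1 + n2
    return sqrt(n1 / n), sqrt(n2 / n)


def inverse_normal_p(p1, p2, w1, w2):
    """Combined p-value (29.4); the weights must satisfy (29.3)."""
    if abs(w1 ** 2 + w2 ** 2 - 1.0) > 1e-9:
        raise ValueError("weights must satisfy w1^2 + w2^2 = 1")
    z = w1 * STD_NORMAL.inv_cdf(1 - p1) + w2 * STD_NORMAL.inv_cdf(1 - p2)
    return 1 - STD_NORMAL.cdf(z)


# Stage sizes from this example: 400 subjects in Stage 1, 200 in Stage 2.
w1, w2 = default_weights(400, 200)

# Hypothetical stage-wise (adjusted) p-values, chosen only for illustration.
p_combined = inverse_normal_p(0.05, 0.05, w1, w2)
print(round(w1, 4), round(w2, 4), p_combined)
```

With equal stage-wise p-values of 0.05 the combined p-value is smaller than 0.05, because evidence from the two stages accumulates.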
Thus, according to the sample sizes planned for the two stages in this example, the weights are calculated as $\sqrt{400/600}$ and $\sqrt{200/600}$. Note: These weights are updated by East once we specify the first look position as 400/600 in the Boundary tab. So leave these as default values for now.

Set the Number of Arms as 4 and enter the rest of the inputs as shown below:

We can certainly have early stopping boundaries for efficacy and/or futility. But generally, in designs like this, the objective is to select the best dose(s) and not stop early. So for now, select the Boundary tab and set both the boundary families to “None”. Also, set the timing of the interim analysis as 0.667, which will be after observing the data on 400 subjects out of 600. Enter 400/600 as shown below. Notice the updated weights on the Test Parameters tab.

The next tab is Response Generation, which is used to specify the true underlying proportion of response on the individual dose groups and the initial allocation from which to generate the simulated data.

Before we update the Treatment Selection tab, go to the Simulation Control Parameters tab where we can specify the number of simulations to run, the random number seed and also whether to save the intermediate simulation data. For now, enter the inputs as shown below and keep all other inputs as default.

Click on the Treatment Selection tab. This tab is used to select the scale on which the treatment-wise effects are computed. The treatment effect scale is required for selecting treatments for the second stage, but the control treatment will not be considered for selection. It will always be there in the second stage. The list under Treatment Effect Scale allows you to set the selection rules on different scales.
Select Estimated δ from this list. It means that all the selection rules we specify on this tab will be in terms of the estimated value of treatment effect, δ, i.e., difference from placebo. Here is a list of all available treatment effect scales: Estimated Proportion, Estimated δ, Test Statistic, Conditional Power, Isotonic Proportion, Isotonic δ. For more details on these scales, refer to the Appendix K chapter on this method.

The next step is to set the treatment selection rules for the second stage.

Select Best r Treatments: The best treatment is defined as the treatment having the highest or lowest mean effect. The decision is based on the rejection region. If it is “Right-Tail” then the highest is taken as best. If it is “Left-Tail” then the lowest is taken as best. Note that the rejection region does not affect the choice of treatment based on conditional power.

Select treatments within ε of Best Treatment: Suppose the treatment effect scale is Estimated δ. If the best treatment has a treatment effect of δb and ε is specified as 0.1, then all the treatments which have a δ of δb − 0.1 or more are chosen for Stage 2.

Select treatments greater than threshold ζ: The treatments chosen are those whose value on the treatment effect scale is greater than or less than the threshold (ζ) specified by the user, according to the rejection region. But if the treatment effect scale is chosen as the conditional power, then the comparison is always “greater than”.

Use R for Treatment Selection: If you wish to define any customized treatment selection rules, it can be done by writing an R function for those rules to be used within East. This is possible due to the R Integration feature in East. Refer to the appendix chapter on R Functions for more details on syntax and use of this feature.
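The three built-in selection rules described above can be sketched as plain functions. This is a minimal illustration in Python, not the R template that ships with East; the function names are hypothetical, the "within ε" rule is written for a right-tail rejection region only, and the interim effects are simply the design values of δ implied by the response rates assumed in this chapter (0.14, 0.18, 0.22 against a control rate of 0.10).

```python
def select_best_r(effects, r, right_tail=True):
    """Best r treatments: highest effects if right-tail, lowest if left-tail."""
    ranked = sorted(effects, key=effects.get, reverse=right_tail)
    return ranked[:r]


def select_within_epsilon(effects, eps):
    """Treatments whose effect is within eps of the best (right-tail case)."""
    best = max(effects.values())
    return [arm for arm, value in effects.items() if value >= best - eps]


def select_beyond_threshold(effects, zeta, right_tail=True):
    """Treatments whose effect is above (right-tail) or below (left-tail) zeta."""
    if right_tail:
        return [arm for arm, value in effects.items() if value > zeta]
    return [arm for arm, value in effects.items() if value < zeta]


# Illustrative interim estimates of delta (difference from placebo).
effects = {"1mg": 0.04, "2.5mg": 0.08, "10mg": 0.12}

print(select_best_r(effects, 1))               # single best dose
print(select_best_r(effects, 2))               # two best doses
print(select_within_epsilon(effects, 0.05))    # within 0.05 of the best
print(select_beyond_threshold(effects, 0.05))  # delta above 0.05
```

The placebo arm is deliberately absent from `effects`, since the control is never subject to selection and always continues to Stage 2.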
A template file for defining treatment selection rules is also available in the subfolder RSamples under your East installation directory. For more details on using R to define Treatment selection rules, refer to section O.10.

For this example, select the first rule Select Best r treatments and set r = 1, which indicates that East will select the best dose for Stage 2 out of the three doses. We will leave the default allocation ratio selections to yield equal allocation between the control and selected best dose in Stage 2.

Click the Simulate button to run the simulations. When the simulations are over, a row gets added in the Output Preview area. Save this row to the Library by clicking the icon in the toolbar. Rename this scenario as Best1. Double click it to see the detailed output.

The first table in the detailed output shows the overall power including global power, conjunctive power, disjunctive power and FWER.
The definitions for different powers are as follows:

Global Power: probability of demonstrating statistical significance on one or more treatment groups

Conjunctive Power: probability of demonstrating statistical significance on all treatment groups which are truly effective

Disjunctive Power: probability of demonstrating statistical significance on at least one treatment group which is truly effective

FWER: probability of incorrectly demonstrating statistical significance on at least one treatment group which is truly ineffective

For our example, the global power is 0.8, i.e., the probability that this design rejects any null hypothesis, where each null hypothesis states that the true proportion of responders at a dose equals that of control. Also shown are conjunctive and disjunctive power, as well as the Family Wise Error Rate (FWER).

The Lookwise Summary table summarizes the number of simulated trials that ended with a conclusion of efficacy, i.e., rejected any null hypothesis, at each look. In this example, no simulated trial stopped at the interim analysis with an efficacy conclusion since there were no stopping boundaries, but 8083 simulations yielded an efficacy conclusion via the selected dose after Stage 2. This is consistent with the global power.

The next table, Detailed Efficacy Outcomes for all 10000 Simulations, summarizes the number of simulations for which each dose was selected for Stage 2 and yielded an efficacy conclusion. For example, the dose 10mg was observed to be efficacious in 63% of simulated trials whereas none of the three doses was efficacious in 19% of trials.
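To make the four definitions of global power, conjunctive power, disjunctive power and FWER concrete, the sketch below tallies them from a list of simulated outcomes. It is only an illustration of the definitions: the function name and the four toy outcomes are hypothetical, not East output, and each outcome is represented simply as the set of doses declared significant in one simulated trial.

```python
def power_summary(rejections, truly_effective, all_arms):
    """Estimate global, conjunctive, disjunctive power and FWER.

    rejections: one set per simulated trial, holding the arms declared
    significant in that trial (an arm dropped at the interim cannot appear).
    """
    effective = set(truly_effective)
    ineffective = set(all_arms) - effective
    counts = {"global": 0, "conjunctive": 0, "disjunctive": 0, "fwer": 0}
    for rejected in rejections:
        rejected = set(rejected)
        counts["global"] += bool(rejected)                   # any rejection
        counts["disjunctive"] += bool(rejected & effective)  # >= 1 true effect
        counts["conjunctive"] += bool(effective) and effective <= rejected
        counts["fwer"] += bool(rejected & ineffective)       # any false claim
    n = len(rejections)
    return {name: count / n for name, count in counts.items()}


# Toy outcomes for four simulated trials (hypothetical values).
sims = [{"10mg"}, set(), {"2.5mg", "10mg"}, {"1mg"}]
summary = power_summary(sims,
                        truly_effective={"2.5mg", "10mg"},
                        all_arms={"1mg", "2.5mg", "10mg"})
print(summary)
```

In this toy input the 1mg dose is treated as truly ineffective, so the fourth trial counts toward both global power and FWER but not toward disjunctive power.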
The last output table, Marginal Probabilities of Selection and Efficacy, summarizes the number and percent of simulations in which each dose was selected for Stage 2, regardless of whether it was found significant at the end of Stage 2 or not, as well as the number and percent of simulations in which each dose was selected and found significant. Average sample size is also shown. Note that since this design only selected the single best dose, this table gives almost the same information as the one above.

Selecting multiple doses (arms) for Stage 2 would be more effective than selecting just the best one. Click the button on the bottom left corner of the screen. This will take us back to the input window of the last simulation scenario. Go to the Treatment Selection tab and set r = 2. It means that we are interested in carrying forward the two best doses out of the three. Run the simulations, keeping the sample size fixed at 600. The simulated power drops to approximately 73%. Note the loss of power for this 2-best-doses scenario in comparison to the previous example, which chose only the best dose. This is because of the smaller sample sizes per dose in Stage 2 for this 2-best-doses scenario, since the Stage 2 sample size is split among 2 doses and control instead of between only 1 dose and control as in the best-dose scenario.

Now go to the Test Parameters tab and change the sample size to 700, assuming that each of the two doses and Placebo will get 100 subjects in Stage 2. Accordingly, update the look position on the Boundaries tab to 400/700 as well.

Click the Simulate button to run the simulations. When the simulations are over, a row gets added in the Output Preview area. Save this row to the Library by clicking the icon in the toolbar. Rename this scenario as Best2. Double click it to see the detailed output.
The interpretation of the first two tables is the same as described above. This design restores the power to 80% and also gives us the design details when two of the three doses were selected.

The table Detailed Efficacy Outcomes for all 10000 Simulations summarizes the number of simulations for which each individual dose group or pair of doses was selected for Stage 2 and yielded an efficacy conclusion. For example, the pair (2.5mg, 10mg only) was observed to be efficacious in 41% of the trials (4076/10000).

The next table, Marginal Probabilities of Selection and Efficacy, summarizes the number and percent of simulations in which each dose was selected for Stage 2, regardless of whether it was found significant at the end of Stage 2 or not, as well as the number and percent of simulations in which each dose was selected and found significant. Average sample size is also shown. It tells us how frequently the dose (either alone or with some other dose) was selected and efficacious. For example, dose 1mg was selected in approximately 25% of trials and was efficacious in approximately 7% of trials (which is the sum of 10, 130 and 555 simulations from the previous table).

The advantage of a 2-stage “treatment selection” or “drop-the-loser” design is that it allows the less performing/futile arms to be dropped based on the interim data while still preserving the type-1 error and achieving the desired power. In the Best1 scenario, we dropped two doses (r = 1) and in the Best2 scenario, we dropped one dose (r = 2). Suppose we had decided to proceed to Stage 2 without dropping any doses. In this case, power would have dropped significantly. To verify this in East, run the above scenario with r = 3 and save it to the Library. Rename this scenario as All3. Double click it to see the detailed output. We can observe that the power drops from 80% to 72%.
The three scenarios created so far can be compared in a tabular manner as well. Select the three nodes in the Library, click the icon in the toolbar and select “Power” from the dropdown. A table as shown below will be created by East.

29.2.4 Simulating under Different Alternatives

Since this is a simulation based design, we can perform sensitivity analyses by changing some of the inputs and observing the effects on the overall power and other output. Let us first make sure that this design preserves the total type-1 error. It can be done by running the simulations under the “Null” hypothesis.

Click the button on the bottom left corner of the screen. Go to the Response Generation tab and enter the inputs as shown below:

Also set r = 2 in the Treatment Selection tab. Run the simulations and go to the detailed output by saving the row from Output Preview to the Library. Notice that the global power and the simulated FWER are less than the design type-1 error, which means the overall type-1 error is preserved.

29.3 Sample Size Re-estimation

As we have seen above, the desired power of 80% is achieved with the sample size of 700 if the initial assumptions (πc = 0.1, π1mg = 0.14, π2.5mg = 0.18, π10mg = 0.22) hold true. But if they do not, then the original sample size of 700 may be insufficient to achieve 80% power. Adaptive sample size re-estimation is suited to this purpose. In this approach we start out with a sample size of 700 subjects, but take an interim look after data are available on 400 subjects.
The purpose of the interim look is not to stop the trial early but rather to examine the interim data and continue enrolling past the planned 700 subjects if the interim results are promising enough to warrant the additional investment of sample size. This strategy has the advantage that the sample size is finalized only after a thorough examination of data from the actual study rather than through making a large up-front sample size commitment before any data are available. Furthermore, if the sample size may only be increased but never decreased from the originally planned 700 subjects, there is no loss of efficiency due to overruns.

Suppose the proportions of response on the four arms are as shown below. Update the Response Generation tab accordingly and also set the seed as 100 in the Simulation Controls tab.

Run 10000 simulations and save the simulation row to the Library by clicking the icon in the toolbar.

Notice that the global power has dropped from 80% to 67%. Let us re-estimate the sample size to achieve the desired power. Add the Sample Size Re-estimation tab by clicking the button. A new tab is added as shown below.

SSR At: For a K-look group sequential design, one can decide the time at which conditions for adaptations are to be checked and the actual adaptation is to be carried out. This can be done either at some intermediate look or after some specified information fraction. The possible values of this parameter depend upon the user choice. The default choice for this design is always Look # and is fixed at 1 since it is always a 2-look design.
Target CP for Re-estimating Sample Size: The primary driver for increasing the sample size at the interim look is the desired (or target) conditional power or probability of obtaining a positive outcome at the end of the trial, given the data already observed. For this example we have set the conditional power at the end of the trial to be 80%. East then computes the sample size that would be required to achieve this desired conditional power. Maximum Sample Size if Adapt (multiplier; total): As just stated, a new sample size is computed at the interim analysis on the basis of the observed data so as to achieve some target conditional power. However the sample size so obtained will be overruled unless it falls between pre-specified minimum and maximum values. For this example, the range of allowable sample sizes is [700, 1400]. If the newly computed sample size falls outside this range, it will be reset to the appropriate boundary of the range. For example, if the sample size needed to achieve the desired 80% conditional power is less than 700, the new sample size will be reset to 700. In other words we will not decrease the sample size from what was specified initially. On the other hand, the upper bound of 1400 subjects demonstrates that the sponsor is prepared to increase the sample size up to double the initial investment in order to achieve the desired 80% conditional power. But if 80% conditional power requires more than 1400 subjects, the sample size will be reset to 1400, the maximum allowed. Promising Zone Scale: One can define the promising zone as an interval based on conditional power, test statistic, or estimated δ. The input fields change according to this choice. The decision of altering the sample size is taken based on whether the interim value of conditional power / test statistic / δ lies in this interval or not. Let us keep the default scale which is Conditional Power. 
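The range rule just described (never go below the planned sample size, never exceed the allowed maximum) amounts to a simple clamp. The sketch below illustrates only that rule, with a hypothetical function name; how East computes the sample size required for the target conditional power is not shown here.

```python
def clamp_sample_size(required_n, planned_n=700, multiplier=2.0):
    """Restrict a re-estimated total sample size to [planned_n, planned_n * multiplier]."""
    max_n = int(round(planned_n * multiplier))  # 1400 in this example
    return min(max(required_n, planned_n), max_n)


# Hypothetical required sample sizes computed at the interim look.
for required in (650, 900, 2000):
    print(required, "->", clamp_sample_size(required))
```

A required size of 650 is reset to 700, 900 is accepted as is, and 2000 is capped at 1400.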
Promising Zone: Minimum/Maximum Conditional Power (CP): The sample size will only be altered if the estimate of CP at the interim analysis lies in a pre-specified range, referred to as the “Promising Zone”. Here the promising zone is 0.30 − 0.80. The idea is to invest in the trial in stages. Prior to the interim analysis the sponsor is only committed to a sample size of 700 subjects. If, however, the results at the interim analysis appear reasonably promising, the sponsor would be willing to make a larger investment in the trial and thereby improve the chances of success. Here we have somewhat arbitrarily set the lower bound for a promising interim outcome to be CP = 0.30. An estimate CP < 0.30 at the interim analysis is not considered promising enough to warrant a sample size increase. It might sometimes be desirable to also specify an upper bound beyond which no sample size change will be made. Here we have set that upper bound of the promising zone at CP = 0.80. In effect we have partitioned the range of possible values for conditional power at the interim analysis into three zones: unfavorable (CP < 0.3), promising (0.3 ≤ CP < 0.8), and favorable (CP ≥ 0.8). Sample size adaptations are made only if the interim CP falls in the promising zone at the interim analysis. The promising zone defined on the Test Statistic scale or the Estimated δ scale works similarly.

SSR Function in Promising Zone: The behavior in the promising zone can either be defined by a continuous function or a step function. The default is continuous, where East accepts the two quantities (Multiplier, Target CP) and re-estimates the sample size depending upon the interim value of CP/test statistic/effect size. The SSR function can be defined as a step function as well. This can be done with a single piece or with multiple pieces.
For each piece, define the step function in terms of:

the interval of CP/test statistic/effect size. This depends upon the choice of promising zone scale.

the value of the re-estimated sample size in that interval.

For a single piece, just the total re-estimated sample size is required as an input. If the interim value of CP/test statistic/effect size lies in the promising zone then the re-estimation will be done using this step function.

Let us set the inputs on the Sample Size Re-estimation tab as shown below:

Run 10000 simulations and see the Details.

Just for comparison, re-run the simulations but this time set the multiplier in the Sample Size Re-estimation tab to 1, which means we are not interested in sample size re-estimation. Both the scenarios can also be run by entering two values 1, 2 in the cell for Multiplier.

(Output screens: With Sample Size Re-estimation; Without Sample Size Re-estimation)

We observe from the table that the power of the adaptive implementation is approximately 75%, which is almost 8 percentage points higher than that of the non-adaptive design. This increase in power has come at an average cost of 805 − 700 = 105 additional subjects. Next we observe from the Zone-wise Averages table that 1563 of 10000 trials (16%) underwent sample size re-estimation and of those 1563 trials, 84% were able to reject the global null hypothesis. The average sample size, conditional on adaptation, is 1376.

29.4 Adding Early Stopping Boundaries

One can also incorporate stopping boundaries to stop early at the interim look for efficacy or futility. The efficacy boundary can be defined on the Adjusted p-value scale whereas the futility boundary can be on the Adjusted p-value or δ scale.

Click the button on the bottom left corner of the screen.
This will take you back to the input window of the last simulation scenario. Go to the Boundary tab and set the Efficacy and Futility boundaries to “Adjusted p-value”. These boundaries are for early stopping at look 1. As the note on this tab says:

If any one adjusted p-value is ≤ the efficacy p-value boundary, then stop the trial for efficacy.

Only if all the adjusted p-values are > the futility p-value boundary, then stop the trial for futility.

Else carry forward all the treatments to the next step of treatment selection.

Stopping early for efficacy or futility is a step which is carried out before the treatment selection rules are applied. The simulation output has the same explanation as above, except that the Lookwise Summary table may have some trials stopped at the first look due to efficacy or futility.

29.5 Monitoring this trial

Select the simulation node with SSR implementation and click the icon. It will invoke the Interim Monitoring dashboard. Click the icon to open the Test Statistic Calculator. Enter the data as shown below:

Click Recalc to calculate the test statistic as well as the raw p-values. Notice that the p-value for 1mg is 0.095, which is greater than 0.025. We will drop this dose in the second stage. On clicking OK, it updates the dashboard.

Open the test statistic calculator for the second look, enter the following information and also drop the dose 1mg. Click Recalc to calculate the test statistic as well as the raw p-values. On clicking OK, it updates the dashboard.

Observe that the adjusted p-value for 10mg crosses the efficacy boundary. It can also be observed in the Stopping Boundaries chart.
30 Binomial Superiority Regression

30.1 Logistic Regression with Single Normal Covariate

Logistic regression is widely used for modeling the probability of a binary response in the presence of covariates. In this section we will show how East may be used to design clinical trials with binomial endpoints, while adjusting for the effects of covariates through the logistic regression model. The sample size calculations for the logistic regression models discussed here and implemented in East are based on the methods of Hsieh et al., 1997. We note, however, that these methods are limited to continuous covariates only. When the covariate is normal, the log odds value β1 is zero if and only if the group means between the two response categories are the same, assuming equal variances.

Suppose in a logistic regression model, Y is a binary response variable and X1 is a covariate related to Y. The model is given by

$$ \log\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 X_1 \tag{30.1} $$

where P = P(Y = 1). The null hypothesis that the coefficient of the covariate β1 is zero is tested against the two sided alternative hypothesis that β1 is not equal to zero. The slope coefficient β1 is the change in log odds for every one unit increase in X1. The sample size required for a two sided test with type-I error rate of α to have a power 1 − β is

$$ n = \frac{\left(Z_{1-\alpha/2} + Z_{1-\beta}\right)^2}{P_1 (1 - P_1)\, \beta^{*2}} \tag{30.2} $$

where β* is the effect size to be tested, P1 is the event rate at the mean of X and Zu is the upper u-th percentile of the standard normal distribution.

30.1.1 Trial Design

We use a Department of Veterans Affairs Cooperative Study entitled ’A Psychophysiological Study of Chronic Post-Traumatic Stress Disorder’ to illustrate the preceding sample size calculation for logistic regression with continuous covariates.
The study developed and validated a logistic regression model to explore the use of certain psychophysiological measurements for the prognosis of combat-related post-traumatic stress disorder (PTSD). In the study, four psychophysiological measurements (heart rate, blood pressure, EMG and skin conductance) were recorded while patients were exposed to video tapes containing combat and neutral scenes. Among the psychophysiological variables, the difference of the heart rates obtained while viewing the combat and the neutral tapes (DCNHR) is considered a good predictor of the diagnosis of PTSD. The prevalence rate of PTSD among the Vietnam veterans was assumed to be 20 per cent. Therefore, we assumed a four to one sample size ratio for the non-PTSD versus PTSD groups. The effect size of DCNHR is approximately 0.3, which is the difference of the group means divided by the standard deviation. We would like to determine the sample size to achieve 90% power based on a two-sided test at significance level 0.05 (Hsieh et al., 1998).

Start East. Click Design tab, then click Regression in the Discrete group, and then click Logistic Regression - Odds Ratio.

The input dialog box, with default input values, will appear in the upper pane of this window. Enter 0.2 in Proportion Success at X = µ, (P0) and 1.349 in the Odds Ratio P1(1 − P0)/[P0(1 − P1)] field. Enter the rest of the inputs as shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed sample size highlighted in yellow. A total of 733 subjects must be enrolled in order to achieve 90% power under the alternative hypothesis. Besides sample size, one can also compute the power and the level of significance for this design.

You can select this design by clicking anywhere on the row in the Output Preview.
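Equation (30.2) can be checked by hand for this example. The sketch below is a direct transcription of the formula, taking β* as the natural log of the odds ratio 1.349 (about 0.3) and P1 as the event rate 0.2 at the mean of the covariate; the function name is hypothetical, and rounding up to the next integer is an assumption about how the result is reported.

```python
from math import ceil, log
from statistics import NormalDist


def logistic_sample_size(p1, odds_ratio, alpha=0.05, power=0.90):
    """Sample size from equation (30.2) for a two-sided test."""
    z = NormalDist().inv_cdf           # standard normal quantile function
    beta_star = log(odds_ratio)        # effect size on the log odds scale
    n = (z(1 - alpha / 2) + z(power)) ** 2 / (p1 * (1 - p1) * beta_star ** 2)
    return ceil(n)                     # assumption: round up to whole subjects


n_required = logistic_sample_size(p1=0.2, odds_ratio=1.349)
print(n_required)
```

Under these inputs the formula gives 733 subjects, in agreement with the value reported in the Output Preview.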
If you click on the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to workbook Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon to see the detailed output as shown below:

Observe that this kind of output gives us the summary of the output as well. With Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs. Sample Size. The resulting power curve for this design is shown. You can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.

31 Agreement

31.1 Cohen’s Kappa

In some experimental situations, to check inter-rater reliability, independent sets of measurements are taken by more than one rater and the responses are checked for agreement. For a binary response, Cohen’s Kappa test for binary ratings can be used to check inter-rater reliability. Conventionally, the kappa coefficient is used to express the degree of agreement between two raters when the same two raters rate each of a sample of n subjects independently, with the ratings being on a categorical scale consisting of k categories (Fleiss, 1981). A simple example is given in the table below, where two tests, Test 1 and Test 2 (k = 2), were performed.
In the table below, πij denotes the true population proportion in the i-th row and the j-th column category.

Table 31.1: Table of proportions of two raters

Test 1 \ Test 2         Test 1(+)   Test 1(-)   Marginal Probability
Test 2(+)               π11         π21         π.1
Test 2(-)               π12         π22         π.2
Marginal Probability    π1.         π2.         1

The Kappa coefficient (κ) is defined by

$$\kappa = \frac{\pi_0 - \pi_e}{1 - \pi_e} \tag{31.1}$$

where $\pi_0 = \sum_{i=1}^{2} \pi_{ii}$ and $\pi_e = \sum_{i=1}^{2} \pi_{i\cdot}\,\pi_{\cdot i}$.

We want to test the null hypothesis H0 : κ ≤ κ0 against H1 : κ > κ0, where κ0 > 0. The total sample size required for a test with type-I error rate α to have power 1 − β is

$$n = \frac{(z_\alpha + z_\beta)^2\,(E + F - G)}{\left[(1 - \pi_e)^2 (\kappa - \kappa_0)\right]^2} \tag{31.2}$$

where

$$E = \sum_{i=1}^{2} \pi_{ii}\left[(1 - \pi_e) - (\pi_{\cdot i} + \pi_{i\cdot})(1 - \pi_0)\right]^2 \tag{31.3}$$

$$F = (1 - \pi_0)^2 \sum_{i=1}^{2} \sum_{j \neq i} \pi_{ij}\,(\pi_{\cdot i} + \pi_{j\cdot})^2 \tag{31.4}$$

and

$$G = \left[\pi_0 (1 + \pi_e) - 2\pi_e\right]^2 \tag{31.5}$$

You can calculate the power, sample size, or level of significance for a Cohen’s Kappa test for two ratings.

31.1.1 Trial Design

Consider responses from two raters. The example is based on a study to develop and validate a set of clinical criteria to identify patients with minor head injury who do not need to undergo a CT scan (Haydel et al., 2000). In the study, each CT scan was first reviewed by a staff neuroradiologist. An independent staff radiologist then reviewed 50 randomly selected CT scans, and the two sets of responses were checked for agreement. Let κ denote the level of agreement. The null hypothesis is H0 : κ = 0.9 versus the one-sided alternative hypothesis H1 : κ < 0.9. We wish to compute the power of the test at the alternative value κ1 = 0.6. We expect each rater to identify 8% of CT scans as positive. We also expect 5% of CT scans to be rated positive by both raters.

Start East. Click the Design tab, then click Agreement in the Discrete group, and then click Cohen’s Kappa (Two Binary Ratings).
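As a quick check of definition (31.1), the sketch below computes κ from a table of joint proportions. It assumes the example's "5%" is the joint proportion of scans rated positive by both raters (π11 = 0.05), so that the remaining cells follow from the 8% marginals. The function name `cohen_kappa` is hypothetical and is not part of East.

```python
def cohen_kappa(table):
    """Cohen's kappa from a square table of joint proportions pi[i][j]."""
    k = len(table)
    row = [sum(table[i]) for i in range(k)]                       # pi_{i.}
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]  # pi_{.j}
    p_obs = sum(table[i][i] for i in range(k))                    # pi_0
    p_exp = sum(row[i] * col[i] for i in range(k))                # pi_e
    return (p_obs - p_exp) / (1 - p_exp)


# Assumed reading of the example: each rater calls 8% of scans positive,
# and 5% of scans are positive for both raters.
pi = [[0.05, 0.03],
      [0.03, 0.89]]
kappa = cohen_kappa(pi)
print(round(kappa, 3))  # 0.592
```

Under this reading the implied agreement is about 0.59, close to the alternative value κ1 = 0.6 used in the example.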
The input dialog box, with default input values, will appear in the upper pane of this window. Enter 0.9 in the Null Agreement (κ0) field. Specify α = 0.05, the sample size, and the kappa parameter values as shown below. Enter the rest of the inputs as shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed power highlighted in yellow. The power of the test is 64.9% given a sample size of 50 scans to establish agreement of ratings by the two radiologists. Besides power, one can also compute the sample size for this study design.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to workbook Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs. Sample Size. The resulting power curve for this design is shown. You can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.

31.2 Cohen’s Kappa (C Ratings)

Let κ denote the measure of agreement between two raters who each classify n objects into C mutually exclusive ratings (categories). Here the null hypothesis H0 : κ = κ0 is tested against the two-sided hypothesis H1 : κ ≠ κ0 or a one-sided hypothesis H1 : κ > κ0 or H1 : κ < κ0.
The total sample size required for a test with type-I error rate α to have power 1 − β when κ = κ1 is

$$n \geq \left[\frac{Z_{1-\alpha}\,\max \tau(\hat\kappa \mid \kappa = \kappa_0) + Z_{1-\beta}\,\max \tau(\hat\kappa \mid \kappa = \kappa_1)}{\kappa_1 - \kappa_0}\right]^2 \tag{31.6}$$

where

$$\tau(\hat\kappa) = \frac{(Q_1 + Q_2 - 2Q_3 - Q_4)^{1/2}}{(1 - \pi_e)^2} \tag{31.7}$$

and

$$Q_1 = \pi_0 (1 - \pi_e)^2,$$
$$Q_2 = (1 - \pi_0)^2 \sum_{i=1}^{C} \sum_{j=1}^{C} \pi_{ij}\,(\pi_{i\cdot} + \pi_{\cdot j})^2,$$
$$Q_3 = 2(1 - \pi_0)(1 - \pi_e) \sum_{i=1}^{C} \pi_{ii}\,(\pi_{i\cdot} + \pi_{\cdot i}),$$
$$Q_4 = (\pi_0 \pi_e - 2\pi_e + \pi_0)^2.$$

Here πij is the proportion of subjects that Rater 1 places in category i but Rater 2 places in category j, π0 is the proportion of agreement, and πe is the expected proportion of agreement. The power of the test is given by

$$\text{Power} = \Phi\left[\frac{\sqrt{n}\,(\kappa_1 - \kappa_0) - Z_{1-\alpha}\,\max \tau(\hat\kappa \mid \kappa = \kappa_0)}{\max \tau(\hat\kappa \mid \kappa = \kappa_1)}\right] \tag{31.8}$$

31.2.1 Trial Design

Consider a hypothetical problem of physical health ratings from two different raters: a health instructor and the subject’s general practitioner. A total of 360 subjects were randomly selected and the two sets of responses were checked for agreement. Let κ denote the level of agreement. The null hypothesis is H0 : κ = 0.6 versus the one-sided alternative hypothesis H1 : κ < 0.6. We wish to compute the power of the test at the alternative value κ1 = 0.5.

Table 31.2: Contingency Table

General Practitioner \ Health Instructor   Poor   Fair   Good   Excellent   Total
Poor                                          2      9      4           1      16
Fair                                         12     35     36           8      91
Good                                          8     43    103          30     184
Excellent                                     0      7     40          22      69
Total                                        22     94    183          61     360

Start East. Click the Design tab, then click Agreement in the Discrete group, and then click Cohen’s Kappa (Two Categorical Ratings).

The input dialog box, with default input values, will appear in the upper pane of this window. Enter Number of Ratings (C) as 4. Enter 0.6 in the Null Agreement (κ0) field and 0.5 in the Alternative Agreement (κ1) field. Click Marginal Probabilities and specify the marginal probabilities calculated from the above table.
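The marginal probabilities requested by the dialog can be derived directly from Table 31.2. The following is a small sketch of that arithmetic; the variable names are illustrative, and rows are taken to be the general practitioner's ratings and columns the health instructor's ratings, following the table header.

```python
# Counts from Table 31.2 (order: Poor, Fair, Good, Excellent).
counts = [
    [2, 9, 4, 1],
    [12, 35, 36, 8],
    [8, 43, 103, 30],
    [0, 7, 40, 22],
]

n = sum(sum(row) for row in counts)  # total number of subjects
row_marg = [sum(row) / n for row in counts]
col_marg = [sum(row[j] for row in counts) / n for j in range(len(counts))]

print(n)                                # 360
print([round(p, 4) for p in row_marg])  # [0.0444, 0.2528, 0.5111, 0.1917]
print([round(p, 4) for p in col_marg])  # [0.0611, 0.2611, 0.5083, 0.1694]
```

Each set of marginals sums to 1, which is a useful check before entering the values.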
Specify the sample size. Leave all other values as defaults, and click Compute.

The design output will be displayed in the Output Preview, with the computed power highlighted in yellow. The power of the test is 73.3% given a sample size of 360 subjects to establish agreement of ratings by the two raters. Besides power, one can also compute the sample size for this study design.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to workbook Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs. Sample Size. The resulting power curve for this design is shown. You can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.

32 Dose Escalation

This chapter deals with the design, simulation, and interim monitoring of Phase 1 dose escalation trials. A brief overview of the designs is given below; more technical details are available in Appendix N.

One of the primary goals of Phase 1 trials in oncology is to find the maximum tolerated dose (MTD). To date, the vast majority of such trials have employed traditional dose escalation methods such as the 3+3 design.
The 3+3 design starts by allocating three patients, typically to the lowest dose level, and then adaptively moves up and down in subsequent cohorts until either the MTD is obtained or the trial is stopped for excessive toxicity. In addition to the 3+3, East also provides the Continual Reassessment Method (CRM), the modified Toxicity Probability Interval (mTPI) method, and the Bayesian logistic regression model (BLRM) for single-agent designs. Compared to the 3+3, these modern methods may offer a number of advantages, which can be explored systematically via simulation and interim monitoring.

The CRM (Goodman et al., 1995; O’Quigley et al., 1990) is a Bayesian model-based method that uses all available information from all doses to guide dose assignment. One first specifies a target toxicity, a one-parameter dose response curve, and a corresponding prior distribution. The posterior mean and predictions for the probability of toxicity at each dose are updated as the trial progresses. The next recommended dose is the one whose toxicity probability is closest to the target toxicity.

The mTPI method (Ji et al., 2010) is Bayesian like the CRM, but rule-based like the 3+3. In this way, the mTPI represents a useful compromise between the other methods. An independent beta distribution is assumed for the probability of toxicity at each dose. A set of decision intervals is specified, and subsequent dosing decisions (up, down, or stay) are determined by computing the normalized posterior probability in each interval at the current dose. The normalized probability for each interval is known as the unit probability mass (UPM).

A more advanced version of the CRM is the BLRM (Neuenschwander et al., 2008; Sweeting et al., 2013), which assumes a two-parameter logistic dose response curve. In addition to a target toxicity, one specifies a set of decision intervals, and optional associated losses, for guiding dosing decisions.
For dual-agent combination designs, East provides a combination version of the BLRM (Neuenschwander et al., 2014), as well as the PIPE (product of independent beta probabilities escalation) method (Mander & Sweeting, 2015).

32.1 3+3

32.1.1 Simulation

Click Discrete: Dose Escalation on the Design tab, and then click Single Agent Design: 3+3.

This window is the Input dialog box, which is separated into three tabs: Design Parameters, Response Generation, and Simulation Control. First, you may specify the Max. Number of Doses as 7. In the Design Parameters tab, enter 30 as the Max. Sample Size. For the 3+3 design, the Cohort Size is fixed at 3.

Three variants of the 3+3 are offered: L, H, and L (modified). The key differences between these variants can be seen in the respective Decision Rules table. Select 3+3 L. You also have the option of starting with an Accelerated Titration design (Simon et al., 1997), which escalates with single-patient cohorts until the first DLT is observed, after which the cohort is expanded at the current dose level with two more patients.

In the Response Generation tab, you can specify a set of true dose response curves from which to simulate. For the Starting Dose, select the lowest dose (5). In the row titled Dose, you can specify the dose levels (e.g., in mg). In the row titled GC1, you can edit the true probabilities of toxicity at each dose. You can also rename the profile by directly editing that cell. For now, leave all entries at their default values.

You can add a new profile generated from a parametric curve family. For example, click on the menu Curve Family and select Emax. You may construct a
four-parameter Emax curve by adjusting its parameters, then click Add Profile. Click Plot Profiles to plot the two dose toxicity curves in this grid.

In the Simulation Control tab, check the boxes corresponding to Save summary statistics and Save subject-level data. These options will provide access to several charts derived from these more detailed levels of simulated data. If you wish to display subject-level plots for more than one simulation, you can increase the number. For now, leave this at 1 to save computation time.

You may also like to examine the Local Options button of the input window toolbar. This gives you different options for computing average allocations for each dose.

Click Simulate. East will simulate data generated from the two profiles you specified, and apply the 3+3 design to each simulation data set. Once completed, the two simulations will appear as two rows in the Output Preview. Select both rows in the Output Preview and click the icon in the toolbar. The two simulations will be displayed side by side in the Output Summary.

In the Output Preview toolbar, click the icon to save both simulations to the Library. Double-click Sim1 in the Library to display the simulation output details.

With Sim1 selected in the Library, click the Plots icon to access a wide range of available plots. Below is an example of the MTD plot, showing the percentage of simulations in which each dose level was selected as the MTD. The "true" MTD is displayed as the second dose level. This is the dose whose true probability of DLT (0.1) was closest to and below the target probability (1/6).

Another useful plot shows the possible sample sizes as percentages over all simulations.
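To make the simulation step concrete, the sketch below estimates how often each dose would be selected as the MTD under a generic textbook 3+3 rule. This is an assumption-laden illustration: the rule coded here is not necessarily identical to East's 3+3 L, H, or L (modified) variants, the function names are hypothetical, and the true DLT probabilities are illustrative values rather than East's default profile.

```python
import random


def simulate_3plus3(true_dlt_probs, rng, cohort=3, max_n=30):
    """Run one trial; return the index of the selected MTD, or None."""
    n_doses = len(true_dlt_probs)
    treated = [0] * n_doses
    dlts = [0] * n_doses
    too_toxic = n_doses  # lowest dose index judged too toxic (none yet)
    d = 0                # start at the lowest dose
    while sum(treated) + cohort <= max_n:
        treated[d] += cohort
        dlts[d] += sum(rng.random() < true_dlt_probs[d] for _ in range(cohort))
        if dlts[d] >= 2:                   # two or more DLTs: too toxic
            too_toxic = d
            if d == 0:
                return None                # lowest dose is too toxic
            if treated[d - 1] >= 2 * cohort:
                return d - 1               # dose below already has 6 patients
            d -= 1                         # otherwise expand the dose below
        elif dlts[d] == 1 and treated[d] == cohort:
            continue                       # 1 DLT in 3: add 3 more at this dose
        else:                              # 0 of 3, or at most 1 of 6: tolerated
            if d + 1 >= too_toxic:
                return d                   # no higher tolerable dose remains
            d += 1
    return None                            # sample size exhausted, MTD undetermined


def mtd_selection_percentages(true_dlt_probs, n_sims=10000, seed=1):
    """Percentage of simulated trials selecting each dose index (None = no MTD)."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_sims):
        mtd = simulate_3plus3(true_dlt_probs, rng)
        counts[mtd] = counts.get(mtd, 0) + 1
    return {dose: 100.0 * c / n_sims for dose, c in counts.items()}


# Hypothetical true DLT probabilities for seven doses.
print(mtd_selection_percentages([0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45]))
```

The resulting dictionary corresponds in spirit to the MTD plot: each entry is the share of simulated trials that ended with that dose declared as the MTD.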
Close each plot after viewing, or save them by clicking Save in Workbook. Finally, to save the workbook to disk, right-click Wbk1 in the Library and then click Save As....

32.1.2 Interim Monitoring

Right-click one of the Simulation nodes in the Library, and select Interim Monitoring. This will open an empty interim monitoring dashboard.

Click Enter Interim Data to open a window in which to enter data for the first cohort: in particular, the Dose Assigned and the DLTs Observed. Click OK to continue.

The dashboard will be updated accordingly, and the next Recommended Dose is 10. Click Enter Interim Data again, with 10 selected as Dose Assigned, enter 2 for DLTs Observed, and click OK.

East now recommends de-escalation to 5. Click Enter Interim Data, with 5 selected as Dose Assigned, enter 1 for DLTs Observed, and click OK. East recommends that you stop the trial.

Click Stop Trial to generate a table for final inference.

32.2 Continual Reassessment Method (CRM)

32.2.1 Simulation

Click Discrete: Dose Escalation on the Design tab, and then click Single Agent Design: Continual Reassessment Method.

This window is the Input dialog box, which is separated into four tabs: Design Parameters, Stopping Rules, Response Generation, and Simulation Control. In the Design Parameters tab, enter 30 as the Maximum Sample Size, and 3 for Cohort Size. If you were to check the box Start With, then you would be simulating from the 3+3 or Accelerated Titration design first, before switching to the CRM. For this tutorial, however, leave the box unchecked.
Enter 0.25 for the Target Probability of Toxicity, and 0.3 for the Target Probability Upper Limit. This will ensure that the next dose assigned is the one whose posterior mean toxicity probability is closest to 0.25 and below 0.3.

Click the Posterior Sampling... button. By default, CRM requires the posterior mean only. If instead you wish to sample from the posterior distribution (using a Metropolis-Hastings algorithm), you will be able to compute and plot the posterior probabilities of being the MTD for each dose. Note that this option will increase the simulation time.

Click the Dose Skipping... button. As was recommended in later variations of CRM, in the interests of promoting safety, leave the default options: no untried doses will be skipped while escalating, and no dose escalation will occur when the most recent subject experienced a DLT.

For Model Type, select Power, with a Gamma(α = 1, β = 1) prior for θ. Other model types available include the Logistic and the Hyperbolic Tangent. Finally, for the prior probabilities of toxicity of all doses (known as the skeleton), enter: 0.05, 0.1, 0.2, 0.3, 0.35, 0.4, and 0.45. Click the icon to generate a chart of the 95% prior intervals at each dose for probability of DLT.

In the Stopping Rules tab, you may specify various rules for stopping the trial. Enter the following inputs as below.

The early stopping rules are divided into two types: those where the MTD is not determined, and those where the MTD is determined. The former case may arise when the MTD is estimated to be below the lowest dose or above the highest dose.
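The dose-assignment step configured above can be sketched numerically. The example below assumes the common power-model form p_i = skeleton_i ** θ with a Gamma(1, 1) prior on θ, and obtains posterior means by simple grid integration. The manual does not give East's exact parameterization or numerical method, so treat the function names (`crm_posterior_mean_tox`, `crm_next_dose`) and the integration scheme as illustrative assumptions; dose-skipping restrictions and stopping rules are deliberately left out.

```python
import math


def crm_posterior_mean_tox(skeleton, n_treated, n_dlt, grid=20000, theta_max=20.0):
    """Posterior mean DLT probability at each dose, by midpoint-rule integration.

    Assumed model: p_i = skeleton_i ** theta, theta ~ Gamma(shape=1, rate=1).
    """
    step = theta_max / grid
    norm = 0.0
    acc = [0.0] * len(skeleton)
    for k in range(grid):
        theta = (k + 0.5) * step
        log_w = -theta  # log prior density of Gamma(1, 1), up to a constant
        for s, n, y in zip(skeleton, n_treated, n_dlt):
            p = s ** theta
            log_w += y * math.log(p) + (n - y) * math.log1p(-p)  # binomial log-likelihood
        w = math.exp(log_w)
        norm += w
        for i, s in enumerate(skeleton):
            acc[i] += w * s ** theta
    return [a / norm for a in acc]


def crm_next_dose(post_mean, target=0.25, upper=0.30):
    """Index of the dose closest to the target among those below the upper limit."""
    eligible = [i for i, p in enumerate(post_mean) if p < upper]
    if not eligible:
        return None
    return min(eligible, key=lambda i: abs(post_mean[i] - target))


skeleton = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.45]
# Hypothetical interim data: three patients at the lowest dose, no DLTs.
n_treated = [3, 0, 0, 0, 0, 0, 0]
n_dlt = [0, 0, 0, 0, 0, 0, 0]
post = crm_posterior_mean_tox(skeleton, n_treated, n_dlt)
print([round(p, 3) for p in post], crm_next_dose(post))
```

With no data at all, the posterior equals the prior, and under these assumptions the prior mean at a skeleton value s is 1 / (1 − ln s), which gives a convenient sanity check on the integration.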
If the posterior probability of overdosing (toxicity at the lowest dose is greater than the target toxicity) exceeds 0.8, then the trial will be stopped. Similarly, if the posterior probability of underdosing (toxicity at the highest dose is lower than the target toxicity) exceeds 0.9, then the trial will be stopped. A minimum of 6 subjects will need to be observed on a dose before either of these two rules is activated.

A further stopping rule is based on the Allocation Rule: if the number of subjects already allocated to the current MTD is at least 9, the trial will be stopped.

In the Response Generation tab, you can specify a set of true dose response curves from which to simulate. For the Starting Dose, select the lowest dose (5). Leave the default profile as shown below. If you wish to edit or add additional profiles (dose response curves), see the corresponding section for the 3+3 design.

In the Simulation Control tab, check the boxes corresponding to Save summary statistics and Save subject-level data. These options will provide access to several charts derived from these more detailed levels of simulated data. If you wish to display subject-level plots for more than one simulation, you can increase the number. For now, leave this at 1 to save computation time.

Click Simulate to simulate the CRM design. In the Output Preview toolbar, click the icon to save the simulation to the Library. Double-click the simulation node in the Library to display the simulation output details.

Click the Plots icon in the Library to access a wide range of available plots. Below is an example of the MTD plot, showing the percentage of simulations in which each dose level was selected as the MTD. The true MTD is displayed as the third dose level (15).
This is the dose whose true probability of DLT (0.2) was closest to and below the target probability (0.25).

32.2.2 Interim Monitoring

Right-click the Simulation node in the Library, and select Interim Monitoring. This will open an empty interim monitoring dashboard.

Click Enter Interim Data to open a window in which to enter data for the first cohort: in particular, the Dose Assigned and the DLTs Observed. Click OK to continue.

Continue in this manner by clicking Enter Interim Data, entering the following doses, and the corresponding number of DLTs.

If you click Display by Dose, you will see the data grouped by dose level. You may click Display by Cohort to return to the original view.

After each cohort, East will update the Interim Monitoring Dashboard. You may replace the IM dashboard plots with any other plots or corresponding tables, by clicking on the associated icons at the top left of each panel. At this point, East recommends that you stop the trial.

Click Stop Trial to generate a table for final inference.

32.3 modified Toxicity Probability Interval (mTPI)

32.3.1 Simulation

Click Discrete: Dose Escalation on the Design tab, and then click Single Agent Design: Modified Toxicity Probability Interval.

This window is the Input dialog box, which is separated into five tabs: Design Parameters, Stopping Rules, Trial Monitoring Table, Response Generation, and Simulation Control. In the Design Parameters tab, enter 30 as the Maximum Sample Size, and 3 for Cohort Size.
If you were to check the box Start With, then you would be simulating from the 3+3 or Accelerated Titration design first, before switching to the mTPI. For this tutorial, however, leave the box unchecked.

Enter 0.25 for the Target Probability of Toxicity, 0.2 for the upper limit of the Under dosing interval, and 0.3 for the upper limit of the Proper dosing interval. These entries imply that toxicity probabilities within the interval [0.2, 0.3] can be regarded as equivalent to the target toxicity (0.25) as far as dosing decisions are concerned. Finally, we will assume a uniform Beta(a = 1, b = 1) prior distribution for all doses.

In the Stopping Rules tab, enter the following inputs as below.

For the mTPI design, the stopping rule is based on dose exclusion rules. The rule states that if there is a greater than 0.95 posterior probability that the toxicity for a given dose is greater than the target toxicity, that dose and all higher doses will be excluded in subsequent cohorts. When this dose exclusion rule applies to the lowest dose, then all doses are excluded, and hence the trial will be stopped for excessive toxicity. Furthermore, the dose exclusion rule is not activated until at least 3 subjects are observed on a dose.

A similar idea can be applied to the highest dose: if there is a greater than 95% posterior probability that the toxicity at the highest dose is less than the target toxicity, then stop the trial early.

The remaining stopping rules allow one to stop the trial early with the MTD determined. The Allocation Rule requires a certain number of subjects already allocated to the next recommended dose. The CI Rule requires that the credible interval for the probability of DLT at the MTD is within some range.
The Target Rule requires that the posterior probability of being in the target toxicity, or proper dosing interval, exceeds some threshold. Finally, any of these rules can be combined with Minimum Ss Observed in the Trial.

In the Trial Monitoring Table tab, you can view the decision table corresponding to the inputs entered in the previous tabs.

East also provides the option of creating and simulating from a customized trial monitoring table. If you click Edit Trial Monitoring Table, you can click on any cell in the grid to edit and change the dose assignment rule for that cell.

In the Response Generation tab, you can specify a set of true dose response curves from which to simulate. For the Starting Dose, select the lowest dose (5). Leave the default profile as shown below. If you wish to edit or add additional profiles (dose response curves), see the corresponding section for the 3+3 design.

In the Simulation Control tab, check the boxes corresponding to Save summary statistics and Save subject-level data. These options will provide access to several charts derived from these more detailed levels of simulated data. If you wish to display subject-level plots for more than one simulation, you can increase the number. For now, leave this at 1 to save computation time.

Click the Local Options button at the top left corner of the input window toolbar. This gives you different options for computing average allocations for each dose, and for computing isotonic estimates. Select the following options and click OK.

Click Simulate to simulate the mTPI design. In the Output Preview toolbar, click the icon to save the simulation to the Library.
Double-click the simulation node in the Library to display the simulation output details. For example, the true MTD was D3 (15), and this dose was selected as the MTD most often (43% of the time).

Click the Plots icon in the Library to access a wide range of available plots.

32.3.2 Interim Monitoring

Right-click one of the Simulation nodes in the Library, and select Interim Monitoring. This will open an empty interim monitoring dashboard. In the interim monitoring toolbar, click the chart icon, and then Trial Monitoring Table to generate a table to guide dosing decisions for this trial.

Click Enter Interim Data to open a window in which to enter data for the first cohort: in particular, the Dose Assigned and the DLTs Observed. Click OK to continue.

The dashboard will be updated accordingly. The decision for the next cohort is based on the highest Unit Probability Mass (UPM): the posterior probability for each toxicity interval divided by the length of the interval. The underdosing interval corresponds to an E (Escalate) decision, the proper dosing interval corresponds to an S (Stay) decision, and the overdosing interval corresponds to a D (De-escalate) decision. In this case, the UPM for underdosing is highest. Thus, the recommendation is to escalate to dose 10.

Continue in this manner by entering data for each subsequent cohort, and observe how the interim monitoring dashboard updates. One example is given below.

After each cohort, East will update the Interim Monitoring Dashboard. You may replace the IM dashboard plots with any other plots or corresponding tables, by clicking on the associated icons at the top left of each panel.

Suppose we wished to end the study after 8 cohorts (24 patients).
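The UPM-based decision described in this section can be reproduced with a few lines of arithmetic. The sketch below assumes the tutorial's settings (intervals split at 0.2 and 0.3, a Beta(1, 1) prior) and uses the binomial-sum identity for the Beta CDF, which is valid for integer parameters. The function names are hypothetical, and the dose exclusion rule and any customized trial monitoring table are not modeled.

```python
from math import comb


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    m = a + b - 1
    return sum(comb(m, j) * x ** j * (1 - x) ** (m - j) for j in range(a, m + 1))


def mtpi_decision(n, dlts, lower=0.2, upper=0.3, a0=1, b0=1):
    """Return (decision, UPM table) for n patients and dlts DLTs at the current dose."""
    a, b = a0 + dlts, b0 + n - dlts  # Beta posterior parameters
    c_lo = beta_cdf_int(lower, a, b)
    c_hi = beta_cdf_int(upper, a, b)
    upm = {
        "E": c_lo / lower,                    # underdosing interval (0, lower)
        "S": (c_hi - c_lo) / (upper - lower), # proper dosing interval [lower, upper]
        "D": (1 - c_hi) / (1 - upper),        # overdosing interval (upper, 1)
    }
    return max(upm, key=upm.get), upm


for dlts in range(4):
    decision, upm = mtpi_decision(3, dlts)
    print(dlts, decision, {k: round(v, 3) for k, v in upm.items()})
```

For a first cohort of three patients with no DLTs, the underdosing interval has the largest UPM under these assumptions, which agrees with an Escalate recommendation.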
Click Stop Trial to end the study and generate a table of final inference.

32.4 Bayesian logistic regression model (BLRM)

32.4.1 Simulation

Click Discrete: Dose Escalation on the Design tab, and then click Single Agent Design: Bayesian Logistic Regression Model.

This window is the Input dialog box, which is separated into four tabs: Design Parameters, Stopping Rules, Response Generation, and Simulation Control. In the Design Parameters tab, enter 30 as the Maximum Sample Size, and 3 for Cohort Size. If you were to check the box Start With, then you would be simulating from the 3+3 or Accelerated Titration design first, before switching to the BLRM. For this tutorial, however, leave the box unchecked.

The next step is to choose a Dose Selection Method: either by Bayes Risk or by Max Target Toxicity. For the next cohort, the Bayes Risk method selects the dose that minimizes the posterior expected loss, also known as the Bayes risk. In contrast, the Max Target Toxicity method selects the dose that maximizes the posterior probability of targeted toxicity. For both methods, the dose selected must satisfy the EWOC (Escalation With Overdose Control) criterion: the posterior probability of overdosing (either excessive or unacceptable toxicity) must be less than the threshold (e.g., 0.25).

Recall that the BLRM method applies the following model:

$$\operatorname{logit}(\pi_d) = \log(\alpha) + \beta \log(d / d^*) \tag{32.1}$$

The Reference Dose (D*) is the dose at which the odds of observing a DLT is α.

Click the Dose Skipping button, and select Allow skipping any doses / No Restrictions.
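Two pieces of this section lend themselves to a short illustration: the dose-toxicity model (32.1) and the Max Target Toxicity rule with the EWOC constraint. The sketch below is a minimal example; the function names are hypothetical, and the interval probabilities passed to the selection function are made-up numbers standing in for posterior summaries that East would compute.

```python
import math


def blrm_dlt_prob(dose, alpha, beta, ref_dose):
    """DLT probability under logit(pi_d) = log(alpha) + beta * log(d / d*)."""
    logit = math.log(alpha) + beta * math.log(dose / ref_dose)
    return 1.0 / (1.0 + math.exp(-logit))


def max_target_toxicity_dose(doses, p_target, p_over, ewoc=0.25):
    """Dose with the largest P(target toxicity) among doses with P(overdose) < ewoc."""
    ok = [i for i in range(len(doses)) if p_over[i] < ewoc]
    if not ok:
        return None
    return doses[max(ok, key=lambda i: p_target[i])]


# At the reference dose the odds of a DLT equal alpha: 0.25 / 1.25 = 0.2.
print(round(blrm_dlt_prob(15, alpha=0.25, beta=1.0, ref_dose=15), 3))  # 0.2

# Hypothetical posterior interval probabilities for five doses.
doses = [5, 10, 15, 25, 35]
p_target = [0.10, 0.25, 0.40, 0.45, 0.30]
p_over = [0.01, 0.05, 0.15, 0.30, 0.55]
print(max_target_toxicity_dose(doses, p_target, p_over))  # 15
```

In the hypothetical numbers, dose 25 has the highest probability of targeted toxicity but fails the EWOC check, so the rule falls back to dose 15.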
You can specify the prior directly in terms of a bivariate normal distribution for log(α) and log(β). Alternatively, if you click Prior Calculator, a calculator will appear allowing you to specify a prior indirectly by one of three methods: (1) lowest dose and reference dose, (2) lowest dose and highest dose, or (3) lowest dose and MTD. Click Recalc to convert the prior inputs into matching bivariate normal parameter values, and click OK to paste these values into the input window. Appendix N of the manual and Appendix A of Neuenschwander et al. (2008) describe some of these methods.

Click the icon to generate a chart of the 95% prior intervals at each dose for probability of DLT.

Click Posterior Sampling Methods to select from one of two methods: Metropolis-Hastings, or direct Monte Carlo. For this tutorial, click OK to select Direct.

In the Stopping Rules tab, you can specify multiple rules for stopping the trial. The trial is stopped early, with the MTD not determined, if there is evidence of underdosing. This rule is identical to that from mTPI: if there is a greater than some threshold posterior probability that the toxicity at the highest dose is less than the target toxicity, then stop the trial early.

The remaining stopping rules allow one to stop the trial early with the MTD determined. The Allocation Rule requires a certain number of subjects already allocated to the next recommended dose. The CI Rule requires that the credible interval for the probability of DLT at the MTD is within some range. The Target Rule requires that the posterior probability of being in the target toxicity exceeds some threshold.
Finally, any of these rules can be combined with Minimum Ss Observed in the Trial. Check the appropriate boxes and enter values as below.

In the Response Generation tab, you can specify a set of true dose response curves from which to simulate. For the Starting Dose, select the lowest dose (5). Leave the default profile as shown below. If you wish to edit or add additional profiles (dose response curves), see the corresponding section for the 3+3 design.

In the Simulation Control tab, check the boxes corresponding to Save summary statistics, Save subject-level data, and Save final posterior samples. These options will provide access to several charts derived from these more detailed levels of simulated data. If you wish to display subject-level plots, or posterior distribution plots, for more than one simulation, you can increase the number. For now, leave both of these at 1 to save computation time.

Click Simulate to simulate the BLRM design. In the Output Preview toolbar, click the icon to save the simulation to the Library. Double-click the simulation node in the Library to display the simulation output details.

Click the Plots icon in the Library to access a wide range of available plots.

32.4.2 Interim Monitoring

Right-click the Simulation node in the Library, and select Interim Monitoring. This will open an empty interim monitoring dashboard.

Click Enter Interim Data to open a window in which to enter data for the first cohort: in particular, the Dose Assigned and the DLTs Observed. Click OK to continue.

The dashboard will be updated accordingly. The acceptable dose range is on a continuous scale between the minimum and maximum doses. The upper limit of the acceptable dose range is the largest dose whose probability of overdosing is less than the EWOC threshold.
The lower limit of the acceptable range is the dose whose DLT rate is equal to the lower limit of the targeted toxicity interval. When the computed lower limit exceeds the recommended dose, it is set to the recommended dose.

In the IM toolbar, click the Plots icon, then Interval Probabilities by Dose and Panel. Notice that for all doses greater than or equal to 25, the posterior probability of overdosing exceeds the EWOC threshold (0.25). Of the remaining doses, dose 15 maximizes the probability of targeted toxicity, and is therefore the next recommended dose.

In the IM toolbar, click the Plots icon, then Predictive Distribution of Number of DLTs. You can enter a planned cohort size and select a next dose, to plot the posterior predictive probability of the number of DLTs to be observed in the next cohort.

After each cohort, East will update the Interim Monitoring Dashboard. You may replace the IM dashboard plots with any other plots or corresponding tables, by clicking on the associated icons at the top left of each panel. Continue entering data for each subsequent cohort, and observe how the interim monitoring dashboard updates. One example is given below. Click Stop Trial to generate the final inference table.

32.5 Bayesian logistic regression model for dual-combination (comb2BLRM)

32.5.1 Simulation

Click Discrete: Dose Escalation on the Design tab, and then click Two Agents Design: Bayesian Logistic Regression Model for Dual-Combination.

Set the Max. Number of Doses to 4 for both Agent 1 and Agent 2, the Max. Sample Size to 60, and the Cohort Size to 3. Set the target toxicity interval to 16-35%, with an EWOC criterion of 0.25. Set the reference doses to 290 and 20 for Agents 1 and 2, respectively.

Click the button for Dose Skipping. These options imply that the dose of only one compound can be increased for the next cohort (no diagonal escalation), with a maximum increment of 100%.

The prior distribution is an extension of that for the single-agent BLRM, but includes a normal prior for the interaction term. As with the single-agent BLRM, you can use the calculator to transform prior information on particular dose levels to a bivariate normal for either Agent 1 or Agent 2. In this tutorial, we will simply enter the following values adapted from Neuenschwander et al. (2015).

In the Stopping Rules tab, you may specify various rules for stopping the trial. The logical operators (And/Or) follow left-to-right precedence, beginning with the top-most rule in the table. The order of the stopping rules is determined by the order of selection. Enter the following inputs as below. Select the Minimum Ss rule first, followed by the Target Rule, followed by the Allocation Rule. Be sure to select the appropriate logical operators.

This combination of rules implies that the MTD dose combination declared will meet the following conditions:
(1) At least 6 patients have already been allocated to this combination, and
(2) This dose satisfies one of the following:
    (i) The probability of targeted toxicity at this combination exceeds 0.5, or
    (ii) A minimum of 15 subjects have already been observed in the trial.
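Left-to-right precedence differs from the usual programming convention in which And binds more tightly than Or, so it may help to see the evaluation spelled out. The sketch below is illustrative only: the function names are hypothetical, and the operator sequence (Minimum Ss Or Target Rule, then And Allocation Rule) is inferred from the two conditions listed above rather than taken from the East input screen.

```python
def combine_left_to_right(results, operators):
    """Fold rule outcomes strictly left to right.

    results   -- booleans, one per rule, in order of selection
    operators -- "and"/"or" strings joining consecutive rules
    """
    if len(operators) != len(results) - 1:
        raise ValueError("need exactly one operator between consecutive rules")
    outcome = results[0]
    for operator, value in zip(operators, results[1:]):
        if operator == "and":
            outcome = outcome and value
        elif operator == "or":
            outcome = outcome or value
        else:
            raise ValueError(f"unknown operator: {operator!r}")
    return outcome


def stop_with_mtd(total_subjects, target_probability, allocated_at_dose):
    """Tutorial rule set, assuming (Minimum Ss OR Target Rule) AND Allocation Rule."""
    minimum_ss = total_subjects >= 15        # Minimum Ss Observed in the Trial
    target = target_probability > 0.5        # Target Rule
    allocation = allocated_at_dose >= 6      # Allocation Rule
    return combine_left_to_right([minimum_ss, target, allocation], ["or", "and"])
```

For example, with 15 subjects observed but only 3 allocated to the candidate combination, `stop_with_mtd(15, 0.2, 3)` evaluates (True or False) and False, which is False. Under the And-before-Or convention the same inputs would give True, which is why the order of selection matters.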
In the Response Generation tab, enter the following inputs. Make sure that the starting dose combination is the lowest dose level for each agent. In the Simulation Control tab, select the following options. In this tutorial, we will run only 1000 simulations. Click Simulate.

In the Output Preview toolbar, click the icon to save Sim1 to the Library. Double-click Sim1 in the Library to display the simulation output details. With Sim1 selected in the Library, click the Plots icon to access a wide range of available plots. Below is an example of the MTD plot, showing the percentage of simulations in which each dose combination was selected as the MTD. The combinations whose true DLT rates were below, within, and above the target toxicity interval (0.16 − 0.35) are colored blue, green, and red, respectively.

32.5.2 Interim Monitoring

Right-click the Simulation node in the Library, and select Interim Monitoring. This will open an empty interim monitoring dashboard. Click Enter Interim Data to open a window in which to enter data for the first cohort: in particular, the Dose Assigned for each agent, the Subjects Allocated and the DLTs Observed. Click OK to continue.

The next recommended dose is 100 mg for Agent 1 and 20 mg for Agent 2. Recall that the dose skipping constraints are that the dose increment cannot exceed 100% of the current dose, and that only one compound can be increased. Of the eligible dose combinations, the recommended one has the highest probability of targeted toxicity.

You may replace the IM dashboard plots with any other plots or corresponding tables, by clicking on the associated icons at the top left of each panel.
For example, change the left-hand plot to Dose Limiting Toxicity to view the number of subjects and DLTs observed at each dose combination. Continue in this manner by clicking Enter Interim Data, entering the following doses, and the corresponding number of DLTs. The recommended MTD combination is 200 mg for Agent 1 and 30 mg for Agent 2.

32.6 Product of Independent Beta Probabilities Dose Escalation (PIPE)

32.6.1 Simulation

One of the core concepts underlying the PIPE method is the maximum tolerated contour (MTC), a line partitioning the dose combination space into toxicity probabilities either less than or greater than the target. The recommended dose combination at the end of the trial is the dose combination closest from below to the MTC. The following figures from Mander and Sweeting (2015) illustrate the MTC, and the related concepts of admissible dose combinations (adjacent or closest) and dose skipping options (neighborhood vs non-neighborhood constraint).

The figure below shows six monotonic MTCs for two agents, each with two dose levels. After each cohort, the most likely contour is selected before applying a dose selection strategy.

The next dose combination is chosen from a set of admissible doses, which are either closest to the most likely contour, or adjacent. In the figure below, all the (X) and (+) symbols are considered adjacent. Of these, the (X) symbols represent the closest doses.

Of the admissible doses, the next dose combination chosen is that with the minimum sample size, where sample size is defined as the prior and trial sample size combined.
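To make the idea of the "most likely contour" concrete, the sketch below enumerates every monotonic contour on a small grid and scores each one by a product of independent probabilities, in the spirit of the method's name. It is an illustration under assumptions, not East's algorithm: the function names are hypothetical, the inputs `p_below[r][c]` stand for the posterior probability that the toxicity at combination (r, c) is at or below the target (which in PIPE would come from independent beta posteriors), and the numeric values used in the usage note are made up.

```python
from itertools import product
from math import prod


def monotone_contours(n_rows, n_cols):
    """All 0/1 grids (1 = above the MTC) that never decrease as either dose increases."""
    grids = []
    for bits in product((0, 1), repeat=n_rows * n_cols):
        grid = [bits[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
        rows_ok = all(grid[r][c] <= grid[r][c + 1]
                      for r in range(n_rows) for c in range(n_cols - 1))
        cols_ok = all(grid[r][c] <= grid[r + 1][c]
                      for r in range(n_rows - 1) for c in range(n_cols))
        if rows_ok and cols_ok:
            grids.append(grid)
    return grids


def contour_probabilities(p_below):
    """Normalized product-of-probabilities score for each monotone contour."""
    n_rows, n_cols = len(p_below), len(p_below[0])
    contours = monotone_contours(n_rows, n_cols)
    weights = [
        prod((1 - p_below[r][c]) if grid[r][c] else p_below[r][c]
             for r in range(n_rows) for c in range(n_cols))
        for grid in contours
    ]
    total = sum(weights)
    return [(grid, weight / total) for grid, weight in zip(contours, weights)]


def most_likely_contour(p_below):
    """Contour with the largest normalized score."""
    return max(contour_probabilities(p_below), key=lambda item: item[1])[0]
```

For two agents with two dose levels each, `monotone_contours(2, 2)` returns six grids, matching the six monotonic MTCs mentioned above. With the made-up inputs `[[0.9, 0.6], [0.7, 0.2]]`, the most likely contour places only the highest combination above the MTC. Exhaustive enumeration is practical only for small grids.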
The weighted randomization method selects one of the admissible doses at random, with selection probabilities weighted by the inverse of their sample size.

For dose skipping options, one can choose between a neighborhood constraint and a non-neighborhood constraint. The neighborhood constraint restricts the set of admissible doses to those a single dose level higher or lower than the current dose for both agents, while the non-neighborhood constraint restricts the set of admissible doses to a single dose level higher or lower than any previously administered dose combination.

The figure below illustrates the neighborhood constraint, at two different cohorts. Only those combinations within the dashed box are admissible. The asterisk symbol on the left represents the admissible dose combination closest to the MTC.

The figure below illustrates the non-neighborhood constraint. The set of admissible doses is now larger because all previously administered doses are included.

Finally, there is a safety constraint threshold to avoid overdosing. Averaging over the posterior distribution of all monotonic contours, the expected probability of being above the MTC is calculated for all dose combinations. Those dose combinations whose expected probabilities exceed the safety threshold are excluded from the admissible set.

Click Discrete: Dose Escalation on the Design tab, and then click Two Agents Design: Product of Independent Beta Probabilities Dose Escalation. In the Design Parameters tab, select the following options.

In addition to the Closest and Adjacent options for Admissible Dose Combinations, there is also an Interval option. This allows you to specify a margin around the target toxicity level to define the admissible dose set, rather than relying on the MTC.
Dose combinations whose posterior mean toxicity risk lies in the specified target interval (PT plus or minus the specified margin) are considered admissible.

For the prior specification, enter the following values. When entering the same prior sample size for each dose combination, a value of 1 is considered a strong prior, whereas a value of 1 divided by the number of combinations can be considered a weak prior (Mander & Sweeting, 2015).

In the Stopping Rules tab, there are a number of options similar to those from other designs. However, for this tutorial, leave these options unchecked. Similarly, leave the default options in the Response Generation tab. In this tutorial, the true probabilities of toxicity will be in agreement with the prior medians specified above.

In the Simulation Controls tab, you can run 1000 simulations, although the PIPE method runs relatively quickly. In the Output Preview toolbar, click the icon to save the simulation to the Library. Double-click the simulation node in the Library to display the simulation output details.

In the MTD Analysis table, you can see that the (Agent 1, Agent 2) dose combinations selected most often as MTD were: (300, 10) at 22.1% and (300, 20) at 20.8%. The true probabilities of toxicity at these combinations were 0.24 and 0.28, respectively.

32.6.2 Interim Monitoring

Right-click the Simulation node in the Library, and select Interim Monitoring. This will open an empty interim monitoring dashboard. Click Enter Interim Data to open a window in which to enter data for the first cohort: in particular, the Dose Assigned for each agent, the Subjects Allocated and the DLTs Observed.
Click OK to continue.

The next recommended dose is 200 mg for Agent 1 and 20 mg for Agent 2. Recall that the dose skipping constraints allow for diagonal escalation (that is, escalation on both agents at the same time), but the neighborhood constraint restricts the set of admissible doses to a single dose level higher or lower than the current dose. Given these constraints, the dose combination (200, 10) is the only combination closest to the most probable MTC.

The MTC plot allows you to view the most probable MTC, the current dose, the recommended dose, and all tried doses. You may replace the IM dashboard plots with any other plots or corresponding tables, by clicking on the associated icons at the top left of each panel.

Continue in this manner by clicking Enter Interim Data, entering the following doses, and the corresponding number of DLTs. Click Stop Trial. The recommended MTD combination is 200 mg for Agent 1 and 10 mg for Agent 2.

The recommended MTD combination must meet three criteria: (i) closest to MTC from below, (ii) have been experimented on, and (iii) below safety threshold. If there is no dose combination satisfying all three criteria, the MTD will be undetermined.

Volume 4 Exact Binomial Designs

33 Introduction to Volume 8
34 Binomial Superiority One-Sample – Exact
35 Binomial Superiority Two-Sample – Exact
36 Binomial Non-Inferiority Two-Sample – Exact
37 Binomial Equivalence Two-Sample – Exact
38 Binomial Simon’s Two-Stage Design

33 Introduction to Volume 8

This volume describes various cases of clinical trials using binomial endpoints where the asymptotic normal approximation to the test statistic may fail.
This is often the case when the trial sample size is too small; in such situations, testing and analysis based on the exact binomial distribution would still provide valid results. Asymptotic tests may also fail when proportions are very close to the boundary, namely zero or one. These exact methods can also be applied in situations where the normal approximation is adequate, in which case the solutions from the exact and asymptotic methods would converge to the same result.

Using exact computations, Chapter 34 deals with the design and interim monitoring of a one-sample test of superiority for a proportion. The first section discusses a fixed and group sequential design in which an observed binomial response rate is compared to a fixed response rate. The following section illustrates how, for a fixed sample, McNemar’s conditional test can be used to compare matched pairs of binomial responses.

Chapters 35 through 37 illustrate how to use East to design two-sample exact tests of superiority, non-inferiority and equivalence, including examples for both the difference and ratio of proportions.

Chapter 38 describes Simon’s two-stage design in an exact setting, which computes the expected minimal sample size of a trial that may be stopped due to futility or continue to a second stage to further study efficacy and safety.

It is important to note that all exact tests work with only integer values for sample size, and will override the Design Defaults - Common: Do not round off sample size/events flag in the Options menu. Whenever the Perform Exact Computations check box is selected in the Design Input Output dialog box, resulting sample sizes will be converted to an integer value for all computations, including power and chart/table values.

33.1 Settings

Click the icon in the Home menu to adjust default values in East 6.
The options provided in the Display Precision tab are used to set the decimal places of numerical quantities. The settings indicated here will be applicable to all tests in East 6 under the Design and Analysis menus.

All these numerical quantities are grouped in different categories depending upon their usage. For example, all the average and expected sample sizes computed at simulation or design stage are grouped together under the category "Expected Sample Size". So to view any of these quantities with greater or lesser precision, select the corresponding category and change the decimal places to any value between 0 and 9.

The General tab allows you to adjust the paths for storing workbooks, files, and temporary files. These paths will persist through the current and future sessions, even after East is closed. This is also where the installation directory of the R software must be specified in order to use the R Integration feature in East 6.

The Design Defaults is where the user can change the settings for trial design:

Under the Common tab, default values can be set for input design parameters. You can set up the default choices for the design type, computation type, test type and the default values for type-I error, power, sample size and allocation ratio. When a new design is invoked, the input window will show these default choices.

Time Limit for Exact Computation
This time limit is applicable only to exact designs and charts. Exact methods are computationally intensive and can easily consume several hours of computation time if the likely sample sizes are very large. You can set the maximum time available for any exact test in terms of minutes. If the time limit is reached, the test is terminated and no exact results are provided. The minimum and default value is 5 minutes.
Type I Error for MCP
If the user has selected a 2-sided test as the default in global settings, then any MCP will use half of the alpha from settings as the default, since MCP is always a 1-sided test.

Sample Size Rounding
Notice that by default, East displays the integer sample size (events) by rounding up the actual number computed by the East algorithm. In this case, the look-by-look sample size is rounded off to the nearest integer. One can also see the original floating point sample size by selecting the option "Do not round sample size/events".

Under the Group Sequential tab, defaults are set for boundary information. When a new design is invoked, input fields will contain these specified defaults. We can also set the option to view the Boundary Crossing Probabilities in the detailed output. It can be either Incremental or Cumulative.

Simulation Defaults is where we can change the settings for simulation:

If the checkbox for "Save summary statistics for every simulation" is checked, then East simulations will by default save the per-simulation summary data for all the simulations in the form of case data.

If the checkbox for "Suppress All Intermediate Output" is checked, the intermediate simulation output window will always be suppressed and you will be directed to the Output Preview area.

The Chart Settings allow defaults to be set for the following quantities on East 6 charts:

We suggest that you do not alter the defaults until you are quite familiar with the software.

34 Binomial Superiority One-Sample – Exact

This chapter deals with the design and interim monitoring of tests involving binomial response rates using exact computations.
Section 34.1 discusses a fixed sample and group sequential design in which an observed binomial response rate is compared to a fixed response rate. In Section 34.2, McNemar’s conditional test for comparing matched pairs of binomial responses for a fixed sample is discussed.

34.1 Binomial One-Sample

In experimental situations where the variable of interest has a binomial distribution, it may be of interest to determine whether the response rate π differs from a fixed value π0. Specifically, we wish to test the null hypothesis H0: π = π0 against one-sided alternatives of the form H1: π > π0 or H1: π < π0. Either the sample size or power is determined for a specified value of π which is consistent with the alternative hypothesis, denoted as π1.

34.1.1 Trial Design

Consider a single-arm oncology trial designed to determine if the tumor response rate for a new cytotoxic agent is at least 15%. Thus it is desired to test the null hypothesis H0: π = 0.15 against the one-sided alternative hypothesis H1: π > 0.15. The trial will be designed using a one-sided test that achieves 80% power at π = π1 = 0.25 with a level α = 0.05 test.

Single-Look Design

To illustrate this example, in East under the Design ribbon for Discrete data, click One Sample and then choose Single Arm Design: Single Proportion:

This will launch the following input window:

Leave the default values of Design Type: Superiority and Number of Looks: 1. In the Design Parameters dialog box, select the Perform Exact Computations checkbox and enter the following parameters:

Test Type: 1 sided
Type 1 Error (α): 0.05
Power: 0.8
Sample Size (n): Computed (select radio button)
Prop. Response under Null (π0): 0.15
Prop. Response under Alt (π1): 0.25

Click Compute. The sample size for this design is calculated and the results are shown as a row in the Output Preview window:

The sample size required in order to achieve 80% power is 110 subjects. Note that because of the discreteness involved in performing exact computations, the attained type-1 error is 0.035, less than the specified value of 0.05. Similarly, the attained power is 0.81, slightly larger than the specified value of 0.80.

As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed in the upper pane, labeled Output Summary. This can be saved to the Library by selecting Des 1 and clicking the icon.

The design details can be displayed by clicking the icon. The critical point, or the boundary set for the rejection of H0, is 24 (on the # response scale). Therefore out of 110 subjects, if the observed number of patients responding to the new treatment exceeds 24, the null hypothesis will be rejected in favor of declaring the new treatment to be superior. This can also be seen using both a response scale and proportion scale in either the Stopping Boundaries chart or table, available in the Library.

Three-Look Design

In order to reach an early decision and enter into comparative trials, conduct this single-arm study as a group sequential trial with a maximum of 3 looks.
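The attained type-1 error and power of the single-look design can be checked with a direct binomial tail sum. The sketch below is a plain calculation rather than East's algorithm, and it makes one labeled assumption: the boundary is treated as inclusive, so H0 is rejected when the number of responses is at least the critical point. The function name `upper_tail` is illustrative.

```python
from math import comb


def upper_tail(n, p, c):
    """P(X >= c) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


n, critical_point = 110, 24          # values quoted for Des 1
pi0, pi1 = 0.15, 0.25                # null and alternative response rates

# Assumption: reject H0 when the response count is >= critical_point.
attained_alpha = upper_tail(n, pi0, critical_point)
attained_power = upper_tail(n, pi1, critical_point)

print(f"attained type-1 error: {attained_alpha:.3f}")
print(f"attained power:        {attained_power:.3f}")
```

Because the response count is discrete, the attained error rate cannot in general equal the nominal 0.05 exactly; it is the largest achievable value that does not exceed it.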
Create a new design by selecting Des 1 in the Library, and clicking the icon on the Library toolbar. To generate a study with two interim looks and a final analysis, change the Number of Looks from 1 to 3. A Boundary Info tab will appear, which allows the specification of parameters for the Efficacy and Futility boundary families. By default, an efficacy boundary to reject H0 is selected; however, there is no futility boundary to reject H1. The Boundary Family specified is of the Spending Functions type and the default Spending Function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF (O’Brien-Fleming). The default Spacing of Looks is Equal; therefore the interim analyses will be equally spaced by the number of patients accrued between looks.

Return to the Design Parameters dialog box. The binomial parameters π0 = 0.15 and π1 = 0.25 are already specified. Click Compute to generate this exact design:

The maximum sample size is again 110 subjects, with 110 also expected under the null hypothesis H0: π = 0.15 and 91 expected when the true value is π = 0.25. Save this design to the Library.

The details for Des 2 can be displayed by clicking the icon.

Here we can see the cumulative sample size and cumulative type 1 error (α) spent at each of the three looks. The boundaries set for the rejection of H0 at each look are 14, 19 and 24 (on the # response scale). For example, at the second look with a cumulative 73 subjects, if the observed number of patients responding to the new treatment exceeds 19, the null hypothesis would be rejected in favor of declaring the new treatment to be superior. In addition, the incremental boundary crossing probabilities under both the null and alternative are displayed for each look.
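For reference, the Lan-DeMets spending function of O'Brien-Fleming type is commonly written as α(t) = 2 − 2Φ(z_{α/2} / √t), where t is the information fraction. The sketch below evaluates that textbook form at three equally spaced looks. The function name is illustrative, and the values are the nominal continuous targets; in an exact binomial design the error actually spent at each look will typically fall below them because of discreteness.

```python
from statistics import NormalDist


def ld_of_spending(t, alpha):
    """Cumulative error spent at information fraction t (0 < t <= 1)
    under the Lan-DeMets O'Brien-Fleming-type spending function."""
    if not 0 < t <= 1:
        raise ValueError("information fraction must lie in (0, 1]")
    standard_normal = NormalDist()
    z = standard_normal.inv_cdf(1 - alpha / 2)
    return 2 - 2 * standard_normal.cdf(z / t**0.5)


# Three equally spaced looks, total one-sided alpha of 0.05.
for look, fraction in enumerate((1 / 3, 2 / 3, 1.0), start=1):
    print(f"look {look}: cumulative alpha spent = {ld_of_spending(fraction, 0.05):.5f}")
```

The function spends very little error at the first look and reserves most of it for the final analysis, which is the characteristic O'Brien-Fleming behavior.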
The cumulative boundary stopping probabilities can also be seen in the Stopping Boundaries chart and table. Select Des 2 in the Library, click the icon and choose Stopping Boundaries.

The default scale is # Response Scale. The Proportion Scale can also be chosen from the drop-down list Boundary Scale in the chart.

To examine the Error Spending function, click the icon in the Library and choose Error Spending.

When the sample size for a study is subject to external constraints, power can be computed for a specified maximum sample size. Suppose for the previous design the total sample size is constrained to be at most 80 subjects. Create a new design by editing Des 2 in the Library. Change the parameters so that the trial is now designed to compute power for a maximum sample size of 80 subjects, as shown below.

The trial now attains only 73.9% power.

Power vs Sample Size: Sawtooth Paradigm

Generate the Power vs. Sample Size graph for Des 2. You will get the following power chart, which is commonly described in the literature as a sawtooth chart.

This chart illustrates that it is possible to have designs with different sample sizes but all with the same power. What is not apparent is that for designs with the same power, the attained significance level may vary. Upon examination, the sample sizes of 43 and 55 seem to have the same power of about 0.525.
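The sawtooth pattern can be reproduced in miniature with a fixed-sample exact test. The sketch below is a simplification and will not match the numbers for Des 2, which is a three-look design: for each sample size it finds the smallest critical value whose attained type-1 error does not exceed α (rejecting when the response count is at least that value, an assumption), then reports the attained error and power. All function names are illustrative.

```python
from math import comb


def upper_tail(n, p, c):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


def critical_value(n, pi0, alpha):
    """Smallest c with P(X >= c | pi0) <= alpha (c = n + 1 means never reject)."""
    for c in range(n + 2):
        if upper_tail(n, pi0, c) <= alpha:
            return c


def exact_power_curve(sizes, pi0, pi1, alpha):
    """Rows of (n, critical value, attained alpha, attained power)."""
    rows = []
    for n in sizes:
        c = critical_value(n, pi0, alpha)
        rows.append((n, c, upper_tail(n, pi0, c), upper_tail(n, pi1, c)))
    return rows


curve = exact_power_curve(range(40, 61), 0.15, 0.25, 0.05)
for n, c, attained_alpha, power in curve:
    print(f"n={n:3d}  critical value={c:2d}  alpha={attained_alpha:.3f}  power={power:.3f}")
```

Power rises while the critical value stays fixed and drops each time the critical value steps up by one, which produces the teeth. The attained alpha moves in the same way, so two sample sizes with similar power can have noticeably different attained significance levels.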
The data can also be displayed in a chart form by selecting the icon in the Library, and can be printed from here as well.

Compute the power for two new designs based on Des 2 with sample sizes of 43 and 55 respectively.

Although sample sizes of 43 and 55 attain nearly the same power, the attained significance levels are different, at 0.049 and 0.031 respectively. Though both are less than the design specification of 0.05, the plan with the lower sample size of 43 pays a higher penalty in terms of type-1 error than the plan with the larger sample size of 55.

34.1.2 Interim Monitoring

Consider the interim monitoring of Des 2, which has 80% power. Select this design from the Library and click the icon.

Suppose at the first interim look, when 40 subjects have enrolled, the observed cumulative response is 12. Click the Enter Interim Data button at the top left of the Interim Monitoring window. Enter 40 for the Cumulative Sample Size and 12 for the Cumulative Response in the Test Statistic Calculator window.

At the second interim monitoring time point, when 80 subjects have enrolled, suppose the cumulative response increases to 20. Again click the Enter Interim Data button at the top left of the Interim Monitoring window. Enter 80 for the Cumulative Sample Size and 20 for the Cumulative Response in the Test Statistic Calculator window. This will result in the following message:

It can be concluded that π > 0.15 and the trial should be terminated. Clicking on Stop results in the final analysis.
34.2 McNemar’s Conditional Exact Test

McNemar’s conditional test is used in experimental situations where paired comparisons are observed. In a typical application, two binary response measurements are made on each subject, perhaps from two different treatments, or from two different time points. For example, in a comparative clinical trial, subjects are matched on baseline demographics and disease characteristics and then randomized with one subject in the pair receiving the experimental treatment and the other subject receiving the control. Another example is the crossover clinical trial in which each subject receives both treatments. By random assignment, some subjects receive the experimental treatment followed by the control while others receive the control followed by the experimental treatment. Let πc and πt denote the response probabilities for the control and experimental treatments, respectively. The probability parameters for this test are displayed in Table 34.1.

Table 34.1: A 2 x 2 Table of Probabilities for McNemar’s Conditional Exact Test

                       Experimental
Control                No Response    Response    Total Probability
No Response            π00            π01         1 − πc
Response               π10            π11         πc
Total Probability      1 − πt         πt          1

The null hypothesis H0: πc = πt is tested against the alternative hypothesis H1: πc > πt (or H1: πc < πt) for the one-sided testing problem. Since πt = πc if and only if π01 = π10, the null hypothesis can also be expressed as H0: π01 = π10, which is tested against the corresponding one-sided alternative. The power of this test depends on two quantities:

1. The difference between the two discordant probabilities (which is also the difference between the response rates of the two treatments), δ = π01 − π10 = πt − πc;
2. The sum of the two discordant probabilities, ξ = π10 + π01.

East accepts these two parameters as inputs at the design stage.
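The two design inputs follow directly from the four cell probabilities of Table 34.1. As a small illustration, the sketch below derives δ and ξ, along with the marginal response rates, using the cell probabilities of the TDS example in Section 34.2.1. The function name and the dictionary layout are illustrative choices, not part of East.

```python
from math import isclose


def mcnemar_inputs(pi00, pi01, pi10, pi11):
    """Derive the McNemar design inputs from the 2 x 2 cell probabilities.

    First index: control outcome, second index: experimental outcome
    (0 = no response, 1 = response), as laid out in Table 34.1.
    """
    if not isclose(pi00 + pi01 + pi10 + pi11, 1.0, abs_tol=1e-9):
        raise ValueError("cell probabilities must sum to 1")
    return {
        "pi_c": pi10 + pi11,      # control response rate
        "pi_t": pi01 + pi11,      # experimental response rate
        "delta": pi01 - pi10,     # difference of discordant probabilities
        "xi": pi01 + pi10,        # proportion of discordant pairs
    }


tds = mcnemar_inputs(pi00=0.02, pi01=0.13, pi10=0.03, pi11=0.82)
print({name: round(value, 4) for name, value in tds.items()})
```

Note that δ equals πt − πc because the concordant cell π11 cancels when the two marginal response rates are subtracted.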
34.2.1 Trial Design

Consider a trial in which we wish to determine whether a transdermal delivery system (TDS) can be improved with a new adhesive. Subjects are to wear the old TDS (control) and new TDS (experimental) in the same area of the body for one week each. A response is said to occur if the TDS remains on for the entire one-week observation period. From historical data, it is known that control has a response rate of 85% (πc = 0.85). It is hoped that the new adhesive will increase this to 95% (πt = 0.95). Furthermore, of the 15% of the subjects who did not respond on the control, it is hoped that 87% will respond on the experimental system. That is, π01 = 0.87 × 0.15 = 0.13. Based on these data, we can fill in all the entries of Table 34.1 as displayed in Table 34.2.

Table 34.2: McNemar Probabilities for the TDS Trial

                       Experimental
Control                No Response    Response    Total Probability
No Response            0.02           0.13        0.15
Response               0.03           0.82        0.85
Total Probability      0.05           0.95        1

As it is expected that the new adhesive will increase the adherence rate, the comparison is posed as a one-sided testing problem, testing H0: πc = πt against H1: πc < πt at the 0.05 level. We wish to determine the sample size to have 90% power for the values displayed in Table 34.2.

To illustrate this example, in East under the Design ribbon for Discrete data, click One Sample and then choose Paired Design: McNemar’s:

This will launch the following input window:

Leave the default values of Design Type: Superiority and Number of Looks: 1.
In the Design Parameters dialog box, select the Perform Exact Computations checkbox and enter the following parameters:

  Test Type: 1 sided
  Type 1 Error (α): 0.05
  Power: 0.9
  Sample Size (n): Computed (select radio button)
  Difference in Probabilities (δ1 = πt − πc ): 0.1
  Prop. of Discordant Pairs (ξ = π01 + π10 ): 0.16

Click Compute. The sample size for this design is calculated and the results are shown as a row in the Output Preview window.

The sample size required in order to achieve 90% power is 139 subjects. As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed in the upper pane, labeled Output Summary. This can be saved to the Library by selecting Des 1 and clicking the icon.

The design details can be displayed by clicking the icon. The critical point, or the boundary set for the rejection of H0 , is 1.645.

It is important to note that in this exact computation the displayed sample size may not be unique due to the discreteness of the distribution. This can be seen in the Power Vs Sample Size graph, which is a useful tool along with its corresponding table, and can be used to find all other sample sizes that guarantee the desired power. These visual tools are available in the Library under the Plots and Tables menus.
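The arithmetic that turns the TDS assumptions into the two design inputs can be checked outside East. The following is a minimal sketch in Python, not part of East; the function names `mcnemar_cells` and `design_inputs` are hypothetical, and the value π01 = 0.13 is the rounded figure used in Table 34.2.

```python
def mcnemar_cells(pi_c, pi_t, pi_01):
    """Fill in the 2 x 2 table of paired-response probabilities.

    pi_01 is the probability of no response on control together with a
    response on the experimental treatment (first index = control).
    """
    pi_00 = (1 - pi_c) - pi_01   # control "No Response" row sums to 1 - pi_c
    pi_11 = pi_t - pi_01         # experimental "Response" column sums to pi_t
    pi_10 = pi_c - pi_11         # control "Response" row sums to pi_c
    return {"pi_00": pi_00, "pi_01": pi_01, "pi_10": pi_10, "pi_11": pi_11}


def design_inputs(cells):
    """Return (delta, xi): difference and sum of the discordant probabilities."""
    delta = cells["pi_01"] - cells["pi_10"]
    xi = cells["pi_01"] + cells["pi_10"]
    return delta, xi


# Values assumed for the TDS trial (Table 34.2)
cells = mcnemar_cells(pi_c=0.85, pi_t=0.95, pi_01=0.13)
delta, xi = design_inputs(cells)
print({name: round(value, 2) for name, value in cells.items()})
print("delta =", round(delta, 2), "xi =", round(xi, 2))
```

The two printed values correspond to the Difference in Probabilities and Prop. of Discordant Pairs fields entered in the Design Parameters dialog box.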
35 Binomial Superiority Two-Sample – Exact

In many experiments based on binomial data, the aim is to compare independent samples from two populations in terms of the proportion of sampling units presenting a given trait. In medical research, outcomes such as the proportion of patients responding to a therapy, developing a certain side effect, or requiring specialized care, would satisfy this definition. East exact tests support the design and monitoring of clinical trials in which this comparison is based on either the difference of proportions or the ratio of proportions of the two populations. These two cases are discussed in Sections 35.1 and 35.2, respectively.

Caution: The methods presented in this chapter are computationally intensive and could consume several hours of computer time if the exact sample sizes are very large. Here are some guidelines:

1. Estimate the likely sample size under the Exact method by first determining the asymptotic sample size.
2. If the exact sample size is likely to be larger than 1000, computing power is preferable to computing the sample size.

35.1 Difference of Two Binomial Proportions

Let πc and πt denote the binomial probabilities for the control and treatment arms, respectively, and let δ = πt − πc . Interest lies in testing the null hypothesis that δ = 0 against one- and two-sided alternatives. The technical details of the sample size computations for this option are given in Appendix V.

35.1.1 Trial Design

In a clinical study, an experimental drug coded Y73 is to be compared with a control drug coded X39 to treat chronic hepatitis B infected adult patients. The primary end point is histological improvement as determined by Knodell Scores at week 48 of the treatment period. It is estimated that the proportion of patients likely to show histological improvement is 25% under treatment X39 and as much as 60% under treatment Y73.
A one-sided fixed sample study is to be designed with α = 0.05 and 90% power.

Single Look Design

To illustrate this example, in East under the Design ribbon for Discrete data, click Two Samples and then choose Parallel Design: Difference of Proportions. This will launch the input window.

The goal of this study is to test the null hypothesis, H0 , that the X39 and Y73 arms both have an event rate of 25%, versus the alternative hypothesis, H1 , that Y73 increases the event rate by 35 percentage points, from 25% to 60%. This will be a one-sided test with a single fixed look at the data, a type-1 error of α = 0.05 and a power of (1 − β) = 0.9.

Leave the default values of Design Type: Superiority and Number of Looks: 1. In the Design Parameters dialog box, select the Perform Exact Computations checkbox and enter the following parameters:

  Test Type: 1 sided (required)
  Type 1 Error (α): 0.05
  Power: 0.9
  Sample Size (n): Computed (select radio button)
  Prop. under Control (πc ): 0.25
  Prop. under Treatment (πt ): 0.6
  Diff. in Prop. (δ1 = πt − πc ): (will be calculated)

Click Compute. The sample size for this design is calculated and the results are shown as a row in the Output Preview window.

The sample size required in order to achieve 90% power is 68 subjects. Note that because of the discreteness involved in performing exact computations, the attained type-1 error is 0.049, slightly less than the specified value of 0.05. Similarly, the attained power is 0.905, slightly larger than the specified value of 0.90.

As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar.
The design details will be displayed in the upper pane, labeled Output Summary. This can be saved to the Library by selecting Des 1 and clicking the icon.

The design details can be displayed by clicking the icon.

It is important to note that in this exact computation the displayed sample size may not be unique due to the discreteness of the distribution. This can be seen in the Power Vs Sample Size graph, which is a useful tool along with its corresponding table, and can be used to find all other sample sizes that guarantee the desired power. These visual tools are available in the Library under the Plots and Tables menus. The same results can be viewed in tabular form.

The critical point, or the boundary set for the rejection of H0 , is 1.715 attained at πU = 0.371 (on the Z scale) and 0.176 (on the δ scale). If the observed test statistic exceeds this boundary the null will be rejected in favor of declaring the new treatment to be superior. This can also be seen in the Stopping Boundaries chart and table, available in the Library.

The Power vs. Treatment Effect chart dynamically generates power under this design for all values of treatment effect δ = πt − πc . Here it is easy to see how, as the treatment effect size increases (H1 : alternative treatment is superior), the power of the study reaches the desired 90%.
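The first guideline at the start of this chapter recommends estimating the likely exact sample size from the asymptotic one before launching an exact computation. The sketch below illustrates that step for the Y73 versus X39 example in Python. It assumes the familiar normal-approximation formula with a pooled variance under H0 and equal allocation; this is an illustrative textbook formula and is not claimed to be the computation East performs (see Appendix V for that). The function name `asymptotic_n_per_arm` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def asymptotic_n_per_arm(pi_c, pi_t, alpha=0.05, power=0.9):
    """Approximate per-arm sample size for a one-sided test of pi_t > pi_c.

    Assumes equal allocation, pooled variance under H0 and unpooled
    variance under H1 (normal approximation, not an exact calculation).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    p_bar = (pi_c + pi_t) / 2
    sd_null = sqrt(2 * p_bar * (1 - p_bar))
    sd_alt = sqrt(pi_c * (1 - pi_c) + pi_t * (1 - pi_t))
    return ceil(((z_alpha * sd_null + z_beta * sd_alt) / (pi_t - pi_c)) ** 2)


n_arm = asymptotic_n_per_arm(pi_c=0.25, pi_t=0.60, alpha=0.05, power=0.9)
print("per arm:", n_arm, "total:", 2 * n_arm)   # about 33 per arm
```

A rough total in the mid-60s indicates that the exact computation is well below the 1000-subject threshold mentioned in the second guideline, so computing the exact sample size directly is reasonable here.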
The Power vs. Treatment Effect results are also available in tabular form.

35.2 Ratio of Two Binomial Proportions

Let πc and πt denote the binomial probabilities for the control and treatment arms, respectively, and let ρ = πt /πc . It is of interest to test the null hypothesis that ρ = 1 against a one-sided alternative. The technical details of the sample size computations for this option are given in Appendix V.

35.2.1 Trial Design

In a clinical study, an experimental drug coded Y73 is to be compared with a control drug coded X39 to treat chronic hepatitis B infected adult patients. The primary end point is histological improvement as determined by Knodell Scores at week 48 of the treatment period. It is estimated that the proportion of patients likely to show histological improvement is 25% under treatment X39 and as much as 60% under treatment Y73, that is, 2.4 times the rate for X39. A single look, one-sided fixed sample study is to be designed with α = 0.05 and 90% power.

Single Look Design

To illustrate this example, in East under the Design ribbon for Discrete data, click Two Samples and then choose Parallel Design: Ratio of Proportions. This will launch the input window.

Leave the default values of Design Type: Superiority and Number of Looks: 1. In the Design Parameters dialog box, select the Perform Exact Computations checkbox and enter the following parameters:

  Test Type: 1 sided (required)
  Type 1 Error (α): 0.05
  Power: 0.9
  Sample Size (n): Computed (select radio button)
  Prop. under Control (πc ): 0.25
  Prop. under Treatment (πt ): (will be calculated to be 0.6)
  Ratio of Proportions (ρ1 ): 2.4

Click Compute.
The sample size for this design is calculated and the results are shown as a row in the Output Preview window.

The sample size required in order to achieve 90% power is 72 subjects. Note that because of the discreteness involved in performing exact computations, the attained type-1 error is 0.046, less than the specified value of 0.05. Similarly, the attained power is 0.903, slightly larger than the specified value of 0.90.

As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed in the upper pane, labeled Output Summary. This can be saved to the Library by selecting Des 1 and clicking the icon.

Design details can be displayed by clicking the icon.

It is important to note that in this exact computation the displayed sample size may not be unique due to the discreteness of the distribution. This can be seen in the Power Vs Sample Size graph, which is a useful tool along with its corresponding table, and can be used to find all other sample sizes that guarantee the desired power. These visual tools are available in the Library under the Plots and Tables menus. The same results can be viewed in tabular form.

The critical point, or the boundary set for the rejection of H0 , is 1.813 (on the Z scale). If the observed test statistic exceeds this boundary the null will be rejected in favor of declaring the new treatment to be superior.
This boundary can be seen in terms of the observed ratio (0.916 on the ln(ρ) scale and 2.5 on the ρ scale) in the Stopping Boundaries chart and table, available in the Library.

The Power vs. Treatment Effect chart dynamically generates power under this design for all values of treatment effect (ratio or ln(ratio) scale). Here it is easy to see how, as the ratio (treatment effect size) increases (H1 : the new treatment is superior), the power of the study reaches the desired 90%. This is available in tabular form as well.

36 Binomial Non-Inferiority Two-Sample – Exact

In a non-inferiority trial, the goal is to establish that the response rate of an experimental treatment is no worse than that of an established control. A therapy that is demonstrated to be non-inferior to the current standard therapy might be an acceptable alternative if, for instance, it is easier to administer, cheaper, or less toxic. Non-inferiority trials are designed by specifying a non-inferiority margin, which is the acceptable amount by which the response rate on the experimental arm can be less than the response rate on the control arm. If the experimental response rate falls within this margin, the new treatment can claim to be non-inferior. This chapter presents the design of non-inferiority trials in which this margin is expressed as either the difference between or the ratio of two binomial proportions. The difference is examined in Section 36.1 and is followed by two formulations for the ratio in Section 36.2.

Caution: The methods presented in this chapter are computationally intensive and could consume several hours of computer time if the exact sample sizes are very large.
Here are some guidelines:

1. Estimate the likely sample size under the Exact method by first determining the asymptotic sample size.
2. If the exact sample size is likely to be larger than 1000, computing power is preferable to computing the sample size.

36.1 Difference of Proportions

Let πc and πt denote the response rates for the control and experimental treatments, respectively. Let δ = πt − πc . The null hypothesis is specified as

  H0 : δ = δ0

and is tested against one-sided alternative hypotheses. If the occurrence of a response denotes patient harm rather than benefit, then δ0 > 0 and the alternative hypothesis is

  H1 : δ < δ0

or equivalently H1 : πc > πt − δ0 . Conversely, if the occurrence of a response denotes patient benefit rather than harm, then δ0 < 0 and the alternative hypothesis is

  H1 : δ > δ0

or equivalently H1 : πc < πt − δ0 .

For any given πc , the sample size is determined by the desired power at a specified value of δ = δ1 . A common choice is δ1 = 0 (or equivalently πt = πc ) but East allows the study to be powered at any value of δ1 which is consistent with the choice of H1 .

Let π̂t and π̂c denote the estimates of πt and πc based on nt and nc observations from the experimental and control treatments, respectively. The test statistic is

  Z = \frac{\hat{\delta} - \delta_0}{\mathrm{se}(\hat{\delta})}    (36.1)

where

  \hat{\delta} = \hat{\pi}_t - \hat{\pi}_c    (36.2)

and

  \mathrm{se}(\hat{\delta}) = \sqrt{\frac{\tilde{\pi}_t (1 - \tilde{\pi}_t)}{n_t} + \frac{\tilde{\pi}_c (1 - \tilde{\pi}_c)}{n_c}} .    (36.3)

Here π̃t and π̃c are the restricted maximum likelihood estimates of πt and πc . For more details refer to Appendix V.

36.1.1 Trial Design

To evaluate the efficacy and safety of drug A vs. drug B in antiretroviral-naive HIV-infected individuals, a phase 3, 52-week double-blind randomized study is conducted. The primary response measure is the proportion of patients with HIV-RNA levels < 50 copies/mL.
The study is designed as a non-inferiority trial in which a standard drug A is expected to have a response rate of 80% and a new experimental drug B is to be compared under a non-inferiority margin of 20 percentage points (δ0 = −0.20, since a response here denotes patient benefit). For these studies, inferiority is assumed as the null hypothesis and is to be tested against the alternative of non-inferiority using a one-sided test. Therefore under the null hypothesis H0 : πc = 0.8 and πt = 0.60. We will test this hypothesis against H1 , that both response rates are equal to the null response rate of the control arm, i.e. δ1 = 0. Thus, under H1 , we have πc = πt = 0.8. East will be used to conduct a one-sided α = 0.025 level test with 90% power.

Single Look Design

To illustrate this example, in East under the Design ribbon for Discrete data, click Two Samples and then choose Parallel Design: Difference of Proportions. This will launch the input window.

Change Design Type: Noninferiority and keep Number of Looks: 1. In the Design Parameters dialog box, select the Perform Exact Computations checkbox and enter the following parameters:

  Test Type: 1 sided (required)
  Type 1 Error (α): 0.025
  Power: 0.9
  Sample Size (n): Computed (select radio button)
  Specify Proportion Response
    Prop. under Control (πc ): 0.8
  Specify Null Hypothesis
    Prop. under Treatment (πt0 ): 0.6
    Noninferiority margin (δ0 ): -0.2 (will be calculated)
  Specify Alternative Hypothesis
    Prop. under Treatment (πt1 ): 0.8
    Diff. in Prop. (δ1 = πt1 − πc ): 0

Click Compute. The sample size for this design is calculated and the results are shown as a row in the Output Preview window.

This single look design requires a combined total of 172 patients in order to achieve 90% power.
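As a rough cross-check of the magnitude of this exact result, the asymptotic sample size suggested in the guidelines can be approximated by hand. The Python sketch below assumes a simple normal approximation with the variance evaluated under H1 and equal allocation; it does not use the restricted maximum likelihood estimates of equation (36.3) and is not East's algorithm. The function name `ni_asymptotic_n_per_arm` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def ni_asymptotic_n_per_arm(pi_c, pi_t1, delta0, alpha=0.025, power=0.9):
    """Approximate per-arm sample size for a non-inferiority test on delta.

    pi_t1 is the experimental response rate under H1 and delta0 the
    non-inferiority margin (negative when a response is a benefit).
    Variance is evaluated under H1 only (a simplifying assumption).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    delta1 = pi_t1 - pi_c
    variance = pi_t1 * (1 - pi_t1) + pi_c * (1 - pi_c)
    return ceil((z_alpha + z_beta) ** 2 * variance / (delta1 - delta0) ** 2)


n_arm = ni_asymptotic_n_per_arm(pi_c=0.8, pi_t1=0.8, delta0=-0.2)
print("per arm:", n_arm, "total:", 2 * n_arm)   # about 85 per arm
```

The approximation lands close to the exact total reported above, which is the kind of agreement the first guideline relies on when judging whether an exact computation is feasible.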
As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed in the upper pane, labeled Output Summary. This can be saved to the Library by selecting Des 1 and clicking the icon.

The design details can be displayed by clicking the icon.

It is important to note that in this exact computation the displayed sample size may not be unique due to the discreteness of the distribution. This can be seen in the Power Vs Sample Size graph, which is a useful tool along with its corresponding table, and can be used to find all other sample sizes that guarantee the desired power. In this example, sample sizes ranging from approximately 168 to 175 result in power close to the required 0.9. These visual tools are available in the Library under the Plots and Tables menus.

The critical point, or the efficacy boundary set for the rejection of H0 , is 1.991 (on the Z scale) and −0.056 (on the δ scale). If the magnitude of the observed test statistic exceeds this boundary the null will be rejected in favor of declaring the new treatment to be non-inferior. This can also be seen in the Stopping Boundaries chart and table, available in the Library.

The Power vs. Treatment Effect chart dynamically generates power under this design for all values of treatment effect δ = πt − πc .
Here it is easy to see how, as the treatment effect size approaches zero (H1 : no difference between the two treatments), the power of the study reaches the desired 90%. This is available in tabular form as well.

36.2 Ratio of Proportions

Let πc and πt denote the response rates for the control and the experimental treatments, respectively. Let the difference between the two arms be captured by the ratio

  ρ = πt / πc .

The null hypothesis is specified as

  H0 : ρ = ρ0

and is tested against one-sided alternative hypotheses. If the occurrence of a response denotes patient benefit rather than harm, then ρ0 < 1 and the alternative hypothesis is

  H1 : ρ > ρ0

or equivalently H1 : πt > ρ0 πc . Conversely, if the occurrence of a response denotes patient harm rather than benefit, then ρ0 > 1 and the alternative hypothesis is

  H1 : ρ < ρ0

or equivalently H1 : πt < ρ0 πc .

For any given πc , the sample size is determined by the desired power at a specified value of ρ = ρ1 . A common choice is ρ1 = 1 (or equivalently πt = πc ), but East permits you to power the study at any value of ρ1 which is consistent with the choice of H1 .

36.2.1 Trial Design

Suppose that, for a rare disease condition, the cure rate with an expensive treatment A is estimated to be 90%. The claim of non-inferiority for an inexpensive new treatment B can be held if it can be statistically proven that the ratio ρ = πt /πc is at least 0.833. In other words, B is considered to be non-inferior to A as long as πt > 0.75. Thus the null hypothesis H0 : ρ = 0.833 is tested against the one-sided alternative hypothesis H1 : ρ > 0.833. We want to determine the sample size required to have power of 80% when ρ = 1 using a one-sided test with a type-1 error rate of 0.05.
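The link between the margin on the ratio scale and the smallest acceptable cure rate for treatment B is simple arithmetic. A short Python sketch follows, using the example's values; the function name `ratio_margin` is hypothetical and introduced only for illustration.

```python
from math import log


def ratio_margin(pi_c, pi_t0):
    """Non-inferiority margin on the ratio scale: rho0 = pi_t0 / pi_c."""
    return pi_t0 / pi_c


pi_c = 0.90      # cure rate assumed for treatment A
pi_t0 = 0.75     # cure rate for treatment B at the edge of the margin
rho0 = ratio_margin(pi_c, pi_t0)

print("rho0      =", round(rho0, 3))        # 0.833
print("ln(rho0)  =", round(log(rho0), 3))   # margin on the ln(rho) scale
print("rho0*pi_c =", round(rho0 * pi_c, 2)) # recovers 0.75
```

Working on the ln(ρ) scale is convenient because the null value ρ1 = 1 maps to zero there.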
Single Look Design Powered at ρ = 1

Consider a one look study with equal sample sizes in the two groups. In East under the Design ribbon for Discrete data, click Two Samples and then choose Parallel Design: Ratio of Proportions. This will launch the input window.

Change Design Type: Noninferiority and keep Number of Looks: 1. In the Design Parameters dialog box, select the Perform Exact Computations checkbox and keep the Test Statistic set to Wald. Enter the following parameters:

  Test Type: 1 sided (required)
  Type 1 Error (α): 0.05
  Power: 0.8
  Sample Size (n): Computed (select radio button)
  Specify Proportion
    Prop. under Control (πc ): 0.9
  Specify Null Hypothesis
    Prop. under Treatment (πt0 ): 0.75
    Noninferiority margin (ρ0 ): 0.833 (will be calculated)
  Specify Alternative Hypothesis
    Prop. under Treatment (πt1 ): 0.9
    Ratio of Proportions (ρ1 = πt1 /πc ): 1

Click Compute. The sample size for this design is calculated and the results are shown as a row in the Output Preview window.

The sample size required in order to achieve 80% power is 120 subjects. Note that because of the discreteness involved in performing exact computations, the attained power is 0.823, slightly larger than the specified value of 0.80.

As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed in the upper pane, labeled Output Summary. This can be saved to the Library by selecting Des 1 and clicking the icon.
Design details can be displayed by clicking the icon.

It is important to note that in this exact computation the displayed sample size may not be unique due to the discreteness of the distribution. This can be seen in the Power Vs Sample Size graph, which is a useful tool along with its corresponding table, and can be used to find all other sample sizes that guarantee the desired power. These visual tools are available in the Library under the Plots and Tables menus.

The critical point, or the boundary set for the rejection of H0 , is 1.961 (on the Z scale), 0.076 (on the ln(ρ) scale) and 1.079 (on the ρ scale). If the observed test statistic exceeds this boundary the null will be rejected in favor of declaring the new treatment to be non-inferior. This can also be seen in the Stopping Boundaries chart and table, available in the Library.

The Power vs. Treatment Effect chart dynamically generates power under this design for all values of treatment effect (ratio or ln(ratio) scale). Here it is easy to see how, as the treatment effect on the ln(ratio) scale approaches zero, that is, as the ratio approaches 1 (H1 : no difference between the two treatments), the power of the study reaches the desired 80%. This is available in tabular form as well.
37 Binomial Equivalence Two-Sample – Exact

37.1 Equivalence Test

In some experimental situations, it is desired to show that the response rates for the control and the experimental treatments are "close", where "close" is defined prior to the collection of any data. It may be of interest to show that the rate of an adverse event associated with an aggressive therapy is similar to that of the established control. Examples include the bleeding rate associated with thrombolytic therapy or cardiac outcomes with a new stent. Let πc and πt denote the response rates for the control and the experimental treatments, respectively, and let

  δ = πt − πc .    (37.1)

The null hypothesis H0 : |πt − πc | = δ0 is tested against the two-sided alternative H1 : |πt − πc | < δ0 , where δ0 (> 0) defines equivalence. The theory is presented in Section V.4 of Appendix V.

Caution: The methods presented in this chapter are computationally intensive and could consume several hours of computer time if the exact sample sizes are very large. Here are some guidelines:

1. Estimate the likely sample size under the Exact method by first determining the asymptotic sample size.
2. If the exact sample size is likely to be larger than 1000, computing power is preferable to computing the sample size.

37.1.1 Trial Design

Burgess et al. (2005) describe a randomized controlled equivalence trial, in which the objective is to evaluate the efficacy and safety of a 4% dimeticone lotion for treatment of head lice infestation, relative to a standard treatment. The success rate of the standard treatment is estimated to be about 77.5%. Equivalence is defined as δ0 = 0.20. The sample size is to be determined with α = 0.025 (two-sided) and power, i.e. probability of declaring equivalence, of 1 − β = 0.90.
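The first guideline can be illustrated for this example with a back-of-the-envelope calculation. The Python sketch below assumes the standard normal approximation for two one-sided tests, each at level α, with equal allocation; when the expected difference δ1 is zero the type-2 error is split between the two sides. It is an assumption-laden approximation, not the exact method of Appendix V, and the function name `equivalence_n_per_arm` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def equivalence_n_per_arm(pi_c, delta1, delta0, alpha=0.025, power=0.9):
    """Approximate per-arm sample size for an equivalence test on pi_t - pi_c.

    delta1 is the expected difference and delta0 (> 0) the equivalence margin.
    Uses z_(1 - beta/2) when delta1 == 0 and z_(1 - beta) otherwise.
    """
    z = NormalDist()
    beta = 1 - power
    z_alpha = z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(1 - beta / 2) if delta1 == 0 else z.inv_cdf(1 - beta)
    pi_t = pi_c + delta1
    variance = pi_c * (1 - pi_c) + pi_t * (1 - pi_t)
    return ceil((z_alpha + z_beta) ** 2 * variance / (delta0 - abs(delta1)) ** 2)


n_arm = equivalence_n_per_arm(pi_c=0.775, delta1=0.0, delta0=0.20)
print("per arm:", n_arm, "total:", 2 * n_arm)   # about 114 per arm

# The requirement grows quickly as the expected difference nears the margin
for d1 in (0.05, 0.10):
    print(d1, 2 * equivalence_n_per_arm(pi_c=0.775, delta1=d1, delta0=0.20))
```

The loop at the end shows why the expected difference matters: the denominator shrinks as δ1 moves toward δ0, so the approximate total rises steeply.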
To illustrate this example, in East under the Design ribbon for Discrete data, click Two Samples and then choose Parallel Design: Difference of Proportions. This will launch the input window.

Change Design Type: Equivalence and in the Design Parameters dialog box, select the Perform Exact Computations checkbox. Enter the following parameters:

  Test Type: 2 sided (required)
  Type 1 Error (α): 0.025
  Power: 0.9
  Sample Size (n): Computed (select radio button)
  Specify Proportion Response
    Prop. under Control (πc ): 0.775
    Prop. under Treatment (πt0 ): 0.775 (will be calculated)
    Expected Diff. (δ1 = πt − πc ): 0
    Equivalence Margin (δ0 ): 0.2

Click Compute. The sample size for this design is calculated and the results are shown as a row in the Output Preview window.

This single look design requires a combined total of 228 patients in order to achieve 90% power.

As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed in the upper pane, labeled Output Summary. This can be saved to the Library by selecting Des 1 and clicking the icon.

The design details, which include critical points, or the boundaries set for the rejection of H0 , can be displayed by clicking the icon.

It is important to note that in this exact computation the displayed sample size may not be unique due to the discreteness of the distribution.
This can be seen in the Power Vs Sample Size graph, which is a useful tool along with its corresponding table, and can be used to find all other sample sizes that guarantee the desired power. These visual tools are available in the Library under the Plots and Tables menus. The same results can be viewed in tabular form.

Suppose the expected value of the difference in treatment proportions δ1 is 0.05 or 0.10. A recalculation of the design shows the required sample size will increase to 300 and 606, respectively.

38 Binomial Simon’s Two-Stage Design

The purpose of a phase II trial is to determine if a new drug has sufficient efficacy against a specific disease or condition to either warrant further development within Phase II, or to advance onto a Phase III study. In a two-stage design, a fixed number of patients are recruited and treated initially, and if the protocol is considered effective the second stage will continue to enroll additional patients for further study regarding efficacy and safety.

This chapter presents an example for the widely used two-stage optimal and minimax designs developed by Simon (1989). In addition, East supports the framework of an admissible two-stage design, a graphical method geared to search for an alternative with more favorable features (Jung et al., 2004). The underlying theory is examined in Appendix U.

38.1 An Example

During a Phase II study of an experimental drug, a company determined that a response rate of 10% or less is to be considered poor, whereas a response rate of 40% or more is to be considered promising or good.
Requirements call for a two-stage study with the following hypotheses:

  H0 : π ≤ 0.10
  H1 : π ≥ 0.40

and design parameters α = 0.05 and 1 − β = 0.90.

38.1.1 Trial Design

To illustrate this example, in East under the Design ribbon for Discrete data, click One Sample and then choose Single Arm Design: Simon’s Two Stage Design. This will launch the input window.

Choose Design Type: Optimal and enter the following parameters in the Design Parameters dialog box:

  Test Type: 1 sided (required)
  Type 1 Error (α): 0.05
  Power: 0.9
  Upper Limit for Sample Size: 100
  Prop. Response under Null (π0 ): 0.1
  Prop. Response under Alternative (π1 ): 0.4

Click Compute. The design is calculated and the results are shown as a row in the Output Preview window.

As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon. The design details will be displayed in the upper pane, labeled Output Summary. Note that because of the discreteness involved in performing exact computations, the attained type-1 error is less than the specified value of 0.05. Similarly, the attained power is slightly larger than the specified value. Save this design to the Library by selecting Des 1 and clicking the icon.

Under the optimal design, the combined maximum sample size for both stages is computed to be 20. The boundary parameter for futility at the first look is 1, and at the second look it is 4. What this means can be further explained using the Stopping Boundaries chart available under the Plots menu.
The scale of the stopping boundaries can be displayed using either number of responses (# Resp. Scale) or Proportion Scale. The above graph uses the number of responses, which tells us that at the first look, when the cumulative sample size is 9, the trial could be stopped for futility if no more than one patient shows a favorable response to treatment. At the second stage, when all 20 patients are enrolled, the boundary response to reject H1 is 4 or less. The Stopping Boundaries table under the Tables menu also tells us that the probability of crossing the stopping boundary, thus warranting early termination, is 0.775.

Results can be further analyzed using the Expected Sample size (under Null) vs. Sample Size graph, which is also available in tabular form:

For a more detailed analysis of the design, select the icon in the Library. In addition to the details of the requested optimal design, East also reports the sample sizes, power, probabilities, and weights for the minimax and admissible designs.

For the optimal design the expected sample size under the null, which assumes the drug performs poorly, is 11.447, which can also be seen in the Admissible Designs table, available under the Tables menu:

To regenerate the study using a minimax design, select the Edit Design icon. Select Design Type: Minimax, leave all design parameters the same and click Compute.

The cumulative maximum sample size for both stages using this design is 18. As with the optimal design, the first stage boundary response to reject H1 is 1 or less and the second stage boundary response to reject H1 is 4 or less. Save this design to the Library by selecting Des 2 and clicking the icon. Design details, graphs and tables can be obtained as with the optimal design described above.

East provides the capability to visually compare stopping boundaries for both methods simultaneously using a compare plots graph. From the Library select both designs, click the icon, and select Stopping Boundaries.

These stopping boundaries can be compared in tabular format as well:

Although the futility boundaries are the same for both designs, the cumulative sample sizes at each stage differ. We also see that the probability of early stopping for futility is higher under the optimal design (0.775) than under the minimax design (0.659). However, the cumulative sample size at stage one for the optimal design is only 9, whereas the minimax design requires 12 subjects for the first stage.

Referring to the design details generated for the optimal design above, we see that an admissible design (labeled Design # 2) requires a total sample size of 19. Here, the cumulative number of subjects required at the end of stage one is only 6, and the probability of early stopping is 0.531, less than for both the optimal and minimax designs. It is also worth noting that for the admissible design, the boundary parameter for futility at the first look is 0. This means that only one patient has to show a promising result for the study to proceed to a second stage, whereas at least two successes are required for both the optimal and minimax designs to warrant a second stage.
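The early-stopping probabilities quoted above can be checked directly from the binomial distribution. The following is a minimal sketch in Python, not East's implementation: the function names are hypothetical, and the only inputs are the stage sizes and futility boundaries reported in this example (optimal: stop if at most 1 response in 9, futility if at most 4 in 20; minimax: at most 1 in 12, at most 4 in 18).

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x responses among n patients."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x: int, n: int, p: float) -> float:
    """Probability of at most x responses among n patients."""
    if x < 0:
        return 0.0
    return sum(binom_pmf(i, n, p) for i in range(min(x, n) + 1))


def simon_two_stage(r1: int, n1: int, r: int, n: int, p: float) -> dict:
    """Operating characteristics of a two-stage design at response rate p.

    Stage 1: stop for futility if responses <= r1 out of n1 patients.
    Stage 2: enroll n - n1 more; conclude futility if total responses <= r.
    """
    n2 = n - n1
    pet = binom_cdf(r1, n1, p)  # probability of early termination
    # Continue past stage 1 AND finish with more than r responses in total.
    prob_reject_h0 = sum(
        binom_pmf(x1, n1, p) * (1.0 - binom_cdf(r - x1, n2, p))
        for x1 in range(r1 + 1, n1 + 1)
    )
    return {
        "pet": pet,
        "expected_n": n1 + (1.0 - pet) * n2,
        "prob_reject_h0": prob_reject_h0,
    }


optimal_null = simon_two_stage(r1=1, n1=9, r=4, n=20, p=0.10)
optimal_alt = simon_two_stage(r1=1, n1=9, r=4, n=20, p=0.40)
minimax_null = simon_two_stage(r1=1, n1=12, r=4, n=18, p=0.10)

print("optimal PET under H0:", round(optimal_null["pet"], 3))
print("minimax PET under H0:", round(minimax_null["pet"], 3))
print("optimal attained alpha:", optimal_null["prob_reject_h0"])
print("optimal attained power:", optimal_alt["prob_reject_h0"])
```

Evaluated under the null response rate of 0.10, `prob_reject_h0` is the attained type-1 error; evaluated at 0.40 it is the attained power. The early-termination probability for the admissible design's first stage (boundary 0 with 6 patients) is simply `binom_cdf(0, 6, 0.10)`.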
Volume 5 Poisson and Negative Binomial Endpoints

39 Introduction to Volume 4
40 Count Data One-Sample
41 Count Data Two-Samples

39 Introduction to Volume 4

This volume describes various cases of clinical trials involving count data. Count data arise often in medical research, where events are counted in whole numbers and may be considered rare. Typically, interest lies in the rate of occurrence of a particular event during a specific time interval or other unit of space.

Chapter 40 describes the design of tests involving count or Poisson response rates in which an observed response rate is compared to a fixed response rate, possibly derived from historical data.

Chapter 41 deals with the comparison of independent samples from two populations in terms of the rate of occurrence of a particular outcome. East supports the design of clinical trials in which this comparison is based on the ratio of rates, assuming a Poisson or Negative Binomial distribution.

39.1 Settings

Click the icon in the Home menu to adjust default values in East 6.

The options provided in the Display Precision tab are used to set the decimal places of numerical quantities. The settings indicated here will be applicable to all tests in East 6 under the Design and Analysis menus.

All these numerical quantities are grouped in different categories depending upon their usage. For example, all the average and expected sample sizes computed at simulation or design stage are grouped together under the category "Expected Sample Size". So to view any of these quantities with greater or lesser precision, select the corresponding category and change the decimal places to any value between 0 and 9.

The General tab has the provision of adjusting the paths for storing workbooks, files, and temporary files.
These paths will remain throughout the current and future sessions even after East is closed. This is the place where we need to specify the installation directory of the R software in order to use the feature of R Integration in East 6.

The Design Defaults is where the user can change the settings for trial design:

Under the Common tab, default values can be set for input design parameters.

You can set up the default choices for the design type, computation type, test type and the default values for type-I error, power, sample size and allocation ratio. When a new design is invoked, the input window will show these default choices.

Time Limit for Exact Computation
This time limit is applicable only to exact designs and charts. Exact methods are computationally intensive and can easily consume several hours of computation time if the likely sample sizes are very large. You can set the maximum time available for any exact test in terms of minutes. If the time limit is reached, the test is terminated and no exact results are provided. The minimum and default value is 5 minutes.

Type I Error for MCP
If the user has selected a 2-sided test as the default in global settings, then any MCP will use half of the alpha from settings as its default, since MCP is always a 1-sided test.

Sample Size Rounding
Notice that by default, East displays the integer sample size (events) by rounding up the actual number computed by the East algorithm. In this case, the look-by-look sample size is rounded off to the nearest integer. One can also see the original floating point sample size by selecting the option "Do not round sample size/events".

Under the Group Sequential tab, defaults are set for boundary information. When a new design is invoked, input fields will contain these specified defaults.
We can also set the option to view the Boundary Crossing Probabilities in the detailed output. It can be either Incremental or Cumulative.

Simulation Defaults is where we can change the settings for simulation:

If the checkbox for "Save summary statistics for every simulation" is checked, then East simulations will by default save the per-simulation summary data for all the simulations in the form of case data.

If the checkbox for "Suppress All Intermediate Output" is checked, the intermediate simulation output window will always be suppressed and you will be directed to the Output Preview area.

The Chart Settings allows defaults to be set for the following quantities on East 6 charts:

We suggest that you do not alter the defaults until you are quite familiar with the software.

40 Count Data One-Sample

This chapter deals with the design of tests involving count or Poisson response rates. Here, independent outcomes or events under examination can be counted in terms of whole numbers, and typically are considered rare. A basic assumption of the Poisson distribution is that the probability of an event occurring is proportional to the length of time under consideration. The longer the time interval, the more likely the event will occur. Therefore, in this context interest lies in the rate of occurrence of a particular event during a specified period. Section 40.1 focuses on designs in which an observed Poisson response rate is compared to a fixed response rate, possibly derived from historical data.

40.1 Single Poisson Rate

Data following a Poisson distribution are non-negative integers, and the probability that an outcome occurs exactly k times can be calculated as:

$$P(k) = \frac{e^{-\lambda}\,\lambda^{k}}{k!}, \qquad k = 0, 1, 2, \ldots$$

where λ is the average rate of occurrence.
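As a small illustration of the Poisson probability formula for a single rate, the sketch below evaluates it in Python. The function name `poisson_pmf` is hypothetical and the rate 0.2 is chosen only for illustration; the log-scale evaluation is a numerical convenience for large k, not something prescribed by East.

```python
from math import exp, lgamma, log


def poisson_pmf(k: int, lam: float) -> float:
    """P(k) = exp(-lam) * lam**k / k! for k = 0, 1, 2, ..."""
    if k < 0:
        return 0.0
    if lam == 0.0:
        return 1.0 if k == 0 else 0.0
    # Work on the log scale so that large k does not overflow k!.
    return exp(-lam + k * log(lam) - lgamma(k + 1))


rate = 0.2  # illustrative average number of events per unit time
for k in range(4):
    print(k, poisson_pmf(k, rate))

# Over a follow-up time D the expected count is rate * D, so a longer
# interval makes at least one event more likely.
for follow_up in (1, 2, 5):
    print(follow_up, 1.0 - poisson_pmf(0, rate * follow_up))
```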
When comparing a new protocol or treatment to a well-established control, a preliminary single-sample study may yield valuable information prior to a full-scale investigation. In experimental situations it may be of interest to determine whether the response rate λ differs from a fixed value λ0. Specifically we wish to test the null hypothesis H0: λ = λ0 against the two-sided alternative hypothesis H1: λ ≠ λ0 or against one-sided alternatives of the form H1: λ > λ0 or H1: λ < λ0. The sample size, or power, is determined for a specified value of λ which is consistent with the alternative hypothesis, denoted λ1.

40.1.1 Trial Design

Consider the design of a single-arm clinical trial in which we wish to determine if the positive response rate of a new acute pain therapy is at least 30% per single treatment cycle. Thus, it is desired to test the null hypothesis H0: λ = 0.2 against the one-sided alternative hypothesis H1: λ ≥ 0.3. The trial will be designed such that a one-sided α = 0.05 test achieves 80% power at λ = λ1 = 0.3.

In the Design tab under the Count group choose One Sample and then Single Poisson Rate. This will launch the following input window:

Enter the following design parameters:

Test Type: 1 sided
Type 1 Error (α): 0.05
Power: 0.8
Sample Size (n): Computed (select radio button)
Rate under Null (λ0): 0.2
Rate under Alt. (λ1): 0.3
Follow-up Time (D): 1

Click Compute. The design is shown as a row in the Output Preview window:

The sample size required in order to achieve the desired 80% power is 155 subjects. As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar.
The design details are displayed, labeled Output Summary.

In the Output Preview toolbar, click the icon to save this design Des1 to workbook Wbk1 in the Library. An alternative method to view design details is to hover the cursor over the node Des1 in the Library. A tooltip will appear that summarizes the input parameters of the design.

Click the icon on the Library toolbar, and then click Power vs. Sample Size. The power curve for this design will be displayed. You can save this chart to the Library by clicking Save in Workbook. Alternatively, you can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As... or Export into a PowerPoint presentation.

Close the Power vs. Sample Size chart. To view a summary of all characteristics of this design, select Des1 in the Library, and click the icon.

In addition to the Power vs. Sample Size chart and table, East also provides the efficacy boundary in the Stopping Boundaries chart and table.

Alternatively, East allows the computation of either the Type-1 error (α) or Power for a given sample size. Using the Design Input/Output window as described above, simply enter the desired sample size and click Compute to calculate the resulting power of the test.

Power vs Sample Size: Sawtooth paradigm

Consider the following design, which uses East to compute power assuming a one sample, single Poisson rate.

Test Type: 1 sided
Type 1 Error (α): 0.025
Power: Computed
Sample Size (n): 525
Rate under Null (λ0): 0.049
Rate under Alt. (λ1): 0.012
Follow-up Time (D): 0.5

Save the design to a workbook, and then generate the Power vs. Sample Size graph to obtain the power chart.
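The way discreteness makes attained significance level and power jump as the sample size changes can be explored with a short exact calculation. The sketch below is written in Python under stated assumptions and is not East's algorithm: it assumes the test is based on the total event count S, with S distributed as Poisson(n × D × λ), that the rejection region is the lower tail (because λ1 < λ0 in the design above), and that the critical value is the largest c whose null tail probability does not exceed α. East's own rule for choosing the critical value may differ, so these numbers need not match its output. The function names are hypothetical.

```python
from math import exp, lgamma, log


def poisson_cdf(c: int, mean: float) -> float:
    """P(S <= c) for S ~ Poisson(mean), mean > 0; returns 0 for c < 0."""
    if c < 0:
        return 0.0
    total = sum(exp(-mean + k * log(mean) - lgamma(k + 1)) for k in range(c + 1))
    return min(1.0, total)


def lower_tail_design(n: int, lam0: float, lam1: float,
                      follow_up: float, alpha: float) -> dict:
    """Exact lower-tailed test on the total count (an assumed test form)."""
    mean0 = n * follow_up * lam0  # expected total count under H0
    mean1 = n * follow_up * lam1  # expected total count under H1
    c = -1  # c = -1 means the test can never reject
    while poisson_cdf(c + 1, mean0) <= alpha:
        c += 1
    return {
        "critical_value": c,
        "attained_alpha": poisson_cdf(c, mean0),
        "power": poisson_cdf(c, mean1),
    }


for n in (525, 565, 580):
    print(n, lower_tail_design(n, lam0=0.049, lam1=0.012, follow_up=0.5, alpha=0.025))
```

Between jumps of the critical value, both attained α and power drift downward as n grows; each time the critical value increases by one, both jump upward. Plotting `power` against n over a range of sample sizes therefore produces the sawtooth shape.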
The curve in the Power vs. Sample Size graph for this design is commonly described in the literature as a sawtooth chart. This chart illustrates that it is possible to have a design where different sample sizes could obtain the same power. As with the binomial distribution, the Poisson distribution is discrete. For power and sample size computations for discrete data, the so-called "sawtooth" phenomenon occurs.

The data can also be displayed in a chart form by selecting the icon in the Library, and can be printed or saved as case data.

It is important to note that for designs with the same power, the attained significance level may vary. For example, the sample sizes of 565 and 580 seem to have a similar power of about 0.94. Upon computing two new designs based on the above design with sample sizes of 565 and 580 respectively, it is apparent that the attained significance levels are different. The design with the lower sample size of 565 pays a higher penalty in terms of type-1 error (α = 0.03) than the plan with the larger sample size of 580 (α = 0.016).

41 Count Data Two-Samples

Often in experiments based on count data, the aim is to compare independent samples from two populations in terms of the rate of occurrence of a particular outcome. In medical research, outcomes such as the number of times a patient responds to a therapy, develops a certain side effect, or requires specialized care, are of interest. Or perhaps a therapy is being evaluated to determine the number of times it must be applied until an acceptable response rate is observed.
East supports the design of clinical trials in which this comparison is based on the ratio of rates, assuming a Poisson or Negative Binomial distribution. These two cases are presented in Sections 41.1 and 41.2, respectively.

41.1 Poisson - Ratio of Rates

41.1.1 Trial Design
41.1.2 Example - Coronary Heart Disease

Let λc and λt denote the Poisson rates for the control and treatment arms, respectively, and let ρ1 = λt/λc. We want to test the null hypothesis that ρ1 = 1 against one- or two-sided alternatives. The sample size, or power, is determined for values consistent with the alternative hypothesis, that is H1: λt ≠ λc, H1: λt > λc, or H1: λt < λc.

41.1.1 Trial Design

Suppose investigators are preparing design objectives for a prospective randomized trial of a standard treatment (control arm) vs. a new combination of medications (therapy arm) to present at a clinical trials workshop. The endpoint of interest is the number of abnormal ECGs (electrocardiograms) within seven days. The investigators are interested in comparing the therapy arm to the control arm with a two-sided test conducted at the 0.05 level of significance (0.025 in each tail). It can be assumed that the rate of abnormal ECGs in the control arm is 30%, thus λt = λc = 0.3 under H0. The investigators wish to determine the sample size to attain power of 80% if there is a 25% decline in the event rate, that is λt/λc = 0.75. It is important to note that the power of the test depends on λc and λt, not just the ratio, so different values of the pair (λc, λt) with the same ratio will yield different solutions.

We will now design a study that compares the control arm to the combination therapy arm. In the Design tab under the Count group choose Two Samples and then Parallel Design - Ratio of Poisson Rates.
This will launch the following input window:

Enter the following design parameters:

Test Type: 2-sided
Type 1 Error (α): 0.05
Power: 0.8
Sample Size (n): Computed (select radio button)
Allocation Ratio (nt/nc): 1
Rate for Control (λc): 0.3
Rate for Treatment (λt): 0.225 (will be automatically calculated)
Ratio of Rates ρ1 = (λt/λc): 0.75
Follow-up Control (Dc): 7
Follow-up Treatment (Dt): 7

The Allocation Ratio (nt : nc) describes the ratio of patients to each arm. For example, an allocation ratio of 3:1 indicates that 75% of the patients are randomized to the treatment arm as opposed to 25% to the control. Here we assume the same number of patients in both arms. Click Compute. The design is shown as a row in the Output Preview window:

The sample size required in order to achieve the desired 80% power is 211 subjects. As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details are displayed, labeled Output Summary.

In the Output Preview toolbar, click the icon to save this design Des1 to workbook Wbk1 in the Library. An alternative method to view design details is to hover the cursor over the node Des1 in the Library. A tooltip will appear that summarizes the input parameters of the design.

With the design Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs. Sample Size. The power curve for this design will be displayed. You can save this chart to the Library by clicking Save in Workbook.
Alternatively, you can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As... or Export into a PowerPoint presentation.

Close the Power vs. Sample Size chart. To view all computed characteristics of this design, select Des1 in the Library, and click the icon.

In addition to the Power vs. Sample Size chart and table, East also provides the efficacy boundary in the Stopping Boundaries chart and table.

Alternatively, East allows the computation of either the Type-1 error (α) or Power for a given sample size. Using the Design Input/Output window as described above, simply enter the desired sample size and click Compute to calculate the resulting power of the test.

41.1.2 Example - Coronary Heart Disease

The following example is presented in the paper by Gu, et al. (2008), which references a prospective study reported by Stampfer and Willett (1985) examining the relationship between post-menopausal hormone use and coronary heart disease (CHD). Researchers were interested in whether the group using hormone replacement therapy exhibited less coronary heart disease. The study did show strong evidence that the incidence rate of CHD in the group who did not use hormonal therapy was higher than that in the group who did use post-menopausal hormones. The authors then determined the sample size necessary for the two groups when what they referred to as the ratio of sampling frames, known as the allocation ratio in East, is 2. The study assumed an observation time of 2 years, and that the incidence rate of CHD for those using the hormone therapy is 0.0005.
The following excerpt from the paper presents the required sample sizes for the participants using hormone therapy in order to achieve 90% power at α = 0.05, for multiple different test procedures:

It is first necessary to determine the difference in notation between the referenced paper and that used by East:

Gu et al. (2008)    East
γ1                  λt
γ0                  λc
R′ = 4              1/ρ1, with ρ1 = 0.25
D                   Allocation Ratio = 2

Once again in the Design tab under the Count group choose Two Samples and then Parallel Design - Ratio of Poisson Rates. Enter the following design parameters:

Test Type: 1-sided
Type 1 Error (α): 0.05
Power: 0.9
Sample Size (n): Computed (select radio button)
Allocation Ratio (nt/nc): 2
Rate for Control (λc): 0.002
Rate for Treatment (λt): 0.0005
Ratio of Rates ρ1 = (λt/λc): 0.25
Follow-up Control (Dc): 2
Follow-up Treatment (Dt): 2

The Allocation Ratio (nt : nc) describes the ratio of patients to each arm. For example, an allocation ratio of 2:1 indicates that two-thirds of the patients are randomized to the treatment arm as opposed to one-third to the control. Compute the design to produce the following output:

Table 6 in the referenced paper shows the number of subjects required for the treatment group. The East results show that the total number of subjects required for the entire study is 10027. Given that the allocation ratio is 2, the number of subjects required for the control group is 10027/3 = 3342 and for the treatment group is therefore 6685. This falls in the range of the sample sizes presented in the referenced paper (and close to the minimum of 6655), which again calculates these sizes using a number of different methods.
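The two totals reported in this section (211 subjects for the ECG design and 10027 for the coronary heart disease example) are consistent with a textbook large-sample formula based on the logarithm of the estimated rate ratio. The sketch below, in Python, shows that calculation. Whether East uses exactly this formula internally is an assumption; the function name and argument names are hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def poisson_ratio_total_n(lam_c: float, lam_t: float,
                          follow_c: float, follow_t: float,
                          alpha: float, power: float,
                          allocation: float = 1.0, sides: int = 1) -> int:
    """Total sample size for testing a ratio of two Poisson rates.

    Uses the approximation Var(log rate ratio) =
    1 / (n_c * D_c * lam_c) + 1 / (n_t * D_t * lam_t),
    with allocation = n_t / n_c. This is an assumed formula, not East's code.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1.0 - alpha / sides)
    z_beta = z(power)
    frac_c = 1.0 / (1.0 + allocation)         # share of subjects on control
    frac_t = allocation / (1.0 + allocation)  # share of subjects on treatment
    # Variance of the log rate ratio multiplied by the total sample size.
    var_unit = 1.0 / (frac_c * follow_c * lam_c) + 1.0 / (frac_t * follow_t * lam_t)
    return ceil((z_alpha + z_beta) ** 2 * var_unit / log(lam_t / lam_c) ** 2)


ecg_total = poisson_ratio_total_n(0.3, 0.225, 7, 7, alpha=0.05, power=0.8,
                                  allocation=1, sides=2)
chd_total = poisson_ratio_total_n(0.002, 0.0005, 2, 2, alpha=0.05, power=0.9,
                                  allocation=2, sides=1)
print(ecg_total, chd_total)  # 211 10027
```

Because the variance term involves λc and λt separately, two designs with the same ratio but different underlying rates give different sample sizes, which is the point made in Section 41.1.1.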
41.2 Negative Binomial Ratio of Rates

In experiments where the data follow a binomial distribution, the number of successful outcomes for a fixed number of trials is of importance when determining the sample size to adequately power a study. Suppose instead that it is of interest to observe a fixed number of successful outcomes (or failures), but the overall number of trials necessary to achieve this is unknown. In this case, the data are said to follow a Negative Binomial Distribution. There are two underlying parameters of interest. As with the Poisson distribution, λ denotes the average rate of response for a given outcome. In addition, a shape parameter γ specifies the desired number of observed "successes".

As with the Poisson distribution, the Negative Binomial distribution can be useful when designing a trial where one must wait for a particular event. Here, we are waiting for a specific number of successful outcomes to occur.

A Poisson regression analysis assumes a common rate of events for all subjects within a stratum, as well as equal mean and variance (equidispersion). With overdispersed count data, estimates of standard error from these models can be invalid, leading to difficulties in planning a clinical trial. Increased variability resulting from overdispersed data requires a larger sample size in order to maintain power. To address this issue of allowing variability between patients, East provides valid sample size and power calculations for count data using a negative binomial model, resulting in a better evaluation of study design and increased likelihood of trial success.

41.2.1 Trial Design

Suppose that a hypothetical manufacturer of robotic prostheses, those that require several components to fully function, has an order to produce a large quantity of artificial limbs. According to historical data, about 20% of the current limbs fail the rigorous quality control test and therefore cannot be shipped to patients.
For each order, the manufacturer must produce more than requested; in fact they must continue to produce the limbs until the desired quantity passes quality control. Given that there is a high cost in producing these prosthetic limbs, it is of great interest to reduce the number of those that fail the test.

The company plans to introduce a new feature to the current model, with the goal of reducing the probability of failure to 10%. It is safe to assume that the enhancement will not cause a decline in the original success rate. In this scenario, we wish to test the null hypothesis H0: λc = λt = 0.2 against the one-sided alternative of the form H1: λc > λt.

Quality control investigators wish to conduct a one-sided test at the α = 0.05 significance level to determine the sample size required to obtain 90% power to observe a 50% decline in the event rate, i.e. λt/λc = 0.5. It is important to note that the power of the test depends on λc and λt, not just the ratio, so different values of the pair (λc, λt) with the same ratio will have different solutions. The same holds true for the shape parameter. Different values of (γc, γt) will result in different sample sizes or power calculations. East allows user-specified shape parameters for both the treatment and control groups; however, for this example assume that the desired number of successful outcomes for both groups is 10.

The following illustrates the design of a two-arm study comparing the control arm, which is the current model of the prosthesis, to the treatment arm, which is the enhanced model. In the Design tab under the Count group choose Two Samples and then Parallel Design - Ratio of Negative Binomial Rates.

This will launch the following input window:

Enter the following design parameters:

Test Type: 1 sided
Type 1 Error (α): 0.05
Power: 0.9
Sample Size (n): Computed (select radio button)
Allocation Ratio (nt/nc): 1
Rate for Control (λc): 0.2
Rate for Treatment (λt): 0.1
Ratio of Rates ρ = (λt/λc): 0.5
Follow-up Time (D): 1
Shape Control (γc): 10
Shape Treatment (γt): 10

The Allocation Ratio (nt : nc) describes the ratio of patients to each arm. For example, an allocation ratio of 3:1 indicates that 75% of the patients are randomized to the treatment arm as opposed to 25% to the control. Here we assume the same number of patients in both arms. Click Compute. The design is shown as a row in the Output Preview window:

The sample size required in order to achieve the desired 90% power is 1248 subjects. As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed, labeled Output Summary.

In the Output Preview toolbar, click the icon to save this design Des1 to workbook Wbk1 in the Library. An alternative method to view design details is to hover the cursor over the node Des1 in the Library. A tooltip will appear that summarizes the input parameters of the design.

With the design Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs. Sample Size. The power curve for this design will be displayed. You can save this chart to the Library by clicking Save in Workbook.
Alternatively, you can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As... or Export into a PowerPoint presentation.

Close the Power vs. Sample Size chart. To view all computed characteristics of this design, select Des1 in the Library, and click the icon.

In addition to the Power vs. Sample Size chart and table, East also provides the efficacy boundary in the Stopping Boundaries chart and table.

For a specific desired sample size, East allows the computation of either the Type-1 error (α) or Power for a test. Using the Design Input/Output window and methods as described above, simply enter the desired sample size and click Compute to calculate the resulting power of the test.

In addition to this example, consider the following illustration of the benefit of using the negative binomial model in clinical trials. In real-life settings, the variance of count data observed between patients is typically higher than the observed mean. The negative binomial model accommodates between-subject heterogeneity according to a Gamma distribution. For example:

Poisson: Y ∼ Poisson(λ)
Negative Binomial: Yi ∼ Poisson(λki), where ki ∼ Gamma(k)

In the case of no overdispersion (k = 0) the negative binomial model reduces to the Poisson model. In the figure below, the Poisson and negative binomial models are displayed under various values of the dispersion parameter.

Assuming the above parameterization, the variance of the negative binomial model is λ + kλ². The variance is thus inflated by the factor 1 + kλ, which is linear in k and depends on the mean.
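The variance expression for the negative binomial model follows from the law of total variance. The short derivation below is a sketch under one stated assumption that the text does not spell out: the Gamma mixing variable k_i is taken to have mean 1 and variance k, which is the usual convention when k is called the dispersion parameter.

```latex
\begin{align*}
E[Y_i] &= E\bigl[E[Y_i \mid k_i]\bigr] = \lambda\,E[k_i] = \lambda, \\
\operatorname{Var}(Y_i)
  &= E\bigl[\operatorname{Var}(Y_i \mid k_i)\bigr]
   + \operatorname{Var}\bigl(E[Y_i \mid k_i]\bigr) \\
  &= \lambda\,E[k_i] + \lambda^{2}\operatorname{Var}(k_i) \\
  &= \lambda + k\lambda^{2} = \lambda\,(1 + k\lambda).
\end{align*}
```

Setting k = 0 removes the second term and recovers the Poisson case, where the variance equals the mean.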
Depending on the distributional assumption used and its impact on the variance, sample size and power can vary widely.

In multiple sclerosis (MS) patients, magnetic resonance imaging (MRI) is used as a marker of efficacy by means of serial counts of lesions appearing on the brain. Exacerbation rates are frequently used as a primary endpoint in MS as well as in chronic obstructive pulmonary disease (COPD) and asthma (Keene et al. 2007). Poisson regression could be considered; however, this model would not address variability between patients, resulting in overdispersion. The negative binomial model offers an alternative approach.

TRISTAN (Keene et al. 2007) was a double-blind, randomized study for COPD comparing the effects of the salmeterol/fluticasone propionate combination product (SFC) to salmeterol alone, fluticasone propionate alone and placebo. Although the primary endpoint was pre-bronchodilator FEV1, the number of exacerbations was an important secondary endpoint.

Suppose we are to design a new trial to be observed over a period of 1 to 2 years. The primary objective is the reduction of the rate of exacerbations, defined as a worsening of COPD symptoms that requires treatment with antibiotics, cortisone or both, with the combination product SFC versus placebo. Based on the TRISTAN results, we aim to reduce the incidence of events by 33%.

Suppose the exacerbation rate on placebo is 1.5 per year, and we expect a rate of 1.0 in the combination group. Assume a 2-sided test with a 5% significance level and 90% power. Using a Poisson model, a total of 214 patients need to be enrolled in the study.

For the TRISTAN data, the estimate of the overdispersion parameter was 0.46 (95% CI: 0.34-0.60).
Using a negative binomial model with overdispersion of 0.33, 0.66, 1 and 2, the required sample size increases, ranging from 298 to 725, respectively.

Exacerbation rates are calculated as the number of exacerbations divided by the length of time in treatment in years. East can be used to illustrate the impact of a one-year versus a two-year study by changing the follow-up duration. For 382 patients and a shape parameter of 0.66, power is increased from 90% to 97% when follow-up time is doubled. The number of patients required for a two-year study powered at 90% is 277, whereas 382 patients would be required to achieve the same power for a study period of one year.

Negative binomial models are increasing in popularity for medical research, and as the industry standard for trial design, East continues to evolve by incorporating sample size methods for count data. These models allow the count to vary around the mean for groups of patients instead of the population means. Additionally, increased variability does lead to a larger test population; consequently, the balance between power, sample size and duration of observation needs to be evaluated.
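To make the link between overdispersion, follow-up time and sample size concrete, here is a minimal Python sketch based on a common normal approximation for the log rate ratio, in which the variance per subject is taken as 1/(rate_control x t) + 1/(rate_treatment x t) + 2k. This is an assumption chosen for illustration and is not claimed to be the algorithm East uses; the function and parameter names are hypothetical. Under these assumptions it gives 214 for the Poisson case and 298 for k = 0.33, in line with the figures quoted above, but other settings may differ from East's results by a few patients.

```python
from math import ceil, log
from statistics import NormalDist


def total_sample_size(rate_control, rate_treatment, dispersion=0.0,
                      followup_years=1.0, alpha=0.05, power=0.90):
    """Approximate total sample size (1:1 allocation, 2-sided test) for a
    ratio of negative binomial rates; dispersion=0 is the Poisson case.

    Assumed variance of the log rate ratio per subject:
        1/(rate_control*t) + 1/(rate_treatment*t) + 2*dispersion
    """
    z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = (1.0 / (rate_control * followup_years)
                + 1.0 / (rate_treatment * followup_years)
                + 2.0 * dispersion)
    n_per_arm = z_sum ** 2 * variance / log(rate_treatment / rate_control) ** 2
    return ceil(2 * n_per_arm)


# Rates from the example: 1.5 per year on placebo, 1.0 per year on SFC.
for k in (0.0, 0.33, 0.66, 1.0, 2.0):
    print(f"dispersion={k:4.2f}  total N={total_sample_size(1.5, 1.0, dispersion=k)}")
```

Increasing followup_years shrinks only the Poisson part of the variance, which is why doubling the follow-up reduces the sample size by less than half when overdispersion is present.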
Volume 6

Time to Event Endpoints

42 Introduction to Volume 6
43 Tutorial: Survival Endpoint
44 Superiority Trials with Variable Follow-Up
45 Superiority Trials with Fixed Follow-Up
46 Non-Inferiority Trials Given Accrual Duration and Accrual Rates
47 Non-Inferiority Trials with Fixed Follow-Up
48 Superiority Trials Given Accrual Duration and Study Duration
49 Non Inferiority Trials Given Accrual Duration and Study Duration
50 A Note on Specifying Dropout parameters in Survival Studies
51 Multiple Comparison Procedures for Survival Data

42 Introduction to Volume 6

The chapters in this volume deal with clinical trials where the endpoint of interest is the time from entry into the study until a specific event (for example, death, tumour recurrence, or heart attack) occurs. Such trials are also referred to as survival trials, time-to-event trials, or time-to-failure trials. Long-term mortality trials in oncology, cardiology or HIV usually select time-to-event as the primary endpoint. The group sequential methodology is particularly appropriate for such trials because of the potential to shorten the study duration and thereby bring beneficial new therapies to patients sooner than would be possible by a conventional single-look design.

In contrast to studies involving normal and binomial endpoints, the statistical power of a time-to-event study is determined not by the number of individuals accrued, but rather by the number of events observed. Accruing only as many individuals as the number of events required to satisfy power considerations implies having to wait until all individuals have reached the event. This will probably make the trial extend over an unacceptably long period of time.
Therefore one usually accrues a larger number of patients than the number of events required, so that the study may be completed within a reasonable amount of time. East allows the user a high degree of flexibility in this respect.

This volume contains Chapters 42 through 51. Chapter 42 is the present chapter. It describes the contents of the remaining chapters of Volume 6.

Chapter 43 introduces you to East on the Architect platform, using an example clinical trial to compare survival in two groups.

In Chapter 44 we discuss the Randomized Aldactone Evaluation Study (RALES) for decreasing mortality in patients with severe heart failure (Pitt et al., 1999). The chapter illustrates how East may be used to design and monitor a group sequential two-sample superiority trial with a time-to-event endpoint. We begin with the simple case of a constant enrollment rate, exponential survival and no drop-outs. The example is then extended to cover non-uniform enrollment, non-constant hazard rates for survival, and differential drop-out rates between the treatment and control arms. The role of simulation in providing additional insights is discussed. Simulations in the presence of non-proportional hazards and stratification variables are also explained. The trial was designed so that every subject who had not dropped out or reached the stated endpoint would be followed until the trial was terminated. This is an example of a variable follow-up design, because subjects who are enrolled at the beginning of the enrollment phase are followed for a longer time than subjects who are enrolled later.

In contrast to Chapter 44, Chapter 45 deals with the fixed follow-up design. Here we design a trial in which each subject can only be followed for a maximum of one year and then goes off study.
We use East to design such a trial, basing the design parameters on the PASSION and TYPHOON trials, two recently published studies of drug-eluting stents (Spaulding et al., 2006; Laarman et al., 2006). The impact of variable accrual patterns and drop-outs is also taken into account.

Chapter 46 shows how to use East to design a non-inferiority trial with a time-to-event endpoint. The setting is a clinical trial to demonstrate the non-inferiority of Xeloda to 5-FU+LV in patients with metastatic colorectal cancer (Rothman et al., 2003). Part of the discussion in this chapter is about the choice of the non-inferiority margin.

Chapter 47 will illustrate through a worked example how to design, monitor and simulate a two-sample non-inferiority trial with a time-to-event endpoint in which each subject who has not dropped out or experienced the event is followed for a fixed duration only. This implies that each subject who does not drop out or experience the event within a given time interval, as measured from the time of randomization, will be administratively censored at the end of that interval. In East we refer to such designs as fixed follow-up designs.

Chapters 48 and 49 handle the trade-off between patient accruals and study duration in a different way from the previous chapters. In contrast to publicly funded trials, which usually lack the resources to exert control over the accrual rate of a trial, industry trials are often run with a fixed timeframe as the constraint. Industry sponsors would rather adjust the patient recruitment rate by opening and closing investigator sites than delay the end of a study and therefore their entire drug development program, time to market, and revenue. Chapters 48 and 49 illustrate how to design superiority and non-inferiority trials in East given a fixed accrual period and fixed study duration.
Additionally, these design options provide the users with many useful graphs that chart the relationship between power, sample size, number of events, accrual duration, and study duration.

Also note that Chapter 44 contains a section that guides the user through the powerful survival simulation tool available in East.

Chapter 50 is a note which gives details on specifying dropout parameters for survival studies in East with the help of an example.

A unified formula for calculating the expected number of events d(l) in a time-to-event trial can be found in Appendix D.

42.1 Settings

Click the icon in the Home menu to adjust default values in East 6.

The options provided in the Display Precision tab are used to set the decimal places of numerical quantities. The settings indicated here will be applicable to all tests in East 6 under the Design and Analysis menus. All these numerical quantities are grouped in different categories depending upon their usage. For example, all the average and expected sample sizes computed at simulation or design stage are grouped together under the category "Expected Sample Size". So to view any of these quantities with greater or lesser precision, select the corresponding category and change the decimal places to any value between 0 and 9.

The General tab has the provision of adjusting the paths for storing workbooks, files, and temporary files. These paths will remain in effect throughout the current and future sessions, even after East is closed. This is also where you specify the installation directory of the R software in order to use the R Integration feature of East 6.

The Design Defaults is where the user can change the settings for trial design:

Under the Common tab, default values can be set for input design parameters.
You can set up the default choices for the design type, computation type, test type and the default values for type-I error, power, sample size and allocation ratio. When a new design is invoked, the input window will show these default choices.

Time Limit for Exact Computation
This time limit is applicable only to exact designs and charts. Exact methods are computationally intensive and can easily consume several hours of computation time if the likely sample sizes are very large. You can set the maximum time available for any exact test in terms of minutes. If the time limit is reached, the test is terminated and no exact results are provided. The minimum and default value is 5 minutes.

Type I Error for MCP
If the user has selected a 2-sided test as the default in global settings, then any MCP will use half of the alpha from settings as default, since MCP is always a 1-sided test.

Sample Size Rounding
Notice that by default, East displays the integer sample size (events) by rounding up the actual number computed by the East algorithm. In this case, the look-by-look sample size is rounded off to the nearest integer. One can also see the original floating point sample size by selecting the option "Do not round sample size/events".

Under the Group Sequential tab, defaults are set for boundary information. When a new design is invoked, input fields will contain these specified defaults. We can also set the option to view the Boundary Crossing Probabilities in the detailed output. It can be either Incremental or Cumulative.

Simulation Defaults is where we can change the settings for simulation:

If the checkbox for "Save summary statistics for every simulation" is checked, then East simulations will by default save the per-simulation summary data for all the simulations in the form of case data.
If the checkbox for "Suppress All Intermediate Output" is checked, the intermediate simulation output window will always be suppressed and you will be directed to the Output Preview area.

The Chart Settings allows defaults to be set for the following quantities on East 6 charts:

We suggest that you do not alter the defaults until you are quite familiar with the software.

43 Tutorial: Survival Endpoint

This tutorial introduces you to East 6, using examples for designing a clinical trial to compare survival in two groups. It is suggested that you go through the tutorial while you are at the computer, with East 6 running on it.

43.1 A Quick Feel of the Software

When you open East 6, the screen will look as shown below.

In the tabs bar at the top of the ribbon, the Design tab is already selected. Each tab has its own ribbon. All the command buttons under the Design tab are displayed in its ribbon, with suggestive icons. These commands have been grouped under the categories of Continuous, Discrete, Count, Survival and General. For this tutorial, let us explore the command Two Samples under the Survival category. In East, we use the terms 'time to event' and 'survival' interchangeably. Click on Two Samples. You will see a list of action items, which are dialog box launchers.

Click on Logrank Test Given Accrual Duration and Study Duration. You will get the following dialog box in the work area.

This dialog box is for computing Sample Size (n) and Number of Events.
All the default input specifications under the tab Design Parameters are on display: Design Type = Superiority, Number of Looks = 1, Test Type = 1-Sided, Type-1 Error (α) = 0.025, Power (1-β) = 0.9, Allocation Ratio ($n_t/n_c$) = 1, # of Hazard Pieces = 1, Input Method = Hazard Rates, Hazard Ratio ($\lambda_t/\lambda_c$) = 0.5, Log Hazard Ratio $\ln(\lambda_t/\lambda_c)$ = -0.693, Hazard Rate (Control) = 0.0347, Hazard Rate (Treatment) = 0.0173, and Variance of Log-Hazard Ratio = Null.

There are two radio buttons in this dialog box, one at the side of the Power (1-β) box and the second at the side of the combined boxes for Sample Size (n) and Number of Events. By default, the latter radio button is selected, indicating that the items against this radio button are to be computed using all other inputs. Similarly, if the first radio button is selected, then Power will be computed using all other inputs.

Now click on the tab Accrual/Dropout and you will see the following dialog box. The default specifications in this dialog box are: Subjects are followed = Until End of Study, Accrual Duration = 22, Study Duration = 38, # of Accrual Periods = 1, and no Dropouts.

Now accept all the default specifications that are displayed for this single-look design and be ready to compute the Sample Size (n) and the Number of Events for the design. Click Compute.

At the end of the computation, you will see the results appearing at the bottom of the screen, in the Output Preview pane, as shown below.

This single row of output preview contains relevant details of all the inputs and the computed results for events and accruals. The maximum value for events is 88 and the committed accrual is 182 subjects. Since this is a fixed-look design, the expected events are the same as the maximum required.
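The event count of 88 can be cross-checked by hand. The Python sketch below uses Schoenfeld's well-known approximation for the number of events required by a logrank test; this is offered as an illustrative assumption, not as a description of East's internal computation, and the function name is hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.025, power=0.90, allocation_ratio=1.0):
    """Schoenfeld approximation for a 1-sided logrank test.

    events = (z_alpha + z_beta)^2 * (1 + r)^2 / (r * ln(HR)^2),
    where r is the treatment:control allocation ratio.
    """
    z_sum = NormalDist().inv_cdf(1 - alpha) + NormalDist().inv_cdf(power)
    r = allocation_ratio
    events = z_sum ** 2 * (1 + r) ** 2 / (r * log(hazard_ratio) ** 2)
    return ceil(events)


# Default inputs from the dialog box: HR = 0.5, 1-sided alpha = 0.025, power = 0.9.
print(required_events(0.5))
```

Note that the formula involves only the hazard ratio, error rates and allocation ratio. The accrual and study durations affect how many subjects must be enrolled to observe that many events in time, not the number of events itself.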
Click anywhere in this row, and then click on the Output Summary icon to get a detailed display in the upper pane of the screen as shown below.

The contents of this output, displayed in the upper pane, are the same as what is contained in the output preview row for Design1 shown in the lower pane, but the upper pane display is easier to read and comprehend. The title of the upper pane display is Output Summary. This is because you can choose more than one design in the Output Preview pane and the display in the upper pane will show the details of all the selected designs in juxtaposed columns.

The discussion so far gives you a quick feel of the software for computing the required events and sample size for a single-look survival design. We have not discussed all the icons in the output preview pane, the library pane, or the hidden Help pane in the screen. We will describe them using an example of a group sequential design in the next section.

43.2 Group Sequential Design for a Survival Superiority Trial

43.2.1 Background Information on the study
43.2.2 Creating the design in East
43.2.3 Design Outputs
43.2.4 East icons explained
43.2.5 Saving created designs
43.2.6 Displaying Detailed Output
43.2.7 Comparing Multiple Designs
43.2.8 Events vs. Time plot
43.2.9 Simulation
43.2.10 Interim Monitoring

43.2.1 Background Information on the study

The Randomized Aldactone Evaluation Study (RALES) was a double-blind multicenter clinical trial of an aldosterone-receptor blocker vs. placebo, published in the New England Journal of Medicine (vol 341, 10, pages 709-717, 1999). This trial was open to patients with severe heart failure due to systolic left ventricular dysfunction. The primary endpoint was all-cause mortality. The anticipated accrual rate was 960 patients/year.
The mortality rate for the placebo group was 38%. The investigators wanted 90% power to detect a 17% reduction in the mortality hazard rate for the Aldactone group (from 0.38 to 0.3154) with α = 0.05, 2-sided test. Six DMC meetings were planned. The dropout rate in both the groups is expected to be 5% each year. The patient accrual period is planned to be 1.7 years and the total study duration to be 6 years.

43.2.2 Creating the design in East

For our purpose, let us create our own design from the basic details of this study. Now start East afresh. On the Design tab, click on Two Samples under the Survival category. You will see a list of action items, which are dialog box launchers. Click on the second option, Logrank Test Given Accrual Duration and Study Duration. You will get the following dialog box in the work area.

All the specifications you see in this dialog box are default values, which you will have to modify for the study under consideration.

Now, let the Design Type be Superiority.

Next, enter 6 in the Number of Looks box. You can see that the range of choices for the number of looks is from 1 to 20. Immediately after this selection, you will see a new tab, Boundary Info, added to the input dialog box. We will look into this tab after you finish filling in the current tab, Design Parameters.

Next, choose 2-Sided in the Test Type box.

Next, enter 0.05 in the Type-1 Error (α) box, and 0.9 in the Power box.

Next, enter the specifications for survival parameters. Keep # of Hazard Pieces as 1. Click on the check box against Hazard Ratio and choose Hazard Rates as the Input Method. Enter 0.83 as the Hazard Ratio and 0.38 as the Hazard Rate (Control). East computes and displays the Hazard Rate (Treatment) as 0.3154.
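The displayed treatment hazard follows directly from the two values entered. As a small illustration, the Python sketch below derives the treatment hazard rate and the log hazard ratio from a control hazard rate and a hazard ratio; the function name is hypothetical and the code only mirrors the arithmetic, not East's implementation.

```python
from math import log


def derive_survival_inputs(hazard_rate_control, hazard_ratio):
    """Return (treatment hazard rate, log hazard ratio) for a single hazard piece."""
    hazard_rate_treatment = hazard_ratio * hazard_rate_control
    return hazard_rate_treatment, log(hazard_ratio)


# RALES inputs: a 17% reduction corresponds to a hazard ratio of 0.83.
treatment_rate, log_hr = derive_survival_inputs(0.38, 0.83)
print(f"Hazard Rate (Treatment) = {treatment_rate:.4f}")
print(f"Log Hazard Ratio        = {log_hr:.3f}")
```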
Keep the default choice of Null for Variance of Log-Hazard Ratio. Now the dialog box will look as shown below.

Next click the tab Accrual/Dropout. Keep the specification 'Until End of Study' for Subjects are followed. Enter 1.7 as Accrual Duration and 6 as Study Duration. Keep # of Accrual Periods as 1. Change the # of Pieces for dropouts to 1. Choose 'Prob. of Dropout' as the Input Method for entering information on dropouts. Enter 0.05 as the probability of dropout at the end of 1 year for both the groups. Now the dialog box will appear as shown below.

Now click on the Boundary Info tab. In the dialog box of this tab, you can specify stopping boundaries for efficacy or futility or both. For this trial, let us consider Efficacy boundaries only. Choose 'Spending Functions' as the Efficacy Boundary Family. Choose 'Lan-DeMets' in the Spending Function box. Choose 'OF' in the Parameter box. Next, click the radio button near 'Equal' for Spacing of Looks. Choose 'Z Scale' in the Efficacy Boundary Scale box. In the table of look-wise details below, the columns Info Fraction, Cumulative Alpha Spent, and the upper and lower efficacy boundaries are computed and displayed as shown here. Scroll down a little to see the sixth look details.

The two icons shown represent buttons for the Error Spending Function chart and the Stopping Boundaries chart, respectively. Click these two buttons one by one to see the following charts.
43.2.3 Design Outputs

Now you have completed specifying all the inputs required for a group sequential trial design and you are ready to compute the required events and sample size or accruals for the trial. Click on the Compute button. After the computation is over, East will show in the Output Preview pane the following results:

This single row of output preview contains relevant details of all the inputs and the computed results for events and accruals. The maximum required Events is computed as 1243 and the Committed Accrual as 1646 subjects. The expected Events under H0 and H1 are estimated to be 1234 and 904 respectively. The expected Study Duration under H0 and H1 are 5.359 and 3.729 respectively.

Click anywhere in this Output Preview row and then click on the Output Summary icon to get a summary in the upper pane of the screen as shown below.

43.2.4 East icons explained

In the 'Output Preview' pane, you see the following icons in the upper row.

The functions of the above icons are as indicated below. The tooltips also will indicate their functions.
Output Summary (The output summary of selected design(s) will appear in the upper pane)
Edit Design (The input dialog box of a selected design will appear in the upper pane)
Save in Workbook (Save one or more selected designs in a workbook)
Delete (Delete one or more selected designs)
Rename (Rename Design names)
Print (Print selected designs)
Display Precision (Local Settings)
Filter (Filter and select designs according to specified conditions)
Show/Hide Columns (Show/Hide Columns of the designs in the Output Preview panel)

The following icons can be seen at the right end of the Output Preview pane and the Output Summary or Input/Output window respectively. Their functions are:

Maximize Output Preview Pane
Minimize Output Preview Pane

You may also notice a row of icons at the top of the Output Summary window as shown below.

The first icon is for Plot (Plots of a selected design will appear in a pop-up window).

The second icon is for Show Tables (The data for different plots can be displayed in tabular form in pop-up windows).

If you have multiple designs in the Output Summary window, the third icon becomes active and can be used to change the order of those columns in the Output Summary.

The fourth icon is to print the Output Summary window.

As an example, if you click Power vs. Sample Size under the Plot icon, you will get the following chart.

If you want to see the data underlying the above chart, click the Show Table icon and click Power vs. Sample Size. You will see the following table in a pop-up window.
You can customize the format of the above table and also save it as case data in a workbook. You may experiment with all the above icons and buttons to understand their functions.

43.2.5 Saving created Designs in the library and hard disk

In the Output Preview pane, select one or more design rows and click the Save in Workbook icon. The selected design(s) will then be added as node(s) in the current workbook, as shown below.

The above action only adds the design to the workbook node in the library; it is not saved on the hard disk. To save to the hard disk, you may either save the entire workbook or only the design by right-clicking on the desired item and choosing the Save or Save As options.

Here in the library also, you see rows of icons.

Some of these icons you have already seen. The functions of the other icons are:

Details (Details of a selected design will appear on the upper pane in the work area)
Output Settings (Output Settings can be changed here)
Simulate (Start the simulation process for any selected design node)
Interim Monitoring (Start the Interim Monitoring process for any selected design)

43.2.6 Displaying Detailed Output

Select the design from the Library and click the Details icon, or right-click on the Des1 node in the library and click Details. You will see the detailed output of the design displayed in the Work area.

43.2.7 Comparing Multiple Designs

Click on the Des1 row and then click the Edit icon. You will get the input dialog box in the upper pane. Change the Power value to 0.8 and then click Compute. You will now see that Des2 is created and a row is added to the Output Preview pane as shown below.
Click on the Des1 row and then, keeping the Ctrl key pressed, click on the Des2 row. Now both the rows will be selected. Next, click the Output Summary icon.

Now you will see the output details of these two designs displayed in the upper pane, Compare Designs, in juxtaposed columns, as shown below.

In a similar way, East allows the user to easily create multiple designs by specifying a range of values for certain parameters in the design window. For example, in a survival trial the Logrank Test Given Accrual Duration and Study Duration design allows the input of multiple key parameters at once to simultaneously create a number of different designs. For example, suppose in a multi-look study the user wants to generate designs for all combinations of the following parameter values: Power = 0.8 and 0.9, and Hazard Ratio - Alternative = 0.6, 0.7, 0.8 and 0.9. The number of combinations is 2 x 4 = 8.

East creates all combinations using only a single specification under the Design Parameters tab in the design window. As shown below, the values for Power are entered as a list of comma-separated values, while the alternative hazard ratios are entered as a colon-separated range of values, 0.6 to 0.9 in steps of 0.1.

East computes all 8 designs and displays them in the Output Preview window:

East provides the capability to analyze multiple designs in ways that make comparisons between the designs visually simple and efficient. To illustrate this, a selection of a few of the above designs can be viewed simultaneously in both the Output Summary section as well as in the various tables and plots. The following is a subsection of the designs computed from the above example with differing values for number of looks, power and hazard ratio.
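Returning to the multi-design specification described above (a comma-separated list for Power and a colon-separated range for Hazard Ratio), the expansion into 2 x 4 = 8 designs can be sketched in Python as follows. The exact input strings, in particular the `start:stop:step` form, are assumptions made for this illustration, and the `expand` helper is hypothetical rather than anything East exposes.

```python
from itertools import product


def expand(spec):
    """Expand 'a, b, c' into a list of values, or 'start:stop:step' into a range.

    The 'start:stop:step' layout is an assumed format for illustration.
    """
    spec = spec.strip()
    if ":" in spec:
        start, stop, step = (float(part) for part in spec.split(":"))
        count = int(round((stop - start) / step)) + 1
        # Round to suppress floating point noise such as 0.7999999999999999.
        return [round(start + i * step, 10) for i in range(count)]
    return [float(part) for part in spec.split(",")]


powers = expand("0.8, 0.9")
hazard_ratios = expand("0.6:0.9:0.1")

# One design per (power, hazard ratio) combination.
designs = list(product(powers, hazard_ratios))
for number, (power, hazard_ratio) in enumerate(designs, start=1):
    print(f"Design {number}: power={power}, hazard ratio={hazard_ratio}")
```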
Designs are displayed side by side, allowing details to be easily compared:

In addition, East allows multiple designs to be viewed simultaneously either graphically or in tabular format:

Notice that all four designs in the Output Summary window are selected. The following figures compare these four designs in different formats.

Stopping Boundaries (table)
Expected Sample Size (table)
Power vs. Sample Size (plot)
Total Sample Size / Events vs. Time (plot)

This capability allows the user to explore a greater space of possibilities when determining the best choice of study design.

43.2.8 Events vs. Time plot

For survival studies, East provides a variety of charts and plots to visually validate and analyze the design. For example, the Sample Size / Events vs. Time plot allows the user to see the rate of increase in the number of events (control and treatment) over time (accrual duration, study duration). An additional feature of this particular chart is that a user can easily update key input parameters to determine how multiple different scenarios can directly impact a study. This provides significant benefits during the design phase, as the user can quickly examine how a variety of input values affect a study before the potentially lengthy task of simulation is employed.

To illustrate this feature, what follows is the example from the RALES study. For study details, refer to subsection Background Information on the study of this tutorial.
Currently there are ten designs in the Output Preview area. Select Des1 from them and save it to the current workbook. You may delete the remaining ones at this point. To view the Sample Size / Events vs. Time plot, select the corresponding node in the Library and under the Charts icon choose Sample Size / Events vs. Time:

Survival parameters for this design can be edited directly through this chart by clicking the Modify button. The Modify Survival Design window is then displayed for the user to update design parameters:

To illustrate the benefit of the modification feature, suppose at design time there is potential flexibility in the accrual and duration times for the study. To see how this may affect the number of subsequent events, modify the design to change the Accrual Duration to 3 and Study Duration to 4. Re-create the plot to view the effect of these new values on the shape and magnitude of the curves by clicking OK:

Similar steps can be taken to observe the effect of changing other parameter values on the number of events necessary to adequately power a study.

43.2.9 Simulation

In the library, right-click on the node Des1 and click Simulate. You will be presented with the following Simulation sheet.

This sheet has four tabs: Test Parameters, Response Generation, Accrual/Dropout, and Simulation Controls. Additionally, you can click Include Options and add some more tabs like Site, Randomization, User Defined R Function and Stratification. The first three tabs essentially contain the details of the parameters of the design. In the Simulation Control tab, you can specify the number of simulations to carry out and specify the file for storing simulation data.
Let us first carry out 1000 simulations to check whether the design can reach the specified power of 90%. The Response Generation tab, by default, shows the hazard rates for control and treatment. We will use these values in our simulation.

In the Simulation Control tab, specify the number of simulations as 1000. Use the Random number seed as Fixed 12345.

Let us keep the values in the other tabs as they are and click Simulate. The progress of the simulation process will appear in a temporary window as shown below.

This is the intermediate window showing the complete picture of simulations. Close this window after viewing it. You can see the complete simulation output in the details view. A new row, with the ID Sim1, will be added in Output Preview.

Click on the Sim1 row and click the Output Summary icon. You will see the Simulation Output summary appearing in the upper pane. It shows the simulated power as 0.892, indicating that in 892 out of 1000 simulations the boundary was crossed.

You can save Sim1 as a node in the workbook. If you right-click on this node and then click Details, you will see the complete details of the simulation appearing in the work area. Here is a part of it.

43.2.10 Interim Monitoring

Click the Des1 node under workbook wbk1 and click the Interim Monitoring icon. Alternatively, you can right-click the Des1 node and select the item Interim Monitoring. In either case, you will see the IM dashboard appearing as shown below.

In the top row, you see a few icons. For now, we will discuss only the first icon, which represents the Test Statistic Calculator.
Using this calculator, you will enter the results of each interim analysis into the IM dashboard. Suppose the Data Monitoring Committee used the following data during the first 5 looks of interim monitoring.

Date      Total Deaths   δ̂        SE(δ̂)    Z-Statistic
Aug 96    125            -0.283   0.179    -1.581
Mar 97    299            -0.195   0.116    -1.681
Aug 97    423            -0.248   0.097    -2.557
Mar 98    545            -0.259   0.086    -3.012
Aug 98    670            -0.290   0.077    -3.766

The first look was taken at 125 events, and the analysis of the data gave δ̂ = -0.283 and SE(δ̂) = 0.179. First, click the blank row in the IM Dashboard and then click the icon. Now you can enter the results of the first analysis into the TS calculator and click Recalc. The Test Statistic value will be computed and the TS calculator will appear as shown below. Now click OK to transfer the details of the first look to the IM Dashboard. A message will appear indicating that some required computations are being carried out. After the computations are over, the output for the first look will appear in the IM Dashboard as shown below.

For the first look, at a total of 125 events, the Information Fraction works out to be 0.101. The efficacy boundaries are recomputed for this information fraction. The Repeated 95% Confidence Interval limits and the Repeated p-value are computed and displayed. The charts at the bottom of the IM Dashboard are also updated, with the relevant details appearing on the side. In a similar way, enter the interim analysis results for the next 4 looks in the IM Dashboard. At the fifth look, the boundary is crossed. A message window appears as shown below.
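The Z-Statistic column of the interim data can be checked by hand. The sketch below assumes the test statistic is simply the estimate δ̂ divided by its standard error, which is what the tabulated values are consistent with; the function name `wald_z` is illustrative and is not part of East.

```python
# Interim estimates from the monitoring table: (look date, total deaths, delta_hat, SE)
looks = [
    ("Aug 96", 125, -0.283, 0.179),
    ("Mar 97", 299, -0.195, 0.116),
    ("Aug 97", 423, -0.248, 0.097),
    ("Mar 98", 545, -0.259, 0.086),
    ("Aug 98", 670, -0.290, 0.077),
]


def wald_z(delta_hat, se):
    """Wald-type statistic: estimate divided by its standard error (assumed form)."""
    return delta_hat / se


# Round to three decimals to match the precision of the table.
z_values = [round(wald_z(d, se), 3) for _, _, d, se in looks]

for (date, deaths, _, _), z in zip(looks, z_values):
    print(f"{date}: {deaths} deaths, Z = {z:.3f}")
```

The printed values reproduce the Z-Statistic column to three decimals.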
Click Stop and you will see the details of all the looks in the IM Dashboard as shown below. The final Adjusted Inference output also appears as displayed below. One important point to note here is that this study ended almost 2 years ahead of the planned schedule, because of the very favorable interim analysis results. This completes the Interim Monitoring exercise for this trial.

43.3 User Defined R Function

East allows you to customize simulations by inserting user-defined R functions for one or more of the following tasks: generate response, compute test statistic, randomize subjects, generate arrival times, and generate dropout information. The R functionality for arrivals and dropouts will be available only if you have entered such information at the design stage. Although the R functions are also available for all normal and binomial endpoints, we will illustrate this functionality for a time-to-event endpoint. Specifically, we will use an R function to generate Weibull survival responses.

Start East afresh. On the Design tab, click Survival: Two Samples and then Logrank Test Given Accrual Duration and Study Duration. Choose the design parameters as shown below. In particular, select a one-sided test with a type-1 error of α = 0.025. Click Compute and save this design (Des1) to the Library. Right-click Des1 in the Library and click Simulate. In the Simulation Control Info tab, check the box for Suppress All Intermediate Output. Type 10000 for Number of Simulations and select Clock for Random Number Seed. In the top right-hand corner of the input window, click Include Options, and then click User Defined R Function. Go to the User Defined R Function tab. For now, leave the box Initialize R simulation (optional) unchecked.
This optional task can be used to load required libraries, set seeds for simulations, and initialize global variables. Select the row for Generate Response, click Browse..., and navigate to the folder containing your R file. Select the file and click Open. The path should now be displayed under File Name. Click View to open a notepad application to view your R file. In this example, we are generating survival responses for both control and treatment arms from a Weibull with shape parameter = 1 (i.e. exponential), with the same hazard rate in both arms. This sample file is available in the folder named R Samples under the installation directory of East 6. Copy the function name (in this case GenWeibull) and paste it into the cell for Function Name. Save and close the R file, and click Simulate.

Return to the tab for User Defined R Function, select the Generate Response row, and click View. In the R function, change the shape parameter to 2, to generate responses from a Weibull distribution with increasing hazards. Save and close the R file (you may have to save it in a different location), and click Simulate. Select both simulations (Sim1 and Sim2) from the Output Preview, and on the toolbar, click the icon to display them in the Output Summary. Notice that the type-1 error appears to be controlled in both cases. When we simulated from the exponential (Sim1), the average study duration (30.7 months) was close to what was calculated for Des1 as the expected study duration under the null. However, when we simulated from the Weibull with shape parameter 2 (Sim2), the average study duration increased to 34.6 months.
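To see why the shape parameter matters, the following minimal sketch (in Python rather than R, and not a reproduction of the GenWeibull sample file or of East's R interface) draws Weibull survival times by inverse transform and evaluates the Weibull hazard. The function names, the scale value of 10, and the seed are illustrative assumptions.

```python
import math
import random


def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def gen_weibull(n, shape, scale, rng):
    """Draw n survival times by inverse transform: T = scale * (-ln U) ** (1 / shape)."""
    return [
        scale * (-math.log(1.0 - rng.random())) ** (1.0 / shape)
        for _ in range(n)
    ]


rng = random.Random(12345)  # illustrative fixed seed

# Shape 1 reduces to the exponential: the hazard is constant in time.
exp_times = gen_weibull(20000, shape=1.0, scale=10.0, rng=rng)
# Shape 2 gives a hazard that increases linearly with time.
wei_times = gen_weibull(20000, shape=2.0, scale=10.0, rng=rng)

mean_exp = sum(exp_times) / len(exp_times)
mean_wei = sum(wei_times) / len(wei_times)
print(f"mean survival, shape 1: {mean_exp:.2f}")
print(f"mean survival, shape 2: {mean_wei:.2f}")
```

With shape 1 the sample mean should be close to the scale parameter; with shape 2 it should be close to scale × Γ(1.5). How a change of shape affects study duration in East depends on how the sample file parameterizes the scale, which this sketch does not attempt to mirror.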
The ability to use custom R functions for many simulation tasks allows considerable flexibility in performing sensitivity analyses and assessing key operating characteristics.

44 Superiority Trials with Variable Follow-Up

This chapter illustrates through a worked example how to design, monitor and simulate a two-sample superiority trial with a time-to-event endpoint. Each subject who has not dropped out or experienced the event is followed until the trial ends. This implies that a subject who is enrolled earlier could potentially be followed for a longer time than a subject who is enrolled later in the trial. In East we refer to such designs as variable follow-up designs.

44.1 The RALES Clinical Trial: Initial Design

The RALES trial (Pitt et al., 1999) was a double-blind study of the aldosterone-receptor blocker spironolactone at a daily dose of 25 mg in combination with standard doses of an ACE inhibitor (treatment arm) versus standard therapy with an ACE inhibitor (control arm) in patients who had severe heart failure as a result of systolic left ventricular dysfunction. The primary endpoint was death from any cause. Six equally spaced looks at the data using the Lan-DeMets-O’Brien-Fleming spending function were planned. The trial was designed to detect a hazard ratio of 0.83 with 90% power at a two-sided 0.05 level of significance. The hazard rate of the control arm was estimated to be 0.38/year. The trial was expected to enroll 960 patients/year.

We begin by using East to design RALES under these basic assumptions. Open East, click the Design tab and then the Two Samples button in the Survival group. You will see the following screen. Note that there are two choices available in the above list: Logrank Test Given Accrual Duration and Accrual Rates, and Logrank Test Given Accrual Duration and Study Duration.
The option Logrank Test Given Accrual Duration and Study Duration is explained later in Chapter 48. Now click Logrank Test Given Accrual Duration and Accrual Rates and you will get the following input dialog box. In this dialog box, enter 6 for Number of Looks, keep the default choice of Design Type: Superiority, and change the Test Type to 2-Sided, Type I Error (α) to 0.05, Power to 0.9, and the Allocation Ratio to 1. Further, keep the default choices of # of Hazard Pieces as 1 and Input Method as Hazard Rates. Click the check box against Hazard Ratio and enter the Hazard Ratio as 0.83. Enter Hazard Rate (Control) as 0.38. You will see the Hazard Rate (Treatment:Alt) computed as 0.3154. Also, keep the Variance of Log Hazard Ratio to be used as under Null. Now the Test Parameters tab of the input dialog will appear as shown below.

Now click on the tab Boundary. You will see the following input dialog box. Keep all the default specifications for the boundaries to be used in the design. You can look at the Error Spending Chart by clicking on the icon. Close this chart. If you click on the boundary chart icon, you will see the boundary chart as displayed below. Close this chart.

Now click the Accrual/Dropouts tab. Keep the default choice Until End of Study for the input Subjects are followed:. Keep the # of Accrual Periods as 1 and enter 960/year as the accrual rate. For this example, assume no dropouts.
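The computed treatment-arm hazard rate can be verified directly from the two inputs. The sketch below assumes exponential survival in each arm (a single hazard piece, as selected above) and additionally reports the implied median survival times; the variable and function names are illustrative.

```python
import math

control_hazard = 0.38   # per year, as entered for Hazard Rate (Control)
hazard_ratio = 0.83     # treatment vs. control under the alternative

# Under proportional hazards the treatment hazard is the product of the two.
treatment_hazard = hazard_ratio * control_hazard


def median_survival(hazard):
    """Median of an exponential distribution with the given hazard rate."""
    return math.log(2.0) / hazard


print(f"treatment hazard: {treatment_hazard:.4f} per year")
print(f"median survival (control):   {median_survival(control_hazard):.2f} years")
print(f"median survival (treatment): {median_survival(treatment_hazard):.2f} years")
```

The first printed value matches the 0.3154 displayed for Hazard Rate (Treatment:Alt).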
The dialog box will look as shown below. Under the Accrual section, in the column titled Comtd. (committed), you see two radio buttons, Durations and Subjects, with the latter selected by default. The selected item will appear on the x-axis of the Study Duration vs. Accrual chart, which you can display by clicking on the icon at the side. Against Durations and Subjects you see two rows of three cells each. The first and third cells show the minimum and maximum values for the row item, and the middle cell shows the value midway between them. From these results, you see that any sample size in the range 1243 to 3111 will suffice to attain the desired 90% power; East selects 2177, the mid-point of the allowable range, as the default sample size. Depending on the needs of the study, you may wish to use a different sample size within the allowable range. The choice of sample size generally depends on how long you wish the study to last. The larger the patient accrual, the shorter the total study duration, which consists of accrual time plus follow-up time. To understand the essence of this trade-off, bring up the Study Duration vs. Accrual chart by clicking on the icon. Based on this chart, a sample size of 1660 subjects is selected. Close the chart and enter 1660 for Committed Accrual (subjects). Click on Compute and see the results in the new design created under Output Preview. Click the icon to see the design summary. This sample size ensures that the maximum study duration will be slightly more than 4.9 years.
Additionally, under the alternative hypothesis, the expected study duration will be only about 3.3 years.

44.2 Incorporating Drop-Outs

The investigators expect 5% of the patients in both groups to drop out each year. To incorporate this drop-out rate into the design, in the Piecewise Constant Dropout Rates tab, select 1 for the number of pieces and change the Input Method from Hazard Rates to Prob. of Dropout. Then enter 0.05 as the probability of dropout at 1 year for both groups. To make Des1 and Des2 comparable, change the sample size of Des2 to 1660 by typing this value into the Committed Accrual (Subjects) cell. Click on Compute and see the results in the new design created under Output Preview. Select the two designs and click on the icon to see them side-by-side. A comparison of the two designs reveals that, because of the drop-outs, the maximum study duration will be prolonged from 4.9 years under Des1 to 5.9 years under Des2. The expected study duration will likewise be prolonged from 3.3 years to 3.7 years under the alternative hypothesis, and from 4.5 years to 5.3 years under the null hypothesis.

44.3 Incorporating Non-Constant Accrual Rates

In many clinical trials, the enrollment rate is low in the beginning and reaches its maximum expected level a few months later, when all the sites enrolling patients have been recruited. Suppose that patients are expected to enroll at an average rate of 400/year for the first six months and at an average rate of 960/year thereafter. Click on the icon at the bottom of your screen to go back to the input window of Des2.
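Two of the inputs discussed so far lend themselves to a quick back-of-envelope check: the dropout hazard implied by a 5% one-year dropout probability, and the time needed to enroll 1660 subjects under constant versus ramped accrual. The sketch below assumes exponentially distributed dropout times and deterministic enrollment at the stated average rates; the function `accrual_duration` is a hypothetical helper, not an East feature.

```python
import math


def accrual_duration(n_subjects, periods):
    """Years needed to enroll n_subjects.

    periods is a list of (rate_per_year, length_in_years) tuples; a length of
    None marks an open-ended final period.
    """
    elapsed = 0.0
    remaining = float(n_subjects)
    for rate, length in periods:
        capacity = math.inf if length is None else rate * length
        if remaining <= capacity:
            return elapsed + remaining / rate
        remaining -= capacity
        elapsed += length
    raise ValueError("accrual periods cannot enroll the requested number of subjects")


# Dropout hazard (per year) implied by a 5% probability of dropout at 1 year,
# assuming exponential dropout times.
dropout_hazard = -math.log(1.0 - 0.05)

constant = accrual_duration(1660, [(960, None)])
ramped = accrual_duration(1660, [(400, 0.5), (960, None)])

print(f"dropout hazard: {dropout_hazard:.4f} per year")
print(f"accrual duration, constant 960/year: {constant:.2f} years")
print(f"accrual duration, 400/year then 960/year: {ramped:.2f} years")
```

Under these assumptions the ramped schedule lengthens enrollment by roughly 0.3 years, because only 200 subjects are enrolled in the first six months.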
Now, in the Accrual section, specify that there are two accrual periods and enter the accrual rate for each period in the dialog box as shown below. Once again let the sample size be 1660, to make Des3 comparable to the other two designs. Click on Compute to complete the design. Select all three designs in the Output Preview area and click on the icon to see them side-by-side. Notice that the enrollment period has increased from 1.7 years to 2 years. Likewise, the maximum study duration and the expected study durations under H0 and H1 have also increased relative to Designs 1 and 2. Now the maximum study duration is 6.15 years.

44.4 Incorporating Piecewise Constant Hazards

Prior studies had suggested that the survival curves might not follow an exponential distribution. Suppose it is believed that the hazard rate for failure on the control arm decreases after the first 12 months from 0.38 to 0.35. We will assume that the hazard ratio is still 0.83. We can enter the appropriate piecewise hazard rates into East as follows. Click on the icon at the bottom of your screen to go back to the input window and go to the Test Parameters tab. Change the sample size to 1660 on the Accrual/Dropouts tab for comparability with the previous designs. Click on Compute and see the results of the design in the Output Summary mode. We observe that the impact of changing from a constant hazard rate to a piecewise constant hazard rate is substantial. The maximum study duration has increased from 6.15 years for Des3 to 6.56 years for Des4. Before proceeding further, save all four designs in the workbook.
44.5 Simulating a Trial with Proportional Hazards

It would be useful to verify the operating characteristics of the various designs created in the previous section by simulation. The new survival simulation capabilities in East permit this. Let us use these capabilities to simulate Des4. Save this design in the workbook. Right-click on this design node and select the menu item Simulate. You’ll see the following Survival Simulation worksheet.

44.5.1 Components of the Simulation Worksheet

This simulation worksheet consists of four tabs: Test Parameters, Response Generation, Accrual/Dropouts, and Simulation Controls. The Test Parameters tab displays all the parameters of the simulation. If desired, you may modify one or more of these parameter values before carrying out the simulation. The second tab, Response Generation, will appear as shown below. In this tab, you may modify the values of the response parameters before carrying out the simulation. The third tab, Accrual/Dropouts, displays information relating to accrual and dropouts. As with the other tabs, you may modify one or more values appearing in this tab before the simulation is carried out. In the Simulation Controls tab, you may specify simulation parameters such as the number of simulations required and the desired simulation seed. Optionally, you may also bring up one more tab, Randomization, by clicking on the Include Options button in the top right-hand corner. In the Randomization tab, you may alter the allocation ratio of the design before carrying out the simulation. The other tabs under Include Options are discussed elsewhere in this manual.
Keeping the default parameter values in all the tabs, click Simulate. You can see the progress of the simulation process summarized as shown in the following screen shot. The complete summary of simulations will be displayed at the end of the simulations. Close this window. The simulation results appear in a row in the Output Preview as shown below. The output summary can be seen by clicking on the icon after selecting the simulation row in the Output Preview. Now save the simulation results to the workbook by selecting the simulation results row and then clicking on the icon. On this newly added workbook node for the simulation, right-click and select Details. You will see the complete details of the simulation appearing in the output pane. The core part is shown below.

44.5.2 Simulating Under H1

Notice that in the above simulations we did not change anything on the Response Generation tab, which indicates that we executed 10000 simulations under the design assumptions or, in other words, under the alternative hypothesis. Let us examine these 10000 simulations more closely. The actual values may differ from the manual, depending on the starting seed used. The column labeled Events in the second table displays the number of events after which each interim look was taken. The column labeled Avg. Look Time in the first table displays the average calendar times at which each interim look was taken.
Thus, the first interim look (taken after observing 207 events) occurred after an average elapse of about 1.5 years; the second interim look (taken after observing 414 events) occurred after an average elapse of about 2.1 years; and so on. The remaining columns of the simulation output are self-explanatory. The columns labeled Stopping For show that 8966 of the 10000 simulations crossed the lower stopping boundary, thus confirming (up to Monte Carlo accuracy) that this design has 90% power. The detailed output tables also show how the events, drop-outs, accruals, and average follow-up times were observed at each interim analysis.

44.5.3 Simulating Under H0

To simulate under the null hypothesis we must go back to the input window of Sim1 and then to the Response Generation tab. In this pane change the hazard rate for the treatment arm to 0.38 for the first piece and to 0.35 for the second piece of the hazard function. This change implies that we will be simulating under the null hypothesis. Click on the Simulate button. A new row will be added in the Output Preview. Select this row and add it to the library node. By double-clicking on this node, you will see the detailed simulation output as shown below. Out of 10000 simulated trials only 27 crossed the upper stopping boundary and 25 crossed the lower stopping boundary, thus confirming (up to Monte Carlo accuracy) that the type-1 error is preserved for this design.

44.6 Simulating a Trial with Non-Proportional Hazards

A new agent is to be tested against placebo in a large cardiovascular study with the endpoint being time to stroke, MI or death.
The control arm has a 12-month event-free rate of 97%. We wish to design the study to detect a hazard ratio of 0.75 with 90% power, using a two-sided test conducted at the 0.05 level. An important design consideration is that treatment differences are expected to emerge only after one year of therapy. Subjects will enroll at the rate of 1000/month and be followed to the end of the study. The dropout rate is expected to be 10% per year for both treatment arms. Finally, the study should be designed for a maximum study duration of 50 months.

The usual design options in East are not directly applicable to this trial because they require the hazard ratio to be constant under the alternative hypothesis. Here, however, we are required to power the trial to detect a hazard ratio of 0.75 that only emerges after patients have been on the study for 12 months. The simulation capabilities of East can help us with the design.

44.6.1 Single-Look Design with Proportional Hazards

We begin by creating a single-look design powered to detect a hazard ratio of 0.75, ignoring the fact that the two survival curves separate only after 12 months. Open a new survival design worksheet by clicking on Design>Survival>Logrank Test Given Accrual Duration and Accrual Rates. In the resulting Test Parameters tab, enter the parameter values as shown below. Click on the tab Accrual/Dropouts and enter the values as shown below, excluding the Accrual tab. East informs you in the Accrual tab that any sample size in the range 2524 to 22260 will suffice to attain the desired 90% power. However, the study will end sooner if we enroll more patients. Recall that we wish the trial to last no more than 50 months, inclusive of accrual and follow-up. The Accrual-Duration chart can provide guidance on sample size selection.
This chart reveals that if 6400 subjects are enrolled, the expected maximum duration of the trial is close to 50 months. Now change the Comtd. number of subjects to 6400 and click on Compute to complete the design. A new row is added for this design in the Output Preview. Select this row and add it to a library node under a workbook. If you now double-click on this node, you will see the detailed output. A section of it is shown below.

We can verify the operating characteristics of Des1 by simulation. With the cursor on the Des1 node, click on the Simulation icon from the library menu bar. You’ll be taken to the survival simulation worksheet. In the Simulation Control tab, specify the number of simulations to be 1000. Now click on the Simulate button. This will generate 1000 simulations from the survival curves specified in the design. Each simulation will consist of survival data on 6400 subjects entering the trial uniformly at the rate of 1000/month. Events (failures) will be tracked and the simulated trial will be terminated when the total number of events equals 508. Subjects surviving past this termination time point will have their survival times censored. The resulting survival data will be summarized in terms of the logrank test statistic. Each simulation records two important quantities: the calendar time at which the last of the specified 508 events arrived, and whether or not the logrank test statistic rejected the null hypothesis. We would expect that, on average, the 508 events will occur in about 48.7 months and that about 90% of the simulations will reject the null hypothesis. The simulation summary is shown in the following screen shot.
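The 508 events at which each simulated trial is terminated, and the monthly hazard implied by a 97% twelve-month event-free rate, can both be cross-checked by hand. The sketch below uses Schoenfeld's approximation for the number of events required by a fixed-sample logrank test; this is a standard textbook formula chosen here for illustration, and East's own computation may differ in detail. It also assumes exponential survival on the control arm.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hazard_ratio, alpha_two_sided, power, alloc_ratio=1.0):
    """Approximate events needed by a fixed-sample logrank test (Schoenfeld)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha_two_sided / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    r = alloc_ratio
    # (1 + r)^2 / r equals 4 for 1:1 allocation.
    return (1.0 + r) ** 2 / r * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2


# Monthly control hazard implied by a 97% event-free rate at 12 months,
# assuming exponential survival.
monthly_hazard = -math.log(0.97) / 12.0

events = schoenfeld_events(hazard_ratio=0.75, alpha_two_sided=0.05, power=0.90)

print(f"monthly control hazard: {monthly_hazard:.5f}")
print(f"required events (rounded up): {math.ceil(events)}")
```

Rounding the approximation up gives 508 events, in agreement with the stopping rule used in the simulation, and the monthly hazard rounds to 0.0025.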
Indeed we observe that the average study duration for this set of 1000 simulations was 48.691 months, and that 913 of the 1000 simulated trials crossed the critical value and rejected H0; hence the power attained is 0.913. This serves as an independent verification of the operating characteristics of Des1, up to Monte Carlo accuracy.

44.6.2 Single-Look Design with Non-Proportional Hazards

Were it not for the fact that the hazard ratio of 0.75 only emerges after 12 months of therapy, Des1 would meet the goals of this study. However, the impact of the late separation of the survival curves must be taken into consideration. This is accomplished, once again, by simulation. Click the Edit Simulation icon while the cursor is on the last simulation node. In the resulting simulation sheet click on the Response Generation tab. In this tab, specify that the hazard rates for the control and treatment arms are identical and equal to 0.0025 for the first 12 months and that the hazard ratio is 0.75 thereafter. This is done by making appropriate entries in this tab as shown below. Click on the Simulate button. This will generate 1000 simulations from the survival curves specified in the Survival Parameters Pane. As before, each simulation will consist of survival data on 6400 subjects entering the trial uniformly at the rate of 1000/month. Events (failures) will be tracked and the simulated trial will be terminated when the total number of events equals 508. The summary output of this simulation run is shown below. This time only 522 of the 1000 trials were able to reject H0. The drop in power is of course due to the fact that the two survival curves do not separate until 12 months have elapsed.
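The late separation of the two survival curves can be made concrete with a piecewise-exponential survival function. The sketch below uses the hazard rates from the text (0.0025 per month on both arms for the first 12 months, and a hazard ratio of 0.75 thereafter); the function name and the representation of hazard pieces as (start time, hazard) pairs are illustrative choices.

```python
import math


def piecewise_survival(t, pieces):
    """Survival probability at time t for a piecewise-constant hazard.

    pieces is a list of (start_time, hazard) pairs sorted by start_time,
    with the first piece starting at 0.
    """
    cum_hazard = 0.0
    for i, (start, hazard) in enumerate(pieces):
        end = pieces[i + 1][0] if i + 1 < len(pieces) else math.inf
        if t <= start:
            break
        # Accumulate hazard over the part of this piece that lies before t.
        cum_hazard += hazard * (min(t, end) - start)
    return math.exp(-cum_hazard)


control = [(0.0, 0.0025)]
treatment = [(0.0, 0.0025), (12.0, 0.75 * 0.0025)]

for month in (6, 12, 24, 50):
    s_c = piecewise_survival(month, control)
    s_t = piecewise_survival(month, treatment)
    print(f"month {month:2d}: control {s_c:.4f}, treatment {s_t:.4f}")
```

The two curves coincide through month 12 and diverge only afterwards, so every event observed in a subject's first 12 months on study carries no information about the treatment effect.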
Thus events that arise within the first 12 months arrive at the same rate for both arms and are not very informative about treatment differences. We need to increase the power of the study to 90%. This can be accomplished in one of two ways:

1. Prolonging the study duration until a sufficient number of events are obtained to achieve 90% power.
2. Increasing the sample size.

The first approach cannot be used because the study duration is not permitted to exceed 50 months. The simulations have shown that the study duration is already almost 50 months, and it has only achieved 52.2% power. Thus we must resort to increasing the sample size.

Now if we increase the sample size while keeping the total number of events fixed at 508, the average study duration will drop. The power, however, may not increase. In fact it might even decrease, since a larger fraction of the 508 events will arise in the first 12 months, before the two survival curves have separated. To see this, increase the sample size from 6400 to 10000 in the Accrual/Dropouts tab. Then click on the Simulate button. From this simulation run, you will get the output summary as shown below. Notice that the average study duration has dropped to 29.7 months. But the power has dropped also. This time only 261 of the 1000 simulations could reject the null hypothesis.

To increase power we must increase the sample size while keeping the study duration fixed at about 50 months. This is accomplished by selecting the Look Time option from the drop-down box in the Fix at Each Look section of the Survival Parameters Pane and choosing a 50 month Total Study Durn., while keeping the sample size at the increased value of 10000.
We will now run 1000 simulations, in each of which 10000 subjects are enrolled at the rate of 1000/month. Each simulated trial will be terminated at the end of 50 months of calendar time and a logrank test statistic will be derived from the data. Click on the Simulate button. Add the simulation run output to a library node and see the following output summary. For more details, you can click the icon after selecting the saved simulation node. You can now see that the power of the study has increased to 73.5%. On average 811 events occurred during the 50 months that the study remained open. Since we require 90% power, the sample size must be increased even further. This can be done by trial and error over several simulation experiments. Eventually we discover that a sample size of 18000 patients will provide about 90% power with an average of 1358 events.

It is evident from these simulations that the proportional hazards assumption is simply not appropriate if the survival curves separate late. In the present example the proportional hazards assumption would have led to a sample size of 6400, whereas the sample size actually needed was 18000.

44.6.3 Group Sequential Design with Non-Proportional Hazards

The single-look design discussed in the previous section required a sample size of 18000 subjects. A group sequential design, monitored by an independent data monitoring committee, is usually more efficient for large studies of this type.
Such a trial can be designed with efficacy stopping boundaries or with efficacy and futility stopping boundaries. Consider first a design with five equally spaced efficacy boundaries. Go back to the library, click on the Des1 node, and then click on the icon. In the resulting design input dialog window, change the entry in the Number of Looks cell from 1 to 5. Click on the Compute button and save the plan as Des2 in the library. Select the Des1 and Des2 nodes and then click on the icon to see the following details for both designs.

Des2 reveals that a group sequential design with five equally spaced looks, taken after observing 104, 208, 312, 416 and 520 events, respectively, and utilizing the default Lan-DeMets-O’Brien-Fleming (LD(OF)) spending function, achieves 90% power with a maximum sample size of 12555 and a maximum study duration of 27.232 months. The expected study duration under H1 is 21.451 months. However, these operating characteristics are based on the assumption that the hazard ratio is constant and equals 0.75. Since in fact the hazard ratio is 0.75 only after 12 months of treatment, the actual power of this design is unlikely to be 90%. We can use simulation to determine the actual power. With the cursor in any cell of the Des2 node, select the Simulation icon from the menu bar. You will be taken to the simulation worksheet. In the Response Generation tab, make the changes in the hazard rates as shown below. After changing the number of simulations to 1000 in the Simulation Control tab, click on the Simulate button to run 1000 simulations of Des2, with data being generated from the survival distributions that were specified in the Response Generation tab.
The results of this simulation run are as shown below. Only 187 of the 1000 simulated trials were able to reject the null hypothesis, indicating that the study is grossly underpowered. We can improve on this performance by extending the total study duration so that additional events may be observed. To increase the study duration, go to the Simulation Parameters tab and select the Look Time option under Fix at Each Look. We had specified at the outset that the total study duration should not exceed 50 months. Let us therefore fix the total study duration at 50 months and space the interim looks 10 months apart by editing the Study Duration.

We are now ready to simulate a 5-look group sequential trial in which the LD(OF) stopping boundaries are applied and the looks are spaced 10 months apart. Each simulated trial will enroll 12555 subjects at the rate of 1000/month. The simulation data will be generated from survival distributions in which the hazard rates of both arms are 0.0025 for the first 12 months and the hazard ratio is 0.75 thereafter. To generate 1000 simulations of this design, click on the Simulate button. These simulations do indeed show a substantial increase in power, from 18.7% previously to 79.9%.

The design specifications stated, however, that the trial should have 90% power. To achieve this power we will have to increase the sample size. By trial and error, upon increasing the sample size to 18200 on the Simulation Parameters tab, we observe that the power has increased to 90% (up to Monte Carlo accuracy).
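The idea behind these simulations can be illustrated outside East with a minimal Python sketch. It is not East's algorithm: it covers only a single-look logrank test, and it assumes uniform accrual, 1:1 randomization, no dropouts and a one-sided critical value of 1.96. The function names (`delayed_effect_time`, `logrank_z`, `simulate_power`) are hypothetical, so its output should not be expected to match the figures reported above.

```python
import math
import random


def delayed_effect_time(rng, hazard_early, hazard_late, change_point):
    """Draw one survival time whose hazard switches at change_point."""
    cum_hazard = rng.expovariate(1.0)  # unit-exponential cumulative hazard
    early_total = hazard_early * change_point
    if cum_hazard <= early_total:
        return cum_hazard / hazard_early
    return change_point + (cum_hazard - early_total) / hazard_late


def logrank_z(records):
    """Logrank Z for (time, had_event, treated) records, assuming no tied times.

    A negative value means fewer events than expected in the treated arm.
    """
    records = sorted(records)  # ascending observed time
    at_risk = len(records)
    treated_at_risk = sum(1 for _, _, treated in records if treated)
    obs_minus_exp = 0.0
    variance = 0.0
    for _, had_event, treated in records:
        if had_event:
            p = treated_at_risk / at_risk
            obs_minus_exp += treated - p
            variance += p * (1.0 - p)
        at_risk -= 1
        treated_at_risk -= treated
    return obs_minus_exp / math.sqrt(variance) if variance > 0 else 0.0


def simulate_power(n_subjects, n_sims, accrual_rate=1000.0, study_duration=50.0,
                   base_hazard=0.0025, late_hazard_ratio=0.75,
                   change_point=12.0, critical_z=1.96, seed=0):
    """Return (power, mean events per trial, Monte Carlo standard error)."""
    rng = random.Random(seed)
    accrual_duration = n_subjects / accrual_rate  # assumed uniform accrual
    rejections = 0
    total_events = 0
    for _ in range(n_sims):
        records = []
        for _ in range(n_subjects):
            treated = rng.random() < 0.5
            # Both arms share base_hazard until change_point; only the
            # treated arm's hazard is reduced afterwards.
            hazard_late = base_hazard * late_hazard_ratio if treated else base_hazard
            survival = delayed_effect_time(rng, base_hazard, hazard_late, change_point)
            follow_up = study_duration - rng.uniform(0.0, accrual_duration)
            had_event = survival <= follow_up
            records.append((min(survival, follow_up), had_event, treated))
            total_events += had_event
        if logrank_z(records) < -critical_z:
            rejections += 1
    power = rejections / n_sims
    std_error = math.sqrt(power * (1.0 - power) / n_sims)
    return power, total_events / n_sims, std_error


if __name__ == "__main__":
    # Deliberately small run; the manual's scale (thousands of subjects,
    # 1000 simulations) is much slower in pure Python.
    print(simulate_power(n_subjects=2000, n_sims=50))
```

The third return value shows what "up to Monte Carlo accuracy" means in practice: with 1000 simulations, an estimated power of 18.7% has a standard error of about 1.2 percentage points, and an estimated power of 90% has a standard error of about 0.9 percentage points.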
44.7 Simulating a Trial with Stratification Variables

The data presented in Appendix I of Kalbfleisch and Prentice (1980) on lung cancer patients were used as a basis for this example. We will design a trial to compare two treatments (Standard and Test) in a target patient group where patients had some prior therapy. The response variable is the survival time in days of lung cancer patients. First, we will create a design for 3 looks to compare the two treatment groups. Next, using this design, we will carry out a simulation with stratification variables. Three covariates in the data are used here as stratum variables: a) type of cancer cell (small, adeno, large, squamous), b) age in years (<= 50, > 50), and c) performance status score (<= 50, > 50 and <= 70, > 70).

The input data for the base design are as follows: Trial type: superiority; test type: 2-sided; type I error: 0.05; power: 0.90; allocation ratio: 1; hazard rate (control): 0.009211; hazard rate (treatment): 0.004114; number of looks: 3; boundary family: spending functions; spending function: Lan-DeMets (OF); subjects are followed: until end of study; subject accrual rate: 12 per day.

The input data for the stratified simulation are given below. The number of stratum variables is 3 (cell type, age group, performance status score).

Table 44.1: Input data for stratified simulation

Stratum variable                  Level            Proportion   Hazard ratio
Cell type                         small            0.28         Baseline
                                  adeno            0.13         2.127
                                  large            0.25         0.528
                                  squamous         0.34         0.413
Age group                         ≤ 50 years       0.28         Baseline
                                  > 50 years       0.72         0.438
Performance status score group    ≤ 50             0.43         Baseline
                                  > 50 and ≤ 70    0.37         0.164
                                  > 70             0.20         0.159

44.7.1 Creating the design

First we will create a design using the input data. Open East, click the Design tab and then the Two Samples button in the Survival group.
Now click Logrank Test: Given Accrual Duration and Accrual Rates. In the resulting screen, enter the input data in the dialog boxes under the different tabs. Finally, click on the Compute button. The dialog boxes under the different tabs will now appear as shown below.

The Test Parameters tab is shown below, where you can see the computed value of No. of Events.

The Boundary tab will appear as shown below, where all the input data are seen.

The Accrual/Dropouts tab containing the input data will be as shown below.

After the design is completed and saved in a workbook, select the design node and click on the Output Summary icon to see the following output display.

44.7.2 Running Stratified Simulation

After selecting the design node, click on the Simulate icon. You will see the simulation screen with the dialog boxes under different tabs. Click on Include Options and select Stratification.

The dialog box under Test Parameters will be as shown below. Keep the default test statistic LogRank and the default choice of Use Stratified Statistic.

After entering the stratification input information, the dialog box under Stratification will appear as shown below.

After entering the response-related input information, the dialog box under
Response Generation will display details as shown in the following screenshots.

The Accrual/Dropout dialog box will appear as shown below.

In the Simulation Control tab, specify the number of simulations as 1000 and select the choices under output options to save the simulation data. The dialog box will appear as shown below.

After clicking on the Simulate button, the results will appear in the Output Preview row. Click on it and save it in the workbook. Select this simulation node and click on the Output Summary icon to see the following stratification simulation output summary.

The stratified simulation results show that the attained power of 0.856 is slightly less than the design-specified power of 0.90.

45 Superiority Trials with Fixed Follow-Up

This chapter will illustrate through a worked example how to design, monitor and simulate a two-sample superiority trial with a time-to-event endpoint in which each subject who has not dropped out or experienced the event is followed for a fixed duration only. This implies that each subject who does not drop out or experience the event within a given time interval, as measured from the time of randomization, will be administratively censored at the end of that interval. In East we refer to such designs as fixed follow-up designs.
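The censoring rule just described can be written down in a few lines. The following Python sketch is an illustration only, not East's implementation; the function name `observe_fixed_follow_up` and the 12-month follow-up used in the usage lines are assumptions chosen for the example.

```python
import math


def observe_fixed_follow_up(event_time, dropout_time, follow_up):
    """Return (time_on_study, status) for one subject.

    All times are measured from that subject's randomization. Use math.inf
    for a subject who would never drop out.
    """
    if event_time <= min(dropout_time, follow_up):
        return event_time, "event"
    if dropout_time < follow_up:
        return dropout_time, "dropout"
    # Neither an event nor a dropout occurred within the follow-up window.
    return follow_up, "administratively censored"


if __name__ == "__main__":
    follow_up = 12.0  # months; illustrative value
    print(observe_fixed_follow_up(5.0, math.inf, follow_up))   # event at 5 months
    print(observe_fixed_follow_up(20.0, 8.0, follow_up))       # dropout at 8 months
    print(observe_fixed_follow_up(20.0, math.inf, follow_up))  # censored at 12 months
```

In a variable follow-up design, by contrast, the censoring time depends on when the subject entered the study, because follow-up continues until the study ends.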
45.1 Clinical Trial of Drug Eluting Stents

Drug-eluting coronary-artery stents were shown to decrease the risks of death from cardiac causes, myocardial infarction and target-vessel revascularization, as compared to uncoated stents, in patients undergoing primary percutaneous coronary intervention (PCI) in two randomized clinical trials published in the September 14, 2006 issue of the New England Journal of Medicine. In the Paclitaxel-Eluting Stent versus Conventional Stent in Myocardial Infarction with ST-Segment Elevation (PASSION) trial, Laarman et al. (2006) randomly assigned 619 patients to receive either a paclitaxel-eluting stent or an uncoated stent. The primary endpoint was the percentage of cardiac deaths, recurrent myocardial infarctions or target-lesion revascularizations at 12 months. A marginally lower 12-month failure rate was observed in the paclitaxel-stent group compared with the uncoated-stent group (8.8% versus 12.8%, p = 0.09).

The Trial to Assess the Use of the Cypher Stent in Acute Myocardial Infarction Treated with Balloon Angioplasty (TYPHOON) (Spaulding et al., 2006) showed even more promising results. In this trial of 712 patients the sirolimus-eluting stents had a significantly lower target-vessel failure rate at 12 months than the uncoated stents (7.3% versus 14.3%, p = 0.004). Based on these results, an editorial by Van de Werf (2006) appeared in the same issue of the New England Journal of Medicine as the TYPHOON and PASSION trials, recommending that studies with a larger sample size and a hard clinical endpoint be conducted so that drug-eluting stents might be routinely implanted in patients undergoing PCI.

In this chapter we will use East to design and monitor a possible successor to the PASSION trial using a time-to-event endpoint with one year of fixed follow-up for each subject.
45.2 Single-Look Design

The primary endpoint for the trial is the time to target-vessel failure, with a failure being defined as target-vessel related death, recurrent myocardial infarction, or target-vessel revascularization. Each subject will be followed for 12 months. Based on the PASSION data we expect that 87.2% of subjects randomized to the uncoated stents will be event-free at 12 months. We will design the trial for 90% power to detect an increase to 91.2% in the paclitaxel-stent group, using a two-sided level-0.05 test. Enrollment is expected to be at the rate of 30 subjects per month.

45.2.1 Initial Design

We begin by opening a new East Workbook and selecting Logrank Test Given Accrual Duration and Accrual Rates. This will open the input window for the design as shown below. Select 2-Sided for Test Type, and enter 0.05 for Type I error. The right-hand side panel of this input window is to be used for entering the relevant time-to-event information.

The default values in the above dialog box must be changed to reflect the time-to-event parameters specified for the design. Select % Cumulative Survival for the Input Method and enter the relevant 12-month event-free percentages.

Change the Input Method to Hazard Rates. You will see the information you entered converted as shown below. Note that you may need to change the decimal display options for hazard rates, using the decimal places icon, to see these numbers with more decimal places.

Another parameter to be decided is the Variance, which specifies whether the calculation of the required number of events is to be based on the variance estimate of the log hazard ratio under the null hypothesis or the alternative hypothesis. The default choice in East is Null.
Most textbooks recommend this choice as well (see, for example, Collett, 1994, equation (2.21) specialized to no ties). It will usually not be necessary to change this default. For a technical discussion of this issue refer to Appendix B, Section B.5.3.

The second tab, labeled Accrual/Dropouts, is used to enter the patient accrual rate and, for fixed follow-up designs, the duration of patient follow-up and the dropout information. In this example the clinical endpoint is event-free survival over 12 months. Patients who are still on study at month 12 and who have not experienced the endpoint will be treated as censored. Therefore, in the first of the two panels, we select the dropdown entry indicating that subjects are followed For Fixed Period and enter the number 12 in the corresponding edit box. Suppose that the anticipated rate of enrollment is 30 patients per month. This number is also entered into the dialog box as shown below. Let the committed accrual be the same as the computed sample size, 2474 subjects.

The second panel, labeled Piecewise Constant Dropout Rates, is used to enter the rate at which we expect patients to drop out of the study. For the present we will assume that there are no dropouts.

An initial design, titled Des1, is created in the Output Preview pane upon clicking the Compute button. Click on the save icon to save the design in a workbook, or on the Output Summary icon to see the output summary of this design.

East reveals that 268 events are required in order to obtain 90% power. If each patient can only be followed for a maximum of 12 months, we must commit to enrolling a total of 2474 patients over a period of 82.5 months. With this commitment we expect to see the required 268 events within 12 months of the last patient being enrolled.
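The figures for Des1 can be checked approximately by hand. The Python sketch below is a rough cross-check rather than East's algorithm: it assumes exponential survival in each arm, equal allocation, no dropouts, and the Schoenfeld approximation for the number of events. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def hazard_from_survival(survival_fraction, months):
    """Constant hazard rate implied by a survival fraction at a given time."""
    return -math.log(survival_fraction) / months


def schoenfeld_events(hazard_ratio, alpha_two_sided, power):
    """Approximate events needed by a 1:1 logrank test (Schoenfeld formula)."""
    z = NormalDist().inv_cdf
    z_total = z(1.0 - alpha_two_sided / 2.0) + z(power)
    return 4.0 * z_total ** 2 / math.log(hazard_ratio) ** 2


def fixed_follow_up_design(surv_control, surv_treatment, follow_up_months,
                           accrual_per_month, alpha_two_sided=0.05, power=0.90):
    """Events, subjects and accrual time for a no-dropout fixed follow-up trial."""
    hazard_control = hazard_from_survival(surv_control, follow_up_months)
    hazard_treatment = hazard_from_survival(surv_treatment, follow_up_months)
    events = schoenfeld_events(hazard_treatment / hazard_control,
                               alpha_two_sided, power)
    # With no dropouts every subject is followed for the full period, so the
    # chance of an event is one minus the survival fraction, averaged over arms.
    mean_event_prob = ((1.0 - surv_control) + (1.0 - surv_treatment)) / 2.0
    subjects = events / mean_event_prob
    return {
        "hazard_control": hazard_control,
        "hazard_treatment": hazard_treatment,
        "events": events,
        "subjects": subjects,
        "accrual_months": subjects / accrual_per_month,
    }


if __name__ == "__main__":
    design = fixed_follow_up_design(0.872, 0.912, follow_up_months=12,
                                    accrual_per_month=30)
    for name, value in design.items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions the event count rounds up to 268, in agreement with Des1, and the subject count and accrual period come out close to the 2474 patients and 82.5 months that East reports. Small differences are to be expected because East's rounding and variance conventions are not reproduced here.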
The total study duration is therefore expected to be 82.5 + 12 = 94.5 months. To see how the events are expected to arrive over time, invoke a plot of Sample Size/Events vs. Time by clicking the Plots icon on the toolbar. Uncheck the Sample Size box to see the events graphs on a larger scale, as shown below.

45.3 Shortening the Study Duration

Under Des1 the trial will last for 94.5 months, with 82.5 months of patient enrollment (i.e., a sample size of 2474 subjects). The trial sponsor does not consider this satisfactory. There are three possible ways in which the study duration might be shortened: by increasing the sample size, by increasing the duration of patient follow-up, or by increasing the rate of patient enrollment.

45.3.1 Increasing the